Productive vocabulary knowledge and evaluation of ESL writing in corpus-based language learning
In recent years, vocabulary has drawn attention in ESL (English as a Second Language) and EFL (English as a Foreign Language) teaching and research (Bogaards & Laufer, 2004; Carter, 1998; Nation, 1990, 2001, 2008; Coady & Huckin, 1997; Schmitt & McCarthy, 1997; Schmitt, 2000; Zimmerman, 2009). With respect to the nature of vocabulary knowledge, research has identified two types of knowledge: receptive knowledge and productive knowledge. Nation (2001) described the dual character of vocabulary knowledge as follows:
[R]eceptive vocabulary use involves perceiving the form of a word while listening or reading and retrieving its meaning. Productive vocabulary use involves wanting to express a meaning through speaking or writing and retrieving and producing the appropriate spoken or written word form (pp. 24-25).
As studies of vocabulary teaching have ramified, the teaching of vocabulary knowledge and its productive realization in ESL and EFL writing has become an area of interest in vocabulary research (Laufer, 1998; Lee, 2003; Lee & Muncie, 2006; Melka, 1997; Mondria & Wiersma, 2004).
Despite the growing attention to the importance of vocabulary knowledge, studies have also attributed the neglect of vocabulary to the deductive, rule-oriented Latin grammars of the Age of Reason (Schmitt, 2000) and to the Grammar Translation method, which treated vocabulary as a supplement to the teaching of grammar rules (Zimmerman, 1997a).
According to Nation (2001), the terms passive and active have been used interchangeably to denote the terms receptive and productive. In the current research, the terms receptive and productive are used to emphasize the autonomy of language users, who actively receive and produce vocabulary knowledge.
Although the history of L2 (second language) and FL (foreign language) learning dates back at least to the B.C. era, when Roman children studied Greek, the role of vocabulary in language teaching has largely been ignored, and most approaches have not handled vocabulary teaching effectively (Schmitt, 2000; Zimmerman, 1997a). Consequently, such approaches “mostly relied on bilingual word lists or hoped the vocabulary would be absorbed naturally” (Schmitt, 2000, p. 15). In fact, it was only in the 1980s that major vocabulary research began to address patterns of vocabulary use, spurred by the adoption of computer technology in major language teaching methods and the development of electronic text corpora (Schmitt, 2000; e.g., the COBUILD project; see Richards & Rogers, 2001, or Nation, 2001).
There have been numerous approaches, methods, and strategies for teaching and learning L2 and FL (Richards & Rogers, 2001). As computer technology has developed and internet resources have become more widely available, views on language (approaches) have changed and, consequently, the instructional tools that support classroom applications of those views (methods) have diversified. Online dictionaries, which many language learners consult for vocabulary reference, are a good example. Another, less well-known example is the concordancer, which presents language learners with words and their contextual usage drawn from a corpus, that is, a large and structured set of electronically processed texts. As an application of the Lexical Approach, which views vocabulary as the building blocks of language, and of the corpus-based language learning method (Lewis, 1993; 1997; 2000), the concordancer has recently drawn attention in ESL and EFL education because of the authenticity of corpus data and the easy access it gives to words and their contexts (Cain, 2002; Chan & Liou, 2005; Nam & Wang, 2004; Sun, 2000, 2003, 2007; Sun & Wang, 2003; Varley, 2009).
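To make the idea of a concordancer concrete, the following is a minimal keyword-in-context (KWIC) sketch, not the tool used in this study. The function name `concordance`, the `width` parameter, and the three-sentence sample corpus are all assumptions introduced for illustration; a real concordancer would search a much larger corpus.

```python
import re

def concordance(text, keyword, width=30):
    """Return one keyword-in-context (KWIC) line per occurrence of keyword.

    Hypothetical helper for illustration: matches whole words only,
    ignores case, and shows `width` characters of context on each side.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keyword lines up in a column.
        lines.append(f"{left:>{width}} [{match.group()}] {right:<{width}}")
    return lines

# Invented sample sentences standing in for a corpus.
corpus = (
    "She made a strong argument in her essay. "
    "He drinks strong coffee every morning. "
    "A strong wind blew across the field."
)

for line in concordance(corpus, "strong", width=20):
    print(line)
```

Reading the aligned lines side by side lets a learner notice which words tend to follow the keyword (here, "argument", "coffee", "wind"), which is the kind of inductive observation the corpus-based method relies on.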
Researchers and practitioners have recognized the usefulness of a corpus and a concordancer in the language classroom, and have developed course designs (J. Flowerdew, 1994; Tribble & Jones, 1997), materials (Fox, 1998; McCarthy & O’Dell, 2005; O’Dell & McCarthy, 2008; J. Willis, 1998), and classroom applications (Tribble & Johns, 1997; O’Keeffe, McCarthy, & Carter, 2007) based on corpus-based language instruction. From the perspective of the learner as a writer, the concordancer has been welcomed. Corpus-based language learning meets learners’ needs in that it stimulates their motivation to learn with authentic examples (Sun, 2007; Yoon & Hirvela, 2004) through inductive thinking strategies (Sun, 2003; Sun & Wang, 2003), and it supports the development of ESL learners’ writing skills (Gilmore, 2009; O’Sullivan & Chambers, 2006), writing feedback (Gaskell & Cobb, 2004), and the transfer of learned vocabulary knowledge to writing (Kaur & Hegelheimer, 2005).
Addressing the issues of productive vocabulary knowledge in ESL writing and corpus-based language learning, the current study attempts to answer the following research questions using ESL undergraduate students’ writing samples and survey data:
1. How does the use of an online concordancer and thesaurus influence the quality of writing as indicated by lexical variation?
2. To what extent does corpus-based writing instruction change the learners’ grammatical knowledge of adjective and preposition usage?
3. What are the learners’ attitudes toward the corpus-based writing instruction?
To answer these questions, the current research reviews the literature written by teachers and researchers who have worked on vocabulary teaching and learning in L2 and FL education and on corpus-based language learning, which utilizes a corpus, collocation, and a concordancer (Sun, 2000; 2003; Someya, 2000). Specifically, Chapter 2 overviews the role of vocabulary in the historical context of L2 and FL education (Richards & Rogers, 2001; Schmitt, 2000; Zimmerman, 1997a), receptive and productive vocabulary knowledge (Melka, 1997; Mondria & Wiersma, 2004), and the characteristics and applicable ranges of corpus-based language learning: collocation learning (Howarth, 1998; Kita & Ogata, 1997; Sun & Wang, 2003); extensive reading (Sun, 2003); vocabulary acquisition (Cobb, 1999; Thurstun & Candlin, 1998); stylistics (Kettemann, 1995); critical literary (Louw, 1997); grammar (Sun, 2000; 2003); and writing (Gilmore, 2009; O’Sullivan & Chambers, 2006; Sun, 2007; Tribble, 1990, 1991, 2001).
Following the literature review, Chapter 3 presents a research model that collects the ESL users’ writing samples, analyzes their productive vocabulary knowledge in the collected samples, and investigates the attitudes toward the corpus-based writing instruction compared to another vocabulary reference tool, the thesaurus. The chapter also introduces the methods of analysis, i.e., pre-/post-test style experimental design analysis of the control group and experimental group with treatments. In the analysis of the collected writing sample data, the ESL writers’ vocabulary usage patterns and grammatical knowledge are investigated with corpus-based text analysis methods, both quantitatively and qualitatively (Baker, 2006; Barnbrook, 1996; Koller & Mautner, 2004; Scott, 2008). The learners’ attitude toward the corpus-based language learning is also analyzed in pre- and post-questionnaire style survey data (Sun, 2000, 2003; Yoon & Hirvela, 2004; Yoon, 2005, 2008). Therefore, the goal of the research model is to investigate the ESL writers’ writing performance reflecting vocabulary knowledge production and to explore the learners’ attitude towards the corpus-based language learning as a vocabulary reference tool.
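As an illustration of what a quantitative indicator of lexical variation might look like, the sketch below computes a type-token ratio (the number of distinct words divided by the total number of words). The source does not state which measure the study uses, so the choice of type-token ratio, the function name `type_token_ratio`, the simple tokenizer, and the two sample sentences are all assumptions made for illustration only.

```python
import re

def type_token_ratio(text):
    """Return distinct words / total words for a text (0.0 if the text is empty).

    Hypothetical helper: tokens are lowercase runs of letters and apostrophes.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# Invented pre- and post-instruction samples, not data from the study.
pre_sample = "the test was good and the class was good"
post_sample = "the exam went well and the seminar was useful"

print(round(type_token_ratio(pre_sample), 3))
print(round(type_token_ratio(post_sample), 3))
```

A higher ratio indicates less repetition of the same words. Because the ratio tends to fall as texts get longer, comparisons are most meaningful between samples of similar length.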
Building on the literature review and research methodology chapters, Chapter 4 presents the results of the ESL writing sample analyses and survey analyses. The results are expected to show the effect of a corpus-based language learning method on the ESL writers’ gains in grammatical vocabulary knowledge, the patterns of their vocabulary usage, and changes in their attitudes toward corpus-based writing instruction. The current research is also expected to identify the benefits of a corpus-based language learning method for developing ESL learners’ productive vocabulary knowledge, and the effectiveness and efficacy of adopting corpus-based language education. In addition to the direct results of the analysis, the current study aims to provide a sound research methodology for investigating ESL writing through corpus-based analysis that balances quantitative and qualitative methods.
The purpose of this chapter is to review the role of vocabulary in the context of L2 and FL teaching as well as vocabulary knowledge and its application to ESL writing in corpus-based language learning. The chapter begins with a review of the approaches and methods of ESL and EFL teaching (Richards & Rogers, 2001), the role of vocabulary in major trends, and alternative approaches and methods of vocabulary teaching and learning (Schmitt, 2000; Zimmerman, 1997a). Next, we identify the characteristics of productive vocabulary knowledge in ESL writing, loosely based on Nation’s (2001) goals for language learning, which include language items, ideas or content, skills, and text discourse. Finally, the chapter ends with a review of the characteristics of corpus-based language learning as an approach and method used to enhance ESL learners’ productive vocabulary knowledge. A pilot study, which focuses on an analysis of ESL writing discourse, is undertaken and the implications of the analysis are discussed at the end of the chapter.
2.1 Vocabulary and L2 and FL learning
In order to teach L2 and FL successfully, one must first decide the best way to teach the subject, i.e., the most effective and efficient way of learning the language within a given amount of time and with limited resources. The best way to teach these subjects may differ depending on the teacher’s view of the language (approach) and on how this view is implemented in the classroom (method) (Richards & Rogers, 2001).
Over time, numerous language teaching approaches and methods have been developed. However, regardless of the teaching philosophy of the teacher and the learning strategies being utilized by the students, language teaching and learning primarily focus on two crucial components: vocabulary and grammar. Of these two components, the importance of vocabulary was not emphasized until the 1970s, even though it is a vital part of learning a language. “[W]e could not accept that vocabulary would be initially less important than grammar. The fact is that while without grammar very little can be conveyed, without vocabulary nothing can be conveyed” (Wilkins, 1972, p. 111, italics original). Later, Richards (1976) argued that “the teaching and learning of vocabulary have never aroused the same degree of interest within applied linguistics as have such issues as grammar, reading or writing, which have received considerable attention from scholars and teachers” (p. 77). Notwithstanding the importance of vocabulary in language learning, the role of vocabulary has not been highlighted in language teaching.
Vocabulary has never been a major subject of language education research within the trends of language teaching (Schmitt, 2000; Zimmerman, 1997a). For example, in Grammar Translation, which was broadly used as the primary method for foreign language education in Europe and the United States (Zimmerman, 1997a), vocabulary selection is based solely on the reading texts used, and words are taught through bilingual word lists, dictionary study and memorization (Richards & Rogers, 2001). For this reason, Meara (1990) stated that vocabulary is under-researched given the considerable difficulties that language learners experience with it: “vocabulary acquisition…has been very largely neglected by recent developments in research…most learners identify the acquisition of vocabulary as their greatest single source of problems” (p. 221). Laufer (1986) attributed the neglect of vocabulary in L2 and FL education to linguists’ preference for studying grammar and phonology, which are closed systems and lend themselves easily to theorization, over vocabulary, which is an open system and is consequently harder to generalize about.
Since the rise of Generative Grammar in theoretical linguistics, the inductive approach to language learning has lost its appeal because the theory hypothesizes that language is an innate capability of humans. Therefore, the main interests of methodologists and language acquisition researchers have focused on theories of linguistics, such as Universal Grammar (Gass & Selinker, 2008), which has dominated the theoretical study of second language acquisition and which fails to address the importance of the lexicon while emphasizing the mental grammar that mediates between the sound and meaning of a language. The shifting trends in theoretical linguistics and second language acquisition theory have resulted in an emphasis on grammar in language teaching. As a result, scholarly recognition of vocabulary has existed only since 1990 (Nyikos & Fan, 2007).
Since traditional L2 and FL instructional emphasis has been on grammar, vocabulary has been poorly or unsystematically taught and most of the learning responsibility has been left to the learners (Oxford & Scarcella, 1994). Therefore, the history of language teaching approaches and methods shows that vocabulary has been a neglected aspect of language learning.
Later, Chomsky, who proposed Generative Grammar, adopted a lexicon-is-prime position in his Minimalist linguistic theory (Richards & Rogers, 2001).
People have attempted to learn FL since the time of the Romans (Schmitt, 2000). One of the earliest, best-known and still widely used modern language teaching methods, the Grammar Translation method, dominated European foreign language teaching from the 1840s to the 1940s (Richards & Rogers, 2001). The Grammar Translation method was first introduced in order to teach modern languages in public schools in Prussia at the end of the 18th century (Zimmerman, 1997a). In this method, the role of vocabulary is to support grammar learning, while the goal is to prepare students to read and write classical Latin and Greek. The assumption behind the method is that “most students would never actually use the target language, but would profit from the mental exercise” (Zimmerman, 1997a, p. 5). With vocabulary selected for its ability to illustrate a grammar rule, “students are largely expected to learn the necessary vocabulary themselves through bilingual word lists, which made the bilingual dictionary an important reference tool” (Schmitt, 2000, p. 12).
One of the criticisms of the Grammar Translation method is its neglect of the role of vocabulary, because the method overly emphasizes the structure of the language. According to Schmitt (2000), “[Grammar Translation] focused on the ability to analyze language and not the ability to use it” (p. 12, emphasis original). Therefore, the method focused more on etymology and the derivation of vocabulary than on actually teaching the vocabulary for practical use (Zimmerman, 1997a). The Direct Method, a reaction to the Grammar Translation Method, was developed in order to teach FL and is based on observations of children’s language usage, emphasizing exposure to oral language and naturalistic principles of language learning (Richards & Rogers, 2001). For these reasons, the method virtually prohibits the use of L1 in the classroom, and vocabulary is assumed to be learned naturally through interactions during lessons. Therefore, the initial vocabulary taught is simple, familiar, and concrete, while abstract vocabulary is presented according to topic or semantic associations (Schmitt, 2001).
A new language teaching method was developed during World War II that rejected the assumptions of the Direct Method and, instead, exposed learners directly to the language through exclusive use of the target language in instruction and forced them to use and gradually absorb its grammatical patterns (Richards & Rogers, 2001). The main principle of the method, called the Audio-lingual Method, is that language learners learn through drills rather than through analysis of the target language. The Audio-lingual Method emphasizes oral skills, accurate production of language and limited vocabulary knowledge as a way to build good language use habits. Therefore, grammar is the focal point of the method, and learners are taught grammatical structures through examples and drills rather than through analysis and memorization of grammar rules (Richards & Rogers, 2001; Zimmerman, 1997a).
With the belief that language learning is a process of habit formation, the Audio-lingual Method calls for attention to be placed on pronunciation and intensive oral drills of basic sentence patterns. Vocabulary items are selected for their simplicity and familiarity, so as to support the acquisition of grammatical structure patterns, and new words are introduced only when necessary for the completion of a drill (Richards & Rogers, 2001; Schmitt, 2000; Zimmerman, 1997a). Emphasizing limited vocabulary knowledge as a way to build good language use habits, the method assumes that good language habits and exposure to the language itself will eventually lead to an increased vocabulary (Coady, 1993). The method assumed a limited vocabulary size because it was developed when the American military found itself lacking individuals fluent in foreign languages and therefore needed a means by which to quickly train soldiers in oral and aural skills (Schmitt, 2000). Given that the needs of language learners were not the primary consideration of the method, as the soldiers would not need to function in fields of high literary demand (Coady, 1993), the Audio-lingual Method may have underestimated the role of vocabulary.
The Audio-lingual Method went out of style in the 1960s when a major change in linguistic theory was triggered by Noam Chomsky. He rejected the structuralist approach to language description as well as the behaviorist theory of language learning, and proposed a distinction between mental grammar (competence) and the actual use of the language (performance). Therefore, the validity of the practices, drills and memorization used in the Audio-lingual Method was questioned because they do not result in competence (Richards & Rogers, 2001). At the same time, language teachers were becoming disappointed in the method as the practical results were falling short of expectations and students often found that they were unable to use the learned skills in real communication outside of the classroom (Zimmerman, 1997a). After the Audio-lingual Method lost its attraction, both theoretically and practically, the focus of language teaching changed to communicative proficiency, i.e., the Communicative Method.
The Communicative Method places great emphasis on the sociolinguistic and pragmatic factors governing effective language use, while not rejecting Chomsky’s concept of competence and performance. However, it does not seek to restore vocabulary instruction as a primary concern; rather, the focus is on the appropriate use of language varieties, such as notions and functions, as well as on language in discourse (Zimmerman, 1997a). In this method, the learning of L2 and FL is treated as a phenomenon analogous to L1 acquisition. Since L1 vocabulary development seems to occur naturally, it was assumed that vocabulary would take care of itself in L2 and FL learning. From the point of view of vocabulary learning, however, L2 and FL learning might not be as natural as the approach assumes, especially given the amount of vocabulary input as well as the cultural and learning contexts that surround the language learner.
A similar approach is the Natural Approach (Krashen & Terrell, 1983), which was designed to enable a beginning student to reach acceptable levels of oral communicative ability in the language classroom. The approach emphasizes comprehensible input rather than grammatically correct production. The theoretical model consists of five hypotheses: the acquisition-learning hypothesis, the natural order hypothesis, the monitor hypothesis, the input hypothesis and the affective filter hypothesis. It follows that the Natural Approach considers vocabulary, as a bearer of meaning, to be an important component of language acquisition.
Krashen and Terrell (1983) state that “language acquisition does not take place without comprehension of vocabulary” (p. 155). Their claim suggests several aspects of vocabulary learning. First, they recognize the importance of vocabulary in language education, which previously had placed an emphasis on the structure of the language. Second, they introduce the concept of vocabulary comprehension. Until the Natural Approach, vocabulary learning had been a product of memorization or a supplementary component to fill gaps in grammar. Without the comprehension of vocabulary, language learners can neither understand nor produce the language, at least not properly. Teachers in L2 and FL education could use the concept of vocabulary comprehension as a guideline for teaching vocabulary because the approach emphasizes the importance of interesting and relevant input for the learners.
Until recently little literature existed on L2 and FL vocabulary teaching and learning (Carter & McCarthy, 1988; Coady & Huckin, 1997; Griffin & Harley, 1996; Hiebert & Kamil, 2005; Huckin, Haynes & Coady, 1993; Johns, 1994; McCarthy, 1990; Nation, 1990, 2001, 2002, 2008; Nyikos & Fan, 2007; Richards & Rogers, 2001; Schmitt, 2000; Zimmerman, 1997a). Despite the expansion of this field of research, teachers and researchers have continued to question the exact meaning of terms such as “efficient” and “effective” in regard to vocabulary instruction (DeCarrico, 2001; Hunt & Beglar, 2002; Mondria & Wiersma, 2004) and, also, the use of vocabulary within the learner’s reading, writing, listening and speaking instruction.
In order to better understand vocabulary learning, researchers have considered several factors that surround language education: input (listening and reading), output (speaking and writing), the role of context, intentional and incidental learning, and the use of technology. In the meantime, teachers and researchers have continued to emphasize the role of vocabulary in L2 and FL education (Nattinger & DeCarrico, 1992; Hinkel, 2004; Lewis, 1993, 1997, 2000; Nation, 1990, 2001, 2009a, 2009b), and have studied the nature of language learners’ vocabulary knowledge (Laufer, 1990, 1997; McKeown & Curtis, 1987; Miller, 1999; Richards, 1976).
Recently, with the rapid development of computer technology and its application to computational linguistics, the perspectives on L2 and FL education have changed. One of the major changes is a growing attention to vocabulary in language instruction. The Collins-Birmingham University International Language Database (COBUILD) project used extensive computer analysis of a corpus of twenty million words to publish the COBUILD dictionary. The project gave rise to a new approach to teaching L2 and FL: the Lexical Approach (Lewis, 1993, 1997, 2000). This new orientation in language description and approach has led many to rethink the nature of language and to acknowledge the role of vocabulary in L2 and FL education.
The Lexical Approach is a theory about the nature of language and language learning that is “derived from the belief that the building blocks of language communication teaching are not grammar, functions, notions or some other units of planning and teaching, but lexis” (Richards & Rogers, 2001, p. 132). The reorientation of this approach to language began from a new description and understanding of language in that “language consists of grammaticalized lexis, not lexicalized grammar” and that “grammar is a subordinate structure to lexis’ and the lexical approach reflects a belief in the centrality of the lexicon to language structure, second language learning and language use” (Lewis, 1990, p. iv, vii). As the Lexical Approach has emerged, the role of vocabulary has become more important in L2 and FL education, primarily because of the authenticity of corpus data and the usefulness of extracting words together with their contexts (Cain, 2002; Chan & Liou, 2005; Nam & Wang, 2004; Sun, 2000, 2003, 2007; Sun & Wang, 2003; Varley, 2009).
Treating vocabulary as a building block of language implies that language learners should not merely know about vocabulary but, instead, know vocabulary in an authentic context. Knowing vocabulary includes not only the meaning of the words, but also the grammar, usage, and discourse of the target language. If language learners can master this type of vocabulary knowledge, then they and their language teachers would reduce the burden of productive vocabulary knowledge in language teaching and learning.
2.2 Receptive and productive vocabulary knowledge
Vocabulary is a unit of language that overarches other units, systems and levels of language. Questions such as “What is vocabulary?” and “What is knowing vocabulary?” might be answered differently depending on how one defines the scope of the unit. For example, theoretical linguists, who study the nature of language, or psychologists, who study the mind and behavior of humans, might answer the questions differently. Consequently, if these questions are asked of L2 and FL educators, the answers would be different than those provided by theoretical linguists or psychologists. In regard to L2 and FL vocabulary knowledge, simply knowing the form and meaning of a word is not enough when attempting to gain full knowledge of the word.
Nation (1990) attempted to answer the questions “What is a word?” and “What is involved in knowing a word?” from the point of view of an EFL teacher by stating that a word can be defined with reference to the learner’s L1 or to English because the ways of defining words in L1 and English are different. The success of vocabulary learning depends on how the learners acquire the vocabulary knowledge, because much of mastering the meaning of a word lies in the ability to transfer vocabulary knowledge from one language to the other.
For ESL learners, knowing vocabulary consists of three categories of knowledge: form, meaning, and use (Nation, 2001). Each category involves receptive and productive knowledge, which is related to the four skills of language use: reading, writing, speaking and listening. Therefore, from the perspective of L2 and FL education, vocabulary knowledge involves not only information about the form and meaning of words, but also metalinguistic vocabulary knowledge and the ability to use it in appropriate contexts.
One of the problems in transferring vocabulary knowledge from one language to another is that of culturally loaded words. In a study that investigated the transfer of culturally loaded words between Chinese and English, Liu and Zhong (1999) found that Chinese native speakers and English native speakers rated the appropriateness of using the words fat and old differently. In English, these two words have negative connotations, while in Chinese, fat almost always has a positive connotation and old has both positive and negative connotations. These connotational differences between Chinese and English words indicate why language teachers and learners should not treat word meanings in the target language as mirrors of the mother tongue. They also show why learning vocabulary and acquiring vocabulary knowledge in L2 and FL should not simply involve developing gap-filling skills in the grammatical structure or simple one-to-one meaning translations (Nation, 1990).
The issues of culturally loaded words occur among L1, L2 and FL. However, issues also exist for language learners within L2 and FL vocabulary selection itself, in what is called register (McCarthy, 1990; Zimmerman, 2009). Register, according to Zimmerman (2009), is the variation in style by which language users select vocabulary based on who the speaker and audience are, what the content and purpose are, and what the situation is. One type of register is academic register. Academic register is crucial for ESL learners who wish to study at English-speaking academic institutions because, without mastery of academic register and academic vocabulary, learners do not have access to the content of the school curriculum (Zimmerman, 2009). Francis and Simpson (2009) argued that “if college students are to succeed, they need an extensive vocabulary and a variety of strategies for understanding the words and language of an academic discipline” (p. 97). For example, college students must understand not only general vocabulary, but also the discipline-specific and technical vocabulary in their assigned texts, lectures and discussions, and must also use this vocabulary in research papers and oral presentations.
ESL learners who use English as their L2 in an academic setting carry a heavy vocabulary burden because, according to Coxhead (2000), academic vocabulary in English for Academic Purposes (EAP) causes ESL learners much difficulty: the learners are generally not familiar with the technical vocabulary, which usually consists of words that are not often used. A simple definition of vocabulary does not help language learners, especially those who are, or have to be, at an advanced proficiency level, such as college-level ESL learners. Therefore, in order to maintain their language at the standard necessary to complete coursework, ESL learners need to have a certain level of vocabulary knowledge.
Often, language learners tend to equate knowing vocabulary with knowing only the meanings of the words. In addition, language educators reinforce this idea when they teach students to memorize as many word meanings as possible in order to “master” the vocabulary. Figure 2.1 presents one of the Internet portal websites for Korean learners of English who are preparing for academic English tests.
The portal is one of the English learning portals in Korea that focus on English test preparation. The vocabulary shown in this screenshot seems to be chosen at random, without any specific context. Learning vocabulary simply in order to pass tests may be one of the reasons why some ESL students have difficulties in academic writing. In such situations, even if students know enough words for a particular expression that they want to use, they might not understand the translated meaning of the words well enough to write the expression properly.
Language learners consider words the basic components of a language, and once they are equipped with enough of these components, they think that they have the ability to command the language. However, this is not always the case. Most of the time, this lack of command arises because vocabulary usage is not easily predicted from grammatical patterns: even if two words are synonyms, they might not have the same collocations, they might differ in whether they can be used in the passive or active voice, or they might differ as countable and uncountable nouns that must be used in different contexts (Nation, 2008). In reality, the elements of a language are not single words but combinations of words that depend on their context (Lewis, 1993). Lewis’ argument suggests that L2 and FL educators and learners should not look at words as single, isolated items in a language. Instead, they should look at words as interdependent elements in an internal language context (i.e., grammar) and in external language contexts (i.e., target language cultures and discourses).
In a comprehensive review of research on the speech of children, Fraser et al. (1963) argued that most researchers and parents observe and believe that children understand the language of others considerably before they use language themselves. Since then, research has focused on children’s ability to distinguish and understand grammar as well as their ability to produce grammatical features. Just as grammatical knowledge has these two distinguishing features, vocabulary knowledge has the same features, i.e., receptive and productive vocabulary knowledge. Receptive vocabulary knowledge takes effect when language learners receive linguistic input from others through listening or reading and try to comprehend what they have heard or read. Productive vocabulary knowledge occurs when language learners produce language forms by speaking and writing in order to convey messages to others (Nation, 1990, 2001, 2008).
In L2 and FL research, the concept of receptive and productive vocabulary knowledge did not draw attention until recently. Research in this area is therefore still in its infancy (Laufer, 1998; Meara, 1990, 1997; Melka, 1982, 1997; Mondria & Wiersma, 2004; Schmitt, 2000; Waring, 1999; Webb, 2005). Melka (1997) attempted to define the well-known but rarely described notion of receptive and productive vocabulary knowledge. She attempted to replace the idea of a gap between receptive and productive knowledge with a degree of knowledge, such as “familiarity of the vocabulary,” which she defined as knowing the various meanings of polysemous words, knowing collocations or idioms, and knowing grammatical information on vocabulary and appropriateness of usage (Melka, 1997, p. 101). She argued that certain degrees of knowledge could be labeled as higher degrees of familiarity, which moved the knowledge closer to productive knowledge. A crucial factor could then be used to establish the point at which familiarity is such that receptive knowledge can be converted into productive knowledge.
The notion of familiarity in receptive and productive vocabulary knowledge may have implications for vocabulary teaching. Teaching vocabulary implies a choice of which meanings of a word to teach and which contexts the word fits. Making such choices reduces the distance between receptive and productive knowledge.
Nation (2001) summarized the idea of receptive and productive aspects of vocabulary knowledge more concretely. He proposed meaning, form, and use as components of vocabulary knowledge and argued that each component has receptive and productive aspects. He devised 18 questions showing that the three components could be interpreted in terms of receptive and productive aspects. For example, in order to identify whether knowledge of a word’s meaning is receptive or productive, the following two questions could be asked (Nation, 2001, p. 26): “What meaning does this word form signal?” and “What word form can be used to express this meaning?” He also showed which type of learning would be most effective for enhancing the three types of vocabulary knowledge. For the knowledge of use, he argued that implicit learning with repetition activities for grammar and collocation, and explicit learning with explicit guidance and feedback for constraints on use, were the most effective types of learning.
As described earlier, receptive vocabulary is much larger than productive vocabulary, and reception precedes production. Nation (2001) attempted to explain why the sizes of receptive and productive vocabulary knowledge differ by describing four factors: amount of knowledge, practice, access and motivation. The amount of knowledge accounts for difficulties in producing vocabulary because production requires extra learning of the new word, while learners need only a few distinctive features of a word to perceive it. This difference is especially important when the learner’s L1 is different from the target language.
According to the literature, the practice explanation holds that receptive use usually gets more practice than productive use, which may, in turn, explain why receptive knowledge is larger than productive knowledge (Melka, 1997; Nation, 2001). The access explanation suggests that word knowledge differences exist depending on the language use, i.e., whether the use is L1-L2 productive use or L2-L1 receptive use. For example, for L2-L1 receptive use, the learner only needs a one-to-one meaning translation. For L1-L2 productive use, however, the learner needs more than the meaning translation knowledge. The learner needs other parts of the knowledge, such as the word’s correct grammatical form, collocations with other words and extra-linguistic features, including context and culture. The motivation explanation is not directly related to vocabulary knowledge, although it explains that if a learner is not motivated to use the learned vocabulary, then the vocabulary knowledge will never become productive.
The explanation of gaps between receptive and productive vocabulary knowledge leads language teachers to ask what they should focus on in order to increase productive knowledge with the familiarity and appropriateness that native-like mastery requires. Although the literature has addressed the receptive and productive vocabulary knowledge distinction, Schmitt (2000) argued that it is difficult to draw a line between L2 language learners’ receptive and productive knowledge through empirical research because of different definitions of receptive and productive knowledge and different measurement methods. Meara (1997), however, presented a vocabulary acquisition model which explains that vocabulary is receptively known until learners reach a certain threshold, at which point it becomes productive. Recently, Mondria and Wiersma (2004) investigated the popular belief that vocabulary words learned both receptively, i.e., learning vocabulary in the L2-L1 language order, and productively, i.e., learning vocabulary in the L1-L2 language order, are better retained receptively than those learned just receptively.
One of the findings of their research is that receptive learning leads to a certain amount of productive retention, but that productive learning leads to a larger amount of receptive retention. Therefore, when productive knowledge is the aim of vocabulary learning, it is advised to learn the words productively. Adding receptive learning is not useful, as it does not lead to improved productive knowledge. Having pointed out that the productive retention is lower than the receptive retention, Mondria and Wiersma recommended that for a higher productive retention additional learning and/or exercises are necessary.
One way to increase productive knowledge can be found in a study conducted by Meara (1990). He argued that receptive vocabulary knowledge is qualitatively different from productive vocabulary knowledge. In order to explain this concept, he used graph theory, a mathematical model in which certain relationships and processes can be represented as a system of points or nodes connected together by lines or arcs (p. 150). For example, a phrase, which consists of words, can be treated as a set of words (nodes) linked together by directional associations (arcs).
For example, if somebody wants to express a meaning about the weather or circumstances, they might use the idiom rain or shine. In this expression, the word rain will often elicit shine, as a directional association, but not vice versa. The expression consists of the word shine, the passive vocabulary (a part of the receptive vocabulary knowledge), which responds only to external stimuli, and the word rain, the active vocabulary (a part of the productive vocabulary knowledge), which does not require an external stimulus but can be activated by other words in other contexts. His model suggests that the traditional method of teaching vocabulary as single words might not be an efficient way of converting receptive vocabulary knowledge into productive vocabulary knowledge.
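The directional association described above can be sketched as a small directed graph. This is only an illustration under stated assumptions, not Meara's own formalization: the function names build_graph and elicits and the single arc from rain to shine are hypothetical choices made for this example.

```python
def build_graph(arcs):
    """Build a directed graph (word -> set of words it elicits) from (source, target) arcs."""
    graph = {}
    for source, target in arcs:
        graph.setdefault(source, set()).add(target)
        # Make sure the target is also a node, even if it has no outgoing arcs.
        graph.setdefault(target, set())
    return graph


def elicits(graph, cue, word):
    """Return True if there is a direct arc from cue to word."""
    return word in graph.get(cue, set())


# Hypothetical one-arc lexicon for the idiom "rain or shine":
# "rain" elicits "shine", but there is no arc in the other direction.
ARCS = [("rain", "shine")]
GRAPH = build_graph(ARCS)

print(elicits(GRAPH, "rain", "shine"))
print(elicits(GRAPH, "shine", "rain"))
```

Because each arc is stored in one direction only, the asymmetry between the two words is preserved: looking up shine as a cue returns no associated words, while rain leads to shine.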
Since Pawley and Syder (1983) questioned “how [the native speaker] selects a sentence that is natural and idiomatic from among the range of grammatically correct paraphrases, many of which are non-nativelike or highly marked usages,” researchers have tried to find the answer by grouping characteristics of vocabulary words in the language into types, such as lexical phrases (Nattinger & DeCarrico, 1992), formulaic language (Wray, 2002) or formulaic sequences (Schmitt, Ed., 2004).
According to Wray (2002), a formulaic sequence is “a sequence, continuous or discontinuous, of words or other elements, which is, or appears to be, prefabricated: that is, stored and retrieved whole from memory at the time of use, rather than being subject to generation or analysis by the language grammar” (p. 9). For native speakers, formulaic sequences seem to be an easy way to produce L1 and sound native-like. For L2 learners, however, mastering formulaic sequences is one of the most difficult levels to achieve unless they literally memorize all the sequences. Learning to comprehend and produce L2 or FL means understanding how the parts of the language, i.e., vocabulary, are arranged without causing contradiction in a given situation. For example, learning to write in the target language means learning to convey one’s ideas or thoughts in an organic sequence of written vocabulary. Students need to learn words not as isolated, planned answers to tests, but rather learn how to use the vocabulary in a larger text in a fully productive manner. Therefore, teaching vocabulary by deliberately stressing the associational links leading from already known words to newly learned words might be a more effective way of activating a passive vocabulary.
2.3 Productive vocabulary knowledge of writing
L2 and FL learners often state that they experience discrepancies between producing the language and receiving it, i.e., writing is more difficult than reading and speaking is more difficult than listening. These discrepancies exist because of the receptive and productive vocabulary distinction in the learners’ lexicon (Laufer, 1998; Meara, 1990, 1997; Melka, 1982, 1997; Mondria & Wiersma, 2004; Nation, 1990, 2001, 2008; Schmitt, 2000; Waring, 1999). Previous literature has shown that a gap or distance exists between receptive and productive vocabulary knowledge. For example, although L2 and FL learners know and recognize learned vocabulary, they produce less of that vocabulary than they recognize in short sentence writing tests.
This discrepancy is especially prevalent among L2 users in certain contexts. According to Nam’s (2008b) qualitative study of Korean-English bilingual speakers’ native speakerness, even bilingual individuals experienced difficulties in professional and academic writing in English, although they did not experience any difficulties in everyday oral communication. The findings correspond with the results from Leki and Carson’s (1994) study, in which they studied whether L2 learners see their lack of vocabulary as a major factor affecting the quality of their writing. The need for vocabulary improvement and usage in L2 writing is also felt by those individuals who read documents written by ESL students. Santos (1988) investigated professors’ reactions to compositions by ESL speakers in academic settings.
One of the findings of the study was that even though the errors made by the speakers did not prevent the professors from understanding the text, they were academically unacceptable. Among these errors, the lexical errors were considered to be the most serious. The research concluded that greater emphasis needs to be placed on vocabulary improvement and lexical selection in ESL student writing. Productive vocabulary knowledge is an essential component of L2 and FL writing, not only for the writers, but also for the readers. Therefore, the next logical question to ask is how language learners can improve the quality of their writing. Ironically, in order to improve their vocabulary knowledge, language learners need to rely on their writing practices. As stated earlier, Nation (2001) argued that language learners use L1-L2 exercises in order to increase productive vocabulary knowledge.
Corson (1997) argued that students should acquire academic vocabulary productively in order to demonstrate membership in the academic community. Franken (1987) also argued that reading texts within the genres that the learners are required to write in is important preparation because it allows the learners to analyze the texts and find useful expressions to memorize. Such activities lead learners to read a text that is similar to the one that they need to write, which is helpful because the text serves as a good model for the usage of words, particularly collocations and their grammar. This example is useful, according to Corson, because learners can focus on vocabulary, while still focusing on the general aspects of writing and membership in the writing community.
2.3.1 ESL writing
One of the burdens for L2 and FL learners is to memorize all of the words necessary for language proficiency. Therefore, teachers and students have attempted to find efficient and effective ways to learn vocabulary. The vocabulary learning burden can be interpreted as the amount of effort required by ESL learners to learn vocabulary (Nation, 2001). According to the literature, a conservative estimate of the vocabulary size of native English-speaking university graduates is approximately 20,000 word families (Nation & Waring, 1997; Nation, 1990, 2001). For adult EFL learners, the gap between their vocabulary size and that of a native speaker is usually very large. Therefore, the number of words that native English speakers acquire at each age is a good indicator of the EFL learner’s vocabulary learning burden. In order to match the word knowledge of a native speaker, an L2 or FL student would need to acquire each year double the vocabulary that native or L1 speakers acquire. Nation’s (2001) explanation implies that one of the keys to successful language proficiency is reducing this type of vocabulary learning burden.
Further, when issues of productive vocabulary knowledge become involved, the vocabulary burden for ESL learners becomes qualitatively complicated. ESL learners not only need to increase the number of words in their vocabulary, they also need to know different usages of the words depending on different contexts, purposes, and the target culture. The vocabulary burden for ESL learners, therefore, has both quantitative and qualitative dimensions.
De Groot and Keijzer (2000) considered different aspects of the vocabulary learning burden of L2 and FL learners, such as those arising from the characteristics of the vocabulary itself as encountered by the learner. They argued that cognates and concrete words are easier to learn and less susceptible to being forgotten than non-cognates and abstract words. Word frequency barely affected the productive testing results, and receptive testing showed better recall than productive testing.
Laufer (1990) questioned why words are separated into two categories for L2 learners: easy and hard. She argued that these categories only exist because of the similarities and differences between word meanings in each language, i.e., L1 and L2 words may share equivalent meanings but have different collocational usages. In order to deal with these words, she suggested that teachers develop strategies for self-learning using guessing in context. In her examples, Hebrew-speaking English learners tended to say high education or stand in front of a problem when translating their L1 collocations. The teacher had a hard time explaining such errors using reference rules and dictionaries, as these items did not provide the necessary collocations in their examples of the words’ uses. Her argument implies that learners should use the context of the words in order to discern similarities and differences and develop strategies for self-learning, i.e., guessing in context, in order to reduce the vocabulary learning burden.
According to McKeown, Beck, Omanson and Pople (1985), L1 frequency vocabulary instruction resulted in better outcomes than traditional vocabulary instruction in tasks of definition knowledge, fluency of access to word meanings, context interpretation and story comprehension. For L2 and FL, Nation (1990) argued that the vocabulary learning burden varies depending on the learner’s familiarity with the word or his or her background knowledge in the target language. He also argued that teachers can help students reduce the vocabulary learning burden by drawing attention to systematic patterns and analogies within L2 when the learners’ first language is not similar to L2, which often causes a heavier burden (Nation, 2001).
The results of McKeown, Beck, Omanson and Pople (1985) and the arguments of Nation (1990, 2001) suggest a possible way to reduce the vocabulary learning burden: learners need to become familiar with the frequent usage of each word and pay attention to the systematic patterns of the language. For example, a word is often used differently in different contexts and, therefore, will have slight meaning differences each time, i.e., in some cases synonyms should not be used interchangeably depending on the context. These differences can be easily learned using a list of usage frequencies that shows systematic patterns within the usage of the word. Many researchers have stated the importance of using context in L2 and FL vocabulary learning when attempting to teach vocabulary.
According to Fischer (1994), vocabulary learning within context for L2 and FL learners has shown similar results to vocabulary learning by first language learners. He investigated the independent and interactive effects of contextual and definitional information on vocabulary learning. For example, in his study, he presented one group of German students learning English with a text containing unfamiliar English words, while he presented a second group with a monolingual English dictionary entry. A third group was given both the text and the dictionary entry. The results showed that the groups with the text performed better than the group with just the dictionary entry, which shows that having information about the context in which the word is used is crucial in regard to understanding the meaning of the word.
Nagy (1995, 1997) also reported the importance of vocabulary learning through context for L2 learners. He compared the breadth and depth of vocabulary knowledge, while arguing that knowing a word involves more than knowing a definition. He stated that word knowledge is generally recognized as including other components, such as syntactic frames, the word’s collocational possibilities, its register, potential morphological relationships and its semantic relationship with other words. He summarized two reasons why L2 learners need to use context as an important means of vocabulary growth: 1) the use of context is a crucial strategy for dealing with text containing unfamiliar words and 2) L2 learners tend to show even greater benefits from increases in the volume of their reading than L1 learners. In his report, he concluded that although learning from context is more difficult in L2 settings, L2 readers can gain significant word knowledge simply by reading. In addition, he found that increasing the amount of text that the learner is required to read produces significant gains in vocabulary knowledge and other aspects of linguistic proficiency.
Regardless of whether L2 or FL learners are children or adults, they all face a learning burden. In order to overcome this burden, they must first learn basic grammar and vocabulary. Then they may be able to fulfill a desire to write and speak in a more sophisticated way. The importance of measuring vocabulary learning at different proficiency levels and for different purposes in L2 education has given rise to English for Specific Purposes (ESP) (Dudley-Evans & St. John, 1998; Hutchinson & Waters, 1987). Given that people learn ESL or EFL with specific purposes, whether for pleasure, academic achievement or employment, vocabulary needs to be taught within the area that best fits their purpose. If language learners can narrow down the scope of the vocabulary necessary for their specific purpose, it would reduce their vocabulary learning burden. Adult non-native speakers who intend to study in English-speaking academic institutions not only need to know the vocabulary that similar-aged native speakers know, but also the vocabulary that the native speakers would have learned during their school years, and they need to understand how the vocabulary might be used differently in a specific setting (e.g., the words scaffolding and scaffold in education and building construction).
ESL students have limited time and must cope with large amounts of weekly reading and writing assignments. Their vocabulary size, although they have completed English proficiency exams, such as the TOEFL®, often falls short of the level necessary for writing for academic purposes. ESL students may simply believe that mere exposure to English, via speaking or reading texts at the appropriate level, will eventually result in vocabulary knowledge sufficient for academic writing. However, the students usually find that their years of effort and laborious vocabulary learning do not help them as they had desired, because many of them encounter difficulties in using the learned vocabulary in class activities, such as assignments, presentations, or discussions. They may need practical and effective ways to learn vocabulary in order to make better use of their learned vocabulary knowledge in writing.
In academic settings, regardless of the learner’s intuition, natural learning will not provide the literacy skills necessary to function successfully in academic surroundings. Carrell, Devine and Eskey (1988) claimed that, unlike oral language skills, academic literacy skills are not acquired naturally. These skills require instruction and training in order for the learner to obtain a certain level of proficiency in the area. This claim also applies to vocabulary learning. Learners with academic purposes need to become familiar with words that are frequently used in academic settings and understand the usage of these words in academic contexts. A learner who wishes to develop extensive vocabulary knowledge for literacy purposes, such as reading and writing, needs to receive direct instruction and training as well as extensive exposure to the target language vocabulary.
College-level ESL students, although they have already acquired a considerable level of L1 literacy and studied English, need to develop the ability to read and write in English appropriately to meet the requirements of educational institutions. As discussed earlier, the issues of productive vocabulary knowledge, vocabulary size and the complicated quality of vocabulary make it difficult for ESL students to write academic papers appropriately. Language teachers can assist students in engaging effectively with target language texts by focusing on the role of vocabulary in the academic genre. Language learners, while focusing on genre-specific vocabulary, will be able to better understand genre differences within the target language.
In addition to vocabulary knowledge as discussed previously, vocabulary learning should include external language context in which the target language is used. The vocabulary knowledge concerning external context includes culture/connotation (Liu and Zong, 1999), genre (Swales, 1990), discourse (Schmitt, 2000), or collocation (Zimmerman, 2009). ESL learners may not have complete vocabulary knowledge until they are equipped with this type of vocabulary knowledge along with L1-L2 simple meaning equivalents.
It is important to keep in mind that grammar is one of the essential components of writing. Teaching grammar for L2 writing is not intended to develop the language learners’ overall native-like proficiency (Pica, 1994). Without grammatical knowledge, however, language learners are unable to develop a full range of L2 writing. Therefore, for L2 users who intend to use grammatical knowledge, for example, in academic writing, grammar instruction needs to be carried out in tandem with instruction in vocabulary and academic collocations, because grammar learning in contextual and lexicalized chunks is effective (Hinkel, 2004). Research on the Lexical Approach that investigated the effectiveness of teaching grammar with vocabulary also supports Hinkel’s (2004) argument (Lewis, 1993, 1997, 2000; Nattinger and DeCarrico, 1992). In an attempt to suggest the benefits of teaching words and word patterns, Hunston, Francis and Manning (1997) described language teaching syllabi and textbooks that often treat grammar and vocabulary as separate areas of language teaching and learning.
Many course books have separate sections on grammar and vocabulary, syllabi list grammatical structures and key vocabulary items separately; students are described as being good at grammar but having a short storage vocabulary, or vice versa, and often times grammar and vocabulary are tested separately. Traditionally, language courses have been [organized] as a set of grammatical points, with vocabulary selected to support the topic of each course unit (p. 208).
What Hunston, Francis and Manning (1997) criticized in language teaching was the practice of teaching grammar and vocabulary separately, as is done in traditional FL teaching. They argued that if L2 and FL learners learned vocabulary not through rote memorization of a list of words but, instead, through word meanings within patterns that form the building blocks of language, then a positive synergistic effect would be possible that promotes language learners’ understanding, fluency, accuracy and flexibility.
From the perspective of the target language speakers, vocabulary and grammar instruction can be combined into an efficient tool of language instruction. Such teaching would provide connections between grammar and vocabulary by focusing on patterns within the vocabulary. As Hunston, Francis and Manning (1997) argued, words can be shown to have patterns, and words which have the same patterns tend to share aspects of meaning. Being able to learn these patterns and aspects would reduce the learners’ learning burdens. The advantages of acquiring patterns in vocabulary teaching and learning have been argued in other literature. For example, according to Nation (2001), “if words of related meaning like hate and like take similar patterns [of requiring gerunds], then the learning problem of one of them will be reduced because the previous learning of the other will act as a guide” (p. 56). The related meanings of certain word groups and their similar structural patterns strongly support the view that grammar and vocabulary need to be taught together.
Recently, the importance of vocabulary has been widely recognized by language practitioners and researchers, and word frequency has been used as an organizing principle of language teaching courses, while grammar is brought in as support where necessary. This approach is called the Lexical Approach (Johns, 1994; Lewis, 1993, 1997, 2000; Tribble & Johns, 1997; Sinclair & Renouf, 1988; Willis, 1990). For example, students learning English learn lists of particular verbs which are typically followed by a to-infinitive, a gerund or both. The learners must learn that: the verbs appear and manage are followed by a to-infinitive only; the verbs finish and suggest are followed by a gerund only; the verbs begin and like are followed by either a gerund or a to-infinitive with roughly the same meaning; and the verbs forget, remember and stop have different meanings when used with each grammatical form. When learners learn these types of vocabulary skills, they will remember not only the meaning of each word, but also the grammatical property of how each word fits into English grammar.
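The verb patterns listed above can be written down as a small lookup table, which makes explicit the idea that a word carries its grammar with it. The following is a minimal sketch that uses only the nine verbs named in the text; the names VERB_PATTERNS, MEANING_CHANGES and describe are hypothetical labels chosen for this illustration.

```python
# Complement patterns for the verbs named in the text.
VERB_PATTERNS = {
    "appear": {"to-infinitive"},
    "manage": {"to-infinitive"},
    "finish": {"gerund"},
    "suggest": {"gerund"},
    "begin": {"to-infinitive", "gerund"},
    "like": {"to-infinitive", "gerund"},
    "forget": {"to-infinitive", "gerund"},
    "remember": {"to-infinitive", "gerund"},
    "stop": {"to-infinitive", "gerund"},
}

# Verbs whose meaning differs depending on the complement form.
MEANING_CHANGES = {"forget", "remember", "stop"}


def describe(verb):
    """Return a short description of the complement pattern of a listed verb."""
    patterns = VERB_PATTERNS.get(verb)
    if patterns is None:
        return f"{verb}: not in this list"
    if len(patterns) == 1:
        return f"{verb}: {next(iter(patterns))} only"
    if verb in MEANING_CHANGES:
        return f"{verb}: gerund or to-infinitive, with different meanings"
    return f"{verb}: gerund or to-infinitive, with roughly the same meaning"


for verb in ("manage", "suggest", "like", "stop"):
    print(describe(verb))
```

Storing the pattern next to the word, rather than in a separate grammar rule, mirrors the way the approach treats each verb's complement as part of knowing the word.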
Along the same line of reasoning, Schmitt (2000) suggested that lexical and grammatical knowledge are inextricably interrelated, with no discrete boundary between vocabulary and grammar, in a kind of lexicogrammar. This view suggests that vocabulary and grammar are closely linked and that these two components of language should not be taught separately. The implication of the view is realized in corpus-based language learning for grammar and vocabulary teaching, which relies on the extent of lexical patterning in authentic language use and language databases (Johns, 1994). In light of the fact that L2 and FL learning always takes place under time constraints for teachers and learners, it is important to maximize the language gains and make learning as efficient as possible. Using lexicogrammatical units and getting used to the patterns that frequently occur in authentic texts is likely to be an efficient way to obtain vocabulary as well as grammatical knowledge in writing. With the applications of corpus-based language learning, L2 and FL learners’ writing proficiency and accuracy can be increased by learning vocabulary from authentic examples and grammar that is embedded in vocabulary.
2.3.3 Writing discourse
Gee (1989) defined discourse as “a socially accepted association among ways of using language, of thinking, and of acting that can be used to identify oneself as a member of a socially meaningful group or social network” (p. 18). One of the purposes of learning vocabulary is to convey messages in extended texts as a member of a certain discourse community. As a member of a writing discourse community, therefore, the learner needs to demonstrate the ability to use the vocabulary of the discourse. Vocabulary plays an important role in forming the writing discourse of language learners because vocabulary is a unit of a form and meaning of the messages in the discourse. Vocabulary in extended texts tends to have special characteristics that are worth attention.
According to Sinclair (1991), words tend to cluster together in systematic ways and, sometimes, this pattern becomes so regular that the resulting clusters seem to be more than a list of words that fit in a grammatical framework. For example, the discourse and text in published articles in the academic community show certain traits, such as an average sentence length of around 25 words and more frequent use of abstract nouns and nominalizations instead of concrete nouns (Swales, 1990). McCarthy (1990) described discourse analysis as the study of the processes by which language acquires meaning in contexts, the forms and products of those processes, the types of texts that are produced, and the structure of exchanges in language interactions. This type of analysis implies that once words are put together in a system of grammar to create a text, the vocabulary of the text will show a pattern in accordance with the linguistic environment in which it lies.
McCarthy (1991) and McCarthy and Carter (1994) argued that certain words are strongly associated with certain patterns and organizations of texts. In this respect, vocabulary use signals the uniqueness of a text that distinguishes it from other types of texts and shares general discourse messages among similar types of texts (Nation, 2001). Therefore, the discourse analysis of writing involves studying longer texts in order to discover the distinguishing lexical patterns in the texts.
Widdowson (1993) emphasized the importance of placing the right words in the right places. He argued that with all the emphasis in the literature focused on the correct grammatical patterns, morphological modifications (inflectional and derivational) and/or syntactic order (internal relationships), another concern has been left unanswered: external relationships. These relationships focus on choices about which words to place in which section of the text in order to make a complete relationship. These choices are also called the conventions and range of vocabulary usage. Nation (2001) also asserted that two related aspects should be considered in regard to the role of vocabulary in the language-external context. He stated that “vocabulary use signals and contributes to one text from other texts; vocabulary use carries general discourse message which are shared with other texts of similar types” (p. 205). In L2 and FL writing tasks, selecting the proper vocabulary and placing it in the proper place is one of the most difficult tasks, even for advanced learners.
The discussion of vocabulary in writing and the evaluation of text can lead to the idea of words acting in unison and affecting one another in the texts. Sinclair (1996) found that choosing a particular word guides and constrains the lexical choices several words away from the initial one. Schmitt (2000) exemplified this lexical patterning with the word sorry and its subsequent word selection. He presented evidence that the word sorry is almost always followed by some form of inconvenience that the speaker regrets having caused. Stubbs (1995) argued that learning vocabulary in ostensive ways has limits.
For example, by comparing the target words big and little and large and small with the words boy and girl in a 2.3 million word contemporary English corpus and the Oxford English Dictionary on CD-ROM, he showed that the meaning of many common words can be learned only via their repeated co-occurrence with other words. His summary of this finding showed that even though the large-big pair and small-little pair are in complementary distribution, the words large and small are usually neutral words denoting physical size. The words big and little can be used neutrally and literally, but are often patronizing, critical or both. Therefore, he concluded that many aspects of word meanings cannot be learned by ostensive definition only. Cultural aspects are encoded not just in the words, but also in combinations of very common words. Knowing this characteristic of vocabulary is important for second and foreign language learners.
As introduced earlier, Liu and Zhong (1999) reported that even though culturally loaded words are similar in primary meanings in L1 and L2, culturally different connections exist within the interaction between the words, which explains why FL learners who already know the meaning of a particular word sometimes use inappropriate expressions in a target language discourse. L2 and FL teachers and researchers need to pay attention to this characteristic of vocabulary so that vocabulary can be taught with the understanding of the target language’s adequate and appropriate socio-cultural contexts. Intermediate and advanced proficiency learners, those who have already mastered enough grammar of the target language and would want to use the learned language for their own purposes, often still have problems using vocabulary in an appropriate way in longer texts in the target language discourse. For this reason, Nattinger and DeCarrico (1992) are interested in situationalized expressions which could be used to perform social or pragmatic functions that can be easily retrieved for written or spoken communication. They described language in terms of lexical phrases and introduced various teaching applications that use lexical phrases in different forms of discourse, including spoken and written discourse.
Ringbom (1998) compared the frequency of word use in English native speakers’ argumentative essays and learners’ essays written by non-native speakers of English. The results showed two conclusions about advanced English learners. The first is that language learners either underused or overused certain vocabulary and the second is that learner corpora of the non-native speakers’ writings showed that L2 academic essays contained a smaller range of vocabulary than L1 essays do. In addition, the researcher concluded that some core nouns (time, way, society, people and things), core verbs (think and get), auxiliaries (be, do, have, and can), vague quantifiers (all, some, and very) and conjunctions (but and or) are overused in the learner corpora. Milton (1999) investigated problems that EFL novice writers have in acquiring lexical features of written discourse. He identified the problems of using pre-fabricated expressions and the students’ overuse of these expressions in the essays.
Then, he explored examples of certain fixed expressions that were overused in the students’ essays. By comparing student essays with L1 essays, he confirmed that the L2 students are more likely to reuse phrases than L1 writers. Based on his results, he was able to create a list of alternative phrases taken from the L1 essays that he could include in his classes in order to help the students when writing essays. The investigation into L2 writers’ discourses implies a new direction of vocabulary learning for L2 and FL writing in that vocabulary knowledge should not revolve around word meaning translation, but, instead, should focus on target language discourse. If L2 and FL learners want to express a thought or concept within a longer text, whether spoken or written, they should pay attention to their word choices to make sure that they are appropriate and form a unity in accordance with the target language discourse.
2.4 Corpus-based language learning
A corpus is a large collection of naturally-occurring language text collected in a systematic way that is usually stored and processed electronically. Although corpora existed before computers, the first modern computer-readable corpus can be considered to be the one-million-word Brown Corpus of English, compiled by Francis and Kučera at Brown University between 1961 and 1964. In the 1970s, the LOB Corpus, a British counterpart, was compiled. Both were assembled primarily for linguistic research. Since then, corpora have grown to several hundred million words and the possibilities of using corpora have expanded beyond simple linguistic research to language teaching and research.
As technology has developed, corpus-based language learning has received an increasing amount of attention from language teachers and researchers who have stated that it is an effective L2 and FL teaching and learning approach for course design (J. Flowerdew, 1994; Tribble & Jones, 1997), teacher development (Allan, 2002; Tsui, 2004), materials (Fox, 1998; McCarthy & O’Dell, 2005; O’Dell & McCarthy, 2008; Willis, 1998), classroom applications (O’Keeffe, McCarthy, & Carter, 2007), vocabulary (Sun, 2000, 2003), grammar (Conrad, 2000; Meunier, 2002), learners’ writing skills (Conzett, 2000; Gilmore, 2009; O’Sullivan & Chambers, 2006; Sun, 2007), reading (Brodine, 2001), writing feedback (Gaskell & Cobb, 2004) and transferring learned vocabulary knowledge to writing (Kaur & Hegelheimer, 2005). In language education, corpora have been used in order to develop dictionaries, such as the Collins COBUILD English Language Dictionary.
In addition, they have also been used to develop concordancers, or computer programs to be used with ESL and EFL teaching.16 A concordance, according to Sinclair (1991), is an index to words in a text that provides access to language patterns in a corpus in a systematic way. A program that generates the concordance lines is called a concordancer (Sinclair, 1991).17 The computer-generated concordance output can be flexible in length or grammatical boundaries, depending on the settings of the program. Today, some corpora and concordancers are available online.
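The idea of a concordancer can be illustrated with a minimal keyword-in-context sketch in Python. It is not the program used in any of the studies cited here; the function name kwic, the simple letter-based tokenizer and the width parameter are all assumptions made for illustration only.

```python
import re

def kwic(text, keyword, width=4):
    """Return (left context, keyword, right context) for each hit.

    `width` is the number of words kept on each side of the keyword.
    The tokenizer is a deliberate simplification: runs of letters and
    straight apostrophes count as words.
    """
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Hypothetical two-sentence sample, not taken from a real corpus.
sample = "In Korea, we respect adults. In America, we just use English."
for left, key, right in kwic(sample, "we", width=3):
    # Right-align the left context so the keyword forms a column.
    print(f"{left:>25} [{key}] {right}")
```

Changing width corresponds to the flexible output length mentioned above; a real concordancer would also handle punctuation, sentence boundaries and much larger inputs.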
Related to corpora and concordancers, a collocation is the occurrence of two or more words within a short span of each other in a text. Collocation analysis is useful in discovering language patterns, such as grammatical patterns of language. The usual span of intervening words between two words is four (Sinclair, 1991), although the number can be adjusted for different research purposes (Baker, 2006). An example of a collocation can be found in English verbs and their relationships with to-infinitives and gerunds, i.e., certain verbs prefer certain forms of complements and, depending on the form of the complement, the verb phrase will have a different meaning. For example, stop to talk and stop talking differ in meaning.
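As a rough sketch of how collocates within a span might be counted, the following Python function tallies every word that occurs within a given number of words to the left or right of a node word. The default span of four follows the figure attributed to Sinclair (1991) above; the function name, the tokenizer and the sample sentence are assumptions for illustration.

```python
from collections import Counter
import re

def collocates(text, node, span=4):
    """Count words occurring within `span` words of each `node` hit."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z']+", text)]
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            # Words to the left and right of the node, excluding the node itself.
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts

# Hypothetical sample built around the stop to talk / stop talking contrast.
sample = "They stop to talk. Then they stop talking and stop to rest."
print(collocates(sample, "stop", span=1).most_common())
```

With span=1 only the immediate neighbours are counted, which is enough to see whether stop is followed by to or by a gerund in this sample. A wider span captures looser associations at the cost of more noise.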
Advocates of corpus-based language learning have proposed using the concordancer in designing language teaching syllabuses and materials for L2 and FL education (Cain, 2002; Ciesielska-Ciupek, 2003; Davis & Russell-Pinson, 2004; L. Flowerdew, 2001; Fox, 1998; Lewis, 1993, 1997, 2000; Osborne, 2003; Tribble & Jones, 1997; Wichmann, Fligelstone, McEnery, & Knowles, 1997; Willis, 1990; Willis, 1998). They have claimed that the use of corpora in L2 and FL education can provide not only a means of learning about the language and culture, but also opportunities for using it communicatively, with a focus on situated textual meanings rather than just the linguistic forms. Therefore, they state that using corpora in L2 and FL education can be a beneficial aid in developing reading and writing skills and in understanding and producing particular texts and types of texts. From the perspective of the language learner, corpus-based L2 and FL instruction can offer a means by which to increase his motivation and render him more autonomous (Woolard, 2000), while allowing him to mine language descriptions through the corpus (Aston, 2001).
As described earlier, one of the benefits of using a corpus for L2 and FL education is that language learners can access authentic language, which might not be possible with a traditional dictionary, in which limited or artificial sentences are used as examples. Using this idea, several studies have reported the use of concordancing during the acquisition of L2 or FL, such as in collocation learning (Howarth, 1998; Kita & Ogata, 1997; Sun & Wang, 2003), lexical acquisition (Cobb, 1999; Thurstun & Candlin, 1998), writing (Sun, 2007; Tribble, 1990, 1991, 2001), stylistics (Kettemann, 1995), critical literary appreciation (Louw, 1997) and grammar (Sun, 2000, 2003).
As an application of corpus-based language learning, Sun (2000, 2003) presented a simple example of using a corpus in grammar instruction. She argued that EFL students lack enough exposure to authentic English both inside and outside of the classroom to improve their English proficiency. She also indicated some pitfalls of using a concordance, such as output that was too large for the users to manage and users who were overloaded by the information.
The unique sorting function used by a concordancer, however, can help learners to uncover those grammar rules systematically using the contexts of the vocabulary. The importance of context when learning vocabulary is evident from observations that a word’s meaning varies depending on the context. Therefore, context provides helpful information to learners when attempting to learn vocabulary (Nagy, 1997). Specifically, Sternberg (1987) hypothesized a set of processes and contextual cues to make use of context-based vocabulary learning possible: selective encoding, which involves separating relevant information from irrelevant information; selective combination, which involves combining relevant cues into a workable definition; and selective comparison, which is a process by which new information about a word is related to old information already stored in memory.
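The sorting function mentioned at the start of this paragraph can be sketched as follows. This is a simplified illustration in Python, not the behaviour of any specific concordancer: the hits for a keyword are ordered alphabetically by the words that follow it, so that lines sharing the same right-hand pattern end up next to each other. The function name and the sample text are hypothetical.

```python
import re

def sorted_concordance(text, keyword, width=3):
    """Return concordance lines sorted by their right-hand context."""
    tokens = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append((left, tok, right))
    # Sort on the lower-cased words to the right of the keyword.
    hits.sort(key=lambda hit: [w.lower() for w in hit[2]])
    return [(" ".join(l), k, " ".join(r)) for l, k, r in hits]

# Hypothetical sample: two gerund and two to-infinitive uses of "stop".
sample = ("We stop talking at noon. They stop to talk. "
          "We stop to rest. You stop talking now.")
for left, key, right in sorted_concordance(sample, "stop"):
    print(f"{left:>20} [{key}] {right}")
```

After sorting, the stop talking lines and the stop to lines are grouped together, which is the kind of regularity a learner is expected to notice when inferring a grammar rule from context.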
In order to operate these vocabulary learning processes in context, he identified several types of contextual cues and moderating variables, mainly the frequency of the vocabulary item in the context. This research has also been supported by research conducted by Krashen (1989, 2004), Nagy, Anderson, and Herman (1987), and Nagy, Herman, and Anderson (1985) and books that introduce the use of context as a part of the practical teaching instruction of L2 and FL education (Johns, 1997; McCarthy, 1990; Wallace, 1982).
In addition to the usefulness of context in vocabulary learning, a long-debated distinction between language learning strategies is involved in corpus-based language learning: inductive and deductive learning. Usually, corpus-based language learning relies on inductive learning. Learners look up words in the concordance lines and infer the meanings and usages of the words. On the other hand, corpus-based language learning offers deductive learning as well. The concordance lines can be arranged in descending or ascending order of how often a keyword occurs so that the learners can observe the usage of the vocabulary in terms of usage frequency. This frequency list allows the learners to deductively learn the usage of the vocabulary. Corpus-based language learning, therefore, draws on the advantages of both learning strategies.
Corpus-based language learning partially relies on the context of the vocabulary usage as provided by the concordance lines. Researchers have hypothesized that a large portion of the vocabulary growth of school children occurs through incidental learning from written context. Krashen (1989) remarked that if the hypothesis that reading written context drives vocabulary acquisition in the first language is valid, then the hypothesis that vocabulary is developed in second languages as it is in the first language is at least reasonable (p. 454).
Therefore, some teachers of advanced L2 and FL learners prefer to expose students to new words in context, hoping that the students will learn the vocabulary through contextual clues instead of using vocabulary drills to try to encourage learning and retention.
The practical application of employing frequent word usage and identifying systematic patterns in context (McKeown, Beck, Omanson, & Pople, 1985; Nation, 1990, 2001) can be found in modern corpus-based language learning. However, the use of corpora to describe an aspect of a language dates back to before the invention of the computer. Michael West published the General Service List of English Words in 1953, which ranked the most common words in English by frequency based on a manual analysis of several million words of text. This list has been widely used to design EFL graded reading programs. Since then, the logistics of corpora have changed dramatically. Computer techniques have made such frequency counting jobs much easier, and lexicographers and grammarians now regularly use corpora not only to establish word frequency, but also to make use of the wide range of different corpora available.
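To show how little effort computerized frequency counting requires compared with a manual count, the following Python sketch builds a ranked word list from a text. The function name, the tokenizer and the toy sentence are assumptions for illustration; a real frequency list would be built from a corpus of millions of words and would usually group inflected forms.

```python
from collections import Counter
import re

def frequency_list(text, descending=True):
    """Return (word, count) pairs ranked by frequency.

    Ties are broken alphabetically so the ordering is reproducible.
    """
    tokens = [t.lower() for t in re.findall(r"[A-Za-z']+", text)]
    counts = Counter(tokens)
    sign = -1 if descending else 1
    return sorted(counts.items(), key=lambda kv: (sign * kv[1], kv[0]))

# Hypothetical toy text.
sample = "the cat and the dog and the bird"
for word, count in frequency_list(sample):
    print(f"{count:>3}  {word}")
```

Setting descending=False lists the rarest words first, which corresponds to viewing a frequency list in ascending order.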
Further, the development of corpora on various types of media including CD-ROM and online and concordancing tools enable users to process the corpora for their own purposes. Educators who teach and develop syllabi for L2 and FL courses and students studying L2 or FL can also benefit from the easy access to and ability to manage corpora. Johns (1994) proposed classroom concordancing or Data-driven Learning in order to teach ESL learners in academic institutions language forms and functions in an innovative way. The approach assumes that effective language teaching is a form of linguistic research because it encourages language learners to have direct access to linguistic data so that they have their own ability to learn the vocabulary knowledge.
In this approach, learners are provided with samples from concordances instead of examples from textbooks or teachers. The approach assumes learners’ independent learning with inductive learning strategies of vocabulary. Problems, however, may exist when employing concordancing in the classroom, i.e., problems related to the corpus (size and choice of corpus), usefulness of the data (learners’ proficiency level in regard to understanding the concordance lines; lengths of the concordance line examples; amount of concordance data; how representative the concordance lines are of the variety of usages), and the need for training on search techniques (J. Flowerdew, 1996). As corpus-based language instruction has developed, a substantial number of studies on learners’ attitudes toward using corpora in the classroom have been undertaken. Sun (2000) undertook one of the earliest studies of learners’ attitudes toward corpus-based language learning and developed a three-week online corpus-based lesson for 37 college EFL students in order to investigate their attitudes toward the tool. The research collected qualitative and quantitative data through a survey and open-ended questions.
The results indicated that the students showed positive attitudes toward the use of the online corpus-based instruction for the characteristics of the authentic language examples. The students also reported their perception of the effectiveness of the corpus-based language learning. The students responded that the vocabulary and phrase usages were the two most effective areas of the corpus-based language learning, while the grammar and writing were the least effective areas out of the nine categories.18 Yoon and Hirvela’s (2004) investigation of attitudes toward corpus-based writing instruction, unlike Sun’s (2000) results, reported more promising aspects of using corpora in ESL writing classes. The results of the qualitative and quantitative analyses indicated that the students perceived the use of the corpora as beneficial to developing ESL writing skills and increased their confidence toward ESL writing. Koo (2006) included a short survey of attitudes toward using corpus for paraphrasing newspaper articles for Korean ESL graduate students in his study.
The descriptive survey results showed that the concordancing program helped the learners gain confidence in their English writing. The results reflected the findings of Yoon’s (2005) study, which also investigated graduate level ESL students’ attitudes toward corpus-based writing instruction and in which the writers became more independent and confident when linguistic resources were more readily available. The results of that study show that by using authentic texts, students became more aware of stylistic issues and more independent in solving their own linguistic and writing questions. A follow-up study conducted by Yoon (2008) included six graduate students in ESL academic writing courses and revealed different interpretations of ESL writers’ attitudes. The learners not only benefited from the corpus-based writing instruction in correcting their linguistic knowledge, but the instruction also promoted their lexicogrammatical awareness. Further, the students became more independent and confident writers.
2.5 Pilot ESL writing analysis
Nam (2008a) conducted a corpus-based text analysis in order to investigate how EFL learners use pronouns in their writing discourse. The study investigated 24 EFL learners’ expository writing samples as available in Park (2004) and investigated the vocabulary usage through corpus-based analysis. According to Park (2004), the EFL Korean writers were asked to write an expository essay comparing the differences between Korean and American cultures. In Nam (2008a), for the purpose of the analysis, the 24 writing samples were compiled as a text corpus. The combined wordlist from the 24 essays showed that, out of 10,282 words, 179 instances of the pronoun they and occurrences of the pronoun we existed in two different contexts.
In addition, 147 instances of America or American(s) and 210 instances of Korea or Korean(s) existed. Given that the words we and they are pronouns that usually take the place of nouns in the preceding context, it is possible to assume that these words usually referred to either the Korean or American people. In order to identify instances of the pronoun they that referred to the American people, the pronoun they was set as a keyword and all of the collocational instances of America and American(s) within a span of seven words to the left of the keyword were extracted in order to capture as many occurrences of the pronoun as possible. In the same way, the cases of the pronoun they referring to the Korean people, the pronoun we referring to the American people and the pronoun we referring to the Korean people were extracted as well. For the sake of simplicity, the labels AMERICAN and KOREAN were used to represent the words America and American(s) and the words Korea and Korean(s), respectively.
The data show that the pronoun they refers to the Korean people 24 times and the American people 20 times. Although the numbers in the table were extracted solely based on the context setting, i.e., up to seven preceding words from the pronouns, the table helped to verify the overall usage patterns of the pronouns we and they. However, some instances of the pronoun usage need to be investigated closely. For example, in one section, the pronoun we refers to the American people four times. Given that the writing samples were written by Korean EFL learners, it does not make sense that they would refer to the American people as we.
Below are the actual concordance lines where the pronoun we was collocated with the words America or American:
Collocations of the word we and AMERICAN
(1) is important to understand American culture. In order to enjoy American life, we should know about tipping.
(2) continents. If we compare the Korean culture with the American culture, we can notice that the differences between Korea and the U.S.A are
(3) most difficult thing for me to be used to in America is tipping. In Korea, we don’t have to tip for service. We think the price already includes
(4) So in Korea, it’s harder to be a good friend than America. In America, we just use English because it doesn’t have respect form. Sometimes,
As presented above in the concordance lines, the pronoun we refers to a generic we or the Korean people, not the American people. These incidents were captured because the words America or American(s) occurred in the immediate seven preceding words before the pronoun we.
Based on the above issue, it is worth investigating the use of the pronoun they as referring to the Korean and American people. Given that the nationality of all of the writers is Korean, it may be awkward to use the pronoun they for the Korean people because the pronoun they refers to the third person plural. For the same reason, it is reasonable that the Korean writers would use the pronoun they for the American people, and the pronoun we for the Korean people. Below are the concordance lines for the pronoun they in regard to KOREAN.
Collocations of the words they and KOREAN
(1) differences the house etiquette. In Korea, when people enter the house that they have to be shoes off but In America people do not have to.
(2) not enough place for playing them. Usually Korean people want to play sports, they go to facilities to which people pay for using, and it’s not so
(3) friends through these kinds of party. Usually, in Korea, people have a party, they should have alcohol drinking and play gambles. When I met my
(4) not accustomed to having roommate because Korean culture is common culture. They think they have to share all of their life. People would like
(5) to having roommate because Korean culture is common culture. They think they have to share all of their life. People would like to help each
(6) is based on individualism. Therefore, if Korean wants to live with roommate, they have to know these differences between cultures and have to try
(7) “Our Korea” or “Our University”. Korean has special nationalism, wherever they live. They think all of Korean has only one ancestor, so they
(8) or “Our University”. Korean has special nationalism, wherever they live. They think all of Korean has only one ancestor, so they always use
(9) wherever they live. They think all of Korean has only one ancestor, so they always use “OUR” before the country or a group what they are
(10) “MY. Most of Korean thinks there is only one race in the Korean peninsula, so they think all of Korean is their family. Because of that they are
(11) peninsula, so they think all of Korean is their family. Because of that they are very friendly to neighborhoods, but they aren’t positive to
(12) foreigner thinks Korean is very rude, because Korean has very busy life, so they don’t care other person. American has an optimistic view of
(13) First of all, amount of drinking alcohol. Koreans usually drink so much even they lost their memory, or drink through the night. Why? I think
(14) never drink. Last, a habit of drinking. When Koreans drink alcohol, usually they want to see ‘The end’. ‘The end’ means the moment that they see
(15) of the difference from the group. Koreans call this Wangtta (big isolation). They never want to be isolated, so they try to belong to a peer group
(16) between Korea and the U.S.A is the Koreans behave the same way and so they don’t seem to have individual characteristic, while the
(17) their own tastes than Korean. For example, most Koreans really care about how they are show to other people, likewise, they tend to easily
(18) foods and when people eat food, they eat food with Korean soup. Whenever they eat soup and spicy food, even though it’s very hot, they eat
(19) people misunderstand such as two Korean women must be lesbian because they hold hands. These things are about differences between American
(20) the TV show or movie. These young Koreans are easily influenced by media, so they usually imitate the fashion of singers and movie stars. If you
(21) to have brand name goods. However, Korean women like that too much. Even they are not rich enough to buy that, they start to save money in
(22) its still strange to them. American is more individualistic than Korean. They don’t care what they concern exactly. So they don’t talk to
(23) to them. American is more individualistic than Korean. They don’t care what they concern exactly. So they don’t talk to children although they
(24) the different possessive case. When Koreans introduce their mother or house, they usually say like “She is our mother and this is our house.” To
As presented above, 24 instances of the pronoun they in KOREAN exist in the EFL writing corpus. Within these instances, sometimes they replaces the preceding noun in the immediate sentence, such as people in lines 1, 2 and 3. This replacement is a grammatical requirement if the writers know the simple grammar rules of the anaphoric pronouns. Several interesting cases exist that might explain why the writers use they for the Korean people. These examples can be found in lines of 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 20 and 21. In these concordance lines, the context is about a negative aspect of Korean culture. It seems that the writers, even if they are Korean, want to detach themselves from the Korean people in their essay so that they can examine the negative aspect of their culture objectively.
The writers also use they in the AMERICAN instances, as shown below. Unlike in the cases in which they use they to refer to KOREAN, the writers, in these cases, use they as a grammatical requirement during both negative and positive contexts.
Collocations of the word they and AMERICAN (1) to the Koreans. First of all, most Americans are individualistic. They consider that it is very important to make sure own portion of (2) are done with work or on the weekend. If you ask a favor to Americans when they are taking a rest, they could refuse your request politely. (3) on the weekend. If you ask a favor to Americans when they are taking a rest, they could refuse your request politely. I would rather say like this. (4) politely. I would rather say like this. Americans want and values privacy. They feel the easy and the need to give people their privacy or to (5) of which was spending time which was that American people had extra time what they did for themselves. I felt that they were really enjoyed their (6) with my host family, so I can experience American culture in American family. They, even if they were not typical American family, show many kinds (7) family, so I can experience American culture in American family. They, even if they were not typical American family, show many kinds of ways how (8) But it’s such a generalized manner Americans take for granted that they don’t seem hesitating to give money back.
Though the same (9) than Korean think. Although American people share most space with roommate, they do not interfere in other’s life. Roommate is just a roommate. (10) nation or kind of location name. Example, “My America” or “My University.” They don’t say that “Our America” or “Our University.” It is weird (11) before the country or a group what they are member. However American doesn’t. They don’t think they are one community. In fact, most of people in (12) or a group what they are member. However American doesn’t. They don’t think they are one community. In fact, most of people in America are (13) so they don’t care other person. American has an optimistic view of life. They usually use an exaggerated statement.
In spite of not fine, (14) though there are still troubles between different races in American society, they are trying to solve that kind of problems and made some systems (15) are different. I’d never seen American presses someone to drink. If they want to drink, they drink. If they don’t want to drink, they (16) have individual characteristic, while the Americans behave individually and so they don’t seem to have in common, but they look to live their own (17) for people to distinguish in Korea. On the other hand, American looks like they don’t concern about their look in order to distinguish. However (18) in fashion style. Expressing fashion style differs from American women. They try to find their own color whereas Korean women don’t. Since (19) other form older people and younger people than you. But in America, although they have polite word forms it isn’t more important than Korea. It (20) its still strange to them. American is more individualistic than Korean. They don’t care what they concern exactly. So they don’t talk to anybody we should bow to anybody. That is very differences manner. As you (3) enjoy alcohol. After I felt these differences, I thought, Korea is wrong.
We have to change the bad habits. When I go back to Korea, I am going (4) continents. If we compare the Korean culture with the American culture, we can notice that the differences between Korea and the U.S.A are (5) most difficult thing for me to be used to in America is tipping. In Korea, we don’t have to tip for service. We think the price already includes (6) hurt his feelings. Then he said to her, “That’s why I hate Korean people.” We were so shocked that we told our European friends that he said (7) he said to her, “That’s why I hate Korean people.” We were so shocked that we told our European friends that he said that. Then they said that (8) uncomfortable with such a direct way to say something. In Korea, even if we are harmed, as long as it is not so serious, we generally tend to (9) between American culture and Korean culture based on their own space.
We cannot say which is good or bad. These differences is made from (10) Korea and America, but Especially, the language is different. In Korea, we have respective word forms so we use other form older people and (11) the language is different. In Korea, we have respective word forms so we use other form older people and younger people than you. But in (12) we don’t know their age, we can be familiar friends easily. In Korea, we respect adults. In public transportations, we yield our seats to (13) friends easily. In Korea, we respect adults. In public transportations, we yield our seats to older peoples, and we feel it’s natural.
(14) To talk to children although they are annoying you in public places. In Korea, we can scold them if you are enough old to teach them. If their
Lines 2, 3, 5, 8, 10, 11, 12, 13 and 14 can be grouped in terms of the positive contexts of Korean culture that the writers value and are proud of. Line 3 is somewhat confusing because the writer used we in a somewhat negative context; however, the writer did so when acknowledging his or her willingness to help change the bad habits of his or her culture.
From the pilot analysis of pronouns in expository writing, corpus-based writing analysis was found to be useful in detecting systematic patterns in Korean ESL writers’ writing. The results showed that the Korean writers used the pronoun they when excluding themselves from the group they described, and the pronoun we when including themselves in the group. Consequently, through corpus-based writing analysis, it is also possible to capture how writers use certain words differently in various contexts. Hyland (2002) investigated the use of authorial references in academic writing discourse, examining the use of the pronoun I in 64 Hong Kong undergraduate theses. He reported a significant underuse of authorial references when the authors made a claim or argument. Nam’s (2008a) results were similar to Hyland’s (2002) in that both showed that the use of pronouns in writing discourse may reflect a “culturally and socially constructed view of self” (p. 1111).
As briefly introduced in the previous chapters, the main purpose of the current study is to identify how corpus-based language learning would be beneficial in producing vocabulary appropriate in ESL writing. The current research explores ESL learners’ productive vocabulary knowledge in writing and the effect of corpus-based language learning writing instruction. Three research questions were formulated in order to investigate the ESL learners’ writing samples and their attitudes towards the concordancer:
1. How does the use of an online concordancer and an online thesaurus influence the quality of writing as indicated by lexical variation?
2. To what extent do corpus-based writing instructions change the learners’ grammatical knowledge of adjective and preposition usage?
3. What are the learners’ attitudes toward the corpus-based writing instruction?
According to Fraenkel and Wallen (2003), an experimental research design is the best way to answer these types of research questions because it establishes cause-and-effect relationships among the variables. The experimental design for the current study involved collecting writing samples from participants randomly assigned to one of two writing instruction treatments, i.e., writing with an online concordancer or an online thesaurus as the vocabulary reference. Over the writing practice period, the participants were asked to produce weekly or biweekly writing samples, with both groups writing on the same topics.
In order to address the first two research questions, several writing samples collected from each participant over a period of several weeks were analyzed in three ways: a pre/post-project comparison, a comparison with native speaker samples, and a comparison between the two instructional groups. This design and the comparisons helped to establish the relationships between the treatments (the types of writing instruction) and the relative changes in performance. In order to address the third research question, a cross-sectional survey was conducted to collect information from the subjects of each group. This survey enabled the analysis of the subjects’ opinions and reactions to the corpus-based writing instruction before and after the experiment. The subjects were asked to complete a Likert-scale survey before and after the experiment regarding their experiences with each vocabulary reference tool.
3.1 Subjects and setting
Korean undergraduate students at a large mid-western American public university participated in the research. Subjects were recruited through online bulletins and hardcopy flyers on campus. When potential subjects contacted the researcher, their first language and their English education prior to their admittance to the higher education institution were verified. This verification was carried out because many Korean students studied in the U.S. during their secondary education and, therefore, their vocabulary knowledge in regard to writing would be different from that of students who came to the U.S. for their undergraduate degrees only. This process was used in order to reduce the risk of including participants whose L1 or prior familiarity with English differed from that of the target population.
A total of 46 subjects agreed to participate in the research. Later, two opted out of the research project. Twenty-three subjects did not complete the writing tasks. The remaining 21 subjects finished the writing tasks and completed the experiment. One of the reasons why some of the subjects did not finish the experiment or withdrew their consent was the heavy load of the undergraduate students’ coursework during the semester. Undergraduate students at the university usually register for 15-18 credit hours. Each class usually requires reading assignments, short paper writing assignments and quizzes. In addition, the students must take mid-terms and finals and receive decent grades in order to complete the course. Many of the students who left the program stated that the requirements of the study were too overwhelming to manage alongside their undergraduate coursework.
The requirements for the study included the completion of seven essays and their revisions based on feedback provided by the researcher. In exchange for their participation, the participants received direct compensation in the form of a 1 GB flash memory drive, as well as indirect benefits. The indirect benefits of participation were twofold: the subjects would have a chance to practice their English writing in an academic setting and would have their writings reviewed by a native English-speaking ESL/EFL specialist. After the initial screening process, a potential subject was randomly assigned to either the experimental or the control group. Once the prospective subject’s group was decided, the researcher explained the entire process, including the tutorials on how to use the online concordancer and thesaurus. After the prospective subject voluntarily agreed to participate in the research, the researcher received the signed consent form and collected the subject’s demographic information and pre-questionnaire data.
The survey questionnaire data included subjects’ initial attitudes toward the vocabulary reference tools. After all of the writing sessions were completed, a post-questionnaire survey was collected. The two data sets were compared in order to investigate the changes in the subjects’ attitudes before and after the writing tasks completed with the different vocabulary reference tools. While the majority of the subjects were business majors, other majors were also represented, including journalism, psychology, biology, education and telecommunication. In order to verify the subjects’ English proficiency, data from the self-rated reading and writing proficiency and TOEFL® scores were collected (see Appendix A for the questionnaire sheet). Although two subjects rated their English reading or writing proficiency as poor (subjects CONC4 and CONC7), the majority of the subjects rated their English proficiency as fair or above. All of the subjects rated their computer knowledge and skills, including web-surfing and word processing, as good or above average.
3.2 Data collection
Two sets of data were collected from each group: writing samples and survey data. Both sets were collected via email attachment. All of the subjects wrote a series of writing samples using different vocabulary reference tools during the experiment period. The writing sample data were expected to provide evidence as to how each instructional treatment, the use of an online concordancer or an online thesaurus, affected the subjects’ writing performances. The subjects’ writing samples were analyzed both quantitatively and qualitatively through corpus-based text analysis in order to address questions of grammatical knowledge and changes in their writing discourse. In order to analyze the changes in attitudes toward each vocabulary reference tool, a set of survey sessions that consisted of pre/post-questionnaire items was administered. Before and after the experiment, the subjects were asked to complete survey questionnaires. With this research design, it was possible to compare the differences between the groups and within the groups for both writing performance and attitude changes. The survey data were analyzed qualitatively and quantitatively in order to locate themes and patterns in the subjects’ reactions and responses to the corpus-based vocabulary reference tools.
3.2.1 Writing samples
Each subject wrote seven writing samples (approximately 350-400 words each) during the research period. The writing prompts were developed in such a way that the subjects could write about different topics and use different writing styles. In order to measure the differences or changes in the subjects’ writing before and after the study, each subject was asked to write on the first and last topics without the help of the vocabulary reference tools.
The writing topics were modeled after the TWE® (TOEFL® Writing Essay) writing topics (http://ftp.ets.org/pub/toefl/989563wt.pdf). The subjects were asked to write comparisons or opinions about familiar social phenomena.
Many Korean students choose to attend schools or universities in the United States. Why do you think they decided to study in the United States? Use specific reasons and details to explain your answer.
Do you agree or disagree with the following statement? Television, newspapers, magazines, and other media pay too much attention to the personal lives of famous people such as public figures and celebrities. Use specific reasons and details to explain your opinion.
What is the difference between the education in Korea and one in the United States? Use specific reasons and details to explain your answer.
Some people say that the Internet provides people with a lot of valuable information. Others think access to so much information creates problems. Which view do you agree with? Use specific reasons and examples to support your opinion.
What is the difference between the culture, such as manners and lifestyle in your home country and one in the United States? Use specific reasons and details to explain your answer.
Do you agree or disagree with the following statement? It is more important for students to study history and literature than it is for them to study science and mathematics. Use specific reasons and examples to support your opinion.
According to a recent news report, there are 87,000 Korean students in the United States, one of the highest numbers of the international students in the country. Why do you think they choose to study in the U.S.? Use specific reasons and details to explain your answer.
Throughout the data collection period, a total of 273 samples (147 initial writings and 126 revisions) were collected from the 21 participants (11 in the control group and 10 in the experimental group). Out of the 273 writing samples, the initial writings on the first and seventh topics (a total of 42 writing samples) were used for the actual analysis of the current research. These writings were used in order to examine the changes in writing before and after using the vocabulary reference tools. The 42 writing samples were then divided and recompiled into four corpora based on group and topic: PRETHES and POSTTHES for the control group’s first and seventh writings and PRECONC and POSTCONC for the experimental group’s first and seventh writings.
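The division of the initial writings into the four corpora can be illustrated with a minimal Python sketch. The subject codes are hypothetical: the prefix THES is an assumption modeled on the CONC4 and CONC7 codes reported for the concordancer group, and the functions are illustrative rather than the procedure actually used in the study.

```python
from collections import defaultdict

# Group sizes reported in the text. The "THES" prefix is an assumption
# modeled on the "CONC4"/"CONC7" subject codes of the concordancer group.
GROUP_SIZES = {"THES": 11, "CONC": 10}
TOPICS = range(1, 8)  # seven writing topics


def initial_samples():
    """Enumerate (subject_id, group, topic) for every initial writing."""
    return [
        (f"{group}{n}", group, topic)
        for group, size in GROUP_SIZES.items()
        for n in range(1, size + 1)
        for topic in TOPICS
    ]


def corpus_name(group, topic):
    """Map the first and seventh topics to PRE*/POST* corpora; others are unused."""
    if topic == 1:
        return f"PRE{group}"
    if topic == 7:
        return f"POST{group}"
    return None


def build_corpora(samples):
    """Group the analyzed samples by corpus name."""
    corpora = defaultdict(list)
    for subject_id, group, topic in samples:
        name = corpus_name(group, topic)
        if name is not None:
            corpora[name].append((subject_id, topic))
    return dict(corpora)


samples = initial_samples()
corpora = build_corpora(samples)
print(len(samples))
print({name: len(items) for name, items in corpora.items()})
```

Under these assumptions, the sketch reproduces the reported counts: 147 initial writings, of which 42 fall into the four corpora (11 per thesaurus corpus and 10 per concordancer corpus).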
The turn-around time for each topic was longer than expected. The number of writing assignments was decided based on Someya’s (2000) study, which assigned seven writing assignments during three months. It was expected that one cycle of the subject’s initial writing-feedback-revision would take about a week. However, during the actual process, it sometimes took more than two weeks to get one writing session completed.
3.2.2 Vocabulary reference tools
When students have questions about words that they are about to use, they usually consult dictionaries. As Internet accessibility becomes a greater part of students’ learning environments, an increasing number of websites provide online dictionaries, thesauri or translators.
While dictionaries, grammar reference books, and translation aids are useful to ESL writers, language teachers and researchers have begun discussing how to use linguistic tools, such as concordancers, to help ESL or EFL learners with their language learning. Corpus evidence, as described in the previous chapter, can be used to identify typical and atypical usage of vocabulary and suggest what words are typically used in the target language writing discourse. By observing regularities of vocabulary usage in authentic texts, i.e., the frequency of the occurrence or co-occurrence of words, writers can identify collocations, typical grammatical and semantic rules, and appropriate pragmatics. Once the writers observe the corpus evidence and learn the vocabulary, they can use it appropriately when writing.
With the rapid development of technology, concordancers have become available on the Internet. Language learners therefore have more convenient access to concordancers and can efficiently manipulate large amounts of authentic text and make use of the findings for language learning purposes. This convenience is possible because online concordancers handle huge amounts of authentic, sorted language examples available in large corpora. Recently, the potential of concordancers in ESL and EFL teaching and learning has become a focus of attention among language teachers and researchers.
Of the many online concordancers available, the VLC Web Concordancer was used for the current study. This tool was chosen because it has a comparatively good graphical user interface that leaves out statistical and linguistic search functions not directly needed by ESL writers who require a vocabulary reference tool. In addition, the VLC Web Concordancer is equipped with a variety of sub-corpora, including the Brown Corpus, sports, business/economy, language and teaching, computer, and Time magazine. Although concerns were raised about Internet stability and speed in a previous study (Sun, 2000), the IT infrastructure of the university at the current research site eliminated these concerns, as the subjects had a wireless Internet connection at 54.0 Mbps on campus.
MSN Encarta Thesaurus® was selected as the control group’s vocabulary reference tool. I chose to use a thesaurus over a dictionary for two reasons.
First, current dictionaries contain information that concordancers may not have. Given that the purpose of the research is not to compare the effectiveness of dictionaries and concordancers, the research model requires only minimal intervention for the control group. Second, when students write papers using a word processor, such as MS Word®, they search for synonyms by right-clicking on the word that they need to replace. This function provides a list of synonyms and, sometimes, antonyms. Coincidentally, MSN Encarta® and MS Word® are both products of Microsoft® and provide the same results. Therefore, the comparison of the ESL writers’ writing samples produced with the different vocabulary reference tools, MSN Encarta Thesaurus® and the online concordancer, is meaningful because it reflects real applications of a thesaurus in ESL students’ writing.
Before the first writing assignment, an introductory session was held on each vocabulary reference tool. The actual online concordancer tutorials created by the researcher are available in Appendix B.20 Particular care was taken in creating the concordancer tutorial because, unlike other types of vocabulary reference tools, such as online dictionaries and online thesauri, the VLC Web Concordancer and similar online concordancers contain a number of options that language learners need to manipulate to obtain the right vocabulary information. Without the skills to manipulate these options, learners cannot fully benefit from the concordancer. For the current research, only three types of options were introduced to the subjects in the concordancer group: search string (equal to, starts with, ends with, or contains), corpus selection, and sort type.
The search string option is useful because learners can separate the different inflectional or derivational forms of a word; for example, learners can restrict the concordance output to lines with the word provide, excluding provided and providing. The corpus selection option is also useful because learners can obtain concordance lines for a word that may be used differently depending on language genre, discourse, and community. The sort type option is especially useful when the learner looks for, for example, the adjectives that collocate with a noun or the prepositions that follow a verb. If the learner looks for adjectives, the left sort function alphabetically sorts the words immediately to the left of the keyword. If the learner looks for prepositions that follow a verb, the right sort function alphabetically sorts the words that immediately follow the keyword. The concordancer tutorial session also explained the advantages of using the concordancer when writing in English.
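To make the search string and sort type options concrete, the following Python sketch builds a small keyword-in-context (KWIC) concordance. It is only an illustration of the general technique and does not reproduce the VLC Web Concordancer; the function names, the mode labels, the tokenization and the sample sentences are all assumptions introduced here.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplification)."""
    return re.findall(r"[a-z']+", text.lower())


def matches(token, query, mode):
    """Apply one of the four search-string options to a single token."""
    if mode == "equal":
        return token == query
    if mode == "starts":
        return token.startswith(query)
    if mode == "ends":
        return token.endswith(query)
    if mode == "contains":
        return query in token
    raise ValueError(f"unknown search mode: {mode}")


def concordance(text, query, mode="equal", sort=None, width=4):
    """Return (left context, keyword, right context) triples, optionally sorted."""
    tokens = tokenize(text)
    lines = [
        (tokens[max(0, i - width):i], tok, tokens[i + 1:i + 1 + width])
        for i, tok in enumerate(tokens)
        if matches(tok, query, mode)
    ]
    if sort == "left":     # order by the word immediately before the keyword
        lines.sort(key=lambda line: line[0][-1] if line[0] else "")
    elif sort == "right":  # order by the word immediately after the keyword
        lines.sort(key=lambda line: line[2][0] if line[2] else "")
    return lines


def show(lines):
    """Print concordance lines with the keyword aligned in the middle."""
    for left, keyword, right in lines:
        print(f"{' '.join(left):>30}  [{keyword}]  {' '.join(right)}")


# Invented sample text, used only for illustration.
SAMPLE = (
    "We provide for the family. They provided a small amount of help. "
    "Schools provide a large amount of support. Parents are providing with care."
)

show(concordance(SAMPLE, "provide", mode="equal"))
show(concordance(SAMPLE, "amount", sort="left"))
show(concordance(SAMPLE, "provid", mode="starts", sort="right"))
```

In this sketch, the "equal" mode returns only the lines with provide, the left sort on amount brings the preceding adjectives (large, small) into alphabetical order, and the right sort on forms beginning with provid orders the lines by the word that follows the verb.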
Then, the researcher demonstrated how to choose keywords and read the concordance lines as shown in. I also gave the participants hands-on experience with the concordancer by walking them through the tutorial and examples, and answered any questions that they had. The introduction session lasted about 45 minutes. The thesaurus introduction session lasted about 10 minutes and consisted of introducing the website link. To encourage usage of the thesaurus and concordancer and to check which words the subjects checked or replaced, the subjects were asked to keep a list of the words that they looked up and return it along with their writings.
During the subjects’ writing process, the ESL writers received feedback on their writings and revised their initial writings based on this feedback. The purpose of providing feedback was to encourage the subjects to engage in the writing assignments more actively. The writings were reviewed by a native speaker of English who specializes in ESL/EFL. Then, the feedback from the native speaker of English was compiled by the researcher; it covered the vocabulary selection of the writing in the areas of grammatical knowledge and appropriate use of vocabulary (see Appendices C and D for the writing feedback guidelines and sample feedback). The feedback was returned to the subjects 3 to 5 days before the next writing assignment was to begin. After receiving the feedback, the subjects revised their writings and sent the revisions back to the researcher via email. During the revision process, the control group was encouraged to use the online thesaurus, while the experimental group was encouraged to use the online concordancer.
According to Bitchener, Young and Cameron (2005):
direct or explicit feedback occurs when the teacher identifies an error and provides the correct form, while indirect strategies refer to situations when the teacher indicates that an error has been made, but does not provide a correction, thereby leaving the student to diagnose and correct it (p. 193).
With respect to the degree of explicitness of error feedback, Ferris and Roberts (2001) found that learners who received feedback significantly outperformed those who did not receive feedback on the revision tasks. However, no significant differences existed between the types of feedback, i.e., with or without error codes.21 The current research employed error highlights as feedback for both groups. The purpose of providing the error highlights was to have the subjects locate the places at which the errors occurred so that they could make use of the assigned vocabulary reference tools in order to correct and revise the writings.
A set of 5-point Likert-scale pre/post-questionnaire items was developed based on studies conducted by Sun (2000), Yoon (2005) and Yoon and Hirvela (2004) in order to collect data on any attitude changes toward the vocabulary reference tools. The actual survey materials are also available in Appendix A.
The questionnaire data collection sessions were conducted twice, before and after the experiment. The pre-questionnaire data were collected when the subjects were introduced to and agreed to participate in the study. The post-questionnaire data were collected when the subjects submitted their final writing samples via email. The questionnaires for the control and experimental groups were identical, except that concordancer was replaced with thesaurus for the thesaurus group’s questionnaire, and vice versa. The questionnaire consisted of 42 items with the emphases on the following five categories: (1) English writing proficiency improvement, (2) reaction to the tool, (3) outside assignment usage, (4) areas of improvement using the tool and (5) receiving feedback.
The pre-questionnaire scores served as a baseline for comparison with the post-questionnaire scores. After the subjects completed the seven writing sessions, they filled out a post-questionnaire. The post-questionnaire items were the same as the pre-questionnaire items except for the modal of the verbs. The collected data were then used to investigate the attitude changes between the groups and within each group.
During the experiment, each subject was asked to write a total of seven expository or argumentative short essays based on the writing prompts. The length of each essay was expected to be 350-400 words. While writing the essays, subjects were not allowed to use vocabulary references other than the online thesaurus or concordancer, depending on the group to which they were assigned. Each writing session consisted of two parts: initial writing and revision. After the subjects submitted their initial writings, they received feedback in 3 to 5 days. The subjects then revised the initial writings based on the feedback and submitted their revised writings via email. For each initial and revision writing, the subjects were encouraged to use the assigned vocabulary reference tools. For the purpose of analysis, however, the subjects were not allowed to consult the vocabulary reference tools during the initial writing of the first and last topics. Throughout the experiment, the subjects used the tools 11 times. All of the communication between the subjects and the researcher was conducted via email.
The analyses of the research data followed both quantitative and qualitative research traditions. For the writing sample analysis, a corpus-based linguistic analysis was employed because it integrates the quantitative and qualitative approaches. Koller and Mautner (2004) described the three contributions of the corpus-based analysis to qualitative analysis. First, it exhaustively analyzes syntactic and semantic properties of words. Second, it helps raise questions as heuristic tools and draws researchers’ attention to phenomena. Third, it produces results in its own right. In short, word frequency lists of a text can be a source of quantitative analysis and concordance lines of collocation of certain words can be used for qualitative analysis. Corpus-based analysis can accomplish the goals of discourse analysis by describing and explaining social interaction and structure both quantitatively and qualitatively. For the survey analysis of attitude change, the survey items and open-ended questions were both quantitatively and qualitatively analyzed.
Two computer programs were used to analyze the writing samples and survey data. For the writing sample analysis, WordSmith Tools 5 (Scott, 2008) was used. The software consists of three programs used for text analysis: Concord, KeyWords and WordList. Concord creates a concordance from text files, while KeyWords creates a list of keywords within a text by comparing its word frequency list against that of a larger reference corpus. For the current analysis, the New York Times corpus, a sub-corpus of the American National Corpus (ANC) Second Release (Reppen, Ide, & Suderman, 2005), was used as a reference. KeyWords also looks up statistically calculated collocates and the contexts of the keywords. WordList, the third program, creates a word list from the texts that can be sorted by frequency or in alphabetical order. For the statistical analysis of the survey data, the Statistical Package for the Social Sciences (SPSS) 13 was used to administer the significance tests.
3.4.1 Writing discourse and lexical diversity
One of the purposes of the ESL writing analysis is to examine how the ESL students’ productive vocabulary knowledge changed after using the corpus-based vocabulary reference tool. Writing can be considered a form of discourse in which writers express their ideology in written form. This form tends to be considered a whole discourse, which in turn consists of small consistent fragments, such as words, phrases, sentences and paragraphs. The choice of these fragments affects the whole structure of the writing discourse. In ESL and EFL writing, therefore, appropriate word choice that is consistent with the writing discourse is also important. The writing discourse is a part of the language proficiency that ESL learners need to acquire in order to express their own arguments in a written format. For the same reason, it is possible to analyze a learner’s writing discourse by investigating specific words and their contexts.
One of the ways to investigate a written text is to examine the word frequency of the writing, e.g., the 100 most frequently used words. However, such data provide only a general overview (L. Flowerdew, 2008). Laufer and Nation (1999) explained that, most of the time, frequency lists provide only a limited explanation of the text:
The word the accounts for 7% of the running words in written text. The most frequent 10 words account for around 25% of the running words in spoken and written text. The most frequent 1000 words account for around 75% of the running words in formal texts. By contrast, the tenth 1000 most frequent words account for much less than 1% of the running words in a text (p. 35).
Many of the high-frequency words in a given frequency list tend to be function or grammatical words.
Function words hardly provide any information for interpreting the purpose or meaning of the text. Barnbrook (1996) also pointed out that because high-frequency words are usually grammatical words, such as articles, conjunctions and pronouns, a simple frequency list does not contribute to understanding the meaning of a text. The frequency list becomes much more meaningful once it is compared with similar lists constructed from other texts. For the current analysis, two sets of statistical analyses of text, type-token ratio and keyword analysis, were employed. Type-token ratio is a way of measuring lexical richness, or lexical variation, in a text (Engber, 1995). A token is an individual occurrence of any word form in a text, and a type is a distinct word form. Hypothetically, if a text consists of five words as in “I think I like it.”, five tokens and four types are present in the sentence because there are two occurrences of the pronoun I. Thus, the type-token ratio is 80%. If a text consists of five words as in “I think she did it.”, then the type-token ratio is 100%, meaning that this sentence is more lexically diverse than the previous one. Using this method, the lexical diversity of a text can be detected.
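The type-token ratio calculation can be expressed as a short Python sketch. The tokenization rule (lowercased alphabetic strings) is a simplifying assumption introduced here; WordSmith Tools applies its own tokenization settings, which are not reproduced.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text):
    """Number of distinct word forms (types) divided by running words (tokens)."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no tokens")
    return len(set(tokens)) / len(tokens)


print(type_token_ratio("I think I like it."))   # 4 types / 5 tokens = 0.8
print(type_token_ratio("I think she did it."))  # 5 types / 5 tokens = 1.0
```

Because longer texts inevitably repeat more words, the raw ratio tends to fall as text length grows, so it is most informative when the texts being compared are of similar length.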
While the type-token ratio provides an index of the lexical variety of a text, the keyness score provides useful information about the content of the text in a reliable way. A frequency list may not provide an accurate picture of the purpose of the writing because high-frequency words are usually function words, while content words, which actually tell what the text is about, are lower-frequency words. In order to alleviate this difficulty in analyzing the purpose of the text, the concepts of keyness and keywords have been introduced. According to Scott and Tribble (2006), “[k]eyness is a quality words may have in a given text or set of texts, suggesting that they are important, they reflect what the text is really about, avoiding trivia and insignificant detail” (p. 56).
Baker (2006) also described how keyness can be used effectively for comparing texts with different styles:
Keyness works particularly well when something is compared against something else. A keyword analysis can therefore be used to compare two (or more) sides of an argument or it could simply be used to compare the linguistic styles of different speakers (Baker, 2006, pp. 146-147).
Therefore, keyness is useful to extract the information on what a text is written about and how the text is written.
For the keyness scores of the current research, WordSmith Tools 5 performed log-likelihood ratio tests, which give each word in the corpora a p value. The p value indicates the probability that a word appears to be a keyword merely due to chance. WordSmith Tools 5 uses a default of p < .000001, which emphasizes the notion of selectivity rather than risk (Baker, 2006). Therefore, the keyness (or keyword) analysis enables researchers to investigate the nature of a text by singling out keywords. The keyword scores, i.e., keyness, are calculated by comparing the text against another text, which is called a reference corpus. A reference corpus is usually a very large and balanced corpus that can represent a genre. For the analysis of this research, the New York Times corpus of the American National Corpus was used. By comparing a text against a reference corpus, as Baker (2006) suggested, keyness can be used to understand the styles and arguments of texts. Keyness is, therefore, a matter of being statistically unusual relative to the norm.22 Regarding the keyness scores, the higher the score, the stronger the keyness of the word. Keyness has the following statistical properties:
Statistically, [keywords] are clearly outstanding even if they do not reflect importance and aboutness, so they must represent some other factor. All the algorithm can tell us is that these feature of [the play] stand out as being unusually frequent (Scott & Tribble, 2006, p. 60).
A [keyword] is simply a term for statistically significant lexical items … [s]imply being statistically significant is not in itself the important point of interest. That lies in the link between keywords and style (Culpeper, 2009, p. 32).
The keyness score can provide insightful ways of analyzing writing samples; however, it is the researcher’s responsibility to gather all the relevant information and draw conclusions from the text analysis.
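A log-likelihood keyness score of the kind described above can be sketched in Python as follows. The sketch assumes the common 2x2 contingency formulation of the log-likelihood ratio (the word versus all other words, in the study corpus versus the reference corpus) and a chi-square distribution with one degree of freedom for the p value; the exact computation inside WordSmith Tools may differ in detail, and the frequencies in the usage lines are invented for illustration.

```python
import math


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """G2 statistic for a 2x2 table: word vs. other words, study vs. reference."""
    observed = [
        [freq_study, size_study - freq_study],
        [freq_ref, size_ref - freq_ref],
    ]
    total = size_study + size_ref
    row_totals = [size_study, size_ref]
    col_totals = [freq_study + freq_ref, total - freq_study - freq_ref]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a cell with zero count contributes nothing
                expected = row_totals[i] * col_totals[j] / total
                g2 += o * math.log(o / expected)
    return max(0.0, 2.0 * g2)  # guard against tiny negative rounding errors


def p_value(g2):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(g2 / 2.0))


def is_keyword(freq_study, size_study, freq_ref, size_ref, alpha=1e-6):
    """A positive keyword is unusually frequent in the study corpus at level alpha."""
    overused = freq_study / size_study > freq_ref / size_ref
    g2 = log_likelihood(freq_study, size_study, freq_ref, size_ref)
    return overused and p_value(g2) < alpha


# Invented counts: a word occurring 120 times in an 8,000-word learner corpus
# and 300 times in a 1,000,000-word reference corpus.
print(log_likelihood(120, 8000, 300, 1_000_000))
print(is_keyword(120, 8000, 300, 1_000_000))
```

The default alpha of 1e-6 mirrors the p < .000001 threshold mentioned in the text. A word with the same relative frequency in both corpora receives a score of zero, which is why the choice of reference corpus shapes the resulting keyword list.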
Once keywords are selected from a text, it is useful to look at the words neighboring the keywords. One of the concerns that critical discourse analysis deals with is how particular discourses construct reality, social identities and social relationships. As an application of critical discourse analysis, Koller and Mautner (2004) suggested that a consistent aura of meaning, called the semantic prosody, with which a form is imbued by its collocates (Louw, 1993), can help unravel how discourses are constructed. Koller and Mautner (2004) exemplified semantic prosody with two content words, federal and federalism, and showed how these two words are used in different editorials in terms of good or bad prosody. However, semantic prosody in writing discourse can be found in many other cases, such as in function words or pronouns. For example, social identities and relations in writing can be found in the use of between A and B. The word between requires two components, usually noun equivalents connected with and. According to the corpus examples from the VLC Web Concordancer, the component before and stands for the writer’s identity and the one after and stands for the other party.
In a Hong Kong government report corpus (301,218 words), when between occurs with and linking country names, Hong Kong usually comes first and the other party comes second. Nam (2008a) reported similar patterns of between Korea and America in Korean EFL writers’ writing samples. Keeping the above points in mind, how language learners choose and use words correctly, especially content words, can be evidence of how effectively the learner delivers his or her arguments, descriptions or explanations, and of how the learner constructs himself or herself in the discourse.
A discussion has been taking place as to how discourse analysis interacts with corpus linguistic tools (Baker et al., 2008; Koller & Mautner, 2004; Stubbs, 1995, 1996, 2001). Researchers have argued that discourse can be analyzed by examining the frequency data of metaphorical expressions and lexical items on a specific topic, and by comparing word frequencies in one homogeneous topic with those in other topics. The frequency of the lexical words would give an idea of, or leads to, the purpose of the writing. For example, Stubbs (1995) examined 300,000 occurrences of the adjectives little, small, big and large, and found that they occur in a largely complementary distribution, with different uses and collocates. In particular, little has strong cultural connotations. The most frequent noun to co-occur with little is girl(s), and the most frequent adjective to co-occur with girl is little. The phrase little girl(s) is nearly twenty times as frequent as small girl(s), whereas little boy(s) is only twice as frequent as small boy(s). Little typically occurs in phrases such as charming little girl, while small typically occurs in formal phrases, such as relatively small amount.
Based on previous research, the following two implications can be applied to ESL writing discourse. First, frequent content words in the writing discourse convey the writer’s cultural stereotypes and connotations of the word in the target language (Nam, 2008a). Those frequent content words provide an explanation of what the writer intends to deliver in his or her writing. Second, the frequent use of a collocation can reveal the writer’s communicative competence and fluency in the target language because fluent language use depends on how much the writer has internalized such collocations or formulaic phrases (Nattinger & DeCarrico, 1992; Schmitt (Ed.), 2004). During the experiment, the subjects wrote short essays based on given writing topics. The use of the content words differed from topic to topic; however, content words in the first and last writings could possibly overlap as the topics were very similar.
In addition, the writings that received different treatments, i.e., those produced with either the concordancer or the thesaurus, may have resulted in different content word lists. As the frequency pattern of the content words (or keywords) changes, the use of their collocating content words would change to maintain the consistency of the writing. The analysis of frequency and collocation, such as keywords, is also important because it would show how the ESL students differentiate between word meanings after using one of the two vocabulary reference tools. The keywords and collocation data were cross-examined, compared and analyzed by subject group and assignment topic in order to reach a general conclusion about the corpus-based vocabulary reference tool. The results showed that the concordancer worked differently from the thesaurus in ESL writers’ vocabulary usage.
With the help of a corpus linguistic analysis program, WordSmith Tools 5, the following procedures were used to investigate the ESL writing samples: (1) identifying the keywords (a word frequency list was used to discover what types of words were frequently used in each of the writing topics); (2) identifying the collocates (setting the context, i.e., how many words to the left and right of the keywords are considered, helps the researcher to discover possible collocates of the keyword); (3) counting the joint frequency of the keywords and collocates (this procedure allows the researcher to render the general picture of how the keywords are used in each writing discourse); and (4) comparing the results against each other.
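The collocate identification and joint frequency counting described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the whitespace tokenization, the function name collocates, the default span of four words and the sample sentence are all hypothetical and do not reflect the actual WordSmith Tools 5 settings used in the study.

```python
from collections import Counter


def collocates(tokens, keyword, span=4):
    """Count the words that occur within `span` tokens to the left or
    right of each occurrence of `keyword` (the joint frequencies)."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != keyword:
            continue
        left = tokens[max(0, i - span):i]      # context window to the left
        right = tokens[i + 1:i + 1 + span]     # context window to the right
        counts.update(left + right)
    return counts


# Hypothetical sample text, lowercased and split on whitespace.
tokens = "the little girl saw a little dog".split()

word_frequencies = Counter(tokens)                    # word frequency list
joint = collocates(tokens, "little", span=1)          # collocates and joint frequencies
```

Running the same function over the writings of each group and topic would yield the collocate tables that are then compared against each other.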
3.4.2 Grammatical knowledge
The current research examined ESL writers’ grammatical knowledge in regard to their correct usage of prepositions, adjectives and nouns. The initial writings of the control and experimental groups were evaluated. In order to detect vocabulary errors, mutual information (MI) scores from the reference corpus, the New York Times corpus, were used. The MI scores, according to Barnbrook (1996), represent “the amount of information that each of the two words provide about each other by comparing the observed probability of the co-occurrence with the expected probability, assuming that they were distributed randomly” (p. 98). Once the MI scores for all pairs of consecutive words in the New York Times corpus are calculated, the scores can serve as a reference for appropriate word usage.
WordSmith Tools 5 calculated the MI scores for the words in The New York Times corpus. The results were then loaded onto a spreadsheet in order to search for possible word combinations.
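As an illustration of how such a reference list can be built, the following Python sketch computes MI scores for consecutive word pairs as the base-2 logarithm of the observed co-occurrence probability divided by the probability expected under random distribution. It is a simplified sketch, not the WordSmith Tools 5 procedure: the function name bigram_mi is hypothetical, and the total token count is used as the sample size for both single words and word pairs.

```python
import math
from collections import Counter


def bigram_mi(tokens):
    """Return a dict mapping each consecutive word pair to its MI score:
    log2( f(w1, w2) * N / (f(w1) * f(w2)) ), where N is the token count."""
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (w1, w2), joint in bigrams.items():
        scores[(w1, w2)] = math.log2(joint * n / (unigrams[w1] * unigrams[w2]))
    return scores


# Hypothetical miniature "reference corpus", for illustration only.
reference_tokens = ["a", "b", "a", "b"]
mi_list = bigram_mi(reference_tokens)

# A pair attested in the reference list has a score; an unattested pair
# returns None and would need to be checked by other means.
attested = mi_list.get(("a", "b"))
unattested = mi_list.get(("b", "b"))
```

A lookup of this kind plays the role of the spreadsheet search: each word combination from a writing sample is checked against the list derived from the reference corpus.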
Once all of the errors in adjective and preposition usage were counted in each initial writing, the averages were taken and compared between the groups and within each group. The averaged scores were used to investigate which component of productive vocabulary knowledge was most affected by the corpus-based vocabulary reference tool. The mean differences were compared using paired-sample t-tests.
The learners’ attitude changes toward corpus-based writing instruction were investigated using four sets of survey data: pre/post questionnaire and control/experimental group questionnaire. With the four sets of survey data, it was possible to measure the attitude changes toward each vocabulary reference tool and compare the ESL writers’ attitudes about the tools. In each survey, the average score for each questionnaire item was calculated and compared. In order to compare the mean differences among the questionnaire item scores, a paired-sample t-test was administered for the 42 questionnaire items. The comparisons were made between the groups and within each group so that the results could compare the attitude changes before and after the writing sessions and between the vocabulary reference tools.
The main purpose of the current research presented in this dissertation is to provide a better understanding of how corpus-based language learning can help improve the kind of vocabulary knowledge that college-level ESL learners can produce in writing. Based on the literature review on vocabulary learning and corpus-based language learning, as well as the proposed methodology outlined in the previous chapters, three research questions have been considered:
1. How does the use of an online concordancer and an online thesaurus influence the quality of writing as indicated by lexical variation?
2. To what extent does corpus-based writing instruction change the learners’ grammatical knowledge of adjective and preposition usage?
3. What are the learners’ attitudes toward the corpus-based writing instruction?
Sets of pre- and post-test style experiments were administered to answer these research questions. Two groups of Korean undergraduate students at a large Midwestern American public university participated in the research, with a total of 21 participants. The subjects were randomly assigned to either the experimental group (n=10), in which the subjects used the online concordancer as a vocabulary reference tool, or to the control group (n=11), in which the subjects used the online thesaurus. To answer the first research question, descriptive statistics of vocabulary usage in the students’ writings were observed and analyzed, focusing on the changes and differences in lexical usage displayed in the students’ writings after they used the vocabulary reference tools. After the descriptive statistics were analyzed, the contexts of selected keywords were examined to compare qualitative differences among the writing samples.
A keyword list was generated statistically by comparing the groups’ writing samples against a reference corpus, the New York Times Corpus. This process eliminated high-frequency function words and generated a list of keywords, which consisted mostly of content words. The concordances of the writing excerpts were generated from the statistically generated keywords and their closely neighboring or associated words. This qualitative examination was designed to reveal the similarities and differences of the writings generated by the students, before and after they used the vocabulary reference tools. Therefore, the ESL students’ writings were evaluated both quantitatively and qualitatively in a four-way comparison: by vocabulary reference tool, and by writing before and after the students used the reference tools.
To answer the second research question about potential changes in the subjects’ grammatical knowledge, inappropriate preposition and adjective usages were counted based on their usages in the New York Times Corpus; all the instances of preposition and adjective usage were collected and analyzed. Mutual Information (MI) scores, which express the strength of the relationship between two words, were calculated for every pair of words used together in the New York Times Corpus in order to identify how prepositions and adjectives are appropriately used in the reference corpus. An MI score list of the New York Times Corpus was then created. If an actual usage in the writing samples did not match the preposition and adjective usages present in the New York Times Corpus, it was considered an inappropriate usage. For the words that were not included in the New York Times MI score list, a native speaker of English who specializes in ESL writing checked the appropriateness of the usages.
Statistical significance tests were carried out to compare the error rates within and between the groups.
Careful statistical consideration was given to the evaluation of the writing samples through the type-token ratio analysis and the grammatical error analysis. As stated in the aforementioned research questions, the main purpose of the current research is to investigate how ESL writers benefit from the concordancer, a corpus-based vocabulary reference tool, compared to another vocabulary reference tool, the online thesaurus, in producing vocabulary in their writing. To test the differences in writing quality between the two fixed-effect groups, one using the concordancer and the other using the thesaurus, a statistical model was needed that could test the differences in lexical diversity and grammatical error between the two groups (i.e., the concordancer group and the thesaurus group) across multiple levels (i.e., seven topic writings).
First, as a base statistical model, a repeated measures ANOVA model was adopted. The model was called for because of the mixed nature of the current study, which tested for mean differences between two groups while the subjects were under repeated measures. Then, the control of an extraneous variable, i.e., a covariate, was also considered in order to explain variation in the dependent variables. Therefore, a repeated measures ANCOVA model was employed to analyze the effects of using the concordancer and the thesaurus on the ESL learners’ productive vocabulary knowledge in writing within a certain period of time. The statistical model made it possible to examine and compare the related means of lexical diversity and grammatical error for the concordancer group and the thesaurus group.
Covariates were introduced in the design for the statistical control of extraneous variables. There are two types of variables in the study: (1) a between-subjects variable of the concordancer group and the thesaurus group; and (2) a within-subjects variable of seven topic writing engagements with the vocabulary reference tools. The results of the comparisons between the groups and within the groups were expected to answer how ESL learners benefit from corpus-based language learning in acquiring productive vocabulary knowledge in writing. For the statistical analysis, Statistical Package for the Social Sciences (SPSS 13) was used.
The null hypotheses of the research are that: (1) there are no differences in lexical diversity and grammatical error between the groups and between the interventions of the vocabulary reference tools; and (2) there is no interaction between the vocabulary reference tools for ESL writing performance (i.e., in increasing lexical diversity and reducing grammatical error) and the number of writing activities with the vocabulary reference tools (Topic 1 writing through Topic 7 writing) as they relate to the type-token ratio and the number of grammatical errors per topic writing.
The attitude changes toward the vocabulary reference tools were measured quantitatively. A set of pre-/post-questionnaires was administered to identify attitudes toward each tool before use, and changes in attitude which might have developed after using them. The ESL learners’ attitude differences before and after using the concordancer and the thesaurus were tested through paired-samples t-tests. The questionnaire results were also analyzed within and between the groups.
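For readers unfamiliar with the paired-samples t-test, the statistic can be computed by hand as the mean of the paired differences divided by its standard error. The Python sketch below is illustrative only: the study itself used SPSS, the function name paired_t is hypothetical, and the pre- and post-scores are invented numbers, not data from the questionnaires.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Return (t, degrees of freedom) for a paired-samples t-test."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    # Difference of each pair (post-score minus pre-score).
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Invented scores for four respondents on one questionnaire item.
pre_scores = [1, 2, 3, 4]
post_scores = [2, 4, 5, 5]
t_value, df = paired_t(pre_scores, post_scores)
```

The resulting t value is then compared with the t distribution for the given degrees of freedom to obtain the significance level.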
4.1 Writing quality evaluation
4.1.1 Lexical diversity: type-token ratio
A total of 147 writing samples were collected and examined for the research. Since there were 11 subjects in the control group (the thesaurus group), each writing topic yielded 11 writing samples from that group. For the experimental group (the concordancer group), each writing topic yielded 10 writing samples, because there were 10 subjects in the group.
To provide an overview of the vocabulary usage in the writing samples, the concept of a type-token ratio was introduced. In a word frequency list, tokens are individual occurrences of words. The number of tokens which appear in a text is equivalent to the total number of words. Types are the unique word forms in the text, so the number of types is the number of different words rather than the total number of words. Therefore, as explained briefly in the previous chapter, the type-token ratio, “the average number of tokens per type” (Baker, 2006, p. 52) or the “ratio in percent [which exists] between the different words in a text and the total number of words” (Laufer & Nation, 1995, p. 310), can be calculated as the number of types divided by the number of tokens. This method is a way of investigating the lexical richness, or the lexical variation, of the subjects’ word usage in their writings. The type-token ratio for each writing sample was calculated with the corpus analysis software WordSmith Tools 5 (Scott, 2008).
According to Baker (2006), the type-token ratio is still useful for characterizing lexical diversity in relatively small text files under 5,000 words. However, the type-token ratio has been criticized for its simplicity, or for its potential to mislead because of the inverse relationship between essay length and the number of different lexical items in an essay; this can happen because the ratio is easily affected by differences in text length (Engber, 1995; Laufer & Nation, 1995). Engber’s (1995) and Laufer and Nation’s (1995) claims are important in corpus linguistic text analysis because the weakness of the type-token ratio for evaluating lexical diversity lies in its sensitivity to the length of the text; as Baayen (2008) states, “the number of different word types depends on the number of tokens” (p. 204). Therefore, to analyze the type-token ratio of the ESL writing samples, the text size of the baseline needs to be treated as a covariate.
Since the type-token ratio is simply the number of types divided by the number of tokens, expressed as a percentage, a text with a low type-token ratio contains a great deal of word repetition, whereas a high type-token ratio suggests more diverse usage of words in a text. Along with the type-token ratio, the number of sentences and the average number of words in a sentence are also expected to provide evidence of writing quality (Howerton et al., 1977).
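The calculation can be made concrete with a short Python sketch. The tokenization rule (lowercased runs of letters and apostrophes) and the function name type_token_ratio are assumptions made for illustration; WordSmith Tools 5 applies its own tokenization settings, so its figures may differ.

```python
import re


def type_token_ratio(text):
    """Return the type-token ratio of `text` as a percentage:
    100 * (number of unique word forms) / (total number of words)."""
    # Assumed tokenization: lowercase words made of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    types = set(tokens)
    return 100.0 * len(types) / len(tokens)


# Five tokens, four types ("the" is repeated).
varied = type_token_ratio("The cat saw the dog")
# Four tokens, one type: heavy repetition gives a low ratio.
repetitive = type_token_ratio("a a a a")
```

Because the ratio tends to fall as a text grows longer, values from essays of different lengths are not directly comparable, which is why text size is treated as a covariate in the analysis.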