A Comparison of Lexical Richness in L2 Written Productions

Research has shown that vocabulary acquisition is one of the main challenges facing a language learner, and that this is even more true of productive vocabulary. This study therefore investigates the lexical richness of two groups of EFL learners through a content analysis of 139 essays by entry-level university students and 140 essays by third-year university students studying at an English-medium university. Both groups had scored at least IELTS band 6 or 550 in TOEFL. Specifically, the study examines whether the two groups of essays differ in lexical richness as measured by the type-token ratio, and whether they differ in the use of words from the 1,000 and 2,000 word levels, the Academic Word List (AWL) and the not-in-the-list category. The RANGE programme developed by Nation, Heatley & Coxhead (2002) was used to carry out the analysis. The findings showed statistically significant differences between the two groups of essays in the use of the 1,000 and 2,000 word levels and the AWL. No statistically significant difference was found in the not-in-the-list category. This research has pedagogical implications for the teaching of vocabulary in the language classroom, with a specific focus on the development of lexical richness in EFL learners' written production.
Keywords: Lexical Frequency Profile, Productive Vocabulary, Lexical Richness


Introduction
The overall writing progress of students can be measured by the lexical richness of their texts. The statistical measures used to gauge it are variously referred to as lexical density, lexical diversity or lexical richness (Gregori-Signes & Clavel-Arroitia, 2015). The lexical richness of a text reflects how many different words are used in it, while lexical density measures the proportion of lexical items (i.e., nouns, verbs, adjectives and some adverbs) in the text. The term lexical density was originally coined by Ure (1971) to indicate the number of lexical words in a text.
To function independently, a language learner needs to be able to use at least the 2,000 most frequent words (Schmitt, 2000). This is also evident in the findings of Engku Haliza et al. (2013), which show that the receptive vocabulary size of these learners on entry to the faculties is, on average, at the 2,000 word level. Cobb (1995) also claims that language courses do not normally aim for learners to acquire more than a few thousand words, as it is assumed that learners will continue to learn new words on their own. However, little is known about the extent of vocabulary acquisition, or about the support given to learners to ensure that their vocabulary continues to grow throughout their years of academic study.
Research on vocabulary growth has mainly been conducted on receptive vocabulary, as in the studies of Milton and Meara (1995), Cobb and Horst (2000) and Schmitt and Meara (1997). One exception is Ozturk (2015), who attempted to measure the growth of both receptive and productive vocabulary using Nation's (2001) Vocabulary Levels Test. Unlike Ozturk's and most previous studies, the current study focuses only on the measurement of productive vocabulary. The emphasis on productive vocabulary is essential, as a learner's communicative competence is largely manifested through their speech and written work (Laufer, 2005). The focus on writing as a form of productive vocabulary use in the context of this study further contributes to the existing literature on lexical richness and vocabulary growth.
The objective of this study is to compare the lexical richness of pre-sessional students with that of advanced students of the International Islamic University Malaysia (IIUM). The specific objectives of this study are to determine:
• the difference between the type-token ratio in the pre-sessional students' essays and that in the advanced level students' essays;
• the difference between the use of the 2000-word level in the pre-sessional students' essays and that in the advanced level students' essays; and
• the difference between the use of academic words in the pre-sessional students' essays and that in the advanced level students' essays.

Literature Review
The present study is concerned with one area of lexical richness, namely lexical diversity or variation, which is measurable using the Lexical Frequency Profile (LFP), now renamed RANGE (Nation, Heatley & Coxhead, 2002). The LFP, developed by Laufer and Nation in 1995, is a programme that measures lexical richness. The assumption behind the LFP is that proficient learners use more words from the higher vocabulary levels, while less proficient learners tend to use words from the lower vocabulary levels (Chen, 2015). This is based on the knowledge that L2 learners acquire vocabulary incrementally, with lower level words typically acquired faster than higher level words. Furthermore, research has shown that sample writings that contain simple vocabulary (lower level words) are typically rated low (Cobb, 2003; Hinkel, 2003), while sample writings rated high typically show greater lexical richness (Laufer & Nation, 1995).
Earlier studies have shown that there is a link between lexical richness and the overall quality of essays. Linnarud (1986), for example, in one of the earlier studies, measured the lexical richness of 42 Swedish learners of English and 12 native English speakers. Her findings showed a significant moderate correlation (0.47) between the use of unique words and essay quality. Several other studies, such as those of Jarvis (2002), Engber (1995) and Li (1997), similarly indicate that essay quality can be discriminated on the basis of lexical richness. In other words, these findings suggest that lexical richness in learners' writing is a moderately good predictor of overall text quality. Conversely, research findings have also revealed that writing samples that contain simple vocabulary do not receive high ratings (Laufer & Nation, 1995).
Several studies have also attempted to correlate the vocabulary used with lexical richness using sample essays from high-stakes proficiency examinations. Douglas's (2010) study of a large-scale Canadian test of university entrance-level writing competence, for instance, found moderate to strong correlations between independent measures of lexical breadth of knowledge and overall final assessments. In addition, Banerjee, Franceschina, and Smith (2007) carried out a similar study on the academic writing module of the International English Language Testing System (IELTS). Their findings suggest a positive relationship between the sophistication and judgement of lexical output and IELTS band levels, where lexical sophistication is measured by the percentage of low-frequency words.

Lexical frequency profile (LFP)
The LFP, which has been renamed RANGE, is a programme that takes a sample essay as raw input. The programme then produces an output that profiles the lexical content of the text across various frequency bands. These frequency bands are categorised according to the vocabulary levels of the 1000 most frequent word families (West, 1957), the next 1000 most frequent word families (West, 1957), and the Academic Wordlist (Coxhead, 2000), which contains the 570 most frequent word families drawn from academic texts. Words that do not belong to any of these categories are grouped under the 'not-in-the-lists' category. The LFP has been useful as a measure of lexical richness, as a proficient learner has been shown to be able to use more lower frequency words than a less proficient learner. Additionally, it has been shown that a more proficient learner uses a wider array of words than a less proficient learner (Chen, 2015). The LFP thus allows for an analysis of the free productive vocabulary produced by a learner. An added advantage of the LFP is that it is simple to run, cost effective, and able to produce information instantly. The programme has also been made available on the website of Tom Cobb (https://www.lextutor.ca/range) as well as on Paul Nation's website (https://www.victoria.ac.nz/lals/resources/range), which has eased access for many researchers.
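The profiling step described above can be illustrated with a minimal Python sketch. It is not the RANGE programme itself: the three word lists are tiny hypothetical stand-ins for the first 1,000, second 1,000 and AWL lists, the function names are invented for illustration, and matching is by exact word form rather than by word family, an assumption made only to keep the example short.

```python
import re
from collections import Counter

# Hypothetical miniature word lists; the real lists hold about 1,000
# word families each (570 for the AWL).
BANDS = {
    "first_1000": {"the", "students", "use", "of", "a", "to", "and", "new"},
    "second_1000": {"improve", "practise"},
    "awl": {"analyse", "data", "research"},
}

def tokenize(text):
    """Lower-case the text and return its alphabetic word forms."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def frequency_profile(text):
    """Return the percentage of tokens falling in each band.

    Tokens found in none of the lists are counted as 'not_in_lists'.
    """
    tokens = tokenize(text)
    counts = Counter()
    for token in tokens:
        for band, words in BANDS.items():
            if token in words:
                counts[band] += 1
                break
        else:
            # No list contained this token.
            counts["not_in_lists"] += 1
    total = len(tokens)
    labels = list(BANDS) + ["not_in_lists"]
    return {label: 100 * counts[label] / total if total else 0.0 for label in labels}

profile = frequency_profile("The students analyse new data to improve the corpus")
for band, pct in profile.items():
    print(f"{band}: {pct:.1f}%")
```

In this toy sentence, five of the nine tokens fall in the first band, one in the second, two in the AWL stand-in and one ("corpus") in none of the lists. A fuller implementation would group inflected and derived forms into word families before matching.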
The use of the LFP, however, is not without its critics. Two main criticisms are its inability to distinguish individual differences and the instability of its results when essay lengths are inconsistent (Smith, 2004). Meara (2005) also highlights its inability to differentiate formulaic sequences in written productions. Despite these criticisms, several studies have been carried out that take its weaknesses into account. Among them is that of Nazli Azodi, Fatemah Karimi & Ramin Vaezi (2014); most of these researchers see the potential of the LFP as a pedagogical rather than an assessment tool. The strength of the LFP is that it is the most sensitive measurement tool of productive vocabulary, since it is based on real corpora and word lists such as the GSL, AWL, BNC and COCA. It also allows researchers to insert a corpus of their choice.
The LFP is today widely accepted as the best available programme for carrying out a standard analysis of lexical richness. Previous findings suggest that overall text quality can be predicted through an analysis of the lexical richness of the text. Lexical richness is manifested in the sophistication as well as the range of an L2 learner's productive vocabulary (Wolfe-Quintero, Inagaki, & Kim, 1998).

Participants and setting
The participants of this study consisted of two groups of students. The first group comprised pre-sessional students who had undergone an intensive English programme at the Centre for Foundation Studies of the International Islamic University Malaysia. The second group comprised 100 final year students (henceforth referred to as post-sessional) who had gone through university academic programmes with English as the medium of instruction and were about to graduate. Both groups had met the minimum entry requirement of the university, which is the equivalent of IELTS band 6 or TOEFL 550.
The LFP software shows the relative proportion of words from different frequency levels in a written text. It calculates the proportion of words that belong to the following four levels or lists: the first 1000 most frequent words, the second 1000 most frequent words, the AWL words, and a fourth level called the 'not-in-the-lists' word list, consisting of words not contained in any of the other levels. Proper nouns and words misspelled beyond recognition were deleted, while less severe misspellings were corrected. Where possible, the aim was to limit human intervention that could corrupt the data.

Procedure
Based on the assumption that a learner's productive vocabulary grows by 500 words a year, this study focuses on students who have yet to enter their degree programmes and students who are in the final year of their degree programmes. Both groups of students were given a task to write approximately 300 words on a general topic, and a vocabulary profile was then established for each participant. No aids such as dictionaries or digital devices were allowed, and the students were not allowed to consult one another.
These 300 written productions were then digitized using a word-processing programme and saved in .txt format. As part of the data cleansing process, proper nouns were eliminated and minor spelling errors were corrected to enable Range to recognize the words. The texts were then inserted into Range, and the results were summarized in terms of the type/token ratio (TTR) and the percentage of words in the text that fall into the first thousand most common words, the second thousand, the Academic Word List (AWL), and the not-in-the-list category. A token is counted each time a word form occurs in a text, while a type is a word form that is counted only once (Cobb, 2004).
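The distinction between tokens and types can be made concrete with a short Python sketch. The function name and the sample sentence are hypothetical, and the tokenisation rule (lower-cased alphabetic word forms) is an assumption for illustration rather than the rule Range applies.

```python
import re

def type_token_ratio(text):
    """Return (number of tokens, number of types, TTR) for a text.

    Word forms are compared case-insensitively, so 'The' and 'the'
    count as one type.
    """
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    types = set(tokens)
    ttr = len(types) / len(tokens) if tokens else 0.0
    return len(tokens), len(types), ttr

n_tokens, n_types, ttr = type_token_ratio("The cat sat on the mat and the dog sat too")
print(n_tokens, n_types, round(ttr, 2))  # prints: 11 8 0.73
```

Because repeated word forms accumulate as a text grows, the ratio tends to fall with text length, which is one reason for comparing texts of similar length.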
Taking into consideration the criticisms of the LFP mentioned earlier, this study focuses on comparisons between groups rather than between individuals. Precautions were also taken to ensure that the length of all written productions was kept consistent at approximately 300 words and that all were of the same genre.

Results and Findings

Research question 1
Research question 1 sought to determine the difference between the type-token ratio in the pre-sessional students' essays and that in the advanced level students' essays. The results of a Mann-Whitney U test (Table 1) showed a statistically significant mean rank difference in the percentage of one thousand level words employed, U = 1817.00, p < .001, between pre-sessional (MR = 63.21) and post-sessional students (MR = 92.95). Hence, the data suggest that post-sessional students employ a greater percentage of one thousand level words than their pre-sessional counterparts. According to Nation (2001), the first 1,000 words make up 77% of the running words of most academic texts. Our data show a tendency for the post-sessional students to over-depend on the 1,000 most frequent words, whereas the pre-sessional students use fewer of them.
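For readers unfamiliar with the statistics reported here, the following Python sketch shows how a U statistic and the two mean ranks (MR) are obtained from two samples. The data are invented for illustration and are not the study's data, the function names are hypothetical, and the sketch assumes U is reported as the smaller of the two groups' U values. It does not compute the p-value.

```python
def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_u(group_a, group_b):
    """Return (U, mean rank of A, mean rank of B); U is the smaller U value."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])
    rank_sum_b = sum(ranks[n_a:])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b), rank_sum_a / n_a, rank_sum_b / n_b

# Invented percentages of first-1,000 words in five essays per group.
pre = [70.1, 72.4, 68.9, 71.0, 69.5]
post = [74.2, 73.8, 72.4, 76.0, 75.1]
u, mr_pre, mr_post = mann_whitney_u(pre, post)
print(u, mr_pre, mr_post)
```

A higher mean rank for one group indicates that its values tend to sit higher in the pooled ordering, which is how the MR figures in this section are read.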

Research question 2
Research question 2 sought to determine the difference between the use of the 2000-word level in the pre-sessional students' essays and that in the advanced level students' essays. The results of a Mann-Whitney U test (Table 2) showed a statistically significant mean rank difference in the percentage of two thousand level words employed, U = 1176.50, p < .001, between pre-sessional (MR = 99.79) and post-sessional students (MR = 53.40). Hence, the data suggest that pre-sessional students employ a greater percentage of two thousand level words than their post-sessional counterparts. A further Mann-Whitney U test showed a statistically significant mean rank difference in the percentage of words employed from the third list, the AWL, U = 1948.50, p < .001, between pre-sessional (MR = 64.86) and post-sessional students (MR = 91.17). The data suggest that post-sessional students employ a greater percentage of AWL words than their pre-sessional counterparts. This is in line with the findings of Laufer (1994, as cited in Nation, 2001, p. 179), who showed that when learners have continuous contact with English, the proportion of words they use from the first 2,000 decreases and the proportion of words from the AWL increases.

Research question 3
Research question 3 sought to determine the difference between the use of not-in-the-list words in the pre-sessional students' essays and that in the advanced level students' essays. The results of a Mann-Whitney U test showed no statistically significant mean rank difference in the use of not-in-the-list words, U = 2953.50, p > .05, between pre-sessional (MR = 77.58) and post-sessional students (MR = 77.41).
Hence, the data showed that the use of not-in-the-list words is almost the same for both groups of students.

Conclusion
Many researchers, for instance Laufer and Nation (1995), have found the LFP to be a useful tool for curriculum design purposes. However, the main purpose of the present study was to provide automatic feedback to our learners on the quality of the texts they submitted. More specifically, the feedback is meant to draw learners' attention to the types of vocabulary they tend to use, the repetitive use of some levels of vocabulary, and other features that might affect the quality of the essays produced. Nation (2001, p. 186) recommends the use of frequency-based measures, of which the LFP is one, to provide feedback to learners on their vocabulary use from the perspective of accuracy, clarity and liveliness. In cases where teachers have to monitor large numbers of students, as in the current study, technologies that provide automatic feedback would assist teachers in guiding learners in their vocabulary development. This also trains learners to track their own free productive vocabulary development, which contributes to learner autonomy.
Nation also recommends that teachers provide feedback on individual pieces of writing based on the types of words learners use. He further states that it is important for learners intending to pursue university education in English to have productive mastery of the AWL, and that time invested in learning these words is time well spent, as one of the indicators of lexical richness is the ability to use low frequency words. This ability is one of the essential indicators of academic success.