Rogers, James M. 2016. What are the collocational exemplars of high-frequency English vocabulary? On identifying MWUs most representative of high-frequency, lemmatized concgrams. PhD Thesis, Doctor of Philosophy. University of Southern Queensland.

Digital Object Identifier (DOI): https://doi.org/10.26192/5bf5ff14ed350

Collocations, simply defined, are words that have a high frequency of co-occurrence (Biber et al., 1999; Shin, 2006). Collocational fluency is an essential aspect of communicating in and comprehending a second language in a native-like fashion. However, second language learners of English struggle to attain such fluency because it receives little focus in the classroom and in ESL resources. This stems from the lack of a large-scale resource that identifies which collocations to teach to help learners master high-frequency English. So, although a large number of researchers agree on the importance of collocational fluency and of focusing on high-frequency collocations directly, learners, teachers and materials writers lack guidance as to which items to focus on. Such a resource is not available because research that takes into account all the important aspects of identifying collocations noted by previous researchers has yet to be carried out on a large scale. Therefore, this thesis set out to accomplish such a task. The goal was to create a methodology that would result in a practical resource identifying the multi-word units most representative of high-frequency collocations of high-frequency lemmas of English, and which of these items would be most useful for Japanese learners to study. It aimed to identify such items by collecting and analyzing corpus data with the help of eight native English-speaking university teachers in Japan who teach English as a second language, two native English-speaking junior high school teachers in Japan who teach English as a second language, five native Japanese translators with native-like ability in English, one native English-speaking university professor who teaches English as a second language and has extensive knowledge of developing concordance software, and one Romanian translator with native-like ability in both English and Japanese.
Once the items were identified, Japanese university freshmen were tested on their knowledge of them. This study took a corpus linguistics approach, working with data from the Corpus of Contemporary American English (COCA), to identify high-frequency collocations and the multi-word units they most commonly occur in. A frequency cut-off was identified which resulted in approximately 11,000 multi-word units that consist of only approximately 3,000 word families, of which the vast majority are high-frequency. Corpus dispersion and chronological data were deemed unreliable for determining whether or not the items selected had general usefulness over a variety of genres and throughout time, and time-consuming manual analysis for general usefulness was deemed essential. This was because, when relying on this study’s data analysis alone, at certain parameters items that native speakers deemed worthy of direct instruction were flagged as having unbalanced dispersion, while at other parameters items deemed unworthy of direct instruction were shown to have balanced dispersion. Also, consideration for colligation was found to improve only a small percentage of items, and while useful for improving the quality of data, the process was found to be extremely complex and time-consuming due to the lack of an established methodology and dedicated software. Expanding multi-word units beyond their core was found to be an essential step in that native speakers opted to do this for over half of the items identified. For example, concordance data identified equal access as the most frequent multi-word unit that the two lemmas equal/access occur in (the core unit), but the native speaker opted instead for the next most common multi-word unit (equal access to) as the unit that learners should study directly.
Semantic transparency analysis, intended to help select only items that are semantically opaque and thus deserve more study time, was not fruitful since the majority of items identified were considered to be semantically transparent. In contrast, L1-L2 congruency was found to be a very important criterion to consider, with half of the items identified being considered incongruent to an extent and thus deserving more study time. Furthermore, native speaker intuition was found to be extremely reliable with regard to creating context using mostly high-frequency vocabulary. Out of 130,000 tokens of example sentence context created, the added content only reduced the percentage of tokens in the high-frequency realm (3,000 word families) by 0.92 percent. Confirming this was essential: if native speakers' intuition could be relied upon to create context using mostly high-frequency vocabulary, it would help avoid adding to the learning burden. Finally, university students’ knowledge of a balanced selection of the items, chosen with consideration for frequency and L1-L2 congruency, was found to be quite low overall, highlighting the need for increased focus on the list in general. This study thus filled a major gap in the research in that it resulted in a list of items which can be utilized to help create resources or studied directly to help improve collocational fluency. A variety of steps were taken to create this resource, which helped highlight the value, or lack thereof, of each of these steps in achieving this study’s goal. Therefore, this study should be considered a valuable contribution towards research which aims to help second language learners achieve collocational fluency.

Keywords: collocational fluency; English as a second language; Japanese
ANZSRC Field of Research 2020: 470306. English as a second language
Byline Affiliations: Faculty of Business, Education, Law and Arts
