City University of Hong Kong
Department of Chinese, Translation and Linguistics
Research Degree Forum
Age Tagging and Word Frequency for Learner’s Dictionaries
Mr. LI Hanhong
PhD candidate, Department of Chinese, Translation and Linguistics, City University of Hong Kong
Date: 12 Nov 2009, Thursday
Time: 4:30 - 5:30pm
Venue: Y7302 (7/F, Yellow Zone), Academic Building, CityU
The use of corpora as a source of word frequency information is unquestioned in contemporary lexicography, particularly in learner’s dictionaries. A review of current practice shows that word frequency information is used for entry selection, sense ranking, and collocation identification, as well as for choosing the defining vocabulary. However, the age information in linguistic corpora has not been adequately highlighted or exploited. Early experiments have demonstrated that word retrieval from long-term memory is influenced much more by age of acquisition than by word frequency. EFL learners therefore need to know not only which words are frequent but also which words native speakers tend to use at different ages. Core words are not simply those with high frequency but those that are also evenly distributed across the age groups recorded in corpora. If learner’s dictionaries incorporate a word profile based on frequency and its distribution across age groups, it will be of much help for English learning and teaching as well as for research on core vocabulary.
Our research makes use of the age group information in the British National Corpus XML Edition (BNC XML, 2007). The age group information for the spoken part of BNC XML is tagged only indirectly. In order to extract word units by age group from the spoken BNC XML, we replace its utterance tagging with a more detailed, uniform pattern, which makes it much easier to collect data for further analysis.
With this modification, we find that higher coverage can be achieved when core words are selected by the combined parameters of word frequency and distribution across age groups rather than by raw frequency alone. Moreover, younger age groups rely more on core words in their daily communication than adults do. Speakers across the different age groups use more of the core words selected on a frequency-age basis than of the words in the Longman or Oxford defining vocabularies, which are selected on a frequency or frequency-range basis.
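The idea of selecting core words by frequency combined with distribution across age groups can be illustrated with a minimal sketch. The abstract does not give the actual scoring formula, so everything here is an assumption made for illustration: the function names `core_words` and `coverage`, the evenness measure (lowest relative frequency divided by highest), the way it is multiplied with frequency, and the toy data with its age-group labels.

```python
from collections import Counter


def core_words(tokens_by_age, top_n):
    """Rank words by relative frequency weighted by evenness across age groups.

    tokens_by_age maps an age-group label to a non-empty list of word tokens.
    The scoring below is a hypothetical stand-in, not the method of the talk.
    """
    groups = list(tokens_by_age)
    counts = {g: Counter(tokens_by_age[g]) for g in groups}
    sizes = {g: sum(counts[g].values()) for g in groups}
    vocab = set().union(*counts.values())

    scores = {}
    for word in vocab:
        # Relative frequency of the word within each age group.
        rel = [counts[g][word] / sizes[g] for g in groups]
        # Evenness is 1.0 when all groups use the word equally often
        # and 0.0 when at least one group never uses it.
        evenness = min(rel) / max(rel)
        scores[word] = sum(rel) * evenness

    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(vocab, key=lambda w: (-scores[w], w))[:top_n]


def coverage(tokens, word_list):
    """Return the share of tokens that belong to word_list."""
    words = set(word_list)
    return sum(1 for t in tokens if t in words) / len(tokens)


# Toy data only; the labels and tokens are invented for this example.
sample = {
    "0-14": ["the", "cat", "the", "toy"],
    "15-24": ["the", "cat", "exam", "exam"],
    "25+": ["the", "cat", "tax", "tax"],
}
core = core_words(sample, 2)
print(core)                            # ['cat', 'the']
print(coverage(sample["0-14"], core))  # 0.75
```

In this toy data "exam" and "tax" are frequent within one group but absent from the others, so they score zero, while "cat" and "the" occur in every group and are kept. A ranking by raw frequency alone would not make that distinction.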
Mr. LI Hanhong is currently a PhD candidate in the Department of Chinese, Translation and Linguistics. His research interests include corpus linguistics, lexicography, second language acquisition (SLA) and core vocabulary research.
~ CTL Staff and Research Degree Students only ~