Topic: Research Degree Forum: "Use of Terms and Term-Related Units as Feature Sets for Automatic Text Classification"
Posted - 10/09/2010 : 12:46:46
Department of Chinese, Translation and Linguistics
Research Degree Forum
Use of Terms and Term-Related Units as Feature Sets for Automatic Text Classification
Ms. CAO Jing
PhD candidate, Department of Chinese, Translation and Linguistics, City University of Hong Kong
Date: 15 September 2010, Wednesday
Time: 3:30 - 4:30 pm
Venue: B7603 (7/F, Blue Zone), Academic Building, CityU
The current study investigates how terminologically informed features contribute to automatic text classification. In particular, we examine the use of terms and term-related units as feature sets in different classification tasks. A sub-corpus of 80 texts was drawn from the British component of the International Corpus of English. Three classification tasks were defined, by subject domain, register and text category. The performance of the selected feature sets was evaluated by F-score using machine learning techniques, and compared with that of conventional lexical and grammatical feature sets. Although the corpus is comparatively small, the empirical results show that while features selected by the lexical criterion perform consistently, the use of terms produces superior performance when texts are classified by subject domain.
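For readers unfamiliar with the evaluation measure, the F-score mentioned above can be sketched as follows. The abstract does not say which variant or averaging scheme the study uses, so the balanced F1 and the macro-average over classes shown here are assumptions, and the function names and toy labels are purely illustrative, not data from the study.

```python
def f_score(tp, fp, fn, beta=1.0):
    """F-score for one class from true positive, false positive
    and false negative counts. beta=1.0 gives the balanced F1."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    labels = sorted(set(gold) | set(predicted))
    scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        scores.append(f_score(tp, fp, fn))
    return sum(scores) / len(scores)


# Hypothetical toy labels for a two-way subject-domain task.
gold = ["science", "science", "humanities", "humanities"]
predicted = ["science", "humanities", "humanities", "humanities"]
score = macro_f1(gold, predicted)
```

Macro-averaging gives each class equal weight regardless of how many texts it contains, which is one common choice when a corpus is small and classes may be unevenly sized.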
Ms. CAO Jing is currently a PhD candidate in the Department of Chinese, Translation and Linguistics. Her research interests lie mainly in corpus and computational linguistics and in terminology.
~ CTL Staff and Research Degree Students only ~