City University of Hong Kong
Department of Chinese, Translation and Linguistics,
The Halliday Centre for Intelligent Applications of Language Studies &
Language Information Sciences Research Centre
Mining Linguistic Information:
The ANC as a Resource for Language Processing and Linguistics Research
Prof. Nancy Ide
The Computer Science Department, Vassar College, Poughkeepsie, New York
Date: 31 August 2006, Tuesday
Time: 4:30pm - 5:30pm
Venue: B7603 (Lift 3, 7/F, Blue Zone), Academic Building, CityU
This presentation will survey current and potential uses of the American National Corpus (ANC) and its annotations in linguistics and language processing research. The availability of a large corpus of American English provides a rich resource for research into contemporary American English usage and serves as a source of information and examples for teaching English as a second language. It is also a valuable resource for researchers in natural language processing, who use the corpus to develop language models that in turn guide statistical language processing software. Unlike the British National Corpus (BNC), the ANC has been or is being annotated for a variety of phenomena at various linguistic levels, including not only part of speech but also syntax, semantics, and co-reference. In many cases it carries several versions of the annotation for a given phenomenon, each produced by a different annotation software engine. This makes it possible to pursue a far wider range of questions concerning contemporary American English usage, to explore the interactions between linguistic phenomena at different levels, and to compare and merge annotations of the same type in order to improve the performance of automatic annotation software.
Nancy Ide is Professor and Chair of the Computer Science Department at Vassar College in Poughkeepsie, New York. She has been involved in the development of representation standards for language data since 1987, when she founded the Text Encoding Initiative. Since then she has been involved in several corpus building projects, including MULTEXT, MULTEXT-EAST, and now the American National Corpus, and continues to work on standards as a member of the International Organization for Standardization's committee on Language Resource Management. Professor Ide has published extensively in the fields of computational linguistics and computational lexicography, especially in the area of word sense disambiguation. Since 1997 she has been a co-organizer of the biennial EUROLAN summer schools on computational linguistics. She is currently co-editor-in-chief of the journal "Language Resources and Evaluation" (formerly "Computers and the Humanities"), and co-edits a book series for Springer entitled "Text, Speech, and Language Technology".
~ All Are Welcome ~