City University of Hong Kong
Department of Chinese, Translation and Linguistics
A Query-focused Multi-document
Based on Lexicon Chains
Prof. Dr. Le SUN
Institute of Software, Chinese Academy of Sciences
Date: 23 January 2007, Tuesday
Time: 11:00am - 12:00pm
Venue: B7603 (Lift 3, 7/F, Blue Zone), Academic Building, CityU
With the development of the Internet, "information overload" has become a growing problem, and automatic text summarization is one of the key technologies for addressing it. This presentation first introduces the classification of automatic text summarization approaches, then focuses on an improved summarization algorithm based on lexical chains. The improvements are: 1) selecting candidate words for the chains in descending order of word frequency, so that the chains are generated efficiently; 2) adopting a build-separately, merge-twice strategy to handle large-scale documents; and 3) incorporating the relevance between queries and candidate sentences into the summarization algorithm. Finally, we briefly describe the implementation of IS_SUM, a Chinese and English summarizer based on the new algorithm, and present its experimental results in the Document Understanding Conferences (DUC) of 2005 and 2006.
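As a rough illustration of improvements 1) and 3) only, the following Python sketch builds chains by visiting candidate words in descending frequency order and adds a query-relevance term to each sentence score. It is not the IS_SUM implementation: the function names, the `related` word-pair table (standing in for a real thesaurus lookup), the `stopwords` set, and the `query_weight` parameter are all assumptions made for this example.

```python
from collections import Counter

PUNCT = ".,;:!?\"'()"

def tokenize(text, stopwords=frozenset()):
    """Lowercase, strip basic punctuation, and drop stopwords."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w and w not in stopwords]

def build_chains(freq, related):
    """Greedily group words into chains, visiting words by descending frequency.

    `related` is a hypothetical set of frozenset word pairs; a real system
    would consult a thesaurus instead.
    """
    chains = []
    for word in sorted(freq, key=lambda w: (-freq[w], w)):
        for chain in chains:
            if any(frozenset((word, member)) in related for member in chain):
                chain.append(word)
                break
        else:
            chains.append([word])  # no related chain found: start a new one
    return chains

def rank_sentences(sentences, query, related, stopwords=frozenset(), query_weight=2.0):
    """Score = sum of scores of chains the sentence touches
    + query_weight * number of distinct query terms it contains."""
    tokens = [tokenize(s, stopwords) for s in sentences]
    freq = Counter(w for sent in tokens for w in sent)
    chains = build_chains(freq, related)

    # A chain's score is the total frequency of its member words (an assumption).
    chain_scores = [sum(freq[w] for w in chain) for chain in chains]
    chain_of = {w: i for i, chain in enumerate(chains) for w in chain}

    query_terms = set(tokenize(query, stopwords))
    scored = []
    for sent, words in zip(sentences, tokens):
        touched = {chain_of[w] for w in words}
        chain_part = sum(chain_scores[i] for i in touched)
        query_part = len(query_terms & set(words))
        scored.append((chain_part + query_weight * query_part, sent))
    # Highest score first; ties keep document order because sorted() is stable.
    return sorted(scored, key=lambda pair: -pair[0])

if __name__ == "__main__":
    docs = ["The car engine failed.", "A vehicle needs fuel.", "Birds sing at dawn."]
    pairs = {frozenset(("car", "vehicle")), frozenset(("engine", "fuel"))}
    for score, sentence in rank_sentences(docs, "vehicle fuel", pairs, {"the", "a", "at"}):
        print(score, sentence)
```

In this toy run the second sentence ranks first, because it both touches the two multi-word chains and matches both query terms. The separate-build, twice-merge strategy for large documents is not covered by the sketch.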
Dr. Le SUN is Associate Professor and Associate Director at the Centre for Chinese Information Processing, Institute of Software, Chinese Academy of Sciences. He received his PhD in 1998 from Nanjing University of Science & Technology, China. From June 1998 to November 2000, he was a Post-doctoral Researcher at the Institute of Software, Chinese Academy of Sciences, China. From March to September 2003, he was a Visiting Senior Research Fellow at the Centre for Corpus Linguistics, Department of English, University of Birmingham, UK. From December 2004 to December 2005, he was a visiting scholar at RALI, Department of Computer Science, University of Montreal, Canada. His research interests include Chinese Information Processing, Computer-Aided Translation (CAT), Information Retrieval (IR), and Information Extraction (IE).