Chinese term extraction using minimal resources

Yuhang Yang, Qin Lu, Tiejun Zhao

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

15 Citations (Scopus)

Abstract

This paper presents a new approach to term extraction using minimal resources. A term candidate extraction algorithm is proposed that identifies features of the relatively stable, domain-independent term delimiters rather than features of the terms themselves. For term verification, a link-analysis-based method is proposed to calculate the relevance between term candidates and the sentences in the domain-specific corpus from which the candidates are extracted. The proposed approach requires no prior domain knowledge, no general corpora, and no full segmentation, and needs only minimal adaptation for new domains. Consequently, the method can be applied to any domain corpus and is especially useful for resource-limited domains. Evaluations conducted on two different domains for Chinese term extraction show significant improvements over existing techniques and also verify the efficiency and relatively domain-independent nature of the approach. Experiments on new term extraction also indicate that the approach is effective at identifying new terms in a domain, making it useful for domain knowledge update. Licensed under the Creative Commons.
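The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the delimiter list, the function names, and the HITS-style mutual reinforcement between candidates and sentences are all assumptions chosen for illustration, since the abstract does not give these details.

```python
import math
import re

# Hypothetical delimiter list (an assumption, not taken from the paper):
# a few common Chinese function words and punctuation marks.
DELIMITERS = ["的", "是", "和", "在", "与", "了", "，", "。", "、"]


def extract_candidates(sentence, delimiters=DELIMITERS):
    """Split a sentence at delimiters; keep pieces of two or more characters."""
    pattern = "|".join(re.escape(d) for d in delimiters)
    return [piece for piece in re.split(pattern, sentence) if len(piece) >= 2]


def _normalize(values):
    """Scale a list of scores to unit Euclidean length (no-op if all zero)."""
    norm = math.sqrt(sum(v * v for v in values)) or 1.0
    return [v / norm for v in values]


def score_candidates(sentences, iterations=20):
    """Score candidates by HITS-style reinforcement on a candidate/sentence graph.

    Assumed stand-in for the link analysis step: a sentence is relevant if it
    contains relevant candidates, and a candidate is relevant if it occurs in
    relevant sentences.
    """
    per_sentence = [set(extract_candidates(s)) for s in sentences]
    candidates = sorted(set().union(*per_sentence))
    cand_score = {c: 1.0 for c in candidates}

    for _ in range(iterations):
        # Sentence score: sum of the scores of the candidates it contains.
        sent_score = _normalize(
            [sum(cand_score[c] for c in cands) for cands in per_sentence]
        )
        # Candidate score: sum of the scores of the sentences containing it.
        raw = {c: 0.0 for c in candidates}
        for score, cands in zip(sent_score, per_sentence):
            for c in cands:
                raw[c] += score
        cand_score = dict(zip(candidates, _normalize([raw[c] for c in candidates])))

    return cand_score


if __name__ == "__main__":
    corpus = ["支持向量机是一种分类算法", "支持向量机的核函数", "核函数与支持向量机"]
    ranked = sorted(score_candidates(corpus).items(), key=lambda kv: -kv[1])
    for term, score in ranked:
        print(term, round(score, 3))
```

Splitting on delimiters avoids full word segmentation, which is the property the abstract emphasizes; a practical system would need a far richer delimiter inventory than this toy list.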
Original language: English
Title of host publication: Coling 2008 - 22nd International Conference on Computational Linguistics, Proceedings of the Conference
Pages: 1033-1040
Number of pages: 8
Volume: 1
Publication status: Published - 1 Dec 2008
Event: 22nd International Conference on Computational Linguistics, Coling 2008 - Manchester, United Kingdom
Duration: 18 Aug 2008 to 22 Aug 2008

Conference

Conference: 22nd International Conference on Computational Linguistics, Coling 2008
Country: United Kingdom
City: Manchester
Period: 18/08/08 to 22/08/08

ASJC Scopus subject areas

  • Language and Linguistics
  • Computational Theory and Mathematics
  • Linguistics and Language
