A comparative study of the effect of word segmentation on Chinese terminology extraction

Luning Ji, Qin Lu, Wenjie Li, YiRong Chen

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

Abstract

Automatic term extraction is the first step towards the automatic or semi-automatic updating of an existing domain knowledge base. Most previous work applies word segmentation as a preprocessing step for Chinese term extraction. However, segmentation ambiguity is unavoidable, especially when identifying unknown words in Chinese. In this paper, we discuss the effect and limitations of segmentation on Chinese terminology extraction. A detailed study shows that errors propagated from word segmentation have a great impact on the results of terminology extraction. Our analysis and experiments show that character-based terminology extraction yields much better results than extraction that uses segmentation as a preprocessing step.
Original language: English
Title of host publication: PACLIC 20 - Proceedings of the 20th Pacific Asia Conference on Language, Information and Computation
Pages: 101-109
Number of pages: 9
Publication status: Published - 1 Dec 2006
Event: 20th Pacific Asia Conference on Language, Information and Computation, PACLIC 20 - Wuhan, China
Duration: 1 Nov 2006 → 3 Nov 2006

Conference

Conference: 20th Pacific Asia Conference on Language, Information and Computation, PACLIC 20
Country: China
City: Wuhan
Period: 1/11/06 → 3/11/06

Keywords

  • Chinese corpus
  • Terminology extraction
  • Word segmentation

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science (miscellaneous)
