Classification-based Chinese collocation extraction

Ruifeng Xu, Qin Lu, Kam Fai Wong, Wenjie Li

Research output: Chapter in book / Conference proceedingConference article published in proceeding or bookAcademic researchpeer-review

5 Citations (Scopus)

Abstract

Most collocation extraction algorithms use a single set of criteria and a single threshold which is not quite appropriate because different types of collocations have different behaviors. This paper presents a window-based Chinese collocation extraction system, which identifies different types of collocations separately. By taking into consideration of compositional, non-substitutable, and non-modifiable properties as well as statistical significance, Chinese collocations are classified into four types. A multi-stage extraction system is then designed to separately identify different types of collocations by using different combinations of features. Furthermore, heuristic rules based on dependency knowledge are applied to filter out some pseudo collocations. Experiments show that the proposed system achieves better F1performance compared to most existing algorithms for Chinese collocation extraction.
Original languageEnglish
Title of host publicationIEEE NLP-KE 2007 - Proceedings of International Conference on Natural Language Processing and Knowledge Engineering
Pages308-315
Number of pages8
DOIs
Publication statusPublished - 1 Dec 2007
EventInternational Conference on Natural Language Processing and Knowledge Engineering, IEEE NLP-KE 2007 - Beijing, China
Duration: 30 Aug 20071 Sept 2007

Conference

ConferenceInternational Conference on Natural Language Processing and Knowledge Engineering, IEEE NLP-KE 2007
Country/TerritoryChina
CityBeijing
Period30/08/071/09/07

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems
  • Information Systems and Management

Fingerprint

Dive into the research topics of 'Classification-based Chinese collocation extraction'. Together they form a unique fingerprint.

Cite this