A hybrid extraction model for Chinese noun/verb synonym bi-gram collocations

Wanyin Li, Qin Lu

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

Abstract

Statistics-based collocation extraction approaches suffer from (1) low precision, because frequently co-occurring bi-grams may be syntactically unrelated and thus not true collocations, and (2) low recall, because some true collocations that occur rarely cannot be identified by statistical models. Integrating both syntactic rules and semantic knowledge into a statistical model for collocation extraction is one way to achieve high precision while keeping reasonable recall. This paper designs a cascade system that employs a hybrid model, integrating both syntactic and semantic knowledge into a statistical model, for extracting Chinese synonymous noun/verb collocations. Grammatically bound noun/verb collocations are first extracted by a syntactic-rule-based module, and its output is then passed to a semantic-based module that retrieves further low-frequency bi-gram collocations.
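The two-stage cascade described in the abstract can be pictured with a small sketch. This is not the authors' implementation: the POS patterns, the frequency cut-off standing in for the statistical model, the similarity threshold, the toy feature sets standing in for HowNet-based similarity calculation, and all names (`Candidate`, `passes_syntax`, `similarity`, `extract`) are hypothetical assumptions chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical POS patterns standing in for the paper's syntactic rules:
# verb-object ("v", "n") and subject-predicate ("n", "v").
ALLOWED_PATTERNS = {("v", "n"), ("n", "v")}

# Hypothetical toy "sememe-like" feature sets; a real system would derive
# word similarity from a lexical resource such as HowNet instead.
FEATURES = {
    "提高": {"change", "increase", "degree"},
    "提升": {"change", "increase", "degree", "position"},
    "水平": {"attribute", "level"},
    "能力": {"attribute", "ability"},
    "苹果": {"fruit"},
}


@dataclass(frozen=True)
class Candidate:
    w1: str
    pos1: str
    w2: str
    pos2: str
    freq: int


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of the toy feature sets (1.0 for identical words)."""
    if a == b:
        return 1.0
    fa, fb = FEATURES.get(a, set()), FEATURES.get(b, set())
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)


def passes_syntax(c: Candidate) -> bool:
    """True if the bi-gram's POS pair matches one of the assumed rules."""
    return (c.pos1, c.pos2) in ALLOWED_PATTERNS


def extract(candidates, min_freq=5, sim_threshold=0.7):
    # Stage 1: syntactic rules plus a frequency cut-off keep frequent,
    # grammatically bound bi-grams (this drops e.g. 的 + noun).
    stage1 = [c for c in candidates if passes_syntax(c) and c.freq >= min_freq]
    seeds = {(c.w1, c.w2) for c in stage1}

    # Stage 2: a low-frequency bi-gram that passes the syntactic rules is
    # accepted if it matches a stage-1 seed after replacing one word with a
    # sufficiently similar (near-synonymous) word.
    stage2 = []
    for c in candidates:
        if c in stage1 or not passes_syntax(c):
            continue
        for a, b in seeds:
            if (c.w2 == b and similarity(c.w1, a) >= sim_threshold) or (
                c.w1 == a and similarity(c.w2, b) >= sim_threshold
            ):
                stage2.append(c)
                break
    return stage1, stage2


# Usage with made-up counts (not data from the paper).
candidates = [
    Candidate("提高", "v", "水平", "n", 12),
    Candidate("提高", "v", "能力", "n", 8),
    Candidate("的", "u", "水平", "n", 30),
    Candidate("提升", "v", "水平", "n", 2),
    Candidate("提升", "v", "苹果", "n", 1),
]
stage1, stage2 = extract(candidates)
print([(c.w1, c.w2) for c in stage1])  # [('提高', '水平'), ('提高', '能力')]
print([(c.w1, c.w2) for c in stage2])  # [('提升', '水平')]
```

Seeding the semantic stage only from stage-1 output is one simple design choice: the rare bi-gram 提升 水平 is recovered through its frequent near-synonymous counterpart 提高 水平, while the frequent but syntactically unrelated 的 + 水平 never reaches either stage's output.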
Original language: English
Title of host publication: PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation
Pages: 430-439
Number of pages: 10
Publication status: Published - 1 Dec 2011
Event: 25th Pacific Asia Conference on Language, Information and Computation, PACLIC 25, Singapore
Duration: 16 Dec 2011 to 18 Dec 2011

Conference

Conference: 25th Pacific Asia Conference on Language, Information and Computation, PACLIC 25
Country: Singapore
Period: 16/12/11 to 18/12/11

Keywords

  • Collocation extraction
  • HowNet
  • Semantic relationship
  • Similarity calculation
  • Statistical model
  • Syntactic rules

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science (miscellaneous)
