Abstract
Statistics-based collocation extraction approaches suffer from (1) low precision, because bi-grams with high co-occurrence may be syntactically unrelated and thus not true collocations; and (2) low recall, because some true collocations occur too rarely to be identified by statistical models. Integrating syntactic rules and semantic knowledge into a statistical model for collocation extraction is one way to achieve high precision while keeping reasonable recall. This paper designs a cascade system that employs a hybrid model, integrating both syntactic and semantic knowledge into a statistical model, for extracting Chinese synonymous noun/verb collocations. The grammatically bounded noun/verb collocations are first extracted by a syntactic-rule-based module, and its output is then passed to a semantic-based module that retrieves further low-frequency bi-gram collocations.
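The two-stage cascade described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the part-of-speech pattern set, the frequency threshold, the toy English data, and the `similarity` function (a tiny synonym table standing in for a HowNet-based similarity measure) are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical POS patterns treated as grammatically valid (verb + noun).
ALLOWED_PATTERNS = {("v", "n")}

# Hypothetical synonym table; a stand-in for a HowNet-based similarity measure.
SYNONYMS = {"decision": {"choice"}, "choice": {"decision"}}


def similarity(a, b):
    """Toy similarity: 1.0 for identical or listed-synonym words, else 0.0."""
    if a == b:
        return 1.0
    return 1.0 if b in SYNONYMS.get(a, set()) else 0.0


def syntactic_stage(tagged_bigrams, min_count):
    """Stage 1: keep bi-grams matching an allowed POS pattern, then split
    them into frequent (accepted) and infrequent (leftover) candidates."""
    counts = Counter(
        (w1, w2)
        for (w1, t1), (w2, t2) in tagged_bigrams
        if (t1, t2) in ALLOWED_PATTERNS
    )
    accepted = {bg for bg, c in counts.items() if c >= min_count}
    leftover = {bg for bg, c in counts.items() if c < min_count}
    return accepted, leftover


def semantic_stage(accepted, leftover, threshold):
    """Stage 2: recover a low-frequency bi-gram when it shares its first word
    with an accepted collocation and its second word is similar enough."""
    recovered = set()
    for head, dep in leftover:
        for seed_head, seed_dep in accepted:
            if head == seed_head and similarity(dep, seed_dep) >= threshold:
                recovered.add((head, dep))
                break
    return recovered


# Toy corpus of (word, POS) bi-grams; counts are made up for the example.
corpus = (
    [(("make", "v"), ("decision", "n"))] * 3
    + [(("make", "v"), ("choice", "n"))]
    + [(("make", "v"), ("table", "n"))]
    + [(("the", "d"), ("decision", "n"))] * 5
)

accepted, leftover = syntactic_stage(corpus, min_count=2)
recovered = semantic_stage(accepted, leftover, threshold=0.8)
```

In this toy run, the frequent but syntactically unrelated pair ("the", "decision") is rejected by the first stage, while the rare pair ("make", "choice") is recovered by the second stage because "choice" is listed as a synonym of "decision".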
Original language | English |
---|---|
Title of host publication | PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation |
Pages | 430-439 |
Number of pages | 10 |
Publication status | Published - 1 Dec 2011 |
Event | 25th Pacific Asia Conference on Language, Information and Computation, PACLIC 25 - Singapore. Duration: 16 Dec 2011 → 18 Dec 2011 |
Conference
Conference | 25th Pacific Asia Conference on Language, Information and Computation, PACLIC 25 |
---|---|
Country/Territory | Singapore |
Period | 16/12/11 → 18/12/11 |
Keywords
- Collocation extraction
- HowNet
- Semantic relationship
- Similarity calculation
- Statistical model
- Syntactic rules
ASJC Scopus subject areas
- Language and Linguistics
- Computer Science (miscellaneous)