Abstract
This paper presents the design and construction of an annotated Chinese collocation bank as a resource to support systematic research on Chinese collocations. With the help of computational tools, the bigram and n-gram collocations corresponding to 3,643 headwords are manually identified. The annotations for bigram collocations include the dependency relation, the chunking relation and the classification of collocation types. Currently, the collocation bank contains annotations for 23,581 bigram collocations and 2,752 n-gram collocations extracted from a 5-million-word corpus. Through statistical analysis of the collocation bank, some characteristics of Chinese bigram collocations are examined, an analysis that is essential to collocation research, especially for Chinese.
Original language | English |
---|---|
Title of host publication | ACL 2007: The LAW - Proceedings of The Linguistic Annotation Workshop |
Pages | 61-68 |
Number of pages | 8 |
Publication status | Published - 1 Dec 2007 |
Event | Linguistic Annotation Workshop, LAW 2007 - Prague, Czech Republic; Duration: 28 Jun 2007 → 29 Jun 2007 |
Conference
Conference | Linguistic Annotation Workshop, LAW 2007 |
---|---|
Country/Territory | Czech Republic |
City | Prague |
Period | 28/06/07 → 29/06/07 |
ASJC Scopus subject areas
- Language and Linguistics
- Linguistics and Language