Annotating Chinese collocations with multi information

Ruifeng Xu, Qin Lu, Kam Fai Wong, Wenjie Li

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

1 Citation (Scopus)

Abstract

This paper presents the design and construction of an annotated Chinese collocation bank as a resource to support systematic research on Chinese collocations. With the help of computational tools, the bigram and n-gram collocations corresponding to 3,643 headwords are manually identified. Annotations for bigram collocations include dependency relation, chunking relation, and classification of collocation types. Currently, the collocation bank contains annotations for 23,581 bigram collocations and 2,752 n-gram collocations extracted from a 5-million-word corpus. Statistical analysis of the collocation bank is used to examine some characteristics of Chinese bigram collocations, which are essential to collocation research, especially for Chinese.
Original language: English
Title of host publication: ACL 2007: The LAW - Proceedings of The Linguistic Annotation Workshop
Pages: 61-68
Number of pages: 8
Publication status: Published - 1 Dec 2007
Event: Linguistic Annotation Workshop, LAW 2007 - Prague, Czech Republic
Duration: 28 Jun 2007 to 29 Jun 2007

Conference

Conference: Linguistic Annotation Workshop, LAW 2007
Country/Territory: Czech Republic
City: Prague
Period: 28/06/07 to 29/06/07

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
