Building a Chinese collocation bank

R. Xu, Qin Lu, K.F. Wong, Wenjie Li

Research output: Journal article publication › Journal article › Academic research › peer-review

Abstract

This paper presents the design and construction of an annotated Chinese collocation bank as a resource to support systematic research on Chinese collocations. The definition and properties of collocations are first studied. Based on a combination of different properties, a classification scheme is proposed that categorizes Chinese collocations into four types. With the help of computational tools, bigram and n-gram collocations of 3,643 headwords are manually identified in a 5-million-word corpus. Furthermore, each identified bigram collocation is annotated with its dependency relation, chunking information and classification to produce the collocation bank. Currently, the Chinese collocation bank contains 23,581 bigram collocations and 2,752 n-gram collocations. The Chinese collocation bank is a valuable resource for research related to Chinese collocations. Statistical analysis of the collocation bank reveals some interesting characteristics of Chinese bigram collocations, which are presented in this paper.
Original language: English
Pages (from-to): 21-47
Number of pages: 27
Journal: International Journal of Computer Processing of Languages
Volume: 22
Issue number: 1
Publication status: Published - 2009

Keywords

  • Chinese collocation
  • Collocation bank
  • Collocation annotation
  • Collocation classification
