Word boundary decision with CRF for Chinese word segmentation

Shoushan Li, Chu-ren Huang

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

1 Citation (Scopus)

Abstract

Chinese word segmentation systems used in real applications must be both accurate and fast. In this paper, we study the word boundary decision (WBD) approach to Chinese word segmentation and implement it as 2-tag character tagging with a conditional random field (CRF). With the help of tag transition features, WBD segmentation with CRF achieves performance comparable to that of 4-tag character tagging, which represents the state-of-the-art segmentation approach. At the same time, it requires only about half the training time and memory of the 4-tag approach. These results suggest that the WBD approach is a good choice for practical Chinese word segmentation systems.
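The difference between the two tagging schemes can be illustrated with a short sketch. It assumes that the 2-tag WBD scheme labels each character by whether a word boundary follows it (the tag names `B` for "boundary" and `N` for "no boundary" are hypothetical), and that the 4-tag scheme is the common B/M/E/S convention (begin, middle, end, single-character word). The function names are illustrative only and do not come from the paper; the CRF itself is not shown.

```python
from typing import List


def to_two_tags(words: List[str]) -> List[str]:
    """Hypothetical WBD labels: 'B' if a word boundary follows the character, else 'N'."""
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # skip empty tokens
        tags.extend(["N"] * (len(word) - 1))  # no boundary inside the word
        tags.append("B")  # boundary after the word's last character
    return tags


def to_four_tags(words: List[str]) -> List[str]:
    """Assumed B/M/E/S labels: begin, middle, end, or single-character word."""
    tags: List[str] = []
    for word in words:
        if not word:
            continue
        if len(word) == 1:
            tags.append("S")
        else:
            tags.append("B")
            tags.extend(["M"] * (len(word) - 2))
            tags.append("E")
    return tags


def decode_two_tags(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from a character string and its 2-tag sequence."""
    words: List[str] = []
    current: List[str] = []
    for ch, tag in zip(chars, tags):
        current.append(ch)
        if tag == "B":  # close the current word at a predicted boundary
            words.append("".join(current))
            current = []
    if current:  # flush trailing characters if the last tag was not 'B'
        words.append("".join(current))
    return words


if __name__ == "__main__":
    sentence = ["我", "喜欢", "计算机"]  # "I like computers"
    print(to_two_tags(sentence))   # ['B', 'N', 'B', 'N', 'N', 'B']
    print(to_four_tags(sentence))  # ['S', 'B', 'E', 'B', 'M', 'E']
    print(decode_two_tags("我喜欢计算机", to_two_tags(sentence)))  # ['我', '喜欢', '计算机']
```

Under these assumptions, a linear-chain CRF over the 2-tag scheme has half as many output labels (and a quarter as many label-pair transitions) as one over the 4-tag scheme, which is consistent with the reduced training cost reported above; the exact savings depend on the feature set and CRF toolkit used.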
Original language: English
Title of host publication: PACLIC 23 - Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation
Pages: 726-732
Number of pages: 7
Volume: 2
Publication status: Published - 1 Dec 2009
Event: 23rd Pacific Asia Conference on Language, Information and Computation, PACLIC 23 - Hong Kong, Hong Kong
Duration: 3 Dec 2009 to 5 Dec 2009

Conference

Conference: 23rd Pacific Asia Conference on Language, Information and Computation, PACLIC 23
Country/Territory: Hong Kong
City: Hong Kong
Period: 3/12/09 to 5/12/09

Keywords

  • Chinese word segmentation
  • Conditional random field
  • Word boundary decision

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science (miscellaneous)
