Abstract
Chinese word segmentation systems must be both accurate and fast in real applications. In this paper, we study the word boundary decision (WBD) approach to Chinese word segmentation and implement it as 2-tag character tagging with a conditional random field (CRF). With the help of tag transition features, WBD segmentation with a CRF achieves performance comparable to the 4-tag character tagging approach, which represents the state of the art in segmentation, while requiring only about half the training time and memory. These results suggest that the WBD segmentation approach is a good choice for real-world Chinese word segmentation systems.
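
To make the contrast between the two tagging schemes concrete, the following is a minimal sketch of how a segmented sentence could be converted into 2-tag and 4-tag character labels. The abstract does not define the tag inventories, so the labels used here are assumptions: `B`/`I` (word-initial character versus any other character) for the 2-tag scheme and the common `B`/`M`/`E`/`S` set for the 4-tag scheme. The function names are hypothetical, and no CRF is trained.

```python
def tags_2(words):
    """Assumed 2-tag scheme: 'B' marks a word-initial character, 'I' any other."""
    return ["B" if i == 0 else "I" for w in words for i in range(len(w))]


def tags_4(words):
    """Assumed 4-tag scheme: Begin, Middle, End, Single-character word."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def decode_2(chars, tags):
    """Rebuild words from characters and 2-tag labels: start a new word at each 'B'."""
    words = []
    for ch, t in zip(chars, tags):
        if t == "B" or not words:
            words.append(ch)
        else:
            words[-1] += ch
    return words


example = ["我们", "是", "研究生"]          # illustrative sentence, not from the paper
chars = list("".join(example))

print(tags_2(example))   # ['B', 'I', 'B', 'B', 'I', 'I']
print(tags_4(example))   # ['B', 'E', 'S', 'B', 'M', 'E']
print(decode_2(chars, tags_2(example)) == example)   # True
```

With two labels there are 4 possible label-to-label transitions, against 16 with four labels, which is consistent with the lower training cost the abstract reports for the 2-tag model.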
Original language | English |
---|---|
Title of host publication | PACLIC 23 - Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation |
Pages | 726-732 |
Number of pages | 7 |
Volume | 2 |
Publication status | Published - 1 Dec 2009 |
Event | 23rd Pacific Asia Conference on Language, Information and Computation, PACLIC 23 - Hong Kong, Hong Kong; Duration: 3 Dec 2009 → 5 Dec 2009 |
Conference
Conference | 23rd Pacific Asia Conference on Language, Information and Computation, PACLIC 23 |
---|---|
Country/Territory | Hong Kong |
City | Hong Kong |
Period | 3 Dec 2009 → 5 Dec 2009 |
Keywords
- Chinese word segmentation
- Conditional random field
- Word boundary decision
ASJC Scopus subject areas
- Language and Linguistics
- Computer Science (miscellaneous)