Abstract
Because words are not conventionally demarcated in Chinese orthography, word segmentation is non-trivial, and it remains a challenging topic in Chinese computational linguistics. We survey previous approaches to Chinese word segmentation, including dictionary look-up, strength of internal binding, as well as character tagging and machine learning. We then propose the Word Boundary Decision (WBD) approach, which requires no prior lexical knowledge. The WBD model is shown to greatly reduce the complexity of Chinese word segmentation and may provide a promising approach to domain adaptation and robustness issues.
Original language | English |
---|---|
Pages (from-to) | 494-505 |
Number of pages | 12 |
Journal | Language and Linguistics Compass |
Volume | 6 |
Issue number | 8 |
Publication status | Published - 1 Aug 2012 |
ASJC Scopus subject areas
- Linguistics and Language