Words without Boundaries: Computational Approaches to Chinese Word Segmentation

Research output: Journal article publication, Journal article, Academic research, peer-review

6 Citations (Scopus)

Abstract

The fact that words are not conventionally demarcated in Chinese orthography makes the process of word segmentation non-trivial. Chinese word segmentation remains a challenging topic in Chinese computational linguistics. We survey previous approaches to Chinese word segmentation, including dictionary look-up, strength of internal binding, as well as character tagging and machine learning. The Word Boundary Decision (WBD) approach, which requires no prior lexical knowledge, is proposed. It is shown that the WBD model greatly reduces the complexity of Chinese word segmentation and may provide a promising approach to address domain adaptation and robustness issues.

Original language: English
Pages (from-to): 494-505
Number of pages: 12
Journal: Language and Linguistics Compass
Volume: 6
Issue number: 8
Publication status: Published - 1 Aug 2012

ASJC Scopus subject areas

  • Linguistics and Language
