Abstract
This paper studies a new approach to Chinese word segmentation: the word boundary decision (WBD) approach. WBD directly classifies the boundary between each pair of adjacent characters as either a word boundary or not, and segments the text accordingly. Compared with the state-of-the-art methods based on character tagging, the approach is simpler and faster to implement and train, while achieving comparable segmentation performance. Notably, an online learning method is easily derived from WBD, so the segmentation system can learn from newly annotated samples very quickly. This enables a reliable online Chinese segmentation system without domain or training data constraints.
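The boundary-classification idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state which classifier or features the paper uses, so everything below is an assumption made for illustration: a binary perceptron, a small set of character n-gram features around each boundary, and the hypothetical names `boundary_features` and `BoundaryPerceptron`. The `update` method shows how one newly annotated sentence could be learned online.

```python
def boundary_features(text, i):
    """Features for the boundary between text[i-1] and text[i].

    The feature set is an illustrative assumption, not the paper's.
    """
    left, right = text[i - 1], text[i]
    left2 = text[i - 2] if i >= 2 else "<s>"
    right2 = text[i + 1] if i + 1 < len(text) else "</s>"
    return [
        "bias",
        "L=" + left,
        "R=" + right,
        "LR=" + left + right,
        "L2L=" + left2 + left,
        "RR2=" + right + right2,
    ]


class BoundaryPerceptron:
    """Binary perceptron deciding, per character gap, whether it is a word boundary."""

    def __init__(self):
        self.weights = {}

    def score(self, feats):
        return sum(self.weights.get(f, 0.0) for f in feats)

    def update(self, words):
        """Online update from one segmented sentence, given as a list of words."""
        text = "".join(words)
        gold, pos = set(), 0
        for word in words[:-1]:
            pos += len(word)
            gold.add(pos)  # index of the first character of the next word
        for i in range(1, len(text)):
            feats = boundary_features(text, i)
            predicted = self.score(feats) > 0
            truth = i in gold
            if predicted != truth:
                delta = 1.0 if truth else -1.0
                for f in feats:
                    self.weights[f] = self.weights.get(f, 0.0) + delta

    def segment(self, text):
        """Split text at every gap the classifier labels as a boundary."""
        if not text:
            return []
        words, start = [], 0
        for i in range(1, len(text)):
            if self.score(boundary_features(text, i)) > 0:
                words.append(text[start:i])
                start = i
        words.append(text[start:])
        return words


if __name__ == "__main__":
    model = BoundaryPerceptron()
    samples = [["我", "喜欢", "北京"], ["北京", "很", "大"]]
    for _ in range(50):  # repeated passes over a toy training set
        for sentence in samples:
            model.update(sentence)
    print(model.segment("我喜欢北京"))
```

Because each gap is classified independently, training and decoding are a single linear pass over the sentence, and adding a new annotated sentence only requires one more call to `update`.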
Original language | Chinese (Simplified) |
---|---|
Pages (from-to) | 3-7 |
Number of pages | 5 |
Journal | 中文信息学报 (Journal of Chinese Information Processing) |
Volume | 24 |
Issue number | 1 |
Publication status | Published - 2010 |
Keywords
- Computer application
- Chinese information processing
- Chinese word segmentation
- WBD approach
- Online learning