Abstract
Language modeling is an active research topic in many domains, including speech recognition, optical character recognition, handwriting recognition, machine translation and spelling correction. There are two main types of language model: the mathematical and the linguistic. The most widely used mathematical language model is the n-gram model, which is inferred from statistics. This model has three problems: a long-distance restriction, its recursive nature and only partial language understanding. Language models based on linguistics present many difficulties when applied to large-scale real texts. We present here a new hybrid language model that combines the advantages of the n-gram statistical language model with those of a linguistic language model that makes use of grammatical or semantic rules. Using suitable rules, this hybrid model can solve problems such as the long-distance restriction, the recursive nature and the partial language understanding. The new language model has been effective in experiments and has been incorporated into Chinese sentence input products for Windows and Macintosh OS.
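To make the idea of combining n-gram statistics with rules concrete, the following is a minimal sketch and not the model described in the paper. It assumes a bigram model with add-one smoothing and a single hand-written rule; the function names, the toy corpus, the "either ... or" rule and the penalty value of -5.0 are all hypothetical choices made for illustration. It shows how a rule can override a bigram preference when the relevant words are too far apart for the bigram window to see.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count histories and bigrams over tokenized sentences with boundary markers."""
    histories, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        histories.update(padded[:-1])                 # every token that has a successor
        bigrams.update(zip(padded[:-1], padded[1:]))  # adjacent token pairs
    vocab = {w for s in sentences for w in s} | {"</s>"}
    return histories, bigrams, vocab


def bigram_logprob(tokens, model):
    """Add-one smoothed bigram log-probability of a tokenized sentence."""
    histories, bigrams, vocab = model
    padded = ["<s>"] + tokens + ["</s>"]
    v = len(vocab)
    return sum(
        math.log((bigrams[(a, b)] + 1) / (histories[a] + v))
        for a, b in zip(padded[:-1], padded[1:])
    )


def either_or_rule(tokens):
    """Hypothetical rule: 'either' must be followed somewhere later by 'or'.

    Returns 0.0 when satisfied and an assumed penalty of -5.0 otherwise.
    """
    if "either" not in tokens:
        return 0.0
    rest = tokens[tokens.index("either") + 1:]
    return 0.0 if "or" in rest else -5.0


def hybrid_score(tokens, model, rules, weight=1.0):
    """Bigram log-probability plus a weighted sum of rule scores."""
    return bigram_logprob(tokens, model) + weight * sum(r(tokens) for r in rules)


# Toy corpus in which "tea and" is more frequent than "tea or".
corpus = [
    ["either", "tea", "or", "coffee"],
    ["tea", "and", "coffee"],
    ["tea", "and", "milk"],
]
model = train_bigram(corpus)

good = ["either", "tea", "or", "coffee"]
bad = ["either", "tea", "and", "coffee"]

for name, sentence in (("good", good), ("bad", bad)):
    print(name,
          round(bigram_logprob(sentence, model), 3),
          round(hybrid_score(sentence, model, [either_or_rule]), 3))
```

On this toy corpus the bigram score alone favours the ungrammatical candidate, because "either" lies outside the one-word history when "and" or "or" is predicted, while the rule-augmented score favours the grammatical one. A real system would need a principled way to weight rules against probabilities, which this sketch does not attempt.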
Original language | English |
---|---|
Pages (from-to) | 109-128 |
Number of pages | 20 |
Journal | International Journal of Pattern Recognition and Artificial Intelligence |
Volume | 19 |
Issue number | 1 |
Publication status | Published - 1 Feb 2005 |
Keywords
- Chinese input
- Computational linguistics
- Hybrid language model
- N-gram model
ASJC Scopus subject areas
- Software
- Computer Vision and Pattern Recognition
- Artificial Intelligence