A hybrid language model based on statistics and linguistic rules

Xiaolong Wang, Daniel S. Yeung, James N.K. Liu, Wing Pong Robert Luk, Xuan Wang

Research output: Journal article (academic research, peer-reviewed)

2 Citations (Scopus)

Abstract

Language modeling is a current research topic in many domains, including speech recognition, optical character recognition, handwriting recognition, machine translation, and spelling correction. There are two main types of language models: the mathematical and the linguistic. The most widely used mathematical language model is the n-gram model inferred from statistics. This model has three problems: long-distance restriction, recursive nature, and partial language understanding. Language models based on linguistics, in turn, present many difficulties when applied to large-scale real texts. We present here a new hybrid language model that combines the advantages of the n-gram statistical language model with those of a linguistic language model that makes use of grammatical or semantic rules. Using suitable rules, this hybrid model can solve problems such as long-distance restriction, recursive nature, and partial language understanding. The new language model has been effective in experiments and has been incorporated into Chinese sentence input products for Windows and Macintosh OS.
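The combination the abstract describes can be illustrated with a toy sketch: a statistical bigram model whose scores are rescaled by a hand-written linguistic rule. The corpus, the rule, and the weighting below are invented for demonstration and are not taken from the paper; they only show the general shape of a hybrid statistical-plus-rule scorer.

```python
from collections import Counter

# Toy corpus; real systems train on large text collections.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count unigrams and bigrams for maximum-likelihood bigram probabilities.
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))


def bigram_prob(w1, w2):
    """P(w2 | w1) with add-one smoothing over the toy vocabulary."""
    vocab = len(unigrams)
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab)


def rule_score(sentence):
    """A hypothetical linguistic rule: a sentence must end with '.'.

    Rules like this can encode constraints (e.g. long-distance
    dependencies) that an n-gram window cannot see.
    """
    return 1.0 if sentence[-1] == "." else 0.1


def hybrid_score(sentence):
    """Product of bigram probabilities, rescaled by the rule score."""
    p = 1.0
    for w1, w2 in zip(sentence, sentence[1:]):
        p *= bigram_prob(w1, w2)
    return p * rule_score(sentence)


good = "the cat sat on the mat .".split()
bad = "the cat sat on the mat".split()
# The rule penalty lets the hybrid model prefer the well-formed sentence
# even though the shorter one multiplies fewer (sub-unit) probabilities.
assert hybrid_score(good) > hybrid_score(bad)
```

In a real hybrid system the rule component would be a grammar or semantic analyzer rather than a single check, but the scoring structure, a statistical base score adjusted by linguistic constraints, is the same idea.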
Original language: English
Pages (from-to): 109-128
Number of pages: 20
Journal: International Journal of Pattern Recognition and Artificial Intelligence
Volume: 19
Issue number: 1
Publication status: Published - 1 Feb 2005

Keywords

  • Chinese input
  • Computational linguistics
  • Hybrid language model
  • N-gram model

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence
