Distant BI-Gram model, collocation, and their applications in post-processing for Chinese character recognition

Rui Feng Xu, Qin Lu, Daniel S. Yeung, Xi Zhao Wang

Research output: Chapter in book / Conference proceeding, Conference article published in proceeding or book, Academic research, peer-review

Abstract

In this paper, we present a Distant BI-Gram model, which extends the regular BI-Gram model with distance information and weight parameters in order to describe long-distance constraints within Chinese sentences. We discuss how the statistical information and the weight parameters of this language model are extracted. Building on this work, word combination strength and spread are used to extract recurrent word combinations, i.e. collocations. The Distant BI-Gram model and the collocations are applied in a statistics-based post-processing system to improve Chinese character recognition performance. The experimental results show that by employing these two language models, the post-processing system achieves a larger improvement in recognition performance.
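The abstract does not give the model's equations, so the following is only a rough illustration under stated assumptions. It assumes the Distant BI-Gram probability can be approximated as a weighted interpolation of bigram estimates taken at distances 1 to D, and that post-processing picks, from the recognizer's top-k candidates at each position, the character sequence the language model scores highest. The class `DistantBigram`, the function `postprocess`, the fixed weights (0.7, 0.3), the smoothing constant and the toy corpus are all hypothetical; the paper estimates its weight parameters from data, and the collocation (strength and spread) component is not shown.

```python
import math
from collections import Counter
from itertools import product


class DistantBigram:
    """Hypothetical sketch of a distance-weighted bigram model.

    Approximates P(w_i | history) by interpolating bigram estimates taken at
    distances d = 1..D:  sum_d lambda_d * P_d(w_i | w_{i-d}).
    """

    def __init__(self, weights, alpha=0.5):
        self.weights = list(weights)      # lambda_1..lambda_D, fixed by hand here (assumption)
        self.max_dist = len(self.weights)
        self.alpha = alpha                # additive (Lidstone) smoothing constant
        self.pair_counts = [Counter() for _ in range(self.max_dist)]
        self.left_counts = [Counter() for _ in range(self.max_dist)]
        self.vocab = set()

    def train(self, sentences):
        # Count (left, right) character pairs separated by exactly d positions.
        for sent in sentences:
            self.vocab.update(sent)
            for d in range(1, self.max_dist + 1):
                for i in range(d, len(sent)):
                    self.pair_counts[d - 1][(sent[i - d], sent[i])] += 1
                    self.left_counts[d - 1][sent[i - d]] += 1

    def prob_at(self, d, left, right):
        # Smoothed P_d(right | left); the extra vocabulary slot stands for unseen characters.
        v = len(self.vocab) + 1
        num = self.pair_counts[d - 1][(left, right)] + self.alpha
        den = self.left_counts[d - 1][left] + self.alpha * v
        return num / den

    def cond_prob(self, history, w):
        # Mix the distance-specific estimates; renormalise the weights when the
        # history is shorter than D (e.g. near the start of a sentence).
        total, norm = 0.0, 0.0
        for d in range(1, self.max_dist + 1):
            if d <= len(history):
                lam = self.weights[d - 1]
                total += lam * self.prob_at(d, history[-d], w)
                norm += lam
        return total / norm if norm > 0 else 1.0 / (len(self.vocab) + 1)

    def log_score(self, sent):
        # Sum of log P(w_i | w_0..w_{i-1}) for i >= 1; leaving the first
        # character unscored is a simplification of this sketch.
        return sum(math.log(self.cond_prob(sent[:i], sent[i]))
                   for i in range(1, len(sent)))


def postprocess(model, candidates):
    """Return the candidate string with the highest language-model score.

    candidates[i] lists the recognizer's top-k guesses for position i.
    Exhaustive search is used for clarity; a beam search would scale better.
    """
    best = max(product(*candidates), key=model.log_score)
    return "".join(best)


model = DistantBigram(weights=(0.7, 0.3))
model.train(["中国人民", "人民日报", "中国银行"])
print(postprocess(model, [["中", "申"], ["国", "图"], ["人", "入"], ["民", "氏"]]))
# prints 中国人民
```

In this toy setting every wrong candidate is unseen in training, so each of its distance-specific estimates falls to the smoothing floor and the correct reading wins. A real system would also need character-level recognizer confidences and weights estimated from a corpus.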
Original language: English
Title of host publication: Proceedings of 2002 International Conference on Machine Learning and Cybernetics
Pages: 2251-2255
Number of pages: 5
Volume: 4
Publication status: Published - 1 Dec 2002
Event: Proceedings of 2002 International Conference on Machine Learning and Cybernetics - Beijing, China
Duration: 4 Nov 2002 to 5 Nov 2002

Conference

Conference: Proceedings of 2002 International Conference on Machine Learning and Cybernetics
Country: China
City: Beijing
Period: 4/11/02 to 5/11/02

Keywords

  • Character recognition
  • Collocation
  • Distant BI-Gram
  • Post-processing

ASJC Scopus subject areas

  • Engineering (all)
