Abstract
In this paper, we present a Distant BI-Gram model, which extends the regular bigram model with distance information and weight parameters in order to describe long-distance constraints within Chinese sentences. We discuss how the statistical information and weight parameters of this language model are extracted. Based on this work, word combination strength and spread are used to extract recurrent word combinations, i.e. collocations. The Distant BI-Gram model and the collocations are applied to a statistics-based post-processing system to improve the recognition performance for Chinese characters. The experimental results show that, by employing these two language models, the post-processing system achieves a larger improvement in recognition performance.
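The abstract does not give the model's formula, so the following is only a minimal sketch of one common way to combine distance information with weight parameters: the score of a word is a weighted sum, over distances d, of the relative frequency of that word given the word d positions earlier. The function names `train_distant_bigram` and `distant_bigram_prob`, the `max_distance` parameter, and the linear-interpolation form are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter, defaultdict


def train_distant_bigram(sentences, max_distance=3):
    """Count word pairs (u, v) where v follows u at distance d = 1..max_distance.

    `sentences` is an iterable of already-segmented word lists.
    Hypothetical helper: the paper's actual statistics may differ.
    """
    pair_counts = defaultdict(Counter)     # pair_counts[d][(u, v)]
    context_counts = defaultdict(Counter)  # context_counts[d][u]
    for words in sentences:
        for i, u in enumerate(words):
            for d in range(1, max_distance + 1):
                if i + d < len(words):
                    pair_counts[d][(u, words[i + d])] += 1
                    context_counts[d][u] += 1
    return pair_counts, context_counts


def distant_bigram_prob(words, i, pair_counts, context_counts, weights):
    """Score words[i] as sum over d of weights[d] * P_d(words[i] | words[i-d]).

    `weights` maps each distance d to its weight (assumed to sum to 1).
    Distances that reach before the sentence start are skipped.
    """
    total = 0.0
    for d, weight in weights.items():
        if i - d < 0:
            continue
        u = words[i - d]
        seen = context_counts[d][u]
        if seen:
            total += weight * pair_counts[d][(u, words[i])] / seen
    return total
```

In a post-processing setting, a score like this could be used to rank the candidate characters or words proposed by the recognizer for each position; the weights would be estimated from held-out data rather than fixed by hand.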
Original language | English |
---|---|
Title of host publication | Proceedings of 2002 International Conference on Machine Learning and Cybernetics |
Pages | 2251-2255 |
Number of pages | 5 |
Volume | 4 |
Publication status | Published - 1 Dec 2002 |
Event | Proceedings of 2002 International Conference on Machine Learning and Cybernetics - Beijing, China. Duration: 4 Nov 2002 → 5 Nov 2002 |
Conference
Conference | Proceedings of 2002 International Conference on Machine Learning and Cybernetics |
---|---|
Country/Territory | China |
City | Beijing |
Period | 4/11/02 → 5/11/02 |
Keywords
- Character recognition
- Collocation
- Distant BI-Gram
- Post-processing
ASJC Scopus subject areas
- General Engineering