Bilingual collocation correspondence is helpful to machine translation and second language learning. Existing techniques for identifying Chinese-English collocation correspondence suffer from two major problems. They are sensitive to the coverage of the bilingual dictionary and the insensitive to semantic and contextual information. This paper presents the ICT (Improved Collocation Translation) method to overcome these problems. For a given Chinese collocation, the word translation candidates extracted from a bilingual dictionary are expanded to improve the coverage. A new translation model, which incorporates statistics extracted from monolingual corpora, word semantic similarities from monolingual thesaurus and bilingual context similarities, is employed to estimate and rank the probabilities of the collocation correspondence candidates. Experiments show that ICT is robust to the coverage of bilingual dictionary. It achieves 50.1% accuracy for the first candidate and 73.1% accuracy for the top-3 candidates.
|Name||Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)|
|Conference||21st International Conference on Computer Processing of Oriental Languages: Beyond the Orient: The Research Challenges Ahead, ICCPOL 2006|
|Period||17/12/06 → 19/12/06|
- Bilingual collocation correspondence
- Monolingual corpora
- Theoretical Computer Science
- Computer Science(all)