An improved method for finding bilingual collocation correspondences from monolingual corpora

Ruifeng Xu, Kam Fai Wong, Qin Lu, Wenjie Li

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

Abstract

Bilingual collocation correspondences are useful for machine translation and second language learning. Existing techniques for identifying Chinese-English collocation correspondences suffer from two major problems: they are sensitive to the coverage of the bilingual dictionary, and they are insensitive to semantic and contextual information. This paper presents the ICT (Improved Collocation Translation) method to overcome these problems. For a given Chinese collocation, the word translation candidates extracted from a bilingual dictionary are expanded to improve coverage. A new translation model, which incorporates statistics extracted from monolingual corpora, word semantic similarities from a monolingual thesaurus, and bilingual context similarities, is then employed to estimate and rank the probabilities of the collocation correspondence candidates. Experiments show that ICT is robust to the coverage of the bilingual dictionary. It achieves 50.1% accuracy for the first candidate and 73.1% accuracy for the top-3 candidates.
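The abstract describes two steps: expanding dictionary translations to improve coverage, then ranking candidate English collocations using evidence from monolingual corpora, a thesaurus, and bilingual context. The abstract does not give the model's formulas, so the following is only a minimal illustrative sketch under stated assumptions, not the authors' ICT model: expansion is assumed to add the dictionary translations of thesaurus synonyms, the three evidence sources are assumed to be pre-computed scores in [0, 1] combined linearly with hypothetical weights (0.4, 0.3, 0.3), and all names (`expand_translations`, `candidate_collocations`, `Candidate`, `rank`, `top_k_accuracy`) are hypothetical. It also shows how top-1 and top-3 accuracy figures of the kind reported are computed.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Sequence, Set, Tuple


def expand_translations(word: str, dictionary: Dict[str, List[str]],
                        synonyms: Dict[str, List[str]]) -> Set[str]:
    """Dictionary translations of `word` plus those of its thesaurus synonyms.

    Hypothetical expansion rule, used only to illustrate coverage expansion.
    """
    expanded = set(dictionary.get(word, []))
    for syn in synonyms.get(word, []):
        expanded.update(dictionary.get(syn, []))
    return expanded


def candidate_collocations(w1: str, w2: str, dictionary: Dict[str, List[str]],
                           synonyms: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """All English word pairs built from the expanded translations of w1 and w2."""
    return sorted(product(expand_translations(w1, dictionary, synonyms),
                          expand_translations(w2, dictionary, synonyms)))


@dataclass(frozen=True)
class Candidate:
    english: str          # candidate English collocation, e.g. "make decision"
    corpus_assoc: float   # assumed: association strength in an English monolingual corpus, in [0, 1]
    semantic_sim: float   # assumed: thesaurus-based word similarity, in [0, 1]
    context_sim: float    # assumed: bilingual context similarity, in [0, 1]


def score(c: Candidate, weights: Tuple[float, float, float]) -> float:
    """Weighted linear combination of the three evidence sources (illustrative only)."""
    w_corpus, w_sem, w_ctx = weights
    return w_corpus * c.corpus_assoc + w_sem * c.semantic_sim + w_ctx * c.context_sim


def rank(candidates: Sequence[Candidate],
         weights: Tuple[float, float, float] = (0.4, 0.3, 0.3)) -> List[str]:
    """Candidate collocations ordered from highest to lowest combined score."""
    ordered = sorted(candidates, key=lambda c: score(c, weights), reverse=True)
    return [c.english for c in ordered]


def top_k_accuracy(rankings: Sequence[List[str]], gold: Sequence[str], k: int) -> float:
    """Fraction of test collocations whose reference translation is in the top k."""
    hits = sum(1 for ranked, answer in zip(rankings, gold) if answer in ranked[:k])
    return hits / len(gold)


if __name__ == "__main__":
    cands = [
        Candidate("make decision", 0.9, 0.8, 0.7),
        Candidate("do decision", 0.2, 0.8, 0.3),
        Candidate("take decision", 0.6, 0.7, 0.6),
    ]
    ranked = rank(cands)
    print(ranked)  # ['make decision', 'take decision', 'do decision']
    print(top_k_accuracy([ranked], ["take decision"], k=1))  # 0.0
    print(top_k_accuracy([ranked], ["take decision"], k=3))  # 1.0
```

A linear combination is used here only for readability; the ICT model estimates probabilities, so a faithful implementation would have to follow the paper's own formulation.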
Original language: English
Title of host publication: Computer Processing of Oriental Languages - Beyond the Orient
Subtitle of host publication: The Research Challenges Ahead - 21st International Conference, ICCPOL 2006, Proceedings
Pages: 51-62
Number of pages: 12
DOIs
Publication status: Published - 1 Dec 2006
Event: 21st International Conference on Computer Processing of Oriental Languages: Beyond the Orient: The Research Challenges Ahead, ICCPOL 2006 - Singapore, Singapore
Duration: 17 Dec 2006 to 19 Dec 2006

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4285 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 21st International Conference on Computer Processing of Oriental Languages: Beyond the Orient: The Research Challenges Ahead, ICCPOL 2006
Country/Territory: Singapore
City: Singapore
Period: 17/12/06 to 19/12/06

Keywords

  • Bilingual collocation correspondence
  • Monolingual corpora

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
