Encoder-Decoder Calibration for Multimodal Machine Translation

Turghun Tayir, Lin Li, Bei Li, Jianquan Liu, Kong Aik Lee

Research output: Journal article publication › Journal article › Academic research › peer-review

1 Citation (Scopus)


The main purpose of multimodal machine translation is to improve translation quality by taking the corresponding visual context as an additional input. Recently, many studies in neural machine translation have attempted to obtain high-quality multimodal representations in the encoder or decoder via attention mechanisms. However, an attention mechanism does not always accurately identify the decisive input for each prediction, which leads to unsatisfactory multimodal information fusion. To this end, we propose an encoder-decoder calibration method that automatically calibrates the fused image and text representation in the encoder and finds the decisive input for translation in the decoder. We validate our model on the multimodal machine translation dataset Multi30K. Experimental results show that our method significantly outperforms several recent baselines on both English–German and English–French translation tasks in terms of BLEU and METEOR.

Original language: English
Pages (from-to): 1-9
Number of pages: 9
Journal: IEEE Transactions on Artificial Intelligence
Publication status: Published - Jan 2024


Keywords

  • Calibration
  • Decoding
  • encoder-decoder calibration
  • Feature extraction
  • Fuses
  • Machine translation
  • multimodal fusion
  • multimodal machine translation
  • Transformers
  • visual encoder
  • Visualization

ASJC Scopus subject areas

  • Computer Science Applications
  • Artificial Intelligence


