Learning semantic alignment from image for text-guided image inpainting

Yucheng Xie, Zehang Lin, Zhenguo Yang, Huan Deng, Xingcai Wu, Xudong Mao, Qing Li, Wenyin Liu

Research output: Journal article publication › Journal article › Academic research › peer-review

3 Citations (Scopus)


In this paper, we propose a method called LSAI (learning semantic alignment from image) to recover corrupted image patches in text-guided image inpainting. First, a multimodal preliminary (MP) module is designed to encode global features for images and textual descriptions, where each local image patch and each word is taken into account via multi-head self-attention. Second, non-Euclidean semantic relations between images and textual descriptions are captured in a graph structure by building a semantic relation graph (SRG). By aggregating the semantic relations with graph convolution, the constructed SRG identifies the words that meaningfully describe the image content and alleviates the impact of distracting words. In addition, a text-image matching loss is devised to penalize restored images whose visual semantics are inconsistent with the textual semantics. Quantitative and qualitative experiments on two public datasets show that the proposed LSAI outperforms existing methods (e.g., the FID value is reduced from 30.87 to 16.73 on the CUB-200-2011 dataset).
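As a rough illustration of the graph-convolution idea mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes word and patch features are plain vectors, builds a hypothetical word-to-patch relation graph from clipped cosine similarity, and applies one aggregation step of the form ReLU(A X W). The function names (`build_relation_graph`, `graph_conv`), the similarity measure, and the toy features are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def build_relation_graph(word_feats, patch_feats):
    """Hypothetical word-to-patch adjacency: negative similarities are
    clipped to zero and each row is normalized to sum to 1 (or left all zero)."""
    adj = []
    for w in word_feats:
        sims = [max(cosine(w, p), 0.0) for p in patch_feats]
        total = sum(sims)
        adj.append([s / total if total else 0.0 for s in sims])
    return adj

def graph_conv(adj, patch_feats, weight):
    """One graph-convolution step: ReLU(A @ X @ W), written with plain lists."""
    dim_in = len(patch_feats[0])
    dim_out = len(weight[0])
    out = []
    for row in adj:
        # Aggregate patch features weighted by the word's edges.
        agg = [sum(a * p[k] for a, p in zip(row, patch_feats)) for k in range(dim_in)]
        # Linear projection followed by ReLU.
        proj = [sum(agg[k] * weight[k][j] for k in range(dim_in)) for j in range(dim_out)]
        out.append([max(x, 0.0) for x in proj])
    return out

# Toy data (assumed): two words aligned with the patches, one distracting word.
word_feats = [[1, 0], [0, 1], [-1, -1]]
patch_feats = [[1, 0], [0, 1]]
identity = [[1, 0], [0, 1]]

adj = build_relation_graph(word_feats, patch_feats)
word_out = graph_conv(adj, patch_feats, identity)
```

In this toy setting the third word has no positive similarity to any patch, so its row in the adjacency is all zeros and it contributes nothing after aggregation, which loosely mirrors the abstract's point about reducing the impact of distracting words.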

Original language: English
Pages (from-to): 3149-3161
Number of pages: 13
Journal: Visual Computer
Issue number: 9-10
Publication status: Published - Sept 2022


Keywords

  • Generative adversarial networks
  • Graph convolution
  • Text-guided image inpainting

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
  • Computer Graphics and Computer-Aided Design

