Multi-branch Semantic Learning Network for Text-to-Image Synthesis

Jiading Ling, Xingcai Wu, Zhenguo Yang, Xudong Mao, Qing Li, Wenyin Liu

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

Abstract

In this paper, we propose a multi-branch semantic learning network (MSLN) that generates images from textual descriptions by taking into account both global and local textual semantics. The network consists of two stages. The first stage generates a coarse-grained image based on the sentence features. In the second stage, a multi-branch fine-grained generation model injects the sentence-level and word-level semantics into two coarse-grained images through global and local attention modules, which generate global and local fine-grained image textures, respectively. In particular, we devise a channel fusion module (CFM) to fuse the global and local fine-grained features in the multi-branch fine-grained stage and generate the output image. Extensive experiments on the CUB-200 and Oxford-102 datasets demonstrate the superior performance of the proposed method (e.g., FID is reduced from 16.09 to 14.43 on CUB-200).
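
The abstract does not say how the CFM combines the two branches. As a rough illustration of what fusing along channels can look like, the sketch below assumes one common pattern: concatenate the global and local feature maps along the channel axis, then mix the channels at each pixel with a 1x1 convolution. The function names, the weight matrix, and the nested-list layout `[channel][row][col]` are all hypothetical and chosen to keep the example dependency-free. None of it is taken from the paper.

```python
# Hypothetical sketch: NOT the paper's CFM, whose internals the abstract omits.
# Feature maps are plain nested lists indexed as [channel][row][col].


def concat_channels(global_feat, local_feat):
    """Stack two feature maps along the channel axis."""
    return global_feat + local_feat


def conv1x1(feat, weights):
    """Mix channels independently at every pixel (1x1 convolution, no bias).

    weights[o][i] is the contribution of input channel i to output channel o.
    """
    height, width = len(feat[0]), len(feat[0][0])
    return [
        [
            [
                sum(w * feat[i][r][c] for i, w in enumerate(out_weights))
                for c in range(width)
            ]
            for r in range(height)
        ]
        for out_weights in weights
    ]


def fuse(global_feat, local_feat, weights):
    """Assumed fusion: channel concatenation followed by a 1x1 convolution."""
    same_height = len(global_feat[0]) == len(local_feat[0])
    same_width = len(global_feat[0][0]) == len(local_feat[0][0])
    if not (same_height and same_width):
        raise ValueError("global and local feature maps must share height and width")
    stacked = concat_channels(global_feat, local_feat)
    if any(len(out_weights) != len(stacked) for out_weights in weights):
        raise ValueError("each weight row needs one entry per stacked channel")
    return conv1x1(stacked, weights)


# Toy inputs: one global channel and one local channel, each 2x2.
global_feat = [[[1, 2], [3, 4]]]
local_feat = [[[10, 20], [30, 40]]]

# One output channel that averages the two input channels.
fused = fuse(global_feat, local_feat, weights=[[0.5, 0.5]])
print(fused)
```

In a trained model the mixing weights would be learned rather than fixed, and the feature maps would be tensors in a deep-learning framework. The fixed averaging weights here only make the data flow easy to follow.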

Original language: English
Title of host publication: Proceedings of the 3rd ACM International Conference on Multimedia in Asia, MMAsia 2021
Publisher: Association for Computing Machinery
Pages: 1-5
ISBN (Electronic): 9781450386074
DOIs
Publication status: Published - 1 Dec 2021
Event: 3rd ACM International Conference on Multimedia in Asia, MMAsia 2021 - Virtual, Online, Australia
Duration: 1 Dec 2021 to 3 Dec 2021

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 3rd ACM International Conference on Multimedia in Asia, MMAsia 2021
Country/Territory: Australia
City: Virtual, Online
Period: 1/12/21 to 3/12/21

Keywords

  • Feature fusion
  • Global and local semantic
  • Multi-branch networks
  • Text-to-image Synthesis

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications
