TY - GEN
T1 - Multi-branch Semantic Learning Network for Text-to-Image Synthesis
AU - Ling, Jiading
AU - Wu, Xingcai
AU - Yang, Zhenguo
AU - Mao, Xudong
AU - Li, Qing
AU - Liu, Wenyin
N1 - Publisher Copyright: © 2021 ACM.
PY - 2021/12/1
Y1 - 2021/12/1
N2 - In this paper, we propose a two-stage multi-branch semantic learning network (MSLN) that generates images from textual descriptions by taking into account both global and local textual semantics. The first stage generates a coarse-grained image based on the sentence features. In the second stage, a multi-branch fine-grained generation model injects the sentence-level and word-level semantics into two coarse-grained images through global and local attention modules, which generate global and local fine-grained image textures, respectively. In particular, we devise a channel fusion module (CFM) that fuses the global and local fine-grained features in the multi-branch fine-grained stage and generates the output image. Extensive experiments on the CUB-200 and Oxford-102 datasets demonstrate the superior performance of the proposed method (e.g., FID is reduced from 16.09 to 14.43 on CUB-200).
AB - In this paper, we propose a two-stage multi-branch semantic learning network (MSLN) that generates images from textual descriptions by taking into account both global and local textual semantics. The first stage generates a coarse-grained image based on the sentence features. In the second stage, a multi-branch fine-grained generation model injects the sentence-level and word-level semantics into two coarse-grained images through global and local attention modules, which generate global and local fine-grained image textures, respectively. In particular, we devise a channel fusion module (CFM) that fuses the global and local fine-grained features in the multi-branch fine-grained stage and generates the output image. Extensive experiments on the CUB-200 and Oxford-102 datasets demonstrate the superior performance of the proposed method (e.g., FID is reduced from 16.09 to 14.43 on CUB-200).
KW - Feature fusion
KW - Global and local semantic
KW - Multi-branch networks
KW - Text-to-image synthesis
UR - http://www.scopus.com/inward/record.url?scp=85123052818&partnerID=8YFLogxK
U2 - 10.1145/3469877.3490567
DO - 10.1145/3469877.3490567
M3 - Conference article published in proceeding or book
AN - SCOPUS:85123052818
T3 - ACM International Conference Proceeding Series
SP - 1
EP - 5
BT - Proceedings of the 3rd ACM International Conference on Multimedia in Asia, MMAsia 2021
PB - Association for Computing Machinery
T2 - 3rd ACM International Conference on Multimedia in Asia, MMAsia 2021
Y2 - 1 December 2021 through 3 December 2021
ER -