TY - GEN
T1 - Dilated Residual Aggregation Network for Text-Guided Image Manipulation
AU - Lu, Siwei
AU - Luo, Di
AU - Yang, Zhenguo
AU - Hao, Tianyong
AU - Li, Qing
AU - Liu, Wenyin
N1 - Publisher Copyright:
© 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
N2 - Text-guided image manipulation aims to modify the visual attributes of images according to textual descriptions. Existing works either mismatch between generated images and textual descriptions or may pollute the text-irrelevant image regions. In this paper, we propose a dilated residual aggregation network (denoted as DRA) for text-guided image manipulation, which exploits a long-distance residual with dilated convolutions (RD) to aggregate the encoded visual content and style features and the textual features of the guiding descriptions. In particular, the dilated convolutions increase the receptive field without sacrificing spatial resolutions of intermediate features, benefiting to reconstructing the texture details matching with the textual descriptions. Furthermore, we propose an attention-guided injection module (AIM) to inject textual semantics into feature maps of DRA without polluting the text-irrelevant image regions by combining triplet attention mechanism and central biasing instance normalization. Quantitative and qualitative experiments conducted on the CUB-200-2011 and Oxford-102 datasets demonstrate the superior performance of the proposed DRA.
AB - Text-guided image manipulation aims to modify the visual attributes of images according to textual descriptions. Existing works either mismatch between generated images and textual descriptions or may pollute the text-irrelevant image regions. In this paper, we propose a dilated residual aggregation network (denoted as DRA) for text-guided image manipulation, which exploits a long-distance residual with dilated convolutions (RD) to aggregate the encoded visual content and style features and the textual features of the guiding descriptions. In particular, the dilated convolutions increase the receptive field without sacrificing spatial resolutions of intermediate features, benefiting to reconstructing the texture details matching with the textual descriptions. Furthermore, we propose an attention-guided injection module (AIM) to inject textual semantics into feature maps of DRA without polluting the text-irrelevant image regions by combining triplet attention mechanism and central biasing instance normalization. Quantitative and qualitative experiments conducted on the CUB-200-2011 and Oxford-102 datasets demonstrate the superior performance of the proposed DRA.
KW - Attention-guide injection
KW - Long-distance residual
KW - Text-guided image manipulation
UR - http://www.scopus.com/inward/record.url?scp=85115674659&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-86365-4_3
DO - 10.1007/978-3-030-86365-4_3
M3 - Conference article published in proceeding or book
AN - SCOPUS:85115674659
SN - 9783030863647
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 28
EP - 40
BT - Artificial Neural Networks and Machine Learning – ICANN 2021 - 30th International Conference on Artificial Neural Networks, Proceedings
A2 - Farkaš, Igor
A2 - Masulli, Paolo
A2 - Otte, Sebastian
A2 - Wermter, Stefan
PB - Springer Science and Business Media Deutschland GmbH
T2 - 30th International Conference on Artificial Neural Networks, ICANN 2021
Y2 - 14 September 2021 through 17 September 2021
ER -