TY - GEN
T1 - Adversarial Learning with Mask Reconstruction for Text-Guided Image Inpainting
AU - Wu, Xingcai
AU - Xie, Yucheng
AU - Zeng, Jiaqi
AU - Yang, Zhenguo
AU - Yu, Yi
AU - Li, Qing
AU - Liu, Wenyin
N1 - Publisher Copyright:
© 2021 ACM.
PY - 2021/10/17
Y1 - 2021/10/17
AB - Text-guided image inpainting aims to complete corrupted patches of an image so that they are coherent with both the visual and textual context. On the one hand, existing works focus on the pixels surrounding the corrupted patches without considering the objects in the image, so that characteristics of objects described in the text are painted onto non-object regions. On the other hand, redundant information in the text may distract from the generation of the objects of interest in the restored image. In this paper, we propose an adversarial learning framework with mask reconstruction (ALMR) for image inpainting with textual guidance, which consists of a two-stage generator and dual discriminators. The two stages of the generator restore a coarse-grained and a fine-grained image, respectively. In particular, we devise a dual-attention module (DAM) that incorporates word-level and sentence-level textual features as guidance for generating the coarse-grained and fine-grained details in the two stages. Furthermore, we design a mask reconstruction module (MRM) that penalizes the restoration of the objects of interest using the given textual descriptions of those objects. For adversarial training, we exploit global and local discriminators for the whole image and the corrupted patches, respectively. Extensive experiments on CUB-200-2011, Oxford-102 and CelebA-HQ show that the proposed ALMR outperforms state-of-the-art approaches (e.g., the FID value on CUB-200-2011 is reduced from 29.69 to 14.69 relative to the state-of-the-art approach). Code is available at https://github.com/GaranWu/ALMR
KW - object mask
KW - text-guided image inpainting
KW - textual and visual semantics
UR - http://www.scopus.com/inward/record.url?scp=85119323494&partnerID=8YFLogxK
U2 - 10.1145/3474085.3475506
DO - 10.1145/3474085.3475506
M3 - Conference article published in proceedings or book
AN - SCOPUS:85119323494
T3 - MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia
SP - 3464
EP - 3472
BT - MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia
PB - Association for Computing Machinery, Inc
T2 - 29th ACM International Conference on Multimedia, MM 2021
Y2 - 20 October 2021 through 24 October 2021
ER -