TY - GEN
T1 - MasterWeaver
T2 - 18th European Conference on Computer Vision, ECCV 2024
AU - Wei, Yuxiang
AU - Ji, Zhilong
AU - Bai, Jinfeng
AU - Zhang, Hongzhi
AU - Zhang, Lei
AU - Zuo, Wangmeng
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
N2 - Text-to-image (T2I) diffusion models have shown significant success in personalized text-to-image generation, which aims to generate novel images with human identities indicated by the reference images. Despite promising identity fidelity has been achieved by several tuning-free methods, they often suffer from overfitting issues. The learned identity tends to entangle with irrelevant information, resulting in unsatisfied text controllability, especially on faces. In this work, we present MasterWeaver, a test-time tuning-free method designed to generate personalized images with both high identity fidelity and flexible editability. Specifically, MasterWeaver adopts an encoder to extract identity features and steers the image generation through additionally introduced cross attention. To improve editability while maintaining identity fidelity, we propose an editing direction loss for training, which aligns the editing directions of our MasterWeaver with those of the original T2I model. Additionally, a face-augmented dataset is constructed to facilitate disentangled identity learning, and further improve the editability. Extensive experiments demonstrate that our MasterWeaver can not only generate personalized images with faithful identity, but also exhibit superiority in text controllability. Our code can be found at https://github.com/csyxwei/MasterWeaver.
AB - Text-to-image (T2I) diffusion models have shown significant success in personalized text-to-image generation, which aims to generate novel images with human identities indicated by the reference images. Despite promising identity fidelity has been achieved by several tuning-free methods, they often suffer from overfitting issues. The learned identity tends to entangle with irrelevant information, resulting in unsatisfied text controllability, especially on faces. In this work, we present MasterWeaver, a test-time tuning-free method designed to generate personalized images with both high identity fidelity and flexible editability. Specifically, MasterWeaver adopts an encoder to extract identity features and steers the image generation through additionally introduced cross attention. To improve editability while maintaining identity fidelity, we propose an editing direction loss for training, which aligns the editing directions of our MasterWeaver with those of the original T2I model. Additionally, a face-augmented dataset is constructed to facilitate disentangled identity learning, and further improve the editability. Extensive experiments demonstrate that our MasterWeaver can not only generate personalized images with faithful identity, but also exhibit superiority in text controllability. Our code can be found at https://github.com/csyxwei/MasterWeaver.
KW - Editability
KW - Identity Preservation
KW - Personalized Text-to-Image Generation
UR - http://www.scopus.com/inward/record.url?scp=85209939695&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-72983-6_15
DO - 10.1007/978-3-031-72983-6_15
M3 - Conference article published in proceeding or book
AN - SCOPUS:85209939695
SN - 9783031729829
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 252
EP - 271
BT - Computer Vision – ECCV 2024 - 18th European Conference, Proceedings
A2 - Leonardis, Aleš
A2 - Ricci, Elisa
A2 - Roth, Stefan
A2 - Russakovsky, Olga
A2 - Sattler, Torsten
A2 - Varol, Gül
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 29 September 2024 through 4 October 2024
ER -