TY - GEN
T1 - Responsible Visual Editing
AU - Ni, Minheng
AU - Shen, Yeli
AU - Zhang, Lei
AU - Zuo, Wangmeng
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
N2 - With the recent advancements in visual synthesis, there is a growing risk of encountering synthesized images with detrimental effects, such as hate, discrimination, and privacy violations. Unfortunately, it remains unexplored on how to avoid synthesizing harmful images and convert them into responsible ones. In this paper, we present responsible visual editing, which edits risky concepts within an image to more responsible ones with minimal content changes. However, the concepts that need to be edited are often abstract, making them hard to be located and edited. To tackle these challenges, we propose a Cognitive Editor (CoEditor) by harnessing the large multimodal models through a two-stage cognitive process: (1) a perceptual cognitive process to locate what to be edited and (2) a behavioral cognitive process to strategize how to edit. To mitigate the negative implications of harmful images on research, we build a transparent and public dataset, namely AltBear, which expresses harmful information using teddy bears instead of humans. Experiments demonstrate that CoEditor can effectively comprehend abstract concepts in complex scenes, significantly surpassing the baseline models for responsible visual editing. Moreover, we find that the AltBear dataset corresponds well to the harmful content found in real images, providing a safe and effective benchmark for future research. Our source code and dataset can be found at https://github.com/kodenii/Responsible-Visual-Editing.
AB - With the recent advancements in visual synthesis, there is a growing risk of encountering synthesized images with detrimental effects, such as hate, discrimination, and privacy violations. Unfortunately, it remains unexplored on how to avoid synthesizing harmful images and convert them into responsible ones. In this paper, we present responsible visual editing, which edits risky concepts within an image to more responsible ones with minimal content changes. However, the concepts that need to be edited are often abstract, making them hard to be located and edited. To tackle these challenges, we propose a Cognitive Editor (CoEditor) by harnessing the large multimodal models through a two-stage cognitive process: (1) a perceptual cognitive process to locate what to be edited and (2) a behavioral cognitive process to strategize how to edit. To mitigate the negative implications of harmful images on research, we build a transparent and public dataset, namely AltBear, which expresses harmful information using teddy bears instead of humans. Experiments demonstrate that CoEditor can effectively comprehend abstract concepts in complex scenes, significantly surpassing the baseline models for responsible visual editing. Moreover, we find that the AltBear dataset corresponds well to the harmful content found in real images, providing a safe and effective benchmark for future research. Our source code and dataset can be found at https://github.com/kodenii/Responsible-Visual-Editing.
KW - Image editing
KW - Large multimodal model
KW - Responsible visual editing
UR - http://www.scopus.com/inward/record.url?scp=85206365402&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-72670-5_18
DO - 10.1007/978-3-031-72670-5_18
M3 - Conference article published in proceeding or book
AN - SCOPUS:85206365402
SN - 9783031726699
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 314
EP - 330
BT - Computer Vision – ECCV 2024 - 18th European Conference, Proceedings
A2 - Leonardis, Aleš
A2 - Ricci, Elisa
A2 - Roth, Stefan
A2 - Russakovsky, Olga
A2 - Sattler, Torsten
A2 - Varol, Gül
PB - Springer Science and Business Media Deutschland GmbH
T2 - 18th European Conference on Computer Vision, ECCV 2024
Y2 - 29 September 2024 through 4 October 2024
ER -