Abstract
Advances in image editing models have enabled intelligent and rapid fashion customization. Vision-guided editing models, in particular, promise more precise and flexible control over fine-grained garment attributes. However, existing methods are limited to coarse-grained edits and cannot perform attribute-level manipulation, which restricts the flexibility and composability that fashion customization requires. To address this limitation, this paper proposes a Vision-Guided Fashion Fine-Grained Attribute Editing (VFFAE) framework, which uses visual references to customize both the style and the structure of fine-grained garment regions. VFFAE comprises three key components: 1) a text-driven fashion fine-grained attribute segmenter that incorporates garment keypoints as spatial priors, applies deformable attention to enhance spatial perception, and uses CLIP-based multimodal alignment for accurate segmentation; 2) a clothing attribute disentanglement module based on orthogonal subspace projection of CLIP embeddings, which explicitly separates style and structure attributes in a zero-shot manner; and 3) a conditional diffusion pipeline that uses the disentangled representations of the segmented regions to fine-tune a pretrained Stable Diffusion model under classifier-free guidance, enabling controllable attribute editing. Experiments on multiple public datasets show that VFFAE outperforms state-of-the-art methods, and ablation studies confirm the effectiveness of its segmentation and disentanglement modules, establishing VFFAE as a practical solution for high-fidelity, attribute-level fashion customization.
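The abstract does not specify how the segmenter's deformable attention is parameterized. As a rough illustration of the general technique (in the style of Deformable DETR, simplified to one head and one feature scale), the sketch below lets a single query attend to K bilinearly sampled points around a reference location. The names `bilinear_sample` and `deformable_attend`, the toy feature map, and the idea of using a garment keypoint as the reference point are assumptions for illustration, not the paper's implementation; in a trained model the offsets and attention logits would be predicted from the query feature rather than passed in by hand.

```python
import math
from typing import List, Sequence, Tuple

FeatureMap = List[List[List[float]]]  # indexed [y][x][channel]


def bilinear_sample(fmap: FeatureMap, x: float, y: float) -> List[float]:
    """Bilinearly interpolate fmap at fractional (x, y); out-of-bounds neighbors contribute zeros."""
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    x0, y0 = math.floor(x), math.floor(y)
    out = [0.0] * c
    for yi, wy in ((y0, 1.0 - (y - y0)), (y0 + 1, y - y0)):
        for xi, wx in ((x0, 1.0 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yi < h and 0 <= xi < w:
                out = [o + wx * wy * v for o, v in zip(out, fmap[yi][xi])]
    return out


def deformable_attend(
    fmap: FeatureMap,
    ref: Tuple[float, float],
    offsets: Sequence[Tuple[float, float]],
    logits: Sequence[float],
) -> List[float]:
    """Single-head, single-scale deformable attention for one query.

    Samples K points at ref + offset_k and mixes them with softmax(logits) weights.
    (Hypothetical sketch: offsets and logits are given, not learned.)
    """
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]  # numerically stable softmax numerators
    total = sum(exps)
    out = [0.0] * len(fmap[0][0])
    for (dx, dy), e in zip(offsets, exps):
        v = bilinear_sample(fmap, ref[0] + dx, ref[1] + dy)
        out = [o + (e / total) * vi for o, vi in zip(out, v)]
    return out


# Toy 2x2 single-channel map where fmap[y][x] = [2*y + x].
fmap = [[[0.0], [1.0]], [[2.0], [3.0]]]
print(bilinear_sample(fmap, 0.5, 0.5))  # [1.5]
# Reference point (0, 0), e.g. a keypoint location (assumption); two sampling offsets, equal weights.
print(deformable_attend(fmap, (0.0, 0.0), [(0.0, 0.0), (1.0, 1.0)], [0.0, 0.0]))  # [1.5]
```

Sampling at learned fractional offsets, rather than on a fixed grid, is what lets such a module concentrate on a few informative locations near a garment part.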
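The abstract names orthogonal subspace projection of CLIP embeddings as the disentanglement mechanism but gives no formulation. A minimal sketch under one plausible reading: assume a set of hypothetical `structure_dirs` vectors (for example, CLIP text embeddings of structure-related prompts) spans a "structure" subspace; the orthogonal projection of a region embedding onto that span is treated as its structure component, and the residual, which is orthogonal to the span, as its style component. The helpers `gram_schmidt` and `split_by_subspace` are illustrative names, and toy 3-D vectors stand in for real CLIP embeddings.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def gram_schmidt(directions: List[Vector], eps: float = 1e-10) -> List[Vector]:
    """Orthonormal basis for span(directions); numerically dependent directions are dropped."""
    basis: List[Vector] = []
    for v in directions:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > eps:
            basis.append([wi / norm for wi in w])
    return basis


def split_by_subspace(embedding: Vector, structure_dirs: List[Vector]) -> Tuple[Vector, Vector]:
    """Split an embedding into (structure, style).

    structure: orthogonal projection onto span(structure_dirs).
    style: the residual, orthogonal to that span; structure + style == embedding.
    """
    structure = [0.0] * len(embedding)
    for q in gram_schmidt(structure_dirs):
        c = dot(embedding, q)
        structure = [s + c * qi for s, qi in zip(structure, q)]
    style = [e - s for e, s in zip(embedding, structure)]
    return structure, style


# Toy 3-D "embedding"; real CLIP embeddings are much higher-dimensional.
structure, style = split_by_subspace([3.0, 4.0, 5.0], [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
print(structure, style)  # [3.0, 4.0, 0.0] [0.0, 0.0, 5.0]
```

Because the two components are orthogonal and sum exactly to the input, the split itself discards no information; whether the components correspond cleanly to style and structure depends entirely on how the spanning directions are chosen.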
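The third component fine-tunes Stable Diffusion "under classifier-free guidance". In the standard formulation of that technique, the conditioning is randomly replaced with a null condition during training, and at sampling time the conditional and unconditional noise predictions are combined as eps_hat = eps_uncond + w * (eps_cond - eps_uncond). The sketch below shows both steps on flat lists standing in for noise tensors; the dropout probability `p_uncond`, the guidance scale `w`, and the function names are illustrative assumptions, since the abstract reports none of them.

```python
import random
from typing import List, Optional, Sequence


def drop_condition(
    cond: Sequence[float], p_uncond: float, rng: random.Random
) -> Optional[List[float]]:
    """Training-time condition dropout: with probability p_uncond return None (the null
    condition), so one network learns both conditional and unconditional noise prediction."""
    return None if rng.random() < p_uncond else list(cond)


def cfg_combine(
    eps_uncond: Sequence[float], eps_cond: Sequence[float], guidance_scale: float
) -> List[float]:
    """Sampling-time guidance: eps_uncond + w * (eps_cond - eps_uncond), elementwise."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


rng = random.Random(0)
print(drop_condition([0.2, 0.4], p_uncond=0.0, rng=rng))  # [0.2, 0.4] (never dropped)
print(cfg_combine([0.0, 1.0], [1.0, 1.0], guidance_scale=7.5))  # [7.5, 1.0]
```

With `guidance_scale=1.0` the combination reduces to the conditional prediction, and with `0.0` to the unconditional one; values above 1 push samples further toward the condition.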
| Original language | English |
|---|---|
| Pages (from-to) | 6152-6167 |
| Number of pages | 16 |
| Journal | IEEE Transactions on Circuits and Systems for Video Technology |
| Volume | 36 |
| Issue number | 5 |
| DOIs | |
| Publication status | Published - May 2026 |
Keywords
- attribute disentanglement
- classifier-free guidance
- deformable attention
- Fashion editing
- fine-grained attribute segmentation
- orthogonal subspace projection
ASJC Scopus subject areas
- Media Technology
- Electrical and Electronic Engineering
Fingerprint
Dive into the research topics of 'Vision-Guided Fashion Fine-Grained Attribute Editing via Semantic Segmentation and Disentangled Representation'. Together they form a unique fingerprint.