
Vision-Guided Fashion Fine-Grained Attribute Editing via Semantic Segmentation and Disentangled Representation

Research output: Journal article publication › Journal article › Academic research › peer-review

Abstract

Advances in image editing models have enabled intelligent, rapid fashion customization. Vision-guided editing models, in particular, can offer more precise and flexible control over fine-grained garment attributes. However, existing methods are limited to coarse-grained edits and cannot manipulate individual attributes, which restricts the flexibility and composability that fashion customization requires. To address this limitation, this paper proposes a Vision-Guided Fashion Fine-Grained Attribute Editing (VFFAE) framework, which uses visual references to edit both the style and the structure of fine-grained garment regions. VFFAE comprises three key components: 1) a text-driven fashion fine-grained attribute segmenter that uses garment keypoints as spatial priors, applies deformable attention to enhance spatial perception, and relies on CLIP-based multimodal alignment for accurate segmentation; 2) a clothing attribute disentanglement module based on orthogonal subspace projection of CLIP embeddings, which explicitly separates style and structure attributes in a zero-shot manner; and 3) a conditional diffusion pipeline that fine-tunes a pretrained Stable Diffusion model, conditioned on the disentangled representations of the segmented regions, under classifier-free guidance, enabling controllable attribute editing. Experiments on multiple public datasets show that VFFAE outperforms state-of-the-art methods, and ablation studies confirm the effectiveness of its segmentation and disentanglement modules, establishing it as a practical solution for high-fidelity, attribute-level fashion customization.
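The abstract does not specify how the style and structure subspaces are constructed. The sketch below illustrates only the generic orthogonal-subspace-projection idea, under stated assumptions: a small set of hypothetical "structure" direction vectors (for example, CLIP text embeddings of structural descriptions) spans the structure subspace, and the style component is taken as its orthogonal complement. Random vectors stand in for real CLIP embeddings, and `structure_basis` and `split_style_structure` are illustrative names, not the authors' code.

```python
import numpy as np


def structure_basis(structure_dirs: np.ndarray) -> np.ndarray:
    """Orthonormal basis (d x k) for the span of k structure directions given as rows (k x d).

    Assumes the k directions are linearly independent and k <= d.
    """
    q, _ = np.linalg.qr(structure_dirs.T)  # reduced QR: q has shape (d, k)
    return q


def split_style_structure(embedding: np.ndarray, structure_dirs: np.ndarray):
    """Split an embedding into orthogonal structure and style parts.

    structure = P e, with P = Q Q^T (projection onto the structure subspace)
    style     = (I - P) e (projection onto its orthogonal complement)
    """
    q = structure_basis(structure_dirs)
    structure = q @ (q.T @ embedding)
    style = embedding - structure
    return style, structure


# Stand-in data: random vectors play the role of CLIP embeddings (d = 512, an assumption).
rng = np.random.default_rng(0)
structure_dirs = rng.standard_normal((4, 512))  # hypothetical structure anchor directions
embedding = rng.standard_normal(512)
embedding /= np.linalg.norm(embedding)          # CLIP embeddings are commonly L2-normalized

style, structure = split_style_structure(embedding, structure_dirs)
print(f"style . structure = {style @ structure:.2e}")  # close to zero up to rounding error
```

Because the two parts are orthogonal and sum to the original vector, the style component of one region could in principle be recombined with the structure component of another; the abstract does not state whether VFFAE recombines them in exactly this way.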

Original language: English
Pages (from-to): 6152-6167
Number of pages: 16
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 36
Issue number: 5
Publication status: Published - May 2026

Keywords

  • attribute disentanglement
  • classifier-free guidance
  • deformable attention
  • Fashion editing
  • fine-grained attribute segmentation
  • orthogonal subspace projection
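Classifier-free guidance is the scheme under which the abstract's conditional diffusion pipeline is fine-tuned, but no hyperparameters are given. The sketch below shows the standard recipe in NumPy, as an illustration rather than the authors' implementation: during training, each sample's condition is replaced by a null embedding with some probability, so one model learns both conditional and unconditional predictions; at sampling time the two noise predictions are combined as eps_hat = eps_u + w * (eps_c - eps_u). The dropout probability, guidance scale, array shapes, the zero null embedding, and the random arrays standing in for denoiser outputs are all assumptions.

```python
import numpy as np


def drop_conditions(cond: np.ndarray, null_cond: np.ndarray, p_uncond: float,
                    rng: np.random.Generator):
    """Training-time condition dropout for classifier-free guidance.

    Each row of `cond` (batch x dim) is replaced by `null_cond` with probability `p_uncond`.
    Returns the new conditions and the boolean mask of dropped rows.
    """
    mask = rng.random(cond.shape[0]) < p_uncond
    out = cond.copy()
    out[mask] = null_cond  # broadcasts the (dim,) null embedding over the dropped rows
    return out, mask


def cfg_combine(eps_uncond: np.ndarray, eps_cond: np.ndarray, guidance_scale: float) -> np.ndarray:
    """Sampling-time guidance: eps_hat = eps_uncond + w * (eps_cond - eps_uncond)."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


rng = np.random.default_rng(0)
cond = rng.standard_normal((8, 16))  # stand-in for per-sample condition embeddings
null_cond = np.zeros(16)             # assumed "empty" condition
dropped, mask = drop_conditions(cond, null_cond, p_uncond=0.5, rng=rng)

# Stand-ins for a denoiser's unconditional and conditional noise predictions (latent-shaped).
eps_u = rng.standard_normal((8, 4, 32, 32))
eps_c = rng.standard_normal((8, 4, 32, 32))
eps_hat = cfg_combine(eps_u, eps_c, guidance_scale=7.5)  # scale chosen for illustration
```

With w = 1 the combination reduces to the purely conditional prediction and with w = 0 to the unconditional one; values above 1 push samples further toward the condition, typically trading diversity for fidelity.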

ASJC Scopus subject areas

  • Media Technology
  • Electrical and Electronic Engineering
