
DiffFit: Disentangled Garment Warping and Texture Refinement for Virtual Try-On

Source: arXiv
English Abstract

Virtual try-on (VTON) aims to synthesize realistic images of a person wearing a target garment, with broad applications in e-commerce and digital fashion. While recent advances in latent diffusion models have substantially improved visual quality, existing approaches still struggle with preserving fine-grained garment details, achieving precise garment-body alignment, maintaining inference efficiency, and generalizing to diverse poses and clothing styles. To address these challenges, we propose DiffFit, a novel two-stage latent diffusion framework for high-fidelity virtual try-on. DiffFit adopts a progressive generation strategy: the first stage performs geometry-aware garment warping, aligning the garment with the target body through fine-grained deformation and pose adaptation. The second stage refines texture fidelity via a cross-modal conditional diffusion model that integrates the warped garment, the original garment appearance, and the target person image for high-quality rendering. By decoupling geometric alignment and appearance refinement, DiffFit effectively reduces task complexity and enhances both generation stability and visual realism. It excels in preserving garment-specific attributes such as textures, wrinkles, and lighting, while ensuring accurate alignment with the human body. Extensive experiments on large-scale VTON benchmarks demonstrate that DiffFit achieves superior performance over existing state-of-the-art methods in both quantitative metrics and perceptual evaluations.
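The two-stage design described above can be sketched in code. This is a minimal, hypothetical skeleton of the pipeline, not the paper's implementation: all class and function names (`TryOnInputs`, `warp_garment`, `refine_texture`, `diff_fit`) are illustrative assumptions, and the stage bodies are trivial stand-ins for the actual warping network and conditional diffusion model.

```python
# Hypothetical sketch of DiffFit's two-stage pipeline as described in the
# abstract. Names and stage internals are assumptions for illustration only.
from dataclasses import dataclass


@dataclass
class TryOnInputs:
    garment: list  # flat garment image (stand-in for an image tensor)
    person: list   # target person image
    pose: list     # pose / body representation


def warp_garment(garment, pose):
    """Stage 1: geometry-aware garment warping.

    Aligns the garment with the target body via deformation and pose
    adaptation. Here a trivial stand-in: blend garment values with pose cues.
    """
    return [g + 0.1 * p for g, p in zip(garment, pose)]


def refine_texture(warped, garment, person):
    """Stage 2: cross-modal conditional texture refinement.

    Fuses the warped garment, the original garment appearance, and the
    target person image into the final rendering. Stand-in: simple average.
    """
    return [(w + g + p) / 3.0 for w, g, p in zip(warped, garment, person)]


def diff_fit(inputs: TryOnInputs):
    # Decoupling: geometric alignment first, appearance refinement second.
    warped = warp_garment(inputs.garment, inputs.pose)
    return refine_texture(warped, inputs.garment, inputs.person)


result = diff_fit(TryOnInputs(garment=[1.0, 2.0],
                              person=[0.5, 0.5],
                              pose=[0.0, 1.0]))
```

The point of the sketch is the interface, not the math: stage 2 conditions on both the warped garment and the original garment appearance, which is how the framework can recover fine textures that warping alone would blur.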

Xiang Xu

Subject: Computing Technology, Computer Technology

Xiang Xu. DiffFit: Disentangled Garment Warping and Texture Refinement for Virtual Try-On [EB/OL]. (2025-06-29) [2025-07-16]. https://arxiv.org/abs/2506.23295.
