RealVVT: Towards Photorealistic Video Virtual Try-on via Spatio-Temporal Consistency

Abstract

Virtual try-on has emerged as a pivotal task at the intersection of computer vision and fashion, aimed at digitally simulating how clothing items fit on the human body. Despite notable progress in single-image virtual try-on (VTO), current methodologies often struggle to preserve a consistent and authentic appearance of clothing across extended video sequences. This challenge arises from the complexities of capturing dynamic human pose and maintaining target clothing characteristics. We leverage pre-existing video foundation models to introduce RealVVT, a photoRealistic Video Virtual Try-on framework tailored to bolster stability and realism within dynamic video contexts. Our methodology encompasses a Clothing & Temporal Consistency strategy, an Agnostic-guided Attention Focus Loss mechanism to ensure spatial consistency, and a Pose-guided Long Video VTO technique adept at handling extended video sequences. Extensive experiments across various datasets confirm that our approach outperforms existing state-of-the-art models in both single-image and video VTO tasks, offering a viable solution for practical applications within the realms of fashion e-commerce and virtual fitting environments.

Authors: Xiaowei Chi, Zhihong Liu, Haoqian Wang, Zhengkai Jiang, Siqi Li, Jiawei Zhou

Subject classification: Garment and footwear industry; computing and computer technology

Recommended citation: Xiaowei Chi, Zhihong Liu, Haoqian Wang, Zhengkai Jiang, Siqi Li, Jiawei Zhou. RealVVT: Towards Photorealistic Video Virtual Try-on via Spatio-Temporal Consistency[EB/OL]. (2025-01-15)[2025-06-14]. https://arxiv.org/abs/2501.08682.
