SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation
Leveraging the diffusion transformer (DiT) architecture, models like Sora, CogVideoX, and Wan have achieved remarkable progress in text-to-video, image-to-video, and video editing tasks. Despite these advances, diffusion-based video generation remains computationally intensive, especially for high-resolution, long-duration videos. Prior work accelerates inference by skipping computation, usually at the cost of severe quality degradation. In this paper, we propose SRDiffusion, a novel framework that leverages collaboration between large and small models to reduce inference cost. The large model handles the high-noise steps to ensure semantic and motion fidelity (Sketching), while the small model refines visual details in the low-noise steps (Rendering). Experimental results demonstrate that our method outperforms existing approaches, achieving over a 3$\times$ speedup for Wan with nearly no quality loss on VBench, and a 2$\times$ speedup for CogVideoX. Our method introduces a new direction orthogonal to existing acceleration strategies, offering a practical solution for scalable video generation.
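The abstract describes the core mechanism: splitting the denoising trajectory between a large and a small model. The sketch below illustrates one plausible realization; the function and parameter names (e.g., `switch_step`), the model call signature, and the diffusers-style scheduler interface are assumptions for illustration, not the paper's actual implementation.

```python
import torch

def cooperative_denoise(large_model, small_model, scheduler, latents,
                        text_emb, switch_step):
    """Hypothetical sketch of sketching-rendering cooperation.

    The large model denoises the early, high-noise steps (Sketching),
    preserving semantics and motion; the small model takes over the
    remaining low-noise steps (Rendering) to refine visual detail.
    Assumes both models predict noise in a shared latent space and a
    diffusers-style scheduler exposing .timesteps and .step().
    """
    for i, t in enumerate(scheduler.timesteps):
        # Hand off from the large to the small model at switch_step.
        model = large_model if i < switch_step else small_model
        noise_pred = model(latents, t, text_emb)  # noise (epsilon) prediction
        latents = scheduler.step(noise_pred, t, latents).prev_sample
    return latents
```

Under this reading, the speedup comes from running the cheaper small model for most of the low-noise steps, while the large model is invoked only early on, where it matters most for semantic and motion fidelity.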
Shenggan Cheng, Yuanxin Wei, Lansong Diao, Yong Liu, Bujiao Chen, Lianghua Huang, Yu Liu, Wenyuan Yu, Jiangsu Du, Wei Lin, Yang You
Computing Technology, Computer Technology
Shenggan Cheng, Yuanxin Wei, Lansong Diao, Yong Liu, Bujiao Chen, Lianghua Huang, Yu Liu, Wenyuan Yu, Jiangsu Du, Wei Lin, Yang You. SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation [EB/OL]. (2025-05-25) [2025-06-14]. https://arxiv.org/abs/2505.19151.