Discriminator-Free Direct Preference Optimization for Video Diffusion

Source: arXiv
Abstract

Direct Preference Optimization (DPO), which aligns models with human preferences through win/lose data pairs, has achieved remarkable success in language and image generation. However, applying DPO to video diffusion models faces critical challenges: (1) data inefficiency: generating thousands of videos per DPO iteration incurs prohibitive costs; (2) evaluation uncertainty: human annotations suffer from subjective bias, and automated discriminators fail to detect subtle temporal artifacts such as flickering or motion incoherence. To address these challenges, we propose a discriminator-free video DPO framework that (1) uses original real videos as win cases and their edited versions (e.g., reversed, shuffled, or noise-corrupted clips) as lose cases, and (2) trains video diffusion models to distinguish and avoid the artifacts introduced by editing. This approach eliminates the need for costly synthetic video comparisons, provides unambiguous quality signals, and enables unlimited expansion of the training data through simple editing operations. We theoretically prove the framework's effectiveness even when real videos and model-generated videos follow different distributions. Experiments on CogVideoX demonstrate the efficiency of the proposed method.
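The pairing scheme described in the abstract can be made concrete with a short sketch. The snippet below is a minimal illustration, not the authors' code: make_lose_case builds a lose clip from a real win clip via the editing operations named above (reversal, frame shuffling, noise corruption), and dpo_loss is a Diffusion-DPO style objective in which each model's per-sample denoising error stands in for its log-likelihood. The function names, the toy linear noise schedule, the beta value, and the stand-in denoiser are all assumptions made for illustration.

import torch
import torch.nn.functional as F

def make_lose_case(win: torch.Tensor, mode: str = "reverse") -> torch.Tensor:
    """Turn a real clip of shape (T, C, H, W) into an artifact-bearing lose case."""
    if mode == "reverse":   # reversed playback -> implausible motion
        return win.flip(0)
    if mode == "shuffle":   # shuffled frames -> broken temporal continuity
        return win[torch.randperm(win.shape[0])]
    if mode == "noise":     # additive noise -> degraded visual fidelity
        return win + 0.1 * torch.randn_like(win)
    raise ValueError(f"unknown edit mode: {mode!r}")

def dpo_loss(model, ref_model, x_w, x_l, t, noise, beta=500.0):
    """DPO-style preference loss on one (win, lose) batch at timesteps t.

    model / ref_model are epsilon-predictors; a lower denoising error on a
    clip is treated as a higher model preference for it.
    """
    def err(m, x):
        a = (1.0 - t.float() / 1000.0).view(-1, 1, 1, 1, 1)   # toy schedule, illustrative only
        x_t = a.sqrt() * x + (1.0 - a).sqrt() * noise          # noised clip
        return F.mse_loss(m(x_t, t), noise, reduction="none").mean(dim=(1, 2, 3, 4))

    # Positive margin = the policy improves over the reference on the win
    # clip more than it does on the lose clip.
    margin = (err(ref_model, x_w) - err(model, x_w)) - (err(ref_model, x_l) - err(model, x_l))
    return -F.logsigmoid(beta * margin).mean()

# Usage with a trivial stand-in denoiser (real training would fine-tune a
# video diffusion model such as CogVideoX):
denoiser = lambda x_t, t: torch.zeros_like(x_t)
x_w = torch.randn(2, 8, 3, 16, 16)                  # batch of real clips (B, T, C, H, W)
x_l = torch.stack([make_lose_case(c) for c in x_w]) # edited counterparts
loss = dpo_loss(denoiser, denoiser, x_w, x_l,
                t=torch.randint(0, 1000, (2,)), noise=torch.randn_like(x_w))

Because the lose cases are deterministic edits of real footage, each new preference pair costs only the editing operation itself, which is the data-efficiency argument the abstract makes.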

Haoran Cheng, Qide Dong, Liang Peng, Zhizhou Sha, Weiguo Feng, Jinghui Xie, Zhao Song, Shilei Wen, Xiaofei He, Boxi Wu

Computing Technology, Computer Technology

Haoran Cheng, Qide Dong, Liang Peng, Zhizhou Sha, Weiguo Feng, Jinghui Xie, Zhao Song, Shilei Wen, Xiaofei He, Boxi Wu. Discriminator-Free Direct Preference Optimization for Video Diffusion [EB/OL]. (2025-04-11) [2025-04-26]. https://arxiv.org/abs/2504.08542.
