
ContentV: Efficient Training of Video Generation Models with Limited Compute


Source: arXiv
Abstract

Recent advances in video generation demand increasingly efficient training recipes to mitigate escalating computational costs. In this report, we present ContentV, an 8B-parameter text-to-video model that achieves state-of-the-art performance (85.14 on VBench) after training on 256 x 64GB Neural Processing Units (NPUs) for merely four weeks. ContentV generates diverse, high-quality videos across multiple resolutions and durations from text prompts, enabled by three key innovations: (1) A minimalist architecture that maximizes reuse of pre-trained image generation models for video generation; (2) A systematic multi-stage training strategy leveraging flow matching for enhanced efficiency; and (3) A cost-effective reinforcement learning with human feedback framework that improves generation quality without requiring additional human annotations. All the code and models are available at: https://contentv.github.io.
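The abstract credits part of the training efficiency to flow matching. As a rough illustration only (the paper's actual formulation is not given here), the following numpy sketch shows the standard rectified-flow-style flow-matching regression target: interpolate linearly between a noise sample and a data sample, and regress a velocity predictor onto their difference. All shapes and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical batch: 4 "video latent" vectors of dimension 8.
x1 = rng.normal(size=(4, 8))   # clean data sample (latent)
x0 = rng.normal(size=(4, 8))   # Gaussian noise sample
t = rng.uniform(size=(4, 1))   # per-sample timestep in (0, 1)

# Linear interpolation path: x_t = (1 - t) * x0 + t * x1,
# with constant target velocity v = x1 - x0 along the path.
x_t = (1.0 - t) * x0 + t * x1
v_target = x1 - x0

def fm_loss(v_pred, v_target):
    """Mean-squared flow-matching loss between predicted and target velocity."""
    return np.mean((v_pred - v_target) ** 2)

# In training, a network v_theta(x_t, t) would supply v_pred; here we use a
# trivial all-zeros "predictor" just to evaluate the loss once.
loss = fm_loss(np.zeros_like(v_target), v_target)
```

A real model would minimize this loss over the network parameters; the sketch only constructs the interpolated input and regression target.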

Wenfeng Lin, Renjie Chen, Boyuan Liu, Shiyue Yan, Ruoyu Feng, Jiangchuan Wei, Yichen Zhang, Yimeng Zhou, Chao Feng, Jiao Ran, Qi Wu, Zuotao Liu, Mingyu Guo

Computing Technology; Computer Technology

Wenfeng Lin, Renjie Chen, Boyuan Liu, Shiyue Yan, Ruoyu Feng, Jiangchuan Wei, Yichen Zhang, Yimeng Zhou, Chao Feng, Jiao Ran, Qi Wu, Zuotao Liu, Mingyu Guo. ContentV: Efficient Training of Video Generation Models with Limited Compute [EB/OL]. (2025-06-05) [2025-06-27]. https://arxiv.org/abs/2506.05343.
