Probabilistic Adaptation of Text-to-Video Models

Source: arXiv
Abstract

Large text-to-video models trained on internet-scale data have demonstrated exceptional capabilities in generating high-fidelity videos from arbitrary textual descriptions. However, adapting these models to tasks with limited domain-specific data, such as animation or robotics videos, poses a significant computational challenge, since finetuning a pretrained large model can be prohibitively expensive. Inspired by how a small modifiable component (e.g., prompts, prefix-tuning) can adapt a large language model to perform new tasks without requiring access to the model weights, we investigate how to adapt a large pretrained text-to-video model to a variety of downstream domains and tasks without finetuning. In answering this question, we propose Video Adapter, which leverages the score function of a large pretrained video diffusion model as a probabilistic prior to guide the generation of a task-specific small video model. Our experiments show that Video Adapter is capable of incorporating the broad knowledge and preserving the high fidelity of a large pretrained video model in a task-specific small video model that is able to generate high-quality yet specialized videos on a variety of tasks such as animation, egocentric modeling, and modeling of simulated and real-world robotics data. More videos can be found on the website https://video-adapter.github.io/.
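The mechanism described above is a form of score composition: at sampling time, the frozen pretrained model's noise prediction serves as a broad probabilistic prior and is combined with the noise prediction of a small, task-specific model. The sketch below illustrates one way such a composition could look inside a DDPM-style sampler. It is a minimal sketch under stated assumptions: the `compose_scores` and `sample` functions, the convex weighting `prior_weight`, and the `(x_t, t, text_emb) -> noise` call signature of the two models are illustrative placeholders, not the paper's exact formulation.

```python
import torch

def compose_scores(eps_prior, eps_task, prior_weight=0.5):
    # Convex combination of the two noise predictions. Up to normalization,
    # this corresponds to sampling from the geometric mixture
    # p_pretrained(x)^w * p_small(x)^(1-w), i.e. the frozen pretrained model
    # acts as a probabilistic prior on the task-specific model.
    # (Illustrative weighting; the paper's exact composition may differ.)
    return prior_weight * eps_prior + (1.0 - prior_weight) * eps_task

@torch.no_grad()
def sample(pretrained_model, small_model, shape, text_emb, betas, prior_weight=0.5):
    # Ancestral DDPM sampling driven by the composed noise prediction.
    # `pretrained_model` and `small_model` are assumed to be callables mapping
    # (x_t, t, text_emb) -> predicted noise; only the small model is trained on
    # the target domain, the large model stays frozen. `betas` is a 1-D tensor
    # holding the noise schedule.
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)
    for t in reversed(range(len(betas))):
        t_batch = torch.full((shape[0],), t, dtype=torch.long)
        eps_prior = pretrained_model(x, t_batch, text_emb)  # frozen prior
        eps_task = small_model(x, t_batch, text_emb)        # task-specific model
        eps = compose_scores(eps_prior, eps_task, prior_weight)
        # Standard DDPM posterior mean, computed with the composed noise estimate.
        mean = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x
```

In this setup only the small model would need gradient updates during adaptation; the pretrained model contributes its internet-scale knowledge purely through its score at inference time, which is what allows adaptation without finetuning or access to the large model's weights.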

Bo Dai, Mengjiao Yang, Pieter Abbeel, Joshua B. Tenenbaum, Dale Schuurmans, Yilun Du

Subject areas: Computing and Computer Technology; Applied Electronic Technology; Automation Technology and Equipment

Bo Dai, Mengjiao Yang, Pieter Abbeel, Joshua B. Tenenbaum, Dale Schuurmans, Yilun Du. Probabilistic Adaptation of Text-to-Video Models [EB/OL]. (2023-06-02) [2025-08-10]. https://arxiv.org/abs/2306.01872.
