Frame In-N-Out: Unbounded Controllable Image-to-Video Generation

Abstract

Controllability, temporal coherence, and detail synthesis remain the most critical challenges in video generation. In this paper, we focus on a commonly used yet underexplored cinematic technique known as Frame In and Frame Out. Specifically, starting from image-to-video generation, users can direct objects in the image to leave the scene naturally, or supply brand-new identity references that enter the scene, with both guided by a user-specified motion trajectory. To support this task, we introduce a new dataset curated semi-automatically, a comprehensive evaluation protocol targeting this setting, and an efficient identity-preserving, motion-controllable video Diffusion Transformer architecture. Our evaluation shows that the proposed approach significantly outperforms existing baselines.

Authors: Boyang Wang, Xuweiyi Chen, Matheus Gadelha, Zezhou Cheng

Subject classification: Computing technology, computer technology

Recommended citation: Boyang Wang, Xuweiyi Chen, Matheus Gadelha, Zezhou Cheng. Frame In-N-Out: Unbounded Controllable Image-to-Video Generation [EB/OL]. (2025-05-27) [2025-06-23]. https://arxiv.org/abs/2505.21491.
