AirScape: An Aerial Generative World Model with Motion Controllability

Source: arXiv
Abstract

How to enable robots to predict the outcomes of their own motion intentions in three-dimensional space has been a fundamental problem in embodied intelligence. To explore more general spatial imagination capabilities, here we present AirScape, the first world model designed for six-degree-of-freedom aerial agents. AirScape predicts future observation sequences based on current visual inputs and motion intentions. Specifically, we construct a dataset for aerial world model training and testing, which consists of 11k video-intention pairs. This dataset includes first-person-view videos capturing diverse drone actions across a wide range of scenarios, with over 1,000 hours spent annotating the corresponding motion intentions. Then we develop a two-phase training schedule to train a foundation model -- initially devoid of embodied spatial knowledge -- into a world model that is controllable by motion intentions and adheres to physical spatio-temporal constraints.

Baining Zhao, Rongze Tang, Mingyuan Jia, Ziyou Wang, Fanghang Man, Xin Zhang, Yu Shang, Weichen Zhang, Chen Gao, Wei Wu, Xin Wang, Xinlei Chen, Yong Li

Aviation; Automation Technology and Equipment; Computing and Computer Technology

Baining Zhao, Rongze Tang, Mingyuan Jia, Ziyou Wang, Fanghang Man, Xin Zhang, Yu Shang, Weichen Zhang, Chen Gao, Wei Wu, Xin Wang, Xinlei Chen, Yong Li. AirScape: An Aerial Generative World Model with Motion Controllability[EB/OL]. (2025-07-10)[2025-07-23]. https://arxiv.org/abs/2507.08885.
