
Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models

Source: arXiv
English Abstract

Vision-Language-Action (VLA) models for autonomous driving show promise but falter in unstructured corner-case scenarios, largely due to a scarcity of targeted benchmarks. To address this, we introduce Impromptu VLA. Our core contribution is the Impromptu VLA Dataset: over 80,000 meticulously curated video clips, distilled from over 2M source clips drawn from 8 open-source large-scale datasets. This dataset is built upon our novel taxonomy of four challenging unstructured categories and features rich, planning-oriented question-answering annotations and action trajectories. Crucially, experiments demonstrate that VLAs trained with our dataset achieve substantial performance gains on established benchmarks: improving closed-loop NeuroNCAP scores and collision rates, and reaching near state-of-the-art L2 accuracy in open-loop nuScenes trajectory prediction. Furthermore, our Q&A suite serves as an effective diagnostic, revealing clear VLM improvements in perception, prediction, and planning. Our code, data, and models are available at https://github.com/ahydchh/Impromptu-VLA.
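
For context, the open-loop L2 metric referenced in the abstract is conventionally computed as the average Euclidean distance between predicted and ground-truth ego-trajectory waypoints at fixed time horizons (e.g., 1 s, 2 s, 3 s). The sketch below is a minimal, hypothetical illustration of that standard nuScenes-style metric, not the paper's evaluation code; the array shapes, 2 Hz sampling rate, and horizon choices are assumptions.

```python
import numpy as np

def open_loop_l2(pred: np.ndarray, gt: np.ndarray, hz: float = 2.0) -> dict:
    """Average L2 error between predicted and ground-truth ego waypoints.

    pred, gt: (T, 2) arrays of future (x, y) positions sampled at `hz` Hz.
    Returns the mean L2 error within 1 s, 2 s, and 3 s horizons
    (the standard open-loop nuScenes reporting convention; assumed here).
    """
    dists = np.linalg.norm(pred - gt, axis=1)  # per-waypoint L2 distance
    results = {}
    for horizon_s in (1.0, 2.0, 3.0):
        n = int(horizon_s * hz)  # number of waypoints within this horizon
        results[f"L2@{horizon_s:.0f}s"] = float(dists[:n].mean())
    return results

# Example: a 3 s predicted trajectory at 2 Hz (6 waypoints), illustrative values only
pred = np.array([[0.5, 0.0], [1.1, 0.1], [1.8, 0.1], [2.4, 0.2], [3.1, 0.2], [3.9, 0.3]])
gt   = np.array([[0.5, 0.0], [1.0, 0.0], [1.6, 0.1], [2.3, 0.1], [3.0, 0.2], [3.7, 0.2]])
print(open_loop_l2(pred, gt))
```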

Haohan Chi, Huan-ang Gao, Ziming Liu, Jianing Liu, Chenyu Liu, Jinwei Li, Kaisen Yang, Yangcheng Yu, Zeda Wang, Wenyi Li, Leichen Wang, Xingtao Hu, Hao Sun, Hang Zhao, Hao Zhao

Subjects: Automation technology and equipment; Computing technology and computer technology

Haohan Chi, Huan-ang Gao, Ziming Liu, Jianing Liu, Chenyu Liu, Jinwei Li, Kaisen Yang, Yangcheng Yu, Zeda Wang, Wenyi Li, Leichen Wang, Xingtao Hu, Hao Sun, Hang Zhao, Hao Zhao. Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models[EB/OL]. (2025-05-29)[2025-07-16]. https://arxiv.org/abs/2505.23757.
