DriveAction: A Benchmark for Exploring Human-like Driving Decisions in VLA Models
Vision-Language-Action (VLA) models have advanced autonomous driving, but existing benchmarks still lack scenario diversity, reliable action-level annotation, and evaluation protocols aligned with human preferences. To address these limitations, we introduce DriveAction, the first action-driven benchmark specifically designed for VLA models, comprising 16,185 QA pairs generated from 2,610 driving scenarios. DriveAction leverages real-world driving data proactively collected by users of production-level autonomous vehicles to ensure broad and representative scenario coverage, offers high-level discrete action labels collected directly from users' actual driving operations, and implements an action-rooted tree-structured evaluation framework that explicitly links vision, language, and action tasks, supporting both comprehensive and task-specific assessment. Our experiments demonstrate that state-of-the-art vision-language models (VLMs) require both vision and language guidance for accurate action prediction: on average, accuracy drops by 3.3% without vision input, by 4.1% without language input, and by 8.0% without either. Our evaluation supports precise identification of model bottlenecks with robust and consistent results, thus providing new insights and a rigorous foundation for advancing human-like decisions in autonomous driving.
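The ablation reported in the abstract compares action-prediction accuracy under four input conditions (full input, no vision, no language, neither). The sketch below illustrates how such a comparison could be scored over the benchmark's QA pairs; the sample fields and the `predict_action` callable are hypothetical placeholders, not the benchmark's actual API.

```python
# Hypothetical sketch of the vision/language ablation described in the abstract:
# score a model's discrete-action predictions under four input conditions and
# report the accuracy drop relative to the full (vision + language) setting.
from typing import Callable, Dict, List

Sample = Dict[str, str]  # assumed fields, e.g. {"image": ..., "question": ..., "answer": ...}
Predictor = Callable[[Sample, bool, bool], str]  # (sample, use_vision, use_language) -> action label

def accuracy(samples: List[Sample], predict_action: Predictor,
             use_vision: bool, use_language: bool) -> float:
    """Fraction of QA pairs whose predicted action matches the ground-truth label."""
    correct = sum(
        predict_action(s, use_vision, use_language) == s["answer"] for s in samples
    )
    return correct / len(samples)

def ablation_report(samples: List[Sample], predict_action: Predictor) -> None:
    """Print accuracy under each ablated condition and its drop from the full setting."""
    full = accuracy(samples, predict_action, True, True)
    conditions = {
        "w/o vision": (False, True),
        "w/o language": (True, False),
        "w/o both": (False, False),
    }
    for name, (use_v, use_l) in conditions.items():
        acc = accuracy(samples, predict_action, use_v, use_l)
        print(f"{name}: {acc:.3f} (drop {100 * (full - acc):.1f} pts)")
```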
Yuhan Hao, Zhengning Li, Lei Sun, Weilong Wang, Naixin Yi, Sheng Song, Caihong Qin, Mofan Zhou, Yifei Zhan, Peng Jia, Xianpeng Lang
Subjects: Automation Technology and Equipment; Computing Technology and Computer Technology
Yuhan Hao, Zhengning Li, Lei Sun, Weilong Wang, Naixin Yi, Sheng Song, Caihong Qin, Mofan Zhou, Yifei Zhan, Peng Jia, Xianpeng Lang. DriveAction: A Benchmark for Exploring Human-like Driving Decisions in VLA Models [EB/OL]. (2025-06-05) [2025-06-25]. https://arxiv.org/abs/2506.05667