Eyes on the Road, Mind Beyond Vision: Context-Aware Multi-modal Enhanced Risk Anticipation
Accurate accident anticipation remains challenging when driver cognition and dynamic road conditions are underrepresented in predictive models. In this paper, we propose CAMERA (Context-Aware Multi-modal Enhanced Risk Anticipation), a multi-modal framework integrating dashcam video, textual annotations, and driver attention maps for robust accident anticipation. Unlike existing methods that rely on static or environment-centric thresholds, CAMERA employs an adaptive mechanism guided by scene complexity and gaze entropy, reducing false alarms while maintaining high recall in dynamic, multi-agent traffic scenarios. A hierarchical fusion pipeline with Bi-GRU (Bidirectional GRU) captures spatio-temporal dependencies, while a Geo-Context Vision-Language module translates 3D spatial relationships into interpretable, human-centric alerts. Evaluations on the DADA-2000 and benchmarks show that CAMERA achieves state-of-the-art performance, improving accuracy and lead time. These results demonstrate the effectiveness of modeling driver attention, contextual description, and adaptive risk thresholds to enable more reliable accident anticipation.
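The abstract says only that CAMERA's alert threshold adapts to scene complexity and gaze entropy; it gives no formula. The sketch below is one minimal way such a rule could work. It rests on three assumptions: the driver attention map is a non-negative 2-D grid, scene complexity has already been normalised to [0, 1], and a more dispersed gaze or a busier scene should make the system more sensitive. The function names, the linear rule, and the constants `base`, `alpha`, `beta`, `lo` and `hi` are all hypothetical illustrations, not the paper's method.

```python
import math
from typing import Sequence

# Hypothetical sketch: the paper's actual threshold formula is not stated in the abstract.


def normalized_gaze_entropy(attention_map: Sequence[Sequence[float]]) -> float:
    """Shannon entropy of a 2-D attention map, divided by its maximum log2(cells).

    Returns 0.0 for a fully focused map and 1.0 for a uniform one.
    """
    cells = [v for row in attention_map for v in row]
    if any(v < 0 for v in cells):
        raise ValueError("attention values must be non-negative")
    total = sum(cells)
    if total <= 0 or len(cells) < 2:
        raise ValueError("need at least two cells with positive total mass")
    entropy = 0.0
    for v in cells:
        if v > 0:  # zero-probability cells contribute nothing (0 * log 0 := 0)
            p = v / total
            entropy -= p * math.log2(p)
    return entropy / math.log2(len(cells))


def adaptive_threshold(
    gaze_entropy: float,
    scene_complexity: float,
    base: float = 0.5,   # hypothetical threshold for focused gaze in an empty scene
    alpha: float = 0.2,  # hypothetical weight on gaze dispersion
    beta: float = 0.2,   # hypothetical weight on scene complexity
    lo: float = 0.1,
    hi: float = 0.9,
) -> float:
    """Assumed rule: lower the alarm threshold as gaze disperses or the scene gets busier.

    Both inputs are assumed to lie in [0, 1]; the result is clamped to [lo, hi].
    """
    tau = base - alpha * gaze_entropy - beta * scene_complexity
    return min(hi, max(lo, tau))


def should_alert(risk_score: float, threshold: float) -> bool:
    """Raise an alert when the predicted risk reaches the adaptive threshold."""
    return risk_score >= threshold
```

With these assumed defaults, a fully focused map such as `[[1, 0], [0, 0]]` gives a normalised entropy of 0.0 and a uniform map gives 1.0. The threshold then ranges from 0.5 (focused gaze, empty scene) down to the clamp at 0.1 (dispersed gaze, maximally complex scene), so a risk score of 0.3 triggers an alert only in the latter case. The clamp keeps the threshold from collapsing toward zero, which would turn almost every frame into an alarm and defeat the goal of fewer false alarms.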
Jiaxun Zhang, Haicheng Liao, Yumu Xie, Chengyue Wang, Yanchen Guan, Bin Rao, Zhenning Li
Highway and Transportation Engineering
Jiaxun Zhang, Haicheng Liao, Yumu Xie, Chengyue Wang, Yanchen Guan, Bin Rao, Zhenning Li. Eyes on the Road, Mind Beyond Vision: Context-Aware Multi-modal Enhanced Risk Anticipation [EB/OL]. (2025-07-08) [2025-07-16]. https://arxiv.org/abs/2507.06444.