
FlatFusion: Delving into Details of Sparse Transformer-based Camera-LiDAR Fusion for Autonomous Driving


Source: arXiv
Abstract

The integration of data from diverse sensor modalities (e.g., camera and LiDAR) constitutes a prevalent methodology within the ambit of autonomous driving scenarios. Recent advancements in efficient point cloud transformers have underscored the efficacy of integrating information in sparse formats. When it comes to fusion, image patches are dense in pixel space with ambiguous depth, which necessitates additional design considerations for effective fusion. In this paper, we conduct a comprehensive exploration of design choices for Transformer-based sparse camera-LiDAR fusion. This investigation encompasses strategies for image-to-3D and LiDAR-to-2D mapping, attention neighbor grouping, single modal tokenizer, and micro-structure of Transformer. By amalgamating the most effective principles uncovered through our investigation, we introduce FlatFusion, a carefully designed framework for sparse camera-LiDAR fusion. Notably, FlatFusion significantly outperforms state-of-the-art sparse Transformer-based methods, including UniTR, CMT, and SparseFusion, achieving 73.7 NDS on the nuScenes validation set at 10.1 FPS with PyTorch.
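As a rough illustration of the kind of sparse fusion the abstract describes, the sketch below projects LiDAR points into the image plane (LiDAR-to-2D mapping) and then groups camera and LiDAR tokens into pixel-space windows for attention (neighbor grouping). It is a minimal sketch under assumed shapes and calibration conventions, not the FlatFusion implementation; the function and class names (`lidar_to_2d`, `WindowedFusionBlock`) and all parameters are hypothetical.

```python
# Illustrative sketch only: a generic sparse camera-LiDAR token fusion step,
# NOT the actual FlatFusion code. Shapes, names, and the window grouping
# scheme below are assumptions made for demonstration.
import torch
import torch.nn as nn


def lidar_to_2d(points_xyz, intrinsics, extrinsics):
    """Project LiDAR points (N, 3) into pixel coordinates (N, 2).

    `intrinsics` is a 3x3 camera matrix and `extrinsics` a 4x4
    LiDAR-to-camera transform, both assumed known from calibration.
    """
    n = points_xyz.shape[0]
    homog = torch.cat([points_xyz, points_xyz.new_ones(n, 1)], dim=1)   # (N, 4)
    cam = (extrinsics @ homog.T).T[:, :3]                               # (N, 3) in camera frame
    uv = (intrinsics @ cam.T).T                                         # (N, 3)
    return uv[:, :2] / uv[:, 2:].clamp(min=1e-6)                        # perspective divide


class WindowedFusionBlock(nn.Module):
    """Group camera and LiDAR tokens that fall into the same 2D window,
    then run standard multi-head self-attention inside each window."""

    def __init__(self, dim=128, heads=4, window=32):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens, uv):
        # tokens: (N, C) mixed camera/LiDAR tokens; uv: (N, 2) pixel positions.
        win_id = (uv[:, 0] // self.window).long() * 10_000 + (uv[:, 1] // self.window).long()
        out = tokens.clone()
        for wid in win_id.unique():
            idx = (win_id == wid).nonzero(as_tuple=True)[0]
            group = tokens[idx].unsqueeze(0)                            # (1, n_w, C)
            fused, _ = self.attn(group, group, group)
            out[idx] = self.norm(tokens[idx] + fused.squeeze(0))        # residual + norm
        return out
```

In this sketch the window partition plays the role of the "attention neighbor grouping" axis studied in the paper: tokens only attend to others that land in the same pixel-space window, which keeps attention cost proportional to window size rather than the full token count.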

Yutao Zhu, Xiaosong Jia, Xinyu Yang, Junchi Yan

Subject: Highway Transportation Engineering

Yutao Zhu, Xiaosong Jia, Xinyu Yang, Junchi Yan. FlatFusion: Delving into Details of Sparse Transformer-based Camera-LiDAR Fusion for Autonomous Driving [EB/OL]. (2025-06-30) [2025-07-23]. https://arxiv.org/abs/2408.06832
