MonoPGC: Monocular 3D Object Detection with Pixel Geometry Contexts

Abstract

Monocular 3D object detection is an economical but challenging task in autonomous driving. Recently, center-based monocular methods have developed rapidly, offering a good trade-off between speed and accuracy; they usually rely on estimating the depth of the object center from 2D features. However, visual semantic features that lack sufficient pixel geometry information may weaken the cues available for spatial 3D detection. To alleviate this, we propose MonoPGC, a novel end-to-end monocular 3D object detection framework with rich Pixel Geometry Contexts. We introduce pixel depth estimation as an auxiliary task and design a depth cross-attention pyramid module (DCPM) to inject local and global depth geometry knowledge into visual features. In addition, we present the depth-space-aware transformer (DSAT) to integrate 3D space position and depth-aware features efficiently. We also design a novel depth-gradient positional encoding (DGPE) to bring more distinct pixel geometry contexts into the transformer for better object detection. Extensive experiments demonstrate that our method achieves state-of-the-art performance on the KITTI dataset.
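The abstract does not specify how DCPM is built. As a rough illustration of the general idea of injecting depth features into visual features through cross-attention, the following is a minimal sketch of plain scaled dot-product cross-attention. It is not the authors' implementation: the function names, the toy token values, the residual addition, and the absence of learned projections, multiple heads, and pyramid levels are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(visual, depth):
    """Let each visual feature (query) attend over all depth features.

    visual: list of N vectors of length d (hypothetical visual tokens)
    depth:  list of M vectors of length d (hypothetical depth tokens,
            used here as both keys and values)
    Returns N vectors of length d: each visual token plus its attended
    depth context. Illustrative only: no learned weights are involved.
    """
    d = len(visual[0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for q in visual:
        # Similarity of this visual token to every depth token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in depth]
        weights = softmax(scores)
        # Weighted average of the depth tokens.
        context = [sum(w * v[j] for w, v in zip(weights, depth)) for j in range(d)]
        # Residual connection (an assumption): keep the visual feature
        # and add the depth context to it.
        fused.append([qi + ci for qi, ci in zip(q, context)])
    return fused


# Toy inputs, chosen arbitrarily for the example.
visual_tokens = [[1.0, 0.0], [0.0, 1.0]]
depth_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused_tokens = cross_attention(visual_tokens, depth_tokens)
```

In this sketch the output keeps the shape of the visual input, so a block of this kind could in principle be applied at several feature resolutions; whether and how MonoPGC does so is described in the paper itself, not here.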

Authors: Guilian Chen, Lei Wang, Yuanzhu Gan, Zizhang Wu, Jian Pu

Subject classification: computing technology and computer technology; automation technology and automation equipment

Recommended citation: Guilian Chen, Lei Wang, Yuanzhu Gan, Zizhang Wu, Jian Pu. MonoPGC: Monocular 3D Object Detection with Pixel Geometry Contexts [EB/OL]. (2023-02-21) [2025-06-26]. https://arxiv.org/abs/2302.10549.
