Combining Transformers and CNNs for Efficient Object Detection in High-Resolution Satellite Imagery

Source: arXiv
Abstract

We present GLOD, a transformer-first architecture for object detection in high-resolution satellite imagery. GLOD replaces CNN backbones with a Swin Transformer for end-to-end feature extraction, combined with novel UpConvMixer blocks for robust upsampling and Fusion Blocks for multi-scale feature integration. Our approach achieves 32.95% on xView, outperforming SOTA methods by 11.46%. Key innovations include asymmetric fusion with CBAM attention and a multi-path head design capturing objects across scales. The architecture is optimized for satellite imagery challenges, leveraging spatial priors while maintaining computational efficiency.

Nicolas Drapier, Aladine Chetouani, Aurélien Chateigner

Subjects: Aerospace Technology; Astronautics; Surveying and Mapping

Nicolas Drapier, Aladine Chetouani, Aurélien Chateigner. Combining Transformers and CNNs for Efficient Object Detection in High-Resolution Satellite Imagery [EB/OL]. (2025-07-15) [2025-08-02]. https://arxiv.org/abs/2507.11040.
