
Efficient Semi-Dense Scene Matching Network Based on Hybrid State Space Model and Transformer

Shen Tianqi¹, Li Ning¹

Author information

  • 1. School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China

Abstract

In high-resolution scene matching, the computational cost of Transformer global attention grows quadratically with resolution, which makes it difficult to achieve both matching accuracy and real-time performance. To address this, an efficient hybrid semi-dense scene matching network, TMamba, is proposed. First, a lightweight VGG-style backbone replaces the conventional deep residual network for multi-scale feature extraction, significantly reducing the cost of front-end feature extraction. Second, a serial hybrid interaction module composed of MambaVision and a down-sampled Transformer is constructed: Mamba captures local high-frequency geometric features, while the Transformer models global low-frequency topological relationships, achieving deep feature alignment at low computational cost. Finally, coarse-to-fine feature matching is performed through multi-level loss functions. Experiments on MegaDepth, HPatches, and the self-built complex-scene dataset SceneMatch show that TMamba has only 1.5M parameters (about 15.8% of the baseline model), reduces inference time on MegaDepth by about 12.6% compared with the state-of-the-art ELoFTR, and reaches a peak matching accuracy of 86.4%. The network thus eases the computational bottleneck of high-resolution image matching on compute-constrained platforms and shows promise for engineering applications.
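The scaling argument in the abstract can be illustrated with a back-of-envelope calculation. The sketch below is not the authors' code: it only counts the query-key pairs that one global attention layer would score, under two assumptions that the abstract does not state, namely a coarse feature stride of 8 and a hypothetical 4x per-side token down-sampling before attention. The final line derives the baseline parameter count implied by the reported 1.5M and 15.8% figures.

```python
def num_tokens(height, width, stride):
    """Number of feature tokens for an image at a given feature stride."""
    return (height // stride) * (width // stride)


def attention_pairs(tokens_q, tokens_kv=None):
    """Query-key pairs scored by one global attention layer."""
    if tokens_kv is None:
        tokens_kv = tokens_q
    return tokens_q * tokens_kv


STRIDE = 8  # assumption: coarse feature stride, not given in the abstract
POOL = 4    # assumption: hypothetical per-side down-sampling before attention

for side in (640, 1280):
    full = attention_pairs(num_tokens(side, side, STRIDE))
    reduced = attention_pairs(num_tokens(side, side, STRIDE * POOL))
    # Doubling the image side quadruples the tokens and multiplies the pairs by 16.
    print(f"{side}x{side}: full={full:,} pairs, down-sampled={reduced:,} pairs")

# Baseline size implied by the abstract: 1.5M parameters is about 15.8% of it.
implied_baseline_millions = 1.5 / 0.158
print(f"implied baseline: about {implied_baseline_millions:.1f}M parameters")
```

Under these assumptions, down-sampling the tokens by 4x per side cuts the number of scored pairs by a factor of 256 at either resolution, which is the kind of saving a down-sampled Transformer is designed to exploit. The actual stride and down-sampling ratio used by TMamba would have to be taken from the full paper.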

Key words

Information processing technology / Image matching / Scene matching / Visual state space model / Lightweight design

Cite this article

Shen Tianqi, Li Ning. Efficient Semi-Dense Scene Matching Network Based on Hybrid State Space Model and Transformer [EB/OL]. (2026-05-09)[2026-05-11]. http://www.paper.edu.cn/releasepaper/content/202605-22.

Subject classification

Computing technology, computer technology


First published: 2026-05-09