Efficient Semi-Dense Scene Matching Network Based on Hybrid State Space Model and Transformer

Shen Tianqi¹, Li Ning¹

Source: China Sciencepaper Online

Author information

1. School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876

Abstract

In high-resolution scene matching, the cost of Transformer global attention grows quadratically with image resolution, which makes it difficult to achieve both high matching accuracy and real-time performance. To address this problem, an efficient hybrid semi-dense matching network, TMamba, is proposed. First, a lightweight VGG-style backbone replaces the conventional deep residual network for multi-scale feature extraction, substantially reducing front-end computational overhead. Second, a serial hybrid interaction module composed of MambaVision and a down-sampled Transformer is constructed: the Mamba stage captures local high-frequency geometric features, while the Transformer stage models global low-frequency topological relationships, so that deep feature alignment is achieved at low computational cost. Finally, coarse-to-fine feature matching is performed under multi-level loss functions. Experiments on MegaDepth, HPatches, and the self-built complex-scene dataset SceneMatch show that TMamba has only 1.5M parameters (about 15.8% of the baseline model), reduces inference latency on MegaDepth by about 12.6% relative to the state-of-the-art ELoFTR, and raises matching accuracy to as high as 86.4%. The network effectively overcomes the bottleneck of high-resolution image matching on compute-constrained platforms and shows strong potential for engineering applications.
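The efficiency argument rests on how the cost of each token mixer scales with image resolution. The Python sketch below counts pairwise interactions per layer for three mixers on a square image. It assumes a 1/8-resolution coarse feature map (the LoFTR-family convention; the abstract does not state TMamba's stride) and models the down-sampled Transformer as attention over a token grid pooled by a hypothetical factor `pool` in each direction. All function names are illustrative and do not come from the paper.

```python
def coarse_tokens(height: int, width: int, stride: int = 8) -> int:
    """Tokens in the coarse feature map (stride 8 is an assumption)."""
    return (height // stride) * (width // stride)


def full_attention_pairs(n: int) -> int:
    """Global self-attention compares every token with every token: O(N^2)."""
    return n * n


def pooled_attention_pairs(n: int, pool: int = 2) -> int:
    """Attention after pooling the token grid by pool x pool, for queries and
    keys alike (hypothetical stand-in for the down-sampled Transformer)."""
    m = n // (pool * pool)
    return m * m


def scan_steps(n: int) -> int:
    """A state-space (Mamba-style) scan visits each token once: O(N)."""
    return n


if __name__ == "__main__":
    for side in (640, 1280):
        n = coarse_tokens(side, side)
        print(f"{side}x{side}: tokens={n:,}, "
              f"full={full_attention_pairs(n):,}, "
              f"pooled={pooled_attention_pairs(n):,}, "
              f"scan={scan_steps(n):,}")
```

Under these assumptions, doubling the image side multiplies the token count by 4 and the full-attention pairs by 16, while the scan cost grows only 4 times. Pooling keeps the quadratic form but divides it by pool^4, which is why pairing a linear-time scan with a down-sampled attention stage is attractive at high resolution.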
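The abstract does not detail the coarse stage of the coarse-to-fine matching. Semi-dense matchers in the LoFTR family, including ELoFTR, typically score coarse features with a dual-softmax over a similarity matrix and keep mutual nearest neighbours above a confidence threshold before refining each match at fine resolution. Assuming TMamba follows the same convention, the coarse stage might be sketched with NumPy as follows; the `temperature` and `threshold` values are illustrative, and the fine refinement and multi-level losses are omitted.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable softmax along one axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def coarse_match(feat0: np.ndarray, feat1: np.ndarray,
                 temperature: float = 0.1, threshold: float = 0.2):
    """Dual-softmax + mutual-nearest-neighbour matching of coarse features.

    feat0: (N0, C) descriptors of image 0; feat1: (N1, C) of image 1.
    Returns index arrays (i, j) and the confidence of each match.
    """
    sim = feat0 @ feat1.T / temperature                  # (N0, N1) scaled similarity
    conf = softmax(sim, axis=1) * softmax(sim, axis=0)   # dual-softmax score
    # keep (i, j) only if j is i's best match and i is j's best match
    mutual = (conf == conf.max(axis=1, keepdims=True)) & \
             (conf == conf.max(axis=0, keepdims=True))
    i, j = np.nonzero(mutual & (conf > threshold))
    return i, j, conf[i, j]


if __name__ == "__main__":
    feat0 = np.eye(4)               # four orthonormal toy descriptors
    feat1 = feat0[[2, 0, 3, 1]]     # the same descriptors, permuted
    i, j, c = coarse_match(feat0, feat1)
    print(list(zip(i.tolist(), j.tolist())))
```

In the toy example each row of `feat0` is matched to the row of `feat1` holding the same descriptor, giving [(0, 1), (1, 3), (2, 0), (3, 2)]. The mutual check discards one-sided matches, which keeps the coarse candidates sparse before fine-level refinement.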

Key words

Information processing technology/Image matching/Scene matching/Visual state space model/Lightweight design

Cite this article

Shen Tianqi, Li Ning. Efficient Semi-Dense Scene Matching Network Based on Hybrid State Space Model and Transformer[EB/OL]. (2026-05-09)[2026-05-11]. http://www.paper.edu.cn/releasepaper/content/202605-22.

Subject classification

Computing technology; computer technology

First published: 2026-05-09