Cross-modal State Space Modeling for Real-time RGB-thermal Wild Scene Semantic Segmentation
The integration of RGB and thermal data can significantly improve semantic segmentation performance in wild environments for field robots. However, multi-source data processing (e.g., Transformer-based approaches) imposes significant computational overhead, which is challenging for resource-constrained systems. To address this limitation, we introduce CM-SSM, an efficient RGB-thermal semantic segmentation architecture based on cross-modal state space modeling (SSM). Our framework comprises two key components. First, we introduce a cross-modal 2D-selective-scan (CM-SS2D) module that establishes an SSM between the RGB and thermal modalities: it constructs cross-modal visual sequences and derives the hidden state representation of one modality from the other. Second, we develop a cross-modal state space association (CM-SSA) module that effectively integrates the global associations from CM-SS2D with local spatial features extracted through convolutional operations. In contrast with Transformer-based approaches, CM-SSM achieves linear computational complexity with respect to image resolution. Experimental results show that CM-SSM achieves state-of-the-art performance on the CART dataset with fewer parameters and lower computational cost. Further experiments on the PST900 dataset demonstrate its generalizability. Code is available at https://github.com/xiaodonguo/CMSSM.
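For readers unfamiliar with selective state space models, the sketch below illustrates in simplified PyTorch the general idea behind cross-modal state space modeling as described in the abstract: one modality's token sequence drives the recurrence while the other modality predicts the discretized SSM parameters, so the hidden state of one stream is conditioned on the other. The class name `CrossModalSSMSketch`, the tensor shapes, and the sequential (non-parallel) scan are illustrative assumptions, not the authors' CM-SS2D implementation; see the linked repository for the actual code.

```python
# Conceptual sketch only (assumed names/shapes); NOT the authors' CM-SS2D module.
# RGB tokens are the SSM inputs; thermal tokens predict the discretized
# parameters (dt, B, C), so the RGB hidden state is derived from the thermal stream.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalSSMSketch(nn.Module):
    def __init__(self, dim: int, state_dim: int = 16):
        super().__init__()
        self.state_dim = state_dim
        self.to_dt = nn.Linear(dim, dim)        # per-channel step size from the other modality
        self.to_B = nn.Linear(dim, state_dim)   # input matrix from the other modality
        self.to_C = nn.Linear(dim, state_dim)   # output matrix from the other modality
        self.A_log = nn.Parameter(torch.zeros(dim, state_dim))  # learnable state decay

    def forward(self, x_rgb: torch.Tensor, x_thermal: torch.Tensor) -> torch.Tensor:
        # x_rgb, x_thermal: (B, L, D) flattened visual sequences (e.g., raster scan of H*W tokens)
        B, L, D = x_rgb.shape
        dt = F.softplus(self.to_dt(x_thermal))   # (B, L, D)
        Bm = self.to_B(x_thermal)                # (B, L, N)
        Cm = self.to_C(x_thermal)                # (B, L, N)
        A = -torch.exp(self.A_log)               # (D, N), negative for stability

        h = x_rgb.new_zeros(B, D, self.state_dim)
        ys = []
        for t in range(L):  # sequential scan for clarity; real implementations use a parallel scan
            dA = torch.exp(dt[:, t].unsqueeze(-1) * A)            # (B, D, N)
            dB = dt[:, t].unsqueeze(-1) * Bm[:, t].unsqueeze(1)   # (B, D, N)
            h = dA * h + dB * x_rgb[:, t].unsqueeze(-1)           # state update driven by RGB input
            ys.append((h * Cm[:, t].unsqueeze(1)).sum(-1))        # readout, (B, D)
        return torch.stack(ys, dim=1)                             # (B, L, D)


if __name__ == "__main__":
    rgb = torch.randn(2, 64, 32)   # e.g., an 8x8 feature map flattened to 64 tokens
    thr = torch.randn(2, 64, 32)
    fused = CrossModalSSMSketch(dim=32)(rgb, thr)
    print(fused.shape)  # torch.Size([2, 64, 32])
```

Because the recurrence processes the flattened token sequence once, the cost grows linearly with the number of tokens (i.e., with image resolution), in contrast to the quadratic cost of cross-attention.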
Xiaodong Guo, Zi'ang Lin, Luwen Hu, Zhihong Deng, Tong Liu, Wujie Zhou
Subjects: Computing and Computer Technology; Applications of Electronic Technology
Xiaodong Guo, Zi'ang Lin, Luwen Hu, Zhihong Deng, Tong Liu, Wujie Zhou. Cross-modal State Space Modeling for Real-time RGB-thermal Wild Scene Semantic Segmentation [EB/OL]. (2025-06-22) [2025-07-21]. https://arxiv.org/abs/2506.17869.