National Preprint Platform

Representation Discrepancy Bridging Method for Remote Sensing Image-Text Retrieval

Source: arXiv
English Abstract

Remote Sensing Image-Text Retrieval (RSITR) plays a critical role in geographic information interpretation, disaster monitoring, and urban planning by establishing semantic associations between images and textual descriptions. Existing Parameter-Efficient Fine-Tuning (PEFT) methods for Vision-and-Language Pre-training (VLP) models typically adopt symmetric adapter structures to explore cross-modal correlations. However, the strongly discriminative nature of the text modality may dominate the optimization process and inhibit image representation learning. This non-negligible imbalance in cross-modal optimization remains a bottleneck to improving model performance. To address this issue, this study proposes a Representation Discrepancy Bridging (RDB) method for the RSITR task. On the one hand, a Cross-Modal Asymmetric Adapter (CMAA) is designed to enable modality-specific optimization and improve feature alignment. The CMAA comprises a Visual Enhancement Adapter (VEA) and a Text Semantic Adapter (TSA): the VEA mines fine-grained image features through a Differential Attention (DA) mechanism, while the TSA identifies key textual semantics through a Hierarchical Attention (HA) mechanism. On the other hand, this study extends the traditional single-task retrieval framework to a dual-task optimization framework and develops a Dual-Task Consistency Loss (DTCL). The DTCL improves the robustness of cross-modal alignment through an adaptive weighted combination of cross-modal, classification, and exponential moving average (EMA) consistency constraints. Experiments on the RSICD and RSITMD datasets show that the proposed RDB method achieves a 6%-11% improvement in the mR metric over state-of-the-art PEFT methods and a 1.15%-2% improvement over the fully fine-tuned GeoRSCLIP model.
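The abstract names the Differential Attention (DA) mechanism used inside the VEA but gives no formula. As a minimal sketch, assuming the generic single-head formulation from the Differential Transformer (Ye et al., 2024) rather than the authors' exact VEA design: two softmax attention maps are computed from separate query/key projections, and the second map, scaled by λ, is subtracted from the first so that attention mass shared by both maps (treated as noise) cancels out. The fixed `lam` value and all function names are illustrative assumptions; the Differential Transformer learns λ and adds per-head normalization, both omitted here. Plain Python lists are used so the sketch runs without dependencies.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b (rows of a against columns of b)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Row-wise softmax, subtracting the row max for numerical stability."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def scaled_scores(q: Matrix, k: Matrix) -> Matrix:
    """Q K^T / sqrt(d), where d is the query/key width."""
    scale = 1.0 / math.sqrt(len(q[0]))
    return [[s * scale for s in row] for row in matmul(q, transpose(k))]


def differential_attention(q1: Matrix, k1: Matrix, q2: Matrix, k2: Matrix,
                           v: Matrix, lam: float = 0.5) -> Matrix:
    """(softmax(Q1 K1^T / sqrt(d)) - lam * softmax(Q2 K2^T / sqrt(d))) V.

    `lam` is a fixed, hypothetical value here; the Differential Transformer learns it.
    """
    a1 = softmax_rows(scaled_scores(q1, k1))
    a2 = softmax_rows(scaled_scores(q2, k2))
    diff = [[x - lam * y for x, y in zip(r1, r2)] for r1, r2 in zip(a1, a2)]
    return matmul(diff, v)
```

When the two maps are identical, the output is just (1 - λ) times ordinary attention, so the subtraction only removes something useful when the two query/key projections differ and λ is trained jointly with them.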
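The abstract describes the DTCL only as an adaptive weighted combination of cross-modal, classification, and EMA consistency constraints, without specifying the weighting rule. The sketch below shows one plausible reading and should not be taken as the paper's loss. An EMA "teacher" copy of the parameters is updated from the student, a mean-squared consistency term compares student and teacher embeddings, and the three scalar losses are combined with learnable log-variance weights. This weighting uses the common simplified form of homoscedastic-uncertainty weighting (Kendall et al., 2018), a technique swapped in here purely for illustration. The names `ema_update`, `consistency_loss`, and `combine_losses`, as well as the momentum value, the loss keys, and all numbers, are hypothetical.

```python
import math
from typing import Dict, List


def ema_update(teacher: List[float], student: List[float], momentum: float = 0.999) -> List[float]:
    """Exponential moving average: teacher <- m * teacher + (1 - m) * student."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


def consistency_loss(student_emb: List[float], teacher_emb: List[float]) -> float:
    """Mean squared difference between student and EMA-teacher embeddings."""
    return sum((a - b) ** 2 for a, b in zip(student_emb, teacher_emb)) / len(student_emb)


def combine_losses(losses: Dict[str, float], log_vars: Dict[str, float]) -> float:
    """Assumed adaptive weighting: sum_i exp(-s_i) * L_i + s_i.

    Each s_i is a learnable log-variance; a large s_i down-weights its loss,
    while the +s_i term stops the optimizer from shrinking every weight to zero.
    """
    return sum(math.exp(-log_vars[name]) * value + log_vars[name]
               for name, value in losses.items())


# Hypothetical per-batch values for the three DTCL components.
teacher = ema_update(teacher=[0.0, 0.0], student=[1.0, 2.0], momentum=0.9)
losses = {
    "cross_modal": 1.2,
    "classification": 0.7,
    "ema_consistency": consistency_loss([1.0, 2.0], teacher),
}
total = combine_losses(
    losses, log_vars={"cross_modal": 0.0, "classification": 0.5, "ema_consistency": 1.0}
)
```

In a real training loop the log-variances would be parameters updated by the same optimizer as the adapters, which is what makes the weighting adaptive rather than hand-tuned.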
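The reported gains are in the mR metric, which in the RSITR literature is usually the mean of Recall@1, Recall@5, and Recall@10 over both image-to-text and text-to-image retrieval (six values in total). Below is a minimal sketch, assuming a square similarity matrix in which query i's only correct match is candidate i. RSICD and RSITMD actually pair each image with five captions, so a faithful evaluation must count a hit if any of an image's captions ranks in the top k. The function names are illustrative.

```python
from typing import List, Tuple


def recall_at_k(sim: List[List[float]], k: int) -> float:
    """Percentage of queries (rows) whose true match (same index) ranks in the top k."""
    hits = 0
    for i, row in enumerate(sim):
        # Rank = number of candidates scoring strictly higher than the true match,
        # so ties are resolved in the query's favor (an optimistic simplification).
        rank = sum(1 for s in row if s > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


def mean_recall(sim: List[List[float]], ks: Tuple[int, ...] = (1, 5, 10)) -> float:
    """mR: average of R@k for image-to-text (rows) and text-to-image (columns)."""
    sim_t = [list(col) for col in zip(*sim)]
    recalls = [recall_at_k(sim, k) for k in ks] + [recall_at_k(sim_t, k) for k in ks]
    return sum(recalls) / len(recalls)
```

For a 2x2 matrix where every true match ranks second, R@1 is 0 in both directions while R@5 and R@10 are 100, giving an mR of about 66.7.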

Hailong Ning, Siying Wang, Tao Lei, Xiaopeng Cao, Huanmin Dou, Bin Zhao, Asoke K. Nandi, Petia Radeva

Remote Sensing Technology

Hailong Ning, Siying Wang, Tao Lei, Xiaopeng Cao, Huanmin Dou, Bin Zhao, Asoke K. Nandi, Petia Radeva. Representation Discrepancy Bridging Method for Remote Sensing Image-Text Retrieval [EB/OL]. (2025-05-22) [2025-07-20]. https://arxiv.org/abs/2505.16756.