SCMM: Calibrating Cross-modal Fusion for Text-Based Person Search

Source: arXiv
Abstract

Text-Based Person Search (TBPS) faces critical challenges in cross-modal information fusion, requiring effective alignment of visual and textual modalities for person retrieval using natural language queries. Existing methods struggle with cross-modal heterogeneity, where visual and textual features reside in disparate semantic spaces, creating substantial inter-modal gaps that limit fusion effectiveness. We propose SCMM (Sew Calibration and Masked Modeling), a novel framework addressing these fusion challenges through two complementary mechanisms. First, our sew calibration loss implements adaptive margin constraints guided by caption quality, dynamically aligning image-text features while accommodating varying information density across modalities. Second, our masked caption modeling loss establishes fine-grained cross-modal correspondences through masked prediction tasks and cross-modal attention, enabling detailed visual-textual relationship learning. The streamlined dual-encoder architecture maintains computational efficiency while achieving superior fusion performance through synergistic alignment and correspondence strategies. Extensive experiments on three benchmark datasets validate SCMM's effectiveness, achieving state-of-the-art Rank-1 accuracies of 73.81%, 64.25%, and 57.35% on CUHK-PEDES, ICFG-PEDES, and RSTPReID, respectively. These results demonstrate the importance of quality-aware adaptive constraints and fine-grained correspondence modeling in advancing multimodal information fusion for person search applications.
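
The idea of an adaptive margin guided by caption quality can be illustrated with a minimal sketch. The abstract does not give the actual form of the sew calibration loss, so everything below is an assumption made for illustration: the hinge-style ranking formulation, the function name adaptive_margin_loss, the base_margin parameter, the per-caption quality score in [0, 1], and the rule that the margin scales linearly with that score. It should not be read as the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def adaptive_margin_loss(image_feats, text_feats, quality, base_margin=0.2):
    """Hinge ranking loss whose margin grows with caption quality.

    image_feats[i] and text_feats[i] are a matched image-caption pair.
    quality[i] in [0, 1] is a hypothetical caption-quality score; the
    linear scaling of the margin is an assumption, not taken from the paper.
    """
    n = len(text_feats)
    total, count = 0.0, 0
    for i in range(n):
        margin = base_margin * quality[i]  # assumed scaling rule
        pos = cosine(image_feats[i], text_feats[i])  # matched pair
        for j in range(n):
            if j == i:
                continue
            neg = cosine(image_feats[j], text_feats[i])  # mismatched image
            # Penalize negatives that come within `margin` of the positive.
            total += max(0.0, margin - pos + neg)
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    captions = [[1.0, 0.0], [0.0, 1.0]]
    print(adaptive_margin_loss(images, captions, quality=[1.0, 0.5]))
```

Under this assumed rule, a detailed caption (quality near 1) must be separated from mismatched images by the full margin, while a vague caption (quality near 0) is held to a looser constraint, which is one plausible reading of "accommodating varying information density across modalities".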

Jing Liu, Donglai Wei, Yang Liu, Sipeng Zhang, Tong Yang, Wei Zhou, Weiping Ding, Victor C. M. Leung

Computing technology, computer technology

Jing Liu, Donglai Wei, Yang Liu, Sipeng Zhang, Tong Yang, Wei Zhou, Weiping Ding, Victor C. M. Leung. SCMM: Calibrating Cross-modal Fusion for Text-Based Person Search [EB/OL]. (2025-07-09) [2025-07-22]. https://arxiv.org/abs/2304.02278.
