Class Similarity-Based Multimodal Classification under Heterogeneous Category Sets

Source: arXiv
Abstract

Existing multimodal methods typically assume that all modalities share the same category set. In real-world applications, however, the category distributions across modalities are often inconsistent, which hinders a model's ability to use cross-modal information to recognize all categories. In this work, we propose a practical setting termed Multi-Modal Heterogeneous Category-set Learning (MMHCL), in which models are trained on multimodal data with heterogeneous category sets and must recognize the complete class set of all modalities at test time. To address this task, we propose a Class Similarity-based Cross-modal Fusion model (CSCF). Specifically, CSCF first aligns modality-specific features to a shared semantic space to enable knowledge transfer between seen and unseen classes. It then selects the most discriminative modality for decision fusion through uncertainty estimation. Finally, it integrates cross-modal information based on class similarity, with the auxiliary modality refining the prediction of the dominant one. Experimental results show that our method significantly outperforms existing state-of-the-art (SOTA) approaches on multiple benchmark datasets, effectively addressing the MMHCL task.
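
The abstract gives no formulas, so the following is only a rough sketch of the last two steps (uncertainty-based modality selection, then class-similarity-based refinement), not the authors' implementation. It assumes, purely for illustration, that uncertainty is measured by the entropy of the softmax output, that class similarity is the cosine similarity between class embeddings in the shared semantic space, and that fusion is a convex combination with a hypothetical weight `alpha`. All function names here are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy, used here as an assumed uncertainty measure.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def class_similarity(class_embeddings):
    # Row-normalized, non-negative cosine similarity between class embeddings
    # (assumed non-zero vectors in the shared semantic space).
    n = len(class_embeddings)
    sim = []
    for i in range(n):
        row = [max(cosine(class_embeddings[i], class_embeddings[j]), 0.0)
               for j in range(n)]
        row_sum = sum(row)
        sim.append([r / row_sum for r in row])
    return sim

def fuse(logits_a, logits_b, class_embeddings, alpha=0.5):
    # alpha is a hypothetical mixing weight, not a parameter from the paper.
    p_a, p_b = softmax(logits_a), softmax(logits_b)
    # The modality with lower entropy is treated as the dominant one.
    if entropy(p_a) <= entropy(p_b):
        dominant, auxiliary = p_a, p_b
    else:
        dominant, auxiliary = p_b, p_a
    sim = class_similarity(class_embeddings)
    n = len(dominant)
    # Spread the auxiliary prediction onto similar classes, so a class the
    # auxiliary modality never saw can still receive probability mass.
    spread = [sum(sim[j][i] * auxiliary[j] for j in range(n)) for i in range(n)]
    fused = [(1 - alpha) * d + alpha * s for d, s in zip(dominant, spread)]
    total = sum(fused)
    return [f / total for f in fused]

if __name__ == "__main__":
    embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(fuse([2.0, 0.1, 0.1], [0.5, 0.4, 0.3], embeddings))
```

In this toy call the first modality is the more confident one, so it acts as the dominant prediction and the second modality only adjusts it through the similarity matrix.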

Yangrui Zhu, Junhua Bao, Yipan Wei, Yapeng Li, Bo Du

Computing technology, computer technology

Yangrui Zhu, Junhua Bao, Yipan Wei, Yapeng Li, Bo Du. Class Similarity-Based Multimodal Classification under Heterogeneous Category Sets [EB/OL]. (2025-06-11) [2025-07-16]. https://arxiv.org/abs/2506.09745.
