
Robust Multimodal Learning via Entropy-Gated Contrastive Fusion

Source: arXiv
Abstract

Real-world multimodal systems routinely face missing-input scenarios: a robot may lose audio in a factory, or a clinical record may omit lab tests at inference time. Standard fusion layers preserve either robustness or calibration, but never both. We introduce Adaptive Entropy-Gated Contrastive Fusion (AECF), a single lightweight layer that (i) adapts its entropy coefficient per instance, (ii) enforces monotone calibration across all modality subsets, and (iii) drives a curriculum mask directly from training-time entropy. On AV-MNIST and MS-COCO, AECF improves masked-input mAP by +18 pp at a 50% drop rate while reducing ECE by up to 200%, yet adds only 1% to run-time. All backbones remain frozen, making AECF an easy drop-in layer for robust, calibrated multimodal inference.
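The abstract only sketches the mechanism, so the following minimal PyTorch sketch illustrates the core signal AECF gates on: per-instance fusion weights over modality embeddings and their entropy. This is not the authors' implementation; the class name EntropyGatedFusion, its interface, and the toy shapes are assumptions, and the contrastive objective, monotone-calibration constraint, and entropy-driven curriculum scheduler described in the abstract are omitted.

```python
# Hypothetical sketch (not the authors' code). Assumes each modality has
# already been encoded to a fixed-size embedding by a frozen backbone.
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F


class EntropyGatedFusion(nn.Module):
    """Fuse modality embeddings with learned attention weights and return
    the per-instance entropy of those weights, which a training-time
    curriculum mask could be driven from."""

    def __init__(self, dim: int, num_modalities: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # per-modality relevance score
        self.num_modalities = num_modalities

    def forward(self, feats: torch.Tensor, mask: Optional[torch.Tensor] = None):
        # feats: (batch, num_modalities, dim); mask: (batch, num_modalities) bool
        logits = self.score(feats).squeeze(-1)            # (batch, num_modalities)
        if mask is not None:
            # Missing modalities get -inf so they receive zero fusion weight.
            logits = logits.masked_fill(~mask, float("-inf"))
        weights = F.softmax(logits, dim=-1)               # per-instance fusion weights
        # Per-instance entropy of the fusion weights (the gating signal).
        entropy = -(weights * torch.log(weights.clamp_min(1e-12))).sum(-1)
        fused = (weights.unsqueeze(-1) * feats).sum(dim=1)  # (batch, dim)
        return fused, entropy


# Toy usage: 3 modalities, with the second modality missing for instance 2.
fusion = EntropyGatedFusion(dim=16, num_modalities=3)
x = torch.randn(2, 3, 16)
mask = torch.tensor([[True, True, True], [True, False, True]])
fused, ent = fusion(x, mask)
print(fused.shape, ent)  # torch.Size([2, 16]) and one entropy value per instance
```

Because the entropy is computed per instance, it can serve both as a regularization target (with an adaptive coefficient) and as a signal for deciding how aggressively to mask modalities during training, which is the role the abstract assigns to it.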

Leon Chlon, Maggie Chlon, MarcAntonio M. Awada

Subjects: Computing Technology; Computer Technology

Leon Chlon, Maggie Chlon, MarcAntonio M. Awada. Robust Multimodal Learning via Entropy-Gated Contrastive Fusion [EB/OL]. (2025-05-21) [2025-06-17]. https://arxiv.org/abs/2505.15417.
