FALCON: False-Negative Aware Learning of Contrastive Negatives in Vision-Language Pretraining

Source: arXiv
Abstract

False negatives pose a critical challenge in vision-language pretraining (VLP) due to the many-to-many correspondence between images and texts in large-scale datasets. These false negatives introduce conflicting supervision signals that degrade the learned embedding space and diminish the effectiveness of hard negative sampling. In this paper, we propose FALCON (False-negative Aware Learning of COntrastive Negatives), a learning-based mini-batch construction strategy that adaptively balances the trade-off between hard and false negatives during VLP. Rather than relying on fixed heuristics, FALCON employs a negative mining scheduler that dynamically selects negative samples of appropriate hardness for each anchor instance during mini-batch construction, guided by a proxy for cross-modal alignment improvement. Experimental results demonstrate that FALCON significantly improves performance across two widely adopted VLP frameworks (ALBEF, BLIP-2) and a broad range of downstream tasks and evaluation settings, underscoring its effectiveness and robustness in mitigating the impact of false negatives.

Myunsoo Kim, Seong-Woong Shim, Byung-Jun Lee

Subject: Computing technology, computer technology

Myunsoo Kim, Seong-Woong Shim, Byung-Jun Lee. FALCON: False-Negative Aware Learning of Contrastive Negatives in Vision-Language Pretraining [EB/OL]. (2025-05-16) [2025-06-18]. https://arxiv.org/abs/2505.11192.
