
Distillation versus Contrastive Learning: How to Train Your Rerankers

Source: arXiv

Abstract

Training text rerankers is crucial for information retrieval. Two primary strategies are widely used: contrastive learning (optimizing directly on ground-truth labels) and knowledge distillation (transferring knowledge from a larger reranker). While both have been studied in the literature, a clear comparison of their effectiveness for training cross-encoder rerankers under practical conditions is needed. This paper empirically compares these strategies by training rerankers of different sizes and architectures with both methods on the same data, using a strong contrastive learning model as the distillation teacher. Our results show that knowledge distillation generally yields better in-domain and out-of-domain ranking performance than contrastive learning when distilling from a larger teacher model, and this finding is consistent across student model sizes and architectures. However, distilling from a teacher of the same capacity does not provide the same advantage, particularly for out-of-domain tasks. These findings offer practical guidance for choosing a training strategy based on the available teacher models: we recommend knowledge distillation for training smaller rerankers when a larger, more powerful teacher is accessible; otherwise, contrastive learning provides a strong and reliable alternative.
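To make the contrast concrete, below is a minimal PyTorch sketch of the two training objectives described in the abstract: a contrastive loss over ground-truth labels versus a distillation loss that matches a teacher's scores. This is an illustrative assumption, not the paper's actual implementation; tensor shapes, the temperature parameter, and the convention that index 0 is the labeled positive are all hypothetical choices made for the example.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(student_scores: torch.Tensor) -> torch.Tensor:
    """Contrastive learning on ground-truth labels: for each query, index 0 is
    assumed to be the labeled positive passage and the remaining columns are
    negatives; optimize softmax cross-entropy over the candidate list."""
    labels = torch.zeros(student_scores.size(0), dtype=torch.long)
    return F.cross_entropy(student_scores, labels)

def distillation_loss(student_scores: torch.Tensor,
                      teacher_scores: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    """Knowledge distillation: align the student's score distribution over the
    candidate list with the (larger) teacher's, via KL divergence."""
    student_log_probs = F.log_softmax(student_scores / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_scores / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

# Toy usage: a batch of 2 queries, each scored against 1 positive + 3 negatives.
student_scores = torch.randn(2, 4)   # student cross-encoder scores
teacher_scores = torch.randn(2, 4)   # scores from a hypothetical larger teacher
print(contrastive_loss(student_scores))
print(distillation_loss(student_scores, teacher_scores))
```

The key practical difference the paper studies is visible here: the contrastive objective needs only ground-truth labels, while the distillation objective additionally requires teacher scores for every candidate, which is why the availability of a stronger teacher drives the recommendation.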

Zhichao Xu, Zhiqi Huang, Shengyao Zhuang, Ashim Gupta, Vivek Srikumar

Subject: Computing Technology, Computer Technology

Zhichao Xu, Zhiqi Huang, Shengyao Zhuang, Ashim Gupta, Vivek Srikumar. Distillation versus Contrastive Learning: How to Train Your Rerankers [EB/OL]. (2025-07-11) [2025-08-02]. https://arxiv.org/abs/2507.08336
