Towards Robust Offline Evaluation: A Causal and Information Theoretic Framework for Debiasing Ranking Systems
Evaluating retrieval-ranking systems is crucial for developing high-performing models. While online A/B testing is the gold standard, its high cost and risks to user experience make effective offline methods necessary. However, relying on historical interaction data introduces biases (such as selection, exposure, conformity, and position biases) that distort evaluation metrics. These biases stem from the Missing-Not-At-Random (MNAR) nature of user interactions and favor popular or frequently exposed items over true user preferences. We propose a novel framework for robust offline evaluation of retrieval-ranking systems that transforms MNAR data into Missing-At-Random (MAR) data through reweighting combined with black-box optimization, guided by neural estimation of information-theoretic metrics. Our contributions include (1) a causal formulation for addressing offline evaluation biases, (2) a system-agnostic debiasing framework, and (3) empirical validation of its effectiveness. This framework enables more accurate, fair, and generalizable evaluations, enhancing model assessment before deployment.
Seyedeh Baharan Khatami, Sayan Chakraborty, Ruomeng Xu, Babak Salimi
Computing technology, computer technology
Seyedeh Baharan Khatami, Sayan Chakraborty, Ruomeng Xu, Babak Salimi. Towards Robust Offline Evaluation: A Causal and Information Theoretic Framework for Debiasing Ranking Systems [EB/OL]. (2025-04-04) [2025-05-23]. https://arxiv.org/abs/2504.03997.