首页|Generalised Probabilistic Modelling and Improved Uncertainty Estimation in Comparative LLM-as-a-judge

Generalised Probabilistic Modelling and Improved Uncertainty Estimation in Comparative LLM-as-a-judge

来源：

英文摘要

This paper explores generalised probabilistic modelling and uncertainty estimation in comparative LLM-as-a-judge frameworks. We show that existing Product-of-Experts methods are specific cases of a broader framework, enabling diverse modelling options. Furthermore, we propose improved uncertainty estimates for individual comparisons, enabling more efficient selection and achieving strong performance with fewer evaluations. We also introduce a method for estimating overall ranking uncertainty. Finally, we demonstrate that combining absolute and comparative scoring improves performance. Experiments show that the specific expert model has a limited impact on final rankings but our proposed uncertainty estimates, especially the probability of reordering, significantly improve the efficiency of systems reducing the number of needed comparisons by ~50%. Furthermore, ranking-level uncertainty metrics can be used to identify low-performing predictions, where the nature of the probabilistic model has a notable impact on the quality of the overall uncertainty.

作者：Yassir Fathullah、Mark J. F. Gales

作者单位：

学科分类：计算技术、计算机技术

推荐引用：Yassir Fathullah,Mark J. F. Gales.Generalised Probabilistic Modelling and Improved Uncertainty Estimation in Comparative LLM-as-a-judge[EB/OL].(2025-05-21)[2025-06-24].https://arxiv.org/abs/2505.15240.点此复制

Generalised Probabilistic Modelling and Improved Uncertainty Estimation in Comparative LLM-as-a-judge

Generalised Probabilistic Modelling and Improved Uncertainty Estimation in Comparative LLM-as-a-judge

评论