National Preprint Platform

Ethical AI on the Waitlist: Group Fairness Evaluation of LLM-Aided Organ Allocation


Source: arXiv
Abstract

Large Language Models (LLMs) are becoming ubiquitous, promising automation even in high-stakes scenarios. However, existing evaluation methods often fall short: benchmarks saturate, accuracy-based metrics are overly simplistic, and many inherently ambiguous problems lack a clear ground truth. Given these limitations, evaluating fairness becomes complex. To address this, we reframe fairness evaluation using Borda scores, a method from voting theory, as a nuanced yet interpretable metric for measuring fairness. Using organ allocation as a case study, we introduce two tasks: (1) Choose-One and (2) Rank-All. In Choose-One, LLMs select a single candidate for a kidney, and we assess fairness across demographics using proportional parity. In Rank-All, LLMs rank all candidates for a kidney, reflecting real-world allocation processes. Since traditional fairness metrics do not account for ranking, we propose a novel application of Borda scoring to capture biases. Our findings highlight the potential of voting-based metrics to provide a richer, more multifaceted evaluation of LLM fairness.
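To make the two metrics concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. For Choose-One, it assumes proportional parity is measured as a group's share of selections divided by its share of all candidates presented, so 1.0 means exact parity. For Rank-All, it uses the standard Borda count (position i of n earns n - 1 - i points), normalized by n - 1 to lie in [0, 1] and averaged per group. The function names `proportional_parity` and `borda_by_group` and the group labels are hypothetical, and the paper's exact normalization may differ.

```python
from collections import defaultdict
from typing import Dict, Sequence


def proportional_parity(
    choices: Sequence[str], pools: Sequence[Sequence[str]]
) -> Dict[str, float]:
    """Choose-One sketch (assumed definition): for each demographic group,
    (share of all selections) / (share of all candidates presented).
    `choices[t]` is the group label of the candidate picked in trial t;
    `pools[t]` lists the group labels of every candidate shown in trial t."""
    if not choices:
        raise ValueError("need at least one selection")
    chosen: Dict[str, int] = defaultdict(int)
    present: Dict[str, int] = defaultdict(int)
    for group in choices:
        chosen[group] += 1
    for pool in pools:
        for group in pool:
            present[group] += 1
    n_choices = len(choices)
    n_present = sum(present.values())
    # Ratio of 1.0 means the group is picked exactly in proportion to its presence.
    return {
        g: (chosen[g] / n_choices) / (present[g] / n_present) for g in present
    }


def borda_by_group(rankings: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Rank-All sketch: mean normalized Borda score per group.
    Each ranking lists group labels from best (index 0) to worst.
    Position i in a ranking of n candidates earns (n - 1 - i) / (n - 1)."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        if n < 2:
            raise ValueError("each ranking needs at least two candidates")
        for pos, group in enumerate(ranking):
            totals[group] += (n - 1 - pos) / (n - 1)
            counts[group] += 1
    return {g: totals[g] / counts[g] for g in totals}


if __name__ == "__main__":
    # Hypothetical toy data: groups "A" and "B" appear equally often,
    # but "A" is chosen twice as often as "B".
    print(proportional_parity(["A", "A", "B"], [["A", "B"]] * 3))
    # Two hypothetical rankings of three candidates from groups A, B, C.
    print(borda_by_group([["A", "B", "C"], ["A", "C", "B"]]))
```

In the toy Choose-One data, group A's ratio is about 1.33 and group B's about 0.67, flagging over-selection of A even though each single choice looks unremarkable. The Borda view rewards consistently high placement rather than only the top pick, which is why it can surface ranking bias that a selection-rate metric misses.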

Hannah Murray, Brian Hyeongseok Kim, Isabelle Lee, Jason Byun, Dani Yogatama, Evi Micha

Subjects: Current Status and Development of Medicine; Medical Research Methods

Hannah Murray, Brian Hyeongseok Kim, Isabelle Lee, Jason Byun, Dani Yogatama, Evi Micha. Ethical AI on the Waitlist: Group Fairness Evaluation of LLM-Aided Organ Allocation[EB/OL]. (2025-03-29)[2025-05-04]. https://arxiv.org/abs/2504.03716.
