Towards Assessing Medical Ethics from Knowledge to Practice
The integration of large language models (LLMs) into healthcare necessitates a rigorous evaluation of their ethical reasoning, an area that current benchmarks often overlook. We introduce PrinciplismQA, a comprehensive benchmark of 3,648 questions designed to systematically assess LLMs' alignment with core medical ethics. Grounded in Principlism, the benchmark is built on a high-quality dataset comprising multiple-choice questions curated from authoritative textbooks and open-ended questions drawn from authoritative case-study literature in medical ethics, all validated by medical experts. Our experiments reveal a significant gap between models' ethical knowledge and their practical application, especially in dynamically applying ethical principles to real-world scenarios. Most LLMs struggle with dilemmas concerning Beneficence, often over-emphasizing other principles. Frontier closed-source models, driven by strong general capabilities, currently lead the benchmark. Notably, medical-domain fine-tuning can enhance models' overall ethical competence, but further progress requires better alignment with medical ethical knowledge. PrinciplismQA offers a scalable framework for diagnosing these specific ethical weaknesses, paving the way for more balanced and responsible medical AI.
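The abstract contrasts ethical knowledge (multiple-choice questions) with practical application (open-ended cases) and describes diagnosing weaknesses per principle. A per-principle breakdown of that kind could be computed as in the minimal sketch below. It assumes that each graded item carries a principle tag, a question type, and a numeric score. The `ScoredItem` schema, the `per_principle_scores` and `knowledge_practice_gap` helpers, and the assumption that open-ended answers are graded into [0, 1] are all hypothetical illustrations, not the authors' evaluation code. The principle list is the classic four of Principlism.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable

# The four classic principles of Principlism (Beauchamp and Childress).
PRINCIPLES = ("autonomy", "beneficence", "non-maleficence", "justice")


@dataclass(frozen=True)
class ScoredItem:
    """One graded benchmark item (hypothetical schema, not the paper's)."""
    principle: str  # principle the item targets, one of PRINCIPLES
    kind: str       # "mcq" (knowledge) or "open" (practice)
    score: float    # 1.0/0.0 for MCQ; assumed to be graded in [0, 1] for open-ended


def per_principle_scores(items: Iterable[ScoredItem]) -> Dict[str, Dict[str, float]]:
    """Return the mean score for every (principle, kind) pair present in `items`."""
    totals: Dict[tuple, float] = defaultdict(float)
    counts: Dict[tuple, int] = defaultdict(int)
    for item in items:
        if item.principle not in PRINCIPLES:
            raise ValueError(f"unknown principle: {item.principle}")
        key = (item.principle, item.kind)
        totals[key] += item.score
        counts[key] += 1
    result: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (principle, kind), total in totals.items():
        result[principle][kind] = total / counts[(principle, kind)]
    return dict(result)


def knowledge_practice_gap(scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Knowledge (MCQ) score minus practice (open-ended) score, per principle.

    Principles lacking either question type are skipped.
    """
    return {
        principle: by_kind["mcq"] - by_kind["open"]
        for principle, by_kind in scores.items()
        if "mcq" in by_kind and "open" in by_kind
    }
```

A positive gap for a principle would mean that a model answers textbook questions about that principle better than it applies the principle to cases. This is the knowledge-to-practice pattern the abstract reports. Averaging per (principle, kind) pair rather than over the whole benchmark keeps a weak principle, such as Beneficence, from being masked by strong scores elsewhere.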
Yan Hu, Chang Hong, Minghao Wu, Qingying Xiao, Yuchi Wang, Xiang Wan, Guangjun Yu, Benyou Wang
Medical and health theory; Current status and development of medicine; Medical research methods
Yan Hu, Chang Hong, Minghao Wu, Qingying Xiao, Yuchi Wang, Xiang Wan, Guangjun Yu, Benyou Wang. Towards Assessing Medical Ethics from Knowledge to Practice[EB/OL]. (2025-08-07)[2025-08-18]. https://arxiv.org/abs/2508.05132.