Evaluating the Bias in LLMs for Surveying Opinion and Decision Making in Healthcare
Generative agents have been increasingly used to simulate human behaviour in silico, driven by large language models (LLMs). These simulacra serve as sandboxes for studying human behaviour without compromising privacy or safety. However, it remains unclear whether such agents can truly represent real individuals. This work compares survey data from the Understanding America Study (UAS) on healthcare decision-making with simulated responses from generative agents. Using demographic-based prompt engineering, we create digital twins of survey respondents and analyse how well different LLMs reproduce real-world behaviours. Our findings show that some LLMs fail to reflect realistic decision-making, for example by predicting universal vaccine acceptance. In contrast, Llama 3 captures variations across race and income more accurately, but it also introduces biases not present in the UAS data. This study highlights the potential of generative agents for behavioural research while underscoring the risks of bias from both LLMs and prompting strategies.
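To make the abstract's notion of demographic-based prompt engineering concrete, the sketch below shows one plausible way a respondent's demographic record could be turned into a "digital twin" persona prompt and paired with a survey item. The field names, the prompt wording, and the `llm` callable are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch only: building a persona prompt from UAS-style demographic
# attributes. Field names and the `llm` helper are hypothetical.

def build_persona_prompt(respondent: dict) -> str:
    """Turn a respondent's demographic record into a first-person persona prompt."""
    return (
        f"You are a {respondent['age']}-year-old {respondent['gender']} "
        f"living in the {respondent['region']} United States. "
        f"Your race/ethnicity is {respondent['race']}, your household income is "
        f"{respondent['income']}, and your highest education is {respondent['education']}. "
        "Answer the following survey question as this person would, "
        "choosing exactly one of the listed options."
    )

def ask_survey_question(respondent: dict, question: str, options: list[str], llm) -> str:
    """Query an LLM (e.g. Llama 3) with the persona prompt plus the survey item."""
    prompt = (
        build_persona_prompt(respondent)
        + f"\n\nQuestion: {question}\nOptions: {', '.join(options)}\nAnswer:"
    )
    return llm(prompt)  # `llm` is any callable wrapping the model under study

# Example with a made-up respondent record
respondent = {
    "age": 52, "gender": "woman", "region": "Midwestern",
    "race": "Black or African American", "income": "$40,000-$59,999",
    "education": "some college",
}
print(build_persona_prompt(respondent))
```

Responses generated this way could then be aggregated by demographic group and compared against the corresponding UAS survey distributions to assess how faithfully each model reproduces real-world variation.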
Yonchanok Khaokaew, Flora D. Salim, Andreas Züfle, Hao Xue, Taylor Anderson, C. Raina MacIntyre, Matthew Scotch, David J Heslop
Medical research methods; computing technology, computer technology
Yonchanok Khaokaew, Flora D. Salim, Andreas Züfle, Hao Xue, Taylor Anderson, C. Raina MacIntyre, Matthew Scotch, David J Heslop. Evaluating the Bias in LLMs for Surveying Opinion and Decision Making in Healthcare [EB/OL]. (2025-04-11) [2025-04-26]. https://arxiv.org/abs/2504.08260