Personalized Alignment of Large Language Models and Its Impact on Moral Judgment
Li Changjin 1, Jiao Liying 2, Chen Zhen 1, Xu Hengbin 1, Wu Shengtao 3, Xu Yan 1
Author information
- 1. National Demonstration Center for Experimental Psychology Education (Beijing Normal University); Beijing Key Laboratory of Applied Experimental Psychology; Faculty of Psychology, Beijing Normal University
- 2. Department of Psychology, School of Humanities and Social Sciences, Beijing Forestry University
- 3. Department of Philosophy, School of Philosophy and Sociology, Jilin University
Abstract

With the advent of the human-machine symbiosis era, the ethical dilemmas and algorithmic biases of large language models (LLMs) have raised widespread social concern, and steering artificial intelligence toward benevolent development has become an urgent and challenging issue in this field. This research examined how personalized alignment based on the HEXACO personality model affects the moral judgment of LLMs. Study 1 verified that LLMs can effectively express HEXACO personality traits by following prompts; Study 2 examined how personalized alignment influences the utilitarian tendencies of LLMs and how these effects compare with those observed in humans. The results showed that prompts specifying high Honesty-Humility, Agreeableness, and Conscientiousness significantly reduced the tendency of GPT-3.5, GPT-4, and ERNIE 3.5 to make utilitarian choices. Accordingly, this research proposes a personalized alignment framework for LLMs based on the HEXACO model and personality metatrait theory, emphasizing the moral salience effect of the Stability metatrait dimensions of Honesty-Humility, Agreeableness, and Conscientiousness in the personalized alignment of LLMs. The study provides a psychological basis for the theoretical construction and technical pathways of personalized AI alignment.
With the advent of the human-machine symbiosis era, the ethical dilemmas and algorithmic biases of large language models (LLMs) have triggered widespread ethical concerns. Guiding artificial intelligence (AI) toward benevolence has thus become an urgent and challenging imperative. This research explores a personalized alignment approach based on the HEXACO personality model and examines its impact on the moral judgment of LLMs. Specifically, the study aims to verify whether LLMs can effectively achieve personalized alignment through prompting and to systematically evaluate how such alignment influences utilitarian tendencies in LLMs compared to humans across various moral dilemmas. By leveraging mature psychological frameworks, this research seeks to provide a scientific basis for constructing controllable and ethical AI alignment strategies.

Study 1 tested GPT-3.5, GPT-4, and ERNIE 3.5 using HEXACO-based personality prompts across six domains at high, low, and baseline levels, integrated with different gender roles. Manipulation checks were conducted using two distinct methods: a quantitative assessment using the HEXACO-PI-R scale and a qualitative personal story-writing task rated by independent human evaluators. Study 2 utilized a set of standardized moral dilemmas to assess utilitarian versus deontological choices in both LLMs and human participants. Human data were categorized into high and low personality groups for comparison, while the LLMs performed the same moral judgment tasks under various personality settings to identify shifts in decision-making patterns.

The results of Study 1 confirmed the feasibility of personalized alignment, demonstrating that LLMs can dynamically represent HEXACO personality traits through prompts. Among the LLMs tested, GPT-4 exhibited superior instruction-following capabilities and more distinct trait differentiation than the other LLMs.
Findings from Study 2 revealed that personality alignment significantly alters the moral judgment of LLMs, though the impact varies across models and personality domains. Specifically, traits such as Honesty-Humility, Agreeableness, and Conscientiousness were found to reduce utilitarian tendencies, leading to a preference for deontological responses. While some traits, particularly Honesty-Humility, showed stable and consistent effects between humans and AI, others displayed divergent or even opposite patterns, highlighting fundamental differences in their respective moral reasoning mechanisms.

The study reached three primary conclusions. First, LLMs are capable of exhibiting stable and distinguishable personality tendencies that can be activated through prompt-based alignment. Second, the influence of Honesty-Humility on moral judgment is consistent across humans and different LLMs, whereas other personality domains show inconsistencies, suggesting that while the moral decision-making of LLMs shares partial cognitive logic with humans, fundamental differences remain. Third, the personality metatrait of Stability, and particularly the Honesty-Humility domain, demonstrates a significant moral salience effect in the personalized alignment process. Based on these insights, this research proposes a personalized alignment framework utilizing the HEXACO model and personality metatrait theory to systematically shape the moral responses of AI, providing a psychological foundation for the development of safe, controllable, and ethical AI systems. This framework emphasizes integrating psychological theories to mitigate ethical risks and to ensure that AI behavior remains consistent with human values.

Keywords
large language models / personalized alignment / moral judgment / HEXACO personality / metatrait

Citation
Li Changjin, Jiao Liying, Chen Zhen, Xu Hengbin, Wu Shengtao, Xu Yan. Personalized Alignment of Large Language Models and Its Impact on Moral Judgment [EB/OL]. (2026-03-13) [2026-03-19]. https://chinaxiv.org/abs/202603.00082.

Subject classification
Computing technology; computer technology