WikiPersonas: What Can We Learn From Personalized Alignment to Famous People?
Preference alignment has become a standard pipeline for finetuning models to follow \emph{generic} human preferences. Most work seeks to optimize models to produce responses that would be preferable \emph{on average}, simplifying the diverse and often \emph{contradictory} space of human preferences. While research has increasingly focused on personalized alignment, that is, adapting models to individual user preferences, there is a lack of personalized preference datasets that focus on nuanced, individual-level preferences. To address this, we introduce WikiPersona: the first fine-grained personalization dataset built on well-documented, famous individuals. Our dataset challenges models to align with these personas through an interpretable process: generating verifiable textual descriptions of a persona's background and preferences in addition to alignment. We systematically evaluate different personalization approaches and find that, whereas few-shot prompting with preferences and fine-tuning fail to ensure both effectiveness and efficiency, using \textit{inferred personal preferences} as prefixes enables effective personalization, especially on topics where preferences clash, while also leading to more equitable generalization across unseen personas.
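The abstract contrasts two ways of conditioning a model on a persona: few-shot prompting with example preferences, and prefixing the prompt with inferred preference descriptions. The sketch below illustrates only the prompt-construction difference under our own assumptions. The prompt templates, the `PreferencePair` type, and the functions `few_shot_prefix`, `inferred_preference_prefix`, and `build_prompt` are all hypothetical and are not the authors' implementation. The step that actually infers preferences for a persona (e.g. with a language model) is omitted, and the preference strings are passed in directly.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PreferencePair:
    """Hypothetical labeled example: a prompt with a chosen and a rejected response."""
    prompt: str
    chosen: str
    rejected: str


def few_shot_prefix(pairs: List[PreferencePair]) -> str:
    """Few-shot style (assumed template): show raw preference pairs before the query."""
    blocks = [
        f"Question: {p.prompt}\nPreferred answer: {p.chosen}\nDispreferred answer: {p.rejected}"
        for p in pairs
    ]
    return "\n\n".join(blocks)


def inferred_preference_prefix(preferences: List[str]) -> str:
    """Inferred-preference style (assumed template): a short, readable list of
    preference statements that were inferred for the persona beforehand."""
    bullets = "\n".join(f"- {pref}" for pref in preferences)
    return f"The user has the following preferences:\n{bullets}"


def build_prompt(prefix: str, query: str) -> str:
    """Prepend a persona prefix (of either style) to a new query."""
    return f"{prefix}\n\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    # Hypothetical, generic inputs; not drawn from the WikiPersona dataset.
    pairs = [PreferencePair("Should cities ban cars downtown?", "Yes, ...", "No, ...")]
    prefs = ["Favors public transit", "Values concise answers"]
    print(build_prompt(few_shot_prefix(pairs), "What is your view on bike lanes?"))
    print()
    print(build_prompt(inferred_preference_prefix(prefs), "What is your view on bike lanes?"))
```

One design point this makes concrete: the few-shot prefix grows with every example pair shown, while the inferred-preference prefix stays a compact, human-readable summary that can be checked against the persona's documented background.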
Zilu Tang, Afra Feyza Akyürek, Ekin Akyürek, Derry Wijaya
Computing technology; computer technology
Zilu Tang, Afra Feyza Akyürek, Ekin Akyürek, Derry Wijaya. WikiPersonas: What Can We Learn From Personalized Alignment to Famous People? [EB/OL]. (2025-05-19) [2025-06-05]. https://arxiv.org/abs/2505.13257.