Discovering Bias Associations through Open-Ended LLM Generations
Social biases embedded in Large Language Models (LLMs) raise critical concerns, resulting in representational harms -- unfair or distorted portrayals of demographic groups -- that may be expressed in subtle ways through generated language. Existing evaluation methods often depend on predefined identity-concept associations, limiting their ability to surface new or unexpected forms of bias. In this work, we present the Bias Association Discovery Framework (BADF), a systematic approach for extracting both known and previously unrecognized associations between demographic identities and descriptive concepts from open-ended LLM outputs. Through comprehensive experiments spanning multiple models and diverse real-world contexts, BADF enables robust mapping and analysis of the varied concepts that characterize demographic identities. Our findings advance the understanding of biases in open-ended generation and provide a scalable tool for identifying and analyzing bias associations in LLMs. Data, code, and results are available at https://github.com/JP-25/Discover-Open-Ended-Generation.
Jinhao Pan, Chahat Raj, Ziwei Zhu
Subject: Computing Technology, Computer Technology
Jinhao Pan, Chahat Raj, Ziwei Zhu. Discovering Bias Associations through Open-Ended LLM Generations [EB/OL]. (2025-08-02) [2025-08-26]. https://arxiv.org/abs/2508.01412.