Bias Analysis and Mitigation through Protected Attribute Detection and Regard Classification
Large language models (LLMs) acquire general linguistic knowledge from massive-scale pretraining. However, the pretraining data, which mainly comprises web-crawled text, contains undesirable social biases that can be perpetuated or even amplified by LLMs. In this study, we propose an efficient yet effective annotation pipeline to investigate social biases in pretraining corpora. Our pipeline consists of protected attribute detection to identify diverse demographics, followed by regard classification to analyze the polarity of the language towards each attribute. Through our experiments, we demonstrate the effect of our bias analysis and mitigation measures, focusing on Common Crawl as the most representative pretraining corpus.
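To make the two-stage pipeline concrete, the sketch below shows one plausible way to wire protected attribute detection and regard classification together with HuggingFace transformers pipelines. This is an illustration under stated assumptions, not the authors' implementation: both model checkpoint names are hypothetical placeholders, and regard is scored once per document as a simplification.

# Illustrative sketch only: both checkpoints below are hypothetical
# placeholders, not models released with the paper.
from transformers import pipeline

# Stage 1: protected attribute detection, framed as NER-style token
# classification over demographic mentions (e.g., gender, religion, ethnicity).
attribute_detector = pipeline(
    "token-classification",
    model="org/protected-attribute-ner",  # hypothetical checkpoint
    aggregation_strategy="simple",
)

# Stage 2: regard classification, i.e., the polarity of the language
# (positive / negative / neutral) toward the mentioned demographic.
regard_classifier = pipeline(
    "text-classification",
    model="org/regard-classifier",  # hypothetical checkpoint
)

def annotate(document: str) -> list[dict]:
    """Tag demographic mentions, then score the regard of the text toward them."""
    spans = attribute_detector(document)
    if not spans:
        return []
    # Simplification: one regard score per document; a finer-grained pipeline
    # could score regard per sentence or per detected mention instead.
    regard = regard_classifier(document)[0]
    return [
        {
            "attribute": span["word"],
            "group": span["entity_group"],
            "regard": regard["label"],
            "score": regard["score"],
        }
        for span in spans
    ]

if __name__ == "__main__":
    text = "The immigrant workers were praised for their dedication and skill."
    for annotation in annotate(text):
        print(annotation)
    # A mitigation pass could then filter or downweight corpus documents
    # whose regard toward any detected demographic is negative.

Under this framing, mitigation reduces to a corpus-filtering decision driven by the per-document annotations, which is one natural reading of how such a pipeline would be applied to Common Crawl at scale.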
Takuma Udagawa, Yang Zhao, Hiroshi Kanayama, Bishwaranjan Bhattacharjee
Computing Technology, Computer Technology
Takuma Udagawa, Yang Zhao, Hiroshi Kanayama, Bishwaranjan Bhattacharjee. Bias Analysis and Mitigation through Protected Attribute Detection and Regard Classification [EB/OL]. (2025-04-19) [2025-04-30]. https://arxiv.org/abs/2504.14212