Debiasing Multilingual LLMs in Cross-lingual Latent Space
Debiasing techniques such as SentDebias aim to reduce bias in large language models (LLMs). Previous studies have evaluated their cross-lingual transferability by directly applying these methods to LLM representations, revealing their limited effectiveness across languages. In this work, we therefore propose to perform debiasing in a joint latent space rather than directly on LLM representations. We construct a well-aligned cross-lingual latent space using an autoencoder trained on parallel TED talk scripts. Our experiments with Aya-expanse and two debiasing techniques across four languages (English, French, German, Dutch) demonstrate that a) autoencoders effectively construct a well-aligned cross-lingual latent space, and b) applying debiasing techniques in the learned cross-lingual latent space significantly improves both the overall debiasing performance and cross-lingual transferability.
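The abstract does not spell out the debiasing operator, but SentDebias-style methods typically estimate a bias subspace from counterfactual sentence pairs and project it out. Below is a minimal NumPy sketch of that projection applied to latent vectors, assuming the pair representations have already been encoded into the shared cross-lingual latent space by the autoencoder; the function names (`bias_subspace`, `debias`) and the toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def bias_subspace(pair_diffs: np.ndarray, k: int = 1) -> np.ndarray:
    """Estimate a k-dim bias subspace from difference vectors of
    counterfactual sentence pairs (e.g. "he is a doctor" vs.
    "she is a doctor") via PCA, as in SentDebias-style methods.
    pair_diffs: (n_pairs, d) array of latent-space differences."""
    centered = pair_diffs - pair_diffs.mean(axis=0, keepdims=True)
    # The top-k right singular vectors span the dominant bias directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]  # (k, d), orthonormal rows

def debias(z: np.ndarray, bias_dirs: np.ndarray) -> np.ndarray:
    """Project latent vectors z (n, d) onto the orthogonal complement
    of the bias subspace: z - (z V^T) V."""
    return z - (z @ bias_dirs.T) @ bias_dirs

# Toy usage with stand-in vectors; in the paper's setup these would come
# from the autoencoder's shared cross-lingual encoder, not random data.
rng = np.random.default_rng(0)
d = 8
z_a = rng.normal(size=(32, d))                          # e.g. "he ..." latents
z_b = z_a + 0.1 * rng.normal(size=(32, d)) + np.eye(d)[0]  # "she ..." counterparts
dirs = bias_subspace(z_a - z_b, k=1)
z = rng.normal(size=(4, d))
z_debiased = debias(z, dirs)
# Sanity check: the debiased vectors carry no component along the bias subspace.
assert np.allclose(z_debiased @ dirs.T, 0.0, atol=1e-8)
```

One appeal of this design is that the projection is estimated once in the shared latent space, so bias directions found from pairs in one language can, in principle, be removed from latents of any language the autoencoder aligns.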
Qiwei Peng, Guimin Hu, Yekun Chai, Anders Søgaard
Subjects: Computational techniques for Indo-European languages; Computer technology
Qiwei Peng, Guimin Hu, Yekun Chai, Anders Søgaard. Debiasing Multilingual LLMs in Cross-lingual Latent Space [EB/OL]. (2025-08-25) [2025-09-06]. https://arxiv.org/abs/2508.17948.