Breaking mBad! Supervised Fine-tuning for Cross-Lingual Detoxification

Source: arXiv
Abstract

As large language models (LLMs) become increasingly prevalent in global applications, ensuring that they are toxicity-free across diverse linguistic contexts remains a critical challenge. We explore "Cross-lingual Detoxification", a cross-lingual paradigm that mitigates toxicity, enabling detoxification capabilities to transfer between high and low-resource languages across different script families. We analyze cross-lingual detoxification's effectiveness through 504 extensive settings to evaluate toxicity reduction in cross-distribution settings with limited data and investigate how mitigation impacts model performance on non-toxic tasks, revealing trade-offs between safety and knowledge preservation. Our code and dataset are publicly available at https://github.com/himanshubeniwal/Breaking-mBad.
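As a rough illustration of the supervised fine-tuning paradigm the abstract refers to, the sketch below fine-tunes a multilingual causal language model on parallel toxic/detoxified pairs with Hugging Face `transformers`. The model name (`bigscience/bloom-560m`), data fields, prompt template, and hyperparameters are illustrative assumptions, not the paper's actual setup; the authors' code and dataset are in the linked repository.

```python
# Minimal sketch of supervised fine-tuning for detoxification, assuming parallel
# (toxic -> non-toxic) pairs. Model name, prompt format, and hyperparameters are
# placeholders for illustration only.
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Hypothetical parallel data: a toxic sentence paired with its detoxified rewrite.
pairs = [
    {"toxic": "<toxic sentence in a high-resource language>",
     "detox": "<its non-toxic rewrite>"},
    # ... more pairs, possibly spanning several languages and scripts
]

model_name = "bigscience/bloom-560m"  # placeholder multilingual LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

def format_example(example):
    # Concatenate source and target into one training sequence for causal-LM SFT.
    text = (f"Detoxify: {example['toxic']}\n"
            f"Output: {example['detox']}{tokenizer.eos_token}")
    tokens = tokenizer(text, truncation=True, max_length=256, padding="max_length")
    # Standard causal-LM labels; padding positions are ignored in the loss.
    tokens["labels"] = [
        tid if tid != tokenizer.pad_token_id else -100
        for tid in tokens["input_ids"]
    ]
    return tokens

train_ds = Dataset.from_list(pairs).map(
    format_example, remove_columns=["toxic", "detox"]
)

args = TrainingArguments(
    output_dir="detox-sft",
    per_device_train_batch_size=4,
    num_train_epochs=3,
    learning_rate=2e-5,
    logging_steps=10,
)

Trainer(model=model, args=args, train_dataset=train_ds).train()
```

In a cross-lingual transfer setting of the kind the abstract describes, one would fine-tune on pairs from one (e.g., high-resource) language and then measure toxicity reduction on other languages and scripts; that evaluation loop is omitted from this sketch.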

Himanshu Beniwal, Youngwoo Kim, Maarten Sap, Soham Dan, Thomas Hartvigsen

Computing Technology; Computer Science

Himanshu Beniwal, Youngwoo Kim, Maarten Sap, Soham Dan, Thomas Hartvigsen. Breaking mBad! Supervised Fine-tuning for Cross-Lingual Detoxification [EB/OL]. (2025-05-22) [2025-06-08]. https://arxiv.org/abs/2505.16722.
