Towards NSFW-Free Text-to-Image Generation via Safety-Constraint Direct Preference Optimization
Towards NSFW-Free Text-to-Image Generation via Safety-Constraint Direct Preference Optimization
Ensuring the safety of generated content remains a fundamental challenge for Text-to-Image (T2I) generation. Existing studies either fail to guarantee complete safety under potentially harmful concepts or struggle to balance safety with generation quality. To address these issues, we propose Safety-Constrained Direct Preference Optimization (SC-DPO), a novel framework for safety alignment in T2I models. SC-DPO integrates safety constraints into the general human preference calibration, aiming to maximize the likelihood of generating human-preferred samples while minimizing the safety cost of the generated outputs. In SC-DPO, we introduce a safety cost model to accurately quantify harmful levels for images, and train it effectively using the proposed contrastive learning and cost anchoring objectives. To apply SC-DPO for effective T2I safety alignment, we constructed SCP-10K, a safety-constrained preference dataset containing rich harmful concepts, which blends safety-constrained preference pairs under both harmful and clean instructions, further mitigating the trade-off between safety and sample quality. Additionally, we propose a Dynamic Focusing Mechanism (DFM) for SC-DPO, promoting the model's learning of difficult preference pair samples. Extensive experiments demonstrate that SC-DPO outperforms existing methods, effectively defending against various NSFW content while maintaining optimal sample quality and human preference alignment. Additionally, SC-DPO exhibits resilience against adversarial prompts designed to generate harmful content.
Shouwei Ruan、Zhenyu Wu、Yao Huang、Ruochen Zhang、Yitong Sun、Caixin Kang、Xingxing Wei
安全科学
Shouwei Ruan,Zhenyu Wu,Yao Huang,Ruochen Zhang,Yitong Sun,Caixin Kang,Xingxing Wei.Towards NSFW-Free Text-to-Image Generation via Safety-Constraint Direct Preference Optimization[EB/OL].(2025-04-19)[2025-06-22].https://arxiv.org/abs/2504.14290.点此复制
评论