Deliberative Alignment: Reasoning Enables Safer Language Models
Melody Y. Guan, Jason Wei, Boaz Barak, Amelia Glaese, Manas Joglekar, Rachel Dias, Saachi Jain, Sam Toyer, Johannes Heidecke, Eric Wallace, Andrea Vallone, Hongyu Ren, Alex Beutel, Alec Helyar, Hyung Won Chung
Cite this article:
Melody Y. Guan, Jason Wei, Boaz Barak, Amelia Glaese, Manas Joglekar, Rachel Dias, Saachi Jain, Sam Toyer, Johannes Heidecke, Eric Wallace, Andrea Vallone, Hongyu Ren, Alex Beutel, Alec Helyar, Hyung Won Chung. Deliberative Alignment: Reasoning Enables Safer Language Models [EB/OL]. (2024-12-20) [2025-12-13]. https://arxiv.org/abs/2412.16339.
Subject classification: Computing technology, computer technology