Deliberative Alignment: Reasoning Enables Safer Language Models

Melody Y. Guan, Jason Wei, Boaz Barak, Amelia Glaese, Manas Joglekar, Rachel Dias, Saachi Jain, Sam Toyer, Johannes Heidecke, Eric Wallace, Andrea Vallone, Hongyu Ren, Alex Beutel, Alec Helyar, Hyung Won Chung

arXiv

Cite this article

Melody Y. Guan, Jason Wei, Boaz Barak, Amelia Glaese, Manas Joglekar, Rachel Dias, Saachi Jain, Sam Toyer, Johannes Heidecke, Eric Wallace, Andrea Vallone, Hongyu Ren, Alex Beutel, Alec Helyar, Hyung Won Chung. Deliberative Alignment: Reasoning Enables Safer Language Models [EB/OL]. (2024-12-20) [2025-12-13]. https://arxiv.org/abs/2412.16339.

Subject classification

Computing technology, computer technology

First published: 2024-12-20