Representation Bending for Large Language Model Safety

Ashkan Yousefpour, Taeheon Kim, Ryan S. Kwon, Seungbeen Lee, Wonje Jeung, Seungju Han, Alvin Wan, Harrison Ngan, Youngjae Yu, Jonghyun Choi

Cite this article

Ashkan Yousefpour, Taeheon Kim, Ryan S. Kwon, Seungbeen Lee, Wonje Jeung, Seungju Han, Alvin Wan, Harrison Ngan, Youngjae Yu, Jonghyun Choi. Representation Bending for Large Language Model Safety[EB/OL]. (2025-07-15)[2025-12-13]. https://arxiv.org/abs/2504.01550.

Subject classification

Computing technology, computer technology

First published: 2025-07-15