National Preprint Platform

Foundational Challenges in Assuring Alignment and Safety of Large Language Models


Source: arXiv
Abstract

This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are organized into three categories: scientific understanding of LLMs, development and deployment methods, and sociotechnical challenges. Based on these challenges, we pose more than 200 concrete research questions.

Alexander Pan, Anton Korinek, Philip H. S. Torr, Tegan Maharaj, Markus Anderljung, Seán Ó hÉigeartaigh, Eric Bigelow, Alan Chan, Lewis Hammond, David Krueger, Usman Anwar, Aleksandar Petrov, Jakob Foerster, Miles Turpin, Mario Günther, Javier Rando, Ekdeep Singh Lubana, Yejin Choi, Heidi Zhang, Sumeet Ramesh Motwani, Ruiqi Zhong, Oliver Sourbut, Peter Hase, He He, Samuel Albanie, Yoshua Bengio, Tomasz Korbak, Giulio Corsi, Atoosa Kasirzadeh, Danqi Chen, Zhaowei Zhang, Lauro Langosco, Stephen Casper, Gabriel Recchia, Daniel Paleka, Abulhair Saparov, Lilian Edwards, Benjamin L. Edelman, Florian Tramer, Christian Schroeder de Witt, Erik Jenner, Jose Hernandez-Orallo

Subject: Computing Technology, Computer Technology

Alexander Pan, Anton Korinek, Philip H. S. Torr, Tegan Maharaj, Markus Anderljung, Seán Ó hÉigeartaigh, Eric Bigelow, Alan Chan, Lewis Hammond, David Krueger, Usman Anwar, Aleksandar Petrov, Jakob Foerster, Miles Turpin, Mario Günther, Javier Rando, Ekdeep Singh Lubana, Yejin Choi, Heidi Zhang, Sumeet Ramesh Motwani, Ruiqi Zhong, Oliver Sourbut, Peter Hase, He He, Samuel Albanie, Yoshua Bengio, Tomasz Korbak, Giulio Corsi, Atoosa Kasirzadeh, Danqi Chen, Zhaowei Zhang, Lauro Langosco, Stephen Casper, Gabriel Recchia, Daniel Paleka, Abulhair Saparov, Lilian Edwards, Benjamin L. Edelman, Florian Tramer, Christian Schroeder de Witt, Erik Jenner, Jose Hernandez-Orallo. Foundational Challenges in Assuring Alignment and Safety of Large Language Models [EB/OL]. (2024-04-15) [2025-05-22]. https://arxiv.org/abs/2404.09932.
