OpenAI o1 System Card

Source: arXiv
Abstract

The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought. These advanced reasoning capabilities provide new avenues for improving the safety and robustness of our models. In particular, our models can reason about our safety policies in context when responding to potentially unsafe prompts, through deliberative alignment. This leads to state-of-the-art performance on certain benchmarks for risks such as generating illicit advice, choosing stereotyped responses, and succumbing to known jailbreaks. Training models to incorporate a chain of thought before answering has the potential to unlock substantial benefits, while also increasing potential risks that stem from heightened intelligence. Our results underscore the need for building robust alignment methods, extensively stress-testing their efficacy, and maintaining meticulous risk management protocols. This report outlines the safety work carried out for the OpenAI o1 and OpenAI o1-mini models, including safety evaluations, external red teaming, and Preparedness Framework evaluations.

Alex Karpenko, Camillo Lugaresi, Aidan Clark, Renny Hwang, Roshan James, Sam Altman, Olivia Watkins, Nikolas Tezak, Daniel Kappler, Filippo Raso, David Robinson, Jonathan Uesato, Jiayi Weng, Oleg Boiko, Wenda Zhou, Angela Jiang, Dan Roberts, Lukasz Kaiser, Fred von Lohmann, Allison Tam, Florencia Leoni, Szymon Sidor, Vinnie Monaco, Tao Wang, Giambattista Parascandolo, Matt Jones, Andrey Mishchenko, Reimar Leike, Kayla Wood, Kevin Lu, Tejal Patwardhan, Scott McKinney, Rapha Gontijo Lopes, Linden Li, Sam Toyer, Chelsea Voss, Andrea Vallone, Ian Osband, Hongyu Ren, Evan Mays, Shibani Santurkar, Lilian Weng, Chris Koch, Claudia Fischer, Spencer Papay, Behrooz Ghorbani, Taylor Gordon, Vineet Kosaraju, Manas Joglekar, Mason Meyer, Mianna Chen, David Mely, James Lennon, Andrew Kondrich, Rahul Arora, Kai Chen, Lorenz Kuhn, Ally Bennett, Mira Murati, Clive Chan, Ryan Greene, Hao Sheng, Shraman Ray Chaudhuri, Pavel Izmailov, Mingxuan Wang, Geoff Salmon, Chak Ming Li, Mia Glaese, Aiden Low, Kai Xiao, Leon Maksin, Siyuan Fu, Boaz Barak, Kendra Rimbach, Yu Bai, Luke Metz, Ilya Sutskever, Chen Shen, Matt Kaufer, Daniel Selsam, Adam Lerer, Joe Palermo, Miles Wang, Wojciech Zaremba, Lindsey Held, Paul Ashbourne, Mike McClay, Ilge Akkaya, Borys Minaiev, Andrew Duberstein, Lauren Yang, Andy Applebaum, Robin Brown, Jason Wei, Karl Cobbe, Katy Shi, Leyton Ho, Yann Dubois, Elizabeth Proehl, Charles de Bourcy, Doug Li, Alex Iftimie, Zheng Shao, Jiacheng Feng, Alec Helyar, John Rizzo, Ian O'Connell, Benjamin Sokolowsky, Dragos Oprica, Francis Song, Raz Gaon, Karan Singhal, Mark Chen, Kevin Yu, Suchir Balaji, Tom Stasi, Dimitris Tsipras, Edmund Wong, Thibault Sottiaux, Gildas Chabot, Irina Kofman, Eric Wallace, Ofir Nachum, Fan Wang, Bob McGrew, Alex Carney, Eddie Zhang, Enoch Cheung, Max Schwarzer, Trevor Creech, Eben Freeman, Wes McCabe, Adam Richardson, Andre Saraiva, Ben Rossen, Yinghai Lu, Steph Lin, Mostafa Rohaninejad, OpenAI, Ted Sanders, Leo Liu, Shengli Hu, Peter Zhokhov, Brandon Houghton, Chong Zhang, Jonathan Ward, John Hallman, Guillaume Leclerc, Tyna Eloundou, Sandhini Agarwal, Haiming Bao, Hadi Salman, Erik Ritter, Neil Chowdhury, Rui Shu, Meghan Shah, Maja Trebacz, Valerie Qi, David Dohan, Eric Mitchell, Samuel Miserendino, Tianhao Zheng, Grace Zhao, Adam Kalai, Hart Andrin, Foivos Tsimpourlas, Jieqi Yu, David Farhi, Tal Broda, Sasha Baker, Mengyuan Xu, Joaquin Quiñonero Candela, Ashvin Nair, Ryan Cheu, Yining Chen, Joel Parish, Felipe Petroski Such, Aaron Jaech, Randall Lin, Trapit Bansal, Lukas Kondraciuk, Vitchyr Pong, Shengjia Zhao, Patrick Chao, Barret Zoph, Vlad Fomenko, Jie Tang, Zhuohan Li, Bowen Baker, Troy Peterson, Yuchen He, Yunyun Wang, Thomas Dimson, Aleksander Madry, Brandon McKinzie, Botao Hao, Noam Brown, Saachi Jain, Jiahui Yu, Jonathan Gordon, Mehmet Yatbaz, Melody Y. Guan, Scottie Yan, Nat McAleese, Hyung Won Chung, Chris Orsinger, Hunter Lightman, Jerry Twore, Sam Toizer, Alexander Wei, Michael Lampe, Shuyuan Zhang, Julie Wang, Michele Wang, Cary Hudson, Joost Huizinga, Michael Malek, Santiago Hernandez, Alex Beutel, Johannes Heidecke, Jean Harb, Weiyi Zheng, Ahmed El-Kishky, Cary Bassin, Mikhail Pavlov, Marko Tintor, Karina Nguyen, Ignasi Clavera Gilaberte, Madelaine Boyd, Lindsay McCallum, Nick Ryder, Mo Bavarian, Timur Garipov, Alexander Neitz, Young Cha, Ian Kivlichan, Alex Tachard Passos, Freddie Sulit, Lama Ahmad, Yuchen Zhang, Kevin Liu, Liam Fedus, Mengyuan Yan, Alexander Prokofiev, Neil Chowdhury, Michelle Fradin, Brydon Eastman, Thomas Degry, Rhythm Garg, Christopher Hesse, Suvansh Sanjeev, Reah Miyara, Greg Brockman, Daniel Levy, Keren Gu-Lemberg, Ilya Kostrikov, Oleg Murk, Ananya Kumar, Rachel Dias, Jakub Pachocki, Kevin Stone, Hessam Bagherinezhad

Safety science; computing technology, computer technology

Alex Karpenko, Camillo Lugaresi, Aidan Clark, et al. OpenAI o1 System Card[EB/OL]. (2024-12-21)[2025-05-18]. https://arxiv.org/abs/2412.16720.
