NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

Source: arXiv
Abstract

Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework which supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models. The infrastructure, datacards and robustness analysis results are available publicly on the NL-Augmenter repository (https://github.com/GEM-benchmark/NL-Augmenter).

Fabrice Harel-Canada, Paul-Alexis Dray, Priti Oli, Jascha Sohl-Dickstein, Maria Obedkova, Vinay Prabhu, Aman Srivastava, Rishabh Gupta, Ashutosh Kumar, William Soto, Thomas Scialom, Priyank Soni, Jamie Simon, Nivranshu Pasricha, Michael A. Yee, Genta Indra Winata, Przemyslaw K. Joniak, Tianbao Xie, Simon Mille, Maxime Meyer, Christian Clauss, Filip Cornell, Aadesh Gupta, Samson Tan, Xudong Shen, Ryan Teehan, Suchitra Dubey, Gerard de Melo, Gloria Wang, Chandan Singh, Pawan Kumar Rajpoot, Pierre Colombo, Nafise Sadat Moosavi, Tanay Dixit, Witold Wydmański, Abinaya Mahendiran, Vasconcellos P. H. S., Niklas Muennighoff, Afnan Mir, Sebastian Ruder, Marcin Namysl, Athena Wang, Libo Qin, Juan Diego Rodriguez, Mo Tiwari, A Tabassum, Simon Meoni, Sebastian Gehrmann, Sajant Anand, Taylor Sorensen, Vasile Pais, Yue Zhang, Fiona Anting Tan, Stefan Langer, Tony Sun, Ian Berlot-Attwell, Haoyue Shi, Rishabh Gupta, Jan Pfister, Damien Sileo, Samuel Cahyawijaya, Zhenhao Li, Jing Zhang, Marco Antonio Sobrevilla Cabezudo, Robin M. Schmidt, Yiwen Shi, Nick Siegel, Marco Di Giovanni, Eduard Hovy, Corey James Levinson, Kaustubh D. Dhole, Roman Sitelew, Tongshuang Wu, Kenton Murray, Kalpesh Krishna, Denis Kleyko, Ananya B. Sai, Zijian Wang, Hanna Behnke, Thomas Dopierre, Ashish Shrivastava, Connor Boyle, Lisa Barthe, Nagender Aneja, Roy Rinberg, Mukund Varma T, Jinho D. Choi, Emile Chapuis, Vikas Raunak, Tanya Goyal, Mukund Choudhary, Xinyi Wu, Mayukh Das, Marie Tolkiehn, Kaizhao Liang, Sang Han, Caroline Brun, Ishan Jindal, Louanes Hamla, Richard Plant, Antoine Honore, Tshephisho Sefara, Timothy Sum Hon Mun, Andrey Lukyanenko, Zhexiong Liu, Claude Roux, Shahab Raji, Vukosi Marivate, Bryan Wilie, Hualou Liang, Ondrej Dusek, Varun Gangal, Rabin Banjade, Usama Yaseen, Anna Shvets, Zijie J. Wang, Tatiana Ekeinhor, KV Aditya Srivatsa, Gautier Dagan, Saad Mahamood, Seungjae Ryan Lee, Saqib N. Shamsi, Nicolas Roberts, Wanxiang Che, Fuxuan Wei, Venelin Kovatchev

Subject: Computing technology, computer technology

Fabrice Harel-Canada, Paul-Alexis Dray, Priti Oli, et al. NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation [EB/OL]. (2021-12-05) [2025-05-14]. https://arxiv.org/abs/2112.02721.
