CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation


Source: arXiv
Abstract

Benchmark datasets have a significant impact on accelerating research in programming language tasks. In this paper, we introduce CodeXGLUE, a benchmark dataset to foster machine learning research for program understanding and generation. CodeXGLUE includes a collection of 10 tasks across 14 datasets and a platform for model evaluation and comparison. CodeXGLUE also features three baseline systems (BERT-style, GPT-style, and Encoder-Decoder models) to make it easy for researchers to use the platform. The availability of such data and baselines can help the development and validation of new methods that can be applied to various program understanding and generation problems.

Daya Guo, Linjun Shou, Alexey Svyatkovskiy, Shao Kun Deng, Dawn Drain, Michele Tufano, Shengyu Fu, Junjie Huang, Ge Li, Shuo Ren, Daxin Jiang, Long Zhou, Shujie Liu, Neel Sundaresan, Shuai Lu, Ambrosio Blanco, Ming Zhou, Ming Gong, Colin Clement, Nan Duan, Lidong Zhou, Duyu Tang

Subject: Computing technology, computer technology

Daya Guo, Linjun Shou, Alexey Svyatkovskiy, Shao Kun Deng, Dawn Drain, Michele Tufano, Shengyu Fu, Junjie Huang, Ge Li, Shuo Ren, Daxin Jiang, Long Zhou, Shujie Liu, Neel Sundaresan, Shuai Lu, Ambrosio Blanco, Ming Zhou, Ming Gong, Colin Clement, Nan Duan, Lidong Zhou, Duyu Tang. CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation [EB/OL]. (2021-02-09) [2025-05-28]. https://arxiv.org/abs/2102.04664.
