FullStack Bench: Evaluating LLMs as Full Stack Coders
As the capabilities of code large language models (LLMs) continue to expand, their applications across diverse code intelligence domains are rapidly increasing. However, most existing datasets evaluate only a limited set of application domains. To address this gap, we develop FullStack Bench, a comprehensive code evaluation dataset focused on full-stack programming that covers a wide range of application domains (e.g., basic programming, data analysis, software engineering, mathematics, and machine learning). To assess multilingual programming capabilities, FullStack Bench includes real-world instructions and corresponding unit test cases in 16 widely used programming languages, designed to reflect real-world usage scenarios rather than simple translations. We also release SandboxFusion, an effective code sandbox execution tool that supports various programming languages and packages, to evaluate performance on FullStack Bench efficiently. Comprehensive experimental results demonstrate the necessity and effectiveness of FullStack Bench and SandboxFusion.
Xuwu Wang, Xia Xiao, Jinxiang Xia, Rui Long, Jing Mai, Guanghan Ning, Z. Y. Peng, Kai Shen, Jiahao Su, Jing Su, Tao Sun, Yifan Sun, Yunzhe Tao, Guoyin Wang, Siwei Wang, Bytedance-Seed-Foundation-Code-Team, Yao Cheng, Jianfeng Chen, Jie Chen, Li Chen, Liyu Chen, Wentao Chen, Zhengyu Chen, Shijie Geng, Aoyan Li, Bo Li, Bowen Li, Linyi Li, Boyi Liu, Jiaheng Liu, Kaibo Liu, Qi Liu, Shukai Liu, Siyao Liu, Tianyi Liu, Tingkai Liu, Yongfei Liu, Yite Wang, Zihan Wang, Liang Xiang, Yongsheng Xiao, Chenguang Xi, Shulin Xin, Jingjing Xu, Shikun Xu, Hongxia Yang, Jack Yang, Yingxiang Yang, Jianbo Yuan, Jun Zhang, Yufeng Zhang, Yuyu Zhang, Shen Zheng, He Zhu, Ming Zhu
Computing technology, computer technology
Xuwu Wang, Xia Xiao, Jinxiang Xia, Rui Long, Jing Mai, Guanghan Ning, Z. Y. Peng, Kai Shen, Jiahao Su, Jing Su, Tao Sun, Yifan Sun, Yunzhe Tao, Guoyin Wang, Siwei Wang, Bytedance-Seed-Foundation-Code-Team, Yao Cheng, Jianfeng Chen, Jie Chen, Li Chen, Liyu Chen, Wentao Chen, Zhengyu Chen, Shijie Geng, Aoyan Li, Bo Li, Bowen Li, Linyi Li, Boyi Liu, Jiaheng Liu, Kaibo Liu, Qi Liu, Shukai Liu, Siyao Liu, Tianyi Liu, Tingkai Liu, Yongfei Liu, Yite Wang, Zihan Wang, Liang Xiang, Yongsheng Xiao, Chenguang Xi, Shulin Xin, Jingjing Xu, Shikun Xu, Hongxia Yang, Jack Yang, Yingxiang Yang, Jianbo Yuan, Jun Zhang, Yufeng Zhang, Yuyu Zhang, Shen Zheng, He Zhu, Ming Zhu. FullStack Bench: Evaluating LLMs as Full Stack Coders[EB/OL]. (2024-11-30)[2025-05-23]. https://arxiv.org/abs/2412.00535.