COFO: COdeFOrces dataset for Program Classification, Recognition and Tagging

Source: arXiv
Abstract

In recent years, many technological advances in computer science have helped programmers create innovative, real-time, user-friendly software. As more software is created and more people become interested in learning to write it, a large collection of source code, also known as Big Code, has become available on the web. This collection can serve as a source of data for machine learning applications that aim to solve certain software engineering problems. In this paper, we present COFO, a dataset consisting of 809 classes/problems with a total of 369K source codes written in the C, C++, Java, and Python programming languages, along with other metadata such as code tags, problem specifications, and input-output specifications. COFO was scraped from the openly available Codeforces website using a Python scraper based on Selenium and BeautifulSoup. We envision that this dataset can be useful for solving machine learning-based problems such as program classification/recognition, tagging, predicting program properties, and code comprehension.

Kuldeep Gautam, S. VenkataKeerthy, Ramakrishna Upadrasta

Subject: Computing technology, computer technology

Kuldeep Gautam, S. VenkataKeerthy, Ramakrishna Upadrasta. COFO: COdeFOrces dataset for Program Classification, Recognition and Tagging [EB/OL]. (2025-03-23) [2025-08-02]. https://arxiv.org/abs/2503.18251.