Evaluating the Impact of Data Cleaning on the Quality of Generated Pull Request Descriptions
Pull Requests (PRs) are central to collaborative coding, summarizing code changes for reviewers. However, many PR descriptions are incomplete, uninformative, or contain out-of-context content, compromising developer workflows and hindering AI-based generation models trained on commit messages and original descriptions as "ground truth." This study examines the prevalence of "noisy" PRs and evaluates their impact on state-of-the-art description generation models. To do so, we propose four cleaning heuristics to filter noise from an initial dataset of 169K+ PRs drawn from 513 GitHub repositories. We train four models (BART, T5, PRSummarizer, and iTAPE) on both raw and cleaned datasets. Performance is measured via ROUGE-1, ROUGE-2, and ROUGE-L metrics, alongside a manual evaluation to assess description quality improvements from a human perspective. Cleaning the dataset yields significant gains: average F1 improvements of 8.6% (ROUGE-1), 8.7% (ROUGE-2), and 8.5% (ROUGE-L). Manual assessment confirms higher readability and relevance in descriptions generated by the best-performing model, BART, when trained on cleaned data. Dataset refinement markedly enhances PR description generation, offering a foundation for more accurate AI-driven tools and guidelines to assist developers in crafting high-quality PR descriptions.
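To make the evaluation metric concrete: ROUGE-L scores a generated description against the reference by the length of their longest common subsequence (LCS) of tokens. The sketch below is a minimal, simplified illustration of ROUGE-L F1 on whitespace tokens; it is not the paper's exact pipeline (which presumably uses a standard ROUGE implementation with stemming and tokenization), and the function names are hypothetical.

```python
# Minimal ROUGE-L F1 sketch: longest common subsequence (LCS) over
# whitespace tokens. Illustrative only; real ROUGE toolkits also apply
# stemming and more careful tokenization.

def lcs_length(a, b):
    # Classic dynamic-programming LCS length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f1(reference: str, candidate: str) -> float:
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)   # fraction of candidate tokens in the LCS
    recall = lcs / len(ref)       # fraction of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)

# Example: reference PR description vs. a shorter generated one.
score = rouge_l_f1("fix null pointer in parser", "fix pointer bug")
# LCS = ["fix", "pointer"], precision = 2/3, recall = 2/5, F1 = 0.5
```

ROUGE-1 and ROUGE-2 follow the same precision/recall/F1 pattern but count overlapping unigrams and bigrams instead of an LCS.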
Kutay Tire, Berk Çakar, Eray Tüzün
Computing Technology; Computer Technology
Kutay Tire, Berk Çakar, Eray Tüzün. Evaluating the Impact of Data Cleaning on the Quality of Generated Pull Request Descriptions [EB/OL]. (2025-05-02) [2025-06-02]. https://arxiv.org/abs/2505.01120