Seml: a Semantic LSTM Model for Software Defect Prediction

Hongliang Liang (梁洪亮), Yue Yu (于悦)

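Abstract: Software defect prediction can help developers find potential defects and reduce the cost of locating them. Traditional defect prediction methods typically use software metrics (lines of code, control-flow cyclomatic complexity, etc.) as features to build machine learning models. The drawback of this approach is that software metrics carry no information about a program's syntactic structure or semantics. To address this problem, this paper proposes a method that uses deep learning to learn the semantic information of programs and then predicts the defects they contain. A series of experiments on the public PROMISE dataset shows that, compared with existing deep-learning-based and metric-based methods, the proposed method achieves higher accuracy in both within-project and cross-project defect prediction. The results also show that converting the tokens extracted from program source code into distributed vector representations expresses code semantics better and helps improve defect prediction.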
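The pipeline described above (source tokens → distributed vectors → LSTM → defect label) can be sketched as follows. This is a minimal, untrained forward pass in plain Python, not the authors' Seml implementation: the tokenizer regex, the embedding and hidden sizes, the random initialization, and the names `tokenize`, `build_vocab`, `encode` and `TinyLSTMClassifier` are all hypothetical, and training (backpropagation) is omitted.

```python
import math
import random
import re

# Hypothetical tokenizer: identifiers, integer literals, or any single non-space symbol.
# Note that "==" is split into two "=" tokens by this simple pattern.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(source):
    return TOKEN_RE.findall(source)


def build_vocab(token_lists):
    """Map each distinct token to an integer id; id 0 is reserved for unknown tokens."""
    vocab = {"<unk>": 0}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def encode(tokens, vocab):
    return [vocab.get(t, 0) for t in tokens]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMClassifier:
    """Embedding lookup + single-layer LSTM + logistic output (untrained, random weights)."""

    def __init__(self, vocab_size, embed_dim=8, hidden_dim=16, seed=0):
        rng = random.Random(seed)
        rand = lambda: rng.uniform(-0.1, 0.1)
        self.hidden_dim = hidden_dim
        # Embedding table: one dense vector per token id (the "distributed representation").
        self.embed = [[rand() for _ in range(embed_dim)] for _ in range(vocab_size)]
        in_dim = embed_dim + hidden_dim
        # One weight matrix and bias per gate: input, forget, output, candidate cell.
        self.W = {g: [[rand() for _ in range(in_dim)] for _ in range(hidden_dim)] for g in "ifoc"}
        self.b = {g: [0.0] * hidden_dim for g in "ifoc"}
        self.w_out = [rand() for _ in range(hidden_dim)]
        self.b_out = 0.0

    def _gate(self, name, z, act):
        # act(W_name · z + b_name), computed row by row.
        return [act(sum(w * x for w, x in zip(row, z)) + b)
                for row, b in zip(self.W[name], self.b[name])]

    def predict(self, token_ids):
        """Return a value in (0, 1) read as P(file is defective)."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for tid in token_ids:
            z = self.embed[tid] + h  # concatenate x_t and h_{t-1}
            i = self._gate("i", z, sigmoid)
            f = self._gate("f", z, sigmoid)
            o = self._gate("o", z, sigmoid)
            cand = self._gate("c", z, math.tanh)
            c = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c, i, cand)]
            h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
        # The final hidden state summarizes the whole token sequence.
        return sigmoid(sum(w * x for w, x in zip(self.w_out, h)) + self.b_out)


if __name__ == "__main__":
    files = ["if (x == null) { return; }",
             "for (int i = 0; i < n; i++) { sum += a[i]; }"]
    token_lists = [tokenize(src) for src in files]
    vocab = build_vocab(token_lists)
    model = TinyLSTMClassifier(len(vocab))
    for src, toks in zip(files, token_lists):
        print(src, "->", round(model.predict(encode(toks, vocab)), 3))
```

Because the weights are random, the printed scores carry no meaning until the embedding table and gate weights are trained on labeled defective and clean files; the sketch only shows how a token sequence flows through embeddings into an LSTM and out to a single defect probability.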

Subject classification: computing technology; computer technology

Keywords: computer application technology; software defect prediction; long short-term memory network; word embedding

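Recommended citation: Liang Hongliang, Yu Yue. Seml：一个基于程序语义和LSTM的软件缺陷预测模型 [Seml: A Software Defect Prediction Model Based on Program Semantics and LSTM] [EB/OL]. (2019-03-25) [2025-09-30]. http://www.paper.edu.cn/releasepaper/content/201903-317.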

Abstract: Software defect prediction can help developers find potential bugs and reduce maintenance costs. Traditional approaches usually use software metrics (Lines of Code, Cyclomatic Complexity, etc.) as features to build machine learning classifiers that identify defective software modules. However, software metric features often fail to capture a program's syntactic and semantic information. In this paper we propose Seml, a novel framework that combines word embedding and deep learning methods for defect prediction. Evaluation results on eight open source projects show that Seml outperforms three state-of-the-art defect prediction approaches on most of the datasets for both within-project and cross-project defect prediction.
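For contrast, the metric features that traditional approaches feed to a classifier can be illustrated with a short sketch. It analyses Python source with the standard `ast` module only to stay self-contained; it is not how the paper's baselines compute their metrics, and the helper names (`lines_of_code`, `cyclomatic_complexity`, `metric_features`) and the simplified decision-point rules are assumptions.

```python
import ast

# Node types that each add one decision point (a simplified McCabe count).
BRANCH_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.IfExp, ast.ExceptHandler)


def lines_of_code(source):
    """Count lines that are neither blank nor pure '#' comments."""
    return sum(1 for line in source.splitlines()
               if line.strip() and not line.strip().startswith("#"))


def cyclomatic_complexity(source):
    """Approximate McCabe complexity: 1 + number of decision points."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, BRANCH_NODES):
            decisions += 1  # an `elif` appears as a nested ast.If, so it counts too
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra short-circuit branches.
            decisions += len(node.values) - 1
    return decisions + 1


def metric_features(source):
    """Feature vector [LOC, CC] of the kind a metric-based classifier consumes."""
    return [lines_of_code(source), cyclomatic_complexity(source)]


SAMPLE = '''
def clamp(x, lo, hi):
    # keep x within [lo, hi]
    if x < lo:
        return lo
    elif x > hi:
        return hi
    return x
'''

if __name__ == "__main__":
    print(metric_features(SAMPLE))  # → [6, 3]
```

Two functions with the same `[LOC, CC]` vector are indistinguishable to such a classifier even if one contains a subtle bug, which is the limitation that motivates learning features from the token sequence instead.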

Keywords: computer application technology; software defect prediction; Long Short-Term Memory; word embedding


