Malware Classification Leveraging NLP & Machine Learning for Enhanced Accuracy
This paper investigates the application of natural language processing (NLP)-based n-gram analysis and machine learning techniques to enhance malware classification. We explore how NLP can be used to extract and analyze textual features from malware samples through n-grams, i.e., contiguous sequences of strings or API calls. This approach captures distinctive linguistic patterns that distinguish malware and benign families, enabling finer-grained classification. We examine n-gram size selection, feature representation, and classification algorithms. Evaluating the proposed method on real-world malware samples, we observe significantly improved accuracy compared to traditional methods. With our n-gram approach, we achieve an accuracy of 99.02% across various machine learning algorithms, using a hybrid feature selection technique to address the high dimensionality of the feature space. This hybrid feature selection technique reduces the feature set to only 1.6% of the original features.
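As an illustration of the n-gram idea in the abstract, the following is a minimal sketch, not the authors' implementation: it assumes each sample has already been reduced to a list of API call names, extracts contiguous n-grams, and turns them into count vectors over a shared vocabulary. The function names (`extract_ngrams`, `build_vocabulary`, `vectorize`) and the example API traces are hypothetical and chosen only for illustration; the paper's actual feature representation and hybrid feature selection step are not reproduced here.

```python
from collections import Counter
from typing import Dict, List, Tuple

NGram = Tuple[str, ...]


def extract_ngrams(calls: List[str], n: int) -> List[NGram]:
    """Return every contiguous n-gram of a token sequence (e.g., API calls)."""
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]


def build_vocabulary(samples: List[List[str]], n: int) -> Dict[NGram, int]:
    """Map each distinct n-gram seen across all samples to a column index."""
    vocab = sorted({g for s in samples for g in extract_ngrams(s, n)})
    return {g: i for i, g in enumerate(vocab)}


def vectorize(calls: List[str], vocab: Dict[NGram, int], n: int) -> List[int]:
    """Count-based feature vector; n-grams absent from the vocabulary are ignored."""
    vec = [0] * len(vocab)
    for gram, count in Counter(extract_ngrams(calls, n)).items():
        if gram in vocab:
            vec[vocab[gram]] = count
    return vec


if __name__ == "__main__":
    # Hypothetical API call traces for two samples (illustrative only).
    s1 = ["CreateFileW", "ReadFile", "WriteFile", "CloseHandle"]
    s2 = ["CreateFileW", "WriteFile", "WriteFile", "CloseHandle"]
    vocab = build_vocabulary([s1, s2], n=2)
    print(vectorize(s1, vocab, 2))  # [1, 0, 1, 1, 0]
    print(vectorize(s2, vocab, 2))  # [0, 1, 0, 1, 1]
```

The vocabulary grows quickly with both n and the number of distinct API calls, which is one way to see why a feature selection step is needed to keep the dimensionality manageable before training a classifier.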
Bishwajit Prasad Gond, Rajneekant, Pushkar Kishore, Durga Prasad Mohapatra
Computing Technology; Computer Technology
Bishwajit Prasad Gond,Rajneekant,Pushkar Kishore,Durga Prasad Mohapatra.Malware Classification Leveraging NLP & Machine Learning for Enhanced Accuracy[EB/OL].(2025-06-19)[2025-07-01].https://arxiv.org/abs/2506.16224.