
Malware Classification Leveraging NLP & Machine Learning for Enhanced Accuracy

Source: arXiv
Abstract

This paper investigates the application of natural language processing (NLP)-based n-gram analysis and machine learning techniques to enhance malware classification. We explore how NLP can be used to extract and analyze textual features from malware samples through n-grams, that is, contiguous sequences of strings or API calls. This approach effectively captures distinctive linguistic patterns across malware and benign families, enabling finer-grained classification. We examine n-gram size selection, feature representation, and classification algorithms. Evaluating the proposed method on real-world malware samples, we observe significantly improved accuracy compared with traditional methods. With our n-gram approach, we achieve an accuracy of 99.02% across various machine learning algorithms, using a hybrid feature selection technique to address high dimensionality. This technique reduces the feature set to only 1.6% of the original features.
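
To make the n-gram idea in the abstract concrete, here is a minimal Python sketch of extracting and counting contiguous n-grams from a sequence of API calls. It is only an illustration under stated assumptions: the function names `extract_ngrams` and `ngram_counts` and the sample trace are hypothetical, and the paper's actual feature extraction and hybrid feature selection are not reproduced here.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def extract_ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous n-gram in the token sequence, in order."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # A sequence shorter than n yields an empty list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count how often each n-gram occurs; the counts can serve as features."""
    return Counter(extract_ngrams(tokens, n))


# Hypothetical API-call trace, invented for illustration only.
trace = ["CreateFileW", "WriteFile", "CloseHandle", "CreateFileW", "WriteFile"]
bigrams = ngram_counts(trace, 2)

for gram, count in bigrams.most_common():
    print(" -> ".join(gram), count)
```

Each distinct n-gram becomes one feature dimension, so the feature space grows quickly with n and with the number of samples. That growth is the high dimensionality that the abstract says feature selection is meant to address.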

Bishwajit Prasad Gond, Rajneekant, Pushkar Kishore, Durga Prasad Mohapatra

Computing technology, computer technology

Bishwajit Prasad Gond, Rajneekant, Pushkar Kishore, Durga Prasad Mohapatra. Malware Classification Leveraging NLP & Machine Learning for Enhanced Accuracy [EB/OL]. (2025-06-19) [2025-07-01]. https://arxiv.org/abs/2506.16224.
