Enhance the machine learning algorithm performance in phishing detection with keyword features
Phishing attacks on the Internet have increased significantly in recent years. In a typical phishing attack, the attacker sets up a malicious website that closely resembles a legitimate one in order to obtain end users' information. This can lead to the leakage of sensitive information and to financial loss for end users. To prevent such attacks, early detection of these websites' URLs is vital. Previous researchers have proposed many machine learning algorithms to distinguish phishing URLs from legitimate ones. In this paper, we enhance these machine learning algorithms from the perspective of feature selection. We propose a novel method that combines keyword features with traditional features. The method is applied to multiple traditional machine learning algorithms, and the experimental results show that it is effective. On average, the method reduces the classification error by 30% on the large dataset, and the improvement is even greater on the small dataset. In addition, the method extracts its information from the URL alone and does not rely on additional information provided by third-party services. The best result for a machine learning algorithm using the proposed method is an accuracy of 99.68%.
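As a rough illustration of what combining keyword features with traditional URL features could look like, the following minimal Python sketch builds one feature vector from a URL string alone. The abstract does not specify the paper's keyword list or feature set, so the `KEYWORDS` list, the five lexical features, and all function names here are hypothetical assumptions, not the author's actual method.

```python
import re
from urllib.parse import urlparse

# Hypothetical keyword list (assumption): the abstract does not state
# which keywords the paper actually uses.
KEYWORDS = ["login", "secure", "verify", "account", "update"]


def traditional_features(url):
    """Illustrative lexical features computed from the URL string only."""
    host = urlparse(url).netloc
    return [
        len(url),                          # total URL length
        url.count("."),                    # number of dots
        sum(ch.isdigit() for ch in url),   # number of digit characters
        # 1 if the host looks like a dotted IPv4 address, else 0
        1 if re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host) else 0,
        1 if "@" in url else 0,            # presence of '@' in the URL
    ]


def keyword_features(url):
    """One binary indicator per keyword: 1 if it occurs in the URL."""
    lowered = url.lower()
    return [1 if kw in lowered else 0 for kw in KEYWORDS]


def combined_features(url):
    """Concatenate the traditional and keyword features into one vector."""
    return traditional_features(url) + keyword_features(url)
```

A vector produced by `combined_features` could then be passed to any standard classifier. No third-party lookup (such as WHOIS or a blacklist service) is involved, which matches the abstract's statement that the method uses only the URL.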
Zijiang Yang
10.1109/CNIOT65435.2025.11070642
Computing technology, computer technology
Zijiang Yang. Enhance the machine learning algorithm performance in phishing detection with keyword features[EB/OL]. (2025-08-12)[2025-08-24]. https://arxiv.org/abs/2508.09765.