National Preprint Platform

Application of Improved Text Classification Techniques in a Junk SMS Filtering System

Improved Application of Text Classification in Junk Short Messaging Service Filter System

Abstract

With the growth of short message service (SMS) traffic, the content and volume of junk SMS keep changing. Traditional junk SMS filtering systems suffer from declining precision and low efficiency. To address these two problems, this paper studies the application of Naive Bayes text classification to junk SMS filtering. The processing efficiency of Naive Bayes drops with high-dimensional features and large data volumes, so the paper reduces dimensionality by combining feature extraction with Simhash. To compensate for the limited precision of the Naive Bayes classifier, it places a Simhash pre-filter in front of the classifier so that two filters work together. This raises the system's precision and also improves its practical effectiveness. Experimental results show that these measures effectively solve both problems.
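The abstract describes two cooperating filters. The first is a Simhash near-duplicate pre-filter, which compresses each message into a short fingerprint. The second is a Naive Bayes classifier whose vocabulary is reduced by feature selection. The Python sketch below illustrates that arrangement under stated assumptions and is not the authors' implementation. Several choices in it are hypothetical and made only for illustration:

- the whitespace tokenizer;
- the document-frequency feature criterion;
- the 64-bit BLAKE2b-based fingerprint;
- the `max_distance=3` Hamming threshold;
- all names (`tokenize`, `simhash`, `hamming`, `NaiveBayes`, `DualFilter`).

Real Chinese SMS text would first need a word segmenter.

```python
import hashlib
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase + whitespace split. Chinese SMS would
    # first need a word segmenter; that step is outside this sketch.
    return text.lower().split()


def simhash(tokens):
    """64-bit Simhash: each distinct token's hash votes +tf/-tf on every bit."""
    votes = [0] * 64
    for token, tf in Counter(tokens).items():
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for i in range(64):
            votes[i] += tf if (h >> i) & 1 else -tf
    # A fingerprint bit is set when its weighted vote is positive.
    return sum(1 << i for i in range(64) if votes[i] > 0)


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing over a reduced vocabulary."""

    def __init__(self, max_features=1000):
        self.max_features = max_features

    def fit(self, docs, labels):
        # Dimensionality reduction (assumed criterion): keep only the
        # max_features tokens with the highest document frequency.
        df = Counter(t for d in docs for t in set(tokenize(d)))
        self.vocab = {t for t, _ in df.most_common(self.max_features)}
        self.priors = Counter(labels)
        self.counts = defaultdict(Counter)
        for d, y in zip(docs, labels):
            self.counts[y].update(t for t in tokenize(d) if t in self.vocab)
        self.totals = {y: sum(c.values()) for y, c in self.counts.items()}
        return self

    def predict(self, doc):
        n, v = sum(self.priors.values()), len(self.vocab)
        tokens = [t for t in tokenize(doc) if t in self.vocab]

        def log_posterior(y):
            # log P(y) + sum over tokens of log P(token | y), Laplace-smoothed
            return math.log(self.priors[y] / n) + sum(
                math.log((self.counts[y][t] + 1) / (self.totals[y] + v))
                for t in tokens
            )

        return max(self.priors, key=log_posterior)


class DualFilter:
    """Stage 1: Simhash near-duplicate check; stage 2: Naive Bayes."""

    def __init__(self, classifier, max_distance=3):
        self.classifier = classifier
        self.max_distance = max_distance  # assumed Hamming-distance threshold
        self.spam_fingerprints = []

    def add_known_spam(self, text):
        self.spam_fingerprints.append(simhash(tokenize(text)))

    def classify(self, text):
        fp = simhash(tokenize(text))
        if any(hamming(fp, s) <= self.max_distance for s in self.spam_fingerprints):
            return "spam"  # near-duplicate of known spam: classifier is skipped
        return self.classifier.predict(text)


if __name__ == "__main__":
    train = [
        ("win free prize now", "spam"),
        ("free cash prize claim now", "spam"),
        ("claim your free prize", "spam"),
        ("see you at lunch", "ham"),
        ("meeting moved to noon", "ham"),
        ("call me when you get home", "ham"),
    ]
    nb = NaiveBayes().fit([d for d, _ in train], [y for _, y in train])
    dual = DualFilter(nb)
    dual.add_known_spam("see you at lunch")
    print(nb.predict("free prize now"))      # spam
    print(nb.predict("lunch at you see"))    # ham (classifier alone)
    print(dual.classify("lunch at you see")) # spam (caught by the pre-filter)
```

For brevity, the pre-filter scans every stored fingerprint linearly. A large store of spam fingerprints would need an index. One option is to split the 64-bit fingerprint into blocks, so that any two fingerprints within a small Hamming distance share at least one identical block.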

Authors: Xin Yang (辛阳), Zhang Yu (张宇)

Subject: Communications

Keywords: junk SMS; Naive Bayes; Simhash; text classification

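Citation: Xin Yang, Zhang Yu. 改进的文本分类技术在垃圾短信过滤系统中的应用 (Application of Improved Text Classification Techniques in a Junk SMS Filtering System) [EB/OL]. (2015-11-30) [2025-05-28]. http://www.paper.edu.cn/releasepaper/content/201511-747.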
