National Preprint Platform
Bayes Discriminator for BBS Documents Based on Latent Semantic Analysis

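Discriminating electronic bulletin board (BBS) documents has become an important part of information security technology. This paper combines techniques from data mining, mathematical statistics, and natural language understanding, and proposes a BBS document discrimination method based on latent semantic analysis and Bayes classification (Bayes Discriminator based on Latent Semantic Analysis, BDLSA). Natural language processing techniques are used to extract a typical phrase set from the training documents; latent semantic analysis is used to reduce synonymous typical phrases, and association rule mining is applied to increase the independence among typical phrases; a Bayes classifier is then used to discriminate BBS documents. The paper also discusses and tests extensively the key parameters that affect the system, and the experiments show that the proposed method is feasible and effective for discriminating BBS documents.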
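The final classification step can be illustrated with a minimal sketch, assuming each document has already been reduced to the set of typical phrases it contains. This is a generic naive Bayes classifier with Laplace smoothing, not the paper's implementation; the phrases, the labels `flagged` and `normal`, and the function names are hypothetical. The product over phrases is only valid if phrases are treated as independent given the class, which is the property the association-rule step is meant to improve.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels):
    """docs: list of sets of phrases; labels: one class name per document."""
    vocab = set().union(*docs)
    class_counts = Counter(labels)
    # class -> phrase -> number of training documents of that class containing it
    phrase_counts = defaultdict(Counter)
    for phrases, label in zip(docs, labels):
        phrase_counts[label].update(phrases)
    return vocab, class_counts, phrase_counts

def classify(phrases, model):
    """Return the class with the highest log posterior for a set of phrases."""
    vocab, class_counts, phrase_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total)  # log prior
        for phrase in vocab:
            # Laplace-smoothed estimate of P(phrase present | class)
            p = (phrase_counts[label][phrase] + 1) / (n_docs + 2)
            score += math.log(p if phrase in phrases else 1 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training data (not from the paper).
train_docs = [
    {"win prize", "click link"},
    {"click link", "free offer"},
    {"meeting notes", "project plan"},
    {"project plan", "lecture time"},
]
train_labels = ["flagged", "flagged", "normal", "normal"]
model = train_naive_bayes(train_docs, train_labels)

print(classify({"click link", "free offer"}, model))
print(classify({"project plan"}, model))
```

Scores are summed in log space so that long phrase lists do not underflow; the smoothing constants (+1 and +2) are one common choice and would be a tunable parameter in practice.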

With the rapid development of the Internet, the abuse and misuse of BBS have become a social problem of information pollution, creating a demand for techniques to discriminate BBS documents. Borrowing techniques from data mining, probability and statistics, and natural language understanding, this paper proposes a new discrimination method for BBS documents, called Bayes Discriminator based on Latent Semantic Analysis (BDLSA). The main steps of the new method are: (1) build a typical phrase set by extracting typical sentences from the training documents in a preprocessing stage, using natural language understanding techniques; (2) apply synonymy reduction to the typical phrases by Latent Semantic Analysis; (3) discover association rules between typical phrases to increase the independence of the phrases, so that the traditional Bayes discriminator works efficiently; (4) discriminate BBS documents with a Bayes classifier. Algorithms for constructing the typical phrase set and for reducing synonymy are proposed.
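Step (2) can be sketched as follows, assuming a phrase-by-document count matrix is already available. This is a generic LSA illustration using NumPy's SVD, not the algorithm from the paper; the phrases, the rank `k`, the similarity threshold, and the greedy grouping in `merge_synonyms` are all assumptions made for the example. Two phrases that never occur in the same document, but share a context phrase, end up with the same direction in the rank-2 latent space and are mapped to one representative.

```python
import numpy as np

def latent_phrase_vectors(A, k):
    """Project phrases (rows of A) into a k-dimensional latent space via SVD."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] * s[:k]

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def merge_synonyms(phrases, vectors, threshold):
    """Greedily map each phrase to the first earlier representative it resembles."""
    representative = {}
    reps = []  # row indices of group representatives
    for i, phrase in enumerate(phrases):
        for r in reps:
            if cosine(vectors[i], vectors[r]) >= threshold:
                representative[phrase] = phrases[r]
                break
        else:
            # No existing representative is similar enough: start a new group.
            reps.append(i)
            representative[phrase] = phrase
    return representative

# Hypothetical toy data: rows are phrases, columns are four documents.
phrases = ["free gift", "bonus prize", "click here", "exam date", "course schedule"]
A = np.array([
    [1, 0, 0, 0],  # "free gift" appears only in document 1
    [0, 1, 0, 0],  # "bonus prize" appears only in document 2
    [1, 1, 0, 0],  # "click here" appears in documents 1 and 2
    [0, 0, 1, 1],  # "exam date" appears in documents 3 and 4
    [0, 0, 1, 1],  # "course schedule" appears in documents 3 and 4
], dtype=float)

vectors = latent_phrase_vectors(A, k=2)
mapping = merge_synonyms(phrases, vectors, threshold=0.9)
print(mapping)
```

In this toy case the context phrase "click here" also collapses onto the same representative, which shows how sensitive the reduction is to the choice of rank and threshold.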

郭颖、唐常杰、杜永萍、刘昌钰

Computing technology and computer technology; applications of electronic technology

Data mining, association rules, Bayes classifier, latent semantic analysis, BBS

郭颖, 唐常杰, 杜永萍, 刘昌钰. Bayes Discriminator for BBS Documents Based on Latent Semantic Analysis [EB/OL]. (2004-06-28) [2025-08-05]. http://www.paper.edu.cn/releasepaper/content/200406-110.
