Text Categorization Based on Rough Set Theory
Text categorization is an active research topic in information retrieval, data mining, and related fields. In many existing text categorization methods, documents are represented with the vector space model (VSM). The resulting feature space has very high dimensionality, which makes classification algorithms inefficient and classification accuracy unsatisfactory. Applying rough set theory to text categorization can reduce the dimensionality of the feature vectors without degrading classification accuracy, and it yields explicitly expressed classification rules. This paper introduces the general process of text categorization, analyzes the key steps in applying rough set theory to it, and summarizes how rough sets have been combined with other classification algorithms for text categorization.
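As a rough illustration of the dimensionality reduction and explicit rules mentioned in the abstract, the following minimal sketch runs a greedy, QuickReduct-style attribute reduction on a toy decision table. It is not taken from the paper: the function names (`partition`, `dependency`, `quick_reduct`, `rules`), the four binary term features, and the two class labels are all hypothetical, and a greedy search is only one common heuristic that is not guaranteed to find a minimal reduct.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, labels, attrs):
    """Fraction of rows in the positive region (blocks with a single label)."""
    consistent = sum(
        len(block)
        for block in partition(rows, attrs)
        if len({labels[i] for i in block}) == 1
    )
    return consistent / len(rows)

def quick_reduct(rows, labels, all_attrs):
    """Greedily add attributes until dependency matches that of the full set."""
    target = dependency(rows, labels, all_attrs)
    chosen = []
    while dependency(rows, labels, chosen) < target:
        best = max(
            (a for a in all_attrs if a not in chosen),
            key=lambda a: dependency(rows, labels, chosen + [a]),
        )
        chosen.append(best)
    return chosen

def rules(rows, labels, attrs):
    """Read one explicit rule off each label-consistent equivalence class."""
    out = []
    for block in partition(rows, attrs):
        classes = {labels[i] for i in block}
        if len(classes) == 1:
            condition = {a: rows[block[0]][a] for a in attrs}
            out.append((condition, classes.pop()))
    return out

# Hypothetical toy data: 1 means the term occurs in the document.
rows = [
    {"the": 1, "match": 1, "goal": 1, "stock": 0},
    {"the": 1, "match": 0, "goal": 1, "stock": 0},
    {"the": 1, "match": 0, "goal": 0, "stock": 1},
    {"the": 0, "match": 1, "goal": 0, "stock": 1},
]
labels = ["sports", "sports", "finance", "finance"]
attrs = ["the", "match", "goal", "stock"]

reduct = quick_reduct(rows, labels, attrs)
print(reduct)
print(rules(rows, labels, reduct))
```

On this toy table the greedy search keeps only `goal`, so four term features shrink to one while every document is still classified consistently, and the two printed rules state the classification in explicit if-then form.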
Zhao Yuhong (赵玉虹), Xu Xin (徐欣), Huang Lican (黄理灿)
Computing technology; computer technology
text categorization; rough set theory; attribute reduction
Zhao Yuhong, Xu Xin, Huang Lican. Text Categorization Based on Rough Set Theory [EB/OL]. (2010-04-20) [2025-08-02]. http://www.paper.edu.cn/releasepaper/content/201004-726.