An Improvement of a Mutual Information Algorithm for Text Feature Selection
Feature selection is an important step in text classification. Compared with other text feature selection methods, mutual information performs relatively poorly, mainly because it favors rare (low-frequency) terms and treats negative correlation inappropriately. To improve classification performance, this paper modifies the treatment of negative correlation and, from the standpoint of statistical reliability, proposes a reliability measure to counter the bias toward rare terms. Experiments compare the improved mutual information method with the original one, and also apply the reliability measure to the information gain method. The results show that the improved method performs somewhat better than the original, and that the statistical reliability measure is effective for both mutual information and information gain.
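The two shortcomings named in the abstract can be illustrated with the standard pointwise mutual information estimate computed from document counts. The sketch below is only that textbook estimate, not the improved method or the reliability measure proposed in the paper, whose definitions are not given here. The function name, the count names `a`, `b`, `c`, `n`, and the example counts are illustrative assumptions.

```python
import math


def mutual_information(a, b, c, n):
    """Standard pointwise MI estimate between a term and a class.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that do not contain the term
    n: total number of documents
    (Hypothetical helper for illustration; not the paper's algorithm.)
    """
    if a == 0:
        # The term never co-occurs with the class: the log is undefined.
        return float("-inf")
    # log( P(t, c) / (P(t) * P(c)) ) estimated from counts
    return math.log((a * n) / ((a + c) * (a + b)))


# Assumed toy corpus: 1000 documents, 100 of them in the class.
# A rare term seen in a single document, which happens to be in the class.
rare = mutual_information(a=1, b=0, c=99, n=1000)
# A frequent term seen in 80 class documents and 20 other documents.
common = mutual_information(a=80, b=20, c=20, n=1000)
# A term that mostly appears outside the class (negative correlation).
negative = mutual_information(a=1, b=199, c=99, n=1000)

print(rare, common, negative)
```

With these assumed counts, the term observed in a single document scores higher than the term supported by 100 observations, which is the rare-term bias the abstract refers to. The third term receives a negative score, the case whose treatment the paper sets out to revise.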
Xu Weiran (徐蔚然), Peng Junrui (彭君睿)
Computing technology, computer technology
text classification; feature selection; mutual information; statistical reliability
Xu Weiran, Peng Junrui. 一种互信息文本特征选择算法的改进 [An improvement of a mutual information algorithm for text feature selection][EB/OL]. (2013-12-13)[2025-08-02]. http://www.paper.edu.cn/releasepaper/content/201312-323.