National Preprint Platform

Text Sentiment Analysis Based on a Hybrid Mutual Information Algorithm


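Mutual information (MI) feature selection suffers from the problem of positive and negative correlation, and it does not take into account how often a feature term occurs within each class. To address both problems, this paper proposes a hybrid mutual information (HMI) feature selection algorithm. HMI introduces an inverse document frequency coefficient and an inter-class word frequency coefficient, so that word frequency information across all documents and within each class is put to effective use. It also introduces a positive/negative correlation coefficient that distinguishes positive correlation from negative correlation and exploits both. Comparative experiments show that HMI effectively improves the quality of feature selection and, in turn, the performance of text sentiment analysis.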

To address the positive and negative correlation phenomenon in mutual information (MI) feature selection, and the fact that MI ignores the word frequency of feature items in different categories, a hybrid mutual information (HMI) feature selection algorithm is proposed. By introducing an inverse document frequency coefficient and an inter-class word frequency coefficient, the algorithm makes effective use of word frequency information across the whole document collection and between classes. A positive and negative correlation coefficient is introduced to distinguish positive from negative correlation and to exploit both. Experimental results show that the hybrid mutual information algorithm effectively improves the quality of feature selection and thereby the performance of text sentiment analysis.
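The abstract does not give the HMI formulas, so the Python sketch below is an illustration under stated assumptions, not the authors' algorithm. It computes standard document-level MI between a term and a class, log(P(t,c) / (P(t)P(c))), and shows how averaging this signed value over classes lets positive and negative correlation cancel. It then builds a hypothetical hybrid score that multiplies a smoothed IDF coefficient, an inter-class term frequency concentration coefficient (1 minus the normalized entropy of the term's per-class frequencies), and a sign-aware MI term in which negative correlation is weighted by an assumed `neg_weight` instead of being subtracted. The function names, the add-one smoothing, the concentration measure, and the way the three coefficients are combined are all hypothetical choices.

```python
import math
from collections import Counter, defaultdict


def class_term_stats(docs, labels):
    """Count per-class document frequency and raw term frequency."""
    df = defaultdict(Counter)  # df[c][t]: documents of class c that contain t
    tf = defaultdict(Counter)  # tf[c][t]: total occurrences of t in class c
    n_class = Counter(labels)  # number of documents per class
    for tokens, c in zip(docs, labels):
        tf[c].update(tokens)
        df[c].update(set(tokens))
    return df, tf, n_class


def mi(term, c, df, n_class, n_docs, smooth=1.0):
    """Pointwise MI log(P(t,c) / (P(t) P(c))) between term presence and class c.

    Positive: t appears in c more often than chance; negative: less often.
    Probabilities come from the 2 x K presence/class table with add-`smooth`
    counts per cell, so log() stays finite when t never occurs in c.
    """
    k = len(n_class)
    a = df[c][term]                                # docs in c containing t
    df_t = sum(df[cls][term] for cls in n_class)   # docs (any class) containing t
    total = n_docs + 2 * k * smooth
    p_tc = (a + smooth) / total
    p_t = (df_t + k * smooth) / total
    p_c = (n_class[c] + 2 * smooth) / total
    return math.log(p_tc / (p_t * p_c))


def classic_mi_avg(term, df, n_class, n_docs):
    """Standard MI_avg = sum_c P(c) * MI(t, c); positive and negative parts can cancel."""
    return sum(n_class[c] / n_docs * mi(term, c, df, n_class, n_docs) for c in n_class)


def hybrid_score(term, df, tf, n_class, n_docs, neg_weight=0.5):
    """HYPOTHETICAL hybrid score: IDF x inter-class TF concentration x sign-aware MI.

    neg_weight (an assumed value) scales |MI| for negatively correlated classes
    so that negative correlation adds evidence instead of cancelling the positive part.
    """
    df_t = sum(df[c][term] for c in n_class)
    idf = math.log((1 + n_docs) / (1 + df_t)) + 1.0  # smoothed IDF, always > 0

    counts = [tf[c][term] for c in n_class]
    total = sum(counts)
    if total == 0:
        return 0.0
    # 1 - normalized entropy of the term's frequency across classes:
    # 1.0 if it occurs in a single class only, 0.0 if spread evenly.
    entropy = -sum(x / total * math.log(x / total) for x in counts if x)
    k = len(counts)
    concentration = 1.0 - entropy / math.log(k) if k > 1 else 1.0

    signed = 0.0
    for c in n_class:
        m = mi(term, c, df, n_class, n_docs)
        weight = 1.0 if m >= 0 else neg_weight
        signed += n_class[c] / n_docs * weight * abs(m)
    return idf * concentration * signed
```

On a toy corpus of four documents in which "great" occurs in both positive reviews and no negative ones, while "movie" occurs once in each class, `classic_mi_avg` scores "great" below zero (about -0.14) even though it separates the two classes perfectly, because its negative correlation with the negative class outweighs its positive correlation with the positive class. The hypothetical `hybrid_score` gives "great" a clearly positive score and "movie" a score of zero.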

Authors: Wang Yi (王义), Dai Yueming (戴月明)

DOI: 10.12074/201812.00104V1

Subject: Computing technology; computer technology

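Keywords: mutual information; feature selection; positive and negative correlation; word frequency information; sentiment analysis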

Citation: Wang Yi, Dai Yueming. Text sentiment analysis based on a hybrid mutual information algorithm [EB/OL]. (2018-12-13) [2025-08-02]. https://chinaxiv.org/abs/201812.00104.
