The Role of Word Statistical Features in Automatic Sentiment Word Extraction and Product Review Classification
The statistical features of words are widely used in natural language processing. To examine how such features affect the precision of keyword extraction and text classification, this paper analyzes eight common statistical features and studies their role in sentiment analysis through two tasks: sentiment word extraction and product review classification. In the extraction experiments, combining the statistical features with part-of-speech (PoS) tags reaches a precision of 76.4%, significantly higher than extraction algorithms based on statistical features alone or on PoS tags alone. In the classification tests, representing product reviews with the statistical features improves precision by 10.8% over traditional word-based sentiment classification. Building the vector space model (VSM) from the eight statistical features, rather than from individual words, lowers the dimensionality of the text vectors and gives a compression effect comparable to a latent semantic space (LSA/SVD) without the time and space cost of computing an SVD. The approach reduces algorithmic complexity while preserving classification accuracy, and can serve as a replacement for the traditional vector space model.
Han Tonghui (韩彤晖), Ma Hongwei (马宏伟), Yang Dongqiang (杨东强)
Computing technology; computer technology
statistical features; sentiment word extraction; product review classification
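Han Tonghui, Ma Hongwei, Yang Dongqiang. The Role of Word Statistical Features in Automatic Sentiment Word Extraction and Product Review Classification [EB/OL]. (2018-05-18)[2025-07-20]. https://chinaxiv.org/abs/201805.00395.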