
Research and Design of a Text Feature Extraction Scheme

Research and Design of the Feature Selection in Text Classification


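With the development of Internet technology and the arrival of the big-data era, analyzing and mining Internet data has become a hot topic in both academia and industry. Text classification is especially important in this work, but massive data sets expose classifiers to the curse of dimensionality and seriously degrade their performance. This paper analyzes and compares the mainstream feature selection methods and proposes an automatic feature selection algorithm based on the chi-square test. The experimental section analyzes and verifies the algorithm's effectiveness.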

As the Internet has developed rapidly and the big-data era has arrived, data mining has become a hot topic in academia as well as industry. Text classification is very important for knowledge mining, but the huge mass of data subjects text classifiers to the curse of dimensionality, and classifier performance drops significantly as a result. This paper analyzes the mainstream feature selection methods for text classification. Based on the CHI (chi-square) method, we propose an automatic feature selection method for text classification. Extensive experiments on the Sogou Lab dataset illustrate the effectiveness of the algorithm.
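For orientation, the CHI statistic that the abstract builds on can be sketched as follows. This is only the standard chi-square score for a term and a class, computed from a 2x2 document-count table; it is not the automatic selection algorithm proposed in the paper, which the abstract does not describe. The function names `chi_square` and `rank_terms` and the input layout (a list of token lists plus a parallel list of labels) are assumptions made for illustration.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square score from a 2x2 table of document counts.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A degenerate table (an empty row or column) carries no signal.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def rank_terms(docs, labels, target):
    """Rank terms by their chi-square association with one class.

    docs: list of token lists (hypothetical input format)
    labels: class label for each document
    target: the class to score terms against
    """
    n_docs = len(docs)
    n_target = sum(1 for y in labels if y == target)
    in_class = Counter()   # document frequency inside the target class
    out_class = Counter()  # document frequency outside the target class
    for tokens, y in zip(docs, labels):
        for term in set(tokens):  # count each term once per document
            if y == target:
                in_class[term] += 1
            else:
                out_class[term] += 1

    scores = {}
    for term in set(in_class) | set(out_class):
        a = in_class[term]
        b = out_class[term]
        c = n_target - a
        d = (n_docs - n_target) - b
        scores[term] = chi_square(a, b, c, d)
    # Highest-scoring terms first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A simple selector would keep the top-k terms of this ranking for each class; how many terms to keep is exactly the kind of choice an automatic method would have to make, and the abstract does not say how the paper makes it.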

Xin Yang (辛阳), Wang Ran (王然)

Computing technology, computer technology

Text classification; feature extraction; CHI

Text Classification; Feature Selection; CHI

Xin Yang, Wang Ran. Research and Design of a Text Feature Extraction Scheme [EB/OL]. (2013-09-06) [2025-07-23]. http://www.paper.edu.cn/releasepaper/content/201309-99.
