Internet Text Categorization Based on a Labeled Topic Model
Automatic text classification is an important task in data mining. With the rapid development of the Internet, research in this area helps people mine useful information more effectively. At present, probabilistic topic models are applied to text classification mainly in the feature extraction stage. This paper instead applies an improved, supervised topic model directly to text classification. The approach addresses the unsupervised nature of traditional LDA by incorporating document label information into a unified model, and it also makes the model's results more interpretable. Experiments show that the classification performance of this method improves on that of mainstream discriminative classification algorithms.
Zheng Yan, Dong Xing
Computing Technology, Computer Technology
text classification; labeled topic model; support vector machine (SVM); classification evaluation
Zheng Yan, Dong Xing. Internet Text Categorization Based on Labeled Topic Model [EB/OL]. (2013-12-24) [2025-08-10]. http://www.paper.edu.cn/releasepaper/content/201312-771.