National Preprint Platform

Research on Text Classification Based on Semi-Supervised LDA

In today's era of information explosion, data volumes grow exponentially, and most of this data is unstructured. It holds a large amount of important knowledge waiting to be extracted by suitable methods, so classifying text conveniently, sensibly, and quickly is an important problem. The LDA model is an unsupervised model that can discover latent topics in unlabeled data. To discover latent topics more effectively, this paper proposes a semi-supervised LDA topic model that finds a topic set to serve as the knowledge set of the latent layer; topics found this way are more relevant to the text. In addition, the LDA model and the semi-supervised LDA model are applied to text feature extraction and compared with other feature extraction methods. Experiments show that the semi-supervised LDA model performs slightly better.

Zheng Shizhuo (郑世卓), Cui Xiaoyan (崔晓燕)

Computing technology, computer technology

text classification; topic model; LDA model; semi-supervised LDA model

Zheng Shizhuo, Cui Xiaoyan. Research on text classification based on semi-supervised LDA [EB/OL]. (2014-01-16) [2025-05-10]. http://www.paper.edu.cn/releasepaper/content/201401-768.
