Modeling of a Microblog Corpus Based on Topic Models
This paper surveys topic modeling methods, which have attracted growing attention in natural language processing in recent years. Based on the characteristics of microblog text, we use topic models to design a modeling procedure for microblog analysis that comprises text preprocessing, denoising of microblog content, modeling of the processed text, and analysis of the modeling results. We apply denoising and modeling to Twitter data and extract the most representative topics from a large volume of posts. We then analyze the modeling results and study how the extracted topics evolve. In addition, we analyze how the characteristics of topics change along the time dimension in order to study topic evolution in microblogs.
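The procedure outlined in the abstract (preprocessing and denoising, topic modeling, then analysis of topics over time) can be illustrated with a minimal sketch. The abstract does not name a specific topic model, so the use of LDA with collapsed Gibbs sampling here is an assumption, as are the stopword list, the sample tweets, the hyperparameters, and every function name; this is not the authors' implementation.

```python
import random
import re
from collections import Counter, defaultdict

# Hypothetical stopword list; a real pipeline would use a much larger one.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "in", "on",
             "and", "for", "rt", "i", "my", "so", "we"}


def clean_tweet(text):
    """Denoise one tweet: drop URLs and @mentions, keep hashtag words,
    lowercase, tokenize on letters, and remove stopwords."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    text = text.replace("#", " ")
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 1]


def fit_lda(docs, num_topics, iterations=100, alpha=0.1, beta=0.01, seed=0):
    """Assumed model: LDA fitted by collapsed Gibbs sampling.
    Returns (per-document topic counts, per-topic word counts)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    # Resample each token's topic conditioned on all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


def topic_share_by_period(doc_topic, periods):
    """Sum topic counts per time period and normalize them to shares."""
    totals = defaultdict(lambda: [0] * len(doc_topic[0]))
    for counts, period in zip(doc_topic, periods):
        for k, c in enumerate(counts):
            totals[period][k] += c
    return {p: [c / sum(v) for c in v]
            for p, v in sorted(totals.items()) if sum(v) > 0}


# Hypothetical (date, text) pairs standing in for a Twitter corpus.
tweets = [
    ("2015-01-01", "RT @fan: Great match tonight! #football http://t.co/abc"),
    ("2015-01-01", "The football team scored a late goal in the match"),
    ("2015-01-02", "New phone released today, the camera and battery are great"),
    ("2015-01-02", "@shop is the phone battery better? #phone"),
    ("2015-01-02", "Goal! What a football match"),
]

cleaned = [(date, clean_tweet(text)) for date, text in tweets]
cleaned = [(date, toks) for date, toks in cleaned if toks]  # drop empty posts
periods = [date for date, _ in cleaned]
docs = [toks for _, toks in cleaned]

doc_topic, topic_word = fit_lda(docs, num_topics=2)
shares = topic_share_by_period(doc_topic, periods)

for k, words in enumerate(topic_word):
    top = [w for w, c in words.most_common(5) if c > 0]
    print(f"topic {k}: {top}")
for period, share in shares.items():
    print(period, [round(s, 2) for s in share])
```

Tracking the normalized topic share per day is one simple way to look at topic evolution; with so few tweets the fitted topics are only illustrative, and a real corpus would call for more sampling iterations and tuned hyperparameters.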
Cao Jiantong (曹建彤), Meng Hongxing (蒙宏星)
Computing technology; computer technology
Topic Model; Microblog; Topic Evolution
Cao Jiantong, Meng Hongxing. Modeling of a microblog corpus based on topic models [EB/OL]. (2015-12-15) [2025-08-03]. http://www.paper.edu.cn/releasepaper/content/201512-789.