National Preprint Platform

Research on a Labeled Bilingual Topic Model for Cross-Lingual Text Classification and Label Recommendation

Abstract

To address the growing volume of cross-lingual text resources and the multi-label data found in news reports and scientific literature, and to mine both the correlations across languages and the associations among data attributes, this paper proposes a labeled bilingual topic model and applies it to cross-lingual text classification and label recommendation. First, assuming that the keywords of a scientific article are related in content to the abstract of the same article, the keywords are extracted and treated as labels, and each label is aligned with a topic in the topic model, thereby instantiating the "latent" topics. Second, the labeled bilingual topic model is trained iteratively on the abstracts. Finally, newly added documents are classified across languages, and labels are recommended for them. Experimental results show that Micro-F1 reaches 94.81% on the cross-lingual text classification task, and that the recommended labels reflect semantic relevance to the documents reasonably well.
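For readers unfamiliar with the reported metric, the following is a minimal sketch of the standard micro-averaged F1 for multi-label predictions. It is not taken from the paper: the function name `micro_f1`, the representation of labels as Python sets, and the toy labels are all assumptions made for illustration, and the paper's exact evaluation protocol is not described on this page.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1: pool true/false positives and false negatives
    over all documents and labels, then compute a single F1 score.

    `gold` and `pred` are hypothetical inputs: one label set per document.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of documents")

    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels predicted and correct
        fp += len(p - g)   # labels predicted but not in the gold set
        fn += len(g - p)   # gold labels that were missed

    denom = 2 * tp + fp + fn
    # With no gold labels and no predictions, F1 is undefined; return 0.0 here.
    return 2 * tp / denom if denom else 0.0


# Toy example with made-up labels (not data from the paper).
gold = [{"topic model", "classification"}, {"label"}, {"latent topic"}]
pred = [{"topic model"}, {"label", "classification"}, {"latent topic"}]
score = micro_f1(gold, pred)
print(f"{score:.2f}")  # 0.75: tp=3, fp=1, fn=1
```

Micro-averaging pools counts across all labels before computing F1, so frequent labels weigh more heavily than rare ones; a macro average would instead compute F1 per label and average the results.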

Tian Mingjie (田明杰), Cui Rongyi (崔荣一)

10.12074/201806.00109V1

Computing technology, computer technology

topic model; label; cross-lingual text classification; label recommendation; latent topic

Tian Mingjie, Cui Rongyi. Research on a Labeled Bilingual Topic Model for Cross-Lingual Text Classification and Label Recommendation [EB/OL]. (2018-06-19)[2025-08-02]. https://chinaxiv.org/abs/201806.00109.
