National Preprint Platform

Semi-Supervised Text Classification Based on Co-training with Different Text Representation

Abstract

In semi-supervised text classification, co-training exploits differentiated feature spaces together with the strengths of supervised classifiers, and has achieved good results. However, finding two views within the text itself that satisfy both assumptions of co-training, sufficient redundancy and conditional independence, is the main difficulty in applying co-training to text. Starting from different text representation models, this paper constructs two distinct text representation feature spaces, built from different perspectives and in different ways, and uses them as the two views for co-training. This addresses the scenario-specific limitation shared by existing models. On this basis, an improved co-training algorithm for imbalanced datasets is also presented. Experimental results show that the proposed co-training model achieves better performance on semi-supervised text classification.
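The two-view idea can be illustrated with a small sketch. The abstract does not say which text representations or which base classifier the authors use, so everything concrete here is an assumption made for illustration: word unigrams and character trigrams stand in for the two views, a hand-written multinomial Naive Bayes stands in for the supervised classifier, and the names `word_view`, `char_view`, `co_train`, and `predict` are hypothetical. The loop follows the standard co-training scheme (each view pseudo-labels the unlabeled text it is most confident about and adds it to the shared labeled set); the improved variant for imbalanced datasets is not reproduced.

```python
import math
from collections import Counter


def word_view(text):
    """View 1 (assumed stand-in): word unigrams."""
    return text.lower().split()


def char_view(text):
    """View 2 (assumed stand-in): character trigrams, spaces kept as '_'."""
    s = text.lower().replace(" ", "_")
    return [s[i:i + 3] for i in range(len(s) - 2)]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over feature lists."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for feats, y in zip(docs, labels):
            self.counts[y].update(feats)
        self.vocab = {f for c in self.classes for f in self.counts[c]}
        return self

    def predict_proba(self, feats):
        n = sum(self.prior.values())
        v = len(self.vocab) + 1  # one extra slot for unseen features
        logp = {}
        for c in self.classes:
            total = sum(self.counts[c].values())
            lp = math.log(self.prior[c] / n)
            for f in feats:
                lp += math.log((self.counts[c][f] + 1) / (total + v))
            logp[c] = lp
        m = max(logp.values())
        z = sum(math.exp(lp - m) for lp in logp.values())
        return {c: math.exp(lp - m) / z for c, lp in logp.items()}


def co_train(labeled, unlabeled, views, rounds=10, per_view=1):
    """Standard co-training loop: one classifier per view, shared labeled set."""
    labeled, pool = list(labeled), list(unlabeled)

    def train():
        ys = [y for _, y in labeled]
        return [NaiveBayes().fit([v(t) for t, _ in labeled], ys) for v in views]

    models = train()
    for _ in range(rounds):
        if not pool:
            break
        for view, model in zip(views, models):
            # Score every remaining unlabeled text in this view.
            scored = []
            for t in pool:
                proba = model.predict_proba(view(t))
                label = max(proba, key=proba.get)
                scored.append((proba[label], t, label))
            scored.sort(key=lambda s: s[0], reverse=True)
            # Move the most confident texts into the labeled set.
            for _, t, label in scored[:per_view]:
                labeled.append((t, label))
                pool.remove(t)
        models = train()  # retrain both views on the enlarged labeled set
    return models


def predict(models, views, text):
    """Combine the two views by summing their class posteriors."""
    scores = Counter()
    for view, model in zip(views, models):
        for c, p in model.predict_proba(view(text)).items():
            scores[c] += p
    return max(scores, key=scores.get)


# Toy data, invented for this sketch only.
VIEWS = [word_view, char_view]
labeled = [("team wins match", "sports"), ("chip runs code", "tech")]
unlabeled = ["match goal team", "code disk chip", "goal match", "disk code"]
models = co_train(labeled, unlabeled, VIEWS)
print(predict(models, VIEWS, "team goal"), predict(models, VIEWS, "chip disk"))
```

Word and character n-gram features are not conditionally independent, so this pair of views only approximates the assumptions named above. The sketch is meant to show the mechanics of the loop, not to suggest a good choice of views.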

Luo Tao (罗涛), Li Jianfeng (李剑峰), Deng Panxiao (邓攀晓)

Computing technology, computer technology

text classification; semi-supervised learning; co-training; two-view; text representation model

Luo Tao, Li Jianfeng, Deng Panxiao. Semi-Supervised Text Classification Based on Co-training with Different Text Representation [EB/OL]. (2017-01-05) [2025-08-02]. http://www.paper.edu.cn/releasepaper/content/201701-80.
