Text Classification Based on Continuous Space Representation
Continuous space representation of text uses distributed vectors to represent the features of a text. In the experiments of this paper, a doc2vec model is trained so that each text is represented by a single vector. The text vectors obtained from this model carry both syntactic and semantic information, and texts with similar semantics have vectors that lie as close together as possible. Because text classification serves as a benchmark for validating algorithms in natural language processing, this paper first trains such text vectors, then applies them to text classification, and judges the quality of the vectors produced by the model from the classification results. The goal of the experiments is to apply the text vectors to text classification and achieve good classification performance.
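The evaluation idea in the abstract (judge text vectors by how well they support classification) can be illustrated with a minimal sketch. It does not train doc2vec; the small vectors, the labels "sports" and "finance", and the nearest-centroid classifier with cosine similarity are all assumptions made for illustration, not the classifier or data used in the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def centroids(vectors, labels):
    """Average the vectors of each class into one centroid per label."""
    sums, counts = {}, {}
    for vec, label in zip(vectors, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }

def predict(vec, class_centroids):
    """Return the label whose centroid is most similar to vec."""
    return max(class_centroids, key=lambda lbl: cosine(vec, class_centroids[lbl]))

def accuracy(vectors, labels, class_centroids):
    """Fraction of vectors whose predicted label matches the true label."""
    correct = sum(predict(v, class_centroids) == y for v, y in zip(vectors, labels))
    return correct / len(labels)

# Hypothetical 3-dimensional "document vectors"; in practice these would
# come from a trained doc2vec model and have far more dimensions.
train_vecs = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.1, 0.9, 0.2], [0.0, 0.8, 0.3]]
train_labels = ["sports", "sports", "finance", "finance"]
test_vecs = [[0.7, 0.3, 0.0], [0.2, 0.7, 0.1]]
test_labels = ["sports", "finance"]

model = centroids(train_vecs, train_labels)
test_accuracy = accuracy(test_vecs, test_labels, model)
print(test_accuracy)
```

If texts with similar semantics really do map to nearby vectors, even a simple classifier like this one should score well, which is why classification accuracy can serve as a proxy for vector quality.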
Wang Xiaojie, Zeng Zhen
Computing technology, computer technology
Natural language processing; Text representation; Text classification; Machine learning
Wang Xiaojie, Zeng Zhen. Text Classification Based on Continuous Space Representation [EB/OL]. (2015-12-17) [2025-04-04]. http://www.paper.edu.cn/releasepaper/content/201512-947.