National Preprint Platform

Research on Chinese Microblog Author Identification Based on Deep Learning

Author identification has always played an important role in public security work and forensic document examination. Existing approaches to modeling an author's linguistic style are cumbersome, and hand-crafted text feature engineering does not generalize. To address this problem, the CABLSTM model for Chinese microblog author identification is proposed, which requires no expert feature modeling, and its accuracy is tested on a public microblog corpus. To maximize the extraction of short-text features, the model fuses an Attention mechanism into the CNN and removes the pooling layer, then obtains contextual information through a bidirectional LSTM. The identification result is output through a Softmax layer. Experimental results show that, on the Chinese microblog author identification task, the model improves on traditional machine learning algorithms as well as TextCNN and LSTM in accuracy, recall, and F value.
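The abstract reports accuracy, recall, and F value but does not say how they are averaged across authors. As a minimal sketch, assuming macro-averaging over author labels and taking the F value to be F1 (both are assumptions, as is the helper name `macro_scores`), the three metrics could be computed as follows:

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (accuracy, macro recall, macro F1) for multi-class labels.

    Assumption: macro-averaging, i.e. every author label counts equally.
    The abstract does not state which averaging scheme was used.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct attribution
        else:
            fp[pred] += 1    # text wrongly attributed to `pred`
            fn[truth] += 1   # text of `truth` that was missed

    accuracy = sum(tp.values()) / len(y_true)

    labels = sorted(set(y_true) | set(y_pred))
    recalls, f1s = [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        recalls.append(recall)
        f1s.append(f1)

    return accuracy, sum(recalls) / len(labels), sum(f1s) / len(labels)


if __name__ == "__main__":
    # Toy labels for two hypothetical authors; not data from the paper.
    truth = ["a", "a", "b", "b"]
    guess = ["a", "b", "b", "b"]
    print(macro_scores(truth, guess))
```

Macro-averaging is chosen here because it keeps authors with few posts from being swamped by prolific ones; a micro-averaged or weighted variant would be an equally plausible reading of the abstract.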

Cai Manchun (蔡满春), Xu Xiaolin (徐晓霖), Lu Tianliang (芦天亮)

10.12074/201811.00197V1

Computing technology, computer technology; automation technology, automation equipment

Author identification; LSTMNN; automatic feature extraction

Cai Manchun, Xu Xiaolin, Lu Tianliang. Research on Chinese Microblog Author Identification Based on Deep Learning [EB/OL]. (2018-11-29)[2025-08-03]. https://chinaxiv.org/abs/201811.00197.
