新时代人民日报分词语料库构建、性能及应用(二)-深度学习自动分词模型构建
onstruction, Performance and Application of New Era People's Daily Segmented Corpus (Ⅱ)——Constructing Automatic Word Segmentation Model of Deep Learning
目的/意义] 在新时代人民日报分词语料库的基础上构建的深度学习自动分词模型,不仅有助于为高性能分词模型的构建提供经验,也可以借助具体的自然语言处理研究任务验证深度学习相应模型的性能。[方法/过程] 在介绍双向长短时记忆模型(Bi-LSTM)和双向长短时记忆与条件随机场融合模型(Bi-LSTM-CRF)的基础上,阐明汉语分词语料预处理、评价指标和参数与硬件平台的过程、种类和情况,分别构建Bi-LSTM和Bi-LSTM-CRF汉语自动分词模型,并对模型的整体性能进行分析。[结果/结论] 从精准率、召回率和调和平均值3个指标上看,所构建的Bi-LSTM和Bi-LSTM-CRF汉语自动分词模型的整体性能相对较为合理。在具体性能上,Bi-LSTM分词模型优于Bi-LSTM-CRF分词模型,但这一差距非常细微。
Purpose/significance] On the basis of the new era People's Daily(NEPD) word segmentation corpus, the construction of the automatic word segmentation model of deep learning not only can help to provide relevant experience for the construction of high-performance word segmentation model, but also can verify the performance of the corresponding model of deep learning through specific natural language processing tasks.[Method/process] Based on the introduction of Bi-directional Long Short-Term Memory (Bi-LSTM) and Bi-directional Long Short-Term Memory with conditional random field (Bi-LSTM-CRF), this paper expounded the process, type and situation of Chinese word segmentation preprocessing, the evaluation indexes and parameters and hardware platform, the Bi-LSTM and Bi-LSTM-CRF Chinese automatic word segmentation models were constructed respectively, and the overall performance of the models was analyzed.[Result/conclusion] The overall performance of the Bi-LSTM and Bi-LSTM-CRF Chinese automatic word segmentation model is relatively reasonable from the three indexes of precision, recall and F value. In terms of specific performance, Bi-LSTM word segmentation model is superior to Bi-LSTM-CRF word segmentation model, but the difference is very small.
黄水清、王东波
汉语
新时代人民日报分词语料语料库自动分词深度学习Bi-LSTMBi-LSTM-CRF
new era People's Daily segmented corpussegmented corpusautomatic word segmentationdeep learningBi-LSTMBi-LSTM-CRF
黄水清,王东波.新时代人民日报分词语料库构建、性能及应用(二)-深度学习自动分词模型构建[EB/OL].(2023-07-26)[2025-08-16].https://chinaxiv.org/abs/202307.00312.点此复制
评论