An Entity Recognition Algorithm Based on BERT_BLSTM_CRF
To address the problems that existing science and technology resources involve numerous indicators and finely subdivided categories, and are difficult to cover completely and extract precisely, this paper proposes an entity recognition algorithm based on BERT_BLSTM_CRF to extract science and technology resource entities and build a science and technology big data profile. The BERT pre-trained language model replaces the word2vec language model. BERT is pre-trained with a bidirectional Transformer architecture, which has strong representational power, by jointly conditioning on context in all layers, so it can enrich each character's semantic vector with information from that character's context. The BLSTM-CRF model, currently a mainstream deep learning model for sequence labeling, is then introduced as the baseline model, and the character vector sequence output by the pre-trained BERT is fed into BLSTM-CRF for training. The text to be recognized is first passed through the bidirectional LSTM to obtain a label for each character. Because these labels depend strongly on one another, a CRF layer attached after the bidirectional LSTM learns the dependencies between labels and finally produces the globally optimal label sequence at the sentence level. Experimental results show that the proposed BERT_BLSTM_CRF-based entity recognition model achieves higher accuracy than traditional entity recognition models, which verifies the effectiveness of the proposed method.
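The final step of the abstract, in which the CRF layer selects the globally optimal label sequence for a sentence, is commonly implemented with Viterbi decoding. The following is a minimal sketch of that decoding step only, not the authors' implementation. The tag set, the emission scores (standing in for the per-character BLSTM outputs) and the transition scores (standing in for what a CRF layer would learn) are all made-up illustrative values, and the function name `viterbi_decode` is hypothetical.

```python
from typing import List, Tuple

# Hypothetical tag set and scores, chosen only for illustration.
TAGS = ["O", "B-ORG", "I-ORG"]

# EMISSIONS[t][j]: score of tag j for character t (stand-in for BLSTM output).
EMISSIONS = [
    [0.5, 1.0, 1.2],
    [0.2, 0.1, 1.5],
    [1.0, 0.1, 0.3],
]

# TRANSITIONS[i][j]: score of moving from tag i to tag j (stand-in for
# learned CRF parameters). O -> I-ORG is heavily penalised.
TRANSITIONS = [
    [0.0, 0.0, -10.0],  # from O
    [0.0, 0.0, 1.0],    # from B-ORG
    [0.0, 0.0, 0.5],    # from I-ORG
]


def viterbi_decode(
    emissions: List[List[float]], transitions: List[List[float]]
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag index sequence and its total score."""
    n_tags = len(transitions)
    # score[j]: best score of any path that ends in tag j at the current position.
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for arriving at tag j.
            best_prev = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_prev] + transitions[best_prev][j] + emit[j])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag.
    best_last = max(range(n_tags), key=lambda j: score[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[best_last]


# Per-character argmax ignores dependencies between labels.
greedy = [max(range(len(TAGS)), key=row.__getitem__) for row in EMISSIONS]
best_path, best_score = viterbi_decode(EMISSIONS, TRANSITIONS)

print([TAGS[i] for i in greedy])     # ['I-ORG', 'I-ORG', 'O']
print([TAGS[i] for i in best_path])  # ['B-ORG', 'I-ORG', 'O']
```

With these toy scores, choosing the best tag per character independently starts the entity with I-ORG, whereas decoding over the whole sentence yields B-ORG, I-ORG, O, because the transition scores reward B-ORG followed by I-ORG. This is the kind of label dependency the abstract attributes to the CRF layer.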
Du Junping (杜军平), Si Xuefeng (司雪峰)
Science and scientific research; computing and computer technology; information and knowledge dissemination
BERT; LSTM; CRF; entity recognition
Du Junping, Si Xuefeng. An entity recognition algorithm based on BERT_BLSTM_CRF [EB/OL]. (2020-12-18) [2025-04-30]. http://www.paper.edu.cn/releasepaper/content/202012-69.