Research on Chinese Named Entity Recognition Based on BERT
Named entity recognition (NER) is an important research direction in natural language processing (NLP). Chinese grammar is complex and its word order is flexible. The current mainstream NER model is BiLSTM+CRF, which still generalizes poorly to more open Chinese domains such as novel text. This paper analyzes the characteristics of Chinese and of the Chinese NER task, explores how transfer learning, a currently popular line of NLP research, and its representative model BERT can be better adapted to Chinese NER, and proposes BB-CRF, a BERT-based NER model. BB-CRF has the following advantages: strong generalization ability, the ability to handle complex grammar and out-of-vocabulary (OOV) words, and fast convergence. Finally, the model is evaluated on two datasets: the People's Daily dataset and a novel dataset whose language is closer to natural usage. In terms of accuracy, the experimental results show that the model captures the characteristics of the Chinese NER task across a wide range of scenarios, and that it has high practical value and guiding significance.
Wang Yulong (王玉龙), Liu Tongcun (刘同存), Guo Linpeng (过临朋)
Computing technology, computer technology
deep learning; natural language processing; named entity recognition; transfer learning
Wang Yulong, Liu Tongcun, Guo Linpeng. Research on Chinese Named Entity Recognition Based on BERT [EB/OL]. (2020-02-04)[2025-08-02]. http://www.paper.edu.cn/releasepaper/content/202002-6.