Research on an Active Learning-Based Indexing System for Research Objects in Scientific and Technical Literature
[Objective] This study aims to identify instances of the research object attribute in paper titles, using a small number of labeled samples to maximize recognition accuracy. [Methods] We first analyzed the grammatical features of research objects in scientific and technical literature. We then used a conditional random field (CRF) sequence labeling algorithm, trained on a small number of samples, to recognize and extract research objects. Finally, we introduced an iterative indexing system based on active learning over unlabeled data to improve recognition accuracy. [Results] The proposed method makes efficient use of unlabeled data and raises the labeling accuracy of research object recognition to 78.3%. [Limitations] The running efficiency of the algorithm needs further optimization. [Conclusions] The proposed method recognizes research object attribute instances in scientific and technical literature well, laying a foundation for further mining the knowledge systems and structures in that literature.
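The iterative labeling loop described in the abstract can be illustrated with a minimal sketch. The abstract does not state which query strategy or features the authors used, so everything here is an assumption made for illustration: the loop uses least-confidence sampling, the CRF is swapped for a simple per-token frequency tagger so the example needs only the standard library, and the titles, tags, and function names (`train`, `confidence`, `active_learning`) are hypothetical toy values rather than the authors' data or code.

```python
from collections import Counter, defaultdict

# Assumed BIO scheme: B/I mark tokens inside a research-object span, O is outside.
TAGS = ("B", "I", "O")

def train(labeled):
    """Count how often each token receives each tag (stand-in for CRF training)."""
    counts = defaultdict(Counter)
    for tokens, tags in labeled:
        for tok, tag in zip(tokens, tags):
            counts[tok.lower()][tag] += 1
    return counts

def tag_probs(model, token):
    """Laplace-smoothed P(tag | token); unseen tokens get a uniform distribution."""
    c = model.get(token.lower(), Counter())
    total = sum(c.values()) + len(TAGS)
    return {t: (c[t] + 1) / total for t in TAGS}

def predict(model, tokens):
    """Pick the most probable tag for each token independently."""
    return [max(TAGS, key=lambda t: tag_probs(model, tok)[t]) for tok in tokens]

def confidence(model, tokens):
    """Mean of the per-token maximum probability; lower means less certain."""
    return sum(max(tag_probs(model, tok).values()) for tok in tokens) / len(tokens)

def active_learning(seed, pool, oracle, rounds):
    """Repeatedly retrain, query the least confident title, and add its labels."""
    labeled, pool, queried = list(seed), list(pool), []
    for _ in range(rounds):
        if not pool:
            break
        model = train(labeled)
        pick = min(pool, key=lambda toks: confidence(model, toks))
        pool.remove(pick)
        labeled.append((pick, oracle(pick)))  # the oracle plays the human annotator
        queried.append(pick)
    return train(labeled), queried

# Hypothetical toy data: one labeled seed title and two unlabeled titles.
seed = [(["study", "of", "neural", "networks"], ["O", "O", "B", "I"])]
pool = [["analysis", "of", "neural", "networks"],
        ["study", "of", "graphene", "oxide"]]
gold = {("analysis", "of", "neural", "networks"): ["O", "O", "B", "I"],
        ("study", "of", "graphene", "oxide"): ["O", "O", "B", "I"]}

model, queried = active_learning(seed, pool, lambda t: gold[tuple(t)], rounds=2)
print(queried)
print(predict(model, ["study", "of", "graphene", "oxide"]))
```

In this sketch the title containing the most unseen tokens is queried first, which is the intended effect of least-confidence sampling: annotation effort goes to the examples the current model is least sure about. A real system would replace the frequency tagger with a CRF that scores whole tag sequences.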
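Liu Lijuan, He Huixin
Computing technology, computer technology
Scientific and technical literature; research object; conditional random field; iterative indexing system; active learning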
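Liu Lijuan, He Huixin. Research on an Active Learning-Based Indexing System for Research Objects in Scientific and Technical Literature [EB/OL]. (2017-10-11) [2025-08-05]. https://chinaxiv.org/abs/201711.01232.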