Research on Triplet Extraction Model for Scientific Literature Based on Machine Reading Comprehension

DOI: 10.3772/j.issn.1673-2286.2025.04.003

Source: National Preprint Platform (国家预印本平台)

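王莉军 ¹, 刘洢颖 ¹, 郑明 ¹, 李雪 ², 王鑫月 ²

Author information

1. Institute of Scientific and Technical Information of China, Key Laboratory of Rich-media Knowledge Organization and Service of Digital Publishing Content, Beijing 100038
2. School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083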

Abstract

Scientific literature is a key resource for scientific research and technological progress, but as the number of publications surges, researchers struggle to quickly obtain key information from a massive body of documents. This paper proposes MMOIE (Multi-Answer Machine-Reading-Comprehension Open Information Extraction), an open information extraction model based on machine reading comprehension, to efficiently extract triplets from scientific literature. The model combines the SIFRank+ model with the ELMo pre-trained language model to compute an importance weight for each keyword, and then selects the factual triplets that contain at least one keyword. In experiments against ZORE, SpanOIE, MGD-GNN, and TPOIE, MMOIE achieves a recall of 64.78% and an F1 score of 55.62% in triplet extraction. The model improves the efficiency and quality of key-information extraction and captures the entity relations in the literature, supporting researchers in quickly accessing key information.
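The keyword-based selection step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Triplet` type, the `select_key_triplets` function, the `min_weight` threshold, and the hand-assigned weights (standing in for scores that a keyword ranker such as SIFRank+ would produce) are all hypothetical, and matching is plain case-insensitive substring search.

```python
from typing import Dict, List, NamedTuple


class Triplet(NamedTuple):
    """A (subject, relation, object) fact; a hypothetical container type."""
    subject: str
    relation: str
    obj: str


def select_key_triplets(
    triplets: List[Triplet],
    keyword_weights: Dict[str, float],
    min_weight: float = 0.5,  # assumed threshold, not taken from the paper
) -> List[Triplet]:
    """Keep triplets that mention at least one sufficiently weighted keyword."""
    # Keywords whose weight reaches the assumed threshold.
    keywords = [k.lower() for k, w in keyword_weights.items() if w >= min_weight]
    selected = []
    for t in triplets:
        # Join the three fields so a keyword may occur in any of them.
        text = " ".join(t).lower()
        if any(k in text for k in keywords):
            selected.append(t)
    return selected


# Hand-assigned weights stand in for the output of a keyword ranker.
weights = {"triplets": 0.9, "scientific literature": 0.8, "reviewers": 0.1}
triplets = [
    Triplet("MMOIE", "extracts", "triplets from scientific literature"),
    Triplet("the authors", "thank", "the reviewers"),
]
key_triplets = select_key_triplets(triplets, weights)
print(key_triplets)
```

Only the first triplet is kept here, because "reviewers" falls below the assumed threshold. A real pipeline would likely match on tokens or spans rather than substrings.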

Key words

Scientific literature/Open information/Factual triplets/Key triplets/Machine reading comprehension

Cite this article

王莉军, 刘洢颖, 郑明, 李雪, 王鑫月. Research on Triplet Extraction Model for Scientific Literature Based on Machine Reading Comprehension [EB/OL]. (2026-04-15) [2026-04-19]. https://sinoxiv.napstic.cn/article/25763078.

Subject classification

Computing technology, computer technology

First published: 2026-04-15 09:48:26
