Structured references from PDF articles: assessing the tools for
bibliographic reference extraction and parsing
Silvio Peroni, Alessia Cioffi
Abstract
Many solutions have been proposed to extract bibliographic references from
PDF papers. Machine learning, rule-based and regular-expression approaches
are among the methods most commonly adopted by tools addressing this task.
This work aims to identify and evaluate all and only the tools which, given a
full-text paper in PDF format, can recognise, extract and parse bibliographic
references. We identified seven tools: Anystyle, Cermine, ExCite, Grobid,
Pdfssa4met, Scholarcy and Science Parse. We compared and evaluated them against
a corpus of 56 PDF articles published in 27 subject areas. Anystyle
obtained the best overall score, followed by Cermine. However, in some subject
areas, other tools had better results for specific tasks.
Cite this article
Silvio Peroni, Alessia Cioffi. Structured references from PDF articles: assessing the tools for
bibliographic reference extraction and parsing [EB/OL]. (2022-05-29) [2026-06-22]. https://arxiv.org/abs/2205.14677.
Subject classification
Computing technology, computer technology