Extracting human data from published figures: implications for data science and bioethics
Abstract The advent of text mining and natural-text-reading artificial intelligence has opened new research opportunities on the large collections of research publications available through journals and other resources. These systems have begun to identify novel connections and hypotheses because they can read and extract information from more literature than a single individual could in a lifetime. Most research publications contain figures in which data are represented as graphs. Modern publication guidelines strongly encourage graphs that display all data points, as opposed to summary figures such as bar charts. Figures are often encoded in a graphing language that is interpreted and displayed as a graphic. Converting figures in publications to their underlying code should enable text-based mining to extract the raw data behind the graph. Here I show that time series data on human patients can be extracted from the original figures of publications more than 15 years old and reassessed using modern tools. This could help in cases where data sets are no longer available due to file loss or corruption. It may also create an issue for the publication of human data, as sharing human data often requires research ethics approval.

Author summary Figures embedded in published research manuscripts are a minable resource, similar to text. Figures are text-based code that draws the image; as such, the underlying text of the code can be used to reassemble the original data set.
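The idea of recovering data from figure code can be illustrated with a minimal sketch. It is not the author's pipeline: it assumes the figure has already been converted to a text stream that uses PDF-style path operators (`x y m` for moveto, `x y l` for lineto), and that two tick marks with known values have been located on each axis. The stream contents, the calibration numbers, and the helper names `extract_points`, `make_axis_map`, and `to_data` are all hypothetical.

```python
import re

# Assumption: the figure code uses PDF-style path operators,
# "<x> <y> m" (moveto) and "<x> <y> l" (lineto), in page units.
PATH_OP = re.compile(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s+([ml])\b")


def extract_points(stream):
    """Return the (x, y) page coordinates of every moveto/lineto operator."""
    return [(float(x), float(y)) for x, y, _op in PATH_OP.findall(stream)]


def make_axis_map(page_a, data_a, page_b, data_b):
    """Build a linear map from page units to data units.

    The map is calibrated from two reference ticks: page position
    page_a corresponds to data value data_a, and page_b to data_b.
    """
    def axis_map(p):
        return data_a + (p - page_a) * (data_b - data_a) / (page_b - page_a)
    return axis_map


def to_data(points, x_map, y_map):
    """Convert page coordinates to data coordinates, axis by axis."""
    return [(x_map(px), y_map(py)) for px, py in points]


# Hypothetical figure code: one polyline drawn through three points.
stream = "100 200 m\n150 250 l\n200 225 l\nS"

# Hypothetical calibration: x ticks at page 100 -> 0 and page 200 -> 10,
# y ticks at page 200 -> 0 and page 300 -> 50.
x_map = make_axis_map(100, 0.0, 200, 10.0)
y_map = make_axis_map(200, 0.0, 300, 50.0)

points = extract_points(stream)
series = to_data(points, x_map, y_map)
print(series)
```

The linear map only holds for linear axes; a log-scaled axis would need the calibration applied to the logarithm of the data values. Real figure code also mixes data paths with axes, labels, and legends, so a practical extraction would first have to isolate the paths that carry data.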
Cox Brian J.
Department of Physiology, University of Toronto||Department of Obstetrics and Gynaecology, University of Toronto
Biological science research methods and techniques; computing technology, computer technology
Cox Brian J. Extracting human data from published figures: implications for data science and bioethics[EB/OL].(2025-03-28)[2025-06-06].https://www.biorxiv.org/content/10.1101/376848.