National Preprint Platform

PGxCorpus: a Manually Annotated Corpus for Pharmacogenomics

Source: bioRxiv
Abstract

Pharmacogenomics (PGx) studies how individual gene variations affect drug response phenotypes, which makes PGx knowledge a key component of precision medicine. A significant part of state-of-the-art PGx knowledge is accumulated in scientific publications, where it is hardly reusable by humans or software. Natural language processing techniques have been developed and employed to guide experts in curating this large body of knowledge. However, existing work is limited by the absence of a high-quality annotated corpus focused on the PGx domain. This absence particularly restricts the use of supervised machine learning. This article introduces PGxCorpus, a manually annotated corpus designed to fill this gap and to enable the automatic extraction of PGx relationships from text. It comprises 945 sentences from 911 PubMed abstracts, annotated with PGx entities of interest (mainly gene variations, genes, drugs, and phenotypes) and the relationships between them. In this article, we present the corpus itself, its construction, and a baseline experiment that illustrates how it may be leveraged to synthesize and summarize PGx knowledge.

Digan William, Gogdemir Romain, Dalleau Kevin, Smaïl-Tabbone Malika, Coulet Adrien, Legrand Joël, Devignes Marie-Dominique, Bousquet Cédric, Lee Chia-Ju, Petitpain Nadine, Ndiaye Ndeye-Coumba, Ringot Patrice, Toussaint Yannick

Hôpital Européen Georges Pompidou, AP-HP, Université Paris Descartes, Université Sorbonne Paris Cité; INSERM UMR 1138 Equipe 22, Université Paris Descartes, Université Sorbonne Paris Cité; Université de Lorraine, CNRS; Stanford Center for Biomedical Informatics Research, Stanford University; Sorbonne Université, INSERM; Department of Biomedical Informatics and Medical Education, University of Washington; Centre Régional de Pharmacovigilance, CHRU of Nancy; INSERM U1256 - NGERE, Université de Lorraine

DOI: 10.1101/534388

Subjects: Medicine and health theory; Medical research methods; Pharmacy

Keywords: pharmacogenomics; corpus; manual annotations; natural language processing; supervised machine learning; entity recognition; relationship extraction

Digan William, Gogdemir Romain, Dalleau Kevin, Smaïl-Tabbone Malika, Coulet Adrien, Legrand Joël, Devignes Marie-Dominique, Bousquet Cédric, Lee Chia-Ju, Petitpain Nadine, Ndiaye Ndeye-Coumba, Ringot Patrice, Toussaint Yannick. PGxCorpus: a Manually Annotated Corpus for Pharmacogenomics [EB/OL]. (2025-03-28) [2025-06-17]. https://www.biorxiv.org/content/10.1101/534388.