Assembling a corpus of phosphoproteomic annotations using ProtMapper to normalize site information from databases and text mining

Source: bioRxiv
Abstract

Protein phosphorylation regulates numerous cellular processes and is highly studied in biology. However, the analysis of phosphoproteomic datasets remains challenging due to limited information on upstream regulators of phosphosites, which is fragmented across multiple curated databases and unstructured literature. When aggregating information on phosphosites from six databases and three text mining systems, we found that a substantial proportion of phosphosites were mentioned at residue positions not matching the reference sequence. These errors were often attributable to the use of residue numbers from non-canonical protein isoforms, mouse or rat proteins, or post-translationally processed proteins. Non-canonical site numbering is also prevalent in mass spectrometry datasets from large-scale efforts such as the Clinical Proteomic Tumor Analysis Consortium (CPTAC). To address these issues, we developed ProtMapper, an open-source Python tool that automatically normalizes site positions to human protein reference sequences. We used ProtMapper coupled with the INDRA knowledge assembly system to create a corpus of 37,028 regulatory annotations for 16,332 sites—to our knowledge, the most comprehensive corpus of literature-derived information about phosphosite regulation currently available. This work highlights how automated phosphosite normalization coupled to text mining and knowledge assembly allows researchers to leverage phosphosite information that exists within the scientific literature.
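The kind of site normalization the abstract describes can be illustrated with a minimal sketch. This is not ProtMapper's actual API or algorithm: the function names, the flanking-motif strategy, and the toy sequences below are all hypothetical and chosen only for illustration. The sketch first checks whether a reported residue and position agree with a reference sequence and, if they do not, tries alternative sequences (here a processed form lacking the initiator methionine and an isoform with a longer N-terminus), mapping the site back to the reference through its flanking residues.

```python
from typing import Dict, Optional, Tuple

# Toy sequences invented for this example; they are not real proteins.
REFERENCE = "MSTPEKLSRGAYDQ"
ALTERNATIVES = {
    "processed": "STPEKLSRGAYDQ",     # initiator methionine removed
    "isoform": "MAAGSTPEKLSRGAYDQ",   # hypothetical longer N-terminus
}


def site_is_valid(sequence: str, residue: str, position: int) -> bool:
    """Return True if `sequence` has `residue` at the 1-based `position`."""
    return 1 <= position <= len(sequence) and sequence[position - 1] == residue


def map_site_to_reference(
    residue: str,
    position: int,
    reference: str,
    alternatives: Dict[str, str],
    flank: int = 3,
) -> Optional[Tuple[int, str]]:
    """Map a reported site onto `reference`.

    Returns (reference_position, source_label), or None if the site cannot
    be placed unambiguously.
    """
    # Case 1: the reported site already matches the reference sequence.
    if site_is_valid(reference, residue, position):
        return position, "reference"

    # Case 2: the site is valid on an alternative sequence. Take the residues
    # flanking the site and look for that motif in the reference.
    for label, alt_seq in alternatives.items():
        if not site_is_valid(alt_seq, residue, position):
            continue
        start = max(0, position - 1 - flank)
        motif = alt_seq[start:position + flank]
        offset_in_motif = position - 1 - start
        first = reference.find(motif)
        # Skip motifs that are absent from, or repeated in, the reference.
        if first == -1 or reference.find(motif, first + 1) != -1:
            continue
        return first + offset_in_motif + 1, label

    return None


# A serine reported at position 7 does not match the reference (L at 7),
# but matches the processed form and maps back to reference position 8.
print(map_site_to_reference("S", 7, REFERENCE, ALTERNATIVES))
```

A real normalizer would also need curated reference sequences and ortholog mappings (for example, for mouse or rat numbering), which this sketch leaves out.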

Peter K. Sorger, John A. Bachman, Benjamin M. Gyori

10.1101/822668

Biological science research methods and techniques; Biochemistry; Molecular biology

Sorger Peter K., Bachman John A., Gyori Benjamin M. Assembling a corpus of phosphoproteomic annotations using ProtMapper to normalize site information from databases and text mining [EB/OL]. (2025-03-28) [2025-06-04]. https://www.biorxiv.org/content/10.1101/822668.
