A schema for digitized surface swab site metadata in open-source DNA sequence databases

Source: bioRxiv
Abstract

Large, open-source DNA sequence databases have been generated, in part, by collecting microbial pathogens from swabbed surfaces in built environments. Analyzing these data in aggregate for public health surveillance requires digitizing the complex, domain-specific metadata associated with swab site locations. However, swab site location information is currently collected in a single, free-text ISOLATION SOURCE field. This encourages poorly detailed descriptions with varying word order, granularity, and linguistic errors, which makes automation difficult and reduces machine-actionability. We assessed 1,498 free-text swab site descriptions generated during routine foodborne pathogen surveillance. The lexicon of the free-text metadata was evaluated to determine the informational facets and the number of unique terms used by data collectors. Open Biological and Biomedical Ontology (OBO) Foundry libraries were used to develop hierarchical vocabularies, connected by logical relationships, to describe swab site locations. Content analysis identified five informational facets described by 338 unique terms. Term hierarchies were developed for these facets, as were statements (called axioms) about how entities within the five domains are related. The schema developed in this study has been integrated into a publicly available pathogen metadata standard, facilitating ongoing surveillance and investigations. The One Health Enteric Package has been available at NCBI BioSample since 2022. Collective use of metadata standards increases the interoperability of DNA sequence databases, enabling large-scale approaches to data sharing, artificial intelligence, and big-data solutions to food safety.
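The lexicon evaluation described in the abstract can be pictured with a minimal sketch. The descriptions below are hypothetical, invented purely for illustration (they are not drawn from the study's 1,498 records), and the functions `tokenize` and `unique_terms` are assumed names, not the authors' method. The sketch only shows why free text with varying word order and spelling errors is hard to aggregate: three descriptions of what is plausibly the same site remain three distinct strings, and a misspelling adds a spurious term to the lexicon.

```python
import re
from collections import Counter

# Hypothetical free-text swab site descriptions (invented for illustration,
# not taken from the study's data).
descriptions = [
    "floor drain, raw processing room",
    "Raw processing room floor drain",
    "drain (floor) - raw procesing room",  # deliberate misspelling
    "conveyor belt",
]


def tokenize(text: str) -> list[str]:
    """Lowercase a description and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def unique_terms(records: list[str]) -> Counter:
    """Count, for each distinct token, how many records contain it."""
    counts: Counter = Counter()
    for record in records:
        counts.update(set(tokenize(record)))
    return counts


term_counts = unique_terms(descriptions)

# Number of distinct description strings after lowercasing only.
distinct_strings = len({d.lower() for d in descriptions})

print(f"{len(term_counts)} unique terms across {len(descriptions)} records")
print(f"{distinct_strings} distinct description strings")
```

In this toy input, "processing" and "procesing" are counted as separate terms, and no two descriptions collapse into one string. A controlled vocabulary with separate fields per facet avoids both problems by construction, which is the motivation the abstract gives for the schema.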

Griffiths Emma, Feng Barry, Daeschel Devin, Snyder Abigail B, Timme Ruth, Allard Marc W, Dooley Damion, Chen Yi

DOI: 10.1101/2022.12.15.520583

Subjects: Research methods and techniques in biological sciences; Microbiology; Environmental science theory

Griffiths Emma, Feng Barry, Daeschel Devin, Snyder Abigail B, Timme Ruth, Allard Marc W, Dooley Damion, Chen Yi. A schema for digitized surface swab site metadata in open-source DNA sequence databases [EB/OL]. (2025-03-28) [2025-06-03]. https://www.biorxiv.org/content/10.1101/2022.12.15.520583.
