PanGraph: scalable bacterial pan-genome graph construction
The genomic diversity of microbes is commonly parameterized as single nucleotide polymorphisms relative to a reference genome of a well-characterized, but arbitrary, isolate. However, any reference genome contains only a fraction of the microbial pangenome, the total set of genes observed in a given species. Reference-based approaches are thus blind to the dynamics of the accessory genome, as well as to variation in gene order and copy number. With the widespread use of long-read sequencing, the number of high-quality, complete genome assemblies has increased dramatically. Traditional computational approaches to whole-genome analysis either scale poorly with the number of genomes or treat genomes as dissociated "bags of genes", and thus are not suited for this new era. Here, we present PanGraph, a Julia-based library and command line interface for aligning whole genomes into a graph. Each genome is represented as an undirected path along vertices, which, in turn, encapsulate homologous multiple sequence alignments. The resultant data structure succinctly summarizes population-level nucleotide and structural polymorphisms and can be exported into several common formats for either downstream analysis or immediate visualization.
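To make the data structure described above more concrete, the following is a minimal Python sketch of a graph in which each vertex holds a group of homologous sequence and each genome is a walk over those vertices. It is an illustration under stated assumptions, not PanGraph's actual API or file format: the names `Block`, `Path`, `reconstruct`, and `core_blocks` are hypothetical, the alignment in each vertex is reduced to a consensus plus per-genome substitutions, and a strand is recorded for each step so that an inverted segment can be read back out.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical vertex: a consensus sequence plus per-genome substitutions."""
    block_id: str
    consensus: str
    # genome name -> list of (0-based position, alternative base)
    variants: dict = field(default_factory=dict)


@dataclass
class Path:
    """Hypothetical genome record: an ordered walk of (block_id, strand) steps."""
    genome: str
    steps: list  # strand is "+" (forward) or "-" (reverse complement)


def revcomp(seq):
    """Reverse complement of an uppercase ACGT string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def reconstruct(path, blocks):
    """Rebuild a genome sequence by walking its path and applying its variants."""
    parts = []
    for block_id, strand in path.steps:
        block = blocks[block_id]
        seq = list(block.consensus)
        for pos, base in block.variants.get(path.genome, []):
            seq[pos] = base
        joined = "".join(seq)
        parts.append(joined if strand == "+" else revcomp(joined))
    return "".join(parts)


def core_blocks(paths):
    """Blocks that occur exactly once in every path (a simple notion of 'core')."""
    core = None
    for path in paths:
        counts = Counter(block_id for block_id, _ in path.steps)
        once = {block_id for block_id, n in counts.items() if n == 1}
        core = once if core is None else core & once
    return sorted(core or [])


# Toy data: g2 lacks block B, carries block C inverted, and has one SNP in block A.
blocks = {
    "A": Block("A", "ACGT", {"g2": [(1, "T")]}),
    "B": Block("B", "GGA"),
    "C": Block("C", "TTC"),
}
g1 = Path("g1", [("A", "+"), ("B", "+"), ("C", "+")])
g2 = Path("g2", [("A", "+"), ("C", "-")])

print(reconstruct(g1, blocks))
print(reconstruct(g2, blocks))
print(core_blocks([g1, g2]))
```

In this toy example the nucleotide polymorphism lives inside a vertex, while the structural differences (the absence of block B and the inversion of block C in `g2`) are visible only by comparing the paths, which is the separation of scales the abstract describes.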
Molari Marco, Neher Richard, Shaw Liam P., Noll Nicholas
Microbiology; Bioengineering; Computing Technology and Computer Technology
Molari Marco, Neher Richard, Shaw Liam P., Noll Nicholas. PanGraph: scalable bacterial pan-genome graph construction [EB/OL]. (2025-03-28) [2025-05-07]. https://www.biorxiv.org/content/10.1101/2022.02.24.481757.