Construction and representation of human pangenome graphs
Abstract
As a single reference genome cannot represent all the variation present across human individuals, pangenome graphs have been introduced to incorporate population diversity into a wide range of genomic analyses. Several data structures, in particular graphs, have been proposed for representing collections of genomes as pangenomes. In this work, we collected all publicly available high-quality human haplotypes and constructed the largest human pangenome graphs to date, incorporating 52 individuals in addition to two synthetic references (CHM13 and GRCh38). We built variation graphs and de Bruijn graphs of this collection using five state-of-the-art tools: Bifrost, mdbg, Minigraph, Minigraph-Cactus and pggb. We examined how each of these tools represents variation between input sequences, both in terms of overall graph structure and in the representation of specific genetic loci. This work sheds light on key differences between pangenome graph representations and helps end users select the most appropriate graph type for their application.
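To make the idea of a graph "representing variation between input sequences" concrete, here is a minimal, hypothetical Python sketch of how a colored de Bruijn graph turns a single-nucleotide difference between two haplotypes into a bubble. It is not how Bifrost or mdbg are implemented: Bifrost builds compacted, colored de Bruijn graphs over canonical k-mers, and mdbg works with k-mers in minimizer space, whereas this toy version keeps every forward-strand k-mer as its own node. The names `build_colored_dbg` and `branching_nodes` and the example sequences are illustrative assumptions only.

```python
from collections import defaultdict


def build_colored_dbg(haplotypes, k):
    """Build a toy, uncompacted colored de Bruijn graph.

    Nodes are k-mers; an edge u -> v is recorded when v directly follows u
    (overlapping by k-1 bases) in some haplotype. Each k-mer also stores the
    set of haplotype indices ("colors") in which it occurs.
    Simplifications (assumptions): forward strand only, no canonical k-mers,
    no unitig compaction.
    """
    colors = defaultdict(set)  # k-mer -> set of haplotype indices
    edges = defaultdict(set)   # k-mer -> set of successor k-mers
    for idx, seq in enumerate(haplotypes):
        kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        for kmer in kmers:
            colors[kmer].add(idx)
        # Consecutive k-mers in a haplotype overlap by k-1 bases.
        for u, v in zip(kmers, kmers[1:]):
            edges[u].add(v)
    return colors, edges


def branching_nodes(edges):
    """Return k-mers with more than one successor (points where a bubble opens)."""
    return sorted(u for u, succ in edges.items() if len(succ) > 1)


# Two toy haplotypes that differ by a single A/G substitution.
haplotypes = ["GATTACAGG", "GATTGCAGG"]
colors, edges = build_colored_dbg(haplotypes, k=3)
print(branching_nodes(edges))  # ['ATT']
```

Here the k = 3 k-mers overlapping the substitution are haplotype-specific, so the path splits after the shared k-mer `ATT` into two branches of three nodes each (`TTA`, `TAC`, `ACA` versus `TTG`, `TGC`, `GCA`) that rejoin at `CAG`; the colors record which haplotype traverses which branch. In a base-level variation graph such as those produced by pggb, the same SNP would usually appear instead as two alternative single-base nodes between shared flanking nodes.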
Andreace Francesco, Dufresne Yoann, Lechat Pierre, Chikhi Rayan
Andreace Francesco: Sequence Bioinformatics, Department of Computational Biology, Institut Pasteur, Université Paris Cité; Sorbonne Université, Collège doctoral
Dufresne Yoann: Sequence Bioinformatics, Department of Computational Biology, Institut Pasteur, Université Paris Cité; Bioinformatics and Biostatistics Hub, Institut Pasteur, Université de Paris
Lechat Pierre: Bioinformatics and Biostatistics Hub, Institut Pasteur, Université de Paris
Chikhi Rayan: Sequence Bioinformatics, Department of Computational Biology, Institut Pasteur, Université Paris Cité
Genetics; Bioengineering; Molecular Biology
Andreace Francesco, Dufresne Yoann, Lechat Pierre, Chikhi Rayan. Construction and representation of human pangenome graphs [EB/OL]. (2025-03-28) [2025-07-03]. https://www.biorxiv.org/content/10.1101/2023.06.02.542089.