A new method for rapid genome classification, clustering, visualization, and novel taxa discovery from metagenome
ABSTRACT Current supervised phylogeny-based methods fall short in recognizing species assembled from metagenomic datasets from under-investigated habitats, because these genomes are often incomplete or lack known close relatives. Here, we report an efficient software suite, “Genome Constellation”, that estimates similarities between genomes based on their k-mer matches and then uses these similarities for classification, clustering, and visualization. The clusters of reference genomes formed by Genome Constellation closely resemble known phylogenetic relationships while also revealing unexpected connections. In a dataset of 1,693 draft genomes assembled from Antarctic lake communities, of which only 40% could be placed in a phylogenetic tree, Genome Constellation improves taxa assignment to 61%. It revealed six clusters derived from new bacterial phyla and 63 new giant viruses, three of which were missed by the traditional marker-based approach. In summary, we demonstrate that Genome Constellation can tackle the computational and algorithmic challenges of large-scale taxonomy analyses in metagenomics.
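To make the idea of k-mer-based genome similarity concrete, the following is a minimal sketch, not the Genome Constellation implementation. It assumes a plain Jaccard index over exact k-mer sets as the similarity measure; the abstract does not specify the actual measure, and the function names `kmers` and `jaccard_similarity` and the default `k=4` are hypothetical choices for illustration only.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_similarity(seq_a, seq_b, k=4):
    """Jaccard index of two sequences' k-mer sets.

    Assumption: this is an illustrative stand-in for "similarity based on
    k-mer matches"; the real tool may use a different estimator.
    """
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    if not a and not b:
        # Both sequences are shorter than k: no k-mers to compare.
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Toy sequences, not real genomes.
    print(jaccard_similarity("ACGTAC", "ACGTTT", k=4))
```

A pairwise matrix of such scores could then feed a clustering or graph-layout step; for real genomes, exact k-mer sets are usually replaced by sampled or sketched subsets to keep memory and runtime manageable.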
Sevim Volkan, Morgan-Kiss Rachael M., Orsini Rachel, Kang Dongwan, Ho Harrison, Barich Daniel J., Woyke Tanja, Schulz Frederik, Li Wei, Egan Rob, McCue Kayla, Slonczewski Joan L., Sedlacek Christopher J., Macklin Derek, Froula Jeff, Shay Jackie E., Yao Shijie, Wang Zhong
Department of Energy Joint Genome Institute; Department of Microbiology, Miami University; Department of Energy Resources Engineering, Stanford University; Department of Energy Joint Genome Institute; School of Natural Sciences, University of California at Merced; Department of Biology, Kenyon College; Department of Energy Joint Genome Institute / Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory / School of Natural Sciences, University of California at Merced; Department of Energy Joint Genome Institute; Department of Land Resources and Environmental Sciences, Montana State University; Department of Energy Joint Genome Institute; Program in Computational and Systems Biology, Massachusetts Institute of Technology; Department of Biology, Kenyon College; Centre for Microbiology and Environmental Systems Science, Division of Microbial Ecology, University of Vienna; Department of Bioengineering, Stanford University; Department of Energy Joint Genome Institute; School of Natural Sciences, University of California at Merced; Department of Energy Joint Genome Institute; Department of Energy Joint Genome Institute / Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory / School of Natural Sciences, University of California at Merced
Biological science research methods and techniques; molecular biology; microbiology
Taxonomy classification; metagenome assembled genomes; metagenome visualization; Genome Constellation
Sevim Volkan,Morgan-Kiss Rachael M.,Orsini Rachel,Kang Dongwan,Ho Harrison,Barich Daniel J.,Woyke Tanja,Schulz Frederik,Li Wei,Egan Rob,McCue Kayla,Slonczewski Joan L.,Sedlacek Christopher J.,Macklin Derek,Froula Jeff,Shay Jackie E.,Yao Shijie,Wang Zhong.A new method for rapid genome classification, clustering, visualization, and novel taxa discovery from metagenome[EB/OL].(2025-03-28)[2025-04-26].https://www.biorxiv.org/content/10.1101/812917.