GenomeFace: a deep learning-based metagenome binner trained on 43,000 microbial genomes
Metagenomic binning, the process of grouping DNA sequences into taxonomic units, is critical for understanding the functions, interactions, and evolutionary dynamics of microbial communities. We propose a deep learning approach to binning using two neural networks, one based on composition and another on environmental abundance, dynamically weighting the contribution of each based on characteristics of the input data. Trained on over 43,000 prokaryotic genomes, our network for composition-based binning is inspired by metric learning techniques used for facial recognition. Using a task-specific, multi-GPU accelerated algorithm to cluster the embeddings produced by our network, our binner leverages marker genes observed to be present in nearly all taxa to grade and select optimal clusters of sequences from a hierarchy of candidates. We evaluate our approach on four simulated datasets with known ground truth. Our linear-time integration of marker genes recovers more near-complete genomes than state-of-the-art but computationally infeasible solutions that use them, while being over an order of magnitude faster. Finally, we demonstrate the scalability and acuity of our approach by testing it on three of the largest metagenome assemblies ever performed. Compared to other binners, we produce 47%-183% more near-complete genomes. From these datasets, we find the genomes of over 3000 new candidate species that have never been previously cataloged, representing a potential 4% expansion of the known bacterial tree of life.
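To make the idea of grading and selecting clusters from a hierarchy of candidates concrete, the following is a minimal sketch under stated assumptions, not the GenomeFace implementation. It assumes a hypothetical tree of candidate clusters (`Node`), a hypothetical score that rewards each distinct marker gene present and penalizes duplicate copies, and a bottom-up pass that keeps a parent cluster only when it scores at least as well as the best selection among its children. For a fixed marker set, the pass visits each node once.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node in a hierarchy of candidate clusters."""
    name: str
    markers: Counter = field(default_factory=Counter)  # marker counts (leaves only)
    children: list = field(default_factory=list)


def score(counts, n_markers, penalty=1.0):
    """Assumed quality score: distinct markers present minus duplicate copies."""
    present = sum(1 for c in counts.values() if c > 0)
    extra = sum(c - 1 for c in counts.values() if c > 1)
    return (present - penalty * extra) / n_markers


def select_bins(node, n_markers):
    """Return (total score, chosen cluster names, marker counts) for a subtree."""
    if not node.children:
        counts = Counter(node.markers)
        return score(counts, n_markers), [node.name], counts
    child_total, child_bins, counts = 0.0, [], Counter()
    for child in node.children:
        s, bins, c = select_bins(child, n_markers)
        child_total += s
        child_bins += bins
        counts.update(c)  # marker counts of a parent are the sum over children
    own = score(counts, n_markers)
    if own >= child_total:
        return own, [node.name], counts  # merged cluster is at least as good
    return child_total, child_bins, counts  # keep the finer-grained clusters


# Toy hierarchy with four hypothetical marker genes A-D.
c1 = Node("c1", Counter({"A": 1, "B": 1}))
c2 = Node("c2", Counter({"C": 1, "D": 1}))
c3 = Node("c3", Counter({"A": 1, "B": 1, "C": 1}))
g1 = Node("g1", children=[c1, c2])
root = Node("root", children=[g1, c3])

total, bins, _ = select_bins(root, n_markers=4)
print(bins, total)
```

In this toy case, merging `c1` and `c2` yields a complete, duplicate-free marker set, so `g1` is kept, while merging `g1` with `c3` would duplicate three markers, so the root is rejected in favor of its children.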
Lettich Richard, Yelick Katherine, Riley Robert, Tritt Andrew, Wang Zhong, Oliker Lenoid, Egan Robert, Buluc Aydin
Research methods and techniques in biological sciences; microbiology; computing and computer technology
Lettich Richard,Yelick Katherine,Riley Robert,Tritt Andrew,Wang Zhong,Oliker Lenoid,Egan Robert,Buluc Aydin.GenomeFace: a deep learning-based metagenome binner trained on 43,000 microbial genomes[EB/OL].(2025-03-28)[2025-05-22].https://www.biorxiv.org/content/10.1101/2024.02.07.579326.