SARS-CoV-2 lineage assignments using phylogenetic placement/UShER are superior to pangoLEARN machine learning method
With the rapid spread and evolution of SARS-CoV-2, the ability to monitor its transmission and distinguish among viral lineages is critical for pandemic response efforts. The most commonly used software for the lineage assignment of newly isolated SARS-CoV-2 genomes is pangolin, which offers two methods of assignment, pangoLEARN and pUShER. PangoLEARN rapidly assigns lineages using a machine learning algorithm, while pUShER performs a phylogenetic placement to identify the lineage corresponding to a newly sequenced genome. In a preliminary study, we observed that pangoLEARN (decision tree model), while substantially faster than pUShER, offered less consistency across different versions of pangolin v3. Here, we expand upon this analysis to include v3 and v4 of pangolin, which moved the default algorithm for lineage assignment from pangoLEARN in v3 to pUShER in v4, and perform a thorough analysis confirming that pUShER is not only more stable across versions but also more accurate. Our findings suggest that future lineage assignment algorithms for various pathogens should consider the value of phylogenetic placement.
O'Toole Áine, Hinrichs Angie S., Wang Jade, de Bernardi Schneider Adriano, Su Michelle, Amin Helly, Bell John, Wadford Debra A., Perry Marc D., Turakhia Yatish, Scher Emily, De Maio Nicola, Corbett-Detig Russ, Hughes Scott
Institute of Evolutionary Biology, University of Edinburgh; Genomics Institute, University of California Santa Cruz; New York City Public Health Laboratory, Department of Health and Mental Hygiene; Genomics Institute, University of California Santa Cruz and Department of Biomolecular Engineering, University of California Santa Cruz; New York City Public Health Laboratory, Department of Health and Mental Hygiene; New York City Public Health Laboratory, Department of Health and Mental Hygiene; California Department of Public Health (CDPH); California Department of Public Health (CDPH); Genomics Institute, University of California Santa Cruz; Department of Electrical and Computer Engineering, University of California San Diego; Institute of Evolutionary Biology, University of Edinburgh; European Molecular Biology Laboratory, European Bioinformatics Institute; Genomics Institute, University of California Santa Cruz and Department of Biomolecular Engineering, University of California Santa Cruz; New York City Public Health Laboratory, Department of Health and Mental Hygiene
Medical research methods; Microbiology; Biological science research methods and techniques
Phylogenetics; Bioinformatics; COVID-19; variants
O'Toole Áine, Hinrichs Angie S., Wang Jade, de Bernardi Schneider Adriano, Su Michelle, Amin Helly, Bell John, Wadford Debra A., Perry Marc D., Turakhia Yatish, Scher Emily, De Maio Nicola, Corbett-Detig Russ, Hughes Scott. SARS-CoV-2 lineage assignments using phylogenetic placement/UShER are superior to pangoLEARN machine learning method [EB/OL]. (2025-03-28) [2025-05-13]. https://www.biorxiv.org/content/10.1101/2023.05.26.542489.