scAnnotate: an automated cell type annotation tool for single-cell RNA-sequencing data
Abstract

Motivation: Single-cell RNA-sequencing (scRNA-seq) technology enables researchers to investigate a genome at the cellular level with unprecedented resolution. An organism consists of a heterogeneous collection of cell types, each of which plays a distinct role in various biological processes. Hence, the first step of scRNA-seq data analysis is often to distinguish cell types so they can be investigated separately. Researchers have recently developed several automated cell type annotation tools that require neither biological knowledge nor subjective human decisions. Dropout is a crucial characteristic of scRNA-seq data and is widely used in differential expression analysis. However, dropout information is not explicitly used by any current cell annotation method. Fully utilizing dropout information for cell type annotation motivated this work.

Results: We present scAnnotate, a cell annotation tool that fully utilizes dropout information. We model every gene's marginal distribution using a mixture model, which describes both the dropout proportion and the distribution of the non-dropout expression levels. Then, using an ensemble machine learning approach, we combine the mixture models of all genes into a single model for cell-type annotation. This combining approach avoids estimating the numerous parameters of the high-dimensional joint distribution of all genes. Using fourteen real scRNA-seq datasets, we demonstrate that scAnnotate is competitive against nine existing annotation methods. Furthermore, because of its distinct modelling strategy, the cells misclassified by scAnnotate are very different from those misclassified by competitor methods. This suggests that using scAnnotate together with other methods could further improve annotation accuracy.

Availability: We implemented scAnnotate as an R package and made it publicly available from CRAN.

Contact: Xuekui Zhang: xuekui@uvic.ca and Li Xing: li.xing@math.usask.ca
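The modelling strategy described in the abstract (a per-gene mixture of a dropout proportion and a distribution for non-dropout expression, combined across genes) can be illustrated with a minimal Python sketch. This is not the scAnnotate implementation, which is an R package. Several choices here are assumptions made only for illustration: the non-dropout component is taken to be Gaussian, the ensemble step is replaced by a naive-Bayes-style sum of per-gene log-likelihoods, and the names `fit_gene_mixtures`, `predict`, and `min_sd` are hypothetical.

```python
import numpy as np


def fit_gene_mixtures(X, y, min_sd=0.1):
    """Fit, per cell type and gene, a dropout proportion and a Gaussian
    for the non-zero (non-dropout) expression values.

    X: (cells, genes) array of non-negative expression values.
    y: (cells,) array of cell-type labels.
    """
    classes = np.unique(y)
    params = {}
    for c in classes:
        Xc = X[y == c]
        nonzero = Xc > 0
        n_nz = nonzero.sum(axis=0)
        # Smoothed dropout proportion, kept strictly inside (0, 1).
        pi = (Xc.shape[0] - n_nz + 1.0) / (Xc.shape[0] + 2.0)
        # Mean and standard deviation of the non-dropout values only.
        mu = np.where(nonzero, Xc, 0.0).sum(axis=0) / np.maximum(n_nz, 1)
        sq = np.where(nonzero, (Xc - mu) ** 2, 0.0).sum(axis=0)
        sd = np.sqrt(sq / np.maximum(n_nz, 1)) + min_sd  # assumed floor
        params[c] = (pi, mu, sd)
    return classes, params


def predict(X, classes, params):
    """Score each cell under every cell type and return the best label.

    Assumption: genes are combined by summing per-gene log-likelihoods,
    a simple stand-in for the ensemble step named in the abstract.
    """
    scores = np.empty((X.shape[0], len(classes)))
    for j, c in enumerate(classes):
        pi, mu, sd = params[c]
        log_gauss = -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * ((X - mu) / sd) ** 2
        # Zero entries use the dropout mass; non-zero entries use the
        # non-dropout mass times the Gaussian density.
        ll = np.where(X > 0, np.log1p(-pi) + log_gauss, np.log(pi))
        scores[:, j] = ll.sum(axis=1)
    return classes[scores.argmax(axis=1)]
```

Because each gene contributes only its own marginal mixture, the sketch never estimates a joint distribution over genes, which is the point the abstract makes about avoiding a large number of parameters.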
Tsao Danielle, Bai Kailun, Ji Xiangling, Tsao Min, Xing Li, Zhang Xuekui
Department of Mathematics & Statistics, University of Victoria; Department of Mathematics & Statistics, University of Victoria; Department of Mathematics & Statistics, University of Victoria; Department of Mathematics & Statistics, University of Victoria; Department of Mathematics & Statistics, University of Saskatchewan; Department of Mathematics & Statistics, University of Victoria
Research methods and techniques in the biological sciences; computing technology and computer technology
Tsao Danielle, Bai Kailun, Ji Xiangling, Tsao Min, Xing Li, Zhang Xuekui. scAnnotate: an automated cell type annotation tool for single-cell RNA-sequencing data [EB/OL]. (2025-03-28) [2025-08-02]. https://www.biorxiv.org/content/10.1101/2022.02.19.481159