A scalable sparse neural network framework for rare cell type annotation of single-cell transcriptome data
Abstract: Cell type annotation is critical for understanding cell population heterogeneity in single-cell RNA sequencing (scRNA-seq) analysis. Because they are fast, precise, and user-friendly, automatic annotation methods are gradually replacing traditional unsupervised clustering approaches for cell type identification. However, current supervised annotation tools overfit easily, favoring large cell populations while failing to learn the characteristics of smaller ones. This drawback can significantly mislead biological analysis, especially when rare cell types are important. Here, we present scBalance, an integrated sparse neural network framework that leverages adaptive weight sampling and dropout techniques for the auto-annotation task. Using 20 scRNA-seq datasets of different scales and degrees of imbalance, we systematically validate the strong performance of scBalance on both intra-dataset and inter-dataset annotation tasks. We also demonstrate the scalability of scBalance in identifying rare cell types in million-cell datasets by uncovering the immune landscape of bronchoalveolar cells. To date, scBalance is the first and only auto-annotation tool that scales to a dataset of 1.5 million cells. In addition, scBalance runs fast and stably, outperforming commonly used tools in speed across all dataset scales. We implemented scBalance in a user-friendly manner that integrates easily with Scanpy, which makes scBalance a superior tool for the increasingly important Python-based platform.
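The abstract names adaptive weight sampling as the way to counter class imbalance but gives no details. As a rough illustration of the general idea only, and not scBalance's actual implementation, the sketch below draws training indices with probability inversely proportional to class frequency, so a rare cell type is seen about as often as a common one. The inverse-frequency weighting, the function name `balanced_sample_indices`, and the toy labels are all assumptions made for this example.

```python
import random
from collections import Counter


def balanced_sample_indices(labels, n_samples, seed=0):
    """Draw indices with replacement so every class is equally likely per draw.

    Assumed scheme: each cell gets weight 1 / (size of its class).
    This is an illustration, not the weighting used by scBalance.
    """
    counts = Counter(labels)
    weights = [1.0 / counts[y] for y in labels]
    rng = random.Random(seed)  # fixed seed for reproducibility
    return rng.choices(range(len(labels)), weights=weights, k=n_samples)


# Hypothetical imbalanced label set: 950 common cells, 50 rare cells.
labels = ["T cell"] * 950 + ["pDC"] * 50
idx = balanced_sample_indices(labels, n_samples=2000)
sampled = Counter(labels[i] for i in idx)
print(sampled)  # both classes appear in roughly equal numbers
```

Sampling with replacement means rare cells are repeated across draws. In a neural network this is typically paired with regularization such as dropout to limit memorization of the duplicated examples.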
Fan Xingyu, Li Yu, Cheng Yuqi, Zhang Jianing
School of Information and Software Engineering, University of Electronic Science and Technology of China; Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK) and The CUHK Shenzhen Research Institute; Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK) and Weill Cornell Graduate School of Medical Sciences, Weill Cornell Medicine; Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK)
Biological science research methods and techniques; computing and computer technology; cell biology
Fan Xingyu, Li Yu, Cheng Yuqi, Zhang Jianing. A scalable sparse neural network framework for rare cell type annotation of single-cell transcriptome data [EB/OL]. (2025-03-28) [2025-04-28]. https://www.biorxiv.org/content/10.1101/2022.06.22.497193.