Analytic Pearson residuals for normalization of single-cell RNA-seq UMI data

Source: bioRxiv
English abstract

Standard preprocessing of single-cell RNA-seq UMI data includes normalization by sequencing depth to remove this technical variability, and nonlinear transformation to stabilize the variance across genes with different expression levels. Instead, two recent papers propose to use statistical count models for these tasks: Hafemeister and Satija (2019) recommend using Pearson residuals from negative binomial regression, while Townes et al. (2019) recommend fitting a generalized PCA model. Here, we investigate the connection between these approaches theoretically and empirically, and compare their effects on downstream processing. We show that the model of Hafemeister and Satija (2019) produces noisy parameter estimates because it is overspecified (which is why the original paper employs post-hoc regularization). When specified more parsimoniously, it has a simple analytic solution equivalent to the rank-one Poisson GLM-PCA of Townes et al. (2019). Further, our analysis indicates that per-gene overdispersion estimates in Hafemeister and Satija (2019) are biased, and that the data analyzed in that paper are in fact consistent with a constant overdispersion parameter across genes. We then use negative control data without biological variability to estimate the technical overdispersion of UMI counts, and find that across several different experimental protocols, the data suggest very moderate overdispersion. Finally, we argue that analytic Pearson residuals (or, equivalently, rank-one GLM-PCA or negative binomial regression after regularization) strongly outperform standard preprocessing for identifying biologically variable genes, and capture more biologically meaningful variation when used for dimensionality reduction, compared to other methods.
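The abstract does not spell out the analytic solution, so the following is only a minimal sketch of one common way to compute such residuals. It assumes that the expected count for each cell and gene is the product of the cell total and the gene total divided by the grand total, that the variance is negative binomial with a single overdispersion parameter shared by all genes, and that residuals are clipped to plus or minus the square root of the number of cells. The function name, the default `theta=100.0`, and the clipping rule are illustrative assumptions, not details taken from this page. Plain Python lists are used to keep the sketch dependency-free; an array library would be the usual choice in practice.

```python
import math


def analytic_pearson_residuals(counts, theta=100.0, clip=None):
    """Sketch of Pearson residuals under an offset-only count model.

    counts: list of rows, one row per cell, one column per gene (UMI counts).
    theta:  assumed overdispersion parameter shared by all genes
            (hypothetical default; larger values approach a Poisson model).
    clip:   residuals are limited to [-clip, clip]; defaults to sqrt(n_cells),
            which is an assumption of this sketch.
    """
    n_cells = len(counts)
    n_genes = len(counts[0])

    # Row sums (sequencing depth per cell), column sums, and grand total.
    cell_totals = [sum(row) for row in counts]
    gene_totals = [sum(row[g] for row in counts) for g in range(n_genes)]
    total = sum(cell_totals)

    if clip is None:
        clip = math.sqrt(n_cells)

    residuals = []
    for c in range(n_cells):
        out_row = []
        for g in range(n_genes):
            # Expected count under the assumed rank-one (depth x gene) model.
            mu = cell_totals[c] * gene_totals[g] / total
            if mu == 0:
                # Empty cell or unexpressed gene: no information, residual 0.
                out_row.append(0.0)
                continue
            # Negative binomial variance: mu + mu^2 / theta.
            variance = mu + mu * mu / theta
            z = (counts[c][g] - mu) / math.sqrt(variance)
            out_row.append(max(-clip, min(clip, z)))
        residuals.append(out_row)
    return residuals


# Hypothetical toy data: 3 cells x 2 genes.
toy_counts = [[2, 4], [1, 2], [0, 3]]
toy_residuals = analytic_pearson_residuals(toy_counts)
```

In this sketch a count matrix whose rows are exactly proportional to one another yields residuals of zero everywhere, since the assumed model then explains every entry; any remaining structure in the residuals is what downstream steps such as variable-gene selection would work with.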

Kobak Dmitry, Berens Philipp, Lause Jan

Institute for Ophthalmic Research, University of Tübingen || Institute for Bioinformatics and Medical Informatics, University of Tübingen || Bernstein Center for Computational Neuroscience, University of Tübingen || Center for Integrative Neuroscience, University of Tübingen

10.1101/2020.12.01.405886

Cell biology; Molecular biology; Biological science research methods and techniques

Kobak Dmitry, Berens Philipp, Lause Jan. Analytic Pearson residuals for normalization of single-cell RNA-seq UMI data[EB/OL]. (2025-03-28)[2025-08-25]. https://www.biorxiv.org/content/10.1101/2020.12.01.405886.
