Factorization-based Imputation of Expression in Single-cell Transcriptomic Analysis (FIESTA) recovers Gene-Cell-State relationships


Source: bioRxiv
Abstract

Single-cell RNA sequencing (scRNA-seq) is a gene expression profiling technique that is presently revolutionizing the study of complex cellular systems in the biological sciences. Existing scRNA-seq methods suffer from sub-optimal target recovery, which leads to inaccurate measurements, including many false negatives. The resulting 'zero-inflated' data may confound data interpretation and visualization. Because cells have coherent phenotypes defined by conserved molecular circuitries (i.e., multiple gene products working together), and because similar cells utilize similar circuits, each expression value or 'node' in a multi-cell, multi-gene scRNA-seq data set is expected to be predictable from other nodes in the data set. Based on this logic, several approaches have been proposed to impute missing values in a data set by extracting information from its non-zero measurements. In this study, we apply non-negative matrix factorization to a selection of published scRNA-seq data sets, followed by multiplication of the factor matrices to generate idealized, 'completed' model versions of the data. From the model matrices, we recommend new values where original measurements are likely to be inaccurate and where 'zero' measurements are predicted to be false negatives. The resulting imputed data model predicts novel type markers and expression patterns that match orthogonal measurements and field literature better than those obtained from pre-imputation data or alternative imputation strategies.

Contact: benjamin.spike@hci.utah.edu

Availability and implementation: FIESTA is written in R and is available at https://github.com/elnazmirzaei/FIESTA and https://github.com/TheSpikeLab/FIESTA.

Author summary: In this work, we develop FIESTA, a novel, unsupervised, mathematical approach to impute missing values in scRNA-seq data. For each dataset, we use parts-based, non-negative matrix factorization to break the cells-by-genes expression matrix into optimized component matrices and then multiply these component matrices to generate an idealized, 'completed' matrix. The completed matrix has many of the null values filled in because the optimized low-rank factors from which it is generated take multiple cells into account when estimating a particular component, including some cells with positive expression values for genes that are false negatives in other related cells. We also implement scaling and thresholding approaches based on intrinsic data topology for improved interpretability and graphical representation. Overall, FIESTA performs favorably relative to alternative imputation approaches and uncovers gene-gene and gene-cell relationships that are occluded in the raw data. The FIESTA computational pipeline is freely available for download and use by other researchers analyzing scRNA-seq data or other sparse data sets.
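The factorize-then-multiply idea described in the abstract can be illustrated with a minimal sketch. This is not the FIESTA implementation (which, per the abstract, is written in R); Python is used here only for illustration. The sketch assumes a generic NMF solver (Lee-Seung multiplicative updates, chosen for brevity, since the abstract does not name a solver), and it replaces FIESTA's data-driven scaling and thresholding with a single fixed cutoff. The function names `nmf` and `impute_zeros`, the `threshold` parameter, and the toy matrix are all hypothetical.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(X, rank, n_iter=500, seed=0, eps=1e-9):
    """Approximate a non-negative cells-by-genes matrix X as W * H using
    Lee-Seung multiplicative updates (an assumed, generic solver)."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H

def impute_zeros(X, rank, threshold=0.5, n_iter=500, seed=0):
    """Keep observed non-zero values; replace a zero with the model value
    from W * H only when that value reaches the (hypothetical) threshold."""
    W, H = nmf(X, rank, n_iter=n_iter, seed=seed)
    model = matmul(W, H)  # the 'completed' low-rank matrix
    return [[x if x > 0 else (mv if mv >= threshold else 0.0)
             for x, mv in zip(xr, mr)]
            for xr, mr in zip(X, model)]

# Toy data: rows are cells, columns are genes. Cells 0-2 share one
# expression program and cells 3-5 another; cell 2 has a simulated
# dropout (false zero) in gene 1.
X = [
    [5, 4, 0, 0],
    [5, 4, 0, 0],
    [5, 0, 0, 0],
    [0, 0, 3, 6],
    [0, 0, 3, 6],
    [0, 0, 3, 6],
]
imputed = impute_zeros(X, rank=2)
```

In this toy example the zero at `imputed[2][1]` is filled because cells 0 and 1, which share a factor with cell 2, express gene 1. Unlike the approach described in the abstract, the sketch never revises non-zero measurements; it only fills zeros.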

Bhaskara Aditya, Spike Benjamin T., Mehrabad Elnaz Mirzaei

School of Computing, University of Utah; Huntsman Cancer Institute, Department of Oncological Sciences, University of Utah School of Medicine

DOI: 10.1101/2021.04.29.441691

Subjects: biological science research methods and techniques; molecular biology; computing and computer technology

Bhaskara Aditya, Spike Benjamin T., Mehrabad Elnaz Mirzaei. Factorization-based Imputation of Expression in Single-cell Transcriptomic Analysis (FIESTA) recovers Gene-Cell-State relationships [EB/OL]. (2025-03-28) [2025-06-19]. https://www.biorxiv.org/content/10.1101/2021.04.29.441691.
