Multivariate regression with missing response data for modelling regional DNA methylation QTLs
Identifying genetic regulators of DNA methylation (mQTLs) with multivariate models enhances statistical power, but is challenged by missing data from bisulfite sequencing. Standard imputation-based methods can introduce bias, limiting reliable inference. We propose missoNet, a novel convex estimation framework that jointly estimates regression coefficients and the precision matrix from data with missing responses. By using unbiased surrogate estimators, our three-stage procedure avoids imputation while simultaneously performing variable selection and learning the conditional dependence structure among responses. We establish theoretical error bounds, and our simulations demonstrate that missoNet consistently outperforms existing methods in both prediction and sparsity recovery. In a real-world mQTL analysis of the CARTaGENE cohort, missoNet achieved superior predictive accuracy and false-discovery control on a held-out validation set, identifying known and credible novel genetic associations. The method offers a robust, efficient, and theoretically grounded tool for genomic analyses, and is available as an R package.
Shomoita Alam, Yixiao Zeng, Sasha Bernatsky, Marie Hudson, Inés Colmegna, David A. Stephens, Celia M. T. Greenwood, Archer Y. Yang
Biological science research methods; Biological science research techniques; Genetics; Molecular biology
Shomoita Alam, Yixiao Zeng, Sasha Bernatsky, Marie Hudson, Inés Colmegna, David A. Stephens, Celia M. T. Greenwood, Archer Y. Yang. Multivariate regression with missing response data for modelling regional DNA methylation QTLs [EB/OL]. (2025-07-08) [2025-07-23]. https://arxiv.org/abs/2507.05990.