FDR control in GWAS with population structure
Abstract: We present a comprehensive statistical framework to analyze data from genome-wide association studies of polygenic traits, producing distinct and interpretable discoveries while controlling the false discovery rate. This approach leverages sophisticated multivariate models, correcting for linkage disequilibrium, and accounts for population structure and relatedness, adapting to the characteristics of the samples at hand. A key element is the recognition that the observed genotypes can be considered as a random sample from an appropriate model, encapsulating our knowledge of genetic inheritance and human populations. This allows us to generate imperfect copies (knockoffs) of these variables that serve as ideal negative controls; knockoffs are indistinguishable from the original genotypes in distribution, and independent of the phenotype. In sharp contrast with state-of-the-art methods, the validity of our inference in no way depends on assumptions about the unknown relation between genotypes and phenotype. We develop and leverage a model for the genotypes that accounts for arbitrary and unknown population structure, which may be due to diverse ancestries or familial relatedness. We build a pipeline that is robust to the most prominent possible confounders, facilitating the discovery of causal variants. Validity and effectiveness are demonstrated by extensive simulations with real data, as well as by the analysis of several phenotypes in the UK Biobank. Finally, fast software is made available for researchers to apply the proposed methodology to Biobank-scale data sets.
Bates Stephen, Candès Emmanuel, Sabatti Chiara, Marchini Jonathan, Sesia Matteo
Departments of Statistics and of EECS, University of California; Departments of Statistics and of Mathematics, Stanford University; Departments of Statistics and of Biomedical Data Sciences, Stanford University; Regeneron Genetics Center, Regeneron Pharmaceuticals; Department of Data Sciences and Operations, University of Southern
Biological science research methods and techniques; basic medicine; genetics
Bates Stephen, Candès Emmanuel, Sabatti Chiara, Marchini Jonathan, Sesia Matteo. FDR control in GWAS with population structure[EB/OL]. (2025-03-28)[2025-08-02]. https://www.biorxiv.org/content/10.1101/2020.08.04.236703.