National Preprint Platform

Polygenic Risk Prediction using Gradient Boosted Trees Captures Non-Linear Genetic Effects and Allele Interactions in Complex Phenotypes

Source: medRxiv
Abstract

Polygenic risk scores (PRS) are commonly used to quantify inherited susceptibility to a given trait. However, standard PRS do not account for non-linear effects or interaction effects between single nucleotide polymorphisms (SNPs). Machine learning algorithms can capture such non-linearities and interactions. We trained and validated polygenic prediction models for five complex phenotypes in a multi-ancestry population: total cholesterol, triglycerides, systolic blood pressure, sleep duration, and height. We used an ensemble approach that combines LASSO for feature selection with gradient boosted trees (XGBoost) to model non-linear and interaction effects. In an independent test set, including a standard PRS as a feature in the XGBoost model increased the percentage of variance explained (PVE), compared with the standard PRS alone, by 25% for sleep duration, 26% for height, 44% for systolic blood pressure, 64% for triglycerides, and 85% for total cholesterol. Machine learning models trained within specific racial/ethnic groups performed similarly to models trained on the multi-ancestry sample, despite smaller sample sizes. In each racial/ethnic group in our study, the machine learning models outperformed the standard PRS. However, the PVE was substantially lower among Black participants than among other groups; for example, the PVE for total cholesterol was 8.1% for Black, 12.9% for White, and 17.4% for Hispanic/Latino participants. This work demonstrates an effective method for accounting for non-linearities and interaction effects in genetics-based prediction models.
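The LASSO plus gradient-boosted-trees approach summarized above (select SNPs with LASSO, then fit trees on the selected SNPs with a standard PRS added as an extra feature) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: genotypes and phenotype are simulated, scikit-learn's `GradientBoostingRegressor` stands in for XGBoost, the "standard PRS" uses hypothetical per-SNP marginal weights estimated on a separate discovery split, PVE is assumed to be 100 × the squared Pearson correlation between observed and predicted values, and the helper names (`pve`, `marginal_weights`, `simulate`, `run_demo`) and all hyperparameters are illustrative choices.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LassoCV


def pve(y_true, y_pred):
    """Percentage of variance explained, assumed here to be 100 * squared Pearson r."""
    r = np.corrcoef(y_true, y_pred)[0, 1]
    return 100.0 * r**2


def marginal_weights(G, y):
    """Per-SNP marginal least-squares slopes; a hypothetical stand-in for published PRS weights."""
    Gc = G - G.mean(axis=0)
    yc = y - y.mean()
    return (Gc * yc[:, None]).sum(axis=0) / (Gc**2).sum(axis=0)


def simulate(n=6000, n_snps=200, seed=0):
    """Synthetic genotypes (0/1/2 allele counts) and a phenotype with additive,
    interaction, and dominance effects. Not real data."""
    rng = np.random.default_rng(seed)
    maf = rng.uniform(0.1, 0.5, size=n_snps)
    maf[:3] = 0.5  # SNPs 0-2 carry the non-linear effects below
    G = rng.binomial(2, maf, size=(n, n_snps)).astype(float)
    beta = np.zeros(n_snps)
    beta[:10] = 0.3  # additive effects at SNPs 0-9
    y = (
        G @ beta
        + 1.5 * G[:, 0] * G[:, 1]       # multiplicative SNP x SNP interaction
        + 1.0 * (G[:, 2] >= 1)          # dominant (non-additive) effect
        + rng.normal(0.0, 1.0, size=n)  # residual noise
    )
    return G, y


def run_demo(seed=0):
    G, y = simulate(seed=seed)
    disc, train, test = slice(0, 2000), slice(2000, 4000), slice(4000, None)

    # 1) Linear PRS with weights estimated on a separate discovery split.
    prs = G @ marginal_weights(G[disc], y[disc])

    # 2) LASSO feature selection on the training split.
    lasso = LassoCV(cv=5).fit(G[train], y[train])
    selected = np.flatnonzero(lasso.coef_)

    # 3) Gradient boosted trees on the selected SNPs plus the PRS as one more feature.
    X = np.column_stack([G[:, selected], prs])
    gbt = GradientBoostingRegressor(
        n_estimators=300, max_depth=3, learning_rate=0.05,
        subsample=0.8, random_state=seed,
    ).fit(X[train], y[train])

    # Evaluate both models on the held-out test split.
    return {
        "n_selected": selected.size,
        "pve_prs": pve(y[test], prs[test]),
        "pve_gbt": pve(y[test], gbt.predict(X[test])),
    }


results = run_demo()
print(results)
```

Because `xgboost.XGBRegressor` follows the same scikit-learn `fit`/`predict` interface, it could be substituted for `GradientBoostingRegressor` where XGBoost is installed. Note that LASSO screens SNPs on linear signal, so a SNP whose effect is purely epistatic (no marginal effect) would tend to be dropped before the trees see it; in this toy setup the interacting SNPs 0 and 1 also have marginal effects, so they survive selection.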

Fornage Myriam, Rotter Jerome I, Romero-Brufau Santiago, Kurniansyah Nuzulul, Brody Jennifer A., Levy Daniel, Sofer Tamar, Lloyd-Jones Donald M., Lyons Genevieve, de Vries Paul, Elgart Michael, Psaty Bruce M, Raffield Laura, Rich Stephen S, Lange Leslie A, Gao Yan, Redline Susan, Lin Henry J, the NHLBI's Trans-Omics in Precision Medicine (TOPMed) Consortium, Guo Xiuqing, Chen Han, Peloso Gina M, Morrison Alanna C

Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston; Brown Foundation Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston
The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center
Department of Biostatistics, Harvard T.H. Chan School of Public Health; Department of Medicine
Division of Sleep and Circadian Disorders, Brigham and Women's Hospital
Cardiovascular Health Research Unit, Department of Medicine, University of Washington
The Population Sciences Branch of the National Heart, Lung and Blood Institute; The Framingham Heart Study
Division of Sleep and Circadian Disorders, Brigham and Women's Hospital; Department of Medicine, Harvard Medical School; Department of Biostatistics, Harvard T.H. Chan School of Public Health
Department of Preventive Medicine, Northwestern University
Division of Sleep and Circadian Disorders, Brigham and Women's Hospital; Department of Biostatistics, Harvard T.H. Chan School of Public Health
Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston
Division of Sleep and Circadian Disorders, Brigham and Women's Hospital; Department of Medicine, Harvard Medical School
Cardiovascular Health Research Unit, Departments of Medicine, Epidemiology, and Health Services, University of Washington
Department of Genetics, University of North Carolina
Center for Public Health Genomics, University of Virginia School of Medicine
Department of Medicine, University of Colorado Denver, Anschutz Medical Campus
The Jackson Heart Study, University of Mississippi Medical Center
Division of Sleep and Circadian Disorders, Brigham and Women's Hospital; Department of Medicine, Harvard Medical School
The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center
The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center
Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston; Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston
Department of Biostatistics, Boston University School of Public Health
Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston

DOI: 10.1101/2021.07.09.21260288

Subject: Genetics; Current Status and Development of the Biological Sciences

Keywords: Machine Learning; Gradient Boosted Trees; Diverse Population; XGBoost; Genetic Prediction; Polygenic Risk Scores

Fornage Myriam, Rotter Jerome I, Romero-Brufau Santiago, Kurniansyah Nuzulul, Brody Jennifer A., Levy Daniel, Sofer Tamar, Lloyd-Jones Donald M., Lyons Genevieve, de Vries Paul, Elgart Michael, Psaty Bruce M, Raffield Laura, Rich Stephen S, Lange Leslie A, Gao Yan, Redline Susan, Lin Henry J, the NHLBI's Trans-Omics in Precision Medicine (TOPMed) Consortium, Guo Xiuqing, Chen Han, Peloso Gina M, Morrison Alanna C. Polygenic Risk Prediction using Gradient Boosted Trees Captures Non-Linear Genetic Effects and Allele Interactions in Complex Phenotypes[EB/OL]. (2025-03-28)[2025-08-02]. https://www.medrxiv.org/content/10.1101/2021.07.09.21260288.
