National Preprint Platform

Polygenic Risk Score for Gastric Cancer

Source: medRxiv
Abstract

Background
Gastric cancer is one of the most common cancers worldwide, and its genomic basis is currently being studied in great depth. In this paper, we use Genome-Wide Association Study (GWAS) data to identify, through statistical tests, the single-nucleotide polymorphisms (SNPs) most strongly associated with the occurrence of gastric cancer, and we then use these SNPs to build predictive models with machine learning algorithms. Polygenic risk scoring (PRS) is a straightforward predictive model that assigns each individual a genetic risk for an outcome (cancer or healthy).

Methods
GWAS data for gastric cancer were subjected to several statistical tests. A chi-square test was used for feature selection by measuring the degree of association between each probe (SNP) and the target (cancer or control). Based on these results, many probes were eliminated, and only the statistically significant ones were retained. Naïve Bayes and CatBoost machine learning algorithms were then used to build classification models that predict (score) gastric cancer.

Results
Naïve Bayes and CatBoost classification algorithms were used for modeling. Features were selected by performing a chi-square test on each of the 319,283 SNPs in the data. The SNPs were then ranked by the negative logarithm of their p-values, and the top 5, 100, and 1,000 SNPs were used as inputs to the classification models. The Naïve Bayes classifier achieved accuracies ranging from 0.60 to 0.76 across the different feature sets. CatBoost proved better suited to this application, achieving an accuracy above 0.90 for every feature subset.

Conclusions
This paper aims to create a highly accurate classification model that predicts the occurrence of gastric cancer from GWAS genome data. The CatBoost model with an input space of 100 SNPs gave the best results, with an accuracy of 0.93, and can be considered a polygenic risk scoring model for scoring new patients for gastric cancer.
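The feature-selection and scoring steps described above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' code: genotypes are assumed to be coded as minor-allele counts 0/1/2 and labels as 1 (case) / 0 (control); each SNP is tested with a 2 × 3 genotype-by-status chi-square test (the abstract does not say whether a genotypic or allelic test was used); and the Naïve Bayes variant is assumed to be a categorical model with Laplace smoothing, since the abstract does not name one. The names `chi2_sf`, `snp_chi2`, `rank_snps`, and `CategoricalNaiveBayes` are hypothetical, and the toy data are invented. CatBoost is not reproduced here; in practice one would more likely use `scipy.stats.chi2_contingency` for the test and the `catboost` package for the boosted model.

```python
import math
from collections import Counter


def chi2_sf(stat, df):
    # Upper-tail p-value of the chi-square distribution; closed forms exist for df = 1 and df = 2.
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def snp_chi2(genotypes, labels):
    # Chi-square test of independence between genotype (0/1/2) and case/control label.
    # Only genotype values actually observed become columns, so no expected count is zero.
    counts = Counter(zip(labels, genotypes))
    rows, cols, n = sorted(set(labels)), sorted(set(genotypes)), len(labels)
    row_tot = {r: sum(counts[(r, c)] for c in cols) for r in rows}
    col_tot = {c: sum(counts[(r, c)] for r in rows) for c in cols}
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / n
            stat += (counts[(r, c)] - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    if df == 0:  # monomorphic SNP or a single class: no evidence of association
        return 0.0, 1.0
    return stat, chi2_sf(stat, df)


def rank_snps(X, y, k):
    # Rank SNP columns by -log10(p), largest first, and keep the indices of the top k.
    scores = []
    for j in range(len(X[0])):
        _, p = snp_chi2([row[j] for row in X], y)
        scores.append(-math.log10(max(p, 1e-300)))  # clamp so an underflowed p of 0 stays finite
    order = sorted(range(len(scores)), key=lambda j: (-scores[j], j))
    return order[:k]


class CategoricalNaiveBayes:
    # Naive Bayes over categorical genotypes with Laplace (add-alpha) smoothing.
    def __init__(self, alpha=1.0, n_levels=3):
        self.alpha, self.n_levels = alpha, n_levels

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            denom = len(rows) + self.alpha * self.n_levels
            for j in range(len(X[0])):
                level_counts = Counter(row[j] for row in rows)
                for level in range(self.n_levels):
                    self.log_lik[(c, j, level)] = math.log((level_counts[level] + self.alpha) / denom)
        return self

    def predict_proba(self, x):
        # Posterior P(class | genotypes), computed in log space and normalized stably.
        log_post = {c: self.log_prior[c] + sum(self.log_lik[(c, j, v)] for j, v in enumerate(x))
                    for c in self.classes}
        best = max(log_post.values())
        z = sum(math.exp(v - best) for v in log_post.values())
        return {c: math.exp(v - best) / z for c, v in log_post.items()}


# Toy data (invented for illustration): 8 individuals x 3 SNPs, label 1 = case, 0 = control.
X = [[2, 0, 1], [2, 1, 0], [2, 2, 1], [1, 0, 2],
     [0, 1, 1], [0, 2, 0], [0, 0, 2], [1, 1, 1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

top = rank_snps(X, y, k=1)             # index of the most strongly associated SNP
X_top = [[row[j] for j in top] for row in X]
model = CategoricalNaiveBayes().fit(X_top, y)
risk = model.predict_proba([2])[1]     # probability of "case", used as the risk score
print(top, round(risk, 3))             # prints: [0] 0.8
```

The selected-feature matrix `X_top` is what a boosted classifier such as CatBoost would presumably be trained on instead; the select-then-fit pattern stays the same, and only the model changes.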

Pratinidhi Sharvani Padmaraj, Sayad Saed

10.1101/2020.08.11.20172221

Medical research methods; Oncology; Basic medicine

Pratinidhi Sharvani Padmaraj, Sayad Saed. Polygenic Risk Score for Gastric Cancer [EB/OL]. (2025-03-28) [2025-08-02]. https://www.medrxiv.org/content/10.1101/2020.08.11.20172221.
