Sparse canonical correlation to identify breast cancer related genes regulated by copy number aberrations

Source: medRxiv
Abstract

Background: Copy number aberrations (CNAs) have proved to be of clinical and therapeutic significance for many diseases, including breast cancer, because they drive numerous key biological processes by regulating molecular phenotypes such as gene expression. To comprehensively assess the effect of CNAs, it is not sufficient to identify significant CNA-gene expression pairs; one must also identify the overall gene networks and regulatory structures that CNAs influence and that subsequently produce changes in outcomes.

Methods: In this article, we adopt a two-step analysis approach to identify CNA-regulated genes whose expression levels affect breast cancer related outcomes: (1) we identify gene modules that are regulated by CNAs through sparse canonical correlation analysis (sCCA), which selects a set of closely located CNAs that regulate the expression levels of selected genes; (2) we then use a generalized linear model to identify which genes within the gene modules are associated with breast cancer related outcomes.

Results: Analyzing clinical and genomic data on 1904 breast cancer patients from the METABRIC study, we found 14 gene modules to be regulated by groups of proximally located CNA sites. The identification of gene modules was further validated using independent data on individuals in a study of breast invasive carcinoma from The Cancer Genome Atlas (TCGA). Association analysis on 7 different breast cancer related outcomes identified several novel and interpretable regulatory associations, which highlight how CNAs can impact key biological pathways and processes in the context of breast cancer. Through downstream analysis of two example outcomes, estrogen receptor status and overall survival, we show that the identified genes were enriched in relevant biological pathways; the key advantage of our method is that we additionally identify the CNAs that regulate these genes. Because multiple types of outcomes were available, we further meta-analyzed the results to identify genes potentially associated with multiple outcomes.

Conclusions: Overall, we present a generalizable analysis approach to identify genes that are associated with different outcomes and regulated by sets of CNAs; the approach can further be used to combine results across various types of outcomes. The results show that our method can identify novel and interpretable associations, providing mechanistic insights into how the effects of CNAs are cascaded via gene expression to impact breast cancer and related outcomes.
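
The two-step approach described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a rank-1 sparse CCA fitted by alternating soft-thresholded power iterations (a simplified variant of penalized matrix decomposition), uses a Gaussian GLM with identity link for the second step, and runs on simulated data with a planted CNA-gene module. All function names, the threshold parameters `lam_u` and `lam_v`, and the simulated dimensions are hypothetical choices made for illustration.

```python
import numpy as np

def soft_threshold(a, lam):
    """Elementwise soft-thresholding: shrink each entry toward zero by lam."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def unit(a):
    """Scale a vector to unit Euclidean norm (zero vectors are returned unchanged)."""
    norm = np.linalg.norm(a)
    return a / norm if norm > 0 else a

def sparse_cca_rank1(X, Y, lam_u=0.5, lam_v=0.5, n_iter=100, tol=1e-8):
    """Rank-1 sparse CCA sketch (assumed simplification, not the paper's exact method).

    lam_u and lam_v are fractions in [0, 1) of the largest absolute entry,
    so at least one weight always survives the thresholding.
    """
    X = (X - X.mean(0)) / X.std(0)          # standardize CNA columns
    Y = (Y - Y.mean(0)) / Y.std(0)          # standardize expression columns
    K = X.T @ Y / X.shape[0]                # cross-correlation matrix (p x q)
    v = np.linalg.svd(K, full_matrices=False)[2][0]  # leading right singular vector
    u = np.zeros(K.shape[0])
    for _ in range(n_iter):
        a = K @ v
        u_new = unit(soft_threshold(a, lam_u * np.abs(a).max()))
        b = K.T @ u_new
        v_new = unit(soft_threshold(b, lam_v * np.abs(b).max()))
        change = np.linalg.norm(u_new - u) + np.linalg.norm(v_new - v)
        u, v = u_new, v_new
        if change < tol:
            break
    return u, v

def gaussian_glm_tstats(G, y):
    """Step 2: per-gene t-statistic for the slope in the model y ~ 1 + gene."""
    n = len(y)
    stats = []
    for j in range(G.shape[1]):
        D = np.column_stack([np.ones(n), G[:, j]])   # intercept + one gene
        beta = np.linalg.lstsq(D, y, rcond=None)[0]
        resid = y - D @ beta
        sigma2 = resid @ resid / (n - 2)
        se = np.sqrt(sigma2 * np.linalg.inv(D.T @ D)[1, 1])
        stats.append(beta[1] / se)
    return np.array(stats)

# Simulated data (hypothetical): 5 CNAs and 6 genes share one latent factor z.
rng = np.random.default_rng(0)
n, p, q = 200, 30, 40
z = rng.normal(size=n)
cna = rng.normal(size=(n, p))
expr = rng.normal(size=(n, q))
cna[:, :5] = z[:, None] + 0.5 * cna[:, :5]
expr[:, :6] = z[:, None] + 0.5 * expr[:, :6]
outcome = expr[:, 0] + rng.normal(size=n)   # continuous outcome driven by gene 0

u, v = sparse_cca_rank1(cna, expr)
selected_cna = np.flatnonzero(u)            # CNAs with nonzero weight
selected_genes = np.flatnonzero(v)          # genes forming the module
tstats = gaussian_glm_tstats(expr[:, selected_genes], outcome)
print("CNAs:", selected_cna, "genes:", selected_genes)
print("t-statistics:", np.round(tstats, 2))
```

For a binary outcome such as estrogen receptor status, the second step would instead need a logistic link, and overall survival would need a time-to-event model; further modules could in principle be extracted by deflating the cross-correlation matrix and repeating the first step.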

Sen Ananda, Satagopan Jaya, Dutta Diptavo

Department of Biostatistics, University of Michigan; Department of Family Medicine, University of Michigan
Department of Biostatistics and Epidemiology, Rutgers University
Department of Biostatistics, Johns Hopkins University

10.1101/2021.08.29.21262811

Medical research methods; Oncology; Basic medicine

Sen Ananda, Satagopan Jaya, Dutta Diptavo. Sparse canonical correlation to identify breast cancer related genes regulated by copy number aberrations[EB/OL]. (2025-03-28)[2025-05-16]. https://www.medrxiv.org/content/10.1101/2021.08.29.21262811.
