
Germline Contamination and Leakage in Whole Genome Somatic Single Nucleotide Variant Detection


Source: bioRxiv
English abstract

Background: The clinical sequencing of cancer genomes to personalize therapy is becoming routine across the world. However, concerns over patient re-identification from these data lead to questions about how tightly access should be controlled. It is not thought to be possible to re-identify patients from somatic variant data. However, somatic variant detection pipelines can mistakenly identify germline variants as somatic ones, a process called “germline leakage”. The rate of germline leakage across different somatic variant detection pipelines is not well understood, and it is uncertain whether somatic variant calls should be considered re-identifiable. To fill this gap, we quantified germline leakage across 259 sets of whole-genome somatic single nucleotide variant (SNV) predictions made by 21 teams as part of the ICGC-TCGA DREAM Somatic Mutation Calling Challenge.

Results: The median somatic SNV prediction set contained 4,325 somatic SNVs and leaked one germline polymorphism. The level of germline leakage was inversely correlated with somatic SNV prediction accuracy and positively correlated with the amount of infiltrating normal cells. The specific germline variants leaked differed by tumour and algorithm. To aid in quantitation and correction of leakage, we created a tool, called GermlineFilter, for use in public-facing somatic SNV databases.

Conclusions: The potential for patient re-identification from leaked germline variants in somatic SNV predictions has led to divergent open data access policies, based on different assessments of the risks. Indeed, a single, well-publicized re-identification event could reshape public perceptions of the value of genomic data sharing. We find that modern somatic SNV prediction pipelines have low germline-leakage rates, which can be further reduced, especially for cloud-sharing, using pre-filtering software.
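To make the notion of germline leakage concrete, the following is a minimal Python sketch that counts how many predicted somatic SNVs also appear in a patient's germline call set. It is an illustration only and is not the GermlineFilter implementation, which the abstract does not describe. The function names, the `(chromosome, position, ref, alt)` variant key, and the toy data are all assumptions made for this example.

```python
from typing import Iterable, Set, Tuple

# Assumed variant key: (chromosome, 1-based position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def leaked_variants(somatic_calls: Iterable[Variant],
                    germline_calls: Iterable[Variant]) -> Set[Variant]:
    """Return predicted somatic SNVs that also appear in the germline call set."""
    return set(somatic_calls) & set(germline_calls)


def leakage_fraction(somatic_calls: Iterable[Variant],
                     germline_calls: Iterable[Variant]) -> float:
    """Fraction of predicted somatic SNVs that match a germline variant."""
    somatic = set(somatic_calls)
    if not somatic:
        return 0.0  # no predictions, so nothing can leak
    return len(leaked_variants(somatic, germline_calls)) / len(somatic)


# Hypothetical toy data, not taken from the study.
somatic = [
    ("chr1", 1000, "A", "T"),
    ("chr2", 5000, "G", "C"),
    ("chr3", 42, "C", "T"),
    ("chr7", 777, "T", "G"),
]
germline = [
    ("chr2", 5000, "G", "C"),
    ("chr9", 123, "A", "G"),
]

print(sorted(leaked_variants(somatic, germline)))
print(leakage_fraction(somatic, germline))
```

Matching on the full `(chromosome, position, ref, alt)` tuple is a deliberate simplification: a real comparison would also need to agree on reference genome build and variant normalization before two calls can be treated as the same variant.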

Bare J. Christopher, Yamaguchi Takafumi N., Ewing Adam D., Houlahan Kathleen E., Margolin Adam A., Boutros Paul C., Stuart Joshua M., Sendorek Dorota H., Norman Thea C., Ellrott Kyle, Caloian Cristian

Sage Bionetworks
Informatics & Biocomputing Program, Ontario Institute for Cancer Research
Department of Biomolecular Engineering, University of California; Mater Research Institute, University of Queensland
Informatics & Biocomputing Program, Ontario Institute for Cancer Research
Sage Bionetworks; Computational Biology Program, Oregon Health & Science University; Department of Biomedical Engineering, Oregon Health & Science University
Informatics & Biocomputing Program, Ontario Institute for Cancer Research; Department of Medical Biophysics, University of Toronto; Department of Pharmacology & Toxicology, University of Toronto
Department of Biomolecular Engineering, University of California
Informatics & Biocomputing Program, Ontario Institute for Cancer Research
Sage Bionetworks
Department of Biomolecular Engineering, University of California; Computational Biology Program, Oregon Health & Science University
Informatics & Biocomputing Program, Ontario Institute for Cancer Research

DOI: 10.1101/204370

Oncology; Genetics; Biological science research methods and techniques

cancer genomics; next-generation sequencing; mutation calling; germline contamination; germline leakage; patient identifiability; single nucleotide variant; SNV

Bare J. Christopher, Yamaguchi Takafumi N., Ewing Adam D., Houlahan Kathleen E., Margolin Adam A., Boutros Paul C., Stuart Joshua M., Sendorek Dorota H., Norman Thea C., Ellrott Kyle, Caloian Cristian. Germline Contamination and Leakage in Whole Genome Somatic Single Nucleotide Variant Detection[EB/OL]. (2025-03-28)[2025-06-12]. https://www.biorxiv.org/content/10.1101/204370.
