Accurate Estimation of Molecular Counts from Amplicon Sequence Data with Unique Molecular Identifiers

Source: bioRxiv
Abstract

Motivation: Amplicon sequencing is widely applied to explore heterogeneity and rare variants in genetic populations. Resolving true biological variants and quantifying their abundance is crucial for downstream analyses, but measured abundances are distorted by stochasticity and bias in amplification, as well as by errors during Polymerase Chain Reaction (PCR) and sequencing. One solution attaches Unique Molecular Identifiers (UMIs) to sample sequences before amplification, eliminating amplification bias by clustering reads on UMI and counting clusters to quantify abundance. While modern methods improve over naïve clustering by UMI identity, most do not account for UMI reuse, or collision, and they do not adequately model PCR and sequencing errors in the UMIs and sample sequences.

Results: We introduce Deduplication and accurate Abundance estimation with UMIs (DAUMI), a probabilistic framework to detect true biological sequences and accurately estimate their deduplicated abundance from amplicon sequence data. DAUMI recognizes UMI collision, even on highly similar sequences, and detects and corrects most PCR and sequencing errors in the UMI and sampled sequences. DAUMI performs better on simulated and real data than other UMI-aware clustering methods.

Availability: Source code is available at https://github.com/xiyupeng/AmpliCI-UMI.

Peng Xiyu, Dorman Karin S

Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center; Department of Statistics, Iowa State University; Bioinformatics and Computational Biology Program, Iowa State University; Department of Genetics, Development and Cell Biology, Iowa State University

DOI: 10.1101/2022.06.12.495839

Subjects: molecular biology; genetics; biological science research methods and techniques

Keywords: unique molecular identifier; categorical data clustering; compositional data quantification; genetic variants; Hidden Markov Model; sparse parameter estimation

Peng Xiyu, Dorman Karin S. Accurate Estimation of Molecular Counts from Amplicon Sequence Data with Unique Molecular Identifiers [EB/OL]. (2025-03-28) [2025-05-18]. https://www.biorxiv.org/content/10.1101/2022.06.12.495839.
