SafeMut: UMI-aware variant simulator incorporating allele-fraction overdispersion in read editing
Abstract Next-generation sequencing (NGS) is widely used for calling biological variants. The gold-standard method for assessing the ability of a computational method to call a specific variant is to perform wet-lab NGS experiments on samples known to harbor that variant. However, wet-lab experiments are labor-intensive and time-consuming, and rare variants may not be present in the sampled population at all. Both issues are exacerbated for SafeSeqS, which enables liquid biopsy and minimal residual disease (MRD) detection from cell-free DNA by using unique molecular identifiers (UMIs) to detect and/or correct NGS errors. We therefore developed SafeMut, the first UMI-aware NGS small-variant simulator, which also models the overdispersion of allele fractions. We used tumor-normal paired sequencing runs from the SEQC2 somatic reference sets, together with cell-free DNA datasets, to assess the performance of BamSurgeon, VarBen, and SafeMut. We observed that, unlike the variants simulated by BamSurgeon and VarBen, the allele-fraction distribution of the variants simulated by SafeMut closely resembles the distribution observed across technical replicates of wet-lab experiments. SafeMut thus provides accurate simulation of small variants in NGS data, helping to assess how well a bioinformatics pipeline calls these variants.
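The abstract does not describe how SafeMut edits reads, so the following is only a minimal sketch of the two ideas it names, under our own assumptions. UMI awareness is taken here to mean that all reads sharing a UMI (one source molecule) are edited together rather than independently. Allele-fraction overdispersion is modeled here as a beta-binomial: each simulated replicate draws its own allele fraction from a Beta distribution with mean `target_af` and intra-class correlation `rho`. The functions `sample_allele_fraction`, `edit_umi_families`, and `simulate_replicates`, the beta-binomial choice, and the parameters `target_af` and `rho` are all hypothetical and are not SafeMut's actual implementation or API.

```python
import random
from statistics import mean, pvariance


def sample_allele_fraction(target_af, rho, rng):
    """Draw one replicate's allele fraction (hypothetical model).

    Beta distribution with mean target_af and intra-class correlation rho,
    so alpha + beta = (1 - rho) / rho. rho = 0 disables overdispersion,
    leaving plain binomial sampling of families downstream.
    """
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must be in [0, 1)")
    # Degenerate cases: no overdispersion, or a fixed 0/1 fraction
    if rho == 0.0 or target_af <= 0.0 or target_af >= 1.0:
        return target_af
    s = (1.0 - rho) / rho
    return rng.betavariate(target_af * s, (1.0 - target_af) * s)


def edit_umi_families(families, target_af, rho, rng):
    """Return the set of read names to edit for one simulated replicate.

    families maps a UMI (assumed to be one source molecule) to its read
    names. Each family receives the variant as a unit, so reads sharing a
    UMI never disagree about the allele they carry.
    """
    af = sample_allele_fraction(target_af, rho, rng)
    edited = set()
    for reads in families.values():
        if rng.random() < af:
            edited.update(reads)
    return edited


def simulate_replicates(n_families, target_af, rho, n_reps, seed=0):
    """Fraction of UMI families carrying the variant in each replicate."""
    rng = random.Random(seed)
    # Toy input: every family has three reads named read<i>_<j>
    families = {
        f"UMI{i}": [f"read{i}_{j}" for j in range(3)] for i in range(n_families)
    }
    fractions = []
    for _ in range(n_reps):
        edited = edit_umi_families(families, target_af, rho, rng)
        n_mutant = sum(1 for reads in families.values() if reads[0] in edited)
        fractions.append(n_mutant / n_families)
    return fractions


if __name__ == "__main__":
    # Compare replicate-to-replicate spread without and with overdispersion
    for rho in (0.0, 0.05):
        fr = simulate_replicates(500, 0.1, rho, 300, seed=2)
        print(f"rho={rho}: mean AF={mean(fr):.3f}, variance={pvariance(fr):.2e}")
```

With `rho = 0` the per-replicate fraction of edited families is binomial around `target_af`; with `rho > 0` the mean stays near `target_af` but the spread across replicates widens, which is the kind of replicate-level variability the abstract says SafeMut reproduces and per-read editing with a fixed fraction does not.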
Guo Jingyu, Wang Sizhen, Zhao Xiaofei
Genetron Health (Beijing) Co. Ltd
Biological science research methods; Biological science research techniques; Molecular biology
Guo Jingyu, Wang Sizhen, Zhao Xiaofei. SafeMut: UMI-aware variant simulator incorporating allele-fraction overdispersion in read editing [EB/OL]. (2025-03-28) [2025-05-28]. https://www.biorxiv.org/content/10.1101/2023.03.14.532524.