Performance Assessment of Variant Calling Pipelines using Human Whole Exome Sequencing and Simulated data
Abstract: Whole exome sequencing (WES) is a time-consuming technology for identifying clinical variants, and it demands accurate variant caller tools. The currently available tools compromise accuracy in predicting specific types of variants. Thus, it is important to find the best combination of aligner and variant caller tools for detecting SNVs and InDels separately. Moreover, many important aspects of InDel detection are overlooked when the performance of tools is compared; one such aspect is the detection of InDels with respect to their base pair length. To assess the performance of variant callers (especially for InDels) in combination with different aligners, 20 automated pipelines were developed and evaluated against the gold-standard reference variant dataset (NA12878) for human whole exome sequencing from the Genome in a Bottle (GiaB) consortium. Additionally, simulated exome data from two human reference genome sequences (GRCh37 and GRCh38) were used to compare the performance of the pipelines. By analyzing various performance metrics, we observed that the BWA and Novoalign aligners performed better with the DeepVariant and SAMtools callers for detecting SNVs, and with DeepVariant and GATK for detecting InDels. Altogether, DeepVariant with BWA and Novoalign performed best. Further, we showed that merging the top-performing pipelines improved the accuracy of the variant call set. Collectively, this study should help investigators improve the sensitivity and accuracy of detecting specific variants.
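The abstract evaluates pipelines against the GiaB NA12878 truth set using performance metrics, breaks InDel detection down by base pair length, and merges top-performing pipelines, but it does not spell out the exact metrics or merging rule. The sketch below assumes common choices: precision, recall and F1 computed from exact matches of already-normalized (chrom, pos, ref, alt) records, inclusive length bins for InDels, and a hypothetical "called by at least k pipelines" merge. Real benchmarks usually rely on haplotype-aware comparison tools such as hap.py or vcfeval rather than exact matching. The names `Variant`, `metrics`, `indel_metrics_by_length`, `merge_callsets`, the bin boundaries and the toy records are all illustrative and are not taken from the study.

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    # Illustrative record; assumes variants are already normalized
    # (left-aligned, multi-allelic sites split) so equal calls compare equal.
    chrom: str
    pos: int
    ref: str
    alt: str


def is_snv(v: Variant) -> bool:
    return len(v.ref) == 1 and len(v.alt) == 1


def indel_length(v: Variant) -> int:
    # Net inserted or deleted bases; 0 for SNVs and same-length substitutions.
    return abs(len(v.alt) - len(v.ref))


def metrics(calls: Set[Variant], truth: Set[Variant]) -> Dict[str, float]:
    tp = len(calls & truth)   # called and present in the truth set
    fp = len(calls - truth)   # called but absent from the truth set
    fn = len(truth - calls)   # in the truth set but missed
    # Undefined ratios (empty denominators) are reported as 0.0 by convention.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"TP": tp, "FP": fp, "FN": fn,
            "precision": precision, "recall": recall, "F1": f1}


def _in_bin(v: Variant, low: int, high: int) -> bool:
    return not is_snv(v) and low <= indel_length(v) <= high


def indel_metrics_by_length(calls: Set[Variant], truth: Set[Variant],
                            bins: List[Tuple[int, int]]) -> Dict[Tuple[int, int], Dict[str, float]]:
    # Restrict both call set and truth set to InDels in each inclusive bp range.
    return {
        (low, high): metrics({v for v in calls if _in_bin(v, low, high)},
                             {v for v in truth if _in_bin(v, low, high)})
        for low, high in bins
    }


def merge_callsets(callsets: List[Set[Variant]], min_support: int) -> Set[Variant]:
    # Keep a variant if at least `min_support` pipelines report it.
    support = Counter(v for cs in callsets for v in cs)
    return {v for v, n in support.items() if n >= min_support}


# Toy data (hypothetical, for illustration only).
truth = {
    Variant("chr1", 100, "A", "G"),           # SNV
    Variant("chr1", 200, "AT", "A"),          # 1-bp deletion
    Variant("chr1", 300, "C", "CGGTTA"),      # 5-bp insertion
    Variant("chr2", 50, "GTTTTTTTT", "G"),    # 8-bp deletion
}
pipeline_a = {Variant("chr1", 100, "A", "G"), Variant("chr1", 200, "AT", "A"),
              Variant("chr1", 400, "T", "C")}
pipeline_b = {Variant("chr1", 100, "A", "G"), Variant("chr1", 300, "C", "CGGTTA"),
              Variant("chr1", 200, "AT", "A")}
pipeline_c = {Variant("chr1", 100, "A", "G"), Variant("chr1", 300, "C", "CGGTTA"),
              Variant("chr3", 10, "G", "A")}

merged = merge_callsets([pipeline_a, pipeline_b, pipeline_c], min_support=2)
print(metrics(merged, truth))  # TP=3, FP=0, FN=1: both single-pipeline false positives drop out
print(indel_metrics_by_length(pipeline_b, truth, [(1, 5), (6, 20)]))
```

Representing call sets as Python sets turns TP, FP and FN into plain set operations. Raising `min_support` tends to trade recall for precision, which is one way merging can yield a more accurate call set; the study's own merging strategy may differ.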
Kumaran Manojkumar, Devarajan Bharanidharan, Subramanian Umadevi
Department of Bioinformatics, Aravind Medical Research Foundation; School of Chemical and Biotechnology, SASTRA (Deemed to be University)
Medical research methods; Biological science research methods; Biological science research techniques
Keywords: Whole exome sequencing; Simulated exome data; Variant calling pipelines; SNVs and InDels; Base pair length.
Kumaran Manojkumar, Devarajan Bharanidharan, Subramanian Umadevi. Performance Assessment of Variant Calling Pipelines using Human Whole Exome Sequencing and Simulated data [EB/OL]. (2025-03-28) [2025-04-28]. https://www.biorxiv.org/content/10.1101/359109.