Echtvar: Compressed variant representation for rapid annotation and filtering of SNPs and indels
Abstract: Germline and somatic variants within an individual or cohort are interpreted with information from large cohorts. Annotation with this information becomes a computational bottleneck as population sets grow to terabytes of data. Here, we introduce echtvar, which efficiently encodes population variants and annotation fields into a compressed archive that can be used for rapid variant annotation and filtering. Most variants, including position and alleles, are encoded into 32 bits, which is half the size of previous encoding schemes and at least 4 times smaller than a naive encoding. The annotations, stored separately, are also encoded and compressed. We show that echtvar is faster and uses less space than existing tools and that it can effectively reduce the number of candidate variants. We give examples with germline and somatic variants to show how echtvar can facilitate exploratory data analysis on genetic variants. Echtvar is available at https://github.com/brentp/echtvar under an MIT license.
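To make the 32-bit claim concrete, here is a minimal sketch of how a position and short alleles could be packed into one 32-bit integer. The bit layout is an assumption chosen for illustration and is not taken from the echtvar source: 20 bits for the offset within a 2^20-base chunk, 2 bits each for the REF and ALT lengths, and 8 bits for up to four bases at 2 bits per base. The names `encode_variant` and `decode_variant` are hypothetical. Variants that do not fit return `None`, in line with the abstract's statement that most, not all, variants fit in 32 bits.

```python
# Illustrative layout only (an assumption, not echtvar's actual format):
# bits 0-19  : position offset within a 2**20-base chunk
# bits 20-21 : REF length (1..3)
# bits 22-23 : ALT length (1..3)
# bits 24-31 : up to 4 bases (REF then ALT), 2 bits per base
CHUNK_BITS = 20
BASES = "ACGT"


def encode_variant(pos, ref, alt):
    """Return (chunk_index, packed_u32), or None if the variant does not fit."""
    seq = ref + alt
    if not ref or not alt or len(seq) > 4 or any(b not in BASES for b in seq):
        return None  # long or non-ACGT alleles need a separate representation
    packed = pos & ((1 << CHUNK_BITS) - 1)
    packed |= len(ref) << 20
    packed |= len(alt) << 22
    for i, base in enumerate(seq):
        packed |= BASES.index(base) << (24 + 2 * i)
    return pos >> CHUNK_BITS, packed


def decode_variant(chunk, packed):
    """Invert encode_variant: return (pos, ref, alt)."""
    offset = packed & ((1 << CHUNK_BITS) - 1)
    ref_len = (packed >> 20) & 0b11
    alt_len = (packed >> 22) & 0b11
    seq = "".join(
        BASES[(packed >> (24 + 2 * i)) & 0b11] for i in range(ref_len + alt_len)
    )
    return (chunk << CHUNK_BITS) | offset, seq[:ref_len], seq[ref_len:]
```

Under this assumed layout, a SNP or a short indel round-trips through a single 32-bit value plus the chunk index that is shared by all variants in the same chunk, while an allele pair longer than four bases in total is rejected and would have to be stored another way.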
Pedersen Brent S., de Ridder Jeroen
Center for Molecular Medicine, University Medical Center Utrecht; Oncode Institute
Genetics; Computing technology, computer technology; Molecular biology
Pedersen Brent S., de Ridder Jeroen. Echtvar: Compressed variant representation for rapid annotation and filtering of SNPs and indels [EB/OL]. (2025-03-28) [2025-05-07]. https://www.biorxiv.org/content/10.1101/2022.04.15.488439.