|国家预印本平台
首页|Please Mind the Gap: Indel-Aware Parsimony for Fast and Accurate Ancestral Sequence Reconstruction and Multiple Sequence Alignment including Long Indels

Please Mind the Gap: Indel-Aware Parsimony for Fast and Accurate Ancestral Sequence Reconstruction and Multiple Sequence Alignment including Long Indels

Please Mind the Gap: Indel-Aware Parsimony for Fast and Accurate Ancestral Sequence Reconstruction and Multiple Sequence Alignment including Long Indels

来源:bioRxiv_logobioRxiv
英文摘要

Despite having important biological implications, insertion and deletion (indel) events are often disregarded or mishandled during phylogenetic inference. In multiple sequence alignment, indels are represented as gaps and are estimated without considering the distinct evolutionary history of insertions and deletions. Consequently, indels are usually excluded from subsequent inference steps, such as ancestral sequence reconstruction and phylogenetic tree search. Here, we introduce indel-aware parsimony (indelMaP), a novel way to treat gaps under the parsimony criterion by considering insertions and deletions as separate evolutionary events and accounting for long indels. By identifying the precise location of an evolutionary event on the tree, we can separate overlapping indel events and use affine gap penalties for long indel modelling. Our indel-aware approach harnesses the phylogenetic signal from indels, including them into all inference stages. Validation and comparison to state-of-the-art inference tools on simulated data show that indelMaP is most suitable for densely sampled datasets with closely to moderately related sequences, where it can reach alignment quality comparable to probabilistic methods and accurately infer ancestral sequences, including indel patterns. Due to its remarkable speed, our method is well-suited for epidemiological datasets, eliminating the need for downsampling and enabling the exploitation of the additional information provided by dense taxonomic sampling. Moreover, indelMaP offers new insights into the indel patterns of biologically significant sequences and advances our understanding of genetic variability by considering gaps as crucial evolutionary signals rather than mere artefacts.

Anisimova Maria、Iglhaut Clara、Pe?erska J¨±lija、Gil Manuel

10.1101/2024.03.27.586611

遗传学分子生物学生物科学研究方法、生物科学研究技术

Anisimova Maria,Iglhaut Clara,Pe?erska J¨±lija,Gil Manuel.Please Mind the Gap: Indel-Aware Parsimony for Fast and Accurate Ancestral Sequence Reconstruction and Multiple Sequence Alignment including Long Indels[EB/OL].(2025-03-28)[2025-08-02].https://www.biorxiv.org/content/10.1101/2024.03.27.586611.点此复制

评论