Strobemers: an alternative to k-mers for sequence comparison

Source: bioRxiv
Abstract

K-mer-based methods are widely used in bioinformatics for various types of sequence comparison. However, a single mutation alters k consecutive k-mers, which makes most k-mer-based applications for sequence comparison sensitive to variable mutation rates. Many techniques have been studied to overcome this sensitivity, e.g., spaced k-mers and k-mer permutation techniques, but these techniques do not handle indels well. For indels, pairs or groups of small k-mers are commonly used, but these methods first produce k-mer matches and only in a second step pair or group them. Such techniques produce many redundant k-mer matches because of the small size of k. Here, we propose strobemers as an alternative to k-mers for sequence comparison. Intuitively, strobemers consist of linked minimizers. We use simulated data to show that strobemers provide more evenly distributed sequence matches and are less sensitive to different mutation rates than k-mers and spaced k-mers. Strobemers also produce a higher match coverage across sequences. We further implement a proof-of-concept sequence matching tool, StrobeMap, and use synthetic and biological Oxford Nanopore sequencing data to show the utility of strobemers for sequence comparison in contexts such as sequence clustering and alignment. A reference implementation of StrobeMap, together with code for the analyses, is available at https://github.com/ksahlin/strobemers.
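The two ideas in the abstract can be illustrated with a minimal Python sketch. The first function counts how many k-mers a single substitution changes. The second builds seeds from two linked substrings, where the second is chosen as the minimizer (smallest hash value) within a downstream window. This is only one simplified reading of "linked minimizers", not the paper's definition or the StrobeMap implementation: the function names, the parameters `strobe_len`, `w_min`, and `w_max`, and the use of BLAKE2 as the hash are all assumptions made for illustration.

```python
import hashlib


def kmers(seq, k):
    """Return all k-mers of seq in order of their start position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def changed_kmers(seq, mutated, k):
    """Count start positions whose k-mer differs between two equal-length sequences."""
    return sum(a != b for a, b in zip(kmers(seq, k), kmers(mutated, k)))


def stable_hash(s):
    """Deterministic 64-bit hash of a string (hash choice is an assumption of this sketch)."""
    return int.from_bytes(hashlib.blake2b(s.encode(), digest_size=8).digest(), "big")


def linked_minimizer_seeds(seq, strobe_len, w_min, w_max):
    """Hypothetical two-part seeds: the substring at position i linked to the
    minimizer among substrings starting in the window [i + w_min, i + w_max]."""
    seeds = []
    last_start = len(seq) - strobe_len
    for i in range(last_start + 1):
        lo = i + w_min
        hi = min(i + w_max, last_start)
        if lo > hi:
            break  # no room left for a second substring
        # Pick the window position with the smallest hash; ties go to the leftmost.
        j = min(range(lo, hi + 1),
                key=lambda p: (stable_hash(seq[p:p + strobe_len]), p))
        seeds.append((i, j, seq[i:i + strobe_len] + seq[j:j + strobe_len]))
    return seeds


ref = "ACGTACGTACGTACGTACGT"
mut = ref[:10] + "T" + ref[11:]  # one substitution at position 10 (G -> T)
print(changed_kmers(ref, mut, k=5))  # 5: every k-mer covering position 10 changes
print(linked_minimizer_seeds(ref, strobe_len=3, w_min=3, w_max=6)[:3])
```

Because the second substring can sit anywhere in its window, the distance between the two linked parts varies from seed to seed, which is the property that lets such seeds tolerate an indel between the parts where a contiguous k-mer cannot.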

Sahlin Kristoffer

Department of Mathematics, Science for Life Laboratory, Stockholm University

10.1101/2021.01.28.428549

Subjects: Research methods and techniques in biological sciences; Molecular biology

Keywords: k-mers, minimizers, sequence matching, data structures

Sahlin Kristoffer. Strobemers: an alternative to k-mers for sequence comparison [EB/OL]. (2025-03-28) [2025-07-01]. https://www.biorxiv.org/content/10.1101/2021.01.28.428549.
