Prefix-free parsing for merging big BWTs

Abstract

When building Burrows-Wheeler Transforms (BWTs) of truly huge datasets, prefix-free parsing (PFP) can use an unreasonable amount of memory. In this paper we show how if a dataset can be broken down into small datasets that are not very similar to each other -- such as collections of many copies of genomes of each of several species, or collections of many copies of each of the human chromosomes -- then we can drastically reduce PFP's memory footprint by building the BWTs of the small datasets and then merging them into the BWT of the whole dataset.

Authors: Zsuzsanna Liptak, Giovanni Manzini, Francesco Masillo, Vikram Shivakumar, Diego Diaz-Dominguez, Travis Gagie, Veronica Guerrini, Ben Langmead

Subject classification: Computing technology, computer technology

Recommended citation: Zsuzsanna Liptak, Giovanni Manzini, Francesco Masillo, Vikram Shivakumar, Diego Diaz-Dominguez, Travis Gagie, Veronica Guerrini, Ben Langmead. Prefix-free parsing for merging big BWTs [EB/OL]. (2025-06-03) [2025-06-18]. https://arxiv.org/abs/2506.03294.