Rawsamble: Overlapping and Assembling Raw Nanopore Signals using a Hash-based Seeding Mechanism

Source: arXiv
Abstract

Raw nanopore signal analysis is a common approach in genomics to provide fast and resource-efficient analysis without translating the signals to bases (i.e., without basecalling). However, existing solutions cannot interpret raw signals directly if a reference genome is unknown, because they lack accurate mechanisms to handle the increased noise in pairwise raw signal comparison. Our goal is to enable the direct analysis of raw signals without a reference genome. To this end, we propose Rawsamble, the first mechanism that can identify regions of similarity between all raw signal pairs, known as all-vs-all overlapping, using a hash-based search mechanism. We show that an off-the-shelf assembler (i.e., miniasm) can use the overlaps found by Rawsamble to construct genomes from scratch (i.e., de novo assembly). Our extensive evaluations across multiple genomes of varying sizes show that Rawsamble provides a significant speedup (4.48x on average and up to 23.10x) and reduces peak memory usage (4.33x on average and up to 22.00x) compared to a conventional genome assembly pipeline using state-of-the-art tools for basecalling (Dorado's fastest mode) and overlapping (minimap2) on a CPU. We find that 30.94% of overlapping pairs generated by Rawsamble are identical to those generated by minimap2. We further evaluate the benefits and directions that this new overlapping approach can enable in two ways. First, by constructing and evaluating the assembly graphs from the Rawsamble overlaps, we show that we can construct accurate and contiguous assembly segments (unitigs) up to 2.3 million bases in length (almost half the genome length of E. coli). Second, we identify previously unexplored directions that can be enabled by finding overlaps and constructing de novo assemblies, as well as the challenges that must be tackled to pursue these directions. Source code: https://github.com/CMU-SAFARI/RawHash
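To make the idea of hash-based all-vs-all overlapping concrete, the following is a minimal illustrative sketch and not Rawsamble's actual implementation. It assumes the signals are already normalized, quantizes each sample into a few coarse buckets to absorb noise, and uses windows of k quantized values as seeds stored in a hash table. The function names (`quantize`, `seeds`, `all_vs_all_overlaps`) and the parameters (`levels`, `k`, `min_hits`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def quantize(signal, levels=4, lo=-2.0, hi=2.0):
    """Map each (assumed normalized) sample to an integer bucket in [0, levels)."""
    step = (hi - lo) / levels
    out = []
    for x in signal:
        bucket = int((x - lo) / step)
        out.append(min(max(bucket, 0), levels - 1))  # clamp out-of-range samples
    return out


def seeds(qsig, k=4):
    """Yield every window of k consecutive quantized values as a hashable seed."""
    for i in range(len(qsig) - k + 1):
        yield tuple(qsig[i:i + k])


def all_vs_all_overlaps(signals, k=4, min_hits=3):
    """Return {(id_a, id_b): n} for signal pairs sharing n >= min_hits distinct seeds."""
    index = defaultdict(set)  # hash table: seed -> ids of signals containing it
    for sid, sig in signals.items():
        for seed in seeds(quantize(sig), k):
            index[seed].add(sid)

    counts = defaultdict(int)
    for ids in index.values():
        # Every pair of signals that shares this seed gets one hit.
        for a, b in combinations(sorted(ids), 2):
            counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_hits}
```

A real overlapper would additionally chain seed hits by position to estimate where two signals overlap and would emit those coordinates for an assembler such as miniasm; this sketch only counts shared seeds per pair.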

Stefano Mercogliano, Nika Mansouri Ghiasi, Sayan Goswami, Harun Mustafa, Can Firtina, Maximilian Mordig, Onur Mutlu, Furkan Eris, Joël Lindegger, Andre Kahles

Biological science research methods, biological science research techniques

Stefano Mercogliano, Nika Mansouri Ghiasi, Sayan Goswami, Harun Mustafa, Can Firtina, Maximilian Mordig, Onur Mutlu, Furkan Eris, Joël Lindegger, Andre Kahles. Rawsamble: Overlapping and Assembling Raw Nanopore Signals using a Hash-based Seeding Mechanism [EB/OL]. (2025-07-18) [2025-08-16]. https://arxiv.org/abs/2410.17801.
