An Asynchronous Distributed-Memory Parallel Algorithm for k-mer Counting
An Asynchronous Distributed-Memory Parallel Algorithm for k-mer Counting
This paper describes a new asynchronous algorithm and implementation for the problem of k-mer counting (KC), which concerns quantifying the frequency of length k substrings in a DNA sequence. This operation is common to many computational biology workloads and can take up to 77% of the total runtime of de novo genome assembly. The performance and scalability of the current state-of-the-art distributed-memory KC algorithm are hampered by multiple rounds of Many-To-Many collectives. Therefore, we develop an asynchronous algorithm (DAKC) that uses fine-grained, asynchronous messages to obviate most of this global communication while utilizing network bandwidth efficiently via custom message aggregation protocols. DAKC can perform strong scaling up to 256 nodes (512 sockets / 6K cores) and can count k-mers up to 9x faster than the state-of-the-art distributed-memory algorithm, and up to 100x faster than the shared-memory alternative. We also provide an analytical model to understand the hardware resource utilization of our asynchronous KC algorithm and provide insights on the performance.
Souvadra Hati、Akihiro Hayashi、Richard Vuduc
生物科学研究方法、生物科学研究技术计算技术、计算机技术
Souvadra Hati,Akihiro Hayashi,Richard Vuduc.An Asynchronous Distributed-Memory Parallel Algorithm for k-mer Counting[EB/OL].(2025-05-07)[2025-06-12].https://arxiv.org/abs/2505.04431.点此复制
评论