SSM-RDU: A Reconfigurable Dataflow Unit for Long-Sequence State-Space Models

Source: arXiv
Abstract

Long-sequence state-space models (SSMs) such as Hyena and Mamba replace the quadratic complexity of self-attention with more efficient FFT and scan operations. However, modern accelerators like GPUs are poorly suited to these non-GEMM workloads due to rigid execution models and specialization for dense matrix operations. This paper proposes architectural extensions to a baseline Reconfigurable Dataflow Unit (RDU) that efficiently support FFT-based and scan-based SSMs. By introducing lightweight interconnect enhancements within compute tiles, the extended RDU enables spatial mapping of FFT and scan dataflows with less than 1% area and power overhead. The resulting architecture achieves a 5.95X speedup over the GPU and a 1.95X speedup over the baseline RDU for Hyena, and a 2.12X and 1.75X speedup over the GPU and baseline RDU, respectively, for Mamba.
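
The two non-GEMM primitives named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the paper's RDU mapping or any library's kernel. The function names (`fft`, `fft_causal_conv`, `combine`, `parallel_scan`) are hypothetical, and the scalar recurrence h_t = a_t * h_{t-1} + b_t is an assumed toy stand-in for a Mamba-style scan. The FFT path computes a causal long convolution of the kind Hyena-style models rely on. The scan path is a Hillis-Steele inclusive scan over an associative operator.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.

    The inverse transform is left unnormalized (the caller divides by n).
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def fft_causal_conv(u, k):
    """Causal convolution y[t] = sum_{s<=t} k[s] * u[t-s] via FFT.

    Assumes len(k) <= len(u). Zero-padding to at least 2*L avoids
    circular wrap-around.
    """
    L = len(u)
    n = 1
    while n < 2 * L:
        n *= 2
    U = fft([complex(v) for v in u] + [0j] * (n - L))
    K = fft([complex(v) for v in k] + [0j] * (n - len(k)))
    Y = fft([a * b for a, b in zip(U, K)], inverse=True)
    return [(y / n).real for y in Y[:L]]


def combine(left, right):
    """Associative operator for h_t = a_t * h_{t-1} + b_t."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def parallel_scan(a, b):
    """Hillis-Steele inclusive scan; returns h_0..h_{L-1} with h_{-1} = 0.

    Each pass of the while loop is one parallel step: every position
    combines with the element `step` places to its left.
    """
    elems = list(zip(a, b))
    step = 1
    while step < len(elems):
        elems = [
            elems[i] if i < step else combine(elems[i - step], elems[i])
            for i in range(len(elems))
        ]
        step *= 2
    return [h for _, h in elems]
```

For a sequence of length L, the FFT route costs O(L log L) operations and the scan finishes in about log2(L) parallel steps, in contrast to the quadratic cost of self-attention that the abstract refers to.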

Sho Ko, Kunle Olukotun

Computing technology, computer technology

Sho Ko, Kunle Olukotun. SSM-RDU: A Reconfigurable Dataflow Unit for Long-Sequence State-Space Models [EB/OL]. (2025-08-11) [2025-08-24]. https://arxiv.org/abs/2503.22937.