Taming Offload Overheads in a Massively Parallel Open-Source RISC-V MPSoC: Analysis and Optimization
Taming Offload Overheads in a Massively Parallel Open-Source RISC-V MPSoC: Analysis and Optimization
Heterogeneous multi-core architectures combine on a single chip a few large, general-purpose host cores, optimized for single-thread performance, with (many) clusters of small, specialized, energy-efficient accelerator cores for data-parallel processing. Offloading a computation to the many-core acceleration fabric implies synchronization and communication overheads which can hamper overall performance and efficiency, particularly for small and fine-grained parallel tasks. In this work, we present a detailed, cycle-accurate quantitative analysis of the offload overheads on Occamy, an open-source massively parallel RISC-V based heterogeneous MPSoC. We study how the overheads scale with the number of accelerator cores. We explore an approach to drastically reduce these overheads by co-designing the hardware and the offload routines. Notably, we demonstrate that by incorporating multicast capabilities into the Network-on-Chip of a large (200+ cores) accelerator fabric we can improve offloaded application runtimes by as much as 2.3x, restoring more than 70% of the ideally attainable speedups. Finally, we propose a quantitative model to estimate the runtime of selected applications accounting for the offload overheads, with an error consistently below 15%.
Luca Colagrande、Luca Benini
微电子学、集成电路
Luca Colagrande,Luca Benini.Taming Offload Overheads in a Massively Parallel Open-Source RISC-V MPSoC: Analysis and Optimization[EB/OL].(2025-05-09)[2025-06-27].https://arxiv.org/abs/2505.05911.点此复制
评论