Re-thinking Memory-Bound Limitations in CGRAs
Coarse-Grained Reconfigurable Arrays (CGRAs) are specialized accelerators commonly used to speed up workloads with iterative structures. Existing research typically focuses on compiler or architecture optimizations that improve CGRA performance, energy efficiency, flexibility, and area utilization, under the idealistic assumption that kernels can access all of their data from Scratchpad Memory (SPM). However, certain complex workloads, particularly in graph analytics, irregular database operations, and specialized forms of high-performance computing (e.g., unstructured mesh simulations), exhibit irregular memory access patterns that hinder CGRA utilization, which sometimes drops below 1.5%, making the CGRA memory-bound. To address this challenge, we conduct a thorough analysis of the underlying causes of performance degradation, then propose a redesigned memory subsystem and a refined memory model. Combining microarchitectural and theoretical optimizations, our solution effectively manages irregular memory accesses through a CGRA-specific runahead execution mechanism and cache reconfiguration techniques. Our results demonstrate performance comparable to the original SPM-only system while requiring only 1.27% of the storage size. The runahead execution mechanism achieves an average 3.04x speedup (up to 6.91x), and the cache reconfiguration technique provides an additional 6.02% improvement, significantly enhancing CGRA performance for irregular memory access patterns.
Xiangfeng Liu, Zhe Jiang, Anzhen Zhu, Xiaomeng Han, Mingsong Lyu, Qingxu Deng, Nan Guan
Computing technology, computer technology
Xiangfeng Liu, Zhe Jiang, Anzhen Zhu, Xiaomeng Han, Mingsong Lyu, Qingxu Deng, Nan Guan. Re-thinking Memory-Bound Limitations in CGRAs [EB/OL]. (2025-08-13) [2025-08-24]. https://arxiv.org/abs/2508.09570.