CaMDN: Enhancing Cache Efficiency for Multi-tenant DNNs on Integrated NPUs

Source: arXiv
Abstract

With the rapid development of DNN applications, multi-tenant execution, in which multiple DNNs are co-located on a single SoC, is becoming a prevailing trend. Although prior work has proposed many methods to improve multi-tenant performance, the impact of the shared cache has not been well studied. This paper proposes CaMDN, an architecture-scheduling co-design that enhances cache efficiency for multi-tenant DNNs on integrated NPUs. Specifically, a lightweight architecture supports model-exclusive, NPU-controlled regions inside the shared cache to eliminate unexpected cache contention. In addition, a cache scheduling method improves shared cache utilization: it comprises a cache-aware mapping method that adapts to the varying available cache capacity and a dynamic allocation algorithm that adjusts cache usage among co-located DNNs at runtime. Compared to prior work, CaMDN reduces memory access by 33.4% on average and achieves a model speedup of up to 2.56$\times$ (1.88$\times$ on average).

Tianhao Cai, Liang Wang, Limin Xiao, Meng Han, Zeyu Wang, Lin Sun, Xiaojian Liao

Computing technology, computer technology

Tianhao Cai, Liang Wang, Limin Xiao, Meng Han, Zeyu Wang, Lin Sun, Xiaojian Liao. CaMDN: Enhancing Cache Efficiency for Multi-tenant DNNs on Integrated NPUs[EB/OL]. (2025-05-10)[2025-06-18]. https://arxiv.org/abs/2505.06625.
