
APSQ: Additive Partial Sum Quantization with Algorithm-Hardware Co-Design


Source: arXiv
Abstract

DNN accelerators have advanced considerably through model compression and specialized dataflow techniques. However, frequent access to high-precision partial sums (PSUMs) leads to excessive memory demands in architectures using input- or weight-stationary dataflows. Traditional compression strategies have typically overlooked PSUM quantization, even though PSUM access may account for 69% of power consumption. This study introduces a novel Additive Partial Sum Quantization (APSQ) method that seamlessly integrates PSUM accumulation into the quantization framework. A grouping strategy combining APSQ with PSUM quantization is further proposed, supported by a reconfigurable architecture. APSQ performs nearly losslessly on NLP and CV tasks across BERT, Segformer, and EfficientViT models while compressing PSUMs to INT8, reducing energy costs by 28-87%. Extended experiments on LLaMA2-7B demonstrate the potential of APSQ for large language models. Code is available at https://github.com/Yonghao-Tan/APSQ.
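To make the problem concrete, the toy sketch below illustrates what generic partial-sum quantization means in a tiled matrix multiplication: the running accumulator is quantized to INT8 between K-tiles instead of being kept at full precision. This is an illustrative assumption only, not the paper's APSQ algorithm or grouping strategy; the tile size, symmetric scale, and rounding scheme are all hypothetical choices.

```python
# Illustrative sketch only: generic PSUM quantization in a tiled matmul,
# NOT the paper's APSQ method. Tile size, scale, and rounding are assumptions.
import numpy as np

def quantize_int8(x, scale):
    """Symmetric INT8 quantization with a fixed scale (assumed for illustration)."""
    q = np.clip(np.round(x / scale), -128, 127)
    return q * scale  # dequantized value carried forward in the accumulator

def tiled_matmul_psum_quant(a, b, tile_k=64, psum_scale=0.5):
    """Accumulate A @ B over K-tiles, quantizing the running partial sum to INT8
    after each tile -- the memory-traffic pattern that PSUM quantization targets."""
    m, k = a.shape
    _, n = b.shape
    psum = np.zeros((m, n), dtype=np.float32)
    for k0 in range(0, k, tile_k):
        psum += a[:, k0:k0 + tile_k] @ b[k0:k0 + tile_k, :]
        psum = quantize_int8(psum, psum_scale)  # PSUM stored in low precision
    return psum

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 256)).astype(np.float32)
B = rng.standard_normal((256, 8)).astype(np.float32)
ref = A @ B
approx = tiled_matmul_psum_quant(A, B)
print("max abs error from PSUM quantization:", np.abs(ref - approx).max())
```

In this simplified setting the quantization error grows with the number of tiles, which is why a method that folds accumulation into the quantization framework, as the abstract describes, matters for input/weight-stationary dataflows.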

Yonghao Tan, Pingcheng Dong, Yongkun Wu, Yu Liu, Xuejiao Liu, Peng Luo, Shih-Yang Liu, Xijie Huang, Dong Zhang, Luhong Liang, Kwang-Ting Cheng

Computing technology; computer technology

Yonghao Tan, Pingcheng Dong, Yongkun Wu, Yu Liu, Xuejiao Liu, Peng Luo, Shih-Yang Liu, Xijie Huang, Dong Zhang, Luhong Liang, Kwang-Ting Cheng. APSQ: Additive Partial Sum Quantization with Algorithm-Hardware Co-Design [EB/OL]. (2025-04-10) [2025-06-24]. https://arxiv.org/abs/2505.03748.
