LUT Tensor Core: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference

Source: arXiv
Abstract

Large Language Model (LLM) inference has become increasingly resource-intensive, prompting a shift toward low-bit model weights to reduce the memory footprint and improve efficiency. Such low-bit LLMs require mixed-precision matrix multiplication (mpGEMM), an important yet underexplored operation that multiplies lower-precision weights with higher-precision activations. Off-the-shelf hardware does not support this operation natively, which leads to indirect and therefore inefficient dequantization-based implementations. In this paper, we study the lookup table (LUT)-based approach to mpGEMM and find that a conventional LUT implementation fails to achieve the promised gains. To unlock the full potential of LUT-based mpGEMM, we propose LUT Tensor Core, a software-hardware co-design for low-bit LLM inference. LUT Tensor Core differs from conventional LUT designs in three ways: 1) software-based optimizations that minimize table precompute overhead, together with weight reinterpretation that reduces table storage; 2) a LUT-based Tensor Core hardware design with an elongated tiling shape to maximize table reuse and a bit-serial design to support diverse precision combinations in mpGEMM; 3) a new instruction set and compilation optimizations for LUT-based mpGEMM. LUT Tensor Core significantly outperforms existing pure-software LUT implementations and achieves a 1.44$\times$ improvement in compute density and energy efficiency compared to previous state-of-the-art LUT-based accelerators.
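To make the LUT idea in the abstract concrete, the following is a minimal software sketch of a bit-serial, table-lookup dot product between higher-precision activations and unsigned low-bit weights. It is not the paper's implementation: the group size `GROUP`, the function names `build_table` and `lut_dot`, and the unsigned weight encoding are all assumptions chosen for illustration.

```python
from typing import List

GROUP = 4  # activations covered by one lookup table (assumed for this sketch)


def build_table(acts: List[float]) -> List[float]:
    """Precompute every subset sum of one activation group (2**len(acts) entries)."""
    table = [0.0] * (1 << len(acts))
    for pattern in range(1, len(table)):
        low = pattern & -pattern      # lowest set bit of the pattern
        j = low.bit_length() - 1      # index of the activation that bit selects
        table[pattern] = table[pattern ^ low] + acts[j]
    return table


def lut_dot(acts: List[float], weights: List[int], w_bits: int) -> float:
    """Dot product of activations with unsigned w_bits-bit weights using lookups only."""
    assert len(acts) == len(weights) and len(acts) % GROUP == 0
    total = 0.0
    for start in range(0, len(acts), GROUP):
        table = build_table(acts[start:start + GROUP])
        group_w = weights[start:start + GROUP]
        for b in range(w_bits):       # bit-serial: one weight bit plane per pass
            index = 0
            for j, w in enumerate(group_w):
                index |= ((w >> b) & 1) << j
            total += table[index] * (1 << b)  # scale by the bit plane's significance
    return total


acts = [1.0, 2.0, 3.0, 4.0, 0.5, -1.0, 2.5, 0.0]
weights = [3, 0, 1, 2, 1, 3, 2, 0]   # 2-bit unsigned weights
result = lut_dot(acts, weights, w_bits=2)
reference = sum(a * w for a, w in zip(acts, weights))
```

In this sketch the table depends only on the activations, so in a full matrix multiplication it could be built once per activation group and reused across all weight rows. That reuse is what motivates the abstract's emphasis on table precompute overhead and table reuse. Handling more weight bits only adds passes over the bit planes, which is the sense in which a bit-serial design supports different precision combinations.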

Jilong Xue, Lingxiao Ma, Zhiwen Mo, Ting Cao, Fan Yang, Mao Yang, Naifeng Jing, Lei Wang, Jianyu Wei, Zhichen Zeng, Shijie Cao

DOI: 10.1145/3695053.3731057

Computing technology, computer technology

Jilong Xue, Lingxiao Ma, Zhiwen Mo, Ting Cao, Fan Yang, Mao Yang, Naifeng Jing, Lei Wang, Jianyu Wei, Zhichen Zeng, Shijie Cao. LUT Tensor Core: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference [EB/OL]. (2025-07-28) [2025-08-18]. https://arxiv.org/abs/2408.06003.
