National Preprint Platform

Low-bit Model Quantization for Deep Neural Networks: A Survey

Source: arXiv
Abstract

With unprecedentedly rapid development, deep neural networks (DNNs) have deeply influenced almost all fields. However, their heavy computation costs and model sizes are usually unacceptable in real-world deployment. Model quantization, an effective model-lightweighting technique, has become an indispensable procedure in the deployment pipeline. The essence of quantization acceleration is the conversion from continuous floating-point numbers to discrete integer ones, which significantly speeds up memory I/O and calculation, i.e., addition and multiplication. However, the conversion also brings performance degradation because of the loss of precision. Therefore, it has become increasingly popular and critical to investigate how to perform the conversion and how to compensate for the information loss. This article surveys the progress of the past five years towards low-bit quantization of DNNs. We discuss and compare state-of-the-art quantization methods and classify them into 8 main categories and 24 sub-categories according to their core techniques. Furthermore, we shed light on potential research opportunities in the field of model quantization. A curated list of model quantization works is provided at https://github.com/Kai-Liu001/Awesome-Model-Quantization.
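The float-to-integer conversion described in the abstract can be illustrated with a minimal uniform affine quantizer. This is a generic textbook sketch, not any specific method from the survey; the function names and the 8-bit setting are illustrative assumptions.

```python
import numpy as np

def quantize(x, num_bits=8):
    # Uniform affine quantization: map the observed float range
    # [x.min(), x.max()] onto the integer grid [0, 2^b - 1].
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)      # step size between levels
    zero_point = int(round(qmin - x.min() / scale))  # integer mapped to 0.0
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int32)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    # Recover an approximate float tensor; the gap to the original
    # is the precision loss the abstract refers to.
    return scale * (q.astype(np.float32) - zero_point)

x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
q, s, z = quantize(x, num_bits=8)
x_hat = dequantize(q, s, z)
```

The reconstruction error of `x_hat` is bounded by roughly one quantization step `s`; shrinking `num_bits` to 4 or 2 widens the step and enlarges the error, which is exactly the trade-off low-bit quantization research tries to compensate for.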

Kai Liu, Qian Zheng, Kaiwen Tao, Zhiteng Li, Haotong Qin, Wenbo Li, Yong Guo, Xianglong Liu, Linghe Kong, Guihai Chen, Yulun Zhang, Xiaokang Yang

Computing Technology; Computer Technology

Kai Liu, Qian Zheng, Kaiwen Tao, Zhiteng Li, Haotong Qin, Wenbo Li, Yong Guo, Xianglong Liu, Linghe Kong, Guihai Chen, Yulun Zhang, Xiaokang Yang. Low-bit Model Quantization for Deep Neural Networks: A Survey [EB/OL]. (2025-05-08) [2025-06-27]. https://arxiv.org/abs/2505.05530.
