DilateQuant: Accurate and Efficient Diffusion Quantization via Weight Dilation
Model quantization is a promising method for accelerating and compressing diffusion models. However, since post-training quantization (PTQ) fails catastrophically in low-bit settings, quantization-aware training (QAT) is essential. Unfortunately, the wide-ranging and time-varying activations of diffusion models sharply increase the complexity of quantization, making existing QAT methods inefficient. Equivalent scaling can effectively reduce the activation range, but previous methods leave the overall quantization error unchanged. More critically, these methods significantly disrupt the original weight distribution, resulting in poor weight initialization and difficult convergence during QAT. In this paper, we propose a novel QAT framework for diffusion models, called DilateQuant. Specifically, we propose Weight Dilation (WD), which maximally dilates the unsaturated in-channel weights to a constrained range through equivalent scaling. WD decreases the activation range while preserving the original weight range, which steadily reduces the quantization error and ensures model convergence. To further enhance accuracy and efficiency, we design a Temporal Parallel Quantizer (TPQ) to address the time-varying activations and introduce Block-wise Knowledge Distillation (BKD) to reduce the resource cost of training. Extensive experiments demonstrate that DilateQuant significantly outperforms existing methods in terms of accuracy and efficiency. Code is available at http://github.com/BienLuky/DilateQuant.
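To illustrate the equivalent-scaling idea behind Weight Dilation, below is a minimal, hypothetical sketch (not the authors' implementation; the function name and the exact scaling constraint are assumptions). Per input channel, the weights are dilated by a factor that keeps them within the original global weight range, while the corresponding activation channel is divided by the same factor, leaving the layer output mathematically unchanged but shrinking the activation range.

```python
import torch

def weight_dilation(weight: torch.Tensor, act_sample: torch.Tensor, eps: float = 1e-8):
    """Hypothetical sketch of per-in-channel equivalent scaling (Weight Dilation).

    weight:     (out_features, in_features) weight of a linear layer
    act_sample: (N, in_features) calibration activations feeding that layer
    Returns the dilated weight, the rescaled activations, and the per-channel scales.
    """
    # Bound shared by all in-channels: the original (global) weight range,
    # so the weight quantization step size is preserved after dilation.
    w_bound = weight.abs().max()
    # Per-in-channel weight maxima: how saturated each channel already is.
    w_ch_max = weight.abs().amax(dim=0).clamp_min(eps)
    # Largest scale that keeps every dilated channel inside the original range.
    scale = (w_bound / w_ch_max).clamp_min(1.0)
    # Equivalent transform: W[:, k] * s_k and X[:, k] / s_k leave the output
    # X @ W.T unchanged, but the activation range shrinks by up to s_k per channel.
    dilated_weight = weight * scale.unsqueeze(0)
    scaled_act = act_sample / scale.unsqueeze(0)
    return dilated_weight, scaled_act, scale
```

In this sketch the scale is folded into the weights once, so only the activation quantizer sees a narrower range; how the scales are absorbed into preceding layers in practice would follow the paper's actual design.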
Jianquan Li, Qingyi Gu, Xuewen Liu, Zhikai Li, Minhao Jiang, Mengjuan Chen
Computing Technology; Computer Technology
Jianquan Li, Qingyi Gu, Xuewen Liu, Zhikai Li, Minhao Jiang, Mengjuan Chen. DilateQuant: Accurate and Efficient Diffusion Quantization via Weight Dilation [EB/OL]. (2025-07-09) [2025-07-21]. https://arxiv.org/abs/2409.14307.