
Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance

Source: arXiv
English Abstract

Large Language Models (LLMs) have demonstrated impressive performance across various domains. However, the enormous number of model parameters makes fine-tuning challenging, significantly limiting their application and deployment. Existing solutions combine parameter quantization with Low-Rank Adaptation (LoRA), reducing memory usage but causing performance degradation. Additionally, converting fine-tuned models to low-precision representations further degrades performance. In this paper, we identify an imbalance in fine-tuning quantized LLMs with LoRA: overly complex adapter inputs and outputs versus low effective trainability of the adapter, leading to underfitting during fine-tuning. Thus, we propose Quantized LLMs fine-tuning with Balanced Low-Rank Adaptation (Q-BLoRA), which simplifies the adapter's inputs and outputs while increasing the adapter's rank to alleviate underfitting during fine-tuning. For low-precision deployment, we propose Quantization-Aware fine-tuning with Balanced Low-Rank Adaptation (QA-BLoRA), which aligns with the block-wise quantization and facilitates quantization-aware fine-tuning of low-rank adaptation based on the parameter merging of Q-BLoRA. Both Q-BLoRA and QA-BLoRA are easily implemented and offer the following optimizations: (i) Q-BLoRA consistently achieves state-of-the-art accuracy compared to baselines and other variants; (ii) QA-BLoRA enables the direct generation of low-precision inference models, which exhibit significant performance improvements over other low-precision models. We validate the effectiveness of Q-BLoRA and QA-BLoRA across various models and scenarios. Code will be made available at https://github.com/xiaocaigou/qbaraqahira
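The abstract does not give implementation details, so the following is only a minimal, hypothetical PyTorch sketch of the rebalancing idea it describes. It assumes that "simplifying the adapter's inputs and outputs" means averaging input channels into groups and sharing each adapter output across a group of output channels, while the adapter rank is scaled up by the same factor; the class name, the group parameter, and the grouping scheme are illustrative assumptions, not the paper's exact design.

import torch.nn as nn

class BalancedLoRALinear(nn.Module):
    # Hypothetical sketch: a "balanced" low-rank adapter over a frozen,
    # quantized base projection. Adapter inputs are averaged into groups,
    # each adapter output is shared across a group of output channels,
    # and the rank is increased by the same factor. Names and the grouping
    # scheme are assumptions for illustration, not the paper's method.
    def __init__(self, base_linear, in_features, out_features, rank=16, group=4):
        super().__init__()
        self.base = base_linear             # frozen (e.g. 4-bit quantized) base layer
        self.group = group
        self.lora_A = nn.Linear(in_features // group, rank * group, bias=False)
        self.lora_B = nn.Linear(rank * group, out_features // group, bias=False)
        nn.init.zeros_(self.lora_B.weight)  # adapter starts as a no-op, as in LoRA

    def forward(self, x):
        y = self.base(x)                                    # quantized forward pass
        # Simplify adapter input: average each group of `group` input channels.
        xs = x.reshape(*x.shape[:-1], -1, self.group).mean(dim=-1)
        delta = self.lora_B(self.lora_A(xs))                # higher-rank adapter
        # Simplify adapter output: share it across each group of output channels.
        delta = delta.repeat_interleave(self.group, dim=-1)
        return y + delta

Under these assumptions the adapter keeps the same number of trainable parameters as a standard rank-r LoRA (in_features*rank + rank*out_features) while operating at rank rank*group over simpler inputs and outputs, which is the kind of complexity-versus-trainability rebalancing the abstract argues for.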

Ao Shen, Qiang Wang, Zhiquan Lai, Dongsheng Li, Xionglve Li

Computing Technology, Computer Technology

Ao Shen, Qiang Wang, Zhiquan Lai, Dongsheng Li, Xionglve Li. Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance [EB/OL]. (2025-07-22) [2025-08-06]. https://arxiv.org/abs/2407.17029
