
Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance

Source: arXiv
English Abstract

Large Language Models (LLMs) have demonstrated impressive performance across various domains. However, the enormous number of model parameters makes fine-tuning challenging, significantly limiting their application and deployment. Existing solutions combine parameter quantization with Low-Rank Adaptation (LoRA), reducing memory usage but causing performance degradation. Additionally, converting fine-tuned models to low-precision representations further degrades performance. In this paper, we identify an imbalance in fine-tuning quantized LLMs with LoRA: overly complex adapter inputs and outputs versus low effective trainability of the adapter, leading to underfitting during fine-tuning. Thus, we propose Quantized LLMs fine-tuning with Balanced Low-Rank Adaptation (Q-BLoRA), which simplifies the adapter's inputs and outputs while increasing the adapter's rank to alleviate underfitting during fine-tuning. For low-precision deployment, we propose Quantization-Aware fine-tuning with Balanced Low-Rank Adaptation (QA-BLoRA), which aligns with the block-wise quantization and facilitates quantization-aware fine-tuning of low-rank adaptation based on the parameter merging of Q-BLoRA. Both Q-BLoRA and QA-BLoRA are easily implemented and offer the following optimizations: (i) Q-BLoRA consistently achieves state-of-the-art accuracy compared to baselines and other variants; (ii) QA-BLoRA enables the direct generation of low-precision inference models, which exhibit significant performance improvements over other low-precision models. We validate the effectiveness of Q-BLoRA and QA-BLoRA across various models and scenarios. Code will be made available at https://github.com/xiaocaigou/qbaraqahira
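The abstract does not give implementation details, so the following is only a minimal, hypothetical PyTorch sketch of the rebalancing idea it describes. It assumes that "simplifying the adapter's inputs and outputs" means averaging input channels into groups and sharing each adapter output across a group of output channels, while the adapter rank is scaled up by the same factor; the class name, the group parameter, and the grouping scheme are illustrative assumptions, not the paper's exact design.

import torch.nn as nn

class BalancedLoRALinear(nn.Module):
    # Hypothetical sketch: a "balanced" low-rank adapter over a frozen,
    # quantized base projection. Adapter inputs are averaged into groups,
    # each adapter output is shared across a group of output channels,
    # and the rank is increased by the same factor. Names and the grouping
    # scheme are assumptions for illustration, not the paper's method.
    def __init__(self, base_linear, in_features, out_features, rank=16, group=4):
        super().__init__()
        self.base = base_linear             # frozen (e.g. 4-bit quantized) base layer
        self.group = group
        self.lora_A = nn.Linear(in_features // group, rank * group, bias=False)
        self.lora_B = nn.Linear(rank * group, out_features // group, bias=False)
        nn.init.zeros_(self.lora_B.weight)  # adapter starts as a no-op, as in LoRA

    def forward(self, x):
        y = self.base(x)                                    # quantized forward pass
        # Simplify adapter input: average each group of `group` input channels.
        xs = x.reshape(*x.shape[:-1], -1, self.group).mean(dim=-1)
        delta = self.lora_B(self.lora_A(xs))                # higher-rank adapter
        # Simplify adapter output: share it across each group of output channels.
        delta = delta.repeat_interleave(self.group, dim=-1)
        return y + delta

Under these assumptions the adapter keeps the same number of trainable parameters as a standard rank-r LoRA (in_features*rank + rank*out_features) while operating at rank rank*group over simpler inputs and outputs, which is the kind of complexity-versus-trainability rebalancing the abstract argues for.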

Ao Shen, Qiang Wang, Zhiquan Lai, Dongsheng Li, Xionglve Li

Computing Technology, Computer Technology

Ao Shen, Qiang Wang, Zhiquan Lai, Dongsheng Li, Xionglve Li. Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance [EB/OL]. (2025-07-22) [2025-08-06]. https://arxiv.org/abs/2407.17029
