
FAIT: Fault-Aware Fine-Tuning for Better Code Generation

Source: arXiv

Abstract

Modern instruction-tuned large language models (LLMs) have made remarkable progress in code generation. However, LLMs fine-tuned with standard supervised fine-tuning (SFT) sometimes generate plausible-looking but functionally incorrect code variants. This issue likely stems from a limitation of standard SFT, which treats all tokens equally during optimization and fails to emphasize error-sensitive segments, i.e., the specific code differences between correct implementations and similar incorrect variants. To address this problem, we propose Fault-Aware Fine-Tuning (FAIT), a novel fine-tuning technique that enhances LLMs' code generation by (1) extracting multi-granularity (line-level and token-level) differences between correct and incorrect yet similar implementations to identify error-sensitive segments, and (2) prioritizing those segments during training via dynamic loss weighting. Through extensive experiments on seven LLMs across three widely-used benchmarks, our method achieves an average relative improvement of 6.9% on pass@1 with just one epoch of training, with some enhanced 6.7B LLMs outperforming closed-source models such as GPT-3.5-Turbo. Furthermore, our fine-tuning technique demonstrates strong generalization, with performance improvements ranging from 3.8% to 19.1% across diverse instruction-tuned LLMs, and our ablation studies confirm the contributions of the different granularities of differences and of the loss function components.
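To make the loss-weighting idea concrete, the following is a minimal PyTorch-style sketch of how token-level differences between a correct implementation and a similar incorrect variant could drive a weighted cross-entropy loss. The helper names (error_sensitive_mask, fault_aware_loss) and the fixed upweighting factor alpha are illustrative assumptions, not the paper's actual formulation, which weights segments dynamically during training.

```python
import difflib
import torch
import torch.nn.functional as F

def error_sensitive_mask(correct_tokens, incorrect_tokens):
    """Mark tokens of the correct implementation that differ from a
    similar incorrect variant (token-level granularity).
    Illustrative sketch, not the paper's exact procedure."""
    mask = torch.zeros(len(correct_tokens))
    matcher = difflib.SequenceMatcher(a=incorrect_tokens, b=correct_tokens)
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":          # tokens inserted/replaced in the correct code
            mask[j1:j2] = 1.0
    return mask

def fault_aware_loss(logits, target_ids, sensitive_mask, alpha=2.0):
    """Cross-entropy with error-sensitive tokens upweighted by a fixed
    factor `alpha` (an assumed, simplified weighting scheme).
    logits: (seq_len, vocab); target_ids, sensitive_mask: (seq_len,)."""
    per_token = F.cross_entropy(logits, target_ids, reduction="none")
    weights = 1.0 + (alpha - 1.0) * sensitive_mask   # 1.0 elsewhere, alpha on diffs
    return (weights * per_token).sum() / weights.sum()

# Toy usage: the mask highlights where the correct code diverges from a buggy variant.
correct   = ["return", "a", "+", "b"]
incorrect = ["return", "a", "-", "b"]
print(error_sensitive_mask(correct, incorrect))   # tensor([0., 0., 1., 0.])
```

In this sketch the normalization divides by the sum of weights so that upweighting error-sensitive tokens does not inflate the overall loss scale; the actual FAIT objective may normalize and schedule its weights differently.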

Lishui Fan, Zhongxin Liu, Haoye Wang, Lingfeng Bao, Xin Xia, Shanping Li

Subject: Computing Technology, Computer Technology

Lishui Fan, Zhongxin Liu, Haoye Wang, Lingfeng Bao, Xin Xia, Shanping Li. FAIT: Fault-Aware Fine-Tuning for Better Code Generation [EB/OL]. (2025-03-21) [2025-07-09]. https://arxiv.org/abs/2503.16913.
