CodeReasoner: Enhancing the Code Reasoning Ability with Reinforcement Learning
Code reasoning is a fundamental capability for large language models (LLMs) in the code domain. It involves understanding and predicting a program's execution behavior, such as determining the output for a given input or whether a specific statement will be executed. This capability is essential for downstream tasks like debugging, code generation, and program repair. Prior approaches mainly rely on supervised fine-tuning to improve performance on code reasoning tasks. However, they often show limited gains and fail to generalize across diverse scenarios. We argue this is due to two core issues: the low quality of training data and the limitations of supervised fine-tuning, which struggles to teach general reasoning skills. To address these challenges, we propose CodeReasoner, a framework that spans both dataset construction and a two-stage training process. First, we introduce a method to construct datasets that focus on the core execution logic of Python programs. Next, we apply instruction tuning to inject execution-specific knowledge distilled from a powerful teacher model. We then enhance reasoning and generalization through GRPO (Group Relative Policy Optimization) reinforcement learning on top of the fine-tuned model. Experiments on three widely used code reasoning benchmarks show that CodeReasoner improves performance by 27.1% to 40.2% over prior methods with a 7B model. Notably, the 7B model matches GPT-4o on key tasks such as input/output and coverage prediction. When scaled to 14B, CodeReasoner outperforms GPT-4o across all benchmarks. Ablation studies confirm the effectiveness of each training stage and highlight the importance of reasoning chains.
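To make the task concrete, here is a minimal, hypothetical illustration of the code reasoning tasks the abstract describes (output prediction and coverage prediction); the function and values are invented for illustration and are not drawn from the paper's dataset:

```python
# Hypothetical output-prediction task: given the program and the call,
# the model must predict the printed result without running the code.
def f(xs):
    total = 0
    for x in xs:
        if x % 2 == 0:   # coverage prediction asks: does the next line execute?
            total += x
    return total

print(f([1, 2, 3, 4]))   # a correct model predicts the output: 6
```

For reference, the standard GRPO formulation (from which the paper's second training stage takes its name) replaces a learned value model with group-relative advantages; the paper's exact objective may differ in details such as KL regularization. Sampling $G$ responses per prompt with rewards $r_1, \dots, r_G$, the advantage of the $i$-th response is

\[
\hat{A}_i = \frac{r_i - \operatorname{mean}(\{r_1,\dots,r_G\})}{\operatorname{std}(\{r_1,\dots,r_G\})},
\]

which is then plugged into a PPO-style clipped policy objective.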
Lingxiao Tang, He Ye, Zhongxin Liu, Xiaoxue Ren, Lingfeng Bao
Computing technology, computer technology
Lingxiao Tang, He Ye, Zhongxin Liu, Xiaoxue Ren, Lingfeng Bao. CodeReasoner: Enhancing the Code Reasoning Ability with Reinforcement Learning [EB/OL]. (2025-07-23) [2025-08-23]. https://arxiv.org/abs/2507.17548