
Concise Reasoning via Reinforcement Learning

Source: arXiv
Abstract

Despite significant advancements in large language models (LLMs), a major drawback of reasoning models is their enormous token usage, which increases computational cost, resource requirements, and response time. In this work, we revisit the core principles of reinforcement learning (RL) and, through mathematical analysis, demonstrate that the tendency to generate lengthy responses arises inherently from RL-based optimization during training. This finding calls into question the prevailing assumption that longer responses inherently improve reasoning accuracy. Instead, we uncover a natural correlation between conciseness and accuracy that has been largely overlooked. Moreover, we show that introducing a secondary phase of RL post-training, using a small set of problems and limited resources, can significantly shorten a model's chain of thought while maintaining or even enhancing accuracy. Finally, we validate our conclusions through extensive experimental results.
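
As a rough intuition for how RL-based optimization can favor lengthy responses: under a PPO-style objective with a terminal-only reward and generalized advantage estimation (GAE with λ < 1), the advantage credited to a token t steps before the end scales as (γλ)^t, so a negative terminal reward (an incorrect answer) exerts less per-token pressure the longer the response is. The toy calculation below is a minimal sketch of this general RL property under simplifying assumptions (zero value baseline, terminal-only reward, illustrative γ and λ values); it is not a reproduction of the paper's exact analysis.

    # Toy illustration: with a terminal-only reward and a zero value baseline,
    # GAE assigns token t (of T) the advantage A_t = (gamma*lam)^(T-1-t) * r.
    # For a negative reward, longer responses dilute the per-token penalty.
    def gae_advantages(reward, length, gamma=1.0, lam=0.95):
        return [(gamma * lam) ** (length - 1 - t) * reward for t in range(length)]

    for length in (10, 100, 1000):
        adv = gae_advantages(reward=-1.0, length=length)
        # Mean penalty shrinks toward zero as the response grows, so an
        # incorrect (negatively rewarded) answer is punished less per token
        # when it is longer.
        print(f"length={length:5d}  mean per-token advantage={sum(adv)/length:+.4f}")

Running this sketch, the mean per-token penalty falls from about -0.80 at length 10 to -0.20 at length 100 and -0.02 at length 1000, illustrating why, under these assumptions, lengthening an incorrect response can lower the average penalty the optimizer sees.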

Mehdi Fatemi, Banafsheh Rafiee, Mingjie Tang, Kartik Talamadupula

Computing Technology, Computer Technology

Mehdi Fatemi, Banafsheh Rafiee, Mingjie Tang, Kartik Talamadupula. Concise Reasoning via Reinforcement Learning [EB/OL]. (2025-04-07) [2025-05-13]. https://arxiv.org/abs/2504.05185.
