
Concise Reasoning via Reinforcement Learning

Source: arXiv
Abstract

Despite significant advancements in large language models (LLMs), a major drawback of reasoning models is their enormous token usage, which increases computational cost, resource requirements, and response time. In this work, we revisit the core principles of reinforcement learning (RL) and, through mathematical analysis, demonstrate that the tendency to generate lengthy responses arises inherently from RL-based optimization during training. This finding calls into question the prevailing assumption that longer responses inherently improve reasoning accuracy. Instead, we uncover a natural correlation between conciseness and accuracy that has been largely overlooked. Moreover, we show that introducing a secondary phase of RL post-training, using a small set of problems and limited resources, can significantly shorten a model's chain of thought while maintaining or even enhancing accuracy. Finally, we validate our conclusions through extensive experimental results.
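
As a rough intuition for how RL-based optimization can favor lengthy responses: under a PPO-style objective with a terminal-only reward and generalized advantage estimation (GAE with λ < 1), the advantage credited to a token t steps before the end scales as (γλ)^t, so a negative terminal reward (an incorrect answer) exerts less per-token pressure the longer the response is. The toy calculation below is a minimal sketch of this general RL property under simplifying assumptions (zero value baseline, terminal-only reward, illustrative γ and λ values); it is not a reproduction of the paper's exact analysis.

    # Toy illustration: with a terminal-only reward and a zero value baseline,
    # GAE assigns token t (of T) the advantage A_t = (gamma*lam)^(T-1-t) * r.
    # For a negative reward, longer responses dilute the per-token penalty.
    def gae_advantages(reward, length, gamma=1.0, lam=0.95):
        return [(gamma * lam) ** (length - 1 - t) * reward for t in range(length)]

    for length in (10, 100, 1000):
        adv = gae_advantages(reward=-1.0, length=length)
        # Mean penalty shrinks toward zero as the response grows, so an
        # incorrect (negatively rewarded) answer is punished less per token
        # when it is longer.
        print(f"length={length:5d}  mean per-token advantage={sum(adv)/length:+.4f}")

Running this sketch, the mean per-token penalty falls from about -0.80 at length 10 to -0.20 at length 100 and -0.02 at length 1000, illustrating why, under these assumptions, lengthening an incorrect response can lower the average penalty the optimizer sees.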

Mehdi Fatemi, Banafsheh Rafiee, Mingjie Tang, Kartik Talamadupula

Computing Technology, Computer Technology

Mehdi Fatemi, Banafsheh Rafiee, Mingjie Tang, Kartik Talamadupula. Concise Reasoning via Reinforcement Learning [EB/OL]. (2025-04-07) [2025-05-13]. https://arxiv.org/abs/2504.05185.
