Leveraging Reasoning Model Answers to Enhance Non-Reasoning Model Capability
Recent advancements in large language models (LLMs), such as DeepSeek-R1 and OpenAI-o1, have demonstrated the significant effectiveness of test-time scaling, achieving substantial performance gains across various benchmarks. These advanced models use deliberate "thinking" steps to systematically enhance answer quality. In this paper, we propose leveraging the high-quality outputs generated by such reasoning-intensive models to improve less computationally demanding, non-reasoning models. We explore and compare methodologies for using the answers produced by reasoning models to train and improve non-reasoning models. Through straightforward Supervised Fine-Tuning (SFT) experiments on established datasets, we demonstrate consistent improvements across benchmarks, underscoring the potential of this approach for advancing the ability of models to answer questions directly.
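One practical step implied by the abstract is converting reasoning-model outputs into SFT training pairs for a non-reasoning model. The sketch below is a minimal, hypothetical illustration of that idea: it assumes the reasoning model wraps its chain of thought in `<think>...</think>` tags (a common convention, but not stated in the paper) and keeps only the final answer as the fine-tuning target. The function name and the `prompt`/`completion` field names are illustrative, not from the source.

```python
import re

def build_sft_example(question: str, reasoning_output: str) -> dict:
    """Build an SFT pair from a reasoning model's output.

    Assumes (hypothetically) that the reasoning trace is wrapped in
    <think>...</think> tags; only the final answer is kept so the
    non-reasoning model learns to answer directly.
    """
    answer = re.sub(r"<think>.*?</think>", "", reasoning_output,
                    flags=re.DOTALL).strip()
    return {"prompt": question, "completion": answer}

# Usage: one reasoning-model response becomes one direct-answer pair.
example = build_sft_example(
    "What is 2 + 2?",
    "<think>Add the two numbers: 2 + 2 = 4.</think>The answer is 4.",
)
print(example["completion"])  # -> The answer is 4.
```

The resulting `{prompt, completion}` pairs can then feed any standard SFT pipeline; the paper's contribution is comparing ways of using such reasoning-derived answers, not this specific preprocessing format.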
Haotian Wang, Han Zhao, Shuaiting Chen, Xiaoyu Tian, Sitong Zhao, Yunjie Ji, Yiping Peng, Xiangang Li
Computing technology; computer science and technology
Haotian Wang, Han Zhao, Shuaiting Chen, Xiaoyu Tian, Sitong Zhao, Yunjie Ji, Yiping Peng, Xiangang Li. Leveraging Reasoning Model Answers to Enhance Non-Reasoning Model Capability [EB/OL]. (2025-04-13) [2025-05-11]. https://arxiv.org/abs/2504.09639.