Leveraging Reasoning Model Answers to Enhance Non-Reasoning Model Capability
Recent advancements in large language models (LLMs), such as DeepSeek-R1 and OpenAI-o1, have demonstrated the significant effectiveness of test-time scaling, achieving substantial performance gains across various benchmarks. These advanced models use deliberate "thinking" steps to systematically enhance answer quality. In this paper, we propose leveraging the high-quality outputs generated by such reasoning-intensive models to improve less computationally demanding, non-reasoning models. We explore and compare methodologies for using the answers produced by reasoning models to train and improve non-reasoning models. Through straightforward Supervised Fine-Tuning (SFT) experiments on established datasets, we demonstrate consistent improvements across benchmarks, underscoring the potential of this approach for advancing the ability of models to answer questions directly.
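One practical step implied by the abstract is converting reasoning-model outputs into SFT training pairs for a non-reasoning model. The sketch below is a minimal, hypothetical illustration of that idea: it assumes the reasoning model wraps its chain of thought in `<think>...</think>` tags (a common convention, but not stated in the paper) and keeps only the final answer as the fine-tuning target. The function name and the `prompt`/`completion` field names are illustrative, not from the source.

```python
import re

def build_sft_example(question: str, reasoning_output: str) -> dict:
    """Build an SFT pair from a reasoning model's output.

    Assumes (hypothetically) that the reasoning trace is wrapped in
    <think>...</think> tags; only the final answer is kept so the
    non-reasoning model learns to answer directly.
    """
    answer = re.sub(r"<think>.*?</think>", "", reasoning_output,
                    flags=re.DOTALL).strip()
    return {"prompt": question, "completion": answer}

# Usage: one reasoning-model response becomes one direct-answer pair.
example = build_sft_example(
    "What is 2 + 2?",
    "<think>Add the two numbers: 2 + 2 = 4.</think>The answer is 4.",
)
print(example["completion"])  # -> The answer is 4.
```

The resulting `{prompt, completion}` pairs can then feed any standard SFT pipeline; the paper's contribution is comparing ways of using such reasoning-derived answers, not this specific preprocessing format.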
Haotian Wang, Han Zhao, Shuaiting Chen, Xiaoyu Tian, Sitong Zhao, Yunjie Ji, Yiping Peng, Xiangang Li
Computing technology; computer science and technology
Haotian Wang, Han Zhao, Shuaiting Chen, Xiaoyu Tian, Sitong Zhao, Yunjie Ji, Yiping Peng, Xiangang Li. Leveraging Reasoning Model Answers to Enhance Non-Reasoning Model Capability [EB/OL]. (2025-04-13) [2025-05-11]. https://arxiv.org/abs/2504.09639.