
Leveraging Reasoning Model Answers to Enhance Non-Reasoning Model Capability


Source: arXiv
Abstract

Recent advancements in large language models (LLMs), such as DeepSeek-R1 and OpenAI-o1, have demonstrated the significant effectiveness of test-time scaling, achieving substantial performance gains across various benchmarks. These advanced models utilize deliberate "thinking" steps to systematically enhance answer quality. In this paper, we propose leveraging these high-quality outputs generated by reasoning-intensive models to improve less computationally demanding, non-reasoning models. We explore and compare methodologies for utilizing the answers produced by reasoning models to train and improve non-reasoning models. Through straightforward Supervised Fine-Tuning (SFT) experiments on established benchmarks, we demonstrate consistent improvements across various benchmarks, underscoring the potential of this approach for advancing the ability of models to answer questions directly.
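The recipe described above, distilling a reasoning model's final answers (with the deliberate "thinking" portion removed) into supervised fine-tuning data for a non-reasoning model, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' released pipeline: the model checkpoints, the <think>...</think> delimiter, and the toy prompt/output lists are hypothetical, and a real run would add batching, chat templates, and a full dataset.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical student checkpoint -- substitute the non-reasoning model being improved.
STUDENT_MODEL = "Qwen/Qwen2.5-7B-Instruct"

def strip_thinking(text: str) -> str:
    """Keep only the final answer, dropping a <think>...</think> deliberation block if present."""
    if "</think>" in text:
        return text.split("</think>", 1)[1].strip()
    return text.strip()

# Toy stand-ins for prompts and the reasoning model's raw generations (assumptions).
prompts = ["What is 17 * 24?"]
reasoning_outputs = ["<think>17 * 24 = 17 * 20 + 17 * 4 = 340 + 68.</think>17 * 24 = 408."]

# 1) Build SFT pairs from the reasoning model's answers only.
sft_pairs = [{"prompt": p, "answer": strip_thinking(o)}
             for p, o in zip(prompts, reasoning_outputs)]

# 2) Standard causal-LM fine-tuning of the student, with the loss masked to answer tokens.
tokenizer = AutoTokenizer.from_pretrained(STUDENT_MODEL)
model = AutoModelForCausalLM.from_pretrained(STUDENT_MODEL)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

model.train()
for pair in sft_pairs:
    prompt_ids = tokenizer(pair["prompt"], return_tensors="pt").input_ids
    answer_ids = tokenizer(pair["answer"] + tokenizer.eos_token, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, answer_ids], dim=1)
    labels = input_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # ignore prompt tokens in the loss
    loss = model(input_ids=input_ids, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

The paper compares several ways of using reasoning-model answers; the sketch shows only the simplest answer-only distillation variant implied by the abstract.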

Haotian Wang, Han Zhao, Shuaiting Chen, Xiaoyu Tian, Sitong Zhao, Yunjie Ji, Yiping Peng, Xiangang Li

Subject: Computing Technology, Computer Technology

Haotian Wang, Han Zhao, Shuaiting Chen, Xiaoyu Tian, Sitong Zhao, Yunjie Ji, Yiping Peng, Xiangang Li. Leveraging Reasoning Model Answers to Enhance Non-Reasoning Model Capability [EB/OL]. (2025-04-13) [2025-05-11]. https://arxiv.org/abs/2504.09639.
