SETS: Leveraging Self-Verification and Self-Correction for Improved Test-Time Scaling
Recent advancements in Large Language Models (LLMs) have created new opportunities to enhance performance on complex reasoning tasks by leveraging test-time computation. However, existing parallel scaling methods, such as repeated sampling or reward model scoring, often suffer from premature convergence and high costs due to task-specific reward model training, while sequential methods like SELF-REFINE cannot effectively leverage increased compute. This paper introduces Self-Enhanced Test-Time Scaling (SETS), a new approach that overcomes these limitations by strategically combining parallel and sequential techniques. SETS exploits the inherent self-verification and self-correction capabilities of LLMs, unifying sampling, verification, and correction within a single framework. This innovative design facilitates efficient and scalable test-time computation for enhanced performance on complex tasks. Our comprehensive experimental results on challenging benchmarks spanning planning, reasoning, math, and coding demonstrate that SETS achieves significant performance improvements and more advantageous test-time scaling behavior than the alternatives.
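The abstract describes SETS as unifying parallel sampling with sequential self-verification and self-correction. The exact algorithm is not given here, so the following is a minimal Python sketch of that idea under assumptions: `llm_sample`, `llm_verify`, and `llm_correct` are hypothetical stand-ins for LLM calls, parallel branches are shown serially, and aggregation is a simple majority vote.

```python
import random
from collections import Counter

def sets_inference(question, llm_sample, llm_verify, llm_correct,
                   num_samples=4, max_rounds=3):
    """Hypothetical sketch of Self-Enhanced Test-Time Scaling (SETS).

    Parallel branch: draw `num_samples` independent candidate answers.
    Sequential branch: each candidate is self-verified and, when judged
    incorrect, self-corrected for up to `max_rounds` rounds.
    Aggregation: majority vote over the refined candidates.
    """
    finals = []
    for _ in range(num_samples):      # parallel sampling (serialized here)
        answer = llm_sample(question)
        for _ in range(max_rounds):   # sequential verify/correct loop
            if llm_verify(question, answer):
                break
            answer = llm_correct(question, answer)
        finals.append(answer)
    # Most common surviving answer wins the vote.
    return Counter(finals).most_common(1)[0][0]

# Toy stand-ins for the LLM calls, for demonstration only.
TRUTH = "42"
def llm_sample(q):    return random.choice(["41", "42", "43"])
def llm_verify(q, a): return a == TRUTH        # an oracle verifier
def llm_correct(q, a): return TRUTH            # a perfect corrector

random.seed(0)
print(sets_inference("What is 6*7?", llm_sample, llm_verify, llm_correct))
# → 42 (with an oracle verifier/corrector, every branch converges)
```

In practice the verifier and corrector would be additional prompts to the same LLM rather than oracles, and the vote could be replaced by a verification-weighted aggregation; this stub only illustrates how the parallel and sequential axes compose.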
Jiefeng Chen, Xinyun Chen, Jie Ren, Chengrun Yang, Ruoxi Sun, Jinsung Yoon, Sercan Ö. Arık
Computing Technology; Computer Science and Technology
Jiefeng Chen, Xinyun Chen, Jie Ren, Chengrun Yang, Ruoxi Sun, Jinsung Yoon, Sercan Ö. Arık. SETS: Leveraging Self-Verification and Self-Correction for Improved Test-Time Scaling [EB/OL]. (2025-01-31) [2025-06-30]. https://arxiv.org/abs/2501.19306.