Search-Based Correction of Reasoning Chains for Language Models
Chain-of-Thought (CoT) reasoning has advanced the capabilities and transparency of language models (LMs); however, reasoning chains can contain inaccurate statements that reduce performance and trustworthiness. To address this, we introduce a new self-correction framework that augments each reasoning step in a CoT with a latent variable indicating its veracity, enabling modeling of all possible truth assignments rather than assuming correctness throughout. To efficiently explore this expanded space, we introduce Search Corrector, a discrete search algorithm over boolean-valued veracity assignments. It efficiently performs otherwise intractable inference in the posterior distribution over veracity assignments by leveraging the LM's joint likelihood over veracity and the final answer as a proxy reward. This efficient inference-time correction method facilitates supervised fine-tuning of an Amortized Corrector by providing pseudo-labels for veracity. The Amortized Corrector generalizes self-correction, enabling accurate zero-shot veracity inference in novel contexts. Empirical results demonstrate that Search Corrector reliably identifies errors in logical (ProntoQA) and mathematical reasoning (GSM8K) benchmarks. The Amortized Corrector achieves comparable zero-shot accuracy and improves final answer accuracy by up to 25%.
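The Search Corrector described above can be sketched as a greedy local search over boolean veracity vectors, one bit per reasoning step, flipping bits whenever doing so increases a proxy reward. The sketch below is illustrative only: `reward_fn` stands in for the LM's joint likelihood over veracity and the final answer, and all names (`search_corrector`, `max_iters`) are hypothetical, not the paper's implementation.

```python
import random

def search_corrector(num_steps, reward_fn, max_iters=100, seed=0):
    """Greedy local search over boolean veracity assignments.

    reward_fn maps a tuple of booleans (one per reasoning step) to a
    scalar proxy reward; in the paper this would be the LM's joint
    likelihood over veracity and the final answer.
    """
    rng = random.Random(seed)
    # Start from the all-true assignment (assume every step is correct).
    current = tuple([True] * num_steps)
    best_reward = reward_fn(current)
    for _ in range(max_iters):
        improved = False
        # Try flipping each veracity bit in a random order.
        for i in rng.sample(range(num_steps), num_steps):
            candidate = current[:i] + (not current[i],) + current[i + 1:]
            r = reward_fn(candidate)
            if r > best_reward:  # keep the flip only if reward improves
                current, best_reward, improved = candidate, r, True
        if not improved:
            break  # local optimum reached
    return current, best_reward
```

With a toy reward that counts agreement with a ground-truth veracity vector, the search recovers that vector; in practice the reward would come from LM likelihood evaluations, and richer search strategies (e.g. beam search over flips) are possible in the same framework.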
Minsu Kim, Jean-Pierre Falet, Oliver E. Richardson, Xiaoyin Chen, Moksh Jain, Sungjin Ahn, Sungsoo Ahn, Yoshua Bengio
Computing Technology, Computer Technology
Minsu Kim, Jean-Pierre Falet, Oliver E. Richardson, Xiaoyin Chen, Moksh Jain, Sungjin Ahn, Sungsoo Ahn, Yoshua Bengio. Search-Based Correction of Reasoning Chains for Language Models [EB/OL]. (2025-05-17) [2025-06-23]. https://arxiv.org/abs/2505.11824.