Search and Refine During Think: Autonomous Retrieval-Augmented Reasoning of LLMs
Large language models have demonstrated impressive reasoning capabilities but are inherently limited by their knowledge reservoir. Retrieval-augmented reasoning mitigates this limitation by allowing LLMs to query external resources, but existing methods often retrieve irrelevant or noisy information, hindering accurate reasoning. In this paper, we propose AutoRefine, a reinforcement learning post-training framework that adopts a new "search-and-refine-during-think" paradigm. AutoRefine introduces explicit knowledge refinement steps between successive search calls, enabling the model to iteratively filter, distill, and organize evidence before generating an answer. Furthermore, we incorporate tailored retrieval-specific rewards alongside answer correctness rewards using group relative policy optimization. Experiments on single-hop and multi-hop QA benchmarks demonstrate that AutoRefine significantly outperforms existing approaches, particularly in complex, multi-hop reasoning scenarios. Detailed analysis shows that AutoRefine issues frequent, higher-quality searches and synthesizes evidence effectively.
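To make the reward design concrete, the following is a minimal sketch of how an answer-correctness reward and a retrieval-specific reward might be combined and then normalized within a group of sampled rollouts, as group relative policy optimization does. The abstract does not specify the reward formula, so the additive form, the `retrieval_weight` parameter, and both function names are assumptions made for illustration, not the authors' implementation.

```python
from statistics import mean, pstdev


def combined_reward(answer_score: float, retrieval_score: float,
                    retrieval_weight: float = 0.1) -> float:
    """Hypothetical additive mix of the two reward signals.

    The weighting scheme is an assumption; the abstract only states that
    retrieval-specific rewards are used alongside answer correctness rewards.
    """
    return answer_score + retrieval_weight * retrieval_score


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each rollout's reward against its own sampling group.

    Each advantage is (reward - group mean) / (group std + eps). The
    population standard deviation is used here; this choice is an assumption.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four rollouts for one question: (answer_score, retrieval_score) pairs.
rollouts = [(1.0, 1.0), (0.0, 1.0), (1.0, 0.0), (0.0, 0.0)]
rewards = [combined_reward(a, s) for a, s in rollouts]
advantages = group_relative_advantages(rewards)
```

Under this sketch, a rollout that answers correctly and also retrieves useful evidence receives the largest advantage in its group, while the advantages across the group sum to approximately zero.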
Yaorui Shi, Sihang Li, Chang Wu, Zhiyuan Liu, Junfeng Fang, Hengxing Cai, An Zhang, Xiang Wang
Computing technology, computer technology
Yaorui Shi, Sihang Li, Chang Wu, Zhiyuan Liu, Junfeng Fang, Hengxing Cai, An Zhang, Xiang Wang. Search and Refine During Think: Autonomous Retrieval-Augmented Reasoning of LLMs [EB/OL]. (2025-05-16) [2025-06-13]. https://arxiv.org/abs/2505.11277.