Self-Evolving Critique Abilities in Large Language Models
Despite their remarkable performance, Large Language Models (LLMs) face a critical challenge: providing feedback for tasks where human evaluation is difficult or where LLMs potentially outperform humans. In such scenarios, leveraging the critique ability of LLMs themselves, i.e., their capacity to identify and correct flaws, shows considerable promise. This paper explores enhancing the critique abilities of LLMs, noting that current approaches rely on human annotations or more powerful models, leaving the challenge of improving critique abilities without external supervision unresolved. We introduce SCRIT (Self-evolving CRITic), a framework that trains LLMs on self-generated data to evolve their critique abilities. To address the low quality of naively generated data, we propose a contrastive-critic approach that uses reference solutions during data synthesis to enhance the model's understanding of key concepts, and incorporates a self-validation scheme to ensure data quality. The final trained model operates without any reference solutions at inference time. Implemented with Qwen2.5-72B-Instruct, a leading LLM, SCRIT demonstrates consistent improvements across a wide range of benchmarks spanning both mathematical and scientific reasoning: a 10.0% relative gain in critique-correction accuracy and a 19.0% relative improvement in error-identification F1-score. Our analysis reveals that SCRIT's performance scales positively with data and model size, and that it enables continuous improvement through multi-round iterations.
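The abstract describes two data-synthesis mechanisms: a contrastive-critic step that conditions the critique on a reference solution, and a self-validation step that filters out low-quality samples. The following is a minimal sketch of how such a pipeline might be structured, based only on the abstract; the Problem fields, the critique/correction callables, and their signatures are hypothetical placeholders, not the authors' released implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Problem:
    """One synthesis problem; reference fields exist only at data-synthesis time."""
    question: str
    flawed_solution: str
    reference_solution: str
    reference_answer: str

# Assumed signatures of the two model calls (hypothetical, not from the paper):
#   critique_fn(question, flawed_solution, reference_solution) -> critique text
#   correct_fn(question, flawed_solution, critique) -> final answer string
CritiqueFn = Callable[[str, str, str], str]
CorrectFn = Callable[[str, str, str], str]

def synthesize_critique_data(
    critique_fn: CritiqueFn,
    correct_fn: CorrectFn,
    problems: List[Problem],
) -> List[Tuple[str, str, str]]:
    """Contrastive-critic data synthesis with self-validation filtering."""
    dataset: List[Tuple[str, str, str]] = []
    for p in problems:
        # Contrastive-critic step: condition the critique on the reference
        # solution so it is grounded in the problem's key concepts. The
        # reference is used only here, never at inference time.
        critique = critique_fn(p.question, p.flawed_solution, p.reference_solution)
        # Self-validation step: keep the sample only if the correction
        # implied by the critique recovers the known final answer.
        if correct_fn(p.question, p.flawed_solution, critique) == p.reference_answer:
            dataset.append((p.question, p.flawed_solution, critique))
    return dataset
```

The filtered triples (question, flawed solution, critique) would then be used to fine-tune the model, which at inference time critiques solutions without access to any reference.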
Fei Huang, Zhengyang Tang, Ziniu Li, Zhenyang Xiao, Tian Ding, Ruoyu Sun, Benyou Wang, Dayiheng Liu, Tianyu Liu, Bowen Yu, Junyang Lin
Computing technology, computer technology; Mathematics
Fei Huang, Zhengyang Tang, Ziniu Li, Zhenyang Xiao, Tian Ding, Ruoyu Sun, Benyou Wang, Dayiheng Liu, Tianyu Liu, Bowen Yu, Junyang Lin. Self-Evolving Critique Abilities in Large Language Models [EB/OL]. (2025-08-04) [2025-08-19]. https://arxiv.org/abs/2501.05727.