Can Large Language Models Be Trusted Paper Reviewers? A Feasibility Study

Abstract

Academic paper review typically requires substantial time, expertise, and human resources. Large Language Models (LLMs) present a promising method for automating the review process due to their extensive training data, broad knowledge base, and relatively low usage cost. This work explores the feasibility of using LLMs for academic paper review by proposing an automated review system. The system integrates Retrieval Augmented Generation (RAG), the AutoGen multi-agent system, and Chain-of-Thought prompting to support tasks such as format checking, standardized evaluation, comment generation, and scoring. Experiments conducted on 290 submissions from the WASA 2024 conference using GPT-4o show that LLM-based review significantly reduces review time (average 2.48 hours) and cost (average $104.28 USD). However, the similarity between LLM-selected papers and actual accepted papers remains low (average 38.6%), indicating issues such as hallucination, lack of independent judgment, and retrieval preferences. Therefore, it is recommended to use LLMs as assistive tools to support human reviewers, rather than to replace them.

Authors: Chuanlei Li, Xu Hu, Minghui Xu, Kun Li, Yue Zhang, Xiuzhen Cheng

Subject classification: Computing technology, computer technology; automation technology, automation equipment

Recommended citation: Chuanlei Li, Xu Hu, Minghui Xu, Kun Li, Yue Zhang, Xiuzhen Cheng. Can Large Language Models Be Trusted Paper Reviewers? A Feasibility Study [EB/OL]. (2025-06-18) [2025-07-21]. https://arxiv.org/abs/2506.17311.
