Think Twice, Act Once: A Co-Evolution Framework of LLM and RL for Large-Scale Decision Making

Source: arXiv

Abstract

Recent advancements in Large Language Models (LLMs) and Reinforcement Learning (RL) have shown significant promise in decision-making tasks. Nevertheless, for large-scale industrial decision problems, both approaches face distinct challenges: LLMs lack real-time long-sequence decision-making capabilities, while RL struggles with sample efficiency in vast action spaces. To bridge this gap, we propose Agents Co-Evolution (ACE), a synergistic framework between LLMs and RL agents for large-scale decision-making scenarios. ACE introduces a dual-role trajectory refinement mechanism in which LLMs act as both Policy Actor and Value Critic during RL training: the Actor refines suboptimal actions via multi-step reasoning and environment validation, while the Critic performs temporal credit assignment through trajectory-level reward shaping. Concurrently, the RL agent enhances the LLMs' task-specific decision-making with high-quality fine-tuning datasets generated via prioritized experience replay. In extensive experiments across multiple power grid operation challenges with action spaces exceeding 60K discrete actions, ACE outperforms existing RL-based and LLM-based methods.
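The abstract states that the RL agent supplies fine-tuning data to the LLM "via prioritized experience replay" but gives no further detail on this page. As a rough illustration only, the sketch below uses the standard proportional prioritized-replay rule P(i) = p_i^α / Σ_k p_k^α and turns sampled transitions into prompt/response pairs. The `Transition` fields, the class and function names, the priority signal, and the prompt format are all hypothetical assumptions, not ACE's actual design.

```python
import random
from dataclasses import dataclass


@dataclass
class Transition:
    # Hypothetical record of one RL step; field names are illustrative only.
    state: str
    action: int
    reward: float


class PrioritizedReplayBuffer:
    """Proportional prioritized replay: P(i) = p_i^alpha / sum_k p_k^alpha."""

    def __init__(self, alpha: float = 0.6, eps: float = 1e-6) -> None:
        self.alpha = alpha  # 0 gives uniform sampling, 1 is fully proportional to priority
        self.eps = eps      # small offset so zero-priority items stay sampleable
        self.items: list[Transition] = []
        self.scaled: list[float] = []  # stores (|p_i| + eps)^alpha

    def add(self, transition: Transition, priority: float) -> None:
        # Priority is often |TD error|; here it is whatever score the caller supplies (assumption).
        self.items.append(transition)
        self.scaled.append((abs(priority) + self.eps) ** self.alpha)

    def probabilities(self) -> list[float]:
        total = sum(self.scaled)
        return [s / total for s in self.scaled]

    def sample(self, k: int, rng: random.Random) -> list[Transition]:
        # Sampling with replacement, weighted by the scaled priorities.
        return rng.choices(self.items, weights=self.scaled, k=k)


def to_sft_example(t: Transition) -> dict[str, str]:
    # Hypothetical prompt/response format for LLM fine-tuning; not taken from the paper.
    return {
        "prompt": f"Grid state: {t.state}\nSelect a discrete action ID.",
        "response": str(t.action),
    }


def build_finetune_dataset(buffer: PrioritizedReplayBuffer, k: int, seed: int = 0) -> list[dict[str, str]]:
    # Draw k prioritized samples and convert each into a fine-tuning example.
    rng = random.Random(seed)
    return [to_sft_example(t) for t in buffer.sample(k, rng)]


if __name__ == "__main__":
    buf = PrioritizedReplayBuffer(alpha=1.0, eps=0.0)
    buf.add(Transition("low load", action=3, reward=0.1), priority=1.0)
    buf.add(Transition("overload on line 7", action=42, reward=-1.0), priority=3.0)
    print(buf.probabilities())  # [0.25, 0.75]
    print(build_finetune_dataset(buf, k=2))
```

With `alpha=1.0` the higher-priority transition is drawn three times as often as the other. A full prioritized-replay setup for RL training would also apply importance-sampling weights to correct the resulting sampling bias; that step is omitted here because the samples are only used to assemble a dataset.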

Authors: Xu Wan, Wenyue Xu, Chao Yang, Mingyang Sun

Affiliations:

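Subject classification: Power generation and power plants; power transmission and distribution engineering; automation technology and automation equipment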

Recommended citation: Xu Wan, Wenyue Xu, Chao Yang, Mingyang Sun. Think Twice, Act Once: A Co-Evolution Framework of LLM and RL for Large-Scale Decision Making[EB/OL]. (2025-06-03)[2025-07-02]. https://arxiv.org/abs/2506.02522.
