
L0: Reinforcement Learning to Become General Agents

Source: arXiv
Abstract

Training large language models (LLMs) to act as autonomous agents for multi-turn, long-horizon tasks poses significant challenges in scalability and training efficiency. To address this, we introduce L-Zero (L0), a scalable, end-to-end training pipeline for general-purpose agents. Featuring a low-cost, extensible, sandboxed concurrent agent worker pool, L0 lowers the barrier to applying reinforcement learning in complex environments. We also introduce NB-Agent, the agent scaffold within L0, which operates in a "code-as-action" fashion via a Read-Eval-Print Loop (REPL). We evaluate L0 on factual question-answering benchmarks. Our experiments demonstrate that a base model can develop robust problem-solving skills using Reinforcement Learning with Verifiable Rewards (RLVR) alone. On the Qwen2.5-7B-Instruct model, our method boosts accuracy on SimpleQA from 30% to 80% and on HotpotQA from 22% to 41%. We have open-sourced the entire L0 system, including our L0 series models, the NB-Agent, a complete training pipeline, and the corresponding training recipes, at https://github.com/cmriat/l0.
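The abstract's two core ideas, "code-as-action" via a persistent REPL and training against verifiable rewards, can be pictured with a short sketch. This is a minimal illustration only, not the released NB-Agent implementation; the class name `ReplAgent`, the `generate_action` stub, and the exact-match reward normalization are assumptions made for illustration.

```python
# Illustrative sketch of a code-as-action agent loop with a verifiable
# reward, loosely following the ideas described in the abstract.
# NOTE: hypothetical; the actual NB-Agent lives at
# https://github.com/cmriat/l0 and may differ substantially.
import contextlib
import io


class ReplAgent:
    """Agent whose actions are Python snippets run in a persistent REPL."""

    def __init__(self):
        self.namespace = {}  # REPL state persists across turns (long horizon)
        self.history = []    # (action, observation) pairs fed back to the policy

    def generate_action(self, task: str) -> str:
        # Placeholder for the policy LLM: given the task and self.history,
        # emit the next Python snippet to execute. Hypothetical stub.
        raise NotImplementedError("plug an LLM call in here")

    def step(self, task: str) -> str:
        action = self.generate_action(task)       # code as action
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            try:
                exec(action, self.namespace)       # evaluate in the shared REPL
            except Exception as exc:               # errors become observations too
                print(f"Error: {exc!r}")
        observation = buf.getvalue()
        self.history.append((action, observation))
        return observation


def verifiable_reward(predicted: str, gold: str) -> float:
    """Binary exact-match reward of the kind RLVR uses for factual QA.

    The normalization here is a guess; the paper's recipe may differ.
    """
    norm = lambda s: " ".join(s.lower().strip().split())
    return 1.0 if norm(predicted) == norm(gold) else 0.0
```

In an RLVR setup like the one the abstract describes, trajectories from many sandboxed agent workers would be scored with such a reward and used to update the policy directly; no learned reward model is needed, since correctness on QA benchmarks is checkable from the final answer.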

Junjie Zhang, Jingyi Xi, Zhuoyang Song, Junyu Lu, Yuhua Ke, Ting Sun, Yukun Yang, Jiaxing Zhang, Songxin Zhang, Zejian Xie

Subject: Computing Technology, Computer Technology

Junjie Zhang, Jingyi Xi, Zhuoyang Song, Junyu Lu, Yuhua Ke, Ting Sun, Yukun Yang, Jiaxing Zhang, Songxin Zhang, Zejian Xie. L0: Reinforcement Learning to Become General Agents [EB/OL]. (2025-06-30) [2025-07-16]. https://arxiv.org/abs/2506.23667.
