
One-shot Entropy Minimization


Abstract

We trained 13,440 large language models and found that entropy minimization requires only a single unlabeled example and 10 optimization steps to achieve performance improvements comparable to or even greater than those obtained using thousands of examples and carefully designed rewards in rule-based reinforcement learning. This striking result may prompt a rethinking of post-training paradigms for large language models. Our code is available at https://github.com/zitian-gao/one-shot-em.
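The abstract does not spell out the objective, so the following is only a toy sketch of what minimizing entropy for 10 steps can look like. It assumes the quantity being minimized is the mean Shannon entropy of the next-token distributions, and it treats the logits themselves as the trainable parameters instead of model weights. The function names (`mean_token_entropy`, `minimize_entropy`) and the learning rate are hypothetical and are not taken from the authors' repository.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one token's logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy in nats: H(p) = -sum_i p_i * log(p_i).
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mean_token_entropy(token_logits):
    # Average entropy over all token positions (the assumed objective).
    return sum(entropy(softmax(z)) for z in token_logits) / len(token_logits)

def entropy_grad(logits):
    # Analytic gradient of H(softmax(z)) with respect to z:
    # dH/dz_i = -p_i * (log(p_i) + H).
    p = softmax(logits)
    h = entropy(p)
    return [-pi * (math.log(pi) + h) if pi > 0.0 else 0.0 for pi in p]

def minimize_entropy(token_logits, steps=10, lr=1.0):
    # Toy assumption: plain gradient descent directly on the logits.
    # Each position is updated with the gradient of its own entropy;
    # the 1/N factor of the mean is folded into lr.
    logits = [list(z) for z in token_logits]
    history = [mean_token_entropy(logits)]
    for _ in range(steps):
        for z in logits:
            g = entropy_grad(z)
            for i in range(len(z)):
                z[i] -= lr * g[i]
        history.append(mean_token_entropy(logits))
    return logits, history

if __name__ == "__main__":
    # Two token positions with a three-word vocabulary (made-up numbers).
    start = [[2.0, 1.0, 0.5], [0.3, 0.2, 1.5]]
    _, history = minimize_entropy(start, steps=10)
    for step, value in enumerate(history):
        print(f"step {step:2d}: mean entropy = {value:.4f}")
```

In this sketch each distribution sharpens around its currently most likely token, so the mean entropy falls over the 10 steps. In an actual language model the same loss would be backpropagated into the weights, which this toy version does not attempt to reproduce.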

Authors: Zitian Gao, Lynx Chen, Joey Zhou, Bryan Dai


Subject classification: computing technology, computer technology

Recommended citation: Zitian Gao, Lynx Chen, Joey Zhou, Bryan Dai. One-shot Entropy Minimization [EB/OL]. (2025-05-26) [2025-07-02]. https://arxiv.org/abs/2505.20282.
