Emergent LLM behaviors are observationally equivalent to data leakage

Abstract

Ashery et al. recently argue that large language models (LLMs), when paired to play a classic "naming game," spontaneously develop linguistic conventions reminiscent of human social norms. Here, we show that their results are better explained by data leakage: the models simply reproduce conventions they already encountered during pre-training. Despite the authors' mitigation measures, we provide multiple analyses demonstrating that the LLMs recognize the structure of the coordination game and recall its outcomes, rather than exhibit "emergent" conventions. Consequently, the observed behaviors are indistinguishable from memorization of the training corpus. We conclude by pointing to potential alternative strategies and reflecting more generally on the place of LLMs for social science models.

Authors: Christopher Barrie, Petter Törnberg

Subject classification: Linguistics; Computing and Computer Technology

Recommended citation: Christopher Barrie, Petter Törnberg. Emergent LLM behaviors are observationally equivalent to data leakage [EB/OL]. (2025-05-26) [2025-07-02]. https://arxiv.org/abs/2505.23796
