Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability
In generative commonsense reasoning tasks such as CommonGen, generative large language models (LLMs) compose sentences that include all given concepts. However, when focusing on instruction-following capabilities, if a prompt specifies a concept order, LLMs must generate sentences that adhere to the specified order. To address this, we propose Ordered CommonGen, a benchmark designed to evaluate the compositional generalization and instruction-following abilities of LLMs. This benchmark measures ordered coverage to assess whether concepts are generated in the specified order, enabling a simultaneous evaluation of both abilities. We conducted a comprehensive analysis using 36 LLMs and found that, while LLMs generally understand the intent of instructions, biases toward specific concept order patterns often lead to low-diversity outputs or identical results even when the concept order is altered. Moreover, even the most instruction-compliant LLM achieved only about 75% ordered coverage, highlighting the need for improvements in both instruction-following and compositional generalization capabilities.
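To make the ordered coverage metric concrete, here is a minimal illustrative sketch in Python. It assumes ordered coverage is the fraction of given concepts that appear in the output in the specified relative order; the paper's exact definition (e.g., lemma or inflection matching, binary vs. fractional scoring) may differ, so treat this as a hypothetical approximation rather than the benchmark's implementation.

```python
import re

def ordered_coverage(concepts: list[str], sentence: str) -> float:
    """Fraction of concepts appearing in the sentence in the
    specified relative order, via a greedy left-to-right scan.

    Illustrative approximation only; the Ordered CommonGen metric
    may use stricter or looser matching (e.g., lemmatization).
    """
    tokens = re.findall(r"\w+", sentence.lower())
    matched = 0
    pos = 0  # earliest token index the next concept may occupy
    for concept in concepts:
        try:
            # Find the concept at or after the last matched position.
            idx = tokens.index(concept.lower(), pos)
        except ValueError:
            continue  # concept missing, or only appears out of order
        matched += 1
        pos = idx + 1
    return matched / len(concepts)

# Example: the order "dog, ball, park" is fully respected -> 1.0
print(ordered_coverage(["dog", "ball", "park"],
                       "A dog chased the ball across the park."))
```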
Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe
Computing Technology, Computer Technology
Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe. Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability [EB/OL]. (2025-06-18) [2025-07-02]. https://arxiv.org/abs/2506.15629