Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability
In generative commonsense reasoning tasks such as CommonGen, generative large language models (LLMs) compose sentences that include all given concepts. However, when focusing on instruction-following capabilities, if a prompt specifies a concept order, LLMs must generate sentences that adhere to the specified order. To address this, we propose Ordered CommonGen, a benchmark designed to evaluate the compositional generalization and instruction-following abilities of LLMs. This benchmark measures ordered coverage to assess whether concepts are generated in the specified order, enabling a simultaneous evaluation of both abilities. We conducted a comprehensive analysis using 36 LLMs and found that, while LLMs generally understand the intent of instructions, biases toward specific concept order patterns often lead to low-diversity outputs or identical results even when the concept order is altered. Moreover, even the most instruction-compliant LLM achieved only about 75% ordered coverage, highlighting the need for improvements in both instruction-following and compositional generalization capabilities.
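To make the ordered coverage metric concrete, here is a minimal illustrative sketch in Python. It assumes ordered coverage is the fraction of given concepts that appear in the output in the specified relative order; the paper's exact definition (e.g., lemma or inflection matching, binary vs. fractional scoring) may differ, so treat this as a hypothetical approximation rather than the benchmark's implementation.

```python
import re

def ordered_coverage(concepts: list[str], sentence: str) -> float:
    """Fraction of concepts appearing in the sentence in the
    specified relative order, via a greedy left-to-right scan.

    Illustrative approximation only; the Ordered CommonGen metric
    may use stricter or looser matching (e.g., lemmatization).
    """
    tokens = re.findall(r"\w+", sentence.lower())
    matched = 0
    pos = 0  # earliest token index the next concept may occupy
    for concept in concepts:
        try:
            # Find the concept at or after the last matched position.
            idx = tokens.index(concept.lower(), pos)
        except ValueError:
            continue  # concept missing, or only appears out of order
        matched += 1
        pos = idx + 1
    return matched / len(concepts)

# Example: the order "dog, ball, park" is fully respected -> 1.0
print(ordered_coverage(["dog", "ball", "park"],
                       "A dog chased the ball across the park."))
```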
Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe
Computing Technology, Computer Technology
Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe. Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability [EB/OL]. (2025-06-18) [2025-07-02]. https://arxiv.org/abs/2506.15629