
Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability

Source: arXiv
Abstract

In generative commonsense reasoning tasks such as CommonGen, generative large language models (LLMs) compose sentences that include all given concepts. However, when focusing on instruction-following capabilities, if a prompt specifies a concept order, LLMs must generate sentences that adhere to the specified order. To address this, we propose Ordered CommonGen, a benchmark designed to evaluate the compositional generalization and instruction-following abilities of LLMs. This benchmark measures ordered coverage to assess whether concepts are generated in the specified order, enabling a simultaneous evaluation of both abilities. We conducted a comprehensive analysis using 36 LLMs and found that, while LLMs generally understand the intent of instructions, biases toward specific concept order patterns often lead to low-diversity outputs or identical results even when the concept order is altered. Moreover, even the most instruction-compliant LLM achieved only about 75% ordered coverage, highlighting the need for improvements in both instruction-following and compositional generalization capabilities.
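
The following is a minimal sketch of how the ordered-coverage idea described above could be computed, assuming simple surface-form matching and a greedy left-to-right scan; the function name `ordered_coverage` and the matching details are illustrative assumptions, not the paper's exact definition.

```python
import re

def ordered_coverage(sentence: str, concepts: list[str]) -> float:
    """Fraction of concepts that appear in the sentence in the prescribed
    left-to-right order (greedy scan; illustrative, not the paper's metric)."""
    lowered = sentence.lower()
    matched = 0
    cursor = 0
    for concept in concepts:
        # Only count a concept if it occurs at or after the previous match,
        # so out-of-order mentions do not contribute to coverage.
        m = re.search(r"\b" + re.escape(concept.lower()) + r"\w*", lowered[cursor:])
        if m is None:
            continue
        matched += 1
        cursor += m.end()
    return matched / len(concepts)

# Concepts must appear in the specified order to count.
print(ordered_coverage("The dog chased a ball across the park.", ["dog", "ball", "park"]))  # 1.0
print(ordered_coverage("In the park, a dog chased a ball.", ["dog", "ball", "park"]))       # ~0.67
```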

Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe

Subject: Computing Technology, Computer Technology

Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe. Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability [EB/OL]. (2025-06-18) [2025-07-02]. https://arxiv.org/abs/2506.15629.
