Chengyu-Bench: Benchmarking Large Language Models for Chinese Idiom Understanding and Use

Source: arXiv

Abstract

Chinese idioms (Chengyu) are concise four-character expressions steeped in history and culture, whose literal translations often fail to capture their full meaning. This complexity makes them challenging for language models to interpret and use correctly. Existing benchmarks focus on narrow tasks: multiple-choice cloze tests, isolated translation, or simple paraphrasing. We introduce Chengyu-Bench, a comprehensive benchmark featuring three tasks: (1) Evaluative Connotation, classifying idioms as positive or negative; (2) Appropriateness, detecting incorrect idiom usage in context; and (3) Open Cloze, filling blanks in longer passages without options. Chengyu-Bench comprises 2,937 human-verified examples covering 1,765 common idioms sourced from diverse corpora. We evaluate leading LLMs and find that they achieve over 95% accuracy on Evaluative Connotation, but only about 85% on Appropriateness and about 40% top-1 accuracy on Open Cloze. Error analysis reveals that most mistakes arise from fundamental misunderstandings of idiom meanings. Chengyu-Bench demonstrates that while LLMs can reliably gauge idiom sentiment, they still struggle to grasp the cultural and contextual nuances essential for proper usage. The benchmark and source code are available at: https://github.com/sofyc/ChengyuBench.
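To make the Open Cloze metric concrete, the following is a minimal sketch of how top-1 accuracy could be computed. It assumes that each blank has exactly one gold idiom and that a prediction counts as correct only on an exact string match; the abstract does not specify the scoring rule, so both are assumptions. The function name `top1_accuracy` and the toy data are hypothetical and are not taken from the ChengyuBench repository.

```python
from typing import List, Sequence


def top1_accuracy(predictions: Sequence[Sequence[str]], gold: Sequence[str]) -> float:
    """Fraction of blanks whose first-ranked prediction equals the gold idiom.

    Assumption: exact string match against a single gold answer per blank.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(
        1
        for ranked, answer in zip(predictions, gold)
        if ranked and ranked[0] == answer  # only the top-ranked guess counts
    )
    return correct / len(gold)


# Hypothetical toy data: five blanks, each with a ranked list of model guesses.
gold: List[str] = ["画蛇添足", "守株待兔", "亡羊补牢", "井底之蛙", "对牛弹琴"]
predictions: List[List[str]] = [
    ["画蛇添足", "多此一举"],  # top guess correct
    ["刻舟求剑", "守株待兔"],  # gold idiom only in second place, so incorrect
    ["亡羊补牢"],              # top guess correct
    ["坐井观天", "井底之蛙"],  # near-synonym ranked first, so incorrect
    ["对牛弹琴"],              # top guess correct
]

print(top1_accuracy(predictions, gold))  # 3 of 5 blanks correct
```

Exact matching is a deliberately strict choice: a near-synonym such as 坐井观天 for 井底之蛙 is scored as wrong, which may understate a model's ability if several idioms fit the blank.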

Authors: Yicheng Fu, Zhemin Huang, Liuxin Yang, Yumeng Lu, Zhongdongming Dai

Subject classification: Chinese linguistics

Recommended citation: Yicheng Fu, Zhemin Huang, Liuxin Yang, Yumeng Lu, Zhongdongming Dai. Chengyu-Bench: Benchmarking Large Language Models for Chinese Idiom Understanding and Use [EB/OL]. (2025-06-22) [2025-07-16]. https://arxiv.org/abs/2506.18105.
