
Automated Creativity Evaluation for Large Language Models: A Reference-Based Approach


Abstract

Creative writing is a key capability of Large Language Models (LLMs), with potential applications in literature, storytelling, and various creative domains. However, evaluating the creativity of machine-generated texts remains a significant challenge, as existing methods either rely on costly manual annotations or fail to align closely with human assessments. In this paper, we propose an effective automated evaluation method based on the Torrance Test of Creative Writing (TTCW), which evaluates creativity as a product. Our method employs a reference-based Likert-style approach, scoring generated creative texts relative to high-quality reference texts across various tests. Experimental results demonstrate that our method significantly improves the alignment between LLM evaluations and human assessments, achieving a pairwise accuracy of 0.75 (+15%).
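The abstract reports alignment with human judgment as a pairwise accuracy but does not define the metric. One common reading, sketched below purely as an assumption, is the fraction of text pairs that the automated scores rank in the same order as the human scores. The function name `pairwise_accuracy`, the tie-handling rule, and the sample scores are all hypothetical and are not taken from the paper.

```python
from itertools import combinations
from typing import Sequence


def pairwise_accuracy(auto_scores: Sequence[float],
                      human_scores: Sequence[float]) -> float:
    """Fraction of text pairs ordered the same way by both score lists.

    Assumption of this sketch: pairs on which the human scores tie carry
    no preference and are skipped; an automated tie on a pair that humans
    do distinguish counts as a disagreement.
    """
    if len(auto_scores) != len(human_scores):
        raise ValueError("score lists must have the same length")

    agree = 0
    total = 0
    for i, j in combinations(range(len(human_scores)), 2):
        human_diff = human_scores[i] - human_scores[j]
        if human_diff == 0:
            continue  # no human preference for this pair
        auto_diff = auto_scores[i] - auto_scores[j]
        total += 1
        if auto_diff * human_diff > 0:  # same sign means same ordering
            agree += 1

    if total == 0:
        raise ValueError("no comparable pairs")
    return agree / total


# Made-up scores for four texts, for illustration only.
auto = [3, 1, 2, 5]
human = [4, 1, 3, 2]
print(pairwise_accuracy(auto, human))
```

Under this reading, a value of 0.75 would mean the automated evaluator agrees with the human ordering on three out of every four comparable pairs.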

Authors: Ruizhe Li, Chiwei Zhu, Benfeng Xu, Xiaorui Wang, Zhendong Mao


Subject classification: Computing technology, computer technology

Recommended citation: Ruizhe Li, Chiwei Zhu, Benfeng Xu, Xiaorui Wang, Zhendong Mao. Automated Creativity Evaluation for Large Language Models: A Reference-Based Approach [EB/OL]. (2025-04-22) [2025-05-21]. https://arxiv.org/abs/2504.15784.
