KoLA: Carefully Benchmarking World Knowledge of Large Language Models

Source: arXiv
Abstract

The unprecedented performance of large language models (LLMs) necessitates improvements in evaluations. Rather than merely exploring the breadth of LLM abilities, we believe meticulous and thoughtful designs are essential to thorough, unbiased, and applicable evaluations. Given the importance of world knowledge to LLMs, we construct a Knowledge-oriented LLM Assessment benchmark (KoLA), in which we carefully design three crucial factors: (1) For ability modeling, we mimic human cognition to form a four-level taxonomy of knowledge-related abilities, covering 19 tasks. (2) For data, to ensure fair comparisons, we use both Wikipedia, a corpus on which LLMs are prevalently pre-trained, and continuously collected emerging corpora, aiming to evaluate the capacity to handle unseen data and evolving knowledge. (3) For evaluation criteria, we adopt a contrastive system, including overall standard scores for better numerical comparability across tasks and models and a unique self-contrast metric for automatically evaluating knowledge-creating ability. We evaluate 28 open-source and commercial LLMs and obtain some intriguing findings. The KoLA dataset and open-participation leaderboard are publicly released at https://kola.xlore.cn and will be continuously updated to provide references for developing LLMs and knowledge-related systems.
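
The abstract mentions "overall standard scores" for comparability across tasks and models but does not give a formula. As a minimal sketch, assuming a standard score here means an ordinary per-task z-score computed across models (an assumption, not a formula confirmed by the abstract), the idea might look like the following. The function name, model names, and score values are all hypothetical.

```python
from statistics import mean, pstdev


def standard_scores(raw_scores):
    """Convert one task's raw scores (model -> score) to z-scores.

    Assumption: "standard score" is read as the usual z-score,
    (score - mean) / standard deviation, taken over all models on a task.
    """
    values = list(raw_scores.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # All models scored the same; no model stands out on this task.
        return {model: 0.0 for model in raw_scores}
    return {model: (score - mu) / sigma for model, score in raw_scores.items()}


# Hypothetical raw scores on two tasks whose metrics use different scales.
task_a = {"model_x": 0.50, "model_y": 0.70, "model_z": 0.90}
task_b = {"model_x": 10.0, "model_y": 30.0, "model_z": 20.0}

z_a = standard_scores(task_a)
z_b = standard_scores(task_b)

# After standardization, both tasks are on the same unitless scale,
# so per-model averages across tasks become meaningful.
overall = {m: (z_a[m] + z_b[m]) / 2 for m in task_a}
print(overall)
```

The point of standardizing before averaging is that a task scored in the range 0 to 1 and a task scored in the range 0 to 100 contribute equally to the overall figure; without it, the larger-scale task would dominate.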

Hao Peng, Xin Lv, Kaisheng Zeng, Zhili Wu, Xiaozhi Wang, Hanming Li, Juanzi Li, Weikai Li, Yuan Yao, Lei Hou, Daniel Zhang-Li, Yantao Liu, Ning Ding, Yong Guan, Zijun Yao, Kaifeng Yun, Ji Qi, Xiaohan Zhang, Jifan Yu, Chunyang Li, Shulin Cao, Yu Gu, Jie Tang, Nianyi Lin, Linlu Gong, Yunjia Qi, Hailong Jin, Bin Xu, Zheyuan Zhang, Yushi Bai, Zhiyuan Liu, Jinxin Liu, Shangqing Tu, Amy Xin, Jianhui Chen

Subjects: information and knowledge dissemination; science and scientific research; computing and computer technology

Hao Peng, Xin Lv, Kaisheng Zeng, Zhili Wu, Xiaozhi Wang, Hanming Li, Juanzi Li, Weikai Li, Yuan Yao, Lei Hou, Daniel Zhang-Li, Yantao Liu, Ning Ding, Yong Guan, Zijun Yao, Kaifeng Yun, Ji Qi, Xiaohan Zhang, Jifan Yu, Chunyang Li, Shulin Cao, Yu Gu, Jie Tang, Nianyi Lin, Linlu Gong, Yunjia Qi, Hailong Jin, Bin Xu, Zheyuan Zhang, Yushi Bai, Zhiyuan Liu, Jinxin Liu, Shangqing Tu, Amy Xin, Jianhui Chen. KoLA: Carefully Benchmarking World Knowledge of Large Language Models [EB/OL]. (2023-06-15) [2025-08-02]. https://arxiv.org/abs/2306.09296.
