National Preprint Platform

AfroBench: How Good are Large Language Models on African Languages?


Source: arXiv
English Abstract

Large-scale multilingual evaluations, such as MEGA, often include only a handful of African languages due to the scarcity of high-quality evaluation data and the limited discoverability of existing African datasets. This lack of representation hinders comprehensive LLM evaluation across a diverse range of languages and tasks. To address these challenges, we introduce AfroBench -- a multi-task benchmark for evaluating the performance of LLMs across 64 African languages, 15 tasks and 22 datasets. AfroBench consists of nine natural language understanding datasets, six text generation datasets, six knowledge and question answering tasks, and one mathematical reasoning task. We present results comparing the performance of prompting LLMs to fine-tuned baselines based on BERT and T5-style models. Our results suggest large gaps in performance between high-resource languages, such as English, and African languages across most tasks; but performance also varies based on the availability of monolingual data resources. Our findings confirm that performance on African languages continues to remain a hurdle for current LLMs, underscoring the need for additional efforts to close this gap. https://mcgill-nlp.github.io/AfroBench/
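The abstract compares prompting LLMs against fine-tuned baselines across classification and other tasks. As a minimal sketch of what such a prompt-based evaluation loop looks like, the snippet below scores a model on labeled examples via a prompt template. This is not AfroBench's actual harness: `query_llm`, the template, and the toy examples are all hypothetical placeholders for illustration.

```python
# Minimal sketch of zero-shot prompt-based evaluation (hypothetical; not the
# AfroBench codebase). `query_llm` stands in for a real LLM API call.

def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; always answers 'positive'."""
    return "positive"

def evaluate_accuracy(examples, prompt_template):
    """Prompt the model on each example and report label-match accuracy."""
    correct = 0
    for text, gold_label in examples:
        prediction = query_llm(prompt_template.format(text=text)).strip().lower()
        correct += prediction == gold_label
    return correct / len(examples)

# Toy sentiment examples, invented for illustration only.
examples = [
    ("I love this film", "positive"),
    ("This was terrible", "negative"),
]
template = (
    "Classify the sentiment of the following text as positive or negative.\n"
    "Text: {text}\nLabel:"
)
print(evaluate_accuracy(examples, template))  # 0.5 with the stub model above
```

A real harness would swap `query_llm` for an actual model call and aggregate scores per language, which is how per-language performance gaps like those reported above become visible.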

Kelechi Ogueji, Odunayo Ogundepo, Akintunde Oladipo, Jimmy Lin, David Ifeoluwa Adelani, Jessica Ojo, Pontus Stenetorp

Subjects: African languages; Linguistics; Commonly Used Foreign Languages

Kelechi Ogueji, Odunayo Ogundepo, Akintunde Oladipo, Jimmy Lin, David Ifeoluwa Adelani, Jessica Ojo, Pontus Stenetorp. AfroBench: How Good are Large Language Models on African Languages? [EB/OL]. (2023-11-14) [2025-05-06]. https://arxiv.org/abs/2311.07978.
