MizanQA: Benchmarking Large Language Models on Moroccan Legal Question Answering
The rapid advancement of large language models (LLMs) has significantly propelled progress in natural language processing (NLP). However, their effectiveness in specialized, low-resource domains, such as Arabic legal contexts, remains limited. This paper introduces MizanQA (pronounced Mizan, meaning "scale" in Arabic, a universal symbol of justice), a benchmark designed to evaluate LLMs on Moroccan legal question answering (QA) tasks, which are characterised by rich linguistic and legal complexity. The dataset draws on Modern Standard Arabic, Islamic Maliki jurisprudence, Moroccan customary law, and French legal influences. Comprising over 1,700 multiple-choice questions, including multi-answer formats, MizanQA captures the nuances of authentic legal reasoning. Benchmarking experiments with multilingual and Arabic-focused LLMs reveal substantial performance gaps, highlighting the need for tailored evaluation metrics and for culturally grounded, domain-specific LLM development.
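Because some MizanQA questions have more than one correct option, a single-answer accuracy score is not enough on its own. The sketch below is an illustration only, not the paper's evaluation method: it assumes each answer is represented as a set of option labels (the labels "A", "B", "C" are hypothetical) and contrasts strict exact-match scoring with a Jaccard-style partial-credit score, one possible kind of tailored metric.

```python
from typing import Callable, List, Set

# Hypothetical representation: each answer is a set of option labels, e.g. {"A", "C"}.
AnswerSet = Set[str]


def exact_match(predicted: AnswerSet, gold: AnswerSet) -> float:
    """Strict scoring: 1.0 only if the predicted set equals the gold set."""
    return 1.0 if predicted == gold else 0.0


def jaccard_credit(predicted: AnswerSet, gold: AnswerSet) -> float:
    """Partial credit: |P & G| / |P | G|; defined as 1.0 when both sets are empty."""
    union = predicted | gold
    if not union:
        return 1.0
    return len(predicted & gold) / len(union)


def score_benchmark(
    preds: List[AnswerSet],
    golds: List[AnswerSet],
    metric: Callable[[AnswerSet, AnswerSet], float],
) -> float:
    """Average a per-question metric over aligned prediction and gold lists."""
    if len(preds) != len(golds):
        raise ValueError("predictions and gold answers must be aligned")
    if not golds:
        return 0.0
    return sum(metric(p, g) for p, g in zip(preds, golds)) / len(golds)
```

For example, with predictions `[{"A"}, {"A", "C"}, {"B"}]` against gold answers `[{"A"}, {"A", "B", "C"}, {"C"}]`, exact match gives 1/3 while Jaccard credit gives 5/9, because the second question is rewarded for two of its three correct options. Which trade-off is appropriate for legal QA is a design choice this sketch does not settle.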
Adil Bahaj, Mounir Ghogho
Semito-Hamitic (Afro-Asiatic) languages; Law; Commonly used foreign languages
Adil Bahaj, Mounir Ghogho. MizanQA: Benchmarking Large Language Models on Moroccan Legal Question Answering [EB/OL]. (2025-08-22) [2025-09-06]. https://arxiv.org/abs/2508.16357.