Evaluating LLM-Generated Q&A Test: a Student-Centered Study
This research presents an automated pipeline for generating reliable question-answer (Q&A) tests using AI chatbots. We automatically generated a GPT-4o-mini-based Q&A test for a Natural Language Processing course and evaluated its psychometric properties and perceived quality with students and experts. A mixed-format IRT analysis showed that the generated items exhibit strong discrimination and appropriate difficulty, while student and expert star ratings reflect high overall quality. A uniform DIF check identified two items for review. These findings demonstrate that LLM-generated assessments can match human-authored tests in psychometric performance and user satisfaction, illustrating a scalable approach to AI-assisted assessment development.
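For readers unfamiliar with the psychometric terms in the abstract, the following is a minimal, illustrative sketch (not the authors' code) of how per-item discrimination and difficulty can be estimated with a 2PL IRT-style fit. The simulated response matrix and the standardized-total-score proxy for ability are assumptions made purely for demonstration.

```python
# Illustrative 2PL-style item calibration; all data below are simulated.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(0)
n_students, n_items = 200, 10

# Simulated 0/1 response matrix (students x items); in practice this would
# be the scored answers collected from the generated Q&A test.
true_theta = rng.normal(size=n_students)
true_a = rng.uniform(0.8, 2.0, n_items)   # discrimination
true_b = rng.normal(0.0, 1.0, n_items)    # difficulty
responses = (rng.random((n_students, n_items)) <
             expit(true_a * (true_theta[:, None] - true_b))).astype(int)

# Proxy ability: standardized total score (a rough stand-in for latent theta).
totals = responses.sum(axis=1)
theta = (totals - totals.mean()) / totals.std()

def neg_log_lik(params, y, theta):
    # Negative log-likelihood of the 2PL model P(correct) = sigmoid(a * (theta - b)).
    a, b = params
    p = np.clip(expit(a * (theta - b)), 1e-6, 1 - 1e-6)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

for j in range(n_items):
    res = minimize(neg_log_lik, x0=[1.0, 0.0], args=(responses[:, j], theta))
    a_hat, b_hat = res.x
    print(f"item {j:2d}: discrimination={a_hat:.2f}  difficulty={b_hat:.2f}")
```

High discrimination estimates and difficulty values near the center of the ability range correspond to the "strong discrimination and appropriate difficulty" reported in the study; the paper's actual analysis uses a mixed-format IRT model rather than this simplified per-item fit.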
Anna Wróblewska, Bartosz Grabek, Jakub Świstak, Daniel Dan
Educational computing technology; computer technology
Anna Wróblewska, Bartosz Grabek, Jakub Świstak, Daniel Dan. Evaluating LLM-Generated Q&A Test: a Student-Centered Study [EB/OL]. (2025-05-10) [2025-06-19]. https://arxiv.org/abs/2505.06591.