Description and Comparative Analysis of QuRE: A New Industrial Requirements Quality Dataset
Requirements quality is central to successful software and systems engineering. Empirical research on quality defects in natural language requirements relies heavily on datasets, ideally as realistic and representative as possible. However, such datasets are often inaccessible, small, or lack sufficient detail. This paper introduces QuRE (Quality in Requirements), a new dataset comprising 2,111 industrial requirements that have been annotated through a real-world review process. Previously used for over five years as part of an industrial contract, this dataset is now being released to the research community. In this work, we furthermore provide descriptive statistics on the dataset, including measures such as lexical diversity and readability, and compare it to existing requirements datasets and synthetically generated requirements. In contrast to synthetic datasets, QuRE is linguistically similar to existing ones. However, this dataset comes with a detailed context description, and its labels have been created and used systematically and extensively in an industrial context over a period of close to a decade. Our goal is to foster transparency, comparability, and empirical rigor by supporting the development of a common gold standard for requirements quality datasets. This, in turn, will enable more sound and collaborative research efforts in the field.
Henning Femmer, Frank Houdek, Max Unterbusch, Andreas Vogelsang
Computing technology, computer technology
Henning Femmer, Frank Houdek, Max Unterbusch, Andreas Vogelsang. Description and Comparative Analysis of QuRE: A New Industrial Requirements Quality Dataset [EB/OL]. (2025-08-12) [2025-08-24]. https://arxiv.org/abs/2508.08868.