National Preprint Platform

Efficient Annotator Reliability Assessment with EffiARA


Source: arXiv
Abstract

Data annotation is an essential component of the machine learning pipeline; it is also a costly and time-consuming process. With the introduction of transformer-based models, annotation at the document level is increasingly popular; however, there is no standard framework for structuring such tasks. The EffiARA annotation framework is, to our knowledge, the first project to support the whole annotation pipeline, from understanding the resources required for an annotation task to compiling the annotated dataset and gaining insights into the reliability of individual annotators as well as the dataset as a whole. The framework's efficacy is supported by two previous studies: one improving classification performance through annotator-reliability-based soft label aggregation and sample weighting, and the other increasing overall agreement among annotators by identifying and replacing an unreliable annotator. This work introduces the EffiARA Python package and its accompanying webtool, which provides an accessible graphical user interface for the system. We open-source the EffiARA Python package at https://github.com/MiniEggz/EffiARA, and the webtool is publicly accessible at https://effiara.gate.ac.uk.
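The reliability-weighted soft label aggregation mentioned above can be sketched in a few lines of Python. This is a minimal illustration of the general technique, not the EffiARA API: all function names, the pairwise-agreement reliability estimate, and the toy data below are assumptions made for the example.

```python
import numpy as np

def annotator_reliability(annotations):
    """Estimate each annotator's reliability as their mean pairwise
    agreement (fraction of matching labels on co-annotated samples)
    with every other annotator. `annotations` maps annotator name
    -> {sample_id: integer class label}."""
    names = list(annotations)
    reliability = {}
    for a in names:
        agreements = []
        for b in names:
            if a == b:
                continue
            shared = annotations[a].keys() & annotations[b].keys()
            if shared:
                agree = sum(annotations[a][s] == annotations[b][s] for s in shared)
                agreements.append(agree / len(shared))
        reliability[a] = float(np.mean(agreements)) if agreements else 0.0
    return reliability

def soft_label(sample_id, annotations, reliability, num_classes):
    """Aggregate one sample's labels into a reliability-weighted
    soft label: a probability distribution over classes in which
    more reliable annotators contribute more mass."""
    dist = np.zeros(num_classes)
    for name, labels in annotations.items():
        if sample_id in labels:
            dist[labels[sample_id]] += reliability[name]
    total = dist.sum()
    return dist / total if total > 0 else dist

# Hypothetical three-annotator example on a binary task.
annotations = {
    "ann1": {"s1": 0, "s2": 1, "s3": 1},
    "ann2": {"s1": 0, "s2": 1, "s3": 0},
    "ann3": {"s1": 1, "s2": 1, "s3": 0},
}
reliability = annotator_reliability(annotations)
print(soft_label("s3", annotations, reliability, num_classes=2))  # → [0.7 0.3]
```

On sample `s3`, where the annotators disagree, the two annotators who label it 0 carry more combined weight than the one who labels it 1, so the soft label leans toward class 0 rather than collapsing to a hard majority vote.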

Owen Cook, Jake Vasilakes, Ian Roberts, Xingyi Song

Computing Technology, Computer Technology

Owen Cook, Jake Vasilakes, Ian Roberts, Xingyi Song. Efficient Annotator Reliability Assessment with EffiARA [EB/OL]. (2025-04-01) [2025-05-01]. https://arxiv.org/abs/2504.00589.
