Aggregated Individual Reporting for Post-Deployment Evaluation
The need for developing model evaluations beyond static benchmarking, especially in the post-deployment phase, is now well-understood. At the same time, concerns about the concentration of power in deployed AI systems have sparked a keen interest in 'democratic' or 'public' AI. In this work, we bring these two ideas together by proposing mechanisms for aggregated individual reporting (AIR), a framework for post-deployment evaluation that relies on individual reports from the public. An AIR mechanism allows those who interact with a specific, deployed (AI) system to report when they feel that they may have experienced something problematic; these reports are then aggregated over time, with the goal of evaluating the relevant system in a fine-grained manner. This position paper argues that individual experiences should be understood as an integral part of post-deployment evaluation, and that our proposed aggregated individual reporting mechanism is a practical path to that end. On the one hand, individual reporting can identify substantively novel insights about safety and performance; on the other, aggregation can be uniquely useful for informing action. From a normative perspective, attention to the post-deployment phase fills a missing piece in the conversation about 'democratic' AI. As a pathway to implementation, we provide a workflow of concrete design decisions and pointers to areas requiring further research and methodological development.
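As a rough illustration of what aggregating individual reports over time could look like, the sketch below groups hypothetical reports about a deployed system by issue category within a time window and flags categories whose report counts cross a chosen threshold. This is a minimal, assumed design for illustration only, not the mechanism specified in the paper; the categories, thresholds, and data shown are placeholders.

```python
# Hypothetical sketch of an AIR-style aggregation step: individual reports
# about a deployed system are grouped by issue category within a time window,
# and categories with enough recent reports are flagged for review.
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Report:
    """A single individual report about a deployed system."""
    timestamp: datetime
    category: str       # e.g. "unsafe advice", "bias" (illustrative labels)
    description: str


def flag_categories(reports, window: timedelta, now: datetime, threshold: int):
    """Count recent reports per category; return categories at/above threshold."""
    recent = [r for r in reports if now - r.timestamp <= window]
    counts = Counter(r.category for r in recent)
    return {cat: n for cat, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    # Placeholder reports purely for demonstration.
    now = datetime(2025, 6, 22)
    reports = [
        Report(now - timedelta(days=1), "unsafe advice", "..."),
        Report(now - timedelta(days=2), "unsafe advice", "..."),
        Report(now - timedelta(days=10), "bias", "..."),
    ]
    print(flag_categories(reports, window=timedelta(days=7), now=now, threshold=2))
    # -> {'unsafe advice': 2}
```

A real AIR mechanism would involve many further design decisions (how reports are solicited, validated, weighted, and acted upon), which is precisely the workflow the paper outlines.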
Jessica Dai, Inioluwa Deborah Raji, Benjamin Recht, Irene Y. Chen
Information dissemination; computational techniques for knowledge dissemination; computer technology
Jessica Dai, Inioluwa Deborah Raji, Benjamin Recht, Irene Y. Chen. Aggregated Individual Reporting for Post-Deployment Evaluation [EB/OL]. (2025-06-22) [2025-07-19]. https://arxiv.org/abs/2506.18133.