Towards interactive evaluations for interaction harms in human-AI systems

Abstract

Current AI evaluation methods, which rely on static, model-only tests, fail to account for harms that emerge through sustained human-AI interaction. As AI systems proliferate and are increasingly integrated into real-world applications, this disconnect between evaluation approaches and actual usage becomes more significant. In this paper, we propose a shift towards evaluation based on interactional ethics, which focuses on interaction harms: issues like inappropriate parasocial relationships, social manipulation, and cognitive overreliance that develop over time through repeated interaction, rather than through isolated outputs. First, we discuss the limitations of current evaluation methods, which (1) are static, (2) assume a universal user experience, and (3) have limited construct validity. Drawing on research from human-computer interaction, natural language processing, and the social sciences, we present practical principles for designing interactive evaluations. These include ecologically valid interaction scenarios, human impact metrics, and diverse human participation approaches. Finally, we explore implementation challenges and open research questions for researchers, practitioners, and regulators aiming to integrate interactive evaluations into AI governance frameworks. This work lays the groundwork for developing more effective evaluation methods that better capture the complex dynamics between humans and AI systems.

Authors: Markus Anderljung, Lujain Ibrahim, Saffron Huang, Umang Bhatt, Lama Ahmad

Subject classification: computing technology, computer technology; information dissemination, knowledge dissemination

Recommended citation: Markus Anderljung, Lujain Ibrahim, Saffron Huang, Umang Bhatt, Lama Ahmad. Towards interactive evaluations for interaction harms in human-AI systems [EB/OL]. (2025-07-30) [2025-08-06]. https://arxiv.org/abs/2405.10632.
