ESC-Judge: A Framework for Comparing Emotional Support Conversational Agents

Source: arXiv
Abstract

Large language models (LLMs) increasingly power mental-health chatbots, yet the field still lacks a scalable, theory-grounded way to decide which model is most effective to deploy. We present ESC-Judge, the first end-to-end evaluation framework that (i) grounds head-to-head comparisons of emotional-support LLMs in Clara Hill's established Exploration-Insight-Action counseling model, providing a structured and interpretable view of performance, and (ii) fully automates the evaluation pipeline at scale. ESC-Judge operates in three stages: first, it synthesizes realistic help-seeker roles by sampling empirically salient attributes such as stressors, personality, and life history; second, it has two candidate support agents conduct separate sessions with the same role, isolating model-specific strategies; and third, it asks a specialized judge LLM to express pairwise preferences across rubric-anchored skills that span the Exploration, Insight, and Action spectrum. In our study, ESC-Judge matched PhD-level annotators on 85 percent of Exploration, 83 percent of Insight, and 86 percent of Action decisions, demonstrating human-level reliability at a fraction of the cost. All code, prompts, synthetic roles, transcripts, and judgment scripts are released to promote transparent progress in emotionally supportive AI.
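The three stages described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' released code: the attribute pools, the `Role` fields, the stub agents, and the length-based judge are all hypothetical stand-ins for the sampled roles, the candidate LLMs, and the judge LLM.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical attribute pools; the paper's actual attribute lists are not shown here.
STRESSORS = ["job loss", "breakup", "exam pressure"]
PERSONALITIES = ["introverted", "anxious", "optimistic"]
HISTORIES = ["recently moved", "caring for a parent", "first-year student"]

# The three dimensions of Hill's Exploration-Insight-Action model.
STAGES = ("Exploration", "Insight", "Action")


@dataclass(frozen=True)
class Role:
    """A synthetic help-seeker profile (field names are assumptions)."""
    stressor: str
    personality: str
    life_history: str


Agent = Callable[[Role], List[str]]                  # role -> support-agent turns
Judge = Callable[[str, List[str], List[str]], str]   # stage, A, B -> "A" or "B"


def synthesize_role(rng: random.Random) -> Role:
    """Stage 1: sample one value per attribute to build a help-seeker role."""
    return Role(rng.choice(STRESSORS), rng.choice(PERSONALITIES), rng.choice(HISTORIES))


def compare(agent_a: Agent, agent_b: Agent, judge: Judge, rng: random.Random) -> Dict[str, str]:
    """Stages 2 and 3: run both agents on the same role, then judge per stage."""
    role = synthesize_role(rng)
    transcript_a = agent_a(role)   # separate session with agent A
    transcript_b = agent_b(role)   # separate session with agent B, same role
    return {stage: judge(stage, transcript_a, transcript_b) for stage in STAGES}


# Stub agents standing in for real LLM-backed support agents.
def terse_agent(role: Role) -> List[str]:
    return ["That sounds hard."]


def exploring_agent(role: Role) -> List[str]:
    return [f"Tell me more about the {role.stressor}.",
            "What would a small first step look like?"]


# Toy judge that prefers the longer transcript; a real judge would be a
# rubric-prompted LLM, which this sketch does not attempt to reproduce.
def length_judge(stage: str, a: List[str], b: List[str]) -> str:
    return "A" if len(a) >= len(b) else "B"


if __name__ == "__main__":
    print(compare(terse_agent, exploring_agent, length_judge, random.Random(0)))
```

Holding the role fixed across both sessions is what lets the per-stage preferences be attributed to the agents rather than to differences between help-seekers.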

Navid Madani, Rohini Srihari

Computing technology, computer technology

Navid Madani, Rohini Srihari. ESC-Judge: A Framework for Comparing Emotional Support Conversational Agents [EB/OL]. (2025-05-18) [2025-07-09]. https://arxiv.org/abs/2505.12531.
