
SageLM: A Multi-aspect and Explainable Large Language Model for Speech Judgement


Source: arXiv
Abstract

Speech-to-Speech (S2S) Large Language Models (LLMs) are foundational to natural human-computer interaction, enabling end-to-end spoken dialogue systems. However, evaluating these models remains a fundamental challenge. We propose SageLM, an end-to-end, multi-aspect, and explainable speech LLM for comprehensive evaluation of S2S LLMs. First, unlike cascaded approaches that disregard acoustic features, SageLM jointly assesses both semantic and acoustic dimensions. Second, it leverages rationale-based supervision to enhance explainability and guide model learning, achieving superior alignment with evaluation outcomes compared to rule-based reinforcement learning methods. Third, we introduce SpeechFeedback, a synthetic preference dataset, and employ a two-stage training paradigm to mitigate the scarcity of speech preference data. Trained on both semantic and acoustic dimensions, SageLM achieves an 82.79% agreement rate with human evaluators, outperforming cascaded and SLM-based baselines by at least 7.42% and 26.20%, respectively.
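
One plausible reading of the agreement rate quoted in the abstract is the fraction of evaluation items on which the judge model's verdict matches the human verdict. The Python sketch below illustrates that reading only. The abstract does not describe the exact protocol, so the label set ("A", "B", "tie"), the function name agreement_rate, and the sample verdicts are all assumptions made for illustration, not the authors' implementation or data.

```python
from typing import Sequence


def agreement_rate(judge_labels: Sequence[str], human_labels: Sequence[str]) -> float:
    """Return the fraction of items where the judge verdict equals the human verdict."""
    if len(judge_labels) != len(human_labels):
        raise ValueError("label sequences must have the same length")
    if not judge_labels:
        raise ValueError("at least one labeled item is required")
    # Count exact matches between the two verdict lists, item by item.
    matches = sum(j == h for j, h in zip(judge_labels, human_labels))
    return matches / len(judge_labels)


# Hypothetical verdicts for five response pairs (assumed label set: "A", "B", "tie").
judge = ["A", "B", "A", "tie", "B"]
human = ["A", "B", "B", "tie", "B"]
rate = agreement_rate(judge, human)
print(f"{rate:.2%}")  # prints 80.00% (4 of 5 verdicts match)
```

Under this reading, a figure such as 82.79% would mean the judge and the human annotators chose the same verdict on roughly 83 of every 100 evaluated items.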

Yuan Ge, Junxiang Zhang, Xiaoqian Liu, Bei Li, Xiangnan Ma, Chenglong Wang, Kaiyang Ye, Yangfan Du, Linfeng Zhang, Yuxin Huang, Tong Xiao, Zhengtao Yu, JingBo Zhu

Subjects: Linguistics; Computing Technology and Computer Technology

Yuan Ge, Junxiang Zhang, Xiaoqian Liu, Bei Li, Xiangnan Ma, Chenglong Wang, Kaiyang Ye, Yangfan Du, Linfeng Zhang, Yuxin Huang, Tong Xiao, Zhengtao Yu, JingBo Zhu. SageLM: A Multi-aspect and Explainable Large Language Model for Speech Judgement [EB/OL]. (2025-08-28) [2025-09-04]. https://arxiv.org/abs/2508.20916.
