Cued-Agent: A Collaborative Multi-Agent System for Automatic Cued Speech Recognition

Abstract

Cued Speech (CS) is a visual communication system that combines lip-reading with hand coding to facilitate communication for individuals with hearing impairments. Automatic CS Recognition (ACSR) aims to convert CS hand gestures and lip movements into text via AI-driven methods. Traditionally, the temporal asynchrony between hand and lip movements has required complex modules to achieve effective multimodal fusion. However, because available data is limited, current methods cannot adequately train these fusion mechanisms, which results in suboptimal performance. Recently, multi-agent systems have shown promising capabilities in handling complex tasks with limited data. To this end, we propose the first collaborative multi-agent system for ACSR, named Cued-Agent. It integrates four specialized sub-agents: a Multimodal Large Language Model-based Hand Recognition agent that employs keyframe screening and CS expert prompt strategies to decode hand movements, a pretrained Transformer-based Lip Recognition agent that extracts lip features from the input video, a Hand Prompt Decoding agent that dynamically integrates hand prompts with lip features during inference in a training-free manner, and a Self-Correction Phoneme-to-Word agent that, for the first time, enables post-processing and end-to-end conversion from phoneme sequences to natural language sentences through semantic refinement. To support this study, we expand the existing Mandarin CS dataset by collecting data from eight hearing-impaired cuers, establishing a mixed dataset of fourteen subjects. Extensive experiments demonstrate that Cued-Agent performs strongly in both normal and hearing-impaired scenarios compared with state-of-the-art methods. The implementation is available at https://github.com/DennisHgj/Cued-Agent.
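
The data flow between the four sub-agents named in the abstract can be sketched roughly as follows. This is only an illustrative skeleton under stated assumptions: every class, type, and function name here (`CuedPipeline`, `toy_hand`, and so on) is hypothetical and is not taken from the released Cued-Agent code, and the toy stand-ins perform no real recognition.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical placeholder types; the real system works on video frames,
# MLLM outputs, and Transformer features rather than plain strings.
Frame = str
HandPrompt = str
LipFeature = str
Phoneme = str


@dataclass
class CuedPipeline:
    """Wires four agent callables together in the order the abstract lists them."""
    hand_agent: Callable[[Sequence[Frame]], List[HandPrompt]]
    lip_agent: Callable[[Sequence[Frame]], List[LipFeature]]
    decoding_agent: Callable[[List[HandPrompt], List[LipFeature]], List[Phoneme]]
    p2w_agent: Callable[[List[Phoneme]], str]

    def run(self, frames: Sequence[Frame]) -> str:
        hand_prompts = self.hand_agent(frames)   # hand branch: frames -> hand prompts
        lip_features = self.lip_agent(frames)    # lip branch: frames -> lip features
        # Combine hand prompts with lip features to obtain a phoneme sequence.
        phonemes = self.decoding_agent(hand_prompts, lip_features)
        # Turn the phoneme sequence into a sentence.
        return self.p2w_agent(phonemes)


# Toy stand-ins (assumptions for illustration only) so the sketch runs end to end.
def toy_hand(frames: Sequence[Frame]) -> List[HandPrompt]:
    # Pretend every second frame is a keyframe.
    return [f"hand({f})" for f in frames[::2]]


def toy_lip(frames: Sequence[Frame]) -> List[LipFeature]:
    return [f"lip({f})" for f in frames]


def toy_decode(hands: List[HandPrompt], lips: List[LipFeature]) -> List[Phoneme]:
    # Emit one dummy phoneme per hand prompt.
    return [f"ph{i}" for i in range(len(hands))]


def toy_p2w(phonemes: List[Phoneme]) -> str:
    return " ".join(phonemes)


pipeline = CuedPipeline(toy_hand, toy_lip, toy_decode, toy_p2w)
result = pipeline.run(["f0", "f1", "f2", "f3"])
print(result)
```

Passing the agents in as plain callables keeps each stage replaceable, which loosely mirrors the abstract's point that the hand and lip branches are combined at inference time without joint training.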

Authors: Guanjie Huang, Danny H. K. Tsang, Shan Yang, Guangzhi Lei, Li Liu

Subject classification: Computing Technology and Computer Technology; Chinese Language

Recommended citation: Guanjie Huang, Danny H. K. Tsang, Shan Yang, Guangzhi Lei, Li Liu. Cued-Agent: A Collaborative Multi-Agent System for Automatic Cued Speech Recognition [EB/OL]. (2025-08-01) [2025-08-11]. https://arxiv.org/abs/2508.00391.
