Cued-Agent: A Collaborative Multi-Agent System for Automatic Cued Speech Recognition

Source: arXiv
Abstract

Cued Speech (CS) is a visual communication system that combines lip-reading with hand coding to facilitate communication for individuals with hearing impairments. Automatic CS Recognition (ACSR) aims to convert CS hand gestures and lip movements into text via AI-driven methods. Traditionally, the temporal asynchrony between hand and lip movements has required complex modules to achieve effective multimodal fusion. However, because available data are limited, current methods cannot adequately train these fusion mechanisms, resulting in suboptimal performance. Recently, multi-agent systems have shown promising capabilities in handling complex tasks with limited data. To this end, we propose the first collaborative multi-agent system for ACSR, named Cued-Agent. It integrates four specialized sub-agents: a Multimodal Large Language Model-based Hand Recognition agent that employs keyframe screening and CS expert prompt strategies to decode hand movements; a pretrained Transformer-based Lip Recognition agent that extracts lip features from the input video; a Hand Prompt Decoding agent that dynamically integrates hand prompts with lip features during inference in a training-free manner; and a Self-Correction Phoneme-to-Word agent that, for the first time, enables post-processing and end-to-end conversion from phoneme sequences to natural language sentences through semantic refinement. To support this study, we expand the existing Mandarin CS dataset by collecting data from eight hearing-impaired cuers, establishing a mixed dataset of fourteen subjects. Extensive experiments demonstrate that Cued-Agent compares favorably with state-of-the-art methods in both normal and hearing-impaired scenarios. The implementation is available at https://github.com/DennisHgj/Cued-Agent.
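
The data flow among the four sub-agents described in the abstract can be pictured with a minimal Python sketch. This is an illustration only, not the authors' implementation: the class `CuedPipeline`, its field names, and the `toy_*` stand-in functions are all hypothetical, and the real agents (an MLLM, a pretrained Transformer, and so on) are replaced by trivial placeholders so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical stand-in for decoded video frames; not a type from the paper's code.
Frames = Sequence[object]


@dataclass
class CuedPipeline:
    """Hypothetical wiring of the four sub-agents named in the abstract."""
    hand_agent: Callable[[Frames], List[str]]          # frames -> hand prompts
    lip_agent: Callable[[Frames], List[List[float]]]   # frames -> lip features
    decode_agent: Callable[[List[List[float]], List[str]], List[str]]  # -> phonemes
    p2w_agent: Callable[[List[str]], str]              # phonemes -> sentence

    def run(self, frames: Frames) -> str:
        hand_prompts = self.hand_agent(frames)   # hand recognition
        lip_features = self.lip_agent(frames)    # lip feature extraction
        # Hand prompts are combined with lip features at inference time.
        phonemes = self.decode_agent(lip_features, hand_prompts)
        # Phoneme sequence is converted to a sentence.
        return self.p2w_agent(phonemes)


# Toy placeholders (assumptions) so the sketch is runnable without any model.
def toy_hand_agent(frames: Frames) -> List[str]:
    return [f"hand:{i}" for i, _ in enumerate(frames)]


def toy_lip_agent(frames: Frames) -> List[List[float]]:
    return [[float(i)] for i, _ in enumerate(frames)]


def toy_decoder(lip: List[List[float]], hands: List[str]) -> List[str]:
    return [f"ph{int(feat[0])}" for feat, _ in zip(lip, hands)]


def toy_p2w(phonemes: List[str]) -> str:
    return " ".join(phonemes)


pipeline = CuedPipeline(toy_hand_agent, toy_lip_agent, toy_decoder, toy_p2w)
result = pipeline.run(["f0", "f1", "f2"])
```

Passing the agents in as plain callables keeps each stage swappable, which mirrors the abstract's description of specialized sub-agents that are combined without joint training of a fusion module.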

Guanjie Huang, Danny H. K. Tsang, Shan Yang, Guangzhi Lei, Li Liu

Computing technology, computer technology; Chinese language

Guanjie Huang, Danny H. K. Tsang, Shan Yang, Guangzhi Lei, Li Liu. Cued-Agent: A Collaborative Multi-Agent System for Automatic Cued Speech Recognition [EB/OL]. (2025-08-01) [2025-08-11]. https://arxiv.org/abs/2508.00391.
