Suicide Risk Assessment Using Multimodal Speech Features: A Study on the SW1 Challenge Dataset

Source: arXiv
Abstract

The 1st SpeechWellness Challenge conveys the need for speech-based suicide risk assessment in adolescents. This study investigates a multimodal approach for this challenge, integrating automatic transcription with WhisperX, linguistic embeddings from Chinese RoBERTa, and audio embeddings from WavLM. Additionally, handcrafted acoustic features -- including MFCCs, spectral contrast, and pitch-related statistics -- were incorporated. We explored three fusion strategies: early concatenation, modality-specific processing, and weighted attention with mixup regularization. Results show that weighted attention provided the best generalization, achieving 69% accuracy on the development set, though a performance gap between development and test sets highlights generalization challenges. Our findings, strictly tied to the MINI-KID framework, emphasize the importance of refining embedding representations and fusion mechanisms to enhance classification reliability.

Ambre Marie, Ilias Maoudj, Guillaume Dardenne, Gwenolé Quellec

Computing technology, computer technology

Ambre Marie, Ilias Maoudj, Guillaume Dardenne, Gwenolé Quellec. Suicide Risk Assessment Using Multimodal Speech Features: A Study on the SW1 Challenge Dataset [EB/OL]. (2025-05-19) [2025-07-16]. https://arxiv.org/abs/2505.13069.
