LCS-CTC: Leveraging Soft Alignments to Enhance Phonetic Transcription Robustness
Phonetic speech transcription is crucial for fine-grained linguistic analysis and downstream speech applications. Connectionist Temporal Classification (CTC) is widely used for this task because of its efficiency, but its recognition performance often falls short, especially on unclear and non-fluent speech. In this work, we propose LCS-CTC, a two-stage framework for phoneme-level speech recognition that combines a similarity-aware local alignment algorithm with a constrained CTC training objective. The method predicts fine-grained frame-phoneme cost matrices and applies a modified Longest Common Subsequence (LCS) algorithm to identify high-confidence alignment zones. These zones constrain the CTC decoding path space, which reduces overfitting, improves generalization, and enables both robust recognition and text-free forced alignment. Experiments on LibriSpeech and PPA show that LCS-CTC consistently outperforms vanilla CTC baselines, suggesting its potential to unify phoneme modeling across fluent and non-fluent speech.
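The abstract does not spell out the modified LCS step, so the following is only a minimal sketch of the general idea: a textbook LCS dynamic program run over a frame-phoneme score matrix to pick out a monotonic set of confident frame-phoneme pairs. Everything specific here is an assumption made for illustration, not the authors' method: the function name `soft_lcs_align`, the use of similarities rather than costs, the fixed `threshold`, and the one-to-one matching of frames to phonemes.

```python
from typing import List, Tuple


def soft_lcs_align(sim: List[List[float]], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return monotonic (frame, phoneme) pairs chosen by an LCS-style DP.

    sim[t][n] is a hypothetical similarity between frame t and phoneme n.
    A cell counts as a match when sim[t][n] >= threshold (an assumption;
    the paper's modified LCS may score matches differently).
    """
    num_frames = len(sim)
    num_phonemes = len(sim[0]) if num_frames else 0

    # dp[t][n] = size of the largest monotonic match set using the
    # first t frames and the first n phonemes.
    dp = [[0] * (num_phonemes + 1) for _ in range(num_frames + 1)]
    for t in range(1, num_frames + 1):
        for n in range(1, num_phonemes + 1):
            if sim[t - 1][n - 1] >= threshold:
                dp[t][n] = dp[t - 1][n - 1] + 1
            else:
                dp[t][n] = max(dp[t - 1][n], dp[t][n - 1])

    # Backtrace from the bottom-right corner to recover the matched pairs.
    pairs: List[Tuple[int, int]] = []
    t, n = num_frames, num_phonemes
    while t > 0 and n > 0:
        if sim[t - 1][n - 1] >= threshold:
            pairs.append((t - 1, n - 1))
            t -= 1
            n -= 1
        elif dp[t - 1][n] >= dp[t][n - 1]:
            t -= 1
        else:
            n -= 1
    pairs.reverse()
    return pairs


# Toy input: 4 frames, 2 phonemes; frames 0-1 resemble phoneme 0, frames 2-3 phoneme 1.
toy_sim = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.7],
    [0.2, 0.9],
]
toy_pairs = soft_lcs_align(toy_sim, threshold=0.5)
```

In a setup like the one the abstract describes, pairs of this kind could serve as anchors that restrict which CTC paths are considered during training; how LCS-CTC actually builds and applies its alignment zones is not stated here.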
Zongli Ye, Jiachen Lian, Akshaj Gupta, Xuanru Zhou, Krish Patel, Haodong Li, Hwi Joo Park, Chenxu Guo, Shuhe Li, Sam Wang, Cheol Jun Cho, Zoe Ezzes, Jet M. J. Vonk, Brittany T. Morin, Rian Bogley, Lisa Wauters, Zachary A. Miller, Maria Luisa Gorno-Tempini, Gopala Anumanchipalli
Computing technology, computer technology
Zongli Ye, Jiachen Lian, Akshaj Gupta, Xuanru Zhou, Krish Patel, Haodong Li, Hwi Joo Park, Chenxu Guo, Shuhe Li, Sam Wang, Cheol Jun Cho, Zoe Ezzes, Jet M. J. Vonk, Brittany T. Morin, Rian Bogley, Lisa Wauters, Zachary A. Miller, Maria Luisa Gorno-Tempini, Gopala Anumanchipalli. LCS-CTC: Leveraging Soft Alignments to Enhance Phonetic Transcription Robustness [EB/OL]. (2025-08-05) [2025-08-16]. https://arxiv.org/abs/2508.03937.