Unlocking Temporal Flexibility: Neural Speech Codec with Variable Frame Rate

Source: arXiv
Abstract

Most neural speech codecs achieve bitrate adjustment through intra-frame mechanisms, such as codebook dropout, at a Constant Frame Rate (CFR). However, speech segments inherently have time-varying information density (e.g., silent intervals versus voiced regions). This property makes CFR not optimal in terms of bitrate and token sequence length, hindering efficiency in real-time applications. In this work, we propose a Temporally Flexible Coding (TFC) technique, introducing variable frame rate (VFR) into neural speech codecs for the first time. TFC enables seamlessly tunable average frame rates and dynamically allocates frame rates based on temporal entropy. Experimental results show that a codec with TFC achieves optimal reconstruction quality with high flexibility, and maintains competitive performance even at lower frame rates. Our approach is promising for the integration with other efforts to develop low-frame-rate neural speech codecs for more efficient downstream tasks.

Hanglei Zhang, Yiwei Guo, Zhihan Li, Xiang Hao, Xie Chen, Kai Yu

Communications

Hanglei Zhang, Yiwei Guo, Zhihan Li, Xiang Hao, Xie Chen, Kai Yu. Unlocking Temporal Flexibility: Neural Speech Codec with Variable Frame Rate [EB/OL]. (2025-05-22) [2025-06-19]. https://arxiv.org/abs/2505.16845.
