
TF-MLPNet: Tiny Real-Time Neural Speech Separation


Source: arXiv
Abstract

Speech separation on hearable devices can enable transformative augmented and enhanced hearing capabilities. However, state-of-the-art speech separation networks cannot run in real-time on tiny, low-power neural accelerators designed for hearables, due to their limited compute capabilities. We present TF-MLPNet, the first speech separation network capable of running in real-time on such low-power accelerators while outperforming existing streaming models for blind speech separation and target speech extraction. Our network operates in the time-frequency domain, processing frequency sequences with stacks of fully connected layers that alternate along the channel and frequency dimensions, and independently processing the time sequence at each frequency bin using convolutional layers. Results show that our mixed-precision quantization-aware trained (QAT) model can process 6 ms audio chunks in real-time on the GAP9 processor, achieving a 3.5-4x runtime reduction compared to prior speech separation models.
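The sketch below illustrates the block structure described in the abstract: fully connected layers alternating over the channel and frequency dimensions, plus a causal convolution over time applied independently at each frequency bin. The class name, layer sizes, residual connections, and activation choices are illustrative assumptions, not the authors' implementation or training setup (which uses mixed-precision QAT for the GAP9).

```python
# A minimal PyTorch sketch of the time-frequency block described in the abstract.
# Names (TFMLPBlock), hidden sizes, and residual connections are assumptions.
import torch
import torch.nn as nn


class TFMLPBlock(nn.Module):
    """Fully connected layers alternating along the channel and frequency
    dimensions, plus a causal per-frequency-bin temporal convolution."""

    def __init__(self, channels: int, n_freq: int, kernel_size: int = 3):
        super().__init__()
        # MLP over the channel dimension, applied independently at every (t, f) point.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels), nn.ReLU(), nn.Linear(channels, channels)
        )
        # MLP over the frequency dimension, applied independently at every (t, c) point.
        self.freq_mlp = nn.Sequential(
            nn.Linear(n_freq, n_freq), nn.ReLU(), nn.Linear(n_freq, n_freq)
        )
        # Causal conv over time, applied independently at each frequency bin.
        self.pad = kernel_size - 1
        self.time_conv = nn.Conv1d(channels, channels, kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time frames, freq bins)
        b, c, t, f = x.shape

        # Mix along channels: move channels to the last dim for the linear layers.
        x = x + self.channel_mlp(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

        # Mix along frequency: frequency is already the last dimension.
        x = x + self.freq_mlp(x)

        # Temporal conv per frequency bin: fold freq into the batch dimension so
        # nothing is mixed across bins; left-pad so the conv stays causal (streaming).
        y = x.permute(0, 3, 1, 2).reshape(b * f, c, t)
        y = self.time_conv(nn.functional.pad(y, (self.pad, 0)))
        y = y.reshape(b, f, c, t).permute(0, 2, 3, 1)
        return x + y


# Example forward pass on a small chunk of time-frequency features.
block = TFMLPBlock(channels=32, n_freq=65)
spec = torch.randn(1, 32, 4, 65)   # (batch, channels, time frames, freq bins)
out = block(spec)
print(out.shape)                   # torch.Size([1, 32, 4, 65])
```

Folding the frequency axis into the batch dimension is one way to keep the temporal convolution strictly per-bin while sharing its weights across bins; the real model's chunking and streaming state handling are not shown here.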

Malek Itani, Tuochao Chen, Shyamnath Gollakota

Subject areas: Radio and telecommunications equipment; Microelectronics and integrated circuits

Malek Itani, Tuochao Chen, Shyamnath Gollakota. TF-MLPNet: Tiny Real-Time Neural Speech Separation [EB/OL]. (2025-08-05) [2025-08-17]. https://arxiv.org/abs/2508.03047.
