Speech Feature Extraction Based on Deep Learning Models
With the rapid development of the mobile Internet, speech recognition, a key technology for natural human-machine interaction, is attracting increasing attention. In the era of big data, large volumes of speech data have become available, and how to make effective use of this unlabeled raw data has become an active research topic in speech recognition. Meanwhile, deep learning models, with their strong capacity for modeling massive data, can process such unlabeled data directly and are becoming ever more closely tied to speech recognition. Building on the combination of speech recognition and deep learning theory, this work investigates how to use deep learning models to extract more robust acoustic features. Two models, an auto-encoder and a deep neural network, are trained with unsupervised and supervised methods, respectively, to extract new features automatically from the original speech features. Compared with the original MFCC features, the new features extracted by the two models improve word recognition accuracy by 1.96% and 3.53%, respectively.
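As a rough illustration of the unsupervised route mentioned in the abstract, the sketch below trains a small auto-encoder on frame-level features and then uses the hidden-layer activations as the new features. It is not the authors' implementation: the single hidden layer, the tanh activation, the layer sizes, the learning rate, and the synthetic 13-dimensional frames standing in for real MFCCs are all assumptions made for the example, and the names `train_autoencoder` and `extract_features` are hypothetical.

```python
import numpy as np


def train_autoencoder(frames, hidden_dim=8, epochs=200, lr=0.1, seed=0):
    """Fit a one-hidden-layer auto-encoder with batch gradient descent on MSE.

    frames: array of shape (num_frames, feature_dim), e.g. standardized MFCCs.
    Returns the learned parameters and the per-epoch reconstruction loss.
    """
    rng = np.random.default_rng(seed)
    _, dim = frames.shape
    w_enc = rng.normal(0.0, 0.1, (dim, hidden_dim))
    b_enc = np.zeros(hidden_dim)
    w_dec = rng.normal(0.0, 0.1, (hidden_dim, dim))
    b_dec = np.zeros(dim)
    losses = []
    for _ in range(epochs):
        hidden = np.tanh(frames @ w_enc + b_enc)   # encoder
        recon = hidden @ w_dec + b_dec             # linear decoder
        err = recon - frames
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared reconstruction error.
        g_recon = 2.0 * err / err.size
        g_w_dec = hidden.T @ g_recon
        g_b_dec = g_recon.sum(axis=0)
        g_pre = (g_recon @ w_dec.T) * (1.0 - hidden ** 2)  # tanh derivative
        g_w_enc = frames.T @ g_pre
        g_b_enc = g_pre.sum(axis=0)
        w_enc -= lr * g_w_enc
        b_enc -= lr * g_b_enc
        w_dec -= lr * g_w_dec
        b_dec -= lr * g_b_dec
    return (w_enc, b_enc, w_dec, b_dec), losses


def extract_features(frames, params):
    """Map input frames to the hidden-layer code, used here as the new feature."""
    w_enc, b_enc, _, _ = params
    return np.tanh(frames @ w_enc + b_enc)


if __name__ == "__main__":
    # Synthetic stand-in for MFCC frames (assumption: 13 coefficients per frame).
    rng = np.random.default_rng(42)
    latent = rng.normal(size=(500, 4))
    frames = latent @ rng.normal(size=(4, 13)) + 0.1 * rng.normal(size=(500, 13))
    frames = (frames - frames.mean(axis=0)) / frames.std(axis=0)

    params, losses = train_autoencoder(frames)
    features = extract_features(frames, params)
    print("feature shape:", features.shape)
    print("reconstruction MSE: first epoch %.4f, last epoch %.4f"
          % (losses[0], losses[-1]))
```

In a recognition pipeline, the extracted codes would replace or augment the MFCC vectors fed to the acoustic model. The supervised deep neural network variant would instead train against labels and read features from a hidden layer.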
Liang Jing (梁静), Liu Gang (刘刚)
Communications
Speech Recognition; Deep Neural Network; Deep Auto-Encoder; Feature Extraction
Liang Jing, Liu Gang. Speech Feature Extraction Based on Deep Learning Models [EB/OL]. (2013-12-31) [2025-08-02]. http://www.paper.edu.cn/releasepaper/content/201312-1241.