HMM-based English Emphatic Speech Synthesis
Generating Emphasis in Expressive English Speech using HMM-based Speech Synthesis
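This paper analyzes and models the conversion of English emphatic speech and proposes an HMM-based emphatic speech synthesis method that uses an improved two-pass decision tree and a compensation model. The changes in acoustic features from neutral speech to emphatic speech are analyzed at the syllable level and the phone level, and a hierarchical conversion model from neutral to emphatic speech is built. Based on this data analysis, an emphasis-related feature set is designed and used to build an HMM synthesis model for English emphatic speech. An improved two-pass decision tree structure is proposed: the decision tree is first built with emphasis-independent questions, and its leaf nodes are then expanded with emphasis-related questions. This structure improves how well the synthesized speech expresses emphasis while preserving its naturalness. To further address the sparseness of emphatic speech data, a compensation model is added after the HMM predictor to correct predicted features whose training data come from an emphasis category that does not match the target. Experiments show that the English emphatic speech synthesis system with the improved two-pass decision tree and compensation model effectively improves the expression of emphasis in the synthesized speech while keeping its naturalness unchanged.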
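The two-pass tree construction (a tree grown first with emphasis-independent context questions, whose leaves are then split further using emphasis-related questions only) can be illustrated with a toy regression tree. This is a minimal sketch under stated assumptions, not the authors' implementation: each sample is a dictionary of context features plus one scalar acoustic target `y`, splits are chosen by reduction in sum of squared errors (a simple stand-in for the likelihood-gain criterion commonly used in HMM state clustering), and every name in it (`two_pass_tree`, `min_leaf`, `min_gain`, `is_vowel`, `is_emphasized`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical sample format: context features plus one scalar acoustic target "y".
Sample = Dict[str, object]
# A context question: (label, predicate over a sample).
Question = Tuple[str, Callable[[Sample], bool]]


@dataclass
class Node:
    samples: List[Sample]
    question: Optional[str] = None
    yes: Optional["Node"] = None
    no: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.question is None


def sse(samples: List[Sample]) -> float:
    """Sum of squared errors of the targets around their mean."""
    if not samples:
        return 0.0
    ys = [float(s["y"]) for s in samples]
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_split(samples, questions, min_leaf):
    """Return (gain, label, yes, no) for the best admissible question, or None."""
    base = sse(samples)
    best = None
    for label, pred in questions:
        yes = [s for s in samples if pred(s)]
        no = [s for s in samples if not pred(s)]
        if len(yes) < min_leaf or len(no) < min_leaf:
            continue
        gain = base - sse(yes) - sse(no)
        if best is None or gain > best[0]:
            best = (gain, label, yes, no)
    return best


def grow(node: Node, questions: List[Question], min_leaf: int, min_gain: float) -> None:
    """Greedily split `node` using only the given question set."""
    split = best_split(node.samples, questions, min_leaf)
    if split is None or split[0] < min_gain:
        return
    _, node.question, yes, no = split
    node.yes, node.no = Node(yes), Node(no)
    grow(node.yes, questions, min_leaf, min_gain)
    grow(node.no, questions, min_leaf, min_gain)


def leaves(node: Node) -> List[Node]:
    if node.is_leaf():
        return [node]
    return leaves(node.yes) + leaves(node.no)


def two_pass_tree(samples, neutral_qs, emphasis_qs, min_leaf=2, min_gain=1e-6) -> Node:
    root = Node(samples)
    # Pass 1: build the tree with emphasis-independent questions only.
    grow(root, neutral_qs, min_leaf, min_gain)
    # Pass 2: extend each pass-1 leaf with emphasis-related questions only.
    for leaf in leaves(root):
        grow(leaf, emphasis_qs, min_leaf, min_gain)
    return root


# Toy data: emphasis raises the target, more for vowels than for consonants.
samples = [
    {"vowel": True, "emph": False, "y": 5.0}, {"vowel": True, "emph": False, "y": 5.1},
    {"vowel": True, "emph": True, "y": 6.0}, {"vowel": True, "emph": True, "y": 6.1},
    {"vowel": False, "emph": False, "y": 1.0}, {"vowel": False, "emph": False, "y": 1.1},
    {"vowel": False, "emph": True, "y": 1.5}, {"vowel": False, "emph": True, "y": 1.6},
]
root = two_pass_tree(
    samples,
    neutral_qs=[("is_vowel", lambda s: bool(s["vowel"]))],
    emphasis_qs=[("is_emphasized", lambda s: bool(s["emph"]))],
)
print(root.question, root.yes.question, root.no.question)  # is_vowel is_emphasized is_emphasized
```

Because pass 2 only operates on the leaves of the pass-1 tree, emphasis questions can refine but never reorder the context splits learned from the whole corpus; this is one plausible reading of why the structure can keep naturalness while sharpening emphasis.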
Emphasis is an important form of expressiveness in speech. Hidden Markov model (HMM)-based synthesis has shown great flexibility in generating expressive speech. However, it does not work well when training data are scarce, as in the case of emphatic speech, where a sentence usually contains only a few emphatic words. To address this issue, this paper proposes a modified two-pass decision tree with a compensation model for emphatic speech synthesis. English syllables are first categorized into 6 classes based on their location and distance relative to the nearest emphatic word and its primary stressed syllable(s). English phones are then grouped into 9 types according to their manner of articulation. The decision tree is constructed using non-emphasis-related context questions in the first pass and is then extended with emphasis-related questions in the second pass. The compensation model modifies the parameters predicted by the two-pass decision tree based on phone types rather than individual phones, which alleviates the sparse data problem. Experiments show that the proposed method can generate emphatic speech of high quality in terms of both naturalness and emphasis.
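The phone-type-based compensation step can likewise be sketched. The assumptions here are not from the paper: compensation is taken to be a purely additive offset per (phone type, source emphasis class, target emphasis class), estimated as the mean residual between observed and predicted values on training data and applied only when the source class differs from the target. The phone-to-type table, the class labels `neutral` and `emphatic`, and the helpers `fit_compensation` and `compensate` are hypothetical placeholders, not the paper's 9 phone types or 6 syllable classes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical phone-to-type table. The paper groups English phones into 9 types,
# but the abstract does not give the grouping, so these labels are placeholders.
PHONE_TYPE = {
    "aa": "vowel", "iy": "vowel", "uw": "vowel",
    "m": "nasal", "n": "nasal",
    "p": "plosive", "t": "plosive",
    "s": "fricative",
}

# (phone type, source emphasis class, target emphasis class)
Key = Tuple[str, str, str]
# (phone, source class, target class, predicted value, observed value)
Record = Tuple[str, str, str, float, float]


def fit_compensation(records: Iterable[Record]) -> Dict[Key, float]:
    """Mean additive residual per (phone type, source, target), mismatched cases only."""
    sums: Dict[Key, float] = defaultdict(float)
    counts: Dict[Key, int] = defaultdict(int)
    for phone, src, tgt, predicted, observed in records:
        if src == tgt:
            continue  # prediction already comes from the right emphasis class
        key = (PHONE_TYPE[phone], src, tgt)
        sums[key] += observed - predicted
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def compensate(offsets: Dict[Key, float], phone: str, src: str, tgt: str,
               predicted: float) -> float:
    """Shift a predicted value by its phone type's offset; no shift if none was learned."""
    if src == tgt:
        return predicted
    return predicted + offsets.get((PHONE_TYPE[phone], src, tgt), 0.0)


records = [
    ("aa", "neutral", "emphatic", 5.0, 5.6),
    ("iy", "neutral", "emphatic", 4.0, 4.4),
    ("m", "neutral", "emphatic", 2.0, 2.1),
]
offsets = fit_compensation(records)
# "uw" never occurs in training, but it shares the vowel offset (about 0.5).
print(round(compensate(offsets, "uw", "neutral", "emphatic", 3.0), 3))  # 3.5
```

Pooling offsets by phone type lets `uw`, which has no training records, still receive the vowel offset estimated from `aa` and `iy`; this illustrates the sparse-data benefit the abstract attributes to working with phone types rather than individual phones.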
Fanbo Meng, Lianhong Cai, Meiling Meng, Zhiyong Wu, Jia Jia
Linguistics of Commonly Used Foreign Languages
Keywords: computer application; emphasized speech; speech synthesis; two-pass decision tree; compensation model; hidden Markov model (HMM)
MENG Fanbo, CAI Lianhong, MENG Meiling, WU Zhiyong, JIA Jia. HMM-based English Emphatic Speech Synthesis [EB/OL]. (2012-03-06) [2025-08-18]. http://www.paper.edu.cn/releasepaper/content/201203-191.