National Preprint Platform

Research on Multi-language Recognition and Language Relationships Based on Gentle AdaBoost

Multi-language recognition based on Gentle AdaBoost and language relationship map learning


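This paper applies a new feature extraction method to language recognition, builds a multi-language recognition system with the Gentle AdaBoost algorithm, and analyzes the distances between languages using the pairwise language recognition rates. For feature extraction, the 56 low-level descriptor (LLD) modules of the openEAR toolkit first extract the basic acoustic features of the speech signal. These features are then summarized with 39 statistical functions, and the resulting statistics, together with their first- and second-order differences, form a 6,552-dimensional feature vector. On the OGI-TS database, a language recognition system is built with Gentle AdaBoost, and the pairwise recognition rates output by the system are used to draw a language relationship map. The results show that the method performs comparably to classical methods and that the resulting language relationship map is largely consistent with the actual relationships between the languages, which offers a new approach to the study of language relationships.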
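The 6,552 dimensions follow from 56 LLDs × 39 functionals × 3 (the static values plus first- and second-order differences) = 6,552. The minimal numpy sketch below illustrates that construction under stated assumptions; it is not openEAR's configuration. It assumes the LLD contours have already been extracted as a frames × 56 matrix, uses five hypothetical functionals as stand-ins for the 39 (which the abstract does not list), and uses plain frame-to-frame differences. The abstract does not say whether the differences are taken on the LLD contours or on the statistics; this sketch assumes the contours, and either ordering yields 56 × 39 × 3 values.

```python
import numpy as np

N_LLD = 56          # low-level descriptors per frame (from the abstract)
N_FUNCTIONALS = 39  # statistical functionals (from the abstract)

def delta(x: np.ndarray) -> np.ndarray:
    """Plain first-order difference along the time axis. The first frame gets 0,
    so the number of frames is preserved (a simplifying assumption)."""
    return np.diff(x, axis=0, prepend=x[:1])

# Hypothetical stand-ins for the 39 functionals, which the abstract does not list.
FUNCTIONALS = {
    "mean": lambda c: c.mean(axis=0),
    "std": lambda c: c.std(axis=0),
    "min": lambda c: c.min(axis=0),
    "max": lambda c: c.max(axis=0),
    "range": lambda c: c.max(axis=0) - c.min(axis=0),
}

def utterance_features(lld: np.ndarray) -> np.ndarray:
    """Map a (frames, N_LLD) matrix of LLD contours to one fixed-length vector."""
    d1 = delta(lld)   # first-order differences (assumed to be taken on the contours)
    d2 = delta(d1)    # second-order differences
    # Each functional reduces over time, giving N_LLD values per contour type.
    parts = [f(contour) for contour in (lld, d1, d2) for f in FUNCTIONALS.values()]
    return np.concatenate(parts)

# Full-size dimensionality stated in the abstract: 56 x 39 x 3.
print(N_LLD * N_FUNCTIONALS * 3)     # 6552

rng = np.random.default_rng(0)
lld = rng.normal(size=(300, N_LLD))  # 300 frames of synthetic LLD values
vec = utterance_features(lld)
print(vec.shape)                     # (840,) = 56 x 5 functionals x 3
```

Swapping in the full list of 39 functionals would give the 6,552-dimensional vector without changing the loop.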

This paper applies a new feature set to language recognition. We use the openEAR toolkit to extract 6,552 features per speech sample; these features consist of 56 low-level descriptors (LLDs), 39 statistical functions applied to them, and their first- and second-order differences. Language model training is based on the Gentle AdaBoost algorithm. For recognizing language pairs, we compare the original AdaBoost algorithm with Gentle AdaBoost on the OGI-TS corpus. We then extend Gentle AdaBoost to build our multi-language recognition system. In addition, we use the output of the language recognition system to generate a language relationship map that shows the distances between the languages.
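Gentle AdaBoost differs from the original (discrete) AdaBoost in that each round fits a real-valued weak learner by weighted least squares and adds it to the ensemble without a step-size coefficient. The sketch below is not the authors' implementation. It assumes regression stumps as weak learners (the abstract does not name them), trains one binary classifier per language pair, and turns the pairwise test accuracies into a 2-D map. Everything in the demo is hypothetical: the four "languages" are synthetic Gaussian clusters, `n_rounds=30` is arbitrary, the distance `accuracy - 0.5` is just one choice that places easily confused pairs close together, and classical multidimensional scaling (MDS) is a swapped-in layout method, since the abstract does not say how the map is drawn.

```python
import numpy as np

def fit_stump(X, y, w):
    """Weighted least-squares regression stump: f(x) = left if x[j] <= thr else right."""
    best = None
    for j in range(X.shape[1]):
        order = np.argsort(X[:, j])
        xs, ys, ws = X[order, j], y[order], w[order]
        cw, cwy = np.cumsum(ws), np.cumsum(ws * ys)   # left-side sums for each split
        ks = np.nonzero(xs[:-1] < xs[1:])[0]          # split only between distinct values
        if ks.size == 0:
            continue
        lw, lwy = cw[ks], cwy[ks]
        rw, rwy = cw[-1] - lw, cwy[-1] - lwy
        # With y in {-1, +1}, weighted squared error = sum(w) - lwy^2/lw - rwy^2/rw,
        # so minimising the error means maximising this score.
        score = lwy**2 / lw + rwy**2 / rw
        i = int(np.argmax(score))
        if best is None or score[i] > best[0]:
            k = ks[i]
            best = (score[i], j, 0.5 * (xs[k] + xs[k + 1]), lwy[i] / lw[i], rwy[i] / rw[i])
    if best is None:
        raise ValueError("all features are constant")
    return best[1:]  # (feature, threshold, left value, right value)

def stump_predict(stump, X):
    j, thr, left, right = stump
    return np.where(X[:, j] <= thr, left, right)

class GentleAdaBoost:
    """Binary Gentle AdaBoost (labels in {-1, +1}) with regression stumps."""
    def __init__(self, n_rounds=30):
        self.n_rounds = n_rounds
        self.stumps = []

    def fit(self, X, y):
        w = np.full(len(y), 1.0 / len(y))
        for _ in range(self.n_rounds):
            stump = fit_stump(X, y, w)
            f = stump_predict(stump, X)   # real-valued output in [-1, 1]
            self.stumps.append(stump)
            w = w * np.exp(-y * f)        # no step-size alpha, unlike discrete AdaBoost
            w /= w.sum()
        return self

    def predict(self, X):
        F = sum(stump_predict(s, X) for s in self.stumps)
        return np.where(F >= 0, 1.0, -1.0)

def classical_mds(dist, dims=2):
    """Embed a symmetric distance matrix in `dims` dimensions (classical MDS)."""
    n = dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (dist ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:dims]
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))

# Synthetic demo: four hypothetical "languages" as 2-D Gaussian clusters.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [0.5, 0.0], [4.0, 0.0], [4.0, 4.0]])

def pair_data(a, b, n=200):
    X = np.vstack([rng.normal(centers[a], 1.0, (n, 2)), rng.normal(centers[b], 1.0, (n, 2))])
    return X, np.r_[np.ones(n), -np.ones(n)]

k = len(centers)
acc = np.zeros((k, k))
for a in range(k):
    for b in range(a + 1, k):
        (Xtr, ytr), (Xte, yte) = pair_data(a, b), pair_data(a, b)
        clf = GentleAdaBoost().fit(Xtr, ytr)
        acc[a, b] = acc[b, a] = np.mean(clf.predict(Xte) == yte)

dist = np.clip(acc - 0.5, 0.0, None)  # chance-level (confusable) pair -> distance 0
coords = classical_mds(dist)          # 2-D "language relationship map"
print(np.round(acc, 2))
```

With this setup the confusable pair (clusters 0 and 1, centers 0.5 apart) should score near chance and land close together on the map, while well-separated pairs approach an accuracy of 1.0. A multi-class system could combine the same pairwise classifiers, for example by one-vs-one voting; that step is omitted here.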

Hu Haoji (胡浩基), Zhou Qiang (周强), Sun Le (孙乐), Yu Huimin (于慧敏)

Subject: Linguistics

Keywords: language recognition; Gentle AdaBoost; multi-class classification; language relationship map

Hu Haoji, Zhou Qiang, Sun Le, Yu Huimin. Research on multi-language recognition and language relationships based on Gentle AdaBoost [EB/OL]. (2016-03-04) [2025-08-22]. http://www.paper.edu.cn/releasepaper/content/201603-52.
