National Preprint Platform

Short-Speech Speaker Recognition Based on Time-Scale Aggregation

Abstract

Speaker recognition distinguishes identities based on individual voice characteristics and is widely used in voice assistants, intelligent security, and related fields. Short speech, however, is limited in duration, which makes it difficult to extract stable speaker features and seriously degrades recognition accuracy. Traditional multi-scale feature aggregation methods mostly focus on information fusion along the channel dimension and may fail to fully capture the temporal dynamics that are critical in short-speech scenarios. This paper proposes a multi-scale feature aggregation method based on temporal features. The method constructs a multi-scale feature extraction module that captures both local and global temporal features in short speech and enhances the complementarity of features at different scales. With the model size reduced by 50%, it achieves an accuracy improvement of approximately 1%.
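The abstract does not specify the architecture, so the following is only a minimal sketch of the general idea of aggregating along the time axis at several scales. It is not the authors' module. The function names, the choice of sliding-window averaging followed by max-over-time pooling, and the scales `(1, 2, 4)` are all assumptions made for illustration. Small windows keep local detail, while a window as long as the utterance reduces to a global average.

```python
def moving_average(frames, window):
    """Average each feature over a sliding window of `window` frames (stride 1)."""
    n_frames = len(frames)
    dim = len(frames[0])
    if window > n_frames:
        window = n_frames  # very short input: fall back to one global window
    smoothed = []
    for start in range(n_frames - window + 1):
        chunk = frames[start:start + window]
        smoothed.append([sum(f[d] for f in chunk) / window for d in range(dim)])
    return smoothed


def multi_scale_temporal_pool(frames, scales=(1, 2, 4)):
    """Hypothetical aggregation: concatenate max-over-time statistics per scale.

    `frames` is a list of per-frame feature vectors (time x dim). The output
    length is dim * len(scales), independent of the number of frames.
    """
    if not frames:
        raise ValueError("need at least one frame")
    embedding = []
    for window in scales:
        smoothed = moving_average(frames, window)
        dim = len(smoothed[0])
        # Max over time keeps the strongest response seen at this scale.
        embedding.extend(max(row[d] for row in smoothed) for d in range(dim))
    return embedding


# Four frames of two-dimensional toy features (illustrative values only).
frames = [[0.0, 1.0], [2.0, 1.0], [4.0, 3.0], [2.0, 5.0]]
print(multi_scale_temporal_pool(frames))
```

Because each scale contributes a fixed number of values, utterances of different lengths map to embeddings of the same size, which is one reason pooling over time is a common choice for short inputs.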

Wang Yixuan (王逸轩), Bie Hongxia (别红霞)

School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876

Computing technology, computer technology; automation technology, automation equipment

artificial intelligence; speaker recognition; short speech; feature aggregation

Wang Yixuan, Bie Hongxia. Short-Speech Speaker Recognition Based on Time-Scale Aggregation [EB/OL]. (2025-04-18)[2025-05-10]. http://www.paper.edu.cn/releasepaper/content/202504-171.
