Research on Lightweighting the HiFi-GAN Model
The HiFi-GAN model synthesizes audio efficiently and with high fidelity, but its large parameter count and computational cost make it difficult to store, deploy, and run in real time on resource-constrained devices. Few studies address the compression and acceleration of HiFi-GAN, even though real-time speech synthesis on mobile and edge devices is in wide demand. This paper studies lightweighting methods for the HiFi-GAN model, achieving network compression and deployment on a hardware platform. The main work and results are as follows. To address HiFi-GAN's high consumption of computing and storage resources at inference time, as well as its structural complexity, the paper proposes a compression method that combines knowledge distillation with architecture search. The method applies convolution decomposition to the resBlock layers of HiFi-GAN to obtain a relatively compact student model, which is then trained by distillation with a purpose-designed training objective. The student network is then used as a "one-shot" network for architecture search, which yields a compact, optimal sub-student model. Experiments show that the method significantly reduces both the size and the computational cost of HiFi-GAN while achieving good speech quality, as measured by PESQ, on the single-speaker LJSpeech dataset and on unseen speakers from the VCTK dataset.
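To give a rough sense of why decomposing the resBlock convolutions shrinks the model, the sketch below compares the parameter count of a standard 1D convolution with that of a factorized one. The abstract does not say which decomposition is used, so the depthwise-plus-pointwise factorization here is an assumption for illustration only, as are the channel width (256) and kernel sizes (3, 7, 11); the helper names `conv1d_params` and `separable_conv1d_params` are hypothetical.

```python
def conv1d_params(c_in, c_out, kernel_size, groups=1, bias=True):
    """Learnable parameters in a 1D convolution (dilation adds none)."""
    weights = (c_in // groups) * c_out * kernel_size
    return weights + (c_out if bias else 0)


def separable_conv1d_params(c_in, c_out, kernel_size, bias=True):
    """Assumed decomposition: depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = conv1d_params(c_in, c_in, kernel_size, groups=c_in, bias=bias)
    pointwise = conv1d_params(c_in, c_out, 1, bias=bias)
    return depthwise + pointwise


# Illustrative values only; not taken from the paper.
channels = 256
for k in (3, 7, 11):
    full = conv1d_params(channels, channels, k)
    sep = separable_conv1d_params(channels, channels, k)
    print(f"kernel={k}: standard={full}, decomposed={sep}, ratio={full / sep:.1f}x")
```

Under these assumptions the saving grows with kernel size, because the decomposed layer's cost is dominated by the pointwise term, which does not depend on the kernel width.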
Bie Hongxia (别红霞), Gao Bin (高斌)
Applications of electronic technology; computing technology and computer technology
artificial intelligence; HiFi-GAN; knowledge distillation; architecture search
Bie Hongxia, Gao Bin. Research on Lightweighting the HiFi-GAN Model [EB/OL]. (2024-04-15) [2025-08-24]. http://www.paper.edu.cn/releasepaper/content/202404-183.