
Audio-Driven Talking Face Video Generation with Dynamic Convolution Kernels

Source: arXiv
Abstract

In this paper, we present a dynamic convolution kernel (DCK) strategy for convolutional neural networks. Using a fully convolutional network with the proposed DCKs, high-quality talking-face video can be generated from multi-modal sources (i.e., unmatched audio and video) in real time, and our trained model is robust to different identities, head postures, and input audio. Our proposed DCKs are specially designed for audio-driven talking-face video generation, leading to a simple yet effective end-to-end system. We also provide a theoretical analysis to interpret why DCKs work. Experimental results show that our method can generate high-quality talking-face video with background at 60 fps. Comparisons and evaluations against state-of-the-art methods demonstrate the superiority of our method.
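The core idea named in the abstract, convolution kernels that are generated on the fly from audio features rather than learned as fixed weights, can be illustrated with a short sketch. The following is a minimal PyTorch illustration under assumed design choices (an MLP maps a per-frame audio embedding to a flattened kernel tensor, which is applied per sample via a grouped convolution); all module names and dimensions are hypothetical and do not reflect the authors' actual implementation.

```python
# Minimal sketch of a dynamic convolution kernel (DCK) layer: the kernel is
# predicted from an audio feature vector instead of being a fixed parameter.
# All names and sizes here are illustrative assumptions, not the paper's design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicConv2d(nn.Module):
    """Conv layer whose kernel is generated from a conditioning vector."""

    def __init__(self, audio_dim, in_ch, out_ch, k=3):
        super().__init__()
        self.in_ch, self.out_ch, self.k = in_ch, out_ch, k
        # Small MLP maps the audio embedding to a flattened kernel tensor.
        self.kernel_gen = nn.Sequential(
            nn.Linear(audio_dim, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, out_ch * in_ch * k * k),
        )

    def forward(self, x, audio_feat):
        # x: (B, in_ch, H, W); audio_feat: (B, audio_dim)
        B, _, H, W = x.shape
        kernels = self.kernel_gen(audio_feat)            # (B, out_ch*in_ch*k*k)
        kernels = kernels.view(B * self.out_ch, self.in_ch, self.k, self.k)
        # Grouped-convolution trick: fold the batch into the channel axis so
        # each sample is convolved with its own audio-conditioned kernel.
        x = x.reshape(1, B * self.in_ch, H, W)
        out = F.conv2d(x, kernels, padding=self.k // 2, groups=B)
        return out.view(B, self.out_ch, H, W)


if __name__ == "__main__":
    layer = DynamicConv2d(audio_dim=128, in_ch=16, out_ch=32)
    frames = torch.randn(4, 16, 64, 64)   # batch of image feature maps
    audio = torch.randn(4, 128)           # per-frame audio embeddings
    print(layer(frames, audio).shape)     # torch.Size([4, 32, 64, 64])
```

In such a design, the convolution itself stays a single cheap operation, which is consistent with the abstract's claim of real-time generation; only the small kernel-generating MLP runs per frame.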

Yong-jin Liu, Juyong Zhang, Yu-Kun Lai, Xuwei Huang, Mengfei Xia, Ran Yi, Zipeng Ye, Guoxin Zhang

DOI: 10.1109/TMM.2022.3142387

Subjects: Computing and Computer Technology; Electronic Technology Applications

Yong-jin Liu, Juyong Zhang, Yu-Kun Lai, Xuwei Huang, Mengfei Xia, Ran Yi, Zipeng Ye, Guoxin Zhang. Audio-Driven Talking Face Video Generation with Dynamic Convolution Kernels [EB/OL]. (2022-01-16) [2025-06-09]. https://arxiv.org/abs/2201.05986.
