DicFace: Dirichlet-Constrained Variational Codebook Learning for Temporally Coherent Video Face Restoration
Video face restoration faces a critical challenge: maintaining temporal consistency while recovering fine facial details from degraded inputs. This paper presents a novel approach that extends Vector-Quantized Variational Autoencoders (VQ-VAEs), pretrained on static high-quality portraits, into a video restoration framework through variational latent space modeling. Our key innovation lies in reformulating discrete codebook representations as Dirichlet-distributed continuous variables, enabling probabilistic transitions between facial features across frames. A spatio-temporal Transformer architecture jointly models inter-frame dependencies and predicts latent distributions, while a Laplacian-constrained reconstruction loss combined with perceptual (LPIPS) regularization enhances both pixel accuracy and visual quality. Comprehensive evaluations on blind face restoration, video inpainting, and facial colorization tasks demonstrate state-of-the-art performance. This work establishes an effective paradigm for adapting image priors pretrained on high-quality still images to video restoration while addressing the long-standing problem of flicker artifacts. The source code is available at https://github.com/fudan-generative-vision/DicFace.
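To make the abstract's two core ideas concrete, below is a minimal, hypothetical sketch (not the authors' released code): a Dirichlet-parameterized soft lookup over a frozen VQ codebook, and a reconstruction loss pairing an L1 term (the maximum-likelihood objective under a Laplacian noise model, which we assume is what "Laplacian-constrained" refers to) with LPIPS regularization. The class and function names, tensor shapes, and the loss weight `lam` are illustrative assumptions; `lpips_fn` is assumed to be a perceptual metric such as `lpips.LPIPS(net='vgg')`.

```python
# Minimal sketch: Dirichlet-distributed codebook weights replace the hard
# nearest-neighbour lookup of a standard VQ-VAE, so latent codes can blend
# and transition smoothly across video frames.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DirichletCodebookHead(nn.Module):
    """Predicts Dirichlet concentration parameters over K codebook entries
    and returns a convex combination of code vectors instead of a hard
    argmin lookup (hypothetical module name and interface)."""

    def __init__(self, feat_dim: int, codebook: torch.Tensor):
        super().__init__()
        self.register_buffer("codebook", codebook)   # (K, D), frozen image prior
        self.to_alpha = nn.Linear(feat_dim, codebook.shape[0])

    def forward(self, h: torch.Tensor):
        # h: (B, T, N, feat_dim) spatio-temporal Transformer features.
        # Softplus keeps the concentration parameters strictly positive.
        alpha = F.softplus(self.to_alpha(h)) + 1e-4
        if self.training:
            # Reparameterized sample: stochastic weights on the simplex.
            w = torch.distributions.Dirichlet(alpha).rsample()
        else:
            # Deterministic Dirichlet mean at inference for stable outputs.
            w = alpha / alpha.sum(dim=-1, keepdim=True)
        # Convex combination of code vectors: (..., K) @ (K, D) -> (..., D).
        z = w @ self.codebook
        return z, alpha


def restoration_loss(pred, target, lpips_fn, lam=0.1):
    """L1 (Laplace-likelihood) reconstruction plus LPIPS regularization.
    pred/target: (B, T, 3, H, W) in [0, 1]; `lam` is an assumed weight."""
    l1 = F.l1_loss(pred, target)
    # LPIPS expects (N, 3, H, W) images scaled to [-1, 1].
    perc = lpips_fn(pred.flatten(0, 1) * 2 - 1,
                    target.flatten(0, 1) * 2 - 1).mean()
    return l1 + lam * perc
```

Sampling the weights during training regularizes the latent space, while using the Dirichlet mean at inference yields deterministic, temporally smooth code mixtures; this is one plausible reading of how the probabilistic relaxation suppresses frame-to-frame flicker.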
Yan Chen, Hanlin Shang, Ce Liu, Yuxuan Chen, Hui Li, Weihao Yuan, Hao Zhu, Zilong Dong, Siyu Zhu
Computing Technology, Computer Technology
Yan Chen, Hanlin Shang, Ce Liu, Yuxuan Chen, Hui Li, Weihao Yuan, Hao Zhu, Zilong Dong, Siyu Zhu. DicFace: Dirichlet-Constrained Variational Codebook Learning for Temporally Coherent Video Face Restoration [EB/OL]. (2025-06-16) [2025-08-02]. https://arxiv.org/abs/2506.13355.