MSF: Efficient Diffusion Model Via Multi-Scale Latent Factorize

Source: arXiv
Abstract

While diffusion-based generative models have made significant strides in visual content creation, conventional approaches face computational challenges, especially for high-resolution images, because they denoise the entire image from noisy inputs. This contrasts with signal processing techniques, such as Fourier and wavelet analysis, which often employ hierarchical decompositions. Inspired by such principles, particularly the idea of signal separation, we introduce a diffusion framework that leverages multi-scale latent factorization. Our framework decomposes the denoising target, typically latent features from a pretrained Variational Autoencoder, into a low-frequency base signal that captures core structural information and a high-frequency residual signal that contributes finer details such as textures. This base/residual decomposition directly informs our two-stage image generation process, which first generates the low-resolution base and then the high-resolution residual. Because residual information is inherently easier to model, our architecture needs fewer sampling steps in the residual stage, which gives it an advantage over conventional full-resolution generation. Decomposing the signal into a base and a residual, conceptually akin to how wavelet analysis separates frequency bands, yields a more streamlined and intuitive design than generic hierarchical models. Our method, MSF (Multi-Scale Factorization), achieves FID scores of 2.08 (256×256) and 2.47 (512×512) on class-conditional ImageNet benchmarks, outperforming the DiT baseline (2.27 and 3.04, respectively) while also delivering a 4× speed-up with the same number of sampling steps.
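The base/residual split described above can be illustrated with a minimal NumPy sketch. The abstract does not say which downsampling and upsampling operators MSF uses, so average pooling, nearest-neighbour upsampling and a factor of 2 (base at half resolution) are assumptions here. The function names `downsample`, `upsample`, `factorize` and `reconstruct` are hypothetical, and the sketch covers only the data decomposition, not the two diffusion models that generate each part.

```python
import numpy as np

def downsample(x: np.ndarray, factor: int) -> np.ndarray:
    """Average-pool a (C, H, W) latent by `factor` along H and W (assumed low-pass operator)."""
    c, h, w = x.shape
    assert h % factor == 0 and w % factor == 0
    return x.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))

def upsample(x: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling back to full resolution (assumed choice)."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def factorize(z: np.ndarray, factor: int = 2):
    """Split a latent into a low-resolution base and a full-resolution residual."""
    base = downsample(z, factor)
    residual = z - upsample(base, factor)
    return base, residual

def reconstruct(base: np.ndarray, residual: np.ndarray, factor: int = 2) -> np.ndarray:
    """Recombine the stage-1 output (base) and the stage-2 output (residual)."""
    return upsample(base, factor) + residual

# Hypothetical 4-channel 32x32 latent, e.g. an SD-VAE-style latent of a 256x256 image.
rng = np.random.default_rng(0)
z = rng.standard_normal((4, 32, 32))
base, residual = factorize(z)
print(base.shape, residual.shape)                   # (4, 16, 16) (4, 32, 32)
print(np.allclose(reconstruct(base, residual), z))  # True
```

Under these assumptions the factorization is exactly invertible, and every 2×2 block of the residual has zero mean. The residual therefore carries only detail the base cannot represent, which is one way to read the abstract's intuition that residual information is easier to model.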

Shuangrui Ding, Longyu Chen, Yichen Zhang, Zhipeng Zhang, Haohang Xu

Computing Technology, Computer Technology

Shuangrui Ding, Longyu Chen, Yichen Zhang, Zhipeng Zhang, Haohang Xu. MSF: Efficient Diffusion Model Via Multi-Scale Latent Factorize [EB/OL]. (2025-06-30) [2025-07-16]. https://arxiv.org/abs/2501.13349.
