STORK: Improving the Fidelity of Mid-NFE Sampling for Diffusion and Flow Matching Models
Diffusion models (DMs) have demonstrated remarkable performance in high-fidelity image and video generation. Because high-quality generation with DMs typically requires a large number of function evaluations (NFEs), which makes sampling slow, extensive research has succeeded in reducing the NFE to a small range (<10) while maintaining acceptable image quality. However, many practical applications, such as those involving Stable Diffusion 3.5, FLUX, and SANA, commonly operate in the mid-NFE regime (20-50 NFE) to achieve superior results, and despite its practical relevance, effective sampling within this regime remains underexplored. In this work, we propose a novel, training-free, and structure-independent DM ODE solver called the Stabilized Taylor Orthogonal Runge-Kutta (STORK) method, based on a class of stiff ODE solvers with a Taylor expansion adaptation. Unlike prior work such as DPM-Solver, which depends on the semi-linear structure of the DM ODE, STORK is applicable to any DM sampling, including noise-based and flow matching-based models. Within the 20-50 NFE range, STORK achieves improved generation quality, as measured by FID scores, across unconditional pixel-level generation and conditional latent-space generation tasks using models like Stable Diffusion 3.5 and SANA. Code is available at https://github.com/ZT220501/STORK.
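To make the notion of an NFE budget concrete, the following is a minimal sketch and not the STORK method itself. It assumes a hypothetical toy velocity field `toy_velocity` (dx/dt = -x) in place of a trained network, and compares two generic ODE solvers, explicit Euler and Heun, at the same budget of 20 function evaluations. All names are illustrative.

```python
import math


class CountingField:
    """Wraps a velocity field and counts its evaluations (the NFE)."""

    def __init__(self, fn):
        self.fn = fn
        self.nfe = 0

    def __call__(self, x, t):
        self.nfe += 1
        return self.fn(x, t)


def toy_velocity(x, t):
    # Hypothetical stand-in for a trained network: dx/dt = -x,
    # whose exact solution is x(t) = x0 * exp(-t).
    return -x


def euler_sample(field, x0, num_steps):
    """Explicit Euler from t=0 to t=1: one function evaluation per step."""
    h = 1.0 / num_steps
    x = x0
    for k in range(num_steps):
        x = x + h * field(x, k * h)
    return x


def heun_sample(field, x0, num_steps):
    """Heun (second-order Runge-Kutta): two function evaluations per step."""
    h = 1.0 / num_steps
    x = x0
    for k in range(num_steps):
        t = k * h
        v1 = field(x, t)
        v2 = field(x + h * v1, t + h)
        x = x + 0.5 * h * (v1 + v2)
    return x


exact = math.exp(-1.0)

euler_field = CountingField(toy_velocity)
euler_err = abs(euler_sample(euler_field, 1.0, num_steps=20) - exact)

heun_field = CountingField(toy_velocity)
heun_err = abs(heun_sample(heun_field, 1.0, num_steps=10) - exact)

print(f"Euler: NFE={euler_field.nfe}, error={euler_err:.5f}")
print(f"Heun:  NFE={heun_field.nfe}, error={heun_err:.5f}")
```

Both runs spend the same 20 evaluations, yet the higher-order solver ends closer to the exact solution on this toy problem. This illustrates why the choice of solver matters at a fixed NFE.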
Zheng Tan, Weizhen Wang, Andrea L. Bertozzi, Ernest K. Ryu
Computing technology, computer technology
Zheng Tan, Weizhen Wang, Andrea L. Bertozzi, Ernest K. Ryu. STORK: Improving the Fidelity of Mid-NFE Sampling for Diffusion and Flow Matching Models [EB/OL]. (2025-05-30) [2025-07-16]. https://arxiv.org/abs/2505.24210.