PixelFlow: Pixel-Space Generative Models with Flow

Source: arXiv
Abstract

We present PixelFlow, a family of image generation models that operate directly in raw pixel space, in contrast to the predominant latent-space models. This approach simplifies the image generation process by eliminating the need for a pre-trained Variational Autoencoder (VAE) and making the whole model end-to-end trainable. Through efficient cascade flow modeling, PixelFlow achieves an affordable computation cost in pixel space. It achieves an FID of 1.98 on the 256$\times$256 ImageNet class-conditional image generation benchmark. The qualitative text-to-image results demonstrate that PixelFlow excels in image quality, artistry, and semantic control. We hope this new paradigm will inspire and open up new opportunities for next-generation visual generation models. Code and models are available at https://github.com/ShoufaChen/PixelFlow.

Shoufa Chen, Chongjian Ge, Shilong Zhang, Peize Sun, Ping Luo

Computing technology, computer technology

Shoufa Chen, Chongjian Ge, Shilong Zhang, Peize Sun, Ping Luo. PixelFlow: Pixel-Space Generative Models with Flow [EB/OL]. (2025-04-10) [2025-05-02]. https://arxiv.org/abs/2504.07963.