EZIGen: Enhancing zero-shot personalized image generation with precise subject encoding and decoupled guidance
Zero-shot personalized image generation models aim to produce images that align with both a given text prompt and a subject image, requiring the model to incorporate both sources of guidance. Existing methods often struggle to capture fine-grained subject details and frequently prioritize one form of guidance over the other, resulting in suboptimal subject encoding and imbalanced generation. In this study, we uncover key insights into overcoming these drawbacks, notably that 1) the choice of subject image encoder critically influences subject identity preservation and training efficiency, and 2) text and subject guidance should take effect at different denoising stages. Building on these insights, we introduce EZIGen, an approach built on two main components: a fixed pre-trained Diffusion UNet that serves as the subject encoder, and a generation process that balances the two guidances by separating their dominance stages and revisiting certain time steps to bootstrap subject transfer quality. With these two components, EZIGen, initially built upon SD2.1-base, achieves state-of-the-art performance on multiple personalized generation benchmarks with a unified model, while using 100 times less training data. Moreover, by migrating our design to SDXL, we show that EZIGen is a versatile, model-agnostic solution for personalized generation. Demo Page: zichengduan.github.io/pages/EZIGen/index.html
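The decoupled-guidance idea described above can be sketched in code. The following Python snippet is a minimal illustrative sketch, not the authors' implementation: all names (denoise_text_only, denoise_with_subject, extract_subject_features, SWITCH, revisit_rounds) are hypothetical stand-ins, and the stub bodies only mimic the control flow of letting text guidance dominate early denoising steps, injecting features from a frozen pre-trained UNet acting as the subject encoder in later steps, and revisiting time steps to bootstrap subject transfer.

import torch

T = 50        # total denoising steps (assumed)
SWITCH = 30   # step at which subject guidance takes over (assumed)

def denoise_text_only(x, t):
    # Stand-in for one UNet denoising step conditioned on the text prompt alone.
    return x - 0.01 * torch.randn_like(x)

def denoise_with_subject(x, t, subject_feats):
    # Stand-in for one UNet denoising step that additionally attends
    # to the cached subject features.
    return x - 0.01 * torch.randn_like(x)

def extract_subject_features(subject_image):
    # Stand-in for running the frozen pre-trained UNet over the subject
    # image once and caching its features (the "precise subject encoding").
    return torch.randn(1, 77, 768)

def generate(subject_image, revisit_rounds=2):
    subject_feats = extract_subject_features(subject_image)
    x = torch.randn(1, 4, 64, 64)     # initial latent noise
    for _ in range(revisit_rounds):   # revisiting bootstraps subject transfer
        for t in range(T):
            if t < SWITCH:
                x = denoise_text_only(x, t)                    # layout from text
            else:
                x = denoise_with_subject(x, t, subject_feats)  # subject identity
        # Re-noise the latent and repeat, so subject details can be
        # refined on top of the established layout (assumed mechanism).
        x = x + 0.1 * torch.randn_like(x)
    return x

The key design point the sketch illustrates is the separation of dominance stages: early steps commit to a text-driven layout before any subject features are injected, so neither source of guidance overrides the other.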
Ziqin Zhou, Ethan Smith, Lingqiao Liu, Zicheng Duan, Yuxuan Ding, Chenhui Gou
Computing Technology, Computer Technology
Ziqin Zhou, Ethan Smith, Lingqiao Liu, Zicheng Duan, Yuxuan Ding, Chenhui Gou. EZIGen: Enhancing zero-shot personalized image generation with precise subject encoding and decoupled guidance [EB/OL]. (2024-09-12) [2025-08-02]. https://arxiv.org/abs/2409.08091.