Text-to-Image Alignment in Denoising-Based Models through Step Selection

Abstract

Visual generative AI models often encounter challenges related to text-image alignment and reasoning limitations. This paper presents a novel method for selectively enhancing the signal at critical denoising steps, optimizing image generation based on input semantics. Our approach addresses the shortcomings of early-stage signal modifications, demonstrating that adjustments made at later stages yield superior results. We conduct extensive experiments to validate the effectiveness of our method in producing semantically aligned images on Diffusion and Flow Matching models, achieving state-of-the-art performance. Our results highlight the importance of a judicious choice of sampling stage to improve performance and overall image alignment.

Authors: Paul Grimal, Hervé Le Borgne, Olivier Ferret

Subject classification: Computing technology, computer technology

Recommended citation: Paul Grimal, Hervé Le Borgne, Olivier Ferret. Text-to-Image Alignment in Denoising-Based Models through Step Selection [EB/OL]. (2025-04-24) [2025-06-09]. https://arxiv.org/abs/2504.17525.
