Image-Editing Specialists: An RLAIF Approach for Diffusion Models

Source: arXiv
Abstract

We present a novel approach to training specialized instruction-based image-editing diffusion models, addressing key challenges in structural preservation with input images and semantic alignment with user prompts. We introduce an online reinforcement learning framework that aligns the diffusion model with human preferences without relying on extensive human annotations or curating a large dataset. Our method significantly improves the realism and alignment with instructions in two ways. First, the proposed models achieve precise and structurally coherent modifications in complex scenes while maintaining high fidelity in instruction-irrelevant areas. Second, they capture fine nuances in the desired edit by leveraging a visual prompt, enabling detailed control over visual edits without lengthy textual prompts. This approach simplifies users' efforts to achieve highly specific edits, requiring only 5 reference images depicting a certain concept for training. Experimental results demonstrate that our models can perform intricate edits in complex scenes after just 10 training steps. Finally, we showcase the versatility of our method by applying it to robotics, where enhancing the visual realism of simulated environments through targeted sim-to-real image edits improves their utility as proxies for real-world settings.

Elior Benarous, Yilun Du, Heng Yang

Subjects: Information Science and Information Technology; Computing Technology and Computer Technology

Elior Benarous, Yilun Du, Heng Yang. Image-Editing Specialists: An RLAIF Approach for Diffusion Models [EB/OL]. (2025-04-17) [2025-05-14]. https://arxiv.org/abs/2504.12833.
