首页|Beyond Editing Pairs: Fine-Grained Instructional Image Editing via Multi-Scale Learnable Regions

Beyond Editing Pairs: Fine-Grained Instructional Image Editing via Multi-Scale Learnable Regions

来源：

英文摘要

Current text-driven image editing methods typically follow one of two directions: relying on large-scale, high-quality editing pair datasets to improve editing precision and diversity, or exploring alternative dataset-free techniques. However, constructing large-scale editing datasets requires carefully designed pipelines, is time-consuming, and often results in unrealistic samples or unwanted artifacts. Meanwhile, dataset-free methods may suffer from limited instruction comprehension and restricted editing capabilities. Faced with these challenges, the present work develops a novel paradigm for instruction-driven image editing that leverages widely available and enormous text-image pairs, instead of relying on editing pair datasets. Our approach introduces a multi-scale learnable region to localize and guide the editing process. By treating the alignment between images and their textual descriptions as supervision and learning to generate task-specific editing regions, our method achieves high-fidelity, precise, and instruction-consistent image editing. Extensive experiments demonstrate that the proposed approach attains state-of-the-art performance across various tasks and benchmarks, while exhibiting strong adaptability to various types of generative models.

作者：Chenrui Ma、Xi Xiao、Tianyang Wang、Yanning Shen

作者单位：

学科分类：计算技术、计算机技术

推荐引用：Chenrui Ma,Xi Xiao,Tianyang Wang,Yanning Shen.Beyond Editing Pairs: Fine-Grained Instructional Image Editing via Multi-Scale Learnable Regions[EB/OL].(2025-05-25)[2025-06-13].https://arxiv.org/abs/2505.19352.点此复制

Beyond Editing Pairs: Fine-Grained Instructional Image Editing via Multi-Scale Learnable Regions

Beyond Editing Pairs: Fine-Grained Instructional Image Editing via Multi-Scale Learnable Regions

评论