LAMIC: Layout-Aware Multi-Image Composition via Scalability of Multimodal Diffusion Transformer

Abstract

In controllable image synthesis, generating coherent and consistent images from multiple references with spatial layout awareness remains an open challenge. We present LAMIC, a Layout-Aware Multi-Image Composition framework that, for the first time, extends single-reference diffusion models to multi-reference scenarios in a training-free manner. Built upon the MMDiT model, LAMIC introduces two plug-and-play attention mechanisms: 1) Group Isolation Attention (GIA) to enhance entity disentanglement; and 2) Region-Modulated Attention (RMA) to enable layout-aware generation. To comprehensively evaluate model capabilities, we further introduce three metrics: 1) Inclusion Ratio (IN-R) and Fill Ratio (FI-R) for assessing layout control; and 2) Background Similarity (BG-S) for measuring background consistency. Extensive experiments show that LAMIC achieves state-of-the-art performance across most major metrics: it consistently outperforms existing multi-reference baselines in ID-S, BG-S, IN-R and AVG scores across all settings, and achieves the best DPG in complex composition tasks. These results demonstrate LAMIC's superior abilities in identity keeping, background preservation, layout control, and prompt-following, all achieved without any training or fine-tuning, showcasing strong zero-shot generalization ability. By inheriting the strengths of advanced single-reference models and enabling seamless extension to multi-image scenarios, LAMIC establishes a new training-free paradigm for controllable multi-image composition. As foundation models continue to evolve, LAMIC's performance is expected to scale accordingly. Our implementation is available at: https://github.com/Suchenl/LAMIC.

Authors: Yuzhuo Chen, Zehua Ma, Jianhua Wang, Kai Kang, Shunyu Yao, Weiming Zhang

Subject classification: Computing technology, computer technology

Recommended citation: Yuzhuo Chen, Zehua Ma, Jianhua Wang, Kai Kang, Shunyu Yao, Weiming Zhang. LAMIC: Layout-Aware Multi-Image Composition via Scalability of Multimodal Diffusion Transformer [EB/OL]. (2025-08-01) [2025-08-11]. https://arxiv.org/abs/2508.00477.
