Product of Experts for Visual Generation

Abstract

Modern neural models capture rich priors and have complementary knowledge over shared data domains, e.g., images and videos. Integrating diverse knowledge from multiple sources -- including visual generative models, visual language models, and sources with human-crafted knowledge such as graphics engines and physics simulators -- remains under-explored. We propose a Product of Experts (PoE) framework that performs inference-time knowledge composition from heterogeneous models. This training-free approach samples from the product distribution across experts via Annealed Importance Sampling (AIS). Our framework shows practical benefits in image and video synthesis tasks, yielding better controllability than monolithic methods and additionally providing flexible user interfaces for specifying visual generation goals.
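As a rough illustration of the sampling idea named in the abstract, the sketch below runs Annealed Importance Sampling on a toy product of two 1D Gaussian "experts". It is not the paper's implementation, which composes image and video models. The function names (`log_gauss`, `ais_product`, `weighted_mean`), the Gaussian experts, the linear annealing schedule, and the random-walk Metropolis transition are all assumptions chosen to keep the example small and checkable.

```python
import math
import random


def log_gauss(x, mean, std):
    """Log density of a 1D Gaussian N(mean, std^2)."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)


def ais_product(experts, n_particles=1000, n_steps=100, init_std=3.0, step=0.5, seed=0):
    """Toy AIS targeting the (unnormalized) product of Gaussian experts.

    experts: list of (mean, std) pairs; hypothetical stand-ins for real models.
    Returns final particle positions and their log importance weights.
    """
    rng = random.Random(seed)

    def log_target(x):
        # Product of experts: densities multiply, so log densities add.
        return sum(log_gauss(x, m, s) for m, s in experts)

    def log_init(x):
        # Broad, easy-to-sample starting distribution.
        return log_gauss(x, 0.0, init_std)

    xs, log_ws = [], []
    for _ in range(n_particles):
        x = rng.gauss(0.0, init_std)
        log_w = 0.0
        for t in range(1, n_steps + 1):
            b_prev, b = (t - 1) / n_steps, t / n_steps
            # Weight update: log f_t(x) - log f_{t-1}(x) for the geometric path
            # f_b = init^(1-b) * target^b.
            log_w += (b - b_prev) * (log_target(x) - log_init(x))

            def log_f(y, b=b):
                return (1 - b) * log_init(y) + b * log_target(y)

            # One random-walk Metropolis step that leaves f_t invariant.
            y = x + step * rng.gauss(0.0, 1.0)
            if math.log(1.0 - rng.random()) < log_f(y) - log_f(x):
                x = y
        xs.append(x)
        log_ws.append(log_w)
    return xs, log_ws


def weighted_mean(xs, log_ws):
    """Self-normalized importance-weighted mean (stabilized by the max log weight)."""
    m = max(log_ws)
    ws = [math.exp(lw - m) for lw in log_ws]
    return sum(w * x for w, x in zip(ws, xs)) / sum(ws)


if __name__ == "__main__":
    xs, log_ws = ais_product([(-1.0, 1.0), (2.0, 0.5)])
    print(weighted_mean(xs, log_ws))
```

For the two experts in the usage lines, N(-1, 1) and N(2, 0.5^2), the product is itself Gaussian with precision 1 + 4 = 5 and mean (-1 + 8) / 5 = 1.4, so the weighted mean should land close to 1.4. That closed form is what makes the toy case a convenient sanity check; no such check exists for real generative models.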

Authors: Yunzhi Zhang, Carson Murtuza-Lanier, Zizhang Li, Yilun Du, Jiajun Wu

Subject classification: Computing technology, computer technology

Recommended citation: Yunzhi Zhang, Carson Murtuza-Lanier, Zizhang Li, Yilun Du, Jiajun Wu. Product of Experts for Visual Generation [EB/OL]. (2025-06-10) [2025-07-16]. https://arxiv.org/abs/2506.08894.
