World4Omni: A Zero-Shot Framework from Image Generation World Model to Robotic Manipulation
Improving data efficiency and generalization in robotic manipulation remains a core challenge. We propose a novel framework that leverages a pre-trained multimodal image-generation model as a world model to guide policy learning. By exploiting its rich visual-semantic representations and strong generalization across diverse scenes, the model generates open-ended future state predictions that inform downstream manipulation. Coupled with zero-shot low-level control modules, our approach enables general-purpose robotic manipulation without task-specific training. Experiments in both simulation and real-world environments demonstrate that our method achieves effective performance across a wide range of manipulation tasks with no additional data collection or fine-tuning. Supplementary materials are available on our website: https://world4omni.github.io/.
Haonan Chen, Bangjun Wang, Jingxiang Guo, Tianrui Zhang, Yiwen Hou, Xuchuan Huang, Chenrui Tie, Lin Shao
Subjects: Computing and Computer Technology; Automation Technology and Equipment
Haonan Chen, Bangjun Wang, Jingxiang Guo, Tianrui Zhang, Yiwen Hou, Xuchuan Huang, Chenrui Tie, Lin Shao. World4Omni: A Zero-Shot Framework from Image Generation World Model to Robotic Manipulation [EB/OL]. (2025-06-30) [2025-08-02]. https://arxiv.org/abs/2506.23919.