xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Abstract

This paper introduces BLIP-3, an open framework for developing Large Multimodal Models (LMMs). The framework comprises meticulously curated datasets, a training recipe, model architectures, and a resulting suite of LMMs. We release 4B and 14B models, including both the pre-trained base models and the instruction fine-tuned models. Our models undergo rigorous evaluation across a range of tasks, including both single-image and multi-image benchmarks. The resulting LMMs demonstrate competitive performance among open-source LMMs of similar size, with the ability to comprehend interleaved image-text inputs. Our training code, models, and all datasets used in this work, including the three large-scale datasets we create and the preprocessed ones, will be open-sourced to better support the research community.

Authors: Juan Carlos Niebles, Zeyuan Chen, Huan Wang, Ludwig Schmidt, Silvio Savarese, Caiming Xiong, Ran Xu, Yejin Choi, Le Xue, Manli Shu, Anas Awadalla, Jun Wang, An Yan, Senthil Purushwalkam, Honglu Zhou, Viraj Prabhu, Yutong Dai, Michael S Ryoo, Shrikant Kendre, Jieyu Zhang, Shaoyen Tseng, Gustavo A Lujan-Moreno, Matthew L Olson, Musashi Hinck, David Cobbley, Vasudev Lal, Can Qin, Shu Zhang, Chia-Chih Chen, Ning Yu, Juntao Tan, Tulika Manoj Awalgaonkar, Shelby Heinecke

Subject classification: Computing technology, computer technology

Recommended citation: Juan Carlos Niebles, Zeyuan Chen, Huan Wang, Ludwig Schmidt, Silvio Savarese, Caiming Xiong, Ran Xu, Yejin Choi, Le Xue, Manli Shu, Anas Awadalla, Jun Wang, An Yan, Senthil Purushwalkam, Honglu Zhou, Viraj Prabhu, Yutong Dai, Michael S Ryoo, Shrikant Kendre, Jieyu Zhang, Shaoyen Tseng, Gustavo A Lujan-Moreno, Matthew L Olson, Musashi Hinck, David Cobbley, Vasudev Lal, Can Qin, Shu Zhang, Chia-Chih Chen, Ning Yu, Juntao Tan, Tulika Manoj Awalgaonkar, Shelby Heinecke. xGen-MM (BLIP-3): A Family of Open Large Multimodal Models[EB/OL]. (2025-06-19)[2025-07-19]. https://arxiv.org/abs/2408.08872.