OmniBench: Towards The Future of Universal Omni-Language Models

Source: arXiv
Abstract

Recent advancements in multimodal large language models (MLLMs) have focused on integrating multiple modalities, yet their ability to simultaneously process and reason across different inputs remains underexplored. We introduce OmniBench, a novel benchmark designed to evaluate models' ability to recognize, interpret, and reason across visual, acoustic, and textual inputs simultaneously. We define language models capable of such tri-modal processing as omni-language models (OLMs). OmniBench features high-quality human annotations that require integrated understanding across all modalities. Our evaluation reveals that: i) open-source OLMs show significant limitations in instruction-following and reasoning in tri-modal contexts; and ii) most baseline models perform poorly (around 50% accuracy) even with textual alternatives to image/audio inputs. To address these limitations, we develop OmniInstruct, a 96K-sample instruction tuning dataset for training OLMs. We advocate for developing more robust tri-modal integration techniques and training strategies to enhance OLM performance. Code and data can be found at our repo (https://github.com/multimodal-art-projection/OmniBench).

Yizhi Li, Ge Zhang, Yinghao Ma, Ruibin Yuan, Kang Zhu, Hangyu Guo, Yiming Liang, Jiaheng Liu, Zekun Wang, Jian Yang, Siwei Wu, Xingwei Qu, Jinjie Shi, Xinyue Zhang, Zhenzhu Yang, Xiangzhou Wang, Zhaoxiang Zhang, Zachary Liu, Emmanouil Benetos, Wenhao Huang, Chenghua Lin

Computing technology, computer technology

Yizhi Li, Ge Zhang, Yinghao Ma, Ruibin Yuan, Kang Zhu, Hangyu Guo, Yiming Liang, Jiaheng Liu, Zekun Wang, Jian Yang, Siwei Wu, Xingwei Qu, Jinjie Shi, Xinyue Zhang, Zhenzhu Yang, Xiangzhou Wang, Zhaoxiang Zhang, Zachary Liu, Emmanouil Benetos, Wenhao Huang, Chenghua Lin. OmniBench: Towards The Future of Universal Omni-Language Models [EB/OL]. (2024-09-23) [2025-06-05]. https://arxiv.org/abs/2409.15272.
