Are Multimodal Large Language Models Ready for Omnidirectional Spatial Reasoning?

Source: arXiv
Abstract

The 180x360 omnidirectional field of view captured by 360-degree cameras enables their use in a wide range of applications such as embodied AI and virtual reality. Although recent advances in multimodal large language models (MLLMs) have shown promise in visual-spatial reasoning, most studies focus on standard pinhole-view images, leaving omnidirectional perception largely unexplored. In this paper, we ask: Are MLLMs ready for omnidirectional spatial reasoning? To investigate this, we introduce OSR-Bench, the first benchmark specifically designed for this setting. OSR-Bench includes over 153,000 diverse question-answer pairs grounded in high-fidelity panoramic indoor scene maps. It covers key reasoning types including object counting, relative distance, and direction. We also propose a negative sampling strategy that inserts non-existent objects into prompts to evaluate hallucination and grounding robustness. For fine-grained analysis, we design a two-stage evaluation framework assessing both cognitive map generation and QA accuracy using rotation-invariant matching and a combination of rule-based and LLM-based metrics. We evaluate eight state-of-the-art MLLMs, including GPT-4o, Gemini 1.5 Pro, and leading open-source models under zero-shot settings. Results show that current models struggle with spatial reasoning in panoramic contexts, highlighting the need for more perceptually grounded MLLMs. OSR-Bench and code will be released at: https://huggingface.co/datasets/UUUserna/OSR-Bench
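The abstract mentions rotation-invariant matching of cognitive maps but does not spell out the procedure. The following is only a minimal sketch of one way such a comparison could work, assuming (hypothetically) that a cognitive map is a dictionary from object name to a `(row, col)` cell on a square grid and that only rotations in 90-degree steps are considered. The names `rotate90`, `match_score`, and `rotation_invariant_score` are illustrative and are not taken from the OSR-Bench code.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]          # (row, col) on a square grid
CogMap = Dict[str, Cell]        # hypothetical format: object name -> grid cell


def rotate90(cell: Cell, size: int) -> Cell:
    """Rotate one grid cell 90 degrees clockwise on a size x size grid."""
    row, col = cell
    return (col, size - 1 - row)


def match_score(pred: CogMap, truth: CogMap, tol: int = 0) -> float:
    """Fraction of ground-truth objects that the prediction places within
    `tol` cells (Chebyshev distance) of the true position."""
    if not truth:
        return 0.0
    hits = 0
    for name, (t_row, t_col) in truth.items():
        if name not in pred:
            continue
        p_row, p_col = pred[name]
        if max(abs(p_row - t_row), abs(p_col - t_col)) <= tol:
            hits += 1
    return hits / len(truth)


def rotation_invariant_score(pred: CogMap, truth: CogMap,
                             size: int, tol: int = 0) -> float:
    """Best match score over the four 90-degree rotations of the prediction,
    so a map that is correct up to its starting orientation is not penalized."""
    best = 0.0
    current = dict(pred)
    for _ in range(4):
        best = max(best, match_score(current, truth, tol))
        current = {name: rotate90(cell, size) for name, cell in current.items()}
    return best


if __name__ == "__main__":
    truth = {"sofa": (0, 0), "lamp": (0, 2)}
    # Same layout as `truth`, but drawn rotated by 90 degrees clockwise.
    pred = {"sofa": (0, 2), "lamp": (2, 2)}
    print(match_score(pred, truth))
    print(rotation_invariant_score(pred, truth, size=3))
```

Taking the maximum over rotations reflects the fact that a panorama has no privileged forward direction, so a predicted layout should be compared with the reference up to a rotation about the vertical axis. A finer angular resolution or a different similarity measure would need a different implementation.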

Zihao Dongfang, Xu Zheng, Ziqiao Weng, Yuanhuiyi Lyu, Danda Pani Paudel, Luc Van Gool, Kailun Yang, Xuming Hu

Computing technology, computer technology

Zihao Dongfang, Xu Zheng, Ziqiao Weng, Yuanhuiyi Lyu, Danda Pani Paudel, Luc Van Gool, Kailun Yang, Xuming Hu. Are Multimodal Large Language Models Ready for Omnidirectional Spatial Reasoning? [EB/OL]. (2025-05-17) [2025-07-16]. https://arxiv.org/abs/2505.11907.