INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance

Source: arXiv
Abstract

Large Vision-Language Models (LVLMs) and Multimodal Large Language Models (MLLMs) have demonstrated outstanding performance in various general multimodal applications and have shown increasing promise in specialized domains. However, their potential in the insurance domain, which is characterized by diverse application scenarios and rich multimodal data, remains largely underexplored. To date, there is no systematic review of multimodal tasks, nor a benchmark specifically designed to assess the capabilities of LVLMs in insurance. This gap hinders the development of LVLMs within the insurance industry. This study systematically reviews and categorizes multimodal tasks for four representative types of insurance: auto, property, health, and agricultural. We introduce INS-MMBench, the first hierarchical benchmark tailored for the insurance domain. INS-MMBench encompasses 22 fundamental tasks, 12 meta-tasks, and 5 scenario tasks, enabling a comprehensive and progressive assessment from basic capabilities to real-world use cases. We benchmark 11 leading LVLMs, including closed-source models such as GPT-4o and open-source models such as LLaVA. Our evaluation validates the effectiveness of INS-MMBench and offers detailed insights into the strengths and limitations of current LVLMs on a variety of insurance-related multimodal tasks. We hope that INS-MMBench will accelerate the integration of LVLMs into the insurance industry and foster interdisciplinary research. Our dataset and evaluation code are available at https://github.com/FDU-INS/INS-MMBench.

Hanjia Lyu, Xian Xu, Chenwei Lin, Jiebo Luo

Computing technology, computer technology

Hanjia Lyu, Xian Xu, Chenwei Lin, Jiebo Luo. INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance [EB/OL]. (2025-08-07) [2025-08-24]. https://arxiv.org/abs/2406.09105.