
A Survey on Efficient Vision-Language Models


Source: arXiv

Abstract

Vision-language models (VLMs) integrate visual and textual information, enabling a wide range of applications such as image captioning and visual question answering, making them crucial for modern AI systems. However, their high computational demands pose challenges for real-time applications. This has led to a growing focus on developing efficient vision language models. In this survey, we review key techniques for optimizing VLMs on edge and resource-constrained devices. We also explore compact VLM architectures, frameworks and provide detailed insights into the performance-memory trade-offs of efficient VLMs. Furthermore, we establish a GitHub repository at https://github.com/MPSCUMBC/Efficient-Vision-Language-Models-A-Survey to compile all surveyed papers, which we will actively update. Our objective is to foster deeper research in this area.

Gaurav Shinde, Anuradha Ravi, Emon Dey, Shadman Sakib, Milind Rampure, Nirmalya Roy

Subject classification: Computing Technology, Computer Technology

Gaurav Shinde, Anuradha Ravi, Emon Dey, Shadman Sakib, Milind Rampure, Nirmalya Roy. A Survey on Efficient Vision-Language Models [EB/OL]. (2025-04-13) [2025-05-05]. https://arxiv.org/abs/2504.09724.
