Overview of Visual Language Representation Learning (视觉语言表示学习综述)

汪国亮 (Wang Guoliang), 商彦磊 (Shang Yanlei)

Abstract: Visual language multimodal representation learning has become an active research topic in recent years. Fusing the different feature information carried by the two modalities helps considerably in improving model performance on downstream tasks, and visual language representation learning models have achieved good results on many of them. This survey first introduces the downstream tasks of visual language representation learning, then describes its two main approaches: similarity-based visual language representation learning and pre-trained-model-based visual language representation learning. Finally, it discusses development trends in the field.

Subject classification: Computing technology, computer technology

Keywords: Visual Language Representation Learning; Image-Text Cross-Modal Retrieval; Image Captioning

Recommended citation: 汪国亮, 商彦磊. 视觉语言表示学习综述 [EB/OL]. (2023-04-25) [2025-10-05]. http://www.paper.edu.cn/releasepaper/content/202304-326.
