National Preprint Platform

A Knowledge-Enhanced Deep Semantic Learning Method for Multimodal Data

Abstract

The main purpose of image-text retrieval is to retrieve text by image and vice versa. However, such data are typically multimodal, multi-source, and heterogeneous, comprising text, images, audio, video, and other forms of information. Managing and using these multimodal data efficiently therefore requires emerging techniques such as knowledge graphs, graph neural networks, contrastive learning, adversarial learning, and multimodal large models. This paper proposes a multimodal knowledge-enhanced deep semantic learning method for cross-modal educational big data, applying a multimodal knowledge graph to the cross-modal retrieval task. To unify image and text representations, intra-modal and inter-modal contrastive learning loss functions are added. These measures support representation learning of multimodal data in a unified semantic space, which improves retrieval accuracy and efficiency. Experimental results show that the method achieves a significant performance improvement on multimodal retrieval tasks, verifying its effectiveness and practicality.
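The abstract names intra-modal and inter-modal contrastive losses but does not give their form. The following is a minimal sketch of one common formulation (InfoNCE over cosine similarities), not the authors' implementation. The function names, the temperature value, the weight `lam`, and the choice of a second view of each sample as the intra-modal positive are all assumptions made for illustration.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def info_nce(anchors, candidates, temperature=0.07):
    """Mean InfoNCE loss; candidates[i] is the positive for anchors[i],
    and every other candidate in the batch acts as a negative."""
    a = [l2_normalize(v) for v in anchors]
    c = [l2_normalize(v) for v in candidates]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, cj)) / temperature for cj in c]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)

def inter_modal_loss(img, txt, temperature=0.07):
    """Symmetric image-to-text and text-to-image contrastive loss."""
    return 0.5 * (info_nce(img, txt, temperature) + info_nce(txt, img, temperature))

def intra_modal_loss(view_a, view_b, temperature=0.07):
    """Contrast two views of the same modality (assumed: e.g. augmentations)."""
    return info_nce(view_a, view_b, temperature)

def total_loss(img, txt, img_view2, txt_view2, lam=1.0, temperature=0.07):
    """Inter-modal loss plus a weighted intra-modal term (lam is hypothetical)."""
    inter = inter_modal_loss(img, txt, temperature)
    intra = 0.5 * (intra_modal_loss(img, img_view2, temperature)
                   + intra_modal_loss(txt, txt_view2, temperature))
    return inter + lam * intra
```

With paired embeddings that already point in the same direction, the inter-modal term is close to zero; swapping the text embeddings so that each image is paired with the wrong caption makes it large. Minimizing the loss therefore pulls matched image and text representations together in a shared space, which is the effect the abstract describes.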

Zhuo Zhenyong (卓振勇), Liang Meiyu (梁美玉)

School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100083

Computing technology, computer technology; information dissemination, knowledge dissemination

Cross-modal retrieval; Multimodal knowledge graph; Graph neural network; Contrastive loss function

Zhuo Zhenyong (卓振勇), Liang Meiyu (梁美玉). A Knowledge-Enhanced Deep Semantic Learning Method for Multimodal Data [EB/OL]. (2025-03-17) [2025-05-09]. http://www.paper.edu.cn/releasepaper/content/202503-143.
