
Faster Zero-shot Multi-modal Entity Linking via Visual-Linguistic Representation


Abstract

Multi-modal entity linking plays a crucial role in a wide range of knowledge-based modal-fusion tasks, e.g., multi-modal retrieval and multi-modal event extraction. We introduce the new ZEro-shot Multi-modal Entity Linking (ZEMEL) task. Its format is similar to multi-modal entity linking, but multi-modal mentions are linked to unseen entities in the knowledge graph; the purpose of the zero-shot setting is to realize robust linking in highly specialized domains. Meanwhile, the inference efficiency of existing models is low when there are many candidate entities. On this account, we propose a novel model that leverages visual-linguistic representation through a co-attentional mechanism to deal with the ZEMEL task, considering the trade-off between the performance and the efficiency of the model. We also build a dataset named ZEMELD for the new task, which contains multi-modal data resources collected from Wikipedia, and we annotate the entities as ground truth. Extensive experimental results on the dataset show that our proposed model is effective, as it significantly improves precision from 68.93% to 82.62% compared with baselines on the ZEMEL task.
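To make the idea in the abstract concrete, the sketch below shows one way a co-attentional visual-linguistic scorer could be organized: text and image features attend over each other, are pooled into a single mention vector, and candidate entities (whose embeddings can be pre-computed, which is where the inference-speed benefit comes from) are ranked by dot product. This is a minimal illustrative sketch in PyTorch, not the authors' released implementation; the module names, dimensions, mean pooling, and dot-product scoring are all assumptions made here for illustration.

```python
# Illustrative sketch only: a co-attention mention encoder with dot-product
# candidate scoring. All design choices here are assumptions, not the paper's code.
import torch
import torch.nn as nn


class CoAttentionScorer(nn.Module):
    """Score a multi-modal mention against pre-encoded candidate entities."""

    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        # Cross-modal (co-)attention in both directions.
        self.txt2img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.img2txt = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)

    def encode_mention(self, text_feats, image_feats):
        # text_feats: (B, T, dim) token features; image_feats: (B, R, dim) region features.
        t_attn, _ = self.txt2img(text_feats, image_feats, image_feats)  # text attends to image
        i_attn, _ = self.img2txt(image_feats, text_feats, text_feats)   # image attends to text
        fused = torch.cat([t_attn.mean(dim=1), i_attn.mean(dim=1)], dim=-1)
        return self.proj(fused)                                          # (B, dim) mention vector

    def forward(self, text_feats, image_feats, candidate_embs):
        # candidate_embs: (B, C, dim), pre-computed entity representations.
        mention = self.encode_mention(text_feats, image_feats)
        return torch.einsum("bd,bcd->bc", mention, candidate_embs)       # (B, C) scores


# Toy usage: rank 10 candidate entities for a batch of 2 mentions.
scorer = CoAttentionScorer()
scores = scorer(torch.randn(2, 12, 256), torch.randn(2, 36, 256), torch.randn(2, 10, 256))
print(scores.argmax(dim=-1))  # index of the top-ranked candidate per mention
```

Because the candidate side reduces to a plain embedding lookup plus a dot product, entity encodings can be cached offline, which reflects the performance/efficiency trade-off the abstract highlights.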


Qiushuo Zheng, Chaoyu Bai, Meng Wang, Hao Wen, Guilin Qi

DOI: 10.1162/dint_a_00146

Subject: Computing technology, computer technology

Keywords: Knowledge Graph; Multi-modal Learning; Poly Encoders


Qiushuo Zheng, Chaoyu Bai, Meng Wang, Hao Wen, Guilin Qi. Faster Zero-shot Multi-modal Entity Linking via Visual-Linguistic Representation [EB/OL]. (2022-11-28) [2025-08-02]. https://chinaxiv.org/abs/202211.00418
