
Faster Zero-shot Multi-modal Entity Linking via Visual-Linguistic Representation


Abstract

Multi-modal entity linking plays a crucial role in a wide range of knowledge-based modal-fusion tasks, e.g., multi-modal retrieval and multi-modal event extraction. We introduce the new ZEro-shot Multi-modal Entity Linking (ZEMEL) task. Its format is similar to multi-modal entity linking, but multi-modal mentions are linked to unseen entities in the knowledge graph; the purpose of the zero-shot setting is to realize robust linking in highly specialized domains. Meanwhile, the inference efficiency of existing models is low when there are many candidate entities. On this account, we propose a novel model that leverages visual-linguistic representation through a co-attentional mechanism to deal with the ZEMEL task, considering the trade-off between the performance and the efficiency of the model. We also build a dataset named ZEMELD for the new task, which contains multi-modal data resources collected from Wikipedia, and we annotate the entities as ground truth. Extensive experimental results on the dataset show that our proposed model is effective, as it significantly improves precision from 68.93% to 82.62% compared with baselines on the ZEMEL task.
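To make the idea in the abstract concrete, the sketch below shows one way a co-attentional visual-linguistic scorer could be organized: text and image features attend over each other, are pooled into a single mention vector, and candidate entities (whose embeddings can be pre-computed, which is where the inference-speed benefit comes from) are ranked by dot product. This is a minimal illustrative sketch in PyTorch, not the authors' released implementation; the module names, dimensions, mean pooling, and dot-product scoring are all assumptions made here for illustration.

```python
# Illustrative sketch only: a co-attention mention encoder with dot-product
# candidate scoring. All design choices here are assumptions, not the paper's code.
import torch
import torch.nn as nn


class CoAttentionScorer(nn.Module):
    """Score a multi-modal mention against pre-encoded candidate entities."""

    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        # Cross-modal (co-)attention in both directions.
        self.txt2img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.img2txt = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)

    def encode_mention(self, text_feats, image_feats):
        # text_feats: (B, T, dim) token features; image_feats: (B, R, dim) region features.
        t_attn, _ = self.txt2img(text_feats, image_feats, image_feats)  # text attends to image
        i_attn, _ = self.img2txt(image_feats, text_feats, text_feats)   # image attends to text
        fused = torch.cat([t_attn.mean(dim=1), i_attn.mean(dim=1)], dim=-1)
        return self.proj(fused)                                          # (B, dim) mention vector

    def forward(self, text_feats, image_feats, candidate_embs):
        # candidate_embs: (B, C, dim), pre-computed entity representations.
        mention = self.encode_mention(text_feats, image_feats)
        return torch.einsum("bd,bcd->bc", mention, candidate_embs)       # (B, C) scores


# Toy usage: rank 10 candidate entities for a batch of 2 mentions.
scorer = CoAttentionScorer()
scores = scorer(torch.randn(2, 12, 256), torch.randn(2, 36, 256), torch.randn(2, 10, 256))
print(scores.argmax(dim=-1))  # index of the top-ranked candidate per mention
```

Because the candidate side reduces to a plain embedding lookup plus a dot product, entity encodings can be cached offline, which reflects the performance/efficiency trade-off the abstract highlights.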


Qiushuo Zheng, Chaoyu Bai, Meng Wang, Hao Wen, Guilin Qi

DOI: 10.1162/dint_a_00146

Subject: Computing technology, computer technology

Keywords: Knowledge Graph; Multi-modal Learning; Poly Encoders


Qiushuo Zheng, Chaoyu Bai, Meng Wang, Hao Wen, Guilin Qi. Faster Zero-shot Multi-modal Entity Linking via Visual-Linguistic Representation [EB/OL]. (2022-11-28) [2025-08-02]. https://chinaxiv.org/abs/202211.00418
