
GazeNLQ @ Ego4D Natural Language Queries Challenge 2025

Source: arXiv

Abstract

This report presents our solution to the Ego4D Natural Language Queries (NLQ) Challenge at CVPR 2025. Egocentric video captures the scene from the wearer's perspective, where gaze serves as a key non-verbal communication cue that reflects visual attention and offers insights into human intention and cognition. Motivated by this, we propose a novel approach, GazeNLQ, which leverages gaze to retrieve video segments that match given natural language queries. Specifically, we introduce a contrastive learning-based pretraining strategy for gaze estimation directly from video. The estimated gaze is used to augment video representations within the proposed model, thereby enhancing localization accuracy. Experimental results show that GazeNLQ achieves R1@IoU0.3 and R1@IoU0.5 scores of 27.82 and 18.68, respectively. Our code is available at https://github.com/stevenlin510/GazeNLQ.
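The abstract describes a contrastive learning-based pretraining stage for gaze estimation from video, with the estimated gaze then used to augment video representations. The implementation details are not given here, so the following is only a minimal sketch, assuming an InfoNCE-style objective and hypothetical names (GazeContrastivePretrainer, video_proj, gaze_proj) that do not come from the paper.

```python
# Minimal sketch of contrastive pretraining that pairs video-clip features
# with gaze features. All module names and dimensions are assumptions for
# illustration; they are not taken from the GazeNLQ codebase.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GazeContrastivePretrainer(nn.Module):
    def __init__(self, video_dim=768, gaze_dim=64, embed_dim=256):
        super().__init__()
        self.video_proj = nn.Linear(video_dim, embed_dim)  # projects clip features
        self.gaze_proj = nn.Linear(gaze_dim, embed_dim)    # projects gaze features
        self.temperature = 0.07

    def forward(self, video_feats, gaze_feats):
        # L2-normalize both embeddings so dot products are cosine similarities.
        v = F.normalize(self.video_proj(video_feats), dim=-1)
        g = F.normalize(self.gaze_proj(gaze_feats), dim=-1)
        logits = v @ g.t() / self.temperature              # (B, B) similarity matrix
        targets = torch.arange(v.size(0), device=v.device)
        # Symmetric InfoNCE: matching (video, gaze) pairs are positives,
        # all other pairs in the batch serve as negatives.
        return (F.cross_entropy(logits, targets) +
                F.cross_entropy(logits.t(), targets)) / 2

# Toy usage with random features (batch of 8 clips).
model = GazeContrastivePretrainer()
loss = model(torch.randn(8, 768), torch.randn(8, 64))
print(loss.item())
```

In practice the pretrained video encoder (or its gaze-aware features) would then be fed into the NLQ localization model, but that fusion step is not specified in the abstract.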

Wei-Cheng Lin, Chih-Ming Lien, Chen Lo, Chia-Hung Yeh

Subject: Computing Technology, Computer Technology

Wei-Cheng Lin, Chih-Ming Lien, Chen Lo, Chia-Hung Yeh. GazeNLQ @ Ego4D Natural Language Queries Challenge 2025 [EB/OL]. (2025-06-06) [2025-07-16]. https://arxiv.org/abs/2506.05782.
