LLM-powered Query Expansion for Enhancing Boundary Prediction in Language-driven Action Localization

Source: arXiv
Abstract

Language-driven action localization in videos requires both semantic alignment between the language query and the video segment and accurate prediction of action boundaries. However, the language query primarily describes the main content of an action and usually lacks specific details about where the action starts and ends. This increases the subjectivity of manual boundary annotation and leads to boundary uncertainty in the training data. In this paper, we address the problem in two ways. First, we expand the original query by using LLMs to generate textual descriptions of the action's start and end boundaries, which provide more detailed boundary cues for localization and thus reduce the impact of boundary uncertainty. Second, to increase tolerance to boundary uncertainty during training, we model probability scores of action boundaries by calculating the semantic similarities between frames and the expanded query as well as the temporal distances between frames and the annotated boundary frames. These scores provide more consistent boundary supervision and thus improve the stability of training. Our method is model-agnostic and can be integrated into existing language-driven action localization models as an off-the-shelf component. Experimental results on several datasets demonstrate the effectiveness of our method.
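
The abstract does not give the formula for the boundary probability scores, so the following is only an illustrative sketch of how semantic similarity and temporal distance could be combined into a soft boundary label. Everything here is an assumption rather than the authors' formulation: the function names `cosine` and `boundary_scores`, the Gaussian kernel on temporal distance, the temperature-scaled softmax over similarities, and the parameters `sigma` and `tau` are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def boundary_scores(frame_feats, query_feat, annotated_idx, sigma=2.0, tau=0.1):
    """Soft boundary probabilities over frames (hypothetical formulation).

    Multiplies a softmax-style weight on frame-query cosine similarity by a
    Gaussian kernel on the distance to the annotated boundary frame, then
    renormalizes so the scores sum to 1.
    """
    if not frame_feats:
        raise ValueError("frame_feats must contain at least one frame")
    # Semantic term: temperature-scaled similarity, shifted for numerical stability.
    sims = [cosine(f, query_feat) / tau for f in frame_feats]
    top = max(sims)
    semantic = [math.exp(s - top) for s in sims]
    # Temporal term: frames near the annotated boundary get higher weight.
    temporal = [
        math.exp(-((i - annotated_idx) ** 2) / (2 * sigma ** 2))
        for i in range(len(frame_feats))
    ]
    joint = [s * t for s, t in zip(semantic, temporal)]
    total = sum(joint)
    return [j / total for j in joint]

# Toy example: 4 frames with 2-D features and a "start boundary" query embedding.
frames = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [0.6, 0.8]]
start_query = [0.0, 1.0]
probs = boundary_scores(frames, start_query, annotated_idx=2)
```

In this sketch the resulting distribution could replace a one-hot boundary label as the training target, so that frames close to the annotation and semantically similar to the expanded query still receive some probability mass. In practice the frame and query features would come from the localization model's own encoders.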

Zirui Shang, Xinxiao Wu, Shuo Yang

Computing technology, computer technology

Zirui Shang, Xinxiao Wu, Shuo Yang. LLM-powered Query Expansion for Enhancing Boundary Prediction in Language-driven Action Localization [EB/OL]. (2025-05-30) [2025-06-21]. https://arxiv.org/abs/2505.24282.