LLaVA-ReID: Selective Multi-image Questioner for Interactive Person Re-Identification
Traditional text-based person ReID assumes that person descriptions from witnesses are complete and provided at once. However, in real-world scenarios, such descriptions are often partial or vague. To address this limitation, we introduce a new task called interactive person re-identification (Inter-ReID). Inter-ReID is a dialogue-based retrieval task that iteratively refines initial descriptions through ongoing interactions with the witnesses. To facilitate the study of this new task, we construct a dialogue dataset that incorporates multiple types of questions by decomposing fine-grained attributes of individuals. We further propose LLaVA-ReID, a question model that generates targeted questions based on visual and textual contexts to elicit additional details about the target person. Leveraging a looking-forward strategy, we prioritize the most informative questions as supervision during training. Experimental results on both Inter-ReID and text-based ReID benchmarks demonstrate that LLaVA-ReID significantly outperforms baselines.
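The abstract describes an iterative loop: retrieve candidates from a partial description, ask the witness a targeted question, fold the answer back into the description, and repeat, with a looking-forward criterion prioritizing the most informative question. The toy sketch below illustrates that loop on a symbolic attribute gallery. All names (`GALLERY`, `retrieve`, `select_question`) and the balanced-split heuristic are illustrative assumptions, not the paper's actual model or API, which uses a multimodal LLM over images and text.

```python
# Hedged toy sketch of the interactive ReID loop from the abstract.
# The gallery, attributes, and question-selection heuristic are all
# illustrative stand-ins for the paper's learned components.
from collections import Counter

GALLERY = [
    {"id": 0, "top": "red",  "bag": "yes", "hair": "short"},
    {"id": 1, "top": "red",  "bag": "no",  "hair": "long"},
    {"id": 2, "top": "blue", "bag": "yes", "hair": "long"},
    {"id": 3, "top": "blue", "bag": "no",  "hair": "short"},
]

def retrieve(known):
    """Return gallery entries consistent with the partial description."""
    return [p for p in GALLERY if all(p[a] == v for a, v in known.items())]

def select_question(known):
    """Looking-forward stand-in: ask about the attribute whose answer
    best splits the remaining candidates (smallest largest bucket)."""
    candidates = retrieve(known)
    unasked = [a for a in ("top", "bag", "hair") if a not in known]
    if not unasked:
        return None
    def worst_bucket(attr):
        return max(Counter(p[attr] for p in candidates).values())
    return min(unasked, key=worst_bucket)

def interactive_reid(target_id, initial):
    """Iteratively refine the description until one candidate remains."""
    known = dict(initial)
    target = GALLERY[target_id]
    while len(retrieve(known)) > 1:
        attr = select_question(known)
        if attr is None:
            break
        known[attr] = target[attr]  # simulated witness answers truthfully
    return retrieve(known)

result = interactive_reid(target_id=2, initial={"top": "blue"})
```

Starting from the vague clue "blue top", the loop narrows the four-person gallery down to the single matching identity by asking about the attribute that best discriminates among the remaining candidates.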
Mouxing Yang, Dezhong Peng, Peng Hu, Yijie Lin, Xi Peng, Yiding Lu
Subject area: Computing Technology; Computer Technology
Mouxing Yang, Dezhong Peng, Peng Hu, Yijie Lin, Xi Peng, Yiding Lu. LLaVA-ReID: Selective Multi-image Questioner for Interactive Person Re-Identification [EB/OL]. (2025-04-14) [2025-05-14]. https://arxiv.org/abs/2504.10174