Recognition through Reasoning: Reinforcing Image Geo-localization with Large Vision-Language Models

Source: arXiv
Abstract

Previous methods for image geo-localization have typically treated the task as either classification or retrieval, often relying on black-box decisions that lack interpretability. The rise of large vision-language models (LVLMs) has enabled a rethinking of geo-localization as a reasoning-driven task grounded in visual cues. However, two major challenges persist. On the data side, existing reasoning-focused datasets are primarily based on street-view imagery, offering limited scene diversity and constrained viewpoints. On the modeling side, current approaches predominantly rely on supervised fine-tuning, which yields only marginal improvements in reasoning capabilities. To address these challenges, we propose a novel pipeline that constructs a reasoning-oriented geo-localization dataset, MP16-Reason, using diverse social media images. We introduce GLOBE, Group-relative policy optimization for Locatability assessment and Optimized visual-clue reasoning, yielding Bi-objective geo-Enhancement for the VLM in recognition and reasoning. GLOBE incorporates task-specific rewards that jointly enhance locatability assessment, visual clue reasoning, and geolocation accuracy. Both qualitative and quantitative results demonstrate that GLOBE outperforms state-of-the-art open-source LVLMs on geo-localization tasks, particularly in diverse visual scenes, while also generating more insightful and interpretable reasoning trajectories.

Ling Li, Yao Zhou, Yuxuan Liang, Fugee Tsung, Jiaheng Wei

Computing technology, computer technology

Ling Li, Yao Zhou, Yuxuan Liang, Fugee Tsung, Jiaheng Wei. Recognition through Reasoning: Reinforcing Image Geo-localization with Large Vision-Language Models[EB/OL]. (2025-06-17)[2025-08-02]. https://arxiv.org/abs/2506.14674.
