Evaluation of Geolocation Capabilities of Multimodal Large Language Models and Analysis of Associated Privacy Risks

Abstract

Objectives: The rapid advancement of Multimodal Large Language Models (MLLMs) has significantly enhanced their reasoning capabilities, enabling a wide range of intelligent applications. However, these advancements also raise critical privacy and ethical concerns. MLLMs can now infer the geographic location of images, such as those shared on social media or captured from street views, from visual content alone, which poses serious privacy risks including doxxing, surveillance, and other security threats. Methods: This study provides a comprehensive analysis of existing MLLM-based geolocation techniques. It systematically reviews the relevant literature and evaluates the performance of state-of-the-art visual reasoning models on geolocation tasks, particularly in identifying the origins of street view imagery. Results: Empirical evaluation shows that the most advanced large vision models can localize the origin of street-level imagery with up to 49% accuracy within a 1-kilometer radius. This performance underscores the models' strong capacity to extract and use fine-grained geographic cues from visual data. Conclusions: Building on these findings, the study identifies key visual elements that contribute to successful geolocation, such as text, architectural styles, and environmental features. It also discusses the privacy implications of MLLM-enabled geolocation and outlines several technical and policy-based countermeasures to mitigate the associated risks. Our code and dataset are available at https://github.com/zxyl1003/MLLM-Geolocation-Evaluation.
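
The abstract reports accuracy within a 1-kilometer radius but does not say how that figure is computed. One common way to score such a metric is sketched below, assuming predictions and ground truth are given as (latitude, longitude) pairs in degrees and that distance is measured along a great circle with the haversine formula. The function names, the Earth-radius constant, and the sample coordinates are illustrative assumptions and are not taken from the authors' repository.

```python
import math

# Mean Earth radius in kilometers (assumed constant for this sketch).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def accuracy_within(predictions, truths, radius_km=1.0):
    """Fraction of predictions that fall within radius_km of the truth.

    predictions and truths are equal-length sequences of (lat, lon) pairs.
    """
    if len(predictions) != len(truths):
        raise ValueError("predictions and truths must have the same length")
    if not predictions:
        return 0.0
    hits = sum(
        1
        for (plat, plon), (tlat, tlon) in zip(predictions, truths)
        if haversine_km(plat, plon, tlat, tlon) <= radius_km
    )
    return hits / len(predictions)


if __name__ == "__main__":
    # Hypothetical sample: one near miss of under 100 m, one miss of about 111 km.
    preds = [(48.8584, 2.2945), (40.0, -74.0)]
    truth = [(48.8590, 2.2950), (41.0, -74.0)]
    print(accuracy_within(preds, truth, radius_km=1.0))
```

Under this reading, a reported 49% at 1 km would mean that roughly half of the evaluated images were placed within one kilometer of their true capture location.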

Authors: Xian Zhang, Xiang Cheng

Subject classification: Remote sensing technology

Recommended citation: Xian Zhang, Xiang Cheng. Evaluation of Geolocation Capabilities of Multimodal Large Language Models and Analysis of Associated Privacy Risks [EB/OL]. (2025-06-30) [2025-07-16]. https://arxiv.org/abs/2506.23481.
