Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization
arXiv:2601.05432v1 Announce Type: new Abstract: The image geolocalization task aims to predict where on Earth an image was taken from its visual clues. Existing large vision-language model (LVLM) approaches leverage world knowledge, chain-of-thought reasoning, and agentic capabilities, but they overlook a common strategy used by humans: consulting maps. In this work, we first equip the model with a Thinking with Map ability and formulate it as an agent-in-the-map loop. We develop a two-stage optimization scheme for it, including […]