Key Features

Web-augmented agentic visual reasoning
  Image-zoom-in tool
  Web-search tool
Hierarchical reward system
Complete training pipeline
  Supervised fine-tuning stage
  Reinforcement learning stage
GeoBench benchmark evaluation

GeoVista is evaluated on GeoBench, a benchmark of photos and panoramas from around the world together with satellite images of different cities. The evaluation pipeline has two parts: a level-wise evaluation, and a nuanced evaluation that extracts the predicted address and computes the haversine distance to the ground-truth location. On the geolocation task, GeoVista surpasses other open-source agentic models and achieves performance comparable to closed-source models.
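The haversine distance used in the nuanced evaluation is the great-circle distance between two latitude/longitude points. The following is a minimal sketch of the standard formula, not GeoBench's actual evaluation code; the function name `haversine_km` and the Earth-radius constant are assumptions chosen for illustration, and the step that turns a predicted address into coordinates is not shown.

```python
import math

# Assumed constant: mean Earth radius in kilometers.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


if __name__ == "__main__":
    # Example: predicted location (central Paris) vs. ground truth (central London).
    print(f"{haversine_km(48.8566, 2.3522, 51.5074, -0.1278):.1f} km")
```

A smaller distance means the prediction landed closer to the ground-truth location.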


GeoVista's hierarchical reward system leverages multi-level geographical information to improve overall geolocation performance. The model iteratively generates thoughts and actions; each action is parsed and executed to yield a new observation. This loop repeats until the model outputs a final geolocation prediction or reaches the maximum number of interaction turns. A demo video showcases GeoVista's capabilities on geolocation tasks.
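The thought-action-observation loop described above can be sketched as follows. This is an illustrative outline under stated assumptions, not GeoVista's implementation: `run_agent`, the `policy` callable, the action dictionary layout, the `final_answer` marker, and the default turn limit are all hypothetical names and values introduced here.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical action format: {"tool": <name>, "args": {...}}.
Action = Dict[str, Any]
History = List[Tuple[str, Any]]


def run_agent(
    policy: Callable[[History], Tuple[str, Action]],
    tools: Dict[str, Callable[..., Any]],
    image: Any,
    max_turns: int = 6,  # assumed limit, for illustration only
) -> Optional[str]:
    """Run the loop until a final prediction or the turn limit is reached."""
    history: History = [("observation", image)]
    for _ in range(max_turns):
        # The model produces a thought and an action from the history so far.
        thought, action = policy(history)
        history.append(("thought", thought))
        history.append(("action", action))
        # A final answer ends the loop with the geolocation prediction.
        if action["tool"] == "final_answer":
            return action["args"]["prediction"]
        # Otherwise execute the requested tool to obtain a new observation.
        tool = tools.get(action["tool"])
        if tool is None:
            observation = f"unknown tool: {action['tool']}"
        else:
            observation = tool(**action["args"])
        history.append(("observation", observation))
    # Turn limit reached without a final prediction.
    return None
```

In this sketch, `tools` would map names such as `image_zoom_in` and `web_search` (hypothetical identifiers for the two tools listed under Key Features) to Python callables, and `policy` would wrap the model call that parses its output into a thought and an action.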
