Key Features

Unified vision-language-action and world model
Autoregressive action world model
Hierarchical reward system
Complete training pipeline
Supervised fine-tuning stage
Reinforcement learning stage
Evaluation on LIBERO benchmark
Support for various datasets

RynnVLA-002 has a hierarchical reward system that draws on multi-level information to improve overall performance. It iteratively generates thoughts and actions, parsing and executing each action to yield a new observation. The model has been evaluated on the LIBERO benchmark, a simulated suite of language-conditioned robot manipulation tasks.
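The generate, parse, execute, observe loop described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not RynnVLA-002's actual code: the names `run_episode`, `parse_action`, `toy_policy`, and `toy_env`, the "Thought: ... Action: ..." text format, and the "stop" action are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    """One iteration of the loop: what was thought, done, and observed."""
    thought: str
    action: str
    observation: str


def parse_action(text: str) -> Tuple[str, str]:
    """Split a 'Thought: ... Action: ...' string (hypothetical format)."""
    thought, sep, action = text.partition("Action:")
    if not sep:
        raise ValueError(f"no action found in model output: {text!r}")
    return thought.replace("Thought:", "", 1).strip(), action.strip()


def run_episode(
    policy: Callable[[str, List[Step]], str],
    env_step: Callable[[str], str],
    first_observation: str,
    max_steps: int = 5,
) -> List[Step]:
    """Generate a thought and action, parse it, execute it, observe, repeat."""
    history: List[Step] = []
    observation = first_observation
    for _ in range(max_steps):
        output = policy(observation, history)    # model generates text
        thought, action = parse_action(output)   # parse the action out
        if action == "stop":                     # assumed stop signal
            break
        observation = env_step(action)           # execute, get new observation
        history.append(Step(thought, action, observation))
    return history


# Stand-ins for the real model and environment (assumptions for the demo).
def toy_policy(observation: str, history: List[Step]) -> str:
    if len(history) >= 2:
        return "Thought: done. Action: stop"
    return f"Thought: saw {observation}. Action: move_{len(history)}"


def toy_env(action: str) -> str:
    return f"state_after_{action}"


if __name__ == "__main__":
    for step in run_episode(toy_policy, toy_env, "start"):
        print(step)
```

In a real setup the policy would be the model and the environment step would be the simulator or robot; keeping them as plain callables makes the loop easy to test in isolation.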


RynnVLA-002 has a complete training pipeline, including a cold-start supervised fine-tuning stage and a reinforcement learning stage. The model can be trained on various datasets, including LIBERO data and datasets in the LeRobot format, and is reported to achieve state-of-the-art results on the LIBERO benchmark. A demo video showcases its capabilities in vision-language-action tasks.
