RynnVLA-002 uses a hierarchical reward system that draws on multi-level geographical information to improve overall performance. It operates iteratively: at each step it generates a thought and an action, and the action is parsed and executed to yield a new observation. The model has been evaluated on the LIBERO benchmark, a simulated robot manipulation benchmark.
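The iterative thought, action, and observation cycle described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not RynnVLA-002's actual code: the names `Step`, `parse_action`, and `run_episode`, the `"name: argument"` action format, and the `"stop"` action are all hypothetical and chosen only to make the control flow concrete.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    """One iteration of the loop (hypothetical record type)."""
    thought: str
    action: str
    observation: str


def parse_action(raw: str) -> Tuple[str, str]:
    """Split an action string of the assumed form 'name: argument'."""
    name, _, arg = raw.partition(":")
    return name.strip(), arg.strip()


def run_episode(
    policy: Callable[[str, List[Step]], Tuple[str, str]],
    execute: Callable[[str, str], str],
    first_observation: str,
    max_steps: int = 5,
) -> List[Step]:
    """Alternate between generating (thought, action) and executing the action.

    `policy` stands in for the model and `execute` for the environment;
    both are placeholders supplied by the caller.
    """
    history: List[Step] = []
    observation = first_observation
    for _ in range(max_steps):
        # 1. The model produces a thought and a raw action string.
        thought, raw_action = policy(observation, history)
        # 2. The raw action is parsed into a name and an argument.
        name, arg = parse_action(raw_action)
        if name == "stop":
            # Assumed termination signal: record the step and exit.
            history.append(Step(thought, raw_action, observation))
            break
        # 3. Executing the action yields the next observation.
        observation = execute(name, arg)
        history.append(Step(thought, raw_action, observation))
    return history
```

Passing the policy and the executor as plain callables keeps the loop independent of any particular model or simulator, so the same sketch works with a stub policy for testing.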
RynnVLA-002 has a complete training pipeline consisting of a cold-start supervised fine-tuning stage followed by a reinforcement learning stage. It can be trained on datasets such as LIBERO and LeRobot, and is reported to achieve state-of-the-art results on these benchmarks. A demo video showcases its capabilities on vision-language-action tasks.

