
NORA-1.5: A Vision-Language-Action Model Trained using World Model- and Action-based Preference Rewards

Chia-Yu Hung, Navonil Majumder, Haoyuan Deng, Liu Renhang, Yankang Ang, Amir Zadeh, Chuan Li, Dorien Herremans, Ziwei Wang, Soujanya Poria

2025-11-18

Summary

This paper focuses on improving the reliability and adaptability of Vision-Language-Action (VLA) models, which are AI systems that understand instructions, see the world, and then take actions. The authors introduce a new model, NORA-1.5, along with a method for refining it after initial training so that it works better in the real world.

What's the problem?

Current VLA models, while promising, aren't always dependable, and they struggle to handle new situations. When moved to different robots or real-world environments, they often fail to generalize what they have learned. Essentially, they can be good at a specific task in a specific setting, but fail when things change even slightly.

What's the solution?

The researchers built NORA-1.5 by improving an existing model called NORA. They added a new component based on 'flow matching' to help it predict actions more accurately. Then, they created a system of 'rewards' that tell the model how good its actions are. These rewards consider whether the actions move the robot closer to its goal and how much the actions differ from what a 'good' action would look like. Finally, they used these rewards to further train NORA-1.5, making it better suited for specific robots and tasks.
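To make the reward-and-preference idea concrete, here is a minimal sketch of how preference pairs for DPO might be built from the two reward signals the paper describes: a world-model score for goal progress and a deviation-from-ground-truth heuristic. The function names, the simple subtraction used to combine the two signals, and the best-vs-worst pairing are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def action_reward(pred_action, gt_action, wm_goal_score):
    """Hypothetical combined reward: a world-model score for how much the
    action advances the goal, minus how far the action deviates from the
    ground-truth action. The weighting (plain subtraction) is illustrative."""
    deviation = np.linalg.norm(np.asarray(pred_action) - np.asarray(gt_action))
    return wm_goal_score - deviation

def build_preference_pair(candidates, gt_action, wm_scores):
    """Score sampled candidate actions and pair the highest-reward one
    ("chosen") with the lowest-reward one ("rejected") -- the record
    format that direct preference optimization (DPO) trains on."""
    rewards = [action_reward(a, gt_action, s)
               for a, s in zip(candidates, wm_scores)]
    return {
        "chosen": candidates[int(np.argmax(rewards))],
        "rejected": candidates[int(np.argmin(rewards))],
    }
```

A dataset of such chosen/rejected pairs is what a DPO-style fine-tuning step would then consume, nudging the policy toward the higher-reward actions without needing an explicit reward model at training time.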

Why it matters?

This work is important because it moves us closer to creating robots that can reliably follow instructions and operate in the real world. By improving the robustness and adaptability of VLA models, we can build AI agents that are more dependable and useful in everyday situations, rather than being limited to controlled environments. The reward-based training method is a simple but effective way to make these models more practical.

Abstract

Vision-language-action (VLA) models have recently shown promising performance on a variety of embodied tasks, yet they still fall short in reliability and generalization, especially when deployed across different embodiments or real-world environments. In this work, we introduce NORA-1.5, a VLA model built from the pre-trained NORA backbone by adding to it a flow-matching-based action expert. This architectural enhancement alone yields substantial performance gains, enabling NORA-1.5 to outperform NORA and several state-of-the-art VLA models across both simulated and real-world benchmarks. To further improve robustness and task success, we develop a set of reward models for post-training VLA policies. Our rewards combine (i) an action-conditioned world model (WM) that evaluates whether generated actions lead toward the desired goal, and (ii) a deviation-from-ground-truth heuristic that distinguishes good actions from poor ones. Using these reward signals, we construct preference datasets and adapt NORA-1.5 to target embodiments through direct preference optimization (DPO). Extensive evaluations show that reward-driven post-training consistently improves performance in both simulation and real-robot settings, demonstrating significant VLA model-reliability gains through simple yet effective reward models. Our findings highlight NORA-1.5 and reward-guided post-training as a viable path toward more dependable embodied agents suitable for real-world deployment.