EBT-Policy: Energy Unlocks Emergent Physical Reasoning Capabilities
Travis Davies, Yiqi Huang, Alexi Gladstone, Yunxin Liu, Xiang Chen, Heng Ji, Huxian Liu, Luhui Hu
2025-11-04
Summary
This paper introduces a new way to control robots using Energy-Based Transformers (EBTs), a class of energy-based artificial intelligence models. It is an alternative to the currently dominant approach, diffusion policies, and aims to make robot control more reliable and efficient.
What's the problem?
Current robot control systems, especially those built on diffusion policies, are computationally expensive: they need many iterative denoising steps, and therefore a lot of processing power, to produce each action. They also suffer from 'exposure bias': because they are trained on clean expert data but act on their own imperfect predictions, small errors compound, and the robot can diverge when the situation drifts from what it saw in training. Their inference dynamics can also be unstable, so they don't always produce consistent results. In short, they are powerful but fragile.
What's the solution?
The researchers developed a new architecture called EBT-Policy. Instead of generating actions directly, it learns an energy landscape over observation-action pairs: actions that fit the situation get low energy, and inference searches for a low-energy action. Built on Energy-Based Transformers, it handles complex situations and scales to real-world problems. Importantly, on some tasks it converges in as few as two inference steps, a roughly 50x reduction compared to the ~100 denoising steps Diffusion Policy typically needs.
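The inference idea above can be sketched in a few lines: a network assigns a scalar energy to each observation-action pair, and the action is refined by gradient descent on that energy. This is a minimal, hypothetical illustration in PyTorch; the toy MLP, function names, and hyperparameters are assumptions for clarity, not the paper's actual Energy-Based Transformer.

```python
# Hedged sketch of energy-based action inference (illustrative only).
import torch
import torch.nn as nn


class ToyEnergyModel(nn.Module):
    """Maps an (observation, action) pair to a scalar energy.

    Lower energy means the model considers the action a better fit for
    the observation. A tiny MLP stands in for the paper's transformer.
    """

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)


def infer_action(model, obs, act_dim, steps=2, lr=0.05):
    """Refine an action by gradient descent on the scalar energy.

    The small default step count mirrors the paper's claim that
    inference can converge in as few as two steps.
    """
    act = torch.zeros(obs.shape[0], act_dim, requires_grad=True)
    for _ in range(steps):
        energy = model(obs, act).sum()
        (grad,) = torch.autograd.grad(energy, act)
        act = (act - lr * grad).detach().requires_grad_(True)
    return act.detach(), model(obs, act).detach()


torch.manual_seed(0)
model = ToyEnergyModel(obs_dim=4, act_dim=2)
obs = torch.randn(1, 4)
action, final_energy = infer_action(model, obs, act_dim=2)
```

Note that the energy network here is untrained; the sketch only shows the shape of the inference loop, in which the action (not the network weights) is the variable being optimized.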
Why it matters?
EBT-Policy represents a significant step forward in robotics because it is faster, more stable, and more adaptable than current methods. It can even recover from failed action sequences without being explicitly trained to retry, an emergent capability not observed in prior models. This makes it a promising approach for robots that must operate reliably in unpredictable real-world environments, and it could lead to more general and robust robot behavior.
Abstract
Implicit policies parameterized by generative models, such as Diffusion Policy, have become the standard for policy learning and Vision-Language-Action (VLA) models in robotics. However, these approaches often suffer from high computational cost, exposure bias, and unstable inference dynamics, which lead to divergence under distribution shifts. Energy-Based Models (EBMs) address these issues by learning energy landscapes end-to-end and modeling equilibrium dynamics, offering improved robustness and reduced exposure bias. Yet, policies parameterized by EBMs have historically struggled to scale effectively. Recent work on Energy-Based Transformers (EBTs) demonstrates the scalability of EBMs to high-dimensional spaces, but their potential for solving core challenges in physically embodied models remains underexplored. We introduce a new energy-based architecture, EBT-Policy, that solves core issues in robotic and real-world settings. Across simulated and real-world tasks, EBT-Policy consistently outperforms diffusion-based policies, while requiring less training and inference computation. Remarkably, on some tasks it converges within just two inference steps, a 50x reduction compared to Diffusion Policy's 100. Moreover, EBT-Policy exhibits emergent capabilities not seen in prior models, such as zero-shot recovery from failed action sequences using only behavior cloning and without explicit retry training. By leveraging its scalar energy for uncertainty-aware inference and dynamic compute allocation, EBT-Policy offers a promising path toward robust, generalizable robot behavior under distribution shifts.
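The abstract's point about using the scalar energy for uncertainty-aware inference and dynamic compute allocation can be illustrated with a toy loop: keep refining the action only while the energy (read as a confidence signal) stays above a threshold, so easy inputs stop early and hard ones get more steps. Everything here, including the quadratic toy energy and all names, is an illustrative assumption, not the paper's implementation.

```python
# Hedged sketch: scalar energy as a stopping criterion for dynamic compute.
import numpy as np


def energy(action, target):
    """Toy scalar energy: squared distance to a 'correct' action."""
    return float(np.sum((action - target) ** 2))


def energy_grad(action, target):
    """Gradient of the toy energy with respect to the action."""
    return 2.0 * (action - target)


def infer_with_dynamic_compute(target, threshold=1e-2, max_steps=100, lr=0.25):
    """Refine the action until the energy drops below a threshold.

    Returns the action, its final energy, and how many loop iterations
    were spent, so easy and hard inputs can be compared.
    """
    action = np.zeros_like(target)
    for step in range(1, max_steps + 1):
        e = energy(action, target)
        if e < threshold:  # low energy = confident: stop early, save compute
            return action, e, step
        action = action - lr * energy_grad(action, target)
    return action, energy(action, target), max_steps


easy = np.array([0.1, -0.1])  # near the initialization: few refinement steps
hard = np.array([3.0, -2.0])  # far from it: more refinement steps
_, _, steps_easy = infer_with_dynamic_compute(easy)
_, _, steps_hard = infer_with_dynamic_compute(hard)
```

With these settings the easy input terminates on the second loop iteration while the hard one takes several more, which is the behavior the abstract describes: compute is allocated per input rather than fixed in advance.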