Understanding and Improving Hyperbolic Deep Reinforcement Learning

Timo Klein, Thomas Lang, Andrii Shkabrii, Alexander Sturm, Kevin Sidak, Lukas Miklautz, Claudia Plant, Yllka Velaj, Sebastian Tschiatschek

2025-12-18

Summary

This paper investigates using a special type of space, called hyperbolic space, to help reinforcement learning agents learn more effectively, particularly in complex environments. It identifies why previous attempts to use these spaces haven't always worked well and proposes a new method to overcome those challenges.

What's the problem?

Reinforcement learning agents rely on how they represent information about their surroundings. Hyperbolic spaces are well suited to capturing hierarchical relationships, but training agents in them is difficult because the learning process can become unstable. Specifically, when the internal representations (embeddings) grow too large in norm, the gradients of core hyperbolic operations blow up, and policy updates can violate the trust-region constraints that algorithms like PPO rely on, causing training to fail.
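A toy numerical sketch (not taken from the paper's code) illustrates the effect: in the Poincaré ball model, gradients of hyperbolic operations are scaled by the conformal factor, which diverges as an embedding's norm approaches the ball's boundary.

```python
import numpy as np

def conformal_factor(x):
    """Conformal factor lambda_x = 2 / (1 - ||x||^2) of the Poincare ball
    (curvature -1). It multiplies gradients of hyperbolic operations, so
    it directly controls how large the parameter updates become."""
    return 2.0 / (1.0 - np.dot(x, x))

# As the embedding norm approaches the boundary of the ball (||x|| -> 1),
# the factor, and hence the gradient magnitude, grows without bound.
for r in (0.5, 0.9, 0.99, 0.999):
    print(f"||x|| = {r}: lambda = {conformal_factor(np.array([r, 0.0])):.1f}")
```

This is why keeping embedding norms bounded is central to stabilizing hyperbolic deep RL.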

What's the solution?

The researchers developed a new reinforcement learning agent called Hyper++. It tackles the instability problem in three ways: first, it replaces the usual regression-based value estimate with a categorical value loss, which is less prone to destabilizing errors. Second, it adds a feature regularizer that keeps the internal representations bounded in norm, preventing the gradient blow-ups that cause instability, without the information loss that hard clipping incurs in high dimensions. Finally, it uses a more optimization-friendly formulation of the hyperbolic network layers. They tested Hyper++ on the ProcGen and Atari-5 benchmarks.

Why it matters?

This work is important because it makes hyperbolic spaces a more practical tool for reinforcement learning. By stabilizing the training process, Hyper++ allows agents to learn more efficiently and achieve better performance, especially in complex environments where hierarchical relationships between elements are crucial. The roughly 30% reduction in wall-clock training time on ProcGen and the strong results on Atari-5 demonstrate the potential of this approach to advance the field.

Abstract

The performance of reinforcement learning (RL) agents depends critically on the quality of the underlying feature representations. Hyperbolic feature spaces are well-suited for this purpose, as they naturally capture hierarchical and relational structure often present in complex RL environments. However, leveraging these spaces commonly faces optimization challenges due to the nonstationarity of RL. In this work, we identify key factors that determine the success and failure of training hyperbolic deep RL agents. By analyzing the gradients of core operations in the Poincaré Ball and Hyperboloid models of hyperbolic geometry, we show that large-norm embeddings destabilize gradient-based training, leading to trust-region violations in proximal policy optimization (PPO). Based on these insights, we introduce Hyper++, a new hyperbolic PPO agent that consists of three components: (i) stable critic training through a categorical value loss instead of regression; (ii) feature regularization guaranteeing bounded norms while avoiding the curse of dimensionality from clipping; and (iii) using a more optimization-friendly formulation of hyperbolic network layers. In experiments on ProcGen, we show that Hyper++ guarantees stable learning, outperforms prior hyperbolic agents, and reduces wall-clock time by approximately 30%. On Atari-5 with Double DQN, Hyper++ strongly outperforms Euclidean and hyperbolic baselines. We release our code at https://github.com/Probabilistic-and-Interactive-ML/hyper-rl .