SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning
Hojoon Lee, Dongyoon Hwang, Donghu Kim, Hyunseung Kim, Jun Jet Tai, Kaushik Subramanian, Peter R. Wurman, Jaegul Choo, Peter Stone, Takuma Seno
2024-10-16

Summary
This paper introduces SimBa, an architecture designed to improve deep reinforcement learning (RL) by scaling up the number of network parameters while avoiding common issues like overfitting.
What's the problem?
In deep reinforcement learning, simply increasing the size of the model (more parameters) often hurts rather than helps: larger networks tend to overfit, performing well on the data they were trained on but poorly in new situations. Unlike in computer vision and natural language processing, network design and parameter scaling have been much less explored in deep RL, which limits how large and capable these models can become in real-world applications.
What's the solution?
The authors introduce SimBa, an architecture that injects a simplicity bias so that large networks are guided toward simple, generalizable solutions. SimBa consists of three key components: an observation normalization layer that standardizes inputs using running statistics, a residual feedforward block that preserves a direct (linear) pathway from input to output, and layer normalization to keep feature magnitudes under control. Plugging SimBa into a range of deep RL algorithms consistently improved sample efficiency and performance, in some cases matching or surpassing state-of-the-art methods.
Why it matters?
This research is important because it provides a new way to scale deep reinforcement learning models without sacrificing performance. By enhancing how these models learn and adapt, SimBa can contribute to advancements in AI applications that require complex decision-making, such as robotics, gaming, and autonomous systems.
Abstract
Recent advances in CV and NLP have been largely driven by scaling up the number of network parameters, despite traditional theories suggesting that larger networks are prone to overfitting. These large networks avoid overfitting by integrating components that induce a simplicity bias, guiding models toward simple and generalizable solutions. However, in deep RL, designing and scaling up networks have been less explored. Motivated by this opportunity, we present SimBa, an architecture designed to scale up parameters in deep RL by injecting a simplicity bias. SimBa consists of three components: (i) an observation normalization layer that standardizes inputs with running statistics, (ii) a residual feedforward block to provide a linear pathway from the input to output, and (iii) a layer normalization to control feature magnitudes. By scaling up parameters with SimBa, the sample efficiency of various deep RL algorithms, including off-policy, on-policy, and unsupervised methods, is consistently improved. Moreover, solely by integrating the SimBa architecture into SAC, it matches or surpasses state-of-the-art deep RL methods with high computational efficiency across DMC, MyoSuite, and HumanoidBench. These results demonstrate SimBa's broad applicability and effectiveness across diverse RL algorithms and environments.
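To make the three components concrete, here is a minimal PyTorch sketch of a SimBa-style encoder: a running-statistics observation normalizer (i), pre-layer-norm residual feedforward blocks whose skip connections keep a linear path from input to output (ii), and layer normalization to control feature magnitudes (iii). This is a sketch under assumptions, not the authors' implementation; the class names and hyperparameters (RunningObsNorm, SimbaBlock, SimbaEncoder, hidden_dim, num_blocks, the 4x expansion factor, and the ReLU activation) are illustrative choices.

```python
# Illustrative sketch of the SimBa components; names and hyperparameters
# are assumptions, not the authors' code.
import torch
import torch.nn as nn


class RunningObsNorm(nn.Module):
    """Component (i): standardize observations with running mean/variance."""

    def __init__(self, obs_dim: int, eps: float = 1e-5):
        super().__init__()
        self.register_buffer("mean", torch.zeros(obs_dim))
        self.register_buffer("var", torch.ones(obs_dim))
        self.register_buffer("count", torch.tensor(eps))
        self.eps = eps

    @torch.no_grad()
    def update(self, obs: torch.Tensor) -> None:
        # Merge batch statistics into the running statistics.
        batch_mean = obs.mean(dim=0)
        batch_var = obs.var(dim=0, unbiased=False)
        batch_count = obs.shape[0]
        delta = batch_mean - self.mean
        total = self.count + batch_count
        self.mean += delta * batch_count / total
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        self.var = (m_a + m_b + delta.pow(2) * self.count * batch_count / total) / total
        self.count = total

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return (obs - self.mean) / torch.sqrt(self.var + self.eps)


class SimbaBlock(nn.Module):
    """Components (ii) and (iii): pre-layer-norm residual feedforward block.
    The skip connection keeps a linear pathway from input to output."""

    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * expansion),
            nn.ReLU(),
            nn.Linear(dim * expansion, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.mlp(self.norm(x))  # residual pathway


class SimbaEncoder(nn.Module):
    """Observation norm -> linear embed -> residual blocks -> final layer norm."""

    def __init__(self, obs_dim: int, hidden_dim: int = 256, num_blocks: int = 2):
        super().__init__()
        self.obs_norm = RunningObsNorm(obs_dim)
        self.embed = nn.Linear(obs_dim, hidden_dim)
        self.blocks = nn.ModuleList(SimbaBlock(hidden_dim) for _ in range(num_blocks))
        self.final_norm = nn.LayerNorm(hidden_dim)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        x = self.embed(self.obs_norm(obs))
        for block in self.blocks:
            x = block(x)
        return self.final_norm(x)
```

In an actual agent, `update` would be called on freshly collected observations during interaction with the environment, and actor or critic heads (e.g., for SAC) would be attached on top of the encoder's output; those details are beyond this sketch.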