RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning
Hao Gao, Shaoyu Chen, Bo Jiang, Bencheng Liao, Yiang Shi, Xiaoyang Guo, Yuechuan Pu, Haoran Yin, Xiangyu Li, Xinbang Zhang, Ying Zhang, Wenyu Liu, Qian Zhang, Xinggang Wang
2025-02-20
Summary
This paper introduces RAD, a new way to train self-driving cars that combines reinforcement learning with photorealistic 3D graphics (3D Gaussian Splatting). It's like teaching a car to drive in a super realistic video game before letting it loose on real roads.
What's the problem?
Current methods for teaching self-driving cars often rely on copying human drivers (imitation learning), which can cause the model to confuse correlation with cause and effect, and doesn't prepare the car for unusual situations it hasn't seen before. It's like trying to learn to drive only by watching others, without ever getting behind the wheel yourself.
What's the solution?
The researchers created a system called RAD that uses 3D Gaussian Splatting to build incredibly realistic virtual environments for self-driving cars to practice in. In these virtual worlds, the cars can try different things and learn from their mistakes without any real-world risk. The researchers also designed special rewards to teach the cars about safety-critical events, and mixed in imitation of human driving so the learned policy still drives naturally.
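The "special rewards" idea can be sketched as a simple shaped reward that penalizes unsafe outcomes more heavily than minor deviations. The term structure and coefficients below are illustrative assumptions for explanation, not RAD's actual reward design.

```python
def safety_reward(collided, lane_deviation_m, heading_error_rad):
    """Illustrative safety-oriented reward shaping (hypothetical values).

    collided:           whether the ego vehicle hit anything this step
    lane_deviation_m:   distance from the lane center, in meters
    heading_error_rad:  angular misalignment with the lane direction
    """
    reward = 0.0
    if collided:
        reward -= 10.0                       # large penalty for any collision
    reward -= 0.5 * lane_deviation_m         # penalize drifting off course
    reward -= 0.2 * abs(heading_error_rad)   # penalize misaligned heading
    return reward
```

In this sketch, a collision dominates the signal, so the policy learns that avoiding crashes outweighs small trajectory imperfections.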
Why it matters?
This matters because it could make self-driving cars much safer and more reliable. By practicing in virtual worlds, these cars can learn how to handle dangerous situations without putting anyone at risk. The results show that cars trained this way have a collision rate three times lower than those trained with imitation learning alone. This could help make self-driving cars a reality sooner, potentially making our roads safer and changing how we think about transportation.
Abstract
Existing end-to-end autonomous driving (AD) algorithms typically follow the Imitation Learning (IL) paradigm, which faces challenges such as causal confusion and the open-loop gap. In this work, we establish a 3DGS-based closed-loop Reinforcement Learning (RL) training paradigm. By leveraging 3DGS techniques, we construct a photorealistic digital replica of the real physical world, enabling the AD policy to extensively explore the state space and learn to handle out-of-distribution scenarios through large-scale trial and error. To enhance safety, we design specialized rewards that guide the policy to effectively respond to safety-critical events and understand real-world causal relationships. For better alignment with human driving behavior, IL is incorporated into RL training as a regularization term. We introduce a closed-loop evaluation benchmark consisting of diverse, previously unseen 3DGS environments. Compared to IL-based methods, RAD achieves stronger performance in most closed-loop metrics, especially 3x lower collision rate. Abundant closed-loop results are presented at https://hgao-cv.github.io/RAD.
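The abstract's idea of "IL incorporated into RL training as a regularization term" can be sketched as a combined loss: a policy-gradient term driven by rewards, plus an imitation term that keeps the policy close to human demonstrations. The function names, the REINFORCE-style surrogate, and the weighting are assumptions for illustration, not the paper's exact formulation.

```python
def combined_loss(log_probs, advantages, expert_log_probs, il_weight=0.5):
    """Hypothetical sketch of RL loss with an IL regularizer.

    log_probs:        log-probabilities of actions taken by the policy
    advantages:       advantage estimates for those actions
    expert_log_probs: policy log-probabilities of the human expert's actions
    il_weight:        strength of the imitation regularizer (assumed value)
    """
    # REINFORCE-style surrogate: reinforce actions with positive advantage
    rl_loss = -sum(lp * a for lp, a in zip(log_probs, advantages)) / len(log_probs)
    # Imitation term: negative log-likelihood of the expert's actions
    il_loss = -sum(expert_log_probs) / len(expert_log_probs)
    return rl_loss + il_weight * il_loss
```

The imitation term anchors the policy to human-like behavior while the RL term lets it explore and correct mistakes beyond the demonstration data.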