AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning

Bo Jiang, Shaoyu Chen, Qian Zhang, Wenyu Liu, Xinggang Wang

2025-03-11


Summary

This paper introduces AlphaDrive, an AI system that helps self-driving cars handle tricky situations by learning through practice and reasoning through decisions step by step, like a student working through tough problems.

What's the problem?

Current self-driving AI struggles with rare or complex scenarios (like unexpected roadblocks) because it lacks common sense and reasoning ability. Earlier attempts to add vision-language models mostly rely on simple supervised fine-tuning, without training strategies tailored to planning.

What's the solution?

AlphaDrive uses a two-stage training method: first it learns from examples through supervised fine-tuning (like studying), then it practices with reinforcement learning, earning rewards for good decisions (like trial-and-error games). Four rewards designed specifically for planning guide it toward better driving decisions.

Why does it matter?

Compared with supervised fine-tuning alone, AlphaDrive plans better and trains more efficiently, which could make self-driving cars safer and better at handling real-world surprises.

Abstract

OpenAI o1 and DeepSeek R1 achieve or even surpass human expert-level performance in complex domains like mathematics and science, with reinforcement learning (RL) and reasoning playing a crucial role. In autonomous driving, recent end-to-end models have greatly improved planning performance but still struggle with long-tailed problems due to limited common sense and reasoning abilities. Some studies integrate vision-language models (VLMs) into autonomous driving, but they typically rely on pre-trained models with simple supervised fine-tuning (SFT) on driving data, without further exploration of training strategies or optimizations specifically tailored for planning. In this paper, we propose AlphaDrive, an RL and reasoning framework for VLMs in autonomous driving. AlphaDrive introduces four GRPO-based RL rewards tailored for planning and employs a two-stage planning reasoning training strategy that combines SFT with RL. As a result, AlphaDrive significantly improves both planning performance and training efficiency compared to using only SFT or without reasoning. Moreover, we are excited to discover that, following RL training, AlphaDrive exhibits some emergent multimodal planning capabilities, which are critical for improving driving safety and efficiency. To the best of our knowledge, AlphaDrive is the first to integrate GRPO-based RL with planning reasoning into autonomous driving. Code will be released to facilitate future research.
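
For readers unfamiliar with GRPO, the sketch below illustrates its core idea: several candidate outputs are sampled for the same input, each is scored, and every score is normalized against the mean and standard deviation of its own group. This is not the authors' code. The function names, the use of a plain sum to combine the four reward components, the population standard deviation, and the example scores are all assumptions made for illustration; the abstract does not say how the four rewards are defined or combined.

```python
from statistics import mean, pstdev


def combined_reward(components):
    """Combine per-sample reward components into one scalar.

    Assumption: a plain sum. The abstract only states that there are
    four planning rewards, not how they are weighted or combined.
    """
    return sum(components)


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of sampled outputs.

    Each advantage is (reward - group mean) / (group std + eps), so
    outputs are judged relative to their siblings, not on an absolute
    scale. Population std is an assumption of this sketch.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical scores for four sampled plans, four components each.
group = [
    [1.0, 0.5, 0.2, 1.0],
    [0.0, 0.0, 0.2, 1.0],
    [1.0, 0.5, 0.1, 0.0],
    [0.0, 0.0, 0.3, 1.0],
]
rewards = [combined_reward(c) for c in group]
advantages = group_relative_advantages(rewards)
```

Plans that score above their group average receive a positive advantage and are reinforced; plans below it are discouraged. When every plan in a group scores the same, all advantages are zero and that group contributes no learning signal.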
