
OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic

Songyan Zhang, Wenhui Huang, Zhan Chen, Chua Jiahao Collister, Qihang Huang, Chen Lv

2025-12-02


Summary

This paper introduces a new framework called OpenREAD for improving self-driving cars. It improves the car's 'thinking' process, or reasoning, by first learning from existing data (supervised fine-tuning) and then refining its decisions through trial and error (reinforcement fine-tuning).

What's the problem?

Current self-driving systems often struggle with truly understanding complex driving situations and making good decisions in unpredictable scenarios. While they can learn from labeled examples (supervised learning), this doesn't always translate to handling new situations well. Also, improving these systems through 'trial and error' (reinforcement learning) is difficult because it's hard to define what a 'good' outcome looks like for complex reasoning tasks – you can’t easily give a reward for 'understanding' a scene.

What's the solution?

The researchers developed OpenREAD, which uses a powerful language model to act as a 'critic' during the trial-and-error learning phase. This critic evaluates how well the car is *reasoning* about the driving situation, not just whether it successfully completes a maneuver. They also created a large dataset of detailed explanations (Chain-of-Thought) to help the system learn how to think through driving problems. By combining this reasoning evaluation with the trial-and-error process, the system learns to both understand the scene and plan a safe path.
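The critic-based reward described above can be sketched in a few lines. The prompt format, 0-10 scoring scale, and function names below are illustrative assumptions, not the paper's actual implementation; the real system uses the Qwen3 LLM as the critic, which is stubbed out here.

```python
import re

# Hypothetical sketch of LLM-as-critic reward modeling for open-ended
# reasoning, in the spirit of OpenREAD's reinforcement fine-tuning stage.
# The prompt wording and 0-10 scale are assumptions for illustration.

def build_critic_prompt(question: str, reference_cot: str, answer: str) -> str:
    """Ask the critic LLM to grade an open-ended answer against a
    reference chain-of-thought on a 0-10 scale."""
    return (
        "You are grading a driving-scene reasoning answer.\n"
        f"Question: {question}\n"
        f"Reference chain-of-thought: {reference_cot}\n"
        f"Model answer: {answer}\n"
        "Reply with a single integer score from 0 to 10."
    )

def parse_score(critic_reply: str, lo: float = 0.0, hi: float = 10.0) -> float:
    """Extract the first integer in the critic's reply and clamp it to [lo, hi]."""
    match = re.search(r"-?\d+", critic_reply)
    if match is None:
        return lo  # unparsable reply -> minimum reward
    return min(hi, max(lo, float(match.group())))

def reasoning_reward(critic_reply: str) -> float:
    """Normalize the critic's score to [0, 1] for use as an RL reward signal."""
    return parse_score(critic_reply) / 10.0
```

In a full pipeline, `build_critic_prompt` would be sent to the critic model and `reasoning_reward` applied to its reply, turning an open-ended quality judgment into a scalar reward the RL optimizer can use alongside trajectory-planning rewards.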

Why it matters?

This work is important because it moves self-driving cars closer to being able to handle complex, real-world driving situations. By focusing on improving the car’s reasoning abilities, and by finding a way to evaluate that reasoning, OpenREAD allows for more effective learning and better overall performance in both understanding the environment and making driving decisions.

Abstract

Recently, two-stage fine-tuning strategies, e.g., acquiring essential driving knowledge through supervised fine-tuning (SFT) and further enhancing decision-making and planning via reinforcement fine-tuning (RFT), have shown strong potential in advancing the knowledge-driven autonomous driving (AD) paradigm. However, the learning nature of SFT still limits the generalization of reasoning, thereby constraining the full potential of driving performance. Meanwhile, current RFT approaches are primarily applied to downstream tasks, since scene understanding is an open-ended problem where corresponding rewards are difficult to quantify. To address these limitations, we propose OpenREAD, an OPEN-ended REasoning reinforced vision-language model (VLM)-based autonomous driving (AD) framework that enables end-to-end RFT across the full spectrum from high-level reasoning to low-level trajectory planning. Specifically, we begin by constructing large-scale Chain-of-Thought (CoT) annotations on open-source driving-related knowledge datasets, and employ the powerful Qwen3 large language model (LLM) as the critic in RFT to quantify reasoning quality for open-ended questions during reward modeling. Extensive experiments confirm that joint end-to-end RFT yields substantial improvements in both upstream and downstream tasks, enabling OpenREAD to achieve state-of-the-art performance on reasoning and planning benchmarks.