Drive&Gen: Co-Evaluating End-to-End Driving and Video Generation Models

Jiahao Wang, Zhenpei Yang, Yijing Bai, Yingwei Li, Yuliang Zou, Bo Sun, Abhijit Kundu, Jose Lezama, Luna Yue Huang, Zehao Zhu, Jyh-Jing Hwang, Dragomir Anguelov, Mingxing Tan, Chiyu Max Jiang

2025-10-10

Summary

This paper explores using AI to create realistic virtual worlds for testing self-driving cars, and then uses those worlds to improve the self-driving AI itself.

What's the problem?

Currently, testing self-driving cars is expensive and sometimes dangerous, requiring lots of real-world driving. While AI can *generate* realistic-looking videos of driving scenarios, it's unclear whether these videos are accurate enough to reliably test a self-driving system. Also, understanding *why* a self-driving AI makes certain decisions is difficult, and it's hard to make these systems perform well in situations they haven't specifically been trained for.

What's the solution?

The researchers developed a system called Drive&Gen that connects video-generating AI with self-driving AI. They created ways to measure how realistic the generated videos are, using the self-driving AI as a judge. By carefully controlling the generated scenarios, they pinpointed weaknesses in the self-driving AI. Finally, they showed that training the self-driving AI on these AI-generated videos actually makes it better at handling real-world situations it hasn't encountered before.
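The core idea of "using the self-driving AI as a judge" can be pictured as running the planner on real and generated clips and comparing the statistics of its outputs. The sketch below is an illustrative assumption, not the paper's actual metric: it uses synthetic planner-speed samples and a simple moment-based gap, where a small gap means the generated videos look realistic to the planner.

```python
# Hypothetical sketch of planner-as-judge realism scoring.
# All names and the metric choice are illustrative assumptions,
# not the actual statistical measures proposed in Drive&Gen.
import numpy as np

def realism_gap(real_plans: np.ndarray, gen_plans: np.ndarray) -> float:
    """Distance between planner-output statistics on real vs. generated clips.

    Here the planner output is a scalar per clip (e.g. predicted speed).
    A small gap suggests the generated videos are 'realistic' to the
    planner; a large gap flags a sim-to-real distribution shift.
    """
    # Compare the first two moments of the planner's output distribution.
    mean_gap = abs(real_plans.mean() - gen_plans.mean())
    std_gap = abs(real_plans.std() - gen_plans.std())
    return mean_gap + std_gap

rng = np.random.default_rng(0)
real = rng.normal(10.0, 2.0, size=500)      # planner speeds on real videos
good_gen = rng.normal(10.1, 2.1, size=500)  # well-conditioned generation
bad_gen = rng.normal(6.0, 4.0, size=500)    # off-distribution generation

# The well-conditioned generator should score a smaller gap.
print(realism_gap(real, good_gen) < realism_gap(real, bad_gen))
```

In the paper this comparison is done with the actual E2E driving model's outputs rather than toy samples, which is what lets the same machinery both grade video generators and reveal the planner's own distributional biases.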

Why does it matter?

This work is important because it offers a cheaper and safer way to test and improve self-driving cars. By using AI to create endless testing scenarios, and by helping us understand how self-driving AI thinks, we can make these systems more reliable and expand where they can safely operate, potentially leading to wider adoption of autonomous vehicle technology.

Abstract

Recent advances in generative models have sparked exciting new possibilities in the field of autonomous vehicles. Specifically, video generation models are now being explored as controllable virtual testing environments. Simultaneously, end-to-end (E2E) driving models have emerged as a streamlined alternative to conventional modular autonomous driving systems, gaining popularity for their simplicity and scalability. However, the application of these techniques to simulation and planning raises important questions. First, while video generation models can generate increasingly realistic videos, can these videos faithfully adhere to the specified conditions and be realistic enough for E2E autonomous planner evaluation? Second, given that data is crucial for understanding and controlling E2E planners, how can we gain deeper insights into their biases and improve their ability to generalize to out-of-distribution scenarios? In this work, we bridge the gap between the driving models and generative world models (Drive&Gen) to address these questions. We propose novel statistical measures leveraging E2E drivers to evaluate the realism of generated videos. By exploiting the controllability of the video generation model, we conduct targeted experiments to investigate distribution gaps affecting E2E planner performance. Finally, we show that synthetic data produced by the video generation model offers a cost-effective alternative to real-world data collection. This synthetic data effectively improves E2E model generalization beyond existing Operational Design Domains, facilitating the expansion of autonomous vehicle services into new operational contexts.