
Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks

Kai Zeng, Zhanqian Wu, Kaixin Xiong, Xiaobao Wei, Xiangyu Guo, Zhenxin Zhu, Kalok Ho, Lijun Zhou, Bohan Zeng, Ming Lu, Haiyang Sun, Bing Wang, Guang Chen, Hangjun Ye, Wentao Zhang

2025-10-30


Summary

This paper introduces a new way to create realistic driving videos with artificial intelligence, with the specific goal of improving how well self-driving car systems 'see' and understand the world around them.

What's the problem?

Current AI systems for generating driving videos are good at making videos *look* real and at letting users control what happens in them, but they are rarely evaluated on whether these videos actually help self-driving systems perform better at tasks like recognizing objects or handling unusual situations. In addition, the common training recipe first pretrains on synthetic data and then finetunes on real data, effectively doubling the training time. The paper shows that simply training twice as long on real data alone can be just as good, which makes the synthetic data appear far less useful than claimed.

What's the solution?

The researchers developed Dream4Drive, a system that generates synthetic driving videos in a more targeted way. It decomposes a scene into 3D-aware guidance maps and renders 3D assets onto them, giving fine-grained control for creating specific, challenging scenarios, such as unusual weather or rare objects, that self-driving cars might encounter. To support this editing process, they also built DriveObj3D, a large collection of 3D assets covering object categories commonly found in driving environments. Perception models are then trained on the generated videos.
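The three-stage pipeline described above (decompose, render, generate) can be sketched in code. This is a minimal illustrative outline only: every class and function name here is a hypothetical placeholder, not the authors' actual API, and the bodies stand in for real depth/semantic estimators, a renderer, and a fine-tuned world model.

```python
# Hypothetical sketch of the Dream4Drive pipeline. All names below are
# illustrative assumptions; the real system uses learned models at each step.
from dataclasses import dataclass
from typing import List


@dataclass
class GuidanceMaps:
    """3D-aware guidance maps decomposed from an input driving video."""
    depth: List[str]      # per-frame depth maps (placeholder strings here)
    semantics: List[str]  # per-frame semantic layouts


def decompose_video(frames: List[str]) -> GuidanceMaps:
    # Step 1: decompose the input video into 3D-aware guidance maps.
    # (A real system would run depth and semantic estimators per frame.)
    return GuidanceMaps(
        depth=[f + ":depth" for f in frames],
        semantics=[f + ":sem" for f in frames],
    )


def render_asset(maps: GuidanceMaps, asset: str) -> List[str]:
    # Step 2: render a chosen 3D asset (e.g. a rare object from a library
    # like DriveObj3D) onto the guidance maps to edit the scene.
    return [d + "+" + asset for d in maps.depth]


def generate_video(edited_guidance: List[str]) -> List[str]:
    # Step 3: a fine-tuned driving world model turns the edited guidance
    # into multi-view photorealistic frames for training perception models.
    return [g + ":rendered" for g in edited_guidance]


frames = ["frame0", "frame1"]
maps = decompose_video(frames)
edited = render_asset(maps, "rare_truck")
video = generate_video(edited)
```

The point of the structure is that the edit happens in the 3D-aware guidance space (step 2), so inserted objects stay geometrically consistent across frames and camera views before the world model synthesizes the final pixels.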

Why it matters?

This work is important because it provides a method for generating targeted training data that specifically improves a self-driving car's ability to handle difficult and uncommon situations, which are crucial for safety. By creating these 'corner cases' at scale, it helps make autonomous driving systems more robust and reliable, and the released dataset will help other researchers build on this work.

Abstract

Recent advancements in driving world models enable controllable generation of high-quality RGB videos or multimodal videos. Existing methods primarily focus on metrics related to generation quality and controllability. However, they often overlook the evaluation of downstream perception tasks, which are crucial for the performance of autonomous driving. Existing methods usually leverage a training strategy that first pretrains on synthetic data and finetunes on real data, resulting in twice the epochs compared to the baseline (real data only). When we double the epochs in the baseline, the benefit of synthetic data becomes negligible. To thoroughly demonstrate the benefit of synthetic data, we introduce Dream4Drive, a novel synthetic data generation framework designed for enhancing the downstream perception tasks. Dream4Drive first decomposes the input video into several 3D-aware guidance maps and subsequently renders the 3D assets onto these guidance maps. Finally, the driving world model is fine-tuned to produce the edited, multi-view photorealistic videos, which can be used to train the downstream perception models. Dream4Drive enables unprecedented flexibility in generating multi-view corner cases at scale, significantly boosting corner case perception in autonomous driving. To facilitate future research, we also contribute a large-scale 3D asset dataset named DriveObj3D, covering the typical categories in driving scenarios and enabling diverse 3D-aware video editing. We conduct comprehensive experiments to show that Dream4Drive can effectively boost the performance of downstream perception models under various training epochs.

Page: https://wm-research.github.io/Dream4Drive/
GitHub: https://github.com/wm-research/Dream4Drive