LongTail Driving Scenarios with Reasoning Traces: The KITScenes LongTail Dataset

Royden Wagner, Omer Sahin Tas, Jaime Villa, Felix Hauser, Yinzhe Shen, Marlon Steiner, Dominik Strutz, Carlos Fernandez, Christian Kinzig, Guillermo S. Guitierrez-Cabello, Hendrik Königshof, Fabian Immel, Richard Schwarzkopf, Nils Alexander Rack, Kevin Rösch, Kaiwen Wang, Jan-Hendrik Pauls, Martin Lauer, Igor Gilitschenski, Holger Caesar, Christoph Stiller

2026-03-30

Summary

This paper introduces a new dataset, KITScenes LongTail, specifically designed to improve how self-driving cars handle unusual or rare driving situations.

What's the problem?

Self-driving technology performs well in common scenarios but struggles with situations it has rarely 'seen' before, such as unexpected obstacles or unusual traffic patterns. Existing datasets don't focus enough on these rare but important events, making it hard to train cars to react safely and effectively in them. It's also difficult to test whether a self-driving system truly *understands* what it's doing, beyond just avoiding crashes.

What's the solution?

The researchers created a dataset with video footage from multiple camera angles, along with the car's planned trajectory, the high-level instructions it's following, and detailed explanations of *why* a human driver would make certain decisions. These explanations are provided in three languages – English, Spanish, and Chinese – by experts from different cultural backgrounds. This allows researchers to build and test AI models that can not only drive, but also explain their reasoning and follow complex instructions, even in rare situations.

Why does it matter?

This dataset is important because it pushes self-driving research beyond just making cars safe and comfortable. It allows researchers to evaluate how well AI can actually *understand* driving situations and make smart decisions, especially when things get tricky. The multilingual reasoning traces also help understand how cultural differences might influence driving behavior and how to build more robust and adaptable self-driving systems.

Abstract

In real-world domains such as self-driving, generalization to rare scenarios remains a fundamental challenge. To address this, we introduce a new dataset designed for end-to-end driving that focuses on long-tail driving events. We provide multi-view video data, trajectories, high-level instructions, and detailed reasoning traces, facilitating in-context learning and few-shot generalization. The resulting benchmark for multimodal models, such as VLMs and VLAs, goes beyond safety and comfort metrics by evaluating instruction following and semantic coherence between model outputs. The multilingual reasoning traces in English, Spanish, and Chinese are from domain experts with diverse cultural backgrounds. Thus, our dataset is a unique resource for studying how different forms of reasoning affect driving competence. Our dataset is available at: https://hf.co/datasets/kit-mrt/kitscenes-longtail
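The abstract lists four modalities per scenario: multi-view video, trajectories, high-level instructions, and multilingual reasoning traces. A minimal sketch of how such a sample might be structured and validated follows; all field names and values here are illustrative assumptions, not the dataset's actual schema.

```python
# Hypothetical shape of one long-tail driving sample. Field names and
# contents are illustrative only -- consult the dataset card for the
# real schema.
sample = {
    "scenario_id": "longtail_0001",
    "camera_views": ["front", "front_left", "front_right", "rear"],  # multi-view video streams
    "trajectory": [(0.0, 0.0), (1.2, 0.1), (2.5, 0.3)],  # ego (x, y) positions over time
    "instruction": "Yield to the cyclist merging from the right, then continue straight.",
    "reasoning_traces": {  # expert explanations in three languages
        "en": "A cyclist is entering the lane; slowing down avoids a conflict point.",
        "es": "...",
        "zh": "...",
    },
}

def check_sample(s: dict) -> bool:
    """Check that a sample carries the four modalities the paper describes."""
    required = {"camera_views", "trajectory", "instruction", "reasoning_traces"}
    languages = {"en", "es", "zh"}
    return required.issubset(s) and languages <= s["reasoning_traces"].keys()

print(check_sample(sample))  # True
```

A validator like this is the kind of check a benchmark harness might run before scoring instruction following or semantic coherence between model outputs.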