
V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models

Hsu-kuang Chiu, Ryo Hachiuma, Chien-Yi Wang, Stephen F. Smith, Yu-Chiang Frank Wang, Min-Hung Chen

2025-02-17


Summary

This paper introduces V2V-LLM, a new system that helps self-driving cars work together and communicate using advanced AI language models. It's like giving cars the ability to have smart conversations with each other to make driving safer and more efficient.

What's the problem?

Current self-driving cars mostly rely on their own sensors to understand what's happening around them and decide where to go. This can be dangerous if the sensors aren't working well or can't see everything. While some systems let cars share what they see with each other, they don't really help the cars plan together or make decisions as a team.

What's the solution?

The researchers created V2V-LLM, which uses a powerful AI language model to help cars share information and work together. They also built a special dataset called V2V-QA to test how well the system works. V2V-LLM combines perception information from multiple cars at once and answers driving-related questions: pointing to where objects are on the road (grounding), identifying which nearby objects matter most, and planning safe driving routes.
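To make the idea concrete, here is a minimal sketch of the cooperative question-answering setup in Python. It is an illustration only: the names (CAVFeatures, fuse_features, build_prompt) are hypothetical, and the real V2V-LLM fuses learned perception features rather than plain text, which stands in here for readability.

```python
# Hypothetical sketch of cooperative question answering over shared perception.
# Not the paper's actual API; text descriptions stand in for learned features.
from dataclasses import dataclass
from typing import List

@dataclass
class CAVFeatures:
    """Perception summary shared by one connected autonomous vehicle (CAV)."""
    vehicle_id: str
    detections: List[dict]  # e.g. {"class": "pedestrian", "xy": (12.3, -4.1)}

def fuse_features(cavs: List[CAVFeatures]) -> str:
    """Merge every CAV's detections into one textual scene description."""
    lines = []
    for cav in cavs:
        for det in cav.detections:
            lines.append(f"{cav.vehicle_id} sees {det['class']} at {det['xy']}")
    return "\n".join(lines)

def build_prompt(cavs: List[CAVFeatures], question: str) -> str:
    """Pair the fused scene description with a driving question
    (grounding, notable object identification, or planning)."""
    return (
        "Shared perception from connected vehicles:\n"
        f"{fuse_features(cavs)}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Example: a planning question over two vehicles' shared detections.
scene = [
    CAVFeatures("ego", [{"class": "car", "xy": (8.0, 1.5)}]),
    CAVFeatures("cav_1", [{"class": "pedestrian", "xy": (15.2, -2.0)}]),
]
prompt = build_prompt(scene, "Is it safe for the ego vehicle to continue straight?")
print(prompt)  # In the full system, a query like this would go to a multimodal LLM.
```

The point of the sketch is the design choice: instead of each car reasoning alone, one language model receives what several cars perceive and answers a single question about the shared scene.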

Why it matters?

This matters because it could make self-driving cars much safer and more reliable. By working together and sharing information, cars could avoid accidents even when their own sensors aren't perfect. It's a big step towards having truly smart, cooperative self-driving cars that can handle complex road situations as a team, potentially making roads safer for everyone.

Abstract

Current autonomous driving vehicles rely mainly on their individual sensors to understand surrounding scenes and plan for future trajectories, which can be unreliable when the sensors are malfunctioning or occluded. To address this problem, cooperative perception methods via vehicle-to-vehicle (V2V) communication have been proposed, but they have tended to focus on detection and tracking. How those approaches contribute to overall cooperative planning performance is still under-explored. Inspired by recent progress using Large Language Models (LLMs) to build autonomous driving systems, we propose a novel problem setting that integrates an LLM into cooperative autonomous driving, with the proposed Vehicle-to-Vehicle Question-Answering (V2V-QA) dataset and benchmark. We also propose our baseline method Vehicle-to-Vehicle Large Language Model (V2V-LLM), which uses an LLM to fuse perception information from multiple connected autonomous vehicles (CAVs) and answer driving-related questions: grounding, notable object identification, and planning. Experimental results show that our proposed V2V-LLM can be a promising unified model architecture for performing various tasks in cooperative autonomous driving, and outperforms other baseline methods that use different fusion approaches. Our work also creates a new research direction that can improve the safety of future autonomous driving systems. Our project website: https://eddyhkchiu.github.io/v2vllm.github.io/ .