CognitiveDrone: A VLA Model and Evaluation Benchmark for Real-Time Cognitive Task Solving and Reasoning in UAVs
Artem Lykov, Valerii Serpiva, Muhammad Haris Khan, Oleg Sautenkov, Artyom Myshlyaev, Grik Tadevosyan, Yasheerah Yaqoot, Dzmitry Tsetserukou
2025-03-06
Summary
This paper introduces CognitiveDrone, a new AI system designed to help drones solve complex tasks in real time by combining vision, language, and action capabilities.
What's the problem?
Drones often struggle with tasks that require advanced reasoning and decision-making, especially when they must follow natural-language instructions or make sense of visual information. Current systems either focus on narrow tasks like racing or lack the ability to handle cognitive challenges effectively.
What's the solution?
The researchers developed CognitiveDrone, which uses a Vision-Language-Action (VLA) model trained on over 8,000 simulated flight trajectories. They also created an improved version called CognitiveDrone-R1, which adds a reasoning module that simplifies and clarifies task instructions before they reach the flight controller. The system was tested on a new benchmark, CognitiveDroneBench, where the enhanced model significantly outperformed simpler models on tasks involving human recognition, symbol understanding, and reasoning.
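The two-stage design described above (a slower reasoning module that rewrites the instruction, feeding a high-frequency VLA controller that emits 4D action commands) can be sketched as follows. This is a minimal illustration with stand-in functions; the names `reasoning_module`, `vla_policy`, and `Action4D` are hypothetical and not from the paper, and the real system uses learned VLM/VLA models rather than these stubs.

```python
from dataclasses import dataclass

@dataclass
class Action4D:
    """A 4D UAV action command: linear velocities plus yaw rate (assumed layout)."""
    vx: float
    vy: float
    vz: float
    yaw_rate: float

def reasoning_module(instruction: str) -> str:
    """Stand-in for the slow VLM reasoning step: reduce a cognitive
    instruction to a simple directive the controller can execute."""
    if "odd" in instruction and "number" in instruction:
        return "fly through the gate labeled with an odd number"
    return instruction

def vla_policy(image: list, directive: str) -> Action4D:
    """Stand-in for the high-frequency VLA controller: maps a
    first-person image and a simplified directive to a 4D command."""
    # A real policy would run a neural network here; we return a fixed
    # forward-motion command purely for illustration.
    return Action4D(vx=1.0, vy=0.0, vz=0.0, yaw_rate=0.0)

def control_step(image: list, instruction: str) -> Action4D:
    directive = reasoning_module(instruction)  # low-frequency reasoning
    return vla_policy(image, directive)        # high-frequency control

action = control_step(image=[], instruction="pick the gate with an odd number")
print(action)  # Action4D(vx=1.0, vy=0.0, vz=0.0, yaw_rate=0.0)
```

The key design choice this sketch captures is the separation of concerns: expensive reasoning runs once per instruction, while the lightweight control loop can run at a much higher rate on each camera frame.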
Why it matters?
This matters because it enables drones to handle more complicated and real-world scenarios, such as search-and-rescue missions or human-robot interactions. By improving their ability to think and act intelligently, drones can become more versatile and reliable tools in various industries.
Abstract
This paper introduces CognitiveDrone, a novel Vision-Language-Action (VLA) model tailored for complex Unmanned Aerial Vehicle (UAV) tasks that demand advanced cognitive abilities. Trained on a dataset comprising over 8,000 simulated flight trajectories across three key categories (Human Recognition, Symbol Understanding, and Reasoning), the model generates real-time 4D action commands based on first-person visual inputs and textual instructions. To further enhance performance in intricate scenarios, we propose CognitiveDrone-R1, which integrates an additional Vision-Language Model (VLM) reasoning module to simplify task directives prior to high-frequency control. Experimental evaluations using our open-source benchmark, CognitiveDroneBench, reveal that while a racing-oriented model (RaceVLA) achieves an overall success rate of 31.3%, the base CognitiveDrone model reaches 59.6%, and CognitiveDrone-R1 attains a success rate of 77.2%. These results demonstrate improvements of up to 30% in critical cognitive tasks, underscoring the effectiveness of incorporating advanced reasoning capabilities into UAV control systems. Our contributions include the development of a state-of-the-art VLA model for UAV control and the introduction of the first dedicated benchmark for assessing cognitive tasks in drone operations. The complete repository is available at cognitivedrone.github.io