FlyPose: Towards Robust Human Pose Estimation From Aerial Views
Hassaan Farooq, Marvin Brenner, Peter Stütz
2026-01-13
Summary
This paper introduces FlyPose, a lightweight system that detects people and estimates their body poses from a drone's point of view, a capability that matters more as drones increasingly operate close to people.
What's the problem?
Detecting people and estimating their poses is harder from a drone than from a ground-level camera. Drones see people from far away, so they appear at low resolution, and from steep angles, and people are often partially hidden, sometimes by their own bodies. Existing methods struggle with these conditions, especially when the model must also be fast enough to run in real time, which is crucial for safety when a drone is flying near people.
What's the solution?
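The researchers created FlyPose, a lightweight top-down pipeline built specifically for this problem: it first detects each person in the image and then estimates that person's pose (the 2D positions of their body parts). Training on multiple datasets improved person detection by an average of 6.8 mAP across four test sets (Manipal-UAV, VisDrone, HIT-UAV and the authors' own dataset) and improved pose estimation by 16.3 mAP on the UAV-Human dataset. They also released FlyPose-104, a small but challenging dataset of manually annotated drone images, to help others improve their systems. FlyPose runs in about 20 milliseconds, including preprocessing, on a Jetson Orin AGX Developer Kit, and was deployed onboard a quadrotor during flight experiments.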
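FlyPose is described as a top-down pipeline, meaning people are detected first and a pose is then estimated for each detected person. The sketch below illustrates only that general two-stage structure; it is not the authors' implementation. The names `crop_box`, `to_image_coords`, `top_down_pose`, `dummy_detector` and `dummy_pose` are hypothetical, and the assumption that the pose model returns keypoints as fractions of the crop is made purely for illustration.

```python
import numpy as np


def crop_box(image, box):
    """Crop an (x1, y1, x2, y2) pixel box from an H x W x C image, clipped to the image."""
    h, w = image.shape[:2]
    x1, y1, x2, y2 = box
    x1, y1 = max(0, int(x1)), max(0, int(y1))
    x2, y2 = min(w, int(x2)), min(h, int(y2))
    return image[y1:y2, x1:x2], (x1, y1, x2, y2)


def to_image_coords(keypoints_norm, box):
    """Map keypoints given as fractions of the crop (0..1) back to full-image pixels."""
    x1, y1, x2, y2 = box
    scale = np.array([x2 - x1, y2 - y1], dtype=float)
    offset = np.array([x1, y1], dtype=float)
    return np.asarray(keypoints_norm, dtype=float) * scale + offset


def top_down_pose(image, detect_people, estimate_pose):
    """Stage 1: detect person boxes. Stage 2: estimate keypoints for each crop."""
    poses = []
    for box in detect_people(image):
        crop, clipped = crop_box(image, box)
        if crop.size == 0:  # box fell entirely outside the image
            continue
        keypoints = estimate_pose(crop)  # assumed shape (K, 2), normalized to the crop
        poses.append(to_image_coords(keypoints, clipped))
    return poses


# Hypothetical stand-ins for the real detector and pose networks.
def dummy_detector(image):
    return [(10, 20, 50, 100)]  # one person box in pixels


def dummy_pose(crop):
    return np.array([[0.5, 0.0], [0.5, 1.0]])  # top-center and bottom-center of the crop


frame = np.zeros((120, 160, 3), dtype=np.uint8)
poses = top_down_pose(frame, dummy_detector, dummy_pose)
# poses[0] holds two keypoints at pixel positions (30, 20) and (30, 100)
```

Passing the detector and pose model in as functions keeps the two stages independent, so either one could be swapped for a lighter or more accurate model without touching the other.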
Why it matters?
This work is important because it makes drones safer and more reliable when operating near humans. Accurate human pose estimation allows drones to better understand their surroundings and avoid collisions, enabling applications like package delivery, search and rescue, and infrastructure inspection in populated areas.
Abstract
Unmanned Aerial Vehicles (UAVs) are increasingly deployed in close proximity to humans for applications such as parcel delivery, traffic monitoring, disaster response and infrastructure inspections. Ensuring safe and reliable operation in these human-populated environments demands accurate perception of human poses and actions from an aerial viewpoint. This perspective challenges existing methods with low resolution, steep viewing angles and (self-)occlusion, especially if the application demands real-time feasible models. We train and deploy FlyPose, a lightweight top-down human pose estimation pipeline for aerial imagery. Through multi-dataset training, we achieve an average improvement of 6.8 mAP in person detection across the test sets of Manipal-UAV, VisDrone, HIT-UAV as well as our custom dataset. For 2D human pose estimation we report an improvement of 16.3 mAP on the challenging UAV-Human dataset. FlyPose runs with an inference latency of ~20 milliseconds including preprocessing on a Jetson Orin AGX Developer Kit and is deployed onboard a quadrotor UAV during flight experiments. We also publish FlyPose-104, a small but challenging aerial human pose estimation dataset that includes manual annotations from difficult aerial perspectives: https://github.com/farooqhassaan/FlyPose.