Remote Labor Index: Measuring AI Automation of Remote Work
Mantas Mazeika, Alice Gatti, Cristina Menghini, Udari Madhushani Sehwag, Shivam Singhal, Yury Orlovskiy, Steven Basart, Manasi Sharma, Denis Peskoff, Elaine Lau, Jaehyuk Lim, Lachlan Carroll, Alice Blair, Vinaya Sivakumar, Sumana Basu, Brad Kenstler, Yuntao Ma, Julian Michael, Xiaoke Li, Oliver Ingebretsen, Aditya Mehta, Jean Mottola
2025-10-31
Summary
This paper investigates whether the rapid improvements AI has shown on tests and puzzles actually translate into real-world economic value, such as automating paid work.
What's the problem?
While AI is getting better at things like answering questions and solving problems in a research setting, it's unclear if this means AI can actually *do* useful work that people are paid to do. There's a gap between showing AI can perform well on benchmarks and demonstrating it can automate tasks in a practical, economically valuable way.
What's the solution?
The researchers created a new way to measure AI's ability to perform real work, called the Remote Labor Index (RLI). The index comprises real-world projects from a range of industries that an AI must complete from start to finish, just as a human worker would. They then tested current AI systems on these projects to see how much of the work they could actually automate.
Why does it matter?
The results show that current AI is still quite limited in its ability to automate work: the best-performing agent automated only about 2.5% of the projects. This research provides a realistic assessment of AI's current capabilities, helping us understand the actual impact of AI on jobs and allowing businesses and policymakers to prepare for future changes.
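To make the headline number concrete, here is a minimal sketch of how an automation rate like 2.5% can be read, assuming it is simply the share of projects whose AI-produced deliverable would be accepted in place of human work. This is an illustrative assumption; the paper defines its own scoring, and the outcomes below are invented.

```python
def automation_rate(outcomes):
    """Fraction of projects automated.

    outcomes: list of booleans, True if the AI deliverable was accepted
    (hypothetical data, not the paper's actual results).
    """
    return sum(outcomes) / len(outcomes)

# Example: 1 accepted project out of 40 yields a 2.5% automation rate.
outcomes = [True] + [False] * 39
print(f"{automation_rate(outcomes):.1%}")  # 2.5%
```

The point of the sketch is only that a small rate like this reflects very few end-to-end successes, not partial credit across many projects.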
Abstract
AIs have made rapid progress on research-oriented benchmarks of knowledge and reasoning, but it remains unclear how these gains translate into economic value and automation. To measure this, we introduce the Remote Labor Index (RLI), a broadly multi-sector benchmark comprising real-world, economically valuable projects designed to evaluate end-to-end agent performance in practical settings. AI agents perform near the floor on RLI, with the highest-performing agent achieving an automation rate of 2.5%. These results help ground discussions of AI automation in empirical evidence, setting a common basis for tracking AI impacts and enabling stakeholders to proactively navigate AI-driven labor automation.