TRAIL: Trace Reasoning and Agentic Issue Localization

Darshan Deshpande, Varun Gangal, Hersh Mehta, Jitin Krishnan, Anand Kannappan, Rebecca Qian

2025-05-14

Summary

This paper introduces TRAIL, a study of how well large language models can follow and debug the steps that AI agents take when carrying out complicated tasks, using a new dataset of human-annotated traces.

What's the problem?

The problem is that even advanced language models have trouble figuring out where things go wrong when an AI agent is working through a long, multi-step process, which makes it hard to fix mistakes or improve how these agents work.

What's the solution?

The researchers created a special dataset where humans carefully marked the steps and issues in various agent workflows. They then tested modern language models to see how well they could trace and debug these processes, revealing where the models still struggle.

Why does it matter?

This matters because understanding and fixing mistakes in AI workflows is crucial for making AI more reliable and trustworthy, especially as these systems take on more complex tasks in the real world.

Abstract

Modern long-context LLMs struggle with trace debugging of agentic workflows, as evidenced by a new dataset of human-annotated traces.
