FAMA: Failure-Aware Meta-Agentic Framework for Open-Source LLMs in Interactive Tool Use Environments
Amir Saeidi, Venkatesh Mishra, Souradeep Mukhopadhyay, Gaowen Liu, Ali Payani, Jayanth Srinivasa, Chitta Baral
2026-04-30
Summary
This paper focuses on improving how well AI agents, powered by Large Language Models (LLMs), can handle complex, real-world tasks that require multiple steps and the use of different tools. These agents often stumble when making a series of decisions, leading to errors that build on each other.
What's the problem?
AI agents using LLMs, especially those that are smaller or have limited resources, frequently make mistakes when trying to solve problems through conversation, like a customer service chatbot. These errors aren't just isolated incidents; they snowball because each wrong decision makes the next one harder to get right. The agents struggle to maintain accuracy over a longer conversation or task.
What's the solution?
The researchers developed a system called FAMA, which stands for Failure-Aware Meta-Agentic framework. It works in two parts: first, it studies where existing agents typically fail. Then, it uses a team of specialized 'helper' agents to provide extra, targeted information to the main agent *before* it makes a decision. This extra information helps the main agent avoid the common mistakes identified in the first step, making the whole process more reliable.
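The two-stage idea can be sketched in a few lines of Python. This is an illustrative toy, not the paper's actual implementation: the failure categories, the `SPECIALISTS` mapping, and the state fields are all hypothetical assumptions chosen to show the shape of "tally common failures, then inject targeted context from a minimal set of helper agents before the decision step."

```python
from collections import Counter

# Stage 1 (sketch): tally failure categories observed in baseline agent
# trajectories and keep the most prevalent ones.
def most_common_failures(failure_trajectories, top_k=2):
    counts = Counter(t["failure_type"] for t in failure_trajectories)
    return [ftype for ftype, _ in counts.most_common(top_k)]

# Stage 2 (sketch): map each prevalent failure to a specialized helper
# agent whose output is injected as extra context for the main tool-use
# agent. Both failure names and helpers here are invented for illustration.
SPECIALISTS = {
    "wrong_tool_choice": lambda state: "Hint: available tools are "
                                       + ", ".join(state["tools"]),
    "missing_argument":  lambda state: "Hint: arguments gathered so far: "
                                       + str(state["known_args"]),
}

def orchestrate(state, prevalent_failures):
    # Activate only the minimal subset of specialists that address the
    # failures identified in Stage 1, then prepend their hints to the
    # context the main agent sees before deciding.
    hints = [SPECIALISTS[f](state) for f in prevalent_failures
             if f in SPECIALISTS]
    return state["user_message"] + "\n" + "\n".join(hints)

# Toy usage: two stages chained together.
trajectories = [
    {"failure_type": "wrong_tool_choice"},
    {"failure_type": "wrong_tool_choice"},
    {"failure_type": "missing_argument"},
]
state = {
    "tools": ["search_order", "issue_refund"],
    "known_args": {"order_id": None},
    "user_message": "Please refund my order.",
}
prompt = orchestrate(state, most_common_failures(trajectories))
```

The point of the sketch is the control flow: the main agent's prompt is augmented only with context targeting the failure modes that actually dominated in the analysis stage, rather than with every helper's output on every turn.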
Why it matters?
This research is important because it shows a practical way to build more dependable AI agents. By focusing on preventing common errors instead of just trying to make the main agent 'smarter,' they achieved significant improvements in performance. This is especially valuable for open-source LLMs that might not have the same capabilities as larger, more expensive models, and it brings us closer to AI agents that can truly assist us in complex situations.
Abstract
Large Language Models are being increasingly deployed as the decision-making core of autonomous agents capable of effecting change in external environments. Yet, in conversational benchmarks, which simulate real-world customer-centric issue resolution scenarios, these agents frequently fail due to the cascading effects of incorrect decision-making. These challenges are particularly pronounced for open-source LLMs with smaller parameter sizes, limited context windows, and constrained inference budgets, which contribute to increased error accumulation in agentic settings. To tackle these challenges, we present the Failure-Aware Meta-Agentic (FAMA) framework. FAMA operates in two stages: first, it analyzes failure trajectories from baseline agents to identify the most prevalent errors; second, it employs an orchestration mechanism that activates a minimal subset of specialized agents tailored to address these failures by injecting targeted context for the tool-use agent before the decision-making step. Experiments across open-source LLMs demonstrate performance gains of up to 27% across evaluation modes over standard baselines. These results highlight that targeted curation of context through specialized agents to address common failures is a valuable design principle for building reliable, multi-turn tool-use LLM agents that simulate real-world conversational scenarios.