Fara-7B: An Efficient Agentic Model for Computer Use
Ahmed Awadallah, Yash Lara, Raghav Magazine, Hussein Mozannar, Akshay Nambi, Yash Pandya, Aravind Rajeswaran, Corby Rosset, Alexey Taymanov, Vibhav Vineet, Spencer Whitehead, Andrew Zhao
2025-11-26
Summary
This paper introduces a scalable way to generate large amounts of data that teach computers to use the web the way people do, and then uses that data to build a small but surprisingly capable computer-use agent.
What's the problem?
Training computers to perform tasks on the web, like booking a flight or finding information, is hard because there is little high-quality data showing how humans actually *do* those things online. Large language models have vast amounts of text to learn from, but no comparable corpus exists for teaching a computer to interact with websites, and existing datasets don't cover the full range of tasks people attempt online.
What's the solution?
The researchers created a system called FaraGen that automatically proposes realistic web tasks, tries multiple ways to solve each one, and then verifies whether the solutions actually work. This yields a large, diverse, and inexpensive dataset of successful web interactions. They then used this data to train a computer agent, Fara-7B, which 'sees' the screen through screenshots alone and acts by predicting where to click, all while being small enough to run without a huge amount of computing power. They also created a new, more challenging set of web tasks, WebTailBench, to better test these agents.
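The propose-attempt-verify loop can be pictured with a minimal sketch. This is not FaraGen's actual code: `propose_tasks`, `attempt_task`, and `verify` are hypothetical stand-ins for the task proposer, the solver agent, and the verifiers described above, shown only to illustrate the control flow of generating and filtering trajectories.

```python
# Hypothetical sketch of a FaraGen-style generate-and-verify pipeline.
# All three components are stand-ins; the paper does not specify their
# interfaces here.
import random

def propose_tasks(seed_sites, tasks_per_site):
    """Stand-in task proposer: pairs each site with a generic goal."""
    goals = ["find contact info", "search for an item", "fill a form"]
    return [(site, random.choice(goals))
            for site in seed_sites for _ in range(tasks_per_site)]

def attempt_task(task):
    """Stand-in solver: returns a fake trajectory (list of actions)."""
    site, goal = task
    return [f"open {site}", f"act toward: {goal}", "done"]

def verify(trajectory, task):
    """Stand-in verifier: accepts trajectories that end with 'done'."""
    return trajectory[-1] == "done"

def faragen(seed_sites, attempts_per_task=3, tasks_per_site=2):
    """Propose tasks, roll out several attempts each, keep verified ones."""
    dataset = []
    for task in propose_tasks(seed_sites, tasks_per_site):
        for _ in range(attempts_per_task):
            traj = attempt_task(task)
            if verify(traj, task):
                dataset.append((task, traj))
                break  # keep one verified trajectory per task
    return dataset

data = faragen(["example.com", "shop.example"])
```

The design point this sketch captures is that multiple solution attempts per task plus automatic verification turn an unreliable solver into a source of clean training data, at a cost the paper estimates at roughly $1 per verified trajectory.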
Why it matters?
This work is important because it shows that we can overcome the lack of real-world data by *creating* high-quality synthetic data. This allows us to build smaller, more efficient computer agents that can perform complex tasks on the web, potentially even running directly on your phone or laptop, and compete with much larger, more resource-intensive models. The open release of the model and benchmark will help other researchers build on this work.
Abstract
Progress in computer use agents (CUAs) has been constrained by the absence of large and high-quality datasets that capture how humans interact with a computer. While LLMs have thrived on abundant textual data, no comparable corpus exists for CUA trajectories. To address these gaps, we introduce FaraGen, a novel synthetic data generation system for multi-step web tasks. FaraGen can propose diverse tasks from frequently used websites, generate multiple solution attempts, and filter successful trajectories using multiple verifiers. It achieves high throughput, yield, and diversity for multi-step web tasks, producing verified trajectories at approximately $1 each. We use this data to train Fara-7B, a native CUA model that perceives the computer using only screenshots, executes actions via predicted coordinates, and is small enough to run on-device. We find that Fara-7B outperforms other CUA models of comparable size on benchmarks like WebVoyager, Online-Mind2Web, and WebTailBench -- our novel benchmark that better captures under-represented web tasks in pre-existing benchmarks. Furthermore, Fara-7B is competitive with much larger frontier models, illustrating key benefits of scalable data generation systems in advancing small efficient agentic models. We are making Fara-7B open-weight on Microsoft Foundry and HuggingFace, and we are releasing WebTailBench.
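The abstract's description of Fara-7B as a native CUA model that "perceives the computer using only screenshots" and "executes actions via predicted coordinates" suggests a simple perceive-act loop. The sketch below is a hypothetical illustration of that loop only: `take_screenshot` and `model_predict` are stubs, and the `Action` type is an assumption, not Fara-7B's actual action space or API.

```python
# Hypothetical sketch of a screenshot-in, coordinates-out agent loop.
# The model and screen capture are stubbed; real dispatch to a browser
# or OS is omitted.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # e.g. "click", "type", "stop" (assumed action kinds)
    x: int = 0         # predicted screen coordinates
    y: int = 0
    text: str = ""

def take_screenshot(step):
    """Stand-in for capturing the current screen as pixels."""
    return f"screenshot_{step}"

def model_predict(screenshot, goal):
    """Stand-in policy: clicks once, then signals completion."""
    if screenshot.endswith("_0"):
        return Action("click", x=120, y=340)
    return Action("stop")

def run_agent(goal, max_steps=10):
    """Perceive (screenshot) -> predict action -> execute, until 'stop'."""
    history = []
    for step in range(max_steps):
        shot = take_screenshot(step)
        action = model_predict(shot, goal)
        history.append(action)
        if action.kind == "stop":
            break
        # a real agent would dispatch the click/typing to the browser here
    return history

trace = run_agent("book a flight")
```

Because the model sees only pixels and emits coordinates, this loop needs no accessibility tree or DOM access, which is what makes the on-device deployment described above plausible.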