
SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents

Shaofei Cai, Yulei Qin, Haojia Lin, Zihan Xu, Gang Li, Yuchen Shi, Zongyi Li, Yong Mao, Siqi Cai, Xiaoyu Tan, Yitao Liang, Ke Li, Xing Sun

2025-12-30


Summary

This paper introduces a new way to train AI agents to perform tasks on computers, specifically those with graphical user interfaces like apps on your phone. It focuses on making the process of checking if the AI successfully completed the task much more efficient and reliable.

What's the problem?

Currently, when training an AI to use a computer program, we have to let it try the task and *then* have another program check if it succeeded. This checking process looks at everything the AI did, which is often a lot of unnecessary information, making it slow, expensive, and sometimes inaccurate. It's like having someone watch a whole movie to see if the main character achieved their goal, instead of just focusing on the key scenes.

What's the solution?

The researchers developed a system called SmartSnap in which the AI agent is trained not only to *do* the task but also to *prove* it did it correctly. It does this by taking 'snapshots' – small, focused pieces of evidence – showing the key steps that demonstrate success. A separate AI then acts as a judge, but only needs to look at these snapshots, making verification much faster and more reliable. The researchers also established guidelines, the 3C Principles, for taking good snapshots: they should be complete, concise, and creative in how they highlight success.
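The flow above can be sketched in miniature. This is an illustrative toy, not the paper's implementation: the function names (`run_task_with_snapshots`, `judge`), the action format, and the string-matching "judge" are all invented stand-ins; a real system would execute actions in a live GUI environment and prompt an LLM-as-a-Judge with the curated snapshots.

```python
from dataclasses import dataclass

@dataclass
class Snapshot:
    step: int
    description: str   # agent's claim about what this snapshot proves
    screen_state: str  # stand-in for a screenshot or serialized UI tree

def run_task_with_snapshots(actions):
    """Hypothetical agent loop: execute actions and proactively curate a
    minimal set of snapshots evidencing success (the 3C idea: the agent,
    not a post-hoc verifier, decides in-situ what counts as evidence)."""
    snapshots = []
    for step, action in enumerate(actions):
        state = f"screen after {action['name']}"  # stand-in for real execution
        if action.get("is_evidence"):
            snapshots.append(Snapshot(step, action["why"], state))
    return snapshots

def judge(snapshots, task_goal):
    """Stand-in verifier: sees ONLY the curated snapshots, never the full
    trajectory. Here a trivial string check replaces an LLM-as-a-Judge."""
    return any(task_goal in s.description for s in snapshots)

# A 3-step trajectory where only one step is flagged as decisive evidence.
actions = [
    {"name": "open_app"},
    {"name": "set_alarm", "is_evidence": True, "why": "alarm set for 7:00"},
    {"name": "go_home"},
]
evidence = run_task_with_snapshots(actions)
print(len(evidence), judge(evidence, "alarm set"))  # judge reads 1 snapshot, not 3 steps
```

The point of the sketch is the information asymmetry: the judge's input shrinks from the whole trajectory to the agent-chosen evidence set, which is what makes verification cheap and scalable in the paper's framing.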

Why it matters?

This new approach allows for training more powerful AI agents that can handle complex tasks without requiring massive amounts of computing power or time for verification. The experiments showed significant performance improvements, meaning AI can learn to use software more effectively and efficiently, potentially leading to more helpful and automated tools in the future.

Abstract

Agentic reinforcement learning (RL) holds great promise for developing autonomous agents for complex GUI tasks, but its scalability remains severely hampered by the verification of task completion. Existing approaches treat task verification as a passive, post-hoc process: a verifier (e.g., a rule-based scoring script, a reward or critic model, or an LLM-as-a-Judge) analyzes the agent's entire interaction trajectory to determine whether the agent succeeded. Processing such verbose context, full of irrelevant and noisy history, strains verification protocols and therefore leads to prohibitive cost and low reliability. To overcome this bottleneck, we propose SmartSnap, a paradigm shift from passive, post-hoc verification to proactive, in-situ self-verification by the agent itself. We introduce the Self-Verifying Agent, a new type of agent designed with dual missions: not only to complete a task but also to prove its accomplishment with curated snapshot evidence. Guided by our proposed 3C Principles (Completeness, Conciseness, and Creativity), the agent leverages its access to the online environment to perform self-verification on a minimal, decisive set of snapshots. This evidence is provided as the sole material for a general LLM-as-a-Judge verifier to determine its validity and relevance. Experiments on mobile tasks across model families and scales demonstrate that our SmartSnap paradigm allows training LLM-driven agents in a scalable manner, bringing performance gains of up to 26.08% and 16.66% to 8B and 30B models, respectively. The synergy between solution finding and evidence seeking facilitates the cultivation of efficient, self-verifying agents with performance competitive against DeepSeek V3.1 and Qwen3-235B-A22B.