
HyperClick: Advancing Reliable GUI Grounding via Uncertainty Calibration

Shaojie Zhang, Pei Fu, Ruoceng Zhang, Jiahui Yang, Anan Du, Xiuwen Xi, Shaokang Wang, Ying Huang, Bin Qin, Zhenbo Luo, Jian Luan

2025-11-03


Summary

This paper focuses on making computer programs that can understand and follow instructions given in plain language to interact with graphical user interfaces (GUIs), like clicking buttons or filling out forms. It addresses the issue of these programs being *too* confident, even when they're likely to make mistakes.

What's the problem?

When we try to build programs that can automate tasks on a computer screen, they often aren't very good at knowing their own limits. They might confidently predict where to click, even if they're wrong, which can cause the whole task to fail. Current methods for training these programs, even advanced ones, don't teach them to accurately assess how sure they are about their actions. This is especially problematic in dynamic GUIs where a single wrong click can ruin everything.

What's the solution?

The researchers developed a new system called HyperClick. It gives the program two kinds of feedback: a simple 'right or wrong' signal for each click, and a score for how well the program's stated confidence matched what actually happened. To measure confidence, HyperClick models where a click is likely to land as a bell-shaped (truncated Gaussian) distribution over the screen, and it grades the stated confidence with the Brier score, which rewards being sure when right and unsure when wrong. The training process then rewards not just correct actions, but also accurate confidence levels. Essentially, it teaches the program to be honest about when it's unsure.
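To make the dual-reward idea concrete, here is a minimal sketch of how a correctness signal and a Brier-score calibration term could be combined into one training reward. The function name, the box-hit test, and the weight `w_conf` are illustrative assumptions, not the paper's exact formulation.

```python
def dual_reward(pred_xy, target_box, confidence, w_conf=0.5):
    """Hypothetical sketch of a HyperClick-style dual reward.

    pred_xy:    predicted click coordinates (x, y)
    target_box: ground-truth element box (x1, y1, x2, y2)
    confidence: the model's stated confidence in [0, 1]
    w_conf:     weight of the calibration term (assumed hyperparameter)
    """
    x, y = pred_xy
    x1, y1, x2, y2 = target_box

    # Binary reward: did the click land inside the target element?
    correct = 1.0 if (x1 <= x <= x2 and y1 <= y <= y2) else 0.0

    # Brier score: squared gap between stated confidence and the actual
    # outcome. It is low when the model is honest about itself, whether
    # that means confidently right or hesitantly wrong.
    brier = (confidence - correct) ** 2

    # Reward correct actions AND well-calibrated confidence.
    return correct + w_conf * (1.0 - brier)
```

Under this sketch, a confidently wrong click scores worse than a hesitantly wrong one, which is exactly the pressure against overconfidence the paper describes.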

Why it matters?

This research is important because it makes GUI automation more reliable. By making programs more aware of their own capabilities and limitations, we can reduce errors and build systems that are more trustworthy. This could lead to better automated tools for everyday tasks, and more robust systems for complex applications.

Abstract

Autonomous Graphical User Interface (GUI) agents rely on accurate GUI grounding, which maps language instructions to on-screen coordinates, to execute user commands. However, current models, whether trained via supervised fine-tuning (SFT) or reinforcement fine-tuning (RFT), lack self-awareness of their capability boundaries, leading to overconfidence and unreliable predictions. We first systematically evaluate probabilistic and verbalized confidence in general and GUI-specific models, revealing a misalignment between confidence and actual accuracy, which is particularly critical in dynamic GUI automation tasks, where single errors can cause task failure. To address this, we propose HyperClick, a novel framework that enhances reliable GUI grounding through uncertainty calibration. HyperClick introduces a dual reward mechanism, combining a binary reward for correct actions with truncated Gaussian-based spatial confidence modeling, calibrated using the Brier score. This approach jointly optimizes grounding accuracy and confidence reliability, fostering introspective self-criticism. Extensive experiments on seven challenging benchmarks show that HyperClick achieves state-of-the-art performance while providing well-calibrated confidence. By enabling explicit confidence calibration and introspective self-criticism, HyperClick reduces overconfidence and supports more reliable GUI automation.
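The abstract's "truncated Gaussian-based spatial confidence modeling" can be illustrated with a small sketch: treat the click as a 2-D Gaussian centred on the prediction, truncated to the screen, and take the probability mass falling inside the target box as the spatial confidence. The per-axis independence, the shared `sigma`, and the function names are all assumptions for illustration, not the paper's definition.

```python
import math

def gaussian_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def spatial_confidence(pred_xy, sigma, target_box, screen):
    """Probability mass a truncated 2-D Gaussian centred on the predicted
    click assigns to the target box (a hypothetical reading of truncated
    Gaussian spatial confidence; axes are modeled independently)."""
    def axis_mass(mu, lo, hi, s_lo, s_hi):
        # Mass inside [lo, hi], renormalised to the screen interval
        # [s_lo, s_hi] -- this renormalisation is the truncation.
        inside = gaussian_cdf((hi - mu) / sigma) - gaussian_cdf((lo - mu) / sigma)
        norm = gaussian_cdf((s_hi - mu) / sigma) - gaussian_cdf((s_lo - mu) / sigma)
        return inside / norm

    x, y = pred_xy
    x1, y1, x2, y2 = target_box
    sx1, sy1, sx2, sy2 = screen
    return axis_mass(x, x1, x2, sx1, sx2) * axis_mass(y, y1, y2, sy1, sy2)
```

A click centred on the target yields high confidence, while a click far from it yields confidence near zero, giving the training signal a smooth, spatially grounded notion of "how sure" the model should be.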