Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents

Yaxin Luo, Zhaoyi Li, Jiacheng Liu, Jiacheng Cui, Xiaohan Zhao, Zhiqiang Shen

2025-06-02


Summary

This paper presents Open CaptchaWorld, a web-based platform that tests how well multimodal AI agents can solve different types of CAPTCHA puzzles, the challenges websites use to tell humans and bots apart.

What's the problem?

The problem is that although AI models have become good at understanding images and text, they still struggle with CAPTCHAs, especially the interactive, multi-step ones that humans find easy. This limits how useful AI agents can be on real websites, because they get stuck on these challenges.

What's the solution?

The researchers built Open CaptchaWorld, which includes a wide variety of modern CAPTCHA puzzles and a new scoring system that rates each puzzle by how many thinking and action steps it takes to solve. They tested both humans and advanced AI agents on these puzzles and found that humans perform much better, which shows exactly where the AI falls short.
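To make the idea of a step-based score concrete, here is a minimal sketch, assuming each puzzle can be described as a list of steps labeled either "thinking" or "action" and that the score is simply the number of such steps. The `Puzzle` class, the `reasoning_depth` and `success_rate` functions, and the step labels are hypothetical illustrations, not the paper's actual metric definition or code.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical step labels: "thinking" (a mental step) or "action" (a click, drag, etc.).
VALID_STEPS = {"thinking", "action"}


@dataclass
class Puzzle:
    name: str
    steps: list[str]  # ordered steps a solver must take, each a label from VALID_STEPS


def reasoning_depth(puzzle: Puzzle) -> int:
    """Hypothetical score: count the thinking and action steps a puzzle requires."""
    for step in puzzle.steps:
        if step not in VALID_STEPS:
            raise ValueError(f"unknown step label: {step!r}")
    return len(puzzle.steps)


def success_rate(outcomes: list[bool]) -> float:
    """Fraction of attempts that were solved; 0.0 when there are no attempts."""
    if not outcomes:
        return 0.0
    return sum(outcomes) / len(outcomes)


# Toy usage: a made-up puzzle that needs one thinking step and two actions.
toy = Puzzle("toy-slider", ["thinking", "action", "action"])
print(reasoning_depth(toy))
```

Counting steps keeps the score easy to interpret: a puzzle that needs more separate decisions and interactions gets a higher number, and success rates for humans and agents can then be compared puzzle by puzzle against it.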

Why does it matter?

This is important because it exposes a major weakness in current AI models and provides a way to measure their abilities. With Open CaptchaWorld, researchers can track progress toward AI agents that handle real-world web tasks, making those agents more helpful and reliable.

Abstract

The Open CaptchaWorld benchmark evaluates multimodal LLM (MLLM)-powered agents on diverse CAPTCHA puzzles, revealing significant performance gaps compared with humans.
