Breaking reCAPTCHAv2
Andreas Plesner, Tobias Vontobel, Roger Wattenhofer
2024-09-17

Summary
This paper shows that advanced YOLO models can solve 100% of the image challenges posed by Google's reCAPTCHA v2 system, up from the 68-71% achieved in previous work.
What's the problem?
reCAPTCHA v2 is designed to differentiate between human users and bots by presenting challenges that are easy for people but difficult for automated systems. However, as AI technology advances, such challenges become less effective at stopping bots, especially once bots can solve them as well as humans can.
What's the solution?
The researchers modified a machine learning model known as YOLO (You Only Look Once) to improve its ability to identify objects in images used in reCAPTCHA challenges. By training the model on thousands of images, they enabled it to solve all presented captchas, significantly outperforming previous attempts that only managed a success rate of 68-71%. They also found no significant difference in the number of challenges that humans and bots must solve to pass, and they report evidence that reCAPTCHA v2 relies heavily on cookie and browser history data when judging whether a user is human.
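As a rough illustration of the last step in such a pipeline, the sketch below turns model outputs into a list of tiles to click. It is not the authors' code. The function names, the thresholds, and the 3x3 and 4x4 grid layouts are assumptions made for this example, and the model outputs (per-tile class scores, bounding boxes) are supplied by hand rather than produced by a real YOLO model.

```python
from typing import Dict, List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def tiles_by_class(tile_scores: List[Dict[str, float]],
                   target: str,
                   threshold: float = 0.5) -> List[int]:
    """Classification case: each tile is a separate image.

    tile_scores[i] maps class names to the (assumed) classifier score
    for tile i, in row-major order. Returns the tiles whose score for
    the target class reaches the threshold.
    """
    return [i for i, scores in enumerate(tile_scores)
            if scores.get(target, 0.0) >= threshold]


def tiles_by_boxes(boxes: List[Box],
                   image_size: float,
                   grid: int = 4,
                   min_overlap: float = 0.05) -> List[int]:
    """Segmentation case: one image is split into grid x grid tiles.

    Returns row-major indices of tiles in which a detected box covers
    at least min_overlap of the tile's area.
    """
    tile = image_size / grid
    selected = set()
    for x0, y0, x1, y1 in boxes:
        for row in range(grid):
            for col in range(grid):
                tx0, ty0 = col * tile, row * tile
                tx1, ty1 = tx0 + tile, ty0 + tile
                # Width and height of the box/tile intersection.
                w = min(x1, tx1) - max(x0, tx0)
                h = min(y1, ty1) - max(y0, ty0)
                if w > 0 and h > 0 and (w * h) / (tile * tile) >= min_overlap:
                    selected.add(row * grid + col)
    return sorted(selected)


if __name__ == "__main__":
    scores = [{"bus": 0.9}, {"car": 0.8}, {"bus": 0.4}, {"bus": 0.5}]
    print(tiles_by_class(scores, "bus"))
    print(tiles_by_boxes([(110, 110, 290, 190)], image_size=400))
```

The overlap threshold keeps a box that barely touches a neighbouring tile from selecting it. The value used here is arbitrary and would need tuning against real challenges.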
Why does it matter?
This research matters because it shows that current image-based captcha systems are vulnerable to existing AI technologies. As AI continues to improve, captcha systems will need to evolve if they are to keep protecting websites from bots.
Abstract
Our work examines the efficacy of employing advanced machine learning methods to solve captchas from Google's reCAPTCHAv2 system. We evaluate the effectiveness of automated systems in solving captchas by utilizing advanced YOLO models for image segmentation and classification. Our main result is that we can solve 100% of the captchas, while previous work only solved 68-71%. Furthermore, our findings suggest that there is no significant difference in the number of challenges humans and bots must solve to pass the captchas in reCAPTCHAv2. This implies that current AI technologies can exploit advanced image-based captchas. We also look under the hood of reCAPTCHAv2, and find evidence that reCAPTCHAv2 is heavily based on cookie and browser history data when evaluating whether a user is human or not. The code is provided alongside this paper.