The familiar grid asking users to click bicycles, crosswalks, mountains, stairs, or traffic lights may no longer be a reliable line between people and bots. New research from ETH Zurich PhD student Andreas Plesner and colleagues says locally run bots, using trained image-recognition models, can match human-level performance against Google’s reCAPTCHA v2 and reach a 100 percent success rate.
The finding does not mean every website has already lost bot protection. It does show that a widely recognized form of CAPTCHA is under heavy pressure from the same machine-learning progress that made visual recognition more capable in the first place.
What The Research Tested
The work focused on Google’s reCAPTCHA v2, the visible challenge that presents a grid of street images and asks the user to identify which tiles contain a given object. The system has been familiar to web users for years because it turns a simple question into a gate: prove you can recognize what is in these images, and you are more likely to be human.
Google began phasing that system out years ago in favor of reCAPTCHA v3, an invisible approach that evaluates user interactions rather than showing an explicit puzzle. The older version remains important, though, because millions of websites still use it. Sites using reCAPTCHA v3 can also fall back to reCAPTCHA v2 when the newer system gives a user a low human confidence rating.
That makes the research more than a curiosity. A visible CAPTCHA challenge may appear only at certain moments, but it still serves as a backup test across a large part of the web.
How The Bot Got Through
The researchers used a fine-tuned version of YOLO, the open source object-recognition model whose name stands for “You Only Look Once.” The paper describes YOLO as “well known for its ability to detect objects in real-time” and says it “can be used on devices with limited computational power, allowing for large-scale attacks by malicious users.”
Training used 14,000 labeled traffic images. After that, the system could estimate whether a CAPTCHA grid image matched one of reCAPTCHA v2’s 13 candidate categories.
The researchers also handled a second format, described as “type 2” challenges. In those cases, the CAPTCHA asks users to mark portions of a single segmented image that contain a target object. For that format, the team used a separate pre-trained YOLO model. It worked on nine of 13 object categories and requested a new image when it encountered one of the other four.
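The skip-or-solve rule for type 2 challenges can be sketched in a few lines. This is only an illustration of the decision described above, not the researchers' code: the function name `next_action` and the labels in `SUPPORTED_CATEGORIES` are hypothetical placeholders, since the article does not list which nine categories the pre-trained model handled.

```python
from typing import Literal

# Hypothetical placeholder labels. The article does not say which nine of
# the 13 categories the pre-trained model covered, so this set is an
# assumption made purely for illustration.
SUPPORTED_CATEGORIES = frozenset({"car", "bus", "bicycle"})


def next_action(target_category: str) -> Literal["solve", "reload"]:
    """Decide whether to attempt a challenge or ask for a new image.

    Returns "solve" when the target object is one the model can detect,
    and "reload" otherwise, mirroring the rule described in the text.
    """
    normalized = target_category.strip().lower()
    return "solve" if normalized in SUPPORTED_CATEGORIES else "reload"
```

The point of the rule is that an unsupported category costs only one extra image request rather than a failed attempt.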
The image model was only part of the system. The automated agent also needed to avoid looking automated in other ways. The researchers used a VPN so that repeated attempts would not all come from the same IP address, built a mouse movement model to resemble human activity, and supplied fake browser and cookie information drawn from real browsing sessions.
Why 100 Percent Matters
The object-recognition results varied by category. For individual CAPTCHA images, YOLO's accuracy ranged from 69 percent for motorcycles to 100 percent for fire hydrants.
Those image results, combined with the other measures, were enough to pass every time. Some sessions required multiple challenges, but the bot still got through the overall CAPTCHA process with a 100 percent success rate.
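A rough bit of arithmetic shows why imperfect recognition can still add up to a near-certain pass when more than one attempt is allowed. The sketch below assumes each challenge is an independent try with the same chance of success, which is a simplification and not a model taken from the paper. The 0.7 figure is a hypothetical per-challenge rate chosen for illustration, not a number reported by the researchers.

```python
def pass_probability(per_challenge_rate: float, max_challenges: int) -> float:
    """Chance of passing at least once in `max_challenges` independent tries.

    Assumes every challenge succeeds with the same probability and that
    attempts do not influence one another (an illustrative simplification).
    """
    if not 0.0 <= per_challenge_rate <= 1.0:
        raise ValueError("per_challenge_rate must be between 0 and 1")
    if max_challenges < 0:
        raise ValueError("max_challenges must be non-negative")
    # Probability that every attempt fails, subtracted from 1.
    return 1.0 - (1.0 - per_challenge_rate) ** max_challenges


if __name__ == "__main__":
    # Hypothetical per-challenge success rate of 70 percent.
    for attempts in (1, 2, 3, 5):
        print(attempts, f"{pass_probability(0.7, attempts):.3f}")
    # prints:
    # 1 0.700
    # 2 0.910
    # 3 0.973
    # 5 0.998
```

Under these assumptions, a solver that fails nearly a third of the time on any single challenge would still get through almost every session within a handful of tries.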
The bot also solved the average CAPTCHA in slightly fewer challenges than a human in similar trials, although the improvement over humans was not statistically significant. The larger point is simpler: the bot did not need to be meaningfully better than a person. It only needed to clear the human test reliably.
Earlier academic studies that used image-recognition models against reCAPTCHAs succeeded between 68 and 71 percent of the time. The new paper’s authors argue that reaching 100 percent “shows that we are now officially in the age beyond captchas.”
The CAPTCHA Arms Race Continues
This is not the first time CAPTCHA systems have been weakened by automation. Researchers were showing as far back as 2008 how bots could be trained to break audio CAPTCHAs designed for visually impaired users. By 2017, neural networks were being used to defeat text-based CAPTCHAs that asked users to read distorted letters.
Image-based tests were once part of the next step in that contest. Now, locally run AI can handle those too, at least in the reCAPTCHA v2 traffic-image setting described by this research.
The likely direction is already visible. Human identification is moving toward more subtle forms of device fingerprinting and behavior analysis, rather than asking users to solve visual puzzles. A Google Cloud spokesperson told New Scientist, “We have a very large focus on helping our customers protect their users without showing visual challenges, which is why we launched reCAPTCHA v3 in 2018.” The spokesperson also said, “Today, the majority of reCAPTCHA’s protections across 7 [million] sites globally are now completely invisible. We are continuously enhancing reCAPTCHA.”
That shift reflects a practical problem. If machines can identify the same objects people identify, then object recognition alone is no longer a useful proof of humanity.
What This Says About The Web
CAPTCHAs have always depended on a moving boundary. They ask for something easy enough for people but difficult enough for machines. The paper’s authors put it this way: “In some sense, a good captcha marks the exact boundary between the most intelligent machine and the least intelligent human.”
That boundary is getting harder to use. As the authors write, “As machine learning models close in on human capabilities, finding good captchas has become more difficult.”
For users, the visible effect may be fewer image grids over time and more invisible checks happening in the background. For websites, the implication is that old visual challenges cannot be treated as permanent proof against automation. The CAPTCHA is not disappearing as a concept, but the era of relying on traffic-image puzzles as a decisive human test looks increasingly fragile.