Cloudflare data points to a widening divide in how much of the web major AI companies can reach. According to Cloudflare CEO Matthew Prince, Google currently has a much larger window into online content than OpenAI, Microsoft, Anthropic or Meta.
The reason, Prince argues, is not only technical scale. It is also the way Google connects its long-standing search crawler with its AI data collection systems, putting publishers in a difficult position when they try to control how their content is used.
The scale of Google's data edge
Prince says internal Cloudflare measurements show that Google currently sees 3.2 times more pages than OpenAI. The gap is even larger when compared with other AI competitors. Google captures 4.6 times more content than Microsoft and 4.8 times more than Anthropic or Meta.
Those figures matter because generative AI systems depend heavily on access to web content. The more pages a company can collect, inspect or use, the stronger its potential position becomes in building and improving AI models.
In Prince’s view, Google’s advantage comes from tying together two functions that publishers would prefer to manage separately: appearing in Google Search and allowing content to be gathered for AI training. Because those systems are bundled, website owners cannot simply reject one form of access while keeping the other.
Why publishers have limited leverage
For publishers, Google Search remains a major channel for visibility. Losing that visibility can mean losing readers, traffic and the commercial value tied to being discoverable. That is why the crawler arrangement creates a high-stakes decision.
Under the situation described by Prince, a site owner who wants to block Google from using content for AI training also risks disappearing from Google Search. That creates a practical barrier to opting out, even for publishers that want stronger control over their work.
Prince describes this as a choice between allowing content to be used to train Google's AI models and giving up search visibility. For many publishers, that trade-off could be financially damaging.
Cloudflare's blocked AI requests show the pushback
The tension is not theoretical. Since July 1, Cloudflare has already blocked 416 billion AI requests for its customers. That number shows how actively site owners are trying to resist or manage automated AI access to their content.
But the blocking system mainly affects companies that follow standards or identify their crawlers separately. In other words, publishers can more effectively restrict AI systems when those systems are clearly distinct from other web crawlers.
In Prince’s account, Google is different because its search and AI systems are closely coupled. That coupling lets Google avoid the barriers that affect competitors, whose AI crawlers are easier for publishers and infrastructure providers to identify and block.
Why the crawler split matters
Prince frames the issue as an extension of Google’s established strength in search into the AI market. His argument is that a company with long-standing dominance in search can use that position to gain a privileged level of access to web data for artificial intelligence.
The central problem is control. Publishers want to decide whether their content can be used for AI training without sacrificing their ability to appear in search results. A separate search crawler and AI crawler would make that distinction clearer.
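To make the distinction concrete, here is a minimal sketch of how separately identified crawlers let a publisher allow one and refuse the other. It assumes the publisher expresses its preferences in a robots.txt file and uses Python's standard urllib.robotparser to evaluate them. The crawler names ExampleSearchBot, ExampleAIBot and ExampleCombinedBot are hypothetical, as is the example.com URL; none of them come from Prince's remarks or refer to any real company's crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for a site visited by two distinct crawlers.
# The names ExampleSearchBot and ExampleAIBot are invented for illustration.
SPLIT_RULES = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: ExampleSearchBot
Allow: /
"""

# Hypothetical rules for a single crawler that serves both search and AI.
# The only way to refuse AI use is to refuse the crawler entirely.
COMBINED_RULES = """\
User-agent: ExampleCombinedBot
Disallow: /
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


url = "https://example.com/article"

split = build_parser(SPLIT_RULES)
search_allowed = split.can_fetch("ExampleSearchBot", url)
ai_allowed = split.can_fetch("ExampleAIBot", url)

combined = build_parser(COMBINED_RULES)
combined_allowed = combined.can_fetch("ExampleCombinedBot", url)

print("split crawlers   -> search:", search_allowed, "| AI:", ai_allowed)
print("combined crawler -> search and AI together:", combined_allowed)
```

With two crawler names, the publisher gets two independent decisions. With one name, there is only a single decision to make, which is the situation Prince describes.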
Prince told WIRED that Google will remain the central obstacle to progress unless it is pressured or persuaded to separate its search and AI crawlers. Without that split, publishers have little practical ability to protect their content or negotiate licensing models that could matter in the era of generative AI.
What this means for the AI market
The numbers cited by Prince suggest that crawler design can shape competition. If Google sees 3.2 times more pages than OpenAI, 4.6 times more content than Microsoft and 4.8 times more than Anthropic or Meta, then access itself becomes a competitive advantage.
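The same figures can be read the other way around, as the share of Google's view that each competitor has. The short calculation below is only a sketch. It assumes that "3.2 times more" means Google sees 3.2 times as many pages as the competitor, which is one possible reading of the phrasing, and the variable names are illustrative.

```python
# Multiples cited by Prince: how many times more Google sees than each competitor.
google_multiples = {
    "OpenAI": 3.2,
    "Microsoft": 4.6,
    "Anthropic or Meta": 4.8,
}

# Assumption: "N times more" is read as "N times as many", so a competitor's
# view is 1/N of Google's. Shares are rounded to whole percentages.
shares = {name: round(100 / multiple) for name, multiple in google_multiples.items()}

for name, share in shares.items():
    print(f"{name}: roughly {share}% of what Google sees")
```

On that reading, OpenAI would see roughly a third of what Google sees, and the other companies roughly a fifth.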
That advantage does not only affect AI labs. It also affects publishers, who supply much of the material that AI systems seek to learn from. When access is linked to search visibility, publishers may have less room to set terms for how their work is used.
The broader question is whether the web can support separate rules for discovery and AI training. Cloudflare’s data shows that many site owners are already trying to draw that line. Prince’s criticism is that Google’s current approach makes the line difficult to enforce.
For now, the dispute centers on a simple but consequential issue: whether publishers can remain visible in Google Search while still saying no to Google’s AI data collection. According to Prince, they effectively cannot.