AI chatbots are reshaping which web sources users see

A study from Ruhr University Bochum and the Max Planck Institute for Software Systems compared Google organic search with four generative AI search systems. It found that AI chatbots often cite different, less familiar websites than Google search, with major differences in freshness, source depth, diversity, and consistency.

AI chatbots are not simply putting a conversational layer on top of traditional search. A study from Ruhr University Bochum and the Max Planck Institute for Software Systems shows that generative AI search systems can choose different sources, summarize information differently, and give users a different view of the web than Google organic search.

The result is a shift in how people encounter information. The answer may look direct and polished, but the sources behind it can be broader, less predictable, and less stable than the links users see in classic search results.

What the Study Compared

The researchers compared Google's organic search results with four generative AI search systems: Google AI Overview, Gemini 2.5 Flash with search (Gemini), GPT-4o-Search (GPT-Search), and GPT-4o with the search tool enabled (GPT-Tool).

The study used more than 4,600 queries across six topics, including politics, product reviews, and science. That range matters because search behavior changes depending on the task. A product review, a science question, and a political query can all reward different kinds of sources and different levels of freshness.

The systems also differed in how they used the live web. GPT-4o-Search performs a live web search for every query. GPT-4o with the search tool enabled decides, question by question, whether to rely on its internal knowledge or look up new information.

That design choice affects the answer before source selection even begins. A system that always searches will build its response around newly retrieved material. A system that sometimes relies on internal knowledge may produce a useful answer, but it can also cite fewer outside sources or miss changes that happened after its internal knowledge became outdated.

Why AI Sources Look Different

The clearest finding is that AI search systems do not mirror Google's top organic results. In the study, 53 percent of the websites cited by AI Overview did not appear in Google's top 10 organic results, and 27 percent were not even in the top 100.

That means users who rely on an AI-generated answer may be exposed to websites they would not have seen on the first page of Google results, or even deep into the ranked list. This can widen the information pool, but it also makes the origin of the answer less familiar.
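To make figures like these concrete, here is a minimal Python sketch of how the share of cited sites missing from a ranked list could be computed. The function name, the toy domains, and the choice to compare at the domain level are assumptions for illustration; the study's actual matching procedure is not described in this article.

```python
def share_outside_top_k(cited, ranked, k):
    """Return the fraction of cited domains that are absent from the top-k ranked results."""
    if not cited:
        return 0.0
    top_k = set(ranked[:k])
    missing = [domain for domain in cited if domain not in top_k]
    return len(missing) / len(cited)


# Hypothetical data: an organic ranking and the domains cited by one AI answer.
organic = ["a.com", "b.com", "c.com", "d.com", "e.com"]
cited = ["a.com", "x.org", "e.com", "y.net"]

print(share_outside_top_k(cited, organic, k=3))  # 0.75
print(share_outside_top_k(cited, organic, k=5))  # 0.5
```

Widening the cutoff from 3 to 5 lowers the share, which is the same pattern as the study's top-10 and top-100 figures: the deeper the ranked list, the fewer cited sites fall outside it.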

The study also found that AI systems often use less well-known domains. Only about a third of the domains used by AI Overview and GPT-Tool were among the 1,000 most-visited sites. For organic search, the share was higher, at 38 percent.

The number of external sources varied sharply by system:

  • GPT-Tool averaged just 0.4 external sources per answer.
  • AI Overview and Gemini pulled from over eight sites per query.
  • GPT-Search used about four sources per answer.

These differences show that the label "AI search" covers very different behaviors. Some systems act more like research aggregators. Others lean more heavily on the model's internal knowledge and use fewer citations.

Coverage Can Narrow on Ambiguous Questions

AI systems and search engines often cover similar broad topics, but they do not always give users the same range of angles. Using the LLooM framework, researchers found that even the most limited AI system, GPT-Tool, covered 71 percent of the topics found across all search tools.

That suggests AI answers can capture much of the main subject matter. But the gap becomes more important when a question has more than one possible meaning.

For ambiguous questions, organic search covered 60 percent of possible subtopics. AI Overview covered 51 percent, while GPT-Tool covered 47 percent.

In practical terms, a user may receive a coherent AI answer while still missing relevant interpretations. The response can feel complete because it is written as a finished summary. But a traditional search results page may expose more branches of the question at once, especially when the user's intent is not obvious.
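A coverage figure of this kind can be read as a simple ratio: the subtopics an answer touches, divided by all the subtopics a question could mean. The sketch below assumes subtopics are already available as labels. The function name and the example query are hypothetical, and how the researchers actually derived subtopics is not shown here.

```python
def subtopic_coverage(answer_subtopics, possible_subtopics):
    """Return the fraction of possible subtopics that an answer addresses."""
    possible = set(possible_subtopics)
    if not possible:
        return 0.0
    covered = set(answer_subtopics) & possible
    return len(covered) / len(possible)


# Hypothetical ambiguous query: "jaguar".
possible = ["animal", "car brand", "sports team", "operating system"]
answer = ["animal", "car brand"]

print(subtopic_coverage(answer, possible))  # 0.5
```

An answer can read as complete while scoring only 0.5 here, because the unaddressed meanings never appear in the text at all.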

Freshness and Consistency Are Still Uneven

Current events remain a major stress test. In an evaluation of 100 trending topics from September 2025, AI Overview appeared for only 3 percent of queries. GPT-Search covered 72 percent of topics, organic search covered 67 percent, Gemini covered 66 percent, and GPT-Tool covered 51 percent.

The study included a clear example: a query about Ricky Hatton's cause of death. GPT-Tool relied on outdated internal knowledge and incorrectly reported that the boxer was still alive.

That example highlights a basic risk for systems that do not regularly update their knowledge or do not search the web for every query. If the subject has changed recently, a confident answer can be wrong because the system is working from stale information.

Consistency also differs. When the same questions were asked two months apart, organic search returned the same sources 45 percent of the time. Gemini matched earlier sources 40 percent of the time. AI Overview matched its earlier results only 18 percent of the time.
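One way to picture source consistency is to compare the sets of sources returned for the same question at two points in time. The sketch below shows two common overlap measures; which measure the study used is not stated in this article, so both functions and the sample domains are illustrative assumptions.

```python
def retained_share(earlier, later):
    """Return the fraction of earlier sources that appear again in the later run."""
    earlier_set = set(earlier)
    if not earlier_set:
        return 0.0
    return len(earlier_set & set(later)) / len(earlier_set)


def jaccard(earlier, later):
    """Return the Jaccard similarity: shared sources divided by all distinct sources."""
    a, b = set(earlier), set(later)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical sources for one query, collected two months apart.
first_run = ["a.com", "b.com", "c.com", "d.com"]
second_run = ["a.com", "b.com", "e.com", "f.com"]

print(retained_share(first_run, second_run))     # 0.5
print(round(jaccard(first_run, second_run), 2))  # 0.33
```

The two measures give different numbers for the same data, so a consistency percentage is only comparable across systems when the same measure is applied to each.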

The study found that broad topic coverage can remain stable even when the sources change. Still, changing citations can matter. The surface answer may look similar, while the evidence behind it and the perspectives included may shift from one session to another.

What This Means for Search

The researchers argue that search quality benchmarks need to change. Traditional measures do not fully capture AI-driven systems that retrieve sources, summarize them, and decide how much of the answer should come from external material.

They call for evaluation methods that consider source diversity, topic breadth, and how information is summarized. Those factors are now part of the search experience, even when users are not actively choosing between a chatbot and a search engine.

For readers, the main lesson is simple: AI chatbots can make information easier to consume, but they also change what gets surfaced. Less familiar sources may appear more often. Some answers may cite many sites, while others cite almost none. Breaking news can expose outdated knowledge. Ambiguous questions can receive narrower treatment than they would in organic search.

For companies, the shift also affects visibility. As search engines and AI tools merge, SEO strategies are being reconsidered for a landscape where ranking in Google's organic results is not the only path into an answer.

The study does not show one universal winner across all search tasks. It shows a more complicated future: search is becoming a mix of retrieval, summarization, internal model knowledge, and source selection. That makes the answer faster to read, but it also makes the path behind the answer more important to inspect.