AI text is making new websites more alike and upbeat

A study of English-language websites from the Internet Archive found that about 35 percent of newly published websites were fully or partially AI-generated by mid-2025. The clearest effects were not more factual errors, but more similar wording and a stronger upbeat tone.

AI text is no longer a fringe layer on the open web. A large analysis of websites from the Internet Archive found that about 35 percent of all newly published websites were fully or partially AI-generated by mid-2025.

The finding matters because the study does not simply ask whether AI writing exists online. It looks at how that writing may be changing the texture of the web: what ideas appear, how similar pages sound, and whether online information feels trustworthy.

How the researchers measured AI text online

The study was carried out by researchers at Imperial College London, the Internet Archive, and Stanford University. The team examined a representative sample of English-language websites from the Internet Archive's Wayback Machine, covering 33 monthly intervals from August 2022 to May 2025.

Before ChatGPT launched in late 2022, the share of fully or partially AI-generated websites in the sample was essentially zero. By mid-2025, the researchers found that the share had climbed to about 35 percent.

To identify AI-generated writing, the researchers used the Pangram v3 detector. According to the source article, that detector performed best in the team's own robustness tests across five dimensions.

The researchers then tested six common ideas about how AI might affect the web. Only two were statistically supported: semantic contraction and the positivity shift.

The web may be getting narrower

Semantic contraction is the study's term for a shrinking range of ideas and expressions online. The researchers found that AI-generated texts were 33 percent more semantically similar to one another than human-written texts were.

In plain terms, AI-written pages tended to cluster closer together in meaning. The researchers interpret this as a sign that language models move toward the average of their training data. That could make online discourse feel less varied, even when the number of pages keeps growing.
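To make "clustering closer together in meaning" concrete, here is a minimal sketch of one common way to score how alike a group of texts is: the average cosine similarity between every pair of embedding vectors. The source article does not say how the researchers computed similarity, so this is an illustration and not their method. The function names and the tiny three-number vectors are hypothetical; real embeddings would come from a text-embedding model.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(vectors):
    """Average cosine similarity over every unordered pair of vectors."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two vectors")
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy "embeddings", invented for illustration only.
tight_group = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [1.0, 0.0, 0.1]]
spread_group = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.7]]

tight_score = mean_pairwise_similarity(tight_group)
spread_score = mean_pairwise_similarity(spread_group)

print(f"tight group:  {tight_score:.3f}")
print(f"spread group: {spread_score:.3f}")
```

A group whose vectors point in nearly the same direction scores close to 1, while a group that covers more varied ground scores lower. Under this kind of measure, a "33 percent more similar" finding would mean the AI group's average sits that much higher than the human group's.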

The concern is not that every AI-generated page says the same thing. It is that a large volume of similar writing can gradually reduce the space for more unusual phrasing, sharper disagreement, or distinctive ways of framing a topic.

The study links this to the possible narrowing of the Overton window of online discourse. That point remains an interpretation of the data, though it is consistent with the measured increase in similarity among AI texts.

The tone is becoming more cheerful

The second supported finding was the positivity shift. AI texts scored 107 percent higher on positive sentiment than fully human-written content.
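A figure like "107 percent higher" is easy to misread. As a quick arithmetic sketch, the snippet below shows that a score 107 percent above a baseline is a little more than double that baseline. The baseline value of 0.20 is made up for illustration, since the article does not report the underlying sentiment scores, and `percent_higher` is a hypothetical helper name.

```python
def percent_higher(value, baseline):
    """Relative increase of value over baseline, expressed in percent."""
    return (value - baseline) / baseline * 100


# Hypothetical human-text sentiment score; the article gives no raw scores.
human_score = 0.20

# "107 percent higher" means the AI score is 1 + 1.07 = 2.07 times the baseline.
ai_score = human_score * 2.07

increase = percent_higher(ai_score, human_score)
print(f"AI score is {increase:.0f} percent higher than the human score")
```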

The researchers connect this to the tendency of language models toward sycophancy and overoptimism. If a growing share of online writing is polished, agreeable, and upbeat, the result could be a web that feels less confrontational but also less human.

Co-author Jonas Dolezal, an AI researcher at Stanford, argued that models may need more resistance and a clearer voice. In a comment to 404 Media, he said:

"Rather than forcing models to be perfectly compliant and agreeable, allowing them to have a more distinct personality or 'friction' might help them act as a creative partner rather than a replacement for human voice."

The source article notes that the study measures correlations, not causation. That distinction is important. The researchers observed strong patterns in the data, but the work does not prove that AI alone caused every shift they measured.

What the study did not find

Several widely held fears were not supported by the data. The study did not find evidence that individual writing styles are disappearing, that external links are declining, or that information density is falling.

It also could not show an increase in factual errors. That result is more limited than the others because of the way the researchers tested it.

For the truth decay hypothesis, the team used GPT-4o-mini to extract verifiable claims from websites, up to five per page. Fifty human annotators then checked those claims against outside sources and rated them as supported, refuted, not enough evidence, or conflicting evidence. The key metric was the share of clearly refuted statements.
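The key metric described above can be sketched in a few lines. This is an illustration under stated assumptions and not the researchers' code: the function name `refuted_share` and the example ratings are hypothetical, and the sketch assumes the share is taken over all rated claims, a detail the article does not specify.

```python
from collections import Counter

# The four ratings named in the article.
LABELS = ("supported", "refuted", "not enough evidence", "conflicting evidence")


def refuted_share(ratings):
    """Fraction of rated claims labeled 'refuted' (assumed denominator: all claims)."""
    if not ratings:
        raise ValueError("no ratings given")
    counts = Counter(ratings)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return counts["refuted"] / len(ratings)


# Hypothetical ratings for the (up to) five claims extracted from one page.
example_ratings = [
    "supported",
    "supported",
    "refuted",
    "not enough evidence",
    "supported",
]

share = refuted_share(example_ratings)
print(f"share of clearly refuted claims: {share:.0%}")
```

Because only the "refuted" label counts toward the metric, claims rated "not enough evidence" or "conflicting evidence" do not raise it, which is one reason vague or uncheckable statements would go unnoticed.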

The researchers found no statistically significant correlation between that metric and the share of AI content. But the source article emphasizes that this part of the study rested on a much smaller base than the overall analysis.

  • Each annotator checked claims from five articles.
  • That produced a subsample of roughly 250 websites.
  • The broader study used roughly 10,000 URLs per month across 33 months.
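The difference in scale is easier to see with the arithmetic written out. The sketch below uses only the rounded figures from the list above and assumes that no two annotators checked the same article, which the article does not confirm.

```python
# Fact-checking subsample: figures from the article, assuming no overlap.
annotators = 50
articles_per_annotator = 5
fact_check_pages = annotators * articles_per_annotator  # 250

# Overall analysis: roughly 10,000 URLs per month across 33 months.
urls_per_month = 10_000
months = 33
full_sample = urls_per_month * months  # 330,000

fraction = fact_check_pages / full_sample
print(f"fact-checked pages: {fact_check_pages}")
print(f"full sample:        {full_sample}")
print(f"share fact-checked: {fraction:.3%}")
```

On these rounded numbers, the fact-checked pages amount to well under a tenth of one percent of the URLs in the broader analysis.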

The method also focused on clearly refutable individual claims. It did not capture vaguer claims, suggestive language, or statements that cannot be checked with existing fact-checking tools and infrastructure.

Dolezal told 404 Media:

"The most surprising result was that our Truth Decay hypothesis wasn't confirmed."

He also noted that AI could still be increasing the volume of unverifiable claims, even if the study did not find more verifiably false statements.

Why perception and evidence diverge

The researchers also surveyed 853 US adults in a representative poll. Most respondents believed in all of the negative hypotheses, including the four that the study did not support empirically.

One example stands out: 83 percent agreed that individual writing styles are vanishing in favor of a generic AI voice. The study's data did not back that up.

The survey also found that people who rarely use AI were more likely to believe in negative effects than regular users, 88.3 versus 76.2 percent. Among AI skeptics, the difference was 91.3 versus 71.1 percent.

The researchers warn that the amount of AI content now online makes model collapse a practical concern. Model collapse refers to the risk that AI models may degrade by training on their own outputs.

Rather than relying only on detection after content is published, the researchers recommend cryptographic provenance standards like C2PA. They also call for search and recommendation systems to reward semantic diversity.

Co-author Maty Bohacek of Stanford said the team is already working with the Internet Archive to turn the analysis into an ongoing monitoring tool. He told 404 Media that the goal is to keep tracking the signal over time instead of leaving it as a single fixed snapshot.

The study has limits. It examined only English-language text, not other languages or formats like images or video. It also depends on the reliability of the Pangram v3 detector, and the data comes only from the Internet Archive, which does not represent the whole web.

Still, the central message is clear: the most visible risk may not be a sudden flood of provably false AI claims. It may be a slower shift toward sameness, cheerfulness, and what the study calls reality apathy, as users become less willing to trust online information at all.