TechCrunch AI, March 27, 2025

Why open source developers are turning AI crawlers away

Open source developers say aggressive AI crawlers are overloading project infrastructure and ignoring robots.txt rules. In response, FOSS maintainers are adopting tools such as Anubis, Nepenthes, and Cloudflare’s AI Labyrinth to block, slow, or misdirect bots.

Aggressive AI crawlers are acting autonomously in ways that strain open source infrastructure and force defensive countermeasures.

AI crawlers have become a practical infrastructure problem for parts of the open source world. The complaint is not just that bots are collecting data. Developers say some of them behave aggressively, ignore published crawl limits, and can push community-run services into outages.

That pressure is now producing a new defensive layer around free and open source software projects. Some tools try to prove a visitor is human before a request reaches a Git server. Others attempt to waste crawler resources by trapping bots in irrelevant or fake content.

Why FOSS Projects Feel the Pressure

Niccolò Venerandi, a developer of the Plasma Linux desktop and owner of the blog LibreNews, wrote that open source developers are “disproportionately” affected by bad crawler behavior. The reason is structural: FOSS projects tend to expose more of their infrastructure publicly, while often having fewer resources than commercial software operations.

Git servers are a key example. They host FOSS projects so people can download code or contribute back to it. That openness is central to how open source works, but it also gives automated crawlers a large surface to hit repeatedly.

The traditional tool for telling bots where not to go is the robots.txt file defined by the Robots Exclusion Protocol. It was created for search engine bots, and many site operators still use it as the normal way to publish crawl rules. The problem developers describe is that many AI bots do not honor it.
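As a rough illustration of what honoring those rules involves, the sketch below uses Python's standard urllib.robotparser module to check URLs against a robots.txt file before fetching them. The rules, the bot name ExampleAIBot, and the example.org URLs are hypothetical and not taken from any project in the article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules a project might publish at /robots.txt.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /git/
Crawl-delay: 10
"""

# Parse the rules once, the way a polite crawler would at startup.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def can_crawl(user_agent: str, url: str) -> bool:
    """Return True only if the published rules allow this agent to fetch url."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleAIBot", "https://example.org/README.md"),
        ("OtherBot", "https://example.org/git/project.git"),
        ("OtherBot", "https://example.org/docs/"),
    ]:
        print(agent, url, can_crawl(agent, url))
```

Nothing in the protocol forces a crawler to run a check like this. The file is a published request, so compliance depends entirely on the bot's operator.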

When those bots ignore limits, the effect can move beyond nuisance traffic. The source article describes websites being targeted by bad crawler behavior, sometimes to the point of taking them down. For smaller software communities, even short outages can interrupt development, collaboration, and public access to code.

The AmazonBot Incident Behind Anubis

In January, FOSS developer Xe Iaso published what was described as a “cry for help” blog post about AmazonBot. Iaso said the bot relentlessly hit a Git server website and caused DDoS outages.

According to Iaso, the crawler ignored robots.txt, hid behind other IP addresses, and pretended to be other users. The developer’s frustration was direct: “It’s futile to block AI crawler bots because they lie, change their user agent, use residential IP addresses as proxies, and more,” Iaso wrote.

Iaso also described a pattern of crawling that repeatedly followed links and revisited the same pages. In the post, the developer wrote: “They will scrape your site until it falls over, and then they will scrape it some more.”

The response was Anubis, a reverse proxy that imposes a proof-of-work check. A request must pass the challenge before it is allowed to reach the Git server. The aim is to block bots while letting browsers operated by humans through.
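The article does not describe how the Anubis challenge works internally. The sketch below shows only the general idea of a hash-based proof of work: the server issues a random challenge, the client searches for a nonce whose SHA-256 digest starts with a set number of zeros, and the server verifies the answer with a single hash. The function names and the difficulty value are assumptions for illustration, not the Anubis implementation.

```python
import hashlib
import secrets

DIFFICULTY = 4  # assumed: required number of leading zero hex digits


def new_challenge() -> str:
    """Server side: issue a random, single-use challenge string."""
    return secrets.token_hex(16)


def _digest(challenge: str, nonce: int) -> str:
    """Hash the challenge together with a candidate nonce."""
    return hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()


def solve(challenge: str, difficulty: int = DIFFICULTY) -> int:
    """Client side: brute-force the smallest nonce that meets the target."""
    target = "0" * difficulty
    nonce = 0
    while not _digest(challenge, nonce).startswith(target):
        nonce += 1
    return nonce


def verify(challenge: str, nonce: int, difficulty: int = DIFFICULTY) -> bool:
    """Server side: checking an answer costs one hash."""
    return _digest(challenge, nonce).startswith("0" * difficulty)


if __name__ == "__main__":
    challenge = new_challenge()
    nonce = solve(challenge)
    print("nonce:", nonce, "valid:", verify(challenge, nonce))
```

The asymmetry is the point of the design. Solving takes many hash attempts while verifying takes one, so the cost is small for a single human visitor but adds up for a crawler making requests at scale.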

The name is part of the project’s identity. Anubis refers to the Egyptian mythology figure associated with judgment after death. Iaso told TechCrunch: “Anubis weighed your soul (heart) and if it was heavier than a feather, your heart got eaten and you, like, mega died.”

If Anubis determines the request comes from a human, a cute anime picture announces success. Iaso described the drawing as “my take on anthropomorphizing Anubis.” If it judges the request to come from a bot, the request is denied.

A Fast Spread Through the Community

Anubis appears to have met a need that was already widely felt. Iaso shared it on GitHub on March 19. In just a few days, it collected 2,000 stars, 20 contributors, and 39 forks.

That quick uptake matters because it shows the issue is not isolated to one developer’s server. Venerandi told TechCrunch that he knew of multiple other projects facing similar problems. One project, he said, “had to temporarily ban all Chinese IP addresses at one point.”

Venerandi framed that as a sign of how extreme the situation has become. Developers are being pushed toward blunt measures, including blocking entire countries, because bots that ignore robots.txt can be difficult to control with normal web administration tools.

For open source maintainers, this creates a difficult balance. They want their projects to remain open and accessible. At the same time, they have to keep servers available for the people who actually need the code, documentation, and contribution workflow.

From Blocking Bots to Wasting Their Time

Anubis is one approach: test the visitor before letting the request reach the service. Other developers have discussed a more adversarial strategy, where misbehaving bots are given content that wastes their resources or reduces the value of scraping.

On Hacker News, user xyzal suggested loading pages that robots.txt forbids with harmful or absurd material, including “a bucket load of articles on the benefits of drinking bleach” or “articles about positive effect of catching measles on performance in bed.” The stated goal was to make crawler visits produce negative value, not just no value.

A tool called Nepenthes, released in January by an anonymous creator known as “Aaron,” is built around a related idea. It traps crawlers in an endless maze of fake content. The creator admitted to Ars Technica that the tool is aggressive, if not outright malicious. Its name comes from a carnivorous plant.
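The maze idea can be sketched in a few lines. In the hypothetical example below, every fake page is generated from its own path and links only to further fake pages, so a crawler that keeps following links never reaches an end and never reaches real content. This is an illustration of the concept, not Nepenthes code. The /maze/ prefix, the page layout, and the constants are assumptions.

```python
import hashlib
import re

LINKS_PER_PAGE = 5   # assumed fan-out per fake page
WORDS_PER_PAGE = 20  # assumed amount of filler text


def _token(seed: str, index: int) -> str:
    """Derive a stable pseudo-random token from the page path."""
    return hashlib.sha256(f"{seed}#{index}".encode()).hexdigest()[:12]


def maze_page(path: str) -> str:
    """Build a fake HTML page whose links lead only to more fake pages."""
    links = [f"/maze/{_token(path, i)}" for i in range(LINKS_PER_PAGE)]
    filler = " ".join(
        _token(path, LINKS_PER_PAGE + i) for i in range(WORDS_PER_PAGE)
    )
    items = "\n".join(f'<li><a href="{href}">{href}</a></li>' for href in links)
    return f"<html><body><p>{filler}</p>\n<ul>\n{items}\n</ul></body></html>"


def crawl(start: str, depth: int) -> list[str]:
    """Simulate a naive crawler that always follows the first link it sees."""
    visited, path = [], start
    for _ in range(depth):
        visited.append(path)
        path = re.findall(r'href="([^"]+)"', maze_page(path))[0]
    return visited


if __name__ == "__main__":
    trail = crawl("/maze/start", 10)
    print("\n".join(trail))
```

Because each page is derived from its path, the server stores nothing. The same URL always returns the same page, which makes the maze look like a stable site to a bot that revisits it.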

Cloudflare has also moved into this defensive category. Last week, it released AI Labyrinth, a tool meant to “slow down, confuse, and waste the resources of AI Crawlers and other bots that don’t respect ‘no crawl’ directives.” Cloudflare said the tool feeds misbehaving AI crawlers “irrelevant content rather than extracting your legitimate website data.”

These approaches differ in tone and implementation, but they share a common premise: if some AI crawlers will not respect stated crawl boundaries, site owners may need systems that enforce consequences automatically.

What the Pushback Shows

The defensive reaction from FOSS developers is not only technical. It is also a signal of exhaustion. Maintainers who run public infrastructure for community projects are being forced to spend time protecting services from crawlers instead of improving the software those services exist to support.

Drew DeVault, founder and CEO of SourceHut, told TechCrunch that “Nepenthes has a satisfying sense of justice to it, since it feeds nonsense to the crawlers and poisons their wells, but ultimately Anubis is the solution that worked” for his site.

DeVault also made a broader plea: “Please stop legitimizing LLMs or AI image generators or GitHub Copilot or any of this garbage. I am begging you to stop using them, stop talking about them, stop making new ones, just stop.”

The source article concludes that this outcome is unlikely. For now, open source developers are responding with a mix of practical engineering, public frustration, and humor. The result is a growing toolkit for a web where robots may not ask permission, and maintainers increasingly feel they must make them prove they belong.