Ars Technica AI, October 23, 2025

Why Reddit’s Perplexity lawsuit targets Google search scraping

Reddit alleges Perplexity and several scraping companies obtained Reddit content through Google search results after direct anti-scraping barriers blocked access. Perplexity, SerpApi, and Oxylabs deny wrongdoing and frame the dispute as a fight over public data and the open Internet.

Reddit’s lawsuit against Perplexity is not just a dispute over one AI search product. It is a wider challenge to the way AI tools, data brokers, and scraping services use search results to reach material that platforms say they have worked to protect.

The complaint, filed on Wednesday, accuses Perplexity of working with several companies to scrape Reddit content from Google search results. Reddit says the alleged conduct bypassed anti-scraping systems maintained by both Reddit and Google.

What Reddit says happened

Reddit alleges that Perplexity presents itself as “the world’s first answer engine” while depending on Reddit content and Google search results. In Reddit’s view, the product does not gain access to that material through ordinary browsing or direct platform access.

The lawsuit says Perplexity uses another company’s large language model to process Google search results and answer user questions based on those results. Reddit’s central claim is that Perplexity could only make some answers work by wrongfully accessing Reddit content shown inside Google’s search engine results pages.

To test that theory, Reddit says it created content that could only be found through Google search engine results pages, also called SERPs. Reddit described the tactic as “the digital equivalent of marked bills.” According to the lawsuit, “within hours, queries to Perplexity’s ‘answer engine’ produced the contents of that test post.”

Reddit argues that this test showed Perplexity or its co-defendants scraped Google SERPs and then quickly incorporated Reddit data into the answer engine. Reddit also likened the companies involved in the alleged scheme to “bank robbers” and said it caught Perplexity “red-handed.”

Why Google search results matter

The case turns on a practical issue: blocking scraping on one site may not stop the same content from being gathered somewhere else. Reddit says it uses tools such as “registered user-identification limits, IP-rate limits, captcha bot protection, and anomaly-detection tools” to limit automated access to its own platform.
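For readers unfamiliar with the term, an IP-rate limit caps how many requests a single network address may make within a time window. The sketch below is a generic illustration, not Reddit's or Google's actual system. The class name, the limits, and the example addresses are assumptions chosen for the example.

```python
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Hypothetical per-IP limiter: at most max_requests per window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps each IP address to the timestamps of its recently allowed requests.
        self._hits = defaultdict(deque)

    def allow(self, ip, now=None):
        """Return True if this request is within the limit for the given IP."""
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


# Illustrative numbers only: 3 requests per 10 seconds.
limiter = SlidingWindowRateLimiter(max_requests=3, window_seconds=10.0)
results = [limiter.allow("203.0.113.7", now=t) for t in (0.0, 1.0, 2.0, 3.0)]
other_ip = limiter.allow("198.51.100.9", now=3.0)
after_window = limiter.allow("203.0.113.7", now=10.0)
```

Because a limit like this is tracked per address, it only slows traffic that keeps arriving from the same address. That is why the complaint's allegations about changing IP addresses and using proxies matter to Reddit's argument.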

Google also has systems designed to stop automated harvesting of its search results. Reddit says Google prohibits “unauthorized automated access” to its SERPs and maintains “anti-scraping systems and teams dedicated to preventing unauthorized access to its products and services.”

Reddit subpoenaed Google for more detail on those protections. According to the complaint, Google confirmed that it uses a technological access control system called “SearchGuard.” The system is designed to stop automated systems from obtaining wholesale search results and indexed data while still allowing individual human users to view search results, including results that feature Reddit data.

The complaint says SearchGuard creates a barrier challenge that automated systems cannot solve in the ordinary course unless they take affirmative steps to get around it. Reddit alleges that bypassing such systems violates the Digital Millennium Copyright Act, along with laws against unfair trade and unjust enrichment.

The companies Reddit named

Reddit’s complaint names Perplexity and three companies it says were part of the alleged scraping effort: Oxylabs UAB, AWMProxy, and SerpApi.

Oxylabs UAB is described in the lawsuit as “a Lithuanian data scraper.”
AWMProxy is described as “a former Russian botnet.”
SerpApi is described as a Texas company that sells services for scraping search engines.

Reddit alleges that Oxylabs presents its scraping service as a way to get around Google’s technological measures, pointing to an Oxylabs web page titled “How to Scrape Google Search Results.” Reddit also claims SerpApi markets tools for scraping SERPs at “ludicrous speeds.”

The complaint says SerpApi’s fastest option uses “a server-swarm to hide from, avoid, or simply overwhelm by brute force effective measures Google has put in place to ward off automated access to search engine results.” Reddit also alleges that SerpApi gives users tips such as sending “fake user-agent string[s],” changing IP addresses, and using proxies “to make traffic look like regular user traffic” and thereby “impersonate” user traffic.

Reddit says the three companies disguise “their web scrapers as regular people (among other techniques)” to get around restrictions. Information that Google provided in response to a subpoena allegedly showed that during a two-week span in July, the companies scraped “almost three billion” SERPs containing Reddit text, URLs, images, and videos.
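To put that alleged volume in perspective, the figure can be converted into an average rate. The calculation below is a back-of-the-envelope sketch. It treats “almost three billion” as an upper bound of exactly three billion and assumes the scraping was spread evenly over 14 days, neither of which the complaint states.

```python
# Figures from the complaint, rounded for illustration.
SERPS_SCRAPED = 3_000_000_000  # "almost three billion", treated as an upper bound
SPAN_DAYS = 14                 # "a two-week span"
SECONDS_PER_DAY = 24 * 60 * 60

# Average rates, assuming an even spread across the whole span.
per_day = SERPS_SCRAPED / SPAN_DAYS
per_second = SERPS_SCRAPED / (SPAN_DAYS * SECONDS_PER_DAY)

print(f"about {per_day:,.0f} SERPs per day")
print(f"about {per_second:,.0f} SERPs per second")
```

Under those assumptions, the alleged scraping averages out to roughly 214 million SERPs per day, or about 2,500 per second.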

How the defendants responded

Perplexity denied wrongdoing in a Reddit post. It described its answer engine as summarizing Reddit discussions and citing Reddit threads in answers, similar to how someone might share links or posts on Reddit.

Perplexity also argued that Reddit is attacking the open Internet and seeking licensing fees for Reddit content. It said Reddit knows Perplexity does not train foundational models. Perplexity further claimed that Reddit’s goal is to use the lawsuit as a “show of force in Reddit’s training data negotiations with Google and OpenAI.”

Perplexity wrote: “We won’t be extorted, and we won’t help Reddit extort Google, even if they’re our (huge) competitor,” and added, “Perplexity will play fair, but we won’t cave. And we won’t let bigger companies use us in shell games.”

Reddit’s complaint appears to anticipate that open-Internet argument. It cites Reddit’s robots.txt language: “Reddit believes in an open Internet, but not the misuse of public content.”

SerpApi’s spokesperson told Ars that Reddit did not notify the company before filing the lawsuit. The spokesperson said SerpApi strongly disagrees with Reddit’s allegations and intends to defend itself in court. The company said that in the eight years it has been in business, it has operated on the right side of the law and that its website says: “The crawling and parsing of public data is protected by the First Amendment of the United States Constitution. We value freedom of speech tremendously.”

Oxylabs’ chief governance strategy officer, Denas Grybauskas, told Ars that Reddit’s complaint seemed baffling because the companies named in the litigation are “unrelated and unaffiliated.” Grybauskas said Oxylabs was shocked and disappointed, adding that Reddit had not tried to speak with the company directly. He also said Oxylabs’ position is that no company should claim ownership of public data that does not belong to it.

What the lawsuit is really testing

The dispute puts a hard question in front of AI search: when public content appears in search results, does automated collection of that content remain ordinary access, or does it become misuse when technical barriers are bypassed?

Reddit frames the case around access controls, platform investment, and alleged circumvention. Perplexity and the scraping companies frame it around public data, citations, fair use principles, and the open Internet.

That makes the case important beyond Reddit and Perplexity. The facts alleged in the complaint focus on Google SERPs, Reddit content, scraping services, and an AI answer engine. But the underlying conflict is broader: platforms want limits on automated reuse, while AI search products and data collection companies argue that public web information should remain available for indexing, summarizing, and search-driven answers.