Ars Technica AI November 12, 2025 TERMINATOR

Why 20 million ChatGPT chats are at the center of a privacy fight

OpenAI is asking a court to reverse an order requiring it to provide 20 million ChatGPT conversations to The New York Times and other news plaintiffs. The company says most of the logs are unrelated to the case, while The New York Times says the sample is anonymized and covered by a legal protective order.

The story centers on large-scale disclosure of private AI chat logs, raising mild concerns about surveillance, control, and user privacy rather than AI capability itself.

OpenAI is fighting a court order that would require it to hand over 20 million ChatGPT user conversations to The New York Times and other news plaintiffs in a copyright lawsuit. The dispute has become a larger argument over how private AI chats should be treated when a lawsuit seeks evidence from a consumer product used at large scale.

The company says the order reaches too far because the logs are complete conversations, not isolated examples. The New York Times says user privacy is not at risk because the chats would be anonymized by OpenAI and handled under a legal protective order.

What the court ordered

A November 7 order by US Magistrate Judge Ona Wang said OpenAI must produce the 20 million de-identified Consumer ChatGPT Logs to News Plaintiffs by November 14, 2025, or within 7 days of completing the de-identification process. The order sided with the news plaintiffs even though the parties dispute whether OpenAI had agreed to produce the logs in full.

The logs are a random sampling of ChatGPT conversations from December 2022 to November 2024, according to OpenAI. The company said the set does not include chats from business customers.

OpenAI had previously offered 20 million user chats as an alternative to a demand for 120 million. Its current position is that producing the full 20-million-chat sample is still too broad because the conversations have not first been narrowed for relevance to the lawsuit.

OpenAI’s privacy argument

OpenAI says the logs contain complete exchanges between users and ChatGPT, covering multiple prompt-output pairs. That matters because a full conversation can reveal context, interests, work, plans, or sensitive personal material even when obvious identifiers have been removed.

In its filing, OpenAI said that “more than 99.99%” of the chats have “nothing to do” with the case. It asked the district court to vacate the order and direct the News Plaintiffs to respond to OpenAI’s proposal for identifying relevant logs. The company could also seek review in a federal court of appeals.

OpenAI’s broader point is that AI chat records should not be treated as a bulk discovery target simply because they exist. It argued that courts do not let plaintiffs in other cases search through private emails of tens of millions of Gmail users without first narrowing the request for relevance, and said generative AI tools should receive similar treatment.

The company also said de-identification is not the same as removing all private information. According to OpenAI, the process is not designed to strip out non-identifying but still private material, such as a hypothetical use of ChatGPT by a Washington Post reporter to help prepare a news article.

The New York Times’ response

The New York Times disputes OpenAI’s framing. In a statement to Ars, the company said its case against OpenAI and Microsoft is about holding the companies accountable for using copyrighted works to create products that directly compete with The Times.

The Times also said no ChatGPT user’s privacy is at risk. Its position is that the court ordered a sample of chats anonymized by OpenAI itself and covered by a legal protective order. The Times further argued that OpenAI’s terms of service allow the company to train models on user chats and turn over chats for litigation.

The plaintiffs have also argued that access to the output log sample is needed to keep discovery on schedule before the February 26, 2026, discovery deadline. In an October 30 filing, they said OpenAI had refused to produce even a small sample of the billions of model outputs at issue in the case.

Why the sample matters

The disagreement is not only about the number of chats. It is also about what kind of evidence the plaintiffs can inspect directly and what filtering role OpenAI should be allowed to play before the material is shared.

OpenAI says the discovery requests were initially limited to logs related to Times content. The company says it had been working to satisfy those requests by sampling conversation logs, but that News Plaintiffs later demanded the full 20-million-log sample, delivered on a hard drive.

OpenAI says it offered privacy-preserving options, including targeted searches over the sample to find chats that might include text from a New York Times article. It also says it offered high-level data classifying how ChatGPT was used in the sample. According to OpenAI, those options were rejected by The Times.

The plaintiffs say that approach is inadequate. Their position is that they need access to the model outputs themselves to analyze how real-world users interact with the consumer-facing product, how retrieval-augmented generation (RAG) functions to deliver news content, and how often hallucinations occur.

What happens to the chats now

OpenAI says the chats are stored in a secure system protected under legal hold, meaning they cannot be accessed or used for purposes other than meeting legal obligations. The company also says The New York Times is currently legally obligated not to make any data public outside the court process.

Even so, OpenAI says it will fight attempts to make the user conversations public. It has also told users that it plans to develop advanced security features meant to keep data private, including client-side encryption for messages with ChatGPT.

The case now presents a clear conflict: the plaintiffs want direct access to a large sample of product logs they say are central to their claims, while OpenAI says that handing over complete user conversations at this scale creates a dangerous precedent. The outcome will help define how courts handle AI chat logs when consumer privacy and discovery demands collide.