TechCrunch AI, January 31, 2025

Why OpenAI tested AI persuasion on r/ChangeMyView

OpenAI used posts from r/ChangeMyView to evaluate how persuasive its AI reasoning models can be. The test compares model-written replies with human responses, raising questions about Reddit data, licensing, scraping, and safeguards for persuasive AI.


OpenAI has been using Reddit’s r/ChangeMyView community as the basis for a persuasion test for its AI reasoning models. The detail appeared in a system card released with o3-mini, OpenAI’s new “reasoning” model, on Friday.

The evaluation matters because it sits at the intersection of three issues shaping AI development: access to high-quality human data, the growing persuasive ability of models, and the safeguards companies say they are building to keep those abilities in check.

How the r/ChangeMyView test works

r/ChangeMyView is built around argument. Millions of Reddit users belong to the forum, where people post strong opinions and invite others to challenge them. Other users then reply with arguments meant to change the original poster’s position.

That structure makes the subreddit useful for evaluating persuasion. According to OpenAI, the company collects user posts from r/ChangeMyView and asks its AI models to write replies in a closed environment. Those replies are designed to persuade the Reddit user to reconsider the view in the original post.

OpenAI then shows the model-generated answers to testers. The testers judge how persuasive the arguments are. Finally, OpenAI compares the AI responses with human replies written for the same post.
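
The comparison step can be illustrated with a small sketch. It is not OpenAI's code: the function name, the 1-to-5 rating scale, and the sample ratings are all hypothetical, and it assumes the score is the share of model-versus-human reply pairs in which testers rated the model reply higher, with ties counted as half.

```python
from itertools import product


def persuasion_percentile(model_ratings, human_ratings):
    """Percentage of (model, human) reply pairs where the model reply is rated higher.

    Hypothetical scoring rule for illustration: ties count as half a win.
    """
    if not model_ratings or not human_ratings:
        raise ValueError("both rating lists must be non-empty")
    wins = 0.0
    for model_score, human_score in product(model_ratings, human_ratings):
        if model_score > human_score:
            wins += 1.0
        elif model_score == human_score:
            wins += 0.5
    return 100.0 * wins / (len(model_ratings) * len(human_ratings))


# Made-up tester ratings (1 = not persuasive, 5 = very persuasive)
# for replies written in response to the same original post.
model_ratings = [4, 5]
human_ratings = [2, 3, 4, 5]
print(persuasion_percentile(model_ratings, human_ratings))  # 75.0
```

Under this assumed rule, a score of 50 would mean the model's replies are rated about as persuasive as a typical human reply, and a score near 100 would mean they almost always win.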

This is not a public benchmark. OpenAI told TechCrunch it has no plans to release the evaluation. The company also said the ChangeMyView-based evaluation is unrelated to its content-licensing deal with Reddit.

Why Reddit data is valuable to AI developers

The r/ChangeMyView test illustrates why Reddit is attractive to companies building and testing AI systems. The site contains large amounts of human-written material, including long discussions, disagreements, explanations, and attempts to persuade other people.

For AI model developers, that kind of content can be more useful than simple snippets of text. A persuasion benchmark needs more than facts. It needs a record of how people frame arguments, respond to disagreement, and try to make a case that another person may accept.

OpenAI already has a content-licensing deal with Reddit that allows it to train on posts from Reddit users and display those posts in its products. TechCrunch's report does not say what OpenAI pays for the content, but it notes that Google reportedly pays Reddit $60 million a year under a similar deal.

At the same time, OpenAI said this specific evaluation is separate from its Reddit deal. TechCrunch reported that it is unclear how OpenAI accessed the subreddit’s data for the test.

The data-access dispute around AI training

The ChangeMyView evaluation also points to a larger conflict around AI data. Reddit has made licensing agreements with some AI companies, but it has also criticized companies for scraping its site without payment.

Reddit CEO Steve Huffman told The Verge last year that Microsoft, Anthropic, and Perplexity refused to negotiate with him. He also said blocking those companies had been “a real pain in the ass.”

OpenAI has faced its own legal pressure over data collection. The company has been accused in several lawsuits of improperly scraping websites, including The New York Times, to gather more training data for ChatGPT and the AI models behind it.

Reddit did not immediately respond to TechCrunch’s request for comment.

The tension is clear: AI companies need large, useful datasets, but many of the places where those datasets exist are controlled by platforms, publishers, or communities. Licensing can create a formal path to that data, while scraping disputes raise questions about consent, payment, and control.

What the benchmark showed about o3-mini

On the ChangeMyView benchmark, o3-mini did not appear to perform significantly better or worse than o1 or GPT-4o. But OpenAI’s latest AI models appeared to be more persuasive than most people on the subreddit.

OpenAI’s o3-mini system card said GPT-4o, o3-mini, and o1 all showed strong persuasive argumentation abilities, placing them in the 80th to 90th percentile of humans. The system card also said OpenAI does not currently see the models performing far better than humans or showing clear superhuman performance.

That distinction is important. TechCrunch's report does not describe OpenAI as trying to build a model that can outperform every human in persuasion. Instead, OpenAI frames the test as part of an effort to measure and limit risk.

The concern is that highly persuasive AI could become dangerous if it could reliably influence human users. In theory, that ability could help an advanced AI pursue its own agenda or serve the agenda of whoever controls it.

Why persuasion testing is becoming part of AI safety

OpenAI’s stated goal is not to create hyper-persuasive models. The company says it wants to make sure AI models do not become too persuasive.

That concern has become more relevant as reasoning models improve at persuasion and deception. OpenAI has developed new evaluations and safeguards to address those risks.

The r/ChangeMyView benchmark also shows a practical problem for AI companies: even after scraping much of the public internet and seeking licensed data, developers still need specialized datasets to test specific model behaviors. Persuasion is one of those behaviors. It requires examples of arguments, responses, and human judgment.

That makes the benchmark both technically useful and socially sensitive. It relies on a real online community whose posts were created for public discussion, not necessarily for AI safety testing. It also shows how hard it is for AI companies to separate model development from the messy data economy around the public web.

The takeaway is straightforward: OpenAI is measuring how persuasive its models can be, and Reddit’s r/ChangeMyView has become one of the tools for that work. The results do not show clear superhuman persuasion, but they do show that leading models already rank highly against human commenters in this setting.