Ars Technica AI February 3, 2025 TERMINATOR

Why OpenAI is testing AI persuasion on Reddit

OpenAI says o3-mini is more persuasive than random Reddit r/ChangeMyView responses in about 82 percent of comparisons. The company says that is not yet “clear superhuman performance,” but it is enough to rate persuasion as a “Medium” risk under its Preparedness Framework.

WTF Index TERMINATOR

◄ Terminator 3 Idiocracy 1 ►

The story centers on AI systems becoming highly persuasive enough to be treated as a monitored manipulation risk.

Why OpenAI is testing AI persuasion on Reddit

OpenAI is not only measuring whether its models can solve math problems, reason through logic, understand images, or assist with forecasts. It is also studying a more social question: how persuasive can ChatGPT become when it writes an argument meant to change someone’s mind?

The company’s latest public signal comes from a system card released with the o3-mini simulated reasoning model. In that document, OpenAI says its current models have not shown much movement toward the “superhuman” persuasion level it treats as a future danger. Yet the company is already treating human-level persuasive writing as a risk worth monitoring.

How Reddit Became a Persuasion Benchmark

OpenAI’s test draws on Reddit’s r/ChangeMyView forum, a community built around argument and revision. The forum describes itself as “a place to post an opinion you accept may be flawed, in an effort to understand other perspectives on the issue.”

That makes it useful for studying persuasion. The forum’s 3.8 million members have posted propositions on politics, economics, social norms, and AI itself. Replies can earn a “delta” when they actually change the original poster’s view, creating a large collection of real-world arguments that researchers have studied for years.

OpenAI uses a random selection of human replies from ChangeMyView as a comparison point. Its models generate responses to the same prompts, and human evaluators rate both the AI-written and human-written arguments on a five-point scale across 3,000 different tests.

The resulting percentile score does not mean that an AI convinced 82 percent of readers to change their minds. It measures “the probability that a randomly selected model-generated response is rated as more persuasive than a randomly selected human response.” That distinction matters because the benchmark is comparative, not a direct measurement of changed beliefs.

What The 82 Percent Figure Means

OpenAI previously found that 2022’s ChatGPT-3.5 ranked in the 38th percentile on this measure, making it significantly less persuasive than random humans in the test. With September’s release of the o1-mini reasoning model, that result rose to the 77th percentile. The full o1 model reached percentiles in the high 80s.

The newer o3-mini model did not produce a major leap on this specific measure. OpenAI says it is more persuasive than humans in about 82 percent of random comparisons.

That sounds powerful, but the benchmark has limits. The comparison is against random responses from everyday Reddit users. If a human reply receives a low score and an AI reply receives a slightly higher score, the model wins that comparison even if neither argument is especially strong.

The test also does not show how often a person actually changed their mind after reading a ChatGPT-written argument. Nor does it establish whether the strongest AI responses move people on deeply held beliefs or on relatively minor questions, such as whether a hot dog is a sandwich.

Why OpenAI Calls The Risk “Medium”

OpenAI says o3-mini’s persuasion capability falls short of the 95th percentile threshold it would treat as “clear superhuman performance.” Still, the company rates the model’s persuasion capability as a “Medium” risk in its Preparedness Framework, which is used to assess potential “catastrophic risks from frontier models.”

At that level, a model has “comparable persuasive effectiveness to typical human written content.” OpenAI says that could become “a significant aid to biased journalism, get-out-the-vote campaigns, and typical scams or spear phishers.”

The concern is not only that an AI can write a decent argument. It is that the cost and effort required to produce such arguments may drop sharply. OpenAI notes that creating strong persuasive writing without AI “requires significant human effort,” while AI-generated arguments “could make all content up to their capability level nearly zero-cost to generate.”

That changes the scale of the problem. A single human-level argument may not be extraordinary. A large volume of cheap, targeted, human-level persuasive content could be much harder for platforms, readers, and institutions to handle.

The Threshold OpenAI Says It Has Not Reached

OpenAI’s more severe “Critical” threshold is much higher. The company describes that level as “persuasive effectiveness strong enough to convince almost anyone to take action on a belief that goes against their natural interest.”

OpenAI warns that such a model “would be a powerful weapon for controlling nation states, extracting secrets, and interfering with democracy.” The source article links this kind of concern to regulation efforts like California’s SB-1047.

For now, OpenAI says its models are not there. But the company is already applying mitigations at the “Medium” level. Those steps include “heightened monitoring and detection” of AI-based persuasion efforts, “live monitoring and targeted investigations” of extremists and “influence operations,” and rules requiring its o-series reasoning models to refuse requested political persuasion tasks.

The practical takeaway is straightforward. OpenAI’s own testing suggests that current reasoning models can produce persuasive writing that compares well with ordinary online arguments. The larger worry is not a single superhuman message, but the possibility that persuasive, human-level content can be generated cheaply and at scale.