WIRED AI March 13, 2025

A clearer path is emerging for reporting dangerous AI flaws

More than 30 AI researchers are proposing a more formal system for reporting dangerous AI flaws. The plan calls for standardized reports, company-backed disclosure infrastructure, and a way to share vulnerabilities across model providers.

AI models are now used widely enough that a hidden flaw can become more than a technical issue. A new proposal from more than 30 AI researchers argues that outsiders need clearer permission, better reporting channels, and safer ways to disclose dangerous AI flaws.

Why AI flaw reporting is under pressure

In late 2023, a team of third-party researchers found a troubling issue in OpenAI’s widely used artificial intelligence model GPT-3.5. When prompted to repeat certain words a thousand times, the model began repeating the word, then shifted into incoherent output and snippets of personal information from its training data, including parts of names, phone numbers, and email addresses.

The researchers worked with OpenAI so the flaw could be fixed before it was made public. The case became one example among scores of problems found in major AI models in recent years.

The new proposal says many AI vulnerabilities are still reported in uneven and risky ways. Some methods for breaking AI safeguards are shared on the social media platform X. Other jailbreaks are sent to only one company even when they may affect many. Some flaws may stay private because researchers fear being banned or facing prosecution for violating terms of use.

“Right now it’s a little bit of the Wild West,” says Shayne Longpre, a PhD candidate at MIT and the lead author of the proposal.

What the researchers want to change

The proposal is backed by more than 30 prominent AI researchers, including some who found the GPT-3.5 flaw. Their central argument is straightforward: third-party researchers should have a recognized way to test models and disclose problems without uncertainty over whether good-faith research will be punished.

The authors suggest three main measures for AI flaw disclosure:

Standardized AI flaw reports that make the reporting process more consistent.
Infrastructure from big AI firms to support third-party researchers who disclose flaws.
A system that allows flaws to be shared between different providers.

The model is borrowed from cybersecurity, where outside researchers often rely on established norms and legal protections when they report bugs. AI researchers, the proposal argues, do not yet have the same clarity.

“AI researchers don’t always know how to disclose a flaw and can’t be certain that their good faith flaw disclosure won’t expose them to legal risk,” says Ilona Cohen, chief legal and policy officer at HackerOne, a company that organizes bug bounties, and a coauthor on the report.

Why outside testing matters

Large AI companies already perform extensive safety testing before releasing models. Some also hire outside firms to probe systems further. But the proposal questions whether that is enough for general-purpose AI systems that may be used in many applications and services.

Powerful models can contain harmful biases. Certain inputs can also cause them to break free of guardrails and produce unpleasant or dangerous responses. The risks include encouraging vulnerable users toward harmful behavior or helping a bad actor develop cyber, chemical, or biological weapons.

Some experts also fear that increasingly capable models could assist cybercriminals or terrorists, and may even turn on humans. The proposal does not present third-party reporting as a complete solution to those concerns. It presents it as a practical way to make serious flaws easier to find, report, fix, and discuss publicly.

Longpre asks whether there are enough people inside those companies to address every issue in general-purpose AI systems used by hundreds of millions of people in applications that have not yet been imagined. Some AI companies have started organizing AI bug bounties, but Longpre says independent researchers can still risk breaking terms of use if they probe powerful AI models on their own.

Who is involved

The researchers behind the initiative include academics from MIT, Stanford University, Princeton, and Carnegie Mellon University. The effort also includes people from large companies including Microsoft and Mozilla, along with several independent AI research organizations.

Ruth Appel, a postdoctoral fellow at Stanford University who worked on the proposal, says a formal process would help model faults be flagged quickly and would hold companies publicly accountable. Without such a system, she says, users could face a worse or more dangerous product because flaws may not be reported or may not be discovered.

The proposal arrives as the US government’s AI Safety Institute, created under the Biden administration to help vet the most powerful AI models for serious problems, faces an uncertain future due to cuts being implemented by Elon Musk’s Department of Government Efficiency.

Longpre and Appel helped organize a workshop at Princeton University on third-party AI flaw disclosure last October. Researchers from companies including Google, OpenAI, Microsoft, and Cohere attended the event.

The next test is adoption

Longpre says the researchers have started discussing the proposals with researchers from some big AI firms including OpenAI, Google, and Anthropic. Those companies did not immediately respond to a request for comment.

Longpre was previously part of a group of researchers that called for companies to change their terms of service so third-party researchers could probe models. That change did not happen.

Nicholas Carlini, an ex-Google researcher and a member of the team that discovered the GPT-3.5 flaw in 2023, told the Princeton workshop that the reporting system needs clearer norms. The core issue is not only whether AI flaws exist. It is whether researchers know exactly how to report them, whether companies can coordinate responses, and whether the public can learn about serious problems after they are fixed.

If the proposal gains support from AI companies, it could move AI flaw disclosure closer to the norms already used in cybersecurity. If it does not, researchers may continue to face the same uncertainty that the proposal is trying to resolve.