Japanese AI startup Sakana says it has shown that an AI-generated scientific paper can make it through peer review. The claim is important because it lands in the middle of a widening debate over whether AI can help produce science, not just summarize it or write about it.
The details matter. Sakana’s result points to real progress in automated research workflows, but it also shows why acceptance by reviewers is not the same thing as a clear contribution to scientific knowledge.
What Sakana Says Its AI System Did
Sakana used an AI system called The AI Scientist-v2 to generate papers for a workshop at ICLR, a long-running and reputable AI conference. According to the company, workshop organizers and ICLR leadership agreed to work with Sakana on an experiment involving double-blind review of AI-generated manuscripts.
The company said it collaborated with researchers at the University of British Columbia and the University of Oxford. Together, they submitted three AI-generated papers to the workshop for peer review.
Sakana described the process as end-to-end generation. The AI Scientist-v2 produced the scientific hypotheses, experiments and experimental code, data analyses, visualizations, text, and titles.
Robert Lange, a research scientist and founding member at Sakana, told TechCrunch: “We generated research ideas by providing the workshop abstract and description to the AI. This ensured that the generated papers were on topic and suitable submissions.”
One of the three papers was accepted to the ICLR workshop. That paper examined training techniques for AI models, and Sakana said it cast a critical lens on the topic. The company then withdrew the paper before publication, citing transparency and respect for ICLR conventions.
Why The Result Is Not A Simple Breakthrough
The most basic fact is still notable: one AI-generated paper was accepted after initial peer review. But several caveats limit what can be concluded from that outcome.
First, Sakana itself acknowledged quality problems. In its blog post, the company said the AI sometimes made “embarrassing” citation errors. One example involved incorrectly attributing a method to a 2016 paper instead of the original 1997 work.
Second, the paper did not go through every possible layer of review. Because Sakana withdrew it after the initial peer review, it did not receive an additional “meta-review.” At that stage, workshop organizers could still have rejected the paper.
Third, Sakana noted that workshop acceptance rates tend to be higher than acceptance rates for the main “conference track.” The company also said none of its AI-generated studies passed its internal bar for ICLR conference track publication.
That distinction matters for how the result should be understood. A workshop acceptance can still be meaningful, but it is not identical to acceptance in a more selective main conference track.
Experts See Human Judgment Behind The AI Output
Matthew Guzdial, an AI researcher and assistant professor at the University of Alberta, called Sakana’s results “a bit misleading.” His concern was not just whether the AI could produce a plausible paper, but how much human filtering shaped the final submission.
Guzdial told TechCrunch: “The Sakana folks selected the papers from some number of generated ones, meaning they were using human judgment in terms of picking outputs they thought might get in. What I think this shows is that humans plus AI can be effective, not that AI alone can create scientific progress.”
That point shifts the interpretation. The experiment may show that AI can generate research-like outputs that humans can screen and submit. It does not necessarily show that AI, working independently, can identify and produce scientific advances.
Mike Cook, a research fellow at King’s College London specializing in AI, also questioned the rigor of the review setting. He told TechCrunch: “New workshops, like this one, are often reviewed by more junior researchers.”
Cook also noted that the workshop focused on negative results and difficulties. In his view, that context may matter because “it’s arguably easier to get an AI to write about a failure convincingly.”
The Bigger Issue Is What Peer Review Measures
The episode raises a practical question for scientific publishing: what exactly does it mean when an AI-generated paper passes peer review?
Cook said he was not surprised that AI could pass review, given that AI is strong at producing human-sounding prose. He also pointed out that partly AI-generated papers passing journal review is not new, and neither are the ethical dilemmas this creates for science.
The risk is not only that AI might make mistakes. AI still has technical shortcomings such as hallucination, and there is concern that it could add noise to the scientific literature rather than help move research forward.
Cook framed the issue directly: “We need to ask ourselves whether [Sakana’s] result is about how good AI is at designing and conducting experiments, or whether it’s about how good it is at selling ideas to humans — which we know AI is great at already. There’s a difference between passing peer review and contributing knowledge to a field.”
That distinction is central. Peer review is meant to evaluate scientific work, but reviewers are also evaluating how an argument is presented. If an AI system can write persuasively while still making citation mistakes or producing work with limited novelty, the process may need new safeguards.
Sakana Calls For Norms Around AI-Generated Science
Sakana did not claim that The AI Scientist-v2 had produced groundbreaking or especially novel science. The company said the goal was to “study the quality of AI-generated research” and to highlight the need for “norms regarding AI-generated science.”
That makes the experiment less like a declaration of victory and more like a warning sign. AI can already participate in parts of the research pipeline in ways that are difficult for institutions to ignore.
The company also raised the question of whether AI-generated science should be judged on its own merits first, to avoid bias against it. At the same time, Sakana warned against a future in which the technology is optimized mainly to pass peer review, which could undermine the purpose of the process.
The lesson is measured but important. Sakana’s result shows that automated systems can produce work that looks credible enough to move through part of a scientific review process. It also shows that the scientific community still needs clearer rules for authorship, disclosure, quality control, and the difference between fluent writing and reliable knowledge.