GPT-5 runs lab tests as OpenAI targets cheaper CFPS

OpenAI and Ginkgo Bioworks connected GPT-5 to an automated cloud laboratory to improve cell-free protein synthesis. The system cut the specific production cost for sfGFP from $698 per gram to $422 per gram, but the results remain limited to a single protein and a single CFPS system.

OpenAI and Ginkgo Bioworks have tested a closed-loop system in which GPT-5 designs experiments for an automated laboratory, receives the results, and plans the next round. The project focused on cell-free protein synthesis, or CFPS, a method for producing proteins without growing living cells.

The outcome was measurable: lower costs, higher yield, and a large set of experimental data. But the project also showed that autonomous biology still depends on careful validation, lab infrastructure, and human intervention.

Why CFPS is hard to optimize

Cell-free protein synthesis works by moving the protein-making machinery from inside cells into a controlled mixture. Instead of growing living cells, researchers run the machinery in a reaction solution.

The source article notes that CFPS is used in drug development, diagnostics, and industrial enzyme production. According to the accompanying paper, it is also used in the commercial manufacturing of an antibody-drug conjugate.

The challenge is that each reaction depends on many interacting components. These include DNA templates, cell extracts called lysates, energy sources, salts, and cofactors. Small changes can affect performance, but the number of possible mixtures is too large to search by intuition alone.

Previous attempts using machine learning produced only incremental improvements. OpenAI and Ginkgo Bioworks approached the problem by linking GPT-5 directly to automated experimentation, so the model could design tests, learn from the data, and revise its next set of experiments.

What the automated lab actually did

The system ran for six iterative rounds. In total, it tested more than 36,000 different reaction compositions across 580 automated microtiter plates. Each plate contains hundreds of tiny wells, allowing many reactions to run in parallel.

GPT-5 generated experimental designs as digital files. Before those designs reached the lab, a validation system based on the Python library Pydantic checked whether they were scientifically sound and physically executable on the available automation.
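A minimal sketch of what such a pre-flight check could look like. The article does not describe the actual schema, so everything here is an assumption: the field names, the volume limits, and the list of required components are hypothetical, and plain standard-library dataclasses stand in for Pydantic so the example runs without dependencies.

```python
from dataclasses import dataclass

# Hypothetical limits; the real automation constraints are not given in the article.
MAX_WELL_VOLUME_UL = 20.0
MIN_TRANSFER_UL = 0.5
REQUIRED_COMPONENTS = ("lysate", "dna_template")


@dataclass(frozen=True)
class WellDesign:
    """One reaction: a well position and a reagent -> volume (uL) mapping."""
    well: str
    volumes_ul: dict

    def __post_init__(self):
        if not self.volumes_ul:
            raise ValueError(f"{self.well}: no reagents specified")
        # Physical check: every transfer must be large enough to pipette.
        for reagent, vol in self.volumes_ul.items():
            if vol < MIN_TRANSFER_UL:
                raise ValueError(f"{self.well}: {reagent} volume {vol} uL is too small")
        # Physical check: the well must not overflow.
        total = sum(self.volumes_ul.values())
        if total > MAX_WELL_VOLUME_UL:
            raise ValueError(f"{self.well}: total {total} uL exceeds the well volume")
        # Scientific check: a reaction without lysate or DNA cannot make protein.
        for required in REQUIRED_COMPONENTS:
            if required not in self.volumes_ul:
                raise ValueError(f"{self.well}: missing required component '{required}'")


def validate_plate(raw_designs):
    """Return (accepted designs, error messages) for a list of raw dicts."""
    accepted, errors = [], []
    for raw in raw_designs:
        try:
            accepted.append(WellDesign(**raw))
        except (ValueError, TypeError) as exc:
            errors.append(str(exc))
    return accepted, errors


good = {"well": "A1", "volumes_ul": {"lysate": 5.0, "dna_template": 1.0, "buffer": 4.0}}
bad = {"well": "A2", "volumes_ul": {"glucose": 10.0, "ribose": 10.0}}
accepted, errors = validate_plate([good, bad])
print(len(accepted), errors)
```

The point of the design is that a rejected plate never reaches the robots: the planner gets an error message back instead of consuming reagents and instrument time.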

Once approved, the experiments went to Ginkgo Bioworks' cloud laboratory in Boston. There, Reconfigurable Automation Carts, or RACs, handled individual pieces of lab equipment such as liquid handlers, incubators, and measurement instruments. Robotic arms and transport rails moved sample plates between stations, while Ginkgo's Catalyst software controlled the workflow.

After each experiment, measurement data flowed back to GPT-5. The model analyzed the results, developed hypotheses, and designed the next round. According to the authors, human involvement was limited to preparation and loading and unloading of reagents and consumables.
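The design, run, analyze cycle can be sketched as a simple loop. This is only an illustration of the control flow: `run_plate` is an invented stand-in for the automated lab with a made-up response surface, and `propose` is a naive random perturbation rather than anything resembling GPT-5's reasoning. The reagent names `mg` and `k` and all numbers are hypothetical.

```python
import random


def run_plate(designs, rng):
    """Stand-in for the lab: return one noisy yield value per design."""
    results = []
    for d in designs:
        # Invented response surface with an optimum near mg=12, k=100.
        true_yield = 3.0 - (d["mg"] - 12.0) ** 2 / 50.0 - (d["k"] - 100.0) ** 2 / 5000.0
        results.append(max(0.0, true_yield + rng.gauss(0.0, 0.05)))
    return results


def propose(best, rng, n=24):
    """Stand-in for the planner: perturb the best composition seen so far."""
    return [
        {
            "mg": max(0.0, best["mg"] + rng.uniform(-3.0, 3.0)),
            "k": max(0.0, best["k"] + rng.uniform(-30.0, 30.0)),
        }
        for _ in range(n)
    ]


def closed_loop(rounds=6, seed=0):
    """Run design -> experiment -> analysis for a fixed number of rounds."""
    rng = random.Random(seed)
    best, best_yield = {"mg": 5.0, "k": 50.0}, 0.0
    history = []
    for _ in range(rounds):
        designs = propose(best, rng)          # plan the next plate
        yields = run_plate(designs, rng)      # execute it in the (simulated) lab
        for design, y in zip(designs, yields):
            if y > best_yield:                # learn from the measurements
                best, best_yield = design, y
        history.append(best_yield)
    return best, history


best, history = closed_loop()
print(best, history)
```

Each round's output becomes the next round's input, which is what distinguishes a closed loop from a one-off screen.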

Across six months, the system generated roughly 150,000 data points. In the first round, GPT-5 worked without prior example data or experimental results, relying on knowledge stored in its weights. In this so-called "zero-shot" mode, it still produced usable designs, although they were not yet optimal.

The strongest results came after GPT-5 gained tools

The headline result was a lower cost for producing the test protein sfGFP, a fluorescent protein commonly used as a benchmark. The specific production cost fell from $698 per gram to $422 per gram, a reduction of about 40 percent compared to the previous state of the art, published by researchers at Northwestern University in August 2025.

Protein yield also increased by 27 percent, rising from 2.39 to 3.04 grams per liter of reaction solution. Reagent costs alone dropped by 57 percent, from $60 to $26 per gram.
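The reported percentages can be checked directly from the before and after figures quoted in this article. A short sketch, where the helper name `pct_change` is just for illustration:

```python
def pct_change(old, new):
    """Relative change from old to new, in percent (negative means a drop)."""
    return (new - old) / old * 100.0


changes = {
    "production_cost": pct_change(698.0, 422.0),  # dollars per gram
    "yield": pct_change(2.39, 3.04),              # grams per liter
    "reagent_cost": pct_change(60.0, 26.0),       # dollars per gram
}

for name, value in changes.items():
    print(f"{name}: {value:+.1f}%")
```

The results come out at roughly -39.5, +27.2, and -56.7 percent, which round to the 40, 27, and 57 percent quoted above.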

The authors also compared the work with the list price of a commercially available CFPS kit from NEB, which runs around $800,000 per gram, though they acknowledged that the figures are not directly comparable.

The largest performance improvement started in the third round. At that point, GPT-5 gained access to a computer, the internet, data analysis packages, and a recent preprint of the previous best results from the Northwestern University researchers. It also received expanded metadata, including raw data, liquid handling error reports, and actual incubation times.

From round 3 onward, GPT-5 could combine its own experimental results with insights from the scientific literature. In just two months, covering rounds 3 through 5, the system surpassed the previous state of the art. The source article notes an important caveat: the DNA template and cell lysate were also improved at the same time, making it difficult to assign the gains precisely.

What GPT-5 appeared to learn

Before it had access to the Northwestern University preprint, GPT-5 suggested reagents including nucleoside monophosphates, potassium phosphate, and ribose. The authors of that publication had independently identified the same substances as critical.

The model also wrote human-readable lab notebook entries describing its analyses and hypotheses. Among its findings, GPT-5 identified that HEPES, an inexpensive buffer, had a disproportionately large effect on protein yield. It also found that phosphate must be buffered within a narrow concentration and pH range, and that adding spermidine boosts yields.

GPT-5 made an economic observation as well. Cell lysate and DNA now account for more than 90 percent of the cost of a CFPS reaction. That means improving protein yield per unit of those expensive ingredients is the most effective lever for lowering cost, rather than focusing on cheaper secondary components.
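A back-of-the-envelope sketch of why yield is the stronger lever. The 90 percent share comes from the article; the normalized cost of 100 per liter, the choice to halve secondary reagent costs, and the reuse of the 27 percent yield gain as a scenario are illustrative assumptions, not figures from the paper.

```python
def cost_per_gram(lysate_dna_cost, other_cost, yield_g_per_l):
    """Cost per gram = total reaction cost per liter / grams produced per liter."""
    return (lysate_dna_cost + other_cost) / yield_g_per_l


def saving_percent(base, new):
    """How much cheaper `new` is than `base`, in percent."""
    return (base - new) / base * 100.0


# Normalized baseline: 100 cost units per liter, 90 of them lysate and DNA.
base = cost_per_gram(90.0, 10.0, 1.0)

# Scenario A: secondary reagents become half as expensive, yield unchanged.
cheaper_secondary = cost_per_gram(90.0, 5.0, 1.0)

# Scenario B: same ingredients, but 27 percent more protein per liter.
higher_yield = cost_per_gram(90.0, 10.0, 1.27)

print(f"halve secondary costs: {saving_percent(base, cheaper_secondary):.1f}% cheaper")
print(f"raise yield by 27%:    {saving_percent(base, higher_yield):.1f}% cheaper")
```

Under these assumptions, halving every secondary reagent's cost saves 5 percent per gram, while the yield gain saves about 21 percent, because yield divides the whole bill, including the expensive lysate and DNA.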

Of the more than 20 additional reagents GPT-5 suggested, several appeared in the best-performing reaction compositions. These included NMPs, glucose, potassium phosphate, and catalase.

Why this is not full lab autonomy yet

The system produced relatively few major design errors. Of 480 designed microtiter plates, only two had fundamental design flaws, or less than one percent.

One error occurred when the model overrode a prescribed volume specification to make room for additional reagents. The other came from a coding error in a unit conversion, which resulted in a plate containing only glucose and ribose. That plate produced no protein.

The more significant limitations lay outside the software. Early in the process, measurement results varied significantly. Deviations between replicates on the same plate sometimes exceeded 40 percent. Ginkgo staff manually adjusted reagent concentrations and stock solutions, bringing deviations down to a median of 17 percent.
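The article does not say how replicate deviation was calculated. One common choice is the coefficient of variation (standard deviation divided by the mean) per group of replicates, summarized by the median across groups. The sketch below assumes that metric, and the readings are invented for illustration.

```python
from statistics import mean, median, stdev


def replicate_cv_percent(values):
    """Coefficient of variation of one replicate group, in percent."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean of replicates is zero; CV is undefined")
    return stdev(values) / m * 100.0


def median_cv_percent(groups):
    """Median CV across all replicate groups on a plate."""
    return median(replicate_cv_percent(v) for v in groups.values())


# Hypothetical fluorescence readings: composition id -> replicate wells.
plate = {
    "comp_01": [1020.0, 980.0, 1000.0],
    "comp_02": [450.0, 700.0, 560.0],
    "comp_03": [1510.0, 1490.0, 1500.0],
}

for name, values in plate.items():
    print(name, round(replicate_cv_percent(values), 1))
print("median:", round(median_cv_percent(plate), 1))
```

Tracking a number like this per plate makes it clear when a change in yield is larger than the measurement noise and when it is not.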

The results also apply only to one protein, sfGFP, and one CFPS system. Whether the same reaction compositions transfer to other proteins remains unclear. In a test with twelve additional proteins, only six were detectable by gel electrophoresis, and further optimization would be needed for other target proteins.

OpenAI and Ginkgo Bioworks say they plan to extend the approach to additional biological processes. For now, the project shows both sides of AI-driven wet lab work: GPT-5 can help navigate a complex experimental space, but the lab still needs validation, reliable automation, and human oversight.