Three GPT-5.6 Pro models appear in OpenAI benchmark

An OpenAI genomics benchmark paper lists GPT-5.6 Luna Pro, Terra Pro, and Sol Pro, even though those Pro variants have not been announced. The table suggests ChatGPT Pro could move from one top model to a choice between speed, high-volume work, and maximum reasoning power.

An OpenAI benchmark paper on genomics has surfaced an unannounced detail about GPT-5.6: the Pro tier may not remain a single premium model. The paper lists three Pro variants, GPT-5.6 Luna Pro, Terra Pro, and Sol Pro, each marked as a "Pro (Extended)" run.

What the paper shows

OpenAI officially unveiled the GPT-5.6 generation in late June with three standard models. Sol is positioned for the hardest tasks, Terra is aimed at high-volume business workloads, and Luna is meant for faster, cheaper everyday queries.

That public announcement did not include Pro versions. The new detail appears in the results table of an OpenAI paper on a genomics benchmark, where the three Pro names appear for the first time.

The structure is notable because it mirrors the standard GPT-5.6 lineup. Instead of Pro being only the single strongest option, the table points to a Pro version of each role:

  • GPT-5.6 Luna Pro for the fast end of the lineup.
  • Terra Pro for high-volume work.
  • Sol Pro for maximum reasoning performance.

The paper does not say whether these models will actually become selectable in ChatGPT. For now, the names come only from the benchmark table.

Why the results matter

In the benchmark, Sol Pro reaches a pass rate of 31.5 percent. According to the source article, that makes it the strongest of all 60 tested models.

It also places Sol Pro ahead of standard Sol, which reaches 28.7 percent, and ahead of the best non-GPT score, Claude Opus 4.8 at 16.0 percent. The pass rate measures whether a model can finish a full multi-step analysis without errors and produce the correct final answer.

The comparison is based on the full 129-task suite. When standard models at their highest reasoning setting, "max", are compared with the Pro variants, the benefit is not evenly distributed across the lineup.

Luna Pro gains a full seven percentage points over its standard version. Sol Pro improves by less than three points over standard Sol (31.5 versus 28.7 percent). Terra Pro reaches 28.5 percent, putting the high-volume Pro variant close to standard Sol at 28.7 percent.
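The gaps quoted above are plain subtraction of reported pass rates. As a minimal sketch, the snippet below reproduces that arithmetic using only the four percentages given in this article; the dictionary layout and the `gap` helper are illustrative names, not anything taken from the paper. Luna is left out because the article gives its gain but not its absolute scores.

```python
# Pass rates in percent, exactly as quoted in the article.
# Luna and Luna Pro are omitted: only their seven-point gap is reported.
pass_rates = {
    "Sol Pro": 31.5,
    "Sol": 28.7,
    "Terra Pro": 28.5,
    "Claude Opus 4.8": 16.0,
}


def gap(model_a: str, model_b: str) -> float:
    """Percentage-point difference between two reported pass rates."""
    # Round to one decimal to hide floating-point noise in the subtraction.
    return round(pass_rates[model_a] - pass_rates[model_b], 1)


print("Sol Pro vs. Sol:", gap("Sol Pro", "Sol"))
print("Terra Pro vs. Sol:", gap("Terra Pro", "Sol"))
print("Sol Pro vs. Claude Opus 4.8:", gap("Sol Pro", "Claude Opus 4.8"))
```

On these figures, Sol Pro leads standard Sol by 2.8 points, Terra Pro trails standard Sol by 0.2 points, and Sol Pro leads the best non-GPT score by 15.5 points.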

That pattern suggests extra compute may help the lower tiers more than the already strongest standard model. It also makes Terra Pro especially interesting because it nearly matches the best standard flagship while staying aligned with the high-volume part of the GPT-5.6 lineup.

A possible shift for ChatGPT Pro

Until now, ChatGPT Pro has been described as one tier above everything else, with one top model rather than several parallel options. The benchmark table suggests a different direction: Pro could become a family of models instead of a single premium endpoint.

If that structure appears in ChatGPT, Pro users may be able to choose based on the task rather than always using the same top model. A user might want speed for routine queries, throughput for large business workloads, or maximum reasoning power for the hardest analyses.

That would be a significant change in how the Pro offering is organized. It would also make the naming of the Pro tier more closely match the rest of GPT-5.6, where Luna, Terra, and Sol already signal different tradeoffs.

Still, the paper stops short of confirming a product launch. The presence of model names in a benchmark table is evidence that these variants exist in the test context, but it is not the same as an announcement that they will ship in ChatGPT.

What remains unknown

The biggest missing detail is token usage for the Pro runs. For the standard GPT models, the paper reports average token usage as a rough proxy for compute cost. Sol at its highest setting averages about 33,200 tokens.

For the Pro rows, that figure is absent. The authors say no comparable token accounting was available. The source article also notes the possibility that OpenAI does not want to share those figures.

That omission matters because Pro performance gains are difficult to interpret without knowing the compute involved. A higher pass rate is meaningful, but cost and token use help explain how expensive that improvement may be to deliver.

For now, the key takeaway is narrow but important: an OpenAI genomics benchmark lists GPT-5.6 Luna Pro, Terra Pro, and Sol Pro before any public Pro lineup announcement. The results point to a possible future where ChatGPT Pro is not one model, but a set of choices shaped around speed, scale, and reasoning depth.