Nvidia has laid out a longer view of its AI hardware plans, giving developers, data center operators and AI companies a clearer sequence of chips to watch through 2028. At Nvidia’s GTC 2025 conference in San Jose, California, CEO Jensen Huang introduced new AI-accelerating GPUs, added details about already announced systems, and connected the roadmap to a broader vision for AI agents and robots.
The announcements move in stages. Blackwell Ultra B300 is planned for the second half of 2025. Vera Rubin is scheduled for the second half of 2026. Rubin Ultra is expected to follow in the second half of 2027. A later architecture, Feynman, is planned for 2028.
A clearer path after Blackwell
The nearest product in the roadmap is Blackwell Ultra B300. Nvidia plans to launch it in the second half of 2025, positioning it as a step beyond the current Blackwell B200 configuration.
Blackwell Ultra B300 includes two GPUs and delivers 15 petaflops of dense FP4 compute performance per chip. In a full NVL72 rack, Blackwell Ultra reaches 1.1 exaflops of dense FP4 inference compute, which is 1.5 times the figure for the current Blackwell B200 configuration.
Memory is also part of the upgrade. Each B300 GPU has 288GB of HBM3e memory, compared with Blackwell’s 192GB. For AI systems, that additional memory matters because model work depends not only on raw compute but also on how much fast memory is available close to the processors.
FP4 is a 4-bit floating-point format used for representing and processing numbers within AI models. It appears repeatedly in this roadmap because Nvidia is emphasizing inference performance, the stage where trained models are used to generate outputs.
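To make the idea of a 4-bit floating-point number concrete, here is a minimal Python sketch that decodes every possible 4-bit code into its value. It assumes the E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias of 1), which is a common FP4 encoding; Nvidia's announcement as described here does not specify the bit layout, so treat that choice and the function names as illustrative assumptions.

```python
# Assumption: E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit, bias 1).
# The article only says FP4 is a 4-bit floating-point format.

def decode_e2m1(code: int) -> float:
    """Decode a 4-bit E2M1 code (0..15) into the real value it represents."""
    if not 0 <= code <= 0xF:
        raise ValueError("code must fit in 4 bits")
    sign = -1.0 if code & 0b1000 else 1.0
    exponent = (code >> 1) & 0b11
    mantissa = code & 0b1
    if exponent == 0:
        # Subnormal range: no implicit leading 1, so only 0 and 0.5 are possible.
        magnitude = mantissa * 0.5
    else:
        # Normal range: implicit leading 1, one fractional bit worth 0.5.
        magnitude = (1.0 + mantissa * 0.5) * 2.0 ** (exponent - 1)
    return sign * magnitude


def fp4_values() -> list[float]:
    """Return the distinct values representable under this layout, sorted."""
    return sorted({decode_e2m1(code) for code in range(16)})


if __name__ == "__main__":
    print(fp4_values())
```

Under this assumed layout only 15 distinct values exist, ranging from -6 to 6, which is why FP4 hardware is typically paired with scaling factors and used mainly for inference rather than for high-precision work.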
Vera Rubin brings a new CPU and larger rack performance
The central update was Vera Rubin, which had first been teased at Computex 2024 and is now scheduled for the second half of 2026. The platform is named after astronomer Vera Rubin, and its Rubin GPU pairs with a custom Nvidia-designed CPU called Vera.
Vera Rubin will feature 288 gigabytes of memory. Nvidia says it will deliver performance gains over Grace Blackwell, especially for AI training and inference. The chip places two GPUs together on one die, and each chip delivers 50 petaflops of FP4 inference performance.
The rack-level numbers are larger. In a full NVL144 rack, Vera Rubin delivers 3.6 exaflops of FP4 inference compute. Nvidia compares that with Blackwell Ultra’s 1.1 exaflops in a similar rack configuration, which puts the Vera Rubin rack at about 3.3 times the Blackwell Ultra figure.
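The 3.3 times comparison can be checked with a few lines of Python. This is only a sketch of the arithmetic using the two rack figures quoted above; the constant and function names are made up for illustration.

```python
# Rack-level FP4 inference figures quoted in the article, in exaflops.
BLACKWELL_ULTRA_NVL72_EF = 1.1
VERA_RUBIN_NVL144_EF = 3.6


def speedup(new: float, old: float) -> float:
    """Return how many times larger `new` is than `old`."""
    if old <= 0:
        raise ValueError("baseline must be positive")
    return new / old


ratio = speedup(VERA_RUBIN_NVL144_EF, BLACKWELL_ULTRA_NVL72_EF)

if __name__ == "__main__":
    print(f"Vera Rubin NVL144 vs Blackwell Ultra NVL72: {ratio:.1f}x")
```

The unrounded ratio is roughly 3.27, so the 3.3 figure is a rounded value rather than an exact multiple.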
Nvidia also gave specifics for the CPU. Vera has 88 custom ARM cores and 176 threads, or two threads per core. It connects to Rubin GPUs through a high-speed 1.8 TB/s NVLink interface, giving Nvidia a tightly coupled CPU-GPU platform rather than only a GPU upgrade.
Rubin Ultra pushes the rack design further
Rubin Ultra is planned for the second half of 2027. Huang described it as the next major step after Vera Rubin, with a larger rack configuration and a larger chip design.
Rubin Ultra will use the NVL576 rack configuration. Each GPU will include four reticle-sized dies, and each chip will deliver 100 petaflops of FP4 compute. That doubles the per-chip FP4 figure given for Vera Rubin.
At rack scale, Rubin Ultra reaches 15 exaflops of FP4 inference compute and 5 exaflops of FP8 training performance. Nvidia frames that as about four times more powerful than the Rubin NVL144 configuration.
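As a rough check on the "about four times" framing, the sketch below compares the rack-level gain with the per-chip gain, using only figures quoted in this article. The variable names are illustrative, and the comparison assumes the quoted FP4 numbers are measured on the same basis across generations.

```python
# FP4 figures quoted in the article.
RUBIN_NVL144_RACK_EF = 3.6         # exaflops, Vera Rubin NVL144 rack
RUBIN_ULTRA_NVL576_RACK_EF = 15.0  # exaflops, Rubin Ultra NVL576 rack
VERA_RUBIN_CHIP_PF = 50.0          # petaflops per chip
RUBIN_ULTRA_CHIP_PF = 100.0        # petaflops per chip

# Generation-over-generation gains at rack level and at chip level.
rack_gain = RUBIN_ULTRA_NVL576_RACK_EF / RUBIN_NVL144_RACK_EF
chip_gain = RUBIN_ULTRA_CHIP_PF / VERA_RUBIN_CHIP_PF

if __name__ == "__main__":
    print(f"Rack-level FP4 gain: {rack_gain:.1f}x")
    print(f"Per-chip FP4 gain:   {chip_gain:.1f}x")
```

The rack-level ratio works out to roughly 4.2, while the per-chip ratio is 2. On these numbers, about half of the rack-level gain would come from the larger rack configuration rather than from the chip alone.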
Memory grows sharply as well. Each Rubin Ultra GPU will include 1TB of HBM4e memory. Across the complete rack, Nvidia lists 365TB of fast memory. That pairing of compute and memory is the main technical story of Rubin Ultra: the company is not only increasing individual chip performance, but also scaling the surrounding rack design.
Feynman is named, but still mostly undefined
Huang also briefly introduced a next-generation GPU architecture called Feynman. It is named after American theoretical physicist Richard Feynman and is planned to arrive sometime in 2028.
Nvidia did not provide many details about Feynman’s design or capabilities. The one specific detail is that Feynman will use a “Vera” CPU rather than the “Richard” CPU that the naming pattern would suggest.
That makes Feynman more of a marker on the roadmap than a fully described product. It tells the market that Nvidia is already pointing beyond Rubin Ultra, but the company has not yet shared the same level of technical detail that it gave for Blackwell Ultra B300, Vera Rubin or Rubin Ultra.
Why Nvidia is tying chips to agents and robots
The hardware roadmap was presented alongside Huang’s view of where AI systems are going. He described data centers as “AI factories” that produce tokens, the units of data that AI models currently process, instead of physical objects.
He also discussed “physical AI,” a future in which humanoid robots perform human-like labor. Nvidia already provides software platforms that help robot-controlling AI models train in virtual worlds.
For the nearer term, Huang speculated that Nvidia chips will soon power “10 billion digital agents” doing helpful work for humans. He also said that by the end of 2025, 100 percent of Nvidia engineers will be assisted by AI models.
Taken together, the roadmap shows how Nvidia is presenting its chips: not simply as faster processors, but as infrastructure for AI training, inference, digital agents and robotics. The announced sequence gives a year-by-year outline, while the technical details show where Nvidia expects the largest gains to come from: more FP4 inference compute, more FP8 training performance, faster CPU-GPU links and much larger pools of fast memory.