The Atlantic hurricane season is winding down, and forecasters are already studying which tools helped most. This season, one result stands out: Google DeepMind’s Weather Lab delivered unusually strong cyclone forecasts in its first season of public track guidance.
The contrast was sharp. Google’s AI forecasting service performed very well, while the Global Forecast System model, operated by the US National Weather Service, struggled badly in preliminary comparisons.
An AI model moved to the front
Google DeepMind’s Weather Lab only started releasing cyclone track forecasts in June, but its early performance was strong enough to draw attention from forecasters. The comparison comes from preliminary number crunching by Brian McNoldy, a senior researcher at the University of Miami.
The official model performance data from the National Hurricane Center will not be published for a few months. Even so, the early numbers give a clear first look at how major forecast systems handled the season.
The analysis covered all 13 named storms in the Atlantic Basin this season. It measured mean position error across forecast times from 0 to 120 hours, or five days. In this kind of comparison, the model with the lower error is the better performer.
On that measure, the Google DeepMind model, labeled GDMI, was the best performer at nearly all forecast hours. The GFS, labeled AVNI, was by far the worst-performing model in the comparison.
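For readers who want to see how a metric like this can be computed, here is a minimal sketch. It assumes that position error is the great-circle distance between a forecast storm center and the observed center at the same valid time, averaged separately for each forecast lead time. The function names and sample coordinates are hypothetical and are not taken from McNoldy's analysis or from National Hurricane Center verification code.

```python
import math
from collections import defaultdict

# Mean Earth radius in nautical miles (6371.0088 km / 1.852 km per nautical mile).
EARTH_RADIUS_NM = 3440.065


def great_circle_nm(lat1, lon1, lat2, lon2):
    """Haversine distance between two lat/lon points (degrees), in nautical miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def mean_position_error(cases):
    """Average track error per lead time.

    cases: iterable of (lead_hour, fcst_lat, fcst_lon, obs_lat, obs_lon).
    Returns a dict mapping lead_hour to mean error in nautical miles.
    """
    errors = defaultdict(list)
    for lead, f_lat, f_lon, o_lat, o_lon in cases:
        errors[lead].append(great_circle_nm(f_lat, f_lon, o_lat, o_lon))
    return {lead: sum(vals) / len(vals) for lead, vals in sorted(errors.items())}


# Made-up illustrative values, not data from this season.
sample = [
    (0, 25.0, -70.0, 25.0, -70.0),     # perfect initial position
    (120, 30.0, -75.0, 31.0, -75.0),   # forecast one degree of latitude off
    (120, 32.0, -60.0, 32.0, -60.0),   # perfect five-day forecast
]
result = mean_position_error(sample)
```

A lower value at a given lead time means the model's forecast centers were, on average, closer to where the storms actually went.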
The five-day gap was large
The clearest difference appeared at the five-day range. Google’s forecast had an error of 165 nautical miles. The GFS model had an error of 360 nautical miles, more than twice as large.
That gap matters because hurricane track forecasts are decision tools. When one model is far outside the pack, forecasters may give it much less weight than models that are consistently closer to observed storm tracks.
The early comparison also included a dotted black line showing the average forecast error for official forecasts from the 2022 to 2024 seasons. Google DeepMind’s performance was not merely better than the struggling GFS. It also regularly beat the official National Hurricane Center forecast, labeled OFCL.
That is notable because the official forecast is not a single automated model output. It is produced by human experts who evaluate a broad set of model data. Google’s AI-based model also beat respected consensus products, including TVCN and HCCA.
What the comparison does and does not show
The preliminary comparison did not include the European Centre for Medium-Range Weather Forecasts model, which is often viewed as the gold standard among traditional, physics-based systems. The source notes, however, that the ECMWF model typically does not outperform the hurricane center or consensus models on hurricane track forecasts.
That context makes the Google DeepMind result harder to dismiss as a narrow or isolated win. The AI model was compared against official forecasts, consensus products, and the US Global Forecast System, and it emerged at or near the top across the forecast window.
The model also did exceptionally well at intensity forecasting. Intensity forecasting deals with changes in hurricane strength, a different challenge from predicting the track of a storm's center. In its first season, the DeepMind model performed strongly on both track and intensity.
Still, the source treats the findings as early. The National Hurricane Center’s official comparison data is still pending, and the current analysis is preliminary. The main takeaway is not that every future forecast will be led by one AI model, but that AI weather models have now shown performance that forecasters cannot ignore.
Why speed changes the forecast workflow
One reason AI weather models are attracting attention is speed. Traditional physics-based models run on some of the most expensive and advanced supercomputers in the world. AI-based systems can produce forecasts much more quickly.
Michael Lowry, a hurricane specialist and author of the Eye on the Tropics newsletter, highlighted that point in the source article. He also noted that data-driven models with neural network architectures can learn from mistakes and adjust.
That difference could change how forecasters use model guidance. If AI models remain accurate while producing forecasts quickly, they may become a more central part of hurricane analysis. The source also notes that these systems are relatively new and likely have room to improve.
For forecasters used to traditional physics-based guidance, the first-season performance of Google DeepMind’s model was striking. The article’s broader implication is straightforward: AI weather models are moving from experimental tools to systems that may influence routine hurricane forecasting.
The GFS problem remains unresolved
The poor showing by the GFS is harder to explain. In past seasons the model was still worth consulting, even when it trailed competitors. This season, according to the source, forecasters often disregarded it.
Lowry wrote that it was not immediately clear why the GFS performed so poorly. Some people speculated that a lapse in data collection from DOGE-related government cuts this year could have contributed, but Lowry also noted that such a factor would presumably have affected other global physics-based models, not just the American GFS.
The source also points to the US government shutdown as a reason answers may not come soon. It further suggests that an upgrade of the model’s dynamical core, which began in 2019, has largely failed to deliver the needed improvement.
The season therefore leaves forecasters with two linked questions. One is how fast AI hurricane models will improve after such a strong early showing. The other is what must change for the GFS to regain credibility in major storm forecasting.