Lower Gemini 1.5 costs raise the stakes for AI developers

Google has released Gemini-1.5-Pro-002 and Gemini-1.5-Flash-002 with benchmark gains, lower latency, and higher rate limits. Gemini 1.5 Pro token prices are being cut by more than 50% for prompts under 128,000 tokens, with the new pricing taking effect on October 1, 2024.

WTF Index NEUTRAL

This is mainly a routine model update and pricing change, with only a mild tilt toward more capable and scalable AI systems.


Google is updating its Gemini 1.5 model lineup with two new versions aimed at making AI applications faster, more capable, and less expensive to run. The releases, Gemini-1.5-Pro-002 and Gemini-1.5-Flash-002, are positioned as improvements over earlier Gemini 1.5 models across performance, speed, and cost.

The changes matter because developers often choose AI models by weighing quality against operating expense. In Google's framing, the updated models are meant to improve that tradeoff: stronger results on key benchmarks, lower latency, higher rate limits, and sharply reduced Gemini 1.5 Pro token pricing.

What Google changed in Gemini 1.5

The two updated models are Gemini-1.5-Pro-002 and Gemini-1.5-Flash-002. According to Google, both show gains across a range of benchmarks, with particularly visible progress in math, long-context, and visual tasks.

Google has also adjusted the economics around Gemini 1.5 Pro. The company has reduced input and output token prices for Gemini 1.5 Pro by more than 50%, while also increasing rate limits for both updated models and reducing latency.

That combination points to a practical shift for teams building with the Gemini API, Google AI Studio, or Vertex AI. Better model performance is useful on its own, but lower costs and faster responses can affect whether a feature is viable at scale.

Benchmark gains focus on math, vision and code

Google reported improvements across several types of evaluations. On MMLU-Pro, a more challenging version of the MMLU benchmark, the models improved by about 7%.

The largest stated gain is in math performance. On the MATH and HiddenMath benchmarks, Google says performance increased by 20%.

The updates also include smaller but still relevant gains of 2-7% on visual understanding and Python code generation evaluations.

Taken together, the benchmark picture suggests Google is not presenting these Gemini 1.5 updates as a narrow tuning pass. The improvements span reasoning-oriented tasks, multimodal use, and code-related work, which are all common targets for developers building AI products.

Why lower latency and rate limits matter

Model quality is only one part of an AI system. For many applications, response time and usage capacity are just as important. Google says the new Gemini models reduce latency and increase rate limits for both Gemini-1.5-Pro-002 and Gemini-1.5-Flash-002.

Lower latency can make AI features feel more responsive. Higher rate limits can help teams serve more requests without hitting usage ceilings as quickly. The source does not provide exact latency or rate-limit numbers, but it does identify both as areas Google has improved.
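Rate limits are enforced by the provider, but applications often also throttle on the client side so that bursts stay under whatever cap their account has. The sketch below shows a sliding-window requests-per-minute limiter. The class and function names and the limits of 5 and 10 requests per minute are hypothetical, chosen only to show that a higher cap lets more of the same burst through; the code is not part of any Google SDK and does not reflect Gemini's actual limits.

```python
import time
from collections import deque
from typing import Callable, Deque


class RequestsPerMinuteLimiter:
    """Hypothetical client-side sliding-window limiter; not part of any Google SDK."""

    WINDOW_SECONDS = 60.0

    def __init__(self, max_per_minute: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if max_per_minute <= 0:
            raise ValueError("max_per_minute must be positive")
        self.max_per_minute = max_per_minute
        self._clock = clock
        self._sent: Deque[float] = deque()  # timestamps of accepted requests

    def try_acquire(self) -> bool:
        """Return True and record the request if it fits in the current window."""
        now = self._clock()
        # Forget requests that have aged out of the 60-second window.
        while self._sent and now - self._sent[0] >= self.WINDOW_SECONDS:
            self._sent.popleft()
        if len(self._sent) < self.max_per_minute:
            self._sent.append(now)
            return True
        return False


class FakeClock:
    """Manually advanced clock so the example is deterministic."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


def accepted_in_burst(max_per_minute: int, burst_size: int) -> int:
    """How many of `burst_size` simultaneous requests a given limit admits."""
    limiter = RequestsPerMinuteLimiter(max_per_minute, clock=FakeClock())
    return sum(limiter.try_acquire() for _ in range(burst_size))


# Hypothetical limits: doubling the per-minute cap doubles what a burst can push through.
print(accepted_in_burst(max_per_minute=5, burst_size=12))   # 5
print(accepted_in_burst(max_per_minute=10, burst_size=12))  # 10
```

In a real application a rejected request would usually be queued or retried later rather than dropped; that policy is left out to keep the sketch short.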

Cost is the clearest concrete change. The new pricing for Gemini 1.5 Pro takes effect on October 1, 2024, for prompts under 128,000 tokens. Google also expects development costs with Gemini to fall further when the price reduction is combined with context caching.
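The 128,000-token cutoff means the headline discount does not apply evenly to every workload. The sketch below estimates a bill under a two-tier price list. All prices are hypothetical placeholders (a cut of exactly 50% on the short-prompt tier, with the long-prompt tier unchanged), not Google's published rates. The names `TierPrices`, `request_cost`, and `workload_cost` are illustrative, and billing a prompt of exactly 128,000 tokens at the long-prompt tier is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# Per the article, the Gemini 1.5 Pro price cut applies to prompts under 128,000 tokens.
# Assumption: a prompt of exactly 128,000 tokens is billed at the long-prompt tier.
PROMPT_TIER_THRESHOLD = 128_000


@dataclass(frozen=True)
class TierPrices:
    """Hypothetical USD prices per 1 million tokens (placeholders, not Google's rates)."""
    input_per_million: float
    output_per_million: float


def request_cost(prompt_tokens: int, output_tokens: int,
                 short_tier: TierPrices, long_tier: TierPrices) -> float:
    """Cost of one request, choosing the price tier by prompt length."""
    tier = short_tier if prompt_tokens < PROMPT_TIER_THRESHOLD else long_tier
    return (prompt_tokens * tier.input_per_million
            + output_tokens * tier.output_per_million) / 1_000_000


def workload_cost(requests: Iterable[Tuple[int, int]],
                  short_tier: TierPrices, long_tier: TierPrices) -> float:
    """Total cost of a list of (prompt_tokens, output_tokens) pairs."""
    return sum(request_cost(p, o, short_tier, long_tier) for p, o in requests)


# Hypothetical workload: 1,000 short prompts plus 10 very long ones.
workload = [(10_000, 500)] * 1_000 + [(200_000, 1_000)] * 10

long_tier = TierPrices(input_per_million=8.00, output_per_million=24.00)  # unchanged
old_short = TierPrices(input_per_million=4.00, output_per_million=12.00)
new_short = TierPrices(input_per_million=2.00, output_per_million=6.00)   # 50% cut

old_total = workload_cost(workload, old_short, long_tier)
new_total = workload_cost(workload, new_short, long_tier)
savings = 1 - new_total / old_total

print(f"old: ${old_total:.2f}  new: ${new_total:.2f}  savings: {savings:.1%}")
```

With these placeholder numbers the bill drops from $62.24 to $39.24, a saving of about 37% rather than 50%, because the ten long-prompt requests stay on the unchanged tier. The more of a workload's traffic sits under the cutoff, the closer its savings get to the headline figure.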

For developers, that makes the updated Gemini 1.5 models relevant not only as benchmark upgrades, but also as infrastructure choices. If the same workload can run with lower token prices and faster responses, the model becomes easier to justify for production use.

Access and availability

Users can access Gemini-1.5-Pro-002 and Gemini-1.5-Flash-002 through Google AI Studio, the Gemini API, and Vertex AI for Google Cloud customers.

Google has also released an improved version of the Gemini 1.5 experimental model announced in August. That updated model is called Gemini-1.5-Flash-8B-Exp-0924, and it is described as offering further enhancements for text and multimodal applications.

A chat-optimized version of Gemini-1.5-Pro-002 is also planned for Gemini Advanced users. The source says that version is coming soon but does not give a specific release date.

The broader signal

The Gemini 1.5 update is a reminder that AI model competition is not only about headline capability. Pricing, throughput, latency, long-context performance, and developer feedback all shape which models get adopted in real products.

Google says it refined the models’ output style based on developer feedback, with the goal of making responses more helpful while maintaining content safety standards. The company also says the refinements are aimed at more precise and cost-effective use.

For teams already working with Gemini, the practical questions are straightforward: whether the new Gemini 1.5 Pro and Gemini 1.5 Flash models perform better on their own workloads, whether the lower token pricing changes product economics, and whether the reduced latency improves user experience. The source does not answer those application-specific questions, but it does show where Google wants developers to look first.