Zhipu AI has put GLM-5 into the open-source field with an unusually direct claim: the model can compete with leading Western systems on coding, reasoning, and agent work. The release matters because it is not positioned as another chatbot. It is pitched as a foundation model built for longer plans, complex systems, and workflows that turn prompts into finished outputs.
A larger model aimed at real work
GLM-5 has 744 billion parameters, with 40 billion active at any given time. It uses a Mixture-of-Experts architecture, making it nearly twice the size of GLM-4.5, which had 355 billion parameters.
The training data also increased from 23 to 28.5 trillion tokens. According to Z.ai, GLM-5 uses Deepseek Sparse Attention (DSA), which is intended to reduce deployment costs while preserving performance on long contexts.
Zhipu AI’s broader message is that foundation models have to move past conversation and into task execution. GLM-5 is presented as a model for building complex systems and planning over long periods, a direction also being pursued by Anthropic, Google, and OpenAI.
The model weights are available under the MIT license. That is important for the open-source AI market because the license is described as one of the most permissive open-source licenses.
Benchmarks show strong coding and agent results
According to Zhipu AI, GLM-5 leads all open-source models in the company’s published benchmarks for reasoning, coding, and agent tasks. The strongest part of the pitch is not only that the model scores well, but that it performs well on tests designed to measure sustained task management.
One example is Vending Bench 2, where a model has to operate a simulated vending machine business for an entire year. GLM-5 ended the benchmark with $4,432 in its account. Claude Opus 4.5 finished with $4,967.
Andon Labs ran the benchmark. The same group helped Anthropic with "Project Vend," the real-world experiment where Claude Sonnet 3.7 attempted to run an actual self-service store and lost money.
On SWE-bench Verified, GLM-5 reaches 77.8 percent. That score places it ahead of Deepseek-V3.2 and Kimi K2.5, while still below Claude Opus 4.5’s 80.9 percent.
Zhipu AI also says GLM-5 beats all tested proprietary models on BrowseComp, a benchmark for agent-based web search and context management. The source notes that this claim has not yet been confirmed by independent testing.
That caveat matters. Benchmark performance does not always match day-to-day usefulness, and the source warns that this gap can be wider with open-source models. Strong test results can still hide weaker practical usability compared with proprietary systems.
Documents, coding agents, and local deployment
GLM-5 is also being presented as a production tool for documents. Zhipu AI says the model can take text and other sources and turn them directly into polished .docx, .pdf, and .xlsx files.
The official Z.ai app includes an agent mode with built-in document creation skills. The examples given include sponsorship proposals and financial reports.
The model also works with OpenClaw, described in the source as a new and controversial framework for cross-app and cross-device workflows. It also supports popular coding agents including Claude Code, OpenCode, and Roo Code.
For deployment, GLM-5 runs on Nvidia GPUs as well as chips from Huawei Ascend, Moore Threads, Cambricon, and other Chinese manufacturers. Zhipu AI says kernel optimization and model quantization make "reasonable throughput" possible on these alternative chips.
This hardware support has clear relevance in China, where US export restrictions have made Nvidia hardware hard to obtain. For local deployment, GLM-5 supports the vLLM and SGLang inference frameworks, with setup instructions available in the GitHub repository.
The tooling around GLM-5 is part of the release
Zhipu AI also open-sourced slime, the reinforcement learning framework used to retrain GLM-5. The framework is aimed at a difficult bottleneck: applying reinforcement learning to large language models remains slow.
Slime uses an asynchronous architecture that pairs the Megatron training framework with the SGLang inference engine. It supports Qwen3, Deepseek V3, and Llama 3, in addition to Zhipu’s own models.
That makes the release broader than a single model checkpoint. Zhipu AI is also publishing infrastructure that may matter to teams trying to train or retrain large models for agent behavior.
Chinese AI labs are pushing faster
The timing of GLM-5 is also part of the story. A Stanford analysis found Chinese AI models typically run about seven months behind their US counterparts. GLM-5 arrived roughly three months after the latest flagships from Anthropic, Google, and OpenAI, cutting that gap in half.
Zhipu AI recently released GLM-4.7, GLM-5’s direct predecessor. That model introduced a "Preserved Thinking" feature that carries thought processes across long dialogs. GLM-5 moves the SWE-bench score from 73.8 to 77.8 percent.
Competition inside China is also intensifying. Moonshot AI released Kimi K2.5, which can coordinate up to 100 sub-agents working in parallel through "Agent Swarms" and also records top scores on agent benchmarks.
Both GLM-5 and Kimi K2.5 use Mixture-of-Experts architectures. Both are pursuing autonomous AI agents that can plan and execute across long time horizons.
The source points to one trend across the latest GLM-5 and Kimi K2.5 benchmarks: Deepseek is losing ground. Deepseek-V3.2 trails both models by a clear margin on several agent and coding tests. According to the South China Morning Post, Deepseek’s next big model, a one-trillion-parameter system, has been delayed because of the growing model size.
For now, GLM-5’s biggest claim is not that it has settled the race. It is that an MIT-licensed open-source model from Zhipu AI is close enough to leading proprietary systems on some visible tests to make the next phase of agent AI more competitive.