The Decoder September 24, 2025 TERMINATOR

Alibaba pushes Qwen3-Max toward coding and automation

Alibaba has released Qwen3-Max, its largest and most capable AI model so far, with more than one trillion parameters and training on 36 trillion tokens. The model is aimed at software development, automation, tool use, and long-context work, while API access is available through Alibaba Cloud Model Studio.

WTF Index TERMINATOR

◄ Terminator 2 Idiocracy 0 ►

A larger model aimed at coding, automation, tool use, and agent workflows mildly increases AI capability and autonomy, but this is mostly a routine product launch.

Alibaba pushes Qwen3-Max toward coding and automation

Alibaba has introduced Qwen3-Max, a new flagship AI model built to expand the company’s Qwen lineup into larger-scale coding, automation, and agent workflows. The release puts the model at the center of Alibaba’s push to compete on practical developer tasks, not just general chat performance.

The headline figure is scale: Qwen3-Max has more than one trillion parameters and was trained on 36 trillion tokens. Alibaba describes it as the biggest and most capable model in the Qwen family to date.

A larger Qwen model built for practical work

Qwen3-Max uses the same architecture as the Qwen3 series, which was first announced in April. The difference is size. Alibaba has scaled the system to over a trillion parameters while continuing to use a Mixture of Experts approach.

In a Mixture of Experts model, only part of the full parameter set is active during inference. That matters because it lets a very large system route work through selected expert components instead of engaging everything at once. The source article does not provide the active parameter count, but it does make clear that the design is intended to support a much larger overall model.

A preview version, Qwen3-Max-Instruct, launched earlier this month. That preview already reached third place on the Text Arena Leaderboard, ahead of GPT-5-Chat, OpenAI's GPT-5 variant running with reasoning set to low.

Training focused on stability and long context

Alibaba says the Qwen3-Max training run was unusually stable. The company reported a smooth loss curve and said there were no sudden spikes, rollbacks, or major adjustments during training.

The company also points to efficiency gains. Optimized parallelization made training Qwen3-Max-Base 30 percent more efficient compared to Qwen2.5-Max-Base. For a model at this scale, that kind of improvement matters because training efficiency affects how quickly teams can iterate and how reliably they can operate large systems.

Long-context handling is another major part of the release. Alibaba says new techniques tripled throughput for long-context training and made it possible to handle input sequences up to one million tokens. That kind of context window is especially relevant for software projects, document-heavy workflows, and automation tasks where the model may need to reason over large amounts of input at once.

The team also added automatic monitoring and recovery tools. According to Alibaba, those tools reduced downtime from hardware failures to just a fifth of what was seen with the previous generation.

Coding and agent benchmarks lead the story

The Qwen team reports strong results for Qwen3-Max-Instruct across knowledge, reasoning, programming, instruction following, human preference alignment, agent tasks, and multilingual understanding. The largest gains, however, are in programming and agent abilities.

That focus matches how Alibaba is positioning the model. Rather than presenting Qwen3-Max only as a broad chatbot upgrade, the company is emphasizing real software development and automation.

On SWE-Bench Verified, a benchmark for fixing real-world software bugs, Qwen3-Max-Instruct scored 69.6. Alibaba says that result puts it among the top-performing models available.

On Tau2-Bench, which tests how well models can call external tools and handle complex workflows, Qwen3-Max-Instruct scored 74.8. The source article says that score puts it ahead of Claude 4 Opus and Deepseek V3.1.

Those two benchmark areas are important for different reasons:

Software repair: SWE-Bench Verified evaluates whether a model can deal with real-world bugs rather than isolated coding prompts.
Tool use: Tau2-Bench looks at whether a model can use external tools and manage multi-step workflows.
Automation readiness: Together, the results point toward models that can help with applied developer and agent tasks.

A reasoning version is still on the way

Alibaba is also training a reasoning-focused version called Qwen3-Max-Thinking. That model has not been fully released yet, but the company plans to release it soon.

According to the source article, Qwen3-Max-Thinking has already maxed out the AIME 25 and HMMT math benchmarks, matching results from GPT-5 Pro and Grok 4. The model uses a code interpreter and extra compute during testing.

That extra compute is part of the model’s reasoning setup. Test-time compute lets the system run several solutions at once and choose the best one. AIME is described in the source as a tough student math competition and is often used as a benchmark for logical reasoning.

The separation between Qwen3-Max-Instruct and Qwen3-Max-Thinking also suggests a clear product distinction. One version is already available for instruction-following and applied tasks, while the reasoning-focused variant is being prepared for more compute-heavy problem solving.

Access through Qwen Chat and Alibaba Cloud

Qwen3-Max-Instruct is available on Qwen Chat. Like many other Qwen models, it is not open-source.

Developers can access the model through Alibaba Cloud Model Studio. The interface is compatible with OpenAI APIs, which may make integration easier for teams that already build around that style of API workflow.

Qwen3-Max is also part of a wider expansion of Alibaba’s AI lineup. The company recently introduced Qwen-3-TTS-Flash for voice generation, Qwen-Image-Edit for image editing, Qwen3-Next for faster text processing, and Qwen3-Omni, a multimodal model for text, image, and audio tasks.

Taken together, the release shows Alibaba moving Qwen beyond a single flagship model story. Qwen3-Max is the large-model centerpiece, but the broader lineup now spans text, voice, image editing, and multimodal tasks. For developers watching the AI infrastructure market, the main signal is clear: Alibaba is pushing Qwen toward practical production use, especially where coding, tools, and automation are central.