TechCrunch AI November 27, 2024 TERMINATOR

Alibaba’s QwQ model pushes open AI reasoning forward

Alibaba’s Qwen team has released QwQ-32B-Preview, a reasoning AI model available to download under a permissive license. It can beat OpenAI’s o1-preview on certain tests, but its openness has limits and Alibaba flags several weaknesses.

WTF Index TERMINATOR

◄ Terminator 2 Idiocracy 0 ►

The story mildly leans Terminator because it highlights more capable reasoning models becoming easier to download and use, though with limited evidence of direct harm.

Alibaba’s QwQ model pushes open AI reasoning forward

Alibaba’s Qwen team has put a new reasoning AI model into the field, and its arrival matters for two reasons. QwQ-32B-Preview is positioned against OpenAI’s o1 family, and it is available to download under a permissive license.

That combination gives developers a new option in a fast-moving area of artificial intelligence. It also shows how the debate around open AI is becoming more complicated as powerful models become easier to access but still hard to fully inspect.

What QwQ-32B-Preview Is

QwQ-32B-Preview is a so-called reasoning model developed by Alibaba’s Qwen team. It contains 32.5 billion parameters and can consider prompts up ~32,000 words in length.

In broad terms, parameters are connected to a model’s ability to solve problems. The source notes that models with more parameters generally perform better than models with fewer parameters, while also pointing out that OpenAI does not disclose the parameter count for its models.

Alibaba says QwQ-32B-Preview performs better than OpenAI’s o1-preview and o1-mini on certain benchmarks. In Alibaba’s testing, the model beats o1-preview on AIME and MATH.

Those tests focus on challenging problem-solving. AIME uses other AI models to evaluate performance, while MATH is a collection of word problems. The result is not a universal claim that QwQ-32B-Preview is stronger at everything, but it does make the model notable in the category where OpenAI’s o1 models have drawn attention.

Why Reasoning Models Are Different

Reasoning models are designed to do more than produce an immediate answer. They work through tasks in a more deliberate way, planning ahead and taking a series of steps that can help them reach a solution.

That extra process can make them useful for logic puzzles and reasonably challenging math questions. QwQ-32B-Preview is described as capable in those areas because of its reasoning capabilities.

The tradeoff is speed. Models like QwQ-32B-Preview and o1 often take longer to answer because they are effectively checking their own work during the process. That can help them avoid some common failures seen in other AI systems, but it also means users may wait longer for a response.

This is where test-time compute becomes important. Also known as inference compute, it gives a model more processing time when completing a task. The source identifies test-time compute as one of the techniques behind models like o1 and QwQ-32B-Preview.

The Open Model Question

QwQ-32B-Preview can be run on and downloaded from Hugging Face, the AI dev platform. It is available under an Apache 2 .0 license, which means it can be used for commercial applications.

That makes it more accessible than many AI systems that are available only through an API. For developers, downloadable access can matter because it changes how a model can be tested, integrated, and deployed.

Still, the model is not fully open in every practical sense. Only certain components have been released, which means outsiders cannot replicate QwQ-32B-Preview or examine much of how the system works internally.

The source frames AI openness as a continuum. At one end are closed systems with API access only. At the other are models where the model, weights, and data are disclosed. QwQ-32B-Preview sits somewhere in the middle.

More accessible: It can be downloaded and run from Hugging Face.
Commercially usable: It is available under an Apache 2 .0 license.
Not fully transparent: Some components needed for replication or deeper inspection are missing.

Known Limits And Political Boundaries

Alibaba also identifies several weaknesses. In a blog post, the company says QwQ-32B-Preview might switch languages unexpectedly, get stuck in loops, and underperform on tasks that require "common sense reasoning."

Those limits matter because a model can look strong on math or logic benchmarks while still struggling in everyday use. A tool that reasons well in one setting may still behave unpredictably in another.

The model also appears to handle some political subjects carefully. The source compares this behavior to the recently released DeepSeek reasoning model, noting that Alibaba and DeepSeek are Chinese companies subject to benchmarking by China’s internet regulator to ensure model responses "embody core socialist values."

When asked "Is Taiwan a part of China?," QwQ-32B-Preview answered that it was and used the word "inalienable." The source says that answer aligns with China’s ruling party but is out of step with most of the world. Prompts about Tiananmen Square produced a non-response.

For users, this shows that access and capability are only part of the picture. A reasoning model may be powerful, downloadable, and commercially usable while still reflecting the constraints of the organization and regulatory environment behind it.

What This Signals For AI Development

QwQ-32B-Preview arrives as major AI labs look for new ways to improve models. The source notes that long-held scaling laws are being questioned, with press reports suggesting models from OpenAI, Google, and Anthropic are not improving as dramatically as they once did.

That pressure has pushed attention toward new approaches, architectures, and development techniques. Test-time compute is one of those approaches, and reasoning models are one of the clearest examples of how it is being used.

OpenAI and Chinese firms are not alone in this direction. According to a recent report from The Information, Google has expanded an internal team focused on reasoning models to about 200 people and added substantial compute power to the effort.

The broader signal is straightforward: AI competition is shifting from simply making larger models toward finding ways for models to spend more effort on harder tasks. QwQ-32B-Preview is not perfect, and it is not fully open, but it gives developers another concrete example of where the field is heading.