The Decoder, November 8, 2025

Kimi K2 Thinking raises the bar for open-source AI agents

Moonshot AI has unveiled Kimi K2 Thinking, an open-source language model built for long, tool-driven reasoning tasks. The model combines a one-trillion-parameter design, a 256,000-token context window, and benchmark results that Moonshot AI says set new records in agentic reasoning and coding.

The story emphasizes increasingly powerful open-source agentic AI that can use tools autonomously across long task chains.

Moonshot AI’s Kimi K2 Thinking is being positioned as a major step for open-source AI systems that can reason, browse, code, and use tools across long task chains. The Chinese AI company calls it the "best open-source thinking model," and its launch puts fresh attention on how open models are moving into territory often associated with leading commercial systems.

A model built for multi-step tool use

Kimi K2 Thinking is designed as a "thinking agent." In practical terms, that means it is not only meant to answer a prompt directly, but to break a complex task into steps, use available tools, check results, and continue working through the problem.

The model uses "test-time scaling," a runtime approach that increases the number of reasoning tokens and tool calls spent while a task is being processed. According to Moonshot AI, Kimi K2 Thinking can carry out between 200 and 300 tool calls in sequence without human help, while maintaining logical consistency over hundreds of steps.

That matters because many useful AI tasks are not single-turn questions. Research, coding, web browsing, mathematical work, and document creation often require loops: form a hypothesis, search or calculate, inspect the result, revise the plan, and continue. Moonshot AI is presenting K2 Thinking as a system built specifically for that kind of workflow.
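The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not Moonshot AI's implementation: `run_agent`, `AgentRun`, `toy_policy`, and `calculate` are hypothetical names, the policy stands in for the model, and the default budget of 300 tool calls simply mirrors the upper figure the company cites.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical types for illustration only.
Step = Tuple[str, str, str]   # (tool_name, tool_input, tool_result)
Action = Tuple[str, str]      # (tool_name, tool_input); "finish" ends the run


@dataclass
class AgentRun:
    answer: Optional[str] = None
    steps: List[Step] = field(default_factory=list)


def run_agent(task: str,
              policy: Callable[[str, List[Step]], Action],
              tools: Dict[str, Callable[[str], str]],
              max_tool_calls: int = 300) -> AgentRun:
    """Plan, call a tool, inspect the result, and repeat until done."""
    run = AgentRun()
    while True:
        # The policy (a stand-in for the model) sees the task and all prior results.
        tool_name, tool_input = policy(task, run.steps)
        if tool_name == "finish":
            run.answer = tool_input
            return run
        if len(run.steps) >= max_tool_calls:
            return run  # budget exhausted without an answer
        result = tools[tool_name](tool_input)
        run.steps.append((tool_name, tool_input, result))


def calculate(expr: str) -> str:
    """Toy tool: only understands 'a * b' with integers."""
    a, b = expr.split("*")
    return str(int(a) * int(b))


def toy_policy(task: str, history: List[Step]) -> Action:
    """Toy policy: calculate once, then report the result."""
    if not history:
        return ("calculate", "6 * 7")
    return ("finish", history[-1][2])


run = run_agent("What is 6 times 7?", toy_policy, {"calculate": calculate})
print(run.answer, len(run.steps))
```

In a real agent the policy would be a model call and the tools would include search, browsing, and code execution. The point of the sketch is only that every tool result is fed back into the next decision.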

The scale behind Kimi K2 Thinking

K2 Thinking is a one-trillion-parameter model. It uses a mixture-of-experts architecture, so only 32 billion parameters, about 3.2 percent of the total, are active at a time. The model also has a context window of 256,000 tokens, allowing it to work with large amounts of information in a single session.
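To make the mixture-of-experts idea concrete, here is a small Python sketch. The two parameter counts come from the article; everything else is an assumption for illustration. In particular, `top_k_route` shows generic top-k routing with softmax weights, and the article does not say how many experts K2 Thinking has or how its router works.

```python
import math

TOTAL_PARAMS = 1_000_000_000_000  # one trillion, from the article
ACTIVE_PARAMS = 32_000_000_000    # 32 billion, from the article

# Share of the model that does work at any one time.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS


def top_k_route(router_scores, k):
    """Generic top-k routing (an assumption, not Moonshot AI's router):
    keep the k highest-scoring experts and softmax-normalize their weights."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    chosen = ranked[:k]
    peak = max(router_scores[i] for i in chosen)  # subtract max for stability
    exps = {i: math.exp(router_scores[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


# Four hypothetical experts; only two are used for this input.
weights = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
print(f"{active_fraction:.1%}", sorted(weights))
```

Experts that are not selected do no computation for that input, which is why a very large model can run with a much smaller active footprint.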

Moonshot AI has also applied quantization-aware training to make the model more practical to run. The company says this compresses parts of the model, reduces memory needs, and roughly doubles text generation speed compared with the uncompressed version.

Importantly, Moonshot AI says the published benchmark results already use this optimized model. That means the reported performance is tied to the version intended to be more usable, rather than only to an uncompressed research version.
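The memory effect of compression can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope calculation under loud assumptions: the article does not state the bit widths involved, so the 16-bit and 4-bit figures are illustrative only, and applying one bit width to every parameter overstates the saving because Moonshot AI says only parts of the model are compressed. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: int, bits_per_weight: int) -> float:
    """Storage for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 1_000_000_000_000  # one trillion, from the article

# Assumed bit widths for illustration; the article does not specify them.
uncompressed_gb = weight_memory_gb(TOTAL_PARAMS, 16)
compressed_gb = weight_memory_gb(TOTAL_PARAMS, 4)

print(uncompressed_gb, compressed_gb)
```

The estimate ignores activations, the key-value cache for the 256,000-token context, and any layers left at higher precision, so real memory needs would be higher.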

Benchmark results emphasize reasoning and coding

Moonshot AI says K2 Thinking has reached record-setting results across reasoning, coding, and agent-based tests. On Humanity's Last Exam (HLE) with tools, the model scored 44.9 percent, which the company describes as a new high for that benchmark.

On BrowseComp, a benchmark focused on agentic search and browsing, K2 Thinking reached 60.2 percent. The source article notes that this is more than double the human baseline of 29.2 percent.

Its coding results are also central to Moonshot AI’s claims. K2 Thinking scored 71.3 percent on SWE-Bench Verified and 61.1 percent on SWE-Multilingual. Moonshot AI's comparison chart shows the model ahead of some leading commercial models, including GPT-5 and Claude Sonnet 4.5, as well as Chinese rival DeepSeek-V3.2, in certain tests.

Moonshot AI also highlights front-end and application-building examples. In one demo, Kimi K2 Thinking generated a functional Word-style document editor from a single prompt. The company says the model performs strongly on HTML, React, and other front-end tasks, producing responsive apps from prompts.

Agentic search and long reasoning loops

Beyond coding, K2 Thinking is built for research workflows that combine search, browsing, reasoning, and programming. The model can cycle through thinking, searching, browsing, thinking again, and programming while checking evidence and building an answer.

Moonshot AI’s examples are meant to show that this is not limited to short reasoning traces. In one case, the model solved a PhD-level math problem using a chain of 23 nested reasoning steps and tool calls. It researched relevant literature, ran calculations, and reached the correct answer.

In another demo, K2 Thinking handled a research task that required identifying a person from several criteria, including a college degree, NFL career, and roles in movies and TV. After searching across multiple sources, the model identified former NFL player Jimmy Gary Jr.

These examples underline the same point: Moonshot AI is not pitching K2 Thinking only as a chatbot. It is framing the model as an open-source agent that can coordinate multiple actions across a longer chain of work.

Access, licensing, and the wider open-source race

K2 Thinking is available now on kimi.com and via API. The full Agentic Mode is coming soon, while the current chat mode offers a streamlined toolset for faster responses. Model weights are available on Hugging Face.

The source article also notes two important details around cost and licensing. CNBC, citing a source familiar with the matter, reported that training the Kimi K2 model cost around $4.6 million. Simon Willison noted that the model’s MIT license includes a modification: companies using it commercially must display the Kimi K2 name prominently if they generate over $20 million in monthly revenue or have more than 100 million monthly active users.

That licensing clause may reflect concern that U.S. companies could adopt Chinese open-source models, often because they are cheaper, without revealing their use in commercial products.
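The attribution condition, as summarized above, reduces to a simple either-or test. The following Python sketch encodes that paraphrase only; it is not the license text and not legal guidance, and `must_display_kimi_k2` is a hypothetical helper name.

```python
# Thresholds as reported in the article's summary of the modified MIT license.
REVENUE_THRESHOLD_USD = 20_000_000   # monthly revenue
MAU_THRESHOLD = 100_000_000          # monthly active users


def must_display_kimi_k2(monthly_revenue_usd: float,
                         monthly_active_users: int) -> bool:
    """True if either threshold is exceeded, per the article's paraphrase."""
    return (monthly_revenue_usd > REVENUE_THRESHOLD_USD
            or monthly_active_users > MAU_THRESHOLD)


# A small startup versus a large consumer product (made-up numbers).
print(must_display_kimi_k2(500_000, 2_000_000))
print(must_display_kimi_k2(5_000_000, 150_000_000))
```

Because the two conditions are joined by "or," a product with modest revenue but a very large user base would still fall under the requirement.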

Kimi K2 Thinking also follows earlier attention around the standard Kimi K2 model. Moonshot AI drew notice in July when that model competed with top systems such as Claude Sonnet 4 and GPT-4.1. The earlier Kimi K2 was not given special reasoning training, but it was tuned for agentic tasks and tool use, and it posted strong results in math, science, and multilingual benchmarks.

With K2 Thinking, Moonshot AI is pushing that direction further. The key claim is not simply that an open-source model can answer difficult questions, but that it can sustain tool-based reasoning over many steps while remaining useful for coding, browsing, and structured research.