Moonshot AI Releases Kimi K2 Thinking, Its Strongest Open-Source Reasoning Model to Date

Gaorong Ventures (高榕创投) · November 7, 2025

Models are thinking agents.

On November 6, Moonshot AI officially released Kimi K2 Thinking — the most capable open-source reasoning model the company has built to date.

Kimi K2 Thinking is a new-generation Thinking Agent trained under the "Model as Agent" philosophy. It can natively "think while using tools," and it achieves state-of-the-art (SOTA) results on multiple benchmarks, including Humanity's Last Exam, BrowseComp (autonomous web browsing), and SEAL-0 (complex information gathering and reasoning). It also improves across the board in agentic search, agentic programming, writing, and general reasoning.

The Kimi K2 Thinking model can autonomously execute up to 300 rounds of tool calls without human intervention, maintaining stable multi-turn reasoning capabilities to help users solve more complex problems. This represents Moonshot AI's latest advance in Test-Time Scaling — achieving stronger agent and reasoning performance by simultaneously scaling both thinking tokens and tool-call rounds.
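One way to picture this interleaving of thinking and tool calls is a loop with a fixed round budget. The sketch below is illustrative only and is not Moonshot AI's implementation: `Step`, `run_agent`, `toy_policy`, and `toy_python` are hypothetical names, and the toy policy stands in for the model so that the example runs offline. Only the 300-round budget comes from the article.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

MAX_TOOL_ROUNDS = 300  # round budget quoted in the article


@dataclass
class Step:
    """One model turn: a tool request or a final answer (hypothetical shape)."""
    thought: str
    tool: Optional[str] = None    # name of the tool to call, if any
    argument: str = ""            # a single string argument, for simplicity
    answer: Optional[str] = None  # set once the policy decides it is done


def run_agent(
    policy: Callable[[List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_rounds: int = MAX_TOOL_ROUNDS,
) -> str:
    """Alternate thinking and tool calls until an answer or the budget runs out."""
    transcript = [f"question: {question}"]
    for _ in range(max_rounds):
        step = policy(transcript)
        transcript.append(f"thought: {step.thought}")
        if step.answer is not None:
            return step.answer
        if step.tool not in tools:
            # Feed the error back so the policy can try something else.
            transcript.append(f"error: unknown tool {step.tool!r}")
            continue
        observation = tools[step.tool](step.argument)
        transcript.append(f"{step.tool}({step.argument!r}) -> {observation}")
    return "no answer within the tool-call budget"


# Toy stand-ins so the sketch runs without a model or network access.
def toy_policy(transcript: List[str]) -> Step:
    observations = [line for line in transcript if " -> " in line]
    if not observations:
        return Step(thought="add the two numbers", tool="python", argument="19 + 23")
    result = observations[-1].split(" -> ")[-1]
    return Step(thought="the tool result answers the question", answer=result)


def toy_python(expression: str) -> str:
    left, right = expression.split("+")
    return str(int(left) + int(right))


print(run_agent(toy_policy, {"python": toy_python}, "What is 19 + 23?"))
```

In this sketch every observation is appended to the transcript, so each new round of thinking can build on what earlier tool calls returned. The budget simply bounds how long that loop may run.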

The Kimi K2 Thinking model is now live in regular chat mode on kimi.com and in the latest version of the Kimi mobile app. The underlying model for Kimi Agent mode will also be upgraded to Kimi K2 Thinking in the future, bringing its full multi-turn reasoning and tool-calling capabilities.

The Kimi K2 Thinking model API is accessible through the Kimi Open Platform (platform.moonshot.cn). For self-hosted deployment, the model is available for download on Hugging Face, ModelScope, and other platforms.
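For budgeting API usage, the per-token prices listed in the "Start Using Now" section of this article translate into a simple cost estimate. The sketch below is a rough calculator, not an official billing formula: `Pricing` and `estimate_cost_rmb` are hypothetical names, the cached-input price is read as RMB per million tokens, and it assumes cached input tokens are billed at the cache-hit rate instead of the regular input rate.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class Pricing:
    """Prices in RMB per million tokens, as quoted in the article."""
    input_miss: float    # input tokens not served from cache
    input_cached: float  # input tokens served from cache (assumed per million)
    output: float


STANDARD = Pricing(input_miss=4.0, input_cached=1.0, output=16.0)
TURBO = Pricing(input_miss=8.0, input_cached=1.0, output=58.0)


def estimate_cost_rmb(
    pricing: Pricing,
    input_tokens: int,
    cached_tokens: int,
    output_tokens: int,
) -> float:
    """Estimate the cost of one request; cached_tokens is a subset of input_tokens."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * pricing.input_miss
        + cached_tokens * pricing.input_cached
        + output_tokens * pricing.output
    )
    return total / TOKENS_PER_MILLION


# Hypothetical request: 200K input tokens (150K of them cached), 20K output tokens.
for name, pricing in (("standard", STANDARD), ("turbo", TURBO)):
    cost = estimate_cost_rmb(pricing, 200_000, 150_000, 20_000)
    print(f"{name}: RMB {cost:.2f}")
```

Because reasoning models tend to produce long outputs, the output price usually dominates such estimates, which is worth keeping in mind when choosing between the standard and Turbo endpoints.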

Comprehensive Improvements in Reasoning Performance

The Kimi K2 Thinking model demonstrates powerful reasoning and problem-solving capabilities on Humanity's Last Exam, a demanding closed-book academic test covering more than 100 specialized fields. When models are compared under the same conditions with access to tools (search, Python, and web browsing), Kimi K2 Thinking achieved a SOTA score of 44.9% on this benchmark.

Here's an example of the reasoning process for a humanities question from Humanity's Last Exam. In this example, Kimi K2 Thinking conducts 5 rounds of search and reasoning, building layer by layer with new information gathered in each round to ultimately arrive at the answer:


Major Gains in Autonomous Search and Browsing

In complex search and browsing scenarios, the Kimi K2 Thinking model also performs exceptionally well. BrowseComp is a benchmark released by OpenAI specifically designed to evaluate AI agent web browsing capabilities. The test was created to measure an AI agent's persistence and creativity in information-overloaded environments — essentially, whether it can "dig deep" like a human researcher. On this highly challenging task, human performance averages only 29.2%. Kimi K2 Thinking demonstrated remarkable research tenacity, setting a new SOTA record with a score of 60.2%.

Powered by long-horizon planning and autonomous search capabilities, Kimi K2 Thinking can run dynamic loops of hundreds of rounds of "think → search → browse → think → code," continuously proposing and refining hypotheses, verifying evidence, reasoning, and constructing logically consistent answers. This ability to search actively while thinking continuously enables Kimi K2 Thinking to break down vague, open-ended questions into clear, actionable subtasks.

Here's an example: through two rounds of search and reasoning, Kimi K2 Thinking first identifies a manufacturing company based on known stock buyback information, then locates the official stock buyback announcement on the U.S. Securities and Exchange Commission (SEC) website to arrive at an accurate answer:


Continued Refinement of Agentic Programming

The Kimi K2 Thinking model's coding capabilities have also been enhanced, with further improvements on benchmarks including SWE-Multilingual (multilingual software engineering), SWE-bench Verified, and Terminal usage.

The team observed noticeably stronger performance from Kimi K2 Thinking when handling HTML, React, and component-rich frontend tasks, transforming creative concepts into fully functional, responsive products. In agentic coding scenarios, Kimi K2 Thinking can think while calling various tools, flexibly integrating into software agents to handle more complex, multi-step development workflows.

Here are two examples:

Kimi K2 Thinking can now help you recreate a fully functional Word-style text editor.

Kimi K2 Thinking can also help you create a lavishly styled voxel art piece:

Upgraded General Capabilities

Creative Writing: Kimi K2 Thinking significantly elevates writing ability. It turns rough inspiration into clear, compelling, and purposeful narratives that have both rhythm and depth. It handles subtle stylistic variations and ambiguous structures with ease while keeping style coherent across long-form pieces. In creative writing, its imagery is more vivid and its emotional resonance stronger, combining precision with expressive richness.

Academics and Research: In academic research and professional domains, Kimi K2 Thinking delivers marked improvements in analytical depth, factual accuracy, and logical structure. It methodically dissects complex instructions and develops ideas with clarity and rigor. This makes it especially adept at handling academic papers, technical abstracts, and lengthy reports where informational completeness and reasoning quality are paramount.

Personal and Emotional: When responding to personal or emotional questions, Kimi K2 Thinking answers with greater empathy and a more balanced, neutral stance. Its thinking is thorough, thoughtful, and specific, offering nuanced perspectives and actionable follow-up suggestions. It helps users work through complex decisions with clarity and care, its tone grounded, practical, and more human.

Here's an example of assisting with reading an English technical paper:


Native INT4 Quantization for Improved Inference Efficiency

Low-bit quantization is an effective method for reducing latency and GPU memory footprint on large-scale inference servers. Testing revealed that because reasoning models produce extremely long decoding sequences, conventional quantization methods often cause severe performance degradation. To overcome this challenge, the team adopted quantization-aware training (QAT) during the post-training stage and applied INT4 weight-only quantization to the MoE components.

This enables the Kimi K2 Thinking model to run native INT4 inference on complex reasoning and agentic tasks, with generation roughly 2x faster. INT4 is also compatible with a broader range of inference hardware and is better suited to China's domestic AI accelerators. Notably, all benchmark results were achieved at INT4 precision.
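To make "INT4 weight-only quantization" concrete, here is a minimal numerical sketch of symmetric per-group quantization in plain Python. It is a generic textbook scheme and not Moonshot AI's QAT recipe, whose details the article does not give. The function names, the example weights, and the group of four values are assumptions chosen for illustration.

```python
from typing import List, Tuple

INT4_MIN, INT4_MAX = -8, 7  # range of a signed 4-bit integer


def quantize_group(weights: List[float]) -> Tuple[List[int], float]:
    """Map one group of float weights to INT4 codes plus a shared scale."""
    largest = max(abs(w) for w in weights)
    # Symmetric scheme: the largest magnitude maps to code 7.
    scale = largest / INT4_MAX if largest > 0 else 1.0
    codes = [max(INT4_MIN, min(INT4_MAX, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_group(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from INT4 codes."""
    return [code * scale for code in codes]


weights = [0.70, -0.42, 0.14, 0.00]  # made-up example values
codes, scale = quantize_group(weights)
restored = dequantize_group(codes, scale)

print(codes)  # [7, -4, 1, 0]
print(max(abs(w - r) for w, r in zip(weights, restored)) <= scale / 2 + 1e-12)  # True
```

Two INT4 codes fit in one byte, so weights stored this way take roughly a quarter of the space of 16-bit weights, not counting the per-group scales. The rounding error visible in `restored` is the kind of error that quantization-aware training lets a model adapt to during training, instead of meeting it for the first time at inference.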

Start Using Now

Head to kimi.com or update to the latest Kimi app, turn on the "Long Thinking" toggle for the K2 model in the toolbox, and hand Kimi your complex tasks to think through together.

The Kimi K2 Thinking model API is now available on the Kimi Open Platform (platform.moonshot.cn). It supports a 256K context window at the same pricing as Kimi K2-0905: RMB 4 per million input tokens, RMB 16 per million output tokens, and RMB 1 per million input tokens on cache hits. A Turbo API with speeds of up to 100 tokens/s is available at the same time, priced at RMB 8 per million input tokens, RMB 58 per million output tokens, and RMB 1 per million input tokens on cache hits. Developers are welcome to test the new model API and provide feedback.

About the Kimi K2 Model

The Kimi K2 model was first released on July 11. It is an open-source foundation model built on a Mixture-of-Experts (MoE) architecture with 1 trillion total parameters and 32 billion active parameters. On September 5, the Kimi K2-0905 update further improved coding capabilities and expanded the context window from 128K to 256K. To date, products including Cline, Cursor, flowith, Genspark, Kilo Code, Kortix Suna, OpenRouter, Perplexity, RooCode, TRAE, Trickle, Vercel, Windsurf, and YouWare have integrated or are using the Kimi K2 model. On November 6, Moonshot AI released Kimi K2 Thinking, comprehensively upgrading agent and reasoning capabilities.