Moonshot AI Releases and Open-Sources Kimi K2 Thinking Model, Boosting Agent and Reasoning Capabilities | Z News

真格基金·November 7, 2025

Moonshot AI's most capable open-source reasoning model to date.

On November 6, Moonshot AI released Kimi K2 Thinking, now live on the web and the latest app version, with its API also officially available on the Kimi Open Platform.

Kimi K2 is a new-generation Thinking Agent trained on the "model as Agent" philosophy. On BrowseComp, OpenAI's benchmark for evaluating AI Agent web browsing capabilities, Kimi K2 achieved 60.2%, becoming the new state-of-the-art model.

ZhenFund invested in Moonshot AI's angel round in 2023. Since its launch in October 2023, Kimi has continued to upgrade its foundation model capabilities and expand its product features and interactive experience. The Kimi K2 model was first released on July 11. Currently, multiple products including Cursor, Genspark, Perplexity, and YouWare have integrated or are using the Kimi K2 model.

We look forward to co-creating intelligence with Kimi and more users in the future.

Today, we are releasing Kimi K2 Thinking — Kimi's most capable open-source thinking model to date.

Kimi K2 Thinking is our new-generation Thinking Agent, trained on the "model as Agent" philosophy. It natively masters the ability to "think while using tools." It achieves state-of-the-art performance on multiple benchmarks including Humanity's Last Exam, autonomous web browsing (BrowseComp), and complex information gathering and reasoning (SEAL-0), with comprehensive improvements in Agentic search, Agentic programming, writing, and general reasoning capabilities.

The Kimi K2 Thinking model can autonomously execute up to 300 steps of tool calls through sustained, stable deep thinking — without human intervention — helping users solve more complex problems. This is our latest advance in Test-Time Scaling, achieving stronger Agent and reasoning performance by simultaneously scaling thinking tokens and tool-calling steps.

The Kimi K2 Thinking model is now available on kimi.com and in the latest version of the Kimi mobile app via the standard chat mode. The underlying model for Kimi Agent mode will also be upgraded to Kimi K2 Thinking in the future, bringing full multi-step thinking and tool-calling capabilities.

The Kimi K2 Thinking model API is accessible through the Kimi Open Platform (platform.moonshot.cn). For self-deployment, please download the model from Hugging Face, ModelScope, and other platforms.

Comprehensive Improvement in Reasoning Performance

Let's look at an example from Humanity's Last Exam showing the reasoning process for a humanities question. In this example, Kimi K2 Thinking conducts five searches and reasoning steps, building layer by layer with new information gathered at each step, ultimately arriving at the answer:

Scroll up and down to view the complete reasoning process

Comprehensive Improvement in Autonomous Search and Browsing Capabilities

In complex search and browsing scenarios, the Kimi K2 Thinking model also performs exceptionally well. BrowseComp is a benchmark released by OpenAI specifically designed to evaluate AI Agent web browsing capabilities. Its original purpose was to measure the persistence and creativity that AI Agents demonstrate in information-overloaded environments — essentially, whether they can "dig deep" like human researchers. On this highly challenging task, human performance averages only 29.2%. Kimi K2 Thinking demonstrates remarkable research diligence on this benchmark, achieving 60.2% to become the new state-of-the-art model.

Driven by long-horizon planning and autonomous search capabilities, Kimi K2 Thinking can leverage dynamic loops of up to hundreds of steps — "think → search → browse webpage → think → code" — to continuously propose and refine hypotheses, verify evidence, conduct reasoning, and construct logically consistent answers. This ability to search actively while thinking continuously enables Kimi K2 Thinking to decompose vague, open-ended questions into clear, executable subtasks.

Let's look at an example. In this case, Kimi K2 Thinking conducts two searches and thinking steps: first, it identifies the speedboat manufacturer based on known stock buyback information, then finds the stock buyback announcement on the U.S. Securities and Exchange Commission (SEC) website, arriving at an accurate answer:

Scroll up and down to view the complete reasoning process

Continuous Refinement of Agentic Programming Capabilities

The Kimi K2 Thinking model's coding capabilities have also been enhanced, with further improvements on benchmarks including the multilingual software engineering benchmark SWE-Multilingual, the SWE-bench validation set, and Terminal usage.

We observed that Kimi K2 Thinking shows noticeable performance gains when handling HTML, React, and component-rich frontend tasks, transforming ideas into fully functional, responsive products. In Agentic Coding scenarios, Kimi K2 Thinking can think while calling various tools, flexibly integrating into software agents to handle more complex, multi-step development workflows.

Let's look at two examples:

Now, Kimi K2 Thinking can help you recreate a fully functional Word text editor.

Kimi K2 Thinking can also help you create a flamboyant voxel art piece:

General Foundation Capability Upgrades

Creative Writing: Kimi K2 Thinking significantly improves writing ability. It can transform rough inspirations into clear, compelling, and purposeful narratives with both rhythm and depth. It easily handles subtle stylistic variations and ambiguous structures while maintaining stylistic coherence across long-form content. In creative writing, its imagery is more vivid and its emotional resonance stronger, blending precise expression with rich performative power.

Academia and Research: In academic research and professional domains, Kimi K2 Thinking shows marked improvement in analytical depth, informational accuracy, and logical structure. It methodically dissects complex instructions and develops ideas with clarity and rigor. This makes it especially adept at handling academic papers, technical abstracts, and lengthy reports where informational completeness and reasoning quality are paramount.

Personal and Emotional: When responding to personal or emotional questions, Kimi K2 Thinking's answers are more empathetic and its stance more balanced and measured. Its thinking is thorough, thoughtful, and specific, offering nuanced perspectives and actionable follow-up suggestions. It helps users work through complex decisions with clarity and care, its tone grounded, practical, and more human.

Let's look at an example of assisting with reading an English technical paper:

Scroll up and down to view the complete analysis process

Native INT4 Quantization for Improved Inference Efficiency

Low-bit quantization is an effective method for reducing latency and GPU memory usage on large-scale inference servers. Our testing found that because thinking models produce extremely long decoding lengths, conventional quantization methods often cause significant performance degradation. To overcome this challenge, we adopted Quantization-Aware Training (QAT) during the post-training stage and applied INT4 weight-only quantization to the MoE components.

This enables the Kimi K2 Thinking model to support native INT4 inference in complex reasoning and Agentic tasks, improving generation speed by approximately 2x. INT4 offers stronger compatibility with inference hardware and is more friendly to domestic accelerated computing chips. Notably, all of Kimi's benchmark results were achieved at INT4 precision.

Start Using Now

Go to kimi.com or update to the latest Kimi app, turn on the "Long Thinking" switch for the K2 model from the toolbox, and throw your complex tasks at Kimi to think through together.

The Kimi K2 Thinking model API is now available on the Kimi Open Platform (platform.moonshot.cn), supporting 256K context at the same pricing as Kimi K2-0905: 4 yuan per million input tokens, 16 yuan per million output tokens, and 1 yuan for cached input hits. The Turbo API with speeds up to 100 tokens/s is also available simultaneously at 8 yuan per million input tokens, 58 yuan per million output tokens, and 1 yuan for cached input hits. Developers are welcome to test and provide feedback on the new model API.

About the Kimi K2 Model

The Kimi K2 model was first released on July 11. It is an open-source foundation model with a Mixture-of-Experts (MoE) architecture, 1 trillion total parameters, and 32 billion active parameters. On September 5, the Kimi K2-0905 update further improved coding capabilities and expanded the context window from 128K to 256K. To date, products including Cline, Cursor, flowith, Genspark, Kilo Code, Kortix Suna, OpenRouter, Perplexity, RooCode, TRAE, Trickle, Vercel, Windsurf, and YouWare have integrated or are using the Kimi K2 model. On November 6, the Kimi K2 Thinking model was released, comprehensively upgrading Agent and reasoning capabilities.