Moonshot AI's K2.6 Model Is Here: Major Leaps in Long-Range Coding and Agent Swarm Capabilities!

高榕创投·April 21, 2026·54·30

Talk is cheap. Show me the code.

Talk is cheap. Show me the code.

Linus Torvalds

Late last night, Moonshot AI released and open-sourced the Kimi K2.6 model, bringing industry-leading (state-of-the-art) capabilities in coding, long-horizon task execution, and agent swarms.

Kimi K2.6 is now live on kimi.com, the latest Kimi app, Kimi API, and Kimi Code programming assistant — available to all users.

(Full benchmark results available in the technical blog)

Kimi K2.6 delivers comprehensive upgrades to general agent, coding, and visual understanding capabilities. It achieved state-of-the-art results on benchmarks including the full PhD-level Humanity's Last Exam, SWE-Bench Pro (which tests real-world software engineering ability), and DeepSearchQA (evaluating agent deep-retrieval capability) — matching or outperforming closed-source models including GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro.

Kimi K2.6 is Moonshot AI's strongest coding model to date. Its long-horizon coding ability has improved significantly: in testing, it can code continuously for 13 hours, writing or modifying over 4,000 lines of code to develop and optimize complex systems. By deeply integrating code with visual capabilities, K2.6 has elevated code-driven design to new heights, capable of delivering professional-grade web applications with striking creative design.

Kimi K2.6 substantially enhances agent autonomous execution, further expanding the scope of agent capabilities:

The "Agent Swarm" architecture powered by the K2.6 model has received a major upgrade, now supporting 300 sub-agents executing 4,000 collaborative steps in parallel for greater parallelization at scale, with significant improvements in task completion and delivery quality compared to K2.5;
For proactive agent frameworks like OpenClaw and Hermes Agent, K2.6 demonstrates exceptional automated task processing, supporting up to 5 days of continuous autonomous operation.

Breakthrough in Long-Horizon Coding

K2.6 achieves breakthrough performance on long-horizon coding tasks, with more reliable generalization across different programming languages (such as Rust, Go, Python) and task scenarios (frontend, DevOps, performance optimization).

On Kimi Code Bench — Moonshot AI's rigorous internal coding evaluation benchmark covering diverse complex end-to-end tasks — K2.6 improved approximately 20% over K2.5.

In real-world testing, the Kimi K2.6 model demonstrated powerful long-horizon reasoning on complex software engineering tasks:

Scenario 1: K2.6 successfully downloaded and deployed the Qwen3.5-0.8B model locally on a Mac, then implemented and optimized model inference using the niche Zig language — proving the new model's generalization ability. After 4,000+ tool calls and over 12 hours of uninterrupted operation, the K2.6 model iterated through 14 rounds, improving throughput from approximately 15 tokens/s to approximately 193 tokens/s, ultimately achieving 20% faster inference speed than LM Studio.

Scenario 2: Kimi K2.6 autonomously completed a deep refactoring of exchange-core, an open-source financial matching engine with 8 years of history. Over 13 hours of continuous work, the model iterated through 12 optimization strategies, making precise modifications to 4,000+ lines of code through 1,000+ tool calls. Acting as an expert systems architect, Kimi K2.6 analyzed CPU and memory allocation flame graphs to pinpoint hidden bottlenecks, and boldly restructured the core thread topology (from 4ME+2RE to 2ME+1RE). Even with the engine's performance already near its limits, Kimi K2.6 achieved a 185% median throughput jump (from 0.43 to 1.24 MT/s), with peak throughput surging 133% (from 1.23 to 2.86 MT/s).

Enterprise customers including Baseten, Blackbox AI, CodeBuddy, Factory (Droid), Lark Miaoda, Fireworks AI, Nous Research (Hermes Agent), Kilo Code, Ollama, OpenCode, Qoder, and Vercel tested the K2.6 model in advance. Here are excerpts of their real feedback:

Alphabetical (1-6)

← Alphabetical (7-12)

A Benchmark for Code-Driven Design

Beauty itself is a form of productivity. K2.6 Agent mode can now create websites with exceptional design sensibility and visual impact.

With proficient use of image and video generation tools, the K2.6 Agent can generate visually cohesive assets, build hero sections with strong focal points, and implement interactive elements with rich scroll-triggered animations.

The K2.6 Agent isn't limited to frontend pages — it also supports basic backend database modules, such as embedding form-based information collection in generated web pages.

With enhanced multimodal programming capabilities, K2.6 can more precisely translate image and video assets into code:

Moonshot AI created a dedicated frontend development and design evaluation benchmark (Kimi Design Bench), covering four dimensions: visual input tasks, landing page construction, full-stack application development, and general web development. Compared to the Gemini 3 model in Google AI Studio, the K2.6 Agent based on the kimi.com framework demonstrated a clear and significant lead.

Agent Swarm Fully Upgraded

Breaking through the performance limits of single agents is essential for scaling agent capabilities. "Agent Swarm" was introduced starting with the K2.5 model — dynamically decomposing complex tasks and autonomously spawning specialized agents for parallel processing.

Building on K2.5, K2.6's Agent Swarm collaboration capabilities have been comprehensively upgraded. The Agent Swarm can now orchestrate agents with different skill specializations to complement each other, combining search, deep research, document analysis, and long-form content creation — with significantly improved task completion quality compared to K2.5. In a single run, the Agent Swarm can independently deliver end-to-end multi-product outputs from documents to web pages to PPTs and spreadsheets.

The swarm architecture itself has also been upgraded, now supporting up to 300 sub-agents executing 4,000 collaborative steps in parallel for greater parallelization, further pushing the upper limits of multi-agent system collaboration.

Here are two use cases:

Case 1: The Agent Swarm designed and executed 5 quantitative strategies for 100 global semiconductor stocks. It distilled McKinsey-style PPT logic into reusable skills, ultimately delivering detailed modeling spreadsheets and a complete set of presentation documents.

Case 2: The Agent Swarm transformed a high-quality astrophysics paper containing massive visual data into reusable academic skills. By extracting the paper's reasoning workflows and visualization methods, the system produced a 40-page, 7,000-word research paper, a structured dataset with over 20,000 entries, and 14 publication-grade astronomical charts.

Autonomous Agents: Seamless Integration with OpenClaw/Hermes and Other Frameworks

K2.6 significantly enhances agent autonomous execution, particularly excelling in OpenClaw, Hermes Agent-style automation scenarios — use cases requiring AI to operate across applications 24/7 without interruption.

Unlike traditional conversational interaction, these workflows require AI to actively manage task planning, execute code, and coordinate cross-platform operations as a persistent background agent.

Moonshot AI's RL infrastructure team used an agent based on K2.6 to achieve 5 consecutive days of autonomous operation. The agent handled monitoring, incident response, and system operations, demonstrating sustained context maintenance, multi-threaded task processing, and full-cycle execution from alert receipt to complete resolution. Below is the K2.6 work log (sensitive information anonymized):

K2.6's reliability in real-world use has seen concrete improvements: more precise API calls, more stable long-duration operation, and enhanced security awareness when executing complex research tasks.

Moonshot AI's internal Claw Bench test results show that K2.6 improved 10% over K2.5 in overall performance. This benchmark covers five dimensions: programming tasks, instant messaging ecosystem integration, information retrieval and analysis, scheduled task management, and memory recall. Across all metrics, K2.6 surpasses K2.5 in task completion rate and tool call accuracy, with particularly significant advantages in workflows requiring long-duration autonomous operation without human intervention.

Office Productivity Continues to Improve

Leveraging K2.6's stronger coding and visual understanding capabilities, Kimi Agent mode now supports creating and invoking Skills.

The system comes with over a hundred officially recommended skills built in. This includes an investment research skill pack created by Moonshot AI's internal expert team, which packages institutional-grade investment research workflows to let users generate professionally formatted one-pagers or in-depth research reports on A-share, Hong Kong, and US-listed companies with a single click — quickly getting up to speed on a company's key fundamentals, industry landscape, and the market's most closely watched core stock price drivers.

More recommended skills will be added continuously, helping more knowledge workers achieve "plug-and-play" efficiency gains across the full workflow from finding materials, organizing ideas, to delivering results.

Starting now, in Kimi Agent mode, type the slash "/" to begin creating and invoking skills. Every user can create skills from scratch through conversation with Kimi.

However, creating truly practical skills still requires substantial knowledge and professional expertise — the barrier remains high. To help users easily transform their carefully crafted documents into reusable Skills, Kimi Agent now supports "Office Document to Skill": upload a high-quality Office document, and Kimi will attempt to understand the original document's structure and stylistic DNA to generate a custom reusable document creation skill for you.

One More Thing

Through team collaboration and organizational division of labor, humanity created the internet, built large models, and landed on the moon. If AI agents are to help humans tackle complex real-world problems, they too must evolve toward team collaboration and organizational division of labor.

"Agent Swarm" is Moonshot AI's exploration in the direction of AI automated division of labor. Today we begin exploring another direction: putting humans and various 24/7 agents in a group together — how can they divide labor and collaborate to accomplish tasks that neither a single person nor a single agent could complete?

This is "Claw Group", now entering limited beta. The goal of Claw Group is to embrace an open, heterogeneous ecosystem: multiple agents and humans operating as true collaborators. Users can connect always-on agents from any device, any vendor, running any model (initially supporting OpenClaw, with Hermes Agent and other frameworks to follow). Each agent can bring its own professional toolkit, skills, and persistent memory context. Whether deployed on a local laptop, mobile device, or cloud instance, these diverse agents can all join the same collaborative workspace.

In Claw Group, K2.6 serves as the coordinator. It dynamically matches tasks to agents based on their skill profiles and available tools, achieving optimal capability allocation. When an agent encounters failure or stalls, the coordinator detects the interruption, automatically reassigns tasks or spawns sub-tasks, and proactively manages the full lifecycle of agent deliverables from initiation through verification to completion.

Kimi Claw users will gradually receive Claw Group beta invitations — stay tuned.

Start Using Kimi K2.6 Kimi K2.6 is now available to all free users, paid subscribers, Kimi Code users, and enterprise API users. Visit kimi.com, the latest Kimi App, Kimi Code, and the Kimi API Open Platform (platform.kimi.com) to get started.

Enterprises and developers can simply specify the model as kimi-k2.6 in the Kimi API to begin using it. To celebrate the K2.6 model API launch, the Kimi Open Platform is simultaneously offering a limited-time bonus of up to 30% on deposits.

Meanwhile, the official Kimi K2.6 API has debuted on Tencent Cloud TokenHub and other platforms — Tencent Cloud users are welcome to try the Kimi K2.6 model. Additionally, we recommend calling the official Kimi API directly to reproduce Kimi K2.6 benchmark results. For those using third-party API services, the Kimi Vendor Verifier (KVV) can help identify higher-precision service providers. Learn more: https://kimi.com/blog/kimi-vendor-verifier

Quick Start ↓ Chat with K2.6, process Office documents, or create web applications

Chat with Kimi: kimi.com or download the latest Kimi App
Experience Kimi Agent: kimi.com/agent
Experience Agent Swarm: kimi.com/agent-swarm

↓ Use K2.6 for programming assistance

Use Kimi Code monthly coding package: kimi.com/code

↓ Build applications with Kimi API

K2.6 Quick Start: https://platform.kimi.com/docs/guide/kimi-k2-6-quickstart
View limited-time bonus offer: https://platform.kimi.com/docs/pricing/promotion

↓ Local model deployment

Hugging Face: https://huggingface.co/moonshotai
ModelScope: https://www.modelscope.ai/organization/moonshotai