Yunqi Capital | MiniMax M2 ranks among the top three globally in API calls — but what capability is actually worth paying attention to?

云启资本·November 4, 2025·25·0

The "Mystery" of Interleaved Chain-of-Thought

Just one week after its release, MiniMax M2 has broken into the top three globally by call volume on OpenRouter and claimed the #1 spot on HuggingFace Trending. But what drives M2's standout performance on agent and coding tasks isn't just the model itself — it's the deep support for Interleaved Thinking.

In this Yunqi Capital feature, we're sharing internal R&D thinking from the team at MiniMax — an early Yunqi Capital portfolio company — on why the continuity of "reason → tool use → re-reason" is the key to unlocking truly capable agents.

The following is adapted from MiniMax (Xiyu Technology).

Since its launch, a growing number of developers at home and abroad have begun using MiniMax M2. Within days, M2 has climbed to #3 globally in call volume on OpenRouter and #1 on HuggingFace Trending. M2 is the first Chinese model on OpenRouter to exceed 50 billion daily tokens consumed.

M2 ranks in the top three globally in OpenRouter call volume

M2 tops the global HuggingFace Trending leaderboard

In the early stages of M2's development, we identified the importance of Interleaved Thinking for agent and coding applications. Aside from Anthropic's Claude, most models do not yet fully support Interleaved Thinking — it remains a non-consensus within the industry. From user feedback, we've also noticed that Interleaved Thinking is sometimes not used correctly in practice. Why is Interleaved Thinking important, and how can it be used effectively across different API interfaces to achieve the best results? We'd like to share some of our internal thinking.

Why Is Interleaved Thinking So Important?

Interleaved Thinking is critical for agents: it refers to the alternation between explicit reasoning and tool use, with reasoning results continuously carried forward into subsequent steps. This process significantly improves planning ability, self-correction capability, and reliability in long-horizon tasks. In practice, it transforms lengthy, heavily tool-dependent tasks into stable "plan → act → reflect" loops, reducing state drift and repetitive errors while ensuring each action is grounded in the latest evidence. Interleaved Thinking also improves debuggability: snapshots of the reasoning process make failures interpretable and recoverable, and reusing hypotheses, constraints, and partial conclusions (rather than re-deriving each step) improves sample efficiency. For best results, rather than completing all thinking upfront, interleave thinking with tool feedback and maintain continuity in the chain of thought, allowing it to accumulate across multiple rounds of interaction.

From community developer feedback, we've found that some failure cases stem from not using Interleaved Thinking correctly — specifically, failing to preserve the thinking state from previous rounds across multi-turn sessions. One reason for this is that the widely used OpenAI Chat Completion API does not support returning reasoning content and passing it back in subsequent requests. While the Anthropic API natively supports this capability, the community has done less to support models beyond Claude, and many applications using the Anthropic API still don't feed previous thinking processes back in. This has meant that Interleaved Thinking hasn't been well supported. To fully unlock M2's capabilities, preserving the thinking process across multi-turn interactions is essential.

In MiniMax M2, Interleaved CoT only achieves maximum effectiveness when the previous round's reasoning is preserved and fed back into subsequent rounds. The model reasons between tool calls, continuously passing forward plans, hypotheses, constraints, and intermediate conclusions — it's this sustainable, accumulative reasoning state that makes M2 stable and reliable. Once previous reasoning states are discarded, the model's cumulative understanding degrades, state deviation increases, self-correction weakens, and planning ability regresses — especially in long-horizon tool use and "run-fix" loops.

Multiple benchmarks show that preserving thinking states across previous multi-turn interactions improves performance:

SWE-Bench Verified: 69.4 vs. 67.2 (Δ=+2.2; +3.3%)
Tau^2: 87 vs. 64 (Δ=+23; +35.9%)
BrowseComp: 44.0 vs. 31.4 (Δ=+12.6; +40.1%)
GAIA: 75.7 vs. 67.9 (Δ=+7.8; +11.5%)
xBench: 72.0 vs. 66.0 (Δ=+6.0; +9.1%)

Maintaining complete Interleaved Thinking states is critical — a model's reliability depends not only on its current thoughts but on its ability to review and correct previous ones. Interleaved Thinking institutionalizes this process: plan → act → reflect, with state always preserved, allowing reflection to accumulate and corrections to propagate across multiple turns.

Interleaved Thinking, Illustrated

Unlocking M2's Interleaved Thinking Capabilities

We've provided optimal support for MiniMax M2 and its Interleaved Thinking capabilities on our open platform. For best performance and compatibility, we strongly recommend using our official API. MiniMax offers two API interfaces:

OpenAI-Compatible API

When calling M2 through MiniMax's OpenAI-compatible API, you can now:

Separate reasoning_details field: The model's reasoning process is returned in a dedicated reasoning_details field, no longer mixed in with content. This makes the API structure cleaner and easier to parse.
Complete chain of thought: Passing the reasoning_details field back in subsequent requests ensures the model maintains a complete chain of thought across multiple tool calls, enabling more accurate judgment and planning. (See the official documentation for code examples and more details.)

Anthropic-Compatible API

The Anthropic API natively supports Interleaved Thinking — simply add the model's complete output from each round (including thinking_blocks) to the messages history and send it with your request. (See the official documentation for more details.)

Driving Industry Standards, Building an Agent-Ready Future

Beyond supporting Interleaved Thinking on MiniMax's open platform official API, we're also working closely with partners including OpenRouter, Ollama, Droid, Vercel, and Cline to advance and implement cross-platform support for this capability. Through collaboration across our ecosystem, we hope to establish a unified protocol paradigm for broadly supporting Interleaved Thinking across applications, OpenAI-compatible APIs, and Anthropic-compatible APIs — laying the groundwork for the entire industry. We believe open, unified standards will empower developers worldwide to easily build more powerful, more reliable agents and drive the flourishing of the AI ecosystem. For partnership inquiries, please reach out to us at $2.