MiniMax M2.1: 10B Active Parameters, Claims Global SOTA in Multilingual Coding | Yunqi Capital

Yunqi Capital · December 25, 2025

Let the hard tech speak for itself.

At a pivotal moment — having just cleared its Hong Kong Stock Exchange hearing and making fresh progress toward its IPO — Yunqi Capital portfolio company MiniMax has also delivered impressive new results in model R&D.

Recently, MiniMax released its latest flagship Coding & Agent model, M2.1. On the Multi-SWE-bench leaderboard, which measures multilingual software engineering capabilities, M2.1 achieved 49.4% with only 10B active parameters, surpassing multiple top international models to claim the global SOTA.

Continuing to let hardcore tech do the talking, M2.1 also serves as the latest testament to MiniMax's engineering capabilities and R&D efficiency. Read on with this edition of "Yunqi Capital" for the full story.

The following is adapted from QbitAI.

Original title: A New King of AI Coding Emerges! MiniMax M2.1 Takes the Multilingual Programming SOTA

MiniMax's latest flagship Coding & Agent model, M2.1, has been officially released.

On one hand, there's the fresh progress on clearing the Hong Kong Stock Exchange hearing. On the other, new models keep dropping — fast — and this one just hit SOTA.

M2.1 sets out to solve the serious "subject bias" problem that has plagued previous models.

This so-called bias means that past models performed reasonably well when writing Python scripts or Web front-end pages, but their performance fell off a cliff the moment they encountered backend architecture or low-level logic.

M2.1's core evolution lies in finally cracking this problem — mastering backend development conventions.

The release of M2.1 also proves that MiniMax has maintained a high-frequency R&D rhythm even while pushing through its IPO process.

Deeper Understanding of Low-Level Systems: 10B Active Parameters Take SOTA

M2.1 translates its understanding of engineering context into deep adaptation of the development toolchain. It doesn't just generate code — it skillfully works with mainstream programming tools like Cursor and Claude Code to execute precise fixes or refactors in existing codebases.

This means it's no longer a rookie that can only write new features, but a seasoned hand that can follow established architectural conventions and perform engineering-grade operations.

Specifically, M2.1 has systematically improved capabilities across Rust, Java, Golang, C++, Kotlin, Objective-C, TypeScript, JavaScript, and other languages.

For WebDev and AppDev, M2.1 has significantly strengthened native Android and iOS development capabilities, addressing a widely acknowledged industry weakness in mobile development.

Additionally, as the first model series to introduce Interleaved Thinking, M2.1 is not only concerned with whether code executes correctly; it also emphasizes the ability to integrate and carry out "compound instruction constraints."

In use, M2.1 demonstrates strong generalization, performing well across various programming tools and Agent frameworks including Claude Code, Droid (Factory AI), and Cline.

To validate these capabilities in real-world environments, MiniMax also built and open-sourced a new evaluation benchmark, VIBE (Visual & Interactive Benchmark for Execution in Application Development), expanding assessment dimensions from pure text to Web, simulation, Android, iOS, and backend — five domains in total.

M2.1 ultimately scored 88.6 on average, with overall performance approaching Claude Opus 4.5. Notably, on the most complex Android subtask, it scored 89.7, a compelling data point for developers hoping to tackle native client challenges with AI.

That's enough about features. How does MiniMax M2.1 actually perform on real programming tasks? Let's put it to the test.

Hands-On with MiniMax M2.1

First up — an H5 mini-game.

To more realistically simulate actual development scenarios, we didn't dump all requirements at once. Instead, we broke the process into three stages.

We're building a "Space Slingshot" game. The first round's goal is to set up the most basic interface and functionality.

In under a minute, MiniMax M2.1 completed the HTML structure, CSS styling, and JS scripting.

The actual running result confirmed that M2.1's code satisfied all requirements in the prompt.

Of course, a game designed this way isn't very challenging. We need to add extra mechanics on top of this foundation — that's the task for round two.

M2.1 read the existing code and the new instructions, then made several rounds of modifications on top of the original code.

As expected, a "black hole" appears in the result page — and refreshing repeatedly confirms that both the black hole's size and position are indeed randomly generated.

Next, playtesting confirms that the ball is indeed affected by the black hole's gravitational pull, and the game automatically ends once absorbed.
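The article does not reproduce the code M2.1 generated, so the following is only a minimal Python sketch of how a mechanic like this is commonly built, assuming a simple inverse-square pull and a fixed time step. Every name and constant here (`Ball`, `BlackHole`, `spawn_black_hole`, `step`, the radius range, the strength factor) is hypothetical and chosen for illustration, not taken from the actual game.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Ball:
    x: float
    y: float
    vx: float
    vy: float


@dataclass
class BlackHole:
    x: float
    y: float
    radius: float    # the ball is absorbed once it gets this close
    strength: float  # hypothetical gravity constant, tuned by feel


def spawn_black_hole(width, height, rng):
    """Place a black hole with random size and position inside the canvas."""
    radius = rng.uniform(20.0, 60.0)  # assumed size range
    return BlackHole(
        x=rng.uniform(radius, width - radius),
        y=rng.uniform(radius, height - radius),
        radius=radius,
        strength=radius * 2000.0,  # assumption: bigger holes pull harder
    )


def step(ball, hole, dt):
    """Advance the ball by one frame. Returns True if it was absorbed."""
    dx = hole.x - ball.x
    dy = hole.y - ball.y
    dist = math.hypot(dx, dy)
    if dist <= hole.radius:
        return True  # game over: the ball fell into the black hole

    # Inverse-square acceleration pointing from the ball toward the hole.
    accel = hole.strength / (dist * dist)
    ball.vx += accel * (dx / dist) * dt
    ball.vy += accel * (dy / dist) * dt
    ball.x += ball.vx * dt
    ball.y += ball.vy * dt
    return False
```

A game loop would call `step` once per frame and end the round when it returns `True`. Seeding the random generator differently on each page load gives the "refresh and the black hole moves" behavior described above.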

Now the difficulty has definitely ramped up, perhaps too much. This kind of "add water when it's too thick, add flour when it's too thin" requirement tweaking is also common in real development. So round three introduces some new mechanics to lower the difficulty a bit and adds visual effects.

Checking the running result against the three requirements in the prompt — all satisfied.

That basically completes the "Space Slingshot" game development process. But there's one more bonus question: rewrite this program in Python.

After understanding the web version's program logic, M2.1 grasped what needed to be expressed and successfully completed the code migration from front-end to Python.

Next, let's switch languages and test the latest model's backend development capabilities.

Past models consistently approached backend code with a front-end mindset, resulting in code that either wouldn't run or wasn't practically usable. But after hands-on testing with M2.1, it feels like this hard nut of backend development has genuinely been cracked.

It just so happens that the QbitAI website backend needs an update, so we chose Java — a mainstream development language — to implement a permission design system. This is an essential component of every system, and also a key element that industry practitioners believe needs redesigning for large-scale Agent deployment.

Because this is a systemic task rather than a minor patch, we didn't directly generate code from prompts. Instead, we first had the model output a design document based on requirements, then implemented code based on that document.

The model quickly output a Markdown document with very detailed content, including which classes the permission design needs to implement.

The methods and properties of each class, along with property types, method parameters, return values, and comments, were all clearly spelled out.

It also laid out the relationships between classes according to our requirements, making good use of Java's inheritance from the design stage onward.

Finally, it offered several database table design suggestions, defining required fields and corresponding properties. Does this seem more useful than that colleague of yours who never writes documentation? (Doge)

Returning to the conversation with the model, we had it generate code based on its own design document (doge).

The model generated code quickly as always, with a clear project package structure separating entity classes, enums, and implementation logic, and with comments throughout. An IDE line-counting plugin showed this mini-project totaled over 1,700 lines of code, and I spent less than a minute typing two sentences.

Next, we had M2.1 put together a UI for it.

The results were very surprising — all functionality required in the previous stages was fully implemented.

Switching to a low-permission account, the available operations also matched the initial design.
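The generated Java project is not shown in the article. As a rough illustration of the kind of design described (roles related through inheritance, with a low-permission account seeing fewer operations), here is a minimal sketch written in Python for brevity rather than Java. The role names, permission names, and the `require` helper are all hypothetical and are not taken from M2.1's output.

```python
from enum import Enum, auto


class Permission(Enum):
    # Hypothetical operations for a content site backend.
    READ_ARTICLE = auto()
    EDIT_ARTICLE = auto()
    MANAGE_USERS = auto()


class Viewer:
    """Base role: read-only access."""

    def permissions(self):
        return {Permission.READ_ARTICLE}

    def can(self, permission):
        return permission in self.permissions()


class Editor(Viewer):
    """Inherits everything a Viewer can do and adds editing."""

    def permissions(self):
        return super().permissions() | {Permission.EDIT_ARTICLE}


class Admin(Editor):
    """Inherits everything an Editor can do and adds user management."""

    def permissions(self):
        return super().permissions() | {Permission.MANAGE_USERS}


def require(role, permission):
    """Guard for a backend operation: raise if the role lacks the permission."""
    if not role.can(permission):
        raise PermissionError(
            f"{type(role).__name__} lacks {permission.name}"
        )
```

With this layout, a UI can decide which buttons to show by calling `role.can(...)`, while the backend calls `require(...)` before performing the operation, so a low-permission account is restricted in both places.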

In summary, M2.1 genuinely has some chops when it comes to backend logic design — capable of completing full-stack delivery from backend to frontend.

Of course, its multilingual capabilities don't stop there. Even niche languages like Rust have received dedicated training.

For example, in an official demo case, M2.1 used Rust to build a dual-mode CLI + TUI Linux security audit tool, supporting one-click low-level scanning and intelligent risk rating for processes, networks, SSH, and other critical items.

And addressing the industry's widely acknowledged mobile development weakness, M2.1 has significantly strengthened native Android and iOS development capabilities.

For example, this iOS home screen interactive widget features a "Sleeping Santa" tap-to-wake mechanism with complete logic and native-grade interactive animations.

Why did MiniMax choose to release a new model at this particular moment?

Hardcore Progress on the Eve of IPO

At the delicate juncture of having just cleared its Hong Kong Stock Exchange hearing, MiniMax's choice to release M2.1 is a quiet strategic declaration.

Outsiders often pin an "emotional" label on this company because of hit products like Talkie and Hailuo, assuming its strengths lie in voice and video multimodal interaction.

But MiniMax's consecutive pushes on M2 and M2.1 this year demonstrate the coding and Agent capabilities of its text models.

For a long time, the industry has assumed AI is only good at fault-tolerant tasks like Web front-ends or Python scripts. M2.1 breaks through this ceiling by aligning with job context — for instance, truly understanding Go's concurrency model or C++'s memory management mechanisms.

M2.1 is also a concrete illustration of MiniMax's R&D efficiency. Its prospectus reveals that since its founding, the company has spent only about $500 million to build full-modal capabilities.

A key pillar supporting this efficiency is its exceptionally high "AI quotient" internally — over 80% of code is already written by AI. M2.1 is essentially the capability overflow of this long-serving "AI intern."

This "internal-use-turned-external-sale" path means the model has already served as a productivity tool supporting high-intensity iteration by a 385-person team before ever hitting the market.

Against this backdrop of highly AI-ified processes, MiniMax has developed unique insights into AI-native organizations — AI needs to create value across more job functions and in more authentic production scenarios.

It is precisely based on this understanding that this model was born.

For developers, this may carry more reference value than raw parameter metrics.

The capabilities and value demonstrated by the new model are MiniMax's best roadshow.

Talk is cheap. Show you the Model~