Moonshot AI K2's Writing Ability Nears Qwen 3

葬AI葬AI·July 15, 2025

Moonshot AI still has dreams (not a paid post 😭)

"Didn't get paid by Moonshot AI 😭"

Over the past couple of days, I tested the writing capabilities of Moonshot AI's new model, Kimi K2.

My partner and I are building a writing product, so we'd already run all the major models on the market through their paces.

My ranking for writing ability: Gemini 2.5 Pro > Claude Sonnet 4 ≥ various stitched-together models > Qwen 3 > Kimi K2 > DeepSeek R1

The vast majority of posts on this newsletter use AI-generated first drafts. I had all these models rewrite StepFun Has No News twice, then evaluated the results for strengths and weaknesses.

The full test document is in this Lark link, with the testing process and complete outputs from each model:

https://likczh6fsao.feishu.cn/docx/IPNxd1SZhoXjWkx6vW1c6vuTnYd?from=from_copylink

Space constraints mean this post will stick to conclusions.

Kimi K2's writing ability is slightly below Qwen 3. Its logical reasoning is solid — on par with Qwen 3, capable of explaining complex topics clearly. But its prose style skews toward DeepSeek: a bit floaty, with flashes of inspiration that prove uncontrollable.

K2's biggest writing problem is hallucination. It generates numerous specific claims and data points absent from the context, and does so fluently enough that you won't catch them without close reading. This makes it nearly unusable for serious writing.

For example, I had K2 write the full text of StepFun Has No News based on my verbal transcription and outline.

K2 fabricated large amounts of dialogue I never said.

My original point was something like: I asked a few friends, none knew any StepFun news. K2 rendered this as: "I asked ten AI founders; nine shook their heads, and the tenth shot back: 'Isn't ModelBest more qualified than them?'"

Did it study journalism? 😌

Most egregiously, it invents data with false precision. It completely made up: "01.AI once built an AI coding assistant that just cleared 50,000 DAU before the team disbanded," and "Zhipu AI once had a meeting-notes tool with 30,000 DAU; the lead has since left to start his own company."

Had I not known these companies well enough to be certain I never said such things in the transcription, and had I published this verbatim, I'd likely be getting sued right now.

K2's prose is also floaty, much like DeepSeek's — those same unpredictable flashes of inspiration. It spontaneously coined the line: "In the first half of 2024, VCs were throwing money around like grenades."

Vivid, sure. But overall, this unstable creativity combined with severe hallucination makes K2 unsuitable for serious writing.

The best writing model I've experienced so far is Gemini 2.5 Pro.

Gemini 2.5 Pro's logical reasoning is exceptionally strong. Ask it for a 2,000-word piece, and the draft it produces in one go has genuine logical coherence from sentence to sentence. Combined with its 1-million-token context window, it can handle all my writing needs.

Of course, Gemini occasionally still produces telltale AI-isms — flashes of inspiration that don't quite land.

For stability, Claude Sonnet 4 performs well. Its prose is extremely plain and unadorned, almost never saying anything weird. But Claude's weakness is equally clear: weaker logical ability. On pieces over 2,000 words, the overall flow is smooth, yet individual sentences lack logical connection to each other.

Then there's Qwen 3 — much like a scaled-down Gemini 2.5 Pro. Solid logic, capable of complex writing tasks. Clean, simple prose with occasional inspired lines. But Qwen 3 over-compresses, routinely stripping away useful details until the article is just a logical skeleton without fleshy specifics.

Back to Kimi K2.

At least on writing, K2 delivered no surprises. Logic comparable to Qwen 3, but more hallucination-prone, with a shorter context window too (128k vs. 256k). This makes it difficult for K2 to handle particularly complex writing tasks in one shot.

But does that mean I can say Moonshot AI has no dreams?

I retract my previous bias (didn't get paid by Moonshot AI 😭)

As of July 2025, the Moonshot AI team has produced a model competitive with Qwen 3 — and open-sourced this trillion-parameter model.

I can't say a company willing to open-source a near-state-of-the-art model has no dreams. Even if I won't use K2 for work, I maintain my respect for the team that built it.

(Images in this post generated by ChatGPT o3, with Gemini 2.5 Pro assisting the writing.)