May God bless Qwen and save Alibaba

Funeral AI (葬AI) · July 1, 2026

"Gold and silver mountains lie ahead"

"Comrades and friends, the patch notes are in!

AI app builders are out of moves. Lady Luck has tipped the scales back toward the foundation model companies once again. In light of this, we're bringing back our tribute series from a year ago — running through every model company one by one.

We've already covered Moonshot AI, Zhipu AI, and Doubao. Stay tuned for the rest 😘"

Alibaba is a fascinating company.

AI is absolutely white-hot right now. GLM 5.2 has hit Opus 4.8 level, Zhipu AI has completely shed its stereotype as a to-government (2G) shop run by old professors grinding for grants, and its market cap has broken through a trillion Hong Kong dollars, roughly half of Alibaba's.

After finishing the first edition of the Funeral AI Benchmark, I immediately saw how impressive GLM 5.2 was. What I also noticed was that Qwen 3.7 Max was damn impressive too — just a hair behind GLM, at Opus 4.7 level.

Here's where Alibaba gets interesting. Qwen 3.7 Max is indisputably China's #2 domestic model, significantly ahead of ByteDance's Seed 2.1 Pro.

And yet everyone still thinks of Alibaba as an old-guard company. Despite having a killer model, our dear Alibaba's stock has time-traveled back twelve years, happily crashing through its IPO price.

https://funeralai.cc/test/

This is karma, folks.

Alibaba went scorched-earth against Meituan's Wang Xing. Sure, Wang's gone full white-haired, issuing self-criticism edicts saying he's largely to blame for the stock collapse. But Alibaba's not living the good life either. Mid-to-senior P-level compensation packages got slashed by nearly half.

So here's the thing: delivery or AI — this is Alibaba's two-line struggle. Either delivery crushes AI, or AI crushes delivery.

The logic is simple. Every global tech giant is all-in on AI. But Alibaba's still fighting the delivery wars. Of course the shorts are coming for you.

Alright, enough bullshitting. Let's seriously discuss Alibaba's AI business.

Alibaba loves organizational reform, and its AI business is undergoing intense structural churn.

First, Lin Junyang's departure prompted Alibaba to create the Token Business Group (ATH), merging all AI operations and moving AI units previously under Alibaba Cloud, Taobao-Tmall, and others into this new group.

Then within ATH, they established the Token Foundry division, consolidating all model teams under direct leadership by Wu Yongming — the boss personally overseeing foundation models.

Most recently, the beautifully written Inside DingTalk triggered the merger of Agent products, a leadership change at DingTalk, and the consolidation of MuleRun and Wukong.

So here's where Alibaba's AI business stands now.

The ATH Business Group oversees five divisions: Token Foundry, MaaS, Qwen App, Wukong, and Innovation.

Token Foundry houses the Qwen, Wan, Happy Horse, and other model families.

MaaS recently rebranded the clunky Bailian backend into the cleaner Qwen Cloud.

The Qwen App division is primarily the Quark team building the Qwen app.

Wukong covers DingTalk and the B2B Agent Wukong.

Innovation is now down to Qoder; MuleRun, previously housed here, just merged with DingTalk.

Crystal clear: Alibaba is doing a massive consolidation of its AI business.

From organization to models to products, step by step. The goal is one company, one model, one product: the "Alibaba-Qwen-Qwen App" triad.

The problems start with the Qwen app.

As everyone knows, Doubao is in a league of its own, and the key factor is likely multimodality.

If people are just typing in chat boxes, every chatbot feels the same. Given Qwen's superior model, the Qwen app should theoretically be smarter. But folks, you don't need Doubao to be that smart.

Two Doubao moments stick with me. Earlier this year, when the Funeral AI crew went skiing in Tonghua, our taxi driver kept Doubao on voice chat the entire ride, riffing from the Russia-Ukraine war all the way to county-level housing prices.

And when my home had a leak during renovation and the contractor stonewalled, my mom called Doubao, which found the contractor boss's contact info. She called him directly, problem solved.

Features this good? Not copying them would be a waste.

What stunned me was that the Qwen app actually does have voice and video capabilities. But they're buried in the toolbar above the input box — you gotta swipe left like crazy to find them.

Hiding them this deep shows it's not that Qwen couldn't think of this, or lacks multimodal capability. It's a deliberate contrarian bet: sprint full speed toward adding Agent capabilities to the chatbot 👍

Beyond basic office-suite features, there's integrating with Taobao, Amap, and others — letting Agents order milk tea, book restaurants, hail cabs, and so on, aka "Qwen does errands."

First, this is innovation, and deserves credit. Like the recently hyped Doubao feature that supposedly calls restaurants to book — the Qwen app actually had this months ago, genuinely letting AI make phone reservations.

But the problem is that cross-app tool calls drag the task success rate way down. More importantly, chatbot is a first-order capability and Agent is second-order, and the funnel between them probably filters out 90% of users.

With first-order capabilities still incomplete, trying to leapfrog with unproven second-order ones is like attempting a standing long jump from flat ground.

Of course, maybe Qwen's PMs are playing 4D chess. Betting on rapid improvement in model Agentic capabilities, making these complex tasks increasingly smooth.

These are product details. What matters most: the Qwen app launched way too late.

Doubao started in 2023 and shipped video calling by mid-2025. By the end of 2025, Doubao stickers and fan-edit videos were everywhere, while the Qwen app had only just launched. The previous Tongyi and Quark apps basically had to start from scratch after the merger.

So once you see Doubao as the lesson Alibaba missed, everything it's doing today makes sense: merge, focus fire, concentrate forces on AI.

The good news: the whole AI industry just got patched. Everyone realized Doubao isn't profitable either. Or rather, 2C AI apps just aren't profitable.

The only profit path, verified by the evil Anthropic, is this: crush coding capability, target productivity scenarios. If your model is good enough, people will beg to throw money at you.

The freshest example is Zhipu AI. Revenue barely matters — if GLM can break through in coding scenarios and catch Opus 4.8, it's worth half an Alibaba.

As Zhipu AI co-founder Jie Tang put it: "The essence of the AI era is rapid technological progress — when you stop to polish your product, you might find the underlying tech has already fallen behind and nobody wants it; when you stop to think about business models, the AI world has already been disrupted again."

So everything comes back to model capability.

If Qwen can stay ahead, it's only natural that its productivity Agents, Qoder Work and the rest, will succeed.

The Zhipu AI story: because GLM memberships were so hard to get, tons of people downloaded Zhipu's Agent product Zcode. Gotta praise Zhipu here: copying Codex was the right move. Kimi Work, by contrast, copied the wrong thing, Claude's coworking product, which was itself a mess.

The Seedance story: with strong enough model capability, you naturally get distributors like Mianshen and LibTV, subsidizing out of their own pockets to help Volcano Engine hit numbers. Whether Lark or Dreamina's video Agents are any good barely matters.

So the question returns to: how do we evaluate Qwen?

Good news: Qwen genuinely leads. Bad news: it's always just that tiny bit short.

You can see it in Qwen's model releases — Alibaba leadership's biggest hope for Qwen is to break through one point, completely go viral, make the masses think Qwen is awesome.

To that end, Qwen 3.7 Max, like GLM, sacrificed multimodal capability to focus on coding and long-horizon tasks.

But it's always just that little bit short. Even Zhipu AI's own benchmarks show Qwen tied or beating GLM and Opus on several tests. The actual capability gap is probably less than one percentage point.

But that tiny gap is what makes GLM China's #1 model. During the month when Fable 5 was restricted, GLM was also the most powerful model freely available to the world.

The reward for #1 massively exceeds #2. During this period, everyone discussed Zhipu AI. Nobody discussed Qwen.

So here's the thing: there's a critical point for domestic models.

When a foundation model's capability can match the flagship models from A[holes] Inc. and O[rganization] Corp., or open-source something as transcendent as CoT, the world's attention concentrates on it. It gets cast as the knight challenging the evil tech monopoly — a dragon-slaying narrative.

I asked a stock-bro friend: if Qwen surpasses Fable 5 by year-end, can Alibaba stock pop 20%? His first reaction: forget 20%, 100% wouldn't be crazy.

I'm quite optimistic on Qwen. The latest models are extremely close, Qwen 3.7 Max launched a month before GLM 5.2, and Qwen's update frequency is slightly faster.

So the next-gen Qwen model will likely lead GLM and become China's #1.

But here's where the problem lies.

Qwen is already awesome and likely to break through the critical point soon. But does that mean Alibaba Group automatically becomes great along with it?

So we're back to: Qwen being awesome doesn't mean Alibaba's stock rises. Because in people's minds, Alibaba isn't a pure AI company — it's still fighting ugly wars.

Quick aside: as an Alibaba shareholder, I need to call out Happy Horse here.

Gaming the benchmarks to fool your bros is fine; just don't fool yourself. Video model benchmarks are useless because video quality is immediately obvious. Happy Horse's outputs are oily as hell. Say you beat Seedance and nobody will believe you.

This is common sense. Stop testing everyone's common sense.

Let me zoom out and summarize.

ChatBot is phase one competition. Phase two — productivity Agents — has just begun.

You can clearly see both Alibaba and ByteDance are still horse-racing on Agent products. ByteDance has Doubao Pro, Trae Work, Coze, and a pile of Lark Agents. Alibaba has Qwen Desktop, Qoder Work, and MuleRun just merged with DingTalk Wukong.

Obviously this can't last. Eventually BAT will each consolidate into one Agent product.

Here, Tencent reaps the dividend of Yuanbao and its models being weak — no dilemma, just push Workbuddy.

Compared to ChatBot, Agent products more severely test models' long-horizon task capability. ByteDance's weak base model becomes a bigger problem.

People subconsciously view Seed models through the halo of Doubao's strong product. But Seed 2.1 Pro is in fact a second-tier domestic model, behind GLM, Qwen, and Kimi, roughly on par with our dear MiniMax M3.

This is a fun question: why are ByteDance's products and video model unbeatable, while the Seed base models never quite made it?

After some thinking, the likely reason: the Seed team is too hot and external expectations are too high, so someone is always egging researchers on to go start companies, which shakes morale.

Compare with DeepSeek, Seedance, and GLM — all teams nobody cared about before, able to focus on quiet, scrappy development aka edge innovation, gradually breaking through capability thresholds, with the outside world only belatedly realizing they'd been cooking something huge.

This is an organizational problem.

Even if Seed pays well and Doubao RSUs flow freely, you can't give every researcher a Ferrari. But some startup bros really are driving Ferraris. When investors wave that kind of temptation at you, it's hard to resist.

Fundamentally, foundation models are an engineering problem, requiring the whole team to be purely focused and grinding. Those damn investors (specifically two VCs and FAs eyeing ByteDance) and competitors constantly poaching people to disrupt morale — that doesn't fly.

Another reason: doing C-end products well doesn't necessarily boost model coding capability. Look at companies with abundant C-end data — Google, Grok, Meta — models getting worse one after another.

As everyone knows, the Doubao product team does extensive post-training itself, and the audio-video features involve more than one model. When the product does well, the product team gets more say, and the model team naturally gains another "dad." Strong dad, weak son. Makes sense.

So Alibaba's external image dragging down public expectations for Qwen — this is actually a blessing, cherish it.

(Cover image generated by ChatGPT; the text is purely human-written. And here's a song 🎸)
