"The 100 Million Token Club" Is Packed — AI Is Running Out of Fuel | A Conversation with Wenyuan Yu: Technical Lead of Alibaba Cloud Bailian

March 29, 2026

🚥 If you've been using Claude Code, OpenClaw, or various Agents lately, you've probably noticed something: model capabilities keep improving, SOTA results are exciting, but Tokens are sobering — expensive, and never enough.

Someone even created a group for power users with a ridiculous threshold: burn through 100 million Tokens in a single day to join the "100M TOKEN Club." Even more ridiculous? That bar is getting "too low" — because more and more people are pushing AI from chat tools into real productivity workflows.

For this episode of Crossing, our guest is Wenyuan Yu, technical lead for Alibaba Cloud's "Bailian" platform. Wenyuan has a rare vantage point on how compute demand is exploding, which scenarios are devouring Tokens, how Agents are rewriting cloud paradigms, and the real engineering challenges behind "no amount of GPUs is ever enough."

We also discuss why surging Tokens aren't a temporary bubble but a phase-defining signal; when enterprises should actually build their own infra; and why AI coding's rise makes "vibe coding" more dangerous, not less.

If you care about how AI enters production, stabilizes, scales, and where the next wave of opportunity emerges — this episode is worth your time.

🎬 Our video podcast is now live on Koji Yang Yuancheng's channels on WeChat Video, Xiaohongshu, Bilibili, and YouTube.

📒 The transcript will be published on the Crossing WeChat official account.

🟢 00:00:39 Rapid Fire

Age, alma mater, MBTI and zodiac sign, one-sentence description of Bailian, work history.

🟢 00:01:09 The Token Nuke: Agent-Ignited Compute Explosion

Claude Code and OpenClaw are sweeping the globe. Behind this is not just a tool going viral, but a fundamental shift in how compute gets consumed.

Token volumes doubling month-over-month, and these are the highest-quality SOTA Coding Tokens — people have finally stopped treating AI as a chatbot
Power users burning 100 million Tokens daily is no longer an impressive threshold
Token count is misleading: Tokens from small models and Tokens from deep-thinking SOTA large models have fundamentally unequal compute value
"This is just the beginning" — Wenyuan said this three times in two minutes

🟢 00:04:15 No GPU May Idle for a Single Second

One CEO has made "the most aggressive compute investment," and it's still not enough. Cloud computing has never seen this kind of hunger for compute.

Qwen 3.5 launched on New Year's Eve; two weeks later, QPM hit the highest peak of any text model in history
Wenyuan's mission for his team, in one sentence: make sure no GPU idles for a single second
From 1,000 to 1 million cards, every single one fully utilized

🟢 00:11:49 Build In-House or Cloud? Let Me Offer a "Hot Take"

Cost control, data security, flexibility — these three reasons enterprises cite for building their own GPU clusters are, Wenyuan argues, precisely the reasons to use MaaS instead.

"I don't think there's any scenario that requires building in-house"
Enterprise-owned GPU clusters: inference optimization, algorithm iteration, scheduling complexity... once you add up all the hidden costs, is it really cheaper?
"Confidential inference": end-to-end encryption with keys held by the user; Alibaba Cloud cannot see your model files or any of your requests, which is a cryptography-level guarantee
You think buying GPUs gives you flexibility? "The only certainty today is uncertainty"

🟢 00:14:40 Unpopular Opinion: Don't Let AI Write Too Much Code for You

Advice to CS students: "use AI less" — isn't that contradictory?

The core skills engineers need haven't fundamentally changed; what has changed is that everyone's efficiency has massively improved
Zhang Wenhong's analogy: if an intern doctor relies entirely on AI from their first patient, they'll never catch that 1% error AI makes — programmers face the same problem
Vibe Coding is fine for prototypes, but production code requires knowing every line's side effects, and AI still falls slightly short there
Spec Coding is the real answer: write the requirements spec clearly, let AI fill in the blanks — a FAST conference paper proved that with a clear enough spec, even a 32B model can write a high-quality file system

🟢 00:21:40 The Most Counterintuitive Prediction: OS Developers Will Be the First Replaced by AI

Everyone assumes frontend engineers are most at risk. Wenyuan says the opposite.

Writing OS kernels, databases, file systems: high-quality codebases, clear test cases, fully quantifiable results — exactly the "closed problems" AI excels at
The closer work gets to humans, the harder it is to replace: "What makes a good short-video app?" is an open problem with no clear answer, one AI can't solve
AI performs well in math and coding competitions precisely because the problems are sufficiently clear; MaaS systems engineering is actually an open problem, and AI is still far from solving it
"Even when AI eventually reaches 99.9%, hold onto your own 0.1%"

🟢 00:23:29 Compute Is Like Oil, but Today the Key Isn't Just Compute

China's cloud computing reference architectures have historically followed the United States — but this time, even the U.S. doesn't have the answer.

Impact of NVIDIA supply restrictions: the question is not "can China produce oil," but "can daily supply keep up with daily demand"
Cars are already racing on the highway, but there's not enough fuel — the compute supply gap will profoundly affect China's AI development speed
Take on Neoclouds: not bullish on pure resource-reselling models, more bullish on AI-native infrastructure — sandbox hosting, observability, Agent-centric specialized services

🟢 00:29:39 The Future Infrastructure Quartet: Water, Electricity, Gas, and Models

Will AI become a standard commodity like water, electricity, and gas? Wenyuan says yes, but it won't become the kind of standardized infrastructure where "you plug in and get 220 volts."

AI will definitely become utility-like infrastructure, on the level of highways and telecom operators
But it won't be standardized — diversity in speed, model performance, and functionality means it won't be "one plug solves everything"

Subscribe to Crossing: 🚦 We track the industry shifts and entrepreneurial opportunities brought by the new wave of AI technology.

🚦 Crossing is Steve Jobs's metaphor for Apple — standing at the intersection of technology and liberal arts, where great products are born. AI is transforming every industry. We seek out, interview, and bring together a new generation of AI entrepreneurs and active participants in the AI era, exploring and embracing new changes and new possibilities together.

👦🏻 Host Koji: I founded Crossing and launched AI Hacker House, a community space for a new generation of AI entrepreneurs, and serve as Venture Partner at ZhenFund. I believe technology, especially AI, represents the greatest value creation opportunity of our generation. Koji's Jike | Koji's website

👧🏻 Host Ronghui: I co-founded Crossing, worked at a dollar-denominated VC, and spent five years as a Silicon Valley correspondent, tracking technological development and business stories. Feel free to reach out and chat. Ronghui's Jike