MiniMax's Junjie Yan in Conversation with Mingming Huang: For AGI, There's Only One Path — the Hardest One

Anyong Waves · June 19, 2024

A Conversation on Non-Consensus and Long-Term Optimal Solutions

By Lili Yu

Edited by Jing Liu

Bald head, smiling face, an air of easygoing warmth. On June 14, when MiniMax founder Junjie Yan appeared at 36Kr's WAVES 2024 conference, his was still a face unfamiliar to many. As the founder of a closely watched AI large-model company, this marked his first appearance at an offline summit. In the Chinese large-model war that has dragged on for a year and a half, MiniMax was one of only two startups founded before ChatGPT's emergence, and it remains one of the highest-valued Chinese large-model startups and the fastest explorer of consumer-side commercialization. Yet Yan has remained remarkably low-profile.

In this dialogue titled "Non-Consensus and the Long-Term Optimal Solution," Yan's interlocutor was Mingming Huang, founding partner of MingShi Capital and an early investor in MiniMax. During the conversation, Yan spoke with unusual candor about topics he had never before disclosed: how his initial decision came about; the difficult choices on model architecture; organizational management insights; and how he views the inflection point for AI application explosion, and China's position in AGI relative to the world.

"AGI is a race China cannot afford to lose," Huang said.

Currently, beyond MingShi Capital, MiniMax's investors include Hillhouse Capital, IDG Capital, Hongshan, Yunqi Capital, miHoYo, Alibaba, Tencent, and Xiaohongshu, among others. WAVES is a new summit IP launched by 36Kr last year; this is its second edition.

Yan and Huang in conversation at the WAVES conference. Below are the key takeaways we extracted from Yan's remarks:

1. One reason I moved from AI to deciding to pursue general AI: My grandfather once said he wanted to write a book about his decades of experience. But he couldn't actually do it, because it requires very strong language organization skills, and at minimum the ability to type. I believed only artificial intelligence could help him achieve this.

2. AI development will have three stages:

  • Stage one: Before 2021. Here AI didn't exist independently; it was more a component within business operations and products.
  • Stage two: AI begins to possess some general capabilities, able to solve more mass-market problems. At this point AI can independently drive certain products. This is where we are now.
  • Stage three: AI capabilities stably exceed those of the average individual. AI-driven user time online will necessarily surpass that of traditional products.

3. The GPT-4 generation of models has roughly a 20-30% error rate across various benchmarks and many real-world scenarios. The future inflection point is when model error rates drop by another order of magnitude, and application scale increases by two more orders of magnitude.

4. If there are only five global AGI companies in the future, at least two will be Chinese companies, and at least the second-place company will be Chinese.

5. Whether cash-rich tech giants or cash-poor startups, Chinese companies' investment at the compute level will likely be 1-2 orders of magnitude smaller than that of American companies. This is very certain for the next two to three years.

6. Transitioning from Dense to MoE models is one of the necessary conditions for building better models. Including synthetic data, attention mechanisms, multimodal fusion, and more — the technology stack required for better models is accumulating ever more components.

7. When do you know you've made the right choice in entrepreneurship? When you discover you had no choice at all, but could only find a single path — that's likely the correct one.

8. Believe that AI's value lies in serving ordinary people, because the vast majority of people in any social stratification are ordinary people.

9. Talent is the company's most core asset, because talent and the organization of talent create everything.

10. I grew up in a relatively underdeveloped region. A clear observation: people in these regions may need AI's help even more than city dwellers.


The following is the complete dialogue, edited by Anyong Waves:


The Initial Decision

Mingming Huang: Hello IO (IO is Yan's internal company moniker). Many people have never seen you in public, saying you must be some AI hidden behind the scenes. I'm very glad to have pulled the real person on stage for direct exchange, at 36Kr Anyong's invitation.

You and I first met at the end of 2021. Our introducer was Wei Liu, co-founder of miHoYo. You had just started your entrepreneurial journey with the first Chinese startup to propose building general artificial intelligence. At our first meeting, three of us from MingShi went. To be honest, I didn't understand what you were talking about — you spoke of dialogue, voice, digital humans. The market was also abuzz, saying you were a metaverse company pivoting to AGI. Fortunately, at least one of our three people understood: Ling Xia. After he talked with you again, he told me we absolutely had to invest. That's how we came to invest in MiniMax so early, giving us a ticket to a new world.

MingShi Capital thus established a rule: when meeting with heavyweight founders, we bring at least three people — in case one of them understands.

Going back to 2021: because the previous wave of AI had fallen far below expectations, both in social value and commercial value, that was a rather dark moment for the world's view of AI. ChatGPT wasn't released until November 30, 2022. Earlier, in 2021, what massive opportunity did you see, and why were you so convinced so early that AGI was coming?

Junjie Yan: It was only a little over two years ago, but it feels like a century has passed.

Huang: A day in heaven, a thousand years on earth.

Yan: I thought of this three years ago, at the end of 2020. Why I decided to pursue general AI — actually two very extreme things happened that made me realize I had to do this.

I had always been doing technical work, writing papers, doing research. One classmate worked in algebraic geometry, one of the most cutting-edge fields in mathematics. One day he told me his advisor's advisor had passed away, and I realized that in such an important frontier field, there might be only about 20 people in the world who understand it. Progress in this field is very random, and it's increasingly difficult to even enter it.

If progress depends on a random handful of individuals, it will certainly run into trouble. How can a frontier field sustain progress? Beyond cultivating better people, I began to wonder: could better AI also achieve this? If technological progress matters, then relying on technology offers the highest degree of certainty.

It's the same for ordinary people, not just frontier fields.

My hometown is in a county town; I often go back to observe county town life. My grandfather, in his seventies or eighties, one day said he wanted to write a book about his decades of experience. Maybe not many people would care about this experience, but I cared. I found he couldn't write the book, because it requires very strong language organization skills, requires typing — he couldn't do any of it. How could his experience become a book? I couldn't help him, but I believed AI could.

I realized that whether it's the most cutting-edge matters or ordinary people's affairs, with more general AI technology, everything would be very different.

But at that time, AI technology was heavily dependent on customizing models for specific needs, able to solve only particular problems — facial recognition, speech recognition, things like that. This mattered in the long term, but the actual value AI produced at that point was very limited. Either the methodology was wrong, or the direction was.

I began to realize that to solve this problem, the only way was to make AI more general, to make it part of ordinary people's lives. That's when I started thinking I absolutely had to do general AI, had to do AI to C. But at that time, there wasn't even the term "large model." In simplified language, it was an interactive intelligent agent — easily mistaken for doing digital humans.

Huang: As a pioneer in this field, could you share how you view AGI development over the next five, ten years?

Yan: We can look at history first. I think AI development will have three stages.

Stage one: Before 2021, AI mostly existed in university labs, including many large companies having such labs, using better algorithms to solve specific business problems. AI in this generation didn't exist independently; it was more a component of business and products, making a particular function more efficient. This was the stage after deep learning emerged but before large models.

For example, facial recognition, speech recognition, many beauty camera apps and similar things.

Now we're in the second stage. Starting in 2020, American companies like this emerged; we started at the end of 2021. AI can already exist as an independent product form. The core variable is that AI can become general — general means no customization needed, able to serve more scenarios, and only then does it have independent value.

For example, AI-native products can emerge as AI assistants or AIGC content communities. But the problem is that current user penetration isn't that high. How do we increase it? Mainly through technological progress and product innovation. We've found that, at least on our own products, basically every major user inflection point comes from a model capability improvement — this is a very significant phenomenon.

Stage three is after another round of model capability improvement, when error rates drop another order of magnitude, and model capabilities stably exceed ordinary individuals. This will certainly produce user interaction frequency surpassing recommendation-system-based applications. The inflection point can be defined as: model error rates drop another order of magnitude, application scale increases two more orders of magnitude.

Huang: For models to enter the next era, error rates need to drop an order of magnitude, and user scale needs to pass 100 million DAU.
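The arithmetic behind this inflection-point definition is simple enough to write down. The numbers below are illustrative: the 20-30% error-rate figure comes from the remarks above, while the DAU baseline is a hypothetical assumption.

```python
# Illustrative arithmetic for the "inflection point" in the dialogue:
# error rate drops one order of magnitude, application scale rises two.
error_rate_now = 0.25                   # midpoint of the 20-30% range cited for GPT-4-era models
error_rate_next = error_rate_now / 10   # one order of magnitude lower
dau_now = 1_000_000                     # hypothetical current app scale
dau_next = dau_now * 100                # two orders of magnitude larger

print(f"{error_rate_next:.1%}")         # 2.5%
print(f"{dau_next:,}")                  # 100,000,000
```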


China's AGI Position in the World's Future: At Least Second Place Is a Chinese Company

Huang: As a China-born AGI company, we're destined to have 1-2 orders of magnitude fewer resources than OpenAI and the world-class tech giants. You've even said that none of the top 50 people globally with the most significant influence on large models are in China. As a Chinese AGI startup, how do you catch up with top companies like OpenAI, and what opportunities exist to surpass them?

Yan: We can look at some objective numbers. Beyond OpenAI, leading startups have over $1 billion in funding. But this isn't a startup-only track; it's a track where startups compete with larger companies from the previous generation. Looking at American companies — Google, Microsoft, AWS — they're all at the hundreds of billions of dollars investment scale for the coming years.

Huang: Each one is investing $100 billion over three years.

Yan: This is the consensus among several American tech giants. OpenAI also has similar-scale investment. In China, perhaps ByteDance, Tencent, or Alibaba might have that much money. But under these compute constraints, they actually can't spend it. Whether cash-rich giants or cash-poor startups, investment at the compute level will likely be 1-2 orders of magnitude smaller — this is very certain for the next two to three years.

We can't complain too much about this. Let's think about how to do AGI well given this, with the constraints being objectively real.

You need compute, data, and algorithms. But actually, there's a very core element being overlooked: users.

AI isn't just embodied in a model; there's another part that can be embodied in user creation. Objectively speaking, we'll lag somewhat on models, and through much effort narrow that generational gap. We can be closer to users, making up the gap through users. This simplifies to: technical catch-up, then more engagement with users, jointly achieving AGI.

Huang: First, we have to acknowledge this reality — the gap in compute and resources. But we also have advantages we can continuously iterate on: user experience, our user base, a user-centric mindset. The engineer dividend can also play a significant role.

Extending from the above question: in the last era, we invested in electric vehicles, and were early investors in Li Auto. I remember BYD's Chuanfu Wang saying: "Together, we are Chinese automobiles." As a pioneer and leader in Chinese AGI, how do you view Chinese AGI companies' positioning in the future global landscape? I personally believe AGI is a race China cannot afford to lose. If we lose this race, I think it's like when China first faced the world in the 18th and 19th centuries, treating an already industrialized world with an agrarian civilization mindset.

Yan: AI R&D investment will certainly grow larger — this is undeniable. In the short term there will be much competition, both domestic and international, with much randomness that can't be fully anticipated. But looking long-term, five or ten years out, suppose there are only three AGI companies in the world, or only five.

Huang: If there are five AGI companies.

Yan: Then at least second place should be a Chinese company.

First, because China has 1 billion internet users — at least in user scale, China is absolutely leading.

Second, on talent: although China's current overall environment and innovation capability still have gaps compared to the US, we can also see many excellent people returning or growing up here. And we don't need to think of AI as something particularly mysterious — it's like other disciplines, such as new energy or biopharma. I believe that although China currently has gaps, China's overall talent quality and talent ecosystem will keep improving. At that point, China's best company may still trail America's first company, but will likely be better than America's second, because in America, too, there will be concentration at the head.

In the short term we're behind on compute resources and chip process technology, but on communication interconnect we're leading.

Huang: Communication interconnect is world-class.

Yan: Although we'll experience many challenges in the short term, with gaps in various aspects, looking long-term, if there are five, at least two are Chinese, and at least second place is Chinese.

Huang: Eight years ago, when we looked at intelligent EVs, when we disassembled Tesla from its vehicle electronic and electrical architecture to its battery pack, our first feeling was: the traditional auto industry is finished.

Our second feeling was: how can China's auto industry catch up? There's no way to catch up — the gap was several generational levels. But we used seven or eight years to achieve "corner overtaking" — Chinese EV development is evident to all.

Musk said he believes that of the world's top ten auto companies in the future, there should be one Tesla, and the remaining nine should all be Chinese companies. In the Chinese AGI field, to borrow IO's words, if there are five in the future, at least 2-3 will come from China. We acknowledge the gap, but still have great hope for catching up.


On the MoE Decision: This Was the Only Path

Huang: From day one, many decisions MiniMax made were highly non-consensus. We were the earliest to propose doing general AI in 2021; last year we bet on MoE (Mixture-of-Experts). As of June 2023, MoE wasn't even consensus in Silicon Valley — only OpenAI was fully betting on MoE, while Google was fully betting on Dense models. Even MoE's originators didn't quite believe in the MoE path.

MiniMax internally decided in June to fully bet on this, staking nearly 80% of available compute resources. At the time, MiniMax was in the middle of a funding round at around a $1 billion valuation. While the bet was beneficial long-term, domestic peers who didn't make this choice could more easily ship functionality whose value investors and users could see immediately. At such a moment, why dare to make this decision?

Yan: Two things caused it. As an entrepreneur and a relatively rational person, I do a lot of analysis. We found that at the time we were processing tens of billions of tokens daily. With a Dense model, we couldn't afford to produce that many tokens — very quickly, inference costs would burn through all our money.

Huang: And that was just with the user volume at that time.

Yan: We already clearly knew then that although outwardly it was a C-side product, the value delivered to users was essentially model capability improvement. We could easily see the Dense ceiling was right there.

If we pursued a higher ceiling, we had to make similar technological innovations. It wasn't that there were two paths to choose from — rather, to achieve your goal, this was the only path.

Huang: This is a necessary condition for AGI.

Yan: This is a necessary condition for better models. Not just choosing MoE, but including all kinds of decisions in entrepreneurship — I found that what initially seemed like choices actually weren't choices. When do you feel a choice was right? You discover it wasn't a choice, but the only path you could think of, the only one leading to your goal. At that time, it had to be this way; if we couldn't do it, we were finished.
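Yan's inference-cost argument can be made concrete with a toy sketch (illustrative only, not MiniMax's actual architecture). In a Mixture-of-Experts layer, a router activates only k of N experts per token, so per-token compute scales with k rather than N, which is why an MoE can serve far more tokens per unit of cost than a dense model of equal total capacity.

```python
# Toy sketch of top-k Mixture-of-Experts routing (illustrative only,
# not MiniMax's architecture). Each token runs through only TOP_K of
# N_EXPERTS expert networks, so per-token compute scales with TOP_K.
import numpy as np

rng = np.random.default_rng(0)

D = 16          # hidden dimension
N_EXPERTS = 8   # experts in the MoE layer
TOP_K = 2       # experts activated per token

# Each "expert" is a small feed-forward weight matrix.
experts = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(N_EXPERTS)]
router_w = rng.standard_normal((D, N_EXPERTS)) / np.sqrt(D)

def moe_layer(x):
    """Route one token x (shape [D]) through its top-k experts."""
    logits = x @ router_w                         # router scores, shape [N_EXPERTS]
    top = np.argsort(logits)[-TOP_K:]             # indices of the k best experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                          # softmax over the chosen experts
    # Only TOP_K experts run; the other N_EXPERTS - TOP_K are skipped entirely.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

x = rng.standard_normal(D)
y = moe_layer(x)                                  # output, shape [D]

# A dense model of equal total capacity runs all N_EXPERTS blocks per token.
dense_flops = N_EXPERTS * D * D
moe_flops = TOP_K * D * D + D * N_EXPERTS         # k experts plus the router
print(round(dense_flops / moe_flops, 1))          # 3.2
```

With 8 experts and top-2 routing, the toy layer does roughly 3.2x fewer multiply-adds per token than running every expert, and the ratio grows with N/k — the lever that makes serving tens of billions of tokens daily affordable.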

Huang: I've always said entrepreneurship is like life itself — maybe 5-6 decisions most influence a life, and it's the same for startups. Each decision determines whether you and your peers diverge by a hair's breadth or by thousands of miles. It looks like there are many choices, but after thinking clearly, perhaps there's only one — is it because you're looking at more long-term things that you reach this conclusion?

Yan: Right. We knew this venture would be very difficult, but optimizing for a 3-6 month target is meaningless — it's a very long-term thing anyway.

Huang: Short-term optimization can be perceived externally, but doesn't mean much for long-term goals.

Yan: Yes. This sounds simple when you think about it — internally we say "no shortcuts." We've taken some shortcuts internally, but every time we got slapped in the face. Eventually this became the company's number one value: no shortcuts. But even so, sometimes we still instinctively take shortcuts.

Huang: Human nature prefers shortcuts, especially with so many smart people in our industry. I've talked with your company's executives — at the time, globally only OpenAI had produced MoE, and this technology is indeed as difficult as you said. We failed twice. I know some of your executives were actually quite nervous; they asked you about it along the way, but their feedback to me was: every time they came to you, they couldn't tell whether you were pretending or genuinely resolute — not a shred of hesitation. Were you ever hesitant in your heart, especially having failed twice, having bet almost all the company's compute resources and manpower — was there ever a moment's hesitation?

Yan: Actually, I was still very nervous. When you can't think clearly, you get tangled up. But when you think clearly, you discover it's the only path. Knowing other paths won't work, only this one will. Even if you're anxious, it's useless — you can only move forward, because you're already convinced this is how it is.

Huang: Because the long-term goal determines this is the only viable path. He didn't even tell most shareholders about the MoE bet. Last year there were market rumors — some peers shipped good functionality, some kept iterating on Dense — saying MiniMax's large model had stalled at its March version without major progress.

Many people asked, many people worried on your behalf not knowing what you were doing — the original model wasn't iterating, products hit bottlenecks.

In January this year, your MoE wasn't fully completed, but by then you already had confidence. Once when we met, IO very casually told Xia and me: he'd staked nearly 80% of company resources, failed twice, but the thing was more or less done now.

He appeared nonchalant. I also appeared unruffled on the surface, but I can share my true feelings at the time with everyone: I felt the person across from me was either a madman or a genius, daring to bet all resources on this. Every investor has the thought that they invested too little in a company — and that was the first thing I said to Xia after walking out the door from that conversation with Junjie about the MoE bet: we invested too little in MiniMax the company, and too little in Junjie the person.


On Users: Facing Every Ordinary Person

Huang: OpenAI is simultaneously doing the two hardest things in the world: AGI, and a killer app at massive scale. MiniMax was also the earliest Chinese AGI company to propose simultaneously building large models and killer apps. Why must you do both?

Yan: This comes from internal philosophy formed during the entrepreneurial process. We realized two things: AI's value lies in serving ordinary people, because the vast majority of people in any social stratification are ordinary people. Greater value means enabling more ordinary people to use your product.

If you want to serve ordinary people, the only way is to reach so many people in product form. This company's value also lies in how much value it creates for users — the more users, the greater your value. Technological progress needs to depend on much user interaction feedback; feedback isn't necessarily direct likes, there's all kinds of information. User feedback makes models better — this is the core element.

Huang: This reminds me of the electric vehicle field a few years ago — there was a wave of people aiming for the stars, charging toward L4, L5. But EV companies like Tesla and Li Auto said: I need as many cars as possible running on roads, getting user driving behavior feedback to better build autonomous driving models. This has remarkable similarities with what you just said.

Yan: We create better AI together with users, not that we make a great technology and give it to others — this is our understanding of technology and product. Users, or users' creations, are part of the model and product, not two separate entities. It's not about making the best thing, then like God letting everyone use it.

Huang: Users and users' creations are part of the product and model. We've found a very interesting phenomenon: people in Silicon Valley say the road to AGI is destined to be full of power struggles. Whether in Silicon Valley or China, most people doing AI take an elite perspective: I make the most awesome thing, and you — the billion of you, the six billion of you — just use it. All viewing the masses from above with an elite perspective. You've said we're not developing this technology so much as co-creating it with users.

Huang: Beyond what you just said, how does this relate to your personal growth experience?

Yan: I grew up in a relatively underdeveloped region. Now I spend much time living in cities, but still have opportunities to often see how my hometown people live. A clear observation: they may need AI's help even more than city people. Whether elderly people, working people like me, or even younger students.

Huang: When Junjie first mentioned this to me, I felt very ashamed. Previously, whether internally discussing AGI or from so-called social moral constraint angles, we mostly took an elite perspective.

This reminds me of a recent joke. This year's college entrance exam apparently had a question about AI's impact on human social development. For a child in the mountains of Yunnan, who may never have touched a computer or the internet, who's thinking about how to finish farming the family crops and still attend class — how is such a child supposed to answer about AI's impact on social development? So co-creating with users, creating value for every ordinary person — this was something very moving that MiniMax gave me.


On Organization and Management: Gentle Exterior, Decisive Action

Huang: Seeing IO for the first time, people are easily misled by appearances — smiling, cheerful. Later, when I talked with your colleagues, I found they went through a journey from doubt to trust with you on whether to make each bet — because each time you were ahead of domestic companies, even ahead globally.

Our biggest concern at the time was: under that gentle exterior, can you actually manage? After all, running a company is different from developing technology. Later, after talking, I found you're the complete opposite of your appearance — when making decisions, you're incredibly efficient and decisive, without a shred of hesitation, considering only whether something helps achieve the next better model or longer-term progress; if it doesn't help, you cut everything useless.

At your previous company you also managed a thousand-person team, while MiniMax today has only 300-plus people. Did this organizational and management approach exist from the company's founding, or did you rapidly iterate after encountering problems?

Yan: This is a very critical question. Suppose this company had no employees, leaving only some money, models, and users — it actually couldn't get better.

Talent is the company's most core asset, because people and organization create everything that follows.

When starting this company, I already thought this through. Because your resources are all limited, competition is fierce, goals are especially difficult, there's all kinds of uncertainty completely beyond control. The only way is to think about the most essential things, not be misled by superficial things.

If you think about underlying things, the only thing is technological progress efficiency. Technological progress efficiency and effectiveness can convert into each other — suppose you have limited compute resources, higher efficiency means faster iteration thus better effectiveness. These two things weren't equivalent in traditional AI, but in this era, efficiency and effectiveness are almost equivalent.

If your sole goal is R&D efficiency, you can naturally deduce what R&D organizational form can have relatively high efficiency. From this you can almost deduce what a good organization should look like, how it should operate, what kind of people to find, how to make an organization go from good to excellent — you can deduce many things.

The only way is to find the few streamlined core principles inside, and deduce from principles what should be done. When encountering errors, constantly adjust. The clearer your underlying thinking, the lower your probability of making mistakes.

Huang: Like Yiming Zhang, or Li Xiang of Li Auto — your hearts are very pure. Most people making decisions weigh countless considerations: what does the outside world think, what do investors think, what do employees think, what does the media think. But most of us haven't thought clearly and thoroughly enough about those most underlying, truly long-term goals you just mentioned.

In fact, only people with sufficiently pure hearts, people with genuine faith in and love for this work from the heart, can resolutely and decisively discard the noise and choose the long-term optimal solution — a solution that may, at the time, appear non-consensus to many people.