Zhilin Yang: Innovation, Long-Term Thinking, First Principles

Monolith (砺思资本) · April 29, 2024

I'll see you on the other side of the moon.

Earlier this year, we sat down with Zhilin Yang for a two-hour conversation covering AGI, entrepreneurship, consciousness, philosophy, life, and vision.

Monolith has invested in Moonshot AI across three consecutive rounds. Based on our understanding of this team, they are product- and results-oriented AGI evangelists who stay grounded and pursue truth.

The road to AGI is destined to be long. If you recognize what truly matters and persist in it, then when you look back years from now, after reaching the destination, you may find that much of today's discourse was merely trivial noise.

We used Moonshot AI to organize this conversation, hoping it helps outsiders better understand Moonshot AI as a company.

In the opening of 2001: A Space Odyssey, an ape-man tosses a bone high into the air. As the spinning bone rises into space, the scene cuts instantly to millions of years later — the bone has transformed into a spacecraft roaming the cosmos. Stanley Kubrick uses this powerfully imaginative transition to compress humanity's 3-million-year evolutionary journey into a single moment, creating one of cinema history's greatest montages.

In this epic sequence, from the blood-and-guts Stone Age to piloting spacecraft to the moon, millions of years pass in the flick of a finger — all because of the ape-man's mastery of tools (technology). Meanwhile, aboard this starship rides one of cinema's earliest AI antagonists: the cold, cunning supercomputer HAL 9000, which would become the progenitor of artificial intelligence in science fiction films.

In the summer of 1969, Neil Armstrong's voice reached every corner of the world through television. 110 hours after leaving Earth, he became the first human to set foot on the moon. That day, roughly 600 million people worldwide watched this greatest of human achievements via live television broadcast.

To celebrate this moment, Pink Floyd — then an invited band — was improvising in a BBC studio.

Four years after humans landed on the moon, Pink Floyd released their eighth studio album, The Dark Side of the Moon. Through panting footsteps, airport boarding announcements, cash register drawers opening and closing, and throbbing bass lines, they expressed to the world their understanding of conflict, greed, time, and death. The album has been referenced continuously in the half-century since.

From bone club to starship, from ape to AI, from Monolith to Moonshot — throughout this journey of exploration, what has always accompanied us is the fear of the unknown, the boundlessness of space, but more importantly the passion and dreams of cosmic exploration, and the ultimate reflections on human nature and our future.

Today, achieving artificial general intelligence is called this era's "Apollo Program." Moonshot AI is a large model company dedicated to realizing AGI, named after Pink Floyd's album. Founder Zhilin Yang's GitHub bio reads:

The ultimate goal of all my work, including both research and business, is to maximize the value of artificial intelligence.

In our conversation with Zhilin Yang, we didn't discuss many product or company details. Instead, we talked more about long-term vision in the AI field, reflections on organization, consciousness, and life, the original intentions behind entrepreneurship, and recently read books and films.

Below is the full transcript, organized by Monolith, with edits and abridgments:

I

"Innovation, Long-term, First Principles"

MONOLITH:

This is the inaugural episode of our First Mover podcast. Xi Cao, as founder, please start by explaining where the names of our fund and our podcast come from.

Xi Cao:

The name Monolith comes from the mysterious black monolith in the famous science fiction film 2001: A Space Odyssey. A key reason we chose it was that we felt it represented our curiosity, exploration, and pursuit of truth about the ultimate nature of the universe.

The name "First Mover" was actually contributed by a good friend. It derives from the cosmological theory of the ancient Greek philosopher Aristotle, representing our interest in investigating the decisive factors that drive the development of things.

MONOLITH:

Zhilin, you can also share the backstory of naming the company Moonshot.

Zhilin Yang:

When we were using Kimi in internal testing, we wanted to come up with a Chinese name. After thinking for a while, we said why not call it "Yue Zhi An Mian" (the dark side of the moon). I typed it in and asked Kimi, "If 'Yue Zhi An Mian' were a company name, what should it mean?" Kimi then cited two references — one was Pink Floyd's album, the other was Kubrick's film. So when I first met Xi Cao, it felt very connected.

The dark side of the moon is like AGI: you normally see only the illuminated side, and you don't know what the dark side looks like. It's mysterious, and seeing it is difficult. In English the name translates neatly to Moonshot, which implies doing something enormously valuable but extremely difficult. This matched our original intention quite well.

MONOLITH:

Xi Cao, when did you first meet Zhilin, and what was your impression?

Xi Cao:

The first meeting was online, and actually I was mostly the one pitching (laughs). The second time was downstairs at the company, and that left a deep impression. I feel that a good company should be explainable in one or two sentences. I remember Zhilin said: we want to do AGI, and we want to build consumer (to-C) products. I found it very pure, and that settled the matter.

Earlier, in 2019, we had also met at a Tsinghua University event, but weren't very close then. Among those at the dinner table, I felt Zhilin was a founder with potential, with spark and substance. Many things aligned after that.

MONOLITH:

Zhilin, you can also talk about how Xi Cao's pitch moved you.

Zhilin Yang:

(laughs) Actually not just the first conversation — from when we met to now, including these funding rounds, as well as business and recruiting, Xi Cao has been enormously helpful to us. So he's one of our favorite investors so far.

MONOLITH:

We have a rapid-fire Q&A segment. I told Kimi yesterday that I would be talking with Zhilin Yang today, and it generated 100 questions, from which we carefully selected 10.

First question: Describe Moonshot AI in three words.

Zhilin Yang:

Innovation, long-term, first principles.

MONOLITH:

Describe yourself in three words.

Zhilin Yang:

Same as above.

MONOLITH:

Share a book you've been reading recently.

Zhilin Yang:

I've been reading a book called Everybody: A Book About Freedom.

MONOLITH:

Share a song or album.

Zhilin Yang:

"Brain Damage," from The Dark Side of the Moon.

MONOLITH:

Share one person who has had the biggest impact on your life.

Zhilin Yang:

A single biggest influence? No one. Answering that would take precision, or more thought.

MONOLITH:

If you could have dinner with anyone in history, who would you choose? Why?

Zhilin Yang:

Steve Jobs. He's a successful case of someone who could generalize things with taste. There aren't many such cases — maybe Steve Jobs is one, Pink Floyd counts too.

MONOLITH:

Which scientist's work do you think has had the most profound influence on the development of modern artificial intelligence?

Zhilin Yang:

Geoffrey Hinton (godfather of deep learning).

MONOLITH:

In all of history, what do you think is the most successful internet product?

Zhilin Yang:

YouTube, because it has had massive impact.

MONOLITH:

If we narrow this context to China?

Zhilin Yang:

For China, I think it's WeChat, because it's China's super app for the internet.

MONOLITH:

Last question: what is something you believe deeply right now?

Zhilin Yang:

An AGI company needs a new organization built from 0 to 1.

MONOLITH:

Are you satisfied with the questions Kimi generated?

Zhilin Yang:

I think they're pretty good.

II

"Show me the code matters more than anything else"

MONOLITH:

Moonshot AI isn't your first startup. You spent some time in the US — what did you see there that reignited your passion to start another company?

Zhilin Yang:

I definitely felt I was seeing massive change. AI as a technology has been developing for seventy or eighty years, and there had never been a consumer-facing opportunity like this. There were some excellent AI companies before, but they mostly focused on B2B.

But starting in the second half of 2022, more and more applications like Midjourney and ChatGPT emerged — even ordinary, non-technical users started using them, and weekly active users surpassed 100 million. These were very clear signals. There was definitely enormous opportunity here.

Also, compared to the internet, its biggest characteristic is that it creates new productive forces, not just connections. So I believe this is one of the most valuable things in the next ten years.

MONOLITH:

Looking back, would you have chosen a more aggressive approach — say, going directly into large models and AGI during your first startup a few years ago?

Zhilin Yang:

We actually did quite a lot of AGI-related work back then. For example, some of the earliest large model projects in China — we did a lot of advocacy behind the scenes. But objectively, many things only work when done at the right time. I think building an independent AGI company requires the right moment.

In fact, we had fallen into many pitfalls before, which helped enormously with this venture, because now we know what doesn't work. The biggest lesson for me was this: if you want to do AGI, you absolutely need to build an organization from 0 to 1. This is something I now believe deeply, and it's the core reason why we're building this company. But without going through that previous process, it would have been hard to truly understand why this is the case.

MONOLITH:

Many people don't know you're a drummer. Your previous band was called Splay Tree (taken from a data structure). You've said rock and entrepreneurship are alike — both represent a spirit of rebellion and breaking conventions. How did playing in a band influence you?

Zhilin Yang:

I believe a truly great company needs humanistic depth, not just technology and product. You have to look at its underlying culture and values. The pursuit of aesthetics lets us build better, more soulful products — I believe in that. Our people could probably form several bands at this point.

MONOLITH:

I saw your GitHub self-introduction —

The ultimate goal of all my work, including both research and business, is to maximize the value of artificial intelligence.

This is a bit like how Elon Musk said the whole point of founding Tesla was to accelerate the advent of electric vehicles, not to become General Motors. When did you establish your mission and vision for AI?

Zhilin Yang:

It was a gradual process. I've been working on AI for over ten years. At first it was just curiosity about the thing itself — that besides the human brain, there's something else that can produce intelligence. Back then people were still handwriting back propagation, there weren't mature frameworks, and results weren't that good.

Gradually through this process, I discovered that AI could genuinely be a 2x, 10x, or even 100x GDP opportunity, because human society's productive ceiling is essentially the sum of brainpower that your population can generate. Now suddenly there's something that can substitute for that, which makes the upper limit extraordinarily high.

The first time I met Xi Cao, I told him about something — I once had a dream where I trained a neural network that achieved 100% accuracy on all tasks, and I woke up in shock. Because I realized I could suddenly change the world — then realized I couldn't (laughs). Back then many tasks were only at sixty or seventy percent. Today that dream has partially come true. AI surpasses most humans on many tasks.

Through this process, I've felt quite emotional about how much has changed. We believed very early that language models were extremely important, but not everyone thought so. At the time, the biggest use case for language models was rescoring speech recognition — many people thought that was their only purpose. Only later did people gradually see more possibilities for using AI to unlock productivity bottlenecks.

MONOLITH:

There's a well-known episode in this journey — your 2019 Transformer-XL paper was once rejected by ICLR, on the grounds that they didn't believe improvements in large language models would have much practical value. Did negative feedback like that shake you?

Zhilin Yang:

By then I was already pretty zen about it. People didn't really care whether you published in some conference or journal. What mattered most was arXiv plus GitHub. So I already didn't care much about so-called peer review.

There have been shifts along the way. Ten years ago peer review and refereeing mattered a lot. Then it became GitHub plus arXiv. Now it's products — show me the code. Let the product speak.

MONOLITH:

More first-principles thinking.

Zhilin Yang:

Build something that actually works, and works well. Those few words, "works" and "works well," matter more than anything else.

"Expect the worst, but do everything possible to do things well"

MONOLITH:

It's been almost a year since you started the company. Are you satisfied with how it's developing?

Zhilin Yang:

I think progress has been very good so far, even exceeding expectations. Of course, for a large model or AGI company, the core is really the organization itself, because all technology is produced through the organization. And the core of the organization is people. Without the right people, talking about management methods or how to collaborate is just empty talk. So I think organizational building is something we may have done relatively well early on.

Xi Cao:

Why did it exceed expectations? Were expectations low to begin with, or is it actually doing well?

Zhilin Yang:

I've always maintained cautious optimism — expect the worst, but do everything possible to do things well. That's part of why it exceeded expectations. Also market variables matter a lot — the flow of talent and capital has allowed us to build the company faster.

MONOLITH:

You mentioned that doing AGI requires a completely new organizational form. How do you understand this?

Zhilin Yang:

The biggest difference between traditional internet and AI products is: one is design then manufacture, the other is design through manufacture. I can't plan ahead and say there will be users putting 50 resumes into Kimi to screen them, then specifically design and optimize for that scenario. We can only first design the foundational capabilities well — like strong in-context learning ability, instruction-following ability. This is a process of completing design through manufacturing.

So if your production method is different, your organization must be different. This is extremely important. Now it's model as application — essentially consuming intelligence rather than connections. Some things haven't changed much compared to the internet era, but the changes that have occurred are so profound that without a completely new organization, you probably can't do it.

Xi Cao:

So internet thinking and AGI thinking aren't necessarily completely separate, but the point about needing a new organization may be the innovator's dilemma for some existing players.

MONOLITH:

So in terms of organizational building, what kind of people does Moonshot AI want to attract?

Zhilin Yang:

We have many AGI evangelists at our company — people who believe that this is the only thing worth doing in the next ten years. That vision is the most critical. So we most welcome people with strong passion for AGI. That's the first point. The second is probably learning ability.

There's another important point — talent has a snowball effect. When you know some excellent people, and a, c, d, e are all at this company, and these people already know or admire each other, then more and more similar people join the organization.

MONOLITH:

What's the most important question you ask in interviews?

Zhilin Yang:

I do have one, but if I told you, I couldn't use it anymore, so I can't tell you. (Laughs)

MONOLITH:

Then share something you've been prioritizing and pushing forward recently.

Zhilin Yang:

Recently: product, product, product.

MONOLITH:

Speaking of product, what kind of product manager can build a killer app today?

Zhilin Yang:

I don't think there's a specific formula, but the core is learning ability. Like, on the premise that your basic, core judgments are correct, you still need to rapidly upgrade yourself with lots of information input.

Xi Cao:

Over the past years, I think the product manager has been an excessively amplified stereotype. There's a misconception here — sometimes so-called product managers aren't precisely defined by which industry they're in. Some industries don't have this concept at all. The gaming industry doesn't — they're called game designers. Or take education — who would be the best product manager there? Probably first and foremost practitioners with the deepest understanding of education.

I think in this wave of opportunity, the technical requirements for founders may be even higher. So among many companies last year, we chose to bet on Zhilin because we felt he was the strongest in AI technology. In this era, whether for practitioners or entrepreneurs, the requirement is actually both/and. In both the internet and mobile internet waves, the companies that grew very large — Tencent, Baidu, NetEase, ByteDance, Kuaishou — all had technically-minded founders.

"The essence of human evolution is capturing and solidifying entropy in the universe"

MONOLITH:

What do you think consciousness is, and can machines have consciousness?

Zhilin Yang:

A few decades ago, someone asked a question like this: how do you know if a machine understands a sentence? Today, it's an upgraded version of that question, but I think the essence hasn't changed.

Today, no one asks "can this machine understand a sentence?" anymore, because the answer has become simple. So I think in a few more years, your question will also become trivial. You won't even need to measure it — you'll just feel it.

If you break it down, there are scientific approaches too. You could turn it into a formal language and express it through a logical formula. Or at the operational level: take an article or paper, ask the AI 100 questions, and if it gets them all right, then it understands, and thus has consciousness.

Right now, AI still can't form long-term, year-spanning connections with people. It also lacks a deep identity, or an endogenous motivation — these can serve as measuring sticks too.

Xi Cao:

I think in some discussions, the definition of consciousness is often framed from a human self-centered perspective. This wave of intelligence is definitely related to scale. But in nature, ant colonies and bee swarms exhibit seemingly intelligent behavior once they reach a certain scale. So do bee swarms or ant colonies have consciousness? Maybe they do, but not the kind of consciousness we define.

Zhilin Yang:

It's also a result of quantitative change producing qualitative change. Maybe after 10 to the 27th power, their consciousness becomes stronger. So you can't say they don't have it today — it's just not that obvious.

MONOLITH:

From first principles, why do we need AI?

Zhilin Yang:

There are two schools of thought on this. One is effective accelerationism. People in this camp believe that all kinds of entropy are floating around in the universe, and humanity's goal is to capture them. Today's neural networks are a capability for capturing entropy. When you reduce loss from 10 to 1, you're essentially solidifying, capturing, and condensing a bunch of entropy.

Of course, you can never capture all entropy, because some things are completely uncertain signals that you can't predict. But you predict everything that can be predicted. The essence of human evolution is doing exactly this. It benefits social progress, because the result is continuously developing productivity. A hundred years ago, diseases like polio were untreatable. Now, beyond medicine, there's all kinds of technology — this is the result of capturing entropy.

The other school is effective altruism. People in that camp don't believe this acceleration brings any benefits; they're relatively more pessimistic.

Technological development doesn't shift according to individual will, or even according to any single society, because it's fundamentally a free market. The free market will evolve in whatever direction maximizes efficiency. Right now, the market has chosen to accelerate. Essentially, it's a process of maximizing efficiency to capture entropy. This is something no one can stop. So you follow the will of entropy in this universe, and find the best solution within it.

MONOLITH:

What do you think of the phenomenon of emergent intelligence?

Zhilin Yang:

Emergent intelligence is completing design through manufacturing. You can't design it in advance, because you don't know what will emerge. This is also why I think we need AI native approaches — not solving single-point problems, but solving systematically.

Xi Cao:

I remember you mentioned before that intelligence can be viewed as a form of compression.

Zhilin Yang:

Right, next-token prediction is essentially a lossless compression process, because you're using the fewest bits possible to represent the world. For example, if you need to predict an arithmetic sequence, and you know the underlying pattern of the world, you only need to store the first two numbers to predict every subsequent number. That's a massive compression process.

I think this is the meaning of intelligence. The real world is more chaotic — there are predictable parts and unpredictable parts. But when you can push the predictable parts to their limit, you've reached the upper bound of intelligence.
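The arithmetic-sequence example can be made concrete with a small sketch. This is an illustration only, not something shown in the conversation, and the function names `compress`, `decompress`, and `code_length_bits` are hypothetical. The first two functions store just the first two terms plus a length and rebuild everything else. The third relies on a standard information-theory result: a predictor that assigns probability p to the symbol that actually occurs can encode it in about -log2(p) bits, so better prediction means fewer bits.

```python
import math


def compress(seq):
    """Reduce an arithmetic sequence to (first term, second term, length)."""
    if len(seq) < 2:
        raise ValueError("need at least two terms to infer the pattern")
    step = seq[1] - seq[0]
    # The short description is only lossless if the pattern really holds.
    if any(b - a != step for a, b in zip(seq, seq[1:])):
        raise ValueError("not an arithmetic sequence")
    return seq[0], seq[1], len(seq)


def decompress(first, second, length):
    """Rebuild the full sequence from the two stored terms and the length."""
    step = second - first
    return [first + i * step for i in range(length)]


def code_length_bits(probabilities):
    """Ideal lossless code length, in bits, given the probability the
    predictor assigned to each symbol that actually occurred."""
    return sum(-math.log2(p) for p in probabilities)


sequence = list(range(3, 3 + 7 * 1000, 7))   # 1000 terms: 3, 10, 17, ...
stored = compress(sequence)                   # three numbers instead of 1000
restored = decompress(*stored)

perfect_predictor_bits = code_length_bits([1.0] * 998)  # every later term is certain
coin_flip_bits = code_length_bits([0.5] * 4)            # 50% sure on four terms
```

Here a thousand-term sequence collapses to three numbers and is recovered exactly. A predictor that is certain of every later term needs no extra bits, while one that is only 50% sure pays one bit per term.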

Xi Cao:

I'm curious — in this process, what's easier to compress, and what's harder?

Zhilin Yang:

There's no fundamental difference, as long as you can represent it in the same token space. Why text might get compressed first is because it's already a more condensed signal. Visual signals have higher loss because there's more uncertainty. But ultimately, it's all a scaling problem. As long as you put it in the same space and scale, it can all be solved in principle.

Xi Cao:

AGI is a term that's come up repeatedly in our conversation. How should we understand its definition?

Zhilin Yang:

I think it's when every behavior that humans can produce in the digital world today, AI can perform at a level approaching the human top 1% or expert level. Eventually, the same in the physical world too.

Xi Cao:

Your definition is top 1% — that's a bit like "general elite intelligence."

Zhilin Yang:

There are different standards. My point is that to ultimately measure it, you have to break it down into specific sub-tasks to have any way of measuring it.

Xi Cao:

Speaking of foundational model capabilities, where do you see limitations for LLMs?

Zhilin Yang:

If we're talking broadly about sequence models doing next-token prediction, there may not be many limitations. From first principles, it's actually general-purpose. Eventually, all tasks achievable in the digital world can be learned through this paradigm.

Of course, analogizing to old computing paradigms: with enough investment in digitization, everything could eventually become a sequence of zeros and ones. The new computing paradigm can't do that yet. For instance, if it's just a pure text language model, many things can't be represented — there's still limitation there.

Another important bottleneck: right now, AI has two input lines, one for electricity and one for data. If you truly want to eventually build a superintelligence, you need to be able to pull out that data line. Just feed it electricity, and it should be able to evolve on its own.

Xi Cao:

Pulling out the data line — can we understand this as meaning that currently available data will one day be insufficient?

Zhilin Yang:

Right. This generation of models is at roughly 10^25 floating-point operations of compute. In the next two to three years, that will grow by another one to two orders of magnitude. According to scaling laws, the data then needs to grow by at least another order of magnitude. But high-quality data in this world is limited. So you'll need synthetic data, with the model itself exploring points in the representation space that haven't been sampled yet. This is basically a known problem that must be solved.
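The link between compute growth and data growth can be sketched with some back-of-the-envelope arithmetic. The speaker does not say which scaling law he has in mind, so everything here is an assumption: the common approximation that training compute is about 6 x parameters x tokens, a compute-optimal regime in which data grows with the square root of compute, and a rule-of-thumb ratio of about 20 tokens per parameter. The function names are hypothetical.

```python
import math


def data_growth_factor(compute_growth_factor, data_exponent=0.5):
    """How much the training data grows when compute grows by the given factor.
    Assumes data scales as compute ** data_exponent (0.5 is an assumption)."""
    return compute_growth_factor ** data_exponent


def compute_optimal_split(compute_flops, tokens_per_param=20.0):
    """Split a compute budget into (parameters, tokens).
    Assumes compute = 6 * params * tokens and tokens = tokens_per_param * params;
    both relations are rules of thumb, not figures from the interview."""
    params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    tokens = tokens_per_param * params
    return params, tokens


# Two more orders of magnitude of compute imply one more order of magnitude of data.
growth_for_100x = data_growth_factor(100)

# Illustrative split of a 1e25 FLOP budget under the assumptions above.
params, tokens = compute_optimal_split(1e25)
```

Under these assumptions, a 100x increase in compute calls for a 10x increase in training tokens, which is why a finite supply of high-quality data becomes a bottleneck.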

MONOLITH:

How do you think AI will affect ordinary people's daily lives?

Zhilin Yang:

I think it will probably make your life better. More people won't need to work for money. You can exercise your creativity, do more creative things.

Poster generated by Midjourney

There are repetitive jobs in this world that AI can essentially abstract away. AI can also democratize creativity, making it universal in an inclusive way, so that everyone possesses stronger creative capacity.

Today I can use a tool to draw an extremely exquisite painting; tomorrow I might shoot a film — everyone could become a director. In the future, AI will make this creative capacity very widespread, and humanity's spiritual world will become increasingly rich.

Xi Cao:

Many tasks in the physical world of the future will be compressed or abstracted.

Zhilin Yang:

Yes, and the spiritual world too. You couldn't originally make a film as good as Quentin Tarantino's, but in the future you might be able to — AI can help you create. So everyone will have more capacity and power for creative expression.

Video generated by OpenAI's Sora

At the end of 2001: A Space Odyssey, we see the symbol of the Star Child, a new form of consciousness gestating in space. Just as that moment in the film represented humanity's aspiration for its future, we now stand at another kind of crossroads. When artificial intelligence is no longer merely a tool, but becomes a partner in expanding the boundaries of our cognition and capabilities, how will it affect our civilization and identity?

Through our conversation with Kimi, we genuinely feel that artificial intelligence is no longer cold code, but an extension of human intelligence — a partner that breathes and thinks alongside us. From digital tool to spiritual companion, from simple task automation to complex creative expression, the blueprint for AI is being drawn, and each of us is a brushstroke in this painting.

On this journey of exploring AGI, we are not only pushing the boundaries of technology, but also questioning the essence of human existence. Just as Stanley Kubrick told the story of human evolution through images, this time we are weaving a new narrative by transforming energy into intelligence. This path is destined to encounter countless doubts and challenges, but will also open infinite possibilities.

This article's conclusion was summarized and generated by Kimi
