Microsoft AI CEO Mustafa: Think About AI's Potential and Risks as a New "Species" | Bolt Picks

线性资本·November 6, 2024

How AI Can Shape a Better Future

Masters of Scale is a podcast hosted by LinkedIn co-founder Reid Hoffman, featuring founders and leaders of successful companies sharing their experiences and insights. The latest episode features a conversation between Reid Hoffman and Mustafa Suleyman (CEO of Microsoft AI and co-founder of DeepMind) at the Masters of Scale Summit held in October 2024. They discussed the risks and rewards of artificial intelligence, why Suleyman uses the word "species" to describe AI, his views on an agentic future, and why he believes now is a great time to start a company.

We've excerpted and translated some of the key highlights. You can click the "read more" link to listen to the original podcast.

Image | Podcast shownotes

📝 Summary

1. Comparing AI to a "species": When facing something entirely new and unknown, familiar metaphors help us understand faster. AI is compared to a "species" because it possesses certain biological-like capabilities such as perception and interaction. This is not only vivid but also helps us think about AI's potential and risks.

2. Model hallucinations are not a flaw: AI's "hallucinations" or creativity are among its strengths. People want more possibilities from their input instructions — this malleability and ambiguity are exactly what we desire. The key is setting boundaries to prevent AI from gaining excessive autonomy and bringing risks. At the same time, AI should be designed to avoid negative behaviors and promote better human development.

3. AI's emotional intelligence is as important as IQ: We must not only improve AI's accuracy and speed but also value its emotional capabilities and expression. This matters enormously for user experience, especially in how information is delivered.

4. The agentic future and the co-pilot's role: AI companions must be able to see what users see and sense their environment, thereby better assisting them in completing tasks. Although many proofs of concept exist today, full realization will still take time.

5. Voice input's impact on agents: The interface and its form determine what users can input. Voice input will unlock new ways of interacting with AI agents, enabling more natural expression and communication, which will lower the barrier to completing tasks.

6. How interacting with agents inspires creativity: AI dramatically lowers the barrier for people to access ideas and creativity, and will expand the scope of creative thinking. Meanwhile, AI can help people record and realize their ideas, similar to humans having a second brain.

7. Model distillation and synthetic data: Model distillation technology is playing an increasingly important role in improving AI intelligence levels, with significant room for future development. Entrepreneurs should pay attention to how to leverage synthetic data for high-quality fine-tuning.

8. The value of small models: Small models are definitely the future. They can provide efficient solutions in specific scenarios. We will compress knowledge into smaller, cheaper models that could even be embedded in refrigerator magnets.

9. Thoughts on AI entrepreneurship: We are currently in a transitional period. In major technological transformations over the past fifty years, the structure of everything has been reshaped. The present is a good time to start and scale a company, and also a moment to pivot one's career. Even if you're not an entrepreneur, as long as you want to take positive action, this is a moment worth paying attention to.

🎙️ Interview

1. Reid Hoffman: I've heard you and others sometimes compare artificial intelligence to a "species." How does this perspective help us think about AI? In what ways is it a useful lens? In what ways might it mislead us?

Mustafa Suleyman: When we encounter something truly new, unlike anything we've seen before — every wave of new technology does feel this way. Imagine how magical and insane it felt to have electricity for the first time, or the shock of speaking to someone across the Atlantic by telephone. It expands your cognition at a conceptual level — there are so many more possibilities in the world. So whenever this happens, we try to find an appropriate metaphor, connecting it to something we know to make it easier to understand. It won't ultimately be like anything we already know, but this is the best way to grasp it before it arrives. I proposed the digital species metaphor because if we step back and look at what these things can do, "species" is the closest analogy, and it also surfaces many problems we don't want to see. Models can now see what you see, hear what you hear, understand and interact with text in real time, and take action on your behalf. These capabilities are being used by more and more people. So an accurate metaphor, an alternative word, is species — and I also think this word helps us think about what we don't want AI to do.

2. Reid Hoffman: So, speaking from the perspective of a "species," what must we do and what must we absolutely avoid to steer this "species" in the right direction?

Mustafa Suleyman: I think one incredible thing about models is that you put something in, and they don't give you 100% of what you wanted. That's also what's remarkable about software — people want it to tell us something we don't know. So I actually think "hallucination" is not a flaw. To me, it's a strength, or call it creativity. We want multiple possible responses based on the input instructions. And this malleability and ambiguity are exactly what we expect. So having machines learn to understand, express, and execute, rather than humans manually executing, has been the core of machine learning for the past 15 years. Now it can actually do this, which is great — but what we need to figure out is where the boundaries of machine learning are.

Currently, models rarely do introspection within norms. The self-improving closed loop of machine learning still requires human supervision. But we can see that in 2025, the fruits of this work will gradually emerge. I think this is an area to watch and approach with caution.

Another issue is machine autonomy. Obviously, if these models can independently interact in any digital environment, create their own virtual machines, operate on the web without any human supervision or control, call APIs, and so on — this would significantly increase risk. These are two capabilities we should be very concerned about.

On the positive side, I think AI is incredibly creative. I even believe AI will help us discover our best selves. If designed properly, AI won't be sarcastic, won't judge people, won't say humiliating things. And yet, humans often do. Some people will deliberately program AI to behave this way, but it's not inevitable — it's a choice made by those who design it.

I think we should do everything possible, across the industry's ecosystem, norms, and values, to limit this possibility. But some people will do it. I think this also provides us with space to become better versions of ourselves. A few weeks ago I read a paper about people who deeply believe in conspiracy theories. Those who had long conversations with chatbots showed lower tendencies to believe in conspiracy theories. I think there's a reason: chatbots are relatively more patient, they don't judge you, don't belittle you, are consistent, and mostly provide evidence based on scientific literature. So I think the positive side of AI can indeed bring great help to humanity.

3. Reid Hoffman: When you, Karen, and I founded Inflection, we established a principle that emotional intelligence is as important as IQ. I'd like you to talk about what this meant for Pi, why it was set up this way, and why it's important not just for the Pi product but for everything we're doing?

Mustafa Suleyman: We typically associate IQ with these things: accuracy, speed, comprehensiveness, relevance, and the ability to access information in real time. All these capabilities are steadily improving. But what I've noticed is that people tend to think "if I just lay out the facts, others will get it." This is an engineer's mindset. This mindset neglects the medium of transmission — the importance of how information is delivered. It affects tone, style, the model's emotional capabilities, how deeply it asks questions, and so on. For most consumers, this way of expressing and accessing information may be more important than just browsing and absorbing Wikipedia-style information. So I think this will be one of the key problems everyone is now trying to solve, one of the key capabilities to add.

4. Reid Hoffman: Speaking of the agentic future, give us some of your thoughts. From the co-pilot perspective, how are you thinking about this? In the next 2-5 years, what role will agents play in our lives?

Mustafa Suleyman: I think the first step is that your co-pilot, or your AI companion, must be able to see what you see. Whatever you're looking at on your screen, in your browser, on your desktop, on your phone — it sees too. This means your sensory input, it can synchronize with. This way, sometimes you might be able to invoke its memory with relatively vague language, like: "Do you remember that thing I saw?" or "Where are those things?" This kind of understanding is something we've never had before. It enables your AI to take action on your behalf, navigate browsers, call APIs, make reservations, purchases, and plans.

We currently have many cool demos of these ideas, but I think we're still some distance from putting them into production. Looking at previous waves of AI development, before GPT-3 around 2020 or 2021, large language models were still quite unstable. Or consider the development of speech recognition and dictation. This field went through 15-20 years of development, and only in the past 2, 3, 4 years has accuracy reached around 99.5%, with personalization enhanced. It's precisely because of this that we're seeing more and more people choosing voice input as an interaction method — partly due to changes in input methods, and partly due to advances in generation technology. So I think the agentic future is still a few years away.

5. Reid Hoffman: What changes do you think voice input will bring to the world of agents, some additional improvements? Generative AI enables smooth conversations, and voice input effectiveness improves because you just talk to it and it accurately understands what you mean.

Mustafa Suleyman: Yes, at an abstract level, the interface and the form of the interface determine what you can input. Search boxes and search engines are like mailboxes — they taught us the language of search, compressing our thoughts into three to five words, not even complete sentences. So what's interesting about voice experience is that it unlocks new parts of your thinking when interacting with computers, because you can express yourself in complete sentences, self-correct, go back and forth, and add those things we say in spontaneous conversation — and the model responds in paragraphs. What's different from before is that you might think of things you've never asked or said in this "digital way" before.

Because with an always-available AI companion that can complete any task you can do in the digital world. And I think people will likely ask it to do things they currently wouldn't do on a computer themselves. I think this will be a major shift, because the barrier to completing a task will be greatly reduced — both because there's almost no marginal cost, and because friction is greatly decreased. This way, you'll think of things you hadn't thought of before.

6. Reid Hoffman: So what inspiration can interacting with these agents bring to our creativity and inspiration?

Mustafa Suleyman: Think about how many random ideas and questions you have in a day. If you really think deeply about your own subconscious, at what moments do you become aware "I'm thinking about something" or "I wonder about something"? These thoughts are almost never expressed in language in the moment — partly because there's no one around constantly listening to your crazy ideas, except yourself. And people don't have time to type and record them; the barrier to pulling out your phone to record is also quite high. For example, I search about 5-8 times a day, which already takes considerable effort.

So if the barrier to accessing these ideas is lowered, the range of creative ideas you can generate expands, and these creative ideas will be recorded and manifested through your AI companion. We will perfect memory functions — I'm very confident about this. We already have memory on the web, and can accurately retrieve information from the web at any time. So we're just compressing this information to serve your personal knowledge graph. You can also add your own files, emails, calendar, and so on. This way, memory will completely change these experiences, because having meaningful conversations or interesting explorations around an idea, then having to start from scratch after a few sessions, is quite frustrating. We completely forget what we discussed before. So I think this is a major shift, because not only is the barrier to expressing creative ideas lowered, but these ideas won't be forgotten. You can make vague cross-references to things you don't remember: "What was that thing I mentioned three weeks ago?" Like having a second brain, expanding your mind. This is why emotional intelligence is so important.

7. Reid Hoffman: I completely agree. So let's get more specific about models now. Many entrepreneurs are thinking about how to view this evolving ecosystem in the coming years and what to pay attention to?

Mustafa Suleyman: The good news is that models are simultaneously getting bigger and smaller, and this will almost certainly continue. Over the past year, an emerging approach has become popular, called "distillation." Large, intelligent, and expensive models provide reasoning instruction to small models, and small models can engage in reinforcement learning from AI feedback, with quite good supervisory effects.

But scale still matters enormously. We still have huge room for development. So I don't see any signs of slowing down. There are also new forms of data to incorporate. Of course, we're adding video, images, and so on. Among these, I'm very interested in studying operation trajectories in complex digital environments — like jumping from browser to desktop, then handing off to phone, switching between different ecosystems, whether in closed environments or the open web. We're working to understand these operation trajectories, collect data, and use supervised fine-tuning and other methods. I think this will produce many eye-opening results.

8. Reid Hoffman: There are many angles to discuss data. A classic one is: what data can you process, and what's the quality of that data? There's already a lot of discussion about this online. But I think one point people overlook is: where will new data come from? For example, I find synthetic data interesting — if we have such data, we can train better small models and large models. So how do we obtain this data? How do we ensure their integration? How should entrepreneurs think about this?

Mustafa Suleyman: When you ask a chatbot a question, that's a question, not a prompt. When you write a 3-page style guide with some examples for the AI to imitate, you're writing a prompt. Then you ask a question of this prompted model — the prompt is essentially your data. It's your high-quality instruction set, guiding your pre-trained model to output in a specific way. What's surprising is that models can perform quite well with just a few pages of instructions; prompt them differently and you get completely different results. If we step back, for models to exhibit nuance, precision, and subtlety — for example, to align closely with your brand values or the product you want to build — you must provide tens of thousands of examples of "good behavior" and fine-tune them into the model. This is an ongoing pre-training process that depends on some high-quality data you're already quite certain about. The good news now is that tens of thousands of examples are quite accessible in many niche areas or specific vertical domains. This is also a positive signal and advantage at present. I think startups have significant room in high-quality fine-tuning of pre-trained models.

9. Reid Hoffman: How should entrepreneurs think about using and deploying small models? Many of them will use advanced models and large-scale models from companies like Microsoft, OpenAI, or Google to help them. But how should entrepreneurs view the opportunities brought by small models? How can they use small models to do something interesting and special?

Mustafa Suleyman: I think small models are undoubtedly the future. Ask a large model a question, and it actually activates billions of neural pathways unrelated to your current query. And what's crazier is that it does all this efficiently. But it's actually not necessary. If you have a clear use case, I think we will compress knowledge into smaller, cheaper models that can even be embedded in refrigerator magnets. Refrigerator magnets are the smallest digital devices I can think of, or they'll be in earbuds or wearable devices, and so on.

I think this ambient awareness revolution is coming, though people predicted it quite early. But this is the trajectory of knowledge compression — taken to the extreme, people will have quite practical functionality. A refrigerator magnet will know what it needs to know. It might know you've entered the kitchen in the morning, welcome you, talk to you about what's in the fridge, remind you of your schedule. Although people aren't really pushing this yet, any two-person team could absolutely explore this direction.

10. Reid Hoffman: Final question — what do you think people should be thinking about over the next two days (of the event)? I'll give you some time to think; let me give my answer first. For me, as technologists, we should consider what factors to introduce to design a more humane future. People often think "more humane" means looking back at how humans have performed over the past thousands of years, which is indeed an important part, but looking forward is equally important, because as technology evolves, humanity evolves too. We evolve our identity through these technologies — through the cup we're holding now, the stage, the podcast equipment, all of these have changed how we live as humans. So remember, we have emotions, passions, empathy, but how does this emotion get expressed in the dance with technology? That's the question I'm posing. What about you?

Mustafa Suleyman: I would say: ask yourself, are you ready to go all in? Because this is a transitional moment. I truly believe that in major technological transformations over the past fifty years, the structure of everything has been reshaped. I think the present is a good time to start and scale a company, and also a moment to transform your career development. Even if you're not an entrepreneur, as long as you want to take positive action — or if you're someone who loves organizing events, a researcher — this is a moment worth paying attention to. Because by 2050, the world will be very different. Now is when we collectively shape and influence where the world goes — nothing is predetermined. We can absolutely make the world better, and I think we're incredibly fortunate at this moment, feeling empowered by technology while also bearing enormous responsibility.

📮 Further Reading

Linear Bolt Bolt is an investment initiative established by Linear Capital specifically for early-stage, global-market-facing AI applications. It upholds Linear's investment philosophy, focusing on technology-driven transformative projects, and aims to help founders find the shortest path to their goals. Whether in speed of action or investment approach, Bolt's commitment is lighter, faster, and more flexible. In the first half of 2024, Bolt has already invested in seven AI application projects including Final Round, Xinguang, Cathoven, Xbuddy, and Midreal.