A 10,000-Word Conversation with Scale AI Founder Alex Wang: Why Data, Not Compute, Is the Biggest Bottleneck for Large Models｜Z Talk

ZhenFund · August 5, 2024

We've exhausted all the easily accessible data.

Z Talk is ZhenFund's column for sharing insights.

Three months ago, Scale AI — which provides data labeling for AI models — closed a new funding round of nearly $1 billion led by Accel, with its valuation soaring to $13.8 billion. This unicorn, whose client roster includes OpenAI, Google, Meta, and numerous other AI giants, was founded by a Chinese-American teenager born in 1997. In 2016, while still an undergraduate at MIT, Alexandr Wang identified data as one of the three pillars of AI models that remained largely unaddressed, dropped out to found Scale AI, and achieved unicorn status within three years. What he once described as "just a summer project" quickly grew into the "data factory" for global AI models.

In a recent in-depth conversation, Alex Wang shared his views on model performance bottlenecks, approaches to data acquisition, and his management experience in building Scale AI's organization.

ZhenFund actively tracks frontier technology and innovation trends. Going forward, we will continue to bring you insights and deep thinking from the world's top founders. Stay tuned. This content comes from the 20VC podcast; below is the full translated transcript.

Core Content

Diminishing Returns in Foundation Models and the Data Bottleneck
- Why is data the biggest bottleneck to model performance today, rather than compute?
- How can we overcome the data bottleneck? What currently unused data can be captured?
- Facing data security challenges, will we see large companies return to on-premise deployment in the future?
- Why will proprietary, differentiated data become a moat for enterprises?
Scale AI's Experience in PR and Talent Recruitment
- Why is "the best PR is no PR"?
- Why should every founder today have their own direct distribution channel?
- Why are the most valuable employees often not the ones who joined when the company was hottest?
Rapid-Fire Q&A with Alex Wang
- What is the biggest misconception Alex has heard about AI?
- What will Scale AI look like in ten years? Will Scale AI go public?
- What is a question Alex has never been asked but thinks should be asked?

01 Data Is the Biggest Bottleneck to Model Performance Today

Harry Stebbings: Alex, great to sit down with you face-to-face. Thank you so much for coming in today.

Alex Wang: Great to be here.

Harry Stebbings: Let's skip the origin story you've told many times and get straight to it. When we talk about model performance today, do you think we've reached a point where we're seeing diminishing returns as we throw more compute at the problem?

Alex Wang: It's interesting. Especially right now — OpenAI shipped GPT-4 starting in fall 2022. Since then, we haven't seen a new foundation model or a dramatically new model better than GPT-4. We haven't seen GPT-4.5, GPT-5, or any other lab shipping a model dramatically better than GPT-4, despite massively increased compute spend.

If you look at NVIDIA's revenue chart since ChatGPT launched, it goes straight up after GPT-4 shipped. NVIDIA's data center revenue went from roughly $5 billion per quarter to over $20 billion per quarter today. So within that same time window, tens of billions of dollars have been spent on high-end NVIDIA GPUs. GPT-4 shipped before that massive spike in NVIDIA spend, and in the period since, we haven't seen a major breakthrough.

Overall, we're seeing compute investment growing exponentially now. But as a community and as an industry, we're still waiting for the next great model.

Harry Stebbings: So do you think we've hit a high point and are plateauing, waiting? Do you think this lasts months, or is it like autonomous driving, where we saw performance plateau for years before we recently got breakthroughs?

Alex Wang: AI models have three components: compute, data, and algorithms.

The history of AI has been built on all three of these pillars advancing together. You need massive amounts of compute. You need algorithmic advances like Transformers or RLHF, or future algorithmic advances. And you need the pillar of data to support it. I think the performance stagnation we've seen recently can be explained by hitting a data bottleneck.

GPT-4 was a model that essentially trained on the entire internet and used massive amounts of compute. I think much of what the industry has done in the last few years is massively scale compute without simultaneously building the other two pillars. So we need more algorithmic improvements, but the key point is that we need to ensure there's more data to support these improvements.

Harry Stebbings: What do you mean by a data bottleneck? How do we overcome it?

Alex Wang: Simply put, we've exhausted all the easily accessible data, including all the data on the internet and the Common Crawl data.

Harry Stebbings: So easily accessible data is social media content, anything that isn't behind a paywall, anything that's easy and free to crawl.

Alex Wang: Easy and free-to-crawl content, things you can download — basically anything that's already been written down and simply accessible from the open internet. After that, a lot of AI improvement came from pre-training advances. That's basically training these models to be very good at imitating internet content. Now, these models are extremely good at imitating internet content — better than humans even.

But when we think about AGI or powerful AI systems, we want more than just imitating internet content. We need AI systems that can perform tasks, solve hard problems, and work with humans on everyday issues. We can't achieve that vision using only internet data, and in any case we've more or less exhausted the data on the internet.

Harry Stebbings: Why can't we achieve that with internet data alone? When we think about effective AI agents, or software that does the work rather than being sold as a tool, why can't existing data take AI from being a tool to doing the work?

Alex Wang: The simple answer is that when humans are doing more complex tasks, they go through a lot of thinking processes that aren't written down on the internet. For example, a fraud analyst at a bank, when analyzing suspicious transactions, needs to analyze various different data snippets and apply reasoning and human intelligence to make decisions — looking at some data and inferring certain conclusions from it. But that process isn't written on the internet for models to crawl. You could say that all the reasoning and thinking that drives the economy today is not written on the internet, so if you're just training models on internet data, they can't learn that from the data.

Harry Stebbings: So how do we encode and capture data that hasn't been encoded yet? Take the thinking process of that fraud analyst you mentioned, the analysis, the discussions in internal meetings: none of that has been converted into structured data or captured in any dataset. How do we capture that data for subsequent work?

Alex Wang: I think what we need now is frontier data. We need rich frontier data to break through the current data scarcity and the data bottleneck we're facing. This frontier data includes complex reasoning chains, deliberations, chains of model agents, tool use — these key components all need to be encapsulated into frontier data to enhance model capabilities.

02 GPT-4 Was Trained on Less Than 1 PB of Data, While JPMorgan Chase's Proprietary Dataset Is 150 PB

Harry Stebbings: How do we capture this data?

Alex Wang: There are mainly three pathways. First, many enterprises internally possess massive amounts of proprietary data that, for various reasons, hasn't been uploaded to the internet. JPMorgan Chase's proprietary internal dataset is 150 PB, while GPT-4 was trained on less than 1 PB of internet data. The amount of data that exists inside large enterprises is absolutely astronomical.

So one pathway is mining all this existing enterprise data and extracting the quality information within it. Given how massive the volume is, there is a lot of value to capture.

Second, while this data is proprietary, we can deliver it in customized form to customers who need it. Finally, we need to go through a series of processes to refine and use this data to solve the actual problems enterprises face.

Harry Stebbings: But they'll never open-source it, right? It's all proprietary.

Alex Wang: Exactly. This can only happen enterprise by enterprise. Say my company has a set of very important problems: I need to mine all my existing data and refine it for use in AI systems to solve my own company's problems.

Harry Stebbings: We talked about diminishing returns earlier. I was speaking with a prominent CTO the other day who said the real breakthrough is whether we can actually solve the "reasoning" problem. How do you think about our ability to solve reasoning, and the impact of data in helping us tackle that?

Alex Wang: I think these models are very good at reasoning when trained on massive amounts of data. However, there's a large difference between human intelligence and machine intelligence. Humans have a very general form of intelligence — they can adapt to their environment, self-adjust, understand what's happening around them — and no AI system today can do that. We have to recognize that as a limitation.

This means that for any situation where we want these models to perform well, we need data from that situation or that scenario. We need to provide the model with sufficient data to support its reasoning capabilities in various contexts. In fact, if models have enough data, they will be able to perform well in various contexts.

So there are probably two ways to close the reasoning gap. One is to build some kind of general reasoning capability, which would be a massive breakthrough if achieved. The other is to ensure there's sufficient data for every single scenario: if you flood the model with data across all these scenarios, you'll get a model with strong reasoning capabilities.

Harry Stebbings: When we see massive enterprises like JPMorgan Chase, Goldman Sachs, or any large corporation sitting on enormous datasets, how do we transition from a data-scarce environment to a data-rich one? This data, by its proprietary nature, doesn't easily flow toward general-purpose models that could otherwise help the world, humanity, or achieve any breakthrough progress. How do we make that shift from data scarcity to data abundance? Is it through creating synthetic data? How should we think about this?

Alex Wang: Yeah, I think you're exactly right that we need to generate new data. To get from GPT-4 to GPT-10, we need to find new ways to produce frontier data. Take chips as an example — we need to build more fabs, improve resolution, manufacture nanoscale components. When we talk about advancing compute, it's natural to think about increasing the means of production. But I don't think we've had that same instinct about data — we need to change that mindset.

The process of producing data is actually a hybrid synthetic process. We need algorithms to do the bulk of the heavy lifting in data synthesis, but we also need human expert input and guidance to help when AI systems encounter problems or edge cases.

The scaling of autonomous driving illustrates this well — much of it relied on safety drivers. You put a safety driver in the car, and when the vehicle has issues, the safety driver takes over. AI systems need that same setup. We need AI models to generate large volumes of data, but we also need humans to step in and adjust the models when necessary to ensure data quality.

Harry Stebbings: What would these people look like in today's organizational structures? Are we creating new roles for these AI "saviors"?

Alex Wang: Yeah, we could call them "AI trainers" or "AI contributors." I'd go so far as to say that contributing data to AI is actually one of the most impactful jobs a human can have. Let's say I'm a mathematician. I could choose to study pure mathematics on my own — that's one path in life. Or I could choose to use all my skills, talents, and intelligence to help make AI models smarter.

For example, I could make GPT-4 a bit smarter at math. If I apply that improvement to every single use of GPT-4, considering all the math students, companies, and developers who will use it — that creates enormous impact. So as a human expert, you have the ability to help improve these models by producing data, and thereby impact all of society.

What we're seeing is that for scientists, mathematicians, doctors, and all human experts around the world, this is a very exciting proposition — they can transfer all their capabilities, intelligence, training into a model that will impact all of society.

Harry Stebbings: People often say the biggest challenge in data governance is actually data structure and clarity. So how should we think about data structure? For instance, I don't know the specifics, but I suspect JPMorgan's 150 petabytes of data aren't perfectly structured and readily ingestible by many models. How should we think about the structuring challenges of these massive datasets?

Alex Wang: I think this requires parallel efforts on two fronts. One is mining existing data, which is a one-time effort: you get a one-time benefit from it, though that benefit could be very meaningful.

Harry Stebbings: Do you think within five years, everyone will be internally mining their largest data sources?

Alex Wang: I don't think everyone will, but the most advanced companies certainly will. Then we'll reach a point where we still need to improve models, and ultimately it all comes down to data production. What means of production do you need for your next step of data production, just as you would think about future production in chips?

Harry Stebbings: What's the other form?

Alex Wang: The other is pushing data production forward. Mining existing data and pushing data production forward are the two core directions of data sourcing. From a broader perspective, I think many bottlenecks in AI progress are fundamentally more about data than anything else. If, as NVIDIA continues to manufacture hundreds of billions of dollars' worth of chips, we can produce a proportional amount of data, so that both scale together, then we'll get model capabilities beyond what we can imagine.

Harry Stebbings: So when we think about increasing data supply, what approaches can we actually take? I'm thinking of Dan Siroker at Limitless — he's basically created this new hardware device that records everything you say and do, and it generates your own personal AI because it has everything you've said throughout your day. In my mind, this is a new form of data creation. How do you think about increasing data supply?

Alex Wang: Probably two main aspects. One is efforts like Limitless, which is essentially more longitudinal data collection: collecting more of what naturally happens in the world. The other is in the workplace, where there might be some continuous data collection around things like which applications are being used, in what sequence, and what gets copied from one place to another.

Harry Stebbings: There are a lot of RPA tools and UiPath-style processes that accomplish this kind of task. I'm very familiar with that approach.

Alex Wang: Yeah. That's process mining, a term in SaaS — basically continuous collection of existing enterprise processes. Then there's the consumer angle, somewhat like what you mentioned, a longitudinal collection of your own life, like wearing Meta Ray-Bans. And then there has to be a commitment to having human experts collaborate with models to produce frontier data.

The two pathways I mentioned — whether enterprise process mining or consumer data collection — these will produce valuable datasets, but they won't produce data that actually pushes models forward.

Because to push models forward, you need extremely complex data. This is where you need agentic behavior and complex reasoning chains, and where you need advanced code data or possibly advanced physics, biology, or chemistry data. These are what truly push model boundaries.

I think this requires a global infrastructure-level effort to make happen. Like, I think we need to think about how to get experts around the world collaborating with models, helping produce AI systems that become the world's best scientists, or the world's best coders or mathematicians.

03 Proprietary, Differentiated Data Will Become the Enterprise "Moat"

Harry Stebbings: When we consider model commoditization, how should we think about proprietary access to these data sources? Someone once told me that OpenAI's models aren't necessarily better — they just have better access to data, they've purchased more data, and so on; data was the main reason they've performed better in the past. But will we see a model gain data access that other models don't have? How should we think about fair and equitable access to data from a model perspective?

Alex Wang: I think you're exactly right — if you think about the competitive arena among these different model providers, data is actually the main pillar where genuine, lasting competitive advantages emerge.

So if you think about where the moat is in large language model (LLM) competition, data is one of the few areas that can create sustainable barriers. Because algorithms are IP, but they'll be understood by the industry at some point; you can have more compute than others, but others can just spend more money to buy the same compute. Data is one of the only areas that can truly generate long-term sustainable competitive advantage.

Harry Stebbings: I agree — when you look at some of OpenAI's deals, they've obviously partnered with the Financial Times and gained access to the FT archives, and I think they've actually done quite a bit with Axel Springer too. That's access many other models can't get, giving them superior content on any relevant query.

Alex Wang: Exactly. I think this is the beginning of a mindset shift toward viewing data as a moat. The Financial Times, Axel Springer — these are the first signs. But going forward, these labs will be thinking about many questions: What data will I use to differentiate from competitors? How will I produce this data? And what long-lasting advantage will this create?

We've been talking about data in the context of model commoditization, but I actually expect that we'll see companies begin to formulate data strategies that create more differentiation in the market over time.

Right now in San Francisco, researchers and big-company CEOs boast about how many GPUs they have — the biggest indicator of how seriously they take AI is their GPU count.

But I think in the future, they'll boast about what data they have access to, how much data they've produced, and their unique rights to different data sources. I think this will actually become the main arena of competition going forward.

Harry Stebbings: Given that data strategy is a potential factor for winning and competing in different ways, do you think it will be difficult to see commoditization of these models over time?

Alex Wang: There are two possibilities for the future. One is that even data strategy quickly becomes commoditized — different labs copy each other, or eventually all converge in the same direction.

Harry Stebbings: Totally agree, especially with many content producers — they won't sign exclusive deals with one model and not others.

Alex Wang: Yeah. The other is that different labs formulate strategies to produce their own unique datasets. For example, Anthropic is very focused on enterprise use cases, so maybe they need a data strategy that gives them access to greatly differentiated new data to support those use cases. Or maybe OpenAI with ChatGPT needs a unique data strategy that lets them leverage the fact that they have so many users and so much reach. Going forward, each lab will need to rely on its ability to acquire proprietary and differentiated data.

Harry Stebbings: Do you think we'll see a trend back to on-premise deployment? When we think about JPMorgan Chase's 150 petabytes of data, I don't know if they'll be willing to throw all their most sensitive data into the cloud. Will we see large enterprises running and working with models on-premise?

Alex Wang: This is a very interesting question. When we talk to these large enterprises and leaders within them, they quickly realize that their enterprise data is probably their only competitive differentiator in the AI world. They'll be extremely cautious — if they strike a deal and somehow all their data gets acquired by the model developer or shared in some way, they could be mortgaging their entire company's future.

I actually think there's a massive opportunity for open-source models, whether that's Llama models, Mistral models, or others. These can run on-premise, enterprises can take them and customize them on their own data, and it never needs to go back to the model developer or the cloud or anywhere else. I think there's huge unmet demand there. This is actually the direction most serious enterprises will head: I need to make sure my data isn't being used in any way to improve my competitors' capabilities.

04 Future Pricing Will Be Based on Usage

Harry Stebbings: I think over the next five years, AI services will actually generate more revenue than AI models. We're already seeing Accenture's generative AI revenue at $2.4 billion, while OpenAI's revenue is apparently $2 billion. How do you think about this? Scale AI works with some large enterprises today, and the learning and adoption curve for AI is a challenge for large enterprises. As we scale this AI education curve, do you think providing this kind of service will become a core business for the company in the coming years?

Alex Wang: First, AI will definitely create a lot of value, but where the value capture happens is the critical question. Andy Grove's High Output Management has several chapters about Intel: at first they thought one part of the stack was where the most value would be captured, but then realized value would keep migrating to other parts of the stack. I read it about ten years ago and thought it was strange at the time, and now you're seeing this again in AI. Because the field is so novel and nascent, where exactly value is created in the stack keeps shifting.

I think there's a lot of competition around the models themselves. I'm not sure how much value the models themselves have, but I'm very confident that everything above and below the models will be valuable. On the infrastructure side, NVIDIA is the biggest company built on AI today and one of the top companies globally by market cap. NVIDIA sits below the models, and above the models, all these applications and services will be built on top of them.

Harry Stebbings: The question is, we do have companies like Notion AI, Box right now — but have you seen their growth numbers? Salesforce and others are growing in the single digits now. The commoditization of these features will give us better products, but I'm not sure you can capture revenue by raising prices. How do you think about this?

Alex Wang: Yeah, our view on this question comes from an article that circulated widely in the software world. I think it's a deliberately provocative argument.

Harry Stebbings: For those who haven't read it, could you summarize its core thesis?

Alex Wang: The article makes a very clever comparison, comparing today's software companies to traditional media companies before the rise of social media. In the old media era there were many great, high-end media companies with experts producing highly differentiated content, but then they were broadly disrupted by social media and the internet because content distribution costs suddenly dropped dramatically. The world of media consumption became a very broad and diverse collection — you'd consume content made by anyone you found interesting, much more on-demand rather than the walled gardens of large media producers.

This is basically similar to what's about to happen in software. Enterprises today live in walled gardens of a few software providers. Now with generative AI and all these other trends, they'll have these collections of different applications and point solutions, and portals to collections of various software providers. We'll move from a world of a smaller number of walled-garden SaaS applications to a much more fragmented universe.

Harry Stebbings: Do you agree with this?

Alex Wang: It's deliberately provocative, but I buy one aspect of it. I do think enterprises and the world at large will demand much higher levels of customization. The first tech company that moved in this direction was Palantir. They've had a bad reputation for a long time because everyone thought Palantir was just a consulting company, but Palantir's view was: we go into enterprises, understand exactly what their problems are, and help them build the perfect application that connects all their data. If we can do that, then what we build will be more valuable than anything any other software provider can offer.

Obviously they were doing this before generative AI and all the tools that would make it more viable. I do think this is a direction the world is moving, especially now that software production costs and software creation costs have dropped dramatically. We're moving toward a world where more and more software used by enterprises will be customized and purpose-built for specific problems.

Harry Stebbings: What does this mean for how large enterprises' engineering teams are organized? Do they shrink? Do they focus on different things? Do they just become the world's best team of prompters? What impact does this have on engineering team structure?

Alex Wang: Yeah, I think software engineering overall will change dramatically. Much of what developers spend a lot of time doing today, they won't need to spend time on in the future as models get better at coding. But a large portion of what they do is irreplaceable.

Over time, I think what's especially valuable is: what is my customer's problem? Or what problem do I need to solve? And then translating those problems into engineering problems that AI engineers then solve.

Harry Stebbings: Everyone says we'll see the end of per-seat pricing in software. To what extent do you think we'll see the end of per-seat pricing in this next wave of software? Especially from a data perspective, you might see a more usage-based pricing model — do you think this will really replace per-seat pricing?

Alex Wang: The reason per-seat pricing doesn't make sense in the future is that in today's enterprises, most productive work is done by employees. But in the future, you can imagine more and more work being done by AI agents or AI models, and then per-seat pricing really doesn't make sense.

As a software provider or solutions provider, you want to make sure you're capturing the value you provide to people and the value your agents or AI systems produce. This will shift the world's pricing from per-seat pricing to usage-based pricing.

05 In 10 Years, Foundation Models Will Be Even More a Battle of Giants

Harry Stebbings: One of my biggest concerns is regulatory provisions stifling innovation, like consumer data protection laws and unnecessary regulation around data access. Do you think my concerns are valid? How do you think about regulation around data access?

Alex Wang: This is a very important question, and what we've seen in Europe is indeed a very strict regulatory approach to data. I personally believe that looser data regulation and freer data access provisions are entirely compatible with liberal democracy. Society needs to find the right balance and a way to address this.

But I think this is a very important question because the US has made tremendous efforts to ensure it doesn't slow down chip production, including at the regulatory level. We need a similar perspective on data. From a policy standpoint, whether in the US or the UK, we need to think about how to ensure countries aren't tying their own hands in future data production.

Harry Stebbings: So what would a pro-data regulatory stance look like?

Alex Wang: First, I think there are large datasets that don't confer proprietary advantages to specific players — these need to be aggregated and opened up to the entire industry.

To give a simple example, take aerospace safety data, which is obviously a hot topic. To drive progress across the entire industry, aerospace safety data should be centrally aggregated. The same goes for the fraud and compliance issues in financial services that I mentioned earlier: this data should also be aggregated to build stronger capabilities. So I think there should be some degree of data aggregation across industrial domains to drive progress across entire industries.

And I think in many consumer-facing domains, we need to address many existing restrictions to ensure they don't hinder AI progress.

For example, HIPAA in healthcare, along with the restrictions around personally identifiable information (PII) and others, currently more or less prevents patient data from being used to train AI models.

But I think as a civilized society we do want to learn from all existing medical data about how to cure human diseases. So we need to find solutions — like, how do we clarify anonymization provisions or find an explicit way to leverage existing patient data to improve future health outcomes.

Harry Stebbings: What do you think the foundation model landscape looks like in 10 years? Who's independent, who gets acquired?

Alex Wang: What we've seen at the core of foundation model competition is cost, and costs are now extremely high. These models have gone from costing hundreds of millions of dollars to billions, and potentially hundreds of billions. I think in 10 years they could cost hundreds of billions or even trillions of dollars.

Not many entities have that much discretionary capital to invest in these AI models. So over time, AI work, especially foundation model work, will increasingly center around nations or large tech companies — these will be the only entities that can possibly fund or afford these massive AI projects.

By then it will be even more a battle of giants.

Harry Stebbings: Will we see all the smaller players get acquired by the big cloud providers — Google, Amazon, NVIDIA — and integrated into their existing solutions?

Alex Wang: I think some of these partnership dynamics will be really interesting to watch, like the OpenAI-Microsoft relationship, the Anthropic-Amazon relationship. And how these partnerships evolve over the long term is one of the most interesting questions of this technology era.

06 "The Best PR Is No PR"

Harry Stebbings: You've made a great point about PR: "The best PR is no PR." What does that mean?

Alex Wang: Fundamentally, traditional journalism isn't particularly conducive to building a great company.

Specifically, a lot of traditional journalism is click-driven. So the traditional news engine will pump you up on your way up to generate clicks, and then tear you down on your way down to generate clicks again.

This is in stark contrast to direct channels like 20VC, where founders and companies can communicate their message fully and explain what they're actually doing.

Harry Stebbings: But from another angle, I don't care about clicks. Though that's a bit unfair to traditional media. Yes, we have sponsors, but even without them we'd still do this show. I don't do sensationalist headlines. I don't exaggerate to grab eyeballs because I'm not just optimizing for clicks.

Alex Wang: Exactly. You're genuinely trying to educate and explain to your listeners what's happening.

Harry Stebbings: Though this seems a bit unfair. Can you imagine if someone said, "Hey, I'm going to build Scale AI, but I don't care if we lose money." You'd be like, "How am I supposed to compete with that?"

Alex Wang: Right. But I do feel I was treated more fairly when testifying before Congress than I have been by the media. That sounds ridiculous, but I think a lot of traditional media is dysfunctional. Because the system is so click-driven rather than genuinely educational, it has almost no way to be fully fair to companies.

So the imperative is for companies themselves to tell their stories correctly through direct channels like podcasts, where their message won't be distorted.

Harry Stebbings: I completely agree, which is why building a brand today matters more than ever — because if you don't own your distribution channel, it gets twisted. Has this changed your strategy?

Alex Wang: Yes. We've thought a lot about how to communicate directly, what is the purest way to convey and explain what we're doing.

This conversation we're having right now is a great example — you ask me a question, and I answer exactly what I believe, and that gets communicated to your listeners and viewers. I think that's one of the purest forms of communication.

Harry Stebbings: One big mistake people make is trying to build direct channels for their company. The public doesn't care about Scale the company; they care about Alex Wang. It's much easier to build a following around a person than around a company.

Alex Wang: Yes, I think very few companies can pull that off. OpenAI is one of them — I think OpenAI as an entity and brand carries a lot of meaning.

Harry Stebbings: That's true, but if you look at social media attention between Sam Altman and OpenAI, the former gets way more attention. People are more into hero worship than ever now.

And it's not just tech. Think of Messi, or Margot Robbie in Barbie: individual stardom within organizations or movements drives everything.

Alex Wang: I think this reflects a deep human need. We as humans are wired with so many mechanisms to understand individuals — we have the capacity to understand individuals, but it's very hard to understand an organization.

Harry Stebbings: So should founders care about traditional PR? Should they care about exposure in traditional media?

Alex Wang: I don't think so. We're in an era where founders don't need to pay too much attention to traditional PR. They should think about what interesting perspectives they can offer, and what is the purest way to communicate those perspectives.

Harry Stebbings: When do you feel like the media has tried to unfairly tear you down?

Alex Wang: It was almost inevitable. We had a rapid rise: in 2019 we became a unicorn. For a few years after that, everything seemed to go smoothly. But starting in 2022, the entire media narrative shifted toward tearing down tech companies.

To some degree this was fair — many tech companies had gotten very high valuations, there was so much excitement in tech, and then the market crashed. But from 2022 onward, especially for us, the media tone completely shifted. The media started pointing to mistakes made by companies like ours or many of our peers, rather than trying to maintain a balanced perspective.

07 "Scale AI Has 800 People, and I Still Approve Every New Hire"

Harry Stebbings: On driving results through incentives, you've previously talked about hiring people who genuinely care about the work and the company. Why is that harder than it sounds? How do you think about this when hiring?

Alex Wang: It sounds simple, but if you hire people who genuinely care about the quality of their work, who really care about the organization and making sure it's impactful — that means they're willing to be meticulous about every detail. If they hit difficulties or obstacles, they'll do whatever it takes to overcome them.

That's how startups work. Everyone in these small teams cares about the work ten times, a hundred times more than the average employee at a big company, so you end up solving way more problems than big companies do.

Harry Stebbings: How many people do you have now?

Alex Wang: We're about 800 people.

Harry Stebbings: 800. You're at a pretty substantial scale now. It's like a sports team only recruiting A+ or A-level athletes — only hiring top, first-rate talent gets harder because first-rate talent is scarce by definition. Can you have 800 A-level players?

Alex Wang: I think the answer is yes.

Something we discuss internally is how to build a truly small, elite team, only hiring the absolute top of the top. This comes down to the hiring process — at our current stage, I still personally approve every single new hire. I either interview them directly or review the interview feedback and understand every person we hire, to make sure we maintain extremely high standards.

Harry Stebbings: How often do you push back against your team's recommendations when hiring?

Alex Wang: On average 25% to 30% of the time, which is quite high. Usually it's because a new hiring manager might need to calibrate their standards, or because of various special circumstances.

But for me, as the founder, I've seen everyone who's joined and who's succeeded versus failed. I'm almost like an algorithm — I've developed the most refined dataset to understand what kind of people make Scale successful, to understand the distinction between top-tier and merely good.

As a founder, my job is to make sure that as an organization, we're fully leveraging all the knowledge and experience we've accumulated over the past eight years and passing that down.

Harry Stebbings: What's the biggest mistake you've made in management or leadership? One of mine was not understanding that people are motivated by either fear or freedom. Some people perform because "you have to perform," others because "I believe in you, I respect you, do the best you can." You have to identify which camp each person is in, and then hope that if their skills are there, they'll perform at their best. I wish I'd known this when I started, but I didn't; I just tried to make everyone act out of fear. What do you wish you'd known, and what mistakes have you made?

Alex Wang: The biggest mistake was actually during 2020-2021, when I thought that hypergrowth for the company meant the team had to hypergrow too. During those years, like many tech companies, our team size doubled or tripled annually. In 2020 we had about 150 people. By end of 2022, we were over 700. It was crazy hiring, team hypergrowth. But I found that when you're hiring that fast, it's impossible to maintain the high standards and excellence we were just talking about.

Harry Stebbings: Did you see the standard dropping immediately?

Alex Wang: It's somewhat subtle. After you hire all these people, you might notice it a year later, or six months later. You gradually notice that certain challenges in the organization — problems that used to be easily handled and solved — start becoming entrenched, and you can't get around them.

So you'll notice that from about 700 at the end of 2022 to about 800 now, team size has stayed basically flat. But company revenue has grown significantly.

Harry Stebbings: The interesting thing is, companies have brand inflection points. They get hot, they get cold, then they get hot again. You know what I mean?

Alex Wang: I do.

Harry Stebbings: From the outside, it feels like Scale AI is hot again.

Alex Wang: This is actually something really interesting, and I've asked Patrick Collison (Stripe CEO) about this too. Stripe is an incredible company, and for most of its life I think it's been one of the iconic companies in Silicon Valley.

I asked him whether he thought being an iconic company helped them with hiring overall. He made an interesting point: The best people Stripe ever hired were the ones who would have joined Stripe regardless of whether it was hot or not. These were often people who took unconventional paths, but they were the best hires Stripe ever made. And many of the people who joined because Stripe was the hottest company in Silicon Valley weren't necessarily the most valuable employees, for various reasons.

The conventional wisdom and narrative is that you want to be the hottest company to attract the best talent, which enables hypergrowth, which leads to continued growth. But I think that's often very difficult. What's more important is building a self-sustaining talent ecosystem that maintains very high standards, always seeks the best people, and operates independently of whether the company is hot or not. Because as you said, you have hot periods and cold periods — they alternate. You need that talent ecosystem to be self-sustaining and independent of the company's temperature to drive the company to do its best work.

Harry Stebbings: I think it also depends on function. A lot of go-to-market functions tend to cluster together, and if a brand is hot, the best sales teams get attracted to it, and you can concentrate a bunch of great salespeople, especially as you expand geographically.

I think about OpenAI's go-to-market team in London. They're exceptional, one of the best teams in London. And that's because they have an incredible brand. You know what I mean? So it depends on how close to the core you are and what function you're in.

Alex Wang: Yes, I agree. But if you look at OpenAI's core technology development, a lot of it is still driven by people who joined before OpenAI became the hottest company.

Another company that's been through this is Airbnb, led by Brian Chesky. He publicly said after COVID that he suddenly realized he had to rebuild the entire company. He dramatically reduced team size, doubled down on talent density, and then built teams that stayed small. I think they're now one of the most profitable companies per employee in all of tech, if not the most. He realized he didn't need to keep scaling team size to achieve financial returns and output.

08 Rapid Fire

Harry Stebbings: I want to do a few quick ones. I'll give you a short statement, and you give me your immediate thought.

Alex Wang: Alright, let's do it.

Harry Stebbings: What have you changed your mind on most in the last 12 months?

Alex Wang: I think it's the topic we've been discussing around hypergrowth. Mainly decoupling hypergrowth of the team from hypergrowth of the company, and making outsized investments in quality and excellence.

Harry Stebbings: What's the biggest misconception you hear most often about AI?

Alex Wang: I think the biggest misconception today is that we're only a compute problem away from AGI. I think we need data to get there.

Harry Stebbings: If you could have anyone in the world who's not currently on your board as your next board member, who would you choose?

Alex Wang: While it's not very practical, I think Satya Nadella is one of the most outstanding business strategists of the modern era. What he's accomplished at Microsoft is astonishing, and I think any board would be incredibly lucky to have him.

Harry Stebbings: What's a question you've never been asked or rarely been asked that you think should be asked?

Alex Wang: How have my views on AI changed across different eras?

I mention this because I founded the company in 2016. The first three years of the company were entirely focused on autonomous driving and self-driving cars. Then in 2019, we actually started working on generative AI, started working with OpenAI on GPT-2.

So we're one of the few AI companies that's seen multiple eras of AI technology, that's seen the first boom and bust cycle of autonomous vehicles. I think an interesting question is, what's the same and what's different across this continuum?

Harry Stebbings: How have your views changed? What are you most excited about now?

Alex Wang: I'm excited, but I think there's also reason for caution.

One thing that happened in autonomous vehicles, in the self-driving boom, is that a lot of promises got decoupled from technical reality. So a lot of the well-known autonomous vehicle companies and well-known organizations were making bolder and bolder promises in order to raise money. These promises weren't entirely decoupled from reality at first, but over time they became more and more decoupled from technical reality. That led to a very painful trough where the promises didn't materialize. So the whole industry seemed to fall apart.

But now we actually have Waymo driving in San Francisco, with perfect L4 autonomous vehicles on the road, and Tesla's Autopilot is also very good. If we had made more modest promises along the way, I think we would be amazed by autonomous vehicles right now, instead of having gone through this massive up and down, and now maybe being on the upswing again.

That's a big concern I have about generative AI. I hope it doesn't happen, but the same thing could play out: we start making massive promises about the technology, those promises get decoupled from technical reality, and that creates a gap that necessarily leads to a hangover.

Harry Stebbings: What does Scale AI look like in ten years?

Alex Wang: I hope we're doing something similar to what we're doing now, continuing to be the data foundry for AI, powering AI progress with data.

Harry Stebbings: Do you want to go public?

Alex Wang: Of course. I've also been thinking about how to solve those timeless problems.

Harry Stebbings: Do you want to be a public company CEO? If you're Stripe, I don't know why you'd want to do that.

Alex Wang: There are clear benefits to being public. But I think Stripe is an amazing company because they can have really good profitability and achieve all their core financial objectives without going public.

Harry Stebbings: Alex, it's been great to have you on the show. Thank you so much.

Translated by Stone

Edited by Wendi
