Colossal-AI's Yang You: What Choices Am I Making in Pursuit of a 30 Billion RMB Valuation?

ZhenFund · October 22, 2024

Reasoning Becomes the Next "Battleground," AI Infra Startups "Prospect" for Video Foundation Models.

Author | Kexuan Zhu

Editor | Caixian Chen

As the most-cited PhD graduate in high-performance computing globally on Google Scholar in 2020, Yang You was nominated by UC Berkeley for the ACM Doctoral Dissertation Award (one of two selected from 81 EECS PhD graduates that year).

He is also the only person in the world under 35 to have led (as first or corresponding author) Best Paper/Distinguished Paper wins at four top-tier conferences — AAAI, ACL, IPDPS, and ICPP.

In July 2021, Yang founded Colossal-AI. Three years have passed in the blink of an eye. Over that period, investor valuations of the company have grown 40-fold.

When speaking to AI Tech Review about his commercial ambitions, Yang stated, "We want to build Colossal-AI's market cap to the 20–30 billion RMB range, then go public. The goal is very quantified and clear: reach 2 billion RMB in revenue."

Looking back, Colossal-AI has made numerous strategic adjustments to meet infrastructure demands in the AI era.

As the focus of large model development has gradually shifted from pre-training to post-training and inference, the company officially began laying out training-inference integrated machines at the end of last year.

Around the same time, Yang and his team recognized that even as an infrastructure middleware company, they needed certain model capabilities of their own. Thus, in June this year, Colossal-AI independently developed and released Open-Sora, the world's first open-source video generation model with a Sora-like architecture.

On this strategic move, Yang observed, "If you don't build your own high-quality large model, then your inference platform won't have quality resources." This, in his view, is where Colossal-AI's competitive edge lies.

But the most critical reason, he explained to AI Tech Review, is that "video foundation models are still at the GPT-1 stage. When they reach GPT-4-level capabilities, the compute demands will be at their highest — and that's one of the directions where AI infrastructure companies can most easily generate significant value."

Video foundation models and training-inference integrated machines are Colossal-AI's two current priority areas for inference. Until Open-Sora becomes a hit product, the company still needs to rely on open-source models, which makes the integrated machines its primary near-term focus.

Beyond this, "ecosystem" is a core keyword Yang has repeatedly emphasized externally. He firmly believes that "the long-term moat for AI Infra startups lies in their ecosystem. Without one, it's very difficult to compete or coexist with giants."

Currently, Colossal-AI is the only AI Infra startup in the world with its own independent open-source ecosystem, with roughly 40,000 to 100,000 developers deeply using its products.

Below is the full interview between AI Tech Review and Yang You, edited for clarity without changing the original meaning:

1 The Long-Term Moat Is the Ecosystem

AI Tech Review: Colossal-AI is now three years old. Compared to academia, has entrepreneurship been more arduous? Any insights to share?

Yang You: I don't think you can directly compare becoming a Yang Zhenning with becoming a Jack Ma. Neither path to success is easy; that's my baseline view. Personally, I'm still young, only 33 this year, so there's plenty of room for exploration.

I haven't felt the difficulty yet. Not that I'm saying I'm exceptional — I mean I've set a reasonably attainable goal for myself. If I wanted to win a Nobel Prize within five years, that would be pure fantasy. Or build a company to NVIDIA's market cap in five years, also impossible.

I think difficulty is relative to the person. First, everyone needs to set a reasonable goal based on their own capabilities. Since mine is reasonable, I haven't felt too much hardship. Of course, many people have far more Google Scholar citations than me, and several domestic large model companies certainly have higher valuations than ours. It's about giving yourself proper positioning.

My current positioning: on the academic side, produce influential work; on the commercial side, build Colossal-AI to a 20–30 billion RMB market cap and go public. Our goal is very quantified and explicit: reach 2 billion RMB in revenue.

AI Tech Review: Your website mentions Colossal-AI ranked #1 on GitHub Trending globally. Both Colossal-AI and Open-Sora have topped GitHub's global rankings multiple times. How many developers are in your open-source community currently?

Yang You: Roughly 40,000 to 100,000 developers are deeply using our products.

GitHub has weekly and daily rankings. I think we've made the weekly list three times and the daily list seven or eight times. Making the daily list means you're the most watched open-source project globally that day. Of course, I can't claim our product is the best in the world. In large model training and inference software, PyTorch is unquestionably #1. I'd say we're among the most influential besides PyTorch, though the gap with PyTorch remains substantial.

AI Tech Review: Have developers encountered difficulties using Colossal-AI's products? How do you address them?

Yang You: Our developers broadly fall into two categories: newcomers who mainly need general education, and professional users whose customization demands we have to balance against generality.

First, many people are pivoting into the large model space without a relevant background. We make professional gold-mining equipment, so to speak, but if users don't even know where the gold is, they will certainly run into difficulties. So we need to do educational work.

Then there are professional developers who have used DeepSpeed or Megatron, or who even want to write their own frameworks. These power users often have customized needs. Here we need to weigh trade-offs, since we're a general-purpose tool. We don't want to modify our tool into something overly niche for a few users. Both types of users can give us very effective feedback, which we use to improve further.

AI Tech Review: The open-source community seems to be an important part of Colossal-AI.

Yang You: I believe the long-term moat for AI Infra startups lies in their ecosystem.

This wave of AI is only two or three years old — it's still unclear who will become the next giant. But look at the previous wave of AI infrastructure companies, which was really big data. Between 2010 and 2020, the most successful were probably Databricks and Snowflake. Their strategy was to bind themselves to user ecosystems.

That's why Databricks built its Spark ecosystem with a large open-source community. The first two years focused on cultivating developers; as developers gradually integrated into various industries, they brought real customers and revenue, allowing the business to grow continuously.

Otherwise, as an infrastructure company without an ecosystem, it's very hard to compete or coexist with major cloud providers. AWS has tried building open-source ecosystems, but giants aren't necessarily good at it. That's why they allow companies like Databricks to get a slice of the pie — though they share revenue.

Our competitive advantage in overseas markets is also our ecosystem. We're the only AI Infra startup in the world with our own independent open-source ecosystem. These users are our loyal users.

AI Tech Review: Does Colossal-AI currently compete with cloud providers?

Yang You: We don't have a competitive relationship with cloud providers, especially not in China. Let me explain China's actual situation: Chinese cloud providers effectively don't have high-end compute, due to relatively strict compliance with US sanctions. What we mainly do is aggregate existing legitimate high-end compute from private Chinese sources into clusters, or provide services within enterprises — meaning if a company has purchased A100s or H100s, we further serve them.

AI Tech Review: So your focus isn't actually large model companies.

Yang You: We do collaborate with large model companies, but currently more with fine-tuning companies.

Let me explain pre-training, post-training, and inference. For pre-training, large model companies certainly prefer to do this themselves, and the funding they raise is substantial. I've never heard of Together.AI or Lambda Labs getting to serve OpenAI. To serve OpenAI, you need Microsoft-level scale. Or take Elon Musk's large model company: it has no suppliers and built its own 100,000-GPU cluster.

Whether it's US large model companies or China's "Six Little Dragons," they either build their own infrastructure or get served by major cloud providers. Startups can't cut into this pie at all. We do have some large model company clients, but we don't designate them as strategically important targets. We just collaborate to see if there are technical gaps we need to fill.

Our current revenue mainly comes from post-training companies — those in the training stage after pre-training, such as automakers, pharmaceutical companies, oil companies, financial institutions. They have data privacy needs but lack large-scale clusters, perhaps buying at most 1,000 GPUs, yet they also demand high efficiency. They're essentially building internal business large models.

2 Where Value Lies in Inference

AI Tech Review: Colossal-AI has now entered the large model space itself, releasing Open-Sora, a Sora-like video generation model. Other domestic AI Infra vendors don't seem to have made this move. What was your thinking?

Yang You: Because in the next two to three years, video generation models have the greatest development space and the highest compute demands. OpenAI says today's video models are only at the GPT-1 stage; consider what the compute demands will be once video models reach GPT-4 level.

Currently, video models are still very small. For example, to generate a 720P video with a not-especially-large model, you need one machine with eight GPUs, taking roughly 1–4 minutes. This level of scaling places the highest demands on AI infrastructure optimization.

The second reason is that among our actual clients, we've genuinely encountered some with this need. They do hope infrastructure vendors can provide a good video model template to facilitate industry deployment.

Look at the best infrastructure companies today. Together.AI is a solid AI infrastructure company; by serving the video model company Pika, it accumulated valuable product experience and revenue, which is essentially a video model play. The Lego video model on Lambda Labs' platform is built on Colossal-AI's Open-Sora; they are making a similar move.

But fundamentally, our original thinking was that video models have high compute demands, making them a direction where AI infrastructure companies can most easily generate significant value.

Why do I believe video models have stronger long-term scaling laws than LLMs? Because video model training data is a true reflection of the objective world; the ultimate data creator is the Creator itself, and large models can fully discover its inherent patterns. LLM training data comes from the internet and books; data creators vary wildly in quality, with much ambiguity and garbage information.

From birth, humans don't read text every moment, but constantly receive video input — even text itself can serve as visual input. During infancy, humans develop intelligence without literacy, showing that visual signals alone are sufficient to scale. And the various physical laws in vision require scaling to certain magnitudes to be precisely mastered.

AI Tech Review: Compared to other video generation models using Diffusion Transformer (DiT) architecture, what specifically differentiates Open-Sora?

Yang You: Our biggest advantage is ID consistency.

Some commercial clients are already using this feature, though we haven't released it in Open-Sora. If the model is to monetize quickly with video creators, filmmakers, and similar users, character consistency is critical.

For example, can I generate a personalized short film for my advisor's daughter's birthday? Our Open-Sora emphasizes ID consistency in content; we've invested heavily here, though it's not open-sourced.

We plan to bring this into our commercial product soon. After the commercial release, we'll see whether open-sourcing makes sense. The release timeframe is around National Day.

AI Tech Review: This is also one of your focuses in inference. Could you detail Colossal-AI's overall inference strategy?

Yang You: We're actually doing a lot on the inference side.

First, we need to make our video model excellent, deployed and serving users. This provides tremendous learning value for us — real users, real demand — and we'll work hard to optimize video model inference speed to the extreme.

Second, I have some reservations about AI Infra startups doing MaaS, that is, selling API access to open-source models. We considered this path at first. It looks very attractive, but ordinary players can't get a slice of this cake. As an AI developer, would you call that kind of platform's API, or the API of Moonshot AI, Zhipu, or DeepSeek? DeepSeek and Qwen both have their own MaaS. This approach amounts to competing with general large model companies. And if you don't build your own high-quality large model, your inference platform lacks quality resources.

The core issue right now is this: when large models are still being debated on whether they can enter production workflows, price and speed are secondary. What matters most is whether they can truly produce intelligent effects, how good the content generation quality is. The core of inference MaaS right now is having quality resources — that's why only ChatGPT can generate hundreds of millions in revenue.

So our current focus is actually on training-inference integrated machines. Open-Sora clearly hasn't reached superstar status yet; our influence is far below Kimi's, let alone ChatGPT's. Until Open-Sora becomes a hit, we still need to rely on open-source models. That means thinking clearly about when open-source models have advantages over closed-source ones, which ties back to my earlier reservations: I feel open-source models currently have no advantage over closed-source models in most scenarios.

When do they have advantages? When you can fine-tune on a user's scarce data, using post-training to turn an open-source model into a highly customized version for that user. Because this scenario involves privacy, the business becomes selling integrated machines. We can put our training infrastructure or software into the machine, or onto our Colossal-AI Cloud, so customers first fine-tune their customized model and then serve it internally through the integrated machine.

Or if they don't want to buy a machine, they can use our cloud and rent an entire machine. We essentially build them a serving instance, the way each server on AWS or Google Cloud is an instance: a serving machine where they can quickly deploy their model. They are not calling APIs; they have full control over the model.

AI Tech Review: How is commercialization going for Colossal-AI's training-inference integrated machines?

Yang You: We've hit our targets. This year our goal was over 20 million RMB in revenue from integrated machines; we're now approaching 30 million.

AI Tech Review: Are you mainly collaborating with Huawei currently?

Yang You: Not just Huawei. We consider any legitimate Huawei or NVIDIA machines. Mainly Huawei's Ascend 910B and the soon-to-be-released Ascend 910C, plus NVIDIA's H20.

AI Tech Review: We heard you're in discussions about compute center deployments. What's the strategy there?

Yang You: Local compute centers have funding advantages but software weaknesses — particularly limited accumulation in software like Colossal-AI or training-inference integration. This can lead to purchased chips becoming scrap metal in the worst case.

For example, one local government acquired 3,000 petaflops of compute, but the idle rate was basically 99% and the capacity couldn't be sold. That's very bad. They hope our software can optimize it, so those 3,000 petaflops actually deliver the value of 3,000 H100s.

AI Tech Review: Do you feel Colossal-AI still has any gaps to fill?

Yang You: Regarding gaps, last year we realized we needed certain model capabilities ourselves, which is why we built the video model. This was something we thought through at the end of last year. To serve these companies well, you need to have trained models yourself; if you haven't, they won't feel confident entrusting certain projects to you. So training video models also filled our gap in this area.

This qualifies us to serve them. Four Fortune Global 500 clients and seven Fortune Global 2000 clients have now paid us nearly 10 million RMB, which is the strongest validation we could have. This is also where Colossal-AI's competitive edge lies.

3 Opportunities at Home and Abroad

AI Tech Review: What differences do you see between domestic and overseas AI Infra currently?

Yang You: Domestically, due to various constraints, I feel it is hard for a highly automated product to emerge quickly. Compute is dispersed among local governments, and high-end compute is officially embargoed against China, so restrictions at the chip layer have slowed upper-layer software development. Overseas, NVIDIA quickly unified the market, which makes things easier for these companies: they just need to build well on top of NVIDIA.

AI Tech Review: Is this also where domestic pain points lie?

Yang You: The current domestic pain point is definitely whether one or two hardware platforms can quickly consolidate the market — essentially unifying the chip layer below AI infrastructure software. But opportunities exist too; I believe China's market is large and won't be smaller than the US in the future.

AI Tech Review: What's Colossal-AI's current domestic and overseas deployment?

Yang You: Currently we don't make strict domestic/overseas distinctions, because I feel we're still in a product experimentation phase. By total revenue, domestic is roughly comparable to last year's figures. Our video model also has an overseas version; we serve some overseas clients, and our Colossal-AI Cloud has an overseas version too. Overseas clients are more willing to pay for software services, so currently domestic and overseas revenue are roughly comparable.

AI Tech Review: What differences exist between overseas and domestic Colossal-AI Cloud users?

Yang You: Domestic clients are more fragmented. We find that domestic clients training larger models tend to buy their own servers. Those who do use the cloud are numerous but very fragmented, so our minimum optimization unit is one GPU.

Overseas is relatively more consolidated. Purchases are made at the company level, and those companies won't train such small models, so our minimum optimization unit is one server.

AI Tech Review: Where do you feel pressure in expanding overseas markets?

Yang You: At small scale — say, reaching 200 million RMB in revenue — there won't be too much pressure. But when we reach 1 billion RMB, we'll definitely attract attention from CoreWeave, Lambda Labs, and similar companies.

AI Tech Review: I recall you previously mentioned Colossal-AI as China's Together.AI. How do your products differ?

Yang You: The biggest difference is background.

We come from parallel computing; Together.AI leans more toward algorithmic modifications. Our philosophy is that after improving training and inference computations, precision doesn't change — only the computation method changes, with identical results. Together.AI may involve new tricks or methods to balance precision and speed.
