Silicon Intelligence Goes Left, Colossal-AI Goes Right | Z Talk

ZhenFund · August 26, 2024

Colossal-AI's approach to training large models is already making money.

Z Talk is ZhenFund's column for sharing knowledge and insights.

Yang You, founder of Colossal-AI, holds a PhD from UC Berkeley, which nominated him for the ACM Doctoral Dissertation Award. He was named to the Forbes 30 Under 30 list and took part in the third cohort of ZhenFund's "ZhenPlanet · Frontier Tech Entrepreneurship Camp."

In 2021, Professor You founded Colossal-AI (Luchen Technology), a company built around its namesake system: Colossal-AI, a general-purpose deep learning system for the large-model era. With the mission of liberating AI productivity, the company applies high-performance computing techniques to make large-model R&D and deployment faster and more efficient. The Colossal-AI project has since garnered over 38,000 stars on GitHub, ranking first in its niche globally. That same year, ZhenFund invested in the company's angel round; it has since completed four more funding rounds.

In the accelerated-computing space, how did Colossal-AI choose its technical path, and how did it reach profitability? We hope this article offers some insights, and we welcome your thoughts in the comments.

Author | Kexuan Zhu

Editor | Caixian Chen

If developing large model applications is like "gold mining," then the computing power and tooling infrastructure required to build large model foundations are the indispensable "shovels."

As the saying goes, in a gold rush, sell shovels. Everyone wants a piece of this AI wave — not only are tech giants scrambling to be "shovel sellers," but numerous startups have also spotted new opportunities.

Among them, in the accelerated-computing space, Jinhui Yuan's SiliconFlow and Yang You's Colossal-AI stand out as typical representatives. In the early days, few other teams in China were capable of building distributed systems.

However, despite working in the same space, the two have repeatedly taken different forks in the road:

Looking at the timeline, Yuan, who began his entrepreneurial journey in 2017, chose to break monopolies and challenge the tech giant Meta. The OneFlow he led debuted as a "challenger" to PyTorch, building its training framework entirely from scratch.

You, who officially entered the arena in 2021, chose a more prudent and efficient approach — innovating and developing distributed computing on top of the mature PyTorch framework.

This was also the most obvious distinction between the two in their early focus on training.

Today, as the large model race enters its second half, prioritizing inference has become industry consensus. At this juncture, differences in their strategies have once again emerged.

Continuing its team's early approach, SiliconFlow built its inference framework, SiliconLLM, as a third system independent of the mainstream frameworks vLLM and TensorRT-LLM, while Colossal-AI focuses on optimizing its own Colossal-AI framework.

Notably, to advance commercialization, Colossal-AI has further expanded its product portfolio, officially launching the text-to-video large model Open-Sora. In contrast, SiliconFlow has not publicly released any large models.

On the cloud side, SiliconFlow lets users call models without renting cloud resources or downloading models, while Colossal-AI has chosen compute leasing, supplemented by model training, fine-tuning, and inference acceleration.

What led to these different choices? And how is each playing the "shovel seller" role in the second half?

01 Standing on the Shoulders of Giants

In the field of distributed deep learning frameworks, truly breakthrough teams are few and far between. Yuan and You were among the earliest to set out.

In 2016, You began working in distributed computing. At the time, the industry mainstream was still focused on asynchronous distributed computing techniques.

That same year, Yuan also mentioned in discussions with industry peers that as deep learning model parameters grew larger, model training would outgrow what frameworks like TensorFlow, MXNet, or Caffe could handle.

But the AI field had yet to see extremely large-parameter deep learning models, so many considered this view unfounded.

In January 2017, Yuan officially set off with his team, personally naming and founding OneFlow in Beijing.

At the time, OneFlow redefined how distributed computing was implemented, making programming a multi-GPU distributed system as intuitive and convenient as programming a single GPU.

The framework OneFlow built shares its API with PyTorch, yet every line of code, from the low-level operator implementations up to the framework itself, was written by Yuan and his team.

So much so that in 2022, PyTorch's DTensor (DistributedTensor) drew on OneFlow's Global Tensor for its distributed capabilities.

A 2022 tweet from Soumith Chintala, one of PyTorch's founders
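OneFlow's Global Tensor and PyTorch's DTensor both let a program treat a tensor spread across many GPUs as a single logical object. The toy sketch below illustrates only that idea, in plain Python and under heavy simplifying assumptions: `ToyGlobalTensor` and `split_rows` are hypothetical names, not either framework's API; the "devices" are just list entries; and only a row-wise split is modeled. Element-wise operations run on each shard independently, while reassembling the full tensor stands in for the communication step (an all-gather) that real frameworks insert automatically.

```python
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]


def split_rows(matrix: Matrix, num_devices: int) -> List[Matrix]:
    """Split a matrix along dim 0 into contiguous, nearly equal shards (hypothetical helper)."""
    base, extra = divmod(len(matrix), num_devices)
    shards, start = [], 0
    for rank in range(num_devices):
        size = base + (1 if rank < extra else 0)  # the first `extra` devices get one more row
        shards.append(matrix[start:start + size])
        start += size
    return shards


@dataclass
class ToyGlobalTensor:
    """Hypothetical logical 2-D tensor whose rows are sharded across simulated devices."""
    shards: List[Matrix]  # shards[rank] holds the rows stored on device `rank`

    @classmethod
    def from_full(cls, matrix: Matrix, num_devices: int) -> "ToyGlobalTensor":
        return cls(split_rows(matrix, num_devices))

    @property
    def num_rows(self) -> int:
        # The logical size is the sum of all shard sizes, not the size of any one shard.
        return sum(len(shard) for shard in self.shards)

    def scale(self, factor: float) -> "ToyGlobalTensor":
        # Element-wise ops need no communication: each device updates only its own shard.
        return ToyGlobalTensor(
            [[[x * factor for x in row] for row in shard] for shard in self.shards]
        )

    def gather(self) -> Matrix:
        # Stand-in for an all-gather: reassemble the full logical tensor.
        return [row for shard in self.shards for row in shard]


x = ToyGlobalTensor.from_full([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], num_devices=2)
print(x.num_rows, [len(s) for s in x.shards])  # 3 [2, 1]
print(x.scale(10.0).gather())  # [[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]]
```

The user code calls `scale` on one object and never indexes a device; that separation between the logical view and the physical placement is what makes multi-GPU code feel like single-GPU code.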

Then, in June 2020, OpenAI released GPT-3, the world's largest pre-trained language model at the time, vindicating Yuan's early view.

That same year, You, who had just earned his PhD from UC Berkeley with research in high-performance computing, began a new chapter with Colossal-AI.

Watching GPT-3 emerge, You predicted that large models would matter greatly in the future, and that computing cost would be the bottleneck keeping industries from adopting them. This sparked his idea to start a large-model business.

This idea didn't materialize until 2021. In July, he founded Colossal-AI and pushed the boundaries of distributed computing further with his team.

Unlike Yuan's build-from-scratch approach, You, while also targeting the accelerated-computing space, chose to build his large-model training and inference acceleration system, Colossal-AI, directly on top of PyTorch.

The underlying distributed APIs it calls are also PyTorch's; what You and his team mainly did was rewrite upper-level operators and optimize communication efficiency and memory usage, making distributed computing more efficient and easier to use.
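Memory usage is the easiest of these optimizations to quantify. The sketch below uses the textbook accounting popularized by the ZeRO paper, which is an assumption here rather than a figure from this article: mixed-precision Adam training keeps roughly 2 bytes of fp16 weights, 2 bytes of fp16 gradients, and 12 bytes of fp32 optimizer state (master weights, momentum, variance) per parameter. ZeRO-style sharding, one widely used technique of this kind and not necessarily the exact method Colossal-AI uses, splits those states across data-parallel GPUs. The function name is hypothetical, and activations and buffers are ignored.

```python
def per_gpu_model_state_gb(num_params: float, num_gpus: int, stage: int) -> float:
    """Approximate per-GPU memory (GB) for model states under mixed-precision Adam.

    stage 0: no sharding (plain data parallelism, every GPU holds everything)
    stage 1: optimizer states sharded across GPUs
    stage 2: optimizer states + gradients sharded
    stage 3: optimizer states + gradients + weights sharded
    Activations, temporary buffers, and fragmentation are ignored.
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    weights, grads, optim = 2.0, 2.0, 12.0  # bytes per parameter (assumed accounting)
    if stage >= 1:
        optim /= num_gpus
    if stage >= 2:
        grads /= num_gpus
    if stage >= 3:
        weights /= num_gpus
    return num_params * (weights + grads + optim) / 1e9


for stage in range(4):
    gb = per_gpu_model_state_gb(7.5e9, num_gpus=64, stage=stage)
    print(f"stage {stage}: {gb:.1f} GB per GPU")
```

For a 7.5-billion-parameter model on 64 GPUs, this gives about 120 GB per GPU with no sharding, dropping to roughly 31.4, 16.6, and 1.9 GB as each stage shards more state. The trade-off is extra communication, since sharded states must be exchanged between GPUs during training.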

Building on this, Colossal-AI provides a unified parallel training and inference system that lets developers seamlessly combine multiple parallelism techniques, including data, pipeline, tensor, and sequence parallelism.
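Of the techniques listed, tensor parallelism is probably the least intuitive. Here is a minimal pure-Python sketch of one common form, a column-wise split of a single linear layer y = xW (the Megatron-LM-style split). It is illustrative only and is not Colossal-AI's implementation; the function names are hypothetical and the "devices" are simulated. Each device holds the full input but only a slice of W's columns, computes its slice of the output, and concatenating the slices, which a real system does with an all-gather, reproduces the unsplit result.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def split_columns(w: Matrix, num_devices: int) -> List[Matrix]:
    """Give each simulated device a contiguous block of W's columns."""
    cols = len(w[0])
    assert cols % num_devices == 0, "this sketch assumes an even split"
    width = cols // num_devices
    return [[row[r * width:(r + 1) * width] for row in w] for r in range(num_devices)]


def column_parallel_linear(x: Matrix, w: Matrix, num_devices: int) -> Matrix:
    """Compute x @ W with W sharded by columns across simulated devices."""
    # Every device sees the full input x but only its own column shard of W.
    partial_outputs = [matmul(x, shard) for shard in split_columns(w, num_devices)]
    # Concatenate along the last dimension (an all-gather in a real system).
    return [[v for out in partial_outputs for v in out[i]] for i in range(len(x))]


x = [[1.0, 2.0]]
w = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
print(column_parallel_linear(x, w, num_devices=2) == matmul(x, w))  # True
```

The payoff is that no single device ever stores all of W, which is what lets a layer too large for one GPU be trained at all; the cost is a communication step per layer.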

Standing, in essence, on the shoulders of giants, Colossal-AI reimplemented distributed execution on top of PyTorch, keeping its work closely aligned with the open-source community.

Explaining his reasoning, You once told AI Tech Review: "On one hand, building a strong open-source community genuinely creates greater value; even when it's free, many people use it. On the other hand, the company ultimately wants to go public, and at its core, AI's competitiveness in B2B is about building strong, trust-based relationships with users."

PyTorch's popularity also made Colossal-AI more readily accepted. OneFlow, by comparison, was relatively niche and had a harder time attracting developers. This was the early difference between the two.

Ultimately, OneFlow "lost" its bid to replace PyTorch on ecosystem grounds.

"It had highlights, but not enough to turn the tide," Yuan once assessed. "Given PyTorch's ecosystem and the completeness of its upstream and downstream tooling, building on PyTorch was, all things considered, definitely better for product promotion."

Additionally, an industry insider told AI Tech Review, "OneFlow didn't rely on the open-source community; much of its foundation was self-built. So many companies whose models were written in PyTorch were unlikely to use OneFlow, unless OneFlow partnered with that company or with another major tech firm."

Despite this, Yuan remained quite optimistic: "Although we didn't achieve PyTorch's status as the industry standard, we explored a technological no-man's-land that no one had explored, years ahead of time, and it later became truly mainstream."

At root, technology is what opens the door, and both teams' technical capabilities are beyond question.

But technology alone is far from enough; how to make money is equally critical, and it remains the hardest problem for AI startups to solve.

In 2020, during the OneFlow era, Yuan and his team tried many approaches, launching LiBai, an open-source toolbox for large-scale model training; domain-specific acceleration solutions; and products such as the AI development platform OneBrain.

Even after they finally found a breakthrough, the conditions or the timing were not right, so promotion was difficult and commercialization slow. Ultimately, OneFlow failed to generate revenue.

In 2021, a peak period for AI infrastructure and open-source investment, Hillhouse also invested in OneFlow, but Yuan didn't raise much more; he still hoped to refine the technology before taking further funding.

But opportunity waits for no one. By the following year, when the technology had become impressive enough, capital had already cooled. For all his ability, Yuan was ultimately short on luck.

In 2023, following ChatGPT's explosive debut, the "hundred-model war" fired its opening shot. Because large models carry high barriers to entry, Yuan, after weighing funding, resource integration, and commercialization, chose to partner with Huiwen Wang, merging OneFlow into Lightyears Away.

Their time fighting side by side was short. That same year, Lightyears Away was acquired by Meituan, and Yuan, holding firm to his entrepreneurial ideals, chose to leave and start anew.

Reflecting on the reasons, Yuan once said, "The technical curiosity of the OneFlow era was satisfied; the unfulfilled aspiration was mainly commercial. For a startup, it ultimately comes down to commercial success, letting customers vote with real money."

So he set out again with commercial ideals. Earlier this year, SiliconFlow was formally established.

Also in 2023, Colossal-AI's commercial situation was considerably more promising than OneFlow's.

"Colossal-AI's large model training approach is already making money," You previously told AI Tech Review. "We now have many Fortune Global 500 and Global 2000 clients, and several domestic companies are potential customers. Alibaba's Tongyi Qianwen, Baidu's Ernie Bot, and MiniMax may all have used Colossal-AI."

Why was Colossal-AI able to turn a profit? Two reasons.

"First, Colossal-AI's prices are cheaper than competitors. Second, Colossal-AI doesn't just provide large model building capabilities, but also underlying AI Infra training capabilities," an informed source analyzed for AI Tech Review.

02 Opportunities in the Second Half

By now, the "hundred-model war" has entered its second half, yet commercializing software in China remains an unsolved problem for the industry.

However, Yuan remains optimistic: a path to software commercialization in China does exist; it's just that no one has yet worked out a clearly viable one.

The current industry consensus is that building products and business models around software requires bundling it with something users have no choice but to pay for.

On this basis, integrated training-inference machines and bundling software with cloud or compute have become the shared choices of both SiliconFlow and Colossal-AI.

The integrated machine route has been validated and proven viable.

In the current domestic landscape, selling "shovels" alone won't convince many manufacturers. The best solution is to package training and inference into a complete toolkit, selling it together with large models.

Domestic clients prefer paying for integrated hardware-software solutions. This path works better than selling software alone: although hardware accounts for most of the overall gross margin, the bundle helps software sell.

This also fits Colossal-AI's transformation strategy: relying solely on a single training tool, however powerful, isn't enough to gain a firm footing. At the end of last year, Colossal-AI also launched integrated training-inference machines for large models, giving clients end-to-end large-model training and inference solutions.

Notably, riding Sora's "momentum," Colossal-AI further expanded its business territory, officially entering the text-to-video large model space.

In March this year, Colossal-AI launched Open-Sora, an open-source multimodal video model with a Sora-like architecture. The release drew significant industry attention and substantial market buzz. According to the company, Open-Sora can cut replication costs by 46% and extend the model's training input sequence length to 819K patches.

By July, the latest open-source release, Open-Sora 1.2, could generate single-shot 720p videos up to 16 seconds long.

To make Open-Sora easy to interact with, Colossal-AI also provides a Gradio application that can be deployed in one click. Gradio is a Python package that automatically generates a web interface once developers define a model's inputs and outputs.

Open-Sora launched by Colossal-AI

Great minds think alike: SiliconFlow has also entered the integrated-machine space, though its approach differs somewhat from Colossal-AI's.

SiliconFlow's choice was relatively straightforward: partner directly with server manufacturers, which integrate SiliconFlow's product into the integrated machines they build and pay SiliconFlow for it.

At the same time, SiliconFlow itself has not publicly launched any large models.

Yuan once analyzed for AI Tech Review, "Now models are gradually converging — everyone's model architecture is almost the same. So our new business doesn't pursue extremely general models; the focus is supporting models with the greatest economic and commercial value."

On its large-model API cloud platform, SiliconCloud, the text-to-video feature uses CogVideoX-2B, an open-source video generation model from Zhipu AI.

Other features, including text chat, text-to-image, and image-to-image, use mainstream models such as Llama 3.1, Qwen2, GLM-4, DeepSeek, FLUX.1, SDXL, and PhotoMaker.

SiliconCloud text-to-video function page

This is also how SiliconFlow aims to profit through the cloud: a pay-as-you-go large-model API. Developers call the SiliconCloud API directly, without renting cloud resources or downloading models, which speeds up generative AI application development.
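At its core, pay-as-you-go API billing means metering tokens rather than selling licenses or renting GPUs. The minimal sketch below illustrates the idea; the `TokenPrice` values are made-up placeholders, not SiliconCloud's actual rates, and the split into separate input and output prices is an assumption about how such platforms commonly price models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Hypothetical prices, in currency units per million tokens."""
    input_per_million: float
    output_per_million: float


def request_cost(price: TokenPrice, input_tokens: int, output_tokens: int) -> float:
    """Charge for one API call: the customer pays only for tokens actually processed."""
    return (input_tokens * price.input_per_million
            + output_tokens * price.output_per_million) / 1_000_000


def monthly_bill(price: TokenPrice, calls: list[tuple[int, int]]) -> float:
    """Sum per-call charges over a billing period; no upfront purchase is involved."""
    return sum(request_cost(price, inp, out) for inp, out in calls)


price = TokenPrice(input_per_million=1.0, output_per_million=2.0)  # placeholder prices
usage = [(1_000, 500), (2_000, 1_000)]  # (input tokens, output tokens) per call
print(f"bill: {monthly_bill(price, usage):.4f}")
```

The design choice this illustrates is that cost scales with usage: a developer with no traffic pays nothing, which is the opposite of the one-time, prepaid procurement model described later in this article.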

Many overseas AI infrastructure companies already profit through the cloud, and in China this path also appears viable.

Whether on public or private cloud, everything ties back to compute. Every customer building products or applications must pay for GPUs, compute, and cloud, so software can be packaged with cloud or compute and monetized through service fees.

Following this path, Colossal-AI's cloud platform — Luchen Cloud (https://cloud.luchentech.com) — has chosen to engage in compute leasing, supplemented by model training, fine-tuning, and inference acceleration.

Services provided by Luchen Cloud

At this point, it's not hard to see that the inference engine sits at the core of all these solutions.

Today, an industry-wide push has made training's "ceiling" clear, while inference's actual performance still falls well short of its theoretical level.

For example, the Model FLOPs Utilization (MFU) of large-model training theoretically tops out at around 60%. Through joint optimization, NVIDIA and others can already reach 40-50%, leaving only 10-20 percentage points of headroom. Inference, by contrast, has at least tenfold room for improvement.
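MFU compares the FLOPs a training run actually spends on the model with the hardware's peak. A rough sketch, assuming the common approximation of about 6 FLOPs per parameter per training token (2N for the forward pass, 4N for the backward pass, ignoring attention terms) and the NVIDIA A100's 312 TFLOPS dense BF16 peak; the model size, GPU count, and throughput below are illustrative numbers, not measurements from any real run.

```python
def training_mfu(num_params: float, tokens_per_sec: float,
                 num_gpus: int, peak_flops_per_gpu: float) -> float:
    """Model FLOPs Utilization using the ~6 * N FLOPs-per-token approximation."""
    achieved_flops = 6 * num_params * tokens_per_sec  # forward (2N) + backward (4N) per token
    peak_flops = num_gpus * peak_flops_per_gpu        # what the hardware could do at best
    return achieved_flops / peak_flops


A100_BF16_PEAK = 312e12  # dense BF16 peak of one A100, in FLOPs per second

# Illustrative run: a 7B-parameter model on 8 A100s processing 26,000 tokens/s.
mfu = training_mfu(7e9, 26_000, 8, A100_BF16_PEAK)
print(f"MFU = {mfu:.3f}")
```

With these illustrative numbers MFU comes out at roughly 44%, inside the 40-50% range cited above; reaching the ~60% ceiling on the same hardware would require about 35,700 tokens/s, which shows how little headroom training has left.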

From a cost perspective, training large models demands enormous funding, GPUs, and other resources, so few companies are suited to it; clients are concentrated and hold strong bargaining power, which makes commercialization hard for startups. By comparison, massive compute isn't a prerequisite for entering inference infrastructure.

Additionally, training happens in phases on relatively fixed datasets, whereas inference is continuous: once a service goes live, data flows endlessly and never stops as long as users engage. Take OpenAI: it generates 1-2 trillion tokens a day through inference, so a single week's output exceeds its training data volume.

Most critically, using a large model doesn't necessarily require training, but it always requires inference, so the inference market is larger and more dispersed.

As inference demand rises, the most widely used inference engines globally are NVIDIA's TensorRT-LLM and UC Berkeley's open-source vLLM, and many overseas AI infrastructure companies optimize on top of these two.

Colossal-AI keeps the innovation-focused approach of its training era, researching and iterating on its own Colossal-AI framework.

In May this year, it open-sourced an inference acceleration solution for the latest LLaMA-3 models, achieving over 40% higher throughput than the mainstream vLLM framework. Beyond text generation models, Colossal-AI's inference framework also optimizes various image generation models, including Stable Diffusion 3.

SiliconFlow, meanwhile, still harbors "ambitions" to challenge framework giants.

Whereas OneFlow focused on general-purpose training frameworks for producing deep learning models, SiliconFlow focuses on the inference layer, serving large-model applications. Its inference framework, SiliconLLM, was built from scratch as a third system, fully independent of the two mainstream frameworks, vLLM and TensorRT-LLM.

SiliconFlow website's performance comparison of SiliconLLM with vLLM and TensorRT-LLM

On this point, Yuan once said candidly in an interview, "The inference framework is contested ground that everyone wants to control. Before SiliconFlow, our competitors were these framework giants."

At this stage, to capture the inference market, SiliconFlow concluded after careful assessment that its best first move was to make a name overseas.

Unlike OneFlow, which started out purely open source, SiliconFlow offers a paid version alongside its open-source one in a bid to break through on monetization.

Early on, SiliconFlow mainly promoted its large-model inference engine. Because overseas payment habits and business models are more mature, promotion there is relatively easier.

Overseas, a mature subscription payment system for software already exists: users pay monthly by credit card, and the backend automatically delivers the software along with download and installation instructions. In China, client cooperation can only go through project-based deals, which are hard to sustain.

Domestic payment habits are shaped by accounting systems, and enterprises struggle to price intangible software. Domestic finance departments run on budgets geared toward procuring fixed assets, while software is usually treated as a service rather than a fixed asset.

Moreover, even market-oriented domestic enterprises prefer fixed upfront pricing and one-time purchases, while overseas customers avoid prepayment and prefer pay-as-you-go.

Additionally, domestic sales are channel-driven: building an engine isn't enough, it has to become a full product, so selling in China requires investing resources in exploring product commercialization. Overseas markets prioritize product strength, and a globally competitive product will sell.

For SiliconFlow, the first closed commercial loop to deliver relatively quick results was also overseas, where the model had already been proven.

"Now there are emails from foreigners almost every day wanting to talk. The website explains how pricing works, but there are still other issues to discuss, and they also ask whether we're open to other arrangements. All in all, there's quite a lot of cooperation," Yuan told AI Tech Review.

But expanding into overseas markets also demands stronger competitiveness than at home, making it both an opportunity and a challenge.

The US has very strong companies in every AI Infra niche: AutoML for mobile deployment, TogetherAI and FireworksAI for cloud inference services, ModularML and TVM for compilers, various MPO companies for hardware acceleration.

In the inference-framework direction alone, SiliconFlow must compete with numerous startups, including Tianqi Chen's OctoAI and Yangqing Jia's Lepton AI.

Recently, having made a name overseas, SiliconFlow launched SiliconCloud in China first (https://siliconflow.cn/siliconcloud), where it has shown fairly strong growth, with daily token generation reaching tens of billions; the service is now available overseas as well.

Colossal-AI, for its part, has consistently adapted to domestic and overseas business scenarios and developed both in parallel, building up core client cases and a user reputation at home and abroad.

First, because its open-source community brings in customers organically, Colossal-AI doesn't need to actively push into overseas markets; it already has clients in China, Europe and North America, the Middle East, and Southeast Asia.

Domestically, Colossal-AI currently focuses on traditional-industry clients. In You's view, traditional automakers, pharmaceutical companies, oil companies, and financial institutions are willing to pay over the long term. Ultimately, for AI to reach real-world deployment, traditional industries are indispensable application scenarios.

Starting late last year, Colossal-AI also reached cooperation with Huawei.

In February this year, the two formally launched a jointly developed, integrated AI development and deployment platform, ColossalAI Platform and the Luchen Ascend Training-Inference Integrated Machine, enabling traditional enterprises to train and fine-tune private vertical large models on-premises with their own data.

According to Colossal-AI's official testing, ColossalAI Platform can cut large-model pre-training costs by 50% and reduce infrastructure costs, hardware requirements, and project launch time tenfold each.
