Exclusive | The General-Purpose LLM Startup Wave: The First Battle Is About to End

An Yong Waves (暗涌Waves) · April 7, 2023

Contestants, take your positions.

By Lili Yu, Lixin He

Edited by Jing Liu

Lightning fast.

An Yong Waves has learned that AI startup MiniMax is raising a new round at a valuation exceeding $1 billion, with the deal "currently closing." Sources close to the matter revealed that "all existing shareholders are oversubscribing, and there are roughly four or five new investors." Previously, MiniMax completed two funding rounds with investors including miHoYo, Hillhouse Venture Capital, Yunqi Capital, and Mingshi Capital.

MiniMax currently holds the highest valuation among large language model startups. Beyond it, other LLM companies drawing investor attention include Wang Huiwen's Lightyear AI, Zhipu AI, Yang Zhilin's new venture, and Harry Shum's new company.

Apart from Lightyear AI, which LatePost reported is launching a second round at a pre-money valuation of roughly $1 billion, our sources indicate that other companies "also have funds committed."

Among them, MiniMax and Zhipu AI were founded in 2021 and 2019 respectively. Their standout advantage is "getting an early start and already having relatively mature large model products," which puts their valuations further ahead.

Multiple investors noted that current LLM startup valuations are primarily driven by the capital required to execute and the caliber of the team.

The large model value chain divides into three layers: an infrastructure layer providing compute, a model layer, and an application layer. Startup opportunities currently concentrate in the model and application layers. Large models themselves split into two kinds: general-purpose models offered through APIs, as OpenAI does, and differentiated models fine-tuned from open-source foundations.

One investor told us that building the former type requires $50 million just to get started. To gain meaningful market presence, securing nine-figure funding quickly becomes necessary. A partner at another investment firm told An Yong Waves that general-purpose LLM entrepreneurship depends on heavy resources and strong capital. For both investors and founders, this will be "a game for the few."

Multiple investors told An Yong Waves that although several deals remain in process, the market broadly believes the window for general-purpose LLM entrepreneurship has essentially closed, with the first round of capital arms-racing largely winding down.

In the view of a partner at a dollar fund investing in AI, startup opportunities often lie in entering before the tech giants reach consensus. Once the giants align, unless you are a flag-bearer of Wang Huiwen's stature, or a latecomer with exceptional product sense who can define killer products, it is nearly impossible to beat the big players from a small position. For VCs, while some still believe this is an opportunity ten times larger than mobile internet and will tolerate high valuation ceilings, most feel that "the top tier is already unaffordable, and the rest need more watching."

Barely twenty days since GPT-4's release, China's general-purpose LLM entrepreneurship is already approaching its first inflection point.

Those at the Table

Early leaders in any venture wave rarely become ultimate champions. The group-buying wars and ride-sharing era proved this repeatedly. But many believe that given LLM entrepreneurship's higher technical barriers and no-less-demanding capital and resource requirements, come-from-behind victories or dark-horse stories seem less likely.

Multiple investment firm partners told An Yong Waves that while LLM entrepreneurship appears bustling, fewer than 10 companies are actually at the table, with participating VC institutions also scarce and concentrated largely among dollar funds.

These companies roughly divide into two camps: the entrepreneurial camp from AI companies and tech giants, and the academic camp from universities and research institutions.

Leading the valuation race is the perpetually mysterious MiniMax. Its core team hails primarily from AI teams at SenseTime, the Chinese Academy of Sciences, and Uber.

Beyond this, the internet entrepreneur camp includes Wang Huiwen's Lightyear AI, Sogou founder Wang Xiaochuan's Wuji Intelligence, former Kuaishou AI R&D core figure Li Yan's Yuanshi Technology, former JD AI chief Zhou Bowen's Xiangyuan Technology, and Kai-Fu Lee's Project AI 2.0.

The loudest buzz surrounds Lightyear AI. After ZhenFund and Source Code Capital came aboard, An Yong Waves learned that Wang Xing, 5Y Capital, and others joined its latest round.

Wang Xiaochuan's story is more singular. After stepping down as Sogou CEO in 2021, he immersed himself in life sciences and medicine, backing companies like Xiaolu Traditional Chinese Medicine, DeepCare, and the Hot Intestine Research Institute. As recently as mid-2022, he was researching smart pillows. But after the 2023 Spring Festival, he pivoted decisively to LLM entrepreneurship.

There's also ZhiXiang Future, founded by Mei Tao, a former JD Group VP and fellow of the Canadian Academy of Engineering. It has already completed a seed round led by Alpha Startups, with multiple USTC alumni entrepreneurs and AI luminaries joining.

Among "academic camp" LLM entrepreneurs, Tsinghua dominates, producing Jie Tang's Zhipu AI, Zhilin Yang's Recurrent AI, Fanqiu Qi's DeepLang Technology, and Guoyang Zeng's ModelBest.

Broadly speaking, Tsinghua's NLP entrepreneurs trace back to three lineages: the Knowledge Engineering Group (KEG) led by Jie Tang and Juanzi Li, the Natural Language Processing and Social Humanities Computing Lab (THUNLP) led by Maosong Sun, and the Conversational AI (CoAI) research group led by Xiaoyan Zhu and her student Minlie Huang.

Under Tang and Li's lineage emerged Zhipu AI. The same day GPT-4 launched, Tang announced on Weibo that ChatGLM, a dialogue bot based on a 100-billion-parameter model, was opening invitation-only beta testing. Tang's student Zhilin Yang, who had participated in Huawei's "Pangu" large model development, has also made his new company highly sought-after.

In An Yong Waves interviews, one investor described Yang as having "world-class scientific vision and unique instinct for technical direction." Sources told us that Sequoia China, Capital Today, Coatue, and others have expressed investment interest in Yang's new company, with even Qi Lu participating personally.

THUNLP, China's earliest NLP research lab, produced Maosong Sun and Zhiyuan Liu, as well as Fanqiu Qi of DeepLang and Guoyang Zeng of ModelBest.

Among them, the CPM model developed by Sun and Liu became the predecessor of the Beijing Academy of Artificial Intelligence's "Wudao·Wenyuan." Qi's DeepLang and Zeng's ModelBest are the two companies Wang Huiwen recently attempted to acquire.

Under the CoAI lineage, Minlie Huang's Lingxin Intelligence launched its first product, AI Utopia, in 2022. After closing an angel+ round late last year that counted Zhipu AI among its investors, it recently announced a Pre-A round.

Beyond universities, China's LLM table includes entrepreneurs from non-profit AI research institutions with deep domain expertise, particularly IDEA (International Digital Economy Academy) and the Beijing Academy of Artificial Intelligence.

The highest-profile among these is a new company advised by IDEA founder Harry Shum and founded by IDEA Cognitive Computing and Natural Language Research Center head Jiaxing Zhang. In November 2019, after Microsoft bet on OpenAI's GPT models, Microsoft's top AI executive Harry Shum announced his departure. His final work on Microsoft's general AI models was the chatbot Xiaoice.

There's also Lan Zhenzhong's Westlake Xinchen, out of Westlake University's School of Engineering, and Langboat Technology, founded by Microsoft Research Asia veteran Zhou Ming. The former recently completed a multi-million-dollar Pre-A round with investors including Baidu Ventures, KaiTai Capital, and the Westlake Education Foundation Sustainable Development Platform.

Paths and Endgames

For general-purpose LLM startups, near-term pressure comes mainly from GPT-4's release and the tech giants' encirclement.

Following Baidu's Ernie Bot launch, Tencent is accelerating its "Hunyuan" large model, Alibaba will announce its large-model progress at its April cloud summit, Huawei is reportedly releasing its Pangu large model, benchmarked against ChatGPT, in April, and ByteDance, with its massive trove of consumer image and video data, plans to launch its own model in June. Indeed, on March 22, former Alibaba M6 model lead Hongxia Yang joined ByteDance AI Lab to work on language generation models.

In one dollar-fund AI partner's view, aside from entering before the giants reach consensus, startups' remaining opportunity is to break through with non-consensus approaches.

He believes that although OpenAI currently positions itself primarily as a provider of underlying model services, with ChatGPT as its minimum viable product, it will likely do both: build models covering vertical scenarios, and create a consumer product that redefines interaction in order to capture new consumer traffic.

If that happens, it means startups could crack the giants' current scenario advantages in the scramble for new use cases, and could partner with mini-giants across different domains.

Indeed, many startups are abandoning pure API-service general large model directions, instead using self-trained models, open-source models, or even smaller models to create their own scenarios and build application products.

On March 22, Wang Huiwen posted on Jike recruiting product managers. Days later, former Sogou product manager Ma Zhankai, known as the "father of Sogou Input Method," announced he was joining. Some investors speculated this reflected a change of direction for Wang's team after GPT-4's release and the giants' encroachment. Per LatePost, both Wang Huiwen's Lightyear AI and Wang Xiaochuan's new company have chosen to build large models and model-based applications in parallel.

Of course, a startup's choice of direction often correlates with how much it has raised. Huachuang Capital investor Jin Zhang told An Yong Waves that general large models cost too much; some less-funded companies will likely pivot to smaller models and open-source-based applications, leveraging their accumulated data and user scenarios.

If OpenAI ultimately does not launch such a consumer product, the dollar-fund AI partner believes competition between giants and startups plays out differently: startups either exit, get acquired, or stay "small and beautiful" in niche domains.

Beyond scenarios, startup-giant competition centers on data, algorithms, and compute. In this investor's view, since what matters in data is not volume but quality and engineering capability, and since algorithms can gradually catch up, startups are not necessarily outmatched by the giants on either front.

As for compute competition and rising costs, he believes this will be a shared challenge for giants and startups alike.

Typically, 10,000 NVIDIA A100 chips are considered the compute threshold for building a quality large model, while handling a model on GPT-3.5's scale, roughly 175 billion parameters, may require 20,000 GPUs. This is partly why many believe only cloud providers are qualified for LLM entrepreneurship.
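The scale of these figures can be sanity-checked with a rough memory calculation. The parameter count and bytes-per-parameter below are illustrative assumptions, not numbers from the article:

```python
import math

# Back-of-envelope estimate of training-state memory (assumptions, not
# reported figures): ~175B parameters, the commonly cited GPT-3-class size,
# and ~18 bytes of state per parameter (fp16 weights + fp32 master copy +
# Adam optimizer moments + gradients, a mid-range rule of thumb).
params = 175e9
bytes_per_param = 18
total_bytes = params * bytes_per_param

per_gpu_bytes = 80e9  # one A100 80 GB card
gpus_for_memory = math.ceil(total_bytes / per_gpu_bytes)

print(f"training state: ~{total_bytes / 1e12:.2f} TB")   # ~3.15 TB
print(f"GPUs needed for memory alone: {gpus_for_memory}")  # 40
```

Memory alone, then, would fit on a few dozen cards; the ten-thousand-card thresholds cited above are driven mostly by training throughput, namely getting through trillions of tokens in weeks rather than years, and by serving capacity, not by fitting the model into GPU memory.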

Yet domestic cloud providers mainly hold mid-to-low-end GPUs. After the US restricted A100 exports, many companies had to use the "neutered" A800 alternative, whose reduced interconnect bandwidth necessarily slows large-scale training.

As the arms race escalates, whether startups can secure supply becomes questionable. One investor told An Yong Waves that A100 scarcity has driven prices from the official ~$10,000 (~RMB 70,000) to RMB 80,000–90,000, even RMB 100,000 per chip.

For giants, too, scaling from thousands to tens of thousands of cards poses massive system-architecture challenges. Beyond compute, there are hardware procurement and operating costs, and near-term commercialization is difficult. A dollar-fund investor who has long watched Silicon Valley told An Yong Waves, "Unless you're a ten-billion-dollar company, it's hard to keep playing."

Guang Mi, founder of Shixiang, a global tech investment fund that has invested in overseas large-model projects, notes that AI competition faces several near-term inflection points. At the model layer, the key milestones are the players at the table announcing their first funding rounds, which would signal the close of the first LLM funding window, and the launch of China's first self-developed model matching GPT-3.5, which would capture developer mindshare and ecosystem position first. At the application layer, the key milestone is "the first killer app emerging": an AI-native application rapidly reaching tens of millions of active users.

As for this game's endgame, most investors believe "it will definitely be an oligopoly of two or three." The dollar-fund AI partner believes that over a ten-year horizon the field will be more concentrated than cloud computing, with "maybe four or five left in China," but with at least one "hundred-billion-dollar company" that fully integrates AGI "from underlying chips to consumer applications to hardware robots."