Copying homework is fine, but do it with some dignity.

暗涌Waves·November 15, 2023

Keeping the original name, or at least adding a disclaimer — that's a rare bit of decency.

By Lili Yu

Edited by Jing Liu

Yesterday, a screenshot of a WeChat Moments post by Yangqing Jia, former chief AI scientist at Alibaba, circulated widely. Jia wrote that a friend had told him a certain domestic large language model was nothing more than the LLaMA architecture with a few variable names changed. It didn't take long for internet sleuths to discover that a developer on Hugging Face, the open-source community for large models and datasets, had raised a similar concern: "This model uses Meta LLaMA's architecture, only modifying a tensor." LLaMA is an open-source large language model from Meta, Facebook's parent company, that is free for commercial use.

Jia's post gained traction largely because it punctured an open secret in AI circles: so-called self-developed large models are, in fact, heavily "watered down." An investor once told 暗涌Waves: "Think about it — launching a large model in two months? That just doesn't add up." As early as May, at the Waves conference, Professor Zhiwu Lu of the Gaoling School of Artificial Intelligence at Renmin University of China pointed out that the so-called "spring of domestic large language models" was largely an illusion created by many companies fine-tuning foreign foundation models.

In reality, though, fine-tuning seems to be the only viable path. The reasons can be macro: OpenAI is so dominant that unless you can surpass it, whatever you build likely won't be worth much, or might even fall short of open-source alternatives. The reasons can also be granular: this is a game where you need roughly $200-300 million just to get a seat at the table, making it virtually impossible for a startup to train a large model from scratch.

In yesterday's discussions, young AI researcher Yao Fu also shared his perspective in a group chat. He argued that "criticizing a model for lacking innovation simply because its architecture hasn't changed would be unfair to any model," since all models build iteratively on what came before — "the architectures are all more or less similar, but the results are completely different."

Some investors noted that on Hugging Face, it's common to find models with identical architectures but different names, because most modifications are to training methods and data ratios.

In a subsequent statement, Jia further clarified: his criticism wasn't about keeping the same model architecture, but about arbitrarily changing names. The problem with renaming is that code originally written for LLaMA would otherwise work out of the box, but the new names force significant rework to adapt.
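A minimal sketch may make the compatibility complaint concrete. Model checkpoints are essentially name-to-weight mappings, so any loader that expects LLaMA's parameter names fails the moment those names change, and downstream code must add a remapping pass. The names and helper functions below are illustrative assumptions, not taken from any actual model; plain dicts stand in for real tensors:

```python
# A tiny mock "checkpoint" using LLaMA-style parameter names (hypothetical values).
llama_checkpoint = {
    "model.layers.0.self_attn.q_proj.weight": [0.1, 0.2],
    "model.layers.0.self_attn.k_proj.weight": [0.3, 0.4],
}

# The same weights after a hypothetical rename ("self_attn" -> "attention").
renamed_checkpoint = {
    key.replace("self_attn", "attention"): weights
    for key, weights in llama_checkpoint.items()
}


def load_llama_weights(checkpoint: dict) -> dict:
    """Mimics a loader that only accepts the original LLaMA parameter names."""
    missing = set(llama_checkpoint) - set(checkpoint)
    if missing:
        raise KeyError(f"missing LLaMA keys: {sorted(missing)}")
    return checkpoint


def remap_keys(checkpoint: dict, old: str, new: str) -> dict:
    """The extra adaptation work that renaming forces on downstream code."""
    return {key.replace(old, new): weights for key, weights in checkpoint.items()}


# Original names load directly; renamed ones need an extra remapping pass first.
load_llama_weights(llama_checkpoint)
remapped = remap_keys(renamed_checkpoint, "attention", "self_attn")
load_llama_weights(remapped)
```

The remapping itself is trivial here, but in practice every tool in the ecosystem that hard-codes the original names would need a similar shim, which is the "bunch of work" Jia was objecting to.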

This controversy may have been accidental, but it reveals the tension between speed and blurred boundaries in the race to catch up with OpenAI.

For China's AI entrepreneurs, especially application-layer founders, following this new generation of AI technology means entering an unprecedentedly ambiguous zone, where the line between what is theirs and what is OpenAI's grows ever harder to draw.

The recent OpenAI Developer Conference served as the latest catalyst. After ChatGPT's explosive debut late last year, China's AI entrepreneurs went through several waves of frenzied activity. Initially, both major tech companies and startups shared an obsession with building large models themselves. But after GPT-4's rapid release and the aggressive moves by tech giants, most quickly abandoned the foundation model route — except for Minimax and Zhipu AI, which "started early and already had relatively mature large model products," plus a handful of others.

Even the then-high-profile Lightyear Beyond, founded by Huiwen Wang, and Xiaochuan Wang's Baichuan Intelligence, soon adopted a dual-track approach of both large models and model-based applications.

Yet the OpenAI Developer Conference, dubbed "AI's Spring Festival Gala," foretold the inevitable futility awaiting a wave of startups. The conference featured not only the pre-leaked GPT-4 Turbo and more powerful new capabilities, but also the launch of the GPT Store and custom GPTs, which let users build their own versions of ChatGPT.

This means that in applications — the arena where Chinese entrepreneurs have historically excelled — there may soon be nothing left standing: without quickly securing vertical scenarios and data, you risk being swallowed whole at any moment.

This is the predicament of this generation of AI followers: defining next-generation AI application products necessarily requires working within the capability boundaries of large models, but the crux is that these boundaries are constantly and rapidly evolving, with OpenAI always one step ahead.

With the front end in constant flux, back-end applications are like towers built on sand.

In May this year, a well-known first-generation product manager expressed full confidence, stating that Chinese product managers had already sprung into action after ChatGPT's release. But after this OpenAI conference, when asked "how much opportunity is left for AI entrepreneurs," he could only respond with a wry, laughing-crying emoji.

In the transmission chain of the internet and mobile internet eras, from "copy to China" to "copy from China," Chinese founders relied on massive markets and user bases to cultivate an army of product managers — the entire process ran seamlessly. In this new AI era, that smooth relay has clearly been disrupted. The new generation of product managers needs not just product skills, but a deep understanding of large models and data.

Of course, there is another well-known variable: an external environment in which Chinese companies' power to define the boundaries of their own technology and products has also been hollowed out.

So in the current moment, how should a Chinese AI entrepreneur — especially an application-focused company — find a way forward?

An optimist would tell you that, first, large models will certainly be regionally segmented, so Chinese large models won't necessarily compete directly with OpenAI.

Second, AI-native applications, communities, and companies that master vertical scenarios and data will still have room to maneuver and breathing space. In fact, many application-layer entrepreneurs come from vertical domains like education, healthcare, and gaming. AI progress is also further igniting fields like autonomous driving, electric vehicles, and robotics.

Many cite the concept of "data moats": one type is accumulated non-public vertical industry data, while ChatGPT mainly uses general public data; another is accumulated user private data — "the more it knows you, the better it understands you."

This logic likely holds. Because in the future battle for vertical scenarios and data, there will inevitably be a clash between AI entrepreneurs wielding novel technology and traditional enterprise service companies or vertical industry incumbents armed with data and customer resources.

Let's return to where we began. An AI investor once posed a question to us: "Before OpenAI, this unprecedented new species, who isn't the one copying homework?" From imitation to innovation is the inevitable path for most latecomer regions, and indeed for most people's lives.

More pragmatically, the question should be: what constitutes more ethical "homework copying"?

In rapidly shifting innovation ecosystems, things are often swift and boundary-blurring. Excessive "demands for standardization" can, in some sense, hinder efficiency and even stifle innovation. But even so, there should be a bottom line.

As Jia requested at the end of that original WeChat post: "Dear esteemed leaders, if you're just using an open-source model architecture, please have the decency to keep the original name, so we don't have to do a bunch of work just to adapt to your renamed version..."

Keeping the original name, or adding a simple declaration — that's a rare measure of decency in this era.

Image source | IC photo

Layout | Yunxiao Guo