The Wild Side of High-Flyer: An Invisible AI Giant's Path to Large Language Models
Be wildly ambitious, and be wildly sincere.

By Lili Yu
Edited by Jing Liu

In the stampede toward large language models, High-Flyer Quant is probably the strangest outlier of all.
This was always destined to be a game for the few. Many startups have pivoted or retreated since the tech giants entered the fray, yet this quantitative hedge fund pressed ahead, alone and undeterred.
In May, High-Flyer christened its independent new venture into large models "DeepSeek," emphasizing its commitment to building truly human-level artificial intelligence. Its ambitions go beyond replicating ChatGPT — it wants to research and unlock more of the unknown mysteries of artificial general intelligence (AGI).
Moreover, in a field considered extraordinarily dependent on scarce talent, High-Flyer is trying to assemble a group of people with an almost obsessive drive, wielding what it considers its greatest weapon: the curiosity of a collective.
In quantitative finance, High-Flyer is a "top-tier fund" that once reached 100 billion RMB in assets under management. But the dramatic circumstances of its sudden prominence in this new wave of AI are worth noting.
When the shortage of high-performance GPUs among domestic cloud providers became the most direct constraint on the birth of generative AI in China, Caijing Eleven reported that no more than five domestic companies possessed over 10,000 GPUs. Aside from several major tech firms, one was a quantitative hedge fund named High-Flyer. Conventionally, 10,000 NVIDIA A100 chips are considered the computational threshold for training a large model from scratch.
In fact, this company rarely examined through an AI lens was already a hidden giant in the field. In 2019, High-Flyer Quant established an AI subsidiary. Its self-developed deep learning training platform "Firefly 1" cost nearly 200 million RMB to build and was equipped with 1,100 GPUs. Two years later, "Firefly 2" saw investment increase to 1 billion RMB, with roughly 10,000 NVIDIA A100 GPUs.
This means that, purely in terms of compute, High-Flyer secured its ticket to build something like ChatGPT earlier than many major tech companies.
Yet large models depend heavily on compute, algorithms, and data — so getting started requires $50 million, and a single training run costs over $10 million. Without tens of billions in capital, it's genuinely difficult to sustain. Despite all these hurdles, High-Flyer remains remarkably optimistic. Founder Liang Wenfeng told us: "What matters is that we want to do this, and we can do this. That makes us among the most suitable candidates."
This puzzling optimism stems first from High-Flyer's unusual path to growth.
Quantitative investing is an import from America, which means that nearly all top Chinese quant funds have founding teams with some experience at American or European hedge funds. High-Flyer is the sole exception: it was built entirely by a homegrown team, finding its own way forward.
In 2021, just six years after its founding, High-Flyer reached 100 billion RMB in scale and was crowned one of the "Quant Four Heavenly Kings."
This outsider's path to entry has made High-Flyer perpetually feel like a disruptor. Multiple industry insiders told us that High-Flyer "has always approached this industry with entirely fresh methods — whether in R&D, products, or distribution."
The founder of a leading quant fund believes that over the years, High-Flyer "has never followed some conventionally accepted path," but rather "done things the way they wanted to." Even when the approach was somewhat heretical or controversial, "they dared to say it out loud and then execute on their own ideas."
As for the secret of High-Flyer's growth, the firm internally attributes it to "hiring people without experience but with potential, and having an organizational structure and corporate culture that allows innovation to happen." They believe this will also be the secret weapon that allows large model startups to compete with the tech giants.
But the more crucial secret may lie with High-Flyer's founder, Liang Wenfeng.
While pursuing artificial intelligence at Zhejiang University, Liang was utterly convinced that "AI would definitely change the world" — in 2008, still an unpopular conviction.
After graduating, unlike his peers who joined major tech companies as programmers, he holed up in a cheap rental in Chengdu, enduring repeated failures as he experimented across numerous scenarios, eventually breaking into one of the most complex domains: finance. And so High-Flyer was born.
An interesting detail: in those early years, a similarly "crazy" friend working on "impractical" flying machines in a Shenzhen urban village once tried to recruit him. That friend later built a company now worth over $100 billion. Its name: DJI.
So beyond the inevitable topics of money, talent, and compute, we also spoke with Liang Wenfeng about what organizational structures allow innovation to happen, and how long human madness can sustain itself.
After more than a decade building his company, this rarely seen "tech nerd" founder is giving his first public interview.
Coincidentally, on April 11, when High-Flyer announced its entry into large models, it quoted a warning that French New Wave director François Truffaut once gave to young filmmakers: "You must have crazy ambition, and you must be crazy sincere."
The interview follows:

On Research, On Exploration
"Do the most important, most difficult things"
"Dark Currents": Not long ago, High-Flyer announced it would enter the large model race. Why would a quantitative hedge fund do something like this?
Liang Wenfeng: Our work on large models actually has no direct connection to our quant or finance business. We've established a separate new company called DeepSeek to do this. Many people in High-Flyer's core team work in AI. We experimented with many scenarios before eventually breaking into finance, which is sufficiently complex. General artificial intelligence may be among the next most difficult challenges, so for us this is a question of how to do it, not why.
"Dark Currents": Will you train a general-purpose large model, or one vertical to a specific industry — say, finance?
Liang Wenfeng: We're building AGI, artificial general intelligence. Language models are likely a necessary path to AGI, and they've already shown preliminary AGI characteristics, so we'll start there and later expand to vision and other areas.
"Dark Currents": Because of the tech giants' entry, many startups have abandoned the direction of building purely general-purpose large models.
Liang Wenfeng: We won't prematurely design applications built on top of our model. We'll stay focused on the large model itself.
"Dark Currents": Many believe that for startups to enter after the tech giants have reached consensus is no longer good timing.
Liang Wenfeng: For now, it seems that neither the giants nor startups can quickly establish overwhelming technical advantages. With OpenAI showing the way, and everyone building on published papers and open-source code, by next year at the latest, both giants and startups will have their own large language models. Each has its opportunities. Existing vertical scenarios aren't controlled by startups, so this phase is somewhat unfriendly to them. But because these scenarios are ultimately fragmented and distributed as small needs, they're better suited to nimble startup organizations. In the long run, the barrier to building on large models will keep dropping — startups will have opportunities whether they enter now or anytime in the next 20 years. Our goal is clear: no verticals, no applications — just research, just exploration.
"Dark Currents": Why frame it as "research" and "exploration"?
Liang Wenfeng: Curiosity-driven. Broadly speaking, we want to test some hypotheses. For instance, we suspect that the essence of human intelligence may be language — that human thought may itself be a linguistic process. You think you're thinking, but maybe you're just weaving language in your mind. This means something like human-level AI (AGI) could emerge from language models. More immediately, GPT-4 still holds many unsolved mysteries. While replicating it, we'll also do research to uncover them.
"Dark Currents": But research means much greater costs.
Liang Wenfeng: Pure replication can be done on top of published papers or open-source code, requiring only minimal training runs, or even just finetuning — very cheap. Research requires all kinds of experiments and comparisons, more compute, and higher demands on people, so it's more expensive.
"Dark Currents": Where does the research funding come from?
Liang Wenfeng: High-Flyer, as one of our backers, has ample R&D budget. We also have an annual budget of several hundred million RMB for donations, which previously went to charitable organizations — we can adjust that if needed.
"Dark Currents": But building a foundation-level large model requires at least $200-300 million just to get a seat at the table. How will you sustain continuous investment?
Liang Wenfeng: We're talking to various potential investors. From what we've seen, many VCs have reservations about research — they need exits, they want products commercialized quickly. Given our research-first approach, it's hard to get VC funding. But we have compute and an engineering team — that's half the chips already.
"Dark Currents": What business model scenarios have you mapped out?
Liang Wenfeng: We're thinking that later on, we can open-source and share most of our training results, which could tie into commercialization somehow. We want more people — even a small app — to be able to use large models at low cost, rather than having the technology concentrated in the hands of a few people and companies, creating a monopoly.
"Dark Currents": Some tech giants will also offer services later. What's your differentiation?
Liang Wenfeng: The giants' models may be tied to their platforms or ecosystems. We're completely free.
"Dark Currents": Either way, for a commercial company to undertake this kind of unlimited research investment seems somewhat crazy.
Liang Wenfeng: If you must find a business justification, there probably isn't one — it doesn't pencil out. From a business standpoint, basic research has terrible ROI. Early OpenAI investors weren't thinking about returns; they genuinely wanted to do this. What we're fairly certain of now is: since we want to do this, and we have the capability, at this moment in time, we're among the most suitable candidates.

The Ten-Thousand-GPU Reserve and Its Price
"Something thrilling perhaps can't be measured purely in money"
"Dark Currents": GPUs are the scarce resource in this ChatGPT startup wave, yet you had the foresight to stockpile 10,000 as early as 2021. Why?
Liang Wenfeng: Actually, from the first GPU to 100 in 2015, 1,000 in 2019, then 10,000 — this happened gradually. Before a few hundred GPUs, we colocated at IDC facilities. Beyond that scale, colocation couldn't meet our needs, so we started building our own data centers. Many people assume there's some hidden business logic, but mainly it was curiosity-driven.
"Dark Currents": What kind of curiosity?
Liang Wenfeng: Curiosity about the boundaries of AI capabilities. For many outsiders, the ChatGPT wave was a huge shock; for insiders, the shock came in 2012 with AlexNet, which kicked off a new era. AlexNet's error rate was far below other models at the time, reviving neural network research that had been dormant for decades. While specific technical directions kept changing, the combination of models, data, and compute remained constant — especially after OpenAI released GPT-3 in 2020, when the direction was clear: massive compute would be needed. But even in 2021, when we invested in Firefly 2, most people still couldn't understand it.
"Dark Currents": So you've been consciously building compute reserves since 2012?
Liang Wenfeng: For researchers, the thirst for compute is endless. After small-scale experiments, you always want to try larger scales. Since then, we've consciously deployed as much compute as possible.
"Dark Currents": Many assume this computing cluster was for quantitative trading — using machine learning for price prediction?
Liang Wenfeng: If we only did quantitative investing, a small number of GPUs would suffice. We do extensive research beyond investing — we want to understand what paradigms can fully describe financial markets, whether there are more concise representations, where different paradigms' boundaries lie, whether these paradigms have broader applicability, and so on.
"Dark Currents": But this process is also a money-burning exercise.
Liang Wenfeng: Something thrilling perhaps can't be measured purely in money. It's like buying a piano for your home — first, you can afford it; second, it's because you have a group of people eager to play music on it.
"Dark Currents": GPUs typically depreciate at about 20% annually.
Liang Wenfeng: We haven't calculated precisely, but probably not that much. NVIDIA GPUs are hard currency. Even old cards from many years ago still have users. When we retired our older cards, they fetched decent prices on the secondary market — we didn't lose much.
"Dark Currents": Building a computing cluster involves maintenance costs, labor costs, even electricity — all substantial expenses.
Liang Wenfeng: Electricity and maintenance are actually quite low — about 1% of hardware cost annually. Labor costs aren't low, but labor is also investment in the future, the company's greatest asset. The people we hire tend to be relatively unpretentious, curious, with opportunities to do research here.
"Dark Currents": In 2021, High-Flyer was among the first in Asia-Pacific to receive A100 GPUs, even before some cloud providers. Why?
Liang Wenfeng: We did early pre-research, testing, and planning for new cards. As for some cloud providers, from what I understand, their demand was previously fragmented. It wasn't until 2022, with autonomous driving creating demand for machine rentals for training and the ability to pay, that some cloud providers built out their infrastructure. It's hard for giants to do research and training purely for their own sake — they're more driven by business needs.
"Dark Currents": How do you see the competitive landscape for large models?
Liang Wenfeng: The giants definitely have advantages, but if they can't quickly apply what they build, they may not sustain it — they need to see results more urgently.
Some leading startups also have solid technology, but like the previous wave of AI startups, they face commercialization challenges.
"Dark Currents": Some people suspect a quant fund emphasizing its AI work is just pumping up its other business.
Liang Wenfeng: But our quant fund has basically stopped raising capital from outside.
"Dark Currents": How do you distinguish true AI believers from speculators?
Liang Wenfeng: Believers were already here before, and will still be here after. They'll buy GPUs in batches or sign long-term agreements with cloud providers, rather than renting short-term.

How to Truly Enable Innovation
"Innovation usually generates itself — it's not deliberately arranged, and certainly not taught"
"Dark Currents": How is DeepSeek's hiring progressing?
Liang Wenfeng: The initial team is assembled. Early on, with insufficient staffing, we'll temporarily borrow some people from High-Flyer. We started recruiting when ChatGPT took off late last year, but we still need more people to join.
"Dark Currents": Talent for large model startups is scarce. Some investors say the right people may only exist in AI labs at giants like OpenAI or Facebook AI Research. Will you recruit overseas for such talent?
Liang Wenfeng: If you're pursuing short-term goals, finding experienced people makes sense. But with a long-term view, experience matters less — foundational ability, creativity, passion matter more. From this perspective, there are plenty of suitable candidates domestically.
"Dark Currents": Why does experience matter less?
Liang Wenfeng: It's not necessarily true that only people who've done something can do it. A High-Flyer hiring principle: we look at capability, not experience. Our core technical positions are mostly filled by fresh graduates or people one to two years out of school.
"Dark Currents": In innovative work, is experience actually a hindrance?
Liang Wenfeng: Experienced people will unthinkingly tell you how something should be done. Inexperienced people will fumble around, think carefully about how to do it, and find a solution that fits the actual current situation.
"Dark Currents": High-Flyer entered this industry as a complete outsider with no finance DNA, and reached the top tier within a few years. Is this hiring principle one of the secrets?
Liang Wenfeng: Our core team, myself included, initially had no quant experience — that's quite unusual. I wouldn't call it the secret to success, but it's part of High-Flyer's culture. We don't deliberately avoid experienced people, but we focus more on capability.
Take our sales roles as an example. Our two main salespeople were both industry outsiders. One previously did foreign trade in German machinery; the other wrote code in back-office operations at a securities firm. When they entered this industry, they had no experience, no resources, no connections.
Now we may be the only major private fund that primarily sells direct. Direct sales means no intermediary fees — at the same scale and performance, higher margins. Many have tried to copy us without success.
"Dark Currents": Why have many tried to copy you without success?
Liang Wenfeng: Because this alone isn't enough to make innovation happen. It needs to match the company's culture and management.
In fact, those two salespeople produced nothing their first year, and only started showing results in their second. But our evaluation standards differ from those of typical companies. We have no KPIs and no assigned tasks.
"Dark Currents": What are your evaluation standards then?
Liang Wenfeng: Unlike typical companies that focus on client order volume, our salespeople's compensation isn't pre-calculated based on how much they sell. We encourage salespeople to develop their own networks, meet more people, build greater influence.
Because we believe a trustworthy, upright salesperson may not get clients to place orders immediately, but will earn their trust over time.
"Dark Currents": After finding the right people, how do you get them up to speed?
Liang Wenfeng: Give them important things to do, and don't interfere. Let them figure it out themselves, let them perform.
Actually, a company's DNA is hard to copy. Take hiring inexperienced people — how do you judge their potential? After hiring them, how do you help them grow? These can't be directly imitated.
"Dark Currents": What do you think are the necessary conditions for building an innovative organization?
Liang Wenfeng: Our conclusion: innovation requires minimal intervention and management, giving everyone space to freely experiment and room to fail. Innovation usually generates itself — it's not deliberately arranged, and certainly not taught.
"Dark Currents": This is a very unconventional management approach. In this case, how do you ensure someone works efficiently and in the direction you want?
Liang Wenfeng: Ensure value alignment when hiring, then use corporate culture to keep everyone moving in step. Of course, we don't have a written corporate culture — because anything written becomes an obstacle to innovation. More often, it's leaders leading by example. How you make decisions when facing something becomes a kind of standard.
"Dark Currents": In this wave of large model competition, will the more innovation-friendly organizational structure of startups be the breakthrough point for competing with giants?
Liang Wenfeng: By textbook methodology, startups doing what they're doing now wouldn't survive.
But markets change. The real decisive force is often not existing rules and conditions, but the ability to adapt and adjust to change.
Many large companies' organizational structures can no longer respond and act quickly. And they easily let previous experience and inertia become constraints. Under this new wave of AI, a new batch of companies will certainly emerge.

True Madness
"Innovation is expensive and inefficient, sometimes accompanied by waste"
"Dark Currents": What excites you most about doing this?
Liang Wenfeng: Figuring out whether our hypotheses are true. If they are, that's exciting enough.
"Dark Currents": For this large model hiring push, what are your non-negotiable criteria?
Liang Wenfeng: Passion, solid foundational ability. Nothing else matters that much.
"Dark Currents": Are such people easy to find?
Liang Wenfeng: Their passion usually shows — because they genuinely want to do this, so these people are often looking for you at the same time.
"Dark Currents": Large models may require endless investment. Do the costs give you pause?
Liang Wenfeng: Innovation is expensive and inefficient, sometimes accompanied by waste. That's why innovation only emerges after economic development reaches a certain level. When you're very poor, or in industries not driven by innovation, cost and efficiency are paramount. Look at OpenAI — they burned through a lot of money before getting results.
"Dark Currents": Do you feel you're doing something crazy?
Liang Wenfeng: I don't know if it's crazy, but the world contains many things that can't be explained by logic. Like many programmers who are fanatical contributors to open-source communities — exhausted after a long day, still contributing code.
"Dark Currents": There's a kind of spiritual reward in this.
Liang Wenfeng: Like hiking 50 kilometers — your body is wrecked, but your spirit is fulfilled.
"Dark Currents": Do you think curiosity-driven madness can continue indefinitely?
Liang Wenfeng: Not everyone can stay crazy their whole life. But most people, in their younger years, can devote themselves completely to something with no utilitarian purpose at all.
