After Raising Hundreds of Millions in Funding, Tao Mei on How This Generation of AI Entrepreneurs Should Build

An Yong Waves (暗涌Waves) · December 24, 2024

When the wave comes, jump with it.

By Lili Yu

"An Yong Waves" has learned exclusively that HiDream.ai, an AI video generation startup, has closed a new Series A round totaling hundreds of millions of RMB. The round came from state-backed funds, with Hefei Industry Investment as lead investor and Anhui Province AI Mother Fund and Hubei Yangtze River Film Group also participating. It follows a Pre-A round led by Dunhong Capital. The company had previously received lead investments from Alpha Commune (Alpha公社) and iFlytek.

HiDream.ai was the first company in the world to launch text-to-video AI. From the outset, founder and CEO Tao Mei did a careful calculation: compared with large language models, multimodal models depend far less on compute and resources, and from a commercialization perspective they could move faster and earlier. This seemed like a more rational, pragmatic kind of romance, but reality has proven far colder than imagined. From Sora at the start of the year, to Kling mid-year, to Google Veo 2, video generation in 2024 has become a battlefield no less frenzied than large language models.

Even so, entrepreneurship remains a temptation that Mei's generation of AI researchers can hardly refuse: AI has never been so close to business and reality.

A graduate of the University of Science and Technology of China (USTC), Mei spent 12 years at Microsoft and reached the peak of academia, publishing over 300 papers in multimedia analysis and computer vision and winning international best paper awards 15 times. He became an IEEE Fellow and a foreign member of the Canadian Academy of Engineering, as well as chief scientist for major AI projects under the "Science and Technology Innovation 2030" initiative of China's Ministry of Science and Technology.

This experience also revealed to him the chasm between technology and product, ultimately convincing him he needed to bridge that gap. His five years at JD.com after 2018 marked his entry into industry. As JD Vice President and Deputy Dean of JD Exploration Research Institute, he began exploring the path from technology to commercialization. HiDream.ai, which he founded afterward, ties all of this together more intimately.

Mei's entrepreneurial circumstances resemble a slice of this era's AI founders: when embracing product, you cannot abandon the model, or you risk being swallowed; when testing the domestic market, you cannot abandon going global, because China's consumer market contains structural challenges that startups cannot pry open. As for fundraising, in the current capital winter, it often means founders must feed confidence back to their investors.

These realities have also made Mei acutely aware of the true difference between being an executive at a large company and being a founder — with the former, there's always someone behind you; now "there's no one behind you," and "every problem comes to you, and you have to handle them all."

Below, after more than a year of entrepreneurship, Mei shares his reflections on fundraising, commercialization, and more:

Video Generation Is Indeed Closer to Commercialization

  1. Sora officially launched not long ago, but its overall capabilities were roughly what we expected. Objectively speaking, OpenAI no longer holds a significant advantage in the current video generation landscape. When Sora first emerged, though merely a demo, it transformed the entire methodology; today, from a product deployment perspective, other products, whether overseas or domestic, have largely caught up.

  2. The video generation track became crowded this year. In June, Kling and Luma AI launched; we announced our new model at the July Shanghai World AI Conference. August saw MiniMax's Hailuo, and recently World Labs and Google Veo 2, which have even transitioned from image processing to 3D. The fierce competition stems from this track's shorter path to commercialization and faster product deployment compared to large language models.

  3. Last year's actual global AIGC revenue was roughly $20 billion, with 50-60% coming from video and image generation or related tools, and about 30% from large language models, such as chatbot revenue. That is why so many companies are pivoting to this track: it has become essential territory for large model companies.

  4. As a startup, we won't compete head-on with OpenAI, ByteDance, and other tech giants. We need to achieve algorithmic innovation through distinctive approaches and solve the last-mile problems of vertical industries, winning users' mindshare through product and closed-loop value. Giants have advantages in compute and especially in consumer (C-end) traffic, but they must answer to their financial statements, so they will focus tightly on their core businesses. Their new products must serve their existing mainstream products: whether ByteDance's Jimeng or Kuaishou's Kling, both must serve their existing creator ecosystems.

  5. We absolutely will not repeat what giants are doing along their paths. We have our own specialized, vertical domains. Previously we operated on a "1+3+N" framework: one large model, three core products, plus many scenario ecosystems. Going forward, we will release a new multimodal understanding model benchmarked against GPT-4o, making that "1" thicker and broader.

  6. On the model side, we independently developed the world's first commercially available billion-parameter video generation large model, benchmarked against OpenAI's Sora. We possess China's most comprehensive multimodal copyrighted corpus: hundreds of thousands of hours of copyrighted video material and tens of thousands of licensed IPs. We cover 70% of domestic film and television data, and have generated hundreds of millions of AIGC derivative content pieces, currently seeing broad application in film/television, cultural tourism, marketing, and other scenarios. As of end-November, we have cumulatively served over 10 million users and 40,000 enterprises across more than 100 countries and regions, with monthly recurring revenue achieving substantial growth.

  7. Meanwhile, we are about to release a new Mixture-of-Experts (MoE) model. During training, it incorporates not only DiT (Diffusion Transformer) architecture but also AR (Auto-Regressive) architecture, combining both strengths — achieving the visual generation effects of DiT while solving the token discretization problem in AR architecture. We have already validated this on images.

From a holistic model perspective, we first did generation, then understanding. In the future we will have a unified grand architecture that fuses understanding and generation models into one unified framework, currently still in experimental stages. Further out, we hope to transform our accumulated, most comprehensive domestic copyrighted video materials into an AI video search service.

  8. Beyond tech giants, foundation model companies pivoting to this track also have unique advantages, for example their experience with 10,000-card cluster architectures. But when it comes to technical approaches to video generation and understanding of data, we multimodal-native startups are more vertical, more specialized.

Moreover, the video generation market is vast: some companies excel at animation style, others at photorealistic style, others at cinematic or 3D. No single vendor can do everything well, and different companies' user bases don't fully overlap. Therefore, the track's crowding won't affect our progress at our own pace.

Tuition Paid on the Road to Commercialization

  1. They say this generation of AI entrepreneurs must, from Day 1, reach for the stars while keeping feet on the ground. From our first day, we've operated with a strong sense of crisis, constantly thinking about how to find product-market fit. We've moved relatively early and fast on commercialization. Though we haven't raised the most money, we've thought through every dollar spent, every person hired.

  2. This also stems from my training at JD.com. JD is a retail business with relatively low gross margins, so the company culture emphasizes fine-grained operations. Leadership would often apply "extreme thinking": building a business with the minimum possible resources. Additionally, the three product essentials of cost, efficiency, and experience were repeatedly emphasized as indispensable. This holds true for any company, any product. We've experimented extensively with commercialization, paid some tuition, and gradually found our footing.

  3. For C-end products, we must consider how to solve the "double non-hundred" problem. Current AIGC products face two "non-hundred" issues: first, users cannot use the product to 100% of its potential; second, models cannot generate results matching 100% of user expectations. Thus, AIGC products currently need to cross two chasms: from technology early adopters to professional users, and from professional users to general users. Our C-end product growth momentum is strong; we recently appeared on the 2024 China AI Product Ranking's overseas product potential award list.

  4. As for enterprise services, from my supply chain analysis work at JD, I understood that while China has many enterprises, truly scaled ones are relatively few. In this context, getting enterprises to "buy things" remains difficult. China's SaaS has long struggled to break through, but AIGC technology may change this situation.

  5. For enterprise services, our key accounts are primarily central state-owned enterprises and leading internet companies. Last year, we launched PixMaker, a commercial photography product for brand merchants' shelf listings. After our strategic upgrade this year, we began producing marketing materials, particularly tools for short video marketing production, because we believe AIGC's largest related industry is content production, and the largest portion of content production relates to marketing. Currently, we work with over 40,000 SMEs and over 100 large enterprises. Our cooperation with telecom operators on AI video ringback tones ("video color ringtones"), for example, can turn our AIGC product into one with truly nationwide reach.

  6. Additionally, we focus heavily on toolization and SaaS services. We see an advantage in China: you can first serve large customers to refine your product, then reverse course to go overseas for SMB services. SMBs, large C-end users, and professional individual users share essentially the same product logic — none require point-to-point service — and we already have several products performing well. At bottom, commercialization mainly involves two things: first, providing creators with good creation platforms and content ecosystems; second, producing advertising content for brand enterprises needing marketing. Going forward, we'll also explore moving from production to distribution.

How This Generation's Entrepreneurs Solve Fundraising

  1. Not long ago, we closed two funding rounds, one from market-based funds and one from state capital, combining our Pre-A and Series A. The former came from Dunhong Capital, a well-known top-tier fund focused on cultural technology; the latter came from state-backed funds led by Hefei Industry Investment, also including Anhui Province AI Mother Fund and Hubei Yangtze River Film Group. It's an undeniable fact that AI startups now struggle to raise from USD funds. So we're walking on two legs: talking with state capital, and with market-based and industrial capital.

  2. When taking state capital, I think you need to consider whether the government's prioritized industrial directions align with your company's direction, and whether they can help cultivate your company into a leading enterprise or the anchor ("chain master") of its industrial chain. Today's state capital, like Hefei Industry Investment, also brings professional perspectives, its own views, and due diligence; in other words, market-based judgment. And since state capital represents local governments' industrial directions, startups can also leverage this momentum.

  3. Our first funding round last year came from a USTC alumni group called "Zhong He Da." This group of roughly 100 people consists mainly of USTC entrepreneurs and scholars, who regularly organize alumni activities and entrepreneurship exchanges. It was 15 alumni from this group who formed a partner LLP to support our first funding round.

USTC has traditionally cultivated scientists oriented toward mathematics and physics — the so-called "one academician per thousand graduates." But it has been less prominent in engineering and business, so they wanted to collectively support someone to do this, and I happened to want to start a company. This funding was called "Zhong He Da Seed No. 1," and there may soon be Seed No. 2, No. 3, and so on.

  4. At the very beginning, some USD funds came in. They liked big stories: the grander the "stars and seas" vision, the better. But later, after U.S. regulatory provisions came out, many USD funds dared not invest, and we switched to an RMB structure. Regarding USD versus RMB, I think it depends on where your business is and where your customers are. If our business truly goes global in the future, we can also take USD funds; the structure can be adjusted.

  5. If you started a company three years ago, raising 100 RMB came easily. Now the 70% that came from USD funds is out of reach, and the remaining 30% in RMB is scattered everywhere. Perhaps only a small portion is industrial funds, and even industrial capital has grown cautious. Only a few multimodal startups are currently able to raise external funding. Ten years ago, there would have been at least a dozen or so. But the logic is the same: without commercialization data, who ultimately takes over? My industrial experience tells me that a company must create genuine commercial value and create value for shareholders; otherwise the company has no meaning.

  6. I often tell my investors when they can exit. How high is our ceiling? I don't know, because it often depends on macro trends and some contingencies. But I tell them how high our floor is: I will ensure our company operates healthily and stably.

When the Wave Comes, Jump With It

  1. Entrepreneurship makes me feel my life has never been so complete. As an executive at a large company, you just need to manage technology or teams; as for strategy, you still have bosses behind you. Becoming an entrepreneur is different: there's no one behind you. Every problem ultimately comes to you, and you must resolve them all.

  2. Everyone who joins a startup must first prepare themselves mentally; you have to think it through yourself. Otherwise, at the slightest difficulty, you'll wonder: why am I putting myself through this? I've gone from technology to product, and then through a stretch of commercialization, but truly starting a company requires even more.

  3. Around 2015, when China's "Four Little Dragons" of AI emerged, I was still at Microsoft. Many people asked me to come out and start a company then, but I didn't. First, I felt my wings weren't fully grown and I could push further academically; second, that wave's business model seemed relatively thin. I chose to leave in 2018 because I felt I had accumulated enough academically and wanted to go all-in on a product.

  4. At Microsoft Research, we often said: going from a technology to a product might require 100 engineers, and selling the product well might require another 100 solution specialists or BD people. You can see how large that gap is. At the time I thought, I must find somewhere to connect that whole chain. Later at JD, every technology I worked on went into products. This process can be viewed as going from technology to product, to a business line, to a company.

  5. Choosing video as a track was also a result of rational thinking. Last year we judged that large language model competition was too intense. Meanwhile, the gap between domestic and international video generation wasn't large. Additionally, on business model: large language models are used for human-computer interaction and comprehension, where accuracy matters and hallucinations are problematic. Video generation is a digital creative industry — users don't care as much about hallucinations. The company was founded in March last year, first funding received in May, and the first version of the HiDream model launched on HiDream.ai in August. At that time, we were the first company in the world to launch text-to-video AI.

  6. We do both models and applications. If you only do applications without self-developed models, it's too thin — you risk being disrupted by models. But we're not doing general models; we're doing vertical models. I believe as long as we ensure our model capabilities remain among the world's best in this domain, and our products solve the last-mile user experience problem, this company can succeed.

  7. Walking this path, you realize being an entrepreneur is more challenging than being a scientist. Scientists can focus daily on 0-to-1 innovation without solving concrete productization problems; entrepreneurs, after 0-to-1, must take 1 to 100, then to 10,000. In this process, you must let go of your ego. You descend from a previous professional peak, reset yourself to zero, then climb another peak. In starting this company, I'm actively breaking out of my comfort zone to seek this transformation.

  8. In the current environment, entrepreneurs indeed need to become all-rounders, so-called "hexagonal warriors." But I'm increasingly enjoying uncertainty. Looking back over 60 years of AI, there have been three rises and three falls. We're currently in the middle of the third wave, which has not yet receded. I always believe people should go with the flow. When the wave comes, don't swim against it; jump with it and ride the momentum.

  9. I'm willing to devote the next ten years to this company, experiencing business from beginning to end. If I have the opportunity in the future, I hope to pass this experience on to younger people. In such an environment, many things are still waiting to be done, and that is a good thing. Why do many financially free people want to climb Everest once? Because people always seek new starting points rather than idling and suddenly growing old.

Image Source | IC Photo

Layout | Hongyu Liu