VAST Closes New Funding Round of Tens of Millions of Dollars, Says "Model as Product" Doesn't Work for 3D
"A New Path." By Shi Jiaxiang
AnYong Waves has learned that VAST, a company in the 3D generative model space, has completed a Pre-A+ round of financing totaling tens of millions of dollars. This round was led by the Beijing AI Industry Investment Fund, with participation from Jingya Capital.
"The entire industry may have gotten it wrong." This has been the biggest realization for VAST founder Yachen Song over the past nine months. After speaking with hundreds of 3D creators, Song came to understand that treating AI-generated 3D as "model-as-product" — the same path taken for text and images — simply doesn't work.
The reason: other content domains operate on UGC (user-generated content) ecosystems where users publish simply to share, but 3D at its current stage remains a PGC (professionally generated content) ecosystem.
"So who are the real target users?" Song's answer: people much like programmers — they need an AI-native 3D workbench "like Cursor" that reimagines traditional production workflows.
Based on this insight, VAST last month launched Tripo Studio, an AI-driven all-in-one 3D workbench. Rather than merely generating an 80-point model, it provides a complete AI workflow that lets creators optimize models to 95 points within minutes.
Meanwhile, VAST is exploring what content paradigms are currently achievable and consumable with native AI 3D — understanding what kinds of strongly interactive, lightweight content actually work. Together with indie developers, they're experimenting with 3D mini-games, including a "Tripo Lightning" version of the "Universal Tai Chi" feature from Where Winds Meet that lets players manifest their words into reality within virtual worlds.
This exploration stems partly from Song's own gaming obsession. Back at SenseTime, he built AI+Gaming and AI+Animation businesses from scratch. As a member of the post-95 generation "poisoned by digital heroin," Song told us during our last meeting that he sleeps around 2 or 3 a.m. daily — CEO of VAST by day, guild leader in Rate Earth by night.
In this interview, he says he hasn't played Rate Earth for a while "because a bunch of new SLGs came out." Recently he's been immersed in Rome and The Great Rivers: Azure Dragons and White Cranes, and tries to squeeze in a Dungeons & Dragons session each weekend. "I just sleep very little; it's not that I'm not working. I'm still up at 8 a.m. for meetings," he adds.
VAST's Beijing office has moved three times, but always stayed near Tsinghua University, drawing numerous Tsinghua undergrads, master's students, and PhDs — perhaps one reason they've caught the Beijing AI Industry Fund's attention.
We spoke with VAST founder Yachen Song in September 2024 and again in June 2025. Throughout both conversations, Song's tone remained excited, optimistic, and filled with conviction that 3D represents the next content medium. As he put it, the biggest difference between startups and tech giants is: startups believe first, then see.
The following conversation has been edited by AnYong Waves —
Part 01
From "3D Cursor" to "3D Meitu"
AnYong: Since our conversation last September after your funding round, what's your latest thinking on the 3D space?
Song Yachen: After we talked, I must have spoken with several hundred Tripo creators, if not close to a thousand. And suddenly it hit me — something the industry hasn't realized yet: everyone has been doing it wrong.
I used to tell you that 3D was the last "C" in AIGC. But in that moment, I realized that's not true.
We and our competitors have been endlessly competing on foundation models, because without productization there's nothing else to compete on.
We'd always assumed that text, image, video, and 3D large models were all content generation models — "model-as-product." Early Midjourney, Runway — minimal UI, type in text, get an image or video. You'd see some video or effect suddenly blow up. But 3D is fundamentally different.
The reason: text, image, and video creators are a different crowd from existing 3D creators. The former could do UGC creation early on — shoot video on your phone, edit photos, use CapCut for editing, all low barrier. Because the ecosystem existed, "model-as-product" could let the masses participate directly.
But 3D is more of a PGC ecosystem. PGC creation is for profit. UGC users publish to share ideas, vent, rant, or "show off" — never to make money. So the "model-as-product" route doesn't work.
AnYong: So what route should it be?
Song Yachen: I think our user profile is actually quite similar to programmers.
Tripo serving enterprises through APIs is fine, but for professional users, they need an AI-native 3D workbench that reimagines traditional production workflows.
Like how programmers, once accustomed to Cursor, can barely go back to VS Code. This is completely different from adding an AI plugin to existing software, because with a plugin, generation and editing remain disconnected.
What our users really need is to generate an 80-point 3D model in Tripo, then optimize it to 95+ in five minutes. Redefining the 3D production pipeline through AI, giving creators an immersive 3D content creation environment.
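The generate-then-refine loop Song describes can be sketched in a few lines. Everything below is a hypothetical illustration: the function names, the quality scores, and the "95-point" threshold are stand-ins invented for this sketch, not Tripo's actual API or pipeline.

```python
# Hypothetical sketch of an AI-native generate-then-refine 3D workflow.
# None of these functions correspond to Tripo's real API; they only
# illustrate the idea of generation and editing living in one loop
# rather than in separate, disconnected tools.

def generate_draft(prompt: str) -> dict:
    # Stand-in for a text-to-3D generation call: returns a rough
    # "80-point" model with a quality score attached.
    return {"prompt": prompt, "quality": 80, "edits": []}

def refine(model: dict, operation: str, gain: int) -> dict:
    # Stand-in for an AI-assisted edit pass (retopology, texture
    # cleanup, UV unwrap...). Each pass nudges quality upward.
    model["edits"].append(operation)
    model["quality"] = min(100, model["quality"] + gain)
    return model

def produce(prompt: str, target: int = 95) -> dict:
    # The workbench loop: start from a rough draft, then apply AI
    # edit passes until the model clears the target quality bar.
    model = generate_draft(prompt)
    for op, gain in [("retopology", 5), ("texture_cleanup", 5), ("uv_unwrap", 5)]:
        if model["quality"] >= target:
            break
        model = refine(model, op, gain)
    return model

result = produce("a stone bridge over a river")
print(result["quality"], result["edits"])
```

The design point is that the refine passes operate on the same object the generator produced, which is the difference Song draws between an AI-native workbench and an AI plugin bolted onto existing software.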
AnYong: After this realization, what changed?
Song Yachen: This year we strengthened our product, engineering, and commercialization teams. We believe the end deliverable must be an AI-native workflow that completely redefines the 3D pipeline, where users can create 3D content end-to-end.
AnYong: How do you understand the users you're serving now?
Song Yachen: We've served professional users (Pro) well over the past two years; now we're serving professional consumers (Pro-C).
Serving UGC users is the next step. Only after you have "3D Meitu" can "3D Douyin" emerge.
Our path is: first "model-as-product," now "all-in-one AI 3D workbench," next "3D Meitu."
The difference between "3D Meitu" and an ordinary 3D workbench is that while "3D Meitu" reduces user control and editability, it lets people with no prior experience create 3D content the masses can consume, with no barrier to entry.
A bit like filters in CapCut — you can't control their specific parameters. But the upside is zero barrier, zero cost, and your creation becomes directly consumable.
Long-term there's still a To C opportunity, but it must come step by step. First lower the barrier for Pro-C creators, then gradually lower it to pure UGC users.
Why could text, image, and video do it in one step? Because typing methods, phone cameras — mass-level creator tools already existed. The UGC ecosystem was mature, users were already educated, already using Meitu and CapCut. Say "AI version of CapCut or Meitu" and they instantly get it.
AnYong: Compared to last year's interview, you seem more conservative.
Song Yachen: Since starting the company in 2023, I've done an annual all-hands share every year. After the third one this year, I asked CTO Ding Liang how it was. He said not much changes each time. So my original intent hasn't changed.
Nor does it mean we're more conservative. Quite the opposite — I used to think it would take 5-10 years for "3D Douyin" to emerge; now I think it'll happen in 3-5 years. I'm more optimistic, but dynamically adjusting the methods and paths to get there.
Part 02
Believe First, Then See.
AnYong: 3D used to be a relatively niche space, but this year Tencent and ByteDance are also pushing into 3D. Are you worried?
Song Yachen: We've always faced competition from giants — just previously it was NVIDIA, Meta, Google, OpenAI. Now there are more domestic competitors too.
Giants mostly assemble algorithm teams, pursuing technical and academic influence. Much of what they're doing now, we were doing two years ago. So the situation becomes: when we were "competing on models," they hadn't started; now they're starting to "compete on models," while we're already ahead on product, engineering, and commercialization.
Giants aren't that scary. Their mindset is "see first, believe later." They need to see something already built, with good data, large user base, making money — then they'll follow. Startups are different. We "believe first, then see." We figured out what we wanted to do long ago.
Plus, we also collaborate with giants.
AnYong: Then tell us about your contribution to Where Winds Meet.
Song Yachen: When Where Winds Meet launched early this year, it had a core gameplay feature called "Universal Tai Chi" — turning 3D generation technology into a completely new interactive form where you could "speak and manifest." We provided that.
For example, a player encounters a river. Previously they'd have to go around; now they can generate a bridge and walk across. Or they're losing a fight, generate a tree, hide behind it — you can't hit me.
Initially 3D generation was somewhat slow — not just us, universally slow, taking dozens of seconds. For a player, waiting dozens of seconds for a bridge to appear while crossing a river — you could've walked around by then. Later we made a Tripo Lightning version, compressing generation from dozens of seconds to a few seconds, maximum under 10 seconds.
The value we provide has three dimensions: cost reduction and efficiency gains; lowering creation barriers so people who couldn't do it before now can; and creating entirely new gameplay and interaction methods.
When collaborating with game companies on gameplay, most companies enter at the R&D or publishing stage. We often start collaborating during live operations. Once a game launches and the team prepares for operations, trying new gameplay, new features, and player engagement — that's when collaboration costs are lowest.
AnYong: Besides gaming, you also have 3D printing clients. How do you view the 3D printing market?
Song Yachen: I'm a 3D printing hobbyist myself. I have a 3D printer at home. The shoes I'm wearing right now are 3D printed.
But there was a catch: I didn't know how to model, so I could only download other people's models. This has always been what holds 3D printing back: the vast majority of people can't do 3D modeling, which limits its addressable users to maybe tens of millions.
But "3D generation" expands the potential market from tens of millions to billions overnight. With Tripo 2.5's model generation fidelity, VAST's 3D generation technology can now fully serve the consumer 3D printing industry.
AnYong: So 3D printing will be a major application scenario?
Song Yachen: From a model-generation perspective, everything except hyper-realistic human faces is viable, and realistic faces are what our 3.0 version will solve.
I also want to add: flexible production goes beyond additive manufacturing (3D printing). Manufacturing also has subtractive manufacturing (carving) and formative manufacturing (molding). All three manufacturing methods can apply 3D generation technology.
3D generation can be applied across many industrial fields, far beyond consumer use — jewelry, footwear and apparel, home goods, toys, figurines, building blocks, lighting, stationery, scented candles, food packaging, and more.
It enables two models. First, small-batch rapid iteration, using large numbers of SKUs for market validation. Generate many jewelry designs, see which goes viral on Xiaohongshu, then mass-produce — no inventory risk if it doesn't sell.
Second, POD (print-on-demand). Customization exists not just on the supply side but the demand side. We're also discussing with e-commerce platforms how to use 3D generation to help users express their needs. In the PC era, you searched on Taobao with text; with phone cameras, you see a nice outfit, snap a photo, search for similar items. In the flexible production era, 3D generation can more accurately express consumers' actual needs.
AnYong: Since starting the company, have you felt any powerlessness in educating users about 3D?
Song Yachen: The post-00s, post-05s generation has extremely high sensitivity to 3D. They grew up gaming, exposed to these things — explaining 3D to them costs almost nothing.
Think about it: before short video emerged, did you really feel that strongly about video? Before Xiaohongshu existed, were you that into photo galleries?
It's because Douyin and Xiaohongshu exist that we've become so familiar with the media forms they carry, feeling like we instantly get it.
The only reason 3D seems harder to grasp right now is because "3D Douyin" hasn't appeared yet.
AnYong: But don't you think 3D is extremely dependent on the spread of hardware devices, and hardware isn't something you can directly control?
Song Yachen: When I was very young, I read web novels on an MP3 player — a device completely not designed for reading. The screen was tiny, maybe ten characters per line, no backlight, had to use a flashlight, finishing one book might require pressing the page-turn button hundreds of thousands of times.
Later people read on MP4 players and Subor learning machines; none of that hardware was designed for reading either. Yet the terrible experience didn't hinder the explosion of web novels at all. It wasn't until the Kindle came along that there was purpose-built hardware.
Content and platforms must emerge first, then dedicated hardware follows — never the reverse.
As we discussed before: "believe first, then see."
AnYong: I also heard you're quite good at recruiting?
Song Yachen: Recruitment has nothing to do with me personally.
Have you seen One Piece? Like Luffy in One Piece — he himself isn't necessarily eloquent. What matters most is that his crewmates genuinely want to find the One Piece.
Whitebeard said he was a relic of the old era, that no ship in the new era could carry him. That's exactly it. Whitebeard's power was immense and his crew was huge, but he couldn't have recruited someone like Sanji, because Sanji had his own dream to pursue.
At a startup, the office, equipment, compute, and benefits may not be the best. The team is newly formed, with integration costs to pay, and it may even lack various specialized roles. So how do you attract the best talent? By accomplishing one awesome thing after another.
Entrepreneurship is: once you're convinced the direction is right, you work to make it happen.