"Language Models Have Hit a Wall; 3D Foundation Models Are Just Getting Started" — VAST Founder Yachen Song on Two Years of Building 3D Foundation Models at Full Speed

September 14, 2025

Beyond the wave of text-to-image and text-to-video, where's the next AI inflection point that will blow our minds? The answer might be AI + 3D.

This week, we invited Simon Song, founder and CEO of 3D large model company VAST, to chat with us about the story behind VAST's latest 3D generation model, Tripo 3.0.

Born in 1997, this entrepreneur has raised three funding rounds in quick succession, each in the tens of millions of US dollars, and built up a sizable war chest. After a year of heads-down work, Simon is making his podcast debut to discuss several key strategic questions with us:

  • He believes large language models have already "hit a wall," with evolution slowing down — which is precisely what created space for applications and agents to flourish. 3D large models, by contrast, are entirely different: they've only just begun, and remain a blue ocean.
  • At the resource-constrained startup stage, why does VAST insist on "having it both ways," developing large models while also building its own application, Tripo Studio?
  • Why is the ultimate form of technology a kind of "decompression"? He argues that human media history (text → image → video → 3D) isn't about ascending dimensions, but rather a series of forced dimensional reductions and compressions of the 3D "source file" world, limited by technology. Technological progress is the process of "decompressing" back to the world's true form.
  • And in a future where robots can do everything for us, how will human value be redefined?

From elementary school, when he attracted classmates to "top up" his hand-drawn paper RPG world with spicy strips, to going all-in on AI entrepreneurship to build an "infinite world" where he believes everyone will create in 3D — welcome to Simon's observations and reflections on his entrepreneurial journey. We'd also love to hear your thoughts on AI + 3D in the comments.

🟢 01:27 Quick-fire Q&A: Age, alma mater, MBTI and zodiac sign, one-sentence description of current company and product, funding status, revenue and profit, team size, pre-entrepreneurship experience

🟢 02:47 Ten "I am ___" sentences

  • I am Song Yachen, and also Simon
  • I am founder and CEO of VAST
  • I am a hopelessly addicted gamer
  • ...the rest are even better; listen to the podcast to hear them

🟢 08:43 Part 1: The Origin of Everything: From charging for RPGs in elementary school to a dream of an infinite world

  • The childhood of a 3D large model entrepreneur: manually creating RPG worlds, classmates "topping up" with spicy strips and dried tofu
  • Core driving force: the physical world has limits; the greater world comes from the human brain, imagination, and creativity — that is an infinite world
  • The original "nail": wanting to build a UGC 3D content ecosystem, but discovering the world lacked a mass-market creation tool — like text without input methods, or video without smartphone cameras

🟢 26:14 Part 2: Model vs. Workbench: Why build both the engine and the F1 car?

  • A crucial strategic judgment: when a large model is iterating rapidly every 3-5 months, pure application-layer companies have almost no reason to exist, because "it builds a new wall for you, and you're screwed"
  • Building both models and applications (Tripo Studio): because you know the next iteration's direction, know which old walls to plaster over and which to leave
  • The essential difference between large model companies and product companies: the former is walking around with a hammer looking for nails. VAST believes it was never that from day one

🟢 28:44 Part 3: Survival Rules in the AI 2.0 Era: Language Models Have Hit a Wall, but 3D Hasn't

  • A disruptive view: why are so many Agents and applications emerging now? "I think it's because AI 1.0 died, so I'm doing AI 1.0 things."
  • AI 1.0 vs AI 2.0: the former is genius scientists hand-tuning parameters to train countless small models solving long-tail problems; the latter is data-driven training of a general large model to generalize and solve all problems
  • Why are there almost no pure application companies in 3D? "Because language models hit a wall, but 3D hasn't"

🟢 57:26 Part 4: The Ultimate Form of Technology Is a "Decompression"

  • We think the internet is ascending dimensions (text → image → video), but it's actually reduction and abstraction — because technology wasn't advanced enough, we were forced to "compress" what the 3D world originally looked like
  • As technology becomes more advanced, it allows for more decompression. When decompressed to the extreme, it's the source file
  • Why will everyone make 3D? Shooting video and posting photos, which feel completely natural to us now, have only been around for about ten years
  • The market size of a 3D UGC platform should be 2-3x the combined size of platforms like Twitter, Weibo, Xiaohongshu, Douyin, and TikTok

🟢 01:12:21 Part 5: Welcome to the Fourth Industry: When the Only Measure of Value Is "Experience"

  • Ultimate vision: in the future, robots can handle most things in the physical world for humans; human value lies in creativity and content
  • How to measure value — "the total amount of time all people spend in our world across all moments"
  • The future "currency" is compute power. The more attractive your world, the more "money" you get, the better your recommendation algorithm becomes, and the better experiences you can create
  • A metaphor from the TV series Upload: the more you spend in reality, the smoother the virtual world becomes. The future will be like this, and what you are paying for is compute power

Subscribe to the "Crossroads" podcast 🚦 We track the industry transformations and new entrepreneurial opportunities brought by the new wave of AI technology.

🚦 Crossroads is a metaphor Steve Jobs used for Apple — standing at the intersection of technology and liberal arts, where great products are born. AI is transforming every industry. We seek out, interview, and bring together a new generation of AI entrepreneurs and active players in the AI era. Together with them, we explore and embrace the new changes, the new possibilities.

👦🏻 Host Koji: I co-founded Jiepang, The Fair, and Tangdao, and started AI Hacker House, a community space for a new generation of AI entrepreneurs. I believe technology, especially AI, is the greatest value-creation opportunity of our generation. Feel free to reach out to chat, bounce ideas around, and connect to the next possibility. Koji on Jike, Koji's website

👧🏻 Host Ronghui: I've worked at a US dollar VC fund and spent five years as a Silicon Valley correspondent, tracking technological development and business stories. Feel free to reach out to chat and exchange ideas. Ronghui on Jike

🎄 This podcast is supported by The Fair's Sound Forest Podcast Initiative.