Two Tech Giants Invest in a 3D Generative AI Startup | WAVES
Meituan and ByteDance invested simultaneously.

By Jiaxiang Shi
Edited by Jing Liu

In October 2023, after spending half a year and nearly all his energy on a funding round that ultimately fell through, Wu Di, founder of Yingmu Technology, was stunned. There was no time to dwell on it: Yingmu carried out its first large-scale layoff since its founding. Wu had wanted to quickly raise a small round to keep operations running, but the funding environment was as bleak as it could get. The failed round hardened the team's resolve to expand into full-category 3D asset generation.
At the time, some teams had already launched 3D generation products based on 2D-to-3D lifting, the dominant approach in academia. But the team saw the bottleneck: an image captures only one side of a real object at a time, and even infinitely many images from infinitely many angles cannot fully describe 3D content. The only way out was to train on native 3D data from the start. It was an all-or-nothing bet. Even the artists on the team originally assigned to film projects were reassigned to model annotation.
Rodin, a 3D engine built on CLAY, launched in June of last year. CLAY is a native 3D generative Diffusion Transformer model jointly developed by Yingmu and ShanghaiTech University; the research earned a Best Paper Honorable Mention at SIGGRAPH 2024. Forty-five days after launch, Rodin hit $1 million in ARR, which Wu says is the main reason the company later caught the attention of the tech giants.
Dark Waves has learned that Yingmu Technology has completed a new Series A round of tens of millions of dollars, led by Meituan Longzhu and ByteDance, with existing investors Hongshan and MiraclePlus participating.
Yingmu has long been tagged as a "student startup" — core members are still pursuing master's and PhD degrees in the lab. But four years in, CTO Zhang Qixuan says the "young prodigies" have gradually shifted their priority to commercialization and product usability.
Wu still remembers first arriving at ShanghaiTech: the campus was a construction site, and he didn't even know whether this patch of dirt would actually become the modern campus shown in the renderings. But having just finished the gaokao, he didn't care. Compared to the conventional path of finishing school, going abroad for grad school, then returning to China to join a big tech company, this nearly blank answer sheet held far more appeal.
"WAVES" is a new column from Dark Waves. Here, we bring you the stories and spirit of a new generation of entrepreneurs and investors.
The following is a retrospective from Yingmu Technology founder Wu Di and CTO Zhang Qixuan on their entrepreneurial journey, including their understanding of the future of 3D generation, edited by Dark Waves:

On Entrepreneurship: A Single Choice
1. Yingmu was born from a lab problem: how to put people and objects into virtual worlds. To that end, in 2020 we launched our first facial scanning system, capable of capturing how a face appears under many different lighting conditions and then synthesizing its appearance under entirely new lighting.
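The article doesn't describe the system's internals, but dome rigs of this kind are typically light stages that exploit the linearity of light transport: photograph the subject once per light (one-light-at-a-time, or OLAT), then relight by weighting each capture with the target environment's intensity from that light's direction. Below is a minimal NumPy sketch of that idea; the shapes and names are illustrative assumptions, not Yingmu's actual pipeline.

```python
import numpy as np

def relight(olat_images: np.ndarray, light_weights: np.ndarray) -> np.ndarray:
    """Relight a face as a weighted sum of per-light captures.

    olat_images:   (n_lights, H, W, 3), one photo per dome light
    light_weights: (n_lights, 3), RGB intensity of the target environment
                   sampled at each dome light's direction
    """
    # Light transport is linear, so a weighted sum of OLAT frames gives
    # the image under the new lighting.
    return np.einsum('nhwc,nc->hwc', olat_images, light_weights)

# Toy usage: 156 dome lights, 64x64 captures (numbers are illustrative).
olat = np.random.default_rng(0).random((156, 64, 64, 3))
env = np.random.default_rng(1).random((156, 3))
relit = relight(olat, env)  # (64, 64, 3): the face under the new lighting
```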
2. But this technology kept hitting walls in real-world applications. We had made it into the face-swapping project for The Wandering Earth II, but the collaboration ultimately fell through. The reason: our first-generation dome light field was focused on light capture, piecing together how a person appears under illumination, with fixed camera angles and no subject movement, so it could only serve completely static shots from specific angles. It could collect geometric data but not material properties, and it was powerless against dynamic detail like facial wrinkles.
3. That's when I realized there was a massive gap between academic research and what industry actually needs: 3D models with clean topology, proper UVs, renderability, adjustable expressions, and real-time game-engine compatibility, not just elegant wireframes. While waiting for the next-generation dome light field, we wanted to try something built on generative networks.
4. Yingmu developed two products at the time. One was called Wand. The app was simple: users would sketch on a canvas, and Wand would generate a photorealistic portrait. Development took two weeks. The first generation of photorealistic portraits made no waves, so we switched the output from realistic humans to anime-style images. Wand shot to number one in the App Store's Graphics & Design category, with over 1.6 million registered users "drawing anime waifus."
5. But Wand was just a simple tool with no user retention. We couldn't find a viable monetization model or balance user growth against compute costs. The options were either to go deeper on technology and add features, or to build an anime community. But we didn't believe in 2D technology, and no one on our eight-person, all-engineering team was any good at community operations. We finally accepted that we couldn't sustain the traffic, killed the entire 2D business line, and moved on.
6. In retrospect, Wand had completed its historical mission. It earned us our first revenue — only 6,000 RMB — but more importantly, it helped us close our angel round. We still believed that next-generation display devices and interaction methods would operate in three dimensions.

On Direction and the Future of 3D Generation: Wavering and Resolve
7. After closing that round, the metaverse was hot, and we rode the digital human and metaverse wave to a second funding round. Our thinking at the time: existing digital humans would eventually evolve into ID-type digital humans, becoming standard equipment for anyone entering virtual worlds. So in late 2022 we launched DreamFace and, based on that framework, ChatAvatar, a 3D character generator capable of producing at least supporting-character-level models with skeletal rigging.
8. But we entered right as the metaverse was ending. Commercial progress was difficult, every step a struggle. That year I graduated, moved the office out of the ShanghaiTech lab, and hit the pandemic lockdown — paying rent for half a year on an empty space.
9. By 2023, I had spent six months on a new funding round, only for the lead investor to pull out overnight. I was stunned. I had wanted to first raise one or two million dollars just to survive, but the funding environment was as bleak as it could get. I had finance show me the bank balance twice a week, watched the cash flow, and barely held at break-even. That's when I realized: without a new milestone, Yingmu wouldn't be able to raise another dollar.
10. We had already put full-category 3D generation on the roadmap, but we faced a critical technical choice. 3D generation broadly splits into two paths: 2D lifting and native 3D. The former trains on massive 2D image datasets, but because those images carry no true information about the 3D world, it consistently produces "multi-head" artifacts. Releasing a product on this path might have secured quick funding, but the gap to "production-ready" would be unbridgeable. And whether native 3D could even work, we had no idea.
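The article doesn't name specific methods, but in the literature 2D lifting usually means score-distillation-style optimization (as in DreamFusion): a 3D representation is tuned so that each rendered view, judged in isolation, looks plausible to a 2D image prior. Since nothing enforces cross-view consistency, front-facing features can be reproduced on several sides of the object, which is the "multi-head" failure described above. Here is a toy sketch with the renderer and the 2D prior stubbed out; every function in it is a placeholder, not any real model.

```python
import numpy as np

rng = np.random.default_rng(0)
shape_params = rng.normal(size=1024)       # stand-in for a 3D representation

def render(params: np.ndarray, angle: float) -> np.ndarray:
    # Placeholder "differentiable renderer": a view-dependent projection.
    return params * np.cos(angle)

def prior_gradient(view: np.ndarray) -> np.ndarray:
    # Placeholder for a 2D diffusion prior's guidance on a single view.
    return view - view.mean()

for step in range(1000):
    angle = rng.uniform(0.0, 2.0 * np.pi)  # sample a random camera
    view = render(shape_params, angle)
    g = prior_gradient(view)               # each view is scored in isolation;
                                           # nothing ties this view's "face"
                                           # to any other view's
    shape_params -= 1e-3 * g * np.cos(angle)  # chain rule through render()
```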
11. We ultimately agreed: to compete in the 3D industry, we had to train natively in 3D. The difficulty of this approach is usually attributed to a shortage of quality data. But the real bottleneck in 3D generation isn't data volume; it's the right 3D representation and parameter scale. The key is minimizing information loss on the way from dataset to final output.
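What native 3D supervision looks like in practice isn't spelled out here. One common form in this line of work is point-plus-occupancy samples of a watertight shape, which describe the entire solid rather than its photographed surface. An illustrative sketch follows, with a sphere standing in for a real asset.

```python
import numpy as np

def occupancy_samples(n: int, rng: np.random.Generator):
    """Native-3D training pairs for a toy shape: query points in [-1, 1]^3
    with inside/outside labels. A sphere of radius 0.8 stands in for a
    real asset's watertight mesh."""
    pts = rng.uniform(-1.0, 1.0, size=(n, 3))
    occ = (np.linalg.norm(pts, axis=1) <= 0.8).astype(np.float32)
    return pts, occ

pts, occ = occupancy_samples(4096, np.random.default_rng(0))
# Unlike any set of photographs, (pts, occ) describes the full solid,
# interior included, so a generator trained on it never has to guess
# what an unseen side looks like.
```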
12. Rodin launched last June, the last of its cohort of 3D generation startups to ship. I believe its generation quality and usability were a generation ahead of comparable products at the time. Rodin Gen-1.5, released on the last day of 2024, filled the field's gap in generating sharp edges; for CAD-class industrial models and hard-surface models, it holds an absolute advantage.

13. But even so, AI-generated models are still a considerable distance from directly usable. Unlike video, images, and other content formats, 3D is industrial-grade content rather than consumer-grade, which means established industry standards apply. With topology, geometric precision, materials, UV unwrapping, and other issues still unresolved, AI-generated 3D remains far from direct deployment in games and films.
14. Moreover, solving creation capability for ordinary users in 3D worlds doesn't mean a consumer-grade 3D era will arrive; more preconditions are needed, such as the Vision Pro and Quest 3 becoming as ubiquitous as the iPhone. The metaverse's moment in the spotlight was largely B2B players hyping one another up.
In improving efficiency for the games industry, 3D generation is nowhere near what Midjourney achieved for 2D. Back in the lab, we thought technology equals product equals company. But technology doesn't equal product, and a product doesn't equal a company.
15. Nor can Rodin generate industrial-grade 3D assets for games or films. Perhaps in the future 3D generation will appear as core gameplay in games and film productions, but for now, native 3D technology's opportunity lies in existing markets.
16. So this time Yingmu is targeting "game outsourcing" for commercialization: in the game modeling pipeline, from concept art to finished model, there is a series of "wasted drafts" that may go through multiple rounds of revision. Now, once the three-view concept art is complete, Rodin can generate a first modeling draft, and modelers then adjust the specific details, cutting costs at the mid-poly or preview stage of modeling, or on peripheral, non-critical assets.
17. When I first came to ShanghaiTech, the school was a construction site, the lab newly built. We witnessed nearly the entire process of ShanghaiTech rising from rubble to towers. In a sense, ShanghaiTech's creation from nothing, as our advisor put it, was itself a "great entrepreneurship." And Yingmu Technology's four years are a footnote to that entrepreneurship.