Yunqi Capital x Tsinghua University | 5 Embodied AI "Heavyweights" on Physical AI and Commercialization

Yunqi Capital · April 6, 2026

Finding the Decisive Factor in Embodied AI's Path to Market

Where will the real decisive factor lie as embodied intelligence comes to market? The answer is emerging from the exploration of leading players on the front lines.

On March 28, the "Pujiang Surges, AI Boundless: Physical AI Frontier Exploration and Industrial Innovation Seminar & 28th Department Alumni Forum" concluded. The event was co-hosted by the Tsinghua University Department of Electronic Engineering, the Tsinghua Alumni Association Electronic Engineering Branch, and Zhangjiang Hi-Tech, with Yunqi Capital as a co-organizer.

Among the events, the roundtable "Embodied New Horizons: Physical AI and Future Intelligent Agents", moderated by Tsinghua University Professor of Electronic Engineering Wang Yu, brought together five key figures standing at the intersection of embodied technology frontiers and industry practice.

Chen Yilun, Yao Maoqing, Wang Qian, Ding Wenbo, and Li Ziyi — all alumni of Tsinghua's Department of Electronic Engineering — discussed technical bottlenecks, scenario deployment, and more. Wang Qian and Li Ziyi represent portfolio companies we have been supporting over the long term: X Variable Robotics and Neolix, respectively.

The following content is adapted from "THU Alumni Association Electronic Engineering Branch".

Moderator:

Wang Yu | Professor, Tsinghua University Department of Electronic Engineering (Class of 1998)

Roundtable Guests:

Chen Yilun | Founder & CEO, Tashi Zhihang (Class of 2001)

Yao Maoqing | Partner, AgiBot; Chairman, Mifeng (Class of 2005)

Wang Qian | Founder & CEO, X Variable Robotics (Class of 2007)

Ding Wenbo | Associate Professor & Director of Research Affairs, Tsinghua Shenzhen International Graduate School (Class of 2007)

Li Ziyi | CFO, Neolix (Class of 2010)

Roundtable discussion

Highlights

Part 01

Wang Yu:

Many are calling 2026 the "year zero" of embodied intelligence. I'd like to ask each of you — in which direction does the real decisive factor lie? Please start with a brief introduction, then address this key question.

Wang Yu moderating the roundtable

Chen Yilun:

I'm from the Class of 2001 in the Department of Electronic Engineering, and later received my PhD from the University of Michigan, where I focused on machine learning. After graduation, I worked on robotics system R&D for five years before joining DJI as Chief Engineer, responsible for machine vision and automated production systems. I then moved to Huawei, where I helped build the autonomous driving team from the ground up and led multiple mass-production projects. I founded Tashi Zhihang in early 2025.

Regarding 2026, I believe we'll see major technical breakthroughs. The embodied intelligence field has been severely data-constrained, but data volume will grow significantly this year, AI capabilities will improve accordingly, and the hardware is already in place. The crucial point is that embodied intelligence must establish its own independent value proposition. Just as autonomous driving and large language models have proven their irreplaceable value, embodied intelligence needs to identify a core value that can sustain itself over the long term.

Chen Yilun speaking

Yao Maoqing:

I'm from the Class of 2005. AgiBot was founded in 2023 — the name reflects our pursuit of AGI. The transition from cognitive intelligence to decision-making intelligence is inevitable, and the industry is currently in an exploration and trial-and-error phase.

2026 is a critical "show your homework" year. Over the past two years, numerous startups have launched hardware products and assembled complete R&D teams. This year, we need to identify suitable industrial and technical paths for embodied intelligence, while also clarifying what elements are still missing for long-term development — particularly how to build a development path for the physical world that is independent of language models.

Yao Maoqing speaking

Wang Qian:

I'm from the Class of 2007. I've been working on neural networks since 2009, making me an early deep learning researcher. I went abroad for my PhD in robotics, briefly worked in quantitative finance, and have now returned to robotics.

I think this year we can start talking about a "ChatGPT moment" for embodied intelligence, or at least a "GPT-3 moment." In the past two years, the industry generally believed this moment was still distant, so it wasn't discussed much. This year, whoever first reaches this milestone in foundational model capabilities may matter more than specific industry or application deployments.

Wang Qian speaking

Ding Wenbo:

I'm from the Class of 2007. I'm currently an associate professor and director of research affairs at Tsinghua Shenzhen International Graduate School.

I've been focused on the data question: where does embodied intelligence data come from? How can it be acquired efficiently and formed into a data flywheel? Data acquisition for the previous generation of language models was relatively straightforward because it was naturally tied to the internet industry. But embodied intelligence involves physical world interaction, making data acquisition significantly harder. How to organically integrate humans, machines, digital humans, and generative models — making simulation data sufficiently realistic and real-world data easy to use — is a question that needs deep consideration from both technical and ethical perspectives.

Ding Wenbo speaking

Li Ziyi:

I'm from the Class of 2010. I co-founded Neolix in 2018, focusing on L4 autonomous driving for logistics scenarios.

The current commercialization progress of embodied intelligence closely resembles where autonomous driving was in 2020-2021: clear commercial direction, but still one step away from crossing the chasm to value validation in specific scenarios. It's like mountain climbing: to summit Everest, you must first climb Mount Hua; to climb Mount Hua, you must first climb Fragrant Hills. The key is finding that "Fragrant Hills" and then advancing step by step.

Li Ziyi speaking

Part 02

Wang Yu:

Yilun, you're pursuing the AWE route. How does it differ from VLA?

Chen Yilun:

First, some context. Autonomous driving is a sub-problem of embodied intelligence, and its methodology derives from robotics. In autonomous driving, we tried numerous technical approaches, many with extremely high trial-and-error costs. VLA was essentially a route that had been thoroughly tested and proven ineffective during my time in autonomous driving. Based on this experience, I abandoned the VLA approach from the start. Embodied intelligence, like autonomous driving, needs its own independent technical system.

The laws of physical-world AI can be summarized as: perceive the world through sensors, construct world representations, then improve task success rates within that representation space through imitation learning and reinforcement learning.
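The loop described above (sense, build a representation, then raise task success inside that representation with imitation and reinforcement learning) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not the speaker's method: a 1-D world with the goal at 0, a mean of noisy readings as the "world representation", a least-squares fit as imitation learning, and simple random-search hill climbing standing in for reinforcement learning. All function names are hypothetical.

```python
import random

def perceive(true_state, rng, noise=0.05):
    """Sensor: a noisy reading of the true 1-D position."""
    return true_state + rng.gauss(0.0, noise)

def represent(readings):
    """World representation: here simply the mean of recent readings."""
    return sum(readings) / len(readings)

def imitation_fit(demos):
    """Imitation learning: least-squares gain k so that action ~= k * state."""
    num = sum(s * a for s, a in demos)
    den = sum(s * s for s, _ in demos)
    return num / den

def success_rate(gain, rng, trials=200, steps=3, tol=0.05):
    """Fraction of rollouts that end within `tol` of the goal at 0."""
    wins = 0
    for _ in range(trials):
        x = rng.uniform(-1.0, 1.0)
        for _ in range(steps):
            z = represent([perceive(x, rng) for _ in range(3)])
            x = x + gain * z  # act on the representation, not the raw state
        if abs(x) < tol:
            wins += 1
    return wins / trials

def refine(gain, seed, rounds=20, step=0.1):
    """Stand-in for RL: keep random perturbations that raise the success rate."""
    rng = random.Random(seed)
    best = success_rate(gain, random.Random(seed))
    for _ in range(rounds):
        candidate = gain + rng.uniform(-step, step)
        score = success_rate(candidate, random.Random(seed))
        if score > best:
            gain, best = candidate, score
    return gain, best

# Hypothetical demonstrations from a cautious expert: action = -0.3 * state.
demos = [(s / 10, -0.3 * s / 10) for s in range(-10, 11) if s != 0]
k_imitation = imitation_fit(demos)
base = success_rate(k_imitation, random.Random(0))
k_refined, refined = refine(k_imitation, seed=0)
```

Each candidate policy is scored on the same seeded set of rollouts, so the refined success rate can never fall below the imitation-only baseline. A real system would replace every stage with learned components.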

Wang Yu:

What progress has AgiBot made in models, datasets, and practical applications?

Yao Maoqing:

To achieve AGI, we must evolve from current cognitive intelligence to decision-making intelligence. Large language models have consumed hundreds of trillions of tokens of internet data, but remain largely at the semantic cognition level. Enabling intelligent agents to interact dynamically with their environment and complete complex tasks in closed loops is the next challenge. What's most lacking right now is data. Therefore, on the model side, we've been exploring intermediate representation forms that bridge logical thinking and planning with physical-world control.

On the application side, Chinese embodied intelligence startups face dual pressure: they must both explore frontiers and deliver interim results. We've achieved 24/7 parallel operation in some industrial scenarios, with hourly output exceeding human levels. But overall, we still need continuous iteration through the data flywheel, and we need to raise hardware mean time between failures (MTBF) to the ten-thousand-hour level of industrial robots.

Wang Yu:

Wang Qian, what's your model strategy? Why are you trying consumer-facing scenarios like housekeeping?

Wang Qian:

Our goal is to build a foundational model for the physical world — a "grand unified model." VLA, world models, and so on are all downstream tasks. What we really need to learn are the physical laws, object properties, and fundamental action logic hidden behind these tasks. The core difficulty in hand manipulation is the "physical gap" — the physical world contains massive amounts of randomness and non-linearity. This wasn't prominent in autonomous driving. So our model is closer to what we'd now call a "full-modality model" in language models, simultaneously achieving VLA and world model functionality within a single model, using cross-task validation to gain essential understanding of the physical world.

Why try home scenarios? Because what we need isn't artificially constructed data factory data, but sufficiently diverse, complex, and high-quality real interaction data. This is an inevitable choice driven by technical requirements. Commercial pressure certainly exists, but I still believe that when a true foundational model emerges, all previous interim achievements will be redefined.

Wang Yu:

Wenbo, what's the status of tactile sensors and tactile models? Are robotics companies adopting them?

Ding Wenbo:

First, the conclusion: robotics companies generally believe introducing tactile perception causes instability in large models, so actual adoption is limited.

However, the multimodal information that touch provides is crucial for fine manipulation. Taking biology as reference: gorillas far exceed humans in physical capability and muscle efficiency, but humans prevail through higher intelligence. We've introduced radar technology into tactile sensing to achieve multimodal signal fusion — this draws on fundamentals taught in the Department of Electronic Engineering: electrodynamics, optics, signal processing. As for industrial application, although touch is currently seen as a risk factor for model collapse, I believe this is an obstacle that must be overcome.

Wang Yu:

Ziyi, which difficulties that the autonomous driving industry has experienced will embodied intelligence inevitably face?

Li Ziyi:

My career path is somewhat unusual: after graduation, I didn't go into R&D, but instead worked at Big Four accounting firms, investment banks, and investment institutions before starting my own company. So I may focus more on business models and scenario deployment.

The first principle of logistics is simple: cost reduction. It took us eight years to drive the full lifecycle operating cost of our unmanned vehicles down to 0.5 RMB per kilometer, while traditional freight charges about 4 RMB per kilometer. This achievement depended on finding the right entry point where cost and scenario matched at each development stage: first ensuring survival, then pushing toward more difficult scenarios.
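The arithmetic behind those two figures can be worked through as a minimal sketch. One caveat: the quoted numbers compare an operating cost with a freight charge, so the ratio is indicative, not a like-for-like margin. The 100 km route length and the function names are hypothetical, chosen only to show scale.

```python
# Figures quoted in the discussion, in RMB per kilometer.
UNMANNED_COST_PER_KM = 0.5      # full lifecycle operating cost, unmanned vehicle
TRADITIONAL_PRICE_PER_KM = 4.0  # approximate traditional freight charge

def cost_ratio(traditional, unmanned):
    """How many times more the traditional figure is than the unmanned one."""
    return traditional / unmanned

def saving_fraction(traditional, unmanned):
    """Share of the traditional figure that is saved per kilometer."""
    return (traditional - unmanned) / traditional

def saving_over_distance(km, traditional, unmanned):
    """Absolute difference in RMB over a route of `km` kilometers."""
    return km * (traditional - unmanned)

ratio = cost_ratio(TRADITIONAL_PRICE_PER_KM, UNMANNED_COST_PER_KM)         # 8.0
fraction = saving_fraction(TRADITIONAL_PRICE_PER_KM, UNMANNED_COST_PER_KM) # 0.875
# Hypothetical 100 km route, for illustration only.
route_saving = saving_over_distance(100, TRADITIONAL_PRICE_PER_KM, UNMANNED_COST_PER_KM)  # 350.0
```

On these numbers the unmanned vehicle's per-kilometer cost is one eighth of the traditional charge, a difference of 87.5%.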

On the hardware side, we self-develop modules and integrate supply chain resources; on the software side, we've evolved from HD maps + LiDAR, to BEV, to end-to-end mapless solutions. This process demands high balance between operations and R&D. Embodied intelligence will likely follow a similar path. But the favorable condition is that the current external environment is more mature than autonomous driving's early days — it's no longer going solo, but ecosystem collaboration. We're already exploring new application scenarios with embodied intelligence companies, trying to offer customers better solutions through an "autonomous driving + embodied" combination.

Wang Yu:

Finally, please each answer three questions: What do you think could be done in collaboration with the university (particularly the Department of Electronic Engineering)? When do you predict the ChatGPT moment and OpenClaw moment for embodied intelligence? What's your company's most important goal this year?

Chen Yilun:

Collaboration with the university: The Department of Electronic Engineering's curriculum and student capabilities are highly aligned with robotics. From my own experience: when developing data collection gloves, knowledge from communication principles and radar signal processing was directly applicable; when designing dexterous hands, the core flexible PCB issue was fundamentally a matter of the department's process technology and system integration expertise. I'm fully committed to supporting the department's embodied intelligence research and will provide 100% cooperation within our company's capabilities.

GPT moment: Achieving fully autonomous operation on single-point tasks this year should be no problem. OpenClaw moment: Different skills start to react with one another, like a chemical reaction, significantly accelerating the expansion of capabilities. My vision is that robots could become an aggregation of multiple top experts' capabilities, with each individual skill far exceeding human level, while possessing a methodology-level general connection mechanism that can continuously generate new skills. This goal will take about two years.

This year's goal: Solve a critical problem that has long remained unresolved in industry. If successful, it will bring high technical satisfaction.

Yao Maoqing:

Collaboration with the university: Embodied intelligence has entered deep waters, involving distributed training communication, network protocols, hardware reliability, and other systemic challenges, which is precisely where the department's students excel. I hope to collaborate with the university on original R&D from low-level joints to dexterous hands, areas where mature supply chain solutions don't yet exist.

GPT moment: It may not follow the simple paradigm of "foundation model + vertical applications." If one day models truly understand the underlying representations of the physical world, that might be defined as the GPT moment.

This year's goal: Continue advancing dataset construction and model iteration.

Wang Qian:

Collaboration with the university: The learning curve in AI is flattening — high school students can now participate in frontier research. The undergraduate stage at the Department of Electronic Engineering is already capable of high-level research work. Step-by-step teaching models may waste talent. I suggest establishing large-scale, industry-collaborated research mechanisms at the undergraduate level.

GPT moment: I maintain the foundation-model-first route. Embodied intelligence represents China's first opportunity in centuries to take a world-leading position from the 0-to-1 stage. I expect the GPT moment in 2-3 years, and product-level milestones may precede foundation-model milestones.

This year's goal: Make our foundation model the world's best. This seemed distant when discussed last year, but is now realistically achievable.

Ding Wenbo:

Collaboration with the university: The technical sophistication and resource abundance of industry now significantly surpass those of universities. Students should be boldly sent to enterprises for training. However, undergraduates' worldview and character formation still need university guidance: completing four years at Tsinghua before entering an enterprise yields different results than direct entry.

OpenClaw moment: The name OpenClaw carries deep meaning: it evokes an invasive alien species capable of limb regeneration. If robots can perceive their own structural damage and autonomously repair or reconfigure themselves, that would be a true milestone. But this goal remains distant. That said, software-defined radio once lowered the barrier for hardware R&D; similarly, software-hardware synergy and mutual respect are a direction this industry needs to push.

This year's goal: Acquire 10 million hours of embodiment-free data, partly from self-development, partly from partners.

Li Ziyi:

Collaboration with the university: We're willing to serve as an edge-side carrier for the department's technical achievements in chips, algorithms, and models. We've already completed the most arduous scenario exploration work and possess scalable edge-side capabilities. We hope to deeply collaborate with the department on talent pipeline development.

GPT moment: The autonomous driving industry has experienced three ups and downs. If asked a year ago, I would have predicted 2029-2030. But DeepSeek's performance this year has been exceptional, with technology iteration speed exceeding expectations. This moment may come significantly earlier.

This year's goal: Continue expanding unmanned vehicle delivery scale, while jointly exploring new scenarios with embodied intelligence companies.