A Conversation with Zhiyuan's Chief Scientist: Calling Embodied Intelligence "the Next LLM" Is Its Biggest Misunderstanding
A necessary bubble.

"Necessary Foam."
By Lili Yu

The embodied intelligence track is currently caught in a peculiar spectacle. On one side, investors like Allen Zhu are making high-profile exits from what they call a bubble. On the other, a sector that many investors had written off by late last year, declaring that "the betting window has closed," keeps being reignited by massive funding rounds. The new money is going not only to newcomers with autonomous driving backgrounds, but also to companies whose valuations had already soared last year. Tencent's latest bet on AgiBot is one of the most eye-catching, and it marks Tencent Investment's first foray into embodied intelligence.
As early as 2023, this company — founded by Taihua Deng, former president of Huawei's computing product line, and "Huawei Genius Boy" Zhihuijun (Peng Zhihui), among others — set a record by completing a 300 million RMB angel round just one month after incorporation. Not stopping there, it reached a $1 billion valuation within months, becoming the fastest embodied intelligence company globally to achieve unicorn status. Among China's top-tier embodied intelligence companies, AgiBot is unquestionably the most audacious and high-profile. The market has never been short of noise around it.
Following the release of its first general-purpose embodied foundation model in March, AgiBot announced a partnership with Physical Intelligence (Pi). The matchmaker happens to be the chief scientist who just joined AgiBot yesterday: Jianlan Luo.
According to available information, Jianlan Luo previously conducted research at Google X and Google DeepMind. During his postdoctoral work at the Berkeley Artificial Intelligence Research Lab (BAIR), he was a core member of the team led by Professor Sergey Levine — a leading figure in deep reinforcement learning and one of Pi's co-founders.
Regarding why he joined AgiBot, as well as various misconceptions about the embodied intelligence sector, Luo shared his thoughts with several media outlets. Below are excerpts from those conversations, edited and compiled by An Yong.
Part 01
The Full-Stack, Hardware-Software-Integrated Route Will Have the Last Laugh
Question 1: Allen Zhu's exit has led many to believe embodied intelligence is already in a massive bubble.
Jianlan Luo: A bubble, at its core, is attention and resources; it is a bet placed ahead of the curve. A flood of capital rushes in, expectations fall short at some point, things cool down, and then perhaps heat up again. This is completely normal.
Every technology paradigm shift goes through this. Autonomous driving is the same: Waymo started in 2016, and it has taken until now for genuinely commercial deployment to appear. Embodied intelligence is even more complex and more systemic, which means it requires a longer period of technical accumulation; breakthroughs won't come from simply stacking compute or models.
Question 2: Is the large model the most critical variable driving embodied intelligence's popularity?
Jianlan Luo: Over-analogizing embodied intelligence to the large model paradigm is the biggest misconception outsiders have about this industry.
There are similarities — some large model techniques can transfer to embodied systems and robotics — but they cannot be simply equated.
For example, an LLM with 50-60% accuracy is still usable, because a human brain sits in the loop: if ChatGPT tells you to drink pesticide, you won't, because you can judge for yourself. But on a robot, that level of accuracy is useless.
Imagine your home robot smashing a cup on your coffee table every three hours, throwing your phone at the window, or a coffee-delivery robot randomly spilling coffee every 20 minutes. It's the same with autonomous driving: the technology is completely different from ten years ago, and success rates are already high, yet people demand even more, because every such failure has physical-world consequences.
So using the large-model cycle as an analogy for embodied intelligence underestimates the unique challenges of manipulation intelligence and action intelligence.
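To make the accuracy point above concrete, here is a minimal, purely illustrative sketch of how per-step success rates compound over a multi-step physical task. The 20-step task and the specific rates are assumptions chosen for illustration, not figures from the interview.

```python
# Illustrative only: how per-step success compounds over a multi-step task,
# assuming steps succeed or fail independently.
def task_success_rate(per_step: float, steps: int) -> float:
    """Probability that all `steps` independent steps succeed."""
    return per_step ** steps

for per_step in (0.6, 0.9, 0.99, 0.999):
    rate = task_success_rate(per_step, 20)  # e.g. a 20-step household chore
    print(f"per-step {per_step:.3f} -> 20-step task succeeds {rate:.4f}")

# per-step 0.600 -> 20-step task succeeds 0.0000   (LLM-grade accuracy is hopeless)
# per-step 0.990 -> 20-step task succeeds 0.8179   (even 99% still fails about 1 task in 5)
```

A chatbot can shrug off a bad token because a human filters it; a robot twenty steps into a task cannot.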
Question 3: As the field evolves through different stages, how will the importance of software (represented by large models) and hardware (involving manufacturing) shift?
Jianlan Luo: Software and hardware are equally important. Currently, software hasn't converged to a single point, hardware hasn't converged either, and the industry has no consensus on how to integrate the two.
Question 4: What are the most critical bottlenecks in software and hardware right now?
Jianlan Luo: On the software side, large models are powerful but still lack long-term memory. Working across tasks, hierarchical control, and real-time feedback all remain difficult problems. Whether to use simulation, and how much real data versus synthetic data, is not settled either. The same goes for whether to use RL, since applying RL in the real world brings challenges in sample efficiency, training stability, and generalization.
Hardware-wise, some high-performance platforms are still costly. Some sensors lack sufficiently fine feedback — tactile sensors, for instance, haven't reached maturity. Reliability also has significant room for improvement.
There are also many robot bodies, solutions, and actuators. I don't think one body will solve all problems going forward. Rather, there will be several relatively standardized bodies for different industries, with corresponding solutions.
Question 5: The data problem seems to be the most controversial — but it's also a classic chicken-and-egg dilemma.
Jianlan Luo: Right, it looks like a circular problem. Without sufficient data, it's hard to deploy robots to the real world. But without deployment, you can't get data.
But imagine this: if 1,000 robots work at Starbucks, making and delivering coffee 24/7, the data returned in one month would exceed the scale of any robot dataset we've seen.
And robots differ from cars in another way. Without 100% confidence, you can't really put cars in the real world — safety requirements are too strict. But robots can start in some closed or semi-closed environments, operating at 70-80% capability, feeding data back to improve the system.
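As a rough, back-of-envelope check on the Starbucks thought experiment above: the fleet size, hours, and days come from the answer, while the episodes-per-hour rate is an assumed figure added here for illustration.

```python
# Back-of-envelope arithmetic for the hypothetical 1,000-robot coffee fleet.
robots = 1_000
hours_per_day = 24
days_per_month = 30
episodes_per_robot_hour = 12  # assumption: one make-and-deliver episode every 5 minutes

robot_hours = robots * hours_per_day * days_per_month  # 720,000 robot-hours
episodes = robot_hours * episodes_per_robot_hour       # 8,640,000 episodes

print(f"{robot_hours:,} robot-hours of real-world operation per month")
print(f"{episodes:,} task episodes per month")
```

Whether that truly exceeds every existing robot dataset is the interviewee's claim, but the order of magnitude shows why fleet deployment is treated as the way out of the chicken-and-egg problem.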
Question 6: Autonomous driving also had many discussions about data problems in its early days.
Jianlan Luo: When autonomous driving started in 2016, there were also many debates due to data scarcity. But now there's too much data — Tesla disclosed 50 billion miles of on-road data last year, and data centers can't even store it all. So the concern shouldn't be whether we have enough data, but what algorithmic designs we need to better connect and utilize that data. Therefore, embodied intelligence companies that control products and ecosystems and have the capability to deploy robots themselves will have significant first-mover advantages.
Question 7: In your view, is the full-stack, hardware-software-integrated route necessary? Some companies just want to focus on the body.
Jianlan Luo: In autonomous driving's early days, there were also companies focused solely on the "brain." But now OEMs are all building their own autonomous driving systems. Ten years ago, when drones were hot, a wave of drone companies emerged in both China and the US. The American companies said they wouldn't do hardware; I recall Intel opened over 20 labs in the US just for drone navigation and the like. Of course, this was partly because the US lacks manufacturing and supply chains, so they could only do the brain. But now you can't remember any of those names, because they no longer exist. The name we do remember is DJI.
While doing only the brain can work with hardware partners, I believe the full-stack route with iterative hardware-software integration will ultimately prevail.
Part 02
If Robots Truly Achieve Manipulation, That's AGI
Question 8: AgiBot already has CTO Zhihuijun, and Yao Maoqing, executive director of AgiBot Robotics Research Institute, also has a technical background. Is there a reporting relationship between you? How will you divide responsibilities?
Jianlan Luo: Internally, we're a relatively flat, highly collaborative team. Zhihuijun has deep expertise in systems engineering. Director Yao oversees strategic direction and the big picture. I'll focus more on pushing the algorithm roadmap forward and integrating external technical ecosystems.
We're in a parallel, complementary relationship, emphasizing consensus-driven, project-oriented collaboration.
Question 9: What was the background for AgiBot's partnership with Pi (Physical Intelligence)?
Jianlan Luo: First, AgiBot and Pi share many aligned philosophies: both emphasize the importance of real data, and both push embodied intelligence toward deployment from the ground up, in a practical way. That's the broad context.
Additionally, Pi was founded by pioneers in embodied intelligence, professors Sergey Levine and Chelsea Finn, and is currently among the best embodied intelligence companies internationally.
Question 10: Among embodied intelligence startups, AgiBot has consistently used an ecosystem approach to building the company, almost like operating a startup with big-company methods. Is this intentional?
Jianlan Luo: We believe the complexity of embodied intelligence far exceeds what any single company can bear. So we emphasize open collaboration: on one hand helping partner companies iterate, and on the other bringing their capabilities into our ecosystem.
Question 11: Why hasn't a star company like OpenAI emerged yet in the embodied intelligence space?
Jianlan Luo: Because the industry hasn't converged on highly deterministic technical solutions, no one has pulled far ahead or gained a dominant voice.
Question 12: People see many cool robot demo videos, but ultimately they're all human teleoperated. How can autonomous decision-making be achieved?
Jianlan Luo: The difference between autonomous decision-making and teleoperation is like thinking you're chatting with ChatGPT, but actually there's another person typing on another computer — completely different things.
The essence is the robot's analysis and modeling of uncertainty, and the conversion of that into executable action chains. For a robot, if an object's position or color changes slightly, the scene no longer matches what it remembers. The generalization capability of this perception-prediction-generation mechanism is the most critical technology.
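A minimal sketch of the distinction being drawn here, with hypothetical names (`robot`, `policy`, and `get_human_command` are placeholders, not AgiBot APIs): in teleoperation the intelligence lives in the human operator, while an autonomous system must close the perception-to-action loop itself at every step, which is exactly where small changes in object position or color test its generalization.

```python
# Sketch only: teleoperation vs. an autonomous policy loop.
# `robot`, `policy`, and `get_human_command` are hypothetical placeholders.
from typing import Any, Callable

def teleoperation_loop(robot: Any, get_human_command: Callable[[Any], Any]) -> None:
    """A human watches the observation stream and supplies every action."""
    while True:
        obs = robot.observe()
        action = get_human_command(obs)  # the decision-making happens in a person
        robot.execute(action)

def autonomous_loop(robot: Any, policy: Callable[[Any], Any]) -> None:
    """A learned policy maps raw observations to actions with no human in the loop."""
    while True:
        obs = robot.observe()    # perception
        action = policy(obs)     # prediction / generation of an executable action
        robot.execute(action)    # physical consequences, so errors are costly
```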
Question 13: Recently, embodied intelligence companies' show-of-strength demos have concentrated on long-horizon, complex tasks, with each company emphasizing different strengths. How do you define long-horizon and complex tasks?
Jianlan Luo: "Long-horizon" is a relatively subjective term. We're more focused on whether a task has complex dependency relationships in sequence, and its generalization capability, rather than absolute conditions like one minute being long-horizon and under one minute being short.
As for complex tasks: Unitree, for instance, focuses more on locomotion, while in manipulation there are still unsolved problems. When a robot hand makes contact with the external world, very complex physical phenomena and models arise. The question is then how, under multimodal, high-dimensional visual input, to complete relatively dexterous tasks while achieving very high success rates.
This has been the most critical challenge in manipulation for 50 years, and we're now attempting some work in this area.
Question 14: Manipulation, that is, the problem of robots manipulating the physical world, is also receiving a great deal of attention at the moment.
Jianlan Luo: If robots truly achieve manipulation, that's AGI. It's a more advanced form of intelligence than LLMs. If human civilization sits at ten on a scale of zero to ten, LLMs are at most a three; if manipulation is achieved, that's at least a seven or eight.
Part 03
Now Is the Best Time to Enter Embodied Intelligence
Question 15: In the pursuit of robot AGI, what interests you most?
Jianlan Luo: How to give this system stronger autonomous learning capability and generalization. Back in 2016, when Google published the first deep robot learning paper, not a single learning-based robot had actually been deployed in the real world. Now it's different.
The embodied intelligence research center we're newly establishing at AgiBot is neither a pure research institution nor a pure engineering deployment institution. It's an intermediate state, hoping to bridge the chain from basic science to technology deployment.
Question 16: Influenced by large models, reinforcement learning is also becoming trendy in embodied intelligence.
Jianlan Luo: Everyone is looking in this direction now, because we have DeepSeek R1 and OpenAI's o1. Robotics has a 50-year history, and while many professors did pioneering work on problems like control stability, my observation over the past decade is that progress in this field always arrives from other fields, such as CV or NLP.
Now there are several waves of people doing embodied intelligence — some from CV, some from learning-based approaches, some from core robotics — and their perspectives all differ.
Question 17: Many large companies, industrial players, and consumer electronics companies are now entering embodied intelligence. What unique advantages do startups like AgiBot have?
Jianlan Luo: Many players entering is actually a positive signal; it means attention is increasing. As the next-generation intelligent terminal, robots are naturally on consumer electronics companies' radar. Those companies have very strong capabilities in user experience, productization, cost control, and supply chain integration.
The advantage of teams like AgiBot lies more in understanding the industry's underlying logic. They may be more vertical and more refined; we may be stronger on the intelligence side. Ultimately, the two directions will converge.
Question 18: What cycle do you think embodied intelligence is currently in, and is it still a good time to enter?
Jianlan Luo: Looking from 2016, I think embodied intelligence has gone through about a decade of exploration — initially it was called robot learning.
I think now is a very exciting time. Within a few years, we'll see some successes in specific scenarios.
Actually, there are 5 million robots deployed in the real world globally right now, but they're all "blind" robots, operating by absolute positioning and doing repetitive, pre-programmed work. As intelligence improves, we are entering the application window for robots.
While those idealized, do-everything robots may take ten years or more to arrive, robots with practical value in specific scenarios and the capability to keep learning will come earlier.
So now is the best time to enter and to break through.

