How Far Is Embodied Intelligence from Reality: Hype, Bubble, Technological Frontier, and Commercialization | FreeS Research

In 2024, embodied intelligence remains one of the most watched tracks in the tech investment landscape. From humanoid robots to intelligent robotic arms, from autonomous driving to smart prosthetics, the concept of "embodied intelligence" seems to be everywhere. Yet when we peel back the layers of buzzwords, a fundamental question persists: How far is this technology actually from real-world application?

## The Hype Cycle: Where Are We Now?

The current wave of embodied intelligence enthusiasm bears the classic hallmarks of a technology hype cycle. Capital is pouring in, media coverage is frenzied, and demos are increasingly spectacular. But history offers cautionary parallels.

Consider the trajectory of autonomous driving. A decade ago, full self-driving was touted as "just around the corner." Today, while significant progress has been made, true L5 autonomy remains elusive. The gap between laboratory demonstration and scalable, reliable deployment proved far wider than anticipated.

Embodied intelligence faces an analogous challenge. A robot performing a backflip or making coffee in a controlled environment is not the same as one that can operate robustly across unpredictable real-world conditions.

FreeS Fund (峰瑞资本) · May 15, 2024

AI veers left, robotics veers right, and embodied intelligence stands at the intersection.

This is the third installment in FreeS Fund's Embodied Intelligence Series.

Not long ago, we spoke with Zhang Wei, founder of LimX Dynamics, about breakthroughs in humanoid robot lower-limb mobility, and with Lian Wenzhao, a researcher at the Chinese Academy of Sciences' Institute of Automation, about progress in intelligent robot upper-limb precision manipulation. From the perspectives of entrepreneur and practitioner, they shared their experiences with embodied intelligence technology development, industrial deployment opportunities, and the capital frenzy.

How do early-stage investors view this embodied intelligence revolution?

This time, we invited two of FreeS Fund's tech investment colleagues, Qianhang Yan and Pengqi Liu, for a conversation. Both invested in embodied intelligence projects over the past year. Interestingly, their backgrounds and perspectives differ slightly: Pengqi Liu comes from electronics and computer science, while Qianhang Yan's expertise is in mechanical engineering and automation.

The emergence of embodied intelligence from imagination into reality owes precisely to breakthroughs in the "soft" side — AI large models — and advances in the "hard" side — mechanical engineering — plus the cross-pollination of the two.

They discussed:

What developmental stages has China's robotics track gone through, and what does its trajectory look like?
Are those viral humanoid robot videos mostly just showmanship, or are they actually close to real-world deployment?
AI large models have been buzzing for over a year now — is robotics the direction where applications can land fastest?
Why do academia and industry generally define embodied intelligence as a three-layer architecture: brain, cerebellum, and body? What developmental trajectories has each layer followed?
How will embodied intelligence transform our lives and industrial landscapes, and what practical challenges does commercial deployment face?

We've edited portions of their discussion into this article, hoping to offer fresh angles for consideration. For the full conversation, head to the Xiaoyuzhou app, Apple Podcasts, or Ximalaya and search for "Gao Neng Liang" (High Energy) to subscribe. If you're an entrepreneur or practitioner in the embodied intelligence space, feel free to reach out to Pengqi Liu (pengqi@freesvc.com) and Qianhang Yan (qianhang@freesvc.com) for further discussion.

Engagement Giveaway: What do you think about the present and future of embodied intelligence? Share your thoughts in the comments.

By 5:00 PM on May 24, the three most thoughtful commenters will receive a FreeS industry research handbook and a copy of What Is ChatGPT Doing... And Why Does It Work?

/ 01 / China's Robotics Track Has Been Through Three Cycles

Pengqi Liu: My background is in electronics and computer science. Previously I focused mainly on software investments. Over the past year, I started investing in the AI large model space, which naturally led me to the embodied intelligence track. Dr. Yan has orthodox mechanical and robotics training — he's been covering robotics at FreeS for a while. How did you start paying attention to this wave of embodied intelligence?

Qianhang Yan: From undergrad through PhD, I studied mechanical engineering. During my doctoral research, I did extensive work on motion planning and trajectory planning, spanning 3D printing, machine tool processing, and robot mobility. Robotics has always been a key investment focus for me.

China's robotics track has been through several cycles. Back in 2013–2014, industrial robot investment first heated up — that's when FreeS invested in Yifei Automation. By 2016–2017, collaborative robots became the new focal point, producing well-known companies like Universal Robots, AUBO, and Changmu Valley Medical, which FreeS backed at an early stage. After the pandemic, around 2022, we started noticing a new trend toward more general-purpose robots.

Across these three waves, we can see a clear evolution:

Initially, attention focused on labor substitution and industrial automation — robots didn't need to work alongside humans.

Later, people started caring about human-robot collaboration, which meant exploring robot adaptability and intelligence. This opened new application scenarios for collaborative robots. For example, if you ask a robot to bring you a water glass and you nudge it mid-task, it can still steadily complete the delivery.

The current wave of humanoid robots and embodied intelligence marks a continued rise in intelligence levels and reflects the gradual maturation of the robotics industry. We previously discussed with Zhang Sai of Yifei Automation that domestic Chinese industrial robots have penetrated roughly one-third of the market. Combined with breakthroughs in AI large models, people began recognizing AI's potential for general-purpose intelligence. This wave is the convergence of these two factors. Since our 2023 investment in LimX Dynamics, we've been closely watching whether robots can achieve overall generalization beyond just generalized leg mobility.

/ 02 / Embodied Intelligence Today Is at a Stage Similar to Early Autonomous Driving

Pengqi Liu: The current wave of robots differs noticeably from the previous generation focused on manufacturing, particularly in human interaction capabilities and broad generalizability.

The videos we see on social media range from Boston Dynamics' showy backflips and jumps, to Tesla announcing it would build robots for factory work, to Stanford's open-source shrimp-cooking robot a few months back — all making people wonder if robots can really enter homes and do chores. Recently, Figure, backed by OpenAI, released a demo showing human-robot interaction like handing over an apple and placing dishes. NVIDIA's showcase of robotics at GTC also drew wide attention.

It seems the third wave of robotics has arrived quickly. Scenes we once considered pure science fiction suddenly don't feel so distant. From your professional perspective, Dr. Yan, how do you view this wave? Are the videos we see mostly just showmanship, or are they actually close to real deployment?

Qianhang Yan: Frankly, the current humanoid robot frenzy was largely sparked by Tesla last year. You may recall that when Boston Dynamics first went viral, the prevailing view was that it was just showmanship with no viable commercial path. Tesla's entry changed that perception, partly because its deep accumulated expertise in autonomous driving gave it a foundation to apply those technologies effectively to humanoid robots.

Tesla has both advanced AI technology and its own auto factories, which makes a commercial robot product a logical next step. So Tesla's involvement suddenly brought serious attention to the robotics concept. And of course, Elon Musk excels at generating buzz with frontier technology.

From an investor's perspective, the ideal expectation is a future where a robot can move autonomously, interact with humans naturally, and flawlessly execute assigned tasks. That's the future everyone anticipates.

I've also spoken with robotics professionals, including traditional robot manufacturers and people working on tactile sensors and other intelligent technologies. Their attitude is relatively conservative; they see something of a bubble in the current frenzy. But everyone agrees that intelligent robots represent the future trend: the technical possibilities have been partially validated, though commercialization remains a long road.

This parallels the development of autonomous driving. Think back over a decade ago, when Google's self-driving cars first hit the roads — everyone hoped they'd reach L4 or L5 automation, which was clearly a distant goal. Today's humanoid robots or embodied intelligence may be at a stage similar to early autonomous driving.

Pengqi Liu: Right. Even though self-driving cars could operate on roads 10 to 20 years ago, they still haven't become ubiquitous. It's just a matter of time — robotics development may follow a similar trajectory.

/ 03 / Robotics Is One of the Directions Where AI Can Be Applied Fastest

Pengqi Liu: Beyond robotics, another closely watched concept is "embodied intelligence." I looked up its definition — in English it's actually called Embodied AI, which literally means "embodied artificial intelligence." The China Computer Federation offers a very technical definition: embodied intelligence is an intelligent system based on physical embodiment for perception and action, capable of interacting with the environment through an agent to acquire information, understand problems, make decisions, and execute actions, thereby producing intelligent behavior and adaptability. From a robotics perspective, this definition essentially adds intelligence and general-purpose capabilities to traditional robots, enabling them to perform intelligent decision-making and generalized tasks.

In my view, embodied intelligent robots are essentially the same concept as general-purpose intelligent robots — it's just that this wave happens to be driven by large language models, so people are calling them embodied intelligent robots. In English, a robot is just called a robot — there's no implication that it must be human-shaped. It's fundamentally an automated mechanical device capable of executing tasks. Do you think embodied intelligence and general-purpose intelligent robots are the same thing?

Qianhang Yan: Why did people gradually realize they needed to build general-purpose intelligent robots? In the earliest days, you just wrote explicit code to execute specific tasks. But this approach turned out to require legions of IT experts constantly modifying code for each new task, an enormously complex process. So people began hoping robots could possess autonomous intelligence and execute diverse tasks.

Today's robotics business model is built around robots as automation nodes — an inherently heavy model. Whether from a technical or business perspective, this is pushing all robotics companies toward general-purpose intelligent machines to achieve productization.

Coming back to embodied intelligence — Embodied AI essentially creates a concrete interface between AI, previously confined to software, and the physical world. Humanoid form is the most immediately obvious manifestation. But I don't think it has to be humanoid. General-purpose intelligent robots are just one subset of embodied intelligence; the embodied form can be varied — it could be a large piece of industrial equipment. Add AI to it, and it becomes an embodied intelligent system. Autonomous driving is fundamentally another example of embodied intelligence.

So overall, the concept of general-purpose intelligent machinery that everyone is ultimately pursuing happens to require AI to achieve, because hardware alone or traditional firmware cannot realize general intelligence. And today, AI large models have proven they possess certain generalization capabilities. Looking back, isn't robotics potentially the fastest domain for AI application? So you could say two directions are converging today — people from traditional robotics are doing AI, and AI researchers are rushing in to do robotics.

/ 04 / Is Humanoid the Best Form for Robots?

Pengqi Liu: From what I'm hearing, whether it's the "person" in "robot" (机器人, literally "machine-person") or the "body" in "embodied intelligence" (具身智能, literally "body-possessing intelligence"), the Chinese terms may mislead people into thinking this refers to humanoid form. But judging from the English definitions and actual industry conditions, robots don't have to be humanoid. So why have Boston Dynamics, earlier Japanese companies, and more recently Tesla, Figure AI, and domestic manufacturers all focused their R&D efforts on humanoid forms? Could you share the development trajectory of humanoid robots and why the industry is so enthusiastic about humanoid form?

Qianhang Yan: The history of humanoid robot development is essentially a process of continuously upgrading control dimensions and control capabilities. Take early Japanese robots: models like ASIMO didn't have torque control and relied mainly on position control, so they walked tentatively, in small steps, on flat ground.

Boston Dynamics was among the first teams to adopt torque feedback for local motion control. Before motor technology matured, they used hydraulic systems to develop the Atlas series of humanoid robots. Now, with improved motor performance — thanks to spillover from the new energy vehicle industry — humanoid robot companies like Figure AI are launching their products, and Tesla is heating up the market.
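To make the contrast between position control and torque control concrete, here is a minimal single-joint sketch. It is an illustration under stated assumptions, not a description of how ASIMO or Atlas is actually controlled: the joint is modeled as a pure inertia, position control is idealized as a very stiff servo loop, torque control as a soft impedance law, and every name and number (`simulate_joint`, `kp`, `kd`, `PUSH`) is hypothetical.

```python
def simulate_joint(q_des, tau_ext, kp, kd, inertia=1.0, dt=0.001, steps=5000):
    """Simulate one joint pushed by a constant external torque.

    The controller commands torque = kp * (q_des - q) - kd * qd.
    Returns the joint angle (rad) after steps * dt seconds.
    """
    q, qd = q_des, 0.0
    for _ in range(steps):
        tau_cmd = kp * (q_des - q) - kd * qd
        qdd = (tau_cmd + tau_ext) / inertia
        qd += qdd * dt  # semi-implicit Euler: update velocity first
        q += qd * dt    # then position, using the new velocity
    return q


PUSH = 2.0  # hypothetical external torque (N*m), e.g. someone nudging the arm

# Stiff gains (idealized position control): the joint barely yields.
stiff_q = simulate_joint(q_des=0.0, tau_ext=PUSH, kp=2000.0, kd=90.0)

# Soft gains (idealized torque/impedance control): the joint gives way
# and settles near PUSH / kp.
compliant_q = simulate_joint(q_des=0.0, tau_ext=PUSH, kp=20.0, kd=9.0)

print(f"stiff deflection:     {stiff_q:.4f} rad")
print(f"compliant deflection: {compliant_q:.4f} rad")
```

In this toy model the stiff joint fights the push while the compliant one absorbs it, which is one way to read why torque-aware control matters once robots leave flat, predictable ground.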

Initially, people's vision for robots was simply replacing humans to complete specific tasks, using machines to execute work. Now that everyone is developing humanoid robots, my personal understanding is that beyond sci-fi appeal and marketing value, from a physical perspective, humanoid robots offer the highest degrees of freedom and strongest generality in hand manipulation capability and foot mobility. Zhang Wei from LimX Dynamics has also discussed this in detail. General intelligent humanoid robots are gradually becoming industry consensus.

Pengqi Liu: In some scenarios, such as manufacturing, you don't necessarily need a humanoid form — you can just build an automated piece of equipment. But in home scenarios, for instance, we need devices and machines better suited for interacting with humans, and much existing infrastructure is designed for human needs, so humanoid form makes more sense.

Qianhang Yan: I once learned about an interesting scenario. Many high-value-added industrial products still require handcrafting, such as Suzhou embroidery. The work is too complex to automate today: it requires both the craftsman's learning ability and extremely fine motor operations. For example, Suzhou embroidery requires craftsmen to split a single silk thread into 20 filaments, then thread a needle and embroider stitch by stitch. This is already very difficult for humans. If robots could take over this work in the future, it would create enormous production value.

Currently, a major limitation of traditional industrial robots is their lack of flexibility — they cannot substitute for humans in many scenarios. Looking ahead, if humanoid robots could do everything humans can do, that would be an ideal state. Therefore, service, production, consumption, and various other scenarios could all become potential application areas for humanoid robots.

/ 05 / Today's Large Models Resemble the "Brain in a Vat" From Philosophy

Qianhang Yan: Looking back, large models have played a key role in this wave of humanoid and embodied intelligence. What kind of large model combined with what kind of robot body do you think will achieve true embodied intelligence?

Pengqi Liu: This is actually what I care about most. My interest in the embodied intelligence track stemmed from large models. The most direct impact of large model technology on general-purpose intelligent robots is that it significantly enhances robots' perception and understanding of the environment. Combined with understanding human language instructions, this enables better task decision-making and decomposition. So large models essentially enhance the robot's "brain" capabilities. However, how much room large models have to improve the robot's "cerebellum" capabilities (planning and control) seems to remain an open question in academia and industry.

Qianhang Yan: Current large models resemble the "brain in a vat" from philosophy: an idealized model that only outputs language or multimodal information, existing independently of any machine or body. What kind of body or form it should connect to in the future to fully realize its general capabilities is what investors and entrepreneurs are currently exploring.

Pengqi Liu: We're already deep into the core technical discussion. From my understanding, whether embodied intelligence or general-purpose intelligent robots, their development mainly comes from the convergence of two industries.

On one hand, hardware: industrialization and commercialization have driven rapid development of mechanical structures, electrification, and sensors for automobiles, robots, and other applications, while also reducing hardware costs and making them more accessible.

On the other hand, software and data: from early computers and PCs to software, then to the internet, massive amounts of data have been generated. Combined with powerful computing, this produced the scaling laws and large models we see today.

This also parallels human and biological evolutionary history. Our bodies continuously adapted to the environment, evolving upright walking and developed brains. When brains became sufficiently developed, we developed unique soft capabilities like fiction and imagination, which further brought about the evolution of language, religion, and culture.

In embodied intelligence, these two lines — hardware and software/data — are converging once again. As an investor, I'm particularly excited. Not only is pure software general intelligence approaching realization under GPT's impetus, but I even feel that embodied intelligence interacting with the physical world is nearly within reach.

/ 06 / Understanding Embodied Intelligence's Three-Layer Framework: Brain, Cerebellum, and Body

Pengqi Liu: From a technical perspective, academia and industry currently define embodied intelligence as a three-layer architecture: brain, cerebellum, and body.

The bottom layer, the hardware or body, is mainly responsible for perceiving the environment and executing specific actions. The top layer, the brain, is the "softer" part and is responsible for interpreting what is perceived and understanding tasks. For example, when receiving a task instruction in natural language, the brain can understand the task and decompose it into multiple steps. This is where large models can deliver maximum value, and it is the function that the OpenAI-Figure collaboration demonstrated.

So how do we connect the soft and hard lines? Mainly through the middle cerebellum layer. Just like us humans — having only a brain and body isn't enough. We also need to perform complex operations like navigation and balance, which involves controlling each joint's movement. These movements aren't actively thought out by our brain; they occur naturally in an involuntary state. This is the cerebellum's function — serving as the middle layer connecting brain and body.
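The three-layer split described above can be sketched as a small pipeline. This is only a toy illustration under loud assumptions: a lookup table stands in for the large model in the brain, fixed joint targets stand in for a real planner in the cerebellum, and all class names, joint names, and numbers are hypothetical rather than taken from any actual robot stack.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Body:
    """Bottom layer: senses the environment and executes joint commands."""
    joint_angles: Dict[str, float] = field(default_factory=dict)

    def perceive(self) -> Dict[str, float]:
        # Hypothetical sensor readout: just the current joint angles.
        return dict(self.joint_angles)

    def execute(self, command: Dict[str, float]) -> None:
        self.joint_angles.update(command)


class Brain:
    """Top layer: decomposes a language instruction into sub-tasks."""

    def __init__(self, task_library: Dict[str, List[str]]):
        # A lookup table stands in for a large model (assumption).
        self.task_library = task_library

    def decompose(self, instruction: str) -> List[str]:
        return list(self.task_library.get(instruction, []))


class Cerebellum:
    """Middle layer: turns each sub-task into joint-level targets."""

    def __init__(self, skills: Dict[str, Dict[str, float]]):
        # Fixed joint targets stand in for a real planner or controller.
        self.skills = skills

    def plan(self, subtask: str, observation: Dict[str, float]) -> Dict[str, float]:
        target = self.skills[subtask]
        # Only command joints that are not already at their target.
        return {j: a for j, a in target.items() if observation.get(j) != a}


def run(instruction: str, brain: Brain, cerebellum: Cerebellum, body: Body) -> List[str]:
    """Brain decomposes, cerebellum plans, body executes; returns steps done."""
    completed = []
    for subtask in brain.decompose(instruction):
        command = cerebellum.plan(subtask, body.perceive())
        body.execute(command)
        completed.append(subtask)
    return completed


brain = Brain({"hand me the apple": ["reach", "grasp", "extend"]})
cerebellum = Cerebellum({
    "reach": {"shoulder": 0.6, "elbow": 1.2},
    "grasp": {"gripper": 1.0},
    "extend": {"shoulder": 0.2, "elbow": 0.3},
})
body = Body({"shoulder": 0.0, "elbow": 0.0, "gripper": 0.0})

steps = run("hand me the apple", brain, cerebellum, body)
print(steps, body.joint_angles)
```

The point of the sketch is the interface: the brain never touches joints, and the body never sees language. Everything in between is the cerebellum's job.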

Let's discuss each of these three layers specifically.

How Evolved Is the Body?

Pengqi Liu: Yan, you come from a robotics hardware background. I'm curious — to achieve general-purpose intelligent robots, how mature is the hardware body currently? Can it already support embodied intelligence development? If not, which technologies need further breakthroughs?

Qianhang Yan: At the hardware layer, we can divide the body into upper and lower halves.

The upper body mainly consists of arms and dexterous hands responsible for manipulation. Arm technology has become relatively mature, having been validated in industrial scenarios for many years. Dexterous hand progress has been somewhat slower; frankly, many scenarios previously didn't require particularly dexterous hands. Human-like dexterous hands remain largely at the research and R&D stage, though I believe people will keep attempting to develop them.

As for the lower body (the robot's legs and feet, including motors, reducers, and other actuators), these are not currently scarce technologies that form hard barriers. During the previous quadruped robot startup wave, someone could even assemble a quadruped robotic dog with parts from Taobao for just 40,000 yuan. But whether that dog could actually walk properly was something the developer had to figure out themselves.

For humanoid or general-purpose intelligent robots, with the emergence of new demands and rapid iteration of China's supply chain, hardware's foundational capabilities can already support some embodied intelligence deployment and validation. Of course, hardware still needs improvement to adapt to new scenario requirements.

For example, hardware impact resistance is a concern. Traditional collaborative or industrial robots typically work in fixed scenarios without much unexpected interference, so impact resistance requirements are low. But when bipedal robots walk in complex environments like the wilderness, they may slip or fall from heights — requiring hardware with greater durability and impact resistance.

Pengqi Liu: So there's a requirement for robustness. Beyond that, what's currently lacking may be dexterous manipulation capability.

Qianhang Yan: Right. Dexterous manipulation hardware is still relatively primitive, because everyone is still at the stage of validating feasibility, without much consideration for commercialization costs and efficiency. Once efficiency becomes a consideration, one immediate issue is that a human needs only three meals to work a full day, whereas a robot working a full day consumes far more energy. So after feasibility is validated, the next step in hardware will be optimizing energy efficiency: for example, which hardware forms improve the ratio of useful work output to energy input.

Pengqi Liu: Beyond the skeletal and muscular hardware, humans also have ears, eyes, and noses to perceive the world. What stage is sensor technology at currently? Is it mature enough to support embodied intelligence?

Qianhang Yan: From a control theory perspective, breakthroughs in control were enabled by the introduction of angular encoders. They first made precise position control possible, which made servo motors the mainstream technology in industry over the past three to four decades.

For robots, visual sensors now allow them to see, but other dimensions of perception, such as tactile and force sensing, have not yet become widespread.

Pengqi Liu: Is it because they're too large or too expensive?

Qianhang Yan: It's related to both cost and size. What the market currently lacks are standardized, more highly integrated sensor solutions.

Pengqi Liu: So overall market demand and volume haven't picked up yet.

Qianhang Yan: Right, so people aren't paying much attention yet. At present, the importance of robot vision has been recognized. Next, if robot tactile and force perception can be further improved, it will greatly help robots achieve intelligence. It's the same with humans: with only limited vision and no sense of force or touch, even an excellent cerebellum and brain won't let you complete tasks that a fully able person could do effortlessly.

Of course, we shouldn't set the bar too high for robot intelligence, demanding they complete tasks while lacking perceptual capabilities, because intelligence and robot embodiment capabilities are complementary. On a capable body, relatively simple intelligence may be sufficient to get the job done.

Pengqi Liu: It sounds like for hardware embodiment and sensors, despite current challenges, the outlook is optimistic. As long as demand increases and given enough time, these problems can basically be solved.

Cerebellum: Robot Control Methods Are Transitioning from Model-Based to Learning-Based

Pengqi Liu: Having covered the body, let's look one level up at the cerebellum. Unlike the brain, which is responsible for thinking, and the tangible, visible body, the cerebellum is a particularly abstract concept and may be harder to grasp. But as the intersection between the virtual and physical worlds, the cerebellum is the critical part where robots ultimately execute planning and control. What developmental trajectory has it gone through?

Qianhang Yan: From the perspective of classical control theory, the core of control is ensuring that a device can precisely accomplish a set goal after receiving a command. Control methods are essentially a process of solving equations, and the key lies in how to solve these equations more accurately.

Unlike traditional industrial robots, which typically repeat tasks in simple scenarios, humanoid robots face complex, changing environments and multi-task demands. Therefore, control methods require higher response frequency and flexibility. If you try to model all control needs comprehensively, the model has a massive number of parameters, making the equations extremely hard to solve. Current model-based control theory instead abstracts simplified models and derives controllers from them that approximately achieve the ideal control behavior. Boston Dynamics has been validating this approach over the past decade or so.

Pengqi Liu: Building this kind of model must require very long-term accumulation.

Qianhang Yan: Yes, and abstracting models is itself demanding. The abstracted model has to approximate real-world scenarios as closely as possible, so that reducing the number of parameters doesn't sacrifice the precision of the solution. This is a process of solving equations in reverse.

From a numerical computation perspective, AI is essentially a function regressor or optimizer. So with sufficient data, can we use AI to accelerate the equation-solving process? Rather than relying on traditional numerical methods, we would train AI models on large amounts of data and let them solve the equations in the forward direction. This approach would greatly improve efficiency in complex scenarios. Therefore, people are now beginning to emphasize using reinforcement learning and imitation learning at the cerebellum layer to achieve more powerful control capabilities.
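The idea of treating AI as a function regressor that replaces iterative equation solving can be illustrated with a deliberately tiny example. Everything here is an assumption made for illustration: the equation `theta + 0.5*sin(theta) = y` is a hypothetical stand-in for a real control equation, Newton's method stands in for "traditional numerical computation", and a two-parameter least-squares fit stands in for a learned model.

```python
import math


def solve_numerically(y, iterations=20):
    """Traditional route: Newton's method on theta + 0.5*sin(theta) = y."""
    theta = y / 1.5  # initial guess from the small-angle approximation
    for _ in range(iterations):
        f = theta + 0.5 * math.sin(theta) - y
        df = 1.0 + 0.5 * math.cos(theta)
        theta -= f / df
    return theta


def fit_regressor(ys, thetas):
    """Least-squares fit of theta ~ w1*y + w3*y**3 (closed-form 2x2 solve)."""
    s_aa = sum(y ** 2 for y in ys)
    s_ab = sum(y ** 4 for y in ys)
    s_bb = sum(y ** 6 for y in ys)
    r_a = sum(y * t for y, t in zip(ys, thetas))
    r_b = sum(y ** 3 * t for y, t in zip(ys, thetas))
    det = s_aa * s_bb - s_ab ** 2
    w1 = (r_a * s_bb - r_b * s_ab) / det
    w3 = (r_b * s_aa - r_a * s_ab) / det
    return w1, w3


# Offline: solve many instances numerically to build a training set.
train_ys = [-1.4 + 2.8 * i / 200 for i in range(201)]
train_thetas = [solve_numerically(y) for y in train_ys]
w1, w3 = fit_regressor(train_ys, train_thetas)


def solve_learned(y):
    """Learned route: one cheap forward evaluation, no iteration."""
    return w1 * y + w3 * y ** 3


for y in (-1.3, -0.4, 0.7, 1.2):
    print(f"y={y:+.1f}  newton={solve_numerically(y):+.4f}  learned={solve_learned(y):+.4f}")
```

The learned route trades a small approximation error for a fixed, iteration-free cost at run time. It is also only trustworthy inside the range covered by the training data, which connects to the generalization concern raised next.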

Another challenge facing model-based control is generalizability. Many models are effective in certain scenarios but not applicable in others.

In the past, AI has already proven its capabilities in natural language processing (NLP) and computer vision (CV). Traditional CV relied on feature engineering, only effective in specific scenarios — generalization was achieved through AI. NLP is the same; original models could only do specific tasks like translation, but now a single large model can solve various problems. So people are also considering whether robot control methods should gradually transition from model-based control to learning-based control.

Imitation Learning vs. Reinforcement Learning

Pengqi Liu: You just mentioned two key terms: reinforcement learning and imitation learning. Both have been discussed a lot recently in articles and videos, and I have some questions. Reinforcement learning first broke into mainstream awareness with Google's AlphaGo, which used deep reinforcement learning to defeat the strongest human player in Go. But Go is a relatively closed scenario with very clear rules, whereas the physical world surrounding robots is complex and diverse. In complex environments, how can we train a good model through reinforcement learning?

Qianhang Yan: For today's legged robots, applying reinforcement learning is a gradual process in which learned control replaces model-based control, first partially and then comprehensively. The process starts from existing high-quality motion control data, which may be generated by model-based control methods. Then an effective reward function is needed to drive the robot through a large number of training iterations. Since the physical scenarios robots actually face are very complex, seeking out those scenarios in the real world during training would be very costly. So, just like driving simulation in autonomous driving, robots can train in simulated environments. For example, you can simulate complex terrain, including stairs, sandy ground, and cement ground, then have the robot train on these terrains using high-quality data, which accelerates model training.
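As a rough illustration of "a reward function driving many iterations in a simulated terrain", here is a toy tabular Q-learning sketch. It is far simpler than anything used on a real legged robot, and all of it is assumed for illustration: the one-dimensional terrain layout, the per-terrain effort costs, the reward values, and the hyperparameters are hypothetical.

```python
import random

# Hypothetical 1-D simulated terrain; the last cell is the goal.
TERRAIN = ["cement", "sand", "stairs", "cement", "goal"]
STEP_COST = {"cement": 0.01, "sand": 0.05, "stairs": 0.10}  # assumed effort costs
ACTIONS = (-1, +1)  # step backward, step forward
GOAL = len(TERRAIN) - 1


def step(state, action):
    """Simulator: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)
    if next_state == GOAL:
        return next_state, 1.0, True  # reward for reaching the goal
    # Otherwise pay an effort penalty that depends on the terrain entered.
    return next_state, -STEP_COST[TERRAIN[next_state]], False


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in TERRAIN]
    for _episode in range(episodes):
        state = 0
        for _t in range(50):  # cap episode length
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)  # explore, or break a tie at random
            else:
                a = 0 if q[state][0] > q[state][1] else 1  # exploit
            next_state, reward, done = step(state, ACTIONS[a])
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


q_table = train()
policy = ["forward" if q[1] > q[0] else "backward" for q in q_table[:GOAL]]
print(policy)
```

Nothing tells the agent to walk forward. That behavior emerges from the reward signal alone, which is why reward design and simulator fidelity carry so much of the engineering weight.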

To truly get a very good model, it's actually the result of accumulated engineering experience from many aspects. It's not solely dependent on how good the simulation environment is, how strong the reinforcement learning is, or how good the data is. All of these are indispensable.

Pengqi Liu: So reinforcement learning is more of a process of learning through practice, and simulation environments matter a great deal for training a good reinforcement learning model.

I've also done some research on imitation learning. Imitation learning is essentially a form of supervised learning. Take badminton as an example: a human demonstrates the motion, the data is fed to the robot, and the robot tries to reproduce the human's motion as closely as possible in its own execution. This process necessarily requires large amounts of data; without enough of it, the model can't generalize. So I think the challenge here is how to collect enough data for robots to learn by imitation.
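The framing of imitation learning as supervised learning can be sketched with the simplest possible case, often called behavior cloning: record (observation, action) pairs from a demonstrator, then fit a mapping to them. The expert below is a hypothetical proportional rule rather than a human, and the scalar "error" observation and the gain of 0.8 are assumptions chosen only to keep the example small.

```python
def expert_policy(error):
    """Stand-in for the human demonstrator: a proportional correction."""
    return 0.8 * error  # hypothetical expert gain


# Step 1: record demonstrations as (observation, action) pairs.
observations = [-1.0 + 0.1 * i for i in range(21)]
actions = [expert_policy(obs) for obs in observations]


def fit_linear_policy(xs, ys):
    """Step 2: supervised learning, here closed-form least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gain = cov_xy / var_x
    bias = mean_y - gain * mean_x
    return gain, bias


gain, bias = fit_linear_policy(observations, actions)


def cloned_policy(error):
    """Step 3: the robot acts using the fitted mapping."""
    return gain * error + bias


print(f"cloned gain={gain:.3f}, bias={bias:.3f}")
```

The clone recovers what the expert did, but nothing in the fit captures why the expert did it. It is also only as good as the range of situations the demonstrations cover.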

Qianhang Yan: High-quality data from robots actually interacting with the physical world is very difficult to obtain, which is why the idea of having humans teach robots in order to accumulate data came about. Synthetic data can also be used. For example, we can take stir-frying motions from cooking, make some adjustments to them, and use AI to synthesize new video data as a supplement.

Imitation learning currently faces some problems that are difficult to solve in the short term. These problems are not directly related to imitation learning itself, but are limited by the robot's own capabilities. First is whether the robot can effectively utilize the data provided by imitation learning. Second, from an intelligence perspective, whether knowledge obtained through imitation learning can be decomposed and internalized. These two problems may not be solvable by imitation learning methods alone.

What consequences do these problems bring? Current imitation learning can teach robots to copy movements, but it can't make robots understand the logic and decision-making behind the actions. In other words, robots can only imitate human movement trajectories and control parameters, which is a relatively elementary form of imitation learning. By contrast, when a coach teaches you hands-on how to swing a racket, what you remember is not just the trajectory of the swing, but also the technique of how to guide your body and hand to generate force.

Current robots still struggle to learn these things; they cannot make effective use of this data. The situation resembles the NLP field before Transformers emerged, when people didn't know how to encode text effectively and train models that generalize.

Another problem is that what imitation learning acquires is limited by the robot's perceptual capabilities. For example, having a robot fold clothes — it can only imitate the movement, but may struggle to recognize details like fabric material. These are limitations of teleoperation.

Finally, from the perspective of human movement, the first step of imitation learning actually doesn't go to the cerebellum, but to the brain. The brain needs to first decompose these movements, then internalize them and pass them to the cerebellum as training data. This process may still require technical iteration and more thinking from everyone. I've been wondering: at what granularity of tasks should imitation learning operate? If learning complex human tasks, like carrying a cup of water and then adding sugar, this is difficult for imitation learning to generalize. To achieve generalization, imitation needs to happen at a reasonable granularity.

Liu Pengqi: From your perspective, what's currently called imitation learning isn't really imitation learning. For example, teaching a robot to carry a cup — this one action might need to be repeated 100 times, while humans might learn it through language description or a single demonstration. Current methods are mostly supervised data-driven approaches. But in the long term, how can we achieve learning with as little data as possible? Combined with the brain capabilities you mentioned earlier, how do we make robots not just know that, but also know why? This may be the future direction of development, and also a key breakthrough point.

Where Is the Boundary Between Brain and Cerebellum?

Yan Qianhang: I'd like to add something — I've been thinking about where the boundary between brain and cerebellum lies. Through evolution, human brain and cerebellum functions have become quite clearly distinguished. However, current robot cerebellum and brain capabilities are still relatively elementary, and sometimes the boundary isn't clear.

One idea I learned from a portfolio company is to turn the cerebellum into an AI-driven foundation model. It doesn't directly solve core tasks; rather, it acts like an underlying operating system, so that when the brain issues specific tasks and commands, the cerebellum has sufficient capability to execute them.

Therefore, future embodied intelligence may focus more on how tasks from the brain can better utilize the cerebellum's robot foundation model to execute specific tasks well.

Liu Pengqi: Regarding the boundary between brain and cerebellum, our consensus should be that at the brain level — that is, robot perception of the environment, understanding of tasks, and decision-making decomposition — things are going fairly well. We've already seen this in demo videos from companies like OpenAI, Tesla, and Google.

We've also seen that academia and industry don't seem satisfied with using large models only as brains; they're starting to ask whether large models can take on some cerebellum capabilities as well. For example, Google's smaller RT-1 model and larger RT-2 model both attempt to put vision, language, and action trajectories into a single model for end-to-end training, then directly execute downstream tasks. Does this mean we don't need to discuss reinforcement learning and imitation learning at all, that we just feed all collectable data directly to a large model and get end-to-end results out? Is this a future trend?

Yan Qianhang: My personal view is that Google's RT-X series has indeed proven the feasibility of end-to-end approaches: using large models to drive behavior is viable. But in practical use, because the large model has to run inference on trajectory data at every step, its output frequency is very low, which can cause jerky, stuttering motion. Keep in mind that control systems typically need control frequencies above 100 hertz to keep motion continuous and uninterrupted throughout the process. While large models may help the cerebellum develop capabilities for new tasks in the early stages, in the long run the brain and cerebellum will likely still need to remain separate. This is like how children learn to walk: they need to carefully observe the ground and take cautious, deliberate steps. When babies walk, their brains are highly engaged, learning how to place their feet and observing the results of each footfall. But for adults, walking only requires the brain to plan the route; the precision and force of each step gradually become muscle memory.
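The frequency argument can be sketched as a two-rate loop in Python. This is a toy illustration under stated assumptions, not a description of RT-X or of any real robot stack: the 5 Hz planner rate, the proportional controller, and the names `slow_planner`, `fast_controller`, and `simulate` are all hypothetical. Only the 100 Hz figure comes from the discussion.

```python
CONTROL_HZ = 100  # fast loop rate mentioned in the discussion
PLANNER_HZ = 5    # assumed rate of a slow, large-model planner (illustrative only)
STEPS_PER_PLAN = CONTROL_HZ // PLANNER_HZ

def slow_planner(t):
    """Stand-in for the 'brain': returns a target position for time t (seconds)."""
    return 1.0 if t < 1.0 else 2.0

def fast_controller(position, target, gain=0.2):
    """Stand-in for the 'cerebellum': one proportional step toward the target."""
    return position + gain * (target - position)

def simulate(seconds=2.0):
    """Run the fast loop every tick and consult the planner only every 20th tick."""
    position, target = 0.0, 0.0
    trajectory = []
    planner_calls = 0
    for step in range(int(seconds * CONTROL_HZ)):
        if step % STEPS_PER_PLAN == 0:
            target = slow_planner(step / CONTROL_HZ)
            planner_calls += 1
        position = fast_controller(position, target)
        trajectory.append(position)
    return trajectory, planner_calls

trajectory, planner_calls = simulate()
print(f"control steps: {len(trajectory)}, planner calls: {planner_calls}")
```

The design choice here mirrors the separation argued for above: the slow component only sets goals, while the fast component keeps the motion smooth between planner updates.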

Liu Pengqi: So could we understand it this way — in the future, the robot's brain will take on the learning role. Once it has learned something, the parameters and models it has acquired can be distilled down to a smaller model. This smaller model might not need much power or much reasoning capability; it can automatically execute the actions it has already learned, similar to the relationship between training and inference.
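The idea of distilling what a large model has learned into a smaller one can be illustrated with a deliberately simple Python sketch. Everything in it is an assumption made for illustration: the `teacher_policy` function stands in for a large, slow model, and the "student" is just a degree-5 polynomial fitted to the teacher's outputs. Real distillation would train a small neural network instead.

```python
import numpy as np

def teacher_policy(x):
    """Stand-in for a large, slow model mapping a sensor reading to an action."""
    return np.tanh(x) + 0.1 * x

def distill(teacher, degree=5, n_samples=200):
    """Fit a small polynomial 'student' to the teacher's outputs (no human labels)."""
    xs = np.linspace(-1.0, 1.0, n_samples)
    coeffs = np.polyfit(xs, teacher(xs), degree)
    return np.poly1d(coeffs)

student = distill(teacher_policy)

# Compare student and teacher on inputs inside the range the student was trained on.
xs_test = np.linspace(-1.0, 1.0, 51)
max_gap = float(np.max(np.abs(student(xs_test) - teacher_policy(xs_test))))
print(f"largest student-teacher gap on [-1, 1]: {max_gap:.5f}")
```

The student only matches the teacher on the range it was fitted on, which loosely parallels the point that a distilled model executes what has already been learned rather than reasoning about new situations.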

Yan Qianhang: Ultimately, embodied intelligence will likely move in this direction, because no one demands that robots be all-around athletic prodigies; everyone is training intensively in one specific direction. The essence of athletic training is likewise to use repetitive muscular movement to consolidate what the brain keeps reinforcing into the cerebellum, forming muscular instincts. The goal of training is to enable the body to react instinctively before it even perceives or sees something. From this perspective, the robot's brain will indeed be very helpful for learning in the future. But in actual deployment, some tasks will require a reasonably capable cerebellum, or a better-suited architecture. What everyone sees today with the RT-2 model is that it can learn but cannot yet be put to practical use, which is a rather thorny problem.

Practical Considerations for Commercializing Embodied Intelligence

Liu Pengqi: We've had a thorough discussion of the technology. Let's close by talking about commercialization and investment-related questions. Most of the robotics companies you've looked at previously have landed in manufacturing and service scenarios. For this wave of embodied intelligence and general-purpose intelligent robots, do you think we need to consider their near-term commercialization, or can we just focus on the future for now? If we do need to consider commercialization, where do the potential challenges lie?

Yan Qianhang: If we want these types of robotics companies to commercialize, we should probably follow industry-standard practice. That is, first push R&D as far as possible, then gradually break the technology down and scale it down into commercial applications. This is a top-down technology transfer process.

Liu Pengqi: In the past, countries including the United States concentrated resources on large-scale high-tech projects such as the space shuttle and rockets. Regardless of whether these projects were themselves profitable, they generated many smaller technologies that could be commercialized for civilian use, and those civilian applications could in turn drive progress in other industries. In developing humanoid robots, I believe many modules can likewise be spun out; dexterous hands, for example, can play a role in certain specific scenarios.

Yan Qianhang: From an investment perspective, we certainly hope our portfolio companies can do both research and commercialization. If a project team can deliver robots as products, that demonstrates some capacity for commercial deployment. Robot vacuums, for example, have been very successful because they turned themselves into consumer electronics products. If this wave of embodied intelligent robots can find productizable directions in small scenarios and solve the generalization problem, they can achieve some degree of commercialization. The challenge is that market expectations for a commercial product are typically very high: you need to score at least 80 out of 100. Even if you've proven your generalizability, commercialization remains difficult unless you've opened a qualitative gap over competitors.

I've asked robotics entrepreneurs working in a range of scenarios whether the emergence of GPT Vision and visual foundation models like SAM (Segment Anything Model) would have any impact on their industrial robots. Most told me that customers won't pay extra for this.

So the commercialization of new technology is still progressing relatively slowly. You must find scenarios with very clear product demand, which need not mean commercializing a pure humanoid robot; it could be a composite robot. For example, a traditional, already mature AGV (automated guided vehicle) fitted with two arms can grasp and sort in different scenarios. Today you deploy it in a factory for loading and unloading metal parts; tomorrow you put it in a logistics setting to pick up packages. These are capabilities that general intelligence should possess. If this can be achieved, the commercial prospects are enormous. In the near term the constraint is still that the technology isn't mature enough, and expectations for the future will depend on the speed of technological progress.

Liu Pengqi: From what I'm hearing, we still need to keep technology R&D at a high level, but we can scale down to scenarios where products can actually be deployed. Autonomous driving, for example, has been under development for many years and still hasn't been fully commercialized, but it has spawned many technologies and products for specific scenarios, such as household robot vacuums and factory AGVs; these are scaled-down applications of autonomous driving technology. I believe embodied intelligence will see similar scenarios emerge. But the biggest challenge is still what I heard Lian Wenzhao mention before: how to balance accuracy, execution speed, and generalizability.

Yan Qianhang: I very much agree with this. When this wave of embodied intelligent robotics companies searches for commercialization paths, their advantage still lies in generalizability. They don't need to compete with traditional specialized robots on precision and speed. What's more viable is finding scenarios where generalizability offers high substitution value — such as interaction with people, services for people, including elder care, childcare, and other home care scenarios. In these scenarios, humanoid robots actually have an advantage, because they fundamentally don't need to achieve operations precise to within one millimeter — many human operations don't reach that precision either.

Liu Pengqi: In other words, for traditional industrial robotics companies, including autonomous driving companies, it's not so easy for them to pivot toward embodied intelligence either? I've heard of companies planning to first make a robot vacuum, and perhaps in the future this vacuum can grow two robotic arms, then grow more sensors, thus evolving into a home service robot. How likely do you think this is?

Yan Qianhang: This comes back to a question: should technological transformation first capture the market, or first capture the technological frontier? For now, I believe that on the consumer side, robotics companies probably still need to stand at the technological frontier, because market loyalty may not be that high. Once a company establishes itself at the frontier and launches genuinely interesting, intelligent robot companion products, you as a consumer won't necessarily choose a home companion robot from a robot vacuum brand; you won't have that inertia.

Liu Pengqi: Especially given how powerful our domestic supply chain capabilities are, the gap in hardware manufacturing and product capability is probably relatively easy to close.

How to Invest in Embodied Intelligence?

Liu Pengqi: Returning to investment — across the entire embodied intelligence track, which directions of companies do you focus on, and when selecting teams, what capabilities do you prioritize?

Yan Qianhang: I usually look at hardware quite a bit. Recently I've been paying more attention to the dexterous hand direction — I'm curious whether there will be new drive forms or new technical implementation methods. Because hands have so many degrees of freedom and joints, it's particularly difficult for traditional electric drives to achieve a certain level of complexity and precision.

Another direction I think has opportunity is hardware sensors. I'm also thinking about what driving factors could promote large-scale sensor application in the robotics industry.

On the embodied intelligence or AI side, I also explore new investment opportunities across three directions: data, models, and scenarios.

Liu Pengqi: Understood. I've looked at AI quite a bit before. My focus is on the cerebellum as the core, which is also the key problem that hasn't been solved well yet. With the cerebellum at the center, looking upstream, I pay attention to the hardware the cerebellum commands and the sensors that feed it data; looking downstream, I focus on how to apply large model capabilities more extensively to the cerebellum to give it stronger generalization.

Drawing on my previous experience investing in software, I think robotics companies need to pay special attention to getting products deployed, so that they can close the data loop and iterate on the technology. Tesla is a great example: it first sold cars, the cars carried sensors that automatically collected data, and that data let it iterate on the technology.

Additionally, the entire embodied intelligence track is quite competitive right now — competing for capital, competing for talent — which also places very high demands on companies' fundraising capabilities.

Today's discussion has been quite thorough, covering industry status, underlying technology, as well as commercialization and investment opportunities. I hope it has provided some inspiration, and I welcome more in-depth discussions with everyone. Thank you!
