Li Feng in Conversation with LimX Dynamics Founder Zhang Wei: Humanoid? Robot? | FreeS Fund Dialogues

FreeS Fund · April 6, 2024

All problems in the world are computational problems.

A bipedal robot hikes through mountain terrain: descending gravel slopes, climbing grassy inclines, and crossing drainage ditches. The ground is complex, yet the robot handles it with ease. There are close calls: it nearly trips on a dirt mound and takes deliberate hits from human testers. Yet it passes every test, showing remarkable control and stability. These scenes come from a demo video recently released by LimX Dynamics. The video sparked widespread discussion online, fueling curiosity about robotics technology and imagination about its real-world applications.

This conversation grew out of that viral video. Our guest is Dr. Wei Zhang, founder of LimX Dynamics. Zhang established the general-purpose legged robotics company in 2022 and is a tenured professor at Southern University of Science and Technology. Before founding LimX Dynamics, he was a tenured professor at Ohio State University and a postdoctoral researcher at UC Berkeley. He holds a Ph.D. in electrical and computer engineering from Purdue University and completed his undergraduate studies in automation at the University of Science and Technology of China.

Li Feng and Professor Zhang discussed:

  • How China's first bipedal robot to successfully hike in the wild was built, and what cutting-edge "black tech" made it possible
  • How the difficulty of AlphaGo defeating Go world champion Lee Sedol in 2016 compares with that of a bipedal robot walking in the wild in 2024
  • What machine learning and imitation learning actually mean
  • Whether humanoid design is necessary given that humans aren't perfect
  • Since humanoid robots are ultimately meant to take over human movement, where do current humanoid robots stand in locomotion and manipulation, and what challenges remain?
  • How the humanoid robotics industry might develop in China and the US

We hope this offers fresh perspectives and insights. This is also the first installment of FreeS Fund's Embodied Intelligence Series. In this episode, Dr. Zhang focuses mainly on humanoid robot locomotion and whole-body motion control. A future episode will delve into upper-body and fine manipulation topics. Beyond the entrepreneurial perspective, two FreeS investment colleagues will also host a discussion on embodied intelligence. You're welcome to search for and subscribe to "High Energy" (高能量) on Xiaoyuzhou, Apple Podcasts, or Ximalaya to listen to our program, or return to the FreeS Fund WeChat account to read the full transcript.

Engagement Giveaway

What innovation opportunities do you see in the embodied intelligence space? What are your expectations for humanoid robots? The authors of the 5 most thoughtful comments posted by 17:00 on April 10 will receive copies of The Third Chimpanzee and Sapiens: A Brief History of Humankind.

/ 01 / Could Bipedal Robots Replace China's National Soccer Team?

Li Feng: LimX Dynamics recently released a bipedal robot demo that generated tremendous buzz. Many media outlets, including Xinhua and Cankao Xiaoxi, reposted it. What's the view count at now?

Wei Zhang: It had already surpassed 13 million within a week of release.

Li Feng: And countless likes, with tons of comments. Many people said this was remarkable Chinese technology they'd never seen before. One highly upvoted comment was quite interesting: when could this robot replace China's national soccer team? Let's briefly discuss: could it replace professional athletes, such as soccer players or mountaineers?

Wei Zhang: I think there's hope for the future.

Li Feng: When I shared the video, I wrote, "It looks a bit goofy and pitiful, but still very impressive compared to its peers."

Wei Zhang: Haha, what's goofy about it?

Li Feng: Its gait is still somewhat stumbling, though even when it's about to fall, it can recover itself. Where does this robot stand globally in this direction?

Wei Zhang: First, it's a bipedal robot, and bipedal robots are generally considered among the hardest types to control. Currently, most bipedal walking remains confined to controlled laboratory environments. Of course, in the lab, robots can perform flashier moves than walking, like Boston Dynamics' humanoid robot Atlas doing backflips. But bipedal robots walking in truly open, wild environments are still rare worldwide.

Li Feng: Besides LimX Dynamics, has anyone else released this type of video?

Wei Zhang: I haven't seen any similar videos so far. That said, as AI technology develops, other teams will gradually be able to achieve this too.

/ 02 / "We've Found the 'Switch' for Humanoid Robot Locomotion"

▲ LimX Dynamics bipedal robot P1 undergoing zero-shot, unprotected, fully open testing in a wild forest

Li Feng: Looking back at the video, we can examine some technical details.

First, it has two legs. The balance control difficulty for bipedal robots is significantly higher than for quadrupedal ones. Especially when the robot stumbles — if one foot misses its landing, there's no third or fourth leg to help stabilize it. The other foot must react instantly, with no time to think, or a fall is almost certain.

Second, this robot's feet are much smaller than human feet. They're point feet, making balance difficult because both contact area and force distribution are constrained.

Third, the robot faced complex terrain. Ground hardness is unpredictable, with hidden potholes, and slope and elevation changes that are hard to discern. Even with visual sensors (though the bipedal robot P1 in this video didn't use vision), misjudgments easily occur — like when people step into holes or trip in forests because fallen leaves obscure the ground. In these situations, humans can use their hands to recover balance, but this robot could only rely on its two feet for real-time adjustment. So we see it stumble and nearly fall.

Then there's the deliberate interference, pushing and hitting done for testing purposes, all of which destabilizes the robot. Beyond these, what other interesting details or difficulties might viewers easily miss?

Wei Zhang: You've covered almost everything. I'll add one point: this bipedal robot's footless design was actually a method we used to test our algorithms. Having no foot is like walking on stilts. Imagine humans walking on stilts through mountains — our locomotion capability probably wouldn't surpass this robot's. I'd say this robot has already achieved human-level basic locomotion capability. While we can't guarantee it will handle every complex scenario you mentioned without any mistakes, this is already a qualified starting point, and I believe subsequent progress will be very rapid.
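Li Feng's point about point feet, and Zhang's stilts comparison, can be made concrete with the textbook static-stability test: a robot standing still is statically stable only if the ground projection of its center of mass (CoM) lies inside the support polygon spanned by its contact points. The minimal sketch below checks that condition. The foot dimensions and CoM positions are made-up numbers, and this is a generic illustration rather than anything from LimX's software.

```python
# Static-stability check: does the ground projection of the center of mass (CoM)
# lie inside the convex support polygon spanned by the contact points?
# Simplified illustration; all dimensions are hypothetical (meters).

def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive when o -> a -> b turns counter-clockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def statically_stable(com_xy, support_polygon):
    """support_polygon: vertices of a convex polygon in counter-clockwise order."""
    if len(support_polygon) < 3:
        # A point or line contact has zero area: the CoM would have to sit exactly on it,
        # so in practice the robot cannot stand still and must balance dynamically.
        return False
    n = len(support_polygon)
    # Inside (or on the boundary) if the CoM is on the left of every edge.
    return all(
        cross(support_polygon[i], support_polygon[(i + 1) % n], com_xy) >= 0
        for i in range(n)
    )

# A flat, human-like foot (10 cm x 25 cm) versus point feet.
flat_foot = [(-0.05, -0.10), (0.05, -0.10), (0.05, 0.15), (-0.05, 0.15)]
point_foot = [(0.0, 0.0)]
two_point_feet = [(0.0, -0.1), (0.0, 0.1)]  # double support spans only a line segment

print(statically_stable((0.0, 0.0), flat_foot))       # True: CoM over the foot
print(statically_stable((0.10, 0.0), flat_foot))      # False: CoM outside the foot
print(statically_stable((0.0, 0.0), point_foot))      # False
print(statically_stable((0.0, 0.0), two_point_feet))  # False
```

A flat foot gives an area to keep the CoM over; point feet give zero area even with both feet down, which is why a point-foot biped has to keep balancing dynamically, stepping to stay up, rather than simply standing still.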

Li Feng: Right. There were some interesting comments under the video: if robots could explore unknown environments such as earthquake sites, coal mine shafts, or scenes of collapses and mudslides, it would be great if they could go in to rescue people or deliver supplies. Does it currently have this capability?

Wei Zhang: From a locomotion perspective, I believe the technical validation has basically passed. However, going into complex environments to perform grasping, carrying, and other tasks involving upper-body fine manipulation still needs time to iterate and improve. But overall, this is something people can reasonably expect — it's feasible, and the pace of future technological progress will be quite fast.

Li Feng: So in the near future, at least in our daily lives, we might see humanoid robots delivering things indoors.

Wei Zhang: As long as it's not too heavy, it shouldn't be a problem. Many technical challenges are like the paper covering an old-fashioned window: once you poke through it, subsequent development becomes predictable. Frontier exploration, or tackling key problems, is like groping in the dark: you can't see anything, and it feels very difficult. Then suddenly you find that critical "switch," and everything becomes simple. In terms of robot locomotion, I believe we've now found such a "switch."

/ 03 / What's Hard About Bipedal Robots Hiking Freely in the Wild, and How Was It Achieved?

Li Feng: Could the demo we released have been produced a year or a year and a half earlier?

Wei Zhang: It would have been difficult, because it required several factors to come together once conditions were ripe; no single effort could have achieved it on its own. Some of these factors are within our control, some outside it. Let me list a few:

  • Maturation of AI infrastructure: Though not directly related to humanoid robots, the development of AI toolchains and infrastructure has profound indirect effects on the humanoid robotics field. I can make an analogy: the rise of large models like ChatGPT has greatly accelerated AI infrastructure construction, equivalent to building highways for many industries. Our robotics industry can use these highways to transport our goods more conveniently. This is a major transformation that has become increasingly mature in recent years.
  • Maturation of robot hardware: Hardware is the foundation of robotics technology. Its maturity directly affects robot performance, application scope, and much else. As hardware has developed in recent years, the critical "switches" in various technical directions have gradually been found, which is why walking robots have started to emerge over the past couple of years.
  • Breakthroughs in reinforcement learning: Reinforcement learning is one of the core technologies for intelligent robot control, and the discovery of its technical "switch" also occurred roughly within the past year.

Li Feng: Could you explain what reinforcement learning is?

Wei Zhang: Reinforcement learning is a branch of machine learning, and today it is usually combined with deep learning. Fundamentally, these methods all do one thing: transform goals we understand intuitively in real life into mathematical loss functions or reward functions.

In simple terms, you set a goal (such as keeping a robot walking upright without falling), then translate that goal into a numerical metric (positive rewards for staying upright, negative penalties for falling). Next, we use a neural network to represent how to achieve this goal, that is, the policy, and optimize it. Reinforcement learning can be viewed as a special form of machine learning built on a mathematical framework called the Markov Decision Process, which models sequential decision-making in a dynamic system. Through the neural network, a humanoid robot learns to choose actions based on its current state so as to maximize cumulative reward. Specifically, the network's parameters are updated according to the reward signals received, so that it selects better actions in subsequent steps.
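The recipe Zhang describes (state a goal, turn it into a reward, optimize a policy against it) can be seen end to end on a deliberately tiny problem. In this hypothetical toy, the robot's only state is an integer tilt from -2 to 2; gravity tips it one step further each tick unless it pushes back, and reaching a tilt of 3 counts as a fall. States, actions, transitions, and rewards together form a miniature Markov Decision Process. Tabular Q-learning, a classic RL algorithm swapped in here for the neural-network policies used on real robots, recovers a "push back toward upright" policy from the reward signal alone. The dynamics, reward values, and hyperparameters are illustrative assumptions.

```python
import random

ACTIONS = (-1, 0, 1)   # push back one way, do nothing, push the other way
TILTS = range(-2, 3)   # upright band; |tilt| >= 3 counts as a fall

def step(tilt, action):
    """Toy dynamics (assumed): gravity drifts the tilt away from 0, the action counteracts it."""
    drift = (tilt > 0) - (tilt < 0)
    new_tilt = tilt + drift + action
    if abs(new_tilt) >= 3:
        return new_tilt, -10.0, True                   # fell over: big penalty, episode ends
    return new_tilt, 1.0 - 0.2 * abs(new_tilt), False  # upright: more reward the more vertical

def train(episodes=3000, alpha=0.5, gamma=0.9, epsilon=0.3, max_steps=30, seed=0):
    """Tabular Q-learning: learn action values purely from reward signals."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in TILTS for a in ACTIONS}
    for _ in range(episodes):
        tilt = rng.choice(list(TILTS))
        for _ in range(max_steps):
            if rng.random() < epsilon:                 # explore occasionally
                action = rng.choice(ACTIONS)
            else:                                      # otherwise act greedily
                action = max(ACTIONS, key=lambda a: q[(tilt, a)])
            new_tilt, reward, done = step(tilt, action)
            target = reward if done else reward + gamma * max(q[(new_tilt, a)] for a in ACTIONS)
            q[(tilt, action)] += alpha * (target - q[(tilt, action)])
            if done:
                break
            tilt = new_tilt
    return q

def greedy_policy(q):
    """Best learned action for each tilt."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in TILTS}

print(greedy_policy(train()))
```

Nothing in the code says "lean against the fall"; that behavior emerges because it maximizes cumulative reward. Real systems replace the lookup table with a neural network, because a walking robot's state (joint angles, velocities, body orientation) is far too large to tabulate.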

Li Feng: When we discussed reinforcement learning and imitation learning internally, someone made an analogy: the whole process is like a baby learning to walk. When a child watches adults — say, mom and dad — walking, that's imitation learning. Then stumbling and falling on their butt is like a penalty, prompting the child to correct and adjust their gait; while successfully walking to a destination and getting a hug and reward from mom and dad, whether a lollipop or a toy, reinforces the memory of the correct walking pattern. Reinforcement learning is similar to how a child learns to walk through this process of falling on their butt and getting rewards.

Wei Zhang: That's a vivid analogy. With the advancement of deep learning technology and computing resources, you can imagine thousands upon thousands of "children" walking and learning simultaneously — the learning efficiency becomes much higher.

Li Feng: Or rather, it's like one child having thousands of clones, simultaneously learning to walk through falling (on their butt) and reaching destinations (getting lollipops), then merging the results from all these clones' learning. So this child learns to walk and run much faster.

Wei Zhang: Exactly.
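Li Feng's "thousands of clones" image maps onto how RL training is commonly scaled: many copies of the environment run at once and their experience is merged. In this minimal sketch, each clone is a hypothetical toy robot buffeted by random pushes; every clone gets its own random seed, and the merge step is simply averaging their returns to score a policy. Python threads merely stand in for the massively parallel simulators used in practice, and the environment, rewards, and policies are all invented for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor

MAX_STEPS = 50

def run_clone(policy, seed):
    """One 'clone' walks for up to MAX_STEPS while random pushes knock it about."""
    rng = random.Random(seed)          # each clone lives in its own random world
    tilt, total = 0, 0.0
    for _ in range(MAX_STEPS):
        correction = policy(tilt)      # the policy reacts to the tilt it observes
        push = rng.choice((-1, 0, 1))  # unpredictable disturbance
        tilt += push + correction
        if abs(tilt) >= 3:
            return total - 10.0        # fell over: penalty, episode ends
        total += 1.0                   # +1 for every step spent upright
    return total

def evaluate(policy, n_clones=1000, seed=0):
    """Run n_clones episodes concurrently and merge them into one average return."""
    with ThreadPoolExecutor() as pool:
        returns = list(pool.map(lambda i: run_clone(policy, seed + i), range(n_clones)))
    return sum(returns) / len(returns)

def do_nothing(tilt):
    return 0

def push_back(tilt):
    return -((tilt > 0) - (tilt < 0))  # lean against whichever way we are tipping

print(evaluate(do_nothing))
print(evaluate(push_back))  # 50.0: this policy never falls in the toy model
```

Averaging is the simplest possible merge; practical systems typically pool the clones' transitions, or their gradient estimates, into a single update of a shared policy.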

▲ A LimX Dynamics bipedal robot descending a gravel slope

Li Feng: Returning to that video we discussed earlier, the bipedal robot demonstrated outstanding locomotion capability, particularly in its lower limbs. I want to ask: from hardware to algorithms to control, was everything shown in the video completely self-developed by us?

Wei Zhang: Yes, both our hardware and algorithms are self-developed. Of course, we also build upon existing, publicly available research from predecessors to iterate — it's the result of continuous evolution across the entire field.

Li Feng: In achieving this, did the work lean more toward scientific research or industrial practice, or somewhere in between?

Wei Zhang: Probably both are needed, which is why it's so difficult. It requires both engineering implementation and scientific breakthroughs, because some of the problems hadn't been tackled or well solved before; it's not simply a matter of copying what others have already done. So we had to propose novel methods and strategies. We had to innovate.

/ 04 / Revisiting the Moment the Bipedal Robot Nearly Tripped on a Mound and Recovered

Li Feng: In the video, there's a brief clip where the bipedal robot nearly trips on a mound while crossing a small gully. I noticed the video was deliberately slowed to 0.25x speed here — though it doesn't appear extremely slow — and we can see the robot, in the instant before falling, quickly adjusting its supporting leg, then using the other leg to modulate its stride, relatively swiftly recovering its standing posture from a near-fall stumble. What was the thinking behind deliberately presenting this seemingly "silly-cute" detail?

▲ The bipedal robot nearly tripping while crossing a ditch

Wei Zhang: I haven't discussed this in detail with the team. Slow-motion is common in sports broadcasting, like replaying highlight moves in gymnastics competitions. From a technical perspective, this clip nicely demonstrates the bipedal robot's fundamental locomotion capability when facing complex disturbances. Imagine someone on stilts trying to recover balance in a similar situation — the difficulty would be considerable. This detail actually happens to show that this robot's locomotion capability has reached or surpassed human level, and slow-motion makes it more intuitive to see.

Li Feng: Let me ask some detailed questions. First, when the bipedal robot faces a risk of falling, it needs to perceive changes in its center of gravity and balance state, and transmit this information to the decision-making system or algorithm. Then, the algorithm needs to rapidly formulate an adjustment strategy, sending control commands to various joints (such as motors and other mechanical devices) for real-time adjustment.

However, the actual situation may be far more complex: for example, adjusting the left leg may not achieve the expected effect, requiring simultaneous adjustment of the right leg to recover balance cooperatively. From an ordinary person's perspective, this process involves rapid feedback, decision-making, adjustment, and responses to unexpected situations: multiple stages, each of which has to be completed in an extremely short time, with considerable complexity and difficulty.

I'd like to know, how is this process specifically implemented?

Wei Zhang: It is indeed very difficult, which is why this kind of effect was hard to achieve for so long. Today it's mainly thanks to advances in artificial intelligence, particularly neural network development, that the problem has become relatively easier. Previously, we might have needed to manually design every possible response strategy based on logical rules or models — both complex and difficult to scale. Now, with the end-to-end learning capability of neural networks, bipedal robots can learn on their own how to handle various complex situations by simulating numerous falling and recovery scenarios.

Although neural networks may look like a "black box" from the outside, that doesn't mean we know nothing about what's happening inside. In fact, a network contains multiple functional modules, including the perception, decision-making, and adjustment stages you mentioned. These modules don't operate in isolation; they work together. Training is automated: the system generates large amounts of training data, and the network trains itself on that data.
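The sense-decide-act loop Li Feng outlines can be sketched with the kind of hand-designed, model-based controller Zhang says used to be the norm. Here a linearized inverted pendulum stands in for the robot, and a proportional-derivative (PD) law turns the sensed tilt into a corrective command once per millisecond. This is a swapped-in classical technique, not the learned policy discussed in the interview; in a learned controller, the one-line decision step would become a neural network. The model, gains, and fall threshold are assumptions.

```python
# Sense -> decide -> act on a linearized inverted pendulum, a common stand-in for a
# balancing robot. The PD law is a classic hand-designed controller, NOT the learned
# policy described in the interview; every number here is an illustrative assumption.

G_OVER_L = 9.81 / 1.0  # gravity / effective leg length (1/s^2), assuming a 1 m leg
DT = 0.001             # 1 kHz control loop

def balance(kp, kd, push=0.5, seconds=2.0):
    """Return (fell, final_tilt) after a push gives the robot an initial tilt rate (rad/s)."""
    theta, omega = 0.0, push  # tilt angle (rad) and tilt rate (rad/s)
    for _ in range(round(seconds / DT)):
        sensed_theta, sensed_omega = theta, omega   # 1. sense (ideal, noise-free sensors)
        u = -kp * sensed_theta - kd * sensed_omega  # 2. decide: PD correction, as an
                                                    #    angular-acceleration command
        omega += DT * (G_OVER_L * theta + u)        # 3. act: gravity tips, correction rights
        theta += DT * omega
        if abs(theta) > 0.5:                        # about 29 degrees: treat as a fall
            return True, theta
    return False, theta

print(balance(kp=40.0, kd=10.0))  # feedback on: recovers, tilt decays toward 0
print(balance(kp=0.0, kd=0.0))    # feedback off: falls within a second
```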

/ 05 / Bipedal Robots Walking in Real Environments Is Somewhat Different from AlphaGo Learning Go

Li Feng: If we turn back the clock, AlphaGo was probably the classic case of neural networks combined with reinforcement learning. In 2016 it beat Lee Sedol 4 to 1, though Lee did take one game. As AlphaGo kept iterating and upgrading, especially through continuous learning and optimization in self-play, later versions demonstrated strength far beyond human players; they just kept winning. In this process, AlphaGo first learned from large numbers of human game records, then kept improving through self-play, which is strikingly similar to how we're discussing bipedal robots improving their balance through reinforcement learning.

Wei Zhang: Yes.

Li Feng: But the challenges bipedal robots face now are somewhat different from AlphaGo learning Go. Go is a closed, theoretically exhaustible environment. In contrast, our robots need to apply reinforcement learning in open, unknown real-world environments — isn't this much more difficult?

Wei Zhang: Playing Go and robot motion control may not be the same category of problem, so it's hard for me to say which is more difficult. I think Go is quite difficult; its search space is genuinely enormous. Our bipedal robot moves in a continuous space, which people assume can't be exhaustively enumerated, but in fact everything can be. All problems in this world are computational problems; all theories are about reducing computational complexity. Of course, this is my personal view.
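Zhang's claim that continuous motion is ultimately enumerable can be illustrated with back-of-the-envelope counting. The sketch computes the well-known upper bound of 3^361 Go board configurations (each of the 361 points is empty, black, or white) and, for comparison, the number of poses of a hypothetical 12-joint robot whose joint angles are discretized at 0.01 rad over a π-radian range. The joint count, range, and resolution are assumptions chosen only to show that discretization makes a continuous space finite; the pose count ignores velocities, contacts, and terrain, so it says nothing about which problem is harder.

```python
import math

# Upper bound on 19x19 Go board configurations: every point is empty, black, or white.
# Many of these positions are illegal, so the number of legal positions is smaller.
GO_UPPER_BOUND = 3 ** (19 * 19)

def joint_grid_size(num_joints, joint_range_rad, resolution_rad):
    """Cells in a joint-angle grid at a fixed resolution (hypothetical parameters).
    Velocities, contacts, and terrain are ignored; they would multiply the count further."""
    bins_per_joint = math.ceil(joint_range_rad / resolution_rad)
    return bins_per_joint ** num_joints

robot_poses = joint_grid_size(num_joints=12, joint_range_rad=math.pi, resolution_rad=0.01)

print(f"Go board configurations <= 10^{math.log10(GO_UPPER_BOUND):.1f}")
print(f"12-joint poses at 0.01 rad ~ 10^{math.log10(robot_poses):.1f}")
```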

Humanoid robot locomotion has definitely seen major breakthroughs in the past year or two. Unlike games, which can iterate entirely in virtual environments, humanoid robots, though also trained in virtual environments, ultimately need to close the loop with real hardware and the physical world, and that is extremely challenging. As early as 2015, humanoid robots could basically walk indoors, but making sure that what worked indoors transferred seamlessly to the real physical world was the biggest barrier, and overcoming it also depended on continuous hardware iteration. The Go playing you mentioned happens entirely in the virtual world; AlphaGo can keep improving by playing against itself.

Li Feng: Past cases, whether AlphaGo playing Go or models and algorithms applied in online games, all validated reinforcement learning in 100% purely digital environments. These examples show that under idealized simulation conditions, reinforcement learning works well. However, for today's humanoid robots, one challenge is that many differences exist between simulation environments and real environments, including ground conditions and other environmental factors, as well as the robot's interaction with the physical world itself — all of which affect the final results.

Wei Zhang: Yes. The difficulty with humanoid robots is that it's not only a virtual matter but also a real, physical process. A crucial point is connecting simulation with the physical world, so that what is learned in simulation truly works in practice; I think the whole field spent a long time closing this gap. Hardware iterates much, much more slowly than software: one or two hardware iterations might take six months to a year. Hardware exploration also takes a long time, because you never know what a good solution is without running experiments. Compared with making thousands of copies in simulation, physical experiments cost far more time.
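One widely used way to narrow the simulation-to-reality gap Zhang describes is domain randomization: rather than tuning or training against one idealized simulator, you evaluate the controller across many simulated "worlds" whose physics are sampled from a range, so that the real robot looks like just another sample. The sketch below scores hand-picked PD gains on a toy pendulum; the parameter ranges (leg length, actuator strength, control latency) are invented for illustration, and in real training this score would drive a policy optimizer rather than a comparison of a few gains.

```python
import random
from collections import deque

DT = 0.001  # 1 kHz simulation and control tick

def sample_world(rng):
    """Draw one randomized 'world'. Ranges are illustrative assumptions."""
    return {
        "length": rng.uniform(0.8, 1.2),   # effective leg length (m)
        "gain": rng.uniform(0.7, 1.3),     # actuator strength vs. what the model expects
        "delay_steps": rng.randint(0, 5),  # control latency in ticks (0 to 5 ms)
    }

def survives(kp, kd, world, push=0.5, seconds=3.0):
    """Simulate a PD-balanced pendulum in one world; True if it never tips past 0.5 rad."""
    g_over_l = 9.81 / world["length"]
    in_flight = deque([0.0] * world["delay_steps"])  # commands issued but not yet applied
    theta, omega = 0.0, push
    for _ in range(round(seconds / DT)):
        in_flight.append(-kp * theta - kd * omega)   # decide from the current state
        u = in_flight.popleft()                      # apply the command from delay_steps ago
        omega += DT * (g_over_l * theta + world["gain"] * u)
        theta += DT * omega
        if abs(theta) > 0.5:
            return False
    return True

def robustness(kp, kd, n_worlds=200, seed=0):
    """Fraction of randomized worlds in which the controller keeps the robot up."""
    rng = random.Random(seed)
    worlds = [sample_world(rng) for _ in range(n_worlds)]
    return sum(survives(kp, kd, w) for w in worlds) / n_worlds

nominal = {"length": 1.0, "gain": 1.0, "delay_steps": 0}
print(survives(12.0, 2.0, nominal))  # soft gains pass in the idealized world...
print(robustness(12.0, 2.0))         # ...but fail in a share of randomized worlds
print(robustness(40.0, 10.0))        # stiffer gains hold up across all sampled worlds
```

The soft controller looks fine in the single nominal simulator; only the randomized worlds expose how fragile it is, which is the point of training against a distribution instead of one idealized model.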

/ 06 / Why Have Humanoid Robots and Embodied AI Become Hot?

Li Feng: Alright, our discussion today mainly covers two parts: first, bipedal robot technology itself, and second, your general-purpose robotics company. As a founder, do you think the humanoid robot or embodied AI direction is relatively hot right now?

Wei Zhang: I think there's a certain amount of buzz. People have real expectations for it, and there have indeed been some fairly substantive developments.

Li Feng: From a rational perspective, there must be reasons for the buzz. Where do you think it comes from?

Wei Zhang: I think there are two main factors. First, progress toward AGI (Artificial General Intelligence), which is also the key to why the term "embodied intelligence" emerged. Previously, although AI had developed quite well, it mainly played a role in the virtual world, with limited impact on the physical world. Now, with progress toward AGI, people are starting to think about how to make AI affect the physical world, and this requires new terminals and carriers, such as humanoid robots.

Another factor is that humanoid robot hardware itself has accumulated enough quantitative change to produce a small qualitative change. Previously, building a decent humanoid robot was very difficult, time-consuming, and labor-intensive, with low success rates. Now hardware has advanced, and many teams can put together a humanoid robot comparable to earlier ones in a short time, even if its functions haven't changed dramatically. So the maturation of hardware is also an important prerequisite for AI to play a role in hardware.

▲ LimX Dynamics' humanoid robot CL-1

Li Feng: When you say hardware, what specifically are you referring to?

Wei Zhang: Take humanoid robots, for example. They've always been considered extremely complex, and the legs, or lower body, have been a particularly stubborn problem. Going back to the light switch analogy, it's like searching for a hidden switch that everyone struggled to find for the longest time. It doesn't require a new physical discovery; it's an engineering process of continuous trial and iteration. The switch we're looking for requires hardware and software to work in concert. If you build the hardware but can't control it well with software, you might not even know on which side the problem lies. And when you're iterating on software, defective hardware won't cut it either. The process is like walking with alternating steps: it's hard to take a huge leap with one foot. Now I feel both feet have reached the switch.

Li Feng: That reminds me of the old saying, "trying to run before you can walk." It's like asking the robot's "brain" to learn posture control, gait, and balance while its limbs are still unstable and underpowered. Theoretically, both aspects need to progress together gradually, like a child learning to walk — falling a few times, growing stronger, and eventually running.

Wei Zhang: Exactly. Software and hardware are tightly coupled; software can't iterate independently, because on its own it has no way of knowing what's good or appropriate. Previously, hardware was extremely expensive and took long cycles to produce, so software never had the chance to mature. Now everyone has more or less found a viable path forward. The key isn't holding out for one big breakthrough; it's continuous iteration.

/ 07 / What Progress Has the Hardware Industry Made?

Li Feng: When we talk about hardware today, are we mostly referring to things like reducers, motors — the degrees of freedom, precision, and sensitivity of controlled joints — or a broader range of hardware components?

Wei Zhang: Mainly the overall structure, including frame components, joint design, and transmission system solutions. How to implement these (whether to install sensors, whether to use force sensing, how fast the sensor response needs to be) differs for arms versus legs, and each requires step-by-step experimentation. Every iteration takes considerable time. Before AI technology advanced to its current level, people weren't particularly invested in this, and funding was limited. It doesn't require some massive physical breakthrough; it just needs continuous iteration to reach a critical tipping point.

Li Feng: We recently invested in a six-axis force sensor project. They're using MEMS technology, which is driving costs down rapidly. Though from what I can see, six-axis force sensors aren't yet widely installed on current humanoid robots.

Wei Zhang: Force sensing is indeed needed; the question is how to obtain force information. In some cases, estimating force through the robot's own drive system and algorithms is sufficient, with no additional sensors required. Of course, you can install sensors to measure force directly, but the response needs to be fast. If you send a command and only get a reaction a second later, that's unusable. How fast is fast enough has to be figured out gradually through iteration. I feel we've now found a solid starting point. Going forward, a hundred flowers will bloom, with each team iterating along its own path. Progress should be rapid.
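Zhang's point that force can sometimes be estimated from the drive system, without a dedicated sensor, rests on two standard relations: an electric motor's torque is roughly proportional to its current (the torque constant Kt), and for a leg in static contact the joint torques and the foot force are related through the leg Jacobian, tau = J^T F. The sketch below inverts that relation for a planar two-link leg. The motor constants, link lengths, and the neglect of gravity, friction, and dynamics are all simplifying assumptions; it is not LimX's estimator.

```python
import math

def joint_torque_from_current(current_a, kt=0.1, gear_ratio=9.0, efficiency=0.9):
    """Joint torque (N*m) ~= Kt * current * gear ratio * efficiency. Hypothetical constants."""
    return kt * current_a * gear_ratio * efficiency

def leg_jacobian(q_hip, q_knee, l_thigh=0.3, l_shank=0.3):
    """2x2 Jacobian of foot position (x forward, z up) for a planar hip-knee leg."""
    # Foot position: x = l1*sin(q1) + l2*sin(q1+q2),  z = -l1*cos(q1) - l2*cos(q1+q2)
    s1, c1 = math.sin(q_hip), math.cos(q_hip)
    s12, c12 = math.sin(q_hip + q_knee), math.cos(q_hip + q_knee)
    return [[l_thigh * c1 + l_shank * c12, l_shank * c12],
            [l_thigh * s1 + l_shank * s12, l_shank * s12]]

def foot_force_from_torques(tau_hip, tau_knee, q_hip, q_knee):
    """Solve J^T F = tau for F = (Fx, Fz), the force the leg exerts at the foot (statics only)."""
    (a, b), (c, d) = leg_jacobian(q_hip, q_knee)
    det = a * d - b * c  # equals l_thigh * l_shank * sin(q_knee)
    if abs(det) < 1e-6:
        # With a straight knee, force along the leg produces no joint torque at all.
        raise ValueError("knee nearly straight: force is unobservable from joint torques")
    fx = (d * tau_hip - c * tau_knee) / det
    fz = (a * tau_knee - b * tau_hip) / det
    return fx, fz

# Example: motor currents (A) read from the drives, knee bent.
tau_hip = joint_torque_from_current(4.0)
tau_knee = joint_torque_from_current(-6.0)
print(foot_force_from_torques(tau_hip, tau_knee, q_hip=0.3, q_knee=-0.8))
```

The singularity check reflects a real limitation of sensorless estimation: near a straight knee, small torque errors blow up into large force errors, which is one reason some designs still add dedicated force sensors.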

Li Feng: As the power system for humanoid robots, have motors reached a relatively usable state for applications in legs or the lower body?

Wei Zhang: "Usable" is about right. "Good" would be an overstatement.

Li Feng: Suppose humanoid robots become more capable, say, playing soccer, whether at an elite or an ordinary level. Beyond activities like mountain climbing, what improvements or advances do motors need to handle more complex athletic scenarios?

Wei Zhang: Increasing torque density remains a development direction for motors, though shortfalls can be partly compensated through the transmission. Hardware development depends primarily on functional positioning, market demand, and commercial value.

Personally, I think having humanoid robots play soccer or climb mountains may be of limited significance. It's more practically meaningful to have them perform everyday service tasks, such as fetching water or assisting with production line operations. These tasks don't demand extreme complexity or peak physical performance, and they better realize the robot's value in helping humans. After all, not everyone can play soccer on a field; some people can't even run, but that doesn't prevent them from being productive workers and contributing members of society. This role, what the robot actually does, determines the direction of hardware iteration, and different companies may make different choices.

If you're pursuing extreme athletic performance, you might need to aim for something like Boston Dynamics' early products, but that requires significant improvements in actuator performance; Boston Dynamics' early products, for instance, used hydraulic systems.
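Zhang's remark that limited motor torque density can be partly offset by the transmission comes down to simple gearbox arithmetic: an ideal reduction of ratio N multiplies output torque by N (less friction losses), divides output speed by N, and multiplies the rotor inertia felt at the joint by N squared. The numbers below are hypothetical, chosen only to show the trade-off.

```python
from dataclasses import dataclass

@dataclass
class Actuator:
    """Motor plus reduction gearbox; all values are hypothetical examples."""
    motor_peak_torque: float  # N*m at the motor shaft
    motor_max_speed: float    # rad/s at the motor shaft
    rotor_inertia: float      # kg*m^2
    mass: float               # kg, motor + gearbox
    gear_ratio: float
    efficiency: float = 0.9   # gearbox losses

    def output_torque(self) -> float:
        return self.motor_peak_torque * self.gear_ratio * self.efficiency

    def output_speed(self) -> float:
        return self.motor_max_speed / self.gear_ratio

    def torque_density(self) -> float:
        return self.output_torque() / self.mass  # N*m per kg

    def reflected_inertia(self) -> float:
        # Inertia seen at the joint grows with the square of the ratio, so high-ratio
        # joints are harder to backdrive and harsher on impact when a foot strikes.
        return self.rotor_inertia * self.gear_ratio ** 2

low = Actuator(2.0, 300.0, 1e-4, 1.0, gear_ratio=6.0)
high = Actuator(2.0, 300.0, 1e-4, 1.2, gear_ratio=50.0)
for name, act in (("low ratio", low), ("high ratio", high)):
    print(f"{name}: {act.output_torque():.1f} N*m, {act.output_speed():.1f} rad/s, "
          f"{act.torque_density():.1f} N*m/kg, {act.reflected_inertia():.4f} kg*m^2")
```

The high-ratio unit wins on torque density but gives up speed and backdrivability, which is why legged robots that need fast, compliant legs often favor low-ratio designs paired with strong motors.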

/ 08 / "The form that represents the greatest common divisor across all scenarios — that's humanoid"

Li Feng: Let's discuss two other questions. First, I'd like to know whether you believe a humanoid design is necessary for robots, Professor Zhang — this is currently a hotly debated topic with differing viewpoints.

Wei Zhang: Though opinions vary, a humanoid design is necessary to some degree. Early on, when people were promoting AI technology, many reports and marketing materials depicted humanoid figures, even though AI itself has no physical form that can be drawn. These promotional posters often featured a human head, sometimes just half a head, so people have long associated AI with humanoid carriers.

People consider many factors when thinking about humanoid robots. One intuitive and fundamental perspective is the social and emotional dimension: humanoid robots more easily resonate with people and gain acceptance. I come from an engineering background and don't have any particular attachment to humanoid design. But in the video we just released, many viewers commented that they felt heartbroken watching the scene where the bipedal robot gets "hit." That's when I realized people do have this social-emotional connection to robots.

Li Feng: Very strong reaction — indeed, many people felt the robot being hit was quite pitiful.

Wei Zhang: I hadn't deeply appreciated this before, but I certainly feel it now. Still, this isn't our main rationale for designing humanoid robots. One viewpoint holds that human evolution isn't perfect, so there's no need for robots to imitate human form; better to model them after birds and create flying robots. This perspective is quite common. In reality, we don't design humanoid robots simply to mimic human appearance; we value functionality. The human form may not be the most perfect or efficient (we can't outrun cars or outfly birds), but modern society as a whole is built around it.

Li Feng: Let me interject here — we previously recommended a fascinating book, The Third Chimpanzee. Read alongside other works like Sapiens, it gives you profound insight into human evolution. The author argues that the shift from quadrupedal to bipedal locomotion was essentially a survival-pressure-driven prioritization of brain development. This transformation sacrificed speed and stability in exchange for significant brain evolution, which in turn drove us to learn tool use, including fire.

This is a classic case of compromise and evolution — we discarded certain characteristics while gaining decisive advantages that, over the long course of evolution, helped humans surpass nearly all other species to become one of the most prolific mammals on Earth.

While robots don't necessarily need to evolve this way, and as you mentioned earlier, robot design should be based on functional requirements, the question remains: are two legs truly the optimal choice?

▲ The shift from quadrupedal to bipedal locomotion in humans. Image source: Unsplash

Wei Zhang: I believe two legs are best.

Li Feng: Because they're most energy-efficient, or most efficient overall?

Wei Zhang: Because two legs best adapt to our human environment. We humans have two legs. They're not particularly energy-efficient, but they're the result of hundreds of thousands of years of evolution and unlikely to change in the next ten thousand years; we won't suddenly grow a third leg. Our living environment, from furniture heights to doorknob positions, is all designed around human proportions. The closer a robot's form is to human, the more seamlessly it integrates into our environment, and the lower the integration cost. The more a robot's form diverges from human, the more real-world objects need to be redesigned and redeployed. So the primary consideration in robot design is adapting to the human environment. I've always believed robots should be designed to serve humans within human environments, or to replace human labor.

In fact, every application scenario has a robot form best suited to its characteristics; three legs or other forms may be perfectly suitable. If we pick a single scenario or task that is large enough and treat the robot purely as a tool, its optimal form is typically not humanoid. But if we want the form that represents the greatest common divisor across all scenarios, that's humanoid.

Additionally, from the robot's perspective, the critical aspect of bipedal humanoid design is having two arms for manipulation. The greatest advantage of humanoid robots lies in their general mobile manipulation capability — the ability to complete various tasks within human activity spaces, just as humans do. People have many ideas about robot forms — three legs, four legs plus two arms — but these add unnecessary complexity and may leave robots unable to maneuver in tight spaces, turn flexibly, or reach high places. By contrast, because humanoid robots approximate human form, they dramatically reduce integration costs for environmental interaction, making them a relatively general-purpose design choice.

Li Feng: LimX Dynamics previously researched wheeled-legged robots: whether they have two legs or four, the feet are not point feet or flat soles but wheels, so these robots can both walk and roll. Given today's technological advances, will wheeled-legged robots go through a similar evolution? When might they achieve more complex, more impressive capabilities, for instance traversing rugged mountain terrain, then quickly switching to roads for long-distance running, then continuing up steep slopes?

▲ LimX Dynamics' wheeled quadruped robot W1

Zhang Wei: From a technical feasibility standpoint, achieving these three mode transitions is entirely possible for wheeled-legged robots, and we're actively developing in this direction. For quadruped robots, whether wheeled or not, the current focus is solving "generalized mobility" — achieving broad adaptation to various terrains. That means reaching human-comparable terrain adaptability; no need to surpass humans.

Li Feng: Like crossing snow-capped mountains and grasslands.

Zhang Wei: That's not very general-purpose. Not everyone climbs snow-capped mountains. For the vast majority of everyday terrains people encounter in daily life, if you want a robot with strong generalized mobility, four legs are sufficient — it's the optimal form. The primary purpose of bipedal robot design is to achieve upright walking, so that the front two limbs (the arms) can perform manipulation — in other words, to free up the hands.

Li Feng: Let's think differently. Suppose we set aside flight functionality and design a Transformer-like robot — it could have two legs or two wheeled legs, or four wheeled legs, and even transform its front two wheeled legs into arms.

Zhang Wei: Too complicated. Once humanoid robots are good enough, they can handle most general-purpose manipulation tasks. The two legs are mainly there to keep the robot stable while it works. If you want speed, have the robot ride a hoverboard or drive a car; let it use human tools. You don't need one robot that does everything. I think two legs plus wheels is unnecessary and overly complex, even somewhat against first principles.


The Essence of Humanoid Robots Is Replacing Human Movement: Mobility and Manipulation

Li Feng: Since we're on the topic of bipedal and quadrupedal robots: today some people say they're developing humanoid robots, while others say they're building embodied robots. How should we classify robots? For instance, we joked earlier that Professor Zhang mainly works on the lower body of robots, the legs or lower limbs. Other classification schemes describe a three-layer architecture: brain, cerebellum, and body.

Zhang Wei: Let's start from function. The essence of a robot is replacing human movement. Why call it the essence? Because it can be used to define what a robot is. At least in traditional robotics, if it doesn't move, you can't call it a robot, so conversational "robots" are really chatbots.

Robots essentially replace human movement; AI replaces human thinking. Their fundamental goals are different. Movement can be divided into mobility and manipulation — what you referred to as lower body and upper body. These two categories are the core tasks of robots. Some robots only do mobility, some only do manipulation.

Li Feng: If it's only mobility, is that like fully autonomous vehicles in intelligent driving?

Zhang Wei: Exactly. Both mobility and manipulation can be embodied, and both can have a brain and a cerebellum. A person without hands or without feet, for example, still has a brain and a cerebellum. A humanoid robot aims to be a complete, able-bodied "person": its ultimate mission is to possess human-like generalized mobility and manipulation capabilities. Getting there requires a great deal of AI. AI advances can help both the cerebellum's motor capabilities and the so-called brain's perception of the world. For robots, perception also serves movement: its purpose is to interact with and influence the environment better through movement. For instance, you need to know where the cup is before you can grab it.

Li Feng: LimX Dynamics' videos demonstrate the progress in bipedal robots' lower-body movement in open environments, made possible by combining hardware and software with AI, especially reinforcement learning. If these advances were applied to the upper body of humanoid robots, what would happen?

Zhang Wei: Essentially, robot movement has two core objectives: first, changing its own position and state, such as moving from point A to point B; second, changing the state of objects in the environment through its own movement — for example, the robot handing you this cup.

Li Feng: Or handing me my wallet.

Zhang Wei: Hahaha. Upper-limb motion technology has been developing for four or five decades and is relatively mature. But today, people's requirements for robot upper limbs go far beyond moving autonomously: the key is manipulating precisely and successfully completing specified manipulation tasks. From this perspective, reinforcement learning may do less to improve precise upper-limb manipulation, because it relies on trial-and-error interaction between the robot and its environment. By comparison, today's frontier technologies for understanding and perceiving the environment, along with the latest advances in imitation learning, are more useful for driving the next iteration of upper-limb capabilities. I think these technologies are close to finding the "switch."

Li Feng: Alright, let me try to explain from a non-specialist perspective. Robot upper limbs have a very long development history, counting from the rise of automatic control technology in the 1950s and the birth of the earliest mechanical arms and industrial robots. Those early arms could be seen as primitive "upper limbs," but their capabilities were limited to mechanical, repetitive, relatively simple movements. Now our requirements for robot upper limbs have risen significantly: we want them to do high-precision work, such as picking up a cup precisely, or carrying out tasks like "how to put an elephant in a refrigerator" that require multiple precise steps.

Zhang Wei: Yes, or rather, the difficulties are somewhat different from the early days.


What Is Imitation Learning, and What Progress Has Been Made This Year?

Li Feng: Okay, we'll have another episode later to deeply explore topics related to robot "upper limbs" or fine motor skills. Professor Zhang mentioned the concept of imitation learning just now — could you briefly explain it?

Zhang Wei: Imitation learning is also a type of machine learning. Essentially, a person or expert provides a demonstration, and the robot observes it and learns to reproduce it. A popular term for it is "watch and do." The point to emphasize is that imitation learning is largely data-driven: in most cases it has no model and does not model the physical world; it simply observes the action and reproduces it. Reinforcement learning, by contrast, is essentially model-driven: it relies on a model of the world and lets the robot try many different actions in a simulated environment. AI has simply improved how well the learned model generalizes.
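
Zhang Wei's "watch and do" description corresponds to what robot-learning researchers call behavior cloning, the simplest form of imitation learning. The sketch below is a deliberately toy illustration, not LimX Dynamics' or any lab's method: the demonstrations are four hypothetical (distance, speed) pairs, and the "policy" is a straight line fit by least squares, where real systems would use neural networks on camera and joint data. What it shows is the property he emphasizes: the policy is learned from demonstrations alone, with no model of the physical world.

```python
# Behavior cloning: the simplest form of imitation learning.
# The policy is fit directly to (state, action) pairs from a demonstrator;
# no model of the physics (masses, friction, contact) is built anywhere.


def fit_linear_policy(demos):
    """Least-squares fit of action = w * state + b to (state, action) pairs."""
    n = len(demos)
    mean_s = sum(s for s, _ in demos) / n
    mean_a = sum(a for _, a in demos) / n
    cov = sum((s - mean_s) * (a - mean_a) for s, a in demos)
    var = sum((s - mean_s) ** 2 for s, _ in demos)
    w = cov / var
    b = mean_a - w * mean_s

    def policy(state):
        return w * state + b

    return policy


# Hypothetical demonstrations: gripper-to-cup distance (m) -> approach speed (m/s).
# The demonstrator slows down as the gripper nears the cup.
demos = [(0.40, 0.20), (0.30, 0.15), (0.20, 0.10), (0.10, 0.05)]
policy = fit_linear_policy(demos)

# Query a distance that never appeared in the demonstrations.
print(round(policy(0.25), 3))  # 0.125
```

Because the learned policy only interpolates what it has seen, it says nothing reliable about situations far outside the demonstrations, which is one reason data collection matters so much for this approach.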

Li Feng: In plain terms, imitation learning is "learning by example." Take learning to play piano or badminton: these skills often require hands-on teaching of certain movements, which is essentially imitation. Of course, during the learning process we also reinforce what we learn through outcomes like winning a point or missing a shot. Back to the upper body: why can't the upper body rely on reinforcement learning as much as lower-limb movement does, and why does it depend more on imitation learning instead?

Zhang Wei: Going back to the classification I mentioned earlier: for a humanoid robot's own movement, we usually have good physics simulation models that are fairly close to reality, so reinforcement learning is relatively easy to apply. For upper-body manipulation, simulating the robot's own movement is still relatively easy, but its interaction with the objects it manipulates is extremely difficult to model accurately. One reason is that there are too many kinds of objects; there might be twenty or thirty different items on this desk right now, and building a clear, precise simulation model for every one of them is very difficult. Another is that at the end of the arm, whether it carries a hand or a gripper, the physics of contact and interaction with objects is hard to describe accurately and in detail.

Precisely because we cannot fully model the real physical world, imitation learning has become a more feasible choice in the initial stage. In fact, the progress of projects like the Stanford cooking robot over the past year, especially the last six months, has essentially been progress in the application of imitation learning.

Li Feng: When we interact with objects unrelated to our own bodies, we often draw upon inherent human common sense, basic physics knowledge, and understanding of patterns.

For example, when we pick up a cup or other item, we subconsciously judge whether it's hard or soft, whether it's like an egg that shatters with a squeeze, or like a phone that won't break even if dropped a few times. Based on the object's material, friction, and other factors, we decide whether to pick it up gently, grip it firmly, or carefully cup it in our hands. These judgments all relate to our perception and interaction with the physical world — including estimating an object's weight and shape, and choosing what posture and angle would be relatively stable and convenient for grasping and lifting.

Furthermore, facing objects in different states — such as an inverted, upright, or tilted cup — we need to judge what angle of contact is most secure. All such operations involve estimating numerous physical quantities, as well as feedback and judgment during the contact process.

For example, sometimes we see a box that looks lightweight, but when we try to pick it up we find it's extremely heavy because it contains several pairs of dumbbells — this is a kind of feedback during contact.

The problem now, as Professor Zhang said, is that for reinforcement learning, even with the same type of box, it may be difficult to predict and judge accurately before actual contact, because many physical properties, including material hardness, are hard to simulate completely and accurately.

Zhang Wei: Indeed, for example, fluid modeling is extremely complex.

Li Feng: Next, I'll give two small examples to illustrate the different progress being made in this field worldwide. One direction we've invested in is next-generation industrial design software. What distinguishes this new CAD software from traditional CAD? The key is that before a structure is designed, the software can take into account all the material properties needed to calculate how the structure absorbs sound, bears loads, resists earthquakes, and so on. Users can specify requirements such as good sound insulation, bulletproofing, or strong load-bearing capacity, a preferably arc-shaped form, or even minimal weight. The software builds these requirements into its calculations and ultimately generates designs with detailed information about internal and external structure, shape, material surface properties, and friction.

Zhang Wei: Sounds extremely complex. I think it's feasible for modeling certain specialized fields. But for completing actual manipulation tasks, I estimate its scalability is limited. Actually, people may have a misconception about simulation — often thinking that robots can easily solve the problem of difficult-to-obtain robot data and experimental data through simulation technology, and even get the so-called "data flywheel" spinning. But in reality, simulation is essentially modeling the real physical world. As you just mentioned, modeling requires considering many factors, and there are countless different objects in the real world. Achieving broad generalized modeling is almost impossible.

Moreover, when robots complete manipulation tasks, they don't need a comprehensive model of the entire physical world. Frankly, modeling is much harder than actually completing the manipulation task. At least in some circles, people are starting to realize this. It is also why imitation learning has made some progress: people are recognizing that when humans grasp objects, they don't need to know the object's specific model or parameters; they can grasp it successfully by intuition alone. This is a shift in awareness.

The Industry Status of Tactile Sensors: Many Approaches, No Consensus

Li Feng: There are many hardware requirements here. Take the simple action of grabbing a cup. First, I have to judge the cup's hardness, weight, and material, which determines whether to grip it firmly or hold it gently. Second, how slippery the cup's surface is and how much force is needed to hold it are determined at the moment of grasping, which requires instant feedback as soon as contact is made. For humans there's an additional issue: if the cup contains scalding water, you might not be able to hold it.

So in theory, tactile sensors at the end effector need to be extremely sensitive, feeding back many kinds of signals in real time so that the feedback can guide subsequent actions immediately. In your view, how far along is the hardware at this stage?

Zhang Wei: Since humanoid robots became hot, tactile sensors have become a hot direction too. There's no doubt we need them, but the current state of the industry is: many approaches, no consensus. Many people are indeed researching this field, but most work is still at the laboratory stage, or at a stage where the principle is sound but, once turned into products, robustness and consistency, or versatility and usability, are relatively poor.

This is similar to the software iteration challenges we discussed earlier with bipedal walking. Tactile sensing technology may not require a breakthrough at the physical level, but to know whether a tactile sensor is useful, you first need to build the hand; to know whether the hand is useful, you need to build the arm; to know whether the arm is useful, you need to solve AGI first. So the validation loop is extremely long.

I don't think this field lacks talent or intelligence — what it lacks is clear goals and direction for iteration. It's like trying to accomplish tasks through visual big data before cameras even existed. Right now, tactile sensing technology hasn't found a clear direction in terms of applications. Once the target becomes clear, technological progress will accelerate.

Li Feng: We invested in a company called Inspiry Robotics, which works on fine motor control in end effectors. Even within the category of "hands," there are different approaches to executing fine movements. An analogy: early education classes, which some children attend before kindergarten (though that industry has taken quite a hit lately), spend a lot of time training children's gross motor skills, fine motor skills, and what you might call precision movements. Gross motor skills are things like crawling, walking, and running. Fine motor skills might be grasping objects. Precision movements involve using the fingers, or the wrist plus fingers, to pick up, pinch, place, sort, and so on. From your perspective, what room for improvement exists in robotic hands, wrists, and fingers, and what next-generation solutions are visible on the horizon?

▲ Inspiry Robotics' anthropomorphic five-finger dexterous hand. Image source: Inspiry Robotics WeChat official account

Zhang Wei: There are quite a few companies building hands now, especially in the last year or two. But I think they're still lacking clear direction for iteration — it's uncertain how much impact they can make. The worst situation for a field is when it can't control its own destiny. I think hand development still needs to wait for AI to mature further. But it is indeed in a state where it might take off. Because of the current wave of humanoid robot enthusiasm and AGI development, investment and attention in this area have both increased.

Li Feng: Are six-axis force sensors, or multi-axis force sensors, more useful for fine manipulation tasks?

Zhang Wei: They're useful — it depends what you're using them for. Tesla's and Figure's hands look pretty good right now.

Manipulating Soft Objects Like Fabric Cut Pieces Remains a Research Challenge

Li Feng: Another question: is today's vision good enough? We invested in a company called Covariant, founded by UC Berkeley professor Pieter Abbeel and three of his PhD students. The company started with pick-and-place in warehousing and logistics. Early on, it faced the challenge of handling irregularly shaped items, such as envelopes or packages, piled together on conveyor belts. The robot had to determine whether an irregular outline it saw was one object or several different objects stacked on top of each other.

▲ Covariant robot performing automated depalletization. Image source: Covariant official website https://covariant.ai/

The garment industry has also long been considered difficult for robots to enter. One can imagine a scenario where fabric cut pieces are stacked and interleaved on a conveyor belt, some wrinkled or obscured by irregular objects, and the robot must accurately grab a piece, lay it flat, and then sew it into the clothes we wear every day. The human eye can easily judge the shape of a fabric piece and unfold it, but for robots, fabric is soft, and achieving this sequence of actions was once extremely difficult. Are there solutions to these kinds of problems now?

Zhang Wei: Still very difficult to solve — I think it remains a frontier research problem. There are many academic papers on this. Deep learning has been used quite a bit in robotic manipulation, but hasn't yet developed sufficiently strong generalization capabilities.

Solving these new problems, whether going from 0 to 1 or from 0 to 40 or 50, will require some new methods. Handling envelopes and handling clothing have something in common, since both require recognizing stacked objects, but each has its own challenges. Solutions to these problems are within sight, because large models are extremely powerful. Sometimes GPT's understanding of a scene even surpasses human ability.

Our research group once spent half an hour discussing a single image. It contained a complex scene with physical reflections, and we couldn't agree on which side was the person and which was the person's reflection. In the end, GPT's analysis turned out to be correct — it could clearly distinguish both the person and the reflection. Envelope sorting should theoretically be solvable; productizing it is another matter.

Li Feng: Yes, as it turned out, Covariant solved this kind of unstructured sorting problem with its approach.

Zhang Wei: Though manipulating soft objects like fabric is somewhat different — this is extremely complex, and will probably need to wait for continued iteration of large models.

Li Feng: One view I hold is that in China, automating and upgrading production lines in industries like apparel is a long-term and significant proposition. China has the world's most complete industrial chain. If the industries where China holds an advantage can be transformed, whether through automation or robotics, the potential for upgrading them within China is substantial. Conversely, any industry that cannot be transformed in China through process changes and partial automation may gradually move to countries and regions with lower labor costs, such as Southeast Asia.

Zhang Wei: I think it's difficult; fabric automation is quite hard. Some startups are working on it now, and I don't think it's impossible. It can be automated, but deployment is hard and costs are very high. If you approached it the way chip manufacturing does, you could probably figure it out; the cost would just be too high.

Robot Development Has Three Stages

Li Feng: A hot topic right now is that with large language models, human-robot interaction has become much easier. In the past, we needed programming or specific machine language instructions to get robots to perform tasks. But now, because large language models can understand and generate natural language dialogue, directly commanding robots through everyday conversation has become imaginable.

For example, we could say: "Help me take the wallet out of Professor Zhang's pocket." If the robot responds that there's no wallet in Professor Zhang's pocket, we could continue: "Then go look in his backpack." Finally the robot goes to find the wallet, credit card, or cash, whatever it is, and brings it to me.

Beyond this intuitive interaction method, what other connections and potential applications do you see between large models and robots?

Zhang Wei: Large models have shown people the possibility that robots could have strong generalization capabilities. We can divide robot development into three stages: from 0 to 1 is achieving control of the robot body itself — at this stage we've already found the "switch," and humanoid robots are one example; from 1 to 10 is learning individual skills, and I think we're close to finding the switch for that; from 10 to 100 means robots can understand arbitrary scenes and make reasonable decisions. While some people think this stage has already been achieved, I believe large models have only given us a glimmer of hope that we'll find the "switch." Compared to the past, this goal is no longer a distant fantasy — it's imaginable. Take grabbing a cup as an example: if the cup is accidentally knocked over and something spills, how should the robot respond? With large models, robots could in the future decompose tasks — for instance, first getting a cloth to wipe it up, and if the carpet gets wet too, it could go find cleaning staff to handle it. I believe that with the development of large models, we'll see so-called Robot Agents that decompose tasks, making robots more intelligent.
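
The spilled-cup example can be made concrete with a small sketch of hierarchical task decomposition. Everything here is hypothetical: the task names, the PRIMITIVE_SKILLS set, and propose_subtasks(), a hand-written lookup standing in for the large model Zhang Wei describes, not a real model or API. It shows only the control structure: a composite task is expanded recursively until every step is a skill the robot already has (the "1 to 10" stage), and the plan adapts to what the robot observes, here whether the carpet is wet.

```python
# Hypothetical sketch of task decomposition by a "Robot Agent".
# propose_subtasks() stands in for a large-model planner; it is a
# hand-written table, not a real model or API, so the example runs offline.

PRIMITIVE_SKILLS = {"fetch cloth", "wipe table", "notify cleaning staff"}


def propose_subtasks(task, world):
    """Return the subtasks for a composite task (placeholder for a large model)."""
    if task == "handle spill":
        steps = ["wipe up spill"]
        if world.get("carpet_wet"):
            steps.append("notify cleaning staff")
        return steps
    if task == "wipe up spill":
        return ["fetch cloth", "wipe table"]
    raise ValueError(f"no decomposition known for {task!r}")


def decompose(task, world):
    """Recursively expand a task until only primitive skills remain."""
    if task in PRIMITIVE_SKILLS:
        return [task]
    plan = []
    for subtask in propose_subtasks(task, world):
        plan.extend(decompose(subtask, world))
    return plan


print(decompose("handle spill", {"carpet_wet": True}))
# ['fetch cloth', 'wipe table', 'notify cleaning staff']
```

In a real system the planner's proposals would also need checking against what the robot can actually do; this sketch simply raises ValueError when it meets a task it cannot expand.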

Li Feng: If I understand correctly, when I tell the robot "Make sure to bring Professor Zhang's wallet," I only need to issue the command. As for how the robot gets the wallet from Professor Zhang, I don't need to worry about the specific process.

Zhang Wei: Right — the robot can decompose the task and execute it.

Li Feng: Of course, the simplest and most brute-force method might be to knock Professor Zhang unconscious first, search him, and then bring the wallet over.

Zhang Wei: Hahaha, that one was poorly trained: it learned from the wrong people.

Li Feng: Wrong role model for imitation learning.

"Since I Found the Switch, I Can Light Up This Room"

Li Feng: There are also some questions about you personally. You've worked in research, especially in robotics, for many years, and you're a respected scientist at the frontier of the field. I have two questions. First, two years ago, when humanoid robots weren't yet a hot field, why did you choose to found a general-purpose robotics company? Second, after years of research in both China and the US, what have you learned from turning to running a company?

Zhang Wei: I can briefly share my reasons for starting a company. On one hand, I believe technology and business had begun to intersect. Though conditions weren't as mature then as they are now, at least for robots' all-terrain mobility we had already found the "switch." On the other hand, in many fields, including engineering, the main contributors to future industries will increasingly come from industry itself. We hope to create value through entrepreneurship; especially in an application-oriented field like humanoid robotics, the value created through entrepreneurship is somewhat different. This is something we're very excited about.

Li Feng: So the benefit of science and research is that when a room is completely dark, you figure out how to find the switch. The benefit of industry is: since I found the switch, I want to install it in every dark room.

Zhang Wei: At least I can light up this room, and then in this room I can enjoy myself or create a lot of value. For example, I can arrange the room a bit better, or I can proactively place the wallet in front of you.

Li Feng: As a scientist versus as a company founder, how do the experiences differ, and which role presents a greater challenge for you right now?

Zhang Wei: The two are quite different. Research is essentially a process of finding the switch — it's divergent work that requires constant experimentation and innovation. An entrepreneur, having found the switch, needs to clarify goals and focus energy on planning and executing one thing.

I think entrepreneurship is very challenging; building a good company is difficult. Technology is only part of the price of admission. After you find the switch, everything in the room needs adjusting and modifying. Looking back, finding the switch may have been the relatively simple part.

Li Feng: Or rather, finding the switch is only the first step.

What Are the Development Prospects for Intelligent Robots and the Robotics Industry in China?

Li Feng: Since you've spent time in both China and the US, with research and industry experience in both places, and have had in-depth exchanges with peers in both countries — in your view, what are the development prospects for intelligent robots or the robotics industry in China? And how does it compare to the US?

Zhang Wei: First, what the US excels at is finding the switch — it still leads in original innovation. However, once the switch is found and we enter the stage of practical application and industrial deployment, China's advantages become very apparent. The robotics industry in particular combines hardware and software, with software driving hardware development. Hardware is the foundation, and the iterative closed loop must include hardware. China enjoys unique advantages in this regard, especially in Shenzhen and surrounding areas, where the iteration speed for hardware products is extremely fast.

I often give this example: when I was abroad, if I ordered a motor, it would take one to two months to arrive, and if it wasn't suitable, replacing it would take another one to two months. In China, that takes a morning or an afternoon, and the supplier may be just across the street. This efficiency is extraordinarily high. So when it comes to commercializing the industry as a whole, I believe China has a natural supply-chain advantage, provided the switch has more or less been found.

Li Feng: To summarize, in China, humanoid and embodied robots hold enormous investment and industrial value. The main reason is that China's industrial structure and economic policies tend to encourage and support long industrial chains, especially industries that add value to manufacturing supply chains. The humanoid and embodied robot industry represents a very long hardware supply chain, including but not limited to motors, reducers, sensors, chip controllers, and various other components — and today it has added software and algorithm technologies on top of that. So it belongs to the category of long manufacturing chain + technological value-added. This is something China is both good at and particularly motivated to push forward in today's environment.

There's a parallel example: autonomous driving algorithms and new energy vehicles, which China has always pursued with equal weight. China is one of the most open countries in its policies on fully driverless road testing and road access. Over the past decade or so, China has vigorously developed the new energy vehicle supply chain. New energy vehicles represent a very long supply chain, with the added value of autonomous driving and intelligent features on top. The result: after more than ten years of effort, Chinese new energy vehicles have begun to dominate the global market, especially in exports.

The robotics industry has an additional benefit. When, within 5 to 10 years, embodied robots possess the strong capabilities Professor Zhang just described, they are expected to serve the secondary industry (manufacturing), the tertiary industry (services), and even agriculture simultaneously, addressing the labor challenges brought by China's aging population and demographic shifts. Furthermore, China's large population base can provide robots with the broadest possible range of application directions and scenarios.

Another point: the humanoid design is easily understood by policymakers and readily accepted by ordinary people — especially when it becomes part of the tertiary industry, we are more accustomed to robots with appearances and behaviors closer to humans providing services.

Regarding humanoid robots and intelligent robots, we'll stop our discussion here for today. Special thanks to Professor Zhang Wei for joining us to explore these topics with his magnetic voice despite having a bad cold. Thank you, Professor Zhang!

Reader Engagement

In the embodied intelligence field, what innovative opportunities have you observed? What are your expectations for humanoid robots? By 17:00 on April 10, the 5 readers with the most thoughtful comments will receive copies of The Third Chimpanzee and Sapiens: A Brief History of Humankind.
