One Aims for the Skies, the Other Cares About Face: They're Redefining What You Think Robots Can Do | 5Y View
Is the humanoid robot race getting too crowded? They're looking for a different play.

If everyone has a destiny, Gao Fei's might be written right into his name — "Fei," meaning "to fly." He is a tenured associate professor and PhD advisor at Zhejiang University's College of Control Science and Engineering, a regular contributor to top-tier journals like Science Robotics and TRO, and a 2023–2024 selection for the world's top 2% of scientists. But compared to the dazzling titles of "PhD advisor" or "Outstanding Young Scholar," Gao Fei's current identity more closely resembles that of a geek reshaping the skies. He founded Differential Intelligence Flight (微分智飞) with a clear and far-reaching goal: to build a universal "embodied brain" for flying machines.
Hu Yuhang, founder of Shouxing Technology (首形科技), may be the most traffic-savvy scientist in robotics. He holds a PhD in robotics from Columbia University and is a regular in top journals like Nature Machine Intelligence and Science Robotics. He is also the creator of the short-video account "U Hang," with over 2 million followers and 150 million cumulative views. Behind his traffic experiments lies a cool-headed strategic choice: giving robots a face, the channel he says carries 55% of human emotion in communication. He is attempting to sidestep the pitfalls of complex physical contact and use high-fidelity facial interaction to be the first to crack the Scaling Law for physical robots.
This episode features a conversation between Xing Meng, partner at 5Y Capital, and two founders: Gao Fei of Differential Intelligence Flight and Hu Yuhang of Shouxing Technology. They discuss choices, traffic, and the ambition to redefine robots.
Whether it's a "brain" soaring through the skies or a "face" transmitting emotion, amid the frenzied heat of the humanoid robot race, neither chose to replicate success in the mainstream lane. Instead, they each took a road less traveled. This is not just the entrepreneurial story of two top scientists, but an attempt to evolve robots from industrial products into an entirely new species.

Guests:
Gao Fei, Founder of Differential Intelligence Flight
Hu Yuhang, Founder of Shouxing Technology
Host:
Xing Meng, Partner at 5Y Capital
"Humanoid-face robots don't require physical contact. Their interaction dimensions can be trained entirely through pure digital information. This means we have a much easier path to cracking the Scaling Law and nurturing the 'GPT moment' for physical robots."
"We're not Reinventing robots. We're Redefining them. We've been waiting for this opportunity for far too long."
"If I were afraid of a mountain from day one simply because it's high, I would abandon all hope of ever climbing it. 'The mountain rises ten thousand feet — I only take one step.' I just focus on the step right in front of me."
Selected excerpts from the interview:
Xing Meng: Both of you are among the most closely watched entrepreneurs in robotics this year, yet neither of you chose the most mainstream direction. When people talk about robots today, the first thing that comes to mind is usually humanoids — a scorching-hot track with plenty of excellent companies emerging. But you each chose directions that are related yet distinctly unique. Could you both start by introducing your respective directions?
Gao Fei: I'm Gao Fei, founder of Differential Intelligence Flight. We focus on flying embodied intelligence. Put simply, we aim to build an integrated foundation model encompassing the brain, cerebellum, and swarm-brain that can empower all kinds of flying terminals. My vision is to bring intelligent flight into millions of households and serve thousands of industries.
Hu Yuhang: I'm Hu Yuhang, founder of Shouxing Technology. We specialize in humanoid-face robots and human-machine conversational interaction systems. We believe that today's AI agents are increasingly approaching the emotional expression and comprehension patterns of real humans in their interaction strategies. We want them to break free from existing physical carriers, no longer confined to screens and phones, but truly embodied in a physical human form.
This offers two core values. First, psychological research shows that 55% of human emotion and attitude in communication comes from facial expression, which dramatically expands the dimensions of interaction. Second, it enables IP commercialization — by applying biomimetic skin to a metallic robot frame, it can transform into various characters: game avatars, film figures, or virtual companions. Although universities have done considerable research in this area, it has never been properly engineered for commercialization. I believe that with China's advanced manufacturing capabilities and supply chain, we can absolutely make this happen.
Humanoid Track Too Crowded? They Chose "the Sky" and "the Face"
Xing Meng: Both of you are top students and regulars in Science Robotics, with many publications to your names. Actually, choosing a more mainstream direction wouldn't have been difficult for either of you. So why take this seemingly "contrarian" path? How did you think it through?
Gao Fei: First, regarding why I chose the flying direction — simply put, I just wanted to do it. If everyone has a destiny, this is mine. Look at my name: Gao Fei, "soar high" — I was meant to do this. My childhood dream was to fly planes and become a pilot. I never made it, unfortunately, but I started working on quadrotors as an undergrad at Zhejiang University, right when they were just emerging. Through my PhD and my professorship at Zhejiang, I've kept working on problems from single-agent autonomy to swarm autonomy in flight, because I genuinely want to do this.
Of course, before starting the company, some people advised me to do humanoid robots, given I had some reputation in the field. But I decided to stick to my original aspiration. Additionally, facing the opportunity of embodied intelligence, I believe we can take new AI technologies, push them down into physical machines, and rebuild every earlier kind of robot from scratch. The essence is making robots general-purpose: evolving from Machine to Automatic Machine, then to Autonomous Machine, and finally to our generation's Intelligent/Smart Machine. This is a massive opportunity, and it's equally true for drones and flying robots. Since it's both an opportunity and a dream, the choice was simple.
Xing Meng: Let me follow up — inevitably in this direction, you'll be asked: how are you different from DJI? I'm sure you've heard this countless times. Could you explain how you view the comparison?
Gao Fei: First, I don't see us as competitors with DJI, or rather, at this stage we don't yet "qualify," right? Essentially, we are building a "flying embodied brain" — brain, cerebellum, and swarm-brain — with the goal of becoming a more platform-oriented, horizontally enabling company. This is fundamentally different from DJI's aircraft, which are oriented toward aerial photography and imaging. We want to empower all industries, not compete with DJI in the specific vertical of imaging.
From another angle, even if we face competition in the future, I think we should face it calmly. If I were afraid of a mountain from day one simply because it's high, I would abandon all hope of ever climbing it. I love a quote from Wang Yangming: "The mountain rises ten thousand feet — I only take one step." DJI is indeed a towering peak, but I just need to take the step right in front of me — do the technology well, build the product, and open up our own market.
Xing Meng: Yuhang?
Hu Yuhang: I just graduated with my PhD this June. During my doctoral studies, I experimented with various AI-robotics integration topics — legged robots, robotic arms, and now humanoid-face robots (the only thing I didn't do was drones, because New York has flight restrictions). For every topic I tried, I would think through its commercialization path and build an MVP (minimum viable product). The humanoid-face robot category is what I believe holds the greatest commercial imagination.
Through these different attempts, I discovered that data is the biggest bottleneck. While model parameters can be stacked infinitely, data is the necessary condition for the Scaling Law to take effect. Robotics inevitably involves interaction with the real world, which creates a massive Sim-to-Real gap — simulated data is very difficult to deploy directly in reality.
Looking across robot categories, legged robots are relatively controllable — they only interact with the ground, so you just need good ground modeling and body dynamics adaptation. Manipulation is much harder because it requires contact with all kinds of unknown objects: liquids, flexible fabrics, or visually deceptive items. This complex physical interaction makes training data extremely difficult to obtain, making it a very challenging starting point for commercialization.
But humanoid-face robots are completely different. They barely require physical contact or involve complex friction and contact dynamics. Their interaction dimensions — gaze, vision, hearing, language content, and even facial expressions — can be trained entirely through pure digital information. This aligns highly with the modalities that current large models can process. So I believe in the humanoid-face robot direction, we can more easily access massive amounts of data, crack the Scaling Law, and nurture the "GPT moment" for physical robots.
This is why I chose this field. My advisor always taught us not to do "quick and dirty" work, but to explore blue oceans. The human-face scene was once a blank space in reinforcement learning, deep learning, and multimodal research. When we published in Science Robotics last year, this was still a niche track, but now it's become a new research hotspot. Just like locomotion or drones back in the day, this is an entirely new academic ecosystem.
Xing Meng: Maximizing the advantage of data iteration while minimizing the impact of physical obstacles.
I remember in March this year when I was in the US, I had the chance to visit Yuhang's lab and saw this robot. It was incredibly stunning — the moment it opened its eyes felt deeply sci-fi. After we decided to invest, most initial reactions were confusion: just a head, no body — what can it do? By June or July, the winds completely shifted. People started throwing out ideas frantically: do IP, do games, do movies, do virtual pop stars, etc. I'm curious — did you also go through this process, from being misunderstood to an explosion of imagination?
Hu Yuhang: Indeed, at first everyone questioned whether this thing could be commercialized, not knowing what it could be used for. Later, when we released some cool videos, all kinds of directions came flooding in. At that time, an early shareholder told me: the biggest challenge you face now is making the right choice among so many options.

Building Hardware for AI "Grinders"
Xing Meng: In my early conversations with both of you, regarding the question of "what to build," I noticed you both independently mentioned Unitree. Unitree spent a considerable amount of time early on in the research and education market, building technical capabilities and serving the most R&D-demanding users, before expanding to commercial scenarios. Looking back, has this thinking from six months ago evolved? Have you discovered new scenarios?
Gao Fei: From day one, we established our technical roadmap: on the relatively mature drone platform, use large models to demonstrate reasoning and decision-making capabilities, use end-to-end cerebellar models to demonstrate extreme maneuverability, and use distributed swarm-brains to achieve multi-agent collaboration. Aside from the cerebellum, which enables fast flight, the specific commercial applications of the other two weren't yet fully clear at the time.
So when I discussed this with Xing, I asked whether we could follow Unitree's path: start with the research and education market to build an ecosystem. We would provide foundational capabilities and partially open-source models, letting university and institutional users unleash their imagination to build demos and validate scenarios for us. Looking back now, I'm deeply grateful for Xing's reminder at the time: "Don't accidentally become an education company." He was absolutely right. Research and education is a phase for us, or a means, not the end goal. Our core must remain platform-oriented: we use the research market to enrich our ecosystem, but we don't stop there.
Hu Yuhang: I can speak from two angles: customers and the company itself.
For customers, we discovered this is a real pain point. Coming from an academic research background, I know that robotics inevitably spans EE (electrical engineering), ME (mechanical engineering), and CS (computer science). Today the people doing AI algorithms are mainly from CS, but getting CS people to handle hardware is excruciating. Unitree's entry point was precise: when RL (reinforcement learning) emerged, CS researchers needed to validate policies, and Unitree perfected the quadruped robot platform so CS researchers could directly access low-level control to train models.
We asked ourselves: what's the next research direction where people will "grind"? People in AI are grinders — they need to publish masses of top-conference papers every year. I'm also a reviewer, and I've noticed that in HRI (human-robot interaction), most papers have terrible hardware capabilities, leading to a lack of standard evaluation or benchmarks. Shouxing Technology can provide a very stable hardware platform to solve this problem. Moreover, the modalities of humanoid-face robots — vision, hearing, voice output, eye movement — match perfectly with today's mainstream multimodal AI, making them ideal for digital-human-related research tasks.
For the company itself, we need an open ecosystem to attract talent. In today's market, you simply cannot hire someone with "10 years of humanoid-face robot engineering experience" because this job category didn't exist before. By entering the university research market, we can cultivate future developers. When students use our products to publish papers during their PhDs, they'll prioritize joining Shouxing when they graduate — creating a positive talent loop for the company. The more products enter universities, the more validated talent enters the company.
The Million-Follower Creator
Xing Meng: For startups, storytelling is also a critical capability. Yuhang is quite special in this regard — with 2 million followers across platforms, mostly accumulated during his PhD. Coincidentally, your product is extremely cutting-edge and needs a platform to showcase it, and this showcasing process in turn gave you platform opportunities. Could you share how you think about self-media and communication?
Hu Yuhang: Right, I believe the new media era, including Douyin and other streaming platforms, gives us new opportunities to penetrate all industries, just like AI technology itself. At the time, the platform with the most traffic was Douyin, and its logic differs from traditional media — it cares intensely about the first-three-second attention mechanism. It no longer relies purely on trending topics or friend recommendations, but on algorithms. If you don't grab viewers in the first three seconds, they swipe away. I researched this in my spare time during my PhD and gradually figured out the self-media logic.
This has also dramatically reduced our marketing costs. Our most popular video has over 60 million views, probably 70 million now, and shot straight to the top of the trending list. The traffic has also brought some upstream-downstream partnerships and recruitment opportunities — many people actively reached out wanting to join after watching the videos.
This changed my understanding of "socializing." I used to think having many friends meant more opportunities, that you needed to actively socialize. But entering academia, I found true peers are rare, and most socializing is ineffective. The nature of socializing has changed — it shouldn't be about actively pitching others, but about actively exposing yourself to attract connections. People interested in your content and aligned with your entrepreneurial direction will naturally gravitate toward you; those uninterested will automatically filter out. For me, self-media is an excellent social tool.
Xing Meng: Attracting people in and letting them find their own way is more effective than active outreach — though the prerequisite is having a good enough product.
I'd also like to ask Gao Fei about hiring. One could say you're not just competing with drone companies, but with all robotics companies and even tech giants for talent. People skilled in reinforcement learning or robotics algorithms have transferable capabilities, and competitors often have more money and bigger names. As a startup, how do you win over these core talents?
Gao Fei: We do face fierce cross-industry competition in hiring, and I've invested considerable energy into it. I believe that in a team's early stages, finding people who complement your own capabilities is crucial.
I have two core strategies: First, sincerity is the ultimate killer move. I once made three trips to Shanghai to recruit a reinforcement learning talent. He didn't join in the end — I was heartbroken — but that process was absolutely necessary. Second, find people who share the same dream. For example, this October I personally went to several universities for campus recruitment talks. The most enthusiastic response wasn't at my home turf of Zhejiang University, but at Northwestern Polytechnical University and Beihang University. Students at these schools naturally love flight — they chose these schools because they love aircraft.
While many competitors have more resources and bigger names than us, I believe there are many kids around the world like me who dream of soaring through the skies. I just need to find those whose skills meet our bar and whose goals align closely with ours. There are actually plenty of such people; the key is that we must work hard to discover them.
Xing Meng: You mentioned that students at Northwestern Polytechnical and Beihang have a natural passion for flight. In your hiring priority ranking, how would you weigh this passion or willpower relative to capability and experience?
Gao Fei: This also depends on company stage and industry stage. For instance, this year all robotics companies are recruiting for reinforcement learning and embodied intelligence, but everyone finds it incredibly difficult. This is such a new direction that there are virtually no "properly trained" people — everyone is essentially a freshman. We're not building traditional drone hardware but the AI inside, which only really started in the past couple of years. So at this stage, I appropriately lower the weight on experience and value more the ability to self-iterate and think independently, plus alignment with our vision. I want the team to be "small but elite." Only when everyone's dreams and goals are highly aligned is the fighting power strongest.
Restraint and Expansion When "Ammunition" Far Exceeds Expectations
Xing Meng: Over the past year, both of you have raised funding far faster than expected. I remember when I first talked with Gao Fei, we thought a startup round would be enough to get things running, but the actual scale far exceeded initial plans. Facing such dramatic change, how do you adjust product pace and company strategy?
Gao Fei: First, the funding did bring confidence, but also the challenge of allocating resources efficiently. I've come to realize that "the more the merrier" isn't something everyone can easily say; truly harnessing vast resources is actually quite difficult.
With more room for error, our approach naturally changes. Specifically, we've increased investment in "infrastructure" like foundation models, data collection, and training facilities. This lets us play a higher, bolder game: with limited resources, we'd have to meticulously optimize for PMF (product-market fit) in a single scenario before slowly expanding. But in the fast-evolving embodied intelligence race, to seize a technical lead, we need to use our funding advantage to make our products more general-purpose and cover more scenarios.
Hu Yuhang: I strongly agree with Professor Gao. Additionally, I think funding advantages also show in building strategic moats.
Domestic market competition is extremely intense, and copycats have keen noses. Often as soon as we release new content online, even the filming techniques and music get rapidly copied. In this environment, ample funding lets us quickly flex our muscles and establish our tone, securing industry leadership and attracting more people in.
Xing Meng: Capital is a resource, but sometimes also a distraction — it can disrupt our rhythm, or with excess resources, tempt us to do many things we shouldn't simply because we "can." This has happened countless times in history. Have you had moments where abundant resources caused distorted actions? Or what mechanisms do you have to ensure decision-making discipline when facing massive resources?
Gao Fei: When our team makes major decisions, I let opposing voices come out first. I think in a team, because I'm the founder, if I speak first my views tend to carry too much weight and easily bias everyone. So I usually deliberately step back, withhold my position, and let colleagues who like to "sing a different tune" speak first. After all critical opinions are fully expressed, I then balance things out. This method effectively neutralizes my unconsciously aggressive impulses, and so far it has worked reasonably well.
Hu Yuhang: Although fundraising went smoothly, we actually haven't taken that much money, and our strategy remains "small steps, fast runs." We continue methodically pushing our MVP and technical breakthroughs. We absolutely will not upgrade to a bigger office or blindly upgrade equipment just because we have money in the bank. Our principle is clear: every penny must be spent where it counts.
Embodied Intelligence: Bubble or Eve of Dawn?
Xing Meng: Finally, I'd like to ask both of you: if embodied intelligence is not a bubble, what's your reasoning?
Hu Yuhang: As an idealist, I certainly hope it won't be a bubble — everyone should have a chance. But realistically, there are indeed many unconverged technical challenges: data standards aren't unified, sensors have bottlenecks, and even basic real-time voice interaction and high-precision task control aren't fully solved yet.
But I believe genuine value will settle out after the froth recedes. At this stage, we're leveraging capital enthusiasm to tackle core technologies. If the bubble bursts in the future, we'll seek more grounded "laying eggs along the way" scenarios to ensure healthy cash flow.
Gao Fei: I don't think this is a bubble. As a robotics industry practitioner, I see this as a massive era-defining opportunity. The robotics industry has developed for decades with ups and downs — from robotic arms in 2000, to robot vacuums in 2010, to drones and autonomous driving in 2015. While they all work, they're still far from the "general-purpose intelligent robots" that can coexist with humans, which we've long dreamed of.
The essence of embodied intelligence is adding AI to machines, evolving them from "specialized automation equipment" to "general-purpose intelligent agents." Broadly speaking, we're not Reinventing robots. We're Redefining them. The core lies in "generality and generalization," and this is completely achievable as a technical vision. For someone like me who transitioned from traditional robotics to AI, we've been waiting for this opportunity for far too long. So not only do I not think this is a bubble; I believe current investment and attention are actually still insufficient.
