5Y News｜AheadForm Closes Pre-A Round

5Y Capital · June 26, 2025

The Startup Behind the Viral Humanoid Robot Video With Over 100 Million Views

On June 26, AheadForm officially announced the completion of a new funding round, co-led by China Merchants Venture Capital and Shenzhen Capital Group, with 5Y Capital and Xunshang Venture Capital participating. Previous angel round investors included Decent Capital, MiraclePlus, Taihill, and AgiBot. Not long ago, a video posted by its founder showing a "bionic robot opening its eyes and slowly smiling" went viral on social media. The awakening scene was widely shared and reposted, with comments ranging from "I don't think the uncanny valley applies anymore!" to "It feels like there's light in her eyes."

It is hard to sum up what AheadForm brings to the table in a single word, but most viewers sensed something unprecedented in that video: the realism of an intricately detailed human face, fluid micro-expressions, and believable emotional interaction, along with enormous market potential. When robots become lifelike enough to trigger genuine emotional responses in humans, scenes that once existed only in science fiction no longer seem so far away.

The Creator Who Awakened the Robot's "Self"

Yuhang Hu, founder of AheadForm, holds a PhD from Columbia University and is a member of the AI Institute under the US National Science Foundation. He has long researched robot self-modeling, autonomous cognition, and human-like interaction systems. Unlike traditional robotics researchers who approach the field from a productivity angle, he attempts to go further and answer a more fundamental question: Can robots understand themselves and others the way humans do, and through self-learning, advance toward embodied intelligence?

During his doctoral studies, he published two back-to-back papers in the leading journals Nature Machine Intelligence and Science Robotics, drawing significant industry attention:

"Teaching Robots to Build Simulations of Themselves" — Nature Machine Intelligence:

The paper proposes a methodology centered on self-supervised learning and self-modeling: a robot reconstructs its own structure and kinematics solely by observing video of itself, closing the loop from perception to understanding to control and providing a technical pathway toward robots that adapt and learn autonomously.

"Human-Robot Facial Coexpression" — Science Robotics:

The paper proposes a facial co-expression prediction model that lets a robot pick up human emotional signals in advance and generate a natural, matching facial response in sync with the person. This makes the robot not merely an observer or responder, but an actively interacting, empathetic agent.

These achievements formed the technical starting point for AheadForm's subsequent products — "endowing robots with more human-like appearance and behavioral patterns." Combined with AheadForm's latest technology, the robot's "moment of opening its eyes" in the video brought all these threads together, sketching the outline of an awakened bionic robot.

Three Technical Systems: Autonomous Robot Learning, Emotional Foundation Model, and Bionic Face Hardware

The realism of the robot in the video stems from AheadForm's accumulated expertise across three technical systems.

1. Autonomous Robot Learning: Self-Supervised Learning and Self-Modeling

From AheadForm's perspective, rather than "teaching" a robot a fixed skill in one go, it's better to endow it with the ability to "learn autonomously." This philosophy was precisely the core of founder Yuhang Hu's doctoral research at Columbia University: self-supervised learning and robot self-modeling.

A. Self-supervised learning, originating from the robot's ability to "look in the mirror"

Without relying on expensive human annotation or teleoperation, robots use motion data from their own sensors to automatically infer the internal relationships between their own structure, joint kinematics, and control strategies — much like humans observing themselves in a mirror. This process breaks away from traditional dependence on preset models and environments, achieving closed-loop learning from perception to modeling to control.

B. Self-modeling, enabling robots to "understand themselves"

Self-modeling means that a robot builds an internal model of its body structure and dynamic behavior from its own perception alone, without relying on environmental labels or external intervention. The key is decoupling "agent modeling" from "environmental modeling": agent modeling covers the robot's knowledge of its own body, such as motor response, underactuated structures, and soft deformation, while environmental modeling belongs to task-level understanding, such as terrain, object shapes, or force feedback.

Through this decoupling, a robot in a complex or unknown environment can start from "knowing itself" and gradually expand to modeling its interaction with the outside world. For example, when hardware wears down, external loads change, or new tools are attached, the robot can instantly reconstruct its self-model and autonomously adapt to the new state without retraining the entire system.

This capability gives robots an underlying intelligence marked by strong interpretability, high adaptability, and strong independence, and it represents one of the key pathways toward general-purpose robots.
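To make the self-modeling idea concrete, here is a deliberately tiny sketch. It is not AheadForm's method or the approach in the papers above, which learn from video with far richer models. It assumes a hypothetical planar two-link arm whose link lengths are unknown to the robot. The robot issues random joint commands ("motor babbling"), observes where its end effector lands, and fits its own link lengths by least squares. When a tool lengthens the second link, only the small self-model is re-fit. All names here (`observe`, `babble`, `fit_self_model`, `predict`) are illustrative.

```python
import math
import random


def observe(true_lengths, t1, t2):
    """Stand-in for the robot watching itself (camera, mirror, or sensors).

    Returns the end-effector position (x, y) of a planar two-link arm.
    The robot never reads true_lengths directly; it only sees the result.
    """
    l1, l2 = true_lengths
    x = l1 * math.cos(t1) + l2 * math.cos(t1 + t2)
    y = l1 * math.sin(t1) + l2 * math.sin(t1 + t2)
    return x, y


def babble(true_lengths, n, rng):
    """Collect n self-generated samples (t1, t2, x, y) from random joint commands."""
    samples = []
    for _ in range(n):
        t1 = rng.uniform(-math.pi, math.pi)
        t2 = rng.uniform(-math.pi, math.pi)
        samples.append((t1, t2, *observe(true_lengths, t1, t2)))
    return samples


def fit_self_model(samples):
    """Least-squares estimate of the link lengths (l1, l2) from babbling data.

    x and y are linear in l1 and l2, so the 2x2 normal equations
    can be accumulated and solved in closed form.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for t1, t2, x, y in samples:
        rows = ((math.cos(t1), math.cos(t1 + t2), x),
                (math.sin(t1), math.sin(t1 + t2), y))
        for u, v, target in rows:
            a11 += u * u
            a12 += u * v
            a22 += v * v
            b1 += u * target
            b2 += v * target
    det = a11 * a22 - a12 * a12
    l1 = (a22 * b1 - a12 * b2) / det
    l2 = (a11 * b2 - a12 * b1) / det
    return l1, l2


def predict(model, t1, t2):
    """Use the learned self-model to predict where the end effector will be."""
    return observe(model, t1, t2)


rng = random.Random(0)

# 1. The robot learns its own body from self-generated data only.
body = (1.0, 0.7)
model_before = fit_self_model(babble(body, 50, rng))

# 2. A tool is attached, lengthening the second link: the old model is stale.
body_with_tool = (1.0, 0.95)
actual = observe(body_with_tool, 0.3, 0.5)
stale_error = math.dist(predict(model_before, 0.3, 0.5), actual)

# 3. Re-fit only the self-model from fresh babbling; nothing else is retrained.
model_after = fit_self_model(babble(body_with_tool, 50, rng))
fresh_error = math.dist(predict(model_after, 0.3, 0.5), actual)

print("self-model before tool:", model_before)
print("self-model after tool: ", model_after)
print("prediction error, stale vs. re-fit:", stale_error, fresh_error)
```

The point of the toy is the separation of concerns: the body model is small and learned from the robot's own data, so a change to the body only requires re-estimating that model rather than any task-level knowledge.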

C. Lifelong learning, the future of autonomous robot learning

The establishment of self-modeling capability also lays the foundation for robots to achieve "lifelong learning": once a model learns "how to learn," it can transfer to new hardware, scenarios, and even whole-body joint systems, enabling continuous learning of new tasks, adaptation to new environments, and repair of self-damage. Through this technical paradigm, AheadForm makes "autonomous learning" an accelerator for robots moving toward embodied intelligence — allowing robots not only to execute tasks, but to continuously learn and constantly grow.

2. Emotional Foundation Model: Building an Affective Engine for Human-Robot Interaction

Can robots understand and express human emotions? The answer comes from another major breakthrough at AheadForm — the Emotional Foundation Model.

In current AI development, a "foundation model" refers to a core model framework trained on large-scale data with general capabilities that can generalize to multiple tasks. It is a critical step for artificial intelligence toward "embodied intelligence."

Most humanoid robot companies are currently focused on building "tool-attribute" foundation models for general grasping, manipulation control, and navigation planning, attempting to cover a wide range of actions and physical operations through a single training process. However, this direction faces a fundamental bottleneck: the scarcity of real-world interaction data. Physical interaction data is expensive and difficult to collect because it involves complex collisions, dynamic feedback, and multi-modal synchronization, which makes it far harder to obtain than data generated in simulation.

Emotional foundation models offer a relatively more viable path. Most human emotional interaction relies on non-physical channels such as language, voice, facial expressions, and eye contact, making emotional interaction data not only easier to collect, standardize, and scale, but also independent of complex physical world modeling. At the same time, this data naturally possesses continuity, contextual richness, and generalization capability, making it ideal material for self-supervised learning.

Based on this assessment, AheadForm was the first to propose and build the "Emotional Foundation Model," a multi-modal model trained on large-scale emotional interaction data that integrates voice, expression, language, context, and character settings. The model not only enables robots to "understand" human emotions but also lets them "learn" to respond warmly, naturally, and believably at the right moments.

3. Fully Self-Developed Bionic Face Hardware: Breaking Through the "Uncanny Valley" with Ultra-Precision Craftsmanship

Human perception of "faces" is extraordinarily acute. The fusiform face area (FFA) in the brain decodes facial information at approximately 200 milliseconds per cycle, processing through three hierarchical levels: basic structural recognition (eye spacing, nose height, face shape), dynamic expression extraction (micro-expressions and muscle activity), and memory matching (familiarity, emotional association).

For a bionic robot's "face" to meet the requirements of all three levels at once and pass human judgment as a face, it must be refined to an extreme degree. AheadForm has developed the entire stack in-house, from underlying materials, skin craftsmanship, and mechanical structure to embedded software and hardware.

AheadForm's relentless pursuit of facial precision leads users' brains to unconsciously judge that the face "looks real," "feels familiar," and "is trustworthy," making it possible for people to empathize with it.

How Bionic Robots Can Deliver Broad Application Value Based on "Emotional Value"

AheadForm proposes: Humanoid Empathy Value — scarce attention assets brought by emotional connection

In an era where artificial intelligence is fully penetrating daily life, large language models (LLMs) are transforming "language" interaction, but what truly reaches people is "emotional" interaction. Compared to voice assistants or virtual conversational interfaces, a humanoid robot with natural expressions and emotional responses can instantly trigger human "emotional" impulses. The reason humanoid robots spark conversation with every appearance is precisely because their "human-like" qualities evoke human empathy.

This phenomenon is defined as: "Humanoid Empathy Value."

Humanoid robots can serve as emotional triggers, a gravitational core capable of creating a "spectator effect" in any public space. This makes them natural central assets in the attention economy. Whether on short-video platforms, in exhibition venues, or at retail stores, and in the future as household and commercial humanoid robots achieve wider adoption and enter millions of homes, the communication efficiency and emotional connection they bring are unmatched by ordinary robots, traditional hardware, or even large-model interfaces.

The attention economy is an economic model centered on competing for user attention. In this model, humanoid robots, by triggering "Humanoid Empathy Value," become the most disseminable and memorable content carriers in the attention market, thereby converting into leverage for traffic, brand, and monetization. And in the current context of information overload and content homogenization, the means of attention acquisition is gradually shifting from "piling up content" toward "building relationships": whoever can convert attention into emotional arousal and sustained empathetic connection will master the future attention entry point, and even the future commercial value entry point.

In today's rapidly developing landscape of artificial intelligence and humanoid robotics, AheadForm hopes to become a foundational pioneer leading the "era of robotic emotional awakening."

5Y Capital seeks out, supports, and inspires lonely entrepreneurs, backing them in everything from morale to day-to-day operations. We believe that when someone begins to believe in the "crazy you" that others see, the world becomes a different place.
