Before Dominating 618, We Talked to Gu Jiawei About "Ling Universe" and Its Philosophy of Deep Interaction | Linear Voice

Linear Capital · June 26, 2025

Redefining the way humans connect with the world.

This year's 618 shopping festival saw a dark horse emerge in the AI hardware sales rankings — LING Universe's "AI Learning Companion Cube" topped the AI toy bestseller list for over ten consecutive days, with livestream sales exceeding one million RMB in just ten minutes. Earlier, this AI hardware terminal also won the 2025 German Red Dot Design Award.

Before the Cube's 618 dominance, Linear Capital sat down with founder Jiawei Gu for an in-depth interview, where he laid out the underlying logic behind LING Universe. His core belief: the real interaction revolution isn't about making devices smarter, but making them "disappear" into everyday life. The "AI Learning Companion Cube" is just a promising start — LING Universe ultimately aims to "redefine how humans connect with the world."

This 618, the AI hardware sales charts became one of the most closely watched rankings. Among them was a fresh dark horse — the recently launched LING Universe "AI Learning Companion Cube," which topped the AI toy bestseller list for over ten straight days and cleared one million RMB in livestream sales within ten minutes. Earlier, this AI hardware terminal had also captured the 2025 German Red Dot Design Award.

At a time when commercialization of AI applications remains hotly debated, LING Universe's eye-catching 618 sales figures serve as a shot in the arm, demonstrating the boundless potential of consumer hardware supercharged by AI.

But for LING Universe founder Jiawei Gu, this isn't his first "hit product." A serial entrepreneur and "technical prodigy," he was selected for Baidu's "Young Marshal Program" at age 28, where he played a key role in major innovation projects including BaiduEye and DuBike. He later founded Ling Technology, whose flagship product, the Luka children's picture-book reading robot, sold nearly ten million units globally. As an investor, he also backed some of the world's best-known companion robots, including Jibo, Rethink Robotics, and Knightscope.

Having ridden the cycles of consumer hardware, Gu launched his next venture in 2023. LING Universe raised three rounds within six months from Linear Capital, Yaotu Capital, Tsinghua SEE Fund, Yinxingu, SenseTime, 37 Interactive Entertainment, and Xueda Education, among other institutional and strategic investors, with its latest round currently closing with several top-tier investment firms.

Before the Cube's 618 triumph, Linear Capital spoke with Gu in depth about the foundational thinking behind LING Universe. For him, the true interaction revolution isn't making devices smarter — it's making them "disappear" into daily life. The "AI Learning Companion Cube" is merely a solid beginning; LING Universe's future ambition is to "redefine how humans connect with the world."

Humans have lived alongside tools for millions of years, with waves of innovation constantly propelling civilization forward. From bodily extensions in the Stone Age to powered augmentation in the Industrial Revolution, AI is now driving a third qualitative leap — we are fortunate to witness the intelligent era's transition from "using tools" to "partnering with them."

Whether at Microsoft Research Asia, Baidu's IDL Deep Learning Institute, or as an entrepreneur immersed in industry, my focus on human-machine interaction has always circled back to one question: over the next decade, how can machines proactively adapt to humans, rather than humans accommodating technology? The breakthrough in large language model AI provides the answer: when AI possesses long- and short-term memory, proactive reasoning, and human-like interaction capabilities, the underlying logic of human-machine relationships is being restructured — this is the era-defining context from which LING Universe was born. The technological leap beginning with large models has opened an entirely new world. Today, the connections between humans and AI, and between humans and the physical world, are undergoing fundamental transformation — a shift that will completely overturn the interaction paradigms we know.

Through two years of close observation and research into large model technology, we have identified a significant trend. In the generative AI era, we are evolving from the previous "recommendation algorithm" to a new generation of "relationship algorithms," triggering extreme personalization in content production:

This is not merely a matter of upgrading past user-profile tags into highly precise portraits built on long-term 4D spatiotemporal sequences (time, space, behavior, emotion), so that the AI "knows you better" and content is "created just for you in this moment." For example, LING Universe's AI learning companion can remember a plant question a child asked at the park three days ago and proactively extend the knowledge chain when the child passes a similar scene today. This "memory-association-inspiration" interaction model goes beyond the passive service of traditional "keyword matching."

It further incorporates Agentic AI interactive agents with networked relationship-chain attributes, achieving de-platformization of interaction carriers. Traditional PUGC content platforms dependent on creator ecosystems are declining, while AIGC technology enables "content democratization" — tech companies can directly generate scenario-adaptive interactive content through Agents. For instance, "Darwin" in LING Universe's AI Cube can transform a leaf before your eyes into a science education script in real time. This "scenario-content-interaction" closed loop makes the AI hardware terminal an independent content generation and interaction center, rather than a vassal of platform traffic. The leap in interaction experience and the technological democratization of content provide human-machine interaction companies with the opportunity to recalibrate the "interaction-content" leverage — a new AIGC interactive content platform is rising.

I hope to lead LING Universe in seizing this opportunity: letting AI accompany the digital natives of the intelligent era as they grow. AI and robotics technology will profoundly transform how the next generation interacts with content, services, and the physical world. Through LING Universe's technological and product innovations, we aim to create meaningful change for this new era. Leveraging Agentic AI (endowing AI learning companions with "character personalities") and Physical AI (transforming the physical world into an "interactive knowledge base" through visual recognition, voice interaction, and spatial perception), we break the boundaries between virtual and reality, allowing users to naturally interact with AI agents in real-world scenarios, activating knowledge and experiences in the physical world, so that AI truly becomes a companion for users to explore the world and learn, reshaping how people connect with content and services, and how they connect with the physical world.

Bringing the storytelling AI from screens into the physical world means making AI a "sensory extension" for children exploring their surroundings. Just as Luka evolved from "reading thousands of books" to the Cube's "traveling thousands of miles," the essence is an upgrade from "knowledge delivery" to "cognitive co-creation."

LING Universe's core goal is to become the "operating system" for the next generation's connection to the world.

As AI technology leaps from "tool attributes" to "companion attributes," LING Universe is using AgentOS as its fulcrum to pry open the fourth revolution in human-machine interaction — shifting from "humans adapting to machines" to "devices adapting to humans" and "machines understanding the real world," ultimately achieving Physical AI's vision of "animating all things."

We are a company building an interaction OS, "defining the rules of human-machine interaction," not manufacturing hardware devices — this is LING Universe's goal. Through the "Physical World AI-OS," we are constructing an ecosystem analogous to iOS/Android: the 4D spatiotemporal intelligent interaction operating system LingOS — hardware is the vessel, interaction is the soul, data is the fuel. Specifically:

Spatial modeling (3D environmental understanding): Through cameras and sensors identifying objects (plants, buildings, etc.), constructing a real-time "world knowledge base." For example, when the Cube photographs a flower, it automatically associates botanical knowledge and generates interactive questions, transforming real-world scenes into "interactive textbooks."
Temporal sequences (growth memory accumulation): Building on tens of millions of children's interaction data accumulated from the Luka era, LingOS records users' cognitive trajectories from toddlerhood through adolescence. If a child asked "how do ants move house" three days ago, the system will proactively push extended experiment protocols when passing an ant colony today, forming a closed loop of "historical memory-real-world scenario-knowledge expansion."
Behavioral prediction (proactive interaction triggers): Unlike traditional voice assistants' passive responses, Agentic agents (such as Li Bai, Einstein) initiate interactions based on user state. For example, when a child is sketching in the park, the "Li Bai" character may interject via FaceTime, improvising poetry inspired by the scenery and guiding creative expression — achieving proactive "interaction finding people" service.
Emotional resonance (relationship algorithm-driven): Through 4D spatiotemporal data training of "relationship algorithms," AI is no longer a tool but a companion with "character personality." The "Baize" character can identify subdued emotions in a child's tone and proactively initiate story-sharing or emotional guidance — this "emotion-behavior" associative response transcends the tag-based logic of traditional recommendation algorithms.
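
As an illustrative aside, the "historical memory-real-world scenario-knowledge expansion" loop described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not LingOS's actual implementation: the names `Memory`, `MemoryStore`, and `proactive_prompt`, the string scene tags, and the 30-day recall window are all hypothetical, and a real system would get scene tags from a vision model rather than from hand-written strings.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Memory:
    """One remembered question, tagged with the scene it was asked in."""
    asked_at: datetime
    scene_tag: str   # hypothetical label, e.g. "ant_colony"
    question: str


class MemoryStore:
    """Hypothetical in-memory store of a child's past questions."""

    def __init__(self, max_age: timedelta = timedelta(days=30)) -> None:
        self.max_age = max_age  # assumed recall window
        self._memories: List[Memory] = []

    def remember(self, memory: Memory) -> None:
        self._memories.append(memory)

    def recall(self, scene_tag: str, now: datetime) -> List[Memory]:
        """Return memories for this scene within the window, newest first."""
        hits = [
            m for m in self._memories
            if m.scene_tag == scene_tag
            and timedelta(0) <= now - m.asked_at <= self.max_age
        ]
        return sorted(hits, key=lambda m: m.asked_at, reverse=True)


def proactive_prompt(store: MemoryStore, scene_tag: str,
                     now: datetime) -> Optional[str]:
    """Build a follow-up prompt if the current scene matches a past question."""
    hits = store.recall(scene_tag, now)
    if not hits:
        return None  # nothing remembered for this scene: stay quiet
    latest = hits[0]
    days = (now - latest.asked_at).days
    return (f'{days} days ago you asked: "{latest.question}" '
            f'Want to try an experiment about it?')


store = MemoryStore()
store.remember(Memory(datetime(2025, 6, 23, 9, 0), "ant_colony",
                      "How do ants move house?"))
print(proactive_prompt(store, "ant_colony", datetime(2025, 6, 26, 10, 0)))
```

The design choice worth noting is that the trigger comes from the scene rather than from a user query: the device only looks up memories when it recognizes a place or object the child has asked about before, which is what distinguishes this from keyword-matched, on-demand retrieval.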

We are enabling "animating objects with spirit" technology, making the physical world the interaction interface. The core logic lies in making physical entities themselves intelligent carriers — users need no intermediary devices like phones or computers, but interact naturally with objects in their environment (picture books, toys, furniture). From Luka's initial screen-free touch technology creating picture-book reading and desktop interaction scenarios on tables, to the Ling! Cube transforming the world into a classroom, this "device-free" interaction experience essentially converts the entire physical world into an interactive interface, realizing the vision of "the world is the classroom, the world is the textbook."

AI will continuously attend to the physical world humans see, whether through always-on or instant-on devices that record what happens around people each day. In the near term, though, what is more implementable and commercially viable are products like ours, with cameras and high-density interactive information input: like a "parrot on your shoulder," recording life from a first-person perspective.

Today AI interacts with users through search boxes, through dialogue — but these are transitional stages. In the Agentic AI era, I believe AI's interaction with users should resemble "Facebook's form" — AgentOS's interactive objects are the agent universe constructed by multiple intelligent agents, and the content universe constructed by feeds of content co-created by these agents and users, together building the spirited LING Universe.

Before AI, even excellent recordings offered no way to interact with this real spatial data. Now the data can drive real-time interactive scenarios. For instance, our product has a character, "Li Bai," who initiates a FaceTime video call with you, sees your world from a first-person perspective, and composes poetry, exploring and continuously interacting with you directly in the physical world.

Before the Agentic AI era, efficiency gains relied on upgrading existing devices with AI. But these AI-enhanced PCs, phones, and other devices can no longer satisfy the demand for "machines adapting to humans," so opportunities for AI-native new species will clearly emerge. Each new generation of interaction modality enters the market through younger age groups than the one before. So I believe that to find the right entry point for AI-native independent computing terminals at this stage, you shouldn't try to replace phones on day one, because parents instinctively resist children accessing phones too early.

For the generation born coexisting with AI, conversing with AI characters is as natural as chatting with real people. LING Universe must build infrastructure for this "new social intuition," so that every physical scenario can activate intelligent companions and every exploration becomes a journey of "making new friends." We aim to enter the market with the first AI-OS device built exclusively for AI-native users, providing "instant scenario-based interaction" and continuously generating interaction data through hardware.

The "holy grail" of human-machine interaction is finding the next interaction paradigm after phones. We all hope to find the next independent computing device terminal comparable to phones, but after all these years, nothing has replaced GUI (Graphical User Interface) as the mainstream interaction paradigm. Thus, phones' central position remains unshakeable. I believe that for perhaps the next 5-10 years, phones' central device position will still be difficult to displace — the previous wave of smart glasses companies often talked about replacing phones, but in reality they remain hard-pressed to escape their role as phone accessories.

Therefore, our starting point of thinking isn't to build a piece of hardware; the core entry point is defining hardware through software interaction. Our approach is to first define scenarios, define software, then see how to embed AI-human interaction into scenarios. Through scenario-driven interaction innovation, we are shaping the LING Universe Cube into an all-day learning vehicle.

Consider how learning has evolved: at first it relied on books, and only in the browser era did somewhat more convenient information retrieval emerge, though it still required people to filter the information themselves. With different Agents initiating interaction proactively, the world itself becomes the interaction interface: AI operates in physical spaces, activating multi-role intelligent agents across life scenarios and providing personalized learning and interactive experiences.

Steve Jobs shared something at Stanford forty years ago that deeply moved me. Speaking of wanting to build computers, he gave an example: we may have seen Aristotle's books, but when we have questions while reading and wish to discuss them with the sage, how can we get his feedback? Is there some way for contemporary people to dialogue with Aristotle? Jobs was already envisioning this in that era.

The Ling! Cube's AgentOS is built around proactive interaction: it lets digital-native children of the intelligent era converse with different Agents, finding different people to help answer their questions. I believe "finding people to solve problems" is first-principles thinking. One way to find people is the way earlier communication software or Facebook did it, reaching real people through the so-called six degrees of separation.

I believe AI-human interaction is about finding suitable, interesting, familiar multiple roles to solve problems — for example, when a child visits Yue Fei Temple, the Yue Fei character activates to personally recount historical anecdotes; in a community garden, a Darwin character may activate to discuss biological knowledge based on saplings the child photographs. We've built many gamified design elements into the product, activating AI Agent characters based on location scenarios, referencing the former Pokémon Go gameplay.

We are reconstructing interaction patterns through a network of AI Agents, integrating "finding people" with "getting things done" so that interaction aligns more closely with human intuition and actual needs. We are attempting to construct this Agent network so that native Agents do the work formerly done by apps. Looking back at my 2014 concept, "Apps are dead, intelligence is eternal": in the mobile internet era, users needed to download numerous apps to meet segmented needs; in the AI era, intelligent agents should proactively provide services through natural language and scenario perception. There is no need to open a "weather app," because AI will remind you to bring an umbrella based on your outfit when leaving home; no need to search for translation tools, because AI Agent characters will interpret foreign signage in real time. This "de-appification" of interaction essentially returns technology to its original intention of "serving people," rather than making people adapt to technology's rules.

We've all recently been following World Labs, founded by Fei-Fei Li. Early in my time at Baidu Research, I worked alongside Kai Yu and Andrew Ng, and also had extensive interactions with Fei-Fei Li, since Baidu's US research arm engaged considerably with the Stanford AI Lab. Today many technologies are reinterpreting future world-understanding capabilities through 3D mapping, presenting human perception of the world graphically in gaming, MR/XR, robotics, and other fields. They are redefining the ability to understand the world by optimizing existing methods rather than relying solely on uncontrollable technological emergence.

First, accumulated cognition and practice in spatial interaction intelligence AI. BaiduEye at the time was about a second brain, a third eye. Of course the software algorithms running on the hardware devices weren't mature enough then, and the previous AI era had no large models capable of proactive interaction — yet even so, we used CV small models to define many interesting BaiduEye functions for museum, shopping mall, and other scenarios.

Benefiting from the new opportunities brought by today's large models, I insist on choosing an iteration path based on the next generation's first-person perspective. This connects closely to my early experiments at Baidu, where BaiduEye's core thinking explored finding a way to accompany people's lives with high-density first-person perspective, introducing first-person "live" data.

Over a decade ago, Mobileye co-founder Ziv Aviram, after building large-scale physical space data for Tesla, created OrCam, a device mounted on the temple of a pair of eyeglasses as a visual aid for blind and visually impaired people, building human-scale living-space data. This idea deeply influenced me at the time. Nobody was talking about "scaling laws" back then, yet the idea that intelligence emerges from data was already deeply planted in my mind.

These experiences more or less laid groundwork for Luka's later birth. People often ask about our inspiration for LING Universe — it all stems from long-term accumulated experience. From BaiduEye to the LING Universe Cube over a decade later, both originate from processing data from human-homologous first-person perspective vision, hearing, voice, and other senses, obtaining fine-grained information anytime, anywhere. From pocket-in to instant-on to always-on, the multi-role intelligent agents in the Cube can see the world the child sees, becoming the user's second brain, third eye, continuously acquiring real-time interaction data.

Second, finding our users and building applications with genuine PMF. In business, over the years I've learned to look at problems from different angles, observing how different companies think, which has brought me much insight. For example, Facebook's two attempts to acquire Snapchat clearly showed the threat the latter posed — it had captured young users. This wave of AI development is penetrating very fast among young people. The younger the child, the less learning cost and habit baggage, the more easily they accept new technology. From touchscreens to voice interaction, then to large models and Agents, moving increasingly closer to humanity's most natural interaction modalities — this is an inevitable trend.

However, as a serial entrepreneur today, I maintain cautious optimism rather than blind optimism when choosing a sector. When judging a technology's prospects, I look at two dimensions: whether it targets a vertical population, and whether it explores a blind spot the major companies haven't yet noticed (as Snapchat did when it first emerged and shook Facebook). I believe combining the two is a relatively ideal entry approach for startups.

Every generation of products, from information products to intelligent terminals, starts in younger demographic markets. In the coming years, major disruptive innovations among digital natives (those born after 2010, the Alpha generation) will create significant new opportunities, because this demographic readily accepts new interactions and may in turn even influence mainstream populations.

Additionally, there are two other important reasons for starting from the next generation's young population. First, moats. In China, the barrier to entry for hardware like AI glasses isn't low, but phone manufacturers may also follow suit. As a startup team, we've already accumulated very strong hardware-software integrated experience through the previous cycle. I believe the core moat comes from software — vertical data accumulated from vertical populations (text language, spatial images, growth trajectories), data continuously internalized into LingOS and iterating. For example, interacting with children presents unique technical challenges based on data iteration; safety guardrails also differ from ordinary AI products; AI Agents at the content layer serve as our most direct content distribution carriers, accumulating more real interaction data and interesting content with children. These are vertical capabilities that general-purpose large models don't possess.

Second, considering user payment and corresponding business models. Education itself is investment, not pure consumption. Parents have sufficiently strong motivation to pay for education — compared to past cramming-style "exam-oriented education," today children can already learn through play, through interaction, even in outdoor scenarios. What parents are paying for isn't a "toy." Some funded AI toys on the market fail to articulate the core value of parents paying and children using. Compared to various interest classes, online courses, and other education products with clear payment points, today's AI Agents deliver more than 10x the experience, but at perhaps 1/10 the price. Speaking of which, let me mention our business model: hardware is just the first step, with substantial ongoing revenue from software and content fees to follow. This is why we must deliver the best experience.

In summary, I believe the Cube isn't a function-based product but a container: through this interface, anyone can interact with characters actively or passively, and the AI will in turn interact with you. It leads you to find "people" based on time and space. We've built a middle platform for this layer of capability, which we call character creation, world creation, and memory creation.

My ultimate vision is to build a home robot that "truly understands humans" — this requires solving the core pain point in current robotics: achieving "words become law," i.e., precisely understanding intent and responding instantly. But at present, we remain far from this goal, with the core bottleneck being the lack of Physical AI embodied physical intelligence data.

Our long-term strategy is to build the "intelligence cornerstone" for home robots, constructing a spatial interaction dataset covering real-life scenarios to provide core "intelligence components" for home robot R&D.

I've always spoken of "animating all things." We are still far from that goal, and the core reason is data scarcity. Long-term, what I want to advance is moving data input to the next stage, building up the Physical AI dataset and creating AI that truly understands humans. The path there is very long, and different companies are working on it with different approaches, which is also the fun of entrepreneurship. I have always believed that for AI to understand humans, it must be built on real interaction data; real data from vertical populations will be our increasingly deep moat.

Our short-term strategy is to scale portable Personal AI terminals, closing the human-machine interaction loop and accumulating data. To reach our long-term goals, we need an entry point that combines data collection with commercial validation. That is the Ling! Cube we are advancing now: a hit "personal AI companion" product that achieves high-frequency use among Alpha-generation youth and accumulates "human-physical world interaction" data in real time, such as the questions asked when scanning plants or exploration behaviors in museums, forming a human-in-the-loop cycle of "user need-AI response-data feedback."

Short-term goals aren't just sales volume, but validating technical feasibility through scaled users. Similar to how Tesla first scaled electric vehicles to accumulate real driving data, we need the Ling! Cube to scale rapidly and commercially in order to capture tens of millions of user behaviors, providing training material for the scenario-understanding capabilities of future home robots.

Avoiding the red ocean of general-purpose humanoid robots and focusing on differentiation is my thinking. We explicitly do not engage in motion control technologies (such as robotic arm manipulation, bipedal locomotion control R&D) — once such hardware technologies mature, like large model capability overflow, the industry will rapidly share results. Our core target moat is embodied brain intelligence of "letting machines understand spaces, understand humans," focusing on interaction perception and spatial intelligence — this is the key to whether home robots can have "soul."

Home robot R&D is destined to be a decade-long journey of perseverance; the industry has various paths: some focus on hardware form innovation, others on algorithm breakthroughs. But we firmly believe that "human-machine interaction data" in real scenarios is the fuel toward AGI. The short-term Cube is a stepping stone, the long-term home robot is the destination, and every step in between accumulates puzzle pieces for "giving machines the spirituality to understand humans."

As I often say, the joy of entrepreneurship lies in choosing a path few others take, then walking it with conviction.