Linverse's Jiawei Gu: Three Funding Rounds in Six Months, Debuting a Next-Gen Smart Hardware OS at CES | Linear Capital Portfolio Company

Linear Capital · January 14, 2025

In the age of AI hardware, "sensing" systems will be rebuilt from the ground up.

The world of smart hardware hasn't seen this much excitement in a long time. At CES 2025, thousands of companies are showcasing AI glasses, robots, wearables, and other products. Amid this dazzling array, we spotted Ling!, an AI device for children, on the CES show floor.

Image: Ling! demonstration

Ling! is a portable AI learning companion designed specifically for children aged 3 to 12.

The demo makes clear that the device aims to give children instant learning opportunities and emotional companionship by taking photos, understanding what it sees, and talking with them, becoming a partner for kids as they explore the world and acquire knowledge.

Image: Ling! demonstration

Behind this product is "Ling Universe" (灵宇宙), a startup that closed three funding rounds in rapid succession within six months of its founding. Founder Jiawei Gu told us that Ling Universe, established in 2023, aims to build a next-generation hardware OS that uses AI to infuse everything with "spirit." For now, the company is entering the market with consumer smart products for the home, and Ling! is one of them.

This isn't limited to CES. Over the past year, a wave of smart hardware in many forms has launched globally: Ray-Ban Meta, Figure, Plaud, Ola Friend, AI Pin, and Rabbit R1, to name a few. Yet before 2022, smart hardware was hardly the hottest trend, and the last attempt to use AI to capture the home's gateway device ended with the smart speaker wars. In Gu's view, these recurring AI "trends" are really successive industry cycles. Having worked in both product development and investment in smart hardware, he has lived through several of them.

Image: Jiawei Gu, founder of "Ling Universe"

Before founding Ling Universe, Gu had already spent a decade in smart hardware. His career began at Microsoft Research. At 28, he joined Baidu's Institute of Deep Learning and was selected for Baidu's "Young Leaders" program, where he led the development of cutting-edge AI products including Baidu Eye, DuBike, and DuLight. After 2016, Gu moved beyond being a pure technologist, dividing his time between investing in robotics at A-share listed company Netposa and serving as CEO of AI startup "Wuling Tech" (物灵科技). During this period straddling investment and entrepreneurship, he first deployed 2 billion RMB into companies including Jibo, the world's best-known companion robot, Rethink Robotics, and Knightscope. Later, through Wuling Tech, he incubated Luka, one of China's first children's picture book robots, which competed in the early education market against a host of early learning devices. To date, Luka has sold nearly 10 million units across 18 countries. In 2023, as the large model wave surged, Gu began his second entrepreneurial chapter by founding Ling Universe.

Image: Jiawei Gu with Ling! at CES 2025

Next to Luka's down-to-earth picture book feature, Gu's earlier AI projects at big tech companies look more like explorations of frontier technology. The best-known example is Baidu Eye, a smart wearable introduced in 2014 and positioned as an AI assistant akin to the one in the film Her. Products like these were conceptually cutting-edge, and most faded from public view after dazzling debuts, seemingly worlds apart from the Luka picture book robot that would later sell nearly 10 million units. But Gu told us that Baidu Eye and DuBike, though products of their era, shaped his later entrepreneurial direction. "Even back then, people could see that smart hardware needed complete perception, decision-making, and control systems, like robots, to achieve true 'intelligence.' And in personal consumer scenarios, implementing perception, decision-making, and control is far more complex," he said. Today, "many companies on the market are focused on the control layer. But judging by results, the perception and decision layers matter more." Gu added that approaching these two layers from text alone limits how intelligent they can become; multimodality is clearly the trend. Using multimodal product design to fully absorb real-world data, so that robots' perception and decision systems become smarter and machines learn to read emotions and situations, has been Gu's persistent goal and is the origin of the name "Ling Universe."

Another difference from the Baidu and first-generation Luka era is that, in this AI cycle driven by large models, Gu sees a real chance to make "perception systems" more precise. More accurate "relationship algorithms" and more powerful "interactive behavior algorithms" are pieces of the technical puzzle in the large model era. Breaking it down further: co-training on multimodal video data can help robots' perception and decision systems better understand user intent, with natural emotional expression and even reasoning emerging from that training. And fine-tuned generative models can automatically generate multimodal temporal tasks, enabling semi-predictive environmental perception that makes robot tasks and actions less unpredictable.

Gu is using this approach to build LingOS, a system that lets machines read emotions and situations. Before reaching this goal, both Ling Universe and the "spatial intelligence" camp led by Fei-Fei Li must solve the problem of real-world perception. Fortunately, Gu has already run the full loop of technology, product, commercialization, and product upgrade with Luka. Facing far greater real-world perception demands this time, Ling Universe has prepared multiple products, aiming to feed data into LingOS while providing robot services to users and making the system more "spiritual." Concretely, Ling Universe recently partnered with Luka to launch a new AI learning companion product, and customers have already completed purchases. The Ling! that Ling Universe brought to CES also runs its self-developed LingOS, which seamlessly connects AI with the real world through 4D spatial interaction (AI Spatial Interaction) to give children immersive, interactive learning experiences.

According to the company, Ling!'s core advantage lies in bringing the idea that "the world is the textbook" into every child's learning process, giving each child an exclusive "super study companion team." This team consists of built-in great historical figures, subject-expert teachers, and virtual IP characters. They let Ling! cover multiple disciplines, including science, biology, English, history, geography, and literature, and they make learning more immersive and interactive through character performances, interactive stories, and light games, so children always have "company" in the real world.

Image: LingMate's built-in AI characters accompany children in learning

Gu gave us an example: AI "Darwin" allows children to have "company" during outdoor exploration.

In real-world settings, this AI Darwin not only teaches biology, identifying more than 8,000 animal species, more than 20,000 plant species, and thousands of flowers and fruit trees, but also asks layered questions using chain-of-thought reasoning.

In effect, it guides children to slow down, observe carefully, explore the various parts and functions of organisms, and think about how biological systems operate.

This approach can build children's foundational thinking abilities, cultivating curiosity and an exploratory spirit.

Image: Ling! demonstration

Gu summed up the core belief behind the product: he is committed to spatial intelligence and wants to make the world the interface, with every object something that can be acted on.

The LingOS system can unify and integrate spatiotemporal data sources including objects, actions, space, and behavior, transforming everything into interactive units within space, truly realizing the concept of "space as learning."

Before this CES, "Simu Relativity" (四木相对论) had a conversation with Gu.

In a conference room filled with smart hardware devices, he explained the disruptive impact of large models on smart hardware, as well as Ling Universe's origins and progress.

The following is our exchange with him:

Part.01

To give machines "spirit," spatial interactive intelligence is essential

1) Simu Relativity: Your new company is called "Ling Universe" — what's the meaning behind this name?

Jiawei Gu: We've always had a direction we've been working toward: exploring the "soul" of machines and AI. "Everything has spirit" (万物有灵) is the philosophy we pursue, and "spirit" (灵) is in our company name.

"Spirit" represents machines "waking up," beginning to interpret human intention, expressing themselves like humans.

Ever since I decided to start a business in 2016, I've wanted to give machines this kind of proactive interaction and spirituality. That hasn't changed.

To achieve this "spirit," we've been working on interpreting and deconstructing the physical world — that is, understanding the physical world.

Take our work at Baidu. When we first built the Xiaodu robot, we turned the human body into an operable space. The Face U face recognition project turned third-person-perspective interaction into an understandable space. And a wearable device that lets machines see from a first-person perspective made that space understandable too.

This machine capability to understand space was later captured by Fei-Fei Li's concept of "spatial intelligence": a comprehensive understanding that goes beyond text and voice to encompass vision, 3D space, and even temporal information, while also enabling interaction with the physical world.

We previously built Luka at Wuling Tech, selling nearly 10 million units and serving 400,000 families, which was also a practice of this philosophy.

Its core was increasing physical world interaction. For example, you could specify reading content through finger movements, and through expressive "big eyes" that changed their gaze, convey reading direction and emotion.

Luka set some useful standards for the industry. Every learning machine, story machine, and early education machine now comes with camera-based interaction. Once we created this category, that feature became the obvious move; everyone recognized its value.

However, Luka still fell short of true "spirituality." Spirituality is a holistic experience and form of expression. If we can take both the understanding of the physical world and human interaction to the next level, machines will gain the capacity for spirituality.

2) Simu Relativity: How specifically should we understand this "next level"?

Jiawei Gu: In terms of outcome, I believe proactive interaction is essential: machines that read emotions and situations, give timely feedback, and build emotional connection, so users can stay in a passive, comfortable state.

As for the pathway, it will likely build on LLMs' improved understanding of intent: collecting all-day data through IoT and sensors, achieving scene intelligence and then spatial intelligence, with Agents automating execution, and ultimately delivering Personal AI, a personalized artificial intelligence for everyone.

From the underlying model structure, this manifests as relationship models between objects, space, actions, and behaviors.

To reach this level, machines need environmental perception and emotional expression: recognizing user intent accurately and quickly, and interacting with users proactively. In robot terms, these are the "perception system" and the "decision system."

In more professional terms, we call these the "relationship model" and "interactive large behavior model."

In the previous era, practitioners doing similar work, myself included, ran into two main challenges.

First, AI capabilities were limited at the time. Every field — CV, NLP, TTS, and related technologies — was a separate tech stack. At the product level, unified multimodality wasn't possible, making it impossible for end products to achieve "reading emotions and situations" intelligence.

Second, a data flywheel has to be built on joint modeling of user behavior and the user's surrounding environment. Without building a complete hardware loop in each field, it's hard to close the data loop, and equally hard to iterate products until they fully penetrate user scenarios and deliver real value.

3) Simu Relativity: So this time, large models make "spirituality" easier to achieve.

Jiawei Gu: Breaking down from the three elements of robots — "perception, decision-making, execution" — may make it easier to understand where "spirituality" comes from.

Perception is the input of information and data, the understanding and recognition of various information in physical space. This includes data modalities such as text, voice, images, video, and so on. Decision-making analyzes and processes perceived content, then issues instructions, forms tasks, and guides robot behavior.

In the previous cycle, each modality (auditory, visual, tactile) had to be perceived separately with CNN or RNN models. Behavior was governed by rule-based mapping (machines following human-defined mappings: question data in, answers out), and decisions were then unified for execution. Efficiency was low and generalization was difficult.

Jibo, which we invested in at the time, hired Broadway writers to script 100,000 dialogue pairs, which were then implemented with rule-based methods.

Now, with large language models, what used to be hand-written rules has been replaced by a Transformer foundation. Once visual data is mixed into model training, machines gain unified multimodal perception as well as emergent emotional expression and output. With large models and fine-tuned generative models, tasks can carry more advanced emotional expression, and multimodal temporal tasks can be generated automatically.

In environmental perception, large language models' overall capability improvement — especially multimodal models — enhances environmental understanding, improving accuracy in predicting user intent. This is the biggest change.

In decision-making, the earlier mechanism built on rule databases has given way to brain-like decision-making by generative models, which ultimately call large model Agents to complete the task.

Below that sit the cerebellum and execution, which is the layer most humanoid robot companies are currently competing on.

We focus more on the upper layers — perception, interaction, and brain-side Agents. This is our biggest difference from other humanoid robot companies.

This mirrors autonomous driving, which was once entirely rule-based, passing through separate perception, decision-making, and control stages, until Tesla adopted an "end-to-end" approach.

AI hardware and embodied intelligence will likely go through similar stages, with a chance to move from intent understanding directly to anticipatory environmental perception and pre-decision-making, or even straight to decision-making.

We define this set of perception-decision model systems as "LingOS." We hope this is a general capability that can be used in multiple vertical scenarios.

4) Simu Relativity: For Ling Universe, what is the moat for doing this now?

Jiawei Gu: Robot "spirituality" must achieve spatial interactive intelligence, and perception is the most important link in achieving spatial interaction. In this link, data accumulation is the most core element.

Consider Tesla's vision-based approach to autonomous driving. In essence, it first builds data on how drivers interact with the real world, then re-annotates that data and adds it to model training.

Replicating this path for embodied intelligence has one major problem: the time cost is too high. Building annotated data for embodied robots requires operators behind the machine to perform each action. And the tasks embodied robots must execute are so varied that it's nearly impossible to enumerate them all for annotation.

So our approach is to deploy enough consumer devices and AI hardware to bring data streams back in a closed loop. There are three types of data streams.

The first is putting real machines out to let them interact with people, completing the data closed loop. This is mainly third-person perspective data — from the robot's perspective, how people react during various tasks and interactions.

The second is first-person-perspective data: humans interacting directly with the physical world, captured from their own viewpoint. This data can be collected through a wearable companion device.

The third is video data used in mixed training, which shows up as emotional expression and a better ability to overcome the cocktail party effect, that is, to pick out one voice amid background conversation.

Beyond these data types, there's also the relationship algorithm, which lets machines build long-term relationships with users, with a memory that "understands you."

Part.02

Focusing on high-value scenarios and vertical hit products

5) Simu Relativity: Compared with previous cycles, where do you think the imagination lies for AI hardware companies now?

Jiawei Gu: At the macro level, as multimodal large models grow significantly stronger, Embodied AI spilling over to the hardware side, meaning intelligent human-computer interaction, will take over from language models as the most important direction for deploying AI. General embodied intelligence (EAI) entering the home is a certain opportunity.

Physical AI, represented by humanoid robots, has drawn capital's attention. But while the whole market pours money into the capability layer of humanoid robots, which still cannot meet home needs or reach low costs, a window has opened for the spatial interaction layer. Here, there is a chance to pull ahead first on interaction and scenario-specific capabilities.

Historically, many AI hardware products have appeared, but most have failed to become the next-generation interactive computing device in the home.

Robot vacuums, for example, are currently a hot category. But robot vacuums never interact with people; they basically come out to work after people leave. So they never collect data from human interaction. Smart speakers had similar problems.

Robot vacuums handle only a small share of the work of a housekeeper who costs 5,000 RMB. Homes also pay for pricier services that AI products could replace. Hiring a nanny or a maternity nurse costs much more than hiring cleaning staff, and tutors cost even more. So the ceiling in education, where we operate, is very high.

As technology keeps improving, these scenarios, which until now simply lacked good products, have a chance to scale.

Technology has clearly reached the next stage. But many products we've tried are still too slow and not capable enough to handle complex tasks.

Overall, the market still needs multimodal products. People are tired enough in the physical world already; requiring them to keep issuing voice commands sets too high a bar and demands too much energy.

6) Simu Relativity: From Luka to Ling Universe, you've consistently focused on education. Why do you keep choosing this direction?

Jiawei Gu: I previously invested in Jibo. After deploying it in some education scenarios, I found it worked well for English practice and dialogue. Then I found it also worked well for reading picture books and interacting, that is, for desktop education in the physical world. So I focused Luka there.

Luka also had its own innovations. We essentially defined a multimodal, vision-camera-equipped interactive robot, which didn't exist before. All subsequent early education machines, learning machines, can basically be considered copies of Luka.

It was popular at the time because we were determined to enter markets with clear scenarios and sufficiently high-frequency demand. Take storytelling: we concluded that one thing hasn't changed in thousands of years of human history. Women went out to gather fruit, men went out to hunt, and everyone came back to tell the children stories. That hasn't changed.

The education track has an extremely high ceiling and is richer in interaction than most. This is especially true in China's education market, which gave rise to learning machines.

In short, without focus, a project will struggle to become a hit. We narrowed Jibo down to Luka; many companies narrowed Jibo down to smart speakers, and all of them gradually found their footing.

Technology needs clear boundaries and time to iterate, and ultimately it has to be worked through thoroughly in specific verticals.

When we built the Baidu smart bicycle, we also said it could carry groceries, walk dogs, and haul things, but its actual function was simply balancing and following. In that era, training separate CNN and RNN models was very inefficient, and decision-making was extremely difficult. The smart bicycle project inspired later products like self-balancing scooters, quadruped robot dogs, and golf caddie carts. Each of these later found its footing, which is why focus is still necessary.

In some verticals, a single scenario can justify a standalone hit product. With AI in particular, a good piece of hardware can carry the full set of AI capabilities and deliver 10x the experience of traditional devices.

Take translation as an example. Before mobile phones, we had to look words up in the Xinhua Dictionary; searching on Google later cut that time by 10x.

With our Ling!, during a teacher's lesson the device not only translates but also extrapolates to offer further suggestions and knowledge, even removing the need to take notes. For users, this could be a 10x experience.

7) Simu Relativity: From the product and scenario perspective, what is Ling Universe's current plan?

Jiawei Gu: Currently, our product line focuses on home scenarios, with different product plans for different age stages — not just children's products.

For preschool children, we have Luka-type story machines or picture book reading robots. For teenagers, we also have exploratory companion robot products, eventually covering all age groups.

Once there is enough data, products can gain capabilities such as recognizing children's emotions and reasoning proactively. Then the product becomes like a very understanding assistant, or a nanny who can read emotions and situations.

So our product experience becomes: reading emotions and situations, timely feedback, and emotional connection. A machine's interactive experience can keep people in a passive but very comfortable state. This is the direction we're ultimately heading.

8) Simu Relativity: What is Ling Universe's product and market rhythm?

Jiawei Gu: The algorithms and technology at Ling Universe's core are things I've worked on for many years, and I value LingOS highly. On the device side, I'd most like the main control device to be our own, but we can also power products from a range of partners.

At this stage, I care more about whether new products can deliver stable results in our existing markets. Current models aren't mature enough to deliver an experience that exceeds expectations, but if expectations are managed, there's an opportunity.

As for first-person-perspective data, we first need to polish the software experience before user interaction yields meaningful data.

If the software hasn't reached that level of interaction, experience and data remain disconnected rather than continuous. In that case, even if the product's interaction already has value and can go to market, the data it generates won't be rich enough.

About Linear Capital

Linear Capital is an early-stage investment firm focused on "frontier technology + industry": applying frontier technologies, such as data intelligence, digital new infrastructure, next-generation robotics, and new technologies transforming traditional fields (biomedicine, materials, energy, and others), across vertical industries to substantially improve efficiency, solve pain points, and upgrade industries, earning excess returns from the resulting gains in industrial value. It currently manages ten funds with total assets under management of approximately $2 billion.

We mainly lead angel through Series A rounds, investing $1 million to $10 million (or the RMB equivalent) per project.

To date, we have invested in over 120 early-stage teams including Horizon Robotics, Kujiale, Sensors Data, Tezign, Rokid, Guandata, and Agile Robots. The combined valuation of Linear's portfolio companies is approximately $20 billion. In the near term, Linear Capital is working to become the best "Data Intelligence Technology Fund," and in the long term, gradually build itself into the most influential "Frontier Technology Application Fund."