Robot Vacuums, AI Glasses, Companion Robots... Is This Generation of AI Hardware Actually Any Good? | 5Y Tavern Vol. 25 [Podcast]

5Y Capital · February 12, 2025

2025: The Evolution, Belief, and Expectations of AI Hardware.

This episode of the 5Y Tavern podcast explores AI hardware. Our guests dive into today's hottest AI hardware products, sharing their observations on product and technology trends, and which directions look promising for 2025.

You'll hear about everything from robot vacuums with robotic arms to AI glasses, as they analyze the market potential of existing products, examine technical bottlenecks, and imagine future trajectories. They also discuss the fusion of emotion and functionality in robotics, and how technological innovation and product design can meet users' emotional needs. We hope this episode offers you some fresh perspectives.

Guests:

Di Zhang, Founder of Benmo Technology

Xing Meng, Partner at 5Y Capital

Host:

Kaiyan He, Investor at 5Y Capital

Selected excerpts from the podcast:

What are your new expectations for smart hardware in 2025?

Kaiyan He: What are you looking forward to in 2025?

Xing Meng: I see several major variables shaping the industry. The most important direction is definitely agent capabilities: an AI that can use various tools to solve complex, multi-step problems. The expanding range of things such agents can do is what we're most excited about. This applies to smart hardware too. Take a robot vacuum with a robotic arm: simply identifying an obstacle and moving it aside isn't that complex. But judging what kind of item it is and deciding whether it goes in the trash or the laundry basket involves more sophisticated long-chain reasoning. I think this could happen this year, though expectations may be running high. Software progress is still incremental, and porting it to hardware will likely take at least another year.

Also, the robot vacuum market is massive, with annual shipments between 10 and 20 million units. Even if high-end models with robotic arms only capture 1% to 5% market share, that's potentially 100,000 to 500,000 units. While not as large as the industrial robotic arm market, these vacuums still contain key components like motors, reducers, and controllers. Once volumes scale up, what breakthrough innovations might emerge in the supply chain? Will these drive cost reductions and open new scenarios? That's another expectation I have for this year. I'd also love to hear Di's thoughts.
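The market-sizing arithmetic above can be checked with a short sketch. The 10 to 20 million shipment range and the 1% to 5% share band come from the discussion; the function name and the choice to evaluate both ends of the shipment range are illustrative assumptions. The quoted 100,000 to 500,000 figure matches the 10 million end of the range; at 20 million units the same shares would give 200,000 to 1,000,000.

```python
def premium_unit_range(annual_shipments, low_share_pct, high_share_pct):
    """Return (low, high) unit counts for a market-share band given in whole percent.

    Hypothetical helper for illustration only; not taken from the podcast.
    """
    # Integer arithmetic keeps the unit counts exact.
    low = annual_shipments * low_share_pct // 100
    high = annual_shipments * high_share_pct // 100
    return low, high


# Evaluate the 1%-5% share band at both ends of the quoted shipment range.
for shipments in (10_000_000, 20_000_000):
    low, high = premium_unit_range(shipments, 1, 5)
    print(f"{shipments:,} shipped -> {low:,} to {high:,} arm-equipped units")
```

Even at the low end, the sketch suggests volumes in the hundreds of thousands, which is the scale at which the speaker expects supply-chain effects on motors, reducers, and controllers to show up.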

Di Zhang: Both points are fascinating. Our company has been investing in robotics for the long term, and we're particularly excited about this space. On the algorithm side, people talk about emergence and generalization. When you reverse-engineer that to the product side, you can see many robot categories also possess generalization capabilities. For example, traditional robot vacuums with minor modifications can deliver greater value in other scenarios — very similar to algorithmic generalization. Whether it's a front-mounted robotic arm or a rear add-on module, we're seeing new product forms emerge in home cleaning robots. These forms transcend single-function limitations and can flexibly adapt across multiple scenarios. That's what we're anticipating in robotics for 2025.

Xing Meng: There's been ongoing discussion about whether to build on a good chassis with modular functions, or to develop a general-purpose device like a robotic arm without modular add-ons. The first direction leans toward a chassis — perhaps a universal upper platform, but the complete machine might take a specialized form. The second direction is a fully general-purpose device from top to bottom. Which form is more likely to achieve mass production? If mass scale means 5 million units or more, what's your view?

Di Zhang: I think it's too early to tell. Upper-mounted robotic arms might push the supply chain to deliver better solutions — that's a potential opportunity. As for which functional module constitutes the true minimum viable product, there's no clarity yet. So both paths face challenges. One offers better generality, but actual functional performance and supply chain compatibility remain to be tested. The other would deliver excellent functional experience since it uses mature modules, but the critical question is: what do you mount? This requires more precise product definition and careful analysis of target demographics. Whoever solves these two key problems will likely gain more customers and shipment volume in their respective domain. It's still an open question, but the current discussion has already advanced beyond where we were before.

Kaiyan He: I'd like to share my own expectations for 2025. We see several major directions for product-level robotics innovation. First, the emergence of new technologies and the opportunities that "change" creates. AI is the obvious example today — as Meng mentioned, more intelligent agents. These underlying technologies drive new scenarios and products. In 2023 we had Rabbit; 2024 brought smart glasses; 2025 will certainly see new categories tied to this major technology cycle. They may not immediately achieve product-market fit, but they represent emerging product directions.

Second, opportunities around what stays constant. As Di noted, robotics and AI technology manifest gradually in products. There are iterative improvements every year, new breakthroughs in existing scenarios and needs. Robot vacuums, lawn mowing robots, even companion robots — these categories will endure. Consumer robots in cognitive and emotional domains may score only 20 or 30 points in experience today, but I believe they'll improve — user experience and product experience will get better.

Finally, the infrastructure needed for robots to become smarter — supply chains, software modules, even data services. These underlying technical capabilities, supply chain capabilities, and infrastructure capabilities are gradually advancing and creating new innovation opportunities.

These elements — consumer demand, technological development, supply chain progress, and product iteration — converging together create the acceleration we're so excited about in consumer electronics and robotics.

Why glasses could become a major category

Kaiyan He: What's the current experience with AI glasses? What's your long-term outlook for the AI glasses hardware and software technology stack?

Di Zhang: AI glasses can be divided into two categories: display-based and recording-based. Display-based includes head-mounted displays (HMDs) — Apple's Vision Pro is essentially a display-based product showcasing its AI capabilities. Many brands are now focusing on simply doing the display function well.

But display-based products may be progressing more slowly than recording-based, or capture-and-share, products. The category leader is Ray-Ban Meta. While it benefits from the Ray-Ban collaboration and the associated brand premium, Meta genuinely delivers on the photo-sharing and recording experience. Its glasses capture an almost perfect first-person perspective, and the data captured from this viewpoint holds enormous potential for AI innovation. I'm particularly interested in this type of AI glasses product: one that carries a brand premium, nails its core recording function, and integrates practical large model capabilities.

Kaiyan He: I've tried Ray-Ban Meta myself. Shipments have already exceeded one million units. As mentioned, it identified a user need: first-person perspective capture. While the shooting experience isn't perfect yet, Meta found a balance between price and photo quality. I'd also like to hear Meng's perspective. Why did Chinese manufacturers rush to follow suit after Ray-Ban Meta's launch? Does this indicate low barriers to entry in this industry? Is this track suitable for startups in the long run? And glasses are so close to phones that they could easily become phone peripherals. How do you view this?

Xing Meng: This has been debated for a long time. The mainstream view on glasses, especially AI glasses, is that when good enough, they could become the next generation of phones. Because they're closest to the eyes — the highest-density information channel — and their interaction methods are more natural, closer to normal human gestures. Though not there yet, theoretically, if executed well enough, it would be a massive opportunity.

With an opportunity this large, is it for startups or for large companies? How does anyone find a breakout product? I've bought everything in this category, from the earliest Google Glass to later prism-style designs. The category then split in two: display-focused products on one side, and products that interact with the environment, gathering and outputting information, on the other. The capture-focused path had Snap as an early practitioner. Its Spectacles product was quite similar to Ray-Ban Meta, but about 8 years earlier. Its marketing was successful too: it was initially sold through pop-up stores and was extremely popular at first, then faded. If you rewound a year and didn't know the sales numbers, most people would probably have predicted that the Ray-Ban Meta collaboration would flop like Spectacles. But at least in sales and cultural penetration, it succeeded, though I believe this has a lot to do with Meta's paid traffic and advertising. I spent considerable time in the US last year, and about every third TikTok video, I'd see a Meta ad. The danger is that you might misread how it took off. If you build a similar product, you likely lack the budget or capability to buy traffic at that level and reach comparable scale.

Last year's Halliday Glasses used a small display, functionally harking back to early Google Glass with a retro feel. The design de-emphasized spatial positioning and SLAM, adding only some large model interaction features, which was an interesting shift. For a long time, people assumed AI glasses would progress from prism-style, text-only, narrow-field-of-view displays, to wide-field-of-view, information-rich, full-color displays, and then to spatial positioning and interactive capabilities. But with large models, spatial positioning was de-emphasized while language interaction was amplified and paired with a traditional simple display, and this direction quickly attracted followers. Yet as the product gets defined this way, I don't see many directions for differentiation. Going forward, this will be difficult for startups. The only opportunity might be niche models for vertical segments: technically there may be little differentiation, but in terms of user perception the differentiation can be significant.

To push this further, if a startup truly seized this opportunity and really went crazy, what should the product ultimately look like? Just thinking out loud: why might glasses themselves become a major category? Because they sit directly on the eyes, the largest window for human information input; everything you see goes through your eyes. But if their output capability is limited to a display, that may be too thin. If a startup wanted to take on a harder but more promising direction, perhaps it could combine glasses with an exoskeleton. In this combination, output wouldn't be limited to a display but would be integrated with body movement, covering more physical behaviors. This is a more complex system, harder to build, and companies would have to wait longer for it to pay off, but it may offer startups more opportunity. I haven't seen similar products yet, and I'd be quite excited to see something like this.

Companion robots: Function, emotion, and negative space

Kaiyan He: Regarding companion robots, current products fall into two major camps. One is the biomimetic camp, represented by the fluffy LOVOT, with some Chinese and Japanese brands also producing these in familiar animal forms: monkeys, cats, raccoons, and so on. Unfortunately, due to cost constraints, these robots generally have low computing power, probably under 1 TOPS, and are likely priced around two to three hundred dollars. Another exciting development is that companion robots are beginning to attract more female users, unlike the primarily male and geek user base of the past. I find this very interesting.

I think what's brilliant about companion robots is that their form and interaction design make users more tolerant of errors. If a robot vacuum performs poorly, users may abandon it. But if a companion robot's design and interaction are excellent, users will be more forgiving even if its functions are flawed. From the user perspective, functional products are easier to accept, but companion robots also leave significant room for imagination in the long term.

Di Zhang: On companionship, I have a small point to share. I think we may have separated emotion and function too sharply, which doesn't quite align with human nature. For example, if you use a tool for a long time, you may develop feelings for it and even feel reluctant to discard it. There was also a widely shared video online in which a household cat beat up a new robot vacuum, with comments joking that the most useful member of the household got taught a lesson by the most useless one. The subtext is that people put home robots in the same category as cats: even functional products get imbued with emotional value by users.

When we talk about emotional or companion robots, what exactly do we want them to accompany? Just as functional robots need to be broken down into specific modules to define a good product, emotional or companion robots need the same kind of segmentation. Even seemingly simple objects, like a fountain pen, can become companion items if they carry a signature, long-term companionship, or some special imprint. So I believe the boundary between emotional, companion, and functional robots may contain opportunities for product definition.

Kaiyan He: Companion robots actually need a genius product manager to precisely define and trim — the boundary between intelligence and function is very blurry. Meng has also looked at many companion robots. What directions might a good companion robot product take? And what challenges might it face?

Xing Meng: I'd like to respond to Di's point first — I find it quite interesting. I've actually thought for a long time about whether emotional companionship products can merge with efficiency tools. A few examples: the fountain pen you mentioned is an efficiency tool, but with prolonged use, it can become an object of emotional companionship. Scaling up, cars are similar — you drive it for a long time, through weather and experiences together, memories and emotions form. But as products become more intelligent, this sense of emotional companionship seems to weaken. Computers and phones are highly intelligent, yet their emotional companionship value is relatively low.

I believe the essence of emotional companionship is people projecting emotions onto an object — it could be an animal or an item. Whether it's an efficiency tool isn't what matters; what's critical is whether there's sufficient "negative space" in your relationship with it. If a product is constantly responding to your needs, then it dominates the relationship — there's no negative space, no room to inject emotion. Your bond with a car, for instance, comes from feeling you depend on each other, experiencing weather and interesting moments together. But essentially, the car isn't "responding" in these scenarios — you're the one giving it the opportunity and capability. It's simply loyally helping you accomplish something simple.

I believe intelligent products by their nature leave little negative space. Compare an electronic plush toy with powerful functions, one that can practice English and chat with you, with another plush toy that has no functions but looks equally cute. If we measure by emotional connection, duration of companionship, or even what you'd least want to throw away when moving, I would venture that the toy with no functions might have the advantage. Companionship is fundamentally bidirectional; it is not you having needs and the product serving you. Once this service relationship forms, it's no longer companionship. Companionship is you forming some connection with it, you injecting emotion into it. With pets, you're the one caring for them, not them caring for you. You invest your emotions in them, creating a kind of emotional projection.

Di Zhang: This topic is fascinating. I have a question: is Meng definitely a "cat person"?

Xing Meng: Yes, I have cats.

Di Zhang: I strongly agree with the "negative space" argument, but I clearly sense you're a "cat person" — it hides between your lines. However, many "dog people" may not have such a strong perception of "negative space." They would feel that what they have works for them every day, serves them loyally — it's precisely this service that allows them to invest emotion.

I think "dog people" and "cat people" differ significantly on the emotional dimension, and the companionship they expect is completely different. You could even say these are two entirely different groups, and you could design completely different products to serve them. Another point I strongly agree with is that AI or large models are extremely attractive in emotional products, because "negative space" can be implemented through large model algorithms. Large model algorithms can, based on different prompts, display almost any function and preset persona.

Beyond "dog people" and "cat people," there may be other types. I think future leaders in the home category will, on the foundation of doing basic functions well, also be able to collect data on customers' personalized needs. Everyone's personality, emotional needs, and material needs differ. How can large models use this data to provide each person with the most suitable product combination and uncover different emotional companionship experiences? Future industry leaders won't just know what you want to buy, but will understand you as a person. This is both the most terrifying, dangerous thing and the most captivating.

Xing Meng: Returning to the essence of companionship — first, it's an emotional experience. But current AI development, whether large models, language models, or multimodal models, doesn't take emotional companionship as a core driving metric. Current evaluation standards focus mainly on reasoning capability, math problem solving — emotional cultivation is merely a byproduct. Therefore, for companion products, we need to approach from product design rather than relying purely on technology push.

A good product manager can drastically cut AI's complex functions, retaining only a few key features, and combine with visual design and other elements to create emotional resonance. Take Kaiyan's example of LOVOT — its functions are simple, but through concise expression and user emotional projection, it creates profound connection. In product definition, negative space should be preserved as much as possible, and the specific design of this "negative space" needs to be completed according to product positioning, enabling resonance with specific age groups or personalities on some emotional dimension. This may be one form of future emotional companion products.

Kaiyan He: As just mentioned, human emotions are diverse and can be projected onto various static or dynamic, functional or non-functional objects. Current large models' cognitive and emotional capabilities are still quite limited. In this situation, product managers need precise trimming and differentiated design in function, form, and interaction.

In my view, companion products have two core qualities that matter. One is a sense of life and vitality: the product itself should give people the feeling that it is "seemingly alive." The other is a feedback system, which need not depend on language. For example, LOVOT actively hugs you, shows enjoyment when you pet its head, or even makes eye contact with you through its lively eyes. These interactions themselves are feedback. Only through genuine interaction can users develop increasingly deep emotional connections with robots, and this connection is essential for long-term companionship.

Imagining the next generation of super hardware

Kaiyan He: Finally, I'd like everyone to exercise imagination: in which domains might the next generation of super hardware emerge? From a long-term perspective, what kind of product could truly become super hardware?

Di Zhang: I think there may be product definition possibilities at the boundary between function and emotion, and such definitions could potentially give birth to super hardware.

Xing Meng: I see two possible directions. One is single, small products with limited hardware functionality but greater potential at the software level. These may be closer to users' new needs — a single-point innovation that combines both emotional and functional utility.

The other is large, complex system combinations, like exoskeletons. Exoskeletons are essentially similar to glasses — one worn on the eyes, the other on the body. They can collect the richest signals from the human body and help users accomplish the behaviors they most want to perform. I believe this direction has tremendous long-term potential and could become a category on the scale of cars, computers, or phones.

5Y Capital seeks out, supports, and inspires lone entrepreneurs, providing support that ranges from morale to every operational matter. We believe that once the "crazy" you in others' eyes begins to be believed in, the world will become a different place.

BEIJING · SHANGHAI · SHENZHEN · HONG KONG