Ant Group's Ji Gang and FreeS Fund's Li Feng: How Can AI Hardware Startups Survive the "Data Desert"?

FreeS Fund · December 4, 2025

Past data has powered large language models; future AI development cannot happen without new data.

In late November, OpenAI made news in the smart hardware space: in a conversation video released by Emerson Collective, CEO Sam Altman mentioned that io had completed its first hardware prototypes.

io was founded by former Apple chief designer Jony Ive. About six months ago, OpenAI acquired the company — which aims to redefine how humans interact with computers — for $6.5 billion.

"The world deserves something better than what exists," Altman said. There's no doubt that the Altman-Ive partnership to build the next-generation smart device that surpasses the iPhone has added fuel to the increasingly fervent AI hardware frenzy. Yet looking globally, the next "iPhone moment" for smart hardware still seems quite distant. Why have so many futures we anticipated failed to arrive on schedule?

Not long ago, at an AI hardware event co-hosted by Ant Group's investment division, FreeS Fund, and Ant Entrepreneur Camp, FreeS Fund founding partner Li Feng ("Feng Shu") and Ant Group Vice President Ji Gang held an in-depth conversation that was relaxed in atmosphere yet full of sparks (Ji Gang jokingly called it "sparring").

The main topics they covered:

How should we define smart hardware in the AI era, and how is it fundamentally different from past "micro-smart" products?
Why is there less buzz around large language models now, and why have hot sectors like robotics and Agents cooled somewhat?
What's blocking the "iPhone moment" for AI hardware — missing algorithms or missing data? If it's data, where did the data come from that previously fueled internet super-apps, large models, and intelligent driving?
For smart hardware to become widely adopted consumer products, should we prioritize upgrading technology or satisfying user needs? In other words, build "black tech" first or build "good products" first?

We've compiled portions of the conversation, hoping to offer practitioners following the AI hardware track a perspective worth considering. We also look forward to connecting with more innovators — reach us at bp@freesvc.com.

Reader Giveaway: Do you think smart hardware should prioritize "black tech" or "good products" to achieve mass-market adoption? Share your thoughts in the comments. By 5:00 PM on December 10, 2025, the two most thoughtful commenters will each receive a copy of The Competitive Advantage of Nations.

/ 01 /

How Should We Define "Smart Hardware in the AI Era"?

Ji Gang: First question for Feng Shu — how do we define smart hardware in the AI era? Products with some intelligence, like smart toilets, have existed for years. What's different about AI-era smart hardware?

Li Feng: This is an excellent question. I often use Japan's example to explain China's current industrial stage. In the postwar 1970s and 80s, Japan caught the technology upgrade from vacuum tubes to transistors. Using the transistor and integrated circuit technologies it had mastered at the time, Japan converted many mechanical products into electronic ones: mechanical watches (like our Shanghai brand) gave way to electronic watches (Casio), and traditional mechanical pianos gave way to electronic keyboards (Yamaha).

But back then, chips and sensors were far less advanced than today. Japanese companies could achieve "electrification," but not "digitalization." To use a modern analogy: they could convert gasoline cars to new energy vehicles, but couldn't add millimeter-wave radar or LiDAR to enable intelligent features like automatic parking.

Japan's electrification produced two results.

First, these products opened global markets, though they were once criticized as overcapacity. Originally, only so many people could afford pianos. After electrification, production volumes rose, making supply appear excessive.

Second, and this is the flip side of the first point: while they seemed to export excess capacity, they actually drove unit prices down, so the products' penetration rate rose significantly ten years later. In my parents' generation, a Shanghai-brand watch was part of a dowry. By my later middle school years, many kids from families like mine, where the parents were teachers, could wear an electronic watch costing 5 or 10 yuan.

Back to your question. Over two years ago, I told our team to invest more in robotics. A year ago, I said invest more in smart hardware. The reason is that China's situation somewhat resembles Japan's back then.

China has caught a new opportunity.

First, we have an extremely comprehensive manufacturing chain — hardware production capability.

Second, over the past seven or eight years, especially in the six or seven years since the 2018 US sanctions on Huawei and ZTE, China has built out its chip and sensor industrial chains, including part of the computing chip chain.

Third, though everyone now talks about consumption downgrade, China as the world's second-largest physical consumer goods market has the highest circulation efficiency globally.

Fourth, what we call the "involuted market" has a management definition: a market that reaches saturation quickly through competition. Once there's an innovation, it spreads rapidly.

These four factors push companies toward highly uncertain innovation. Management theory also holds that involution is an inevitable first stage of a nation's innovation. In the imitation stage, you knew where to go, whom to learn from, and what to achieve; you just worked to make it cheaper. Now we've reached a stage where we don't know what to do next. So one approach is to recombine the opportunities available to us and string them together into new lines. Some of those paths will be right, others wrong.

Involution has downsides, such as squeezing R&D investment, but it also spawns an enormous amount of fragmented innovation. These innovations leverage the horizontal and vertical structure of the existing industrial chain, and they form the core of today's smart hardware discussion.

Looking back at 1980s Japan: they too had a rapidly growing, internally competitive domestic market, complete industrial and manufacturing chains, and caught a wave of electronic component upgrades. They mastered better technologies faster. The combination of these factors enabled them to convert mechanical to electronic. Today, we can not only convert electronic to digital and intelligent, but also convert previously non-electronic things — like guitars — to electronic. Then, progressing from domestic "involution" to overseas "expansion," we can ultimately achieve world leadership.

Ji Gang: Let me push further. Growing domestic demand and complete industrial chains are the accumulated results of two to three decades, not something that happened at this moment. Yet the smart hardware explosion has become more prominent in just the past two years. Why didn't this opportunity emerge in earlier years when industrial chains were already relatively complete?

Moreover, AI as the era's biggest technological variable — what truly AI-native hardware has it driven? Personally, I see very little.

From my observation, there are three categories. The first is AI-native hardware that changes how new data is gathered (like the AI Pin), but there isn't much of it. The second is hardware suited to serving as a carrier for AI, like smart glasses or AI companion devices, though this category easily gets lumped in with traditional toys. The third is hardware going overseas.

From an investment perspective, these three categories have completely different valuation frameworks. For AI-driven hardware, you need to educate consumers about the product. Though sales are small now, there's potential for massive market growth later, possibly commanding high valuations. On the consumer goods side, especially overseas consumer goods, it's about supply chain and marketing channels.

I'm curious how Feng Shu distinguishes these smart hardware categories? Or do you look at and invest in anything that meets the factors you mentioned?

Li Feng: We look at anything meeting these factors. But I think a painful aspect of investing is that you can deduce what characteristics something should have, but you can't deduce what the final product will look like or who will create it. So anything meeting 50% or 70%+ of these characteristics, we'd consider investing in.

/ 02 /

Large Models Enter "Deep Waters"

Li Feng: Let me turn the question around. Why has there been less attention and discussion of large model developments themselves over the past half-year? Why is the "scaling law," the term large models made popular and that everyone was citing a year and a half ago, rarely discussed anymore?

Ji Gang: I'm not sure I'm the best person to answer this, but I'll say a few words. When domestic large models emerged, we invested in Moonshot AI and Zhipu AI, later invested in some multimodal companies, and kept looking for application companies. At the time, there was a perception that large models might not be applications themselves but something more like upgraded search engines, because people's interaction habits were still built around the input box.

Recently, OpenAI's developer conference revealed some interesting data: weekly active users reached 800 million, but average daily usage was only a little over ten minutes, only slightly longer than for traditional search engines. In users' minds, it's still a "better search engine."

Additionally, in my view, the three products OpenAI announced at the developer conference show its commercial ambition, beyond pursuing AGI through underlying technology.

The first is the Apps SDK, which gives existing products a framework that developers can plug into and feed information back through. The second is AgentKit: for any unsolved problem, entrepreneurs and developers can build a solution within OpenAI's framework. The third is Codex, which not only writes code but solves long-tail problems on the spot; if nothing existing can solve something, it writes a solution for you in real time, very efficiently.

Once it addresses all user needs within its framework, and looking two years ahead, if OpenAI can grow to 800 million daily active users averaging 2 hours a day, users will naturally revisit large models from a consumer perspective. They might not talk about large models themselves, but rather about the products they use. By then, what will OpenAI possess? All of the user entry points, plus a unified memory of users across their various products.

Coming back: why aren't investors talking about large models lately? Because large model competition has entered deep waters, and its progress no longer shows up much on the consumer side. Model intelligence is already at a relatively high level, but unlike the early days, when each leap felt "100 times smarter" and was very visible, many people may no longer perceive the gains.

Li Feng: Another reason is the lack of new, higher-level categories of publicly available data, so large models struggle to keep progressing along the same path.

/ 03 /

Insufficient Data: The Shared Challenge for Robotics and Agents

Li Feng: Both of us have invested heavily in different directions of embodied-intelligence robotics. From the investment industry's perspective, two and a half years ago large models were hot; a year ago, embodied robotics heated up in China while the US leaned toward Agents, using large models for all kinds of digital applications. Now both Agents and robotics have cooled somewhat. Why have both directions cooled?

Ji Gang: Sounds like a trap, but let me jump in. We have indeed invested in some embodied intelligence projects: about 8 earlier, maybe 2 more recently, so that's the order of magnitude. I think the bubble is quite serious; some companies' valuations rose 5x in a year without much substantive progress.

Li Feng: That point is important: actual progress hasn't been observed.

Then let me ask two more questions. First, what capabilities do the impressive humanoid robot demonstrations mainly showcase? Second, what progress was expected but hasn't materialized as anticipated? What's blocking this progress?

Ji Gang: Let me answer the second question first. It's hard to imagine an industry jumping straight to maturity when its fundamental algorithms, its data collection, and even the technical approaches for the hardware itself haven't converged. That's impossible. Without gradually resolving these issues, the robotics industry will struggle to advance.

But look at it the other way, as with autonomous driving startups: there were perhaps more than 200 of them in 2015-2016. A few have survived to today, but because of technology, regulation, and other factors, they're stuck at L2+ and haven't yet captured the industry's major dividends. Will we eventually reach autonomous driving? Certainly.

Robotics is the same. Today too many problems remain unsolved, and there's a stage-specific bubble. But perhaps in 15 years (the 15 may not be precise), robotics will be a bigger industry than electric vehicles and autonomous driving combined. It's highly likely that every middle-class family in the world will own one or two units. That certainty is very strong.

Li Feng: Yes, so two and a half years ago I told our colleagues to invest as much as possible.

Ji Gang: But the issue is that this industry's ups and downs may be severe. Perhaps 80% of today's companies will be eliminated.

To answer your first question, what are the demonstrations actually showing? I think mainly the hardware's locomotion capabilities, or even just part of them. Many demo videos are sped up. The goal of the technical evolution is to bring robot movement speed close to human speed, and that step may not be solved within a few years. Perhaps I'm being pessimistic.

Li Feng: Not at all. Yes, well-known companies' demos all lean toward pure locomotion. Think about it: dancing, somersaults, kicking, playing soccer, it's all locomotion. What capability isn't being demonstrated? Humans' other capabilities, like manipulation.

When everyone was investing frantically a year ago, people offered different answers for solving manipulation. Some thought transplanting a large model in as the brain would solve it. Others thought training the upper limbs on more visual data so they generalize would work. I believe these certainly help, but reaching the final level of robustness and precision may be difficult.

The human brain is extremely complex. Suppose you're an extremely loyal football fan who has watched countless matches and knows every technical move and every rule by heart. Can you play at a semi-professional level? Definitely not, or I'd be a badminton champion already.

We're unlikely to go from just watching to doing, especially when specific manipulation actions are involved. The core issue is insufficient data.

Recall why we have large language models: internet text accumulated over decades. That data, plus computing power, plus algorithmic advances, produced today's large language models. Robots' locomotion capabilities come from three to four decades of industrial robotics, which concentrated intensely on motion control (what angle to operate at, how two arms coordinate) along with advances in motors and control. China became the world's largest industrial robot market in 2013.

Coming back to today, manipulation requires physical models, environmental data, and multi-dimensional data on how humans interact with their environment. That data is currently missing.

This resembles autonomous driving's past predicament. Around 2015, people said autonomous driving would soon be everywhere; back then, people expected we'd be at L4 by now. But ten years later, national standards only permit advertising up to L3.

What does autonomous driving need? Environmental data, meaning the entire vehicle's state digitized: What are the driving conditions? What's the current speed? Which lane is it in? What are the surrounding vehicles doing? What state is the driver in? We haven't had the ability to digitize all of this for very long. Iterating on that foundation brought us to today's intelligent driving stage, where human takeover is still needed.

So my question is: where did the data supporting large language model development and robotics locomotion capabilities come from? And the data that produced autonomous driving's L2, L3?

The answer: new sensors became widespread and reached consumers' hands, and enough people helped turn what those sensors captured into data. Simply put, for text, it was PCs, keyboards, and mice that let you turn the thoughts in your head into internet text.

In autonomous driving, companies like Tesla installed numerous cameras and millimeter-wave radars on consumer vehicles, and Chinese new energy vehicle makers followed. As more sensor types "got on board," usable autonomous driving data accumulated, enabling today's L2.

Humanoid robots' current locomotion capabilities build on our long accumulation of scene experience and production-line practice with industrial robots, control algorithms, and motor technology; today's locomotion comes from introducing advanced algorithms on top of that foundation.

Why does Douyin exist? One non-negligible reason is that smartphones popularized high-definition cameras. Why do food delivery and ride-hailing apps exist? Because GPS sensors became widespread. Why does WeChat exist? Because smartphones popularized microphone arrays capable of high-definition voice recognition.

Consumers won't buy sensors for sensors' sake. They buy a product; that product just happens to carry sensors, which naturally convert consumer needs into data. Once consumer-grade sensors are widespread, you have this usable data, and building algorithms and computing power on top of it produces the ultimate technological progress.

Now, for robots to interact with the physical world, for Agents to handle all kinds of problems, and for multimodal large models to generate more types of video and images, all of them need massive amounts of data. Not text and image data, but human emotions, human language, human vital signs, physical environment conditions, and changes in human-environment interaction: data with almost infinitely many dimensions.

We need large volumes of new, sensor-equipped consumer hardware to collect the data we want. On that basis, algorithms and models can take the next step and drive future technology forward.

Ji Gang: My sparring instinct is rising again; the causality here may deserve more discussion. Take the US moon landing: they didn't wait for every technology to mature or for space stations to be built before launching. They went to the moon first, which in turn drove the development of many technologies.

The implicit question here: if we view embodied intelligence as the endpoint, smart hardware, sensors, and data collection will indeed lead us there. I partially agree that the intermediate gaps in data and hardware need filling. But this may not be a strictly sequential process, and it doesn't prevent us from sprinting toward embodied intelligence before those gaps are fully filled.

Perhaps it works the other way around: sprinting toward this goal drives enough industrial development, and the many technologies that spill over from it lead to better smart hardware today.

Li Feng: Yes, it's actually a mutually reinforcing process.

/ 04 /

The Cyclical Nature of Tech Investment and Path Choices for Smart Hardware Entrepreneurship

Li Feng: Next topic: "how to define new products." Earlier we discussed this from the national and industrial chain perspective and from the data perspective. We can also look at it through the cyclicality of investment. Tech investment typically goes like this: the first wave invests in the technological transformation itself (like large models); the second wave invests in the most imaginative applications (like Agents and robotics), but these are often extremely hard to commercialize and carry big bubbles; the third wave goes to applications that both use the technology and prove demand, ideally while making money. The good news is that the third wave is about to begin.

So when there's technological progress, how do we find consumer-side applications? The biggest challenge here, and where investors often misjudge, is an eternal debate: do we start from the technological progress and find the product best suited to it, or start from consumer demand and build a more technologically advanced version of what consumers use today? Are we defining a user product at the most AI-heavy level, or using AI to enhance capability and experience at the level closest to user demand?

Ji Gang: This is actually what I wanted to ask. Both directions can have answers.

Everyone knows Feng Shu was a very early investor in Insta360, which took the latter route: making a previously hard-to-use scenario or product better. DJI, which started slightly earlier, instead created a new category. If we went back more than 10 years and you could invest in only one of the two, which would you choose?

Li Feng: Actually, their logic is the same. Both founders had computer science and software backgrounds, which is to say algorithmic DNA. DJI relied mainly on flight control technology, using China's manufacturing chain to take military-grade technology down half a notch for professional users, then scaled it down further into consumer drones. Insta360 relied on image stitching technology, combining it with the industrial chain to build panoramic cameras, first selling to geek consumers already used to GoPros, then bringing them to the mass market.

What they did resembles Japanese companies turning pianos into electronic keyboards: using China's industrial chain plus the software capabilities of the time, they took an originally professional product down half a notch and won recognition in overseas markets. On that basis, they built a profitable virtuous cycle. Then, as Chinese consumers' spending power and the domestic market grew, they dropped another half notch to semi-professional, and kept going down to the mass consumer level, becoming mass consumer goods.

/ 05 /

"Life Cheat Device" and AI Hardware's "iPhone Moment"

Ji Gang: If that's the case, glasses were never a professional product; they're a necessity anyone can wear, with a fairly single function: helping people see more clearly. Today entrepreneurs are building smart glasses in different directions, yet computing power, battery life, and other issues remain unresolved. Another product category took a different path and chose compromise: for example, giving up the ideal first-person capture perspective and becoming something worn on the chest. The perspective and image quality are slightly worse, but it solves battery life and covers the initial capture needs.

I focus on capture functionality because we internally discussed the "three eight-hours" concept:

Eight hours sleeping at night — generates massive data, but no interaction possible;

Another eight hours is your screen time — attention almost fully occupied by phones, computers;

The remaining eight hours, like this conversation we're having now, are still very hard to capture with traditional methods. Could there be a device, like Plaud, that easily records conversations, and perhaps not only records them but also analyzes your expressions and movements?

When discussing this with colleagues, I called it a "life cheat device": during a job interview, you'd read the interviewer's expressions through the device and prepare your answer to the next question. I wonder whether such devices could become the next generation of mainstream capture and interaction devices. Are glasses the final boss?

Li Feng: First, I agree glasses are definitely one of the ultimate major device categories. But as I said, consumers won't buy sensors for sensors' sake; they buy products to meet their needs. So don't define needs from the data level; define products from the needs level, and obtain the data that way. That sounds convoluted. Simply put, you can't assume that because we lack this data, a capture device will become popular, collect the most data, and make money by selling it. It's a beautiful story, but it's not how consumers think.

Many people say they'll make the next iPhone and define a consumer product the way Steve Jobs did. But the iPod came before the iPhone, and between the iPod and the iPhone came BlackBerry and Palm. Before the iPod came the MP3 player: people first got used to listening to music on MP3 players, then found the iPod a better product.

Also, it wasn't until the third generation that people started considering the iPhone a good phone. These steps are hard to skip, even for Jobs. So glasses are ultimately an end result: there will eventually be an "iPhone," but is today the time to sell one? Different people likely have different answers.

Ji Gang: So will people invest in intermediate-state products?

Li Feng: Simply put, first: you either capture multi-dimensional human body data or multi-dimensional environmental data, or you capture various scene, state, and emotion data through vision. One comes from sensors, one from lenses; those are the two dimensions. Second, the device should ideally accommodate an on-device chip, so that edge and cloud computing power can be combined, though this poses challenges in power consumption, size, and cost.

Because you're capturing new data, no device can be AI-native from the start. It's more reasonable to let demand drive digitalization first. Like Tesla: install the sensors first, then digitize users' driving habits, states, routes, typical road conditions, and surroundings. On that basis, combine edge and cloud to deliver intelligence.

What is intelligence? Turning this digitalization further into personalization. Trying to do AI personalization in step one is arguably impossible, because with new dimensions, new environments, new needs, and new scenarios, you lack the data. Going AI-first from day one may be a story some investors will buy, but it's not one consumers can perceive.

Finally, find intersections: China has industrial chain capability, a large consumer market, and ample market competition, and you can find intersections among them. Once you've found one, don't rush to iterate; satisfy consumer needs first.

Take cameras as an example. The Nokia and Motorola phones we used had rear cameras, but back then almost nobody used them to shoot anything. The iPhone, even with only 2-3 megapixels, was the first to get people actually using their cameras, which was remarkable. After users developed photo-taking habits, it pushed cloud services that uploaded all their photos, eventually creating dependence. The logic is the same: setting out to make an extreme camera from the start would have been difficult.

Ji Gang: Completely agree. For example, sleep state and quality used to be very subjective, but are now quantified by devices. The quantification is actually inaccurate: sometimes you feel you slept well but get a low score; sometimes it gives you 80 points and you feel terrible, yet the psychological comfort is real. Similar examples abound.

What I'm getting at is that the value of much data is hard to predict before it's captured. It always feels to me like an undeveloped mine, like Bayan Obo, mined from 1958 on: everyone thought it was an iron mine, and only later realized it was a massive rare earth deposit.

Right now I'm thinking that humans are embodied anyway, so whether it's AI or not doesn't matter. The massive data generated during Feng Shu's talk just now, and every second of data we produce from here on, may all prove valuable in the future. That's my final sparring point.
