NVIDIA's $20 Billion Groq Acquisition: Is LPU Architecture the Optimal Solution for AI Inference?

FreeS Fund · January 28, 2026

Where Are the Opportunities for China's AI Inference Chips?

In late 2025, major news rocked the global semiconductor industry: NVIDIA announced an agreement with AI chip startup Groq, paying $20 billion for a non-exclusive technology license and to bring on its core team.

Even for NVIDIA, $20 billion is no small sum: it amounts to roughly one-quarter to one-third of its annual cash flow. So what kind of company is Groq, exactly?

Groq represents the most advanced LPU (Language Processing Unit) architecture today. Its founder, Jonathan Ross, was previously the inventor of Google's TPU (Tensor Processing Unit). But this doesn't mean the LPU is an upgraded version of the TPU.

The chip family tree does have plenty of "PUs." The CPU (Central Processing Unit) and GPU (Graphics Processing Unit) we're familiar with represent traditional logical processing and the massively parallel computing that dominates AI training, respectively.

The TPU attempted to serve both large-scale training and inference needs.

The LPU, by contrast, is a "new species." It sheds the burden of training, more like an F1 race car purpose-built for large-model inference. It abandons the mainstream von Neumann "memory-compute separation" architecture in pursuit of the most extreme low latency.

"Consistently stable low latency" is especially critical in embodied intelligence and edge hardware scenarios, because it determines whether human-machine interaction is real-time and smooth. In consumer hardware, choppy or inconsistent human-machine interaction might merely cause discomfort, but in autonomous driving, it can be fatal.

More importantly, in these kinds of scenarios, China has enormous opportunities for development. One key reason: we have the world's most complete electronics supply chain, meaning we're closer to the supply chain and closer to customers.

Not long ago, FreeS Fund partner Yongcheng Yang sat down for an in-depth conversation with Bin Yang, who has 20 years of top-tier chip-building experience at major tech companies and recently founded Yuanchuan Micro, focused on cost-effective AI inference edge processors using a Groq-like LPU architecture.

Their main topics of discussion included:

Did NVIDIA throw down $20 billion for Groq's core team out of fear of a competitor, or to fill the final missing piece of its compute empire?
Groq's founder may be the "father of TPU," but why is LPU not an upgraded TPU — rather, an entirely different "new species"?
"One step ahead makes you a martyr; half a step ahead makes you a pioneer" — the LPU architecture lay dormant for nine years; what's behind its sudden explosion?
Why is the CPU/GPU working principle like "eating a Manchu-Han full banquet," while LPU is like "eating conveyor-belt sushi"?
How does Groq's 14nm "legacy process" chip manage to "crush" NVIDIA's 4nm H100? One-sixth the latency, one-third the power consumption, one-quarter the cost — what secrets do Groq's "counterintuitive" numbers reveal?
Why is the AI inference edge a long-tail market, and why do long-tail markets favor entrepreneurs?
To what extent will this wave of AI technology transform the education industry?

For the full dialogue, search for "High Energy" (Gao Nengliang) on the Xiaoyuzhou app and Apple Podcasts.

We've edited portions of the conversation, and we hope it connects us with more practitioners in the LPU architecture and AI inference chip fields. This is part of our "AI Industry Observations" series, which will continue to share firsthand practices and observations from AI entrepreneurs. If you're building a startup in AI chips, feel free to reach out at yangyongcheng@freesvc.com

"AI Industry Observations" Series

AI for Science — Investment and Entrepreneurship: Where Are the Opportunities for the Next Decade?

From Large Models to AI Companions: Behind the Rotation of AI Hotspots, What Cyclical Patterns Exist?

How to Bridge the "Last Centimeter" Between Robots and Real-World Interaction?

"In AI Hardware, There Are Always New Opportunities"

Sai Zhang's 13-Year Oral History of Robotics Entrepreneurship: The Rise and Fall of the "Big Four Families," Persistence to the End, and Positioning in Embodied Intelligence and Global Expansion

Interactive Giveaway

What application scenarios do you think LPU architecture will appear in? Share your thoughts in the comments. By 17:00 on January 31, 2026, the 2 readers with the most thoughtful comments will each receive a book recommended by Feng Shu (Li Feng).

/ 01 /

NVIDIA's Anxiety and the "Second Half" of AI Compute

Yongcheng Yang: Today we've invited Bin Yang, founder of Yuanchuan Micro, to discuss AI processors — including GPUs, NPUs, and especially the recently hot LPU. Let's start with a self-introduction.

Bin Yang: We're a semiconductor design startup, focused on edge and endpoint inference compute. Actually, whether you're an entrepreneur or investor, there's a shared consensus right now: the AI industry has entered its second half.

The first half was a competition over model capabilities — everyone looking at parameters, at benchmark scores. This was essentially measuring the ceiling of model capability. At this point, models already have commercial viability.

So the second half is fundamentally about realizing model value. For application deployment, AI inference compute is therefore a critical core track.

Yongcheng Yang: There's been very explosive news recently — NVIDIA acquired Groq's technology license and team at roughly a $20 billion valuation. There are various interpretations. Some say NVIDIA wanted to kill a competitor in the cradle. As a veteran industry practitioner who's studied LPU for many years, what's your take?

Bin Yang: I think the "blocking competitors" view is relatively narrow. Let's do the math: $20 billion may not be a big deal for NVIDIA's stock price, but NVIDIA's cash flow for the first three quarters was roughly $56 billion. This transaction probably accounts for 1/3 to 1/4 of its full-year cash flow.
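The rough arithmetic behind that estimate can be checked in a few lines. This is a back-of-the-envelope sketch: it assumes the $56 billion three-quarter figure can be annualized at a flat run rate, which is an assumption made here for illustration and not something stated in the conversation. The function and constant names are hypothetical.

```python
# Back-of-the-envelope check of the "1/4 to 1/3 of annual cash flow" estimate.
# Assumption (not from the interview): full-year cash flow is extrapolated
# linearly from the first three quarters.

DEAL_SIZE_BN = 20.0      # deal size quoted in the conversation, in $bn
CASH_FLOW_3Q_BN = 56.0   # cash flow for the first three quarters, in $bn


def annualize(three_quarter_total: float) -> float:
    """Scale a three-quarter total to four quarters, assuming a flat run rate."""
    return three_quarter_total / 3 * 4


def deal_share(deal: float, annual_cash_flow: float) -> float:
    """Return the deal size as a fraction of annual cash flow."""
    return deal / annual_cash_flow


annual_bn = annualize(CASH_FLOW_3Q_BN)
share = deal_share(DEAL_SIZE_BN, annual_bn)
print(f"annualized cash flow: ${annual_bn:.1f}bn, deal share: {share:.1%}")
```

Under that assumption the deal comes to a little under 27% of annual cash flow, which sits inside the 1/4 to 1/3 range quoted above.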

The cost is undeniably high, and precisely reflects NVIDIA's deep understanding of industry evolution. Why buy an inference company rather than a training company? Because starting from 2025, NVIDIA's strategic center of gravity has shifted to inference, especially embodied intelligence.

Look at NVIDIA's map: CPU has Grace, GPU has Rubin, DPU comes from Mellanox — these three address cloud supercomputing training needs. But when truly moving to the edge, to embodied intelligence, it's missing a piece — real-time inference. LPU can bring exactly that critical complement.

Yongcheng Yang: Let me summarize. First, NVIDIA's acquisition targets the LPU technical route, particularly its advantages on the inference side. Second, it also reflects NVIDIA's urgent expectation of rapid inference market growth. The acquisition is the fastest way to combine two strong players, helping NVIDIA reach, as early as possible, the kind of leading position in inference that it already holds in training.

Bin Yang: Yes. I can say without hesitation: in the AI inference domain, the LPU architecture is the most suitable. Full stop.

Yongcheng Yang: There's another theory — that NVIDIA wanted to acquire Google TPU talent, since Groq founder Jonathan Ross previously developed TPU v1 at Google, which later powered AlphaGo's victory over Lee Sedol. Are TPU and LPU the same thing?

Bin Yang: This theory feels a bit like "riding the hype wave." Jonathan Ross was indeed the lead architect of TPU v1, but he left in 2016. Today's TPU has iterated to v7 — vastly different from back then.

As for LPU versus TPU, they're very different, enormously different, even two distinct species. Google's TPU still carries the mission of training, while LPU is purely born for inference.

Yongcheng Yang: Actually, NVIDIA's operation this time acquired IP and brought the core team on board, but the Groq brand remains.

Bin Yang: I think antitrust law explains it better. Just as Intel once propped up AMD — if NVIDIA does CPU, GPU, DPU, and LPU all well, with no second company left in the market, it could face breakup risk. So keeping the Groq brand operating independently is a kind of legal survival wisdom.

/ 02 /

From "Manchu-Han Full Banquet" to "Conveyor-Belt Sushi": The Technical Truth of LPU

Yongcheng Yang: The LPU architecture originated in 2016. Why did it lie dormant for nine years, only catching fire today?

Bin Yang: To put it in one phrase: "One step ahead makes you a martyr; half a step ahead makes you a pioneer." When Groq was founded, large models weren't hot yet — it was too early. The current LPU heat is essentially resonance between industry and technology.

Today's AI industry has two characteristics. First, relatively low-cost models are now very capable: since DeepSeek's breakout, 30B to 70B models have become the sweet spot (the optimal balance of cost and performance). Second, inference has truly scaled.

Yongcheng Yang: This is like ARM chips back in the day: they only achieved major growth when consumer electronics took off. The LPU's carefully budgeted, inference-only optimization also needed this inflection point.

In pilot applications, people don't have high performance requirements, and being a few dozen milliseconds slower is fine. But once a product enters large-scale deployment and reaches millions of households, you have to count every penny. High power consumption means high operating costs; poor latency means a collapsed user experience.

Bin Yang: Yes, especially the "low latency" characteristic, which is easily overlooked. The LPU isn't just about short absolute latency. More critically, it delivers consistently stable low latency. With robots or autonomous driving, for example, if compute speed fluctuates and response times vary unpredictably, that's fatal.

Yongcheng Yang: Yes, this actually determines whether something is usable at all. Take AI photography — much of it is currently post-processing: you take the shot, discover closed eyes, then fix it. If latency is low enough, you can do "real-time inference" — pressing the shutter directly processes closed eyes into open ones. This creates new demand.

Bin Yang: I can offer a concrete parameter here. Groq published data at the 2024 ISCA conference: using a 14nm legacy process chip, compared against NVIDIA's 4nm H100, Groq achieved one-sixth the latency, one-third the power consumption, and one-quarter the cost. So overall energy efficiency is 10x NVIDIA's.
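One simple way to relate the first two figures: if energy per request is approximated as power multiplied by time, the two ratios multiply. The sketch below does only that arithmetic. The constant-power, one-request-at-a-time model is an assumption for illustration, and the function name is hypothetical. Real measurements also depend on batching, throughput, and system-level power, so this idealized result is best read as an order-of-magnitude check on the "10x" figure, not a reproduction of it.

```python
# Ratios quoted in the conversation, expressed relative to the H100 (= 1.0).
REL_LATENCY = 1 / 6
REL_POWER = 1 / 3


def relative_energy_per_request(rel_power: float, rel_latency: float) -> float:
    """Energy = power x time, so relative energy is the product of the ratios.

    Assumption: one request occupies the chip for the whole latency window
    and power draw is constant over that window.
    """
    return rel_power * rel_latency


rel_energy = relative_energy_per_request(REL_POWER, REL_LATENCY)
efficiency_gain = 1 / rel_energy
print(f"relative energy per request: {rel_energy:.3f}")
print(f"implied efficiency gain: {efficiency_gain:.0f}x")
```

In this idealized model the product is 1/18, the same order of magnitude as the efficiency figure quoted in the conversation.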

Yongcheng Yang: These metrics are so important for the edge. Can you give readers a simple explainer on the fundamental differences between LPU and the GPUs and CPUs we're familiar with?

Bin Yang: I'll use an analogy. CPUs and GPUs are fundamentally von Neumann architectures. They both have extensive multi-level memory structures (multi-tier caching systems for temporary data storage), with everyone exchanging data through shared regions.

This is like eating a "Manchu-Han full banquet." The table doesn't rotate; all dishes sit in the center (shared memory). Every person (core) has to stand up to reach for food, which can lead to chopstick clashes (conflicts, unpredictability). This is "implicit data flow."

The LPU is a completely non-von Neumann architecture, a different species entirely. It breaks the shared memory mechanism. It's more like eating "conveyor-belt sushi." You sit still; the dishes (data) rotate on the belt. The magic is that when you want to eat, that dish happens to rotate right in front of you. This is "explicit data flow," which means determinism.
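The contrast between the two analogies can be made concrete with a toy simulation. The sketch below is not a model of Groq's hardware or of any real GPU; it assumes a single shared memory port with random arbitration for the "banquet" case and a slot fixed ahead of time for the "conveyor belt" case. All names are hypothetical. The point is only that runtime contention produces variable latency, while a schedule decided in advance produces the same latency every round.

```python
import random


def shared_memory_latencies(num_cores: int, rounds: int, core: int, seed: int = 0) -> list[int]:
    """'Banquet' model: every round, all cores contend for one memory port.

    The arbiter serves cores in a random order (a stand-in for runtime
    contention), so the observed core waits a different number of cycles
    from round to round.
    """
    rng = random.Random(seed)
    latencies = []
    for _ in range(rounds):
        order = list(range(num_cores))
        rng.shuffle(order)
        latencies.append(order.index(core) + 1)  # cycles until this core is served
    return latencies


def static_schedule_latencies(num_cores: int, rounds: int, core: int) -> list[int]:
    """'Conveyor belt' model: a schedule fixed before execution gives each
    core its own slot, so data arrives at the same cycle offset every round."""
    slot = core + 1  # assigned once, ahead of time
    return [slot for _ in range(rounds)]


def jitter(latencies: list[int]) -> int:
    """Spread between the slowest and fastest observed latency."""
    return max(latencies) - min(latencies)


contended = shared_memory_latencies(num_cores=8, rounds=1000, core=3)
scheduled = static_schedule_latencies(num_cores=8, rounds=1000, core=3)
print("contended jitter:", jitter(contended))
print("scheduled jitter:", jitter(scheduled))
```

The statically scheduled core is not necessarily served sooner on average in this toy model. What changes is that its latency never varies, which is the "determinism" the analogy refers to.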

Yongcheng Yang: That's a vivid analogy. Let me add one more:

GPU/NPU is like a professor leading a group of PhD students (many cores). Everyone is very capable, but there's scheduling overhead when assigning work, often with uncertain resource utilization and poor timeliness.

LPU is like an industrial assembly line. Each station worker only needs to be good at screwing bolts or applying labels (simple, single actions), but from start to finish, there's no stopping. Extremely high efficiency.
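The assembly-line picture has a simple timing property that a short sketch can show. The model below is idealized: it assumes every stage takes exactly one cycle, a new item enters every cycle, and no stage stalls. These are assumptions for illustration, not a description of any specific chip, and the function name is hypothetical.

```python
def pipeline_finish_times(num_items: int, num_stages: int) -> list[int]:
    """Cycle at which each item leaves the last stage of an ideal pipeline.

    Assumptions: every stage takes exactly one cycle, a new item enters
    every cycle, and no stage ever stalls.
    """
    # ready[s] is the cycle at which stage s becomes free for the next item
    ready = [0] * num_stages
    finish_times = []
    for item in range(num_items):
        t = item  # the item enters the pipeline at the cycle equal to its index
        for s in range(num_stages):
            start = max(t, ready[s])  # wait if the stage is still busy
            t = start + 1             # one cycle of work in this stage
            ready[s] = t
        finish_times.append(t)
    return finish_times


times = pipeline_finish_times(num_items=5, num_stages=4)
print(times)  # [4, 5, 6, 7, 8]
```

After the first item fills the line, one item completes every cycle, so N items through S stages take S + N - 1 cycles in total, and each item's completion time is known in advance.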

Yongcheng Yang: There's also a deeper business competition question here. On the training side, NVIDIA has near-total monopoly. When it comes to inference, will it quickly become a red ocean?

Bin Yang: The most favorable aspect of the inference market for us entrepreneurs is that it's actually a very long-tail market.

Training scenarios are simple: just stack compute in the cloud. But on the inference side, scenarios are extremely fragmented. From a smart band to an edge server to an automotive intelligent driving system, the chip needs are completely different. This means no perfect "hexagonal warrior" (all-rounder) chip can solve every problem in this market.

Moreover, because inference-side opportunities lean toward the application end, and China is the world's largest supplier of electronic products, we're closest to the supply chain and customers — this is also the opportunity for Chinese companies building LPUs.

/ 03 /

Changing Lanes to Overtake: China's Supply Chain and "LPU Plus"

Yongcheng Yang: We just discussed how LPU has been developing for 9 years but only recently got hot. Beyond market drivers, I've also observed that relatively few people in the industry have been tracking the LPU technical route. Why is that?

Bin Yang: Beyond timing issues, there are extremely high technical barriers.

On the hardware side, LPU is a pipeline structure — every stage requires careful custom design. You can't copy cores like with GPUs; design verification workload is massive.

On the software side, compiler challenges are enormous. You need to understand processor architecture, compilers, and place-and-route all together — this "trinity" of talent is extremely rare.

Yongcheng Yang: Actually, as a pioneer, Groq also stepped on many landmines. For example, early on, in pursuit of speed, it used large amounts of on-chip SRAM, leading to excessive die area and soaring costs.

But this is also an opportunity for us. Because Groq is far from consumer electronics supply chains, its iteration is slow. China is the world's largest electronics supplier, so we're closest to the hardware, closest to customers, and we can iterate fast.

Bin Yang: Yes, this is what we're doing with "LPU Plus." We're not simply copying Groq; we've made extensive architectural upgrades and improvements, solving cost and storage problems. The data we've run is extremely consistent with Groq's, and we're confident we can do even better.

Yongcheng Yang: Following a giant's footsteps to surpass him is very hard. But on this new LPU track, plus our proximity to market, there are more opportunities to overtake. So Bin, what are your main application deployment scenarios going forward?

Bin Yang: Medium-term, it's embodied intelligence — that's the vast ocean of stars. Short-term, it's reconstruction of traditional hardware.

The logic behind this is that model capabilities have shifted from "classifiers" to "generators." Before, a camera could only classify "this is a cat." In the future, a camera can write its own daily work report answering "Was there anything suspicious today?" These are enormous business opportunities: every existing market can be rebuilt.

Yongcheng Yang: Finally, a personal question. What was the moment you decided to jump in and start your own company?

Bin Yang: We had actually scanned all architectures and understood that LPU was the best, but the moment that made me determined to start a company hadn't arrived.

Until Spring Festival 2025, when the DeepSeek-R1 technical report was released. That paper struck me enormously: large models are finally not a bubble — they're usable. Model capability is strong; costs have dropped to a stage where everyone can use them.

That night, after finishing the paper, I felt I could finally jump in. Although the LPU path is difficult, I believe in FreeS's phrase: do the hard and right thing.
