Who Will Own the Next Decade of AI Compute After the Global Semiconductor Stock Rally? | Linear View

Linear Capital · June 3, 2026

Stock prices are a sentiment meter; the industry itself is the scale that measures true value.

Recently, global semiconductor stocks have seen dramatic volatility in secondary markets, with individual names diverging sharply. But beneath the surface of price swings, a deeper industry question is coming into focus: Is the center of gravity for AI compute shifting from training to inference?

On May 26, Harry Wang, Founder and CEO of Linear Capital, was invited to join Barron's Elite Roundtable, setting aside short-term price fluctuations to offer a systematic, in-depth analysis of the ARM versus x86 architecture debate, the true moat of NVIDIA's CUDA ecosystem, the evolving landscape of training versus inference compute, and the synergistic relationship between memory chips and GPUs — all grounded in technical fundamentals and industry trends.

The following is a partial transcript of the live session, representing personal technical views only and not constituting investment advice. Discussion is welcome.

Recent turbulence in global semiconductor equities has been striking. Viewed purely through the lens of stock prices, this appears to be yet another round of short-term capital rotation. But shift your gaze from the candlestick charts to the underlying logic of chip design, and you'll find that these secondary market movements actually reflect an industry proposition that is accelerating in real time:

Is core AI compute demand shifting from the training side to edge inference? If so, how will this reshape the entire technical landscape — from instruction set architectures and chip design to ecosystem dynamics?

Is ARM's surge a short-term, earnings-driven emotional spike, or the starting point of a fundamental repricing of RISC (Reduced Instruction Set Computing) value in the AI inference era? Is NVIDIA's CUDA ecosystem truly unassailable? It may be on the training side, but how large a gap can custom chips from AMD, Google, and Microsoft open on the inference side? With inference compute workloads having already surpassed training and projected to reach 70–80% of the total, what new technical challenges does this pose for power consumption, cost, memory bandwidth, and heterogeneous design?

These questions cannot be answered by tracing capital flows or interpreting technical charts. They demand a return to the source of the technology itself — to instruction sets, architectural design, and ecosystem moats — to find answers that actually matter.

Sun Cheng, Barron's: ARM surged 42.58% in a single week. The core market narrative is that "AI is shifting from training to edge inference, and ARM architecture will be a major beneficiary." So from a technical perspective, what are the fundamental advantages of ARM's instruction set and licensing model in AI inference scenarios compared to x86? Is this repricing driven by short-term earnings catalysts, or is it a long-term industry inflection point?

Harry Wang, Linear Capital: From a technical standpoint, the broad narrative of the market shifting toward edge inference-centric AI compute applications is fairly clear. There are three core reasons why ARM is better suited for the edge than x86:

First, instruction set advantage. ARM is RISC (Reduced Instruction Set Computing); x86 is CISC (Complex Instruction Set Computing). RISC doesn't require overly complex pipeline designs, so for the same die area you can integrate more cores, cache, or dedicated acceleration units, which benefits near-memory computing. That matters because a major bottleneck in AI inference is memory communication. ARM also offers flexible physical IP and customization capabilities that are well suited to AI inference operations. Under typical edge workloads, ARM's energy efficiency is markedly better than x86's, with power consumption often close to 50% lower, which is a massive cost advantage.

Second, licensing model advantage. ARM operates on an IP licensing model, so customers can do custom designs after obtaining the license; how to coordinate the CPU with the GPU, NPU, and so on is entirely up to them. x86 is essentially a "black box," making it very difficult to do heterogeneous, layered optimization on top of it. This openness makes ARM more attractive for integrating high-bandwidth, low-power memory such as LPDDR.

Third, matrix operation optimization. The matrix multiplications that dominate Transformer models can be optimized far more efficiently on ARM than on x86. x86 can do matrix multiplication efficiently too, but at the same throughput it falls short of ARM's vector solutions on energy efficiency and area efficiency, which makes it uneconomical at the edge. If you care about compatibility and maturity, x86 is an option; but if you care about cost-performance and power consumption (a huge cost factor going forward), ARM's advantages are clear.
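To make the "memory communication" bottleneck from the first point concrete, here is a rough back-of-the-envelope sketch. It assumes the common simplification that generating one token requires reading all model weights from memory once, so memory bandwidth caps decode speed. The function name and the device numbers are hypothetical illustrations, not figures from the interview.

```python
def max_decode_tokens_per_second(param_count, bytes_per_param, bandwidth_bytes_per_s):
    """Upper bound on single-stream decode speed when weight reads dominate.

    Assumes every generated token requires streaming all model weights from
    memory once, and ignores KV-cache traffic, compute time, and batching.
    """
    bytes_read_per_token = param_count * bytes_per_param
    return bandwidth_bytes_per_s / bytes_read_per_token


# Hypothetical edge device, not from the interview:
# 7 billion parameters, 8-bit weights (1 byte each), 100 GB/s memory bandwidth.
bound = max_decode_tokens_per_second(7e9, 1.0, 100e9)
print(f"{bound:.1f} tokens/s upper bound")  # prints "14.3 tokens/s upper bound"
```

Under these assumptions, doubling memory bandwidth doubles the ceiling while adding compute does not move it, which is one way to read the emphasis on near-memory computing and memory integration at the edge.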

From an industry trend perspective, AI is shifting from training to inference, especially toward Agentic AI. This isn't just large model dialogue; it involves massive API calls, network calls, file calls — all of which rely on the CPU. As a CPU, ARM holds significant promise for Agentic AI applications at the edge (PCs, phones, vehicles, etc.).

I believe the broad direction is correct, and market recognition of this inflection point has already formed. But short-term stock price volatility will be substantial; over the long term, the realization of these major trends will solidify value. In the long run, the market is a weighing machine; in the short run, it's an emotional voting machine.

Sun Cheng, Barron's: NVIDIA fell 2.58% last week (the week of May 18), with market concerns about slowing growth. AMD rose 3.60%. In the AI chip space, is NVIDIA's CUDA ecosystem moat still secure? Can AMD's MI-series GPUs narrow the gap in inference scenarios? How significant is the threat from customer-designed chips to NVIDIA?

Harry Wang, Linear Capital: On the training side, no one can challenge NVIDIA's position right now. But as large models proliferate and Agentic AI develops, inference-side compute demand has risen substantially, creating opportunities for other players. Inference-side demand is more complex: it requires heterogeneous architecture co-design across CPU, GPU, and NPU, and it places high demands on memory. This is what gives AMD and others a real chance to grab a slice of the pie.

AMD will certainly benefit on the inference side from the sentiment of "the world has suffered under NVIDIA's yoke for too long." Enterprises need a reliable second supplier to ensure supply chain security and avoid being held captive by a single vendor. So as long as a viable second supplier emerges, customers will tilt toward it with extra support. AMD's data center business is growing at a significant clip, but overall revenue growth still lags far behind its stock price gains. There is certainly an element of emotional volatility here, but the inference-side opportunity the market is pricing in is real.

For NVIDIA, inference-side sales will be impacted to some extent, but not fatally. The CUDA moat cannot be easily surpassed overnight. At the same time, Google, Microsoft, Amazon, and others are all developing custom inference chips — partly to reduce dependence on a single supplier, and partly to do customized heterogeneous design for their own specialized scenarios. This trend is clear. But the most general-purpose, most universally applicable chips remain NVIDIA's. For the next two to three years, even on the inference side, NVIDIA will still hold the dominant share — just not with the same concentration as on the training side. On training, there may be no substantive challenger for the time being.

Sun Cheng, Barron's: Market views hold that AI investment is shifting from "training compute" to "inference applications." From an actual deployment perspective, what is the approximate current global ratio of training to inference workloads? How will the inference share change over the next two years? What new technical challenges does this shift pose for chip design, memory bandwidth, power consumption, and so on?

Harry Wang, Linear Capital: I've reviewed some reports. The current compute workload split between inference and training is roughly 60:40 or 55:45 — inference has already surpassed training. But capital expenditure (Capex) is inverted: training still accounts for 60%, inference for 40%, because the supernodes required for training are far more expensive.
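As a quick arithmetic check on what this inversion implies, the sketch below divides each side's Capex share by its workload share, using the 60:40 figures quoted above. It assumes both shares cover the same period and comparable units, which the interview does not specify; the variable and function names are illustrative only.

```python
# Shares quoted in the interview (60:40 workload split, inverted for Capex).
workload_share = {"training": 0.40, "inference": 0.60}
capex_share = {"training": 0.60, "inference": 0.40}


def capex_intensity(segment):
    """Capex share per unit of workload share (dimensionless)."""
    return capex_share[segment] / workload_share[segment]


training_intensity = capex_intensity("training")    # 0.60 / 0.40
inference_intensity = capex_intensity("inference")  # 0.40 / 0.60
ratio = training_intensity / inference_intensity

print(f"training:  {training_intensity:.2f}")
print(f"inference: {inference_intensity:.2f}")
print(f"training is about {ratio:.2f}x as capital-intensive per unit of workload")
```

On these numbers, a unit of training workload carries roughly 2.25 times the capital spend of a unit of inference workload, consistent with the point that training supernodes are far more expensive.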

The inference share will rise substantially going forward. For one thing, global penetration of large model applications is still quite low. Data shows that 86% of people on Earth have never touched a large model application. Beyond China and the US, many countries have very little large model usage. For another, Agentic AI will dramatically increase the frequency of large model invocations. Over the next three to five years, it is entirely plausible for inference compute to reach 70–80% of the total.
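A simple way to sanity-check the 70–80% figure is to ask how much faster inference would have to grow than training to get there from today's split. The sketch below assumes only two categories (training and inference) and solves for the relative growth factor; the function name is hypothetical.

```python
def relative_growth_needed(current_share, target_share):
    """Factor k by which inference must grow relative to training.

    Solves k * s / (k * s + (1 - s)) = t for k, where s is the current
    inference share and t is the target inference share.
    """
    s, t = current_share, target_share
    return (t * (1 - s)) / (s * (1 - t))


for start in (0.60, 0.55):
    for target in (0.70, 0.80):
        k = relative_growth_needed(start, target)
        print(f"from {start:.0%} to {target:.0%}: inference grows {k:.2f}x relative to training")
```

Starting from a 60% share, inference needs to grow roughly 1.6 times as much as training to reach 70%, and roughly 2.7 times as much to reach 80%.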

Another factor is that the pace of iteration on the training side is slowing markedly. The companies capable of training large models are concentrating; fewer and fewer enterprises can afford training costs. With reduced competition, companies will lean more toward commercializing existing models rather than rapidly launching new ones. Training-side compute demand is declining in relative terms. The exception would be breakthroughs in areas like embodied intelligence or world models that create new training demand at the 10,000-GPU scale; that could become a new variable.

This shift from training to inference imposes higher demands on chip design. Inference is extremely sensitive to cost, power consumption, and latency, and must adapt to diverse scenarios from cloud to edge, which creates segmented entry opportunities for different architectures (GPU, ASIC, CPU). Requirements for memory bandwidth and context caching (KV Cache) are also more complex than on the training side. These technical challenges are precisely what create more market opportunity for architectures like ARM and for memory chips.
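To illustrate why KV Cache drives memory requirements, the sketch below uses the standard sizing formula for a Transformer's key/value cache: two tensors (keys and values) per layer, each holding one vector per KV head per token. The model configuration is a hypothetical example chosen for round numbers, not one discussed in the interview.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Memory needed to cache keys and values for every token in context.

    The leading factor of 2 accounts for storing both a key tensor and a
    value tensor in each layer. bytes_per_value=2 assumes 16-bit storage.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


# Hypothetical model: 32 layers, 8 KV heads, head dimension 128, 16-bit cache.
GIB = 2 ** 30
for context_tokens in (8_192, 32_768, 131_072):
    size = kv_cache_bytes(32, 8, 128, context_tokens)
    print(f"{context_tokens:>7} tokens -> {size / GIB:.0f} GiB per sequence")
```

The cache grows linearly with both context length and the number of concurrent sequences, so long-context and agent-style workloads multiply memory capacity and bandwidth needs even when the model itself stays the same size.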

Sun Cheng, Barron's: AI server demand is exploding, and some argue that "AI memory chips will take the baton from GPUs to become the new engine of the rally." Do you believe memory chips can truly take the baton? What are their growth prospects compared to GPUs, and what are their advantages or shortcomings?

Harry Wang, Linear Capital: Whether memory chips can "take the baton" depends on what you expect from the baton. If you're expecting hundreds of percentage points of gains over a few years like NVIDIA, I personally think that would be difficult. The GPU "baton" is too heavy; it's hard for anything else to catch a baton of that magnitude.

That said, inference-side memory demand is indeed rising substantially. KV Cache, longer context windows, and similar needs keep pushing memory requirements higher. Long-term, memory chip demand is definitely increasing. In the near term, all AI-related plays have valuation bubbles to some degree, but the development and demand of AI itself is not a bubble; it continues to advance rapidly. Right now, price far exceeds value, but long-term value will catch up. Absent major geopolitical disruptions, the long-term value of these stocks will chase their prices upward, unlike pure speculative themes with no underlying support.

As for growth prospects, memory chips and GPUs have a synergistic relationship, not a simple substitution relationship. Inference requires large-memory GPUs to exchange massive amounts of data through high-speed interconnects, which also places higher demands on optical communications and interconnect technologies. So the more likely scenario is that memory, optical communications, and multiple other directions develop in concert.

For investing, this is not a logic of waiting for short-term explosions; it should be viewed through a longer-term lens. These directions are all massive growth sectors, but whether there is short-term price froth and how severe emotional pricing may be — that I cannot say.

This article was first published in Barron's

Personal technical views only, not constituting investment advice