An Under-the-Radar AI Chip Company Raises Nearly RMB 1 Billion, Targeting Large Model Inference
Walking between two "hands."

By Zhiyan Chen

An under-the-radar 3D AI chip company has closed two funding rounds in four months.
"Waves" has learned exclusively that Suanmiao Technology recently completed two consecutive rounds totaling nearly RMB 1 billion. The Pre-A round was co-led by Source Code Capital and Stonepine Capital, with follow-on investments from Lenovo Capital and several other core semiconductor industry players. The Pre-A1 round was led by Xianghe Capital, with additional backing from state-backed investors including China Development Bank Capital and Beijing Shunxi. The proceeds will be used for R&D and mass production of fully domestic 3D compute chips.
Suanmiao Technology is a company focused on long-term development of 3D compute chips. Its core product is customized 3D chips for AI large model inference.
"The biggest problem with existing AI chips isn't computation — it's memory. When Nvidia's H100 runs AI inference, up to 70% of its compute units sit idle, waiting for data to be moved from memory. Over the past 20 years, Moore's Law has driven a 60,000-fold increase in compute power, while memory bandwidth has only grown 100-fold."
Suanmiao founder Fuquan Wang told "Waves" that his goal is to solve the "memory wall" constraint on AI large model computation through computer architecture innovation, combined with a 3D IC supply chain built over years of collaboration with domestic semiconductor industry partners. Current 3D DRAM bandwidth can reach 32 TB/s — four times that of Nvidia's B200. Suanmiao's R&D focus is converting that high bandwidth into tangible inference performance.
Suanmiao provided "Waves" with Paladin simulation data for its chip A4, showing that on mainstream open-source large models like Llama and Mixtral, A4's inference throughput (tokens/s) reaches 1.26x to 2.19x that of Nvidia's H200.
Wang, 51, earned his PhD at the Institute of Acoustics of the Chinese Academy of Sciences (CAS), a state key laboratory, where he studied under Academician Renhe Zhang. After graduation, he joined the CAS Institute of Computing Technology for postdoctoral research in computer architecture, working with Weiwu Hu, chief scientist of the domestically developed CPU "Loongson."
Suanmiao's core scientists mostly graduated from CAS's Institute of Computing Technology, Institute of Acoustics, Institute of Automation, Tsinghua University, and other top institutions. Among them are veteran entrepreneurs who have spent years in the semiconductor industry, as well as principal researchers who conducted cutting-edge AI research at Microsoft Research Asia.
After 2019, full-chain domesticization became the dominant narrative in China's chip industry. But Suanmiao's team says they are not opportunists riding the "import substitution" wave. Behind this startup lies the story of a group of technology workers who were among the earliest to commit to domesticization, navigating between state will and market forces.
Twenty years ago, after earning his PhD from CAS, Wang naturally became a deep participant in the domestically developed "Loongson" CPU project. In 2009, he founded Sunsonic Technology, and for nearly a decade afterward focused on essentially one thing — various industrialization efforts around Loongson. In an era that worshipped brands and globalization, sticking with Loongson was an extremely lonely journey: the lack of a software ecosystem constrained the market competitiveness of domestic general-purpose processors.
In January 2018, Wang was invited to the annual party of a crypto mining chip design company, where he received "the most direct shock" — "A team of 10 people, doing several hundred million in revenue in one year, with profits over 100 million." This was because crypto algorithms demanded extreme hardware design (ASIC), and there was no software ecosystem barrier. It provided a "level playing field" for domestic chip companies to challenge international giants.
At 44, wanting to "find a new direction," Wang quickly embarked on a "midlife rebellion." He shut down all Sunsonic businesses related to Loongson, took RMB 10 million-plus in angel funding and a "5.5-person" R&D team, and with the dream of becoming a world-class chip design company, plunged headfirst into the dark forest of crypto mining.
Entering the completely unfamiliar crypto mining field, Wang's team didn't choose the most mainstream Bitcoin mining chips, but instead selected Ethereum — technically and commercially one of the most challenging options.
Upon entering the crypto chip space, the team discovered that unlike Bitcoin, which had long entered the ASIC mining era, on the Ethereum blockchain almost everyone was still mining with Nvidia and AMD graphics cards. Unlike Bitcoin's consensus algorithm, which was very ASIC-friendly, Ethereum's consensus mechanism (ethash) deliberately exploited the so-called "memory wall" problem and was designed to be "ASIC-resistant" — a classic memory-hard algorithm. This algorithm deliberately bottlenecked compute power on memory access bandwidth. To achieve extreme hash rate release, there was only one path: finding extreme memory bandwidth technology beyond standard bus memory (like DDR, HBM).
To crack this problem, Wang's team walked away from existing technologies at three separate junctures — each time on the verge of committing — before finally locking onto the then-nascent "3D stacking" architecture at the end of 2019. In Q4 2021, its high-throughput compute chip JASMINER X4 launched globally, using a mature 40nm process to achieve a hash-rate-to-power ratio 20x better than Nvidia's 7nm flagship graphics cards, effectively driving Nvidia and AMD GPUs out of Ethereum mining. In the final year before Ethereum's transition to PoS consensus, this single chip brought the team RMB 800 million in revenue and made JASMINER the world's top brand in Ethereum mining.
Unexpectedly, ChatGPT emerged at the end of 2022. Wang saw that behind AI large model computation lay a bottleneck extremely similar to Ethereum mining — compute power was being choked by the "memory wall." Through the team's long-term focused work, 3D stacking had been proven the best practical solution for such memory-hard problems, and the explosion of AI large models provided an extraordinarily vast compute scenario. Suanmiao Technology was born.
At a time when GPU capital stories at home and abroad are already dazzling, Wang believes Suanmiao's key to survival and development is "the new opportunity brought by computing paradigm shifts in the AI large model era."
"At Suanmiao we rarely mention 'domesticization' or 'import substitution,' because what we do is already the best globally. Our goal remains to become an internationally competitive chip company, contributing new solutions with Chinese advantages to global AI large model computation, alleviating the global compute power crisis and compute energy crisis," Wang said. His confidence comes from the team's years of accumulated R&D experience in 3D IC, and successful large-scale commercialization in the crypto mining market.
Suanmiao's two rounds brought together state industrial capital, 3D IC core supply chain industrial capital, and top market-oriented funds. At least at the capital level, this ambitious dream now has its most basic support.
Notably, Source Code Capital, which completed its investment in the second half of 2025, was Suanmiao's earliest lead investor. Looking across Source Code's portfolio, this long low-key firm has quietly completed a systematic buildout across the entire AI industry chain. From semiconductor materials to equipment to chips — ESWIN Materials, Biren Technology, SeeYA Technology, Changchun Guangxun Optoelectronics, Sihang Semiconductor, Accelerated Epochs — to AI + robotics players including Unitree, Galaxy General, Hillbot, Accelerated Evolution, Woan Robotics, and on to models and applications like Kimi, Lovart, sand.ai, Meshy, and AI for science company DP Technology — over 20 companies in total. Clearly, Source Code has no intention of missing the AI era rushing toward us.
In the winter of 2025, at Suanmiao Technology's headquarters in Beijing's Zhongguancun, "Waves" met Wang. This middle-aged entrepreneur who emerged from CAS described a grand vision and strong confidence facing the current market. From his account, you can see how a scientist-entrepreneur with a "national team" background embraced the market, believed in the market, and then returned to the "fully domestic" path with a new mindset.
The following is the interview —
Part 01
The "Top-Tier" Chip Architecture That Nvidia Won't Build
"Waves": Your simulation data shows A4 using 12nm process "defeating" Nvidia's H200 built on TSMC 4nm in inference performance. Is this reasonable?
Wang: Counterintuitive, but it aligns with the physical nature of large model computation. Large model inference is a classic "memory-bound" task. Simply put, the bottleneck isn't that the brain isn't spinning fast enough (insufficient compute cores), but how quickly data can be brought into the brain (memory bandwidth).
Think of a compute chip as a factory, with data in memory chips as raw materials. Nvidia's H200 has very complete factory equipment (4nm process), capable of producing all kinds of products (general-purpose processor). But the problem now is how to quickly transport raw materials into the factory. The 2.5D chip (CoWoS) approach is to build wider conveyor belts, ultimately constrained by the factory gate width (shoreline). The H200's die size is already at the limit of mass-producible dimensions, with memory bandwidth of 4.8 TB/s.
The 3D architecture chip takes a completely different approach. We stack memory chips directly on top of the compute cores (shortening transport distance), and build hundreds of thousands of vertical elevators, so raw materials can quickly reach every corner of the factory, no longer limited by gate width, achieving 16-32 TB/s bandwidth. Meanwhile, our factory (compute chip) is specifically designed for AI large model inference (specialized chip), so we can achieve higher inference performance with less equipment (12nm process).
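Wang's "factory" analogy maps onto a standard back-of-envelope calculation for memory-bound inference: in single-stream decoding, generating each token requires streaming roughly the full set of model weights from memory, so throughput is capped at bandwidth divided by model size in bytes. A minimal sketch — the bandwidth figures are the ones quoted in the article, while the model size and precision are illustrative assumptions, not Suanmiao or Nvidia measurements:

```python
# Back-of-envelope ceiling for memory-bound decode throughput:
# tokens/s ≈ memory bandwidth / bytes streamed per token.
# Model size and precision below are illustrative assumptions.

def decode_tokens_per_s(bandwidth_tb_s: float, model_params_b: float,
                        bytes_per_param: float = 2.0) -> float:
    """Upper bound on single-stream decode throughput.

    bandwidth_tb_s: memory bandwidth in TB/s
    model_params_b: parameter count in billions
    bytes_per_param: 2.0 for FP16/BF16 weights
    """
    bytes_per_token = model_params_b * 1e9 * bytes_per_param  # weights read once per token
    return bandwidth_tb_s * 1e12 / bytes_per_token

# A hypothetical 70B-parameter FP16 model at H200-class bandwidth (4.8 TB/s, per the article)
print(round(decode_tokens_per_s(4.8, 70)))   # ~34 tokens/s ceiling
# The same model at a 3D-stacked 32 TB/s
print(round(decode_tokens_per_s(32.0, 70)))  # ~229 tokens/s ceiling
```

The calculation ignores batching, KV-cache traffic, and compute limits, but it shows why bandwidth, not process node, sets the ceiling for this workload: the 12nm-vs-4nm comparison becomes plausible once the task is memory-bound.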
"Waves": If 3D architecture is so powerful, why doesn't Nvidia do it itself?
Wang: Nvidia is a great company. Its moat is built on the CUDA ecosystem and general-purpose GPU architecture. But this also means its hardware architecture innovations must yield to software ecosystem compatibility. Its hardware needs to accommodate graphics rendering, scientific computing, AI training, and various other scenarios — it must be an "all-around champion." Meanwhile, 3D stacking architecture innovation brings entirely new challenges. Suanmiao chose the customized ASIC approach, sacrificing the generality unnecessary for non-large-model computation in exchange for extreme inference performance. If Nvidia did this, it would be dismantling its own GPU empire. This is precisely the startup's opportunity — not burdened by the giants' past baggage, we can fight based on first principles.
"Waves": This leads to my next question — why only large model inference and not training? Are you avoiding the frontal battlefield?
Wang: This isn't avoidance; it's strategic focus.
From a technical perspective, large model training requires not just chip design capability but also engineering capability for interconnecting tens of thousands of chips and complex software stack ecosystems. Therefore the training market isn't an appropriate entry point for a startup.
From a market perspective, 90% of future AI compute demand will occur on the inference side. Large model inference compute demand will far exceed training compute demand. Large model training will eventually converge. In the future, everyone's phones and every company's servers will mainly be running large model inference. At that point, inference cost (TCO) will be the only consideration. Customers won't care whether you're a GPU — only how much money and electricity it takes to generate one million tokens. This is precisely ASIC's strongest battlefield.
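The "cost per million tokens" framing Wang describes reduces to simple TCO arithmetic: amortized hardware cost plus electricity, divided by tokens served. A sketch with entirely hypothetical inputs (none of these figures come from Suanmiao or Nvidia):

```python
# Illustrative TCO per million generated tokens (all inputs are hypothetical).

def cost_per_million_tokens(card_price_usd: float, lifetime_years: float,
                            power_w: float, usd_per_kwh: float,
                            tokens_per_s: float, utilization: float = 0.5) -> float:
    """Amortized hardware + electricity cost per million tokens served."""
    hours = lifetime_years * 365 * 24
    tokens_m = tokens_per_s * utilization * hours * 3600 / 1e6  # million tokens over lifetime
    energy_cost = power_w / 1000 * hours * usd_per_kwh          # kWh consumed * price
    return (card_price_usd + energy_cost) / tokens_m

# Hypothetical accelerator: $30k card, 4-year life, 700 W, $0.10/kWh,
# 1000 tokens/s sustained at 50% utilization
print(round(cost_per_million_tokens(30_000, 4, 700, 0.10, 1000), 2))  # ~0.51 USD
```

Under these assumptions both terms matter — hardware dominates here, but at higher power draw or electricity prices the energy term grows — which is why an ASIC that cuts both cost and watts per token competes directly on the metric customers actually buy.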
Moreover, I believe this is an opportunity for Chinese people.
"Waves": Is this nationalist sentiment?
Wang: This isn't sentiment; it's my observation over the years of engineering thinking differences between China and the US.
American engineers excel at "abstract thinking" and software. Look at CUDA, Windows, iOS — they excel at abstracting the complex world into layers of standard interfaces, building ecosystems. This is what American engineers do well.
But ASIC specialized chips are different — they're the ultimate in "concrete thinking." They require arranging transistors with Swiss watch precision in an extremely small physical space, repeatedly polishing to save a tiny bit of power or squeeze out a bit more performance, even "hacking" memory dies. This meticulous craftsmanship of "making a temple in a snail shell" is precisely what Chinese engineers excel at.
The history of crypto mining chips has already proven this: although the West invented Bitcoin and Ethereum, the company that ultimately dominated Bitcoin mining chips was Bitmain, and the one that ruled Ethereum mining was Sunsonic — both homegrown Chinese companies. On the AI inference battlefield, which similarly demands extreme efficiency, I believe history will repeat.
"Waves": Will customers accept this separated chip layout of training and inference?
Wang: Training and inference are two different scenarios/customer groups. Only a very small number of top players can sustain large model training. The inference customer base is much larger. Our first product focuses on large model inference, allowing our chip architecture to be extremely streamlined — only needing to accommodate limited generality within inference scenarios, while dedicating more resources to enhancing inference computation itself. This brings lower cost and lower power consumption. Suanmiao doesn't want to replace all of Nvidia, but to carve out the biggest piece of its future growth.
"Waves": How big is this piece?
Wang: For large model inference alone, globally this is already a hundred-billion-dollar compute market. In China alone, it's already a market of hundreds of billions of RMB. And both are growing faster than most people expected.
"Waves": Domestically there are the GPU "N Little Dragons," plus big tech companies eyeing this space. In such a crowded competitive environment, you're quite confident.
Wang: We don't do GPUs; that's not our strength. For six or seven years, our team — driven by market demand and working hand-in-hand with supply chain partners — pioneered China's 3D stacking chip field and has initially established significant advantages globally. We focus on R&D, mass production, and global sales of 3D chips. As a startup, we've invested over RMB 1 billion in this field, pouring our earnings and funding into it, building an experienced and battle-hardened integrated team for 3D chips and large model computing software and hardware. We certainly have reason to be confident.
At the same time, we maintain a mindset of treading on thin ice, carefully and steadily walking the broad road we've pioneered. We believe 3D chips represent the future of computing, and that ASIC is the correct path for large model inference computing — GPU is just a transition. But either way, we must thank GPU companies represented by Nvidia for giving birth to this great era of AI large models.
Part 02
What Is the Correct Answer for Domesticization?
"Waves": That 2018 annual party completely changed you. Why was its impact so great?
Wang: It was my friend's company. They also made chips, but made the crypto mining chips that we "regular troops" looked down on. That night what shocked me wasn't just those numbers — 10 people, RMB 300 million revenue, RMB 100 million profit — but the brutality and directness of that business model.
They didn't need to "beg" customers, didn't need to write tedious application materials. As long as the chip's hash rate was strong enough, customers would line up with cash in hand. That was the first time I intuitively felt that "hash rate is currency." The power of market forces was so raw. Based on our experience and understanding, I firmly believe that China's chip industry's future lies in market forces. China's technological market forces have been severely underestimated.
"Waves": Was it that easy to have a "midlife rebellion"?
Wang: I developed a profound self-doubt: If technology can't be converted into real money in the market, what is its value? If I wanted to build a great chip design company, it had to withstand market testing.
So I decided to "jump into the sea," letting go of all Loongson-related businesses. Many old friends didn't understand at the time, thinking I had "degraded" from building national heavy equipment to making crypto mining chips. But looking back, without those years of extreme survival training in the global market, without fighting in that arena that purely cared about PPA (performance, power, area), we could never have forged today's team, let alone developed together with domestic 3D IC core supply chain partners.
"Waves": Why did you choose Ethereum at the time?
Wang: Our team's instinct is to find big challenges. Ethereum was the second-largest cryptocurrency network at the time. More importantly, Ethereum mining chips were the most challenging chips in that field. Because the entire chip's algorithm bottleneck was stuck on "memory bandwidth." To double the chip's hash rate, you had to double the effective bandwidth. No shortcuts possible. This "memory wall" problem has long plagued the entire computing field.
"Waves": How did you crack this hard problem?
Wang: The team had just five and a half people — one member was still part-time. We were determined to take the "pure ASIC" path. Our first attempt was HBM, then the highest-bandwidth DRAM-based memory solution. But as research deepened, we quickly abandoned it over supply chain and cost-performance concerns — we saw HBM's limitations very early. The second approach used no external memory at all: limited on-chip SRAM, trading computation for capacity. Though the capacity was small, repeated computation could compensate. That approach took 18 months; we had even completed the chip design. At that point we found a third approach.
"Waves": You just dropped 18 months of work?
Wang: Abandoned it. Because we found something better. The third approach was strong interconnection, using SerDes to interconnect multiple chips to solve the low SRAM capacity problem. Power consumption was good, but the challenge was the board became extremely complex. We worked another 3 months until end of 2019, when we encountered what we believed was the ultimate solution: the 3D architecture.
"Waves": What was the process of pivoting to AI chips? What opportunity did you see?
Wang: When I first encountered AI large models in early 2022, ChatGPT hadn't broken through yet, and I was quite hesitant. On one hand, before this AI had always been much talked-about but little purchased — no chip company had really made money on AI chips. On the other hand, in the previous small-model era, various model architectures emerged endlessly, forcing design of general-purpose processors to accommodate all models, which went against our team's DNA.
In the first half of 2023 I was in Silicon Valley, personally experiencing this wave of AI large models. Scaling law-driven improvement in AI large model intelligence far exceeded my expectations; the Turing test was effectively broken — this was truly a major event in AI. Meanwhile, our team's research on Transformer algorithms had reached considerable depth. Our accumulated 3D architecture technology happened to be the most promising solution for the memory bottleneck in large model computation. So we quickly pivoted to AI large model chips.
AI compute power in the future will be like today's water, electricity, and gas — infrastructure for the new era. The core competitiveness of the AI era lies in compute power, and the future of compute power lies in architecture innovation. We firmly believe that 3D stacking architecture and ASIC extreme-optimization design philosophy are the optimal solution for AI large model compute power in the next 5-10 years.
"Waves": I know you're actively recruiting now. How do you persuade top engineers already at major chip companies to join you?
Wang: Simple — top talent should do top-tier work, and what we're doing is the most top-tier work in the global large model compute field. And our team has already undergone long-term, thorough commercialization training in global markets, with deep understanding of the compute business.
Additionally, we don't compromise on compensation. Whether equity incentives or salary, it's definitely top-tier.
"Waves": Starting from Loongson's "national team," you became a "grassroots hero" in crypto mining. Now you're back on the "national heavy equipment" main track of AI compute chips. Looking back, how do you define yourself now?
Wang: I think I've become a "realistic idealist."
Layout | Nan Yao
Image source | AI-generated

