Liang Jun's First Year at the Helm: Fangqing Tech Raises Over 500 Million RMB in Three Rounds Within Six Months
A star-studded lineup of shareholders quickly assembled.

By Zhiyan Chen

AnYong Waves has learned exclusively that AI chip and system architecture developer Fangqing Technology recently completed its Pre-A round. Strategic investors include a major internet company, Xinlian Capital, and Yima Capital (the industrial fund under Hundsun Electronics). Financial investors include GF Xinde and a top-tier VC firm. Existing shareholders Lingang Sci-Tech Innovation Investment and 37 Interactive Entertainment also made additional oversubscribed investments. The specific amount and valuation were not disclosed.
This marks Fangqing's third financing round in just six months. Previous investors include Xiaomi Strategic Investment, NIO Capital, Mingshi Ventures, Lingang Sci-Tech Innovation Investment, Huaye Tiancheng, GaoJie Capital, 37 Interactive Entertainment, New Intelligence Cognition, and Dawu Ventures, among other strategic and financial institutions. AnYong Waves understands that recent funding has exceeded RMB 500 million. The capital will be used for core technology R&D, productization, and ecosystem and market expansion.
Fangqing Technology was founded in late 2022 and is registered in the Lingang Free Trade Zone in Shanghai. In August 2024, Liang Jun — former CTO of Cambricon and former chief architect of Huawei HiSilicon's Kirin SoC — joined Fangqing as CEO.
Subsequently, Fangqing proposed a new technical direction: a distributed computing architecture that decouples "context-aware" and "context-free" processing. Specifically, it separates the Feed-Forward Network (FFN) and the Attention mechanism into two independent modules and assigns each to the hardware best suited to it in a distributed system — rather than chaining them together within the same layer as traditional Transformers do — thereby improving overall computational efficiency.
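To make the decoupling idea concrete, here is a minimal toy sketch (not Fangqing's actual design; all sizes and weights are made up). Attention mixes information across the whole context (context-aware), while the FFN applies the same weights to each token independently (context-free) — which is the property that, in principle, lets the two sub-blocks be scheduled on different hardware:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8        # model dimension (toy size)
seq_len = 4  # tokens in the context

def attention(x):
    # Context-aware: every output row mixes information from all tokens.
    scores = x @ x.T / np.sqrt(d)
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return weights @ x

W1 = rng.normal(size=(d, 4 * d))
W2 = rng.normal(size=(4 * d, d))

def ffn(x):
    # Context-free: applied row by row; no token sees any other token.
    return np.maximum(x @ W1, 0.0) @ W2

x = rng.normal(size=(seq_len, d))

# A conventional Transformer chains the two inside one layer:
h = attention(x) + x
y = x + ffn(h)

# The context-free property: running the FFN token by token gives the
# same result as the batched call — this is what makes the FFN easy to
# hand off to a separate processor in a decoupled system.
per_token = np.stack([ffn(row) for row in h])
assert np.allclose(per_token, ffn(h))
print(y.shape)  # (4, 8)
```

The sketch only demonstrates the mathematical separability; the actual engineering question Fangqing targets is which hardware each module runs on and how the two communicate.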
From a capital perspective, Liang Jun's name in the chip industry itself signals a systems-level vision.
As one of the few architects in China's chip sector with top-tier experience in both general-purpose SoCs and high-performance AI chips, Liang's career spans nearly the entire golden two decades of Chinese chip design's rise. He spent 17 years at Huawei HiSilicon as chief architect of the Kirin SoC, single-handedly creating the Kirin 970 — the world's first smartphone SoC with an integrated NPU, making "on-device AI" a tangible reality. In 2017, he moved to Cambricon as CTO, leading the AI chip company's early technical planning and product development through its full journey from unicorn to STAR Market listing.
Liang's experience of "having seen the summit and personally led the climb" is undeniably scarce in today's primary market. And capital's heavy bet on Fangqing isn't just a bet on the person — it's a bet on the possibility of breaking the existing compute monopoly.
2025 is seen as the decisive year for AI application explosion. As domestic large models like DeepSeek spark a new wave, and giants like ByteDance, Tencent, and Alibaba announce hundred-billion-yuan AI infrastructure investments, compute anxiety has never felt so concrete. But in this surge, capital logic is also shifting subtly: from simply seeking Nvidia "alternatives," toward finding "new species" that can break through the Transformer architecture's efficiency bottlenecks.
Traditional Transformer architectures chain Attention — responsible for memory and context — and the FFN — responsible for logic and knowledge — at the same layer on the chip, creating significant efficiency waste. When you just want the AI to do simple logical reasoning, it has to drag along heavy memory modules; when you need to process ultra-long text, massive memory-throughput demands clog the compute units entirely.
Hence the emergence of completely different approaches: GPGPU, RVV (RISC-V Vector Extension), compute-in-memory, SRAM-based "speed-for-scale" designs. Fangqing's decoupled architecture is another new attempt.
Liang told AnYong Waves that the first priority in chip design should shift from pursuing single-chip performance and integration toward prioritizing a scalable system design. In practical terms, Fangqing's approach liberates compute capability from a single SoC-centric model, giving edge devices like smart glasses and earbuds the chance to become compute nodes on par with smartphones.
"When you change the first priority to pursuing a scalable system design, it transforms existing AI hardware design thinking and opens opportunities to create new system forms and new markets."
This perhaps also means that Fangqing's ceiling isn't merely as an AI chip and system architecture design company, but holds platform-level potential to define next-generation AI hardware systems. This imaginative space partly explains how it so rapidly assembled a luxury shareholder roster spanning internet giants, hardware leaders, automakers, state capital, and top-tier VCs.
When a top-tier architect returns to the stage with fresh thinking for the "post-Transformer era," capital, firm in its belief that "AI is the future," indeed has little reason to miss this bet.
Below is AnYong Waves' conversation with Fangqing Technology CEO Liang Jun —
AnYong: Domestic GPU chip companies are exceptionally hot right now — Moore Threads surged post-IPO, MetaX is in the listing process, and Huawei has also pivoted to GPU architecture. What's your view on the future of GPU chips?
Liang Jun: GPGPU architecture (note: General-Purpose GPU, used for non-graphics computing tasks with emphasis on general compute capability) is designed for high concurrency and high throughput. Achieving low latency simultaneously requires greater cost and sacrifice.
Moreover, compared to global competitors, Chinese companies face more supply chain restrictions. Under constraints including regulated process nodes, the difficulties are greater. In other words, the cost Chinese companies pay is higher.
Many domestic companies' pivot to GPGPU architecture is partly based on better compatibility with Nvidia's existing software ecosystem. Another reason comes from market feedback over recent years — recognizing that the market is fundamentally general-purpose compute-oriented, and that deficiencies in underlying general-purpose compute design will eventually manifest in products' inability to support customer needs.
From an R&D organization and management perspective, when the organizational goal is defined as CUDA compatibility, there's much less need to reverse-engineer from application-layer requirements to verify whether the underlying software and hardware implementations are appropriate. With Nvidia's design as the benchmark, you can directly check whether your low-level software-hardware implementation matches it, which simplifies R&D management considerably. The cost is that, given current supply chain realities, the product's achievable ceiling is also capped.
AnYong: Compared to this, what's new about Fangqing's architecture?
Liang Jun: Fangqing's goal is to become a company that defines innovative systems, so we've chosen a different path and are targeting completely different markets.
General-purpose compute design is difficult because a large number of low-level design details ultimately surface in the programming interface. Balancing programming generality with specialized hardware acceleration is not merely a technical problem but also an R&D organization and management problem.
Operating a team with excellent technical taste while also delivering on time is extremely challenging work. But based on our experience, it remains achievable.
AnYong: Many companies have also emerged with new technical architectures — domestically Kunlunxin represents this, and overseas there are Groq, Tenstorrent, and others. How do you evaluate current mainstream architecture directions?
Liang Jun: At the most fundamental compute-core design level, things have converged to a limited set of options —
One is GPGPU, providing CUDA-compatible or CUDA-like programming interfaces to programmers.
Another emerged after the release of RISC-V RVV (the RISC-V Vector Extension) V1.0 in the second half of 2021: designing compute systems based on RISC-V. The advantage is adopting an open-source instruction set at the lowest level rather than a proprietary one, significantly reducing customers' software-investment risk. But there are problems. The current issue is that nearly all vendors are designing with CPU design thinking — what's actually needed is merely RISC-V instruction-set compatibility, with the hardware design implemented through entirely new thinking.
The third is cloud vendors' proprietary designs. Because they're sold as services, deficiencies in a chip design's generality can be compensated for through system design. Google does this best: after eight or nine years iterating through seven generations of chips, it achieved a breakthrough this year for specific major customers. Groq can also be placed in this category — because the LLM decode phase produces output serially, Groq's design pursues extreme low latency. With system throughput equal to parallelism multiplied by the inverse of latency, it gains a competitive advantage in user experience and per-token cost. The sacrifice is programming generality, hence the service-based sales model.
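The throughput relation Liang cites can be illustrated with a few lines of arithmetic. The figures below are hypothetical, chosen only to show the trade-off between a latency-optimized design and a parallelism-optimized one:

```python
# throughput (tokens/s) = parallel streams x (1 / per-token latency)
def throughput(parallel_streams: int, per_token_latency_s: float) -> float:
    return parallel_streams / per_token_latency_s

# Latency-first design (Groq-style; numbers are illustrative only):
# few concurrent streams, very low per-token latency.
low_latency = throughput(parallel_streams=8, per_token_latency_s=0.002)

# Parallelism-first design (GPGPU-style; numbers are illustrative only):
# many concurrent streams, higher per-token latency.
high_parallel = throughput(parallel_streams=512, per_token_latency_s=0.050)

print(low_latency)    # 4000.0 tokens/s
print(high_parallel)  # 10240.0 tokens/s
```

Under these made-up numbers the parallel design wins on aggregate throughput, while the latency-first design delivers each individual token 25x faster — which is exactly the user-experience advantage the serial decode phase rewards.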
AnYong: So Fangqing is different from all of them?
Liang Jun: Fangqing's path is designing systems based on a decoupled architecture. We believe the decoupled structure represents a higher-level computing architecture and programming model, and this is our current focus. The choice of low-level compute-core route is no longer a focus — it already has definitive conclusions. The first two approaches each have strengths and weaknesses; we don't have strong preferences. Any route done well can meet market needs. The consistent requirement is maintaining an appropriate balance between programming generality and specialized hardware acceleration, which demands that software and hardware teams understand this correctly.
Additionally, under the decoupled-architecture definition, the system is decomposed into context-aware parts and context-free parts, with native support for heterogeneous computing.
AnYong: You emphasize the distributed computing architecture decoupling "context-aware" and "context-free." What thinking underlies this?
Liang Jun: AI systems, in terms of the computation itself, are large-scale parallel computing. Regarding input-output sequence processing specifically, because different sequences are independent, the implementation methods — from software down to the underlying hardware — differ significantly from those used for weight-related computation. One could also say that the decoupled structure is a higher-level computing architecture and programming model oriented toward this application.
The computing paradigm is rapidly shifting from processor-centric system design to Memory-centric system design. In the current era of dominant AI models, this paradigm shift is factual but not yet widely recognized. This is the true reason why concepts like compute-in-memory and near-memory computing have gained such loud voices in the industry in recent years. However, current discussion is largely from a hardware perspective; viewing it through the lens of computing paradigm shift yields entirely new explanations.
Whether KV Cache or weights, both can largely be defined as Memory. Traditional Memory had two attributes: capacity and bandwidth. New Memory adds two new attributes: compute semantics and communication, becoming four-dimensional. If we draw a comparison, the input-output processing portion more resembles the processor in von Neumann architecture, while the weight-related processing portion more resembles Memory in traditional processors. Simultaneously, we believe these two portions can largely be viewed as new forms of Memory.
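The "two attributes become four" idea can be rendered schematically as a data structure. The attribute names below are my own labels for the interview's concepts, not Fangqing terminology:

```python
from dataclasses import dataclass

@dataclass
class TraditionalMemory:
    # The two classic attributes of memory.
    capacity_gb: float
    bandwidth_gbps: float

@dataclass
class NewMemory(TraditionalMemory):
    # The two added dimensions from the interview: what computation the
    # memory can perform in place, and how it communicates with peers.
    compute_semantics: str  # e.g. in-place accumulation over stored data
    communication: str      # e.g. peer-to-peer chip-to-chip links

# A KV cache viewed as "new Memory" (all values illustrative):
kv_cache = NewMemory(
    capacity_gb=16,
    bandwidth_gbps=900,
    compute_semantics="attention-score-accumulate",
    communication="chip-to-chip",
)
print(kv_cache)
```

In this framing, both the KV cache and the weights become four-dimensional Memory objects, and the von Neumann processor/memory split maps onto the input-output versus weight-processing split described above.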
Based on these understandings, we've re-examined system design approaches. For many years, the trend in system design has been toward higher integration — SoCs becoming increasingly powerful, integrating more and more functions. Faced with current algorithm evolution, individual SoCs, constrained by physical limitations, increasingly show pronounced limitations in bandwidth, memory capacity, and other aspects.
We believe that chip design's first priority should shift from pursuing single-chip performance and integration improvements toward prioritizing a scalable system design. From this perspective, adopting decoupled architecture for system design becomes a rational choice.
AnYong: Specifically, in what ways can this architecture enhance chip capabilities?
Liang Jun: In the decoupled architecture definition, the system is decomposed into context-aware parts, context-free parts, and communication between them. System expansion thus shifts from single-dimensional to multi-dimensional expansion, with clear boundary definitions between system components. From this perspective, we've examined various computing systems, and the conclusions are very positive. Not only can we design more forms of computing systems and create new markets, accelerating deployment of various applications, but because we're composing at the system level, system development can be appropriately decoupled from chip development cycles. Meanwhile, because there's no longer need to design a full-function SoC, chip costs and development costs can be reduced, which also benefits accelerating industry innovation speed.
AnYong: Using the plainest possible example for a general audience, what pain point are you trying to solve?
Liang Jun: For example, after a smartphone's SoC system is converted to decoupled architecture, phones, smart glasses, smart earbuds, smartwatches, and other devices can all serve as independent input-output processors connected to a weight processor — or put traditionally, the original SoC handles context-aware parts, while a new weight processor in the system handles context-free parts. So as long as you believe model capabilities will continue strengthening, various IO processors like earbuds and glasses need only connect to the weight processor to independently complete more functions. Under existing system definitions, these devices are peripherals to the phone SoC; in the new system, these devices are peers to the phone SoC.
Our judgment is that when the first priority shifts to pursuing a scalable system design, it will transform existing AI hardware design thinking, with opportunities to open new system forms and create new markets.
AnYong: With CUDA's ecosystem still dominant today, what is the relationship between Fangqing's architecture and CUDA? Will there be customer migration cost issues?
Liang Jun: Systems designed based on decoupled architecture natively support heterogeneous computing. Whether context-free parts or context-aware parts, both can be built based on existing systems. From this perspective, decoupled architecture-based systems represent the most compatible design with existing systems — which in some ways is counterintuitive.
The entire industry is still rapidly evolving. Fangqing's strategy is to design systems based on decoupled architecture. After system components are decoupled, innovation speed accelerates, enabling definition of more forms of computing systems and creation of new markets, rather than replacing existing systems.
AnYong: When you left four years ago, there was much speculation. Why ultimately choose to join a startup?
Liang Jun: The answer is simple: for Fangqing's goals, this work is better suited for a startup to accomplish. Startups have no historical baggage, making decisions simpler; facing market and capital market challenges, they must create innovative and differentiated products to survive; with small teams, more energy can be devoted to technical details.
I joined in August 2024. Since then, Fangqing has brought in multiple strategic investors. These strategic investors recognize Fangqing's team has the capability to complete technology platform development, market definition, and product R&D in a systematic manner, approve of current progress across all work streams, and are willing to provide financial support. So from the current perspective, this choice was correct.
On the other hand, my labor dispute with my former employer, involving enormous sums, is already public knowledge. Against that background, and after a two-year non-compete period, re-entering the job market would have been extremely difficult. Understanding reality and making difficult but correct decisions — I trust my own judgment.
AnYong: When people mention you now, they still say former Cambricon CTO, former Huawei HiSilicon Kirin SoC chief architect. In 5-10 years, how do you hope people will introduce you?
Liang Jun: The China market doesn't lack high-tech companies. What's rare are companies with sustained original technology creation at the foundational level, excellent technical taste, and super products that achieve market success. This is the future goal I've set for Fangqing. When that day comes, I hope people will know me this way — Liang Jun is Fangqing's CEO.
Layout by Yao Nan | Images by IC Photo

