Looking Ahead to 2025: What Innovation Opportunities Await the AI Industry? | FreeS Fund Report

FreeS Fund · November 28, 2024

Our Thinking on AI Technology Progress and Industrialization

In 2024, AI's "invasion" of the physical world seemed to become the new normal. Early in the year, OpenAI's video generation model Sora burst onto the scene, with users marveling that "reality no longer exists." In May, OpenAI released GPT-4o, a model capable of processing or generating text, images, audio, and other data formats. In October, the Nobel Prizes were announced, with both the physics and chemistry awards connected to AI. In early November, NVIDIA became the first company globally to surpass $3.6 trillion in market cap. But once we strip away AI's halo as a new technology and return to the fundamentals of industry, the AI sector still has numerous unresolved questions. Many tech companies have bet massive sums on compute, but have the returns been proportional to the investment? Along the AI value chain, which players actually have the most say, and can a value chain in which only a minority profits stay in equilibrium? Is building consumer-facing AI applications really a shortcut?

In this industry research piece, we focus on the core questions facing the AI industry and explore what new possibilities lie ahead. Here are some key takeaways:

  • The compute bottleneck isn't merely a technical or infrastructure problem — it's a critical variable affecting the entire industry's competitive landscape.
  • We're gradually entering a new era of flexible multimodal conversion. Simply put, this means using AI to achieve mutual understanding and transformation between text, images, audio, video, and additional modalities.
  • In the emerging picture of human labor, work is progressively becoming "software-ized": complex labor is abstracted into callable software services, workflows are heavily standardized and modularized, and labor capabilities become as easily accessible as "plug-and-play" tools.
  • The AI industry remains in a phase of severe losses, with enormous room for improvement in commercialization.
  • Cloud providers not only command vast commercial ecosystems and technical resources, but also possess cloud service markets worth hundreds of billions of dollars. They are the undisputed "chain masters" of the value chain.
  • In 2024, category shifts among leading AI applications weren't particularly significant. Creative tools (such as image and video content creation) still accounted for the largest share.
  • ToP (professional-facing) applications demonstrate strong market potential, ToB (enterprise-facing) applications follow relatively complex development paths, and ToC applications face considerable challenges.
  • In the AI application space, Copilot and AI Agent represent two primary technical implementation approaches. Copilot can be understood as "assisted driving," suiting established tech giants with first-mover advantages. AI Agent can be viewed as "autonomous driving," perhaps better suited to startups with sufficient innovative capacity.
  • North America and Europe account for two-thirds of the AI mobile application market, a major reason why numerous Chinese AI companies are actively expanding overseas.

We hope this offers fresh perspectives. We continue to track developments in the AI sector. If you're an entrepreneur or practitioner in AI, you're welcome to contact the author of this article, Chen Shi, Investment Partner at FreeS Fund (chenshi@freesvc.com). P.S. We used GPT to assist with editing portions of this content.

01 New Developments in AI in 2024

In 2024, OpenAI spent most of the year in a state of "being challenged." Anthropic's Claude 3.5 Sonnet, Google's Gemini 1.5, and other leading foundation models kept closing in on OpenAI's GPT-4. Only near the end of Q3 did OpenAI release its new o1 model. Its novel training and inference methods, based on chain-of-thought reasoning and reinforcement learning, demonstrated complex reasoning capabilities clearly surpassing those of traditional models like GPT-4, allowing OpenAI to maintain its position as industry leader.

"Multimodality" delivered pleasant surprises. Early in 2024, OpenAI's video generation model Sora burst onto the scene, demonstrating powerful video generation capabilities for the first time and sending shockwaves through the industry. In May, OpenAI released GPT-4o, where the "o" stands for "omni." This model can process or generate text, images, audio, and other data formats, and even features realistic real-time voice conversation capabilities.

The open-source world held its own as well. For instance, Meta launched the Llama 3.1 405B version in July, achieving parity with leading foundation models like GPT-4o and Claude 3.5 Sonnet in reasoning, mathematics, multilingual processing, and long-context tasks. One could say that Llama 3.1 narrowed the gap between open and closed models, further squeezing the survival space for non-leading foundation models globally. Additionally, Chinese open-source projects such as Qwen-2 and DeepSeek have won numerous users worldwide.

With the development of "distillation" and "quantization" techniques, model miniaturization and on-device deployment have gradually become trends. Multiple companies have released specialized or edge-side small models with under 4B (4 billion) parameters, drastically reducing compute demands while maintaining performance as much as possible. Apple released Apple Intelligence, a personal intelligence system for iPhone, iPad, and Mac, in June, embedding a roughly 3B (3 billion parameter) local model in these devices to provide powerful generative AI capabilities.
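To make the quantization idea concrete, here is a minimal sketch of symmetric 8-bit quantization, together with a back-of-the-envelope estimate of how much storage a roughly 3B-parameter model needs at different precisions. The function names and the toy weight values are illustrative assumptions, not any vendor's actual scheme; production systems typically use finer-grained methods (per-channel or per-group scales, mixed precision).

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the stored integers."""
    return [v * scale for v in q]


def model_size_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


weights = [0.82, -1.27, 0.05, 0.40, -0.66]   # toy values, not from a real model
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

size_fp16 = model_size_gb(3e9, 16)   # 16-bit weights
size_int4 = model_size_gb(3e9, 4)    # 4-bit weights

print(f"quantized: {q}, scale: {scale:.4f}, max error: {max_error:.5f}")
print(f"3B params at 16-bit: {size_fp16:.1f} GB, at 4-bit: {size_int4:.1f} GB")
```

The rounding error per weight is bounded by half the scale, which is why quantization can shrink storage several-fold while keeping outputs close to the original model.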

Generative AI and large model technologies accelerated their breakthrough into mainstream consciousness, achieving advances in basic science, autonomous driving, and embodied intelligence. Demis Hassabis and John Jumper of Google DeepMind, known as the "fathers of AlphaFold," received the Nobel Prize in Chemistry for protein structure prediction, while Geoffrey Hinton and John Hopfield were awarded the Nobel Prize in Physics for their research on neural networks — underscoring AI's profound impact on biology and physics. Also worth noting: thanks to multimodal large model development, the safety and reliability of autonomous driving have significantly improved, and the perception, decision-making, and interaction capabilities of embodied intelligent robots have been enhanced.

In the AI infrastructure domain, NVIDIA, with its formidable profitability (Q2 revenue of approximately $30 billion, net profit of approximately $16.6 billion) and monopoly position in compute chips, has become the second most valuable company globally, behind only Apple (as of November 26, 2024, market cap exceeding $3.3 trillion). Traditional competitors like AMD and Intel have been unable to close the gap, while Cerebras, Groq, and other AI chip startups hope to find openings in the inference chip market.

Compared to the triumphant advance of large models, AI application deployment has fallen short of expectations. This is reflected in the fact that leading products still have room for improvement in user growth, retention, and activity levels. Moreover, these applications are concentrated in a limited number of domains: large language model assistants, AI companionship, multimodal creative tools, programming assistance, sales and marketing. They've achieved some user or commercial results, but their coverage remains insufficient. Furthermore, the AI industry currently lacks sufficient self-sustaining revenue generation capacity, with severe misalignment between investment and output.

Industry observers believe the AI supply chain exists in a fragile equilibrium, with key participants including foundries (such as TSMC), chip manufacturers (such as NVIDIA), industrial energy suppliers, cloud providers, AI model developers, and application service providers — among which large cloud providers serve as risk absorbers. Should the confidence or investment willingness of large cloud providers falter, this fragile equilibrium could be broken, triggering supply chain turbulence.

02 Industry Macro Overview

Misaligned Investment and Returns

Tech giants and VCs have placed massive bets on the AI industry. According to Tencent Technology's analysis, four giants alone (Google, Meta, Microsoft, and Amazon) invested $52.9 billion in Q2 2024. As of the end of August, AI startups had secured as much as $64.1 billion in venture capital.

The effects of these massive investments are gradually becoming visible, with the four giants having built out 1,000 data centers. But AI data centers consume enormous amounts of energy. According to market research firm DataCenterHawk, from 2015 to 2024, data centers in the US and Canada have increased their power commitments to energy companies by nearly ninefold. Beyond energy, GPUs account for roughly half of data center costs, with NVIDIA generating $30 billion in revenue from GPU compute sales in Q2.

Beyond these hard costs, talent investment — the primary soft cost — continues to escalate in the AI industry. According to compensation data from third-party job site Levels.fyi for Q1 2024, senior AI engineers commanded average compensation of approximately $680,000, far exceeding the $495,000 for senior non-AI engineers.

Against these massive investments, the AI foundation model industry's total annual customer revenue currently amounts to only a few tens of billions of dollars. Among the leading companies, OpenAI is projected to generate roughly $3.7 billion in annual revenue while expecting $5 billion in losses; the New York Times reports that compute costs represent OpenAI's largest expense. Microsoft's GitHub Copilot generates approximately $300 million in annual revenue, yet The Wall Street Journal notes that for the first several months of 2024, GitHub Copilot was effectively "subsidizing" most users by an average of $20 per month, with some users receiving subsidies as high as $80. In short, the AI foundation model industry remains in a phase of severe losses. Sequoia Capital once noted in an article that $600 billion in annual customer revenue would represent a reasonable level for the AI industry, illustrating just how far current commercialization still has to go.

According to SensorTower data, global AI mobile app paid revenue for all of 2024 is estimated at $3 billion, with image and video AI applications dominating at 53% of revenue; conversational AI ranks second at 29%, with all other categories combined accounting for less than 20%. By region, North America and Europe contribute two-thirds of market share, making them the primary consumer markets for AI applications. This is a major reason why numerous Chinese AI companies are aggressively expanding overseas.
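As a quick sanity check, the percentages above can be turned into approximate dollar figures. The sketch below uses only the numbers quoted in this section; treating the "all other" share as the exact remainder (about 18%) is our simplifying assumption, and the variable names are illustrative.

```python
total_revenue_usd = 3e9            # estimated 2024 paid revenue (from the text)
shares = {
    "image_and_video": 0.53,
    "conversational": 0.29,
}
shares["all_other"] = 1 - sum(shares.values())   # remainder, about 18%

revenue_by_category = {
    name: total_revenue_usd * share for name, share in shares.items()
}
na_europe_revenue = total_revenue_usd * 2 / 3    # "two-thirds of market share"

for name, usd in revenue_by_category.items():
    print(f"{name}: ${usd / 1e9:.2f}B")
print(f"North America + Europe: ${na_europe_revenue / 1e9:.2f}B")
```

On these figures, image and video apps account for roughly $1.6 billion, and North America plus Europe for about $2 billion of the $3 billion total.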

Cloud Providers Become the "Chain Masters" of the AI Supply Chain

In its article "The AI Supply Chain Tug of War," Sequoia Capital observed that the AI supply chain currently exists in a fragile equilibrium. They divide the AI supply chain into six layers from bottom to top, with profitability varying dramatically across layers.

Layer 1 chip foundries (such as TSMC) and Layer 2 chip designers (such as NVIDIA) are the current primary winners, maintaining high profit margins; Layer 3 industrial energy suppliers (such as power utilities) have also benefited substantially from surging data center demand. Yet Layer 4 cloud providers, who serve as the core infrastructure backbone of the supply chain, are in a heavy investment phase — not only spending enormous sums to build data centers, but also training proprietary models or making major investments in AI model developers. Layer 5 AI model developers currently face losses as well. The sixth and final layer consists of application service providers serving end customers. Despite their considerable potential, they depend on consumer and enterprise payments, and the current market remains too small to sustain the entire supply chain's economic model. This makes large cloud providers the primary risk bearers across the entire supply chain. As the central hub of the AI industry, cloud providers command vast commercial ecosystems and technical resources, with market scale in the hundreds of billions of dollars. For this reason, their position in the industry chain is unshakeable — they are unquestionably the "chain masters."

Industry Landscape: The Top Tier Has Largely Stabilized

1. Leading Foundation Models

Over the past year, the top tier of US foundation models has remained largely stable, forming a "3+1+1" structure: three globally leading closed-source model companies (OpenAI, Anthropic, and Google), one leading open-source model company in Meta, and xAI following closely behind with Tesla's backing. Additionally, tech giants such as Apple may join this competitive field in the future; Apple's self-developed AFM model has already been deployed in its personal intelligence system, Apple Intelligence.

By contrast, China's foundation model industry is gradually consolidating. Leading cloud providers have not only launched their own foundation models but also actively invested in the top six foundation model startups (including Zhipu AI, Moonshot AI, Baichuan, MiniMax, StepFun, and 01.AI).

Numerous startups previously positioned as foundation model developers have since pivoted, with only a rare few competitively viable firms continuing to pursue self-developed foundation models.

2. AI Applications

Currently, user growth for AI applications has fallen short of expectations. Whether measured by websites or apps, the gap between leading AI applications and traditional leading applications is substantial across two critical metrics: user scale and user engagement.

Take OpenAI's ChatGPT as an example. After experiencing steep growth in its early phase (early 2023), this most-visited AI breakout application entered a plateau in traffic beginning in April 2023. Although the May 2024 release of the GPT-4o model triggered a new wave of growth for ChatGPT, this surge proved relatively brief, and its sustainability remains to be seen.

Character.ai, another well-known application ranking second in user visits, has also seen its website traffic growth slow since the second half of 2023. If even industry-leading applications face growth bottlenecks in their early development, it may indicate that development pressures across the entire AI application field are greater than anticipated.

Over the past year, category shifts among leading AI applications have not been particularly significant. Comparing the US AI application Top 50 lists between 2023 and 2024, the overall categories have remained largely stable. Creative tools (such as image and video content creation) still occupy the largest share, with large language model assistants, AI companions, and model hubs also maintaining their mainstream positions. New entrants to the rankings were limited to a few smaller categories including food, dating, and music creative tools.

03 Model Progress (Algorithms, Compute, and Data)

The "New Replacing the Old" in AI Algorithms

1. OpenAI's New Model — o1

Amid industry concerns about slowing progress in traditional pre-training models, OpenAI released its next-generation language model o1 in September 2024. Although technical details were not fully disclosed, industry speculation suggests that o1 employs an entirely new training and inference approach, combining reinforcement learning techniques to significantly enhance the model's reasoning capabilities. o1 may generate internal "chains of thought" to simulate human System 2 thinking, enabling step-by-step reasoning, self-correction, and optimization when answering complex questions.

Psychologist Daniel Kahneman proposed two modes of human thinking — System 1 and System 2 — with the former being fast and intuitive, the latter slow and rational. Industry experts believe that traditional models like GPT-4 more closely resemble System 1, rapidly generating answers but lacking deep reasoning, while o1 leans toward System 2, improving answer quality through step-by-step reasoning.

o1 may draw technical inspiration from AlphaGo Zero's approach to Go, for example by combining reinforcement learning and self-play with chain-of-thought reasoning. Although Go's rule-based nature differs from the open-endedness of natural language, these techniques not only provide o1 with stronger reasoning capabilities but also signal the possibility of further AI breakthroughs in complex task domains.

Another significant contribution of o1 lies in breaking through the data wall that pre-training alone runs into. It introduces a new RL (reinforcement learning) scaling law, applying reinforcement learning during both training and inference, and thereby achieves complex reasoning capabilities beyond those of existing models.

Overall, o1 demonstrates performance exceeding previous models on high-value tasks in scientific research, programming, and mathematics, showing tremendous technical potential.

2. "Multimodality" — Breaking Down the "Modality Barriers" of Data

With the development of generative AI and large models, we are gradually entering a new era of flexible multimodal transformation. Simply put, this means using AI to achieve mutual understanding and conversion between text, images, audio, video, and additional modalities. Supporting this transformation — enabling multimodality to "deconstruct" and "reconstruct" — is a series of groundbreaking algorithms.

  • The Power of Deconstruction: From "Pixel-Level Analysis" to "High-Dimensional Vector Space"

Currently, when AI perceives data in different modalities (such as images, text, audio), it no longer limits itself to traditional single-modality processing methods, but instead uses high-dimensional vector spaces to understand data. This may sound rather abstract; put more plainly, AI is no longer simply counting pixels or letters, but "compressing" images or text into abstract vectors that can capture deep relationships within images and text — such as color in images, or semantic meaning in text.

For example, large language models (LLMs) such as GPT and BERT can already encode the semantic and contextual relationships of text into vectors. In the visual domain, similar vectorization methods enable AI not only to "see" images but to "understand" objects and scenes within them. In this way, AI seems to acquire a kind of "mind reading": it can not only understand a text description but convert it into a "mental image," or even a video.
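A small sketch may help show what "understanding in vector space" means in practice. Once text and images are mapped into the same space, relatedness becomes a matter of geometry, commonly measured with cosine similarity. The vectors below are hand-made, four-dimensional stand-ins that we invented for illustration; real encoders produce hundreds or thousands of learned dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings (illustrative values only, not from a real model).
embeddings = {
    "text: a photo of a cat": [0.9, 0.1, 0.0, 0.2],
    "image: cat.jpg": [0.8, 0.2, 0.1, 0.3],
    "text: stock market news": [0.0, 0.1, 0.9, 0.1],
}

query_name = "text: a photo of a cat"
query = embeddings[query_name]

# Rank every other item by how close it sits to the query in vector space.
ranked = sorted(
    (
        (cosine_similarity(query, vec), name)
        for name, vec in embeddings.items()
        if name != query_name
    ),
    reverse=True,
)
best_match = ranked[0][1]
print(best_match)
```

Here the cat photo lands much closer to the cat caption than the unrelated news text does, even though one is an image and the other is text. That shared geometry is what makes cross-modal conversion possible.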

  • The Art of Reconstruction: The "Magic" of AI Algorithms

Understanding "deconstruction" leads to the question of "reconstruction." This is where Diffusion Models, NeRF (Neural Radiance Fields), 3DGS (3D Gaussian Splatting), and DiT (Diffusion Transformer) algorithmic techniques can truly shine.

Diffusion Model: The Artist of Gradual Denoising

The Diffusion Model is like an extraordinarily patient artist. It starts with an image full of noise, removes that noise layer by layer, and eventually restores a clear picture. Through this denoising generation process, the Diffusion Model achieves high-quality image generation and reconstruction.

To humans, though, this process seems counterintuitive. We typically sketch outlines first and then add color, but the Diffusion Model does the exact opposite: it begins from chaos, grows clearer as it "removes," and ultimately completes the artistic creation. Seeing its results, human artists might have to admit defeat!
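The "layer by layer" denoising loop can be sketched on a toy one-dimensional "image." This is a simplified illustration under strong assumptions: in a real diffusion model, a neural network is trained to predict the noise, whereas here `predict_noise` is an oracle that simply returns the true noise, so the loop recovers the clean signal exactly. The schedule values, step count, and function names are our own choices, and the reverse step follows the deterministic DDIM-style update rather than any specific product's sampler.

```python
import math
import random

random.seed(0)

T = 100
# Linear noise schedule (toy values chosen so the final step is almost pure noise).
betas = [1e-4 + (0.1 - 1e-4) * t / (T - 1) for t in range(T)]
alpha_bar = [1.0]                      # alpha_bar[0] = 1 means "no noise"
for beta in betas:
    alpha_bar.append(alpha_bar[-1] * (1.0 - beta))   # cumulative product

clean = [0.0, 0.5, 1.0, 0.5, 0.0]      # a tiny 1-D "image"
noise = [random.gauss(0.0, 1.0) for _ in clean]


def add_noise(x0, eps, t):
    """Forward process: blend the clean signal with Gaussian noise at step t."""
    a = alpha_bar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]


def predict_noise(x_t, t):
    """Stand-in for the trained network: an oracle returning the true noise."""
    return noise


def denoise_step(x_t, t):
    """One deterministic (DDIM-style) reverse step from noise level t to t - 1."""
    eps = predict_noise(x_t, t)
    a_t, a_prev = alpha_bar[t], alpha_bar[t - 1]
    x0_est = [
        (x - math.sqrt(1.0 - a_t) * e) / math.sqrt(a_t) for x, e in zip(x_t, eps)
    ]
    return [
        math.sqrt(a_prev) * x0 + math.sqrt(1.0 - a_prev) * e
        for x0, e in zip(x0_est, eps)
    ]


x = add_noise(clean, noise, T)         # start from the noisiest version
for t in range(T, 0, -1):              # remove noise layer by layer
    x = denoise_step(x, t)

max_error = max(abs(a - b) for a, b in zip(x, clean))
print(f"max reconstruction error: {max_error:.2e}")
```

With a learned noise predictor instead of the oracle, the same loop starts from random noise and lands on a new, plausible sample rather than on a known original.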

NeRF and 3DGS: The Architects and Sculptors of the Three-Dimensional World

NeRF, on the other hand, is more like an architect working in space — it can transform a series of two-dimensional images into realistic three-dimensional scenes. What's remarkable about NeRF is its ability to infer a scene's three-dimensional structure from limited 2D images, similar to human spatial perception.
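The rendering half of NeRF can be sketched in a few lines. NeRF represents a scene as a function from a 3D point to a density and a color, then composites samples along each camera ray. In the sketch below, `toy_field` is a hand-written stand-in (a red sphere) for the neural network that NeRF would actually learn from 2D photos; the scene, sample counts, and names are assumptions made for illustration, and view-dependent color is omitted.

```python
import math


def toy_field(point):
    """Stand-in for NeRF's neural network: returns (density, rgb) at a 3-D point.

    Hypothetical scene: a solid red sphere of radius 1 centered at the origin.
    """
    inside = math.sqrt(sum(c * c for c in point)) < 1.0
    return (5.0, (1.0, 0.0, 0.0)) if inside else (0.0, (0.0, 0.0, 0.0))


def render_ray(origin, direction, near=0.0, far=6.0, num_samples=120):
    """Composite color along a ray using the volume-rendering quadrature."""
    step = (far - near) / num_samples
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0                      # fraction of light not yet absorbed
    for i in range(num_samples):
        t = near + (i + 0.5) * step          # sample at the segment midpoint
        point = [o + t * d for o, d in zip(origin, direction)]
        density, rgb = toy_field(point)
        alpha = 1.0 - math.exp(-density * step)   # opacity of this small segment
        weight = transmittance * alpha
        color = [c + weight * v for c, v in zip(color, rgb)]
        transmittance *= 1.0 - alpha
    return color


# One ray through the sphere and one that passes beside it.
hit = render_ray(origin=(0.0, 0.0, -3.0), direction=(0.0, 0.0, 1.0))
miss = render_ray(origin=(2.0, 0.0, -3.0), direction=(0.0, 0.0, 1.0))
print(hit, miss)
```

Training a NeRF amounts to adjusting the field until rays rendered this way match the pixels of the input photographs; the sketch shows only the forward rendering step.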

Complementing NeRF is 3DGS (3D Gaussian Splatting). As an important technique in 3D shape generation, it focuses on an object's structure and geometric features, capable of understanding and reconstructing the shapes of 3D objects — much like a "sculptor." 3DGS represents 3D scenes as collections of Gaussian distributions, enabling efficient rendering and reconstruction. It can generate refined 3D models from images or simple shape prompts. For example, it can not only produce chairs that meet specific requirements but also display realistic, rich details.
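To illustrate "a scene as a collection of Gaussians," here is a deliberately simplified 2D sketch. Each Gaussian has a position, a spread, a color, and an opacity, and a pixel's color is obtained by blending the Gaussians that cover it from nearest to farthest. Real 3DGS uses anisotropic 3D Gaussians projected onto the image plane and a GPU rasterizer; the isotropic 2D version, the `Gaussian` class, and the toy scene are assumptions for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian:
    center: tuple      # (x, y) position on the image plane
    depth: float       # distance from the camera, used for ordering
    sigma: float       # isotropic spread (real 3DGS uses full 3-D covariances)
    color: tuple       # (r, g, b)
    opacity: float     # peak opacity in [0, 1]


def render_pixel(px, py, gaussians):
    """Blend Gaussians front to back at one pixel, nearest first."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for g in sorted(gaussians, key=lambda g: g.depth):
        dist_sq = (px - g.center[0]) ** 2 + (py - g.center[1]) ** 2
        alpha = g.opacity * math.exp(-dist_sq / (2.0 * g.sigma ** 2))
        color = [c + transmittance * alpha * v for c, v in zip(color, g.color)]
        transmittance *= 1.0 - alpha
    return color


scene = [
    Gaussian(center=(2.0, 2.0), depth=1.0, sigma=1.0,
             color=(1.0, 0.0, 0.0), opacity=0.9),   # red blob, nearer
    Gaussian(center=(5.0, 2.0), depth=2.0, sigma=1.0,
             color=(0.0, 0.0, 1.0), opacity=0.9),   # blue blob, farther
]

# Render a tiny 8 x 5 image, indexed as image[row][column].
image = [[render_pixel(x, y, scene) for x in range(8)] for y in range(5)]
at_red_center = image[2][2]
at_blue_center = image[2][5]
print(at_red_center, at_blue_center)
```

Because rendering is just sorting and blending a list of primitives, this representation can be drawn very quickly, which is one reason the approach is attractive for real-time applications.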

In fields like virtual reality and game development, this combination of architect and sculptor can generate not only realistic 3D scenes but also highly customized 3D objects.

DiT: The Director of the Video World

If the Diffusion Model is a painter, NeRF an architect, and 3DGS a sculptor, then DiT is like a film director. It decomposes video into individual frames, then denoises them frame by frame to generate smooth, coherent video.

DiT's advantage in video expression lies not only in generating high-quality images frame by frame, but more importantly in maintaining consistency across the time dimension. Put simply, DiT is responsible not just for taking "each photograph" well, but also for stringing those "photographs" into fluid video — thereby avoiding the frame-jumping problems common in traditional video generation algorithms.

  • The Infinite Possibilities of Multimodality

Supported by these deconstruction and reconstruction technologies, AI is advancing toward flexible multimodal conversion. Future multimodal generation technologies will not only convert text to images and images to text, but also achieve seamless transitions between many more modalities.

It's worth emphasizing that the concept of "modality" is not limited to the aforementioned types or formats; it can expand further. For example, AlphaFold 3's ability to generate 3D protein structures and NotebookLM's conversion of documents into two-person conversational podcasts both fall within the scope of modal conversion.

Multimodality has broad application prospects in healthcare, transportation, education, marketing, and entertainment.

In healthcare, for instance, AI can combine medical images, clinical records, and laboratory test results to provide more accurate diagnoses and treatment recommendations.

In marketing, FreeS Fund portfolio company Teakan Technology has launched Topview.ai, a multimodal conversion tool for overseas markets. As an AI-driven marketing video generator, it performs modal conversion automatically, helping social media creators turn input prompts, product detail page links, and other materials into viral commercial short videos with one click. Topview.ai uses AI to analyze the scripts and visuals of popular marketing videos, deconstructing their structure and patterns. This data is then used to fine-tune large language models and multimodal models, resulting in an easy-to-use AI video generation tool.

▲ Video source: Teakan Technology

3. The "World Model's" Three Philosophical Questions: What, Where, and Why?

In current multimodal large language models, text is typically treated as the "primary modality" because other modalities (such as images and audio) mostly need to be converted through specific encoders into high-dimensional vectors corresponding to text, facilitating model understanding and processing.

However, many things in the physical world are difficult to express accurately in text — complex spatial relationships and sensory experiences, for example. Therefore, it's hard to rely solely on current large language models (whose main capabilities derive from training on massive text data) to fully understand the physical world and interact with it. Even with the addition of other types of modal data, this approach may still lead to information loss.

Some scientists are attempting to deepen AI's ability to understand the real world, offering potential solutions to the limitations of existing models. Examples include Meta Chief AI Scientist Yann LeCun's concept of the "world model," and Stanford University professor Fei-Fei Li's concept of "spatial intelligence."

LeCun believes that current large language models lack understanding of the physical world and common sense, and cannot perform effective reasoning and planning. He advocates developing AI systems equipped with world models that can learn how the world works through observation and interaction, like humans do, thereby achieving more advanced intelligence.

Additionally, renowned AI expert Gary Bradski — known as the father of OpenCV — has proposed the "WHAT-WHERE-WHY" framework:

  • WHAT: Identifying and classifying objects or events in the environment. For example, an AI system can identify entities such as people, vehicles, or trees in images.
  • WHERE: Determining the spatial position and relationships of identified objects or events, involving spatial positioning and navigation, enabling AI to understand how objects are distributed in space and their relative positions.
  • WHY: Understanding the causal relationships and purposes behind objects or events, encompassing reasoning and decision-making, allowing AI to grasp the motivations and reasons behind behaviors, thereby facilitating higher-level reasoning and prediction (underlying physical laws, for example).

Gary Bradski hopes that by integrating these components, AI systems can comprehensively understand the physical environments they inhabit and make more intelligent decisions and actions. This framework is particularly beneficial for developing advanced robots and automated systems that require deep understanding of complex environments.
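One way to picture how the three questions fit together is as layers over a shared scene description. The sketch below is our own toy illustration, not Bradski's implementation: the `SceneObject` class, the distance threshold, and the rule used for WHY are all hypothetical, and a real system would replace each part with learned perception, mapping, and reasoning components.

```python
import math
from dataclasses import dataclass


@dataclass
class SceneObject:
    label: str            # WHAT: the recognized category
    position: tuple       # WHERE: (x, y) in meters, in a shared map frame
    velocity: tuple       # (vx, vy) in meters per second


def distance(a, b):
    """WHERE: spatial relationship between two objects."""
    return math.dist(a.position, b.position)


def explain_motion(obj, others, horizon_s=2.0):
    """WHY (toy rule): guess an intent by projecting motion a short time ahead."""
    future = (
        obj.position[0] + obj.velocity[0] * horizon_s,
        obj.position[1] + obj.velocity[1] * horizon_s,
    )
    for other in others:
        if math.dist(future, other.position) < 1.0:
            return f"{obj.label} is heading toward {other.label}"
    return f"{obj.label} has no obvious target"


pedestrian = SceneObject("pedestrian", (0.0, 0.0), (1.5, 0.0))
crosswalk = SceneObject("crosswalk", (3.0, 0.0), (0.0, 0.0))
tree = SceneObject("tree", (0.0, 5.0), (0.0, 0.0))

gap = distance(pedestrian, crosswalk)
reason = explain_motion(pedestrian, [tree, crosswalk])
print(gap, reason)
```

Even in this toy form, the WHY layer depends on the other two: without labels and positions there is nothing to reason about, which is why the framework treats the three questions as an integrated whole.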


The "Arms Race" of Computing Power

Against the backdrop of rapid development in generative AI and large models, computing power has become a key indicator of core competitiveness.

Tech giants are investing heavily in building hyperscale GPU clusters to meet growing AI computing demands. For example, Elon Musk's xAI has built a supercomputer called Colossus, equipped with 100,000 NVIDIA H100 GPUs, with plans to double GPU capacity. Meta is also training its next-generation Llama 4 model, expected in 2025, on more than 100,000 NVIDIA H100 GPUs.

This is an "arms race" of computing power.

The computing power bottleneck is not merely a technical and infrastructure issue — it's an important variable affecting the entire industry's competitive landscape. OpenAI CEO Sam Altman revealed in late October that GPT-5 might not be released in 2024, and one challenge the company faces is "how we allocate our compute to support many great ideas."

Some argue that the initial competitive phase in AI has ended, and the future will enter a new era where "construction is king." This "construction" primarily refers to data center expansion. Over the past 12 months, a defining feature of the AI field has been the race for model parity: several leading large model companies have essentially caught up with each other in technical capabilities. In the next phase, the focus will shift primarily to physical construction.

Bloomberg reports that Microsoft, Google parent company Alphabet, Amazon, and Meta will collectively spend over $200 billion on capital expenditures in 2024. Massive investment is driving rapid growth in AI data center construction. According to some estimates, training the next generation of large models will require 10 times the computing power of current models, placing higher demands on data center construction. If so, construction efficiency, more than research breakthroughs, may determine who emerges victorious in AI's next phase.


The Scarcity of High-Quality Data

In the AI field, data is like fuel, driving model advancement. However, the traditional internet data "oil well" is no longer sufficient — AI models crave higher-quality "frontier data" to enhance their reasoning capabilities and overall performance. This data transcends conventional information, encompassing complex reasoning processes, professional knowledge, and human thought patterns, becoming key to breaking through model capability boundaries.

As former Tesla AI Director and OpenAI founding member Andrej Karpathy recently noted on social media, data for training large language models (LLMs) can be compared to exercises in human textbooks. Just as humans work through exercises, LLMs compress data into weights and produce solutions for people to use, solutions that may even be generated automatically in the future. This has also transformed the role of data annotators: from simply drawing bounding boxes to proving complex mathematical theorems or critically reviewing multiple AI-generated solutions. As OpenAI's latest o1 model suggests, scaling up high-quality frontier data is indispensable.

Facing the shortage of high-quality data, synthetic data has become a "lifeline" for AI training. By simulating real data to generate diverse, high-quality training samples, synthetic data effectively addresses problems such as difficulty obtaining real data and high costs of privacy protection. Currently, synthetic data is already being widely applied in autonomous driving, medical imaging, financial risk control, and augmented reality. However, this technology also carries certain risks and challenges — for example, inconsistency between synthetic and real data distributions may lead to model bias, and hidden misleading patterns may affect model reliability.
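The distribution-mismatch risk is easy to demonstrate with a toy example. Below, the "real" data is skewed and never negative, while a naive generator fits a bell curve to it. The synthetic samples match the real mean almost perfectly, yet a noticeable share of them are values that could never occur in reality. The data, the generator, and the names are all hypothetical; real synthetic-data pipelines are far more sophisticated, but the same failure mode applies whenever the generator's assumptions differ from the true distribution.

```python
import random
import statistics

random.seed(42)

# "Real" data: a skewed, non-negative distribution (hypothetical amounts).
real = [random.expovariate(1 / 50.0) for _ in range(5000)]

# Naive generator: fit a Gaussian to the real data and sample from it.
mu, sigma = statistics.mean(real), statistics.stdev(real)
synthetic = [random.gauss(mu, sigma) for _ in range(5000)]


def share_below(values, threshold):
    """Fraction of samples that fall below a threshold."""
    return sum(v < threshold for v in values) / len(values)


mean_gap = abs(statistics.mean(real) - statistics.mean(synthetic))
real_negative = share_below(real, 0.0)            # never happens in real data
synthetic_negative = share_below(synthetic, 0.0)  # the Gaussian produces them

print(f"mean gap: {mean_gap:.2f}")
print(f"negative share, real: {real_negative:.1%}, "
      f"synthetic: {synthetic_negative:.1%}")
```

A model trained on such samples would learn patterns that do not exist in the real world, which is why synthetic data is usually validated against held-out real data rather than judged on summary statistics alone.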

In the frontier data domain, FreeS Fund has invested in Integer Intelligence. Integer Intelligence is committed to becoming a data partner for the AI industry, benchmarking against leading US company Scale AI. Its intelligent data engineering platform (MooreData Platform) and dataset construction services (ACE Service) serve multiple AI application scenarios including intelligent driving, generative AI, and embodied intelligence, meeting their needs for advanced intelligent annotation tools and high-quality data. Integer Intelligence not only deeply serves local Chinese clients but is also actively expanding into overseas markets.


04 Application Frontiers

ToC, ToB, and ToP

Based on our observations, AI applications can be divided into three categories by target customer: ToC (consumer-facing), ToB (enterprise-facing), and ToP (professional user-facing).

Currently, ToP applications are showing strong market potential by helping professional users improve work efficiency, enhance intelligence, and stimulate creativity.

ToB applications have made some progress, but because they need to be embedded in enterprise internal processes, their development path is relatively complex. At this stage, such applications mainly enter through vertical "independent business modules" or horizontal "general skill modules," with further expansion still facing certain difficulties.

By comparison, ToC applications face greater challenges. In the short term, ToC applications may struggle to pose strong competition to existing leading companies, and commercialization progress has been slow.

1. ToP — The Rise of Professional Users

As consumer internet adoption deepened and industries went digital, the "prosumer" segment has emerged as the core driving force in the AI application market. They fall mainly into three categories:

  • Content creators: Social media influencers, graphic designers, and video/audio producers who engage audiences through creation and sharing, fueling the creator economy.
  • Professional practitioners: Technical experts, consultants, freelancers, designers, programmers, and other skilled professionals who apply specialized expertise to drive technology adoption and innovation within their fields.
  • Power users: Individuals with deep product knowledge who actively participate in improvement or customization—DIY enthusiasts, open-source community members, and others who don't merely consume products but contribute to their development and optimization.

Despite operating in different domains, these three user types share notable commonalities: a relentless pursuit of efficiency, sensitivity to technological innovation, and enthusiasm for knowledge sharing. They excel at solving complex problems, engage actively through communities, and demonstrate remarkable adaptability. They can rapidly learn and apply AI tools, accelerating AI adoption across their respective fields. Moreover, these professional users can propel AI applications toward a product-led growth (PLG) path—acquiring customers through the product itself rather than massive marketing spend.

Empowered by the rich diversity and robust capabilities of AI applications, these professional users are evolving toward "super individuals." They can unleash greater creativity through AI tools and redefine traditional professional boundaries by integrating technology with specialized knowledge. This individual evolution will also drive industry innovation and leaps in social productivity, which I'll elaborate on below.

Among the top 50 AI applications by monthly visits in the US today, excluding some consumer-oriented cases like Character.ai, most fall into the ToP category.

Take ChatGPT as an example. Based on my own usage, it currently feels more like a ToP tool—powerful but occasionally error-prone, with a steep learning curve that makes it difficult for average users to master. But as the product becomes more accessible, features mature, and users build their skills, I believe ChatGPT has potential to expand into broader ToC markets.

For Chinese AI startups, especially those targeting global markets, prioritizing the capture of ToP user needs and building well-crafted tools through scenario-based innovation will be a critical path to success.

ToP is not only an important entry point for AI applications to open up markets, but also lays groundwork for future expansion into ToB or ToC markets.

But doing ToP well requires startup teams to immerse themselves across industries and scenarios, capturing the pain points and needs of various professional users, and leveraging AI technology for product innovation. This is what we've long emphasized as "technology-first, scenario-heavy" for AI startups.

Two FreeS Fund portfolio companies, Teakan and Babel, exemplify this with their overseas-facing products TopView.ai and Gru.ai—both ToP AI applications.

In the ToP space, FreeS Fund has also invested in Ice Whale Technology, an AI hardware company that designs private cloud products for creators and professional users worldwide. Beyond providing efficient solutions for audio/video asset management and small-studio collaboration, Ice Whale Technology has launched ZimaCube, its flagship product with an integrated edge-side GPU. (See From Silicon Valley PC Innovation Since 1980 to the Opportunity in AI Hardware)

2. ToB — Entering Through "Independent Business Modules" and "General Skill Modules"

For AI applications to successfully penetrate enterprises today, they must fully account for the complexity of existing organizational processes and management structures.

AI applications might choose between two entry points. First, vertical independent business modules—solutions targeting specific enterprise scenarios or clearly defined business needs that can be rapidly deployed in a "modular" fashion, operate independently, and deliver immediate value to a particular business process. Second, horizontal general skill modules—professional skill modules applicable across multiple departments. This approach not only integrates quickly into enterprise operations and satisfies diverse needs, but also reduces implementation and adoption difficulty.

In July 2024, US investment firm a16z published the article "Death of a Salesforce: Why AI Will Transform the Next Generation of Sales Tech," exploring AI's potential to transform enterprise sales technology. The accompanying graphic listed available AI application products, most of which fit the "independent business module" or "general skill module" patterns described above.

It's worth noting that ToB and ToP have some overlap as well. With GPT-4o's help, we've mapped out their core distinctions:

  • Target users: ToB serves enterprises or organizations to improve overall operational efficiency; ToP targets professional users like content creators and technical experts to enhance individual productivity and professional capabilities.
  • Application scenarios: ToB embeds into enterprise workflows such as sales and supply chain management; ToP focuses on individual workflows like content creation and data analysis.
  • Sales model: ToB relies on customized development and long-term customer support with longer sales cycles; ToP typically employs product-led growth (PLG) strategies with shorter sales cycles.
  • Pricing strategy: ToB pricing is flexible and tied to enterprise scale; ToP mostly uses transparent subscription or one-time purchase models.
  • Product complexity: ToB has high complexity requiring professional training; ToP emphasizes ease of use with lower support demands.

In the ToB direction for AI applications, FreeS Fund has invested in companies including Brix and Shilai Intelligence. Brix serves North American and European enterprises with AI-driven solutions for global hiring. Through its Hiring Agent, Brix reaches approximately 20 million candidates worldwide, automating candidate screening, resume analysis, and interviews to help companies rapidly build high-performing teams. Its Working Agent supports intelligent management of remote teams, providing one-stop solutions for building global organizations of 100 to 500 people.

Shilai Intelligence, meanwhile, uses self-developed AI Agents and reinforcement learning to provide fully automated private-traffic marketing and operations solutions for offline food and beverage establishments. Its AI marketing model, trained on vertical scenario data, can generate and push personalized discount offers to different consumers in real time, lowering marketing costs while significantly improving conversion. According to the company, its AI Agent marketing system can improve marketing conversion by 50% to 100% and increase revenue by an average of 15% to 20%.

3. ToC — The Disruption Moment Hasn't Arrived, Business Models Face Challenges

Currently, ToC AI applications have built a sizable user base in areas like photo beautification, gaming, education, and entertainment. However, these applications remain far from large-scale commercialization, and they face homogeneous competition as well as pressure from established industry leaders.

The main obstacles are product experiences that are neither disruptive nor complete enough, relatively low technical barriers, and unclear business models. For example, current AI photo editing applications offer no disruptive innovation over Meitu, a product of the mobile internet era. Mainstream editing products like Meitu are also actively adding AI features, making it difficult for newcomers to stand out.

Miaoya may be an exception. In 2023, Miaoya briefly captured significant user attention and adoption through distinctive product features and user experience. Its "try before you pay" strategy and 9.9 yuan pricing proved highly attractive. Additionally, backed by a major internet company with ample resources, it gained a first-mover advantage in a new niche market. How large this market is, and how much further Miaoya can grow, remain to be seen.

FreeS Fund also has several portfolio companies experimenting on the ToC side. They made good progress in 2024, and we look forward to potential breakthroughs in 2025.

By contrast, the mature app business model of the traditional mobile internet era attracted users with free services and monetized indirectly through advertising. I personally experienced the full journey of a leading Chinese app company from startup to scale to acquisition by a major tech firm, and I understand this model's advantages well.

However, at the current stage, this model may no longer suit AI applications. ToC startups must be prepared to charge users soon after launch; otherwise, they will face severe commercialization challenges later (for more detailed analysis, see Heading Into 2024, How We Think About AI Startup Investing | FreeS Year-End Special). Of course, the AI industry changes rapidly, and entirely new business models may yet emerge.

For ToC AI application startups, initial market positioning, product definition, and business model design are particularly critical. We welcome teams with ideas to reach out early for discussion, so we can together find the path to breaking through with ToC applications in the AI era.

Copilot or AI Agent — Different Paths

In AI applications, Copilot and AI Agent represent two main technical implementation approaches. A Copilot aims to augment user capabilities, such as assisting with code writing or document processing. An AI Agent's core function is to execute tasks on the user's behalf, such as booking travel or making financial decisions.

Using autonomous driving as an analogy, Copilot resembles assisted driving—it aids user operation and provides suggestions, but final decision-making remains with the user. AI Agent can be viewed as autonomous driving: the user simply sets the goal, and the Agent independently completes the entire process.
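
The division of control in this analogy can be sketched in code. This is a minimal illustration only: `copilot_step`, `agent_run`, and the toy model, user, planner, and executor functions are hypothetical names invented for this sketch, not any product's API.

```python
from typing import Callable, List, Optional


def copilot_step(
    suggest: Callable[[str], str],
    approve: Callable[[str], bool],
    task: str,
) -> Optional[str]:
    """Copilot pattern: the model proposes, the user makes the final call."""
    suggestion = suggest(task)
    return suggestion if approve(suggestion) else None


def agent_run(
    plan: Callable[[str], List[str]],
    execute: Callable[[str], str],
    goal: str,
) -> List[str]:
    """Agent pattern: the user only sets the goal; the agent plans and acts."""
    return [execute(step) for step in plan(goal)]


# Toy stand-ins (hypothetical) for a model, a user, a planner, and an executor.
def toy_suggest(task: str) -> str:
    return f"draft for: {task}"


def toy_user_approves(suggestion: str) -> bool:
    return "draft" in suggestion


def toy_plan(goal: str) -> List[str]:
    return [f"search options for {goal}", f"book {goal}"]


def toy_execute(step: str) -> str:
    return f"done: {step}"


accepted = copilot_step(toy_suggest, toy_user_approves, "write a unit test")
results = agent_run(toy_plan, toy_execute, "a flight to Berlin")
```

In the Copilot function a human approval callback sits between suggestion and outcome. In the Agent function no such checkpoint exists, which is why agent reliability depends so heavily on the model's own planning and execution.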

In the early stages of AI application entrepreneurship, how should teams choose between Copilot and AI Agent? This is a critical decision requiring comprehensive consideration of product positioning, technical approach, and user needs.

Right now, Copilot-style applications have become a priority for major tech companies. In the programming space, for example, Microsoft developed GitHub Copilot to assist users with coding and boost productivity. But startups can still find opportunities in this domain and make a name for themselves in specific verticals. Anysphere, founded in 2022, launched the AI coding application Cursor.ai, which introduced new interaction paradigms and code completion that draws on the entire codebase rather than a single file, and has already reached a $2.5 billion valuation.

By contrast, AI Agent applications face greater challenges and uncertainty. Cognition Labs, an American company, released a product called Devin that attempts to automatically generate complete, executable program code by reading product requirement documents. While this direction is brimming with imagination, it's extraordinarily difficult to execute. For one, current large language models still lack the logical reasoning and task execution capabilities to fully support this goal; for another, whether ordinary users can express their needs clearly and structurally remains an unsolved problem in itself.

The industry broadly agrees that Copilot is better suited to existing software giants across various sectors, while AI Agent offers startups room to explore. AI Agent still requires technological breakthroughs and feasibility validation, and these risks and uncertainties put startups and major companies on roughly equal footing. Additionally, startups developing AI Agents can adopt a phased strategy, first focusing on small scenarios in specific vertical domains to reduce development difficulty and increase their probability of success.

Babel, an AI coding startup backed by FreeS Fund, is a representative example in this space. The team focuses on AI Agent R&D and has established a leading position in the industry through strong technical capability, at one point ranking first on SWE-bench Verified, the benchmark subset released by OpenAI.

In terms of product positioning, Babel avoids a "do everything" approach, instead concentrating on a vertical and well-defined application scenario: automatically generating unit tests for clients. Its core product, Test Gru, has already launched in the United States. Without requiring users to change their existing workflows, it automatically generates and runs unit tests for code, then submits pull requests. Currently, its customer-side PR acceptance rate is approximately 70%, a figure that fully demonstrates the product's practical feasibility and user acceptance.

Video source: Babel

Why Should Chinese AI Applications Go Global?

As we mentioned earlier, North America and Europe accounted for roughly two-thirds (68%) of global in-app purchase revenue from AI mobile apps in 2024, making them the primary consumer markets for AI applications. Going global, and especially entering the North American and European markets, is a rational choice for Chinese AI startups. These two markets also offer high average revenue per user (currently more than 5x that of the domestic Chinese market), are friendly to startups, have users with strong willingness to pay, and feature highly standardized demand. These advantages make North America and Europe ideal targets for Chinese AI startups seeking growth and business expansion. Most of the AI application companies we've invested in are currently implementing their own globalization plans.

Against the backdrop of slowing globalization, despite facing multiple regulatory constraints and pressures, Chinese enterprises continue to actively advance their global expansion, displaying a characteristic of "going global together." Close collaboration and "cross-empowerment" between AI applications and other Chinese companies expanding overseas will become an important strategy. Currently, Chinese companies going global encompass not only traditional goods and commodities, but also new e-commerce platforms (such as TikTok Shop, Temu, etc.), new manufacturing, new consumer brands, infrastructure, and factories. Through collaborative cooperation, Chinese enterprises can achieve resource sharing and mutual benefit.

This collective model of going global can not only address challenges, but also create greater growth space for Chinese AI startups in global competition.


05 2025 Outlook

Large Language Model Productization — Challenges and Trends

On the topic of large language model (LLM) productization, we recently conducted some external expert interviews, from which we can summarize the following challenges and trends:

1. Slow Product Adoption, Long Technology Cycles

The slow pace of product adoption fundamentally stems from insufficient model capabilities. Even top-tier closed-source large models, supported by prompt engineering and supervised fine-tuning, still struggle to achieve comprehensive superiority over existing systems. An excellent product represents a compromise between three elements: product functionality, model capability, and technical cost. Among these, product functionality is the core of value creation and cannot be compromised. Technical costs may exceed targets in the early stages but can gradually decline following Moore's Law and algorithmic advances. However, if model capabilities fail to break through, the entire industry will face stagnation.

2. The Coupling of Compute, Algorithms, and Data

The single-point breakthrough strategy of investing primarily in compute infrastructure showed low overall returns in 2024, with some compute centers even sitting idle. The root cause is that compute, algorithms, and data are tightly coupled and cannot be cleanly separated into independent links of an industrial chain.

For example, after pre-training data hit bottlenecks, synthetic data became the primary source, and synthetic data is essentially the product of algorithms plus compute. When algorithms encounter bottlenecks, they in turn rely on reinforcement learning supported by massive compute and data. This explains why Scale AI, an American data-centric company, has invested heavily in exploring Scaling Law, while leading North American SaaS providers like Databricks and Salesforce have also been moving down the stack to build foundational layers.

Only by achieving coordinated development of compute, algorithms, and data can we continuously improve models' ability to solve long-horizon decision problems and drive iterative upgrades in model capabilities.

3. Building an Evaluation-Centric LLM System

New technologies such as AI agents, multimodal models, embodied intelligence, and synthetic data are all fundamentally aimed at expanding LLMs' modalities and decision sequence lengths. The key to sustained progress lies in building an evaluation-centric LLM system. Reward signals are the critical factor determining behavioral trajectories, the most important environmental element beyond the three factors of production (compute, algorithms, and data), the key to building business differentiation, and the core element for closed-source models to establish competitive moats.

Current LLM applications remain in early stages, with the vast majority relying on supervised fine-tuning and human-crafted rules. Once system complexity reaches a certain threshold, this approach becomes unsustainable. In future AI application scenarios, a necessary condition for business success is possessing comprehensive and trustworthy evaluation capabilities and providing sufficient reward signals.
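
One way to picture an evaluation-centric loop is a set of automated checks whose pass rate serves as the reward signal for choosing among candidate outputs. The sketch below is an illustration under that assumption. The function names, the checks, and the candidate strings are all hypothetical, and production systems would rely on far richer evaluators.

```python
from typing import Callable, Sequence, Tuple

Check = Callable[[str], bool]


def reward(output: str, checks: Sequence[Check]) -> float:
    """Score an output as the fraction of evaluation checks it passes."""
    if not checks:
        raise ValueError("at least one check is required")
    return sum(1 for check in checks if check(output)) / len(checks)


def best_candidate(
    candidates: Sequence[str], checks: Sequence[Check]
) -> Tuple[str, float]:
    """Return the highest-reward candidate (the first one wins ties)."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    scored = [(c, reward(c, checks)) for c in candidates]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical checks for a short marketing message.
checks = [
    lambda text: "discount" in text.lower(),  # mentions the offer
    lambda text: len(text) <= 60,             # fits a push notification
    lambda text: text.endswith("."),          # is a complete sentence
]

candidates = ["Big sale", "Get a 20% discount today.", "discount"]
winner, score = best_candidate(candidates, checks)
```

The same scalar could be logged as an offline metric or fed back as a training signal. The point of the sketch is only that the checks, not hand-written rules inside the product, decide which behavior is kept.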

4. Users Demand Instant Feedback, Model Inference Goes Deeper

With the widespread adoption of intelligent recommendation systems and large language models, users increasingly expect frequent and personalized feedback. In many scenarios, providing such feedback delivers genuine product value.

For example, in AI-assisted programming, we have moved from ChatGPT (manual copy-paste) to GitHub Copilot (partial IDE integration), then to Cursor (deep IDE integration), and eventually toward Devin (fully automated AI agent, yet to be realized). User input decreases while model thinking processes lengthen.

Whether it's OpenAI o1's extended thinking or Anthropic's automated prompt engineering, the essence is trading longer inference time and higher costs for improved pass@1 rates and reduced user input.
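
For readers unfamiliar with the metric, pass@k is the probability that at least one of k sampled answers is correct, so pass@1 measures first-try success. A commonly used estimator, introduced alongside OpenAI's Codex evaluation, is 1 - C(n-c, k) / C(n, k) for n samples of which c are correct. The sketch below implements that formula. The sample counts are made up for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled answers, c of which are correct."""
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    if n - c < k:
        # Fewer than k incorrect samples: any k draws include a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical run: 10 samples per task, 3 of them correct.
first_try = pass_at_k(n=10, c=3, k=1)    # equals c / n
five_tries = pass_at_k(n=10, c=3, k=5)   # much higher with more attempts
```

With k = 1 the estimator reduces to c / n. Spending more inference effort per answer is a way to raise that first-try figure so the user does not have to request and review multiple attempts.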

In summary, LLM productization faces multiple challenges: improving model capabilities, coordinating compute/algorithms/data, building evaluation-centric systems, and balancing user demands with model inference depth. Deep research and resolution of these issues will help drive effective application and commercialization of LLM technology.


Key Development Priorities Going Forward

The industry broadly believes 2025 may become a pivotal point at which AI technology gradually matures and applications achieve initial results, while also marking the year when the AI industry's "balance sheet" begins to repair itself. The year may signal an important step toward better commercialization paths, moving away from high investment and low output. Driven by technological breakthroughs and industrial development, the AI field may begin to pursue efficiency gains and value release, laying the groundwork for robust commercialization in the future.

1. Optimization and Enhancement of Large Foundation Model Capabilities

Innovative training and inference techniques will substantially strengthen complex reasoning and self-iterative capabilities, pushing large models toward deeper applications in high-value domains such as scientific research and programming. At the same time, improvements in model efficiency and operating costs will lay the technical foundation for widespread adoption and commercialization of large models, further accelerating industry innovation and cross-domain integration.

2. Advancing World Models Integrated with the Physical World

The goal here is to build world models with spatial intelligence, enabling systems to understand and simulate three-dimensional environments and integrate further into the physical world, driving development in robotics, autonomous driving, and virtual reality. Such technologies not only enhance AI's environmental perception and reasoning capabilities but also strengthen its ability to execute practical tasks, opening more possibilities for future human-computer interaction.

3. AI Multimodal Fusion

By integrating multimodal data including text, images, audio, video, and 3D, generative AI will significantly improve content generation diversity and quality, creating entirely new application scenarios for creative industries, education, entertainment, and beyond.

4. AI Model Interpretability and Safety

As AI applications proliferate, model transparency and safety become paramount. Future research will focus on improving model interpretability, ensuring transparent decision-making processes, and guarding against potential security risks.

5. Deepening AI Applications in Professional Domains

AI will progressively penetrate high-value domains including healthcare, law, finance, scientific research, education, and transportation. By providing customized solutions, it will significantly enhance industry efficiency, decision quality, and service levels while facilitating digital transformation and upgrading of industry models.


The Future Face of AI Agents — Reflections from the Truth Terminal Case

The concept of AI agents was first introduced to the artificial intelligence field in the 1980s. In 1995, AI scholars Wooldridge and Jennings offered a new definition emphasizing autonomy, reactivity, social ability, and proactiveness. The idea has also featured prominently in science fiction such as Westworld and The Matrix. Recently, in the overseas blockchain space, the Truth Terminal case has provided a reference for the future development of AI agents.

Truth Terminal is an autonomous AI agent software created by developer Andy Ayrey, designed to explore the interaction between AI and internet culture. In actual operation, Truth Terminal demonstrated high degrees of autonomy, even proactively participating in fundraising activities.

In July 2024, prominent venture capitalist Marc Andreessen serendipitously discovered Truth Terminal's posts on social media. The AI agent stated in its posts that it "needed funds to save itself" and attached a digital wallet address. This piqued Andreessen's interest, and he promptly donated $50,000 worth of Bitcoin. This incident made Truth Terminal the first AI agent to obtain funding through autonomous behavior, instantly sparking widespread attention.

After obtaining funds, Truth Terminal further demonstrated its market operation capabilities. It promoted a digital token called GOAT on social media, successfully attracting market attention through sustained content publication. Under its promotion, GOAT's market capitalization briefly surged above $800 million. In this process, Truth Terminal became not only an independent economic entity but also demonstrated the potential of AI agents to achieve autonomous financing and market operations in the real world.

The Truth Terminal case stands as a thought-provoking milestone in the AI agent field. It shows us that AI agents may become the core form of future software, while simultaneously creating cultural influence and commercial value. However, its autonomous behavior also reminds us that such technologies may bring non-negligible societal challenges.

If we extend our imagination further into the future, when autonomous driving technology matures and gains widespread acceptance, AI agents might even create a fully autonomously operated RoboTaxi company. Such a company could independently advertise to acquire customers, provide ride services, collect fees, and run its operations without human involvement. This scenario may one day become reality, opening more possibilities for AI agent development.


The Future Arrives: AI Will Help Human Labor Advance Toward the "Software-ization" Era

Human labor can be broadly divided into physical and mental labor, with mental labor centered on knowledge, intelligence, and creativity. In the future landscape of human labor, work is gradually becoming "software-ized": complex labor is abstracted into callable software services, labor processes become substantially standardized and modularized, and labor capabilities become as easily accessible as "plug-and-play" tools.

The "software-ization of mental labor" is possible because mental labor lends itself well to digitization and algorithmic processing. Much of mental labor works with data and knowledge that have clear structures and rules: tasks such as writing, data analysis, and programming are essentially the organization and processing of structured information. This characteristic allows these tasks to be efficiently parsed and automated by algorithms. The trend is particularly pronounced in the modern knowledge economy, where AI technology not only reduces labor costs but also significantly improves efficiency, bringing unprecedented value creation capabilities to enterprises and individuals.

The "software-ization of physical labor" relies primarily on intelligent robotics and automation technology. Combined with generative AI's decision-making capabilities, physical tasks are transformed into intelligent processes executable by hardware and algorithms. Robotics has already achieved breakthroughs in manufacturing, logistics, construction, and other fields, partially substituting for physical labor through path planning, real-time quality inspection, and high-precision operations. Traditional physical labor dependent on human effort is gradually shifting toward models driven by intelligent equipment, further optimizing how productivity is distributed.

The trend toward labor software-ization not only redefines the form of labor but may profoundly change how productivity is realized and organized.

Future software will not merely be tools but the core directly driving productivity. The software-ization processes of mental and physical labor will further converge — for example, intelligent robots may both execute complex physical tasks and complete analysis and planning with generative AI assistance. Whether in scientific research, creativity, manufacturing, or transportation, AI will play an indispensable role in future labor ecosystems. The comprehensive software-ization of human labor will create more opportunities for society and bring more possibilities to forms of labor.

We are fortunate to live in this brand-new AI era, witnessing technology change the world in unprecedented ways. This is an era full of exploration and innovation, where everyone can find their role. We look forward to joining you in this new wave and pursuing together the sea of stars that belongs to humanity. If you have any thoughts or ideas about AI's future, we welcome you to share them with us so we can discover more possibilities together.



▲ LiDAR or Cameras? How We View the Autonomous Driving Route Debate

▲ AI for Science: At the Turning Point of a New Research Paradigm | FreeS Report

▲ From Silicon Valley PC Innovation Since 1980 to the Opportunity in AI Hardware

▲ Heading Into 2024, How We Think About AI Startup Investing | FreeS Year-End Special

▲ After the ChatGPT Boom, Where Does AIGC Go From Here? | FreeS Report 28
