What's Changed and Where Are the Opportunities in AI Agents in the First Half of 2025? | FreeS Research

FreeS Fund · July 10, 2025

Model Wars, App Boom, Commercialization

In the first half of 2025, AI agents have surged forward, igniting a wave of "everything can be an agent" enthusiasm. This wave first manifested at the technical foundation — a fierce "arms race" in the model layer. Early in the year, DeepSeek shattered OpenAI's monopoly on reasoning models, sending shockwaves through the industry. Then OpenAI, Anthropic, Google, and other leading players took turns rolling out heavyweight models: o3 Pro, the Claude 4 series, and Gemini 2.5 Pro. The leap in model capabilities directly fueled an explosion at the application layer. With OpenAI's release of Operator (for executing web-based tasks) and Deep Research (for in-depth research), competition in the AI agent space intensified dramatically, with new products emerging constantly.

Major tech companies are all betting big on the agent track: Google plans to release Project Mariner this year, capable of operating browsers and other software; Baidu launched "Xinxiang," an app positioned as a general-purpose super-agent; and Alibaba's "Xinliu" project delves deep into exploring human-agent collaboration efficiency. However, critical questions around PMF (product-market fit), commercialization paths, and core product moats remain to be further explored by the industry.

AI agents represent the third stage of AI application development, following prompt engineering and workflow automation. Their core value lies in the ability to perceive environments, make autonomous decisions, and use tools. We believe that, to achieve genuine breakthroughs and address the challenges above, continuous iteration driven by reinforcement learning will be the critical path for agent development.

Last Sunday, Pengqi Liu, Executive Director at FreeS Fund, and Qianhang Yan, Vice President at FreeS Fund, engaged in an in-depth discussion in a livestream about the entrepreneurial frenzy, technical breakthroughs, and development trends of AI agents in the first half of 2025. The questions they explored included, but were not limited to:

How should we understand the concept of AI agent? What consensus and disagreements exist in this space?
What technical breakthroughs have actually occurred in AI applications? Why does the industry broadly favor reinforcement learning-driven agents?
What are the core arguments in the "AI bible" The Bitter Lesson? What insights do they offer for AI agent development?
How should agents achieve real-world deployment? What innovation opportunities exist in this process? What will be the long-term moats?

We've edited portions of the livestream in the hope of offering fresh perspectives. You can watch the replay on the FreeS Fund WeChat video channel, or search for "Gao Nengliang" on the Xiaoyuzhou app and Apple Podcasts to listen to this episode.

If you're an entrepreneur in the AI space, feel free to reach out to Pengqi Liu (pengqi@freesvc.com) and Qianhang Yan (qianhang@freesvc.com) at FreeS Fund.

Engagement Giveaway

Have you used agent products? What conveniences have they brought to your work and life, and what limitations remain? Share your thoughts in the comments.

By 17:00 on July 15, 2025, the three most thoughtful commenters will receive a copy of Mind and Body: The Philosophical Challenge of Cognitive Science and Artificial Intelligence.

/ 01 / What Surprised Us in AI This Half-Year?

Qianhang Yan: From DeepSeek's viral breakout early in the year to the wave of agent applications now, what unexpected developments have defined this half-year in AI?

Pengqi Liu: This year, since DeepSeek's release, the entire AI track has visibly accelerated, with key changes on both the model and application sides.

First, on the model side, reasoning models led by DeepSeek quickly captured market attention, pushing major players to accelerate their entry and kicking off an "arms race." DeepSeek's deeper significance lies in bringing reinforcement learning-based reasoning models into public view and formally opening a new track for large models.

Beyond product-level breakthroughs, model iteration speed has far exceeded expectations: OpenAI launched o3 Pro, Anthropic released the Claude 4 series, and Google unveiled Gemini 2.5 Pro. Leading vendors took turns "topping the charts," completely shattering earlier predictions of "slowing model iteration." Meanwhile, some companies are regrouping — for instance, Meta recently announced a $15 billion investment in data-labeling startup Scale AI and restructured its AI division.

Notably, DeepSeek proved that the gap between domestic and international large model technology is not significant. Major tech companies are also stepping up model-layer deployments: Alibaba released Qwen 3.0, and ByteDance launched Doubao 1.6. Among China's "AI Six Little Dragons" (Zhipu AI, MiniMax, Moonshot AI, StepFun, Baichuan, 01.AI), some firms have lagged slightly, but top products are still iterating rapidly.

Second, on the application side, the landmark event was OpenAI's successive releases early this year of Operator (an agent for simple tasks) and Deep Research (an agent for in-depth research). 2025 is thus considered by the industry as the "Year One of AI Agents."

In this wave of agent entrepreneurship, Chinese teams appear frequently: Manus, Genspark, and other agent products have sparked widespread discussion and attention; large model vendors like MiniMax and Moonshot AI have also joined the fray, releasing their own agent products.

Third, the AI coding track has validated PMF (product-market fit). The breakout tools Cursor and Windsurf (reportedly an OpenAI acquisition target), plus the rapid growth of Lovable, Replit, Bolt, and others, have all become hot topics in the industry. (See also "The Ambition, Dilemma, and Endgame of AI Coding | FreeS Research.")

Taken together, the entire market and track in AI are riding a wave of enthusiasm.

Qianhang Yan: Breakthroughs in model reasoning capabilities were another major highlight of the first half. The industry's focus is shifting from the "pre-training" Scaling Law (data scale effects) to the "post-training" Scaling Law.

Pre-training refers to improving foundational model capabilities through parameters, data, and compute. Post-training optimizes model performance through reinforcement learning, human feedback, and other techniques. Previously, Scaling Law effects mainly meant continuously investing in parameters, data, and compute to obtain increasingly powerful models.

The turning point came when the DeepSeek team launched the R1 model, applying reinforcement learning at scale during the post-training phase. Even with minimal labeled data, this improved the model's reasoning capabilities and demonstrated a Scaling Law for reasoning performance.

On the application side, an interesting phenomenon: OpenAI, Google, and Microsoft have all entered the agent space, with some even arguing that OpenAI is essentially an "AI agent company powered by language models."

Previously, we believed AI applications needed to maintain some distance from model vendors; otherwise, with unclear model boundaries, applications risked being overwhelmed by rapid iteration. But in this year's agent wave, some model-focused companies have actually gained ground in the application market by excelling at user experience delivery.

The current market has seen a surge of "everything can be an agent" enthusiasm. Major tech companies' involvement has pushed the model side toward a "universal arms race" — Gemini 2.5 proposed the AIOS concept (AI agent operating system, embedding large language models into the OS as a brain), and competition between the domestic "Six Little Dragons" and major tech companies has reached a fever pitch. On the application side, companies like Cursor represent efforts to promote and validate agents in existing scenarios.

Pengqi Liu: This war is far from over. Large model vendors are building their own applications and agent products, and many startups are too. The boundary between models and applications is growing increasingly blurred, and it remains to be seen who is more likely to win in the long run.

Looking back at this half-year, something new may have happened every day, with many conclusions quickly falsified. Our current views may not be correct either — this is a process of staying open and continuously learning.

/ 02 / Three Evolutions of AI Applications: Where Did the Agent Paradigm Come From?

Qianhang Yan: What exactly is the definition of "AI agent"? What are the essential differences between different applications?

Pengqi Liu: Since OpenAI released ChatGPT in late 2022, propelling AI applications onto a new track, there have been roughly three approaches to task processing:

The first stage is the prompt form (prompt engineering, i.e., conversational interaction). Users enter prompts and make requests, and the large model outputs answers directly. This is the most basic and widespread form of AI application.

The second stage is the AI workflow form. The large model connects to external data sources and completes tasks through multiple steps along predefined nodes and paths set by humans.

Compared to the first stage, workflow adds data reading and processing, but still relies on expert-preset fixed flows. While controllable, it lacks flexibility and generality. Currently well-deployed and commercialized applications are mostly based on this form, such as Dify (providing a low-code development platform for quickly building marketing copy and user persona analysis), Coze (intelligent customer service, voice assistants), and LangFlow (low-code, visual AI application building tool).

With OpenAI's release of Operator and Deep Research, AI applications have entered the third stage — AI agent. Its broad definition is "an intelligent system capable of autonomously perceiving the environment, making decisions, executing tasks, and achieving goals." This can be understood by breaking down the keywords:

"Perceiving the environment" allows AI to more fully understand user needs, instructions, and contextual information, including even long-term memory. At the same time, AI can further change the environment, which depends on the critical breakthrough in Tool Use (tool usage) capability during "executing tasks."

"Autonomous decision-making and planning" — unlike workflow's reliance on expert-preset fixed flows, agents can autonomously decide task steps. While workflow has advantages in controllability, it has limitations in flexibility, generality, and generalization capability. Agents with autonomous decision-making capabilities, though still facing challenges in task execution success rates, have shown potential far beyond expectations. The combination of these characteristics has pushed the third-stage agent application form into the public eye.

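The contrast between the second and third stages can be sketched in a few lines of Python. This is only an illustrative toy, not any vendor's implementation: the `search` and `summarize` tools are hypothetical stand-ins, and `toy_policy` is a hand-written placeholder for the model that would choose the next step in a real agent.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tools; a real product would call search APIs, databases, and so on.
def search(query: str) -> str:
    return f"notes about {query}"

def summarize(text: str) -> str:
    return f"summary of {text}"

TOOLS: Dict[str, Callable[[str], str]] = {"search": search, "summarize": summarize}

def run_workflow(task: str) -> str:
    """Stage two: a human fixed the steps and their order in advance."""
    return summarize(search(task))

def toy_policy(task: str, history: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Placeholder for a model that picks the next action from what it has observed."""
    if not history:
        return "search", task
    last_tool, last_output = history[-1]
    if last_tool == "search":
        return "summarize", last_output
    return "finish", last_output

def run_agent(task: str, policy=toy_policy, max_steps: int = 5) -> str:
    """Stage three: the policy chooses each step; the loop only executes and records."""
    history: List[Tuple[str, str]] = []
    for _ in range(max_steps):
        action, argument = policy(task, history)
        if action == "finish":
            return argument
        history.append((action, TOOLS[action](argument)))
    # Step budget exhausted: return the most recent observation, if any.
    return history[-1][1] if history else ""
```

In `run_workflow` the path is hard-coded, which is what makes it controllable; in `run_agent` the path depends entirely on what the policy decides, which is where both the flexibility and the unpredictability come from.
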
How Do Tool Use and Reinforcement Learning Empower Agents?

Qianhang Yan: Combining what Pengqi mentioned — that agent core characteristics include perceiving the environment, autonomous decision-making, and Tool Use capability — what exactly are the core advantages of agents compared to AI applications represented by ChatGPT? Which specific tracks are more suitable for deployment currently, and what challenges exist?

Pengqi Liu: This year's core change for agents is the breakthrough in Tool Use capability.

Specifically, from programming to browser-use (agents simulating user operations in browsers), to computer-use (agents controlling computer systems), and with the rising adoption of the MCP universal interface (Model Context Protocol, a unified standard enabling seamless connection between AI models and external resources), agents' Tool Use capabilities have been enhanced, enabling more efficient information acquisition from external sources.

A core limitation of large models regarding world knowledge was that training data only included public data up to a certain date, lacking real-time data and private domain data injection. With Tool Use capability, AI can autonomously retrieve information and interact with the external world, achieving an order-of-magnitude improvement in information acquisition compared to previous versions.
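
As a rough illustration of how Tool Use fills the real-time and private-data gap, the sketch below shows a runtime that receives a structured tool call from a model and executes it. The tool names, the JSON shape, and the sample document are all assumptions made for this example; this is not the MCP wire format or any vendor's API.

```python
import json
from datetime import date
from typing import Any, Callable, Dict

def get_today(arguments: Dict[str, Any]) -> str:
    """Real-time information the model cannot know from its training data."""
    return date.today().isoformat()

def lookup_private_doc(arguments: Dict[str, Any]) -> str:
    """Private-domain data; the document store here is made up for illustration."""
    docs = {"q2-plan": "Ship the agent beta in Q3."}
    return docs.get(arguments.get("doc_id", ""), "not found")

# Hypothetical registry: tool name -> handler.
REGISTRY: Dict[str, Callable[[Dict[str, Any]], str]] = {
    "get_today": get_today,
    "lookup_private_doc": lookup_private_doc,
}

def dispatch(tool_call_json: str) -> Dict[str, Any]:
    """Execute one tool call emitted by the model and wrap the result for its context."""
    call = json.loads(tool_call_json)
    handler = REGISTRY.get(call.get("name"))
    if handler is None:
        return {"ok": False, "error": f"unknown tool: {call.get('name')}"}
    return {"ok": True, "content": handler(call.get("arguments", {}))}
```

A call such as `dispatch('{"name": "lookup_private_doc", "arguments": {"doc_id": "q2-plan"}}')` returns the stored text, which the runtime would then append to the model's context before the next generation step.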

Today, agents have validated PMF in the development and programming track. Tools like Cursor have proven that certain closed-loop operations in programming can be fully delegated to agents. More critically, this year's technical breakthrough in reinforcement learning has significantly improved large models' reasoning capabilities, further enhancing agent practicality.

Qianhang Yan: Let me add why agents were able to prove their value first in the AI coding track. Programming is essentially a combination of "text + language data," with highly structured training data. Thus ChatGPT showed strong code generation capabilities from its debut, though early code often had hallucination issues and couldn't be directly verified by running it through a compiler.

By integrating mature software development toolchains built over the past two to three decades, AI coding can form a complete closed-loop system across code writing, debugging, and compilation output, running independently in virtual computer environments. This provides strong support for agents' efficient iteration and experimental verification.

By contrast, embodied intelligence scenarios face higher deployment difficulty. The core challenge is that robots need to interact directly with the physical world, and there is a significant gap between code instructions and actual execution. Model-level iteration alone is insufficient for rapid agent breakthroughs in embodied intelligence.

Tool Use has empowered agents, so how will reinforcement learning further agent development?

Pengqi Liu: The starting point for this round of agent deployment is indeed the improvement in Tool Use capability, but future development will still depend on reinforcement learning. In my view, reinforcement learning-iterated agents represent the path for AI applications toward "ultimate intelligence."

In fact, the concept of "agent" originated from the reinforcement learning field. The classic textbook Reinforcement Learning: An Introduction defines an agent as "an entity that executes actions in an environment and adjusts behavior based on environmental feedback to achieve long-term goals" — highly consistent with the agent concept discussed in current AI applications.
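
The textbook sense of "agent" can be made concrete with a deliberately tiny example: an agent that acts, receives a reward from its environment, and adjusts its behavior. The sketch below is a two-armed bandit with an epsilon-greedy agent; the reward values, class names, and optimistic starting estimates are choices made for this illustration, and it is far simpler than how agents built on large models are trained.

```python
import random
from typing import List

class BanditEnvironment:
    """A trivial environment: each action always yields a fixed reward."""
    def __init__(self, rewards: List[float]):
        self.rewards = rewards

    def step(self, action: int) -> float:
        return self.rewards[action]

class EpsilonGreedyAgent:
    """Keeps a running value estimate per action and mostly picks the best one."""
    def __init__(self, n_actions: int, epsilon: float = 0.1, seed: int = 0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Optimistic start (assumes rewards are at most 1.0) so every action gets tried.
        self.estimates = [1.0] * n_actions
        self.counts = [0] * n_actions

    def best_action(self) -> int:
        return max(range(len(self.estimates)), key=self.estimates.__getitem__)

    def act(self) -> int:
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.estimates))  # explore
        return self.best_action()                           # exploit

    def learn(self, action: int, reward: float) -> None:
        # Incremental mean: move the estimate toward the observed reward.
        self.counts[action] += 1
        self.estimates[action] += (reward - self.estimates[action]) / self.counts[action]

def train(agent: EpsilonGreedyAgent, env: BanditEnvironment, steps: int = 200) -> float:
    """Run the act -> feedback -> adjust loop and return the total reward collected."""
    total = 0.0
    for _ in range(steps):
        action = agent.act()
        reward = env.step(action)
        agent.learn(action, reward)
        total += reward
    return total
```

After training against `BanditEnvironment([0.2, 0.8])`, the agent's `best_action()` settles on the second arm without ever being told which arm is better; only the feedback signal guides it.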

"Reinforcement learning" originated in computer science and later cross-pollinated with cognitive science, psychology, and neuroscience. It represents not only an iterative, evolutionary path in computer science but also one of the universal laws of evolution.

Counting reinforcement learning, large model training also has three stages. A relatable analogy: a student listening to lectures at school resembles "self-supervised imitation learning" (the pre-training phase, based on massive public unlabeled data); a teacher walking through example problems is "supervised fine-tuning" (supervised training on specific labeled data); and truly mastering knowledge through homework and exams with feedback is typical "reinforcement learning" (using reward models to guide training of the base model). The same pattern applies to biological evolution: each species' genetic combination is essentially an agent for a particular environment, and it too grows stronger through survival of the fittest.

The programming track was able to quickly validate agent value because it has a clear data feedback closed-loop environment. Whether code is correct or not is easily verifiable, with very clear reward signals, allowing rapid agent capability iteration.
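
To show why code gives such a clean reward signal, here is a minimal sketch of a verifiable reward: run a candidate function against test cases and return the fraction that pass. The function name `code_reward` and the test-case format are assumptions for this example, and calling `exec` on untrusted code is unsafe; a real training setup would run candidates in a sandbox.

```python
from typing import List, Tuple

def code_reward(source: str, func_name: str, cases: List[Tuple[tuple, object]]) -> float:
    """Return the fraction of test cases passed (0.0 if the code does not even load)."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # illustration only: never exec untrusted code unsandboxed
        func = namespace[func_name]
    except Exception:
        return 0.0
    passed = 0
    for args, expected in cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash on this case simply earns no credit
    return passed / len(cases) if cases else 0.0

CASES = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0), ((2, 2), 4)]
GOOD = "def add(a, b):\n    return a + b\n"
BUGGY = "def add(a, b):\n    return a * b\n"
BROKEN = "def add(a, b) return a + b"
```

With these inputs, `code_reward(GOOD, "add", CASES)` is 1.0, the buggy version scores 0.5, and the version that fails to parse scores 0.0. No human judgment is needed to rank the three candidates, which is what makes the loop easy to automate.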

In the future, to enable agents to surpass competitors or even human intelligence, they must enter the reinforcement learning closed loop, autonomously exploring learning methods rather than relying on human guidance.

Qianhang Yan: Reinforcement learning has already been explored extensively in areas like robotics and game AI, becoming one of the foundational methods driving AI development.

OpenAI early on developed robotics and game AI applications through reinforcement learning. Once large language models' base capabilities became sufficiently powerful, we found that reinforcement learning played a key role in raising the ceiling of model capabilities. In other words, reinforcement learning can only unleash its maximum value when the base model has certain capabilities.

Using tennis as an analogy: a coach must first teach basic swing mechanics before practice can continuously optimize and iterate. If basic mechanics aren't mastered or contain errors, extensive reinforcement training may instead solidify mistakes, harm performance, and limit the ceiling. Thus, a model's ultimate capability ceiling is determined by both base model performance and reinforcement learning capability.

Therefore, before using reinforcement learning to develop agents, researchers need to consider two questions: First, do agents follow the pattern of "first achieving good base performance, then improving the ceiling through reinforcement learning"? Second, when will the industry enter the critical stage where "reinforcement learning brings massive improvements to agents"?

Pengqi Liu: From current observations, though multiple vendors have released their own agents, close examination of technical documents reveals significant path differences, roughly dividing into two forms:

The first is fully end-to-end, reinforcement learning-trained agents, represented by OpenAI's Deep Research and Moonshot AI's Researcher. Currently, these appear better suited to depth-first tasks such as in-depth research. "End-to-end" means that context understanding, tool calling, multi-step chain-of-thought, and the rest of the process are all completed within a unified framework by the model itself; currently only model vendors possess this capability.

The second is modularly decomposed agents, with Manus as a typical example, where different capabilities are broken down and assigned to different models or agents that jointly complete a task within an engineering framework. This modular approach currently appears more suitable for breadth-first, general-purpose generalization tasks. Under this framework, for instance, decision-making and reasoning might use a model like DeepSeek R1, while programming might use a Claude model. Reinforcement learning mainly improves individual module capabilities, which are then connected through external engineering to achieve stronger overall performance.
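
The modular form can be pictured as a router that sends each step to whichever model handles that capability. The sketch below is hypothetical throughout: `reasoning_model` and `coding_model` are stubs standing in for calls to real models, and the `Step` structure is invented for this example rather than taken from any product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Step:
    capability: str  # which module should handle this step, e.g. "reasoning" or "coding"
    prompt: str

# Stubs standing in for calls to separate models.
def reasoning_model(prompt: str) -> str:
    return f"[reasoning-model] plan for: {prompt}"

def coding_model(prompt: str) -> str:
    return f"[coding-model] code for: {prompt}"

ROUTES: Dict[str, Callable[[str], str]] = {
    "reasoning": reasoning_model,
    "coding": coding_model,
}

def run_modular_agent(steps: List[Step]) -> List[str]:
    """Dispatch each step to its module; the glue logic lives outside the models."""
    outputs: List[str] = []
    for step in steps:
        handler = ROUTES.get(step.capability)
        if handler is None:
            raise ValueError(f"no module registered for capability: {step.capability}")
        outputs.append(handler(step.prompt))
    return outputs
```

Because each entry in `ROUTES` can be swapped or improved on its own, reinforcement learning can target one module at a time, while overall quality also depends on the engineering that connects them.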

Qianhang Yan: Currently, reinforcement learning's improvement of individual capabilities has shown results, but achieving end-to-end reinforcement learning still requires breakthroughs. This resembles model "post-training" (post-training refers to enhancing large models' adaptability in specialized technical domains through data-driven and algorithmic interventions) — it requires both deep understanding of large model post-training and productization capabilities. Currently, only a few model vendors and startup teams with both "model + product" capabilities possess such comprehensive ability.

How Does the "AI Bible" Influence Agent Development?

Qianhang Yan: A current controversy in the agent field concerns technical path selection — choosing between workflow-based agents or reinforcement learning-based agents?

Specifically, workflow-based agents emphasize visualization, interpretability, and controllability, offering higher transparency and stability, making them more suitable as a commercialization path in the near term. Reinforcement learning-based agents, while theoretically having higher performance ceilings, face greater technical and application challenges due to their unpredictable behavior and poor controllability, and are thus viewed as a more exploratory, long-term-oriented direction.

Machine learning pioneer Rich Sutton argued in his classic 2019 essay The Bitter Lesson that methods relying heavily on human prior knowledge and feature engineering are, in the long run, less effective than general methods that scale with compute and data. This prediction anticipated the development path of large language models. OpenAI, for example, adopted the decoder-only architecture, which scales well with compute and data and is now the mainstream architecture for large language models.

So, do the core arguments of The Bitter Lesson also apply to the AI agent field?

Pengqi Liu: Reinforcement learning-driven agents align very well with The Bitter Lesson's conclusions. Reinforcement learning essentially means not teaching the model too much, only giving it prior capabilities — how to iterate and improve in the future depends on the agent itself.

Specifically, achieving the goal of autonomous agent learning requires doing well in two aspects.

First, certain prior capabilities are needed. For a "novice"-level agent, there may be too many search paths to find the optimal solution. Thus, agents need prior capabilities in order to improve themselves, including both the model's own capabilities and know-how accumulated in vertical domains.

Second, building a relatively good environment. There is some industry controversy about how to build environments. Currently, most general-purpose agents on the market pursue understanding user needs through conversational chat boxes and delivering results. But language, as a compressed form of information, struggles to describe needs or results in detail in complex scenarios. Thus, multimodal information is also important.

For example, graphical interface interactions — design drawing sometimes requires circling and modifying images, operations that can't be completed through language alone. This requires more complex interaction tools, allowing users to participate in the entire process. User feedback signals can further help agents iterate their capabilities.

So the second point is important: products need to build rich contextual environments and feedback loops between model and user. For instance, Cursor's early insistence on using an IDE (integrated development environment) was precisely to collect more feedback signals through deep interaction with users.

To summarize, giving agents the possibility of self-iteration requires two things: prior capabilities grounded in industry and vertical-domain know-how, to first achieve PMF; and human-computer interaction environments with sufficient feedback and context, for long-term self-learning and iteration.

How Will Agents Land? What Innovation Opportunities Exist?

Qianhang Yan: Having reviewed the development and future expectations of AI agent applications, returning to venture investment: what are the current pain points and bottlenecks for AI agent deployment? What consensus and controversies exist?

Pengqi Liu: In the first half of 2025, many agent applications have launched, making significant progress in tool usage and reasoning capabilities, but evaluations of agents remain mixed.

Agents still face many technical challenges, such as whether the context they can capture is long enough, how to manage memory, and how to reason about subjective questions and nondeterministic outcomes.

On the tool usage side, agents' ability to access tools like browser search is already strong, but interacting with real physical environments and complex internal enterprise software systems still has a long way to go. Until these points are sufficiently developed, the ceiling for agent development remains relatively low.

A second difficult question is: what will be the moat for future agent applications? If we borrow from one evaluation criterion of the previous generation of internet applications — network effects — then for agent applications, the moat may lie in whether more users and usage can improve product experience and model capabilities. Current agent products may not have reached this stage yet; continued observation is needed.

A third question is how agent business models will evolve. Currently, agents mainly use subscription models. In the future, with more vertical agents emerging, will the subscription model remain sustainable long-term?

One hypothesis is that agents may shift to another model: paying by token usage, which is currently the main model for ToB services. But this model may have limitations for consumer applications, as consumers are rarely accustomed to paying in proportion to usage. Another model is having users pay for results, but the value of a result is itself a subjective judgment. Additionally, if multi-agent collaboration is achieved in the future, how general-purpose and vertical agents settle payments with each other remains a commercial challenge.
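
The trade-off between a flat subscription and token-based billing comes down to simple arithmetic, sketched below. The function names are invented for this example, and any fee or per-token price passed in is purely hypothetical, not a quote from any product.

```python
def break_even_tokens(monthly_fee: float, price_per_million_tokens: float) -> float:
    """Monthly token volume at which usage-based billing costs the same as the flat fee."""
    if price_per_million_tokens <= 0:
        raise ValueError("price_per_million_tokens must be positive")
    return monthly_fee / price_per_million_tokens * 1_000_000

def cheaper_plan(tokens_used: int, monthly_fee: float, price_per_million_tokens: float) -> str:
    """Compare what a user would pay under each plan for a given month."""
    usage_cost = tokens_used / 1_000_000 * price_per_million_tokens
    if usage_cost < monthly_fee:
        return "usage"
    if usage_cost > monthly_fee:
        return "subscription"
    return "equal"
```

For instance, with a hypothetical fee of 20 per month and a price of 4 per million tokens, the break-even point is 5 million tokens: lighter users would pay less under usage billing, while heavier users are better off with the subscription, which is one way to see why vendors and users may prefer different models.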

Currently, there are still many variables in the agent field, and the industry has not formed complete consensus. When AI first emerged, people thought it changed productivity; now it appears to have changed many production relationships as well. How humans and agents collaborate and manage each other is a very interesting topic.

Qianhang Yan: In the agent field, what innovation opportunities do you focus on? What kinds of agents are you more optimistic about investing in over the next one to two years?

Pengqi Liu: From an investor's perspective, we may value vertical domain agent opportunities more, because vertical agents possess industry and sub-domain prior knowledge, have relatively closer relationships with users, and don't need to be completely constrained to the agent form.

The current situation is that most applications are still prompt-based, some have evolved into workflow, and only a few are beginning to experiment with agents. In the process of applications seeking PMF, workflow has already played a significant role. As model capabilities improve, workflow will gradually evolve into Agentic Workflow, eventually moving toward a fully agent-managed form — this development path is worth looking forward to.

Competition across the entire industry is extremely fierce right now. Everyone is racing to become the "world's first XXX," and we may not be far from a state where a single entrepreneur can build a unicorn. When entrepreneurs choose directions, we suggest building on existing experience and accumulation, extending product service chains as much as possible to cover tools, services, and delivered results.

Qianhang Yan: Let me add some views on ToC application directions. People often misunderstand that ToC products must be general-purpose, but many niche demands actually have sizable markets. So we also pay attention to deeply excavating AI product value in vertical scenarios within consumer environments.

Current ToC AI explorations in short-chain scenarios, such as large model-based text generation and conversational interaction, have already been captured by players like Moonshot AI and OpenAI. The real opportunities may lie in long-chain task planning and tool-based content generation for consumers, such as the long-chain results delivered by Deep Research, or in combining AI with hardware products.

Why are general yet vertical ToC products valuable?

We can find the answer from the development of smart hardware over the past decade. Early smart hardware emerged and flourished mostly in vertical scenarios, because the consumer base itself is large, and products in vertical scenarios that scale up have many opportunities to turn small pies into big pies, even creating new categories. We very much look forward to new products combining AI with consumer demand.

We anticipate explosive opportunities for AI applications and AI agents, and are very optimistic that various vertical domain AI applications in both ToC and ToB directions will further develop.

Even though we've discussed so many views today, many may be overturned in half a year. We look forward to further exchanges with entrepreneurs. In an era of continuous technological and cognitive iteration, maintaining an open mindset, continuous learning, and in-depth exchanges with peers are key to how we respond to uncertainty.

