The Ambitions, Dilemmas, and Endgame of AI Coding | FreeS Research Institute

FreeS Fund (峰瑞资本) · May 22, 2025

After OpenAI and Apple Bet on AI Coding, Do Startups Still Have a Shot?

This May has been packed with landmark events in AI coding. On May 3, Apple teamed up with startup Anthropic to develop an AI-powered Vibe Coding platform. Three days later, OpenAI was reported to be acquiring Windsurf, a rising star in AI programming, for $3 billion. On May 17, OpenAI launched Codex, an agent integrated into ChatGPT capable of automatically generating, debugging, and optimizing code. Then on May 20, Meituan announced it would roll out "NoCode," an AI coding tool, injecting a "China variable" into this global race.

AI coding tools are now exploding onto the scene worldwide. From GitHub Copilot to the breakout hits Cursor and Devin, and domestically to ByteDance's Trae and Alibaba's Tongyi Lingma, large language models are pushing AI coding beyond simple code completion toward more intelligent, end-to-end solutions. Can AI coding fulfill its ambition of executing complete programming tasks? What path will it take, and where will it ultimately land?

On the podcast What's Next | Tech Matters, FreeS Fund venture partner Chen Shi, Sheng Dong Huo Po co-founder and host Diane Ding, and producer Yaxian explored these questions, including:

What trajectory will AI coding follow, propelled by large models?
Where does the core moat of AI coding lie? Does "whoever controls context, controls the market" hold true?
Why is finding "non-consensus" directions so crucial in AI coding?
Is "coding for beginners" a false premise? What does the user profile for AI coding look like?
In the Chinese and American startup ecosystems, which AI coding tracks will startups and tech giants each focus on?
How will the AI coding pie ultimately be divided? Will this boom reshape the chip market?

We've edited excerpts from the podcast to offer fresh perspectives. We welcome you to follow along as we continue observing and discussing these developments. You can also find the full episode on Xiaoyuzhou or Apple Podcasts by searching for and subscribing to What's Next | Tech Matters.

We look forward to continuing the conversation. If you're also following the chip industry or are building a startup, feel free to reach out to Chen Shi at FreeS Fund (chenshi@freesvc.com).

Giveaway: What innovation opportunities do you see in AI coding? Share your thoughts in the comments. By 17:00 on May 28, 2025, the three most thoughtful commenters will each receive a copy of Yuval Noah Harari's Nexus: A Brief History of Information Networks from the Stone Age to AI.

/ 01 / Copilot or Agent: Two Evolutionary Paths for AI Coding

Yaxian: AI coding has been around for a while, but only broke into the mainstream in the past two years. What's the evolution path been, and what are the typical product forms today?

Chen Shi: In programming, there's a concept called "pair programming." Two programmers work at one computer — one is the "driver," actually writing code and focusing on implementation details; the other is the "navigator," observing the code, thinking about the big picture, spotting problems, and making suggestions.

AI coding truly originated in 2021.

Before that, companies like Tabnine had tried using machine learning for AI coding. But large models hadn't yet taken off, and AI coding products struggled to deliver.

In 2021, Microsoft's GitHub partnered with OpenAI to develop GitHub Copilot, and AI coding finally had a viable product. Copilot was positioned as the driver in pair programming — the one actually writing the code — with humans serving as navigators. Early on, Copilot mainly did code completion, enhanced by GitHub's code repository and the GPT-3 model.

By late 2022, after GPT-3.5 launched, AI coding made real progress along two tracks:

One track is the Copilot assistant — human-led, AI-assisted. Products like GitHub Copilot, Cursor, Windsurf, and Trae have reached practical utility. GitHub Copilot in particular has surpassed 15 million users and contributed over 40% of GitHub's revenue growth in fiscal 2024.

The other track is the Agent — AI-initiated execution with human oversight. The original vision was for agents to independently complete full programming tasks, but this hasn't been fully realized yet, and product-market fit remains elusive.

Take Devin, for instance. It aims to be a fully autonomous AI software engineer. We've also seen companies building vertical agents for specific tasks like unit testing (verifying the smallest testable units in software, typically functions or methods) or code review (examining code to ensure quality and catch potential errors).

Yaxian: If I understand correctly, Copilot is more like a tool, while an Agent is more like a person who can understand what you want from start to finish and deliver a result without much human intervention.

Chen Shi: Building agents is actually extremely difficult. To do it well, you need strong enough model capabilities, especially sufficient context window length. Currently, the context length — or "brain capacity" — that Cursor's AI models provide is only 200,000 tokens. I think 200,000 or even 1 million tokens is far from enough.

Additionally, agents need to be good at collecting human context, meaning they must gather and understand the context of an individual user or an enterprise. Otherwise they can't grasp the underlying need. Something like "build a short-video app" is an incredibly complex requirement that's hard for people to articulate clearly.

Agents currently face constraints on both model capability and context collection, so collaborative products like Copilot are more likely to break into the market first.

Yaxian: You mentioned Devin. It caused quite a stir when it launched and has since raised a lot of money, so why hasn't it become a truly practical product?

Chen Shi: Devin is positioned as an all-in-one solution for writing complex software from scratch. "Complex" most directly translates to code volume.

Take Google's Chrome browser: it has on the order of tens of millions of lines of code, with each line containing maybe 5–10 tokens. Even at the low end, that is at least 50 million tokens, and a typical AI model obviously can't hold that many.

▲ GPT's compilation of code volumes for major software.

And it's not just Chrome. Distributed systems or applications like Facebook and Netflix reportedly have hundreds of millions of lines of code. Without massive "brain capacity," AI coding products can't understand the global architecture of a system or application, let alone design one.
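To make the scale concrete, here is a minimal back-of-the-envelope sketch in Python. It uses only the ballpark figures from the conversation (5–10 tokens per line, a 200,000-token window, and a 1-million-token window); the function names and the two sample line counts are assumptions made for illustration, not measurements of any real codebase.

```python
# Rough token estimate for a codebase versus a model's context window.
# The tokens-per-line range and window sizes are the interview's ballpark
# numbers; the function names and sample line counts are illustrative only.

def estimate_tokens(lines_of_code: int,
                    tokens_per_line: tuple[int, int] = (5, 10)) -> tuple[int, int]:
    """Return a (low, high) token estimate for a codebase."""
    low, high = tokens_per_line
    return lines_of_code * low, lines_of_code * high


def windows_needed(total_tokens: int, context_window: int) -> int:
    """Number of full context windows required to hold total_tokens (ceiling division)."""
    return -(-total_tokens // context_window)


if __name__ == "__main__":
    codebases = {
        "tens of millions of lines": 10_000_000,
        "hundreds of millions of lines": 100_000_000,
    }
    for label, loc in codebases.items():
        low, high = estimate_tokens(loc)
        for window in (200_000, 1_000_000):
            print(f"{label}: {low:,} to {high:,} tokens, "
                  f"at least {windows_needed(low, window):,} windows of {window:,} tokens")
```

Even the most conservative case (10 million lines at 5 tokens each) comes to 50 million tokens, which is 250 times a 200,000-token window and 50 times a 1-million-token window.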

So AI coding isn't as simple as writing a few lines of code. Here's an analogy: designing and writing complex software requires reading through extensive documentation and code — equivalent to reading every book in an entire library — before you can understand each book's contents and the logical relationships between them. Only after absorbing that volume can you do AI coding for complex software well.

I think Devin's positioning is quite good and ambitious, but it currently seems stuck at an intermediate stage — it can write some simple or medium-scale code, but probably can't handle complex code, or still needs human help to do so.

▲ AI coding product landscape.

Yaxian: We saw a chart summarizing the major AI coding products. The vertical axis shows L1 to L5, representing automation levels, where higher numbers mean greater automation. Devin's ideal might be reaching L4: acting as an AI engineer rather than working at the task or project level, a higher stage of autonomy.

Chen Shi: AI coding products are still immature. Being able to handle some simple, task-level work is already pretty good; we're not even at project level yet. There may still be opportunities for Copilot products and some agents serving vertical needs to mature, become more practical, and gain market acceptance.

/ 02 / Whoever Controls Context, Controls the Market?

Diane: Right now the safest, most robust approach for programmers is probably still using Copilot for AI-assisted code completion. That's also where willingness to pay is strongest, right?

Chen Shi: Right. Programmers are a relatively well-defined demographic with high incomes — or high costs to employers. If you can improve efficiency, both companies and individuals are willing to pay.

Many people ask: in the era of AI large models, if most capabilities concentrate in the foundation model layer, what's the value of building AI applications? After all, large models might just build the applications themselves. This is something the industry has agonized over, and I only figured it out in the second half of 2024. (Read Looking Ahead to 2025: What Innovation Opportunities Exist in AI? | FreeS Report)

I believe future value, moats, and technical accumulation in AI will concentrate on two sides: "cloud" and "edge."

The cloud side is large models or cloud services — basically where AI applications get their intelligence, creativity, and planning capabilities.

The edge side has a clear mission today: capturing user context. For individual users, context might mean habits, backgrounds, choices and preferences across various products and applications. For enterprises, context means code repositories, internal data, knowledge bases or documentation, and industry domain knowledge.

Then there's what we call "new context." When using Cursor for tasks, users are essentially doing data labeling themselves. If collected by companies like Cursor or Trae, this labeling is extremely valuable.

Moreover, there are now protocols such as MCP (Model Context Protocol) and Agent2Agent (A2A, an open-source protocol launched by Google aimed at enabling communication and interoperability between agent applications) that let users call services from other applications on the client side. For example, maybe someday we'll be able to order Meituan takeout directly from Cursor or Trae. If so, user context could be collected across all application scenarios.
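As a rough illustration of what calling another application's service through MCP looks like, the sketch below builds the kind of message an MCP client sends to invoke a tool. MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method with a tool name and arguments. Everything else here is hypothetical: the `order_takeout` tool, its arguments, and the helper function are invented for this example and do not describe any real Meituan, Cursor, or Trae integration.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP-style JSON-RPC 2.0 request that invokes a tool."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request, ensure_ascii=False)


if __name__ == "__main__":
    # Hypothetical tool and arguments, used purely for illustration.
    message = build_tool_call(
        request_id=1,
        tool_name="order_takeout",
        arguments={"dish": "beef noodles", "quantity": 1},
    )
    print(message)
```

The point for the context argument is that every such call passes through the client, so the client sees what the user asked for and in which situation.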

I think in the future of AI applications, it's likely that whoever controls context, controls the market.

Why would large model companies like OpenAI want to build their own client applications? To reach users directly. Data on the internet has been largely tapped for AI model pre-training and various post-training, but one category remains under-collected: user context — the true source of human demand. If you collect enough accurate context, you get data that can produce better training outcomes.

/ 03 / Is "Coding for Beginners" a False Premise?

Yaxian: Among AI coding products, Cursor feels more like a B2C business, while Windsurf seems more B2B. What's the industry consensus on B2C versus B2B in AI coding?

Chen Shi: What's "ripened" first is actually ToP — To Professional. For B2C, it's still quite hard for beginners to use these AI tools.

Take ChatGPT — professionals are probably more willing to pay. For average users, large model applications have a learning curve that not everyone can adapt to.

For AI coding products, targeting professional users is a solid path. There are two types of professional users. One is professional developers, for whom these tools feel natural. The other is professionals without programming backgrounds, like product managers, data engineers, or people like me who used to code but don't anymore. These folks have the potential to guide AI in writing good programs. Both types might pay for AI coding products.

There are tens of millions of programmers globally. Add in professionals without programming backgrounds, and you're probably looking at several hundred million people.

Yaxian: Is "beginners have a need to write code" a false premise?

Chen Shi: I don't think it holds up. Most beginner users probably don't need programming — they need better office software.

Large models now have some upgraded office features built in, like OpenAI's Canvas (assisting with writing and programming) and Claude's Artifacts (assisting with programming and content generation). In the future, we might see new office applications that can execute small programming-like tasks through natural language, but they'd essentially still be office software.

/ 04 / Will Vibe Coding Become a New Trend?

Diane: Vibe Coding has been super hot lately, but it seems like just a new name for AI coding — it's still about how programmers use AI to code.

Yaxian: Vibe Coding is a new programming paradigm proposed by AI expert and OpenAI co-founder Andrej Karpathy. It means describing what you want in natural language and letting large language models generate the code, rather than typing code line by line like traditional programmers. If Vibe Coding achieves this paradigm shift from programming languages to natural language, programmers won't need to look at code details; providing guidance or feedback will be enough to get the program they need.

Chen Shi: Back when I programmed, the lowest-level language I wrote was assembly, which basically maps to machine code — called a "low-level language." Then came intermediate languages like C, and finally high-level languages like Python, Java, and JavaScript. Programming languages evolve toward greater abstraction, causing programmers to lose "low-level" control.

But this "loss of control" isn't necessarily bad. If everyone still used assembly language today, I doubt we'd have so much great software. When compilers and similar tools can accurately map high-level and intermediate languages to low-level ones, people naturally should move toward higher, more abstract levels.

Andrej Karpathy once said: "The hottest new programming language is English." Here "English" can be understood as natural language.

▲ Image source: X. In February 2025, Andrej Karpathy proposed the concept of Vibe Coding. The idea is conversing with the model in natural language to have it write and modify programs, essentially "forgetting" about the code in the process.

Of course, Vibe Coding still has all sorts of problems. From my personal experience: I once used a Vibe Coding product and asked the AI to write a program in natural language. But the program kept failing to compile. The AI diagnosed a framework version issue that needed upgrading, but after tinkering for a while it still couldn't fix it. Finally I explicitly told it to switch to a different framework, and it worked immediately.

As a "veteran programmer," I roughly knew where the problem was. But a "coding beginner" wouldn't know to point the AI in a new direction, so it would just keep going in circles within the original framework, which is probably not a great experience.

But I think Vibe Coding is achievable in the future.

Compared to programming languages like Python or JavaScript, natural language is more abstract but possibly less precise, prone to ambiguity in interpretation. Yet precisely because natural language is more abstract, it's highly efficient in expression. If combined with programming tools or mathematical formulas, perhaps even "coding beginners" could build relatively complex applications.

Diane: In a few years, as Vibe Coding matures, will it develop into programming with purely natural language?

Chen Shi: Looking ahead about five years, given how fast models are developing today, it should be possible to build small-to-medium scale software applications with Vibe Coding. But for very large software applications, the better approach might be to leave an interface for human assistance: a veteran programmer oversees and guides the AI while also labeling, which helps the model learn and lets Vibe Coding gradually improve.

/ 05 / Different Ecosystems in China and the US: How Should Startups and Giants Choose Tracks and Opportunities?

Yaxian: I'm quite curious about startups versus big tech in AI coding. Many breakout coding companies in the US, like Devin and Cursor, are startups. Domestically we also have products like ByteDance's Trae and Alibaba's Tongyi Lingma, where big tech seems to have responded faster. How do you see startups versus big tech developing in AI coding?

Chen Shi: I spent five years at a big tech company. When big tech greenlights a project, they tend to favor "visible" directions, or ideally ones where PMF has been preliminarily validated. Choosing consensus tracks makes sense for them — they don't need to try every fringe new thing themselves.

It's fine for big tech to leave room for startups to experiment, then acquire or build it themselves later. This is normal, and actually creates opportunities for small companies.

Small companies have several characteristics. First, they innovate faster: with fewer people they are more agile and, very importantly, willing to try radical, non-consensus ideas. Second, they have high talent density and work efficiently without big-company processes. Third, they can take full advantage of open-source and other external ecosystem support.

The AI coding market is enormous, everyone is still early stage, and the endgame isn't visible. So big tech and startups are basically at the same starting line — there are plenty of opportunities.

But ultimately, startups should seek "non-consensus" opportunities. Cursor and Devin are both classic examples of targeting non-consensus directions.

Before Cursor, GitHub Copilot existed as a plugin for VS Code (Microsoft's open-source code editor, basically the "Word" of programming). Cursor proposed building a complete code editor from the start.

This was a highly non-consensus move. For a startup, attempting extensive modifications to a code editor with over 500,000 lines of code involved massive engineering effort and technical risk. But if executed well, controlling the code editing environment creates opportunities to build significantly superior features beyond GitHub Copilot.

Today, Cursor's context collection and packaging capabilities are very strong. When code errors occur, Cursor automatically packages the code, environment settings, error messages, and other context to send to the large model — no manual copying needed by the user, greatly improving efficiency.
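The idea of automatic context packaging can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Cursor's actual implementation: the `ErrorContext` class, the `package_context` function, and the prompt layout are all hypothetical names and choices made for this example.

```python
from dataclasses import dataclass, field
import platform
import sys


@dataclass
class ErrorContext:
    """Everything the editor already knows at the moment an error occurs."""
    file_path: str
    code: str
    error_message: str
    environment: dict = field(default_factory=dict)


def collect_environment() -> dict:
    """Gather a few basic environment facts (a real tool would collect far more)."""
    return {"python": sys.version.split()[0], "os": platform.system()}


def package_context(ctx: ErrorContext) -> str:
    """Bundle code, environment, and error into one prompt for a large model."""
    env_lines = "\n".join(f"- {key}: {value}"
                          for key, value in sorted(ctx.environment.items()))
    return (
        f"File: {ctx.file_path}\n\n"
        f"Code:\n{ctx.code}\n\n"
        f"Environment:\n{env_lines}\n\n"
        f"Error:\n{ctx.error_message}\n\n"
        "Please explain the error and propose a fix."
    )


if __name__ == "__main__":
    ctx = ErrorContext(
        file_path="app.py",
        code="print(total)",
        error_message="NameError: name 'total' is not defined",
        environment=collect_environment(),
    )
    print(package_context(ctx))
```

Because the editor owns the environment, it can assemble all of this without the user copying anything by hand, which is the efficiency gain described above.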

Devin's "non-consensus" lies in its positioning. Devin positioned itself as a "fully autonomous AI software engineer," once criticized as "aiming too high." But high positioning is also protective — without much certainty, big tech may be reluctant to chase it.

Yaxian: Could Devin be an ultimate form of AI coding?

Chen Shi: Devin may be aiming for the ultimate form, but whether it can achieve it, how, and whether it's the one to do so are all uncertain. At least its positioning is unique, and it can keep developing in this direction. If models or other conditions become sufficient, perhaps Devin can pull it off. That's the advantage of small companies — big tech finds this hard to do. When the technical route or product branch is unclear, big tech tends to wait and see.

Yaxian: It seems US big tech is moving slower than Chinese big tech.

Chen Shi: US big tech follows more slowly, which actually leaves opportunities for startups to achieve original, positioning-level breakthroughs.

I encourage FreeS Fund portfolio companies that are building AI applications to go global first, test the waters, perhaps find more opportunities, and then "export back" to the domestic market later.

In the near future, China's AI sector will move toward original creations, just as it did in the mobile internet era. We may start by learning from the US, then eventually build better products and business models that we can export back. Chinese teams are very strong at consumer-facing (ToC) products.

In the mobile internet era, China surpassed the US in both application quantity and user scale. The last globally viral ToC application from the US was probably Instagram, over a decade ago.

In the AI era, the massive wave of Chinese AI applications going global also proves Chinese companies have the capability. When conditions are right, they can return and create new species.

/ 06 / What Endgame Awaits AI Coding?

Diane: From an investment perspective, AI coding is a hot track. It may be where PMF is furthest along in AI, and the most commercially successful. In both China and the US, is this opportunity window slowly closing? It feels like leaders are already emerging.

Chen Shi: The Copilot landscape in China is already fairly clear — it's a track being targeted by big tech.

We encourage startups to make hard choices and find non-consensus directions. For example, try to predict what capabilities next-generation models will have, and think about what scenarios to apply them to. Another category is going deep into vertical applications, like in biology. These are good opportunities.

For coding specifically, I'd suggest startups try Agent, because the Agent technical route is harder. Everyone's at the same starting line anyway — might as well take a bet.

Yaxian: In the future, will the AI coding pie be divided among different players, or could it end up winner-take-all?

Chen Shi: AI coding is still in flux on all fronts: the interaction model, the medium, model capabilities, and how far context can go are all open questions.

Future AI software development involves many complex factors: model context windows need to be long enough, and humans need to be willing to do labeling on the front end. Moreover, writing software as code and writing it as "neural networks" are completely different things, and it's also uncertain which of the two a given user need should be implemented through.

From a historical perspective, coding is a development tool that has played an important role in human society. From the advent of computers to today, the medium, targets, and users of coding have all changed. In the future, coding will still have various possibilities, and right now we can hardly predict its endgame.

Yaxian: The endgame is hard to predict, but opportunities still exist.

Chen Shi: Right, since no one can see clearly, startups can also "fish in muddy waters."

Yaxian: After our conversation today, I have one takeaway. If we think of AI coding's endgame as a very high step that we can't reach in one stride, we can build many smaller, shallower steps and climb up step by step.

Chen Shi: In investing there's a concept called "laying eggs along the way": before reaching its designed end goal, a project can ship interim products or services that provide some technical validation and commercial revenue while also contributing to the ultimate objective.

Take Devin: having it directly complete a complex requirement might cause errors at different stages. Consider introducing human assistance and prompts, breaking requirements into staged components, laying eggs along the way, and completing them step by step. This is also a process through which Devin can iterate and evolve.

If a programming product is designed to evolve in a similar way, laying eggs along the way lets the company sustain itself and train its team while accumulating experience and gradually moving toward the endgame.

▲ From the Century-Long Evolution of US and Japanese IP Industries, Viewing Trends in China's IP Economy | FreeS Report

▲ Looking Ahead to 2025: What Innovation Opportunities Exist in AI? | FreeS Report

▲ Seven Core Questions About DeepSeek, Explained | FreeS Report

▲ DeepSeek Fired the First Shot, But AI Democratization Is More Worth Anticipating | Conversation with Xingyun's Ji Yu
