The Evolution of Coding Agents: A Three-Hour Deep Dive with Chinese and American Agent Founders, Alibaba Researchers, and Investors | For Real EP34

ZhenFund · January 3, 2025

The future is arriving faster than ever.

Hear ZhenFund in your ears.

"For Real" is a general business podcast. We want to build a platform for sharing and exchange where anyone curious about business, technology, and venture capital can find something worthwhile. Each episode features a different ZhenFund investor as host, joined by leading figures across industries to take you deep into tech trends and the impact of emerging technologies. When it comes to breaking down hot topics in tech, we only give you the most professional analysis.

Of course, we hope this is more than just a podcast — it's an exploration of entrepreneurship. ZhenFund: your first stop on the startup journey! We look forward to meeting you and discovering new possibilities together.

Looking back at 2024, AI Coding was undoubtedly one of the hottest fields of the past year. The emergence of multiple unicorns — Cursor, Poolside, Cognition, Magic, Codeium, Replit — has repeatedly validated market demand.

On a longer time horizon, the Agent is the smallest viable unit for AI to integrate into daily life. In less than two months, the Coding Agent has completed two leaps in product form: from Cursor, the IDE programming assistant capable of contextual prediction, to Replit, which supports multi-turn dialogue. What users receive as "deliverables" is no longer web pages based on search keywords, but software generated according to their own needs. On December 11, Cognition AI officially launched its "AI programmer" Devin to the public. It can not only offer users advice and automatically execute commands, but also independently complete the development of entire software projects, once again expanding the sense of what a Coding Agent could become.

On December 21, the final day of OpenAI's 12-day release marathon, the o3 model made its grand entrance. On SWE-Bench Verified, a benchmark composed of real-world software tasks, o3 achieved 71.7% accuracy — more than 20 percentage points higher than the o1 model. On ARC-AGI, which evaluates AI's human-like reasoning capabilities, it scored 87.5%, breaking through the human-level threshold (85%) for the first time. OpenAI CEO Sam Altman said at the launch event, "We think this is the beginning of the next phase of AI. o3 is able to do many complex tasks that require deep reasoning, and its performance in programming and math is incredible."

Looking ahead to 2025, there are even more questions worth pondering in AI. After breakthroughs in foundational large models, what forms will the AI Agent take? Where do the core capabilities and technical moats of Coding Agent products lie? What kinds of new employees will future organizations need? What form will the next generation of the "internet" take? In any open-ended discussion of the future, the Coding Agent and the o3 series, with RL (reinforcement learning) as its new paradigm, will be unavoidable focal points.

Yesterday's imagination can no longer keep up with technology's ambition. Perhaps an end-to-end Coding Agent can accomplish far more than coding itself. Starting from changing how people work, the Agent is the smallest unit in humanity's march toward AGI, and each new paradigm is another reminder of one fact: the future is arriving faster and faster.

Content Outline

Why does Devin demonstrate the "scaling law" of work?
What were the key design decisions behind Replit Agent, which first led the Coding Agent trend, and OpenHands, the open-source counterpart to Devin?
In the coding field, will a Devin-like form win it all in the future?
Beyond underlying model capabilities, what are the core strengths and moats of Coding Agent application companies?
What profound impacts will Coding Agents have on future social organization and forms of work?
How should we view o3's abilities surpassing those of most humans? Where is the room for future development?

The Future Is Arriving Faster and Faster

Hosts

Monica Xie: Vice President, ZhenFund

Peak (Co-host): EIR, ZhenFund

Guests

Yusen Dai: Managing Partner, ZhenFund; Co-founder, Jumei

Li Zhen: Core member of Replit Agent; Senior Engineer, Replit

Wang Xingyao: Co-founder and Chief AI Officer, All Hands AI (OpenHands); PhD, UIUC

Hui Binyuan: Scientist, Alibaba Tongyi Lab

Timeline

Evolution of Coding Agent

01:56 Guest introductions and interesting projects they've experienced recently

13:36 Four generations and three evolutions of Coding Agent

16:53 The new imagination sparked by Devin: the Scaling Law of work

Startup Stories of Two Coding Agent Companies

20:47 Replit Agent's entrepreneurial journey and key milestones

25:50 "The output of excellent founders will be infinitely amplified."

35:59 Several technical updates: Integration, Edit, and UI

38:06 Will different Coding Agent product forms converge in the future?

37:30 The unusual birth of OpenHands

47:48 "Use limited space to do unlimited things."

52:25 Open source: using coding to drive a kind of technological democratization

01:02:40 What is the long-term competitiveness of coding products?

o3 Release: Breakthroughs, Limitations, and Trends

01:07:20 The future of AGI is already within reach

01:18:10 What else does o3 need to solve complex real-world problems?

01:22:23 SWE-bench has been "maxed out" — what's the next benchmark?

The Future: How to Build a Good Coding Agent

01:34:27 Review of key moments in Coding Agent development

01:39:23 Future organizational forms and the ideal engineer profile

01:56:11 How to improve models' multi-step task capabilities?

02:05:54 New opportunities in the underlying tech stack after Agent proliferation

02:11:30 Entrepreneurial opportunities from an investor's perspective

02:23:27 Rapid-fire Q&A

Related Links

Cursor: https://www.cursor.com/

Cognition Labs/Devin: https://app.devin.ai/

Replit: https://replit.com/

Replit Agent: https://docs.replit.com/replitai/agent/

OpenHands:

Official site: http://github.com/
Paper: http://arxiv.org/

VisualWebArena: http://arxiv.org/

TheAgentCompany:

Official site: http://the-agent-company.com/
Paper: http://arxiv.org/

Computer use by Anthropic: https://www.anthropic.com/news/3-5-models-and-computer-use/

You can listen to us on Xiaoyuzhou, Apple Podcasts, and Ximalaya. If you have any suggestions or expectations for the show, feel free to leave a comment.

If you have any startup or collaboration ideas, please email us at media@zhenfund.com!

Recommended Reading