The Evolution of Coding Agents: In-Depth Conversations with Chinese and American Agent Founders, Alibaba Researchers, and Investors

December 30, 2024

The long-awaited deep dive into Coding Agents is here as our 2024 finale! For this episode, we invited frontline founders from both China and Silicon Valley, Coding LLM researchers, and AI investors for a discussion lasting over three hours, a depth that is rare to find anywhere online. We cover everything from first-hand design breakdowns of Coding Agents to an analysis of the newly released o3, its implementation challenges, and its future prospects. Fresh content that is well worth your time and notes!

In less than two months, Coding Agent products have gone through two leapfrog upgrades. The first was the move from the IDE programming assistant Cursor to Coding Agents represented by Replit and Windsurf. The second was Devin's stunning launch, which opened up an entirely new realm of imagination: end-to-end Coding Agents can do far more than just write code.

By coincidence, we recorded in the early morning of the final day of OpenAI's 12 Days of OpenAI event, when OpenAI o3 made its grand entrance. Its performance on the SWE-bench test set, far ahead of the rest of the field, once again raised people's expectations of the ceiling of LLM capability. Looking ahead to 2025, what else will happen in AI? Coding Agents and the o3 series, with RL (reinforcement learning) as the new paradigm, will undoubtedly be central topics in the next round of discussions.

Beyond that, in this episode we also explore:

Why does Devin demonstrate the "job completion" scaling law?
What were the key design decisions behind Replit Agent, which first led the Coding Agent trend, and OpenHands, the open-source counterpart to Devin?
Will Devin's form be the winner-takes-all future for Coding Agents?
Beyond underlying model capabilities, what are the core competencies and moats for Coding Agent application companies?
What profound impacts will Coding Agents have on engineers, future organizations, and society?
How should we view o3's ability to surpass most humans? Where is the room for future development?

Meanwhile, we're deeply grateful for your support and companionship throughout this year! See you next year! Enjoy!

Host: Monica Xie Yan, Investment VP at ZhenFund
Co-host: Peak, EIR at ZhenFund, former founder of Mammoth Browser

Guests:

Yusen Dai, Managing Partner at ZhenFund, co-founder of Jumei International Holding Limited
Li Zhen, core member of Replit Agent, Senior Engineer at Replit, ex-ByteDance, Google
Wang Xingyao, co-founder & Chief AI Officer at Allhands AI (OpenHands), UIUC PhD
Hui Binyuan, Scientist at Alibaba Tongyi Lab

Timeline

Coding Agent Development History

01:56 Guest introductions and interesting projects they've tried recently
13:36 Four generations and three evolutions of Coding Agents
16:53 Devin's new imagination: The Scaling Law of work?

Replit Agent

20:47 Replit Agent's birth journey and key milestones
25:50 "For excellent entrepreneurs, output will be amplified 10x or even 100x."
35:59 Replit's key updates: Integration, Edit, and UI
38:06 Will different Coding Agent product forms converge in the future?

OpenHands

37:30 OpenHands' unusual origin story
41:48 OpenHands' architecture design: Agent, EventStream, and Runtime
47:48 The ultimate ideal of Computer Use: doing infinite things with finite space
52:25 The decision to go open source
1:02:40 What is the long-term competitiveness of Coding Agent products?

Impact of o3 Release

1:07:20 What was most impressive about o3? What does it mean for the future of Coding and AGI?
1:18:10 After o3, what else is needed to solve real-world complex problems?
1:22:23 After SWE-bench is "maxed out," what will be the next benchmark?

Impact and Outlook of Coding Agents

1:34:27 What other important events happened in Coding Agents this year?
1:39:23 In future organizational forms, what is the ideal engineer profile?
1:56:11 How to improve models' multi-step task capabilities?
2:05:54 What new opportunities in the underlying tech stack can the widespread adoption of Agents bring?
2:11:30 In investors' eyes, what are the future opportunities for China's coding agents?

Rapid Fire Q&A

2:23:27 Expectations for AI in 1 year and 3 years
2:32:15 Coding agent failure cases: Context and memory management
2:37:23 What capabilities in AI are currently overestimated and underestimated?

Key Concepts

LLM monitoring: Processes and tools used to supervise and manage Large Language Model (LLM) performance during deployment and operation.
SWE-Bench: A benchmark dataset for evaluating LLMs' ability to solve real-world software problems on GitHub. Given a codebase and an issue, the LLM needs to generate a patch that resolves the described problem.
Pre-training: The initial training phase of LLMs, where the model learns from large, diverse datasets containing trillions of tokens, aiming for broad understanding of language, context, and various knowledge.
Fine-tuning: Further training of an already pre-trained model on more specific datasets in particular domains.
Instruct-tuning: Used for models like ChatGPT and InstructGPT, where the model is trained (or further trained) to better follow prompt instructions, improving its ability to parse and respond to prompts in ways that align with user intent.
RLHF: Reinforcement Learning from Human Feedback.
Autonomous Agent: An AI system capable of independently executing complex tasks, understanding and responding to inquiries, and taking actions completely without human intervention.
Async: Asynchronous, where operations can start and finish independently of the order in which they appear in the program, without blocking the main flow of execution.
Asynchronized Agents: Agents that collaborate asynchronously with other agents to solve larger computational tasks.
PR: Pull Request, a proposal to merge a set of code changes into a repository, usually reviewed and discussed before it is merged.
Knowledge Cutoff Date: The date when an LLM's training data was last updated, meaning the model has no built-in knowledge of anything that happened after this date.
Compound AI: Composite AI systems that handle tasks by combining multiple interacting components.
EventStream: A continuous, unidirectional data stream format for real-time transmission of event data.
Runtime: The execution environment, also called the "runtime system," in which code runs on the target machine.
ReAct: A language model paradigm that interleaves natural language reasoning (Reason) with actions (Act) to solve complex tasks, allowing more flexible and dynamic decision-making.
DAG: Directed Acyclic Graph, a graph whose edges have a direction and which contains no cycles, commonly used to model dependencies between tasks.
VM: Virtual Machine, a computer emulated in software rather than built in hardware, used to run programs and deploy applications. Virtual machines virtualize the entire computer down to the hardware layer.
Container: A lightweight, isolated unit that packages an application together with its dependencies. A single container can run anything from a small microservice to a large application. Containers only virtualize the software layers above the operating system level.
Docker: An open-source containerization platform designed to simplify application development, deployment, and running.
Feedback Loop: A "closed loop" formed when a system (organization or organism) sends output back to input through certain channels.
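
To make the ReAct, EventStream, and Runtime entries above more concrete, here is a minimal Python sketch of how the three ideas can fit together. It is an illustration under loud assumptions, not how OpenHands or any product discussed in the episode is implemented: every name in it (Event, EventStream, toy_policy, toy_runtime, react_loop) is hypothetical, the "model" is a hard-coded function, and the "runtime" only does integer arithmetic.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    kind: str      # "thought", "action", or "observation"
    content: str


@dataclass
class EventStream:
    """Append-only log of everything the agent thinks, does, and sees."""
    events: list = field(default_factory=list)

    def append(self, kind: str, content: str) -> None:
        self.events.append(Event(kind, content))


def toy_runtime(action: str) -> str:
    """Hypothetical stand-in for a sandboxed runtime: 'a + b' or 'a * b' only."""
    left, op, right = action.split()
    a, b = int(left), int(right)
    return str(a + b if op == "+" else a * b)


def toy_policy(stream: EventStream):
    """Hypothetical stand-in for the LLM: picks the next step from the history."""
    observations = [e.content for e in stream.events if e.kind == "observation"]
    if not observations:
        return "First add 2 and 3.", "2 + 3"
    if len(observations) == 1:
        return "Now multiply the result by 4.", f"{observations[0]} * 4"
    return "I have the final answer.", None  # None means: stop acting


def react_loop(max_steps: int = 5) -> EventStream:
    """Reason, act, observe, and repeat until the policy stops or steps run out."""
    stream = EventStream()
    for _ in range(max_steps):
        thought, action = toy_policy(stream)
        stream.append("thought", thought)
        if action is None:
            break
        stream.append("action", action)
        stream.append("observation", toy_runtime(action))
    return stream


if __name__ == "__main__":
    for event in react_loop().events:
        print(f"{event.kind:>11}: {event.content}")
```

Each observation is written back into the stream and read by the next reasoning step, which is also a small instance of the feedback loop defined above.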

Mentioned Companies and Key Events

Cursor: www.cursor.com
Cognition Labs/Devin: app.devin.ai
Replit: replit.com, Replit Agent: docs.replit.com
OpenHands: github.com, OpenHands paper: arxiv.org
VisualWebArena: arxiv.org, TheAgentCompany: the-agent-company.com, paper arxiv.org
OpenAI o3: x.com
Computer use by Anthropic: www.anthropic.com
SWE-bench: github.com
Windsurf: codeium.com
Bolt.new: bolt.new

Staff

Producers: Wendi, Zoe
Post-production: Keyone Studio

About ZhenFund

"This is Seriously Speaking" is a general business podcast produced by ZhenFund, where ZhenFund's investment team shares the latest hot topics and industry insights with leaders from various fields.

Founded in 2011, ZhenFund is one of China's earliest angel investment institutions. Since its inception, ZhenFund has been actively seeking out the best entrepreneurial teams and era-defining investment opportunities in artificial intelligence, chips and semiconductors, robotics and hardware, healthcare, enterprise services, new energy, cross-border expansion, and consumer lifestyle.

ZhenFund — Your First Stop for Entrepreneurship!

Contact Us

WeChat Official Account: 真格基金 (ID: zhenfund)
Website: www.zhenfund.com
Email: media@zhenfund.com

You can listen to us on Xiaoyuzhou, Apple Podcasts, and Ximalaya.

If you have any suggestions or expectations for the show, please share them in the comments!