2025 New Year Conversation with Koji: AI's Pivotal Year, the Dawn of the Agent Era | Seriously Speaking EP35

ZhenFund · January 11, 2025

Stay optimistic about AI.

Listen to ZhenFund, in sound.

"Seriously Speaking" is a general business podcast. We hope to build a platform for sharing and exchange where anyone curious about business, technology, and venture capital can find something worthwhile. Each episode features a different ZhenFund investor as host, joined by leading figures across industries to take you deep into tech trends and the impact of emerging technologies. When we discuss hot topics in tech, we aim to give you nothing but the most professional analysis.

Of course, we hope this is more than just a podcast — it's an exploration of entrepreneurship. ZhenFund: your first stop on the entrepreneurial journey! We look forward to meeting you and discovering new possibilities together.

The first episode of 2025 is a crossover between "Seriously Speaking" and "Crossroads." At the start of the new year, Yusen and Koji are both excited because they feel they're witnessing an important moment in tech history. This feeling stems from two major events: the public launch of Devin, and the release of OpenAI's o3 model.

Why is Devin so exciting? Just six months ago, when Yusen appeared on "Crossroads," he offered an analogy: "Large models are still elementary school students — don't rush to put them to work." But after experiencing Devin firsthand, Yusen believes that as the first truly usable Agent product, it has shown people the Scaling Law of work. From "you ask, I answer" to "you ask, I do" — once AI can asynchronously and autonomously plan and execute, the new question facing every "human" has become: how to learn to be AI's boss.

Beyond deeply exploring the experience of using Devin and its technical leap forward, this episode reviews the rapid development of the AI industry over the past year and imagines what the big opportunities for AI entrepreneurship will be in 2025.

The new year is a time for dreaming big. AI technology surges forward like towering waves, yet dive in and you'll see that astonishing progress always comes from concrete, granular effort, repeated again and again. In the year ahead, let us stay full of anticipation and watch where these massive waves will carry us.

2024 in Review: AI Technology Explodes, Model Progress Exceeds Expectations, Application Growth Significant

Koji: Let's start with the first question for Yusen: looking back at 2024, what's your overall impression?

Yusen: I'm very glad to collaborate with Koji again, sharing our thoughts on AI development and investment, and having the chance to exchange ideas with everyone.

Looking back at 2024, my overall feeling comes down to one word: "fast." We've seen rapid iteration in both AI models and products.

I remember at the beginning of 2024, the most advanced model was GPT-4. There was a new benchmark called SWE-bench, which took common task types from GitHub and had AI attempt to complete them. At that time, the most advanced model GPT-4 scored 2.8 out of 100. By the end of 2024, Sonnet 3.5, which people could actually use, scored 50 — solving half the tasks. And the just-released o3 scored 71.7 in preliminary evaluations.

Optimistically speaking, at this pace, within one year (that is, in 2025) we could see AI solving the vast majority of tasks on GitHub. This also means that many of the individual tasks programmers handle today, though not their entire jobs, can indeed be solved by AI.

At the beginning of 2024, ChatGPT still struggled with basic arithmetic. People would often test it on three-digit by three-digit multiplication, and it might get it wrong. But now it handles IMO-level problems with ease, and even on the FrontierMath test set, which is difficult even for mathematicians, o3 scored 25. That test set was endorsed by Terence Tao, who said its easy problems are IMO-level and its hard ones are frontier research-level, and AI now performs quite well on it.

Koji: What impact has this had on applications?

Yusen: For example, Moonshot AI, which we invested in — their product launched on October 9, 2023, shortly before 2024 began. By the end of 2024, they already had 40 million monthly active users. Considering it's an app that's only about a year old, this user growth is extremely fast.

I also remember watching Sora's launch demo during the Spring Festival holiday in February 2024 and finding it stunning. At the time, I wondered how long it would take, and how much it would cost, before a video generation model like that became available. But by the end of 2024, people could already use one. Products like Kling AI and Hunyuan, as well as Google's Veo 2, were arguably better video generation models than Sora was at the time, and they were free, which left people thinking "it's no big deal." So people's standards for AI products have risen quickly; what amazed people a year ago now feels ordinary. We always feel there's more to be done and that many things haven't landed yet, but progress is actually very fast.

At the same time, I think many opinions and views have been proven wrong. I remember at the start of 2024, if you asked Chinese investors and entrepreneurs, many wanted to build "China's Character AI." Everyone thought it looked like a consumer (To C) application with lots of users, and there was talk of a "hundred C's war" among Character AI-style apps.

Koji: Early in the year, many predicted that a "hundred C's war" was certain to happen in 2024.

Yusen: I didn't make that prediction, but many people did. In August 2024, Character AI announced its acquisition by Google, and people realized that breaking out of niche circles wasn't so easy. In March, Cognition, the company behind Devin, released a demo video. At the time, no one believed it: people thought the company was running a scam, some even called them frauds, and there were debunking videos. Then in December, Devin launched, and everyone was shocked to discover it was real and could actually get many tasks done.

I also remember the OpenAI board drama at the end of 2023, when the entire OpenAI staff collectively voiced support for Sam Altman on Twitter, with "OpenAI is nothing without its people" flooding everyone's feeds. But by the end of 2024, who knows how many people had left. In the end, even Alec Radford, a founding member and core researcher at OpenAI, departed. Most of the early employees had left by then. And early in the year, everyone thought GPT-5 was coming soon, but not even GPT-4.5 had materialized by year-end. What arrived instead was a different path: the o1 and o3 route, focused on reasoning.

So much changed over the year — rapid changes, and many unexpected or unpredictable ones. This is probably just the normal state of an industry in its early stages of transformation.

Koji: Half a year ago on that "Crossroads" episode, Yusen had a core view: "Large models are still elementary school students — don't rush to put them to work and make money, give them more patience."

Behind that statement was the implication that while technological progress was fast, it was still far from commercialization, far from large-scale To C applications. Do you still believe that today? Or do you think the pace of evolution has exceeded what you understood at the time?

Yusen: First, that statement actually had a specific context: people were asking, "So much money was spent training models — when will it be earned back?" In discussing the investment return cycle for model training, I felt this followed a similar pattern to every technological revolution: first invest in infrastructure and research, then products gradually find landing scenarios, and eventually generate commercial revenue.

Over the past year, in specific domains where model capabilities excel, such as programming, large models have indeed crossed the threshold of being able to "work." As I mentioned earlier, on the SWE-bench test, solving under 3% of problems at the beginning of the year clearly wasn't job-ready, but solving 50% now is. Especially after ChatGPT 3.5 arrived, products like Cursor, Windsurf, and Devin began to appear, truly helping programmers solve many problems and bringing substantive productivity gains.

From a revenue perspective, some AI Native applications have grown rapidly after finding product-market fit. For example, Cursor's annual recurring revenue is now close to $100 million. Another AI programming company targeting non-technical users, bolt.new, reached $4 million ARR in four weeks and $20 million ARR in two months — the fastest growth ever for an enterprise application. And a Stockholm-based company called Lovable reached $4 million in annualized revenue within four weeks.

Our portfolio company HeyGen hit $1M ARR in mid-2023 and grew dozens of times over the next 18 months, reaching nearly $50M ARR by the end of 2024. Our portfolio company Monica has also surpassed $10 million ARR, achieved in little more than a year. Whether overseas startups or our own investments, significant progress has been made in user growth. Moonshot AI, which I mentioned earlier, already has 40 million monthly active users.

I believe that in certain domains, AI already has the ability to "work," but overall revenue still falls far short of costs. We need to remain patient — after all, ChatGPT has only been around for two years. We're still in a stage where model capabilities continue to improve, unlocking new application scenarios. Only after application scenarios generate enough value can commercialization gradually unfold.

Koji: Actually, I feel this wave of technology is diffusing extremely fast. Of the Cursor, bolt.new, HeyGen, and Monica you just mentioned, I'm a paying user of three; Monica is the only exception, because Red gave me a VIP membership. I think these technologies are spreading faster than in the last wave. Even without network effects, there's a very passionate group of explorers at the tech frontier today, constantly trying new things and enthusiastically spreading the word. "Crossroads" is one of them too: Yusen and I both share immediately whenever we use something exciting.

I have a strong feeling, which is also why we're recording this episode: I hope people won't just watch from the shore, thinking what they see is just new version number releases that don't seem to affect them. I especially hope everyone will jump into the waves, download these applications and experience them, feel it early, start using it early.

Yusen: I think there's a quote that really captures it, from sci-fi writer William Gibson: "The future is already here — it's just not evenly distributed." If you're just using a basic chatbot day-to-day, or if you haven't really used AI products much at all, then this probably all feels like headline news. But in certain fields — for programmers or digital art creators, for instance — I believe AI tools have already become indispensable to their work. I've always felt that spending a little time or a little money to experience the latest AI products is well worth it. It's a great way to viscerally feel the progress we're making in certain areas, and it's a good way to see the future.

AI Diffusion: How to Let Everyone Create, Not Just Consume

Koji: You mentioned these massive leaps for digital art creators and programmers. I think their significance goes beyond just helping those two groups. The bigger meaning is in helping ordinary people do the kind of creation that only programmers and artists could do before.

Coming back to this — Yusen, how many AI application startups did you talk to at ZhenFund last year? Any overall impressions? Do you feel like AI applications are landing faster?

Yusen: Our team collectively probably spoke with over a thousand AI application startups. I personally talked to nearly 200 founders. We do feel that with technological progress, the pace of AI application deployment is accelerating.

Specifically, I think three developments matter a lot:

First, model reasoning capability. Releases like GPT-4o and o1 have strengthened models' reasoning, which reduces hallucinations and enables them to plan and execute more complex tasks.

Second, improved programming capability. In the digital world, many tasks can be accomplished through code. As we mentioned earlier, programming capability is growing very fast. When common tasks can be solved through programming, execution ability improves dramatically — at least in programming itself, and in other areas that can be generalized as programming problems.

Third, tool use or "Computer Use," which Anthropic pioneered. AI can use the software we already have, starting with browsers and extending to other applications. All the software humanity has built can be leveraged by AI to solve tasks. Combined, I think these have significantly improved AI's ability to complete tasks.

I think Devin's release at the end of 2024 was important because it was the first product that turned Agents from imagination and prototypes into deployed reality. We'll quickly see Agent attempts across various fields. Many are still at relatively early stages, but I think a lot of interesting ideas will become reality.

Koji: So we'll spend considerable time talking about Devin, and about our expectations for AI development in the year ahead, as represented by Agents.

Yusen: We're seeing quite different AI application entrepreneurship directions in the US and China. In China, because enterprise service deployment remains somewhat difficult, many founders still want to build consumer-facing applications. And within consumer apps, many lean toward time-killing applications — various forms of emotional companionship, AI chat variants. In the US, we're seeing people across every niche thinking about replacing portions of human work, making work more cost-effective and efficient. This is a major contrast between Chinese and American entrepreneurial directions.

Of course, domestically there's also the major direction of robotics: the entire embodied intelligence space has many new companies emerging and raising lots of funding, to the point that it feels somewhat overheated to us. But overall, I think people are very excited, especially young entrepreneurs. Previously people felt the internet era was nearly over: we in the post-80s generation were the beneficiaries of internet-era dividends, but what could the post-00s generation do? Before AI emerged, they felt there really wasn't much left to do in the internet space. But now AI has shown them many new opportunities, opportunities that belong to their generation of young entrepreneurs. So as a fund that always pays attention to young people, we still find many interesting entrepreneurs and projects emerging.

Koji: Speaking of this wave of entrepreneurs, what typical commonalities do you see in them, beyond being younger?

Yusen: Being young is inevitable as one era gives way to the next.

They generally have more international perspective. Information spreads faster now. In the internet era, when an overseas application became popular, China might take three to six months to produce a copycat. Now basically anything new overseas gets reported the same day, much of it summarized and translated by AI. So people are generally very aware of overseas model and application progress.

Similarly, because products are often international from the start, going global has become a major theme. Models inherently have strong multilingual capabilities, so many products are global from day one. This was harder to see in the internet era, when people typically said "I'll just build something for the China market." Now people are walking two paths simultaneously from the start — both domestic and international. I see many entrepreneurs and teams being more AI Native, with quite a few having experience in AI research or engineering practice. This is why they can spot opportunities earlier and execute on them.

But at the same time, because young entrepreneurs may not have lived through many internet-era business processes, they have some lessons to learn in areas like promotion and commercialization. Here, veterans such as the Monica team we've invested in, who went through much of the internet-era growth, do have an advantage. But I think these things are all learnable, and can be improved through hiring and strengthening the team, so we're still very confident long-term. We believe new-generation AI Native entrepreneurs can build very interesting products and can catch up on the lessons they need to learn.

Koji: Let's talk about how our understanding of AI technological breakthroughs, industry changes, and entrepreneurial opportunities has evolved from last year to this year. First: what are some views you quite agreed with a year ago that you no longer hold?

Yusen: There are too many, which is why I became reluctant to record podcasts: every time I speak, I risk being proven wrong later. But in early-stage investment, especially when looking at early technology, being proven wrong is the norm. Only by not fearing it can you keep learning and growing.

A little over a year ago, everyone emphasized Pre-training. The narrative was about how many GPUs you needed, how big your cluster had to be — this was why NVIDIA's stock exploded. Because people simply understood it as: more GPUs, more compute, throw more data in, and good models come out.

Looking at late 2024 and early 2025, Pre-training has hit a relative bottleneck, at OpenAI and at other industry-leading teams alike.

If we say Pre-training is the compression of intelligence, then intelligence in forms like text that are easily compressible has been largely compressed. Ilya said in a talk that "all this internet text is like fossil fuels, the text humanity has accumulated over so many years, and now we've trained it all into models. Next we need new knowledge, whether knowledge still in our brains that hasn't been extracted or new knowledge produced through AI, and the growth rate of such knowledge isn't that fast." So I think everyone realized this year that the "brute force works miracles" approach to Pre-training needed to change.

A year ago I did discuss Agents a little. At the time, given how widespread hallucinations were in large models, I felt autonomous agents, or L4-level Agents, would take a relatively long time to land. But model reasoning, code generation, and tool use have all progressed very fast. This means that in the digital world, for tasks with relatively deterministic target outcomes, like programming, Agents are indeed landing significantly faster. We've seen that products like Devin are no longer just ideas, but reality.

There are two key points here: one is how to better plan tasks and execute longer-horizon tasks; the other is tool use, including writing code to use tools and using existing tools. When both these capabilities become strong, Agent landing speed may be faster than people thought, especially in the digital world.

A third point: a year ago people generally believed models would keep getting larger — 7B, 70B, even 700B. But currently, advanced model size doesn't need to increase that fast. We can get increasingly good results from 70B models, while also putting equivalent capabilities into smaller models.

In reality, such truly massive models may mainly be used to align the models that actually get deployed, or as Teacher Models. This is somewhat like the early personal computer era. Everyone initially thought CPU clock frequencies needed to keep rising, but after about 3GHz, single-core frequency stopped climbing much; instead, performance improved through better architecture and lower power consumption. It's like the human brain: the point isn't to get bigger, but to learn more knowledge and skills within the same size and become smarter. In this regard, I think model cost declines have exceeded expectations. While everyone always knew model costs would keep falling, we're now seeing that every year, for equivalent models or equivalent intelligence, costs drop to one-tenth of what they were. This will unlock many application opportunities. These are things people didn't clearly realize in early 2024, or that changed along the way.
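As a rough illustration of the "one-tenth per year" figure above, the compounding can be worked out in a few lines of Python. The starting price of $10 per million tokens and the function name `cost_after` are hypothetical, chosen only to make the arithmetic concrete; the 0.1 yearly factor is the rate quoted in the conversation, not a measured constant.

```python
def cost_after(initial_cost, years, yearly_factor=0.1):
    """Cost of equivalent capability after `years` of compounding declines.

    yearly_factor=0.1 encodes the "drops to one-tenth every year" claim;
    it is an assumption taken from the conversation, not a measured value.
    """
    return initial_cost * yearly_factor ** years


if __name__ == "__main__":
    start = 10.0  # hypothetical starting price, USD per million tokens
    for year in range(4):
        print(f"year {year}: ${cost_after(start, year):.4f} per million tokens")
```

Under these assumptions, three years of such declines cut the cost a thousandfold, which is why use cases that look uneconomical today can become viable quickly.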

Koji: Another question about evolving understanding: what did you consider worth watching but not that important in early 2024 that has become especially important today?

Yusen: I think first, as investors, our understanding of frontier research often lags somewhat. Some things may have reached consensus in researcher communities while we're still in a state of catching up.

A major focus in 2024 was the rise of Reinforcement Learning (RL). As mentioned, Pre-training has hit a bottleneck. In Post-training, RL is being used to keep strengthening model capabilities, and especially after the o1 and o3 releases, people discovered that there's still a long road ahead on the RL path and that model capabilities can improve substantially. In early 2024, this was really only discussed in small circles and was not yet a common consensus beyond the research community. So we find that predicting the direction of large models or AI technology is always very difficult. RL talent is actually quite scarce, so everyone is building teams and technical reserves in this area.

At the same time, a crucial Scaling Law was proposed: the Inference Scaling Law — how to extend reasoning time to produce better results.

This was a major development last year, showing up not just in model design but in how we design products. Because most current products — whether ChatGPT, Claude, or Cursor — require real-time interaction with humans: I say something, it responds. So the question becomes, how do we let it take more time at each step, even allowing it to plan and use tools to keep working continuously without requiring constant input from me? This "slow thinking" approach isn't about blurting out answers, but about deliberating to reach better results. How to achieve better performance in this area will be crucial this year.

Another point: models previously had very little background information. When I asked ChatGPT a question, it really only had my input as context. In fact, any smart person would find it difficult to answer a question based on just one sentence. But now we're seeing that Cursor, for example, can use an entire organization's codebase as background information. And Devin, integrated into Slack, can draw on existing conversation records and feature documentation within the organization. With the same level of intelligence, a model that has more information can better understand intent and provide better answers.

I think in this regard, new product designs that let users effortlessly and simply bring in more background information will become important. So the current back-and-forth format of something like ChatGPT still feels very primitive. Everyone is thinking about what new product forms might look like — these are things that gradually came to the surface this year.
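One simple way to picture trading more inference time for better results is best-of-N sampling: draw several attempts and keep the highest-scoring one. The Python sketch below only illustrates that idea. It is not how o1 or o3 work internally, and `sample_answer`, its random "quality" score, and `best_of_n` are hypothetical stand-ins for a real model and a real verifier.

```python
import random


def sample_answer(rng):
    """Hypothetical stand-in for one model attempt: returns (answer, quality)."""
    quality = rng.random()  # pretend a verifier scored this attempt in [0, 1)
    return f"attempt-{quality:.3f}", quality


def best_of_n(n, seed=0):
    """Spend more inference compute by drawing n attempts and keeping the best."""
    rng = random.Random(seed)  # fixed seed keeps the demo repeatable
    attempts = [sample_answer(rng) for _ in range(n)]
    return max(attempts, key=lambda pair: pair[1])


if __name__ == "__main__":
    for n in (1, 4, 16):
        answer, quality = best_of_n(n)
        print(f"n={n:2d} best quality={quality:.3f}")
```

Because the same seed yields the same sequence of attempts, a larger `n` can never return a worse score than a smaller one here, which mirrors the intuition that more "thinking" budget should not hurt.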

Koji: In our last episode of "Crossroads," we happened to discuss what OpenAI released during its 12-day launch marathon. Regarding the third point mentioned earlier — how to obtain more background information — OpenAI also released a new feature: the Mac version of ChatGPT can now read your screen, using on-screen content as background information to generate responses in combination with your questions.

This screen-reading capability isn't simple screenshotting — it operates on three levels. The first level is screenshot-style understanding: it comprehends whatever is displayed on screen. The second level is that it can read all content within a program window, even content not currently visible on screen that would require scrolling to see. The third and most powerful level: it knows your cursor position. Because where your cursor is often indicates where your attention is most focused. So when you ask questions or discuss with it, it incorporates your cursor position or selected text into its responses.

So I think even within a chatbot format, applications that can read more background information make AI capabilities significantly stronger.

Yusen: Right, the original ChatGPT was kind of like a pen pal — you could only write to it, you write a letter and it writes back. But if this "pen pal" isn't on the other end of an email, but standing behind your computer watching how you use it, or even living inside your computer seeing things that aren't visible on screen, it obviously becomes much more useful.

So I think how we combine AI with users' background information, users' existing knowledge, and organizations' existing knowledge has enormous impact on AI's usefulness. Because it can now digest so much information, which of course also benefits from advances in model technology.

Koji: It's not just these developments. Gemini 2.0, released just two weeks ago, also introduced multimodal understanding capabilities. You can simply open your camera, point it at something, and ask "what is this?" I tried it myself: I pointed at a film festival poster on the wall and asked which festival it was and which edition the poster was for. Interactions like this used to exist only in science fiction movies, but today they've become reality. And this reality comes at acceptable cost, with very fast response times. Of course, it hasn't been well productized for consumers yet, but if you try it, I think the effect is truly stunning.

From ChatGPT to Devin: Four Development Stages and Paradigm Shifts in AI Programming

Koji: Let's talk more about AI programming. The programming field made very exciting progress this year. Yusen, you've always been good at synthesizing frameworks and summarizing. Not long ago you shared with me your four-stage theory of how AI programming has developed. Want to share it with everyone on the podcast?

Yusen: This actually came from discussions with many friends — it's the crystallization of collective wisdom. AI programming has only been around since ChatGPT's emergence, just over two years, but has already gone through four stages.

The first stage was having AI write code directly, typified by early ChatGPT and Claude. We'd give it a requirement like "help me write Snake," and it would produce some code. In this process, it neither knew why I wanted to write Snake, nor how the code was running. I'd probably need to compile and run it locally, find errors, tell it about them, and then it could give debugged results. At this point, AI was completely like a pen pal you could only communicate with through mail — simple Q&A mode.

The second stage, represented by GitHub Copilot, was when AI began to have context: it could use an entire organization's codebase as context. This gave AI access to vast new background information. But users still needed to manually paste code into their IDE for debugging. I think this was the 2.0 stage: we gave AI the codebase as context.

A major advance in 2024 was the emergence of programming copilots represented by Cursor. Its core concept was predicting what code the user would write next. Based on your codebase and what you'd just written, it predicted what code you'd write next, what files you'd create, what operations you'd perform. This brought major improvements in both the quality and quantity of generated code, as well as file creation and modification. Later, Windsurf added automation for command-line operations, so AI could effectively use my computer. Originally AI wrote code on a piece of paper and I'd copy it to run; now AI could create files on my computer, execute command-line operations — entering the "I write for you" stage.

Just when we thought this was exciting enough, Devin's appearance brought several important breakthroughs: First, it can work asynchronously. Cursor, Windsurf, and similar tools, while doing more in each step, still require sustained attention — "I say one step, it does one step." But Devin can work continuously, freeing up the user's attention. This is because it adds a Planner that can plan tasks.

Second, it can execute more operations through virtual machines, doing more debugging work. For example, if you write a website, it can use a virtual machine to visit the site itself, checking whether front-end and back-end business logic is correct, and can be interrupted and adjusted at any time. With Cursor or ChatGPT, you know you can't make adjustments mid-output — you have to wait for it to finish before modifying. But Devin is like a real person: you can give new instructions while it's completing a task, and it will incorporate this into its existing Planner to adjust the plan. This evolves from "writing for you" to "doing for you."

To summarize these four stages: stage one is having AI write code, represented by ChatGPT; stage two is AI gaining access to the codebase, represented by GitHub Copilot; stage three is AI automatically writing and executing code, represented by Cursor and Windsurf; stage four is the AI virtual employee, where Devin has set an excellent precedent.
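The asynchronous, interruptible planning described for stage four can be sketched as a toy Python program. This is a minimal sketch under stated assumptions, not Devin's actual design: the `PlanningAgent` class, its method names, and the policy of running new instructions before the remaining plan are all hypothetical.

```python
import asyncio
from collections import deque


class PlanningAgent:
    """Toy agent: works through a plan and accepts new instructions mid-task."""

    def __init__(self, steps):
        self.plan = deque(steps)      # remaining steps, in order
        self.inbox = asyncio.Queue()  # instructions sent while the agent runs
        self.done = []                # log of completed steps

    async def instruct(self, step):
        """Called by the user at any time; does not wait for the agent."""
        await self.inbox.put(step)

    def _replan(self):
        # Hypothetical policy: new instructions run before the remaining plan.
        new_steps = []
        while not self.inbox.empty():
            new_steps.append(self.inbox.get_nowait())
        self.plan.extendleft(reversed(new_steps))  # keeps arrival order

    async def run(self):
        while True:
            self._replan()            # fold in anything the user said meanwhile
            if not self.plan:
                break
            step = self.plan.popleft()
            await asyncio.sleep(0)    # stand-in for real work; yields control
            self.done.append(step)
        return self.done


async def demo():
    agent = PlanningAgent(["write backend", "write frontend", "deploy"])
    worker = asyncio.create_task(agent.run())  # agent works in the background
    await asyncio.sleep(0)                     # let the agent start step one
    await agent.instruct("add login page")     # interrupt mid-task
    return await worker


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the user never waits on the agent between steps; the instruction sent mid-run is picked up at the next planning pass and slotted in ahead of the remaining work, which is the difference between "I say one step, it does one step" and handing off a task.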

AI Going Global: Deep User Needs, Clever Content Marketing, Avoid Simple Ad Buying

Koji: This reminds me of an analogy: 1.0-era AI "read ten thousand books" to answer questions, while 4.0-era AI "traveled ten thousand miles." It becomes a real employee: you assign it tasks, it goes out to complete them, then returns to report. We witnessed this leap across four stages firsthand this year.

ZhenFund has invested in quite a few AI startup teams going global, with very typical representatives being HeyGen and Monica, both performing exceptionally well. So I'd also like to discuss the going-global topic with you.

This year, there's a widely circulated saying in the industry: "Go global or get eliminated" — going global seems to have become very important, even crucial. So first I want to ask: why is the contrast so stark between overseas AI adoption and domestic adoption? To the point where we encourage domestic entrepreneurs who can't even speak English well to bravely try AI going global?

Yusen: The core reason is that AI is currently primarily a productivity-enhancing technology, and in regions like Europe and America with much higher per-capita wages, willingness to pay for tools is stronger.

So when you make a productivity tool — like the productivity-enhancing tools we've invested in such as HeyGen, Monica, Oculus, Max AI, and others — overseas users, especially European and American users, have relatively strong willingness to pay for productivity, and they pay in dollars, so the absolute amount is higher. This is the most important factor.

There are also some other reasons. For example, going overseas gives access to more capable models like Sonnet 3.5 or GPT-4o, unlocking more application scenarios, while models available domestically do have some gaps. Additionally, large models can process multilingual input and output, so once you've built a good product, why not promote it globally?

Subscription models are also broadly accepted overseas, whereas they are indeed harder to implement domestically. This significantly improves startup teams' ability to generate commercial revenue.

Koji: So what characteristics do you think this generation of AI entrepreneurs needs? And would you encourage them to go global? Because I imagine you wouldn't encourage everyone to go global.

Yusen: Actually, we feel that when all VCs are urging entrepreneurs to go global, this usually means the market is overheated.

We've always been wary of these so-called high-consensus views. And we think that for most Chinese entrepreneurs, going global is definitely a debuff rather than a buff. After all, you're playing away games: you need to solve many problems you wouldn't face domestically, and understand many users you didn't understand before.

There are actually many opportunities in China. Like the AI companies we've invested in domestically — Moonshot AI, AI Weiwu — are actually growing faster. It's just that commercialization may be slightly slower. But I think this is also something we learned from the internet era. Think about it: in the internet era, when eBay commercialized early and took commissions, Taobao went free first, and eventually built an even stronger business model. China and Western markets have always suited somewhat different business models — not every team needs to go global.

Koji: For Chinese entrepreneurs who've already chosen to go global, and I believe many are listening to this episode — Yusen, what advice would you give them?

Yusen: I think going global is fundamentally the same as building products anywhere — you have to deeply understand users' real needs. The language and geographic barriers make this even more critical, especially in enterprise services. We've seen quite a few Chinese entrepreneurs in enterprise services who think, "Our engineers are really capable, our problem-solving is strong, so we can go overseas and outperform competitors."

While our teams have strong execution, defining the right problems requires field research and truly understanding customers. So especially in sales-driven domains, you absolutely need to find experts with Go-to-Market experience, and the team needs to physically go to the target market. For consumer-facing products like Monica, the needs might be more universal or easier to grasp, so that's not always necessary. But for enterprise services, people have to go.

Of course, we've also seen many teams succeed in niche markets, since those needs are the easiest to understand — they're often pretty similar across humanity. That's the first point: really figure out the users and their needs. Second, the common trait of teams that do well is thinking through and finding a low-cost, high-return marketing strategy. Take HeyGen, Monica, Viggle — these Chinese products that have done well overseas — they typically excel at SEO, social media distribution, or viral spread of quality content, rather than simply buying ads. Of course, if your product monetizes very well, maybe you can make the ROI work on paid acquisition, but basically all paid channels are expensive now.

So how to do marketing cleverly, especially achieving viral marketing through product features, becomes extremely important.

Using overseas platforms like Twitter effectively is actually quite different from China. Domestically, people are used to buying in-feed ads, doing paid acquisition, and competing on sophisticated ad strategies. Overseas, I think you need to be much more clever about it. Chinese teams generally have strong product execution, so it really comes down to two things: what to build and how to promote it. These are the areas where people commonly face challenges, or where doing well really sets you apart.

05 AI Hardware Entrepreneurship: Looks Beautiful, but Requires Caution

Koji: There's another view going around — that this wave of AI hardware is quite active, and that building AI hardware can really leverage China's strengths. In the AI hardware space, Yusen, have you looked at or invested in anything over the past year?

Yusen: We've looked at quite a few AI hardware projects, but honestly, hardware looks beautiful yet isn't necessarily easy to actually bring to market.

What's worked better in the past is when the product prototype was already proven overseas, and we made it faster, cheaper, or smaller. We've also seen teams like Plaud that created genuinely innovative products. But overall, hardware doesn't scale that quickly — software remains the better vehicle for AI's current diffusion. So we've always been relatively cautious on hardware.

We do invest in such entrepreneurs, but overall we haven't invested as heavily as some funds. Personally, I've always been cautious about AI hardware — including when Rabbit and Humane first came out, I held a fairly skeptical view.

06 Devin: Not Just a Coding Tool, but the First Truly Usable AI Agent

Koji: Alright, let's move to part two of today's discussion. We'll be talking about Devin with Yusen. First, a special note: we'll be approaching this from a non-developer's perspective. Neither of us is a professional engineer. Though we both studied computer science for seven years, we've worked as product managers since graduation. It wasn't until Cursor took off six months ago that we started coding again, or rather, started commanding AI to write code for us (laughs).

But precisely because both Yusen and I come from non-developer backgrounds, this gives us a unique lens to experience Devin and to predict how AI Coding Agents, and AI Agents more broadly, will transform everyone's future work and life.

Because we believe this generation of AI programming technology will ultimately evolve in two directions: one serving professional programmers and developers, and two empowering all non-developers like us. And the latter's commercial value and application prospects may be far more profound and extensive.

So first question for Yusen: on launch day, you immediately put in $500. After paying for Devin, what was the first thing you used it for? And what's the most impressive thing you've done with it?

Yusen: After installation, Devin suggests some tasks. One of them incorporates your name, searches for information about you online, and builds you a personal website. After that, I gave it typical intern work: I asked it to revise our venture fund's manifesto. I said, go find out which top US VCs there are and what their manifestos are. This is a classic task — you roughly know what's needed, but it requires information gathering, organization, and problem-solving ability.

Watching it work revealed many interesting points. First it had to determine what counts as a top US VC, so it went to Pitchbook, CB Insights, and similar sites for ranked lists. It found what it considered the top dozen or so, and I checked — they were indeed about ten of the most elite firms. Then it went to each website to find their manifesto. But "manifesto" goes by different names in the VC world. Sequoia Capital calls it "Ethos," Founders Fund calls it "Manifesto," elsewhere it might be "About" or "Philosophy." And several VC websites didn't have this content at all — no "who we are, what we believe" statement. So I watched Devin try to understand the task and find the most relevant content.

For example, when searching Accel (a very well-known US VC), it found nothing matching on the website. So it went into their News section, searching back two or three years, and found an article describing Accel's values and methodology. It extracted that as what it was looking for. You can see it solving problems like a junior human employee — not mechanically checking whether your website has something literally called "Manifesto" and giving up if not. Instead, it thought: I need to look across your entire site for content that fits this description, then search for it.

It ultimately gave me a Markdown file with ten VCs and their corresponding manifestos, though it had many typical current AI model issues. Sometimes it takes shortcuts — I wanted the full text, but for several VCs it summarized instead. This is something we often encounter with current AI chatbots: due to token limits, you don't get the full text but an abbreviation. At that point you have to tell it: give me the complete text content. So it really does need coaching, just like an actual intern. But I found its planning ability, and its creative problem-solving when direct solutions weren't available, quite remarkable.

Of course this may not be the typical Devin use case, since I didn't have it do programming but rather typical language model AI tasks. So I can easily imagine: if we now have Devin for programming, we could absolutely have corresponding Agent products for text work, finance, or legal work.

What I believe is: as long as I define the work as something a person can do sitting at a computer, using the computer, going online, using software — then it can likely be accomplished to some degree in this workflow. That really impressed me.

Koji: So from day one until now, about two weeks in — what future do you feel you've experienced?

Yusen: After experiencing Devin, I feel that as the first truly usable Agent product, it may mark an important moment in human history.

Why do I say this? Humans have invented many tools. Some say "humans are the animal that can use tools." But all these tools basically fall into two categories: first, tools requiring sustained attention — like drills, hammers, or keyboards and mice — that need our continuous focus and input; second, mechanically repetitive automated tools — like washing machines, vending machines, assembly lines — that don't need our attention but can only solve repetitive tasks.

We've always been searching for a third kind — tools that don't need sustained attention but can plan and solve problems autonomously. This is what's called the Autonomous Agent.

In previous conceptions, perhaps only products like Viggle achieved this in hardware. On the software level, we hadn't seen such products emerge. Last year there were attempts like AutoGPT, but they remained at the prototype stage.

I found that Devin defines several characteristics that true Agent products need:

First, the asynchronous experience enabled by powerful task planning. Its originally designed scenario was: in Slack you @Devin saying "help me fix this bug," and it goes off to fix it itself. It only comes to me when it genuinely needs help or has completed the task. This is just like an intern — you assign the task, they work independently, and only come to you when they hit an unsolvable problem. Meanwhile, I can assign tasks to multiple interns, letting me focus on what truly matters.

Second, cloud-deployed virtual machines. It can use browsers, and in the future more software, thereby completing more tasks. This is completely different from Cursor and Windsurf using my own computer. If you've used RPA software before, you'll find that when RPA is operating, you dare not touch anything — your actions will interrupt its workflow. The AI is using your computer. But Devin uses virtual machines — just like giving an intern their own computer. The flexibility of AI using its own virtual machine is fundamentally different.

Third, Devin learns and grows on the job like a real employee. When we hire an intern, they'll mess up plenty on day one because they don't yet know how to navigate our organization. In the same way, as Devin works, it gradually accumulates relevant experience, what we call "knowledge." It will proactively flag that it has learned something, like checking official websites first when searching for information, and I confirm that it has picked up the right habit, much like a performance review with an intern or staff member. Just as an employee writes a summary of lessons learned and we affirm "yes, those are the right takeaways," this theoretically allows continuous accumulation of organization-specific knowledge, making Devin increasingly suited to the team.

This mirrors how we actually hire. A new employee's initial value is relatively limited; they need ongoing learning to better adapt to the organization. But with previous tools, we expected them to work perfectly out of the box — we never expected a computer to keep getting better through continuous learning.

With Devin, we're genuinely seeing a growth curve resembling that of a human employee. Though still early, this paradigm shift feels profoundly important.

Fourth, Devin introduced a task-based pricing model. $500 buys 250 ACUs, with each ACU representing roughly 15 minutes of work, which translates to about $8 per hour. That's about half of California's minimum wage ($16/hour). As AI compute improves and costs drop, the same spend will accomplish even more in the future. Compared with hiring, which involves HR, office space, management overhead, and other hassles, AI is a tireless 24/7 employee.
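The arithmetic behind that hourly figure can be checked in a few lines. This is only a sketch using the approximate numbers quoted in the conversation (the plan price, ACU count, minutes per ACU, and the cited minimum wage); the function and constant names are made up for illustration and are not part of any Devin API.

```python
# Figures quoted in the conversation; treat them as approximate.
PLAN_PRICE_USD = 500      # price of the plan
ACUS_INCLUDED = 250       # ACUs included in that price
MINUTES_PER_ACU = 15      # rough amount of work per ACU
CA_MIN_WAGE_USD = 16      # California minimum wage cited above, per hour


def hourly_cost(plan_price: float, acus: int, minutes_per_acu: float) -> float:
    """Effective cost of one hour of agent work."""
    cost_per_acu = plan_price / acus
    acus_per_hour = 60 / minutes_per_acu
    return cost_per_acu * acus_per_hour


def total_hours(acus: int, minutes_per_acu: float) -> float:
    """Total hours of agent work the plan covers."""
    return acus * minutes_per_acu / 60


rate = hourly_cost(PLAN_PRICE_USD, ACUS_INCLUDED, MINUTES_PER_ACU)
hours = total_hours(ACUS_INCLUDED, MINUTES_PER_ACU)
print(f"${rate:.2f} per hour, {hours:.1f} hours of work, "
      f"{rate / CA_MIN_WAGE_USD:.0%} of the cited minimum wage")
# prints: $8.00 per hour, 62.5 hours of work, 50% of the cited minimum wage
```

On these numbers, $500 covers roughly 62.5 hours of agent work at $2 per ACU.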

A friend put it well: programmers love Cursor because it's a programmer's Copilot, boosting individual efficiency; bosses love Devin because bosses think about how to buy productivity. Devin demonstrates a potential paradigm shift — expanding productivity through spending. I think Devin showed me the Scaling Law of work.

Among many Coding Agents, the first task is often building a personal website. We joke that "this is the new Hello World." Devin handles this well because finding my information online is straightforward, allowing it to quickly assemble a site.

Koji: Devin's emergence hasn't just convinced people that AI programming has become impressive — it's defined an entirely new interaction pattern. You can see how an AI Agent can work this way. Since Yusen and I share a team account in Devin, I can see all his task progress, how he uses Devin, and how Devin responds to him.

It genuinely feels like being in an office. There's an intern initially helping Yusen with tasks, and now the intern has produced a report. Yusen happens to be downstairs eating, so I see the report and suggest: actually, what Yusen wants is this, go refine it so it's ready when he returns. This really feels like working with a person, which is why we say it's a true Agent. Because the Chinese translation of "Agent" refers to a person, not merely a machine; it carries the sense of an assistant. This is why I feel Devin has created a new paradigm of working with something like an assistant.

Yusen: Right, there are many interesting details here. Let me give another example. In another friend's task, he asked Devin to scrape some people's information from LinkedIn — say, Chinese employees at OpenAI. But Devin obviously doesn't have a LinkedIn account, so it needed to ask the user: could you help me log into LinkedIn? Since Devin runs on a virtual machine, it has an interactive mode. As the user, I can enter my LinkedIn credentials into the virtual machine, and then Devin continues working.

What does this resemble? Imagine hiring an intern, giving them a computer, but they don't have a subscription for specific software. They'd say, "Boss, come enter your account." After I input the credentials, they continue working with my logged-in account.

This is why virtual machines matter so much — they enable many operations without disrupting my workflow. Otherwise, like with Cursor or Windsurf borrowing my computer, I couldn't do anything else during that time. This asynchronous approach lets me assign Devin many tasks simultaneously. It's a parallel working mode where I only pay compute costs.

This is actually significant. In daily life, I might have one intern, but if I had ten, each helping with many things, the productivity gain could be exponential.
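The "assign many tasks, get interrupted only when needed" pattern described above can be sketched with Python's `asyncio`. Everything here is hypothetical: `agent_task` merely simulates an agent with a sleep, and none of the names correspond to Devin's actual interface. The point is only that the tasks run concurrently and each one reports back either a result or a request for help.

```python
import asyncio


async def agent_task(name: str, seconds: float, needs_help: bool = False) -> str:
    """Hypothetical stand-in for one agent working on its own machine."""
    await asyncio.sleep(seconds)  # simulated work; no human attention needed
    if needs_help:
        return f"{name}: blocked, needs the boss"
    return f"{name}: done"


async def delegate(tasks: list[tuple[str, float, bool]]) -> list[str]:
    """Hand out all tasks at once and collect every agent's report."""
    return await asyncio.gather(*(agent_task(*t) for t in tasks))


if __name__ == "__main__":
    reports = asyncio.run(delegate([
        ("fix-bug", 0.02, False),
        ("scrape-profiles", 0.01, True),   # e.g. waiting for a login
        ("build-site", 0.03, False),
    ]))
    for line in reports:
        print(line)
```

Because the tasks overlap, the total wall-clock time is close to that of the slowest task rather than the sum of all three, which is the sense in which one person's attention can be spread across many agents.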

Koji: It reminds me of when people said "everyone can be a product manager," but today it's become "everyone can be a CEO." Because in interacting with AI units, you seemingly only need to do the three things CEOs enjoy most: first, give orders; second, check work; third, at a more sophisticated level, provide inspiration and guidance.

Yusen: Actually, many people using Devin or other AI products encounter the same problem: what should I do, and how should I articulate my needs? Imagine hiring an employee and simply telling them "build me a Taobao" — they'd certainly fail. So why do we often have unrealistic expectations of AI, thinking "build me a Taobao" will just work? This is clearly wrong.

Indeed, each of us must think about what we actually want to do. Faced with a powerful model with many capabilities, the key is whether you clearly know what you want to achieve and can articulate needs in a more reasonable, comprehensible, structured way.

Just as we ourselves, working as product managers, designers, or programmers, find it frustrating when bosses haven't clarified their own requirements — like asking for "colorful black." But when we become AI's boss, can we be a good one? This is what everyone needs to learn going forward: how to be a good boss.

Koji: There's another strong feeling from using it, something hidecloud mentioned recently. He reminded everyone that Devin has a remarkable capability: it can help us tap into the accumulated wisdom of human history. What does this mean?

When we need to complete a task, we often don't know that a wheel already exists, that someone has already built this tool. Many tools exist as code, as repositories on GitHub or Hugging Face. Downloading such code locally, deploying it to a machine, and connecting it with other work or software — perhaps only one in a thousand people could do this. But today with Devin, theoretically anyone can, because you can give instructions in natural language like a boss.

A concrete example: say we want to build a chess application. In the past, simply writing out chess rules would take hundreds or even thousands of lines of code. You might think to search whether someone has already written this as a callable code library. But you might get hundreds of Google pages of results, with no way to know what's best or what represents best practices. With Devin, you can assign this command, and it will use its own analytical approach to find the most suitable existing code library and put it to use directly.

The value this brings: all tools or code libraries previously developed to solve specific problems become directly usable — no need to reinvent the wheel. You can stand on giants' shoulders, using these community-validated best practices to develop the tools you want. I think this is a value that Devin and Cursor both realize, perhaps not dramatically obvious but profoundly impactful.

Yusen: When ChatGPT first appeared, I had a strong feeling: if much of your work involves copy-pasting or stitching parts together "Frankenstein"-style, it is easily replaceable. People found that the first work to become dramatically more efficient (or, put less charitably, the most easily replaced) was entry-level design work of the cut-and-paste variety: copying others' designs, or junior coders simply modifying a library for their own project. Such work is most easily replaced, so frontend programmers actually face considerable pressure, because frontend presentation mostly doesn't require that much innovative thinking.

In this process, I think for everyone, the abilities to generate ideas and solve problems creatively become increasingly important.

And finding existing solutions, gluing them together — this is precisely what AI excels at. Most of our work actually consists of problems already solved, wheels already invented; it's just that humans previously didn't know these wheels existed, or couldn't effectively stitch them together. Now AI can help us do this, allowing us to focus on thinking about "what to do" — which becomes ever more important.

This also makes me think about implications for education. Much of our previous education and training taught "how to execute." Like when there were no calculators, we had to learn extensive manual and mental calculation. But now, we need to understand computational principles without necessarily performing the calculations ourselves. We can devote more energy to thinking about what to do, asking the right questions. This is why I believe future education systems need major transformation.

Koji: So 2025 is something to genuinely look forward to. From Devin's release, what we see isn't just AI programming being elevated to the next level by Agent technology — this new paradigm's emergence will bring disruptive revolution to all aspects of life, and with it, entrepreneurial opportunities everywhere.

Earlier Yusen mentioned a fascinating point: Devin is the first tool in human history that neither requires sustained attention nor merely performs mechanical repetition. This also shows us a kind of Scaling Law for work. Could you expand on this? Help everyone better understand what remarkable value this represents.

Yusen: First, the most straightforward explanation of this Scaling Law: I can get more productivity by spending more money, where money is equivalent to compute. This is actually quite remarkable. Think about it: many companies raise lots of money but can't effectively convert it into productivity. They need to hire people, build organizations, and handle all sorts of trivial matters. But with these asynchronously working AI Agents, we can assign many tasks to different types of AI. They consume compute and electricity to complete the work itself, and can operate in parallel.

You can easily imagine a "product manager" AI that's better at articulating requirements and breaking them down, directing a team of AI programmers to form a virtual organization. In this kind of organization, you need to focus on two things: first, what you want to do; second, securing enough compute and capital. In an organization that's rapidly becoming reality, we can effectively scale work by pouring in more money and compute. This is what's meant by the Scaling Law of work.

The second point is interesting. We often hear entrepreneurs say, "I have a great idea, but I need a programmer."

Excellent programming execution is still a scarce resource today. But when execution itself is no longer scarce, "what to build" becomes critically important. As I mentioned earlier, everyone needs to learn to be a boss. This way we'll see more entrepreneurial opportunities. Many founders whose ideas were previously buried due to lack of strong programmers may now get more chances, and more creative concepts can be put into practice. This is also why we can scale entrepreneurship itself: because productivity can be increased through capital investment.

All of this is possible because AI Agents can work in parallel. If our attention had to be on tools, it would be limited. But now our attention can be distributed across different Agents — one person can simultaneously issue instructions to multiple Agents to complete tasks.

Koji: Actually, speaking of Scaling Law, I'm reminded of an analogy. Back then, Xing Wang had us read a book called The Leadership Pipeline. The book talks about an important cognitive shift when you first become a leader of a small team: your output is no longer your personal output, but the output of the entire team.

Today, the Scaling Law of work we see in Devin is similar. The output here is no longer what you personally produce by focusing on the task at hand, but rather depends on how well you delegate team tasks and set inspection standards. All of the team's output, including all of Devin's output, ultimately becomes your output. This means you can achieve unlimited scale-up with limited attention. As long as you can manage enough people and Agents — and managing AI Agents is much easier than managing people, since managing people involves more communication, coordination, and emotional labor. I think this is probably what Yusen meant by the Scaling Law of work.

Yusen: This concept is spot on. Imagine if you could become the CEO of a multinational corporation, commanding thousands or tens of thousands of people — what could you accomplish? We didn't have such opportunities before, but now we can achieve something similar by managing AI Agents and having Agents mobilize other Agents. What's required is money and compute, and many companies aren't actually short on money. What they lack is talent, and organizational structures that can execute.

So I believe two trends will emerge in this scenario: on one hand, capable companies and individuals will be able to do more; on the other hand, many people with ideas will be able to quickly realize them at relatively low cost, gaining user validation or investment, giving us more entrepreneurs and innovation space.

Koji: Right, this is one of the most popular phrases this year: the "super individual." Because as a person gains empowerment from more and more tools, including AI Agents, they can accomplish what previously required ten or twenty people.

However, not long after Devin's launch, it also received a lot of complaints and criticism. What's your take on that?

Yusen: Much of the criticism focused on the $500 price point, with people comparing it to Cursor's $20 price. First, I think these are two different paradigms.

One is a tool that requires my time to use — it makes my time more efficient, but doesn't save time. So when using a tool-type product like Cursor, my costs haven't actually decreased; in reality it's my costs plus the tool's costs. But if you treat it as an employee, the comparison becomes employee salary. As long as it can do more work than an employee hired at the same price, I think this pricing is acceptable in the US and European markets. Many people see the price and immediately ask if it's a rip-off, but the key is how you view and use it.

When I discuss with programmers their experiences using Cursor and Devin, I find that while Devin's capabilities aren't yet strong enough, using Devin represents a major shift in most programmers' workflows. Because programmers themselves understand how code runs, they often want to maintain control of the big picture. So at this stage, a Copilot like Cursor is a better fit for their current workflow. For programmers already accustomed to working in an IDE, having to talk to Devin, wait for Devin to work, and then review the results isn't very efficient. They'd rather fix bugs or write code themselves. If you're a very skilled programmer, you probably wouldn't want to be stuck with a capability-limited intern, because Devin right now is only at intern level, and training an intern takes time and patience.

At this point programmers might feel that rather than waiting for you to write code and then helping you fix problems, they might as well write it themselves. I think this is completely understandable at the early stages of technology — we need to look at this from a human perspective. If a person makes mistakes, as managers we tend to be quite patient, because we know people can learn and grow. Point out their problem today, they might remember it, then have more motivation to work, and through training become a decent programmer.

Devin can actually learn. But we haven't yet established the expectation for AI software and products that "it can grow, it can learn, it can be managed."

So when it has issues, many users' reaction becomes "I bought this expensive $500 tool, and it still has problems," feeling disappointed. Therefore, when introducing a product like Devin into an enterprise, managing expectations becomes very important. Including in Devin's own documentation, it states that it first does tasks that would be assigned to interns — simple frontend tasks, bug fixes, adding a Dark Mode toggle to the frontend, that kind of work.

But humans' ability to ask good questions also needs to be learned. I often see people make requests like "help me build a Taobao" or "help me make a WeChat" — this far exceeds its capabilities. Devin now, like all AI products, will naively accept such tasks and say "okay, I'll help you build a Taobao." The results in such cases definitely won't be satisfactory. Learning how to use a tool well takes learning — we're not yet at the stage where any demand can be directly fulfilled. That wouldn't be an intern, that would be a god.

As Devin's capabilities improve and its understanding of organizational contexts deepens, I believe it will gradually grow from intern to junior full-time employee, then to senior full-time employee — this requires a process of acceptance.

I think Cursor is incremental innovation on existing workflows — it hasn't fundamentally transformed programmers' work. But Devin represents a disruptive innovation logic, which often requires significant adaptation time and different onboarding processes. The first product may not necessarily achieve this, so I don't think Devin is necessarily the final answer.

Devin likely just demonstrates one form that future AI products might take. We truly need to learn to adapt to and use AI-type products, just as adapting to the SaaS concept, or adapting to distributed work concepts like remote work, both required long periods of time and suitable catalysts. So I think it gives us strong directional guidance, but it's still at intern level right now. Pointing out its problems is easy in this process, but what's more important is that it proposes this future direction — getting inspired from here to build better Agents is what matters.

Koji: This is like the half-full glass theory — some see value in the half glass, others see problems. Just as when we discussed Devin completing the task of "finding the manifestos of ten top VCs," it knew how to find this content from press releases when the Accel website lacked relevant background. This is a huge highlight — it can set tasks, reflect, and self-check. On the other hand, there are indeed many problems, like the webpages it creates being very unattractive. But seeing the highlights rather than the problems, seeing future possibilities rather than points worth criticizing now — this makes me think: critics often feel correct, but only builders, though they may appear clumsy, are more likely to succeed.

This reminds me of something Huiwen Wang once said: "If you believe something will eventually happen, do it once every three years." Agents have been expected to appear since humans had science fiction, and people occasionally try. But after seeing Devin, it feels like this might be our closest approach to success yet.

Let's talk about 2025. Throughout 2024, although our discussions were quite optimistic, the broader environment would occasionally produce various pessimistic narratives. I especially remember that in Q2 and Q3, the whole conversation was about where AI's PMF actually was. It seemed that getting this wave of AI into real-world use was harder than expected.

Now standing at the beginning of 2025, there's a very simple yes or no question: Yusen, are you optimistic about 2025?

Yusen: I'm actually still quite optimistic.

First, finding PMF for AI applications shouldn't be expected to happen so quickly. I often use an analogy — although many people compare ChatGPT's launch to the iPhone launch, saying AI has entered the iPhone era, I always believe it represents more of a BlackBerry era.

What's the difference between the BlackBerry era and the iPhone era? Many listeners probably haven't used a BlackBerry; it belongs to the memories of those of us born in the 1980s. Before the iPhone launched, smartphone form factors were very inconsistent, because the technology was still relatively early, development was fragmented, and people hadn't found a convergent path. This meant many things you wanted to do couldn't be done, the technology itself was expensive, there were no unified development standards or product standards, and there were few developers. So at that time, trying to build truly popular mobile internet applications like TikTok would have been very difficult. I've repeatedly made this point: you couldn't build TikTok in the BlackBerry era. As technology advances, moving from the BlackBerry era to the iPhone era unlocks more application opportunities.

After the iPhone appeared, first the technology became good enough that many applications went from "want to build" to "can build" — including good cameras, good screens, good processors. Second, technology became standardized. After the iPhone launched, phones all looked the same, and people realized the technology direction had converged. At the same time, more developers emerged, because development became easier, technology became standardized and cheaper, and people understood it better. So the iPhone era gave birth to massive numbers of applications.

When ChatGPT first came out, we also found that many things were conceivable but impossible to execute. Agent is a classic example. In the first half of 2023, there was an experiment called AutoGPT that introduced many compelling concepts — using language models to make plans, then checking completion and iterating. But models back then hallucinated too much, struggled to use tools effectively, and couldn't browse the web properly, so it simply didn't work. This is a classic case of "trying to build TikTok during the BlackBerry era."
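The plan, check, and iterate loop described above can be sketched in a few lines. This is only an illustration under stated assumptions, not AutoGPT's actual code: `run_agent`, `toy_plan`, and `flaky_execute` are hypothetical names, and the planner and executor are plain functions standing in for a language model and a tool.

```python
from typing import Callable, Dict, List


def run_agent(goal: str,
              plan: Callable[[str], List[str]],
              execute: Callable[[str], bool],
              max_rounds: int = 3) -> List[str]:
    """Plan steps for a goal, run them, and retry any step that failed."""
    pending = plan(goal)          # "make a plan"
    done: List[str] = []
    for _ in range(max_rounds):   # "iterate"
        still_pending = []
        for step in pending:
            if execute(step):     # "check completion"
                done.append(step)
            else:
                still_pending.append(step)
        pending = still_pending
        if not pending:
            break
    return done


# Hypothetical stand-in for a language model that produces a plan.
def toy_plan(goal: str) -> List[str]:
    return [f"{goal}: step {i}" for i in range(1, 4)]


# Hypothetical stand-in for an unreliable tool: step 2 fails on its first try.
attempts: Dict[str, int] = {}


def flaky_execute(step: str) -> bool:
    attempts[step] = attempts.get(step, 0) + 1
    return not (step.endswith("step 2") and attempts[step] == 1)


completed = run_agent("write report", toy_plan, flaky_execute)
print(completed)
```

In this toy run the failed step is retried in the second round. With a model that hallucinates or misuses tools on most steps, the retry budget runs out before the plan completes, which is the failure mode described above.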

Now, with models' advances in reasoning, programming, and tool use, Agents are starting to look genuinely viable. There are still many shortcomings, but they have at least reached a first stage of being usable at an intern level. This is a textbook example of technological progress unlocking more application opportunities, and I believe this will ultimately take us from the BlackBerry era to the iPhone era.

From ChatGPT's emergence to now — just two years — we've seen tremendous progress, which makes me very optimistic. In just two years, AI programming has evolved from ChatGPT's "you ask, I answer" to Devin's "you ask, I do" and Cursor's "you ask, I write." That's enormous progress, and actually quite fast.

Second, PMF often comes from technological progress itself. Take Cursor — the product actually launched in 2023, but its proposed feature of predicting next actions required more powerful models to make predictions and write better code. You could say Sonnet 3.5's emergence is what enabled Cursor to actually deliver on what it set out to do. Sonnet 3.5 activated the product experience Cursor wanted to deliver, and Cursor's widespread adoption in turn made Sonnet 3.5 the most popular model in AI programming — a mutually beneficial relationship.

Similarly, products like Devin will need models to improve in reasoning and tool-use capabilities to succeed. Sonnet 3.5 or GPT-4o may not yet be sufficient. So Devin's product form may need a more advanced model to activate it — perhaps o1, o3, or another new Anthropic model. It's a reciprocal process where a product waits for a model to activate it, then drives widespread use of that model. So this stage genuinely requires progress in the technology and models themselves.

One characteristic of the mature mobile internet era we just experienced was how easy products were to use. TikTok just takes a finger flick; WeChat and Xiaohongshu are all easy to pick up. But when you're in the early stage of a technology, using a product well has a learning curve. Think about the earliest smartphones, personal computers, the internet — they all required learning to use.

Right now, most people using AI are far from extracting the intelligence embedded in these products. Today's large models, whether OpenAI, Claude, or Moonshot AI, have compressed vast amounts of knowledge and intelligence. But have we learned to use them correctly, to prompt efficiently, to extract the model's intelligence effectively?

I don't think most people have, myself included. I'm constantly discovering that the model can do this or answer that for me. So we're going through a transition from the easy-to-use product era of mobile internet to a deep AI era that requires learning to use.

People will initially feel some frustration, finding products somewhat difficult to use — that's characteristic of technology's early stage. Often the applications can already do many things; we just haven't learned to use them well yet, haven't become good prompters or good managers.

These things require learning, or waiting for model capabilities to grow strong enough to handle them for us. Then we may enter another period of product applications, but right now products and users are still in a phase of mutual adjustment.

Koji: So everyone needs to understand where the boundaries are through experimentation, and how those boundaries keep expanding. I'd like to add that beyond the new opportunities unlocked by technological and model advances, especially in the Agent space, there's a fourth dimension.

In our previous "Crossroads" episode discussing OpenAI's 12 Days of OpenAI, guest "Big Brain" noted that some heavyweight content was actually held back from the announcements, whether for PR considerations or to avoid drawing excessive competitor attention. One point particularly crucial for Agents was OpenAI's Function Call and structured output capabilities, which enable Agents to receive much more precise instructions. This had been somewhat overlooked before, but makes perfect sense once pointed out.
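To make the point about precise instructions concrete, here is a minimal sketch of why structured output matters for an Agent: the model's reply is parsed and checked against a declared tool schema before anything runs. Everything here is an assumption for illustration: `TOOL_SCHEMA`, `parse_tool_call`, and the `search_flights` tool are hypothetical, and this is not OpenAI's API, only the general idea of validating a function call.

```python
import json

# Hypothetical tool definition: the name and the argument types the agent expects.
TOOL_SCHEMA = {
    "name": "search_flights",
    "required": {"origin": str, "destination": str, "max_price": int},
}


def parse_tool_call(raw: str, schema: dict) -> dict:
    """Parse a model's JSON tool call and reject it if it doesn't fit the schema."""
    call = json.loads(raw)
    if call.get("name") != schema["name"]:
        raise ValueError(f"unexpected tool: {call.get('name')!r}")
    args = call.get("arguments", {})
    for key, expected_type in schema["required"].items():
        if not isinstance(args.get(key), expected_type):
            raise ValueError(
                f"argument {key!r} is missing or not {expected_type.__name__}"
            )
    return args


# A well-formed call, as a model with structured output might emit it.
model_output = (
    '{"name": "search_flights", '
    '"arguments": {"origin": "PEK", "destination": "SFO", "max_price": 800}}'
)
args = parse_tool_call(model_output, TOOL_SCHEMA)
print(args)
```

When the output is guaranteed to match a schema like this, the Agent can pass the arguments straight to the tool instead of guessing at free-form text.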

Looking at 2025, Yusen, what application directions do you think are relatively easy to land? This is also what entrepreneurs are very focused on right now.

Yusen: Looking at what's landed more easily over the past two years, I see several patterns.

First: things that help customers make money. If your technology isn't yet fully mature but can directly help me make money, or directly improve efficiency in commercial workflows, that becomes very important. Take Midjourney — it has hundreds of millions in annualized revenue, with roughly half coming from advertising use cases, generating commercial images for ad campaigns. That's a very concrete scenario: I was already making these ads to make money, and now I can produce ad content faster and better. Or HeyGen, which is also primarily used in marketing scenarios — people use it to create promotional video ad content. So first, technology that helps customers make money — in early stages, people are willing to spend time learning and figuring it out.

Second: things that improve productivity by 10x or more on important tasks. Because with good technology, if it only improves productivity by 50%, people will still have a lot of resistance. It has to deliver transformative productivity gains. Cursor, Devin — for programmers these are absolutely 10x productivity improvements. Programmers might spend ages searching through codebases, so the motivation to use them becomes very strong.

Or Perplexity as an AI search engine — it's also a 10x productivity improvement over traditional search. Before, if I wanted to find information about Koji, I'd have to search through lots of content, read a dozen or two articles from "The Fair." Now I just ask it, and it reads those dozens of web pages and summarizes for me. So for information gathering and question-answering, it's more than 10x more efficient than search engines. Such products relatively easily find product-market fit.

Third: satisfying basic human needs, such as NSFW content; everyone has seen plenty of examples. Overall, a product either helps people make money or dramatically improves their efficiency. Achieve one of these and you're in good shape.

Koji: What application directions do you think people should be somewhat cautious about, that are somewhat difficult to execute?

Yusen: In mobile internet, many winners were "time-killing" apps. In China, people are accustomed to building apps with high user stickiness, where users spend lots of time, then monetizing through advertising. ByteDance, Xiaohongshu, Kuaishou all follow this paradigm. This is mobile internet's established pattern, because it was a new device making previously unavailable online time usable — a zero-to-one logic.

Now that apps like TikTok already occupy massive amounts of our time, if AI applications try to compete with these mature players on "time-killing" from the start, they face competitors that are already very powerful and have already captured most available time. Building "time-killing" apps in this context is very difficult.

What can ultimately succeed are relatively niche products aimed at specific audiences. An AI companion chat app for general users will struggle to be more attractive than video apps like TikTok. Be cautious about apps that compete with giants for users' time.

Second, changing the physical world remains quite difficult. We mentioned AI writing code, AI using tools — these are still in the digital world. AI can do many things in digital space, but in the physical world, even basic actions like picking up a cup remain challenging.

Although humanoid robots are extremely hot right now, the technical implementation path and how to scale model data in this direction remain open questions. Over the next three to five years, applications that change the physical world will still face many challenges.

Third, these past two years have seen many devices trying to replace phones — Rabbit, Humane, and others. They emphasize building phone-replacement products, and currently about 100 teams are working on smart glasses. My view is: if the scenario you're addressing is something phones already do — calling, searching nearby information, listening to music — replacing the phone is extremely difficult.

So far, hardware that can coexist with phones basically does things phones fundamentally cannot do. Drones can fly. Smartwatches go on wrists. Smart rings go on fingers. Or there's Insta360 for sports scenarios. But products like Humane and Rabbit are essentially doing things phones already do well. Users have little motivation to switch, because phones already get at least 80% of the way there in most scenarios. Unless your product is dramatically better, or does something phones simply cannot do, replacing phones will be very difficult.

I think in 2025 we'll see an enormous number of Agent products emerge. Many will face a challenge: when you need to make major changes to an organization, can you actually achieve that? Devin, for example, faces changing how programmers work — from writing code themselves to directing others to write code. This workflow change encounters much resistance in many organizations, especially large companies.

We've found that pushing AI in large companies also involves data permissions, privacy, and security issues. If you're changing workflows, many people's jobs change, which creates even greater difficulty. So unless you can improve productivity so dramatically that organizations have no choice but to adopt, or you target small and medium enterprises instead, making big changes in large organizations often runs into barriers of human nature, not technology.

2025 Outlook: Agent, Personalized Services, and Superhuman Breakthroughs

Koji: We just discussed how technological unlocks bring new opportunities, with a lot of discussion around model reasoning capabilities, reduced hallucination, and computer-use abilities enabling Agent opportunities. Beyond these, what other technological unlocks do you think could bring waves of AI entrepreneurship opportunities in 2025?

Yusen: I see several such directions:

First is Agent. We discussed this earlier — we'll see AI products targeting various domains. They'll draw on Devin's approach, doing asynchronous tool use, charging by workload.

In the US, some have inverted the original SaaS (Software as a Service) to "Service as Software" — turning services into software sales, or "sell work, not software" — selling work outcomes rather than the tools themselves.

We’ll likely see a lot of experimentation along these lines in 2025. Many attempts will fail, but some genuinely interesting products will emerge.

The second is Scalable Personalization. Looking back at the evolution of online content distribution: first came portals with their "one size fits all" approach, where everyone saw the same thing. Then search engines, which personalized content around keywords but still returned identical results for the same query. Then recommendation algorithms exemplified by TikTok, which proactively pushed content based on user profiles.

Now we’re thinking about the next level of personalization: if the content a user wants to see doesn’t exist yet, generate it for them. Video generation technologies like Sora are designed to create content tailored to individual preferences. Recently fast-growing applications like bolt.new and Windsurf generate personalized websites from text prompts. In software development, the future may no longer resemble the "Hollywood blockbuster" model of centralized development like WeChat or TikTok, but instead offer more personalized software and content experiences for every category of user.

Google’s NotebookLM also reflects this trend. Take podcast content: today we can only listen to conversations that have already been recorded, but in the future AI could generate dialogues between any two people on any specific topic. As AI capabilities improve, the software we use and content we consume will become increasingly personalized.

Third, with o3 we can see AI capabilities evolving from "human-level" to "superhuman." Early tests like MMLU were still evaluating whether AI could reach ordinary human performance; now benchmarks target elite humans, such as SWE-bench for programmers, AIME from the American high school math competition track, and GPQA with its PhD-level science questions. By the end of 2024, advanced models like o1 and o3 had already scored around 80 on these tests.

We now need benchmarks for superhuman performance, such as FrontierMath endorsed by Terence Tao. o3 recently scored 2700 on Codeforces, a level achieved by fewer than 130 people in all of human history. This means AI will play a significant role in scientific research and frontier exploration.

After o3’s release, some criticized its high cost and heavy compute requirements for single tasks. But o3’s high-compute mode was never designed for routine tasks — its purpose is to solve the hardest research and exploration problems at humanity’s frontier. It’s entirely normal for such capabilities to be expensive.

Going forward, we’ll likely see AI models diverge between everyday tasks and frontier research. Think of Sheldon from The Big Bang Theory, a brilliant scientist who’s hopeless at daily tasks. Some AI models will be more like Sheldon, tackling frontier exploration; others will be like the cost-effective o3-mini, built primarily for getting work done, more like a programmer; and there will be even simpler on-device models for answering basic questions like "what’s the weather today."

Here, we can see both everyday needs being solved more efficiently and cheaply, and genuine frontier research where AI collaborates with scientists to produce new advances and generate new knowledge. That prospect excites me tremendously.

Koji: There was another major breakthrough in multimodality this year. Beyond 4o’s real-time voice, OpenAI tucked something away in an inconspicuous corner of their release that many considered one of the most significant achievements of the 12 days — their native multimodal interaction across multiple input and output modalities. What entrepreneurial opportunities do you think multimodality will bring next year?

Yusen: The first priority is how AI understands this multimodal world.

Take text: "The weather is nice today" is a simple sentence, but it contains a great deal that requires seeing to fully understand — so a picture is worth a thousand words.

Images and video contain enormous amounts of information. If AI cannot adequately comprehend this information, its intelligence will have significant blind spots. Current AI is like a blind person who can solve incredibly difficult math problems: blindness may not hinder certain capabilities, but to achieve more complete intelligence, multimodal understanding is crucial.

OpenAI and leading overseas researchers generally believe generative capability may not be the most important priority, which is why Sora currently receives relatively limited resources. In the US, multimodal generation runs on a somewhat parallel track because its primary use cases are entertainment content and content production, so it still seems somewhat distant from AGI. Companies like Anthropic, which don’t pursue multimodal generation, believe AGI can be achieved through text, code, and APIs alone — a different perspective entirely.

On multimodality, I think NotebookLM offers an excellent insight: how to transform content from one modality to another for consumption.

Consider TTS: we used to convert text directly to speech, but turning text into a podcast isn’t simply reading it aloud — that would just be audiobooks. Podcasts require transforming content into a form better suited for audio consumption. Similarly, text to video: adapting Romance of the Three Kingdoms into a TV drama isn’t straightforward reproduction but requires artistic adaptation. Video to text, video to audio — the same principle applies. Naturally converting between different modalities and creating content optimally suited for each modality’s consumption is an exhilarating process.

Suppose I enjoy scrolling TikTok — I could transform The Three-Body Problem into content suited for TikTok consumption, or into podcast-appropriate content. This opens up numerous opportunities in content consumption.

Going further, many believe multimodal generation and understanding will significantly benefit embodied intelligence. We’re seeing cutting-edge research like the recent Genesis project, studying how to simulate the physical world and how robots manipulate real-world objects — all fascinating work, though I’ve been following this area somewhat less closely lately.

Overall, conversion between modalities is indeed a critically important direction. As you mentioned with Gemini 2.0, its ability to efficiently understand received video signals enables some very intuitive applications. In daily life, we encounter many things we see but don’t know how to use — but with strong enough video generation capability, usage instructions could be overlaid directly on the video feed. We previously discussed a scenario with Google researchers: I have a coffee machine at home, I point my phone at it, and the video stream overlays a generated video prompt saying "press this button to start brewing." These are fascinating ideas, though they likely still require further technical advancement.

08 AI Native Applications: Waiting for New Business Models After Deep Technology Diffusion

Koji: I think 2025 will likely see such applications emerge, including their integration with AI hardware. I previously saw a demo of AI glasses for tennis that provided real-time coaching — advising how to adjust posture and receive the ball as it came from the opponent, helping improve your game.

On native multimodal interaction, I want to expand a bit — this is a development that’s recently surprised me. As a guest on the previous episode of Crossroads noted, this technology was released during the 12-day event but tucked away in an inconspicuous corner. He considered it actually the most noteworthy breakthrough. OpenAI chose to reveal this information quietly, apparently to avoid drawing competitors’ attention. However, within developer circles, they still conducted one-on-one outreach with select key developers.

What makes this technology special is its ability to simultaneously receive multimodal inputs and output multimodal content — and this input and output is native multimodal-to-multimodal. Everyone understands the end-to-end concept; native multimodal-to-multimodal represents several qualitative leaps beyond that.

I also want to ask Yusen an interesting question that everyone must be wondering: what do you think the big opportunities for AI Native applications might look like?

Yusen: First, I believe major opportunities will emerge after deep AI technology diffusion. If it’s still being used by a niche population, the big opportunity probably hasn’t appeared yet. Let’s review how internet-native and mobile-native applications historically emerged.

Step one: as technology diffuses, use new technology to solve old problems. In the internet era, we had email for communication, portals for news, and first-party e-commerce for selling goods. But as the internet expanded further, social networks emerged only after everyone was online; search engines became necessary only after information was online; and platform e-commerce appeared only after buyers, sellers, payments, and logistics were all in place. These platform e-commerce businesses, social networks, and search engines were the true internet-native applications, all created by startups that ultimately captured the largest market capitalizations.

Mobile-native applications followed similar logic. Only after mobile internet (including smartphone hardware and 4G networks) became ubiquitous, with both content producers and consumers on smartphones, did mobile-native information platforms like TikTok, Kuaishou, and Xiaohongshu emerge. Only after blue-collar workers had smartphones could Meituan Waimai and DiDi be born; only after gamers had smartphones could mobile-native games like miHoYo’s titles and Honor of Kings appear.

AI Native applications should follow similar logic. Initially we may see applications like ChatGPT, giving everyone an AI assistant, but diffusion needs to scale further. When each of us has our own AI assistant, using AI to solve many work problems, even conducting meetings as we are now, new possibilities will arise.

What happens when AI interacts with AI? In a company where most execution is handled by AI, there could be massive changes to productivity and enterprise software, because you must not only execute but also manage these AIs: assigning tasks and breaking them down. These may be things humans simply cannot do, as people lack the attention and energy.

Another important theme is monetization in the AI era. In the mobile and internet eras, much monetization happened through advertising. But when you ask Moonshot AI or Perplexity a question, the ads in search engines and on webpages become invisible, because AI is reading those pages for you. This requires reconstructing value capture. How do we extract value from answers I get from AI? Ads were meant for human eyes, but AI sees and filters them out. So the disruption of advertising business models will also create many opportunities for AI Native applications.

Koji: One last question: in 2025, what investment directions will ZhenFund and you find most interesting? And are there any non-consensus views, differentiated perspectives you hold?

Yusen: Our differentiated views mainly fall in three areas:

First, we're cautious about "time-killing" apps. Right now, many people are trying to find the next ByteDance by following ByteDance's playbook — looking for high-engagement, ad-driven consumer apps that scale through paid acquisition. But I think when users' time is already so heavily captured by ByteDance, the next killer app may not emerge in that same mold. In other words, the next ByteDance probably won't look like ByteDance.

Second, compared with the current enthusiasm for humanoid robots, we're staying relatively cool-headed. We've seen many companies that build complete humanoid robots raise large rounds, but the technical path for general-purpose humanoid robots, whether it's Sim-to-Real, training from video, or manipulation data collection, hasn't converged yet. How to collect this kind of data at scale remains an open question. Investment sentiment in this space is overheated. The time required for humanoid robots to perform tasks in the physical world, let alone enter homes to do housework, is likely much longer than current projections and investment cycles suggest. So we're cautious on the robot bodies themselves, though we have invested in critical upstream components like dexterous hands and motors.

Third, regarding AI applications in productivity. We've found that in the US, agents are landing quickly in productivity and enterprise software. But in China, because of the widespread belief that enterprises won't pay for tools, there's been a lot of resistance and challenges — many enterprise software entrepreneurs and investors have been burned. But I'm thinking, when sentiment gets this extreme, it often signals a reversal opportunity.

If simply selling tools may not work in China, offering the work output itself at one-tenth the price might be something enterprises would buy. This could become a powerful AI outsourcing model — not traditional human resource outsourcing, but outsourcing tasks to AI agents to complete.

Our thinking is that finding consumer entertainment use cases for AI is always going to be difficult. But can we combine breakthrough productivity gains with real-world product adoption in China? I think enterprise software may not be a closed door, and there may be new opportunities here.

Koji: So what directions are you focusing on? You just shared some non-consensus views, or areas you think require caution and deeper thought. What are your priority areas?

Yusen: Since we're always a founder-centric fund, we don't pre-define priority directions every year. But speaking personally, I think having AI do agent-like work in various forms will be a very important area this year. At the same time, I think the scalable personalization through AI programming or modality conversion that I mentioned earlier is also important.

One AI education company we invested in is exploring how to make education sufficiently personalized through AI. Previous internet education solved the scalability problem — using internet methods to bring elite teachers to more people. But going further, we hope to achieve personalization while maintaining scale — this is an important opportunity AI brings us.

So beyond the company we mentioned, "Yu Ai Wei Wu" (Dancing with Love), an AIGC technology company, we've also invested in some companies exploring directions similar to bolt.new, using AI programming to generate personalized applications. But this space is still at a very early stage and will definitely require a lot of adjustment.

Koji: Let's wrap up here. This is the first episode of 2025, and we've talked for a long time and covered a lot of information. But the most important thing isn't the information itself; we hope this episode conveys optimism and energy, and gets everyone to take more action and to create and build more. If you're looking to raise funding, feel free to reach out to ZhenFund. Thanks again, Yusen.

Yusen: Thanks for having me. Your summary just now was excellent. Even in a technology wave this turbulent, with many problems still unsolved and many ideas still unrealized, the pace of real-world adoption these past two years has far exceeded my expectations. So we have many reasons to stay optimistic and try to break through.

Spend enough time, even spend a little money, to experience the latest AI products and feel the incremental progress they bring. This is meaningful and valuable for all of us — whether as investors, entrepreneurs, or simply as people curious about the future.

Of course, everyone is also welcome to listen to more of "Crossroads" and ZhenFund's "Seriously Speaking" podcasts; they'll help us learn about and understand AI better.

Koji: Alright, thank you everyone, happy new year, bye.

Yusen: Happy new year everyone.

Stay Optimistic About AI

Hosts

Yusen Dai: Managing Partner at ZhenFund, investor in leading AI companies including Moonshot AI

Koji: Host of "Crossroads," Co-founder of Newshixiang and Tangdao

Timeline

PART 1: 2024

02:19 One word to describe 2024: "Fast"
10:06 The future is already here — it's just not evenly distributed
11:58 New requirements for real-world AI application adoption: reduced hallucinations, improved coding ability, computer use
14:35 Is AI the era of post-00s entrepreneurs? Common traits of the new generation?
17:03 Progress that exceeded expectations this past year: pre-training hitting bottlenecks, agent acceleration, falling model costs
20:50 New product forms will emerge from "reinforcement learning" and "context"
26:30 Four stages of AI programming development: ChatGPT → GitHub Copilot → Cursor → Devin
30:22 Why encourage AI companies to go global? Who should go global?
32:45 Advice for Chinese entrepreneurs going global: figure out "what to do" + "how to promote"

PART 2: 2025

37:02 Why is Devin so exciting? An all-around "intern" for $500/month
40:37 What future did we experience from Devin?
42:36 Characteristics of the new paradigm: asynchronous experience, cloud virtual machines, knowledge accumulation, task-based pricing
54:47 "Cursor is the programmer's Copilot; Devin is the boss's Scaling Law."
01:05:08 Staying optimistic about AI: people, models, and products still need time to adjust to one another, and finding PMF requires patience
01:11:27 In 2025, what application directions are easier to land? Where are the boundaries of entrepreneurship?
01:17:16 Emerging technology waves with hidden potential: Agent, scalable personalization, o3
01:30:08 What does the big opportunity for generative AI Native look like?
01:33:30 Consensus and non-consensus among investors in 2025

Related Recommendations

OpenAI o3:

https://mp.weixin.qq.com/s/BdLjKBa2VxoE5Nxh4WuBdQ

https://arcprize.org/blog/oai-o3-pub-breakthrough

https://mp.weixin.qq.com/s/MJlJ1hdN9oYhL9xZK9HR9Q

You can listen to us on Xiaoyuzhou, Apple Podcasts, and Ximalaya. If you have any suggestions or expectations for the show, feel free to share them in the comments.

If you have any entrepreneurial or collaboration ideas, feel free to email media@zhenfund.com!

Recommended Reading