Zhang Tao Responds to Controversy for the First Time: Why Manus Hasn't Been Replaced | Tsinghua University Campus Tour

ZhenFund · December 9, 2025

The magic only happens when you give intelligence back to the model.

On November 30, ZhenFund hosted a sharing session on AI entrepreneurship. ZhenFund managing partner Yusen Dai joined Moonshot AI president Yutong Zhang, AIwudao founder Huaiting Zhang, and Manus co-founder Tao Zhang at Tsinghua University for an in-depth dialogue on innovation and the future.

On March 5, Manus, the first general-purpose agent, was officially launched. In the launch video, Peak (Yichao Ji) sat on a couch in the common area of a shared office, introducing the new opportunity the team saw. By the next morning, it had drawn far more attention on social media at home and abroad than expected. Four days later, Twitter founder Jack Dorsey, Stripe CEO Patrick Collison, and the Hugging Face product lead had all reposted it.

But controversy followed. On one hand, some said Manus was nothing more than a wrapper product, "something you could build in a weekend." Yet weekend after weekend passed, and Manus remained un-replaced, consistently ranking first on a series of benchmarks including Scale AI's RLI. On the other hand, some believed Manus was merely "good at marketing," that it blew up because of one video. But what Manus truly accomplished was delivering the right product at the right time.

At this sharing session, Tao Zhang responded to the doubts about Manus for the first time. He acknowledged that Manus's secret is a simple one and has always been public, written right there in the lower left corner of the official website: Less structure, more intelligence.

Since entering the AI field, Tao Zhang has wanted to build a product that can truly accompany users 24 hours a day, continuously reasoning. He believes future agents will have more complete systems, better understand your context, master more tools, and proactively complete tasks for you. When a product moves from early adopters to the mass market, no one will care whether it was the "first general-purpose agent" — what people care about is always the value it delivers.

As a frontline AI entrepreneur, Tao Zhang also offered advice to the Tsinghua students present: it's already the last month of 2025. If you haven't truly used an agent yet, you must start trying it in these final weeks of the year and learn to coexist with it. Just as people learned to drive fifty years ago and to use computers thirty years ago, in the future you'll be able to say: "I started using agents in year one, not year two."

The full transcript follows.

The Story Behind a Phenomenal Launch

Yusen Dai (ZhenFund Managing Partner): From browser extension to Manus, how did this journey unfold?

Zhang Tao: It started in February 2024, right when Red rejected ByteDance's acquisition offer. If you're turning down ByteDance, you pretty much have to do something bigger, right? From that point, we spent seven months building an AI browser. Since Monica was a well-known AI browser extension, it felt natural for us to build an AI browser.

The browser was fully ready at that point. If we had shipped it, it would have been a very mature product, probably similar in form to Arc's Dia or Perplexity's Comet that people used this year. We had it built by September last year. But one week before launch, Red, Peak (Yichao Ji), and I realized that the AI browser path probably wasn't going to work, so we killed the seven-month project.

Mainly because in the process of building this product, we discovered something magical: AI is exceptionally good at manipulating browsers. Exceptionally good. But we suddenly realized — AI shouldn't use your browser.

It felt a bit like interning at a company where your mentor doesn't give you your own computer, makes you use his, and the two of you are fighting over that computer all day. In the later stages of building the browser, that was basically our experience: it looked like AI was competing with users for control of their own computers, and the whole experience was incredibly jarring.

When you demo it alone and watch the software run by itself, it feels amazing: wow, AI can actually do this. But when I actually used the product daily, I found the experience completely off, because AI was constantly seizing my computer. So we ultimately decided to let the browser go.

Meanwhile, in June last year, a phenomenal product emerged: Cursor. Most people at our company can code, but we found that colleagues who couldn't code, even my family members, started using Cursor to solve problems in their lives.

For example, my wife used Cursor to convert a video file from MP4 to MP3. She had never written code in her life, but she could use Cursor to do this — have Cursor write Python for her, and get it done effortlessly. Many people who saw Cursor in the second half of last year decided to build a coding agent, but the opportunity we saw wasn't in the coding agent itself. Because as people who can code, we felt there were already too many fancy tools serving engineers in the world — engineers don't need another fancier tool.

We felt the bigger opportunity was to truly democratize the potential of AI coding, so that every ordinary person, non-engineer, non-coder could enjoy the dividends of AI coding.

So we realized two important things: first, AI is very good at using computers, but it shouldn't use your computer — it should use its own computer; second, Cursor showed us that ordinary people could benefit from AI coding. Combining these two learnings basically gives you the original form of what people see as Manus today.

Yusen Dai: After Manus blew up, there was a lot of skepticism from the outside. The first criticism was that Manus had no technical substance. Many people said "you could slap a wrapper together in a weekend," but I've waited many weekends and haven't seen anyone do it better. So where does Manus's technical substance actually lie?

Zhang Tao: Let me share a moment that made me especially happy. In March this year, two weeks after we launched Manus, I happened to be in the US for NVIDIA's GTC conference. We didn't have our own booth, so I was wandering around aimlessly, and I saw a fairly well-known company in the industry, one that mainly does B2B agents, displaying their benchmark results on a big screen. Of course, they were first; otherwise they wouldn't have put it up, right?

But what made me especially happy was discovering that Manus was in second place, and we had only launched 15 days prior. I was genuinely thrilled in that moment. In Silicon Valley, on the most competitive frontline of AI, the top agent team was using you as the benchmark to measure against and to surpass.

From March to now, we've traveled extensively across Europe and the US, attended many conferences, and spoken with numerous founders and business leaders. I believe that to date, Manus has maintained long-term first place in overall performance and across various benchmarks, including the new metric RLI (Remote Labor Index) that Scale AI released recently.

The June version of Manus achieved SOTA on Scale AI's Remote Labor Index (RLI)

Here's a small tip for reading benchmarks: the ranking when a benchmark is first published is the most valuable reference.

Because the first time a benchmark is published, its authors haven't notified any vendors in advance: they test quietly and then release the results as a surprise. And once any benchmark is published, regardless of whether the dataset is public or not, people will always find ways to hack it, to optimize for it. But you'll find that in the vast majority of these surprise debut rankings, Manus frequently places first.

Eight months later, Manus is still leading.

I just searched Manus on Xiaohongshu and saw someone post: "With Manus's technical architecture, I went from not understanding it at first, to later looking down on it, to after doing a lot of work myself realizing — I still have to come back to this architecture."

We've never hidden anything. In the third week after launch, we told the world this technical secret: Less structure, more intelligence.

This is also why Manus has always emphasized that it is a general-purpose agent. Keep in mind that many of these ideas have since become best practices, but before we launched Manus in March, this was absolutely not industry consensus. At the time, the prevailing belief was that you should build workflows, teaching AI step by step: how to do this task, how many steps to break it into, and in what order to execute them.

But at Manus, we insisted on Zero Predefined Workflow. Facing every user's every task request, even repeated tasks, we leave it to the model to judge: what steps should this task be broken into? What atomic capabilities does each step need to call?

Atomic capabilities mean: should I open a webpage first? After opening, should I click? Do I need to scroll down? Do I need to create files in a virtual machine? Should I run a command in terminal? All of these decisions are left to Manus's agent system to handle autonomously.

And then you'll discover: magic happens.

When you truly hand intelligence back to the model, that's when the magic happens. Not only can it handle a vast number of long-tail tasks, but in the vast majority of standard scenarios, its performance often surpasses manually predefined workflows.
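Editor's illustration: the "zero predefined workflow" idea described above can be sketched as a loop in which the model, not a hand-written script, picks the next atomic capability at every step. This is a minimal sketch under stated assumptions, not Manus's actual implementation: the tool names, the `scripted_model` stand-in (which replaces a real model call), and the `run_agent` function are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical atomic capabilities. Each takes one argument and returns an observation.
def open_page(url: str) -> str:
    return f"opened {url}"

def run_command(cmd: str) -> str:
    return f"ran {cmd}"

def write_file(path: str) -> str:
    return f"wrote {path}"

TOOLS: Dict[str, Callable[[str], str]] = {
    "open_page": open_page,
    "run_command": run_command,
    "write_file": write_file,
}

Action = Tuple[str, str]  # (tool name, argument); the name "done" ends the task

def scripted_model(task: str, history: List[str]) -> Action:
    """Stand-in for a real model call (assumption): a real agent would send the
    task and the observations so far to a model and parse its chosen action."""
    plan = [("open_page", "https://example.com"), ("write_file", "notes.md"), ("done", "")]
    return plan[min(len(history), len(plan) - 1)]

def run_agent(
    task: str,
    choose_action: Callable[[str, List[str]], Action],
    max_steps: int = 10,
) -> List[str]:
    """Ask the chooser for one action at a time until it says "done" or the
    step budget runs out. No task-specific step order is written here."""
    history: List[str] = []
    for _ in range(max_steps):
        tool, arg = choose_action(task, history)
        if tool == "done":
            break
        history.append(TOOLS[tool](arg))
    return history
```

Swapping `scripted_model` for a function that calls a real model would leave the loop unchanged, which is the point of the approach: the decomposition into steps lives in the model's decisions rather than in the code.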

The technology itself holds no secrets. Even if it did, the industry reaches consensus within two or three months. But whether you can actually execute on that "secret" is the true test of a team's capability.

Yusen Dai: The second criticism is that Manus is especially good at marketing. Particularly after Manus released its launch video, it seems like every Chinese AI product now launches with someone sitting on a couch speaking English. Are you guys really that good at marketing? How much did you actually spend?

Zhang Tao: This might be the first time I'm speaking publicly about this.

We launched Manus on March 5. On February 27 — six days before launch — I suddenly felt the website looked a bit too plain. There was no video on the site, just "Manus, the general agent" with thirty-something user examples below. I started thinking, should we shoot a video?

At that time we only had five people in Beijing, working out of a shared space in Haidian. I called some video production teams I knew from before, but they told me: for a product launch video, you need at least two weeks. I said no, I'm launching in six days. They said that's impossible, we can't make it.

So what to do? I said, I'll just shoot it myself.

I have a Sony A7C2, so the camera was covered, but as a struggling founder, I didn't have any decent lenses. I figured if I'm going to shoot a launch video, I should at least show some respect, so I borrowed Yusen's Sony FE 50mm F1.2 GM, the best lens I could get my hands on. So the video you saw was my camera, Yusen's lens, plus a $14.9 CapCut membership and $9.9 for BGM rights. Editing in CapCut was free, but only later did I discover that I had to pay to export in 4K.

Everyone thinks this was marketing, but I believe people on the outside see the surface and tend to attribute things simplistically. If you're not an entrepreneur, it doesn't matter; but if you want to build something, you have to learn the essence of things.

Many people think Manus blew up because of the launch video. No, the real reason is that Manus delivered the right product at the right time.

Today it's hard to remember what the state of global AI products was in March. By late last year, everyone was pretty pessimistic, thinking AI products were just chatbots: emotional companionship, playing lawyer, playing financial analyst. But by 2025, people felt there should be a new product form, though nobody knew what. And in March, Manus presented an answer to the world with an entirely new product form.

On March 6, the next morning, all Chinese self-media was talking about Manus. Some said: you must have paid for this, right? But when GPT-5 launched, wasn't all Chinese self-media covering it too? Did OpenAI pay? Why can't a company founded by Chinese people create something that self-media voluntarily discusses without spending a dime?

At the end of the day, it's about the product.

We had roughly $8-9 million in the bank at that point, but at peak, we were burning $500,000 a day. That meant potentially going bankrupt in 20 days. That's why we had to implement the invite code mechanism. Many people don't realize just how extreme the traffic and attention were that Manus had to deal with.

Without the invite code mechanism, the company could have burned through its cash in days.

The underlying reason, of course, is that we weren't ready. In March, we thought the product was too cool and just wanted to release it to the world. We hadn't even finished building payments — pricing was added 20-plus days after launch. Otherwise, if we had fully opened up, server costs alone would have crushed the company.

But many entrepreneurs only learn backwards from results, so they learned two things: first, founders should sit on couches speaking English; second, there should be green plants behind them, because there happened to be a row of greenery in the background of that shared office space.

So some people learned to shoot videos, others learned to do invite codes. But often your product doesn't even have initial users — what's the point of invite codes then?

If you really want to build something, you have to understand the essence and reasons behind things, not just copy the surface.

Q: Manus's March launch was extremely successful, but the viral spread also triggered backlash. If the team could relaunch Manus, what would you do differently?

Zhang Tao: Public opinion is something you can't control, and obsessing over it isn't something a founder should spend too much time on. But if we could do it again, we'd absolutely make the same choice.

When we first launched this product in March, we were targeting the global market, which is why we chose to launch at 10 PM on March 5 — that timing happened to be when people on the US East Coast were awake.

But I'm Chinese. I spent four months building something I'm incredibly proud of, the proudest moment in my 15-year product career, and of course I wanted to tell my friends I'd made something cool. So if you go back and look at that video, you'll notice it wasn't posted on any official account — it was on my personal WeChat Channels, and it's still my highest-viewed video there.

Its domestic spread completely exceeded our expectations. Honestly, a startup team isn't god — you can't predict this kind of surprise explosion. Public opinion is always two-sided: some will support you, some will oppose you.

Over the past nine months, we've received enormous positive feedback and love from many users, which has deeply moved us. There have also been many interesting reversals: users who cursed us out in March because they couldn't get access completely changed their attitude by May, once we opened pricing and they actually used the product, and they proactively came to apologize. This happened many times.

Knowing these reactions would come, would you still do it? I believe every entrepreneur's answer is yes, because this is a happy problem.

But you don't need to chase it. What you should focus on is delivering genuine product experience and strength, letting those who once doubted you realize on their own that they were wrong. The more you obsess over public opinion, the less control you have over it.

Giving Users a Reason to Use You Every Day

Yusen Dai: What were some of the important product decisions in this?

Zhang Tao: One decision wasn't particularly difficult in itself, but was extremely important: we decided to go "general-purpose."

I remember being a judge at a Tsinghua University AI hackathon last year, where I shared this point: over the past two years I've attended many AI hackathons, and my least favorite project type is always "AI travel planner." It's not that I hate it from a product or technology perspective; the problem is the business model. If you're building for consumers (ToC), a truly consumer-facing Agent, being too vertical is problematic.

Think about it: how many times does an average person travel per year? Once, twice, some people don't travel at all. If you're building a travel planning tool, you have to pay extremely high marketing costs to get users to remember your name in those very rare usage scenarios, reminding them through ads and various channels "come find me when you travel." That customer acquisition cost is just too high.

So I believe if you're going broad consumer, you have to be "general-purpose." You need to give users a reason to use you every single day — that's the only way to truly survive and grow. We made this decision quickly, but I consider it absolutely critical.

Another decision helped us go viral early on. Because costs were high at launch and we had the invite code mechanism, many users couldn't immediately experience the product firsthand. But two weeks before launch, we made a crucial decision: adding Session Replay.

Session Replay means you can share your completed tasks, letting others "replay" and fully see how an Agent works. This way, even if many people couldn't experience it hands-on, they could feel the magic of Agent through replay.

Yusen Dai: I remember when we first saw Devin, we all said: they just screen-record everything for people. But Session Replay essentially solves the "screen recording" problem, and also helped people truly see what the Agent form looks like for the first time.

When Manus came out, people's first reaction was skepticism: "This can be built in a weekend." When nobody built it that weekend, the second criticism became: "Just wait for OpenAI to do it, they'll crush you instantly." Later OpenAI did do it — first ChatGPT Agent, then Atlas's browser features.

Q: In the Manus entrepreneurial journey, which critical links did you insist on personally handling rather than fully delegating to the team?

Zhang Tao: I think the most extreme thing about our core team is that we always personally hold onto the most core experiential parts of the product.

With the team we had built from previously working on Monica, we already had over thirty people when making Manus. But on a completely new, revolutionary experience that nobody in the market had defined yet — whether in technology or product design — if you handed any single link completely to team members to execute, it would definitely suffer.

The initial decision to do this came from consensus among core members, but before the product was built, this consensus existed only in the minds of a few of us. If more people were brought in to execute without sufficient context and intuition, it would easily go off track. So from writing the first line of code through the first forty days of development, the project had only five people total, including me, who doesn't code and only handles product.

The benefit was extreme alignment and incredibly efficient communication. Whether it was how to write each prompt, the overall technical framework for the virtual machine, or how to polish product interaction details — it all stayed within these five people, extremely smooth.

Q: At the infrastructure layer, should you build the full technology stack yourself, or partner with companies like E2B? What are the main ways E2B helps with general-purpose Agents?

Zhang Tao: Manus's underlying infrastructure is indeed built on E2B. E2B completed their last funding round two months ago, and we're their main success story. After they raised, they even ran a round of ads for us on the streets of San Francisco.

Why use them? It's actually very simple: companies at different stages have different resources available.

If you're Alibaba or ByteDance, I believe you'd absolutely never use E2B, because you have sufficient engineering resources to build from scratch. But for startups it's completely different. When we built Manus, the company had only 40 people, and fewer than four were working on E2B integration.

You have to ask yourself: do you want your engineers focused on Manus's core backend, or do you pull two or three people away to work on K8s scheduling and containerization infrastructure they aren't even familiar with?

If you build it yourself, neither path is ideal. A team unfamiliar with the domain won't deliver world-class VM performance. Hiring two or three experienced people means interviews, onboarding, and ramp-up: at least two to three weeks, and our entire development cycle was only four months. We couldn't afford that.

For startups, time cost is the primary cost.

So the most rational choice is to use proven, battle-tested technology frameworks. That's why in the US startup ecosystem, infrastructure, middleware, and application layers are clearly delineated — companies at each layer buy services from one another. It's a healthy ecosystem. In China, the dominance of tech giants often makes it hard for infrastructure companies to thrive.

But this is only temporary. To adapt to our use case, we're currently using E2B's open-source version, which we've forked into a Manus-specific build. Our modified version and the original have diverged significantly.

In the long run, will we build our own? I think it's entirely possible. It mainly depends on the intensity of our own business needs. They have their roadmap, we have our product rhythm — they won't always align.

But what I want to emphasize is: building infrastructure is incredibly valuable. Every startup ecosystem in the world needs mature infrastructure to support applications built on top. This is a healthy structure.

2025: Learning to Coexist with Agents

Yusen Dai: How do you view the competition between AI applications and model companies?

Zhang Tao: The day ChatGPT Agent launched in July, Red and I happened to be in San Francisco for meetings — our last two days in the US. I can say that the moment OpenAI released Agent, we were probably the happiest people on the planet. We were thrilled.

Why? Because before July, we'd been asked the same question countless times: "What if OpenAI builds what you're building?"

But this question is almost impossible to answer, because everyone asking it already has a presumed answer in their head: "Model companies are naturally stronger; if OpenAI does it, it'll definitely be better." That's the mental model the past two or three years have imprinted on the industry, so you can barely convince anyone otherwise.

But the day ChatGPT Agent launched, we were actually ecstatic — because from that moment on, you no longer had to convince anyone. You could let the results speak.

That same day, we posted a nearly 20-minute side-by-side comparison video online: ChatGPT Agent on the left, Manus on the right, running through every prompt they demoed at the launch event. The results were stark. Not on 99% of tasks, but on 100%, Manus comprehensively outperformed on output quality.

The facts are right there. Later, Scale AI's RLI Benchmark ranked us first and ChatGPT Agent fourth — and that was our old version from June.

Why did this happen? We think there are two core reasons, and they point to the direction we've chosen:

First, model companies can only use their own models — that's both an advantage and a shackle.

It was an advantage two or three years ago. Back then, OpenAI was dominant; it alone had the world's top model. But by 2025, the world looks completely different. Closed-source models are fiercely competitive, each with its own strengths. Open-source models are catching up fast too, especially China's Qwen and DeepSeek, both very strong.

In our internal evaluations, it's already a blooming garden across verticals. Different tasks have different best-performing models; OpenAI no longer monopolizes everything.

In this environment, application builders like us actually have more flexibility. We can pick the most suitable model for each specific task — even down to the "task step" level — rather than using one model for an entire workflow. For example: when searching online, we use Gemini because it can access Google's search index and has the strongest retrieval; when we need maximum reasoning power, we use GPT-5; for backend building or data analysis, we use Claude because it's the most reliable at writing backends and Python.

But from a model vendor's perspective, can you imagine a ChatGPT product manager using Anthropic's model? Obviously not. That's a natural flexibility we have on the application side.
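Editor's illustration: choosing a model per task step, as described above, amounts to a routing table consulted before each step. This is a minimal sketch under stated assumptions, not Manus's actual routing logic: the step kinds, the `Step` type, the default model, and the function names are hypothetical, and the model names are plain labels taken from the examples in the interview rather than real API identifiers.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical routing table built from the examples given in the interview.
ROUTES: Dict[str, str] = {
    "search": "gemini",    # web retrieval
    "reasoning": "gpt-5",  # maximum reasoning power
    "coding": "claude",    # backend building, Python, data analysis
}
DEFAULT_MODEL = "gpt-5"  # assumption: fallback for step kinds not listed above

@dataclass
class Step:
    kind: str    # what sort of work this step is
    prompt: str  # what the model is asked to do

def pick_model(step: Step) -> str:
    """Return the model label for a single step, falling back to the default."""
    return ROUTES.get(step.kind, DEFAULT_MODEL)

def plan_models(steps: List[Step]) -> List[str]:
    """Route each step of one task independently, so one task may use several models."""
    return [pick_model(step) for step in steps]
```

For example, a task split into a search step, a reasoning step, and a coding step would be routed to three different models, which a single-vendor product cannot do.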

Second, Agent competition isn't about the model itself — it's about the entire system.

ChatGPT Agent in July proved one thing: even a powerhouse like OpenAI, competing in the Agent space against application teams, has to start from the same starting line as us.

Because Agent performance isn't just about model capability — it's about how you build the peripheral environment, how you prepare the tool ecosystem. There's simply too much engineering work involved. If you're interested, you can read "Manus Founder Walks Through Systematically Building AI Agent Context Engineering."

Yusen Dai: At the start of the year, you helped open the window to the Agent era for everyone, letting many people see the future for the first time. Standing at the end of 2025, how do you look ahead at Agent's development direction? What will future product forms look like?

Zhang Tao: First, Agents will definitely control more tools in the future. The magic happens because we found ways to let Agents manipulate enough tools.

Going forward, we'll continue expanding the platforms we can call. Right now Manus's VM is Linux, but Microsoft is integrating Manus's capabilities into Windows, enabling Manus to use Windows VMs and call Windows-ecosystem-specific applications. In the future, Manus will also control Android and use mobile ecosystem apps. The more platforms it can reach, the wider the boundary of its general capability.

Second, I've always wanted to build an AI product that can truly accompany users 24/7, continuously reasoning. We're gradually approaching that future. But to get there, you have to solve infrastructure problems first.

GPT-5 might crash or give wrong results after running ten or twenty minutes — but try giving Manus extremely complex tasks: 1,000 research subjects, 1,000 questions to investigate. Manus not only completes them, but runs fast, accurately, without cutting corners or hallucinating. The infrastructure behind us is preparing for a future, possibly next year, where "someone is willing to pay for an agent working 24/7 in the background for them."

Third, agent proactivity. Current agents still require you to assign tasks to trigger them. But if it's serving you 24/7 in the future, it can't depend on you proactively giving tasks every day — otherwise you become the AI's slave.

We already have an internal prototype: you connect Manus to your personal apps — Gmail, Calendar, Notion — and Manus finds things to do on its own within your context, like an intern with initiative. Every morning at 8 AM when I wake up, Manus has already prepared everything I need to do that day, without me even asking.

Yusen Dai: That's a better "cyber beast of burden." Proactive, capable, 24/7 without eating, just costs some money. But as tokens get cheaper and cheaper, you'll spend less and less.

Q: Manus's early growth was all organic traffic. Will you mainly maintain organic growth going forward?

Zhang Tao: I think organic traffic is a trap.

We relied heavily on organic traffic early on. Beyond our March launch and Session Replay going viral, we also randomly blew up in Egypt in May and hit #1 on the Egyptian App Store overall. Because Egypt is Arabic-speaking, it pulled in Saudi Arabia, UAE, and neighboring countries. Then in July we went viral in Brazil: some Brazilian YouTuber we'd never heard of made a casual video and it suddenly took off.

Zhang Tao (second from right) organizing a Manus offline meetup in Dubai

In the earliest months, there was a lot of outside chatter about how "you guys are great at marketing." My reaction was: fine, then I'll deliberately not do marketing. So it might be hard to imagine — from March to October, a full seven months, Manus spent only fifty or sixty thousand dollars total on marketing.

But we've recently reflected that this may not have been a good thing.

Because if you rely on going viral, on organic traffic, you're always reaching "innovators" and "early adopters." Anyone who's read Crossing the Chasm, that startup bible, knows: you don't need to explain anything to these two groups. They see something new and come try it themselves, come play with it.

But when you truly hit an inflection point and want to break into the mass market in 2026, penetrate a broader user base, you can't keep depending on small-circle buzz. Even if your traffic isn't low, relative to global population it's still a tiny segment — not enough for real market breakthrough.

At this point, you have to proactively communicate product value to the mass market. You need more traditional, more systematic marketing approaches.

So don't fall into that misconception: just because you started with organic traffic doesn't mean that's all you can ever rely on. It's not like that.

When you move from early adopters to the masses, nobody cares whether you were "the first to build a general-purpose agent." They only care about one thing: what can you do for me?

Then you have to think: through what media and channels, in what ways, can you most effectively communicate that to them? The key to marketing is information transmission efficiency, not simply spending money.

Q: For current students, if you view the next three years as the agent window, which capabilities would you prioritize building?

Zhang Tao: As for advice for students, I don't want to get too abstract, because whether you'll definitely start a company or definitely enter the AI field in the future is uncertain. But the one thing I most want to say is:

It's already the last month of 2025.

I bet there are still students here today who haven't actually used a top-tier Agent product yet. If that's you, you absolutely must start using an Agent before 2025 is over, and learn to work alongside it.

Just like learning to drive fifty years ago, or learning to use a computer thirty years ago — if you couldn't use a computer back then, you'd have had a hard time finding a decent job in the decades that followed. Agent is the same: it's a tool that helps us extend our own intelligence. If you haven't started using one yet, just start. That way, thirty or forty years from now when you're telling your grandkids about it, you can say: "I was using Agent from year one, not year two" (laughs).

By Cindy