Long-Horizon Autonomous Agents: The Birth of a New Species

Monolith (砺思资本) · March 5, 2026

Generals are scarce; soldiers come free.

On February 5, we published our experience using OpenClaw, calling it "the Cambrian explosion of Agents." In just a few short weeks around the Spring Festival, the pace of new developments exceeded anything we could have imagined — enough to make us pause and put together a more systematic reflection.

Contents:

I. 72 Hours, 60,000 Stars

II. From 30 Seconds to 30 Minutes

III. A Satoshi Moment?

IV. Whose Profits Will Evaporate?

V. After the Barriers Hit Zero

I. 72 Hours, 60,000 Stars

Over the past month, our hands-on experience has convinced us that OpenClaw is not a chatbot, not a code completion tool, and not any AI product form we've grown accustomed to over the past three years.

It is a fully authorized digital agent: it reads your emails, manages your calendar, executes code in the terminal, and communicates on your behalf in Slack and Discord. It doesn't assist you. It acts for you.

In February, OpenClaw gained 60,000 stars on GitHub within 72 hours. It accounted for 13% of all token traffic on the OpenRouter platform. Moonshot AI's Kimi K2.5, released around the same time, was the most heavily used model.

This is not merely a landmark event. It is the signal of a new era. It marks AI's transition from "conversationalist" to "executor" — from answering questions to completing work. The technical paradigm carrying this transformation is called the Long-Horizon Agent.

II. From 30 Seconds to 30 Minutes

In less than three and a half years — from ChatGPT's launch to today — AI has evolved from merely answering questions to executing long-duration tasks across systems, gradually taking on the form of a "digital employee."

METR has been tracking AI's ability to complete extended tasks. Their data shows that the length of tasks AI can complete autonomously, measured by how long the same task takes a human, has been doubling roughly every seven months. At this rate, by 2028, AI should reliably handle work that currently requires a full day from a human expert.
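The doubling claim can be made concrete with a little arithmetic. The sketch below assumes a clean exponential with the seven-month doubling period cited above; the function names and the starting values are illustrative assumptions, not METR's own figures.

```python
import math

DOUBLING_MONTHS = 7.0  # doubling period cited in the text; treated as exact here


def horizon_after(start_minutes: float, months: float) -> float:
    """Task horizon after `months`, assuming steady exponential doubling."""
    return start_minutes * 2 ** (months / DOUBLING_MONTHS)


def months_to_reach(start_minutes: float, target_minutes: float) -> float:
    """Months needed to grow from one horizon to another at the same rate."""
    return DOUBLING_MONTHS * math.log2(target_minutes / start_minutes)


if __name__ == "__main__":
    # 30 seconds -> 30 minutes is a 60x increase, i.e. about 5.9 doublings.
    print(round(months_to_reach(0.5, 30.0), 1))  # 41.3
    # Two doubling periods (14 months) quadruple the horizon.
    print(horizon_after(30.0, 14.0))  # 120.0
```

Under these assumptions, a 60x increase takes about 41 months, and every further 14 months multiplies the horizon by four. The 30-second and 30-minute endpoints are borrowed from this section's title for illustration only; they are not METR measurements.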

Recently, Andrej Karpathy shared on Twitter his experience building a home security camera service — he described a task to an AI agent entirely in natural language: log into a local server, configure SSH keys, download a vision model, build a video analytics dashboard for home security cameras, set up system services, and write a markdown report.

The agent ran for about 30 minutes. It encountered multiple errors along the way, searched online for solutions, fixed them one by one, wrote code, tested, deployed, and returned a complete report. Karpathy himself didn't touch a thing.

He remarked: "Three months ago this would have been a weekend project. Today you launch it, go make coffee, and come back 30 minutes later to find it done."

Beneath his tweet, an even more interesting comment appeared: "You just replicated a startup that raised $17 million in 30 minutes."

[Image: comment under Karpathy's tweet]

Clicking through to the project's website confirmed it — the functionality was remarkably similar to what Karpathy had built.

[Image: the project's website]

Three months ago it was a weekend. Today it's 30 minutes. What will it be a year from now?

III. A Satoshi Moment?

In this evolution, one easily overlooked fact stands out: making long-horizon agents actually work requires not just better models, but better systems engineering.

Heartbeat, progress files, checkpoints, context compaction: none of these are products of cutting-edge AI research. They're decades-old techniques from distributed systems and software engineering. But assembled in the right way and wrapped around an LLM, they turn an agent from something that "runs for 30 seconds" into something that "runs for 30 minutes" or longer.
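To show how ordinary these techniques are, here is a minimal sketch of two of them: context compaction and a heartbeat staleness check. Everything in it is a hypothetical illustration (the `Context` class, the character budget, the truncation used in place of a model-written summary); it is not OpenClaw's code or any library's API.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Rolling message history with a crude character budget (hypothetical)."""
    budget_chars: int
    messages: list = field(default_factory=list)

    def size(self) -> int:
        return sum(len(m) for m in self.messages)

    def add(self, message: str) -> None:
        self.messages.append(message)
        if self.size() > self.budget_chars:
            self.compact()

    def compact(self, keep_recent: int = 2) -> None:
        """Fold everything except the newest messages into one summary line."""
        old = self.messages[:-keep_recent]
        recent = self.messages[-keep_recent:]
        if not old:
            return  # nothing older than the recent window to fold away
        # A real harness would ask the model to summarize; truncating each
        # old message to 20 characters is only a stand-in for that step.
        summary = "SUMMARY: " + " | ".join(m[:20] for m in old)
        self.messages = [summary] + recent


def is_stalled(last_beat: float, now: float, timeout_s: float) -> bool:
    """Supervisor-side check: no heartbeat within the timeout means stuck."""
    return now - last_beat > timeout_s


if __name__ == "__main__":
    ctx = Context(budget_chars=60)
    for step in range(5):
        ctx.add(f"step {step}: ran a tool and got some output")
    print(len(ctx.messages))  # 3: one summary plus the two newest messages
    print(is_stalled(last_beat=100.0, now=131.0, timeout_s=30.0))  # True
```

The design point is that the history stays bounded no matter how many steps the agent takes, while a separate process can notice a silent agent and restart it.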

This brings to mind Satoshi and Bitcoin. None of Bitcoin's components (hash functions, P2P networks, asymmetric cryptography) was new in 2008. But Satoshi found a way to combine them into something entirely new.

OpenClaw's founder, Peter, didn't invent new technology either. The system's core consists of five components with distinct roles: Gateway, Brain, Memory, Skills, and Heartbeat. None of them is particularly complex on its own. The Memory layer is literally just Markdown files, appended to daily, with the previous two days' logs loaded at the start of each session.

[Image: OpenClaw's architecture design]
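The Memory layer described above is simple enough to sketch in a few lines. The version below is an assumption-laden illustration, not OpenClaw's implementation: the one-file-per-day naming, the bullet format, and the reading of "previous two days" as the two days before today are all our own choices.

```python
from datetime import date, timedelta
from pathlib import Path


def log_path(memory_dir: Path, day: date) -> Path:
    """One Markdown file per day, e.g. 2026-03-05.md (naming is assumed)."""
    return memory_dir / f"{day.isoformat()}.md"


def append_memory(memory_dir: Path, day: date, note: str) -> None:
    """Append one bullet to that day's log, creating the file if needed."""
    memory_dir.mkdir(parents=True, exist_ok=True)
    with log_path(memory_dir, day).open("a", encoding="utf-8") as f:
        f.write(f"- {note}\n")


def load_recent(memory_dir: Path, today: date, days_back: int = 2) -> str:
    """Concatenate the logs of the `days_back` days before `today`, oldest first."""
    chunks = []
    for offset in range(days_back, 0, -1):
        day = today - timedelta(days=offset)
        path = log_path(memory_dir, day)
        if path.exists():
            body = path.read_text(encoding="utf-8")
            chunks.append(f"# {day.isoformat()}\n{body}")
    return "\n".join(chunks)


if __name__ == "__main__":
    memory = Path("memory_demo")  # hypothetical directory name
    append_memory(memory, date(2026, 3, 4), "user prefers email over Slack")
    # Text a session starting on March 5 would load into its prompt.
    print(load_recent(memory, today=date(2026, 3, 5)))
```

Nothing here requires a database or embeddings: the files are human-readable, and the text returned by `load_recent` is simply prepended to the session's prompt.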

After the project exploded in popularity, many engineers' first reaction was: "I could build this too."

But that's how new species are born. The parts have been sitting there all along, visible to everyone — but only one person found the way to assemble them: to make an agent that wakes itself up, remembers you across sessions, and acts for you through the messaging apps you already use. That assembly itself is the invention.

The model's harness has become critically important. Anthropic's engineering team recently admitted in a blog post that even though Claude has context management capabilities that theoretically allow agents to work indefinitely, the model alone is far from sufficient. They found that agents try to do too much at once, exhausting context, or declare tasks complete after seeing partial progress. The solution isn't waiting for a stronger model — it's designing a better harness: using an initializer agent for environment setup, a coding agent for incremental progress, and progress files for cross-session state handoff.
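The handoff pattern in that description can be sketched as follows. This is a hypothetical illustration of the idea, not Anthropic's harness: the JSON layout, the function names, and the `do_task` callback (standing in for an actual agent run) are all assumptions.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def initialize(progress_file: Path, tasks: list) -> None:
    """Initializer session: write the full task list once, all pending."""
    state = {"tasks": [{"name": t, "done": False} for t in tasks], "notes": []}
    progress_file.write_text(json.dumps(state, indent=2), encoding="utf-8")


def run_session(progress_file: Path, do_task: Callable[[str], str]) -> Optional[str]:
    """Working session: take ONE pending task, do it, record it, then stop.

    Returns the task name, or None when every task is already marked done.
    """
    state = json.loads(progress_file.read_text(encoding="utf-8"))
    pending = next((t for t in state["tasks"] if not t["done"]), None)
    if pending is None:
        return None  # completion is decided by the file, not by the model
    note = do_task(pending["name"])  # stand-in for a real agent run
    pending["done"] = True
    state["notes"].append(note)
    progress_file.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return pending["name"]


if __name__ == "__main__":
    progress = Path("progress_demo.json")  # hypothetical file name
    initialize(progress, ["set up repo", "add login page"])
    # Each call models a fresh session with no memory of the previous one.
    while (finished := run_session(progress, lambda n: f"finished {n}")) is not None:
        print(finished)
```

Limiting each session to one task addresses the "too much at once" failure, and reading completion from the file rather than from the model's own judgment addresses the premature "done" declaration.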

Consider this premise: if progress in model capability suddenly halted and models stayed frozen at their current level, purely engineering-level combinatorial innovation could still unlock enormous new capabilities.

If you accept that, it implies:

First, the application-layer startup window is larger and more urgent than most people think. You don't need to wait for GPT-6.

Second, the moat from this combinatorial innovation is invisible. Unlike model parameter counts that can be benchmarked and compared publicly, a good harness's advantage hides in hundreds of engineering details. These can only be honed in real-world scenarios and are difficult to replicate quickly.

IV. Whose Profits Will Evaporate?

A radically new future awaits us.

Everyone can sense that modern commerce contains vast profit margins built on friction premiums. Auto-renewing subscriptions, bundled packages, insurance default renewals, intermediary commissions...

Long-horizon autonomous agents are systematically dismantling this premise. Soon, an indefatigable agent could work around the clock comparing prices for you, switching suppliers at optimal moments, canceling subscriptions you no longer use, and saving you substantial sums without your even noticing. When every consumer has an always-on optimization engine behind them, every company profiting from the old model faces structural pressure.

The future may need far fewer apps than we have now. Aggregation platforms, search and news-feed advertising, subscription SaaS: many business models we take for granted will be reshaped.

Excitingly, when agents can complete work like humans do, a natural question arises: will a "hire an agent" marketplace emerge?

This marketplace likely won't be a unified "agent store," but rather will fragment across vertical domains: an agent platform specialized in insurance claims, one for legal due diligence, one for financial reconciliation. Agents on each platform will be trained and calibrated on that domain's real-world data, with accuracy and reliability far exceeding general-purpose models.

Whoever first accumulates sufficiently deep execution trace data in a high-value vertical will possess an unreplicable moat. Models can be swapped out, prompts can be copied, but the corner cases and optimal practice paths honed through tens of thousands of real business executions — these are private, proprietary, and deepen with use.

V. After the Barriers Hit Zero

For entrepreneurs, this is simultaneously the best news and the most brutal news.

The best news: the capital barrier to starting up is plummeting. Previously you needed to raise funding first to hire a team — engineers, designers, marketers — before you could build an MVP. Now you might build one with a few hundred dollars in API costs and a weekend.

The brutal news: if you can do this, everyone can. When entry barriers approach zero, competitive density explodes. A thousand people could ship functionally similar products in the same week.

Differentiation no longer comes from "can you build it," but from how deeply you understand the problem.

Time windows are compressing too. In the long-horizon agent era, the cycle from idea to product shrinks from months to days. First-mover advantage becomes more important, but also more fleeting.

And then there's the deeper challenge to societal structure.

In the past, an average white-collar worker might make 300 decisions daily. But 90% of these were pseudo-decisions: reply to this email now or later, send meeting notes in the group chat or directly to the boss — these choices consumed real time and attention while producing virtually no real value, yet fabricated our sense of busyness and professional identity.

A long-horizon autonomous agent is essentially a decision compressor. It compresses 300 mediocre execution items into 3 core judgment calls. When those 297 attention-draining trivialities vanish, a naked question surfaces: when you're no longer busy running, can you still tell which direction to go?

Andrej Karpathy calls this surviving capability Taste. In an era where execution is nearly free, generals become scarce while soldiers become free — deep domain understanding, intuitive sense of what constitutes good versus bad, and the decisiveness to rein in an agent when it veers off course become the only assets amplified by leverage. Mediocre execution will be zeroed out; profound cognition will be multiplied tenfold.

And this screening won't send advance notice, nor will it wait for everyone to be ready.

Closing Note

As a technology investment firm, we write this not to predict the future, but to document the starting point of a structural shift we are witnessing firsthand.

In times of rapid change, no one can be certain who the ultimate winners will be. In the early days of the mobile internet, we might have derived DiDi and Meituan from location-based service (LBS) patterns, but no one could have foreseen TikTok. That was the genius creation of entrepreneurs.

For long-horizon autonomous agents, the changes we can already see are staggering, but new species aren't predicted; they're created. The portion we can extrapolate is clearly just the tip of the iceberg.

We look forward to you — perhaps reading this right now — being among those who create the future.

We welcome all entrepreneurs in this space to connect with MONOLITH anytime, and to join the next MonoX offline event: "Participating in New Species — Long-Horizon Autonomous Agents."