The First-Order Problem for Agents: Proactivity and the Economics of Context

5Y Capital · April 29, 2026

Human attention has never been more scarce.

Over the past year, the capability ceiling of AI agents has been pushed steadily higher — context windows keep expanding, tool calling grows more sophisticated, and multi-agent collaboration is becoming the norm. Yet the user experience of working with agents hasn't improved in lockstep.

A fact that most players have overlooked: while a model's context window grows, the human context window — our attention span — hasn't changed at all.

When a user has three Claude Code sessions running, five browser tabs open, and a dozen IM groups active simultaneously, intent begins to fall through the cracks at scale, and tasks start fragmenting. No matter how intelligent a passive agent is, it can't fill this gap — because the problem isn't on the model side; it's on the human receiving end.

It's along this fissure that the field has recently begun to diverge:

Passive agents are approaching a ceiling — ChatGPT, Claude, Gemini, and other "wait-for-you-to-speak" agents are growing increasingly homogeneous in capability.
Coding agents were the first sub-track to break out with a distinct approach — Claude Code and Cursor are crossing the boundary of passive response by "actively reading code, actively suggesting, actively executing."
General-purpose proactive agents are still searching for scenarios with clear user intent.

The proactive desktop agent is the direction that grew from this line of thinking — it doesn't aim to make models smarter, but to reinvent how models and humans collaborate.

AirJelly is one concrete answer in this space. It recently closed an investment from 5Y Capital and has entered internal beta. This episode of the 5Y Tavern podcast is an interview with AirJelly founder Bote Huang.

Guests
Bote Huang | Founder, AirJelly
Yuke Yang | Investor, 5Y Capital

Host
Qianqian | Head of Brand, 5Y Capital

01 How to Input, So Agents Get Maximum Signal-to-Noise?

Qianqian: What problem does the desktop agent aim to solve?

Bote Huang: In this era, agent capabilities keep rising and model context windows keep expanding, but constrained by physiological limits, the human context window remains finite. In daily use, we constantly lose intent information from IM, AI, and various agents.

As AI capabilities grow stronger, more people are beginning to use multiple agents simultaneously for multithreaded work — the brain can no longer bear this load. So we wanted to build a proactive agent — to ensure no human intent is missed and every task reaches closure. AirJelly is that product.

Yuke Yang: One type of founder I've always wanted to find is the first person to coin a concept. The essence behind this is innovative ability and the ability to define problems. Many founders in China have strong execution, but they're all working on crowded tracks.

The proactive agent direction started seeing entrants around July last year, but Bote was bold. In October, while still at ByteDance, he created the open-source project MineContext — the first to use screenshots for agent context acquisition, and he was open and honest with users about doing it this way, actively filtering for early adopters.

Later, when he explained "Next-Enter prediction" to me, I found the concept novel and sensible: every time the user presses Enter, it confirms their intent, which gives the AI a more precise intent signal as input.

The product design itself was also thoughtfully crafted. One feature that impressed me — AirJelly can rewind your day like a film, showing you everything you did. Additionally, it proactively alerts me before I forget about an upcoming Tencent meeting: what time, with whom, their background. These touches are all quite remarkable.

Qianqian: AirJelly treats "the user pressing Enter" as a significant information node. What's the thinking behind this design?

Bote Huang: From the earliest MineContext to the first version of AirJelly, we were doing full-screen captures to help agents better understand user behavior. But full-screen captures contain massive noise; the signal isn't prominent. Later we switched to using Enter — this is a form of focus.

At a deeper level, the Enter key is one of the few "intent-closing" actions in human behavior.

Most of a person's time in front of a screen is spent wandering — scrolling, hovering, switching windows. These signals have extremely low signal-to-noise ratio because they're exploration, not decision. But when someone types a sentence in an input box and presses Enter, they've already completed editing, deleting, and confirming in their head: before Enter is private thought flow; after Enter is a public commitment.

So full-screen capture records "this person is present," while Enter records "this person is making a choice" — the latter is the minimum effective unit that constitutes an intent trajectory.
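
The contrast between exploratory signals and Enter-confirmed intent can be sketched in a few lines of Python. This is only an illustration of the idea as described here, not AirJelly's implementation: the `UIEvent` type, the event kinds, and `extract_intents` are hypothetical names assumed for the example.

```python
from dataclasses import dataclass


@dataclass
class UIEvent:
    """Hypothetical desktop event; not AirJelly's actual schema."""
    kind: str        # e.g. "scroll", "hover", "switch_window", "key"
    app: str         # application that had focus
    value: str = ""  # the key pressed, for "key" events


def extract_intents(events):
    """Keep only the text a user committed by pressing Enter.

    Exploratory events (scroll, hover, window switches) are dropped as noise.
    Typed characters are buffered per app and emitted only when Enter arrives.
    """
    buffers = {}   # app -> list of characters typed so far
    intents = []   # (app, committed text) pairs
    for ev in events:
        if ev.kind != "key":
            continue  # exploration, not decision
        if ev.value == "Enter":
            text = "".join(buffers.pop(ev.app, [])).strip()
            if text:
                intents.append((ev.app, text))
        elif ev.value == "Backspace":
            if buffers.get(ev.app):
                buffers[ev.app].pop()
        elif len(ev.value) == 1:
            buffers.setdefault(ev.app, []).append(ev.value)
    return intents


events = [
    UIEvent("scroll", "browser"),
    UIEvent("hover", "browser"),
    *[UIEvent("key", "chat", ch) for ch in "cancel subx"],
    UIEvent("key", "chat", "Backspace"),
    UIEvent("switch_window", "browser"),
    *[UIEvent("key", "chat", ch) for ch in "scription"],
    UIEvent("key", "chat", "Enter"),
]
print(extract_intents(events))  # [('chat', 'cancel subscription')]
```

The edits and the window switch leave no trace in the output; only the sentence that survived to the Enter keypress is recorded.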

Qianqian: What's the difference between meeting recording products and desktop agents?

Bote Huang: First, audio meeting products essentially serve alignment between people, but they miss the context between humans and AI. In the long term, over 90% of context will occur between humans and AI.

Second, our core is recording intent. Audio products do full recording plus summarization — but what most people actually care about are the few action items and recommendations after summarization. We capture intent one step earlier, so our information density is higher.

Qianqian: What's the current user profile for AirJelly?

Bote Huang: Roughly two categories.

The first is people with ADHD (attention deficit hyperactivity disorder). They're naturally multithreaded and prone to attention drift, which aligns closely with our philosophy of "intent collection + proactive closure."

The second is heavy AI users and OPCs (one-person companies). They have high communication volume on IM and many follow-up items, and they are fervent adopters of cutting-edge AI tools and multithreaded work. We've also received feedback from researchers, who need to do research, run experiments, and write content, with all of these tasks running in parallel.

Qianqian: Do you see ADHD as something that needs correction?

Bote Huang: Not at all. The conventional approach to ADHD assistive tools treats ADHD as an illness, trying to correct it back to single-threadedness. But my understanding is: ADHD's multithreaded trait is largely advantageous in the AI era — it maximizes multithreaded work efficiency. Its downside is: tasks are easily forgotten, and after switching tasks, context needs to be reconstructed.

Our approach: on one hand, we help you remember all tasks; on the other, we mark each task's current progress and next action. So even if you're jumping around frequently, you can steadily catch every task. This is a more sophisticated approach — assist, don't correct.
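
A minimal sketch of what "remember every task, plus its current progress and next action" could look like as a data structure. The `Task` and `TaskBoard` names and their fields are assumptions made for illustration; the interview does not describe AirJelly's actual storage model.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical task record: what was intended, where it stands, what comes next."""
    title: str
    progress: str = "not started"
    next_action: str = ""
    done: bool = False


@dataclass
class TaskBoard:
    tasks: dict = field(default_factory=dict)  # title -> Task

    def capture(self, title, next_action=""):
        # Register an intent once; a repeated capture does not overwrite progress.
        self.tasks.setdefault(title, Task(title, next_action=next_action))

    def checkpoint(self, title, progress, next_action):
        # Record where the user left off before switching to something else.
        task = self.tasks[title]
        task.progress, task.next_action = progress, next_action

    def close(self, title):
        self.tasks[title].done = True

    def open_loops(self):
        # Everything not yet closed, with enough context to resume directly.
        return [
            f"{t.title}: {t.progress} -> next: {t.next_action}"
            for t in self.tasks.values()
            if not t.done
        ]


board = TaskBoard()
board.capture("Reply to vendor", next_action="draft email")
board.capture("Run experiment B", next_action="prepare config")
board.checkpoint("Run experiment B", "config written", "launch job")
board.close("Reply to vendor")
print(board.open_loops())  # ['Run experiment B: config written -> next: launch job']
```

Storing the next action alongside the progress note is what removes the context-reconstruction cost after a task switch: the user resumes from the checkpoint rather than from memory.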

Qianqian: What's the most common aha moment from user feedback?

Bote Huang: I recently heard a stunning case. A friend found he had been charged for a subscription he forgot to cancel, and he complained about it in a WeChat group. AirJelly captured this signal, automatically invoked Browser Use, and proactively drafted a cancellation email that was just waiting for him to send. When he discovered this, he said: "This is absolutely shocking."

02 Proactive vs. Context: Which Matters More?

Qianqian: For a better agent experience, which matters more: Proactive or Context?

Bote Huang: At this stage, Proactive is more important, precisely because context is always important. Context is the constant; proactivity is what defines this moment.

Agent entrepreneurship — especially at the productivity application layer — has a distinction between paradigm opportunities and long-term opportunities. Proactive agents, general-purpose agents (like Manus) — these are paradigm opportunities: you must make noise in the short term, get the market to remember you in this window. The value of context is long-term; it's a layer we continuously accumulate, not the core point for external communication.

Recently, Sequoia Capital published an article about an overseas company called Block, exploring new organizational management approaches. The root of this issue is: for two thousand years, humans have relied on hierarchy. The macro manifestation of hierarchy is "a person can only manage 3–5 people at once," so structures must be stacked layer by layer; the micro manifestation is "a person can only manage 3–5 agents at once," hence the need for more tools.

Companies move fast or slow based on information flow. Hierarchy and middle management impede information flow. For two thousand years, from the Roman contubernium to today's global enterprises, we have had no real alternative. Eight soldiers sharing a tent needed a decanus. Eighty men needed a centurion. Five thousand needed a legate. The question was never whether you needed layers. The question was whether humans were the only option for what those layers do. They aren't anymore.

What we want to do is isomorphic at both macro and micro levels — both address the problem of "limited human attention windows."

In the future, we'll launch a Team version of AirJelly. Within teams, there will no longer be reliance on "alignment meetings" as batch information processing — you can check a colleague's AirJelly at any time to see what they're working on, allowing context to flow more fully. Based on this, human organizational models may see transformative opportunities. This is what excites us most in the long term.

Qianqian: What's the boundary of human-AI collaboration?

Bote Huang: Human value lies more in judgment at critical nodes. To use an analogy — you're a founder, with an engineering lead and GPM lead below you; they produce specific proposals for you to review and decide. The human role in the future will be similar. AI is growing smarter; much code is already written directly by AI, with no one reviewing line by line. When AI has more comprehensive context, it's more likely to become both task initiator and executor. But the decision-making step should ultimately remain with humans.

Qianqian: For the team version of AirJelly, does the agent belong to the employee or the company?

Bote Huang: It should belong to the employee. The ideal form is: everyone collaborates in "projects" or "groups," each bringing their own agent in, selectively sharing context or tasks. This protects privacy while maximizing user adoption.

If the agent belongs at the company level, it becomes a "boss monitoring employees" tool — employees will find ways to avoid using it, and effectiveness will actually suffer.
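
The ownership model described above, where each person brings their own agent into a project and shares context selectively, can be sketched as follows. `PersonalAgent` and its methods are hypothetical names for illustration only; the Team version is described here as a future product, so nothing below reflects a real API.

```python
from dataclasses import dataclass, field


@dataclass
class PersonalAgent:
    """Hypothetical employee-owned agent: context is private unless explicitly shared."""
    owner: str
    context: dict = field(default_factory=dict)  # item name -> content
    shared: dict = field(default_factory=dict)   # project -> set of item names

    def remember(self, name, content):
        self.context[name] = content

    def share(self, project, name):
        # Sharing is opt-in and per item; only the owner makes this call.
        if name not in self.context:
            raise KeyError(name)
        self.shared.setdefault(project, set()).add(name)

    def view_for(self, project):
        # What teammates in a project can see: only items shared with that project.
        names = self.shared.get(project, set())
        return {n: self.context[n] for n in sorted(names)}


agent = PersonalAgent("alice")
agent.remember("launch-plan", "draft v2")
agent.remember("personal-notes", "private")
agent.share("team-project", "launch-plan")
print(agent.view_for("team-project"))   # {'launch-plan': 'draft v2'}
print(agent.view_for("other-project"))  # {}
```

Because the default view is empty, a project or a manager sees nothing the owner has not chosen to share, which is the property that separates this design from a monitoring tool.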

03 Napoleon, Hegel, and the Push of World Spirit

Qianqian: You've said that starting a company came from a sense of "being pushed from behind." How do you understand this?

Bote Huang: "Being pushed from behind" is a colloquial expression. I usually enjoy reading history and philosophy — Hegel has a more academic term for this: "world spirit."

Take Napoleon as the example. He started out as an ordinary man from Corsica, but the French Revolution propelled him to high position at a young age. In 1806 he entered Jena, where Hegel happened to be living at the time. By the conventional account he arrived as a conqueror, but Hegel believed it was the "world spirit" pushing Napoleon forward, carrying progressive bourgeois ideas wherever he went.

In great eras of transformation, someone needs to do the pushing. I chose AI back in 2021, before ChatGPT exploded. I originally planned to do research and had received a PhD offer from Fudan, but through a twist of fate, couldn't enroll.

My family is from Wenzhou, and they're in business. My father's generation caught the wave of reform and opening-up; 30 years later, the AI era arrived. When I graduated in 2025 I was just a campus hire, but the business intuition, the street smarts, the Building in Public experience, and the accumulated AI research and open-source product work were all there. It all felt like fate calling strongly and pushing from behind, so I chose to start a company.

Qianqian: Yuke, what struck you most about Bote?

Yuke Yang: I think there are two things. When Bote met our IC (investment committee) last year, he was 23, yet he was remarkably composed and confident in the room, without a trace of nervousness. Near the end of the meeting, he said he hoped the committee members would ask more questions, the opposite of founders who hope for fewer. I think this confidence comes from his deep familiarity with his own product and his abundant firsthand observation of users. He treated the IC meeting not as an exam but as a mirror, a way to iterate on his product.

The other thing: afterward, when I gave frequent feedback while using AirJelly, the issues I raised were usually fixed within the same day. That execution speed is truly remarkable.

Qianqian: Bote, have you figured out the mission in your heart?

Bote Huang: One overarching mission is: after the arrival of the AI era, human-agent collaboration and human-human collaboration will both be dramatically reshaped. We're on the front lines of productivity products — both for individuals and for future organizations. As long as the product grows large enough and influential enough, the landscape of human-to-human and human-to-silicon-life relationships will be affected. We may not be the decisive role, but we hope through our efforts to help future silicon-based civilization and carbon-based civilization form a better configuration.

5Y Capital seeks out, supports, and inspires lonely entrepreneurs, providing support from spirit to all business operations. We believe that if the crazy you in others' eyes begins to be believed, the world will become a different place.