The Local Gateway to Agent OS

葬AI · October 19, 2025

The young upstart beats the old guard.

"The Young Gun Outdoes the Old Guard"

In mid-September, my young-gun friend Zhong Shiliu told me he was about to pull off something wild — building a product similar to Claude Code.

I genuinely had zero expectations at the time.

Everyone was making Claude Code wrappers, just like everyone was building workflow tools at the start of the year. When Zhong sent me the invite code for Stepfun Desktop Assistant at the end of September, I didn't even bother trying it properly 👋😭👋

Because by now, there are already way too many CC-related tools.

Besides, Stepfun has never been good at consumer (2C) products. Its earlier consumer apps all crashed and burned: Maopaoya shut down, and Lupu is about to. The boss spends his days chasing custom development deals with big companies like OPPO and Qianli Technology. So I really wasn't curious about Stepfun.

Then came National Day holiday. I was playing Red Alert and ran into a real problem.

I'd downloaded hundreds of maps in various versions, and different mods used different file extensions that needed to be unified to .mpr. The folder was a mess — compressed files in multiple formats, extracted folders, everything scattered everywhere.

I gave Stepfun Desktop Assistant a single instruction: organize this folder, unify the extensions, extract files, remove duplicates, archive everything into the main folder.

A task that would've taken me half an hour to do myself — the assistant actually pulled it off.

DM "114514" on our WeChat backend for the Red Alert map pack 🥵

It called the local terminal to write scripts for renaming extensions, invoked the decompression tool for archives, used scripts to organize everything, and moved files for deletion into a separate pending-deletion folder. Took about five or six minutes, and my Red Alert map folder was clean and tidy. I just had to drag the pending-deletion folder to trash and I was done.
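For the curious, here is a rough sketch of the kind of script an assistant could write for this job. It is my own reconstruction, not the script Stepfun actually generated: the source extensions in `MAP_EXTS`, the `_extracted` folder, the `pending_deletion` name, and the helper names are all assumptions for illustration, and it only handles zip and tar archives through Python's standard library.

```python
import hashlib
import shutil
from pathlib import Path

TARGET_EXT = ".mpr"
# Hypothetical source extensions: the article does not say which ones were unified.
MAP_EXTS = {".map", ".yrm", ".mmx", TARGET_EXT}
# Archive suffix -> format name understood by shutil.unpack_archive.
ARCHIVE_FORMATS = {".zip": "zip", ".tar": "tar", ".tgz": "gztar"}


def unique_target(folder: Path, name: str) -> Path:
    """Return a path inside folder that does not clobber an existing file."""
    candidate = folder / name
    counter = 1
    while candidate.exists():
        candidate = folder / f"{Path(name).stem}_{counter}{Path(name).suffix}"
        counter += 1
    return candidate


def organize_maps(root: Path, pending_name: str = "pending_deletion") -> dict:
    """Unpack archives, unify map extensions, and set duplicates aside."""
    pending = root / pending_name
    pending.mkdir(exist_ok=True)
    stats = {"extracted": 0, "kept": 0, "duplicates": 0}

    def candidates():
        # Snapshot the tree first so moving files does not disturb iteration.
        return sorted(p for p in root.rglob("*")
                      if p.is_file() and pending not in p.parents)

    # Step 1: unpack each archive, then move the archive itself to pending.
    for path in candidates():
        fmt = ARCHIVE_FORMATS.get(path.suffix.lower())
        if fmt is None:
            continue
        shutil.unpack_archive(str(path), str(root / "_extracted" / path.stem), fmt)
        shutil.move(str(path), str(unique_target(pending, path.name)))
        stats["extracted"] += 1

    # Step 2: keep one copy of each distinct map in root, renamed to .mpr.
    seen = set()
    for path in candidates():
        if path.suffix.lower() not in MAP_EXTS:
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            # Nothing is deleted here: duplicates only move to the pending folder.
            shutil.move(str(path), str(unique_target(pending, path.name)))
            stats["duplicates"] += 1
            continue
        seen.add(digest)
        if not (path.parent == root and path.suffix == TARGET_EXT):
            shutil.move(str(path), str(unique_target(root, path.stem + TARGET_EXT)))
        stats["kept"] += 1
    return stats
```

Calling `organize_maps(Path("path/to/maps"))` returns a small dict of counts. Like the assistant's run, the sketch never deletes anything itself: duplicates and unpacked archives land in the pending folder so a human makes the final call.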

DM "Red Alert" works too ☺️

That was my first aha moment. I suddenly realized: it can operate the local terminal and do things no cloud AI could ever do.

Another aha moment: fixing my Claude Code environment.

I'd been using CC for writing lately, sometimes switching between Claude and the K2 model. After using K2 for a while during the holiday, I couldn't switch back to Claude no matter what. I'd set new environment variables, but the terminal kept throwing API errors.

I searched on ChatGPT. It gave me a solution, but I'd have to create scripts myself and save them to specific folders — too much hassle.

I tried telling Stepfun Desktop Assistant about the problem. It asked for permission to access my Cursor app data (that's where I run CC), and actually fixed the issue in the terminal.
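I never saw exactly what it changed, so I can't show the real fix. One common cause of this kind of breakage is a stale variable still being exported from a shell startup file, so here is a minimal sketch of how one might look for that. It assumes the relevant variables share an `ANTHROPIC_` prefix and that they live in ordinary shell config files; the function name and the file list are hypothetical, and it only reports variable names, never their values.

```python
import re
from pathlib import Path

# Matches lines such as: export SOME_VAR="value"  (the "export" is optional)
ASSIGN_RE = re.compile(r"^\s*(?:export\s+)?([A-Z_][A-Z0-9_]*)=(.*)$")


def find_overrides(files, prefix="ANTHROPIC_"):
    """List (file, line number, variable name) for matching assignments."""
    hits = []
    for file in files:
        path = Path(file)
        if not path.is_file():
            continue  # Skip config files that do not exist on this machine.
        lines = path.read_text(errors="replace").splitlines()
        for number, line in enumerate(lines, 1):
            match = ASSIGN_RE.match(line)
            if match and match.group(1).startswith(prefix):
                # Record only the name so API keys never end up in the output.
                hits.append((str(path), number, match.group(1)))
    return hits


if __name__ == "__main__":
    home = Path.home()
    for file, number, name in find_overrides([home / ".zshrc", home / ".bashrc"]):
        print(f"{file}:{number} sets {name}")
```

If the same variable shows up in more than one place, the last file the shell loads usually wins, which would explain why setting a new value in the terminal did not stick.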

These were two massive aha moments for me.

Most of my needs are already met by cloud AI tools. ChatGPT (GPT-5) has replaced search for me, Gemini handles lightweight text processing, and Claude Code covers heavy writing tasks.

But all these AI tools share one problem: they can't access the local environment. When Claude Code's environment breaks, I can't get any AI tool to fix it.

Stepfun Desktop Assistant can.

So after the holiday, I caught up with Zhong Shiliu. I was blown away: bro took this product from project kickoff to launch in just six weeks, and it already feels remarkably complete. Where CC relies on terminal interaction, he built a local-environment agent around a floating window, something ordinary users can actually use.

Bro told me a big story: what he wants to build is JARVIS. The floating window format exists because he wants to create a new Agent OS entry point 🥵

JARVIS helps Iron Man complete tasks automatically: you say what you need, and it gets done and delivers the result. JARVIS also gives you personalized interfaces, like the 3D displays in Iron Man's home workshop.

These two points are the core of Agent OS.

The floating window and Workspace correspond to two different usage scenarios.

The floating window is like the heads-up display inside Iron Man's suit when he's flying around. Lightweight, non-intrusive, supplementary information on demand.

Workspace is like those massive screens in Iron Man's home workshop. It personalizes your complete view and provides richer GUI capabilities.

The two exist side by side, serving light and heavy scenarios respectively.

Everyone's telling the AI operating system story, but Zhong is the one who's articulated Agent OS most clearly and actually shipped a working product. The young gun really is ahead of the old guard 👍

But JARVIS is just the vision. Whether this story works hinges on the local environment.

From a trends perspective, the local environment is becoming the new battleground for agents.

The problem with cloud agents: you need to upload files, can't directly manipulate the local environment, and there are security and privacy concerns. Local agents can directly access your files, invoke local tools, and match user operating habits.

The catch is that existing local programming agents mainly target developers — running CC in a terminal obviously isn't suitable for ordinary users.

So the bigger value proposition is bringing local programming agent capabilities to ordinary users.

A recent interesting example is Manus. Manus partnered with Microsoft and can now be invoked directly in the local Windows environment. For instance, creating a website from documents in a local folder takes just one click — Manus builds the site in minutes.

Manus's most important update isn't that it can generate websites (that's a nothingburger) — it's that it can access local Windows files.

Of course, Stepfun Desktop Assistant is still early stage. Everything it can do has alternatives.

Renaming file extensions? Claude Code itself can handle that. Deep Research? I don't believe it can beat ChatGPT. Also, once Stepfun's floating window closes, you can't summon it with a hotkey — you have to reopen the app. And there's no voice input support.

My gut tells me: if I could press a hotkey and voice my request straight into the floating window, letting the AI execute it, that workflow would feel much more complete.

The bigger issue is security and trust. Using this product requires extensive local permissions. When fixing terminal issues, it even needs access to my Cursor application data.

I don't fully trust Stepfun, and that's a major concern for me. That said, Zhong Shiliu says security will be addressed through an on-device/cloud hybrid approach, with on-device models handling sensitive information, though that's planned for future versions.

Even so, Stepfun Desktop Assistant's value proposition is clear. It brings programming agent capabilities to the local environment and makes the complex terminal remarkably simple to use. It's using the old-school floating window interaction to attempt a new entry point.

The direction is right. Because the local environment is where the real value lies.

I wrote an article before about the agent marketplace Mulerun.

Mulerun's core thesis is that programming agents like Claude Code will become the engine for a new kind of entry point.

Users state their needs in a search box, and the programming agent generates disposable software to meet those needs. No more downloading fixed apps from the App Store — software is generated on the spot based on your requirements.

Bro sees the same trend. The change brought by programming agents is new entry points and enhanced programming capabilities. They can generate disposable software, bypassing traditional service providers to directly satisfy user needs.

Don't want YouTube Shorts? Generate a version without them. Want to compare prices across Taobao, JD.com, and Pinduoduo? Generate a three-in-one price comparison tool. No waiting for service providers, no downloading fixed apps from the App Store. You have a need, the agent generates software on the spot.

This is the shift programming agents bring: no longer do service providers dictate product form — user needs drive software generation.

Bro's entry point is the local environment. Mulerun's is the cloud. But fundamentally, everyone sees the same opportunity: programming agents will become the engine for the next generation of entry points.

From tool to entry point to OS — that's a clear evolutionary path.

Right now, Stepfun Desktop Assistant is a tool. In the future, the floating window becomes a new interaction entry point. The endgame is Agent OS, something like JARVIS.

Agent OS could run on entirely new device hardware: phones, PCs, cars, even Plaud, Fuzai, Looki.

I still haven't successfully talked to Fuzai to this day 😅

On PC, a standalone app can already obtain the permissions and user context it needs without waiting for new device hardware to mature, which allows faster iteration while leaving room to expand to other devices later. Productivity is also where users currently feel the most pain, so starting here makes it easier to build a moat.

This is an absolutely massive story. Everyone's telling the same story, but Zhong not only made it internally coherent, he actually shipped a working product. The execution is just 🥵

Everyone sees this trend. PC makers, phone makers, car makers — all trying to embed agent capabilities into local environments. Whoever establishes this entry point gets to define the next-generation operating system.

The cloud war has already gone scorched-earth, but local agent products for ordinary users are just getting started.

Bridging cloud and local — that's the new value Agent OS can deliver.

(Images in this article generated by ChatGPT, writing assisted by Claude Code)