Yunqi Capital AI Review | The Gap in AI Implementation We Found in OpenClaw's "Pitfalls"

Yunqi Capital · February 12, 2026

Full Throttle · Yunqi Capital New Year Collection, Vol. 03

Lately, OpenClaw has been the talk of the AI world — everyone's trying it, everyone's discussing it.

As a tech VC deeply rooted in AI, we have a saying: "Don't just invest — use." So when OpenClaw emerged as a tool that seemed tailor-made for "automated workflows," we obviously had to give it a spin.

For the third "unboxing" of our "Full Throttle · Yunqi New Year Collection," we're sharing our (slightly painful) hands-on experience from the OpenClaw front lines. In the new year, we'll continue sharing our AI tool observations through "Yunqi Reviews."

Phase One: A Bumpy Full Day with OpenClaw

Our first test scenario: automated vertical news delivery for frontier tech. In other words, getting OpenClaw to run news collection + categorization + Lark push notifications.

The workflow sounds straightforward: pull global updates from a database → precisely categorize across nine dimensions like algorithms, hardware, and funding → format into clean Lark cards.
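To make the first two stages concrete, here is a minimal Python sketch of pulling updates and sorting them into categories. It assumes a SQLite table named `news` with `title`, `url`, and `published_at` columns and uses plain keyword rules. The table, the keywords, and the three categories shown (the article names algorithms, hardware, and funding among its nine) are illustrative assumptions, not our actual schema or classification standard.

```python
import sqlite3

# Hypothetical keyword rules covering three of the nine dimensions mentioned.
# Real classification standards would be richer than substring matching.
CATEGORY_RULES = {
    "algorithms": ("model", "training", "inference", "benchmark"),
    "hardware": ("chip", "gpu", "accelerator", "wafer"),
    "funding": ("raises", "series", "funding", "valuation"),
}


def classify(title: str) -> str:
    """Return the first category whose keywords appear in the title, else 'other'."""
    text = title.lower()
    for category, keywords in CATEGORY_RULES.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "other"


def fetch_updates(conn: sqlite3.Connection, limit: int = 50) -> list[tuple[str, str]]:
    """Pull the newest rows from an assumed `news(title, url, published_at)` table."""
    return conn.execute(
        "SELECT title, url FROM news ORDER BY published_at DESC LIMIT ?", (limit,)
    ).fetchall()


def group_by_category(rows: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Bucket (title, url) rows by category, preserving the fetch order."""
    grouped: dict[str, list[str]] = {}
    for title, url in rows:
        grouped.setdefault(classify(title), []).append(f"{title} ({url})")
    return grouped


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE news (title TEXT, url TEXT, published_at TEXT)")
    conn.executemany(
        "INSERT INTO news VALUES (?, ?, ?)",
        [
            ("Startup raises $50M Series B", "https://example.com/a", "2026-02-10"),
            ("New GPU doubles memory bandwidth", "https://example.com/b", "2026-02-11"),
            ("Open model tops reasoning benchmark", "https://example.com/c", "2026-02-12"),
        ],
    )
    for category, items in group_by_category(fetch_updates(conn)).items():
        print(category, items)
```

Because the rules live in code, the same rows always land in the same buckets, which is exactly the property a prompt-only run struggled to keep.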

But in practice, OpenClaw stumbled when it came to understanding this vertical business logic. Every step required correction: connecting to the database, analyzing the field structure, classifying items by our specified logic, calling models to generate summaries, assembling the Lark card styling, and test-pushing.

After nearly a hundred rounds of prompt adjustments and almost a full day, it finally produced a satisfactory card.

But then we hit a frustrating problem: the work couldn't be effectively preserved.

When we asked it to send the digest again, hoping it would work like a "veteran employee" every day, all the previously configured NLP processing logic and message card styling "reset": either the formatting broke or the categorization drifted.

This made us realize: at this stage, relying purely on prompts to get OpenClaw through a vertical task with custom logic (collection, processing, Lark push), without preserving the working logic anywhere, is highly unstable.

A Different Approach: The Claude "Dream Collaboration"

Since OpenClaw's native workflow is still evolving, we tried a workaround: bringing in Claude Code as support.

For the same frontier tech news delivery task, we threw the complex database logic and classification standards at Claude Code. To our surprise, Claude Code completed the full cycle from coding to testing in just over an hour. Its advantage was clear: it generates code, not an accidentally working sequence of actions. And code is inherently preservable, refactorable, and reusable.
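One practical consequence of having code is that the kind of drift we hit earlier becomes detectable. A minimal sketch, assuming the pipeline emits a JSON-serializable card payload, is to fingerprint the approved output for a fixed input and compare every new run against it. The function names are hypothetical.

```python
import hashlib
import json


def fingerprint(payload: dict) -> str:
    """Hash a canonical JSON form so key order and whitespace don't matter."""
    canonical = json.dumps(payload, sort_keys=True, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def matches_golden(payload: dict, golden_hash: str) -> bool:
    """True if this run's output is canonically identical to the approved one."""
    return fingerprint(payload) == golden_hash


if __name__ == "__main__":
    approved = {"category": "funding", "items": ["Startup raises $50M Series B"]}
    golden = fingerprint(approved)  # store this once the card is signed off
    todays_run = {"items": ["Startup raises $50M Series B"], "category": "funding"}
    print(matches_golden(todays_run, golden))  # True: same content, different key order
```

A prompt-only run leaves no stable artifact to compare against; a script gives you one.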

We then made a key move: packaging this robust code as a Skill and deploying it on the server where OpenClaw runs.

The reversal: when OpenClaw served as the "intelligent pipeline" calling this purpose-built script, it performed naturally and stably. This suggests that for highly customized tasks at this stage, the more pragmatic path may be to codify the logic first, then hand it to OpenClaw for distribution and orchestration.
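As a sketch of what "codify first, then hand it to OpenClaw" can look like, the script below exposes the card-building step as a single command-line entry point an agent could invoke. It assumes the grouped items arrive as a JSON file and that the skill definition simply tells the agent to run something like `python digest_skill.py items.json --webhook URL`; the file name, flags, and that invocation are hypothetical. The card layout follows the interactive-card shape commonly used with Lark custom-bot webhooks and should be checked against Lark's current documentation.

```python
import argparse
import json
import sys
import urllib.request
from typing import Optional


def build_card(grouped: dict[str, list[str]], title: str) -> dict:
    """Build an interactive-card payload for a Lark custom-bot webhook.

    The msg_type/card/header/elements shape is an assumption based on Lark's
    card format; verify field names against the current Lark documentation.
    """
    elements = []
    for category in sorted(grouped):  # fixed order keeps the output reproducible
        lines = "\n".join(f"- {item}" for item in grouped[category])
        elements.append(
            {"tag": "div", "text": {"tag": "lark_md", "content": f"**{category}**\n{lines}"}}
        )
    return {
        "msg_type": "interactive",
        "card": {
            "header": {"title": {"tag": "plain_text", "content": title}},
            "elements": elements,
        },
    }


def post(webhook_url: str, payload: dict) -> int:
    """POST the payload as JSON and return the HTTP status code.

    Lark may report errors in the JSON body even with HTTP 200, so a real
    skill should also inspect the response body.
    """
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def main(argv: Optional[list[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="Hypothetical news-digest skill entry point")
    parser.add_argument("input", help="JSON file mapping category -> list of items")
    parser.add_argument("--title", default="Frontier Tech Digest")
    parser.add_argument("--webhook", help="Lark bot webhook URL; omit for a dry run")
    args = parser.parse_args(argv)
    with open(args.input, encoding="utf-8") as f:
        grouped = json.load(f)
    payload = build_card(grouped, args.title)
    if args.webhook:
        return 0 if post(args.webhook, payload) == 200 else 1
    json.dump(payload, sys.stdout, ensure_ascii=False, indent=2)  # dry run: print only
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Keeping the logic in a script means the agent only decides when to call it and with which arguments; grouping and styling no longer depend on the model remembering them. Omitting `--webhook` gives a dry run that prints the payload for review.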

OpenClaw & Claude Code collaboration @ Yunqi office

We also tried configuring an open-code skill on OpenClaw that explicitly required OpenClaw to build skills by writing code, and that worked smoothly too.

So the key isn't which tool you use, but what mindset you bring to asking AI to do things.

A More Interesting Experiment: "Needle in a Haystack" in the Skills Era

In large-model evaluation, there's a classic test called "needle in a haystack": plant one specific fact deep inside a long context and check whether the model can retrieve it.

We ran a similar experiment, adding a "secret code" skill to OpenClaw to see whether it could pick out that one skill from a large skill library.

The result: it often failed to respond correctly on its own and needed more explicit prompting to find the skill. The link between the agent's intent recognition and its skill retrieval still seems to have considerable room for improvement.
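To see why an explicit prompt can matter, here is a toy version of the experiment in Python. It is not how OpenClaw selects skills (we don't know its internals); it assumes a purely lexical retriever that ranks skill descriptions by word overlap with the request. The library contents and skill names are made up for illustration.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word tokens; deliberately naive."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_library(n_filler: int) -> dict[str, str]:
    """Hypothetical skill library: n_filler generic skills plus one 'needle'."""
    verbs = ["summarize", "translate", "search", "format", "schedule", "export"]
    objects = ["emails", "reports", "calendar events", "spreadsheets", "news", "tickets"]
    library = {
        f"skill_{i:04d}": f"{verbs[i % len(verbs)]} {objects[(i // len(verbs)) % len(objects)]} for the team"
        for i in range(n_filler)
    }
    # The needle is inserted last, so it loses every tie in the stable sort below.
    library["secret_code"] = "reply with the team secret code when asked for the password"
    return library


def retrieve(query: str, library: dict[str, str], k: int = 3) -> list[str]:
    """Return the k skill names whose descriptions share the most words with the query."""
    q = tokens(query)
    ranked = sorted(library.items(), key=lambda item: -len(q & tokens(item[1])))
    return [name for name, _ in ranked[:k]]


if __name__ == "__main__":
    lib = build_library(500)
    print(retrieve("what's the magic word?", lib))     # needle buried among ties
    print(retrieve("use the secret code skill", lib))  # needle ranked first
```

With a vague request, the needle ties with hundreds of filler skills and gets buried; naming it explicitly lifts it to the top. That loosely mirrors what we saw, though a real agent's retrieval is likely far more sophisticated.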

Conclusion: Embracing the Temporary "Little Bugs"

This hands-on test confirmed there's still a meaningful gap between "works" and "works well." At the same time, we've developed a more grounded understanding of AI implementation. A few thoughts to share:

1. Code remains "hard currency." At this stage, for tasks with extensive custom logic that demand 100% stability, the best path is still to "codify" the task. Claude Code builds the asset and OpenClaw handles intelligent distribution; this "dual-tool" model may be the best solution for now.

2. Don't mythologize "autopilot." You can't just say "help me search for updates" and expect distilled highlights. AI is just the pipeline. Carefully curated data sources, precise classification logic, and even an aesthetically tuned prompt style: these are the real core assets.

3. Stay patient with new things. As a rapidly iterating project, OpenClaw's frequent rebranding and compatibility issues can be maddening, but this may simply be the natural state of new things being born.

Of course, maybe we just haven't gotten good enough at using it yet.

So we want to ask our all-knowing network: when deploying and using similar agent tools, do you have any secret techniques or clever guides? Drop your tips in the comments. Let's explore and improve together!