World Models Are the Dreamcore Arms Race

葬AI·May 5, 2026

Rules for Interpreting World Models

"How to Read World Models"

In a corner nobody's paying attention to, world models are springing up like bamboo shoots after rain, collectively dropping with earth-shattering fanfare.

I'm not exaggerating. Back in April, before Alibaba released its video model Happy Horse, it had already started internal testing on its world model Happy Oyster. Lately, reviews of Happy Horse are everywhere — but have you seen anyone actually studying this Happy Oyster?

Around the same time, Tencent released and open-sourced Hunyuan 3D World Model 2.0, yet it generated less buzz than Honor of Kings: World, which went cold the moment it launched. Also in April, across the Pacific, those nefarious Americans — Fei-Fei Li, Jensen Huang — all dropped new versions of their world models too. And those also sank without a ripple.

The most absurd part? Even Lingguang released a world model, and it's packed into a phone. If you prefer working on a computer, you can forget about creating worlds.

I have reason to suspect ByteDance is also quietly cooking up a world model, probably named something like Seedworld. After all, they're the only major player left that hasn't jumped on this bandwagon.

But with so many magical tools that can instantly spawn an open world for us to explore, why hasn't a single one triggered that Pavlovian reflex in AI media editors — "it's insane, it's game over, it's killing the competition"?

I took a quick spin through a few of them, and the answer is simple: they're terrible.

Let's start with Happy Oyster 🦪

Happy Oyster has two usage modes. One is Directing — director mode. This takes the real-time short film generation route, similar to Pixverse R1, which I roasted before.

The selling point of world models on this path is that you can input prompts at any time to change scenes and plot, generating AI video like you're playing a galge.

But previous models like Pixverse R1 and Odyssey-2 shared the common problems of this category. No coherence: something generated in one frame vanishes in the next. Error accumulation: the longer the generation runs, the more unhinged the visuals get. For specifics, see my earlier dedicated piece on Pixverse R1.

So does Happy Oyster's Directing mode make any progress on these issues?

I had it generate a scene of "an American astronaut walking on the moon," planning to introduce various extraterrestrial forces to interact with them.

Throughout the video generation, I fed in a barrage of prompts: "an alien appears," "shakes hands with the alien," "marries the alien and has a child," "an alien acid rain destroys their love nest." Let's see how it went.

Turns out Alibaba is pretty clever.

Pixverse R1 used a single continuous long take with no cuts, which made error accumulation and coherence problems especially glaring.

Happy Oyster solves this at the root — it cuts to a new shot every few seconds, everything resets, solving the problem by simply avoiding it. No sarcasm intended here.

And honestly, it kind of works. The alien generated at the beginning never disappears, peacefully hanging out on the moon until the end. Coherence: achieved.

But at the same time, character consistency and action/plot execution are so bad I genuinely thought they were bugs. The overall visual quality is on par with Western dreamcore aesthetics.

I tried something else. I uploaded a photo of Doubao and had Happy Oyster generate a live-action game called I'm Surrounded by Sister Dou.

After seeing the result, I finally understood what this thing is actually good for.

Sure, the horse it generated randomly mutated halfway through into a two-headed horse, like something out of a nuclear accident. Sure, Doubao suddenly started doing The Shining impressions, and two Doubaos standing side by side nearly gave me a heart attack. Sure, the whole visual falls apart like AI video from years ago, but...

the people and horses in this video can actually make sounds and speak. And you don't need to give explicit lines; vague instructions push the plot forward. None of the previous world models had this kind of showmanship.

Isn't this perfect for AI virtual companions?

Traditional AI companions basically just have LLMs generating personalized dialogue and plot in real time, with AI whipping up a few scenes and reaction memes for you. In the future, with a world model behind an AI companion, you could just write whatever you want your virtual partner to do: infinite, unbounded plot, visuals, and performance.

My recommendation: Happy Oyster should immediately release preset digital avatars for Love and Deepspace characters, all-star galge casts, and Yongchun Taffy. If it doesn't blow up, come find me.

But even having found a use case for it, I still don't understand what real-time video generation tools like Happy Oyster, Pixverse R1, and Odyssey-2 have to do with world models.

In my view, world models mainly address two pain points in the AI era.

One is that today's leading LLMs are too dominant. The latecomers can't catch up, so they need to invent a new concept to leapfrog ahead. The other is that LLMs can't understand the physical world, so carbon-based humans and silicon-based beings can't interact with AI in three-dimensional space, which means AGI stays out of reach. World models are supposed to bridge that gap.

For example, that recent news about Indian workers wearing head-mounted cameras while doing assembly line work? That's supposedly an important way for AI to gather real-world data for training and improving embodied intelligence.

But if we had a world model that could simulate the real physical world, maybe laborers in developing countries wouldn't have to keep their heads down for the camera. Just let AI agents practice their moves inside a world model. Energy-efficient and effective.

But what does this real-time video generation tool have to do with that vision? What contribution does it make to understanding the physical world? I don't see it. Feels like just another video agent with a different interaction format.

Its only advantage is you can't gacha-roll anymore — because there's no time to even think of prompts.

But as mentioned, Happy Oyster has two modes. The Directing mode I just covered does real-time video generation. They also have a Wandering mode — stroll mode — which might have slightly more to do with understanding the physical world.

In Wandering mode, users define a scene and character with two separate prompts, then wander around the generated world.

But this looks functionally and interaction-wise basically identical to Google's Genie 3.

The more hilarious part: the English interface below is Happy Oyster's. The Chinese interface is Genie 3's.

Time to test.

First round: let's try the factory worker. I had Happy Oyster and Genie 3 each generate a Shenzhen electronics factory, with the protagonist set as a blue-collar fresh grad just entering society, to see if we could get a satisfying cyber-struggle session.

Genie 3 first.

The electronics factory vibes hit immediately. I guided the protagonist to grab a component and return to their station, bumping into a coworker along the way. The workshop remained stable and environmentally consistent throughout.

But the controls weren't exactly smooth — less fluid than those Spring Festival Gala dancing robots. And whether we're talking the scene, props, or surrounding characters, everything had that suspiciously subpar texture of cheap NPC assets.

Now Happy Oyster — same factory, same dream.

In terms of model fidelity and visual quality, I'd say Happy Oyster edges out a win. It doesn't look like flesh blobs wandering through paper houses.

But a familiar fatal problem persists: no environmental consistency or stability.

I had the protagonist go grab a screw, and when they came back, their workstation was gone. I had them do a 360-degree turn, and where there used to be a wall, there was now a hallway.

People appeared where no one had been before — so the boss went behind our backs and quietly made some hires, huh?

The more I think about it, the creepier it gets. Kind of like playing a dreamcore horror game like The Backrooms.

This makes me seriously wonder: does Happy Oyster even have a memory function? Is it just doing first-frame video generation based on whatever the user last saw?

Later I also had Happy Oyster generate a dragon continuously breathing fire in a forest, and this dragon too forgot its original purpose as it walked around.

But this is a universal problem. The dragon Genie 3 generated also forgot what it came here to do after thirty seconds of fire-breathing.

Beyond environmental consistency and stability, another benchmark for whether a world model is usable is its simulation of physical laws.

So I generated a world specifically designed for crossing the street, to see which model would get me killed by a speeding truck.

This round, Happy Oyster squeaks out a small win. It still has the consistency and stability issues mentioned above, but the cars in this world do actually stop when I'm crossing and resume when I finish.

Civilized society. AI might actually learn autonomous driving here.

Genie 3, on the other hand — no matter how many times I regenerated — the cars were completely stationary.

I looked closer. Red light. And Genie 3 maxes out at 60 seconds. The red light? Also 60 seconds.

My dude is just standing there gaming the system.

I suspect Genie 3 lacks confidence in multi-agent interactions, so it simply stripped objects of their agency.

After all, Google itself admits numerous shortcomings on Genie 3's official site.

After that, I switched perspectives and generated a world specifically designed for hitting pedestrians and crashing cars. This time, both performed decently enough.

Happy Oyster's generated Cybertruck crashes flowed pretty smoothly, like driving bumper cars, with occasional clipping issues.

Genie 3's hits had real impact. People would avoid obstacles too — had that GTA 6 feel, except with even worse clipping than GTA.

Here I must clarify: my comparing Happy Oyster's Wandering mode against Genie 3 throughout this piece is not meant as a head-to-head benchmark review.

Comparing a beta product that just started internal testing last month against a multi-generation SOTA product is inherently unfair and pointless. I'm just trying to show everyone what world models actually look like right now.

So my final conclusion: Happy Oyster has massive room for improvement, and Genie 3 is also just... that.

At the end of the day, world model products that end users can actually access and figure out may not possess practical value to begin with. They're merely a posture — AI companies relieving their own fear and anxiety:

"Oh, I'm also keeping up with the times. I haven't gone all-in on LLMs either. I'm hedging my bets. So when Yann LeCun or Fei-Fei Li actually cooks up something big later, I can proudly tell the world — haha, I've had one foot in this wave all along."

It looks cautious and safe. In reality, it's terrified, walking on eggshells.

I recently watched a play called Copenhagen. It's about how, during WWII, Heisenberg, a nuclear physicist of the Third Reich, visited his teacher Bohr in German-occupied territory. Bohr later joined the Manhattan Project.

I didn't fully grasp the play itself. But there's one scene where Heisenberg, with anguish, demands of Bohr: Why didn't you tell the Americans that we weren't actually researching how to make a nuclear bomb explode?

The world model scene right now really does resemble the last years of WWII, a more advanced version of the dark forest:

Most people are producing useless products, and they know everyone else's products are useless too, or they can't even comprehend what others are producing. But nobody dares to stop producing, because they're all afraid the other side might actually be building a nuke.

So everyone keeps releasing, keeps updating, keeps pumping out wave after wave of announcements to seize the attention high ground.

But whatever — it's fine that you're burning money on unrealistic things right now. It's not my money anyway. Explore Mars if you want. At least you're contributing more wild stunts and GDP.

(Cover image generated by ChatGPT; article written entirely by a human)
