
2026: The Year of AI Video Is Coming | A Conversation with OiiOii Founder Nao Nao: After Stints at Tencent and ByteDance, How to Catch the Next Wave?
January 6, 2026
🚥 I believe 2026 will be a "breakout year" for AI + video, and many new unicorns will emerge from this space.
So the first AI founder guest on the "Crossing" podcast in 2026 is Nao Nao, founder of OiiOii. She works in AI video, and her product attracted massive attention and acclaim right from its beta launch.
Nao Nao's background is uniquely compelling — she's one of the few product people in China who has fought in the core business lines of both WeChat and ByteDance. She worked on QQ Mail, led CapCut and Douyin effects, and was once head of Bilibili's animation business. She brings deep insight into human nature and product values from her time at WeChat, plus a systematic, data-driven growth methodology from ByteDance. More importantly, she's a creator who has harbored an animation dream for over a decade.
In 2024, she finally started doing what she'd wanted to do since college — using AI to let ordinary people make animation.
In this episode, we covered a lot:
On product: OiiOii rebuilt its architecture four times in two months, from "first-and-last-frame" to fully embracing Sora 2 — what happened in between? Why does she believe Agent won't be swallowed by large models, but will instead flourish alongside them?
On methodology: She used a "supermarket vs. restaurant" metaphor to explain the relationship between Agent companies and model companies; used "left brain vs. right brain" to summarize the core difference between building products at WeChat versus ByteDance; and shared the three key capabilities for being a great product manager.
On entrepreneurship: Why does she say animation is one of the few industries in the business world that "rewards purity and passion"? Why does she believe "conflict is the force that drives action"?
On product management: What are the most important things for becoming an excellent PM?
If you're an AI entrepreneur, product manager, or investor interested in video generation, this episode will give you plenty of insights.
🎬 Our video podcast is here! Now live on Koji Yang Yuancheng's WeChat Channels, Xiaohongshu, Bilibili, YouTube, and other platforms.
📒 Text version published on the Crossing WeChat official account.
🟢 01:23 Rapid Fire:
Age, alma mater, MBTI and zodiac sign, one-sentence intro to current company and product, funding status, revenue and profit, team size, pre-entrepreneurship experience
🟢 06:40 The Agent Moment for AI Video: Why Now?
Now that Sora 2.0 has demonstrated astonishing storyboarding capabilities, are startups being eaten by "end-to-end" models, or entering their best era?
Why is Agent the optimal form for animation, rather than traditional GUI tools?
The two schools of video generation: the "first-and-last-frame" school pursuing extreme stability, versus the "reference image + text-to-video" school pursuing cinematic language.
Only Agent architecture can piece together different models like Lego blocks.
We initially didn't expect that parents would use OiiOii daily to make Christmas song MVs for their kids, or animate their pets.
🟢 12:44 Why We're Not Afraid of Sora
Large models are supermarkets; Agents are Sichuan restaurants.
As model vendors grow stronger, where does the application layer's moat actually lie?
Even when Sora reaches 4.0/5.0, Agent products won't die — they'll become more prosperous.
The Supermarket and Restaurant Theory: Large models are like supermarkets (providing raw ingredients), while Agents are like Sichuan or Cantonese restaurants (providing finished dishes with specific flavors). You can buy groceries at the supermarket and cook for yourself, but restaurants will always have value.
About 60-70% of our work is like "seasoning" in the back kitchen: how to transform those raw model outputs into dishes that suit specific tastes (MVs, science popularization, motion comics).
🟢 18:52 Not Just "Video Cursor": Where's the Incremental Market?
Motion comics, UGC communities, self-media creators... who are the real paying customers of this tech wave, based on our beta findings?
Reflections on CapCut: An AI video Agent won't replace CapCut; it creates incremental value. It will eat the complex "effects production" stage, but lightweight "editing/trimming/sequencing" still requires traditional timeline tools.
Why not build a UGC community directly?
A counterintuitive finding: the people who need AI animation tools most may not be making content for mass audiences, but for maintaining social relationships (for children, teachers, partners).
🟢 35:28 WeChat's Right Brain, ByteDance's Left Brain: Two Paths of Product Cultivation
Nao Nao is a rare core PM who worked in China's two most elite product systems — WeChat and Douyin. What similarities and differences did she see?
Tencent WeChat (Right Brain): It's emotional, intuitive. Allen Zhang taught us to read 1,000 user feedback messages daily, to train that product intuition of "spotting real vs. fake needs at a glance."
ByteDance (Left Brain): It's rational, data-driven. Here I learned what "strategy product" means — data isn't just cold numbers, it's the probability of user behavior, telling you how to create trends.
Common ground: Both pushed their innate strengths to the extreme. WeChat maximized "human nature"; ByteDance maximized "recommendation engines."
What are the three most important things for being a good product manager?
🟢 50:42 Conflict Is the Force That Drives Action
Why "allow conflict, even create conflict" within the team?
Healthy conflict is the best filter for "people who get things done." When everyone puts the work first, conflict turns into back-to-back camaraderie.
She once fired her best friend and got cursed out brutally, but a year later received a WeChat message: "I finally understand you."
She was once a rebellious rock-and-roll girl and has now become very peaceful, because she discovered that shouting outward isn't freedom; true freedom is inner vastness.
🟢 57:53 Predicting 2026
Counter-consensus judgment for the future: technology's "greater interactivity/higher editing freedom" might actually make products more niche, because the mass audience is accustomed to "passive consumption."
Video models won't achieve "grand unification" in the near term, because each model company's data annotation standards differ, leaving huge room for Agents' "combinatorial innovation."
I don't want to express anything. I hope OiiOii, like me, is a container. Let those chosen by the "god of animation" tell their stories through this vessel.
🟢 01:03:33 10 "I am ___" Sentences
I am a... (content confidential, listen to the podcast)
Subscribe to "Crossing": 🚦 We track the industry transformations and new entrepreneurial opportunities brought by the new wave of AI technology.
🚦 "Crossing" is Steve Jobs' metaphor for Apple — standing at the intersection of technology and liberal arts, where great products are born. AI is transforming every industry. We seek out, interview, and bring together a new generation of AI entrepreneurs and active participants in the AI era. Together with them, we explore and embrace the new changes, the new possibilities.
👦🏻 Host Koji: I founded Crossing, launched AI Hacker House (a community space for the new generation of AI entrepreneurs), and serve as Venture Partner at ZhenFund. I believe technology, especially AI, represents the greatest value creation opportunity of our generation. Koji's Jike, Koji's website
👧🏻 Host Ronghui: I co-founded Crossing, worked at a USD VC, and spent five years as a Silicon Valley correspondent, tracking technological development and business stories. Feel free to chat with me and exchange ideas. Ronghui's Jike