Technology, Product, Emotion: What New Possibilities Has AIGC Opened Up? | 5Y 3Sigma Roundtable

5Y Capital · September 27, 2022

Registration is open for the next session of the 5Y 3Sigma Roundtable.

From assistive tool to creative agent, AI is drawing increasing attention in content creation. When artificial intelligence meets imagination, will it spark a new Renaissance or fundamentally transform how we consume content? Can AI replace the genius of artists? What are entrepreneurs in this space exploring, contemplating, and wrestling with?

AI + Imagination: How Will It Disrupt Digital Content Creation and Consumption?

At the second 5Y 3Sigma Roundtable, eleven entrepreneurs in this field shared their experiences and perspectives: Ren Yi, head of audio at Sea AI Lab; Li Yanran, former head of emotional dialogue at Xiaomi; Li Yikai, Stanford computer science PhD candidate; Hu Yating, founder of AVAR; Qi Fanchao, founder of DeepLang Technology; Ao Tenglong, Peking University frontier computing graduate student; Yuan Xingyuan, founder of Caiyun Xiaomeng; Gunshi, product lead at Tekan Technology; Zhang Yuxuan, founder of Artflow.ai; Shi Haitian, founder of Graph Origin; and Damao, founder of Nieka.

We've excerpted some highlights and hope they inspire you. The next 5Y 3Sigma Roundtable will be held on October 2; you're welcome to register and join the discussion :)

Why AIGC Matters

Peter, Managing Director at 5Y Capital

Over the past decade, the rapid growth of mobile internet has steadily increased the supply of online content, while user demand for high-quality content has grown in tandem. With the emergence of AIGC technology, AI's role in content generation has been steadily growing.

AI-generated content will evolve through two phases. The first is AI-assisted GC, where AI primarily optimizes existing content creation workflows. The second is true AIGC, where AI takes the lead across the entire process from creative conception to production, feedback, and iteration. Constrained by today's technical boundaries, we haven't yet reached that second phase.

We also believe AI will transform not just content production but our very modes of interaction. AI can enable remarkably open-ended interactions with users, and products built on this paradigm hold vast imaginative potential. The impact of AIGC on content and interaction will gradually shift us from the centralized production and distribution models of the web2 era toward decentralization — such as the more decentralized ownership and distribution mechanisms that web3 enables. Meanwhile, existing content platforms still largely revolve around the physical world; as AIGC develops, content is evolving toward virtualization and 3D.

AIGC also raises significant challenges: high-quality data and compute power may concentrate among leading companies; better copyright protection mechanisms need to be established; and ethical questions will likely emerge. These are issues that will need to be addressed as AIGC gradually finds its footing in the coming years.

5Y Capital has supported numerous companies in the AIGC space, and we hope to identify the next wave of entrepreneurs who will drive disruptive change as this new generation of technology emerges. That's why we continue hosting 3Sigma Roundtables — we aspire to be the earliest, most committed, and most impactful partner for top technology entrepreneurs.

Roundtable: AIGC Technology, Products, and Emotion

Participants: Li Yanran, former head of emotional dialogue at Xiaomi; Li Yikai, Stanford computer science PhD candidate; He Kaiyan, investor at 5Y Capital

He Kaiyan: AIGC technology is evolving in so many directions. What's the most unexpected technical shift you've seen?

Li Yanran: For me, it's prompt engineering. When we first started working on generation, we often assumed poor output meant insufficient input — not enough information going in. But over the past couple of years, we've discovered through technical advances that even with minimal input, the model can imagine and grasp what you want it to generate. That interaction pattern feels remarkably natural. So I find this breakthrough itself stunningly disruptive, and it's also changing product possibilities in ways that let us explore further.

Li Yikai: My biggest impression came during my research when OpenAI suddenly released DALL-E 2. After trying it, I realized it was genuinely brute-force magic — generation speed and quality had improved several-fold over their previous version.

He Kaiyan: On the product side, what's left the deepest impression on you, and why?

Li Yanran: My answer might sound boring, but it's still Replika. From the start, Replika incorporated certain strategies that made it feel capable of nuanced, logical emotional attunement across multi-turn conversations, which kept people using it long-term. It also gave users an explicit interface showing what it understood.

Moreover, Replika focused on the emotional and psychological dimension, and it adopted a writing-based approach. Writing itself is a crucial form of emotional therapy — many people process their feelings through writing to achieve relief. So I see it as a genuinely bidirectional innovation: Replika provides users with a good template, and users are willing to build upon it to write their own feelings, creating a deep connection.

Li Yikai: I lean more toward an open-source community called Disco Diffusion. I watched it evolve from its earliest version, built on OpenAI's open-source CLIP model, and then saw a wave of people improving it — on speed, model size, and later from simple 2D generation to video generation, and then to 2.5D pseudo-3D generation. The continuous community improvements impressed me more.

He Kaiyan: You've all worked in this field for years. If we mapped AI's understanding of emotion to an L1-L4 progression, where are we today?

Li Yanran: Professor Huang Minlie at Tsinghua University and his team previously released a tiered definition for AI dialogue systems that discussed emotional understanding, and we helped refine it. We defined emotional understanding capability across L1 to L5. L1 is a general dialogue system that may not understand emotion at all. L2 means it can at least distinguish three tendencies: positive, negative, or neutral. L3 means it can identify three or more distinct emotions, demonstrating classification capability. L4, we believe, involves combining multimodal information and even environmental context to understand emotion.

Finally, human emotion is complex and multidimensional, with innate physiological foundations as well as personalized emotional processing, so understanding user emotion requires considering multiple factors — this is the L5 threshold, which has barely begun and remains exploratory. We're probably somewhere between L2 and L3 now.
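To make the tiers above concrete, here is a minimal Python sketch of the L1 to L5 scale as an ordered enumeration. The names `EmotionLevel`, `SystemCapabilities`, and `classify_level`, along with the capability flags, are illustrative assumptions and not part of the published definition; they simplify the criteria described in the conversation.

```python
from dataclasses import dataclass
from enum import IntEnum


class EmotionLevel(IntEnum):
    """Hypothetical encoding of the L1-L5 tiers described above."""
    L1 = 1  # general dialogue, may not understand emotion at all
    L2 = 2  # distinguishes positive / negative / neutral
    L3 = 3  # classifies three or more distinct emotions
    L4 = 4  # combines multimodal and environmental context
    L5 = 5  # accounts for personalized, multi-factor emotion


@dataclass
class SystemCapabilities:
    """Illustrative capability flags; the field names are assumptions."""
    detects_polarity: bool = False
    num_emotion_classes: int = 0
    uses_multimodal_context: bool = False
    models_individual_user: bool = False


def classify_level(caps: SystemCapabilities) -> EmotionLevel:
    """Return the highest tier whose criterion, and every lower one, is met."""
    if not caps.detects_polarity:
        return EmotionLevel.L1
    if caps.num_emotion_classes < 3:
        return EmotionLevel.L2
    if not caps.uses_multimodal_context:
        return EmotionLevel.L3
    if not caps.models_individual_user:
        return EmotionLevel.L4
    return EmotionLevel.L5
```

Because the enumeration is ordered, a statement such as "somewhere between L2 and L3" maps to a simple comparison like `EmotionLevel.L2 <= level <= EmotionLevel.L3`.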

He Kaiyan: Yikai, your work focuses more on computer vision. Do you see this differently from that perspective?

Li Yikai: I think it's quite similar in essence: it's about modeling human visual and emotional responses to images. This modeling progresses from easy to hard in stages. The simplest is showing a person two icons and asking whether they like or dislike each one. The next stage is gauging degree of preference. Harder still is giving them 6, 8, or 24 emotions to classify. More difficult yet is natural language-based multimodal analysis.

In terms of development, I can trace this back to 2014 when machine learning gained momentum. What's clearly visible is that after OpenAI trained a large model, it dramatically elevated multimodal classification tasks to a new level. It was like a discontinuous new technology emerged that enhanced everything. Now people are building on these multimodal tasks to analyze the relationship between visual and textual descriptions of emotion, with new papers constantly appearing.

From guest Li Yikai's presentation slides

He Kaiyan: If an important direction for AI products is evolving to feel more human, or reaching L4-level multimodal emotional understanding, what would be the most critical technical breakthrough from today's vantage point?

Li Yanran: I think the crucial breakthrough still leans toward autonomous intelligence — the machine needs to be able to explore on its own, to proactively initiate communication about its own feelings. That's a critical direction for breakthrough.

Li Yikai: Perhaps my focus is simpler. Regarding images and the emotions they evoke, AI currently has some capacity for aesthetic sensibility, but we've clearly hit a technical hurdle: human aesthetic preferences are too diverse. One person's data is completely insufficient to learn their taste. You have to collect massive amounts of data, aggregate preference data on a platform, cluster data from people with similar aesthetics for unified analysis, and learn from everyone's aesthetic preferences to train an emotion-understanding model. Using just one person or a few people's data is far from enough.
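The idea of pooling data from people with similar taste can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ratings are made up, the function names are hypothetical, and a greedy cosine-similarity grouping stands in for whatever clustering method a real platform would use on far larger data.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_by_taste(ratings: Dict[str, List[float]],
                   threshold: float = 0.8) -> List[List[str]]:
    """Greedily put each user in the first group whose seed user is similar enough."""
    groups: List[List[str]] = []
    for user, vec in ratings.items():
        for group in groups:
            if cosine(vec, ratings[group[0]]) >= threshold:
                group.append(user)
                break
        else:
            # No existing group matched, so this user seeds a new one.
            groups.append([user])
    return groups


def pooled_scores(ratings: Dict[str, List[float]], group: List[str]) -> List[float]:
    """Average each image's rating over the members of one group."""
    n_items = len(ratings[group[0]])
    return [sum(ratings[u][i] for u in group) / len(group) for i in range(n_items)]


# Made-up ratings of four images on a -1 (dislike) to 1 (like) scale.
ratings = {
    "ann": [1.0, 0.8, -1.0, -0.6],
    "bob": [0.9, 1.0, -0.8, -1.0],
    "cy": [-1.0, -0.7, 1.0, 0.9],
}
groups = group_by_taste(ratings)
pooled = pooled_scores(ratings, groups[0])
```

Here the pooled scores of a group, rather than one person's handful of ratings, would serve as the training signal for a preference model.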

From guest Li Yanran's presentation slides

Roundtable: Multimodal Content Generation and Consumer Interaction

Participants: Ao Tenglong, Peking University frontier computing graduate student; Yuan Xingyuan, founder of Caiyun Xiaomeng; Shi Yunfeng, investor at 5Y Capital

Shi Yunfeng: We've all been exploring AIGC for quite a while. Why haven't we seen mass-market consumer products at scale yet — is it too early, or have we just not seen the signal?

Yuan Xingyuan: I think the technology simply isn't there yet. Phones existed for years before the iPhone, but it was the multi-touch interface and iOS mobile operating system introduced in 2007 that made them truly usable. For us now, our capabilities may already make AI feel human-like to a small subset of people — maybe 20-30%. But if someday more than half, even 80-90% of people can't distinguish AI from humans, and you're willing to keep chatting with AI indefinitely, that capability threshold would be an inflection point.

Ao Tenglong: I agree. Our direction also faces technical challenges, and 3D animation currently has high barriers to entry. Another reason is that we haven't truly validated whether users actually need this, or whether it can genuinely spark creative energy. We haven't gotten real demand validation, and we haven't yet seen something that truly works reach the market. It's worth trying, though.

Shi Yunfeng: You mentioned Love and Producer. It's a game with no AI-driven intelligence, yet its revenue probably exceeds that of many AI-powered products. What's your take: in consumer interaction scenarios, is AI's soul or intelligence level more important, or are other factors like appearance more critical?

Yuan Xingyuan: I think we just need more time. At first cars couldn't outrun horse-drawn carriages, but that obviously changed. Human-made things will inevitably fall short of AI in diversity and efficiency — it's a matter of time.

Shi Yunfeng: It's like we're waiting for the transformation to truly arrive. What technical breakthroughs or variables do you think could push this transformation significantly forward?

Yuan Xingyuan: Our current challenge is making AI's memory persist across different contexts, so users can play with AI in virtual worlds — exploring martial arts fantasy realms, voyaging across oceans — without sensing it's AI.

From language to action, many game companies are making meaningful attempts. We need to somehow combine reinforcement learning-based intelligence with statistical natural language intelligence, which connects to the System 1/System 2 framework from Thinking, Fast and Slow. Combining the two and pushing further will probably require an algorithmic breakthrough. This could come from academia, though if we in industry work hard enough, it could come from industry too; perhaps within a year or two, we could push overall intelligence forward another step. If Caiyun Xiaomeng's current intelligence is one, our target is ten, so there's probably tenfold room for improvement.

Ao Tenglong: Has anyone considered that individual branches are already quite mature — gesture, locomotion — but no one's combined them into something that truly wows people? Partly because the interfaces between components or open-source integration aren't great, which is something we want to push forward. If I had to predict, I'd say within two years, if this integration is done well, we could see animation on short-video platforms that's hard to distinguish from human-created work.

Shi Yunfeng: This scenario might require a genius product manager to trim and synthesize these needs together.

From guest Ao Tenglong's presentation slides

Ao Tenglong: I also have a question for everyone — how do you think AIGC product managers will differ from traditional internet product managers, and what skills will be required?

Shi Yunfeng: I've been thinking about this recently too. My rough hypothesis is that product managers at internet platform companies function more like system maintainers or macroeconomic regulators: they balance relationships between content creators and consumers, and adjust the weights among creators and across content categories. The AIGC industry, at its current stage, needs more creative, imaginative people. These are fundamentally different roles.

One category of relatively large-scale companies that has systematically preserved creativity is game companies. For game companies, releasing two major flops could mean existential crisis, so they operate with a life-or-death urgency that sustains creativity — quite different from other internet platform companies. So the creative, passionate talent that AIGC needs at this stage is more likely to emerge systematically from game companies.

The next 5Y 3Sigma Roundtable will be held this Sunday, October 2, 9:30-12:30, on the theme: General-Purpose Robots: The Path to General Intelligence?

To join the discussion, register by [scanning the QR code in the poster] or clicking [Read More].

Interactive Giveaway

Share your views on AIGC in the comments — we'll select 2 featured comments to receive a 5Y Capital commemorative T-shirt + 5Y Capital coffee cup. (Comments accepted through October 7; please reply with shipping information within 24 hours of notification.)

5Y Capital seeks out, supports, and inspires entrepreneurs who walk a lonely path, offering support that ranges from moral to operational. We believe that when the "crazy" you that others see starts to be believed in, the world becomes a different place.

BEIJING · SHANGHAI · SHENZHEN · HONG KONG