AIGC's New Species: How AI Unlocks Creativity and Productivity | 5Y 3Sigma Roundtable

5Y Capital · December 7, 2022

AI's Creativity, Creation, and Productivity.

AIGC (AI-generated content) refers to the production method of automatically or semi-automatically generating content through AI technology. Over the past year, AI has brought significant changes to content generation, and AIGC has gradually attracted increasing attention from researchers and industry. Where will AIGC technology iteration head? How will AI-human interaction evolve? How can we leverage AI to unleash creativity and productivity? How can AIGC products build value and moats?

For the fourth edition of the 5Y 3Sigma Roundtable, we once again focused on AIGC. From a large pool of applicants, we invited eight industry professionals and entrepreneurs in the AIGC field to join the discussion, and every guest shared thoughtful insights. We welcome everyone to keep following and applying for future 3Sigma Roundtable events!

We've excerpted some content, hoping it inspires you.

Why Keep Paying Attention to AIGC?

Peter, Managing Director at 5Y Capital

Over roughly the past year, AI has brought significant changes to content generation. Since deep learning entered industrial applications in 2012, one of its core strengths has been pattern recognition. But in recent years, with the proliferation and development of large language models, AI has shifted from pattern recognition to pattern generation.

This trend started with text, including Google's BERT in 2018 and OpenAI's GPT-3 in 2020. These massive pre-trained models achieved not only strong text comprehension but also excellent content generation capabilities. With the spread of diffusion-based generation algorithms and CLIP pre-training, numerous new models emerged in image generation, followed by recent attempts at text-to-video generation.

In other modalities, beyond mainstream text, image, and video, new domains are also seeing generative models emerge, such as code, 3D models, and game scripts.

This major shift is incubating new entrepreneurial opportunities. As the technology matures, more and more startup products are appearing. Many are still early in finding PMF and a business model. As in the early application explosions of the PC internet and mobile internet, the first applications to appear are often not the ones that ultimately capture the largest market, so continuous iteration is still underway.

One interesting trend: a few months ago, I came across a perspective from OpenAI's Sam Altman. In traditional thinking, we often assumed AI would first replace physical labor, then move into cognitive labor, and finally creative labor. The assumption here was that creative labor is extremely difficult for today's algorithms. But with the rapid development of image generation engines, an interesting phenomenon has emerged: in these very open, creative scenarios, AI has demonstrated strong capabilities. With the emergence of large pre-trained models and complex generation algorithms, the order of AI labor replacement may actually reverse — breakthroughs may first occur in creative labor that is open-ended, somewhat uncertain, and has tolerance for error. This is quite enlightening for our search for new paradigms.

AIGC is not only changing how content is produced; content distribution is also undergoing significant change. Historically, platform opportunities moved from early manually categorized portals, to search engines, to the mobile internet era in which content recommendation became the dominant form. Now that AI can generate personalized content, whether content distribution will still exist in its traditional forms is an entirely new question.

Additionally, the production relations of content are also changing. Some core elements such as means of production, consumption patterns, and distribution channels will also undergo systematic changes.

We are also continuously watching whether AIGC can define new human-computer interaction paradigms. Both OpenAI and our portfolio company Colorful Cloud Xiaomeng have already begun exploratory work on open-ended content interaction. We look forward to seeing whether a new paradigm emerges, much as the iPhone defined how humans interact with smartphones over the past decade.

At the same time, due to the openness of AIGC and the complexity of algorithms, it has also brought corresponding issues. Many of these lack clear solutions today, such as the centralization of data and computing power, data sourcing and copyright issues, and the ethics of synthetic content. We hope these can be gradually resolved in future development.

5Y Capital has already invested in many innovative companies in the AIGC field, and we hope to find more entrepreneurs to become their earliest, longest-term, and most influential partners.

Roundtable Discussion: AIGC New Species — Content, Interaction, and Commercial Innovation in the Generation Era

Participants:

KABA (Jiabo Yu) Information Science graduate student at Sungkyunkwan University, manga artist and novelist
Chaoqiang Huang Product manager, AIGC content creator
Chunyu Chen Founder of Jiqun Technology
Yuxuan Zhang Founder of Artflow.ai
Peter Investor at 5Y Capital
Kaiyan He Investor at 5Y Capital
Chao Ji Investor at 5Y Capital

Peter: How do you view the development of AIGC technology over the past year? Any surprises or things that exceeded expectations?

Yuxuan Zhang: DreamBooth, which came out recently, blew my mind. We also ran some internal experiments. From a single product photo taken on a phone, at a random angle and in random lighting, you could generate entirely new product images under different lighting conditions. This is hugely significant for e-commerce, and also for the storytelling we want to do. Basically, whatever you can think of, you can connect the virtual and the real and generate what you imagined. It was quite shocking to me.

Peter: I had the same reaction seeing DreamBooth. Perhaps not limited to generating a single object, but the entire object across different scenes and angles. If its flexibility and extensibility become stronger in the future, it may have stronger narrative capabilities, which could be very helpful for moving generated content from static to dynamic, and toward more open-ended storytelling.

Chunyu Chen: Personally, I'm hoping for something in 3D model generation that works as directly as NovelAI, which would make many things fundamentally different. For games, it could boost metaverse-style creation by an order of magnitude. Many startups are telling this story, but when NovelAI came out, you knew right away that it could be used directly in games. 3D hasn't reached that level yet.

Peter: I'm thinking there may be two paths for 3D model generation. One is compatibility and adaptation with existing game production pipelines, where algorithms can generate clean surfaces and curves that can be directly used in existing assets. But there may be another path, where the generated models can't be easily edited in existing pipelines, but could be combined with a new rendering pipeline. This might replace the current toolchain for game creation and 3D content creation itself.

Chunyu Chen: What matters most for AI is being able to feed in massive amounts of data. For example, after Apple released LiDAR, every iPhone user could scan a 3D model. The data quality is actually quite poor, but it wins on volume, because everyone can do it. If there were ways to re-render this data, or if there were breakthroughs at the fundamental level of computer graphics, completely different creation methods could emerge.

Peter: How do you view prompt-based generation methods, and how should AIGC interaction evolve in the future?

KABA: Current prompts are actually based on our existing information science classification systems, relatively specialized ones like DDC classification standards. But if we're talking B2C, we find that many existing content creation platforms have UGC-based tags themselves — users can create and invent concepts, and after gaining others' recognition, these concepts can achieve massive spread.

The core issue is the format and standards for information dissemination. In the evolution of AI prompts, or so-called human-computer interaction, we might think differently. We don't necessarily need to input an abstract sentence for AI to guess — instead, within a relatively mature model, we could build a feasible information communication paradigm for human-computer interaction. This paradigm could serve as an underlying industry technical standard applied across various platform creations. On this foundation, after collecting enough data and going through enough iterations, we could proceed to more abstracted expression. This is a layered, progressive process. My feeling is that the current prompt approach may be trying to leap too far at once.

Peter: My feeling is that today's generative models are extremely sensitive to text prompts — very weak signals can cause large perturbations in results. On this basis, we're also doing some structural optimizations to allow certain features to deform in more stable, controllable ways. And how to find more universal languages for expressing human creativity may be an important proposition for the near future.

Yuxuan Zhang: I think AI-assisted creation is essentially a search problem — you need to make what's in someone's mind. If we use the "how to put an elephant in a refrigerator" framework: first, I need to know what I want to create and describe it somehow, and this step already involves information loss. After describing it, the machine still needs to understand what I actually mean, which largely depends on the machine's semantic understanding and how large its library is.

For example, in storytelling we do, text is more about expressing meaning. Making something through description alone is very difficult for AI. But with assistance from other modules, it can actually be done relatively simply, such as setting up different scenes. Going forward, we'll also explore different multimodal assisted creation methods for this.

Peter: Can this multimodal assisted creation approach be simply understood as: for example, giving a sample image or reference image would be a relatively good interaction method for AI?

Yuxuan Zhang: I think it's one approach, but there may be several prerequisites. First, the creator needs to be able to find images that match their expectations, which itself has a threshold. I think a better approach is to directly abstract out different elements and straightforwardly provide some templates for people to get started quickly. For pro users, they could also not use templates, or combine multiple templates to make more complex things.

Chunyu Chen: I think although the current prompt approach is quite difficult, everyone's eager to try it, and users exhibit many strange behaviors. This itself is a huge PMF signal. It will definitely become increasingly simple.

As for evolution direction, it may also relate to data not being closed-loop yet. For example, if one person has collected or liked so many images across various platforms, can AI generation models directly capture this data to see which styles you actually prefer, and directly push similar ones to you. You might only need to make some preferential choices without inputting many prompts, and it would know roughly what you like. For example, giving you 10 images, if you pick one this time, it can learn your tendencies from your choices and gradually generate what you want. There's still much room for exploration here.
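The "pick one of 10 images" loop described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions: each image is represented by a hypothetical dictionary of style features (names such as `watercolor` are invented here), and the update rule is a simple nudge toward the chosen image and away from the others, not any particular product's algorithm.

```python
from typing import Dict, List

# Hypothetical representation: style feature name -> strength in [0, 1].
Style = Dict[str, float]


def update_preferences(prefs: Style, chosen: Style, rejected: List[Style],
                       lr: float = 0.5) -> Style:
    """Return new preference weights nudged toward the chosen image."""
    updated = dict(prefs)  # copy so the caller's dictionary is not mutated
    for feature, value in chosen.items():
        updated[feature] = updated.get(feature, 0.0) + lr * value
    if rejected:
        # Spread a mild penalty across the images the user passed over.
        penalty = lr / len(rejected)
        for image in rejected:
            for feature, value in image.items():
                updated[feature] = updated.get(feature, 0.0) - penalty * value
    return updated


def score(prefs: Style, image: Style) -> float:
    """Dot product between preference weights and an image's style features."""
    return sum(prefs.get(feature, 0.0) * value for feature, value in image.items())


def rank(prefs: Style, candidates: List[Style]) -> List[Style]:
    """Order candidates from most to least preferred."""
    return sorted(candidates, key=lambda image: score(prefs, image), reverse=True)
```

After each round, the updated weights could be used to rank the next batch of candidates, so the user keeps choosing rather than typing prompts.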

Kaiyan He: As KABA just mentioned, users might stare blankly at a text box. For instance, when search engines first appeared, ordinary users might also stare blankly at the search box, but once portals came out, they knew what to click. I also agree with what Chunyu said about prompts having other manipulable elements rather than starting with high-threshold inputs — lowering this threshold is quite necessary.

Chaoqiang Huang: In my own observations of some domestic tools, the user volume and frequency of text-to-image generation is actually lower than that of users submitting one image to generate another. When we think about prompts, we should ponder this: is the default text input method what users actually want?

As mentioned earlier, having users input text might leave them staring blankly at the input box. Our imagination is truly limited. After using numerous products myself, I find I don't know what to input either. But perhaps users have seen many images and want to generate their own based on these images — this is one form of multimodal assisted creation.

Additionally, we can consider: current prompt logic is all single-turn input. My interaction with the dialog box is just what I say right now and what the model outputs for me. Is it possible to know what I want to generate based on all my past data?

I was recently reading the CLIP paper, and found its logic is essentially extracting text features and image features to generate the image. Could AIGC interaction logic undergo massive changes at the model level itself? This is worth watching.
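For readers unfamiliar with the feature-extraction idea mentioned here: CLIP-style models embed text and images into a shared vector space and score how well they match, and generators can use that score as guidance. The sketch below only illustrates the matching step. The vectors are made-up numbers standing in for encoder outputs, and `best_match` is a hypothetical helper, not part of any CLIP library.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def best_match(text_vec: Sequence[float],
               image_vecs: Dict[str, Sequence[float]]) -> str:
    """Return the name of the image whose embedding is closest to the text."""
    return max(image_vecs,
               key=lambda name: cosine_similarity(text_vec, image_vecs[name]))


# Toy embeddings (assumed values, not real encoder outputs).
text_embedding = [0.9, 0.1, 0.0]
image_embeddings = {
    "cat_photo": [1.0, 0.0, 0.1],
    "city_skyline": [0.0, 1.0, 0.2],
}
closest = best_match(text_embedding, image_embeddings)
```

In a real system the vectors would come from trained text and image encoders. The same similarity score could also rank a user's past images against a new request.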

Chao Ji: I'd like to add a question. I feel like Douyin might be in a very good position, because it has massive user preference data. Of course it's mainly video, but some videos are also composed of image sequences. I'm curious — if it were to do AIGC image generation, would it leverage all this data to achieve personalization at the individual level based on user preferences, with differentiation in smaller models?

Chaoqiang Huang: I think this is a great idea. This is indeed an advantage for Douyin — knowing many user preferences. Another advantage is that Douyin somehow knows trending styles and can guide users toward creating trending content. I find these two points quite interesting.

Chao Ji: Right, and might this model eventually be somewhat similar to federated learning — with a mother model and a small model at each individual level, equivalent to a combination of two parameters, so that each person's generated images still have some differences.
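One way to picture the "mother model plus a small model per individual" idea is as shared parameters plus a small per-user offset. The sketch below shows only that combination step, with invented parameter names such as `saturation`. It does not implement federated training, and the additive rule is an assumption made for illustration.

```python
from typing import Dict

# Hypothetical parameters: name -> value.
Params = Dict[str, float]


def personalize(shared: Params, user_delta: Params, strength: float = 1.0) -> Params:
    """Combine shared 'mother model' parameters with a small per-user offset."""
    combined = dict(shared)  # copy so the shared parameters stay untouched
    for name, delta in user_delta.items():
        combined[name] = combined.get(name, 0.0) + strength * delta
    return combined


shared_params = {"saturation": 0.5, "line_weight": 0.5}
alice_delta = {"saturation": 0.25}     # assumed: this user prefers vivid colors
bob_delta = {"line_weight": -0.25}     # assumed: this user prefers thin lines

alice_params = personalize(shared_params, alice_delta)
bob_params = personalize(shared_params, bob_delta)
```

Because each user's offset is small and stored separately, two users drawing on the same shared parameters would still get somewhat different results, which is the differentiation described above.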

Peter: For the final question, I'd like to hear everyone's views. If building an AIGC community, how do you form a closed loop or find moats?

Chaoqiang Huang: First, I think we need to understand what a community is. I can offer a definition: a community can be considered a social platform with content as its carrier, where content cannot be discussed separately from social interaction. In a community medium, social interaction and content should be highly correlated elements.

Next, several dimensions. First, cold start. Currently almost everyone in China is just starting out. I believe cold start must target a small, segmented market that ideally no one else is doing. Second, differentiation. We need to help the community quickly occupy users' mental space. The community also needs to gather a group of mutually recognizing users who lead subsequent arrivals, with everyone following this community atmosphere. After establishing differentiated positioning, we need to think about how to provide valuable content to users, ensure the platform continuously produces good content, and incentivize subsequent users to continue producing.

Behind communities, there may also be two major issues. First, content safety review mechanisms require substantial human resources to maintain. Second, at the content level, how to help users form social connections. I also look forward to seeing interesting innovations.

KABA: Let me also briefly share. The concept of "cyber" traces back, in both idea and etymology, to cybernetics. Current platforms may involve relatively heavy manual intervention, essentially operating as a control system. But my personal hope for AIGC's future is that it enables users to generate content spontaneously. Communities determine a platform's content floor. Since AIGC is about the liberation of expression, it should do more to encourage user expression and creativity, and form community mechanisms around that atmosphere.

Chunyu Chen: Communities definitely aren't top-down where you set a tone and it becomes what you want. For community products that have succeeded worldwide, the founding teams often don't know how users made them what they became — there's underlying vitality. This underlying vitality requires enabling interaction between users. For example, in group chat formats, users must be able to interact with each other. Only with more human-to-human interaction, truly giving users creative rights, does community vitality become stronger.

Selected Highlights from Roundtable Guests

Technology + Cultural Creativity = New Era

On AIGC Content Empowerment

KABA (Jiabo Yu) Information Science graduate student at Sungkyunkwan University, manga artist and novelist

I'm currently studying information science at Sungkyunkwan University in Korea. During 2015-2016, I engaged in some novel and manga creation. Currently at school, I'm conducting more systematic theoretical research, mainly exploring the practical application of information science for cultural and artistic creation.

We often hear that AIGC is a creative tool and creation tool, but perhaps less discussed is: what is creativity, and what is creation?

We can be quite clear about one thing: so-called creativity is not generating something completely nonexistent out of thin air. Especially in cultural creativity, it's more like relatively free combination and divergence based on existing concepts. For example, as a screenwriter, what kind of plot and effects would cyberpunk plus wuxia produce? What interesting settings might wuxia evolve in a space weightless environment? Further refinement becomes creation.

What is creation? It's giving logic and continuity to creativity. It draws on existing, very mature creative methodologies, some passed down for thousands of years, to express and interpret content. It's the process of making creativity concrete and logical.

So creativity is a divergent need, while creation has extremely high requirements for determinism and logic. That is, when AIGC tools are simultaneously applied to both creativity and creation, this presents a conflicting paradox for the tool — we need both divergence and precision. Precision can help us improve efficiency, but creativity can provide more options for filtering.

The continuous cycle of creativity and creation, each verifying the other, is a highly simplified model of commercial cultural creation, and many individual creators work from this model as well. The ultimate goal of cultural creativity is to construct creativity logically and to express viewpoints and ideas, thereby resonating with user communities. Market value naturally follows.

If we classify current AIGC users by cognition and skill, the first category might be design and film/television professionals with systematic education. They have clear motivations and needs, and know exactly what they need when using AIGC models. The second category may lack skills, but as seasoned enthusiasts their cognitive boundaries often exceed those of some professionals. What they lack is practical operational skill.

If we temporarily define AIGC as a tool, this tool might enable the second category of users, with their broader cognitive boundaries, to leapfrog professionals by skipping the so-called professional technical requirements. Relying on their own accumulated knowledge and expressive ability, they can achieve a higher level of expression than professionals.

Most ordinary users may not have clear needs or extensive knowledge reserves. Faced with AIGC tools, their most common reaction might be staring blankly at the input box, not knowing what to type. Even if seeing others generate gorgeous images motivates them, that motivation is limited by their own knowledge and aesthetics and may not last. Currently, most image generation tools are still relatively early and primitive. A text-input approach that depends on users' accumulated knowledge and understanding is not particularly aligned with users' cognitive patterns.

Users' cognitive patterns actually resemble a chart that developer friends will be very familiar with. Its horizontal axis is model interpretability and its vertical axis is complexity. That is, the higher a model's complexity, the lower its interpretability. User cognition works the same way. At the top left are algorithms: to users, algorithms are black boxes, and popular science videos cannot solve that. At the bottom right are concrete, text-based rules, which give users a perceptible boundary.

So what is the controllability principle? Controllability means creativity must be controllable at the input end, while creation must be precisely controllable at the output end, and not too unconstrained, because that doesn't conform to the basic laws of creation.

Currently, users themselves feel anxiety about this technology because it's unknowable. Additionally, users need to adapt to tool usage based on their own capabilities. The most obvious thing I've observed is that creators with professional skills pick up AIGC tools faster and output better results, while those without professional skills are lost. In UGC communities, there may be a state where the strong get stronger and the weak get weaker. AIGC hasn't become an aid for most people, but has instead deepened polarization to some degree, which may be unexpected for many. But since it's still early, there are still many opportunities.

To avoid users experiencing this state of not knowing what to input, or precision not matching expectations, we should give users rules. Traditionally, expectations for AIGC are absolutely free and open. But like in games, giving users rules isn't limitation — it's establishing a relatively fuzzy cognitive boundary, indirectly telling users what the product can do, lowering the barrier to entry. Providing rules doesn't mean having users strictly follow certain methods, but providing options so users can grow along a relatively smooth cognitive curve.

Specific rules might include hierarchical tags that users can select and share, as well as standardized cognitive templates. These can form an information communication paradigm within communities, thereby lowering users' communication costs. When users have more desire to share and communicate, a very early community rule will form.
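As a rough sketch of what selectable hierarchical tags could look like, the snippet below stores tags as a nested dictionary and turns a user's selections into a prompt string. The tag names, the `compose_prompt` helper, and the comma-joined output format are all assumptions made for illustration, not a description of any existing platform.

```python
from typing import Dict, List, Sequence

TagTree = Dict[str, dict]

# Hypothetical tag hierarchy that a community might curate and share.
TAGS: TagTree = {
    "style": {"wuxia": {}, "cyberpunk": {}},
    "setting": {"space": {"zero_gravity": {}}, "city": {}},
}


def is_valid(tree: TagTree, path: Sequence[str]) -> bool:
    """Check that every step of a tag path exists in the hierarchy."""
    node = tree
    for part in path:
        if part not in node:
            return False
        node = node[part]
    return True


def compose_prompt(tree: TagTree, selections: List[Sequence[str]]) -> str:
    """Turn selected tag paths into a comma-separated prompt string."""
    terms = []
    for path in selections:
        if not path or not is_valid(tree, path):
            raise ValueError(f"unknown tag path: {'/'.join(path)}")
        # Use the most specific tag in the path as the prompt term.
        terms.append(path[-1].replace("_", " "))
    return ", ".join(terms)


prompt = compose_prompt(TAGS, [("style", "wuxia"),
                               ("setting", "space", "zero_gravity")])
```

Because selections are validated against a shared tree, users pick from known options instead of guessing at free text, and a tag path that one user shares means the same thing to another.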

Finally, regarding expectations for AIGC's future, many people say it will replace or displace someone. There's really no need for such worries. Technology in any era is unstoppable. The public's desire for communication and expression, and the cultural creative communities built on it, will inevitably continue to exist long into the future. With both in place, talking about who will be replaced is meaningless. What matters more is finding, through organic coupling, where the gears mesh, so that existing technology and productivity can be liberated. AIGC plays more of a collaborative role, freeing people from meaningless mechanical labor and bringing human creativity and thinking to their best.

Additionally, I hope that while AIGC liberates professional creators' productivity, it can also lower the barrier to expression for the public, enabling more people with ideas, insight, and accumulated knowledge to bypass very high learning costs and express themselves directly. Just as the printing press enabled the mass dissemination of religious knowledge, contributing to the Renaissance and the later Enlightenment, I personally hope AIGC technology can also bring a liberation of expression. As for how this era will be defined and written about in the future, let us wait and see.

Interactive Giveaway

We welcome you to share your views and perspectives on AIGC in the comments section. We will select 2 featured comments and send each a 5Y Capital commemorative hoodie. (Comments accepted until December 12. Please reply with shipping information within 24 hours of receiving notification.)

5Y Capital seeks, supports, and inspires lonely entrepreneurs, offering everything from moral support to help with every operational matter. We believe that when you, whom others see as crazy, begin to be believed in, the world will become refreshingly different.

BEIJING · SHANGHAI · SHENZHEN · HONG KONG