The Next Generation of Productivity Tools Is Here: How Should Entrepreneurs Embrace the AIGC Wave? | Ronghui Dialogue

Gaorong Ventures · December 13, 2022

"Every entrepreneur, developer, and creator should be paying attention to advances in AIGC."

Whether you're generating artwork with Stability.ai or Midjourney, chatting with ChatGPT, or having Jasper.ai draft a short essay for you, the recent wave of AIGC killer apps and the large models behind them has been "like a fire in winter" — delivering genuinely surprising experiences. This wave of generative AI is using existing human data to push into new frontiers, and opening up our imaginations in the process.

What underlying technologies are driving this evolution? What new applications overseas are worth watching? Most importantly, how can Chinese entrepreneurs benefit — and what new opportunities might emerge at the infrastructure layer, in new scenarios, and at the application layer?

Recently, Gaorong Ventures and Sheng Huo Huo Po (OnionPod) co-hosted an online seminar, bringing together four guests from the fields of general artificial intelligence, 3D content production and consumption platforms, investment, and media for a conversation.

Excerpts from the discussion:

>> Ding Jiao: This has been a breakout year for AIGC — from AI art products to the genuinely stunning ChatGPT. Why is AIGC entering the public consciousness now, and what technologies are driving it?

Yolanda: In fact, the industry has been using machine learning methods like GANs to generate images for years. But the results were limited, and they weren't connected to the text-to-image approach that's so hot today.

The current AIGC wave is primarily driven by two original technologies from OpenAI. First is the diffusion model, marked by DALL·E, along with the tools that followed it, such as Midjourney and Stability.ai's open-source Stable Diffusion, which have massively advanced text-to-image generation.

The other is the text generation model GPT. In 2017, OpenAI first combined the Transformer architecture with large-scale training systems for text generation, achieving very strong results; GPT has made enormous progress in the years since. AIGC has now entered a growth phase, with rapid development in AI-generated images, text, code, music, and more — all while helping people work more efficiently. OpenAI's success was no accident. We summarize it as three "fives": five years, 50 people with the deepest understanding of AGI, and $500 million in compute.

Going forward, I believe the most important evolution for AIGC will be from single-modal to multi-modal systems — for example, a single neural network that simultaneously generates images, text, voice, and more. This technology is advancing rapidly. So we believe that in content-related creative fields, we'll start to see AI assisting humans, and in some cases even replacing them in certain steps.

>> Ding Jiao: Haozhi from XVerse has extensive experience in graphics engines and computer vision. From a technical perspective, why did AI art products become the first to break out this year?

Huang Haozhi: AI art has actually been researched for many years. The recent sense that AI art has reached a usable state — even surpassing some junior artists — comes from advances in two areas: image generation technology and natural language processing models. So the progress in AI art is a result of the intersection of computer vision and NLP.

In image generation, earlier generative models like GANs and Imagen AI essentially took an existing image, processed it through deep learning neural networks, and produced another image with similar content. This year's much-discussed Stable Diffusion and similar models use Diffusion Models. What's special about diffusion models is the shift from training based on one-dimensional noise to modeling fully two-dimensional noise, using iterative noise-adding and denoising to generate images. People found the image quality was higher. This led to a wave of diffusion model research and technical breakthroughs.
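
To make the "noise-adding and denoising" idea above concrete, here is a minimal sketch in plain Python. It assumes the common DDPM-style formulation with a linear noise schedule, which the speaker does not specify; the function names, the four-value toy "image," and the schedule constants are all illustrative assumptions. A real system trains a neural network to predict the noise and removes it over many small steps. This sketch passes in the true noise as a stand-in, so it shows only the underlying algebra, not a working image generator.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear noise schedule."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, alpha_bar, rng):
    """Forward (noising) step: blend the clean signal with Gaussian noise."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
          for x, e in zip(x0, eps)]
    return xt, eps


def estimate_clean(xt, predicted_eps, alpha_bar):
    """Invert the forward step, given an estimate of the noise that was added."""
    return [(x - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
            for x, e in zip(xt, predicted_eps)]


rng = random.Random(0)
pixels = [0.1, 0.5, 0.9, 0.3]            # toy "image" of four pixel values
alpha_bars = make_alpha_bars(1000)       # signal fraction shrinks toward zero
noisy, true_eps = add_noise(pixels, alpha_bars[499], rng)

# A trained network would predict the noise; the true noise is used here
# purely as a stand-in for that prediction.
recovered = estimate_clean(noisy, true_eps, alpha_bars[499])
```

With a perfect noise estimate the clean values come back (up to floating-point error); in practice, output quality depends on how well the trained network predicts the noise at each step.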

On the other hand, what makes AI art compelling is that people can generate corresponding artwork through relatively natural language descriptions. This is thanks to large-scale pre-trained language models, which extract linguistic features and map them to image features, ultimately producing a mapping from language to image.

With years of internet development, we've accumulated massive amounts of text and image data. Training models on this data enables the generation of relatively high-quality artwork.

>> Ding Jiao: What entrepreneurial opportunities are exploding in AIGC in overseas markets right now?

Liu Xinhua: Unlike the previous wave of discriminative AI, which mainly solved classification and boundary problems with existing data — in familiar applications like algorithmic recommendation and facial recognition — this wave of generative AI is focused on generating and creating new data. The outputs are divergent and diverse, using existing human data to open new boundaries and create entirely new incremental value.

From August to now, enormous innovation has emerged in AIGC. The GPT-3 ecosystem alone has sprouted hundreds of applications across different fields, with unicorn companies among them.

Overseas, AIGC isn't just an opportunity for tech giants — many startups have emerged too. Some are infrastructure-focused, incubating breakthroughs in large models for new scenarios, with many new models based on DALL·E, GPT, and others. At the application layer, many new companies have appeared, generating text, images, video, code, 3D models, and more.

In vertical industries, a dense cluster of startups has also emerged. Jasper.ai, for example, currently serves marketing, e-commerce, and self-media creation.

>> Ding Jiao: When evaluating these AIGC startups, what capabilities and metrics do you focus on?

Liu Xinhua: From an investment perspective, when judging whether a field has reached an inflection point for explosive growth, we look at whether representative products have achieved product-market fit (PMF). In frontier fields like generative AI, we also make an earlier, preliminary assessment: which scenarios have reached AMF (AI-Market Fit), the stage that precedes PMF.

For AIGC to reach AMF, two dimensions matter. First, AI capabilities have basically reached 60-70% of human professional levels — the threshold of being usable by humans. Second, AI-based tools have high tolerance for error and offer editability. Markets with large user bases, high data availability, and high commercial value are most likely to explode.

A key shared characteristic of the applications that have blown up in this AIGC wave is that generative AI actually has the most opportunity in markets with high creative intensity. These fields tend to have high tolerance for error, no strict singular aesthetic standards, and a willingness to accept diversity. Plus, the tools offer strong editability for easy human remixing. This is counterintuitive. As OpenAI CEO Sam Altman said, "Ten years ago, everyone thought creative work would be the last human job replaced by AI. No one expected the opposite to be true today."

So beyond creative work, generative AI may also find opportunities in fields like gaming, architectural design, new drug discovery, and new materials — areas with high creative divergence and high tolerance for error.

Many people also have concerns about AIGC accuracy. The hottest AIGC companies have already reached a usable accuracy threshold. My personal view is that to further improve AIGC usability, the community-driven data-model collaborative growth flywheel is crucial.

This year, both Jasper.ai and Stable Diffusion have thrived on vibrant communities. Community building will be a very important capability for excellent AIGC companies going forward. User communities continuously contribute data to models; new data drives further model evolution, which creates better experiences and attracts more users. The resulting data network effects and user network effects reinforce each other and drive the growth flywheel.

>> Ding Jiao: What generative AI applications does Inspirai currently have in production? What are your goals and plans for gaming and broader fields?

Yolanda: Since 2017, Inspirai has been continuously innovating in intelligent agent/AI Being product technology, applied in gaming, digital twins, virtual humans, and other fields.

Inspirai's earliest work was on large decision-making models for games, where the AI generates a series of decision commands. These are now substantially deployed in strategy games (including SLG, tactical RPG, and card game genres), where intelligent agents mainly serve as "play companions." A representative example is Inspirai's AI agent defeating a Chinese professional StarCraft II champion.

Our future goal is to evolve intelligent agents from "play companions" to "chat companions" and ultimately "companions." After finishing a game, the agent can chat with you, review how you played, and discuss how to coordinate next time. So since last year, Inspirai has been investing in large cognitive dialogue models that enable agents to understand game situations and talk about them.

Further out, we're building multi-modal, lifelike intelligent agents/AI Beings whose capabilities will approach human levels — understanding context, communicating with high emotional intelligence, and also writing, painting, and moving. Analogous to human beings, this kind of AI Being is a new form of life.

>> Ding Jiao: What is XVerse's assessment of future AI-generated 3D content? Do you have any actual cases yet?

Huang Haozhi: First, XVerse believes that future media forms will evolve toward 3D interactive formats, or more convergent and multi-modal directions. The real world is naturally 3D, and users find this media form intuitively familiar.

But compared to AI-generated text, images, and video, AI-generated 3D content is still in a very early stage. The reason is the scarcity of 3D content data. Due to high production barriers and long cycles, 3D content currently relies mainly on PGC (Professional Generated Content) from professional teams, with slow accumulation.

However, we're seeing two trends in 3D content that lay groundwork for future AI generation. First, as 3D content and metaverse-related applications rise, 3D content accumulation is accelerating. Second, 3D reconstruction technology based on the real world is also iterating, continuously lowering the difficulty of 3D content generation. For example, NeRF technology proposed in 2020 only requires multi-angle images with camera poses as input to train a NeRF model, which can then render clear photos from any viewpoint.
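
As a rough illustration of how a trained NeRF turns its predictions into a pixel, the sketch below composites density and color samples along a single camera ray using the standard volume-rendering quadrature. The sample values are supplied by hand and are hypothetical; in an actual NeRF they would come from a neural network queried at 3D points along the ray. `render_ray` is an illustrative name, not part of any library or of XVerse's pipeline.

```python
import math


def render_ray(densities, colors, step_sizes):
    """Composite per-sample densities and RGB colors along one camera ray."""
    transmittance = 1.0              # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    for sigma, rgb, delta in zip(densities, colors, step_sizes):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this segment
        weight = transmittance * alpha           # contribution to the pixel
        pixel = [p + weight * c for p, c in zip(pixel, rgb)]
        transmittance *= 1.0 - alpha
    return pixel, transmittance


# Hypothetical samples: empty space (blue, zero density) in front of
# a dense red surface.
pixel, remaining = render_ray(
    densities=[0.0, 50.0],
    colors=[(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)],
    step_sizes=[0.5, 0.5],
)
```

Because the first sample has zero density, it contributes nothing, and the pixel comes out essentially pure red. Rendering a full image from a new viewpoint repeats this for one ray per pixel.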

In the near term, with relatively little 3D content data available, how is XVerse accelerating AI-generated 3D content? Our approach is using images as a medium — going from text to image, then from image to 3D content. Google recently used NeRF models to attempt text-to-3D generation.

Currently, XVerse is experimenting with AI generation for people, environments, and objects in 3D worlds. For people, we judge that human motion generation will be the first to explode. With advances in motion capture technology and wearable devices, human motion capture data is accumulating; additionally, generative models like diffusion models can also be used for motion data generation.

For environments, XVerse has iterated through multiple rounds of large-world generation technology, using AI to help generate large-scale, complex scenes.

For example, suppose we want to generate a 3D road network for Shenzhen. First, our internal art team can sketch roads in 2D graphics. Then we use GANs, diffusion models, and other technologies to generate road network models. Finally, combining this with geolocation data, aerial photography, and other data, we use CV's strengths in recognition and segmentation tasks to divide and classify the road network and place appropriate 3D models at different locations, thereby making the road network 3D.

XVerse: Schematic of city-level road network generation results

>> Ding Jiao: Compared to overseas, what entrepreneurial opportunities will emerge in AIGC in China? Which directions are more likely, for example To B or To C?

Liu Xinhua: First, it's clear that all domestic entrepreneurs will benefit from generative AI innovation and ecosystems. The emergence of large models and their exponential capability iteration, along with flourishing open-source communities and the massive development and opening of APIs, will all benefit Chinese entrepreneurs. Moreover, large models' generalization and universality are extremely strong — no need for repeated training on scenarios and models (i.e., zero-shot). The resulting low barrier to entry, data flywheel effects, and broad scenario adaptability mean domestic entrepreneurs can do combinatorial innovation standing on the shoulders of giants.

But China will also develop a different startup ecosystem. Infrastructure platform entrepreneurship is relatively difficult, but genuine domestic substitution opportunities exist, especially in scenarios with particularly rich local data.

Additionally, in some new scenarios for infrastructure platforms, Chinese entrepreneurs also have innovation opportunities — for example, in 3D scenarios, gaming, manufacturing, and construction.

The most likely area for Chinese startups to emerge is probably the application layer. The To B direction should have the most potential to explode first. For example, among overseas generative AI companies, the most commercially mature is Jasper.ai with its SaaS model, targeting vertical scenarios like marketing, self-media, and e-commerce with high-frequency, large-scale text production needs. For To C, I believe there could emerge the next-generation Douyin/Kuaishou/Tencent Video opportunity, but this may be more of a big-tech opportunity. Of course, as a new generation of productivity tools, AIGC could also produce To D (developer-focused) companies — for example, Copilot-style code generation tools have already emerged overseas.

GitHub (Microsoft): Copilot, an AI programming tool

Looking further ahead, another interesting direction for generative AI is personalized models — perhaps everyone will have their own model in the future, bringing a wave of AIUGC. For example, an influencer could train a model based on past videos, then use it to generate extremely personalized, highly individualistic content in the future.

Yolanda: I'd like to respond to what Xinhua said. We also judge that the first wave of AIGC companies in China will likely be SaaS-type. But something interesting is happening: as AIGC capabilities improve dramatically, individual users may only spend a few dozen yuan per month — the purchase decision cost is very low. Users buy it if they feel the tool genuinely helps them work more efficiently, or write, or draw better. So in some scenarios, the boundary between To B and To C is rapidly blurring.

From this perspective, many earlier Chinese AI companies mainly ran heavy To B enterprise or To G (government) models, whereas AIGC opens up new room for imagination.

>> Ding Jiao: With AIGC booming, will the commercial environment for domestic AI companies improve? How can they find incremental space in the future?

Yolanda: Frankly, this year's external environment is very challenging for AI companies. But what Inspirai believes in most is AGI (Artificial General Intelligence) technology: the paradigm and methodology of deep learning plus reinforcement learning. The underlying technology is the same in both of the cases we care about. One is training in a relatively closed scenario, like having an intelligent agent learn a game from scratch. The other is AIGC breaking out of its niche and extending general intelligence to a broader scope: training an initial cognitive brain on a web-wide corpus, then continuously correcting and iterating that brain through user data and feedback across scenarios to form stronger capabilities.

Behind AGI, the two companies widely seen as the world's best in this area, OpenAI and DeepMind, are driving this major paradigm forward at great speed, and long-term confidence is growing. Two years ago, when GPT-2 came out, it made little impression. But after GPT-3 was released, many overseas companies began researching it and gained first-mover advantage.

Now the excitement is being felt in China as well. AIGC is like "a fire in winter": it has ignited everyone. I believe there will be a breakthrough explosion at the application level in the next year or two. The first applications to catch fire will likely be those related to the internet and general common sense, like marketing copy. Going further, the goal is to build vertical equivalents of Jasper.ai, for example for 2D game assets, 3D content like what XVerse is doing, and fields with higher professional requirements. Further still is what we believe in: higher-order forms of content generation with multi-modal fusion.

I also believe domestic entrepreneurs should dare to think bigger and more imaginatively. For example, as domestic large models open up, based on their cognitive capabilities and dialogue understanding, is it possible to build the next AI-era search engine? I think we should always be rapidly exploring and experimenting in these directions.

>> Ding Jiao: Do domestic AIGC entrepreneurs have any advantages?

Yolanda: I believe Chinese AIGC startup teams have two advantages over overseas teams. First is understanding of scenarios and users. One view holds that the ultimate winners in AIGC may not be AI companies, but companies with the deepest understanding of various industries, scenarios, and users. Chinese teams' strength in this area shouldn't be underestimated — for example, in mobile internet, we built super platforms for payments, shopping, and content like Douyin and Kuaishou.

Second, China's entrepreneurial environment is about speed above all. In a very new field, you may start with only vague ideas, but you need to rapidly build organizational capability, iterate quickly, and trial-and-error your way to real opportunities and good product forms.

>> Ding Jiao: If you're not an AI technology company itself, how can you learn about AIGC and benefit from this opportunity and growth trend?

Huang Haozhi: First, I think you should think about unique directions combined with your own industry. For example, if I'm in apparel, what can I do with AIGC? I might not know what kind of clothes will sell well, so I could design a prompt describing what the clothes look like, generate images, filter them, do small-batch ad testing, and use data feedback to inform subsequent design decisions.
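
The "small-batch ad testing" step above can be sketched in a few lines. This is only an illustration under stated assumptions: the design names and click counts are made up, and ranking by the lower bound of a Wilson score interval is one reasonable choice for small samples, not a method described by the speaker. The function names are hypothetical.

```python
import math


def wilson_lower_bound(clicks, impressions, z=1.96):
    """Lower bound of the Wilson score interval for a click-through rate."""
    if impressions == 0:
        return 0.0
    p = clicks / impressions
    denom = 1.0 + z * z / impressions
    centre = p + z * z / (2 * impressions)
    margin = z * math.sqrt(
        p * (1 - p) / impressions + z * z / (4 * impressions ** 2)
    )
    return (centre - margin) / denom


def rank_designs(results):
    """results maps a design id to (clicks, impressions); best design first."""
    return sorted(
        results,
        key=lambda design: wilson_lower_bound(*results[design]),
        reverse=True,
    )


# Made-up numbers for three AI-generated designs in a small ad test.
trial = {
    "design_a": (30, 1000),   # 3.0% raw click-through rate
    "design_b": (2, 50),      # 4.0% raw rate, but very few impressions
    "design_c": (12, 1000),   # 1.2% raw click-through rate
}
ranking = rank_designs(trial)
```

Here `design_a` ranks first even though `design_b` has the higher raw rate, because 50 impressions are too few to trust. That is the kind of data feedback that can inform which generated designs to develop further.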

Also, if your company has stronger technical capabilities, there is one thing you can certainly do: define your own inputs and outputs. Some data sources are available only to certain companies, and today you can use cloud training resources to train your own models that output content for specific domains. It's worth noting that inputs need not be natural language descriptions; they can also be data analysis descriptions, legal documents, and more.

>> Ding Jiao: Some worry AIGC may replace humans; others think AIGC isn't capable enough yet. How will humans coexist with AI in the future?

Liu Xinhua: AIGC can't completely replace human work, but it is indeed a powerful assistant for humans: augmentation, not replacement. We should stay open-minded, because the speed of AI evolution matters more than how low the starting point is. Today AIGC has broken out as a new generation of productivity tools. Every entrepreneur, developer, and creator should pay attention to AIGC's progress. I suggest trying products like ChatGPT, Jasper.ai, Midjourney, and Stability.ai in your own field; you will likely make some very interesting discoveries.

And future AIGC entrepreneurs should think: your product needs to seamlessly integrate with existing workflows, making them more efficient. This low-friction integration more easily drives commercial adoption of AIGC. For example, Jasper.ai has excellent product experience — not only horizontally integrating with all mainstream user creation workflows, but also vertically integrating SEO and marketing optimization tools, multilingual translators, and compliance tools for identifying copyright issues. The workflow supports remixing and re-editing. So excellent AIGC products don't compete with humans, but better combine with existing human tools to amplify human strengths; they also let AI progress faster and adapt to specific scenarios. This "non-intrusive experience" has enabled Jasper to achieve, in less than two years since founding, 100,000 paid users, ARR exceeding $90 million, and annual user retention over 57%.

As AIGC is applied, new job types will also emerge, for example generative AI trainers or prompt engineers. "How to write a good prompt so AI better understands you and produces better output" is one direction worth thinking about.

Looking ten years ahead, many unexpected things or new species may appear. For example, fashion designers once prided themselves on aesthetic ability, but future AI capabilities combined with excellent designers may transcend past frameworks and create entirely new aesthetics.

Looking to the future, as we approach general artificial intelligence, AI will inevitably be humanity's friend. As NVIDIA CEO Jensen Huang said, AI will help address population and labor shortages in the future, improving overall societal productivity and prosperity.