How Far Are We from LLMs Creating New Commercial Growth and Super Apps? | Ronghui · Inside Volcano Engine

Gaorong Ventures · April 10, 2024

Large language models are driving new business growth and novel forms of interaction.

2024 is the pivotal year when large language models transition from technology to real-world application. On this journey, ByteDance is a player that cannot be ignored — from launching its Skylark foundation model to rolling out the Ark LLM service platform, the company is now exploring how to integrate large models into dozens of business scenarios across its ecosystem.

How can LLM capabilities empower core business functions and drive new commercial growth?
What prerequisites must a super app meet in the LLM era?
How can complex scenarios like Douyin E-commerce deploy large models to upgrade user experience?
What evolutionary possibilities await content and interaction paradigms in the next era?

Recently, Gaorong Ventures invited more than 30 innovative companies to visit Volcano Engine for discussions on these topics, exchanging insights on ByteDance's explorations and practices in the large model space, as well as the underlying cloud computing services and LLM ecosystem that Volcano Engine has built.

Ecosystem and Openness

Building an AI Innovation Ecosystem

"Only when we achieve a closed loop from underlying technology to capital acceleration to upper-layer applications can we truly establish China's AI innovation ecosystem." In his opening remarks, Volcano Engine President Dai Tan stated that in the coming period, ByteDance and Volcano Engine aim to strengthen their foundation model capabilities while also addressing the innovation costs faced by enterprises. Volcano Engine plans to launch a series of incubation programs to lower the barrier to entry and reduce costs for businesses using large model services. The company will also partner with venture capital firms to help startups accelerate their path to a closed loop.

"The innovation opportunities brought by large models are truly exciting," noted Rui Han, partner at Gaorong Ventures. In this wave of innovation, ByteDance is one of the most significant players in the large model space, he explained, because its massive user base and richly diverse business scenarios provide the ideal proving ground for large models and AI technology. "Volcano Engine has not only earned an excellent reputation within ByteDance but is also opening up the technologies, growth methodologies, and tools accumulated during ByteDance's rapid development to external enterprises — to some extent, this is driving the industry forward."

Models and Compute

Prerequisites for the Emergence of Super Apps

In June 2023, Volcano Engine launched Ark, its large model service platform, offering model training, inference, evaluation, and fine-tuning capabilities. Wu Di, head of intelligent algorithms at Volcano Engine, predicted in his presentation that China will see an explosion in application-oriented compute power within the next two to three years. He estimated that around late 2024 or early 2025, demand for application-oriented compute (inference + fine-tuning) will surpass demand for training-oriented compute.

Wu described China's current application market as being like early spring — "from a distance, the ground still looks barren, but if you listen closely, you can feel the earth trembling, and soon bamboo shoots will sprout from every corner." Many customers are already gradually finding their footing on the Ark platform, learning how to apply large model capabilities to their core businesses. "When they start pushing for deeper penetration, the explosion in inference usage will be staggering."

To meet the impending surge in compute demand, Wu said, Volcano Ark will forge three core competitive advantages to truly help entrepreneurs solve critical problems.

Quality Models: Volcano Ark is building a "multi-cloud, multi-model" ecosystem that encompasses ByteDance's Skylark model alongside models from Zhipu AI, Moonshot AI, and others, while selectively introducing some high-capability open-source models.

Abundant Compute: Volcano Ark has done extensive foundational systems work, with extremely flexible scheduling and ample compute resources, so that customers feel like they're turning on a faucet with endless tokens flowing out — supporting their pursuit of success in the large model race.

Low Costs: Cost-effectiveness is measured by the customer experience. The large model era will only truly arrive when everyone can easily and freely purchase large model services and applications.

Wu added that beyond these three foundational capabilities, Volcano Ark is bringing in internal and external partners for critical functions like agent orchestration, content safety, and RAG (retrieval-augmented generation), allowing customers to access integrated, high-quality resources on a one-stop basis. For agent orchestration, for example, the Ark platform connects with Coze (coze.cn), the AI bot development platform, to help customers efficiently build lightweight prototypes.

As large models, compute infrastructure, and related foundations gradually mature, when will super apps emerge? "A super app is a faithful reflection and expression of large model capabilities — like a bottle and its wine. The app is the bottle; what matters is how mature the wine is." Wu believes that, under this premise, super apps may require three necessary conditions:

First, a super app needs to aggregate demand from many people, just as Douyin E-commerce does.

Second, a super app will inevitably lower the barrier to human-computer interaction, hiding complex planning and logic behind the scenes, much as CapCut made video editing accessible to everyone.

Third, a super app will likely have deep hardware integration. In the long run, the signals captured by large models won't be limited to text; they will likely integrate well with sound, vision, light, and even human movement and acceleration.

Looking further ahead, Wu believes that the emergence of large language models will expand each individual's information processing capacity, enabling organizations to achieve multiples of their previous output at the same scale, and ultimately transforming the economic environment and competitive landscape in business. How should enterprises respond to such changes? Wu shared his team's own practice: "We want all future output from team members to be LLM-friendly. Documents, code, and so on should all be easy for large models to be fine-tuned on and to use, so that the models absorb the team's intellectual output and form a positively reinforcing flywheel."

Applications and Scenarios

The Ultimate Goal Is Converting to Enterprise Revenue

Addressing entrepreneurs' concerns about practical large model deployment, Luo Yihang, head of AI application products at Volcano Engine, and Li Xiaoqing, a technical expert at Douyin E-commerce, shared their hands-on experiences in AIGC marketing and Douyin E-commerce scenarios respectively.

Luo noted that moving from models to scenarios — especially enterprise scenarios — requires polishing, validation, and deployment before there is any prospect of converting to enterprise revenue.

Taking AIGC marketing applications as an example, Volcano Engine has built a suite of applications based on foundational large model capabilities that span customer outreach, acquisition, and engagement — improving creative efficiency, interaction efficiency, and retention efficiency. For marketing departments, the Intelligent Creative Cloud offers intelligent creative insights, AIGC asset generation and content creation, and multi-platform distribution management. An increasing number of customers are also achieving notable results through AR + AIGC interactive marketing. For sales teams, the company has developed an intelligent sales assistant to improve efficiency, encompassing capabilities such as intelligent outbound calling, intelligent coaching, lead scoring, store management, and knowledge assistance.

Douyin E-commerce, as a typical application scenario with dense content and high-frequency interaction, has actively explored using large models to improve user experience, deploying applications such as intelligent shopping guides, intelligent robots, and copy generation.

Li Xiaoqing analyzed how intelligent shopping guides powered by large models can "outperform" previous approaches and complement search. "In the past, customer service robots facing consumers' varied requests needed multiple modules, each requiring separate training and tuning — the entire system became extremely complex, with limited comprehension and relatively mechanical responses that struggled with multi-turn interaction. With large models, you simply input the query, context, and business state, and the model generates an answer — broader in scope, higher in quality, and with lower annotation costs."
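The single-call pattern Li describes, in which the query, context, and business state all go into one model request, can be sketched as prompt assembly. This is only an illustration under stated assumptions, not Douyin E-commerce's implementation: the `BusinessState` fields, the `build_prompt` helper, and the prompt layout are hypothetical, and the call to an actual model is left out.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessState:
    """Hypothetical business state; the field names are illustrative only."""
    order_status: str = "none"
    cart_items: list[str] = field(default_factory=list)


def build_prompt(query: str, context: list[str], state: BusinessState) -> str:
    """Combine the query, earlier turns, and business state into one prompt."""
    # Earlier conversation turns, one bullet per turn.
    history = "\n".join(f"- {turn}" for turn in context) or "- (no earlier turns)"
    cart = ", ".join(state.cart_items) or "empty"
    return (
        "You are a shopping guide.\n"
        f"Conversation so far:\n{history}\n"
        f"Order status: {state.order_status}\n"
        f"Cart: {cart}\n"
        f"Customer question: {query}\n"
    )


prompt = build_prompt(
    query="Which of these is better for trail running?",
    context=["Customer asked about waterproof shoes."],
    state=BusinessState(cart_items=["trail shoe A", "trail shoe B"]),
)
print(prompt)  # this text would be sent to the model in a single request
```

The point of the sketch is that one function replaces the separately trained modules of the older pipeline: everything the model needs arrives in a single input.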

Content and Interaction

New Possibilities Brought by Large Models

As AI technology advances, how can experiences in games and other products be upgraded — and what entirely new forms of content and interaction can be created? Under the moderation of Chen Qilin, executive director at Gaorong Ventures, ByteDance's UGC team and three entrepreneurs discussed their predictions for the surprises AI will create for us in the near future.

AI Enables Three Breakthroughs for Future Content Platforms

Yunzhongzi Technology has developed MidReal, a storytelling platform for overseas consumers. Kaijie Chen, CEO of Yunzhongzi Technology, explained the motivation behind building this platform: "We believe storytelling is the essence of all entertainment forms and will be at the core of the coming AI-native era. Novels, film scripts, short video scripts, and game narratives are all stories."

In Chen's view, AI enables three breakthroughs for future content platforms, including gaming platforms.

First, AI breaks the definition of tools. In the past, humans acted as producers using various tools to create content and assets. But as agents further develop, AI can itself define problems and solve them — so agents may graduate from tools to become producers themselves.

Second, AI achieves consumption-as-production. Consumers themselves participate in the production process, with production happening in real-time interaction with consumers. On a storytelling platform, for example, a user might click once and generate a storyline that develops along their ideas — at which point the boundary between content production and consumption becomes quite blurred.

Third, AI changes who counts as a value creator. When the roles of producer and consumer can switch freely, how the platform distributes generated revenue also changes.

Chen also looks forward to integrating multimodal large models into the backbone of storytelling in the future — "turning stories directly into images, videos, 3D, and so on. This could be very exciting within the next 1-2 years and bring new structural changes to the content and entertainment industries."

AI Opens More Imaginative Space for Future Game Formats

Hyperparameter is an AI-focused technology company dedicated to creating a virtual world where 1 billion humans and 10 billion AIs coexist. Its specific capabilities include building AI bots for games and constructing the technical systems for AI NPC ecosystems. As native inhabitants of virtual worlds, AI bots and AI NPCs not only help players enjoy more pleasurable gaming experiences but also bring new content and social relationships to humanity.

Zhu Hengman, head of platform technology at Hyperparameter, explained that as large model technology advances, AI capabilities in virtual worlds are continuously improving. AI characters can better understand player behavior and language: they can hold free-form conversations, perceive environmental changes and respond accordingly, and generate new plotlines in games. AI that interacts independently can bring more possibilities and imaginative space to future game formats and gameplay.

Regarding future technology expectations, Zhu discussed DeepMind's recently released world-generation model Genie (Genie: Generative Interactive Environments), which can generate interactive video that responds to real-time user action inputs. Although the video quality generated by Genie still lags behind Sora, it demonstrates the possibility of real-time interactive video generation — once mature, this could bring disruptive changes to many industries.

AI Creates Interesting Souls and Beautiful Shells

Shu Zhi, head of game AI technology at ByteDance UGC, shared his team's current work — "using AI technology to create interesting souls and beautiful shells in games." Beautiful shells might be NPCs or character creation in games; interesting souls require underlying capabilities in large language models, reinforcement learning, and other technologies.

UGC games are widely seen as an inevitable trend. How can generative AI be applied in this context? Using this year's much-discussed AI generation of 3D content as an example, Shu predicted that players may in the future be able to create a home or city from a simple prompt. Avatars are another case: sculpting a character's face used to be somewhat difficult, but with AI, users could upload a photo or enter a text prompt to do it. Going further, it may become possible to generate 3D items, and even new levels and plotlines, enriching creative gameplay.

From 3D Generation to World Models

Yingmu Technology, a member of the Volcano Engine accelerator program, focuses on 3D generation, including character and object generation. Zhang Qixuan, CTO of Yingmu Technology, noted, "The biggest lesson from works like Sora is: don't try to ascend to a new modality from the previous one." Before Sora, most video generation relied on pre-trained 2D generation models that output frames kept sequentially consistent within a certain range. In Sora's architecture, by contrast, video is output directly, so stability and consistency across frames do not have to be solved as a separate problem. "It inherently possesses the ability to simulate the physical world, and the same applies to 3D generation."

In the large model era, startups not only need to reshape their development models, product models, and business models — they are also being compelled to build LLM-friendly organizations to better respond to the coming technological transformation and new competitive landscape.