Toward the Agentic AI Era: Three Underlying Threads Behind the Blooming Landscape | Gaorong Ventures × Volcano Engine

Gaorong Ventures · July 28, 2025

From Deep Research to Deep Action.

We are entering the era of Agentic AI. In this new wave, we are no longer satisfied with conversation and generation alone — we are also pursuing action and autonomous execution. Beyond driving innovative applications, Agentic AI is marching toward broader industrial practice and deeply embedded scenarios.

The leap toward autonomous execution depends on advances in large model reasoning capabilities, multimodal understanding, and the ability to handle complex multi-step tasks, as well as a comprehensive upgrade of AI infrastructure.

Recently, Volcano Engine released Doubao 1.6, the Doubao video generation model Seedance 1.0 Pro, and an AI-native cloud product suite — not only solidifying its technical advantages but also co-creating with various partners to reshape productivity and ways of thinking.

In July, Gaorong Ventures brought together over 60 entrepreneurs to visit Volcano Engine, tracking its latest developments in foundation models, AI-native cloud infrastructure, and Agents, and exploring the technical possibilities and boundless potential of Agentic AI to transform industries.

"We've reached a new inflection point where the next generation of AI entrepreneurs has the opportunity to acquire users and capture corresponding commercial value at scale in better ways," said Shuo Hu, partner at Gaorong Ventures, in his opening remarks. As model capabilities improve and infrastructure matures, users have gradually shifted from skepticism to deep conviction in the value of AI applications.

Model providers and cloud platforms are likewise pushing hard to get AI capabilities deployed more deeply. As Song Tao, general manager of Volcano Engine's North China business, put it: "With the continuous iteration of ByteDance's large model technology and the evolution of Volcano Ark as a large model service platform, we look forward to supporting more enterprises in achieving business growth driven by large models, and providing real momentum for their digital and AI transformation."

Hu pointed out that with the release of a series of reasoning models this year and the emergence of AI Agent products, AI applications are evolving from solving complex, research-oriented tasks (Deep Research) toward solving complex, execution-oriented tasks (Deep Action), and that much of Agentic AI's potential has yet to be unlocked.

He also shared several recent observations on AI applications with the entrepreneurs, including that "AI is moving from AI as Tool toward AI as Outcome." "One intuitive benchmark is whether an AI application can charge by results. So we are bullish on AI applications or AI Agents that can adopt outcome-based pricing — this means taking responsibility for reasoning results and delivering truly valuable outcomes to users."

Wu Di, head of intelligent algorithms at Volcano Engine, said that for all the diversity flourishing in 2025, both AI applications and Agentic AI are advancing along three main threads, and that Doubao and Volcano Engine are continuously upgrading along the same three.

Thread One: Thinking Models with Visual Understanding

Wu noted that thinking models, which emerged in early 2025, were still mainly serving search engines and deep research in vertical industries through Q2. "Is this the ceiling for thinking models? Of course not. Because we haven't yet given reasoning models truly difficult problems."

Wu believes that the truly difficult problems involve understanding real-world visuals and making inferences and decisions based on them. He expects more model providers this year to push hard on reasoning models with multimodal understanding. Such models have the potential to empower many high-difficulty, high-value industries with strong product-market fit (PMF).

Take the recently released Doubao 1.6 as an example: it already achieves relatively high accuracy on multimodal understanding tasks and is starting to be deployed in scenarios such as e-commerce (e.g., product review), automotive (e.g., autonomous driving annotation), and routine inspection.

Thread Two: Capability Improvements in Video Generation Models

This year, video generation models worldwide have kept improving, and some can now genuinely create value in real production environments. Since the global release of the Seedance 1.0 Pro model in June, daily video generation volume has grown at a staggering pace.

Wu made two predictions about video generation models. First, by the end of this year, leading video generation models will see another leap in generation quality and move closer to commercial-grade requirements, with significantly better handling of complex scenes and small moving objects. Second, the paths of video generation models and large language models will increasingly converge, with more emphasis placed on the intelligence of generation models. More and more model providers will unify multimodal generation with the understanding capabilities of large models.

Thread Three: Maturation and Penetration of Multi-Step Complex Task Capabilities

Wu predicts that by the end of this year or early next year, preliminary Agentic AI or AI Agents will begin entering production and daily life. "The marker is that you'll gradually get used to handing an Agent a task worth maybe 5 to 20 RMB and trusting its output."

Multi-step complex tasks are inseparable from tool use. In June, Volcano Engine released MCP Hub to help enterprises build Agents efficiently.

Wu expects that as Agentic AI matures, it will trigger a wave of reinforcement learning. "When Agentic AI or Agents enter real production and daily life, the value per task rises while the tolerance for errors falls, which calls for extensive post-training and reinforcement learning around these high-value scenarios. By the first half of 2027, China's post-training or reinforcement learning compute share may approach or even exceed that of pre-training."

Beyond large models, Volcano Engine is also leveraging multiple vehicles — the Coze platform, AI-native cloud services, and accelerators — to provide critical support for visionary startups, comprehensively accelerating the journey of AI innovation from concept to market.

Coze: Reshaping Productivity with Agents

Yang Yun, Coze product solutions specialist at Volcano Engine, said that Coze was upgraded in April this year to a new-generation AI Agent platform, attracting enterprises and developers of all kinds to build AI applications on Coze end to end. "Once enterprises experience Coze as a platform representing advanced productivity, it's hard to go back to the era of high-code orchestration of AI applications."

Addressing actual enterprise needs, Coze emphasizes providing a development experience that is accessible, efficient, and affordable.

Accessible: Visual interface allowing enterprises to easily orchestrate, debug, and deploy applications, with seamless connection to business systems.
Efficient: Built-in tools, plugins, and multi-scenario templates that significantly boost development efficiency. From idea to project delivery, cycles can be measured in weeks or even days.
Affordable: Charged based on model usage.

To date, a wave of applications has been built on the Coze development platform. These include internal efficiency applications such as intelligent Q&A, information processing, and content generation, as well as external-facing scenarios such as companion Agents for smart hardware and Agents for education and training. In smart hardware, for example, customers can use Coze to orchestrate intelligent agents into an end-to-end hardware solution that quickly lets a device "speak and interact."

AI-Native Cloud: Stronger, Faster, More Open

In the Agentic AI era, development paradigms have shifted as well, accelerating the evolution of the AI-native cloud. Luo Hao, head of Volcano Engine's cloud foundation products, said that Volcano Engine aims to keep optimizing its cloud foundation so that developers get more cost-effective inference, faster AI application development, and more effective and efficient post-training of reasoning models.

To this end, Volcano Engine recently packaged and launched three kits: AgentKit, TrainingKit, and ServingKit.

AgentKit helps developers deploy Agents within minutes. Building on Doubao 1.6's leading visual understanding capabilities, it enables Computer Use and Browser Use. It also emphasizes an "out-of-the-box" experience: the tools needed for Agent development can be deployed with one click, and various Tool Use and MCP services are integrated.

ServingKit primarily helps developers optimize inference capabilities. Through broad ecosystem support, operator optimization, and disaggregated parallel architecture, it can reduce TTFT (Time To First Token) to 0.5–1 seconds and achieve a 3x improvement in TPS (Tokens Per Second, one of the key metrics for measuring model inference efficiency).
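To make the two metrics concrete, here is a minimal sketch of how TTFT and TPS could be computed from the token arrival times of a single streamed response. It is not ServingKit's measurement method. The `measure` function, the `LatencyStats` type, and the sample trace are hypothetical, and TPS is assumed here to mean output tokens divided by total request time (other definitions, such as decode-only throughput, are also in use).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LatencyStats:
    ttft_s: float  # time to first token, in seconds
    tps: float     # output tokens per second over the whole request (assumed definition)


def measure(request_start_s: float, token_times_s: Sequence[float]) -> LatencyStats:
    """Compute TTFT and TPS for one streamed response.

    token_times_s holds the arrival time of each output token, in seconds,
    on the same clock as request_start_s and in non-decreasing order.
    """
    if not token_times_s:
        raise ValueError("need at least one token timestamp")
    ttft = token_times_s[0] - request_start_s
    elapsed = token_times_s[-1] - request_start_s
    if elapsed <= 0:
        raise ValueError("token timestamps must be after the request start")
    return LatencyStats(ttft_s=ttft, tps=len(token_times_s) / elapsed)


# Hypothetical trace: the first token arrives at 0.8 s, then one token every 0.05 s.
times = [0.8 + 0.05 * i for i in range(25)]
stats = measure(0.0, times)
print(f"TTFT = {stats.ttft_s:.2f} s, TPS = {stats.tps:.1f}")
```

In this made-up trace the response would fall inside the 0.5 to 1 second TTFT range quoted above. A real benchmark would aggregate many requests and report percentiles rather than a single sample.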

With TrainingKit, Volcano Engine has strengthened optimization for post-training in addition to pre-training. Drawn by its training efficiency and stable training environments, numerous autonomous driving, embodied intelligence, and robotics customers have chosen Volcano Engine for training.

Volcano Engine Accelerator: Helping AI-Era Startups Succeed

Zhou Yilei, director of Volcano Engine's startup ecosystem and accelerator business, introduced how the V-START · Volcano Engine Accelerator helps startups accelerate their business and growth through internal and external acceleration systems.

From the internal perspective, the accelerator provides startups with large model foundations and dedicated technical support, compute subsidies, and joint product co-creation. Externally, it partners with investment institutions, tech parks, universities, and business schools to expand financing channels and introduce customer opportunities for startups.

Although developing AI Agents and various AI applications has "never been simpler" today, creating value with AI in real scenarios still has enormous untapped potential, requiring continuous accumulation of experience and rapid iteration in practice.

In a discussion moderated by Jia Rui, head of Volcano Engine's startup ecosystem and accelerator, Shi Yang of TRAE, Zhang Yan of Lark growth, Fan Wei of Deloitte AI Institute, and Jin Xin of Gaotu Techedu's AI R&D department shared their practices and reflections on implementing AI from the perspectives of AI coding, enterprise AI, AI + professional services, and AI + education.

TRAE: AI Coding from Code Generation to Software Delivery

Shi Yang said that since the international version of the TRAE IDE was released in January and the China version launched in March, TRAE has honed three core product capabilities: Cue code completion, Chat Q&A, and Agent code generation. TRAE now has over 1 million monthly active users, and more than 6 billion lines of generated code have been adopted cumulatively, making it a popular AI IDE among developers. Shi emphasized that TRAE remains focused on serving professional developers and helping them work more efficiently in real work scenarios.

Over the past two years, the AI coding field has made great strides: the 1.0 stage focused on "code generation," improving coding efficiency through plugins or IDEs; with model capability upgrades, the 2.0 stage will move toward "software delivery," achieving end-to-end tasks in complex scenarios.

Looking ahead to the coming year, Shi believes that "we will soon see further improvements in the efficiency and intelligence of models in the coding domain," and that there are many new opportunities around AI coding, such as coding for general Agent scenarios, or generation through multimodal approaches.

Lark: AI That Truly Works and Truly Lands

Lark is committed to being the best partner for enterprises implementing AI. Zhang Yan said that this year Lark put forward an important theme: "AI that truly works and truly lands." To this end, Lark released the industry's first AI application maturity model, which borrows from the L1 to L4 levels in autonomous driving to classify the maturity or usability of AI applications into four stages, M1 to M4. M1 is proof-of-concept, M2 is early adoption, M3 is production-ready, and M4 is full deployment.

"The value AI delivers in enterprises today isn't about features, it's about results." Zhang pointed out that the most representative application that can be called M4 today is Lark's new intelligent meeting minutes; M3-level applications include enterprise knowledge Q&A and multidimensional tables.

Deloitte: Enterprises Clarifying Real Needs and Scenarios

Fan Wei of Deloitte shared observations and reflections from helping numerous clients implement AI. Over the past two years, many large enterprises approached AI implementation in a "pre-made meal" mode — there were good models on the market, so they called APIs and explored tentatively; today it's more customized, with enterprises gradually clarifying their real needs and scenarios, assessing technical feasibility from needs, and truly achieving implementation.

Additionally, with the ChatGPT moment, the DeepSeek moment, and similar inflection points, AI adoption has become far more widespread, especially among young "natives" of the AI era, who are also pushing large enterprises to implement AI.

Gaotu Techedu: All with AI, Always AI

As a leading company in education, Gaotu continues to explore the possibilities of "education + AI," and this year proposed its strategy of "All with AI, Always AI." Jin Xin shared that Gaotu is building out best practices for "education + AI" across three dimensions: product capability, operational capability, and organizational capability.

Product capability: Gaotu hopes to give users a 24/7 "study companion" that offers personalized learning services that are professional, efficient, and warm.
Operational capability: Operations in the education industry are labor-intensive, so Gaotu hopes to give teachers and colleagues a 24/7 "teacher companion" that takes over repetitive work and improves efficiency and service quality across the board.
Organizational capability: Although AI keeps getting stronger, it is unlikely to fully replace humans, so a human-AI hybrid approach is expected to remain in place for a long time. This requires rethinking what an AI-driven organization looks like: for example, asking colleagues to think about AI, use AI, and, more importantly, believe in AI.

With models advancing, infrastructure improving, and application forms and industry implementations diversifying, we are fast approaching the eve of an Agentic AI explosion. We look forward to exploring the innovative possibilities and growth opportunities within it alongside more entrepreneurs, with both pragmatism and a sense of romance.