Yunqi Capital | DeepRoute.ai x Volcano Engine, Partnering to Accelerate Agent Deployment in Vehicles
From Driving Machine to Intelligent Agent: The Evolution of the Car

When cars cease to be mere transportation and become intelligent agents endowed with perception, comprehension, and decision-making capabilities, they are evolving from simply "knowing how to drive" to "knowing how to think."
Recently, DeepRoute, a Yunqi Capital portfolio company and a leading Chinese intelligent driving firm, joined forces with Volcano Engine to unveil its latest Vision-Language-Action (VLA) model and reveal four core capabilities: "X-Ray Vision," "Know-It-All," "Interpreter," and "Responsive Assistant." The release marks a new phase in its AI driver technology.
In this "Yunqi Capital" feature, we take you inside how DeepRoute is exploring VLA models to build generalist capabilities for intelligent driving, pushing artificial general intelligence toward real-world embodied applications.
The following content is republished from "DeepRoute."
As an internationally leading AI company, DeepRoute is dedicated to building "physical-world AGI," developing innovative technologies to create AI drivers and achieve RoadAGI. On June 11, DeepRoute announced two major developments at the Volcano Engine Force Conference.
💡 Development 1
DeepRoute announced a partnership with Volcano Engine to co-build physical-world agents powered by Doubao large model capabilities — bringing intelligence into physical reality and creating an AI-driven future world.
💡 Development 2
DeepRoute released four new Vision-Language-Action (VLA) model capabilities: "X-Ray Vision," "Know-It-All," "Interpreter," and "Responsive Assistant."
1. Spatial Semantic Understanding
This capability gives AI vehicles "X-Ray Vision" for driving. It is designed to handle both static and dynamic blind spots, such as the area beneath a bridge underpass or a sightline blocked by a bus, by reconstructing and interpreting the full driving environment so that blind-spot risks can be anticipated. For example, when turning under a bridge, the VLA model can read a sign that says "Caution: Cross Traffic, Slow Down," account for the dynamic blind spot created by a bus, infer that pedestrians may be crossing ahead, and proceed cautiously with safety as the priority.

2. Irregular Obstacle Recognition
Functioning like a "Driving Know-It-All," the VLA model is a fast learner that absorbs vast knowledge from the internet and retains what it encounters in real-world driving. Corner cases become routine cases that the "Know-It-All" can handle. For instance, it can identify irregularly shaped, overloaded small trucks and navigate around them safely and smoothly.

3. Text-Based Sign Comprehension
Serving as a "Driving Interpreter," the model reads complex text-based signage, parses road signs efficiently, and selects the correct route. For example, at variable lanes or complex multi-lane intersections, it reads the text on waiting-area signs, picks the matching lane, and executes the maneuver, taking the right road at the right time.

4. Voice Interaction and Vehicle Control
In voice-controlled driving, the VLA model responds in a way that is "steady, precise, and nimble," earning its title as "Responsive Assistant." Users simply issue natural voice commands such as "slow down," "turn left at the third intersection ahead," "U-turn at the next intersection," or "pull over," and the system responds within seconds and executes with precision.
Meanwhile, the VLA model perceives changes in road conditions in real time and helps users make safe, sensible driving decisions such as "don't use the bus lane," "change lanes to avoid large vehicles," or "pass the slow car ahead." It understands not just the road but also the user, making for a smarter, more reassuring travel experience.

With the release of these four core VLA capabilities, DeepRoute has once again raised the technical ceiling for intelligent driving. From "X-Ray Vision" to "Responsive Assistant," the VLA model not only endows AI vehicles with unprecedented environmental perception and interactive capabilities, but also further advances the embodied realization of artificial general intelligence in the physical world.

Founded in 2019, DeepRoute has launched its latest-generation intelligent driving platform, DeepRoute IO, which operates without high-definition maps and uses an end-to-end model, along with a new-generation Vision-Language-Action (VLA) model. With competitive products and services, the company has established mass-production partnerships with multiple automakers and is jointly advancing the rollout of more than ten vehicle models. By the end of 2025, more than 200,000 vehicles equipped with DeepRoute's combined assisted driving solution are expected to reach consumer markets.
We look forward to DeepRoute continuing to build on technology and ground itself in real-world deployment, boldly pioneering new frontiers and driving toward a new milestone in the intelligent era.
