Vision-Language-Action
VLA
Vision-Language-Action (VLA) is a model architecture that connects visual perception, language understanding, and physical action for robotics and embodied AI. As described by Yunqi Capital portfolio companies, it is a direct-action paradigm: the model translates what it sees and what it is told into movement. This distinguishes it from world-model approaches, which add an intermediate step of "imagining" the consequences of an action before acting.
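The contrast between the two paradigms can be illustrated with a toy sketch. Everything here is hypothetical and chosen only for illustration: the one-dimensional `Observation`, the instruction string "reach target", the hand-written `predict_next` dynamics, and the candidate action set. Real VLA systems use learned neural networks over camera images and free-form text, not rules like these.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Observation:
    """Toy stand-in for a camera frame: two positions on a line."""
    gripper_x: float
    target_x: float


def direct_policy(obs: Observation, instruction: str) -> float:
    """Direct-action style: map (observation, instruction) straight to an action."""
    if instruction != "reach target":
        return 0.0  # unknown instruction: do nothing
    error = obs.target_x - obs.gripper_x
    return max(-1.0, min(1.0, error))  # step toward the target, clipped to [-1, 1]


def predict_next(obs: Observation, action: float) -> Observation:
    """Hypothetical world model: predict the observation after taking an action."""
    return Observation(obs.gripper_x + action, obs.target_x)


def world_model_policy(
    obs: Observation,
    instruction: str,
    candidates: Sequence[float] = (-1.0, -0.5, 0.0, 0.5, 1.0),
) -> float:
    """World-model style: imagine each candidate's outcome, then pick the best."""
    if instruction != "reach target":
        return 0.0

    def imagined_error(action: float) -> float:
        nxt = predict_next(obs, action)
        return abs(nxt.target_x - nxt.gripper_x)

    return min(candidates, key=imagined_error)


if __name__ == "__main__":
    obs = Observation(gripper_x=0.0, target_x=0.5)
    print(direct_policy(obs, "reach target"))
    print(world_model_policy(obs, "reach target"))
```

In this toy setting both policies choose the same step; the difference is that the second one evaluates predicted outcomes before committing, which is the extra "imagining" layer the paragraph above refers to.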
The architecture has become central to commercial autonomous driving and robotics development. DeepRoute (元戎启行) has built its assisted-driving platform on VLA, with monthly mass-production deliveries surpassing 30,000 units in September 2025 and nearly 200,000 vehicles expected on the road by year-end. In robotics, companies such as Astribot and Stardust Intelligence (星尘智能) are pushing VLA toward general-purpose physical AI. Astribot's CLAP framework trains VLA models on both robot trajectory data and unlabeled human videos, using cross-modal alignment to bridge the gap between human demonstration and executable robot action. Stardust Intelligence's Lumo-1 takes a three-stage approach: it first builds embodied visual-language understanding, then performs cross-robot joint training, and finally grounds its reasoning in real-world manipulation trajectories collected from the company's S1 cable-driven robot.
The field's intellectual lineage traces to Google's RT-1 and RT-2 models; RT-2 in particular built on vision-language models pretrained on internet-scale data and then fine-tuned with robotics data. Evaluation of current systems, as 5Y Capital (五源资本) noted in mid-2026, remains relatively crude, with world models still tested "the way Edison tested filaments"; the core challenge is multi-step reasoning about how actions change the physical world.
AI-generated — may contain errors, please verify.
Coverage
It's 2026, and we're still evaluating World Models the way Edison tested filaments.
A Serious Discussion on World Models
5Y Capital

The "OpenAI Moment" for Pharma Labs: HeTan AI Raises 50 Million Yuan, Robot Scientists Step Onto the Lab Bench
The story of robot scientists has only just begun.
Yunqi Capital

DeepRoute's Real-World Road Test: When AI Learns to "Fear," the "Black Box" of Assisted Driving Is Opened | Yunqi Capital
When Cars Begin to Understand the World
Yunqi Capital

Yunqi Capital Quarterly | Upward, the Consistent Answer
New Growth, New Gains
Yunqi Capital

Wang Qian, Invariant Robot: How Far Is Embodied Intelligence's Scaling Law? | Yunqi Capital Doers Series
Embodied Intelligence ≠ Stuffing DeepSeek into a Unitree Robot
Yunqi Capital

Yunqi Capital | DeepRoute (Yuanrong Qixing) Surpasses 30,000 Units in Mass Production Deliveries for September, Setting Another Record
How VLAs Are Pioneering a New Future for Driving
Yunqi Capital





