"Taishi Intelligent Navigation" Releases World's First Scalable Real-World Embodied Multimodal Dataset, Beating Tesla by About Six Months | Linear Portfolio

Linear Capital · October 10, 2025

A human-centric embodied data engine paradigm.

Today, Tashi Intelligence unveiled World In Your Hands (WIYH), the world's first large-scale real-world embodied VLTA multimodal dataset.

This milestone establishes Tashi Intelligence's industry-first Human-Centric embodied data engine paradigm — a technical approach roughly six months ahead of Tesla's Optimus.

Linear Capital invested in both Tashi Intelligence's $120 million angel round and its $122 million angel+ round. Linear has long tracked technological and commercial progress in embodied intelligence, and looks forward to startups like Tashi opening a new chapter in general-purpose AI.

On October 10, Tashi Intelligence officially released World In Your Hands (WIYH), the world's first large-scale real-world embodied VLTA (Vision-Language-Tactile-Action) multimodal dataset, with plans to open it to the industry in December 2025. This achievement marks the formal establishment of Tashi Intelligence's industry-first Human-Centric embodied data engine paradigm — a technical approach roughly six months ahead of Tesla's Optimus.

For a long time, mainstream large model pretraining has relied on internet data and simulation data, both of which suffer from critical shortcomings: internet data varies wildly in quality and lacks action information; simulation data has limited realism and poor scene generalization, making it difficult for trained models to transfer smoothly to the real world. For humanoid robots, the greatest barrier to "embodied intelligence" is not the algorithms themselves, but how to obtain scalable, authentic, and generalizable data. The scarcity of high-quality training data has become a widely acknowledged bottleneck.

Dr. Wenchao Ding, Chief Scientist at Tashi Intelligence, stated that the release of the WIYH dataset marks the industry's first large-scale cross-industry, cross-task collection of vision, language, tactile, and action multimodal data in the real world, laying the groundwork for future scaling laws in embodied foundation models.

The Human-Centric first-person data collection videos released by Tashi Intelligence show how WIYH differs from the static, monotonous environments of labs and data factories: it draws on authentic work scenarios and workers across multiple industries, capturing standard operating procedure data for embodied tasks ranging from hotel laundry to supermarket stocking to logistics operations. WIYH's data not only addresses the problems of "insufficient volume, low quality, and high cost"; it also ensures that the data comes from the real world.

WIYH is characterized by four key attributes:

Authentic: Collected from real embodied tasks, closely aligned with actual model application scenarios;
Rich: Spanning multiple industries and manipulation skills, giving models transfer and generalization capabilities while breaking down data reuse barriers;
Comprehensive: Encompassing full ground-truth multimodal data across vision, language, tactile, and action, facilitating pretraining modality alignment;
Massive: At a scale comparable to large language models, ensuring the future potential of embodied intelligence.

Laundry worker wearing self-developed collection equipment to fold towels

Building on these four core characteristics, the WIYH dataset delivers three distinctive advantages:

First, in modality completeness: Through self-developed collection hardware, it simultaneously captures vision (RGB), force-tactile (pressure sensor signals), and action (finger joint poses and end-effector trajectories) data, ensuring precise temporal and spatial alignment across multimodal sources;

Second, in data annotation pipeline: WIYH leverages proprietary cloud-based foundation models for high-precision annotation, covering multi-granularity ground-truth labels including 2D semantics, scene depth, operation task decomposition, object affordance, and hand and end-effector motion trajectories — providing comprehensive, multidimensional supervisory signals for embodied foundation model pretraining;

Third, in collection environment: Rather than the industry's typical high-cost dedicated data collection and training factories, Tashi embeds itself in authentic living and working environments, capturing workers' standard operating procedures in non-constructed, non-proprietary, non-enclosed settings. This significantly enhances data authenticity, diversity, and generalization while reducing collection costs by more than an order of magnitude.
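To make the "temporal alignment" in the first advantage concrete, here is a minimal Python sketch of one common approach: pairing each RGB frame with the nearest tactile and pose readings by timestamp. This is an illustration only, not Tashi's actual pipeline. The `AlignedSample` fields, the `align` function, and the 20 ms `max_gap` tolerance are all hypothetical, and the sketch assumes every stream is non-empty, sorted by time, and stamped on a shared clock.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class AlignedSample:
    """One time-aligned record. Field names are hypothetical, not WIYH's schema."""
    t: float                      # RGB frame timestamp in seconds (reference clock)
    pressure: float               # nearest tactile pressure reading
    joint_pose: Sequence[float]   # nearest finger joint angles


def nearest_index(timestamps: Sequence[float], t: float) -> int:
    """Return the index of the timestamp closest to t (sorted, non-empty input)."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick whichever neighbor is closer to t.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


def align(frame_ts, tactile_ts, tactile_vals, pose_ts, pose_vals,
          max_gap: float = 0.02) -> List[AlignedSample]:
    """Pair each RGB frame with the nearest tactile and pose readings.

    Frames with no reading within max_gap seconds in either stream are dropped.
    """
    samples = []
    for t in frame_ts:
        ti = nearest_index(tactile_ts, t)
        pi = nearest_index(pose_ts, t)
        if abs(tactile_ts[ti] - t) > max_gap or abs(pose_ts[pi] - t) > max_gap:
            continue  # no sufficiently close reading; skip this frame
        samples.append(AlignedSample(t, tactile_vals[ti], pose_vals[pi]))
    return samples
```

Nearest-neighbor matching is the simplest choice here. A real system might instead rely on hardware-synchronized triggers or interpolate between readings, and the source does not say which method WIYH uses.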

Supermarket worker wearing self-developed collection equipment to restock shelves

The introduction of Tashi Intelligence's World In Your Hands marks the establishment of a Human-Centric embodied data paradigm and makes it possible to pretrain embodied AI World Engines oriented toward the real world. Rooted in "thousands of industries," WIYH aims to achieve "one model, a thousand tasks": becoming a critical corpus and piece of infrastructure for training general-purpose embodied foundation models, pushing industry applications from single-task operations toward general manipulation capabilities, and laying solid groundwork for embodied robots to truly enter millions of businesses and households.

Tashi Intelligence is committed to providing the industry with the highest-quality robot hardware, data, and model solutions. The WIYH dataset is scheduled to open in December 2025. Research institutions and partners are welcome to collaborate with Tashi in building an open and thriving embodied intelligence ecosystem, and to open a new chapter in general-purpose AI.