DeepRoute.ai × NVIDIA GTC: Production Vehicles with End-to-End Autonomous Driving Go on Sale This Year | Yunqi Capital

Yunqi Capital · March 21, 2024

End-to-end models will be compatible with Thor — NVIDIA's next-generation autonomous driving chip set for mass production next year

DeepRoute, a company Yunqi Capital backed in its angel round, announced its latest end-to-end model progress and its collaboration plans with NVIDIA at GTC.

Connecting with the physical world is one of the main directions of current AI development, and also a key focus area for us. DeepRoute is the first company in China to successfully deploy an end-to-end model in production vehicles. In his keynote speech at this year's GTC, DeepRoute CEO Guang Zhou stated that the potential of end-to-end models will be further unlocked, and this AI system capable of generalizing across the physical world will become humanity's "super assistant."

The following article is republished with permission from LatePost (ID: postlate).

China's autonomous driving industry has new developments in the race to implement end-to-end models.

On March 17 and March 20, DeepRoute CEO Guang Zhou attended the China EV 100 Forum and NVIDIA GTC respectively, announcing two advances in the company's end-to-end model:

Production vehicles equipped with DeepRoute's end-to-end autonomous driving solution will hit the market this year. According to sources, DeepRoute has secured at least 4 production models.
DeepRoute has partnered with NVIDIA, and its end-to-end model will be among the first to adapt to NVIDIA's next-generation autonomous driving chip Thor, which will enter mass production next year.

Guang Zhou introducing DeepRoute's end-to-end autonomous driving solution at GTC.

In autonomous driving, "end-to-end" means: using just one model to transform perception information collected by cameras and other sensors into operational signals like how much to turn the steering wheel or press the accelerator, enabling the vehicle to drive automatically.

Previously, the more common implementation was the modular approach, dividing perception, prediction, and planning into three separate modules. The perception module often used data-driven deep learning models, while the planning module required more traditional programming methods with explicitly defined rules.
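To make the contrast concrete, here is a minimal Python sketch of the two interfaces. Everything in it is hypothetical: the `Control` type, the toy `perceive`/`predict`/`plan` stages, and the 10 m braking rule are illustrative stand-ins, not DeepRoute's design. The point is structural: the modular path chains three separately built stages and ends in a hand-written rule, while the end-to-end path is a single callable from sensor input to steering, throttle, and brake.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class Control:
    steering_deg: float  # steering-wheel angle
    throttle: float      # accelerator position, 0..1
    brake: float         # brake position, 0..1

# --- Modular approach: perception -> prediction -> planning (all hypothetical toys) ---
def perceive(frame: Sequence[float]) -> List[float]:
    # Toy perception: treat the frame as distances (m) to obstacles, dropping invalid readings.
    return [d for d in frame if d > 0]

def predict(obstacles: List[float], closing_speed: float = 2.0) -> List[float]:
    # Toy prediction: assume every obstacle is 2 m closer one step later.
    return [d - closing_speed for d in obstacles]

def plan(predicted: List[float]) -> Control:
    # Hand-written rule; each new corner case would need another branch like this.
    if predicted and min(predicted) < 10.0:
        return Control(steering_deg=0.0, throttle=0.0, brake=1.0)
    return Control(steering_deg=0.0, throttle=0.3, brake=0.0)

def modular_drive(frame: Sequence[float]) -> Control:
    return plan(predict(perceive(frame)))

# --- End-to-end approach: one model maps sensor input straight to controls ---
DrivingModel = Callable[[Sequence[float]], Control]

def end_to_end_drive(frame: Sequence[float], model: DrivingModel) -> Control:
    # No hand-written intermediate stages: whatever the model learned from data decides.
    return model(frame)
```

In a real system `model` would be a trained neural network; here any callable with the same signature can stand in for it, for example `end_to_end_drive([11.0], lambda f: Control(0.0, 0.0, 1.0))`.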

"An end-to-end model trained on massive amounts of data can give machines the ability to learn, think, and analyze autonomously, efficiently handling various scenarios on the road." Zhou believes end-to-end models will attract more car owners to use autonomous driving features.

In early 2024, as Tesla rolled out FSD v12, its end-to-end-based autonomous driving system, in North America, Chinese automakers including Xpeng Motors, Li Auto, and NIO increased their investment in end-to-end model R&D, racing to deploy it in vehicles to make their products more competitive.

In Zhou's view, end-to-end models aren't just about giving cars a better new version of autonomous driving systems. Because cars are essentially a special type of robot, end-to-end models will also form the foundation for general-purpose robots with far greater market potential in the future, serving as one of the driving forces of the AI 2.0 era: "Building on end-to-end models, it's possible to create a general artificial intelligence system for the physical world."

1

"End-to-end is the natural choice"

The currently mainstream modular approach has the advantage of being more technologically mature, which makes development more predictable. But under this architecture, autonomous vehicles run into obstacles when expanding their operating areas and adapting to the roads and environments of different regions: especially at the planning and control stage, engineers must write extensive code to define driving rules for corner cases. Modules trained solely on data struggle to handle situations they haven't encountered before.

This challenges the safety of autonomous driving systems. "Every pitfall needs to be filled with rules, but if you miss one, a single pitfall could mean an accident," said one AI practitioner.

Writing extensive rules also brings enormous development and maintenance costs. To rapidly expand the coverage of autonomous driving systems in production vehicles, Huawei's planning and control team reportedly recruited over a thousand engineers.

"Rule-based approaches inevitably leave some situations unhandled. With hundreds of thousands or millions of production vehicles, operating in different regions with different roads, it's difficult to cover everything with rules," Zhou said.

Over the years, the number of modules in autonomous driving solutions has continued to decrease:

Before 2017, developing autonomous driving systems required 9 models, with perception alone needing 3 separate ones for detection, object tracking, and data fusion.
In 2017, the number of models decreased to 7, with the 3 perception modules consolidated into 1 multi-sensor fusion module.
In 2022, the number of models dropped to 3, handling perception, prediction, and control respectively.

The number of modules in autonomous driving solutions has steadily decreased

The end-to-end model is the logical endpoint of this trend: a single model that accomplishes the entire autonomous driving task.

In Zhou's view, if you believe data-driven approaches represent the major trend, "end-to-end models are a natural choice." DeepRoute began investing resources in end-to-end model R&D in early 2023, completing road tests by August that year.

These tests revealed the potential of end-to-end models. "Traditional rule-based models prioritize safety as the core metric, then passenger comfort. But they don't care about the experience of other road participants." What impressed Zhou most was during one end-to-end model test: the vehicle needed to go straight but was stopped in a shared right-turn/straight lane, blocking a car behind that wanted to turn right. The model noticed there was still space ahead and moved the car forward slightly to let the right-turning car pass first — "just like an experienced driver."

But end-to-end models have "a very low floor" — if poorly trained, they may underperform traditional models on metrics like safety and comfort. This means developing a qualified end-to-end model requires greater resource investment.

Zhou believes that due to the high barriers to entry for end-to-end models, the gap between different autonomous driving companies will widen significantly in the coming years.

2

End-to-end model competition is a contest of system capabilities

Building a good end-to-end autonomous driving model requires comprehensive system capabilities, with new challenges at every stage.

Obtaining massive amounts of driving data is the entry ticket for training end-to-end autonomous driving models. Tesla CEO Elon Musk discussed the importance of data for autonomous driving models during an earnings call last year: "Training with 1 million video cases is barely enough; 2 million is slightly better; at 3 million you start going 'Wow'; by 10 million it becomes unbelievable."

Not all driving data can be used to train end-to-end models. One autonomous driving engineer said that when training their end-to-end model, only 2% of their accumulated road test data was usable. To give end-to-end models general capabilities, they must be trained on high-quality data from diverse scenarios.

Zhou told LatePost that when DeepRoute obtains anonymized data from partner automakers, they prioritize filtering for driving data from drivers with over 6 years of experience and no violations in the past 3 years, collected on various complex road segments. They capture the steering wheel's angle and rate, pedal position and rate, paired with the driving environment at that moment to train the model. He said DeepRoute's greatest advantage is its data processing capability, built up over years of consistently developing data-driven production autonomous driving models.
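The selection criteria Zhou describes can be expressed as a simple filter. This is a minimal sketch under stated assumptions: the `DriveClip` record, the road-segment labels in `COMPLEX_SEGMENTS`, and the sampling interval are hypothetical, and the paired camera and sensor frames are omitted for brevity. It keeps clips from drivers with more than 6 years of experience and no violations in the past 3 years on complex segments, then derives steering and pedal rates from the recorded positions by finite differences.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical labels for "complex road segments"; the real taxonomy is not public.
COMPLEX_SEGMENTS = {"urban_intersection", "roundabout", "unprotected_left_turn", "construction_zone"}

@dataclass
class DriveClip:
    driver_years_experience: float
    violations_last_3_years: int
    road_segment: str
    steering_angle_deg: List[float]  # steering-wheel angle per sample
    pedal_position: List[float]      # accelerator position per sample, 0..1
    dt_s: float = 0.1                # assumed sampling interval in seconds

def keep_clip(clip: DriveClip) -> bool:
    """Apply the driver and road criteria described in the article."""
    return (
        clip.driver_years_experience > 6
        and clip.violations_last_3_years == 0
        and clip.road_segment in COMPLEX_SEGMENTS
    )

def finite_difference(values: List[float], dt: float) -> List[float]:
    """Approximate the rate of change between consecutive samples."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]

def training_targets(clip: DriveClip) -> Dict[str, List[float]]:
    """Control signals the model learns to output: positions plus their rates."""
    return {
        "steering_angle": clip.steering_angle_deg,
        "steering_rate": finite_difference(clip.steering_angle_deg, clip.dt_s),
        "pedal_position": clip.pedal_position,
        "pedal_rate": finite_difference(clip.pedal_position, clip.dt_s),
    }

def select_training_set(clips: List[DriveClip]) -> List[Dict[str, List[float]]]:
    # Most logged clips are discarded; only those passing every criterion become training targets.
    return [training_targets(c) for c in clips if keep_clip(c)]
```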

To lay a solid foundation for the model, DeepRoute devotes 80-90% of its effort to data engineering, including but not limited to collecting, cleaning, categorizing, and annotating high-quality data.

DeepRoute testing its end-to-end autonomous driving solution

Transforming massive amounts of data into an end-to-end model also requires substantial computing power. Musk said during a recent earnings call that to train stronger FSD models, Tesla will spend $1 billion this year purchasing chips from NVIDIA and AMD to build a supercomputing center.

Zhou said DeepRoute has also purchased a batch of GPUs to build a data center for training end-to-end models, and rents cloud computing resources when large amounts of GPU power are needed for training. In his view, computing power alone isn't enough — what matters is how to maximize the use of massive data during training to produce models that meet expectations.

Once trained, models can't be deployed to vehicles directly. Models trained on large amounts of data have relatively large parameter counts, so they need high-compute chips onboard to run. Among currently available products, the autonomous driving chip with the highest single-chip compute is NVIDIA's Orin, at 254 TOPS. Moreover, bandwidth between onboard autonomous driving chips is limited, which makes it difficult to run several chips in parallel to raise overall performance.

Musk said at a recent event that a key challenge for FSD v12 was optimizing and streamlining the model under limited compute conditions — the problems to solve became an order of magnitude more complex.

Zhou stated that their streamlined end-to-end model can run on Orin chips, and with more powerful chips like Thor, which reaches 1000 TOPS on a single chip, the end-to-end model's performance would be even better. He believes that as one of the first companies to adapt end-to-end models to Thor chips, DeepRoute will gain more advantages in this wave.

3

The next battleground: Scaling Laws for robotics

After experiencing the effects of end-to-end models, Zhou re-examined the company's development path. He believes the potential of end-to-end models extends far beyond enabling autonomous driving in cars — continued iteration could lead to general artificial intelligence for the physical world.

In 2023, DeepRoute made its first strategic adjustment since its founding four years earlier: the short-term goal is to push end-to-end models into production vehicles and accumulate data; the long-term goal is to find a path toward general artificial intelligence for the physical world, achieving AGI in Robot.

Like large language models such as GPT-4, autonomous driving end-to-end models use massive data to train larger models for better performance. They differ, however, in what they are trained on: robot models need not text data governed by relatively simple rules, but large amounts of complex "critical-state data" collected from the physical world, meaning data that captures how moving objects are acted on by the physical world and how their state changes as a result. For example, when a car drives on a congested road and must constantly adjust its speed and direction, those movements are collected to form a dataset.
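One way to read the congested-road example is as a segmentation problem: scan a logged trajectory and keep the stretches where the vehicle is actively changing speed or heading. The sketch below assumes that reading; the `(time, speed, heading)` sample format and both thresholds are hypothetical choices, not anything DeepRoute has described.

```python
from typing import List, Tuple

# One logged sample: (time in s, speed in m/s, heading in degrees).
Sample = Tuple[float, float, float]

def critical_segments(
    samples: List[Sample],
    accel_thresh: float = 1.5,     # m/s^2, hypothetical threshold
    yaw_rate_thresh: float = 5.0,  # deg/s, hypothetical threshold
) -> List[Tuple[int, int]]:
    """Return inclusive sample-index ranges where speed or heading changes quickly.

    Heading wrap-around (e.g. 359 -> 1 degrees) is ignored to keep the sketch short.
    """
    # active[i] describes the interval between sample i and sample i + 1.
    active = []
    for (t0, v0, h0), (t1, v1, h1) in zip(samples, samples[1:]):
        dt = t1 - t0
        accel = (v1 - v0) / dt
        yaw_rate = (h1 - h0) / dt
        active.append(abs(accel) >= accel_thresh or abs(yaw_rate) >= yaw_rate_thresh)

    segments: List[Tuple[int, int]] = []
    start = None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i                    # first sample of an active stretch
        elif not is_active and start is not None:
            segments.append((start, i))  # last active interval ended at sample i
            start = None
    if start is not None:
        segments.append((start, len(active)))  # stretch runs to the final sample
    return segments
```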

The "Scaling Laws" that current large language models rely on for expansion may not directly transfer to robotics foundation models.

The Scaling Laws proposed by OpenAI researchers in 2020 allowed researchers to train small models on limited data and relatively accurately predict what performance level large language models would reach as data volume, parameters, and training compute increased, solving the scaling problem for large language models.
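Kaplan et al. (2020), the OpenAI paper referred to here, modeled test loss as a power law in non-embedding parameter count N: L(N) = (N_c / N)^α_N, reporting α_N ≈ 0.076 and N_c ≈ 8.8 × 10^13. Because log L is then linear in log N, a handful of small models is enough to fit the line and extrapolate. The sketch below generates synthetic losses from that form (not measured results), fits them by least squares in log-log space, and predicts the loss of a much larger model. The extrapolation is only valid if the power law actually holds, which is exactly what Zhou questions for robotics data.

```python
import math
from typing import List, Tuple

def fit_power_law(sizes: List[float], losses: List[float]) -> Tuple[float, float]:
    """Fit L(N) = (N_c / N) ** alpha by ordinary least squares on (log N, log L).

    Taking logs gives log L = alpha * log N_c - alpha * log N,
    a straight line with slope -alpha and intercept alpha * log N_c.
    Returns (alpha, N_c).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    alpha = -slope
    n_c = math.exp(intercept / alpha)
    return alpha, n_c

def predict_loss(n_params: float, alpha: float, n_c: float) -> float:
    """Evaluate the fitted power law at a (possibly much larger) model size."""
    return (n_c / n_params) ** alpha

if __name__ == "__main__":
    # Synthetic "small model" results generated from the Kaplan et al. form, not real measurements.
    true_alpha, true_n_c = 0.076, 8.8e13
    small_sizes = [1e6, 1e7, 1e8]
    small_losses = [predict_loss(n, true_alpha, true_n_c) for n in small_sizes]

    alpha, n_c = fit_power_law(small_sizes, small_losses)
    print(f"fitted alpha={alpha:.3f}, N_c={n_c:.2e}")
    print(f"extrapolated loss at 1e11 params: {predict_loss(1e11, alpha, n_c):.3f}")
```

With real measurements the points scatter around the line, so in practice the fit is done over many model sizes and its residuals are checked before trusting any extrapolation.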

Because of Scaling Laws, the large language model field gradually formed consensus: using more AI compute and data to train models with larger parameter scales yields better results, and can even lead to the "emergence" of intelligence.

"In autonomous driving, or robotics scenarios, because the training data types differ, simply using more high-quality data to train larger models may hit bottlenecks — performance might not improve, and could even decline." Zhou said the robotics field needs innovation in model architecture to find its own "Scaling Laws" for qualitative improvements in model performance.

Zhou said finding Scaling Laws for the robotics field will be DeepRoute's key research direction in the coming years, and is essential to achieving the company's long-term goal of AGI in Robot.

So far, no company has proposed Scaling Laws for autonomous driving end-to-end models or the robotics field.

"Tesla might have them, but they won't necessarily share them publicly." Zhou believes competition in the AI 2.0 era will be more intense, and leading companies may choose closed-source approaches — this is a reality that must be recognized.

Cover image source: DeepRoute