DeepRoute.ai's Guang Zhou: The More You Understand AI, the Less You Doubt VLA | Yunqi Capital Doers Series
In the technological leap of intelligent driving, **Yunqi Capital portfolio company DeepRoute.ai has always been "the first to eat the crab."** From mapless solutions to end-to-end models, and now to the VLA (Vision-Language-Action) architecture that it's first to put into production, the company has repeatedly positioned itself at the forefront of inflection points.

In the technological leap of intelligent driving, Yunqi Capital portfolio company DeepRoute has always been "the first to eat the crab." From map-free solutions to end-to-end models, and now to the VLA (Vision-Language-Action) architecture it has pioneered, the company has repeatedly stood at the forefront of inflection points.
This August, DeepRoute's release of DeepRoute IO 2.0 made it the first domestic supplier to truly mass-produce intelligent driving based on the VLA technical route.
Recently, CBN's New Cortex sat down with DeepRoute founder and CEO Zhou Guang for an in-depth conversation about why the company chose the VLA route and what lies ahead. This edition of "Yunqi Doers" shares the highlights.
This article is excerpted from the WeChat account "New Cortex NewNewThing."
Original title: Exclusive Interview with DeepRoute's Zhou Guang: The More You Understand AI, the Less You'll Question VLA | New Cortex Dialogue
Reporters: Wu Yifan, Wu Yangyang
Over the past decade, intelligent driving has ridden several waves of technological change: from HD map-based solutions to map-free ones, from rule-based systems to deep learning, from deep learning to world model-based reinforcement learning, from ordinary end-to-end models to large models... Nearly every new wave has knocked out a cohort of intelligent driving companies, forcing them to either shut down or pivot.
DeepRoute is the exception.
"We're more of a disruptor," DeepRoute CEO Zhou Guang told CBN's New Cortex. Founded in 2019, the company has already "sparked" at least two industry-wide technological waves: the first was its 2020 map-free solution. At the time, HD map-based intelligent driving was the industry mainstream, but systems constrained by maps couldn't access all scenarios in one go. DeepRoute began developing its map-free approach in 2020 and launched it in March 2023. Then in April 2023, Huawei released its own map-free urban NOA system. Because this was the first time the industry moved from elevated road scenarios into urban areas with heavier pedestrian traffic and more direct user experience, the intelligent driving market truly exploded for the first time after nearly a decade of exploration. The second wave was VLA.
On August 26 this year, DeepRoute released its next-generation intelligent driving system DeepRoute IO 2.0 based on the VLA (Vision-Language-Action) architecture. Compared to the previous generation, drivers can now interact with the new system using natural language, even commanding the car with instructions like "turn left at the next intersection" or "slow down to 60." Moreover, DeepRoute claims its VLA model possesses spatial understanding capabilities enabling more complex reasoning — for instance, where the previous system would deem an object behind a wall "nonexistent" if it couldn't be seen, VLA can transcend this limitation.

DeepRoute's VLA system makes a "maintain low speed" decision, providing the reasoning: "electric vehicle in tunnel, blind left curve ahead." The previous CNN-based system lacked this spatial recognition capability and couldn't provide decision rationales.
Zhou Guang describes the previous generation end-to-end system as CNN-based end-to-end, while VLA is GPT-based end-to-end. Currently, DeepRoute is the only third-party intelligent driving supplier adopting the VLA technical route. Among new car makers, only Xpeng Motors and Li Auto have announced plans to deploy VLA.
Facing external controversy over VLA models, Zhou Guang believes that the more one understands AI, the less one will question VLA. "What neural network today surpasses large models? Nothing, they're all large models," he says. He even argues that intelligent driving companies that fail to become large model companies are "dead." His logic: "CNN-based networks have limited learning capacity due to framework constraints," and "VLA's starting point is the ceiling of where end-to-end is today." His key evidence is Tesla: "HW5 (i.e., AI5) chip specs at 2500 TOPS — what architecture would need 2500 TOPS? If it's not GPT architecture, what is it?" Zhou Guang asks.
If VLA once again reshapes the entire intelligent driving industry as Zhou Guang predicts, DeepRoute will dive into the "disruptor" river a second time — or a third, if counting the "early fusion" technique Zhou Guang pioneered at his previous company Roadstar (a method that fuses raw data from multiple sensors like LiDAR and cameras before decision-making, as opposed to "late fusion" which integrates outputs after each sensor has made its own decision).
Since its map-free CNN model first entered mass production in August 2024, DeepRoute has shipped nearly 100,000 units within a year, reaching 30,000 units in September alone. To date, DeepRoute has secured five VLA program定点车型合作 (designated vehicle model partnerships).
Below is the conversation between CBN's New Cortex and Zhou Guang, edited and condensed for clarity.

DeepRoute CEO Zhou Guang.
The Threshold for VLA Is 100,000 Vehicles
New Cortex: Let's start with VLA. Among third-party suppliers, you're the only one on this route.
Zhou Guang: There aren't many third-party suppliers left now.
New Cortex: There are still four or five, right?
Zhou Guang: Very few can actually do urban NOA. Suppliers, I think, need to first achieve end-to-end, advanced urban scenarios, rather than talking about how next-generation technology should evolve. VLA is actually built on top of end-to-end technology.
New Cortex: So is there a technological ladder for intelligent driving development?
Zhou Guang: It's also a data ladder. From end-to-end to VLA, the data requirements differ. I remember making an estimate two years ago — I said the entry ticket for VLA might be reaching mass production of 100,000 vehicles, otherwise your dataset isn't sufficient for large models, while the foundation for end-to-end might start at ten thousand units.
The larger the model, the more data it needs. L4 companies can do a demo with one vehicle, which shows L4 doesn't require much data. My previous company (Roadstar) did L4, founded in 2017 with just two vehicles, yet could run smoothly in Shenzhen because it relied on HD maps and LiDAR. That was first-generation technology — maybe 10 vehicles sufficed, and even today the maximum scale is only in the thousands.
For end-to-end, you start with hundreds or a thousand vehicles, collect data, then mass produce, and you can basically reach ten thousand.
After 100,000 vehicles, more data doesn't significantly improve model performance, because the model lacks capacity to absorb that much data. But VLA is different — 100,000 might just be the starting point.
New Cortex: There's a threshold for doing VLA?
Zhou Guang: Yes, there is. Look at the history of AI development — it's always been about increasingly larger datasets and greater compute. Today some companies are still at different stages, still using rule-based or HD MAP approaches. Of course they're operating successfully and can even do driverless operations. So taking that technical path to its extreme, there's a small group continuing that. And there's another small group, like us, that chose end-to-end. Once you go end-to-end, you're basically on the one-way road toward large models, you can only go toward scaling, and reaching VLA becomes natural.
New Cortex: Will the next step after end-to-end definitely be VLA?
Zhou Guang: Definitely VLA, definitely large models. Because the previous generation end-to-end wasn't large models — it was still a classical neural network-driven autonomous driving model, not a large model architecture autonomous driving model. There's a difference.
New Cortex: What's the difference between VLA and the previous generation end-to-end models?
Zhou Guang: The entire architecture is different. VLA should be called GPT-based end-to-end, the previous was CNN-based end-to-end — maybe putting it this way leaves no questions. The previous generation was primarily CNN architecture with a small amount of Transformer. VLA is pure GPT, it looks the same as large models, though currently the data volume isn't that large yet. But VLA's advantage grows with increasing data volume.
There are all kinds of voices in the industry about VLA. We shouldn't just look domestically — take Tesla's HW5 (i.e., AI5) chip with 2500 TOPS specs. What architecture would need 2500 TOPS? If it's not GPT architecture, what is it? Why would you need such a large model with CNN?
New Cortex: Do different architectures make intelligent driving capabilities perform very differently?
Zhou Guang: In driving capability, we're focused more on the later stages (referring to safety redundancy, risk prediction). We hope to enhance defensive capabilities. VLA can do many things — like recognizing road signs, which end-to-end can't do — plus defensive driving, irregular obstacle recognition, voice interaction. These are all naturally achievable with VLA models.
Of course, all this takes time. We'll select what we consider key capabilities to break through first. Comprehensive breakthrough isn't yet timely — current compute isn't sufficient, NVIDIA's Thor chip only has 1000 TOPS, trying to do everything is too strained. I think 10,000 TOPS would be more reasonable. Also, data volume hasn't caught up, but VLA's starting point will already outperform end-to-end models.
New Cortex: The term "defensive driving" needs some explanation.
Zhou Guang: To put it in one sentence: make AI afraid. Current AI isn't afraid, it's fearless — wrong is wrong, no big deal. But all animals know fear, all know to seek benefits and avoid harm.
New Cortex: Why should intelligent driving systems have this capability, and how do you give it to them?
Zhou Guang: You have to give the AI data that teaches it fear during the training process. For example, someone suddenly darts out from in front of a bus — that's when a human would panic. Or when you're driving out from under an overpass with really heavy occlusion, and you can't see clearly, a person would get nervous and slow down. Humans have this defensive instinct: look first, then reduce speed. Beyond the examples I just gave, there are countless scenarios where drivers feel panic on the road. We record all of this data and use it to train the neural network. End-to-end data collection has no concept of "panic."
New Cortex: Does this kind of data even exist? Panic is something people feel internally.
Zhou Guang: People express it.
New Cortex: So you have to deliberately collect this data? Are there ready-made datasets for this on the market?
Zhou Guang: We'll definitely go out and specifically collect this "fear" data. Once your dataset has "fear" as a standard, the AI naturally learns to be afraid. There's no existing data for this.
New Cortex: Is the insufficient compute power you mentioned earlier one of the reasons why many domestic manufacturers now want to develop their own chips?
Zhou Guang: If you're going to develop a 100 TOPS chip, that makes no sense. If you're going to develop your own chip, it has to be a high-compute chip. Tesla's HW4 (AI4) to HW5 (AI5) — compute increased 10x. HW4 is roughly 200 to 300 TOPS, about the same as Orin. HW5 is 2,500 TOPS, though there are various rumors saying 2,000 to 4,000 TOPS. There's never been such a massive generational leap in chips before. Previously compute might increase two or three times. This sudden 10x jump is entirely because of large models.
We won't develop our own chips. Different fields require different expertise.
New Cortex: So you don't think VLA is a risky choice?
Zhou Guang: Not risky at all. Large models have already proven themselves. When OpenAI went after large models back then, that was risky. But what risk is there for us? This is simply what should be done. What neural network today surpasses large models? Nothing. Everything is large models.
New Cortex: Large models have proven their success in the digital world, but what about the physical world?
Zhou Guang: It absolutely has to be a large model architecture. What are large models good at? Temporal reasoning and inference. Doesn't driving require inference, temporal understanding, and text comprehension? Doesn't driving require knowledge from the internet? For example, we see a tricycle loaded with garbage bags — a large model would know this is an old man collecting trash.
New Cortex: Its recognition capability is stronger than CNN?
Zhou Guang: CNN fundamentally doesn't have this capability at all.
New Cortex: How do you understand the external skepticism toward VLA?
Zhou Guang: Everything grows from a baby. There may be some skepticism now, but first, every company's progress is different. Second, VLA might indeed still be a "child" right now, while the previous generation of technology has grown into a "middle-aged person." A middle-aged person outperforming a child — what's surprising about that? And they haven't even outperformed it yet. The two are roughly comparable.
New Cortex: You think their ceilings are different.
Zhou Guang: Obviously different. VLA's starting point is where end-to-end is today. I think the more a company understands AI and large models, the less it will question VLA.
New Cortex: What's the ceiling for VLA?
Zhou Guang: The ceiling is Agent. Whatever large models can't do well today, VLA will also struggle with. But we're nowhere near the ceiling yet. The room for improvement comes from compute, data, and so on. Think about it — large models today have already hit the compute ceiling, the data ceiling. We're still early.
Right now VLA faces dataset challenges, and then there's also the L (Language) to A (Action) portion. Our kind of VLA model definitely still differs from a complete large model. The V (Vision) to L (Language) part is actually similar, but the L (Language) to A (Action) part still has much to figure out.
From Car Companies Not Believing in Intelligent Driving
to Having FOMO:
Only 5 Years in Between
New Cortex: What are the deployable scenarios for VLA?
Zhou Guang: First and foremost, cars, because no other terminal has that kind of volume. If you're starting with 100,000 units for something, what else today can provide 100,000 units of data? I don't think anything can.
New Cortex: Do you plan to deploy it first in Robotaxi scenarios or in customers' mass-production vehicles?
Zhou Guang: Mass-production vehicles. Current Robotaxi simply can't generate that much data. VLA needs to start at 100,000 units; I think reaching a million would be even better.
New Cortex: What's your strategy for selecting customers?
Zhou Guang: We prefer deep collaboration with customers, working together to create hit models. The vehicles that truly achieve scale are typically around the top fifty in sales rankings. Rather than spreading ourselves thin across too many models, we'd rather concentrate resources on serving leading projects. The broader your scope of business and the more miscellaneous your portfolio of合作车型, the less likely customers are to entrust you with their premium models — because it's difficult to establish the trust needed for long-term, deep collaboration.
New Cortex: How many VLA mass-production customers have you secured?
Zhou Guang: We've reached 5 designated合作项目 for the VLA model. At this stage, our assisted driving solution (referring to the previous-generation end-to-end system) has been deployed in over 100,000 vehicles.
New Cortex: How did you reach 100,000 units so quickly?
Zhou Guang: Our map-free solution first went into mass production around August last year. We shipped 100,000 units in a year, and these aren't low-end systems — they're all high-end urban NOA. There was a ramp-up period at the start of mass production. Now we're shipping over 10,000 units monthly, and soon we might quickly reach 20,000 to 30,000 units per month.
New Cortex: Was it challenging to win over customers willing to adopt VLA?
Zhou Guang: Not at all. Existing customers have already seen the actual value that technology brings, so they're willing to try new things.
New Cortex: The outside world is still questioning VLA. How do you convince your customers to accept it?
Zhou Guang: First, I think they have FOMO (fear of missing out). Second, VLA has advantages in "defensive driving" that end-to-end simply can't achieve, because end-to-end lacks spatial understanding capability.
New Cortex: DeepRoute was founded in 2019. How difficult was it to talk to car companies about intelligent driving mass production back then?
Zhou Guang: Car companies at that time basically didn't believe in autonomous driving. I think Huawei later convinced the market by promoting the map-free solution as "driveable nationwide," and that created our opportunity. Because if autonomous driving never entered urban areas, users would find it very hard to perceive its value.
New Cortex: Car companies went from not believing in autonomous driving in 2019 to having FOMO now. Why did this change happen?
Zhou Guang: First, the in-car experience changed qualitatively. Additionally, the development of large models refreshed everyone's understanding. I think large models had the biggest impact. As an AI practitioner, I was incredibly excited the first time I chatted with GPT. We used to believe the journey from weak AI to strong AI would take a very long time. After GPT came out, I felt like that 50 years might suddenly become 5 years. It was very counterintuitive.
Also, when GPT-4V came out, I had a strong reaction. I wondered what it would be like if GPT-4V were driving. It could tell you what you need to pay attention to in every scenario — something end-to-end simply can't do. For example, at the entrance to Shenzhen Futian Free Trade Zone, there's a sign that says "Left turns not controlled by traffic light." How do you train that through data? Without language, how does the model learn this? It doesn't fit any statistical pattern. All statistics tell the model you can turn right on red, but they won't tell you that you can also turn left on red. At that time (2024), our test vehicle would stop at the red light.
New Cortex: Was GPT-4V what inspired you to pursue VLA?
Zhou Guang: It was GPT, honestly. If we don't actively embrace this, large model companies will definitely defeat us. It's just like when we eliminated the previous generation of autonomous driving companies — we did it through end-to-end. Now even those low-compute chip companies are starting to switch to end-to-end.
At the time, I saw GPT-4V's entire reasoning process. I literally took some scenarios and asked it what to pay attention to. Problems we couldn't solve, it could give prompts for. For example, it would tell you there's a large truck ahead with boxes that might fall off — that's dangerous. It has language; it just lacks an action plan. End-to-end doesn't have this prompt. It fundamentally can't understand this kind of thing.
If We Can't Become a Large Model Company,
We're "Dead"
New Cortex: When did you start wanting to pursue VLA?
Zhou Guang: End of 2023. In the summer of 2023, Google published that RT-2 paper (referring to RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control, which introduced VLA models in robotics), and I had the idea then. By 2024 we had decided to do this. In August 2024, our end-to-end model went into mass production.
New Cortex: Had you secured your first mass-production customer since founding the company by then?
Zhou Guang: Not when GPT-4V came out, but we had already won the designated project for the end-to-end solution. At the end of 2023, I told everyone: if we can't become a large model company, the company is "dead."
New Cortex: So right when you had invested in the previous generation of technology, you wanted to switch to new technology?
Zhou Guang: Not switch — continue, but I told everyone we absolutely had to become "large-model-ized." It had to be large model work methods. I didn't want our company to still have perception teams and planning & control teams. No.
New Cortex: So back then you still had perception teams and planning & control teams?
Zhou Guang: Of course.
New Cortex: So it was like having three generations of technology simultaneously in your company.
Zhou Guang: No, we didn't have VLA at that point. We only transitioned to VLA after we finished end-to-end.
New Cortex: Had you even recouped the investment costs for CNN end-to-end by then?
Zhou Guang: We're now collecting returns on CNN end-to-end.
New Cortex: How much organizational impact did the transition from CNN end-to-end to VLA have?
Zhou Guang: The end-to-end organizational structure could still use the previous generation's structure (referring to intelligent driving technology not relying on high-precision maps). VLA can't. VLA's entire organizational structure is the same as large model companies.
New Cortex: So end-to-end and VLA are two different stages of business within the company now?
Zhou Guang: One is mature, the other is just getting started. You could think of it this way: one is an adult Homo neanderthalensis (note: an extinct species of archaic human), the other is an elementary school Homo sapiens (note: the species to which modern humans belong). Homo sapiens is more advanced than Neanderthals — they're not the same species. But you'll find that this elementary schooler already possesses the capabilities of the previous generation's adult. I'm definitely committed to the VLA path, because it still has a future.
Front Fusion, Mapless, VLA
We Were the First to Do All of It
New Cortex: When you started your company in 2017, there were already players in this space, many from Baidu, and you had just graduated. Did you feel you were late to the game or lacked industry background?
Zhou Guang: No, that wave of entrepreneurship was all rule-based. I don't consider that autonomous driving. Programming and AI have nothing to do with each other — the difference between them is like the difference between language arts and math.
New Cortex: At the time, they were already working on L4.
Zhou Guang: Actually, I think the L2, L4... classification is quite inaccurate, too backward. It's a standard defined 30 years ago, before AI even existed. The way to classify autonomous driving should be from rule-based to CNN-based end-to-end, to GPT-based end-to-end — this is a technology-driven classification. Look at how clearly OpenAI defines AGI levels: Chatbot, Reasoner, Agent, Innovator, Organizer.
New Cortex: Does GPT-based end-to-end correspond to Reasoner or Agent?
Zhou Guang: Autonomous driving and digital AI aren't exactly the same, and I haven't figured out how to grade it either. I've been saying the Level xx classification is extremely outdated. Tesla gave us an answer — it essentially wants to go straight from L2 to L5, skipping L3 and L4. Tesla has never mentioned L3, not even in product planning; it went straight to Robotaxi. Today, Tesla's Robotaxi operates relatively safely in certain areas, but that doesn't mean it can't work outside those areas — it's just that outside Austin, the safety factor might be lower. Tesla completely disregards the Level xx definitions, attempting to go directly from L2 to L4/L5 in one leap.
New Cortex: What's the difference between L4 and L5?
Zhou Guang: L4 means you can only use it in a certain area; L5 means you can use it nationwide. Back then, the difference between L4 and L5 was essentially the coverage area of high-precision maps: whether your map covered the whole country or just one province or city.
New Cortex: This classification seems a bit arbitrary.
Zhou Guang: Robots 30 years ago were classic robots — the core was SLAM (Simultaneous Localization and Mapping), completely map-based. A variant of SLAM is HD MAP. SLAM is simultaneous mapping and localization, but in the real world you could collect the map first, so you don't need real-time mapping anymore, just localization — that's HD MAP.
In today's large model era, who still talks about SLAM? Once AI is powerful enough, these mapping and scanning processes become unnecessary; you can reason instead. So our current approach achieves things through reasoning — the model infers what to do, rather than a map telling me what to do. So the autonomous driving classification system needs to be rethought.
New Cortex: Why don't you create your own?
Zhou Guang: I haven't thought about it yet. Maybe I can come up with something better — after all, this thing has been used for 30 years.
New Cortex: So in your view at the time, traditional L4 wasn't necessarily a path that had to be taken?
Zhou Guang: No. I formed this understanding around 2019, when I had exchanges with top AI academic circles, and realized that perception could solve the navigation problem.
New Cortex: Front fusion, mapless, and VLA — you were the first in the industry to do all of these. These were the most cutting-edge technologies at the time, but not necessarily the most reliable. How did you assess the risks?
Zhou Guang: They definitely weren't the most reliable. You still need your own technical judgment.
New Cortex: Similar to the current debate about "whether to use LiDAR or not"?
Zhou Guang: On that specific question, my answer is yes. But if you give me enough money, then no. Because with enough money, I can solve these problems with data. So it depends on whether you spend that money on data or on sensors. Tesla spent an enormous amount to solve OCC (occupancy networks); OCC is basically solved now, so LiDAR isn't as needed for recognizing irregular obstacles.
New Cortex: How much did your choice of technical approach depend on OEMs or prospective clients at the time?
Zhou Guang: When we went mapless, we didn't consider this at all — we just followed the correct technical path. We believed neural networks could recognize road topology structures. What needed to be done then was the BEV approach (note: BEV refers to bird's-eye-view structured data representation, a critical prerequisite for implementing mapless solutions). I decided to pursue BEV as early as 2020. The first step of perception is front fusion; the second step is BEV — it's a variant of front fusion, the next stage of front fusion.
New Cortex: Tesla also started working on BEV around that time.
Zhou Guang: In 2021, Andrej Karpathy introduced BEV at Tesla AI Day, and we were the earliest company in China to pursue it.
New Cortex: Were you referencing Tesla at the time?
Zhou Guang: No reference, which is why I say we were the first in China to achieve mapless. But I did have exchanges with Karpathy.
New Cortex: How did you judge that Karpathy's ideas were correct?
Zhou Guang: Because I was thinking the same thing. At that time, Karpathy's entire team was only about a dozen people. But the key is: who do you form consensus with? Forming consensus with people who don't understand AI — that's definitely wrong. Of course, mapless means removing HD maps and replacing them with perception. Perception is a model, but planning and control were still rule-based at first, so it wasn't great to use initially.
The Value of Disruptors Is the Chance to Overtake on the Curve
VLA Is Another Curve
New Cortex: What stage do you think DeepRoute is at now?
Zhou Guang: The stage of accelerating commercialization. Before, it was a research stage, a demo stage. The commercialization acceleration stage means facing brutal commercial competition — face it calmly, keep an open mind, have your own convictions, and only do what's right.
New Cortex: Quite a few companies commercialized several years earlier than you.
Zhou Guang: If the technical direction itself has deviations, even an early start may lead to more adjustments and challenges down the road.
New Cortex: Do you think this is an inflection point?
Zhou Guang: Every technological shift is an inflection point. How can you surpass the leaders if the technology doesn't change?
New Cortex: Actually, everyone has experienced at least one transition in the past decade.
Zhou Guang: Right, everyone has to go through it. We've actually been proactive about it. We're more disruptors, I think — not forced by industry pressure.
New Cortex: As a disruptor, what do you gain?
Zhou Guang: The chance to overtake on the curve. By being the first to deploy mapless and end-to-end, DeepRoute's adoption volume in urban NOA — this high-end assisted driving domain — already ranks among the top three among third-party assisted driving suppliers.
New Cortex: Is there a cost?
Zhou Guang: Once you do it well, others can copy you. It's fine — we don't have just one technology. When we first did front fusion, people said, "What if you do it and others copy it quickly?" I said what's more important is our ability to keep making the right calls. Getting one thing right might be luck; getting it right continuously definitely isn't.
New Cortex: Does VLA have a moat?
Zhou Guang: I think the moat ultimately comes down to commercial capability. Commercial capability, data advantage — that's your moat. Technology just gives you a chance to overtake on the curve, but it can't always be a curve. On the straightaways, it's still about product, experience, service.
New Cortex: Are you in the straightaway stage now?
Zhou Guang: CNN end-to-end is the straightaway, but I think VLA is another curve.
New Cortex: Over the past decade of intelligent driving development, has every entrepreneur constantly faced route choices?
Zhou Guang: No, we don't really feel that way. We just follow AI. What some people see as a transition, to us is just normal walking.
New Cortex: Does VLA represent the end of the road?
Zhou Guang: The entire industry is moving toward large models.
New Cortex: Do you need to think about the next generation of technology now?
Zhou Guang: VLA is the next generation of technology.
New Cortex: Have you pretty much completed all the technical layout you need to do?
Zhou Guang: Far from it. GPT is a direction — like saying you're going north. But going north still involves many choices, and end-to-end is a very narrow thing.
New Cortex: Are the choices in the new direction known or unknown?
Zhou Guang: Both.
New Cortex: Will you develop Robotaxi business next?
Zhou Guang: We'll use GPT-based end-to-end to do Robotaxi. We won't do mapping anymore, won't use HD MAP — these are obsolete technologies. What we want to do is definitely something like Tesla's Robotaxi model: AI-driven, large model-based.
New Cortex: What stage is the Robotaxi business at now?
Zhou Guang: In preparation. We'll disclose more later. We'll do this as a technology third party.
New Cortex: Will this year be a window for Robotaxi? There's talk of that in the industry.
Zhou Guang: I think it still depends on Tesla. Right now, we should focus more on forward-looking paths.
New Cortex: Looking at scale, users, brand — if you're slow on Robotaxi, do you get anxious?
Zhou Guang: Not really. Large models represent a new path. No matter how hyped up the earlier approaches were, they may not matter.
New Cortex: But there are already some brands in the market.
Zhou Guang: If the experience can't be delivered, and you can only operate in limited areas — just a few places, a few routes — it's meaningless, with no commercial value.


