Dissecting LLMs in Cars: How to Turn Automobiles Into "Thinking" Robots? | Yunqi Capital Attent!on Podcast

Yunqi Capital · March 31, 2025

AI technology continues to leap forward, and our daily lives — what we wear, eat, live in, and how we get around — are being profoundly reshaped by this technological storm. In the mobility sector in particular, a new wave of intelligent vehicle evolution, driven by "large models hitting the road," is now in full swing.

Recently, the Yunqi Capital-produced tech podcast Attent!on invited Li Pan, co-founder and CMO of Yitu Technology — which just announced the completion of its Pre-A funding round — and Sang Yu, an investor on Yunqi Capital's tech team, to break down the technical shifts, pain points, competitive landscape, and future trends of large models in vehicles. They explored how AI is transforming cars from "mobility tools" into "intelligent spaces that understand you."

Guest Introductions

Li Pan, Co-founder and CMO of Yitu Technology

Master's degree from Shanghai Jiao Tong University. 15 years of experience in product-market-ecosystem and brand management in the intelligent connected vehicle industry, with a track record of pioneering multiple innovative businesses. Previously served as head of intelligent product at a leading domestic OEM and head of product ecosystem at a leading domestic connected car company.

Sang Yu, Investor, Yunqi Frontier Tech Team

Long-term focus on embodied intelligence, intelligent driving, and other frontier tech fields and innovative projects.

Linda, Managing Director at Yunqi (Host)

Key Topics This Episode

1. Core Value of Large Models in Vehicles

  • From "passive Q&A" to "proactive service": upgraded voice interaction, scenario-based intelligent agents (trip planning, emotional companionship, etc.)
  • Multimodal fusion: coordination of voice + vision + vehicle control data to create a "human-centric" mobile space

2. Technical Challenges and Breakthroughs

  • Technical evolution: TTS → large model end-to-end technology → agent, multimodal
  • Ecosystem reconstruction: automotive underlying architecture needs to be redefined
  • Data bottleneck: in-vehicle scenario data is fragmented; requires combination of vertical-domain models and reinforcement learning
  • Breakthrough strategy: layered product and technical architecture

3. Major OEM Cooperation Models

  • Lightweight cooperation: directly providing intelligent agent modules (e.g., voice enhancement, coffee ordering)
  • Deep customization: combining supplier model capabilities with OEM hardware capabilities
  • Full-stack in-house development: the OEM builds everything itself but still needs outside help with engineering implementation

4. 2025 Predictions

  • Intelligent cockpit: end-to-end voice interaction deployment, AI Agent and AIOS pioneering in automotive scenarios
  • Autonomous driving: VLA models solving complex game-theoretic situations, end-to-end control improving autonomous driving experience

Podcast Timeline

01:03 Opening: Getting to Know the Guests

02:41 Technical evolution of large models in vehicles: from mechanical interaction to proactive service

11:33 Every family will need three robots in the future: home robot, pet robot, mobility robot

19:28 Industry pain points: OEM architecture, data algorithms, and ecosystem dilemmas

29:42 Yitu Technology's breakthrough strategy: layered products and AIOS system

32:41 OEM cooperation models: differentiated needs of traditional automakers, new forces, and cross-sector tech giants

38:02 Smarter means more power-hungry? Balancing computing power and energy consumption

41:08 Trend predictions: 2025 inflection points for intelligent cockpit and autonomous driving

44:10 Car-buying advice from industry practitioners

Below are selected excerpts from this episode (edited for clarity)

The Technical Evolution of Large Models in Vehicles

Host

We're discussing this topic today because everyone's seen how hot "large models hitting the road" has become — OEMs have already made some autonomous driving function modules standard features. The next step is figuring out how to embrace large models and in what product form to deploy them. Sang Yu, let's start with you from the technology evolution angle. From 2023 to 2025, what changes have you seen in large models in vehicles — in terms of product, cost, and functionality?

Sang Yu, Yunqi Investor

The history is quite interesting, and it's worth revisiting how we first connected with Yitu. When we first encountered Yitu, there was a buzzy concept called "automotive middleware." Many companies in the industry were doing automotive middleware, mostly focused on communication technology. But when we spoke with Yitu's CEO Wu Xiaohang, we found he was thinking one level deeper. Moving automotive software and middleware to a service-oriented architecture (SOA) is a major trend, but how do you actually use that trend? How do you turn it into services that consumers can perceive? That was what they were thinking about at the time. An SOA software architecture can atomize many automotive services, but how do you orchestrate those atoms? Everyone felt like something was missing: a so-called brain, or an intelligent agent in the car, that could truly string these service capabilities together.

At the time we were imagining the possibilities, and we saw the series of advances in large models. We later introduced MiniMax to Yitu, and they had quite extensive exchanges. Yitu gradually became more committed to the large model direction. I think this is a fairly typical example of how people first saw, ahead of the industry, the starting point for large models in vehicles — which was essentially about how to organize service capabilities and use a large model to form a brain.

Looking back from the technology and product form perspective, originally it was based on relatively traditional NLP and TTS technology, where people did very mechanical Q&A. Today, the hope is to leverage the capabilities of large models.

Let me give a vivid example: people probably want an attentive butler in their car. This butler can provide proactive service — noticing that your seat position is uncomfortable, or that it's raining today and turning on heating and wipers. It shifts from passive service to proactive service. And going deeper, perhaps we expect this attentive butler to sense your emotions, know your schedule for the day, perhaps notice you're tired today, and chat with you, empathize, play some music.
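
To make the idea of orchestrating "atomized" services concrete, here is a minimal Python sketch of the butler example. Everything in it is an illustrative assumption rather than Yitu's implementation: the service names (`wipers_on`, `heating_on`, `play_music`), the `Rule` and `Orchestrator` classes, and the context keys are hypothetical, and simple hand-written rules stand in for the large model that would decide what to trigger.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Shared vehicle state that the hypothetical atomic services modify.
VehicleState = Dict[str, str]


@dataclass
class Rule:
    """A proactive trigger: when `condition` holds, run `actions` in order."""
    name: str
    condition: Callable[[Dict[str, bool]], bool]
    actions: List[str]


class Orchestrator:
    """Stand-in for the in-car 'brain' that strings atomic services together."""

    def __init__(self) -> None:
        self.services: Dict[str, Callable[[VehicleState], None]] = {}
        self.rules: List[Rule] = []

    def register(self, name: str, service: Callable[[VehicleState], None]) -> None:
        self.services[name] = service

    def add_rule(self, rule: Rule) -> None:
        self.rules.append(rule)

    def step(self, context: Dict[str, bool], state: VehicleState) -> List[str]:
        """Check every rule against the sensed context and run matching actions."""
        executed: List[str] = []
        for rule in self.rules:
            if rule.condition(context):
                for action in rule.actions:
                    self.services[action](state)
                    executed.append(action)
        return executed


# Hypothetical atomic services, each doing exactly one thing.
def wipers_on(state: VehicleState) -> None:
    state["wipers"] = "on"


def heating_on(state: VehicleState) -> None:
    state["heating"] = "on"


def play_music(state: VehicleState) -> None:
    state["media"] = "playing"


orchestrator = Orchestrator()
orchestrator.register("wipers_on", wipers_on)
orchestrator.register("heating_on", heating_on)
orchestrator.register("play_music", play_music)
orchestrator.add_rule(
    Rule("rain", lambda ctx: ctx.get("raining", False), ["wipers_on", "heating_on"])
)
orchestrator.add_rule(
    Rule("fatigue", lambda ctx: ctx.get("driver_tired", False), ["play_music"])
)

state: VehicleState = {}
executed = orchestrator.step({"raining": True, "driver_tired": False}, state)
print(executed)  # ['wipers_on', 'heating_on']
```

The point of the sketch is the separation: services stay small and independent, while the decision about when to combine them lives in one place. In the scenario described above, a large model would replace the fixed rules.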

Host

Before joining Yitu, Mr. Li Pan was actually at an OEM — on the client side. At Yitu, you're now serving clients. From these two perspectives and identities, how do you view the progress of large models in vehicles?

Li Pan, Co-founder and CMO of Yitu Technology

My entire career has been in the automotive supply chain. I worked in connected vehicles before, then went to an OEM, and now I'm out starting a business. I think we traditional automotive people have been through an internet wave over roughly the past decade or so — starting around 2015-2016 when people began talking about the internet economy, internet-plus. At that time, we saw something quite fundamental: the internet is about information distribution and efficiency improvement. Whether mobile internet applications or services, everyone wanted to quickly transplant them into cars.

But I think over the past seven or eight years, the transformation of automotive products by mobile internet hasn't met our expectations. To put it jokingly, the current intelligent cockpit version in cars is basically just a few iPads installed — though these iPads may be getting bigger and enabling some multi-screen interaction, fundamentally it's a product of the mobile internet era. It hasn't truly brought intelligent interaction and intelligent services to cars. As for intelligent driving, that's even more obvious — that's the autonomous driving domain's job.

So we traditional automotive people still harbor this idea: we believe that a product or terminal like the car should be reimagined by AI. What does this mean? I think if we treat the car as the ultimate terminal, we still believe it needs to transform from a so-called mobility tool into a mobile intelligent space. Essentially, whether it's the hardware inside, the software, or even many operating systems, everything needs to be redefined.

Consider how we look at robots today: that industry may have had relatively weak foundations and is still making breakthroughs in the physical body itself. Compared to robotics companies, we've been building cars for over 100 years. The power system, chassis, steering, and body system are the so-called major components of a robot's physical body. I think we've matured these sufficiently, and China's industrial chain advantages in this area are extremely obvious. But if we want to develop toward a so-called mobile space, having just this mechanical physical structure isn't enough, because the space ultimately serves people.

I shared a view recently: I think in five years or longer, every family may own three robots. At home there's a home robot that helps you clean, cook, take care of kids, etc. There's another robot that you take out for walks — maybe a toy dog, or a pet. And in the underground garage, we have another robot — our mobility robot, that takes me far away, commutes with me, travels, picks up kids, etc. At the same time it shelters me from wind and rain, provides comfort and safety, and enables long-distance mobility convenience.

But such a robot — its physical body is fine now, but it doesn't yet have strong in-car interactivity. I think as we redefine the functional attributes of space, the space itself will change too. But what comes with that? How can it today truly understand its master's intentions like a real assistant, comprehend your words, execute your tasks, and autonomously mobilize the vehicle's various controllers? This involves three very interesting things.

First is thinking, understanding, reasoning — Yitu defines this as a "mind," an in-vehicle mind. From perceptual input and processing, to reasoning, output, and task distribution and scheduling — this is what a mind does. It handles vertical-domain matters on the road. For example, perceiving elderly people, children, friends getting in the car; driving on highways and elevated roads; perhaps also perceiving many things brought into the car, like a child bringing a ball. It needs to read situations, think, do some deep reasoning.

Second is human-machine interaction. Large models in vehicles will definitely become a completely new form of human-machine interaction. It doesn't necessarily mean having a big screen stuck there that I need to touch. Essentially, within our in-vehicle space, interaction will again be a huge opportunity, but it will also depend on models.

Third is service. This service will differ from what we used to talk about in cars — listening to music, navigation. For example, if we have our own assistant and tell this assistant that an old friend from Beijing is landing at Hongqiao Airport around 7 p.m., the information content here is enormous, perhaps containing five layers of information. This assistant needs to start thinking and deliberating — airport pickup, dinner arrangements, etc. All these various services encompassing mobility should be what this automotive robot helps the user solve.

So I think when these three important subsystems are all driven by AI to operate on a new architecture, combined with our entire redesigned physical space, only then can we in the next 3 to 5 years truly create a family partner that understands me and can freely help complete various mobility needs.

Yitu is built around these three things.
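
As a rough illustration of the "task distribution and scheduling" step, the sketch below turns the airport-pickup request into an ordered task list. It is a toy under stated assumptions, not a description of Yitu's system: the `Intent` object is filled in by hand where a large model would extract it from speech, and the date, the 45-minute drive, the 15-minute buffer, and the dinner time one hour after landing are all made-up values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Intent:
    """Structured form of the spoken request (hand-written here)."""
    guest: str
    origin: str
    airport: str
    landing: datetime


@dataclass
class Task:
    """One unit of work handed to a hypothetical in-car service."""
    service: str
    detail: str
    start: datetime


def plan(intent: Intent, drive_minutes: int = 45, buffer_minutes: int = 15) -> List[Task]:
    """Derive pickup and dinner tasks from the intent, ordered by start time."""
    # Leave early enough to cover the assumed drive plus a safety buffer.
    leave_at = intent.landing - timedelta(minutes=drive_minutes + buffer_minutes)
    # Assumed: dinner starts one hour after the flight lands.
    dinner_at = intent.landing + timedelta(minutes=60)
    tasks = [
        Task("dining", f"Reserve dinner with {intent.guest}", dinner_at),
        Task("navigation", f"Drive to {intent.airport}", leave_at),
    ]
    return sorted(tasks, key=lambda task: task.start)


# The date is arbitrary; only the 7 p.m. landing time comes from the example.
intent = Intent(
    guest="old friend",
    origin="Beijing",
    airport="Hongqiao Airport",
    landing=datetime(2025, 3, 31, 19, 0),
)
tasks = plan(intent)
for task in tasks:
    print(task.start.strftime("%H:%M"), task.service, task.detail)
```

With these assumed values the navigation task is scheduled for 18:00 and the dinner task for 20:00. The harder part, which the sketch skips, is extracting the intent and its implied needs from a single sentence.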

Bottlenecks and Breakthroughs

Host

What are the actual pain points that large models in vehicles face at this stage? Which problems are relatively easier to solve now? Which pain points really require long-term joint efforts from different players in the industry?

Li Pan, Co-founder and CMO of Yitu Technology

First, industrial development has its own rhythm. Large models are great, but for large models to redefine the underlying architecture of the entire vehicle today, the industry still needs to follow such standards. Because there are still large numbers of suppliers following existing development protocols and models — this itself is enormous work. But I believe companies with vision, including innovative companies, may already be doing this.

Second, looking at the core elements of data, algorithms, and computing power, I think computing power is actually more or less okay for in-vehicle applications. Nvidia Thor has recently exceeded 1000 TOPS, so doing some edge-side things in the vehicle is fine.

But honestly, algorithms and data — unlike the internet, there's a lot of fragmented data, including user behavior scenarios that aren't that common. It's not like helping employees or users look up documents or make PowerPoints. So I think we're still a bit short on algorithms and data — there need to be companies that step up to pull this together.

Third is ecosystem. Currently the ecosystem around AI terminals is extremely immature. Roughly ten years have passed since "software-defined vehicles" was proposed, and a very mature upstream and downstream industrial chain has formed. Some people do ecosystem, applications and services. After ten years of running-in, everyone works together very smoothly. But today, following the architecture we just discussed, I think we're only at the first step of a long journey.

Sang Yu, Yunqi Investor

I'd also like to answer from our investment perspective. One word is very important from our investment view: "pace." AI and large models in vehicles sit within at least two layers of technology flywheel iteration. One layer is AI technology itself. Everyone can feel it, especially we frontline investors looking at AI. We really feel like there are new achievements every day. For companies it's even more so. How do you figure out how to use these technologies well, when one day these technological advances might make a leap that has a disruptive impact on your original solution? So how to adapt to the pace of technological progress really tests product managers and company choices.

The second flywheel we're in is the entire industry ecosystem. OEMs are deploying now too, but as we just discussed, this technology is still relatively new, underlying adaptation is also continuously iterating. Including the infra layer and hardware layer, there are many partners — those doing domain controllers, systems, voice front-end large models, etc. Everyone is moving forward together within an ecosystem, and market education also needs continuous output. I think for a company in a trend where technology is iterating and the ecosystem is charging forward together, how to grasp our own positioning and pace is extremely critical.

Host

Mr. Li Pan, I'm quite curious — among the OEMs Yitu is currently working with, there are traditional automakers, new forces, and some existing cross-sector internet giants that are also doing automotive-related things. From your perspective, do you see different emphases among them?

Li Pan, Co-founder and CMO of Yitu Technology

Everyone knows AI is important, but truly refreshing the entire organization to all do AI — that's neither possible nor realistic. So on the OEM side there are actually three different needs or cooperation models.

For OEMs, they do big product definition. Today this is a new car, and this new car needs new experiences — natural language interaction, working in the car, gaming in the car when not driving. Once product definition, scenarios, and many other things are set, OEMs still need someone to supply solutions. And within solution provision, there are several different approaches.

One is directly giving the intelligent agent to the customer — the OEM thinks it's good and takes it. The second is OEMs with certain R&D strength that will do some customized development, requiring suppliers to provide some model capabilities. The third is full-stack self-developed large OEMs — whether applications, delivery, etc., they want to do it all themselves. But this also gives rise to a new category of need: they still need some people to help them with engineering implementation work.

Take models as an example. OEMs go out themselves to find models and negotiate cooperation. But with multimodal models, for instance, they need perceptual data processed and edge-side work done, and that may require building an architecture that ensures rapid future extension or adaptation across different vehicle models. So different OEMs at different stages need different things.

For the full episode, subscribe to the Attent!on channel on the Xiaoyuzhou app~