FreeS Fund's 2021 Outlook ④ | The Endgame for Sensors Behind Autonomous Driving

FreeS Fund · March 4, 2021

Is the "Tesla Vision" approach the future of autonomous driving?

Autonomous driving has remained a hot topic in recent years. Baidu and Huawei are pushing L4 technology, while EV upstarts like Xpeng Motors and NIO compete on self-parking capabilities and ADAS (Advanced Driver Assistance Systems) over-the-air updates. Apple, though it hasn't made any official announcement, is also developing autonomous driving technology. In this space, FreeS Fund has conducted extensive research and discussion, and has deployed capital into more than ten lidar, millimeter-wave radar, and upstream/downstream projects.

Our basic read on how the autonomous driving industry has evolved over the past few years comes down to a few key points:

  • Autonomous driving is the future of the auto industry — be confident about that. But technological and commercial development is gradual. It takes enormous time for validation and iteration to reach L3 and L4 levels.

L3 means conditional automation; L4 means high automation. For specifics, refer to the SAE (Society of Automotive Engineers) levels of driving automation. By North American standards, we may need to reach Level 5 before we achieve the ideal state of fully driverless vehicles without steering wheels.

  • Sensors will hit the road before autonomous driving systems are fully mature. They'll enter through assisted driving, gradually accumulating data and optimizing systems and technology — from simple to complex, from assisted and semi-automated, ultimately reaching the highly autonomous L4 level.

  • On the sensor dimension, we've always believed multi-sensor fusion is the most rational configuration for autonomous driving. The main sensors are camera vision, lidar, and millimeter-wave radar.

Looking at how autonomous driving has developed over the past few years, a large number of mass-produced vehicles have already gained some level of assisted driving capability, with a few brands even attempting L3 and L4. Various sensors have begun appearing in production vehicles — not just vision cameras, but also millimeter-wave radar and lidar. In this piece, we offer some analysis of the sensors involved in autonomous driving, hoping to provide fresh angles for thinking about the space.

Autonomous Driving: OPA Lidar in the Context of Multi-Sensor Fusion

By Yongcheng Yang (yangyongcheng@freesvc.com)

On sensor configuration for autonomous driving, the main debate in the industry is this: Is the technological trend toward camera-only vision, or multi-sensor fusion combining cameras, lidar, and millimeter-wave radar?

This question is heated and controversial for several reasons. First, autonomous driving technology is a massive ecosystem of integrated hardware and software. Choosing a sensor isn't an isolated hardware decision: it involves supporting software, algorithms, and large-scale accumulation of AI-related data. It is a choice of technological ecosystem, where changing one part affects everything else. Introducing any hardware sensor therefore requires massive investment in supporting hardware and software, plus long-term data accumulation, and demands extreme caution.

Additionally, the controversy over sensor configuration can't be discussed without Tesla's insistence on camera-only solutions and Elon Musk's shaping of technical discourse. According to public reports, Musk has stated outright that Tesla's autonomous driving system won't use lidar now or in the future. Per Caijing.com: "Tesla CEO Elon Musk once said that lidar is like having a bunch of appendices on a person — the appendix itself is essentially meaningless; any company relying on lidar is likely to die a quiet death."

/ 01 / The Route Debate: Why Multi-Sensor Fusion Is More Rational Than Camera-Only

To analyze the autonomous driving sensor configuration question clearly, and answer why multi-sensor fusion is more rational than camera-only going forward, I think we need to address at least two questions first:

Why Does Tesla Insist on Camera-Only?

First, we need to be clear: Tesla is a car brand, or what we'd call an OEM (original equipment manufacturer). Tesla's overall position in the automotive supply chain is vehicle integration — selecting suitable automotive components from the industry's massive supplier ecosystem (Tier 1) that fit its vehicle product definition, such as motors, electronic controls, mechanical structures, multimedia, and so on, then assembling and producing cars.

Tesla Model S Sensor Configuration

(Image source: Tesla official website)

From this perspective, Tesla should be "pragmatic" and "neutral" about component and technology choices. So why does Tesla oppose multi-sensor setups, and lidar in particular? The problem likely lies with lidar itself. First, the technology isn't mature enough: the companies that can build truly solid-state lidar to automotive-grade standards don't yet have mass-production capability. Second, lidar prices haven't dropped to a level that allows widespread adoption. In short, there's no suitable lidar ready for mass production in vehicles yet.

So, put yourself in Tesla's and Musk's shoes. Tesla is the leading autonomous driving vehicle brand and the biggest commercial beneficiary of the autonomous driving concept. It already has a camera vision solution in mass production that has won market recognition and accumulated massive amounts of data. As the most successful commercial case of camera-based autonomous driving, Tesla has full confidence in the camera-only solution, and it is both necessary and logical for the company to project that confidence to the market and to users. Conversely, if Tesla were to advocate multi-sensor fusion at this point, it could be "misread" as an admission that the camera-only solution has flaws and deficiencies that other sensors will need to compensate for.

For a product with a lifecycle as long as a car's, this kind of messaging could easily dampen the purchase desire of potential users. And objectively, there truly isn't any cost-effective solid-state lidar on the market for Tesla to adopt. For a mature, pragmatic, and famous automaker, this is clearly a "vicious cycle" that Tesla doesn't want to find itself in right now.

Since Humans Can Drive Using Only Eyes as a Single Sensor, Why Can't AI Achieve Autonomous Driving Through Vision Cameras?

To answer this, we need to dig slightly deeper into human visual recognition. First, human vision is a highly intelligent, automated complex system with functions including fixed focus, autofocus, zoom, multi-area vision, and more.

For example, while driving, our gaze switches focal length and imaging between distant and near views. And even when focusing on near views, we retain detection capability for distant targets, especially moving ones — so when there's a moving target in the distance, we can quickly switch the imaging plane to that distant target by adjusting the eye's crystalline lens.

At the same time, for targets requiring special attention, such as road obstacles, we can "fix our gaze" for detailed imaging of specific regions, providing localized target visual resolution.

Additionally, we can adjust our eyes' viewing angles and visual range through neck and head movements, avoiding blind spots while applying our retina's limited imaging resolution to regions of particular concern.

We also have a highly intelligent, automated pupil "aperture" controlling light intake, adapting to different ambient lighting conditions.

By contrast, onboard autonomous driving electronic cameras are basically fixed focal length, fixed FOV, fixed aperture, and fixed position — completely lacking the automation and flexibility of human eyes. This explains why humans can manage with just two eyes, while autonomous driving systems need a dozen or even twenty-plus cameras.

For example, the forward-facing camera mounted at the front of the vehicle needs multiple units to handle close-range, mid-range, and long-range visual imaging. The reason for using multiple cameras with divided labor rather than a "general-purpose" camera like the human eye, beyond cost considerations, is that achieving human eye-like intelligence and automation would necessarily require extensive motors, mechanical movement, and control components. Things like focal length adjustment motors face serious technical challenges in automotive environments. The reason: vehicles operate across wide temperature ranges, with intense motion and vibration, and extremely long required mean time between failures.

To summarize: due to the harshness of automotive operating environments, the industry ultimately chose "solid-state" cameras. And the current intelligence, automation, and flexibility level of these "solid-state" cameras still falls far short of human eyes.

Furthermore, when it comes to measuring distance, camera vision (and the human eye, for that matter) is at a massive disadvantage compared to lidar. Lidar measures distance (depth) directly, whereas visual ranging infers it from images, so its precision and accuracy depend heavily on the characteristics of the measured object and the background. As a result, ranging algorithms can fail for certain combinations of background and target. Human eyes can at least compensate somewhat through "mechanical" actions such as tilting the head to change the viewing angle and obtain images from different perspectives, which improves triangulation accuracy.
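To make the contrast concrete, here is a minimal sketch that compares the first-order ranging error of two-camera triangulation with that of a direct time-of-flight measurement. The focal length, baseline, disparity error, and timing error are illustrative assumptions, not figures from any production system; the disparity error stands in for how well image matching works on a given target and background.

```python
C = 299_792_458.0  # speed of light in m/s


def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth from two-view triangulation: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px


def stereo_depth_error(focal_px: float, baseline_m: float,
                       depth_m: float, disparity_err_px: float) -> float:
    """First-order depth error: dZ ~= Z^2 / (f * B) * dd (grows with Z squared)."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_err_px


def tof_distance(round_trip_s: float) -> float:
    """Direct ranging: the light travels out and back, so distance = c * t / 2."""
    return C * round_trip_s / 2.0


def tof_distance_error(timing_err_s: float) -> float:
    """Direct-ranging error depends on timing error only, not on range."""
    return C * timing_err_s / 2.0


if __name__ == "__main__":
    # Hypothetical camera pair: 1000 px focal length, 0.3 m baseline,
    # quarter-pixel matching error. Hypothetical lidar: 1 ns timing error.
    f_px, base_m, d_err = 1000.0, 0.3, 0.25
    for z in (10.0, 50.0, 100.0):
        cam = stereo_depth_error(f_px, base_m, z, d_err)
        lidar = tof_distance_error(1e-9)
        print(f"range {z:5.0f} m: stereo error {cam:6.2f} m, ToF error {lidar:4.2f} m")
```

Under these assumed numbers the triangulation error grows from roughly 0.08 m at 10 m to roughly 8.3 m at 100 m, while the time-of-flight error stays near 0.15 m at every range. If matching degrades on a low-texture target, the disparity error grows and the gap widens further.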

Even so, human eyes have innate deficiencies in distance perception and low-light conditions. Fortunately, from an evolutionary perspective, human visual capability was sufficient to "get by" throughout the long "low-speed" era of human history when we moved on foot. The automobile, this high-speed machine, has existed for an extremely brief period relative to human history.

So, before human vision has evolved (or perhaps can evolve) new sensory functions, borrowing specialized sensors like lidar and millimeter-wave radar to measure distance seems like a reasonable and smart choice. It's like how, lacking the night vision of wild animals, we can use flashlights and car headlights to navigate long roads through the dark.

So multi-sensor fusion should be a rational and almost inevitable trend for autonomous driving. As for which sensor plays the bigger role in autonomous driving algorithms, that's probably less of a concern for sensor companies and more for the algorithm developers.

/ 02 / Sensing Principles: Cameras, Millimeter-Wave Radar, and Lidar

When discussing multi-sensor fusion for autonomous driving, we're mainly talking about cameras, millimeter-wave radar, and lidar. The general characteristics, advantages, and disadvantages of these three sensors have been discussed extensively in the industry, so we won't rehash them here.

One point worth adding: in terms of sensing principles, cameras and lidar are both fundamentally optical imaging. Cameras mainly image with visible light from sources such as sunlight, street lamps, and vehicle headlights, which makes relatively high planar resolution easier to achieve. Lidar uses its own dedicated light source, typically an invisible infrared laser, and its advantage is that it obtains target distance (depth) information. Because that infrared laser source is spectrally narrowband, lidar is inherently more resistant to interference than imaging with natural light.

In addition, because the emitted laser light is coherent, lidar can borrow the coherent reception techniques that the optical communications industry has built up over many years to achieve a higher signal-to-noise ratio, and to obtain further measurements alongside distance, such as target velocity. This is the main reason the industry is bullish on FMCW (Frequency Modulated Continuous Wave) technology.
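As a rough illustration of how FMCW can yield range and velocity from one measurement, the sketch below uses the textbook triangular-chirp model: the beat frequency on the up-chirp and on the down-chirp each mix a range term and a Doppler term, and their sum and difference separate the two. The 1550 nm wavelength, the chirp bandwidth and duration, and the sign convention (positive velocity means an approaching target) are assumptions chosen for the example, not parameters of any specific product.

```python
C = 299_792_458.0  # speed of light in m/s


def chirp_slope(bandwidth_hz: float, chirp_time_s: float) -> float:
    """Frequency sweep rate of a linear chirp, in Hz per second."""
    return bandwidth_hz / chirp_time_s


def beat_frequencies(range_m: float, velocity_mps: float,
                     slope: float, wavelength_m: float) -> tuple[float, float]:
    """Forward model: beat frequencies seen on the up-chirp and the down-chirp."""
    f_range = 2.0 * range_m * slope / C          # delay converted to frequency
    f_doppler = 2.0 * velocity_mps / wavelength_m  # Doppler shift of the echo
    return f_range - f_doppler, f_range + f_doppler


def range_and_velocity(f_up: float, f_down: float,
                       slope: float, wavelength_m: float) -> tuple[float, float]:
    """Inverse model: separate the range and Doppler terms, then convert back."""
    f_range = (f_up + f_down) / 2.0
    f_doppler = (f_down - f_up) / 2.0
    return C * f_range / (2.0 * slope), wavelength_m * f_doppler / 2.0


if __name__ == "__main__":
    # Assumed parameters: 1550 nm laser, 1 GHz sweep over 10 microseconds.
    wavelength = 1550e-9
    slope = chirp_slope(1e9, 10e-6)
    f_up, f_down = beat_frequencies(100.0, 20.0, slope, wavelength)
    r, v = range_and_velocity(f_up, f_down, slope, wavelength)
    print(f"beats: {f_up / 1e6:.1f} MHz up, {f_down / 1e6:.1f} MHz down")
    print(f"recovered range {r:.2f} m, velocity {v:.2f} m/s")
```

A pulsed time-of-flight lidar would need at least two range measurements separated in time to estimate the same velocity, which is the practical appeal of the coherent approach.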

Millimeter-wave radar uses radio waves, in a different part of the spectrum from light, and has very clear advantages in harsh weather such as rain, fog, and haze. However, current millimeter-wave radar still lags behind cameras and lidar in planar resolution and distance resolution. Improving these two resolutions requires more channels and antenna elements, wider bandwidth, and higher radio frequency bands, all of which demand new technical work and make cost control harder. There is no shortage of innovators in the millimeter-wave radar space. For example, FreeS Fund portfolio company Calterah Semiconductor (加特兰微电子) focuses on developing and designing CMOS-process millimeter-wave radar chips, and was the first in the world to mass-produce automotive-grade CMOS-process 77/79GHz millimeter-wave radar RF front-end chips.
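The link between bandwidth, channel count, and resolution can be sketched with two standard approximations: range resolution is c / (2B) for sweep bandwidth B, and the angular resolution of a uniform linear array near boresight is about wavelength / (N * spacing) radians. The bandwidths and channel counts below are illustrative assumptions, not the specifications of any chip mentioned in this article.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def range_resolution_m(bandwidth_hz: float) -> float:
    """Smallest separable range difference for a sweep of the given bandwidth."""
    return C / (2.0 * bandwidth_hz)


def angular_resolution_deg(carrier_hz: float, n_channels: int,
                           spacing_m: float) -> float:
    """Approximate boresight angular resolution of a uniform linear array."""
    wavelength = C / carrier_hz
    return math.degrees(wavelength / (n_channels * spacing_m))


if __name__ == "__main__":
    carrier = 79e9                      # assumed carrier frequency
    half_wave = C / carrier / 2.0       # assumed half-wavelength element spacing
    for bw in (1e9, 4e9):               # assumed sweep bandwidths
        print(f"bandwidth {bw / 1e9:.0f} GHz -> "
              f"range resolution {range_resolution_m(bw) * 100:.1f} cm")
    for n in (8, 48, 192):              # assumed (virtual) channel counts
        print(f"{n:3d} channels -> "
              f"angular resolution {angular_resolution_deg(carrier, n, half_wave):.2f} deg")
```

With these assumptions, quadrupling the bandwidth from 1 GHz to 4 GHz tightens range resolution from about 15 cm to about 3.7 cm, and angular resolution improves in direct proportion to the number of channels, which is why both drive up chip complexity and cost.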

Lidar is the most recent of these sensors to enter the automotive industry, and also the one that generates the most discussion. Starting from Velodyne's multi-line mechanically scanning lidar, numerous startup teams have experimented with MEMS lidar, Flash (ToF) array imaging, OPA (Optical Phased Array) laser scanning, and other technical approaches. The effort runs along two lines: first, reducing cost; second, minimizing mechanical moving parts to achieve solid-state lidar that can meet the automobile's harsh working environment and its requirement for extremely long failure-free operation.

/ 03 / Lidar: The ToF vs. OPA Debate

From a technical standpoint, Flash (ToF) array imaging and OPA are undoubtedly closer to an ideal solid-state architecture, as both are based on semiconductor chip technology. MEMS (Micro-Electro-Mechanical System) technology also uses semiconductor processes, but because it steers the laser beam with micro-electromechanical mirrors built inside the chip, it still contains a mechanical moving part. Achieving well-controlled scan lines over large scan angles with such tiny mirrors also poses many technical challenges.

Flash lidar is the approach closest to camera technology, and it can draw on many mature technologies and manufacturing processes from the camera industry. However, because Flash lidar uses an array light source and is constrained by eye-safety standards and other practical limits, it is better suited to close-range and mid-range applications. Apple chose the Flash approach for the lidar it introduced on the 2020 iPad Pro. That lidar uses direct Time of Flight (dToF) to measure light reflected from indoor or outdoor environments at distances of up to five meters.

Single-photon dToF is a key technology for making lidar small, low-cost, and mass-producible in the future. FreeS Fund portfolio company Nanjing CoreXVision Microelectronics focuses on photoelectric conversion device design and single-photon detection imaging technology, providing single-photon dToF 3D image sensing chips, ultra-high-speed optical interconnect chips for hyperscale data centers, and system solutions.
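For a sense of the timescales dToF works with, the sketch below computes the round-trip time for a target at a given distance and then recovers depth from a list of photon arrival times by building a histogram and taking its peak, which is the usual way single-photon timing data is reduced. The 100 ps bin width and the photon counts are hypothetical, and this is a generic illustration rather than a description of Apple's or any vendor's implementation.

```python
from collections import Counter

C = 299_792_458.0  # speed of light in m/s


def round_trip_time(distance_m: float) -> float:
    """Time for a light pulse to reach a target and return."""
    return 2.0 * distance_m / C


def depth_per_bin(bin_width_s: float) -> float:
    """Depth covered by one timing bin, which bounds the depth resolution."""
    return C * bin_width_s / 2.0


def depth_from_timestamps(timestamps_s: list[float], bin_width_s: float) -> float:
    """Histogram photon arrival times and convert the peak bin to a distance."""
    if not timestamps_s:
        raise ValueError("no photon timestamps to histogram")
    bins = Counter(int(t // bin_width_s) for t in timestamps_s)
    peak_bin, _count = bins.most_common(1)[0]
    t_peak = (peak_bin + 0.5) * bin_width_s  # use the centre of the peak bin
    return C * t_peak / 2.0


if __name__ == "__main__":
    t0 = round_trip_time(5.0)
    print(f"round trip for 5 m: {t0 * 1e9:.2f} ns")
    # Hypothetical capture: 50 signal photons plus 30 scattered background photons.
    stamps = [t0] * 50 + [i * 1.7e-9 for i in range(30)]
    print(f"estimated depth: {depth_from_timestamps(stamps, 100e-12):.3f} m")
```

A five-meter target returns light in about 33 nanoseconds, and a 100 ps bin corresponds to about 1.5 cm of depth, which is why the detector and timing circuitry dominate the design of these chips.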

OPA lidar is the technical direction the industry places the highest hopes on. Its principle: by controlling the phase of emitted signals from each antenna in a nano-antenna array, signals from different antenna units interfere with each other, thereby controlling the direction of the output laser beam. It is the approach whose principle most closely resembles phased-array radar in the radio frequency spectrum. If it can be realized on a chip, it would be the most ideal pure solid-state lidar technical route currently visible. Beyond the advantage of being purely solid-state, because OPA lidar completely controls laser scanning direction through electrical signals, it can dynamically adjust scanning angle ranges, performing global scanning of target areas or localized fine scanning of specific regions. A single lidar could potentially cover close-range, mid-range, and long-range target distance detection. If combined with FMCW technology, it could also directly provide target velocity. In other words, OPA lidar achieves maneuverability and flexibility similar to human vision without employing any mechanical moving parts.
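The steering principle can be sketched with the standard uniform-linear-array model: applying a linear phase ramp across the emitters tilts the direction in which their outputs add constructively. The element count, 1550 nm wavelength, and half-wavelength pitch below are idealized assumptions for illustration; a wider pitch would introduce grating lobes, which this sketch does not model, and none of the values describe a specific company's chip.

```python
import cmath
import math


def steering_phases(n_elements: int, pitch_m: float,
                    wavelength_m: float, steer_deg: float) -> list[float]:
    """Per-element phase (radians) that points the main beam at steer_deg."""
    k = 2.0 * math.pi / wavelength_m
    s = math.sin(math.radians(steer_deg))
    return [-k * n * pitch_m * s for n in range(n_elements)]


def array_factor(phases: list[float], pitch_m: float,
                 wavelength_m: float, look_deg: float) -> float:
    """Normalized far-field amplitude in direction look_deg (1.0 = fully in phase)."""
    k = 2.0 * math.pi / wavelength_m
    s = math.sin(math.radians(look_deg))
    total = sum(cmath.exp(1j * (k * n * pitch_m * s + p))
                for n, p in enumerate(phases))
    return abs(total) / len(phases)


def beam_peak_deg(phases: list[float], pitch_m: float, wavelength_m: float,
                  scan_deg: float = 60.0, step_deg: float = 0.1) -> float:
    """Search a grid of angles and return the direction of strongest emission."""
    steps = int(round(2.0 * scan_deg / step_deg))
    angles = [-scan_deg + i * step_deg for i in range(steps + 1)]
    return max(angles, key=lambda a: array_factor(phases, pitch_m, wavelength_m, a))


if __name__ == "__main__":
    wavelength = 1550e-9           # assumed laser wavelength
    pitch = wavelength / 2.0       # assumed (idealized) emitter pitch
    for target in (-20.0, 0.0, 10.0):
        phases = steering_phases(64, pitch, wavelength, target)
        peak = beam_peak_deg(phases, pitch, wavelength)
        print(f"requested {target:6.1f} deg -> beam peak at {peak:6.1f} deg")
```

Because the beam direction depends only on the phase values, the same array can be re-pointed or restricted to a narrow region simply by loading a different set of phases, which is the flexibility the paragraph above describes.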

Due to OPA technology's inherent advantages, significant capital and technical talent have been invested both domestically and internationally. FreeS Fund has invested in two companies dedicated to OPA chips and lidar: Luminwave and 力策科技 (Lice Technology). Luminwave focuses on silicon photonics-based OPA solutions, while Lice innovated and developed spatial light modulation OPA technology and chips based on III-V semiconductor materials.

In the development of OPA technology, spatial light modulator (SLM) technology based on liquid crystal materials was once an important direction of exploration. However, a liquid crystal SLM scans extremely slowly (on the order of milliseconds), far short of the rapid imaging that autonomous driving requires. From a user's perspective, Lice Technology's OPA approach is equivalent to a high-speed SLM that performs high-speed spatial modulation of a single-wavelength laser. From a technical standpoint, it is a genuine innovation, and it is also the OPA solid-state lidar approach currently closest to mass production.

After several years of effort, both Luminwave and Lice Technology have achieved good results in their respective technical directions. Luminwave recently completed a financing round of tens of millions of RMB, and Lice Technology is about to begin a new financing round.

Summary

1. Multi-sensor fusion should be a rational and almost inevitable trend for autonomous driving. Current "solid-state" cameras' intelligence, automation, and flexibility levels still fall far short of human eyes. Additionally, on the metric of distance measurement, camera vision — even including human eyes — has massive disadvantages compared to lidar. Therefore, multi-sensor fusion leveraging the respective advantages of cameras, millimeter-wave radar, and lidar is currently the more feasible approach in autonomous driving.

2. Lidar is the most recent of these sensors to enter the automotive industry, and also the one that generates the most discussion. Currently, the lidar industry is mainly pushing in two directions: first, reducing cost; second, minimizing mechanical moving parts to achieve solid-state lidar that can meet the automobile's harsh working environment and its requirement for extremely long failure-free operation.

3. From a technical standpoint, Flash (ToF) array imaging and OPA are undoubtedly closer to an ideal solid-state architecture. Both are based on semiconductor chip technology.
