Challenges of Mass Production Autonomous Driving in China
And the Recent Progress from Xpeng Motors in 2023
This blog post is based on the keynote speech in the End-to-end Autonomous Driving Workshop at CVPR 2023 held in Vancouver, titled “The Practice of Mass Production Autonomous Driving in China”. The recording of the keynote can be found here.
Autonomous driving is a daunting challenge, especially in China, where driving conditions are already among the most challenging in the world, even for humans. Three main factors come into play: dynamic traffic participants, static road structure, and traffic signals. In particular, traffic light control signals pose a unique challenge because they are static in geometry but dynamic in semantics. In the following sections, we will briefly review dynamic objects and static environments, and then do a deep dive on the interesting and special topic of traffic lights.
Dynamic and Static Challenges
Dynamic traffic participants, such as vulnerable road users (VRUs), pose significant challenges for autonomous vehicles in China. VRUs are often unpredictable, taking on different poses and appearing where drivers least expect them. Large animals can suddenly appear on rural roads, while pets may wander onto urban streets. In addition, when vehicles or tricycles are fully loaded, it can be difficult to pinpoint the exact vehicle type. Consider the last photo in the middle row: even humans find it hard to recognize the scene at first sight. The vehicle, loaded with tree branches, is inadvertently in perfect camouflage.

Static road structure and topology can pose a significant challenge for autonomous vehicles as well. For example, the complex intersection shown here highlights the level of complexity that must be addressed. Although it resembles a screenshot from a sci-fi movie, this intersection is in fact a real place, viewable on Google Earth.

If we zoom in, we find an interesting road element that is perhaps unique to China: the Left-turn Waiting Area. It is designed to increase left-turn throughput, allowing more cars to go through the intersection within one cycle of the traffic light. Note that the design may not be symmetric; each direction is designed individually depending on its traffic pattern. There are even academic papers studying the waiting area and its effectiveness. Although it was proposed with good intentions, it can be really confusing for new drivers and for autonomous vehicles alike.
Turning left at an intersection with a waiting area is a two-step process, and each step involves a different combination of traffic light signals. Here I only show the most common traffic light pattern. The combination can be more complex, sometimes involving special traffic lights dedicated to the waiting area.
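The two-step process can be pictured as a small state machine. The sketch below is a minimal illustration, assuming a common arrangement in which the waiting area may be entered once the through (straight) light turns green and the turn is completed once the left-turn arrow turns green. `Phase` and `next_phase` are hypothetical names, and intersections with dedicated waiting-area lights would need more inputs than these two booleans.

```python
from enum import Enum, auto


class Phase(Enum):
    APPROACH = auto()         # waiting behind the main stop line
    IN_WAITING_AREA = auto()  # stopped inside the left-turn waiting area
    TURNING = auto()          # cleared to complete the left turn


def next_phase(phase: Phase, through_green: bool, left_green: bool) -> Phase:
    """Advance a hypothetical two-step left-turn state machine by one signal observation.

    Assumed pattern: the waiting area may be entered once the through (straight) light
    is green, and the turn is completed once the left-turn arrow is green.
    """
    if phase is Phase.APPROACH:
        if left_green:
            return Phase.TURNING          # arrow already green: turn in one go
        if through_green:
            return Phase.IN_WAITING_AREA  # step 1: pull forward into the waiting area
        return Phase.APPROACH
    if phase is Phase.IN_WAITING_AREA:
        # step 2: hold inside the waiting area until the left-turn arrow turns green
        return Phase.TURNING if left_green else Phase.IN_WAITING_AREA
    return Phase.TURNING


# Example: all red, then through green, then both red, then left arrow green
phase = Phase.APPROACH
for through_green, left_green in [(False, False), (True, False), (False, False), (False, True)]:
    phase = next_phase(phase, through_green, left_green)
    print(phase.name)  # APPROACH, IN_WAITING_AREA, IN_WAITING_AREA, TURNING
```

Note that a vehicle already inside the waiting area stays there when the through light turns red; it only needs the left arrow to proceed.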

The King of Corner Case: Traffic Lights
Now we can take a deep dive into the corner cases of traffic lights. Traffic lights are perhaps the category of objects with the longest tail of corner cases. Traffic light perception is complicated for two reasons. First, we have to recognize the location, type, and color of each traffic light. Second, out of all the traffic lights we detect (six in this example), we need to know which one our vehicle should pay attention to. To make this decision, it is essential to obtain the correct matching between traffic lights and lanes.
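To make the matching problem concrete, here is a deliberately simplified, rule-based sketch. It assumes each detected light already has a shape, a color, and a lateral offset, and that each lane knows its allowed maneuver. The rules (a matching arrow beats a circular light; ties go to the laterally closest head) and all names (`DetectedLight`, `Lane`, `match_light_to_lane`) are hypothetical illustrations, not a description of any production system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DetectedLight:
    shape: str        # "circle", "left", "straight" or "right" (hypothetical label set)
    color: str        # "red", "yellow" or "green"
    lateral_m: float  # lateral offset of the light head from the ego vehicle, in meters


@dataclass
class Lane:
    maneuver: str     # maneuver this lane allows: "left", "straight" or "right"
    center_m: float   # lateral offset of the lane center from the ego vehicle, in meters


def match_light_to_lane(lights: List[DetectedLight], lane: Lane) -> Optional[DetectedLight]:
    """Pick the light that governs `lane` under a simplified, rule-based matching.

    Assumptions: an arrow matching the lane's maneuver takes precedence over circular
    lights, and among equally valid candidates the one laterally closest to the lane wins.
    """
    arrows = [light for light in lights if light.shape == lane.maneuver]
    candidates = arrows or [light for light in lights if light.shape == "circle"]
    if not candidates:
        return None
    return min(candidates, key=lambda light: abs(light.lateral_m - lane.center_m))


lights = [
    DetectedLight("left", "green", -6.0),
    DetectedLight("straight", "red", -2.0),
    DetectedLight("straight", "red", 4.0),
    DetectedLight("circle", "red", 0.0),
]
print(match_light_to_lane(lights, Lane("left", -3.5)))
# DetectedLight(shape='left', color='green', lateral_m=-6.0)
```

Real intersections break these rules often (repeated heads, lights for other roads, occlusion), which is exactly why the matching is hard.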

One special type of light is the traffic light designed for buses. We have to recognize these lights correctly for two reasons. First, for planning and control of the ego vehicle, we need to recognize them so that we can correctly ignore them: they may carry information that conflicts with the lights we should pay attention to, and thus confuse our autonomous vehicle. Second, to predict how a nearby bus will maneuver, we also need to know their status correctly.

Traffic lights designed for buses in China come in many forms, including LED lights with labels such as “BRT,” “SRT,” “Bus,” or a single letter “B”. They can also feature specific Chinese characters like “公交” (bus) or “有轨电车” (tram), and sometimes include icons depicting a cute little bus. Traffic sign modifiers may also accompany these lights, so autonomous vehicles must detect and recognize all these features and associate them accurately with the corresponding traffic lights.
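The dual role of bus lights (ignored for ego planning, used for bus prediction) can be sketched as a routing step after recognition. This minimal example assumes the label text has already been read by OCR and that an icon detector flags bus icons; the dictionary format, `BUS_LABELS`, `is_bus_light`, and `split_lights` are hypothetical, and exact string matching is a stand-in for a real classifier.

```python
from typing import Dict, List, Optional, Tuple

# Label vocabulary taken from the examples above; real deployments would need more.
BUS_LABELS = {"BRT", "SRT", "BUS", "B", "公交", "有轨电车"}


def is_bus_light(label_text: Optional[str], has_bus_icon: bool = False) -> bool:
    """Return True if the OCR'd label or a detected bus icon marks a bus/tram-only light."""
    if has_bus_icon:
        return True
    if not label_text:
        return False
    # upper() normalizes "Bus"/"bus" to "BUS" and leaves Chinese characters unchanged
    return label_text.strip().upper() in BUS_LABELS


def split_lights(lights: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Send bus lights to bus-behavior prediction and all other lights to ego planning."""
    for_ego, for_bus_prediction = [], []
    for light in lights:
        if is_bus_light(light.get("label"), light.get("bus_icon", False)):
            for_bus_prediction.append(light)
        else:
            for_ego.append(light)
    return for_ego, for_bus_prediction


lights = [
    {"id": 0, "color": "red"},
    {"id": 1, "color": "green", "label": "BRT"},
    {"id": 2, "color": "green", "bus_icon": True},
]
ego, bus = split_lights(lights)
print([light["id"] for light in ego], [light["id"] for light in bus])  # [0] [1, 2]
```

Note how the green bus lights would have contradicted the red light meant for the ego vehicle had they not been filtered out.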


In addition to the traffic lights dedicated to buses, another complex type of traffic light is the multibulb traffic light. Unlike traditional traffic lights where only one bulb is lit up at a time, multibulb traffic lights may have multiple bulbs illuminated simultaneously within the same socket. Therefore, detecting a traffic light box is not enough; it is equally important to detect the individual light bulbs and interpret their semantic meaning accurately.
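Once individual bulbs are detected, their combination still has to be turned into meaning. The sketch below assumes each lit bulb is encoded as a `(shape, color)` pair and applies one plausible rule: an arrow for a movement overrides a circular bulb, and a circular bulb covers movements without their own arrow. The encoding and the `decode_socket` function are hypothetical; real multibulb layouts can follow other conventions.

```python
from typing import Dict, Iterable, Sequence, Tuple

Bulb = Tuple[str, str]  # (shape, color), e.g. ("left", "green"); hypothetical encoding


def decode_socket(lit_bulbs: Iterable[Bulb],
                  movements: Sequence[str] = ("left", "straight")) -> Dict[str, str]:
    """Turn the set of simultaneously lit bulbs in one socket into a color per movement.

    Assumed rule: an arrow bulb for a movement overrides a circular bulb, and a circular
    bulb applies to every movement without its own arrow. Otherwise "unknown".
    """
    bulbs = list(lit_bulbs)
    circle = next((color for shape, color in bulbs if shape == "circle"), None)
    decoded = {}
    for movement in movements:
        arrow = next((color for shape, color in bulbs if shape == movement), None)
        decoded[movement] = arrow or circle or "unknown"
    return decoded


# Red circle and green left arrow lit at the same time in one socket
print(decode_socket([("circle", "red"), ("left", "green")]))
# {'left': 'green', 'straight': 'red'}
```

A detector that only reported "this box is red" would miss the permitted left turn entirely.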
The additional image of a multibulb traffic light also shows some numbers. These are countdown timers until the next color change. Countdown timers for pedestrians are quite common in North America, but these timers are meant for vehicles. Used correctly, this information can help the planner improve the smoothness of the ride.
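As an illustration of how a countdown could smooth the ride, the sketch below decides, well before the light changes, whether to keep going or to start braking gently. It assumes constant speed up to the stop line and uses a hypothetical safety buffer `margin_s`; `plan_with_countdown` is an illustrative name, and a real planner would reason over full trajectories.

```python
from typing import Tuple


def plan_with_countdown(distance_m: float, speed_mps: float,
                        green_remaining_s: float, margin_s: float = 1.0) -> Tuple[str, float]:
    """Decide early whether to keep going or start braking for the stop line.

    Assumes constant speed until the stop line; margin_s is a hypothetical safety buffer.
    Returns (action, deceleration in m/s^2 needed to stop exactly at the line).
    """
    if speed_mps <= 0:
        raise ValueError("speed_mps must be positive")
    if distance_m <= 0:
        return "proceed", 0.0  # already at or past the stop line
    time_to_line = distance_m / speed_mps
    if time_to_line + margin_s <= green_remaining_s:
        return "proceed", 0.0
    # Uniform deceleration from v to 0 over d: v^2 = 2 * a * d  =>  a = v^2 / (2 * d)
    return "prepare_to_stop", speed_mps ** 2 / (2 * distance_m)


print(plan_with_countdown(30.0, 10.0, 5.0))  # ('proceed', 0.0)
print(plan_with_countdown(30.0, 10.0, 2.0))  # ('prepare_to_stop', 1.6666666666666667)
```

Without the countdown, the vehicle would only learn about the change when the light turns yellow, leaving a shorter distance and therefore requiring a harder brake.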

Countdown timers can take on a variety of forms and be presented in different ways. They could be standalone displays or integrated with the traffic light system. The format of the digits could vary, including the use or absence of leading zeros, and the fonts used could differ as well, with some being more artistic than others. Furthermore, there are even traffic lights designed in the style of a progress bar. This involves an animation where the progress bar gradually shortens before changing to a full progress bar of a different color. While this design may be considered the most innovative, it can also pose challenges for our perception engineers.
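Two of these variations can be sketched in a few lines. The first function normalizes OCR'd digits with or without leading zeros; the second estimates the remaining time of a progress-bar style light by fitting a straight line to the bar length observed over several frames. Both assume the raw reading (digit string, bar fraction in [0, 1]) is already available, and the linear-shrink assumption and function names (`parse_countdown`, `progress_bar_remaining`) are hypothetical.

```python
import re
from typing import List, Optional, Tuple


def parse_countdown(text: str) -> Optional[int]:
    """Parse OCR'd countdown digits such as "05", "5" or " 12 "; None if not a number."""
    match = re.fullmatch(r"\s*(\d{1,3})\s*", text)
    return int(match.group(1)) if match else None  # int() drops any leading zero


def progress_bar_remaining(samples: List[Tuple[float, float]]) -> Optional[float]:
    """Estimate seconds until a shrinking progress bar empties.

    samples: (timestamp_s, bar_fraction) pairs from consecutive frames.
    Fits a least-squares line, assuming the bar shrinks at a constant rate,
    and extrapolates from the last sample to fraction 0.
    """
    if len(samples) < 2:
        return None
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_f = sum(f for _, f in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (f - mean_f) for t, f in samples) / var_t
    if slope >= 0:
        return None  # bar is not shrinking (e.g. it just reset to a new color)
    t_last, _ = samples[-1]
    f_at_last = mean_f + slope * (t_last - mean_t)
    return max(0.0, -f_at_last / slope)


print(parse_countdown("05"), parse_countdown("5"), parse_countdown("8_"))  # 5 5 None
print(round(progress_bar_remaining([(0.0, 1.0), (1.0, 0.9), (2.0, 0.8)]), 3))  # 8.0
```

Fitting over several frames rather than differencing two frames makes the estimate less sensitive to noise in the measured bar length.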

Finally, there are traffic lights dedicated to waiting areas, which can take the form of an icon or text. Icon versions typically involve an animation as well, with lights gradually lighting up to guide you into the waiting area. Text versions may appear on LED displays or traffic sign boards. There is no standard pattern for the text either, so Optical Character Recognition (OCR) and a bit of natural language processing are required to extract the semantic meaning.
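A very simple stand-in for the "bit of natural language processing" is keyword matching on the OCR output. The sketch below assumes the text has already been recognized and uses a small, hypothetical rule table: 待转 ("waiting to turn") or 待行 ("waiting to proceed") identifies a waiting-area message, 禁止 ("prohibited") or 勿 ("do not") signals a prohibition, and 进入 ("enter") signals permission. Prohibitions are checked first because a phrase like 禁止进入 ("no entry") also contains 进入. The rules and `interpret_waiting_area_text` are illustrative only; non-standard wording is exactly what makes this hard in practice.

```python
from typing import Optional

# Hypothetical keyword rules, checked in order; negations must come before "enter".
RULES = [
    (("禁止", "勿"), "do_not_enter_waiting_area"),
    (("进入", "可进"), "enter_waiting_area"),
]


def interpret_waiting_area_text(ocr_text: str) -> Optional[str]:
    """Map OCR'd sign text to a coarse waiting-area instruction, or None if unrelated."""
    text = ocr_text.replace(" ", "")
    if "待转" not in text and "待行" not in text:
        return None  # text does not mention a waiting area
    for keywords, meaning in RULES:
        if any(keyword in text for keyword in keywords):
            return meaning
    return "waiting_area_present"


for sign in ["请进入待转区", "左转待转区 禁止进入", "左转待转区", "前方施工"]:
    print(interpret_waiting_area_text(sign))
# enter_waiting_area, do_not_enter_waiting_area, waiting_area_present, None
```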






