High-Resolution Traffic Sensing with Autonomous Vehicles
The last decades have witnessed the breakthrough of autonomous vehicles (AVs), and the perception capabilities of AVs have improved dramatically. Various sensors installed on AVs, including, but not limited to, LiDAR, radar, camera and stereovision, continuously collect massive data and perceive the surrounding traffic states. In fact, a fleet of AVs can serve as floating (or probe) sensors, which can be utilized to infer traffic information while cruising around the roadway networks. In contrast, conventional traffic sensing methods rely on fixed traffic sensors such as loop detectors, cameras and microwave vehicle detectors. Due to the high cost of conventional traffic sensors, traffic state data are usually obtained in a low-frequency and sparse manner. In view of this, this paper leverages the rich data collected through AVs to propose a high-resolution traffic sensing framework. The proposed framework estimates the fundamental traffic state variables, namely flow, density and speed, in high spatio-temporal resolution, and it is developed under different levels of AV perception capabilities and a low AV market penetration rate. The Next Generation Simulation (NGSIM) data is adopted to examine the accuracy and robustness of the proposed framework. Experimental results show that the proposed estimation framework achieves high accuracy even with a low AV market penetration rate. Sensitivity analysis regarding AV penetration rate, sensor configuration, and perception accuracy is also conducted. This study will help policymakers and private sectors (e.g., Uber, Waymo) to understand the value of AVs, especially the value of the massive data collected by AVs, in traffic operation and management.
As the combination of a wide spectrum of cutting-edge technologies, autonomous vehicles (AVs) are destined to fundamentally change and reform the whole mobility system [litman2017autonomous]. AVs have great potential in improving safety and mobility [assidiq2008real, tientrakool2011highway, stern2018dissipation], reducing fuel consumption and emissions [vahidi2018energy, gawron2018life], and redefining civil infrastructure systems such as road networks [chen2016optimal, chen2017optimal, duarte2018impact], parking spaces [zhang2015exploring, harper2018exploring, millard2019autonomous], and public transit systems [lutin2018not, salonen2018passenger]. Over the past two decades, many advanced driver assistance systems (ADAS) (e.g., lane keeping, adaptive cruise control) have been deployed in various types of production vehicles. Currently, both traditional car manufacturers and high-tech companies are competing to lead the full autonomy (L4-L5) technologies. For example, Waymo’s AVs alone were driving 25,000 miles every day in 2018 [waymo2], and Uber has operated commercialized AVs in multiple cities [uberatg].
Despite the rapid development of AV technologies, there is still a long way to go before reaching full autonomy and completely replacing all conventional vehicles with AVs. We will witness a long period over which AVs and conventional vehicles co-exist on public roads. How to sense, model and manage the mixed transportation systems presents a great challenge to public agencies. To the best of our knowledge, most current studies view AVs as controllers and focus on modeling and managing the mixed traffic networks [zhao2019enhanced]. For example, novel System Optimal (SO) and User Equilibrium (UE) models are established to include AVs [levin2017congestion, wang2019multiclass], coordinated intersections are proposed to improve the traffic throughput [shida2009development, li2018piecewise, yu2018integrated], vehicle platooning strategies are developed to reduce highway congestion [li2015overview, gong2016constrained], and AVs can also complement conventional vehicles to solve last-mile problems [chong2011autonomous, moorthy2017shared]. However, there is a lack of studies on traffic sensing methods for the mixed traffic networks.
In this paper, we advocate the great potential of AVs as moving observers in high-resolution traffic sensing. We note that traffic sensing with AVs in this paper is different from the perception of AVs [van2018autonomous]. Perception is the key to safe and reliable AVs, and it refers to the ability of AVs to collect information and extract relevant knowledge from the environment using various sensors [pendleton2017perception]. Traffic sensing with AVs, in contrast, refers to estimating the traffic conditions, such as flow, density and speed, using the information perceived by AVs [seo2017traffic]. To be precise, traffic sensing with AVs is built on top of the perception technologies on AVs, and in this paper we will discuss the impact of different perception technologies on traffic sensing.
In fact, a fleet of autonomous vehicles (AVs) can serve as floating (or probe) sensors, detecting and tracking the surrounding traffic conditions to infer traffic information while cruising around the roadway network. Enabling traffic sensing with AVs is cost effective. The AVs equipped with various sensors and data analytics capabilities may be costly. While costly, those sensors and data are used primarily to detect and track adjacent objects to enable safe AV driving in the first place. Therefore, there is no additional overhead cost of these data collections for traffic sensing since it is a secondary use.
High-resolution traffic sensing is central to traffic management and public policies. For instance, local municipalities would need information regarding how public space (e.g., curbs) is being utilized to set up optimal parking duration limits. Metropolitan planning agencies would need various types of traffic/passenger information, including travel speed, traffic density and traffic flow by vehicle classifications, as well as pedestrians and cyclists. Infrastructure planning requires intensive spatial and temporal coverage, and having data only at sparse locations on highways is far from sufficient. Those data would also be essential to design optimal traffic signal timing and schedule infrastructure maintenance. In addition, non-emergent and emergent incidents are reported by citizens through 911 system, respectively. Automated traffic sensing, both historical and in real time, can complement those systems to enhance their timeliness, accuracy and accessibility. In general, accurate and ubiquitous information on infrastructure and usage patterns in public space is currently missing.
By leveraging the rich data collected through AVs, we are able to detect and track various objects in transportation networks. The objects include, but are not limited to, moving vehicles by vehicle classifications, parked vehicles, pedestrians, cyclists, and signage in public space. When all those objects are continuously tracked in high spatio-temporal resolution, the data can be translated into useful traffic information for public policies and decision making. Traffic sensing based on AV sensors has three key features: it is inexpensive, ubiquitous and reliable. The data are collected by automotive manufacturers for guiding autonomous driving in the first place, which promises great scalability of this approach. With minimal additional effort, the same data can be effectively translated into information useful for the community. For instance, how much time in public space at a particular location is utilized by different classifications of vehicles and by what travel modes, respectively? Can we effectively evaluate the accessibility, mobility and safety of the mobility networks? The sensing coverage will become ubiquitous in the near future, given an increasing market share of autonomous vehicles. Data acquired from individual autonomous vehicles can be compared, validated, corrected, consolidated, generalized and anonymized to retrieve the most reliable and ubiquitous traffic information. In addition, the traffic sensing studied in this paper also implies the future possibility of interventions for effective and timely traffic management. It enables real-time traffic monitoring, potentially safer traffic operation, faster emergency response, and smarter infrastructure management.
The rest of this paper focuses on a critical problem to estimate the fundamental traffic state variables, namely, flow, density and speed, in high resolution to demonstrate the sensing power of AVs. In addition to traffic sensing, there are many aspects and data in community sensing that could be explored in the near future. For example, perception of AVs can be used for monitoring urban forest health, air quality, street surface roughness and many other applications of municipal asset management [ma2019measuring, xu2019ilocus, mahmoudzadeh2019estimating].
Traffic state variables (e.g., flow, density and speed) play a key role in traffic operation and management. Over the past several decades, traffic state estimation (TSE) methods have been developed for not only stationary sensors (i.e., Eulerian data) but also moving observers (i.e., Lagrangian data) [sun2017simultaneous]. Stationary sensors, including loop detectors, cameras and radar, monitor the traffic conditions at a fixed location. Due to the high installation and maintenance cost, stationary sensors are usually sparsely installed in the network, and hence the collected data are not sufficient for practical traffic operation and management [jain2019review]. Data collected by moving observers (e.g., probe vehicles, ride-sourcing vehicles, unmanned aerial vehicles, mobile phones) have better spatial coverage and hence enable cost-effective TSE in large-scale networks [antoniou2011synthesis]. Though the TSE method with moving observers dates back to 1954 [wardrop1954method], recent advances in communication and Internet of Things (IoT) technologies have catalyzed the development and deployment of various moving observers in the real world. Readers are referred to [wang2005real, seo2017traffic] for a comprehensive review of existing TSE models.
To highlight our contributions, we present studies that are closely related to this paper. The moving observers can be categorized into four types: originally defined moving observers, probe vehicles (PVs), unmanned aerial vehicles (UAVs) and AVs. Their characteristics and related TSE models are presented as follows:
Originally defined moving observers. The moving observer method for TSE was originally proposed by [wardrop1954method]. The proposed method requires a probe vehicle to traverse the road and count the number of slower vehicles overtaken by the probe vehicle and the number of faster vehicles which overtake the probe vehicle [wright1973theoretical]. Though the setting of the originally defined moving observers is too ideal for practice, it highlights the value of using Lagrangian data for TSE.
PVs. The PVs refer to all the vehicles that can be geo-tracked, and it includes, but is not limited to, taxis, buses, trucks, connected vehicles, ride-sourcing vehicles [zheng2015trajectory]. The PV data has great advantages in estimating speed, while it hardly contains density/flow information. Studies have explored the sensing power of PVs [o2019quantifying]. PV data is usually used to complement stationary sensor data to enhance the traffic state estimation [herrera2010incorporation, van2018macroscopic]. PVs with spacing measurement equipment can estimate traffic flow and speed simultaneously [wilby2014lightweight, seo2015traffic, seo2015estimation, fountoulakis2017highway].
UAVs. By flying over the roads and viewing from top-view perspectives, UAVs are able to monitor a segment of road or even the entire network [puri2005survey, kanistras2015survey, ke2018real]. UAVs have the advantage of better spatial coverage while extra purchase of UAVs and the corresponding maintenance cost are required. Traffic sensing with UAV has been extensively studied in recent years, including vehicle identification algorithms [zhu2018urban, khan2018unmanned, ke2018real], sensing frameworks [jin2016unmanned, niu2018uav], and UAV routing mechanisms [li2018unmanned, liu2019real].
AVs. AVs can be viewed as probe vehicles equipped with more sensors and hence better perception capabilities. Not only can the AV itself be geo-tracked, but the vehicles surrounding the AV can also be detected and tracked. AVs also share some similarities with UAVs because AVs can scan a continuous segment of road. We believe AVs fall in between PVs and UAVs, and hence existing TSE methods can hardly be applied to AVs. Furthermore, there are few studies on TSE with AVs. [chen2017cyber] presents a cyber-physical system to model the traffic flow near AVs based on flow theory, while the TSE for the whole road is not studied. Recently, Uber ATG conducted an experiment to explore the possibility of TSE using AVs [tseav].
Given the unique characteristics of AVs as moving observers, there is a great need to study the AV-based TSE methods. In view of this, we develop a data-driven framework that estimates high-resolution traffic state variables, namely flow, density and speed using the massive data collected by AVs. The framework clearly defines the task of TSE with AVs involved and considers different perception levels of AVs. A two-step TSE method is proposed under a low AV market penetration rate. The main contributions of this paper are summarized as follows:
It discusses the functionality and role of various sensors in traffic state estimation. The sensing power of AVs is categorized into three levels.
It builds a two-step framework that leverages the sensing power of AVs to estimate high-resolution traffic state variables. The first step directly translates the information observed by AVs and the second step employs data-driven methods to estimate the information that is not observed by AVs. The proposed estimation methods are data-driven and can be interpreted by the traffic flow theory.
The Next Generation Simulation (NGSIM) data is adopted to examine the accuracy and robustness of the proposed framework. Experimental results are compelling, satisfactory and interpretable. Sensitivity analysis regarding AV penetration rate, sensor configuration, and perception accuracy will also be studied.
The remainder of this paper is organized as follows. Section 2 discusses the sensing power of AVs. Section 3 rigorously formulates the high-resolution TSE framework with AVs, followed by a discussion of the solution algorithms in section 4. In section 5, numerical experiments are conducted with NGSIM data to demonstrate the effectiveness of the proposed framework. Lastly, conclusions are drawn in section 6.
2 Sensing power of autonomous vehicles
In this section, we will discuss different levels of AV perception capabilities and how they associate with traffic sensing. We first discuss various sensors installed on AVs and their relation to traffic sensing. Analogous to the automation level definitions from Society of Automotive Engineers (SAE), we define three sensing levels of AVs. Lastly, we discuss a conceptual data center for processing the sensing data.
In this section, we discuss different types of sensors used for AV perception and their potential usage for traffic sensing. Sensors for perception that are mounted on AVs include, but are not limited to, camera, stereo vision camera, LiDAR, radar and sonar [pendleton2017perception].
A camera can detect shapes and colors, so it is widely used for object detection (e.g., signals, pedestrians, vehicles and lane marks). Due to its low cost, multiple cameras can be mounted on a single AV. Theoretically, studies have shown that camera data can be used for object detection, tracking and traffic sensing [shan2015camera, bautista2016convolutional]. In practice, a camera image does not contain depth (distance) information, so localizing vehicles is challenging for a single camera. On modern AV prototypes, cameras are usually fused with a stereo vision camera system or LiDAR to perceive the surrounding environment. In particular, the shape and color information obtained from cameras is essential for object tracking [aly2008real, dollar2009pedestrian]. A stereo vision camera refers to a device with two or more cameras mounted horizontally. It is able to obtain the depth information of each pixel from the slightly different images taken by its cameras.
Light Detection and Ranging (LiDAR) uses a pulsed laser beam to measure the distance between the detected object and itself. LiDAR can also obtain the 3D shape of the detected object. The LiDAR used on AVs typically has a 360° field of view, and its detection range varies from 30 to 150 meters, depending on makers, detection algorithms and weather conditions. Both LiDAR and stereo vision cameras can be used for vehicle detection and 3D mapping. The system latency (time delay for processing the retrieved data) of a stereo vision camera is higher than that of LiDAR, though a stereo vision camera is much cheaper [van2018autonomous]. Theoretically, either LiDAR or a stereo vision camera can be used to build the full AV perception system, while currently most AVs use LiDAR as the primary sensor.
There are two types of radar mounted on AVs. The short-range radar (SRR) is typically used for blind spot detection, parking assist and collision warning. The range for SRR is around meters [takatori2006stand]. Similarly, sonar, with its limited detection range (3 to 5 meters), is also frequently used for blind spot detection and parking assist. Neither of the two sensors is considered appropriate for traffic sensing. In contrast, the long-range radar (LRR), which is primarily used for adaptive cruise control, can potentially be used for traffic sensing. The range for LRR is around 150 meters (see Table 1), and it is dedicated to detecting the preceding vehicle in its current lane.
To conclude, Table 1 summarizes a list of sensors that can potentially be used for traffic sensing, based on the above discussion and [Thakur2018, van2018autonomous].
|Sensor||Usage in traffic sensing||Range|
|Camera||Surrounding vehicle detection/tracking, lane detection||meters|
|Stereo vision camera||Surrounding vehicle detection/tracking, 3D mapping||meters|
|LiDAR||Surrounding vehicle detection/tracking, 3D mapping||meters|
|Long-range radar||Preceding vehicle detection||150 meters|
2.2 Levels of perception
In this section, we discuss how to categorize the sensing power of AVs equipped with the sensors listed in Table 1. The Society of Automotive Engineers (SAE) has proposed a six-level classification for autonomous vehicles [avlevel]. L1 AVs can conduct adaptive cruise control, which is fulfilled by the long-range radar. From the perspective of traffic sensing, an L1 AV can always detect the location and speed of its preceding vehicle in the same lane. From L2 to L5, AVs gradually take control from human drivers. To achieve that, AVs need to continuously observe the surrounding traffic conditions. From the perspective of traffic sensing, L2-L5 vehicles can detect or track the vehicles in their surrounding areas. Here we emphasize the difference between vehicle detection and vehicle tracking. Detection refers to the localization of a certain vehicle when it appears in the detection area of an AV, and tracking means that the AV can keep track of a certain vehicle while it is within the detection area. To be precise, detection does not require the AV to “memorize” the detected vehicles in each time frame, while tracking requires the AV to keep track of the detected vehicles as long as they are within the detection range. Tracking is technically much more challenging than detection. As of today, detection technology is fairly mature, while tracking technology is still not ready for real-world applications [milan2016mot16]. The reason for the difference is that detection and tracking are conducted frame by frame on AVs. If the AV processes 30 frames per second, tracking requires detecting all the vehicles in each frame and matching them across frames, while detection does not require matching vehicles in different frames. The matching is challenging because vehicles often block each other, which makes it difficult for machines to decide whether a detected vehicle is the same vehicle detected in previous frames.
From the perspective of traffic sensing, detection only provides the location of each vehicle, but tracking can provide additional speed information.
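To make the distinction between detection and tracking concrete, the following is a minimal sketch (not the tracking method used on any production AV) of how speed becomes available once detections in two consecutive frames are matched. The function name `match_and_estimate_speed`, the greedy nearest-neighbour matching rule and the `max_jump` threshold are assumptions introduced only for illustration.

```python
from math import hypot


def match_and_estimate_speed(prev, curr, fps=30.0, max_jump=3.0):
    """Greedy nearest-neighbour matching between two consecutive frames.

    prev, curr: lists of (x, y) vehicle positions in metres (hypothetical format).
    fps: frames per second processed by the AV.
    max_jump: largest displacement (metres) accepted between frames (assumed).
    Returns a sorted list of (prev_index, curr_index, speed_in_m_per_s).
    """
    # Enumerate all candidate pairs, closest pairs first.
    pairs = sorted(
        (hypot(cx - px, cy - py), i, j)
        for i, (px, py) in enumerate(prev)
        for j, (cx, cy) in enumerate(curr)
    )
    used_prev, used_curr, matches = set(), set(), []
    for dist, i, j in pairs:
        if dist > max_jump:
            break  # remaining pairs are even farther apart
        if i in used_prev or j in used_curr:
            continue  # each detection is matched at most once
        used_prev.add(i)
        used_curr.add(j)
        # Displacement per frame times frames per second gives speed.
        matches.append((i, j, dist * fps))
    return sorted(matches)


# Detection alone gives only the positions in `curr`;
# matching against `prev` additionally yields a speed per vehicle.
prev = [(0.0, 0.0), (10.0, 3.5)]
curr = [(10.9, 3.5), (1.0, 0.0), (50.0, 0.0)]
matches = match_and_estimate_speed(prev, curr)
```

In this toy input the third detection in `curr` has no counterpart in `prev`, so it is located but receives no speed, which mirrors the occlusion and entry cases that make matching difficult in practice.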
Analogous to the SAE’s automation level definitions, we define three levels of sensing power for AVs, as presented in Figure 1.
The precise descriptions of the three perception levels are as follows.
First level: The primary task is to track the preceding vehicle, which is originally used for adaptive cruise control (ACC). The speed and location of the preceding vehicle are thereby obtained for TSE.
Second level: In addition to the first level, the primary task is to detect and locate surrounding vehicles. Only vehicle counting is needed, and speed information is not required.
Third level: In addition to the second level, the primary task is to track every single vehicle in the detection area, hence the location and speed of each vehicle are monitored by AVs.
Based on the definition of AV perception levels, the first level requires an LRR dedicated to preceding vehicles, the second level requires a LiDAR/radar system, and the third level requires a comprehensive sensor fusion of camera, LiDAR, and radar. To be precise, section 2.3 discusses how different sensors are combined to fulfill different levels of sensing power.
2.3 Detection area of AVs
We clearly define the surrounding area (or detection area) of AVs, which is used throughout the whole paper. The detection area of AVs depends on the sensor configurations. Figure 2 presents two configurations of AV sensors. In the model of nuScenes, various sensors are mounted at different locations of an AV, while Waymo integrates most of the sensors on the top of the vehicle. Based on various sensor configurations on different AVs, the detection area of AVs can be different [ihs].
In this paper, we adopt a simplified representation of AV detection area, as presented in Figure 3.
The detection area in Figure 3 consists of two components: a detection area for preceding vehicles and a detection area for surrounding vehicles. The former is fulfilled by the LRR, while the latter is supported by the combination of LiDAR and cameras. We assume only the preceding-vehicle detection area is active at the first perception level, while both detection areas are active at the second and third levels, as presented in Table 2.
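|Sensing Power||Detection Area||Information Obtained|
|First level||Preceding-vehicle area||Speed/location of the preceding vehicle|
|Second level||Preceding-vehicle and surrounding-vehicle areas||Speed/location of the preceding vehicle, location of surrounding vehicles|
|Third level||Preceding-vehicle and surrounding-vehicle areas||Speed/location of the preceding vehicle and surrounding vehicles|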
2.4 Data center
In this paper, we assume that there is a data center that receives all the information sent by AVs, as presented in Figure 4. Due to the bandwidth and latency restrictions, AVs can not send all the raw data to the data center. Instead, the AV only sends the location and speed of the vehicles it has detected to the data center. The main task for the data center is to aggregate the information and remove the redundant information when the same vehicle is detected multiple times by different AVs. This task can be done by checking and matching the location of the detected vehicles. For example, the vehicle with green rectangle in Figure 4 is detected by two AVs, hence two duplicate data points are sent to the data center and the data center is able to identify and clean these duplicate data points. The localization accuracy is usually within the size of a standard vehicle, hence the accuracy for matching and cleaning is high [wolcott2015fast]. In the numerical experiments, we will conduct sensitivity analysis to evaluate the impact of different matching accuracies.
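The duplicate-removal task described above can be sketched as follows. This is only an illustrative assumption of how location-based matching might work: the function name `deduplicate`, the `(lane, x)` report format and the 2.5-meter `radius` are hypothetical and not taken from the paper.

```python
def deduplicate(detections, radius=2.5):
    """Merge detections of the same vehicle reported by different AVs.

    detections: list of (lane, x) tuples, where x is the longitudinal
        location in metres (hypothetical report format).
    radius: detections on the same lane closer than this distance (metres)
        are treated as one vehicle (assumed value, roughly half a car length).
    Returns a list of (lane, mean_x), one entry per distinct vehicle.
    """
    merged = []  # each entry is [lane, list_of_x_values]
    for lane, x in sorted(detections):
        # Compare with the most recent detection in the current cluster.
        # Note: chained detections can therefore grow a cluster beyond `radius`.
        if merged and merged[-1][0] == lane and x - merged[-1][1][-1] <= radius:
            merged[-1][1].append(x)
        else:
            merged.append([lane, [x]])
    return [(lane, sum(xs) / len(xs)) for lane, xs in merged]


# Two AVs report the same vehicle on lane 1 at 100.0 m and 101.0 m.
reports = [(1, 100.0), (1, 101.0), (1, 130.0), (2, 100.5)]
vehicles = deduplicate(reports)
```

Averaging the duplicate locations is one simple design choice; a data center could equally keep the report from the nearest AV.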
In this section, we rigorously formulate the traffic state estimation (TSE) framework with AVs. We first present the notations, and then the traffic states variables are defined. A two-step estimation method is proposed: the first step directly translates the information observed by AVs and the second step employs data-driven methods to estimate the information that is not observed by AVs.
All the notations will be introduced in context, and Table 3 provides a summary of the commonly used notations for reference.
|Index of a certain lane|
|The set of all lane indices|
|The set of all time points in the study period|
|The set of all time points in time interval|
|A certain longitudinal location along the road|
|The set of all longitudinal locations on lane|
|Traffic state that is not directly observed by AVs|
|The Lebesgue measure for either one or two dimensional Euclidean space|
|The counting measure for the countable sets|
|Variables in a time-space region|
|Index of a certain time interval|
|The set of all indices in the study period|
|Index of a certain longitudinal road segment|
|The set of all indices|
|The set of all longitudinal locations in road segment and lane|
|A cell in time-space region for time interval road segment and lane|
|Average speed for time interval road segment and lane|
|Average traffic flow for time interval road segment and lane|
|Average density for time interval road segment and lane|
|The headway area of vehicle in time-space region|
|Variables related to vehicles|
|Index of a certain vehicle|
|The set of all vehicle indices|
|The set of all vehicles indices in time interval road segment and lane|
|Instantaneous speed of vehicle at time|
|Instantaneous headway of vehicle at time|
|Instantaneous longitudinal location of vehicle at time|
|The lane in which vehicle is located at time|
|The time point when the vehicle enters the road|
|The time point when the vehicle exits the roads|
|The distance traveled by vehicle on lane|
|The time spent on lane by vehicle|
|Variables related to autonomous vehicles|
|Index of detection area|
|The detection area of an AV|
|The set of all autonomous vehicle indices|
|The detection area of all AVs in road segment on lane|
|The set of time-space indices such that is covered by the AV detection range in time interval|
|Variables related to the sensing framework|
|The directly observed density for time interval road segment and lane|
|The directly observed speed for time interval road segment and lane|
|The estimated density for time interval road segment and lane|
|The estimated speed for time interval road segment and lane|
3.2 Modeling traffic states in time-space region
We consider a multi-lane highway and index its lanes by a finite set, whose size under the counting measure for countable sets is the number of lanes. For each lane, we consider the set of longitudinal locations on that lane, whose Lebesgue measure is the length of the lane. In this paper, we treat each lane as a one-dimensional line. Without loss of generality, we set the starting point of each lane to zero, so that the set of longitudinal locations is the interval from zero to the lane length. Throughout the paper, the Lebesgue measure is taken in either one- or two-dimensional Euclidean space, and it represents the length or the area, respectively. Note that in this paper we assume all lanes have the same length, while the proposed estimation method can be easily extended to accommodate different lane lengths. We further discretize the road into equal road segments, each identified by a segment index, so that the road segments together cover each lane. The above formulation is visualized in Figure 5.
We index each vehicle and consider the set of all vehicle indices. Suppose we know the location, speed, and space headway of any vehicle at any time point in the study period, where the location consists of the longitudinal location of the vehicle and the lane in which the vehicle is located at that time. We assume each vehicle only enters the highway once. If a vehicle enters the highway multiple times, the vehicle at each entrance will be treated as a different vehicle. To obtain the traffic states, we construct the distance traveled, time spent and headway area of a certain vehicle from its location, speed, and headway based on [edie1963discussion]. Given the time points at which the vehicle enters and exits the highway, we define the distance traveled and the time spent on a lane by the vehicle, as presented in Equation 1.
We use the headway area to represent the headway between a vehicle and its preceding vehicle on a lane in the time-space region, as presented in Equation 2.
When we have the trajectories of all vehicles on the road, we can model the traffic states of each lane in a time-space region. Without loss of generality, we set the starting point of the study period to zero, so that the study period is the interval from zero to its length. We discretize the study period into equal time intervals, each identified by an interval index and associated with its set of time points, so that the time intervals together cover the study period. In this paper, we use uniform discretization for the road and the study period to simplify the formulation, while the proposed estimation methods work for an arbitrary discretization scheme.
We define a cell as the time-space region for a certain road segment, lane and time interval, as presented in Equation 3.
We define the headway area of a vehicle in a cell as the intersection of the vehicle's headway area and the cell, as presented in Equation 4.
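As a small illustration of the uniform discretization described above, the sketch below maps a time-space point to the indices of the cell that contains it. The function name `cell_index`, its parameter names and the zero-based indexing are assumptions made for this example only.

```python
def cell_index(t, x, study_length, road_length, n_intervals, n_segments):
    """Return the zero-based (time interval, road segment) indices of the
    cell containing time t (seconds) and longitudinal location x (metres).

    Both the study period [0, study_length] and the lane [0, road_length]
    are divided uniformly, as assumed in the text.
    """
    if not (0 <= t <= study_length and 0 <= x <= road_length):
        raise ValueError("point lies outside the time-space region")
    interval_length = study_length / n_intervals
    segment_length = road_length / n_segments
    # Points on the upper boundary are assigned to the last cell.
    h = min(int(t / interval_length), n_intervals - 1)
    s = min(int(x / segment_length), n_segments - 1)
    return h, s


# A 60 s study period split into 6 intervals and a 600 m lane split into 3 segments.
h, s = cell_index(12.0, 250.0, 60.0, 600.0, 6, 3)
```

With a non-uniform discretization, the integer division would be replaced by a search over the interval and segment boundaries.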
Example 1 (Variable representation in time-space region).
In this example, we illustrate the variables defined in the time-space region. We consider a one-lane road. The road is divided into road segments and the study period is divided into time intervals, as presented in Figure 6. Each cell is the intersection of a road segment and a time interval.
Each green line in the time-space region represents the trajectory of a vehicle. In Figure 6, we highlight the first, second and 8th vehicle trajectories. The distance traveled by each vehicle is the same. We also highlight in Figure 6 the time spent by each vehicle on the lane.
The headway area of a vehicle is represented by the green shaded area. The red shaded area, which represents the headway area of the vehicle in a cell, is the intersection of the headway area and the cell based on Equation 4.
According to [edie1963discussion, seo2015estimation], we compute the traffic state variables, namely flow, density and speed, for each road segment and time interval, as presented in Equation 5.
We treat the traffic states (e.g., flow, density and speed) estimated from full samples of vehicles as ground truth and unknown. In the following sections, we will develop a data-driven framework to estimate the traffic states from the partially observed traffic information obtained from autonomous vehicles under different levels of perception power.
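The ground-truth computation can be sketched with Edie's generalized definitions: flow is the total distance traveled in a cell divided by the cell area, density is the total time spent divided by the cell area, and speed is their ratio. The code below is a minimal sketch under the assumption that trajectories are available as uniformly sampled `(t, x, v)` points; the function name `edie_states` and the sampling approximation are illustrative and not the paper's implementation.

```python
def edie_states(samples, dt, t0, t1, x0, x1):
    """Approximate Edie's flow, density and speed for one time-space cell.

    samples: iterable of (t, x, v) trajectory points of all vehicles on a lane,
        sampled every dt seconds (hypothetical input format).
    The cell is [t0, t1) x [x0, x1), in seconds and metres.
    Returns (flow in veh/s, density in veh/m, speed in m/s or None).
    """
    total_time = 0.0  # total time spent by vehicles in the cell
    total_dist = 0.0  # total distance traveled by vehicles in the cell
    for t, x, v in samples:
        if t0 <= t < t1 and x0 <= x < x1:
            total_time += dt
            total_dist += v * dt
    area = (t1 - t0) * (x1 - x0)
    flow = total_dist / area
    density = total_time / area
    speed = total_dist / total_time if total_time > 0 else None
    return flow, density, speed


# Two vehicles at a constant 20 m/s, the second starting 50 m downstream.
samples = [(float(t), 20.0 * t, 20.0) for t in range(10)]
samples += [(float(t), 20.0 * t + 50.0, 20.0) for t in range(10)]
flow, density, speed = edie_states(samples, 1.0, 0.0, 10.0, 0.0, 200.0)
```

Because flow and density share the same denominator, the identity flow = density × speed holds by construction in every non-empty cell.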
3.3 Overview of the traffic sensing framework
In this section, we present an overview of the traffic sensing framework. We assume a subset of vehicles are AVs, identified by the index set of all AVs. The goal of the traffic sensing framework is to estimate the density and speed using the information observed by AVs. Once the speed and density are estimated accurately, the traffic flow can be obtained by the conservation law [bressan2015conservation]. The framework consists of two major parts: direct observation and data-driven estimation, as presented in Figure 7.
In the direct observation step, density and speed are observed directly through AVs. Since AVs are moving observers [wardrop1954method], traffic states can only be observed partially for a certain set of time intervals and road segments (i.e., cells) in a time-space region. Section 3.4 will rigorously determine the set of cells that can be directly observed by AVs and compute the direct observations from information obtained by AVs. We will discuss the direct observation with different levels of sensing power. The second part aims at filling in the unobserved information with data-driven estimation methods. Two estimation functions are used to estimate the unobserved density and speed on each lane, respectively. Details will be presented in section 3.5.
3.4 Direct observation
In this section, we show how to compute traffic states using the information that is directly observed by AVs under different levels of perception. Suppose the detection area of AV at time is , and consists of two parts, which are the detection area for preceding vehicles () and the detection area for surrounding vehicles (), as discussed in section 2.3. We further denote the detection area of vehicle in road segment on lane by , as presented in Equation 6.
The next step is to discretize the detection area into the time-space region. We define as the set of time-space indices such that is covered by the detection range in time interval , as presented in Equation 8.
where is the tolerance and is set to in this paper.
Now we are ready to rigorously formulate the traffic states that can be directly observed by AVs under different levels of perception. As a notation convention, we use to represent the information that cannot be directly observed by AVs, and denote the directly observed density and speed, respectively.
3.4.1 : tracking the preceding vehicle
In the perception level , an AV can only detect and track its preceding vehicle, and hence its detection area for density and speed is . The observed density and speed can be represented in Equation 9.
Equation 9 is proven to be an accurate estimation of the traffic states [seo2015estimation]. We note that some AVs also have an LRR mounted to track the vehicle following the AV, and this situation can be accommodated by replacing the set with in Equation 9, where represents all the vehicles that follow AVs in .
When the AV market penetration rate is low, only covers a small fraction of all cells in the time-space region, especially for multi-lane highways. In contrast, covers more cells than . Practically, it implies that the LiDAR and cameras are the major sensors for traffic sensing.
3.4.2 : locating surrounding vehicles
In the perception level , both and are enabled by the LRR, LiDAR and cameras, while can only detect the location of surrounding vehicles. Hence the density can be observed in both and , and the speed is only observed in . The estimation method for cannot be used for since the preceding vehicles of the detected vehicle might not be detected, hence cannot be estimated accurately. Instead, provides a snapshot of the traffic density at a certain time point, and we can compute the density of time interval by taking the average of all snapshots, as presented in Equation 10.
where represents the set of time indices when is covered by the in , and represents the set of vehicles detected by in at time .
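The snapshot-averaging idea can be sketched as follows. This is a minimal illustration rather than a reproduction of Equation 10: it assumes the whole cell is covered at each listed time point, and the function name `snapshot_density` and the input format are hypothetical.

```python
def snapshot_density(snapshot_counts, cell_length_m):
    """Average density (veh/m) of a cell over the snapshots in which it is covered.

    snapshot_counts: number of vehicles detected in the cell at each time
                     point at which the cell lies inside the detection area.
    Returns None when the cell is never covered (not directly observed).
    """
    if not snapshot_counts:
        return None
    per_snapshot = [count / cell_length_m for count in snapshot_counts]
    return sum(per_snapshot) / len(per_snapshot)


# Three snapshots of a 30 m cell within one time interval.
observed = snapshot_density([3, 4, 5], 30.0)
unobserved = snapshot_density([], 30.0)
print(observed, unobserved)
```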
3.4.3 : tracking surrounding vehicles
In the perception level , both localization and tracking are enabled by the LRR, LiDAR and cameras. In addition to the information obtained by , speed information of surrounding vehicles in is also available. Similar to the density estimation, we first compute the instantaneous speed of a cell at a certain time point by taking the harmonic mean of the speeds of all detected vehicles, and then the average speed of a cell is computed by taking the average over all time points, as presented by Equation 11.
where represents the harmonic mean. Though provides the most speed information, the directly observed density is the same for and . Overall, the sensing power of AV increases as more cells are directly observed from to . In the following section, we will show how to fill the using data-driven methods.
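The two-stage averaging described above (harmonic mean within each time point, arithmetic mean across time points) can be sketched as follows. The function name `cell_speed` and the list-of-lists input are illustrative assumptions, not the paper's notation.

```python
from statistics import harmonic_mean


def cell_speed(speeds_by_time):
    """Average speed of one cell from the speeds detected at each time point.

    speeds_by_time: list of lists; speeds_by_time[t] holds the speeds (m/s)
                    of the vehicles detected in the cell at time point t.
    Returns None when no vehicle is detected at any time point.
    """
    # Instantaneous speed at each time point: harmonic mean over vehicles.
    instantaneous = [harmonic_mean(s) for s in speeds_by_time if s]
    if not instantaneous:
        return None
    # Cell speed: arithmetic mean over the time points with detections.
    return sum(instantaneous) / len(instantaneous)


speed = cell_speed([[10.0, 30.0], [20.0], []])
print(speed)
```

In this example the first time point gives a harmonic mean of 15 m/s, the second gives 20 m/s, and the empty third time point is skipped.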
3.5 Data-driven estimation method
In this section, we propose a data-driven framework to estimate the unobserved density and speed in . To differentiate the density (speed) before and after the estimation, we use and to represent the estimated density and speed for time interval road segment and lane , while denote the density and speed before the estimation (i.e. after the direct observation). The method consists of two steps: 1) estimate the unobserved density given the observed density ; 2) estimate the unobserved speed given that the density is fully known from estimation and speed is partially known from direct observation.
where is a generalized function that takes the observed density and time/space index as input and outputs the estimated density. is also a generalized function to estimate speed, while its inputs include the observed speed , the estimated density , and the time/space index . In this paper, we propose matrix completion-based methods for , and both matrix completion-based and regression-based methods for . Details are presented in the following subsections.
3.5.1 Matrix completion-based methods
The matrix completion-based model can be used to estimate either density or speed. We first assume that densities (speeds) in certain cells are directly observed by the AVs, as presented in Equation 14.
where we denote and as the detection area for density and speed of lane in time-space region with sensing power . Precisely, and .
For each lane , the estimated density (or speed ) forms a matrix in the time-space region, and each row represents a certain road segment and each column represents a certain time interval . Some entries ( or ) in the density matrix (or speed matrix) are missing. To fill the missing entries, many standard matrix completion methods can be used. For example, the naive imputation (imputing with the average values across all time intervals or across all cells), k-nearest neighbor (k-NN) imputation [troyanskaya2001missing], and the singular-value decomposition (SVD)-based SoftImpute algorithm [hastie2015matrix].
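As a minimal sketch of the simplest option listed above, the following code performs naive imputation with column means on a density (or speed) matrix in which missing entries are stored as `None`. The fallback to the global mean for a column with no observations is an assumption added here so the sketch always returns a complete matrix.

```python
def naive_impute(matrix):
    """Fill None entries with the mean of the observed entries in the same column.

    Rows are road segments and columns are time intervals.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    observed = [v for row in matrix for v in row if v is not None]
    # ASSUMPTION: a column with no observations falls back to the global mean.
    global_mean = sum(observed) / len(observed) if observed else 0.0
    filled = [list(row) for row in matrix]
    for j in range(n_cols):
        column = [matrix[i][j] for i in range(n_rows) if matrix[i][j] is not None]
        column_mean = sum(column) / len(column) if column else global_mean
        for i in range(n_rows):
            if filled[i][j] is None:
                filled[i][j] = column_mean
    return filled


density = [[1.0, None],
           [3.0, 4.0],
           [None, 8.0]]
completed = naive_impute(density)
print(completed)
```

k-NN imputation and SoftImpute follow the same interface (a partially observed matrix in, a complete matrix out) but exploit similarity between rows or low-rank structure instead of a single mean.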
3.5.2 Regression-based methods
The speed data can also be estimated by a regression-based model given the density is fully estimated. We train a regression model to estimate the speed from densities for lane , as presented in Equation 15.
where represent the number of nearby lanes, time intervals and road segments considered in the regression model. The intuition behind the regression model is that the speed of a cell can be inferred from the densities of its neighboring cells. A specific example of is the fundamental diagram [newell1993simplified], which is obtained by setting .
In this paper, we adopt a simplified function presented in Figure 8. Suppose we want to estimate the speed for cell 1, there are 12 neighboring cells (including cell 1) considered as inputs. The regression methods adopted in this paper are Lasso [tibshirani1996regression] and random forests [breiman2001random].
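The fundamental-diagram special case, in which the speed of a cell depends only on the density of that same cell, can be sketched as a one-variable regression. A plain least-squares line stands in here for the Lasso and random forest models used in the paper, and the linear speed-density form and the function name `fit_line` are illustrative assumptions.

```python
def fit_line(densities, speeds):
    """Ordinary least squares fit of speed = intercept + slope * density."""
    n = len(densities)
    mean_k = sum(densities) / n
    mean_v = sum(speeds) / n
    sxx = sum((k - mean_k) ** 2 for k in densities)
    sxy = sum((k - mean_k) * (v - mean_v) for k, v in zip(densities, speeds))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_k
    return intercept, slope


# Cells where both density (veh/m) and speed (m/s) are available.
observed_density = [0.01, 0.02, 0.04, 0.08]
observed_speed = [27.5, 25.0, 20.0, 10.0]
intercept, slope = fit_line(observed_density, observed_speed)

# Estimate the speed of a cell whose density is known but whose speed is not.
estimated_speed = intercept + slope * 0.05
print(intercept, slope, estimated_speed)
```

The intercept plays the role of a free-flow speed at zero density, and the negative slope reflects that higher density yields lower speed. The multi-cell model extends the single input to the densities of the neighboring cells.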
4 Solution Algorithms
In this section, we discuss some practical issues regarding the traffic sensing framework proposed in section 3.
4.1 Computation of
To obtain the ground truth (Equation 5) and the observed density (Equation 9), , which denotes the headway area of vehicle in cell (Equation 4), needs to be computed in the time-space region. is computed by intersecting and , and can be represented by a rectangle in the time-space region. The headway area for vehicle is usually banded [seo2015estimation], which can be approximated by a polygon. Therefore, can also be represented by a polygon, and the intersection of a rectangle (which is also a special polygon) and a polygon can be computed efficiently [strobl2017dimensionally].
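One way to carry out the rectangle-polygon intersection is Sutherland-Hodgman clipping followed by the shoelace formula. The sketch below is an assumed implementation for illustration, not the method of the cited reference, and the function names are hypothetical. Vertices are (time, position) pairs.

```python
def clip_to_rectangle(polygon, t_min, t_max, x_min, x_max):
    """Clip a polygon (list of (t, x) vertices) to an axis-aligned cell."""

    def clip(points, inside, intersect):
        result = []
        for i, current in enumerate(points):
            previous = points[i - 1]
            if inside(current):
                if not inside(previous):
                    result.append(intersect(previous, current))
                result.append(current)
            elif inside(previous):
                result.append(intersect(previous, current))
        return result

    def cross_t(bound):
        # Intersection of segment p-q with the vertical line t = bound.
        return lambda p, q: (bound, p[1] + (bound - p[0]) / (q[0] - p[0]) * (q[1] - p[1]))

    def cross_x(bound):
        # Intersection of segment p-q with the horizontal line x = bound.
        return lambda p, q: (p[0] + (bound - p[1]) / (q[1] - p[1]) * (q[0] - p[0]), bound)

    edges = [
        (lambda p: p[0] >= t_min, cross_t(t_min)),
        (lambda p: p[0] <= t_max, cross_t(t_max)),
        (lambda p: p[1] >= x_min, cross_x(x_min)),
        (lambda p: p[1] <= x_max, cross_x(x_max)),
    ]
    points = list(polygon)
    for inside, intersect in edges:
        if not points:
            break
        points = clip(points, inside, intersect)
    return points


def polygon_area(points):
    """Shoelace formula; returns 0.0 for fewer than three vertices."""
    if len(points) < 3:
        return 0.0
    twice_area = sum(
        points[i - 1][0] * points[i][1] - points[i][0] * points[i - 1][1]
        for i in range(len(points))
    )
    return abs(twice_area) / 2.0


# Banded headway area of a vehicle moving at 10 m/s with a 20 m headway,
# clipped to a cell covering t in [0, 10] s and x in [0, 50] m.
band = [(0.0, 0.0), (10.0, 100.0), (10.0, 120.0), (0.0, 20.0)]
clipped = clip_to_rectangle(band, 0.0, 10.0, 0.0, 50.0)
area = polygon_area(clipped)
print(clipped, area)
```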
4.2 Sampling rate
As discussed in section 2.4, the AVs send messages to the data center periodically. Let the sampling rate denote the message sending frequency, and we assume that all AVs have the same sampling rate. When the sampling rate is high, the data center can obtain the density and speed information in high temporal resolution, hence the traffic sensing can be accurate. On the other hand, the sampling rate is limited by the bandwidth and latency of the message transmission network. In the numerical experiments, sensitivity analysis will be conducted to study the impact of sampling rate.
In the data-driven method presented in section 3.5, the cross-validation is conducted for model selection in both matrix completion-based and regression-based methods.
In the matrix completion-based model, we use cross-validation to select the maximal rank in the SoftImpute and the number of nearest neighbors in the k-NN imputation [kanagal2010rank]. To perform the cross-validation for the matrix completion, we randomly hide of the matrix entries and run the imputation methods on the rest of entries. Then we measure the imputation accuracy by comparing the imputed values and the actual values on the hidden entries .
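The hold-out procedure described above can be sketched as follows. The hidden fraction (`hide_fraction=0.2`) is an arbitrary illustrative value rather than the one used in the paper, and two simple mean imputers stand in for the SoftImpute and k-NN candidates being compared. All function names are hypothetical.

```python
import math
import random


def mean_impute(matrix, by_row):
    """Fill None entries with the mean of observed entries in the same row or column."""
    observed = [v for row in matrix for v in row if v is not None]
    fallback = sum(observed) / len(observed)  # used if a whole row/column is missing
    filled = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is None:
                line = row if by_row else [r[j] for r in matrix]
                known = [v for v in line if v is not None]
                filled[i][j] = sum(known) / len(known) if known else fallback
    return filled


def holdout_rmse(matrix, impute, hide_fraction, seed=0):
    """Hide a random share of observed entries, impute, and score on the hidden ones."""
    rng = random.Random(seed)
    observed = [(i, j) for i, row in enumerate(matrix)
                for j, v in enumerate(row) if v is not None]
    n_hidden = max(1, int(round(hide_fraction * len(observed))))
    hidden = rng.sample(observed, n_hidden)
    masked = [list(row) for row in matrix]
    for i, j in hidden:
        masked[i][j] = None
    imputed = impute(masked)
    squared = [(imputed[i][j] - matrix[i][j]) ** 2 for i, j in hidden]
    return math.sqrt(sum(squared) / len(squared))


# Toy density matrix: each road segment (row) is constant over time (columns).
density = [[float(r)] * 5 for r in (1, 2, 3, 4)]
candidates = {
    "row_mean": lambda m: mean_impute(m, by_row=True),
    "column_mean": lambda m: mean_impute(m, by_row=False),
}
scores = {name: holdout_rmse(density, f, hide_fraction=0.2)
          for name, f in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

The same loop applies when the candidates are SoftImpute runs with different maximal ranks or k-NN imputers with different numbers of neighbors: the setting with the lowest hold-out error is selected.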
In the regression-based model, 5-fold cross-validation is performed to select the optimal parameter settings for different regression methods, such as the weight of the regularization term in Lasso and the number of base estimators in random forests.
5 Numerical Experiments
In this section, we conduct the numerical experiments with NGSIM data to examine the proposed TSE framework. All the experiments below are conducted on a desktop with Intel Core i7-6700K CPU @ 4.00GHz × 8, 2133 MHz 2 × 16GB RAM, 500GB SSD, and the programming language is Python 3.6.8.
5.1 Data and experiment setups
We use the Next Generation Simulation (NGSIM) data to validate the proposed framework. NGSIM data contains high-resolution vehicle trajectory data on different roads [alexiadis2004next]. Our experiments are conducted on I-80, US-101 and Lankershim Boulevard, and the overviews of the three roads are presented in Figure 9. NGSIM data is collected using digital video camera, and its temporal resolution is 100ms. Details of the three roads can be found in alexiadis2004next, he2017research.
We assume that a random set of vehicles are AVs and the AVs can perceive the surrounding traffic conditions. Given the limited information collected by AVs, we estimate the traffic states using the proposed framework. We further compare the estimation results with the ground truth computed from the full vehicle trajectory data. The Normalized Root Mean Squared Error (NRMSE) and two Symmetric Mean Absolute Percentage Errors (SMAPE1, SMAPE2) will be used to examine the estimation accuracy, as presented by Equation 16. SMAPE2 is considered a robust version of SMAPE1 [li2014multimodel].
where is the true vector, is the estimated vector, is the index of the vector, and is the set of indices in vector and . When comparing two matrices, we flatten the matrices to vector and then conduct the comparison.
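For concreteness, the sketch below implements one common set of definitions for these three metrics. These forms are assumed here for illustration and may differ in detail from Equation 16. In particular, treating a term with a zero denominator as zero error is an assumption of this sketch.

```python
import math


def flatten(matrix):
    """Flatten a matrix (list of rows) into a vector before comparison."""
    return [value for row in matrix for value in row]


def nrmse(true, est):
    """Root of the summed squared error, normalized by the norm of the true vector."""
    numerator = math.sqrt(sum((t - e) ** 2 for t, e in zip(true, est)))
    denominator = math.sqrt(sum(t ** 2 for t in true))
    return numerator / denominator


def smape1(true, est):
    """Mean of the element-wise ratios |t - e| / (t + e)."""
    # ASSUMPTION: a term with t + e == 0 contributes zero error.
    terms = [abs(t - e) / (t + e) if (t + e) != 0 else 0.0
             for t, e in zip(true, est)]
    return sum(terms) / len(terms)


def smape2(true, est):
    """Ratio of summed absolute errors to summed (t + e); less sensitive to small entries."""
    return sum(abs(t - e) for t, e in zip(true, est)) / sum(t + e for t, e in zip(true, est))


true_matrix = [[10.0, 20.0], [30.0, 0.0]]
est_matrix = [[12.0, 18.0], [30.0, 0.0]]
z, z_hat = flatten(true_matrix), flatten(est_matrix)
print(nrmse(z, z_hat), smape1(z, z_hat), smape2(z, z_hat))
```

Because SMAPE2 aggregates before dividing, a single near-zero cell cannot dominate the score, which is consistent with its description as the more robust variant.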
Here we describe all the factors that affect the estimation results. The market penetration rate denotes the proportion of AVs in the fleet. In the experiments, we assume the AVs are uniformly distributed in the fleet. The detection area is a ray covered by LRR and is a circle covered by LiDAR. We assume LiDAR has a detection range and it might also overlook a vehicle with a certain probability (referred to as missing rate). The AVs can be at one level of perception, as discussed in section 2. The sampling rate of the data center can be different. In addition, different data-driven estimation methods are used to estimate the density and speed, as presented in section 3.5. We define LR1 and LR2 as Lasso regression, and RF1 and RF2 as random forests regression. The number 1 means only cells to are used as inputs, while the number 2 means all cells in Figure 8 are used as inputs. SI denotes the SoftImpute, KNN denotes the k-nearest neighbor imputation, and NI denotes the naive imputation by simply replacing missing entries with the mean of each column.
Baseline setting: The market penetration rate of AVs is . The detection range of LRR is meters, and the detection range of LiDAR is meters with missing rate. The level of perception is , and the speed is detected without any noise. The sampling rate of data center is 1 Hz. SI is used to estimate density and LR2 is used to estimate speed. We set and .
5.2 Basic results
We first run the proposed estimation method with the baseline setting. The estimation method takes around 7 minutes to estimate all three roads, and the most time-consuming parts are the information aggregation in the data center (discussed in section 2.4) and the computation of Equation 11. The estimation accuracy is computed by averaging the NRMSE, SMAPE1 and SMAPE2 over all lanes, and the results are presented in Figure 4.
In general, the estimation method yields accurate estimation on highways (I-80 and US-101), while it underperforms on the complex arterial road (Lankershim Boulevard). Estimation accuracy of speed is always higher than that of density, because the density estimation requires every vehicle to be sensed while speed estimation only needs a small fraction of vehicles to be sensed [long2002probe].
Estimation accuracy on different lanes. We then examine the performance of the proposed method on each lane separately, and the estimation accuracy of each lane is summarized in Table 5.
One can read from Table 5 that the proposed method performs similarly on most lanes. One interesting observation is that the proposed method performs well on Lane 6 on I-80, Lane 5 on US-101 and Lane 1 on Lankershim Blvd, and those lanes are merged with ramps. This implies that the proposed method has the potential to work well on merging intersections.
The proposed method performs differently on lanes that are near the edge of roads. For example, the proposed method yields the worst density estimation and the best speed estimation on lane 1 of I-80, which is an HOV lane. The vehicle headway is relatively large on the HOV lane, hence estimating density is more challenging given limited detection range of LiDAR. In contrast, speed on HOV lane is relatively stable, making the speed estimation easy. In addition, the estimation accuracy of the Lane 4 of Lankershim Blvd is low, as a result of the physical discontinuity of the lane.
To visually inspect the estimation accuracy, we plot the true and estimated density and speed in the time-space region for Lane 2 and Lane 4 in Figures 10 and 11. It can be seen that the estimated density and speed resemble the ground truth, even when the congestion is discontinuous in the time-space region (see Lankershim Blvd in Figure 10). Again, Lane 4 of Lankershim Blvd is physically discontinuous, hence a large block of entries is entirely missing in the time-space region (see the third row of Figure 11), and the block-wise missingness may affect the proposed methods and increase the estimation errors.
5.2.1 Regression Coefficients
When estimating the speed, we use the LR2 for each lane separately. In this section, we look at the regression coefficients of the fitted Lasso model and interpret the coefficients from the perspective of the traffic flow theory. In particular, we select Lane 2 on US-101 and summarize the fitted coefficients in Table 6. The regression coefficients for other lanes and networks can be found in the supplementary materials.
The R-square for Lane 2 on US-101 is 0.832, indicating the regression model is fairly accurate. From Table 6, one can see the intercept is positive and it represents the free flow speed when the density is zero. Coefficients for x1 to x12 are all negative with high confidence, and this implies that higher density yields lower speed.
Recall Figure 8: suppose we want to estimate the speed for cell 1. We refer to cell as the surrounding cells in the current lane and cell as the surrounding cells in the nearby lanes. The coefficients of x1 to x4 are the most negative, indicating the densities of the surrounding cells in the current lane have the highest impact on the speed. The densities of surrounding cells in the nearby lanes also have a negative impact on the speed but the magnitude is lower.
5.3 Comparing different algorithms
In this section, we examine different methods in estimating density and speed. Recall from section 3.5 that the matrix completion-based methods can estimate both density and speed while the regression-based methods can only estimate the speed. We run the proposed estimation method with different combinations of estimation methods for density and speed, and the rest of settings are the same as the baseline setting. To be precise, three methods are used to estimate density: Naive Imputation (NI), k-nearest neighbor imputation (KNN) and SoftImpute (SI). Seven methods are used to estimate speed: NI, KNN, SI, LR1, LR2, RF1 and RF2. We plot the heatmap of SMAPE1 for each road separately, as presented in Figure 12.
The speed estimation does not affect the density estimation as the density estimation is conducted first. SI always outperforms KNN and NI for density estimation. Different combinations of algorithms perform differently on each road. We use A-B to denote the method that uses A for density estimation and B for speed estimation. NI-LR2 on I-80, SI-LR2 on US-101 and SI-RF on Lankershim Blvd outperform the rest of the methods in terms of speed estimation. Overall, SI-LR2 generates accurate estimation for all three roads.
5.4 Impact of sensing power
We analyze the impact of sensing power of AVs on the estimation accuracy. Recall from sections 2.2 and 3.4 that we consider three levels of perception for AVs. Based on Equations 9, 10 and 11, more entries in the time-space region get directly observed when the perception level increases. We run the proposed estimation method with different perception levels and different methods for speed estimation. Other settings are the same as the baseline setting. The heatmap of SMAPE1 for each road is presented in Figure 13.
As shown in Figure 13, the proposed method performs the best on US-101 and the worst on Lankershim Blvd. With 5% market penetration rate, at least S2 is required for I-80 and US-101 to obtain an accurate traffic state estimation. Similarly, S3 is required for Lankershim Blvd to ensure the estimation quality. Later we will discuss the impact of market penetration rate on the estimation accuracy under different perception levels.
The estimation accuracy improves for all speed estimation algorithms and all three roads when the perception level increases. Different speed estimation algorithms perform differently on different roads within the same perception level. For example, in S2, the imputation-based methods outperform the regression-based methods on I-80 and US-101, while the Lasso regression outperforms the rest on Lankershim Blvd. In S3, all the density estimation methods perform similarly on I-80 and US-101, while the regression-based methods significantly outperform the imputation-based methods on Lankershim Blvd in terms of density estimation.
5.5 Impact of AV market penetration rate
To examine the impact of AV market penetration rate, we run the proposed method with different market penetration rates ranging from to , and the rest of settings are the baseline setting. The experiment results are presented in Figure 14.
Generally the estimation accuracy increases when the AV market penetration rate increases. penetration rate is large enough for an accurate estimation for I-80 and US-101, while Lankershim Blvd requires a larger penetration rate. To further investigate the impact of market penetration rate under different levels of perception, we run the experiment with different penetration rates under three levels of perception, and the results are presented in Figure 15.
One can read that S2 and S3 yield the same density estimation as the vehicle detection is enough for density estimation. Better speed estimation can be achieved on S3 since more vehicles are tracked and the speeds are measured. Again, Figure 15 indicates at least S2 is required for I-80 and US-101 to obtain an accurate traffic state estimation, and S3 is required for Lankershim Blvd to ensure the estimation quality.
In addition to the above findings, another interesting finding is that when the market penetration rate is low, the regression methods usually outperform the matrix completion-based methods, while the matrix completion-based methods outperform the regression-based methods when the market penetration rate is high.
In the baseline setting, AVs are uniformly distributed in the fleet, while many studies suggest that a dedicated lane for platooning can further enhance mobility [ramezani2017capacity]. In this case, AVs are not uniformly distributed on the road. To simulate the dedicated lane, we treat all vehicles on Lane 1 of I-80, Lane 1 of US-101, and Lanes 1 and 2 of Lankershim Blvd as AVs, and all the vehicles on other lanes as conventional vehicles. For comparison, we also set up another scenario in which the same number of vehicles are treated as AVs but are uniformly distributed on the roads. We run the proposed method on both scenarios with the rest of settings being the baseline setting, and the results are presented in Table 7.
As can be seen from Table 7, the distribution of AVs has marginal impact on the estimation accuracy. The proposed method performs similarly in the dedicated-lane and uniform-distribution scenarios for all three roads, which is probably because the detection range of LiDAR is large enough to cover the width of the roads.
5.7 Sensitivity analysis
In this section, we examine the sensitivity of various factors (e.g. LiDAR detection range, sampling rate, detection missing rate, and speed detection noise) in our experiments.
LiDAR detection range. The detection range of LiDAR varies widely across brands [lidar]. We run the proposed estimation method using different detection ranges from 10 meters to 70 meters, and the rest of settings are the baseline setting. The estimation accuracy for each road is presented in Figure 16.
One can read that the estimation error reduces for both density and speed when the detection range increases. The gain in estimation accuracy becomes marginal when the detection range is large. For example, when the detection range exceeds 40 meters on US-101, the improvement of the estimation accuracy is negligible. Another interesting observation is that, on Lankershim Blvd, even a 70-meter detection range cannot yield a good density estimation with 5% market penetration rate.
Sampling rate. Recall that the sampling rate denotes the frequency at which an AV sends messages (which contain the location and speed of itself and of the detected vehicles) to the data center, as discussed in section 2.4. When the sampling rate is low, we conjecture that the data center receives fewer messages, which increases the estimation error. To verify our conjecture, we run the proposed estimation method with different sampling rates ranging from 0.3Hz to 10Hz, and the rest of settings are the baseline setting. The estimation accuracy on each road is further plotted in Figure 17.
The estimation accuracy increases when the sampling rate increases for all three roads, as expected. The density estimation is more sensitive to the sampling rate than the speed estimation. This is probably because the density changes dramatically in time-space region, while the speed is relatively stable.
Detection missing rate. The AVs might overlook a certain vehicle during the detection, and we use the missing rate to denote the probability. We examine the impact of the missing rate by running the proposed estimation method with different missing rate ranging from 0.01 to 0.9, and the rest of settings are the baseline setting. We plot the estimation accuracy for each road separately, as presented in Figure 18.
From Figure 18 one can read that the estimation error increases when the missing rate increases for all three roads. The density estimation is much more sensitive to the missing rate than the speed estimation. This is because overlooking vehicles has a significant impact on density estimation, while speed estimation only needs a small fraction of vehicles to be observed.
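The asymmetry between density and speed can be illustrated with a small simulation. The sketch assumes that each vehicle is missed independently with the same probability, and uses synthetic speeds rather than NGSIM data. The function name `detect` is hypothetical.

```python
import random


def detect(speeds, missing_rate, rng):
    """Keep each vehicle independently with probability 1 - missing_rate."""
    return [v for v in speeds if rng.random() >= missing_rate]


rng = random.Random(0)
# Synthetic true speeds (m/s) of 2000 vehicles; not taken from NGSIM.
true_speeds = [rng.uniform(20.0, 30.0) for _ in range(2000)]
detected = detect(true_speeds, missing_rate=0.3, rng=rng)

# The vehicle count (and hence density) shrinks roughly by the missing rate.
count_ratio = len(detected) / len(true_speeds)
# The mean speed of the detected subset stays close to the true mean.
speed_gap = abs(sum(detected) / len(detected) - sum(true_speeds) / len(true_speeds))
print(count_ratio, speed_gap)
```

Missed vehicles bias the count downward in proportion to the missing rate, whereas the detected vehicles remain a random sample of the speed distribution.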
Noise level in speed detection. We further look at the impact of noise in speed detection. We assume that the speed of a certain vehicle is detected with noise, and the noise level is denoted as . If the true vehicle speed is , we sample from the uniform distribution , and then the detected vehicle speed is assumed to be . We run the proposed estimation method by sweeping from to , and the rest of settings are baseline setting. The estimation accuracy is presented in Figure 19.
Surprisingly, the proposed method is robust to the noise in speed detection, as the estimation errors remain stable when the speed noise level increases. One explanation is that the speed of each cell is computed by averaging the detected speeds from multiple vehicles, hence the detection noise largely cancels out, in line with the law of large numbers.
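The averaging argument can be checked with a small simulation. The noise model used here (multiplicative noise drawn uniformly from a symmetric interval around zero) is an assumption made for illustration and may differ from the exact form used in the experiments. The sketch also uses an arithmetic mean and identical true speeds for simplicity.

```python
import random


def noisy_speed(true_speed, noise_level, rng):
    """Detected speed under an ASSUMED multiplicative uniform noise model."""
    epsilon = rng.uniform(-noise_level, noise_level)
    return true_speed * (1.0 + epsilon)


def mean_abs_error(n_vehicles, true_speed, noise_level, trials, seed=0):
    """Average absolute error of the cell speed when n_vehicles are averaged."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        detected = [noisy_speed(true_speed, noise_level, rng)
                    for _ in range(n_vehicles)]
        cell_speed = sum(detected) / n_vehicles
        total += abs(cell_speed - true_speed)
    return total / trials


error_single = mean_abs_error(1, 20.0, 0.2, trials=2000)
error_averaged = mean_abs_error(25, 20.0, 0.2, trials=2000)
print(error_single, error_averaged)
```

Averaging over 25 detections yields a markedly smaller error than a single detection, which is the mechanism suggested for the observed robustness.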
This paper proposes a high-resolution traffic sensing framework with autonomous vehicles (AVs). The framework leverages the perception power of AVs to estimate the fundamental traffic state variables, namely, flow, density and speed, and the underlying idea is to use AVs as moving observers to detect and track the vehicles surrounding them. We discuss the potential usage of each sensor mounted on AVs, and categorize the sensing power of AVs into three levels of perception. Then the data-driven traffic sensing framework is rigorously formulated. The proposed framework consists of two steps: 1) direct observation of the traffic states using AVs; 2) data-driven estimation of the unobserved traffic states. In the first step, we define the direct observations under different perception levels. The second step is done by estimating the unobserved density using matrix-completion methods, followed by the estimation of unobserved speed using either matrix-completion methods or regression-based methods. The implementation details of the whole framework are further discussed.
The Next Generation Simulation (NGSIM) data is adopted to examine the accuracy and robustness of the proposed framework. The proposed estimation framework is examined extensively on I-80, US-101 and Lankershim Boulevard. In general, the proposed framework estimates the traffic states accurately with a low AV market penetration rate. The speed estimation is always easier than density estimation, as expected. Results show that, with 5% AV market penetration rate, at least S2 is required for I-80 and US-101 to obtain an accurate traffic state estimation, while S3 is required for Lankershim Blvd to ensure the estimation quality. During the estimation of speed, all the coefficients in the Lasso regression can be interpreted by the traffic flow theory. In addition, sensitivity analysis regarding AV penetration rate, sensor configuration, speed detection noise, and perception accuracy was conducted.
This study will help policymakers and private sectors (e.g. Uber, Waymo) understand the values of AVs in traffic operation and management, especially the values of massive data collected by AVs. Hopefully, new business models for commercializing the data [mobilityeco] or collaborations between private sectors and public agencies can be triggered. In the near future, we will examine the sensing capabilities of AVs at network level and extend the proposed traffic sensing framework to large-scale networks. Another interesting research direction is to investigate the privacy issue when AVs share the observed information with the data center.
This research is funded in part by Traffic 21 Institute and Carnegie Mellon University’s Mobility 21, a National University Transportation Center for Mobility sponsored by the US Department of Transportation.