Risk-Aware Reasoning for Autonomous Vehicles
A significant barrier to deploying autonomous vehicles (AVs) on a massive scale is safety assurance. Several technical challenges arise from the uncertain environment in which AVs operate, including road and weather conditions, errors in perception and sensory data, and model inaccuracy. In this paper, we propose a system architecture for risk-aware AVs capable of reasoning about uncertainty and deliberately bounding the risk of collision below a given threshold. We discuss key challenges in the area, highlight recent research developments, and propose future research directions in three subsystems. First, a perception subsystem that detects objects within a scene while quantifying the uncertainty that arises from different sensing and communication modalities. Second, an intention recognition subsystem that predicts the driving style and the intention of agent vehicles (and pedestrians). Third, a planning subsystem that takes into account the uncertainty from the perception and intention recognition subsystems and propagates it all the way to control policies that explicitly bound the risk of collision. We believe that such a white-box approach is crucial for future adoption of AVs on a large scale.
Over the past hundred years, innovation within the automotive industry has created more efficient, affordable, and safer vehicles, but progress has so far been incremental. The industry is now on the verge of a substantial change due to advancements in Artificial Intelligence (AI) and Autonomous Vehicle (AV) sensing technologies. These advancements offer the possibility of significant benefits to society: saving lives and reducing congestion and pollution. Despite the progress, a significant barrier to large-scale deployment is safety assurance. Most technical challenges are due to the uncertain environment in which AVs operate, including road and weather conditions, errors in perception and sensory input data, and uncertainty in the behavior of pedestrians and agent vehicles. A robust AV control algorithm should account for different sources of uncertainty and generate control policies that are quantifiably safe. In addition, algorithms that respect precise safety measures can assist policymakers in addressing legislative issues related to AVs, such as insurance policies, and can ultimately help convince the public to accept a wide deployment of AVs.
One of the most prevalent measures for AV safety is the number of crashes per million miles. Although such a measure provides some estimate of overall safety performance in a particular environment, it fails to capture the unique differences and the richness of individual scenarios. As AVs become more prevalent, the reasoning behind individual events becomes critically important, as the public will require transparency and explainable AI. Recent fatal AV crashes have raised further debate among scholars and pioneers in the industry concerning how an autonomous vehicle should act when human safety is at risk. On a more philosophical level, a study sheds light on the major challenges of understanding societal expectations about the principles that should guide decision making in life-critical situations. As an illustrative example, suppose a self-driving vehicle experiencing a partial system failure is forced to choose between running over pedestrians and sacrificing itself and its passenger to save them. What should be the reasoning behind such a situation, and more fundamentally, what should be the moral choice? Despite the profound philosophical dilemma and its impact on the public perception of AI as a whole and on the regulatory aspects of AVs in particular, the current state-of-the-art AV technology stack does not capture and propagate uncertainty throughout its decision processes well enough to accurately assess these edge scenarios.
In this work, we discuss an algorithmic pipeline and a technical stack for AVs that capture and propagate uncertainty from the environment throughout perception, prediction, planning, and control. An AV has to be able to plan and optimize trajectories from its current location to a goal while avoiding static and dynamic (moving) obstacles and meeting deadlines and efficiency constraints. The risk of collision should be bounded by a given safety threshold that meets governmental regulations, while deadline adherence should satisfy a quality-of-service threshold.
To expand the AV perception range, we consider the Vehicular Ad-Hoc Network (VANET) communication model. Vehicle-to-Vehicle (V2V), Vehicle-to-Infrastructure (V2I), and more recently Vehicle-to-Everything (V2X) are technologies that enable vehicles to exchange safety and mobility information with each other and with surrounding agents, including pedestrians with smartphones and smart wearables. Vehicles can collect information en route, such as road conditions and position estimates of static and dynamic objects, and can use this information to continuously predict actions performed by other vehicles and infrastructure. V2V messages would have a range of approximately 300 meters, which exceeds the capabilities of systems with cameras, ultrasonic sensors, and LIDAR, allowing greater capability and more time to warn vehicles.
In this work, we propose a system architecture (Sec. II) and discuss key challenges in quantifying uncertainty at different levels of abstraction: scene representation (Sec. III), intention recognition (Sec. IV), risk-bounded planning (Sec. V), and control (Sec. VI). We highlight the current state of the art and propose research directions at each level.
II System Architecture
In the following, we present the architecture of a risk-aware AV stack with six technical objectives in mind:
- A probabilistic perception and object representation system that takes into consideration uncertainty that arises from hardware modalities and sensor fusion. The system will capture uncertainty in object classification, bounding geometries, and temporal inconsistencies under diverse conditions.
- Leverage the communication network to gain knowledge of the surrounding agents (vehicles and pedestrians) that are beyond line-of-sight, and then improve upon scene representation.
- An intention recognition system that takes into account all dynamic objects (vehicles and pedestrians), from perception and V2X communication, and estimates a distribution over potential future trajectories.
- On a higher level, propose goal-directed autonomous planners that strive to meet the passenger goals and preferences, and help the passengers to think through adjustments to their goals, when they can’t be safely met.
- To ensure that decisions are made in a timely manner, design polynomial-time approximation algorithms that offer formal bounds on sub-optimality, and which produce near-optimal results.
In addition, by specifying the probability that a plan is executed successfully, the system operator or policymaker can set the desired level of conservatism in the plan in a meaningful manner and can trade conservatism against performance. Fig. 1 shows the interaction between key components of the system as we illustrate throughout the paper.
III Probabilistic Scene Representation
Scene understanding is a research topic with strong impact on technologies for autonomous vehicles. Most efforts have concentrated on understanding the scenes surrounding the ego-vehicle (the autonomous vehicle itself). This relies on a sensor data processing pipeline with several stages, such as low-level vision tasks and the detection, tracking, and segmentation of the surrounding traffic environment (e.g., pedestrians, cyclists, and vehicles). However, for an autonomous vehicle, these low-level vision tasks are insufficient for comprehensive scene understanding. It is necessary to include reasoning about the past and the present of the scene participants. This paper intends to guide future research on the interpretation of traffic scenes in autonomous driving from a probabilistic event reasoning perspective.
III-A Probabilistic Context Layout for Driving
Scene representation includes context representations that capture spatial geometric relationships among different traffic elements with certain semantic labels. It differs from semantic segmentation frameworks because the context representation contains not only the static components of the traffic scene, such as the road, the type of traffic lanes, traffic direction, and participant orientation (the typical technique for this aspect is simultaneous localization and mapping (SLAM)), but also several kinds of dynamic elements, e.g., motion correlation of participants. A previous study has given a detailed review of semantic segmentation, taking the inference of traffic geometry into consideration.
A key aspect of context representation is extracting salient features from a large set of sensor data. For that purpose, it is necessary to establish a saliency mechanism, that is, a critical-region extraction and information simplification technique that is widely used to select regions of interest in images. Over the past few decades, saliency has generally been formulated in bottom-up and top-down modes. Bottom-up modes are fast, data-driven, pre-attentive, and task-independent. Top-down approaches often entail supervised learning with task labels pre-collected from a large set of training examples; they are task-oriented and vary across environments.
A recent work presents a fast algorithm that obtains a probabilistic occupancy model for dynamic obstacles in the scene from a few sparse LIDAR measurements. Typically, the occupancy states exhibit highly nonlinear patterns that cannot be captured with a simple linear classification model. Therefore, deep learning models and kernel-based models can be considered as potential candidates. However, these approaches require either a massive amount of data or a large number of hyper-parameters to tune. A promising future direction is to extend this approach to account for different object classes (rather than only an occupancy map) and for other sensors, such as cameras.
III-B Beyond Line-of-sight
Any sensing modality has blind spots. For objects that lie beyond line-of-sight, one can use a communication network to improve the scene representation. This can be critical in certain edge scenarios. For example, in Fig. 2, the ego-vehicle (red) has two options: either maintain speed or overtake the vehicle ahead. Suppose that another agent vehicle is approaching from a distance and is not detected by the onboard sensors of the ego-vehicle. In this scenario, neither the speed nor the location of the distant vehicle may be accurately estimated, so the overtaking maneuver could lead to a collision.
There has been substantial progress in the standardization of vehicle-to-everything (V2X, covering V2V, V2I, and V2P) communication protocols. The major V2X standards are DSRC (Dedicated Short-Range Communications) and 5G. The introduction of 5G millimeter-wave transmission brings a new paradigm to wireless communications. Depending on the application, 5G positioning can also enhance tracking techniques, which leverage short-term historical data (local signatures and key features). Uncertainty can be captured by probabilistic models (e.g., Gaussian) by sampling temporal inconsistencies in historical data streams, such as localization data, and by parameter tuning.
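As a minimal sketch of the last point, one could fit a Gaussian to the temporal inconsistencies of a received position stream. The function name `residual_gaussian`, the one-dimensional positions, the fixed reporting period, and the constant-velocity predictor are all assumptions made for illustration; they are not part of any V2X standard or of the architecture above.

```python
from statistics import mean, variance

def residual_gaussian(positions):
    """Fit a 1-D Gaussian to constant-velocity prediction residuals.

    positions: positions (m) reported at a fixed period (assumption).
    Each report is compared with a constant-velocity extrapolation of
    the two previous reports; the residuals are the "temporal
    inconsistencies". Needs at least four reports so that the sample
    variance is defined. Returns (mean, variance) of the residuals.
    """
    residuals = []
    for k in range(2, len(positions)):
        # Constant velocity: next position = last + (last - previous).
        predicted = 2.0 * positions[k - 1] - positions[k - 2]
        residuals.append(positions[k] - predicted)
    return mean(residuals), variance(residuals)

# Hypothetical stream of longitudinal positions from a remote vehicle.
bias, spread = residual_gaussian([0.0, 1.0, 2.1, 2.9, 4.0])
```

The resulting mean and variance could then parameterize the position uncertainty that is attached to beyond-line-of-sight objects in the scene representation.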
IV Intention Recognition
This subsystem involves prediction and machine learning tasks to reliably estimate the future trajectories of uncontrollable agents in the scene, including pedestrians and other agent vehicles. Many existing trajectory prediction algorithms [8, 28] obtain deterministic results quite efficiently. However, these approaches fail to capture the uncertain nature of human actions. Probabilistic predictions are beneficial in many safety-critical tasks such as collision checking and risk-aware motion planning. They can express both the intrinsic uncertainty of the prediction task at hand (human nature) and the limitations of the prediction method (knowing when an estimate could be wrong). To incorporate uncertainty into prediction results, data-driven approaches can learn common characteristics from datasets of demonstrated trajectories [25, 27]. These methods often produce uni-modal predictions, which may not perform well in sophisticated urban scenarios where the driver can choose among multiple actions. A recent work presents a hybrid approach using a variational neural network that predicts future driver trajectory distributions for the ego-vehicle based on multiple sensors in urban scenarios. This work could be extended in the future to predict trajectories for agent vehicles using V2V data streams, if available.
We propose a simple intent recognition scheme that is divided into two steps. First, we continuously record high-level maneuvers of surrounding vehicles (both off-line and online). Examples of such maneuvers are merge left, merge right, and accelerate, each at different velocities and with different variations. Each maneuver comprises a set of collected trajectories. Second, because the motions of human-driven vehicles are uncertain, we learn a compact motion representation called a Probabilistic Flow Tube (PFT) from the demonstrated trajectories to capture human-like driving styles and uncertainties for each maneuver. A library of pre-learned PFTs can then be used to estimate the current maneuver and to predict the probabilistic motion of each agent vehicle using a Bayesian approach.
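The Bayesian estimation step can be sketched as follows. This is an illustrative simplification, not the PFT learning method itself: each maneuver is assumed to be summarized by one independent Gaussian over lateral offset per time step, and the names `maneuver_posterior`, `library`, and `prior`, as well as all numbers, are hypothetical.

```python
import math

def log_gaussian(x, mean, std):
    """Log-density of a 1-D Gaussian, computed in log space."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))

def maneuver_posterior(observed, library, prior):
    """Posterior over maneuvers given observed lateral offsets.

    observed: lateral offsets (m), one per time step.
    library:  {maneuver: [(mean, std), ...]}, one Gaussian per step
              (a 1-D stand-in for a learned flow tube; assumption).
    prior:    {maneuver: prior probability}.
    """
    scores = {}
    for name, tube in library.items():
        log_like = sum(log_gaussian(x, m, s)
                       for x, (m, s) in zip(observed, tube))
        scores[name] = math.log(prior[name]) + log_like
    # Normalize in a numerically stable way.
    top = max(scores.values())
    weights = {n: math.exp(v - top) for n, v in scores.items()}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}

library = {
    "keep_lane":  [(0.0, 0.3), (0.0, 0.3), (0.0, 0.3)],
    "merge_left": [(0.3, 0.4), (0.9, 0.5), (1.8, 0.6)],
}
prior = {"keep_lane": 0.8, "merge_left": 0.2}
posterior = maneuver_posterior([0.4, 1.0, 1.7], library, prior)
```

With these made-up numbers the observed drift toward the left lane outweighs the prior in favor of lane keeping, so most of the posterior mass moves to `merge_left`.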
V Risk-bounded Planning
Deterministic optimization approaches have been well developed and widely used in several disciplines and industries, in order to optimize processes both off-line and on-line. In this work, we characterize uncertainty in a probabilistic manner and find the optimal sequence of ego-vehicle trajectory control, subject to the constraint that the probability of failure must be below a certain threshold. Such constraint is known as a chance constraint. In many applications, the probabilistic approach to uncertainty modeling has a number of advantages over a deterministic approach. For instance, disturbances such as vehicle wheel slip can be represented using a stochastic model. When using a Kalman Filter for enhancing localization, the location estimate is provided as a probabilistic distribution. In addition, by specifying the probability that a plan is executed successfully, the system operator or policymaker can set the desired level of conservatism in the plan in a meaningful manner and can trade conservatism against performance. Therefore, robustness is achieved by designing solutions that guarantee feasibility as long as disturbances do not exceed these bounds. Furthermore, if the passenger goals cannot be safely achieved, then the chance constraints can be analyzed to pinpoint the sources of risk, and the user goals can be adjusted, based on their preferences, in order to restore safety.
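To make the notion of a chance constraint concrete, consider a scalar case: the gap between the vehicle and an obstacle is estimated (for instance by a Kalman filter) as a Gaussian, and the probability that the gap is negative must stay below a risk bound. Under this Gaussian assumption the chance constraint is equivalent to a deterministic margin on the mean gap. The sketch below uses hypothetical names (`safety_margin`, `violation_probability`) and a one-dimensional model chosen purely for illustration.

```python
from statistics import NormalDist

def safety_margin(sigma, risk_bound):
    """Smallest mean gap (m) such that P(gap < 0) <= risk_bound.

    Assumes gap ~ N(mean_gap, sigma^2); the chance constraint then
    becomes mean_gap >= z * sigma, where z is the standard normal
    quantile at probability 1 - risk_bound.
    """
    if not 0.0 < risk_bound < 0.5:
        raise ValueError("risk_bound must lie in (0, 0.5)")
    return NormalDist().inv_cdf(1.0 - risk_bound) * sigma

def violation_probability(mean_gap, sigma):
    """P(gap < 0) for gap ~ N(mean_gap, sigma^2)."""
    return NormalDist(mean_gap, sigma).cdf(0.0)

# Hypothetical numbers: 0.5 m localization std, 1% risk bound.
margin = safety_margin(0.5, 0.01)
```

Lowering the risk bound increases the required margin, which is the trade-off between conservatism and performance described above.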
Reasoning under uncertainty poses several challenges. The trajectory optimization problem is non-convex, due to discrete choices and the presence of obstacles in the feasible space. One way to tackle these challenges is to introduce multiple layers of abstraction. Instead of solving high-level problems (e.g., route planning) and low-level problems (e.g., steering wheel angle, acceleration, and brake commands) in a single shot, one can decouple them into sub-problems. We achieve such a hierarchy through a high-level planner, a short-horizon planner, and precomputed and learned maneuver trajectories, as we illustrate below.
V-A High Level Planner
High-level planning involves route planning, applying traffic rules, and consequently setting short-term objectives (also known as set points), which are fed into the Short Horizon Planner (as shown in Fig. 1). The planner adjusts those short-term objectives when no safe solution exists. To model the feasibility of an obtained plan, we leverage Temporal Plan Networks (TPNs). A TPN is a graph in which the nodes represent events and the edges represent activities. In temporal planning, the ego-vehicle is presented with a series of events and must decide precisely when to schedule them. Simple Temporal Networks with Uncertainty (STNUs) are an extension of Simple Temporal Networks (STNs) that allows reasoning over stochastic, or uncontrollable, actions and their corresponding durations. Such a formalism makes it possible to check the feasibility of a high-level plan and to prompt the user to adjust their intermediate goals and time constraints, producing smooth intermediate plans that are fed into the short horizon planner.
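As an illustration of a feasibility check, the sketch below tests the consistency of a plain Simple Temporal Network by looking for negative cycles in its distance graph with the Floyd-Warshall algorithm. This is the standard check for fully controllable networks only; it does not handle the uncontrollable durations of an STNU, whose dynamic controllability requires dedicated algorithms. The function name and the example constraints are hypothetical.

```python
import math

def stn_consistent(num_events, constraints):
    """Check whether a Simple Temporal Network admits a schedule.

    constraints: list of (i, j, lo, hi), meaning lo <= t_j - t_i <= hi
    for event times t_i and t_j (seconds). The network is consistent
    exactly when its distance graph has no negative cycle.
    """
    d = [[0.0 if i == j else math.inf for j in range(num_events)]
         for i in range(num_events)]
    for i, j, lo, hi in constraints:
        d[i][j] = min(d[i][j], hi)   # t_j - t_i <= hi
        d[j][i] = min(d[j][i], -lo)  # t_i - t_j <= -lo
    # All-pairs shortest paths (Floyd-Warshall).
    for k in range(num_events):
        for i in range(num_events):
            for j in range(num_events):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    # A negative diagonal entry signals a negative cycle.
    return all(d[i][i] >= 0 for i in range(num_events))

# Events: 0 = depart, 1 = reach merge point, 2 = arrive.
legs = [(0, 1, 10, 20), (1, 2, 30, 40)]
feasible = stn_consistent(3, legs + [(0, 2, 0, 45)])    # 45 s deadline
infeasible = stn_consistent(3, legs + [(0, 2, 0, 35)])  # 35 s deadline
```

In the second call the two legs need at least 40 s in total, so the 35 s deadline cannot be met; a high-level planner would then ask the passenger to relax the deadline or change an intermediate goal.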
V-B Short Horizon Planner
Planning under uncertainty is a fundamental area of artificial intelligence. For AV applications, it is crucial to plan for potential contingencies instead of planning a single trajectory into the future. This need often arises in dynamic environments where the vehicle has to react quickly (within milliseconds) to any potential event. Partially observable Markov decision processes (POMDPs) [12, 24] provide a model for optimal planning under actuator and sensor uncertainty, where the goal is to find policies (contingency plans) that maximize (or minimize) some measure of expected utility (or cost).
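A POMDP planner reasons over belief states, i.e., probability distributions over hidden states that are revised after every action and observation. The following is a minimal sketch of the standard discrete Bayes-filter belief update; the dictionary layout, the two-state "yield/go" intention model, and all probabilities are assumptions made for illustration.

```python
def belief_update(belief, action, observation, transition, obs_model):
    """Discrete POMDP belief update.

    Computes b'(s') proportional to
        O(o | s', a) * sum_s T(s' | s, a) * b(s).

    belief:     {state: probability}
    transition: {(state, action): {next_state: probability}}
    obs_model:  {(next_state, action): {observation: probability}}
    """
    states = list(belief)
    unnormalized = {}
    for s_next in states:
        predicted = sum(transition[(s, action)][s_next] * belief[s]
                        for s in states)
        likelihood = obs_model[(s_next, action)][observation]
        unnormalized[s_next] = likelihood * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under belief")
    return {s: p / total for s, p in unnormalized.items()}

# Hypothetical hidden intention of an agent vehicle at a merge.
T = {("yield", "keep_speed"): {"yield": 1.0, "go": 0.0},
     ("go", "keep_speed"): {"yield": 0.0, "go": 1.0}}
O = {("yield", "keep_speed"): {"slowing": 0.8, "steady": 0.2},
     ("go", "keep_speed"): {"slowing": 0.3, "steady": 0.7}}
b1 = belief_update({"yield": 0.5, "go": 0.5}, "keep_speed", "slowing", T, O)
```

Observing that the agent vehicle is slowing raises the probability of the "yield" intention from 0.5 to about 0.73 in this toy model.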
In many real-world applications, a single measure of performance is not sufficient to capture all requirements (e.g., an AV tasked with minimizing commute time while keeping the distance from obstacles above a given threshold). The extension of POMDPs with such constraints is often called a constrained POMDP (C-POMDP). When the constraints involve stochasticity (e.g., distance following a probabilistic model), the problem is modeled as a chance-constrained POMDP (CC-POMDP), where we have a bound on the probability of violating the constraints. To calculate the risk of each decision, one can leverage the probabilistic flow tube (PFT) concept to model a set of possible trajectories. The current state-of-the-art solver for CC-POMDPs is called RAO*. RAO* generates a conditional plan based on action and risk models and on likely scenarios for agent vehicles.
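One common way to write the chance-constrained problem described above is given below. The notation is an assumption introduced here for illustration: $\pi$ is a policy, $b_0$ the initial belief, $h$ the planning horizon, $R$ the reward function, $\mathcal{C}$ the set of constraint-violating states (e.g., collisions), and $\Delta$ the risk threshold.

```latex
\begin{aligned}
\pi^{*} = \arg\max_{\pi}\ & \mathbb{E}\!\left[\sum_{t=0}^{h} R(s_t, a_t) \,\middle|\, b_0, \pi\right] \\
\text{s.t.}\ & \Pr\!\left(\bigvee_{t=0}^{h} s_t \in \mathcal{C} \,\middle|\, b_0, \pi\right) \le \Delta
\end{aligned}
```

The constraint bounds the probability that any state along the execution violates the safety condition, rather than bounding an expected cost as in a C-POMDP.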
RAO* explores from a probability distribution over vehicle states (the belief state) by incrementally constructing a hypergraph, called the explicit hypergraph, shown in Fig. 3. At each node of the hypergraph, the planner considers the possible actions provided by the Motion Model Generator (see Fig. 1) and the several observations it may receive. At each level, it uses a value heuristic to guide the search towards optimal policies. It also uses a risk heuristic to prune the search space, removing high-risk branches that violate the chance constraints. Hence, at each level, the action that maximizes the expected reward and meets the chance constraint is selected for the vehicle. However, one drawback of RAO* is that it does not always return optimal solutions, nor does it provide any bound on the sub-optimality gap. In a recent work, we present an algorithm with a guarantee on optimality (namely, a fully polynomial time approximation scheme (FPTAS)) that preserves the safety constraints, all within polynomial running time.
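The prune-then-select step at a single node can be sketched as below. This is not an implementation of RAO*, which searches a hypergraph of beliefs with admissible heuristics; it only illustrates, with a hypothetical function `select_action` and made-up value and risk estimates, how actions whose risk exceeds the remaining budget are discarded before the best remaining action is chosen.

```python
def select_action(candidates, risk_budget):
    """Pick the best action whose estimated risk fits the budget.

    candidates:  {action: (expected_reward, estimated_risk)}
    risk_budget: maximum admissible probability of constraint violation.
    Returns the admissible action with the highest expected reward,
    or None when every action violates the chance constraint.
    """
    admissible = {action: reward
                  for action, (reward, risk) in candidates.items()
                  if risk <= risk_budget}
    if not admissible:
        return None
    return max(admissible, key=admissible.get)

# Hypothetical estimates at one belief node.
candidates = {
    "overtake": (10.0, 0.05),
    "maintain": (6.0, 0.001),
    "brake": (2.0, 0.0005),
}
choice = select_action(candidates, risk_budget=0.01)
```

With a 1% budget the most rewarding action, `overtake`, is pruned and `maintain` is chosen; a `None` result corresponds to the case where the goals must be relaxed to restore safety.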
Recently, RAO* was applied to self-driving vehicles under restricted settings (e.g., a known distribution of actions taken by agent vehicles). CC-POMDPs, while otherwise expressive, allow only sequential, non-durative actions. This restricts the modeling of real-world planning problems. In our ongoing work, we extend the CC-POMDP framework to account for durative actions and leverage heuristic forward search to prune the search space and improve the running time.
VI Motion Model Generator
For each driving scenario, we compute a library of maneuvers. Each maneuver is associated with nominal control signals obtained by solving a model predictive control (MPC) optimization problem. The set of possible maneuver actions is constrained by traffic rules and vehicle dynamics and is informed by the expected evolution of the situation. The actions can be computed offline and online, and can also be derived from publicly available datasets (e.g., Berkeley DeepDrive BDD100k).
The size of the CC-POMDP search space, described above, is sensitive to the number of maneuver actions. To tackle this issue, we consider three levels of abstraction: i) Micro Actions are primitive actions such as Accelerate, Decelerate, and Maintain; ii) Maneuver Actions are sequences of micro actions, such as Merge left and Merge right; iii) Macro Actions are sequences of maneuver actions, such as pass the front vehicle or go straight until the next intersection.
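The three levels can be represented as nested lookup tables, as in the sketch below. The decomposition of each maneuver into micro actions, the extra `speed_up` maneuver, and the helper names are hypothetical and chosen only to show why planning at a higher level shrinks the search.

```python
MICRO_ACTIONS = ("accelerate", "decelerate", "maintain")

# Hypothetical decompositions; a real library would be computed or learned.
MANEUVER_ACTIONS = {
    "merge_left": ("maintain", "accelerate", "maintain"),
    "merge_right": ("maintain", "decelerate", "maintain"),
    "speed_up": ("accelerate", "accelerate"),
}

MACRO_ACTIONS = {
    "pass_front_vehicle": ("merge_left", "speed_up", "merge_right"),
}

def expand_macro(name):
    """Flatten a macro action into its sequence of micro actions."""
    return [micro
            for maneuver in MACRO_ACTIONS[name]
            for micro in MANEUVER_ACTIONS[maneuver]]

def num_plans(num_actions, depth):
    """Number of action sequences an exhaustive search would enumerate."""
    return num_actions ** depth

micro_plan = expand_macro("pass_front_vehicle")
```

In this toy library the macro action expands to eight micro actions, so an exhaustive search over micro actions would face `num_plans(3, 8) = 6561` sequences where the macro level offers a single choice.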
To calculate the risk of collision, we leverage PFTs, each of which represents a sequence of probabilistic reachable sets. PFTs give probabilistic predictions of the future states of the vehicles under a selected action. In this context, the intersection between two temporally aligned PFT trajectories represents the risk of collision. PFTs are constructed from the vehicle dynamics and probabilistic information about uncertainties, as well as by learning from datasets. By propagating the probability distributions of the uncertainties through the continuous dynamics of the vehicle, we construct probability distributions for the locations of the vehicle over a finite planning horizon.
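A simplified version of this risk computation is sketched below. It assumes that each tube is a list of independent one-dimensional Gaussians over longitudinal position, one per time step, and that a collision at a step means the two positions lie within a safety gap of each other; the per-step probabilities are then combined with the union bound (Boole's inequality), which yields an upper bound on the collision risk. The function names and all numbers are hypothetical.

```python
from statistics import NormalDist

def step_collision_prob(ego, agent, safety_gap):
    """P(|x_ego - x_agent| < safety_gap) at one time step.

    ego, agent: (mean, std) of longitudinal position in meters,
    assumed Gaussian and independent of each other.
    """
    (mu_e, sd_e), (mu_a, sd_a) = ego, agent
    diff = NormalDist(mu_e - mu_a, (sd_e ** 2 + sd_a ** 2) ** 0.5)
    return diff.cdf(safety_gap) - diff.cdf(-safety_gap)

def collision_risk_bound(ego_tube, agent_tube, safety_gap):
    """Union bound on collision risk over time-aligned tubes."""
    total = sum(step_collision_prob(e, a, safety_gap)
                for e, a in zip(ego_tube, agent_tube))
    return min(1.0, total)

# Hypothetical tubes: the ego-vehicle closes in on a slower agent.
ego_tube = [(0.0, 0.5), (10.0, 0.8), (20.0, 1.0)]
agent_tube = [(15.0, 0.5), (20.0, 1.0), (26.0, 1.5)]
risk = collision_risk_bound(ego_tube, agent_tube, safety_gap=4.0)
```

Here almost all of the risk comes from the last step, where the predicted positions are closest and the uncertainty is largest. A planner would compare `risk` with the chance-constraint threshold before committing to the action.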
In this work, we proposed a system architecture for risk-aware AVs that can deliberately bound the risk of collision below a given threshold, defined by the policymaker. We presented the related work, discussed key challenges, and proposed research directions in three key subsystems: perception, intention recognition, and risk-aware planning. We believe that our white-box approach is crucial for a better understanding of AV decision making and ultimately for future adoption of AVs on a large scale.
[1] (2019-07) Faster dynamic controllability checking in temporal networks with integer bounds. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, pp. 5509–5515.
[2] (2016) The social dilemma of autonomous vehicles. Science 352 (6293), pp. 1573–1576.
[3] (2012) Learning and recognition of hybrid manipulation motions in variable environments using probabilistic flow tubes. International Journal of Social Robotics 4 (4), pp. 357–368.
[4] (2019) Deep learning architectures for accurate millimeter wave positioning in 5g. Neural Process Letters https://doi.org/10.1007/s11063-019-10073, pp. 1–28.
[5] (2010) VANET: vehicular applications and inter-networking technologies. Vol. 1, Wiley Online Library.
[6] (2016) Exemplar-driven top-down saliency detection via deep association. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5723–5732.
[7] (2017) Temporally and spatially flexible plan execution for dynamic hybrid systems. Artificial Intelligence 247, pp. 266–294.
[8] (2013) Vehicle trajectory prediction based on motion model and maneuver recognition. In 2013 IEEE/RSJ international conference on intelligent robots and systems, pp. 4363–4369.
[9] (2018) Hybrid risk-aware conditional planning with applications in autonomous vehicles. In 2018 IEEE Conference on Decision and Control (CDC), pp. 3608–3614.
[10] (2019) Uncertainty-aware driver trajectory prediction at urban intersections. arXiv preprint arXiv:1901.05105.
[11] (2017) Computer vision for autonomous vehicles: problems, datasets and state-of-the-art. arXiv preprint arXiv:1704.05519.
[12] (1998) Planning and acting in partially observable stochastic domains. Artificial intelligence 101 (1-2), pp. 99–134.
[13] (2016) Driving to safety: how many miles of driving would it take to demonstrate autonomous vehicle reliability?. Transportation Research Part A: Policy and Practice 94, pp. 182–193.
[14] (2017) What uncertainties do we need in bayesian deep learning for computer vision?. In Advances in neural information processing systems, pp. 5574–5584.
[15] (2019-07) Approximability of constant-horizon constrained pomdp. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, pp. 5583–5590.
[16] (2017) Road geometry estimation for urban semantic maps using open data. Advanced Robotics 31 (5), pp. 282–290.
[17] (2013) Sequential bayesian model update under structured scene prior for semantic road scenes labeling. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1321–1328.
[18] (2010) The walking behaviour of pedestrian social groups and its impact on crowd dynamics. PloS one 5 (4), pp. e10047.
[19] (2015) Decentralized control of partially observable markov decision processes using belief space macro-actions. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 5962–5969.
[20] (2016) Shallow and deep convolutional networks for saliency prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 598–606.
[21] (2015) Approximate linear programming for constrained partially observable markov decision processes. In Twenty-Ninth AAAI Conference on Artificial Intelligence.
[22] (2016) RAO*: an algorithm for chance constrained pomdps. In Proc. AAAI Conference on Artificial Intelligence.
[23] (2018) Automorphing kernels for nonstationarity in mapping unstructured environments. In CoRL, pp. 443–455.
[24] (1971) The optimal control of partially observable markov decision processes. PhD thesis, Stanford University.
[25] (2009) Growing hidden markov models: an incremental tool for learning and predicting human and vehicle motion. The International Journal of Robotics Research 28 (11-12), pp. 1486–1506.
[26] (2013) Saliency detection by multiple-instance learning. IEEE transactions on cybernetics 43 (2), pp. 660–672.
[27] (2012) Probabilistic trajectory prediction with gaussian mixture models. In 2012 IEEE Intelligent Vehicles Symposium, pp. 141–146.
[28] (2017) Lane-change detection based on vehicle-trajectory prediction. IEEE Robotics and Automation Letters 2 (2), pp. 1109–1116.
[29] (2017) Training a network to attend like human drivers saves it from common but misleading loss functions. arXiv preprint arXiv:1711.06406.
[30] (2016) Top-down visual saliency via joint crf and dictionary learning. IEEE transactions on pattern analysis and machine intelligence 39 (3), pp. 576–588.
[31] (2016) Instance-level segmentation for autonomous driving with deep densely connected mrfs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 669–677.
[32] (2017) A survey on deep learning-based fine-grained object classification and semantic segmentation. International Journal of Automation and Computing 14 (2), pp. 119–135.