Behavior Planning of Autonomous Cars with Social Perception
Autonomous cars have to navigate dynamic environments that can be full of uncertainties. The uncertainties can come from sensor limitations such as occlusions and limited sensor range, from probabilistic prediction of other road participants, or from unknown social behavior in a new area. To drive safely and efficiently in the presence of these uncertainties, the decision-making and planning modules of autonomous cars should intelligently utilize all available information and appropriately tackle the uncertainties so that proper driving strategies can be generated. In this paper, we propose a social perception scheme which treats all road participants as distributed sensors in a sensor network. By observing individual behaviors as well as group behaviors, uncertainties of all three types can be updated uniformly in a belief space. The updated beliefs from the social perception are then explicitly incorporated into a probabilistic planning framework based on Model Predictive Control (MPC). The cost function of the MPC is learned via inverse reinforcement learning (IRL). Such an integrated probabilistic planning module with socially enhanced perception enables autonomous vehicles to generate behaviors that are defensive but not overly conservative, and socially compatible. The effectiveness of the proposed framework is verified in simulation on a representative scenario with sensor occlusions.
I Introduction
The driving environment of autonomous vehicles (AVs) is dynamic and can be full of uncertainties. First, the future behaviors and trajectories of other traffic participants, such as pedestrians or vehicles with human drivers, are probabilistic in nature. It is difficult to predict them precisely, particularly in highly interactive driving scenarios. Beyond that, implicit social behavior, such as local driving preferences and styles, is also hard to describe exactly when AVs are adapting to a new environment. Moreover, the detection and tracking modules can introduce considerable physical state uncertainty, due both to algorithmic limitations (imperfect detection and tracking performance) and to physical limitations such as sensor field-of-view occlusions and limited sensor range.
To generate safe and efficient maneuvers, the decision-making and planning modules of AVs should be able to properly tackle all the uncertainties arising in the preceding modules, such as perception and prediction. Recent research efforts have been devoted to designing decision-making and planning algorithms under behavioral uncertainties from prediction. For example, the interactive belief-state planner proposed in [1] used a Partially Observable Markov Decision Process (POMDP) to deal with the behavior uncertainties of other vehicles. A decision-making framework was also constructed in [2] to deal with the uncertain behavior of other vehicles at intersections, considering potential violations.
While focusing on the behavior uncertainties, the aforementioned work shares a common assumption: the physical states of other traffic participants are deterministic and accurate. The perception module is assumed to provide perfect results on whether an object exists or not, and on the current positions, velocities and orientations of different objects. However, such an assumption can hardly hold in practice. Even if sensors such as cameras and LiDARs capture the objects well, even state-of-the-art algorithms cannot achieve perfect perception, as evidenced by the 3D detection results for the "easy" cases on the KITTI benchmark [3].
In addition to the algorithmic limitations, physical limitations can also lead to uncertainties in the physical states of objects. Physical limitations in perception mainly include occlusion and limited sensor range [4]. Occlusion, which autonomous vehicles inevitably encounter, poses great challenges for tracking, prediction and risk assessment [5, 6, 7], and therefore significantly affects the performance of decision-making and planning. To deal with occlusions, a safe driving strategy at blind intersections was proposed in [8]. Also focusing on blind intersections, [9] directly designed a planning method based on inverse reinforcement learning (IRL). In [10], a decision-making approach under occlusions was proposed in the framework of POMDP.
Most of these works, however, treat all other traffic participants, such as pedestrians or human-driven vehicles, only as objects/obstacles to avoid. In fact, they are all intelligent agents whose behaviors can be quite informative. Hence, our key observation is that human participants should be treated not only as dynamic obstacles, but also as distributed sensors that provide, via their behaviors, additional information about the environment beyond the scope of physical sensors. We call this concept social perception. The decision-making and planning modules of autonomous cars should explicitly exploit the enhancement offered by social perception.
Figure 1 demonstrates several exemplar scenarios where other road users can serve as sensors to overcome occlusions or limited sensor range. In Fig. 1(a), the host vehicle V0 cannot detect the pedestrian due to the occlusion caused by V1 and V2. However, it can be inferred that the most probable reason for V1 to decelerate is a pedestrian crossing the street. Therefore, the behavior of V1 can be exploited as a sensor to enable social perception of potential pedestrians. In Fig. 1(b), the host vehicle V0 is making a right turn at a one-way-stop T-intersection. It should yield to a potential vehicle V3, but the view is occluded by street-parked vehicles. However, V1 and V2 in the left-turn-only lane keep moving and turning left, which indicates that either there is no vehicle in the occluded area or any such vehicle is relatively far away, so V0 may proceed to turn. Figure 1(c) shows a signalized intersection. The host vehicle V0 (turning right) can only detect the signal (red light) in front of it controlling its direction. It should yield to V3 and V4, which are still traveling at relatively high speeds. However, V1 and V2 in the left-turn-only lane accelerate, which indicates that they have a protected left turn, and V0 can proceed to turn right. Social perception is therefore needed when the motion attributes of others are outside the limited sensor range.
Inferring the physical states from the behavior of others, as described in the examples, is one aspect of social perception. For example, [11] infers the map occupancy from behaviors of human drivers based on manually designed rules. A more important aspect of social perception is that it can go beyond the perception of physical states and extend to the perception of social information existing within a group of social agents. Courtesy [12] is one representative kind of social information to be extracted. Socially cohesive behavior was analyzed and designed in [13] by assuming that the behaviors of others (for instance, human drivers) were often correct and that similar behaviors should be generated by autonomous vehicles.
Integrating social perception into the decision-making and planning modules of autonomous vehicles is essential for enabling safer and more efficient maneuvers in the presence of the corresponding uncertainties. Collisions can potentially be avoided (Fig. 1(a)), and the behavior of the autonomous vehicle can be more efficient, less conservative (Fig. 1(b) and (c)), and more socially compatible, so that neither the passengers nor other human drivers are surprised or annoyed. In this paper, we explicitly incorporate social perception into a probabilistic planner based on Model Predictive Control (MPC), and propose a unified planning framework to handle the above-mentioned uncertainties.
II Problem Statement
In this paper, we consider the behavior planning of an autonomous car in a multi-agent environment with perception uncertainties. Except for the autonomous car, denoted as , we assume all other agents to be human, represented by . Hence, we do not explicitly model the interactions among humans, but focus on the interaction between the robot car and each individual human. As for the perception uncertainties, we consider two types of uncertainties as defined above: the physical state uncertainties such as occlusions and limited sensor range, and the social behavioral uncertainties such as local driving preferences.
Throughout the paper, we let and denote, respectively, the robot car’s states and control inputs, and and for those of human . In a traffic scene with human participants, the states of all agents become , and the environment states can be represented by where is the non-agent related states such as traffic lights. We use to represent the social information set. For each agent, we have
where and describe, respectively, the dynamics of the autonomous car and human . The closed-loop dynamics of the whole multi-agent system becomes
We assume that all agents in the scene are noisily optimal planners. Namely, at time , each agent behaves to minimize its own cost function based on its estimates of the environment states () and the social information () inferred from observations, denoted by . Let and be, respectively, the cost functions of the robot car and human at time over a horizon of :
where is the sequence of control actions of the agent within the horizon ( for the robot car and for human ). with represent, respectively, the preferences of the robot car and human . At every time step , all agents generate their optimal sequences of actions by minimizing their corresponding cost functions , execute the first steps (i.e., set in (1) and (2)), and re-plan at the next time step at .
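The receding-horizon scheme described above (optimize over the horizon, execute only the first action, then re-plan) can be sketched as follows. This is a minimal illustration rather than the planner used in the paper: the 1-D point-mass dynamics, the candidate acceleration set, the cost weights, and all function names are assumptions introduced here.

```python
import itertools

DT = 0.1                     # assumed sampling period [s]
HORIZON = 3                  # assumed planning horizon [steps]
ACTIONS = (-2.0, 0.0, 2.0)   # assumed candidate accelerations [m/s^2]


def step(state, accel):
    """Assumed 1-D point-mass dynamics; state = (position, speed)."""
    pos, vel = state
    vel_next = max(0.0, vel + accel * DT)   # no reversing
    return (pos + vel * DT, vel_next)


def horizon_cost(state, controls, v_ref):
    """Cumulative cost over the horizon: speed error plus control effort."""
    total = 0.0
    for accel in controls:
        state = step(state, accel)
        total += (state[1] - v_ref) ** 2 + 0.1 * accel ** 2
    return total


def plan(state, v_ref):
    """Brute-force search for the control sequence with the lowest horizon cost."""
    return min(itertools.product(ACTIONS, repeat=HORIZON),
               key=lambda seq: horizon_cost(state, seq, v_ref))


def run_receding_horizon(state, v_ref, n_steps):
    """Plan over the horizon, execute only the first action, then re-plan."""
    trajectory = [state]
    for _ in range(n_steps):
        controls = plan(state, v_ref)
        state = step(state, controls[0])
        trajectory.append(state)
    return trajectory
```

Calling `run_receding_horizon((0.0, 5.0), 10.0, 60)` accelerates the point mass toward the reference speed. Because only a coarse set of accelerations is searched, the speed settles close to, not exactly at, the reference.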
As shown in (II), robot cars generate behaviors based on their estimates of the environment states and the social information set. If the estimates deviate significantly from the ground truth, unexpected or even dangerous behaviors might be generated. For environment states, current practice is to set the estimates as the de-noised observations of the robot car from physical sensors, i.e., . However, due to occlusions and limited sensor ranges, the state observations of the robot car might be a subset of, or even differ from, the actual states , which makes not an effective solution. As for the social information , it is a set of variables that cannot be directly perceived by physical sensors. Hence, to enable better autonomous driving strategies under such perception uncertainties, a more advanced perception/inference scheme is desired to update and from observations.
III Social Perception
Our key observation is that human traffic participants should be treated not only as dynamic obstacles that the robot cars need to be aware of, but also as distributed sensors whose behaviors can provide additional information beyond the scope of the physical sensors onboard the autonomous vehicles.
III-A Distributed Agents as Distributed Sensors
Again consider the multi-agent system consisting of one robot car and humans. With physical sensors subject to occlusions and limited range, each agent can observe only a subset of the environment states, denoted by with . Based on the corresponding observation , each agent extracts its estimates, and , of the environment states and the social information, respectively. The estimates of different agents then influence their next-step actions/trajectories, which can be perceived by other agents. Note that due to their distributed locations, the observations and associated estimates of different agents can differ significantly, yet be complementary to each other. Hence, the distributed agents can be viewed as distributed sensors which emit behavioral signals. By observing such behavioral signals, the robot car can infer estimates and from human to reduce its perception uncertainties coming from algorithmic limitations, physical limitations, or both.
III-B Inference Algorithm
III-B1 Modeling the behavior generation function of humans
As discussed in section II, we assume that each human is a noisily optimal planner, and consider the interaction between the robot car and humans when modeling the humans’ behaviors. Thus, at each time period ( steps) starting at , the behavior sequence minimizes the human’s cost function as given in (II) based on his/her estimates. Namely, the behavior generation function of the human can be expressed as
and the optimal cost is given by
Note that in (5)-(7), we model the human behavior generator as his/her optimal response function to the robot car’s input, so as to explicitly address the influence of the robot car, as in [14]. Hence, if the robot car can access the humans’ cost function parameters and their estimates, it can calculate their best behavioral responses.
III-B2 Updating beliefs on estimates via inference
To use humans as sensors of the environment, we need to construct observation models for the robot car to update its beliefs on estimates. For environment states and social information, different observation models are designed.
Updating beliefs on state estimates. At every step , the robot can update its beliefs on the state estimates from behaviors of human via:
is the probability/likelihood of human taking action if the human’s estimates were indeed . To get , we assume that actions with higher cost are exponentially less likely, based on the maximum entropy principle as in [15]. This means that can be approximated by:
where represents the optimal cost to go by taking action , given by
Updating beliefs on social information. As defined in Section I, social information refers to the group behaviors of humans in the traffic scene. To update its beliefs on the estimates of social information , the robot car therefore needs to collect the common behaviors of multiple humans, and the belief update process becomes:
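A minimal sketch of both belief updates in this subsection, assuming a discrete set of hypotheses and a small discrete action set. The Boltzmann likelihood follows the maximum entropy assumption stated above. The hypothesis names, the cost table, and the treatment of several humans as conditionally independent observations are assumptions made for illustration only.

```python
import math


def action_likelihood(action, hypothesis, cost_fn, actions):
    """Boltzmann likelihood: P(action | hypothesis) is proportional to exp(-cost)."""
    weights = {u: math.exp(-cost_fn(u, hypothesis)) for u in actions}
    return weights[action] / sum(weights.values())


def update_belief(belief, observed_actions, cost_fn, actions):
    """Bayes update of a discrete belief from one or more observed human actions.

    Several observations (e.g. from several humans) are multiplied together,
    i.e. treated as conditionally independent given the hypothesis
    (an assumption of this sketch).
    """
    posterior = {}
    for hypothesis, prior in belief.items():
        likelihood = 1.0
        for action in observed_actions:
            likelihood *= action_likelihood(action, hypothesis, cost_fn, actions)
        posterior[hypothesis] = prior * likelihood
    normalizer = sum(posterior.values())
    return {h: p / normalizer for h, p in posterior.items()}


# Hypothetical example: is there a pedestrian in the occluded area?
ACTIONS = ("brake", "keep_speed")


def human_cost(action, hypothesis):
    """Assumed costs: keeping speed is expensive if a pedestrian is crossing."""
    table = {
        ("brake", "pedestrian"): 0.5,
        ("keep_speed", "pedestrian"): 4.0,
        ("brake", "no_pedestrian"): 2.0,
        ("keep_speed", "no_pedestrian"): 0.2,
    }
    return table[(action, hypothesis)]


prior = {"pedestrian": 0.2, "no_pedestrian": 0.8}
after_one = update_belief(prior, ["brake"], human_cost, ACTIONS)
after_two = update_belief(prior, ["brake", "brake"], human_cost, ACTIONS)
```

With these assumed costs, observing one braking human raises the belief in a pedestrian above its prior, and observing two braking humans raises it further, which mirrors the idea of collecting common behaviors from multiple humans.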
III-C Learning cost functions of humans
As discussed above, for the robot car to update its beliefs by collecting behavioral information from humans, it needs to have access to the cost functions of the humans, so that it can evaluate and . One can obtain such cost functions via inverse reinforcement learning (IRL) [15, 16, 17, 18]. Note that during the learning process, we assume that the demonstrations are sub-optimal and that there are no perception uncertainties, i.e., . A brief review of the IRL algorithm is given below.
The single-step cost is assumed to be parametrized as a linear combination of features (the social information is assumed to be invariant within one horizon):
Over a horizon of , the cumulative cost function is
Our goal is to find the weights that maximize the likelihood of the demonstration set :
Building on the principle of maximum entropy, we assume that trajectories are exponentially more likely when they have lower cost:
Thus the probability (likelihood) of the demonstration set becomes
where is the number of trajectories in .
With the assumption of locally optimal demonstrations, we have in (III-C). This simplifies the partition term to a Gaussian integral for which a closed-form solution exists (see [17] for details). Substituting (17) and (III-C) into (15) yields the optimal parameter as the maximizer.
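The maximum-entropy weight learning reviewed above can be illustrated with a small sketch. Note the substitution: instead of the closed-form Gaussian-integral approximation of the partition term, this sketch approximates it by summing over a finite set of candidate trajectories. The feature values, learning rate, iteration count, and function names are assumptions for illustration.

```python
import numpy as np


def trajectory_probs(theta, features):
    """P(trajectory) proportional to exp(-theta . f) over a finite candidate set."""
    logits = -features @ theta
    logits -= logits.max()            # shift for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


def log_likelihood(theta, features, demo_index):
    """Log-probability of the demonstrated trajectory among the candidates."""
    return float(np.log(trajectory_probs(theta, features)[demo_index]))


def fit_weights(features, demo_index, lr=0.1, iters=500):
    """Gradient ascent on the log-likelihood of the demonstration."""
    theta = np.zeros(features.shape[1])
    for _ in range(iters):
        probs = trajectory_probs(theta, features)
        expected = probs @ features              # expected features under the model
        grad = expected - features[demo_index]   # gradient of the log-likelihood
        theta += lr * grad
    return theta


# Hypothetical cumulative features per candidate trajectory:
# columns = (tracking error, acceleration effort); row 0 is the demonstration.
CANDIDATE_FEATURES = np.array([
    [0.1, 0.2],
    [1.0, 0.1],
    [0.5, 1.5],
    [2.0, 2.0],
])
learned_theta = fit_weights(CANDIDATE_FEATURES, demo_index=0)
```

The gradient is the difference between the model's expected features and the demonstrated features, so the weights grow on features that the demonstration keeps small relative to the alternatives.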
IV Human-Like Behavior Planning With Social Perception
In this section, we will discuss how to integrate the social perception into the decision-making and planning module to enable a more human-like driving strategy in terms of defensiveness, non-conservativeness, and social compatibility.
IV-A The behavior planner under uncertainties
Due to the probabilistic nature of the beliefs in (8) and (11), we utilize a probabilistic framework based on Model Predictive Control (MPC), as in [19], as the planner for the autonomous cars. The cost function of the robot car is defined as an expected cost over the beliefs:
where is a cumulative cost over a horizon of , as defined in (II). Note that with a long horizon , a discrete representation of and is practically not feasible. In this case, we will use representative motion patterns to represent as in [20].
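The expected cost over beliefs can be sketched as a probability-weighted sum over a discrete belief, with candidate plans standing in for representative motion patterns. The hypothesis names, plan labels, and cost values below are assumptions chosen only to show how the preferred plan changes with the belief.

```python
def expected_cost(plan, belief, cost_fn):
    """Expectation of the plan's cost over a discrete belief."""
    return sum(prob * cost_fn(plan, hyp) for hyp, prob in belief.items())


def best_plan(candidates, belief, cost_fn):
    """Candidate plan with the lowest expected cost."""
    return min(candidates, key=lambda plan: expected_cost(plan, belief, cost_fn))


# Hypothetical cost table: (plan, hypothesis) -> cumulative cost.
COSTS = {
    ("maintain_speed", "pedestrian"): 100.0,   # near-collision is very costly
    ("maintain_speed", "no_pedestrian"): 1.0,
    ("slow_down", "pedestrian"): 5.0,
    ("slow_down", "no_pedestrian"): 4.0,       # lost efficiency
}


def plan_cost(plan, hypothesis):
    return COSTS[(plan, hypothesis)]


PLANS = ("maintain_speed", "slow_down")
low_belief = {"pedestrian": 0.01, "no_pedestrian": 0.99}
high_belief = {"pedestrian": 0.60, "no_pedestrian": 0.40}
choice_low = best_plan(PLANS, low_belief, plan_cost)
choice_high = best_plan(PLANS, high_belief, plan_cost)
```

Under the low belief the planner keeps its speed, while under the high belief it slows down, which is the defensive but not overly conservative behavior the framework aims for.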
Cost function design. We consider safety, efficiency, comfort, and fuel consumption in the cost. We therefore penalize the following terms, whose weights can be learned via IRL as described in Section III-C.
tracking error: where is the distance from the position of the robot car at time step to the desirable traffic-free reference path.
safety term: we use the relative distances from surrounding participants to evaluate the safety term. Define , where is the number of surrounding cars and is the distance of the robot car to the -th one. Note that can be obtained via the behavior generator in (5) and the dynamics equation in (2) based on current beliefs .
efficiency: we penalize the difference between the speed of the robot car and the traffic limit , given by .
acceleration: where is the acceleration input at time step of the robot car.
Note that in the cost function belongs to the set of social information. We use instead of to allow the robot car to infer the current traffic speed and follow it. To ensure that the robot car does not break the traffic rules, we impose the maximum allowable speed limit as a constraint below.
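The four penalty terms listed above can be combined into a weighted cumulative cost, as in the following sketch. The exact feature definitions are not reproduced here, so the forms below are assumptions: squared errors for tracking, efficiency and acceleration, and an exponentially decaying distance penalty for safety. The weights and the `StepInput` container are hypothetical, and the speed limit is handled as a hard constraint.

```python
import math
from dataclasses import dataclass


@dataclass
class StepInput:
    path_offset: float          # distance to the reference path [m]
    speed: float                # robot car speed [m/s]
    accel: float                # acceleration input [m/s^2]
    neighbor_distances: tuple   # distances to surrounding participants [m]


def step_features(s, traffic_speed):
    """Features (tracking, safety, efficiency, acceleration) for one time step."""
    tracking = s.path_offset ** 2
    # Assumed safety feature: decays exponentially with distance to each neighbor.
    safety = sum(math.exp(-d) for d in s.neighbor_distances)
    efficiency = (s.speed - traffic_speed) ** 2
    acceleration = s.accel ** 2
    return (tracking, safety, efficiency, acceleration)


def cumulative_cost(steps, weights, traffic_speed, speed_limit):
    """Weighted feature sum over the horizon; infinite if the speed limit is exceeded."""
    total = 0.0
    for s in steps:
        if s.speed > speed_limit:    # hard constraint rather than a penalty term
            return math.inf
        feats = step_features(s, traffic_speed)
        total += sum(w * f for w, f in zip(weights, feats))
    return total
```

In this form the inferred traffic speed enters only the efficiency feature, while the legal speed limit acts as a separate feasibility check.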
Constraints. To guarantee the feasibility of the planned trajectories, we introduce the following constraints:
kinematics constraints: we use the bicycle model [21] to describe the kinematics of the robot car.
dynamics constraints: we constrain curvatures and accelerations of the vehicle as follows:
where is the curvature of the planned trajectory at -th step, and is the boundary of feasible curvatures. Both and can be calculated via the “G-G” diagram as in .
safety constraints: safety constraints come from both static road structures and dynamic obstacles such as human drivers and pedestrians. We represent static structures as polygons and check the robot car’s distance to the polygons. We cover dynamic obstacles with several circles and calculate the distances between the robot car and the circles as in [22].
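Two of the constraints above lend themselves to a short sketch: one Euler step of a kinematic bicycle model, and a circle-cover clearance check between two vehicles. The wheelbase split, vehicle dimensions, number of circles, and function names are assumed values for illustration, and the polygon check for static structures is omitted.

```python
import math


def bicycle_step(x, y, yaw, speed, steer, accel, dt, l_f=1.2, l_r=1.6):
    """One Euler step of the kinematic bicycle model (reference point at the c.g.)."""
    beta = math.atan(l_r / (l_f + l_r) * math.tan(steer))   # slip angle
    x += speed * math.cos(yaw + beta) * dt
    y += speed * math.sin(yaw + beta) * dt
    yaw += speed / l_r * math.sin(beta) * dt
    speed += accel * dt
    return x, y, yaw, speed


def cover_circles(x, y, yaw, length=4.5, width=1.8, n=3):
    """Cover a rectangular footprint with n equal circles along its long axis."""
    seg = length / n
    radius = math.hypot(seg / 2.0, width / 2.0)   # circumscribes one segment
    centers = []
    for k in range(n):
        offset = -length / 2.0 + (k + 0.5) * seg
        centers.append((x + offset * math.cos(yaw), y + offset * math.sin(yaw)))
    return centers, radius


def clearance(pose_a, pose_b):
    """Smallest gap between the circle covers of two vehicles (negative = overlap)."""
    centers_a, r_a = cover_circles(*pose_a)
    centers_b, r_b = cover_circles(*pose_b)
    return min(math.hypot(ax - bx, ay - by) - r_a - r_b
               for ax, ay in centers_a for bx, by in centers_b)
```

A planner can then require `clearance(robot_pose, obstacle_pose)` to stay above a margin at every step of the horizon, with poses given as `(x, y, yaw)` tuples.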
Note that in the probabilistic planner, constraints over all the beliefs should all be considered. To deal with the tailing effect, we set a threshold in practice . This means that if the belief of a certain state or a certain social variable is lower than , we will set the probability to zero in the expected cost function, and ignore the related constraints.
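The thresholding step can be sketched as below. The threshold value used in the example and the function names are assumptions. No renormalization is applied after pruning, since the text only describes setting low probabilities to zero and dropping the related constraints.

```python
def prune_belief(belief, threshold):
    """Set the probability of hypotheses below the threshold to zero."""
    return {h: (p if p >= threshold else 0.0) for h, p in belief.items()}


def active_hypotheses(belief, threshold):
    """Hypotheses whose constraints still have to be enforced by the planner."""
    return [h for h, p in belief.items() if p >= threshold]


# Hypothetical belief and threshold.
belief = {"pedestrian": 0.02, "no_pedestrian": 0.98}
pruned = prune_belief(belief, threshold=0.05)
active = active_hypotheses(belief, threshold=0.05)
```

Here the pedestrian hypothesis falls below the assumed threshold, so it contributes nothing to the expected cost and its safety constraints are skipped.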
IV-B The planning framework with social perception
With the probabilistic planner formulated in Section IV-A, the implementation of the behavior planning framework with social perception is summarized below:
V Simulation Results
In this section, we give an exemplar scenario with sensor occlusions to verify the effectiveness of the proposed planning framework with social perception.
Despite progress in advanced perception and tracking algorithms, sensor occlusions are inevitable for autonomous vehicles. Consider the scenario described in Fig. 2, where the autonomous car (red) and a human-driven car (yellow) are driving side by side, and a pedestrian is about to cross the street. Due to the relative positions of the two cars, the view of the robot car is blocked by the human-driven car, so the robot car cannot detect the pedestrian.
In such a scenario, a conservative autonomous car will assume that there might be out-of-view pedestrians crossing the street. Hence, it slows down, either to prepare to stop or to leave a larger gap to the human-driven car and get a better view. Both strategies sacrifice the efficiency of the autonomous car. On the other hand, an aggressive autonomous car might simply ignore the possibility of a pedestrian crossing the street and plan to drive straight through the intersection, which might lead to a collision. We note that in either case, the autonomous car perceives the environment states only via its own physical sensors and completely ignores the information emitted via the behaviors of the human driver.
As shown in Fig. 2, the human driver’s view of pedestrians crossing the street is not occluded in this scenario, and he/she is closer to the pedestrians, if there are any. Hence, from the behavior of the human driver, the robot car can infer, and become more confident about, the probability of a pedestrian crossing.
We simulated this traffic scenario with a conservative planner, an aggressive planner and our proposed planner with social perception. The sampling period for each time step is s. Both cases with and without crossing pedestrians are considered, and the results are shown in Figs. 3 through 6.
With crossing pedestrians in the occluded area. Figures 3 and 4 compare an aggressive planner with the proposed planner when there is a crossing pedestrian in the occluded area. In Fig. 4, with the proposed planner, the robot car’s belief in the existence of pedestrians increased quickly when the human driver slowed down (Fig. 4(c)). The updated belief enables the robot car to prepare to stop before the occlusion clears, whereas the aggressive planner in Fig. 3 failed to yield to the pedestrian even though it braked hard once it saw the pedestrian.
With non-crossing pedestrians in the occluded area. We also compared the proposed planner with a conservative planner which assumes the existence of crossing pedestrians by default. Results are given in Figs. 5 and 6. The proposed planner (Fig. 6) is much more efficient than the conservative planner (Fig. 5). The autonomous car with the conservative planner slowed down even though the human driver did not. With the proposed planner, on the other hand, the belief in the existence of crossing pedestrians remained low as the robot car observed the behavior of the human driver, as shown in Fig. 6(c), which enables the robot car to maintain a relatively high speed and improves its efficiency.
VI Conclusion
In this paper, we proposed a unified probabilistic planning framework with social perception to deal with uncertainties from physical states, prediction of others, and unknown social information. We treated all road participants as sensors in a distributed sensor network. By observing their individual behaviors as well as group behaviors, uncertainties of different types can be reduced via a uniform belief update process. We also explicitly incorporated the social perception scheme into a probabilistic planner based on MPC, which can thus generate behaviors for autonomous vehicles that are defensive but not overly conservative, and socially compatible. Simulation results in a traffic scene with sensor occlusions were given, with comparison to a conservative planner and an aggressive planner. The results showed that the proposed framework can enable more efficient yet defensive behaviors in the presence of perception uncertainties.
References
- [1] C. Hubmann, J. Schulz, G. Xu, D. Althoff, and C. Stiller, “A Belief State Planner for Interactive Merge Maneuvers in Congested Traffic,” in 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Nov. 2018, pp. 1617–1624.
- [2] S. Noh, “Decision-Making Framework for Autonomous Driving at Road Intersections: Safeguarding Against Collision, Overly Conservative Behavior, and Violation Vehicles,” IEEE Transactions on Industrial Electronics, vol. 66, no. 4, pp. 3275–3286, Apr. 2019.
- [3] “The KITTI Vision Benchmark Suite, 3D object detection.” [Online]. Available: http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d
- [4] P. F. Orzechowski, A. Meyer, and M. Lauer, “Tackling Occlusions & Limited Sensor Range with Set-based Safety Verification,” in 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Nov. 2018, pp. 1729–1736.
- [5] E. Galceran, E. Olson, and R. M. Eustice, “Augmented vehicle tracking under occlusions for decision-making in autonomous driving,” in 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sep. 2015, pp. 3559–3565.
- [6] J. Li, W. Zhan, and M. Tomizuka, “Generic Vehicle Tracking Framework Capable of Handling Occlusions Based on Modified Mixture Particle Filter,” in 2018 IEEE Intelligent Vehicles Symposium (IV), Jun. 2018, pp. 936–942.
- [7] M.-Y. Yu, R. Vasudevan, and M. Johnson-Roberson, “Occlusion-Aware Risk Assessment for Autonomous Driving in Urban Environments,” arXiv:1809.04629 [cs], Sep. 2018.
- [8] S. Hoermann, F. Kunz, D. Nuss, S. Reuter, and K. Dietmayer, “Entering crossroads with blind corners. A safe strategy for autonomous vehicles,” in 2017 IEEE Intelligent Vehicles Symposium (IV), 2017, pp. 727–732.
- [9] L. Y. Morales, A. Naoki, Y. Yoshihara, and H. Murase, “Towards Predictive Driving through Blind Intersections,” in 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Nov. 2018, pp. 716–722.
- [10] M. Bouton, A. Nakhaei, K. Fujimura, and M. J. Kochenderfer, “Scalable Decision Making with Sensor Occlusions for Autonomous Driving,” in 2018 IEEE International Conference on Robotics and Automation (ICRA), May 2018, pp. 2076–2081.
- [11] O. Afolabi, K. Driggs-Campbell, R. Dong, M. J. Kochenderfer, and S. S. Sastry, “People as Sensors: Imputing Maps from Human Actions,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct. 2018, pp. 2342–2348.
- [12] L. Sun, W. Zhan, M. Tomizuka, and A. D. Dragan, “Courteous Autonomous Cars,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct. 2018, pp. 663–670.
- [13] N. C. Landolfi and A. D. Dragan, “Social Cohesion in Autonomous Driving,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct. 2018, pp. 8118–8125.
- [14] D. Sadigh, S. S. Sastry, S. A. Seshia, and A. D. Dragan, “Information gathering actions over human internal state,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct. 2016, pp. 66–73.
- [15] B. D. Ziebart, A. L. Maas, J. A. Bagnell, and A. K. Dey, “Maximum entropy inverse reinforcement learning,” in AAAI, vol. 8, Chicago, IL, USA, 2008, pp. 1433–1438.
- [16] P. Abbeel and A. Y. Ng, “Apprenticeship learning via inverse reinforcement learning,” in Proceedings of the Twenty-First International Conference on Machine Learning. ACM, 2004, p. 1.
- [17] S. Levine and V. Koltun, “Continuous inverse optimal control with locally optimal examples,” in the 29th International Conference on Machine Learning (ICML-12), 2012.
- [18] L. Sun, W. Zhan, and M. Tomizuka, “Probabilistic prediction of interactive driving behavior via hierarchical inverse reinforcement learning,” in 2018 21st International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2018, pp. 2111–2117.
- [19] W. Zhan, C. Liu, C. Y. Chan, and M. Tomizuka, “A non-conservatively defensive strategy for urban autonomous driving,” in 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), Nov. 2016, pp. 459–464.
- [20] W. Zhan, L. Sun, Y. Hu, J. Li, and M. Tomizuka, “Towards a fatality-aware benchmark of probabilistic reaction prediction in highly interactive driving scenarios,” in 2018 21st International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2018, pp. 3274–3280.
- [21] R. Rajamani, Vehicle Dynamics and Control. Springer Science & Business Media, 2011.
- [22] J. Ziegler, P. Bender, M. Schreiber, H. Lategahn, T. Strauss, C. Stiller, T. Dang, U. Franke, N. Appenrodt, C. G. Keller et al., “Making Bertha drive - an autonomous journey on a historic route,” IEEE Intell. Transport. Syst. Mag., vol. 6, no. 2, pp. 8–20, 2014.