Sequential Experimentation to Efficiently Test Automated Vehicles
SEQUENTIAL EXPERIMENTATION TO EFFICIENTLY TEST AUTOMATED VEHICLES
Zhiyuan Huang 
Henry Lam 
Department of Industrial and Operations Engineering 
University of Michigan 
1205 Beal Avenue 
Ann Arbor, MI 48105, USA 
Ding Zhao 
Department of Mechanical Engineering 
University of Michigan 
2901 Baxter Road 
Ann Arbor, MI 48109, USA 
Abstract
Automated vehicles have been under heavy developments in major auto and tech companies and are expected to release into market in the foreseeable future. However, the road safety of these vehicles remains a concern. One approach to evaluate their safety is via ontrack experimentation, but this requires gigantic costs and time investments. This paper discusses a sequential learning approach based on kriging models to reduce the experimental runs and economize ontrack experimentation. The approach relies on a heuristic simulationbased gradient descent procedure to search for the best next test scenario. We demonstrate our approach with some numerical test cases.
1 Introduction
1.1 Background of Automated Vehicles Evaluation
While automated vehicles (AVs) are currently under intense developments by almost all major auto companies and tech giants, their safety has remained a concern, as reinforced by recent Tesla accidents involving selfdriving systems [Singhvi and Russell (2016)]. The difficulty in evaluating AVs is that these vehicles are “smart”, in that they interact with their environments and prompt autonomous actions, and hence cannot be tested using existing standard approaches.
For example, the socalled test matrix approach, adopted commonly in many vehicle testing procedures, uses fixed and predefined test scenarios to evaluate vehicles. However, an AV producer can tune the algorithm to excel in such test scenarios but fail on others, making the results of the test matrix invalid in capturing the actual risk [Aust (2012)]. In the United States, there are currently no standards or protocols to test AVs with high degrees of automation (known as automation level 2 [NHTSA (2013)] or higher). Most prospective AV manufacturers at present rely on Naturalist Field Operational Tests (NFOT) [FESTAConsortium et al. (2008)] to evaluate AV safety, which means putting the vehicle prototypes on actual roads and collecting data from potential accidents or conflicts. Such tests, however, are both timeconsuming and costly, as accidents are rare events that can only be assessed under statistical confidence with astronomical road miles driven by these prototypes. According to ?, an NFOT “cannot be conducted with less than $10,000,000”.
As an alternative, researchers have explored the use of Monte Carlo simulation techniques. ? and ? evaluated collision avoidance systems by reusing existing NFOT data, and ? used forward collision scenarios to evaluate collision warning and mitigation braking technologies on heavy trucks. ?, ?, and ? applied importance sampling methods to evaluate carfollowing and lane change scenarios. However, MonteCarlobased methods need to make assumptions on the control and dynamics of AVs. The lack of full knowledge in specifying these assumptions, complicated by the autonomous operations of AVs that are not publicly disclosed, remains one of the key difficulties in carrying out reliable Monte Carlo evaluation. Ontrack experiments to learn the behaviors of AVs is therefore a crucial step [Peng and Leblanc (2012)]. These behaviors, once accurately informed, can be used as inputs to the Monte Carlo evaluation. However, such experiments are only recently feasible [Mcity (2017)] and require huge cost and time investments. This motivates us to explore an adaptive approach to reduce the number of ontrack experimental runs needed for the learning.
1.2 Outline of the Sequential Experimentation Approach
The number of possible scenarios that an AV can react on, which collectively define the behavior of the AV, are typically infinite. This motivates us to consider a metamodel to make our learning feasible. Specifically, we use a kriging framework to model the unknown behavior of AVs, and investigate a myopic approach to sequentially select the next test scenario that can maximize the information gain (thereby reducing the runs needed to achieve a reasonable estimation accuracy). As the gain is in terms of the correctness of the Monte Carlo evaluation, finding the next test scenario generally requires simulationbased optimization. In particular, we investigate a heuristic use of stochastic gradient descent. The simulation is also used to make the final safety evaluation of the AV being considered.
Our framework follows from the kriging technique originated from geology (e.g., [Chiles and Delfiner (2009)]) and further developed in computer experiments (e.g., [Sacks et al. (1989)]). The primary use of kriging is to assimilate spatial data under correlation among different design points that is made computationally convenient through Gaussian process modeling. Our approach follows this framework by viewing the test scenarios as design points. In the static settings, the design points are typically selected using spacefilling design (e.g., with Latin Hypercube Sampling; [Kleijnen (2008)]). To reduce experimental costs with respect to a specified goal, one can sequentially select the design points, which is the approach we adopt. In particular, we follow the sequential sampling idea that has been applied to sensitivity analysis and optimization [Kleijnen and Van Beers (2004), Kleijnen (2009)]. Our work most closely follows the concept of knowledge gradient (e.g., [Powell and Ryzhov (2012)]) in the Bayesian setting. Other related literature includes the stream of study in stochastic kriging [Ankenman et al. (2010), Staum (2009)], a generalization of the kriging technique to stochastic computer experiments. In this paper, however, we assume the ontrack experimentation is errorfree and hence relates more closely to the deterministic experimentation framework. On the other hand, stochasticity comes in the evaluation criterion and as a result, as discussed above, our sequential design point search will allude to the use of simulation optimization.
2 A Kriging Framework for Av Evaluation
We introduce our framework in two components. First, Section 2.1 describes the setting and the challenge of AV evaluation and gives a simple illustrative example. Section 2.2 then describes how we cast the AV evaluation task into a krigingbased learning model.
2.1 The Task of AV Evaluation
Evaluation of the road safety of AVs requires studying the risk arising from its interaction with the surrounding environments, such as other vehicles driven by human drivers, pedestrians etc. The risk can be measured by probabilistic quantities such as the chance of accidents (e.g., crashes) and conflicts (e.g., the AV and a front car within a dangerously short distance). For example, ? demonstrate this calculation via Monte Carlo simulation with a lane change scenario. Figure 1 describes this setting, where a humancontrolled vehicle driving in front of the AV is cutting into the AV’s lane. The AV has a builtin intelligent control system that is assumed deterministic, while the frontal vehicle is susceptible to noisy human behavior and hence is stochastic.
A collision can occur when the gap is too short at any point of time. Consider a fixed period of time that represents the typical carfollowing duration. Denote as the range between the AV and the humandriving vehicle and as its rate of change, which depend on the physical measurements of both vehicles including accelerations, velocities and positions. denotes the initial condition of the lane change scenario. We say a collision happens if the range at any point of time is too short, say within a threshold . Then the collision probability is , or equivalently .
In general, the stochasticity of the humandriving vehicle, described by its acceleration etc., can be estimated from existing data. ? for instance uses the naturalistic driving data among all the lane change scenarios extracted from the Safety Pilot Model Deployment (SPMD) database [Bezzina and Sayer (2014)]. However, the AV control is typically not fully known to the tester. It could be known by the company that owns its production, but due to commercial concern such knowledge is not revealed to governmental or public entities who conduct safety tests. So to carry out the Monte Carlo safety test, a governmental unit needs to learn the control system by carrying out its own ontrack experiment. This experiment runs on a physical proving ground (e.g., [Mcity (2017)]) which, in the considered setting, can preset the configuration of the frontal vehicle to resemble an actual road condition. Observing how the AV reacts in these conditions provides some information on its underlying intelligent control.
Other scenarios can be evaluated similarly as above; see, e.g., ? for a carfollowing setting. In the subsequent discussion we will focus on the lane change situation for illustration.
2.2 A Kriging Model
We study a krigingbased learning approach to collect information about the AV from ontrack experiments. Suppose we are interested in estimating , where is an unknown function on , is a given threshold, and is a random object under the probability . For instance, can be and be in the example described in Section 2.1, where here refers to the set of parameters that controls the humandriving vehicle, which is random and its distribution calibrated from the SPMD database.
To model how information on updates our estimate on , we view as a response surface on the domain . We model as a Gaussian Random Field (GRF) [Rasmussen (2006)] that is independent of the stochasticity of , denoted as
where is the mean function and is the covariance function of the GRF. Given any fixed design points , comprises a Gaussian random vector with means and covariances . It is customary to assume that and where the correlation function implies stationary variance over and depends on the design point pairs only through the value of . For simplicity, we will further assume that for some , which represents a flat belief on over all the design points. We use the correlation function , where denotes the Euclidean norm. This correlation function signifies a higher correlation for test scenarios that are closer to each other. Note that we have adopted intuitive choices for the mean and correlation functions here for convenience, but better ones (in the sense of better reflecting the prior belief on the vehicle behaviors under different test scenarios) should be used with the availability of expert knowledge.
Suppose that the parameters are known. Given some observations on the value of at some points in , we can update the distribution of via conditioning. We denote as the observed design vector and the associated response vector . We define the matrix such that its th entry is , and define so that , for and . Given observations , for any fixed , we have
and
where is a vector with as the th element [Rasmussen (2006)]. Note that still follows a Gaussian distribution. For simplicity, we denote and .
In practice, the parameters need to be either estimated (e.g., by using maximum likelihood) or assigned reasonable values according to expert knowledge. For more details on calibrating the parameters, see, e.g., ?. In our subsequent discussion, we assume these are given and unchanged throughout the learning process.
Under the GRF assumption and conditioning on , we now set our target quantity of interest as , where now generates both the stochasticity in and the Gaussian uncertainty in . Typically this probability is larger, i.e., more conservative, than when is completely known, because of the additional noise coming from the model uncertainty. We view this probability as a reasonable target, but clearly other formulations are plausible.
Note that we have
(1) 
where denotes the expectation taken with respect to the stochasticity of . Since follows a Gaussian distribution with mean and variance , we can write (1) further as
(2) 
where denotes the tail distribution function of a standard Gaussian distribution.
3 Sequential Selection of Test Scenarios via Optimization
From (2), we design a procedure to sequentially look for the next scenario, or design point, to test the value of that can in a sense maximize the information gain. We define information gain as the distance between the current estimate of and its update taking into account the outcome of the next test. We maximize the expected distance under the current posterior distribution. This framework follows generally from the concept of knowledge gradient [Powell and Ryzhov (2012)], but here we are interested in a pure estimation problem instead of an optimization problem. Note that the distribution of is estimated from data, which can be parametrically modeled or fully datadriven, i.e., nonparametric. In general we need to run simulation to evaluate our target quantity, even though is highly structured.
We present some further notations. Let be the vectors of historical design points and responses from collected up to step . We denote . In particular, under follows a Gaussian distribution with mean and variance . For simplicity, we write and .
Let be the current target estimate, and be the target estimate if one tests an additional design point and collects a response . Let be some distance criterion between two probabilities. Given , we search for the next design point by looking for that solves
(3) 
A simple example of is the squared distance, which we adopt in the sequel. Optimization (3) becomes
(4) 
where we denote , the distribution function of , and the standard Gaussian distribution function.
Note that (4) generally does not support closedform evaluation, and requires running simulation. If is a discrete space, ranking and selection methods can be applied (an approach taken by ?). Here we focus on a continuous space for . We use stochastic approximation (SA) [Kushner and Yin (2003)] to search for a local optimum for (4). This approach follows from ? that considers parallel Bayesian global optimization where the onestep optimum cannot be solved in closedform under Gaussian process function models. Note that, like the setting in ?, since there is no guarantee that the objective function in (4) is concave, we can only ensure that our SA converges to a local optimum under suitable conditions.
We describe our stochastic gradient estimator for the objective function (4). Given i.i.d. samples drawn from and drawn from a standard Gaussian distribution, our gradient estimator is a vector in whose th element is given by
(5) 
where is given by
Here we have
and
where we use to denote the vector whose th element is for and th element is , to denote the matrix whose th entry is for and , th entry is for , th entry is for , and th entry is .
Furthermore, we have
The vector has 0 in all entries but the last, which is equal to . has 0 in all entries except the last row and column, where the th entry and th entry is equal to for where denotes the th observation. The vector has 0 in all entries but the last, which is equal to
Lastly, we have
and
where
and is a matrix whose th entry is for and , is a vector whose th element is for , and
The above estimator is only a heuristic that roughly resembles an infinitesimal perturbation analysis. Upon closer inspection, one can see that the term in the denominator in the formulas above is close to 0 if approaches any of the observed design points, a consequence of the fact that the responses at those points are completely known. This may blow up the gradient estimate. This issue can potentially be addressed by adding artificial small noise to the kriging model to inflate the variance from zero at those positions. An alternative is to use the finitedifference method, although this will reduce the efficiency of the resulting gradient descent algorithm.
With the gradient estimator, we iterate
(6) 
where denotes the gradient estimator in (5), for starting from an initial solution , to optimize the objective function in (4) according to a heuristic RobbinsMonro SA. The step size is taken as . One may also apply the algorithm at multiple starting points in view of the nonconvexity of the problem.
Overall, to sequentially select the design points, the steps consist of:

Use a smallsample spacefilling design to build an initial observation set and construct an initial kriging model.

Conduct an experiment at and add and the associated experimental outcome to the observation set to get .

Update the kriging model using the observation set .
4 Numerical Examples
This section shows some numerics on our information criterion and simple illustrations of our procedure.
4.1 Illustration of the Information Criterion
Here we present an example to illustrate the intuition behind the proposed information criterion. By contrasting with a simple alternative criterion, we demonstrate the relation between the proposed criterion and the underlying probability distribution.
Consider the generic target probability of interest , where is a random object with probability . In addition to (3), we consider an alternative criterion to select the next design point by maximizing the pointwise variance of over under the posterior distribution on , namely ), where is the indicator function. In other words, we maximize
(7) 
where and refer to the conditional distribution on as before. Criterion (7), which we name the local prediction impact for convenience, does not depend on the distribution of but only measures the uncertainty (or confidence of our knowledge) on the values of the function at different points. This contrasts our suggestion in (3) that accounts for both the uncertainty on and the distribution of , and in this sense (7) is a less comprehensive measure. In general, one would expect that the information gain measured by (3) is large when the local prediction impact is large and the position of interest is “important” according to the distribution of . On the other hand, a position with a large local prediction impact may not necessarily be important in determining the estimate of , since the latter depends on the distribution of .
We demonstrate the two criteria with a study of lane change scenarios described in Section 2.1. We assume that the AV uses a deterministic system with Adaptive Cruise Control (ACC) and Autonomous Emergency Braking (AEB) [Ulsoy et al. (2012)] (see Fig. 3), but this is supposedly unknown to the tester. There are three key variables that constitute the scenario, namely the frontal vehicle’s velocity , range and time to collision , where we define as
As described in ?, when the velocity is between 5 to 15 , the other two variables and are independent of each other. can be modeled by an exponential distribution and by a Pareto distribution. Here we define , and we are interested in estimating , i.e., the probability that the two vehicles have a minimum range smaller than 2 meters, when the velocity of the leading vehicle is set to lie in the aforementioned range.
We use a kriging model with parameters , and . We set the prediction threshold for simplicity. The zero mean of is chosen to reflect the belief that the response of a scenario with no information is far from being a critical event. We choose the value of which intuitively puts to be three standard deviations higher than the mean of a scenario that has no information (i.e., ). is selected to make the correlation between scenarios with distance (believed to represent initial conditions with different AV behaviors) to be small enough (less than ).
We use 20 initial design points to build the model and its value of is shown in Fig. 3. The blue dots represent a sample distribution of the variable . Red circles are existing design points with return 0 and red crosses are existing design points with return 1. We consider four arbitrarily picked new design points (which we call points A, B, C and D) shown by the red stars, whose coordinates are shown in the first row of Table 1. The local prediction impacts and the information gains depicted by the objective in (3) of these design points are shown in the second and third rows respectively.
We see that points B and C have smaller local prediction impacts than points A and D, which can be attributed to the vicinity of their positions to those of the historical data that subsequently reduces the uncertainty of . This translates to a smaller variance of and hence a smaller local prediction impacts. Relatedly, these points also have a low information gain measured by (3). However, point D, even though far away from the positions of the historical data, has an even lower information gain. This can be attributed to the tiny density of at this point, which makes the overall information gain low. In contrary, point A has a higher density of and consequently a higher information gain.
A  B  C  D  
Coordinate  (0.05,0.1)  (0.12,0.55)  (0.05,0.55)  (0.45,0.5) 
Local prediction impact  0.209  0.0996  0.0471  0.249 
Information gain  0.0082 
4.2 Example of the Sequential Learning Approach
To illustrate our sequential learning approach, we use a simple hypothetical problem where we define the probability of interest as , with two random objects each following a standard Gaussian distribution (in the lane change scenario described before would correspond to the initial conditions such as frontal vehicle velocity, with a correspondingly more sophisticated function). The true probability is .
We use a kriging model with parameters , , and . Here the parameters are arbitrarily chosen, as we assume that no prior information is available. We start with 20 initial design points. In the SA scheme, we use as the step size parameter with , and we terminate the scheme after 50 iterations, at each new design point. The gradient estimator is averaged from 1,000 samples. For illustration, we use 10,000 samples to estimate the target probability under the kriging model to assess its error relative to the truth.
Fig. 5 shows that as we collect more observations to update the kriging model, the probability estimate gradually converges to the true probability. At each step, we start the SA from a randomly generated point using a standard Gaussian distribution. To illustrate the benefit from the optimization step, Fig. 5 compares our approach with random sampling at each step, where this random sample is precisely the starting point of our SA scheme. We observe that our sequential learning approach converges to the true probability quickly in the first few steps, but the convergence slows down as the learning progresses. This may be caused by a saturation in terms of the highest accuracy affordable by the SA’s noises, as well as the heuristic nature of our approach. Finally, Fig. 7 and 7 show the probability estimates when we use SA with starting points fixed at and respectively, at each learning step. We see that the probability estimates move towards the truth regardless of the starting points, giving a sign that the SA algorithm is at least working. Moreover, starting from appears to achieve faster convergence, which can be reasoned by the fact that is closer to the boundary of the event that facilitates the involved learning process. We note that the lines in the figures appear a bit fluctuant as they are illustrated in the scale of the probability estimates, which is small relative to the simulation replication size we use to generate them (i.e., ). Further investigation is clearly needed, but the above observations aim to show some preliminary insights on the behavior of our approach and confirm its potential.
5 Conclusion
This paper presents a sequential learning approach based on using kriging models to approximate AV behaviors, to reduce ontrack experimentation for AV safety evaluation. The approach relies on a heuristic simulationbased gradient descent procedure to search for the best next test scenario in terms of maximizing an information criterion regarding the accuracy of conflict probability evaluation. We derive a gradient estimator and investigate the performance of our procedure. Numerical examples show that our approach sequentially improves our probability estimate, and appears to perform better than simple strategies such as random scenario sampling. Future work includes the studies of further assumptions of the kriging models in the AV evaluation context and developments of scenario search procedures that are both more efficient and theoretically sound.
Acknowledgments
We gratefully acknowledge support from the National Science Foundation under grants CMMI1542020, CMMI1523453 and CAREER CMMI1653339.
Author Biographies
ZHIYUAN HUANG is a secondyear Ph.D. student in the Department of Industrial and Operations Engineering at the University of Michigan, Ann Arbor. His research interests include simulation and stochastic optimization. His email address is zhyhuang@umich.edu.
HENRY LAM is an Assistant Professor in the Department of Industrial and Operations Engineering at the University of Michigan, Ann Arbor. His research focuses on stochastic simulation, risk analysis, and
simulation optimization. His email address is khlam@umich.edu.
Ding Zhao is a Assistant Research Scientist in the Department of Mechanical Engineering at the University of Michigan, Ann Arbor. His research focuses on Connected and Automated Vehicles (CAVs) using synthesized approaches rooted in advanced statistics, modeling, optimization, dynamic control, and big data analysis. His email address is zhaoding@umich.edu.
References
 Akamatsu, M., P. Green, and K. Bengler. 2013. “Automotive technology and human factors research: Past, present, and future”. International journal of vehicular technology 2013.
 Ankenman, B., B. L. Nelson, and J. Staum. 2010. “Stochastic kriging for simulation metamodeling”. Operations research 58 (2): 371–382.
 Aust, M. 2012. “Evaluation Process for Active Safety Functions: Addressing Key Challenges in Functional, Formative Evaluation of Advanced Driver Assistance Systems”.
 Bezzina, D., and J. Sayer. 2014. “Safety pilot model deployment: Test conductor team report”. Report No. DOT HS 812:171.
 Chiles, J.P., and P. Delfiner. 2009. Geostatistics: modeling spatial uncertainty, Volume 497. John Wiley & Sons.
 FESTAConsortium et al. 2008. “FESTA Handbook Version 2 Deliverable T6. 4 of the Field opErational teSt supporT Action”. Brussels: European Commission.
 Huang, Z., D. Zhao, and H. Lam. 2017. “Towards Affordable Ontrack Testing for Autonomous Vehicle  A Krigingbased Statistical Approach”. Working Paper.
 Huang, Z., D. Zhao, H. Lam, and D. J. LeBlanc. 2017. “Accelerated evaluation of automated vehicles using piecewise mixture models”. arXiv preprint arXiv:1701.08915.
 Huang, Z., D. Zhao, H. Lam, D. J. LeBlanc, and H. Peng. 2017. “Evaluation of Automated Vehicles in the Frontal Cutin Scenarioan Enhanced Approach using Piecewise Mixture Models”. Ann Arbor 1001:48109.
 Kleijnen, J. P. 2008. Design and analysis of simulation experiments, Volume 20. Springer.
 Kleijnen, J. P. 2009. “Kriging metamodeling in simulation: A review”. European journal of operational research 192 (3): 707–716.
 Kleijnen, J. P., and W. C. Van Beers. 2004. “Applicationdriven sequential designs for simulation experiments: Kriging metamodelling”. Journal of the Operational Research Society 55 (8): 876–883.
 Kushner, H., and G. G. Yin. 2003. Stochastic approximation and recursive algorithms and applications, Volume 35. Springer Science & Business Media.
 Lee, K. 2004. Longitudinal driver model and collision warning and avoidance algorithms based on human driving databases.
 Mcity 2017. “Mcity”. https://mcity.umich.edu/.
 NHTSA 2013. “Preliminary statement of policy concerning automated vehicles, National Highway Traffic Safety Administration”. Washington, DC:1–14.
 Peng, H., and D. Leblanc. 2012. “Evaluation of the performance and safety of automated vehicles”. White Pap. NSF Transp. CPS Work.
 Powell, W. B., and I. O. Ryzhov. 2012. Optimal learning, Volume 841. John Wiley & Sons.
 Rasmussen, C. E. 2006. “Gaussian processes for machine learning”.
 Sacks, J., W. J. Welch, T. J. Mitchell, and H. P. Wynn. 1989. “Design and analysis of computer experiments”. Statistical science:409–423.
 Singhvi, A., and K. Russell. 2016. “Inside the SelfDriving Tesla Fatal Accident”. The New York Times.
 Staum, J. 2009. “Better simulation metamodeling: The why, what, and how of stochastic kriging”. In Simulation Conference (WSC), Proceedings of the 2009 Winter, 119–133. IEEE.
 Ulsoy, A. G., H. Peng, and M. Çakmakci. 2012. Automotive control systems. Cambridge University Press.
 Wang, J., S. C. Clark, E. Liu, and P. I. Frazier. 2016. “Parallel bayesian global optimization of expensive functions”. arXiv preprint arXiv:1602.05149.
 Woodrooffe, J., D. Blower, S. Bao, S. Bogard, C. Flannagan, P. E. Green, and D. LeBlanc. 2014. “Performance characterization and safety effectiveness estimates of forward collision avoidance and mitigation systems for medium/heavy commercial vehicles”. Univ. Michigan Transp. Res. Inst., Ann Arbor, MI, USA, UMTRI201136.
 Yang, H.H., and H. Peng. 2010. “Development and evaluation of collision warning/collision avoidance algorithms using an errable driver model”. Vehicle system dynamics 48 (S1): 525–535.
 Zhao, D., H. Lam, H. Peng, S. Bao, D. J. LeBlanc, K. Nobukawa, and C. S. Pan. 2017. “Accelerated evaluation of automated vehicles safety in lanechange scenarios based on importance sampling techniques”. IEEE transactions on intelligent transportation systems 18 (3): 595–607.