Bounding rare event probabilities in computer experiments
Abstract
We are interested in bounding probabilities of rare events
in the context of computer experiments. These rare events depend on the output
of a physical model with random input variables. Since the model is only known
through an expensive black box function, standard efficient Monte Carlo methods
designed for rare events cannot be used. We then propose a strategy to deal
with this difficulty based on importance sampling methods. This proposal relies on
Kriging metamodeling and is able to achieve sharp upper confidence bounds on the rare event probabilities.
The variability due to the Kriging metamodeling step is properly taken into account.
The proposed methodology is applied to a toy example and compared to more standard Bayesian bounds.
Finally, a challenging real case study is analyzed. It
consists of finding an upper bound of the probability that the trajectory of an
airborne load will collide with the aircraft that has released it.
Keywords: computer experiments, rare events, Kriging, importance sampling, Bayesian estimates, risk assessment with fighter aircraft.
1 Introduction
Rare events are a major concern in the reliability of complex systems (Heidelberg, 1995; Shahabuddin, 1995). We focus here on rare events depending on computer experiments. A computer experiment (Welch et al., 1992; Koehler and Owen, 1996) consists of an evaluation of a black box function which describes a physical model,
(1.1) f : x ∈ 𝒳 ⟼ f(x) ∈ ℝ,
where 𝒳 is a compact subset of ℝ^d. The code which computes f is expensive since the model is complex. We assume that no more than N calls to f are possible. The inputs are measured with a lack of precision and some variables are uncontrollable. Both sources of uncertainty are modeled by a probability distribution P_X on 𝒳. Let X be the corresponding random variable. Our goal is to propose an upper bound of the probability:
p = P( f(X) > T ) = P_X(A),
where A is the subset of 𝒳 defined by A = { x ∈ 𝒳 : f(x) > T } and T is a given threshold.
In a crude Monte Carlo scheme, the following estimator of p is obtained:
(1.2) p̂_n = (1/n) Σ_{i=1}^{n} Y_i,
where Y_i is defined by
(1.3) Y_i = 1_{f(X_i) > T},
and (X_1, …, X_n) is an i.i.d. sample of random variables with the same distribution as X. Its expectation and its variance are:
E[p̂_n] = p and Var(p̂_n) = p(1 − p)/n.
Since n p̂_n follows a binomial distribution with parameters n and p, an exact confidence upper bound on p is available.
Indeed, let B be a random variable which follows a binomial distribution with parameters n and p. For any real number α ∈ (0, 1), we can easily show that the upper confidence bound on p:
p⁺(B, α) = sup { q ∈ [0, 1] : P( Bin(n, q) ≤ B ) ≥ α },
is such that:
(1.4) P( p ≤ p⁺(B, α) ) ≥ 1 − α.
This upper bound is not in closed form but is easily computable.
In the case where p̂_n = 0, which happens with probability (1 − p)^n, the confidence interval is [0, 1 − α^{1/n}].
As an example, if the realization of p̂_n is equal to 0, an upper confidence bound of 10⁻⁵ at level 90% can be warranted only if more than 230,000 calls to f were performed.
When the purpose is to assess the reliability of a system under the constraint of a limited number of calls to f, there is a need for a sharper upper bound on p.
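To make these numbers concrete, here is a small, self-contained Python sketch of the exact binomial (Clopper-Pearson type) upper confidence bound just described; the function names and the illustrative values (n = 1000, α = 0.10, target bound 10⁻⁵) are ours, not taken from the paper.

```python
import math

def binom_cdf(k, n, p):
    """P(Bin(n, p) <= k), by exact summation (fine for small k)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def cp_upper(k, n, alpha):
    """Upper confidence bound at level 1 - alpha: the largest q such that
    P(Bin(n, q) <= k) >= alpha, found by bisection (the CDF decreases in q)."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi

n, alpha = 1000, 0.10
# zero-count case: the generic bound matches the closed form 1 - alpha**(1/n)
print(cp_upper(0, n, alpha), 1 - alpha ** (1 / n))

# number of calls needed so that the zero-count bound drops below 1e-5
# at level 90%
n_needed = math.ceil(math.log(alpha) / math.log(1 - 1e-5))
print(n_needed)  # about 230,000
```

With these illustrative values, the last line reproduces the order of magnitude quoted in the text: with no observed exceedance, guaranteeing an upper bound of 10⁻⁵ requires roughly 230,000 calls.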
Several ways to improve the precision of estimation and bounding have been proposed in the literature.
Since Monte Carlo estimation works better for frequent events,
the first idea is to change the crude scheme in such a manner that the event becomes less rare.
This is what importance sampling and splitting schemes try to achieve.
For example, L’Ecuyer et al. (2007) showed that randomized quasi-Monte Carlo can be used jointly
with splitting and/or importance sampling.
By analysing a rare event as a cascade of less rare intermediate events,
Del Moral and Garnier (2005) developed a genealogical particle system approach to explore
the input space 𝒳.
Cérou and Guyader (2007a, b) proposed an adaptive multilevel splitting also based
on particle systems.
An adaptive directional sampling method is presented by Munoz Zuniga et al. (2010) to accelerate
the Monte Carlo simulation method.
However, these methods can still require too many calls to f, and
the importance distribution is hard to choose for an importance sampling method.
A general approach in computer experiments is to make use of a metamodel,
that is, a fast-to-evaluate function which approximates f.
It has to be built on the basis of data
which are evaluations of f at the points of a well-chosen design D. The bet is that
these evaluations will allow the building of more accurate bounds on the probability
of the target event.
Kriging is such a metamodeling tool: see Santner et al. (2003) and, more recently,
Li and Sudjianto (2005); Joseph (2006); Bingham et al. (2006).
The function f is seen as a realization of a Gaussian process which plays the role of a Bayesian prior.
The related posterior distribution is computed conditionally to the data.
It is still a Gaussian process, whose mean can be used as a prediction of f everywhere on 𝒳
and whose variance gives a pointwise measure of the confidence one can have in the prediction.
By using this mean and this variance, Oakley (2004) has developed a sequential method
to estimate quantiles and Vazquez and Bect (2009) a sequential method to estimate the probability
of a rare event.
Cannamela et al. (2008) have proposed some sampling strategies based only
on a reduced model which is a coarse approximation of
(no information about the accuracy of prediction is given), to estimate quantiles.
In this paper, we also use Kriging metamodeling. Indeed, we assume that f is a realization of a Gaussian process F. This Gaussian process is assumed independent of X since it models the uncertainty in our knowledge of f, while X models a physical uncertainty on the input variables. As a consequence, p is a realization of the random variable:
Π = P( F(X) > T | F ).
A natural approach consists of focusing on the posterior distribution of Π,
which depends on the posterior distribution of F given its computed evaluations.
A Bayesian estimator of p can be computed, and a credible bound is reachable by simulating realizations of the conditional Gaussian process
to obtain realizations of Π.
We propose another approach, which makes use of an importance sampling method whose importance distribution
is based on the metamodel.
The paper is organized as follows: Section 2 describes the posterior distribution of the Gaussian process and how to obtain an estimator and a credible bound of p. Section 3 presents our importance sampling method and the stochastic upper bound which it provides with high probability. Finally, in Section 4, the two methods are compared on a toy example. Different designs of numerical experiments (sequential and non-sequential) are performed for this comparison. A solution to a real aeronautical case study, concerning the risk that the trajectory of an airborne load collides with the aircraft that has released it, is also proposed.
2 Standard Bayesian bounds
The first step for Kriging metamodeling is to choose a design of numerical experiments D = {x⁽¹⁾, …, x⁽ⁿ⁾} (one can see Morris and Mitchell (1995); Koehler and Owen (1996) and more recently Fang et al. (2006); Mease and Bingham (2006); Dette and Pepelyshev (2010)). Let y = ( f(x⁽¹⁾), …, f(x⁽ⁿ⁾) ) be the evaluations of f on D.
Let us start from a statistical model consisting of Gaussian processes F, the expressions of which are given by: for x ∈ 𝒳,
(2.1) F(x) = Σ_{j=1}^{q} β_j h_j(x) + Z(x),
where

h_1, …, h_q are regression functions and β = (β_1, …, β_q)ᵀ is a vector of parameters,

Z is a centered Gaussian process with covariance
Cov( Z(x), Z(x′) ) = σ² R(x, x′; θ),
where R is a correlation function depending on some parameters θ (for details about kernels, see Koehler and Owen, 1996).
The maximum likelihood estimates (β̂, σ̂², θ̂) of (β, σ², θ) are computed on the basis of the observations. Then, the Bayesian prior on f is chosen to be F, and the process F is assumed independent of X. We denote by F_n the process F conditionally to F(D) = y, in short F_n = [F | y].
The process F_n is still a Gaussian process (see Santner et al., 2003) with

mean:
(2.2) m_n(x) = h(x)ᵀ β̂ + r(x)ᵀ R⁻¹ ( y − H β̂ ),

covariance:
(2.3) c_n(x, x′) = σ̂² ( R(x, x′; θ̂) − r(x)ᵀ R⁻¹ r(x′) + u(x)ᵀ (Hᵀ R⁻¹ H)⁻¹ u(x′) ),
where h(x) = ( h_1(x), …, h_q(x) )ᵀ, H = ( h_j(x⁽ⁱ⁾) )_{i,j}, r(x) = ( R(x, x⁽ⁱ⁾; θ̂) )_i, R = ( R(x⁽ⁱ⁾, x⁽ʲ⁾; θ̂) )_{i,j} and u(x) = h(x) − Hᵀ R⁻¹ r(x).
In this approach the conditioning to the data regards the parameters as fixed although they are estimated.
The Bayesian prior distribution on f leads to a Bayesian prior distribution on p. Our goal is to use the distribution of the posterior process F_n, conditionally to the observation of y, to learn about the posterior distribution of p. It is straightforward to show that Π given y, denoted Π_n, has the same distribution as P( F_n(X) > T | F_n ). The mean and the variance of Π_n are then given by:
(2.4) E[Π_n] = ∫_𝒳 Φ( (m_n(x) − T) / s_n(x) ) dP_X(x),
where Φ is the cumulative distribution function of a standard Gaussian random variable and s_n²(x) = c_n(x, x),
(2.5) Var(Π_n) = ∫_𝒳 ∫_𝒳 P( F_n(x) > T, F_n(x′) > T ) dP_X(x) dP_X(x′) − ( E[Π_n] )²,
where the joint probability is that of a bivariate Gaussian vector with means m_n(x) and m_n(x′), variances s_n²(x) and s_n²(x′), and covariance c_n(x, x′).
A numerical Monte Carlo integration can be used to compute the posterior mean and variance since they do not require more calls to f. However, the computation time requested by a massive Monte Carlo integration, especially for the variance (2.5), can be very long, as can be seen in the examples.
The mean and the variance of Π_n can be used to obtain credible bounds. As a consequence of the Markov inequality, it holds, for any α ∈ (0, 1),
(2.6) P( Π_n ≤ E[Π_n] / α ) ≥ 1 − α.
Likewise, the Chebychev inequality gives, for any α ∈ (0, 1),
(2.7) P( Π_n ≤ E[Π_n] + sqrt( Var(Π_n) / α ) ) ≥ 1 − α.
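As a sketch, the posterior mean (2.4) and the Markov bound (2.6) can be approximated by plain Monte Carlo once the Kriging mean and standard deviation are available. In the Python fragment below, m, s, the threshold T and the level α are illustrative stand-ins, not quantities from the paper.

```python
import math
import random

def norm_cdf(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# hypothetical Kriging posterior mean and standard deviation (stand-ins)
def m(x): return math.sin(3 * x)
def s(x): return 0.2 + 0.1 * x

T, alpha = 1.2, 0.05
random.seed(0)
xs = [random.uniform(0, 1) for _ in range(100_000)]  # X ~ U(0, 1)

# (2.4): E[Pi_n] = E_X[ Phi((m(X) - T) / s(X)) ], no call to f required
e_pi = sum(norm_cdf((m(x) - T) / s(x)) for x in xs) / len(xs)

# (2.6): P(Pi_n <= E[Pi_n]/alpha) >= 1 - alpha
markov_bound = e_pi / alpha
print(e_pi, markov_bound)
```

Note that the whole computation only queries the metamodel, which is what makes these bounds affordable.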
Moreover, the quantiles of Π_n can be estimated through massive simulations of the conditional process F_n. These realizations of F_n lead to realizations of Π_n from which quantiles can be estimated. These quantiles of Π_n are exactly the upper bounds that are sought.
We adapt the algorithm proposed by Oakley (2004) to obtain realizations of Π_n. From a realization of F_n, the corresponding realization of Π_n is computed using a massive Monte Carlo integration with respect to the distribution of X. Thus, a credible bound on p is constructed: given α ∈ (0, 1), a constant q_α is found such that:
P( Π_n ≤ q_α ) ≥ 1 − α.
However, it is not possible to get an exact realization of F_n, nor to sample it jointly at all the inputs of the Monte Carlo sample (of X), since this sample is too large. Thus, the algorithm relies on a discretization of the process. The same scheme as the one followed by Oakley (2004) is used. We choose M points in 𝒳, x̃₁, …, x̃_M, at which the corresponding realizations of F_n are simulated. From a set of joint realizations of ( F_n(x̃₁), …, F_n(x̃_M) ), a realization of F_n is approximated by the mean of the process conditioned to both y and these simulated values. The variance of this conditioned process has to be very small at any point of 𝒳 for the approximation to be valid. Hence, M has to be large enough and the points x̃₁, …, x̃_M have to fill the space 𝒳. When the dimension of the input space is low, proposing such a set is quite easy. However, it can be burdensome to fill the space in high dimension, and it can lead to too large a number of needed realizations, which are impossible to simulate jointly. The discretization step is a major concern since it induces an uncontrollable error on the credible bound on p. We therefore propose, in the next section, an alternative approach based on importance sampling which avoids this discretization step.
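For intuition only, the discretized simulation scheme can be sketched in pure Python on a one-dimensional grid. To stay self-contained, the sketch simulates an unconditional centered Gaussian process with a Gaussian kernel (the actual method would use the conditional mean m_n and covariance c_n instead); the grid size, length scale, threshold and number of realizations are arbitrary illustrative choices.

```python
import math
import random

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - acc) if i == j else (A[i][j] - acc) / L[j][j]
    return L

# discretization grid on [0, 1] and Gaussian covariance (illustrative choices);
# a small jitter on the diagonal prevents ill-conditioning
M = 40
grid = [i / (M - 1) for i in range(M)]
cov = [[math.exp(-(a - b) ** 2 / (2 * 0.1 ** 2)) + (1e-8 if a == b else 0.0)
        for b in grid] for a in grid]
L = cholesky(cov)

random.seed(0)
T, R = 2.0, 500                      # threshold and number of realizations
pis = []
for _ in range(R):
    g = [random.gauss(0, 1) for _ in range(M)]
    z = [sum(L[i][k] * g[k] for k in range(i + 1)) for i in range(M)]
    # realization of Pi: mass of {x : process(x) > T} under X ~ U(0, 1),
    # approximated on the grid
    pis.append(sum(v > T for v in z) / M)

pis.sort()
q95 = pis[int(math.ceil(0.95 * R)) - 1]  # empirical 95% quantile = credible bound
print(q95)
```

The empirical quantile of the simulated realizations plays the role of the credible bound q_α; the discretization error discussed above corresponds here to evaluating the process only at the M grid points.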
3 Metamodel-based importance sampling
As was explained in Section 1, the major drawback of the crude Monte Carlo scheme is the high level of uncertainty when it is used for estimating the probability of a rare event. Importance sampling is a way to tackle this problem. The basic idea is to change the sampling distribution to make the target event more frequent. We aim at sampling according to the importance distribution:
P_X( · | X ∈ Â ),
where Â is to be designed close to A. Thanks to calls to the metamodel, such a set can be chosen as follows:
(3.1) Â = { x ∈ 𝒳 : m_n(x) + κ s_n(x) > T },
where κ is fixed such that “A ⊂ Â with a good confidence level”. In other words, if x is such that f(x) > T, we want x to be in Â. We recall that the posterior mean m_n is an approximation of f, and the term κ s_n(x) has been added to take into account the uncertainty of this approximation.
A set of m points, (W_1, …, W_m), is drawn as an i.i.d. sample following the importance distribution. The corresponding importance sampling estimator of p is
(3.2) p̃_m = P_X(Â) · (1/m) Σ_{i=1}^{m} 1_{f(W_i) > T}.
The probability P_X(Â) is computable by a Monte Carlo integration since
it does not depend on f; yet, m more calls to f are necessary to compute p̃_m.
This estimator is unbiased only provided that A ⊂ Â.
Nevertheless, it is always an unbiased estimator of
P_X( A ∩ Â ).
Since B_m = Σ_{i=1}^{m} 1_{f(W_i) > T} follows a binomial distribution
with parameters m and P_X( A | Â ), for any α ∈ (0, 1), the following confidence upper bound holds:
(3.3) P( P_X( A ∩ Â ) ≤ P_X(Â) · p⁺(B_m, α) ) ≥ 1 − α,
by using the bound (1.4). This is an upper bound on p only if the estimator (3.2) is unbiased, i.e. only if A ⊂ Â. As is noticed in the decomposition:
p = P_X( A ∩ Â ) + P_X( A ∩ Âᶜ ),
the second term on the right-hand side, which is the opposite of the bias, has to be controlled. That is why the random variable
Πᶜ = P( F(X) > T, X ∉ Â | F ),
a realization of which is P_X( A ∩ Âᶜ ), is considered.
Similarly to the previous decomposition, it holds that
(3.4) Π = P( F(X) > T, X ∈ Â | F ) + Πᶜ.
A bound on the first term comes from (3.3).
Proposition 3.1.
For α ∈ (0, 1), it holds that
(3.5) P( P( F(X) > T, X ∈ Â | F ) ≤ P_X(Â) · p⁺(B_m, α) ) ≥ 1 − α,
where B_m stands for Σ_{i=1}^{m} 1_{f(W_i) > T}.
Proof
Let f̃ be any realization of F.
As in (3.3), we have
P( P( f̃(X) > T, X ∈ Â ) ≤ P_X(Â) · p⁺(B_m, α) ) ≥ 1 − α.
Thus, since this result holds for any realization of F, the proposition follows.
The next proposition states an upper bound for the second term in (3.4).
Proposition 3.2.
For α ∈ (0, 1), it holds that
P( Πᶜ ≤ E[Πᶜ | y] / α ) ≥ 1 − α,
where E[Πᶜ | y] = ∫_{Âᶜ} Φ( (m_n(x) − T) / s_n(x) ) dP_X(x).
Proof
The mean of Πᶜ given y can be computed
in the same fashion as the mean of Π_n.
It gives
E[Πᶜ | y] = ∫_{Âᶜ} Φ( (m_n(x) − T) / s_n(x) ) dP_X(x).
Then, the Markov inequality is applied, which completes the proof.
Finally, by gathering the results of Proposition 3.1 and Proposition 3.2, a
stochastic upper bound on Π is found.
Proposition 3.3.
For α₁, α₂ ∈ (0, 1) such that α₁ + α₂ < 1, it holds that
(3.6) P( Π ≤ P_X(Â) · p⁺(B_m, α₁) + E[Πᶜ | y] / α₂ ) ≥ 1 − α₁ − α₂,
where B_m and E[Πᶜ | y] have been defined above.
The proof is obvious.
If Â is chosen as proposed in (3.1), the bound is:
P_X(Â) · p⁺(B_m, α₁) + (1/α₂) ∫_{Âᶜ} Φ( (m_n(x) − T) / s_n(x) ) dP_X(x).
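The MBIS pipeline can be sketched end-to-end in Python. In this toy sketch, f is a cheap analytical stand-in (so that the example actually runs; in practice f is the expensive code), m and s are hypothetical Kriging posterior mean and standard deviation, and T, κ and the sample sizes are illustrative.

```python
import math
import random

random.seed(1)
f = lambda x: math.sin(3 * x) + 0.1 * x   # cheap stand-in for the expensive code
m = lambda x: math.sin(3 * x)             # hypothetical Kriging posterior mean
s = lambda x: 0.15                        # hypothetical Kriging posterior sd
T, kappa = 1.05, 2.0

# enlarged set (3.1): Ahat = {x : m(x) + kappa*s(x) > T}
in_Ahat = lambda x: m(x) + kappa * s(x) > T

# P_X(Ahat) by Monte Carlo -- uses the metamodel only, no call to f
big = [random.random() for _ in range(200_000)]     # X ~ U(0, 1)
p_Ahat = sum(in_Ahat(x) for x in big) / len(big)

# i.i.d. sample from P_X restricted to Ahat, by rejection sampling
W, m_is = [], 2_000
while len(W) < m_is:
    x = random.random()
    if in_Ahat(x):
        W.append(x)

# importance sampling estimator (3.2); the m_is calls to f happen here
B = sum(f(x) > T for x in W)
p_tilde = p_Ahat * B / m_is
print(p_tilde)
```

In this toy construction A ⊂ Â holds, so p̃ is unbiased; the confidence bound (3.3) would then multiply P_X(Â) by the binomial upper bound p⁺(B, α), and Proposition 3.2 would control the bias term when A ⊂ Â cannot be guaranteed.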
4 Numerical experiments
4.1 A toy example
We study on a toy example, described below, the two bounding strategies: the Bayesian strategy (credible bound obtained by simulating realizations of Π_n
as described at the end of Section 2)
and the MBIS (metamodel-based importance sampling) strategy (stochastic bound given by Proposition 3.3).
Since the credible bounds in the Bayesian strategy are directly derived from the metamodeling, the Bayesian strategy is the reference and, if the dimension of 𝒳 is low,
it should perform well. Our aim is to test whether the MBIS strategy can achieve such good bounds. Since the choice of the
design should directly impact the set (3.1), and hence the quality of the importance sampling,
different kinds of designs are considered to compare the strategies.
The function is assumed to describe a physical model:
The input vector is supposed to have a uniform distribution on .
The threshold is set to which corresponds to the probability
. This probability was computed thanks to a massive
Monte Carlo integration.
It is assumed that no more than N calls to f are allowed.
For the Bayesian strategy, all of the available calls to f are used to build the metamodel, while for the MBIS
strategy (using the notations of Section 3) the available calls are split between the n metamodel evaluations and the m importance sampling evaluations.
The two strategies are compared with different design sampling methods.
Three design sampling methods are used: an LHS-maximin method (Morris and Mitchell, 1995), which is non-sequential
and space-filling, and two sequential methods. The sequential sampling methods are based on a first LHS-maximin design including
part of the points, the remaining points being added sequentially according to the
tIMSE (targeted Integrated Mean Square Error) criterion proposed by Picheny et al. (2010) for
one method, and according to the J-SUR (Stepwise Uncertainty Reduction) criterion
proposed by Bect et al. (2011) for the other method. These sequential methods are based on
a trade-off between reducing the uncertainty in the knowledge of f and exploring the space around the critical set A.
All Kriging metamodels are built with an intercept as the regression function, and a Gaussian correlation function is chosen
as the correlation function of the Gaussian process in the model given by equation (2.1).
For the Bayesian strategy,
a thousand realizations of Π_n are computed, from which the credible bound is obtained.
The discretization is done on a grid and, to prevent ill-conditioned covariance matrices,
if a point of the grid is too close to a point of the design, it is replaced with a point of 𝒳 far enough from both the points of the design and the other points of the grid.
The numerical integration to compute a realization of Π_n from a realization of the process is done with a Monte Carlo sample.
In the MBIS strategy, the probability P_X(Â)
(and also the bound on the bias given in Proposition 3.2) was computed by a Monte Carlo integration, and the value of κ in (3.1) has been fixed.
There are sources of variability in the estimators and the bounds due to the design sampling methods. Indeed, the designs are computed to be maximin by using a finite number of iterations of a simulated annealing algorithm. Moreover, there exist symmetries within the class of maximin designs. In the sequential methods, the point to be added is sought through a stochastic algorithm. Concerning the importance sampling strategy, the sampling which gives p̃_m induces variability. In order to test the sensitivity to these sources of variability, each of the two strategies, for each of the three design sampling methods, is repeated one hundred times.
Table 1: Minimum, first quartile, mean, median, third quartile and maximum of the credible bounds obtained by the Bayesian strategy, for the Full LHS-maximin, J-SUR and tIMSE designs.
Table 2: Minimum, first quartile, mean, median, third quartile and maximum of the stochastic bounds obtained by the MBIS strategy, for the Full LHS-maximin, J-SUR and tIMSE designs.
Figure 2 and Table 1 display the results for the credible bounds obtained by the Bayesian strategy, and Figure 3 and Table 2 display the results for the stochastic bounds obtained by the MBIS strategy. The results are provided according to the design sampling method. For the MBIS stochastic bounds, α₁ and α₂ have been set using the notations of Proposition 3.3. If a crude Monte Carlo scheme as presented in Section 1 were used here with so few calls, the estimator would be equal to 0 with high probability and, in this case, the upper confidence bound would be the one given in Section 1. The two strategies bring much sharper bounds on the probability of the rare event. The sequential design sampling method with the J-SUR criterion leads to the best bounds whatever the strategy. The MBIS strategy manages to reach the same sharp bounds as the ones provided by the Bayesian strategy in the case where the design is obtained thanks to the J-SUR criterion. The Bayesian strategy is less sensitive to the choice of the design sampling method. However, the Bayesian method suffers from the fact that the quantiles are estimated thanks to conditional simulations of the Gaussian process, which rely on a discretization of the space. Hence, it leads to an approximation in the results and to stability concerns, as is noticed in the results (see the maximum credible bound obtained with the tIMSE sampling criterion). Furthermore, only a limited number of repetitions is achievable since the simulations are quite burdensome.
The bound provided by the Markov inequality (2.6) is not interesting in this example since
it cannot be less than 0.01. The Chebychev inequality has not been used since we were not able to compute the posterior variance in a reasonable time.
As the strategies depend on the Kriging model hypothesis (2.1), a leave-one-out cross-validation as proposed by Jones et al. (1998) can be performed to check whether this hypothesis is sensible. It consists of building metamodels, with posterior means and standard deviations denoted respectively by m_{−i} and s_{−i}, from the designs
D_{−i} = D \ {x⁽ⁱ⁾}, where i = 1, …, n.
Then, the values
(4.1) ( f(x⁽ⁱ⁾) − m_{−i}(x⁽ⁱ⁾) ) / s_{−i}(x⁽ⁱ⁾)
are computed. If roughly 99.7% of them lie in the interval [−3, 3], the Kriging hypothesis is not rejected. In our toy example, in all of the tests which were done, all these values are in [−3, 3].
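The leave-one-out check (4.1) is easy to script. The sketch below uses a tiny zero-mean (simple) Kriging predictor with a fixed Gaussian kernel as a stand-in for the full model of equation (2.1); the test function, length scale and design size are illustrative.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                t = M[r][c] / M[c][c]
                for j in range(c, n + 1):
                    M[r][j] -= t * M[c][j]
    return [M[i][n] / M[i][i] for i in range(n)]

def kern(a, b, ell=0.3):
    """Gaussian correlation function."""
    return math.exp(-(a - b) ** 2 / (2 * ell ** 2))

def predict(xs, ys, xstar):
    """Simple (zero-mean) Kriging mean and sd at xstar; jitter on the
    diagonal guards against ill-conditioning."""
    n = len(xs)
    K = [[kern(xs[i], xs[j]) + (1e-8 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    w = solve(K, [kern(x, xstar) for x in xs])      # w = K^{-1} k*
    mean = sum(w[i] * ys[i] for i in range(n))
    var = max(1.0 - sum(w[i] * kern(xs[i], xstar) for i in range(n)), 1e-12)
    return mean, math.sqrt(var)

random.seed(2)
xs = sorted(random.random() for _ in range(10))
ys = [math.sin(3 * x) for x in xs]                  # stand-in for f

# leave-one-out standardized residuals (4.1)
res = []
for i in range(len(xs)):
    xi, yi = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
    m_i, s_i = predict(xi, yi, xs[i])
    res.append((ys[i] - m_i) / s_i)

frac_in = sum(-3 <= r <= 3 for r in res) / len(res)
print(frac_in)
```

A value of frac_in well below 1 would flag an underestimated prediction uncertainty, which is exactly the diagnostic used in the paper.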
4.2 A real case study: release envelope clearance
4.2.1 Context
When releasing an airborne load, a critical issue is the risk that its trajectory could collide with the aircraft.
The behavior of such a load after release depends on many variables. Some are under the control of the crew: Mach number,
altitude, load factor, etc. We call them controlled variables and denote their variation domain by 𝒳_c.
The others are uncontrolled variables: let 𝒳_u be the set of their possible values.
The release envelope clearance problem consists of exploring the set 𝒳_c to find a
subset where the release is safe, whatever the uncontrolled variables are.
To investigate this problem, we can use a simulator which computes the trajectory of the carriage when the
values of all the variables are given. Moreover, for x_c ∈ 𝒳_c and x_u ∈ 𝒳_u, besides the trajectory,
the program delivers a danger score g(x_c, x_u) to be interpreted
as an “algebraic distance”: a negative value characterizes a collision trajectory.
To assess the safety of release at a given point of 𝒳_c, we suppose that the values of the uncontrolled
variables are realizations of a random variable X_u that can be simulated.
Therefore, for a given value x_c ∈ 𝒳_c, the collision risk is the probability
p(x_c) = P( g(x_c, X_u) < 0 ).
We do not aim at estimating this risk accurately.
We would rather classify the points into three categories: according to the position of the risk p(x_c) with
respect to two markers p₁ < p₂, the point x_c is said to be

totally safe if p(x_c) ≤ p₁,

relatively safe if p₁ < p(x_c) ≤ p₂,

unsafe if p(x_c) > p₂.
In this example, there are several controlled and several uncontrolled variables. From a budgetary point of view, experts consider that a set of representative points of 𝒳_c is enough to cover the domain consistently. On the other hand, the computation of the corresponding trajectories takes about 4 days, which is considered reasonable. On the basis of these indications, a maximum number of calls to the simulator is available per point.
4.2.2 Bounding strategy
As the dimension of the set of uncontrolled variables is high, the credible bounds obtained with the Bayesian strategy
are impossible to get. The stochastic bounds provided by the MBIS strategy are still available. Unfortunately,
a sequential sampling method for the design of experiments is not achievable since the simulator is too expensive if only one point is evaluated per call.
Indeed, a fixed part of the cost of a call does not depend on the number of points for which the code is run.
Although the MBIS strategy with an LHS-maximin sampling method is not optimal, it is still efficient.
We propose the following two-step bounding strategy, which can be applied to each point of the set of representative points (in the set 𝒳_c). Each step uses half of the calls budget. Let x_c be the current point of interest, which we suppose fixed. For any x_u ∈ 𝒳_u, f(x_u) = g(x_c, x_u) is set. It then corresponds to the notation introduced in the first part of the paper.

At the first stage, a Gaussian process is built, as explained in Section 2, on the basis of evaluations of f on a first design. We know that f is a realization of the random variable F, and the mean E[Π_n] of the induced collision probability
can be computed accurately.
As stated by (2.6), applying the Markov inequality gives, for any α ∈ (0, 1), P( Π_n ≤ E[Π_n]/α ) ≥ 1 − α. According to the value of E[Π_n], we then take the following decisions:

if the Markov bound (2.6) is below the lower marker, we qualify the current point x_c as totally safe,

if E[Π_n] already exceeds the upper marker, we conservatively classify x_c as unsafe,

otherwise, we use a second-stage procedure to refine the risk assessment.


A sample of one million realizations of X is drawn, from which we tune κ in such a way that a prescribed proportion of these million elements of 𝒳 lie in Â. The resulting points in Â form an i.i.d. sample from the importance distribution,
P_X( · | X ∈ Â ).
By using the remaining calls to the simulator, B_m is computed. From Proposition 3.3, with α₂ set, we obtain a bound
which is a decreasing function of α₁.
For a suitably chosen α₁, Proposition 3.3 then provides a confidence upper bound on the collision probability at level 1 − α₁ − α₂.
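The tuning of κ on a large sample, as described above, amounts to taking an empirical quantile. In the Python sketch below, m, s, T and the target mass of Â are illustrative stand-ins; the point is only the quantile trick.

```python
import math
import random

random.seed(3)
m = lambda x: math.sin(3 * x)   # hypothetical Kriging posterior mean
s = lambda x: 0.15              # hypothetical Kriging posterior sd
T = 1.05
target = 0.30                   # desired P_X-mass of the enlarged set Ahat

# x belongs to Ahat = {m + kappa*s > T}  iff  kappa > (T - m(x)) / s(x),
# so the smallest kappa giving Ahat mass `target` is the
# `target`-quantile of t(x) = (T - m(x)) / s(x) over a large sample of X
sample = [random.random() for _ in range(1_000_000)]
t = sorted((T - m(x)) / s(x) for x in sample)
kappa = t[int(target * len(t))]

frac = sum((T - m(x)) / s(x) < kappa for x in sample) / len(sample)
print(kappa, frac)  # frac matches `target` by construction
```

This only queries the metamodel, so the tuning of κ costs no call to the simulator.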
4.2.3 Experiments
Three points of 𝒳_c have been studied. Of these cases, the first one is known to be a null-risk point, while the third one is very unsafe and the second one is in-between. For benchmarking purposes, besides the simulator-call budget required for the estimation process described in Section 4.2.2, a sample of benchmark realizations of the score has been computed for each of the three examples. For each case, we began by estimating a Gaussian process on the basis of values computed on the points of a design obtained by LHS-maximin sampling. Figures 4, 5 and 6 show the predictive performance of the processes when applied to the benchmark points. These points, which appear in red, are sorted according to their process mean values, while the blue curves mark the predicted ±3 standard deviation positions around the means. As appears rather clearly, the dispersion of the real values is underestimated by the model: they overflow the blue zone with a frequency higher than expected. The worst case is the first one, for which large deviations appear for benchmark points with low values of the mean.
In order to obtain bounds from (2.6), we then computed using (2.4):

In the first case, the massive Monte Carlo procedure leads to a numerically null evaluation of E[Π_n] and, as a consequence, to the classification of the related point as totally safe.

In the second example, E[Π_n] being evaluated at an intermediate value, we need to proceed to the second step of the bounding strategy to refine the collision probability estimation, which yields the confidence upper bound. The benchmark data show no collision case, from which a benchmark confidence upper bound follows for comparison.

In case 3, the obtained bound is consistent with the confidence interval obtained on the benchmark data.
5 Discussion
In this paper, we have especially focused on bounding the probability of a rare event. From our point of view, it seems much more reliable to assert that the probability of a feared event (failure of a system, natural disaster, etc.) does not exceed a given level with high probability than to merely estimate the probability of this event. Using Kriging metamodels to cope with the expensive black box model induces a random interpretation of the probability to be estimated. Two strategies were studied in that context, including our MBIS strategy. On a toy example, the efficiency of the two strategies was shown, and it was highlighted that a sequential design sampling, when possible, is preferable. Concerning the importance sampling strategy, further investigations could address an optimal splitting of the calls to the code between the metamodel construction and the importance sampling. Another concern is the tuning of κ used to construct the set Â.
We have dealt with a cross-validation method to assess the Kriging hypothesis. However, in the case where the cross-validation leads one to reconsider this hypothesis, a solution is to extend the confidence interval on the prediction by tuning by hand the parameter σ² in equation (2.1). In Bayesian terms, this amounts to using a less informative prior distribution on f.
We have not managed to compute the posterior variance (2.5) by using a massive Monte Carlo integration in our examples since it is very small. However, other rare event methods can be investigated for this computation, since the variance no longer depends on f.
References
 Bect et al. (2011) Bect, J., Ginsbourger, D., Li, L., Picheny, V., and Vazquez, E. (2011). Sequential design of computer experiments for the estimation of a probability of failure. Statistics and Computing. to appear.
 Bingham et al. (2006) Bingham, D., Hengartner, N., Higdon, D., and Kenny, Q. Y. (2006). Variable Selection for Gaussian Process Models in Computer Experiments. Technometrics, 48(4):478–490.
 Cannamela et al. (2008) Cannamela, C., Garnier, J., and Iooss, B. (2008). Controlled stratification for quantile estimation. The Annals of Applied Statistics, 2(4):1554–1580.
 Cérou and Guyader (2007a) Cérou, F. and Guyader, A. (2007a). Adaptive multilevel splitting for rare event analysis. Stochastic Analysis and Applications, 25(2):417–443.
 Cérou and Guyader (2007b) Cérou, F. and Guyader, A. (2007b). Adaptive particle techniques and rare event estimation. In Conference Oxford sur les méthodes de Monte Carlo séquentielles, volume 19 of ESAIM Proc., pages 65–72. EDP Sci., Les Ulis.
 Del Moral and Garnier (2005) Del Moral, P. and Garnier, J. (2005). Genealogical particle analysis of rare events. The Annals of Applied Probability, 15(4):2496–2534.
 Dette and Pepelyshev (2010) Dette, H. and Pepelyshev, A. (2010). Generalized Latin Hypercube Design for Computer Experiments. Technometrics, 52(4):421–429.
Fang et al. (2006) Fang, K.-T., Li, R., and Sudjianto, A. (2006). Design and Modeling for Computer Experiments. Computer Science and Data Analysis. Chapman & Hall/CRC.
 Heidelberg (1995) Heidelberg, P. (1995). Fast simulation of rare events in queuing and reliability models. ACM Transactions on Modeling and Computer Simulation, 5:43–85.
Jones et al. (1998) Jones, D. R., Schonlau, M., and Welch, W. J. (1998). Efficient Global Optimization of Expensive Black-Box Functions. Journal of Global Optimization, 13(4):455–492.
 Joseph (2006) Joseph, V. R. (2006). Limit Kriging. Technometrics, 48(4):458–466.
 Koehler and Owen (1996) Koehler, J. and Owen, A. (1996). Computer experiments. In Design and analysis of experiments, volume 13 of Handbook of Statistics, pages 261–308. North Holland, Amsterdam.
L’Ecuyer et al. (2007) L’Ecuyer, P., Demers, V., and Tuffin, B. (2007). Rare events, splitting, and quasi-Monte Carlo. ACM Trans. Model. Comput. Simul., 17(2).
 Li and Sudjianto (2005) Li, R. and Sudjianto, A. (2005). Analysis of Computer Experiments using Penalized Likelihood in Gaussian Kriging Models. Technometrics, 47(2):111–120.
 Mease and Bingham (2006) Mease, D. and Bingham, D. (2006). Latin Hyperrectangle Sampling for Computer Experiments. Technometrics, 48(4):467–477.
 Morris and Mitchell (1995) Morris, M. and Mitchell, T. (1995). Exploratory designs for computer experiments. Journal of Statistical Planning and Inference, 43:381–402.
Munoz Zuniga et al. (2010) Munoz Zuniga, M., Garnier, J., Remy, E., and de Rocquigny, E. (2010). Adaptive directional stratification for controlled estimation of the probability of a rare event. Technical report.
 Oakley (2004) Oakley, J. (2004). Estimating percentiles of uncertain computer code outputs. Applied Statistics, 53:83–93.
 Picheny et al. (2010) Picheny, V., Ginsbourger, D., Roustant, O., and Haftka, R. (2010). Adaptive designs of experiments for accurate approximation of a target region. Journal of Mechanical Design, 132(7).
Santner et al. (2003) Santner, T., Williams, B., and Notz, W. (2003). The Design and Analysis of Computer Experiments. Springer-Verlag.
 Shahabuddin (1995) Shahabuddin, P. (1995). Rare event simulation in stochastic models. In WSC’95: Proceedings of the 27th conference on Winter simulation.
Vazquez and Bect (2009) Vazquez, E. and Bect, J. (2009). A sequential Bayesian algorithm to estimate a probability of failure. In 15th IFAC Symposium on System Identification (SYSID 2009).
 Welch et al. (1992) Welch, W. J., Buck, R. J., Sack, J., Wynn, H. P., Mitchell, T. J., and Morris, M. D. (1992). Screening, Predicting, and Computer Experiments. Technometrics, 34(1):15–25.