[

# [

[
###### Abstract

The problem of estimating the probability is considered when represents a multivariate stochastic input of a monotonic function . First, a heuristic method to bound , originally proposed by de Rocquigny (2009), is formally described, involving a specialized design of numerical experiments. Then a statistical estimation of is considered based on a sequential stochastic exploration of the input space. A maximum likelihood estimator of based on successive dependent Bernoulli data is defined and its theoretical convergence properties are studied. Under intuitive or mild conditions, the estimation is faster and more robust than the traditional Monte Carlo approach, therefore adapted to time-consuming computer codes . The main result of the paper is related to the variance of the estimator. It appears as a new baseline measure of efficiency under monotonicity constraints, which could play a similar role to the usual Monte Carlo estimator variance in unconstrained frameworks. Furthermore the bias of the estimator is shown to be corrigible via bootstrap heuristics. The behavior of the method is illustrated by numerical tests led on a class of toy examples and a more realistic hydraulic case-study.

On considère l’estimation de la probabilité est un vecteur aléatoire et une fonction monotone. Premièrement, on rappelle et formalise une méthode, proposée par de Rocquigny (2009), permettant d’encadrer par des bornes déterministes en fonction d’un plan d’expérience séquentiel. Le second et principal apport de l’article est la définition et l’étude d’un estimateur statistique de tirant parti des bornes. Construit à partir de tirages uniformes successifs, cet estimateur présente sous de faibles conditions théoriques une variance asymptotique plus faible et une meilleure robustesse que l’estimateur classique de Monte Carlo, ce qui rend la méthode adaptée à l’emploi de codes informatiques lourds en temps de calcul. Des expérimentations numériques sont menées sur des exemples-jouets et un cas d’étude hydraulique plus réaliste. Une heuristique de boostrap, reposant sur un réplicat de l’hypersurface par des réseaux de neurones, est proposée et testée avec succès pour ôter le biais non-asymptotique de l’estimateur.

Accelerated Monte Carlo under monotonicity]Accelerated Monte Carlo estimation of exceedance probabilities under monotonicity constraints

Nicolas Bousquet]Nicolas Bousquet

[

by

EDF Research & Development

Dpt. of Industrial Risk Management

6 quai Watier, 78401 Chatou, France

nicolas.bousquet@edf.fr

In many technical areas, the exceedance of some unidimensional variable over a certain critical value may define an event of probability which has to be carefully estimated. Assumed to be stricly positive, can be defined by

 p = P(g(X)≤0) = ∫\mathbbmU\mathbbm1{g(x)≤0}f(x) dx

with a random vector of uncertain input parameters with probability density function (pdf) , taking its values in a dimensional space , and a deterministic mapping from to . This framework is often encountered in structural reliability studies (Madsen & Ditlevsen, 1996), when is a computer code reproducing a physical phenomenon. A Monte Carlo (MC) method is the usual way to estimate , by with large independently sampled from . Avoiding regularity hypotheses on , this unbiased estimator presents good convergence properties and an estimation error independent on . Unfortunately, this strategy often appears inappropriate in practice when reaches low values, since can be time-consuming and the computational budget may be limited: a good estimation of a probability typically requires at least calls to (Lemaire & Pendola, 2006). Furthermore, has the theoretical defect not to be robust, in the sense given by Glynn et al. (2009): its relative error, namely its coefficient of variation, does not tend to a finite limit when , given any finite number of trials.

Many non-intrusive strategies have been proposed to accelerate the MC approach. Traditional methods from the engineer community in structural reliability (FORM/SORM) treat the estimation of as an optimization problem. The computational work is usually fast but the estimators suffer from weakly or non-controllable errors. Statistical approaches are judged in terms of reduction rate with respect to the MC estimator variance . Methods like quasi-MC, sequential MC or importance sampling (Kroese & Rubinstein, 2007) are based on selecting a design of experiments (DOE), namely a set of points in on which is tested, such that be explored in areas close to the limit state surface . Most advanced methods often get rid of the time-consuming difficulties by emulating the behavior of , for instance using kriging techniques (Cannamela et al., 2008) which presuppose smoothness conditions on .

Minimizing the strength of regularity hypotheses placed on underlies the development of specialized acceleration methods. For instance, computer codes can suffer from edge effects which restrict smoothness conditions (Munoz-Muniga et al., 2011). On the other hand, the reality of the phenomenon can imply various form constraints on . Especially, the assumption that is monotonic with respect to is a standard problem in regression analysis (Durot, 2008). In the area of numerical experiments, monotonicity properties of computer codes have been considered theoretically and practically, e.g. proving the MC acceleration of Latin Hypercube Sampling for the estimation of expectancies (MacKay et al., 1979), carrying out screening methods for sensitivity analyses (Lin, 1993), constraining response surfaces (Kleijnen & van Beers, 2009; Kleijnen, 2011), predicting the behavior of network queuing systems (Ranjan et al., 2008), computing flood probabilities (de Rocquigny, 2009) or estimating the safety of a nuclear reactor pressure vessel (Munoz-Muniga et al., 2011).

Specific engineering works in structural reliability have highlighted the possibility of bounding and estimating significantly faster than using a MC approach. Under the name of monotonic reliability methods (MRM), de Rocquigny (2009) proposed a class of sequential algorithms contouring the limit state surface and enclosing between deterministic bounds which dynamically narrow. A similar idea was explored by Rajabalinejad et al. (2011). However, although a parallelization of such algorithms was already implemented (Limbourg et al., 2010), these methods were only empirically studied and some of the proposed estimators of remained crude.

The present article therefore aims to provide a first theoretical approach of the accelerated MC estimation of when is assumed to be monotonic and possibly discontinuous, although some smoothness constraints are assumed on the failure surface . More precisely, this article is structured as follows.

Section 2 is dedicated to a general description and a mathematical formalization of MRM. The main contribution is presented in Section 3: a statistical estimator of is proposed, based on uniformly sampled DOEs in nested spaces. Defined as the maximum likelihood estimator of dependent Bernoulli data, its asymptotic properties are theoretically studied. The estimator is shown to be robust, and its variance gains a significant reduction with respect to the usual MC case. It may also be viewed as a baseline (or target) variance for monotonic structural reliability frameworks. The non-asymptotic bias of the estimator is examined in Section 4, through numerical experiments involving a class of toy examples. Based on a neural network emulation of , bootstrap heuristics are proposed and successfully tested to remove this bias. Finally, a more realistic hydraulic case-study illustrates the benefits of the complete method.

Along the paper some connections are done with other areas of computational mathematics, especially about implementation issues, and a discussion section ends this article by focusing on the research avenues that must be explored in the area of stochastic sequential DOEs to improve the results presented here.

Let be a deterministic function defined as a real-valued scalar mapping of on its definition domain . Deterministic means that the function produces always the same output if it is given the same input . Global monotonicity is defined as follows: , , , , such that

 g(x1,…,xi−1,xi+siϵ,xi+1,…,xd) ≤ g(x1,…,xi−1,xi,xi+1,…,xd)

where represents the sign of monotonic dependence: (resp. ) when is decreasing (resp. increasing) along with the th component . The following assumption is made without loss of generality since any decreasing th component can be changed from to :

The function is globally increasing over .

To be general, and is a random vector defined on the probability space . Next assumption is made following this same concern of generality.

All inputs are independently uniform on .

In real cases, can be defined as transformed inputs, as usual in structural safety problems (Madsen & Ditlevsen, 1996). In such cases one can write where is a vector of physical inputs and is the multivariate distributional transform (Rüschendorf, 2009). Therefore where is a mononotic function and has to preserve this monotonicity. When the are independent, is reduced to the vector of marginal cdfs and is naturally increasing, so the assumption is not restrictive. Else, technical requirements on are needed, which depend on the way this joint distribution is defined (Rüschendorf, 2009). See for instance Chen (2009) for such requirements on Gaussian copulas. Another general result is given in the Appendix (Supplementary Material).

Both subspaces and are not empty (so that exists in ).

###### Definition 1.  —

A set of points of is said to be safety-dominated (resp. failure-dominated) if is guaranteed to be positive (resp. negative) in any point of this set.

Denote by the partial order between elements of defined by . Then assume that some point value is known, and consider the sets and . The increasing monotonicity implies that if (resp. ), then is safety-dominated (resp. is failure-dominated). This proves next lemma.

###### Lemma 1.  —

Both inequalities are true with probability 1:

 p ≤ 1−P(X∈\mathbbmU+~x)   if g(~x)>0, p ≥ P(X∈\mathbbmU−~x)   else.

More generally, assume that input vectors can be sorted into safe and failure sub-samples following the corresponding values of . They are respectively defined by

 Ξ+n = {x∈(xj)j=1,…,n | g(xj)>0}

and

 Ξ−n={x∈(xj)j=1,…,n | g(xj)≤0}.

Then one may define the sets

 \mathbbmU+n = {x∈\mathbbmU | ∃xj∈Ξ+n, x⪰xj}, \mathbbmU−n = {x∈\mathbbmU | ∃xj∈Ξ−n, x⪯xj}

(see Figure 2 for an illustration). Finally, denoting and to alleviate the notations, one has in all the sequel and for all ,

 p−n ≤ p ≤ p+n. (\theequation)

Hereafter, and will be referred to as dominated subspaces, where the sign of is known. Note that the complementary non-dominated subspace is the only partition of where further calls of are required to improve the bounds. Finally, a topological assumption on is needed to complete the formal description of the situations studied by de Rocquigny (2009) and Limbourg et al. (2010).

The limit state surface is regular enough and separates in two disjoint domains and (simply connected).

The second part of this assumption implies that, in terms of classification, the two classes of points and are perfectly separable when . This property will be used later in the paper to carry out bootstrap heuristics. By regular enough, is assumed not to be the surface of multidimensional stairs, so that it cannot be exhaustively described by a DOE with . This mild assumption is ensured, for instance, if is continuously differentiable on a non-empty measurable subset of . More formally, it is assumed that ,

 supxn∈¯S∫\mathbbmUn−1∩\mathbbmU−\mathbbm1{x⪯xn} dx < p−p−n−1, (\theequation) supxn∈¯S∫\mathbbmUn−1∩\mathbbmU+\mathbbm1{1−x⪯1−xn} dx < p+n−1−p. (\theequation)

This will imply that and the finiteness of the strictly positive quantity encountered further in the paper.

###### Remark 1.  —

In multi-objective optimization, a dominated space can be interpreted as a subset of a performance space delimited by a Pareto frontier (Figueira et al., 2005). In this framework, is thought as a monotonic rule of decision depending of variables, for which the set of best possible configurations (the frontier) is searched.

###### Remark 2.  —

The proportions are the volumes of two unions of hyperrectangles sharing the same orthogonal basis. Computing such volumes is known in computational geometry as Klee’s measure problem, for which recursive sweepline algorithms (van Leeuwen & Wood, 1981) can provide exact solutions. Details about their implementation are given in Appendix (Supplementary Material). When exceeds 4 or 5, these exact methods appear however too costly, and trivial MC methods must be preferred in practice to compute these quantities.

Starting from , and , the iterative scheme shared by all MRM variants at step is based on:

1. selecting a DOE ;

2. computing the signatures ;

3. updating the subspaces

 \mathbbmU−n = \mathbbmU−n−1∪{x∈\mathbbmU | ∃ x(j)n,  ξ(j)xn=1,  x⪯x(j)n}, \mathbbmU+n = \mathbbmU+n−1∪{x∈\mathbbmU | ∃ x(j)n,  ξ(j)xn=0,  x⪰x(j)n}, \mathbbmUn = \mathbbmU/(\mathbbmU−n∪\mathbbmU+n)
4. updating the bounds .

Since , then and the sequence is nondecreasing. Symmetrically, the sequence is nonincreasing. Since bounded in and , both sequences are converging.

At each step, the DOE must be chosen accounting for the increasing monotonicity of . Denoting and two elements of the DOE and assuming to know , it is unnecessary to compute in two cases:

 if ξ(1)xn=1 and x(1)n⪰x(2)n ⇒ x(2)n∈\mathbbmU−x(1)n and ξ(2)xn=1, if ξ(1)xn=0 and x(1)n⪯x(2)n ⇒ x(2)n∈\mathbbmU+x(1)n and ξ(2)xn=0.

Thus the order of trials should be carefully monitored, in relation with the partial order between the elements of the DOE. Reducing the DOE to a single element, i.e. for all steps, minimizes the number of unnecessary trials. This one-step ahead strategy is favored in the present paper.

First iterations should be monitored to reduce significantly the width of , such that further iterations mainly focus on refinements. A deterministic strategy seems the most appropriate to start from until providing non-trivial bounds. A dichotomic diagonal MRM, illustrated on Figure 2 in a two-dimensional case, was used in the examples considered further. It explores the non-dominated space in an intuitive way and stops at step such that

 k0 ≥ 1+log(1/p)dlog2.

Consequently, an expected crude prior value of can help to estimate the minimal number of trials. To alleviate the paper, the notation now describes the situation after introductive deterministic steps with , such that and .

Pursuing a deterministic strategy can be too costly to be efficient, the upper bound offering possibly a very conservative assessment of (de Rocquigny, 2009). Intuitively, such a strategy should be optimized by selecting the next element of the DOE as the maximizer of a criterion which predicts a measure of dominated volume. Apart from the difficulty of predicting, choosing the criterion remains arbitrary. Switching to a stochastic strategy, which allows for a sequential statistical estimation of in addition of providing bounds, seems a promising alternative approach. In this framework,

 xn ∼ fn−1

at each step , with a pdf defined on . Then the probability space becomes endowed with the filtration where is the algebra generated by a sequence. The sequences and become monotonic and bounded stochastic processes with dependent increments.

The remainder of this article is devoted to a baseline statistical estimation of in a monotonic framework, in a similar spirit to the MC approach in unconstrained frameworks. Therefore, in the following, the sampling is chosen uniform at each step: .

Assume that are successively uniformly sampled in the nested non-dominated spaces . Next lemma follows.

###### Lemma 2.  —

In corollary any estimator of located between the bounds is strongly consistent. Especially, any crude average of the bounds gains a statistical validity. A more sophisticated approach can be carried out by noticing that, at step , the occurence of a nonzero signature follows a Bernoulli distribution conditionally to , with

 γk = P(g(x)≤0|x∈\mathbbmUk−1), = P(g(x)≤0)−P(g(x)≤0|x∈\mathbbmU−k−1)P(x∈\mathbbmU−k−1)P(x∈\mathbbmUk−1)

from Bayes’ formula, hence

 γk = p−p−k−1p+k−1−p−k−1. (\theequation)

After steps, all information about is brought by the dependent-data likelihood defined by the product of these conditional Bernoulli pdf:

 Ln(p) = n∏k=1(p−p−k−1p+k−1−p−k−1)ξxk(p+k−1−pp+k−1−p−k−1)1−ξxk, (\theequation)

the maximum estimator (MLE) of which is considered in next proposition.

###### Proposition 3.1.  —

Denote . There exists a unique and consistent solution in of the likelihood equation , such that

 ^pn = n∑k=1~ωk(^pn)pkn∑k=1~ωk(^pn), (\theequation) with   ~ωk(p) = ((p−p−k−1)(p+k−1−p))−1   and   pk = p−k−1+(p+k−1−p−k−1)ξxk

Assumption 4 ensures the existence of since, by (\theequation) and (\theequation), cannot be reached by at least one of the two bounds for any finite . Similarly, the quantities defined in next propositions remain finite if the limit state surface has mild smoothness properties. They are related to the behavior of the inverse of the Fisher information associated to (\theequation), which converges to 0 faster than the variance of the usual MC estimator

 VMCn(p) = p(1−p)n.
###### Lemma 3.  —

Assume that is such that (\theequation) and (\theequation) hold (Assumption 4). Then, ,

 E[1/(p−p−n)2] < ∞, (\theequation) E[1/(p+n−p)2] < ∞, (\theequation)

and consequently .

###### Proposition 3.2.  —

Denote the Fisher information associated to (\theequation). Then

 J−1n(p) = (n∑k=1E[~ωk(p)])−1 ≤ VMCn(p)nn∑k=1(1−ck−1)−1 < VMCn(p) (\theequation)

where and ,

 ck = E[p−kp+1−p+k1−p−p−k(1−p+k)p(1−p)].
###### Proposition 3.3.  —

Denote . Then

 J−1n(p) ≤ VMCn(p)(pγ01−p).

In this data-dependent context, the central limit Theorem 3.1 remains classical in the sense that the Cramer-Rao bound given by the inverse of the Fisher information is asymptotically reached by the MLE. It is technically based on the martingality of the score process . Therefore inequalities (\theequation) and (3.3) imply asymptotic variance reduction with respect to Monte Carlo and robustness. From (3.3), the asymptotic coefficient of variation (CV) of the MLE is such that

 CV[^pn] ≤ pE[p−n−1]√γ0n ∞∼ √γ0n.
###### Theorem 3.1.  —

Let be any deterministic sequence in such that . Under the supplementary assumptions:

(i)

(ii)

(iii)

then

 J1/2n(p)(^pn−p)L→N(0,1). (\theequation)

The law of large numbers (i) reflects the requirement that the sum of weights cannot diverge faster than from its mean behavior when . Although difficult to check in practice, this behavior seems rather intuitive because the sampling trajectories mainly vary at the first steps of the algorithm, when the non-dominated space is still large. Therefore (i) can be perceived as an indirect requirement on the surface . Assumption (ii) appears somewhat natural, saying that the bounds converge to symmetrically. Assumption (iii) expresses the idea that any estimator located between and converges to faster than the bounds. Again, it seems intuitive since is defined as an incremental average (cf. (\theequation)), and therefore adopts a smoother behavior than the bounds, as a function of .

Next proposition allows for an empirical estimation of the asymptotic variance and confidence intervals. The additional requirement (v) appears mild and in the same spirit than the smoothness assumptions on , saying that cannot be exactly reached by an average of the bounds for any finite number of trials.

###### Proposition 3.4.  —

Denote . Under the assumptions of Theorem 3.1, and assuming in addition:

(iv)

Assumption (i) remains true ,

(v)

such that ,

then

 ^J\nobreak \ignorespaces5/2n(p)|^J′n(p)|(^J\nobreak \ignorespaces−1n(^pn)−J−1n(p)) L→ N(0,1). (\theequation)

The reality of the theoretical descriptions hereinbefore is examined in the two next sections, through numerical experiments conducted on toy examples and a more realistic hydraulic model.

The statistical behavior of the MLE is illustrated here using the following generic toy example. For a given dimension , denote

 Zd = hd(Y) = Y1/(Y1+d∑i=2Yi)

where the physical input follows the gamma distribution with cdf , independently of other inputs. Obviously, , is increasing in where , and follows the beta distribution . Therefore, denoting the order quantile of , the deterministic function defined by

 gd(X) =hd∘T−1(X)−qd,p,

with , is related to the known exceedance probability .

In dimension 2, using , the behavior of MRM bounds can be easily compared to the MC 95%-confidence area (Figure 5). This small dimension induces a significant improvement in precision with respect to Monte Carlo, which however disappears in higher dimensions and highlights the need for real statistical estimators. Studies of the root mean square error (RMSE) and the standard deviation of the MLE, which are plotted in Figure 5 as functions of the increasing number of calls to for dimensions 3 and 4, reflected the high variance reduction of the iterative estimator with respect to Monte Carlo but highlighted a positive bias (Figure 5). Indeed the highest weights favor local estimators when approaching (ie., when in (3.1)). On the examples considered in this last figure (as well as in other experiments not shown here), a marked gap in relative bias was noticed between dimensions 3 and 4. Under dimension 4, the bias remains reasonable from a moderate number of iterations (typically 400). Else it dramatically stays at high values. Other experiments have shown on this example the effective convergence of the empirical variance of the MLE towards the Cramer-Rao bound as well as the good behavior of its empirical estimate (Figure 7).

Bias removal appears as a practical requirement, automatizing the estimation of . Indeed, given a finite value of , estimating requires to decide from which iteration the computation of the MLE can be worth it, redefining with . An intuitive rule is to select

 k∗n = argminknRMSE(^pn). (\theequation)

If the MLE were debiased, which is minimized by .

Given a fixed number of trials, two general approaches may be used for controlling and correcting the bias affecting . A corrective approach consists of obtaining a closed-form expression for the bias from Taylor expansions (Cox & Snell, 1968; Ferrari & Cribari-Neto, 1998) or penalizing the score or the likelihood functions (Bester & Hansen, 2005). This approach is not carried out here since the data-dependent context would require a specific algebraic work and technical developments about the empirical estimation of the main quantities involved. The alternative use of bootstrap resampling techniques (Efron & Tibshirani, 1993) can assess the bias empirically. For the simplicity of their principle, these heuristics are preferred here.

In the present context, bootstrap experiments must be based on a replication of the limit state surface (see Figure 7 for an illustration). Under Assumption 4, can be interpreted as the decision frontier of a supervised classification binary problem, without horseriding of classes (ie., perfectly separable). Therefore depends on the choice of a classifier calibrated from an arbitrary number of points sampled in dominated subspaces. The rationale of the bootstrap heuristics is as follows. Given , the signature of any can be predicted by the occurence of . Then denote the volume under . It can easily be estimated by at an arbitrary precision by Monte Carlo sampling (depending on ). Moreover a large number of MLE estimators of can be fastly computed using the predicted signatures.

The bootstrap heuristics make sense if the classifier is chosen such that when , so that the features of the experiment are asymptotically reproduced. Moreover, must produce a monotonic surface . For these reasons, the four-layer monotonic Multi-Layer neural networks (MLNN) proposed by Daniels & Velikova (2010) have been chosen for the experiments. Based on a combination of minimum and maximum functions (so-called MIN-MAX networks) over the two hidden layers, these networks have universal approximation capabilities of monotonic continuous functions. Besides, this choice matches the advices by Hurtado (2004) who strongly recommended the MLNN and Support Vector Machines (SVM) to estimate in a structural reliability context. Both tools are flexible, can estimate a frontier on the basis of a few samples and overcome the curse of dimensionality.

Classification-based bootstrap algorithm

1.

2. Sample and .

3. From , build a monotonic classifier of .

4. Replace by the uncostly monotonic (increasing) function

 ~g(x) = {−1if P(g(x≤0)|^Cn,M)≥1/2,