Cross-Entropy Based Importance Sampling for Stochastic Simulation Models

# Cross-Entropy Based Importance Sampling for Stochastic Simulation Models

Quoc Dung Cao Youngjun Choe Department of Industrial and Systems Engineering,
University of Washington, Seattle, WA 98195, USA
###### Abstract

To efficiently evaluate system reliability based on Monte Carlo simulation, importance sampling is used widely. The optimal importance sampling density was derived in 1950s for the deterministic simulation model, which maps an input to an output deterministically, and is approximated in practice using various methods. For the stochastic simulation model whose output is random given an input, the optimal importance sampling density was derived only recently. In the existing literature, metamodel-based approaches have been used to approximate this optimal density. However, building a satisfactory metamodel is often difficult or time-consuming in practice. This paper proposes a cross-entropy based method, which is automatic and does not require specific domain knowledge. The proposed method uses an expectation–maximization algorithm to guide the choice of a mixture distribution model for approximating the optimal density. The method iteratively updates the approximated density to minimize its estimated discrepancy, measured by estimated cross-entropy, from the optimal density. The mixture model’s complexity is controlled using the cross-entropy information criterion. The method is empirically validated using a numerical study and applied to a case study of evaluating the reliability of wind turbine using a stochastic simulation model.

###### keywords:
Monte Carlo, variance reduction, mixture model, cross-entropy information criterion
journal: Reliability Engineering & System Safety

## 1 Introduction

There are two types of computer simulation models, deterministic simulation models and stochastic simulation models. In a deterministic simulation model, the simulator will produce a fixed output for the same input values. In this paper, we focus on stochastic simulation models. Due to extra stochastic elements in the simulation model, given a fixed input value, the output is a random variable representing the outcome of the simulation. Despite its flexible modeling capacity, stochastic simulation models can be more complex and computationally costly to generate a simulation outcome.

One of the applications of Monte Carlo simulation is to compute an estimator of a system’s expected output. In Monte Carlo simulation, one draws inputs from a suitable probability distribution, calculates the output, and repeats the process multiple times to obtain different outputs. The estimate of the expected output can be the average of those simulated outputs. Monte Carlo simulation is widely used with both deterministic simulation models and stochastic simulation models. As the computing power advances over the past decades, the reliance on Monte Carlo simulation has increased in various engineering disciplines. Particularly in reliability engineering, Monte Carlo simulation is widely used to study the behavior and reliability of a system, often through estimating its failure probability.

However, the reliability evaluation based on Monte Carlo simulation remains challenging. The more closely the simulation model mimics the real system, the more computationally expensive each simulation run is. Moreover, highly reliable systems such as wind turbines or nuclear reactors require many simulation replications to encounter a rare event of interest such as a system failure, which translates to further demand for computational resources. These challenges are exacerbated with stochastic simulation models, which often have to be run multiple times at the same input value. These challenges highlight the importance of improving computational efficiency of the reliability study with stochastic simulation models.

Among techniques to speed up Monte Carlo simulation in rare event probability estimation, importance sampling (IS) is one of the most promising methods kroese:2011handbook (), especially with deterministic simulation models kahn1953 (). IS can boost the simulation efficiency by reducing the number of required simulation runs to achieve a target variance of failure probability estimator mcap2017 (). IS methods are also used with stochastic simulation models in various applications, such as finance glasserman2005 (), insurance asmussen2000 (), reliability heidelberger1995 (); balesdent2016rare (), communication networks chang1994 (), and queueing operations sadowsky1991 (); blanchet2009 (); blanchet2014 ().

Theoretically, if the inputs are drawn from the optimal IS density, the variance of the probability estimator will be minimized and we can save the most computational cost. However, the theoretically optimal IS density is generally not implementable in practice and often necessitates some approximations such as a metamodel-based method Choe2015 () or the cross-entropy (CE) method rubinstein1999 (). Metamodel-based IS aims to represent the original high fidelity but black-box simulation model by a surrogate model of which outputs can be more efficiently computed. From the surrogate model, the optimal IS density can be approximated to reduce the variance of the failure probability estimator DUBOURG2013 (). From another perspective, CE-based IS aims to find an IS density that is as close as possible to the theoretically optimal IS density, where closeness is measured by the Kullback-Leibler divergence rubinstein1999 (). The CE method has been widely used for estimating the optimal IS density with deterministic simulation models. However, to the best of our knowledge, there is no literature exploring the CE method’s application to stochastic simulation models. Thus, this paper proposes a novel method called the cross-entropy based stochastic importance sampling (CE-SIS) to approximate the optimal IS density for stochastic simulation models.

One challenge of the CE-based IS is the dilemma between parametric and nonparametric representation of the candidate optimal IS density. In the standard CE method for deterministic simulation models, the candidate IS density is confined to a parametric family. Some of the popular parametric IS densities are multivariate Bernoulli and multivariate Gaussian, thanks to their simple random variate generation and convenient updating formulae for CE minimization. However, parametric IS can be too rigid to capture the complicated important region and it is often difficult to validate the parametric model assumptions botev2013 (). Nonparametric approach can offer flexibility to overcome such limitations. One of the most popular nonparametric density approximation methods is the kernel approach. Besides its flexibility, the kernel approach also has its own drawback. The probability density model is not as parsimonious as (a mixture of) parametric density models. This lack of parsimony often makes subsequent inference and analysis of the resulting model computationally intensive rubinstein2005 (); botev2007 (). For stochastic simulation models, the parametric IS approach needs a strong assumption but the variance of the IS estimator converges to the optimal variance faster than the nonparametric IS approach chen2017oracle (). The nonparametric IS approach requires weak assumptions but the variance reduction rate is not as fast as the parametric IS approach chen2017oracle (). To achieve a good balance between flexibility and efficiency in the CE-SIS method, we express the candidate IS density for stochastic simulation models using a mixture of parametric distributions. We particularly focus our study on the Gaussian mixture model due to its flexibility to model a smooth distribution.

The proposed method is validated with a numerical example and a case study on wind turbine reliability evaluation. In the case study, our method is benchmarked against a state-of-the-art metamodel-based IS method in the literature. Approximating the optimal IS density automatically without relying on expert domain knowledge (which is often necessary for metamodel construction), the CE-SIS method demonstrates comparable results. This shows the advantage of the CE-SIS method in situations when building a high-quality metamodel is difficult or time-consuming; the CE-SIS method would conveniently provide substantial computational saving potentially more than metamodel-based methods.

## 2 Background

The crude Monte Carlo (CMC) method samples the input, , from a known probability density function, kroese:2011handbook () . From the input , a simulator generates the output, . If the simulation model is deterministic, is a deterministic function of , i.e., . For the stochastic simulation model, is stochastic for a given . Due to the random vector within the simulation model, it generates different outputs even at a fixed input .

In reliability engineering, a quantity of interest is the so-called failure probability, , where is a pre-specified threshold on a system that fails if exceeds . Note that any failure event set can be expressed as using a transformation. To estimate the failure probability, the following CMC estimator is most commonly used,

 ^PCMC =1nn∑i=1I(Y>l), (1)

where is the number of total simulation replications. The estimator in (1) is an unbiased estimator for . For the deterministic simulation model, the failure probability is equivalent to , where is the indicator function and the subscript, , appended to the expectation operator, , denotes that the expectation is taken with respect to , the density from which is drawn. For the stochastic simulation model, the failure probability is the expectation of the conditional failure probability, expressed as .

In a highly reliable system, the failure probability can be very low. It may take many simulation runs until a failure occurs. In some cases, each simulation run can be computationally very expensive. In order to obtain a good estimator of the failure probability, the number of simulation runs can be large. To save the computational resource, IS changes the sampling distribution of from to another distribution that makes the failure events more likely. Sampling from makes the estimator in (1) no longer unbiased.

For the deterministic simulation model, to make the failure probability estimator unbiased, the IS estimator becomes

 ^PDIS =1nn∑i=1I(Yi>l)f(Xi)q(Xi), (2)

where , , is sampled from instead of kroese:2011handbook (). is the output from the simulator corresponding to the input . The variance of is minimized if is sampled from the optimal IS density kahn1953 ()

 qDIS(x) =I(g(x)>l)f(x)P(Y>l). (3)

Here, the indicator function can be evaluated only by running the simulator since the function is unknown. The denominator is the unknown quantity that we want to estimate. Thus, is not implementable in practice, and approximation of is required. As mentioned above, for the deterministic simulation model, the optimal IS density can be approximated using methods such as a metamodel-based method or the CE method. Hereafter, we call the IS method for a deterministic simulation model DIS.

For the stochastic simulation model, to capture the extra randomness, the unbiased IS estimator becomes

 ^PSIS =1mm∑i=1(1NiNi∑j=1I(Y(i)j>l))f(Xi)q(Xi), (4)

where is the number of distinct values and is the th output from replications at the distinct input, . Such replications at the same value capture the additional randomness within the simulation model. Hereafter, we call the IS method for a stochastic simulation model SIS.

Similar to DIS, the variance of is desired to be as small as possible. Minimizing the variance requires two steps Choe2015 ():

(a) sampling from

 qSIS(x) =1Cqf(x)√1ns(x)(1−s(x))+s(x)2, (5)

where is and is the normalizing constant;

(b) allocating replications to by

 Ni =n√n(1−s(Xi))1+(n−1)s(Xi)∑mj=1√n(1−s(Xj))1+(n−1)s(Xj),i=1,…,m. (6)

Since the conditional probability, , is unknown in practice, the IS method for a stochastic simulation model requires the approximation of , , and . This paper will focus on how the CE method can be extended to approximate . In the next subsection, we first review how the CE method is applied to approximate .

### 2.1 CE Method for DIS

The CE method is originally developed to find the density that best approximates the optimal density of DIS rubinstein1999 (). The standard parametric CE method limits the search space for the optimal IS density to a pre-specified parametric family (e.g., Gaussian, Poisson, gamma, etc.), , and seeks the density that is closest to the optimal density . The closeness is measured by the Kullback-Leibler divergence rubinstein1999 (),

 D(q∗(x),q(x;θ)) (7)

This quantity is always non-negative and takes zero if and only if almost everywhere. Thus, minimizing over leads to if belongs to the same parametric family as .

Minimizing in (7) over is equivalent to minimizing its second term, known as the cross-entropy (CE)

 C(q∗(x),q(x;θ)) =−∫q∗(x)logq(x;θ)dx, (8)

because the first term in (7) is a constant over . As shown in (3), the optimal IS density can be expressed as , where is for DIS. The CE method aims to equivalently minimize

 C(θ) =−∫h(x)f(x)logq(x;θ)dx =−Ef[h(X)logq(X;θ)] =−∫h(x)f(x)q(x;θ′)logq(x;θ)q(x;θ′)dx (9) =−∫h(x)w(x;θ′)logq(x;θ)q(x;θ′)dx =−Eq[h(X)w(X;θ′)logq(X;θ)], (10)

where , , and is the original density of . The likelihood ratio is denoted by . Note that the equality in (9) holds under the condition that implies . This condition can be easily satisfied by ensuring that the support of includes the support of even if is unknown. The condition is always satisfied by the Gaussian mixture model. In practice, the CE method finds that minimizes the following IS estimator of (10),

 ¯C(θ) =−1nn∑i=1h(Xi)w(Xi)logq(Xi;θ), (11)

where , is sampled from . Note that by writing , we surpress the notation for dependence of on in (11). Using this IS estimator, the CE method minimizes the CE iteratively:

1. Sample , from . At the first iteration, can be flexible (e.g., is commonly used).

2. Find , where is in (11).

3. Set and start the next iteration from Step 1 until some stopping criterion is met.

This procedure iteratively refines . However, the refinement is limited, as the parameter search space, , is restricted by a pre-defined parametric family.

Some studies rubinstein2005 (); botev2007 () explore nonparametric approaches to allow greater flexibility on the candidate IS density than the standard CE method. However, as mentioned before, the flexibility comes with costs: finding the optimal density botev2007 () or sampling from the optimized density rubinstein2005 () is computationally challenging. Bridging between the two extremes of the spectrum (a parametric density with or a nonparametric density with , where is the number of parameters in a candidate IS density and is the number of simulation replications), a few studies botev2013 (); Wang2015 (); kurtz2013 () recently consider the mixture of parametric distributions, where can vary between and . This approach is particularly desirable for engineering applications because (a) it can be as flexible as we want; (b) it is easy and fast to sample from the mixture of parametric distributions; and (c) the mixture IS density provides an insight into the engineering system (e.g., means of mixture components often coincide with the so-called ‘hot spots’, where the system likely fails.). Particularly in (Choe2017, ), the candidate distribution is expressed by the Gaussian mixture model (GMM) and an expectation–maximization (EM) algorithm is used to minimize the cross-entropy estimator in (11).

### 2.2 Gaussian Mixture Model and EM Algorithm

The GMM of mixture components takes the following form:

 q(x;θ) =k∑j=1αjqj(x;μj,Σj), (12)

where the component weights, , , are positive and sum to one. The th Gaussian component density, , is specified by the mean, , and the covariance . Thus, the parameter vector of GMM denotes . Since is a function of , the GMM model can be explicitly written as . For simplicity, we will write the GMM as unless we should express different models in terms of .

To minimize (11), the gradient of (11) with respect to is set to zero:

 −1nn∑i=1h(Xi)w(Xi)∇θlogq(Xi;θ) =0. (13)

This leads to the updating equations as derived in kurtz2013 ():

 αj =∑ni=1h(Xi)w(Xi)γij∑ni=1h(Xi)w(Xi), (14) μj =∑ni=1h(Xi)w(Xi)γijXi∑ni=1h(Xi)w(Xi)γij, (15) Σj =∑ni=1h(Xi)w(Xi)γij(Xi−μj)(Xi−μj)T∑ni=1h(Xi)w(Xi)γij, (16)

where

 γij =αjqj(Xi;μj,Σj)∑kj′=1αj′qj′(Xi;μj′,Σj′). (17)

As the name suggests, the right-hand sides of the ‘updating’ equations (14), (15), (16) involve either explicitly or implicitly through . As such, the updating equations are interlocking with each other and cannot be solved analytically. Thus, by starting with an initial value for on the right-hand sides of the updating equations, the left-hand sides are computed and plugged back to the right-hand sides iteratively until the convergence is reached. This optimization procedure is a version of the EM algorithm that alternates between the expectation step (computing ) and the maximization step (updating ).

A common challenge in using GMM to approximate the IS density is balancing between the estimated model’s KL divergence (a closeness measure between the approximated IS density and the optimal one) and the model complexity. The studies botev2013 (); Wang2015 (); kurtz2013 () that consider mixture models point out the difficulty associated with the choice of the number of mixture components, . They either assume that is given botev2013 (); kurtz2013 () or follow a rule of thumb based on “some understanding of the structure of the problem at hand” Wang2015 (). Overcoming the need of picking a priori when using GMM to approximate the optimal DIS density, Choe (Choe2017, ) proposes the cross-entropy information criterion (CIC) to select automatically.

### 2.3 Cross-Entropy Information Criterion

In order to balance between the cross-entropy estimate and the model complexity, the CE estimator is minimized and the model complexity is concurrently penalized. CIC for deterministic simulation models takes the following form:

 CIC(t)DIS(d) =¯C(t−1)DIS(^θ)+^K(t−1)DISd∑t−1s=0m(s), (18)

where the CE estimator and are calculated using all the aggregated data up to iteration :

 ¯C(t−1)DIS(^θ) =−1∑t−1s=0m(s)t−1∑s=0m(s)∑i=1h(Xi)w(Xi;^θ(s))logq(Xi;^θ) (19)
 ^K(t−1)DIS =1∑t−1s=0m(s)t−1∑s=0m(s)∑i=1h(Xi)w(Xi;^θ(s)). (20)

Note that is the dimension of and proportional to , where denotes the dimension of the input . The second term of the CIC in (18) penalizes the model complexity by being linearly proportional to . The CIC is an asymptotically unbiased estimator of the cross-entropy (up to a multiplicative constant) under one key assumption that there exists such that , and several other regularity conditions (see (Choe2017, ) for details) that are similarly required for the Akaike information criterion (AIC) (akaike1974, ) to be an asymptotically unbiased estimator of the true log-likelihood. Since for stochastic simulation models, is unknown, it has to be estimated through . We will present the CIC formulation for SIS in the next section.

## 3 Methodology

This section proposes a model selection solution to the estimation of the optimal IS density for stochastic simulation models. We use the GMM to express candidates for the optimal density. The CIC will automatically determine the mixture order of the GMM and control the model complexity. Among different mixture models, we choose the GMM since it is the most widely used mixture model thanks to its ability to model any smooth distribution. In general, the proposed CE method is not limited to just the GMM. It can be extended to other mixture models as long as we can minimize a CE estimator. In the convenient case of the GMM, an EM algorithm can be applied to estimate the GMM parameters. Moreover, it is computationally efficient to sample from the GMM.

It is natural to apply the concept of CIC in DIS to SIS. The expression of CIC involves , which is known in DIS but not in SIS. Hence we use the estimator which is a function of , the estimated probability of failure at a particular value.

### 3.1 Approximations Necessary for Implementation

To implement the proposed method, we need to know for computing CIC, evaluating the EM algorithm equations (14)-(17) to solve for , and minimizing the CE estimator. For SIS, needs to be estimated because is unknown. We estimate by

 ^s(Xi) =1NiNi∑j=1I(Y(i)j>l) (21)

and then estimate by plugging in :

 ^h(x)=√^s(x)(1−^s(x))/n+^s(x)2. (22)

As the sample size grows large, asymptotically converges to . Specifically, if increases faster than , then the convergence holds by the law of large numbers and the continuous mapping theorem.

SIS also needs to allocate replications at each , as explained in Section 2. In Appendix 1, we show that for a large , the optimal in (6) is approximately proportional to . Thus, we decide based on this approximation. If , we assign , to ensure the unbiasedness of in (4).

### 3.2 Aggregated Failure Probability Estimation

We aggregately use the samples obtained in all of the CE iterations. Instead of in (4), we compute the failure probability estimator using all the aggregated data up to iteration :

 ¯P(t)SIS =1t+1t∑s=01m(s)m(s)∑i=11N(s)iN(s)i∑j=1I(Y(s)ij>l)f(X(s)i)q(X(s)i;^θ(s)), (23)

where is the random input at iteration , is the number of drawn at iteration , and is the number of simulation runs at each .

The idea of data aggregation leads to the aggregated CIC formulation for stochastic simulation models as follows:

 CIC(t)SIS(d) =¯C(t−1)SIS(^θ)+^K(t−1)SISd∑t−1s=0m(s), (24)

where the CE estimator and are calculated using all the aggregated data up to iteration :

 ¯C(t−1)SIS(^θ) =−1∑t−1s=0m(s)t−1∑s=0m(s)∑i=1^h(Xi)w(Xi;^θ(s))logq(Xi;^θ) (25)

Noting that in (18) is an unbiased DIS estimator of the failure probability, we similarly set

 ^K(t−1)SIS =¯P(t−1)SIS. (26)

### 3.3 Summary of the Proposed Method (CE-SIS)

For the EM initialization, the grid search of the optimal number of mixture components , the stopping criteria of the EM algorithm to minimize the quantity in (25), and whole division of the replication allocation , we follow the same procedures in (Choe2017, ). The details of implementation are provided in Appendix 2.

We propose the CE-SIS pseudo-code in Algorithm 1:

INPUT:

• Initialize the GMM for the first iteration (e.g., GMM with and the identity covariance matrix).

• Given a simulation budget, , we pick the actual simulation runs at each iteration, , such that , where denotes the number of total CE iterations.

• Fix the number of values to draw, , for each iteration such that, for example, , as adopted in (Choe2015, ). According to (Choe2015, ), SIS should be generally insensitive to the ratio . Note that at the first iteration , we set or equivalently , to explore the input space as much as possible.

• Pick the maximum number of components of the GMM, , for each iteration t. can depend on since it is bounded by a function of the available sample size.

## 4 Numerical Example

In this section, we use a numerical example to evaluate the performance of the CE-SIS algorithm presented above. Assuming that the data generating process is unknown, we compare the performance of the CE-SIS algorithm with the state-of-the-art metamodel-based approach in mcap2017 ().

Cannamela et al. cannamela2008 () originally develop a deterministic simulation model example, which is later modified in Choe2015 () into a stochastic simulation model example. We test the CE-SIS algorithm with the latter numerical example. Its data generating structure is as follows:

 X∼N(0,1), Y|X∼N(μ(X),σ2(X)),

where the mean and the standard deviation are:

 μ(X) =0.95X2(1+0.5cos(5X)+0.5cos(10X)), σ(X) =1+0.7|X|+0.4cos(X)+0.3cos(14X).

To approximate the optimal density in (5) and the allocation in (6), the metamodel-based method in mcap2017 () also needs to estimate the conditional probability using a metamodel. The metamodel for the conditional distribution is set as the normal distribution with the parameters, mean and standard deviation, modeled as cubic spline functions in the generalized additive model for location, scale and shape (GAMLSS) framework rigby2005 (). The metamodel is built using a pilot dataset of 600 simulation replications and the IS experiment is based on 1000 simulation replications.

To ensure that both methods use the same simulation budget, for the CE-SIS method, we use and for . The experiment is repeated times to obtain the results in Table 1. Since we have full knowledge about the underlying process, we also show in Table 1 the result obtained from the optimal SIS density that uses the true function.

The performance metric we use is the CMC ratio, which is defined as

 CMC ratio=nnCMC,

where is the total number of simulation replications (e.g., 600 + 1000 for metamodel-based SIS, 600 + 1000 for CE-SIS, and 1000 for the optimal SIS) and is the total number of simulation replications required for the crude Monte Carlo simulation to achieve the same standard error as each method, which is calculated as

 nCMC=^P(1−^P)S.E.2.

Note that is the standard error of each method. is the probability estimate from the optimal SIS, which is most accurate, i.e., 0.00996. In Table 1, the smaller CMC ratio translates to a smaller number of replications used in each row’s method divided by the number of replications necessary for CMC in (1) to achieve the standard error in the row, which means better computational saving for the same accuracy level.

It appears that the metamodel-based approach is slightly better than the CE-SIS method. The result is not surprising in the following context. First, the metamodel represents one of the best metamodels that can be built in practice since the correct distribution family (i.e., Gaussian) is used for for building the metamodel. In reality, the underlying data generating distribution is unknown and likely much more complex, rendering the construction of a good metamodel difficult, especially when the pilot dataset is small. In fact, the main advantage of the CE-SIS method is the ability to automatically come up with the best approximated importance sampling density. In other words, scalability is the main advantage of CE-SIS method, especially when we want to develop an off-the-shelf software package for SIS. Second, the difference between the standard errors of two methods, 0.00005, is small (i.e., 0.5% of the failure probability), which means the CE-SIS method can achieve performance close to the best possible models.

## 5 Case Study

This section presents an application of the CE-SIS method to the case study on wind turbine reliability evaluation. We compare the performance between the CE-SIS method and the metamodel-based method Choe2015 (), using the same CMC ratio metric in Section 4.

The reliability of a wind turbine is subject to stochastic weather (Byon2010a, ). To incorporate the randomness into the reliability evaluation of a turbine design, the international standard, IEC 61400-1 (IEC2005, ), requires the turbine designer to use stochastic simulations. We evaluate the reliability of a wind turbine using computationally expensive aerodynamic simulators (each simulation run takes roughly 1 minute on a typical PC available nowadays) developed by the U.S. National Renewable Energy Laboratory. Specifically, TurbSim is used to generate a 3-dimensional time-marching wind profile, and FAST is used to simulate a turbine’s structural bending moment responses Choe2015 (); Jonkman2005 (); Jonkman2009 (). Each simulation represents a 10-min simulated operation of the turbine. Our simulation setup is identical with the benchmark setting used in moriarty2008 ().

We are interested in estimating the probability of a failure event which is defined as the bending moment at a turbine’s blade root exceeding a specified threshold. In particular, we study two bending moment types: edgewise and flapwise. We estimate the probability that a bending moment exceeds a threshold , where = 8,600 kNm for edgewise bending moment and = 13,800 kNm for flapwise bending moment, both of which occur with around 5% probability.

The input wind speed, , is drawn from the truncated Rayleigh density Choe2015 (); choe2015extload () defined as:

 f(x)=fR(x)FR(xout)−FR(xin),

where is the untruncated Raleigh density with the shape parameter, , and is the corresponding cummulative distribution function. The cut-in and cut-out wind speed are = 3 m/s and = 25 m/s, respectively, denoting the range of wind speeds for which a wind turbine operates.

In the metamodel-based method in Choe2015 (), the conditional probability is approximated using a nonhomogenous Generalized Extreme Value (GEV) distribution:

 P(Y≤y∣X=x)=⎧⎪ ⎪⎨⎪ ⎪⎩exp(−(1+ξ(y−μ(x)σ(x)))−1/ξ),for ξ≠0exp(−exp(−y−μ(x)σ(x))),otherwise,% (27)

where and are the location and scale parameter functions, respectively, modeled with cubic smoothing spline functions under the GAMLSS framework. The shape parameter is fixed as a constant, with estimated value for the edgewise (flapwise) case. The goodness-of-fit of the GEV distribution is tested using the Kolmogorov-Smirnov test. The metamodel is built using a pilot sample of 600 simulation replications and in addition, the metamodel-based method uses 1000 (2000) replications for the failure probability estimation for edgewise (flapwise) bending moment.

In the CE-SIS method, we allocate the same simulation budget as the metamodel-based method. We use and (200) for for the edgewise (flapwise) case.

Table 2 compares the results based on 50 repetitions of each method. The CE-SIS method has slightly smaller (larger) standard error than the metamodel-based method for the edgewise (flapwise) bending moment. Accordingly, both methods save the similar level of computational resource compared to CMC, as indicated by the CMC Ratio. In the metamodel-based method, the metamodel is carefully built by fitting a nonhomogeneous GEV distribution to the pilot data. We can see that the performance of the CE-SIS method is comparable to that of the metamodel-based SIS with a high quality metamodel. Note that since CE-SIS is an automated method, it can be particularly helpful when building a metamodel is difficult. We also observe that the computational saving in the edgewise case is more substantial than the flapwise case. This is because the approximated SIS density is not very different from the original density in the flapwise case (this phenomenon is first observed and discussed extensively in Choe2015 (), to which an interested reader is referred). Nevertheless, we see that the CE-SIS method, without requiring domain knowledge about the underlying process, can still capture this information satisfactorily.

## 6 Conclusion

We propose a method called the cross-entropy based stochastic importance sampling (CE-SIS) which can efficiently construct an importance sampling density for a stochastic simulation model. The CE-SIS method uses an EM algorithm to minimize the estimated cross-entropy from a candidate IS density to the optimal IS density while penalizing the model complexity concurrently. The method automatically refines the estimated IS density, thus not requiring the specific domain knowledge for building a metamodel, which is often difficult or time-consuming in practice. We focus on a candidate IS density expressed as a Gaussian mixture model, which is both flexible and computationally efficient, while the extension to other mixture models is possible. The application of the cross-entropy information criterion allows a sound choice of the mixture model complexity for the CE-SIS method. By aggregating all the data from previous iterations and effectively allocating the number of replications at each input value, the CE-SIS method utilizes the available information efficiently under a given simulation budget. The numerical example and case study show the advantage of the CE-SIS over the metamodel-based SIS and crude Monte Carlo simulation.

## Appendix 1: Approximation of Ni

We seek an asymptotic approximation of the optimal allocation size,

 Ni =n√n(1−s(Xi))1+(n−1)s(Xi)∑mj=1√n(1−s(Xj))1+(n−1)s(Xj),i=1,…,m.

First, we show . For a large , we can approximate

 qSIS(Xi) =1Cqf(Xi)√1ns(Xi)(1−s(Xi))+s(Xi)2 ≈1Cqf(Xi)√s(Xi)2 =1Cqf(Xi)s(Xi)

for any . This asymptotic approximation may be not good for some if is close to zero. However, in that case, is small too, and such is unlikely to be sampled in the first place. Therefore, we can approximate

 s(Xi) ≈Cqw(Xi),

where the second approximation is based on the fact that the estimated IS density approximates the optimal density.

Furthermore, for a large , we can also approximate

 Cq =∫Xff(x)√1ns(x)⋅(1−s(x))+s(x)2dx ≈∫{x:s(x)>1/(n+1)}f(x)√1ns(x)⋅(1−s(x))+s(x)2dx ≈∫{x:s(x)>1/(n+1)}f(x)s(x)dx ≈^PSIS,

where is the support of . Thus, it follows that

 s(Xi) ≈^PSISw(Xi).

Therefore, for a large ,

 Ni ∝√n(1−s(Xi))1+(n−1)s(Xi) ≈    ⎷1−^PSISw(Xi)^PSISw(Xi) ∝√w(Xi)−^PSIS.

Although it does not happen frequently, if , then we set the corresponding as , the smallest allocation possible to maintain the unbiasedness of the SIS estimator.

## Appendix 2: Implementation details of the CE-SIS algorithm

Since the EM algorithm will lead to a local optimum for non-convex problems, we use 10 random initializations of and choose the best minimizer of in (25) to reduce the impact of initial guess of on the algorithm’s performance (figueiredo2002, ).

By monitoring the condition numbers of the Gaussian components’ covariances (figueiredo2002, ), the number of components can also be controlled to prevent over-fitting issue within the EM algorithm. We choose a singularity threshold of for the covariance matrix. Exceeding the threshold will signify an ill-conditioned matrix. If more than a half of the 10 initializations observe ill-conditions, we stop the EM algorithm and consider is reached for that iteration.

Checking the convergence of the EM algorithm is through monitoring the reduction of in (25). In the numerical study in Section 4, iterating the updating equations in the EM algorithm is stopped if the reduction of is less than 1% or a specified maximum number of iterations is reached.

For the allocation at each iteration , we round the numerical to the nearest integer. The difference between and due to rounding is spread across the largest ’s equally to make sure we strictly stay within the simulation budget. The purpose of spreading across the largest ’s (as opposed to, say, the smallest ones) is to minimize the deviations from the prescribed ’s. Note that SIS is generally insensitive to variations of ’s Choe2015 ().

## References

• (1) D. P. Kroese, T. Taimre, Z. I. Botev, Handbook of Monte Carlo Methods, New York: John Wiley and Sons., 2011.
• (2) H. Kahn, A. W. Marshall, Methods of reducing sample size in Monte Carlo computations, Journal of the Operations Research Society of America 1 (5) (1953) 263–278.
• (3) Y. Choe, H. Lam, E. Byon, Uncertainty quantification of stochastic simulation for black-box computer experiments, Methodology and Computing in Applied Probability.
URL https://doi.org/10.1007/s11009-017-9599-7
• (4) P. Glasserman, J. Li, Importance sampling for portfolio credit risk, Management Science 51 (11) (2005) 1643–1656.
• (5) S. Asmussen, K. Binswanger, B. Højgaard, Rare events simulation for heavy-tailed distributions, Bernoulli 6 (2) (2000) 303–322.
• (6) P. Heidelberger, Fast simulation of rare events in queueing and reliability models, ACM Transactions on Modeling and Computer Simulation (TOMACS) 5 (1) (1995) 43–85.
• (7) M. Balesdent, J. Morio, L. Brevault, Rare event probability estimation in the presence of epistemic uncertainty on input probability distribution parameters, Methodology and Computing in Applied Probability 18 (1) (2016) 197–216.
• (8) C. Chang, P. Heidelberger, S. Juneja, P. Shahabuddin, Effective bandwidth and fast simulation of ATM intree networks, Performance Evaluation 20 (1) (1994) 45–65.
• (9) J. S. Sadowsky, Large deviations theory and efficient simulation of excessive backlogs in a GI/GI/m queue, IEEE Transactions on Automatic Control 36 (12) (1991) 1383–1394.
• (10) J. Blanchet, P. Glynn, H. Lam, Rare event simulation for a slotted time M/G/s model, Queueing Systems 63 (1-4) (2009) 33–57.
• (11) J. Blanchet, H. Lam, Rare-event simulation for many-server queues, Mathematics of Operations Research 39 (4) (2014) 1142–1178.
• (12) Y. Choe, E. Byon, N. Chen, Importance sampling for reliability evaluation with stochastic simulation models, Technometrics 57 (3) (2015) 351–361.
• (13) R. Rubinstein, The cross-entropy method for combinatorial and continuous optimization, Methodology and Computing in Applied Probability 1 (2) (1999) 127–190.
• (14) V. Dubourg, B. Sudret, F. Deheeger, Metamodel-based importance sampling for structural reliability analysis, Probabilistic Engineering Mechanics 33 (2013) 47–57.
• (15) Z. I. Botev, D. P. Kroese, R. Y. Rubinstein, P. L’Ecuyer, The cross-entropy method for optimization, Machine Learning: Theory and Applications, V. Govindaraju and C. R. Rao, Eds, Chennai: Elsevier 31 (2013) 35–59.
• (16) R. Rubinstein, A stochastic minimum cross-entropy method for combinatorial optimization and rare-event estimation, Methodology and Computing in Applied Probability 7 (1) (2005) 5–50.
• (17) Z. I. Botev, D. P. Kroese, T. Taimre, Generalized cross-entropy methods with applications to rare-event simulation and optimization, Simulation 83 (11) (2007) 785–806.
• (18) Y. Chen, Y. Choe, Oracle importance sampling for stochastic simulation models, arXiv preprint arXiv:1710.00473.
• (19) H. Wang, X. Zhou, A cross-entropy scheme for mixtures, ACM Transactions on Modeling and Computer Simulation 25 (1) (2015) 6:1–6:20.
• (20) N. Kurtz, J. Song, Cross-entropy-based adaptive importance sampling using gaussian mixture, Structural Safety 42 (2013) 35–44.
• (21) Y. Choe, Information criterion for minimum cross-entropy model selection, arXiv preprint arXiv:1704.04315.
• (22) H. Akaike, A new look at the statistical model identification, IEEE Transactions on Automatic Control 19 (6) (1974) 716–723.
• (23) C. Cannamela, J. Garnier, B. Iooss, Controlled stratification for quantile estimation, The Annals of Applied Statistics 2 (4) (2008) 1554–1580.
• (24) R. A. Rigby, D. M. Stasinopoulos, Generalized additive models for location, scale and shape, Journal of the Royal Statistical Society: Series C (Applied Statistics) 54 (3) (2005) 507–554.
• (25) E. Byon, L. Ntaimo, Y. Ding, Optimal maintenance strategies for wind power systems under stochastic weather conditions, IEEE Transactions on Reliability 59 (2) (2010) 393–404.
• (26) International Electrotechnical Commission, IEC/TC88, 61400-1 ed. 3, Wind Turbines - Part 1: Design Requirements. (2005).
• (27) J. M. Jonkman, M. L. Buhl Jr., FAST User’s Guide, Tech. Rep. NREL/EL-500-38230, National Renewable Energy Laboratory, Golden, Colorado (2005).
• (28) B. J. Jonkman, TurbSim user’s guide: version 1.50, Tech. Rep. NREL/TP-500-46198, National Renewable Energy Laboratory, Golden, Colorado (2009).
• (29) P. Moriarty, Database for validation of design load extrapolation techniques, Wind Energy 11 (6) (2008) 559–576.
• (30) Y. Choe, Q. Pan, E. Byon, Computationally efficient uncertainty minimization in wind turbine extreme load assessments, ASME Journal of Solar Energy Engineering 138 (4) (2016) 041012–041012–8.
• (31) M. A. Figueiredo, A. K. Jain, Unsupervised learning of finite mixture models, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (3) (2002) 381–396.
You are adding the first comment!
How to quickly get a good reply:
• Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
• Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
• Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters