
# Discrete Optimization of Statistical Sample Sizes in Simulation by Using the Hierarchical Bootstrap Method

*We are very thankful to the Latvian Council of Science for the Grant Nr. 97.0798, within which the present investigation was worked out.*

A. Andronov and M. Fioshin
Riga Technical University,
Riga, LV-1658, 1, Kalku Str., Latvia
e-mail: andronow@rau.lv, mf@rau.lv
###### Abstract

The Bootstrap method application in simulation supposes that values of random variables are not generated during the simulation process but are extracted from available sample populations. In the case of the Hierarchical Bootstrap the function of interest is calculated recurrently using a calculation tree. In the present paper we consider the optimization of the sample sizes in each vertex of the calculation tree. The dynamic programming method is used for this aim. The proposed method allows decreasing the variance of system characteristic estimators.
Keywords: Bootstrap method, Simulation, Hierarchical Calculations, Variance Reduction

## 1 Introduction

The main problem of mathematical statistics and simulation is connected with the insufficiency of primary statistical data. In this situation, the Bootstrap method can be used successfully (Efron, Tibshirani 1993; Davison, Hinkley 1997). If the dependence between the characteristics of interest and the input data is very complex and is described by a numerical algorithm, then simulation is usually applied. Here the probabilistic distributions of the input data are not estimated, because the given primary data has a small size and such estimation gives a bias and a big variance. The Bootstrap method supposes that random variables are not generated by a random number generator during simulation in accordance with estimated distributions, but are extracted from the given primary data at random. Various problems of this approach were considered in previous papers (Andronov et al. 1995, 1998).

We will consider a known function $\varphi(X_1, X_2, \ldots, X_m)$ of $m$ independent continuous random variables $X_1, X_2, \ldots, X_m$. It is assumed that the distributions of the random variables are unknown, but a sample population $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{in_i})$ is available for each $X_i$. Here $n_i$ is the size of the sample $Y_i$. The problem consists in the estimation of the mathematical expectation

$$\Theta = E\,\varphi(X_1, X_2, \ldots, X_m). \tag{1}$$

The Bootstrap method use supposes an organization of $r$ realizations of the values $\varphi(X_1, X_2, \ldots, X_m)$. In each realization the values of the arguments are extracted at random from the corresponding sample populations $Y_1, Y_2, \ldots, Y_m$. Let $X_i^{(l)}$ be the element which was extracted from the population $Y_i$ in the $l$-th realization. We denote $X^{(l)} = (X_1^{(l)}, X_2^{(l)}, \ldots, X_m^{(l)})$ and name it the $l$-th subsample. The estimator $\Theta^*$ of the mathematical expectation is equal to the average value for all $r$ realizations:

$$\Theta^* = \frac{1}{r}\sum_{l=1}^{r}\varphi(X^{(l)}). \tag{2}$$
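As an illustration (not part of the paper), the estimator (2) can be sketched in Python; the function $\varphi$, the primary samples and all numbers below are hypothetical:

```python
import random

def bootstrap_estimate(phi, samples, r, seed=0):
    """Estimator (2): in each of r realizations, draw every argument
    at random (with replacement) from its given sample population,
    then average the r values of phi."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(r):
        args = [rng.choice(pop) for pop in samples]  # one subsample X^(l)
        total += phi(*args)
    return total / r

# Hypothetical primary samples and function phi(x1, x2) = x1 * x2.
y1 = [0.9, 1.1, 1.0, 1.2]
y2 = [2.0, 1.8, 2.2]
est = bootstrap_estimate(lambda x1, x2: x1 * x2, [y1, y2], r=20_000)
```

Because the arguments are drawn independently, `est` should approach the product of the two sample means, here 1.05 · 2.0 = 2.1.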

Our main aim is to calculate and to minimize the variance of this estimator. The variance will depend on two factors: 1) the calculation method of the function $\varphi$; 2) the formation mode of the subsamples $X^{(l)}$.

The next two Sections will be dedicated to these questions. In Section 4 we will show how to decrease variance using the dynamic programming method.

## 2 The Hierarchical Bootstrap Method

We suppose that the function $\varphi$ is calculated by a calculation tree. The root of this tree corresponds to the computed function $\varphi$; it is the vertex number $k$. The vertices with numbers $1, 2, \ldots, m$ correspond to the input variables $X_1, X_2, \ldots, X_m$. The remaining vertices are intermediate ones. They correspond to intermediate functions $\varphi_v$ (see Fig. 1).

Only one arc comes out of each vertex $v$, $v = 1, 2, \ldots, k-1$. It corresponds to the value of the function $\varphi_v$. We suppose that $\varphi_k = \varphi$.

We denote by $I_v$ the set of vertices from which arcs come into the vertex $v$, and by $X_{I_v}$ the corresponding set of variables (arcs): $X_{I_v} = \{X_i : i \in I_v\}$. It is clear that $I_v = \emptyset$ for $v = 1, 2, \ldots, m$ and $\varphi_v = \varphi_v(X_i : i \in I_v)$ for $v = m+1, \ldots, k$. We suppose that the numbering of the vertices is correct: if $i \in I_v$ then $i < v$.

Now, the value of the function $\varphi$ can be calculated by the sweep method. At the beginning, we extract separate elements from the populations $Y_1, Y_2, \ldots, Y_m$, then calculate the function values $\varphi_{m+1}, \varphi_{m+2}, \ldots, \varphi_k$ successively. After $r$ such runs the estimator $\Theta^*$ is computed according to formula (2). An analysis of this method was developed in the previous papers of the authors.

The Hierarchical Bootstrap method is based on the wave algorithm. Here all values of the function $\varphi_v$ for each vertex $v$ are calculated at once. They form the set $Y_v = (Y_{v1}, Y_{v2}, \ldots, Y_{vn_v})$, where $n_v$ is the number of realizations (the sample size). Getting one realization consists of choosing a value from each corresponding population $Y_i$, $i \in I_v$, and calculating the value $\varphi_v$. By this we suppose that random sampling with replacement is used, where each element from $Y_i$ is chosen independently with the probability $1/n_i$. Further on, this procedure is repeated for the next vertex. Finally we get the values of the population $Y_k$. Their average gives the estimator $\Theta^*$ by analogy with formula (2).
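A minimal Python sketch of the wave algorithm may help; the toy tree, function and sizes below are assumptions for illustration, not from the paper. Each non-leaf vertex builds its whole population at once by drawing, with replacement, from the populations of its child vertices:

```python
import random

def hierarchical_bootstrap(tree, funcs, leaf_pops, sizes, seed=0):
    """Wave algorithm sketch: for each non-leaf vertex v (correct
    numbering, so children come before parents), build its whole
    population Y_v of size sizes[v] at once, drawing each argument
    independently and with replacement from the child populations."""
    rng = random.Random(seed)
    pops = dict(leaf_pops)           # vertex -> list of values
    for v in sorted(tree):           # tree: vertex -> list of child vertices
        pops[v] = [funcs[v](*(rng.choice(pops[c]) for c in tree[v]))
                   for _ in range(sizes[v])]
    return pops

# Hypothetical tree: leaves 1 and 2 feed vertex 3 (the root, k = 3).
leaf_pops = {1: [0.9, 1.1, 1.0, 1.2], 2: [2.0, 1.8, 2.2]}
tree = {3: [1, 2]}                   # only non-leaf vertices listed
funcs = {3: lambda x1, x2: x1 + x2}
pops = hierarchical_bootstrap(tree, funcs, leaf_pops, sizes={3: 5000})
theta = sum(pops[3]) / len(pops[3])  # estimator (2): average of root values
```

For this toy function the estimator should approach the sum of the two sample means, 1.05 + 2.0 = 3.05.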

## 3 Expressions for Variance

The aim of this section is to show how to calculate the variance of the estimator (2). It is easy to see that in the case of the Hierarchical Bootstrap the variance is a function of the sample sizes $n_1, n_2, \ldots, n_k$.

In the previous papers of the authors the variance was calculated using a special notion of pairs (Andronov, Merkuryev 1996, 1998). There, the variance was considered as a continuous function of the variables $n_1, \ldots, n_k$, and the reduced gradient method was used. But now we need another approach for the calculation of the variance.

We use the Taylor decomposition of the function $\varphi(x)$ with respect to the mean $\mu = EX$:

$$\varphi(x) = \varphi(\mu) + \nabla^T\varphi(\mu)(x-\mu) + \frac{1}{2}(x-\mu)^T\nabla^2\varphi(\mu)(x-\mu) + O(\|x-\mu\|^3), \tag{3}$$

where $\nabla^2\varphi(x)$ is the matrix of second derivatives of the function $\varphi(x)$ and $\|v\|$ is the Euclidean norm of the vector $v$.

It gives the following decomposition:

$$\begin{aligned}
\varphi(x)\varphi(x') = {}& \varphi^2(\mu) + \varphi(\mu)\nabla^T\varphi(\mu)(x-\mu) + \varphi(\mu)\nabla^T\varphi(\mu)(x'-\mu) \\
& + \frac{1}{2}\varphi(\mu)(x-\mu)^T\nabla^2\varphi(\mu)(x-\mu) \\
& + \frac{1}{2}\varphi(\mu)(x'-\mu)^T\nabla^2\varphi(\mu)(x'-\mu) \\
& + \nabla^T\varphi(\mu)(x-\mu)(x'-\mu)^T\nabla\varphi(\mu) \\
& + O(\|x-\mu\|^3 + \|x'-\mu\|^3).
\end{aligned} \tag{4}$$

If $X = (X_1, X_2, \ldots, X_m)^T$ is a random vector with mutually independent components, $EX_i = \mu_i$, $DX_i = \sigma_i^2$, then

$$E\varphi(X) = \varphi(\mu) + \frac{1}{2}\sum_{i=1}^{m}\sigma_i^2\frac{\partial^2}{\partial x_i^2}\varphi(\mu) + E\bigl(O(\|X-\mu\|^3)\bigr), \tag{5}$$

$$\bigl(E\varphi(X)\bigr)^2 = \varphi^2(\mu) + \varphi(\mu)\sum_{i=1}^{m}\sigma_i^2\frac{\partial^2}{\partial x_i^2}\varphi(\mu) + \ldots. \tag{6}$$

Now we suppose that $X_i$ and $X_i'$ are some values from the sample population $Y_i$. Let $C_i$ denote the covariance of two elements $Y_{il}$ and $Y_{il'}$ with different numbers $l$ and $l'$:

$$C_i = \mathrm{Cov}(Y_{il}, Y_{il'} \mid l \neq l'). \tag{7}$$

Let $l$ and $l'$ correspond to the elements $X_i$ and $X_i'$ accordingly. Because we extract $X_i$ and $X_i'$ from $Y_i$ at random and with replacement, the event $l = l'$ occurs with the probability $1/n_i$. Then

$$\mathrm{Cov}(X_i, X_i') = \frac{1}{n_i}\,\mathrm{Cov}(Y_{il}, Y_{il}) + \Bigl(1 - \frac{1}{n_i}\Bigr)\mathrm{Cov}(Y_{il}, Y_{il'} \mid l \neq l'). \tag{8}$$

Therefore

$$\mathrm{Cov}(X_i, X_i') = \frac{1}{n_i}DY_i + \Bigl(1 - \frac{1}{n_i}\Bigr)C_i. \tag{9}$$
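The probability $1/n_i$ of drawing the same element twice can be checked by a quick Monte Carlo sketch (an illustration with an assumed population size $n = 5$, not from the paper):

```python
import random

rng = random.Random(42)
n, trials = 5, 200_000
# Two independent with-replacement draws from a population of size n:
# count how often the same element number l is picked twice.
same = sum(rng.randrange(n) == rng.randrange(n) for _ in range(trials))
p_same = same / trials            # should be close to 1 / n = 0.2
```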

If the values $X$ and $X'$ correspond to the subfunction $\varphi_v$ and the sample population $Y_v$, then formulas (3), (5) and (6) give

$$C_v = \sum_{i \in I_v}\Bigl(\frac{\partial}{\partial x_i}\varphi_v(\mu_v)\Bigr)^2 \mathrm{Cov}(X_i, X_i') + \ldots. \tag{10}$$

Now we can get from (9)

$$\mathrm{Cov}(\varphi_v(X), \varphi_v(X')) = \frac{1}{n_v}\sigma_v^2 + \Bigl(1 - \frac{1}{n_v}\Bigr)C_v, \tag{11}$$

where the variance $\sigma_v^2$ can be determined from (10) by substituting $\sigma_i^2$ for $\mathrm{Cov}(X_i, X_i')$:

$$\sigma_v^2 = \sum_{i \in I_v}\Bigl(\frac{\partial}{\partial x_i}\varphi_v(\mu_v)\Bigr)^2 \sigma_i^2 + \ldots. \tag{12}$$

Finally we have

$$\mathrm{Cov}(\varphi_v(X), \varphi_v(X')) = \sum_{i \in I_v}\Bigl(\frac{\partial}{\partial x_i}\varphi_v(\mu_v)\Bigr)^2\Bigl[\frac{1}{n_v}\sigma_i^2 + \Bigl(1 - \frac{1}{n_v}\Bigr)\mathrm{Cov}(X_i, X_i')\Bigr] + \ldots, \tag{13}$$

or

$$\mathrm{Cov}(\varphi_v(X), \varphi_v(X')) = \sum_{i \in I_v}\Bigl(\frac{\partial}{\partial x_i}\varphi_v(\mu_v)\Bigr)^2\Bigl[\Bigl(\frac{1}{n_v} + \Bigl(1 - \frac{1}{n_v}\Bigr)\frac{1}{n_i}\Bigr)\sigma_i^2 + \Bigl(1 - \frac{1}{n_v}\Bigr)\Bigl(1 - \frac{1}{n_i}\Bigr)C_i\Bigr] + \ldots. \tag{14}$$

By this we suppose that the variances $\sigma_i^2$ of the input random variables are known. For example, it is possible to use estimators of these variances, calculated on the given sample populations $Y_i$.

If the vertex $i$ belongs to the initial level of the calculation tree (it means that $i \le m$), then $\varphi_i(X_i) = X_i$, $\sigma_i^2$ is a known value, and $C_i = 0$. Therefore

$$\mathrm{Cov}(\varphi_i(X_i), \varphi_i(X_i')) = \frac{1}{n_i}\sigma_i^2, \quad i = 1, 2, \ldots, m. \tag{15}$$
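Formula (15) can be checked numerically. In the sketch below (an illustration, not from the paper) the primary sample itself is random, drawn from $N(0,1)$ with $\sigma^2 = 1$ and $n = 5$, so two with-replacement draws from the same sample should have covariance $\sigma^2/n = 0.2$:

```python
import random

rng = random.Random(1)
n, trials = 5, 100_000
xs, xs2 = [], []
for _ in range(trials):
    # A fresh primary sample each time, then two independent
    # with-replacement draws from that same sample.
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    xs.append(rng.choice(y))
    xs2.append(rng.choice(y))
m1 = sum(xs) / trials
m2 = sum(xs2) / trials
cov = sum((a - m1) * (b - m2) for a, b in zip(xs, xs2)) / trials
```

Conditionally on the sample the two draws are independent; the covariance $\sigma^2/n$ comes from the shared randomness of the sample itself.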

The other covariances and variances are calculated recurrently in accordance with formulas (10), (12), (14), (15), from vertices with smaller numbers to vertices with greater numbers. Finally we get the variance of interest as the variance for the root of the calculation tree:

$$D\Theta^* = \frac{1}{r}\bigl(\sigma_k^2 + (r-1)C_k\bigr) + \ldots, \tag{16}$$

where $r = n_k$.

As was just mentioned, we will consider the variance as a function of the sample sizes and denote it $D(n_1, n_2, \ldots, n_k)$. Our aim is to minimize this function with respect to the variables $n_1, n_2, \ldots, n_k$ under a linear restriction or, in other words, to solve the following optimization problem:

$$\text{minimize } D(n_1, n_2, \ldots, n_k) \tag{17}$$

subject to the restriction

$$a_1 n_1 + a_2 n_2 + \ldots + a_k n_k \le b, \tag{18}$$

where $a_1, \ldots, a_k$, $b$ and $n_1, \ldots, n_k$ are non-negative integers.

Now we intend to apply the dynamic programming method (Minox 1989).

## 4 The Dynamic Programming Method

Let us solve the optimization problem (17), (18). Our function of interest is calculated and simulated recurrently, using the calculation tree (see Section 2). In accordance with the dynamic programming technique, we have a "forward" and a "backward" procedure.

During the "backward" procedure, we recurrently calculate the so-called Bellman functions $\Phi_v(\alpha, z)$, $v = 1, 2, \ldots, k$, $0 \le \alpha \le 1$, $0 \le z \le b$. Let us consider the subfunction $\varphi_v$ that corresponds to the vertex $v$. This subfunction directly depends on the variables $X_i$, $i \in I_v$, which correspond to the incoming arcs of the vertex $v$. Additionally $\varphi_v$ depends on the variables of all vertices from which there exists a path from the leaves to the vertex $v$ of our calculation tree. Let $B_v$ denote the corresponding set of vertices and $\{X_i : i \in B_v\}$ the corresponding set of variables.

Now we need some auxiliary functions. Let us introduce the following notation:

$$\psi_i(\alpha) = \alpha\sigma_i^2 + (1 - \alpha)\,\mathrm{Cov}(X_i, X_i'), \quad i = 1, 2, \ldots, k, \quad 0 \le \alpha \le 1. \tag{19}$$

Then we are able to write in accordance with (13):

$$\mathrm{Cov}(\varphi_v(X), \varphi_v(X')) = \sum_{i \in I_v}\Bigl(\frac{\partial}{\partial x_i}\varphi_v(\mu_v)\Bigr)^2 \psi_i\Bigl(\frac{1}{n_v}\Bigr). \tag{20}$$

Now we have from (19), (12) and (13)

$$\psi_v(\alpha) = \sum_{i \in I_v}\Bigl(\frac{\partial}{\partial x_i}\varphi_v(\mu_v)\Bigr)^2\Bigl\{\alpha\sigma_i^2 + (1 - \alpha)\Bigl[\frac{1}{n_v}\sigma_i^2 + \Bigl(1 - \frac{1}{n_v}\Bigr)\mathrm{Cov}(X_i, X_i')\Bigr]\Bigr\} =$$
$$= \sum_{i \in I_v}\Bigl(\frac{\partial}{\partial x_i}\varphi_v(\mu_v)\Bigr)^2\Bigl[\Bigl(\alpha + \frac{1 - \alpha}{n_v}\Bigr)\sigma_i^2 + (1 - \alpha)\Bigl(1 - \frac{1}{n_v}\Bigr)\mathrm{Cov}(X_i, X_i')\Bigr],$$

so

$$\psi_v(\alpha) = \sum_{i \in I_v}\Bigl(\frac{\partial}{\partial x_i}\varphi_v(\mu_v)\Bigr)^2 \psi_i\Bigl(\alpha + \frac{1 - \alpha}{n_v}\Bigr). \tag{21}$$

Note that it follows from (11) and (19) that our variance of interest (16) is

$$D\Theta^* = \psi_k(0). \tag{22}$$
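To make the recursion concrete, here is a small Python sketch (all numbers are illustrative assumptions, not from the paper) that evaluates $\psi$ via (19) and (21) for a two-leaf tree and checks that $\psi_k(0)$ agrees with the direct expansion (14) with $C_i = 0$ at the leaves:

```python
# Hypothetical two-leaf tree: leaves 1 and 2 feed the root 3;
# unit squared partial derivatives of phi_3 at the mean.
sigma2 = {1: 1.0, 2: 4.0}      # leaf variances sigma_i^2
n = {1: 10, 2: 20, 3: 5}       # sample sizes n_v
deriv2 = {1: 1.0, 2: 1.0}      # (d phi_3 / d x_i)^2 at mu

def psi_leaf(i, alpha):
    # Formula (19) with Cov(X_i, X_i') = sigma_i^2 / n_i from (15).
    return alpha * sigma2[i] + (1 - alpha) * sigma2[i] / n[i]

def psi_root(alpha):
    # Recursion (21): shift the argument by (1 - alpha) / n_v.
    shifted = alpha + (1 - alpha) / n[3]
    return sum(deriv2[i] * psi_leaf(i, shifted) for i in (1, 2))

variance = psi_root(0.0)       # D(Theta*) by (22)

# Direct expansion (14) with C_i = 0 at the leaves must agree.
direct = sum(deriv2[i] * (1 / n[3] + (1 - 1 / n[3]) / n[i]) * sigma2[i]
             for i in (1, 2))
```

For these numbers both routes give 0.28 + 0.96 = 1.24.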

The values $\psi_v(\alpha)$ depend on the sample sizes $n_i$ for all $i \in B_v$. We will mark this fact as $\psi_v(\alpha;\, n_i : i \in B_v)$.

Now we are able to introduce the above-mentioned Bellman functions:

$$\Phi_v(\alpha, z) = \min_{n_i}\,\psi_v(\alpha;\, n_i : i \in B_v), \tag{23}$$

where the minimization is realized with respect to the non-negative integer variables $n_i$, $i \in B_v$, that satisfy the linear restriction

$$\sum_{i \in B_v} a_i n_i \le z. \tag{24}$$

It is clear that the optimal value of the variance for the problem (17), (18) is equal to $\Phi_k(0, b)$.

The Bellman functions are calculated recurrently from $v = 1$ to $v = k$ for $0 \le \alpha \le 1$ and $0 \le z \le b$. The basic functional equation of dynamic programming has the following form:

$$\Phi_v(\alpha, z) = \min \sum_{i \in I_v}\Bigl(\frac{\partial}{\partial x_i}\varphi_v(\mu_v)\Bigr)^2 \Phi_i\Bigl(\alpha + \frac{1 - \alpha}{n_v};\, z_i\Bigr), \tag{25}$$

where the minimization is realized with respect to the non-negative integer variables $n_v$ and $z_i$, $i \in I_v$, that satisfy the linear restriction

$$a_v n_v + \sum_{i \in I_v} z_i \le z. \tag{26}$$

The initial values of $\Phi_v$ are determined at the tree leaves by formulas (15), (19) and (23):

$$\Phi_v(\alpha, z) = \alpha\sigma_v^2 + (1 - \alpha)\frac{1}{[z/a_v]}\sigma_v^2 = \sigma_v^2\Bigl(\alpha + (1 - \alpha)\frac{1}{[z/a_v]}\Bigr), \quad v = 1, 2, \ldots, m, \tag{27}$$

where $[x]$ is the integer part of the number $x$.
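At a leaf, formula (27) is straightforward to evaluate; a small sketch with assumed numbers (an illustration, not from the paper):

```python
def bellman_leaf(sigma2_v, a_v, alpha, z):
    """Bellman function (27) at a leaf: the whole budget z buys
    n_v = [z / a_v] elements of the sample."""
    n_v = z // a_v                 # integer part [z / a_v]
    return sigma2_v * (alpha + (1 - alpha) / n_v)

# Assumed numbers: sigma_v^2 = 2, cost a_v = 2, budget z = 7 -> n_v = 3,
# so the value is 2 * (0.5 + 0.5 / 3) = 4 / 3.
value = bellman_leaf(2.0, 2, 0.5, 7)
```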

Thus the "backward" procedure is a recurrent calculation of the Bellman functions $\Phi_v(\alpha, z)$ for $v = 1, 2, \ldots, k$, $0 \le \alpha \le 1$, $0 \le z \le b$, by using formulas (27) and (25). Finally we get the minimal variance

$$D^*\Theta^* = \Phi_k(0, b). \tag{28}$$

To calculate the optimal sample sizes $n_v^*$ we should apply the "forward" procedure of the dynamic programming technique. At first, we find $n_k^*$ and $z_i^*$, $i \in I_k$, by solving the equation

$$\Phi_k(0, b) = \min \sum_{i \in I_k}\Bigl(\frac{\partial}{\partial x_i}\varphi_k(\mu_k)\Bigr)^2 \Phi_i\Bigl(\frac{1}{n_k^*},\, z_i^*\Bigr), \tag{29}$$

where the minimization is realized under the condition

$$a_k n_k^* + \sum_{i \in I_k} z_i^* \le b. \tag{30}$$

Let $\alpha_k = 1/n_k^*$, and let $I^{-1}(v)$ denote the vertex into which the arc from the vertex $v$ comes. Then we recurrently determine by analogy the rest $n_v^*$ and $z_i^*$, $i \in I_v$, for $v = k-1, k-2, \ldots, m+1$:

$$\Phi_v(\alpha_{I^{-1}(v)}, z_v^*) = \min \sum_{i \in I_v}\Bigl(\frac{\partial}{\partial x_i}\varphi_v(\mu_v)\Bigr)^2 \Phi_i\Bigl(\alpha_{I^{-1}(v)} + \frac{1 - \alpha_{I^{-1}(v)}}{n_v^*},\, z_i^*\Bigr) \tag{31}$$

under the condition

$$a_v n_v^* + \sum_{i \in I_v} z_i^* \le z_v^*. \tag{32}$$

Moreover we put

$$\alpha_v = \alpha_{I^{-1}(v)} + \bigl(1 - \alpha_{I^{-1}(v)}\bigr)/n_v^*. \tag{33}$$

Finally the optimal sizes $n_i^*$ for $i = 1, 2, \ldots, m$ are determined in the following way:

$$n_i^* = [z_i^*/a_i]. \tag{34}$$
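The whole backward/forward procedure can be sketched by brute-force enumeration for a hypothetical two-leaf tree (leaves 1 and 2 feed the root 3; the costs, variances and budget below are illustrative assumptions, not from the paper):

```python
from math import inf

# Hypothetical instance: leaves 1 and 2 feed the root 3 (m = 2, k = 3),
# phi_3(x1, x2) = x1 + x2 so both squared partial derivatives equal 1;
# unit costs a_v and budget b = 10.
sigma2 = {1: 1.0, 2: 4.0}
a = {1: 1, 2: 1, 3: 1}
b = 10
deriv2 = {1: 1.0, 2: 1.0}

def phi_leaf(v, alpha, z):
    """Initial Bellman values (27): a budget z buys n_v = [z / a_v]
    elements, so z must cover at least one element."""
    n_v = z // a[v]
    return sigma2[v] * (alpha + (1 - alpha) / n_v) if n_v >= 1 else inf

def phi_root(alpha, z):
    """One backward step (25)-(26): enumerate the root size n_3 and
    the split of the remaining budget between the two leaves."""
    best, arg = inf, None
    for n3 in range(1, z // a[3] + 1):
        rest = z - a[3] * n3
        shifted = alpha + (1 - alpha) / n3    # argument shift as in (25)
        for z1 in range(1, rest):
            z2 = rest - z1
            val = (deriv2[1] * phi_leaf(1, shifted, z1)
                   + deriv2[2] * phi_leaf(2, shifted, z2))
            if val < best:
                best, arg = val, (n3, z1, z2)
    return best, arg

# Backward procedure gives the minimal variance (28); the argmin plays
# the role of the forward procedure, and (34) turns budgets into sizes.
d_star, (n3, z1, z2) = phi_root(0.0, b)
n_star = (z1 // a[1], z2 // a[2], n3)
```

For this instance the enumeration finds the minimal variance 2.375 at $n_1^* = 2$, $n_2^* = 4$, $n_3^* = 4$: the noisier second leaf receives the larger share of the budget.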

## References

Andronov, A., Merkuryev, Yu. (1996) Optimization of Statistical Sizes in Simulation. In: Proceedings of the 2-nd St. Petersburg Workshop on Simulation, St. Petersburg State University, St. Petersburg, 220-225.

Andronov, A., Merkuryev, Yu. (1998) Controlled Bootstrap Method and its Application in Simulation of Hierarchical Structures. In: Proceedings of the 3-d St. Petersburg Workshop on Simulation, St. Petersburg State University, St. Petersburg, 271-277.

Davison, A.C., Hinkley, D.V. (1997) Bootstrap Methods and their Application. Cambridge University Press, Cambridge.

Efron, B., Tibshirani, R.J. (1993) An Introduction to the Bootstrap. Chapman & Hall, London.

Minox, M. (1989) Programmation Mathématique. Théorie et Algorithmes. Dunod, Paris.
