On Simulated Annealing Dedicated to Maximin Latin Hypercube Designs

# On Simulated Annealing Dedicated to Maximin Latin Hypercube Designs

## Abstract

The goal of our research was to enhance local search heuristics used to construct Latin Hypercube Designs. First, we introduce the 1D-move perturbation to improve the space exploration performed by these algorithms. Second, we propose a new evaluation function specifically targeting the Maximin criterion.

Exhaustive series of experiments with Simulated Annealing, which we used as a typically well-behaving local search heuristics, confirm that our goal was reached as the result we obtained surpasses the best scores reported in the literature. Furthermore, the function seems very promising for a wide spectrum of optimization problems through the Maximin criterion.

## 1 Introduction

The study of complex systems usually requires a considerable computation time. To speed up computations, the system may be replaced by a faster approximating model. To create this model, a set of outcomes for different parameter values is needed. The set of parameter values has an impact on the accuracy of the approximating model. Different sampling methods for this set of parameters has been proposed in [4]. If we note the number of inputs of the system and the number of possible values taken by an input variable , the choice of sample vectors can be represented by points on a hypercube of size and dimension . Among the designs proposed, we focus on the Maximin Latin Hypercube Design (LHD).

The LHD implements the Latin constraint: each coordinate in must appear only once in every dimension. In other words, the coordinates of any pairs of nodes differ in all dimensions. Moreover, the Maximin constraint means that we search the configuration with the maximal , where is the minimal distance between two points of the design. An instance is defined by the values of the dimension and the size . Consequently, an instance will be noted , for example .

There are exactly distances between points. They may be ordered , by definition, . In the remainder, we often refer to square values: .

As the Maximin LHD problem is believed to be NP-hard, heuristics are widely used to solve it. The use of deterministic methods are, for the moment, rather limited: branch-and-bound was only used with by [8, 9] (a highscore is a minimal maximal distance between any pair of design points for a given instance) . The survey [6] of metaheuristics sums up their performance on Maximin LHD and reports that SA not only outperformed other evolutionary algorithms but also improved many of the previous highscores for and . For this reason we choose to work on the improvement of SA applied to Maximin LHDs.

After introducing usual methods used to solve Maximin LHD with SA, we propose a new mutation and a new evaluation function to improve on the best current scores. Eventually, we show how it permits to exceed a great part of the scores presented by the literature. These results should not exclusively be considered in the particular SA context as they may offer the opportunity to boost the performance of local search algorithms for different designs with the Maximin constraint.

N.B. This reports completes our paper [2] providing the precisions and details which had to be omitted in its camera-ready version due to the page number limit. It should thus be cited together with [2].

## 2 Typical approach to solve Maximin LHD with Simulated Annealing

SA is a metaheuristic most commonly used for discrete search spaces inspired by a metallurgical process. It consists in visiting the search space with a perturbation on the configurations and deciding whether the mutated configuration should be selected. That is why it alternates phases of heating and slow cooling to influence this choice: as in physics, by controlling the cooling process, we give an opportunity to the configuration to find the lowest energy.

It is proven that SA converges to the global minimum with the Metropolis probability [1]. Several ingredients are compulsory for SA [3]: a perturbation (a mutation), an evaluation function (also called potential energy) and a temperature decrease , where is the iteration number. The acceptance probability depends on and the gap of potential energy between the new and the old configuration. When the current configuration at the iteration number is , a configuration will be accepted with the Metropolis probability:

 pk=min⎛⎜ ⎜⎝1,eH\scriptsize{pot}(ω′)−H\scriptsize{pot}(ω)KT(k)⎞⎟ ⎟⎠

Constant is typically fixed to 1. A configuration is defined by points with coordinates respecting the Latin constraints.

Survey [6] examined several perturbations among which was the most efficient. It deals with a pair of points: a randomly chosen one and a critical one. A critical point is a point involved in . Mutation transposes the coordinates of these points in one dimension. The authors of [6] proposed mutation which is a variant of as the transposition takes place in the dimension which ensures a better for a subsequent configuration . Mutation outperformed for , and .

Article [6] compared two evaluation functions: and introduced in [5]:

 ϕp=⎛⎜⎝(n2)∑i=1d−pi⎞⎟⎠1p. (1)

Function is more efficient than certainly because it takes into account changes on every distance whereas the function only considers the shortest distance of the configuration. As the paper [6] obtained the best score of the literature, we used the same values to set the values of parameter .

## 3 New perturbation targeting LHD

### 3.1 Principle of the perturbation

We use as a basis to construct a better performing perturbation. To clarify its principle, we define the notion of the neighborhood:

###### Definition 1 (Neighbor of a point).

For a given instance , a point of a configuration is a neighbor of the point if and only if there is a dimension for which coordinates of these two points are the closest possible. In other terms, .

The new mutation 1D–move consists in taking a critical point as before and taking one of its neighbor. Then we exchange the coordinates in one of the dimensions concerned by the neighborhood.

We choose a instance to illustrate 1D–move. Table 1 gives the coordinates of the points of a configuration . First, we choose a critical point: is determined by points and , so we take . Points (on axis , and ), (on axis ) and (on axis ) are neighbors of . For the sake of the example, we shall choose as a neighbor. Then, we exchange the coordinates of and on axis because and are neighbors through dimension . The new configuration is also given in Table 1.

### 3.2 Performance Evaluation

1D–move outperforms not only but as well. We reproduced the experiments made in [6] keeping the same value of parameter () for , and to show its performance (Table 2). SA performs a linear thermal descent until temperature is reached. The initial temperature is set thanks to a series of preliminary runs. We computed 100 effective runs and we present here the average within the 95% confidence interval.

To explain the efficiency of 1D–move, we can refer to SA on the Traveling Salesman Problem. Article [7] shows that the perturbations which move the smallest number of edges are the best. 1D-move modifies the same number of points as and . Consequently, distances are modified by all these mutations. However, the changes on distances are smaller with 1D–move given that modifications on coordinates are thanks to the neighborhood property. We prove it with Eq. (2). Our hypothesis is that this specific property explains why 1D–move is more efficient.

Let us consider a configuration and the two mutations and 1D–move. We want to prove that the changes due to these two mutations are not necessarily represented by the same order of magnitude. Let us note:

 m2:ω⟶ω′\leavevmode\nobreak  and \leavevmode\nobreak {1D--move}:ω⟶ω′′.

We assume the two mutations translate the point on a given dimension . We also take a point of the configuration which remains invariant with these mutations ( and equal to ). The objective is to find a configuration for which is equivalent to in the case. This difference is:

 Δd=∣∣ ∣∣ ⎷k∑l=1(p′2(l)−p1(l))2− ⎷k∑l=1(p2(l)−p1(l))2∣∣ ∣∣.

As most of the dimensions are not concerned by this move, we just note:

 k∑l=1,l≠j(p′2(l)−p1(l))2=k∑l=1,l≠j(p2(l)−p1(l))2=a2.

Variable depends on and . If and are neighbors regarding all dimensions (except ), . We take a configuration for which , , and . This configuration is illustrated in Figure 1.

By separating from other dimensions, we eventually find:

 Missing or unrecognized delimiter for \right (2)

Some configurations respect the property: , for which . This means that the difference between two distances may take values with the order of magnitude . It is not possible with 1D–move. Using the triangle inequality and the neighborhood property: .

With (or which is more restrictive than ), sometimes reaches the order of magnitude . We showed with that it was impossible with 1D–move which allows the local search to be more regular.

## 4 New evaluation function targeting Maximin

### 4.1 Presentation of a Maximin effect: narrowing the distribution of distances

We study the properties of distances obtained with the evaluation function in SA solutions. We represent all the distances of a configuration in histograms and identify properties that will allow us to establish a better evaluation function below. From now on, we distinguish three cases relative to values taken by and . We note the mean of for any configuration as as shown in [9].

#### Casen≤k

In this case (see Figure 2, ), the distances of potential solutions are concentrated around the mean. It is highly probable that two points at random taken will be neighbors. This explains why a point is close to all others in SA solutions and we talk about unimodal distribution. In our example in Figure 2, the statistical range of relative to , , in fact, is narrow. The rationale for this behavior is that when the number of points is less than the number of dimensions, it happens, in absence of constraints, that all the points are equidistant. Since the Latin constraint has to be respected, the points cannot be exactly equidistant. The distances, however, do not differ significantly.

#### Casek≤n≤2k

In this case (Figure 2, ), distributions are concentrated around two peaks. The first peak is mainly around the average distance (actually, there is a little shift between the peak and the mean because both the peaks preserve ) and the second peak is located around the doubled average distance. Much more distances are concerned by the first peak.

We illustrate this phenomenon with the instance in Figure 2. We can explain this by the fact that it is possible for this many points to be placed in an hyperoctahedron. In such a geometric object, each point is at the same distance from every other point but one, which is farther away. Thus, the distribution of distances shows two values, with the smaller being represented much more frequently.

In our example, . Concerning the highest peak, the statistical range remains small compared with the mean: the ratio is , larger than in the first case for the whole distribution. There are only seven distances located in the interval

#### Case2k≤n

In this last case (Figure 2, ), distances are distributed more uniformly. There is neither a dense peak nor a sparse interval. We observe a decrease of occurrences with an increase in the value of the distance.

#### Observations and consequences

For the first case, the only peak is naturally thin thanks to SA and particularly action. There is a little point in trying to narrow it more. We note that for the two last cases (), narrowing differences between distances lead to improve performance. We illustrate this on the instance. We represent distance sets of several possible solutions and observe that the best solutions have the most narrowed distributions. We compare two solutions in Figure 3 with and which is the best solution found in [6]. Indeed, we note that has the most narrow peak. We formulate the hypothesis that this property may be beneficial for SA performance. We introduce below a new evaluation function taking into account this aspect.

### 4.2 Definition of evaluation function ψ

We propose an evaluation function to replace the usual function :

 ψp,σ=⎛⎜⎝(n2)∑i=1wid−pi⎞⎟⎠1p, where\leavevmode\nobreak wi=1√∑(n2)j=1e−∣∣Dj−Di∣∣2σ2. (3)

The idea is to add weights for each distance term . These weights determine if the distance is close to other ones. If a distance is far from the others, the weight will be high. Consequently, it forces the distances to be close to each other. A single drawback of is its complexity in . There are different ways to reduce this complexity. First, for instance, it is possible to consider only the differences which respect . In this way, we avoid the calculations of terms that may be considered as negligible (). Instead of summing up distances, we can randomly choose distances .

### 4.3 Tuning of parameter σ and its justification

Let us focus on the parameter : given that we aim at furnishing a large number of scores, we need to tune it in a global way. It must depend directly on and , without preliminary experiments for each instance . Looking at the definition of , this variable is introduced in order to regulate the order of magnitude of the exponential term. We see that should have approximately the same order of magnitude than the values taken by .

This is why we try to give the expression of a linear function of and which is similar to typical values . To establish it, we study the variance of a random variable: the tuning of is founded on Theorem 2.

###### Theorem 2.

Let be the random variable representing any square distance in any configuration of instance . We have with .

###### Proof.

Thanks to [9], we know that . We note the random variable that gives any couple of points for . The random variable is a function of . For any , we note and get . As and are independent if , we note to keep the notation simple. If is high enough, we apply the Central Limit Theorem: . We focus first on :

 E(Y2)=∑nx=1∑y≠x(x−y)4n(n−1)2=2(n∑n−1z=1z4−∑n−1z=1z5)n(n−1)=n415+O(n3).

We thus deduce . ∎

We propose a global tuning of as a linear function of the variance of our configurations. As computing the variance of a configuration, at every iteration, would be expensive, we formulate the hypothesis that the variance of the square distances set of the SA solutions follows the function above. The idea of the tuning is to consider linearly dependent on the variance of the random variable . In the weights , we compare the difference between the current distance and an extra one with by calculating in order to identify which differences have to be taken into account.

In the case , we assume . According to several experiments series, we identify a good compromise with .

In the case , leading to unimodal distributions, does not bring more interesting results than . It is equivalent to assuming to be very large ().

Finally, the case which is an intermediary of the two previous cases, can be tuned with . This proposition does not obviously represent the best tuning for all possible instances but gives an efficient and simple solution for the tuning of .

It is necessary to mention that the case is the case where tuning is essential: to be as efficient as possible, the value of has to be carefully selected. Table 3 shows the impact of on SA performance with mutations and 1D-move. We keep the same experimental setup as in Subsection 3.2: SA makes a thermal linear descent, the results presented come out from 100 runs and the average is within the 95% confidence interval.

In Table 4, we update scores for the same instances as in [6]. The results were produced with iterations and . Our function is used when , elsewhere. We note in bold type improved results and in italics results worse than [6]. For , the use of 1D–move and allows us to exceed a large number of scores but this improvement is less significant for other values. For , we suppose that the new tools are not able to outperform previous results because the results are already optimal or very good. For , a credible hypothesis is that the value of is so close to 1 that the effect of is weak. Generally, results could be better with a specifically adapted tuning. Here, we established temperature, and by making compromises between all the instances. However, in a real life case, by treating complex systems, we work on a defined instance with and fixed. In such circumstances, we naturally advice to customize the tuning of the different parameters by making preliminary experiments on this very instance. We expect that such an approach would produce results outperforming those in Table 4.

## 5 Conclusion

In this article, we introduce new techniques to treat the Maximin LHD construction. The first one is the 1D–move mutation especially dedicated to the LHD structure. It is very efficient for a local search on LHDs because it makes it possible to follow a step-by-step path on the cost surface without jumping over possible minima. The second tool, the evaluation function directly focuses upon Maximin optimization.

As numerous problems, among them Maximin Designs, involve this criterion, we emphasize that this function can be used for many other applications. In the Maximin LHD context, the function tries to find solutions by narrowing a set of possible distances. SA, with 1D–move and , gave results better than those considered to be “the best known” for the majority of cases without any dedicated tuning.

### References

1. E. Aarts and P. van Laarhoven. Statistical cooling: A general approach to combinatorial optimization problems. Philips Journal of Research, 40(4), 1985.
2. P. Bergé, K. Le Guiban, A. Rimmel, and J. Tomasik. Search Space Exploration and an Optimization Criterion for Hard Design Problems. In Proc. of ACM GECCO, pages 43–44, July 2016.
3. S. Kirkpatrick, D. Gelatt Jr., and M. P. Vecchi. Optimization by simmulated annealing. Science, 220(4598):671–680, 1983.
4. M. McKay, R. Beckman, and W. Conover. A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output from a Computer Code. Technometrics, 21, 1979.
5. M. D. Morris and T. J. Mitchell. Exploratory designs for computational experiments. Journal of statistical planning and inference, 43:381–402, 1995.
6. A. Rimmel and F. Teytaud. A Survey of Meta-heuristics Used for Computing Maximin Latin Hypercube. In Proc. of EvoCOP, pages 25–36, 2014.
7. P. Tian, J. Ma, and D. Zhang. Application of the simulated annealing algorithm to the combinatorial optimisation problem with permutation property: An investigation of generation mechanism. European Journal of Operational Research, 118:81–94, 1999.
8. E. R. van Dam, B. Husslage, D. den Hertog, and H. Melissen. Maximin latin hypercube designs in two dimensions. Operations Research, 55(1):158–169, 2007.
9. E. R. van Dam, G. Rennen, and B. Husslage. Bounds for maximin latin hypercube designs. Operations Research, 57(3):595–608, 2009.
You are adding the first comment!
How to quickly get a good reply:
• Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
• Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
• Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minumum 40 characters