Learning from Non-Stationary Stream Data in Multiobjective Evolutionary Algorithm

Abstract

Evolutionary algorithms (EAs) are well acknowledged as a promising paradigm for solving optimisation problems with multiple conflicting objectives, in the sense that they can locate a set of diverse approximations of the Pareto optimal solutions in a single run. EAs drive the search for approximated solutions by maintaining a diverse population of solutions and by recombining promising solutions selected from the population. Incorporating machine learning techniques has shown great potential, since the intrinsic structure of the Pareto optimal solutions of a multiobjective optimisation problem can be learned and used to guide effective recombination. However, existing multiobjective EAs (MOEAs) based on structure learning spend too many computational resources on learning. To address this problem, we propose an online learning scheme. Based on the fact that offspring generated along the evolution form a stream of dependent, non-stationary data (which implies that the intrinsic structure, if any, is temporal and scale-variant), an online agglomerative clustering algorithm is applied to adaptively discover the intrinsic structure of the Pareto optimal solution set and to guide effective offspring recombination. Experimental results show significant improvement over five state-of-the-art MOEAs on a set of well-known benchmark problems with complicated Pareto sets and complex Pareto fronts.

Index Terms: Multiobjective Optimization, Evolutionary Algorithms, Machine Learning, Online Agglomerative Clustering

1 Introduction

In practice, a decision maker often needs to optimise multiple conflicting objectives. This type of optimisation problem is usually referred to as a multiobjective optimisation problem (MOP). Since the objectives usually conflict with each other, no single solution can optimise all the objectives simultaneously. Instead, an MOP admits a set of Pareto optimal solutions, named the Pareto set (PS) [1]. A solution is 'Pareto optimal' if it is impossible to make any one objective better off without making at least one other objective worse off. Finding the PS often poses great challenges to computational capacity and algorithmic intelligence [2].

In the last three decades, extensive research on evolutionary algorithms (EAs) has shown that the EA paradigm is very powerful in handling MOPs, in the sense that a set of solutions that approximates the PS, named an approximated set, can be obtained in a single run without requiring much computational effort [3][4]. EAs simulate the genetic evolution of a population of individuals towards best fitting their living environment [5]. In designing an effective EA, effective recombination for generating fit offspring is key. Research has shown that a problem's domain knowledge, if any, can greatly improve the search efficiency when it is properly collected or learned during the search process [6].

For an m-objective optimisation problem, it has been proved that, under mild conditions, the distribution of the PS exhibits an (m−1)-dimensional manifold structure [7]. This property is often referred to as the regularity property. From the point of view of EA design, an effective EA can be expected if the manifold structure is discovered and applied for offspring generation. Some EAs combine machine learning techniques to discover the intrinsic manifold structure and thereby aid the search for the PS. For example, the regularity model-based estimation of distribution algorithm (RM-MEDA) [6] applies the local principal component analysis (local PCA) approach at each generation and uses the learned principal components to approximate the manifold structure. Other EAs adopt different machine learning techniques to approximate the manifold structure [8]. All these algorithms apply the machine learning techniques at every generation, and the learning algorithms often need to visit all data several times (iterations) until convergence. Thus, a considerable amount of computational resources is consumed on learning.

To reduce the computational overhead, the multiobjective EA (MOEA) proposed by Zhang et al. [4] couples the population evolution and the model inference: only one iteration of the learning algorithm is applied at each generation. This scheme is an important development in saving computational resources. The evolution procedure can also be seen as a learning procedure, in which the intrinsic structure of an MOP's PS is expected to be learned dynamically from the changing candidate solutions. However, there is a fundamental issue in this scheme. One of the main assumptions in machine learning is that sample observations are effectively i.i.d. (independent and identically distributed) for the purposes of statistical inference. Under the scheme in [4], this assumption is largely violated along the evolution procedure. First, solutions at adjacent generations have rather different qualities in terms of their respective objectives, which indicates that they might not be sampled from the same underlying distribution (i.e. the solutions are not identically distributed). Second, the generation of new solutions at the present generation depends on collective information from the previous generation, which indicates that solutions at adjacent generations are not independent.

Looking deeply into the data we try to learn from (i.e. the offspring created during the evolutionary search), some special characteristics can be observed: 1) the structure to be discovered along the evolution is temporal and changes dynamically; in other words, these data are produced by a non-stationary process1; 2) the structure determined by the data is scale-variant: on a short time scale, the structure is pseudo-stationary, while on a long time scale, it has a sequential and converging property. That is, along the evolution process, the underlying structure is similar between adjacent generations, while it finally converges to the manifold structure of the PS of the considered optimisation problem.

In this paper, we present the first MOEA based on online machine learning2 from a stream of non-stationary data. In our algorithm, a modified version of the online agglomerative clustering algorithm presented in [9] is developed to learn the PS's structure while addressing the above-mentioned characteristics. Obvious advantages of the proposed online clustering-based evolutionary algorithm (OCEA) include 1) a close match between the search dynamics and the non-stationary structure learning, and 2) a significantly reduced computational cost on learning (data need to be visited only once in the context of online learning). To implement the proposed algorithm successfully, we need to address three main issues. First, how to modify the online agglomerative clustering in accordance with the evolution process so as to discover the underlying structure? Second, how to properly use the learned structure to create offspring effectively? Finally, how to select the fittest individuals to drive the search towards the PS? These issues are discussed in the following sections.

The rest of the paper is organised as follows. Background and previous work on multiobjective evolutionary algorithms are introduced in Section 2. Section 3 presents the proposed algorithm in detail. Experimental studies are reported in Sections 4 and 5. The sensitivity of the algorithm's performance to its parameters is analysed in Section 6. Section 7 concludes the paper.

2 Background and Previous Work

A box-constrained continuous MOP can be stated as follows:

    minimise    F(x) = (f_1(x), f_2(x), ..., f_m(x))
    subject to  x ∈ Ω = ∏_{i=1}^{n} [a_i, b_i]            (1)

where Ω defines the decision (search) space; a_i and b_i are the lower and upper boundaries of variable x_i, respectively; x = (x_1, ..., x_n) is a vector of decision variables; and F : Ω → R^m represents the mapping from the search space to the objective space, in which m objective functions are to be considered.

Suppose that u = (u_1, ..., u_m) and v = (v_1, ..., v_m) are two objective vectors. If u_i ≤ v_i for all i ∈ {1, ..., m}, but there exists at least one index j such that u_j < v_j, then u is said to dominate3 v, denoted by u ≺ v. A solution x* ∈ Ω is called (globally) Pareto optimal if there is no x ∈ Ω such that F(x) ≺ F(x*). The set of all Pareto optimal solutions, denoted by PS, is named the Pareto set. The set of the objective vectors of the Pareto optimal solutions is called the Pareto front, denoted by PF. The goal of an MOEA for an MOP is to find a set of approximated solutions whose objective vectors (which constitute an approximated front) are as close to the PF as possible (the convergence requirement) and distributed along the PF as widely and evenly as possible (the diversity requirement).
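For concreteness, the dominance test can be written as a short predicate. The following is a generic sketch under the minimisation convention (NumPy assumed), not code from the paper:

```python
import numpy as np

def dominates(u, v):
    """True if objective vector u dominates v under minimisation:
    u is no worse in every objective and strictly better in at least one."""
    u, v = np.asarray(u), np.asarray(v)
    return bool(np.all(u <= v) and np.any(u < v))

# (1, 2) dominates (2, 2); (1, 3) and (2, 2) are mutually non-dominated.
assert dominates([1, 2], [2, 2])
assert not dominates([1, 3], [2, 2]) and not dominates([2, 2], [1, 3])
```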

Great efforts have been made in the evolutionary computation community to deal with MOPs [3]. The developed approaches focus either on establishing a mechanism to balance convergence and diversity, or on developing effective recombination.

MOEAs concerning the balance between convergence and diversity basically fall into three categories. In the first category, the Pareto dominance relationship is applied to select promising solutions. The nondominated sorting developed by Deb et al. [10] is the best-known method. Its primary use is to drive the search towards the PF, which favours convergence; it needs to incorporate other strategies, such as the crowding distance [10] and the K-nearest neighbour method [11], to preserve population diversity. It has been found that dominance-based sorting cannot provide enough comparability for many-objective (typically more than three objectives) optimisation problems. Typical dominance-based MOEAs include NSGA-II [10], SPEA2 [11], PESA-II [12], NSGA-III [13], and others.

In the second category, MOEAs based on performance metrics, such as hypervolume (HV) and R2, have been developed. These metrics embed the convergence and diversity requirements together, so they can be employed to directly guide the selection of solutions towards a good balance of convergence and diversity. Representative MOEAs include SMS-EMOA [14], HypE [15], R2-IBEA [16] and DDE [17]. However, the computation of the performance metrics becomes much more difficult and time-consuming for many-objective optimisation problems.

The third category comprises the decomposition-based MOEAs. Here, a number of reference vectors in the objective space are used to decompose the problem into a set of single-objective subproblems [18] or several simple multiobjective subproblems [19]. Convergence is controlled by the objective values of the subproblems, while diversity is managed by computing the distances of the solutions to the reference vectors. Representative decomposition-based MOEAs include MOEA/D [20], MOEA/D-DE [18], MOEA/D-STM [21], MOEA/D-M2M [19] and others.

MOEAs focusing on effective recombination are almost all designed around the regularity property of MOPs. The underlying assumption is that high-quality offspring can be generated, and search efficiency thereby greatly improved, if the regularity structure is properly modelled and learned. The first work applying the regularity property to MOEA design, the aforementioned RM-MEDA, was proposed in 2008 [6]; there, the manifold structure is approximated by the first principal components. This work was later improved with the help of modelling on the PF [22]. Various regularity-based MOEAs have been developed since then, such as RM-MEDA with a redundant-cluster-reducing operator [8], RM-MEDA with a local learning strategy [23], evolutionary multiobjective optimisation via manifold learning [24], and others. Moreover, in [4], a self-organising map method is incorporated within the evolution procedure to search for the manifold structure of the PS.

3 The Algorithm

As discussed previously, existing regularity-based MOEAs usually spend a high computational cost on learning. To reduce the consumption of computational resources, we propose to adopt an online machine learning scheme. Offspring are considered as a stream of data, since they come in order along the evolution process and can be accessed only once or for a small number of generations. Moreover, it is observed that along the evolution process the stream of solutions is dependent and non-stationary. An online learning algorithm can therefore reduce the number of data visits while accounting for the non-stationary nature, which significantly reduces the required computational resources.

Note that a finite mixture of Gaussian clusters can be used to approximate the distribution of a set of data points statistically well.4 This motivates us to approximate the manifold structure by using an online clustering algorithm. The cluster statistics, including the number of clusters, the cluster means and the variance-covariances, evolve over time. To model this non-stationary process, we propose to modify an online agglomerative clustering algorithm called AddC [9] and use it to dynamically estimate the cluster statistics.

In the following, we first describe the online agglomerative clustering algorithm developed in [9] and discuss how it should be modified to adapt to the evolution process of MOEAs. The other details of the developed algorithm are then presented.

3.1 Online Agglomerative Clustering

AddC, presented by Guedalia et al. [9], is developed for clustering a stream of non-stationary data. Its clustering procedure is shown in Alg. 1, and a hedged code sketch follows the algorithm listing. In lines 1 to 2, an arriving data point x is first assigned to the cluster whose centroid is closest to it; this step attempts to minimise the within-cluster variance. Afterwards, in line 3, if there are fewer than K clusters, x is employed as a centroid to create a new cluster; otherwise, in lines 4 to 6, the two redundant clusters whose centroids are closest to each other are merged, and x is then used as the centroid of a new cluster that replaces the freed one (line 6). The merging operation aims to maximise the distances between the centroids and to remove redundant clusters, while the creation of new clusters accounts for temporal changes in the distribution of the data. In line 7, if there still exist data points to be clustered, the clustering operations are repeated. Otherwise, a post-process is conducted in line 8 to remove clusters containing a negligible number of data points; this eliminates outliers, if any.

Input: an arriving new data point x; centroids c_j and counters n_j of the existing clusters, j = 1, ..., K'; the maximum number of clusters allowed, K.
Output: a new set of clusters.
1:  Find the centroid closest to the data point (the winner): w = arg min_j ||x − c_j||.
2:  Update the closest centroid and its counter:
        c_w ← c_w + (x − c_w)/(n_w + 1),   n_w ← n_w + 1,
    where n_w is the number of data points in cluster w.
3:  If K' < K, set K' ← K' + 1 and j* = K'. Goto step 6.
4:  Find a pair of closest (redundant) centroids: (i*, j*) = arg min_{i ≠ j} ||c_i − c_j||.
5:  Merge the redundant clusters and update the cluster statistics:
        c_{i*} ← (n_{i*} c_{i*} + n_{j*} c_{j*})/(n_{i*} + n_{j*}),   n_{i*} ← n_{i*} + n_{j*}.
6:  Initialise a new cluster: c_{j*} = x and n_{j*} = 0.
7:  If there still exist data points to be clustered, take a new point and goto step 1.
8:  Post-process: for each cluster j, if n_j is below a small threshold, remove cluster j.
Algorithm 1 Online Agglomerative Clustering AddC
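To make the procedure concrete, the following is a minimal Python sketch of an AddC-style online clusterer based on our reading of Alg. 1; the counter initialisation of a fresh cluster and the merge details are assumptions and may differ from the implementation in [9].

```python
import numpy as np

class AddC:
    """Minimal sketch of AddC-style online agglomerative clustering (Alg. 1).
    Centroids are running means; counters record how many points each
    centroid has absorbed. Assumes k_max >= 2."""

    def __init__(self, k_max):
        self.k_max = k_max        # maximum number of clusters allowed (K)
        self.centroids = []       # list of np.ndarray centroids
        self.counters = []        # points absorbed by each centroid

    def add(self, x):
        x = np.asarray(x, dtype=float)
        if self.centroids:
            # Steps 1-2: move the closest centroid towards x (running mean).
            w = int(np.argmin([np.linalg.norm(x - c) for c in self.centroids]))
            self.counters[w] += 1
            self.centroids[w] += (x - self.centroids[w]) / self.counters[w]
        if len(self.centroids) >= max(self.k_max, 2):
            # Steps 4-5: merge the two closest centroids to free one slot.
            best, pair = np.inf, (0, 1)
            for i in range(len(self.centroids)):
                for j in range(i + 1, len(self.centroids)):
                    d = np.linalg.norm(self.centroids[i] - self.centroids[j])
                    if d < best:
                        best, pair = d, (i, j)
            i, j = pair
            ni, nj = self.counters[i], self.counters[j]
            tot = ni + nj
            self.centroids[i] = ((ni * self.centroids[i] + nj * self.centroids[j]) / tot
                                 if tot else 0.5 * (self.centroids[i] + self.centroids[j]))
            self.counters[i] = tot
            del self.centroids[j], self.counters[j]
        # Step 6: open a new cluster at x (starting the counter at 0 is our
        # assumption; the first point assigned to it will pull the centroid).
        self.centroids.append(x.copy())
        self.counters.append(0)

    def finalize(self, min_count=1):
        # Step 8: drop clusters that absorbed a negligible number of points.
        keep = [i for i, n in enumerate(self.counters) if n >= min_count]
        self.centroids = [self.centroids[i] for i in keep]
        self.counters = [self.counters[i] for i in keep]

# Usage: clu = AddC(7); then clu.add(p) for each streamed point; clu.finalize().
```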

3.2 Algorithmic Framework

The framework of OCEA is presented in Alg. 2. In lines 1 to 2, an initial population P is generated and an external archive A is initialised to be a copy of P. In the first generation, each solution x^i is considered as a cluster, with x^i itself as the centroid c_i and the counter n_i = 1, i = 1, ..., N. Afterwards, at each generation, an offspring is generated around each solution x^i (lines 7 to 9). To generate the offspring, a mating control parameter β is applied to balance exploration and exploitation: with probability β, solution generation favours exploitation, i.e. the reference (parent) solutions are chosen from the cluster in which x^i locates; with probability 1 − β, the reference individuals are chosen from the global mating pool specified in line 5, which favours exploration. After recombination, the generated offspring is used to update the external archive, the current population and the clustering information by environmental selection (lines 9 and 11). A sketch of the mating-pool construction follows Alg. 2.

The solution generation and the updating procedures for population and clusters will be described in the following subsections.

Input: population size N, maximum number of generations G_max, mating control parameter β.
Output: population P.
1:  Initialisation: P = {x^1, ..., x^N}; external archive A = P.
2:  Take each x^i ∈ A as a cluster C_i with centroid c_i = x^i and counter n_i = 1.
3:  for g = 1 to G_max do
4:     Set K' = #clusters.
5:     Construct a global mating pool Q by randomly choosing a solution from each cluster C_j, j = 1, ..., K'.
6:     for i = 1 to N do
7:        Construct a mating pool M_i for each x^i as follows:
              M_i = C_k  if rand < β;   M_i = Q  otherwise,
          where C_k is the cluster in which x^i locates and rand is a uniform random number in [0, 1].
8:        Generate y = SolGen(x^i, M_i).
9:        Update A and the clustering: Esoc(y, A, {c_j, n_j}, K).
10:    end for
11:    Set P = A and pass the clustering results of A to P.
12: end for
Algorithm 2 OCEA framework
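As an illustration of lines 5 and 7, the mating-pool choice can be sketched as follows. The construction of the global pool (one randomly chosen member per cluster) reflects our reading of the framework and is an assumption, not the paper's verbatim code:

```python
import random

def global_pool(clusters):
    """Assumed construction of Q: one randomly chosen solution per cluster,
    giving a pool that spans the whole archive (exploration)."""
    return [random.choice(members) for members in clusters if members]

def mating_pool(own_cluster_members, pool, beta):
    """Line 7 of Alg. 2: with probability beta mate within the cluster
    containing the current solution (exploitation); otherwise use the
    global pool (exploration)."""
    return own_cluster_members if random.random() < beta else pool
```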

3.3 New Solution Generation

In this paper, the differential evolution (DE) and polynomial mutation (PM) operators are adopted to generate offspring, as presented in Alg. 3. The recombination operator takes the current solution x and its mating pool M as input and outputs an offspring y. DE [26] is first used to generate a trial solution (line 2); a repair mechanism then corrects any component that falls outside the search boundary of that component (line 3). After repair, the PM operator [2] is applied to generate a new solution (line 4). The new solution is repaired again if necessary, and the final solution is returned (line 5).

In Alg. 3, CR and F are the two control parameters of the DE operator, while the distribution index η and the mutation probability p_m are the parameters of the PM operator. With CR = 1, the DE operator in Alg. 3 is rotation invariant, which is advantageous for dealing with complicated PSs [18]; this is why DE is selected to generate new offspring in OCEA. The use of other recombination operators is certainly not excluded; e.g. the recombination operators in [27] could be used. A hedged code sketch of the whole operator follows Alg. 3.

Input: a current solution x and its mating pool M.
Output: a trial solution y.
1:  Choose randomly two distinct parent individuals x^a and x^b from M.
2:  Generate y' as follows: for j = 1, ..., n,
        y'_j = x_j + F·(x^a_j − x^b_j)  if rand < CR;   y'_j = x_j  otherwise.
3:  Repair y': if y'_j < a_j or y'_j > b_j, reset y'_j to a randomly chosen value inside [a_j, b_j].
4:  Mutate y' by polynomial mutation: for j = 1, ..., n,
        y_j = y'_j + δ_j·(b_j − a_j)  with probability p_m;   y_j = y'_j  otherwise,
    where δ_j = (2u)^{1/(η+1)} − 1 if u < 0.5, and δ_j = 1 − (2(1−u))^{1/(η+1)} otherwise, with u a uniform random number in [0, 1].
5:  If necessary, repair y again.
Algorithm 3 Solution generation (SolGen) operator
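A runnable sketch of the SolGen operator under the usual DE/PM formulas is given below. The repair rule (uniform resampling inside the violated bound) and the default parameter values are illustrative assumptions, not the paper's exact settings:

```python
import random

def sol_gen(x, pool, lower, upper, F=0.5, CR=1.0, eta_m=20.0, pm=None):
    """Sketch of Alg. 3: DE recombination around the current solution x
    followed by polynomial mutation. pool is a list of solutions (lists of
    floats) with at least two members; lower/upper are the box bounds."""
    n = len(x)
    pm = pm if pm is not None else 1.0 / n
    a, b = random.sample(pool, 2)           # two distinct parents
    # DE step: with CR = 1 every component is perturbed (rotation invariant).
    y = [x[j] + F * (a[j] - b[j]) if random.random() < CR else x[j]
         for j in range(n)]
    # Repair: resample any out-of-bound component uniformly inside its box.
    y = [yj if lower[j] <= yj <= upper[j]
         else random.uniform(lower[j], upper[j])
         for j, yj in enumerate(y)]
    # Polynomial mutation with distribution index eta_m.
    for j in range(n):
        if random.random() < pm:
            u = random.random()
            if u < 0.5:
                delta = (2.0 * u) ** (1.0 / (eta_m + 1.0)) - 1.0
            else:
                delta = 1.0 - (2.0 * (1.0 - u)) ** (1.0 / (eta_m + 1.0))
            y[j] = min(max(y[j] + delta * (upper[j] - lower[j]), lower[j]),
                       upper[j])
    return y
```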

3.4 Updating on Population and Clusters

In line 9 of Alg. 2, the function Esoc is applied to carry out environmental selection and cluster updating. OCEA adopts the environmental selection method proposed in SMS-EMOA [14], which is based on the hypervolume metric. The hypervolume metric is the only known unary metric that is Pareto compliant [28], and it has shown better performance than decomposition-based and Pareto dominance-based environmental selection approaches [15].

Regarding cluster updating, we modify the online agglomerative clustering algorithm AddC (Alg. 1) so that it can be fitted into the evolutionary search mechanism. The modified AddC is fused in OCEA to update/refine the clusters to adaptively learn the PS’s structure.

Alg. 4 presents the details of Esoc. For each new solution y, the archive A is updated by hypervolume-based environmental selection. Specifically, the fast non-dominated sorting approach proposed in NSGA-II [10] is applied to partition A ∪ {y} into non-dominated fronts F_1, ..., F_L, where F_1 is the best front and F_L the worst (line 1). If L > 1, which indicates that there is more than one front, the solution x in F_L with the largest d(x) value is removed, where d(x) denotes the number of solutions that dominate x. Otherwise, if L = 1, the solution that contributes least to the hypervolume (lines 3 to 5, and 14) is excluded. The calculation of the hypervolume contribution can be found in [14].

Input: a new solution y; external archive A; centroids c_j and counters n_j of the current clusters, j = 1, ..., K'; the maximum number of clusters allowed, K.
Output: external archive A and its cluster information.
1:  Apply the fast non-dominated sorting approach on A ∪ {y} to obtain fronts F_1, ..., F_L.
2:  if L > 1 then
3:     Determine the worst solution: x^w = arg max_{x ∈ F_L} d(x), where d(x) is the number of solutions dominating x.
4:  else
5:     Determine the worst solution: x^w = arg min_{x ∈ F_1} Δ(x), where Δ(x) is the hypervolume contribution of x.
6:  end if
7:  if x^w ≠ y then
8:     Let C_k be the cluster in which x^w locates; remove x^w from C_k: C_k ← C_k \ {x^w}.
9:     if n_k = 1 then
10:       Remove cluster C_k and set K' ← K' − 1.
11:    else
12:       Update cluster C_k:
              c_k ← (n_k c_k − x^w)/(n_k − 1),   n_k ← n_k − 1.
13:    end if
14:    Delete the worst solution: A ← (A ∪ {y}) \ {x^w}.
15:    Set K' ← K' + 1 and construct a new cluster C_{K'} with centroid c_{K'} = y and counter n_{K'} = 1.
16:    if K' > K then
17:       Find the two closest clusters: (i*, j*) = arg min_{i ≠ j} ||c_i − c_j||.
18:       Merge the two clusters:
              c_{i*} ← (n_{i*} c_{i*} + n_{j*} c_{j*})/(n_{i*} + n_{j*}),   n_{i*} ← n_{i*} + n_{j*},   K' ← K' − 1.
19:    end if
20: else
21:    Delete the worst solution, i.e. discard y and leave A unchanged.
22: end if
Algorithm 4 The updating procedure (Esoc).

If y is kept in A after environmental selection, i.e. x^w ≠ y, the online clustering operation is invoked. First, the removed solution x^w is deleted from its cluster C_k, and the cluster's centroid and counter are updated following the equations in line 12; this differs from AddC, where no data points are removed during the online clustering process. Then y is taken as a new centroid to construct a new cluster (line 15). If there are more than K clusters, the two clusters that are closest to each other are merged (lines 16 to 18) to complete the clustering operation. A short sketch of the running-mean deletion update is given below.
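The line-12 update amounts to deleting a point from a running mean. A minimal sketch, assuming the centroid is the arithmetic mean of its members:

```python
import numpy as np

def remove_from_cluster(centroid, count, x):
    """Delete point x from a cluster kept as a running mean (line 12 of
    Alg. 4). Returns the updated (centroid, count); a count of zero tells
    the caller to drop the now-empty cluster (line 10)."""
    if count <= 1:
        return None, 0
    new_centroid = (count * np.asarray(centroid, dtype=float)
                    - np.asarray(x)) / (count - 1)
    return new_centroid, count - 1
```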

3.5 Notes on OCEA

It is necessary to emphasize that:

  • The evolution procedure of OCEA is also an online clustering procedure working on a stream of offspring that are created and updated during the evolution process. We expect the clustering structure to emerge gradually during the evolution and to be well shaped at termination.

  • Different from the original AddC (Alg. 1): (a) the clustering procedure in OCEA starts from initial clusters composed of the solutions in the initial population (line 2 in Alg. 2); (b) during the evolution, some solutions become dominated and need to be removed, so an extra operation is added to account for the removal of solutions, including the updating of cluster statistics and the discarding of any empty cluster (lines 8 to 13 in Alg. 4); (c) in our online clustering procedure, the new solution y is not assigned to its closest cluster but forms a new cluster of its own (line 15 in Alg. 4), as opposed to Alg. 1, where a new data point is first assigned to its closest cluster.

  • OCEA incorporates the online clustering tightly within the evolutionary search. The online clustering adaptively discovers the PS structure along with the evolution, and new solutions are created taking the cluster information into account at each generation. As a result, the online clustering closely adapts to the search procedure and accounts for the non-stationarity of the evolution dynamics.

  • In existing regularity model-based MOEAs, the learning at each generation has a time complexity linear in the number of training iterations, and this number must be large enough to ensure convergence of the learning algorithm. In our scheme, by contrast, each solution is visited only once, which significantly reduces the computational burden.

  • In our scheme, no post-process is required, which also differs from the original AddC algorithm.

4 Experimental Study

To investigate the performance of OCEA, it is compared with two decomposition-based MOEAs (MOEA/D-DE [18] and TMOEA/D [29]), one regularity model based MOEA (RM-MEDA [6]), one popular performance metric based MOEA (SMS-EMOA [14]), and one typical Pareto dominance based MOEA (NSGA-II [10]).

Among these algorithms, MOEA/D-DE decomposes the MOP into a set of single-objective problems with uniformly distributed weights; it might not be able to obtain approximated fronts with good diversity for MOPs with complex PFs. TMOEA/D transforms the objective functions into those that are easy for MOEA/D to address, so that MOEA/D performs well on MOPs with complex PFs. RM-MEDA is developed based on the regularity property: it learns local principal components at each generation and uses them to approximate the manifold structure. SMS-EMOA uses the hypervolume metric as the selection criterion. NSGA-II uses the Pareto dominance relationship among individuals and the crowding distance to carry out environmental selection. These comparison algorithms cover all the main streams of MOEAs in the literature.

4.1 Test Instances and Performance Metrics

MOPs with complex PFs and complicated PS structures are the particular focus of this paper. The GLT test suite from [4] is used in the comparison experiments. The suite includes a variety of problems whose characteristics challenge MOEAs greatly, including disconnected PFs, convex PFs, nonlinear variable linkages, etc.

Two commonly used performance metrics, the inverted generational distance (IGD) [6] and the hypervolume (HV) [30], are employed to measure algorithm performance. Both metrics capture the convergence and the diversity of the final approximated fronts found by MOEAs: a lower IGD and a larger HV imply better performance.

To calculate the HV metric value of an approximated front, a reference point that is dominated by all the objective vectors in the final approximated front needs to be set. A separate reference point is chosen for each of GLT1-GLT4, while GLT5 and GLT6 share the same one. A sketch of both metric computations follows.
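For reference, both metrics can be computed as follows. This is a generic sketch (reference fronts and reference points are instance-specific), with the HV routine limited to the bi-objective minimisation case for simplicity:

```python
import numpy as np

def igd(reference_front, approx_front):
    """Inverted generational distance: mean distance from each reference
    PF point to its nearest point on the approximated front (lower is
    better)."""
    ref = np.asarray(reference_front)
    app = np.asarray(approx_front)
    d = np.linalg.norm(ref[:, None, :] - app[None, :, :], axis=2)
    return d.min(axis=1).mean()

def hv_2d(front, ref_point):
    """Hypervolume of a bi-objective minimisation front: area dominated by
    the front and bounded above by the reference point (higher is better).
    Points not dominating the reference point are ignored."""
    pts = sorted((f1, f2) for f1, f2 in front
                 if f1 <= ref_point[0] and f2 <= ref_point[1])
    hv, prev_f2 = 0.0, ref_point[1]
    for f1, f2 in pts:
        if f2 < prev_f2:                      # keep only non-dominated steps
            hv += (ref_point[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return hv
```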

4.2 Experimental Settings

It is well acknowledged that, on the GLT test instances, the DE and PM operators are better able to produce promising solutions than other operators [4]. Therefore, to make a fair comparison, the recombination operators in NSGA-II and SMS-EMOA are replaced by the DE and PM operators used in this paper. Furthermore, all parameters in the experiments are tuned through preliminary experiments for optimal performance on these test instances. All algorithms are implemented in Matlab and tested on the same computer. The parameter settings for these algorithms are as follows:

  • Common parameters:

    • population size: for bi-objective and 105 for tri-objective instances;

    • search space dimension: for GLT1-GLT6;

    • runs: each algorithm is run independently 33 times on each test instance;

    • termination: maximum evolutionary generation .

  • Parameters for OCEA:

    • maximum number of clusters allowed: ;

    • mating control parameter: ;

    • DE control parameters: ;

    • PM control parameters: .

  • Parameters for MOEA/D-DE:

    • neighbourhood size: ;

    • mating control parameter: ;

    • maximum number of solutions to be replaced by an offspring: 2;

    • DE control parameters: ;

    • PM control parameters: .

  • Parameters for TMOEA/D:

    • neighbourhood size: ;

    • generations for the first stage: ;

    • generations for the second stage: ,
      ;

    • DE control parameters: .

  • Parameters for RM-MEDA:

    • number of clusters in local PCA: 5;

    • maximum iterations used in local PCA: 50;

    • sampling extension ratio: 0.25.

  • Parameters for NSGA-II and SMS-EMOA:

    • DE control parameters: ;

    • PM control parameters: .

To draw statistically sound conclusions, each algorithm is run independently 33 times on each instance, and the comparisons are based on the statistics of the performance metric values, i.e. their means and standard deviations. In the comparison tables, the mean IGD and HV metric values for each instance are sorted in ascending and descending order, respectively, and the resulting ranks are given in square brackets; the best mean metric values are highlighted in bold face. The Wilcoxon rank-sum test at a 5% significance level is performed to test the significance of the differences between the mean metric values obtained by each pair of algorithms on each instance. In the tables, "+", "−" and "≈" denote that the mean metric values obtained by OCEA are better than, worse than, or similar to those achieved by the comparison algorithm, respectively. A minimal illustration of the statistical test follows.
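As an illustration of this protocol, the rank-sum test on two hypothetical samples of 33 run statistics can be carried out with SciPy (the numbers below are synthetic, not the paper's data):

```python
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(0)
igd_ocea = rng.normal(0.006, 0.001, size=33)    # hypothetical IGD values, 33 runs
igd_other = rng.normal(0.015, 0.003, size=33)   # hypothetical comparison algorithm

stat, p_value = ranksums(igd_ocea, igd_other)
significant = p_value < 0.05                    # 5% significance level
print(f"p = {p_value:.3g}, significant difference: {significant}")
```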

4.3 Comparison Study

To study the statistical performance of OCEA, Table 1 shows the statistics of the IGD and HV metric values obtained by MOEA/D-DE, TMOEA/D, RM-MEDA, NSGA-II, SMS-EMOA and OCEA on the GLT test suite over 33 independent runs. In general, OCEA obtains 8 of the 12 best mean metric values, while the other algorithms together obtain only 4. According to the mean ranks, the algorithms ordered from best to worst are OCEA, RM-MEDA, TMOEA/D, SMS-EMOA, NSGA-II and MOEA/D-DE. Specifically, according to the Wilcoxon rank-sum test, over the 12 comparisons with each of MOEA/D-DE, TMOEA/D, RM-MEDA, NSGA-II and SMS-EMOA, OCEA achieves 12, 11, 11, 12 and 11 better, 0, 1, 1, 0 and 0 worse, and 0, 0, 0, 0 and 1 similar mean metric values, respectively. Table 1 indicates that OCEA performs best overall on the GLT test suite.

Instance MOEA/D-DE TMOEA/D RM-MEDA NSGA-II SMS-EMOA OCEA
IGD
GLT1 7.042e-03[3] 5.472e-03[2] 1.324e-02[4] 1.550e-02[5] 2.100e-02[6] 2.041e-03[1]
GLT2 3.569e-01[6] 3.636e-02[2] 3.327e-02[1] 4.146e-02[4] 4.955e-02[5] 3.720e-02[3]
GLT3 3.829e-02[6] 2.212e-02[5] 2.018e-02[4] 1.405e-02[2] 1.929e-02[3] 6.432e-03[1]
GLT4 1.985e-02[2] 4.462e-02[5] 4.147e-02[4] 4.106e-02[3] 5.505e-02[6] 5.769e-03[1]
GLT5 8.079e-02[6] 4.409e-02[3] 5.130e-02[4] 6.419e-02[5] 3.018e-02[2] 2.942e-02[1]
GLT6 5.582e-02[6] 4.059e-02[4] 3.835e-02[3] 5.384e-02[5] 2.225e-02[2] 2.223e-02[1]
HV
GLT1 3.367e+00[2] 3.366e+00[3] 3.316e+00[4] 3.312e+00[5] 3.297e+00[6] 3.369e+00[1]
GLT2 1.943e+01[6] 1.977e+01[2] 1.970e+01[5] 1.972e+01[3] 1.972e+01[4] 1.981e+01[1]
GLT3 3.941e+00[6] 3.943e+00[5] 3.944e+00[4] 3.946e+00[2] 3.946e+00[3] 3.948e+00[1]
GLT4 4.980e+00[2] 4.869e+00[6] 4.961e+00[3] 4.954e+00[4] 4.953e+00[5] 4.993e+00[1]
GLT5 7.939e+00[5] 7.958e+00[3] 7.951e+00[4] 7.939e+00[6] 7.968e+00[2] 7.969e+00[1]
GLT6 7.937e+00[5] 7.947e+00[4] 7.948e+00[3] 7.933e+00[6] 7.960e+00[2] 7.961e+00[1]
Mean Rank 4.583 3.667 3.583 4.167 3.833 1.167
+/−/≈ 12/0/0 11/1/0 11/1/0 12/0/0 11/0/1
Table 1: Statistics (mean(std. dev.)[rank]) of IGD and HV metric values of final approximated fronts obtained by MOEA/D-DE, TMOEA/D, RM-MEDA, NSGA-II, SMS-EMOA and OCEA algorithms over 33 independent runs on the GLT test suite

To observe the search efficiency of OCEA, Fig. 1 shows the evolution of the statistics of the IGD metric values obtained by the six algorithms on GLT1-GLT6. For GLT1 and GLT3-GLT6, OCEA reaches the lowest mean IGD metric values fastest. For GLT2, OCEA converges more slowly than RM-MEDA, at a similar speed to TMOEA/D, and faster than the other algorithms; moreover, on GLT2, OCEA actually performs better than RM-MEDA in the early stage. From the evolution of the standard deviations of the metric, it can also be observed that within 300 generations OCEA achieves robust performance on all the instances except GLT3. Fig. 1 indicates that, on average, OCEA approaches the PFs fastest and maintains the most diverse populations among the compared algorithms.

Figure 1: Evolution of the statistics of IGD metric values obtained by MOEA/D-DE, TMOEA/D, RM-MEDA, NSGA-II, SMS-EMOA and OCEA on GLT1-GLT6

To reveal the search processes, Fig. 2 plots the evolution of the approximated fronts obtained by RM-MEDA, NSGA-II, MOEA/D-DE and OCEA on GLT4. The plotted run of each algorithm is representative: it is the run whose final approximated front has the median IGD metric value over the 33 independent runs. It can be seen that, at the 100th generation, the approximated front yielded by OCEA has already reached the PF and almost covers it entirely; after 300 generations, OCEA's approximated front exhibits excellent convergence and diversity. In contrast, after 300 generations, the final approximated fronts obtained by RM-MEDA, NSGA-II and MOEA/D-DE still cannot cover the whole PF and are unevenly distributed. Fig. 2 shows that OCEA can indeed greatly improve the search efficiency.

(a) RM-MEDA
(b) NSGA-II
(c) MOEA/D-DE
(d) OCEA
Figure 2: Evolution of the approximated fronts obtained by RM-MEDA, NSGA-II, MOEA/D-DE and OCEA on GLT4
(a) overall fronts
(b) overall fronts
(c) representative fronts
(d) representative fronts
Figure 3: Final approximated fronts obtained by OCEA and RM-MEDA

To further investigate the behaviour of OCEA, Fig. 3 plots the final approximated fronts obtained by RM-MEDA and OCEA on GLT1-GLT6. All the final approximated fronts of each instance over the 33 independent runs are plotted in Figs. 3(a) and 3(b), while the representative front of each instance (the final approximated front with the median IGD metric value) is plotted in Figs. 3(c) and 3(d). From Figs. 3(a) and 3(b), it can be seen that over the 33 independent runs, the final approximated fronts achieved by both RM-MEDA and OCEA can cover the whole PF of each instance; however, OCEA performs more stably than RM-MEDA. From Figs. 3(c) and 3(d), it is observed that the representative fronts of GLT5-GLT6 yielded by RM-MEDA do not reach the PFs, and that for GLT1-GLT4, although the representative fronts yielded by RM-MEDA all reach the PFs, they do not cover them completely. By contrast, the representative fronts obtained by OCEA converge to the PFs of all instances and distribute well over them. Fig. 3 implies that, on the GLT test instances, OCEA is stable and robust in terms of convergence and diversity.

In summary, we may conclude that OCEA has shown an excellent performance for dealing with MOPs with complicated PSs and complex PFs.

5 Further Discussions

5.1 Performance on WFG test suite

To understand the performance of OCEA more deeply, it is also applied to the WFG test suite [31] and compared with the five algorithms mentioned above. The WFG test instances are well known to have complex PFs and various complicated characteristics, such as nonseparability, multimodality, degeneracy and deceptiveness. In this section, 9 bi-objective WFG test instances with 30-dimensional decision variables are taken as the test bed, and the maximum number of generations is set to 450. Part of the parameter settings of these algorithms, tuned through preliminary experiments, are listed in Table 2; the rest are the same as in Section 4.2. Again, 33 independent runs of each algorithm are carried out on each test instance. Table 3 shows the statistics of the IGD and HV metric values obtained by MOEA/D-DE, TMOEA/D, RM-MEDA, NSGA-II, SMS-EMOA and OCEA on the WFG test instances over 33 independent runs.

Algs. Parameters
MOEA/D-DE
TMOEA/D
RM-MEDA
NSGA-II
SMS-EMOA
OCEA
Table 2: Parameter settings for MOEA/D-DE, TMOEA/D, RM-MEDA, NSGA-II, SMS-EMOA and OCEA on the WFG test suite
test instances MOEA/D-DE TMOEA/D RM-MEDA NSGA-II SMS-EMOA OCEA
IGD
WFG1 1.105e+00[2] 9.327e-01[1] 1.169e+00[3] 1.435e+00[5] 1.446e+00[6] 1.306e+00[4]
WFG2 3.781e-02[6] 2.860e-02[4] 3.030e-02[5] 1.431e-02[3] 1.357e-02[1] 1.429e-02[2]
WFG3 1.456e-02[4] 1.237e-02[2] 3.399e-02[6] 1.869e-02[5] 1.245e-02[3] 1.184e-02[1]
WFG4 3.400e-02[1] 4.921e-02[3] 9.385e-02[6] 5.692e-02[5] 4.581e-02[2] 5.512e-02[4]
WFG5 6.762e-02[3] 7.776e-02[5] 1.116e-01[6] 6.895e-02[4] 6.656e-02[1] 6.657e-02[2]
WFG6 3.325e-01[5] 3.531e-01[6] 3.249e-01[2] 3.321e-01[4] 3.318e-01[3] 3.249e-01[1]
WFG7 1.873e-02[3] 4.113e-02[6] 3.935e-02[5] 2.860e-02[4] 1.192e-02[2] 1.137e-02[1]
WFG8 4.930e-02[3] 4.484e-02[2] 1.628e-01[6] 6.190e-02[5] 5.474e-02[4] 3.763e-02[1]
WFG9 2.451e-01[3] 2.687e-01[5] 2.073e-01[1] 2.673e-01[4] 2.703e-01[6] 2.422e-01[2]
HV
WFG1 5.942e+00[2] 6.718e+00[1] 5.607e+00[3] 4.427e+00[5] 4.398e+00[6] 4.889e+00[4]
WFG2 1.144e+01[2] 1.145e+01[1] 1.127e+01[6] 1.141e+01[5] 1.143e+01[3] 1.142e+01[4]
WFG3 1.092e+01[4] 1.093e+01[3] 1.077e+01[6] 1.089e+01[5] 1.094e+01[2] 1.094e+01[1]
WFG4 8.527e+00[1] 8.337e+00[5] 8.107e+00[6] 8.341e+00[4] 8.420e+00[2] 8.350e+00[3]
WFG5 8.140e+00[4] 7.815e+00[6] 7.962e+00[5] 8.144e+00[3] 8.197e+00[2] 8.199e+00[1]
WFG6 6.351e+00[5] 6.245e+00[6] 6.388e+00[2] 6.351e+00[4] 6.355e+00[3] 6.391e+00[1]
WFG7 8.643e+00[3] 7.979e+00[6] 8.459e+00[5] 8.579e+00[4] 8.666e+00[2] 8.672e+00[1]
WFG8 8.450e+00[2] 8.274e+00[5] 7.674e+00[6] 8.329e+00[4] 8.373e+00[3] 8.463e+00[1]
WFG9 6.181e+00[3] 6.038e+00[6] 6.411e+00[1] 6.119e+00[4] 6.114e+00[5] 6.246e+00[2]
Mean Rank 3.111 4.056 4.444 4.278 3.111 2.000
+/−/≈ 12/5/1 12/4/2 14/3/1 15/0/3 11/4/3
Table 3: Statistics (mean(std. dev.)[rank]) of the IGD and HV metric values of final approximated fronts obtained by MOEA/D-DE, TMOEA/D, RM-MEDA, NSGA-II, SMS-EMOA and OCEA algorithms over 33 independent runs on the WFG test suite.

Table 3 shows that OCEA achieves 9 of the 18 best mean metric values, while the other five algorithms together obtain the remaining 9. According to the mean ranks, the algorithms ordered from best to worst are OCEA, MOEA/D-DE, SMS-EMOA, TMOEA/D, NSGA-II and RM-MEDA. The Wilcoxon rank-sum test suggests that, over the 18 mean metric values, OCEA performs better than MOEA/D-DE, TMOEA/D, RM-MEDA, NSGA-II and SMS-EMOA in 12, 12, 14, 15 and 11 cases, respectively; worse in 5, 4, 3, 0 and 4; and similarly in 1, 2, 1, 3 and 3. From Table 3, we may conclude that OCEA performs very well on the WFG test instances, which indicates that OCEA is able to deal with MOPs with complex PFs and complicated problem characteristics.

6 Parameter Sensitivity Analysis

The sensitivity of OCEA to its parameters is analysed in this section. The GLT test suite is used for the analysis.

6.1 Maximum Number of Clusters

To test how the maximum number of clusters K affects the performance of OCEA, K ∈ {4, 5, 7, 10, 20} is chosen for the analysis. The remaining parameters are the same as in Section 4.2. OCEA is run independently 22 times on each test instance for each K value. Fig. 4(a) shows the means and standard deviations of the IGD metric values obtained by OCEA.

From Fig. 4(a), it can be seen that for GLT2 and GLT5-GLT6, OCEA robustly achieves similar performance across the different K values. For GLT1 and GLT3-GLT4, however, different K values lead to relatively large performance differences; in particular, when K is large, the performance of OCEA is not good enough. In general, a small K yields good search results on the GLT test instances, which implies that OCEA is not very sensitive to the K value on these instances. Therefore, K = 7 is chosen for the comparison in Section 4. It should be noted that the optimal K depends on the individual problem.

(a) K
(b) β
(c) CR
(d) F
Figure 4: The means and standard deviations of the IGD metric values of approximated fronts obtained by OCEA with different K, β, CR and F values over 22 independent runs on GLT1-GLT6

6.2 Clustering Effectiveness Analysis

The evolution procedure couples naturally with the online clustering procedure in OCEA, so the approximated set is expected to present a clustering effect once the evolution procedure has converged. To justify the effectiveness of the online clustering, Fig. 5 plots the clustering results, in the first three dimensions of the search space, on the GLT1-GLT6 test instances. In the figure, the solutions in different clusters are marked with different colours and symbols. It can be seen that the final approximated sets are clearly partitioned into 7 clusters (note that K is set to 7). This figure indicates that OCEA can indeed approximate the clustering structure effectively.

Figure 5: The clusters of final approximated sets obtained by OCEA for GLT1-GLT6

6.3 Mating Restriction Probability

To test the sensitivity of OCEA's performance to the mating control parameter β, β ∈ {0.5, 0.6, 0.7, 0.8, 0.9} is used for the analysis. The remaining parameters are the same as in Section 4.2. Again, for each β value, OCEA is run independently 22 times on the test instances. Fig. 4(b) shows the statistics of the obtained IGD metric values.

From Fig. 4(b), it is observable that for GLT5 and GLT6, different β values yield similar performance, whereas for GLT1-GLT4, OCEA performs very differently under different β values. Nevertheless, with the β value adopted in this paper, OCEA performs relatively better on all the instances. The observation from Fig. 4(b) indicates that OCEA is not very sensitive to the setting of β on the GLT test instances; that β value is therefore used in the controlled comparison experiments in Section 4. Again, it should be pointed out that the optimal β setting depends on the problem characteristics.

6.4 Control Parameters of Differential Evolution Operator

The effects of the DE parameters CR and F are evaluated in this section. The values 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 and 1 are chosen for the analysis, and the remaining parameters are the same as in Section 4.2. When one of CR and F is varied, the other is held fixed (the fixed values being 1 and 0.6). The means and standard deviations of the IGD metric values obtained by OCEA with different CR or F values over 22 independent runs are shown in Figs. 4(c) and 4(d).

Fig. 4(c) shows that the CR value has a crucial effect on the OCEA performance for GLT1 and GLT3-GLT4, and that a large CR value leads to good performance; for GLT2 and GLT5-GLT6, different CR settings do not affect the performance acutely. Fig. 4(d) shows that the F value has a significant effect on OCEA for GLT1-GLT4, and that a small F value is better, while OCEA performs rather stably on GLT5-GLT6 under different F values. With appropriate CR and F settings, OCEA can always find good IGD metric values on all the GLT test instances. In general, Figs. 4(c) and 4(d) indicate that OCEA is not very sensitive to the CR and F settings.

7 Conclusion

This paper presented the first MOEA that incorporates online clustering to address the non-stationary nature of the evolutionary search. The underlying considerations are 1) to learn the manifold structure of the PS (i.e. the so-called regularity property of MOPs) through clustering, and 2) to adapt to the non-stationary search dynamics. The online agglomerative clustering approach developed in [9] is modified to accommodate the evolutionary search dynamics. The experimental study has shown that the online clustering handles the non-stationary search process well and is able to adaptively learn the clustering structure of the PS. The comparison against five well-known MOEAs has also shown that the structures learned adaptively by the online clustering can indeed improve the search efficiency (in terms of search speed) and effectiveness (in terms of the quality of the final approximated sets and fronts). Future work includes 1) the development of intelligent recombination operators that fit well within the online learning mechanism; 2) the development and/or incorporation of other online learning strategies; and 3) the study of the developed framework on many-objective optimisation problems.

Footnotes

  1. A process is stationary if and only if the joint distributions of the data at different times are the same. Specifically, let t = 1, 2, ... index the generations of the evolution and x_t be an n-dimensional solution. The sequence {x_t} is a stationary stochastic process if the joint probability distributions of (x_{t_1}, ..., x_{t_k}) and (x_{t_1+τ}, ..., x_{t_k+τ}) are the same for all τ and an arbitrary selection of t_1, ..., t_k. This is obviously not the case for the stream of offspring created during the evolution process.
  2. In computer science, online machine learning methods learn patterns from data which are available in a sequential order as opposed to batch learning techniques which generate the best predictor by learning on the entire training data set at once.
  3. The definition of domination here is for minimisation; "dominate" means "be better than".
  4. It is well acknowledged that mixtures of Gaussian distributions are dense in the set of probability distributions with respect to weak topology [25].

References

  1. K. Miettinen, Nonlinear Multiobjective Optimization.   Boston, USA: Kluwer Academic Publishers, 1999.
  2. K. Deb, Multi-Objective Optimization Using Evolutionary Algorithms.   New York, USA: John Wiley & Sons, 2001.
  3. A. Zhou, B.-Y. Qu, H. Li, S.-Z. Zhao, P. N. Suganthan, and Q. Zhang, “Multiobjective evolutionary algorithms: A survey of the state of the art,” Swarm and Evolutionary Computation, vol. 1, no. 1, pp. 32–49, 2011.
  4. H. Zhang, S. Song, and X. Z. Gao, “Self-organizing multiobjective optimization based on decomposition with neighborhood ensemble,” Neurocomputing, 2015.
  5. X. Yu and M. Gen, Introduction to Evolutionary Algorithms.   London, UK: Springer-Verlag, 2010.
  6. Q. Zhang, A. Zhou, and Y. Jin, “RM-MEDA: A regularity model based multiobjective estimation of distribution algorithm,” IEEE Transactions on Evolutionary Computation, vol. 12, no. 1, pp. 41–63, 2008.
  7. C. Hillermeier, Nonlinear Multiobjective Optimization-A Generalized Homotopy Approach.   Basel, Switzerland: Birkhäuser Verlag, 2001.
  8. Y. Wang, J. Xiang, and Z. Cai, “A regularity model-based multiobjective estimation of distribution algorithm with reducing redundant cluster operator,” Applied Soft Computing, vol. 12, no. 11, pp. 3526–3538, 2012.
  9. I. D. Guedalia, M. London, and M. Werman, “An on-line agglomerative clustering method for nonstationary data,” Neural computation, vol. 11, no. 2, pp. 521–540, 1999.
  10. K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, “A fast and elitist multiobjective genetic algorithm: NSGA-II,” IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, pp. 182–197, 2002.
  11. E. Zitzler, M. Laumanns, and L. Thiele, “SPEA2: Improving the strength pareto evolutionary algorithm,” Swiss Federal Institute Technology, Zurich, Switzerland, Tech. Rep., 2001.
  12. D. W. Corne, N. R. Jerram, J. D. Knowles, M. J. Oates, and J. Martin, “PESA-II: Region-based selection in evolutionary multiobjective optimization,” in Proceedings of the 2nd Annual Genetic and Evolutionary Computation Conference.   San Francisco, California, USA: Morgan Kaufmann Publishers, 2001, pp. 283–290.
  13. K. Deb and H. Jain, “An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part i: Solving problems with box constraints,” IEEE Transactions on Evolutionary Computation, vol. 18, no. 4, pp. 577–601, 2014.
  14. N. Beume, B. Naujoks, and M. Emmerich, “SMS-EMOA: multiobjective selection based on dominated hypervolume,” European Journal of Operational Research, vol. 181, no. 3, pp. 1653–1669, 2007.
  15. J. Bader and E. Zitzler, “HypE: An algorithm for fast hypervolume-based many-objective optimization,” Evolutionary Computation, vol. 19, no. 1, pp. 45–76, 2011.
  16. D. Phan and J. Suzuki, “R2-IBEA: R2 indicator based evolutionary algorithm for multiobjective optimization,” in Proceedings of the 2013 IEEE Congress on Evolutionary Computation.   New York, USA: IEEE, 2013, pp. 1836–1845.
  17. C. A. Rodríguez Villalobos and C. A. Coello Coello, “A new multi-objective evolutionary algorithm based on a performance assessment indicator,” in Proceedings of the 14th Annual Genetic and Evolutionary Computation Conference.   New York, USA: ACM, 2012, pp. 505–512.
  18. H. Li and Q. Zhang, “Multiobjective optimization problems with complicated Pareto sets, MOEA/D and NSGA-II,” IEEE Transactions on Evolutionary Computation, vol. 13, no. 2, pp. 284–302, 2009.
  19. H.-L. Liu, F. Gu, and Q. Zhang, “Decomposition of a multiobjective optimization problem into a number of simple multiobjective subproblems,” IEEE Transactions on Evolutionary Computation, vol. 18, no. 3, pp. 450–455, 2014.
  20. Q. Zhang and H. Li, “MOEA/D: A multiobjective evolutionary algorithm based on decomposition,” IEEE Transactions on Evolutionary Computation, vol. 11, no. 6, pp. 712–731, 2007.
  21. K. Li, Q. Zhang, S. Kwong, M. Li, and R. Wang, “Stable matching-based selection in evolutionary multiobjective optimization,” IEEE Transactions on Evolutionary Computation, vol. 18, no. 6, pp. 909–923, 2014.
  22. A. Zhou, Q. Zhang, and Y. Jin, “Approximating the set of pareto-optimal solutions in both the decision and objective spaces by an estimation of distribution algorithm,” IEEE Transactions on Evolutionary Computation, vol. 13, no. 5, pp. 1167–1189, 2009.
  23. Y. Li, X. Xu, P. Li, and L. Jiao, “Improved RM-MEDA with local learning,” Soft Computing, vol. 18, no. 7, pp. 1–15, 2013.
  24. K. Li and S. Kwong, “A general framework for evolutionary multiobjective optimization via manifold learning,” Neurocomputing, vol. 146, pp. 65–74, 2014.
  25. A. Bacharoglou, “Approximation of probability distributions by convex mixtures of gaussian measures,” Proceedings of the American Mathematical Society, vol. 138, no. 7, pp. 2619–2628, 2010.
  26. K. Price, R. M. Storn, and J. A. Lampinen, Differential Evolution: A Practical Approach to Global Optimization.   Heidelberg, Berlin, Germany: Springer-Verlag, 2006.
  27. X. Qiu, J. Xu, K. Tan, and H. Abbass, “Adaptive cross-generation differential evolution operators for multi-objective optimization,” IEEE Transactions on Evolutionary Computation, vol. PP, no. 99, pp. 1–1, 2015.
  28. E. Zitzler, L. Thiele, M. Laumanns, C. M. Fonseca, and V. G. Da Fonseca, “Performance assessment of multiobjective optimizers: An analysis and review,” IEEE Transactions on Evolutionary Computation, vol. 7, no. 2, pp. 117–132, 2003.
  29. H.-L. Liu, F. Gu, and Y. Cheung, “T-MOEA/D: MOEA/D with objective transform in multi-objective problems,” in Proceedings of the 2010 International Conference of Information Science and Management Engineering, vol. 2.   New York, USA: IEEE, 2010, pp. 282–285.
  30. E. Zitzler and L. Thiele, “Multiobjective evolutionary algorithms: A comparative case study and the strength pareto approach,” IEEE Transactions on Evolutionary Computation, vol. 3, no. 4, pp. 257–271, 1999.
  31. S. Huband, L. Barone, L. While, and P. Hingston, “A scalable multi-objective test problem toolkit,” in Proceedings of the 3rd International Conference on Evolutionary Multi-Criterion Optimization.   Heidelberg, Berlin, Germany: Springer-Verlag, 2005, pp. 280–295.