Learning from NonStationary Stream Data in Multiobjective Evolutionary Algorithm
Abstract
Evolutionary algorithms (EAs) have been well acknowledged as a promising paradigm for solving optimisation problems with multiple conflicting objectives in the sense that they are able to locate a set of diverse approximations of Pareto optimal solutions in a single run. EAs drive the search for approximated solutions through maintaining a diverse population of solutions and by recombining promising solutions selected from the population. Combining machine learning techniques has shown great potentials since the intrinsic structure of the Pareto optimal solutions of an multiobjective optimisation problem can be learned and used to guide for effective recombination. However, existing multiobjective EAs (MOEAs) based on structure learning spend too much computational resources on learning. To address this problem, we propose to use an online learning scheme. Based on the fact that offsprings along evolution are streamy, dependent and nonstationary (which implies that the intrinsic structure, if any, is temporal and scalevariant), an online agglomerative clustering algorithm is applied to adaptively discover the intrinsic structure of the Pareto optimal solution set; and to guide effective offspring recombination. Experimental results have shown significant improvement over five stateoftheart MOEAs on a set of wellknown benchmark problems with complicated Pareto sets and complex Pareto fronts.
Multiobjective Optimization Evolutionary Algorithms Machine Learning Online Agglomerative Clustering
1 Introduction
\IEEEPARstartIn practice, a decision maker often requires to consider optimising multiple conflicting objectives. This type of optimisation problems are usually referred to as multiobjective optimisation problems (MOPs). Since the objectives of the problems usually conflict with each other, there does not exist a unique solution that can optimise all the objectives simultaneously. Therefore, a set of Pareto optimal solutions, named as Pareto set (PS), exists for an MOP [1]. A solution is considered to be ‘Pareto optimal’ if it is impossible to make any one objective better off without making at least another one worse off. Finding the PS often challenges greatly on computational capacity and algorithm intelligence [2].
In the last three decades, extensive research on evolutionary algorithms (EAs) have shown that the EA paradigm is very powerful in handling MOPs, in the sense that a set of solutions that approximates to the PS, named as approximated set, can be obtained in a single run without requiring much computational effort [3][4]. EAs simulate the genetic evolution of a population of individuals to best fit their living environment [5]. To design an effective EA, effective recombination for fit offspring generation is a key. Research has shown that a problem’s domain knowledge, if any, can greatly improve the search efficiency if the knowledge is properly collected or learned during the search process [6].
For an objective optimisation problem, it has been proved that the distribution of the PS exhibits an ()dimensional manifold structure under mild conditions [7]. This property is often referred to as the regularity property. From the point view of EA design, an effective EA is expected if the manifold structure can be discovered and applied for offspring generation. Some EAs have been developed to combine machine learning techniques for the discovery of the intrinsic manifold structure to aid the search for the PS. For examples, in regularity modelbased estimation of distribution algorithm (RMMEDA) [6], the local principal component analysis (local PCA) approach is applied at each generation. It uses the learned principal components to approximate the manifold structure. Some EAs adopted other machine learning techniques to approximate the manifold structure [8]. All these algorithms apply the machine learning techniques at every generation. These learning algorithms often need to visit all data several times (iterations) until converge. Thus, a considerable amount of computational resources is consumed on learning.
To reduce the computational overhead, the multiobjective EA (MOEA) proposed by Zhang et al. [4] couples the population evolution and the model inference. In their MOEA, only one iteration of the learning algorithm is applied at each generation. This scheme provides an important development on saving computational resources. The evolution procedure can also be seen as a learning procedure; intrinsic PS’s structure of an MOP is expected to be learned dynamically from the changing candidate solutions. However, there is a fundamental issue in this scheme. As well known, one of the main assumptions in machine learning is that sample observations are assumed to be effectively i.i.d. (independent and identically distributed) for the purposes of statistical inference. But, under the scheme in [4], along the evolution procedure, the assumption is largely violated. First, solutions at adjacent generations have rather different qualities in terms of their respective objectives, which indicate that they might not be sampled from the same underlying distribution (i.e. these solutions are not identically distributed). Second, the generation of new solutions at present generation depends on collective information from previous generation, which indicates solutions at adjacent generations are not independent.
Look deeply into the data (i.e. offsprings created during the evolution search) we try to learn from, some special characteristics can be observed: 1) the structure to be discovered along evolution is temporal and changing dynamically. In other words, these data are produced by a nonstationary process
In this paper, we present the firstever MOEA based on an online machine learning
The rest of the paper is organised as follows. The background and previous work on multiobjective evolutionary algorithms is introduced in Section 2. Section 3 presents the proposed algorithm in detail. Experimental studies are shown in Section 4 and 5. The analysis of parameters effect to algorithmic performance is discussed in Section 6. Section 7 concludes the paper.
2 Background and Previous Work
A boxconstrained continuous MOP can be stated as follows:
(1) 
where defines the decision (search) space; and are the lower and upper boundaries of variable , respectively; is a vector of decision variable; represents the mapping from search space to objective space where objective functions are to be considered.
Suppose that are two vectors. If for all , but there exists at least one index , such that , then is said to dominate
Great efforts have been made to deal with MOPs in the evolutionary computation community [3]. These developed approaches focus either on establishing a mechanism to balance convergence and diversity, or on developing effective recombination.
MOEAs concerning the balance between convergence and diversity basically fall into three categories. In the first category, the Pareto dominance relationship is applied for promising solution selection. The nondominated sorting developed by Deb et al. [10] is the most known method. Its primary use is to drive the search towards the PF which favours convergence. It needs to incorporate other strategies, such as crowding distance [10] and Knearest neighbor method [11], to preserve the population diversity. It has been found out that dominancebased sorting method is not able to provide enough comparability for manyobjective ( objectives) optimization problems. Typical dominancebased MOEAs include NSGAII [10], SPEA2 [11], PESAII [12], NSGAIII [13], and others.
In the second category, MOEAs based on performance metrics, such as hypervolume (HV), R2 and , were developed. The performance metrics embed the convergence and diversity requirements together so that they can be employed to directly guide the selection of solutions for a good balance of convergence and diversity. Representative MOEAs include SMSEMOA [14], HyPE [15], R2IBEA [16] and DDE [17]. The computation of the performance metrics becomes much more difficult and timeconsuming in dealing with manyobjective optimisation problems.
The third category is the decompositionbased MOEAs. In this category, a number of reference vectors in the objective space are used to decompose the problem into a set of single objective subproblems [18], or several simple multiobjective subproblems [19]. The convergence is controlled by the objective values of the subproblems; while the diversity is managed by computing the distances of the solutions to the reference vectors. Representative decompositionbased MOEAs include MOEA/D [20], MOEA/DDE [18], MOEA/DSTM [21], MOEA/DM2M [19] and others.
Regarding MOEAs focusing on effective recombination, they are almost all designed based on the regularity property of MOPs. The underlying assumption is that the manifold structure could be used to greatly improve the search efficiency since highquality offsprings can be generated if the regularity structure is properly modelled and learned. The first work on applying the regularity property in designing MOEA, i.e., aforementioned RMMEDA, was proposed in 2008 [6], where the manifold structure is approximated by the first principal components. This work was improved later by using help from the modelling on the PF [22]. Various regularity based MOEAs have been developed since then, such as a reducing redundant cluster based RMMEDA [8], a RMMEDA with local learning strategy [23], evolutionary multiobjective optimisation via manifold learning [24], and others. Moreover, in [4], a selforganising map method is incorporated within the evolution procedure to search for the manifold PS structure.
3 The Algorithm
As discussed previously, existing regularity based MOEAs usually spend a high computational cost on learning. To reduce the consumption of computational resources, we propose to adopt an online machine learning scheme. Offsprings are considered as a stream of data since they come in order along the evolution process, and can only be accessed once or a small number of generations. Moreover, it is observed that along the evolution process, the stream of solutions is dependent, and nonstationary. Therefore, the application of online learning algorithm is able to reduce the number of visits and account for the nonstationary nature. This can significantly reduce the computational resources.
Note that a finite mixture of Gaussian clusters can be used to well approximate the distribution of a set of data points statistically.
In the following, we first describe the online agglomerative clustering algorithm developed in [9] and discuss how it should be modified to adapt to the evolution process of MOEAs. The other details of the developed algorithm are then presented.
3.1 Online Agglomerative Clustering
AddC, presented by Guedalia et al. [9] in 1998, is developed for clustering a stream of nonstationary data. AddC’s clustering procedures are shown in Alg. 1. From line 1 to 2, an arriving new data point is assigned to the cluster that is closest to it at first. This step attempts to minimise the within cluster variance. Afterwards, from line 3 to 6, if there are less than clusters, is employed as a centroid to create a new cluster; otherwise, from line 4 to 6, two redundant clusters which are closest to each other are merged, and is also treated as a centroid to create a new cluster for replacing the redundant cluster (i.e. in line 6). The merging operation is aimed to maximise the distances between the centroids and to remove redundant clusters. The creation of new clusters is to consider the temporal changes in the distribution of the data. In line 7, if there still exist data points to be clustered, the clustering operations are repeated. Otherwise, a post process is conducted to remove clusters with a negligible number () of data in line 8. The post process is to eliminate outliers if any.
3.2 Algorithmic Framework
The framework of OCEA is presented in Alg. 2. In line 1 to 2, an initial population is yielded, an external archive is initialised to be the same as . In the first generation, each solution is considered as a cluster where itself is initialised to be the centroid and counter , . Afterwards, at each generation, an offspring is generated around each solution (lines 7 to 9). To generate , a mating control parameter is applied to balance exploration and exploitation. With , the solution generation will be in favour of exploitation. That is, the reference (or parent) solutions are chosen from the cluster that locates. With , the reference individuals are chosen from the global mating pool specified in line 5. This is to favour exploration. After recombination, the generated offspring is then used to update external archive and current population by environmental selection, and the clustering information (lines 9 and 11).
The solution generation and the updating procedures for population and clusters will be described in the following subsections.
3.3 New Solution Generation
In this paper, the differential evolution (DE) and polynomial mutation (PM) operators are adopted to generate offsprings as presented in Alg. 3. The recombination operator takes the current solution and its mating pool as input and outputs an offspring . DE [26] is firstly used to generate a trial solution (line 2), a repair mechanism is employed to correct any component that is outside the search boundary of that component (line 3). After repair, the PM [2] operator is applied to generate a new solution (line 4). The new solution is repaired again if necessary and the final solution is returned (line 5).
In Alg. 3, and are the two control parameters for the DE operator, and are the parameters for the PM operator. If , the DE operator in Alg. 3 is rotation invariant, which is of advantage to deal with complicated PS [18]. Therefore DE is selected to generate new offsprings in OCEA. Obviously, the use of other recombination operators is not limited; e.g. we could use the recombination operators in [27].
3.4 Updating on Population and Clusters
In Alg. 2 line 9, function Esco is applied to carry out environmental selection and clustering updating. OCEA adopts the environmental selection method proposed in SMSEMOA [14] which is based on the hypervolume metric. The hypervolume metric is the only known unitary metric that is Pareto compliant [28]. It has shown better performance over decompositionbased and Pareto dominancebased environmental selection approaches [15].
Regarding cluster updating, we modify the online agglomerative clustering algorithm AddC (Alg. 1) so that it can be fitted into the evolutionary search mechanism. The modified AddC is fused in OCEA to update/refine the clusters to adaptively learn the PS’s structure.
Alg. 4 presents the details of Esoc. For each new solution , is updated by the hypervolume metric based environmental selection. Specifically, the fast nondominanted sorting approach proposed in NSGAII [10] is applied to partition the external archive into nondominanted fronts , where is the best front and is the worst one (line 1). which indicates that there are more than one front in . If it is the case, the solution in with the largest value is removed, where denotes the number of solutions in that dominates . Otherwise, if , the solution that least contributes to the hypervolume, i.e. (line 3 to 5, and 14), is excluded. The calculation of can be found in [14].
If is kept in after environmental selection, i.e., , the online clustering operation is invoked. First, is removed from its cluster , and its cluster’s centroid and counter are updated following equations in line 12. It differs from AddC where no data points are to be removed during the online clustering process. Then is taken as a new centroid to construct a new cluster (line 15). If there are more than clusters in , two clusters that are closest to each other are emerged (lines 16 to 18) to complete the clustering operation.
3.5 Notes on OCEA
It is necessary to emphasize that:

The evolution procedure of OCEA is also an online clustering procedure working on a stream of offsprings which are created and updated during the evolution process. We would expect that the clustering structure is to be gradually emerged during evolution and finally gets well shaped at termination.

Different from the original AddC (Alg. 1), (a) the clustering procedure in OCEA starts from the initial clusters composed of the solutions in the initial population (line 2 in Alg. 2); (b) During the evolution, some solutions are dominated and need to be removed. An extra operation is added to account for the removal of solutions, including the updating of cluster statistics and the discarding of any empty cluster (lines 8 to 13 in Alg. 4); (c) In our online clustering procedure, is not assigned to its closest cluster as opposed to Alg. 1 where a new data need to be assigned to its closest cluster (line 15 in Alg. 4).

OCEA incorporates the online clustering tightly within the evolution search. The online clustering discovers adaptively the PS structure along with the evolution. New solutions are created taking the cluster information into account at each generation. As a result, it can be seen that the online clustering closely adapts to the search procedure; and accounts for the nonstationary of the evolution dynamics.

Different from existing regularity modelbased MOEAs in which the learning at each generation has a time complexity linearly to the number of training iterations. The number of generations should be large enough to make sure the convergence of the learning algorithm. On the contrary, in our scheme, each solution is visited only once. This can significantly reduce the computational burden.

In our scheme, we do not require a postprocess which is different from the original AddC algorithm.
4 Experimental Study
To investigate the performance of OCEA, it is compared with two decompositionbased MOEAs (MOEA/DDE [18] and TMOEA/D [29]), one regularity model based MOEA (RMMEDA [6]), one popular performance metric based MOEA (SMSEMOA [14]), and one typical Pareto dominance based MOEA (NSGAII [10]).
Among these algorithms, MOEA/DDE decomposes the MOP into a set of singleobjective problems with uniformly distributed weights. It might be not able to obtain approximated fronts with good diversity for MOPs with complex PFs. TMOEA/D transforms the objective functions into those that are easy to be addressed by MOEA/D. This is to make MOEA/D perform well on MOPs with complex PFs. RMMEDA is developed based on the regularity property. It learns some local principle components at each generation, and uses the principle components to approximate the manifold structure. SMSMOEA uses the hypervolume metric as the selection criterion. NSGAII, on the other hand, uses the Pareto dominance relationship among individuals and crowding distance to carry out environmental selection. These comparison algorithms cover all the main streams of MOEAs in the literatures.
4.1 Test Instances and Performance Metrics
MOPs with complex PF and complicated PS structures are particularly focused in this paper. The GLT test suite from [4] are used in the comparison experiments. The test suite includes a variety of problems with various characteristics that challenge MOEAs greatly. Those characteristics include disconnected PF, convex PF, nonlinear variable linkage, etc.
Two commonlyused performance metrics, inverted generational distance (IGD) [6] and hypervolume (HV) [30], are employed to measure the algorithm’s performance. These two metrics can measure both the convergence and diversity of the final approximated fronts found by MOEAs. Lower IGD and larger HV metric values imply better performance of MOEAs.
To calculate the HV metric value of an approximated front, a reference point which can be dominated by all the objective vectors in the final approximated front need to be set. The reference points chosen for the test instances are as follows: for GLT1, , for GLT2 , for GLT3 and for GLT4 , for GLT5GLT6, .
4.2 Experimental Settings
It has been well acknowledged that for the GLT test instances, the DE and PM operators are more able to produce promising solutions than other operators [4]. Therefore, to make a fair comparison, the recombination operators in NSGAII and SMSEMOA are replaced by the DE and PM operators used in this paper. Furthermore, all parameters in the experiments are adjusted through preliminary experiments for optimal performance on these test instances. All algorithms are implemented in Matlab and tested in the same computer. The parameter settings for these algorithms are as follows:

Common parameters:

population size: for biobjective and 105 for triobjective instances;

search space dimension: for GLT1GLT6;

runs: each algorithms independently runs each test instance for 33 times;

termination: maximum evolutionary generation .


Parameters for OCEA:

maximum number of clusters allowed: ;

mating control parameter: ;

DE control parameters: ;

PM control parameters: .


Parameters for MOEA/DDE:

neighbourhood size: ;

mating control parameter: ;

maximum number of solutions to be replaced by an offspring: 2;

DE control parameters: ;

PM control parameters: .


Parameters for TMOEA/D:

neighbourhood size: ;

generations for the first stage: ;

generations for the second stage: ,
; 
DE control parameters: .


Parameters for RMMEDA:

number of clusters in local PCA: 5;

maximum iterations used in local PCA: 50;

sampling extension ratio: 0.25.


Parameters for NSGAII and SMSEMOA:

DE control parameters: ;

PM control parameters: .

To get statistically sound conclusions in the experiments, each algorithm independently runs 33 times for each instance, and the comparisons are performed based on the statistics of the performance metric values, i.e., mean and standard deviation values. In the comparison table, the mean IGD and HV metric values for each instance are sorted in an ascending and descending order, respectively, and the ranks are given in the square brackets of the table. The best mean metric values are highlighted in bold face with gray background. The Wilcoxon’s rank sum test at a 5% significance level is also performed to test the significance of differences between the mean metric values of each instance obtained by each pair of algorithms. In the tables, “”, “”, and “” are used to denote that the mean metric values obtained by OCEA is better than, worse than, or similar to those achieved by the comparison algorithm, respectively.
4.3 Comparison Study
To study the statistical performance of OCEA, Table 1 shows the statistics of IGD and HV metric values obtained by MOEA/DDE, TMOEA/D, RMMEDA, NSGAII, SMSEMOA and OCEA on the GLT test suite averaged over 33 independent runs. In general, OCEA obtains 8 out of 12 best mean metric values, while the rest algorithms only obtain 4. According to the mean ranks, the algorithms’ performance ranked from the best to the worst are OCEA, RMMEDA, TMOEA/D, SMSEMOA, NSGAII and MOEA/DDE. Specifically, according to the Wilcoxon’s rank sum test, in the 12 comparisons with each of MOEA/DDE, TMOEA/D, RMMEDA, NSGAII and SMSEMOA, OCEA achieves 12, 11, 11, 12, 11 better, 0, 1, 1, 0, 0 worse, and 0, 0, 0, 0, 1 similar mean metric values, respectively. Table 1 denotes that OCEA performs the best overall on the GLT test suite.
Instance  MOEA/DDE  TMOEA/D  RMMEDA  NSGAII  SMSEMOA  OCEA 

IGD  
GLT1  7.042e03[3]  5.472e03[2]  1.324e02[4]  1.550e02[5]  2.100e02[6]  2.041e03[1] 
GLT2  3.569e01[6]  3.636e02[2]  3.327e02[1]  4.146e02[4]  4.955e02[5]  3.720e02[3] 
GLT3  3.829e02[6]  2.212e02[5]  2.018e02[4]  1.405e02[2]  1.929e02[3]  6.432e03[1] 
GLT4  1.985e02[2]  4.462e02[5]  4.147e02[4]  4.106e02[3]  5.505e02[6]  5.769e03[1] 
GLT5  8.079e02[6]  4.409e02[3]  5.130e02[4]  6.419e02[5]  3.018e02[2]  2.942e02[1] 
GLT6  5.582e02[6]  4.059e02[4]  3.835e02[3]  5.384e02[5]  2.225e02[2]  2.223e02[1] 
HV  
GLT1  3.367e+00[2]  3.366e+00[3]  3.316e+00[4]  3.312e+00[5]  3.297e+00[6]  3.369e+00[1] 
GLT2  1.943e+01[6]  1.977e+01[2]  1.970e+01[5]  1.972e+01[3]  1.972e+01[4]  1.981e+01[1] 
GLT3  3.941e+00[6]  3.943e+00[5]  3.944e+00[4]  3.946e+00[2]  3.946e+00[3]  3.948e+00[1] 
GLT4  4.980e+00[2]  4.869e+00[6]  4.961e+00[3]  4.954e+00[4]  4.953e+00[5]  4.993e+00[1] 
GLT5  7.939e+00[5]  7.958e+00[3]  7.951e+00[4]  7.939e+00[6]  7.968e+00[2]  7.969e+00[1] 
GLT6  7.937e+00[5]  7.947e+00[4]  7.948e+00[3]  7.933e+00[6]  7.960e+00[2]  7.961e+00[1] 
Mean Rank  4.583  3.667  3.583  4.167  3.833  1.167 
//  12/0/0  11/1/0  11/1/0  12/0/0  11/0/1 
To observe the search efficiency of OCEA, Fig. 1 shows the evolution of the statistics of the IGD metric values obtained by the six algorithms on GLT1GLT6. From the figure, it can be seen that for GLT1 and GLT3GLT6, OCEA reaches the fastest to the lowest mean IGD metric values. For GLT2, OCEA has the slower, similar and faster speed in comparison with RMMEDA, TMOEA/D and the other algorithms, respectively. Moreover, when dealing with GLT2, OCEA actually performs better than RMMEDA at the early stage compared with RMMEDA. From the evolution of the standard deviations of the metrics, it also can be observed that within 300 generations, OCEA has achieved robust performance on all the instances except for GLT3. Fig. 1 indicates that OCEA approaches the fastest to the PFs and maintains the most diverse populations among the comparison algorithms on average.
To reveal the search processes, Fig. 2 plots the evolution of the approximated fronts obtained by RMMEDA, NSGAII, MOEA/DDE and OCEA on GLT4. It is noted that the evolution of the approximated front obtained by each algorithm plotted in the figure is representative. The representative evolution of an algorithm here indicates the final approximated front yielded by the evolution is with the median IGD metric value in 33 independent runs. It can be seen from the figure that, at the 100th generation, the approximated front yielded by OCEA has reached the PF completely, and almost covered the whole PF. After 300 generations, it has reached the approximated front with excellent convergence and diversity. On the other hand, after 300 generations, the final approximated fronts obtained by RMMEDA, NSGAII, MOEA/DDE still cannot cover the whole PF, are not distributed unevenly. Fig. 2 shows that OCEA can indeed greatly improve the search efficiency.
To further investigate the effect of OCEA, Fig. 3 plots the final approximated fronts obtained by RMMEDA and OCEA on GLT1GLT6. All the final approximated fronts of each instance obtained by RMMEDA and OCEA, are plotted in Fig. a and b. The final approximated front of each instance with median IGD metric value (called representative front) obtained by RMMEDA and OCEA, respectively, over 33 independent runs are plotted in Fig. c and d. From Fig. a and b, it can be seen that through 33 independent runs, the final approximated fronts of each instance achieved by RMMEDA and OCEA, respectively, both can cover the whole PF of that instance. However, compared with RMMEDA, OCEA performs more stably. From Fig. c and d, it is observed that the representative fronts of GLT5GLT6 yielded by RMMEDA do not reach the PFs. For GLT1GLT4, although the representative fronts yielded by RMMEDA all reach the PFs, the PFs are not completely covered. By contrast, the representative fonts obtained by OCEA for each instance all converge to the PFs and distributed well over them. Fig. 3 implies that for the GLT test instances, OCEA is stable and robust in terms of convergence and diversity.
In summary, we may conclude that OCEA has shown an excellent performance for dealing with MOPs with complicated PSs and complex PFs.
5 Further Discussions
5.1 Performance on WFG test suite
To deeply understand the performance of OCEA, OCEA is also applied to the WFG test suite [31] and compared with the five algorithms mentioned above. It is well known that the WFG test instances have complex PFs and are with various complicated characteristics, such as nonseparable, multimodal, degenerate, deceptive, etc. In this section, 9 biobjective WFG test instances with 30 dimensional decision variables are taken as the testbed. The maximum evolutionary generation is set as 450. Through preliminary optimisation over parameters, part of the parameter settings of these algorithms are listed in Table 2; while the rest is the same as in Section 4.2. Again 33 independent runs of these algorithms are carried out on each test instance. Table 3 shows the statistics of the IGD and HV metric values obtained by MOEA/DDE, TMOEA/D, RMMEDA, NSGAII, SMSEMOA and OCEA on the WFG test instances over 33 independent runs.
Algs.  Parameters 

MOEA/DDE  
TMOEA/D  
RMMEDA  
NSGAII  
SMSEMOA  
OCEA 
test instances  MOEA/DDE  TMOEA/D  RMMEDA  NSGAII  SMSEMOA  OCEA 

IGD  
WFG1  1.105e+00[2]  9.327e01[1]  1.169e+00[3]  1.435e+00[5]  1.446e+00[6]  1.306e+00[4] 
WFG2  3.781e02[6]  2.860e02[4]  3.030e02[5]  1.431e02[3]  1.357e02[1]  1.429e02[2] 
WFG3  1.456e02[4]  1.237e02[2]  3.399e02[6]  1.869e02[5]  1.245e02[3]  1.184e02[1] 
WFG4  3.400e02[1]  4.921e02[3]  9.385e02[6]  5.692e02[5]  4.581e02[2]  5.512e02[4] 
WFG5  6.762e02[3]  7.776e02[5]  1.116e01[6]  6.895e02[4]  6.656e02[1]  6.657e02[2] 
WFG6  3.325e01[5]  3.531e01[6]  3.249e01[2]  3.321e01[4]  3.318e01[3]  3.249e01[1] 
WFG7  1.873e02[3]  4.113e02[6]  3.935e02[5]  2.860e02[4]  1.192e02[2]  1.137e02[1] 
WFG8  4.930e02[3]  4.484e02[2]  1.628e01[6]  6.190e02[5]  5.474e02[4]  3.763e02[1] 
WFG9  2.451e01[3]  2.687e01[5]  2.073e01[1]  2.673e01[4]  2.703e01[6]  2.422e01[2] 
HV  
WFG1  5.942e+00[2]  6.718e+00[1]  5.607e+00[3]  4.427e+00[5]  4.398e+00[6]  4.889e+00[4] 
WFG2  1.144e+01[2]  1.145e+01[1]  1.127e+01[6]  1.141e+01[5]  1.143e+01[3]  1.142e+01[4] 
WFG3  1.092e+01[4]  1.093e+01[3]  1.077e+01[6]  1.089e+01[5]  1.094e+01[2]  1.094e+01[1] 
WFG4  8.527e+00[1]  8.337e+00[5]  8.107e+00[6]  8.341e+00[4]  8.420e+00[2]  8.350e+00[3] 
WFG5  8.140e+00[4]  7.815e+00[6]  7.962e+00[5]  8.144e+00[3]  8.197e+00[2]  8.199e+00[1] 
WFG6  6.351e+00[5]  6.245e+00[6]  6.388e+00[2]  6.351e+00[4]  6.355e+00[3]  6.391e+00[1] 
WFG7  8.643e+00[3]  7.979e+00[6]  8.459e+00[5]  8.579e+00[4]  8.666e+00[2]  8.672e+00[1] 
WFG8  8.450e+00[2]  8.274e+00[5]  7.674e+00[6]  8.329e+00[4]  8.373e+00[3]  8.463e+00[1] 
WFG9  6.181e+00[3]  6.038e+00[6]  6.411e+00[1]  6.119e+00[4]  6.114e+00[5]  6.246e+00[2] 
Mean Rank  3.111  4.056  4.444  4.278  3.111  2.000 
//  12/5/1  12/4/2  14/3/1  15/0/3  11/4/3 
Table 3 shows that OCEA achieves 9 out of the 18 best mean metrics. The rest five algorithms obtain only 9. The performance of these algorithms ranked from the best to the worst is OCEA, MOEA/DDE, SMSEMOA, TMOEA/D, NSGAII and RMMEDA according to the mean ranks. The Wilcoxon’s rank sum test suggests that OCEA performs better than MOEA/DDE, TMOEA/D, RMMEDA, NSGAII and SMSEMOA in 12, 12, 14, 15 and 11 out of the 18 mean metric values; performs worse in 5, 4, 3, 0 and 4; and similar in 1, 2, 1, 3 and 3. From Table 3, we may conclude that OCEA performs very well in solving the WFG test instances. It also indicates that OCEA is able to deal with MOPs with complex PFs and with complicated problem characteristics.
6 Parameter Sensitivity Analysis
The sensitivity of OCEA to its parameters is analysed in this section. The GLT test suite is used for the analysis.
6.1 Maximum Number of Clusters
To test how affects the performance of OCEA, 4, 5, 7, 10, 20 are chosen to do analysis. The rest parameters are the same as those in Section 4.2. OCEA was run on each test instances independently 22 times with different values. Fig. a shows the mean and standard deviation values of the IGD metric values obtained by OCEA.
From Fig. a, it can be seen that for GLT2, GLT5GLT6, OCEA can always achieve similar performance robustly for different values. But for GLT1, GLT3GLT4, different leads to relatively large performance differences. Especially, when is large, the performance of OCEA is not well enough. In general, a small can result in good search results by OCEA on the GLT test instances. This implies that OCEA is not very sensitive to the values on the GLT test instances. Therefore, is chosen in Section 4 to carry out the comparison. It should be noted that the optimal depends on individual problem.
6.2 Clustering Effectiveness Analysis
The evolution procedure couples naturally with the online clustering procedure in OCEA. It is expected that the approximated set will present a clustering effect when the evolution procedure has converged. To justify the effectiveness of the online clustering, Fig. 5 plots the clustering results in the first 3dimensional search space on the GLT1GLT6 test instances. In the figure, the solutions in each different cluster are marked with different colors and symbols. It can be seen that the final approximated sets are partitioned into 7 clusters clearly (note that is set as 7). This figure indicates that OCEA can indeed approximate the clustering structure effectively.
6.3 Mating Restriction Probability
To test the sensitivity of the OCEA’s performance to the mating control parameter , 0.5, 0.6, 0.7, 0.8, 0.9 are used for the analysis. The rest parameters are the same as those in Section 4.2. Again, for different value, OCEA independently run 22 time on the test instances. Fig. b shows the statistics of the obtained IGD metric values.
From Fig. b, it is observable that for GLT5 and GLT6, different values bring a similar performance for OCEA; but for GLT1GLT4, OCEA with different values performs very differently. Nevertheless, when , OCEA has relatively better performance for all the instances. The observation in Fig. b indicates that OCEA is not so sensitive to the setting of in solving the GLT test instances. Therefore, is chosen in Section 4 for the controlled comparison experiments. Again, it is necessary to point out that an optimal setting should depend on the problem characteristics.
6.4 Control Parameters of Differential Evolution Operator
The effect of the DE parameters, i.e., and , are to be evaluated in this section. 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1 are chosen to proceed analysis. The rest parameters are the same as in Section 4.2. When different () values are set, () is set as 1 (0.6). The mean and standard deviation values of the IGD metric values obtained by OCEA with different or over 22 independent runs are shown in Figs. c and d.
Fig. c shows that the value has a crucial effect on the OCEA performance for GLT1, GLT3GLT4, and a large value can lead to a good OCEA performance. However, for GLT2, GLT5GLT6, different settings do not affect the OCEA performance acutely. Fig. d shows that the value has a significant effect on OCEA for GLT1GLT4, and a small value is better. But OCEA performs rather stably for GLT5GLT6 with different values. In case () and (), OCEA can always find good IGD metric values for all the GLT test instances. In general, Figs. c and d denote that OCEA is not very sensitive to the and settings.
7 Conclusion
This paper presented a firstever MOEA that incorporate an online clustering to address the nonstationary nature of the evolutionary search. The underlying consideration is 1) to learn the manifold structure of the PS (i.e. the socalled regularity property of MOPs) through clustering; and 2) to adapt to the nonstationary search dynamics. The online agglomerative clustering approach developed in [9] is modified to accommodate the evolution search dynamics. Experimental study has shown that the online clustering can address the nonstationary search process well, and is able to adaptively learn the clustering structure of the PS. The comparison against five wellknown MOEAs has also shown that the structures learned adaptively by the online clustering can indeed improve the search efficiency (in terms of search speed) and effectiveness (in terms of the quality of the final approximated sets and fronts). Future work includes 1) the development of intelligent recombination operators that can be well fitted in the online learning mechanism; 2) the development and/or incorporation of other online learning strategies; and 3) the study of the developed framework for manyobjective optimisation problems.
Footnotes
 A process is stationary if and only if the joint distribution of the data at different time are the same. Specifically, if we let be the generations of the evolution, and be a dimensional solution. The sequence is a stationary stochastic process if the joint probabilistic distribution of and are the same for all and an arbitrary selection of . This is obviously not the case for the stream of offsprings created during the evolution process.
 In computer science, online machine learning methods learn patterns from data which are available in a sequential order as opposed to batch learning techniques which generate the best predictor by learning on the entire training data set at once.
 The definition of domination is for minimization. “Dominate” means “be better than”.
 It is well acknowledged that mixtures of Gaussian distributions are dense in the set of probability distributions with respect to weak topology [25].
References
 K. Miettinen, Nonlinear Multiobjective Optimization. Boston, USA: Kluwer Academic Publishers, 1999.
 K. Deb, MultiObjective Optimization Using Evolutionary Algorithms. New York, USA: John Wiley & Sons, 2001.
 A. Zhou, B.Y. Qu, H. Li, S.Z. Zhao, P. N. Suganthan, and Q. Zhang, “Multiobjective evolutionary algorithms: A survey of the state of the art,” Swarm and Evolutionary Computation, vol. 1, no. 1, pp. 32–49, 2011.
 H. Zhang, S. Song, and X. Z. Gao, “Selforganizing multiobjective optimization based on decomposition with neighborhood ensemble,” Neurocomputing, 2015.
 X. Yu and M. Gen, Introduction to Evolutionary Algorithms. London, UK: SpringerVerlag, 2010.
 Q. Zhang, A. Zhou, and Y. Jin, “RMMEDA: A regularity model based multiobjective estimation of distribution algorithm,” IEEE Transactions on Evolutionary Computation, vol. 12, no. 1, pp. 41–63, 2008.
 C. Hillermeier, Nonlinear Multiobjective OptimizationA Generalized Homotopy Approach. Basel, Switzerland: Birkhäuser Verlag, 2001.
 Y. Wang, J. Xiang, and Z. Cai, “A regularity modelbased multiobjective estimation of distribution algorithm with reducing redundant cluster operator,” Applied Soft Computing, vol. 12, no. 11, pp. 3526–3538, 2012.
 I. D. Guedalia, M. London, and M. Werman, “An online agglomerative clustering method for nonstationary data,” Neural computation, vol. 11, no. 2, pp. 521–540, 1999.
 K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, “A fast and elitist multiobjective genetic algorithm: NSGAII,” IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, pp. 182–197, 2002.
 E. Zitzler, M. Laumanns, and L. Thiele, “SPEA2: Improving the strength pareto evolutionary algorithm,” Swiss Federal Institute Technology, Zurich, Switzerland, Tech. Rep., 2001.
 D. W. Corne, N. R. Jerram, J. D. Knowles, M. J. Oates, and J. Martin, “PESAII: Regionbased selection in evolutionary multiobjective optimization,” in Proceedings of the 2nd Annual Genetic and Evolutionary Computation Conference. San Francisco, California, USA: Morgan Kaufmann Publishers, 2001, pp. 283–290.
 K. Deb and H. Jain, “An evolutionary manyobjective optimization algorithm using referencepointbased nondominated sorting approach, part i: Solving problems with box constraints,” IEEE Transactions on Evolutionary Computation, vol. 18, no. 4, pp. 577–601, 2014.
 N. Beume, B. Naujoks, and M. Emmerich, “SMSEMOA: multiobjective selection based on dominated hypervolume,” European Journal of Operational Research, vol. 181, no. 3, pp. 1653–1669, 2007.
 J. Bader and E. Zitzler, “HypE: An algorithm for fast hypervolumebased manyobjective optimization,” Evolutionary Computation, vol. 19, no. 1, pp. 45–76, 2011.
 D. Phan and J. Suzuki, “R2IBEA: R2 indicator based evolutionary algorithm for multiobjective optimization,” in Proceedings of the 2013 IEEE Congress on Evolutionary Computation. New York, USA: IEEE, 2013, pp. 1836–1845.
 C. A. Rodríguez Villalobos and C. A. Coello Coello, “A new multiobjective evolutionary algorithm based on a performance assessment indicator,” in Proceedings of the 14th Annual Genetic and Evolutionary Computation Conference. New York, USA: ACM, 2012, pp. 505–512.
 H. Li and Q. Zhang, “Multiobjective optimization problems with complicated Pareto sets, MOEA/D and NSGAII,” IEEE Transactions on Evolutionary Computation, vol. 13, no. 2, pp. 284–302, 2009.
 H.L. Liu, F. Gu, and Q. Zhang, “Decomposition of a multiobjective optimization problem into a number of simple multiobjective subproblems,” IEEE Transactions on Evolutionary Computation, vol. 18, no. 3, pp. 450–455, 2014.
 Q. Zhang and H. Li, “MOEA/D: A multiobjective evolutionary algorithm based on decomposition,” IEEE Transactions on Evolutionary Computation, vol. 11, no. 6, pp. 712–731, 2007.
 K. Li, Q. Zhang, S. Kwong, M. Li, and R. Wang, “Stable matchingbased selection in evolutionary multiobjective optimization,” IEEE Transactions on Evolutionary Computation, vol. 18, no. 6, pp. 909–923, 2014.
 A. Zhou, Q. Zhang, and Y. Jin, “Approximating the set of paretooptimal solutions in both the decision and objective spaces by an estimation of distribution algorithm,” IEEE Transactions on Evolutionary Computation, vol. 13, no. 5, pp. 1167–1189, 2009.
 Y. Li, X. Xu, P. Li, and L. Jiao, “Improved RMMEDA with local learning,” Soft Computing, vol. 18, no. 7, pp. 1–15, 2013.
 K. Li and S. Kwong, “A general framework for evolutionary multiobjective optimization via manifold learning,” Neurocomputing, vol. 146, pp. 65–74, 2014.
 A. Bacharoglou, “Approximation of probability distributions by convex mixtures of gaussian measures,” Proceedings of the American Mathematical Society, vol. 138, no. 7, pp. 2619–2628, 2010.
 K. Price, R. M. Storn, and J. A. Lampinen, Differential Evolution: A Practical Approach to Global Optimization. Heidelberg, Berlin, Germany: SpringerVerlag, 2006.
 X. Qiu, J. Xu, K. Tan, and H. Abbass, “Adaptive crossgeneration differential evolution operators for multiobjective optimization,” IEEE Transactions on Evolutionary Computation, vol. PP, no. 99, pp. 1–1, 2015.
 E. Zitzler, L. Thiele, M. Laumanns, C. M. Fonseca, and V. G. Da Fonseca, “Performance assessment of multiobjective optimizers: An analysis and review,” IEEE Transactions on Evolutionary Computation, vol. 7, no. 2, pp. 117–132, 2003.
 H.L. Liu, F. Gu, and Y. Cheung, “TMOEA/D: MOEA/D with objective transform in multiobjective problems,” in Proceedings of the 2010 International Conference of Information Science and Management Engineering, vol. 2. New York, USA: IEEE, 2010, pp. 282–285.
 E. Zitzler and L. Thiele, “Multiobjective evolutionary algorithms: A comparative case study and the strength pareto approach,” IEEE Transactions on Evolutionary Computation, vol. 3, no. 4, pp. 257–271, 1999.
 S. Huband, L. Barone, L. While, and P. Hingston, “A scalable multiobjective test problem toolkit,” in Proceedings of the 3rd International Conference on Evolutionary MultiCriterion Optimization. Heidelberg, Berlin, Germany: SpringerVerlag, 2005, pp. 280–295.