Whale swarm algorithm with iterative counter for multimodal function optimization
Abstract
Most real-world optimization problems come with multiple global or local optima. Therefore, a growing number of niching metaheuristic algorithms, which are devoted to finding multiple optima in a single run, have been developed to solve such multimodal optimization problems. However, two difficulties urgently need to be solved for most existing niching metaheuristic algorithms: how to set the optimal values of niching parameters for different optimization problems, and how to jump out of local optima efficiently. These two difficulties largely limit their practicality. Based on the Whale Swarm Algorithm (WSA) we proposed previously, this paper presents a new multimodal optimizer named WSA with Iterative Counter (WSA-IC) to address these two difficulties. On the one hand, WSA-IC improves the iteration rule of the original WSA for multimodal optimization, which removes the need to specify different values of the attenuation coefficient for different problems in order to form multiple subpopulations, without introducing any niching parameter. On the other hand, WSA-IC enables the identification of extreme points during iterations by relying on two new parameters (i.e., the stability threshold and the fitness threshold), so as to jump out of located extreme points. The proposed WSA-IC is compared with several niching metaheuristic algorithms on the CEC2015 niching benchmark test functions and five additional classical multimodal functions with high dimensions. The experimental results demonstrate that WSA-IC statistically outperforms the other niching metaheuristic algorithms on most test functions.
keywords:
Whale swarm algorithm, multimodal optimization, metaheuristic algorithm, niching

1 Introduction
Most real-world optimization problems are multimodal gao2017jaya (); tasgetiren2017iterated (); lin2016effective (); ismkhan2017effective (); yi2016parallel (); dahi2016quantum (), i.e., their objective functions often contain multiple global or local optima. If a traditional numerical method is used to solve a multimodal optimization problem, it has to be run many times, locating a different optimum in each run, to pick out the global optima, which takes a lot of time and work. In such a scenario, using metaheuristic algorithms, whether evolutionary algorithms (EAs) or swarm-based algorithms, to solve these problems has become a hot research topic, as they are easy to implement and can converge to solutions that are as good as possible. However, many metaheuristic algorithms, such as the Genetic Algorithm (GA), Particle Swarm Optimization (PSO) and Differential Evolution (DE), are primarily designed to search for a single global optimum, whereas locating multiple global optima is desirable so that engineers can choose the most appropriate solution. In addition, some metaheuristic algorithms easily fall into local optima. Therefore, many techniques have been proposed to help metaheuristic algorithms find as many global optima as possible. These techniques are commonly known as niching methods li2010niching (), which are committed to promoting and maintaining the formation of multiple stable subpopulations within a single population in order to locate multiple optima. Some representative niching methods include crowding de1975analysis (), fitness sharing goldberg1987genetic (), clustering yin1993fast (), restricted tournament selection harik1995finding (), parallelization bessaou2000island (), speciation deb1989investigation (), and population topologies kennedy2002population (). Several of them are presented below; more references and discussions about niching methods can be found in reference li2016seeking ().
Crowding was first proposed by De Jong de1975analysis () to preserve genetic diversity, so as to improve the global search ability of the algorithm for locating multiple optima. In the crowding method, an offspring with better fitness replaces the most similar individual from a subset (i.e., crowd) of the population. The similarity is generally measured by distance, including Hamming distance for binary encoding and Euclidean distance for real-valued encoding thomsen2004multimodal (), which means that the smaller the distance between two individuals is, the higher their similarity is. The individuals of the subset are randomly selected from the population, and the size of the subset is a user-specified parameter called the crowding factor (CF), often set to 2 or 3. However, low CF values lead to replacement errors, i.e., the offspring replaces another individual with little similarity, which lowers the population diversity. To reduce replacement errors, deterministic crowding mahfoud1992crowding () and probabilistic crowding mengshoel1999probabilistic () were proposed, and Thomsen suggested simply setting CF equal to the population size thomsen2004multimodal ().
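As an illustration, the replacement step described above can be sketched as follows. This is a minimal sketch for a minimization problem with real-valued (list) encoding; the function name and signature are our own, not from the cited papers:

```python
import random

def crowding_replace(population, fitness, offspring, f_off, cf=3):
    """De Jong-style crowding sketch (minimization): the offspring competes
    with the most similar member of a randomly drawn crowd of size cf."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    crowd = random.sample(range(len(population)), cf)       # random crowd of cf indices
    nearest = min(crowd, key=lambda i: dist(population[i], offspring))
    if f_off < fitness[nearest]:                            # better fitness wins
        population[nearest] = offspring
        fitness[nearest] = f_off
```

Setting `cf` equal to the population size, as Thomsen suggests, makes the crowd the whole population and removes replacement errors at the cost of more distance computations.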
Goldberg and Richardson goldberg1987genetic () proposed the fitness sharing mechanism, which develops the method of sharing functions to permit the formation of multiple subpopulations. When using this method, the shared fitness of all individuals needs to be calculated according to Eq. 1 before executing the operators of an algorithm, as the operations on the individuals are based on their shared fitness.
$f'_i = f_i / m_i$ (1)
where $f_i$ and $f'_i$ are the original fitness and shared fitness of individual $i$, respectively; $m_i$ is the niche count of individual $i$ with respect to the other individuals, formulated as $m_i = \sum_{j=1}^{N} sh(d_{ij})$, wherein $N$ is the population size and $sh(d_{ij})$ is the sharing function over individuals $i$ and $j$, which is calculated as follows.
$sh(d_{ij}) = \begin{cases} 1 - (d_{ij}/\sigma_{share})^{\alpha}, & d_{ij} < \sigma_{share} \\ 0, & \text{otherwise} \end{cases}$ (2)
where $\alpha$ is a constant, usually set to 1; $d_{ij}$ is the distance between individuals $i$ and $j$; and $\sigma_{share}$ is the sharing distance, usually set to the value of the peak radius. However, this method assumes that all peaks have equal height and width. Obviously, prior knowledge of the fitness landscape is required for this method to set the value of $\sigma_{share}$.
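The sharing computation can be sketched as follows, assuming the standard Goldberg-Richardson form of the sharing function (triangular, with exponent 1) and a maximization problem; the function name is our own:

```python
def shared_fitness(fitness, positions, sigma_share, alpha=1.0):
    """Fitness sharing sketch (maximization): each raw fitness is divided by
    a niche count built from the sharing function over pairwise distances."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    def sh(d):
        return 1.0 - (d / sigma_share) ** alpha if d < sigma_share else 0.0
    shared = []
    for i, fi in enumerate(fitness):
        # niche count m_i; m_i >= 1 because sh(0) = 1 (self-distance)
        m_i = sum(sh(dist(positions[i], positions[j])) for j in range(len(fitness)))
        shared.append(fi / m_i)
    return shared
```

Individuals crowded inside the same niche see their fitness deflated, so selection pressure is spread across peaks.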
Speciation deb1989investigation () is another popular niching technique, which is used to form parallel subpopulations, i.e., species, according to the similarity between individuals. The similarity is also measured by distance, such as Euclidean distance. This niching technique employs one user-specified parameter called the species distance to divide the population into a set of species. The detailed procedure of forming species in every generation is shown below, assuming a maximization problem. The first step is to find the species seeds that dominate their own species. First, an empty seed set is defined to contain the species seeds. The individuals are sorted in decreasing order of fitness, and the first individual of the sorted population is added to the seed set. Then, the remaining individuals are examined one by one in order, and it is determined whether they are within a distance of half the species distance of any species seed in the set; if not, they are added to the set. After all individuals have been traversed, the set has collected all the species seeds. Next comes the step of adding the individuals to their corresponding species. For each species seed, the individuals that are within a distance of half the species distance of it are added to its species; if an individual has already been added to a species, nothing is done. Although the speciation method is able to divide the population into multiple subpopulations, it has a major shortcoming: its parameter, the species distance, is hard to set precisely for different optimization problems. In view of this, inspired by the Multinational Evolutionary Algorithms ursem1999multinational (), Stoean et al. stoean2007disburdening () proposed the "detect-multimodal" mechanism to establish species, which removes the need for the distance parameter. The "detect-multimodal" mechanism utilizes a set of interior points between two individuals to detect whether there is a valley between them in the fitness landscape, so as to determine whether the two individuals track different extreme points.
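The seed-selection and species-assignment steps above can be sketched as follows (a compact sketch for maximization; merging the two passes into one loop over the sorted individuals is our own simplification and gives each individual to the first, i.e., fittest, seed within the radius):

```python
def form_species(population, fitness, species_distance):
    """Speciation sketch (maximization): seeds are picked in decreasing
    fitness order; an individual joins the first seed within a radius of
    species_distance / 2, otherwise it becomes a new seed itself."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    order = sorted(range(len(population)), key=lambda i: -fitness[i])
    seeds, species = [], {}
    for i in order:
        for s in seeds:
            if dist(population[i], population[s]) <= species_distance / 2:
                species[s].append(i)     # join the dominating seed's species
                break
        else:
            seeds.append(i)              # no seed close enough: i is a new seed
            species[i] = [i]
    return seeds, species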
If the fitness values of all the interior points are better than the worse of the two individuals' fitness values, the two individuals are considered to follow the same extreme point, i.e., to be located on the same peak of the fitness landscape, as shown in Fig. 1(a). On the contrary, if there exists at least one interior point whose fitness value is worse than the worse of the two individuals' fitness values, at least one valley is considered to exist between the two individuals, i.e., they are considered to track different extreme points, as shown in Fig. 1(b). The individuals following the same extreme point are added to the same species. Although the "detect-multimodal" mechanism does not utilize a species distance to divide the population into multiple species, it employs another parameter called the "number of gradations", i.e., the number of interior points, which also depends on the problem to be solved.
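The valley test just described can be sketched as follows (a minimal sketch for maximization; the function name and the equally spaced placement of interior points are our own assumptions):

```python
def same_peak(x, y, f, gradations=5):
    """'Detect-multimodal' sketch (maximization): sample interior points on
    the segment between x and y; if any interior fitness drops below the
    worse endpoint fitness, a valley separates the two individuals."""
    worse = min(f(x), f(y))
    for k in range(1, gradations + 1):
        t = k / (gradations + 1)
        interior = [a + t * (b - a) for a, b in zip(x, y)]
        if f(interior) < worse:      # a valley: different extreme points
            return False
    return True                      # no valley found: same peak
```

Note that each call costs `gradations` extra fitness evaluations, which is exactly the overhead that TSC2 (section 2.1) tries to reduce.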
Thus it can be seen that some niching methods need to set parameters that require prior knowledge of the fitness landscape in order to divide the population into multiple subpopulations. However, for some real-world optimization problems, prior knowledge of the fitness landscape is very difficult or almost impossible to obtain li2010niching (). Therefore, these niching methods are difficult to apply to real-world optimization problems. In this paper, a new multimodal optimization algorithm called the Whale Swarm Algorithm with Iterative Counter (WSA-IC), based on our preliminary work in Zeng2017 (), is proposed. By improving the iteration rule of the original WSA for multimodal optimization, WSA-IC removes the need to specify optimal parameter values for different problems to form multiple subpopulations, without introducing any niching parameter. In addition, WSA-IC enables the identification of extreme points, so as to jump out of located extreme points during iterations.
The remainder of this paper is organized as follows. A brief overview of multimodal optimization algorithms is presented in section 2. Section 3 describes the proposed WSA-IC in detail. The experimental setup, covering the compared algorithms, test functions, parameter settings and performance metrics, is presented in section 4. The next section presents the experimental results and analysis used to evaluate WSA-IC. The last section presents the conclusions and topics for further research.
2 Related works
As more and more niching methods have been put forward, a large number of multimodal optimization algorithms combining metaheuristic algorithms with these niching methods have been proposed. In this section, a brief overview of multimodal optimization algorithms is presented. According to whether prior knowledge of the fitness landscape is needed, these multimodal optimization algorithms are classified into prior-knowledge based methods and non-prior-knowledge based methods. More references and discussions about multimodal optimization algorithms can be found in references li2016seeking (); das2011real ().
2.1 Prior knowledge based methods
The Species Conserving Genetic Algorithm (SCGA) was proposed by Li et al. li2002species () by introducing the speciation and species conservation techniques into the classical GA. In each iteration, the current population is partitioned into multiple subpopulations (i.e., species) using the speciation technique deb1989investigation () before executing the genetic operators. Moreover, after executing the genetic operators, all the species seeds are either conserved to the next generation or replaced by better members of the same species, which contributes significantly to the preservation of the global and local optima found so far. Li showed that the additional overhead of SCGA caused by these two techniques is not higher than that introduced by GA with sharing goldberg1987genetic (), and that SCGA performs far better than the Genetic Algorithm with Sharing (SGA) in terms of the success rate of locating the global optima.
Li li2005efficient () proposed the Species-based DE (SDE) algorithm to solve multimodal optimization problems by introducing the speciation technique. In the SDE algorithm, when the number of member individuals of a species is less than a predefined value, the algorithm randomly generates new individuals within the radius of the species seed until the species size reaches the predefined value. Then, each identified species runs the conventional DE algorithm by itself. In addition, if the fitness of an offspring is the same as that of its species seed, this offspring is replaced by a randomly generated new individual. These two mechanisms improve the efficiency of the SDE algorithm significantly.
The speciation technique was also introduced into the conventional PSO by Li li2004adaptively () to solve multimodal optimization problems. In each iteration of Species-based PSO (SPSO), after the population is divided into multiple species and the species seeds are determined, each species seed is assigned to its member individuals as their neighborhood best. Then, each individual updates its position according to the iterative equations of velocity and position in PSO. The experimental results showed that SPSO is comparable to or better than SNGA beasley1993sequential (), SCGA and NichePSO brits2002niching () over a set of multimodal functions.
Stoean et al. stoean2007disburdening () proposed the Topological Species Conservation (TSC) algorithm, which utilizes the "detect-multimodal" mechanism to remove the need for a distance parameter when selecting species seeds and forming species. In the TSC algorithm, all the individuals that track the same extreme point are in the same species, which corresponds to the real structure of the optimization function. The species seeds can also be conserved to the next generation. However, the TSC algorithm needs excessive fitness evaluations in the seed selection procedure, especially when the number of interior points gets larger. To improve the computational efficiency of the TSC algorithm, i.e., to save fitness evaluations, Stoean et al. stoean2010multimodal () proposed the Topological Species Conservation Version 2 (TSC2) algorithm. In the TSC2 algorithm, the current free individual chooses the seeds one by one in ascending order of distance to it to perform the "detect-multimodal" procedure, until the return value is true or this individual is considered a new seed, because the species dominated by a closer seed is more likely to track the same peak as the current free individual. With this method, the TSC2 algorithm saves considerable fitness evaluations. In addition, when the optimization function has a large number of local optima, the TSC algorithm might split the population into too many seeds that would only be conserved to the next generation, which would significantly reduce its search ability. The TSC2 algorithm therefore introduced a maximum number of seeds to guarantee the algorithm's search ability.
Deb and Saha deb2010finding () first converted a single-objective multimodal optimization problem into a bi-objective optimization problem, such that the multiple global and local optima of the original problem become members of the weak Pareto-optimal set of the transformed problem. One of the objectives of the transformed problem is the objective function of the original problem. For the second objective, a gradient-based approach is first employed, based on the property that the derivative of the objective function at a minimum point is equal to zero. However, the derivatives of the objective function at maximum and saddle points are also equal to zero, and the objective functions of some optimization problems may be non-differentiable at the minimum points. Therefore, more pragmatic neighborhood-count based approaches were developed for establishing the second objective, which is set to the number of neighboring solutions that are better than the current solution in terms of their objective function values. During iterations, the non-dominated ranks of different solutions rely on two parameters that are used to distinguish two optima.
2.2 Nonprior knowledge based methods
Thomsen thomsen2004multimodal () proposed the Crowding-based DE (CDE) algorithm by introducing the crowding method into conventional DE for multimodal function optimization. In the CDE algorithm, the similarity of two individuals is measured by the Euclidean distance between them. The fitness value of an offspring is only compared with that of the most similar individual in the current population, and the offspring replaces the most similar individual if it has better fitness. This replacement scheme keeps the population diverse in the search space, which contributes greatly to the location of multiple optima. Thomsen showed that the CDE algorithm performs better than a fitness sharing DE variant over a group of multimodal functions.
History-based topological speciation (HTS) was proposed by Li and Tang li2015history () and incorporated into CDE with the species conservation technique for multimodal optimization. HTS is a parameter-free speciation method that captures the landscape topography relying exclusively on the search history, which avoids the additional sampling and function evaluations associated with existing topology-based methods. The experimental results showed that HTS performs better than the existing topology-based methods when the function evaluation budget is limited.
Liang et al. liang2006comprehensive () proposed Comprehensive Learning Particle Swarm Optimizer (CLPSO) for multimodal function optimization. In CLPSO, all particles’ best previous positions can potentially be used to guide a particle’s flying, i.e., each dimension of a particle may learn from the corresponding dimension of different particle’s best previous position. The velocity updating equation of CLPSO is shown as follows.
$V_i^d \leftarrow w \cdot V_i^d + c \cdot rand_i^d \cdot \left( pbest_{f_i(d)}^d - X_i^d \right)$ (3)
where $w$ is an inertia weight, $c$ is an acceleration constant, $X_i^d$ denotes the $d$th dimension of particle $i$'s position, and $V_i^d$ represents the $d$th dimension of particle $i$'s velocity. $rand_i^d$ is a random number between 0 and 1 associated with dimension $d$. For particle $i$, a set $f_i = [f_i(1), f_i(2), \ldots, f_i(D)]$, where $D$ denotes the dimension of the fitness function, is built to store the serial numbers of the particles whose best previous positions particle $i$ should learn from at the corresponding dimensions. $pbest_{f_i(d)}^d$ denotes the $d$th dimension of particle $f_i(d)$'s best previous position. The values of the elements in $f_i$ depend on the learning probability $Pc$, which can take different values for different particles. For example, a random number is generated for assigning $f_i(d)$: if this random number is greater than $Pc$, $f_i(d)$ is assigned $i$ itself; otherwise, $f_i(d)$ is assigned the serial number of a particle selected from the population with a tournament selection procedure. If particle $i$ does not find a better position after a certain number of iterations called the refreshing gap $m$, $f_i$ is reassigned for particle $i$.
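The exemplar-assignment step of CLPSO can be sketched as follows (a minimization sketch; the function name and the size-two tournament are our reading of the procedure, not verbatim from the original paper):

```python
import random

def build_exemplars(i, pbest_fitness, Pc, dims, pop_size):
    """CLPSO exemplar sketch (minimization): for each dimension of particle i,
    learn from i's own pbest, or, with probability Pc, from the winner of a
    tournament between two randomly chosen particles' pbests."""
    f = []
    for _ in range(dims):
        if random.random() < Pc:
            a, b = random.sample(range(pop_size), 2)
            f.append(a if pbest_fitness[a] < pbest_fitness[b] else b)
        else:
            f.append(i)              # learn from its own best previous position
    return f
```

The returned list plays the role of $f_i$ in Eq. 3: dimension $d$ of particle $i$ is attracted toward dimension $d$ of particle `f[d]`'s best previous position.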
Li li2007multimodal () proposed the Fitness-Euclidean distance Ratio based PSO (FERPSO) algorithm, which utilizes FER to avoid specifying any niching parameter, for multimodal function optimization. The FER value between particle $j$ and particle $i$ is computed as follows.
$FER_{(j,i)} = \alpha \cdot \frac{f(p_j) - f(p_i)}{\lVert p_j - p_i \rVert}$ (4)
where $p_i$ and $p_j$ are the best previous positions of particles $i$ and $j$, respectively, and $\alpha$ is a scaling factor formulated as follows.
$\alpha = \frac{\lVert s \rVert}{f(p_g) - f(p_w)}$ (5)
where $p_g$ and $p_w$ are the best previous positions of the best-fit and worst-fit particles in the current population, respectively. $\lVert s \rVert$ is the size of the search space, estimated by its diagonal distance $\sqrt{\sum_{k=1}^{D} (x_k^U - x_k^L)^2}$, where $D$ denotes the dimension of the search space, i.e., the number of variables, and $x_k^U$ and $x_k^L$ are the upper and lower bounds of the $k$th variable, respectively. In every iteration, each particle calculates the FER value between itself and every other particle to find the neighboring point, denoted by $p_n$, that corresponds to the maximal FER value. Then, each particle updates its velocity according to Eq. 6. Over successive iterations, subpopulations tracking different peaks are formed, so as to locate multiple optima.
$v_i \leftarrow \chi \left( v_i + R_1 \cdot (p_i - x_i) + R_2 \cdot (p_n - x_i) \right)$ (6)
where $v_i$ and $x_i$ are the velocity and position of particle $i$, respectively; $R_1$ and $R_2$ denote two vectors of random values generated between 0 and $\varphi/2$, where $\varphi$ is a positive constant; and $\chi$ is a constriction coefficient.
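The neighborhood-best search at the heart of FERPSO can be sketched as follows (maximization; the function name and list-based encoding are our own):

```python
def fer_nbest(i, pbest, pbest_fitness, search_size):
    """FER-PSO sketch (maximization): return the index j of the particle
    that maximizes the fitness-Euclidean distance ratio relative to i."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    # scaling factor alpha: search-space size over the population fitness range
    alpha = search_size / (max(pbest_fitness) - min(pbest_fitness))
    fer = lambda j: alpha * (pbest_fitness[j] - pbest_fitness[i]) / dist(pbest[j], pbest[i])
    return max((j for j in range(len(pbest)) if j != i), key=fer)
```

The ratio rewards neighbors that are both fitter and nearby, so a particle is pulled toward the peak it already sits on rather than the globally best particle.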
The PSO niching algorithms using a ring topology, such as r2pso, r3pso, r2pso-lhc and r3pso-lhc, were also proposed by Li li2010niching () for multimodal function optimization. These ring topology based PSO niching algorithms likewise remove the need to specify any niching parameter. Taking r2pso for example, a particle's neighboring best point $p_n$, shown in Eq. 6, is set to the best one among the best previous positions of its two immediate neighbors (i.e., the left and right neighbors identified by population indices). With the ring topology methods, these PSO algorithms are able to form multiple subpopulations over successive iterations. Li showed that the PSO algorithms using a ring topology can provide comparable or better performance than SPSO and FERPSO on some test functions.
Qu et al. qu2012differential () proposed a neighborhood-based mutation and integrated it with three niching DE algorithms, i.e., CDE, SDE and sharing DE thomsen2004multimodal (), for multimodal function optimization. In neighborhood mutation, the subpopulations are formed relying on a neighborhood size parameter $m$. During iterations, each individual calculates the Euclidean distances between itself and the other individuals in the population, and the $m$ individuals nearest to the current individual are selected to form a subpopulation. The offspring of each individual is then generated by the corresponding DE algorithm within the subpopulation that the individual belongs to. After a certain number of iterations, the subpopulations will track different extreme points of the multimodal function to be optimized. Generally, the parameter $m$ can be set to a value between 1/20 and 1/5 of the population size.
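Forming the neighborhood for one individual can be sketched as follows (the function name is our own; whether the individual itself counts toward the $m$ members is a detail of the original paper, and here it does):

```python
def neighborhood_indices(i, population, m):
    """Neighborhood-mutation sketch: individual i plus its m-1 nearest
    individuals (by Euclidean distance) form the subpopulation in which
    its DE offspring is generated."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    order = sorted((j for j in range(len(population)) if j != i),
                   key=lambda j: dist(population[j], population[i]))
    return [i] + order[:m - 1]
```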
The locally informed PSO (LIPS) algorithm was proposed by Qu et al. qu2013distance () for multimodal function optimization. LIPS makes use of the local information (best previous positions of several neighbors) to guide the search of each particle. The velocity updating equation of LIPS is shown as follows.
$V_i^d \leftarrow \omega \left( V_i^d + \varphi \cdot (P_i^d - X_i^d) \right), \quad P_i^d = \frac{\sum_{j=1}^{nsize} \varphi_j \cdot nbest_j^d}{\varphi}$ (7)
where $\omega$ is an inertia weight, $X_i^d$ denotes the $d$th dimension of particle $i$'s position, and $V_i^d$ is the $d$th dimension of particle $i$'s velocity. $nsize$ is the neighborhood size, which is dynamically increased from 2 to 5 during iterations; $\varphi_j$ is a random number generated in $[0, 4.1/nsize]$, and $\varphi = \sum_{j=1}^{nsize} \varphi_j$; $nbest_j$ is the best previous position of the $j$th nearest neighbor to the $i$th particle's best previous position. With this technique, the LIPS algorithm eliminates the requirement to specify any niching parameter and improves the local search ability. Qu et al. showed that the LIPS algorithm outperforms several well-known niching algorithms, including r2pso, r3pso, SPSO, FERPSO, SDE and CDE, over 30 standard benchmark functions, not only in success rate but also in accuracy.
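Per this reading of the update rule, one LIPS velocity update can be sketched as follows (a sketch under the stated assumptions; the function name and default inertia weight are our own):

```python
import random

def lips_velocity(v, x, nbests, w=0.7298):
    """LIPS velocity sketch for one particle: each of the nsize nearest
    pbests contributes with a random weight phi_j in [0, 4.1/nsize]; their
    weighted mean P attracts the particle."""
    nsize = len(nbests)
    phi = [random.uniform(0, 4.1 / nsize) for _ in range(nsize)]
    phi_sum = sum(phi)
    dims = len(x)
    # P: phi-weighted mean of the neighbors' best previous positions
    P = [sum(phi[j] * nbests[j][d] for j in range(nsize)) / phi_sum
         for d in range(dims)]
    return [w * (v[d] + phi_sum * (P[d] - x[d])) for d in range(dims)]
```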
Wang et al. wang2015mommop () proposed Multiobjective Optimization for Multimodal Optimization Problems (MOMMOP), which transforms a Multimodal Optimization Problem (MMOP) into a Multiobjective Optimization Problem (MOP) with two conflicting objectives, so that all the global optima of the original MMOP become Pareto optimal solutions of the transformed problem. With MOMMOP, an MMOP is transformed into a MOP as follows.
$\min f_1(x) = \frac{x_1 - L_1}{U_1 - L_1} + \eta \cdot \frac{f(x) - BestFV}{WorstFV - BestFV}, \quad \min f_2(x) = 1 - \frac{x_1 - L_1}{U_1 - L_1} + \eta \cdot \frac{f(x) - BestFV}{WorstFV - BestFV}$ (8)
where $x$ is a solution, $x_k$ ($1 \le k \le D$) is the $k$th variable, and $D$ denotes the number of variables. $f_1(x)$ and $f_2(x)$ are the two conflicting objectives of the transformed problem. $f(x)$ is the objective function value of $x$ with respect to the original problem. $BestFV$ and $WorstFV$ denote the best and worst objective function values found during the evolution, respectively. $U_1$ and $L_1$ are the upper and lower bounds of the first variable, respectively. $\eta$ is the scaling factor, which gradually increases during the evolution. Because some optima may have the same values in certain variables, for the sake of locating multiple global optima, each variable is used to design a bi-objective optimization problem similar to Eq. 8. If a solution $x_a$ Pareto dominates another solution $x_b$ on all the bi-objective optimization problems, $x_a$ is considered to dominate $x_b$. Moreover, to make the population distribute more reasonably, another comparison criterion is proposed: a solution $x_a$ dominates another solution $x_b$ if
$f(x_a) \le f(x_b) \ \text{and} \ dist\big(normalization(x_a), normalization(x_b)\big) < 0.01$ (9)
where $f(x_a)$ and $f(x_b)$ are the objective function values of $x_a$ and $x_b$, respectively, with respect to the original problem, and $dist$ denotes the Euclidean distance between the normalized $x_a$ and $x_b$ (i.e., each variable is normalized as $(x_k - L_k)/(U_k - L_k)$, where $k \in \{1, \ldots, D\}$). If $dist\big(normalization(x_a), normalization(x_b)\big) < 0.01$, $x_a$ and $x_b$ are considered to be quite similar to each other.
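Per this reading of the transformation, the two objectives for the first variable can be sketched as follows (a minimization sketch; the function name and argument names are our own, not from the MOMMOP paper):

```python
def mommop_objectives(x, f_x, best_fv, worst_fv, lb, ub, eta):
    """MOMMOP transformation sketch (first variable): the normalized first
    variable pulls the two objectives in opposite directions, while the
    scaled original objective value pulls both toward better solutions."""
    x1_norm = (x[0] - lb[0]) / (ub[0] - lb[0])
    quality = eta * (f_x - best_fv) / (worst_fv - best_fv)
    return x1_norm + quality, 1.0 - x1_norm + quality
```

Two global optima with equal $f(x)$ but different first variables map to different points of the same Pareto front, which is what lets a multiobjective solver keep both.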
2.3 Our motivations
Based on the above overview, we can see that many multimodal optimization algorithms need niching parameters that require prior knowledge of the fitness landscape. However, for some real-world optimization problems, such prior knowledge is very difficult or almost impossible to obtain. What is more, few existing multimodal optimization algorithms can effectively identify and escape from located extreme points during iterations, because they cannot tell whether a subpopulation has already located the extreme point of a peak before the end of the run. As a result, many function evaluations are wasted when an extreme point has been located early, and the global search ability of the algorithm is restricted when a subpopulation keeps tracking an extreme point located early.
Based on the above analysis, our main motivations in this paper are summarized as follows.

(1) Improving the iteration rule of the original WSA to remove the need to specify optimal values of the attenuation coefficient for different problems to form multiple subpopulations, without adding any niching parameter.

(2) Enabling the identification of extreme points and jumping out of located extreme points during iterations, relying on two new parameters named the stability threshold and the fitness threshold, so as to eliminate unnecessary function evaluations and improve the global search ability.
3 Whale swarm algorithm with iterative counter
This section first introduces the traditional WSA briefly. Then, the improvements of WSA for multimodal function optimization are presented. Finally, the implementation of WSA-IC is described in detail. The problems to be solved by the algorithms are assumed to be minimization problems, and the fitness functions are taken to be the same as the objective functions.
3.1 Whale swarm algorithm
We proposed WSA for function optimization Zeng2017 (), inspired by whales' behavior of communicating with each other via ultrasound for hunting. As can be seen from reference Zeng2017 (), WSA performs well in maintaining population diversity and has strong local search ability, which contribute significantly to locating the global optima with high accuracy. In WSA, a whale $X$ updates its position under the guidance of its "better and nearest" whale $Y$ according to the following equation.
$x_i^{t+1} = x_i^t + \mathrm{rand}\left(0, \rho_0 \, e^{-\eta \, d_{X,Y}}\right) \cdot \left( y_i^t - x_i^t \right)$ (10)
where $x_i^t$ and $x_i^{t+1}$ denote the $i$th elements of $X$'s position at iterations $t$ and $t+1$, respectively, and $y_i^t$ represents the $i$th element of $Y$'s position at iteration $t$. $\rho_0$ is the intensity of the ultrasound source, which can be set to 2 for almost all cases. $e$ denotes the natural constant. $\eta$ is the attenuation coefficient, and $d_{X,Y}$ is the Euclidean distance between $X$ and $Y$. $\mathrm{rand}(0, a)$ denotes a random value generated uniformly between 0 and $a$. According to Eq. 10, a whale moves positively and randomly under the guidance of a "better and nearest" whale that is close to it, and moves negatively and randomly under the guidance of one that is quite far away from it.
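This position update can be sketched as follows (a minimal sketch of our reading of Eq. 10; the function name and list-based positions are our own):

```python
import math
import random

def wsa_move(x, y, rho0=2.0, eta=0.0):
    """WSA position update sketch (Eq. 10): whale X moves toward its 'better
    and nearest' whale Y with a random step bounded by the received
    ultrasound intensity rho0 * exp(-eta * d(X, Y))."""
    d = math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))
    intensity = rho0 * math.exp(-eta * d)
    return [xi + random.uniform(0, intensity) * (yi - xi) for xi, yi in zip(x, y)]
```

With a positive attenuation coefficient, distant guides produce tiny steps; with `eta=0` (the WSA-IC setting of section 3.2.1) the step bound is the same regardless of distance.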
The general framework of WSA is shown in Fig. 2, where line 6 iterates over the members of the whale swarm (whose number is the swarm size) and line 7 refers to the current whale of the swarm. From Fig. 2, it can be seen that WSA has a fairly simple structure. In every iteration, before moving, each whale needs to find its "better and nearest" whale, as shown in Fig. 3, where line 6 compares the whales' fitness values.
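The "better and nearest" search of Fig. 3 can be sketched as follows (minimization; the function name and return convention are our own, with `None` standing in for the case where no better whale exists):

```python
def better_nearest_whale(i, positions, fitness):
    """Sketch of finding whale i's 'better and nearest' whale (minimization):
    among all whales with strictly better fitness, the one closest to whale i;
    None if whale i is already the best in the swarm."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    better = [j for j in range(len(positions)) if fitness[j] < fitness[i]]
    if not better:
        return None
    return min(better, key=lambda j: dist(positions[j], positions[i]))
```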
3.2 The improvements of WSA
3.2.1 The improvement on iteration rule of WSA
Although the original WSA performs well in forming multiple parallel subpopulations and maintaining population diversity, it needs optimal values of the attenuation coefficient to be specified for different problems, which reduces its practicality. In this section, we therefore improve the iteration rule of WSA to remove the need to specify optimal values of the attenuation coefficient for different problems, on the premise of ensuring the formation of multiple subpopulations and the ability of local exploitation. First, we assume that the intensity of ultrasound is not attenuated in water, i.e., the attenuation coefficient is set to 0, which means that each whale can correctly understand the message sent out by any other whale in the search area. Therefore, a whale will move positively and randomly under the guidance of its "better and nearest" whale, whether that whale is close to it or far away from it. Consequently, when a whale and its "better and nearest" whale track different extreme points, the whale may move far away from the extreme point it tracks due to the guidance of its "better and nearest" whale following another extreme point, which weakens WSA's ability of local exploitation. Taking a one-dimensional function optimization problem as an example, as can be seen from Fig. 4, a whale may be near one extreme point while its "better and nearest" whale is near another extreme point. In this case, the whale may move to a worse point or even go to another peak under that guidance, which impedes the location of the extreme point it tracked previously. Obviously, this situation is not conducive to locating multiple global optima for WSA.
To solve the above problem effectively, we improve the rule of location update for each whale as follows. First, a copy of the whale is generated; then, the copy moves under the guidance of the whale's "better and nearest" whale according to Eq. 10. If the position of the copy after moving is better than that of the whale (i.e., the fitness value of the copy after moving is less than that of the whale), the whale moves to the copy's position; otherwise, the whale remains unchanged. In a word, if a whale finds a better position via Eq. 10 in an iteration, it moves to the better position; otherwise, it remains quiescent in its original position. In the case shown in Fig. 4, the probability of the whale moving away from the extreme point it tracks is thus greatly reduced, because it is difficult for the whale to find a better position via Eq. 10 under the guidance of a "better and nearest" whale that follows another extreme point. In other words, the whale stays at its original position with great probability, guiding the movement of other whales. When there appears at least one better whale that follows the same extreme point, the whale will converge to the extreme point under the guidance of the nearest one among those better whales in the next iteration. Therefore, this improvement contributes significantly to forming multiple subpopulations and enhancing the ability of local exploitation of the improved WSA, which is very conducive to locating multiple global optima, despite the attenuation coefficient being 0. What is more, this improvement does not add any niching parameter.
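The copy-then-accept rule can be sketched as follows (minimization; the function name and return convention are our own; `eta = 0` per the assumption above):

```python
import random

def wsaic_step(x, f_x, y, f, rho0=2.0):
    """WSA-IC movement sketch: a copy of whale X moves under the guidance of
    its 'better and nearest' whale Y (Eq. 10 with eta = 0); X accepts the
    move only if the copy is better, otherwise X stays where it is."""
    copy = [xi + random.uniform(0, rho0) * (yi - xi) for xi, yi in zip(x, y)]
    f_copy = f(copy)
    if f_copy < f_x:
        return copy, f_copy, True     # moved: the iterative counter is reset
    return x, f_x, False              # quiescent: the counter will increase
```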
3.2.2 Identifying and escaping from the located extreme points during iterations
In the field of multimodal optimization, effectively identifying located extreme points and jumping out of them to save unnecessary function evaluations during iterations is very important for metaheuristic algorithms to locate the global optimum or optima. Although the improved WSA described in section 3.2.1 can ensure the formation of multiple subpopulations and the ability of local exploitation, it cannot yet identify located extreme points and escape from them during iterations. We therefore propose two new parameters, the stability threshold and the fitness threshold, which aim to help each whale identify located optima and jump out of them during iterations, so as to save unnecessary function evaluations and improve the global search ability. In this paper, the stability threshold is a predefined number of iterations used to judge whether a whale has reached a steady state, where reaching a steady state means that the whale has located the extreme point it tracks. The fitness threshold is a predefined value used to judge whether a solution is a current global optimum. If a whale does not find a better position after a number of successive iterations equal to the stability threshold, it is considered to have reached a steady state and located an extreme point. If the difference between its fitness value and the fitness value of the best one among the current global optima is less than the fitness threshold, the whale's position is considered a current global optimum; otherwise, it is considered a local optimum. If the whale's position is a current global optimum, this optimum is recorded. Then, the whale that has reached a steady state is randomly reinitialized in the search area to jump out of the located extreme point. To judge whether a whale has failed to find a better position over the successive iterations, each whale maintains an iterative counter. In this paper, the improved WSA is called WSA with Iterative Counter (WSA-IC).
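The counter logic can be sketched as follows (the names `T_s` and `T_f` are our own shorthand for the stability and fitness thresholds, and the dict-based whale record is an assumption for illustration):

```python
def check_counter(whale, T_s, T_f, best_global_fitness, reinitialize):
    """Sketch of the iterative-counter check. whale is a dict with keys
    'position', 'fitness' and 'counter'. Returns a recorded global optimum
    as (position, fitness), or None."""
    if whale["counter"] < T_s:
        whale["counter"] += 1                     # not yet steady
        return None
    found = None
    if abs(whale["fitness"] - best_global_fitness) < T_f:
        found = (list(whale["position"]), whale["fitness"])   # a current global optimum
    reinitialize(whale)                           # escape the located extreme point
    return found
```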
3.3 The detailed procedure of WSAIC
Fig. 5 presents the pseudo code of WSAIC. It is worth noting that the initialization of a whale consists of two operations: initializing the whale’s position randomly and setting its iterative counter to 0. The improvement on the iteration rule of WSA described in section 3.2.1 can be seen from Fig. 5. If a whale’s “better and nearest” whale exists (line 8 in Fig. 5), a copy of this whale is generated first (line 9 in Fig. 5); then the copy moves under the guidance of the “better and nearest” whale according to Eq. 10 (line 10 in Fig. 5). If the position of the copy after moving is better than that of the original whale (line 12 in Fig. 5), the copy replaces the original whale (line 13 in Fig. 5).
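As a concrete illustration, the improved movement step can be sketched as follows. Eq. 10 is not reproduced in this section; the sketch assumes the original WSA step form with the attenuation coefficient set to 0, i.e., a random factor in [0, rho0] applied to the direction toward the guiding whale. All names (Whale, guidedMove) are illustrative, not the authors’ code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Illustrative whale for a minimization problem.
struct Whale {
    std::vector<double> pos;
    double fitness;   // smaller is better
    int counter;      // iterative counter (section 3.2.2)
};

double uniform(double lo, double hi) {
    return lo + (hi - lo) * (std::rand() / (double)RAND_MAX);
}

// Move a COPY of whale w toward its "better and nearest" whale y; the copy
// replaces w only if the copy's position is better (lines 9-13 of Fig. 5).
// With the attenuation coefficient 0, the step factor is simply rand(0, rho0).
template <typename Fitness>
bool guidedMove(Whale& w, const Whale& y, double rho0, Fitness f) {
    Whale copy = w;
    for (std::size_t d = 0; d < copy.pos.size(); ++d)
        copy.pos[d] += uniform(0.0, rho0) * (y.pos[d] - copy.pos[d]);
    copy.fitness = f(copy.pos);
    if (copy.fitness < w.fitness) {  // copy is better: replace the original
        w = copy;
        w.counter = 0;               // a better position resets the counter
        return true;
    }
    return false;                    // original whale keeps its position
}
```

Because the original whale is only replaced when the copy improves on it, a whale's fitness never worsens during guided movement, which is the property that keeps the best whale of each subpopulation in place.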
The details of identifying and escaping from the located extreme points during iterations in WSAIC are as follows. If a whale finds a better position in an iteration, its iterative counter is reset to 0 (line 14 in Fig. 5); otherwise, the whale checks its iterative counter (the corresponding lines in Fig. 5). The detailed procedure for checking a whale’s iterative counter is shown in Fig. 6. First, it is determined whether the whale’s iterative counter has reached the stability threshold. If not (line 2 in Fig. 6), the counter is increased by 1 (line 3 in Fig. 6); otherwise, the whale is considered to have reached a steady state and located an extreme point. In that case, it is determined whether the located extreme point is a current global optimum (line 5 in Fig. 6) and, if so, the extreme point is recorded. Then the whale is randomly reinitialized (line 6 in Fig. 6) to jump out of the located extreme point and search for the global optima. Thus, with the stability threshold, the proposed WSAIC can jump out of the located extreme points while keeping sufficient local search ability.
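The counter check of Fig. 6 can be sketched as follows. The Whale struct, the reinitialization bounds, and the judgeAndRecord callback (standing in for the judging procedure of Fig. 7) are illustrative assumptions, not the authors’ implementation.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

struct Whale {
    std::vector<double> pos;
    double fitness;
    int counter;  // iterations without improvement
};

// Returns true if the whale was judged stable and reinitialized.
template <typename Judge, typename Fitness>
bool checkCounter(Whale& w, int stabilityThreshold, double lo, double hi,
                  Judge judgeAndRecord, Fitness f) {
    if (w.counter != stabilityThreshold) {   // not yet stable (Fig. 6, lines 2-3)
        ++w.counter;
        return false;
    }
    judgeAndRecord(w);                       // record the point if it is a current global optimum
    for (double& x : w.pos)                  // random reinitialization (Fig. 6, line 6)
        x = lo + (hi - lo) * (std::rand() / (double)RAND_MAX);
    w.fitness = f(w.pos);
    w.counter = 0;
    return true;
}
```

A whale therefore stagnates for exactly stabilityThreshold iterations before it is declared stable, judged, and thrown back into the search area.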
The detailed procedure for judging whether a solution is a current global optimum is shown in Fig. 7. First, it is checked whether the fitness value of the solution is less than the fitness value of the best one among the current global optima set. If it is (line 2 in Fig. 7), this solution must be a current global optimum. Before the best fitness value is updated (line 6 in Fig. 7) and the new current global optimum is recorded (line 7 in Fig. 7), it is checked whether the optima recorded before are still current global optima: if the difference between the previous best fitness value and the whale’s fitness exceeds the fitness threshold (line 3 in Fig. 7), none of the previously recorded optima are current global optima, so the set needs to be cleared (line 4 in Fig. 7). If the fitness value of the solution is greater than the best fitness value (line 8 in Fig. 7), it is checked whether this solution still qualifies as a current global optimum: if the difference between its fitness value and the best fitness value does not exceed the fitness threshold (line 9 in Fig. 7), the solution is considered a current global optimum and is added to the set (line 10 in Fig. 7).
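The judging procedure of Fig. 7 amounts to maintaining an archive of current global optima together with the best fitness among them. A minimal sketch, with illustrative names (GlobalOptima, judge):

```cpp
#include <cassert>
#include <vector>

// Archive of current global optima; names and the initial fBest sentinel
// are illustrative assumptions.
struct GlobalOptima {
    std::vector<std::vector<double>> archive;  // current global optima found so far
    double fBest = 1e300;                      // fitness of the best archived optimum
};

void judge(GlobalOptima& g, const std::vector<double>& pos, double fit,
           double fitnessThreshold) {
    if (fit < g.fBest) {                       // strictly better than the archive's best
        if (g.fBest - fit > fitnessThreshold)  // old optima no longer "current"
            g.archive.clear();                 // (Fig. 7, lines 3-4)
        g.fBest = fit;                         // update the best fitness (line 6)
        g.archive.push_back(pos);              // record the new current global optimum (line 7)
    } else if (fit - g.fBest <= fitnessThreshold) {
        g.archive.push_back(pos);              // equally good within the threshold (lines 9-10)
    }                                          // otherwise: a local optimum, discarded
}
```

Note how the fitness threshold plays both roles described in the text: it decides when old archive entries are invalidated and when a new point is close enough to count as a current global optimum.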
3.4 Parameter settings of WSAIC
As can be seen from the detailed steps above, WSAIC contains four algorithm-specific parameters: the intensity of the ultrasound source, the attenuation coefficient, the stability threshold and the fitness threshold. The intensity of the ultrasound source and the attenuation coefficient are two constants, always set to 2 and 0 respectively. If the problem to be solved has at least one local optimum, as in the example of the one-dimensional function in Fig. 8, the fitness threshold should be set to a comparatively small value between 0 and the difference between the global second best fitness and the global best fitness. The two marked points in Fig. 8 denote the global optimum and the global second best solution respectively, and the difference between their objective function values is quite small; for this function, the fitness threshold should be set to a very small value between 0 and that difference. For almost all problems, especially those without prior knowledge, the fitness threshold can be set to a sufficiently small positive value; for benchmark test functions whose global optima are given, it can be set to the value of the predefined fitness error (i.e., the level of accuracy) used to judge whether a solution is a real global optimum. The stability threshold may need different values for different problems; according to a large number of experimental results, it is reasonable to set it to 100.
4 Experimental setup
The proposed WSAIC and the algorithms compared are all implemented in C++ with Microsoft Visual Studio 2015 and executed on a PC with an Intel Core i5-3470 processor (3.2 GHz base frequency, 3.6 GHz turbo frequency), 4 GB RAM and the 64-bit Microsoft Windows 10 operating system. The 5 popular niching metaheuristic algorithms compared are listed as follows.

LIPS qu2013distance (): The locally informed PSO.

FERPSO li2007multimodal (): Fitness-Euclidean distance ratio PSO.

SDE li2005efficient (): Speciation-based DE.

CDE thomsen2004multimodal (): The original crowding DE.

SPSO li2004adaptively (): Speciation-based PSO.
Apart from the above 5 niching metaheuristic algorithms, WSAIC is also compared with WSA Zeng2017 (). It is worth noting that the different evolutionary rules of the different algorithms result in different computational complexity. As all the compared algorithms are implemented in the same development environment and run on the same computer, we use CPU time as the stopping criterion for the sake of fairness. Obviously, the more global optima an algorithm finds, and the higher the accuracy of these optima when the stopping criterion is met, the better the algorithm performs.
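A minimal sketch of such a CPU-time stopping criterion (using std::chrono steady_clock timing for portability; the time budget and the step callback are placeholders, not the authors’ code):

```cpp
#include <cassert>
#include <chrono>

// Run one-iteration steps of a metaheuristic until a fixed time budget is
// spent; every compared algorithm gets the same budget regardless of the
// per-iteration cost of its evolutionary rules.
template <typename Step>
long runForSeconds(double budgetSeconds, Step step) {
    using clock = std::chrono::steady_clock;
    auto start = clock::now();
    long iterations = 0;
    while (std::chrono::duration<double>(clock::now() - start).count() < budgetSeconds) {
        step();          // one iteration of the algorithm under test
        ++iterations;
    }
    return iterations;   // work completed within the fixed time budget
}
```

Under this criterion, a cheaper iteration rule simply gets more iterations in the same budget, which is exactly the fairness argument made above.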
4.1 Test functions
We use 20 multimodal benchmark functions to test these algorithms. Basic information about these test functions is summarized in Table 1, in which the entry “many” in the last column for F16–F20 means that these functions have many local optima, whose number we have not counted. In Table 1, the first 15 multimodal functions come from CEC2015 Suganthan2015cec (), and the remaining 5 are classical multimodal functions with high dimensions. The CEC2015 functions can be divided into two parts: the first 8 are expanded scalable functions and the remaining 7 are composition functions. All the CEC2015 functions come with search space shift and rotation, which makes them more difficult to solve, while the latter 5 multimodal functions are only shifted. More details of these test functions are presented in the document “Definitions of CEC2015 niching benchmark 20141228”, which can be downloaded from the website given in reference Suganthan2015cec (). For functions F2, F3, F5, F6, F7, F8, F9, F11, F12 and F13 the objective is to locate all the global optima, while for the rest the target is to escape from the local optima and find the global optimum. All these test functions are minimization problems.
Fn.  Test function name  Dimensions  No. of global optima  No. of local optima 

F1  Expanded Two-Peak Trap  5  1  15 
F2  Expanded Five-Uneven-Peak Trap  5  32  0 
F3  Expanded Equal Minima  4  625  0 
F4  Expanded Decreasing Minima  5  1  15 
F5  Expanded Uneven Minima  3  125  0 
F6  Expanded Himmelblau’s Function  4  16  0 
F7  Expanded Six-Hump Camel Back  6  8  0 
F8  Modified Vincent Function  3  216  0 
F9  Composition Function 1  10  10  0 
F10  Composition Function 2  10  1  9 
F11  Composition Function 3  10  10  0 
F12  Composition Function 4  10  10  0 
F13  Composition Function 5  10  10  0 
F14  Composition Function 6  10  1  19 
F15  Composition Function 7  10  1  19 
F16  Griewank  50  1  many
F17  Ackley  100  1  many
F18  Rosenbrock  100  1  many
F19  Rastrigin  100  1  many
F20  Expanded Scaffer’s F6  100  1  many
Search range: 
4.2 Parameter settings
To compare the performance of the multimodal optimization algorithms in this paper, all the test functions are treated as black-box problems, even though their global optima can be obtained analytically. Thus, the known global optima of these test functions cannot be used by the algorithms during iterations. After an algorithm finishes iterating, the fitness error, i.e., the level of accuracy, is used to judge whether a solution is a real global optimum: if the difference between the fitness value of a solution and the fitness value of the known global optimum is lower than the fitness error, the solution is counted as a real global optimum. In our experiments, the fitness error, population size and CPU time used by the algorithms on the test functions are listed in Table 2. In the third column of Table 2, the number before the slash denotes the population size of WSAIC, while the number after the slash is the population size of the 6 compared algorithms. Because the individuals in the population of WSAIC can identify and jump out of the located extreme points effectively during iterations, WSAIC can keep a relatively small population size with respect to the other algorithms, which contributes to reducing its computational complexity. It is worth noting that a function with more optima or a higher dimension may require a larger population size or more CPU time.
Fn.  Fitness error  Pop. size of WSAIC / pop. size of other algorithms  CPU time (s)
F1  0.00000001  40 / 200  6 
F2  0.00000001  60 / 200  200 
F3  0.00000001  50 / 2000  1500 
F4  0.00000001  30 / 100  180 
F5  0.00000001  40 / 800  80 
F6  0.00000001  40 / 200  20 
F7  0.000001  30 / 100  30 
F8  0.0001  100 / 1000  1500 
F9  0.00000001  500 / 3000  1800 
F10  0.00000001  500 / 1000  500 
F11  0.00000001  100 / 800  800 
F12  0.00000001  400 / 600  800 
F13  0.00000001  50 / 300  260 
F14  0.00000001  400 / 2000  500 
F15  0.00000001  100 / 600  1000 
F16  0.00000001  100 / 200  300 
F17  0.00000001  40 / 300  150 
F18  0.00000001  60 / 300  1200 
F19  0.00000001  80 / 300  1000 
F20  0.00000001  60 / 200  600 
The parameter values of the compared algorithms are the same as those given in their respective original references. Table 3 lists the values of the main parameters of these algorithms.
Algorithms  Parameters 

LIPS  inertia weight = 0.729844, neighborhood size = 25 
FERPSO  constriction factor = 0.729844, coefficient = 4.1 
SDE  crossover rate = 0.9, scaling factor = 0.5, species size = 10 
CDE  crossover rate = 0.9, scaling factor = 0.5, crowding factor = population size 
SPSO  constriction factor = 0.729844, coefficients = 2.05 and 2.05 
WSA  intensity of ultrasound source = 2 
WSAIC  intensity of ultrasound source = 2, attenuation coefficient = 0, stability threshold = 100, fitness threshold = fitness error 
The values of the attenuation coefficient of WSA for these test functions are listed in Table 4. Table 5 shows the values of the species radius of SDE and SPSO for these test functions.
Fn.  F1  F2  F3  F4  F5  F6  F7  F8  F9  F10 

Attenuation coefficient  0.0001  0.1  0.14  0.00005  0.16  0.16  0.001  0.3  0.09  0.001  
Fn.  F11  F12  F13  F14  F15  F16  F17  F18  F19  F20 
Attenuation coefficient  0.01  0.001  0.001  0.001  0.001  0.005  0.01  0.014  0.005  0.01 
Fn.  F1  F2  F3  F4  F5  F6  F7  F8  F9  F10  F11  F12  F13  F14  F15  F16  F17  F18  F19  F20 

Species radius of SDE  25  8  15  10  8  40  10  15  10  25  25  30  15  10  10  0.5  0.5  0.2  0.5  0.5 
Species radius of SPSO  60  200  30  50  30  60  70  60  100  100  100  140  150  120  150  1000  900  950  2000  2000 
4.3 Performance metrics
To fairly compare the performance of WSAIC with the other 6 algorithms, we conducted 51 independent runs of each algorithm on each test function. The following four metrics are used to measure the performance of all the algorithms.

Success Rate (SR) li2004adaptively (): the percentage of runs in which all the global optima are successfully located using the given level of accuracy.

Average Number of Optima Found (ANOF) Suganthan2015cec (): the average number of global optima found over 51 runs.

Quality of optima found: the mean of fitness values of optima found over 51 runs, reflecting the accuracy of optima found.

Convergence rate: the rate at which an algorithm converges to the global optimum as function evaluations proceed.
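The first two metrics can be computed directly from the per-run counts of global optima found; a short illustrative sketch (the function and struct names are not from the paper):

```cpp
#include <cassert>
#include <vector>

// SR: fraction of runs locating ALL global optima at the given accuracy.
// ANOF: mean number of global optima found per run.
struct Metrics { double sr; double anof; };

Metrics evaluate(const std::vector<int>& optimaFoundPerRun, int totalOptima) {
    int successes = 0;
    long sum = 0;
    for (int found : optimaFoundPerRun) {
        if (found == totalOptima) ++successes;  // all global optima were located
        sum += found;
    }
    double runs = (double)optimaFoundPerRun.size();
    return { successes / runs, sum / runs };
}
```

In the experiments below the vector would hold 51 entries, one per independent run.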
5 Experimental results and analysis
In this section, the results of the comparative experiments are presented and analyzed in detail.
5.1 Quantity of optima found
This section presents and analyses the number of optima found by these algorithms. First, all the algorithms are compared on “Success Rate”, the most popular metric for testing the ability of multimodal optimization algorithms to locate multiple global optima. Then the metric “Average Number of Optima Found” is employed to further compare the algorithms, as some of them cannot achieve a nonzero SR on some functions with multiple global optima.
5.1.1 Success Rate
The SR of each algorithm on each test function is presented in Table 6, in which the number within parentheses denotes the rank of the algorithm on the corresponding function in terms of SR, and a bold number means the corresponding algorithm performs best on that function. Algorithms with the same SR value on a function share the same rank for that function. The last row of Table 6 shows the total rank of each algorithm over all the test functions, i.e., the sum of its individual ranks on each function. It can be seen from Table 6 that WSAIC performs best on most of the test functions in terms of SR. Especially on F3, F5 and F8, which have massive numbers of global optima, WSAIC performs far better than the other algorithms, none of which achieves a nonzero SR on these three functions. It is worth noting that F9–F15 are composition functions with search space shift and rotation, whose global optima are more difficult to locate, so that no algorithm achieves a nonzero SR on F9–F14. For the composition function F15, WSAIC, LIPS and FERPSO all attain the maximal SR value of 1. Moreover, for the high-dimensional multimodal functions F16, F18 and F19, WSAIC also achieves much higher SR values than most of the other multimodal optimization algorithms. The better performance of WSAIC in terms of SR is further supported by its total rank of 21, which is much better than those achieved by the other algorithms.
Fn.  LIPS  FERPSO  SPSO  SDE  CDE  WSA  WSAIC 

F1  0.69  0.67  0  1  0  0.12  1 
(3)  (4)  (6)  (1)  (6)  (5)  (1)  
F2  0  0  0  0  0  0  1 
(2)  (2)  (2)  (2)  (2)  (2)  (1)  
F3  0  0  0  0  0  0  1 
(2)  (2)  (2)  (2)  (2)  (2)  (1)  
F4  0.31  0.43  0.33  0.80  1  0  1 
(5)  (6)  (4)  (3)  (1)  (7)  (1)  
F5  0  0  0  0  0  0  1 
(2)  (2)  (2)  (2)  (2)  (2)  (1)  
F6  0.12  0  0  0  1  0  1 
(3)  (4)  (4)  (4)  (1)  (4)  (1)  
F7  0.06  0  0  0.59  1  0  1 
(4)  (5)  (5)  (3)  (1)  (5)  (1)  
F8  0  0  0  0  0  0  1 
(2)  (2)  (2)  (2)  (2)  (2)  (1)  
F9  0  0  0  0  0  0  0 
(1)  (1)  (1)  (1)  (1)  (1)  (1)  
F10  0  0  0  0  0  0  0 
(1)  (1)  (1)  (1)  (1)  (1)  (1)  
F11  0  0  0  0  0  0  0 
(1)  (1)  (1)  (1)  (1)  (1)  (1)  
F12  0  0  0  0  0  0  0 
(1)  (1)  (1)  (1)  (1)  (1)  (1)  
F13  0  0  0  0  0  0  0 
(1)  (1)  (1)  (1)  (1)  (1)  (1)  
F14  0  0  0  0  0  0  0 
(1)  (1)  (1)  (1)  (1)  (1)  (1)  
F15  1  1  0  0.98  0.12  0.06  1 
(1)  (1)  (7)  (4)  (5)  (6)  (1)  
F16  1  0.49  0  0  0  0.42  0.96 
(1)  (3)  (5)  (5)  (5)  (4)  (2)  
F17  0  0  0  0  0  0  0 
(1)  (1)  (1)  (1)  (1)  (1)  (1)  
F18  0  0  0  0  0  0  0.88 
(2)  (2)  (2)  (2)  (2)  (2)  (1)  
F19  0  0  0  0  0  0  0.98 
(2)  (2)  (2)  (2)  (2)  (2)  (1)  
F20  0  0  0  0  0  0  0 
(1)  (1)  (1)  (1)  (1)  (1)  (1)  
Total rank  37  43  51  40  39  51  21 
5.1.2 Average Number of Optima Found
As the sample size in this paper is 51, which is greater than 30, we conducted a two independent-samples Z-test to judge whether the differences between the population of WSAIC and the populations of the other algorithms, each represented by its independent sample, are significant on each test function at the significance level 0.05; the test is based on the means and variances of the number of optima found in the two independent samples. Table 7 presents the ANOF of each algorithm on each test function, together with the standard deviation of the number of optima found. The symbol “+” means that the difference between the population of WSAIC and the population of the compared algorithm is significant and WSAIC performs better; the symbol “=” means that the difference is not significant; and the symbol “−” means that the difference is significant and WSAIC performs worse. A bold number in Table 7 means that the corresponding algorithm performs best on the function in terms of ANOF. It can be seen from Table 7 that WSAIC has the best performance in terms of ANOF on F1–F8, F15, F18 and F19, which echoes its best SR values on these test functions as shown in Table 6. For the two composition functions F10 and F14 and the two high-dimensional functions F17 and F20, no algorithm obtains a nonzero ANOF, which means that none of them finds the global optima of these functions. For the composition function F12, WSAIC performs only slightly worse than SDE in terms of ANOF, while for the composition functions F9 and F11, WSAIC performs far better than the other compared algorithms. The better performance of WSAIC in terms of the number of optima found is also supported by the total numbers of the symbols “+”, “=” and “−” in the last three rows of Table 7.
As can be seen from Table 7, nonzero counts of the symbol “−” occur only when WSAIC is compared with LIPS and SDE, with counts of 2 and 1 respectively, and the number of “+” symbols is larger than the number of “=” symbols in the comparison with every algorithm. The better performance of WSAIC is firstly due to the improvement of the location update rule of WSA when the attenuation coefficient is 0, i.e., a whale moves to a new position under the guidance of its “better and nearest” whale only if the new position is better than its original one, which ensures the formation of multiple subpopulations and keeps the ability of local exploitation. More importantly, the method of identifying and jumping out of the located extreme points during iterations improves the global search ability as far as possible, which contributes significantly to locating multiple global optima.
Fn.  LIPS  FERPSO  SPSO  SDE  CDE  WSA  WSAIC  
F1  0.69±0.46  +  0.67±0.47  +  0±0  +  1±0  =  0±0  +  0.12±0.32  +  1±0 
F2  20.92±1.90  +  7.43±1.42  +  0±0  +  13.77±1.55  +  0±0  +  1.96±0.74  +  32±0 
F3  288.47±8.92  +  148.14±7.32  +  0±0  +  163.73±212.52  +  1.94±1.31  +  10.06±1.42  +  625±0 
F4  0.31±0.46  +  0.43±0.50  +  0.33±0.47  +  0.80±0.40  +  1±0  =  0±0  +  1±0 
F5  107.39±3.34  +  84.29±4.36  +  0±0  +  15.65±1.54  +  0±0  +  6.22±0.89  +  125±0 
F6  14.39±0.97  +  11.39±1.36  +  0.20±0.40  +  4.37±0.48  +  16±0  =  2.73±0.79  +  16±0 
F7  5.53±1.33  +  2.73±1.07  +  0.24±0.42  +  7.47±0.70  +  8±0  =  0.51±0.50  +  8±0 
F8  112.27±4.72  +  70.29±4.23  +  11.31±3.54  +  23.53±1.67  +  204.59±3.86  +  10.84±1.41  +  216±0 
F9  7.08±0.27  −  3.55±1.05  +  0±0  +  0±0  +  0±0  +  2.59±0.69  +  4.22±1.13  
F10  0±0  =  0±0  =  0±0  =  0±0  =  0±0  =  0±0  =  0±0 
F11  0.98±0.14  +  0.10±0.30  +  0.35±0.48  +  0.67±0.46  =  0±0  +  0.39±0.49  +  0.80±0.40 
F12  0.45±0.50  −  0.02±0.14  =  0±0  =  0.82±0.38  −  0±0  =  0±0  =  0.14±0.34  
F13  1±0  =  0.96±0.19  =  0±0  +  1±0  =  1±0  =  0.04±0.19  +  0.94±0.24 
F14  0±0  =  0±0  =  0±0  =  0±0  =  0±0  =  0±0  =  0±0 
F15  1±0  =  1±0  =  0±0  +  0.98±0.14  =  0.12±0.32  +  0.06±0.24  +  1±0 
F16  1±0  +  0.49±0.50  +  0±0  +  0±0  +  0±0  +  0.41±0.49  =  0.96±0.19 
F17  0±0  =  0±0  =  0±0  =  0±0  =  0±0  =  0±0  =  0±0 
F18  0±0  +  0±0  +  0±0  +  0±0  +  0±0  +  0±0  +  0.88±0.32 
F19  0±0  +  0±0  +  0±0  +  0±0  +  0±0  +  0±0  +  0.98±0.14 
F20  0±0  =  0±0  =  0±0  =  0±0  =  0±0  =  0±0  =  0±0 
+  10  13  15  11  11  14  
=  8  7  5  8  9  6  
−  2  0  0  1  0  0 
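The significance labels in Table 7 can be reproduced with a standard two independent-samples Z-test. In this sketch the critical value 1.96 corresponds to the two-sided 0.05 level; the function name and the tie-breaking rule for degenerate zero-variance samples are illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>

// z = (m1 - m2) / sqrt(s1^2/n1 + s2^2/n2), valid here since both
// sample sizes are 51 > 30. For ANOF, larger means better, so '+'
// indicates that algorithm A (e.g. WSAIC) is significantly better.
char compareByZTest(double mean1, double sd1, int n1,   // algorithm A
                    double mean2, double sd2, int n2) { // algorithm B
    double se = std::sqrt(sd1 * sd1 / n1 + sd2 * sd2 / n2);
    if (se == 0.0)                       // both samples are constant
        return mean1 == mean2 ? '=' : (mean1 > mean2 ? '+' : '-');
    double z = (mean1 - mean2) / se;
    if (z > 1.96) return '+';            // A significantly larger
    if (z < -1.96) return '-';           // A significantly smaller
    return '=';                          // difference not significant
}
```

For example, WSAIC's 32±0 against LIPS's 20.92±1.90 on F2 gives a z value far above 1.96, hence the "+" entry.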
5.2 Quality of optima found
This section compares the performance of the algorithms in terms of the quality of the optima found. Table 8 presents the mean fitness values of the optima found over 51 runs on all the test functions, with the standard deviations listed in parentheses. It is worth noting that 100*Fn. (where Fn. denotes the serial number of a function) has been subtracted from the fitness values of the optima found by all the algorithms on the CEC2015 niching test functions (i.e., F1–F15 in Table 1), so that the quality of the optima found can be compared across all the algorithms. We also conducted the two independent-samples Z-test between WSAIC and the other compared algorithms. A bold number in Table 8 means that the corresponding algorithm performs best on the function in terms of the quality of optima found. It can be seen from Table 8 that WSAIC has the best performance on F1, F2, F4, F6–F9, F14 and F17–F19. The better performance of WSAIC in terms of the quality of optima found is also supported by the total numbers of the symbols “+”, “=” and “−” in the last three rows of Table 8, in which the number of “−” symbols for each compared algorithm is much smaller than the numbers of its “+” and “=” symbols.
Fn.  LIPS  FERPSO  SPSO  SDE  CDE  WSA  WSAIC  
F1  8.68E+00  +  1.33E+01  +  3.66E-02  +  3.62E-10  =  4.32E-01  +  5.73E+01  +  0.00E+00 
(1.60E+01)  (1.89E+01)  (7.61E-02)  (1.80E-09)  (2.95E-01)  (3.19E+01)  (0.00E+00)  
F2  3.58E-14  =  6.11E-14  =  2.60E-01  +  5.51E-10  =  2.40E-04  =  6.75E-01  +  1.39E-16 
(2.52E-13)  (6.71E-14)  (2.71E-01)  (4.50E-10)  (3.04E-04)  (4.77E+00)  (9.85E-16)  
F3  4.14E-12  =  2.41E-12  =  3.57E-03  =  2.52E-10  =  1.10E-08  =  0.00E+00  =  3.25E-14 
(9.06E-12)  (9.77E-12)  (1.88E-03)  (5.00E-10)  (3.29E-09)  (0.00E+00)  (1.83E-13)  
F4  5.79E-02  =  7.23E-02  +  2.39E-01  =  2.11E-02  =  1.45E-14  =  1.29E+00  +  3.34E-15 
(4.35E-02)  (8.17E-02)  (9.09E-01)  (4.59E-02)  (2.48E-14)  (5.35E-01)  (1.75E-14)  
F5  2.38E-11  =  2.58E-12  =  5.49E-04  =  3.26E-10  =  8.58E-05  =  0.00E+00  =  2.12E-15 
(3.48E-11)  (1.51E-11)  (3.72E-04)  (3.66E-10)  (5.74E-05)  (0.00E+00)  (9.89E-15)  
F6  1.67E-13  =  4.69E-14  =  2.37E-02  =  3.56E-10  =  3.32E-11  =  2.69E-11  =  2.55E-14 
(1.17E-12)  (6.00E-14)  (2.84E-02)  (7.27E-10)  (1.15E-10)  (1.13E-10)  (5.29E-14)  
F7  5.58E-07  =  5.58E-07  =  1.22E-02  =  6.15E-07  =  5.58E-07  =  1.79E+00  +  5.58E-07 
(1.11E-14)  (6.06E-14)  (1.61E-02)  (6.42E-08)  (1.29E-14)  (1.97E+00)  (3.34E-15)  
F8  5.82E-08  =  2.12E-07  =  6.38E-05  =  5.92E-06  =  1.31E-06  =  3.02E-06  =  2.36E-08 
(2.08E-07)  (3.95E-07)  (9.64E-06)  (4.10E-06)  (5.85E-07)  (3.47E-06)  (6.89E-08)  
F9  1.54E-11  =  1.47E-10  =  1.53E+00  +  1.53E+00  +  1.53E+00  +  5.21E-10  =  6.86E-14 
(1.03E-10)  (4.23E-10)  (3.57E-09)  (7.90E-14)  (2.31E-09)  (1.00E-09)  (5.99E-14)  
F10  2.94E+01  −  4.95E+03  +  1.26E+04  +  3.41E+01  −  5.10E+01  +  9.30E+03  +  3.71E+01  
(4.16E+00)  (6.63E+03)  (4.06E+03)  (1.03E+01)  (1.13E+01)  (6.54E+03)  (1.27E+01)  
F11  3.80E-06  =  2.65E-02  =  4.52E-02  =  1.64E-01  =  6.55E-02  =  3.11E-03  =  4.17E-02 
(2.68E-05)  (6.26E-02)  (5.74E-02)  (2.89E-01)  (2.64E-02)  (8.68E-03)  (8.79E-02)  
F12  1.48E-01  −  1.66E-01  −  5.04E+01  +  1.28E-04  −  1.36E-01  −  1.49E+01  +  4.52E-01  
(1.56E-01)  (5.91E-02)  (7.69E+01)  (3.43E-04)  (2.08E-02)  (5.66E+01)  (2.59E-01)  
F13  2.27E-13  −  8.08E+00  −  4.98E+01  +  1.78E-13  −  4.50E-13  −  3.25E+02  +  1.19E+01  
(0.00E+00)  (4.17E+01)  (1.31E+01)  (1.93E-13)  (9.54E-14)  (7.85E+01)  (4.76E+01)  
F14  1.61E+02  +  1.05E+02  +  3.07E+02  +  1.39E+02  +  6.51E+02  +  4.59E+02  +  6.27E+01 
(1.93E+02)  (9.29E+01)  (2.38E+01)  (4.01E+01)  (3.16E+01)  (1.79E+02)  (2.88E+01)  
F15  2.32E-13  =  4.46E-14  =  4.92E+01  +  3.73E+00  +  1.06E-06  =  2.37E+02  +  0.00E+00 
(3.15E-14)  (9.03E-14)  (1.34E+01)  (2.64E+01)  (2.25E-06)  (7.19E+01)  (0.00E+00)  
F16  1.83E-13  =  8.26E-02  =  1.20E+00  +  4.78E-02  +  6.37E-04  =  9.45E-03  =  1.58E-03 
(9.03E-14)  (2.63E-01)  (2.13E-02)  (1.08E-02)  (9.34E-04)  (1.05E-02)  (1.02E-02)  
F17  2.13E+01  +  2.00E+01  =  2.13E+01  +  2.11E+01  +  2.13E+01  +  2.00E+01  =  2.00E+01 
(3.12E-02)  (8.07E-03)  (2.22E-02)  (1.99E-02)  (2.72E-02)  (3.21E-04)  (6.66E-05)  
F18  1.11E+02  +  5.59E+07  +  2.89E+07  +  1.91E+02  +  1.74E+09  +  9.30E+01  +  4.92E+01 
(2.57E+01)  (1.20E+08)  (7.33E+06)  (3.35E+00)  (3.66E+08)  (3.87E+01)  (1.98E+02)  
F19  1.98E+02  +  1.10E+03  +  4.08E+03  +  7.33E+02  +  1.81E+04  +  3.20E+03  +  1.37E-01 
(2.95E+01)  (7.07E+02)  (2.26E+02)  (2.62E+01)  (8.62E+02)  (6.18E+02)  (9.66E-01)  
F20  4.07E+01  −  3.68E+01  −  4.39E+01  =  4.08E+01  −  4.65E+01  +  4.41E+01  +  4.37E+01  
(1.22E+00)  (2.40E+00)  (5.23E-01)  (7.15E-01)  (2.04E-01)  (6.58E-01)  (7.35E-01)  
+  6  6  12  7  8  12  
=  10  11  8  9  10  8  
−  4  3  0  4  2  0 
Moreover, the box plots of the mean fitness values of the optima found per run over 51 runs are shown in Fig. 9. It can be seen from Fig. 9 that the dispersion of the mean fitness values of the optima found by WSAIC is quite small on most of the test functions compared with the other algorithms. WSAIC only has outliers on F11, F13, F14, F16 and F20, while most of the other algorithms have more outliers on these test functions. Therefore, it can be concluded that WSAIC is more stable in the accuracy of the optima found on these test functions than the other compared algorithms. The better performance of WSAIC in terms of the quality of optima found is also due to the improvement of the location update rule of WSA, i.e., a whale moves to a new position under the guidance of its “better and nearest” whale only if the new position is better than its original one, which ensures the ability of local exploitation. For example, when several whales follow the same extreme point, the best whale among them will stay where it is with high probability, guiding the other whales to converge to that extreme point. Besides, the method of identifying and jumping out of the located extreme points during iterations improves the global search ability as far as possible to find the global optima. For example, if some whales converge to a solution near a global optimum, some other whales that have reached a steady state will be reinitialized and may move to positions near those convergent whales, which accelerates convergence to the global optimum.
5.3 Convergence rate
From the previous two sections, it can be seen that the proposed WSAIC performs better and more consistently than the other algorithms in terms of both the quantity and the quality of the optima found on most test functions. To demonstrate the efficiency of WSAIC in locating the global optima, this section compares WSAIC with the other algorithms except FERPSO and WSA (because the populations of FERPSO and WSA may prematurely converge to a solution, or to several solutions with the same fitness value, and quit iterating) in terms of convergence rate. Six functions (F1, F4, F9, F14, F18 and F19, of which F9 has no local optima while the others all come with local optima) are used to test these algorithms. The convergence curves of all the algorithms on these test functions are depicted in Fig. 10, in which the abscissa represents the number of function evaluations and the ordinate denotes the mean of the fitness values of the current global optima over 51 runs. As can be seen from Fig. 10(c), for function F9 without local optima, SPSO, SDE and CDE all fail to converge to the global optima, and WSAIC converges to the global optima at a much faster rate than LIPS. Although LIPS can converge to the global optima of F9, it obtains a much lower ANOF on F9 than WSAIC, as shown in Table 7. Moreover, for functions F1, F4, F14, F18 and F19, which have multiple local optima, WSAIC reaches the global optima with a fast convergence rate on F1 and F4, as shown in Fig. 10(a) and Fig. 10(b), compared with the other algorithms, and converges to better solutions than the other algorithms on F14, F18 and F19, as shown in Fig. 10(d), Fig. 10(e) and Fig. 10(f). In particular, the convergence rates of WSAIC on F14 and F19 are much faster than those of the other algorithms. Therefore, it can be concluded that the proposed WSAIC has excellent convergence-rate performance compared with the other algorithms.
5.4 The effect of different parameter values
As mentioned in section 3.4, the intensity of the ultrasound source and the attenuation coefficient are two constants, always set to 2 and 0 respectively, and for almost all problems, especially those without prior knowledge, the fitness threshold can be set to a sufficiently small positive value. Thus, only the stability threshold may need different values for different problems. This section presents the ANOF obtained by WSAIC on all the test functions with different stability threshold values, as shown in Table 9. A clear visual comparison of the ANOF obtained by WSAIC with different stability threshold values is shown in Fig. 11, where the ANOF values for each test function are normalized: 1 refers to the best ANOF value and 0 to the worst. As can be seen from Table 9 and Fig. 11, WSAIC achieves the best ANOF values on most test functions with the stability threshold set to 100. Therefore, this parameter can be set to 100 for almost all continuous optimization problems.
Fn.  Stability threshold: 20  40  60  80  100  120  140  160  180  200 

F1  1  1  1  1  1  1  1  1  1  1 
F2  32  32  32  32  32  32  32  32  31.98  32 
F3  472.94  586.94  622.20  624.92  625  625  625  624.96  624.90  624.96 
F4  0.69  0.92  0.96  0.98  1  1  1  1  1  1 
F5  125  125  125  125  125  125  125  124.94  124.94  124.90 
F6  16  16  16  16  16  15.98  15.94  15.94  15.96  15.96 
F7  8  8  8  8  8  8  8  8  7.98  8 
F8  216  215.98  216  216  216  215.92  215.88  215.88  215.88  215.82 
F9  5.69  4.69  4.24  3.71  4.22  3.84  3.86  4.04  4.06  3.71 
F10  0  0  0  0  0  0  0  0  0  0 
F11  0.82  0.76  0.63  0.65  0.80  0.73  0.67  0.73  0.73  0.67 
F12  0.08  0.02  0.06  0.06  0.14  0.08  0.08  0.08  0.04  0.06 
F13  0.69  0.82  0.63  0.67  0.94  0.61  0.69  0.49  0.51  0.49 
F14  0  0  0  0  0  0  0  0  0  0 
F15  0.94  0.80  0.92  0.88  1  0.92  0.65  0.67  0.71  0.55 
F16  0.84  0.88  0.92  0.73  0.96  0.76  0.92  0.90  0.92  0.76 
F17  0  0  0  0  0  0  0  0  0  0 
F18  0.71  0.78  0.57  0.78  0.88  0.64  0.69  0.67  0.75  0.67 
F19  1  0.98  0.96  0.98  0.98  0.96  0.94  0.88  0.92  0.82 
F20  0  0  0  0  0  0  0  0  0  0 
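The per-function normalization behind Fig. 11 is, presumably, standard min-max scaling; a minimal sketch (the handling of a constant row, such as the all-zero rows for F10, F14, F17 and F20, is an assumption):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Scale one function's ANOF values so the best maps to 1 and the worst to 0.
std::vector<double> normalize(const std::vector<double>& anof) {
    double lo = *std::min_element(anof.begin(), anof.end());
    double hi = *std::max_element(anof.begin(), anof.end());
    std::vector<double> out;
    for (double v : anof)
        out.push_back(hi == lo ? 1.0 : (v - lo) / (hi - lo));  // constant row: treat all as best
    return out;
}
```

Applied to the F4 row of Table 9, for instance, the value 0.69 at a stability threshold of 20 maps to 0 and the values of 1 map to 1.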
6 Conclusions and future research
A new multimodal optimizer named Whale Swarm Algorithm with Iterative Counter (WSAIC), based on our preliminary work in Zeng2017 (), is proposed in this paper. First, WSAIC improves the iteration rule of the original WSA when the attenuation coefficient is set to 0, i.e., a whale moves to a new position under the guidance of its “better and nearest” whale only if the new position is better than its original one. As a result, WSAIC removes the need to specify different values of the attenuation coefficient for different problems to form multiple subpopulations, without introducing any niching parameter, and the ability of local exploitation is also ensured. Moreover, WSAIC enables the identification of located extreme points and the escape from them during iterations, relying on two new parameters, the stability threshold and the fitness threshold. If a whale does not find a better position after a number of successive iterations equal to the stability threshold, it is considered to have located an extreme point and is reinitialized, so as to eliminate unnecessary function evaluations and improve the global search ability. If the difference between the fitness value of the located extreme point and the fitness value of the best one among the current global optima is less than the fitness threshold, the located extreme point is considered a current global optimum. The values of both thresholds are very easy to set for different problems. The experimental results clearly show that WSAIC performs statistically better than the other niching metaheuristic algorithms on most test functions across comprehensive metrics.
The main contributions of this paper are summarized in the following four aspects.

WSAIC removes the need to specify optimal niching parameters for different problems, which increases its practicality.

WSAIC can efficiently identify and jump out of located extreme points during iterations, so as to locate as many global optima as possible in a single run, which further increases its practicality.

The algorithm-specific parameters of WSAIC are easy to assign for different problems, which also increases its practicality.

The population size of WSAIC does not need to match the number of optima of the optimization problem exactly. In general, WSAIC can keep a relatively small population size, which contributes significantly to reducing the computational complexity.
In the future, we will focus on the following aspects.

Introducing other metaheuristic or heuristic algorithms for the best whale in each iteration to perform the neighborhood search, so as to further improve the local search ability and the quality of the located optima.

Designing new methods to escape from located extreme points, instead of random reinitialization, so that the population spreads over the entire solution space as widely as possible.
Acknowledgements
This research was supported by the National Natural Science Foundation of China (NSFC) (51435009, 51421062 and 61232008) and the National Key Technology Support Program (2015BAF01B04).
References
 (1) K. Gao, Y. Zhang, A. Sadollah, A. Lentzakis, R. Su, Jaya, harmony search and water cycle algorithms for solving large-scale real-life urban traffic light scheduling problem, Swarm and Evolutionary Computation.
 (2) M. F. Tasgetiren, D. Kizilay, Q.K. Pan, P. N. Suganthan, Iterated greedy algorithms for the blocking flowshop scheduling problem with makespan criterion, Computers & Operations Research 77 (2017) 111–126.
 (3) G. Lin, W. Zhu, M. M. Ali, An effective hybrid memetic algorithm for the minimum weight dominating set problem, IEEE Transactions on Evolutionary Computation 20 (6) (2016) 892–907.
 (4) H. Ismkhan, Effective heuristics for ant colony optimization to handle largescale problems, Swarm and Evolutionary Computation 32 (2017) 140–149.
 (5) J. Yi, X. Li, C.H. Chu, L. Gao, Parallel chaotic local search enhanced harmony search algorithm for engineering design optimization, Journal of Intelligent Manufacturing (2016) 1–24.
 (6) Z. A. E. M. Dahi, C. Mezioud, A. Draa, A quantum-inspired genetic algorithm for solving the antenna positioning problem, Swarm and Evolutionary Computation 31 (2016) 24–63.
 (7) X. Li, Niching without niching parameters: particle swarm optimization using a ring topology, IEEE Transactions on Evolutionary Computation 14 (1) (2010) 150–169.
 (8) K. A. De Jong, An analysis of the behavior of a class of genetic adaptive systems, Ph.D. thesis, University of Michigan, Ann Arbor (1975).
 (9) D. E. Goldberg, J. Richardson, et al., Genetic algorithms with sharing for multimodal function optimization, in: Genetic algorithms and their applications: Proceedings of the Second International Conference on Genetic Algorithms, Hillsdale, NJ: Lawrence Erlbaum, 1987, pp. 41–49.
 (10) X. Yin, N. Germay, A fast genetic algorithm with sharing scheme using cluster analysis methods in multimodal function optimization, in: Artificial neural nets and genetic algorithms, 1993, pp. 450–457.
 (11) G. R. Harik, Finding multimodal solutions using restricted tournament selection., in: ICGA, 1995, pp. 24–31.
 (12) M. Bessaou, A. Pétrowski, P. Siarry, Island model cooperating with speciation for multimodal optimization, in: Parallel Problem Solving from Nature PPSN VI, Springer, 2000, pp. 437–446.
 (13) K. Deb, D. E. Goldberg, An investigation of niche and species formation in genetic function optimization, in: Proceedings of the 3rd international conference on genetic algorithms, Morgan Kaufmann Publishers Inc., 1989, pp. 42–50.
 (14) J. Kennedy, R. Mendes, Population structure and particle swarm performance, in: Evolutionary Computation, 2002. CEC’02. Proceedings of the 2002 Congress on, Vol. 2, IEEE, 2002, pp. 1671–1676.
 (15) X. Li, M. Epitropakis, K. Deb, A. Engelbrecht, Seeking multiple solutions: an updated survey on niching methods and their applications, IEEE Transactions on Evolutionary Computation.
 (16) R. Thomsen, Multimodal optimization using crowding-based differential evolution, in: Evolutionary Computation, 2004. CEC2004. Congress on, Vol. 2, IEEE, 2004, pp. 1382–1389.
 (17) S. W. Mahfoud, Crowding and preselection revisited, IlliGAL Report No. 92004, University of Illinois at Urbana-Champaign (1992).
 (18) O. J. Mengshoel, D. E. Goldberg, Probabilistic crowding: Deterministic crowding with probabilistic replacement, in: Proc. of the Genetic and Evolutionary Computation Conference (GECCO99), 1999, p. 409.
 (19) R. K. Ursem, Multinational evolutionary algorithms, in: Evolutionary Computation, 1999. CEC 99. Proceedings of the 1999 Congress on, Vol. 3, IEEE, 1999, pp. 1633–1640.
 (20) C. L. Stoean, M. Preuss, R. Stoean, D. Dumitrescu, Disburdening the species conservation evolutionary algorithm of arguing with radii, in: Proceedings of the 9th annual conference on Genetic and evolutionary computation, ACM, 2007, pp. 1420–1427.
 (21) B. Zeng, L. Gao, X. Li, Whale Swarm Algorithm for Function Optimization, Springer International Publishing, Cham, 2017, pp. 624–639. doi:10.1007/978-3-319-63309-1_55.
 (22) S. Das, S. Maity, B.-Y. Qu, P. N. Suganthan, Real-parameter evolutionary multimodal optimization – a survey of the state-of-the-art, Swarm and Evolutionary Computation 1 (2) (2011) 71–88.
 (23) J.P. Li, M. E. Balazs, G. T. Parks, P. J. Clarkson, A species conserving genetic algorithm for multimodal function optimization, Evolutionary computation 10 (3) (2002) 207–234.
 (24) X. Li, Efficient differential evolution using speciation for multimodal function optimization, in: Proceedings of the 7th annual conference on Genetic and evolutionary computation, ACM, 2005, pp. 873–880.
 (25) X. Li, Adaptively choosing neighbourhood bests using species in a particle swarm optimizer for multimodal function optimization, in: Genetic and Evolutionary Computation Conference, Springer, 2004, pp. 105–116.
 (26) D. Beasley, D. R. Bull, R. R. Martin, A sequential niche technique for multimodal function optimization, Evolutionary computation 1 (2) (1993) 101–125.
 (27) R. Brits, A. P. Engelbrecht, F. Van den Bergh, A niching particle swarm optimizer, in: Proceedings of the 4th Asia-Pacific conference on simulated evolution and learning, Vol. 2, Singapore: Orchid Country Club, 2002, pp. 692–696.
 (28) C. Stoean, M. Preuss, R. Stoean, D. Dumitrescu, Multimodal optimization by means of a topological species conservation algorithm, IEEE Transactions on Evolutionary Computation 14 (6) (2010) 842–864.
 (29) K. Deb, A. Saha, Finding multiple solutions for multimodal optimization problems using a multiobjective evolutionary approach, in: Proceedings of the 12th annual conference on genetic and evolutionary computation, ACM, 2010, pp. 447–454.
 (30) L. Li, K. Tang, History-based topological speciation for multimodal optimization, IEEE Transactions on Evolutionary Computation 19 (1) (2015) 136–150.
 (31) J. J. Liang, A. K. Qin, P. N. Suganthan, S. Baskar, Comprehensive learning particle swarm optimizer for global optimization of multimodal functions, IEEE transactions on evolutionary computation 10 (3) (2006) 281–295.
 (32) X. Li, A multimodal particle swarm optimizer based on fitness Euclidean-distance ratio, in: Proceedings of the 9th annual conference on Genetic and evolutionary computation, ACM, 2007, pp. 78–85.
 (33) B.Y. Qu, P. N. Suganthan, J.J. Liang, Differential evolution with neighborhood mutation for multimodal optimization, IEEE transactions on evolutionary computation 16 (5) (2012) 601–614.
 (34) B.-Y. Qu, P. N. Suganthan, S. Das, A distance-based locally informed particle swarm model for multimodal optimization, IEEE Transactions on Evolutionary Computation 17 (3) (2013) 387–402.
 (35) Y. Wang, H.X. Li, G. G. Yen, W. Song, Mommop: Multiobjective optimization for locating multiple optimal solutions of multimodal optimization problems, IEEE transactions on cybernetics 45 (4) (2015) 830–843.
 (36) http://www.ntu.edu.sg/home/EPNSugan/index_files/CEC2015/CEC2015.htm.