A Framework to Handle Multimodal Multiobjective Optimization in Decomposition-based Evolutionary Algorithms
Abstract
Multimodal multiobjective optimization aims to locate as many (almost) equivalent Pareto optimal solutions as possible. While decomposition-based evolutionary algorithms have good performance for multiobjective optimization, they are likely to perform poorly for multimodal multiobjective optimization due to the lack of mechanisms to maintain the solution space diversity. To address this issue, this paper proposes a framework to improve the performance of decomposition-based evolutionary algorithms for multimodal multiobjective optimization. Our framework is based on three operations: the assignment, deletion, and addition operations. One or more individuals can be assigned to the same subproblem to handle multiple equivalent solutions. In each iteration, a child is assigned to a subproblem based on its objective vector, i.e., its location in the objective space. The child is compared with its neighbors in the solution space that are assigned to the same subproblem. The performance of improved versions of six decomposition-based evolutionary algorithms obtained by our framework is evaluated on various test problems regarding the number of objectives, decision variables, and equivalent Pareto optimal solution sets. Results show that the improved versions perform clearly better than their original algorithms.
I Introduction
MULTIOBJECTIVE optimization problems (MOPs) appear in real-world applications. Since no solution can simultaneously minimize multiple objective functions in general, the goal of an MOP is usually to find a Pareto optimal solution preferred by a decision maker [32]. When the decision maker's preference information is unavailable a priori, "a posteriori" decision making is conducted: the decision maker selects the final solution from a set of solutions that approximates the Pareto front in the objective space.
An evolutionary multiobjective optimization algorithm (EMOA) is frequently used for the "a posteriori" decision making. Since EMOAs are population-based optimizers, they are likely to find a set of solutions in a single run. A number of EMOAs have been proposed in the literature. They can be classified into the following three categories: dominance-based EMOAs (e.g., NSGA-II [10]), indicator-based EMOAs (e.g., IBEA [59]), and decomposition-based EMOAs (e.g., MOEA/D [56] and NSGA-III [11]). In particular, decomposition-based EMOAs have shown promising performance [46].
In the EMO community, it has been implicitly assumed that the decision maker is interested only in the distribution of solutions in the objective space. Thus, the distribution of solutions in the solution space has not received much attention. However, after the decision maker has selected the final solution based on its objective vector, she/he may want to examine other dissimilar solutions with equivalent quality or slightly inferior quality [40, 38, 39].
Fig. 1 shows a situation where the four solutions $\mathbf{x}^1$, $\mathbf{x}^2$, $\mathbf{x}^3$, and $\mathbf{x}^4$ are far from each other in the solution space but close to each other in the objective space. Although $\mathbf{x}^4$ is a dominated solution, it may be acceptable for the decision maker. This is because the difference between $\mathbf{f}(\mathbf{x}^1)$, $\mathbf{f}(\mathbf{x}^2)$, $\mathbf{f}(\mathbf{x}^3)$, and $\mathbf{f}(\mathbf{x}^4)$ is small enough. If the decision maker has obtained multiple dissimilar solutions with similar quality, she/he can select the final solution according to her/his preference in the solution space. For example, suppose that $\mathbf{x}^2$ in Fig. 1 becomes unavailable due to some accident (e.g., material shortages and mechanical failures) after the decision maker has selected it as the final solution. In such a case, she/he can substitute one of $\mathbf{x}^1$, $\mathbf{x}^3$, and $\mathbf{x}^4$ for $\mathbf{x}^2$. Schütze et al. give a more practical example on space mission design problems [39]. Sebag et al. also demonstrate the importance of multiple equivalent solutions for the decision maker on functional brain imaging problems [40]. Previous studies on other real-world problems with multiple equivalent solutions include diesel engine design problems [15], distillation plant layout problems [33], and rocket engine design problems [25].
A multimodal multiobjective problem (MMOP) is to locate as many (almost) equivalent Pareto optimal solutions as possible [40, 13]. On the one hand, it is sufficient for MOPs to find one of $\mathbf{x}^1$, $\mathbf{x}^2$, and $\mathbf{x}^3$ in Fig. 1 since their quality is almost the same. On the other hand, all of $\mathbf{x}^1$, $\mathbf{x}^2$, $\mathbf{x}^3$, and $\mathbf{x}^4$ should be found for MMOPs. Since multiple equivalent solutions play crucial roles in a reliable decision-making process as explained above, MMOPs are important in practice. Evolutionary multimodal multiobjective optimization algorithms (EMMAs) are specially designed optimizers for MMOPs. Unlike EMOAs, EMMAs have mechanisms to handle multiple equivalent solutions. Representative EMMAs include the MOEA in [39], Omni-optimizer [13], DIOP [47], Niching-CMA [41], and MO_Ring_PSO_SCD [54].
In our earlier study [45], we proposed MOEA/D-AD, a MOEA/D specially designed for MMOPs. MOEA/D-AD can assign multiple individuals to each subproblem in order to handle equivalent solutions. In each iteration, a child is assigned to the subproblem whose weight vector is closest to the child's objective vector in terms of the perpendicular distance. Then, the child is compared with the individuals assigned to the same subproblem based on the weighted Tchebycheff function and the contribution to the solution space diversity.
In this paper, we extend our earlier study [45] and propose a general framework (called ADA) to handle multimodal multiobjective optimization in a variety of decomposition-based EMOAs (including reference vector-based algorithms such as NSGA-III [11] and θ-DEA [52]). The proposed ADA framework uses the assignment, deletion, and addition operations. While decomposition-based EMOAs work well for multiobjective optimization, they are likely to perform poorly for multimodal multiobjective optimization because most of them do not have mechanisms to maintain the solution space diversity. In this paper, we show that efficient EMMAs can be realized by incorporating a solution space diversity maintenance mechanism into decomposition-based EMOAs. The proposed ADA framework facilitates the solution space diversity maintenance of existing EMOAs. We incorporate ADA into the following six representative decomposition-based EMOAs: MOEA/D-AGR [50], MOEA/D-DU [53], eMOEA/D [22], NSGA-III [11], θ-DEA [52], and RVEA [6]. We examine the ability of the ADA versions of those EMOAs (MOEA/D-AGR-ADA, MOEA/D-DU-ADA, eMOEA/D-ADA, NSGA-III-ADA, θ-DEA-ADA, and RVEA-ADA) to locate multiple equivalent Pareto optimal solutions on various test problems. We also analyze the behavior of those ADA versions.
The differences between this paper and our previous study [45] are as follows:

We propose ADA. While MOEA/D-AD [45] is a single algorithm for MMOPs, ADA is a framework to improve the performance of existing decomposition-based EMOAs for MMOPs. In contrast to MOEA/D-AD, each configuration (e.g., the assignment operation) in ADA depends on the EMOA to be combined.

We demonstrate that ADA can be combined with the six decomposition-based EMOAs, including so-called reference vector-based EMOAs. MOEA/D-AD does not have such flexibility.

We introduce a practical two-phase decision-making method that uses a solution set found by an ADA-based algorithm. We also introduce two postprocessing methods for benchmarking ADA-based algorithms.

While we used only two-objective and two-variable problems in [45], here we use problems with various numbers of objectives, design variables, and equivalent Pareto optimal solution subsets to investigate the scalability of EMMAs. Such a scale-up study has not been performed in the literature. We also use problems with distance-related variables [55, 30].
The rest of this paper is organized as follows. Section II provides some preliminaries of this paper. Section III introduces ADA and explains how to incorporate it into decompositionbased EMOAs. Section IV describes the experimental setup. Section V examines the performance of the six ADA variants. Section VI concludes this paper.
II Preliminaries
First, Subsections II-A and II-B give the definitions of MOPs and MMOPs, respectively. Then, Subsections II-C and II-D describe decomposition-based EMOAs and reference vector-based EMOAs, respectively. Subsections II-C and II-D mainly explain components of the six existing EMOAs used in ADA. Algorithms S.1–S.6 in the supplementary file provide the overall procedures of the six EMOAs. Section III introduces how each component is used in ADA.
II-A Definition of MOPs
A continuous MOP is to find a solution $\mathbf{x} \in \mathbb{S}$ that minimizes a given objective function vector $\mathbf{f}(\mathbf{x}) = (f_1(\mathbf{x}), \ldots, f_M(\mathbf{x}))^{\mathrm{T}}$. Here, $\mathbb{S} \subseteq \mathbb{R}^D$ is the $D$-dimensional solution space, where $x_j \in [x_j^{\min}, x_j^{\max}]$ for each index $j \in \{1, \ldots, D\}$, and $\mathbb{R}^M$ is the $M$-dimensional objective space.
A solution $\mathbf{x}^1$ is said to dominate $\mathbf{x}^2$ if and only if $f_i(\mathbf{x}^1) \leq f_i(\mathbf{x}^2)$ for all $i \in \{1, \ldots, M\}$ and $f_i(\mathbf{x}^1) < f_i(\mathbf{x}^2)$ for at least one index $i$. If a solution $\mathbf{x}^*$ is not dominated by any other solutions, it is called a Pareto optimal solution. The set of all $\mathbf{x}^*$ is the Pareto optimal solution set, and the set of all $\mathbf{f}(\mathbf{x}^*)$ is the Pareto front. The goal of an MOP for the "a posteriori" decision making is to find a nondominated solution set that approximates the Pareto front in the objective space.
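The Pareto dominance relation above can be checked directly from two objective vectors. The following sketch (the function name is ours, not from the paper) assumes minimization:

```python
def dominates(f1, f2):
    """True if objective vector f1 Pareto-dominates f2 (minimization):
    f1 is no worse in every objective and strictly better in at least one."""
    return (all(a <= b for a, b in zip(f1, f2))
            and any(a < b for a, b in zip(f1, f2)))
```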
II-B Definition of MMOPs
Although the term "multimodal multiobjective optimization" was first coined in [12, 40] in 2005, the definition of MMOPs has not been explicitly given in most previous studies. We consider the following two types of MMOPs: (i) a Type-1 MMOP is to locate all Pareto optimal solutions, and (ii) a Type-2 MMOP is to locate all Pareto optimal solutions as well as non-Pareto optimal solutions that have acceptable quality for the decision maker. Those non-Pareto optimal solutions should be far from the Pareto optimal solutions and from each other in the solution space.
For example, $\mathbf{x}^1$, $\mathbf{x}^2$, and $\mathbf{x}^3$ in Fig. 1 should be found for Type-1 MMOPs. In addition, the non-Pareto optimal solution $\mathbf{x}^4$ should be found for Type-2 MMOPs if its quality is acceptable to the decision maker. While most existing studies (e.g., [13, 54]) assume Type-1 MMOPs, only a few studies (e.g., [40, 39, 47]) address Type-2 MMOPs. Although there is room for discussion, Type-2 MMOPs may be more practical than Type-1 MMOPs. Diverse solutions with similar quality to the final solution are beneficial to the decision maker even if their quality is slightly worse than that of the final solution [40, 39].
Nevertheless, we focus only on Type-1 MMOPs in this paper. The main reason is the difficulty in benchmarking EMMAs on Type-2 MMOPs. In contrast to Type-1 MMOPs, Type-2 MMOPs are loosely defined. This is because the terms "acceptable quality" and "far from" in Type-2 MMOPs significantly depend on the decision maker. It is difficult to define these two key factors in Type-2 MMOPs for benchmarking purposes in a fair manner. How to evaluate the performance of EMMAs for Type-2 MMOPs is itself another research topic.
II-C Decomposition-based EMOAs (MOEA/D-type algorithms)
Although several frameworks of decomposition-based EMOAs have been proposed in the literature, MOEA/D-type algorithms show promising performance [46]. Here, we describe MOEA/D-type algorithms. First, we explain the most basic MOEA/D [56] and its modified version MOEA/D-DE [26]. The framework of MOEA/D-DE is used in most MOEA/D-type algorithms. Then, we describe components of three MOEA/D-type algorithms: MOEA/D-AGR [50], MOEA/D-DU [53], and eMOEA/D [22].
MOEA/D
The most basic MOEA/D [56] decomposes an $M$-objective MOP into $N$ single-objective subproblems using a scalarizing function $g$ and a set of uniformly distributed weight vectors $\mathbf{W} = \{\mathbf{w}^1, \ldots, \mathbf{w}^N\}$, where $\mathbf{w}^j = (w_1^j, \ldots, w_M^j)^{\mathrm{T}}$ and $\sum_{i=1}^{M} w_i^j = 1$ for each $j \in \{1, \ldots, N\}$. The $j$th individual $\mathbf{x}^j$ in the population $P = \{\mathbf{x}^1, \ldots, \mathbf{x}^N\}$ is assigned to the $j$th subproblem with $\mathbf{w}^j$. Thus, the population size is always equal to the number of weight vectors $N$. The $j$th subproblem also has its neighborhood index list $B^j$, which consists of the indices of the $T$ closest weight vectors to $\mathbf{w}^j$ in the weight vector space.
After the initialization of $P$ and $\mathbf{W}$, the following steps are repeatedly performed until a termination condition is satisfied. In each iteration, for the $j$th subproblem ($j \in \{1, \ldots, N\}$), an index list $R$ is set to $B^j$. Two indices $a$ and $b$ of the parent individuals $\mathbf{x}^a$ and $\mathbf{x}^b$ are randomly selected from $R$. The child $\mathbf{u}$ is generated by applying variation operators to $\mathbf{x}^a$ and $\mathbf{x}^b$. The SBX crossover and the polynomial mutation [9] are used in the original MOEA/D. After $\mathbf{u}$ has been generated, the replacement selection is applied to each neighborhood subproblem indexed by $k \in R$: the current solution $\mathbf{x}^k$ of the $k$th subproblem is replaced with $\mathbf{u}$ if $g(\mathbf{u} \,|\, \mathbf{w}^k) \leq g(\mathbf{x}^k \,|\, \mathbf{w}^k)$.
MOEA/D-DE
The differences between MOEA/D and MOEA/D-DE are threefold. First, the differential evolution (DE) operator [43] is used instead of the SBX crossover [9]. Second, two types of index lists ($B^j$ and $\{1, \ldots, N\}$) are used: the index list $R$ is set to $B^j$ with a probability of $\delta$ and to $\{1, \ldots, N\}$ with a probability of $1 - \delta$. Thus, $R$ can be set to all individual indices. Third, the maximum number of individuals replaced by the child $\mathbf{u}$ is restricted to a parameter $n^r$.
MOEA/D-AGR
The difference between MOEA/D-AGR and MOEA/D-DE lies in the replacement method. In MOEA/D-AGR, the indices of subproblems to be updated by a newly generated child $\mathbf{u}$ are set to the $T^r$ neighborhood indices of the $k$th subproblem in the weight vector space, where $k$ is the index of the subproblem whose scalarizing function value for $\mathbf{u}$ is the best among all $N$ subproblems:
$k = \operatorname{argmin}_{j \in \{1, \ldots, N\}} \, g^{\mathrm{tch}}(\mathbf{u} \,|\, \mathbf{w}^j)$. (1)
The weighted Tchebycheff function $g^{\mathrm{tch}}$ is used in (1):
$g^{\mathrm{tch}}(\mathbf{x} \,|\, \mathbf{w}) = \max_{i \in \{1, \ldots, M\}} \bigl\{ w_i \, |f_i(\mathbf{x}) - z_i^*| \bigr\}$, (2)
where $\mathbf{z}^* = (z_1^*, \ldots, z_M^*)^{\mathrm{T}}$ is the ideal point. Since finding the true ideal point is difficult in general, its approximation is used in (2). The $i$th element of the approximated $\mathbf{z}^*$ is the minimum value of $f_i$ found during the search process.
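A minimal sketch of (2) together with the running ideal-point approximation (helper names are ours):

```python
def tchebycheff(f, w, z_star):
    # g^tch(x | w) = max_i { w_i * |f_i(x) - z_i*| }  -- Eq. (2)
    return max(wi * abs(fi - zi) for fi, wi, zi in zip(f, w, z_star))

def update_ideal(z_star, f):
    # the approximated ideal point keeps the minimum f_i found so far
    return [min(zi, fi) for zi, fi in zip(z_star, f)]
```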
The replacement neighborhood size $T^r$ plays a crucial role in balancing exploration and exploitation in MOEA/D-AGR, in a similar manner to $T$ in MOEA/D. A large $T^r$ value encourages exploitation. In MOEA/D-AGR, the $T^r$ value deterministically increases with the number of iterations. The results presented in [50] show that a scheduling method based on the sigmoid function is suitable for MOEA/D-AGR.
MOEA/D-DU
In contrast to MOEA/D-AGR, the selection of subproblems to be updated in MOEA/D-DU is based on the distance in the objective and weight vector spaces. MOEA/D-DU uses the following variant of the weighted Tchebycheff function: $g^{\mathrm{tch}'}(\mathbf{x} \,|\, \mathbf{w}) = \max_{i \in \{1, \ldots, M\}} \bigl\{ |f_i(\mathbf{x}) - z_i^*| \,/\, w_i \bigr\}$. (3) In the normalized objective space, the perpendicular distance between the normalized objective vector $\bar{\mathbf{f}}(\mathbf{u})$ and $\mathbf{w}^j$ is calculated for each $j \in \{1, \ldots, N\}$. The replacement is performed only for the $K$ subproblems with the minimum perpendicular distances. Similar to $T^r$ in MOEA/D-AGR, $K$ controls the balance between exploration and exploitation.
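The perpendicular distance used here (and in (6) below) is the distance from a point to the line through the origin spanned by a weight vector; a sketch under our own naming:

```python
import math

def perpendicular_distance(f_bar, w):
    """Distance from the normalized objective vector f_bar to the line
    through the origin spanned by the weight vector w."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    # length of the projection of f_bar onto the unit vector w / ||w||
    proj = sum(fi * wi for fi, wi in zip(f_bar, w)) / norm_w
    # distance between f_bar and its projection onto the line
    return math.sqrt(sum((fi - proj * wi / norm_w) ** 2
                         for fi, wi in zip(f_bar, w)))
```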
eMOEA/D
In eMOEA/D, the replacement method is applied to the $K$ subproblems with the minimum scalarizing function values for $\mathbf{u}$. MOEA/D-AGR and eMOEA/D are similar in that both use scalarizing function values to select subproblems. The following multiplicative scalarizing function (MSF) [22] is used in eMOEA/D:
$g^{\mathrm{msf}}(\mathbf{x} \,|\, \mathbf{w}) = \dfrac{\bigl( \max_{i \in \{1, \ldots, M\}} \{ |f_i(\mathbf{x}) - z_i^*| / w_i \} \bigr)^{1+\alpha}}{\bigl( \min_{i \in \{1, \ldots, M\}} \{ |f_i(\mathbf{x}) - z_i^*| / w_i \} \bigr)^{\alpha}}$, (4)
where $\alpha$ controls the size of the so-called improvement region. $\alpha$ plays a role similar to that of the penalty value $\theta$ of the PBI function in (7), which will be explained later. A larger $\alpha$ value more strongly penalizes a solution whose objective vector is far from $\mathbf{w}$ in terms of the perpendicular distance. In (4), $g^{\mathrm{msf}}$ with $\alpha = 0$ is identical to the weighted Tchebycheff function in (3).
According to the general rule of thumb "emphasize diversity and convergence at the early and later stages, respectively", $\alpha$ decreases linearly with the number of iterations $t$:
$\alpha = \alpha_0 \Bigl( 1 - \dfrac{t}{t_{\max}} \Bigr)$, (5)
where $t_{\max}$ is the maximum number of iterations and $\alpha_0$ is the initial value of $\alpha$. A recommended setting of $\alpha_0$ is given in [22].
II-D Reference vector-based EMOAs
Representative reference vector-based EMOAs include NSGA-III [11], θ-DEA [52], RVEA [6], VaEA [51], and SPEA/R [23]. While $\mathbf{w}$ is called the weight vector in decomposition-based EMOAs, it is referred to as the reference vector in reference vector-based EMOAs. Below, we explain NSGA-III, θ-DEA, and RVEA.
NSGA-III
NSGA-III is an improved version of NSGA-II [10] for many-objective optimization. In each iteration $t$, $N$ children are generated by applying variation operators to randomly selected pairs of individuals from the population $P^t$. In the environmental selection, the individuals for the next iteration $P^{t+1}$ are selected from the union of $P^t$ and the child set $Q^t$. The primary and secondary criteria are based on the nondomination levels and the reference vector-based niching method, respectively.
Individuals in the union $P^t \cup Q^t$ are grouped into fronts $F_1, F_2, \ldots$ according to their nondomination levels. First, $P^{t+1}$ and the front index $l$ are initialized to $\emptyset$ and $1$, respectively. Then, individuals in $F_l$ are added to $P^{t+1}$ and $l$ is incremented as long as the whole front fits, so that $P^{t+1} = F_1 \cup \cdots \cup F_{l^{\mathrm{last}}-1}$, where $l^{\mathrm{last}}$ is the index of the last front. If $|P^{t+1}| < N$, the remaining individuals are selected from the last front $F_{l^{\mathrm{last}}}$ using the niching method described below. At the beginning of the niching procedure, each individual $\mathbf{x}$ in $P^{t+1}$ and $F_{l^{\mathrm{last}}}$ is assigned to the $j$th subproblem with the minimum perpendicular distance between its normalized objective vector $\bar{\mathbf{f}}(\mathbf{x})$ and $\mathbf{w}^j$:
$j = \operatorname{argmin}_{k \in \{1, \ldots, N\}} \, d^{\perp}\bigl(\bar{\mathbf{f}}(\mathbf{x}), \mathbf{w}^k\bigr)$, (6)
where the function $d^{\perp}$ returns the perpendicular distance between its two input vectors. After the assignment of all individuals, the remaining individuals for the next iteration are selected from $F_{l^{\mathrm{last}}}$ based on the number of individuals assigned to each subproblem and their perpendicular distances.
θ-DEA
The environmental selection in θ-DEA is similar to that of NSGA-III. However, the secondary criterion in θ-DEA is based on the so-called θ-dominance. First, each individual in the union of $P^t$ and $Q^t$ is assigned to the $j$th subproblem using (6). Then, the individuals assigned to each subproblem are ranked based on their θ-dominance levels. Let us assume that two individuals $\mathbf{x}^1$ and $\mathbf{x}^2$ are assigned to the same $j$th subproblem. $\mathbf{x}^1$ is said to θ-dominate $\mathbf{x}^2$ if $g^{\mathrm{pbi}}(\mathbf{x}^1 \,|\, \mathbf{w}^j) < g^{\mathrm{pbi}}(\mathbf{x}^2 \,|\, \mathbf{w}^j)$. The PBI function $g^{\mathrm{pbi}}$ is given as:
$g^{\mathrm{pbi}}(\mathbf{x} \,|\, \mathbf{w}) = d_1 + \theta \, d_2$, (7)
$d_1 = \bigl| \bigl( \mathbf{f}(\mathbf{x}) - \mathbf{z}^* \bigr)^{\mathrm{T}} \mathbf{w} \bigr| \,/\, \|\mathbf{w}\|$, (8)
$d_2 = \bigl\| \mathbf{f}(\mathbf{x}) - \bigl( \mathbf{z}^* + d_1 \, \mathbf{w} / \|\mathbf{w}\| \bigr) \bigr\|$, (9)
where $\|\cdot\|$ indicates the Euclidean norm. The distance $d_1$ represents how close the objective vector $\mathbf{f}(\mathbf{x})$ is to the Pareto front, and $d_2$ is the perpendicular distance between $\mathbf{f}(\mathbf{x})$ and the line defined by $\mathbf{w}$. The penalty parameter $\theta$ balances the convergence ($d_1$) and the diversity ($d_2$). The recommended value is $\theta = 10^6$ for weight vectors along the axis directions (e.g., $(1, 0, \ldots, 0)^{\mathrm{T}}$) and $\theta = 5$ for all the other vectors.
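Equations (7)–(9) can be sketched as follows (minimization, with θ as a parameter; names are ours):

```python
import math

def pbi(f, w, z_star, theta=5.0):
    """PBI value g^pbi(x | w) = d1 + theta * d2, following (7)-(9)."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    diff = [fi - zi for fi, zi in zip(f, z_star)]
    # d1: length of the projection of f(x) - z* onto w
    d1 = abs(sum(di * wi for di, wi in zip(diff, w))) / norm_w
    # d2: perpendicular distance from f(x) to the line defined by w
    d2 = math.sqrt(sum((di - d1 * wi / norm_w) ** 2
                       for di, wi in zip(diff, w)))
    return d1 + theta * d2
```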
RVEA
RVEA uses a set of unit reference vectors $V = \{\mathbf{v}^1, \ldots, \mathbf{v}^N\}$ instead of a set of weight vectors $\mathbf{W}$, where $\|\mathbf{v}^j\| = 1$ for each $j \in \{1, \ldots, N\}$. RVEA adaptively adjusts $V$ based on the current population.
After $N$ children have been generated, the environmental selection is applied to the union of $P^t$ and $Q^t$. First, for each individual $\mathbf{x}$ in $P^t \cup Q^t$, its objective vector is translated as $\mathbf{f}'(\mathbf{x}) = \mathbf{f}(\mathbf{x}) - \mathbf{z}^*$. Then, $\mathbf{x}$ is assigned to the $j$th subproblem with the minimum angle between $\mathbf{f}'(\mathbf{x})$ and $\mathbf{v}^j$:
$j = \operatorname{argmin}_{k \in \{1, \ldots, N\}} \, \mathrm{angle}\bigl(\mathbf{f}'(\mathbf{x}), \mathbf{v}^k\bigr)$, (10)
where the function $\mathrm{angle}$ in (10) returns the angle between its two input vectors.
Then, the individuals assigned to the same subproblem are compared based on their angle-penalized distance (APD) values. For each subproblem, the best individual with the minimum APD value survives to the next iteration. The APD value of $\mathbf{x}$ with respect to the unit reference vector $\mathbf{v}^j$ is given as:
$g^{\mathrm{apd}}(\mathbf{x} \,|\, \mathbf{v}^j) = \bigl( 1 + P(\mathbf{x}, \mathbf{v}^j) \bigr) \, \|\mathbf{f}'(\mathbf{x})\|$, (11)
$P(\mathbf{x}, \mathbf{v}^j) = M \, \bigl( t / t_{\max} \bigr)^{\alpha} \, \dfrac{\mathrm{angle}\bigl(\mathbf{f}'(\mathbf{x}), \mathbf{v}^j\bigr)}{\gamma^j}$, (12)
$\gamma^j = \min_{k \in \{1, \ldots, N\}, \, k \neq j} \mathrm{angle}\bigl(\mathbf{v}^j, \mathbf{v}^k\bigr)$, (13)
where $P(\mathbf{x}, \mathbf{v}^j)$ is a penalty value for $\mathbf{x}$. The larger the angle between $\mathbf{f}'(\mathbf{x})$ and $\mathbf{v}^j$ is, the larger the penalty given to $\mathbf{x}$. $\gamma^j$ in (13) is used to normalize the angle. $t$ is the current number of iterations, and $t_{\max}$ is the maximum number of iterations. The influence of the angle-based penalty scheme increases as the search progresses. The recommended setting of the exponent $\alpha$ in (12) is $2$.
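A sketch of (11)–(13) for a single individual (helper names are ours; `gamma_j` is the minimum angle from $\mathbf{v}^j$ to the other reference vectors, assumed to be precomputed by the caller):

```python
import math

def angle(a, b):
    """Angle in radians between two vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    cos = sum(x * y for x, y in zip(a, b)) / (na * nb)
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp for rounding safety

def apd(f_prime, v, gamma_j, t, t_max, M, alpha=2.0):
    """Angle-penalized distance of the translated objective vector f_prime
    with respect to the unit reference vector v, following (11)-(13)."""
    penalty = M * (t / t_max) ** alpha * angle(f_prime, v) / gamma_j
    return (1.0 + penalty) * math.sqrt(sum(x * x for x in f_prime))
```

At $t = 0$ the penalty vanishes and the APD value reduces to the norm of the translated objective vector; the penalty then grows as the search progresses.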
III Proposed ADA framework
This section explains the proposed ADA framework and the six ADA versions. Unlike EMOAs for MOPs, EMMAs for MMOPs need to maintain the diversity of the population in both the objective and solution spaces. For example, Omni-optimizer [13] uses an aggregated crowding distance metric over the objective and solution spaces. While most EMMAs (e.g., [41, 47, 54]) aggregate the objective and solution space diversity metrics similar to Omni-optimizer, ADA uses them in a two-phase manner similar to TriMOEA-TA&R [30]. More specifically, ADA handles the objective space diversity by the assignment method of the original EMOA and the solution space diversity by a simple niching criterion.
On the one hand, a single individual is assigned to each subproblem in most decomposition-based EMOAs. Thus, the population size $|P|$ is equal to the number of weight/reference vectors $N$. On the other hand, one or more individuals can be assigned to each subproblem in the ADA framework, so $|P| \geq N$. This mechanism makes it possible to maintain multiple equivalent individuals for each subproblem. The population size is adaptively adjusted in ADA.
Algorithm 1 shows the ADA framework. Algorithms S.7–S.12 in the supplementary file show the six ADA-based algorithms. Algorithms S.7–S.12 are almost the same as Algorithm 1; only lines 3, 11, and 16 differ between Algorithms S.7–S.12 and Algorithm 1. At the beginning of the search, the population size is set to $N$ (line 1). The population $P$ and the weight/reference vector set are also initialized. The $j$th individual in $P$ is assigned to the $j$th subproblem (lines 2–3). This operation is unnecessary for EMOAs with a reassignment procedure of individuals to subproblems, such as RVEA. After the child $\mathbf{u}$ has been generated by the reproduction operations (lines 6–8), the population $P$ is updated using the assignment (line 10), deletion (lines 16–17), and addition (lines 18–19) operations. The assignment and deletion operations differ depending on the EMOA to be combined with the ADA framework. After the normalization of the objective vectors (line 9), $\mathbf{u}$ is assigned to the $j$th subproblem (line 10). Line 11 computes the set of individuals that have been assigned to the same $j$th subproblem and are in the neighborhood of $\mathbf{u}$ in the solution space. Thus, ADA requires a neighborhood criterion in the solution space. This neighborhood set is explained later in detail using Fig. 2. Two Boolean variables (line 12) are used in the addition operation (lines 18–19). The child $\mathbf{u}$ enters $P$ if it satisfies either of the following two conditions. One is that there is no individual in its neighborhood set (lines 13–14). The other is that $\mathbf{u}$ is better than at least one individual in its neighborhood set in the deletion operation (lines 15–17).
The following subsections (III-A to III-F) explain each step of ADA in detail. Subsection III-G presents an effective decision-making method based on a solution set found by the proposed approach. Apart from the decision making, Subsection III-H introduces two postprocessing methods for benchmarking. Subsection III-I discusses the applicability of ADA. Subsection III-J discusses the originality of ADA.
III-A Reproduction operation
ADA uses the basic GA operators (i.e., the SBX crossover and the polynomial mutation [9]) and the simplest method of selecting parents. First, two different parents $\mathbf{x}^a$ and $\mathbf{x}^b$ are randomly selected from $P$, regardless of the subproblem to which each individual has been assigned. Then, the child $\mathbf{u}$ is reproduced by applying SBX to $\mathbf{x}^a$ and $\mathbf{x}^b$. Finally, the polynomial mutation is applied to $\mathbf{u}$. All ADA-based algorithms use this same mating selection scheme, regardless of the mating selection schemes of their original EMOAs.
In principle, any variation operators can be incorporated into ADA, including DE operators [26], PSO operators [54], and model-based methods [41, 58]. Any parent selection methods can also be used in ADA, including distance-based selection methods in the solution space [5]. The performance of ADA could be improved by using these more sophisticated methods. However, if such effective methods were used in ADA, it would be unclear which algorithmic component mainly contributes to the overall performance of ADA-based algorithms. Since we want to investigate the effectiveness of the ADA framework in an isolated manner, we use the simplest variation operators and parent selection method in this paper.
III-B Normalization
Since the objective functions of most real-world problems are differently scaled, a normalization method is indispensable. In ADA, the normalization method depends on the EMOA to be combined. For example, NSGA-III-ADA uses the intercept-based normalization method of NSGA-III.
If the original EMOA does not have a normalization method (e.g., MOEA/D-AGR), we use the following simple normalization method. The objective vector $\mathbf{f}(\mathbf{x})$ is normalized using the approximated ideal point $\mathbf{z}^*$ and the worst point $\mathbf{z}^{\mathrm{worst}}$ of the population $P$. The $i$th element of the normalized objective vector is given as $\bar{f}_i(\mathbf{x}) = \bigl( f_i(\mathbf{x}) - z_i^* \bigr) / \bigl( z_i^{\mathrm{worst}} - z_i^* \bigr)$.
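A minimal sketch of this normalization (the small epsilon guarding against a degenerate objective range is our own addition):

```python
def normalize(f, z_star, z_worst, eps=1e-12):
    """Min-max normalization of an objective vector using the approximated
    ideal point z_star and the worst point z_worst of the population."""
    return [(fi - zi) / max(zw - zi, eps)
            for fi, zi, zw in zip(f, z_star, z_worst)]
```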
III-C Assignment operation
The child $\mathbf{u}$ is assigned to the $j$th subproblem based on its normalized objective vector $\bar{\mathbf{f}}(\mathbf{u})$ (line 10 in Algorithm 1). The assignment operation plays a crucial role in ADA in maintaining the diversity of the population in the objective space.
Each EMOA has a different method of selecting the index $j$ of the subproblem to which $\mathbf{u}$ is assigned. MOEA/D-AGR-ADA and eMOEA/D-ADA select the subproblem with the minimum scalarizing function value as in (1). MOEA/D-DU-ADA, NSGA-III-ADA, and θ-DEA-ADA assign $\mathbf{u}$ to the subproblem with the minimum perpendicular distance between $\bar{\mathbf{f}}(\mathbf{u})$ and the weight vector as in (6). RVEA-ADA selects the subproblem with the minimum angle between the translated objective vector and the unit reference vector as in (10). Note that ADA requires only a single subproblem index for $\mathbf{u}$, whereas the original MOEA/D-AGR, eMOEA/D, and MOEA/D-DU select multiple subproblem indices.
III-D Neighborhood criterion in the solution space
ADA requires a neighborhood criterion in the solution space (line 11 in Algorithm 1). While the objective space diversity is maintained by the assignment operation, the solution space diversity is controlled by this neighborhood criterion.
We use the simple relative distance-based neighborhood criterion presented in [45]. First, the normalized Euclidean distance between each individual $\mathbf{x}$ in $P$ and the child $\mathbf{u}$ is calculated in the solution space, where the upper and lower bounds of each decision variable are used for the normalization. Then, all individuals in $P$ are sorted based on their distance values to $\mathbf{u}$. If $\mathbf{x}$ is among the $L$ nearest individuals to $\mathbf{u}$, $\mathbf{x}$ is said to be a neighbor of $\mathbf{u}$ in the solution space. Here, $L$ is a control parameter of this neighborhood criterion.
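The criterion can be sketched as follows (function and parameter names are ours; `lower` and `upper` are the variable bounds used for the normalization):

```python
def l_nearest_neighbors(u, population, lower, upper, L):
    """Return the L individuals in `population` nearest to the child u,
    measured by normalized Euclidean distance in the solution space."""
    def norm_dist(x):
        return sum(((xi - ui) / (up - lo)) ** 2
                   for xi, ui, lo, up in zip(x, u, lower, upper)) ** 0.5
    return sorted(population, key=norm_dist)[:L]
```

An individual is then a neighbor of the child exactly when it appears in this list.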
In addition to the relative distance-based neighborhood criterion, any neighborhood criterion can be incorporated into ADA. A number of niching methods have been proposed in the multimodal single-objective optimization community [28]. Modern niching methods include the adaptive radius-based method [3] and nearest-better clustering [35]. However, our preliminary results show that such sophisticated niching methods do not work well in ADA. The main reason for this failure is that an appropriate parameter specification for such a modern niching method is difficult due to the difficulty in understanding the fitness landscape of an MOP [24]. For this reason, we use the simple neighborhood criterion.
III-E Deletion operation
Line 11 in Algorithm 1 computes the set of individuals that are in the neighborhood of the child $\mathbf{u}$ in the solution space among the individuals assigned to the same $j$th subproblem as $\mathbf{u}$. Fig. 2 shows an example of this neighborhood set. In Fig. 2, $\mathbf{u}$ has been assigned to the third subproblem. The neighborhood set contains only the individuals that are both assigned to the third subproblem and close to $\mathbf{u}$ in the solution space. An individual that is a neighbor of $\mathbf{u}$ in the solution space but has been assigned to the second subproblem is excluded, as is an individual that has been assigned to the third subproblem but is not in the neighborhood of $\mathbf{u}$. Although the latter individual is dominated by $\mathbf{u}$ with respect to the two objectives in Fig. 2 and would be deleted in most EMOAs, it can survive in $P$ in ADA. This is because it is not a neighbor of $\mathbf{u}$ (or of any other individual assigned to the third subproblem) and is thus never compared with $\mathbf{u}$. Although the quality of such an individual is poor, it contributes to the solution space diversity of $P$.
A paired comparison between $\mathbf{u}$ and each individual $\mathbf{x}$ in its neighborhood set is performed (lines 15–17 in Algorithm 1). If $\mathbf{x}$ is evaluated as being worse than $\mathbf{u}$ (by the evaluation criterion explained in the next paragraph), $\mathbf{x}$ is deleted from $P$. The deletion of such $\mathbf{x}$ is reasonable since it is based on both the quality in the objective space and the diversity in the solution space.
The comparison criterion depends on the environmental selection of each original EMOA. In MOEA/D-type algorithms, the comparison is based on the scalarizing function value. MOEA/D-AGR-ADA, MOEA/D-DU-ADA, and eMOEA/D-ADA use the scalarizing functions in (2), (3), and (4), respectively. If $g(\mathbf{u} \,|\, \mathbf{w}^j) \leq g(\mathbf{x} \,|\, \mathbf{w}^j)$ on the $j$th subproblem, $\mathbf{x}$ is removed from $P$. In NSGA-III-ADA and θ-DEA-ADA, the primary comparison between $\mathbf{u}$ and $\mathbf{x}$ is based on the Pareto dominance relation. In NSGA-III-ADA, ties are broken by the perpendicular distance between the normalized objective vector and $\mathbf{w}^j$. In θ-DEA-ADA, the tie-breaker is the θ-dominance. RVEA-ADA uses the APD value given in (11).
III-F Addition operation
The child $\mathbf{u}$ is added to the population $P$ if either of the following two conditions is met (lines 18–19 in Algorithm 1). One is that no individual exists in the neighborhood set of $\mathbf{u}$. An empty neighborhood set means that there is no neighbor of $\mathbf{u}$ in $P$ in the solution space (or the objective space). If this first condition is met, $\mathbf{u}$ enters $P$ without any comparison. Although a child with inferior quality is likely to enter $P$ in this way, it helps to maintain the solution space diversity (or the objective space diversity). While a dominated individual is unlikely to survive to the next iteration in most EMOAs, it can remain in $P$ due to this first criterion in the addition operation.
The other condition is that $\mathbf{u}$ performs better than at least one individual in its neighborhood set in the deletion operation (lines 16–17 in Algorithm 1). Since such a $\mathbf{u}$ has good quality within its neighborhood in the solution space, it should be added to $P$.
III-G Decision making
We introduce an effective decision-making method based on a solution set found by an ADA-based algorithm. Fig. 7 shows an example of the decision making. Fig. 7 (a) exhibits the distribution of all objective vectors in the final population of MOEA/D-AGR-ADA on the two-objective and two-variable SYM-PART1 problem [37]. As shown in Fig. 7 (d), nine equivalent Pareto optimal solution sets lie on the nine lines in SYM-PART1. The experimental setting is described in Section IV later. The decision making in ADA is the following two-phase method: the first phase is performed in the objective space and the second phase in the solution space.
Decision making in the objective space
Fig. 7 (a) shows that the final population contains a large number of solutions. This is undesirable for the decision maker, because she/he usually wants to examine a small number of well-distributed solutions in the objective space [56]. To address this issue, the best solution for the $j$th subproblem is selected from the final population based on the same criterion as in the deletion operation, for each $j \in \{1, \ldots, N\}$. Then, only the nondominated solutions among these $N$ best solutions are kept. We refer to the resulting set of nondominated solutions as the representative set. Fig. 7 (b) shows the nondominated solutions of the representative set in the objective space. The decision maker examines the objective vectors of the representative set rather than the whole final population.
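This first phase can be sketched as follows, assuming a scalar `score(f, j)` that mirrors the deletion-operation criterion of the combined EMOA, lower being better (all names are ours):

```python
def dominates(f1, f2):
    """Pareto dominance for minimization."""
    return (all(a <= b for a, b in zip(f1, f2))
            and any(a < b for a, b in zip(f1, f2)))

def representative_set(assignments, score):
    """assignments: dict mapping subproblem index j to the list of objective
    vectors of the individuals assigned to subproblem j.
    Keep the best vector per subproblem, then drop dominated ones."""
    best = [min(fs, key=lambda f: score(f, j)) for j, fs in assignments.items()]
    return [f for f in best
            if not any(dominates(g, f) for g in best if g is not f)]
```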
When the decision maker wants to examine fewer nondominated solutions (e.g., only 10 solutions), other solution selection methods can be used to select them from the representative set. If the number of objectives $M$ is small, efficient hypervolume-based selection approaches (e.g., [4]) are available. Otherwise, computationally cheap distance-based selection methods (e.g., [42]) can be used.
Decision making in the solution space
After the decision maker has examined the representative set in the objective space, she/he selects the final solution from it. Suppose that the decision maker has selected a solution of the 77th subproblem as the final solution in Fig. 7 (b). In ADA, she/he can then examine other dissimilar solutions with similar quality. Consider the set of all solutions assigned to the same 77th subproblem. Figs. 7 (c) and (d) show the distribution of all nine solutions in this set in the objective and solution spaces, respectively. Although the nine solutions have almost the same quality in the objective space, they are dissimilar in the solution space. In addition to the selected solution, the decision maker can examine the other candidates in this set based on her/his preference in the solution space.
III-H Two postprocessing methods for benchmarking
Apart from the practical two-phase decision-making method described in Subsection III-G, we here consider the benchmarking of ADA-based EMMAs. The performance of EMOAs and EMMAs is evaluated using performance indicators. However, a fair performance comparison between ADA-based EMMAs and other optimizers is difficult. This is because the population in ADA can contain an unbounded number of solutions, and most performance indicators cannot assess solution sets of different sizes in a fair manner [20]. IGD [18] and IGDX [58], described in Section IV later, are no exceptions. Thus, ADA requires a method of selecting a constant number of solutions from the final population for benchmarking studies.
We introduce two postprocessing methods that are used to evaluate the performance of ADA-based algorithms for MOPs and MMOPs, respectively. In general, uniformly distributed solutions do not map to uniformly distributed objective vectors due to the nonuniform mapping from the solution space to the objective space. To address this issue, we use two separate postprocessing methods for the objective and solution spaces. On the one hand, the same method of selecting the representative set described in Subsection III-G is used for performance indicators for MOPs (e.g., IGD [8]). Since indicators for MOPs assess the distribution of a solution set in the objective space, this choice is reasonable.
On the other hand, the solution distance-based selection method presented in [45] is used for performance indicators for MMOPs (e.g., IGDX [58]). Let us consider the task of selecting a prespecified number of sparsely distributed solutions from all nondominated solutions in the final population. First, the selected set is initialized to be empty, and a solution randomly chosen from the nondominated solution set is stored in it. Then, the nondominated solution whose distance to its nearest solution in the selected set (in the normalized solution space) is maximum is repeatedly added until the required number of solutions has been selected. Unlike the two solution sets used for decision making, this set is used only for benchmarking of ADA-based algorithms.
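The maximin selection procedure above can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the function name `maximin_selection` and its interface are ours, and the input is assumed to be already normalized to the unit hypercube.

```python
import numpy as np

def maximin_selection(candidates, n_select, rng=None):
    """Select n_select sparsely distributed points from `candidates`
    (shape [n, D], assumed normalized to [0, 1] per variable) by
    repeatedly adding the candidate farthest from the selected set."""
    rng = np.random.default_rng(rng)
    candidates = np.asarray(candidates, dtype=float)
    n = len(candidates)
    selected = [rng.integers(n)]                # start from a random solution
    remaining = set(range(n)) - set(selected)
    # dist[i]: distance from candidate i to its nearest selected solution
    dist = np.linalg.norm(candidates - candidates[selected[0]], axis=1)
    while len(selected) < min(n_select, n):
        i = max(remaining, key=lambda j: dist[j])   # maximin choice
        selected.append(i)
        remaining.discard(i)
        dist = np.minimum(dist, np.linalg.norm(candidates - candidates[i], axis=1))
    return candidates[selected]
```

Because each new solution maximizes the distance to its nearest already-selected neighbor, the selected subset spreads out over all regions covered by the nondominated set, which is exactly what IGDX-style evaluation needs.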
III-I Applicability of ADA
ADA is a framework to improve the performance of decomposition-based EMOAs for MMOPs. We do not claim that ADA can be combined with any decomposition-based EMOA. Since ADA requires a method of assigning a child to a subproblem, ADA is inapplicable to EMOAs with no assignment mechanism. Such EMOAs include MOEA/D [56] and MOEA/D-DRA [57]. Also, the deletion operation performs the pairwise comparison independently of other individuals. Thus, ADA is not applicable to EMOAs whose environmental selection is performed jointly for all individuals in the population, such as MOEA/D-STM [27] and VaEA [51].
In summary, ADA can be combined with EMOAs that have an assignment mechanism and a pairwise comparison-based environmental selection. The six EMOAs explained in Subsections II-C and II-D satisfy these conditions. In addition, ADA is applicable to I-DBEA [1] and SPEA/R [23].
Fortunately, even if a decomposition-based EMOA does not satisfy the above-mentioned two conditions, ADA can be applied to it after some modifications. For example, since MOEA/D does not have an assignment method, we cannot directly combine ADA with MOEA/D. However, MOEA/D can easily be modified by using any of the three assignment methods described in Subsection III-C. In fact, MOEA/D-AD proposed in [45] is an ADA-based MOEA/D with the assignment operation in (6). Thus, the applicability of ADA is not limited to only a few decomposition-based EMOAs.
III-J Originality of ADA
In addition to MOEA/D-AD [45], a variant of MOEA/D for MMOPs was proposed in [16]. This MOEA/D variant assigns a fixed number of individuals to each subproblem, and its fitness value is based on the PBI function value and two distance values in the solution space. The main disadvantage of the MOEA/D variant in [16] is the difficulty in finding a proper value for this number. Since the number of equivalent Pareto optimal solution subsets is unknown a priori, fine-tuning is necessary for a given problem. Although a multi-start decomposition-based approach was proposed in [37], it has a similar disadvantage. In contrast, ADA adaptively adjusts the number of individuals assigned to each subproblem. Thus, ADA does not require such a problem-dependent parameter.
TriMOEA-TA&R [30] consists of various advanced components, including the convergence and diversity archive-based strategy as in Two_Arch2 [49], the decision variable decomposition method in MOEA/D-VA [31], and the angle-based individual assignment in RVEA [6]. ADA and TriMOEA-TA&R are similar in that they handle diversity in the objective and solution spaces in a two-phase manner. While TriMOEA-TA&R uses an absolute distance-based neighborhood criterion, ADA uses a relative distance-based neighborhood criterion. ADA uses a much simpler mating scheme, and it does not use any decomposition method of decision variables. ADA also uses the adaptive population sizing strategy to handle equivalent solutions, as mentioned above. Whereas TriMOEA-TA&R is an algorithm for MMOPs with distance-related variables, ADA is a general framework for various MMOPs.
IV Experimental settings
Table I: Properties of the test problems (M: the number of objectives, D: the number of decision variables).

  Test problems                     M    D    # of equivalent Pareto optimal solution sets
  TwoOnOne [34] and SSUF1,3 [29]    2    2    2
  SYMPART1–3 [37]                   2    2    9
  Polygon [17]                      Any  2    Any
  Omnitest [13]                     2    Any  grows exponentially with D
IV-A Test problems
We use the following eight multimodal multiobjective test problems: the TwoOnOne problem [34], the three SYMPART problems [37], the two SSUF problems [29], the Polygon problem [17], and the Omnitest problem [13]. Table I shows their properties, including the number of objectives M, the number of decision variables D, and the number of equivalent Pareto optimal solution sets.
TwoOnOne has two equivalent Pareto optimal solution sets that are symmetrical with respect to the origin. In SYMPART1, the equivalent Pareto optimal solution sets lie on nine lines, as shown in Fig. 7. The nine lines are rotated in SYMPART2 and, in addition, distorted in SYMPART3. SSUF1 and SSUF3 have two symmetrical Pareto optimal solution sets. We evaluate the scalability of EMMAs to the number of objectives M using Polygon with multiple settings of M. Although the number of equivalent Pareto optimal solution sets can be any number in Polygon, it was set to nine. We investigate the scalability of EMMAs to the number of decision variables D and the number of equivalent Pareto optimal solution sets using Omnitest with multiple settings of D. The number of equivalent Pareto optimal solution sets increases exponentially with D in Omnitest. In summary, we use 15 test problem instances.
HPS [55] and MMMOP [30] have been recently proposed. However, HPS and MMMOP have so-called "distance-related" variables that affect only the distance between the objective vector and the Pareto front. For this reason, we do not use HPS and MMMOP in our main benchmarking study; they are used only in Subsections V-E and V-F.
IV-B Performance indicators
Below, A denotes a set of solutions obtained by an EMMA, and A* denotes a set of reference solutions in the Pareto optimal solution set, whose size was fixed in advance. For each problem, the reference solutions in A* were selected from a large number of randomly generated Pareto optimal solutions using the distance-based solution selection method [45] (Subsection III-H).
We use IGD+ [18] to evaluate A in terms of both convergence to the Pareto front and diversity in the objective space:

IGD+(A) = (1 / |A*|) Σ_{z ∈ A*} min_{a ∈ A} d+(z, a),   (14)

where d+(z, a) = sqrt( Σ_{i=1}^{M} ( max{ f_i(a) − f_i(z), 0 } )^2 ) for minimization. IGD+ is a modified version of IGD [8]. While the original IGD is Pareto non-compliant, IGD+ is weakly Pareto compliant.
We evaluate how well A approximates the Pareto optimal solution set in the solution space using IGDX [58]:

IGDX(A) = (1 / |A*|) Σ_{z ∈ A*} min_{a ∈ A} ED(z, a),   (15)

where ED(z, a) is the Euclidean distance between z and a in the solution space.
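Under the definitions in (14) and (15), both indicators reduce to a mean of nearest-neighbor distances and can be computed directly. The sketch below assumes minimization and NumPy arrays; the function names are ours, not from the paper.

```python
import numpy as np

def igd_plus(ref_objs, objs):
    """IGD+ of obtained objective vectors `objs` w.r.t. reference objective
    vectors `ref_objs` (minimization): d+(z, a) = || max(a - z, 0) ||."""
    ref_objs, objs = np.asarray(ref_objs, float), np.asarray(objs, float)
    # diff[r, a, i] = max(objs[a, i] - ref_objs[r, i], 0)
    diff = np.maximum(objs[None, :, :] - ref_objs[:, None, :], 0.0)
    dplus = np.linalg.norm(diff, axis=2)        # shape: [|A*|, |A|]
    return dplus.min(axis=1).mean()

def igdx(ref_xs, xs):
    """IGDX: mean Euclidean distance from each reference solution to the
    nearest obtained solution in the solution (decision) space."""
    ref_xs, xs = np.asarray(ref_xs, float), np.asarray(xs, float)
    d = np.linalg.norm(xs[None, :, :] - ref_xs[:, None, :], axis=2)
    return d.min(axis=1).mean()
```

Note how the clipping in `igd_plus` makes a solution that dominates a reference point contribute zero distance, which is what gives IGD+ its weak Pareto compliance.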
EMOAs that can find solution sets with small IGD+ and IGDX values are efficient multiobjective optimizers and efficient multimodal multiobjective optimizers, respectively. In the ADA-based algorithms, the two solution sets obtained by the post-processing methods explained in Subsection III-H are used for the IGD+ and IGDX calculations, respectively.
IV-C Average performance score
We use the average performance score (APS) [2] in order to aggregate results on various problems. Suppose that n algorithms are compared on a problem instance based on the indicator values obtained in multiple runs. For each pair of algorithms i and j, if algorithm j significantly outperforms algorithm i according to the Wilcoxon rank-sum test, then delta(i, j) = 1; otherwise, delta(i, j) = 0. The performance score of algorithm i on the problem instance is the sum of delta(i, j) over all j ≠ i, i.e., the number of algorithms that outperform algorithm i. The APS value of algorithm i is the average of its performance scores over all problem instances. A small APS value indicates that the algorithm performs well among the n compared algorithms.
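The scoring above can be sketched in pure Python. This is our own illustrative sketch: it uses a normal-approximation one-sided rank-sum test in place of an exact one, and all function names are ours.

```python
import math

def _avg_ranks(values):
    """1-based ranks of `values`, with ties sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def _p_less(a, b):
    """One-sided Wilcoxon rank-sum p-value (normal approximation) for the
    alternative that values in `a` tend to be smaller than values in `b`."""
    n1, n2 = len(a), len(b)
    r1 = sum(_avg_ranks(list(a) + list(b))[:n1])
    z = (r1 - n1 * (n1 + n2 + 1) / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return 0.5 * math.erfc(-z / math.sqrt(2))   # P(Z <= z)

def average_performance_score(results, alpha=0.05):
    """results[i][p]: indicator values (smaller is better) of algorithm i
    on problem p over multiple runs. Returns one APS value per algorithm."""
    n_alg, n_prob = len(results), len(results[0])
    scores = [[0] * n_prob for _ in range(n_alg)]
    for p in range(n_prob):
        for i in range(n_alg):
            for j in range(n_alg):
                # delta(i, j) = 1 iff algorithm j significantly outperforms i
                if i != j and _p_less(results[j][p], results[i][p]) < alpha:
                    scores[i][p] += 1
    return [sum(row) / n_prob for row in scores]
```

For two algorithms where the first is clearly better on the only problem, the first receives score 0 and the second score 1, matching the "number of algorithms that outperform i" interpretation.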
IV-D EMMAs and EMOAs
We examine the performance of the six ADA-based EMMAs: MOEA/D-AGR-ADA, MOEA/D-DU-ADA, eMOEA/D-ADA, NSGA-III-ADA, θ-DEA-ADA, and RVEA-ADA. We implemented the ADA-based algorithms using jMetal [14]. Their source codes can be downloaded from the supplementary website (https://sites.google.com/view/mmoada). We compare the ADA-based EMMAs to their original EMOAs: MOEA/D-AGR [50], MOEA/D-DU [53], eMOEA/D [22], NSGA-III [11], θ-DEA [52], and RVEA [6]. Our implementations of the six original EMOAs were based on their corresponding articles, except for MOEA/D-AGR, in which we replaced the DE operator with SBX to remove the effect of variation operators. For details, refer to the corresponding articles.
The maximum number of function evaluations was fixed, and multiple independent runs were performed for each test problem. A set of weight/reference vectors was generated using the simplex-lattice design method, and its two-layered version [11] was used for problems with many objectives. The number of weight/reference vectors was set depending on the number of objectives. The SBX crossover and the polynomial mutation were used in all methods, including MOEA/D-AGR, with the same control parameter settings. The neighborhood size of the neighborhood criterion in ADA was set according to the analysis presented in [45]. Other parameters were set according to the corresponding references.
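The simplex-lattice (Das–Dennis) design mentioned above enumerates all weight vectors whose components are multiples of 1/H and sum to one, giving C(H+M−1, M−1) vectors for M objectives and H divisions. Below is a possible stars-and-bars implementation; the function name and interface are ours, and the two-layered variant (which applies this design at two granularities and shrinks the inner layer) is not shown.

```python
from itertools import combinations

def simplex_lattice(m, h):
    """All weight vectors with components in {0, 1/h, ..., h/h} summing to 1
    (simplex-lattice design for m objectives with h divisions)."""
    # stars and bars: h stars (weight units) and m-1 bars (dividers)
    # occupy h + m - 1 slots; each divider placement yields one vector.
    vectors = []
    for dividers in combinations(range(h + m - 1), m - 1):
        prev, counts = -1, []
        for d in dividers:
            counts.append(d - prev - 1)   # stars between consecutive dividers
            prev = d
        counts.append(h + m - 2 - prev)   # stars after the last divider
        vectors.append([c / h for c in counts])
    return vectors
```

For example, `simplex_lattice(2, 2)` yields the three vectors (0, 1), (0.5, 0.5), and (1, 0).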
V Experimental results
This section presents a performance analysis of the six ADA-based EMMAs. Subsection V-A describes the effect of ADA on the six EMOAs. Subsection V-B compares the six ADA-based EMMAs to state-of-the-art EMMAs. Subsection V-C discusses the adaptive population sizing in ADA. Subsection V-D investigates the influence of the neighborhood size on the performance of the six ADA-based EMMAs. Subsection V-E examines their performance on test problems with distance-related variables. Subsection V-F presents a comparison with state-of-the-art EMMAs using an unbounded external archive. Subsection V-G discusses the runtime of the ADA-based algorithms.


V-A Effect of ADA
Tables II (a) and (b) show the paired comparison of each EMOA and its ADA version on the 15 test problem instances in terms of IGD+ and IGDX, respectively. Table II shows only the APS value of each algorithm at the final iteration; Table S.1 in the supplementary file shows detailed results. As described in Subsection III-H, only the solutions selected by the two post-processing methods were used for the IGD+ and IGDX calculations. Thus, all algorithms are compared under the same number of solutions. The Wilcoxon rank-sum test was used for the statistical comparison. In the tables, a numeric prefix of Polygon denotes the number of objectives (e.g., 10Polygon is the 10-objective Polygon problem), and a numeric prefix of Omnitest denotes the number of decision variables. Below, we describe the results for IGD+ and IGDX.
IGD+
Table II (a) shows that the original EMOAs outperform their ADA-based EMMAs regarding IGD+ (except for RVEA-ADA). In general, EMMAs perform slightly worse than EMOAs for multiobjective optimization [41, 54]. While EMOAs aim to find a good approximation of the Pareto front, EMMAs attempt to locate all Pareto optimal solutions. For this reason, the performance of the ADA versions regarding IGD+ is worse than that of their original EMOAs.
However, as shown in Table S.1 in the supplementary file, the IGD+ value of the ADA versions is worse than that of their original EMOAs only by a small factor even in the worst case. Also, the ADA versions perform better than their original EMOAs on some problems. For example, eMOEA/D-ADA obtains better IGD+ values than eMOEA/D on SYMPART2, SSUF3, and 10Polygon. As discussed in [48], a mechanism for maintaining the solution space diversity can help the ADA versions to find high-quality solutions.
IGDX
Table II (b) shows that the ADA-based EMMAs significantly outperform their original EMOAs. As shown in Table S.1, the IGDX value of the ADA versions is better than that of their original EMOAs by a large factor in the best case. These results indicate that ADA improves the performance of the six decomposition-based EMOAs for MMOPs.
Table S.1 shows that the ADA-based EMMAs work well on Polygon with many objectives and on Omnitest with many decision variables. Since the number of equivalent Pareto optimal solution sets increases exponentially with D in Omnitest, these results show that ADA can also handle problems with a large number of equivalent Pareto optimal solution sets. In summary, ADA has sufficient scalability to the number of objectives, the number of decision variables, and the number of equivalent Pareto optimal solution sets.
Distribution of solutions
Figs. 21 and 21 show the distribution of nondominated solutions found by eMOEA/D-ADA, NSGA-III-ADA, RVEA-ADA, and their original versions on Polygon in the objective and solution spaces, respectively. These figures show the results of a single run with median IGD+ and IGDX values, respectively. For the ADA-based algorithms, the nondominated solutions selected by the two post-processing methods are shown in the two figures, respectively. Figs. S.1 and S.2 in the supplementary file show the results of the other ADA-based algorithms, which are similar to Figs. 21 and 21. It was shown in [11] that some decomposition-based EMOAs did not work well on problems with convex Pareto fronts. It was also presented in [21] that some decomposition-based EMOAs performed poorly on problems with inverted-triangular Pareto fronts. Since the Pareto front of Polygon is convex and inverted-triangular, it is difficult for decomposition-based EMOAs to find good solutions.
For the above-mentioned reasons, no method in Fig. 21 approximates the Pareto front of Polygon well. Nevertheless, some ADA-based algorithms (e.g., eMOEA/D-ADA and RVEA-ADA) find better distributed objective vectors than their original EMOAs. This unintended effect of ADA may be caused by the diversity of the population in the solution space [48]. Although we do not claim that ADA can improve the performance of EMOAs for MOPs in addition to MMOPs, the solution space diversity maintenance might help the original EMOAs to handle problems with irregular Pareto fronts.
In the Polygon problem, the equivalent Pareto optimal solution sets are inside the nine regular triangles in the solution space. Figs. 21 (a)–(c) show that the original EMOAs cannot locate all equivalent Pareto optimal solution sets well. In contrast, Figs. 21 (d)–(f) demonstrate that their ADA versions can locate all nine equivalent Pareto optimal solution sets. Results on the other test problems are similar to Fig. 21.


V-B Comparison with other EMMAs
In Subsection V-A, we demonstrated that ADA can improve the performance of the six EMOAs for MMOPs. Here, we compare the six ADA-based algorithms to the following three EMMAs: Omni-optimizer [13], MO_Ring_PSO_SCD [54], and TriMOEA-TA&R [30]. Omni-optimizer is the most representative EMMA, while MO_Ring_PSO_SCD and TriMOEA-TA&R are recently proposed methods. Default parameter settings were used for the three methods.
Table III shows the APS values of the nine EMMAs on the 15 test problem instances. Table S.2 in the supplementary file shows detailed results. TriMOEA-TA&R incorrectly recognizes that some problems (e.g., SYMPART1) have distance-related variables. In such a case, TriMOEA-TA&R generates a set of additional solutions at the end of the search by recombining solutions in the diversity archive with the distance-related variables. Since the resulting solution set can be larger than the prespecified number of solutions for the indicator calculation, a method of selecting solutions is needed. We use the two post-processing methods in ADA: in the same manner as for the ADA-based algorithms, the two selected solution sets are used for the IGD+ and IGDX calculations, respectively, where the selection method in NSGA-III-ADA is used to obtain the solution set for the objective space.
Table III (a) shows that Omni-optimizer performs the best regarding IGD+. The good performance of Omni-optimizer for MOPs is consistent with the results presented in [54, 45]. NSGA-III-ADA shows the best performance regarding IGD+ among the six ADA-based algorithms. Table III (b) shows that the six ADA-based algorithms perform better than the three other algorithms regarding IGDX. TriMOEA-TA&R performs the worst regarding IGDX because the 15 problem instances have no distance-related variables. In summary, these results indicate that the six ADA-based algorithms perform better than the state-of-the-art algorithms for MMOPs.
V-C Adaptive population sizing in ADA
In ADA, the number of individuals assigned to each subproblem is adaptively adjusted by the deletion and addition operations. Thus, the population size is not constant. Here, we discuss the adaptive population sizing in ADA.
Fig. 23 shows the change of the population size of NSGA-III-ADA on Omnitest. Results of the other ADA versions are similar to Fig. 23. The population size strongly depends on the number of decision variables D, and it does not grow large for large D values. This may be because problems with a large D value are generally difficult for EMMAs. Since the search does not proceed well for large D, the population size does not increase significantly. The difficulty of finding multiple equivalent Pareto optimal solution sets for a large number of decision variables has been reported in [19].
In addition, the trajectory of the population size is problem-dependent. Fig. S.3 in the supplementary file shows the change of the population size on other problems. The population size at the end of the search on TwoOnOne and 3Polygon clearly differs from that on the other problems. Fig. S.3 also shows that the population size sharply increases in early iterations and becomes stable after some evaluations.
Fig. 23 shows the cumulative numbers of activations of the addition operation based on its two criteria in Algorithm 1 and of the deletion operation in NSGA-III-ADA on 3Polygon. On the one hand, the number of activations of the addition operation based on the first criterion is almost the same as that of the deletion operation throughout the evolution in Fig. 23. This competitive behavior of the two operations leads to the stable population size in Fig. S.3. On the other hand, the addition operation based on the second criterion is frequently performed only in an early stage of evolution in Fig. 23. Thus, the sharp increase of the population size in Fig. S.3 is due to the addition operation based on the second criterion. Since only one individual is assigned to each subproblem at the beginning of the search, this criterion is often activated early.
In summary, the population size is adaptively adjusted by the cooperative behavior of the deletion and addition operations in ADA. In the addition operation, the roles of the two criteria differ from each other. Thus, it is important for ADA to use both criteria.
V-D Impact of the neighborhood size on the six ADA-based algorithms
We investigate the influence of the neighborhood size on the performance of the ADA-based algorithms. Table IV shows the APS values of the six ADA-based algorithms with six neighborhood size settings on the 15 test problem instances. Tables S.3–S.8 in the supplementary file show detailed results.
Table IV (a) shows that the six ADA-based algorithms with the smallest neighborhood size have the worst performance regarding IGD+; too small neighborhood sizes degrade the performance regarding IGD+. Table IV (b) shows that an intermediate neighborhood size is most suitable for all ADA-based algorithms (except for RVEA-ADA) in terms of IGDX; too large neighborhood sizes degrade the performance regarding IGDX. In summary, intermediate neighborhood sizes are suitable for most ADA-based algorithms for MMOPs.
We also investigate the influence of the neighborhood size on the evolution of the population size. Figs. S.4 and S.5 in the supplementary file show the change of the population size of NSGA-III-ADA with various neighborhood sizes on Omnitest and on the other problems, respectively. Results of the other ADA versions are similar to those of NSGA-III-ADA. Figs. S.4 and S.5 indicate that the population size decreases as the neighborhood size increases. The larger the neighborhood size is, the more individuals can be neighbors of the child. As a result, the niching mechanism in ADA deteriorates. These observations are consistent with the above-mentioned IGDX results.
