
# A Framework to Handle Multi-modal Multi-objective Optimization in Decomposition-based Evolutionary Algorithms

## Abstract

Multi-modal multi-objective optimization aims to locate as many (almost) equivalent Pareto optimal solutions as possible. While decomposition-based evolutionary algorithms perform well for multi-objective optimization, they are likely to perform poorly for multi-modal multi-objective optimization due to the lack of mechanisms to maintain the solution space diversity. To address this issue, this paper proposes a framework to improve the performance of decomposition-based evolutionary algorithms for multi-modal multi-objective optimization. Our framework is based on three operations: assignment, deletion, and addition. One or more individuals can be assigned to the same subproblem to handle multiple equivalent solutions. In each iteration, a child is assigned to a subproblem based on its objective vector, i.e., its location in the objective space. The child is compared with the individuals assigned to the same subproblem that are its neighbors in the solution space. Versions of six decomposition-based evolutionary algorithms improved by our framework are evaluated on various test problems with respect to the number of objectives, decision variables, and equivalent Pareto optimal solution sets. Results show that the improved versions perform clearly better than their original algorithms.

*Index Terms*—Multi-modal multi-objective optimization, decomposition-based evolutionary algorithms, reference vector-based evolutionary algorithms, solution space diversity

## I Introduction

MULTI-OBJECTIVE optimization problems (MOPs) appear in real-world applications. Since no solution can simultaneously minimize multiple objective functions in general, the goal of MOPs is usually to find a Pareto optimal solution preferred by a decision maker [32]. When the decision maker’s preference information is unavailable a priori, “a posteriori” decision making is conducted: the decision maker selects the final solution from a set of solutions that approximates the Pareto front in the objective space.

An evolutionary multi-objective optimization algorithm (EMOA) is frequently used for the “a posteriori” decision making. Since EMOAs are population-based optimizers, they are likely to find a set of solutions in a single run. A number of EMOAs have been proposed in the literature. They can be classified into the following three categories: dominance-based EMOAs (e.g., NSGA-II [10]), indicator-based EMOAs (e.g., IBEA [59]), and decomposition-based EMOAs (e.g., MOEA/D [56] and NSGA-III [11]). In particular, decomposition-based EMOAs have shown promising performance [46].

In the EMO community, it has been implicitly assumed that the decision maker is interested only in the distribution of solutions in the objective space. Thus, the distribution of solutions in the solution space has not received much attention. However, after the decision maker has selected the final solution based on its objective vector, she/he may want to examine other dissimilar solutions of equivalent quality or slightly inferior quality [40, 38, 39].

Fig. 1 shows a situation where four solutions are far from each other in the solution space but close to each other in the objective space. Although one of them is a dominated solution, it may still be acceptable to the decision maker, because the differences between the four objective vectors are small enough. If the decision maker has obtained multiple dissimilar solutions with similar quality, she/he can select the final solution according to her/his preference in the solution space. For example, suppose that the selected solution in Fig. 1 becomes unavailable due to some accident (e.g., material shortages or mechanical failures). In such a case, she/he can substitute one of the other three solutions for it. Schütze et al. give a more practical example on space mission design problems [39]. Sebag et al. also demonstrate the importance of multiple equivalent solutions for the decision maker on functional brain imaging problems [40]. Previous studies on other real-world problems with multiple equivalent solutions include diesel engine design problems [15], distillation plant layout problems [33], and rocket engine design problems [25].

A multi-modal multi-objective problem (MMOP) is to locate as many (almost) equivalent Pareto optimal solutions as possible [40, 13]. On the one hand, it is sufficient for MOPs to find only one of several equivalent Pareto optimal solutions in Fig. 1 since their quality is almost the same. On the other hand, all of them should be found for MMOPs. Since multiple equivalent solutions play crucial roles in a reliable decision-making process as explained above, MMOPs are important in practice. Evolutionary multi-modal multi-objective optimization algorithms (EMMAs) are optimizers specially designed for MMOPs. Unlike EMOAs, EMMAs have mechanisms to handle multiple equivalent solutions. Representative EMMAs include the algorithm of Schütze et al. [39], Omni-optimizer [13], DIOP [47], Niching-CMA [41], and MO_Ring_PSO_SCD [54].

In our earlier study [45], we proposed MOEA/D-AD, a MOEA/D specially designed for MMOPs. MOEA/D-AD can assign multiple individuals to each subproblem in order to handle equivalent solutions. In each iteration, a child is assigned to the subproblem whose weight vector is closest to the child's objective vector in terms of the perpendicular distance. Then, the child is compared with the individuals assigned to the same subproblem based on the weighted Tchebycheff function and the contribution to the solution space diversity.

The differences between this paper and our previous study [45] are as follows:

1. We propose ADA. While MOEA/D-AD [45] is a single algorithm for MMOPs, ADA is a framework to improve the performance of the existing decomposition-based EMOAs for MMOPs. In contrast to MOEA/D-AD, each configuration (e.g., the assignment operation) in ADA depends on an EMOA to be combined.

2. We demonstrate that ADA can be combined with the six decomposition-based EMOAs, including so-called reference vector-based EMOAs. MOEA/D-AD does not have such flexibility.

3. We introduce a practical two-phase decision making method with a solution set found by an ADA-based algorithm. We also introduce two post-processing methods for benchmarking an ADA-based algorithm.

4. While we used only two-objective and two-variable problems in [45], we use problems with various numbers of objectives, design variables, and equivalent Pareto optimal solution subsets to investigate the scalability of EMMAs. Such a scale-up study has not been performed in the literature. We also use problems with distance-related variables [55, 30].

5. We compare the six ADA-based algorithms to three state-of-the-art EMMAs, including TriMOEA-TA&R [30] proposed after the publication of [45].

The rest of this paper is organized as follows. Section II provides some preliminaries of this paper. Section III introduces ADA and explains how to incorporate it into decomposition-based EMOAs. Section IV describes the experimental setup. Section V examines the performance of the six ADA variants. Section VI concludes this paper.

## II Preliminaries

First, Subsections II-A and II-B give definitions of MOPs and MMOPs, respectively. Then, Subsections II-C and II-D describe decomposition-based EMOAs and reference vector-based EMOAs, respectively. Subsections II-C and II-D mainly explain components of the six existing EMOAs used in ADA. Algorithms S.1–S.6 in the supplementary file provide the overall procedures of the six EMOAs. Section III introduces how each component is used in ADA.

### II-A Definition of MOPs

A continuous MOP is to find a solution $\mathbf{x} \in \mathbb{S}$ that minimizes a given objective function vector $\mathbf{f}(\mathbf{x}) = (f_1(\mathbf{x}), \dots, f_M(\mathbf{x}))^{\mathrm{T}}$. Here, $\mathbb{S}$ is the $D$-dimensional solution space, where $x_i \in [x_i^{\min}, x_i^{\max}]$ for each index $i \in \{1, \dots, D\}$, and $\mathbb{R}^M$ is the $M$-dimensional objective space.

A solution $\mathbf{x}^1$ is said to dominate $\mathbf{x}^2$ if and only if $f_i(\mathbf{x}^1) \leq f_i(\mathbf{x}^2)$ for all $i \in \{1, \dots, M\}$ and $f_i(\mathbf{x}^1) < f_i(\mathbf{x}^2)$ for at least one index $i$. If $\mathbf{x}^*$ is not dominated by any other solutions, it is called a Pareto optimal solution. The set of all Pareto optimal solutions is the Pareto optimal solution set, and the set of their objective vectors is the Pareto front. The goal of MOPs for the “a posteriori” decision making is to find a non-dominated solution set that approximates the Pareto front in the objective space.
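The dominance relation above can be sketched in a few lines of Python (a minimal illustration with our own function name, not code from the paper):

```python
def dominates(fa, fb):
    """Return True if objective vector fa Pareto-dominates fb (minimization):
    fa is no worse in every objective and strictly better in at least one."""
    no_worse = all(a <= b for a, b in zip(fa, fb))
    strictly_better = any(a < b for a, b in zip(fa, fb))
    return no_worse and strictly_better
```

A solution is then Pareto optimal when no other candidate makes `dominates` return `True` against it.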

### II-B Definition of MMOPs

Although the term “multi-modal multi-objective optimization” was first coined in [12, 40] in 2005, the definition of MMOPs has not been explicitly given in most previous studies. We consider the following two types of MMOPs: (i) a Type1-MMOP is to locate all Pareto optimal solutions; (ii) a Type2-MMOP is to locate all Pareto optimal solutions and all non-Pareto optimal solutions with acceptable quality for the decision maker. Those non-Pareto optimal solutions should be far from the Pareto optimal solutions and the other non-Pareto optimal solutions in the solution space.

For example, the three Pareto optimal solutions in Fig. 1 should be found for Type1-MMOPs. In addition, the dominated solution in Fig. 1 should be found for Type2-MMOPs if its quality is acceptable to the decision maker. While most existing studies (e.g., [13, 54]) assume Type1-MMOPs, only a few studies (e.g., [40, 39, 47]) address Type2-MMOPs. Although there is room for discussion, Type2-MMOPs may be more practical than Type1-MMOPs. Diverse solutions with similar quality to the final solution are beneficial to the decision maker even if their quality is slightly worse [40, 39].

Nevertheless, we focus only on Type1-MMOPs in this paper. The main reason is the difficulty of benchmarking EMMAs on Type2-MMOPs. In contrast to Type1-MMOPs, Type2-MMOPs are loosely defined, because the terms “acceptable quality” and “far from” in Type2-MMOPs significantly depend on the decision maker. It is difficult to define these two key factors for benchmarking purposes in a fair manner. How to evaluate the performance of EMMAs for Type2-MMOPs is itself another research topic.

### II-C Decomposition-based EMOAs (MOEA/D-type algorithms)

Although several frameworks of decomposition-based EMOAs have been proposed in the literature, MOEA/D-type algorithms show promising performance [46]. Here, we describe MOEA/D-type algorithms. First, we explain the most basic MOEA/D [56] and its modified version called MOEA/D-DE [26]. The framework of MOEA/D-DE is used in most MOEA/D-type algorithms. Then, we describe components of three MOEA/D-type algorithms: MOEA/D-AGR [50], MOEA/D-DU [53], and eMOEA/D [22].

#### MOEA/D

The most basic MOEA/D [56] decomposes an $M$-objective MOP into $N$ single-objective subproblems using a scalarizing function $g$ and a set of uniformly distributed weight vectors $\mathbf{W} = \{\mathbf{w}^1, \dots, \mathbf{w}^N\}$. For each $\mathbf{w}^k$, $w^k_i \geq 0$ for all $i \in \{1, \dots, M\}$, and $\sum_{i=1}^{M} w^k_i = 1$. The $k$-th individual in the population $\mathbf{P}$ is assigned to the $k$-th subproblem with $\mathbf{w}^k$. Thus, the population size is always equal to the number of weight vectors $N$. The $k$-th subproblem also has its neighborhood index list $\mathbf{B}^k$, which consists of the indices of the $T$ closest weight vectors to $\mathbf{w}^k$ in the weight vector space.
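Uniformly distributed weight vectors are commonly generated by the simplex-lattice (Das-and-Dennis) design. The routine below is an illustrative sketch of that construction, not code prescribed by the paper:

```python
from itertools import combinations

def simplex_lattice_weights(M, H):
    """Generate all M-dimensional weight vectors with components in
    {0, 1/H, 2/H, ..., 1} summing to 1 (Das-and-Dennis simplex lattice).
    The number of vectors is C(H + M - 1, M - 1)."""
    weights = []
    # Choose M-1 "divider" positions among H + M - 1 slots;
    # the gap sizes between dividers give the components times H.
    for dividers in combinations(range(H + M - 1), M - 1):
        prev = -1
        w = []
        for d in dividers:
            w.append((d - prev - 1) / H)
            prev = d
        w.append((H + M - 2 - prev) / H)
        weights.append(w)
    return weights
```

For example, `simplex_lattice_weights(2, 4)` yields the five vectors (0, 1), (0.25, 0.75), (0.5, 0.5), (0.75, 0.25), and (1, 0).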

After the initialization of $\mathbf{P}$ and $\mathbf{W}$, the following steps are repeatedly performed until a termination condition is satisfied. In each iteration, for the $k$-th subproblem ($k \in \{1, \dots, N\}$), an index list $\mathbf{R}$ is set to $\mathbf{B}^k$. Two indices of parent individuals are randomly selected from $\mathbf{R}$, and a child $\mathbf{u}$ is generated by applying variation operators to the two parents. The SBX crossover and the polynomial mutation [9] are used in the original MOEA/D. After $\mathbf{u}$ has been generated, the replacement selection is applied to each neighborhood subproblem indexed by $\mathbf{R}$: the current solution $\mathbf{x}^j$ of the $j$-th subproblem is replaced with $\mathbf{u}$ if $g(\mathbf{u} \,|\, \mathbf{w}^j) \leq g(\mathbf{x}^j \,|\, \mathbf{w}^j)$.

#### MOEA/D-DE

The differences between MOEA/D and MOEA/D-DE are threefold. First, the differential evolution (DE) operator [43] is used instead of the SBX crossover [9]. Second, the index list $\mathbf{R}$ is set to $\mathbf{B}^k$ with some probability and to $\{1, \dots, N\}$ otherwise. Thus, $\mathbf{R}$ can be set to all individual indices. Third, the maximum number of individuals replaced by the child $\mathbf{u}$ is restricted to a small constant.

#### MOEA/D-AGR

The difference between MOEA/D-AGR and MOEA/D-DE lies in the replacement method. In MOEA/D-AGR, the indices of subproblems to be updated by a newly generated child $\mathbf{u}$ are set to the $T^r$ neighborhood indices of the $j$-th subproblem in the weight vector space, where $j$ is the index of the subproblem whose scalarizing function value is the best among all subproblems:

$$j = \operatorname*{argmin}_{k \in \{1,\dots,N\}} \left\{ g(\mathbf{u} \,|\, \mathbf{w}^k) \right\}. \tag{1}$$

The weighted Tchebycheff function $g^{\mathrm{tch}}$ is used as $g$ in (1):

$$g^{\mathrm{tch}}(\mathbf{x} \,|\, \mathbf{w}) = \max_{i \in \{1,\dots,M\}} \left\{ w_i \left| f_i(\mathbf{x}) - z^*_i \right| \right\}, \tag{2}$$

where $\mathbf{z}^*$ is the ideal point. Since finding the true ideal point is difficult in general, its approximation is used in (2). The $i$-th element of the approximation of $\mathbf{z}^*$ is the minimum value of $f_i$ found during the search process.

The replacement neighborhood size $T^r$ plays a crucial role in balancing exploration and exploitation in MOEA/D-AGR, in a similar manner to $T$ in MOEA/D. A large $T^r$ value encourages exploitation. In MOEA/D-AGR, the $T^r$ value deterministically increases with the number of iterations. The results presented in [50] show that a scheduling method based on the sigmoid function is suitable for MOEA/D-AGR.

#### MOEA/D-DU

In contrast to MOEA/D-AGR, in MOEA/D-DU the selection of subproblems to be updated is based on distances in the objective and weight vector spaces. In the normalized objective space, the perpendicular distance between the normalized objective vector of the child and $\mathbf{w}^k$ is calculated for each $k \in \{1, \dots, N\}$. The replacement is performed for the $K$ subproblems with the minimum perpendicular distances. Similar to $T^r$ in MOEA/D-AGR, $K$ controls the balance between exploration and exploitation.

MOEA/D-DU uses the division version of the weighted Tchebycheff function [36]:

$$g^{\mathrm{dtch}}(\mathbf{x} \,|\, \mathbf{w}) = \max_{i \in \{1,\dots,M\}} \left\{ \frac{\left| f_i(\mathbf{x}) - z^*_i \right|}{w_i} \right\}, \tag{3}$$

where $w_i$ is replaced with a very small positive value when $w_i = 0$ to avoid division by zero. It is reported in [36] that the distribution of the search directions of MOEA/D with $g^{\mathrm{dtch}}$ is more uniform than that with $g^{\mathrm{tch}}$.
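As a sketch, the two scalarizing functions in (2) and (3) can be written as follows; the `eps` guard against zero weights is our illustrative choice of the small constant, not a value from the paper:

```python
def g_tch(f, w, z_star):
    """Weighted Tchebycheff function of Eq. (2) (minimization)."""
    return max(wi * abs(fi - zi) for fi, wi, zi in zip(f, w, z_star))

def g_dtch(f, w, z_star, eps=1e-6):
    """Division version of the weighted Tchebycheff function, Eq. (3);
    zero weights are replaced by eps to avoid division by zero."""
    return max(abs(fi - zi) / max(wi, eps) for fi, wi, zi in zip(f, w, z_star))
```

Both take an objective vector `f`, a weight vector `w`, and the approximated ideal point `z_star`, and smaller values are better.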

#### eMOEA/D

In eMOEA/D, the replacement method is applied to the subproblems with the minimum scalarizing function values. MOEA/D-AGR and eMOEA/D are the same regarding the use of scalarizing function values to select subproblems. The following multiplicative scalarizing function (MSF) [22] is used in eMOEA/D:

$$g^{\mathrm{msf}}(\mathbf{x} \,|\, \mathbf{w}) = \frac{\left( \max_{i \in \{1,\dots,M\}} \left\{ \left| f_i(\mathbf{x}) - z^*_i \right| / w_i \right\} \right)^{1+\alpha}}{\left( \min_{i \in \{1,\dots,M\}} \left\{ \left| f_i(\mathbf{x}) - z^*_i \right| / w_i \right\} \right)^{\alpha}}, \tag{4}$$

where $\alpha$ controls the size of the so-called improvement region. $\alpha$ plays a similar role to the penalty value $\theta$ of the PBI function in (7), which will be explained later. Even if $\mathbf{f}(\mathbf{x})$ is far from $\mathbf{w}$ regarding the perpendicular distance, $\mathbf{x}$ would be evaluated as being superior when a sufficiently small $\alpha$ value is used. In (4), $g^{\mathrm{msf}}$ with $\alpha = 0$ is identical to $g^{\mathrm{dtch}}$.

According to the general rule of thumb “emphasize diversity at the early stage and convergence at the later stage”, $\alpha$ decreases linearly with the number of iterations $t$:

$$\alpha = \beta \left( 1 - \frac{t}{t_{\max}} \right) M \min_{k \in \{1,\dots,M\}} \{ w_k \}, \tag{5}$$

where $t_{\max}$ is the maximum number of iterations. The recommended value of the scaling parameter $\beta$ is given in [22].
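A hedged sketch of the MSF in (4) and the $\alpha$ schedule in (5); the exact MSF form shown here follows our reading of [22], and `eps` is an illustrative safeguard against zero weights:

```python
def g_msf(f, w, z_star, alpha, eps=1e-6):
    """Multiplicative scalarizing function (MSF) of Eq. (4).
    With alpha = 0 it reduces to the division Tchebycheff of Eq. (3)."""
    ratios = [abs(fi - zi) / max(wi, eps) for fi, wi, zi in zip(f, w, z_star)]
    return max(ratios) ** (1.0 + alpha) / max(min(ratios), eps) ** alpha

def alpha_schedule(beta, t, t_max, w):
    """Linearly decreasing alpha of Eq. (5): beta * (1 - t/t_max) * M * min(w)."""
    M = len(w)
    return beta * (1.0 - t / t_max) * M * min(w)
```

As the comment notes, setting `alpha = 0` recovers `g_dtch`, matching the remark after (4); the schedule drives `alpha` to zero at `t = t_max`, so eMOEA/D gradually shifts from diversity to convergence.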

### II-D Reference vector-based EMOAs

Representative reference vector-based EMOAs include NSGA-III [11], θ-DEA [52], RVEA [6], VaEA [51], and SPEA/R [23]. While $\mathbf{w}$ is called a weight vector in decomposition-based EMOAs, it is referred to as a reference vector in reference vector-based EMOAs. Below, we explain NSGA-III, θ-DEA, and RVEA.

#### NSGA-III

NSGA-III is an improved version of NSGA-II [10] for many-objective optimization. In each iteration $t$, $N$ children are generated by applying variation operators to randomly selected pairs of individuals from the population $\mathbf{P}$. In the environmental selection, $N$ individuals for the next iteration are selected from the union of $\mathbf{P}$ and the child population $\mathbf{Q}$. The primary and secondary criteria are based on the non-domination levels and the reference vector-based niching method, respectively.

Individuals in the union are grouped into fronts $\mathbf{F}^1, \mathbf{F}^2, \dots$ according to their non-domination levels. First, the next population $\mathbf{P}^{t+1}$ and the front index $i$ are initialized as $\emptyset$ and $1$, respectively. Then, the individuals in $\mathbf{F}^i$ are added to $\mathbf{P}^{t+1}$ and $i$ is incremented until $|\mathbf{P}^{t+1}| \geq N$. After this operation, $\mathbf{P}^{t+1} = \mathbf{F}^1 \cup \dots \cup \mathbf{F}^{l}$, where $l$ is the index of the last front. If $|\mathbf{P}^{t+1}| > N$, the remaining individuals are selected from the last front $\mathbf{F}^{l}$ using the niching method described below. At the beginning of the niching procedure, each individual in $\mathbf{P}^{t+1}$ and $\mathbf{F}^{l}$ is assigned to the $j$-th subproblem with the minimum perpendicular distance between its normalized objective vector $\mathbf{f}'(\mathbf{x})$ and $\mathbf{w}^j$:

$$j = \operatorname*{argmin}_{k \in \{1,\dots,N\}} \left\{ \mathrm{PD}\!\left( \mathbf{f}'(\mathbf{x}), \mathbf{w}^k \right) \right\}, \tag{6}$$

where the function $\mathrm{PD}$ returns the perpendicular distance between its two input vectors. After all individuals have been assigned, the remaining individuals for the next iteration are selected from $\mathbf{F}^{l}$ based on the number of individuals assigned to each subproblem and their perpendicular distances.

#### θ-DEA

The environmental selection in θ-DEA is similar to that of NSGA-III. However, the secondary criterion in θ-DEA is based on the so-called θ-dominance. First, each individual in the union of $\mathbf{P}$ and $\mathbf{Q}$ is assigned to the $j$-th subproblem using (6). Then, the individuals assigned to each subproblem are ranked based on their θ-dominance levels. Let us assume that two individuals $\mathbf{x}^1$ and $\mathbf{x}^2$ are assigned to the same $j$-th subproblem. $\mathbf{x}^1$ is said to θ-dominate $\mathbf{x}^2$ if $g^{\mathrm{pbi}}(\mathbf{x}^1 \,|\, \mathbf{w}^j) < g^{\mathrm{pbi}}(\mathbf{x}^2 \,|\, \mathbf{w}^j)$. The PBI function is given as follows:

$$g^{\mathrm{pbi}}(\mathbf{x} \,|\, \mathbf{w}) = d_1 + \theta\, d_2, \tag{7}$$

$$d_1 = \frac{\left\| \left( \mathbf{f}(\mathbf{x}) - \mathbf{z}^* \right)^{\mathrm{T}} \mathbf{w} \right\|}{\| \mathbf{w} \|}, \tag{8}$$

$$d_2 = \left\| \mathbf{f}(\mathbf{x}) - \left( \mathbf{z}^* + d_1 \frac{\mathbf{w}}{\| \mathbf{w} \|} \right) \right\|, \tag{9}$$

where $\| \cdot \|$ indicates the Euclidean norm. The distance $d_1$ represents how close the objective vector $\mathbf{f}(\mathbf{x})$ is to the Pareto front, and $d_2$ is the perpendicular distance between $\mathbf{f}(\mathbf{x})$ and $\mathbf{w}$. The penalty parameter $\theta$ balances the convergence ($d_1$) and the diversity ($d_2$). The recommended value is $\theta = 10^6$ for weight vectors with the axis directions (e.g., $\mathbf{w} = (1, 0, \dots, 0)^{\mathrm{T}}$) and $\theta = 5$ for all the other vectors.
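Equations (7)–(9) translate directly into code; the sketch below returns $d_1$, $d_2$, and the PBI value (function name and the default `theta` are illustrative):

```python
import math

def g_pbi(f, w, z_star, theta=5.0):
    """PBI function of Eqs. (7)-(9): g = d1 + theta * d2, where d1 is the
    projection length of f - z* onto w and d2 is the residual distance."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    diff = [fi - zi for fi, zi in zip(f, z_star)]
    d1 = abs(sum(di * wi for di, wi in zip(diff, w))) / norm_w
    d2 = math.sqrt(sum((di - d1 * wi / norm_w) ** 2
                       for di, wi in zip(diff, w)))
    return d1, d2, d1 + theta * d2
```

With $\mathbf{f}(\mathbf{x})$ lying exactly on the direction of $\mathbf{w}$, $d_2 = 0$ and the PBI value equals $d_1$; a larger `theta` penalizes deviation from the reference direction more heavily, which is exactly what θ-dominance ranks.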

#### RVEA

RVEA uses a set of unit reference vectors $\mathbf{V} = \{\mathbf{v}^1, \dots, \mathbf{v}^N\}$ instead of a set of weight vectors $\mathbf{W}$, where $\mathbf{v}^k = \mathbf{w}^k / \|\mathbf{w}^k\|$ for each $k \in \{1, \dots, N\}$. RVEA adaptively adjusts $\mathbf{V}$ based on the objective values of the current population.

After $N$ children have been generated, the environmental selection is applied to the union of $\mathbf{P}$ and $\mathbf{Q}$. First, the objective vector of each individual $\mathbf{x}$ in the union is translated as $\mathbf{f}'(\mathbf{x}) = \mathbf{f}(\mathbf{x}) - \mathbf{z}^*$. Then, $\mathbf{x}$ is assigned to the $j$-th subproblem with the minimum angle between $\mathbf{f}'(\mathbf{x})$ and $\mathbf{v}^j$:

$$j = \operatorname*{argmin}_{k \in \{1,\dots,N\}} \left\{ \mathrm{angle}\!\left( \mathbf{f}'(\mathbf{x}), \mathbf{v}^k \right) \right\}, \tag{10}$$

where the function $\mathrm{angle}$ in (10) returns the angle between its two input vectors.

Then, the individuals assigned to the same subproblem are compared based on their angle-penalized distance (APD) values. For each subproblem, only the best individual with the minimum APD value survives to the next iteration. The APD value of $\mathbf{x}$ with the unit reference vector $\mathbf{v}$ is given as follows:

$$\mathrm{APD}(\mathbf{x}) = \left( 1 + P(\mathbf{x}, \mathbf{v}) \right) \left\| \mathbf{f}'(\mathbf{x}) \right\|, \tag{11}$$

$$P(\mathbf{x}, \mathbf{v}) = M \left( \frac{t}{t_{\max}} \right)^{\alpha} \left( \frac{\mathrm{angle}\!\left( \mathbf{f}'(\mathbf{x}), \mathbf{v} \right)}{\gamma(\mathbf{v})} \right), \tag{12}$$

$$\gamma(\mathbf{v}) = \min_{\mathbf{s} \in \mathbf{V} \setminus \{\mathbf{v}\}} \left\{ \mathrm{angle}(\mathbf{v}, \mathbf{s}) \right\}, \tag{13}$$

where $P(\mathbf{x}, \mathbf{v})$ is a penalty value for $\mathbf{x}$. The larger the angle between $\mathbf{f}'(\mathbf{x})$ and $\mathbf{v}$ is, the larger the penalty given to $\mathbf{x}$. $\gamma(\mathbf{v})$ is used to normalize the angle. $t$ is the current number of iterations, and $t_{\max}$ is the maximum number of iterations. The influence of the angle-based penalty scheme increases as the search progresses. The recommended setting of $\alpha$ is $2$.
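Equations (11)–(13) can be sketched as below; the identity-based exclusion of `v` from `V` in the `gamma` computation and all function names are our illustrative choices:

```python
import math

def angle(a, b):
    """Angle (radians) between two vectors, clamped for numerical safety."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return math.acos(max(-1.0, min(1.0, dot / (na * nb))))

def apd(f_prime, v, V, t, t_max, alpha=2.0):
    """Angle-penalized distance of Eqs. (11)-(13) for a translated
    objective vector f_prime and unit reference vector v drawn from V."""
    M = len(f_prime)
    gamma = min(angle(v, s) for s in V if s is not v)        # Eq. (13)
    penalty = M * (t / t_max) ** alpha * (angle(f_prime, v) / gamma)  # Eq. (12)
    norm_f = math.sqrt(sum(x * x for x in f_prime))
    return (1.0 + penalty) * norm_f                          # Eq. (11)
```

Early in the run (`t` small) the penalty term is negligible and APD is dominated by the norm of $\mathbf{f}'(\mathbf{x})$ (convergence); as `t` approaches `t_max`, alignment with the reference vector matters increasingly.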

## III The Proposed ADA Framework

This section explains the proposed ADA framework and the six ADA-based algorithms. Unlike EMOAs for MOPs, EMMAs for MMOPs need to maintain the diversity of the population in both the objective and solution spaces. For example, Omni-optimizer [13] uses an aggregate crowding distance metric in the objective and solution spaces. While most EMMAs (e.g., [41, 47, 54]) aggregate the objective and solution space diversity metrics similar to Omni-optimizer, ADA uses them in a two-phase manner similar to TriMOEA-TA&R [30]. More specifically, ADA handles the objective space diversity by the assignment method of the original EMOA and the solution space diversity by a simple niching criterion.

On the one hand, a single individual is assigned to each subproblem in most decomposition-based EMOAs. Thus, the population size is equal to the number of weight/reference vectors (i.e., $|\mathbf{P}| = N$). On the other hand, one or more individuals can be assigned to each subproblem in the ADA framework (i.e., $|\mathbf{P}| \geq N$). This mechanism maintains multiple equivalent individuals for each subproblem. The population size $|\mathbf{P}|$ is adaptively adjusted in ADA.

Algorithm 1 shows the ADA framework. Algorithms S.7–S.12 in the supplementary file show the six ADA-based algorithms. Algorithms S.7–S.12 are almost the same as Algorithm 1; only lines 3, 11, and 16 differ. At the beginning of the search, the population size is set to $N$ (line 1). The population $\mathbf{P}$ and the weight/reference vector set are also initialized. The $k$-th individual in $\mathbf{P}$ is assigned to the $k$-th subproblem (lines 2–3). This operation is unnecessary for EMOAs with a reassignment procedure of individuals to subproblems, such as RVEA. After the child $\mathbf{u}$ has been generated by the reproduction operations (lines 6–8), the population is updated using the assignment (line 10), deletion (lines 16–17), and addition (lines 18–19) operations. The assignment and deletion operations differ depending on the EMOA combined with the ADA framework. After the normalization of the objective vectors (line 9), $\mathbf{u}$ is assigned to the $j$-th subproblem (line 10). $\mathbf{X}$ denotes the set of individuals that have been assigned to the $j$-th subproblem and are in the neighborhood of $\mathbf{u}$ in the solution space (line 11). Thus, ADA requires a neighborhood criterion in the solution space. $\mathbf{X}$ is explained later in detail using Fig. 2. Two Boolean variables (line 12) are used in the addition operation (lines 18–19). $\mathbf{u}$ enters $\mathbf{P}$ if it satisfies either of the following two conditions. One is that there is no individual in $\mathbf{X}$ (lines 13–14). The other is that $\mathbf{u}$ is better than at least one individual in $\mathbf{X}$ in the deletion operation (lines 15–17).
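The per-child update of Algorithm 1 (lines 10–19) can be sketched in a heavily simplified form. Here `population` is a list of `(solution, subproblem_index)` pairs, and the three callables stand in for the EMOA-dependent assignment, solution-space neighborhood, and deletion criteria; all names are illustrative, not from the paper:

```python
def ada_update(population, child, assign_fn, neighbors_fn, better_fn):
    """One ADA environmental-selection step for a single child (sketch).
    assign_fn(child) -> subproblem index j          (assignment operation)
    neighbors_fn(child, x) -> bool                  (neighborhood criterion)
    better_fn(child, x, j) -> bool                  (deletion criterion)"""
    j = assign_fn(child)                                    # assignment
    X = [(x, k) for (x, k) in population
         if k == j and neighbors_fn(child, x)]              # neighbors of child
    if not X:
        population.append((child, j))                       # addition, case 1
        return population
    survived = False
    for (x, k) in X:                                        # deletion
        if better_fn(child, x, j):
            population.remove((x, k))
            survived = True
    if survived:
        population.append((child, j))                       # addition, case 2
    return population
```

Note how an unconditional insertion happens only when the child has no same-subproblem neighbors in the solution space, which is exactly the mechanism that lets the population size grow above $N$.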

The following Subsections (from III-A to III-F) explain each step of ADA in detail. Subsection III-G presents an effective decision making based on a solution set found by the proposed approach. Apart from the decision making, Subsection III-H introduces two post-processing methods for benchmarking. Subsection III-I discusses the applicability of ADA. Subsection III-J discusses the originality of ADA.

### III-A Reproduction operation

ADA uses the basic GA operators (i.e., the SBX crossover and the polynomial mutation [9]) and the simplest method of selecting parents. First, two different parents are randomly selected from $\mathbf{P}$, regardless of the subproblem to which each individual has been assigned. Then, a child $\mathbf{u}$ is reproduced by applying SBX to the two parents. Finally, the polynomial mutation is applied to $\mathbf{u}$. All ADA-based algorithms use the same mating selection scheme, regardless of the mating selection schemes in their original EMOAs.

In principle, any variation operators can be incorporated into ADA, including DE operators [26], PSO operators [54], and model-based methods [41, 58]. Any parent selection methods can also be used in ADA, including the distance-based selection methods in the solution space [5]. The performance of ADA can be improved by using these more sophisticated methods. However, if such effective methods are used in ADA, it is unclear which algorithmic component mainly contributes to the overall performance of ADA-based algorithms. Since we want to investigate the effectiveness of the ADA framework in an isolated manner, we use the simplest variation operators and parent selection method in this paper.

### III-B Normalization

Since the objective functions of most real-world problems are differently scaled, the normalization method is mandatory. In ADA, the normalization method depends on an EMOA to be combined. For example, NSGA-III-ADA uses the intercept-based normalization method in NSGA-III.

If the original EMOA does not have a normalization method (e.g., MOEA/D-AGR), we use the following simple normalization method. The objective vector $\mathbf{f}(\mathbf{x})$ is normalized using the approximated ideal point $\mathbf{z}^*$ and the worst point $\mathbf{z}^{\mathrm{worst}}$ in the population $\mathbf{P}$. The $i$-th element of the normalized objective vector is given as $f'_i(\mathbf{x}) = (f_i(\mathbf{x}) - z^*_i) / (z^{\mathrm{worst}}_i - z^*_i)$.
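This simple min-max normalization can be sketched as follows; the `eps` guard against degenerate (identical ideal and worst) coordinates is our assumption:

```python
def normalize(f, z_ideal, z_worst, eps=1e-12):
    """Min-max normalization of an objective vector f using the
    approximated ideal point and the worst point of the population."""
    return [(fi - zi) / max(zwi - zi, eps)
            for fi, zi, zwi in zip(f, z_ideal, z_worst)]
```

After normalization, each objective value of a population member lies in [0, 1], which makes the perpendicular-distance and angle-based assignments of Subsection III-C scale-invariant.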

### III-C Assignment operation

The child $\mathbf{u}$ is assigned to the $j$-th subproblem based on its normalized objective vector $\mathbf{f}'(\mathbf{u})$ (line 10 in Algorithm 1). The assignment operation plays a crucial role in ADA in maintaining the diversity of the population in the objective space.

Each EMOA has a different method of selecting the index $j$ of the subproblem to which $\mathbf{u}$ is assigned. MOEA/D-AGR-ADA and eMOEA/D-ADA select the $j$-th subproblem with the minimum scalarizing function value as in (1). MOEA/D-DU-ADA, NSGA-III-ADA, and θ-DEA-ADA assign $\mathbf{u}$ to the $j$-th subproblem with the minimum perpendicular distance between $\mathbf{f}'(\mathbf{u})$ and $\mathbf{w}^j$ as in (6). RVEA-ADA selects the subproblem with the minimum angle between $\mathbf{f}'(\mathbf{u})$ and the unit reference vector $\mathbf{v}^j$ as in (10). ADA requires only a single subproblem index for $\mathbf{u}$, whereas the original MOEA/D-AGR, eMOEA/D, and MOEA/D-DU select multiple subproblem indices.

### III-D Neighborhood criterion in the solution space

ADA requires the neighborhood criterion in the solution space (line 11 in Algorithm 1). While the objective space diversity is maintained by the assignment operation, the solution space diversity is controlled by the neighborhood criterion.

We use the simple relative distance-based neighborhood criterion presented in [45]. First, the normalized Euclidean distance between the child $\mathbf{u}$ and each individual in $\mathbf{P}$ is calculated in the solution space, where the upper and lower bounds of each decision variable are used for the normalization. Then, all individuals are sorted based on their distance values in ascending order. If an individual $\mathbf{x}$ is among a fixed number of the nearest individuals from $\mathbf{u}$, $\mathbf{x}$ is said to be a neighbor of $\mathbf{u}$ in the solution space. This neighborhood size is a control parameter of the criterion.
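The relative distance-based criterion can be sketched as follows; the function name, the parameter name `L` for the neighborhood size, and the `bounds` representation are our illustrative choices:

```python
import math

def neighbors(u, candidates, bounds, L):
    """Return the L candidates nearest to u in the normalized solution
    space; bounds is a list of (lower, upper) pairs per variable."""
    def norm_dist(a, b):
        # Euclidean distance after scaling each variable into [0, 1].
        return math.sqrt(sum(((ai - bi) / (ub - lb)) ** 2
                             for ai, bi, (lb, ub) in zip(a, b, bounds)))
    ranked = sorted(candidates, key=lambda x: norm_dist(u, x))
    return ranked[:L]
```

Because the distances are relative ranks rather than an absolute radius, the criterion needs no problem-specific radius tuning, which is the property the paragraph above relies on.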

In addition to the relative distance-based neighborhood criterion, any neighborhood criterion can be incorporated into ADA. A number of niching methods have been proposed in the multi-modal single-objective optimization community [28]. Modern niching methods include the adaptive radius-based method [3] and the nearest-better clustering [35]. However, our preliminary results show that such sophisticated niching methods do not work well in ADA. The main reason for the failure is that an appropriate parameter specification for such a modern niching method is difficult due to the difficulty in understanding the fitness landscape of an MOP [24]. For this reason, we use the simple neighborhood criterion.

### III-E Deletion operation

Let $\mathbf{X}$ be the set of individuals that are in the neighborhood of the child $\mathbf{u}$ in the solution space among the individuals assigned to the same $j$-th subproblem as $\mathbf{u}$ (line 11 in Algorithm 1). Fig. 2 shows an example of $\mathbf{X}$ for the third subproblem ($j = 3$). In Fig. 2, $\mathbf{X}$ is the intersection of the set of neighborhood individuals of $\mathbf{u}$ in the solution space and the set of individuals assigned to the third subproblem in the objective space. One individual in Fig. 2 is a neighbor of $\mathbf{u}$ in the solution space but has been assigned to the second subproblem ($j = 2$); another has been assigned to the third subproblem but is not in the neighborhood of $\mathbf{u}$; neither belongs to $\mathbf{X}$. Although the latter individual is dominated by $\mathbf{u}$ with respect to the two objectives in Fig. 2 and would be deleted in most EMOAs, it can survive in $\mathbf{P}$ in ADA. This is because it is not a neighbor of $\mathbf{u}$ (or of any other individual assigned to the third subproblem) and thus is not compared with $\mathbf{u}$. Although the quality of this individual is poor, it contributes to the solution space diversity of $\mathbf{P}$.

A paired comparison between $\mathbf{u}$ and each individual $\mathbf{x}$ in $\mathbf{X}$ is performed (lines 15–17 in Algorithm 1). If $\mathbf{x}$ is evaluated as being worse than $\mathbf{u}$ (by the criterion explained in the next paragraph), $\mathbf{x}$ is deleted from $\mathbf{P}$. The deletion of such an $\mathbf{x}$ is reasonable since it is based on both the quality in the objective space and the diversity in the solution space.

The comparison criterion depends on the environmental selection in each original EMOA. In MOEA/D-type algorithms, the comparison is based on the scalarizing function $g$. MOEA/D-AGR-ADA, MOEA/D-DU-ADA, and eMOEA/D-ADA use $g^{\mathrm{tch}}$ in (2), $g^{\mathrm{dtch}}$ in (3), and $g^{\mathrm{msf}}$ in (4), respectively. If $g(\mathbf{u} \,|\, \mathbf{w}^j) \leq g(\mathbf{x} \,|\, \mathbf{w}^j)$ on the $j$-th subproblem, $\mathbf{x}$ is removed from $\mathbf{P}$. In NSGA-III-ADA and θ-DEA-ADA, the primary comparison between $\mathbf{u}$ and $\mathbf{x}$ is based on the Pareto-dominance relation. In NSGA-III-ADA, ties are broken by the perpendicular distance between the normalized objective vector and $\mathbf{w}^j$. In θ-DEA-ADA, the tie-breaker is the θ-dominance. RVEA-ADA uses the APD value given in (11).

### III-F Addition operation

The child $\mathbf{u}$ is added to the population $\mathbf{P}$ if either of the following two conditions is met (lines 18–19 in Algorithm 1). One is that no individual exists in $\mathbf{X}$. An empty $\mathbf{X}$ means that there is no neighborhood individual of $\mathbf{u}$ in $\mathbf{P}$ in the solution space (or the objective space). If the first condition is met, $\mathbf{u}$ enters $\mathbf{P}$ without any comparison. Although a $\mathbf{u}$ with inferior quality is likely to enter $\mathbf{P}$ in this manner, it helps to maintain the solution space diversity (or the objective space diversity). While a dominated individual is unlikely to survive to the next iteration in most EMOAs, it can remain in $\mathbf{P}$ due to this first criterion of the addition operation.

The other condition is that $\mathbf{u}$ performs better than at least one individual in $\mathbf{X}$ in the deletion operation (lines 16–17 in Algorithm 1). Since such a $\mathbf{u}$ has good quality in its neighborhood in the solution space, it should be added to $\mathbf{P}$.

### III-G Decision making

We introduce an effective decision making with a solution set found by an ADA-based algorithm. Fig. 7 shows an example. Fig. 7 (a) exhibits the distribution of all objective vectors in the final population of MOEA/D-AGR-ADA on the two-objective and two-variable SYM-PART1 problem [37]. As shown in Fig. 7 (d), nine equivalent Pareto optimal solution sets lie on the nine lines in SYM-PART1. The experimental setting is described in Section IV later. The decision making in ADA is the following two-phase method based on $\mathbf{A}^{\mathrm{primary}}$ and $\mathbf{A}^{\mathrm{secondary}}$.

#### $\mathbf{A}^{\mathrm{primary}}$-based decision making

Fig. 7 (a) shows that the final population $\mathbf{P}$ contains a large number of solutions. This is undesirable for the decision maker, because she/he usually wants to examine a small number of well-distributed solutions in the objective space [56]. To address this issue, the best solution for each subproblem is selected from $\mathbf{P}$ based on the same criterion as in the deletion operation (Subsection III-E). Then, only the non-dominated solutions are selected from these best solutions. Let $\mathbf{A}^{\mathrm{primary}}$ be the set of non-dominated solutions obtained by this procedure. Fig. 7 (b) shows the non-dominated solutions in $\mathbf{A}^{\mathrm{primary}}$ in the objective space. The decision maker examines the objective vectors in $\mathbf{A}^{\mathrm{primary}}$, rather than those in $\mathbf{P}$.

When the decision maker wants to examine fewer non-dominated solutions (e.g., only 10 solutions), other solution selection methods can be used on $\mathbf{A}^{\mathrm{primary}}$. If the number of objectives is small, efficient hypervolume-based selection approaches (e.g., [4]) are available. For a larger number of objectives, computationally cheap distance-based selection methods (e.g., [42]) can be used.

#### $\mathbf{A}^{\mathrm{secondary}}$-based decision making

After the decision maker has examined $\mathbf{A}^{\mathrm{primary}}$ in the objective space, she/he selects the final solution from it. Suppose that the decision maker has selected the solution assigned to the 77-th subproblem in Fig. 7 (b). In ADA, she/he can then examine other dissimilar solutions with similar quality. Let $\mathbf{A}^{\mathrm{secondary}}$ be the set of all solutions assigned to the same 77-th subproblem. Figs. 7 (c) and (d) show the distribution of all nine solutions in $\mathbf{A}^{\mathrm{secondary}}$ in the objective and solution spaces, respectively. Although the nine solutions in $\mathbf{A}^{\mathrm{secondary}}$ have almost the same quality in the objective space, they are dissimilar in the solution space. In addition to the initially selected solution, the decision maker can examine the other candidates in $\mathbf{A}^{\mathrm{secondary}}$ based on her/his preference in the solution space.

### III-H Two post-processing methods for benchmarking

Apart from the practical two-phase decision making method described in Subsection III-G, we here consider benchmarking of ADA-based EMMAs. The performance of EMOAs and EMMAs is evaluated using performance indicators. However, a fair performance comparison between ADA-based EMMAs and other optimizers is difficult, because the final population in ADA can contain an unbounded number of solutions. Most performance indicators cannot assess solution sets of different sizes in a fair manner [20]. IGD [18] and IGDX [58], described in Section IV later, are not exceptions. Thus, ADA requires a method of selecting a constant number of solutions from $\mathbf{P}$ for benchmarking studies.

We introduce two post-processing methods that are used to evaluate the performance of ADA-based algorithms for MOPs and MMOPs, respectively. In general, uniformly distributed solutions in the solution space are unlikely to be mapped to uniformly distributed objective vectors, due to the non-uniform mapping from the solution space to the objective space. For this reason, we use separate post-processing methods for the objective and solution spaces. On the one hand, the same selection method described in Subsection III-G is used for performance indicators for MOPs (e.g., IGD [8]). Since indicators for MOPs assess the distribution of a solution set in the objective space, this choice is reasonable.

On the other hand, the solution distance-based selection method presented in [45] is used for performance indicators for MMOPs (e.g., IGDX [58]). Consider the task of selecting a fixed number of sparsely distributed solutions from all non-dominated solutions in the final population. First, the selected set is initialized to be empty. A solution is randomly chosen from the non-dominated solution set and stored in the selected set. Then, the solution whose distance to its nearest solution in the selected set (measured in the normalized solution space) is maximum is repeatedly added to the selected set until the required number of solutions has been chosen. Unlike the two solution sets used for decision making, this set is used only for benchmarking of ADA-based algorithms.
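A minimal sketch of this distance-based selection (a greedy farthest-point strategy). The fixed seed and the plain tuple-list data layout are illustrative choices, and the solution space is assumed to be normalized already.

```python
import random

def select_sparse(solutions, k, seed=0):
    """Greedily pick k solutions that are far apart in the (normalized)
    solution space: start from a random solution, then repeatedly add the
    candidate whose distance to its nearest selected solution is maximal."""
    rng = random.Random(seed)

    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

    remaining = list(solutions)
    selected = [remaining.pop(rng.randrange(len(remaining)))]
    while remaining and len(selected) < k:
        # each candidate is scored by its distance to the nearest selected one
        far = max(remaining, key=lambda s: min(dist(s, t) for t in selected))
        remaining.remove(far)
        selected.append(far)
    return selected
```

Because the score is the distance to the *nearest* selected solution, near-duplicate solutions are skipped in favor of solutions from other regions, which is exactly what an indicator like IGDX needs.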

ADA is a framework to improve the performance of decomposition-based EMOAs for MMOPs. We do not claim that ADA can be combined with any decomposition-based EMOA. Since ADA requires a method of assigning a child to a subproblem, ADA is inapplicable to EMOAs with no assignment mechanism, such as MOEA/D [56] and MOEA/D-DRA [57]. Also, the deletion operation performs a pairwise comparison independently of the other individuals. Thus, ADA is not applicable to EMOAs whose environmental selection is performed on the whole population, such as MOEA/D-STM [27] and VaEA [51].

In summary, ADA can be combined with EMOAs that have an assignment mechanism and a pairwise comparison-based environmental selection. The six EMOAs explained in Subsections II-C and II-D satisfy these conditions. In addition, ADA is applicable to I-DBEA [1] and SPEA/R [23].

Fortunately, even if a decomposition-based EMOA does not satisfy the two conditions above, ADA can be applied to it after some modifications. For example, since MOEA/D does not have an assignment method, we cannot directly combine ADA with MOEA/D. However, MOEA/D can easily be modified by using any of the three assignment methods described in Subsection III-C. In fact, MOEA/D-AD proposed in [45] is an ADA-based MOEA/D with the assignment operation in (6). Thus, the applicability of ADA is not limited to only a few decomposition-based EMOAs.

In addition to MOEA/D-AD [45], a variant of MOEA/D for MMOPs is proposed in [16]. This MOEA/D variant assigns a fixed number of individuals to each subproblem. Its fitness value is based on the PBI function value and two distance values in the solution space. The main disadvantage of the MOEA/D variant in [16] is the difficulty of properly setting the number of individuals per subproblem. Since the number of equivalent Pareto optimal solution subsets is unknown a priori, this parameter must be fine-tuned for each problem. Although a multi-start decomposition-based approach is proposed in [37], it has a similar disadvantage. In contrast, ADA adaptively adjusts the number of individuals assigned to each subproblem. Thus, ADA does not require such a problem-dependent parameter.

TriMOEA-TA&R [30] consists of various advanced components, including the convergence and diversity archives-based strategy as in Two_Arch2 [49], the decision variable-decomposition method in MOEA/DVA [31], and the angle-based individual assignment in RVEA [6]. ADA and TriMOEA-TA&R are similar in that they handle diversity in the objective and solution spaces in a two-phase manner. However, while TriMOEA-TA&R uses an absolute distance-based neighborhood criterion with a fixed niching radius, ADA uses a relative distance-based neighborhood criterion with the parameter L. ADA also uses a much simpler mating scheme and does not use any decomposition method of decision variables. In addition, ADA uses the adaptive population sizing strategy to handle equivalent solutions, as mentioned above. Whereas TriMOEA-TA&R is an algorithm for MMOPs with distance-related variables, ADA is a general framework for various MMOPs.
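The contrast between the two neighborhood criteria can be illustrated as follows; the function names and parameters (`sigma` for a fixed radius, `L` for a nearest-rank cutoff) are our own shorthand for the two ideas, not the papers' exact definitions.

```python
def euclid(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def absolute_neighbors(x, population, sigma):
    """Absolute criterion: neighbors are all individuals within a
    fixed radius sigma, so the result depends on the problem's scale."""
    return [y for y in population if y is not x and euclid(x, y) < sigma]

def relative_neighbors(x, population, L):
    """Relative criterion: neighbors are the L nearest individuals,
    regardless of the absolute distances involved."""
    others = [y for y in population if y is not x]
    return sorted(others, key=lambda y: euclid(x, y))[:L]
```

The relative criterion needs no problem-dependent radius, which is consistent with the text's point that ADA avoids tuning a niching parameter per problem.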

## IV Experimental settings

### IV-A Test problems

We use the following eight multi-modal multi-objective test problems: the Two-On-One problem [34], the three SYM-PART problems [37], the two SSUF problems [29], the Polygon problem [17], and the Omni-test problem [13]. Table I shows their properties, including the number of objectives, the number of decision variables, and the number of equivalent Pareto optimal solution sets.

Two-On-One has two equivalent Pareto optimal solution sets that are symmetrical with respect to the origin. In SYM-PART1, the equivalent Pareto optimal solution sets lie on nine lines, as shown in Fig. 7. The nine lines are rotated in SYM-PART2 and additionally distorted in SYM-PART3. SSUF1 and SSUF3 have two symmetrical Pareto optimal solution sets. We evaluate the scalability of EMMAs to the number of objectives using Polygon with several objective-space dimensionalities. Although the number of equivalent Pareto optimal solution sets can be set to any number in Polygon, it was set to nine. We investigate the scalability of EMMAs to the number of decision variables and the number of equivalent Pareto optimal solution sets using Omni-test with several decision-space dimensionalities; in Omni-test, the number of equivalent sets increases exponentially with the number of decision variables. In summary, we use 15 test problem instances.

HPS [55] and MMMOP [30] have recently been proposed. However, HPS and MMMOP have so-called “distance-related” variables that affect only the distance between the objective vector and the Pareto front. For this reason, we do not use HPS and MMMOP in our main benchmarking study; they are used only in Subsections V-E and V-F.

### IV-B Performance indicators

Below, A is a set of solutions obtained by an EMMA, and A* is a set of reference solutions in the Pareto optimal solution set. For each problem, A* was selected from a large set of randomly generated Pareto optimal solutions using the distance-based solution selection method [45] (Subsection III-H).

We use IGD+ [18] to evaluate A in terms of both convergence to the Pareto front and diversity in the objective space:

$$\mathrm{IGD}^{+}(\mathbf{A})=\frac{1}{|\mathbf{A}^{*}|}\left(\sum_{\mathbf{z}\in\mathbf{A}^{*}}\min_{\mathbf{x}\in\mathbf{A}}\bigl\{d^{+}\bigl(\mathbf{f}(\mathbf{x}),\mathbf{f}(\mathbf{z})\bigr)\bigr\}\right), \qquad (14)$$

where $d^{+}(\mathbf{f}(\mathbf{x}),\mathbf{f}(\mathbf{z}))=\sqrt{\sum_{i=1}^{M}\bigl(\max\{f_{i}(\mathbf{x})-f_{i}(\mathbf{z}),0\}\bigr)^{2}}$ for minimization, with M being the number of objectives. IGD+ is a modified version of IGD [8]. While the original IGD is Pareto non-compliant, IGD+ is weakly Pareto compliant.

We evaluate how well A approximates the Pareto optimal solution set in the solution space using IGDX [58]:

$$\mathrm{IGDX}(\mathbf{A})=\frac{1}{|\mathbf{A}^{*}|}\left(\sum_{\mathbf{z}\in\mathbf{A}^{*}}\min_{\mathbf{x}\in\mathbf{A}}\bigl\{\mathrm{ED}(\mathbf{x},\mathbf{z})\bigr\}\right), \qquad (15)$$

where $\mathrm{ED}(\mathbf{x},\mathbf{z})$ is the Euclidean distance between $\mathbf{x}$ and $\mathbf{z}$ in the solution space.

Algorithms that find solution sets with small IGD+ and IGDX values are efficient multi-objective optimizers and efficient multi-modal multi-objective optimizers, respectively. In the ADA-based algorithms, the two solution sets obtained by the post-processing methods explained in Subsection III-H are used for the IGD+ and IGDX calculations, respectively.
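For concreteness, (14) and (15) can be computed as below; the helper names are ours, and both functions assume minimization, with objective vectors and decision vectors given as plain lists.

```python
def igd_plus(obj_A, obj_ref):
    """IGD+ in (14): average, over reference objective vectors z, of the
    smallest 'dominated distance' d+(f(x), z), where components in which
    f(x) is better than z contribute nothing."""
    def d_plus(f, z):
        return sum(max(fi - zi, 0.0) ** 2 for fi, zi in zip(f, z)) ** 0.5
    return sum(min(d_plus(f, z) for f in obj_A) for z in obj_ref) / len(obj_ref)

def igdx(X_A, X_ref):
    """IGDX in (15): the same averaging, but with the Euclidean distance
    between decision vectors (i.e., measured in the solution space)."""
    def ed(x, z):
        return sum((xi - zi) ** 2 for xi, zi in zip(x, z)) ** 0.5
    return sum(min(ed(x, z) for x in X_A) for z in X_ref) / len(X_ref)
```

Note that a solution dominating a reference vector contributes a d+ of zero, which reflects the weak Pareto compliance of IGD+ mentioned above.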

### IV-C Average performance score

We use the average performance score (APS) [2] to aggregate results on various problems. Suppose that n algorithms A_1, …, A_n are compared on a problem instance based on the indicator values obtained in multiple runs. For each pair i and j with i ≠ j, δ_{i,j} = 1 if A_j significantly outperforms A_i according to the Wilcoxon rank-sum test, and δ_{i,j} = 0 otherwise. The performance score of A_i on the instance is P(A_i) = Σ_{j≠i} δ_{i,j}, i.e., the number of algorithms that outperform A_i. The APS value of A_i is the average of its P(A_i) values over all problem instances. A small APS value indicates that A_i performs well among the n algorithms.
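The score computation can be sketched as follows, taking the pairwise "significantly outperforms" relation as a precomputed boolean matrix (the statistical test itself is left out of this sketch).

```python
def performance_scores(outperforms):
    """outperforms[j][i] is True if algorithm j significantly
    outperforms algorithm i on this problem instance.
    Returns P(A_i) = number of algorithms beating A_i, for each i."""
    n = len(outperforms)
    return [sum(1 for j in range(n) if j != i and outperforms[j][i])
            for i in range(n)]

def average_performance_score(per_instance_scores):
    """APS: average each algorithm's score over all problem instances."""
    n_inst = len(per_instance_scores)
    n_alg = len(per_instance_scores[0])
    return [sum(scores[i] for scores in per_instance_scores) / n_inst
            for i in range(n_alg)]
```

A score of 0 on an instance means no algorithm significantly beat that one, so lower APS is better, as the text states.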

### IV-D EMMAs and EMOAs

All algorithms were run with the same maximum number of function evaluations, and the same number of independent runs was performed for each test problem. A set of weight/reference vectors was generated using the simplex-lattice design method for small numbers of objectives and its two-layered version [11] for larger numbers of objectives; the number of weight/reference vectors depends on the number of objectives. The SBX crossover and the polynomial mutation were used in all methods, including MOEA/D-AGR; their control parameters (the crossover and mutation probabilities and distribution indices) were set to commonly used values. According to the analysis presented in [45], the parameter L of the neighborhood criterion in ADA was set to a fixed value for all problems. Other parameters were set according to the corresponding references.

## V Experimental results

This section presents a performance analysis of the six ADA-based EMMAs. Subsection V-A describes the effect of ADA on the six EMOAs. Subsection V-B compares the six ADA-based EMMAs to state-of-the-art EMMAs. Subsection V-C discusses the adaptive population sizing in ADA. Subsection V-D investigates the influence of the L value on the performance of the six ADA-based EMMAs. Subsection V-E examines the performance of the six ADA-based EMMAs on test problems with distance-related variables. Subsection V-F presents a comparison with the state-of-the-art EMMAs using an unbounded external archive. Subsection V-G discusses the runtime of the ADA-based algorithms.

### V-A Effect of ADA on the six EMOAs

Tables II (a) and (b) show the paired comparison of each EMOA and its ADA version on the 15 test problem instances in terms of IGD+ and IGDX, respectively. Table II shows only the APS value of each algorithm at the final iteration; Table S.1 in the supplementary file shows detailed results. As described in Subsection III-H, only the solutions obtained by the two post-processing methods were used for the IGD+ and IGDX calculations. Thus, all algorithms are compared under the same number of solutions. The Wilcoxon rank-sum test was used to check statistical significance. M-Polygon denotes the M-objective Polygon problem, and D-Omni-test denotes the D-variable Omni-test problem. Below, we describe the results for IGD+ and IGDX.

#### IGD+

Table II (a) shows that the original EMOAs outperform their ADA-based EMMAs regarding IGD+ (except for RVEA-ADA). In general, EMMAs perform slightly worse than EMOAs for multi-objective optimization [41, 54]. While EMOAs aim to find a good approximation of the Pareto front, EMMAs attempt to locate all Pareto optimal solutions. For this reason, the ADA versions perform worse than their original EMOAs regarding IGD+.

However, as shown in Table S.1 in the supplementary file, the IGD+ value of the ADA versions is worse than that of their original EMOAs by only a small factor even in the worst case. Also, the ADA versions perform better than their original EMOAs on some problems. For example, eMOEA/D-ADA obtains better IGD+ values than eMOEA/D on SYM-PART2, SSUF3, and 10-Polygon. As discussed in [48], a mechanism for maintaining the solution space diversity can help the ADA versions to find high-quality solutions.

#### IGDX

Table II (b) shows that the ADA-based EMMAs significantly outperform their original EMOAs. As shown in Table S.1, the IGDX value of the ADA versions is better than that of their original EMOAs by a large factor in the best case. These results indicate that ADA improves the performance of the six decomposition-based EMOAs for MMOPs.

Table S.1 shows that the ADA-based EMMAs work well on Polygon with many objectives and on Omni-test with many decision variables. Since the number of equivalent Pareto optimal solution sets increases exponentially with the number of decision variables in Omni-test, the results show that ADA can also handle problems with a large number of equivalent sets. In summary, ADA scales sufficiently well with the number of objectives, the number of decision variables, and the number of equivalent Pareto optimal solution sets.

#### Distribution of solutions

Figs. 21 and 21 show the distribution of non-dominated solutions found by eMOEA/D-ADA, NSGA-III-ADA, RVEA-ADA, and their original versions on Polygon in the objective and solution spaces, respectively. These figures show the results of a single run with median IGD+ and IGDX values, respectively. For the ADA-based algorithms, the non-dominated solutions obtained by the two post-processing methods are shown in Figs. 21 and 21, respectively. Figs. S.1 and S.2 in the supplementary file show the results of the other ADA-based algorithms; those results are similar to Figs. 21 and 21. It was shown in [11] that some decomposition-based EMOAs do not work well on problems with convex Pareto fronts. It was also shown in [21] that some decomposition-based EMOAs perform poorly on problems with inverted-triangular Pareto fronts. Since Polygon has a convex, inverted-triangular Pareto front, it is difficult for decomposition-based EMOAs to find good solutions on it.

For the above-mentioned reasons, no method in Fig. 21 can approximate the Pareto front of Polygon well. Nevertheless, some ADA-based algorithms (e.g., eMOEA/D-ADA and RVEA-ADA) find better distributed objective vectors than their original EMOAs. This unintended effect of ADA may be caused by the diversity of the population in the solution space [48]. Although we do not claim that ADA can improve the performance of EMOAs for MOPs in addition to MMOPs, the solution space diversity maintenance might help the original EMOAs to handle problems with irregular Pareto fronts.

In the Polygon problem, the equivalent Pareto optimal solution sets are inside the nine regular triangles in the solution space. Figs. 21 (a)–(c) show that the original EMOAs cannot locate all equivalent Pareto optimal solution sets well. In contrast, Figs. 21 (d)–(f) demonstrate that their ADA versions can locate all nine equivalent Pareto optimal solution sets. Results on the other test problems are similar to Fig. 21.

### V-B Comparison with other EMMAs

In Subsection V-A, we demonstrated that ADA can improve the performance of the six EMOAs for MMOPs. Here, we compare the six ADA-based algorithms to the following three EMMAs: Omni-optimizer [13], MO_Ring_PSO_SCD [54], and TriMOEA-TA&R [30]. Omni-optimizer is the most representative EMMA. MO_Ring_PSO_SCD and TriMOEA-TA&R are recently proposed methods. Default parameter settings were used for the three methods.

Table III shows the APS values of the nine EMMAs on the 15 test problem instances; Table S.2 in the supplementary file shows detailed results. TriMOEA-TA&R incorrectly recognizes that some problems (e.g., SYM-PART1) have distance-related variables. In such a case, TriMOEA-TA&R generates a set of additional solutions at the end of the search by recombining solutions in the diversity archive with the distance-related variables. Since the resulting solution set can exceed the number of solutions used for the indicator calculations, a method of selecting a fixed number of solutions from it is needed. We use the two post-processing methods in ADA: as in the ADA-based algorithms, the two selected solution sets are used for the IGD+ and IGDX calculations, respectively. We use the selection method of NSGA-III-ADA to obtain the objective-space solution set from the final solutions of TriMOEA-TA&R.

Table III (a) shows that Omni-optimizer performs the best regarding IGD+. The good performance of Omni-optimizer for MOPs is consistent with the results presented in [54, 45]. NSGA-III-ADA shows the best IGD+ performance among the six ADA-based algorithms. Table III (b) shows that the six ADA-based algorithms perform better than the three other algorithms regarding IGDX. TriMOEA-TA&R performs the worst regarding IGDX, because the 15 problem instances have no distance-related variables. In summary, the results indicate that the six ADA-based algorithms outperform the state-of-the-art algorithms for MMOPs.

### V-C Adaptive population sizing in ADA

In ADA, the number of individuals assigned to each subproblem is adaptively adjusted by the deletion and addition operations. Thus, the population size is not constant. Here, we discuss the adaptive population sizing in ADA.

Fig. 23 shows the change of the population size of NSGA-III-ADA on Omni-test with various numbers of decision variables D. Results of the other ADA versions are similar to Fig. 23. The population size initially grows as D increases from its smallest value, but it decreases as D increases further. This may be because problems with a large D value are generally difficult for EMMAs. Since the search does not proceed well for large D, the population size does not increase significantly. The difficulty of finding multiple equivalent Pareto optimal solution sets for large D values has been reported in [19].

In addition, the trajectory of the population size is problem-dependent. Fig. S.3 in the supplementary file shows its change on other problems; the final population size on Two-On-One and 3-Polygon clearly differs from that on the other problems. Fig. S.3 also shows that the population size sharply increases in early iterations and becomes stable after some evaluations.

Fig. 23 shows the cumulative numbers of activations of the addition operation, based on each of its two criteria in Algorithm 1, and of the deletion operation in NSGA-III-ADA on 3-Polygon. On the one hand, the number of activations of the addition operation based on the first criterion is almost the same as that of the deletion operation throughout the evolution in Fig. 23. This competitive behavior of the two operations leads to the stable population size in Fig. S.3. On the other hand, the addition operation based on the second criterion is frequently performed only in an early stage of evolution in Fig. 23. Thus, the sharp increase of the population size in Fig. S.3 is due to the addition operation based on the second criterion. Since only one individual is assigned to each subproblem at the beginning of the search, this addition operation is often activated early on.

In summary, the population size is adaptively adjusted by the cooperative behavior of the deletion and addition operations in ADA. In the addition operation, the roles of the two criteria differ from each other. Thus, it is important for ADA to use both criteria.

### V-D Impact of L on the six ADA-based algorithms

We investigate the influence of the L value on the performance of the ADA-based algorithms. Table IV shows the APS values of the six ADA-based algorithms with six L values on the 15 test problem instances. Tables S.3–S.8 in the supplementary file show detailed results.

Table IV (a) shows that the six ADA-based algorithms with the smallest L value have the worst performance regarding IGD+; too small L values degrade the performance of the ADA-based algorithms regarding IGD+. Table IV (b) shows that a mid-range L value is most suitable for all ADA-based algorithms (except for RVEA-ADA) in terms of IGDX; too large L values degrade the performance regarding IGDX. In summary, mid-range L values are suitable for most ADA-based algorithms for MMOPs.

We also investigate the influence of L on the evolution of the population size. Figs. S.4 and S.5 in the supplementary file show the change of the population size of NSGA-III-ADA with various L values on Omni-test with various numbers of decision variables and on other problems, respectively. Results of the other ADA versions are similar to those of NSGA-III-ADA. Figs. S.4 and S.5 indicate that the population size decreases as the L value increases. The larger the L value is, the more individuals can be neighbors of the child. As a result, the niching mechanism in ADA deteriorates. These observations are consistent with the above-mentioned IGDX results.