
Uncrowded Hypervolume Improvement: COMO-CMA-ES and the Sofomore framework

Cheikh Touré Inria, CMAP, Ecole Polytechnique, IP Paris cheikh.toure@polytechnique.edu Nikolaus Hansen Inria, CMAP, Ecole Polytechnique, IP Paris firstname.lastname@inria.fr Anne Auger Inria, CMAP, Ecole Polytechnique, IP Paris firstname.lastname@inria.fr  and  Dimo Brockhoff Inria, CMAP, Ecole Polytechnique, IP Paris firstname.lastname@inria.fr
Abstract.

We present a framework to build a multiobjective algorithm from single-objective ones. This framework addresses the p·n-dimensional problem of finding p solutions in an n-dimensional search space, maximizing an indicator by dynamic subspace optimization. Each single-objective algorithm optimizes the indicator function given p − 1 fixed solutions. Crucially, dominated solutions minimize their distance to the empirical Pareto front defined by these p − 1 solutions. We instantiate the framework with CMA-ES as single-objective optimizer. The new algorithm, COMO-CMA-ES, is empirically shown to converge linearly on bi-objective convex-quadratic problems and is compared to MO-CMA-ES, NSGA-II and SMS-EMOA.

Multiobjective optimization, single-objective optimization, hypervolume, quality indicator, hypervolume contribution, hypervolume improvement
journalyear: 2019; copyright: acmlicensed; conference: Genetic and Evolutionary Computation Conference, July 13–17, 2019, Prague, Czech Republic; booktitle: Genetic and Evolutionary Computation Conference (GECCO ’19), July 13–17, 2019, Prague, Czech Republic; price: 15.00; doi: 10.1145/3321707.3321852; isbn: 978-1-4503-6111-8/19/07; ccs: Mathematics of computing, Nonconvex optimization

1. Introduction

Multiobjective optimization problems must be solved frequently in practice. In contrast to the optimization of a single objective, solving a multiobjective problem involves handling trade-offs or incomparabilities between the objective functions such that the aim is to approximate the Pareto set—the set of all Pareto-optimal or non-dominated solutions. One might be interested in obtaining an approximation of unbounded size (the more points the better) or just in having p points approximating the Pareto set. Evolutionary Multiobjective Optimization (EMO) algorithms aim at such an approximation in a single algorithm run whereas more classical approaches, e.g. optimizing a weighted sum of the objectives with changing weights, operate in multiple runs.

The first introduced EMO algorithms simply changed the selection of an existing single-objective evolutionary algorithm while keeping the exact same search operators. The population at a given iteration then provided an approximation of the Pareto set. This idea led to the practically highly successful NSGA-II algorithm (Deb et al., 2002) that employs a two-step fitness assignment: after a first non-dominated ranking (Goldberg, 1989), solutions with equal non-domination rank are further distinguished by their crowding distance—based on the distance of each solution to its neighbors in objective space. However, it has been pointed out that NSGA-II does not converge to the Pareto set in a mathematical sense due to so-called deteriorative cycles: if all population members of the algorithm are non-dominated at some point in time, it is only the crowding distance that is optimized, without indicating to the algorithm any search direction towards the Pareto set. As a result, solutions which had been non-dominated at some point in time can be replaced by previously dominated ones during the optimization, ending up in cyclic rather than convergent behavior (Berghammer et al., 2010).

To improve the convergence properties of EMO algorithms, different approaches have been introduced later, most notably the indicator-based algorithms and especially algorithms based on the hypervolume indicator. They replace the crowding distance of NSGA-II with the (hypervolume) indicator contribution, see e.g. (Igel et al., 2007; Beume et al., 2007). Using the hypervolume indicator has the advantage that it is the only known strictly monotone quality indicator (Knowles et al., 2006) (see also next section) and thus, its optimization will result in solution sets that are subsets of the Pareto set.

The optimization goal of indicator-based algorithms such as SMS-EMOA (Beume et al., 2007) or MO-CMA-ES (Igel et al., 2007) is to find the best set of solutions with respect to a given quality indicator (the set with the largest quality indicator value among all sets of size p). This optimal set of solutions is known as the optimal p-distribution (Auger et al., 2009). In principle, the search for the optimal p-distribution can be formalized as an n·p-dimensional optimization problem, where p is the number of solutions and n is the dimension of the search space.

As we will discuss later, it turns out that this optimization problem is not only of too high dimension in practice but also flat in large regions of the search space if the hypervolume indicator is the underlying quality indicator. The combination of non-dominated ranking and hypervolume contribution as in SMS-EMOA or MO-CMA-ES corrects for this flatness, but also introduces search directions that point towards already existing non-dominated solutions and not towards not-yet-covered regions of the Pareto set. In this paper, we show that we can correct the flat region of the hypervolume indicator by introducing a search bias towards yet-uncovered regions of the Pareto set by adding the distance to the empirical non-domination front, which leads to the new notion of Uncrowded Hypervolume Improvement. Then, we define a (dynamic) fitness function that can be optimized by single-objective algorithms. From there, going back to the original idea of EMO algorithms to use single-objective optimizers to build an EMO, we define the Single-objective Optimization FOr Optimizing Multiobjective Optimization pRoblEms framework (Sofomore) to build, in an elegant manner, a multiobjective algorithm from a set of single-objective optimizers. Each single-objective algorithm optimizes (iteratively or in parallel) a dynamic fitness that depends on the output of the other optimizers.

We instantiate the Sofomore framework with the state-of-the-art single-objective algorithm CMA-ES. We show experimentally that the resulting COMO-CMA-ES (Comma-Selection Multiobjective CMA-ES) exhibits linear convergence towards the optimal p-distribution on a wide variety of bi-objective convex quadratic functions. In contrast, default implementations of SMS-EMOA (where the reference point is fixed) and NSGA-II do not exhibit this linear convergence. The comparison between COMO-CMA-ES and a previous MATLAB implementation of the elitist MO-CMA-ES also shows the same or an improved convergence speed for COMO-CMA-ES, except for the double sphere function.

The paper is structured as follows. In the next section, we start with preliminaries related to multiobjective optimization and quality indicators. Section 3 discusses the fitness landscape of indicator- and especially hypervolume-based quality measures and eventually introduces our Sofomore framework. Section 4 gives details about the new COMO-CMA-ES algorithm as an instantiation of Sofomore with CMA-ES. Section 5 experimentally validates the new algorithm and compares it with three existing algorithms from the literature and Section 6 discusses the results and concludes the paper.

2. Preliminaries

In the following, we assume without loss of generality the minimization of a vector-valued function f = (f_1, …, f_m) that maps a search point x from the search space ℝ^n of dimension n to the objective space ℝ^m. This minimization of f is generally formalized in terms of the weak Pareto dominance relation, for which we write that a search point x weakly Pareto-dominates another search point y (written in short as x ⪯ y or, with an abuse of notation, as f(x) ⪯ f(y)) if and only if f_i(x) ≤ f_i(y) for all 1 ≤ i ≤ m. Note also that we can naturally extend the (weak) Pareto dominance relation to subsets as A ⪯ B if and only if for all y ∈ B, there exists x ∈ A such that x ⪯ y. If the relation is strict for at least one objective function, we say that x Pareto-dominates y (and write x ≺ y). The set of non-dominated search points constitutes the so-called Pareto set, its image under f is called the Pareto front. In the remainder, we will also use the term empirical non-dominated front or empirical Pareto front P(S, r) of a set S ⊂ ℝ^n for the objective vectors that are on the boundary of the (objective space) region dominating a reference point r, and not dominated by any element of f(S):

(1)    P(S, r) = { u ∈ ℝ^m : u ⪯ r and ∄ y ∈ S such that f(y) ≺ u } ∩ ∂N(S, r)

where ∂N(S, r) is the boundary of the non-dominated region N(S, r). Note that P(S, r) is the Pareto front when S contains the Pareto set.
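For illustration, a minimal Python sketch of the weak and strict Pareto dominance relations used above (minimization is assumed; the function names are ours, not the paper's):

import numpy as np

def weakly_dominates(u, v):
    """u weakly Pareto-dominates v (minimization): u_i <= v_i for all objectives i."""
    u, v = np.asarray(u), np.asarray(v)
    return bool(np.all(u <= v))

def dominates(u, v):
    """u Pareto-dominates v: weak dominance that is strict in at least one objective."""
    u, v = np.asarray(u), np.asarray(v)
    return bool(np.all(u <= v) and np.any(u < v))

# example: (1, 2) dominates (1, 3), but (1, 2) and (0.5, 3) are incomparable
assert dominates([1, 2], [1, 3])
assert not dominates([1, 2], [0.5, 3]) and not dominates([0.5, 3], [1, 2])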

Indicator-Based Set Optimization Problems

Pareto sets and Pareto fronts are, under mild assumptions, (m − 1)-dimensional manifolds. In practice, we are often interested in a finite-size approximation of these sets with, let us say, p search points. To assess the quality of a Pareto set approximation S, a quality indicator I assigns a real-valued quality I(S) to S. Formally speaking, this transforms the original multiobjective optimization of f into the single-objective set problem of finding the so-called optimal p-distribution (Auger et al., 2012)

(2)    S*_p ∈ argmax { I(S) : S ⊂ ℝ^n, |S| ≤ p }

as the set of search points of cardinality p (or lower) with the highest indicator value among all sets of this size (Auger et al., 2009).

Natural candidates for practically relevant quality indicators are monotone or even strictly monotone indicators such as the epsilon-indicator (Zitzler and Künzli, 2004), the R2 indicator (Hansen and Jaszkiewicz, 1998), or the hypervolume indicator ((Zitzler and Thiele, 1998a; Auger et al., 2009), still the only known strictly monotone indicator family to date). We recall that an indicator I is called monotone if A ⪯ B implies I(A) ≥ I(B)—or in other words, if it does not contradict the weak Pareto dominance relation. If A ≺ B implies I(A) > I(B), we say that I is strictly monotone.

Hypervolume, Hypervolume Contribution, and Hypervolume Improvement

Because the hypervolume indicator (Zitzler and Thiele, 1998a; Auger et al., 2009) and its weighted variant are the only known strictly monotone indicators, we will later on use it as well in our framework. The hypervolume HV_r(S) (Zitzler and Thiele, 1998b) of a finite set S of solutions with respect to the reference point r is defined as HV_r(S) = λ({u ∈ ℝ^m : ∃ x ∈ S such that f(x) ⪯ u ⪯ r}), where λ is the Lebesgue measure on the objective space and f is the objective function. In the case of two objective functions, the hypervolume indicator value of p non-dominated solutions x_1, …, x_p with f_1(x_1) < … < f_1(x_p) can also be written as a sum of areas of axis-parallel rectangles: HV_r({x_1, …, x_p}) = Σ_{i=1}^{p} (f_1(x_{i+1}) − f_1(x_i)) · (r_2 − f_2(x_i)), with the convention f_1(x_{p+1}) := r_1.

Furthermore, the hypervolume contribution of a search point x ∈ S to a solution set S with respect to the reference point r is the hypervolume indicator value that we lose when we remove x from the set (Bringmann and Friedrich, 2011): HVC_r(x, S) = HV_r(S) − HV_r(S \ {x}).

Also, the Hypervolume Improvement of a search point x with respect to a finite set S ⊂ ℝ^n and the reference point r is defined as (Emmerich et al., 2005; Yang et al., 2019): HVI_r(x, S) = HV_r(S ∪ {x}) − HV_r(S). In other words, HVI_r(x, S) equals the increase in hypervolume when x is added to the set S. Up to a null set, this is the Lebesgue measure of the set of objective vectors that are dominated by f(x) and dominate r but are not dominated by any f(y) with y ∈ S.
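As an illustration of these definitions, here is a small Python sketch for the bi-objective case (minimization), computing the hypervolume as a sum of axis-parallel rectangles and deriving contribution and improvement from it; the helper names are ours, not from the paper:

import numpy as np

def hv2d(points, ref):
    """Hypervolume of bi-objective vectors w.r.t. reference point ref (minimization)."""
    hv, f2_best = 0.0, ref[1]
    for f1, f2 in sorted(map(tuple, points)):        # sweep with increasing f1
        if f1 < ref[0] and f2 < f2_best:             # point adds a new rectangle
            hv += (ref[0] - f1) * (f2_best - f2)
            f2_best = f2
    return hv

def hv_contribution(x_obj, objs, ref):
    """Hypervolume lost when removing x_obj from the set objs (x_obj is in objs)."""
    rest = [o for o in objs if tuple(o) != tuple(x_obj)]
    return hv2d(objs, ref) - hv2d(rest, ref)

def hv_improvement(x_obj, objs, ref):
    """Hypervolume gained when adding x_obj to the set objs."""
    return hv2d(list(objs) + [x_obj], ref) - hv2d(objs, ref)

objs = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
print(hv2d(objs, ref=(5.0, 5.0)))                     # 11.0
print(hv_contribution((2.0, 2.0), objs, (5.0, 5.0)))  # 4.0
print(hv_improvement((1.5, 3.0), objs, (5.0, 5.0)))   # 0.5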

3. Sofomore: Building Multiobjective from Single-Objective Algorithms

Quality indicators have been introduced as a way to measure the quality of a set of objective vectors, but also to define a multiobjective optimization problem as a single-objective set problem of maximizing the quality indicator as in (2). This naturally defines a single-objective n·p-dimensional problem to be maximized

(3)    (x_1, …, x_p) ∈ ℝ^{n·p} ↦ I({x_1, …, x_p}).

Because n·p and in particular p are typically large in practice, we usually do not attempt to solve a multiobjective optimization problem by directly optimizing (3). Nevertheless, when I is the hypervolume indicator, Hernández et al. suggest to use a Newton method to directly solve (3). This assumes that the objective functions are twice continuously differentiable, in which case the gradient and Hessian of the hypervolume indicator can be computed analytically (Hernandez et al., 2018). Yet, directly attacking (3) is not possible in general because dominated points have a zero (sub-)gradient and the Newton direction is therefore zero. Thus, Hernández et al. need to start from a set of non-dominated points, close enough to the Pareto set, which in practice requires coupling the approach with another algorithm (Hernandez et al., 2018).

Instead of directly optimizing (3), our proposed Sofomore framework performs iterative subspace optimization of the function in (3) and penalizes the flat landscape of the indicator in dominated regions. More precisely, the basic idea behind Sofomore is to optimize (3) subspace- or component-wise, by iteratively fixing all but one search point x_i and only optimizing the indicator with respect to x_i while the other p − 1 search points are temporarily fixed. Hence we maximize the functions

(4)    x ↦ I({x_1, …, x_{i−1}, x, x_{i+1}, …, x_p}),   for i = 1, …, p.

If the placement of each of the p search points x_i (1 ≤ i ≤ p) is optimized iteratively by fixing a different point set each time, as we suggest in our Sofomore framework below, we are in the setup of optimizing a dynamic fitness. More details on this aspect of our Sofomore framework will be given below in Section 3.2.

3.1. A Fitness Function for Subspace Optimization

If we use as quality indicator in (4) a (strictly) monotone indicator like the hypervolume indicator, the overall fitness is flat in the interior of the regions where points are dominated. Hence, we suggest not to optimize (4) directly but to unflatten it in dominated areas of the search space without changing the optimization goal.

Any solution x that is dominated by the other points in the set will receive zero fitness when we use as indicator in (4) the hypervolume indicator of the entire set with respect to the reference point, or when we replace it with the hypervolume improvement of the solution with respect to the other points. This situation is depicted in the first column of Figure 1 where, for a fixed set of six arbitrarily chosen search points, the hypervolume improvement’s level sets (of equal fitness) in both search and objective space are shown. This flat fitness with zero gradient does not allow to steer the search towards better search points, which has also been highlighted by Hernández et al. (Hernandez et al., 2018).

A common approach to guide an optimization algorithm in the dominated space is to use the hypervolume (contribution) as secondary fitness after non-dominated sorting (Goldberg, 1989), as it is done for example in the SMS-EMOA (Beume et al., 2007) or the MO-CMA-ES (Igel et al., 2007). The idea is that all search points with a worse non-domination rank get assigned a fitness that is worse than for search points with a better non-domination rank. Within a set of the same rank, the hypervolume contribution with respect to all points with the same rank is used to refine the fitness. The middle column of Figure 1 shows the resulting level sets of equal fitness. As we can see, this fitness assignment distinguishes between dominated solutions, i.e. the fitness is not flat anymore. Yet it still has another major disadvantage: the search direction in the dominated area (perpendicular to its level sets) points towards already existing non-dominated solutions. Attracting dominated solutions towards non-dominated solutions seems however undesirable, as they will compete for the same hypervolume area. Instead, we want dominated points to enter the uncrowded space between non-dominated points thereby complementing their hypervolume contribution (improvement).

Uncrowded Hypervolume Improvement

For this purpose, we define the Uncrowded Hypervolume Improvement UHVI based on the Hypervolume Improvement for non-dominated search points and on the Euclidean distance to the non-dominated region for dominated search points. More concretely, the UHVI of a search point x with respect to a finite set S ⊂ ℝ^n and the reference point r is defined as

(5)    UHVI_r(x, S) = HVI_r(x, S),                  if f(x) is not dominated by any f(y) with y ∈ S,
       UHVI_r(x, S) = − d(f(x), P(S, r)),           otherwise,

where d(f(x), P(S, r)) is the Euclidean distance between the objective vector f(x) and the empirical non-domination front P(S, r) of the set S defined as in (1).

We define the fitness for a search point x ∈ ℝ^n with respect to the other solutions {x_1, …, x_p} \ {x_i} as

(6)    F_i(x) = UHVI_r(x, {x_1, …, x_p} \ {x_i}).

Note that this fitness is continuous on the empirical non-domination front, where both the hypervolume improvement and the considered distance are zero.
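The following Python sketch implements the bi-objective UHVI along these lines, reusing hv2d from the sketch above. As a simplification, the distance is computed to the staircase spanned by the non-dominated vectors of the set and the clipping of the front at the reference point is ignored; all helper names are ours:

import numpy as np

def _seg_dist(u, a, b):
    """Euclidean distance from point u to the segment [a, b]."""
    u, a, b = map(np.asarray, (u, a, b))
    ab = b - a
    t = np.clip(np.dot(u - a, ab) / max(np.dot(ab, ab), 1e-300), 0.0, 1.0)
    return float(np.linalg.norm(u - (a + t * ab)))

def dist_to_empirical_front(u, objs, big=1e9):
    """Distance from objective vector u to the 2-D staircase of the
    non-dominated vectors in objs (reference-point clipping omitted)."""
    front = []
    for y in sorted(map(tuple, objs)):           # increasing f1
        if not front or y[1] < front[-1][1]:     # keep only the non-dominated ones
            front.append(y)
    verts = [(front[0][0], big)]                 # vertical ray above the first point
    for i, (a1, a2) in enumerate(front):
        nxt = front[i + 1][0] if i + 1 < len(front) else big
        verts += [(a1, a2), (nxt, a2)]           # corner plus horizontal piece
    return min(_seg_dist(u, verts[i], verts[i + 1]) for i in range(len(verts) - 1))

def uhvi(fx, objs, ref):
    """Uncrowded hypervolume improvement of objective vector fx w.r.t. the
    objective vectors objs, as in Eq. (5): hypervolume improvement if fx is
    non-dominated, minus the distance to the empirical front otherwise."""
    dominated = any(all(y[k] <= fx[k] for k in (0, 1)) and tuple(y) != tuple(fx)
                    for y in objs)
    if dominated:
        return -dist_to_empirical_front(fx, objs)
    return hv2d(list(objs) + [fx], ref) - hv2d(objs, ref)   # hv2d from the sketch above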

Figure 1. Comparison of block-coordinate-wise fitness functions on the double sphere problem. Given a fixed set of six solutions, the first row shows level sets of equal fitness for a new search point in the search space, while the second row shows the same level sets in objective space. Left: standard hypervolume improvement HVI. Middle: HVI within the local non-dominated fronts. Right: newly proposed hypervolume improvement (if non-dominated) together with the distance to the non-dominated front (if dominated), denoted as uncrowded hypervolume improvement UHVI. Note that the colors for the fitness levels are not comparable over indicators, but are the same for a given indicator in both search and objective space. The search space plots further show the single objectives’ level sets as dotted lines. The black dot indicates the reference point.

Figure 2 illustrates this fitness for one non-dominated and two dominated search points (blue pluses) with respect to a set of six other search points (black crosses). The right-hand column of Figure 1 shows the level sets of this fitness. The newly introduced hypervolume-improvement- and distance-based fitness shows smooth level sets, both in search and in objective space. Perhaps most importantly, in the dominated area, the fitness function’s descent direction (perpendicular to its level sets) now points towards the gaps in the current Pareto front approximation.

Figure 2. Illustration of the proposed UHVI fitness for three points in objective space (blue plus signs). The top two points are dominated by the given set of five points (black crosses) and are thus assigned as fitness their distance to the empirical non-dominated front, while the bottom point is non-dominated and thus gets assigned the hypervolume improvement with respect to the set.

3.2. Iteratively Optimizing the Fitness: The Sofomore Framework

After having discussed a fitness assignment that looks worth optimizing, we come back to our initial idea of subspace optimization and define the underlying algorithmic framework behind Sofomore.

At first, we consider a single-objective optimizer in an abstract manner as an iterative algorithm with state s_t updated as s_{t+1} = T(s_t, h, ω_t), where h is the single-objective function optimized by the optimizer and ω_t encodes the possible random variables sampled within one iteration if we consider a randomized algorithm (and can be taken as constant in the case of a deterministic optimizer). The transition function T contains all updates done within the algorithm in one iteration.

We assume that in each iteration t, the optimizer returns a best estimate of the optimum, often called incumbent solution or recommendation. This is the solution that the optimizer would return if we stopped it at iteration t. We denote this incumbent as x̂(s_t)—a mapping from the state of the algorithm to the estimate of the optimum given this state.

The overall idea behind the subspace optimization and the Sofomore framework can then be formalized as in Algorithm 1: after initializing p single-objective algorithms with their states s^1, …, s^p and denoting their transition functions as T^1, …, T^p, we consider their incumbents or recommendations x_i = x̂(s^i) (1 ≤ i ≤ p) as the p search points that are expected to approximate the optimal p-distribution.

In each step of the Sofomore framework, we choose one of the p algorithms (denoted by its number i, with 1 ≤ i ≤ p) and run it for k ≥ 1 iterations on the fitness F_i of (6) to update the recommendation x_i while keeping all other recommendations fixed. It is important to note that the fitness used for algorithm i is actually changing dynamically with the optimization because it depends on all the other incumbents but x_i which, over time, are expected to move towards the Pareto set as well.

Algorithm 1 proposes a generic framework where the order in which the single-objective algorithms are run and the number of iterations for them are not explicitly defined. A simple strategy would be to choose the algorithms at random or in a given, fixed order and run each single-objective algorithm a fixed number of time steps. But also more elaborate strategies can be envisioned, for example based on the idea of multi-armed bandits (Bubeck and Cesa-Bianchi, 2012): we can log the changes in the fitness value of each incumbent over time and favor as the next chosen algorithms the ones that give the highest expected fitness improvements. Note also that the single-objective algorithms’ types may be different such that we can combine local with global algorithms or even change the algorithms over time, allow restarts etc. In the following experimental validation of our concept, however, we choose a single optimization algorithm and a simple, random strategy to choose which of them to run next.

1: Given: the initial states s^1, …, s^p of p single-objective optimizers
2: Initialize incumbents: x_i ← x̂(s^i) for i = 1, …, p
3: while not stopping criterion met do
4:     Choose i in {1, …, p} and k ≥ 1
5:     REPEAT k times:
6:       run the i-th algorithm one iteration on the fitness F_i of (6)
7:       update x_i ← x̂(s^i) in {x_1, …, x_p}
8: end while
9: return {x_1, …, x_p}
Algorithm 1 General Sofomore Framework, with fitness F_i of (6)

With a simple change, Algorithm 1 can be made parallelizable (resulting in slightly different search dynamics though): postponing the updates of the incumbents x_i until every algorithm has been touched at least once makes the optimization of the p fitness functions independent such that they can be performed in parallel.
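As an illustration of Algorithm 1, here is a minimal Python sketch of the Sofomore loop with p abstract optimizers; the optimizer interface (an incumbent attribute and a step method that runs one iteration maximizing the given fitness) is our own simplification, not the paper's, and uhvi refers to the sketch above:

import random

def sofomore(optimizers, f, ref, steps, k=1):
    """Generic Sofomore loop: `optimizers` is a list of p single-objective
    optimizers, each exposing .incumbent (current best estimate, a search
    point) and .step(fitness), which runs one iteration on the fitness to be
    maximized and updates the incumbent."""
    incumbents = [opt.incumbent for opt in optimizers]
    for _ in range(steps):
        i = random.randrange(len(optimizers))              # choose one algorithm
        others = [f(x) for j, x in enumerate(incumbents) if j != i]
        fitness = lambda x, o=others: uhvi(f(x), o, ref)   # dynamic fitness, Eq. (6)
        for _ in range(k):                                 # run it for k iterations
            optimizers[i].step(fitness)
        incumbents[i] = optimizers[i].incumbent            # updated recommendation
    return incumbents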

Relation of Sofomore with other existing algorithms

We briefly discuss how some existing algorithms and algorithm frameworks relate to the new Sofomore proposal.

The coupling of single-objective algorithms to form a multiobjective one has been done before, especially in the MOEA/D framework (Zhang and Li, 2007). In MOEA/D, static search directions (in objective space) are defined via (single-objective) scalarizing functions. Each of them is optimized in parallel with solutions potentially shared between neighboring search directions. On the contrary, the fitness in Sofomore is dynamic, depending on the other incumbents. Classical approaches to multiobjective optimization that optimize a set of scalarizing functions also have static optimization problems to solve, without any interaction between them (Miettinen, 1999).

Many other EMO algorithms, such as NSGA-II, SMS-EMOA, or MO-CMA-ES are not covered by the Sofomore framework. One simple reason is that the UHVI is newly defined.

The already mentioned Newton algorithm on the hypervolume indicator fitness of (Hernandez et al., 2018) is probably the closest existing approach to Sofomore, but (Hernandez et al., 2018) needs to initialize the Newton algorithm with a set of non-dominated solutions in order for the algorithm to make progress, due to the flat regions of its fitness. Also algorithms for expensive multiobjective optimization based on the optimization of the expected hypervolume improvement (Wagner et al., 2010) can be seen as related to Sofomore, although the proposal of new solutions in algorithms like SMS-EGO (Ponweiser et al., 2008) or S-metric based ExI (Emmerich and Klinkenberg, 2008) uses Gaussian processes to model the objective functions. These algorithms, in contrast to Sofomore, iteratively propose a single solution based on the expected hypervolume improvement over all known solutions and do not aim at successively replacing a single recommendation by another (better) one. Interesting to note is that algorithms like SMS-EGO and S-metric based ExI employ the expected hypervolume indicator improvement as fitness while the approach of Keane (Keane, 2006) “uses the Euclidean distance to the nearest vector in the Pareto front” (Wagner et al., 2010).

4. Como-Cma-Es

In this section, we instantiate Sofomore with the CMA-ES as single objective optimizer.

Regarding the choice of which optimization algorithm to run (and for how long), we opt for a simple strategy: we sample a permutation uniformly at random from the set of all permutations of {1, …, p} and use this fixed permutation to touch each algorithm once in the order of the permutation. Once all algorithms have been touched, we resample a new permutation. We run each algorithm for a single iteration. Letting the algorithms run for too long a period right from the start seems suboptimal: as the fitness is dynamic, we do not need to optimize it too precisely. We mainly have two requirements for the choice of single-objective optimizers: (i) an optimization algorithm has to be stoppable at any iteration and resumable thereafter and (ii) an optimization algorithm needs to be able to give a good recommendation of the best estimate of the optimum, given its current state. The Covariance Matrix Adaptation Evolution Strategy (CMA-ES, (Hansen and Ostermeier, 2001)) is a natural choice. Not only is it a state-of-the-art algorithm for difficult blackbox optimization problems, but it also fulfills our requirements. In CMA-ES, the state of the algorithm is composed of a step-size σ and the parameters of a multivariate normal distribution, namely a mean vector m representing the favorite solution and a covariance matrix C. In addition, two n-dimensional evolution paths speed up step-size and covariance matrix adaptation. For each kernel, the incumbent solution is the mean m of the CMA-ES instance. A convenient implementation of CMA-ES is via the ask-and-tell interface (Collette et al., 2010), where the ask function returns candidate solutions and the tell function updates the state from their fitness values. The interface allows to easily stop and resume the optimization and to integrate the dynamic fitness of Sofomore, see Algorithm 2. We call this instantiation of the Sofomore framework COMO-CMA-ES. The CMA-ES instances are called kernels.

1: Required:
2:   objective function f in dimension n
3:   lower and upper bounds for each variable,
4:     x_lb and x_ub, of a region of interest
5:   number p of desired solutions
6:   global initial step-size σ_0 for all CMA-ES
7:   fixed reference point r for the hypervolume indicator
8: Initialization:
9:   sample x_i uniformly at random in [x_lb, x_ub] for all 1 ≤ i ≤ p
10: evaluate all x_i on f and store the f(x_i) for later use
11: initialize a CMA-ES kernel with mean x_i and step-size σ_0 for all 1 ≤ i ≤ p
12: while not stopping criterion do
13:     sample uniformly at random a permutation π from all
14:       permutations on {1, …, p}
15:     for i = 1 to p do
16:         get offspring by asking the
17:           π(i)-th CMA-ES
18:         compute the fitness UHVI_r(·, {x_1, …, x_p} \ {x_{π(i)}})
19:           for all offspring
20:         tell the π(i)-th CMA-ES the offspring with their fitness values
21:         x_{π(i)} ← mean of the π(i)-th CMA-ES
22:         update the stored objective vector f(x_{π(i)})
23:     end for
24: end while
25: Return {x_1, …, x_p}
Algorithm 2 The COMO-CMA-ES: an instance of the Sofomore framework with the CMA-ES as single-objective optimizer

We see in particular how CMA-ES is integrated into Sofomore via its ask-and-tell interface. After choosing the next kernel i, the corresponding CMA-ES instance samples candidate solutions (“ask”). It then evaluates them on the uncrowded hypervolume improvement based fitness defined in Eq. (6)—given all other kernels being fixed. After sorting the solutions with respect to their fitness, COMO-CMA-ES feeds the sampled points with their fitness values back to the CMA-ES instance (“tell”), which updates all its internal algorithm parameters. Finally, the new mean of the corresponding CMA-ES instance updates the list of the COMO-CMA-ES’s proposed solutions. Note here that CMA-ES usually does not evaluate the mean of the sample distribution, which is therefore done in line 22.
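A compact Python sketch of this ask-and-tell loop with pycma is given below, reusing the uhvi and hv2d helpers from the sketches above. The bi-sphere objective, the bounds, the number of kernels, the step-size and the reference point are illustrative choices, not the paper's exact settings; since pycma minimizes, we pass the negated UHVI to tell:

import numpy as np
import cma                                      # the pycma package

n, p, sigma0 = 4, 5, 0.2                        # dimension, number of kernels, initial step-size
ref = (1.1, 1.1)                                # reference point (illustrative)

def f(x):                                       # an illustrative bi-sphere, not the paper's exact scaling
    x = np.asarray(x)
    shift = np.ones(n) / np.sqrt(n)             # second optimum at distance 1 from the first
    return float(np.sum(x**2)), float(np.sum((x - shift)**2))

kernels = [cma.CMAEvolutionStrategy(np.random.uniform(0, 1, n), sigma0, {'verbose': -9})
           for _ in range(p)]                   # one CMA-ES kernel per desired solution
incumbents = [np.array(es.mean) for es in kernels]
objs = [f(x) for x in incumbents]               # stored objective vectors of the incumbents

for sweep in range(200):                        # outer loop of Algorithm 2
    for i in np.random.permutation(p):          # touch each kernel once in random order
        others = [objs[j] for j in range(p) if j != i]
        X = kernels[i].ask()                    # sample candidate solutions
        # pycma minimizes, so feed back the negated UHVI fitness of Eq. (6)
        kernels[i].tell(X, [-uhvi(f(x), others, ref) for x in X])
        incumbents[i] = np.array(kernels[i].mean)   # the new recommendation is the mean
        objs[i] = f(incumbents[i])              # evaluate the new mean (cf. line 22)

print(hv2d(objs, ref))                          # hypervolume of the p incumbents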

5. Experimental Validation

We present in this section numerical experiments with COMO-CMA-ES. Though, in principle, the algorithm can be defined for any number of objectives, we present results only for two objectives. We use the pycma Python package (Hansen et al., 2019) for CMA-ES as single-objective optimizer, without further parameter tuning.

5.1. Test Functions and Performance Measures

For a matrix and two vectors and , we denote

(7)

We also denote by the all-zeros vector, the all-ones vector, and the unit vector with its only nonzero value at position . Starting from a positive diagonal matrix , and two independent orthogonal matrices and , we consider the classes of bi-objective convex quadratic problems Sep-, One and Two defined as follows  (Toure et al., 2019)

  • , .

  • ,

  • , with .

If the diagonal matrix is the identity matrix, we call the problems sphere-sep- in the first case and bi-sphere in the second and third cases (the rotations are then ineffective). If for , then we denote the problems as elli-sep-, elli-one or elli-two. If , and for , then we have cigtab-sep-, cigtab-one or cigtab-two.

We fix the reference point to . The scalings above ensure that the reference point is dominated by all Pareto fronts considered, and that the Sep- and the One problems have the same Pareto front (see (Toure et al., 2019)) as the bi-sphere. Note that the expression does not depend on the dimension n.

Figure 3. Convergence of COMO-CMA-ES on sphere-sep- (first row), elli-sep- (second row), cigtab-sep- (third row) and elli-two (fourth row). The first column shows the convergence gap. The second column shows the ratio of non-dominated points among the kernels’ incumbents (red) and the quartiles of the ratios of non-dominated points among each kernel’s incumbent and its offspring (median in blue, remaining quartiles in green). The last three columns show the square roots of the eigenspectra of covariance matrices chosen uniformly at random among the kernels.

We use two performance measurements in each run of an algorithm. First, the convergence gap, defined as the difference between an offset called hv_max and the hypervolume, called hv, of the points found by the algorithm (the incumbents in the case of COMO-CMA-ES, the population for the other algorithms tested); and second, the archive gap, defined as the difference between an offset called hvarchive_max and the hypervolume, called hvarchive, of all non-dominated points found by the algorithm. The offset hv_max is set for each problem to the maximum hypervolume value of kernels found so far anytime the problem was optimized on our machines, plus a small number. For the Sep- and the One problems, we take hvarchive_max as the hypervolume of the theoretical Pareto front. For the Two class of problems, we use the analytic expression of their Pareto set (Toure et al., 2019) to sample a large number of points on the Pareto set and compute their hypervolume as hvarchive_max. Thus for the elli-two problem in dimension , we sample points.
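For concreteness, a small Python sketch of these two measures, given the offsets as parameters and reusing hv2d from the sketch above; the archive update simply keeps all mutually non-dominated objective vectors seen so far, and the function names are ours:

def update_archive(archive, new_objs):
    """Keep all mutually non-dominated objective vectors seen so far (bi-objective, minimization)."""
    merged = sorted(set(map(tuple, archive)) | set(map(tuple, new_objs)))
    return [u for u in merged
            if not any(all(v[k] <= u[k] for k in (0, 1)) and v != u for v in merged)]

def convergence_gap(objs, ref, hv_max):
    """Difference between the offset hv_max and the hypervolume hv of the current points."""
    return hv_max - hv2d(objs, ref)

def archive_gap(archive, ref, hvarchive_max):
    """Difference between the offset hvarchive_max and the hypervolume of the archive."""
    return hvarchive_max - hv2d(archive, ref)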

5.2. Linear convergence of COMO-CMA-ES

We investigate the convergence of COMO-CMA-ES for different dimensions and numbers of kernels, and display the results on the sphere-sep-, elli-sep-, cigtab-sep- and elli-two functions for and . The global initial step-size is set to and the initial lower, upper bounds (line 8 of Algorithm 2) respectively to , . In Figure 3, we observe linear convergence in the convergence gap (first column) on all test functions, starting roughly when all displayed ratios of non-dominated points reach 1 (second column). The last three columns of Figure 3 illustrate the eigenspectra of the kernels’ covariance matrices. The first two columns reveal two phases.

First, the kernels incumbents approach the non-dominated region: for sphere-sep- this takes about evaluations per kernel, for elli-sep-, cigtab-sep- and elli-two it takes about , and evaluations per kernel. Afterwards, the convergence gap converges linearly. In our settings, there are evaluations per kernel during the update of a kernel, thus for the *- functions (which have the same Pareto set and front), the linear convergence rate is about and for elli-two, it is about .

For the first function evaluations per kernel on elli-sep-, there is no point dominating the reference point, which means that the algorithm started far from the Pareto front. Looking at elli-two, we confirm that it has a different Pareto front than the three other problems.

The Uncrowded Hypervolume Improvement depends on the other kernels’ incumbents and therefore changes in each iteration. Yet, the last three columns are similar to what one would observe when optimizing a single-objective convex-quadratic function with the corresponding Hessian matrix. After a large enough number of iterations, the probability that the incumbents and their offspring are in the Pareto set becomes close to 1. Then, if the set of incumbents is a subset of the Pareto set and the candidate solution is non-dominated, Eq. (6) reduces to the plain hypervolume improvement with respect to the other incumbents. Its Hessian can be computed on smooth bi-objective problems. For our test functions, it is a linear combination of the single objectives’ Hessian matrices, up to a rank-one matrix and its transpose (the gradients are collinear on the Pareto set of bi-objective convex quadratic problems (Toure et al., 2019)). That might give a glimpse of the behaviour seen in the last three columns of Figure 3.

Figure 4. Convergence gaps (odd columns) and archive gaps (even columns) for bi-sphere, elli-one, cigtab-sep- and elli-two. Each algorithm is run times, in D or D, with or kernels. The random matrices are drawn from the same seed in all the algorithms.

5.3. Comparing COMO-CMA-ES  with MO-CMA-ES, NSGA-II and SMS-EMOA

We compare four multiobjective algorithms: COMO-CMA-ES, MO-CMA-ES (Igel et al., 2007), NSGA-II (Deb et al., 2002) and SMS-EMOA (Beume et al., 2007), by testing them on classes of bi-objective convex-quadratic problems. We draw once and for all one rotation for elli-one and two different rotations for elli-two. The Simulated Binary Crossover operator (SBX) and the polynomial mutation are used for NSGA-II (run with the evoalgos package (Wessing, 2017)) and SMS-EMOA (run with the MATLAB version by Fabian Kretzschmar and Tobias Wagner (Wagner and Trautmann, 2010)): we use a crossover probability of and a mutation probability of , and the distribution indices for the crossover and mutation operators are both equal to . We use the version of MO-CMA-ES from (Voß et al., 2010). The number of kernels for COMO-CMA-ES corresponds to the population size of the other algorithms, which we set to either or , and the dimensions considered are and . The global initial step-size of COMO-CMA-ES is set to , with initial lower, upper bounds (line 8 of Algorithm 2) set to the all-zeros and all-ones vectors. The initial population for the three other algorithms is sampled uniformly at random in the same region.

We run each multiobjective optimization times and display the convergence gap (of the population or the incumbent solutions of the kernels) and the archive gap.

In Figure 4, the values of the convergence gap reached by COMO-CMA-ES and MO-CMA-ES are several orders of magnitude lower than for the two other algorithms. On the 5-dimensional bi-sphere, COMO-CMA-ES and MO-CMA-ES appear to show linear convergence, where the latter appears to be about 30% faster than the former. On the cigtab-sep- function, COMO-CMA-ES is initially slow, but catches up after about evaluations per kernel. In all other cases, COMO-CMA-ES  shows superior performance for the convergence gap. On the 10-dimensional cigtab-sep-, COMO-CMA-ES shows a plateau between 2000 and 4000 evaluations per kernel. This kind of plateau cannot be observed in the MO-CMA-ES and the observed final convergence speed is better for COMO-CMA-ES than for MO-CMA-ES. The observed plateau is typical for the behavior of non-elitist multi-recombinative CMA-ES on the tablet function, because CSA barely reduces an initially large step-size before the tablet-shape has been adapted, which is related to the neutral subspace defect found in (Krause et al., 2017). Elitism as in the MO-CMA-ES, on the other hand, also helps to decrease an initially too large step-size.

Although COMO-CMA-ES was not designed to perform well on the archive gap, it consistently shows the best results over all experiments. Only on the cigtab-sep- in with kernels, NSGA-II reaches and slightly surpasses the archive gap of COMO-CMA-ES after function evaluations per kernel. This suggests, as expected from the known dependency between optimal step-size and population size (Hansen et al., 2015), that COMO-CMA-ES adds valuable diversity while at the same time approaching the optimal p-distribution of the Pareto front.

6. Conclusions

We have proposed (i) the Sofomore framework to define multiobjective optimizers from single-objective ones, (ii) a fitness for dominated solutions defined as the distance to the empirical Pareto front (Uncrowded Hypervolume Improvement, UHVI) and (iii) the non-elitist “comma” CMA-ES to instantiate the framework (COMO-CMA-ES). We observe that COMO-CMA-ES converges linearly towards the optimal p-distribution of the hypervolume indicator on several bi-objective convex quadratic problems. The COMO-CMA-ES appears to be robust to independently rotating the Hessian matrices of convex-quadratic problems, even if such rotations transform the Pareto set from a line segment into a bent curve. In our limited experiments, COMO-CMA-ES performed generally better than MO-CMA-ES, SMS-EMOA and NSGA-II w.r.t. convergence gap and archive gap, while COMO-CMA-ES was solely designed to optimize the convergence gap. We conjecture that the advantage on the archive gap is due to (i) the large stationary variance obtained with non-elitist evolution strategies and (ii) the fitness assignment of dominated solutions which favors the vacant (uncrowded) space between non-dominated solutions and hence serves as an implicit crowding distance penalty measure.

Acknowledgements

Part of this research has been conducted in the context of a research collaboration between Storengy and Inria. We particularly thank F. Huguet and A. Lange from Storengy for their strong support, practical ideas and expertise.

References

  • Auger et al. (2009) Anne Auger, Johannes Bader, Dimo Brockhoff, and Eckart Zitzler. 2009. Theory of the hypervolume indicator: optimal μ-distributions and the choice of the reference point. In Foundations of Genetic Algorithms (FOGA 2009). ACM, Orlando, Florida, USA, 87–102.
  • Auger et al. (2012) Anne Auger, Johannes Bader, Dimo Brockhoff, and Eckart Zitzler. 2012. Hypervolume-based multiobjective optimization: Theoretical foundations and practical implications. Theoretical Computer Science 425 (2012), 75–103.
  • Berghammer et al. (2010) R. Berghammer, T. Friedrich, and F. Neumann. 2010. Set-based Multi-objective Optimization, Indicators, and Deteriorative Cycles. In Genetic and Evolutionary Computation Conference (GECCO 2010). ACM, Portland, Oregon, 495–502. https://doi.org/10.1145/1830483.1830574
  • Beume et al. (2007) N. Beume, B. Naujoks, and M. Emmerich. 2007. SMS-EMOA: Multiobjective Selection Based on Dominated Hypervolume. European Journal of Operational Research 181, 3 (2007), 1653–1669.
  • Bringmann and Friedrich (2011) Karl Bringmann and Tobias Friedrich. 2011. Convergence of hypervolume-based archiving algorithms I: Effectiveness. In Proceedings of the 13th annual conference on Genetic and evolutionary computation. ACM, Dublin, Ireland, 745–752.
  • Bubeck and Cesa-Bianchi (2012) Sébastien Bubeck and Nicolo Cesa-Bianchi. 2012. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends® in Machine Learning 5, 1 (2012), 1–122.
  • Collette et al. (2010) Yann Collette, Nikolaus Hansen, Gilles Pujol, Daniel Salazar Aponte, and Rodolphe Le Riche. 2010. On Object-Oriented Programming of Optimizers - Examples in Scilab. In Multidisciplinary Design Optimization in Computational Mechanics, Rajan Filomeno Coelho and Piotr Breitkopf (Eds.). Wiley, New Jersey, 499–538. https://hal.inria.fr/inria-00476172
  • Deb et al. (2002) K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. 2002. A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6, 2 (2002), 182–197.
  • Emmerich et al. (2005) Michael Emmerich, Nicola Beume, and Boris Naujoks. 2005. An EMO algorithm using the hypervolume measure as selection criterion. In International Conference on Evolutionary Multi-Criterion Optimization. Springer, Guanajuato, Mexico, 62–76.
  • Emmerich and Klinkenberg (2008) Michael Emmerich and Jan-willem Klinkenberg. 2008. The computation of the expected improvement in dominated hypervolume of Pareto front approximations. Technical Report 4-2008. Leiden Institute of Advanced Computer Science, LIACS.
  • Goldberg (1989) D. E. Goldberg. 1989. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading, Massachusetts.
  • Hansen and Jaszkiewicz (1998) M. P. Hansen and A. Jaszkiewicz. 1998. Evaluating The Quality of Approximations of the Non-Dominated Set. Technical Report. Institute of Mathematical Modeling, Technical University of Denmark. IMM Technical Report IMM-REP-1998-7.
  • Hansen et al. (2019) Nikolaus Hansen, Youhei Akimoto, and Petr Baudis. 2019. CMA-ES/pycma on Github. Zenodo, DOI:10.5281/zenodo.2559634. (Feb. 2019). https://doi.org/10.5281/zenodo.2559634
  • Hansen et al. (2015) Nikolaus Hansen, Dirk V Arnold, and Anne Auger. 2015. Evolution strategies. In Springer handbook of computational intelligence. Springer, Berlin, 871–898.
  • Hansen and Ostermeier (2001) N. Hansen and A. Ostermeier. 2001. Completely Derandomized Self-Adaptation in Evolution Strategies. Evolutionary Computation 9, 2 (2001), 159–195.
  • Hernandez et al. (2018) VAS Hernandez, O Schutze, H Wang, A Deutz, and M Emmerich. 2018. The Set-Based Hypervolume Newton Method for Bi-Objective Optimization. IEEE Transactions on Cybernetics (2018). In print.
  • Igel et al. (2007) C. Igel, N. Hansen, and S. Roth. 2007. Covariance matrix adaptation for multi-objective optimization. Evolutionary Computation 15, 1 (2007), 1–28.
  • Keane (2006) Andy J. Keane. 2006. Statistical improvement criteria for use in multiobjective design optimization. AIAA journal 44, 4 (2006), 879–891.
  • Knowles et al. (2006) J. Knowles, L. Thiele, and E. Zitzler. 2006. A Tutorial on the Performance Assessment of Stochastic Multiobjective Optimizers. TIK Report 214. Computer Engineering and Networks Laboratory (TIK), ETH Zurich.
  • Krause et al. (2017) Oswin Krause, Tobias Glasmachers, and Christian Igel. 2017. Qualitative and quantitative assessment of step size adaptation rules. In Proceedings of the 14th ACM/SIGEVO Conference on Foundations of Genetic Algorithms. ACM, Copenhagen, Denmark, 139–148.
  • Miettinen (1999) K. Miettinen. 1999. Nonlinear Multiobjective Optimization. Kluwer, Boston, MA, USA.
  • Ponweiser et al. (2008) Wolfgang Ponweiser, Tobias Wagner, Dirk Biermann, and Markus Vincze. 2008. Multiobjective Optimization on a Limited Budget of Evaluations Using Model-Assisted S-Metric Selection. In Parallel Problem Solving from Nature (PPSN 2008). Springer, Dortmund, Germany, 784–794.
  • Toure et al. (2019) Cheikh Toure, Anne Auger, Dimo Brockhoff, and Nikolaus Hansen. 2019. On Bi-Objective convex-quadratic problems. In International Conference on Evolutionary Multi-Criterion Optimization. Springer, Lansing, Michigan, USA, 3–14.
  • Voß et al. (2010) T. Voß, N. Hansen, and C. Igel. 2010. Improved Step Size Adaptation for the MO-CMA-ES. In Genetic and Evolutionary Computation Conference (GECCO 2010), J. Branke et al. (Eds.). ACM, Portland, OR, USA, 487–494.
  • Wagner et al. (2010) Tobias Wagner, Michael Emmerich, André Deutz, and Wolfgang Ponweiser. 2010. On expected-improvement criteria for model-based multi-objective optimization. In International Conference on Parallel Problem Solving from Nature. Springer, Krakow, Poland, 718–727.
  • Wagner and Trautmann (2010) Tobias Wagner and Heike Trautmann. 2010. Online convergence detection for evolutionary multi-objective algorithms revisited. In IEEE Congress on Evolutionary Computation. IEEE, Barcelona, Spain, 1–8.
  • Wessing (2017) Simon Wessing. 2017. evoalgos: Modular evolutionary algorithms. Python package version 1. (2017). https://pypi.python.org/pypi/evoalgos [Online; accessed 31-January-2019].
  • Yang et al. (2019) Kaifeng Yang, Michael Emmerich, André Deutz, and Thomas Bäck. 2019. Multi-Objective Bayesian Global Optimization using expected hypervolume improvement gradient. Swarm and evolutionary computation 44 (2019), 945–956.
  • Zhang and Li (2007) Q. Zhang and H. Li. 2007. MOEA/D: A Multiobjective Evolutionary Algorithm Based on Decomposition. IEEE Transactions on Evolutionary Computation 11, 6 (2007), 712–731. https://doi.org/10.1109/TEVC.2007.892759
  • Zitzler and Künzli (2004) Eckart Zitzler and Simon Künzli. 2004. Indicator-based selection in multiobjective search. In International Conference on Parallel Problem Solving from Nature. Springer, Birmingham, UK, 832–842.
  • Zitzler and Thiele (1998a) E. Zitzler and L. Thiele. 1998a. Multiobjective Optimization Using Evolutionary Algorithms - A Comparative Case Study. In Conference on Parallel Problem Solving from Nature (PPSN V) (LNCS), Vol. 1498. Springer, Amsterdam, The Netherlands, 292–301.
  • Zitzler and Thiele (1998b) Eckart Zitzler and Lothar Thiele. 1998b. Multiobjective optimization using evolutionary algorithms - A comparative case study. In International conference on parallel problem solving from nature. Springer, Amsterdam, The Netherlands, 292–301.