Benchmarking Evolutionary Algorithms For Real-valued Constrained Optimization – A Critical Review –
Abstract
Benchmarking plays an important role in the development of novel search algorithms as well as for the assessment and comparison of contemporary algorithmic ideas. This paper presents common principles that need to be taken into account when considering benchmarking problems for constrained optimization. Current benchmark environments for testing Evolutionary Algorithms are reviewed in the light of these principles. Along these lines, the reader is provided with an overview of the available problem domains in the field of constrained benchmarking. Hence, the review supports algorithm developers with information about the merits and demerits of the available frameworks.
keywords:
Benchmarking, Constrained Optimization, Evolutionary Algorithms, Continuous Optimization
1 Introduction
Representing a subclass of derivative-free, nature-inspired methods for optimization, Evolutionary Algorithms (EA) provide powerful optimization tools for, but not restricted to, black-box or simulation-based optimization problems, that is, for problems where the analytical structure of the optimization problem is unknown by default. EA applications to such problems can be found in the fields of operations research, engineering, or machine learning michalewicz1996evolutionary (); Oduguwa2005 (); Zhang2011 (); collange2010multidisciplinary (); mora2015applications ().
Due to the lack of theoretical performance results for optimization tasks of notable complexity, the development and the performance comparison of EA widely rely on benchmarking. First and foremost, benchmarking experiments are established for performance evaluation and algorithm comparison on given problem classes. Ideally, this is supposed to support the selection of the algorithm best suited for a given real-world application Mersmann2015 (). Yet, benchmarks can also be used to experimentally provide insight into the working principles of an algorithm (although purpose-built experiments usually have to be conducted in addition) and to foster the development of algorithms for specific problem branches. Furthermore, benchmarks may serve to verify theoretical predictions of the algorithm behavior Whitley1996 (); rardin2001experimental ().
Currently, two benchmarking environments are mainly in use: the test environments provided in the IEEE Congress on Evolutionary Computation (CEC) competitions and the Comparing Continuous Optimizers (COCO) benchmark suite.
The COCO suite hansen2016coco () represents the most elaborate platform for benchmarking and comparing continuous optimizers for numerical (nonlinear) optimization. The COCO framework evolved from the Black-Box Optimization Benchmarking (BBOB) 2009 benchmark set HansenAFR2009 (); hansen:inria00362633 (). The platform provides tools to ease the process of quantifying and comparing the performance of optimization algorithms for single-objective noiseless and noisy problems, and for bi-objective noiseless problems. A particular strength of the COCO platform is the large number of algorithm results available for comparison. Up to now, 231 distinct (classical as well as contemporary) algorithms have been tested on the COCO test beds.
Alternatively, the competitions that are organized on a yearly basis during the CEC aim at the comparison of state-of-the-art stochastic search algorithms. These competitions include, among others, single-objective, large-scale, noisy, multi-objective, and constrained optimization problems. The CEC competitions provide specific test environments for algorithm assessment and comparison. The test function environments made available in this context have turned out to be very popular for benchmarking Evolutionary Algorithms (EA).
Since EA are commonly recognized as successful optimization strategies in the context of unconstrained optimization, their application to constrained optimization problems has gained the attention of the research community in recent years. Constrained optimization tasks are concerned with searching for the optimal solution of an objective function with respect to limitations on the search space parameter vector. In many real-world applications, constraints result from physical boundaries on the input data, from considering problem-specific trade-offs, or from limiting the resources of a problem. Regardless of their source, the introduction of constraints increases the complexity of an optimization task. This is particularly true in the context of black-box and simulation-based optimization. Refer to MEZURAMONTES2011 () for a survey on commonly used constraint handling approaches in the context of nature-inspired algorithms. For constrained optimization problems, the theoretical background of EA is even less developed. Hence, the use of benchmarks for performance assessment and algorithm development is essential.
Regarding EA benchmarks for constrained optimization, the CEC competitions on constrained real-parameter optimization CEC2006 (); CEC2010 (); CEC2017 () (organized in 2006, 2010, and 2017) introduced specific constrained test environments. The constrained test functions included in the CEC 2006 benchmark definitions were collected from MichalewiczS1996 (); himmelblau1972applied (); floudas1999handbook (); Xia96 (); epperly1995global (). The subsequent competitions refined some benchmark definitions and introduced new problem instances. To this end, the test-case generator developed in Michalewicz2000 () was used. The respective paper introduces a method to generate test problems with varying features, e.g. with respect to the problem size, the size of the feasible region, or the number and the type of the constraints. Benchmark problems created by the test-case generator are included in the CEC 2010 and CEC 2017 competitions on constrained real-parameter optimization. To date, the CEC function sets are most frequently used for benchmarking contemporary Evolutionary Algorithms in the context of constrained optimization. Refer to Sec. 4 for the detailed review of the CEC benchmark sets.
Only recently has the development of a COCO branch for constrained black-box optimization benchmark (BBOB) problems neared completion CocoCode ().
An overview of additional problem collections is available at NeumaierGlobalOpt ().
Each of these test problem sets is useful for demonstrations of the applicability of interesting algorithmic ideas. However, the presented problems are mainly related to the field of mathematical programming.
They are provided in different mathematical modeling systems like GAMS
Similar concerns apply to many real-world problem applications that are present in the literature rardin2001experimental (). They usually come with limited reproducibility and comparability of the results reported, e.g. due to unavailable data or implicit modeling assumptions. Consequently, these studies can be thought of as a demonstration of the applicability of a certain algorithm in a particular context rather than as a proof of its superiority. The focus is more on the algorithm output than on the algorithm efficiency johnson2002 ().
The main goal of the present paper is to provide a critical review of state-of-the-art benchmarking environments that can be used for assessing and comparing Evolutionary Algorithms in the context of constrained real-valued optimization. To this end, existing benchmark principles for constrained optimization are collected and complemented with insights from other benchmarking fields and experimentation. Current benchmarking environments are surveyed in the light of these rationales. Note that the classification of the vast number of solitary test problems NeumaierGlobalOpt () is not the purpose of this review. Instead, the focus is on the most elaborate constrained benchmarking environments for Evolutionary Algorithms, i.e. the constrained test environments established for the CEC competitions as well as the BBOB-constrained suite. This way, the article may raise awareness of recent benchmarking techniques as well as their corresponding strengths and limitations. By suggesting room for improvement with respect to framework definitions, experimentation principles, and reporting styles, the present paper aims at stimulating the debate on benchmarking principles for constrained real-valued optimization.
Such a discussion seems necessary as the field of constrained optimization is vast and the available benchmarking approaches are comparatively scarce. Although some investigations exist mezura2004makes (), it is by no means conclusively determined which features make a constrained optimization problem hard, even for a single algorithm subclass. Constrained real-valued optimization problems may differ with respect to the following features (and their combinations), including but not necessarily restricted to,

the number of constraints,

the type of the constraints (refer to Sec. 2),

the analytical structure of objective function and constraints, e.g.

conditioning of the problem

modality of the objective function

ruggedness of the objective function

(non)linearity of the constraints

separability of the objective function and/or constraints

number of global optima (inside the feasible region)


the size of the search space,

the relative size of the feasible region in the search space,

the connectedness of the feasible region,

the orientation of the feasible region within the search space,

the location of the global optimum, on the boundary or away from it.
As there certainly is no such thing as a free lunch NFL1997 (), and as the EA development for constrained optimization tasks will further rely on the availability of suitable benchmarks, the need for benchmark definitions that take into account consistent subgroups of conceivable problems is beyond dispute. The test-case generator introduced in Michalewicz2000 () or test problem collections like CUTEst2018 () can be regarded as a meaningful step towards creating well-structured problem groups of distinct characteristics. However, these problems and their reported solutions are commonly not scalable with respect to the problem dimensionality. Further, proposing a well-defined benchmarking framework and providing coherent reporting and ranking principles for metaheuristic algorithms is not within the scope of these test suites.
The remainder of this paper is organized as follows: Section 2 introduces the general real-valued constrained optimization problem (COP), particularly with regard to a classification of the constraint functions commonly used for benchmarking EA on black-box problems. Section 3 presents benchmarking principles appropriate for the comparison of constrained optimization benchmarks. Afterward, the design of the CEC test function sets for constrained optimization and the COCO BBOB-constrained suite are presented in greater detail in Sec. 4 and Sec. 5, respectively. Both sections particularly expand on the previously motivated principles. A discussion of the recent benchmarking principles for EA on constrained optimization problems concludes the paper in Sec. 6.
2 Problem formulation
The present paper focuses on continuous optimization problems. That is, both the objective function and the constraint functions are assumed to be real-valued functions. The objective function might either be represented as a reward or as a cost function. While the former calls for maximization, a cost function representation needs to be minimized. Some collections of benchmark functions may even use both representations, but this paper without loss of generality focuses on minimization problems.
The constraint functions fall into even more classes. A detailed taxonomy of constraints is provided in LedWild2015 (). The paper subdivides constraints into nine distinct constraint classes which rely on the categorization according to the following features.
 Non/Quantifiable

A constraint is said to be quantifiable if its degree of feasibility and/or constraint violation can be determined. Otherwise, the constraint is denoted non-quantifiable.
 Un/Relaxable

Unrelaxable constraints define conditions for the parameter vectors that are required to be satisfied to obtain meaningful outputs from either objective functions or simulations. In contrast, relaxable constraints represent desired conditions which do not have to be satisfied at each stage of the optimization process.
 A priori/Simulation

If the feasibility of a constraint can be evaluated directly, it is referred to as an a priori constraint. A constraint that requires running a simulation to verify its feasibility is denoted a simulation constraint.
 Known/Hidden

While hidden constraints are unknown to a solver, known constraints are explicitly stated in the problem formulation and thus available to the solver. Note that hidden constraints are characteristic of simulation-based optimization problems. They are non-quantifiable and unrelaxable by definition.
All combinations of the above categories are reasonable for the definition of problem instances for a constrained benchmark problem. However, the benchmark suites considered in this paper are usually dedicated to known, a priori, and quantifiable constraints. Whether some constraints are relaxable or unrelaxable depends on the respective benchmark definitions. Note that, by interpreting all constraints as unrelaxable, the problem instances would become considerably harder for EA to solve. That is, suitable algorithms would have to be equipped with a sophisticated repair technique that allows generating usable candidate solutions in every situation.
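The real-valued constrained optimization problems (COP) considered in this report have the general representation
$$\begin{aligned} \min_{\mathbf{y}} \; & f(\mathbf{y}) \\ \text{s.t.} \; & g_i(\mathbf{y}) \le 0, \quad i = 1, \dots, l, \\ & h_j(\mathbf{y}) = 0, \quad j = 1, \dots, k, \\ & \mathbf{y} \in S \subseteq \mathbb{R}^N. \end{aligned} \tag{COP}$$
In this context, $\mathbf{y} \in \mathbb{R}^N$ denotes the $N$-dimensional search space parameter vector. The set $S$ usually comprises a number of box-constraints specifying reasonable intervals of the parameter vector components, i.e.
$$S = \left\{ \mathbf{y} \in \mathbb{R}^N : \check{\mathbf{y}} \le \mathbf{y} \le \hat{\mathbf{y}} \right\}, \tag{1}$$
where the vector $\check{\mathbf{y}}$ consists of the componentwise lower bounds, and the vector $\hat{\mathbf{y}}$ of upper bounds, respectively. Note that $\le$ is understood as the componentwise less-than-or-equal inequality. The set $S$ is also referred to as the box of problem (COP).
The feasible region of the search space is additionally restricted by real-valued constraint functions. These constraint functions are separated into $l$ inequality constraints $g_i(\mathbf{y}) \le 0$ and $k$ equality constraints $h_j(\mathbf{y}) = 0$. A vector $\mathbf{y} \in S$ that satisfies all constraints is called feasible. The set of all feasible parameter vectors is referred to as
$$M = \left\{ \mathbf{y} \in S : g_i(\mathbf{y}) \le 0 \;\; \forall i = 1, \dots, l \;\wedge\; h_j(\mathbf{y}) = 0 \;\; \forall j = 1, \dots, k \right\}. \tag{2}$$
The global optimum of (COP) is denoted by $\mathbf{y}^* \in M$.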
Note that the objective function subject to some constraints is also referred to as constrained function.
Multiple representations of one specific constrained function that are subject to small variations are denoted as instances of that respective constrained function. Such variations involve, for example, the orientation of the feasible region, a negligible change in the size of the feasible region, or the location of the optimum. In contrast, (COP) instances are similar with respect to the objective functions as well as the number and the analytical type of the constraint functions.
The box-constraints, which impose restrictions on the parameter vector components, are usually considered unrelaxable. In contrast, inequality and equality constraints are considered relaxable insofar as the constrained functions can be evaluated for infeasible parameter vectors and such infeasible candidate solutions may also be employed in the search process.
The parameter $\rho$ denotes the size of the feasible region $M$ relative to the size of the box $S$,
$$\rho = \frac{|M|}{|S|}. \tag{3}$$
It can be estimated by uniformly sampling a sufficiently large number of candidate solutions inside the set $S$ and counting the feasible candidate solutions among them, as suggested in Koziel1999 ().
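The sampling estimate of the relative feasible region size described above can be sketched as follows. This is a minimal illustration, not part of any benchmark suite: the function name `estimate_feasible_ratio`, the sample size, the seed, and the unit-disc example problem are all assumptions chosen for demonstration.

```python
import random


def estimate_feasible_ratio(is_feasible, lower, upper, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the feasible fraction of the box [lower, upper].

    `is_feasible` is a user-supplied predicate (hypothetical interface) that
    returns True if a parameter vector satisfies all constraints.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # Draw one candidate uniformly at random inside the box.
        y = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        if is_feasible(y):
            hits += 1
    return hits / n_samples


# Illustrative problem (assumption): unit disc inside the box [-1, 1]^2,
# i.e. the single inequality constraint y0^2 + y1^2 - 1 <= 0.
def in_unit_disc(y):
    return y[0] ** 2 + y[1] ** 2 - 1.0 <= 0.0


rho_hat = estimate_feasible_ratio(in_unit_disc, [-1.0, -1.0], [1.0, 1.0])
```

For this example the exact ratio is pi/4, so the estimate should land close to 0.785. Note that for problems with equality constraints the estimate is typically zero unless an error margin is applied.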
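Considering problem (COP), evolutionary algorithms employ a measure of infeasibility to guide the search process into feasible regions of the search space. The constraint violation $\nu(\mathbf{y})$ of a candidate solution $\mathbf{y}$ is usually specified as
$$\nu(\mathbf{y}) \begin{cases} = 0, & \text{if } \mathbf{y} \text{ is feasible}, \\ > 0, & \text{otherwise}. \end{cases} \tag{4}$$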
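Multiple definitions of the constraint violation measure can be found in the literature, and the choice of definition is essentially left to the search algorithm. EA commonly use the constraint violation $\nu(\mathbf{y})$ to create penalty functions, to derive appropriate repair terms, or to rank infeasible candidate solutions. A popular method to calculate the constraint violation of the parameter vector $\mathbf{y}$, given $l$ inequality constraint functions $g_i$ and $k$ equality constraint functions $h_j$, is
$$\nu(\mathbf{y}) = \frac{\sum_{i=1}^{l} G_i(\mathbf{y}) + \sum_{j=1}^{k} H_j(\mathbf{y})}{l + k} \tag{5}$$
with functions $G_i(\mathbf{y})$ and $H_j(\mathbf{y})$ defined by
$$G_i(\mathbf{y}) = \max\bigl(0, g_i(\mathbf{y})\bigr) \tag{6}$$
and
$$H_j(\mathbf{y}) = \begin{cases} |h_j(\mathbf{y})|, & \text{if } |h_j(\mathbf{y})| - \delta > 0, \\ 0, & \text{otherwise}. \end{cases} \tag{7}$$
Unlike for classical deterministic solvers, equality constraints cause real difficulties for metaheuristics like EA. In order to enable EA to satisfy the equality constraints at least up to a fair degree, Eq. (7) introduces the error margin $\delta$. Hence, parameter vectors that realize smaller deviations than $\delta$ are considered feasible. The explicit choice of $\delta$ may differ with each benchmark specification, see Sec. 4 and Sec. 5.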
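A constraint violation measure of the kind described above can be sketched in a few lines. The sketch assumes that the violation is averaged over all constraints and that the error margin for equality constraints defaults to 1e-4; both choices, as well as the function name `constraint_violation` and the toy constraints, are illustrative assumptions, since the concrete definition is left to the respective benchmark.

```python
def constraint_violation(y, ineq, eq, delta=1e-4):
    """Mean constraint violation of the parameter vector y.

    ineq: list of functions g with the convention g(y) <= 0 when satisfied.
    eq:   list of functions h with the convention h(y) == 0 when satisfied.
    delta: error margin for equality constraints (assumed default value).
    """
    # Inequality constraints contribute only when they are violated.
    G = [max(0.0, g(y)) for g in ineq]
    # Equality constraints contribute only beyond the error margin delta.
    H = []
    for h in eq:
        deviation = abs(h(y))
        H.append(deviation if deviation - delta > 0.0 else 0.0)
    n_constraints = len(ineq) + len(eq)
    return (sum(G) + sum(H)) / n_constraints if n_constraints else 0.0


# Toy constraints (assumption): y0 + y1 - 1 <= 0 and y0 - y1 == 0.
ineq = [lambda y: y[0] + y[1] - 1.0]
eq = [lambda y: y[0] - y[1]]

nu_infeasible = constraint_violation([1.0, 1.0], ineq, eq)   # inequality violated by 1
nu_feasible = constraint_violation([0.5, 0.5], ineq, eq)     # both constraints satisfied
nu_margin = constraint_violation([0.2, 0.20005], ineq, eq)   # equality within the margin
```

A candidate is treated as feasible exactly when the returned value is zero, so the third call illustrates how the error margin turns a slightly violated equality constraint into a feasible point.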
Having obtained a notion of feasibility and infeasibility of candidate solutions allows for the introduction of a corresponding order relation. Such order relations permit the comparison of both feasible and infeasible candidate solutions DEB2000 (). A commonly used order relation in the field of constrained optimization is the lexicographic ordering which is defined in a very intuitive way. Two solutions are compared at a time according to the following criteria:

Any feasible solution is preferred to an infeasible solution.

Among two feasible solutions, the one having the better objective function value is considered superior.

Two infeasible solutions are ranked according to their constraint violation value (the lower the better).
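In mathematical form, with $f$ denoting the objective function and $\nu$ the constraint violation, this order relation reads
$$\mathbf{y}_a \preceq_{\mathrm{lex}} \mathbf{y}_b \iff \begin{cases} f(\mathbf{y}_a) \le f(\mathbf{y}_b), & \text{if } \nu(\mathbf{y}_a) = \nu(\mathbf{y}_b) = 0, \\ \nu(\mathbf{y}_a) \le \nu(\mathbf{y}_b), & \text{otherwise}. \end{cases} \tag{8}$$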
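The three comparison criteria of the lexicographic ordering translate directly into a sort key. The following sketch assumes that each candidate is described by its objective function value and its constraint violation, with a violation of zero meaning feasible; the helper name `lex_key` and the candidate data are hypothetical.

```python
def lex_key(f_value, violation):
    """Sort key realizing the lexicographic ordering for minimization.

    Feasible candidates (violation == 0) come first and are ordered by their
    objective value; infeasible candidates follow, ordered by their violation.
    """
    if violation == 0.0:
        return (0, f_value)
    return (1, violation)


# Hypothetical candidates: (name, objective value, constraint violation).
candidates = [
    ("a", 3.0, 0.0),
    ("b", 1.0, 0.2),
    ("c", 2.0, 0.0),
    ("d", 0.5, 0.7),
]

ranked = sorted(candidates, key=lambda c: lex_key(c[1], c[2]))
names = [c[0] for c in ranked]
```

In this example the feasible candidates `c` and `a` are ranked ahead of `b` and `d`, although the infeasible candidates have lower objective values.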
3 Principles for EA benchmarks on constrained optimization tasks
Having introduced the general problem formulation in Sec. 2, this section is concerned with the collection of requirements and preferable features that have to be taken into account when creating a credible benchmark problem (or even framework) for constrained optimization. To this end, the already established principles used in current benchmark sets for EA are considered. Additional thoughts with respect to benchmarking guidelines Matott2012 (), experimental rigor johnson2002 (), and the presentation style MoreWild2009 () of results obtained are appended.
The section is divided into three parts: the fundamental principles of the test environment, the design of adequate experiments, and the reporting of test results. Overlaps of these concepts cannot entirely be avoided.
In many cases, it is not possible to give a final recommendation of the best practice. Hence, it is not within the scope of this article to provide definitive answers to these questions, but rather to create categories that allow a comparative study of distinct benchmark environments. Ultimately, the benchmark suites can be designed with respect to various aspects of a given problem domain. It is the responsibility of the authors to secure the compliance of tested algorithms with these predefined benchmark principles.
In publications that report benchmarking results, some of these principles are frequently disregarded.
3.1 Fundamental principles of a test environment
Each set of benchmark problems should ensure reproducibility of the results obtained by a specific algorithm as well as comparability with the outcomes generated by other strategies. Accordingly, this subsection proposes guidelines necessary to provide a common basis for algorithm benchmarking on constrained optimization environments.
Problem domain and documentation
A benchmark suite that covers all conceivable features of constrained optimization problems and their combinations appears unmanageable. Hence, it is recommended that a benchmark design systematically focuses on a specific problem subdomain instead of collecting a vast amount of arbitrary problem definitions.
Well-developed benchmarking environments are supposed to guide the user through the benchmarking process. Users should receive clear instructions regarding the correct use of the benchmark environment, its working principles, the related benchmarking conventions, and the required reporting style. This calls for a clear definition and documentation of the corresponding way of proceeding.
Problem publicity
It is necessary to decide whether the analytical description of a single problem instance is openly available or whether it is generated at random. The first case allows the user to obtain a notion of the problem complexity. Further, it facilitates the incorporation of real-world problems into the test problem collection. On the other hand, fixed problem statements in analytical form open up the possibility of hand-tuning algorithm parameters for specific constrained problems or even of cheating by exploiting analytical information.
Such issues can be partly circumvented by generating individual instances of a fixed constrained optimization problem at random. This involves the implementation of an elaborate test-case generator. Due to the complexity of instantiating real-world applications, this comes with the need for designing suitable artificial test problems. According to Matott2012 (), the user should not be involved at all in the evaluation of the constrained function. To this end, the benchmark collection would need to provide an easily and freely accessible software environment that offers well-defined input/output specifications. The availability of interfaces to multiple programming languages would additionally support the usability of such a benchmark suite.
Function evaluations
It is imperative to provide a clear policy of how to count objective function evaluations and constraint evaluations, respectively. A first option is to interpret the evaluation of the whole constrained function, i.e. the evaluation of the objective function as well as all related constraints, as one single function evaluation. This is essentially equal to just counting objective function evaluations. Another possibility is to count the objective function evaluations and the constraint evaluations separately. In this case, the question remains whether to think of the constraints as a single vector-valued function that returns all constraint values at a single evaluation, or as multiple real-valued functions that account for even more function evaluations. Distinguishing between inequality and equality constraints may also represent an option. More accurate counting may result in improved explanatory power, e.g. the separation of objective function and constraint evaluations allows drawing conclusions about the number of constraints inside a black-box constrained function. Further, the effort of algorithms that repeatedly evaluate only the constraints, e.g. to perform a repair step, can be monitored more accurately in this case.
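The counting policies discussed above can be made concrete with a small wrapper around a constrained function. This is a sketch under stated assumptions: the class `CountingProblem`, its method names, and the `per_constraint` switch are hypothetical and do not correspond to the interface of any existing benchmark suite.

```python
class CountingProblem:
    """Wraps a constrained function and counts evaluations separately.

    per_constraint=False: the constraints are treated as one vector-valued
    function, so each call counts as a single constraint evaluation.
    per_constraint=True: every individual constraint counts separately.
    """

    def __init__(self, objective, constraints, per_constraint=False):
        self._objective = objective
        self._constraints = list(constraints)
        self._per_constraint = per_constraint
        self.objective_evals = 0
        self.constraint_evals = 0

    def evaluate_objective(self, y):
        self.objective_evals += 1
        return self._objective(y)

    def evaluate_constraints(self, y):
        self.constraint_evals += len(self._constraints) if self._per_constraint else 1
        return [c(y) for c in self._constraints]


# Toy problem (assumption): sphere objective with two inequality constraints.
def sphere(y):
    return sum(v * v for v in y)


constraints = [lambda y: y[0] + y[1] - 1.0, lambda y: -y[0]]

vector_policy = CountingProblem(sphere, constraints)
scalar_policy = CountingProblem(sphere, constraints, per_constraint=True)
for problem in (vector_policy, scalar_policy):
    problem.evaluate_objective([0.5, 0.5])
    for _ in range(3):  # e.g. three constraint-only calls during a repair step
        problem.evaluate_constraints([0.5, 0.5])
```

After this loop both wrappers report one objective evaluation, while the constraint counter reads 3 under the vector-valued policy and 6 under the per-constraint policy. The same algorithm run therefore consumes a different budget depending on the policy, which is why the policy has to be stated explicitly.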
Box-constraints
A recommendation for the treatment of box-constraints needs to be stated to ensure reproducibility and comparability of the algorithm results.
According to LiaoMMS2014 (), the absence of such a recommendation may have significant implications for the comparability of algorithm results. In the respective paper, it was pointed out that different box-constraint handling interpretations can produce dissimilar outcomes even for a single algorithm. The study distinguished three box-constraint scenarios:

(S1) unrelaxable box-constraints,

(S2) relaxable box-constraints, and

(S3) no box-constraints at all.
While scenario (S3) is self-explanatory, in scenario (S1) the box-constraints are defined and enforced at every stage of the search process. Candidate solutions outside the box are considered invalid and thus have to be repaired or discarded. In case of (S2), box-constraints are specified but only enforced for the final candidate solutions. That is, infeasible candidate solutions outside the box may be used to drive the search. It was shown in LiaoMMS2014 () that algorithms were sometimes able to find solutions of better quality when facing situation (S2) or (S3) instead of (S1), even if the global optimizer was not located on the boundary of the specified box.
In order to avoid inconsistencies, various options come to mind. First, the box-constraint treatment can be completely eliminated if the admissible intervals of the parameter vector components are directly included in the inequality constraints. In case of one specific lower and upper bound for each parameter vector component, the number of inequality constraints increases by twice the search space dimension. Regarding high-dimensional problems, one can think of situations where this can blow the problem complexity out of proportion. Since most algorithms would have to be adapted, this approach would limit the usability of such a benchmark problem. The least invasive option, however, is to permit the box-constraint handling technique of choice. This clearly comes with the need for properly reporting the technique used.
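The first option, folding the box into the inequality constraints, can be illustrated with a short sketch. It assumes the convention that an inequality constraint is satisfied when its value is less than or equal to zero; the helper name `box_as_inequalities` and the example bounds are hypothetical.

```python
def box_as_inequalities(lower, upper):
    """Encode lower[i] <= y[i] <= upper[i] as inequality constraints g(y) <= 0.

    Returns two constraint functions per dimension, i.e. twice the search
    space dimension in total.
    """
    constraints = []
    for i, (lo, hi) in enumerate(zip(lower, upper)):
        # Default arguments bind the current i, lo, hi to each lambda.
        constraints.append(lambda y, i=i, lo=lo: lo - y[i])  # lower bound
        constraints.append(lambda y, i=i, hi=hi: y[i] - hi)  # upper bound
    return constraints


# Illustrative two-dimensional box (assumption): [-1, 1] x [0, 2].
box_constraints = box_as_inequalities([-1.0, 0.0], [1.0, 2.0])
values = [g([0.5, 3.0]) for g in box_constraints]
```

For the point `[0.5, 3.0]` only the last value is positive, indicating that the upper bound of the second component is violated by 1. The sketch also makes the growth in the number of constraints visible: a two-dimensional box already yields four additional inequality constraints.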
3.2 Experimental design
The experimental design of a benchmark testbed is supposed to properly reflect the characteristics of the chosen problem (sub)domain. This requires the unambiguous description of the constrained test problems, initialization practices, as well as appropriate quality indicators. The benchmark problems are expected to be efficiently implemented in order to speed up the experiments. Moreover, the following subjects have to be adopted in the design process.
Initialization
Benchmark problems differ with respect to the initial parameter vectors they provide, with varying implications for the applicability of certain optimizers. A benchmark problem might either provide a feasible initial candidate solution, supply a subset of not necessarily feasible parameter vectors (e.g. by specifying unrelaxable box-constraints), or give no assistance at all. If no feasible solution is given, algorithms that rely on initially feasible solutions essentially have to solve a constraint satisfaction problem before the original constrained optimization problem can be tackled. This can significantly impair their performance and would complicate the comparability of such approaches with strategies that do not assume the existence of a feasible solution.
Precision
Considering randomized algorithms, a test environment needs to make assumptions on the termination precision and reasonable error margins for constraint satisfaction. The latter is particularly important in the context of equality constraints because it is otherwise highly improbable to find feasible candidate solutions. Further, a statement on the required precision of reported statistics appears necessary to ensure an appropriate ranking of two distinct algorithms on a single constrained function. For example, assuming two algorithms A and B both reliably approach the optimal objective function value of zero on the same constrained function. While A realizes a mean function value of in multiple, independent runs, B achieves a mean value of . Ranking algorithm B better than A based only on the observed mean values is quite questionable in this scenario. Considering precisions below the floating point accuracy also appears misguided.
Actually, although it is commonly done, the consideration of relative precisions (or absolute precisions in the case of ) of order or even smaller does not always reflect the needs of realworld optimization problems. That is, at some point of the search process the effort to realize very small improvements might be expendable from a practical point of view.
Constrained problems
A sufficient number of meaningful constrained optimization problems suitable to represent the chosen problem domain needs to be selected. The problems might either be automatically generated or collected from test problem collections. Each problem needs to be specified in the manner of (COP). That is, objective function, constraint functions, and box-constraints have to be well-defined. If this information is not made public, a black-box framework has to be developed that supplies the objective function value and at least an indicator of constraint satisfaction (or violation) to the solver.
Taking into account that current algorithms have to deal with continuously increasing problem complexity, the constrained functions are ideally designed in a scalable fashion Whitley1996 (). Scalability with respect to the search space dimension, and also the number of constraint functions, permits an understanding of the inherent problem complexity. It further allows assessing the influence of these factors on the algorithm performance. In this regard, the creation of artificial test problems represents a much easier way to generate constrained test functions. On the downside, such test problems are usually easier to solve than real-world problems. However, real-world problems are hardly scalable as they often constitute a purpose-built mathematical representation of a certain application. Modifications in terms of dimension or constraint numbers may result in a change of the problem structure. Further, the design of constrained test functions should incorporate characteristics that are commonly observed in real-world situations. This way, algorithmic ideas that proved successful on the benchmark suite can be transferred to corresponding real-world applications with partly similar characteristics.
Building clusters of constrained problems with similar features facilitates insight into the algorithm performance on each of the problem subgroups. It further supports the decision whether an algorithmic idea is useful when dealing with specific real-world applications of a certain characteristic Mersmann2015 (). For example, regarding a practical application that involves satisfying a great number of constraints, algorithms that have been observed to perform well on test problem subgroups with similar features are of interest. These are usually expected to be better suited than the collectively best algorithm, which ultimately might represent a compromise over all benchmark problems.
Moreover, the design of problem instances should preferably exclude biases towards certain algorithm classes. To this end, problem formulations aligned with the Cartesian axes should be avoided. Further, problems whose optimum is located on the boundary of the box may tend to favor EA that use specific box-constraint handling techniques. Such issues may be bypassed by considering different instances of a problem, e.g. by introducing small modifications with respect to the orientation of the feasible region or the location of the optimum (see Sec. 2). The creation of new instances is usually simpler for theoretically derived constrained functions. Real-world problems determined by specific application cases usually have a rather rigid formal representation without any information about the optimum.
Order relation
Regardless of the working principles used within an algorithm, a benchmark environment that compares algorithms based on solution quality needs to define a consistent order relation for ranking candidate solutions provided by an algorithm.
Quality indicators
Multiple aspects of algorithm performance have to be covered by the experimental design Matott2012 (); johnson2002 (). The benchmark environment has to use a number of well-defined quality indicators that are computed in the experiments. The quality indicators reflect the suitability of a respective algorithm for a specific constrained function, a subgroup of constrained functions, and the whole problem collection. Moreover, the quality indicators form the basis for algorithm comparison. That is, the benchmarks essentially need to introduce measures of effectiveness, efficiency, and variability. A high effectiveness of an algorithm refers to its ability to realize solutions close to the best-known or optimal solution of a problem. An efficiency measure, on the other hand, accounts for the amount of resources (e.g. function evaluations or time) consumed for computing high-quality solutions. Further, a measure of variability quantifies the reliability of an algorithm in realizing equally good candidate solutions in multiple independent runs. There exist multiple ways to define such indicators. Hence, it is left to the benchmark designers to choose the most appropriate measures of algorithm performance for the corresponding problems.
To obtain the quantity of variability, benchmarking of randomized algorithms involves running multiple independent algorithmic runs on the same problem instances. The appropriate number of repetitions is connected to the choice of quality indicators MoreWild2009 (). In order to obtain reasonable statistics a minimum number of 10 to 25 algorithm runs is usually recommended.
Termination
A benchmark collection might determine strict rules on the termination conditions for participating algorithms, e.g. a fixed budget of function evaluations. Another approach is to set multiple targets for an optimization strategy, with termination taking place after the final target is hit. By measuring the number of function evaluations needed to reach a specified target, a notion of algorithm speed can additionally be established. However, introducing targets assumes knowledge about the optimal function values of the constrained problems.
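The target-based notion of algorithm speed can be sketched as follows. This is a minimal illustration and not part of any benchmark suite: the function name `evaluations_to_targets`, the format of the best-so-far trajectory, and the target values are assumptions made for this example.

```python
from typing import List, Optional, Sequence


def evaluations_to_targets(
    best_so_far: Sequence[float], targets: Sequence[float]
) -> List[Optional[int]]:
    """Return, per target, the first evaluation count at which it was hit.

    best_so_far[i] is the best objective value known after i + 1 function
    evaluations (minimization). A target counts as hit once the best-so-far
    value is less than or equal to it. Targets never reached map to None.
    """
    hits: List[Optional[int]] = []
    for target in targets:
        hit = None
        for evals, value in enumerate(best_so_far, start=1):
            if value <= target:
                hit = evals
                break
        hits.append(hit)
    return hits


# Hypothetical best-so-far trajectory of a single run (illustrative values).
trajectory = [5.0, 3.0, 3.0, 0.9, 0.2, 0.05]
# Targets expressed as tolerated gaps to an assumed known optimal value of 0.
hits = evaluations_to_targets(trajectory, targets=[1.0, 0.1, 0.001])
print(hits)  # [4, 6, None]
```

The `None` entry marks a target that was not reached within the recorded evaluations, which is why a fixed budget is still needed as a fallback termination rule.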
3.3 Reporting
This section takes into account useful principles that support a reproducible and comprehensible presentation of obtained algorithm results. Further, it is concerned with the aspect of algorithm comparison and mentions the need of encouraging algorithm developers to thoroughly report algorithmic details.
Newsworthiness and presentation
To ensure that meaningful results are generated, the benchmarking environment can support the user by providing a performance baseline. Such a baseline may represent performance results obtained by application of comparable algorithms for constrained optimization. If a collection of algorithm results is not present, even the performance results of random search can be considered useful. Such information is necessary to realize whether the benchmarked algorithm is, in fact, superior for a number of problems. This way, publications with respect to already dominated algorithmic ideas can be avoided.
The performance results have to be presented in informative ways to support the interpretation of the individual algorithmic behavior. This is preferably realized by stipulating a presentation style that uses a combination of tables and figures. By providing aggregated algorithm results for the complete benchmark collection as well as for predefined constrained function subgroups, the benchmark suite allows for establishing a connection between a tested algorithm and the constrained problems that it is suited to solve.
Ranking of algorithms
Alongside the presentation of individual algorithm performance, it is the purpose of a benchmark environment to answer the question of which algorithm is best suited for solving (a subset of) the benchmark problems. The comparability of the algorithmic results is ensured by defining an appropriate ranking procedure.
Regarding constrained benchmarking functions, comparing algorithm results requires an ordering approach that is able to distinguish between feasible and infeasible realizations of the obtained quality indicators. A suitable representation of such an order relation is provided by the lexicographic ordering that has been defined in the context of Sec. 2. By introducing an order of priority to multiple quality indicators, the lexicographic ordering can be analogously defined to determine a proper algorithm ranking. The question of which quality indicators to use for ranking competing algorithms involves a certain degree of subjectivity. For that reason, it is recommended in Mersmann2015 () to make use of consensus rankings which comprise more than one order relation. This way, a consensus ranking allows computing an appropriate algorithm ranking over the whole benchmark suite, or subsets of constrained problems, respectively.
In order to decide whether comparably small performance differences can be considered significant, the algorithm comparison usually benefits from factoring in statistical hypothesis testing. Being less restrictive than parametric approaches and requiring smaller sample sizes, nonparametric tests are usually recommended when testing EA realizations for statistical significance Garcia2009 (). However, statistical and practical significance are not necessarily equivalent and a wellestablished graphical representation of the algorithm results may suffice rardin2001experimental ().
Algorithm description
When providing benchmark results, a proper characterization of the tested algorithm should be mandatory. Such a description includes the detailed motivation of prior investigations and a comprehensive description of the implemented algorithmic ideas. Further, an exact pseudocode representation is desirable to illustrate the working principles. Among others, this includes a specification of box-constraint handling techniques, or the use of (approximated) gradient information, respectively. All algorithm-specific strategy parameters need to be reported, ideally together with an explanation of their impact on the algorithm performance.
PC configuration
When it comes to measuring the computational running time of an algorithm, the users of benchmark collections should be required to report the complete PC configuration. This includes detailed information about the processor architecture, memory, operating system, and the programming language, confer CEC2017 (). The use of performance benchmarks to calibrate algorithm speed is also recommended in order to obtain an impression of the system-dependent performance. This way, algorithm comparability can be maintained over long periods of time johnson2002 ().
Runtime and algorithm complexity
In the context of derivativefree optimization, runtime might be measured in terms of the number of function evaluations executed hansen2016perf (). Having a machineindependent performance criterion, the measurement of CPU time (or wallclock time) may be regarded irrelevant. This approach assumes the availability of welldefined algorithm targets, e.g. knowledge about the optimal solution of the constrained functions that has to be approached with reasonable accuracy.
However, benchmark suites may concentrate on the computation of different indicators like mean or median solution quality. Such studies may argue that their focus is limited to the effectiveness of the algorithms and that runtime can be neglected in this context. Yet, regardless of the primary goals of a benchmark set, the algorithm running time should be reported johnson2002 (). It can be used to indicate algorithm complexity, i.e. running time tradeoffs that are related to increased solution quality and vice versa. Further, it provides a notion of the computational effort for reproducing the reported results and may provide useful information for assessing parallelization attempts.
In any case, plain instructions for computing the algorithm speed have to be provided. This is achieved by indicating whether the calculations are performed for only one exemplary algorithm run or whether they consider all repetitions. Further, the running time may cover all preprocessing and initialization steps, or it might only focus on the main loop of the considered algorithm. Ideally, the complete running time of the algorithm should be measured and reported relative to reproducible performance benchmarks.
4 The CEC competition on constrained realparameter optimization
The test function sets defined in the context of the IEEE Congress on Evolutionary Computation (CEC) competitions on single objective constrained real-parameter optimization are arguably the most common test collections for benchmarking randomized search algorithms. The CEC competitions have been organized in 2006 CEC2006 (), 2010 CEC2010 (), and 2017 CEC2017 (). Each of these competitions introduced a specific set of constrained test problems in line with (COP). The test function sets are supported by a policy for the computation of comprehensive performance indicators and for reporting algorithm results.
The remainder of this section is concerned with reviewing the benchmarking conventions associated with the mentioned CEC benchmark environments as well as their characteristic features. To this end, the benchmark definitions are examined by taking into account three different aspects: the basic benchmarking conventions, the experimental setup, and the reporting of algorithm results. A summary of important features of the three constrained benchmark environments is provided in Table 1.
4.1 Benchmarking conventions
The CEC2006 benchmarks
The succeeding benchmark definitions for CEC2010 CEC2010 () introduced 18 new constrained benchmark problems. Yet, the origin of the corresponding constrained functions is not easily comprehensible. Only one constrained function was adopted from the CEC2006 benchmarks. The benchmark set introduced variations of 8 distinct objective functions that differ with respect to the application of parameter translations and/or rotations. Further variations are obtained by introducing different numbers and types of constraint functions. Some objective and constraint functions can be attributed to a collection of unconstrained problems CEC2005 (); schwefel1995evolution (), e.g. the Rosenbrock function, the Griewank function, and the Weierstrass function. Other function definitions were obtained by use of the test-case generator proposed in Michalewicz2000 (). However, being defined in scalable form with respect to the search space dimension, the constrained test problems have to be solved in dimension and .
Benchmark name  CEC2006  CEC2010  CEC2017 

Minimal  
Maximal  
Number of constrained functions  
Number of distinct obj. functions  
Minimal number of constraints  
Maximal number of constraints  
Avg. number of constraints  
Scalable problems included  no  yes  yes 
Budget of function evaluations  
Number of separable problems  
Avg. size of  
Number of problems with 
Considering even larger search space dimensions (, , , and ), a novel collection of 28 benchmark problems was created for the CEC2017 competition CEC2017 (). The 2017 constrained function definitions are designed by taking new combinations of the building blocks provided in CEC2005 (); schwefel1995evolution (); Michalewicz2000 (). However, some overlaps do exist. It is claimed that the CEC2006 and the CEC2010 benchmarks have been successfully solved CEC2017 (). Yet, the older CEC testbeds are still very popular for benchmarking direct search algorithms and particularly Evolutionary Algorithms, e.g. GONG2014884 (). In contrast, the CEC2017 problem definitions are reused for the CEC competition on single objective constrained real-parameter optimization taking place during the IEEE World Congress on Computational Intelligence (WCCI) in 2018.
The constrained function definitions are fully presented in the corresponding technical reports. However, some constrained problems lack a description of the translation vectors and rotation matrices. These can only be understood by taking into account their implementations. The corresponding code is maintained on the respective website of the competition organizers websource01 (). It is openly available in the programming languages C and MATLAB.
The consecutive development from the CEC2006 towards the CEC2017 benchmarks is not entirely motivated in the corresponding technical reports. Modifications with respect to performance indicators or algorithm ranking approaches are not entirely transparent. Due to inexact instructions, the documentation sometimes leaves room for interpretation.
The three constrained benchmark collections count one evaluation of the whole constrained function as a single function evaluation. That is, the evaluation of the objective function value associated with a single candidate solution is accompanied by the evaluation of all corresponding equality and inequality constraints. The use of gradient information is only permitted if the gradient is approximated numerically and the consumed function evaluations are properly taken into account.
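This accounting rule can be illustrated with a small sketch. It is not taken from the competition code: the class `CountingProblem`, the helper `forward_difference_gradient`, and the toy problem are hypothetical names and values chosen for this example. The wrapper charges one function evaluation whenever the objective and all constraints are evaluated together, so a forward-difference gradient in dimension N costs N + 1 evaluations.

```python
from typing import Callable, List, Sequence, Tuple

# One full evaluation returns (objective value, inequality values, equality values).
Evaluation = Tuple[float, List[float], List[float]]


class CountingProblem:
    """Wrap a constrained function and count each full evaluation once."""

    def __init__(self, evaluate: Callable[[Sequence[float]], Evaluation]):
        self._evaluate = evaluate
        self.evaluations = 0

    def __call__(self, x: Sequence[float]) -> Evaluation:
        self.evaluations += 1
        return self._evaluate(x)


def forward_difference_gradient(
    problem: CountingProblem, x: Sequence[float], step: float = 1e-6
) -> List[float]:
    """Approximate the objective gradient; consumes len(x) + 1 evaluations."""
    f0 = problem(x)[0]
    gradient = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += step
        gradient.append((problem(shifted)[0] - f0) / step)
    return gradient


def toy_problem(x: Sequence[float]) -> Evaluation:
    """Hypothetical constrained function used only for illustration."""
    objective = sum(v * v for v in x)
    inequalities = [1.0 - x[0]]  # g(x) <= 0
    equalities = [x[0] - x[1]]   # h(x) = 0
    return objective, inequalities, equalities


problem = CountingProblem(toy_problem)
gradient = forward_difference_gradient(problem, [1.0, 2.0])
print(problem.evaluations)  # 3 evaluations charged to the budget
```

Any budget check in an algorithm would then compare `problem.evaluations` against the allowed maximum, regardless of whether the evaluations were spent on candidate solutions or on gradient approximation.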
The CEC competitions for constrained real-parameter optimization do not enforce the feasibility of search space parameter vectors. In this respect, equality and inequality constraints of a constrained function (COP) are always considered as relaxable, cf. option (S2) in Sec. 2. That is, the algorithms are allowed to move in the unconstrained search space. Each candidate solution, either feasible or infeasible, may be evaluated and used within the search process of a strategy. It should be mentioned that the permission to use infeasible solutions during the search may significantly reduce the problem complexity. For instance, an algorithm can operate outside the feasible region until it approaches the optimizer sufficiently closely.
A specific treatment of box-constraints is not stipulated by the CEC benchmarks. The technical reports are not clear on whether box-constraints have to be regarded as relaxable (S2) or unrelaxable (S1). This ambiguity can potentially result in different approaches, and ultimately in significant performance differences LiaoMMS2014 (). Taking into account the most successful strategies reported in CEC competitions HuangQS06 (); TakahamaS06 (); TakahamaS10 (); MallipeddiS10 (); PolakovaT2017 (); TvrdikP2017 () and after inspecting the related openly available source codes, to the best of our knowledge, all algorithms assumed situation (S1) as introduced in Sec. 2. Although reporting the full algorithm can be considered scientific standard, some papers omit such information. Further, the mechanisms to treat box-constraint violations may vary. In order to ensure the reproducibility of the benchmark results, the test beds have to explicitly demand a statement on the box-constraint handling techniques used by an algorithm.
For the computation of the quality indicators (see Sec. 4.3), the CEC framework sorts the algorithm realizations of 25 independent runs on the basis of the lexicographic ordering relation introduced in (8). That is, feasible solutions are ranked based on their objective function values. They always dominate infeasible solutions which are distinguished with respect to the related magnitude of their mean constraint violation (see Sec. 4.2, Eq. (9)).
4.2 Experimental design
The CEC competitions on constrained real-parameter optimization do not provide an initially feasible region or candidate solution. Instead, box-constraints are specified for each constrained problem and an algorithm is supposed to randomly sample its starting point or initial population inside of the set of (COP).
Hence, the feasibility of initial candidate solutions is not ensured. In order to be competitive on the CEC benchmarks, algorithms need to be able to deal with infeasible solutions. This is affirmed when considering the size of the feasible region relative to , i.e. the parameter (cf. Eq. (3)). Looking at Table 1, the average value was reduced over the years. Whereas the ratio of constrained problems with a feasible region greater than was in 2006, this number dropped to in 2010, and even further to for constrained functions specified in 2017.
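The relative size of the feasible region can be approximated by uniform sampling inside the box. The following sketch assumes that the parameter in question is the fraction of the box that is feasible, and estimates it by plain Monte Carlo sampling. The function name `estimate_feasible_ratio`, the sample size, and the toy feasibility test are assumptions for illustration and do not reproduce any CEC problem.

```python
import random
from typing import Callable, Sequence


def estimate_feasible_ratio(
    is_feasible: Callable[[Sequence[float]], bool],
    lower: Sequence[float],
    upper: Sequence[float],
    samples: int = 10000,
    seed: int = 0,
) -> float:
    """Estimate the fraction of the box [lower, upper] that is feasible."""
    rng = random.Random(seed)  # fixed seed for reproducible estimates
    hits = 0
    for _ in range(samples):
        x = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        if is_feasible(x):
            hits += 1
    return hits / samples


def inside_unit_disc(x: Sequence[float]) -> bool:
    """Toy feasible region: the unit disc, whose share of the box is pi / 4."""
    return x[0] ** 2 + x[1] ** 2 <= 1.0


ratio = estimate_feasible_ratio(inside_unit_disc, lower=[-1.0, -1.0], upper=[1.0, 1.0])
print(ratio)  # close to pi / 4, i.e. roughly 0.785
```

For problems with equality constraints the estimate is typically zero unless a tolerance is applied, which matches the observation that randomly sampled initial populations are rarely feasible.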
Regarding the CEC2006 competition, the detailed benchmark function specifications can be found in the technical report CEC2006 (). The benchmark set consists of constrained functions of varying search space dimensions between and . The given constrained functions are fixed in terms of the problem dimension and the number of constraints. Each objective function is restricted by in between and linear and nonlinear (in)equality constraints, refer to Table 1. The optimal solution, or at least the bestknown solution, is provided for each constrained function.
The 2006 benchmark definitions include fully separable constrained functions. They further refrain from using rotations of the parameter vectors. By doing so, the benchmarks carry a potential bias towards strategies that search predominantly along the coordinate axes of the search space SuttonLW2007 (). In this regard, the CEC2006 benchmarks favor algorithms that use coordinate-wise search or differences of obtained candidate solutions, e.g. Coordinate Search or Differential Evolution variants.
The benchmark definitions of the CEC2010 competition CEC2010 () can be considered a refinement with respect to this issue. As mentioned above, the constrained problems of CEC2010 can be attributed to different sources CEC2005 (); schwefel1995evolution () and are partly designed by use of the test-case generator Michalewicz2000 (). The 2010 competition included constrained functions in dimensions , and , respectively. The formulation of scalable constrained functions allows for conclusions with respect to an algorithm’s ability to deal with growing search space dimensions. The mentioned bias towards coordinate search and separability was partly resolved by application of predefined search space rotations. Each objective function is accompanied by from to constraint functions. Hence, the average number of constraints per constrained function drops from in 2006 to about in 2010. In this respect, the CEC2010 competition problems represent a fresh start instead of being a progression of the CEC2006 problem definitions. The question remains to what extent the small number of constraints can actually cover real-world problem aspects. Moreover, best-known solutions to the benchmark problems are no longer reported. This impedes gathering information about the effectiveness of an algorithm.
Still, out of problems are completely separable and do not apply any rotations to the parameter vectors. While the formal description of those transformations is not satisfactorily explained in the technical report, it is contained in the corresponding competition source code websource01 (). There, the transformations are deterministically specified, and different, for each individual constrained function.
The constrained function definitions of the CEC2017 competition are quite similar to those of its predecessor. The corresponding technical report CEC2017 () states scalable constrained optimization problems essentially attributable to the same sources as the CEC2010 benchmarks. The latest CEC collection considers not only a larger number of problems but also larger search space dimensions: , , , and . In total, the competition comprises constrained functions. The number of constraints is between and , i.e. the average number of constraints per problem is comparable to the CEC2010 benchmarks (refer to Table 1). Similarly to the 2010 version, information on optimal parameter vectors or function values is omitted. Among these problem definitions, out of constrained functions are separable. Hence, a small bias towards strategies that predominantly search parallel to the Cartesian axes of the search space cannot be fully excluded.
The CEC benchmark environments do not establish subgroups of constrained problems. That is, results obtained by an algorithm can hardly be associated with a certain problem characteristic. However, at least the CEC2017 collection would allow for rough categorizations of the constrained problems. For example, the constrained problems , , and respectively, share the same objective function but differ in the number and type of their constraint functions. Problem classes that take into account the number or type of the constraints would also be conceivable. This would be useful for extracting additional information about the applicability of algorithmic ideas to such problem classes.
All CEC benchmark sets share the definition of a feasible solution introduced in Sec. 2. Due to the issue of enforcing the generation of candidate solutions that exactly satisfy the equality constraints, the error margin of is used in all three competitions.
Every algorithm has to perform 25 independent runs on a single instance of each constrained optimization problem.
In each run, the best result so far is monitored at three distinct points of the search process, i.e. after 10%, after 50%, and after 100% of the assigned function evaluation budget have been consumed.
(9) 
where is the aggregated number of equality and inequality constraints of problem (COP). Note that the constraint violation is obtained according to Eq. (5). The term specifies the number of violated constraints with violation greater than , , and , respectively.
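The two violation-related quantities described here can be sketched as follows. Since the formulas themselves are not reproduced in this excerpt, the sketch rests on stated assumptions: inequality constraints have the form g(x) <= 0, equality constraints h(x) = 0 count as satisfied within a tolerance, and the mean constraint violation is the summed violation divided by the total number of constraints. The function names, the tolerance, and the three thresholds are hypothetical and are passed as arguments rather than fixed.

```python
from typing import List, Sequence


def constraint_violations(
    g_values: Sequence[float], h_values: Sequence[float], eq_tolerance: float
) -> List[float]:
    """Per-constraint violations for g(x) <= 0 and h(x) = 0 (assumed form)."""
    inequality = [max(0.0, g) for g in g_values]
    # An equality constraint counts as satisfied within the tolerance.
    equality = [abs(h) if abs(h) > eq_tolerance else 0.0 for h in h_values]
    return inequality + equality


def mean_violation(violations: Sequence[float]) -> float:
    """Summed violation divided by the aggregated number of constraints."""
    return sum(violations) / len(violations)


def violation_counts(
    violations: Sequence[float], thresholds: Sequence[float]
) -> List[int]:
    """Number of constraints whose violation exceeds each threshold."""
    return [sum(1 for v in violations if v > t) for t in thresholds]


# Illustrative values only; tolerance and thresholds are hypothetical.
violations = constraint_violations(
    g_values=[-0.5, 0.2], h_values=[0.05], eq_tolerance=0.01
)
mean_value = mean_violation(violations)
counts = violation_counts(violations, thresholds=[0.1, 0.01, 0.001])
print(violations, counts)  # [0.0, 0.2, 0.05] [1, 2, 2]
```

A candidate solution is feasible under these assumptions exactly when its mean violation is zero.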
The results of these runs are then used to compute statistics for algorithm evaluation and comparison. In order to sort the realized candidate solutions, the CEC benchmarks introduce a lexicographic ordering with respect to and . That is, two candidate solutions and are sorted according to
(10) 
Note that Eq. (10) is defined analogously to the order relation (8), but makes use of Eq. (9) instead of Eq. (5). A comprehensive list of the utilized quality indicators is provided in Table 2.
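A minimal sketch of such a lexicographic ordering is given below. It assumes that each run is summarized by its final objective value and mean constraint violation, and that a violation of zero marks feasibility. The names `RunResult`, `sort_runs`, and `median_run` are hypothetical. Sorting by the pair (violation, objective) places feasible results first in order of objective value, followed by infeasible results in order of violation.

```python
from typing import List, NamedTuple, Sequence, Tuple


class RunResult(NamedTuple):
    objective: float
    mean_violation: float  # 0.0 marks a feasible solution


def lexicographic_key(result: RunResult) -> Tuple[float, float]:
    """Violation has priority; the objective value decides among equals."""
    return (result.mean_violation, result.objective)


def sort_runs(results: Sequence[RunResult]) -> List[RunResult]:
    """Order run results from best to worst."""
    return sorted(results, key=lexicographic_key)


def median_run(results: Sequence[RunResult]) -> RunResult:
    """Middle element of the ordered results (upper median for even counts)."""
    ordered = sort_runs(results)
    return ordered[len(ordered) // 2]


# Illustrative results of five hypothetical runs.
runs = [
    RunResult(1.0, 0.3),
    RunResult(5.0, 0.0),
    RunResult(2.0, 0.0),
    RunResult(0.5, 0.1),
    RunResult(3.0, 0.0),
]
ordered = sort_runs(runs)
print(ordered[0], ordered[-1], median_run(runs))
```

In this example the infeasible run with objective value 0.5 is ranked behind every feasible run, although its objective value is the smallest.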
Notation  Description  2006  2010  2017 

Best  The objective function value corresponding to the best found solution in 25 independent algorithm runs with respect to Eq. (10).  +  +  + 
Median  The objective function value associated with the median solution of the 25 algorithm realizations according to Eq. (10).  +  +  + 
A vector containing the number of constraints with violation greater than , , and associated with the median solution.  +  +  +  
The mean constraint violation value associated with the median solution , refer to Eq.(9).  +  +  +  
Mean  The mean objective function value according to the 25 independent algorithm runs.  +  +  + 
Worst  The objective function value corresponding to the worst found solution .  +  +  + 
Std  The standard deviation according to the objective function values obtained in 25 runs.  +  +  + 
The ratio of feasible algorithm realizations over the number of total runs.  +  +  +  
The ratio of successful algorithm runs, cf. (11), over the number of total runs was computed.  +      
The quotient of the mean number of function evaluations consumed in successful runs and the success ratio is referred to as success performance .  +      
The mean constraint violation corresponding to the 25 independent algorithm runs.      + 
The CEC2006 benchmark set provided the globally optimal parameter vectors of each test problem. Using this information the effectiveness of an algorithm was determined in terms of the deviation of the bestsofar solution from the optimum . It was further used to calculate the success rate () of a specific algorithm. The success rate was defined as the ratio of successful runs and the number of total runs. Hence, an algorithm run is considered successful if at least one feasible solution with
(11) 
is realized. Note that, by distinguishing two feasible candidate solutions based on their deviation from the known optimum, the CEC2006 benchmarks use a slightly different way of proceeding than presented in (10).
Since information about the global optima is no longer available, the success rate was replaced with the feasibility rate () in the succeeding CEC competitions. The feasibility rate is the ratio of those algorithm runs that realized at least one feasible solution to the total number of algorithm runs.
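The rate-based indicators can be sketched in a few lines. The sketch assumes that a run counts as successful when it found a feasible solution whose gap to the known optimum does not exceed a tolerance; the tolerance is passed as an argument because the exact success condition is not reproduced here. The names `Run`, `feasibility_rate`, `success_rate`, and `success_performance` are hypothetical. The success performance follows the description in Table 2: mean evaluations of successful runs divided by the success rate.

```python
from typing import NamedTuple, Optional, Sequence


class Run(NamedTuple):
    error: Optional[float]  # gap of the best feasible value to the known optimum; None if no feasible solution was found
    evaluations: int        # evaluations consumed until success, or the full budget


def feasibility_rate(runs: Sequence[Run]) -> float:
    """Share of runs that found at least one feasible solution."""
    return sum(1 for r in runs if r.error is not None) / len(runs)


def is_successful(run: Run, tolerance: float) -> bool:
    return run.error is not None and run.error <= tolerance


def success_rate(runs: Sequence[Run], tolerance: float) -> float:
    """Share of runs that reached the optimum within the tolerance."""
    return sum(1 for r in runs if is_successful(r, tolerance)) / len(runs)


def success_performance(runs: Sequence[Run], tolerance: float) -> Optional[float]:
    """Mean evaluations of successful runs divided by the success rate."""
    successful = [r for r in runs if is_successful(r, tolerance)]
    if not successful:
        return None  # undefined without any successful run
    mean_evaluations = sum(r.evaluations for r in successful) / len(successful)
    return mean_evaluations / success_rate(runs, tolerance)


# Four hypothetical runs with an illustrative tolerance.
runs = [Run(1e-5, 1000), Run(1e-5, 3000), Run(0.5, 5000), Run(None, 5000)]
print(feasibility_rate(runs), success_rate(runs, 1e-4), success_performance(runs, 1e-4))
# 0.75 0.5 4000.0
```

Dividing by the success rate penalizes algorithms that succeed quickly but only rarely, since the expected effort includes the unsuccessful attempts.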
Regarding the termination criterion used by the CEC competitions, each constrained problem comes with a fixed budget of function evaluations.
4.3 Reporting
By primarily representing test problems for the CEC competitions on constrained realparameter optimization, the corresponding technical reports do not make a statement on ensuring newsworthiness of the algorithm results. In order to participate in the mentioned competitions, algorithm results have to be published in a conference paper that has to pass a related review process. Accordingly, the novelty of algorithms is reviewed in this way. However, those authors that use the constrained CEC functions as benchmarks in a different context might need to be reminded of assessing the benefit of their algorithmic ideas. To this end, benchmark results of comparable algorithms should be supplied, e.g. results obtained by the winning strategies from earlier competitions or even by random search. Such information is considered useful for quickly evaluating the suitability of a novel algorithm and its competitiveness for the CEC competitions. As pointed out by GarciaMartinez2017 () in the context of unconstrained benchmarks, the comparison of novel algorithmic ideas with diverse stateoftheart strategies is essential to prevent the publication of already dominated results and to contribute to real progress in the respective field of research.
The final quality indicators computed for a specific algorithm have to be presented for every single constrained problem in a detailed table. Considering that the CEC benchmarks demand information on three stages (10%, 50%, and 100% of the function evaluation budget) of the search process, this presentation style appears rather lengthy. Table 3 illustrates the presentation guidelines corresponding to the CEC2017 benchmarks.
Budget  Indicator  p01  p02  p03  …  p28 
Best  
Median  
10%  Mean  
Worst  
Std  
Best  
50%  ⋮  
Best  
100%  ⋮  
Making use of one table per dimension, and per algorithm, leads to increasing space requirements when considering more search space dimensions. Furthermore, drawing conclusions with respect to algorithm performance differences is made very difficult. Additionally, not subsuming problems of similar characteristics impedes interpretations of the results.
The CEC2006 and CEC2010 benchmarks used convergence graphs to provide a more tangible notion of algorithm performance. In 2006, the convergence graphs illustrated the deviation of the objective function value from the optimum as well as the mean constraint violation plotted against the number of function evaluations in full-logarithmic scales. Instead of taking into account the median solution, the technical report of CEC2010 recommends illustrating the best out of 25 runs. The idea of convergence graphs was dropped with growing table sizes for the CEC2017 competition.
The benchmark collections demand a report of the configuration of the PC on which the experiments have been executed. To this end, the operating system, the CPU, the memory, the programming language used, and the algorithm have to be specified. This is intended to support algorithm comparability. However, the reports do not recommend a performance benchmark to calibrate a tested algorithm’s efficiency on the corresponding system. Such a performance baseline would retain comparability of algorithm results obtained on outdated systems.
With respect to algorithm reporting, the CEC related technical reports CEC2006 (); CEC2010 (); CEC2017 () require the complete description of the algorithm parameters used as well as their specific ranges. Further, algorithm designers are required to present guidelines for potential parameter adjustments and estimates of the corresponding costs in terms of function evaluations. The use of hand-tuned parameters for individual constrained functions is prohibited.
In order to give an impression of the algorithm complexity, three quantities have to be presented. The average of the computation time of evaluations, as well as , the complete computation time of a specific algorithm over all problems of similar dimensionality
(12) 
Here, denotes the number of constrained optimization problems with similar dimensionality of a respective benchmark function set. and are reported together with their relative difference .
To represent a meaningful quantity of algorithm complexity , the measurements and need to consider a sufficiently large number of function evaluations. However, such an approach can be problematic:
Imagine a DE algorithm TakahamaS10 () (or an EDA
Considering the ranking of competing algorithms, the presentation style promotes the need for a welldefined algorithm ranking. Unfortunately, the technical report of the CEC2006 competition CEC2006 () does not provide any motivation of a suitable ranking procedure at all. The presentation of the competition results is also of little help. Hence, the quality indicators used to obtain an algorithm ranking cannot be deduced.
While defined in different ways, the ranking schemes used for the CEC2010 and CEC2017 benchmarks are fully explained. The CEC2010 ranking method is based on a mean value comparison of two or more algorithms on each individual constrained problem. Algorithms that yield feasibility rates of are ordered based on their mean objective function values. Those algorithms realizing a feasibility rate in between are ranked according to their feasibility rate. Finally, strategies resulting in are ordered based on the mean constraint violations of all 25 runs. The total rank of an algorithm is obtained by summing up its ranks on all 36 individual problems (including dimensionalities and ) and the average rank is determined by
(13) 
This way the best algorithm is defined by the lowest rank value .
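The CEC2010-style ranking described above can be sketched as follows. The sketch assumes three tiers per problem: algorithms with a feasibility rate of one are ordered by mean objective value, algorithms with a feasibility rate strictly between zero and one by feasibility rate, and algorithms with a feasibility rate of zero by mean constraint violation, with the tiers ranked in this order. The average rank is assumed to be the total rank divided by the number of problems. All names and values are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class Summary(NamedTuple):
    feasibility_rate: float
    mean_objective: float
    mean_violation: float


def tier_key(summary: Summary) -> Tuple[int, float]:
    """Sort key implementing the three assumed tiers."""
    if summary.feasibility_rate >= 1.0:
        return (0, summary.mean_objective)
    if summary.feasibility_rate > 0.0:
        return (1, -summary.feasibility_rate)  # higher rate ranks first
    return (2, summary.mean_violation)


def rank_on_problem(summaries: Dict[str, Summary]) -> Dict[str, int]:
    """Assign ranks 1, 2, ... on one problem (ties keep insertion order)."""
    ordered = sorted(summaries, key=lambda name: tier_key(summaries[name]))
    return {name: position for position, name in enumerate(ordered, start=1)}


def total_ranks(problems: List[Dict[str, Summary]]) -> Dict[str, int]:
    """Sum the per-problem ranks of each algorithm."""
    totals: Dict[str, int] = {}
    for summaries in problems:
        for name, rank in rank_on_problem(summaries).items():
            totals[name] = totals.get(name, 0) + rank
    return totals


# Two hypothetical problems and three hypothetical algorithms.
problems = [
    {"A": Summary(1.0, 2.0, 0.0), "B": Summary(1.0, 1.0, 0.0), "C": Summary(0.4, 0.5, 0.2)},
    {"A": Summary(0.8, 3.0, 0.1), "B": Summary(0.0, 1.0, 0.3), "C": Summary(0.0, 2.0, 0.1)},
]
totals = total_ranks(problems)
averages = {name: total / len(problems) for name, total in totals.items()}
print(totals)    # {'B': 4, 'A': 3, 'C': 5}
print(averages)  # {'B': 2.0, 'A': 1.5, 'C': 2.5}
```

Algorithm A obtains the lowest total rank here although it never has the best mean objective value, which shows how strongly the feasibility rate dominates this scheme.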
The CEC2017 ranking method considers the mean objective function values as well as the median solution at the maximal allowed number of function evaluations. The first ranking of all competing algorithms is based on the mean values. After having completed all independent runs, for each constrained problem the algorithms are ordered with respect to their feasibility rate . The second ordering criterion is the magnitude of mean constraint violations. Finally, ties are resolved by considering the realized mean objective function values. This way, each algorithm obtains a rank on each constrained problem. The second ranking procedure relies on the median solutions. The first ordering step is concerned with the feasibility of the median solution. A feasible solution is better than an infeasible solution. Feasible solutions are then ordered by means of their objective function values and infeasible ones according to their mean constraint violations. On every constrained problem, each algorithm is assigned a rank . Having ranked all algorithms on every single constrained problem, the ranks are aggregated. That is, the total rank value of each algorithm is calculated as
(14) 
Again, the best algorithm obtains the lowest rank value .
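The two-stage CEC2017-style ranking can be sketched in the same spirit. The mean-based order uses feasibility rate (higher is better), then mean constraint violation, then mean objective value. The median-based order places feasible median solutions first by objective value, followed by infeasible ones by violation. The total rank is assumed to be the sum of both ranks over all problems. The names `ProblemResult`, `ranks`, and `total_rank`, as well as all numbers, are hypothetical.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple


class ProblemResult(NamedTuple):
    feasibility_rate: float
    mean_violation: float
    mean_objective: float
    median_objective: float
    median_violation: float  # 0.0 marks a feasible median solution


def mean_key(r: ProblemResult) -> Tuple[float, float, float]:
    return (-r.feasibility_rate, r.mean_violation, r.mean_objective)


def median_key(r: ProblemResult) -> Tuple[int, float]:
    if r.median_violation == 0.0:
        return (0, r.median_objective)
    return (1, r.median_violation)


def ranks(results: Dict[str, ProblemResult], key: Callable) -> Dict[str, int]:
    """Assign ranks 1, 2, ... on one problem (ties keep insertion order)."""
    ordered = sorted(results, key=lambda name: key(results[name]))
    return {name: position for position, name in enumerate(ordered, start=1)}


def total_rank(problems: List[Dict[str, ProblemResult]]) -> Dict[str, int]:
    """Sum of mean-based and median-based ranks over all problems."""
    totals: Dict[str, int] = {}
    for results in problems:
        by_mean = ranks(results, mean_key)
        by_median = ranks(results, median_key)
        for name in results:
            totals[name] = totals.get(name, 0) + by_mean[name] + by_median[name]
    return totals


# Two hypothetical problems and three hypothetical algorithms.
problems = [
    {
        "A": ProblemResult(1.0, 0.0, 3.0, 2.5, 0.0),
        "B": ProblemResult(1.0, 0.0, 2.0, 2.8, 0.0),
        "C": ProblemResult(0.6, 0.2, 1.0, 0.9, 0.1),
    },
    {
        "A": ProblemResult(0.2, 0.5, 1.0, 1.0, 0.4),
        "B": ProblemResult(0.2, 0.3, 4.0, 4.0, 0.3),
        "C": ProblemResult(1.0, 0.0, 9.0, 9.0, 0.0),
    },
]
totals = total_rank(problems)
print(totals)  # {'A': 9, 'B': 7, 'C': 8}
```

On the first problem the mean-based and median-based orders disagree about A and B, which illustrates why the two rankings are combined rather than used individually.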
Comparing these two ranking methods, the CEC2017 ranking can be regarded as a progression. It no longer uses a single ranking (average case quality in the broadest sense), but the consensus of average case and median case quality. This is in line with Mersmann2015 (), where the use of so-called consensus rankings is recommended for algorithm comparison. Consensus rankings are distinguished into positional and optimization-based methods.
The definition of a consensus ranking is by no means unique as it is rather sensitive with respect to the choice of individual rankings and the number of considered algorithms. A desirable property of a consensus ranking would be the Independence of Irrelevant Alternatives (IIA) criterion Arrow1950 () stating that changes in the number of algorithms must not affect the pairwise preference in the consensus ranks. That is, if the consensus ranks algorithm first and algorithm second among five distinct algorithms, then disregarding any other algorithm should not yield a consensus rank change between and . However, this criterion is hardly satisfied by most intuitive consensus methods and, after all, a “best” consensus ranking usually does not exist. Yet, a good consensus method is likely to promote insight into advantageous algorithmic ideas and might highlight poor performance, respectively. For a description and a more detailed discussion of sophisticated consensus methods, the reader is referred to Mersmann2015 ().
The positional consensus ranking of the CEC2017 benchmarks is created by simply adding the mean and median ranks. This can result in potentially undesirable consensus rankings
Moreover, the CEC ranking approaches aggregate algorithm rankings over multiple dimensions. This way, algorithms which are especially well performing in lower dimensions are potentially overrated and the overall ranking might be prejudiced. Further, algorithms that are particularly well performing in larger dimensions cannot be clearly identified. Aggregation over dimension should be avoided, because the problem dimension is a parameter known in advance that can and should be used for algorithm design decisions hansen2016perf ().
To conclude this review of the constrained CEC benchmarks, some of the mentioned aspects could be incorporated in future CEC competitions on constrained real-parameter optimization. In doing so, algorithm developers would benefit from the introduction of well-designed problem subgroups that support the identification of particularly difficult problem features. Further, competing algorithms should be ranked for individual dimensions in order to obtain an intuition of the scalability of an algorithm. A competition winner might then be determined by weighting these ranks.
5 The COCO framework
The Comparing Continuous Optimizers (COCO) suite hansen2016coco () provides a platform to benchmark and compare continuous optimizers for numerical (nonlinear) optimization. Only recently, the development of a COCO branch for constrained optimization problems started. The related code is available on the project website.
While the COCO BBOB-constrained testbed is not yet operational, being close to completion, the corresponding benchmarking principles and the associated test problem structure are not expected to change substantially anymore. As the COCO framework represents the currently most elaborate benchmarking environment for EAs, not mentioning the constrained COCO principles would render the present review incomplete.
However, caution is advised with respect to small changes in individual test function aspects, e.g. the distance of the constrained optimal solution from the unconstrained optimal solution.
The rest of this section is concerned with pointing out the COCO BBOBconstrained benchmarking conventions, the related test problem definitions, the evaluation criteria as well as the presentation style.
5.1 Benchmarking principles
The COCO BBOBconstrained suite is distinctly built on the unconstrained COCO framework. The COCO platform assists algorithm engineers in setting up proper experiments for algorithm comparison. It provides simple interfaces to multiple programming languages (C/C++, Python, MATLAB/Octave, and Java) which makes the benchmarks easily accessible. Users are not involved in the evaluation of constrained functions or the logging process of algorithm results. A corresponding postprocessing module facilitates the illustration and the meaningful interpretation of the collected algorithm data. In this respect, COCO reduces the benchmarking effort for algorithm developers with respect to implementation time.
The benchmark functions are considered to represent blackbox functions for the tested algorithms. Still, the objective functions are explicitly stated in mathematical form in the documentation. This allows for a deeper understanding of the individual problem difficulties and thus of an algorithm’s (in)capabilities.
In a first step, the COCO BBOB-constrained testbed confines itself to eight well-known objective functions from the context of the unconstrained COCO suite. These objective functions are provided with a varying number of (almost) linear inequality constraints.
However, the actual test instances are randomly generated for each algorithm run, see Sec. 5.2.
A very comprehensive explanation of the COCO framework and the associated constrained problems can be found on the COCO documentation website.
The COCO guideline for counting function evaluations in the constrained setting involves distinguishing objective function evaluations and constraint evaluations. Still, one constraint evaluation is identified with the evaluation of all individual constraint functions at a time. Accordingly, a specified budget of function evaluations needs to be split.
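The counting convention can be sketched as a small wrapper. This is an illustrative assumption and not part of the COCO code base; the class name `EvaluationCounter` and its methods are hypothetical. One constraint evaluation covers all constraint functions at once, and objective and constraint evaluations draw on the same budget.

```python
class EvaluationCounter:
    """Count objective and constraint evaluations against one shared budget."""

    def __init__(self, objective, constraints, budget):
        self.objective = objective
        self.constraints = constraints  # list of functions g_i(x)
        self.budget = budget
        self.objective_evals = 0
        self.constraint_evals = 0

    @property
    def used(self):
        # The consumed budget is the sum of both evaluation types.
        return self.objective_evals + self.constraint_evals

    def _check_budget(self):
        if self.used >= self.budget:
            raise RuntimeError("evaluation budget exhausted")

    def evaluate_objective(self, x):
        self._check_budget()
        self.objective_evals += 1
        return self.objective(x)

    def evaluate_constraints(self, x):
        # One call evaluates all constraint functions and costs one evaluation.
        self._check_budget()
        self.constraint_evals += 1
        return [g(x) for g in self.constraints]
```

An algorithm that checks feasibility before every objective evaluation therefore consumes two evaluations per candidate solution.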
The formal constrained function definitions do not specify any box-constraints, refer to (15). In this regard, guidelines for the treatment of box-constraints are not strictly needed. Yet, the BBOB-constrained suite provides the user with the subroutines cocoProblemGetSmallestValuesOfInterest and cocoProblemGetLargestValuesOfInterest to determine the lower bound and the upper bound for each constrained problem. While the optimal solution is located inside the box according to Eq. (1), evaluations of candidate solutions outside the box are not prohibited.
Whether the box-constraints need to be enforced in every step or not is of course a design question. In any case, the benchmark designers need to provide plain instructions with respect to the treatment of box-constraints during the search process. The use of box-constraint handling may be beneficial on some constrained problems. Therefore, such instructions are necessary to obtain comparable algorithm results. Moreover, algorithm developers need to be urged to report the specific box-constraint handling techniques used.
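To illustrate why the reporting matters, the following sketch contrasts two widely used repair techniques, clipping and reflection. Neither is prescribed by the benchmark definitions, and the function names are hypothetical. The same out-of-box candidate is mapped to different points, so results obtained with different repairs are not directly comparable.

```python
def clip_to_box(x, lower, upper):
    """Project each coordinate onto the nearest bound."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]


def reflect_into_box(x, lower, upper):
    """Mirror each coordinate at the violated bound until it lies in the box."""
    repaired = []
    for xi, lo, hi in zip(x, lower, upper):
        width = hi - lo
        # Fold the coordinate into one period of length 2 * width.
        offset = (xi - lo) % (2.0 * width)
        repaired.append(lo + (offset if offset <= width else 2.0 * width - offset))
    return repaired


lower, upper = [-5.0, -5.0, -5.0], [5.0, 5.0, 5.0]
candidate = [6.0, -7.0, 1.0]
print(clip_to_box(candidate, lower, upper))       # [5.0, -5.0, 1.0]
print(reflect_into_box(candidate, lower, upper))  # [4.0, -3.0, 1.0]
```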
5.2 Experimental design
The standard BBOBconstrained optimization problem reads
(15)  
A summary of the associated problem features is provided in Table 4. The considered constrained functions are separated into eight subgroups associated with the selected objective functions. These objective functions are

the Sphere function,

the Ellipsoid function,

the Linear slope function,

the rotated Ellipsoid function,

the rotated Discus function,

the rotated Bent Cigar,

the rotated Different Powers, and

the rotated Rastrigin function.
By systematically equipping each objective function with 6 different numbers of inequality constraint functions, namely , , , , , and constraints, the BBOBconstrained benchmark problems are built. The number of the constraints depends on the considered search space dimension . Note that the BBOBconstrained suite renounces the incorporation of equality constraints.
Benchmark name  COCO BBOBconstrained 

Search space dimensions  
Number of constrained functions  (incl. varying dimensions) 
Number of distinct obj. functions  
Minimal number of constraints  
Maximal number of constraints  
Scalable problems  yes 
Budget of function evaluations  userdependent 
Number of separable problems  
Avg. size of  
Number of problems with 
For now, the COCO BBOB-constrained testbed concentrates on almost linear inequality constraints. To this end, the linear structure of the feasible region is distorted by applying bijective nonlinear transformations to a number of constrained functions. The subsequent application of a randomly generated translation to the whole constrained problem prevents the optimal solution from being the zero vector. The problems are further created in a way that maintains a known optimal solution of the constrained function. This optimal solution is always located on the boundary of the feasible region. However, in the black-box setting the optimal solution is accessible neither to the user nor to the algorithm. It is only used for the evaluation of algorithm performance.
The procedure to create a constrained function consists of five steps:
1. Select a pseudoconvex objective function and a corresponding number of constraints.
2. Define the first linear constraint.
3. Construct the remaining linear constraints by sampling their gradients from a multivariate normal distribution and incrementally demanding that the origin remains a Karush-Kuhn-Tucker (KKT) point of the problem boyd2004convex ().
4. If applicable, apply nonlinear transformations to the constrained function.
5. Randomly sample a translation vector to change the location of the optimal solution.
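The construction can be sketched in a few lines under simplifying assumptions. The sketch uses a shifted sphere as objective, sets the first constraint gradient to the negative objective gradient at the origin, and flips sampled gradients so that one fixed direction stays feasible. The nonlinear transformations of step 4 are omitted. All names and the sign-flipping rule are illustrative assumptions and not the COCO implementation.

```python
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def build_linear_constraints(grad_at_origin, m, rng):
    """Return rows a_i of constraints a_i . x <= 0 with the origin as KKT point."""
    g = list(grad_at_origin)
    # Step 2: the first gradient opposes the objective gradient, so that
    # stationarity holds at the origin with multiplier 1 for this constraint.
    rows = [[-gi for gi in g]]
    # Assumption of this sketch: the direction p = g must stay feasible.
    p = g
    for _ in range(m - 1):
        # Step 3: sample a gradient from a standard normal distribution.
        a = [rng.gauss(0.0, 1.0) for _ in g]
        if dot(a, p) > 0:
            a = [-ai for ai in a]  # flip so that p remains feasible
        rows.append(a)
    return rows


rng = random.Random(1)
x_unconstrained = [1.0, -2.0, 0.5]                     # hypothetical unconstrained optimum
grad_at_origin = [-2.0 * v for v in x_unconstrained]   # gradient of ||x - x_u||^2 at x = 0
A = build_linear_constraints(grad_at_origin, m=4, rng=rng)

# Step 5: translate the problem so that the optimum is not the zero vector.
x_opt = [rng.uniform(-4.0, 4.0) for _ in x_unconstrained]


def constraints(x):
    """Evaluate all translated constraints; feasible iff every value is <= 0."""
    shifted = [xi - oi for xi, oi in zip(x, x_opt)]
    return [dot(row, shifted) for row in A]
```

Since all constraints are active at the translated optimum and the remaining multipliers can be set to zero, the KKT conditions hold there regardless of the sampled gradients. For a convex objective this is sufficient for optimality.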
According to the COCO BBOBconstrained documentation sampaio2016coco (), the domain of almost linear constrained functions represents the most interesting starting configuration for benchmarking.
The transformations are essentially applied to ensure constrained functions that are reasonably difficult to solve, i.e. potential regularities that might favor the exploitation abilities of certain algorithms are excluded. The transformations are designed in such a way that similarly hard test problem instances can be generated automatically. Problem instances share the same objective function, the same number of inequality constraints, as well as the same search space dimension. By randomly defining and distorting the linear constraints, the size of the feasible region may vary. To what extent two instances with differently sized feasible regions are of comparable complexity remains an open question.
The constrained problems (15) are scalable with respect to the search space dimension and the number of constraint functions. Taking into account dimensionality, objective function and the number of constraints, the BBOB-constrained testbed consists of distinct constrained functions. Composing problem subgroups by means of objective functions as well as dimensionality supports the identification of algorithmic strengths and weaknesses for specific problem characteristics. The constrained COCO framework considers only inequality constraints. Consequently, a candidate solution is regarded as feasible if all inequality constraints are satisfied. By construction, the feasible sets of the benchmark suite are nonempty and connected. For initialization purposes, a feasible candidate solution is provided by the COCO subroutine cocoProblemGetInitialSolution. It may serve as a starting point for the search process. This represents a beneficial feature for benchmarking algorithms that search exclusively inside the feasible region, see (2). As already mentioned in Section 5.1, the box constraints of each problem are accessible. Hence, they may also be used to initialize a starting population inside the box.
When estimating the size of the feasible region relative to the box defined by the lower and upper bounds, the associated values indicate a dependence on the dimension. Yet, the aggregated value presented in Table 4 only has limited significance. On the one hand, it was generated according to Koziel1999 () by considering only a single instance of each constrained function. As the randomly generated boundary of the feasible region may vary among constrained problem instances, the value is expected to fluctuate to some degree. On the other hand, the value was averaged over all considered problem dimensions and thus only represents a rough sketch. However, compared to the CEC benchmarks in Section 4, the average feasible region of a BBOB-constrained function can be considered larger.
The benchmark suite does not determine a fixed budget of function evaluations. The specification of appropriate termination conditions for an individual algorithm is left to the user HansenTMAB16 (). In this context, the COCO built-in function cocoProblemFinalTargetHit delivers an indicator of the realized algorithm precision. It returns true once the algorithm has approached the optimal objective function value up to a predefined accuracy and can be utilized to terminate the algorithm run. Accordingly, this accuracy represents the final target precision that is used to specify a successful algorithm run, see Section 5.3.
By default, each algorithm is executed on 15 randomly generated instances of each constrained function. The corresponding results are interpreted as 15 independent repetitions on the same constrained problem. Proceeding this way prevents unintentional exploitation of potentially biasing function features hansen2016coco ().
Remember that the optimal solution is by construction located on the boundary of the feasible region. This property might favor search algorithms that largely operate outside of the feasible region of the search space. Depending on the fitness environment, this allows for faster progress until the algorithm reaches a certain neighborhood of the optimal solution.
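A sampling-based estimate of the relative size of the feasible region can be sketched as follows. The sketch assumes uniform Monte Carlo sampling inside the box, and the function name and the toy constraint are hypothetical. With a single randomly generated instance per function, such an estimate is subject to the fluctuations mentioned above.

```python
import random


def estimate_feasible_ratio(constraints, lower, upper, samples, rng):
    """Estimate the share of the box [lower, upper] that satisfies all g_i(x) <= 0."""
    feasible = 0
    for _ in range(samples):
        x = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        if all(g(x) <= 0.0 for g in constraints):
            feasible += 1
    return feasible / samples


# Hypothetical example: the half-plane x_0 + x_1 <= 0 covers half of the box [-5, 5]^2.
rng = random.Random(0)
ratio = estimate_feasible_ratio(
    [lambda x: x[0] + x[1]], [-5.0, -5.0], [5.0, 5.0], samples=20000, rng=rng
)
print(round(ratio, 2))
```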
5.3 Reporting
The COCO framework comes with a postprocessing module for automated data preparation and visualization in terms of html or LaTeX templates. The userindependent standardization of the data processing reduces the susceptibility to errors and supports the comparability of algorithm performance.
The COCO BBOBconstrained suite takes into account a single performance measure: the algorithm runtime.
Whether a target was reached after evaluation of a candidate solution is automatically checked by the COCO suite. To this end, a trigger value is compared with the next unmatched target. The corresponding number of function evaluations as well as the trigger value are logged. For now, the trigger value is identified with the objective function value of a feasible candidate solution. Infeasible candidate solutions, or their constraint violations, are not considered in the definition of the currently used triggers. The objective function value of the initial solution provided by cocoProblemGetInitialSolution is considered as initial trigger value. The initial trigger value usually does not satisfy any of the targets. The trigger value is updated as soon as the benchmarked algorithm is able to find a feasible candidate solution with improved objective function value.
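The trigger mechanism described above can be mimicked by a small logger. This is a simplified sketch with hypothetical names and target precisions and not the COCO logger. It only updates the trigger on feasible improvements and records the evaluation count at which each target is first matched.

```python
class TargetLogger:
    """Log the number of evaluations at which each target is first reached."""

    def __init__(self, f_opt, precisions, initial_f):
        # Targets are f_opt + precision, ordered from easiest to hardest.
        self.targets = sorted((f_opt + p for p in precisions), reverse=True)
        self.trigger = initial_f  # objective value of the feasible start point
        self.evaluations = 0
        self.hits = []            # (target, evaluations) pairs

    def update(self, f_value, is_feasible, evaluations_used=1):
        self.evaluations += evaluations_used
        # Only feasible improvements change the trigger value.
        if is_feasible and f_value < self.trigger:
            self.trigger = f_value
        # Compare the trigger with the next unmatched targets.
        while (len(self.hits) < len(self.targets)
               and self.trigger <= self.targets[len(self.hits)]):
            self.hits.append((self.targets[len(self.hits)], self.evaluations))


logger = TargetLogger(f_opt=0.0, precisions=[1.0, 1e-2, 1e-4], initial_f=10.0)
logger.update(5.0, True)    # feasible improvement, no target reached yet
logger.update(0.5, False)   # infeasible, ignored by the trigger
logger.update(0.5, True)    # reaches the first target
logger.update(1e-3, True)   # reaches the second target
print(logger.hits)
```

The infeasible candidate in the second call consumes an evaluation but leaves the trigger unchanged, which is exactly why time spent outside the feasible region shows up as a delay of the first target hit.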
Making use of this runtime definition results in a performance measure that is essentially independent of the computational platform and the programming language used. Further, the algorithm results can easily be condensed and presented in multiple ways, e.g. by measuring the average runtime (aRT) of an algorithm hansen2016perf (), by use of data profiles or empirical cumulative distribution function (ECDF) plots MoreWild2009 (), or runtime tables for specific target values. An illustration of an aRT plot is displayed in Figure 1(a). It provides an estimate of the expected runtime. The aRT is computed by summing up all evaluations in unsuccessful algorithm runs as well as the evaluations consumed in the successful algorithm runs until the target was reached, and dividing this sum by the number of successful runs. The ECDF plot provided in Fig. 1(b) displays the proportion of successfully reached targets on function f01 plotted against the number of function evaluations. It is usually independent of any reference algorithms and thus unconditionally comparable across different publications. This supports drawing meaningful conclusions with respect to algorithm performance on the whole benchmark set, or on the individual problem subgroups, respectively. Note that algorithm results are not aggregated over dimensions in order to disclose the impact of the problem dimensionality on the algorithm performance.
Algorithms can be directly compared by illustrating their ECDFs over the number of function evaluations in log-scale. This way, the area above and in between the graphs becomes a meaningful concept. An exemplary ECDF is illustrated in Figure 1. It can be interpreted in two ways: By considering the number of function evaluations on the horizontal axis as independent variable, the vertical axis represents the ratio of targets reached for any given budget. On the other hand, associating the vertical axis with the independent variable, the values on the horizontal axis present the maximal runtime observed to reach any given fraction of the predefined target values.
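As a worked example of the aRT computation for a single target, consider the following sketch. The function name and the run data are hypothetical.

```python
def average_runtime(runs):
    """Compute the aRT from (evaluations, success) pairs for one target.

    For successful runs, evaluations are counted until the target was reached.
    For unsuccessful runs, all consumed evaluations are counted.
    """
    successes = sum(1 for _, success in runs if success)
    if successes == 0:
        return float("inf")  # the target was never reached
    return sum(evaluations for evaluations, _ in runs) / successes


# Two successful runs and one unsuccessful run (hypothetical data).
runs = [(100, True), (300, True), (1000, False)]
print(average_runtime(runs))  # (100 + 300 + 1000) / 2 = 700.0
```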
Consequently, better performing algorithms realize smaller areas above a curve. Further, the difference between those areas can be interpreted as a measure of the performance advantage of one algorithm over another.
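The first reading of an ECDF, i.e. the ratio of targets reached for a given budget, can be computed with a minimal sketch. The function name and the runtime data are hypothetical.

```python
def ecdf(runtimes, budgets):
    """Return the fraction of (problem, target) pairs reached within each budget.

    runtimes holds the number of evaluations needed per pair, or None if the
    target was never reached.
    """
    total = len(runtimes)
    return [
        sum(1 for r in runtimes if r is not None and r <= budget) / total
        for budget in budgets
    ]


# Four targets, one of which was never reached (hypothetical data).
runtimes = [10, 50, 200, None]
print(ecdf(runtimes, budgets=[10, 100, 1000]))  # [0.25, 0.5, 0.75]
```

A curve that rises earlier leaves a smaller area above it, which corresponds to the notion of better performance used here.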
With the caveat of losing the connection to a single constrained problem, the ECDF plots allow for aggregation over multiple constrained problems hansen2016perf ().
That is, the presentation of algorithm performance on problem subgroups is straightforward.
Hence, in contrast to extensive and hardly interpretable tables, the ECDFs provide a relevant notion of algorithm suitability for single constrained functions, and subgroups of constrained problems, respectively.
Only considering feasible candidate solutions in the trigger/target definition may inflate the relevance of late phases in the search process. Depending on the constrained problem, algorithms that sample an initial population within the box-constraints might consume a considerable number of function evaluations until they reach the feasible region. The number of function evaluations needed to hit a first target provides a notion of the runtime needed to find a first feasible solution. Accordingly, the area to the left of an ECDF curve can still be identified with the runtime of the respective algorithm. However, the resulting ECDF plots will thus likely display a steeply ascending curve that is shifted to the right boundary (determined by the limit of function evaluations). This complicates the comparison of multiple algorithms because the relevant information might be largely accumulated in one spot. Also from a practical point of view, the late search phase may have minor impact on the assessment of an algorithm if the main focus is on finding a feasible solution of reasonable precision.
Other trigger definitions are conceivable, e.g. the trigger may be defined by the sum of the objective function value and the constraint violation of a candidate solution. This way of proceeding takes into account infeasible steps, but it would introduce the issue of unwanted cancellation effects. Another idea to give an impression of the algorithm performance within the infeasible region is the definition of separate targets for the constraint violation. These targets would need to be displayed in a second plot that addresses the runtime during the search in the infeasible region of the search space.
The COCO experiments include the approximate measurement of the algorithm time complexity HansenTMAB16 (). To this end, it is recommended to monitor either the wall-clock or the CPU time while running the algorithm on the benchmark suite. The time normalized by the number of function evaluations is demanded to be reported for each dimension. Additionally, information on the experimental setup, the programming language, the chosen compiler and the system architecture is required. Yet, the instructions do not completely exclude diverse interpretations and thus impede the comparability and reproducibility of the results.
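The cancellation effect of a summed trigger can be made concrete with a small sketch. The trigger definition and the numbers below are hypothetical. An infeasible candidate whose objective value lies below the constrained optimum can offset its own constraint violation and thereby match a target.

```python
def summed_trigger(f_value, constraint_values):
    """Hypothetical trigger: objective value plus total constraint violation."""
    violation = sum(max(0.0, g) for g in constraint_values)
    return f_value + violation


f_opt = 0.0             # optimal objective value of the constrained problem
target = f_opt + 1e-8   # final target

# Infeasible candidate: the first constraint is violated by 0.5, but the
# objective value lies 0.5 below the constrained optimum.
trigger = summed_trigger(-0.5, [0.5, -1.0])
print(trigger <= target)  # the target counts as reached although the point is infeasible
```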
As the development of the BBOBconstrained benchmark suite is still ongoing, the definitive presentation style of the algorithm results cannot be provided at this point. The presentation of additional information on the ratio of the feasible region relative to the box is conceivable.
Further, the ultimate choice of the trigger value for deciding whether a predefined target was reached is still being discussed.
6 Conclusion
The present review is intended to collect principles for comparing constrained test environments for Evolutionary Algorithms. To this end, it takes into account recommendations on the basic principles, the experimental design, and the presentation of algorithm results. Based on the gathered criteria, the most prominent constrained benchmarking environments for EAs are reviewed. Significant differences with respect to the basic assumptions and the experimental approaches became evident. The survey of the current constrained benchmarking collections suitable for randomized search algorithms supports algorithm developers with information about the strengths of the available frameworks.
Both considered benchmark suites focus on different constrained problem domains. They differ in terms of counting function evaluations, defining termination criteria, as well as performance evaluation and comparison. The COCO BBOB-constrained benchmark is very much based on the unconstrained COCO framework. By including exclusively almost linear inequality constraints, it represents a first systematic attempt towards general constrained problems. The BBOB-constrained test function definitions are rather tangible. This is due to the composition of well-known unconstrained optimization problems and connected feasible sets. By construction, the BBOB-constrained benchmarks (internally) maintain an optimal solution for measuring algorithm performance. In comparison, the structure of the constrained CEC test problems is somewhat harder to perceive. Although they are also based on proven unconstrained objective functions, the structure of the corresponding feasible sets is comparably complex. One reason is the varying number of usually nonlinear equality and inequality constraints that potentially define disjoint feasible regions in the search space. Further, the most recent constrained function definitions do not provide information about optimal solutions.
These distinct benchmarking approaches directly induce different ways of presentation. On the one hand, the COCO framework measures runtime in terms of function evaluations per predefined target and visualizes algorithmic performance in terms of ECDF graphs. Algorithm performance can thus be rather easily aggregated over similar problems and compared to different algorithms. On the other hand, the CEC benchmarks compute a number of best quality, median quality, or mean quality indicators and illustrate the algorithm performances by use of tables. The assessment of algorithmic ideas is rather cumbersome. Further, the comparison of algorithm results thus relies on ranking schemes that may come with a sense of arbitrariness.
Both the CEC competitions on constrained real-parameter optimization and the COCO BBOB-constrained framework consider only relaxable equality and inequality constraints. That is, the algorithms are allowed to move in the whole unconstrained search space. Each candidate solution, either feasible or infeasible, may be evaluated and used within the variation or selection steps of the strategy. It should be noted that the possibility to use infeasible solutions during the search may significantly reduce the problem complexity. An algorithm might completely operate outside the feasible region until it finds the optimal solution. Taking into account the size of the feasible regions, and looking at the problem definitions of the CEC 2006, 2010, and 2017 benchmarks, it should be made clear that many problems have disjoint feasible regions. In this regard, enforcing the feasibility of candidate solutions prior to their evaluation appears useless for the CEC benchmark definitions. However, prior feasibility is often required in real-world problems, e.g. when considering simulations which require feasible inputs. In this regard, the CEC benchmarks do not represent a suitable test function class. Similar concerns can be raised for the BBOB-constrained benchmark suite. However, its feasible set is always nonempty and connected. Further, by providing an initially feasible solution the BBOB-constrained framework could potentially take into account unrelaxable constraints.
Furthermore, both benchmark sets omit to demand a specific box-constraint treatment. They also refrain from mentioning the need for precise reporting of such approaches. Considering the source codes of the most successful strategies reported in CEC competitions, all algorithms assumed situation (S1). Even if not explicitly specified in the benchmark definitions, today the enforcement of the bound constraints seems to be ’common sense’ within the Evolutionary Computation community. However, the mechanisms to treat box-constraint violations may vary and are usually not well reported. As pointed out in Sec. 3.1, plain instructions with respect to the treatment of box-constraints can prevent inconsistencies LiaoMMS2014 ().
Considering the CEC benchmark environments, the problem definitions were subject to considerable changes in recent years. The introduction of scalable constrained functions was accompanied by a reduction of the average number of constraints per problem (from to about ). While the CEC2006 benchmarks were (partly) inspired by real-world applications, the comparably small fixed number of constraints appears too low when taking into account the structure of real-world problems. Further, parameter space transformations were introduced in order to remove potential problem biases in direction of the coordinate axes. Still, a small number of fully separable constrained functions remained in the CEC2017 benchmark set. In conclusion, the advancement of the constrained benchmark environments lacks a comprehensive documentation. Future CEC benchmarking competitions also might consider providing a repository of baseline algorithm results in order to assess the competitiveness of algorithmic ideas and to highlight actual advancements in this field of research.
Regardless of minor software bugs and unfinished post-processing methodology, the COCO BBOB-constrained suite has the potential to become a state-of-the-art constrained benchmarking platform. It is equipped with a detailed documentation of its benchmarking principles, the experimental design as well as an elaborate post-processing strategy. This is supported by the COCO framework representing a widely accepted benchmarking suite for the unconstrained case. However, only considering almost linear inequality constraints, BBOB-constrained may need to proceed towards more complex constraints.
The COCO problem design might further allow for limited user customizations. For instance, the optimal parameter could be optionally moved from the boundary into the feasible region. Another option could be manually turning off the nonlinear perturbations to obtain fully linearly constrained problems (with nonlinear objective functions). This might be useful for examining specific algorithmic ideas suited for constrained problems that lack appropriate benchmarking environments.
Taking into account the vast number of constrained problem characteristics, the current benchmarking environments under review cover only a small fraction of the constrained problem domain. On the one hand, being a progression of the unconstrained COCO BBOB suite, the BBOB-constrained framework focuses on well-studied objective functions with almost linear inequality constraints of scalable number. Providing a collection of well-structured test functions represents a reasonable approach to support the assessment and the development of EA suitable for constrained problems. On the other hand, the CEC benchmarks mainly present nonlinear constrained problems with a fixed number of not necessarily linear inequality and equality constraints. Also due to unconnected feasible sets, the CEC constrained test functions are considered to represent harder challenges in some cases. Aiming at the establishment of profound benchmarks for real-valued constrained optimization, the two approaches should not be regarded as opposing but rather as complementing benchmarking suites.
Advancing the CEC benchmark definitions, and finishing the COCO BBOB-constrained benchmark suite, are anticipated tasks for future research. Further, the design of additional EA benchmarking tools for different constrained problem subdomains needs to be tackled. A possible step in this direction might be the consideration of linear constrained optimization problems suited for EA. In this regard, the Klee-Minty problem is able to serve for demonstrating and examining the capabilities of EA in the context of linear optimization. It is based on the Klee-Minty polytope klee1970good (), a unit hypercube of variable dimension with perturbed vertices, which represents the feasible region of the linear problem. The linear objective function is constructed in such a way that the Simplex algorithm yields an exponential worst-case running time. Considering the number of sophisticated deterministic approaches available, taking into account linear optimization problems for EA benchmarking may appear questionable in the first place. However, many purpose-built algorithms for linear optimization IPM1 (); IPM2 () show poor performance in this environment. The Klee-Minty problem was already used to compare a specially designed CMSA-ES variant for linear optimization with open-source interior point LP solvers in Spettel2018 ().
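For illustration, one common textbook formulation of the Klee-Minty problem can be generated as follows. The scaling in this sketch is an assumption and may differ from the variant used in Spettel2018 (). In this formulation, the optimal solution is known to have all variables equal to zero except the last one, which equals 5 to the power of the dimension.

```python
def klee_minty(n):
    """Return (c, A, b) for: maximize c . x subject to A x <= b and x >= 0."""
    c = [2 ** (n - j) for j in range(1, n + 1)]
    A = [
        [2 ** (i - j + 1) if j < i else (1 if j == i else 0) for j in range(1, n + 1)]
        for i in range(1, n + 1)
    ]
    b = [5 ** i for i in range(1, n + 1)]
    return c, A, b


def is_feasible(x, A, b):
    """Check A x <= b and x >= 0 for a candidate solution x."""
    within_rows = all(sum(a * xi for a, xi in zip(row, x)) <= bi for row, bi in zip(A, b))
    return within_rows and all(xi >= 0 for xi in x)


c, A, b = klee_minty(3)
x_star = [0, 0, 5 ** 3]  # known optimal solution of this formulation
print(is_feasible(x_star, A, b), sum(ci * xi for ci, xi in zip(c, x_star)))
```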
In case that this review fosters the impression of an unbalanced criticism, this conjecture is probably due to the fact that the constrained CEC benchmarks have existed for many years providing a multitude of benchmarking papers and working points, respectively. In contrast, there are hardly any algorithm comparisons that were carried out on the basis of the BBOBconstrained environment. In this respect, the COCO BBOBconstrained framework will have to prove itself in practice.
Acknowledgement
This work was supported by the Austrian Science Fund FWF under grant P29651N32.
Footnotes
 The code related to the BBOBconstrained suite under development is available in the development branch on the project website http://github.com/numbbo/coco/development.
 GAMS – General Algebraic Modeling System, https://gams.com
 AMPL – A Mathematical Programming Language, https://ampl.com
 Aiming at a consistent terminology for the remainder of this article, this denotation of a constrained problem instance does not demand generality.
 Note, that the present paper refrains from citing bad examples.
 Note that the present paper refers to the constrained test problem set specified for the competition in year 2006 as CEC2006 benchmarks. The denotations CEC2010 benchmarks and CEC2017 benchmarks have to be understood in analogous manner.
 Only the reports on the CEC2006 and CEC2010 benchmarks reported the values. To ensure comparability, the values of the constrained CEC benchmark problems have been reevaluated by use of the method presented in Koziel1999 ().
 Notice that, for the CEC2006 benchmarks the same measurements had to be collected after , and of the evaluation budget.
 Keeping in mind, that the CEC benchmark definitions refer to a function evaluation as one evaluation of the whole constrained function, see Sec. 4.1.
 Please, refer to hauschild2011introduction () for a survey about Estimation of Distribution Algorithms (EDA).
 Note that the IIA criterion does not hold in this case.
 https://github.com/numbbo/coco The corresponding documentation is provided in sampaio2016coco () under docs/bbobconstrained/functions/build after building it according to the instructions.
 Note, the unconstrained optimal solution of a constrained function (COP) is associated with the optimum of the related objective function, i.e. disregarding all constraints.
 https://numbbo.github.io/cocodoc/
 The construction of the constrained Rastrigin function group is slightly different. It is referred to atamna2017 (); sampaio2016coco () for the detailed definition.
 While this can be disputed, it is likely the most simple and logical step for gradually extending the COCO BBOB framework to the constrained problem domain.
 By concentrating on runtime, the BBOBconstrained benchmarks may refrain from defining an order relation for candidate solutions.
 Keep in mind, that the number of function evaluations comprises the sum of all objective function evaluations and the number of constraint evaluations.
 The ECDF aggregation over different dimensions is omitted to prevent loss of information related to the impact of the search space dimension on the algorithm performance.
 For the ongoing discussion on BBOBconstrained features, it is referred to https://github.com/numbbo/coco/issues.
References
 Z. Michalewicz, D. Dasgupta, R. G. Le Riche, M. Schoenauer, Evolutionary algorithms for constrained engineering problems, Computers & Industrial Engineering 30 (4) (1996) 851–870.
 V. Oduguwa, A. Tiwari, R. Roy, Evolutionary computing in manufacturing industry: An overview of recent applications, Appl. Soft Comput. 5 (3) (2005) 281–299. doi:10.1016/j.asoc.2004.08.003.
 J. Zhang, Z. h. Zhan, Y. Lin, N. Chen, Y. j. Gong, J. h. Zhong, H. S. H. Chung, Y. Li, Y. h. Shi, Evolutionary computation meets machine learning: A survey, IEEE Computational Intelligence Magazine 6 (4) (2011) 68–75. doi:10.1109/MCI.2011.942584.
 G. Collange, N. Delattre, N. Hansen, I. Quinquis, M. Schoenauer, Multidisciplinary optimization in the design of future space launchers, Multidisciplinary design optimization in computational mechanics (2010) 459–468.
 A. M. Mora, G. Squillero, Applications of Evolutionary Computation: 18th European Conference, EvoApplications 2015, Copenhagen, Denmark, April 8–10, 2015, Proceedings, Vol. 9028, Springer, 2015.
 O. Mersmann, M. Preuss, H. Trautmann, B. Bischl, C. Weihs, Analyzing the BBOB results by means of benchmarking concepts, Evolutionary Computation 23 (1) (2015) 161–185. doi:10.1162/EVCO_a_00134. URL https://doi.org/10.1162/EVCO_a_00134
 D. Whitley, S. Rana, J. Dzubera, K. E. Mathias, Evaluating evolutionary algorithms, Artificial Intelligence 85 (1) (1996) 245–276. doi:10.1016/0004-3702(95)00124-7.
 R. L. Rardin, R. Uzsoy, Experimental evaluation of heuristic optimization algorithms: A tutorial, Journal of Heuristics 7 (3) (2001) 261–304.

 N. Hansen, A. Auger, O. Mersmann, T. Tusar, D. Brockhoff, COCO: a platform for comparing continuous optimizers in a black-box setting, arXiv preprint. URL https://arxiv.org/abs/1603.08785
 N. Hansen, A. Auger, S. Finck, R. Ros, Real-parameter black-box optimization benchmarking 2009: Experimental setup, Tech. rep., INRIA (2009).
 N. Hansen, S. Finck, R. Ros, A. Auger, Real-Parameter Black-Box Optimization Benchmarking 2009: Noiseless Functions Definitions, Research Report RR-6829, INRIA (2009). URL https://hal.inria.fr/inria-00362633
 E. Mezura-Montes, C. A. Coello Coello, Constraint-handling in nature-inspired numerical optimization: Past, present and future, Swarm and Evolutionary Computation 1 (4) (2011) 173–194. doi:10.1016/j.swevo.2011.10.001.
 J. J. Liang, T. P. Runarsson, E. Mezura-Montes, M. Clerc, P. N. Suganthan, C. A. Coello Coello, K. Deb, Problem definitions and evaluation criteria for the CEC 2006 special session on constrained real-parameter optimization, online access (2006).
 R. Mallipeddi, P. N. Suganthan, Problem definitions and evaluation criteria for the CEC 2010 competition on constrained real-parameter optimization, online access (2010).
 G. H. Wu, R. Mallipeddi, P. N. Suganthan, Problem definitions and evaluation criteria for the CEC 2017 competition on constrained real-parameter optimization, online access (September 2016).
 Z. Michalewicz, M. Schoenauer, Evolutionary algorithms for constrained parameter optimization problems, Evolutionary Computation 4 (1) (1996) 1–32. doi:10.1162/evco.1996.4.1.1.

 D. Himmelblau, Applied nonlinear programming, McGraw-Hill, 1972. URL https://books.google.at/books?id=KMpEAAAAIAAJ
 C. Floudas, P. Pardalos, Handbook of test problems in local and global optimization, Nonconvex optimization and its applications, Kluwer Academic Publishers, 1999. URL https://books.google.at/books?id=jQEoAQAAMAAJ
 Q. Xia, Global optimization test problems: A constrained problem difficult for genetic algorithms, http://www.mat.univie.ac.at/~neum/glopt/xia.txt (September 1996).
 T. Epperly, R. E. Swaney, et al., Global optimization test problems with solutions (1996).
 Z. Michalewicz, K. Deb, M. Schmidt, T. Stidsen, Test-case generator for nonlinear continuous parameter optimization techniques, IEEE Transactions on Evolutionary Computation 4 (3) (2000) 197–215. doi:10.1109/4235.873232.
 N. Hansen, A. Auger, O. Mersmann, T. Tusar, D. Brockhoff, COCO code repository, http://github.com/numbbo/coco.
 A. Neumaier, Global optimization test problems, Vienna University. URL http://www.mat.univie.ac.at/~neum/glopt.html
 D. S. Johnson, A theoretician’s guide to the experimental analysis of algorithms, in: Data structures, near neighbor searches, and methodology: fifth and sixth DIMACS implementation challenges, Vol. 59, 2002, pp. 215–250.
 E. Mezura-Montes, C. A. C. Coello, What makes a constrained problem difficult to solve by an evolutionary algorithm, Tech. rep., Technical Report EVOCINV-01-2004, CINVESTAV-IPN, México (2004).
 D. H. Wolpert, W. G. Macready, No free lunch theorems for optimization, IEEE Transactions on Evolutionary Computation 1 (1) (1997) 67–82. doi:10.1109/4235.585893.
 N. I. Gould, D. Orban, P. L. Toint, A constrained and unconstrained testing environment with safe threads (cutest), https://github.com/ralna/CUTEst (2018).
 S. Le Digabel, S. Wild, A Taxonomy of Constraints in Simulation-Based Optimization, Tech. Rep. G-2015-57, Les cahiers du GERAD (2015).
 S. Koziel, Z. Michalewicz, Evolutionary algorithms, homomorphous mappings, and constrained parameter optimization, Evol. Comput. 7 (1) (1999) 19–44. doi:10.1162/evco.1999.7.1.19.
 K. Deb, An efficient constraint handling method for genetic algorithms, Computer Methods in Applied Mechanics and Engineering 186 (2) (2000) 311–338. doi:10.1016/S0045-7825(99)00389-8.
 L. S. Matott, B. A. Tolson, M. Asadzadeh, A benchmarking framework for simulation-based optimization of environmental models, Environmental Modelling & Software 35 (2012) 19–30. doi:10.1016/j.envsoft.2012.02.002.
 J. J. Moré, S. M. Wild, Benchmarking derivative-free optimization algorithms, SIAM Journal on Optimization 20 (1) (2009) 172–191.
 T. Liao, D. Molina, M. A. M. de Oca, T. Stützle, A note on bound constraints handling for the IEEE CEC’05 benchmark function suite, Evolutionary Computation 22 (2) (2014) 351–359. doi:10.1162/EVCO_a_00120.
 J. Derrac, S. García, D. Molina, F. Herrera, A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms, Swarm and Evolutionary Computation 1 (1) (2011) 3–18. doi:10.1016/j.swevo.2011.02.002.
 N. Hansen, A. Auger, D. Brockhoff, D. Tusar, T. Tusar, COCO: performance assessment, CoRR abs/1605.03560. URL http://arxiv.org/abs/1605.03560
 Z. Michalewicz, Genetic algorithms, numerical optimization, and constraints, in: Proceedings of the 6th International Conference on Genetic Algorithms, Pittsburgh, July 15–19, Morgan Kaufmann, 1995, pp. 151–158.
 P. Suganthan, N. Hansen, J. Liang, K. Deb, Y.-P. Chen, A. Auger, S. Tiwari, Problem definitions and evaluation criteria for the CEC 2005 special session on real-parameter optimization (01 2005).

 H. Schwefel, Evolution and Optimum Seeking, Sixth Generation Computer Technologies, Wiley, 1995. URL https://books.google.at/books?id=dfNQAAAAMAAJ
 W. Gong, Z. Cai, D. Liang, Engineering optimization by means of an improved constrained differential evolution, Computer Methods in Applied Mechanics and Engineering 268 (2014) 884–904. doi:10.1016/j.cma.2013.10.019.

 R. Mallipeddi, P. N. Suganthan, CEC competitions on constrained real-parameter optimization, source code (2017). URL http://www.ntu.edu.sg/home/epnsugan/
 V. L. Huang, A. K. Qin, P. N. Suganthan, Self-adaptive differential evolution algorithm for constrained real-parameter optimization, in: IEEE International Conference on Evolutionary Computation, CEC 2006, part of WCCI 2006, Vancouver, BC, Canada, 16–21 July 2006, 2006, pp. 17–24. doi:10.1109/CEC.2006.1688285.
 T. Takahama, S. Sakai, Constrained optimization by the constrained differential evolution with gradient-based mutation and feasible elites, in: IEEE International Conference on Evolutionary Computation, CEC 2006, part of WCCI 2006, Vancouver, BC, Canada, 16–21 July 2006, 2006, pp. 1–8. doi:10.1109/CEC.2006.1688283.
 T. Takahama, S. Sakai, Constrained optimization by the epsilon constrained differential evolution with an archive and gradient-based mutation, in: Proceedings of the IEEE Congress on Evolutionary Computation, CEC 2010, Barcelona, Spain, 18–23 July 2010, 2010, pp. 1–9. doi:10.1109/CEC.2010.5586484.
 R. Mallipeddi, P. N. Suganthan, Differential evolution with ensemble of constraint handling techniques for solving CEC 2010 benchmark problems, in: Proceedings of the IEEE Congress on Evolutionary Computation, CEC 2010, Barcelona, Spain, 18–23 July 2010, 2010, pp. 1–8. doi:10.1109/CEC.2010.5586330.
 R. Polakova, J. Tvrdík, L-SHADE with competing strategies applied to constrained optimization, in: 2017 IEEE Congress on Evolutionary Computation, CEC 2017, Donostia, San Sebastián, Spain, June 5–8, 2017, 2017, pp. 1683–1689. doi:10.1109/CEC.2017.7969504.
 J. Tvrdík, R. Polakova, A simple framework for constrained problems with application of L-SHADE44 and IDE, in: 2017 IEEE Congress on Evolutionary Computation, CEC 2017, Donostia, San Sebastián, Spain, June 5–8, 2017, 2017, pp. 1436–1443. doi:10.1109/CEC.2017.7969472.
 A. M. Sutton, M. Lunacek, L. D. Whitley, Differential evolution and non-separability: using selective pressure to focus search, in: Proceedings of the 9th annual conference on Genetic and evolutionary computation, ACM, 2007, pp. 1428–1435.
 C. García-Martínez, P. D. Gutiérrez, D. Molina, M. Lozano, F. Herrera, Since CEC 2005 competition on real-parameter optimisation: a decade of research, progress and comparative analysis’s weakness, Soft Computing 21 (19) (2017) 5573–5583. doi:10.1007/s00500-016-2471-9.
 M. Hauschild, M. Pelikan, An introduction and survey of estimation of distribution algorithms, Swarm and Evolutionary Computation 1 (3) (2011) 111–128.
 K. J. Arrow, A difficulty in the concept of social welfare, Journal of Political Economy 58 (4) (1950) 328–346. doi:10.1086/256963.

 D. Brockhoff, N. Hansen, T. Tušar, O. Mersmann, P. R. Sampaio, A. Auger, A. Atamna et al., COCO documentation repository. URL http://github.com/numbbo/coco-doc
 P. R. Sampaio, N. Hansen, D. Brockhoff, A. Auger, A. Atamna, A methodology for building scalable test problems for continuous constrained optimization, Gaspard Monge Program for Optimisation (PGMO), Paris-Saclay (2017).

 S. Boyd, L. Vandenberghe, Convex Optimization, Berichte über verteilte Messysteme, Cambridge University Press, 2004. URL https://books.google.at/books?id=mYm0bLd3fcoC
 N. Hansen, T. Tusar, O. Mersmann, A. Auger, D. Brockhoff, COCO: the experimental procedure, CoRR abs/1603.08776. arXiv:1603.08776. URL http://arxiv.org/abs/1603.08776
 B. Efron, R. Tibshirani, An Introduction to the Bootstrap, Chapman & Hall/CRC Monographs on Statistics & Applied Probability, Taylor & Francis, 1994.

 V. Klee, G. Minty, How good is the Simplex algorithm?, Defense Technical Information Center, 1970. URL https://books.google.at/books?id=R843OAAACAAJ
 N. Megiddo, M. Shub, Boundary behavior of interior point algorithms in linear programming, Mathematics of Operations Research 14 (1) (1989) 97–146.
 A. Deza, E. Nematollahi, T. Terlaky, How good are interior point methods? Klee–Minty cubes tighten iteration-complexity bounds, Mathematical Programming 113 (1) (2008) 1–14.
 P. Spettel, H.-G. Beyer, M. Hellwig, A covariance matrix self-adaptation evolution strategy for linear constrained optimization, IEEE Transactions on Evolutionary Computation (under review).