IOHprofiler: A Benchmarking and Profiling Tool for Iterative Optimization Heuristics
LIACS, Leiden University, Niels Bohrweg 1, 2333CA Leiden, The Netherlands
August 28, 2019
Abstract
IOHprofiler is a new tool for analyzing and comparing iterative optimization heuristics. Given as input algorithms and problems written in C or Python, it provides as output a statistical evaluation of the algorithms’ performance by means of the distribution of the fixed-target running time and the fixed-budget function values. In addition, IOHprofiler makes it possible to track the evolution of algorithm parameters, making our tool particularly useful for the analysis, comparison, and design of (self-)adaptive algorithms.
IOHprofiler is ready-to-use software. It consists of two parts: an experimental part, which generates the running time data, and a postprocessing part, which produces the summarizing comparisons and statistical evaluations. The experimental part is built on the COCO software, which has been adjusted to cope with optimization problems that are formulated as functions $f\colon \mathcal{S}^n \to \mathbb{R}$ with $\mathcal{S}$ being a discrete alphabet of integers. The postprocessing part is our own work. It can be used as a standalone tool for the evaluation of running time data of arbitrary benchmark problems. It accepts as input files not only the output files of IOHprofiler, but also original COCO data files. The postprocessing tool is designed for an interactive evaluation, allowing the user to choose the ranges and the precision of the displayed data according to his/her needs.
IOHprofiler is available on GitHub at https://github.com/IOHprofiler.
Keywords: Benchmarking, Black-Box Optimization, Discrete Optimization, Evolutionary Computation, Algorithm Profiling
Contents
 1 Introduction
 2 Summary of Algorithm and Problem Requirements
 3 Precision of the Analysis and Data Format
 4 Supported Performance Analyses
 5 Conclusions and Possible Extensions
 A Manual for the Experimental Part: Data Generation
1 Introduction
The ultimate goal of research on optimization problems is the design of efficient problem solvers, which determine high-quality solutions at low cost. Thousands of new algorithms are suggested every year, which naturally raises two questions: how their performances compare across different optimization problems, and how far the underlying design ideas can be used to solve different types of optimization problems. Benchmarking addresses these questions in a principled way, by providing an empirical performance evaluation across different types of optimization problems. The task of designing suitable benchmarking environments is highly nontrivial, and comprises the following questions:

Q1: Which type of optimization algorithms shall be compared?

Q2: Which benchmark problems are most suitable for the comparison?

Q3: Which performance measures should be used?

Q4: Apart from performance, which additional properties of the algorithms should be compared?
In addition to these questions, a number of technical difficulties, such as the various programming languages in which the algorithms and problems are written or the platform on which the benchmark suite is executed, also need to be addressed.
The answer to any of the questions Q1 to Q4 is quite subjective, and we cannot expect to reach consensus among the scholars and users of different optimization methods. When restricting to certain classes of optimization algorithms, however, it is possible to distill a number of common design principles. In the following, we briefly explain the choices and assumptions made by IOHprofiler.
1.1 Iterative Optimization Heuristics
With respect to Q1, we focus in this work on iterative optimization heuristics (IOH). As IOH we classify all algorithms which aim to find optimal solutions by an iterative search. That is, to optimize a problem $f$, these algorithms proceed in rounds. In each round, the objective values of one or more solution candidates (search points) are evaluated. Their function values are used to update the strategy by which the search points for the next round are generated. The search continues until a stopping criterion has been met, e.g., when a solution of a desired quality has been found, a time budget has been reached, or no significant progress could be observed in the last iterations.
The class of IOH subsumes local search variants (including first/steepest ascent, variable neighborhood search, Simulated Annealing, Metropolis algorithms, etc.) and global search heuristics such as evolutionary algorithms, (Quasi-)Monte Carlo algorithms, swarm intelligence, differential evolution, estimation of distribution algorithms, efficient global optimization, Bayesian optimization, etc.
IOH are particularly useful for the optimization of complex, high-dimensional, and large-scale optimization problems. They are, on par with mathematical programming, among the most frequently applied optimization routines in industrial and academic optimization.
1.2 Real-Valued Optimization
Providing an answer to Q2 is arguably the most subjective part of the decision process. We have chosen to set this question aside and to present a very general benchmarking environment which allows an in-depth comparison of IOH for arbitrary real-valued optimization problems (see Footnote 1). That is, we do not intend in this work to discuss which problems are particularly suitable for the comparison of different IOH. Rather, we offer a tool that can be used to compare performance across functions of the user’s choice.
Footnote 1: At the moment, we assume only that the problems are static (i.e., $f$ does not change while being optimized) and noise-free (i.e., $f$ is a deterministic function). An extension to dynamic, noisy, and multi-objective optimization problems is under consideration. Users interested in such cases are invited to contact the authors to discuss how to modify IOHprofiler to cover such optimization problems.
Since the algorithms need to know which search space they should operate on, we focus in the experimental part on problems defined over a discrete alphabet; i.e., we allow functions of the type $f\colon \mathcal{S}^n \to \mathbb{R}$ with $\mathcal{S}$ being a discrete alphabet of integers. Note that this class comprises in particular the (very broad) class of pseudo-Boolean optimization problems, i.e., functions of the type $f\colon \{0,1\}^n \to \mathbb{R}$.
The postprocessing part does not make any assumptions on the type of problem. It can also be used to compare performance across arbitrary real-valued optimization problems. In particular, it accepts data files produced by the original COCO software.
The experimental part of IOHprofiler assumes maximization as objective. The postprocessing part automatically detects from the best-so-far values whether minimization or maximization was the objective of the corresponding experiment.
1.3 Performance Measures
With respect to Q3 we make the important assumption that the running time of the algorithms is dominated to a significant extent by the evaluation of solution candidates, so that all performance measures are based on the number of function evaluations. Originally inspired by so-called black-box optimization (which, in intuitive terms, assumes that the objective function is not accessed by the algorithm other than through the evaluation of solution candidates), these measures are today established performance indicators also in situations where algorithms do have access to (and do make use of) instance data. In several streams of Computer Science the evaluation of a function value is considered a query (with the idea that the objective value is queried from an oracle), and the number of function evaluations is referred to as query complexity. The advantage of evaluation-based performance measures is that they are independent of the machine on which the algorithms are executed. However, the user should keep in mind that query complexity can only give an accurate picture of CPU time when the latter is indeed determined to a large extent by the number of evaluated samples.
As standard performance measures IOHprofiler provides information about the distribution of fixed-target running times and fixed-budget function values. These results include average values and quantiles, but also empirical cumulative distribution function (ECDF) curves, histograms, and empirical probability mass functions. All results are presented in an interactive format that allows the user to specify the granularity, the ranges, and the precision at which the results are displayed. All plots can be stored as png files, the data tables as csv files.
1.4 Tracking Additional Information
Addressing Q4, IOHprofiler can also be used to analyze the evolution of various algorithm parameters. The parameters to be tracked are specified by the user. For any of these parameters the user can obtain the same type of statistics as for the running time data. That is, the standard output of IOHprofiler includes in particular statistics (average, median, quantiles, …) about the parameter value at a given point in time (fixed-budget perspective) and at a given function value (fixed-target perspective).
This profiling aspect can be of independent interest, as such information can be very useful for the design of suitable optimization heuristics.
1.5 GitHub page of IOHprofiler
IOHprofiler can be downloaded from the GitHub page https://github.com/IOHprofiler.
1.6 Mailing List
Users interested in receiving important updates about IOHprofiler can subscribe to a mailing list at http://eepurl.com/dBahWb.
1.7 User Support
The development team of IOHprofiler can be reached by email at Carola.Doerr@mpi-inf.mpg.de (see Footnote 2).
Footnote 2: This address will be replaced by a more generic one in the next version; for now, please use this address.
Users can use this contact address to ask for support with the setup of IOHprofiler, but also to suggest new functionalities, different evaluation statistics, etc., or to provide feedback.
1.8 License and Main References
The experimental part of IOHprofiler is built on the COCO software [HAM16], available at https://github.com/numbbo/coco, and includes various modifications to adjust this tool to discrete optimization, to allow for the transformations described in Section 2.2, to choose the granularity by which the data is stored, and to track algorithm parameters.
The postprocessing part is original work; the visualization of the results uses the R library plotly from https://plot.ly/.
IOHprofiler is governed by the BSD 3-Clause license.
1.9 Structure of the Documentation
In the following sections, we summarize the algorithm and problem requirements of IOHprofiler (Section 2), discuss the various options to set the precision of the performance evaluation (Section 3), provide an overview of the standard outputs generated by IOHprofiler (Section 4), and conclude with a number of extensions currently in preparation and planned for future releases of IOHprofiler (Section 5).
A step-by-step manual for the experimental part of IOHprofiler is provided in Section A of the appendix. The manual for installing and running the postprocessing part of IOHprofiler can be found on the aforementioned GitHub page https://github.com/IOHprofiler.
2 Summary of Algorithm and Problem Requirements
In this section we summarize the class of algorithms for which IOHprofiler can compare performance and discuss the type of optimization problems which are admissible.
2.1 Iterative Optimization Heuristics
The focus of IOHprofiler is on the performance analysis of iterative optimization heuristics (IOH), the class of all algorithms that follow the structure of Algorithm 1. As mentioned in the introduction, this class comprises all sorts of randomized search heuristics, ranging from simple local hill-climbers to complex global search heuristics. The only important feature is that these heuristics do not (only) directly manipulate the problem data, but rather work in a trial-and-error fashion, in which the function evaluation of search points is an integral part of the optimization routine.
Note that in Algorithm 1 all randomized decisions can be replaced by deterministic ones, so that the class of IOH also subsumes deterministic optimizers.
[Algorithm 1: Structure of an iterative optimization heuristic (IOH)]
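The round-based structure described in Section 1.1 (evaluate candidates, update the sampling strategy, check a stopping criterion) can be sketched in Python as follows. This is only an illustration under stated assumptions: the objective `onemax`, the one-bit-flip local search, and all function and parameter names are hypothetical and are not taken from IOHprofiler.

```python
import random


def onemax(x):
    """Illustrative objective (an assumption): the number of ones in x."""
    return sum(x)


def iterative_heuristic(f, n, budget, target=None, seed=0):
    """Randomized local search written in the IOH pattern:
    generate a candidate, evaluate it, update the state, check stopping."""
    rng = random.Random(seed)
    best_x = [rng.randint(0, 1) for _ in range(n)]  # initial search point
    best_f = f(best_x)
    evaluations = 1
    # Stop when the budget is used up or the target quality is reached.
    while evaluations < budget and (target is None or best_f < target):
        y = list(best_x)
        pos = rng.randrange(n)
        y[pos] = 1 - y[pos]          # new candidate: flip one random bit
        fy = f(y)                    # exactly one function evaluation
        evaluations += 1
        if fy >= best_f:             # update the state (maximization)
            best_x, best_f = y, fy
    return best_x, best_f, evaluations


x, fx, used = iterative_heuristic(onemax, n=20, budget=2000, target=20)
print(fx, used)
```

The counter `evaluations` is the quantity on which all performance measures discussed in this document are based.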
Counting function evaluations. As mentioned above, an important assumption that we make about the algorithms is that their running time is determined (to a large extent) by the time needed to evaluate the solution candidates. We therefore regard in IOHprofiler only performance measures that are based on counting the number of function evaluations, either in a fixed-target or a fixed-budget sense. While the former answers the question of how many evaluations are needed to identify a solution of a certain quality, the fixed-budget perspective addresses the complementary question of the quality of the best solutions that can be identified within a given budget of function evaluations.
General-purpose vs. problem-aware algorithms. Our original interest is in comparing iterative optimization heuristics that do not have any a priori knowledge about the (type of) optimization problem that they are facing. That is, we classically work in the aforementioned black-box setting, in which we assume that the algorithm only knows that the problem is a function $f\colon \mathcal{S} \to \mathbb{R}$; i.e., it “knows” in particular the domain $\mathcal{S}$ (search space) and the codomain (and possibly some bounds on the codomain). In the classic black-box optimization scenario, the only way to acquire knowledge about the problem instance $f$ is through the evaluation of potential solutions $x \in \mathcal{S}$. However, despite this initial motivation, IOHprofiler is also suitable for the comparison and profiling of problem-aware heuristics, which have been designed for a particular type of optimization problem.
As we shall explain in the next subsection, IOHprofiler offers to test several problem instances that are obtained from a given base problem through transformations of the search points and/or the function values. This allows the user to analyze, for example, if an algorithm is invariant with respect to problem representation and with respect to absolute function values.
2.2 Admissible Benchmark Problems
As mentioned above, the experimental part of IOHprofiler assumes maximization as objective, and allows arbitrary functions $f\colon \mathcal{S}^n \to \mathbb{R}$ with $\mathcal{S}$ being a discrete alphabet of integers. In contrast, the postprocessing part of IOHprofiler does not make any further assumption about the type of problems for which the statistics are generated; all real-valued problems are admissible, and the objective can be either minimization or maximization.
Both parts assume that the optimization problem is static, i.e., it does not change over time. We also assume that the evaluations are noise-free.
Problem instances. Instead of testing one particular problem $f$ only, the user can choose to run experiments on several problem instances that are obtained from $f$ through a set of transformations. In its most general form, IOHprofiler currently offers to return to the algorithm the values $a f(\sigma(x \oplus z)) + b$, where

$a$ is a multiplicative shift of the function value,

$b$ is an additive shift of the function value,

$z$ is an XOR-shift of the search point,

$\sigma$ is a permutation of the search point. Note here that, in abuse of notation, we identify the permutation $\sigma$ with the here-defined reordering of the bit string.
Note that the “$\oplus z$” transformation is defined only for pseudo-Boolean problems $f\colon \{0,1\}^n \to \mathbb{R}$, but it can easily be extended to search spaces of the form $\mathcal{S}^n$ (see Footnote 3).
Footnote 3: Users interested in such an extension are invited to contact the authors to discuss a possible integration of such a transformation into IOHprofiler.
The transformations defined above can be used to test whether an algorithm is invariant under the proposed modifications. This is an often desired feature of iterative optimization heuristics.
The user can choose if and which of the transformations are applied to his/her problem, cf. Section A.5 for details. If no transformation is selected, the original values are returned to the algorithm.
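The four transformations (multiplicative shift, additive shift, XOR-shift, permutation) can be illustrated with a short Python sketch. It assumes that the XOR-shift is applied first, then the permutation, then the two shifts of the function value, i.e., that the algorithm sees a * f(sigma(x XOR z)) + b; this order, the helper `transform_instance`, and the example function `leading_ones` are assumptions made for illustration, not the IOHprofiler implementation.

```python
def transform_instance(f, a=1.0, b=0.0, z=None, sigma=None):
    """Wrap f so that callers see a * f(sigma(x XOR z)) + b.
    The composition order is an assumption of this sketch."""
    def g(x):
        y = list(x)
        if z is not None:
            y = [xi ^ zi for xi, zi in zip(y, z)]   # XOR-shift of the search point
        if sigma is not None:
            y = [y[j] for j in sigma]               # reorder the bit string
        return a * f(y) + b                         # scale and shift the value
    return g


def leading_ones(x):
    """Number of consecutive ones at the start of the bit string."""
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count


g = transform_instance(leading_ones, a=2.0, b=3.0,
                       z=[1, 0, 1, 0], sigma=[3, 2, 1, 0])
print(g([1, 1, 0, 1]))
```

An algorithm that is invariant under these transformations should need the same number of evaluations on `g` as on `leading_ones`, up to random fluctuations.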
We mention here already that, for convenience of the data analysis, the output files store the following four values:

$f(x)$, the non-shifted function value of the search point evaluated in the corresponding iteration,

the best-so-far value among these non-shifted function values,

the shifted (transformed) function value of the current search point; this is the value that the algorithm has access to, and

the best-so-far value among these shifted function values.
3 Precision of the Analysis and Data Format
A sound statistical comparison of algorithms requires a substantial amount of data. For the standard output of IOHprofiler, we track for selected evaluations the number of search points evaluated up to this iteration, the (transformed and the original) function value of the solution under evaluation, and the (transformed and original) function value of the best-so-far solution. Apart from this information, IOHprofiler can store additional data, such as the parameters that determine the exact structure of the algorithm. Typical examples for such parameters are the radius at which new solution candidates are sampled (e.g., the mutation rate), the number of offspring evaluated in the present iteration, and parameters that determine the selection of the points to keep in the memory. The user selects which algorithm parameters are tracked, cf. Section A for details.
An example of a standard output file can be seen in Figure 2.
The interval at which data is stored is chosen by the user. IOHprofiler allows for the following options. A detailed description of how to select this granularity is provided in Section A.

Complete tracking (*.cdat files): This data file provides the highest granularity, by storing the above-described information for each function evaluation.

Interval tracking (*.idat files): The user specifies a step size $\tau$. Data is stored for every $\tau$-th function evaluation.

Target-based tracking (*.dat files): These data files store data for each iteration in which the best-so-far function value improves.

Time-based tracking (*.tdat files): In this data file, records are written when the user-specified running time budgets are reached. These running time budgets are evenly spaced on the log scale, taking the form $v \cdot 10^{i}$ or $10^{i/t}$ for $i = 0, 1, 2, \ldots$. Here, $v$ and $t$ can be set by the user.
With the current experimental setup, the *.dat data format is always generated; the other three are optional.
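The tracking options differ only in which evaluations trigger a record. The following Python sketch lists these trigger points for three of the options (complete tracking simply records every evaluation). The function names, the 1-based evaluation counter, the convention that the first evaluation counts as an improvement, and the log-spaced grid base * 10^i with bases (1, 2, 5) are assumptions for illustration and are not taken from the IOHprofiler code.

```python
def interval_triggers(budget, tau):
    """Interval tracking: every tau-th evaluation up to the budget."""
    return [t for t in range(1, budget + 1) if t % tau == 0]


def improvement_triggers(values):
    """Target-based tracking: evaluations (1-based) at which the
    best-so-far value strictly improves (maximization)."""
    triggers, best = [], None
    for t, value in enumerate(values, start=1):
        if best is None or value > best:
            triggers.append(t)
            best = value
    return triggers


def log_spaced_triggers(budget, bases=(1, 2, 5)):
    """Time-based tracking on an assumed grid base * 10**i, i = 0, 1, 2, ..."""
    triggers = set()
    power = 1
    while power <= budget:
        for base in bases:
            if base * power <= budget:
                triggers.add(base * power)
        power *= 10
    return sorted(triggers)


print(interval_triggers(10, 3))
print(improvement_triggers([1, 1, 3, 2, 5]))
print(log_spaced_triggers(100))
```

Target-based tracking produces the smallest files on problems with few distinct function values, whereas interval and time-based tracking give records at budgets that are comparable across runs.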
The structure of the output files follows very closely that of the COCO environment: For each tested algorithm, a separate folder Algorithm1.zip is created; the name of this folder can be chosen by the user, see Section A for details. In this folder we find for each tested benchmark function a “.info” file, e.g., IOHprofiler_f2_i1.info (where the part “_f2” indicates the tested function and the part “_i1” the smallest index of the tested instances of this benchmark problem). This file contains the following information:

in the first line we store the name of the benchmark suite (a suite is a collection of benchmark functions), the ID of the benchmark function, the dimension for which experiments have been conducted, the name of the algorithm, and information about the version of the IOHprofiler.

the second line is an empty line containing only the symbol %. The user can use this line to record some information about the algorithm or the experiment. This information can be specified in the configuration file.

the subsequent lines specify the path where the actual runtime data is located (in the example of the screenshot above, this is the file IOHprofilerexp_f2_DIM100_i1.dat in folder data_f2). Thereafter, it is recorded for each run (100 in the example) how many lines of data points have been stored, along with the final best-so-far value of the respective run. In the example above, 12 503 data points have been stored for the first run, followed by the function value of the best solution found in that run.
When several dimensions have been tested, the corresponding information above is written into the file IOHprofiler_f2_i1.info, one block below the other.
The performance data is stored in subfolders, one subfolder for each tested benchmark function. In these subfolders the different data files specified above can be found. A generic name for these files is IOHprofiler_f2_DIM1000_i1.dat, here for a *.dat file containing performance data from an experiment on the 1000-dimensional function f2. Results for different dimensions are stored in the same folder.
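When post-processing such folders with custom scripts, the function ID, the dimension, and the instance index can be read off the file name. The following Python sketch does this with a regular expression; the pattern is inferred only from the example names quoted in this section (such as IOHprofiler_f2_DIM1000_i1.dat) and is therefore an assumption, and `parse_data_file_name` is a hypothetical helper, not part of IOHprofiler.

```python
import re

# Pattern inferred from the example file names in the text (an assumption).
NAME_PATTERN = re.compile(
    r"^(?P<prefix>.+?)_f(?P<function>\d+)_DIM(?P<dimension>\d+)"
    r"_i(?P<instance>\d+)\.(?P<extension>dat|cdat|idat|tdat)$"
)


def parse_data_file_name(name):
    """Return the fields encoded in a data file name, or None if the
    name does not follow the assumed pattern."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("function", "dimension", "instance"):
        fields[key] = int(fields[key])
    return fields


print(parse_data_file_name("IOHprofiler_f2_DIM1000_i1.dat"))
```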
4 Supported Performance Analyses
We recall that the objectives of the IOHprofiler tool are twofold. On the one hand, it aims to contribute to a statistically sound comparison of iterative heuristics for pseudo-Boolean optimization problems. This is the benchmarking aspect of IOHprofiler. On the other hand, an important motivation for algorithm benchmarking is the desire to generate insights that can be used for the design of efficient optimizers. To this end, it is not only important to understand well how the algorithms perform on different types of optimization problems; it is equally important to analyze how the states of the algorithm itself evolve over time. To address this question, IOHprofiler makes it possible to track the evolution of the key parameters that determine the algorithm. The evaluation of these parameters covers the profiling aspect of IOHprofiler.
We describe in this section the standard outputs that IOHprofiler generates. The results are grouped into three categories:

Fixed-Target Results, described in Section 4.3: This section provides summarizing statistics covering the fixed-target perspective of performance evaluation. That is, the results in this section mainly address the question of how much “time” (i.e., function evaluations) is needed to obtain a solution of a desired target quality.

Fixed-Budget Results, Section 4.4: Covering the fixed-budget perspective, these outputs present statistics for the quality of the search points obtained within a given budget of function evaluations. That is, the results in this section mainly address the question of how good the search points are that a user can expect to see within a given time frame (where “time” refers again to the number of evaluations).

Algorithm Parameters, Section 4.5: This section provides details about the evolution of the algorithm parameters that the user specified to be tracked during the experimental part.
Performance Measures in Preparation: IOHprofiler currently neither performs statistical tests, nor compares results across problem dimensions, nor aggregates performance over several benchmark problems. These features are currently in preparation, and will be made available shortly. Note, however, that in addition to the summarizing statistics detailed in the next subsections, IOHprofiler provides for each section the option to store sorted raw data, which may be convenient for computing additional performance measures, statistical tests, etc. Users interested in discussing which additional performance measures to include as standard output are asked to get in touch with the IOHprofiler developers.
4.1 Notation and Basic Terminology
Before we present the various outputs, we briefly discuss the terminology used in the remainder of this section. We recall that we assume maximization as objective.
For every algorithm $A$ and every function $f$, we denote by

$T(A, f, v, i)$ the number of function evaluations that have been performed in run $i$ until and including the first evaluation of a search point $x$ satisfying $f(x) \ge v$; i.e., the “time” needed by algorithm $A$ in run $i$ to reach for function $f$ a solution with target value at least $v$.

$V(A, f, t, i)$, the function value of the best among the first $t$ evaluated solution candidates in run $i$. The variable $t$ is referred to as budget.
The values $T(A, f, v, i)$ and $V(A, f, t, i)$ are aggregated over the $r$ independent runs to fixed-target running times $T(A, f, v)$ and fixed-budget function values $V(A, f, t)$, respectively. Note that $T(A, f, v)$ and $V(A, f, t)$ are random variables, and $T(A, f, v, i)$ and $V(A, f, t, i)$ samples thereof.
Among the most classic performance measures are the mean values $E[T(A, f, v)]$ and $E[V(A, f, t)]$ of the distributions of $T(A, f, v)$ and $V(A, f, t)$. We approximate these expected values by the empirical averages over all $r$ independent runs, and abbreviate:

$\bar{T}(A, f, v) = \frac{1}{r} \sum_{i=1}^{r} T(A, f, v, i)$, the average budget needed to find a solution of quality at least $v$.

$\bar{V}(A, f, t) = \frac{1}{r} \sum_{i=1}^{r} V(A, f, t, i)$, the average quality of the best solution found within a budget of $t$ function evaluations.
When the variables $T(A, f, v)$ and $V(A, f, t)$ are not concentrated and/or not symmetric, average values can be misleading. IOHprofiler therefore also computes different quantiles of these distributions. To this end, the values $T(A, f, v, 1), \ldots, T(A, f, v, r)$ [and $V(A, f, t, 1), \ldots, V(A, f, t, r)$, respectively] are sorted in non-decreasing order. We denote by $T_{(j)}(A, f, v)$ [and $V_{(j)}(A, f, t)$, resp.] the $j$-th element of the resulting sequence. For any $m \in (0, 100)$, the $m$-th percentiles of the two distributions are estimated from these sorted sequences.
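The empirical average and the percentiles of a set of per-run samples can be computed as in the following Python sketch. The nearest-rank rule used for the percentiles (take the element of rank ceil(m * r / 100) in the sorted sample) is an assumption chosen for simplicity; the estimator used by IOHprofiler may differ. The function names and the sample values are hypothetical.

```python
import math


def empirical_mean(samples):
    """Average over all runs."""
    return sum(samples) / len(samples)


def nearest_rank_percentile(samples, m):
    """m-th percentile (0 < m <= 100) by the nearest-rank rule:
    the element of rank ceil(m * r / 100) in the sorted sample."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(m * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical fixed-target running times of r = 5 runs for one target.
running_times = [120, 95, 300, 110, 105]
print(empirical_mean(running_times))
print(nearest_rank_percentile(running_times, 50))
```

In this example the single slow run (300 evaluations) pulls the mean well above the median, which is the kind of asymmetry that motivates reporting quantiles next to averages.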
In addition to these values, it is also interesting to accumulate the running time data into ECDF curves. ECDF stands for empirical cumulative distribution function. Again we have to distinguish between the fixedtarget and the fixedbudget perspective:

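In the fixed-target perspective, an ECDF curve requires selecting a set $\mathcal{V}$ of target values. The corresponding ECDF curve shows for each budget $t$ the fraction of the (run, target value) pairs $(i, v)$ that satisfy $T(A, f, v, i) \le t$, i.e., for which run $i$ has evaluated a solution of quality at least $v$ within its first $t$ evaluations. That is, $\widehat{F}(t) = \frac{1}{r |\mathcal{V}|} \sum_{v \in \mathcal{V}} \sum_{i=1}^{r} \mathbb{1}(T(A, f, v, i) \le t)$, where $r$ is the number of runs and $\mathbb{1}(\cdot)$ denotes the indicator variable, which is one when the condition is satisfied and zero otherwise.

Likewise, in the fixed-budget perspective, the user selects a set of budgets. The corresponding ECDF curve shows for each target value $v$ the fraction of the (run, budget) pairs $(i, t)$ that satisfy $V(A, f, t, i) \le v$, i.e., for which the best solution found by run $i$ within its first $t$ evaluations has quality at most $v$.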
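Both ECDF values are plain fractions over pairs, as the following Python sketch shows. The data layout (a dictionary mapping each target, respectively each budget, to the list of per-run samples) and the use of `None` for runs that never reached a target are assumptions of this sketch, as are the function names and the example data.

```python
def fixed_target_ecdf(hitting_times, budget):
    """Fraction of (run, target) pairs whose first hitting time is at most
    `budget`. hitting_times[v] lists the hitting times of all runs for
    target v; None marks a run that never reached v."""
    pairs = [t for times in hitting_times.values() for t in times]
    hits = sum(1 for t in pairs if t is not None and t <= budget)
    return hits / len(pairs)


def fixed_budget_ecdf(best_values, target):
    """Fraction of (run, budget) pairs whose best-so-far value is at most
    `target`. best_values[t] lists the best values of all runs at budget t."""
    pairs = [v for values in best_values.values() for v in values]
    return sum(1 for v in pairs if v <= target) / len(pairs)


# Hypothetical data: two targets with three runs, two budgets with two runs.
hitting_times = {10: [5, 8, 12], 20: [30, None, 45]}
best_values = {100: [3, 5], 200: [6, 9]}
print(fixed_target_ecdf(hitting_times, budget=12))
print(fixed_budget_ecdf(best_values, target=5))
```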
4.2 Linking the Data Files
In the upload tab of the postprocessing part, the user provides the links to the folders containing the performance data that shall be analyzed. Figure 3 shows this tab. The user can select whether his/her data is in the format of the IOHprofiler experimentation part or in the COCO format. Toggling the efficient mode results in a faster computation of the results, at the cost of precision. After choosing the data file to be uploaded to the tool, the Data Processing Prompt on the right records which data has been identified; in the example 100 runs for the 100-dimensional version of function f2. The list of processed data at the bottom summarizes this information in table format.
4.3 Fixed-Target Results
The fixed-target section has four different subsections (“tabs”):

‘Data Summary’: this tab provides tables with the fixed-target running time statistics, as well as tables with the sorted raw values of the individual runs. See Section 4.3.1 for details.

‘Expected Runtime’: an interactive plot illustrates the fixed-target running times. The user can choose to display mean and/or median values along with the standard deviations. The user also selects the algorithms which are displayed, the range for which the fixed-target statistics are computed, and whether or not the axes are scaled logarithmically. See Section 4.3.2 for details.

‘Probability Mass Function’: interactive histograms show the distribution of the fixed-target running times for target values selected by the user. Furthermore, an approximation for the empirical probability mass function is provided in this tab, cf. Section 4.3.3.

‘Cumulative Distribution’: ECDF curves are computed for target values specified by the user. A spider plot shows the area under the ECDF curves for different target values. In addition, ECDF curves for individual target values can be shown, cf. Section 4.3.4.
4.3.1 Fixed-Target: ‘Data Summary’
Figure 4 shows the upper part of the ‘Data Summary’ tab. The user can set the range and the granularity of the results in the box on the left. The table shows fixed-target running times for evenly spaced target values. More precisely, for each (algorithm $A$, target value $v$) pair the table provides

runs: the number of runs of algorithm $A$ in which at least one solution $x$ satisfying $f(x) \ge v$ has been found,

mean: the average number of function evaluations needed to find a solution of function value at least $v$,

median, 2%, 5%, …: the quantiles of these first-hitting times.
The sorted raw data used to compute the summarizing statistics can be downloaded from the Original Runtime Samples section on the bottom of the ‘Data Summary’ tab. For each target value $v$ selected in the options box on the left, the table shows, for each algorithm $A$ and each run $i$, the number of evaluations performed by the algorithm until it evaluated for the first time a solution of quality at least $v$. The user can choose between a vertical and a horizontal alignment of the data; Figure 5 shows the wide variant. These tables can be stored as csv files.
4.3.2 Fixed-Target: ‘Expected Runtime’
The average, median, and standard deviations of the running time samples are depicted against the best-so-far objective values. The displayed elements can be switched on and off by clicking on the legend on the right. This also allows the user to select the algorithms for which the results are shown. Some display options, including the option to store the picture as a png file, appear when moving the mouse over the picture. Detailed numbers appear when hovering the mouse over the curves, cf. Figure 6.
4.3.3 Fixed-Target: ‘Probability Mass Function’
The third tab of the fixed-target section provides, for a target value selected by the user, histograms of the running time samples and an approximation of the probability mass function.
For a selected target value $v$ the histogram, displayed in Figure 7, shows for each bin the number of runs whose running time (the number of evaluations needed to reach $v$) falls into that bin. The bin sizes are chosen automatically according to the so-called Freedman–Diaconis rule, by which the bin size is set to $2\,\mathrm{IQR}/\sqrt[3]{r}$, where $\mathrm{IQR}$ is the interquartile range of the running time sample and $r$ is the number of runs. Note that the displayed algorithms can be selected again by clicking on the legend on the right. The user has two options: an overlaid display, where all algorithms are displayed in the same plot, or a separated one, in which each algorithm is displayed in an individual chart.
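The Freedman–Diaconis rule sets the bin width to twice the interquartile range of the sample divided by the cube root of the sample size. A minimal Python sketch is given below; it uses `statistics.quantiles` from the standard library (Python 3.8 or later) with its default quartile estimator, which is an assumption, since different quartile definitions yield slightly different widths. The function name and the sample are hypothetical.

```python
import statistics


def freedman_diaconis_width(samples):
    """Bin width 2 * IQR / r**(1/3) for a sample of size r."""
    q1, _, q3 = statistics.quantiles(samples, n=4)  # the three quartile cut points
    return 2 * (q3 - q1) / len(samples) ** (1 / 3)


# Hypothetical running times of eight runs for one target value.
running_times = [1, 2, 3, 4, 5, 6, 7, 8]
print(freedman_diaconis_width(running_times))
```

Because the width depends on the interquartile range rather than on the full range, a few extreme running times do not inflate the bins.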
Finally, the estimation of the probability mass function (cf. Figure 8) may be useful to get a better idea of how the running times are distributed for a given target value $v$. The user can opt to show all individual running times, or only the approximated probability mass function. Note, however, that the latter is just an approximation, which estimates the probability mass function by treating the running times as continuous variables (see Footnote 4). Note also that in the example of Figure 8 many data points seem aligned; this might be caused by turning on (in the upload tab) the “efficient mode”, in which the raw data set is trimmed.
Footnote 4: Strictly speaking, this method gives imprecise estimations when there are many duplicated values. Improvements are planned for a future version.
4.3.4 Fixed-Target: ‘Cumulative Distribution’
This tab provides ECDF curves and information about the area under the ECDF curves. For the aggregated ECDF curves, the user selects a range of the target values and the steps at which the data is displayed. Selecting a smallest target value $v_{\min}$, a largest target value $v_{\max}$, and a step size $v_{\text{step}}$ as in the example of Figure 9, the ECDF curves for target values $v_{\min}, v_{\min} + v_{\text{step}}, \ldots, v_{\max}$ are computed. When $r$ independent runs have been performed, the ECDF curves thus show, for a given budget $t$, the fraction of all (run, target value) pairs $(i, v)$ for which run $i$ has reached target $v$ within its first $t$ evaluations. In the example of Figure 9, for Algorithm LeadingOnes_resampling (blue curve) this is the case for around of the pairs after function evaluations. For Algorithm LeadingOnes_1_plus_50_adap_p (green curve) the fraction is 58%.
An ideal algorithm would sample the maximal function value in the first step. This algorithm would have a 100% score for all budgets $t$. In practice, such an algorithm does not exist, but it serves as a theoretical upper bound and we use the area under its curve to normalize the areas under the curves of the tested algorithms. The radar-like plot in the ‘Area under the ECDF’ part of the ‘Cumulative Distribution’ tab displays these normalized values for the (equally spaced) target values chosen by the user, cf. Figure 10.
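The normalization by the ideal algorithm can be sketched in Python for a single target value. The sketch assumes that the area is accumulated over the integer budgets 1, ..., `max_budget` on a linear scale, so that the ideal algorithm (ECDF equal to 1 at every budget) has area `max_budget`; how IOHprofiler discretizes or scales the budget axis is not specified here, and the function name and data are hypothetical.

```python
def normalized_ecdf_area(hitting_times, max_budget):
    """Area under the fixed-target ECDF of one target over budgets
    1..max_budget, divided by the area of the ideal algorithm.
    None marks a run that never reached the target."""
    runs = len(hitting_times)
    area = 0.0
    for budget in range(1, max_budget + 1):
        reached = sum(1 for t in hitting_times
                      if t is not None and t <= budget)
        area += reached / runs       # ECDF value at this budget
    return area / max_budget         # ideal algorithm: ECDF = 1 everywhere


# Hypothetical hitting times of three runs; the third never hits the target.
print(normalized_ecdf_area([1, 3, None], max_budget=4))
```

A value close to 1 means that almost all runs reach the target almost immediately; a value of 0 means that no run reaches it within the considered budget.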
ECDF curves for individual targets are available in the ‘Single Target’ section of the ‘Cumulative Distribution’ tab. An example is shown in Figure 11.
4.4 Fixed-Budget Results
The fixed-budget section has the same four tabs as the fixed-target section:

‘Data Summary’: tables with fixed-budget running time statistics and sorted raw values of the individual runs,

‘Expected Target Values’: interactive plot illustrating the best-so-far function values as a function of the budget, in particular the averages, the medians, and the standard deviations,

‘Probability Mass Function’: interactive histograms of the function values for budgets selected by the user and an approximation of the empirical probability mass function, and

‘Cumulative Distribution’: ECDF curves and normalized values for the area under the ECDF curve for budgets specified by the user.
The plots are similar to those presented in Section 4.3; we therefore omit a detailed description.
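As a small illustration of the fixed-budget perspective, the following Python sketch computes the best function value found within a given budget for each run and summarizes these values across runs. The function names and the toy data are hypothetical; the sketch only mirrors the kind of statistics listed above and is not the implementation used by the postprocessing tool.

```python
import statistics
from typing import Dict, List, Sequence


def value_at_budget(trajectory: Sequence[float], budget: int) -> float:
    """Best function value among the first `budget` evaluations (maximization)."""
    return max(trajectory[:budget])


def fixed_budget_summary(runs: List[Sequence[float]], budget: int) -> Dict:
    """Mean, median, standard deviation, and sorted raw values across runs."""
    values = sorted(value_at_budget(run, budget) for run in runs)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "values": values,
    }


# Toy data (hypothetical): function value of each evaluated search point.
runs = [
    [1, 3, 2, 5],
    [2, 2, 4, 4],
    [0, 1, 1, 6],
]
summary = fixed_budget_summary(runs, budget=3)
print(summary)
```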
4.5 Parameter Evolution
In this section the user can track the evolution of the parameters (cf. Section A.6.2 for an example explaining how to record this data in the experimental part of IOHprofiler). In the example of Figure 12, we see that Algorithm LeadingOnes_1_plus_50_adap_p (red curve) used a static population size of 50, while Algorithm LeadingOnes_1_plus_10_adap_lambda (green curve) used a dynamic population size. Starting from solutions of function value 80, the average population size of this algorithm was around . A table containing average values as well as quantiles and standard deviations can be downloaded/stored at the bottom of this tab.
The corresponding fixed-budget results will be made available shortly.
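One way to read the fixed-target parameter plots is sketched below in Python: for each target value, take the parameter value recorded when a run first reaches that target, and average over the runs that reach it. This aggregation rule, the function names, and the toy records are assumptions for illustration; the text does not specify how the tool aggregates the data.

```python
from statistics import mean
from typing import Dict, List, Optional, Sequence, Tuple

Record = Tuple[float, float]  # (best-so-far function value, parameter value)


def parameter_at_target(run: Sequence[Record], target: float) -> Optional[float]:
    """Parameter value logged when the run first reaches `target`, if ever."""
    for value, parameter in run:
        if value >= target:
            return parameter
    return None


def mean_parameter_per_target(runs: List[Sequence[Record]],
                              targets: Sequence[float]) -> Dict:
    """Average parameter value per target over the runs that reach it."""
    result = {}
    for target in targets:
        observed = [p for p in (parameter_at_target(r, target) for r in runs)
                    if p is not None]
        result[target] = mean(observed) if observed else None
    return result


# Toy records (hypothetical): population size logged along two runs.
runs = [
    [(78, 10), (80, 12), (85, 20)],
    [(79, 10), (82, 16), (84, 18)],
]
averages = mean_parameter_per_target(runs, targets=[80, 85, 90])
print(averages)
```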
5 Conclusions and Possible Extensions
An important aspect of benchmarking, which we have set aside in the present work, is the selection of suitable benchmark problems. The user can apply IOHprofiler to any set of optimization problems that can be formulated as maximization of a function . In ongoing collaborations with various colleagues, most notably working group 3 from COST action CA15140, we aim to present to the community a suggestion of benchmark functions that should be included in a standardized benchmark set. We recall that the continuous counterpart COCO [HAM16], on which IOHprofiler is built, compares in the single-objective, static, and noise-free case performance across 24 functions, which are grouped into 5 sets, according to whether or not they are separable, uni- or multimodal, well- or ill-conditioned, and according to whether or not they exhibit a global structure, cf. [HFRA09] for details. For the discrete benchmarking, we suggest starting the discussion of which problems to include in the benchmark environment with the question of which problem features should be represented, and across which of them performance should be aggregated.
All performance indicators provided by IOHprofiler are based on counting function evaluations. In the long run, it will be desirable to allow for a comparison between iterative and non-iterative optimization methods such as Mathematical Programming. To this end, the outputs of IOHprofiler will have to be adjusted to time-based performance indicators. A major challenge posed by the latter is the question of whether and how to provide system-independent performance measures, i.e., results that do not depend on the hardware on which the algorithms are run.
An important aspect that we plan to address in future work is the extension of IOHprofiler to allow for comparisons on noisy, dynamic, constrained, or multi-objective optimization problems. Finally, we are also considering extending IOHprofiler to other search domains, e.g., permutation-based problems.
As a short-term perspective, we will include additional performance measures, in particular a comparison across different dimensions and some standard statistical tests. Concerning the experimental part, we are most notably working on simplifying the creation of different problem suites.
Acknowledgments
We thank our colleagues Anne Auger, Dimo Brockhoff, Arina Buzdalova, Maxim Buzdalov, Johann Dréo, Nikolaus Hansen, Pietro S. Oliveto, Ofer Shir, Markus Wagner, and Thomas Weise for various discussions around the benchmarking of iterative optimization heuristics.
Parts of our work have been inspired by working group 3 of COST Action CA15140 ‘Improving Applicability of Nature-Inspired Optimisation by Joining Theory and Practice (ImAppNIO)’ supported by the European Cooperation in Science and Technology.
Our work has been supported by a public grant as part of the Investissement d’avenir project, reference ANR-11-LABX-0056-LMH, LabEx LMH, in a joint call with the Gaspard Monge Program for optimization, operations research, and their interactions with data sciences.
Furong Ye acknowledges financial support from the China Scholarship Council, CSC No. 201706310143.
References
 [DKLW13] Benjamin Doerr, Timo Kötzing, Johannes Lengler, and Carola Winzen. Black-box complexities of combinatorial problems. Theoretical Computer Science, 471:84–106, 2013.
 [HAM16] N. Hansen, A. Auger, O. Mersmann, T. Tušar, and D. Brockhoff. COCO: A platform for comparing continuous optimizers in a black-box setting. arXiv e-prints, arXiv:1603.08785, 2016.
 [HFRA09] Nikolaus Hansen, Steffen Finck, Raymond Ros, and Anne Auger. Real-Parameter Black-Box Optimization Benchmarking 2009: Noiseless Functions Definitions. Research Report RR-6829, INRIA, 2009.
 [LW12] Per Kristian Lehre and Carsten Witt. Black-box search by unbiased variation. Algorithmica, 64:623–642, 2012.
 [RV11] Jonathan Rowe and Michael Vose. Unbiased black box search algorithms. In Proc. of Genetic and Evolutionary Computation Conference (GECCO’11), pages 2035–2042. ACM, 2011.
Appendix
Appendix A Manual for the Experimental Part: Data Generation
In this section we describe the experimental part of IOHprofiler and provide an illustrated manual, which enables users to conduct experiments of their choice. We recall that this part is built upon the COCO (COmparing Continuous Optimisers) platform [HAM16], from which the data structure and some tool functions are inherited.
A.1 Preparation
Running IOHprofiler requires a working Python environment and a C compiler. The benchmark problems as well as the algorithms can be provided in either Python or C. IOHprofiler has currently been tested with Python 2.7.12 and with gcc 5.4.1. A version accepting algorithms and problems in Java is in preparation.
Both the experimental and the postprocessing parts of IOHprofiler can be downloaded from the GitHub page https://github.com/IOHprofiler. After downloading the zipped files of the experimental part, the archive needs to be extracted.
A.2 Overview of the Main Steps
The files of the C [Python] interface are located at the path “/code-experiments/build/c” [“/code-experiments/build/python”].
To run an experiment, the following steps need to be executed.

1. Benchmark Selection and Configuration: The user needs to select the set of problems (hereafter called “the suite”) for which benchmark data shall be generated. The suite is defined in the configuration file configuration.ini, which is also used to select the granularity at which performance data is stored, the location at which the results are stored, etc. The configuration file and the suite definition are described in Section A.3.

2. Algorithm Setup: The algorithm for which the performance data is generated needs to be defined in the file user_algorithm.c [user_algorithm.py]. To this end, the content of the function “user_algorithm” (which includes, as an example, the code for pure random search) is replaced by the new algorithm.

3. Data Generation: After ensuring that the working path is “code-experiments/build/c” [“code-experiments/build/python”], executing the statement “python ../../../do.py run-c” [“python ../../../do.py run-python”] generates the performance data, which is saved in the current path.
A.3 The Configuration File
IOHprofiler uses the INI file format for the configuration file “configuration.ini”. All variables are grouped into three sections: [suite], [observer], and [triggers].
The section [suite] contains information about the benchmark problems for which performance data is generated. There are four keys in this section:

suite_name is the name of the suite. The suite is defined in the file suite_PBO.c at “/code-experiments/src/”. As an example, the suite PBO is predefined; it contains the benchmark problems OneMax, LeadingOnes, a linear function with random weights between 0 and 5, and a jump function with gap size . For the time being, we recommend that users keep using the PBO suite and add their benchmark functions to it. The design of a new suite is possible, but not recommended.

functions_id: IDs of the selected benchmark problems. A dash “-” may be used to specify a range of problem IDs, for example “1-4”. Alternatively, the problems can be listed separated by commas, for example “1,2,3,4”. An explanation of how to add new benchmark problems and how to assign the function IDs is given in Section A.4. In the distributed version the following benchmark problems are already defined:

OneMax

LeadingOnes

Jump with jump size

A linear function with fixed, but randomly chosen weights between 0 and 5.


instances_id: IDs of the problem instances, cf. Section A.5. Instances can be selected using commas and dashes, e.g., “1-25,75,80-100”.

dimensions: the selection of the problem dimensions, e.g., ”100,500,1000” will create experimental data for the three different problem dimensions.
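The ID selection syntax described for functions_id and instances_id (comma-separated entries, each either a single ID or a range) can be expanded as in the following Python sketch. The helper name expand_ids is hypothetical, and the sketch assumes that ranges are written with a plain hyphen and are inclusive on both ends; it is not the parser used by IOHprofiler.

```python
from typing import List


def expand_ids(spec: str) -> List[int]:
    """Expand a selection such as "1-4" or "1,2,3,4" into a list of IDs."""
    ids: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty entries such as a trailing comma
        if "-" in part:
            low, high = part.split("-")
            ids.extend(range(int(low), int(high) + 1))  # inclusive range
        else:
            ids.append(int(part))
    return ids


print(expand_ids("1-4"))
print(expand_ids("100,500,1000"))
```

With this reading, the selections “1-4” and “1,2,3,4” describe the same four problems.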
The [observer] section contains information concerning the output files. There are five keys in this section:

observer_name: the name of the observer. We suggest using PBO for the time being; an updated description will be made available when the functionality to create new suites has been improved.

result_folder: the name of the folder where the results will be stored. If the folder does not exist, it will be created automatically.

algorithm_name: a name for the algorithm. This information will be stored in the output files.

algorithm_info: additional information about the algorithm can be written here; this information will be stored in the .info file of the output, cf. Section 3.

parameters_name: a list of parameters to be stored. If no algorithm parameters are to be stored, the user leaves this as “”. If several parameters are to be stored, they are separated by commas, e.g., “p1, p2, p3”.
In the [triggers] section the user decides the granularity of the performance data. According to the user’s choice, up to four different files will be created, cf. Section 3 for a description of the available output files.
There are four keys in the [triggers] section.

complete_triggers: Set as ”true” to output *.cdat files.

number_interval_triggers: The step size for *.idat files. For example, selecting number_interval_triggers = 50 will store results of every 50th function evaluation. If the user does not wish to generate *.idat files, he/she selects number_interval_triggers = 0.

number_target_triggers: the budget of storing information for every in *.tdat files. If the user does not wish to generate *.tdat files, he/she selects number_target_triggers = 0.

base_evaluation_triggers: A set of parameters for *.tdat files, for example, “1,2,5” means storing information every th, th and th function evaluation in *.tdat files. If the user does not wish to generate *.tdat files, he/she selects base_evaluation_triggers = 0.
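To make the structure of the configuration file concrete, the following Python sketch reads a configuration with the three sections described above using the standard configparser module. It only illustrates the file layout and the meaning of number_interval_triggers; the helper logged_by_interval and the assumption that exactly the multiples of the interval are stored are illustrative and not taken from the IOHprofiler sources.

```python
import configparser

# Example configuration with the three sections [suite], [observer], [triggers].
CONFIG_TEXT = """
[suite]
suite_name = PBO
functions_id = 1-4
instances_id = 1
dimensions = 100

[observer]
observer_name = PBO
result_folder = EXP
algorithm_name = MY_ALGORITHM
algorithm_info = MY_ALGORITHM
parameters_name = p1, p2

[triggers]
complete_triggers = true
number_interval_triggers = 10
number_target_triggers = 0
base_evaluation_triggers = 0
"""

config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)

write_cdat = config.getboolean("triggers", "complete_triggers")
interval = config.getint("triggers", "number_interval_triggers")
parameters = [name.strip()
              for name in config.get("observer", "parameters_name").split(",")
              if name.strip()]


def logged_by_interval(evaluation: int, interval: int) -> bool:
    """True if this evaluation would be stored in the *.idat file.

    Assumption: every `interval`-th evaluation is stored; 0 disables logging.
    """
    return interval > 0 and evaluation % interval == 0


print(write_cdat, interval, parameters)
print([t for t in range(1, 31) if logged_by_interval(t, interval)])
```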
A.4 Adding New Benchmark Problems
IOHprofiler allows users to add new user-defined benchmark problems. To do so, the user needs to create a problem file, which contains the definition of the function, and needs to include the problem in a suite.
The benchmark problems are defined in f_*.c files. Probably the easiest way to create a new benchmark problem is to copy the OneMax example contained in file f_one_max.c, and to adjust it to the new function.
The actual definition of the OneMax function is contained in the function f_one_max_raw. The user should replace the content of this function by his/her problem. All occurrences of “one_max” need to be replaced by the name of the new problem (we use “new_problem” in the following). If different problem instances are desired, the user creates these in the function f_new_problem_IOHProfiler_problem_allocate, cf. Section A.5 for details.
To be used by IOHprofiler, the new problem needs to be added to a problem suite. For example, the preinstalled suite ”PBO” is defined in the file suite_PBO.c. To include the new problem in this suite, the problem file f_new_problem.c is added to the header of the file suite_PBO.c. Then, the f_new_problem_IOHProfiler_problem_allocate function of the new problem needs to be called in the function PBO_get_problem. Finally, the problem numbers need to be modified in the function suite_PBO_initialize.
A.5 Problem Instances
Many standard IOHs are representation-invariant, in the sense that their performance is identical on each fitness landscape, regardless of how it is embedded. That is, the performance is oblivious of rotations and shifts. Furthermore, it is sometimes argued that performance should also be oblivious with respect to a scaling of the function values; algorithms respecting this invariance are often referred to as “comparison-based”. Users interested in testing how sensitive their algorithms are with respect to search space and/or fitness landscape transformations can make use of built-in transformations: similar to the COCO framework, IOHprofiler offers to test performance on various instances of the same problem. Precisely, the following four transformations are available. They can be combined with each other, cf. also Section 2.2 of the main document. As mentioned above, these transformations are chosen by the user in function f_new_problem_IOHProfiler_problem_allocate in the file f_new_problem.c.

transform_obj_scale: multiplicative shift of the function values, i.e., the function values are multiplied by a constant factor before they are returned to the algorithm.

transform_obj_shift: additive shift of the function values, i.e., a constant offset is added to the function values before they are returned to the algorithm.

transform_vars_xor: an XOR of the search point, i.e., a shift in the search space. Instead of evaluating the search point itself, the function value of its bitwise XOR with a fixed binary vector is computed and returned to the algorithm. Note that in this case the instance has a fitness landscape that is isomorphic to that of the original function. Algorithms that are unbiased in the sense of [LW12, RV11, DKLW13] show the same performance on any of these instances.

transform_vars_sigma: a permutation of the search point, i.e., instead of the function value of the search point itself, the algorithm receives the function value of the permuted search point, where the permutation acts on the positions of the string.
Using OneMax as an example, we demonstrate how to define the different problem instances. As a general rule, we recommend reserving instance “1” for the original benchmark problem, i.e., the one that does not call any of the four transformations.
We first demonstrate how to convert the original OneMax function into an instance that combines an XOR of the search point with a multiplicative and an additive shift of the function values. To this end, we first assign to “problem” the original OneMax instance (cf. line 1 in the example below). After that, the instance is transformed by the XOR transformation (line 6), using a pseudo-randomized binary vector chosen in line 2. In lines 7 and 8, the multiplicative and the additive shift of the function values are then applied. The two shift values are again pseudo-random numbers, which are chosen in lines 3-5. The ranges for and are and , respectively.
The following code shows how to combine a permutation of the search point with a multiplicative and an additive shift of the function values. After being allocated with the original OneMax instance, “problem” is transformed by the permutation in line 13, where the pseudo-random permutation is chosen in lines 3 to 9. Then, following the same procedure as in the example above, the two shifts of the function values are applied in lines 14 and 15, with shift values that are again random numbers, chosen in lines 10 to 12.
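Independently of the C listings discussed above, the effect of the four transformations can be illustrated with a short Python sketch that wraps a function by each transformation in turn. The Python functions below merely borrow the names of the C transformations; their signatures, the symbols z, sigma, a, and b, and the concrete values chosen for them are assumptions made for illustration (positions are 0-based here).

```python
import random


def one_max(x):
    """Number of ones in the bit string x."""
    return sum(x)


def leading_ones(x):
    """Number of initial ones in the bit string x."""
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count


def transform_vars_xor(f, z):
    """Evaluate f on the bitwise XOR of the search point with z."""
    return lambda x: f([xi ^ zi for xi, zi in zip(x, z)])


def transform_vars_sigma(f, sigma):
    """Evaluate f on the search point reordered by the permutation sigma."""
    return lambda x: f([x[i] for i in sigma])


def transform_obj_scale(f, a):
    """Multiply the function values by the factor a."""
    return lambda x: a * f(x)


def transform_obj_shift(f, b):
    """Add the offset b to the function values."""
    return lambda x: f(x) + b


rng = random.Random(42)
n = 8
z = [rng.randint(0, 1) for _ in range(n)]  # pseudo-random binary vector
a, b = 2.0, 3.0                            # arbitrary illustrative values

# Combine XOR, multiplicative shift, and additive shift, in this order.
problem = one_max
problem = transform_vars_xor(problem, z)
problem = transform_obj_scale(problem, a)
problem = transform_obj_shift(problem, b)

optimum = [1 - zi for zi in z]  # the complement of z is mapped to all ones
print(problem(optimum), problem(z))

# A permutation changes which positions LeadingOnes inspects first.
permuted = transform_vars_sigma(leading_ones, [2, 0, 1])
print(leading_ones([0, 1, 1]), permuted([0, 1, 1]))
```

The XOR transformation only relocates the optimum, here to the complement of z, which is why unbiased algorithms are not affected by it, whereas the permutation changes the behavior of position-sensitive functions such as LeadingOnes.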
A.6 Examples
The user can find two examples in the git folder /example/:

/example/example1/ includes the code and the results of pure random search (hereafter called “random search”), while

/example/example2/ includes the code and results of a $(1+\lambda)$ evolutionary algorithm.
We describe the configuration of these examples in Sections A.6.1 and A.6.2.
A.6.1 Example 1: Pure Random Search
The example of the ”random search” method is located at the path /example/example1/.
The ”configuration.ini” file is set as follows.
[suite]
suite_name = PBO
functions_id = 1-4
instances_id = 1-100
dimensions = 100
[observer]
observer_name = PBO
result_folder = EXP
algorithm_name = RANDOM_SEARCH
algorithm_info = RANDOM_SERACH
parameters_name = evaluation
[triggers]
complete_triggers = true
number_interval_triggers = 10
number_target_triggers = 3
base_evaluation_triggers = 1,2,5
Based on this configuration file, performance data is collected for the algorithm optimizing the 100-dimensional variants of the benchmark problems 1, 2, 3, and 4. For each problem, the instances from 1 to 100 are used. For each instance the number of independent runs performed in the experimental part is specified in the variable “INDEPENDENT_RESTARTS” in the file containing the algorithm, cf. example below. If, for example, the user wishes to run each of the instances 1 to 100 twice, he/she sets “instances_id = 1-100” in the configuration file, and sets “INDEPENDENT_RESTARTS = 2” in the algorithm file.
The results of this example experiment will be stored in the folder ./EXP/. The name of the parameter to be stored is set as ”evaluation”; this will be the header of the respective column in the output files.
In this example four different output files will be created:
 *.cdat storing data of each iteration,
 *.idat storing data from every 10th iteration,
 *.tdat storing data from every th, every th, th, and th function evaluation,
 *.dat storing data for each iteration in which an improvement has been found,
where * is of the form IOHprofiler_f1_DIM100_i1, as explained in Section 3.
The user_algorithm that implements the pure random search is implemented in file /example/example1/c/user_algorithm.c as follows:
The user needs to set two parameters in the user_algorithm.c [user_algorithm.py].

BUDGET_MULTIPLIER: This parameter controls the maximal budget of function evaluations, which is set to BUDGET_MULTIPLIER times the dimension of the problem.

INDEPENDENT_RESTARTS: The number of independent runs of the algorithm for each instance. That is, setting “INDEPENDENT_RESTARTS=100” and “instances_id=1-3” will result in an overall number of 300 runs, i.e., 100 independent runs for each of the first three instances.
With the code above, the maximal number of evaluations for each run is 50*dimension, and the algorithm will not restart within one run.
For each iteration, a new individual x is generated randomly (line 12), and the fitness is evaluated in line 16. Note here that y is a vector that stores the fitness of x. Also, a parameter p (evaluation step) will be logged in the output files (line 15), and its logging name is defined in the configuration file as “evaluation”. We will see in the next section an example where more than one parameter is stored.
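The roles of BUDGET_MULTIPLIER and INDEPENDENT_RESTARTS can be summarized by the following self-contained Python sketch of pure random search. It does not use the IOHprofiler interface: the functions random_search and run_experiment, the in-memory log, and the use of OneMax as test function are assumptions for illustration, and the instances are only counted here, not transformed.

```python
import random

BUDGET_MULTIPLIER = 50    # budget per run = BUDGET_MULTIPLIER * dimension
INDEPENDENT_RESTARTS = 2  # independent runs per instance (illustrative value)


def one_max(x):
    """Number of ones in the bit string x."""
    return sum(x)


def random_search(fun, dimension, budget, rng, log):
    """Sample `budget` uniform random bit strings and return the best value."""
    best = float("-inf")
    for evaluation in range(1, budget + 1):
        x = [rng.randint(0, 1) for _ in range(dimension)]
        y = fun(x)
        log.append((evaluation, y))  # the evaluation counter is the logged parameter
        best = max(best, y)
    return best


def run_experiment(fun, dimension, instances, seed=0):
    """Run INDEPENDENT_RESTARTS independent runs for every selected instance."""
    rng = random.Random(seed)
    results = {}
    for instance in instances:
        for run in range(INDEPENDENT_RESTARTS):
            log = []
            budget = BUDGET_MULTIPLIER * dimension
            best = random_search(fun, dimension, budget, rng, log)
            results[(instance, run)] = (best, len(log))
    return results


results = run_experiment(one_max, dimension=10, instances=[1, 2, 3])
print(len(results), "runs,", "evaluations per run:",
      {n for _, n in results.values()})
```

With three instances and INDEPENDENT_RESTARTS = 2, the sketch performs six runs of 50*10 = 500 function evaluations each.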
A.6.2 Example 2: A $(1+\lambda)$ EA
This example of a $(1+\lambda)$ EA is located at the path /example/example2/. The configuration file is as follows.
[suite]
suite_name = PBO
functions_id = 1-4
instances_id = 1
dimensions = 100,500,1000
[observer]
observer_name = PBO
result_folder = EXP
algorithm_name = ONE_PLUS_LAMDA_EA
algorithm_info = ONE_PLUS_LAMDA_EA
parameters_name = mutation_rate,l
[triggers]
complete_triggers = true
number_interval_triggers = 50
number_target_triggers = 0
base_evaluation_triggers = 0
Based on this configuration file, running time data is generated for the benchmark problems with function IDs 1 to 4. For each function, the algorithm will be run in dimensions 100, 500, and 1000. Only the first instance (without transformation) is tested. All results will be stored in the folder ./EXP/, and the names of the two tracked parameters are set as “mutation_rate” and “l” (the number of bits in which parent and offspring differ).
The three data files *.idat (storing data after every 50th evaluation), *.cdat, and *.dat will be generated.
The user_algorithm that implements the $(1+\lambda)$ EA is as follows:
With the code above, the maximal number of function evaluations for each run is 50*dimension, and the algorithm will do ten independent runs for each selected instance (as discussed above, only instance 1 has been selected in the configuration file).
For each generation, offspring are created by mutating the parent individual (line 34). Their fitness is evaluated in line 36. The function fun(offspring) returns the fitness of the offspring.
In addition to the information about the fitness values, a vector of parameters (’para’) will be stored in the output files (line 36). The vector stores the mutation_rate and the number of flipped bits (line 35). The names attributed to these parameters are chosen as ”mutation_rate, l” in the configuration.ini file.
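The logging of several parameters per evaluation can be illustrated with the following Python sketch of a $(1+\lambda)$ EA. It is not the code of example 2 and does not use the IOHprofiler interface: the function name, the static mutation rate 1/dimension, the standard bit mutation operator, and the in-memory log are assumptions. For every evaluated offspring, the sketch records the mutation rate and the number l of flipped bits.

```python
import random


def one_plus_lambda_ea(fun, dimension, lam, budget, rng):
    """(1+lambda) EA with standard bit mutation; returns best value and log."""
    mutation_rate = 1.0 / dimension  # static rate (assumption)
    parent = [rng.randint(0, 1) for _ in range(dimension)]
    best = fun(parent)
    evaluations = 1
    log = []  # records: (evaluation, fitness, mutation_rate, l)

    while evaluations < budget:
        best_child, best_child_value = None, float("-inf")
        for _ in range(lam):
            if evaluations >= budget:
                break
            flips = [rng.random() < mutation_rate for _ in range(dimension)]
            l = sum(flips)  # number of bits in which parent and offspring differ
            child = [bit ^ flip for bit, flip in zip(parent, flips)]
            value = fun(child)
            evaluations += 1
            log.append((evaluations, value, mutation_rate, l))
            if value > best_child_value:
                best_child, best_child_value = child, value
        # Elitist selection: the best offspring replaces the parent if not worse.
        if best_child_value >= best:
            parent, best = best_child, best_child_value
    return best, log


rng = random.Random(0)
dimension = 20
best, log = one_plus_lambda_ea(sum, dimension, lam=5,
                               budget=50 * dimension, rng=rng)
print(best, len(log), log[0])
```

Here the built-in sum plays the role of OneMax, and each log record corresponds to one row of parameter values that would be written next to the fitness value.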
A.7 Overview of the Different Files
The following lists summarize the folders and the main files that can be found in the experimental part of IOHprofiler. The files that need to be edited by the user are formatted in bold font.
Folder /code-experiments/:

/src/ : a folder of source files

/build/ : a folder of C and Python interfaces

/tools/ : some common tools for the project
Folder /example/:

/example1/ : examples of random search method, cf. Section A.6.1

/example2/ : examples of the $(1+\lambda)$ EA, cf. Section A.6.2
Folder /src/:

f_binary.c : Implementation of the binary function and problem

f_jump.c : Implementation of the jump function and problem

f_leading_ones.c : Implementation of the leading ones function and problem

f_linear.c : Implementation of the linear function and problem

f_one_max.c : Implementation of the one_max function and problem

IOHProfiler.h : Header file for all public IOHProfiler functions and variables

IOHProfiler_internal.h : Definitions of internal IOHProfiler structures and typedefs

IOHProfiler_observer.c : Definitions of functions regarding IOHProfiler observers

IOHProfiler_platform.h : Automatic platformdependent configuration of the IOHProfiler framework

IOHProfiler_problem.c : Definitions of functions regarding IOHProfiler problems

IOHProfiler_random.c : Definitions of functions regarding IOHProfiler random numbers

IOHProfiler_runtime_c.c : Generic IOHProfiler runtime implementation for the C language

IOHProfiler_string.c : Definitions of functions that manipulate strings

IOHProfiler_suite.c : Definitions of functions regarding IOHProfiler suites

IOHProfiler_utilities.c : Definitions of miscellaneous functions used throughout the IOHProfiler framework

suite_PBO_legacy_code.c : Methods for generating pseudo random numbers

logger_PBO.c : Implementation of the PBO logger

observer_PBO.c : Implementation of the PBO observer

suite_PBO.c : Selection of functions to be included in the PBO suite

transform_obj_shift.c : Implementation of shifting the objective value by the given offset

transform_obj_scale.c : Implementation of scaling the objective value by the given factor

transform_vars_shift.c : Implementation of shifting all decision values by an offset

transform_vars_xor.c : Implementation of the xor of all decision values by an offset

transform_vars_sigma.c : Implementation of reordering all decision values by permuting the string of decision values
Folder /build/c/:

Makefile : Makefile to build the C program

user_experiment.c : The interface to invoke user algorithm

user_algorithm.c : The file where the user defines his/her algorithm

configuration.ini : the configuration file, cf. Section A.3

…
Folder /build/python/:

user_experiment.py : The interface to invoke user algorithm

user_algorithm.py : The file where the user defines his/her algorithm

configuration.ini : the configuration file, cf. Section A.3

…