Optimal Parameter Choices Through Self-Adjustment: Applying the 1/5-th Rule in Discrete Settings
Abstract
While evolutionary algorithms are known to be very successful for a broad range of applications, the algorithm designer is often left with many algorithmic choices, for example, the size of the population, the mutation rates, and the crossover rates of the algorithm. These parameters are known to have a crucial influence on the optimization time, and thus need to be chosen carefully, a task that often requires substantial efforts. Moreover, the optimal parameters can change during the optimization process. It is therefore of great interest to design mechanisms that dynamically choose best-possible parameters. An example of such an update mechanism is the one-fifth success rule for step-size adaptation in evolution strategies. While in continuous domains this principle is well understood also from a mathematical point of view, no comparable theory is available for problems in discrete domains.
In this work we show that the one-fifth success rule can be effective also in discrete settings. We regard the $(1+(\lambda,\lambda))$ GA proposed in [Doerr/Doerr/Ebel: From black-box complexity to designing new genetic algorithms, TCS 2015]. We prove that if its population size is chosen according to the one-fifth success rule then the expected optimization time on OneMax is linear. This is better than what any static population size can achieve and is asymptotically optimal also among all adaptive parameter choices.
1 Introduction
It is widely acknowledged that setting the parameters of evolutionary algorithms (EAs) is one of the key difficulties in evolutionary optimization. Eiben, Hinterding, and Michalewicz [13] call this challenge “one of the most important and promising areas of research in evolutionary computation”. This statement retains its topicality 15 years after the original publication of [13], as many talks at evolutionary computation conferences certify. We also understand today that even small changes in the parameters can lead to exponential performance gaps between the regarded algorithms [11, 10].
Substantial research efforts have been undertaken to find good parameter settings for general EAs, see for example [4]. It was also discovered early on that it may be suboptimal to use a fixed set of parameters throughout the whole optimization process. It was suggested instead to change the parameters of the algorithm by some dynamic update rules, often using some sort of feedback from the fitness landscape that the algorithm is facing. For example, it can be beneficial in earlier parts of the process to invest in exploration of the fitness landscape, while the algorithm should become more stable and focus on one or few areas of attraction in the later exploitation phase(s).
Interestingly, while in continuous domains parameter control is by now well understood also from a theoretical point of view, almost no theoretical guidance is available for the control of parameters in discrete domains.
With this work we provide a first example of a discrete optimization problem where a self-adjusting (i.e., adaptive, but not fitness-dependent) parameter choice yields an expected optimization time that is better by more than a constant factor than any static parameter choice. The parameter update rule is extremely simple and does not require any problem-specific insights. More precisely, we analyze the runtime of the already mentioned $(1+(\lambda,\lambda))$ GA with population sizes chosen according to the one-fifth success rule on the generalized OneMax problem. We also show that the one-fifth success update scheme is optimal in this setting, that is, no alternative update mechanism can yield a significantly smaller runtime. In fact, we show that, throughout the whole optimization process, the one-fifth success rule suggests parameter settings that closely follow the theoretically best possible choices.
1.1 The One-Fifth Rule
One of the earliest adaptive update rules suggested in the evolutionary computation literature is the one-fifth success rule. It was independently discovered in [21, 5, 22] and constitutes today one of the best known and most widely applied techniques in parameter control. Several empirical results (cf. [14] and references therein) suggest that EAs using the one-fifth rule for adaptive parameter control are quite capable of finding optimal or close to optimal parameter settings. Since the parameters are updated without the intervention of the user, such update mechanisms are a very convenient way to minimize parameter tuning efforts. Furthermore, the one-fifth success rule does not require any problem-specific knowledge and is thus widely applicable.
Originally, the one-fifth rule was designed to control the step size of evolution strategies. In intuitive terms, it suggests that if the probability to create an offspring of better than current-best fitness is greater than $1/5$, then the step size should be increased, while it should be decreased if the probability is lower than $1/5$. Today, this rule has found applications much beyond the adaptation of the step size. Here in this work we use it for adjusting the offspring population size of a genetic algorithm.
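For intuition, the continuous version of this rule can be sketched in a few lines. The following (1+1) evolution strategy on a sphere function is purely illustrative (the function names, the budget, and the factor `a = 1.5` are our own choices, not taken from this paper): the step size is multiplied by `a` after a success and by `a**(-1/4)` after a failure, so that one success per four failures leaves it unchanged on average.

```python
import random

def one_fifth_es(f, x, sigma=1.0, budget=2000, a=1.5):
    """(1+1) ES with a one-fifth success rule for the step size (sketch).

    sigma <- sigma * a on success, sigma <- sigma * a**(-1/4) on failure,
    so an average success rate of 1/5 keeps sigma stable.
    """
    fx = f(x)
    for _ in range(budget):
        # sample one Gaussian offspring with the current step size
        y = [xi + sigma * random.gauss(0, 1) for xi in x]
        fy = f(y)
        if fy < fx:                 # success: offspring strictly better
            x, fx = y, fy
            sigma *= a              # larger steps after an easy success
        else:
            sigma *= a ** (-0.25)   # smaller steps after a failure
    return x, fx, sigma

sphere = lambda v: sum(vi * vi for vi in v)
best, fbest, sigma = one_fifth_es(sphere, [5.0] * 10)
```

After a couple of thousand iterations the strategy typically reaches fitness values far below the initial one while the step size self-adjusts without any user intervention.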
1.2 The $(1+(\lambda,\lambda))$ GA
We regard the $(1+(\lambda,\lambda))$ GA, which has been proposed in [9] as a first example of an evolutionary algorithm optimizing the generalized OneMax problem using $o(n \log n)$ function evaluations. For the theory of evolutionary algorithms, this is a big success as it shows for the first time that even for such simple problems the usage of crossover can be beneficial (all previously known evolutionary algorithms need $\Omega(n \log n)$ function evaluations in expectation to optimize the generalized OneMax problem).
An important parameter of the $(1+(\lambda,\lambda))$ GA is $\lambda$, the number of offspring generated in the mutation phase and the subsequent crossover phase of the algorithm. In the original paper [9] it is shown that for $\lambda = \Theta(\sqrt{\log n})$ the expected optimization time of the $(1+(\lambda,\lambda))$ GA on OneMax is $O(n \sqrt{\log n})$. In [7] we improve the runtime bound to $\Theta\big(\max\{n \log(n)/\lambda,\, n \lambda \log\log(\lambda)/\log(\lambda)\}\big)$. This expression is minimized for $\lambda = \Theta\big(\sqrt{\log(n)\log\log(n)/\log\log\log(n)}\big)$, giving an optimization time of $\Theta\big(n\sqrt{\log(n)\log\log(n)/\log\log\log(n)}\big)$. While this exact expression is irrelevant for the purposes of the present paper, it is important to note that this tight bound is superlinear for every possible static choice of $\lambda$.
It was also observed in [9] that the $(1+(\lambda,\lambda))$ GA has only linear expected optimization time on OneMax when the population size is chosen adaptively depending on the fitness of the current-best individual. While theoretically appealing, this result has limited practical implications since the proposed fitness-dependent parameter choice crucially requires a very good understanding of the optimization process and thus, of the problem at hand. This is testified by the optimal relation of the population size to the current fitness-distance to the optimum, which is $\lambda = \sqrt{n/(n - f(x))}$. Guessing such a functional relationship for real-world optimization problems is typically not doable with reasonable efforts.
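For illustration, the fitness-dependent choice $\lambda = \sqrt{n/(n-f(x))}$ is trivial to compute once one knows it; the difficulty lies in guessing the functional form, not in evaluating it. The helper name below is ours, and we round to the nearest integer:

```python
import math

def fitness_dependent_lambda(n, f_x):
    """Fitness-dependent population size lambda = sqrt(n / (n - f(x)))
    from [9], rounded to the nearest integer (assumes f(x) < n)."""
    return round(math.sqrt(n / (n - f_x)))

# lambda stays small while far from the optimum and grows as f(x) -> n
n = 1000
sizes = [fitness_dependent_lambda(n, f) for f in (0, 500, 900, 990, 999)]
# e.g. f = 0 gives lambda = 1, f = 999 gives lambda = round(sqrt(1000)) = 32
```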
Interestingly, an alternative approach based on the one-fifth success rule was suggested in [9]. In a series of experimental evaluations it was shown that if the population size is chosen according to this rule, the performance of the resulting $(1+(\lambda,\lambda))$ GA is among the best ones for a series of test problems. In that algorithm the population size is increased if no improvement has happened in the last iteration, while it is decreased otherwise. More precisely, for a suitable constant $F > 1$, the population size parameter $\lambda$ is multiplied by $F^{1/4}$ after each iteration in which the fitness of the current-best individual could not be improved, and it is divided by $F$ otherwise, i.e., if the fitness of the current-best search point increased in that iteration.
1.3 Our Results
While the observations made in [9] are purely empirical, we provide with this work a theoretical analysis of the suggested self-adjusting $(1+(\lambda,\lambda))$ GA. We prove that the suggested implementation of the one-fifth success rule yields a linear expected optimization time of the $(1+(\lambda,\lambda))$ GA on the generalized OneMax problem. As noted above, this is better than what any static parameter choice can achieve and is also best possible among all comparison-based algorithms, as we shall comment in Section 4. In particular, our bound shows that the one-fifth success rule suggests population sizes that are asymptotically optimal among all possible (static and dynamic) choices.
To the best of our knowledge, this is the first time that in a discrete search environment a self-adjusting parameter choice is shown to be superior to any static choice. Indeed, the $(1+(\lambda,\lambda))$ GA is the first proven example in discrete evolutionary algorithmics where a non-fitness-dependent dynamic parameter choice reduces the optimization time by more than a constant factor. The results that come closest to this are the mentioned results from [3, 19], which are either constant-factor reductions of the expected runtime (in case of [3]) or reductions of the parallel expected runtime but not of the total number of function evaluations (in case of [19]).
Our proof gives some general insights into the working principles of adaptive parameter choices in discrete domains, which hopefully will lead to future applications of this approach in discrete search.
Our paper is organized as follows. We first introduce the $(1+(\lambda,\lambda))$ GA with static population sizes, give background on the generalized OneMax problem, and recall known bounds for the expected optimization time of the $(1+(\lambda,\lambda))$ GA on OneMax functions in Section 2. In Section 3 we present the $(1+(\lambda,\lambda))$ GA with self-adjusting parameter choices, along with a brief summary of the mentioned (slightly adapted) classification scheme of Eiben, Hinterding, and Michalewicz [13] for parameter settings. We also present in Section 3 the runtime analysis of the self-adjusting $(1+(\lambda,\lambda))$ GA (see Section 3.3), followed by a discussion of the general insights obtained through that analysis (Section 3.4). Finally, we show in Section 4 that the one-fifth success rule suggests optimal or close to optimal parameter settings.
2 The $(1+(\lambda,\lambda))$ GA with Static Population Size
Adopting the conventions and notation from [9], we regard here in this work only search spaces $\{0,1\}^n$ with bit string representations. We write $[n] := \{1, \ldots, n\}$, $[0..n] := [n] \cup \{0\}$, and $x = x_1 \cdots x_n$ for any bit string $x \in \{0,1\}^n$ and any non-negative integer $n$. By $\operatorname{Bin}(n,p)$ we denote the binomial distribution with $n$ trials and success probability $p$; i.e., $\Pr[\operatorname{Bin}(n,p) = k] = \binom{n}{k} p^k (1-p)^{n-k}$ for any $k \in [0..n]$.
The $(1+(\lambda,\lambda))$ GA is given in Algorithm LABEL:alg:GA. It uses the following two variation operators.

The unary mutation operator $\operatorname{mut}_\ell$ which, given some $x \in \{0,1\}^n$, creates from $x$ a new bit string $y$ by flipping exactly $\ell$ bit entries in it, chosen uniformly at random.

The binary crossover operator $\operatorname{cross}_c$ with crossover probability $c$, which, given two bit strings $x$ and $x'$, chooses $y$ by choosing $y_i := x'_i$ for each $i \in [n]$ with probability $c$ and choosing $y_i := x_i$ otherwise.
Thus, after a random initialization of the algorithm, in each iteration the following steps are performed.

In the mutation phase, $\lambda$ offspring are sampled from the current-best solution $x$ by applying $\lambda$ times independently the mutation operator $\operatorname{mut}_\ell$ to $x$, where the step size $\ell$ is chosen at random from $\operatorname{Bin}(n,p)$ before the generation of the first offspring.

In the crossover phase, $\lambda$ offspring are created from $x$ and (one of) the best of the $\lambda$ offspring from the mutation phase, $x'$, by sampling $\lambda$ times independently from $\operatorname{cross}_c(x, x')$.

Elitist selection: The best of the $\lambda$ individuals of the crossover phase replaces $x$ if its fitness is at least as large as the fitness of $x$. If there are several offspring with best fitness, we disregard those that are equal to $x$ and choose one of the remaining ones uniformly at random.
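The iteration described above can be sketched in code. The following is an illustrative reimplementation for a OneMax-type fitness, not the authors' code; it assumes the parameter choices $p = \lambda/n$ and $c = 1/\lambda$ suggested in [9], and it breaks ties by simply taking a best crossover offspring.

```python
import random

def ga_iteration(x, z, lam):
    """One iteration of the (1+(lam,lam)) GA on generalized OneMax with
    target z (sketch; p = lam/n, c = 1/lam)."""
    n = len(x)
    f = lambda y: sum(yi == zi for yi, zi in zip(y, z))

    # Mutation phase: draw the step size ell ~ Bin(n, lam/n) once, then
    # create lam offspring, each by flipping exactly ell random bits of x.
    ell = sum(random.random() < lam / n for _ in range(n))
    offspring = []
    for _ in range(lam):
        y = list(x)
        for i in random.sample(range(n), ell):
            y[i] = 1 - y[i]
        offspring.append(y)
    xprime = max(offspring, key=f)  # mutation winner x'

    # Crossover phase: lam offspring, each bit taken from x' with
    # probability c = 1/lam and from x otherwise.
    cross = []
    for _ in range(lam):
        y = [xpi if random.random() < 1 / lam else xi
             for xi, xpi in zip(x, xprime)]
        cross.append(y)
    ybest = max(cross, key=f)

    # Elitist selection: keep the crossover winner if it is not worse than x.
    return ybest if f(ybest) >= f(x) else x
```

Note that one such iteration costs $2\lambda$ fitness evaluations ($\lambda$ in each phase), which is why the choice of $\lambda$ directly trades success probability against cost.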
Throughout this paper we shall use $p = \lambda/n$ and $c = 1/\lambda$, choices which are well justified in [9, Sections 2 and 3]. Only in Section 4 do we regard arbitrary choices of the parameters $p$, $c$, and $\lambda$.
As performance measure we regard the expected running time of the $(1+(\lambda,\lambda))$ GA, that is, the expected number of function evaluations that the algorithm performs until it evaluates for the first time an optimal search point. This is the common measure in runtime analysis and is sometimes referred to as the expected optimization time. Note that for algorithms performing more than one fitness evaluation per iteration, such as the $(1+(\lambda,\lambda))$ GA, the expected runtime can differ substantially from the expected number of iterations (generations).
2.1 The Generalized OneMax Problem
In [9, Section 4] the $(1+(\lambda,\lambda))$ GA is analyzed by experimental means. The results show that it performs well on OneMax functions, linear functions with random weights, and royal road functions. A theoretical investigation, however, is currently available only for the generalized OneMax problem; an improved bound is stated below in Theorem 1. As this problem is also the focus of our present work, we give a short introduction here.
The classical OneMax function counts the number of ones in a bit string. Optimizing it therefore corresponds to finding the all-ones bit string. Of course, we want the performance of an evolutionary algorithm to be independent of the problem encoding. More specifically, the algorithms that we typically regard have exactly the same optimization behavior on any generalized OneMax function
$$\mathrm{OM}_z : \{0,1\}^n \to [0..n],\; x \mapsto |\{i \in [n] : x_i = z_i\}|,$$
which counts the number of positions in which the bit string agrees with the target string $z \in \{0,1\}^n$. It is easy to see that $z$ is the unique (global and local) optimum of $\mathrm{OM}_z$. Note also that the classic OneMax function counting the number of ones in a bit string is the function $\mathrm{OM}_z$ with $z = (1, \ldots, 1)$. It is therefore justified to call the set $\{\mathrm{OM}_z \mid z \in \{0,1\}^n\}$ the generalized OneMax problem. For convenience we often drop the word “generalized” in the following. All statements made below hold for arbitrary OneMax functions.
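A direct implementation of this function family, for illustration (the 4-bit target below is a hypothetical example):

```python
def onemax(z):
    """Return the generalized OneMax function OM_z, which counts the
    positions in which its argument agrees with the target string z."""
    def om(x):
        return sum(xi == zi for xi, zi in zip(x, z))
    return om

om = onemax([1, 0, 1, 1])      # hypothetical 4-bit target z
assert om([1, 0, 1, 1]) == 4   # z itself is the unique optimum
assert om([0, 1, 0, 0]) == 0   # the complement of z has fitness 0
om_classic = onemax([1, 1, 1, 1])
assert om_classic([1, 0, 1, 0]) == 2  # classic OneMax counts the ones
```

An algorithm that treats bit values symmetrically behaves identically on every member of this family, which is why results for OneMax transfer to all targets $z$.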
The expected runtime of most search heuristics on OneMax is $\Theta(n \log n)$, due to a phenomenon called the coupon collector’s problem (see, e.g., [6, Section 1.5] or [20]). In intuitive terms, the argument is as follows. When the initial bit string is taken uniformly at random from $\{0,1\}^n$, then each bit has a probability of $1/2$ of being in the wrong initial configuration, i.e., it has to be touched with probability $1/2$. The coupon collector’s problem states that if we touch one random bit at a time, then it takes $\Theta(n \log n)$ iterations until we have touched each bit at least once. Since many evolutionary algorithms (including, for example, the $(1+1)$ EA and Randomized Local Search) change on average one bit per iteration, this implies the $\Theta(n \log n)$ bound. It was a long-standing open question whether genetic algorithms can perform better on OneMax than this lower bound. Sudholt [23] gave a first example of a crossover-based genetic algorithm outperforming the $(1+1)$ EA on OneMax. But while his (2+1) GA (for a suitably chosen mutation rate) is better by a constant factor than the $(1+1)$ EA, it does not improve upon RLS. The $(1+(\lambda,\lambda))$ GA thus gave a first positive answer to this question, as we recall in the next section.
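The coupon collector expectation behind this argument is $n \cdot H_n$ draws, with $H_n$ the $n$-th harmonic number, and hence $\Theta(n \log n)$. This can be checked numerically against the standard approximation $n(\ln n + \gamma)$ with the Euler-Mascheroni constant $\gamma$:

```python
from math import log

def coupon_collector_expectation(n):
    """Expected number of uniform draws until all n coupons (here: bits)
    have been seen at least once: n * H_n = Theta(n log n)."""
    return n * sum(1 / i for i in range(1, n + 1))

n = 1000
exact = coupon_collector_expectation(n)
approx = n * (log(n) + 0.5772156649)  # n * (ln n + Euler-Mascheroni gamma)
# exact and approx agree up to lower-order terms (roughly 1/2 here)
```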
2.2 Runtimes for Static and Fitness-Dependent Population Sizes
The following statement, proven in [7], provides a tight bound for the expected runtime of the $(1+(\lambda,\lambda))$ GA on OneMax.
Theorem 1 ([7]).
The expected optimization time of the $(1+(\lambda,\lambda))$ GA with $p = \lambda/n$ and $c = 1/\lambda$ on every generalized OneMax function is
$$\Theta\left(\max\left\{\frac{n \log n}{\lambda}, \frac{n \lambda \log\log \lambda}{\log \lambda}\right\}\right).$$
Consequently, $\lambda = \Theta\big(\sqrt{\log(n)\log\log(n)/\log\log\log(n)}\big)$ is the optimal choice for the parameter $\lambda$, and this yields an expected optimization time of $\Theta\big(n\sqrt{\log(n)\log\log(n)/\log\log\log(n)}\big)$.
It is also known [9] that a fitnessdependent (and thus, inherently nonstatic) choice of the population size can decrease the runtime even further.
Theorem 2 (Theorem 8 in [9]).
The expected runtime of the $(1+(\lambda,\lambda))$ GA with $p = \lambda/n$, $c = 1/\lambda$, and fitness-dependent choice $\lambda = \sqrt{n/(n - f(x))}$ on every generalized OneMax function is linear in $n$.
We will use the latter bound in our analysis of the self-adjusting $(1+(\lambda,\lambda))$ GA. More precisely, we show that the population sizes suggested by the one-fifth success rule typically do not deviate much from the fitness-dependent choice analyzed in Theorem 2. This observation has also been made experimentally in [9]. Figure 1, taken from [9] (Figure 5 in that paper), shows the close relationship between the self-adjusting population sizes (in red) and the optimal fitness-dependent ones (in black) for a typical run of the $(1+(\lambda,\lambda))$ GA on OneMax.
Two milestones in the analysis of the $(1+(\lambda,\lambda))$ GA in [9] are the success probabilities of the mutation and the crossover phase, respectively. Since we shall make use of these two bounds, we briefly repeat them below.
Note for Lemma 3 that for an offspring $x'$ of $x$ with $\mathrm{OM}_z$-value greater than $\mathrm{OM}_z(x) - \ell$ there exists at least one position $i$ such that $x'_i = z_i$ while $x_i \neq z_i$. It is therefore possible to extract in the crossover phase this entry from $x'$ and thus to increase the overall fitness of the current best search point. This is why we call the mutation phase successful if $\mathrm{OM}_z(x') > \mathrm{OM}_z(x) - \ell$ holds.
Lemma 3 (Lemma 5 in [9]).
In the notation of Algorithm LABEL:alg:GA, for all $x \in \{0,1\}^n$ and all step sizes $\ell$, the probability that in the mutation phase a search point $x'$ with $\mathrm{OM}_z(x') > \mathrm{OM}_z(x) - \ell$ is created is at least .
Lemma 4 (Lemma 6 in [9]).
In the notation of Algorithm LABEL:alg:GA, consider fixed outcomes of $x$, $\ell$, and $x'$. Then the random outcome $y$ of the crossover phase satisfies
3 The $(1+(\lambda,\lambda))$ GA with Self-Adjusting Population Sizes
Being the first provable super-constant speed-up via a fitness-dependent parameter choice, the linear optimization time obtained in [9] is a big success in the theory of evolutionary algorithms. From the practical point of view, though, the question remains how in an actual application the user of the $(1+(\lambda,\lambda))$ GA would guess the fitness-dependent optimal choice of $\lambda$. In this section, we show that this is not needed. A self-adjusting choice inspired by the classic one-fifth rule can give the same (optimal, as the result in Section 4 shows) linear optimization time. To the best of our knowledge, this is the first result proving a reduced optimization time via parameter self-adjustment in discrete search spaces. We are optimistic that our approach can be applied to other discrete problems. At the end of this section, we give some general hints that might be useful for such purposes.
3.1 Terminology for Parameter Settings
Since the literature is not unanimous with respect to the terminology for parameter settings, we have adopted and slightly extended in this work the taxonomy of Eiben, Hinterding, and Michalewicz [13]. Figure 2, an adapted version of Figure 1 in [13], illustrates this classification, which we briefly summarize below.
The effort of choosing the right parameters of an evolutionary algorithm is called parameter setting. The first distinction is between static and dynamic parameter settings. In the former, the parameters are set before the actual run of the algorithm and are not changed during the optimization process. In typical applications, a parameter tuning step precedes the application of the EA. In this phase, suitable parameter choices are sought through initial experimental investigations, either for all parameters simultaneously or in an iterative process.
Optimizing dynamic parameter choices is called parameter control in [13]. Three principal approaches are discussed: deterministic, adaptive, and self-adaptive parameter control. A dynamic parameter choice is called deterministic if it does not depend on the fitness landscape encountered by the algorithm; that is, there is no feedback between the fitness values and the dynamic parameters.
3.2 The Algorithm
Recall that the one-fifth success rule in evolution strategies is used to change the step size in a self-adjusting manner. When the empirical success probability is large, the step size is increased in the hope of speeding up exploration. When it is low, the step size is reduced in the hope of increasing the chance of a success. This is done in such a way that an average success probability of one fifth leads, on average, to no change of the step size.
In discrete search spaces, naturally, things are very different. However, we can still come up with a natural variant of the one-fifth success rule. Note that in our $(1+(\lambda,\lambda))$ GA (with the suggested choices $p = \lambda/n$ and $c = 1/\lambda$ for the mutation rate and the crossover rate, respectively), increasing $\lambda$ will increase the success probability of one iteration, however, at the price of an increased number of function evaluations, that is, a higher runtime. Consequently, it makes sense to increase $\lambda$ when the empirical success probability is low (to speed up the process of finding an improvement), but to reduce it when the success probability is large (to hopefully save computational effort).
Taking Auger’s [1] implementation of the one-fifth success rule as example, we design the following self-adjusting version of the $(1+(\lambda,\lambda))$ GA, see also Algorithm LABEL:alg:GAself. After an iteration that led to an increase of the fitness of $x$ (“success”), indicating an easy success, we reduce $\lambda$ by a constant factor $F > 1$ (of course, not letting $\lambda$ drop below $1$). If an iteration was not successful, we increase $\lambda$ by a factor of $F^{1/4}$ (since we analyze the algorithm for mutation probability $p = \lambda/n$ we do not let $\lambda$ exceed $n$). Consequently, after a series of iterations with an average success rate of $1/5$, we end up with the initial value of $\lambda$ (unless the lower barrier of $\lambda = 1$ was hit).
As a technical remark we note that where an integer is required in Algorithm LABEL:alg:GAself (e.g., lines 6 and 10) we round $\lambda$ to its closest integer, i.e., instead of $\lambda$ we regard $\lfloor \lambda \rfloor$ if the fractional part of $\lambda$ is less than $1/2$ and we regard $\lceil \lambda \rceil$ otherwise.
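This update rule can be sketched in a few lines. The helper names and the value $F = 1.05$ below are illustrative placeholders; the barriers $\lambda \in [1, n]$ and the rounding convention are the ones described above.

```python
def update_lambda(lam, success, F=1.05, n=10**4):
    """One-fifth style update of the population size (sketch):
    divide lambda by F after a success, multiply it by F**(1/4)
    otherwise, keeping lambda within the barriers [1, n]."""
    if success:
        return max(1.0, lam / F)
    return min(float(n), lam * F ** 0.25)

def as_integer(lam):
    """Round lambda to the closest integer where an integer is required
    (fractional part < 1/2 rounds down, otherwise up)."""
    return int(lam + 0.5)

# one success per four failures leaves lambda unchanged on average
lam = 8.0
for success in (False, False, False, False, True):
    lam = update_lambda(lam, success)
assert abs(lam - 8.0) < 1e-9
```

Note that the algorithm keeps the real-valued $\lambda$ internally and only rounds where an integer number of offspring is needed, so no information is lost to rounding across iterations.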
3.3 Runtime Analysis
We show that the self-adjusting $(1+(\lambda,\lambda))$ GA (with standard parameters $p = \lambda/n$ and $c = 1/\lambda$) solves the generalized OneMax problem in linear time when the self-adjusting speed factor $F$ is not too large.
The proof of this result is rather technical. For this reason, we are only able to show a linear optimization time when $F$ is smaller than a certain constant, which we do not make precise. In general, making implicit constants precise is a difficult task in runtime analysis, and for many much simpler problems the implicit constants are not known. In the experiments conducted in [8], see in particular Figure 4 there, all tested values of $F$ worked well (recall the value of $F$ used in Auger’s implementation [1]). At the end of this section, we give some indication why too large values of $F$, however, may lead to an exponential expected optimization time.
Our main result is the following.
Theorem 5.
The optimization time of the self-adjusting $(1+(\lambda,\lambda))$ GA with parameters $p = \lambda/n$ and $c = 1/\lambda$ on every generalized OneMax function is $O(n)$ for any sufficiently small (but constant) update strength $F > 1$.
To prove Theorem 5, roughly speaking, we show that the population sizes suggested by the one-fifth success rule are usually not very far from the fitness-dependent choice $\lambda^* := \sqrt{n/(n - f(x))}$ analyzed in [9, Theorem 8] (which is restated above as Theorem 2). Intuitively, if $\lambda$ happens to be much larger than $\lambda^*$, the success probability of the $(1+(\lambda,\lambda))$ GA is so large that with reasonably large probability one of the next iterations is successful and, as a consequence, the value of $\lambda$ is then adjusted to its previous value divided by $F$, thus approaching $\lambda^*$ again. A key argument in this proof is the following lemma, which shows that for large values of $\lambda$ the success probability of the $(1+(\lambda,\lambda))$ GA is indeed reasonably large. This lemma can be seen as a generalization of Lemma 7 in [9].
Lemma 6.
Let $x \in \{0,1\}^n$ with $f(x) < n$. Let $\lambda^* := \sqrt{n/(n - f(x))}$. Let $p_\lambda$ be the probability that one iteration of Algorithm LABEL:alg:GA (with parameters $p = \lambda/n$ and $c = 1/\lambda$) starting in $x$ is successful. There exists a constant $C > 0$ such that for all $\lambda \ge C\lambda^*$ we have $p_\lambda > 1/5$.
Proof of Lemma 6.
We use the same notation as in the description of Algorithm LABEL:alg:GAself. For readability purposes we again write even if an integer is required. For any fixed and for any , the success probability of increasing the fitness by at least one is (by the law of total probability) at least
(1)  
By Lemmas 3 and 4 it holds for any that
It thus suffices to bound from below the three factors of this expression, which we denote by (i), (ii), and (iii).
Bounding (i): For any it holds that
For we can thus bound expression (i) from below by 0.21.
Bounding (ii): We set and obtain, for ,
This expression is at most for large enough , showing that we can bound (ii) from below by .
Bounding (iii): We apply Chernoff’s bound to bound (iii) from below by . Since , this term is again larger than for a suitably chosen .
Putting everything together we have seen that, for a suitable choice of $C$, the expression in (1) is strictly larger than $1/5$. ∎
While the proof of Lemma 6 was rather straightforward, the proof of the main theorem, i.e., Theorem 5, is much more involved.
Proof of Theorem 5.
As in the overview given before Lemma 6, we sloppily denote in the following by $\lambda^*$ our fitness-dependent parameter choice of Theorem 2; i.e., $\lambda^* = \sqrt{n/(n - f(x))}$. Note that the value of $\lambda^*$ depends on the current fitness value but that this is not reflected in the abbreviation. To increase the readability, we again omit specifying whether $\lambda$ has to be rounded up or down.
We partition the optimization process into phases. The first phase starts with the first fitness evaluation. A phase ends with an iteration at whose end we have increased the fitness of $x$ and $\lambda \le C\lambda^*$ holds, for a sufficiently large constant $C$ that we do not compute explicitly. ($C$ is determined by Lemma 6.)
We shall first show that each phase has an expected cost of $O(\lambda^*)$ fitness evaluations. From this it is not difficult to conclude the proof by arguments used in the proof of Theorem 2.
To bound the expected cost of each phase, we distinguish between “short” phases, in which $\lambda \le C\lambda^*$ holds throughout, and “long” phases, in which $\lambda > C\lambda^*$ for at least one iteration. We abbreviate the threshold $C\lambda^*$ by $t$. Note that $t$ as well depends on the current fitness $f(x)$.
Claim 1: The expected cost of a short phase is $O(\lambda^*)$.
Proof of claim 1: Let $\lambda_0$ be the value of $\lambda$ at the beginning of the phase and let $T$ be the number of iterations of the phase. Since $\lambda$ does not exceed the threshold $t$, there is exactly one iteration in which the fitness of $x$ is increased. That is, the value of $\lambda$ has first been multiplied $T-1$ times by $F^{1/4}$ until there was a fitness increase, and the value has been shrunk as a consequence of the fitness increase. The value of $\lambda$ at the end of the phase is thus $\lambda_0 F^{(T-1)/4}/F$, and this value is bounded from above by $t$ (since we are considering a short phase). Since an iteration with parameter $\lambda$ requires $2\lambda$ fitness evaluations, the total number of fitness evaluations is thus
(2) $\quad \sum_{i=0}^{T-1} 2\lambda_0 F^{i/4} = O\big(\lambda_0 F^{(T-1)/4}\big) = O(t) = O(\lambda^*).$
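The geometric nature of this cost can be illustrated numerically. The sketch below assumes, as in the argument above, $2\lambda$ evaluations per iteration and a growth factor of $F^{1/4}$ per unsuccessful iteration; the concrete parameter values are placeholders. The total cost of a phase is then within a constant factor (depending only on $F$) of the cost of its last iteration alone.

```python
def short_phase_cost(lam0, T, F=1.05):
    """Total evaluations of a phase whose population size grows from lam0
    by a factor F**(1/4) per unsuccessful iteration, with 2*lam
    evaluations per iteration (lam mutation + lam crossover offspring)."""
    return sum(2 * lam0 * F ** (i / 4) for i in range(T))

lam0, T, F = 2.0, 50, 1.05
total = short_phase_cost(lam0, T, F)
last = 2 * lam0 * F ** ((T - 1) / 4)   # cost of the final iteration
# geometric sum: total / last is bounded by r / (r - 1) with r = F**(1/4)
ratio = total / last
```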
Claim 2: The expected cost of a long phase is $O(\lambda^*)$.
Proof of claim 2: We split the long phase into an opening phase and a main phase. The opening phase ends with the last iteration in which $\lambda < t$ holds, so that the main phase starts with a $\lambda$ that is at least as large as $t$, but less than $tF^{1/4}$.
Let $X$ denote the number of fitness evaluations during the phase and let $T$ denote the number of iterations in the main (!) phase. As in the proof of Claim 1 it will be easy to see that $X \le c\, t F^{T/4}$ for all $T$ and some fixed constant $c$. The most technical part of this proof is to bound the probability that the main phase of a long phase requires $T$ iterations, given that we are in a long phase. We show that this probability is at most $q^T$ for some positive constant $q$ with $qF^{1/4} < 1$. It is well known that the geometric series $\sum_T (qF^{1/4})^T$ converges if $qF^{1/4} < 1$. The overall expected number of fitness evaluations during a long phase is thus
which is $O(t) = O(\lambda^*)$, as desired.
It remains to prove the following two claims.
Claim 2.1: $X \le c\, t F^{T/4}$ for some large enough constant $c$.
Claim 2.2: Given that we are in a long phase, the probability that the main phase requires $T$ iterations is at most $q^T$ for a positive constant $q$ with $qF^{1/4} < 1$.
Proof of Claim 2.1: The cost of the opening phase is at most $\sum_{i=0}^{k} 2\lambda_0 F^{i/4}$, where $\lambda_0$ is the initial value of $\lambda$ at the beginning of the phase and $k$ is chosen maximally such that $\lambda_0 F^{k/4} < t$ holds. As in (2) one shows that this sum is $O(t)$. Similarly, since the initial $\lambda$ of the main phase is at most $tF^{1/4}$, the cost of the main phase is at most
for .
Proof of Claim 2.2: As mentioned above, the main phase starts with a $\lambda$ that is at least $t$ and strictly less than $tF^{1/4}$. We are interested in the first point in time at which $\lambda$ is less than $t$. Note that all future $\lambda$ values encountered in this phase are of the type $\lambda_0 F^{\omega}$ for $\omega$ being a multiple of $1/4$. By regarding this exponent $\omega$, we transform the process into a biased random walk on the line. Our starting position is $\omega = 0$. If an iteration is successful, i.e., if the fitness value of $x$ has increased during the iteration, the process does one step of length one to the left. It does a step of length $1/4$ to the right otherwise (we thus pessimistically ignore the fact that $\lambda$ never exceeds $n$). We bound the probability that it takes $T$ or more iterations until this random walk has reached a value less than its starting position. At this point in time the current $\lambda$ is less than the original value that was active at the beginning of the main phase. That is, when the random walk reaches a position smaller than its starting position, $\lambda$ is for certain less than the then active threshold (which, by definition, increases whenever the fitness value of $x$ does).
If an iteration is successful with probability at least $p$, the expected progress of this random walk in one iteration is at most $-p + (1-p)/4 = (1-5p)/4$, which is negative since by Lemma 6 we have $p > 1/5$. Hence there exists a constant $\varepsilon > 0$ such that the expected progress is at most $-\varepsilon$.
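This drift computation is easy to make explicit. The sketch below assumes, as described here, a step of length $1$ to the left with the success probability $p$ and a step of length $1/4$ to the right otherwise, so the expected one-step movement is $(1-5p)/4$ and the sign flips exactly at $p = 1/5$:

```python
from fractions import Fraction

def walk_drift(p):
    """Expected one-step movement of the exponent random walk: a success
    (probability p) moves 1 to the left, a failure moves 1/4 to the
    right. Equals (1 - 5p)/4, negative exactly when p > 1/5."""
    p = Fraction(p)
    return -p + (1 - p) * Fraction(1, 4)

assert walk_drift(Fraction(1, 5)) == 0   # exactly 1/5: no drift
assert walk_drift(Fraction(1, 4)) < 0    # p > 1/5: lambda shrinks on average
assert walk_drift(Fraction(1, 6)) > 0    # p < 1/5: lambda keeps growing
```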
To conclude the proof of Claim 2.2, let us define random variables $X_\tau$, $\tau \in \mathbb{N}$, by setting $X_\tau := 1/4$ if the fitness does not increase in the $\tau$-th iteration of the main phase, and setting $X_\tau := -1$ otherwise. We have just seen that $E[X_\tau] \le -\varepsilon$. Given that we are in a long phase, the probability that the main phase has length at least $T$ is at most the probability that the random walk has not dropped below its starting position within the first $T-1$ iterations, i.e., at most $\Pr\big[\sum_{\tau=1}^{T-1} X_\tau \ge 0\big]$. We apply Chernoff’s bound—confer Theorem 1.11 in [6] for a version that allows for random variables that do not necessarily take positive values—to see that, as desired, this term is at most
∎
3.4 General Insights from the Runtime Analysis
The analysis above reveals the following facts, which might be helpful in general when trying to use a onefifth success rule or a related selfadjusting rule in discrete search spaces.
The adjustment rule must fit the limiting success probability. In the proof above, it was crucial that the success probability shown in Lemma 6 was a constant larger than $1/5$. It is easy to see that if the success probability were uniformly bounded from above by a constant smaller than $1/5$, then in expectation $\log \lambda$ would increase by a positive constant in each round. Consequently, $\lambda$ would show an exponential growth, quickly leading to wastefully large values. This can partially be overcome by imposing an upper barrier for $\lambda$ (we have such a barrier, namely $\lambda \le n$, to ensure that the mutation probability $p = \lambda/n$ is at most $1$), however, this would still lead to the algorithm mostly working with this maximal value of $\lambda$ instead of a value close to the ideal $\lambda^*$.
In general, there is no reason for not trying success rules with other ratios than one-fifth, that is, increasing $\lambda$ by $F^{1/(s-1)}$ instead of $F^{1/4}$ in case of non-success, for some $s > 1$. A larger value of $s$ will slightly decrease the speed of adjustment, but is more likely to overcome the problem described in the previous paragraph. Note that for our problem, when $\lambda$ is large, the success probability is uniformly bounded from above by a constant smaller than $1/3$. Consequently, the one-fifth rule avoids the exponential growth of $\lambda$, whereas a one-third rule or a one-half rule (e.g., doubling or halving the parameter as in [19]) would not.
The constant $F$ matters. Even when the combination of update rule and success probability avoids an expected exponential growth of $\lambda$, things can still go wrong when the update strength $F$ is too large. Here is an example (where, to ease the presentation, we assume that we have no upper barrier on $\lambda$; with an upper barrier, as above, the problem remains, though possibly to a smaller extent): Imagine that we start with some value $\lambda_0$. Above, we saw that the success probability of an iteration is bounded from above by some constant $q < 1$. Consequently, the probability of having exactly $k$ consecutive non-successes is at least $(1-q)^k$. The optimization time of the last of these iterations alone is $2\lambda_0 F^{k/4}$. Consequently, the expected effort of finding one improvement is at least of the order $\sum_k (1-q)^k \lambda_0 F^{k/4}$. When $(1-q)F^{1/4} \ge 1$, i.e., when $F \ge (1-q)^{-4}$, this series does not converge, i.e., the expected effort for one improvement is infinite. Note that this was a rough estimate aimed at quickly demonstrating that large values of $F$ can be dangerous. Better values can be achieved with more effort. E.g., the probability that among $k$ iterations we have at most a certain small number of successes is considerably larger than the above bound; this can be seen from approximating the binomial distribution with a normal distribution. Since such an event also increases the value of $\lambda$, the corresponding series already diverges for smaller values of $F$. Optimizing the ratio of successes and non-successes improves the bound further, showing that already moderately large values of $F$ lead to an exponential expected optimization time. We do not know to what extent this argument can be improved. For this reason, we would rather suggest to choose a small value of $F$, clearly below these thresholds, and trade in the possibly faster adjustment to the ideal parameter value for a reduced risk of an infinite expected optimization time.
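The convergence threshold in this rough estimate is easy to compute. The sketch below assumes, as above, a bound $q$ on the per-iteration success probability, so that $k$ consecutive non-successes occur with probability at least $(1-q)^k$; the value $q = 0.3$ is a hypothetical placeholder, not a constant from the analysis.

```python
def series_ratio(q, F):
    """Common ratio of the geometric series sum_k (1-q)**k * F**(k/4)
    that lower-bounds the expected effort of one improvement, where q
    upper-bounds the per-iteration success probability."""
    return (1 - q) * F ** 0.25

def critical_F(q):
    """Largest update strength below which that series still converges:
    the ratio (1-q) * F**(1/4) must stay below 1, i.e. F < (1-q)**(-4)."""
    return (1 - q) ** -4

q = 0.3   # hypothetical success-probability bound
# small F: ratio < 1, expected cost per improvement stays finite
# at F = critical_F(q): ratio reaches 1 and the series diverges
```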
4 A Linear Lower Bound for All Possible Parameter Choices
In the previous section we have seen that the self-adjusting $(1+(\lambda,\lambda))$ GA is faster on average than the $(1+(\lambda,\lambda))$ GA with any static population size. We next show that it is asymptotically best possible also among all dynamic parameter choices. That is, regardless of how the parameters are updated in each iteration, the $(1+(\lambda,\lambda))$ GA always has an expected runtime on OneMax that is at least linear in $n$.
Theorem 7.
For every (possibly dynamic) choice of the mutation probability , the crossover probability , and the population size , the GA performs at least linearly many function evaluations in expectation before it evaluates the unique global optimum for the first time. That is, regardless of the parameter update scheme, the expected runtime of the GA on OneMax is .
For parameter choices that do not depend on absolute fitness values, Theorem 7 follows from an elegant technique in black-box complexity. Since we believe this intuitive argument to be of general interest to the Self* research community, we present it in the following section. It can be seen as a nice, yet powerful tool for analyzing the limitations of evolutionary and other black-box optimization algorithms.
For fitness-dependent parameter choices, Theorem 7 also holds, but it needs a different argument, as we comment in Section 4.2.
4.1 Lower Bound for Self-Adjusting Parameter Choices
We start the exposition of the lower bound by introducing the concept of comparison-based algorithms.
Definition 8.
A comparison-based black-box algorithm does not make use of absolute fitness values. Instead, it bases all decisions solely on the comparison of search points.
To see that the self-adjusting GA is a comparison-based algorithm, we reformulate the algorithm slightly, see Algorithm LABEL:alg:ga2. This alternative presentation shows that indeed all decisions of the GA are based entirely on comparisons between at most two search points. A lower bound that holds for all comparison-based algorithms thus immediately implies a lower bound for the GA. We therefore broaden our view and regard the whole class of comparison-based algorithms. Theorem 7 for non-fitness-dependent parameter choices follows from the following statement, which is folklore knowledge in black-box complexity and has been formally stated (in much more general form) in [24, Corollary 2].
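To make the notion concrete, here is a minimal comparison-based optimizer for a generalized OneMax function. It is a plain (1+1) EA rather than the GA (the helper names, the seed handling, and the mutation rate 1/n are our choices for this sketch), but it illustrates the defining restriction: the algorithm never sees a fitness value, only the one-bit outcome of a comparison.

```python
import random

def make_comparison_oracle(n, rng):
    """Hidden generalized OneMax instance, exposed only via comparisons."""
    z = [rng.randrange(2) for _ in range(n)]                   # hidden target
    fitness = lambda x: sum(a == b for a, b in zip(x, z))
    at_least_as_good = lambda y, x: fitness(y) >= fitness(x)   # one bit per query
    # is_optimum is only the external stopping criterion; the algorithm's
    # decisions below use nothing but at_least_as_good.
    is_optimum = lambda x: fitness(x) == n
    return at_least_as_good, is_optimum

def comparison_based_ea(n, seed=0, budget=200_000):
    rng = random.Random(seed)
    at_least_as_good, is_optimum = make_comparison_oracle(n, rng)
    x = [rng.randrange(2) for _ in range(n)]
    queries = 0
    while not is_optimum(x):
        if queries >= budget:
            raise RuntimeError("budget exceeded")
        # standard bit mutation with rate 1/n
        y = [b ^ (rng.random() < 1.0 / n) for b in x]
        queries += 1
        if at_least_as_good(y, x):    # the only information the EA ever uses
            x = y
    return queries

queries_used = comparison_based_ea(30, seed=3)
```

Each call to the oracle reveals at most one bit of information about the hidden target, which is precisely the property the lower-bound argument below exploits.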
Theorem 9.
Every comparisonbased algorithm needs at least comparisons on average to optimize a generalized OneMax function.
The intuitive argument for Theorem 9 is simple. In order to optimize a OneMax function we need to identify . That is, we need to learn the bits of . Roughly speaking, each query gives us at most one bit of information about : in the mutation phase we learn whether or not, and in the crossover phase we learn the bit whether or not . Finally, we learn the one bit of information whether or not . One iteration with offspring in the mutation and the crossover phase each thus gives us a total of bits of information. Amortized over the offspring that were created, this is a bit less than one bit of information per query. Since we need to learn all bits of , this shows (intuitively) that we need to sample and compare at least search points in total. This implies the lower bound.
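The counting step can be summarized as follows (the symbols, in particular $T$ for the number of comparisons and $z$ for the hidden target string, are our notation for this sketch):

```latex
\underbrace{\text{information gained by } T \text{ comparisons}}_{\le\, T \text{ bits}}
\;\ge\;
\underbrace{\text{information needed to determine } z}_{=\, n \text{ bits}}
\quad\Longrightarrow\quad
T \;\ge\; n .
```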
It is not too difficult to make this intuitive argument formal. To this end, one employs Yao’s Minimax Principle [25], a powerful tool in black-box complexity. The interested reader can find a quite accessible exposition of Yao’s Principle along with some easy-to-follow examples from evolutionary computation in [12].
4.2 Lower Bound for Fitness-Dependent Parameter Choices
The proof given in Section 4.1 does not work for parameter choices that may depend on the absolute fitness of intermediate search points. Intuitively, the problem here is that and knowing (or, rather, using knowledge about) the absolute fitness value of provides bits of information. This would only yield a lower bound of order . Still, the statement of Theorem 7 holds also for fitness-dependent parameter choices, as we briefly explain in this section. Note that this bound also implies the optimality of the fitness-dependent choice suggested in [9].
In the analysis of [7] it is shown (for the recommended parameter choices and ) that the expected fitness gain in one iteration of the GA with population size is of order at most (see the proofs of the upper and lower bounds of Theorem 2 in [7]). That is, for “investing” function evaluations we obtain a fitness gain of order at most . Hence the average fitness gain per function evaluation is at most constant, which implies the linear lower bound. To make this argument formal, one can use the additive drift theorem [16].
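In symbols (our notation: $\lambda_t$ is the population size used in iteration $t$ and $c > 0$ a suitable constant), the argument reads: iteration $t$ costs $\Theta(\lambda_t)$ function evaluations and yields an expected fitness gain of $O(\lambda_t)$, so the drift per function evaluation is at most a constant, and the additive drift theorem gives

```latex
\mathrm{E}\!\left[f(x_{t+1}) - f(x_t)\right] \;\le\; c\,\lambda_t
\quad\text{and}\quad
\text{cost of iteration } t \;=\; \Theta(\lambda_t)
\quad\Longrightarrow\quad
\mathrm{E}[T] \;\ge\; \frac{n - f(x_0)}{c} \;=\; \Omega(n).
```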
5 Conclusions
We have analyzed the GA with self-adjusting population size. We have shown that it optimizes any generalized OneMax function in linear time. This is best possible among all (static or dynamic) parameter choices and is better by a factor of than any GA with a static population size. Our result thus shows for the first time that self-adjusting parameter choices can be provably beneficial in discrete optimization problems.
We hope that our work inspires more work on the running time of evolutionary algorithms with self-adjusting parameter choices. We have provided some general insights that should be regarded when implementing a one-fifth success rule in discrete search spaces.
While our work focuses on adjusting the population size, we are confident that self-adjusting choices of other parameters of evolutionary algorithms, e.g., the mutation or crossover rates, can be analyzed theoretically as well.
Acknowledgments
The authors would like to thank Anne Auger and Nikolaus Hansen for valuable discussions on various aspects of this project.
Footnotes
 We use here and in the following a slightly adapted version of the terminology for parameter setting suggested in [13]. This terminology is summarized in Section 3.1 and Figure 2.
 As noted in [13] this does not exclude randomized update schemes. A possibly better wording would therefore be fixed or feedback-free update rules.
References
 A. Auger. Benchmarking the (1+1) evolution strategy with one-fifth success rule on the BBOB-2009 function testbed. In Proc. of GECCO’09 (Companion), pages 2447–2452. ACM, 2009.
 A. Auger and N. Hansen. Linear convergence on positively homogeneous functions of a comparison based step-size adaptive randomized search: the (1+1) ES with generalized one-fifth success rule. CoRR, abs/1310.8397, 2013. Available online at http://arxiv.org/abs/1310.8397.
 S. Böttcher, B. Doerr, and F. Neumann. Optimal fixed and adaptive mutation rates for the LeadingOnes problem. In Proc. of PPSN’10, volume 6238 of LNCS, pages 1–10. Springer, 2010.
 K. A. De Jong. An Analysis of the Behavior of a Class of Genetic Adaptive Systems. PhD thesis, 1975.
 L. Devroye. The compound random search. PhD thesis, 1972.
 B. Doerr. Analyzing randomized search heuristics: Tools from probability theory. In A. Auger and B. Doerr, editors, Theory of Randomized Search Heuristics, pages 1–20. World Scientific Publishing, 2011.
 B. Doerr and C. Doerr. A tight bound for the runtime of the genetic algorithm on OneMax. In Proc. of GECCO’15. ACM, 2015. To appear.
 B. Doerr, C. Doerr, and F. Ebel. Lessons from the black-box: Fast crossover-based genetic algorithms. In Proc. of GECCO’13, pages 781–788. ACM, 2013.
 B. Doerr, C. Doerr, and F. Ebel. From black-box complexity to designing new genetic algorithms. Theoretical Computer Science, 567:87–104, 2015.
 B. Doerr, T. Jansen, D. Sudholt, C. Winzen, and C. Zarges. Mutation rate matters even when optimizing monotonic functions. Evolutionary Computation, 21:1–27, 2013.
 S. Droste, T. Jansen, and I. Wegener. On the analysis of the (1+1) evolutionary algorithm. Theoretical Computer Science, 276:51–81, 2002.
 S. Droste, T. Jansen, and I. Wegener. Upper and lower bounds for randomized search heuristics in black-box optimization. Theory of Computing Systems, 39:525–544, 2006.
 A. E. Eiben, R. Hinterding, and Z. Michalewicz. Parameter control in evolutionary algorithms. IEEE Transactions on Evolutionary Computation, 3:124–141, 1999.
 A. E. Eiben and J. E. Smith. Introduction to Evolutionary Computing. Springer Verlag, 2003.
 N. Hansen, A. Gawelczyk, and A. Ostermeier. Sizing the population with respect to the local progress in (1,λ)-evolution strategies - a theoretical analysis. In Proc. of CEC’95, pages 80–85. IEEE, 1995.
 J. He and X. Yao. Drift analysis and average time complexity of evolutionary algorithms. Artificial Intelligence, 127:57–85, 2001.
 J. Jägersküpper. Rigorous runtime analysis of the (1+1) ES: 1/5-rule and ellipsoidal fitness landscapes. In Proc. of FOGA’05, volume 3469 of LNCS, pages 260–281. Springer, 2005.
 G. Karafotias, M. Hoogendoorn, and A. Eiben. Parameter control in evolutionary algorithms: Trends and challenges. IEEE Transactions on Evolutionary Computation, 19:167–187, 2015.
 J. Lässig and D. Sudholt. Adaptive population models for offspring populations and parallel evolutionary algorithms. In Proc. of FOGA’11, pages 181–192. ACM, 2011.
 R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, 1995.
 I. Rechenberg. Evolutionsstrategie. Friedrich Fromman Verlag (Günther Holzboog KG), 1973.
 M. Schumer and K. Steiglitz. Adaptive step size random search. IEEE Transactions on Automatic Control, 13(3):270–276, 1968.
 D. Sudholt. Crossover speeds up building-block assembly. In Proc. of GECCO’12, pages 689–702. ACM, 2012.
 O. Teytaud and S. Gelly. General lower bounds for evolutionary algorithms. In Proc. of PPSN’06, volume 4193 of LNCS, pages 21–31. Springer, 2006.
 A. C.-C. Yao. Probabilistic computations: Toward a unified measure of complexity. In Proc. of FOCS’77, pages 222–227. IEEE, 1977.