# A Lower Bound Analysis of Population-based Evolutionary Algorithms for Pseudo-Boolean Functions

## Abstract

Evolutionary algorithms (EAs) are population-based general-purpose optimization algorithms, and have been successfully applied to various real-world optimization tasks. However, previous theoretical studies often employ EAs with only a parent or offspring population and focus on specific problems. Furthermore, they often only show upper bounds on the running time, while lower bounds are also necessary for a complete understanding of an algorithm. In this paper, we analyze the running time of the (μ+λ)-EA (a general population-based EA with mutation only) on the class of pseudo-Boolean functions with a unique global optimum. By applying the recently proposed switch analysis approach, we prove the lower bound $\Omega(n\ln n+\mu+\lambda n\frac{\ln\ln n}{\ln n})$ for the first time. Particularly on the two widely studied problems, OneMax and LeadingOnes, the derived lower bound discloses that the (μ+λ)-EA will be strictly slower than the (1+1)-EA when the population size μ or λ is above a moderate order. Our results imply that the increase of population size, while usually desired in practice, bears the risk of increasing the lower bound of the running time and thus should be carefully considered.

Corresponding author. Email: zhouzh@nju.edu.cn

Keywords: evolutionary algorithms, population, running time analysis, lower bound

## 1 Introduction

Evolutionary algorithms (EAs) [Bäc96] are a kind of population-based heuristic optimization algorithm. They have been widely applied to industrial optimization problems. However, their theoretical analysis is difficult due to their complexity and randomness. In the recent decade, there has been a significant rise in running time analysis (one essential theoretical aspect) of EAs [AD11, NW10]. For example, Droste et al. [DJW02] proved that the expected running time of the (1+1)-EA on linear pseudo-Boolean functions is $\Theta(n\ln n)$; for the (μ+1)-EA solving several artificially designed functions, a large parent population size was shown to be able to reduce the running time from exponential to polynomial [JW01, Sto08, Wit06, Wit08]; for the (1+λ)-EA solving linear functions, asymptotic bounds on the expected running time, which differ between instances, were derived [DK15], and a bound tight up to lower-order terms was derived on the specific linear function OneMax [GW15].

Previous running time analyses often consider EAs with only a parent or offspring population, which do not fully reflect the population-based nature of real EAs. When both parent and offspring populations are involved, the running time analysis gets more complex, and only a few results have been reported on the (N+N)-EA (i.e., a specific version of the (μ+λ)-EA with μ = λ = N), which maintains N solutions and generates N offspring solutions by mutation only in each iteration. He and Yao [HY02] compared the expected running time of the (1+1)-EA and the (N+N)-EA on two specific artificial problems, and proved that the introduction of a population can reduce the running time exponentially. On the contrary, Chen et al. [CTCY12] found that a large population size is harmful for the (N+N)-EA solving the TrapZeros problem. Chen et al. [CHS09] also proved upper bounds on the expected running time of the (N+N)-EA on the OneMax and LeadingOnes problems. Later, a low selection pressure was shown to be better for the (N+N)-EA solving a wide-gap problem [CHCY10], and a proper mutation-selection balance was proved to be necessary for the effectiveness of the (N+N)-EA solving the SelPres problem [LY12].

The above-mentioned studies on the (μ+λ)-EA usually focus on specific test functions, while EAs are general-purpose optimization algorithms that can be applied to any optimization problem where solutions can be represented and evaluated. Thus, it is necessary to analyze EAs over large problem classes. Meanwhile, most previous running time analyses on population-based EAs only show upper bounds. Although upper bounds are appealing for revealing the ability of an algorithm, lower bounds, which reveal its limitation, are also necessary for a complete understanding of the algorithm.

In this paper, we analyze the running time of the (μ+λ)-EA solving the class of pseudo-Boolean functions with a unique global optimum, named UBoolean, which covers many P and NP-hard combinatorial problems. By applying the recently proposed approach of switch analysis [YQZ15], we prove that the expected running time is lower bounded by $\Omega(n\ln n+\mu+\lambda n\frac{\ln\ln n}{\ln n})$. Particularly, when applying this lower bound to the two specific problems OneMax and LeadingOnes, we obtain a more complete understanding of the impact of the offspring population size λ. It was known that the (μ+λ)-EA is never asymptotically faster than the (1+1)-EA on these two problems [LW12, Sud13], but it was left open for which range of λ the (μ+λ)-EA is asymptotically worse than the (1+1)-EA. Note that the expected running time of the (1+1)-EA on OneMax and LeadingOnes is $\Theta(n\ln n)$ and $\Theta(n^2)$, respectively [DJW02]. By comparing these with our derived lower bound, we easily get that the (μ+λ)-EA is strictly asymptotically slower than the (1+1)-EA when $\lambda=\omega(\frac{\ln^2 n}{\ln\ln n})$ on OneMax and $\lambda=\omega(\frac{n\ln n}{\ln\ln n})$ on LeadingOnes. For the parent population size μ, we easily get the obvious ranges $\mu=\omega(n\ln n)$ and $\mu=\omega(n^2)$ for the (μ+λ)-EA being asymptotically worse on OneMax and LeadingOnes, respectively.

The rest of this paper is organized as follows. Section 2 introduces some preliminaries. Section 3 introduces the employed analysis approach. The running time analysis of the (μ+λ)-EA on UBoolean is presented in Section 4. Section 5 concludes the paper.

## 2 Preliminaries

In this section, we first introduce the (μ+λ)-EA and the pseudo-Boolean problem class studied in this paper, respectively, and then describe how to model EAs as Markov chains.

### 2.1 The (μ+λ)-EA

Evolutionary algorithms (EAs) [Bäc96] are general heuristic randomized optimization approaches. Starting from an initial set of solutions (called a population), EAs try to improve the population by a cycle of three stages: reproducing new solutions from the current population, evaluating the newly generated solutions, and updating the population by removing bad solutions. The (μ+λ)-EA, as described in Algorithm 1, is a general population-based EA with mutation only for optimizing pseudo-Boolean problems over $\{0,1\}^n$. It maintains μ solutions. In each iteration, one solution selected from the current population is used to generate an offspring solution by bit-wise mutation (i.e., line 5), which flips each bit of the solution independently with probability $\frac{1}{n}$; this process is repeated independently λ times; then μ solutions out of the μ parent and λ offspring solutions are selected to form the next population. Note that the selection strategies for reproducing new solutions and for updating the population can be arbitrary. Thus, the considered (μ+λ)-EA is quite general, and covers most population-based EAs with mutation only in previous theoretical analyses, e.g., [CTCY12, HY02, LY12].
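To make the setup concrete, here is a minimal Python sketch of the (μ+λ)-EA. The uniform parent selection, truncation survivor selection, and the `f(...) == n` stopping test are illustrative choices of our own (the algorithm described above leaves both selection strategies arbitrary, and the stopping test holds for OneMax and LeadingOnes, whose optimal value is n):

```python
import random

def mu_plus_lambda_ea(f, n, mu, lam, max_gens=10000, rng=None):
    """Sketch of the (mu+lambda)-EA with bit-wise mutation on {0,1}^n.

    f is a pseudo-Boolean function to be maximized. Returns the best
    solution found and the number of generated (evaluated) solutions;
    re-evaluations during sorting are not counted in this sketch.
    """
    rng = rng or random.Random(0)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(mu)]
    evals = mu
    for _ in range(max_gens):
        offspring = []
        for _ in range(lam):
            parent = rng.choice(pop)  # illustrative: uniform parent selection
            # bit-wise mutation: flip each bit independently with prob. 1/n
            child = [b ^ (rng.random() < 1.0 / n) for b in parent]
            offspring.append(child)
        evals += lam
        # illustrative survivor selection: keep the mu best of mu + lam
        pop = sorted(pop + offspring, key=f, reverse=True)[:mu]
        if f(pop[0]) == n:  # assumes the optimal fitness value is n
            break
    return pop[0], evals
```

For example, running it with `f = sum` (i.e., OneMax) drives the population to the all-ones string.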

The running time of EAs is usually defined as the number of fitness evaluations until an optimal solution is found for the first time, since fitness evaluation is the computational process with the highest cost in the algorithm [HY01, YZ08]. Note that running time analysis has been a leading theoretical aspect for randomized search heuristics [AD11, NW10].

### 2.2 Pseudo-Boolean Function Problems

The pseudo-Boolean function class is a large function class which only requires the solution space to be $\{0,1\}^n$ and the objective space to be $\mathbb{R}$. It covers many typical P and NP-hard combinatorial problems such as minimum spanning tree and minimum set cover. We consider a subclass named UBoolean, as shown in Definition 1, in which every function has a unique global optimum. Note that maximization is considered, since minimizing $f$ is equivalent to maximizing $-f$. For any function in UBoolean, we assume without loss of generality that the optimal solution is $11\ldots1$ (briefly denoted as $1^n$). This is because EAs treat the bits 0 and 1 symmetrically, and thus the 0-bits in an optimal solution can be interpreted as 1-bits without affecting the behavior of EAs. The expected running time of unbiased black-box algorithms and mutation-based EAs on UBoolean has been proved to be $\Omega(n\ln n)$ [LW12, Sud13].

###### Definition 1 (UBoolean)

A function $f$ in UBoolean satisfies

$$\exists s\in\{0,1\}^n,\ \forall s'\in\{0,1\}^n\setminus\{s\}:\ f(s')<f(s).$$

Diverse pseudo-Boolean problems in UBoolean have been used for analyzing the running time of EAs, and then for disclosing properties of EAs. Here, we introduce the LeadingOnes problem, which will be used in this paper. As presented in Definition 2, its goal is to maximize the number of consecutive 1-bits counted from the left. It has been proved that the expected running time of the (1+1)-EA on LeadingOnes is $\Theta(n^2)$ [DJW02].

###### Definition 2 (LeadingOnes)

The LeadingOnes problem of size $n$ is to find an $n$-bit binary string $s^*$ such that, letting $s_j$ denote the $j$-th bit of a solution $s\in\{0,1\}^n$,

$$s^*=\arg\max_{s\in\{0,1\}^n}\Big(f(s)=\sum_{i=1}^{n}\prod_{j=1}^{i}s_j\Big).$$
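As a quick sanity check of the definition, the fitness $\sum_{i=1}^{n}\prod_{j=1}^{i}s_j$ simply counts the leading consecutive 1-bits, which can be computed as:

```python
def leading_ones(s):
    """LeadingOnes fitness: the number of consecutive 1-bits counted
    from the left end of the bit string s (a list of 0/1 values).
    Equivalent to sum_{i=1}^{n} prod_{j=1}^{i} s_j."""
    count = 0
    for bit in s:
        if bit != 1:
            break  # the product prod_{j<=i} s_j is 0 from here on
        count += 1
    return count
```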

### 2.3 Markov Chain Modeling

EAs can be modeled and analyzed as Markov chains, e.g., in [HY01, YZ08]. Let $\mathcal{X}$ denote the population space and $\mathcal{X}^*$ the optimal population space; an optimal population in $\mathcal{X}^*$ contains at least one optimal solution. Let $\xi_t$ denote the population after $t$ generations. Then, an EA can be described as a random sequence $\{\xi_t\}_{t=0}^{+\infty}$. Since $\xi_{t+1}$ can often be decided from $\xi_t$ and the reproduction operator of the EA, the random sequence forms a Markov chain with state space $\mathcal{X}$, denoted as "$\xi\in\mathcal{X}$" for simplicity. Note that all sets considered in this paper are multisets, e.g., a population can contain several copies of the same solution.

The goal of EAs is to reach the optimal space $\mathcal{X}^*$ from an initial population $\xi_0$. Given a Markov chain $\xi\in\mathcal{X}$ and $t\ge 0$, we define $\tau_t$ as a random variable such that $\tau_t=\min\{i\ge 0\mid\xi_{t+i}\in\mathcal{X}^*\}$. That is, $\tau_t$ is the number of steps needed to reach the optimal space for the first time when starting from time $t$. The mathematical expectation of $\tau_t$, $\mathbb{E}[\tau_t\mid\xi_t=x]$, is called the conditional first hitting time (CFHT) of the chain starting from $\xi_t=x$. If $\xi_t$ is drawn from a distribution $\pi_t$, the expectation of the CFHT over $\pi_t$, $\mathbb{E}[\tau_t\mid\xi_t\sim\pi_t]$, is called the distribution-CFHT (DCFHT) of the chain from $\xi_t\sim\pi_t$. Since the running time of EAs is counted by the number of fitness evaluations, the cost of initialization and of each generation must be considered. For example, the expected running time of the (μ+λ)-EA is $\mu+\lambda\cdot\mathbb{E}[\tau_0\mid\xi_0\sim\pi_0]$.

A Markov chain $\xi\in\mathcal{X}$ is said to be absorbing if $\forall x\in\mathcal{X}^*,\forall t\ge 0: P(\xi_{t+1}=x\mid\xi_t=x)=1$. Note that all Markov chains modeling EAs can be transformed to be absorbing by keeping the chain unchanged once an optimal state has been found. This transformation obviously does not affect the first hitting time.

## 3 The Switch Analysis Approach

To derive running time bounds of the (+)-EA on UBoolean, we first model the EA process as a Markov chain, and then apply the switch analysis approach.

Switch analysis [YQZ15, YQ15], as presented in Theorem 1, is a recently proposed approach that compares the DCFHT of two Markov chains. Since the state spaces of the two chains may be different, an aligned mapping function $\phi:\mathcal{X}\to\mathcal{Y}$, as shown in Definition 3, is employed. Using switch analysis to derive running time bounds of a given chain $\xi\in\mathcal{X}$ (i.e., modeling a given EA running on a given problem), one needs to

1. construct a reference chain $\xi'\in\mathcal{Y}$ for comparison and design an aligned mapping function $\phi$ from $\mathcal{X}$ to $\mathcal{Y}$;

2. analyze their one-step transition probabilities, i.e., $P(\xi_{t+1}\mid\xi_t)$ and $P(\xi'_{t+1}\mid\xi'_t)$, the CFHT of the reference chain, i.e., $\mathbb{E}[\tau'\mid\xi'_t=y]$, and the state distribution $\pi_t$ of the chain $\xi$;

3. examine Eq. (1) to get a one-step difference $\rho_t$ between the two chains;

4. sum up $\{\rho_t\}$ to get a running time gap $\rho=\sum_{t=0}^{+\infty}\rho_t$ between the two chains; bounds on $\mathbb{E}[\tau\mid\xi_0\sim\pi_0]$ can then be derived by combining $\rho$ with $\mathbb{E}[\tau'\mid\xi'_0\sim\pi^{\phi}_0]$.

###### Definition 3 (Aligned Mapping [YQZ15])

Given two spaces $\mathcal{X}$ and $\mathcal{Y}$ with target subspaces $\mathcal{X}^*$ and $\mathcal{Y}^*$, respectively, a function $\phi:\mathcal{X}\to\mathcal{Y}$ is called
(a) a left-aligned mapping if $\forall x\in\mathcal{X}^*:\phi(x)\in\mathcal{Y}^*$;
(b) a right-aligned mapping if $\forall x\in\mathcal{X}\setminus\mathcal{X}^*:\phi(x)\in\mathcal{Y}\setminus\mathcal{Y}^*$;
(c) an optimal-aligned mapping if it is both left-aligned and right-aligned.

###### Theorem 1 (Switch Analysis [YQZ15])

Given two absorbing Markov chains $\xi\in\mathcal{X}$ and $\xi'\in\mathcal{Y}$, let $\tau$ and $\tau'$ denote the hitting times of $\xi$ and $\xi'$, respectively, and let $\pi_t$ denote the distribution of $\xi_t$. Given a series of values $\{\rho_t\}_{t=0}^{+\infty}$ with $\rho_t\in\mathbb{R}$ and a right (or left)-aligned mapping $\phi:\mathcal{X}\to\mathcal{Y}$, if $\mathbb{E}[\tau\mid\xi_0\sim\pi_0]$ is finite and

$$\forall t:\ \sum_{x\in\mathcal{X},y\in\mathcal{Y}}\pi_t(x)\,P(\xi_{t+1}\in\phi^{-1}(y)\mid\xi_t=x)\,\mathbb{E}[\tau'\mid\xi'_0=y]\ \le(\text{or}\ \ge)\ \sum_{u,y\in\mathcal{Y}}\pi^{\phi}_t(u)\,P(\xi'_1=y\mid\xi'_0=u)\,\mathbb{E}[\tau'\mid\xi'_1=y]+\rho_t, \tag{1}$$

where $\pi^{\phi}_t(u)=\sum_{x\in\phi^{-1}(u)}\pi_t(x)$ and $\rho=\sum_{t=0}^{+\infty}\rho_t$, we have

$$\mathbb{E}[\tau\mid\xi_0\sim\pi_0]\ \le(\text{or}\ \ge)\ \mathbb{E}[\tau'\mid\xi'_0\sim\pi^{\phi}_0]+\rho.$$

The idea of switch analysis is to obtain the difference in the DCFHT of two chains by summing up all the one-step differences $\rho_t$. Using Theorem 1 to compare two chains, we can waive analyzing the long-term behavior of one chain, since Eq. (1) does not involve the CFHT of the chain $\xi$. Therefore, the theorem can simplify the analysis of an EA process by comparing it with an easy-to-analyze one.

## 4 Running Time Analysis

In this section, we prove a lower bound on the expected running time of the (μ+λ)-EA solving UBoolean, as shown in Theorem 2. Our proof is accomplished by using switch analysis (i.e., Theorem 1). The target EA process we are to analyze is the (μ+λ)-EA running on any function in UBoolean. The constructed reference process for comparison is the RLS algorithm running on the LeadingOnes problem. RLS is a modification of the randomized local search algorithm. It maintains only one solution $s$. In each iteration, a new solution $s'$ is generated by flipping a randomly chosen bit of $s$, and $s'$ is accepted only if $f(s')>f(s)$. That is, RLS searches locally and only accepts a strictly better offspring solution.

###### Theorem 2

The expected running time of the (μ+λ)-EA on UBoolean is $\Omega(n\ln n+\mu+\lambda n\frac{\ln\ln n}{\ln n})$, when μ and λ are upper bounded by a polynomial in $n$.

We first give some lemmas that will be used in the proof of Theorem 2. Lemma 1 characterizes the one-step transition behavior of a Markov chain via CFHT. Lemma 2 gives the CFHT of the reference chain (i.e., RLS running on LeadingOnes). In the following analysis, we will use $E_{\mathrm{rls}}(j)$ to denote $\mathbb{E}[\tau'\mid\xi'_t=y]$ with $|y|_0=j$, i.e., $E_{\mathrm{rls}}(j)=nj$ by Lemma 2.

###### Lemma 1 ([Fre96])

Given a Markov chain $\xi\in\mathcal{X}$ and a target subspace $\mathcal{X}^*\subset\mathcal{X}$, we have, for CFHT, $\forall x\in\mathcal{X}^*:\mathbb{E}[\tau\mid\xi_t=x]=0$, and

$$\forall x\notin\mathcal{X}^*:\ \mathbb{E}[\tau\mid\xi_t=x]=1+\sum_{x'\in\mathcal{X}}P(\xi_{t+1}=x'\mid\xi_t=x)\,\mathbb{E}[\tau\mid\xi_{t+1}=x'].$$
###### Lemma 2 ([YQZ15])

For the chain $\xi'\in\mathcal{Y}$ modeling RLS running on the LeadingOnes problem, the CFHT satisfies that $\mathbb{E}[\tau'\mid\xi'_t=y]=n|y|_0$, where $|y|_0$ denotes the number of 0-bits of $y$.
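Lemma 2 can be checked empirically: an accepted RLS step on LeadingOnes must flip the leftmost 0-bit, so the hitting time is a sum of $|y|_0$ geometric waiting times with success probability $\frac{1}{n}$, giving a mean close to $n|y|_0$. A small Monte Carlo sketch (the starting state, problem size, and trial count below are arbitrary choices of ours):

```python
import random

def leading_ones(s):
    """Number of consecutive 1-bits counted from the left."""
    count = 0
    for bit in s:
        if bit != 1:
            break
        count += 1
    return count

def rls_hitting_time(start, rng):
    """Run RLS (flip one uniformly chosen bit, accept only strict
    improvements) on LeadingOnes from `start`; return the number of
    steps until the optimum 1^n is reached."""
    s = list(start)
    n = len(s)
    steps = 0
    while leading_ones(s) < n:
        i = rng.randrange(n)
        s2 = s[:]
        s2[i] ^= 1
        if leading_ones(s2) > leading_ones(s):  # only flipping the leftmost 0-bit helps
            s = s2
        steps += 1
    return steps

# Estimate the CFHT from a state with n = 20 and |y|_0 = 5;
# Lemma 2 predicts n * |y|_0 = 100.
rng = random.Random(1)
start = [1] * 15 + [0] * 5
mean = sum(rls_hitting_time(start, rng) for _ in range(1000)) / 1000
```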

###### Lemma 3

For any fixed $i\ge 0$ and $m\ge i$, $f(m)=\sum_{k=0}^{i}\binom{m}{k}(\frac{1}{n})^{k}(1-\frac{1}{n})^{m-k}$ decreases with $m$.

###### Proof.

Let $f(m)=\sum_{k=0}^{i}\binom{m}{k}(\frac{1}{n})^{k}(1-\frac{1}{n})^{m-k}$. The goal is to show that $f(m+1)\le f(m)$ for $m\ge i$. Denote $X_1,\ldots,X_{m+1}$ as independent random variables, where each $X_j$ satisfies that $P(X_j=1)=\frac{1}{n}$ and $P(X_j=0)=1-\frac{1}{n}$. Then we can express $f(m)$ and $f(m+1)$ as $f(m)=P(\sum_{j=1}^{m}X_j\le i)$ and $f(m+1)=P(\sum_{j=1}^{m+1}X_j\le i)$. Thus,

$$f(m+1)=P\Big(\sum_{j=1}^{m+1}X_j\le i\Big)\le P\Big(\sum_{j=1}^{m}X_j\le i\Big)=f(m),$$

since $X_{m+1}\ge 0$. ∎
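The monotonicity in Lemma 3 is easy to confirm numerically: $f(m)$ is the probability that a $\mathrm{Bin}(m,\frac{1}{n})$ variable is at most $i$, which can only shrink as more trials are added (the parameters $i=2$, $n=30$ below are illustrative):

```python
from math import comb

def binom_tail_le(m, i, n):
    """f(m) = P(Bin(m, 1/n) <= i): the probability that at most i of m
    independent bits, each flipped with probability 1/n, are flipped."""
    p = 1.0 / n
    return sum(comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(i + 1))

# f(m) should be non-increasing in m for fixed i and n.
vals = [binom_tail_le(m, 2, 30) for m in range(2, 61)]
```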

###### Lemma 4

For $\lambda\le n^{c}$ where $c$ is a positive constant, it holds that

$$\sum_{i=0}^{n-1}\Big(\sum_{k=0}^{i}\binom{n}{k}\Big(\frac{1}{n}\Big)^{k}\Big(1-\frac{1}{n}\Big)^{n-k}\Big)^{\lambda}\ge n-\Big\lceil\frac{e(c+1)\ln n}{\ln\ln n}\Big\rceil.$$
###### Proof.

Let $m=\lceil\frac{e(c+1)\ln n}{\ln\ln n}\rceil$. Denote $X_1,\ldots,X_n$ as independent random variables, where $P(X_j=1)=\frac{1}{n}$ and $P(X_j=0)=1-\frac{1}{n}$. Let $X=\sum_{j=1}^{n}X_j$; then its expectation $\mathbb{E}[X]=1$. We thus have

$$\forall i\ge m:\ \sum_{k=i}^{n}\binom{n}{k}\Big(\frac{1}{n}\Big)^{k}\Big(1-\frac{1}{n}\Big)^{n-k}=P(X\ge i)\le\frac{e^{i-1}}{i^{i}}, \tag{2}$$

where the inequality is by the Chernoff bound. Then, we have

$$\begin{aligned}\sum_{i=0}^{n-1}\Big(\sum_{k=0}^{i}\binom{n}{k}\Big(\frac{1}{n}\Big)^{k}\Big(1-\frac{1}{n}\Big)^{n-k}\Big)^{\lambda}&\ge\sum_{i=m-1}^{n-1}\Big(\sum_{k=0}^{i}\binom{n}{k}\Big(\frac{1}{n}\Big)^{k}\Big(1-\frac{1}{n}\Big)^{n-k}\Big)^{\lambda}\\&=\sum_{i=m-1}^{n-1}\Big(1-\sum_{k=i+1}^{n}\binom{n}{k}\Big(\frac{1}{n}\Big)^{k}\Big(1-\frac{1}{n}\Big)^{n-k}\Big)^{\lambda}\\&\ge\sum_{i=m-1}^{n-1}\Big(1-\frac{e^{i}}{(i+1)^{i+1}}\Big)^{\lambda}\ge\sum_{i=m-1}^{n-1}\Big(1-\frac{e^{m-1}}{m^{m}}\Big)^{\lambda},\end{aligned}$$

where the second inequality is by Eq. (2), and the last inequality holds because $\frac{e^{i}}{(i+1)^{i+1}}$ decreases with $i$.

Then, we evaluate $\frac{e^{m-1}}{m^{m}}$ by taking the logarithm of its reciprocal:

$$\begin{aligned}\ln\frac{m^{m}}{e^{m-1}}&=m(\ln m-1)+1\\&\ge\frac{e(c+1)\ln n}{\ln\ln n}\big(1+\ln(c+1)+\ln\ln n-\ln\ln\ln n-1\big)+1\\&\ge\frac{e(c+1)\ln n}{\ln\ln n}\cdot\frac{1}{e}\ln\ln n=(c+1)\ln n\ge\ln(\lambda n).\quad(\text{by }\lambda\le n^{c})\end{aligned}$$

This implies that $\frac{e^{m-1}}{m^{m}}\le\frac{1}{\lambda n}$. Thus, we have

$$\sum_{i=0}^{n-1}\Big(\sum_{k=0}^{i}\binom{n}{k}\Big(\frac{1}{n}\Big)^{k}\Big(1-\frac{1}{n}\Big)^{n-k}\Big)^{\lambda}\ge\sum_{i=m-1}^{n-1}\Big(1-\frac{1}{\lambda n}\Big)^{\lambda}\ge\sum_{i=m-1}^{n-1}\Big(1-\frac{1}{n}\Big)\ge n-m=n-\Big\lceil\frac{e(c+1)\ln n}{\ln\ln n}\Big\rceil,$$

where the second inequality is by Bernoulli's inequality, i.e., $(1-\frac{1}{\lambda n})^{\lambda}\ge 1-\frac{\lambda}{\lambda n}=1-\frac{1}{n}$. ∎
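For a concrete feel, both sides of Lemma 4 can be evaluated directly; the instance below ($n=100$, $c=1$, so $\lambda=n^{c}=100$) is an arbitrary choice of ours, and the inequality holds there with plenty of slack:

```python
from math import comb, ceil, e, log

def lemma4_sides(n, c):
    """Return (LHS, RHS) of Lemma 4 for lambda = n**c: the LHS is
    sum_{i=0}^{n-1} P(Bin(n, 1/n) <= i)**lambda and the RHS is
    n - ceil(e*(c+1)*ln(n) / ln(ln(n)))."""
    lam = n ** c
    p = 1.0 / n
    def cdf(i):  # P(Bin(n, 1/n) <= i)
        return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(i + 1))
    lhs = sum(cdf(i) ** lam for i in range(n))
    rhs = n - ceil(e * (c + 1) * log(n) / log(log(n)))
    return lhs, rhs
```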

###### Lemma 5 ([FG06])

Let $H(x)=-x\log_2 x-(1-x)\log_2(1-x)$ denote the binary entropy function. It holds that

$$\forall n\ge 1,\ 0<\epsilon<\frac{1}{2}:\ \sum_{k=0}^{\lfloor\epsilon n\rfloor}\binom{n}{k}\le 2^{H(\epsilon)n}.$$
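Lemma 5 is the standard entropy bound on binomial sums; a direct numeric check (the parameter pairs below are arbitrary choices of ours):

```python
from math import comb, log2, floor

def entropy_bound_sides(n, eps):
    """Return (LHS, RHS) of Lemma 5: LHS = sum_{k=0}^{floor(eps*n)} C(n,k)
    and RHS = 2**(H(eps)*n), with H the binary entropy function."""
    H = -eps * log2(eps) - (1 - eps) * log2(1 - eps)
    lhs = sum(comb(n, k) for k in range(floor(eps * n) + 1))
    return lhs, 2 ** (H * n)
```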

Proof of Theorem 2.  We use switch analysis (i.e., Theorem 1) to prove it. Let $\xi\in\mathcal{X}$ model the analyzed EA process (i.e., the (μ+λ)-EA running on any function in UBoolean). We use RLS running on the LeadingOnes problem as the reference process, modeled by $\xi'\in\mathcal{Y}$. Then, $\mathcal{Y}=\{0,1\}^n$, $\mathcal{Y}^*=\{1^n\}$, and $\mathcal{X}^*=\{x\in\mathcal{X}\mid\max_{s\in x}|s|_1=n\}$, where $|s|_1$ denotes the number of 1-bits of a solution $s$. We construct a mapping $\phi:\mathcal{X}\to\mathcal{Y}$ such that $|\phi(x)|_0=\min_{s\in x}|s|_0$, i.e., $\phi(x)$ is a solution whose number of 0-bits equals the minimum number of 0-bits among the solutions in the population $x$. It is easy to see that the mapping is an optimal-aligned mapping, because $\phi(x)\in\mathcal{Y}^*$ iff $x\in\mathcal{X}^*$.

We investigate the condition Eq. (1) of switch analysis. For any $x\notin\mathcal{X}^*$, suppose that $|\phi(x)|_0=j$. Then, $j\ge 1$. By Lemmas 1 and 2, we have

$$\sum_{y\in\mathcal{Y}}P(\xi'_1=y\mid\xi'_0=\phi(x))\,\mathbb{E}[\tau'\mid\xi'_1=y]=E_{\mathrm{rls}}(j)-1=nj-1. \tag{3}$$

For the reproduction of the (μ+λ)-EA (i.e., the chain $\xi$) on the population $x$, assume that the λ solutions selected from $x$ for reproduction have the numbers of 0-bits $j_1,\ldots,j_\lambda$, respectively, where $j\le j_1\le\cdots\le j_\lambda$. If at most $i$ 0-bits mutate to 1-bits in each selected solution and there exists at least one selected solution which flips exactly $i$ 0-bits, which happens with probability denoted by $p(i)$, the next population $x'$ satisfies that $|\phi(x')|_0\ge j_1-i$. Furthermore, $E_{\mathrm{rls}}(j)=nj$ increases with $j$. Thus, we have

$$\begin{aligned}\sum_{y\in\mathcal{Y}}P(\xi_{t+1}\in\phi^{-1}(y)\mid\xi_t=x)\,\mathbb{E}[\tau'\mid\xi'_0=y]&\ge\sum_{i=0}^{j_1}p(i)\cdot E_{\mathrm{rls}}(j_1-i)\\&\ge\sum_{i=0}^{j}p(i)\cdot E_{\mathrm{rls}}(j-i)=n\sum_{i=0}^{j-1}\Big(\prod_{p=1}^{\lambda}\Big(\sum_{k=0}^{i}\binom{j_p}{k}\Big(\frac{1}{n}\Big)^{k}\Big(1-\frac{1}{n}\Big)^{j_p-k}\Big)\Big). \tag{4}\end{aligned}$$

By comparing Eq. (3) with Eq. (4), we have, for any $x\notin\mathcal{X}^*$,

$$\begin{aligned}&\sum_{y\in\mathcal{Y}}P(\xi_{t+1}\in\phi^{-1}(y)\mid\xi_t=x)\,\mathbb{E}[\tau'\mid\xi'_0=y]-\sum_{y\in\mathcal{Y}}P(\xi'_1=y\mid\xi'_0=\phi(x))\,\mathbb{E}[\tau'\mid\xi'_1=y]\\&\ge n\Big(\sum_{i=0}^{j-1}\Big(\prod_{p=1}^{\lambda}\Big(\sum_{k=0}^{i}\binom{j_p}{k}\Big(\frac{1}{n}\Big)^{k}\Big(1-\frac{1}{n}\Big)^{j_p-k}\Big)\Big)-j\Big)+1\\&\ge n\Big(\sum_{i=0}^{j-1}\Big(\sum_{k=0}^{i}\binom{n}{k}\Big(\frac{1}{n}\Big)^{k}\Big(1-\frac{1}{n}\Big)^{n-k}\Big)^{\lambda}-j\Big)+1\\&\ge n\Big(\sum_{i=0}^{n-1}\Big(\sum_{k=0}^{i}\binom{n}{k}\Big(\frac{1}{n}\Big)^{k}\Big(1-\frac{1}{n}\Big)^{n-k}\Big)^{\lambda}-n\Big)+1,\end{aligned}$$

where the 2nd '≥' is because, by Lemma 3, $\sum_{k=0}^{i}\binom{j_p}{k}(\frac{1}{n})^{k}(1-\frac{1}{n})^{j_p-k}$ reaches its minimum when $j_p=n$, and the last '≥' is by $j\le n$.
When $x\in\mathcal{X}^*$, both Eq. (3) and Eq. (4) equal 0, because both chains are absorbing and the mapping $\phi$ is optimal-aligned. Thus, Eq. (1) in Theorem 1 holds with $\rho_t=\big(n\big(\sum_{i=0}^{n-1}\big(\sum_{k=0}^{i}\binom{n}{k}(\frac{1}{n})^{k}(1-\frac{1}{n})^{n-k}\big)^{\lambda}-n\big)+1\big)(1-\pi_t(\mathcal{X}^*))$. By switch analysis,

$$\mathbb{E}[\tau\mid\xi_0\sim\pi_0]\ge\mathbb{E}[\tau'\mid\xi'_0\sim\pi^{\phi}_0]+\Big(n\Big(\sum_{i=0}^{n-1}\Big(\sum_{k=0}^{i}\binom{n}{k}\Big(\frac{1}{n}\Big)^{k}\Big(1-\frac{1}{n}\Big)^{n-k}\Big)^{\lambda}-n\Big)+1\Big)\sum_{t=0}^{+\infty}(1-\pi_t(\mathcal{X}^*)).$$

Since $\sum_{t=0}^{+\infty}(1-\pi_t(\mathcal{X}^*))=\mathbb{E}[\tau\mid\xi_0\sim\pi_0]$, we have

$$\mathbb{E}[\tau\mid\xi_0\sim\pi_0]\ge\mathbb{E}[\tau'\mid\xi'_0\sim\pi^{\phi}_0]-\Big(n\Big\lceil\frac{e(c+1)\ln n}{\ln\ln n}\Big\rceil-1\Big)\,\mathbb{E}[\tau\mid\xi_0\sim\pi_0], \tag{5}$$

where the inequality is by Lemma 4, since $\lambda\le n^{c}$ for some constant $c$ (as λ is upper bounded by a polynomial in $n$).

We then investigate $\mathbb{E}[\tau'\mid\xi'_0\sim\pi^{\phi}_0]$. Since each of the μ solutions in the initial population is selected uniformly at random from $\{0,1\}^n$, we have

$$\forall\,0\le j\le n:\ \pi^{\phi}_0(\{y\in\mathcal{Y}\mid|y|_0=j\})=\pi_0\big(\{x\in\mathcal{X}\mid\min_{s\in x}|s|_0=j\}\big)=\frac{\big(\sum_{k=j}^{n}\binom{n}{k}\big)^{\mu}-\big(\sum_{k=j+1}^{n}\binom{n}{k}\big)^{\mu}}{2^{n\mu}},$$

where $\sum_{k=j}^{n}\binom{n}{k}$ is the number of solutions with at least $j$ 0-bits. Then,

$$\begin{aligned}\mathbb{E}[\tau'\mid\xi'_0\sim\pi^{\phi}_0]&=\sum_{j=0}^{n}\pi^{\phi}_0(\{y\in\mathcal{Y}\mid|y|_0=j\})\,E_{\mathrm{rls}}(j)\\&=\frac{1}{2^{n\mu}}\sum_{j=1}^{n}\Big(\Big(\sum_{k=j}^{n}\binom{n}{k}\Big)^{\mu}-\Big(\sum_{k=j+1}^{n}\binom{n}{k}\Big)^{\mu}\Big)\,nj=\frac{n}{2^{n\mu}}\sum_{j=1}^{n}\Big(\sum_{k=j}^{n}\binom{n}{k}\Big)^{\mu}\\&>n\sum_{j=1}^{\lfloor n/4\rfloor+1}\Big(\sum_{k=j}^{n}\binom{n}{k}\Big/2^{n}\Big)^{\mu}>\frac{n^{2}}{4}\Big(\sum_{k=\lfloor n/4\rfloor+1}^{n}\binom{n}{k}\Big/2^{n}\Big)^{\mu}=\frac{n^{2}}{4}\Big(1-\sum_{k=0}^{\lfloor n/4\rfloor}\binom{n}{k}\Big/2^{n}\Big)^{\mu}\\&\ge\frac{n^{2}}{4}\Big(1-2^{H(\frac{1}{4})n-n}\Big)^{\mu}\ge\frac{n^{2}}{4}\,e^{-\mu\cdot 2^{-((1-H(\frac{1}{4}))n-1)}}>\frac{n^{2}}{4}\,e^{-\mu/1.13^{n-1}},\end{aligned}$$

where the third inequality is by Lemma 5, the fourth inequality is by $1-x\ge e^{-2x}$ for $0\le x\le\frac{1}{2}$, and the last inequality is by $2^{(1-H(\frac{1}{4}))n-1}\ge 1.13^{n-1}$ for sufficiently large $n$.

Applying the above lower bound on $\mathbb{E}[\tau'\mid\xi'_0\sim\pi^{\phi}_0]$ to Eq. (5), we get, noting that μ is upper bounded by a polynomial in $n$,

$$\mathbb{E}[\tau\mid\xi_0\sim\pi_0]\ge\frac{n}{4\lceil e(c+1)\ln n/\ln\ln n\rceil}\,e^{-\mu/1.13^{n-1}},\ \text{i.e.,}\ \Omega\Big(\frac{n\ln\ln n}{\ln n}\Big).$$

Considering the μ fitness evaluations for the initial population and the λ fitness evaluations in each generation, the expected running time of the (μ+λ)-EA on UBoolean is lower bounded by $\Omega(\mu+\lambda n\frac{\ln\ln n}{\ln n})$. Because the (μ+λ)-EA belongs to mutation-based EAs, we can also directly use the general lower bound $\Omega(n\ln n)$ [Sud13]. Thus, the theorem holds. ∎

## 5 Conclusion

This paper analyzes the expected running time of the (μ+λ)-EA for solving a general problem class, UBoolean, consisting of pseudo-Boolean functions with a unique global optimum. We derive the lower bound $\Omega(n\ln n+\mu+\lambda n\frac{\ln\ln n}{\ln n})$ by applying the recently proposed approach of switch analysis. The results partially complete the running time comparison between the (μ+λ)-EA and the (1+1)-EA on the two well-studied pseudo-Boolean problems, OneMax and LeadingOnes: we can now conclude that when μ or λ is moderately large, the (μ+λ)-EA has a strictly worse expected running time. The investigated (μ+λ)-EA uses mutation only, while crossover is a characterizing feature of EAs. Therefore, we will try to analyze the running time of population-based EAs with crossover operators in the future.

### References

1. A. Auger and B. Doerr. Theory of Randomized Search Heuristics: Foundations and Recent Developments. World Scientific, Singapore, 2011.
2. T. Bäck. Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms. Oxford University Press, Oxford, UK, 1996.
3. T. Chen, J. He, G. Chen, and X. Yao. Choosing selection pressure for wide-gap problems. Theoretical Computer Science, 411(6):926–934, 2010.
4. T. Chen, J. He, G. Sun, G. Chen, and X. Yao. A new approach for analyzing average time complexity of population-based evolutionary algorithms on unimodal problems. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 39(5):1092–1106, 2009.
5. Tianshi Chen, Ke Tang, Guoliang Chen, and Xin Yao. A large population size can be unhelpful in evolutionary algorithms. Theoretical Computer Science, 436(8):54–70, 2012.
6. S. Droste, T. Jansen, and I. Wegener. On the analysis of the (1+1) evolutionary algorithm. Theoretical Computer Science, 276(1-2):51–81, 2002.
7. B. Doerr and M. Künnemann. Optimizing linear functions with the (1+λ) evolutionary algorithm - different asymptotic runtimes for different instances. Theoretical Computer Science, 561:3–23, 2015.
8. J. Flum and M. Grohe. Parameterized Complexity Theory. Springer, New York, NY, 2006.
9. M.I. Freǐdlin. Markov Processes and Differential Equations: Asymptotic Problems. Birkhäuser Verlag, Basel, Switzerland, 1996.
10. C. Gießen and C. Witt. Population size vs. mutation strength for the (1+λ) EA on OneMax. In Proceedings of GECCO’15, pages 1439–1446, Madrid, Spain, 2015.
11. J. He and X. Yao. Drift analysis and average time complexity of evolutionary algorithms. Artificial Intelligence, 127(1):57–85, 2001.
12. J. He and X. Yao. From an individual to a population: An analysis of the first hitting time of population-based evolutionary algorithms. IEEE Transactions on Evolutionary Computation, 6(5):495–511, 2002.
13. T. Jansen and I. Wegener. On the utility of populations in evolutionary algorithms. In Proceedings of GECCO’01, pages 1034–1041, San Francisco, CA, 2001.
14. P. K. Lehre and C. Witt. Black-box search by unbiased variation. Algorithmica, 64(4):623–642, 2012.
15. Per Kristian Lehre and Xin Yao. On the impact of mutation-selection balance on the runtime of evolutionary algorithms. IEEE Transactions on Evolutionary Computation, 16(2):225–241, 2012.
16. F. Neumann and C. Witt. Bioinspired Computation in Combinatorial Optimization: Algorithms and Their Computational Complexity. Springer-Verlag, Berlin, Germany, 2010.
17. T. Storch. On the choice of the parent population size. Evolutionary Computation, 16(4):557–578, 2008.
18. D. Sudholt. A new method for lower bounds on the running time of evolutionary algorithms. IEEE Transactions on Evolutionary Computation, 17(3):418–435, 2013.
19. C. Witt. Runtime analysis of the (μ+1) EA on simple pseudo-Boolean functions. Evolutionary Computation, 14(1):65–86, 2006.
20. C. Witt. Population size versus runtime of a simple evolutionary algorithm. Theoretical Computer Science, 403(1):104–120, 2008.
21. Yang Yu and Chao Qian. Running time analysis: Convergence-based analysis reduces to switch analysis. In Proceedings of CEC’15, pages 2603–2610, Sendai, Japan, 2015.
22. Y. Yu, C. Qian, and Z.-H. Zhou. Switch analysis for running time analysis of evolutionary algorithms. IEEE Transactions on Evolutionary Computation, 19(6):777–792, 2015.
23. Y. Yu and Z.-H. Zhou. A new approach to estimating the expected first hitting time of evolutionary algorithms. Artificial Intelligence, 172(15):1809–1832, 2008.