
# A Revisit of Infinite Population Models for Evolutionary Algorithms on Continuous Optimization Problems

Bo Song and Victor O.K. Li

The authors are with the Department of Electrical and Electronic Engineering, the University of Hong Kong, Pokfulam, Hong Kong (e-mail: bsong@connect.hku.hk; vli@eee.hku.hk).
###### Abstract

Infinite population models are important tools for studying population dynamics of evolutionary algorithms. They describe how the distributions of populations change between consecutive generations. In general, infinite population models are derived from Markov chains by exploiting symmetries between individuals in the population and analyzing the limit as the population size goes to infinity. In this paper, we study the theoretical foundations of infinite population models of evolutionary algorithms on continuous optimization problems. First, we show that the convergence proofs in a widely cited study were in fact problematic and incomplete. We further show that the modeling assumption of exchangeability of individuals cannot yield the transition equation. Then, in order to analyze infinite population models, we build an analytical framework based on convergence in distribution of random elements which take values in the metric space of infinite sequences. The framework is concise and mathematically rigorous. It also provides an infrastructure for studying the convergence of the stacking of operators and of iterating the algorithm, which previous studies failed to address. Finally, we use the framework to prove the convergence of infinite population models for the mutation operator and the -ary recombination operator. We show that the infinite population models for these operators provide accurate predictions of the real population dynamics as the population size goes to infinity, provided that the initial population is independent and identically distributed.

Evolutionary algorithms, infinite population models, population dynamics, convergence in distribution, theoretical analysis

## I Introduction

Evolutionary algorithms (EAs) are general-purpose optimization algorithms that have achieved great success in real-world applications. They are inspired by the evolutionary process in nature. A number of candidate solutions to the problem at hand are modeled as individuals in a population, and over generations the algorithm evolves the population by producing new individuals and selectively replacing the old ones. The idea is that the survival probabilities of individuals in the population are related to their objective function values, or fitness values in this context. In general, individuals with more preferable objective function values or higher fitness values are more likely to survive and remain in the next generation. As a result, by the “survival of the fittest” principle, it is likely that after many generations the population will contain individuals with sufficiently high fitness values, such that these individuals are satisfactory solutions to the problem at hand.

Although EAs are conceptually simple, their underlying evolutionary processes and behaviors are not yet fully understood. The difficulty lies in the fact that EAs are customizable, population-based, iterative stochastic algorithms, and the objective function also has a great influence on their behaviors. A successful model of EAs should account for both the mechanisms of the algorithm and the influence of the objective function. One way to derive such models is to study EAs as dynamical systems. The idea is to first pick a quantity of interest, such as the distribution of the population or some statistic of it. Then, transitions in the state space of all possible values of this quantity are studied. A Markov chain described by a transition matrix (when the state space is finite) or a difference equation (when the state space is not finite) is derived to describe how the chosen quantity changes between consecutive generations.

Although the dynamical system approach brings many insights about EAs, the state spaces of the models tend to grow rapidly as the population size increases. This is because in order to characterize the population dynamics accurately, the state space of the model has to be large enough to describe all the interdependencies between individuals in the current and next generations. As a result, even for time-homogeneous EAs with moderate population sizes, the dynamical system model is often too large and too complex to be analyzed or simulated. To overcome this issue, some researchers instead study the limiting behaviors of EAs as the population size goes to infinity. The idea is to exploit some kind of symmetry in the state space (such as all individuals having the same marginal distribution) and prove that in the limit the Markov chain can be described by a more compact model. Models built in this way are called infinite population models (IPMs).

In this paper, we follow this line of research and study IPMs of EAs on continuous spaces. More specifically, we aim at rigorously proving the convergence of IPMs. Note that in this study, by convergence we usually mean a property of IPMs. Loosely speaking, an IPM converges if, as the population size goes to infinity, the population dynamics of the real EA converge in some sense to the population dynamics predicted by the model. This usage differs from the conventional one, where convergence means that the EA eventually locates and gets stuck in some local or global optimum. Convergence results guarantee that IPMs characterize some kind of limiting behavior of real EAs. They are the foundations and justifications of IPMs.

To our knowledge, very few research efforts have directly studied the convergence of IPMs. Among them, the studies of Qi et al. in [1, 2] are the classic and most relevant ones. Qi et al. studied the population dynamics of the simple EA on continuous space. In the first part of their research [1], the authors built an IPM to analyze the population dynamics of the simple EA with proportionate selection and mutation. Traditionally, a transition equation is constructed to describe how the probability density functions (p.d.f.s) of the joint distributions of individuals change between consecutive generations. The novelty of the authors’ research lies in their introduction of the modeling assumption that individuals in the same generation are exchangeable, and therefore all have the same marginal distribution. Then, as a key result, the authors proved that as the population size goes to infinity, the marginal p.d.f.s of the populations produced by the real algorithm converge pointwise to the p.d.f.s predicted by the following transition equation:

$$f_{x_{k+1}}(x)=\frac{\int_F f_{x_k}(y)\,g(y)\,f_w(x\mid y)\,dy}{\int_F f_{x_k}(y)\,g(y)\,dy},\qquad(1)$$

where $F$ is the solution space, $f_{x_k}$ is the predicted marginal p.d.f. of the $k$th generation, $g$ is the objective function to be maximized and $f_w(x\mid y)$ is the conditional p.d.f. determined by the mutation operator. Though the transition equation of marginal distributions loses information about the interdependency between individuals, it has a simpler form and can still provide a relatively complete description of the population. Moreover, as proved in [1], it is accurate in the limiting case when the population size goes to infinity. Furthermore, in the second part of the research [2], the authors analyzed the crossover operator and modified the transition equation to include all three operators of the simple EA. Overall, the studies of Qi et al. are inspiring, especially the idea of combining the modeling assumption that individuals are exchangeable with the mathematical analysis of pointwise convergence of p.d.f.s as the population size goes to infinity.
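To make transition equation (1) concrete, here is a minimal numerical sketch that evaluates it on a one-dimensional grid by Riemann sums. It rests on illustrative assumptions that do not come from [1]: $F=[-3,3]$, a Gaussian mutation kernel $f_w(x\mid y)$ centred at $y$ that is truncated to $F$ and renormalized, and a hypothetical objective $g(x)=1+e^{-(x-1)^2}$. The sketch only iterates the IPM itself; it says nothing about whether a finite-population EA actually follows these predictions, which is the convergence question studied in this paper.

```python
import numpy as np

def mutation_kernel(grid, sigma):
    """K[i, j] approximates f_w(grid[i] | grid[j]): a Gaussian centred at grid[j], truncated to the grid."""
    dx = grid[1] - grid[0]
    diff = grid[:, None] - grid[None, :]
    K = np.exp(-0.5 * (diff / sigma) ** 2)
    # Renormalise each column so that sum_i K[i, j] * dx == 1 (mutation stays inside F)
    K /= K.sum(axis=0, keepdims=True) * dx
    return K

def ipm_step(f, grid, g, K):
    """One application of transition equation (1), with integrals replaced by Riemann sums."""
    dx = grid[1] - grid[0]
    w = f * g(grid)                  # f_{x_k}(y) g(y) on the grid
    numerator = K @ w * dx           # integral of f_{x_k}(y) g(y) f_w(x|y) dy, for every grid point x
    denominator = w.sum() * dx       # integral of f_{x_k}(y) g(y) dy
    return numerator / denominator

def g(x):
    # Hypothetical bounded, positive objective to be maximised (peak at x = 1)
    return 1.0 + np.exp(-(x - 1.0) ** 2)

grid = np.linspace(-3.0, 3.0, 601)
dx = grid[1] - grid[0]
K = mutation_kernel(grid, sigma=0.3)
f = np.full_like(grid, 1.0 / (grid[-1] - grid[0]))   # uniform initial marginal p.d.f. on F

history = []
for k in range(5):
    f = ipm_step(f, grid, g, K)
    mass = f.sum() * dx
    mean = (grid * f).sum() * dx
    history.append((mass, mean))
    print(f"generation {k + 1}: mass={mass:.6f}, predicted mean={mean:.4f}")
```

Renormalizing the kernel columns is a design choice that keeps each predicted density integrating to one on the grid; an untruncated Gaussian would leak probability mass outside $F$.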

However, as will be shown in Section II, the convergence proof for (1) in [1] is problematic. We provide a counterexample to show that a key assertion in the authors’ proof about the law of large numbers (LLN) for exchangeable random vectors is not true in general. Therefore the whole proof is unsound. Furthermore, we show that the modeling assumption of exchangeability of individuals cannot yield the transition equation in general. This means that under the authors’ modeling assumption, the conclusion (1) cannot be reached.

In addition to the aforementioned problems, we also show that the authors’ proofs in both [1] and [2] are incomplete. The authors did not address the convergence of the stacking of operators and of recursively iterating the algorithm. In essence, the authors only attempted to prove the convergence of the IPM for one iteration step. Even if the proof for (1) were correct, it would only show that the marginal p.d.f. of the $(k+1)$th population produced by the real algorithm converges pointwise to $f_{x_{k+1}}$ calculated by (1), provided that the marginal p.d.f. of the $k$th generation is $f_{x_k}$ and assuming that the population size goes to infinity. However, this convergence does not automatically hold for all subsequent generations. In fact, it rarely holds, because $f_{x_{k+1}}$ is only accurate in the limit. Compared with the finite-sized populations produced by the real algorithm, it inevitably contains errors. As a result, (1) cannot be iterated to make predictions for the subsequent ($(k+2)$th, $(k+3)$th, …) generations.

Besides [1, 2], we found no other studies which attempted to prove the convergence of IPMs for EAs on continuous space. Therefore, to fill the research gap, in Section III we propose a general analytical framework. The novelty of our framework is that from the very start of the analysis, we model generations of the population as random elements taking values in the metric space of infinite sequences, and we use convergence in distribution instead of point-wise convergence to define the convergence of IPMs.

To understand the issues and appreciate our framework, consider an EA operating in $\mathbb{R}^m$ on a fixed continuous optimization problem with different population sizes. When the population size is $n$, denote the algorithm by $\mathrm{EA}^n$. The $k$th generation produced by $\mathrm{EA}^n$ can be described by the joint distribution of $n$ random vectors in $\mathbb{R}^m$, with each random vector representing an individual. Denote the random element modeling the $k$th generation by $P^n_k$. Similarly, the same EA with population size $n+1$ is denoted by $\mathrm{EA}^{n+1}$, and the $k$th generation it produces is modeled by $P^{n+1}_k$. Finally, denote the IPM for this EA by $\mathrm{EA}^\infty$, and the generations it produces by $P^\infty_k$. Notice that each $P^\infty_k$ is a random sequence. Essentially, the convergence of IPMs requires that $\mathrm{EA}^\infty$ predicts every generation produced by $\mathrm{EA}^n$ as $n\to\infty$. Mathematically, this corresponds to the requirement that for each generation $k$, the sequence $\{P^n_k\}_{n=1}^\infty$ converges to $P^\infty_k$ in some sense as $n\to\infty$.

However, it is not obvious how one can rigorously define the convergence of the sequence $\{P^n_k\}_{n=1}^\infty$. This is because $P^1_k,P^2_k,\dots$ and the limit $P^\infty_k$ are random elements taking values in different metric spaces. The range of $P^n_k$ is the Cartesian product of $n$ copies of $\mathbb{R}^m$, whereas the range of $P^\infty_k$ is the infinite product space $(\mathbb{R}^m)^\infty$. To overcome this issue, Qi et al. essentially defined the convergence of IPMs as $P^n_k\xrightarrow{\text{m.p.w.}}P^\infty_k$, where $\xrightarrow{\text{m.p.w.}}$ stands for pointwise convergence of marginal p.d.f.s. However, as mentioned, we believe their proofs are problematic and incomplete.

In this research, we take a different approach. We extend $\mathrm{EA}^n$, unify the ranges of the random elements in a common metric space, and give a mathematically rigorous definition of sequence convergence. We assume that for each generation $k$, $\mathrm{EA}^n$ first generates an intermediate infinite sequence of individuals $Q^n_{k+1}$ based on the previous generation $P^n_k$. Here $Q^n_{k+1}$ is a random sequence whose elements are conditionally independent and identically distributed (c.i.i.d.) given $P^n_k$. Then, $\mathrm{EA}^n$ preserves the first $n$ elements of $Q^n_{k+1}$ to form the new generation $P^n_{k+1}$. Basically, the modified $\mathrm{EA}^n$ progresses in the order $P^n_0\to Q^n_1\to P^n_1\to Q^n_2\to P^n_2\to\cdots$. For $\mathrm{EA}^\infty$, because $Q^\infty_{k+1}$ is already a random sequence, we simply let $P^\infty_{k+1}=Q^\infty_{k+1}$. Then, we define $\mathrm{EA}^\infty$ to be convergent if and only if for every $k$, $Q^n_k\xrightarrow{d}P^\infty_k$ as $n\to\infty$, where $\xrightarrow{d}$ represents convergence in distribution, or equivalently weak convergence. Our design has several advantages. Firstly, for every population size $n$, the sequence $P^n_0,P^n_1,P^n_2,\dots$ coincides exactly with the population dynamics produced by $\mathrm{EA}^n$ without the intermediate sequences $Q^n_k$. In other words, our model is faithful, and the intermediate step does not change the population dynamics. Secondly, the ranges of $Q^n_k$ and $P^\infty_k$ are unified in the same metric space. Therefore we can rigorously define the convergence of IPMs. Finally, in our proposed framework, the convergence of the stacking of operators and of iterating the algorithm can be proved. All these benefits come from the interplay between the finite-dimensional population dynamics $\{P^n_k\}$ and their infinite-dimensional extensions $\{Q^n_k\}$. The only modeling assumption in our framework is that new individuals are generated c.i.i.d. given the current generation. This is a reasonable assumption because exchangeability and c.i.i.d. are equivalent given the current population. We will present the framework and related topics in Section III.
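The "faithfulness" of the intermediate step can be illustrated with a short sketch. This is our own illustration under stated assumptions, not code from the paper: individuals are real numbers, the operator is proportionate selection followed by Gaussian mutation, the objective `g` is hypothetical, and since an infinite sequence cannot be materialized, the intermediate sequence is truncated to a finite `length`. Because each offspring is drawn one at a time, c.i.i.d. given the current generation, keeping the first `n` elements of the long intermediate sequence gives exactly what a population-size-`n` EA would draw from the same random stream.

```python
import math
import random

def g(x):
    # Hypothetical positive objective to be maximised
    return 1.0 + math.exp(-x * x)

def intermediate_sequence(pop, length, rng, sigma=0.1):
    """Draw `length` offspring, each c.i.i.d. given `pop`:
    proportionate selection of a parent, then Gaussian mutation."""
    weights = [g(x) for x in pop]
    offspring = []
    for _ in range(length):
        parent = rng.choices(pop, weights=weights, k=1)[0]
        offspring.append(parent + rng.gauss(0.0, sigma))
    return offspring

def next_generation(pop, n, rng, length=1000):
    """Framework-style step: build a long (truncated) intermediate sequence,
    then keep only its first n elements as the new generation."""
    assert length >= n
    return intermediate_sequence(pop, length, rng)[:n]

pop = [-1.0, 0.0, 0.5, 2.0]
via_framework = next_generation(pop, n=4, rng=random.Random(7))
direct = intermediate_sequence(pop, 4, rng=random.Random(7))
print(via_framework == direct)  # prints True: truncation does not change the generation
```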

To illustrate the effectiveness of our framework, we perform a convergence analysis of the IPM of the simple EA. As our analyses show, the modeling assumption of exchangeability cannot yield the transition equation. Therefore, to obtain meaningful results, we adopt a “stronger” modeling assumption that individuals of the same generation in the IPM are independent and identically distributed (i.i.d.). This assumption seems restrictive at first sight, but it turns out to be a reasonable one. We analyze the mutation operator and the -ary recombination operator. We show that these commonly used operators have the property of producing i.i.d. populations, in the sense that if the initial population is i.i.d., then as the population size goes to infinity, all subsequent generations are also i.i.d. in the limit. This means that for these operators, the transition equation in the IPM can predict the real population dynamics as the population size goes to infinity. We also show that our results hold even if these operators are stacked together and iterated repeatedly by the algorithm. These results are presented in Section IV. Finally, in Section V we conclude the paper and propose future research.

For completeness, regarding [1, 2], there is a comment from Yong [3] with a reply. However, the comment was mainly about the latter part of [1], where the authors analyzed the properties of EAs based on the IPM; it did not discuss the proof of the model itself. For IPMs of EAs on discrete optimization problems, extensive research was done by Vose et al. in a series of studies [4, 5, 6, 7]. The problems under consideration were discrete optimization problems with a finite solution space. The starting point of the authors’ analysis was to model each generation of the population as an “incidence vector”, which describes, for each point in the solution space, the proportion of the population it occupies. Based on this representation, the authors derived transition equations between incidence vectors of consecutive generations and analyzed their properties as the population size goes to infinity. However, for EAs on a continuous solution space, the analyses of Vose et al. are not immediately applicable. This is because for continuous optimization problems the solution space is not denumerable. Therefore, the population cannot be described by a finite-dimensional incidence vector.

## II Discussion of the Works of Qi et al.

In this section we analyze the results of Qi et al. in [1, 2]. We begin by introducing some preliminaries for the analysis. Then, in Section II-B, following the notations and derivations in the authors’ papers, we provide a counterexample to show that the convergence proof for the transition equation in [1] is problematic. We further show that the modeling assumption of exchangeability cannot yield the transition equation in general. In Section II-C, we show that the analyses in both [1] and [2] are incomplete: the authors did not prove the convergence of IPMs in the cases where operators are stacked together and the algorithm is iterated for multiple generations.

### II-A Preliminaries

In the authors’ paper [1], the problem to be optimized is

$$\arg\max_{x}\,g(x)\quad\text{s.t.}\quad x\in F\subseteq\mathbb{R}^m,\qquad(2)$$

where $F$ is the solution space and $g$ is some given objective function. The analysis is intended to be general; therefore no explicit form of $g$ is assumed. The algorithm to be analyzed is the simple EA with proportionate selection and mutation. Let $X_k=\{x^i_k\}_{i=1}^N$ denote the $k$th generation produced by the EA, where $N$ is the population size. To generate the $(k+1)$th population, an intermediate population $X'_k=\{x'^i_k\}_{i=1}^N$ is first generated from $X_k$ by the proportionate selection operator. The elements of $X'_k$ are c.i.i.d. given $X_k$. The distribution of each $x'^i_k$ follows the conditional probability

$$P\left(x'^i_k=x^j_k\,\middle|\,X_k\right)=\frac{g(x^j_k)}{\sum_{l=1}^N g(x^l_k)},\quad\text{for all } i,j=1,2,\dots,N.\qquad(3)$$

After selection, each individual in $X'_k$ is mutated to generate the individuals in $X_{k+1}$. The mutation follows the conditional p.d.f.

$$f\left(x^i_{k+1}=x\,\middle|\,x'^i_k=y\right)=f_w(x\mid y).\qquad(4)$$

Overall the algorithm is illustrated in Fig. 1.
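As a reference point for (3) and (4), one generation of this simple EA can be sketched as follows. The sketch assumes one-dimensional individuals, a Gaussian mutation with an illustrative standard deviation `sigma` standing in for $f_w(x\mid y)$, and a hypothetical objective `g`; none of these specifics are taken from [1]. It also checks empirically that the selection frequencies match (3).

```python
import numpy as np

def g(x):
    # Hypothetical positive objective to be maximised (not from [1])
    return 1.0 + np.exp(-x ** 2)

def proportionate_selection(pop, rng, size):
    """Indices drawn c.i.i.d. given pop, with P(pick j) = g(x^j) / sum_l g(x^l) as in (3)."""
    fitness = g(pop)
    return rng.choice(len(pop), size=size, p=fitness / fitness.sum())

def simple_ea_generation(pop, rng, sigma=0.1):
    """X_k -> X'_k (selection, (3)) -> X_{k+1} (Gaussian mutation standing in for f_w, (4))."""
    parents = pop[proportionate_selection(pop, rng, size=len(pop))]
    return parents + sigma * rng.standard_normal(len(pop))

rng = np.random.default_rng(0)
pop = rng.uniform(-3.0, 3.0, size=20)        # X_0 with N = 20
next_pop = simple_ea_generation(pop, rng)    # X_1

# Empirical check of (3): selection frequencies approach g(x^j) / sum_l g(x^l)
idx = proportionate_selection(pop, rng, size=200_000)
freq = np.bincount(idx, minlength=len(pop)) / idx.size
probs = g(pop) / g(pop).sum()
max_dev = np.max(np.abs(freq - probs))
print(f"largest deviation from (3): {max_dev:.4f}")
```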

After presenting the optimization problem and the algorithm, the authors proved the convergence of the IPM. This is the main result in [1], restated as follows.

###### Theorem 1 (Theorem 1 in Qi et al. [1]).

Assume that the fitness function $g(x)$ in (2) and the mutation operator of the simple EA described by (4) satisfy the following conditions:

1. $0<g_{\min}\le g(x)\le g_{\max}<\infty$ for all $x\in F$.

2. .

Then as $N\to\infty$, the time history of the simple EA can be described by a sequence of random vectors $\{x_k\}$ with densities

$$f_{x_{k+1}}(x)=\frac{\int_F f_{x_k}(y)\,g(y)\,f_w(x\mid y)\,dy}{\int_F f_{x_k}(y)\,g(y)\,dy}.\qquad(5)$$

In Theorem 1, $f_{x_k}$ is the marginal p.d.f. of the $k$th generation predicted by the IPM.

As the proof of Theorem 1 in [1] and the analyses in this paper use the concept of exchangeability in probability theory, we list its definition and some basic facts.

###### Definition 1 (Exchangeable random variables, Definition 1.1.1 in [8]).

A finite set of random variables $\{x_i\}_{i=1}^n$ is said to be exchangeable if the joint distribution of $(x_1,x_2,\dots,x_n)$ is invariant with respect to permutations of the indices $1,2,\dots,n$. A collection of random variables $\{x_\alpha:\alpha\in\Gamma\}$ is said to be exchangeable if every finite subset of $\{x_\alpha:\alpha\in\Gamma\}$ is exchangeable.

Definition 1 can also be extended to cover exchangeable random vectors or exchangeable random elements by replacing the term “random variables” in the definition with the respective term. One property of exchangeability is that if $\{x_i\}_{i=1}^n$ are exchangeable random elements, then the joint distributions of any $k\le n$ distinct ones of them are always the same (Proposition 1.1.1 in [8]). When $k=1$, this property indicates that $x_1,\dots,x_n$ have the same marginal distribution. Another property is that an infinite sequence of random elements is exchangeable if and only if its elements are c.i.i.d. given some $\sigma$-field (Theorem 1.2.2 in [8]). Moreover, any collection of c.i.i.d. random elements is exchangeable. Finally, an obvious fact is that i.i.d. random elements are exchangeable, but the converse is not necessarily true.

It can be seen that the simple EA generates c.i.i.d. individuals given the current population. Therefore, the individuals within the same generation are exchangeable, and they have the same marginal distribution. This leads to the transition equation of marginal p.d.f.s in Theorem 1. To analyze its proof and construct our framework, we will also use the definition and properties of exchangeability.

### II-B Convergence Proof of the Transition Equation

In this section we analyze the proof of Theorem 1 and show that it is incorrect. The proof by Qi et al. is in Appendix A of [1]. In the proof, the authors assumed that individuals in the same generation are exchangeable, and therefore have the same marginal distribution. After a series of derivation steps, the authors obtained a transition equation between the density functions of $X_k$ and $x^i_{k+1}$:

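$$\begin{aligned}f_{x^i_{k+1}}(x)&=\int\!\!\cdots\!\!\int_{F^N}\frac{g(y_j)\,f_w(x\mid y_j)}{\frac{1}{N}\sum_{l=1}^N g(y_l)}\,f_{X_k}(y_1,y_2,\dots,y_N)\,dy_1\,dy_2\cdots dy_N\quad\text{for any fixed } i,j\qquad(6)\\&=E\left[\frac{\xi_k(x)}{\eta^N_k}\right],\qquad(7)\end{aligned}$$

where in (7),

$$\begin{aligned}\xi_k(x)&\triangleq g(x^j_k)\,f_w(x\mid x^j_k)\quad\text{for any fixed } j,\qquad(8)\\ \eta^N_k&\triangleq\frac{1}{N}\sum_{l=1}^N g(x^l_k).\qquad(9)\end{aligned}$$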
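The exactness of (6) and (7) for every finite $N$ can be seen in two steps; the following is our reconstruction from (3), (4) and exchangeability, not a quotation of [1]. Conditioning on $X_k$, the new individual $x^i_{k+1}$ is a mixture of mutation kernels weighted by the selection probabilities in (3):

$$f_{x^i_{k+1}}(x\mid X_k)=\sum_{j=1}^N\frac{g(x^j_k)}{\sum_{l=1}^N g(x^l_k)}\,f_w(x\mid x^j_k)=\frac{1}{N}\sum_{j=1}^N\frac{g(x^j_k)\,f_w(x\mid x^j_k)}{\eta^N_k}.$$

Taking the expectation over $X_k$, exchangeability implies that all $N$ summands have the same distribution and hence the same expectation, so

$$f_{x^i_{k+1}}(x)=E\left[\frac{1}{N}\sum_{j=1}^N\frac{g(x^j_k)\,f_w(x\mid x^j_k)}{\eta^N_k}\right]=E\left[\frac{g(x^j_k)\,f_w(x\mid x^j_k)}{\eta^N_k}\right]=E\left[\frac{\xi_k(x)}{\eta^N_k}\right],$$

which is (7); writing this expectation as an integral against $f_{X_k}$ gives (6).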

(6) and (7) are exact. They accurately describe how the marginal p.d.f. of any individual in the next generation can be calculated from the joint p.d.f. of the individuals in the current generation. Noticing that $\eta^N_k$ is the average of the exchangeable random variables $\{g(x^l_k)\}_{l=1}^N$, the authors asserted, by the LLN for exchangeable random variables, that

$$\lim_{N\to\infty}\eta^N_k=\eta_k\quad\text{a.s.}\qquad(10)$$

The authors further asserted that $\eta_k$ is itself a random variable, satisfying

$$E[\eta_k]=E[g(x^j_k)]\quad\text{for any } j.\qquad(11)$$

(10) and (11) correspond to (A13) and (A14) in Appendix A of [1], respectively. The authors’ proof is correct up to this step. However, the authors then asserted that

$\xi_k(x)$ is independent of $\eta^N_k$ for any finite $N$. In particular, $\xi_k(x)$ is independent of $\eta_k$ for all $x$. (12)

Based on this assertion, the authors then proved that for all $x$ and $k$,

$$\lim_{N\to\infty}\left|E\left[\frac{\xi_k(x)}{\eta^N_k}\right]-\frac{E[\xi_k(x)]}{E[\eta_k]}\right|=0.\qquad(13)$$

Therefore, the p.d.f. $f_{x^i_{k+1}}(x)$ in (7) converges pointwise to $E[\xi_k(x)]/E[\eta_k]$. Noticing that this expression is equal to the right-hand side of (5), the authors claimed that Theorem 1 is proved.

In the following, we provide a counterexample to show that assertion (12) is not true. Then, we carry out further analysis to show that under the modeling assumption of exchangeability, the conclusion in (13), or equivalently Theorem 1, cannot be true in general.

#### II-B1 On Assertion (12)

We first reformulate the assertion. Since $\{x^l_k\}_{l=1}^N$ are exchangeable, $\{g(x^l_k)\}_{l=1}^N$ are exchangeable (Property 1.1.2 in [8]). Let $y_l=g(x^l_k)$. Then the premises of Theorem 1 are equivalent to

$$\{y_l\}_{l=1}^N\ \text{are exchangeable and}\ g_{\min}\le y_l\le g_{\max}.\qquad(14)$$

Let $y=\eta_k$. According to (9), (10) and (11), $y$ has the properties that

$$\lim_{N\to\infty}\frac{1}{N}\sum_{l=1}^N y_l=y\quad\text{a.s.},\qquad(15)$$

$$E(y)=E(y_l)\quad\text{for any } l.\qquad(16)$$

Since $g$ is a general function, there are no other restrictions on $\{y_l\}$ and $y$. Therefore, (12) is equivalent to the following assertion:

For any $\{y_l\}_{l=1}^N$ and $y$ satisfying (14), (15) and (16), $y_j$ and $\frac{1}{N}\sum_{l=1}^N y_l$ are independent for any finite $N$. In particular, $y_j$ is independent of $y$ for any $j$. (17)

However, we use the following counterexample (modified from Example 1.1.1 and related discussions on pages 11-12 in [8]) to show that assertion (17) is not true. Therefore (12) is not true either.

#### II-B2 Counterexample

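Let $\{z_l\}_{l=1}^\infty$ be a sequence of i.i.d. random variables satisfying

$$-\frac{g_{\max}-g_{\min}}{4}\le z_l\le\frac{g_{\max}-g_{\min}}{4}\quad\text{and}\quad E(z_l)=0$$

for all $l$. Let $y$ be a random variable independent of $\{z_l\}_{l=1}^\infty$ satisfying

$$\frac{g_{\max}+3g_{\min}}{4}\le y\le\frac{3g_{\max}+g_{\min}}{4}$$

and

$$E(y)=\frac{g_{\max}+g_{\min}}{2}.$$

Finally, let $y_l=y+z_l$ for all $l$.

It can easily be verified that $\{y_l\}$ and $y$ satisfy (14) and (16): the $y_l$ are c.i.i.d. given $y$, hence exchangeable, and the bounds above give $g_{\min}\le y_l\le g_{\max}$. Since $z_l$ is bounded, $E|z_l|<\infty$ for any $l$. By the strong law of large numbers (SLLN) for i.i.d. random variables,

$$\frac{1}{N}\sum_{l=1}^N z_l\to 0\quad\text{a.s. as } N\to\infty.$$

Therefore (15) is also satisfied, i.e. $y$ is the limit of $\frac{1}{N}\sum_{l=1}^N y_l$ as $N\to\infty$. However, because $y_l=y+z_l$ and $y$ is independent of $\{z_l\}$, it can be seen that $y_j$ is not independent of $\frac{1}{N}\sum_{l=1}^N y_l$ except in some degenerate cases (for example, when $y$ equals a constant). In particular, in general $y_j$ is not independent of $y$ for any $j$. Therefore, assertion (17) is not true. Equivalently, assertion (12) is not true.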
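The dependence in the counterexample is easy to check numerically. The sketch below instantiates it with the illustrative values $g_{\min}=1$ and $g_{\max}=3$, taking $y$ uniform on $[1.5,2.5]$ and each $z_l$ uniform on $[-0.5,0.5]$; these distributions are our choice, and any others meeting the stated bounds and means would do. For this construction the covariance between $y_1$ and the sample mean equals $V(y)+V(z_l)/N$, and a nonzero covariance already rules out independence.

```python
import numpy as np

rng = np.random.default_rng(42)
trials, N = 50_000, 50             # illustrative choice: g_min = 1, g_max = 3

y = rng.uniform(1.5, 2.5, size=trials)          # shared term: in [1.5, 2.5], E(y) = 2
z = rng.uniform(-0.5, 0.5, size=(trials, N))    # i.i.d. terms: in [-0.5, 0.5], E(z_l) = 0
y_l = y[:, None] + z                            # y_l = y + z_l, always in [1, 3]

sample_mean = y_l.mean(axis=1)
cov_with_mean = np.cov(y_l[:, 0], sample_mean)[0, 1]
cov_with_y = np.cov(y_l[:, 0], y)[0, 1]
theory = 1 / 12 + (1 / 12) / N                  # V(y) + V(z_l) / N

print(f"Cov(y_1, mean of y_l) ~ {cov_with_mean:.4f}  (theory {theory:.4f})")
print(f"Cov(y_1, y)           ~ {cov_with_y:.4f}  (theory {1 / 12:.4f})")
```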

#### II-B3 Further Analysis

In [1], the authors intended to prove Theorem 1, or equivalently (13). As shown by the counterexample, assertion (12) is not true. This renders the authors’ proof of (13) invalid.

In the following, we carry out further analysis to show that (13) cannot be true even if other methods of proof are considered or new sufficient conditions are added. Therefore, in general, Theorem 1 cannot be true.

To begin with, consider the random variable $\xi_k(x)/\eta^N_k$. We prove the following lemma.

###### Lemma 1.

$E\left[\xi_k(x)/\eta^N_k\right]\to E\left[\xi_k(x)/\eta_k\right]$ as $N\to\infty$.

###### Proof.

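According to (10), $\eta^N_k\xrightarrow{\text{a.s.}}\eta_k$. Since $\eta^N_k\ge g_{\min}>0$ for every $N$, $\eta_k\ge g_{\min}>0$ almost surely.

Since $h(t)=1/t$ is continuous on $(0,\infty)$, we have

$$h(\eta^N_k)=\frac{1}{\eta^N_k}\xrightarrow{\text{a.s.}}h(\eta_k)=\frac{1}{\eta_k}\quad\text{(Proposition 47.2 in [prob])}.$$

Then we have

$$\frac{\xi_k(x)}{\eta^N_k}\xrightarrow{\text{a.s.}}\frac{\xi_k(x)}{\eta_k}\quad\text{(Proposition 47.4 (ii) in [prob])}.$$

Finally, by the conditions in Theorem 1, $\xi_k(x)/\eta^N_k$ is bounded uniformly in $N$. By Lebesgue’s Dominated Convergence Theorem (Proposition 11.30 in [9]), we have $E[\xi_k(x)/\eta^N_k]\to E[\xi_k(x)/\eta_k]$ as $N\to\infty$. ∎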

Now by Lemma 1, (13) is equivalent to

$$E\left[\frac{\xi_k(x)}{\eta_k}\right]=\frac{E[\xi_k(x)]}{E[\eta_k]}.$$

It is now clear that if the only assumption is exchangeability, this equality does not hold in general, whatever method of proof is used. The authors’ argument for it relies on (12), i.e. on $\xi_k(x)$ and $\eta_k$ being independent. However, as already shown by the counterexample, (12) is not true in general. Therefore, this equality, and equivalently Theorem 1, is in general not true.

A natural question then arises: is it possible to introduce some reasonable sufficient conditions under which the equality $E[\xi_k(x)/\eta_k]=E[\xi_k(x)]/E[\eta_k]$ can be proved? One such frequently used condition is that $\eta^N_k\xrightarrow{\text{a.s.}}E[g(x^j_k)]$, i.e. $\eta^N_k$ converges to its expectation, a constant which equals $E[g(x^j_k)]$ for any $j$. However, the following analysis shows that under the modeling assumption of exchangeability, this condition does not hold in general. Therefore it cannot be introduced.

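For exchangeable random variables $\{g(x^l_k)\}_{l=1}^N$, we have

$$\begin{aligned}V(\eta_k)&=\lim_{N\to\infty}V(\eta^N_k)\qquad(18)\\&=\lim_{N\to\infty}\left\{E\left[\left(\frac{1}{N}\sum_{l=1}^N g(x^l_k)\right)^2\right]-\left[E\left(\frac{1}{N}\sum_{l=1}^N g(x^l_k)\right)\right]^2\right\}\\&=\lim_{N\to\infty}\frac{1}{N^2}\left\{\sum_{l=1}^N V[g(x^l_k)]+\sum_{l\neq m}C[g(x^l_k),g(x^m_k)]\right\}\\&=\lim_{N\to\infty}\frac{1}{N^2}\left\{N\,V[g(x^1_k)]+N(N-1)\,C[g(x^1_k),g(x^2_k)]\right\}\qquad(19)\\&=C[g(x^1_k),g(x^2_k)],\qquad(20)\end{aligned}$$

where $V[\cdot]$ denotes the variance and $C[\cdot,\cdot]$ the covariance of its arguments. (18) follows from the boundedness of $g$ and Lebesgue’s Dominated Convergence Theorem, (19) from the exchangeability of $\{g(x^l_k)\}_{l=1}^N$, and (20) from the boundedness of $g$ by letting $N$ go to infinity. It is now clear that if the only modeling assumption is exchangeability, there is no guarantee that $C[g(x^1_k),g(x^2_k)]=0$. Therefore, in general $\eta^N_k$ does not converge to a constant, and this condition cannot be introduced as a sufficient condition for proving the equality $E[\xi_k(x)/\eta_k]=E[\xi_k(x)]/E[\eta_k]$.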
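A quick numerical check of (18) to (20) follows. It assumes an instance of the counterexample in Section II-B2 with $y$ uniform on $[1.5,2.5]$ and $z_l$ uniform on $[-0.5,0.5]$ (our choice of distributions), so that $g(x^l_k)$ plays the role of $y+z_l$. In the exchangeable case the variance of the sample mean settles at the pairwise covariance $V(y)=1/12$, as (20) predicts; in an i.i.d. comparison case with the same marginal variance, the covariance is zero and the variance decays like $1/N$, which is the only situation in which $\eta^N_k$ converges to a constant.

```python
import numpy as np

rng = np.random.default_rng(1)
TRIALS = 4_000
V_Y, V_Z = 1 / 12, 1 / 12          # variances of U[1.5, 2.5] and U[-0.5, 0.5]

def sample_var_of_mean(N, exchangeable):
    """Monte Carlo estimate of V(eta^N), the variance of (1/N) * sum_l y_l."""
    z = rng.uniform(-0.5, 0.5, size=(TRIALS, N))
    cols = 1 if exchangeable else N  # one shared y per population vs. one y per individual
    y = rng.uniform(1.5, 2.5, size=(TRIALS, cols))
    return (y + z).mean(axis=1).var()

results = {}
for N in (10, 100, 1000):
    V = V_Y + V_Z                    # V[g(x^l)] in both cases
    C = V_Y                          # C[g(x^1), g(x^2)] in the exchangeable case (0 if i.i.d.)
    rhs_19 = (N * V + N * (N - 1) * C) / N**2
    results[N] = (sample_var_of_mean(N, True), rhs_19, sample_var_of_mean(N, False))
    print(f"N={N:5d}  exchangeable: {results[N][0]:.5f} (eq. 19: {rhs_19:.5f})"
          f"  i.i.d.: {results[N][2]:.6f} (V/N: {V / N:.6f})")
```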

#### II-B4 Summary

As the analyses in this section show, the transition equation (5) does not hold under the modeling assumption of exchangeability. However, it does not preclude the possibility of enhancing the modeling assumption so that it can yield analytical results similar to the transition equation (5). We deal with this issue by adopting the “stronger” i.i.d. assumption when building IPMs. However, before presenting our framework and analyses, we show why the proofs in both[1] and[2] are incomplete.

### II-C The Issue of the Stacking of Operators and Iterating the Algorithm

In the following, we discuss IPMs from another perspective. Consider an EA with only one operator, denoted by $H$. When the population size is $n$, denote this EA by $\mathrm{EA}^n$ and the operator it actually uses by $H^n$. Let $P^n_k$ denote the $k$th generation produced by $\mathrm{EA}^n$. Then the transition rule between consecutive generations produced by $\mathrm{EA}^n$ can be described by $P^n_{k+1}=H^n(P^n_k)$. In Table I, we write down the population dynamics of $\mathrm{EA}^1,\mathrm{EA}^2,\dots$. Each row in Table I shows the population dynamics produced by one $\mathrm{EA}^n$. In the table, $P^n_k$ is expanded as $[H^n]^k(P^n_0)$. Let $\mathrm{EA}^\infty$ denote the IPM, with operator $H^\infty$, and let $P^\infty_k$ denote the populations predicted by $\mathrm{EA}^\infty$. Then we can summarize the results in [1] in the following way.

Let $H$ represent the combined operator of proportionate selection and mutation. Although the authors originally developed the transition equation from the $k$th to the $(k+1)$th generation, without loss of generality we can consider only the populations from the initial generation onward. Assume that the initial population comes from a known sequence of individuals, represented by $P_0$. For $\mathrm{EA}^n$, its initial population $P^n_0$ consists of the first $n$ elements of $P_0$. Let $P^\infty_0=P_0$. This setting represents the fact that all the $\mathrm{EA}^n$ use the same initial population, and $\mathrm{EA}^\infty$ knows this exact initial population. The aim of $\mathrm{EA}^\infty$ is to predict the subsequent populations. Since $P^n_0$ and $P^\infty_0$ are all taken from $P_0$, if we redefine $H^n$ to be an operator on $P_0$ that only takes the first $n$ elements to produce the next generation, then the authors essentially proved that

$$H^n(P_0)\xrightarrow{\text{m.p.w.}}H^\infty(P_0)\quad\text{as } n\to\infty,\qquad(21)$$

where m.p.w. stands for point-wise convergence of marginal p.d.f.s.

However, apart from the fact that this proof is problematic, the authors’ proof covers only one iteration step, corresponding to the column-wise convergence of the column for the first generation ($k=1$) in Table I. The problem is that even if (21) is true, it does not automatically lead to the conclusion that for an arbitrary $k$th step, $P^n_k\xrightarrow{\text{m.p.w.}}P^\infty_k$ as $n\to\infty$. In other words, one has to study whether the transition equation for one step can be iterated recursively to predict the populations after multiple steps. In Table I, this problem corresponds to whether the other columns have a similar column-wise convergence property once the convergence of the $k=1$ column is proved.

To give an example, consider the column for the second generation ($k=2$) in Table I. To prove column-wise convergence, the authors would need to prove that, given (21),

$$H^n(P^n_1)\xrightarrow{\text{m.p.w.}}H^\infty(P^\infty_1),\quad\text{or equivalently}\qquad(22)$$

$$[H^n]^2(P_0)\xrightarrow{\text{m.p.w.}}[H^\infty]^2(P_0)\qquad(23)$$

as $n\to\infty$. Comparing (21) with (22) and (23): (22) has the same sequence of operators but a sequence of converging inputs, while (23) has the same input but a sequence of different operators. Therefore, neither is necessarily true even if (21) is proved. In fact, different techniques may have to be adopted to prove (21) and (22), or equivalently (21) and (23). A similar problem exists for an arbitrary $k$th generation. We call this problem the issue of iterating the algorithm. As Qi et al. ignored this issue in both [1] and [2], we believe their proofs are incomplete.
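A deterministic toy example, unrelated to any EA and chosen only to illustrate the logic, shows why convergence for a fixed input, as in (21), does not imply convergence along converging inputs, as in (22). The real-valued "operators" here are hypothetical: $H^n(x)=\max(0,\,1-n\,|x-1/n|)$, a tent of height 1 centred at $1/n$, and $H^\infty(x)=0$. For every fixed $x$, $H^n(x)\to H^\infty(x)$; yet along the converging inputs $x_n=1/n\to 0$ we have $H^n(x_n)=1$ for all $n$, while $H^\infty(0)=0$.

```python
def H(n, x):
    """Hypothetical toy operator: a tent of height 1 centred at 1/n with half-width 1/n."""
    return max(0.0, 1.0 - n * abs(x - 1.0 / n))

def H_inf(x):
    """Pointwise limit of H(n, .) as n -> infinity."""
    return 0.0

# Analogue of (21): fixed input, sequence of operators -> converges to H_inf(x)
for x in (-0.5, 0.0, 0.05, 0.3):
    print(x, [round(H(n, x), 12) for n in (10, 100, 1000)], H_inf(x))

# Analogue of (22): converging inputs x_n = 1/n -> 0, but H^n(x_n) stays at 1
print([H(n, 1.0 / n) for n in (10, 100, 1000)], H_inf(0.0))
```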

The issue of the stacking of operators is similar. Given some operator $H$ satisfying (21) and some operator $G$ satisfying

$$G^n(P_0)\xrightarrow{\text{m.p.w.}}G^\infty(P_0)$$

as $n\to\infty$, it is not necessarily true that

$$H^n(G^n(P_0))\xrightarrow{\text{m.p.w.}}H^\infty(G^\infty(P_0))$$

as $n\to\infty$. However, the authors in [2] ignored this issue entirely and combined the transition equations for selection, mutation and crossover (in Section III of [2]) without any justification.

In addition, several statements in the authors’ proofs in [2] are questionable. First, in the first paragraph of Appendix A (the proof of Theorem 1 in that paper), the authors considered a pair of parents for the uniform crossover operator, which are “drawn from the population independently with the same density of” $f_{x_k}$. The authors then claimed that the joint density of the two parents is therefore the product of their marginal densities. This is simply not true. Two individuals drawn independently from the same population are conditionally independent given that population, but they are not necessarily independent, unless the modeling assumption is that all individuals in the same population are independent. In fact, without the i.i.d. assumption, it is very likely that individuals in the same population are dependent. Therefore, the joint density function of the two parents is not necessarily the product of their marginal densities, and the authors’ proof of Theorem 1 in [2] is dubious at best. On the other hand, even if the authors’ modeling assumption for the uniform crossover operator is independence of individuals, this assumption is incompatible with the modeling assumption of exchangeability in [1] for the operators of selection and mutation. Therefore, combining the transition equations for all three operators is problematic, because the assumption of independence cannot hold beyond one iteration step.
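The point about the two parents can be checked by simulation. The sketch assumes, for illustration only, a population of size $N$ that is exchangeable but not independent: each individual is $y+z_l$ with a shared random $y$ uniform on $[1.5,2.5]$ and i.i.d. $z_l$ uniform on $[-0.5,0.5]$ (our choices). Two parents are drawn independently and uniformly from the same population. Given the population they are independent, yet their unconditional covariance equals the variance of the population mean, $V(y)+V(z_l)/N$, rather than the zero that a product density would imply.

```python
import numpy as np

rng = np.random.default_rng(3)
trials, N = 100_000, 20

y = rng.uniform(1.5, 2.5, size=(trials, 1))           # shared component of each population
pop = y + rng.uniform(-0.5, 0.5, size=(trials, N))    # exchangeable, not independent
rows = np.arange(trials)
i = rng.integers(0, N, size=trials)                   # two parents drawn independently
j = rng.integers(0, N, size=trials)                   # and uniformly from the same population
a, b = pop[rows, i], pop[rows, j]

cov_ab = np.cov(a, b)[0, 1]
expected = 1 / 12 + 1 / (12 * N)                      # V(population mean) = V(y) + V(z_l) / N
print(f"Cov(parent 1, parent 2) ~ {cov_ab:.4f} (expected {expected:.4f}; a product density gives 0)")
```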

Another issue in [2] is that the uniform crossover operator produces two dependent offspring at the same time. As a result, after uniform crossover, the intermediate population is not even exchangeable, because it has pair-wise dependency between individuals. The same problem then arises, namely that the transition equation for the uniform crossover operator cannot be combined with the transition equations for selection and mutation. This is because the uniform crossover operator produces intermediate populations without exchangeability, but this property is required for modeling selection and mutation. Besides, the transition equation for the uniform crossover operator cannot be iterated beyond one step. This is because, regardless of whether independence or exchangeability is its modeling assumption, this assumption will surely be violated beyond one iteration step.

In summary, several issues arise from previous studies on IPMs for EAs on continuous optimization problems. Therefore, new frameworks and proof methods are needed for analyzing the convergence of IPMs and studying the issue of the stacking of operators and iterating the algorithm.

## III Proposed Framework

In this section, we present our proposed analytical framework. In constructing the framework we strive to achieve the following three goals.

1. The framework should be general enough to cover real-world operators and to characterize the evolutionary process of real EAs.

2. The framework should be able to define the convergence of IPMs and serve as a justification for using them. The definition should match intuition and, at the same time, be mathematically rigorous.

3. The framework should provide an infrastructure to study the issue of the stacking of operators and iterating the algorithm.

The contents of this section roughly reflect the pursuit of the first two goals. The third goal is reflected in the analyses of the simple EA in Section IV. More specifically, in Section III-A we introduce notations and preliminaries for the remainder of this paper. In Section III-B we present our framework, in which each generation is modeled by a random sequence. This approach unifies the spaces of the random elements modeling populations of different sizes. In Section III-C we define the convergence of the IPM as convergence in distribution on the space of random sequences. We summarize and discuss our framework in Section III-D.

To appreciate the significance of our framework, it is worth reviewing the methodology used in [1, 2] to study the convergence of IPMs. Implicitly, the authors of [1, 2] used point-wise convergence of marginal p.d.f.s as the criterion for the convergence of IPMs. Apart from the proofs being problematic and incomplete, this definition does not consider the joint distribution of the individuals in the population. Thus, it loses information and cannot characterize the dynamics of the whole population. Besides, point-wise convergence of p.d.f.s depends on the existence and the explicit forms of the p.d.f.s, which limits the generality of the methodology. In addition, compared with convergence in distribution, which is used in this paper, point-wise convergence is an unnecessarily strict criterion. In essence, the criterion should characterize the similarity between the distributions of random elements. In this regard, convergence in distribution matches intuition and suffices for the task, whereas a stronger criterion such as point-wise convergence inevitably makes the analysis more difficult. Finally, in this paper we separate the framework (the definition of the convergence of IPMs) from the analyses of individual operators, which keeps the organization logical and general.
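A standard textbook example, added here for illustration rather than taken from [1, 2], shows how much weaker convergence in distribution is than point-wise convergence of p.d.f.s. The densities $f_n$ below converge in distribution to the uniform law on $[0,1]$, because their c.d.f.s $F_n$ converge uniformly to $F(x) = x$, yet $f_n(x)$ fails to converge at, for example, $x = 1/4$:

$$
\begin{aligned}
f_n(x) &= 1 + \sin(2\pi n x), \qquad x \in [0,1],\\
F_n(x) &= \int_0^x f_n(t)\,dt = x + \frac{1 - \cos(2\pi n x)}{2\pi n},
\qquad \sup_{x \in [0,1]} \lvert F_n(x) - x \rvert \le \frac{1}{\pi n} \to 0,\\
f_n(1/4) &= 1 + \sin(n\pi/2), \quad \text{which cycles through } 2, 1, 0, 1, \dots
\end{aligned}
$$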

### III-A Notations and Preliminaries

In the remainder of this paper we focus on the unconstrained continuous optimization problem

$$\arg\max_{x}\; g(x) \quad \text{s.t.} \quad x \in \mathbb{R}^d, \tag{24}$$

where $g$ is a given objective function. Our framework is general enough that it requires no further conditions on the objective function $g$. However, to prove the convergence of IPMs for mutation and recombination, conditions such as those in Theorem 1 are sometimes needed; we introduce them where they are required.

From now on we use to denote the set of nonnegative integers and the set of positive integers. For any two real numbers and , let be the smaller one of them and be the larger one of them. Let be random elements of some measurable space . We use to represent the law of . If and follow the same law, i.e. for every , we write . Note that and have different meanings. In particular, indicates dependency between and .

We use the notation to represent the array . When , represents the infinite sequence . We use and to represent the collections and , respectively. When the range is clear, we use and or and for short.

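For convenience, we use a dedicated symbol for the solution space $\mathbb{R}^d$; this simplifies our notation when we discuss the spaces of $n$-tuples and of infinite sequences of solutions. In the following, we define metrics and σ-fields on these three spaces and state the properties of the corresponding measurable spaces.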

The solution space $\mathbb{R}^d$ is equipped with the ordinary metric $\rho$. Together with its Borel σ-field, generated by the open sets under $\rho$, it forms a measurable space.

Similarly, the space of $n$-tuples of solutions is equipped with a metric, and together with the corresponding Borel σ-field under that metric it forms the measurable space for $n$-tuples.

Next, consider the space of infinite sequences of solutions. It is equipped with the metric

$$\rho_\infty(x, y) = \sum_{i=1}^{\infty} \frac{1}{2^i} \cdot \frac{\rho(x_i, y_i)}{1 + \rho(x_i, y_i)}.$$

Together with the Borel σ-field generated by the open sets under $\rho_\infty$, this space forms the measurable space for infinite sequences.
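To make the metric concrete, the following Python sketch evaluates a truncated version of $\rho_\infty$. It assumes that the ordinary metric $\rho$ on $\mathbb{R}^d$ is the Euclidean distance and represents an infinite sequence as a function from the 1-based index $i$ to a point; both are illustrative choices. Because each summand is below $2^{-i}$, dropping every term after the first `terms` ones changes the value by less than $2^{-\text{terms}}$.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]
# Hypothetical representation: an infinite sequence as a map i -> x_i (i >= 1).
InfSeq = Callable[[int], Point]


def rho(x: Point, y: Point) -> float:
    """Assumed ordinary metric on R^d: the Euclidean distance."""
    return math.dist(x, y)


def rho_inf(x: InfSeq, y: InfSeq, terms: int = 60) -> float:
    """Truncated rho_inf(x, y) = sum_{i>=1} 2^{-i} * rho(x_i, y_i) / (1 + rho(x_i, y_i)).

    For finite points every summand is smaller than 2^{-i}, so the omitted
    tail is below 2^{-terms} and the result lies in [0, 1).
    """
    total = 0.0
    for i in range(1, terms + 1):
        d = rho(x(i), y(i))
        total += d / (1.0 + d) / 2 ** i
    return total


# Example sequences in R^1.
zero: InfSeq = lambda i: (0.0,)
bump: InfSeq = lambda i: (1.0,) if i == 1 else (0.0,)  # differs from `zero` only at i = 1
far: InfSeq = lambda i: (1e9,)                         # huge distance in every coordinate

print(rho_inf(zero, bump))  # 0.5 * 1 / (1 + 1) = 0.25
```

The weights $2^{-i}$ make distant coordinates matter less and less, and the factor $\rho/(1+\rho)$ keeps every coordinate's contribution bounded, so the series always converges.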

Since the solution space is separable and complete under $\rho$, it can be proved that the spaces of $n$-tuples and of infinite sequences are also separable and complete (Appendix M6 in [10]). In addition, because of separability, the Borel σ-fields on these two spaces are equal to the corresponding product σ-fields. In other words, the Borel σ-fields generated by the open sets under the corresponding metrics coincide with the product σ-fields generated by all measurable rectangles and by all measurable cylinder sets, respectively (Lemma 1.2 in [11]). Therefore, from now on we use the product σ-field notation for the corresponding Borel σ-fields. Finally, we consider the sets of all random elements of the solution space, of the space of $n$-tuples and of the space of infinite sequences, respectively.

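Let $\pi_n$ be the natural projection from the space of infinite sequences to the space of $n$-tuples, $\pi_n(x_1, x_2, \dots) = (x_1, \dots, x_n)$. Since, for a given random sequence $X$, $\pi_n(X)$ defines a random $n$-tuple projected from $X$, we also use $\pi_n$ to denote the induced mapping $X \mapsto \pi_n(X)$ from random sequences to random $n$-tuples. By definition, $\pi_n$ is the operator that truncates random sequences to random vectors. Given a random sequence, we also use a shorthand notation for its projection under $\pi_n$.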

### III-B Analytical Framework for EA and IPMs

In this section, we present an analytical framework for the EA and IPMs. First, the modeling assumptions are stated. We only deal with operators which generate c.i.i.d. individuals. Then, we present an abstraction of the EA and IPMs. This abstraction serves as the basis for building our framework. Finally, the framework is presented. It unifies the range spaces of the random elements and defines the convergence of IPMs.

#### III-B1 Modeling Assumptions

We assume that the EA on problem (24) is time-homogeneous and Markovian: the next generation depends only on the current one, and the transition rule from the $k$th generation to the $(k+1)$th generation is invariant with respect to $k$. We further assume that the individuals in the next generation are c.i.i.d. given the current generation. As this is the only extra assumption introduced in the framework, it needs some further explanation.

The main reason for introducing this assumption is to simplify the analysis. Individuals that are c.i.i.d. given the current generation are exchangeable, so individuals in the same generation are always exchangeable. As a result, it is possible to exploit the symmetry in the population and study the transition equations of marginal distributions. Moreover, conditional independence is what allows us to easily expand the random elements modeling finite-sized populations to random sequences, and therefore to define convergence in distribution for random elements of the corresponding metric space. In addition, many real-world operators in EAs satisfy this assumption, such as the proportionate selection operator and the crossover operator analyzed in [1, 2].
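As an illustration of an operator that produces c.i.i.d. individuals, the sketch below implements proportionate selection in the form we assume here: each new individual is drawn with replacement, picking $x_i$ with probability $g(x_i)/\sum_j g(x_j)$, independently of the other draws. The two-point population and the fitness $g(x) = x + 1$ are hypothetical. Because the draws are c.i.i.d. given the population, the two orderings of a mixed pair occur with the same frequency (about $2/9$ each), as exchangeability requires.

```python
import random
from typing import Callable, Sequence


def proportionate_selection(
    population: Sequence[float],
    g: Callable[[float], float],
    m: int,
    rng: random.Random,
) -> list[float]:
    """Draw m individuals that are c.i.i.d. given `population`.

    Each draw independently picks population[i] with probability
    g(x_i) / sum_j g(x_j). Assumes g >= 0 on the population with a positive sum.
    """
    weights = [g(x) for x in population]
    return rng.choices(population, weights=weights, k=m)


def ordered_pair_frequencies(trials: int = 30_000, seed: int = 1) -> tuple[float, float]:
    """Empirical P(X1=0, X2=1) and P(X1=1, X2=0) for a fixed two-point population."""
    rng = random.Random(seed)
    population = [0.0, 1.0]
    g = lambda x: x + 1.0  # hypothetical fitness: selection weights 1 and 2
    count_01 = count_10 = 0
    for _ in range(trials):
        x1, x2 = proportionate_selection(population, g, 2, rng)
        if (x1, x2) == (0.0, 1.0):
            count_01 += 1
        elif (x1, x2) == (1.0, 0.0):
            count_10 += 1
    return count_01 / trials, count_10 / trials


print(ordered_pair_frequencies())
```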

However, we admit that there are some exceptions to our assumption. The most notable one may be the mutation operator, though it does not pose significant difficulties. The mutation operator perturbs each individual in the current population independently, according to a common conditional p.d.f. If the current population is not exchangeable, then the population after mutation is not exchangeable either. Therefore, mutation on its own does not necessarily produce c.i.i.d. individuals. However, mutation is usually used along with other operators, and as long as these other operators generate c.i.i.d. populations, the individuals after mutation are c.i.i.d. too. Therefore, a combined operator of mutation and any other operator satisfying the c.i.i.d. assumption also satisfies our assumption. An example can be seen in [1], where mutation is analyzed together with proportionate selection. On the other hand, an algorithm that uses only mutation is very simple and can be modeled and analyzed without much difficulty.
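The difference between mutation alone and mutation combined with a c.i.i.d. operator can be seen on a deliberately non-exchangeable, ordered population $(0, 10)$. In this hypothetical sketch, mutation adds $N(0, 1)$ noise, and the combined operator first picks a parent by proportionate selection with a constant fitness (our choice, so that selection is uniform) and then mutates it. After mutation alone, the first and second offspring keep different means (about 0 and 10); after the combined operator, both have mean about 5.

```python
import random
from typing import Callable, Sequence

Operator = Callable[[Sequence[float], random.Random], list]


def mutate(population: Sequence[float], sigma: float, rng: random.Random) -> list[float]:
    """Perturb each individual independently with N(0, sigma^2) noise."""
    return [x + rng.gauss(0.0, sigma) for x in population]


def select_then_mutate(
    population: Sequence[float],
    g: Callable[[float], float],
    sigma: float,
    rng: random.Random,
) -> list[float]:
    """Combined operator: every offspring is an independent proportionate pick
    plus Gaussian noise, so offspring are c.i.i.d. given the population."""
    weights = [g(x) for x in population]
    return [
        rng.choices(population, weights=weights)[0] + rng.gauss(0.0, sigma)
        for _ in population
    ]


def offspring_means(operator: Operator, trials: int = 20_000, seed: int = 2) -> tuple[float, float]:
    """Average first and second offspring over repeated runs on the ordered population (0, 10)."""
    rng = random.Random(seed)
    population = [0.0, 10.0]
    s1 = s2 = 0.0
    for _ in range(trials):
        y1, y2 = operator(population, rng)
        s1 += y1
        s2 += y2
    return s1 / trials, s2 / trials


mutation_only = lambda pop, rng: mutate(pop, 1.0, rng)
combined = lambda pop, rng: select_then_mutate(pop, lambda x: 1.0, 1.0, rng)
print(offspring_means(mutation_only), offspring_means(combined))
```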

Perhaps more significant exceptions are operators such as selection without replacement, or the crossover operator that produces two dependent offspring at the same time. For these operators, which do not satisfy the c.i.i.d. assumption, it is still possible to expand the random elements modeling finite-sized populations to random sequences. For example, the random elements can be padded with fixed constants or with random elements of known distributions to form random sequences. In this way, our definition of the convergence of IPMs can still be applied. However, whether convergence in distribution of these random sequences can still yield meaningful results similar to the transition equation is a separate research question that needs further investigation. Nonetheless, our assumption is equivalent to the exchangeability assumption generally used in previous studies.

#### III-B2 The Abstraction of EA and IPMs

Given the modeling assumptions, we develop an abstraction to describe the population dynamics of the EA and IPMs.

Consider the EA with population size $n$, and let the $k$th generation it produces be modeled as a random element $P^n_k$ of the space of $n$-tuples, whose $i$th component is a random element representing the $i$th individual in $P^n_k$. Without loss of generality, assume that the EA has two operators, $G$ and $H$. In each iteration, the EA first applies $G$ to the current population to generate an intermediate population, and then applies $H$ to the intermediate population to generate the next population. Note that here $G$ and $H$ are just names for the operators in the real EA; they facilitate describing the evolutionary process. For the EA with population size $n$, $G$ and $H$ are instantiated as functions from random $n$-tuples to random $n$-tuples, denoted by $G_n$ and $H_n$, respectively. For example, if $G$ represents proportionate selection, the function $G_n$ is the actual operator in the EA with population size $n$ that generates c.i.i.d. individuals according to the conditional probability (3). Of course, for this abstraction to be valid, the operators used in the EA should actually produce random elements, i.e. the newly generated population should be measurable with respect to the σ-field on the space of $n$-tuples. As most operators in real EAs satisfy this condition, and previous studies implicitly make the same assumption, we assume that this condition is always satisfied.

Given these notations, the evolutionary process of the EA with population size $n$ can be described by the sequence $\{P^n_k\}_{k=0}^{\infty}$, where the initial population $P^n_0$ is known and $P^n_{k+1}$ follows the recurrence equation

$$P^n_{k+1} = (H_n \circ G_n)(P^n_k). \tag{25}$$

Understanding the population dynamics of the EA can then be achieved by studying the distributions and properties of $\{P^n_k\}_{k=0}^{\infty}$.
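The recurrence (25) can be sketched as a loop. In this hypothetical Python instance, $d = 1$, $G_n$ is proportionate selection (in the form assumed earlier, drawing with probability proportional to $g$), $H_n$ is Gaussian mutation with $\sigma = 0.1$, and the objective is $g(x) = e^{-x^2/2}$, maximized at $x = 0$. Each call returns one realization of $P^n_0, P^n_1, \dots$; studying the distributions of $\{P^n_k\}$ would require many such runs.

```python
import math
import random
from statistics import fmean
from typing import Callable, List

Population = List[float]
Operator = Callable[[Population, random.Random], Population]


def make_selection(g: Callable[[float], float]) -> Operator:
    """G_n: proportionate selection returning n individuals, c.i.i.d. given pop."""
    def G(pop: Population, rng: random.Random) -> Population:
        return rng.choices(pop, weights=[g(x) for x in pop], k=len(pop))
    return G


def make_mutation(sigma: float) -> Operator:
    """H_n: independent N(0, sigma^2) perturbation of each individual."""
    def H(pop: Population, rng: random.Random) -> Population:
        return [x + rng.gauss(0.0, sigma) for x in pop]
    return H


def run_ea(P0: Population, G: Operator, H: Operator, generations: int, seed: int = 0) -> List[Population]:
    """Iterate P_{k+1} = (H o G)(P_k) and return [P_0, P_1, ..., P_generations]."""
    rng = random.Random(seed)
    history = [list(P0)]
    for _ in range(generations):
        history.append(H(G(history[-1], rng), rng))
    return history


# Usage: n = 2000 individuals, initial population i.i.d. N(3, 1).
init_rng = random.Random(42)
P0 = [init_rng.gauss(3.0, 1.0) for _ in range(2000)]
history = run_ea(P0, make_selection(lambda x: math.exp(-x * x / 2)), make_mutation(0.1), generations=10)
print(fmean(history[0]), fmean(history[-1]))  # the mean drifts from about 3 toward 0
```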

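Now consider the IPM of the EA. The population dynamics it produces can be described by the sequence $\{P^\infty_k\}_{k=0}^{\infty}$, where $P^\infty_0$ is known and $P^\infty_{k+1}$ follows the recurrence equation

$$P^\infty_{k+1} = (H_\infty \circ G_\infty)(P^\infty_k), \tag{26}$$

in which $G_\infty$ and $H_\infty$ are the operators in the IPM modeled after $G$ and $H$. Then, the convergence of the IPM basically requires that $P^n_k$ converges to $P^\infty_k$ as $n \to \infty$ for every generation $k$.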
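As a rough numerical illustration of what the IPM is meant to predict, the sketch below estimates the marginal variance of one individual of $P^n_1$ for a hypothetical EA and compares it with an IPM prediction. The assumptions are ours, for illustration: $d = 1$, $P^n_0$ is i.i.d. $N(0,1)$, $G$ is proportionate selection with $g(x) = e^{-x^2/2}$, $H$ is $N(0, \sigma^2)$ mutation, and the IPM for proportionate selection reweights the current density by $g$ and renormalizes. Under these assumptions, the IPM marginal after one generation is $N(0, 1/2 + \sigma^2)$. The check only looks at a single marginal, which is weaker than the convergence in distribution of whole populations that the framework is built on.

```python
import math
import random
from statistics import pvariance

SIGMA = 0.1
# IPM prediction under the stated assumptions: N(0,1) reweighted by exp(-x^2/2)
# is N(0, 1/2); mutation then adds variance SIGMA^2.
IPM_VARIANCE = 0.5 + SIGMA ** 2


def first_individual_of_P1(n: int, rng: random.Random) -> float:
    """One draw of the first individual of P^n_1 for population size n."""
    P0 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    weights = [math.exp(-x * x / 2.0) for x in P0]
    # Individuals of P^n_1 are c.i.i.d. given P^n_0, so generating only the
    # first one is enough to sample its marginal distribution.
    parent = rng.choices(P0, weights=weights)[0]
    return parent + rng.gauss(0.0, SIGMA)


def marginal_variance(n: int, reps: int = 4000, seed: int = 3) -> float:
    """Monte Carlo estimate of the variance of the first individual of P^n_1."""
    rng = random.Random(seed)
    return pvariance([first_individual_of_P1(n, rng) for _ in range(reps)])


print(marginal_variance(200), IPM_VARIANCE)
```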

#### III-B3 The Proposed Framework

As stated before, for each generation $k$, the elements of the sequence $\{P^n_k\}_{n=1}^{\infty}$ and the limit $P^\infty_k$ are random elements of different metric spaces. Therefore, the core of developing our model is to expand $P^n_k$ to random sequences, while ensuring that this expansion does not affect the modeling of the evolutionary process of the real EA. The result of this step is, for each $n$, the sequence of random sequences $\{Q^n_k\}_{k=0}^{\infty}$, which completely describes the population dynamics of the EA with population size $n$. For the population dynamics of the IPM, we simply let $Q^\infty_k = P^\infty_k$.

The expansion of $P^n_k$ and the relationships between the original populations, their expansions and the IPM are the core of our framework. In the following, we present them rigorously.

#### III-B4 The Expansion of $P^n_k$

We start by decomposing each of $G_n$ and $H_n$ into two operators. One operator maps random sequences to random $n$-tuples; it corresponds to converting random sequences into random vectors. A natural choice is the projection operator $\pi_n$.

To model the evolutionary process, we also have to define how to expand random vectors into random sequences. In other words, we have to define the expansions of $G_n$ and $H_n$, which are functions from random $n$-tuples to random sequences.

###### Definition 2 (The expansion of operator).

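For an operator $G_n$ satisfying the condition that, for any random $n$-tuple $X$, the elements of $G_n(X)$ are c.i.i.d. given $X$, the expansion of $G_n$ is the operator $\tilde{G}_n$ from random $n$-tuples to random sequences satisfying, for any random $n$-tuple $X$,

1. $\pi_n(\tilde{G}_n(X)) = G_n(X)$.

2. The elements of $\tilde{G}_n(X)$ are c.i.i.d. given $X$.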

In Definition 2, the operator $\tilde{G}_n$ is the expansion of $G_n$. Condition 1) ensures that $G_n$ can be safely replaced by $\pi_n \circ \tilde{G}_n$. Condition 2) ensures that the padding elements of the sequence $\tilde{G}_n(X)$ are generated according to the same conditional probability distribution as that used by $G_n$ to generate new individuals. In other words, if a single-individual operator describes how $G_n$ generates each new individual from the current population, then $G_n$ is equivalent to invoking that operator independently on the current population $n$ times, and $\tilde{G}_n$ is equivalent to invoking it independently infinitely many times. Finally, because $G_n$ satisfies the condition in the premise, the expansion always exists.
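The idea behind Definition 2 can be sketched in Python, with a lazy generator standing in for an infinite random sequence so that only finitely many elements are ever materialized. The single-individual operator `sample` (a uniform pick plus Gaussian noise) and the function names are hypothetical. If $G_n$ and $\tilde{G}_n$ share the same source of randomness, truncating $\tilde{G}_n$ with $\pi_n$ reproduces $G_n$ exactly, which is Condition 1) path by path; Condition 2) holds because every element comes from an independent call to `sample` on the same population.

```python
import random
from itertools import islice
from typing import Callable, Iterator, List, Sequence

Population = Sequence[float]
# Hypothetical single-individual operator: draws one new individual from the
# current population; separate calls are independent given the population.
Sampler = Callable[[Population, random.Random], float]


def G_n(sample: Sampler, pop: Population, rng: random.Random) -> List[float]:
    """Finite operator: invoke `sample` independently n = len(pop) times."""
    return [sample(pop, rng) for _ in range(len(pop))]


def G_tilde(sample: Sampler, pop: Population, rng: random.Random) -> Iterator[float]:
    """Expansion of G_n: invoke `sample` independently, without end (lazily)."""
    while True:
        yield sample(pop, rng)


def pi(n: int, seq: Iterator[float]) -> List[float]:
    """Natural projection pi_n: keep the first n elements of a sequence."""
    return list(islice(seq, n))


def sample(pop: Population, rng: random.Random) -> float:
    """Example sampler: uniform pick from the population plus N(0, 0.25) noise."""
    return rng.choice(pop) + rng.gauss(0.0, 0.5)


pop = [0.0, 1.0, 2.0]
# Condition 1) with a shared seed: pi_n(G~_n(pop)) == G_n(pop).
assert pi(len(pop), G_tilde(sample, pop, random.Random(7))) == G_n(sample, pop, random.Random(7))
```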

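By Definition 2, the operators in the EA with population size $n$ can be decomposed as $G_n = \pi_n \circ \tilde{G}_n$ and $H_n = \pi_n \circ \tilde{H}_n$, respectively. Then, the evolutionary process of this EA can be described by the sequence of random sequences $\{Q^n_k\}_{k=0}^{\infty}$, satisfying the recurrence equation

$$Q^n_{k+1} = (\tilde{H}_n \circ \pi_n \circ \tilde{G}_n)(P^n_k), \tag{27}$$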

where follows the recurrence equation (25), and . It can also be proved that

$$P^n_k = \pi_n(Q^n_k). \tag{28}$$
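One way to see (28) for $k \ge 1$ is to apply Condition 1) of Definition 2 to (27), first for $\tilde{H}_n$ and then for $\tilde{G}_n$, and to finish with (25); for $k = 0$, (28) rests on how $Q^n_0$ is chosen.

$$
\pi_n(Q^n_{k+1})
= \pi_n\Big(\tilde{H}_n\big(\pi_n(\tilde{G}_n(P^n_k))\big)\Big)
= H_n\big(\pi_n(\tilde{G}_n(P^n_k))\big)
= H_n\big(G_n(P^n_k)\big)
= P^n_{k+1}, \qquad k \ge 0.
$$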

Essentially, (27) and (28) describe how the algorithm progresses from $P^n_k$ to $Q^n_{k+1}$ and then to $P^n_{k+1}$. Together they fully characterize the population dynamics $\{P^n_k\}_{k=0}^{\infty}$, and the extra step of generating $Q^n_{k+1}$ clearly does not introduce modeling errors.

For the IPM, because $P^\infty_k$ is already a random sequence, there is no need for expansion. For convenience we simply let