Finite First Hitting Time versus Stochastic Convergence in Particle Swarm Optimisation¹

¹ This is an extended version of a paper published in MIC 2011 [11]. Supported by Deutsche Forschungsgemeinschaft (DFG) under grant no. WI 3552/1-1.


Abstract

We reconsider stochastic convergence analyses of particle swarm optimisation, and point out that previously obtained parameter conditions are not always sufficient to guarantee mean square convergence to a local optimum. We show that stagnation can in fact occur for non-trivial configurations in non-optimal parts of the search space, even for simple functions like Sphere. In these situations, the convergence properties of the basic PSO may be detrimental to the goal of optimisation, namely to discover a sufficiently good solution within reasonable time. To characterise the optimisation ability of algorithms, we suggest the expected first hitting time (FHT), i. e., the time until a search point in the vicinity of the optimum is visited. It is shown that a basic PSO may have infinite expected FHT, while an algorithm introduced here, the Noisy PSO, has finite expected FHT on some functions.

1 Introduction

Particle Swarm Optimisation (PSO) is an optimisation technique for functions over continuous spaces introduced by Kennedy and Eberhart [10]. The algorithm simulates the motions of a swarm of particles in the solution space. While limited by inertia, each particle is subject to two attracting forces: one towards the best position visited by the particle, and one towards the best position visited by any particle in the swarm. The update equations for the velocity and the position are given in Algorithm 1 in Section 2. The inertia factor $\omega$ and the acceleration coefficients $c_1$ and $c_2$ are user-specified parameters. The algorithm only uses objective function values when updating the local best positions $L_i$ and the global best position $G$, and does not require any gradient information. Hence, the PSO is a black-box algorithm [4]. It is straightforward to implement and has been applied successfully in many optimisation domains. Despite its popularity, the theoretical understanding of the PSO remains limited. In particular, how do the parameter settings influence the swarm dynamics and, in the end, the performance of the PSO?

Among the best understood aspects of the PSO dynamics are the conditions under which the swarm stagnates into an equilibrium point. It is not too difficult to see (e. g., [3]) that velocity explosion can only be avoided when the inertia factor is bounded by

$$|\omega| < 1. \qquad (1)$$

The magnitude of the velocities still depends heavily on how the global and local best positions evolve with time $t$, which in turn is influenced by the function that is optimised. To simplify matters, it has generally been assumed that the swarm has entered a stagnation mode, where the global and local best particle positions $G^t$ and $L_i^t$ remain fixed. Under this assumption, there is no interaction between the particles, or between the problem dimensions, and the function to be optimised is irrelevant. The swarm can therefore be understood as a set of independent, one-dimensional processes.

An additional simplifying assumption made in early convergence analyses was to disregard the stochastic factors $r$ and $R$, replacing them by constants [15, 3]. Trelea [14] analysed the one-dimensional dynamics under this assumption, showing that convergence to the equilibrium point

$$\frac{c_1 L + c_2 G}{c_1 + c_2} \qquad (2)$$

occurs under condition (1) and

$$0 < c_1 + c_2 < 4(1 + \omega). \qquad (3)$$
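Under the deterministic simplification, the one-dimensional dynamics form a second-order linear recurrence. Convergence to the equilibrium point can therefore be checked through the eigenvalues of its companion matrix. The Python sketch below assumes that $r$ and $R$ are replaced by their mean $1/2$, which is one natural choice of constants. The function names are illustrative. The sketch compares this eigenvalue test with the closed-form region given by (1) and (3).

```python
import numpy as np


def deterministic_converges(omega: float, c1: float, c2: float) -> bool:
    """Eigenvalue test for the deterministic 1-D PSO recurrence.

    Assumes r and R are replaced by their mean 1/2, which gives
    x_{t+1} = (1 + omega - phi) x_t - omega x_{t-1} + phi * p,
    where phi = (c1 + c2) / 2 and p is the equilibrium point (2).
    """
    phi = (c1 + c2) / 2.0
    # Companion matrix of the homogeneous part, acting on (x_t, x_{t-1}).
    companion = np.array([[1.0 + omega - phi, -omega],
                          [1.0, 0.0]])
    spectral_radius = np.max(np.abs(np.linalg.eigvals(companion)))
    return bool(spectral_radius < 1.0)


def in_region_1_and_3(omega: float, c1: float, c2: float) -> bool:
    """Closed-form region: |omega| < 1 and 0 < c1 + c2 < 4 (1 + omega)."""
    return abs(omega) < 1.0 and 0.0 < c1 + c2 < 4.0 * (1.0 + omega)


def run_deterministic(omega, c1, c2, L, G, x0, x1, steps=2000):
    """Iterate the deterministic recurrence from (x0, x1); return x_steps."""
    phi = (c1 + c2) / 2.0
    p = (c1 * L + c2 * G) / (c1 + c2)  # equilibrium point (2)
    prev, cur = x0, x1
    for _ in range(steps):
        prev, cur = cur, (1.0 + omega - phi) * cur - omega * prev + phi * p
    return cur


x_final = run_deterministic(0.7, 1.5, 1.5, L=0.0, G=1.0, x0=3.0, x1=2.0)
# x_final is within floating-point error of (1.5 * 0 + 1.5 * 1) / 3 = 0.5
```

With $r = R = 1/2$, the two boundaries of (3) are the points where a characteristic root passes through $+1$ (at $c_1 + c_2 = 0$) and through $-1$ (at $c_1 + c_2 = 4(1+\omega)$). Together with $|\omega| < 1$, this is why the two tests agree on any parameter grid that avoids the boundary.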

Kadirkamanathan et al. [9] were among the first to take the stochastic effects into account, approaching the dynamics of the global best particle (for which $L_i = G$) from a control-theoretic angle. In particular, they considered asymptotic Lyapunov stability of the global best position, still under the assumption of fixed $L_i$ and $G$. Informally, this stability condition is satisfied if the global best particle always converges to the global best position when started near it. Assuming a global best position in the origin, their analysis shows that condition (1), , and

(4)

are sufficient to guarantee asymptotic Lyapunov stability of the origin. These conditions are sufficient but not necessary, i. e., they are conservative.

Another stochastic mode of convergence that has been considered is convergence in mean square (also called second-order stability) to a point $z$. It is defined as $\lim_{t\to\infty} \mathrm{E}[(x_t - z)^2] = 0$. Mean square convergence to $z$ implies that the expectation of the particle position converges to $z$, while its variance converges to 0. It has been claimed that all particles in the PSO converge in mean square to the global best position if the parameter triplet $(\omega, c_1, c_2)$ is set appropriately.

Jiang et al. [8] derived recurrence equations for the sequences $\mathrm{E}[x_t]$ and $\mathrm{Var}[x_t]$ assuming fixed $L$ and $G$. They also determined conditions, i. e. a convergence region, under which these sequences are convergent. The convergence region considered in [8] is strictly contained in the convergence region given by the deterministic condition (3). For positive $\omega$, the Lyapunov stability region described by condition (4) is strictly contained in the mean square stability region. Given the conditions indicated in Figure 1, the expectation $\mathrm{E}[x_t]$ will converge to $(c_1 L + c_2 G)/(c_1 + c_2)$ (as in Eq. (2)), while the variance will converge to a value which is proportional to $(G - L)^2$. It is claimed that the local best $L$ converges to $G$, which would imply that the variance converges to 0. However, as we will explain in later sections, this is not generally correct. We will discuss further assumptions that are needed to fix the claim of [8].

Wakasa et al. [16] pointed out an alternative technique for determining mean square stability of specific parameter triplets. They showed that this problem, and other problems related to the PSO dynamics, can be reduced to checking the existence of a matrix satisfying an associated linear matrix inequality (LMI). This is a standard approach in control theory, and is popular because the reduced LMI problem can be solved efficiently using convex and quasi-convex optimisation techniques [1]. Using this technique, Wakasa et al. [16] obtained explicit expressions for the mean square stability region, identical to the stability region obtained in [8]. Assuming stagnation, Poli [12] provided recurrence equations for higher moments (e. g. skewness and kurtosis) of the particle distribution. The equations for the $k$-th moment are expressed with a number of terms exponential in $k$, but can be solved using computer algebra systems for moments of moderate order.
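Mean square convergence under stagnation can also be probed empirically. The sketch below is in Python with NumPy. It simulates many independent copies of a single one-dimensional particle with fixed $L$ and $G$ and returns the sample mean and variance of $x_t$. The function name, the initialisation range $[-1, 1]$, the parameter values and the sample sizes are illustrative assumptions. The result is a Monte Carlo estimate, not the analytic recurrences of [8].

```python
import numpy as np


def stagnation_moments(omega, c1, c2, L, G, steps=600, runs=4000, seed=0):
    """Monte Carlo estimate of E[x_t] and Var[x_t] at t = steps.

    Simulates `runs` independent copies of one 1-D particle of the basic PSO
    whose local best L and global best G are held fixed (stagnation).
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, runs)  # initial positions (assumed range)
    v = rng.uniform(-1.0, 1.0, runs)  # initial velocities (assumed range)
    for _ in range(steps):
        r = rng.random(runs)  # fresh r ~ U[0, 1] per copy and step
        R = rng.random(runs)  # fresh R ~ U[0, 1] per copy and step
        v = omega * v + c1 * r * (L - x) + c2 * R * (G - x)  # Eq. (5)
        x = x + v                                          # Eq. (6)
    return float(x.mean()), float(x.var())


# Local best differs from global best versus local best equal to global best.
mean_diff, var_diff = stagnation_moments(0.7, 1.5, 1.5, L=0.0, G=1.0)
mean_same, var_same = stagnation_moments(0.7, 1.5, 1.5, L=1.0, G=1.0)
```

Take $\omega = 0.7$ and $c_1 = c_2 = 1.5$. With $L = G$, the estimated variance vanishes. With $L \neq G$, it stays bounded away from zero around a mean near $(c_1 L + c_2 G)/(c_1 + c_2)$. Because the same seed reproduces the same random draws, doubling $G - L$ quadruples the estimate, in line with the proportionality to $(G - L)^2$.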

Recently, there has been progress in removing the stagnation assumption on $L$ and $G$. Building on previous work by Brandstätter and Baumgartner [2], Fernández-Martínez and García-Gonzalo [5] interpret the PSO dynamics as a discrete-time approximation of a certain spring-mass system. This mechanical interpretation naturally leads to a generalisation of the PSO with an adjustable time step $\Delta t$, where the special case $\Delta t = 1$ corresponds to the standard PSO. In the limit $\Delta t \to 0$, one obtains a continuous-time PSO governed by stochastic differential equations. They show that the dynamic properties of the discrete-time PSO approach those of the continuous-time PSO as the time step approaches 0.

Figure 1: Comparison of convergence regions. The region labelled Noisy PSO indicates where the precondition of Theorem 2 holds.

While theoretical research on PSO has mainly focused on convergence, there may be other theoretical properties that are more relevant in the context of optimisation. The primary goal in optimisation is to obtain a solution of acceptable quality within reasonable time. Convergence may be neither sufficient nor necessary to reach this goal. In particular, convergence is insufficient when stagnation occurs at non-optimal points in the solution space. Furthermore, convergence (and hence stagnation) is not necessary once a solution of acceptable quality has been found.

As an alternative measure, we suggest considering, for arbitrarily small $\varepsilon > 0$, the expected time until the algorithm for the first time obtains a search point $x$ for which $f(x) - f^* \le \varepsilon$. Here $f^*$ is the function value of an optimal search point, and time is measured in the number of evaluations of the objective function. We call this the expected first hitting time (FHT) with respect to $\varepsilon$. As a first condition, it is desirable to have finite expected FHT for any constant $\varepsilon > 0$. Informally, this means that the algorithm will eventually find a solution of acceptable quality. Secondly, it is desirable that the growth of the expected FHT is upper bounded by a polynomial in $1/\varepsilon$ and the number $n$ of dimensions of the problem. Informally, this means that the algorithm will not only find a solution of acceptable quality, but will do so within reasonable time.

Some work has been done in this direction. Sudholt and Witt [13] studied the runtime of the Binary PSO, i. e. in a discrete search space. Witt [17] considered the Guaranteed Convergence PSO (GCPSO) with one particle on the Sphere function, showing that if started at unit distance from the optimum, then after iterations, the algorithm has reached the $\varepsilon$-ball around the optimum with overwhelmingly high probability. The GCPSO avoids stagnation by resetting the global best particle to a randomly sampled point around the best found position. The behaviour of the one-particle GCPSO therefore resembles the behaviour of a (1+1) ES, and the velocity term does not come into play. In fact, the analysis has some similarities with the analysis by Jägersküpper [6].

The objectives of this paper are threefold. Firstly, in Section 3, we show that the expected first hitting time of a basic PSO is infinite, even on the simple Sphere function. Secondly, in Section 4, we point out situations where the basic PSO does not converge in mean square to the global best particle (which need not be a global optimum), despite having parameters in the convergence region. We discuss what extra conditions are needed to ensure mean square convergence. Finally, in Section 5, we consider a Noisy PSO which we prove to have finite expected FHT on the one-dimensional Sphere function. Our results also hold for any strictly increasing transformation of this function because the PSO is a comparison-based algorithm.

2 Preliminaries

In the following, we consider minimisation of functions. A basic PSO with swarm size $m$ optimising an $n$-dimensional function $f\colon \mathbb{R}^n \to \mathbb{R}$ is defined below. This PSO definition is well accepted, and is called Standard PSO by Jiang et al. [8]. The position and velocity of particle $i$ at time $t$ are represented by the pair of vectors $x_i^t$ and $v_i^t$. The parameter $a$ bounds the initial positions and velocities.

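  for each particle $i \in \{1, \dots, m\}$ and dimension $j \in \{1, \dots, n\}$ do
     $x_{i,j}^0 \sim U[-a, a]$; $v_{i,j}^0 \sim U[-a, a]$; $L_{i,j}^0 := x_{i,j}^0$
  end for
  $G^0 := \arg\min_{y \in \{L_1^0, \dots, L_m^0\}} f(y)$
  for $t = 0, 1, 2, \dots$ until termination condition satisfied do
     for each particle $i$ and dimension $j$ do
        $r \sim U[0, 1]$; $R \sim U[0, 1]$
        $v_{i,j}^{t+1} := \omega v_{i,j}^t + c_1 r \,(L_{i,j}^t - x_{i,j}^t) + c_2 R \,(G_j^t - x_{i,j}^t)$    (5)
        $x_{i,j}^{t+1} := x_{i,j}^t + v_{i,j}^{t+1}$    (6)
     end for
     for each particle $i$: $L_i^{t+1} := x_i^{t+1}$ if $f(x_i^{t+1}) < f(L_i^t)$, else $L_i^{t+1} := L_i^t$
     $G^{t+1} := \arg\min_{y \in \{L_1^{t+1}, \dots, L_m^{t+1}\}} f(y)$
  end for
Algorithm 1 Basic PSO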
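For concreteness, Algorithm 1 can be sketched in Python with NumPy as follows. This is an illustrative implementation, not the authors' code. The function names are assumptions, and so are two details of the update: the global best is updated synchronously after all particles have moved, and a local best changes only on strict improvement.

```python
import numpy as np


def sphere(x: np.ndarray) -> float:
    """Sphere(x) = ||x||^2, the squared Euclidean norm."""
    return float(np.dot(x, x))


def basic_pso(f, n, m, omega, c1, c2, a, iterations, seed=0):
    """Illustrative sketch of Algorithm 1 (minimisation).

    Returns the final global best position and the list of f(G^t).
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-a, a, size=(m, n))   # x_{i,j}^0 ~ U[-a, a]
    v = rng.uniform(-a, a, size=(m, n))   # v_{i,j}^0 ~ U[-a, a]
    L = x.copy()                           # local bests L_i^0 := x_i^0
    fL = np.array([f(row) for row in L])
    best = int(np.argmin(fL))
    G, fG = L[best].copy(), fL[best]       # global best G^0
    history = [fG]
    for _ in range(iterations):
        # Independent r, R ~ U[0, 1] for every particle and dimension.
        r = rng.random((m, n))
        R = rng.random((m, n))
        v = omega * v + c1 * r * (L - x) + c2 * R * (G - x)  # Eq. (5)
        x = x + v                                          # Eq. (6)
        for i in range(m):                 # m evaluations per time step
            fx = f(x[i])
            if fx < fL[i]:                 # strict improvement only
                L[i] = x[i]
                fL[i] = fx
        best = int(np.argmin(fL))
        if fL[best] < fG:
            G, fG = L[best].copy(), fL[best]
        history.append(fG)
    return G, history


G, history = basic_pso(sphere, n=2, m=5, omega=0.7, c1=1.5, c2=1.5,
                       a=10.0, iterations=200, seed=1)
```

Local and global bests only change on strict improvement, so the recorded sequence $f(G^t)$ is non-increasing. Whether it approaches 0 depends on the parameters and, as Section 3 shows, on the initialisation.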

Assume that a function $f$ has at least one global minimum $x^*$. Then for a given $\varepsilon > 0$, the first hitting time (FHT) of the PSO on function $f$ is defined as the number of times the function $f$ is evaluated until the swarm for the first time contains a particle $x$ for which $f(x) - f(x^*) \le \varepsilon$. We assume that the PSO is implemented such that the function is evaluated no more than $m$ times per time step $t$. As an example function, we consider the Sphere problem, which for all $x \in \mathbb{R}^n$ is defined as $\mathrm{Sphere}(x) := \|x\|^2$, where $\|\cdot\|$ denotes the Euclidean norm. This is a well-accepted benchmark problem in convergence analyses and frequently serves as a starting point for theoretical analyses.
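The FHT definition translates directly into an evaluation counter. In the sketch below, every call of $f$ increments the counter, including the $m$ evaluations of the initial swarm. The helper `first_hitting_time` is hypothetical and written in Python with NumPy. A simulation must stop somewhere, so the run is cut off after `max_evals` evaluations and reported as `None`, i.e. as a censored observation.

```python
import numpy as np


def first_hitting_time(f, f_opt, eps, n, m, omega, c1, c2, a,
                       max_evals=100_000, seed=0):
    """Hypothetical helper: number of f-evaluations of the basic PSO until
    some particle x satisfies f(x) - f_opt <= eps.

    Returns None if max_evals evaluations pass without a hit, i.e. the
    observation is censored at the budget.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-a, a, size=(m, n))
    v = rng.uniform(-a, a, size=(m, n))
    L = x.copy()
    fL = np.empty(m)
    evals = 0
    for i in range(m):                      # evaluations of the initial swarm
        fL[i] = f(x[i])
        evals += 1
        if fL[i] - f_opt <= eps:
            return evals
    G = L[int(np.argmin(fL))].copy()
    while True:
        r = rng.random((m, n))
        R = rng.random((m, n))
        v = omega * v + c1 * r * (L - x) + c2 * R * (G - x)
        x = x + v
        for i in range(m):
            if evals >= max_evals:
                return None                 # budget exhausted: censored run
            fx = f(x[i])
            evals += 1
            if fx - f_opt <= eps:
                return evals
            if fx < fL[i]:
                L[i] = x[i]
                fL[i] = fx
        G = L[int(np.argmin(fL))].copy()


fht = first_hitting_time(lambda z: float(np.dot(z, z)), 0.0, 1e-4,
                         n=1, m=5, omega=0.7, c1=1.5, c2=1.5, a=10.0)
# fht is an int, or None if the budget was exhausted
```

When some runs are censored, averaging such counts over independent seeds only gives a lower estimate of the expected FHT. Suppose the swarm never hits the target with positive probability, as in Section 3. This then shows up as a fraction of censored runs that does not vanish as the budget grows.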

3 Stagnation

Particles do not necessarily converge to local optima. There are well-known configurations, e. g. with zero velocities, which lead to stagnation [15]. However, it is not obvious for which initial configurations and parameter settings the basic PSO will stagnate outside local optima. Here, we show that stagnation occurs already in one dimension for a broad range of initial parameters. It follows that the expected first hitting time of the basic PSO can be infinite.

As a first example of stagnation, we consider the basic PSO with swarm size one on the Sphere problem. Note that it is helpful to first study the PSO with swarm size one before analysing the behaviour of the PSO with larger swarm sizes. This is similar to the theory of evolutionary algorithms (EAs), where it is common to initiate runtime analyses on the simple (1+1) EA with population size one, before proceeding to more complex EAs.

Proposition 1.

The basic PSO with inertia factor and one particle () has infinite expected FHT on Sphere ().

Proof.

We say that the bad initialisation event has occurred if the initial position and velocity satisfy and This event occurs with positive probability. We claim that if the event occurs, then in any iteration , and If the claim holds, then for all , it holds that and . Therefore,

and the proposition follows. Note that since for each , the velocity reduces to .

The claim is proved by induction on . The base case clearly holds, because and . Assume the claim holds for all iterations smaller than . By induction, it holds that Therefore, by the induction hypothesis,

The claim now holds for all $t$. The expected FHT conditional on the bad initialisation event is therefore infinite. Since the bad initialisation event occurs with positive probability, the unconditional expected FHT is also infinite by the law of total expectation. ∎

We now show that the stagnation on Sphere illustrated in Proposition 1 is not an artefact of the trivial swarm size of one. In the following theorem, we prove stagnation for a swarm of size two, and we believe that the ideas can be generalised to larger swarm sizes. We allow any initialisation that places the two particles sufficiently far away from the optimum. It is assumed that both velocities are non-positive in the initialisation step, an event that occurs with constant probability for uniformly drawn velocities.

Theorem 1.

Consider the basic PSO with two particles on the one-dimensional Sphere. If , , , where

all hold together, then the expected FHT for the -ball around the optimum is infinite.

The conditions are fulfilled, e. g., if , , , , , and . For a proof, we note that the assumed initialisation with positive particle positions, negative velocities and sufficiently large makes the sequences , , non-increasing provided no negative values are reached. Furthermore, the update equation for the velocities will then consist of three random non-positive terms, which means that velocities remain negative. In Lemma 2, we focus on the distance between the particles and show that its expected absolute value converges to zero. The proof of this lemma makes use of Lemma 1, which gives a closed-form solution to a generalisation of the Fibonacci sequence. In Lemma 3, we consider the absolute velocities over time and show that the series formed by them also converges in expectation. The proof of the theorem will be completed by applications of Markov's inequality.

Lemma 1.

For any real , there exist two reals and such that the difference equation has the solution where

Proof.

The proof is by induction over . The lemma can always be satisfied for and by choosing appropriate and . Hence, assume that the lemma holds for all for some and . Note that

It therefore follows by the induction hypothesis that

Lemma 2.

Given , suppose that for all it holds that and . Then .

Proof.

The proof is mainly based on an inspection of the update equation of PSO. The aim is to obtain a recurrence for , where we have to distinguish between two cases. We abbreviate and in the following.

If , then and the update equations are

which means Since for , we obtain for , where we define to make the equation apply also for . Together, this gives us

(7)

If , then the update equations are and , which again results in (7) and finishes the case analysis. Taking absolute values on both sides of (7) and applying the triangle inequality to the right-hand side, we get

After taking the expectation and noting that , we have

which implies as the right-hand side is linear in both and . We are left with an estimate for . By the law of total probability,

Since is uniformly distributed and , the first conditional expectation is , while the second conditional expectation is . The probabilities for the conditions to occur are and , respectively. This results in

which finally gives us the following recurrence on :

Introducing and using , we have in more compact form that

for . Solving this recursion (noting that all terms are positive) using Lemma 1 yields for that

Note that if and only if . Furthermore, the factor in front of has clearly smaller absolute value than . We obtain which we wanted to show. ∎
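Lemma 1, used in the proof above, obtains its closed form from the characteristic roots of a second-order linear recurrence. As a minimal sketch of that technique, the Python code below assumes the hypothetical form $a_n = a_{n-1} + c\,a_{n-2}$ with $c > -1/4$. Under this assumption the roots $\lambda_{1,2} = (1 \pm \sqrt{1 + 4c})/2$ are real and distinct. The exact recurrence and constants of Lemma 1 may differ. The two constants $A$ and $B$ are fixed by the initial values, and the closed form is checked against direct iteration.

```python
import math


def closed_form(a0: float, a1: float, c: float, n: int) -> float:
    """a_n for the assumed recurrence a_n = a_{n-1} + c * a_{n-2}, c > -1/4.

    Uses a_n = A * lam1**n + B * lam2**n with the characteristic roots
    lam_{1,2} = (1 +/- sqrt(1 + 4c)) / 2 of lam^2 = lam + c.
    """
    s = math.sqrt(1.0 + 4.0 * c)
    lam1, lam2 = (1.0 + s) / 2.0, (1.0 - s) / 2.0
    # A and B are fixed by the initial values: A + B = a0, A*lam1 + B*lam2 = a1.
    A = (a1 - lam2 * a0) / (lam1 - lam2)
    B = a0 - A
    return A * lam1 ** n + B * lam2 ** n


def by_iteration(a0: float, a1: float, c: float, n: int) -> float:
    """a_n computed directly from the recurrence."""
    if n == 0:
        return a0
    prev, cur = a0, a1
    for _ in range(n - 1):
        prev, cur = cur, cur + c * prev
    return cur


print(by_iteration(0, 1, 1, 10))        # 55 (ordinary Fibonacci, c = 1)
print(round(closed_form(0, 1, 1, 10)))  # 55
```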

The following lemma uses the previous bound on to show that the expected sum of velocities converges absolutely over time. This means that the maximum achievable progress is bounded in expectation. As an example, when choosing and in the following, we obtain a value of about for this bound.

Lemma 3.

Suppose the prerequisites of Lemma 2 apply. Then for it holds that

Proof.

For notational convenience, we drop the upper index and implicitly show the following for both and . According to the update equation of PSO, we have for , using . Resolving the recurrence yields for that

Hence,

since and . Using the linearity of expectation,

By Lemma 2, Hence, the series over the converges according to

which yields

where we have used . ∎

We are ready to prove Theorem 1.

Proof of Theorem 1.

Throughout this proof, we assume that the prerequisites of Lemma 2 hold. As we will show, this is the case for all steps with constant probability.

For any finite , Lemma 3 and linearity of expectation yield for that

which by Markov’s inequality means that the event

occurs with a positive probability that does not depend on . Given the assumed initial values of , the -ball around the optimum is not reached if the event occurs. Hence, there is a minimum probability such that for any finite number of steps , the probability of not hitting the -ball within  steps is at least . Consequently, the expected first hitting time of the -ball is infinite. ∎
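The concluding argument can be written out generically. In the derivation below, $S_T$, $B$ and $\tau$ are placeholder symbols rather than the paper's notation:

- $S_T \ge 0$ is the sum of the absolute velocities of all particles over the first $T$ steps.
- $B > 0$ is a bound with $\mathrm{E}[S_T] \le B$ for all $T$, of the kind provided by Lemma 3.
- $\tau$ is the first hitting time counted in steps.

The derivation assumes that every particle starts more than $2B$ away from the target ball, so that $S_T < 2B$ excludes a hit within the first $T$ steps.

```latex
\begin{align*}
\Pr[S_T \ge 2B] &\le \frac{\mathrm{E}[S_T]}{2B} \le \frac{1}{2}
  \quad\Longrightarrow\quad
  \Pr[\tau > T] \ge \Pr[S_T < 2B] \ge \frac{1}{2} \quad \text{for every } T, \\
\mathrm{E}[\tau] &= \sum_{T=0}^{\infty} \Pr[\tau > T] \ge \sum_{T=0}^{\infty} \frac{1}{2} = \infty .
\end{align*}
```

Each step involves at least one function evaluation, so the FHT measured in evaluations is infinite as well.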

4 Mean Square Convergence

As mentioned in the introduction, there exist several convergence analyses using different techniques that take into account the stochastic effects of the algorithm. The analysis by Jiang et al. [8] is perhaps the one where the proof of mean square convergence follows most directly from the definition. They consider the basic PSO and prove the following statement (Theorem 5 in their paper):

Statement 1.

Given , if , , and are all satisfied, the basic particle swarm system determined by parameter tuple will converge in mean square to .

This statement is claimed to hold for any fitness function and for any initial swarm configuration. However, as acknowledged by the corresponding author [7], there is an error in the proof of the above statement, and the statement is in fact false without additional assumptions.

Intuitively, Statement 1 makes sense for well-behaved, continuous functions like Sphere. However, in retrospect, it is not too difficult to construct artificial fitness functions and swarm configurations for which the statement is wrong. Consider the one-dimensional function $f$ defined by $f(0) = 0$, $f(1) = 1$, and $f(x) = 2$ for all $x \in \mathbb{R} \setminus \{0, 1\}$, which is to be minimised.

Assume a swarm of two particles, where the first one has position $0$, which is then its local best and the global best. Furthermore, assume velocity 0 for this particle, i.e., it has stagnated. Formally, $x_1^t = L_1^t = G^t = 0$ and $v_1^t = 0$. Now let us say the second particle has current and local best position 1 and velocity 0, formally $x_2^t = L_2^t = 1$ and $v_2^t = 0$. This particle will now be attracted by a weighted combination of local and global best, e. g. the point $1/2$ if both acceleration coefficients are the same. The problem is that the particle's local best will almost surely never be updated again. The probability of sampling either the local best or the global best is $0$ if the sampling distribution is uniform on an interval of positive volume, or is the sum of two such distributions, as it is defined in the basic PSO. The sampling distribution might be deterministic because both and might be , but then the progress corresponds to the last velocity value. That value was again either obtained according to the sum of two uniform distributions or was already $0$. The error in the analysis is hidden in the proof of Theorem 4 in [8], where is concluded even though might be in a null set.

Nevertheless, important parts of the preceding analysis can be saved, and a theorem on convergence can be proved under additional assumptions on the fitness function. In the following, we describe the main steps in the convergence analysis by Jiang et al. [8]. A key idea in [8] is to consider a one-dimensional algorithm and an arbitrary particle, assuming that the local best for this particle and the global best do not change. Then a recurrence relation is obtained as follows:

$$x_{t+1} = (1 + \omega - c_1 r_t - c_2 R_t)\, x_t - \omega x_{t-1} + c_1 r_t L + c_2 R_t G,$$

where we dropped the index denoting the arbitrary particle we have chosen, and the time index for local and global best. The authors proceed by deriving sufficient conditions for the sequence of expectations $\mathrm{E}[x_t]$, $t \ge 0$, to converge (see [8]).
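The counterexample described above is easy to reproduce numerically. The following Python sketch starts from exactly that stagnated configuration and records the second particle's positions. It uses NumPy, and the parameter values $\omega = 0.7$, $c_1 = c_2 = 1.5$ as well as the function names are illustrative assumptions. In the run, the local best of the second particle stays at 1 and the first particle never moves. The positions of the second particle keep a non-vanishing spread instead of converging to $G = 0$.

```python
import numpy as np


def f_counter(x: float) -> float:
    """f(0) = 0, f(1) = 1 and f(x) = 2 for every other x (to be minimised)."""
    if x == 0.0:
        return 0.0
    if x == 1.0:
        return 1.0
    return 2.0


def run_counterexample(omega=0.7, c1=1.5, c2=1.5, steps=10_000, seed=0):
    """Basic PSO with two 1-D particles started in the stagnated configuration."""
    rng = np.random.default_rng(seed)
    x = np.array([0.0, 1.0])          # x_1 = 0, x_2 = 1
    v = np.zeros(2)                   # v_1 = v_2 = 0
    L = x.copy()                      # L_1 = 0, L_2 = 1
    fL = np.array([f_counter(xi) for xi in x])
    G = L[int(np.argmin(fL))]         # G = 0
    tail = []                         # positions of particle 2, second half
    for t in range(steps):
        r = rng.random(2)
        R = rng.random(2)
        v = omega * v + c1 * r * (L - x) + c2 * R * (G - x)
        x = x + v
        for i in range(2):
            fx = f_counter(x[i])
            if fx < fL[i]:            # almost surely never true here
                L[i] = x[i]
                fL[i] = fx
        G = L[int(np.argmin(fL))]
        if t >= steps // 2:
            tail.append(x[1])
    return L, G, x, float(np.var(tail))


L_final, G_final, x_final, spread = run_counterexample()
```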

Lemma 4.

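Given $c_1, c_2 \ge 0$, if and only if $-1 < \omega < 1$ and $0 < c_1 + c_2 < 4(1 + \omega)$, the iterative process $\{\mathrm{E}[x_t]\}$ is guaranteed to converge to $(c_1 L + c_2 G)/(c_1 + c_2)$.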

Even though a process converges in expectation, its variance might diverge. Intuitively, this means that it becomes more and more unlikely to observe the actual process in the vicinity of the expected value. Another major achievement by Jiang et al. [8] is the study of the variances of the still one-dimensional process. By a clever analysis of a higher-order recurrence, they obtain the following lemma (Theorem 3 in their paper).

Lemma 5.

Given , if and only if , and are all satisfied together, iterative process is guaranteed to converge to , where .

Lemma 5 means that the variance is proportional to $(G - L)^2$. However, in contrast to what Jiang et al. [8] would like to achieve in their Theorem 4, we do not see how to prove that the variance approaches 0 for every particle. Clearly, this happens for the global best particle under the assumption that no further improvements of the global best are found. We do not pursue this approach further since we are interested in PSO variants that converge to a local optimum.

5 Noisy PSO

The purpose of this section is to consider a variant of the basic PSO that includes a noise term. This PSO, which we call the Noisy PSO, is defined as in Algorithm 1, except that Eq. (5) is replaced by the velocity equation

$$v_{i,j}^{t+1} := \omega v_{i,j}^t + c_1 r \,(L_{i,j}^t - x_{i,j}^t) + c_2 R \,(G_j^t - x_{i,j}^t) + \Delta,$$

where the extra noise term $\Delta$ has uniform distribution on the interval $[-\delta/2, \delta/2]$. Note that our analysis seems to apply also when the uniform distribution is replaced by a Gaussian one with the same expectation and variance. The constant parameter $\delta > 0$ controls the noise level in the algorithm. Due to the uniformly distributed noise term, it is immediate that the variance of each particle is always at least $\delta^2/12$. Therefore, the Noisy PSO does not enjoy the mean square convergence property of the basic PSO. In return, the Noisy PSO does not suffer from the stagnation problems discussed in Section 3, and finite expected first hitting times can in some cases be guaranteed. To avoid stagnation, the Noisy PSO uses measures similar to those of the GCPSO mentioned in the introduction. However, our approach is simpler and treats all particles in the same way, whereas the GCPSO relies on a specific update scheme for the global best particle.
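The variance bound follows from a one-line computation. The computation assumes that $\Delta$ is uniform on $[-\delta/2, \delta/2]$ and independent of everything else drawn in the current step. The placeholder symbol $Y$ stands for the remaining part of the new position $x_{i,j}^{t+1}$.

```latex
\mathrm{Var}[\Delta] = \frac{1}{\delta}\int_{-\delta/2}^{\delta/2} u^{2}\,\mathrm{d}u
  = \frac{1}{\delta}\cdot\frac{2}{3}\Bigl(\frac{\delta}{2}\Bigr)^{3} = \frac{\delta^{2}}{12},
\qquad
\mathrm{Var}\bigl[x_{i,j}^{t+1}\bigr] = \mathrm{Var}[Y + \Delta] = \mathrm{Var}[Y] + \mathrm{Var}[\Delta] \ge \frac{\delta^{2}}{12}.
```

A Gaussian $\Delta$ with the same variance gives the same bound, since only $\mathrm{Var}[\Delta]$ and independence enter.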

Our main result considers the simplified case of a one-dimensional function but takes into account the whole particle swarm. For simplicity, we only consider the half-open positive interval by defining if , and otherwise, which has to be minimised, and assume that at least one particle is initialised in the positive region. This event happens with positive probability for a standardised initialisation scheme. It seems that our analyses can be adapted to the standard Sphere (and order-preserving transformations thereof), but changes of sign complicate the analysis considerably. Note that the analyses of stagnation in Section 3 only consider positive particle positions and thus apply to as well.

Theorem 2.

Consider the Noisy PSO on the function and assume . If and the assumptions from Theorems 3 and 4 below hold, then the expected first hitting time for the interval is finite.

The proof of this theorem relies heavily on the convergence analysis by Jiang et al. [8]. We will adapt their results to the Noisy PSO. Recall that the only difference between the two algorithms is the addition of $\Delta$ in the velocity update, and hence in the update of the particle position. It is important to note that $\Delta$ is drawn from $U[-\delta/2, \delta/2]$ (considering one dimension) fully independently for every particle and time step. As mentioned above, Jiang et al. [8] consider a one-dimensional algorithm and an arbitrary particle, assuming that the local best for this particle and the global best do not change. Then a recurrence relation is obtained by manipulating the update equations. Taking this approach for the Noisy PSO yields

$$x_{t+1} = (1 + \omega - c_1 r_t - c_2 R_t)\, x_t - \omega x_{t-1} + c_1 r_t L + c_2 R_t G + \Delta_t,$$

where we dropped the index for the dimension, the index denoting the arbitrary particle we have chosen and the time index for local and global best. This is the same recurrence relation as in [8] except for the addition of $\Delta_t$. The authors proceed by deriving sufficient conditions for the sequence of expectations $\mathrm{E}[x_t]$, $t \ge 0$, to converge. Since $\mathrm{E}[\Delta_t] = 0$, the recurrence relation for the expectations is exactly the same as for the basic PSO, and the following theorem can be taken over.

Theorem 3.

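Given $c_1, c_2 \ge 0$, if and only if $-1 < \omega < 1$ and $0 < c_1 + c_2 < 4(1 + \omega)$, the iterative process $\{\mathrm{E}[x_t]\}$ is guaranteed to converge to $(c_1 L + c_2 G)/(c_1 + c_2)$.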
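Theorem 3 concerns only the expectations; the noise term matters for the variance. The following Monte Carlo sketch simulates the particle for which $L = G$ in the Noisy PSO. It is written in Python with NumPy. The noise interval $[-\delta/2, \delta/2]$, the parameter values, the initialisation range and the sample sizes are illustrative assumptions. In the simulation, the empirical mean approaches $G$, as the unchanged expectation recurrence predicts. The empirical variance does not vanish and stays above $\delta^2/12$.

```python
import numpy as np


def noisy_stagnation_moments(omega, c1, c2, L, G, delta,
                             steps=600, runs=4000, seed=0):
    """Monte Carlo estimate of E[x_t] and Var[x_t] for the Noisy PSO with
    fixed L and G; the noise Delta_t ~ U[-delta/2, delta/2] is drawn
    independently for every copy and step.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, runs)  # initial positions (assumed range)
    v = rng.uniform(-1.0, 1.0, runs)  # initial velocities (assumed range)
    for _ in range(steps):
        r = rng.random(runs)
        R = rng.random(runs)
        noise = rng.uniform(-delta / 2.0, delta / 2.0, runs)  # Delta_t
        v = omega * v + c1 * r * (L - x) + c2 * R * (G - x) + noise
        x = x + v
    return float(x.mean()), float(x.var())


mean_noisy, var_noisy = noisy_stagnation_moments(0.7, 1.5, 1.5,
                                                 L=1.0, G=1.0, delta=0.1)
```

For comparison, running the same simulation with `delta=0.0` recovers the basic PSO. With these parameters, the variance of this particle then tends to 0.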

The next step is to study the variances of the one-dimensional process. Obviously, modifications of the original analysis in [8] become necessary here. To account for the addition of $\Delta_t$, we replace Eq. (11) of [8] by where and is the original from [8]. Regarding the quantities involving $\Delta_t$ in the following, we observe that and where we used that $\Delta_t$ is drawn independently of other random variables. Finally, we get which means that all following calculations in Section 3.2 in [8] may use the same values for the variables and as before. Only the variable increases by . Recall that for constant . We obtain . Now the iteration equation (17) for can be taken over with increased by . The characteristic equation  remains unchanged and Theorem  applies in the same way as before. Theorem 3 in [8] is updated in the following way and proved as before, except for plugging in the updated value of .

Theorem 4.

Given , if and only if , and are all satisfied together, iterative process is guaranteed to converge to , where

As a consequence of the preceding theorem, the variance remains positive even for the particle that satisfies $L = G$. Under simplifying assumptions, we show that this particle allows the system to approach the optimum. Later, we will show how to drop these assumptions.

Lemma 6.

Assume that , that the global and local bests are never updated, and that the conditions in Theorem 3 and Theorem 4 hold. Then for all sufficiently small there exists a such that

where is the position of the particle in iteration for which the local best position equals the global best position.

Proof.

We assume that , since otherwise there is nothing to show. Furthermore, cannot be negative since . We decompose the process by defining . Our goal is to prove that it is unlikely that is much larger than using Chebyshev's inequality. We therefore need to estimate the expectation and the variance of . From Theorem 3 and the fact that $L = G$ holds for the best particle,

(8)

To estimate the variance of , first recall that by Theorem 4, it holds that

(9)

Due to the independence of the random variables and , we have . The random variable has variance . The limit in (9) therefore implies that

(10)

where we have defined Combining Eq. (8) and Eq. (10) yields This limit implies that for any , there exists a such that and analogously . By the inequality above, and by Chebyshev's inequality, it holds that

Obviously, the larger is, the more restrictive the requirements on the outcome of  are. Hence, choosing so large that holds, we get the desired result

The previous lemma does not make any assumption on the objective function. With regard to , it implies that the global best (assuming