ConsensusBased Optimization on the Sphere I: WellPosedness and MeanField Limit
Abstract
We introduce a new stochastic KuramotoVicsektype model for global optimization of nonconvex functions on the sphere. This model belongs to the class of ConsensusBased Optimization methods. In fact, particles move on the sphere driven by a drift towards an instantaneous consensus point, computed as a convex combination of the particle locations weighted by the cost function according to Laplace’s principle. The consensus point represents an approximation to a global minimizer. The dynamics is further perturbed by a random vector field to favor exploration, whose variance is a function of the distance of the particles to the consensus point. In particular, as soon as the consensus is reached, then the stochastic component vanishes. In this paper, we study the wellposedness of the model and we derive rigorously its meanfield approximation for large particle limit.
Keywords: consensusbased optimization, stochastic KuramotoVicsek model, wellposedness, meanfield limit
1 Introduction
Over the last decades, large systems of interacting particles (or agents) are widely used to investigate selforganization and collective behavior. They frequently appear in modeling phenomena such as biological swarms [15, 19], crowd dynamics [2, 8, 9], selfassembly of nanoparticles [38] and opinion formation [3, 45, 35]. Similar particle models are also used in metaheuristics [1, 7, 11, 31], which provide empirically robust solutions to tackle hard optimization problems with fast algorithms. Metaheuristics are methods that orchestrate an interaction between local improvement procedures and global/high level strategies, and combine random and deterministic decisions, to create a process capable of escaping from local optima and performing a robust search of a solution space. Starting with the groundbreaking work of Rastrigin on Random Search in 1963 [49], numerous mechanisms for multiagent global optimization have been considered, among the most prominent instances we recall the Simplex Heuristics [46], Evolutionary Programming [27], the MetropolisHastings sampling algorithm [34], Genetic Algorithms [36], Particle Swarm Optimization (PSO) [40, 48], Ant Colony Optimization (ACO) [23], Simulated Annealing (SA), [37, 41]. Despite the tremendous empirical success of these techniques, most metaheuristical methods still lack proper mathematical proof of robust convergence to global minimizers. In fact, due to the random component of metaheuristics, their asymptotic analysis would require to discern the stochastic dependencies, which is a dounting task, especially for those methods that combine instantaneous decisions with memory mechanisms.
Recent work [47, 14] on ConsensusBased Optimization (CBO) focuses on instantaneous stochastic and deterministic decisions in order to establish a consensus among particles on the location of the global minimizers within a domain. In view of the instantaneous nature of the dynamics, the evolution of the system can be interpreted as a system of first order stochastic differential equations, whose large particle limit is approximated by a deterministic partial differential equation of meanfield type. The large time behavior of such a deterministic PDE can be analyzed by classical techniques of large deviation bounds and the global convergence of the meanfield model can be mathematically proven in a rigorous way for a large class of optimization problems. Certainly CBO is a significantly simpler mechanism with respect to more sophisticated metaheuristics, which can include different features including memory of past exploration. Nevertheless, it seems to be powerful and robust enough to tackle many interesting nonconvex optimizations [16, 28, 33], which would be harder to solve by gradient descent methods that have been dominating the field of optimization, especially in relevant applications such as machine learning. In fact in many of these problems the objective function is not differentiable and CBO do not use any gradient information for their exploration. Moreover, in certain models, such as training of deep neural networks, the gradient tends to explode or vanish [10]. Finally, although there exist situations where global optimization by gradient descent methods can be ensured under ad hoc conditions, see, e.g., [17, 43, 42], they do not offer in general guarantees of global convergence. Instead, CBO has potential of a rather general and rigorous global asymptotic/convergence analysis. Some theoretical gaps remain open in the analysis of CBO though, in particular the rigorous derivation of the meanfield limit, due to the difficulty in establishing bounds on the moments of the probability distribution of the particles [14].
Motivated by these theoretical gaps and several potential applications in machine learning, the main task of the present work is to develop a CBO approach to solve the following constrained optimization problem
(1.1) 
where is a given continuous cost function, which we wish to minimize over the sphere. In particular, we consider a system of interacting particles satisfying the following stochastic KuramotoVicsektype dynamics expressed in Itô’s form
(1.2) 
where is a stuitable drift parameter, a diffusion parameter,
(1.3) 
is the empirical measure of the particles ( is the Dirac measure at ), and
(1.4) 
For ease of notation, for any vector we may write to mean and is the Euclidean norm. This stochastic system is considered complemented with independent and identical distributed (i.i.d.) initial data with , and the common law is denoted by . The trajectories denote independent standard Brownian motions in . In (1.2) the projection operator is defined by
(1.5) 
The choice of the weight function in (1.4) comes from the wellknown Laplace principle [44, 22, 47], a classical asymptotic method for integrals, which states that for any probability measure , it holds
(1.6) 
Let us discuss the mechanism of the dynamics. The righthandside of the equation (1.2) is made of three terms. The first deterministic term imposes a drift to the dynamics towards , which is the current consensus point at time as approximation to the global minimizer. The second stochastic term introduces a random decision to favor exploration, whose variance is function of the distance of the particles to the current consensus points. In particular, as soon as consensus is reached, then the stochastic component vanishes. The last term combined with is needed to ensure that the dynamics stays on the sphere despite the Brownian motion component. In fact, we will initially analyze the wellposedness of the system (1.2) as defined in the whole space instead of being immediately considered constrained on the sphere , and we will check afterwards that, if the initial data is set on the sphere, then the particles remain there at all times. We further notice that the dynamics does not make use of any derivative of , but only of its pointwise evaluations, which appear integrated in (1.4). Hence, the equation can be in principle numerically implemented at discrete times also for cost functions which are just continuous and with no further smoothness. We require below more regularity to exclusively to ensure formal wellposedness of the evolution.
The main results of this paper are about the wellposedness of (1.2) and its rigorous meanfield limit  which is an open issue for unconstrained CBO [14]  to the following nonlocal, nonlinear FokkerPlanck equation
(1.7) 
with the initial data . Here is a Borel probability measure on and
The operators and denote the divergence and LaplaceBeltrami operator on the sphere respectively. The meanfield limit will be achieved through the coupling method [50, 25, 39, 12] by introducing an auxiliary monoparticle process, satisfying the selfconsistent nonlinear SDE
(1.8) 
with the initial data distributed according to . Here we require , and we will show that as a measure on solves the PDE (1.7). We call the SDE (1.8) the meanfield dynamic, and the PDE (1.7) is called the meanfield PDE. In our follow up paper [28], for more regular datum , we also prove existence and uniqueness of distributional solutions at any finite time . This model and the analysis we provide in this paper are very much inspired by the homogenous version of the kinetic KolmogorovKuramotoVicsek model, which was formally derived by Degond and Motsch [21] as a meanfield limit of the discrete Vicsek model [4, 18, 51]. Recently, the stochastic Vicsek model has received extensive attention in the mathematical community to establish its meanfield limit, hydrodynamic limit, and phase transitions. Bolley, Cañizo and Carrillo [13] have rigorously justified the meanfield limit in case of smoothed force field, and Gamba and Kang [30] and Figalli, Kang, and Morales [26] extended the global existence and uniqueness of weak solutions of the kinetic KolmogorovKuramotoVicsek model to the case of singular meanfield force field. Recently, Degond, Frouvelle and Liu [20] provided also a complete and rigorous description of phase transitions such as the number and nature of equilibria, stability, convergence rate, phase diagram and hysteresis. Our results build very much upon these developments.
Before starting our analysis of (1.2), (1.8), and (1.7), let us illustrate numerically the behavior of the dynamics in the case of the minimum solution of the Ackley function for constrained over the sphere
(1.9) 
with , , , and with .
The global minimum is attained at . In Figure 1 we report the standard Ackley function on the plane for and the case constrained over the half sphere .
In the reported simulations we initialize the particles with a uniform distribution over the half sphere characterized by and employ a simple EulerMaruyama scheme with projection and time step . We report in Figure 2 the particle trajectories for in the case of , , and . On the left we consider the case with minimum at , on the right the case with minimum at . The time evolution of the particle distribution in the numerical mean field limit for is also reported in the upper part of the same figure.
As these figures show it, the dynamics consistently converges to the global minimizers totally regardless of the many local minimizers and the large particle system does converges numerically to its meanfield approximation.
Let us conclude this introduction by mentioning that the optimization on the sphere offers further numerous advantages, besides allowing a rigorous proof of meanfield limit. First of all a vast class of optimization problems can be reduced to constrained optimizations over the sphere: in the companion paper [28], where we focus on the numerical implementation of the method and its asymptotic/convergence analysis to global minimizers, we present also a few relevant and challenging applications in signal processing and machine learning.
Due to compactness of the sphere, local smoothness and boundedness requirements on are necessarily a uniform and global property. However, against these properties that greatly simplify the analysis of the wellposedness of the system and its meanfield limit, the specific topology of the sphere makes it surprisingly harder to prove asymptotic convergence of the dynamics to global miminizers, requiring major technical variations with respect to the approach of unconstrained CBO [14]. We refer to the companion paper [28] for the details of the large time analysis of solutions of (1.7) and examples of applications in highdimension.
2 Wellposedness
This section focuses on proving the wellposedness for the particle system (1.2), the meanfield dynamic (1.8) and the meanfield PDE (1.7). To ensure this, we make the following smoothness assumption on the objective function .
Assumption 2.1.
The objective function is locally Lipschitz continuous.
2.1 Wellposedness for the interacting particle system (1.2)
The difficulty in showing first the wellposedness of (1.2) in the ambient space is that the projection is not defined for (singularity) and is unbounded for (blowup) . In order to overcome this problem, we regularize the diffusion and drift coefficients, that is, we replace them with appropriate functions and respectively: let be a matrix valued map on with bounded derivatives of all orders such that for all , and be a valued map on , again with bounded derivatives of all orders, such that if . It is also useful to mention that
(2.1) 
and
(2.2) 
hold for any . For later use, let us denote . Additionally, we regularize the locally Lipschitz function outside : Let us introduce satisfying the following assumptions
Assumption 2.2.
The regularized extension function is globally Lipschitz continuous and satisfies the properties

when , and when ;

for all for a suitable global Lipschitz constant ;

.
Given such , and from Assumption 2.2, we introduce the following regularized particle system
(2.3) 
for , where
(2.4) 
Next we can deduce that the coefficients in (2.3) are locally Lipschitz continuous and have linear growth. More precisely, we have the following result.
Lemma 2.1.
Let , be arbitrary and satisfy Assumption 2.2. Then for any , and corresponding empirical measures and it holds
(2.5) 
and
(2.6) 
where . Here we used the notations for norms of vectors and .
Proof.
Theorem 2.1.
Let be a probability measure on and, for every , be i.i.d. random variables with the common law . For every , there exists a pathwise unique strong solution to the particle system (1.2) with the initial data . Moreover it holds that for all and any .
Proof.
Given , and , the SDE (2.3) has locally Lipschitz coefficients, so it admits a pathwise unique local strong solution by standard SDE wellposedness result [24, Chap. 5, Theorem 3.1]. Moreover, it follows from Itô’s formula (5.1) (choosing ), as long as , that
(2.11) 
where is the th component of , and we have used the property of as in (2.2). Hence for all , which ensures that the solution keeps bounded at finite time, hence we have a global solution. Since all have norm 1, the solution to the regularized system (2.3) is a solution to (1.2), which provides the global existence of solutions to (1.2).
To show pathwise uniqueness let us consider two solutions to (1.2) for the same initial distribution and Brownian motion. According to the above computation these two solutions stay on the sphere for any , hence they are solutions to the regularised system (2.3), whose solutions are pathwise unique due to the locally Lipschitz continuous coefficients. Hence we have uniqueness for solutions to (1.2). ∎
2.2 Wellposedness for the meanfield dynamic (1.8)
For readers’ convenience, we give a brief introduction of the Wasserstein metric in the following definition, we refer to [5] for more details.
Definition 2.1 (Wasserstein Metric).
Let and be the space of Borel probability measures on with finite second moment. We equip this space with the Wasserstein distance
(2.12) 
where denotes the collection of all Borel probability measures on with marginals and in the first and second component respectively. The Wasserstein distance can also be expressed as
(2.13) 
where the infimum is taken over all joint distributions of the random variables , with marginals , respectively.
Notice now that, for any
(2.14) 
A direct application of above leads to
(2.15) 
with .
Lemma 2.2.
Assume that (with compact support), then the following stability estimate holds
(2.16) 
for any , where .
Proof.
Let us compute the difference
where is an arbitrary coupling of and . We can write the integrand as follows
where the last two terms on the right hand side can be rewritten as
Under Assumption 2.2, one can obtain the bounds
(2.17) 
and
(2.18) 
Similar to the estimate of , we estimate
(2.19) 
Notice that , so for any one has
(2.20) 
Collecting the above estimates, we obtain
(2.21) 
where depends only on and . Lastly, optimizing over all couplings yields (2.16). ∎
The following theorem states the wellposedness for the meanfield dynamic (1.8).