Distributed Optimization in Fixed-Time

Mayank Baranwal
Department of Electrical Engineering & Computer Science
University of Michigan
Ann Arbor, MI 48109
mayankb@umich.edu
Kunal Garg
Department of Aerospace Engineering
University of Michigan
Ann Arbor, MI 48109
kgarg@umich.edu
Alfred O. Hero
Department of Electrical Engineering & Computer Science
University of Michigan
Ann Arbor, MI 48109
hero@umich.edu
Dimitra Panagou
Department of Aerospace Engineering
University of Michigan
Ann Arbor, MI 48109
dpanagou@umich.edu
Abstract

This paper introduces the fixed-time distributed convex optimization problem for continuous-time multi-agent systems under a time-invariant topology. A novel nonlinear protocol, analyzed with tools from Lyapunov theory, is proposed to minimize the sum of the agents' convex objective functions in a fixed time. Each agent in the network can access only its private objective function, while exchange of local information is permitted between neighbors. While distributed optimization protocols with finite-time convergence have been proposed in the literature, to the best of our knowledge, this study is the first to investigate a protocol that achieves distributed optimization in fixed time. We propose an algorithm that achieves consensus of neighbors' information and convergence to a global optimum of the cumulative objective function in a fixed time. Numerical examples corroborate our theoretical analysis.

1 Introduction

Over the past decade, distributed optimization problems for multi-agent systems have received considerable attention, driven primarily by the ever-growing size and complexity of datasets, privacy concerns, and communication constraints among multiple agents nedic2010constrained ; lu2012zero ; nedic2015distributed ; lin2017distributed ; pan2018distributed . These distributed convex optimization problems take the following form:

\[ \min_{x\in\mathbb{R}^n} \; f(x) \;=\; \sum_{i=1}^{N} f_i(x), \tag{1} \]

where $f:\mathbb{R}^n\to\mathbb{R}$ is the team objective function and $f_i:\mathbb{R}^n\to\mathbb{R}$ represents the local objective function of the $i$-th agent. The functions $f_i$ are assumed to be convex and twice differentiable. It is further assumed that the agents are only aware of their local objective functions, i.e., each function $f_i$ is known only to the $i$-th agent, while agents can exchange relevant information with their neighbors.

Distributed optimization problems find applications in several domains including, but not limited to, sensor networks rabbat2004distributed , formation control wang2012leader , satellite tracking hu2016smooth , and large-scale machine learning nathan2017optimization . Another class of distributed optimization problems, primarily referred to as distributed constraint optimization (DCOP) in the literature, deals with discrete variables and combinatorial constraints pearce2007quality ; petcu2007mb ; petcu2007pc ; leaute2009frodo and finds relevance in scheduling and planning tasks. Most prior work on distributed convex optimization is concerned with discrete-time algorithms blatt2007convergent ; nedic2010constrained ; nedic2017achieving . In recent years, the use of dynamical systems for continuous-time optimization has emerged as a viable alternative wang2011control ; lu2012zero ; liu2014continuous ; feng2017finite ; lin2017distributed ; pan2018distributed ; hu2018distributed . This viewpoint allows tools from Lyapunov theory and differential equations to be employed for the analysis and design of optimization procedures.
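This dynamical-systems viewpoint can be made concrete with a minimal sketch (the quadratic objective and step size below are illustrative choices of ours, not taken from the paper): forward-Euler discretization of the gradient flow $\dot{x} = -\nabla f(x)$ recovers ordinary gradient descent.

```python
# Sketch: an optimizer viewed as a dynamical system. Forward-Euler
# discretization of dx/dt = -grad f(x) is ordinary gradient descent.

def grad_f(x):
    # Gradient of the convex function f(x) = (x - 3)^2 + 1.
    return 2.0 * (x - 3.0)

def gradient_flow(x0, dt=0.01, steps=2000):
    """Integrate dx/dt = -grad f(x) with forward Euler."""
    x = x0
    for _ in range(steps):
        x -= dt * grad_f(x)
    return x

x_final = gradient_flow(x0=10.0)
print(abs(x_final - 3.0) < 1e-6)  # True: trajectory settles at x* = 3
```

Shrinking the step size makes the iterates approximate the continuous-time flow arbitrarily well, which is why Lyapunov arguments for the flow inform the discrete algorithm.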

In lu2012zero , a continuous-time zero-gradient-sum (ZGS) algorithm with an exponential convergence rate was proposed, which, combined with a finite-time consensus protocol, achieves finite-time convergence in feng2017finite . A drawback of ZGS-type algorithms is the requirement of strong convexity of the local objective functions, as well as the choice of specific initial conditions for the agents such that $\sum_{i=1}^{N}\nabla f_i(x_i(0)) = 0$. In lin2017distributed , a novel continuous-time distributed optimization algorithm with private (nonuniform) gradient gains is proposed that achieves convergence in finite time. A finite-time tracking and consensus based algorithm was recently proposed in hu2018distributed , which again achieves convergence in finite time.

The notion of finite-time optimization garg2018new is closely related to finite-time stability (FTS) in control theory. In contrast to asymptotic stability (AS), FTS guarantees convergence of solutions within a finite time. In bhat2000finite , the authors introduce necessary and sufficient conditions in terms of a Lyapunov function for continuous, autonomous systems to exhibit FTS. Fixed-time stability (FxTS) polyakov2012nonlinear is a stronger notion than FTS, in which the time of convergence does not depend upon the initial condition.

To the best of our knowledge, distributed optimization procedures with fixed-time convergence have not been addressed in the literature. Many practical applications, such as time-critical classification, autonomous distributed systems, and economic dispatch in power systems, often undergo frequent and severe changes in operating conditions, and thus require fast solutions irrespective of the initial conditions. This paper proposes a novel nonlinear distributed convex optimization algorithm with provable fixed-time convergence guarantees. The proposed procedure is a distributed tracking and consensus based algorithm, in which both average consensus and tracking are achieved in fixed time by leveraging tools from FxTS theory. Assumptions on strong convexity are also relaxed, so the proposed algorithm generalizes to a broader class of convex objective functions. Moreover, the stability and optimality of the proposed algorithm are guaranteed using Lyapunov theory.

The rest of the paper is organized as follows: Section 2 describes some definitions and lemmas that are useful for designing the fixed-time distributed optimization protocol described in Section 3. The protocol is then validated on relevant example scenarios in Section 4, including distributed training of support vector machines. We then conclude our discussion with interesting directions for future work.

A note on mathematical notation: We use $\mathbb{R}$ to denote the set of real numbers and $\mathbb{R}_+$ to denote the non-negative reals. Given a function $f:\mathbb{R}^n\to\mathbb{R}$, the gradient and the Hessian of $f$ at $x$ are denoted by $\nabla f(x)$ and $\nabla^2 f(x)$, respectively. The number of agents or nodes is denoted by $N$. Given $x\in\mathbb{R}^n$, $\|x\|$ denotes the 2-norm of $x$. The symbol $\mathcal{G} = (\mathcal{A}, \mathcal{V})$ represents an undirected graph with adjacency matrix $\mathcal{A} = [a_{ij}]$ and set of nodes $\mathcal{V}$. The set of 1-hop neighbors of node $i$ is represented by $\mathcal{N}_i$. The second smallest eigenvalue of a matrix is denoted by $\lambda_2(\cdot)$. Finally, we define the signed-power function

\[ \operatorname{sig}(z)^{\mu} \;=\; \operatorname{sign}(z)\,|z|^{\mu}, \qquad \mu \ge 0, \tag{2} \]

where $\operatorname{sig}(z)^{\mu}$ is understood element-wise when $z$ is a vector.
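Assuming (2) is the standard signed-power function $\operatorname{sig}(z)^{\mu} = \operatorname{sign}(z)|z|^{\mu}$ common in the FxTS literature (an assumption on our part, since the displayed equation did not survive extraction), a minimal sketch:

```python
import math

def sig(x, mu):
    """Signed power: sig(x)^mu = sign(x) * |x|^mu, applied element-wise to lists."""
    if isinstance(x, list):
        return [sig(v, mu) for v in x]
    return math.copysign(abs(x) ** mu, x)

print(sig(-4.0, 0.5))         # -2.0 (odd in x: preserves the sign)
print(sig(9.0, 0.5))          # 3.0
print(sig([-9.0, 9.0], 0.5))  # [-3.0, 3.0]
```

The function is odd and continuous, which is what the consensus and estimation laws later rely on.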

2 Problem Formulation and Preliminaries

We focus on the distributed optimization of a sum of convex functions $f_i$, as described in (1), under a fixed-time constraint. The functions $f_i$ are assumed to be convex and twice differentiable. Let $x_i\in\mathbb{R}^n$ represent the state of agent $i$. For simplicity, we model agent $i$ as a first-order integrator system, given by:

\[ \dot{x}_i(t) \;=\; u_i\big(x_i(t), \{x_j(t)\}_{j\in\mathcal{N}_i}\big), \tag{3} \]

where the right-hand side depends upon the state of agent $i$ and the states of the neighboring agents $j\in\mathcal{N}_i$. For the sake of brevity, we denote the dynamical equation (3) as:

\[ \dot{x}_i \;=\; u_i, \tag{4} \]

where $u_i$ can be regarded as a control input. Our objective is to design a control algorithm such that consensus to the optimizer is achieved in fixed time for any initial conditions $x_i(0)$, i.e., $x_i(t) = x^*$ for all $i$ and all $t \ge T$, where $x^*$ optimizes the team objective function in (1) and the convergence time $T$ is fixed by the designer.

2.1 Overview of FxTS

In this section, we present relevant definitions and results on FxTS. Consider the system:

\[ \dot{x}(t) \;=\; g\big(x(t)\big), \qquad x(0) = x_0, \tag{5} \]

where $x\in\mathbb{R}^n$, $g:\mathbb{R}^n\to\mathbb{R}^n$ and $g(0) = 0$. As defined in bhat2000finite , the origin is said to be an FTS equilibrium of (5) if it is Lyapunov stable and finite-time convergent, i.e., $\lim_{t\to T(x_0)} x(t) = 0$ for all $x_0\in\mathcal{U}\setminus\{0\}$, where $\mathcal{U}$ is some open neighborhood of the origin and the settling time $T(x_0)$ depends upon the initial condition $x_0$. The authors in polyakov2012nonlinear presented the following result for fixed-time stability, where the time of convergence does not depend upon the initial condition.

Lemma 1 (polyakov2012nonlinear ).

Suppose there exists a continuously differentiable, positive definite, radially unbounded function $V$ for system (5) such that

\[ \dot{V}(x) \;\le\; -a\,V(x)^{p} - b\,V(x)^{q} \qquad \text{for all } x \ne 0, \tag{6} \]

with $a, b > 0$, $p > 1$ and $0 < q < 1$. Then, the origin of (5) is FxTS, i.e., $x(t) = 0$ for all $t \ge T$, where the settling time $T$ satisfies

\[ T \;\le\; \frac{1}{a(p-1)} + \frac{1}{b(1-q)}. \tag{7} \]

In this paper, we will only need to exhibit a Lyapunov function satisfying (6) for a particular choice of the exponents $p$ and $q$.
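A numerical illustration of Lemma 1, on a toy scalar system of our own choosing: for $\dot{x} = -\operatorname{sig}(x)^{2} - \operatorname{sig}(x)^{1/2}$, the function $V = |x|$ satisfies $\dot{V} \le -V^{2} - V^{1/2}$, so (7) gives $T \le 1/(a(p-1)) + 1/(b(1-q)) = 1 + 2 = 3$ for every initial condition.

```python
import math

def sig(x, mu):
    # Signed power used in the FxTS literature: sign(x) * |x|^mu.
    return math.copysign(abs(x) ** mu, x)

def settle(x0, dt=1e-4, t_end=3.0):
    """Forward-Euler simulation of dx/dt = -sig(x)^2 - sig(x)^0.5 up to t_end."""
    x = x0
    for _ in range(int(t_end / dt)):
        x += dt * (-sig(x, 2.0) - sig(x, 0.5))
    return x

for x0 in (0.5, 10.0, 100.0):
    print(abs(settle(x0)) < 1e-3)  # True: settled before T = 3 for every x0
```

Note how the superlinear term dominates far from the origin and the sublinear term dominates near it; this pairing is exactly what decouples the settling time from the initial condition.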

3 FxTS Distributed Optimization

Consider a system consisting of $N$ nodes with graph structure $\mathcal{G}$ defining the communication links between the nodes. The objective is to find $x^*\in\mathbb{R}^n$ that solves

\[ \min_{x\in\mathbb{R}^n} \; \sum_{i=1}^{N} f_i(x). \tag{8} \]

In this work, we assume that the minimizer $x^*$ of (8) exists (i.e., the optimal point is attained) and is unique. Unlike previous work (e.g., liu2014continuous ; feng2017finite ), we do not require the objective functions to be strongly convex, or of a particular functional form. Furthermore, in contrast with feng2017finite , where the initial conditions are required to satisfy conditions such as $\sum_{i=1}^{N}\nabla f_i(x_i(0)) = 0$, we do not need any such conditions. In other words, we show fixed-time convergence for arbitrary initial conditions. We make the following assumption on the inter-node communications.

Assumption 1.

The communication topology between the agents is connected and undirected, i.e., the underlying graph $\mathcal{G}$ is connected and the adjacency matrix $\mathcal{A}$ is symmetric.

Assumption 2.

The functions $f_i$ are convex and twice differentiable, and the Hessian $\nabla^2 f_i(x)$ is invertible for all $x\in\mathbb{R}^n$.

Assumption 3.

Each node $i$ receives information from each of its neighboring nodes $j\in\mathcal{N}_i$.

Our approach to fixed-time multi-agent distributed optimization is based on first prescribing a centralized fixed-time protocol that relies upon global information. Then, the quantities appearing in the centralized protocol are estimated in a distributed manner. In summary, the algorithm proceeds by first estimating the global quantities (as defined in (10)) of the centralized protocol, then driving the agents to reach average consensus ($x_i = x_j$ for all $i, j$), and finally driving the common trajectory to the optimal point $x^*$, all in a fixed time. Recall that the agents are said to have reached consensus if $x_i = x_j$ for all $i, j\in\mathcal{V}$. To this end, we first define a novel centralized fixed-time protocol in the following theorem:

Theorem 1 (Centralized fixed-time protocol).

Suppose the dynamics of each agent $i$ in the network is given by:

(9)

where the common input $u$ is based on global (centralized) information as described below:

(10)

and the agents are initialized identically, i.e., $x_i(0) = x_j(0)$ for all $i, j$. Then the trajectories of all agents converge to the optimal point $x^*$, i.e., the minimizer of the team objective function (8), in a fixed time $T$.

Note that the states of all the agents are driven by the same input $u$ and are initialized to the same starting point. In a distributed setting, this behavior corresponds to agents that have already reached consensus and are subsequently driven by a common input.

Remark 1.

Theorem 1 represents a centralized protocol for convex optimization of the team objective function. Here, the agents are always in consensus and have access to the global information in (10). In the distributed setting, agents only have access to their local information $f_i$ and $x_i$, and will not always be in consensus. Below we propose fixed-time schemes for estimating the global quantities so that the agents reach consensus in fixed time.

Proof.

The proof is based on choosing an appropriate candidate Lyapunov function satisfying (6), such that its time-derivative meets the conditions for fixed-time convergence in Lemma 1. We consider a candidate Lyapunov function $V$ that is positive definite with respect to the minimizer of the team objective function (1), i.e., $V = 0$ only at the minimizer. Taking its time-derivative along (9), we obtain an upper bound of the form (6), expressed in terms of the signed-power function defined in (2), with exponents $p > 1$ and $0 < q < 1$. Hence, using Lemma 1, it follows that there exists a fixed time $T$, satisfying the bound (7), such that $V(x(t)) = 0$ for all $t \ge T$.

Now, $V = 0$ implies $\nabla f(x) = 0$, which implies $x(t) = x^*$ for all $t \ge T$. Hence, the trajectories of (9) reach the optimal point $x^*$ in a fixed time $T$, starting from any initial condition. ∎

Now, for each agent , let us define the (vectorized) estimates of the global (centralized) quantities, , by

(11)

where denotes the j-th column of the matrix , for some . Let , and define as:

(12)

where the global quantities are as defined in (10). We consider the following continuous-time update rule for the estimates in (11):

(13)

where , and . Note that the quantity is updated in a distributed manner. Finally, denote the time-derivative of the last quantity in (11) by , i.e.,

(14)

Assume that the quantity in (14) is uniformly bounded in time by some constant. Under this assumption, we have the following result on fixed-time distributed parameter estimation.

Theorem 2 (Fixed-time parameter estimation).

Let the local estimates in (11) be initialized at the origin for each agent, and let the control gain in (13) be sufficiently large relative to the bound on (14). Then there exists a fixed time $T_1$ such that the estimates of each agent equal the corresponding global quantities in (10) for all $t \ge T_1$ and all $i\in\mathcal{V}$.

Proof.

We refer the reader to Section B in the supplementary material for detailed derivation. ∎

Remark 2.

Theorem 2 states that if the control gain is sufficiently large, then the agents estimate the global information in (10) in a distributed manner. Theorem 2 only guarantees that the estimates are exact for all $t \ge T_1$ and all $i\in\mathcal{V}$. However, in order to employ the centralized fixed-time protocol, the agents must additionally reach consensus in their states $x_i$, so that the local estimate maps to the common centralized input for each agent $i$.
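The estimator (13) is built on consensus among neighbors; its linear core, averaging a private quantity over 1-hop neighbors, can be sketched as follows (a simplified asymptotic stand-in for the fixed-time law (13); the path graph and values are our own illustrative choices):

```python
# Each agent holds a private value theta_i and continuously averages its
# running estimate with its 1-hop neighbors (path graph 1-2-3-4-5).
# Estimates converge to the global mean, the quantity the distributed
# estimator needs to reproduce without any all-to-all communication.

neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
theta = [4.0, 0.0, 8.0, 2.0, 6.0]   # private local quantities
est = list(theta)                   # each agent's running estimate

dt = 0.1
for _ in range(2000):               # Euler steps of dx_i/dt = sum_j (x_j - x_i)
    est = [e + dt * sum(est[j] - e for j in neighbors[i])
           for i, e in enumerate(est)]

mean = sum(theta) / len(theta)      # 4.0
print(all(abs(e - mean) < 1e-6 for e in est))  # True
```

The linear law converges only exponentially; the signed-power terms in (13) are what upgrade this to a fixed-time guarantee.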

In order to achieve consensus and optimal tracking, we propose the following update rule for each agent in the network:

(15)

where the first term is as described in (12), and the second term is defined via locally averaged signed differences:

(16)

where the exponents and gains are positive design constants. The following results establish that the state update rule (15) proposed for each agent ensures that the agents reach global consensus and optimality in fixed time.
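The consensus component can be illustrated with the standard fixed-time consensus protocol of [27] as a stand-in for (15)-(16); the exponents, graph, and initial states below are our own illustrative choices:

```python
import math

def sig(x, mu):
    # Signed power: sign(x) * |x|^mu.
    return math.copysign(abs(x) ** mu, x)

# dx_i/dt = sum_{j in N_i} [ sig(x_j - x_i)^p + sig(x_j - x_i)^q ],  p > 1, 0 < q < 1
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
x = [0.0, 1.0, 2.0, 3.0, 10.0]
mean0 = sum(x) / len(x)

dt = 1e-3
for _ in range(20000):              # simulate up to t = 20
    x = [xi + dt * sum(sig(x[j] - xi, 1.5) + sig(x[j] - xi, 0.5)
                       for j in neighbors[i])
         for i, xi in enumerate(x)]

print(max(x) - min(x) < 1e-2)               # True: agents agree
print(abs(sum(x) / len(x) - mean0) < 1e-6)  # True: the agreement value is the average
```

Because the per-edge flow is odd and the graph undirected, the state sum is invariant, so the common value is the initial average, which is what the optimal-tracking phase then steers to $x^*$.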

Theorem 3 (Fixed-time consensus).

Under the update law (15), with the consensus term defined as in (16) and with the estimates of Theorem 2 exact for all $i\in\mathcal{V}$ and $t \ge T_1$, the closed-loop trajectories of (4) converge to a common point in a fixed time $T_2$, i.e., $x_i(t) = x_j(t)$ for all $i, j\in\mathcal{V}$ and all $t \ge T_2$.

Proof.

We refer the reader to Section C in the supplementary material for detailed derivation. ∎

Finally, the following corollary establishes that the agents track the optimal point $x^*$ in a fixed time.

Corollary 1 (Fixed-time distributed optimization).

Let each agent in the network be driven by the control input (15). Suppose the agents operate under consensus, i.e., $x_i = x_j$ for all $i, j$, and additionally, the estimates of the global quantities are exact for all $t \ge \max\{T_1, T_2\}$, where $T_1$ and $T_2$ are described in Theorems 2 and 3, respectively. Then the agents track the minimizer $x^*$ of the team objective function in a fixed time.

Proof.

The proof follows directly from the previous results. From Theorems 2 and 3, it follows that the agents are in consensus and possess exact estimates of the global quantities for all $t \ge \max\{T_1, T_2\}$. Thus, the conditions of the centralized fixed-time protocol in Theorem 1 are satisfied, and therefore there exists $T \ge \max\{T_1, T_2\}$ such that $x_i(t) = x^*$ for all $t \ge T$ and all $i\in\mathcal{V}$, where $x^*$ is an optimal solution to the distributed optimization problem (8). ∎

The overall fixed-time distributed optimization protocol is described in Algorithm 1.

1:procedure FxTS Distributed Optimization
2:     Initialize parameters as described in (12), (13) and (16)
3:     For each agent $i$:
4:     FxTS Parameter Estimation
5:     while $t < T_1$ (i.e., until the estimates in (11) match the global quantities for all $i$) do
6:         Simulate (11) using control law (13)
7:     end while
8:     FxTS Consensus
9:     while $t < T_2$ (i.e., until $x_i = x_j$ for all $i, j$) do
10:         Simulate (4) using control law (15)
11:     end while
12:     FxTS Optimal Tracking
13:     while $t < T$ (i.e., until $x_i = x^*$) do
14:         Continue simulating (4) using control law (15)
15:     end while
16:     return $x_i$
17:end procedure
Algorithm 1 Fixed-time distributed optimization algorithm.

4 Numerical Examples

In this section, we present numerical examples demonstrating the efficacy of the proposed method. In each of the following examples, the graph topology is a path, i.e., node $i$ is connected to node $i+1$. We use a semilog scale to clearly show the variation near 0, while the linear-scale plot is shown in the inset of each figure. The simulation parameters in Theorems 1-3 can be chosen arbitrarily as long as the respective conditions are satisfied.

4.1 Example 1: Distributed Optimization with Heterogeneous Convex Functions

We present a case study in which multiple agents aim to minimize the sum of heterogeneous private functions in fixed time. A graph consisting of 11 nodes is considered, with the local and private objective functions described by:

(17)

so that each local objective is convex, and the minimizer of the team objective can be computed in closed form for comparison. For simplicity, we use common values of the exponents and gains in (16), (10) and (13). With these parameters, the bounds of Theorems 1-3 yield a finite final time of convergence $T$.

Figure 1: Example 1 - The gradient of the objective function with time for various initial conditions.
Figure 2: Example 1 - Individual states with time. The states converge to the optimal point $x^*$.

Figure 1 shows the variation of the gradient of the team objective with time for various initial conditions. For all initial conditions, the gradient drops to a negligible value well within the prescribed time. Figure 2 plots the individual states with time and shows their convergence to the optimal point $x^*$, again well within the prescribed time.

4.2 Example 2: Distributed Support Vector Machine

Required data sharing (gradients, parameter updates) raises issues in data-parallel learning due to the increased computational and communication overhead. In distributed (data-parallel) learning, minibatches are split across multiple nodes, where each node (agent) computes the necessary gradients at the local level, and the agents then aggregate their gradients to perform parameter updates. Interestingly, the distributed optimization algorithm proposed in this paper can be employed to perform data-parallel learning with limited communication among the agents. Note that the proposed algorithm assumes only a connected communication graph, i.e., agents only need to exchange information with their neighbors and not with every other agent. For illustration, consider the following linear SVM example, where the local functions $f_i$ are given as:

(18)

Here, the optimization variables represent the parameters of the separating hyperplane, while the data points allocated to agent $i$ and the corresponding labels define its local objective. (Since the proposed method assumes that the functions $f_i$ are twice differentiable, we use the softplus function $\frac{1}{\beta}\log(1+e^{\beta z})$ with large values of $\beta$ to smoothly approximate the hinge $\max(0, z)$.) The data vectors are chosen from a random distribution around a known separating line, so that the solution to the minimization problem, i.e., the separating hyperplane, is known in advance. In this case, we consider a network consisting of 5 nodes. Figure 3a shows the distribution of the data points symmetrically around the separating line.
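The smooth hinge approximation mentioned in the footnote can be sketched with the softplus function $\frac{1}{\beta}\log(1+e^{\beta z})$; treating softplus and the value $\beta = 50$ as illustrative assumptions of ours, the gap to the true hinge $\max(0, z)$ is at most $\log(2)/\beta$:

```python
import math

def softplus(z, beta=50.0):
    """Smooth, twice-differentiable approximation of max(0, z).
    Written in a numerically stable form so beta*z cannot overflow exp()."""
    bz = beta * z
    return (max(bz, 0.0) + math.log1p(math.exp(-abs(bz)))) / beta

# The approximation error log(1 + e^(-|beta*z|)) / beta peaks at z = 0,
# where it equals log(2)/beta.
for z in (-2.0, -0.1, 0.0, 0.1, 2.0):
    hinge = max(0.0, z)
    assert abs(softplus(z) - hinge) <= math.log(2.0) / 50.0 + 1e-12
print("max error bound:", math.log(2.0) / 50.0)
```

Increasing $\beta$ tightens the approximation uniformly while keeping the objective twice differentiable, as required by Assumption 2.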

For this case, the exponents and gains were set to values satisfying the conditions of Theorems 1-3, which imply a finite bound on the final time of convergence.

Figure 3: Example 2 - (a) Distribution of points around the separating line (red dotted line). Blue and red stars denote the points of the two classes. (b) The gradient of the objective function with time. (c) Maximum difference between the states with time. (d) Individual states with time. The states converge to the optimal point $x^*$.

Figure 3b illustrates the variation of the gradient of the objective function with time. The maximum difference between the states of any two agents is illustrated in Figure 3c. Figure 3d plots the convergence behavior of the individual states to the optimal point.

5 Discussions

For Example 1, we performed a sensitivity analysis by varying the exponent parameters to observe their effect on the convergence time, keeping the remaining parameters fixed. Figures 4 and 5 illustrate the variation of the gradient and of the maximum state deviation for 11 sets of exponents. As expected, the convergence time goes down, and the rate of convergence (the slope of the curve) goes up, as the larger exponent increases and the smaller exponent decreases. Observe that in Figures 4 and 5, the convergence in the limiting (linear) case appears as a straight line on the semilog scale, which corresponds to exponential convergence, while the remaining cases are super-linear.

Figure 4: Example 1 - The gradient of the objective function with time for various exponents.
Figure 5: Example 1 - Maximum difference between the states and the optimal point with time, for various exponents.

While optimization methods in continuous time are important and have major theoretical relevance, sampling constraints may preclude continuous-time acquisition and updating. In polyakov2018consistent , the authors study a particular class of homogeneous systems to design a consistent discretization scheme that preserves the property of finite-time convergence. They extend their results to practically FxTS systems in poly2018consistent , where they show that the trajectories of the discretized system reach an arbitrarily small neighborhood of the equilibrium point in fixed time, independent of the initial condition. One research avenue is to extend these results to a more general class of FTS and FxTS systems, so that they can be applied to the proposed optimization schemes to obtain the solution in a finite number of steps.
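The discretization issue can be reproduced numerically: explicit forward Euler applied to the finite-time-stable scalar system $\dot{x} = -\operatorname{sign}(x)\sqrt{|x|}$ only reaches an $O(\Delta t^2)$ neighborhood of the origin, rather than the origin itself (a toy illustration of ours, not the consistent scheme of polyakov2018consistent):

```python
import math

def euler_final(x0=1.0, dt=1e-3, steps=5000):
    """Forward-Euler iterates of dx/dt = -sign(x) * sqrt(|x|)."""
    x = x0
    for _ in range(steps):
        x -= dt * math.copysign(math.sqrt(abs(x)), x)
    return x

x_end = euler_final()
# The continuous-time trajectory from x0 = 1 reaches 0 exactly at t = 2,
# but the explicit Euler iterate only chatters inside |x| <= dt**2:
# once |x| <= dt^2, the step size dt*sqrt(|x|) <= dt^2 keeps it there.
print(abs(x_end) <= (1e-3) ** 2)  # True
```

This is precisely the gap that consistent discretization schemes are designed to close.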

6 Conclusions

In this paper, we presented a scheme to solve the distributed convex optimization problem for continuous-time multi-agent systems with fixed-time convergence guarantees. We showed that when the topology of the information-sharing network is fixed, consensus on the state values, as well as on the gradient and the Hessian of the objective, can be achieved in a fixed time. We then utilized this knowledge to find the optimum of the objective function. It was shown that each aspect of the algorithm, namely consensus on the crucial information and convergence to the optimal value, is achieved in fixed time. In this paper, we considered unconstrained distributed optimization. Future work involves investigating methods of distributed optimization with fixed-time convergence guarantees under constraints, and incorporating private (non-uniform) gains between agents in the distributed protocol.

References

  • [1] Angelia Nedic, Asuman Ozdaglar, and Pablo A Parrilo. Constrained consensus and optimization in multi-agent networks. IEEE Transactions on Automatic Control, 55(4):922–938, 2010.
  • [2] Jie Lu and Choon Yik Tang. Zero-gradient-sum algorithms for distributed convex optimization: The continuous-time case. IEEE Transactions on Automatic Control, 57(9):2348–2354, 2012.
  • [3] Angelia Nedic and Alex Olshevsky. Distributed optimization over time-varying directed graphs. IEEE Transactions on Automatic Control, 60(3):601–615, 2015.
  • [4] Peng Lin, Wei Ren, and Jay A Farrell. Distributed continuous-time optimization: nonuniform gradient gains, finite-time convergence, and convex constraint set. IEEE Transactions on Automatic Control, 62(5):2239–2253, 2017.
  • [5] Xiaowei Pan, Zhongxin Liu, and Zengqiang Chen. Distributed optimization with finite-time convergence via discontinuous dynamics. In 2018 37th Chinese Control Conference (CCC), pages 6665–6669. IEEE, 2018.
  • [6] Michael Rabbat and Robert Nowak. Distributed optimization in sensor networks. In Proceedings of the 3rd international symposium on Information processing in sensor networks, pages 20–27. ACM, 2004.
  • [7] Jin-Liang Wang and Huai-Ning Wu. Leader-following formation control of multi-agent systems under fixed and switching topologies. International Journal of Control, 85(6):695–705, 2012.
  • [8] Qinglei Hu and Xiaodong Shao. Smooth finite-time fault-tolerant attitude tracking control for rigid spacecraft. Aerospace Science and Technology, 55:144–157, 2016.
  • [9] Alexandros Nathan and Diego Klabjan. Optimization for large-scale machine learning with distributed features and observations. In International Conference on Machine Learning and Data Mining in Pattern Recognition, pages 132–146. Springer, 2017.
  • [10] Jonathan P Pearce and Milind Tambe. Quality guarantees on k-optimal solutions for distributed constraint optimization problems. In IJCAI, pages 1446–1451, 2007.
  • [11] Adrian Petcu and Boi Faltings. Mb-dpop: A new memory-bounded algorithm for distributed optimization. In IJCAI, pages 1452–1457, 2007.
  • [12] Adrian Petcu, Boi Faltings, and Roger Mailler. Pc-dpop: A new partial centralization algorithm for distributed optimization. In IJCAI, volume 7, pages 167–172, 2007.
  • [13] Thomas Leaute, Brammert Ottens, and Radoslaw Szymanek. Frodo 2.0: An open-source framework for distributed constraint optimization. In Proceedings of the IJCAI’ 09 Distributed Constraint Reasoning Workshop (DCR’ 09), number LIA-CONF-2010-002, pages 160–164, 2009.
  • [14] Doron Blatt, Alfred O Hero, and Hillel Gauchman. A convergent incremental gradient method with a constant step size. SIAM Journal on Optimization, 18(1):29–51, 2007.
  • [15] Angelia Nedic, Alex Olshevsky, and Wei Shi. Achieving geometric convergence for distributed optimization over time-varying graphs. SIAM Journal on Optimization, 27(4):2597–2633, 2017.
  • [16] Jing Wang and Nicola Elia. A control perspective for centralized and distributed convex optimization. In Decision and Control and European Control Conference (CDC-ECC), 2011 50th IEEE Conference on, pages 3800–3805. IEEE, 2011.
  • [17] Shuai Liu, Zhirong Qiu, and Lihua Xie. Continuous-time distributed convex optimization with set constraints. IFAC Proceedings Volumes, 47(3):9762–9767, 2014.
  • [18] Zhi Feng and Guoqiang Hu. Finite-time distributed optimization with quadratic objective functions under uncertain information. In Decision and Control (CDC), 2017 IEEE 56th Annual Conference on, pages 208–213. IEEE, 2017.
  • [19] Zilun Hu and Jianying Yang. Distributed finite-time optimization for second order continuous-time multiple agents systems with time-varying cost function. Neurocomputing, 287:173–184, 2018.
  • [20] Kunal Garg and Dimitra Panagou. A new scheme of gradient flow and saddle-point dynamics with fixed-time convergence guarantees. arXiv preprint arXiv:1808.10474, 2018.
  • [21] Sanjay P Bhat and Dennis S Bernstein. Finite-time stability of continuous autonomous systems. SICON, 38(3):751–766, 2000.
  • [22] Andrey Polyakov. Nonlinear feedback design for fixed-time stabilization of linear control systems. IEEE Transactions on Automatic Control, 57(8):2106–2110, 2012.
  • [23] Andrey Polyakov, Denis Efimov, and Bernard Brogliato. Consistent discretization of finite-time stable homogeneous systems. In VSS 2018-15th International Workshop on Variable Structure Systems and Sliding Mode Control, 2018.
  • [24] Andrey Polyakov, Denis Efimov, and Bernard Brogliato. Consistent discretization of finite-time and fixed-time stable systems. SIAM Journal on Control and Optimization, 57(1):78–103, 2019.
  • [25] Mehran Mesbahi and Magnus Egerstedt. Graph theoretic methods in multiagent networks, volume 33. Princeton University Press, 2010.
  • [26] Godfrey Harold Hardy, John Edensor Littlewood, George Pólya, et al. Inequalities. Cambridge university press, 1988.
  • [27] Zongyu Zuo and Lin Tie. Distributed robust finite-time nonlinear consensus protocols for multi-agent systems. International Journal of Systems Science, 47(6):1366–1375, 2016.
  • [28] Léon Bottou. Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT’2010, pages 177–186. Springer, 2010.
  • [29] Carl Londahl. A very simple and intuitive neural network implementation, mathworks fileexchange. https://tinyurl.com/y3ts64o3, 2008.

Appendix A Some useful lemmas

We first present some lemmas that will be useful in deriving our claims on fixed-time parameter estimation and consensus protocols. (We refer the readers to [25] for an overview of graph theory for multi-agent systems.)

Lemma 2 ([26, 27]).

Let $a_i \ge 0$ for $i = 1, \dots, N$. Then the following hold:

\[ \sum_{i=1}^{N} a_i^{p} \;\ge\; \Big(\sum_{i=1}^{N} a_i\Big)^{p}, \qquad 0 < p \le 1, \tag{19a} \]
\[ \sum_{i=1}^{N} a_i^{p} \;\ge\; N^{1-p}\Big(\sum_{i=1}^{N} a_i\Big)^{p}, \qquad p > 1. \tag{19b} \]
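Assuming (19) takes the standard power-sum form $\sum_i a_i^p \ge (\sum_i a_i)^p$ for $0 < p \le 1$ and $\sum_i a_i^p \ge N^{1-p}(\sum_i a_i)^p$ for $p > 1$ (with $a_i \ge 0$), a quick numerical sanity check:

```python
import random

# Numeric check of the standard power-sum inequalities (assumed form of (19)):
#   sum a_i^p >= (sum a_i)^p              for 0 < p <= 1,
#   sum a_i^p >= N^(1-p) * (sum a_i)^p    for p > 1,    with a_i >= 0.
random.seed(0)
for _ in range(1000):
    a = [random.uniform(0.0, 10.0) for _ in range(5)]
    s = sum(a)
    assert sum(v ** 0.5 for v in a) >= s ** 0.5 - 1e-12              # p = 1/2
    assert sum(v ** 1.5 for v in a) >= len(a) ** -0.5 * s ** 1.5 - 1e-12  # p = 3/2
print("inequalities hold on 1000 random samples")
```

These bounds are what let a sum of per-edge signed-power terms be lower-bounded by a power of the Lyapunov function, as required by Lemma 1.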
Lemma 3.

Let $\mathcal{G}$ be the graph consisting of $N$ nodes with states $x_i$, $i = 1, \dots, N$, and let $\mathcal{N}_i$ denote the in-neighbors of node $i$. Then,

(20)
Lemma 4.

Let $\phi:\mathbb{R}\to\mathbb{R}$ be an odd function, i.e., $\phi(-s) = -\phi(s)$ for all $s$, let the graph $\mathcal{G}$ be undirected, and consider vectors indexed by the nodes of $\mathcal{G}$. Then, the following holds:

(21)
Lemma 5.

Let $\mathcal{G}$ be an undirected, connected graph and let $L = D - \mathcal{A}$ be its Laplacian matrix, where $D$ is the diagonal matrix of node degrees.

The Laplacian has the following properties:
1) $L$ is positive semi-definite and $L\mathbf{1} = 0$.
2) $x^\top L x = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} a_{ij}\,(x_j - x_i)^2$.
3) If $\mathbf{1}^\top x = 0$, then $x^\top L x \ge \lambda_2(L)\, x^\top x$.
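Properties 1) and 2) of the Laplacian can be checked numerically on a small path graph (a pure-Python sketch; the graph and test vector are our own choices):

```python
# Numeric check of the Laplacian properties in Lemma 5 on the path graph
# 1-2-3-4, with L = D - A and D the diagonal degree matrix.
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
n = len(A)
L = [[(sum(A[i]) if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]

# Property 1: L annihilates the all-ones vector (each row sums to zero).
assert all(sum(row) == 0 for row in L)

# Property 2: x^T L x = (1/2) * sum_ij a_ij (x_j - x_i)^2, hence L is PSD.
x = [3.0, -1.0, 4.0, 1.5]
quad = sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))
pairwise = 0.5 * sum(A[i][j] * (x[j] - x[i]) ** 2 for i in range(n) for j in range(n))
assert abs(quad - pairwise) < 1e-9
print("Laplacian properties verified")
```

Property 3) additionally requires the second smallest eigenvalue $\lambda_2(L)$, which is the quantity that sets the consensus rate in the proofs.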

Appendix B Proof of Theorem 2:

Our proof of Theorem 2 is based on the following key lemma:

Lemma 6.

Consider a network of $N$ agents, with the estimates, their update law, and the driving signal as defined in (11), (13), and (14), respectively. Let the signal in (14) be bounded, and let the control gain in (13) be sufficiently large relative to that bound; then there exists a fixed time $T_1$, independent of the initial states, such that the estimates of each agent equal the corresponding global quantities for all $t \ge T_1$.

Proof.

The time-derivative of the estimate in (11) is given by:

Let us define the stacked estimate vector and the mean of the agents' estimates. The difference between an agent's estimate and the mean of all agents' estimates is denoted by the corresponding error term, and similarly for the estimated signals. Then the time-derivative of the error is given by:

(22)

We consider a candidate Lyapunov function and take its time-derivative along the trajectories of (22), which gives:

(23)

From (13), the first term is rewritten as:

(24)

where the last equality follows from the definitions above. Similarly, the second term in (23) is rewritten as:

(25)

Thus, from (24) and (25), it follows that

(26)

where $a_{ij}$ are the elements of the adjacency matrix $\mathcal{A}$. With the definitions above, and using the power-sum inequalities of Lemma 2, we obtain:

(27)

Define matrices such that and