An Online Optimization Approach for Multi-Agent Tracking of Dynamic Parameters in the Presence of Adversarial Noise
Abstract
This paper addresses tracking of a moving target in a multi-agent network. The target follows linear dynamics corrupted by an adversarial noise, i.e., the noise is not generated from a statistical distribution. The location of the target at each time induces a global time-varying loss function, and the global loss is a sum of local losses, each of which is associated with one agent. Agents' noisy observations may be nonlinear. We formulate this problem as a distributed online optimization in which agents communicate with each other to track the minimizer of the global loss. We then propose a decentralized version of the Mirror Descent algorithm and provide a non-asymptotic analysis of the problem. Using the notion of dynamic regret, we measure the performance of our algorithm versus its offline counterpart in the centralized setting. We prove that the bound on the dynamic regret scales inversely in the spectral gap of the network and captures the adversarial noise causing deviation with respect to the linear dynamics. Our result subsumes a number of results in the distributed optimization literature. Finally, in a numerical experiment, we verify that our algorithm can be readily implemented for multi-agent tracking with nonlinear observations.
I Introduction
Distributed estimation, detection, and tracking is ubiquitous in engineering applications ranging from sensor and robotic networks to social networks, and it has received a lot of attention over the years [1, 2, 3, 4, 5]. In these scenarios, the task is to estimate the value of a parameter, which may or may not be dynamic. A group of agents aims to accomplish this task as a team. Each individual agent only partially observes the parameter, but the global spread of observations in the network allows the agents to estimate the parameter collaboratively. This requires agents to aggregate local information, and many methods use consensus protocols as a critical component [6]. It is well-known that when agents' observations are linear with respect to the parameter, the tracking problem is equivalent to minimizing a global quadratic loss, written as a sum of local quadratic losses (see, e.g., [7]). However, in general, the global loss can be more complicated, corresponding to nonlinear observations.
In real-world applications, the parameter of interest is often time-varying. Therefore, regardless of the structure of the loss, the dynamic nature of the problem brings forward two issues: (i) the local losses are observed in an online or sequential fashion, i.e., the local losses are disclosed to agents only after they form their estimates at each round, and the agents are not aware of future loss functions; therefore, the problem must be solved in an online setting. (ii) The online algorithm should mimic the performance of its offline counterpart, in which the losses are known a priori. The gap between the two is often called regret. Tracking the minimizer of the global loss over time introduces the notion of dynamic regret [8]. This framework has been studied in centralized online optimization [8, 9, 10, 11, 12], where the hardness of the problem is captured via the variation in the minimizer sequence.
To address these issues, in this paper we adopt an online optimization approach to formulate distributed tracking. We consider tracking of a dynamic parameter, or a moving target, in a network of agents. The dynamics of the target is linear and known to agents, but the target deviates from this dynamics due to an unstructured, or adversarial, disturbance or noise. In other words, the noise is not necessarily generated from a statistical distribution, and it can be highly correlated with its past values over time. At each time instance, the target induces a global convex loss whose minimizer coincides with the target location. The global loss is a sum of local losses, where each local loss is associated with a specific agent. Agents exchange noisy local gradients according to a communication protocol to track the moving target.
Our problem setup is reminiscent of a distributed Kalman filter [13]. However, we differentiate the two as follows: (i) we do not assume that the target is driven by a Gaussian noise, nor do we assume that this noise has a statistical distribution; instead, we consider an adversarial-noise model with unknown structure. (ii) Agents' observations are not necessarily linear; in fact, the observations are noisy local gradients that are nonlinear when the loss is not quadratic. Furthermore, our focus is on finite-time analysis rather than asymptotic results.
We propose a decentralized version of the Mirror Descent algorithm, developed by Nemirovski and Yudin [14]. Using the notion of Bregman divergence in lieu of the Euclidean distance for projection, Mirror Descent has been shown to be a powerful tool in large-scale optimization. Our algorithm consists of three interleaved updates: (i) each agent follows the noisy local gradient while staying close to the previous estimates in its local neighborhood; (ii) agents take into account the dynamics of the moving target; (iii) agents average their estimates with their local neighborhood in a consensus step.
We then use a dynamic notion of regret to measure the difference between our online decentralized algorithm and its offline centralized version. We establish a regret bound that scales inversely in the spectral gap of the network and captures the adversarial noise causing deviation with respect to the linear dynamics. We further show that, from an optimization perspective, our result subsumes two important classes of decentralized optimization in the literature: (i) decentralized optimization of time-invariant losses, and (ii) decentralized optimization of time-varying losses for fixed targets. This generalization is achieved by allowing the loss function and the target value to vary simultaneously. We also provide a numerical experiment to show that our algorithm can be readily implemented for multi-agent tracking with nonlinear observations.
Related Literature on Decentralized Optimization: In [15], decentralized mirror descent has been developed for time-invariant functions in the case that agents receive the gradients with a delay. Moreover, Rabbat [16] proposes a decentralized mirror descent for stochastic composite optimization problems and provides guarantees for strongly convex regularizers. Duchi et al. [17] study dual averaging for distributed optimization, and the extension of dual averaging to online distributed optimization is considered in [18]. Mateos-Núñez and Cortés [19] consider online optimization using subgradient descent of local functions, where the graph structure is time-varying. In [20], a decentralized variant of Nesterov's primal-dual algorithm is proposed for online optimization. In [21], distributed online optimization is studied for strongly convex objective functions over time-varying networks. Our setup follows the work of [22] on decentralized online mirror descent, but we extend the results to high-probability bounds on the dynamic regret.
Notation:
[n]  The set {1, 2, …, n} for any integer n
x^⊤  Transpose of the vector x
[x]_i  The i-th element of vector x
I_d  Identity matrix of size d
Δ_d  The d-dimensional probability simplex
⟨·, ·⟩  Standard inner product operator
‖·‖  Norm operator
‖·‖_*  The dual norm of ‖·‖
λ_i(W)  The i-th largest eigenvalue of W in magnitude
II Problem Formulation and Algorithm
II-A Dynamical Model and Optimization Perspective
Consider a d-dimensional moving target x_t ∈ ℝ^d following, over a finite time horizon t = 1, …, T, the linear dynamics

x_{t+1} = A x_t + v_t,   (1)
where A is known, and v_t is an adversarial noise, i.e., the sequence {v_t} is neither generated according to a statistical distribution, nor is it independent over time. Our goal is to track x_t, and regardless of the observation model, a distribution-dependent mechanism, such as a Kalman or particle filter, cannot solve the problem, since the noise does not follow a statistical distribution.
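To make the setting concrete, the target model can be simulated with a deterministic, time-correlated disturbance playing the role of the adversarial noise. The dimension, dynamics matrix, and disturbance below are illustrative choices, not the paper's experiment:

```python
import numpy as np

d, T = 2, 100
A = np.eye(d)                 # known dynamics matrix (illustrative choice)
x = np.zeros(d)               # target state
trajectory = [x.copy()]
for t in range(T):
    # adversarial deviation: deterministic and correlated over time,
    # not drawn i.i.d. from any statistical distribution
    v = 0.05 * np.array([np.cos(0.3 * t), np.sin(0.3 * t)])
    x = A @ x + v             # target update: next state = A x + noise
    trajectory.append(x.copy())
trajectory = np.array(trajectory)
```

Any disturbance sequence of this kind, however correlated, fits the adversarial model as long as its cumulative deviation from the nominal dynamics stays bounded.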
In the centralized version of the tracking problem above, the observations of x_t are realized through a time-varying, global loss function. That is, consider the tracking problem above as an optimization, where x_t is the minimizer of the global loss at time t. Let X ⊂ ℝ^d be a convex, compact set, and represent the global loss by f_t : X → ℝ at time t. As the global loss varies over time, the goal is to track the minimizer of f_t, which is x_t. The offline and centralized version of our problem can be viewed as follows:
min_{x̄_1, …, x̄_T}  Σ_{t=1}^{T} f_t(x̄_t)   (2)
subject to  x̄_t ∈ X  for t = 1, …, T.
We are interested in solving the problem above in an online and decentralized fashion. In particular, the global function at time t is a sum of local functions:
f_t(x) = Σ_{i=1}^{n} f_{i,t}(x),   (3)
where f_{i,t} is a local convex function on X for all i ∈ [n]. We consider a network of n agents facing two challenges when solving problem (2): (i) agent i receives information only about f_{i,t} and does not observe the global loss function f_t, which is common to decentralized schemes; (ii) the functions are revealed to agents sequentially along the time horizon, i.e., at any time instance t, agent i has observed f_{i,s} for s ≤ t − 1, whereas the agent does not know f_{i,s} for s ≥ t, which is common to online settings.
The agents interact with one another, and their relationship is captured via an undirected graph G = (V, E), where V = {1, …, n} denotes the set of nodes, and E is the set of edges. Each agent i assigns a positive weight [W]_ij to the information received from agent j, and the set of neighbors of agent i is defined as N_i := {j : (i, j) ∈ E}.
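One common way to build a weight matrix with the symmetry, double stochasticity, and positive-diagonal properties required later (Assumption 2) is the Metropolis rule. The construction below is an illustrative sketch, not the paper's prescription:

```python
import numpy as np

def metropolis_weights(adj):
    """Symmetric doubly stochastic weight matrix with positive diagonal,
    built from an undirected adjacency matrix via the Metropolis rule:
    W_ij = 1 / (1 + max(deg_i, deg_j)) for each edge (i, j)."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if adj[i, j] and i != j:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()   # self-weight absorbs the remainder
    return W

# 4-node path graph 0-1-2-3 as a small example
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
W = metropolis_weights(adj)
```

On any connected graph this yields a second-largest eigenvalue magnitude strictly below one, which is exactly the spectral gap that enters the regret bound.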
While the problem framework is reminiscent of a distributed Kalman filter [13], there are fundamental distinctions in our setup: (i) the adversarial noise is neither Gaussian nor of a known statistical distribution. It can be thought of as a noise with unknown structure, which represents the deviation from the dynamics (in online optimization, the focus is not on the distribution of the data; instead, the data is thought to be generated arbitrarily, and its effect is observed through the loss functions [23]). (ii) Agents' observations are not necessarily linear; in fact, the observations are local gradients of f_{i,t} and are nonlinear when the objective is not quadratic. The other implicit distinction in this work is our focus on finite-time analysis rather than asymptotic results.
From an optimization perspective, our framework subsumes two important classes of decentralized optimization in the literature: (i) decentralized optimization of time-invariant losses, and (ii) decentralized optimization of time-varying losses for fixed targets.
However, in the tracking problem, functions and comparator variables evolve simultaneously, i.e., the variables are not constrained to be fixed in (2). Recall that x_t is the minimizer of the global loss f_t at time t. Then, the solution to problem (2) is simply the sequence {x_t}_{t=1}^{T}. Denote by x_{i,t} the estimate of agent i for x_t at time t. To exhibit the online nature of problem (2), we reformulate it using the notion of dynamic regret as follows:
Reg_T := (1/n) Σ_{i=1}^{n} Σ_{t=1}^{T} [ f_t(x_{i,t}) − f_t(x_t) ].   (4)
Then, the objective is to minimize the dynamic regret above, which measures the gap between the online algorithm and its offline version. Our performance bound shall exhibit the impact of the system noise, i.e., we want to prove a regret bound in terms of
C_T := Σ_{t=1}^{T−1} ‖x_{t+1} − A x_t‖,   (5)
which represents the deviation of the moving target with respect to the dynamics A. Note that generalizing the results to linear time-variant dynamics is straightforward, i.e., when A is replaced by A_t in (1).
II-B Technical Assumptions
To solve the multi-agent online optimization (4), we propose to decentralize the Mirror Descent algorithm [14]. Mirror Descent has been shown to be a powerful method in large-scale optimization by using the Bregman divergence in lieu of the Euclidean distance in the projection step. Before defining the Bregman divergence and elaborating on the algorithm, we state a couple of standard assumptions on the loss functions and the agents' communication.
Assumption 1
For any i ∈ [n] and t ∈ [T], the function f_{i,t} is Lipschitz continuous on X with a uniform constant L. That is,
|f_{i,t}(x) − f_{i,t}(y)| ≤ L‖x − y‖,
for any x, y ∈ X.
Assumption 2
The network is connected (the setup is generalizable to the case where the network connectivity changes over time and the communication matrix is time-varying), i.e., there exists a path from any agent i to any agent j. Also, the matrix W is symmetric and doubly stochastic with positive diagonal. That is,
Σ_{i=1}^{n} [W]_ij = Σ_{j=1}^{n} [W]_ij = 1, [W]_ij ≥ 0, and [W]_ii > 0 for all i.
The connectivity constraint in Assumption 2 guarantees the information flow in the network.
We now outline the notion of Bregman divergence, which is critical in the development of Mirror Descent. Consider a compact, convex set X, and let φ denote a 1-strongly convex function on X with respect to a norm ‖·‖. That is,
φ(x) ≥ φ(y) + ⟨∇φ(y), x − y⟩ + (1/2)‖x − y‖²,
for any x, y ∈ X. Then, the Bregman divergence with respect to the function φ is defined as follows:
D_φ(x, y) := φ(x) − φ(y) − ⟨∇φ(y), x − y⟩.
The definition of the Bregman divergence and the strong convexity of φ imply that
D_φ(x, y) ≥ (1/2)‖x − y‖²,   (6)
for any x, y ∈ X. Two famous examples of Bregman divergence are the Euclidean distance and the Kullback-Leibler (KL) divergence, generated from φ(x) = (1/2)‖x‖² and the negative entropy φ(x) = Σ_i [x]_i log [x]_i, respectively.
Assumption 3
Let x and y_1, …, y_n be vectors in ℝ^d. We assume that the Bregman divergence satisfies the separate convexity in the following sense:
D_φ(x, Σ_{i=1}^{n} α_i y_i) ≤ Σ_{i=1}^{n} α_i D_φ(x, y_i),
where α = (α_1, …, α_n)^⊤ is on the n-dimensional simplex.
The assumption is satisfied for commonly used cases of the Bregman divergence. For instance, the Euclidean distance evidently respects the condition. The KL divergence also satisfies the constraint, and we refer the reader to Theorem 6.4 in [25] for the proof.
Assumption 4
The Bregman divergence satisfies a Lipschitz condition of the form
|D_φ(x, z) − D_φ(y, z)| ≤ K‖x − y‖,
for all x, y, z ∈ X.
When the function φ is Lipschitz on X, the Lipschitz condition on the Bregman divergence is automatically satisfied. Again, for the Euclidean distance the assumption evidently holds. In the particular case of the KL divergence, the condition can be achieved by mixing with a uniform distribution to avoid the boundary (see, e.g., [11] for more comments on the assumption).
Assumption 5
The dynamics A is assumed to be non-expansive. That is, the condition
‖Ax − Ay‖ ≤ ‖x − y‖
holds for all x, y ∈ X.
The assumption postulates a natural constraint on the dynamics A: it does not allow the effect of a poor estimation (at one step) to be amplified as the algorithm moves forward.
II-C Decentralized Tracking via Online Mirror Descent
We now propose our algorithm to solve the problem formulated in terms of dynamic regret in (4). In our setting, agents' observations are gradients of the local losses. However, as is common in distributed state estimation and tracking, these observations are noisy. Hence, denoting the local gradient of agent i at time t by ∇f_{i,t}, the agent only receives ∇̂f_{i,t} representing the stochastic gradient. The stochastic oracle that provides the noisy gradients satisfies the following constraints (for simplicity, we use one constant to bound the gradients as well as the stochastic gradients):
E[∇̂f_{i,t}(x) | F_t] = ∇f_{i,t}(x),   ‖∇̂f_{i,t}(x)‖_* ≤ L,   (7)
where F_t is the σ-field containing all information prior to the outset of round t. A commonly used model to generate stochastic gradients satisfying (7) is an additive, bounded, zero-mean noise. Agents then track the moving target using a decentralized variant of Mirror Descent as follows (we initialize every agent with the all-zeros vector; in general, any initialization works for the algorithm):
x̂_{i,t+1} = argmin_{x ∈ X} { η_t ⟨x, ∇̂f_{i,t}(x_{i,t})⟩ + D_φ(x, y_{i,t}) },   (8a)
x_{i,t+1} = A x̂_{i,t+1},   y_{i,t+1} = Σ_{j=1}^{n} [W]_ij x_{j,t+1},   (8b)
where {η_t} is the step-size sequence, and A is the given dynamics in (1), which is common knowledge. In these updates, x_{i,t} represents the estimate of agent i of the moving target at time t. The step-size sequence should be tuned for different cases, but it is generally non-increasing and positive.
The update (8a) allows an agent to follow the noisy local gradient while keeping its estimate close to those of the local neighborhood. This closeness is enforced by minimizing the Bregman divergence. On the other hand, the first update in (8b) takes into account the dynamics of the moving target, and the second update in (8b) is the consensus step averaging the estimates in the local neighborhood.
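For the Euclidean Bregman divergence, the mirror step reduces to a plain gradient step, and one round of the three interleaved operations (consensus, gradient step, dynamics) can be sketched as follows; the function name and the stacked-matrix convention are illustrative:

```python
import numpy as np

def dmd_step(X, grads, W, A, eta):
    """One round of the decentralized update, specialized to the Euclidean
    Bregman divergence.  Row i of X is agent i's current estimate, grads
    holds the noisy local gradients, W is the doubly stochastic weight
    matrix, A the known dynamics, and eta the step size."""
    Y = W @ X                 # consensus: average estimates over neighborhoods
    Xhat = Y - eta * grads    # follow the noisy local gradient
    return Xhat @ A.T         # propagate through the known dynamics
```

With a projection onto the feasible set added after the gradient step, this recovers the Euclidean special case of the algorithm; other Bregman divergences change only the middle step.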
III Theoretical Results
In this section, we state our theoretical result on the non-asymptotic performance of the decentralized online mirror descent for tracking dynamic parameters. Theorem 1 proves a bound on the dynamic regret, which captures the deviation of the moving target from the dynamics (tracking error), the decentralization cost (network error), and the impact of stochastic gradients (stochastic error). We show that this theorem recovers previous rates on decentralized optimization once the tracking error is removed. Also, it recovers previous rates on centralized online optimization in the dynamic setting when the network error is eliminated. The proof is given in the Appendix (Section A).
Theorem 1
Consider a moving target with the dynamical model of (1). Further consider the distributed, online tracking problem formulated in (4), where x_{i,t} denotes the local estimate of agent i of the moving target at time t. Let the local estimates be generated by updates (8a)-(8b), where the stochastic gradients satisfy condition (7). Given Assumptions 1-5, the dynamic regret can be bounded as
with probability at least , where
and
In view of (1), the dynamical model of the target is described with the noise v_t. The first term shows the dependence of the performance bound on this noise by aggregating the errors over time. The remaining two terms are the errors related to the network and the stochastic gradients, respectively.
In Section II, we discussed that our setup generalizes some of the previous results. It is now important to see that this generalization is valid in the sense that our result can recover those special cases:

(i) When the global loss is time-invariant, the target is fixed, i.e., A = I and v_t = 0 in (1). In this case, the tracking-error term in Theorem 1 vanishes, and we can tune the step-size sequence to recover the result of comparable algorithms, such as Theorem 4 in [17] on distributed dual averaging.

(ii) When the graph is complete, every pairwise weight equals 1/n, and hence the network error vanishes. We then recover the results of [9] on centralized online learning (for linear dynamics) with exact gradients once we remove the error due to stochastic gradients.
IV Numerical Experiment: Tracking Maneuvering Targets
In the Mirror Descent algorithm, one has freedom over the selection of the Bregman divergence. A particularly well-known choice is the Euclidean distance, commonly used in state estimation and tracking of dynamic parameters. We focus on this scenario in this section to provide the numerical experiments for our method.
We consider a slowly maneuvering target in the plane and assume that each position component of the target evolves independently according to a near-constant-velocity model [26]. The state of the target at each time consists of four components: horizontal position, vertical position, horizontal velocity, and vertical velocity. We represent the state at time t by x_t, and therefore, the state-space model takes the form
where v_t is the system noise, and, using ⊗ for the Kronecker product, A can be written as
with ε being the sampling interval (a sampling interval of ε seconds is equivalent to a sampling rate of 1/ε Hz). The goal is to cooperatively track x_t in a network of agents. This problem has been studied in the context of distributed Kalman filtering [13, 27], state estimation [28, 29, 30], and particle filtering [31, 32, 33]. However, in contrast to Kalman filtering, we do not assume that the system noise is Gaussian. Also, as opposed to particle filtering, we do not receive a large number of samples (particles) per iteration since our setup is online, i.e., agents only observe one sample per time step. Furthermore, we do not assume a statistical distribution on the noise in our analysis, which differentiates our framework from state estimation. We adopt a model-free approach where the noise can be adversarial (deterministic), stochastic with dependence over time, or of some complex structure. We generate the noise as follows. At each time we draw a sample from a zero-mean Gaussian distribution whose covariance matrix is as follows
for the given sampling interval (in seconds) and its corresponding sampling frequency. Then, we let the system noise be a fixed transformation of this sample, scaled by a constant. Though the sample is generated from a Gaussian distribution, the resulting mismatch noise is non-Gaussian and can have a complicated distribution. The scaling constant takes different values in each experiment, and we describe this choice later.
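The Kronecker structure of the dynamics and of a standard near-constant-velocity process covariance can be assembled as below. The sampling interval of 0.1 s and the (position, velocity) state ordering are illustrative assumptions, not necessarily those of the experiment:

```python
import numpy as np

eps = 0.1  # sampling interval in seconds (10 Hz); illustrative value
# Assumed state ordering: (horiz. pos, horiz. vel, vert. pos, vert. vel)
F = np.array([[1.0, eps],
              [0.0, 1.0]])   # position-velocity block for one coordinate
A = np.kron(np.eye(2), F)    # one identical block per planar coordinate

# standard near-constant-velocity process-noise covariance per coordinate
Q1 = np.array([[eps**3 / 3.0, eps**2 / 2.0],
               [eps**2 / 2.0, eps]])
Q = np.kron(np.eye(2), Q1)
```

A different state ordering only permutes the rows and columns of A and Q; the block structure is unchanged.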
We consider a sensor network of agents located on a grid. The agents aim to track the moving target collaboratively. They observe a noisy version of the target through a local loss function, and these observations are nonlinear. In particular, let the following quantity be a noisy version of one coordinate of x_t:
where the additive term denotes a random noise, and e_k is the k-th unit vector in the standard basis of ℝ⁴ for k ∈ {1, 2, 3, 4}. We partition the agents into four groups, and for each group we select one specific e_k from the set {e_1, e_2, e_3, e_4}. The random noise satisfies the standard assumption of being zero-mean with finite variance. Again, to show that our results do not depend on Gaussian noise, we generate the noise independently from a bounded uniform distribution.
Then, at time t the local loss for agent i takes the form
resulting in the global loss
where F_t is the σ-field containing all information up to time t. It is straightforward to see that x_t is the minimizer of the global loss. The observation of agent i at time t is the stochastic gradient of the local loss:
We derive an explicit update to form an estimate of x_t. We use the Euclidean distance as the Bregman divergence in updates (8a)-(8b) (we assume that the state of the target remains in a convex, compact set, and that the updates keep the estimate in the set without the projection step; this assumption can be satisfied in the finite-time domain) to get
and tune the step size accordingly. The update is akin to consensus + innovation updates in the literature (see, e.g., [2, 7]), though we recall that the observation is nonlinear, and the system noise is arbitrary.
It is proved in [7] that in decentralized tracking, the dynamic regret can be presented in terms of the tracking error when the local losses are quadratic. More specifically, the expected dynamic regret averages the tracking error over space and time (when normalized by T). While here we deal with a polynomial loss of power four, the connection between the tracking error and the dynamic regret still holds. Therefore, using the result of Theorem 1, we can expect that once the parameter does not deviate too much from the dynamics, i.e., when the accumulated deviation (5) is small, the bound on the dynamic regret, as well as the collective tracking error, is small.
We verify this intuition by setting the scaling constant to different values. Larger values are expected to cause more deviation from the dynamics and larger dynamic regret (worse performance). In Fig. 1, we plot the normalized dynamic regret for several choices of the constant. Note that for each value, we run the experiment only once to investigate the high-probability bound in Theorem 1. As expected, the performance improves as the constant tends to smaller values.
We next restrict our attention to one such case. For one run of this case, we provide a snapshot of the target trajectory (in red) in Fig. 2 and plot the estimator trajectories (in blue) for a subset of the agents. Fig. 2 suggests that the agents' estimators closely follow the trajectory of the moving target with high probability.
V Conclusion
In this paper, we addressed tracking of a moving target in a network of agents. The target follows linear dynamics that are common knowledge to the agents, but it deviates from these dynamics due to an additive noise of unknown structure. We formulated the problem as an online optimization of a global time-varying loss in a distributed fashion. The global loss at each time is a sum of a finite number of local losses, and each agent in the network holds a private copy of one local loss. Agents are unaware of the future loss functions, as the local losses only become available to them sequentially. They exchange noisy local gradients with each other to track the value of the target.
Our proposed algorithm for this setup can be cast as a decentralized version of Mirror Descent. We, however, incorporated two more steps to include agents' interactions and the dynamics of the target. We used a notion of network dynamic regret to measure the performance of our algorithm versus its offline counterpart. We established that the regret bound scales inversely in the spectral gap of the network and captures the deviation of the target with respect to the dynamics. Our results generalize a number of results in online and offline distributed optimization. Also, numerical experiments verified the applicability of our algorithm to multi-agent tracking with nonlinear observations. Future directions include studying the algorithm in the case that several observations are available per round, i.e., when agents can receive multiple noisy gradients per time step. The method can be useful in sensor networks where each sensor can have multiple measurements from different sources.
References
 [1] F. Bullo, J. Cortes, and S. Martinez, Distributed control of robotic networks: a mathematical approach to motion coordination algorithms. Princeton University Press, 2009.
 [2] S. Kar, J. M. Moura, and K. Ramanan, “Distributed parameter estimation in sensor networks: Nonlinear observation models and imperfect communication,” IEEE Transactions on Information Theory, vol. 58, no. 6, pp. 3575–3605, 2012.
 [3] S. Shahrampour, S. Rakhlin, and A. Jadbabaie, “Online learning of dynamic parameters in social networks,” in Advances in Neural Information Processing Systems, 2013.
 [4] S. Shahrampour, A. Rakhlin, and A. Jadbabaie, “Distributed detection: Finitetime analysis and impact of network topology,” IEEE Transactions on Automatic Control, vol. 61, no. 11, pp. 3256–3268, 2016.
 [5] A. Nedić, A. Olshevsky, and C. A. Uribe, “Network independent rates in distributed learning,” in American Control Conference (ACC), 2016. IEEE, 2016, pp. 1072–1077.
 [6] A. Jadbabaie, J. Lin, and A. S. Morse, “Coordination of groups of mobile autonomous agents using nearest neighbor rules,” IEEE Transactions on Automatic Control, vol. 48, no. 6, pp. 988–1001, 2003.
 [7] S. Shahrampour, A. Rakhlin, and A. Jadbabaie, “Distributed estimation of dynamic parameters: Regret analysis,” in American Control Conference (ACC), July 2016, pp. 1066–1071.
 [8] M. Zinkevich, “Online convex programming and generalized infinitesimal gradient ascent,” International Conference on Machine Learning (ICML), 2003.
 [9] E. C. Hall and R. M. Willett, “Online convex optimization in dynamic environments,” IEEE Journal of Selected Topics in Signal Processing, vol. 9, no. 4, pp. 647–662, 2015.
 [10] O. Besbes, Y. Gur, and A. Zeevi, “Nonstationary stochastic optimization,” Operations Research, vol. 63, no. 5, pp. 1227–1244, 2015.
 [11] A. Jadbabaie, A. Rakhlin, S. Shahrampour, and K. Sridharan, “Online optimization: Competing with dynamic comparators,” in Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 2015, pp. 398–406.
 [12] A. Mokhtari, S. Shahrampour, A. Jadbabaie, and A. Ribeiro, “Online optimization in dynamic environments: Improved regret rates for strongly convex problems,” in IEEE Conference on Decision and Control (CDC). IEEE, 2016, pp. 7195–7201.
 [13] R. Olfati-Saber, “Distributed Kalman filtering for sensor networks,” in IEEE Conference on Decision and Control, 2007, pp. 5492–5498.
 [14] D. Yudin and A. Nemirovskii, “Problem complexity and method efficiency in optimization,” 1983.
 [15] J. Li, G. Chen, Z. Dong, and Z. Wu, “Distributed mirror descent method for multiagent optimization with delay,” Neurocomputing, vol. 177, pp. 643–650, 2016.
 [16] M. Rabbat, “Multiagent mirror descent for decentralized stochastic optimization,” in Computational Advances in MultiSensor Adaptive Processing (CAMSAP), 2015 IEEE 6th International Workshop on. IEEE, 2015, pp. 517–520.
 [17] J. C. Duchi, A. Agarwal, and M. J. Wainwright, “Dual averaging for distributed optimization: convergence analysis and network scaling,” IEEE Transactions on Automatic Control, vol. 57, no. 3, pp. 592–606, 2012.
 [18] S. Hosseini, A. Chapman, and M. Mesbahi, “Online distributed optimization via dual averaging,” in IEEE Conference on Decision and Control (CDC), 2013, pp. 1484–1489.
 [19] D. Mateos-Núñez and J. Cortés, “Distributed online convex optimization over jointly connected digraphs,” IEEE Transactions on Network Science and Engineering, vol. 1, no. 1, pp. 23–37, 2014.
 [20] A. Nedić, S. Lee, and M. Raginsky, “Decentralized online optimization with global objectives and local communication,” in IEEE American Control Conference (ACC), 2015, pp. 4497–4503.
 [21] M. Akbari, B. Gharesifard, and T. Linder, “Distributed online convex optimization on timevarying directed graphs,” IEEE Transactions on Control of Network Systems, 2015.
 [22] S. Shahrampour and A. Jadbabaie, “Distributed online optimization in dynamic environments using mirror descent,” arXiv preprint arXiv:1609.02845, 2016.
 [23] S. ShalevShwartz, “Online learning and online convex optimization,” Foundations and Trends in Machine Learning, vol. 4, no. 2, pp. 107–194, 2011.
 [24] A. Nedic and A. Ozdaglar, “Distributed subgradient methods for multiagent optimization,” IEEE Transactions on Automatic Control, vol. 54, no. 1, pp. 48–61, 2009.
 [25] H. H. Bauschke and J. M. Borwein, “Joint and separate convexity of the Bregman distance,” Studies in Computational Mathematics, vol. 8, pp. 23–36, 2001.
 [26] Y. BarShalom, Tracking and data association. Academic Press Professional, Inc., 1987.
 [27] F. S. Cattivelli and A. H. Sayed, “Diffusion strategies for distributed Kalman filtering and smoothing,” IEEE Transactions on Automatic Control, vol. 55, no. 9, pp. 2069–2084, 2010.
 [28] U. Khan, S. Kar, A. Jadbabaie, J. M. Moura et al., “On connectivity, observability, and stability in distributed estimation,” in IEEE Conference on Decision and Control (CDC), 2010, pp. 6639–6644.
 [29] S. Das and J. M. Moura, “Distributed state estimation in multiagent networks,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, pp. 4246–4250.
 [30] D. Han, Y. Mo, J. Wu, S. Weerakkody, B. Sinopoli, and L. Shi, “Stochastic eventtriggered sensor schedule for remote state estimation,” IEEE Transactions on Automatic Control, vol. 60, no. 10, pp. 2661–2675, 2015.
 [31] D. Gu, “Distributed particle filter for target tracking,” in IEEE International Conference on Robotics and Automation (ICRA), 2007, pp. 3856–3861.
 [32] O. Hlinka, O. Sluciak, F. Hlawatsch, P. M. Djuric, and M. Rupp, “Likelihood consensus and its application to distributed particle filtering,” IEEE Transactions on Signal Processing, vol. 60, no. 8, pp. 4334–4349, 2012.
 [33] J. Li and A. Nehorai, “Distributed particle filtering via optimal fusion of gaussian mixtures,” in Information Fusion (Fusion), 2015 18th International Conference on. IEEE, 2015, pp. 1182–1189.
 [34] A. Beck and M. Teboulle, “Mirror descent and nonlinear projected subgradient methods for convex optimization,” Operations Research Letters, vol. 31, no. 3, pp. 167–175, 2003.
Appendix A
We make use of two technical lemmas (Lemmas 2 and 3) proved in the Appendix of [22]. We state their results here and use them in the proof of Theorem 1.
Lemma 2
Let X be a convex set in a Banach space, let φ denote a 1-strongly convex function on X with respect to a norm ‖·‖, and let D_φ represent the Bregman divergence with respect to φ. Furthermore, assume that the local functions are Lipschitz continuous (Assumption 1), the matrix W is doubly stochastic (Assumption 2), and the mapping A is non-expansive (Assumption 5). Then, the local estimates generated by the updates (8a)-(8b) satisfy
for any i ∈ [n], where σ₂(W) denotes the second largest eigenvalue of W in magnitude.
Lemma 3
Let X be a convex set in a Banach space, let φ denote a 1-strongly convex function on X with respect to a norm ‖·‖, and let D_φ represent the Bregman divergence with respect to φ. Furthermore, assume that the matrix W is doubly stochastic (Assumption 2), the Bregman divergence satisfies the Lipschitz condition and the separate convexity (Assumptions 3-4), and the mapping A is non-expansive (Assumption 5). Then, it holds that
where
In what follows, we provide the proof of Theorem 1.
A-A Proof of Theorem 1
Recall the definition of the dynamic regret in (4). Using the Lipschitz continuity of f_t (Assumption 1), as well as the fact that the global loss is the sum of the local losses (Eq. (3)), we get
Using the Lipschitz continuity of f_{i,t} for i ∈ [n], we simplify the above as follows:
(9) 
The second line can be controlled via Lemma 2, so we focus on the first term in the bound above. By convexity of f_{i,t}, we have
(10) 
We now need to bound each of the terms on the right-hand side of (10). The stochastic gradients are bounded in view of (7). Therefore, using Hölder's inequality for any primal-dual norm pair, we get
(11) 
where the last line is due to the AM-GM inequality. We now recall update (8b) and use Assumptions 1 and 2 to derive
(12) 
where in the last line we appealed to Lemma 2. Finally, the optimality of x̂_{i,t+1} in (8a) implies (see, e.g., Lemma 4.1 in [34]) that
(13) 
since the Bregman divergence satisfies D_φ(x, y) ≥ (1/2)‖x − y‖² for any x, y ∈ X in view of (6). Substituting (11), (12), and (13) into the bound (10), we derive
(14) 
To bound the last term, we note that
Also, due to (6) we have ‖x − y‖ ≤ (2 D_φ(x, y))^{1/2}, which entails
Therefore, the sum over t ∈ [T] forms a bounded-difference martingale, and we can use Azuma's inequality to get
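For reference, a generic statement of the Azuma-Hoeffding inequality for a martingale difference sequence D_1, …, D_T with |D_t| ≤ c almost surely (the symbols here are generic placeholders, not the paper's notation) is:

```latex
\mathbb{P}\left( \sum_{t=1}^{T} D_t \geq \delta \right)
  \;\leq\; \exp\left( -\frac{\delta^{2}}{2\,T c^{2}} \right),
\qquad \text{so with probability at least } 1-\rho:\quad
\sum_{t=1}^{T} D_t \;\leq\; c\,\sqrt{2\,T \log(1/\rho)} .
```

The second display follows from the first by setting the right-hand side equal to ρ and solving for δ, which is exactly the manipulation carried out in the next step of the proof.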
Setting the probability above to δ and solving accordingly implies
with probability at least 1 − δ. Summing (14) over t ∈ [T] and i ∈ [n] and incorporating the bound above into the last term, we get
with probability at least 1 − δ. Applying Lemma 3 to the above, we can simplify as
We now return to sum (9) over t ∈ [T]. We apply the bound above and the bound in Lemma 2, respectively, to the first and second lines in (9) to finish the proof.