# Team-Optimal Distributed MMSE Estimation in General and Tree Networks

###### Abstract

We construct team-optimal estimation algorithms over distributed networks for state estimation in the finite-horizon mean-square error (MSE) sense. Here, we have a distributed collection of agents with processing and cooperation capabilities. These agents observe noisy samples of a desired state through a linear model and seek to learn this state by interacting with each other. Although this problem has attracted significant attention and been studied extensively in fields including machine learning and signal processing, none of the well-known strategies achieves team-optimal learning performance in the finite-horizon MSE sense. To this end, we formulate the finite-horizon distributed minimum MSE (MMSE) estimation problem when there is no restriction on the size of the disclosed information, i.e., the oracle performance, over an arbitrary network topology. Subsequently, we show that the exchange of local estimates is sufficient to achieve the oracle performance only over certain network topologies. By inspecting these network structures, we propose recursive algorithms achieving the oracle performance through the disclosure of local estimates. For practical implementations, we also provide approaches to reduce the complexity of the algorithms through the time-windowing of the observations. Finally, in the numerical examples, we demonstrate the superior performance of the introduced algorithms in the finite-horizon MSE sense due to optimal estimation.

###### keywords:

Distributed networks, distributed Kalman filter, optimal information disclosure, team problem, finite-horizon, MMSE estimation, tree networks, Gaussian processes.


## 1 Introduction

Over a distributed network of agents with measurement, processing and communication capabilities, we can obtain enhanced processing performance, e.g., faster response time, relative to centralized networks by distributing the processing power over the network banavar15 (); sayedMag (); karMag (); bes15 (). Mainly, distributed agents observe the true state of the system through noisy measurements from different viewpoints, process the observation data in order to estimate the state, and communicate with each other to facilitate the estimation process in a fully distributed manner. Notably, the agents can respond to streaming data in an online manner by disclosing information to each other at certain instances. This framework is conveniently used to model highly complex structures from defense applications to social and economic networks krish11 (); ali2012 (); daron2011 (); jackson_book (). As an example, say that we have radar systems distributed over an area and seeking to locate hostile missiles, i.e., the location of the missile is the underlying state of the system. In that respect, the distributed processing approach is of vital importance for detecting the missiles and reacting as fast as possible. In particular, even if the viewpoints of a few radar systems are blocked due to environmental obstacles, through the communication among the radar systems, each system should still be able to locate the missiles. Additionally, since each radar system not only collects measurements but also processes them to locate the missiles, the overall system can respond to the missiles faster than a centralized approach in which the measurements of all the radar systems are collected at a centralized unit and processed together.

Although there is an extensive literature on this topic, e.g., sayedMag (); ali2012 (); daron2011 (); jackson_book (); lopes2008 (); cattivelli2010 (); shahNips () and references therein, there remain significant and yet unexplored problems in the disclosure and utilization of information among agents. Due to practical concerns about processing power, prior work has focused on computationally simple algorithms that aim to minimize certain cost functions through the exchange of local estimates, e.g., diffusion or consensus based estimation algorithms sayedMag (); lopes2008 (); karMag (); giannakis09 (); sayin2013 (); sayin2014 (). However, there is a trade-off between computational complexity and estimation performance.

Formulating optimal distributed estimation algorithms with respect to certain performance criteria is a significant and unexplored challenge. To this end, we consider here the distributed estimation problem as a team problem for distributed agents, in which agents take actions, e.g., which information to disclose and how to construct their local estimates. This differs from the existing approaches, in which agents simply exchange their local estimates. Furthermore, we address the optimality of exchanging local estimates with respect to the team problem over arbitrary network structures.

We examine the optimal usage of the exchanged information based on its content, rather than taking a blind approach in which the exchanged information is handled irrespective of its content, as in the diffusion or consensus based approaches. In such approaches, the agents generally utilize the exchanged information through certain static combination rules, e.g., the uniform rule uniform (), the Laplacian rule scherber2004 () or the Metropolis rule metropolis (). However, if the statistical profile of the measurement data varies over the network, i.e., agents observe diverse signal-to-noise ratios, ignoring this variation in noise causes severe degradation in the estimation performance sayedMag (). In such cases the agents can perform better even without cooperation sayedMag (). Therefore, the optimal usage of the exchanged information plays an essential role in the performance improvement in the team problem.

Consider a distributed network of agents that observe noisy samples of an underlying (possibly multi-dimensional) state over a finite horizon. The agents can exchange information with only certain other agents at each time instant. In particular, the agents cooperate with each other as a team according to a certain team cost depending on the agents’ actions. To this end, each agent constructs a local estimate of the underlying state and constructs messages to disclose to the neighboring agents at each time instant. We particularly consider a quadratic cost function and assume that the underlying state and the measurement noises are jointly Gaussian.

We note that restrictions on the sent messages, e.g., on the size of the disclosed information, have a significant impact on the optimal team actions. We introduce the concept of the oracle performance, in which there is no restriction on the disclosed information. In that case, a time-stamped information disclosure can be team-optimal, and we introduce the optimal distributed online learning (ODOL) algorithm using the time-stamped information disclosure. Through a counterexample, we show that the oracle performance cannot, in general, be achieved through the exchange of local estimates. Then, we analytically show that over certain networks, e.g., tree networks, agents can achieve the oracle performance through the exchange of local estimates. We propose the optimal and efficient distributed online learning (OEDOL) algorithm, which is practical for real-life applications and achieves the oracle performance over tree networks through the exchange of local estimates. Finally, we introduce the time-windowing of the measurements in the team cost and propose a recursive algorithm, the sub-optimal distributed online learning (SDOL) algorithm, which combines the received messages linearly through time-invariant combination weights.

We can list our main contributions as follows: 1) We introduce a team problem to minimize a finite-horizon mean-square-error cost function in a distributed manner. 2) We derive the ODOL algorithm achieving the oracle performance over arbitrary networks through time-stamped information exchange. 3) We address whether agents can achieve the oracle performance through the disclosure of local estimates. 4) We propose a recursive algorithm, the OEDOL algorithm, achieving the oracle performance over certain network topologies with a tremendously reduced communication load. 5) We also formulate sub-optimal versions of the algorithms with reduced complexity. 6) We provide numerical examples demonstrating the significant gains due to the introduced algorithms.

The remainder of the paper is organized as follows. We introduce the team problem for distributed-MMSE estimation in Section 2. In Section 3, we study tree networks, exploit the network topology to formulate the OEDOL algorithm, which reduces the communication load, and introduce cell structures, which are relatively more connected than tree networks. We propose the sub-optimal versions of the ODOL algorithm for practical implementations in Section 4. In Section 5, we provide numerical examples demonstrating significant gains due to the introduced algorithms. We conclude the paper in Section 6 with several remarks.

Notation: We work with real data for notational simplicity. denotes the multivariate Gaussian distribution with zero mean and designated covariance. For a vector (or matrix ), (or ) is its ordinary transpose. We denote the vector whose terms are all s (or all s) by (and ). We denote random variables by bold lower case letters, e.g., . The operator produces a column vector or a matrix in which the arguments of are stacked one under the other. For a matrix , operator constructs a diagonal matrix with the diagonal entries of . For a given set , creates a diagonal matrix whose diagonal block entries are elements of the set. The operator denotes the Kronecker product.

## 2 Team Problem for Distributed-MMSE Estimation

Consider a distributed network of agents with processing and communication capabilities. In Fig. 1, we illustrate this network through an undirected graph, where the vertices and the edges correspond to the agents and the communication links across the network, respectively. For each agent , we denote the set of agents whose information can be received after at least hops, i.e., the -hop neighbors, by , and is the cardinality of (see Fig. 1(b)). For notational simplicity, we define and . We assume that and for . Note that the sequence of the sets is non-decreasing, i.e., if .
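The -hop neighborhoods defined above can be computed by breadth-first search over the communication graph. The sketch below is our own Python illustration (the line network, the convention that an agent belongs to its own neighborhood, and all names are illustrative assumptions, not the paper's notation); it also exhibits the non-decreasing nesting of the neighborhood sets.

```python
from collections import deque

def k_hop_neighbors(adjacency, i, k):
    """Agents whose information can reach agent i within at most k hops.
    Agent i itself is included by the convention assumed here."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if dist[u] == k:      # do not expand past k hops
            continue
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

# line network 0 - 1 - 2 - 3
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
hops1 = k_hop_neighbors(adj, 0, 1)   # {0, 1}
hops2 = k_hop_neighbors(adj, 0, 2)   # {0, 1, 2}
```

The nesting property mentioned above appears here as `hops1` being a subset of `hops2`.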

Here, at certain time instants, the agents observe a noisy version of a time-invariant and unknown state vector which is a realization of a Gaussian random variable with mean and auto-covariance matrix . In particular, at time instant , each agent observes a noisy version of the state as follows:

where is a matrix commonly known by all agents, and is a realization of a zero-mean white Gaussian vector process with auto-covariance . Correspondingly, the observation is a realization of the random process , where almost everywhere (a.e.). The noise is also independent of the state and the other noise parameters , and . We assume that the statistical profiles of the noise processes are common knowledge of the agents since they can readily be estimated from the data sayed_book ().
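As a concrete illustration of this linear Gaussian measurement model, the following Python sketch draws a time-invariant state once from its Gaussian prior and generates noisy observations. The dimensions, observation matrix, and noise covariance are our own illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

m, p = 2, 3                 # state and observation dimensions (illustrative)
x = rng.multivariate_normal(np.zeros(m), np.eye(m))   # time-invariant state
H = rng.standard_normal((p, m))    # observation matrix, known to the agent
Sigma_v = 0.1 * np.eye(p)          # measurement-noise auto-covariance

def observe(state, H, Sigma_v, rng):
    """One noisy linear observation: y_t = H x + v_t, v_t ~ N(0, Sigma_v)."""
    v = rng.multivariate_normal(np.zeros(H.shape[0]), Sigma_v)
    return H @ state + v

y = observe(x, H, Sigma_v, rng)
```

Each agent would hold its own observation matrix and noise covariance; successive calls to `observe` model the stream of measurements of the same fixed state.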

The agents have communication capabilities and at certain time instants, i.e., after each measurement, they can exchange information with the neighboring agents as seen in Fig. (a)a. Let denote the information disclosed by to at time , and is the size of the disclosed information. We assume that there exists a perfect channel between the agents such that the disclosed information can be transmitted with infinite precision. Therefore, we denote the information available to agent- at time by

and let denote the sigma-algebra generated by the information set . Furthermore, we define the set of all -measurable functions from to by . Importantly, here, which information to disclose is not determined a priori in the problem formulation. Let be the decision strategy for , then agent- chooses , , from the set , i.e., and , based on his/her objective.

In addition to the disclosed information , , agent- takes action , where the corresponding decision strategy is chosen from the set , which is the set of all -measurable functions from to , i.e., and . Here, we consider that the agents have a common cost function:

where all actions , and are costly, and agent- should take actions and , and , accordingly. Therefore, this corresponds to a team-problem, in which agent- faces the following minimization problem:

(1)

We point out that both and are infinite-dimensional, i.e., (1) is a functional optimization problem and the optimal strategies can be nonlinear functions of the available information. Furthermore, the agents should also construct the disclosed information accordingly since the other agents’ decisions directly depend on the disclosed information.

### 2.1 A Lower Bound

In order to compute the team optimal strategies, we first construct a lower bound on the performance of the agents by removing the limitation on the size of the disclosed information, i.e., . In that case, the following proposition provides an optimal information disclosure strategy.

Proposition 2.1. When , a time-stamped information disclosure strategy, in which agents transmit the most current version of the available information (e.g., see Fig. 2), can lead to the team-optimal solution.

Proof. Through the time-stamped information disclosure, each agent can obtain the measurements of the other agents separately in a connected network. However, the measurements of the non-neighboring agents can only be received after certain hops due to the partially connected structure, i.e., certain agents are not directly connected. As an example, the disclosed information of reaches by passing through two communication links, as seen in Fig. 2(b). In particular, this case assumes that each agent has access to full information from the other agents, albeit with certain hops, and corresponds to the direct aggregation of all measurements across the network at each agent.

Correspondingly, at time , all the information aggregated at the th agent is given by

(2)

where the information from the furthest agent is received after at least hops. Therefore, a time-stamped information disclosure strategy can lead to the team-optimal solution.
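The time-stamped disclosure used in the proof can be sketched as a flooding protocol in which, after measuring, every agent relays its entire set of time-stamped records to its neighbors, so each measurement travels one hop per time step. The network, placeholder measurement strings, and data structures below are our own illustrative choices, not the paper's formulation.

```python
def flood(adjacency, measurements, T):
    """Time-stamped disclosure sketch: each record is a tuple
    (origin agent, measurement time, value)."""
    n = len(adjacency)
    info = [set() for _ in range(n)]
    for t in range(T):
        # each agent first records its own time-stamped measurement
        for i in range(n):
            info[i].add((i, t, measurements[i][t]))
        # agents then exchange their current information sets
        received = [set() for _ in range(n)]
        for i in range(n):
            for j in adjacency[i]:
                received[j] |= info[i]
        for i in range(n):
            info[i] |= received[i]
    return info

# line network 0 - 1 - 2: agent 2's data needs two hops to reach agent 0
adj = {0: [1], 1: [0, 2], 2: [1]}
meas = {i: [f"y_{i},{t}" for t in range(3)] for i in range(3)}
info = flood(adj, meas, 2)
```

After two steps, agent 0 holds agent 2's first measurement, matching the claim that the measurement of the furthest agent arrives after a number of hops equal to the graph distance.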

Let denote the sigma-algebra generated by the information set and be the set of all -measurable functions from (where if ) to . Then, agent- faces the following minimization problem:

(3)

which is equivalent to

(4)

since , , is set to the time-stamped strategy and has impact only on the term . Let be defined by

Then, the team-optimal strategy in the lower bound, i.e., the oracle strategy, and the corresponding action are given by

(5)

and we define .

### 2.2 ODOL Algorithm

Since the state and the observation noise are jointly Gaussian random parameters, we can compute (5) through a Kalman-like recursion moore_book (). Therefore, we provide the following ODOL algorithm. We introduce a difference set and a vector . Then, for the iterations of the ODOL algorithm are given by

where , , , , , , and is the corresponding permutation matrix. (If the inverse fails to exist, a pseudo-inverse can replace the inverse moore_book ().)

We point out that this is a lower bound on the original cost function (1), i.e.,

(6)

where we substitute the team-optimal action (when ) back into (4) and sum over and . However, the lower bound is not necessarily tight depending on . By Proposition 2.1, the time-stamped information disclosure strategy, in which the size of the disclosed information is , yields the oracle solution. This implies that when , the lower bound is tight. Furthermore, the team-optimal solutions are linear in the available information and can be constructed through the recursive ODOL algorithm. However, is linear in the number of agents, , and in large networks this can cause an excessive communication load, while a low communication load is crucial for the applicability of distributed learning algorithms sayin2014 (); sayin2013 (). Therefore, in the following section, we provide a sufficient condition on the size of the disclosed information, which depends on the network structure (rather than its size), in order to achieve the lower bound (6).
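Since all quantities are jointly Gaussian, the oracle estimate at an agent reduces to standard Gaussian conditioning on the stacked measurements it has received. The following batch sketch is our own formulation of that textbook linear-MMSE estimator, not the paper's recursive ODOL update; all names and the scalar example are illustrative.

```python
import numpy as np

def batch_mmse(x_mean, Sigma_x, H_list, R_list, y_list):
    """Conditional-mean (linear-MMSE) estimate of a Gaussian state
    x ~ N(x_mean, Sigma_x) from stacked measurements y_k = H_k x + v_k,
    with independent noises v_k ~ N(0, R_k)."""
    H = np.vstack(H_list)
    y = np.concatenate(y_list)
    # build the block-diagonal noise covariance by hand (numpy only)
    p = H.shape[0]
    R = np.zeros((p, p))
    r = 0
    for Rk in R_list:
        d = Rk.shape[0]
        R[r:r + d, r:r + d] = Rk
        r += d
    S = H @ Sigma_x @ H.T + R      # covariance of the stacked measurements
    return x_mean + Sigma_x @ H.T @ np.linalg.solve(S, y - H @ x_mean)

# scalar sanity check: prior N(0, 1), one measurement y = x + v, v ~ N(0, 1)
est = batch_mmse(np.zeros(1), np.eye(1), [np.eye(1)], [np.eye(1)],
                 [np.array([2.0])])    # estimate = 2 / (1 + 1) = 1.0
```

This batch form makes the communication-load issue above concrete: computing it directly requires every raw measurement, which is what the recursive algorithms avoid.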

## 3 Distributed-MMSE Estimation with Disclosure of Local Estimate

In the conventional distributed estimation algorithms, e.g., consensus and diffusion approaches, agents disclose their local estimates, which have size (note that this does not depend on the network size). The following example addresses whether the disclosure of local estimates can achieve the lower bound (6).

### 3.1 A Counter Example

Consider a cycle network of agents as seen in Fig. 3, where , , , for , and . We aim to show that agent-’s oracle action at time , i.e., , cannot be constructed through the exchange of local estimates.

At time , agent- and agent- have the following oracle actions:

(7)

(8)

Note that since there are two hops between agents and , at , the agents do not yet have access to any of each other’s measurements. At time , agent-’s oracle action is given by

Assume that can be obtained through the exchange of local estimates:

(9)

Since all parameters are jointly Gaussian, the local estimates are also jointly Gaussian, , is linear in and . Furthermore, the measurements , , and are only included in and . Therefore, we obtain

where refers to the other terms. However, the equality of and implies due to the combination weights of and , respectively, and due to the combination weight of , which leads to a contradiction. Hence, which information to disclose over arbitrary networks for a team-optimal solution must be considered carefully. In the following, we analytically show that the lower bound can be achieved through the disclosure of local estimates over “tree networks”.

### 3.2 Tree Networks

A network has a “tree structure” if its corresponding graph is a tree, i.e., connected and undirected without any cycles skiena_book (). As an example, the conventional star or line networks have tree structures. We remark that for an arbitrary network topology we can also construct a spanning tree of the network and thereby eliminate the cycles. In the literature, there exist numerous distributed algorithms for minimum spanning tree construction wu (); gallager1983 (); peleg_book (); elkin2004 (); khan2009 ().
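For illustration, cycles can be eliminated with a simple centralized breadth-first construction; this is our own sketch (the cited works give proper distributed minimum-spanning-tree algorithms), and the four-agent cycle network is an illustrative assumption.

```python
from collections import deque

def spanning_tree(adjacency, root=0):
    """Breadth-first spanning tree of a connected undirected graph,
    returned as an adjacency dict with all cycles removed."""
    parent = {root: None}
    queue = deque([root])
    tree = {v: [] for v in adjacency}
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in parent:        # first discovery: keep edge (u, v)
                parent[v] = u
                tree[u].append(v)
                tree[v].append(u)
                queue.append(v)
    return tree

# cycle network of 4 agents: 0 - 1 - 2 - 3 - 0
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
tree = spanning_tree(adj)
```

The resulting tree keeps all agents connected with exactly one fewer edge than the number of agents, so the tree-network results above apply to it.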

Importantly, the following theorem shows that over tree networks we can achieve the performance of the oracle algorithm through the disclosure of the local estimates only.

Theorem 3.1: Consider the team problem over a tree network, in which . Then, the exchange of local estimates can lead to the team-optimal solution, i.e., agents can achieve the oracle performance.

Proof: Initially, agent- has access to only and the oracle action is . At time , the oracle action is given by

(10)

which can be written as

(11)

This implies that for and , the oracle performance can be achieved through the disclosure of local estimates. Therefore, we can consider the oracle action (10) even though agents disclose their local estimates instead of using time-stamped information disclosure.

As seen in Fig. 4, over a tree network, for we have

(12)

Note that the sets in (12) are disjoint as

(13)

for all and . Notably, over a tree network, by (13), we can partition the collection set of the measurements received after at least -hops as follows

(14)

In the time-stamped information disclosure, at time , agent- has access to , defined in (2). We denote the set of new measurements received by over at time by

which can also be written as

(15)

where we exclude the information sent by to at time , i.e., . Then, we can write the accessed information as the union of the new measurement , the new measurements received over the neighboring agents, and the accessed information at time , as follows:

(16)

Note that the sets on the right-hand side of (16) are disjoint due to the tree structure. Furthermore, by (15) and (16), the sigma-algebra generated by is equivalent to the sigma-algebra generated by the set . Since , we obtain

(17)

By (15), we have

(18)

which implies that for , is constructible from and for . Hence, by induction, we conclude that the lower bound can be achieved through the exchange of local estimates.

Remark 3.1: When the expectation of the state is conditioned on an infinite number of observations, even over a constructed spanning tree, only a finite number of observations is missing compared to the case of a fully connected network. Hence, even if we construct the spanning tree of that network, we would still asymptotically achieve the lower bound of a fully connected (or centralized) network. As an illustrative example, in Fig. 10, we observe that the MMSE performances over the fully connected, star and line networks are asymptotically the same. Similarly, in chenConf (); chenI (); chenII (), the authors show that the performance of diffusion-based algorithms can approach that of a fully connected network under certain regularity conditions.

In the sequel, we propose the OEDOL algorithm that achieves the lower bound over tree networks iteratively.

### 3.3 OEDOL Algorithm

By Theorem 3.1, over a tree network, oracle action can be constructed by

(19)

through the disclosure of oracle actions, i.e., local estimates. We remark that is linear in the previous actions , . In order to extract the new information, i.e., the innovation part, we would need to eliminate the previously received information at each instant for each neighboring agent, which brings in additional computational complexity. Instead, agents can send only the new information relative to the previously sent information, e.g., . Note that here agents disclose the same information to all neighboring agents. Since we are conditioning on linear combinations of the conditioned variables without affecting their spanned space, i.e., is computable from for and vice versa, agents can still achieve the oracle performance with reduced computational load.

At time , agent- receives the local measurement and the information sent by the neighboring agents, . We aim to determine the content of the received information so as to extract the innovation within it and utilize this innovation in the update of the oracle action.

Initially, at time , agent- has only access to the local measurement . Then, the oracle action is given by

Let and , and set and . Then, we obtain

Next, instead of sending , agent- sends to the neighboring agents, ,

Correspondingly, at time , agent- receives and . Let be the corresponding random vector. Then, conditioning the state and the received information on the previously available information , we have

where and , where .

Let and and set

(20)

Then, we obtain

(21) | |||

and agent- sends

where denotes the corresponding th block of . Therefore, at time , agent- receives from :

(22)

Since the last term on the right hand side of (22) is known by , we have

(23)

where , and

(24)

By (21), (23), and (24), the next oracle action is given by

Subsequently, agent- sends and the received information from yields

Then, is given by

(25)

Correspondingly, we have

(26)

Therefore, the oracle action can be written as

where are defined accordingly and

Following identical steps, for , the OEDOL algorithm is given by