Local Tomography of Large Networks
under the Low-Observability Regime
This article studies the problem of reconstructing the topology of a network of interacting agents via observations of the state-evolution of the agents. We focus on the large-scale network setting with the additional constraint of partial observations, where only a small fraction of the agents can be feasibly observed. The goal is to infer the underlying subnetwork of interactions and we refer to this problem as local tomography. In order to study the large-scale setting, we adopt a proper stochastic formulation where the unobserved part of the network is modeled as an Erdös-Rényi random graph, while the observable subnetwork is left arbitrary. The main result of this work is establishing that, under this setting, local tomography is actually possible with high probability, provided that certain conditions on the network model are met (such as stability and symmetry of the network combination matrix). Remarkably, such conclusion is established under the low-observability regime, where the cardinality of the observable subnetwork is fixed, while the size of the overall network can increase without bound.
In networked dynamical systems [1, 2, 3, 4] the state of the agents comprising the network evolves over time and is affected by peer-to-peer interactions. In general, information about the profile of interactions is unavailable. It is the goal of network tomography to infer network connectivity from observing the evolution of a subset of the graph nodes. Problems of this type arise in many domains where knowledge of the underlying topology linking agents is critical for better inference and control mechanisms. For example, in the context of epidemics, it is well-known that the network topology may foster or hinder the outbreak of diseases or opinions ; in the context of brain functionality, it is also known that the neuron connectivity impacts the efficiency and robustness of the brain dynamics  and can help explain brain functional disorders [7, 8]; and even in cyber-security applications it is important to determine and understand the underlying network structure to devise effective counter-measures .
This article focuses on the large-scale network setting. In such context, as in brain neuron networks or large Internet-of-Things networks, one can only observe and/or process limited portions of the network. More formally, we address a local tomography problem: a subset of the agents is observed and their subnetwork of interactions is inferred from these observations. Figure 1 depicts the local tomography paradigm. There are three main reasons that cause this observability limitation in the large-scale network setting:
Accessibility-limit. Some portions of the network are not accessible and, hence, unobservable. Moreover, in many large-scale settings the existence of some sources of interactions (i.e., unobserved network links) might be unknown.
Probing-limit. The acquisition of data and storage capacities can be smaller than the scale of the network.
Processing-limit. The complexity of the data-mining further constrains the size of the data that can be processed.
For instance, one may probe the activity of a subset of neurons – as it is unfeasible to track the activity of all the brain neuron network – in order to reconstruct its underlying profile of interactions (a.k.a. connectome). This requires that we partially observe the system and extract information about its underlying subnetwork of interactions. One could also sequentially integrate the inference results of various local tomographic experiments to deduce global information about the large-scale networked system, in particular, its topology.
Under the aforementioned local tomography setting, the problem of inferring the subnetwork topology across the observed agents becomes exceedingly challenging akin to ill-posed. It is therefore important to devise nontrivial conditions (if any) under which the problem is still well-posed, i.e., the information about the topology can be effectively inferred from the observable samples. In this article, we show that under an appropriate setting, the problem of local tomography becomes well-posed with high probability in the thermodynamic limit: when the number of interacting agents grows, the (fixed) subnetwork topology associated with the observed agents can be perfectly recovered. We refer to such framework as “low-observability” to emphasize that we are interested in studying the local tomography problem in the thermodynamic limit of large networks while the observed part is fixed and finite. Besides ascertaining conditions under which the problem is well-posed with high probability, we further derive a procedure on the space of observables to recover the subnetwork topology. Finally, as an application of these results, we devise a strategy that shows how to learn the topology sequentially, by partitioning the observable network into small patches, and launching successive instances of the local tomography algorithm on these patches.
I-a Preview of the Main Result
Network tomography is associated with retrieving the underlying network structure of a distributed dynamical system via observation of the output measurements of the constituent elements. The typical formulation of the network tomography problem involves two main objects: the statistical model that governs the laws of evolution of the (stochastic) dynamical system of interest; and a set of observables. In this article, we consider a stochastic dynamical system described by a first-order Vector Auto-Regressive (VAR) or diffusion model, where entities corresponding to the network agents interact over time , according to the following law:
Here, is a stable matrix with nonnegative entries (usually a stochastic matrix), and
with the vector collecting the state (or output measurements) at time of the agents comprising the network; and representing a random input (e.g., a source of noise or streaming data) at time . The ensemble are independent and identically distributed (i.i.d.) both spatially (i.e., w.r.t. to index ) and temporally (i.e., w.r.t. to index ). Without loss of generality, we assume that the random variables have zero mean and unit variance.
The support-graph of reflects the underlying connections among the agents. Indeed, we have from (1) that:
which shows that, in order to update its output at time , agent combines the outputs of other agents from time . In particular, agent scales the output of agent by using a combination weight . Note that the output of agent is employed by agent only if . The uppermost panel in Figure 2 offers a pictorial view of the local tomography problem, whereas the lowermost panel illustrates the role of the combination weights in determining the mutual influences between nodes.
Model (4) further shows how the observation is affected by the source value , which is available locally at agent at time .
It should be noted that several strategies for distributed processing over networks, such as consensus [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21] and diffusion [22, 23, 24, 25, 26, 27] strategies, lead to data models of the form (4). The model also arises in economics – see, e.g.,  – and may also be viewed as a linear variational equation associated with a nonlinear stochastic dynamical system, where plays the role of the deviation about its noiseless counterpart, as in . In other words, tomography analysis over such family of stochastic dynamical systems is useful for a broad class of networked systems.
The problem of support-graph recovery addressed in this work is generally referred to as network tomography in the literature because only indirect observations are available. In our framework, only output measurements from a subset of the nodes are accessible, and no information is available about the unobserved nodes including their number or connectivity. We refer to this paradigm as local tomography. Under this challenging framework, the goal is identifiability of the topology linking the observable agents. That is, we consider the problem of inferring the topology associated with a subset of observable interacting agents, by measuring only the outputs produced by such agents.
Let us ignore for a while the restriction of partial observability. It is tempting (and actually not that uncommon in the literature) to estimate the connections between the network agents by measuring the correlation between their output measurements. There is, however, one critical issue related to the use of the correlation measure for topology inference, arising from the streaming nature of the data. In general, when an external observer starts collecting output measurements, the network would have been in operation since some time already. Therefore, after a transient phase, over a connected network all agent pairs will become correlated. In order to illustrate this point in greater detail, let us introduce the correlation matrix at time , namely, . When is symmetric (which is the case considered in this work), using (1), and neglecting transient terms associated with the initial state , we have that:
where the latter series is guaranteed to converge whenever is a stable matrix – all its eigenvalues lie inside the unit disc. Assuming that the system is observed at steady-state (i.e., that the system is in operation since some time), we must focus on the limiting correlation matrix, . However, we immediately see from (5) that, even if the correlation matrix were perfectly known, direct retrieval of the support graph of from is obfuscated by the fact that the correlation matrix depends on (a superposition of) powers of , and not only on . Moreover, even with full observation, when inversion of the matrix can be performed, in view of equation (5), one would retrieve , and not . Then, since the mapping from to (the support graph of) is not bijective in general, one would be faced with the inverse problem of retrieving the support graph of from – such inverse problem can still be explored by properly reinforcing some sparsity constraints, refer, e.g., to .
Tomography relies primarily in developing a scheme to properly process the observables – e.g., the state evolution of the interconnected agents – so as to infer the underlying network structure. The above naïve scheme, based purely on the correlation , can be improved by introducing the one-lag correlation matrix, which, in view of (1), takes the form:
Therefore, we obtain the following relationship:
In principle, since there exist many ways to estimate and consistently as , expression (7) reveals one possible strategy to estimate (and hence its support-graph) from the observations.
Topology estimation based on relation (7) is viable whenever full-observation of the system is permitted. Under a partial observability restriction, however, when only a subset of the network is accessible, only the covariance submatrices associated with the observable agents, denoted by and , are available. One is certainly free to introduce a truncated version of (7), say, as:
It is clear from basic linear algebra that (8) is in general distinct from the ground truth matrix , namely, from the combination matrix corresponding to the subnetwork connecting the observable agents whose support must be inferred.
Despite this difference, it has been shown in the recent work [31, 32] that the support of the observable network can still be recovered (consistent tomography) through the truncated estimator in (8), under certain conditions that can be summarized as follows: the overall network graph is drawn from a connected Erdös-Rényi random graph with vanishing connection probability; the cardinality of the observed subnetwork grows linearly with the size of the overall network; the matrix is a symmetric combination matrix belonging to a certain class.
The work  leads to several insightful conclusions about network tomography for Erdös-Rényi models. In this work, we pursue the same network tomography problem albeit for a different more demanding network setting, explained below, which will require new arguments and lead to new results that complement well the results from . In particular, the proof techniques used here will rely on graph theoretical techniques and on special graph constructs to arrive at the important conclusion that the naïve truncated estimator (8) is still able to deliver consistent tomography under partial network observations and more relaxed requirements. The main features of the framework proposed in the current manuscript can be summarized as follows:
— Topology of the accessible network portion. We assume that the subnetwork connecting the observable agents has an arbitrary topology, which is modeled through a deterministic graph. This subgraph is the object of inference. The remaining unobserved part is assumed to be drawn from an Erdös-Rényi random graph. The overall network construction is therefore referred to as a partial Erdös-Rényi model. It is useful to interpret and motivate this model in the classical “signal plus noise” paradigm in the following sense. For what concerns the object of the inference (i.e., the support graph of the observable nodes), it is modeled as an arbitrary deterministic signal. For what concerns the undesired component (i.e., the unobserved subnet), it is modeled as a noisy component. To get insightful results, we must choose some model for this random component. In the absence of any prior information, it is meaningful to opt for a uniform model, namely, the Erdös-Rényi random graph, where the presence/absence of each edge is determined through a sequence of i.i.d. Bernoulli experiments. Accordingly, few connections (i.e., high sparsity in the unobserved portion) take on the meaning of a controlled noise level. In contrast, in  it was assumed that the overall network (observed portion unobserved portion) is drawn from an Erdös-Rényi random graph. Such a construction poses limitations on the subgraph that we wish to identify, which cannot be selected in an arbitrary fashion any longer. Moreover, the network construction used in  assumes a vanishing fraction of connected nodes within the observable set; a condition that is removed in the current analysis.
— Cardinality of the accessible network portion. In , it was assumed that the cardinality of the observed subset scales linearly with , so that the ratio converges to some positive fraction as . In contrast, we assume here that the subnetwork of observable nodes is fixed. This means that, in our framework, we focus on retrieving the support of a subnetwork that is embedded in a network that becomes infinitely larger as , i.e., the size of the unobserved component becomes asymptotically dominant. The resulting regime is accordingly referred to as a low-observability regime. Such a model is particularly relevant, for example, in circumstances where we have a large network, and we are constrained to perform probing actions at few accessible locations. We remark that the case where the ratio goes to zero is not addressed (nor can be obtained from the results) in .
— Consistent tomography. The main result of the present work (Theorem further ahead) is establishing that consistent tomography is achievable under the aforementioned setting. We shall prove that, if the unobserved network is drawn from an Erdös-Rényi random graph with connection probability
where is a sequence that diverges as , satisfying the condition:
then the (arbitrary) support graph of can be recovered through the truncated estimator , with probability tending to one as . More specifically, in this work we are able to establish consistency in that each entry of the support graph is recovered perfectly in the limit as . In , mainly due to the fact that the object of estimation has cardinality growing with , consistency is not formulated in terms of an entry-by-entry recovery. Instead, consistency there is formulated in terms of two macroscopic indicators, namely, the fraction of correctly classified interacting pairs, and the fraction of correctly classified non-interacting pairs. Both fractions are proved to converge to one as grows to infinity.
Another difference with respect to  relates to the connection probability of the Erdös-Rényi graph. Having a sufficiently small translates into a sufficient degree of sparsity. In other words, if we interpret the unobserved network as a noisy component, the noise in the system cannot exceed a certain threshold to grant perfect reconstruction. For the setting considered in , a connection probability vanishing as in (9) was sufficient to achieve consistent tomography, without additional constraints on . On the other hand, for the results of this work to hold, we need the additional constraint in (10), which corresponds to invoking slightly more sparsity.
Finally, we remark that the results of this work allow drawing some useful conclusions also in relation to the setting addressed in , namely, in relation to the case of a full Erdös-Rényi construction with growing linearly with . We find it convenient to postpone the comments on this particular issue to Sec. VII, because some technical details are necessary for a proper explanation.
I-B Related work
The existing approaches to network reconstruction can be categorized based on two major features:
— : Class of networked dynamical systems governing the state-evolution of the agents, e.g., the diffusion model in (1), and related observables, e.g., the process in (1).
— : Topology-retrieval methods that should exploit the relation between the observables and the underlying support-topology. Such methods are sensitive to the dynamics and the observables arising from the model in .
Regarding , most works focus on linear systems. Nonlinear dynamics are often dealt with by linearizing via considering variational characterizations of the dynamics (under small-noise regimes) [33, 34, 35] or by appropriately increasing the dimension of the observable space [36, 37]. In the context of linear (or linearized) systems, particular attention is paid to autoregressive diffusion models [28, 38, 30, 39, 40].
For what concerns , the majority of the literature considers methods aimed at identifying commonalities between correlation constructs and graph topologies. We now make a brief summary of the available results as regards the existing topology-retrieval methods that are more closely related to our setting. To get some ideal benchmark, it is useful to start with the full observation case, and then focus on the case of interest of partial observation.
— Tomography under full observations. In , the authors introduce directed information graphs, which are used to reveal the dependencies in networks of interacting processes linked by causal dynamics. Such a setting is enlarged in , where a metric is proposed to learn causal relationships by means of functional dependencies, over a (possibly nonlinear) dynamical network. Causal graph processes are exploited in , where an algorithm (with a detailed convergence analysis) is proposed for topology recovery. Recently, the inverse problem of recovering the topology via correlation structures has been addressed through optimization-based methods, by reinforcing some (application-dependent) structural constraints such as sparsity, stability, symmetry. For instance, in [30, 39], since the combination matrix and the correlation matrix share the same eigenvectors, the set of candidate topologies is reduced by computing these eigenvectors, and the inverse problem is then tackled with optimization methods under sparsity constraints.
An account of topology inference from node measurements (still under the full observations regime) is offered in , where a general linear model is considered and an approach based on Wiener filtering is proposed to infer the topology.
However, as already noted in  a Wiener filtering approach is redundant, since exact topology recovery can be obtained (with full observations) through the estimator in (7). As it is well known, this solution admits the following useful interpretation: the combination weights obtained through (7) are the coefficients of the best one-step linear predictor (a.k.a., in the context of causal analysis, as Granger estimator), i.e., they yield the minimum expected squared error in estimating from the past samples – see, e.g., . We remark that the case where (7) is applied with correlation matrices estimated empirically from the measurements provides the best one-step linear predictor in a least-squares sense (i.e., when the expected squared error is replaced by the empirical squared error evaluated on the measurements collected over time). However, all the aforementioned results pertain to the case where node measurements from the whole network are available. It is instead necessary to consider the case when only partial observation of the network is permitted.
The case of more general topologies is instead addressed in [38, 47], where technical conditions for exact or partial topology identification are provided. It is useful to contrast such identifiability conditions with the approach pursued in the present work. Basically, the identifiability conditions offered in [38, 47] act at a “microscopic” level, namely, they need a detailed knowledge of the topology and/or the statistical model (e.g., type of noise, joint distribution of the observable data). For these reasons, the approach is not practical for large-scale network settings (which are the main focus of this work).
In contrast, in this work we pursue a statistical asymptotic approach that is genuinely tailored to the large-scale setting: the conditions on the network topology are described at a macroscopic level through average descriptive indicators, such as the connection probability between any two nodes. Under these conditions, we focus on establishing an achievability result that holds (in a statistical sense) as the size of the network scales to infinity.
— Tomography under partial observations, methods. As already noted, the classic, exact solution to the topology problem under full observation is provided by (7), and arises from the solution of a one-step linear prediction problem [43, 38]. Under partial observations, we propose to keep the same approach, except that the best one-step linear prediction is enforced on the observable nodes only. As a matter of fact, the combination weights estimated through (8) provide the best one-step linear prediction of the observable measurement (for ) from the past observable measurements . We remark that this solution, which can still be interpreted as a Granger estimator, is widely adopted in causal inference from time series, when one ignores and/or neglects the existence of latent components. However, there is in principle no guarantee that such an estimator can provide reliable tomography. Our main goal is establishing that it actually can, under the demanding setting illustrated in Sec. I-A.
— Connections with graphical models. In a nutshell, a graphical model can be described as a collection of random variables indexed by nodes in a network, whose pairwise relationships (which determine the topology, i.e., the undirected graph) are encoded in a Markov random field. One of the fundamental problems in graphical models is retrieving the network topology by collecting measurements from the network nodes. It is useful to comment on some fundamental differences, as well as useful commonalities, between the graphical model setting and our problem.
In the standard graphical model formulation (and, hence, in the vast majority of the available related results) the network evolution over time (e.g., the dynamical system in (1)) is not taken into account. Rather, the samples are assumed independent across the index . This difference has at least two relevant implications.
The first difference pertains to the type of estimators used for topology retrieval. For example, in a Gaussian graphical model, the inverse of the correlation matrix , a.k.a. concentration matrix, contains full information of the graph topology: the -th entry of the concentration matrix is nonzero if, and only if, nodes and are connected. In contrast, we see from the Granger estimator in (7) that in our case an additional operation is needed (namely, multiplication with the one-lag correlation matrix ) to obtain the matrix that contains the topology information (in our case, the combination matrix, ). This difference is an inherent consequence of the system dynamics described by the first-order VAR model in (1). Second, the dynamical system ruling the network evolution usually enforces some degree of dependence between subsequent measurements. For this reason, while in our case the observations collected over time are correlated, in the standard graphical model formulation the samples upon which the topology inference is based are usually assumed statistically independent. Keeping in mind these fundamental distinctions, we now list some recent works about topology recovery on graphical models.
The idea of studying the large-network behavior through an Erdös-Rényi model has been applied in , where the emergence of “large” paths over the random graph (a property that we will use in our treatment) has been exploited for topology inference. However, reference  addresses the case of full observations. Instead, for the case of partial observations, in  an efficient method is proposed, which is suited for the case of large-girth graphs, such as, e.g., the bipartite Ramanujan graphs and the random Cayley graphs. In , still for the case of partial observations, an inference method is proposed under the assumption that the connection matrix is sparse, whereas the error matrix associated to the latent-variables component exhibits a low-rank property.
In summary, contrasted with recent results about topology recovery on graphical models, the results obtained in the present work constitute an advance because: we deal with a dynamical system, see (1); the partial observations setting considered in the present work relies on assumptions different from those used in [50, 49]: in our case the unobserved component is Erdös-Rényi, but the subnetwork of observable nodes is deterministic and arbitrary, and the combination matrix obeys transparent conditions borrowed from the adaptive networks literature. We believe that the possibility of working with dynamical models, the arbitrariness of the monitored subnetwork, as well as the direct physical meaning of the conditions on the combination matrix, provide useful novel insights on the problem of topology inference under partial observations.
To sum up, our major contribution lies in establishing technical guarantees for graph structural consistency of the Granger estimator applied to the subset of observable nodes. This is formally stated in the main result of this paper, Theorem .
I-C Motivating Example: Adaptive Diffusion Networks
A network of agents observes a spatially and temporally i.i.d. sequence of zero-mean and unit-variance streaming data , for , and . Here, the letter refers to the time index while the letter refers to the node index. In order to track drifts in the phenomenon they are monitoring, the network agents implement an adaptive diffusion strategy [19, 53, 26], where each individual agent relies on local cooperation with its neighbors. One useful form is the combine-then-adapt (CTA) rule, which has been studied in some detail in these references. It involves two steps: a combination step followed by an adaptation step.
During the first step, agent combines the data of its neighbors through a sequence of convex (i.e., nonnegative and adding up to one) combination weights , for . The combination step produces the intermediate variable:
Next, during the adaptation step, agent updates its output variable by comparing against the incoming streaming data , and updating its state by using a small step-size :
It is convenient for our purposes to introduce a combination matrix, which we denote by , whose entries are obtained by scaling the weights as follows:
With this definition, we see immediately that (13) corresponds to (1) or (4) with . Note that under this diffusion framework, the matrix is naturally nonnegative and if we assume symmetry, its normalized counterpart is doubly stochastic.
Ii Notation and Definitions
We list our notation and some definitions used in later sections for ease of reference.
We represent sets and events by upper-case calligraphic letters, and the corresponding normal font letter will be used to denote the set cardinality. For instance, the cardinality of set is . The complement of a set is denoted by .
Standard canonical sets follow a different convention, for instance, the set of natural numbers is denoted by , and the set of symmetric matrices with nonnegative entries by .
We use boldface letters to denote random variables, and normal font letters for their realizations. Capital letters refer to matrices, small letters to both vectors and scalars. Sometimes we violate the latter convention, for instance, we denote the total number of network agents by .
Given an matrix , the submatrix that lies in the rows of indexed by the set and in the columns indexed by the set , is denoted by , or alternatively by . When , the submatrix will be abbreviated as or . In the indexing of the submatrix we will retain the index set of the original matrix. For example, if and , we have that the submatrix is a matrix, indexed as follows:
This notation is crucial in our treatment, since it will allow us to identify nodes without cumbersome and redundant double-index notation.
Finally, denotes a column vector with all its entries equal to one; denotes an matrix with all its entries equal to zero; denotes the indicator function, which is equal to one if condition is true, and is equal to zero, otherwise; the identity matrix is denoted by ; and denotes the natural logarithm.
Ii-B Graph notation
The set of all undirected graphs that can be defined on a set of nodes (vertex set) is denoted by . When is the number of nodes, the notation implies that the vertex set is .
When dealing with a graph , its connection structure (i.e., the edges of the graph) can be described through its adjacency matrix. The -th entry of the adjacency matrix of the graph will be denoted by the lower-case symbol , with if the nodes and are connected, and otherwise. Henceforth, we assume that , i.e., all nodes exhibit self-loops. This reflects the fact that usually each agent uses information from its own output measurement to update its state.
Given , and a subset , the subgraph corresponding to is denoted by . The support graph of a matrix is denoted by . The -th entry of its adjacency matrix is , namely, nodes and are connected on if, and only, if is strictly positive.
A path from to is a sequence of edges where the first edge originates from and the last edge terminates at . The existence of a path of length can be expressed as:
for a certain sequence of vertices belonging to . According to this definition, a path can also pass multiple times through the same node, or can linger for one or more steps at the same node when it has a self-loop.
The set of neighbors of the node (including itself) in the undirected graph will be denoted by . The degree of the node is the cardinality of , whereas is the maximum degree in . Likewise, the -th order neighborhood of the node (including itself) is denoted by , and is formally given by:
where is the distance between the nodes and on the graph , i.e., the length of the shortest path linking and .
A random graph obeys the Erdös-Rényi model if each edge of is drawn, independently from the other edges, with identical probability . Equivalently stated, the adjacency random variables , for and , are independent and identically distributed (i.i.d.) Bernoulli variables. The notation signifies that the graph belongs to the Erdös-Rényi class with connection probability that vanishes as , and that obeys the following scaling law:
where as (in an arbitrary way, provided that ). It is a well-known result that random graphs belonging to the family are connected with high probability .
As a note of clarity, in the forthcoming treatment, we assume that all random variables find domain in a common probability space , where is the set of realizations, is the sigma-algebra of measurable sets and is the probability measure. For instance, the event
represents the set of realizations yielding a graph whose distance between the (fixed) nodes and does not exceed . To render a more compact notation, we henceforth omit the realization in the characterization of the events. For instance, in the case of the event (19) we represent it rather as
where the random quantities are emphasized by the boldface letter – in the event in (20), the only random object is the graph as it is the only boldfaced variable.
Ii-C Useful Graph Operations
In our exposition, we will be performing certain operations over graphs, as well as evaluate certain functions such as comparing distances between nodes over distinct graphs. Therefore, it is useful to introduce the following graph operations for later use (which are illustrated in Figure 3):
Graph embedding. Given a vertex set , and a subset thereof, , the embedding of a graph into the larger graph will be denoted by:111In order to avoid confusion, we remark that the symbol is also used, in the graph literature, to denote a different kind of operation called “ring sum”. However, we prefer to denote our embedding operation by the same summation symbol to emphasize the “signal+noise” structure that is relevant in our application.
and results in a graph with the following properties: the connections between nodes in that are present in are cancelled; the nodes in the vertex set of graph are mapped into the corresponding nodes of graph , and so are the pertinent connections. We stress that the connections from to are determined by the graph . We notice that the operation in (21) is not commutative (because the first graph is embedded into the second graph, and not vice versa), and that the output graph does not depend on the connections existing in among nodes belonging to the set .
Local disconnection. Given a graph , the notation:
describes the graph that is obtained from by removing all the edges that connect nodes in to nodes in , namely, all the connections between and .
Connections inheritance. The notation:
describes the graph that is obtained from through the following chain of operations: all edges within are removed; all edges connecting nodes in to the rest of the network are removed; all connections from to the rest of the network are inherited by node .
All the above graph operations preserve self-loops unless otherwise stated.
Iii Problem Formulation
Consider a graph and assume we are able to observe data from a subset of the nodes. From these observations, we would like to devise a procedure that allows us to discover the connections among the nodes in , under the assumption that the structure of the graph in the complement set, , and as well as the connections between and , will be random, following i.i.d. drawing of the pertinent edges. The desired construction can be formally described as follows.
Let be a deterministic graph on the observable set , with some unknown topology (which is not restricted in any way), and let be an Erdös-Rényi random graph on nodes. We assume that the overall network graph, , is of the form:
Specifically, the connections within the observable set are described through the graph , while the connections within , as well as the connections between and , are described through the graph . Note that is an Erdös-Rényi random graph on nodes, but its subgraph is replaced by in characterizing , in view of (24) (refer also to Figure 3). Therefore, the structure of within the observable subnet becomes immaterial. Equation (24) highlights the “signal+noise” construction, with the boldface notation emphasizing the random (i.e., noisy) component that corresponds to the unobserved network portion, and with the normal font emphasizing the deterministic component that corresponds to the arbitrary topology of the observed network portion.
The aforementioned construction will be referred to as a partial Erdös-Rényi graph. The class of partial Erdös-Rényi random graphs with a deterministic graph component placed on the set , will be formally represented by the notation . We shall often refer to the observable graph over by the simpler notation . As such, we can also write,
to denote partial Erdös-Rényi random graphs with deterministic component . We assume that (and hence ) is unknown. In this context, the goal of local tomography is to estimate via observing the state evolution of the observable agents in . Figure 4 illustrates the partial Erdös-Rényi construction just described.
Before formulating the tomography problem, we observe that, under condition (9), the partial Erdös-Rényi graph is asymptotically connected with high probability for any choice of the subgraph , as stated in the following lemma.
Lemma 1 (Connectivity of partial Erdös-Rényi graphs).
Given any graph , the partial Erdös-Rényi graph
is connected with high probability, i.e.,
See Appendix A. ∎
Iii-a Combination assignment.
We assign (positive) weights to the edges of and denote the resulting matrix of weights by .
Some useful and popular choices are the Laplacian and the Metropolis rules, which are defined as follows222Strictly speaking, in the network literature the Laplacian and Metropolis rules are defined with weights that add up to one, which would correspond to (28) and (29) without the multiplying factor . The multiplying factor , which provides the matrix stability, is usually left separate and not absorbed into the combination matrix.
For instance, in the case of diffusion networks (14) we have , where is the step-size. In our treatment, it is more convenient to include this scaling factor into the combination matrix, as done in (28) and (29)..
Let and :
where is the degree of agent and is the maximum degree in the network. These rules arise naturally in the context of adaptive diffusion networks .
In this paper, we shall focus on the family of nonnegative symmetric combination policies introduced in , and whose characterizing properties we recall next.
Property 1 (Bounded-norm).
The maximum row-sum norm,
is upper bounded by some .
For nonnegative symmetric matrices, Property becomes:
From Property we see that (most of) the combination weights typically vanish as gets large, since a finite mass of value at most must be allocated across an ever-increasing number of neighbors – on an Erdös-Rényi graph, the average number of neighbors scales as , and in the regime considered in this paper we have in view of (18).
The next property identifies a useful class of combination policies, for which degeneracy to zero of the combination weights is prevented by proper scaling. As highlighted below, such property is broad enough to encompass typical combination rules, such as the Laplacian (28) and the Metropolis (29) rules.
Property 2 (Non-degeneracy under -scaling).
Consider a combination policy applied to a partial Erdös-Rényi graph . The combination policy belongs to class if there exists such that, for all with :
where goes to zero as . In other words, if two nodes and are connected (corresponding to ), then the scaled combination coefficient lies above a certain threshold value denoted by , with high probability, for large .
We denote by the class of weight-assignment policies holding both properties.
It is useful to remark that, since condition (32) is applied to a partial Erdös-Rényi construction, the nodes belonging to the observable set are connected in a deterministic fashion. This means that, for , the random variable is in fact deterministic. In this case, the condition in (32) should be rewritten, for any connected pair in , as:
because conditioning on a deterministic event becomes immaterial.
It is now useful to introduce a sufficient condition under which a combination rule fulfills Property . The relevance of this condition is that it can be readily verified for Laplacian and Metropolis rules, and it automatically provides one value of to identify the class .
Lemma 2 (Useful policies belonging to ).
Any policy for which
for some , satisfies Property with the choice .
See Appendix A. ∎
Using the definitions of the Laplacian rule in (28), it is readily verified that this rule fulfills (34) with the choice . Likewise, using (29), it is readily verified that this rule fulfills (34) with the choice . As a result, both policies fulfill Property , and belong to the class , with the following choices of (the meaning of the subscripts should be obvious):
Before proving the main result of this work, it is useful to illustrate the physical meaning of Property in connection with the network tomography problem. We introduce the error matrix that quantifies how much the truncated estimator in (8) differs from the true sub-matrix , namely,
The magnified -th entry of the truncated estimator in (8), , can be written as:
where is the error quantity, and the qualification of being “not vanishing” is a consequence of Property . According to (37), if we want the nonzero entries to stand out from the error floor, when and are interacting, or to be bounded above (by ), when and are non-interacting, as grows large, we must be able to control the impact of the error term .
We are now ready to summarize the main problem treated in this article.
Local Tomography. Let be an matrix obtained from any combination assignment belonging to the class , over a graph on nodes with a given (arbitrary) subgraph . Let be the state-evolution associated with the observable subset of agents and obeying the stochastic dynamical law
Problem: given , can we determine ?
Iv Main result
The main result of this work is establishing that the truncated estimator introduced in (8), contains enough information to recover the true support graph, , of the combination matrix that corresponds to the observable subset, . More specifically, we establish that a positive threshold exists, such that the graph obtained by comparing the entries of against some threshold matches, with high probability, the true support graph . This implies that the topology of the subnetwork can be fully recovered, with high probability, via the output measurements used to construct and , namely, via the observable nodes only. Even if broader observation is permitted, the result enables the possibility of surmounting the curse of dimensionality by processing smaller matrices and , instead of large-scale matrices and – wherein one of the operations involves the often expensive inversion of a large matrix – and yet attaining exact recovery with high probability.
Before stating the main theorem, let us introduce a useful thresholding operator. We consider a matrix , whose -th entry is , with and . The thresholding operator compares the off-diagonal entries against some threshold , and produces as output a graph, , whose adjacency matrix has -th entry equal to
In other words, the thresholding operator returns a graph whereby two nodes and are connected only if the -th entry of the matrix exceeds the threshold. We assume all entries of the main diagonal of equal to one. We are now ready to state the main theorem.
Theorem 1 (Exact recovery of ).
Let be a given deterministic graph (with arbitrary topology) on nodes, and let be a partial Erdös-Rényi random graph where the sequence that determines the connection probability, , obeys the condition:
Let also be a combination matrix obtained from any combination assignment belonging to class over the graph , for some and (recall Properties and ). Then, the following results hold:
If are interacting, then the -th magnified entry of the truncated estimator, , exceeds the threshold with high probability as .
If are non-interacting, then the -th magnified entry of the error matrix, , converges to zero in probability.
The graph obtained by applying the thresholding operator in (39) to the magnified truncated estimator, , matches the true support graph, , with high probability as , namely,
See Appendix B. ∎
Iv-a Outline of the main proof
In view of Property , Eq. (42) implies that, when and are interacting nodes, the quantity exceeds a positive threshold with high probability, and, hence, part of Theorem 1 is proved. If, in addition, we show that the magnified error converges to zero in probability over the non-interacting pairs, i.e., if we prove part , then it is possible to classify correctly, as , each pair of nodes by comparing the truncated estimator against the threshold (or any smaller value): if , then classify as an interacting pair, otherwise classify it as non-interacting. As a result, and since the cardinality of the observable set is finite, part would follow if we are able to prove part .
where and . The error in (43) is determined by three main terms, namely: , which is nonzero only if node (from subset ) and agent (from subset ) are neighbors; , which is nonzero only if node (from subnet ) and agent (from subset ) are second-order neighbors (i.e., connected in one or two steps); the term , which is the -th entry of the matrix . Clearly, in (43), the relevant entries are those that are “activated” by nonzero values of and . The pairs for which and are nonzero will be accordingly referred to as “active pairs”. Refer to Figure 5 for a graphical illustration of the active pairs.
It is also clear from (43) that, in order to get a small error, small values of (over the active pairs) are desirable. In Theorem 2 – see Appendix C – we are able to show that, for large , vanishing values of are obtained when the distance between nodes and gets large. In particular, Theorem 2 states that the distance between and that plays an important role in the magnitude of is the one evaluated on the transformed graph (see Figure 6), where the observable graph is replaced by an empty graph.
As observed in the proof of Theorem 2 the magnitude of is not contingent on the particular topology . As a result, removing the dependency on is crucial to get a universal result, namely, a result that holds for any arbitrary .
Since, loosely speaking, Theorem 2 implies that the error is small if the distance on the transformed graph is large, the remaining part of the proof consists of showing that the distance over the active pairs is large with high probability, namely, that small distances are rare as goes to infinity. Now, such a conclusion can be proved for a pure Erdös-Rényi graph, , as shown in Lemma 3 – see Appendix E. However, proving the same result for a partial Erdös-Rényi graph, , presents a nontrivial difficulty related to the fact that the partial Erdös-Rényi graph is not homogeneous333Actually, we will see in Appendix B that a further source of inhomogeneity arises, which is related to the terms and . The homogenization procedure that we are going to introduce is able to solve this further inhomogeneity. because the observable subgraph can be arbitrary. In order to overcome this difficulty, we will carefully implement a procedure of homogenization and coupling – see Appendix D. The homogenization procedure amounts to carefully replacing the partial Erdös-Rényi random graph by an Erdös-Rényi graph that is coupled with in the following sense: if small distances are rare over the classic (hence homogeneous) Erdös-Rényi graph , then small distances are also rare over the coupled partial (hence inhomogeneous) Erdös-Rényi graph . In summary, the homogenization-and-coupling is a useful formal tool that is used to reduce the inhomogeneous case to a (simpler) homogeneous graph.
V Applying Theorem 1
Theorem 1 asserts the possibility of performing local tomography over large-scale diffusion networks as it asserts the existence of a threshold such that the entries of the naïve estimator provide correct reconstruction of the observable network with high-probability. In particular, introducing a detection threshold:
the topology of the observable network can be recovered for sufficiently large as follows: If , then classify nodes and as interacting, otherwise classify them as non-interacting. Starting from this result, we will first examine the numerical implementation details related to the application of Theorem 1, and then, in Sec. VI, we will illustrate the local tomography algorithms at work.
From a practical perspective, it is necessary to select an appropriate value for , in order to correctly classify the interacting and non-interacting pairs. In this connection, prior information on the dynamical system in (1) can be useful to set the detection threshold. Indeed, we see from (44) that knowledge is needed about: the average degree ; and the parameter that characterizes the class where the combination matrix stems from. Let us assume that such a knowledge is available. Now, using the values of reported in (35), for the Laplacian and Metropolis rules we have, respectively:
We observe that Theorem 1 is an asymptotic (in the size ) result. For a numerical practical application of this result, it is useful to make three remarks.
First, given a detection threshold that guarantees exact asymptotic classification, any threshold smaller than still guarantees exact asymptotic classification. In order to explain why, let us consider two thresholds and , with , and assume that is known to provide exact asymptotic classification. Then we have that: for interacting pairs, if is higher than , then it is obviously higher than , implying correct classification also with threshold ; for non-interacting pairs, Theorem ensures that, asymptotically, will be smaller than any small value , implying correct classification also with threshold . In other words, also provides exact classification.
Second, we observe that a combination rule can fulfill Property for different values of . For example, assume that we proved that a combination rule fulfills Property with a certain value . Then, the same policy fulfills Property with a higher value, e.g., .
Third, consider a pair of interacting nodes, and let us examine (42). According to Property , the selection of relates only to the properties of the combination matrix, namely, to the behavior of for interacting nodes. On the other hand, for finite sizes of the network, the error is small, but not zero. As a result the quantity will be greater than , namely, the entries of the estimated combination matrix are, on average, shifted upward due to this additional (and positive) error. As a result, one expects that, for finite values of , increasing the values of may be beneficial for classification purposes.
The aforementioned issues show that there is freedom in selecting the threshold parameter to attain exact topology recovery asymptotically (i.e., as grows to infinity). On the other hand, we remark that different threshold choices are expected to behave differently for finite network sizes. In fact, the following trade-off arises: a higher threshold reduces the likelihood that a zero entry of the combination matrix gives rise to a ( false) threshold crossing, while concurrently increasing the likelihood that a nonzero entry gives rise to a (correct) threshold crossing.
V-a Nonparametric Strategies
From the analysis conducted in the previous section, we have learned the following facts about tomography based on the thresholding operator. First, a good threshold tuning requires some a-priori knowledge of the model (e.g., of the average number of connected nodes, , or of the class of combination matrices to set the constant ). Second, even with some good a-priori knowledge, it is not clear how the threshold should be optimized to maximize the performance, because a trade-off arises for finite network sizes, whose (nontrivial) solution would require an even more detailed knowledge of the underlying model.
For all these reasons, it is useful to verify the possibility of employing some nonparametric pattern recognition strategies, which work blindly (without any a-priori knowledge), and which are capable to automatically adapt the classification threshold to the empirical data. In particular, in the forthcoming experiments we will consider a -means clustering algorithm (with ) that will be fed with the entries of the truncated estimator matrix in (8). The clustering algorithm will attempt to find some separation threshold empirically on the data, and to split accordingly the matrix entries into two clusters (connected and non-connected). The cluster with higher arithmetic mean will be then elected as the cluster of connected nodes.
V-B Unknown Correlation Matrices
Until now, we have implicitly assumed that the correlation matrices, and , are perfectly known, and, hence, that the truncated estimator in (8) could be evaluated exactly. However, in practice such correlation matrices are unknown, and must be estimated from the data. For this reason, we will consider numerical simulations where we empirically estimate the truncated correlation matrices and from the observed data through the sample-average estimator (boldface notation is omitted to emphasize that the observed refers to a particular realization):
We remark that it is possible to optimize such estimates by exploiting prior information on the structural properties of the system, and such an optimization could boost the performance of the algorithm. This estimation-tuning is outside the scope of this paper, but showing that our strategy works with the (perhaps) simplest correlation estimators is definitely encouraging. In the next section, we will additionally display the performance of the algorithm under the ideal case of known correlations, where the exact computation of the truncated estimator in (8) can be accomplished. This ideal case provides a superior limit in performance also with respect to optimized correlation estimators.
Vi Numerical Experiments
We are now ready to present the results of the numerical experiments. In Figure 7, we display the (empirically-estimated) topology-recovery probability, with reference to an overall network with number of nodes ranging from to , and for the case of a Laplacian combination rule with . The observable network is made up of nodes.
We see that the probability of correct recovery gets close to as increases for all the considered scenarios: parametric versus -means thresholding, and empirically estimated truncated correlation matrices (as in (46) and (47)) versus known truncated correlation matrices. We notice that the recovery probability curve is not monotonic. Such behavior matches perfectly our theoretical results, as we now explain. First, when (first point in Figure 7), all the network is observed, and in view of (6) (and the comments that follow this equation) the recovery probability must be equal to . Second, Theorem ensures that a probability of correct recovery equal to must be also attained asymptotically (in ). Accordingly, since the error probability curve starts from the value , and converges to as increases, the non-monotonic behavior exhibited in Figure 7 makes sense.
Let us now compare the performance of the different strategies shown in Figure 7. As one expects, the strategies that know the true correlation matrices outperform the strategies that do not know them. A separate comment is called for while comparing the thresholding estimator and the clustering algorithm. Perhaps unexpectedly, the latter strategy outperforms the former one. However, this behavior matches well the theoretical considerations made in the previous section. Indeed, in the simulations the threshold employed by the thresholding estimator is not optimized at all, whereas the nonparametric clustering algorithm might automatically adapt the threshold to the empirical evidence arising from the data, thus delivering a better performance.
The above analysis is repeated for the case of a Metropolis combination rule, and the result is shown in Figure 8.
The same general conclusions that we draw for the Laplacian rule apply. However, we see here that the performance of the thresholding operator seems slightly worse. One explanation for this behavior is the following. The choice of the constant in (35) is perhaps over-conservative. Indeed, such choice follows by estimating the asymptotic scaling law of the maximal degree (see Lemma 2), whereas the nonzero entries of the Metropolis matrix in (29) are determined only by the maximum over pairs of degrees. This means that, on average, the nonzero entries of the Metropolis matrix are higher than what is predicted by the chosen . It is expected that for the Metropolis rule, a larger threshold can be used without affecting the identification of connected pairs, while reducing the errors corresponding to the disconnected pairs.
Vi-a Beyond Theorem 1
Theorem 1 establishes that, under certain technical conditions, it is possible to retrieve the topology of a subset , even when the majority of the network nodes are not observed. This appears to be a nontrivial result, since an observable measurement , , is subject to the influence of nodes from all across the network. This happens because the diffusion recursion in (4) links the nodes through the overall matrix , which takes into account also the influences that unobserved nodes have on observed nodes.
The possibility of inferring the topology of a subnet by taking measurements from this subnet only, and by ignoring the unobserved components, is of paramount importance, in view of the accessability, probing and processing limitations that arise unavoidably in practical applications. In particular, it is tempting to think about a sequential reconstruction strategy, where a larger network is reconstructed through local tomography experiments over smaller network portions, and where each local experiment obeys some probing/processing constraints. We start by illustrating the perhaps simplest sequential reconstruction strategy.
Assume that there is an observable subset of nodes , which is embedded in a larger network with many unobserved components. Due to probing and processing limitations, it is possible to probe and process at most nodes per single experiment. Accordingly, the set is divided into the “patches” . For simplicity, we assume that the patches do not overlap each other and that they cover completely the set (i.e., the patches form a partition of ). Each local tomography experiment will correspond to probing the union of two patches, . For this reason, each patch has cardinality , for , which allows coping with the probing-and-processing constraint. The process is pictorially illustrated in Figure 9. Clearly, if a particular pair of nodes does not belong to the union of patches under test, we cannot make inference on that pair. The maximum number of experiments that allows testing all pair of nodes is . Per each experiment, we apply the local tomography strategy described in the previous section, namely: we compute the empirical correlation matrices, and , from which the truncated estimator is computed; and we apply the -means algorithm to obtain an estimated subgraph . As more and more pairs of patches are tested, the connection profile of the network is reconstructed. A pseudo-code for the sequential reconstruction algorithm, nicknamed “Patch&Catch”, is shown in Algorithm 1.
Before seeing the Patch&Catch algorithm in operation, it is important to make a fundamental remark. Proving that the sequential reconstruction strategy retrieves the topology of consistently (as ) seems a nontrivial task. In particular, we now explain why consistency of the Patch&Catch algorithm does not come as a corollary of Theorem 1.
Assume first that is an arbitrary deterministic network. In order to apply Theorem 1 to each local experiment, the unobserved component should be an Erdös-Rényi graph. However, given a union-of-patches under test, , the unobserved component is a mix of an Erdös-Rényi graph and of a portion of (refer to Figure 9). Since the latter portion is not purely Erdös-Rényi (because is arbitrary), Theorem 1 does not directly apply. On the other hand, if we assume that the whole graph (and, hence, also ) is Erdös-Rényi, then the network would be not fixed as . In particular, since has finite cardinality, it will become asymptotically disconnected, with high probability as .
In summary, we make no claim that the sequential reconstruction can grant consistent recovery. Therefore, the numerical results we are going to illustrate in this subsection must be intended as a preliminary test aimed at checking whether, in the finite network-size regime, a sequential reconstruction strategy might be successfully applicable.
We are now ready to see an application of the Patch&Catch algorithm. The overall network is made up of nodes, and is generated according to an Erdös-Rényi graph with probability of connection . The combination matrix is obtained via the Metropolis rule, and the system is observed over a time scale of samples. We run the Patch&Catch algorithm in a sub-region , assuming a strict probing constraint of nodes per experiment.
In Figure 10 we consider a subset of cardinality , and we display the evolution of the algorithm for an increasing number of tested patches.
Since , and choosing equal-sized patches, we get patches. For each experiment, we depict the true graph of connections (blue edges), as well as the overall graph of connections estimated up to the current experiment (red edges). The network nodes that form the patches tested in the single experiment are highlighted in red. We see from Figure 10 that the network is faithfully reconstructed, sequentially as the number of experiments grows, until the complete subnetwork topology is correctly retrieved after experiments.
In Figure 11, the same procedure is applied to a larger subset with .