Spectral identification of networks using sparse measurements
We propose a new method to recover global information about a network of interconnected dynamical systems based on observations made at a small number (possibly one) of its nodes. In contrast to classical identification of full graph topology, we focus on the identification of the spectral graph-theoretic properties of the network, a framework that we call spectral network identification.
The main theoretical results connect the spectral properties of the network to the spectral properties of the dynamics, which are well-defined in the context of the so-called Koopman operator and can be extracted from data through the Dynamic Mode Decomposition algorithm. These results are obtained for networks of diffusively-coupled units that admit a stable equilibrium state. For large networks, a statistical approach is considered, which focuses on spectral moments of the network and is well-suited to the case of heterogeneous populations.
Our framework provides efficient numerical methods to infer global information on the network from sparse local measurements at a few nodes. Numerical simulations show for instance the possibility of detecting the mean number of connections or the addition of a new vertex using measurements made at one single node, that need not be representative of the other nodes’ properties.
A major problem in the context of complex networks of interacting dynamical systems, which has been considered for many years, is to predict the collective dynamics when the network topology is known. However, in many situations, it is often desirable to address the reverse problem of inferring the topology of the network from available data capturing the collective dynamics. This reverse problem is relevant in fields such as biology (e.g. reconstructing regulatory networks from gene expression data), neuroimaging (e.g. revealing the structural organization of the brain), and engineering (e.g. localizing failures in power grids or computer networks), to list a few. Network identification problems have received increasing attention over the past years, and the topic is actively growing in nonlinear systems theory. See e.g. the recent survey . Many methods have been developed, exploiting techniques from various fields: linearization [19, 31], velocities estimation [13, 28], adaptive control , steady-state control , optimization , compressed sensing [19, 34], stochastic methods , etc. These methods provide the structural (i.e. exact) connectivity of the underlying network and exploit to do so the dynamical nature of the individual units, which is often known, at least partially. In contrast, correlation-based methods using statistical measures  or information-theoretic measures  have also been developed, but they can only infer the effective (i.e. statistical) connectivity of the network.
Network identification methods developed in the framework of dynamical systems theory are usually not well-suited to the analysis of real networks such as biological networks, social networks, etc. Most of them are invasive, requiring the modification of the network connectivity or dynamics. In addition, some of them cannot be used “offline” for data analysis, since they require dynamic interaction with the network. More importantly, all the methods proposed so far for full network reconstruction require measurements at all the nodes of the network. Partial measurements have been considered in the context of linear time-invariant systems for a partial reconstruction of the network between the measured states, and yet the authors showed that the problem cannot be solved without additional information on the system. It can actually be shown that measuring all nodes is necessary for a full network reconstruction, and this is usually out of reach in large real networks. Indeed, the number of sensors is limited and typically (much) smaller than the number of nodes. Some nodes of real networks might also not be accessible, or the only available information might be the averaged activity of a group of nodes lying in a given region of the network (e.g. electrical activity in a region of the brain). All these limitations motivate the network identification framework developed in this paper, which overcomes them.
In this work, we take the view that identifying the exact complete topology of large networks is not only practically impossible, as mentioned above, but also often unnecessary. The presence or absence of an edge between two specific nodes is for instance often only marginally relevant when analyzing the global structure of a large network. For this reason, we focus instead on the identification of the spectral properties of the network, a framework that we call spectral network identification. Note that the idea of estimating the spectral properties of networks has been considered in the control theory community (e.g. [24, 8, 30]), but in the case of specific (linear) consensus dynamics imposed at each node. On the one hand, spectral properties do not reveal the exact full network topology (indeed, we cannot “hear the shape” of a drum), so that the identification objective has been relaxed. On the other hand, they are a central theme of study in spectral graph theory, where they are typically defined through the so-called Laplacian matrix associated with the network. They are shown to provide relevant information on the global network structure such as mean, minimum and maximum node degree, and connectivity, and they are reflected in the network dynamics. For instance, the second smallest eigenvalue of the Laplacian matrix, also called the algebraic connectivity, is related to the speed of information diffusion in the network (e.g. opinion propagation, spreading of epidemics) and plays a key role in studying network synchronization. More generally, the spectral properties of the network provide simple markers capturing the global network structure. These spectral markers can be used to detect a pathology or a fault and to compare different networks.
While classical full topology identification requires measurements at all the nodes of the network, we show that spectral network identification requires only sparse measurements in the network. This can be roughly explained by the fact that each node of a strongly connected network “feels” the influence of all the other nodes. With the method developed in this paper, measurements can therefore be performed on a very small subset of nodes (e.g. only one in some cases) that might not be representative of the whole set of nodes. They can also be defined by a possibly nonlinear function of the states of several nodes, such as the average dynamics of a group of units. Moreover, the proposed method is not invasive and can be used offline.
Spectral properties of (nonlinear) dynamics are well-defined in a framework based on the so-called Koopman operator and can be extracted from data through numerical methods such as the Dynamic Mode Decomposition (DMD) algorithm [26, 33]. In this context, our main theoretical contribution is to connect these spectral properties of the collective network dynamics, which are measured, to the spectral properties of the network, which are to be inferred. These results are obtained in the case of a diffusive coupling for networks reaching a synchronized equilibrium, where the states of all units converge to the same value, a behavior which can be observed with excitable neurons, cardiac cells, opinion dynamics, and epidemics. For small networks, exact spectral identification is achieved. For large networks, a statistical approach is proposed, which focuses on spectral moments and is well-suited to the case of heterogeneous populations. With a few sparse measurements in the network, this framework makes it possible to estimate the average number of connections, detect the addition of a node to the network, and measure whether two units influence each other.
The rest of the paper is organized as follows. In Section 2, the problem of spectral network identification is introduced. In Section 3, theoretical results are presented in the case of linear and nonlinear networks. At the end of the section, we also discuss the limitations encountered for the exact spectral identification of non-identical units. Section 4 focuses on large networks and provides statistical results related to the spectral moments of the Laplacian matrix, which are related to the statistical distribution of the node degrees. Numerical aspects of the methods are discussed in Section 5 and illustrated with several applications in Section 6. Concluding remarks and perspectives are given in Section 7.
2 Problem statement
2.1 Classical vs spectral network identification
A networked dynamical system consists of a set of $n$ interconnected dynamical systems (or units) interacting on a (weighted) graph $\mathcal{G} = (V, E, W)$. The system is deterministic since no stochastic perturbation is considered. Each unit is attached to a node (or vertex) $k \in V$ and is directly influenced by another unit $j$ if $(k,j)$ is an edge of the graph, i.e. $(k,j) \in E \subseteq V \times V$. The strength of the interaction between units is determined by the function $W : E \to \mathbb{R}^+$, which assigns a weight to each edge. The (weighted) degree of a vertex is given by
$$d_k = \sum_{j=1}^{n} W(k,j),$$
where, with a slight abuse of notation, $W(k,j) = 0$ if $(k,j) \notin E$ and $W(k,j) > 0$ if $(k,j) \in E$. We assume $d_k > 0$ for all $k$. The graph is represented by its (weighted) adjacency matrix $\mathcal{A} \in \mathbb{R}^{n \times n}$ defined with the entries
$$[\mathcal{A}]_{ij} = W(i,j).$$
Alternatively, the graph is also described by the (weighted) Laplacian matrix
$$L = D - \mathcal{A},$$
with the degree matrix $D = \mathrm{diag}(d_1, \dots, d_n)$. In the following, we consider graphs that can be weighted and directed, i.e. $(i,j) \in E$ does not imply $(j,i) \in E$, so that $\mathcal{A} \neq \mathcal{A}^T$ and $L \neq L^T$ in general.
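As a minimal sketch of these definitions, the following Python snippet builds the weighted Laplacian of a small directed graph. The function name `laplacian`, the variable `Adj`, and the 3-vertex example are illustrative assumptions; the convention assumed here is that entry `Adj[i, j]` is the weight with which unit `j` influences unit `i`, so that degrees are row sums.

```python
import numpy as np

def laplacian(adjacency):
    """Return (L, d) for a weighted, possibly directed graph.

    adjacency[i, j] is assumed to be the weight of the edge through which
    unit j influences unit i (zero when there is no such edge).
    """
    Adj = np.asarray(adjacency, dtype=float)
    d = Adj.sum(axis=1)        # weighted degree of each vertex (row sums)
    L = np.diag(d) - Adj       # Laplacian = degree matrix - adjacency matrix
    return L, d

# Hypothetical 3-vertex directed, weighted graph (illustration only).
Adj_demo = np.array([[0.0, 2.0, 0.0],
                     [1.0, 0.0, 0.5],
                     [0.0, 3.0, 0.0]])
L_demo, d_demo = laplacian(Adj_demo)
print(L_demo)
```

With this convention every row of the Laplacian sums to zero, so the vector of ones is always an eigenvector associated with the eigenvalue zero, which is the algebraic counterpart of the synchronized state.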
The networked dynamical system is completely defined by
(i) the graph;
(ii) the local dynamics of the states of the units attached to the vertices;
(iii) the type of coupling between pairs of interacting units.
Here we are interested in the following network identification problem: under the assumption that (ii) and (iii) are known, infer the graph topology (i) from measurements of the state of the units. In particular, classical and spectral network identification problems are defined as follows.
Classical network identification. Suppose that (ii) and (iii) are known. From measurements of the states of all the units of the network, infer the set of edges and the weight function .
Spectral network identification. Suppose that (ii) and (iii) are known. From measurements of the states of a small subset of units, estimate the spectrum of the Laplacian matrix (i.e. the Laplacian eigenvalues) or the first spectral moments in the case of large networks (see Section 2.2).
In this paper, we will develop the framework of spectral network identification. We focus on networked systems that admit a stable equilibrium corresponding to the synchronization of the units. We also make the standing assumption that the units interact through a diffusive coupling.
2.2 What does spectral information reveal about the graph?
In contrast to classical network identification, spectral network identification only requires sparse measurements in the network. The price to pay is the relaxed objective of getting only the Laplacian eigenvalues. Although this spectral information does not reveal the complete graph structure, it captures important topological properties of the network. We will not review the vast literature related to spectral graph theory (we refer the interested reader to ), but provide here some basic results connecting the Laplacian spectrum to the topological properties of the graph.
In the case of a connected graph, the second smallest eigenvalue $\lambda_2$, called the algebraic connectivity, captures the connectivity of the graph; see also the Cheeger inequality in undirected graphs. The algebraic connectivity is related to the time constant of the dominant dynamics and to the speed of information propagation in the network. It also provides a bound on the diameter of the graph, i.e. the length of the longest shortest path between any pair of vertices. Moreover, the algebraic connectivity and the spectral radius (i.e. the largest eigenvalue) $\lambda_n$ can be used to derive bounds on the minimal and maximal vertex degrees $d_{\min}$ and $d_{\max}$. In particular, for an undirected graph with $n$ vertices, we have
$$\lambda_2 \leq \frac{n}{n-1}\, d_{\min} \leq \frac{n}{n-1}\, d_{\max} \leq \lambda_n.$$
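In the case of large graphs with $n$ vertices and Laplacian eigenvalues $\lambda_1, \dots, \lambda_n$, it is convenient to consider the spectral moments
$$M_k = \frac{1}{n} \sum_{i=1}^{n} \lambda_i^k = \frac{1}{n}\, \mathrm{tr}(L^k), \qquad (2)$$
which are related to the moments of the degree distribution. The first spectral moment is equal to the mean vertex degree (the average of the weighted vertex degrees $d_k$), i.e.
$$M_1 = \frac{1}{n}\, \mathrm{tr}(L) = \frac{1}{n} \sum_{k=1}^{n} d_k, \qquad (3)$$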
and the first and second spectral moments give bounds on the quadratic mean of the degree distribution :
The equality (3) is trivial and a short proof of the inequalities (4) is given in Appendix A. In the case of undirected and unweighted graphs, other relationships can be derived, which link the spectral moments of to local structural features of the network .
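The relations between the Laplacian spectrum and the vertex degrees can be checked numerically. The sketch below is illustrative only: the graph (a weighted ring with a few random chords), the helper `spectral_moment`, and all variable names are assumptions. It verifies that the first spectral moment equals the mean degree, that the second moment is an upper bound on the mean squared degree when the weights are non-negative, and that a classical Fiedler-type bound links the extreme eigenvalues to the extreme degrees of an undirected graph. It does not reproduce the exact form of the inequalities (4).

```python
import numpy as np

def spectral_moment(L, k):
    """k-th spectral moment: average of the k-th powers of the eigenvalues."""
    eigvals = np.linalg.eigvals(L)
    return float(np.mean(eigvals ** k).real)

rng = np.random.default_rng(0)
n = 8
Adj = np.zeros((n, n))
for i in range(n):                        # weighted ring: keeps the graph connected
    Adj[i, (i + 1) % n] = Adj[(i + 1) % n, i] = rng.uniform(0.5, 1.5)
for _ in range(6):                        # a few random undirected chords
    i, j = rng.choice(n, size=2, replace=False)
    Adj[i, j] = Adj[j, i] = rng.uniform(0.5, 1.5)

d = Adj.sum(axis=1)                       # weighted degrees
L = np.diag(d) - Adj                      # Laplacian matrix

M1 = spectral_moment(L, 1)
M2 = spectral_moment(L, 2)
lam = np.linalg.eigvalsh(L)               # ascending eigenvalues (L is symmetric here)

print("M1 =", M1, " mean degree =", d.mean())
print("M2 =", M2, " mean squared degree =", np.mean(d ** 2))
print("lambda_2 =", lam[1], " n/(n-1)*d_min =", n / (n - 1) * d.min())
print("lambda_n =", lam[-1], " n/(n-1)*d_max =", n / (n - 1) * d.max())
```

The gap between the second moment and the mean squared degree is the average of the products of reciprocal edge weights, which is why the second moment alone only brackets the quadratic mean degree.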
3 Exact spectral identification
In this section, we develop the spectral network identification framework in the case of identical units. We first consider linear systems and then extend the results to nonlinear dynamics. The main results of this section provide the exact connection between the spectral properties of the collective dynamics and the spectral properties of the network.
3.1 Linear systems with identical units
Consider a network of identical units that are each described by states evolving according to the linear dynamics
with , , and (which are assumed to be known). The interaction between the units is given by the diffusive coupling
Considering the state vector , we have
where is the identity matrix and is the Laplacian matrix. We denote
and the solution of is given by
where is an eigenvector of and is the corresponding eigenvalue. Note that depends on the initial condition . We assume that , , and are such that the units synchronize, i.e. , or equivalently for all .
In the context of spectral network identification, measurements are performed through the linear observation function , where is a sparse matrix and is the number of measurements. (Note that the case of a nonlinear observation function will be treated together with the case of nonlinear dynamics in Section 3.2.) It is clear that all eigenvalues of appear in the expression of the measurement
(provided that for all ) and this is true even if only one state of one vertex is measured, i.e. , where is the th unit vector. (Footnote 1: this condition is equivalent to the observability of the pair , i.e. .) Therefore, estimates of the eigenvalues can be computed from snapshots of (8). To do so, one can use the so-called Dynamic Mode Decomposition (DMD) algorithm [26, 33]. The algorithm is described in detail in Appendix B, and its numerical implementation is discussed in Section 5. The efficiency and accuracy of the algorithm will be illustrated in the sequel through several examples.
The DMD algorithm works in practice with a set of time series obtained with several initial conditions. We therefore assume that measurements are performed while the network is reset several times to a state different from its equilibrium point. Accurate results can also be obtained when only the states of a group of vertices are reset, provided that this group is large enough. In the sequel, we will however consider that all the states of the network are reset (as it happens for instance in real networks of excitable neurons or cardiac cells). Note also that Section 5 provides a way to decrease the number of time series used with the DMD algorithm.
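To make the procedure concrete, here is a minimal sketch of how eigenvalues can be extracted from measurements at a single vertex. It is not the algorithm of Appendix B: it uses a simple delay-embedded (Hankel) least-squares variant of DMD on noise-free data, and every name and value in it (`measure_node0`, the path graph on four vertices, scalar units with local rate `a`, the sampling step `dt`, five resets) is an assumption chosen for illustration.

```python
import numpy as np

# Toy network: path graph on 4 vertices, scalar units, dx/dt = (a*I - L) x.
n, a, dt = 4, -1.0, 0.3
Adj = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
L = np.diag(Adj.sum(axis=1)) - Adj
K = a * np.eye(n) - L

# Exact one-step propagator exp(K*dt), via the eigendecomposition of the symmetric K.
w, V = np.linalg.eigh(K)
Phi = V @ np.diag(np.exp(w * dt)) @ V.T

def measure_node0(x0, steps):
    """Time series of the state of vertex 0 only (sparse measurement)."""
    x, ys = x0.copy(), []
    for _ in range(steps):
        ys.append(x[0])
        x = Phi @ x
    return np.array(ys)

rng = np.random.default_rng(1)
X_blocks, Y_blocks = [], []
for _ in range(5):                               # network reset 5 times
    y = measure_node0(rng.normal(size=n), steps=10)
    # Rows of H are windows of n+1 consecutive samples of the scalar measurement.
    H = np.array([y[i:i + n + 1] for i in range(len(y) - n)])
    X_blocks.append(H[:, :n].T)                  # delay vectors at time t
    Y_blocks.append(H[:, 1:].T)                  # same vectors one step later
X, Y = np.hstack(X_blocks), np.hstack(Y_blocks)

# Least-squares fit of Y = A_dmd X, then map discrete to continuous eigenvalues.
A_dmd = np.linalg.lstsq(X.T, Y.T, rcond=None)[0].T
mu = np.linalg.eigvals(A_dmd)
K_eigs = np.log(mu.astype(complex)).real / dt
lap_est = np.sort(a - K_eigs)                    # scalar units: lambda = a - eig(K)
lap_true = np.linalg.eigvalsh(L)
print(lap_est, lap_true)
```

In this idealized setting the four Laplacian eigenvalues are recovered from one measured vertex because the end vertex of a path graph observes every mode. With noisy data or fast-decaying modes the delay-embedded fit becomes ill-conditioned, which is consistent with the accuracy issues discussed later for non-dominant eigenvalues.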
What remains to show is that the spectrum of can be inferred from the (measured) spectrum of when the local dynamics (i.e. , , and ) are known. The following lemma provides a relationship between and .
For , we have
The eigenvectors of are given by
with and .
so that any vector of the form (10) is an eigenvector of associated with the eigenvalue . We have to show that does not admit other eigenvalues associated with other eigenvectors. If the matrices have independent eigenvectors , then a complete set of independent eigenvectors is given by (10). If a matrix has fewer than independent eigenvectors, then it admits an eigenvalue with (algebraic) multiplicity and
where is a generalized eigenvector of . In this case, we have
so that is a generalized eigenvector of and is also of multiplicity . It follows that there is no other eigenvector than (10). This concludes the proof. ∎
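The spectral relationship established by the lemma can be checked numerically. The sketch below assumes that the global matrix has the Kronecker form $K = I_n \otimes A - L \otimes B C^T$ implied by the diffusive coupling, where the symbols `A`, `B`, `C` for the local dynamics and all numerical values are illustrative assumptions. It compares the eigenvalues of `K` with the union of the eigenvalues of the small matrices `A - lam * B C^T` over all Laplacian eigenvalues `lam`.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 5, 3                                  # 5 units with 3 states each
A = rng.normal(size=(m, m))                  # local state matrix (hypothetical)
B = rng.normal(size=(m, 1))                  # local input vector (hypothetical)
C = rng.normal(size=(m, 1))                  # local output vector, y = C^T x

Adj = rng.uniform(0.0, 1.0, size=(n, n))     # directed, weighted graph
np.fill_diagonal(Adj, 0.0)                   # no self-loops
L = np.diag(Adj.sum(axis=1)) - Adj

# Global matrix of the diffusively coupled network (assumed Kronecker structure).
K = np.kron(np.eye(n), A) - np.kron(L, B @ C.T)
eig_K = np.linalg.eigvals(K)

# Union of the spectra of the m x m matrices A - lam * B C^T.
eig_union = np.concatenate([np.linalg.eigvals(A - lam * (B @ C.T))
                            for lam in np.linalg.eigvals(L)])

# Largest distance from any eigenvalue in one set to the nearest one in the other.
dist = np.abs(eig_K[:, None] - eig_union[None, :])
max_gap = max(dist.min(axis=1).max(), dist.min(axis=0).max())
print("largest mismatch between the two spectra:", max_gap)
```

The practical benefit is that each Laplacian eigenvalue only enters through an `m x m` problem, so the inverse step (from a measured eigenvalue of `K` back to a Laplacian eigenvalue) never requires handling the full `nm x nm` matrix.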
We remark that the result implies that , since . Now we can show that the spectral identification problem is consistent: there exists a bijection between the two spectra and (for fixed , , and ), so that can be inferred from .
Assume that the local dynamics (5) is controllable and observable (i.e.
and , respectively). Then,
with and .
We first show that for all . For , it is clear that . In the case , it follows from Lemma 1 that for some . This implies that there exists such that, for some ,
Since , we have .
Now we show that there does not exist such that for all . Assume such exists. If there exists that satisfies , then (15) is the unique solution of (14). We have and Lemma 1 implies . This is a contradiction. If all satisfy , then either
if and is in the image of , i.e. if is in the span of right eigenvectors of ;
if and is the right eigenvector of associated with the eigenvalue , i.e. is in the span of left eigenvectors of .
Since is a controllable pair, cannot be in the span of right eigenvectors of . Since is an observable pair, cannot be in the span of left eigenvectors of . It follows that the cases (b) and (c) are impossible, so that we have , which is a contradiction. This concludes the first part of the proof.
When the local dynamics of the units is not completely controllable or observable, it is still possible to infer the spectrum of from the spectrum of . When is in the span of the right eigenvectors of (or when is in the span of the left eigenvectors of ), it is easy to show that
with , so that
For instance, if (or ), we have
Proposition 1 does not hold, since different spectra of can be associated with the same spectrum . For instance, the spectra and are associated with the same spectrum . Note however that (13) still holds if one takes into account the multiplicity of the eigenvalues (provided that ). Moreover (12) can still be used to obtain the spectrum of from the spectrum of .
The spectral identification method is illustrated in the following simple example.
Example 1 (Linear system).
Consider a random network with vertices (see the adjacency matrix in Appendix C) with the local linear dynamics
and assume that the observation function is with (i.e. only one state of one unit is measured). Using the time series related to different initial conditions, the DMD algorithm provides an accurate estimate of the eigenvalues of the matrix (Figure 1(a)). Then the Laplacian spectrum of the network is also recovered by using (12) (Figure 1(b)).
Due to numerical imprecision, the eigenvalues of might be computed with some error (see e.g. Example 2). Moreover these eigenvalues cannot be obtained precisely in the case of heterogeneous populations of non-identical units, as we will see. In these situations, one can estimate the induced error on the eigenvalues of obtained with (15). Denoting by and a perturbed eigenvalue of and , respectively, we have the first order Taylor approximation
and (15) yields
It follows that measured eigenvalues of satisfying will induce a large error on the associated eigenvalue of . Since distinct measured eigenvalues , , yield eigenvalues approximating the same value , it can be advantageous to use a weighted average of these values . Assuming that the probability distribution of has a constant variance for all (footnote 2: this is only an approximation, since non-dominant eigenvalues (i.e. satisfying ) might be computed by the DMD algorithm with larger errors), we can consider the weighted average
which is associated with a probability distribution characterized by the variance
3.2 Nonlinear systems with identical units
Now we show that the spectral network identification framework developed in the case of linear systems can easily be extended to nonlinear systems. Assume that the units have nonlinear dynamics
with the (analytic) functions , , and . The units interact through the diffusive coupling
We make the standing assumption that the local dynamics (19) admit a stable fixed point and that the units synchronize, so that the solutions of (19)-(20) converge to the (stable) fixed point . The Jacobian matrix associated with (19)-(20) linearized at is given by
The proof follows from Proposition 1, with , , and . ∎
Proposition 2 implies that the Laplacian eigenvalues of the network can be obtained from the eigenvalues of the Jacobian matrix . Moreover, the eigenvalues of can be obtained from sparse measurement of the network dynamics. In the case of nonlinear systems, they are related to spectral properties of the dynamics defined in the framework of the so-called Koopman operator.
Let be a trajectory solution of (19)-(20) associated with the initial condition . We suppose that measurements of are obtained through a possibly nonlinear observation function , which depends on a few local states in our case. The Koopman operator(s) are a semi-group acting on the set of such functions . They are defined by
for all and . The value is thus the value that will be observed at time via for a trajectory which is at at time . One can verify that the are always linear and can therefore be characterized by spectral properties. Provided that is analytic, the spectral decomposition of the operator yields
where are the eigenvalues of the Koopman operator and are the so-called Koopman modes, which depend on [14, 16]. In addition, it can be shown that the DMD algorithm extracts the spectral properties of the Koopman operator from snapshots of (22) [23, 33]. Provided that the dominant Koopman modes are nonzero (see also Remark 3), the algorithm yields the dominant Koopman eigenvalues, which are the eigenvalues of the Jacobian matrix (21); see e.g. . According to Proposition 2, these eigenvalues can be used to retrieve the Laplacian eigenvalues.
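The two properties used here, namely that the Koopman operator is linear even when the dynamics are nonlinear and that its spectrum contains the Jacobian eigenvalues at the fixed point, can be illustrated on a toy system. The sketch below uses the scalar dynamics dx/dt = -x + x^2, which are not taken from the paper and are chosen only because the flow is known in closed form; the names `flow`, `koopman`, and `phi` are assumptions. The Jacobian at the stable fixed point x = 0 is -1, and `phi(x) = x / (1 - x)` is a Koopman eigenfunction for that eigenvalue.

```python
import numpy as np

def flow(t, x0):
    """Closed-form solution at time t of dx/dt = -x + x**2, valid for x0 < 1."""
    return x0 * np.exp(-t) / (1.0 - x0 + x0 * np.exp(-t))

def koopman(t, f):
    """Koopman operator U^t applied to an observable f: composition with the flow."""
    return lambda x: f(flow(t, x))

x = np.linspace(-0.5, 0.8, 7)      # sample states in the basin of attraction of 0
t, s = 0.7, 0.4

# Linearity: U^t (2 f + 3 g) = 2 U^t f + 3 U^t g, although the dynamics are nonlinear.
f, g = np.sin, np.square
lhs = koopman(t, lambda z: 2 * f(z) + 3 * g(z))(x)
rhs = 2 * koopman(t, f)(x) + 3 * koopman(t, g)(x)

# Semigroup property: U^s U^t = U^(s+t).
semi_lhs = koopman(s, koopman(t, f))(x)
semi_rhs = koopman(s + t, f)(x)

# Eigenfunction: U^t phi = exp(-t) * phi, with -1 the Jacobian eigenvalue at x = 0.
phi = lambda z: z / (1.0 - z)
eig_lhs = koopman(t, phi)(x)
eig_rhs = np.exp(-1.0 * t) * phi(x)
print(np.max(np.abs(eig_lhs - eig_rhs)))
```

A generic observable such as `np.sin` is not an eigenfunction, but its evolution decomposes along eigenfunctions like `phi`, which is why the decay rates seen in measured time series reveal the Jacobian eigenvalues.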
The DMD algorithm can capture dominant eigenvalues only if the associated dominant Koopman modes (with ) are nonzero in (22). These dominant modes are given by , where are the right eigenvectors of and with the notation ; see e.g. . It follows that the dominant Koopman modes are nonzero if the pair is observable. In particular, we must have . In the following, we will assume that these conditions are satisfied.
Example 2 (Nonlinear system).
where is the state vector assigned to vertex . We choose the nonlinear observation function . Figure 2(a) shows that the DMD algorithm applied to time series related to different initial conditions retrieves the dominant eigenvalues of , but cannot compute the eigenvalues with a fast decay rate. However, the eigenvalues are redundant since two distinct eigenvalues are related to the same Laplacian eigenvalue, so that only dominant eigenvalues are sufficient to obtain the full Laplacian spectrum. Figure 2(b) shows that all the Laplacian eigenvalues are indeed obtained with good accuracy, although two additional values are predicted incorrectly. Note also that when two values are obtained for the same eigenvalue (as can be observed in Figure 2(b)), one could use the weighted average (18) for a better approximation.
3.3 Non-identical units: impossibility results
So far we have considered the case of identical units sharing the same local dynamics. We now focus on the case of non-identical units and show that the spectral identification problem cannot be solved in this case. This is illustrated by the following example. Consider the one-dimensional linear local dynamics , (with ), where accounts for the heterogeneity of the units' dynamics. The global dynamics of the network is given by , with . Let
Then we have
and (13) in Proposition 1 does not hold. It follows that the relation between the set of spectra and the set of spectra is not injective in the case of non-identical units, so that the Laplacian eigenvalues cannot be inferred from the (measured) eigenvalues of . This example shows that the spectral network identification problem cannot be solved even when the graph is unweighted and when only one unit differs from the others.
Now we consider the general nonlinear dynamics of non-identical units
where accounts for the heterogeneity of the units' dynamics. We assume that the units are almost identical, so that . Note that the units do not synchronize perfectly, since the system admits a global fixed point (with ). We consider that the functions and are identical for all the units. There is almost no loss of generality, since these functions describe the connections between units, and these connections are already heterogeneous in the case of weighted graphs. In addition, this simplification does not significantly affect the theoretical results developed in the remainder of the paper.
The Jacobian matrix related to the system (24) (linearized around ) is
where is given by (21) and where is such that
with . Since the DMD algorithm provides the eigenvalues of , the spectral identification problem is to compute from . Equivalently, since the relationship between and has been established in Sections 3.1 and 3.2, one has to estimate the eigenvalues of from the perturbed eigenvalues of (note that since ). In the case of unweighted graphs, it can be shown that if the perturbation is small enough, so that the spectral identification problem can be solved exactly. However this situation is very restrictive. For more general graphs, perturbation theory for matrix eigenvalues can provide upper bounds on the difference between the eigenvalues of and , but these bounds are too conservative, especially when the network is large.
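As an illustration of the kind of bound that perturbation theory provides, the sketch below checks Weyl's inequality in a deliberately simple special case: an undirected graph with scalar units, so that the unperturbed matrix is symmetric and the heterogeneity is a diagonal perturbation. This special case, the choice of Weyl's inequality, and all names and values (`delta`, `K_tilde`, the local rate `a`) are assumptions made for the example; they do not cover directed graphs or multi-dimensional units.

```python
import numpy as np

rng = np.random.default_rng(5)
n, a = 10, -1.0

# Undirected weighted graph and the matrix of the network with identical scalar units.
upper = np.triu(rng.uniform(0.0, 1.0, size=(n, n)), 1)
Adj = upper + upper.T
L = np.diag(Adj.sum(axis=1)) - Adj
K = a * np.eye(n) - L

# Small heterogeneity: each unit has local rate a + delta_k.
delta = rng.uniform(-0.05, 0.05, size=n)
K_tilde = K + np.diag(delta)

# Weyl's inequality for symmetric matrices: sorted eigenvalues move by at most
# the spectral norm of the perturbation, here max |delta_k|.
shift = np.abs(np.linalg.eigvalsh(K_tilde) - np.linalg.eigvalsh(K))
bound = np.abs(delta).max()
print("largest eigenvalue shift:", shift.max(), " Weyl bound:", bound)
```

The bound is uniform over all eigenvalues and does not shrink with the network size, while the spacing between neighbouring eigenvalues typically does, which gives a concrete sense of why such bounds become uninformative for large networks.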
It is also worth noting that the linearized dynamics of identical agents that do not synchronize but converge to different equilibria are in general equivalent to the (linear) dynamics of non-identical agents. In such a case, we would thus encounter similar issues when inferring the spectral properties of the network.
In the next section, we show that a statistical approach can circumvent these limitations in the case of large networks.
4 Statistical approach to large networks
In the case of large graphs, most individual Laplacian eigenvalues have an insignificant influence on the network dynamics, making them very hard to identify precisely. Moreover, each of them taken individually captures only a very small amount of information about the network structure. Recovering each individual eigenvalue is therefore both impractical and largely unnecessary. Instead, it is much more relevant and convenient to focus on statistical measures of the spectral density of the Laplacian matrix. In this section, we show that one can estimate the first spectral moments (2) of the Laplacian matrix from sparse measurements in the network. These spectral moments are related to statistical information on the degree distribution of the vertices (see Section 2.2). This approach is also well-suited to the case of non-identical units and can be used to obtain some information on the unweighted underlying graph.
4.1 Spectral moments of the Laplacian matrix
From a few eigenvalues of obtained with the DMD algorithm, one can compute the spectral moments of ; see Section 5.3 for details on the numerical method. Then the spectral moments of can be obtained from the spectral moments of , as shown in the following proposition.
Suppose that . Then the spectral moments of are given by
and the result follows. ∎
Note that this result holds for both cases of linear and nonlinear units. In the nonlinear case, we have , , and .
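The way moments of the Laplacian follow from moments of the global matrix can be sketched with trace identities. The code below assumes the Kronecker form $K = I_n \otimes A - L \otimes B C^T$, moments of `K` normalized by the number of units `n`, and a nonzero scalar `C^T B`; these conventions, as well as the symbols and numerical values, are assumptions for illustration and may differ from the normalization used in the proposition.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
A = np.array([[-1.0, 2.0],
              [-2.0, -1.0]])                 # local state matrix (hypothetical)
B = np.array([[1.0], [0.5]])                 # local input vector (hypothetical)
C = np.array([[1.0], [-0.4]])                # local output vector, y = C^T x

Adj = rng.uniform(0.0, 1.0, size=(n, n))     # directed, weighted graph
np.fill_diagonal(Adj, 0.0)
L = np.diag(Adj.sum(axis=1)) - Adj
K = np.kron(np.eye(n), A) - np.kron(L, B @ C.T)

ctb = (C.T @ B).item()                       # scalar C^T B, must be nonzero
cab = (C.T @ A @ B).item()                   # scalar C^T A B

# tr(K)/n   = tr(A) - M1(L) * C^T B
# tr(K^2)/n = tr(A^2) - 2 * M1(L) * C^T A B + M2(L) * (C^T B)^2
M1_L = (np.trace(A) - np.trace(K) / n) / ctb
M2_L = (np.trace(K @ K) / n - np.trace(A @ A) + 2.0 * cab * M1_L) / ctb ** 2

print("M1(L):", M1_L, " direct:", np.trace(L) / n)
print("M2(L):", M2_L, " direct:", np.trace(L @ L) / n)
```

Only traces of powers of `K` are needed, that is, sums of powers of its eigenvalues, so the moments can be estimated from a statistical description of the measured spectrum rather than from each eigenvalue individually.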
We will focus on the first two moments and it follows from Proposition 3 that
Example 3 (Large network).
We consider a random Erdős-Rényi graph with vertices and a probability (supposedly not known) for any two vertices to be connected. The weights of the edges are randomly distributed according to a uniform distribution on . The local dynamics of the units is given by (16) and the observation function is , corresponding to the measurement of one state of one unit. Note that the dynamics are linear, a choice made to provide the best illustration of the theoretical results. Examples of identification of large networks with nonlinear dynamics are shown in Section 6.
We use a heuristic method to estimate the spectral moments of (Figure 3), approximating the clusters of eigenvalues by the convex hull of values obtained with the DMD algorithm and assuming a uniform distribution of the eigenvalues within each cluster; see Section 5.3 for more details on the method. Using (26) and (27), we finally obtain the spectral moments and , which are close to the exact values. From (3) and (4), one obtains good approximations of the mean degree and quadratic mean degree . The results are summarized in Table 1. We also performed similar simulations for different random networks and computed in an automatic way the spectral moments of the Laplacian matrix. Table 2 shows that the mean relative error is of the order of .
| mean error (absolute) | 0.10 | 0.29 |
| mean error (relative) | 0.07 | 0.13 |
| root mean squared error (absolute) | 0.13 | 0.37 |
| root mean squared error (relative) | 0.09 | 0.16 |
4.2 Estimation of the spectral moments with non-identical units
Now we assume that the units are not identical and that their local dynamics are randomly distributed with known mean and variance (we will comment on the case of unknown mean and variance in Remark 4). We can then obtain an estimation of the spectral properties of the dynamics produced by the same network but with identical units, so that the results of Section 4.1 can be used to compute the moments of the Laplacian matrix.
We consider the local dynamics (24), with the objective of estimating the spectral moments of from the spectral moments of (see (25)). The matrix is a random block-diagonal matrix and the nonzero entries of are assumed to be independent random variables of zero mean and standard deviation , i.e. with
for and .
The spectral moments of are related to the moments of the averaged spectral density of . In particular, we will show that and . Then estimations and of the spectral moments of can be computed by considering the measured (random) values and instead of and . The expectation of and (with respect to the randomness on ) is equal to the exact spectral moments of . These results are summarized in the following proposition. The proof is given in Appendix A.
Assume that is a block-diagonal matrix that satisfies (28) and consider
Moreover, we have
where and are the mean vertex degree and the quadratic mean vertex degree, respectively.
In Proposition 4, the estimated values and are assumed to be obtained with a perfect estimation of the moments of (computed from the eigenvalues obtained with the DMD algorithm), but this is not the case in practice. For large networks, the variance of the estimated moments (29) and (30) is significantly smaller than the error on the estimation of the moments of , so that (29)-(30) provide approximations of the spectral moments of that are almost as good as in the case of identical units.
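The effect of random heterogeneity on the first two moments can be illustrated by a small Monte Carlo experiment. The sketch below is a simplified setting, not the proposition itself: units are scalar, the graph is an unweighted ring, the perturbation is a diagonal matrix with independent zero-mean Gaussian entries of known standard deviation `s`, and moments are normalized by `n`. All names and values are assumptions. Under these assumptions the first moment is unbiased and the second moment is shifted by `s**2`, which can be subtracted.

```python
import numpy as np

rng = np.random.default_rng(4)
n, a, s, samples = 20, -1.0, 0.5, 5000

Adj = np.zeros((n, n))
for i in range(n):                                   # unweighted ring graph
    Adj[i, (i + 1) % n] = Adj[(i + 1) % n, i] = 1.0
L = np.diag(Adj.sum(axis=1)) - Adj
K = a * np.eye(n) - L                                # identical scalar units

# One independent heterogeneity value per unit and per random draw.
delta = rng.normal(0.0, s, size=(samples, n))
K_tilde = K[None, :, :] + delta[:, :, None] * np.eye(n)[None, :, :]

# Moments of the perturbed matrices, computed from traces.
M1_tilde = np.trace(K_tilde, axis1=1, axis2=2) / n
M2_tilde = np.einsum('sij,sji->s', K_tilde, K_tilde) / n

M1_K = np.trace(K) / n
M2_K = np.trace(K @ K) / n
M1_hat = M1_tilde.mean()                             # unbiased without correction
M2_hat = (M2_tilde - s ** 2).mean()                  # subtract the known variance

print("first moment :", M1_hat, "vs", M1_K)
print("second moment:", M2_hat, "vs", M2_K, "(uncorrected:", M2_tilde.mean(), ")")
```

For a single realization the fluctuation of these estimates decreases as the number of units grows, since the moments average over all units, which matches the observation that heterogeneity is a minor source of error for large networks.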
Remark 4 (Unknown local dynamics).
If we suppose that the average local dynamics of the units is not known (i.e. is not known in (24)), then is estimated by (e.g. measured on one unit), where and for all . In particular, and and we have and , with
We remark that the variance of the distribution of dynamics does not need to be known in this case.
Remark 5 (Other distributions and two different populations).
So far we have considered that all nonzero entries of are independently and identically distributed. We can also have another situation where the values of the subsystems are strongly correlated. Consider the random matrix
where is a random diagonal matrix whose elements are randomly distributed and satisfy , , and for . In this case, we can show that and , with