# Weighted network estimation by the use of topological graph metrics

## Abstract

Topological metrics of graphs provide a natural way to describe the prominent features of various types of networks. Graph metrics describe the structure and interplay of graph edges and have found applications in many scientific fields. In this work, the use of graph metrics is employed in network estimation by developing optimisation methods that incorporate prior knowledge of a network’s topology. The derivatives of graph metrics are used in gradient descent schemes for weighted undirected network denoising, network decomposition, and network completion. The successful performance of our methodology is shown in a number of toy examples and real-world datasets. Most notably, our work establishes a new link between graph theory, network science and optimisation.

Index terms: graph metric derivatives, graph theory, network completion, network denoising, network decomposition, optimisation

## I Introduction

Graph theory has found applications in many scientific fields in an attempt to analyse interconnections between phenomena, measurements, and systems. It provides a data structure that naturally expresses those interconnections and also provides a framework for further analysis [1, 2]. A graph consists of a set of nodes and a set of edges describing the connections between the nodes. The edges of binary graphs take values of either 1 or 0, indicating the presence or absence of a connection, while in weighted graphs the edges are described by weights indicating the strength of the connection. Graphs have been extensively used in a variety of applications in network science such as biological networks, brain networks, and social networks [3, 4, 5, 6].

Graph metrics are functions of a graph’s edges and characterize one or several aspects of network connectivity [7, 8, 9]. Local metrics deal with the relation of specific nodes to the network structure while global metrics describe properties of the whole network. Graph metrics are largely used to describe the functional integration or segregation of a network, quantify the centrality of individual regions, detect community structure, characterize patterns of interconnections, and test resilience of networks to abrupt changes.

Estimation of a network's structure or properties has been performed in a variety of contexts. Although a single definition does not exist, network estimation can include any algorithm or method that detects, enhances, generates, or increases some quality measure of networks. Link prediction and network completion deal with predicting the existence of missing edges based on the observed links in a binary graph [10, 11, 12, 13, 14]. Also, in [13] the prediction of missing nodes was attempted. So far such methods have dealt with detecting only whether a link (edge) exists or not, not with the estimation of its weight. Typical applications include predicting the appearance of future connections in social [10, 15] or biological [16] networks. The network reconstruction problem deals with composing networks that satisfy specific properties [17, 18, 19, 20]. This can be particularly useful when building null models for hypothesis testing. The network inference problem attempts to identify a network structure where the edges have been corrupted by a diffusion process through the network [21, 22]. More recently, reducing the noise in social networks has been attempted in [23, 24, 25].

In this work, we assume that prior information is available regarding an observed weighted undirected network. This prior information comes in the form of estimates of a number of topological graph metrics of the network. We utilise these estimates in an optimisation framework in order to adjust the weights of the observed network to satisfy those properties. There are many real-world cases where there is knowledge of a network's structure but no exact or reliable network information [17]: e.g. the strength of connections between banks is known but exact connections are hidden for privacy reasons in bank networks [26], modularity metrics of brain networks are similar between subjects [27], or properties of the constituent networks may be known for mixed networks [24]. We demonstrate the utility of our methodology in three schemes.

Firstly, a network denoising scheme, where an observed network is a noisy version of an underlying noise-free network. By considering the error between the observed network's graph metrics and the true network's known metrics, in an iterative gradient descent process on the network's weights, the resulting network is brought closer to the noise-free one. Using the node degrees as priors has been performed in binary networks in [24], and in terms of network reconstruction the knowledge of degrees has been employed in weighted networks in [17, 28]. In [23], the transitivity of a network was used in a network diffusion scheme to change a social network's weights, but without a specific stopping point. In [25], denoising has been attempted in the context of removing weak ties between users in social networks. Here, we provide analytical and empirical proofs of the utility of denoising schemes that are based on the optimisation of various graph metrics through gradient descent.

Secondly, we develop a network decomposition scheme for the cases where an observed network is an additive mixture of two networks. Assuming that graph metrics are known for the two constituent networks, we derive an algorithm that can estimate the networks by enforcing them to have specific values for graph metrics while keeping the reconstruction error between the original and estimated mixture small. Network decomposition has traditionally been applied in a different context, on decomposing graphs with disjoint nodes [29]. For factored binary graphs, untangling has been performed by taking into account the degree distribution of the constituent graphs [24]. Here, we go beyond considering only the degrees of a network or decomposing it into subgraphs (i.e. multiple graphs with disjoint nodes): we use multiple graph metrics in additive graph mixtures.

Finally, we develop a weighted network completion scheme where some of the weights of the network's edges are missing. Similarly to the two previous schemes, we adapt the missing weights such that the whole network obtains specific values for known graph metrics. Weighted network completion has not been performed in the literature per se; only the closely related matrix completion problem [30] and matrix completion on graphs [31] have been studied, where in the latter the completion is aided by assuming that the rows and columns of the matrix form communities.

Therefore, in this paper, we provide a comprehensive description of theoretical and empirical results on the use of optimisation of graph metrics for various problems in network estimation. In Section II we give some basic definitions and a brief introduction to graph theory and graph metrics. The network estimation methods are shown in Section III with the details of the three schemes: denoising (III-A), decomposition (III-B), and completion (III-C). In Section III-D we derive the graph metric derivatives that are used in the optimisation methods. In Section IV we apply our methodology to a number of toy examples and real data, and in Section V we put the results into context and discuss the utility that our method provides. Section VI concludes the paper.

## II Graph metrics

A weighted graph is defined by a finite set of nodes (or vertices) $V$ with $|V| = n$, a set of edges of the form $(i, j)$ with $i, j \in V$, and a weighted adjacency matrix $W$ with entries $w_{ij}$. In this work, we consider undirected graphs for which $W$ is symmetric, i.e. $w_{ij} = w_{ji}$ or $W = W^T$. The entries in the weighted adjacency matrix (weight matrix from now on) indicate the strength of connection between nodes. We assume that networks are normalised, i.e. $w_{ij} \in [0, 1]$.
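As a running illustration, the following minimal numpy sketch constructs a weight matrix with the properties assumed above (symmetric, zero diagonal, normalised to $[0, 1]$); the function name and construction are ours, not from the paper:

```python
import numpy as np

def random_undirected_weights(n, seed=0):
    """Symmetric weight matrix with zero diagonal and entries in [0, 1]."""
    rng = np.random.default_rng(seed)
    A = rng.random((n, n))
    W = (A + A.T) / 2.0        # enforce w_ij = w_ji (undirected graph)
    np.fill_diagonal(W, 0.0)   # no self-connections
    return W / W.max()         # normalise so that all weights lie in [0, 1]

W = random_undirected_weights(5)
```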

Graph metrics are scalar functions of the weight matrix, i.e. $f : \mathbb{R}^{n \times n} \to \mathbb{R}$. Global metrics map the weight matrix into a single value and therefore attempt to quantify a specific property of the whole network. Local metrics, on the other hand, quantify some property of the network separately for each node $i$ with $f_i(W)$, potentially resulting in $n$ functions and $n$ separate values.

Although graph metrics were originally defined on binary (unweighted) networks, the conversion to weighted metrics is usually, but not always, straightforward [8, 32, 33, 34, 35, 36]. The main motivation of this study is that a network can be adjusted by changing the matrix W so that it attains a certain value for some graph metric. There are numerous graph metrics that describe various features of networks [8, 9]. The main properties that they measure are:

• Integration (ability of the network to combine information from distributed nodes).

• Segregation (ability of the network for specialised processing in densely interconnected node groups).

• Centrality (characteristic that describes the presence of regions responsible for integrating total activity).

• Resilience (ability of a network to withstand weight changes).

• Motifs (presence of specific network patterns).

• Community structure (separation of a network to various functional groups).

• Small-worldness (highly segregated and integrated nodes).

There are graph metrics that contain quantities which are themselves a product of an optimisation procedure (e.g. module structure, shortest path length). These quantities are considered outside of the scope of this work.

### II-A Definitions

Here we show some definitions and useful matrices that are used in the following sections.

## III Network Estimation

In this section we formulate the optimisation methodologies for the three schemes considered in this work.

### III-A Denoising

Suppose that the weight matrix $W$ of a network is corrupted by additive noise $E$:

$$W_e = W + E \qquad (1)$$

The error matrix $E$ can be considered as a network unrelated to the network structure being considered. For example, $E$ could represent social calls when trying to detect suspicious calls in social networks [24], the effect of volume conduction in EEG-based brain network connectivity, or measurement noise. A different type of corruption that occurs in networks, the effect of missing values, is treated in Section III-C.

If we assume that we have estimates of $M$ graph metrics of the original $W$, i.e. $K_m \approx f_m(W)$ where $m = 1, \ldots, M$, then we can formulate a cost function that measures the deviation of the observed weight matrix's metrics from the estimates as:

$$c(W_e) = \sum_m e_m^2(W_e) = \sum_m \left( f_m(W_e) - K_m \right)^2 \qquad (2)$$

Minimising this cost via gradient descent on the weights yields the update:

$$W_e^{(t+1)} = W_e^{(t)} - \mu \sum_m e_m\left(W_e^{(t)}\right) \frac{d f_m\left(W_e^{(t)}\right)}{d W_e^{(t)}} \qquad (3)$$

where $t$ is the iteration index and $\mu$ the learning rate. For the case of $M = 1$, Equation (3) describes the traditional single-function gradient descent. For $M > 1$, it can be considered an equally weighted sum method of multiobjective optimisation, motivated by the fact that the graph metrics are in the same range. Multiobjective optimisation enables the weighting of different metrics to accommodate priorities on which are more important for a specific task; such weighting is beyond the scope of this work. The full denoising procedure is described in Algorithm 1. Note that weight values below 0 and above 1 are truncated to zero and one respectively.
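As an illustration of Eqs. (2)-(3), the sketch below runs the gradient descent update using the node degrees as the metric; the symmetrised gradient, step size, and function names are our own choices for this sketch, not the paper's exact Algorithm 1:

```python
import numpy as np

def degrees(W):
    return W.sum(axis=1)                      # k_i^w = sum_j w_ij

def denoise_by_degrees(We, K, mu=0.005, iters=400):
    """Gradient descent on c(W) = sum_i (k_i(W) - K_i)^2, cf. Eq. (3)."""
    W = We.copy()
    for _ in range(iters):
        e = degrees(W) - K                    # per-node metric errors e_i
        grad = e[:, None] + e[None, :]        # dc/dw_ij for symmetric W (factor 2 absorbed in mu)
        np.fill_diagonal(grad, 0.0)           # keep the zero diagonal fixed
        W = np.clip(W - mu * grad, 0.0, 1.0)  # truncate weights to [0, 1]
    return W
```

With the true degrees as targets, the degree-based cost is convex, so the denoised matrix should end up no farther from the underlying network than the noisy observation, in line with the argument of Appendix A.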

In Appendix A we provide a proof that for convex cost functions $c$, denoising guarantees error reduction. The implication is that when a graph metric results in a convex cost function, such as the degrees $k_i^w$ of the network, the optimisation of Algorithm 1 will always converge to a solution that is closer to the original network W than $W_e$ is. For non-convex metrics there is no such guarantee; in Section IV-A we show empirical results on the extent of that effect. In Appendix C we show the proof that cost functions based on graph metrics such as the degree are convex.

### III-B Decomposition

Suppose that we observe a mixed network that arises as a combination of two networks:

$$W_f = W_1 + W_2 \qquad (4)$$

If we have estimates of some topological properties of the two networks, i.e. $K_m^1 \approx f_m^1(W_1)$ and $K_m^2 \approx f_m^2(W_2)$, then we can utilise this information to infer the networks from their mixture. This could be accomplished separately for each network using Algorithm 1. However, since we know the mixture $W_f$, we utilise it in the following optimisation problem:

$$\underset{W_1, W_2}{\operatorname{arg\,min}} \; \sum_m \left( f_m^1(W_1) - K_m^1 \right)^2 + \left( f_m^2(W_2) - K_m^2 \right)^2 \quad \text{subject to} \quad \left\| W_f - (W_1 + W_2) \right\|_F^2 \le \xi$$

We solve this optimisation problem with alternating minimisation since it is a function of two matrices. We fix one of the two weight matrices and solve the following optimisation problem for the other one in an alternating fashion:

$$\underset{W_1}{\operatorname{arg\,min}} \; \sum_m \left( f_m^1(W_1) - K_m^1 \right)^2 + \lambda \left\| W_1 - (W_f - W_2) \right\|_F^2 \qquad (5)$$

$$\underset{W_2}{\operatorname{arg\,min}} \; \sum_m \left( f_m^2(W_2) - K_m^2 \right)^2 + \lambda \left\| W_2 - (W_f - W_1) \right\|_F^2 \qquad (6)$$

where the constraint has been incorporated into the cost function through the penalty parameter $\lambda$. Each separate minimisation, Algorithm 2, resembles Algorithm 1, but in this case deviations from $W_f - W_2$ or $W_f - W_1$ are penalised. The whole procedure is shown in Algorithm 3.

The motivation for the procedure in Algorithm 3 is the following. Consider estimating the two weight matrices only by using the denoising algorithm for each one separately. The estimate of $W_1$ can be written as $\hat{W}_1 = W_1 + E_1$, where $E_1$ indicates the error from the true weight matrix $W_1$. Similarly for $W_2$, resulting in $\hat{W}_2 = W_2 + E_2$. Therefore, it is evident that the estimates of the weight matrices are not ideal whenever $E_1 + E_2 \neq 0$. Note that $E_1 + E_2 = 0$ does not necessarily imply that the estimates of the weight matrices are optimal, since it is possible that $E_1 = -E_2 \neq 0$. It would be optimal if we could constrain the estimate e.g. as:

$$\underset{W_1}{\operatorname{arg\,min}} \; \sum_m \left( f_m^1(W_1) - K_m^1 \right)^2 \quad \text{subject to} \quad \left\| E_1 \right\|_F^2 = 0$$

However, that would require knowledge of the true weight matrix $W_1$. Instead, note that $\hat{W}_1 - (W_f - \hat{W}_2) = E_1 + E_2$. Therefore, by reducing $\| W_1 - (W_f - W_2) \|_F^2$ in Eq. (5), we are reducing the total error. In other words, we are solving the following constrained optimisation problem:

$$\underset{W_1}{\operatorname{arg\,min}} \; \sum_m \left( f_m^1(W_1) - K_m^1 \right)^2 \quad \text{subject to} \quad \left\| E_1 + E_2 \right\|_F^2 \le \xi \qquad (7)$$

The inequality is incorporated such that $W_2$ is kept fixed when optimising $W_1$ and vice versa. Consider the extreme cases. Firstly, $\xi = 0$: that would imply $W_1 = W_f - W_2$, which would render the graph metric optimisation ineffective. On the other hand, a large $\xi$ implies a small $\lambda$, which would render the constraint ineffective. In our implementation we adjust the parameter $\lambda$ such that the reconstruction error, i.e. $\| W_f - (\hat{W}_1 + \hat{W}_2) \|_F^2$, slowly decreases over the iterations of the alternating minimisation algorithm. Note the well-known fact that there is a one-to-one correspondence between $\xi$ in Eq. (7) and $\lambda$ in Eq. (5).
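A minimal sketch of the alternating scheme of Eqs. (5)-(6), again using node degrees as the metric for both constituent networks; the initialisation at $W_f/2$, the step sizes, and the fixed $\lambda$ are our assumptions for illustration, not the paper's Algorithm 3:

```python
import numpy as np

def degrees(W):
    return W.sum(axis=1)

def penalised_step(W, K, target, lam, mu):
    """One gradient step of Eq. (5): degree errors plus lam * ||W - target||_F^2."""
    e = degrees(W) - K
    grad = e[:, None] + e[None, :] + lam * (W - target)
    np.fill_diagonal(grad, 0.0)
    return np.clip(W - mu * grad, 0.0, 1.0)

def decompose(Wf, K1, K2, lam=1.0, mu=0.005, outer=30, inner=40):
    """Alternating minimisation: fix W2, update W1, then vice versa."""
    W1, W2 = Wf / 2.0, Wf / 2.0
    for _ in range(outer):
        for _ in range(inner):
            W1 = penalised_step(W1, K1, Wf - W2, lam, mu)
        for _ in range(inner):
            W2 = penalised_step(W2, K2, Wf - W1, lam, mu)
    return W1, W2
```

Because the degree terms are convex and the penalty is jointly convex, this block gradient descent drives both the metric errors and the reconstruction error $\| W_f - (\hat{W}_1 + \hat{W}_2) \|_F$ down together.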

### III-C Completion

For the case where a set of entries of the weight matrix is missing, we can perform matrix completion by:

$$W^{(t+1)} = W^{(t)} - \mu \left( \sum_m e_m\left(W^{(t)}\right) \frac{d f_m\left(W^{(t)}\right)}{d W^{(t)}} \right) \circ \Omega \qquad (8)$$

where $\Omega$ is a matrix with ones at the set of missing entries and zeroes everywhere else. Similar to the previous two cases, the assumption is that if the true graph metrics are known, gradient descent will adjust the missing weights close to their true values. The missing entries of the incomplete weight matrix can be initialised to the most likely value of the network's weights. This can be considered as a denoising procedure where the missing weights equal 'noisy' weights of that value. In Algorithm 4 we describe the network completion procedure.
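A sketch of the masked update: only the missing entries (ones in the mask) are moved by the metric gradient, here using the node degrees; the initial value 0.5 follows the uniform-weight argument given later in Section IV-C, and the rest is our own minimal implementation rather than the paper's Algorithm 4:

```python
import numpy as np

def complete_by_degrees(We, mask, K, mu=0.01, iters=800):
    """Masked gradient descent on the degree errors; mask has ones at
    the missing entries and zeros elsewhere, cf. Eq. (8)."""
    W = We.copy()
    W[mask == 1] = 0.5                    # initialise missing weights
    for _ in range(iters):
        e = W.sum(axis=1) - K             # degree errors
        grad = e[:, None] + e[None, :]
        np.fill_diagonal(grad, 0.0)
        W = np.clip(W - mu * grad * mask, 0.0, 1.0)  # move missing entries only
    return W
```

When each row contains a single missing entry, matching the true degrees recovers the missing weights exactly; with several missing entries per row the problem is underdetermined and the degrees only constrain their sums.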

### III-D Derivatives of graph metrics

In this section we derive the expressions for the derivatives of popular graph metrics that describe some important properties of networks. We deal with: degree, average neighbour degree, transitivity, clustering coefficient, modularity. More details can be found in Appendix B-A.

#### Degree

The degree of a node $i$ describes the connection strength of that node to all other nodes:

$$k_i^w = \sum_j w_{ij} = \operatorname{tr}\{ W R_i \} \qquad (9)$$

with the degree derivative being:

$$\frac{\partial k_i^w}{\partial W} = R_i^T \qquad (10)$$

Since $R_i^T$ is non-zero only for column $i$, it can be computed efficiently as:

$$\frac{\partial k_i^w}{\partial \mathbf{w}_i} = \mathbf{1}_n^T \qquad (11)$$

where $\mathbf{w}_i$ is the $i$-th column of W.
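Eqs. (9)-(10) can be sanity-checked numerically: perturbing any entry on row $i$ changes $k_i^w$ at unit rate, while entries on other rows leave it unchanged (a finite-difference sketch; the selector matrix $R_i$ is realised only implicitly here):

```python
import numpy as np

rng = np.random.default_rng(3)
n, i = 5, 2
W = rng.random((n, n)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)

k_i = W[i, :].sum()                        # Eq. (9): k_i^w = sum_j w_ij

eps = 1e-6
Wp = W.copy(); Wp[i, 3] += eps             # perturb an entry on row i
fd_on_row = (Wp[i, :].sum() - k_i) / eps   # ~ 1, matching Eq. (10)

Wq = W.copy(); Wq[0, 3] += eps             # perturb an entry off row i
fd_off_row = (Wq[i, :].sum() - k_i) / eps  # ~ 0
```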

#### Average neighbour degree - resilience

The average neighbour degree for node $i$ is given by:

$$ND_i = \frac{\sum_j w_{ij} k_j^w}{k_i^w} = \frac{\operatorname{tr}\{ W^2 R_i \}}{\operatorname{tr}\{ W R_i \}} = \frac{\rho}{\tau} \qquad (12)$$

The derivative of the average neighbour degree is:

$$\frac{\partial ND_i}{\partial W} = \frac{\tau \left( W R_i + R_i W \right)^T - \rho R_i^T}{\tau^2} \qquad (13)$$

#### Transitivity - segregation

The transitivity is a global measure of the segregation of a network and here we define it as (see also [23]):

$$T = \frac{\sum_{ijh} w_{ij} w_{ih} w_{jh}}{\sum_{ij} \sum_h w_{ih} w_{jh}} = \frac{\operatorname{tr}\{ W^3 \}}{\operatorname{tr}\{ W H_n W \}} = \frac{\alpha}{\beta} \qquad (14)$$

The transitivity derivative is:

$$\frac{\partial T}{\partial W} = \frac{3 \beta W^2 - \alpha \left( W H_n + H_n W \right)}{\beta^2} \qquad (15)$$
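The pair (14)-(15) can be verified by finite differences. In this sketch we take $H_n$ to be the all-ones matrix with zero diagonal, which reproduces the triplet count in the denominator of Eq. (14); treat this as our reading of the (stripped) definitions of Section II-A:

```python
import numpy as np

def transitivity(W):
    """Eq. (14): T = tr{W^3} / tr{W H_n W}."""
    n = W.shape[0]
    H = np.ones((n, n)) - np.eye(n)      # assumed form of H_n
    return np.trace(W @ W @ W) / np.trace(W @ H @ W)

def transitivity_grad(W):
    """Eq. (15): quotient-rule derivative of T w.r.t. the entries of W."""
    n = W.shape[0]
    H = np.ones((n, n)) - np.eye(n)
    a = np.trace(W @ W @ W)              # alpha
    b = np.trace(W @ H @ W)              # beta
    return (3 * b * (W @ W) - a * (W @ H + H @ W)) / b**2

# finite-difference check of a single element of the gradient
rng = np.random.default_rng(4)
W = rng.random((6, 6)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
eps = 1e-6
Wp = W.copy(); Wp[1, 2] += eps
fd = (transitivity(Wp) - transitivity(W)) / eps
```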

#### Clustering coefficient - segregation

The clustering coefficient for node $i$ is a local measure of the clustering of a network. It is defined as:

$$C_i = \frac{\sum_{jh} w_{ij} w_{ih} w_{jh}}{\sum_{jh} w_{ij} w_{ih}} = \frac{\{ W^3 \}_{ii}}{\{ W H_n W \}_{ii}} = \frac{\operatorname{tr}\{ S_{ii} W^3 \}}{\operatorname{tr}\{ S_{ii} W H_n W \}} = \frac{\gamma_i}{\zeta_i} \qquad (16)$$

The derivative of the clustering coefficient is:

$$\frac{\partial C_i}{\partial W} = \frac{\zeta_i \sum_{r=0}^{2} \left( W^r S_{ii} W^{2-r} \right) - \gamma_i \left( S_{ii}^T W^T H_n^T + H_n^T W^T S_{ii}^T \right)}{\zeta_i^2}$$

#### Modularity - community structure

Modularity measures the tendency of a network to be divided into modules. Here we deal with optimising the modularity in terms of the network weights, not in terms of grouping nodes into modules. Modularity can be written as (see also [37]):

$$M = \frac{1}{l^w} \sum_{ij} \left( w_{ij} - \frac{k_i^w k_j^w}{l^w} \right) \delta_{ij} \qquad (17)$$

where $l^w$ is the total weight of the network and $\delta_{ij} = 1$ whenever nodes $i$ and $j$ belong to the same module and zero otherwise.

The modularity derivative is expressed as:

$$\frac{\partial M}{\partial W} = \frac{\partial m_1}{\partial W} - \frac{\partial m_2}{\partial W} \qquad (18)$$

with:

$$\frac{\partial m_1}{\partial W} = \frac{l^w \Delta - \theta O_n^T}{(l^w)^2} \qquad (19)$$

and:

$$\frac{\partial m_2}{\partial W} = \sum_{r=1}^{n} \frac{l^w \left( C_r W \Delta^T + C_r^T W \Delta \right) - 2 \xi_r O_n^T}{(l^w)^3} \qquad (20)$$

where $C_r$ is a circular shift matrix that shifts down the rows of the matrix on its right by $r$ positions. See Appendix B-B for more details.

#### Local and global metrics

Any graph metric that operates locally on node $i$ can be cast into its global (full network) form by evaluating the gradient as the average of the nodes' derivatives:

$$\frac{\partial f}{\partial W} = \frac{1}{n} \sum_i \frac{\partial f_i}{\partial W} \qquad (21)$$

## IV Results

### IV-A Denoising

#### Synthetic Networks

In this section we show results of applying the denoising algorithm for various cases. We create synthetic undirected networks of three types:

• Random complete network with weights drawn from a uniform distribution on $[0, 1]$.

• Scale-free weighted network where the degrees are distributed according to a power law and the non-zero edges are given random weights. The network was created according to [38] with an average degree of 5.

• A modular network where the weights exhibit community structure in a number of modules. The network was created with the BCT toolbox [8]. The network consists of 8 modules, with most of the non-zero weights lying within the modules.

For each case we add a noise matrix $\sigma E$, where each entry of $E$ is normally distributed with mean 0 and standard deviation 1:

$$W_e = W + \sigma E \qquad (22)$$

Weights of $W_e$ that go below 0 are set to zero, and subsequently the weight matrix is normalised by dividing by its maximum value. In that way we guarantee that all elements of $W_e$ are between 0 and 1.
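The noise model of Eq. (22) together with the truncation and max-normalisation described above can be sketched as follows; the symmetrisation of $E$, which keeps the network undirected, is our assumption about the experimental setup:

```python
import numpy as np

def add_noise(W, sigma, seed=0):
    """Eq. (22): W_e = W + sigma * E with E standard normal; negative
    weights are zeroed and the matrix is rescaled by its maximum."""
    rng = np.random.default_rng(seed)
    E = rng.standard_normal(W.shape)
    E = (E + E.T) / 2                   # keep the network undirected (an assumption)
    We = W + sigma * E
    We[We < 0] = 0.0                    # truncate negative weights
    np.fill_diagonal(We, 0.0)
    return We / We.max()                # normalise into [0, 1]
```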

Firstly, we show the error reduction of the scheme for various noise levels and networks of 128 nodes. We define error reduction in terms of the ratio of the error of the denoised network $\hat{W}$ to the error of the noisy network $W_e$:

$$er = 1 - \frac{\| \hat{W} - W \|_F}{\| W_e - W \|_F} \qquad (23)$$

Error reduction is measured in the domain $(-\infty, 1]$, where $er = 1$ indicates perfect denoising. Negative values indicate larger error after applying the denoising algorithm.
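Eq. (23) in code, with its two boundary cases (the helper name is ours):

```python
import numpy as np

def error_reduction(W_hat, W_e, W):
    """Eq. (23): er = 1 - ||W_hat - W||_F / ||W_e - W||_F."""
    return 1.0 - np.linalg.norm(W_hat - W) / np.linalg.norm(W_e - W)
```

Here `error_reduction(W, W_e, W)` returns 1.0 (the estimate equals the true network, perfect denoising), while `error_reduction(W_e, W_e, W)` returns 0.0 (the algorithm changed nothing).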

In Figures 1, 2 and 3 we show the error reduction for an increasing noise level and different graph metrics for the random, scale-free and modular network respectively. Each noise level considers the average of 50 noise matrix realisations. Note that even though the error reduction increases as the noise increases, in absolute terms the error always increases.

Next, we show the error reduction in terms of the number of nodes in the network at a fixed noise level; see Figure 4.

#### Real EEG data

The denoising algorithm was applied on two electroencephalography (EEG) datasets on a memory task from Alzheimer’s patients and control subjects [39]. In dataset-1, there were 128-channel recordings from 13 patients with mild cognitive impairment (MCI) and 19 control subjects while for dataset-2 there were 64-channel recordings from 10 patients with familial Alzheimer’s disease (FAD) and 10 control subjects. For both datasets we selected the common subset of 28 electrodes that have the exact same locations on the scalp. For each subject we split the recording into 1 second epochs and computed the connectivity network matrix of the first 50 epochs and of each epoch separately. We used the imaginary part of coherence as connectivity metric and considered the alpha (8-12Hz) spectral band.

We tested two settings. Firstly, a train-test scheme where connectivity matrices are obtained for the two datasets separately and the weight matrices of the test set (dataset-1, MCI) are denoised according to the graph metrics of the training dataset (dataset-2). Secondly, a within-subject denoising setting where the connectivity matrix of the first 50 epochs of a subject (from dataset-2, FAD) is considered as the training weight matrix , and each of the other matrices is a test matrix to be denoised. Here, we used the transitivity and degree as graph metrics combining both the global and local properties of brain activity.

In Table I we show the results of the denoising algorithm when using as target values the graph metrics of a) the same subject, b) the same group (patient/control), and c) the opposite group. The mean square error (MSE) is computed against the training network. Differences between subjects were significant under an unpaired t-test for all pairwise comparisons except between the same group and opposite group. In Figure 5 we show an example of denoised networks for one trial of a subject.

### IV-B Decomposition

#### Synthetic Networks

In this section we show example results of the decomposition scheme by mixing a modular and a scale-free network. The modular network consists of 8 modules. For the modular network we use the modularity and for the scale-free network we use the transitivity as graph metrics to optimise. In Table II we show the average error reduction of the two networks of Algorithm 3 as compared with only denoising the two networks separately. In this case, error reduction for a single network is defined as:

$$er = 1 - \frac{\| W_{dec} - W \|_F}{\| W_{den} - W \|_F} \qquad (24)$$

For all cases the penalty parameter $\lambda$ is adjusted such that the reconstruction error is reduced with respect to the iterations of the alternating minimisation procedure.

#### Airline data

The dataset is a binary network that contains the networks for 37 airlines and 1000 airports [40]. We converted each airline's binary network to a weighted network by setting any existing edge between two airports to the total number of edges between them across all airlines. The mixed network consists of two networks of different airlines mixed together (Lufthansa, Ryanair) for a subset of 50 nodes (airports). In Figure 6 we show the decomposition result when using only global graph metrics (transitivity and global clustering coefficient).

### IV-C Completion

#### Synthetic Networks

Here we show the results of the network completion Algorithm 4 for an increasing number of missing entries and number of nodes. Each separate case considers the average of 50 realisations. Here we deal with a random network with uniformly distributed weights. For each case, we optimise three graph metrics (transitivity, degree, clustering coefficient) and show the error reduction of the completion procedure. The missing values are initialised to 0.5 since this is the mean of the uniform distribution; this initialisation produces the smallest expected distance to the true network of all possible initialisations. Error reduction is calculated the same way as in Eq. (24), with the denominator network being the initialised network. Results are shown in Figure 7.

#### Real Networks

We applied the completion algorithm on the following datasets. 1) USAir: a 330-node network of US air transportation, where the weight of a link is the frequency of flights between two airports [41]. 2) Baywet: a 128-node network which contains the carbon exchanges in the cypress wetlands of south Florida during the wet season [42]; the weights indicate the feeding levels between taxa. 3) Celegans: the neural network of the worm C. elegans, where nodes indicate neurons and the edges are weighted by the number of synapses between the neurons [43]. In Figure 8 we show the results of completion Algorithm 4 for all three networks and different graph metrics.

## V Discussion

The optimisation schemes described in this work enable the adjustment of a network's weight matrix to fulfil specific properties. The utility of the denoising scheme (Section III-A) was evaluated on a number of cases including real-world data. It has to be pointed out that for convex graph metrics, the denoising scheme is guaranteed to converge to a network that is closer to the true underlying network than the noisy observed network. In Figure 1 it is observed that considering the degrees of the nodes overshadows any other global metric's performance. For non-convex graph metrics there is no guarantee, but as shown in Figures 1 and 2, the estimated network is a better estimate of the underlying network than the original noisy version, even for increasing noise and different network types. The only exceptions to this can be observed for the clustering metrics (transitivity, clustering coefficient) of the modular network in Figure 3. This can be explained by noting that in our example most of the weights are clustered in modules while the denoising algorithm operates on all the weights. Constraining the weight updates to the modules alleviates that problem. The utility of this scheme is also displayed in Figure 4, where an increase in the number of nodes does not affect the performance. We point out that although the error reduction increases as the number of nodes increases, the error actually increases in absolute terms.

We also tested the efficacy of the scheme on a real-world EEG connectivity dataset of Alzheimer's patients and control subjects. When performed in a within-subject fashion, splitting each subject's data into a train set to obtain estimates of the graph metrics and a test set to apply the denoising algorithm, we successfully reduced the variability of the network regarding background EEG activity. More importantly, in a leave-subject-out procedure, using other subjects' graph metrics as prior knowledge, the algorithm was able to decrease the noise of the network, albeit not as much as in the within-subject paradigm, as expected. This has important implications in the EEG and related fields (e.g. BCI, fMRI) where subject-independent paradigms are necessary to obtain practically usable and consistent performance [44]. Furthermore, such an approach can be useful in any application that can obtain prior knowledge of the structure of the network under consideration. Our work extends prior work in the network denoising field [25, 24, 23] by providing theoretical proofs and empirical evidence that it is viable for a variety of network types. We also provide the strong proof that the degree of a network, resulting in a convex cost function, is very important in network estimation.

When utilising graph metrics in the network decomposition scheme (Section III-B), the first observation is that the error reduction is greater when using information from both networks that combine to produce the mixed network. As shown in Table II, the resulting estimates from the alternating minimisation procedure of Algorithm 3 confirm the intuition behind that methodology. It has to be noted that this is expected, since more information is used as compared to only denoising; namely, the constraints employed in the optimisation problems (5) and (6) can reproduce the behaviour of optimisation problem (7). Also in Table II, we show the error reduction as a function of the number of nodes. The choice of global graph metrics, modularity and transitivity, further demonstrates the utility of the methodology in settings where local information would be difficult to obtain.

The decomposition algorithm was tested on real data of mixed airline networks for both global-only and local metrics. Assuming prior information on the structure of the networks, the algorithm was able to produce reliable estimates of the underlying networks, see Figure 6. Network decomposition for mixed networks has rarely been attempted in the literature. Our work provides a new formulation that takes into account prior knowledge and uses the reconstruction error in the estimation process. This is in contrast with [29, 24], where only factor graphs are considered.

Weighted network completion had not previously been attempted in the literature, and here we employed graph metrics as the driving force behind estimating the missing weights. For modest numbers of missing entries, we showed that there is a significant benefit in using completion Algorithm 4 for networks of up to 128 nodes when using three graph metrics. Similarly, for real networks, we showed that knowledge of graph metrics can aid in completing the network. More importantly, there is a benefit from the knowledge of global metrics, as seen in Figure 8.

The choice of which graph metric should be used depends on the application and on which estimates may be available. Local metrics assume knowledge of individual nodes' properties, as is typical for e.g. EEG applications (Figure 5), but may not be the case for the airport data (Figure 6). There are two limitations of this study that will be addressed in future work: computational complexity and scaling the efficacy to large networks. Although many real-world networks (e.g. EEG, social networks, weather networks, airline networks) are of the size that we consider in this work (on the order of hundreds of nodes), there exist networks that consist of a very large number of nodes. The calculation of the derivatives involves repeated $n \times n$ matrix products, making the computation slow and inefficient for large $n$. As also discussed in [45], this is an open issue in graph-based metrics. For large computational operations on graphs, polynomial approximations have been proposed [46]. Similarly, as a network increases in size, the number of graph metrics should be increased in order to produce comparable performance to the medium-sized networks considered here. This is because the number of unknown parameters to estimate increases, and therefore more graph metrics are necessary. In order to complement the popular graph metrics that we used in this work, we are considering other potential candidates such as ones based on graph spectral methods and statistical graph analysis.

## VI Conclusions

In this work we developed three mathematical optimisation frameworks that were utilised in network denoising, decomposition and completion. The basis of the methodology lies in adjusting a network’s weights to conform with known graph measure estimates. We derived expressions for the derivatives of popular graph metrics and designed algorithms that use those derivatives in gradient descent schemes. We tested our proposed methods in toy examples as well as real world datasets.

The work performed here has the following implications for network estimation. Firstly, we showed that the use of graph metrics for network denoising reliably reduces the noise in an observed network for both convex and non-convex graph metrics. Also, by combining multiple graph metrics, further reduction ensues. Depending on the type of network, some metrics may be more appropriate than others: modularity works well for modular networks while degree seems to perform well for both random and scale-free networks. For network decomposition, the use of global information as prior knowledge was sufficient to separate the underlying networks from their mixture. Such a framework can be the basis for constrained matrix or tensor decomposition of dynamic networks or multilayer networks. Finally, we provide a new weighted network completion paradigm that can complement existing matrix completion algorithms.

Other applications of our methodology include weighted network reconstruction: the field that designs networks from scratch fulfilling specific criteria (e.g. a specific value for transitivity). The design of such networks from scratch can be performed by traversing the level sets of a graph metric through the graph metric derivatives. Link prediction can also be incorporated by considering not only the weight similarities between different nodes but also the similarity between their derivatives.

## Acknowledgment

We would like to thank Dr Mario Parra Rodriguez (Heriot-Watt University) for making the EEGs available to us.

## Appendix A

###### Theorem 1.

Let $x$ and $y$ be vectors with $z = x + y$. Suppose $x = \arg\min_w c(w)$. If the cost function $c$ is convex, then gradient descent initialised at $z_0 = z$, with iteration index $k$:

$z_{k+1} = z_k - \lambda\,\nabla c(z_k)$ (25)

will converge to a point $z_K$ such that:

$\|x - z_K\|_2 \le \|x - z\|_2$ (26)

for any $z$ and small enough $\lambda$.

###### Proof.

The vectors $x$, $y$, $z$ denote the vectorised versions of the adjacency matrices W. This theorem describes the situation where a network ($x$) is corrupted by noise ($y$), resulting in a noisy network ($z = x + y$). The purpose of this proof is to show that gradient descent with a convex cost $c$ will converge to a point $z_K$ that is a better estimate of $x$, in terms of distance, than $z$. The proof does not try to show that gradient descent on convex functions achieves a global minimum; instead, that the minimum achieved is a better estimate of the network than the original noisy network $z$. Note that for graph metrics in general, there are infinitely many matrices W that achieve a minimum point.

For strictly convex functions, the proof is trivial since there is a unique minimum, which corresponds to $x$, and gradient descent will converge to that value.

For both convex and strictly convex functions the proof is as follows. The sublevel sets $S_\alpha = \{w : c(w) \le \alpha\}$ of a convex function are convex sets. Furthermore, $S_{\alpha_1} \subseteq S_{\alpha_2}$ for any $\alpha_1 \le \alpha_2$. For any point $z_k$ with $c(z_k) = \alpha$, the negative gradient $-\nabla c(z_k)$ forms a right angle with the level set at $z_k$ (by definition). Since the level sets are convex sets, the gradient update leads to a point $z_{k+1}$ such that the angle between $z_{k+1} - z_k$ and $w - z_k$ is acute in the triangle formed by $z_k$, $z_{k+1}$ and $w$, for any $w \in S_\alpha$ (see Figure 9), given a small enough step size. Since the optimum point $x$ is also contained in $S_\alpha$, this implies that $\|x - z_{k+1}\|_2 \le \|x - z_k\|_2$ for any $z$ and $k$, provided that the step size is small enough; small enough in the sense that the gradient step must not cross the level set of $z_k$. ∎
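The theorem can be illustrated numerically. In the sketch below (a hypothetical setup: the convex graph measure is taken as $c(w) = (\sum_i w_i - K)^2$, the squared deviation of the total weight from its known value $K$, with an illustrative noise level and step size), gradient descent from the noisy vector ends no further from the true network than where it started:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 20                                   # number of edge weights (vectorised W)
x = rng.random(n)                        # true network weights
z = x + 0.3 * rng.standard_normal(n)     # noisy observation z = x + y

K = x.sum()                              # known measure value of the true network
c = lambda w: (w.sum() - K) ** 2         # convex cost: squared deviation from K
grad = lambda w: 2.0 * (w.sum() - K) * np.ones(n)

w = z.copy()
lam = 0.01                               # small enough step size
for _ in range(200):
    w = w - lam * grad(w)                # z_{k+1} = z_k - lambda * grad c(z_k)

# the estimate is no further from the true network than the noisy input (26)
assert np.linalg.norm(x - w) <= np.linalg.norm(x - z)
```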

## Appendix B

### B-a Derivatives of scalar functions of matrices

Differentiating a scalar function $f$ w.r.t. a matrix $W$, $\frac{\partial f}{\partial W}$, is essentially a collection of derivatives w.r.t. the separate matrix elements placed at the corresponding indices, i.e.:

$\frac{\partial f}{\partial W} = \begin{bmatrix} \frac{\partial f}{\partial w_{11}} & \frac{\partial f}{\partial w_{12}} & \cdots & \frac{\partial f}{\partial w_{1n}} \\ \frac{\partial f}{\partial w_{21}} & \frac{\partial f}{\partial w_{22}} & \cdots & \frac{\partial f}{\partial w_{2n}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f}{\partial w_{n1}} & \frac{\partial f}{\partial w_{n2}} & \cdots & \frac{\partial f}{\partial w_{nn}} \end{bmatrix}$ (27)

Expressing the derivatives in matrix form allows easy and scalable formulation of the derivatives irrespective of the number of entries. If the derivative of a specific element of W, indexed by $(i,j)$, is required to be processed separately, this can be performed by selecting the same index of the derivative matrix. For example, $\frac{\partial f}{\partial w_{ij}} = \left[\frac{\partial f}{\partial W}\right]_{ij}$.
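As an illustration (the test function is chosen for this sketch, not taken from the text): for $f(W) = \mathrm{tr}\{WA\}$ the derivative matrix is $A^T$, which can be assembled and verified entry by entry as in (27):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
W = rng.random((n, n))
A = rng.random((n, n))

f = lambda M: np.trace(M @ A)    # scalar function of a matrix

# analytic derivative matrix: d tr{WA} / dW = A^T
analytic = A.T

# numerical derivative assembled element by element, as in (27)
eps = 1e-6
numeric = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n)); E[i, j] = eps
        numeric[i, j] = (f(W + E) - f(W)) / eps

assert np.allclose(analytic, numeric, atol=1e-4)
```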

In the case of undirected networks, where the weight matrices are symmetric, the following adjustment needs to be made to ensure that the derivatives are themselves symmetric:

$\frac{df}{dW} = \frac{\partial f}{\partial W} + \left(\frac{\partial f}{\partial W}\right)^{T} - \mathrm{diag}\left(\frac{\partial f}{\partial W}\right)$ (28)

The formulations in the text are in terms of the partial derivatives for simplicity.
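The adjustment in (28) can be checked numerically. In the sketch below (the test function $f(W) = \mathrm{tr}\{WA\}$ is again an illustrative choice), perturbing the tied pair $w_{ij} = w_{ji}$ together reproduces the symmetrised derivative:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.random((n, n))
W = rng.random((n, n)); W = (W + W.T) / 2    # symmetric weight matrix

f = lambda M: np.trace(M @ A)
partial = A.T                                 # unconstrained partial derivative

# adjustment (28): tie w_ij and w_ji together, count the diagonal once
sym_deriv = partial + partial.T - np.diag(np.diag(partial))

# numerical check: perturb w_ij and w_ji simultaneously
eps = 1e-6
numeric = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n)); E[i, j] += eps
        if i != j:
            E[j, i] += eps                    # keep the perturbed matrix symmetric
        numeric[i, j] = (f(W + E) - f(W)) / eps

assert np.allclose(sym_deriv, numeric, atol=1e-4)
```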

### B-B Modularity derivative

The modularity is written as:

$M = \frac{1}{l^w}\sum_{ij}\left(w_{ij} - \frac{k_i^w k_j^w}{l^w}\right)\delta_{ij}$ (29)

where $\delta_{ij} = 1$ whenever nodes $i$ and $j$ belong to the same module and zero otherwise. The first term, $m_1$, can be written as:

$m_1 = \frac{1}{l^w}\sum_{ij} w_{ij}\delta_{ij} = \frac{\mathrm{tr}\{W\Delta^T\}}{\mathrm{tr}\{W O_n\}} = \frac{\theta}{l^w}$ (30)

where $\Delta$ is the matrix that contains the $\delta_{ij}$ and $O_n$ is the $n \times n$ all-ones matrix, so that $\mathrm{tr}\{W O_n\} = l^w$. The gradient is given by:

$\frac{\partial m_1}{\partial W} = \frac{l^w \Delta - \theta\, O_n^T}{(l^w)^2}$ (31)

The other term can be written as:

$m_2 = \frac{1}{(l^w)^2}\sum_{ij} k_i^w k_j^w \delta_{ij} = \frac{1}{(l^w)^2}\sum_{ijkl} w_{ik} w_{jl}\,\delta_{ij} =$ (32) $\frac{1}{(l^w)^2}\sum_{r=1}^{n}\mathrm{tr}\{W^T C_r W \Delta^T\} = \frac{1}{(l^w)^2}\sum_{r=1}^{n}\xi_r$ (33)

where $C_r$ is a circular shift matrix that shifts down the rows of the matrix on its right by $r$ positions.
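The gradient (31) of the first modularity term can be verified by finite differences. The sketch below (the module labels and weight matrix are arbitrary illustrative choices) treats $O_n$ as the all-ones matrix so that $\mathrm{tr}\{W O_n\} = l^w$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
W = rng.random((n, n)); W = (W + W.T) / 2            # weighted adjacency matrix

labels = np.array([0, 0, 0, 1, 1, 1])                # two modules
Delta = (labels[:, None] == labels[None, :]).astype(float)  # delta_ij matrix
Ones = np.ones((n, n))                               # O_n

def m1(M):
    # m1 = tr{W Delta^T} / tr{W O_n} = theta / l^w, as in (30)
    return np.trace(M @ Delta.T) / np.trace(M @ Ones)

lw = np.trace(W @ Ones)
theta = np.trace(W @ Delta.T)
analytic = (lw * Delta - theta * Ones.T) / lw**2     # gradient (31)

# finite-difference check of the unconstrained partial derivatives
eps = 1e-7
numeric = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n)); E[i, j] = eps
        numeric[i, j] = (m1(W + E) - m1(W)) / eps

assert np.allclose(analytic, numeric, atol=1e-4)
```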

## Appendix C

###### Theorem 2.

For any weighted network W with a graph measure of the type $\mathrm{tr}\{WA\}$, the function $g(W) = \left(\mathrm{tr}\{WA\} - K\right)^2$ is convex for any matrix A.

###### Proof.

In order to show that $g$ is convex it suffices to show that the Hessian of $g$ is positive semidefinite. We will use differential notation [47] to calculate the Hessian, which is defined as:

$\{H_g\}_{ij} \equiv \frac{\partial^2 g}{\partial x_i \partial x_j}$ (34)

where $x_i$ is an element of the vectorised weight matrix W. The first differential of $g$ is:

$dg(W) = 2\left(\mathrm{tr}\{WA\} - K\right)\mathrm{tr}\{dW\,A\}$ (35)

The second differential is:

$d^2g(W) = 2\,\mathrm{tr}\{dW\,A\}\,\mathrm{tr}\{dW\,A\}$ (36)

Using the relation between the trace and the vec operator, i.e. $\mathrm{tr}\{A^T B\} = \mathrm{vec}(A)^T\mathrm{vec}(B)$, and the circular property of the trace, i.e. $\mathrm{tr}\{AB\} = \mathrm{tr}\{BA\}$:

$d^2g(W) = 2\,\mathrm{vec}(dW^T)^T\mathrm{vec}(A)\,\mathrm{vec}(A^T)^T\mathrm{vec}(dW)$ (37)
$= 2\,\mathrm{vec}(dW)^T K_{nn}\,\mathrm{vec}(A)\,\mathrm{vec}(A^T)^T\mathrm{vec}(dW)$ (38)
$= \mathrm{vec}(dW)^T\,2\,\mathrm{vec}(A^T)\mathrm{vec}(A^T)^T\,\mathrm{vec}(dW)$ (39)

where $K_{nn}$ is the commutation matrix satisfying $K_{nn}\,\mathrm{vec}(A) = \mathrm{vec}(A^T)$. Note that we have ‘commuted’ from $\mathrm{vec}(dW^T)$ to $\mathrm{vec}(dW)$. The second differential was brought to the form $\mathrm{vec}(dW)^T Z\,\mathrm{vec}(dW)$, which means that the Hessian is [47]:

$H_g = \frac{1}{2}\left(Z + Z^T\right)$ (40)

The matrix Z is of the type $Z = 2cc^T$ with $c = \mathrm{vec}(A^T)$. Since $c$ is a vector, the product $cc^T$ is rank one, producing a single nonzero positive eigenvalue. Therefore Z is positive semi-definite and hence the Hessian is positive semi-definite. ∎
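A numerical sketch of the result (with an arbitrary matrix A and constant K): the closed-form Hessian $H_g = 2\,\mathrm{vec}(A^T)\mathrm{vec}(A^T)^T$ is rank one and positive semi-definite, and agrees with a finite-difference Hessian of $g$:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3
A = rng.random((n, n))
K = 1.5

# closed-form Hessian from the proof: H = 2 vec(A^T) vec(A^T)^T
c = A.T.flatten(order="F")[:, None]      # vec(A^T), column-major vec
H = 2 * c @ c.T

# rank-one outer product: one positive eigenvalue, the rest zero
eigvals = np.linalg.eigvalsh(H)
assert np.all(eigvals >= -1e-12)         # positive semi-definite
assert np.sum(eigvals > 1e-12) == 1      # single nonzero eigenvalue

# finite-difference Hessian of g(W) = (tr{WA} - K)^2 agrees
g = lambda w: (np.trace(w.reshape((n, n), order="F") @ A) - K) ** 2
x0 = rng.random(n * n)
eps = 1e-5
Hnum = np.zeros((n * n, n * n))
for i in range(n * n):
    for j in range(n * n):
        e_i = np.zeros(n * n); e_i[i] = eps
        e_j = np.zeros(n * n); e_j[j] = eps
        Hnum[i, j] = (g(x0 + e_i + e_j) - g(x0 + e_i)
                      - g(x0 + e_j) + g(x0)) / eps**2

assert np.allclose(H, Hnum, atol=1e-3)
```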

### References

1. B. Bollobas, Graph theory: an introductory course.   Springer Science & Business Media, 2012.
2. J. L. Gross and J. Yellen, Graph theory and its applications.   CRC press, 2005.
3. N. Deo, Graph theory with applications to engineering and computer science.   Courier Dover Publications, 2016.
4. A.-L. Barabási, “Network science: Luck or reason,” Nature, vol. 489, no. 7417, pp. 507–508, 2012.
5. N. M. Tichy, M. L. Tushman, and C. Fombrun, “Social network analysis for organizations,” Academy of management review, vol. 4, no. 4, pp. 507–519, 1979.
6. M. Girvan and M. E. J. Newman, “Community structure in social and biological networks,” Proceedings of the national academy of sciences, vol. 99, no. 12, pp. 7821–7826, 2002.
7. A. Barrat, M. Barthélemy, R. Pastor-Satorras, and A. Vespignani, “The architecture of complex weighted networks.” Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 11, pp. 3747–3752, 2004.
8. M. Rubinov and O. Sporns, “Complex network measures of brain connectivity,” NeuroImage, vol. 52, pp. 1059–1069, 2010.
9. A. Laita, J. S. Kotiaho, and M. Mönkkönen, “Graph-theoretic connectivity measures: What do they tell us about connectivity?” Landscape Ecology, vol. 26, no. 7, pp. 951–967, 2011.
10. D. Liben-Nowell and J. Kleinberg, “The link-prediction problem for social networks,” Journal of the American Society for Information Science and Technology, vol. 58, no. May 2007, pp. 1019–1031, 2007.
11. D. S. Goldberg and F. P. Roth, “Assessing experimentally derived interactions in a small world.” Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 8, pp. 4372–4376, 2003.
12. L. Lu and T. Zhou, “Link prediction in complex networks: a survey,” Physica A: Statistical Mechanics and its Applications, vol. 390, no. 6, pp. 1150–1170, 2010.
13. M. Kim and J. Leskovec, “The Network Completion Problem: Inferring Missing Nodes and Edges in Networks,” SIAM International Conference on Data Mining, pp. 47–58, 2011.
14. S. Hanneke and E. P. Xing, “Network Completion and Survey Sampling,” Aistats, vol. 5, pp. 209–215, 2009.
15. M. Al Hasan, V. Chaoji, S. Salem, and M. Zaki, “Link prediction using supervised learning,” in SDM06: workshop on link analysis, counter-terrorism and security, 2006.
16. P. Symeonidis, N. Iakovidou, N. Mantas, and Y. Manolopoulos, “From biological to social networks: Link prediction based on multi-way spectral clustering,” Data & Knowledge Engineering, vol. 87, pp. 226–242, 2013.
17. R. Mastrandrea, T. Squartini, G. Fagiolo, and D. Garlaschelli, “Enhanced reconstruction of weighted networks from strengths and degrees,” New Journal of Physics, vol. 16, 2014.
18. T. Squartini and D. Garlaschelli, “Analytical maximum-likelihood method to detect patterns in real networks,” New Journal of Physics, vol. 13, 2011.
19. K. Bleakley, G. Biau, and J.-P. Vert, “Supervised reconstruction of biological networks with local models,” Bioinformatics, vol. 23, no. 13, pp. 57–65, 2007.
20. M. Filosi, R. Visintainer, S. Riccadonna, G. Jurman, and C. Furlanello, “Stability indicators in network reconstruction,” PLoS ONE, vol. 9, no. 2, 2014.
21. M. Gomez-Rodriguez, J. Leskovec, and A. Krause, “Inferring networks of diffusion and influence,” Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’10, vol. 5, no. 4, pp. 1019–1028, 2010.
22. S. Myers and J. Leskovec, “On the convexity of latent social network inference,” in Advances in Neural Information Processing Systems, 2010, pp. 1741–1749.
23. M. Aghagolzadeh, M. Al-Qizwini, and H. Radha, “Denoising of network graphs using topology diffusion,” Conference Record - Asilomar Conference on Signals, Systems and Computers, vol. 2015-April, no. 1, pp. 728–732, 2015.
24. Q. D. Morris and B. J. Frey, “Denoising and Untangling Graphs Using Degree Priors,” Advances in Neural Information Processing Systems 16, pp. 385–392, 2004.
25. H. Gao, X. Wang, J. Tang, and H. Liu, “Network denoising in social media,” Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining - ASONAM ’13, pp. 564–571, 2013.
26. G. Caldarelli, A. Chessa, F. Pammolli, A. Gabrielli, and M. Puliga, “Reconstructing a credit network,” Nature Physics, vol. 9, no. 3, pp. 125–126, 2013.
27. O. Sporns and R. F. Betzel, “Modular Brain Networks,” Annual Review Psychology, vol. 67, pp. 613–640, 2016.
28. B. Huang and T. Jebara, “Exact graph structure estimation with degree priors,” 8th International Conference on Machine Learning and Applications, ICMLA 2009, pp. 111–118, 2009.
29. C. Nash-Williams, “Decomposition of finite graphs into forests,” Journal of the London Mathematical Society, vol. 39, no. 12, pp. 157–166, 1964.
30. R. H. Keshavan, A. Montanari, and S. Oh, “Matrix completion from a few entries,” IEEE Transactions on Information Theory, vol. 56, no. 6, pp. 2980–2998, 2010.
31. V. Kalofolias, X. Bresson, M. Bronstein, and P. Vandergheynst, “Matrix Completion on Graphs,” arXiv, p. 10, 2014. [Online]. Available: http://arxiv.org/abs/1408.1717
32. M. Barthélemy, A. Barrat, R. Pastor-Satorras, and A. Vespignani, “Characterization and modeling of weighted networks,” Physica A: Statistical Mechanics and its Applications, vol. 346, no. 1-2 SPEC. ISS., pp. 34–43, 2005.
33. M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Physical Review E, vol. 69, no. 2, p. 026113, 2004.
34. G. Fagiolo, “Clustering in complex directed networks,” Physical Review E, vol. 76, no. 2, p. 026107, 2007.
35. J. P. Onnela, J. Saramäki, J. Kertész, and K. Kaski, “Intensity and coherence of motifs in weighted complex networks,” Physical Review E - Statistical, Nonlinear, and Soft Matter Physics, vol. 71, no. 6, pp. 1–4, 2005.
36. T. Opsahl and P. Panzarasa, “Clustering in weighted networks,” Social Networks, vol. 31, no. 2, pp. 155–163, 2009.
37. Y.-T. Chang and D. Pantazis, “Modularity Gradients: Measuring the contribution of edges to the community Structure of a brain network,” IEEE 10th International Symposium on Biomedical Imaging, pp. 536–539, 2013.
38. B. J. Prettejohn, M. J. Berryman, and M. D. McDonnell, “Methods for generating complex networks with selected structural properties for simulations: a review and tutorial for neuroscientists.” Frontiers in computational neuroscience, vol. 5, no. March, p. 11, 2011.
39. M. Pietto, M. A. Parra, T. N., F. F., G. A.M., B. J., R. P., M. F., L. F. I. A., and B. S., “Behavioral and Electrophysiological Correlates of Memory Binding Deficits in Patients at Different Risk Levels for Alzheimer’s Disease,” Journal of Alzheimer’s Disease, vol. 53, pp. 1325–1340, 2016.
40. A. Cardillo, J. Gómez-Gardenes, M. Zanin, M. Romance, D. Papo, F. del Pozo, and S. Boccaletti, “Emergence of network features from multiplexity,” arXiv preprint arXiv:1212.2153, 2012.
41. Pajek datasets, 2006. Available: http://vlado.fmf.uni-lj.si/pub/networks/data/.
42. The Koblenz Network Collection, 2015. Available: http://konect.uni-koblenz.de/.
43. D. J. Watts and S. H. Strogatz, “Collective dynamics of ’small-world’ networks.” Nature, vol. 393, no. 6684, pp. 440–2, 1998.
44. L. Spyrou, Y. Blokland, J. Farquhar, and J. Bruhn, “Optimal multitrial prediction combination and subject-specific adaptation for minimal training brain switch designs,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. PP, no. 99, 2015.
45. D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, “The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,” IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 83–98, 2013.
46. D. K. Hammond, P. Vandergheynst, and R. Gribonval, “Wavelets on graphs via spectral graph theory,” Applied and Computational Harmonic Analysis, vol. 30, no. 2, pp. 129 – 150, 2011.
47. J. Magnus and H. Neudecker, Matrix Differential Calculus with Applications in Statistics and Econometrics.   Wiley, 1989.