A Connectedness Constraint for Learning Sparse Graphs


Martin Sundin, Arun Venkitaraman, Magnus Jansson, Saikat Chatterjee
ACCESS Linnaeus Center, KTH Royal Institute of Technology, Sweden
Email: masundi@kth.se, arunv@kth.se, janssonm@kth.se, sach@kth.se
Abstract

Graphs are naturally sparse objects that are used to study many problems involving networks, for example, distributed learning and graph signal processing. In some cases, the graph is not given, but must be learned from the problem and available data. It is often desirable to learn sparse graphs. However, making a graph highly sparse can split it into several disconnected components, leading to several separate networks. The main difficulty is that connectedness is often treated as a combinatorial property, making it hard to enforce in, e.g., convex optimization problems. In this article, we show how connectedness of undirected graphs can be formulated as an analytical property and enforced as a convex constraint. In particular, we show how the constraint relates to the distributed consensus problem and to graph Laplacian learning. Using simulated and real data, we perform experiments to learn sparse and connected graphs from data.

I Introduction

Graphs are naturally sparse objects which describe relations between different data sources. Graphs are classically used in network problems, such as distributed estimation [1, 2, 3, 4]. Other examples are webpage ranking [5, 6] and relations in social networks [7, 8]. More recently, graphs have been used to generalize signal processing concepts, such as transforms, to signals defined on graphs [7, 9, 10, 11, 8, 12]. Often, the graph in question is sparse. For example, in social networks there are many users, and each user is connected to only a few other users, resulting in a sparse graph. In many applications, the graph is not given a priori but needs to be learned from data [13, 14, 15, 16, 17, 18]. When learning sparse graphs, one challenge is that promoting sparsity may make the graph disconnected. Connectedness is hard to incorporate as it is often treated as a combinatorial property [18]. In this article we show that connectedness is an analytical property that can be formulated as a convex constraint and subsequently be enforced efficiently during learning.

I-A Preliminaries and notations

A graph is commonly defined as a set of nodes/vertices $\mathcal{V} = \{1, \dots, p\}$ and edges $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ together with an adjacency matrix $\mathbf{W}$. Two nodes $i$ and $j$ share an edge if $W_{ij} \neq 0$. The elements of the adjacency matrix describe the strength of the connection between two nodes. The relation between $\mathcal{E}$ and $\mathbf{W}$ is: $(i, j) \in \mathcal{E} \Leftrightarrow W_{ij} \neq 0$. For undirected graphs, the adjacency matrix is symmetric. In this article we consider real symmetric adjacency matrices with non-negative components.

We denote the matrix determinant by $\det(\cdot)$ and the matrix trace by $\operatorname{tr}(\cdot)$. We use $\succ$ ($\succeq$) to denote positive (semi-)definiteness of symmetric matrices and $\geq$ to denote element-wise greater-or-equal. We use $\mathbf{1}$ to denote the vector consisting of only ones and $\mathbf{I}$ to denote the identity matrix. The $k$'th largest eigenvalue of a matrix is denoted by $\lambda_k(\cdot)$ and the element-wise $\ell_1$-norm by $\|\cdot\|_1$.

I-B Problem statement

Many graph learning problems can be posed as finding the adjacency matrix $\mathbf{W}$ that minimizes an objective function $f(\mathbf{W})$ over a constraint set $\mathcal{C}$. To make the adjacency matrix sparse, a sparsity-promoting penalty function $g(\mathbf{W})$ is often introduced. The graph is thus obtained by solving the optimization problem:

$$\hat{\mathbf{W}} = \operatorname*{arg\,min}_{\mathbf{W} \in \mathcal{C}} \; f(\mathbf{W}) + \alpha\, g(\mathbf{W}), \tag{1}$$

where $\alpha \geq 0$ is a regularization parameter and $\mathbf{W}$ is symmetric. A few examples of $f$, $g$ and $\mathcal{C}$ are shown in Table I. Signal Processing on Graphs (SPG) and the graphical LASSO are data-driven problems, while the distributed consensus problem only uses the topology of an underlying graph. We note that all optimization problems in Table I are convex.

Problem | $f(\mathbf{W})$ | $g(\mathbf{W})$ | Constraints $\mathcal{C}$
Consensus [4, 15, 16] | $\left\| \mathbf{W} - \frac{1}{p}\mathbf{1}\mathbf{1}^\top \right\|_2$ | $\|\mathbf{W}\|_1$ | $\mathbf{W} \geq 0$, $\mathbf{W}\mathbf{1} = \mathbf{1}$, $W_{ij} = 0$ for $(i,j) \notin \mathcal{E}$
Graphical LASSO [13] | $\operatorname{tr}(\mathbf{S}\mathbf{W}) - \log\det(\mathbf{W})$ | $\|\mathbf{W}\|_1$ | $\mathbf{W} \succ 0$
SPG [12, 17] | $\operatorname{tr}(\mathbf{X}^\top \mathbf{L}\, \mathbf{X})$ | $\|\mathbf{W}\|_1$ | $\mathbf{W} \geq 0$, $\operatorname{diag}(\mathbf{W}) = \mathbf{0}$, $\mathbf{W}\mathbf{1} = \mathbf{1}$
TABLE I: A few examples of $f$, $g$ and $\mathcal{C}$ that can be used in (1).

Often, the learned graph becomes disconnected for large values of the regularization parameter $\alpha$. This means that the resulting graph describes two or more separate non-interacting systems. In this article, we address the issue of learning sparse connected graphs by formulating a convex constraint which preserves connectedness. Our contributions are as follows:

  1. We analytically formulate connectedness in terms of a weighted Laplacian matrix.

  2. For the distributed consensus problem, we show that the graph splits into several components when the regularization parameter $\alpha$ exceeds a threshold value.

We illustrate the validity of the proposed constraint through numerical simulations considering both synthetic data and real-world temperature data.

II Preserving the connectedness of graphs

The connectedness of a graph is usually described by the spectrum of the graph Laplacian matrix defined using the incidence matrix [19]. As the Laplacian is not a continuous function of the adjacency matrix $\mathbf{W}$, preserving connectedness in terms of the Laplacian matrix leads to combinatorial optimization problems. For this reason we instead consider the weighted graph Laplacian matrix $\mathbf{L}$ with elements

$$L_{ij} = \begin{cases} \sum_{k \neq i} W_{ik}, & i = j, \\ -W_{ij}, & i \neq j; \end{cases}$$

the matrix can thus be expressed as $\mathbf{L} = \operatorname{diag}(\mathbf{W}\mathbf{1}) - \mathbf{W}$. We note that for any vector $\mathbf{x}$,

$$\mathbf{x}^\top \mathbf{L}\, \mathbf{x} = \frac{1}{2} \sum_{i,j} W_{ij} (x_i - x_j)^2 \geq 0 \quad \text{for } \mathbf{W} \geq 0.$$

The weighted Laplacian is thus positive semi-definite when $\mathbf{W} \geq 0$. The nullspace of the weighted Laplacian relates to the number of connected components through the following lemma.
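As a concrete illustration (ours, not from the paper), the weighted Laplacian and its positive semi-definiteness can be checked numerically:

```python
import numpy as np

# Adjacency matrix of a small undirected graph (symmetric, non-negative).
W = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, 0.2],
              [0.0, 0.2, 0.0]])

# Weighted graph Laplacian: L = diag(W 1) - W.
L = np.diag(W.sum(axis=1)) - W

# For W >= 0 all eigenvalues of L are non-negative, i.e., L is PSD;
# the all-ones vector is always in the null space since L 1 = 0.
print(np.linalg.eigvalsh(L))
```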

Lemma 1 (Connected components).

The number of connected components of a graph with adjacency matrix $\mathbf{W} \geq 0$ equals the dimension of the null space of the weighted Laplacian $\mathbf{L}$.

The proof of Lemma 1 is similar to the proof for the (unweighted) Laplacian in [19] and is therefore not repeated here. Lemma 1 gives that a graph is connected if and only if the second smallest eigenvalue of $\mathbf{L}$ (also known as the Fiedler value) is nonzero. By noting that $\mathbf{L}\mathbf{1} = \mathbf{0}$, we find the following proposition.
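Lemma 1 suggests a simple numerical test for the number of components; the following sketch (our illustration, with a hypothetical tolerance) counts the near-zero eigenvalues of $\mathbf{L}$:

```python
import numpy as np

def num_connected_components(W, tol=1e-9):
    """Count connected components as the dimension of the null space of
    the weighted Laplacian L = diag(W 1) - W (Lemma 1). The tolerance
    tol is a hypothetical choice for numerical rank estimation."""
    L = np.diag(W.sum(axis=1)) - W
    return int(np.sum(np.linalg.eigvalsh(L) < tol))

# Two disjoint edges {0,1} and {2,3} give two connected components.
W = np.zeros((4, 4))
W[0, 1] = W[1, 0] = 1.0
W[2, 3] = W[3, 2] = 1.0
print(num_connected_components(W))  # prints 2
```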

Proposition 1 (Graph connectedness constraint).

A graph with adjacency matrix $\mathbf{W} \geq 0$ is connected if and only if

$$\mathbf{L} + \frac{1}{p}\mathbf{1}\mathbf{1}^\top \succ 0. \tag{2}$$
Proof.

Let the graph be $\mathcal{G} = (\mathcal{V}, \mathcal{E})$. Assume first that there exists a vector $\mathbf{x} \neq \mathbf{0}$ such that

$$\mathbf{x}^\top \left( \mathbf{L} + \frac{1}{p}\mathbf{1}\mathbf{1}^\top \right) \mathbf{x} = \frac{1}{2}\sum_{i,j} W_{ij}(x_i - x_j)^2 + \frac{1}{p}\left( \mathbf{1}^\top \mathbf{x} \right)^2$$

is zero. This means that $\mathbf{1}^\top \mathbf{x} = 0$ and $x_i = x_j$ for all $i$ and $j$ such that $W_{ij} \neq 0$. Hence $x_j = x_i$ for all nodes $j$ connected to the node $i$; the vector $\mathbf{x}$ is thus piecewise constant over the graph. However, the vector cannot be completely constant since $\mathbf{1}^\top \mathbf{x} = 0$ and $\mathbf{x} \neq \mathbf{0}$ by assumption. The graph must therefore consist of at least two components.

Next, assume that the graph is disconnected. Then there exist two sets $\mathcal{S}_1$ and $\mathcal{S}_2$ such that $\mathcal{S}_1 \cup \mathcal{S}_2 = \mathcal{V}$, $\mathcal{S}_1 \cap \mathcal{S}_2 = \emptyset$ and $W_{ij} = 0$ for all $i \in \mathcal{S}_1$, $j \in \mathcal{S}_2$. Set the elements of $\mathbf{x}$ to be $x_i = 1/p_1$ for $i \in \mathcal{S}_1$ and $x_i = -1/p_2$ for $i \in \mathcal{S}_2$, where $p_1$ and $p_2$ denote the number of elements in $\mathcal{S}_1$ and $\mathcal{S}_2$ respectively. We find that

$$\mathbf{x}^\top \left( \mathbf{L} + \frac{1}{p}\mathbf{1}\mathbf{1}^\top \right) \mathbf{x} = 0, \qquad \mathbf{x} \neq \mathbf{0}.$$

Hence $\mathbf{L} + \frac{1}{p}\mathbf{1}\mathbf{1}^\top \not\succ 0$. ∎

Proposition 1 immediately implies that the solution to the graph learning problem (1) is connected when the constraint

$$\mathbf{L} + \frac{c}{p}\mathbf{1}\mathbf{1}^\top \succeq \epsilon\, \mathbf{I} \tag{3}$$

is imposed for some $\epsilon > 0$ and $c \geq \epsilon$.
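For concreteness, constraint (3) can be written with the cvxpy modeling package (a Python analogue of the CVX toolbox used in Section IV); the values of eps and c below are illustrative assumptions, not taken from the paper:

```python
import cvxpy as cp
import numpy as np

p = 10                 # number of nodes (illustrative)
eps, c = 1e-3, 1e-3    # constraint parameters with c >= eps > 0 (assumed values)

W = cp.Variable((p, p), symmetric=True)
L = cp.diag(cp.sum(W, axis=1)) - W   # weighted Laplacian diag(W 1) - W

connectedness_constraints = [
    W >= 0,                                            # non-negative weights
    L + (c / p) * np.ones((p, p)) >> eps * np.eye(p),  # constraint (3), an LMI
]
```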

For the graphical LASSO problem [13], the components of the adjacency matrix are not necessarily non-negative. However, the sign pattern of the solution can be determined beforehand from the sample covariance matrix $\mathbf{S}$ [20, 21]. One can therefore replace the adjacency matrix $\mathbf{W}$ in (3) by the matrix with components $|W_{ij}|$; since the signs are known a priori, the constraint remains convex. The constraint can therefore also be used in problems with negative adjacency matrices provided that the sign pattern of the solution is known.

We next consider the application of the constraint to signal processing on graphs and the distributed consensus problem.

III Applications

III-A The consensus problem

In the distributed consensus problem [4, 16, 15], we iteratively compute the mean of a sequence $x_1(0), \dots, x_p(0)$ as

$$x_i(t+1) = \sum_{j=1}^{p} W_{ij}\, x_j(t),$$

where $x_i(t)$ denotes the value at the $i$'th node at the $t$'th iteration. The iterations converge to the mean, $x_i(t) \to \frac{1}{p}\sum_{j} x_j(0)$ as $t \to \infty$, when $\mathbf{W}\mathbf{1} = \mathbf{1}$ and $\rho = \left\| \mathbf{W} - \frac{1}{p}\mathbf{1}\mathbf{1}^\top \right\|_2 < 1$ [4]. A smaller $\rho$ leads to faster (worst case) convergence.
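A minimal simulation of the consensus iteration (our illustration; the path-graph weight matrix below is an arbitrary valid choice):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)        # initial node values x_i(0)
target = x.mean()

# Symmetric weights with rows summing to one on a 5-node path graph.
W = np.array([[0.5, 0.5, 0.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0, 0.0],
              [0.0, 0.5, 0.0, 0.5, 0.0],
              [0.0, 0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.0, 0.5, 0.5]])

for t in range(1000):
    x = W @ x                 # x_i(t+1) = sum_j W_ij x_j(t)
print(np.max(np.abs(x - target)))  # ~0: the iterates converge to the mean
```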

In some scenarios, for example when the number of communication links is limited, it is desirable to obtain a sparse graph [16, 15]. This can be done by setting [16]

$$f(\mathbf{W}) = \left\| \mathbf{W} - \frac{1}{p}\mathbf{1}\mathbf{1}^\top \right\|_2, \qquad g(\mathbf{W}) = \|\mathbf{W}\|_1.$$

The optimization problem (1) then becomes [16]:

$$\begin{aligned} \min_{\mathbf{W}} \quad & \left\| \mathbf{W} - \frac{1}{p}\mathbf{1}\mathbf{1}^\top \right\|_2 + \alpha \|\mathbf{W}\|_1 \\ \text{subject to} \quad & \mathbf{W} = \mathbf{W}^\top, \; \mathbf{W}\mathbf{1} = \mathbf{1}, \; W_{ij} = 0 \text{ for } (i,j) \notin \mathcal{E}. \end{aligned} \tag{4}$$
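A sketch of how problem (4) could be prototyped in cvxpy, optionally adding the connectedness constraint (3); the function name and the parameter defaults are our own illustrative choices:

```python
import cvxpy as cp
import numpy as np

def sparse_consensus_weights(E, alpha, connected=False, eps=1e-3, c=1e-3):
    """Sketch of problem (4). E is a boolean mask of allowed edges (with a
    True diagonal, since self-weights are allowed); alpha is the sparsity
    parameter. eps and c are assumed values for constraint (3)."""
    p = E.shape[0]
    W = cp.Variable((p, p), symmetric=True)
    constraints = [
        cp.sum(W, axis=1) == 1,                        # W 1 = 1
        cp.multiply(1.0 - E.astype(float), W) == 0,    # W_ij = 0 outside E
    ]
    if connected:
        L = cp.diag(cp.sum(W, axis=1)) - W
        constraints += [W >= 0,
                        L + (c / p) * np.ones((p, p)) >> eps * np.eye(p)]
    objective = cp.sigma_max(W - np.ones((p, p)) / p) + alpha * cp.sum(cp.abs(W))
    cp.Problem(cp.Minimize(objective), constraints).solve(solver=cp.SCS)
    return W.value
```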

The optimization problem (4) does not ensure that the graph is connected. In fact, we now prove that the solution to (4) is guaranteed to be disconnected for sufficiently large values of the regularization parameter $\alpha$.

Proposition 2 (Consensus graph splitting with sparsity penalty).

In the consensus problem (4), the graph with $p$ nodes and adjacency matrix $\mathbf{W}$ splits up into at least $m$ connected components when $\alpha$ exceeds a threshold value depending on $m$ and $p$.

Proof.

We prove the proposition by induction over $m$. The graph consists of at least one connected component, so the proposition holds for $m = 1$. Assume that the proposition holds for $m - 1$. Since $\mathbf{W} \geq 0$ and $\mathbf{W}\mathbf{1} = \mathbf{1}$, the eigenvalues of the weighted Laplacian $\mathbf{L} = \mathbf{I} - \mathbf{W}$ are non-negative. The induction hypothesis gives that the $m - 1$ smallest eigenvalues of $\mathbf{L}$ are zero. A comparison of the objective values in (4) then shows that, once $\alpha$ exceeds the threshold, the $m$'th smallest eigenvalue of $\mathbf{L}$ must also vanish at the solution. Hence, by Lemma 1, the graph consists of at least $m$ connected components. ∎

Proposition 2 generalizes the result from [16], which states that the graph becomes completely disconnected (consists of $p$ independent components) when $\alpha$ is sufficiently large.

Next we examine the effect of imposing the connectedness constraint (3) on the consensus problem. Since $\mathbf{W}\mathbf{1} = \mathbf{1}$, the weighted Laplacian is $\mathbf{L} = \mathbf{I} - \mathbf{W}$, and the constraint can be rewritten as

$$\mathbf{I} - \mathbf{W} + \frac{c}{p}\mathbf{1}\mathbf{1}^\top \succeq \epsilon\, \mathbf{I}.$$

This gives us that

$$\lambda_2(\mathbf{W}) \leq 1 - \epsilon < 1.$$

This shows that the graph is connected when $\lambda_2(\mathbf{W}) < 1$. Convergence of the consensus rule ($\|\mathbf{W} - \frac{1}{p}\mathbf{1}\mathbf{1}^\top\|_2 < 1$) thus implies connectedness, whereas the reverse is not true. For example, when $\lambda_2(\mathbf{W}) < 1$ and $\lambda_p(\mathbf{W}) = -1$, the graph is connected but the consensus rule is not guaranteed to converge.
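The two-node case makes this gap concrete (our illustration, with $\mathbf{W}$ chosen as the swap matrix):

```python
import numpy as np

# W = [[0, 1], [1, 0]]: the Laplacian I - W has eigenvalues {0, 2}, so the
# graph is connected, but W has the eigenvalue -1, so ||W - (1/p) 1 1^T||_2 = 1
# and convergence is not guaranteed.
W = np.array([[0.0, 1.0],
              [1.0, 0.0]])
x = np.array([1.0, 3.0])
for t in range(4):
    x = W @ x
    print(x)  # alternates between [3, 1] and [1, 3]; never reaches the mean 2
```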

III-B Signal processing on graphs

Signal processing on graphs (SPG) [12, 8, 11, 7] is an emerging field which deals with processing and analysis of data defined over graphs. Concepts such as sampling, filters and transforms have been generalized to graph signals [12, 11, 10, 22, 9, 23]. In many problems the graph needs to be learned from data [17, 24, 14]. The aim is then to find a sparse adjacency matrix that describes similarities in a given data set. For undirected graphs, a commonly used measure of graph signal smoothness is the Laplacian quadratic form

$$S(\mathbf{x}) = \mathbf{x}^\top \mathbf{L}\, \mathbf{x} = \frac{1}{2} \sum_{i,j} W_{ij} (x_i - x_j)^2,$$

where $\mathbf{L} = \operatorname{diag}(\mathbf{W}\mathbf{1}) - \mathbf{W}$. A graph signal $\mathbf{x}$ with a small $S(\mathbf{x})$ varies smoothly across the edges of the graph [12, 25, 9, 17]. Given $N$ time series in a matrix $\mathbf{X} = [\mathbf{x}_1, \dots, \mathbf{x}_N]$, one can find graphs describing the smoothness in the data by setting

$$f(\mathbf{W}) = \sum_{n=1}^{N} \mathbf{x}_n^\top \mathbf{L}\, \mathbf{x}_n = \operatorname{tr}(\mathbf{X}^\top \mathbf{L}\, \mathbf{X}).$$

Self-edges can be removed by setting $W_{ii} = 0$ for all $i$, and the rows can be normalized by setting $\mathbf{W}\mathbf{1} = \mathbf{1}$ to prevent the trivial solution $\mathbf{W} = \mathbf{0}$, giving that $\|\mathbf{W}\|_1 = p$. The connected graph learning problem thus becomes

$$\begin{aligned} \min_{\mathbf{W}} \quad & \operatorname{tr}(\mathbf{X}^\top \mathbf{L}\, \mathbf{X}) \\ \text{subject to} \quad & \mathbf{W} \geq 0, \; \mathbf{W} = \mathbf{W}^\top, \; \operatorname{diag}(\mathbf{W}) = \mathbf{0}, \; \mathbf{W}\mathbf{1} = \mathbf{1}, \\ & \mathbf{L} + \frac{c}{p}\mathbf{1}\mathbf{1}^\top \succeq \epsilon\, \mathbf{I}, \end{aligned} \tag{5}$$

where we used that $\|\mathbf{W}\|_1 = p$ is constant under the constraints, so the sparsity penalty $g(\mathbf{W})$ can be dropped from the objective.
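A cvxpy sketch of problem (5), analogous to the CVX formulation used in the experiments below; eps, c and the solver choice are our assumptions:

```python
import cvxpy as cp
import numpy as np

def learn_connected_graph(X, eps=1e-3, c=1e-3):
    """Sketch of problem (5): learn a row-normalized adjacency matrix from
    data X (p nodes x N samples) by minimizing tr(X^T L X) subject to the
    connectedness constraint (3). eps and c are assumed values."""
    p = X.shape[0]
    W = cp.Variable((p, p), symmetric=True)
    L = cp.diag(cp.sum(W, axis=1)) - W            # weighted Laplacian
    constraints = [
        W >= 0,                                   # non-negative weights
        cp.diag(W) == 0,                          # no self-edges
        cp.sum(W, axis=1) == 1,                   # row normalization, W 1 = 1
        L + (c / p) * np.ones((p, p)) >> eps * np.eye(p),  # constraint (3)
    ]
    problem = cp.Problem(cp.Minimize(cp.trace(X.T @ L @ X)), constraints)
    problem.solve(solver=cp.SCS)                  # SDP-capable solver
    return W.value
```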

IV Numerical simulations

Fig. 1: Graph learning from data. (a) shows the sinusoids used to learn the graph, (b) shows the reconstruction without the connectedness constraint and (c) shows the reconstruction with the connectedness constraint. Edges with weights below a small threshold have been truncated.
Fig. 2: Histogram of edge weights for learning graphs from data without the connectedness constraint (a) and with the constraint (b). The $y$-axis has been reduced to show the number of non-zero components more clearly.

We here show numerically that the constraint (3) ensures connectedness. We consider the problem of learning the weighted graph Laplacian from synthetic data and from a data set of temperatures in Swedish cities [26]. In the experiments we solved the optimization problem (5) using the CVX toolbox [27] with fixed values of the constraint parameters $\epsilon$ and $c$.

IV-A Experiments using synthetic data

In the first experiment we synthetically generated the signals as sinusoids,

$$x_i(t) = \sin(\omega_i t + \phi_i),$$

where the frequencies $\omega_i$ and phases $\phi_i$ were drawn from uniform distributions. The data is shown in Figure 1 (a). We construct a graph describing the smoothness and periodicity of the data [12, 24, 25] with and without the connectedness constraint. In Figure 1 (b) and (c) we see that without the connectedness constraint the graph shows local smoothness of the signals but is disconnected, while the graph learned with the constraint shows local smoothness as well as some periodicity. In Figure 1, the edges with small weights were truncated. To examine whether the graph is sensitive to the truncation, we show the histogram of the adjacency matrix entries in Figure 2. We find that without the constraint, only a few edges have large weights while the remaining weights are small. With the constraint, the weights assume a wider range of values.
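A hypothetical stand-in for this data generation (the paper's exact number of signals and uniform ranges are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
p, T = 20, 200   # number of signals and number of samples (illustrative)

# Sinusoids x_i(t) = sin(w_i t + phi_i) with frequencies and phases drawn
# from uniform distributions; the ranges below are assumptions.
w = rng.uniform(0.05, 0.5, size=p)
phi = rng.uniform(0.0, 2.0 * np.pi, size=p)
t = np.arange(T)
X = np.sin(w[:, None] * t[None, :] + phi[:, None])  # p x T data matrix
```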

Fig. 3: Learned graph for the Swedish temperature dataset without the connectedness constraint.
Fig. 4: Learned graph for the Swedish temperature dataset with the connectedness constraint.

IV-B Experiments using real data

In this experiment, we use time series of daily temperature data from 45 Swedish cities from October to December of 2014 [26]. Our task is to find a graph that shows which cities have similar temperatures. Note that in the experiment, the algorithm only has access to the temperatures and not to the locations of the cities. We truncated edges with weights below a small threshold. In Figure 3 we see that the graph learned without the connectedness constraint identifies some neighboring cities but is disconnected, while the graph learned with the constraint in Figure 4 identifies neighboring cities and remains connected. The graph also shows that cities at the same latitude have related temperatures. We show the histogram of edge weights in Figure 5. We find that the edge weights take a wider range of values when the constraint is enforced.

Fig. 5: Histogram of edge weights for learning graphs from temperature data without the connectedness constraint (a) and with the constraint (b). The $y$-axis has been reduced to show the number of non-zero components more clearly.

V Conclusion

In many problems, it is useful to represent relations between data using graphs. Often, the graph is not given a priori but needs to be learned from data. Since sparse graphs are often preferred, it is a challenge to preserve connectedness of the graph while simultaneously enforcing sparsity. In this paper, we showed that connectedness is an analytical property that can be imposed on a graph as a convex constraint, making it possible to guarantee connectedness when learning sparse graphs. For the consensus problem, we showed that the graph is guaranteed to be disconnected for certain values of the regularization parameter when no constraint is imposed. We illustrated the effect of the constraint when learning a graph for synthetic data and for temperature data.

Acknowledgement

This work was partially supported by the Swedish Research Council under contract 2015-05484. We also want to thank the anonymous reviewers for providing useful comments.

References

  • [1] P. K. Varshney, “Distributed Bayesian detection: Parallel fusion network,” in Distributed Detection and Data Fusion, pp. 36–118. Springer, 1997.
  • [2] A. H. Sayed, “Diffusion adaptation over networks,” Academic Press Library in Signal Processing, vol. 3, pp. 323–454, 2013.
  • [3] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends® in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
  • [4] S. Boyd, P. Diaconis, and L. Xiao, “Fastest mixing Markov chain on a graph,” SIAM review, vol. 46, no. 4, pp. 667–689, 2004.
  • [5] L. Page, S. Brin, R. Motwani, and T. Winograd, “The PageRank citation ranking: bringing order to the web,” Tech. Rep., 1999.
  • [6] A. N. Langville and C. D. Meyer, Google’s PageRank and beyond: The science of search engine rankings, Princeton University Press, 2011.
  • [7] A. Sandryhaila and J. M. F. Moura, “Big data analysis with signal processing on graphs: Representation and processing of massive data sets with irregular structure,” IEEE Signal Processing Magazine, vol. 31, no. 5, pp. 80–90, Sept 2014.
  • [8] A. Sandryhaila and J. M. F. Moura, “Discrete signal processing on graphs,” IEEE Transactions on Signal Processing, vol. 61, no. 7, pp. 1644–1656, 2013.
  • [9] S. Chen, A. Sandryhaila, J. M. F. Moura, and J. Kovačević, “Signal recovery on graphs: Variation minimization,” IEEE Transactions on Signal Processing, vol. 63, no. 17, pp. 4609–4624, Sept 2015.
  • [10] A. Venkitaraman, S. Chatterjee, and P. Händel, “On Hilbert transform of signals on graphs,” in 2015 International Conference on Sampling Theory and Applications (SampTA), May 2015.
  • [11] A. Sandryhaila and J. M. F. Moura, “Discrete signal processing on graphs: Frequency analysis,” IEEE Transactions on Signal Processing, vol. 62, no. 12, pp. 3042–3054, 2014.
  • [12] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, “The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,” IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 83–98, 2013.
  • [13] J. Friedman, T. Hastie, and R. Tibshirani, “Sparse inverse covariance estimation with the graphical LASSO,” Biostatistics, vol. 9, no. 3, pp. 432–441, 2008.
  • [14] J. Mei and J. M. F. Moura, “Signal processing on graphs: Estimating the structure of a graph,” in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015, pp. 5495–5499.
  • [15] F. Lin, M. Fardad, and M. R. Jovanović, “Identification of sparse communication graphs in consensus networks,” in 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton). IEEE, 2012, pp. 85–89.
  • [16] G. Gnecco, R. Morisi, and A. Bemporad, “Sparse solutions to the average consensus problem via $\ell_1$-norm regularization of the fastest mixing Markov-chain problem,” in 2014 IEEE 53rd Annual Conference on Decision and Control (CDC), Dec 2014, pp. 2228–2233.
  • [17] X. Dong, D. Thanou, P. Frossard, and P. Vandergheynst, “Learning graphs from signal observations under smoothness prior,” CoRR, vol. abs/1406.7842, 2014.
  • [18] C. Hegde, P. Indyk, and L. Schmidt, “A nearly-linear time framework for graph-structured sparsity,” in Proceedings of the 32nd International Conference on Machine Learning (ICML-15), 2015, pp. 928–937.
  • [19] F. R. K. Chung, Spectral Graph Theory, CBMS Regional Conference Series in Mathematics, vol. 92. AMS, 1997.
  • [20] J. Friedman, T. Hastie, and R. Tibshirani, “Applications of the LASSO and grouped LASSO to the estimation of sparse graphical models,” Tech. Rep., 2010.
  • [21] S. Sojoudi, “Equivalence of graphical LASSO and thresholding for sparse graphs,” Journal of Machine Learning Research, vol. 17, no. 115, pp. 1–21, 2016.
  • [22] A. Jung, P. Berger, G. Hannak, and G. Matz, “Scalable graph signal recovery for big data over networks,” in 2016 IEEE 17th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), July 2016, pp. 1–6.
  • [23] D. I. Shuman, B. Ricaud, and P. Vandergheynst, “Vertex-frequency analysis on graphs,” Applied and Computational Harmonic Analysis, vol. 40, no. 2, pp. 260–291, 2016.
  • [24] J. Hou, L. P. Chau, Y. He, and H. Zeng, “Robust Laplacian matrix learning for smooth graph signals,” in 2016 IEEE International Conference on Image Processing (ICIP), Sept 2016, pp. 1878–1882.
  • [25] S. Chen, R. Varma, A. Singh, and J. Kovačević, “Representations of piecewise smooth signals on graphs,” in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016, pp. 6370–6374.
  • [26] Swedish Meteorological and Hydrological Institute (SMHI), “Swedish temperature data: October to December 2014,” http://opendata-catalog.smhi.se/explore/, Accessed: 2016-08-15.
  • [27] M. Grant and S. Boyd, “CVX: Matlab software for disciplined convex programming, version 2.1,” http://cvxr.com/cvx, Mar. 2014.