A Connectedness Constraint for Learning Sparse Graphs
Abstract
Graphs are naturally sparse objects that are used to study many problems involving networks, for example, distributed learning and graph signal processing. In some cases, the graph is not given, but must be learned from the problem and available data. Often it is desirable to learn sparse graphs. However, making a graph highly sparse can split the graph into several disconnected components, leading to several separate networks. The main difficulty is that connectedness is often treated as a combinatorial property, making it hard to enforce in e.g. convex optimization problems. In this article, we show how connectedness of undirected graphs can be formulated as an analytical property and can be enforced as a convex constraint. We especially show how the constraint relates to the distributed consensus problem and graph Laplacian learning. Using simulated and real data, we perform experiments to learn sparse and connected graphs from data.
I Introduction
Graphs are naturally sparse objects that describe relations between different data sources. Graphs are classically used in network problems, such as distributed estimation [1, 2, 3, 4]. Other examples are webpage ranking [5, 6] and relations in social networks [7, 8]. More recently, graphs have been used to generalize signal processing concepts, such as transforms, to signals defined on graphs [7, 9, 10, 11, 8, 12]. Often, the graph in question is sparse. For example, in social networks, there are many users and each user is only connected to a few other users, resulting in a sparse graph. In many applications, the graph is not given a priori but needs to be learned from data [13, 14, 15, 16, 17, 18]. When learning sparse graphs, one challenge is that promoting sparsity may make the graph disconnected. Connectedness is hard to incorporate as it is often treated as a combinatorial property [18]. In this article we show that connectedness is an analytical property that can be formulated as a convex constraint and subsequently be used efficiently in learning.
I-A Preliminaries and notation
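A graph $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathbf{A})$ is commonly defined as a set of nodes/vertices $\mathcal{V}$ and edges $\mathcal{E}$ together with an adjacency matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$. Two nodes $i$ and $j$ share an edge if $(i,j) \in \mathcal{E}$. The elements of the adjacency matrix describe the strength of the connection between two nodes. The relation between $\mathcal{E}$ and $\mathbf{A}$ is:
$$A_{ij} \neq 0 \iff (i,j) \in \mathcal{E}.$$
For undirected graphs, the adjacency matrix is symmetric. In this article we consider real symmetric adjacency matrices with nonnegative components.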
We denote the matrix determinant by $\det(\cdot)$ and the matrix trace by $\mathrm{tr}(\cdot)$. We use $\succ$ ($\succeq$) to denote positive (semi)definiteness of symmetric matrices and $\geq$ to denote elementwise greater than or equal. We use $\mathbf{1}$ to denote the vector consisting of only ones and $\mathbf{I}$ to denote the identity matrix. The $i$'th largest eigenvalue of a matrix $\mathbf{X}$ is denoted by $\lambda_i(\mathbf{X})$ and the elementwise $\ell_1$ norm by $\|\mathbf{X}\|_1$.
I-B Problem statement
Many graph learning problems can be posed as finding the adjacency matrix $\mathbf{A}$ that minimizes an objective function $f(\mathbf{A})$ under appropriate constraints. To make the adjacency matrix sparse, a sparsity promoting penalty function $g(\mathbf{A})$ is often introduced. The graph is thus obtained by solving the optimization problem:
$$\min_{\mathbf{A}} \; f(\mathbf{A}) + \eta\, g(\mathbf{A}) \quad \text{subject to} \quad \mathbf{A} \in \mathcal{C} \tag{1}$$
where $\eta \geq 0$ is a regularization parameter, $\mathcal{C}$ is the constraint set and $\mathbf{A}$ is symmetric. A few examples of $f$, $g$ and $\mathcal{C}$ are shown in Table I. Signal Processing on Graphs (SPG) and the graphical LASSO are data-driven problems while the distributed consensus problem only uses the topology of an underlying graph. We note that all optimization problems in Table I are convex.
Problem  constraints  

Consensus [4, 15, 16]  , , for  
Graphical LASSO [13]  
SPG [12, 17] 
Often, the learned graph becomes disconnected for large values of the regularization parameter $\eta$. This means that the resulting graph describes two or more separate noninteracting systems. In this article, we address the issue of learning sparse connected graphs by formulating a convex constraint which preserves connectedness. Our contributions are as follows:
- We analytically formulate connectedness in terms of a weighted Laplacian matrix.
- For the distributed consensus problem, we show that the graph splits into several components when the regularization parameter exceeds some value.
- We illustrate the validity of the proposed constraint through numerical simulations considering both synthetic data and real-world temperature data.
II Preserving the connectedness of graphs
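The connectedness of a graph is usually described by the spectrum of the graph Laplacian matrix defined using the incidence matrix [19]. As the Laplacian is not a continuous function of the adjacency matrix $\mathbf{A}$, preserving connectedness in terms of the Laplacian matrix leads to combinatorial optimization problems. For this reason we instead consider the weighted graph Laplacian matrix $\mathbf{L}$ with elements
$$L_{ij} = \begin{cases} \sum_{k \neq i} A_{ik}, & i = j, \\ -A_{ij}, & i \neq j, \end{cases}$$
so that the matrix can be expressed as $\mathbf{L} = \mathrm{diag}(\mathbf{A}\mathbf{1}) - \mathbf{A}$. We note that for $\mathbf{x} \in \mathbb{R}^n$
$$\mathbf{x}^\top \mathbf{L}\, \mathbf{x} = \frac{1}{2} \sum_{i,j} A_{ij} (x_i - x_j)^2.$$
The weighted Laplacian is thus positive semidefinite when $\mathbf{A} \geq 0$. The null space of the weighted Laplacian relates to the number of connected components through the following lemma.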
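The identity between the quadratic form and the weighted sum of squared differences can be checked numerically. The following is a minimal Python sketch, assuming the weighted Laplacian has the form $\mathbf{L} = \mathrm{diag}(\mathbf{A}\mathbf{1}) - \mathbf{A}$; the 4-node path graph, the test vector and all function names are hypothetical and chosen only for illustration.

```python
def weighted_laplacian(A):
    """Return L = diag(A 1) - A for a square adjacency matrix given as nested lists."""
    n = len(A)
    L = [[-A[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        # Diagonal entry: sum of the weights to all other nodes.
        L[i][i] = sum(A[i]) - A[i][i]
    return L


def quadratic_form(M, x):
    """Return x^T M x."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))


def edge_sum(A, x):
    """Return (1/2) * sum_{i,j} A_ij (x_i - x_j)^2."""
    n = len(x)
    return 0.5 * sum(A[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))


# Hypothetical 4-node path graph with nonnegative weights.
A = [
    [0.0, 2.0, 0.0, 0.0],
    [2.0, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
x = [1.0, -1.0, 3.0, 0.5]
L = weighted_laplacian(A)
print(quadratic_form(L, x), edge_sum(A, x))
```

Both printed values agree, and each row of `L` sums to zero, which is the property $\mathbf{L}\mathbf{1} = \mathbf{0}$ used later in the section.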
Lemma 1 (Connected components).
The number of connected components of a graph with adjacency matrix $\mathbf{A} \geq 0$ equals the dimension of the null space of the weighted Laplacian $\mathbf{L}$.
The proof of Lemma 1 is similar to the proof for the (unweighted) Laplacian in [19] and is therefore not repeated here. Lemma 1 gives that a graph is connected if and only if the second smallest eigenvalue of $\mathbf{L}$ (also known as the Fiedler value) is nonzero. By noting that $\mathbf{L}\mathbf{1} = \mathbf{0}$, we find the following proposition.
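Lemma 1 can be illustrated on a small example by counting components in two independent ways: by graph search, and by computing the dimension of the null space of the weighted Laplacian. The sketch below is one possible way to do this in Python, assuming $\mathbf{L} = \mathrm{diag}(\mathbf{A}\mathbf{1}) - \mathbf{A}$ and using exact rational arithmetic so that the rank is not affected by rounding. The example graph and the helper names are hypothetical.

```python
from collections import deque
from fractions import Fraction


def weighted_laplacian(A):
    """Return L = diag(A 1) - A (assumed form of the weighted Laplacian)."""
    n = len(A)
    return [[(sum(A[i]) - A[i][i]) if i == j else -A[i][j] for j in range(n)]
            for i in range(n)]


def count_components(A):
    """Count connected components by breadth-first search over nonzero weights."""
    n = len(A)
    seen = [False] * n
    components = 0
    for start in range(n):
        if seen[start]:
            continue
        components += 1
        seen[start] = True
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in range(n):
                if A[i][j] != 0 and not seen[j]:
                    seen[j] = True
                    queue.append(j)
    return components


def nullity(M):
    """Dimension of the null space, via exact Gaussian elimination on rationals."""
    rows = [[Fraction(v) for v in row] for row in M]
    n_cols = len(rows[0])
    rank = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return n_cols - rank


# Hypothetical graph: a weighted path on nodes {0, 1, 2} and a separate edge {3, 4}.
A_split = [
    [0, 1, 0, 0, 0],
    [1, 0, 2, 0, 0],
    [0, 2, 0, 0, 0],
    [0, 0, 0, 0, 3],
    [0, 0, 0, 3, 0],
]
# Same graph with one extra edge between nodes 2 and 3, which joins the two parts.
A_joined = [row[:] for row in A_split]
A_joined[2][3] = A_joined[3][2] = 1

print(count_components(A_split), nullity(weighted_laplacian(A_split)))
print(count_components(A_joined), nullity(weighted_laplacian(A_joined)))
```

In both cases the two counts coincide, as the lemma states.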
Proposition 1 (Graph connectedness constraint).
A graph with adjacency matrix $\mathbf{A} \geq 0$ is connected if and only if
$$\mathbf{L} + \frac{1}{n}\mathbf{1}\mathbf{1}^\top \succ 0. \tag{2}$$
Proof.
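Let the graph be $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathbf{A})$. Assume first that there exists a vector $\mathbf{x} \neq \mathbf{0}$ such that
$$\mathbf{x}^\top \left( \mathbf{L} + \frac{1}{n}\mathbf{1}\mathbf{1}^\top \right) \mathbf{x} = \frac{1}{2} \sum_{i,j} A_{ij} (x_i - x_j)^2 + \frac{1}{n} \left( \mathbf{1}^\top \mathbf{x} \right)^2$$
is zero. This means that $\mathbf{1}^\top \mathbf{x} = 0$ and $x_i = x_j$ for all $i$ and $j$ such that $A_{ij} > 0$. Hence, $x_j = x_i$ for all nodes $j$ connected to the node $i$, and the vector $\mathbf{x}$ is thus piecewise constant over the graph. However, the vector cannot be completely constant since $\mathbf{1}^\top \mathbf{x} = 0$ and $\mathbf{x} \neq \mathbf{0}$ by assumption. The graph must therefore consist of at least two components.
Next, assume that the graph is disconnected. Then there exist two sets $\mathcal{V}_1$ and $\mathcal{V}_2$ such that $\mathcal{V}_1 \cup \mathcal{V}_2 = \mathcal{V}$, $\mathcal{V}_1 \cap \mathcal{V}_2 = \emptyset$ and $A_{ij} = 0$ for all $i \in \mathcal{V}_1$, $j \in \mathcal{V}_2$. Set the elements of $\mathbf{x}$ to be $x_i = 1/n_1$ for $i \in \mathcal{V}_1$ and $x_i = -1/n_2$ for $i \in \mathcal{V}_2$, where $n_1$ and $n_2$ denote the number of elements in $\mathcal{V}_1$ and $\mathcal{V}_2$ respectively. We find that
$$\mathbf{x}^\top \left( \mathbf{L} + \frac{1}{n}\mathbf{1}\mathbf{1}^\top \right) \mathbf{x} = \frac{1}{2} \sum_{i,j} A_{ij} (x_i - x_j)^2 + \frac{1}{n} \left( \mathbf{1}^\top \mathbf{x} \right)^2 = 0.$$
Hence $\mathbf{L} + \frac{1}{n}\mathbf{1}\mathbf{1}^\top \nsucc 0$. ∎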
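As a small worked instance of the second half of the argument, consider a hypothetical graph with three nodes in which nodes 1 and 2 share an edge of weight $a > 0$ and node 3 is isolated. The test vector below (entries $1/n_1$ on one node set and $-1/n_2$ on the other) and the $1/n$ scaling of the rank-one term are assumptions made for illustration; any nonzero vector that is constant on each set and sums to zero behaves the same way.

```latex
\[
\mathbf{A} = \begin{pmatrix} 0 & a & 0 \\ a & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},
\qquad
\mathbf{L} = \begin{pmatrix} a & -a & 0 \\ -a & a & 0 \\ 0 & 0 & 0 \end{pmatrix},
\qquad
\mathbf{x} = \begin{pmatrix} 1/2 \\ 1/2 \\ -1 \end{pmatrix}.
\]
\[
\mathbf{x}^\top \mathbf{L}\,\mathbf{x} = a\,(x_1 - x_2)^2 = 0,
\qquad
\mathbf{1}^\top \mathbf{x} = \tfrac{1}{2} + \tfrac{1}{2} - 1 = 0,
\]
\[
\mathbf{x}^\top \Bigl( \mathbf{L} + \tfrac{1}{3}\mathbf{1}\mathbf{1}^\top \Bigr) \mathbf{x}
= 0 + \tfrac{1}{3} \cdot 0^2 = 0 .
\]
```

Since $\mathbf{x} \neq \mathbf{0}$, the matrix $\mathbf{L} + \tfrac{1}{3}\mathbf{1}\mathbf{1}^\top$ is singular in this example, which is consistent with the graph being disconnected.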
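Proposition 1 immediately implies that the solution to the graph learning problem (1) is connected when the constraint
$$\mathbf{L} + \frac{1}{n}\mathbf{1}\mathbf{1}^\top \succeq \epsilon \mathbf{I} \tag{3}$$
is imposed for some $\epsilon > 0$ and $\mathbf{A} \geq 0$.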
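A learned adjacency matrix can be tested against a constraint of this kind without an eigenvalue solver, because a Cholesky factorization succeeds exactly when a symmetric matrix is positive definite. The sketch below assumes the constraint has the form $\mathbf{L} + \frac{1}{n}\mathbf{1}\mathbf{1}^\top \succeq \epsilon\mathbf{I}$ with $\mathbf{L} = \mathrm{diag}(\mathbf{A}\mathbf{1}) - \mathbf{A}$, and it checks the strict version of the inequality up to a numerical tolerance. The function names, the tolerance and the two example graphs are hypothetical.

```python
import math


def connectedness_matrix(A):
    """Return M = L + (1/n) 1 1^T, where L = diag(A 1) - A (assumed form)."""
    n = len(A)
    M = [[-A[i][j] + 1.0 / n for j in range(n)] for i in range(n)]
    for i in range(n):
        M[i][i] = sum(A[i]) - A[i][i] + 1.0 / n
    return M


def is_positive_definite(M, tol=1e-12):
    """Attempt a Cholesky factorization; every pivot must exceed the tolerance."""
    n = len(M)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(C[i][k] * C[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False
                C[i][i] = math.sqrt(s)
            else:
                C[i][j] = s / C[j][j]
    return True


def satisfies_strictly(A, eps):
    """Check L + (1/n) 1 1^T - eps * I > 0, a strict proxy for the constraint."""
    M = connectedness_matrix(A)
    for i in range(len(M)):
        M[i][i] -= eps
    return is_positive_definite(M)


# Hypothetical examples: a 4-node path (connected) and two separate edges (disconnected).
path = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
split = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
print(satisfies_strictly(path, 0.0), satisfies_strictly(split, 0.0))
```

For the unit-weight path, the smallest eigenvalue of the tested matrix is $2 - \sqrt{2} \approx 0.586$, so the check passes for `eps = 0.5` and fails for `eps = 0.6`. This illustrates that $\epsilon$ acts as a lower bound on how strongly the graph is connected.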
For the graphical Lasso problem [13], the components of the adjacency matrix are not necessarily nonnegative. However, the sign pattern of the solution is the same as that of in the sense that for all [20, 21]. One can therefore replace the adjacency matrix in (3) by an adjacency matrix with components . The constraint can therefore also be used in problems with negative adjacency matrices provided that the sign pattern of is known.
We next consider the application of the constraint to signal processing on graphs and the distributed consensus problem.
III Applications
III-A The consensus problem
In the distributed consensus problem [4, 16, 15], we iteratively compute the mean of a sequence as
where and denotes the value at the ’th node at the ’th iteration. The iterations converge to the mean as when and [4]. Smaller leads to faster (worst case) convergence.
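The consensus iteration can be simulated in a few lines. The sketch below assumes the update has the linear form $\mathbf{x}(t+1) = \mathbf{A}\mathbf{x}(t)$ with a symmetric weight matrix whose rows sum to one; the 3-node weight matrix, the initial values and the function names are hypothetical.

```python
def consensus_step(A, x):
    """One iteration x(t+1) = A x(t) (assumed form of the update)."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def run_consensus(A, x0, iterations):
    """Apply the update a fixed number of times and return the final node values."""
    x = list(x0)
    for _ in range(iterations):
        x = consensus_step(A, x)
    return x


# Hypothetical symmetric weights on a 3-node path graph; every row sums to one.
A = [
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.5, 0.5],
]
x0 = [3.0, 0.0, 6.0]
mean = sum(x0) / len(x0)
x = run_consensus(A, x0, 200)
print(x, mean)
```

For this weight matrix the eigenvalues are $1$, $0.5$ and $-0.5$, so the deviation from the mean shrinks by at least a factor of two per iteration and every node value approaches the mean of the initial values.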
In some scenarios, for example when the number of communication links is limited, it is desirable to obtain a sparse graph [16, 15]. This can be done by setting [16]
The optimization problem (1) then becomes [16]:
(4) 
The optimization problem (4) does not ensure that the graph is connected. In fact, we now prove that the solution to (4) is guaranteed to be disconnected for certain values of .
Proposition 2 (Consensus graph splitting with sparsity penalty).
In the consensus problem (4), the graph with nodes and adjacency matrix splits up into at least connected components when
Proof.
We prove the proposition by induction over . The graph consists of at least one connected component, so the proposition holds for . Assume that the proposition holds for . Since , the eigenvalues of are nonnegative. The induction hypothesis gives that . We find that
So . Since , we find that if . Hence, for , and the graph consists of at least connected components. ∎
Proposition 2 generalizes the result from [16] which states that the graph becomes completely disconnected (consists of independent components) when .
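Next we examine the effect of imposing the connectedness constraint (3) on the consensus problem. Since the rows of $\mathbf{A}$ sum to one in the consensus problem, the weighted Laplacian is $\mathbf{L} = \mathbf{I} - \mathbf{A}$, and the constraint can be rewritten as
$$\mathbf{A} - \frac{1}{n}\mathbf{1}\mathbf{1}^\top \preceq (1 - \epsilon)\mathbf{I}.$$
This gives us that
$$\lambda_1\!\left( \mathbf{A} - \frac{1}{n}\mathbf{1}\mathbf{1}^\top \right) \leq 1 - \epsilon < 1.$$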
This shows that the graph is connected when . Convergence of the consensus rule () thus implies connectedness, whereas the reverse is not true. For example, when and , the graph is connected but the consensus rule is not guaranteed to converge.
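The gap between connectedness and convergence can be seen on the smallest possible example. The sketch below uses a hypothetical two-node graph in which each node simply adopts the value of its neighbour; it assumes the update $\mathbf{x}(t+1) = \mathbf{A}\mathbf{x}(t)$ and the matrix $\mathbf{L} + \frac{1}{n}\mathbf{1}\mathbf{1}^\top$ with $\mathbf{L} = \mathbf{I} - \mathbf{A}$ as the connectedness test. The closed-form eigenvalue helper is only valid for symmetric 2x2 matrices.

```python
import math


def sym2_eigenvalues(M):
    """Eigenvalues (largest first) of a symmetric 2x2 matrix in closed form."""
    a, b, d = M[0][0], M[0][1], M[1][1]
    center = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b ** 2)
    return center + radius, center - radius


n = 2
A = [[0.0, 1.0], [1.0, 0.0]]  # two nodes joined by one edge; rows sum to one

# Eigenvalues of A - (1/n) 1 1^T, which govern convergence of the iteration.
deviation = [[A[i][j] - 1.0 / n for j in range(n)] for i in range(n)]
lam_max, lam_min = sym2_eigenvalues(deviation)

# Eigenvalues of L + (1/n) 1 1^T with L = I - A, which govern connectedness.
M = [[(1.0 if i == j else 0.0) - A[i][j] + 1.0 / n for j in range(n)] for i in range(n)]
m_max, m_min = sym2_eigenvalues(M)

# Run a few iterations of x(t+1) = A x(t).
x = [0.0, 2.0]
trajectory = [x]
for _ in range(4):
    x = [sum(a * v for a, v in zip(row, x)) for row in A]
    trajectory.append(x)

print(lam_max, lam_min, m_min)
print(trajectory)
```

Here the connectedness matrix is positive definite and the largest eigenvalue of the deviation matrix is below one, yet its smallest eigenvalue equals $-1$, so the node values swap forever and never settle at the mean.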
III-B Signal processing on graphs
Signal processing on graphs (SPG) [12, 8, 11, 7] is an emerging field which deals with processing and analysis of data defined over graphs. Concepts such as sampling, filters and transforms have been generalized to graph signals [12, 11, 10, 22, 9, 23]. In many problems the graph needs to be learned from data [17, 24, 14]. The aim is then to find a sparse adjacency matrix that describes similarities in a given data set. For undirected graphs, a commonly used measure of graph signal smoothness is the Laplacian quadratic form
where . A graph signal with a small varies smoothly across the edges of the graph [12, 25, 9, 17]. Given time series in a matrix , one can find graphs describing the smoothness in the data by setting
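Self-edges can be removed by setting $A_{ii} = 0$ for all $i$ and the rows can be normalized by setting $\mathbf{A}\mathbf{1} = \mathbf{1}$ to prevent the trivial solution $\mathbf{A} = \mathbf{0}$, giving that $\mathbf{L} = \mathbf{I} - \mathbf{A}$. The connected graph learning problem thus becomes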
(5) 
where we used that .
IV Numerical simulations
We here show numerically that the constraint (3) ensures connectedness. We consider the problem of learning the weighted graph Laplacian from data using synthetic data and a data set of temperatures at Swedish cities [26]. In the experiments we solved the optimization problem (5) using the CVX toolbox [27] with .
IV-A Experiments using synthetic data
In the first experiment we synthetically generated the signals as
where , , and denotes the uniform distribution on an interval . In the experiment we set . The data is shown in Figure 1 (a). We construct a graph describing the smoothness and periodicity of the data [12, 24, 25] with and without the connectedness constraint. In Figure 1 (b) and (c) we see that without the connectedness constraint the graph shows local smoothness of the signals and is disconnected while the graph with the constraint shows local smoothness as well as some periodicity. In Figure 1, the edges with weight less than were truncated. To examine if the graph is sensitive to the truncation we show the histogram of the adjacency matrices in Figure 2. We find that without the constraint, only a few edges have large weights while the remaining are small. With the constraint, the weights assume a wider range of values.
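Truncating small weights for display can itself change the connectivity of a graph, which is why the sensitivity to truncation is worth checking. The sketch below shows one way to test this in Python; the threshold value, the example weights and the function names are hypothetical and are not the settings used in the experiments.

```python
from collections import deque


def truncate(A, threshold):
    """Set every weight below the threshold to zero."""
    return [[w if w >= threshold else 0.0 for w in row] for row in A]


def is_connected(A):
    """Breadth-first search from node 0 over edges with positive weight."""
    n = len(A)
    seen = {0}
    queue = deque([0])
    while queue:
        i = queue.popleft()
        for j in range(n):
            if A[i][j] > 0 and j not in seen:
                seen.add(j)
                queue.append(j)
    return len(seen) == n


# Hypothetical learned graph: two strongly linked pairs joined by one weak edge.
A = [
    [0.0, 0.8, 0.001, 0.0],
    [0.8, 0.0, 0.0, 0.0],
    [0.001, 0.0, 0.0, 0.9],
    [0.0, 0.0, 0.9, 0.0],
]
threshold = 0.01  # illustrative value only
print(is_connected(A), is_connected(truncate(A, threshold)))
```

In this example the graph is connected before truncation and disconnected afterwards, because the only edge between the two pairs falls below the threshold.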
IV-B Experiments using real data
In this experiment, we use time series of daily temperature data from 45 Swedish cities from October to December of 2014 [26]. Our task is to find a graph that shows which cities have similar temperatures. Note that in the experiment, the algorithm only has access to the temperatures and not the locations of the cities. We truncated edges with weights less than . In Figure 3 we see that the graph learned without the connectedness constraint does identify some neighboring cities but is disconnected, while the graph with the constraint in Figure 4 does identify neighboring cities from the temperature data. The graph also shows that cities on the same latitude have related temperatures. We show the histogram of edge weights in Figure 5. We find that the edge weights have a wider range of values when the constraint is enforced.
V Conclusion
In many problems, it is useful to represent relations between data using graphs. Often, the graph is not given a priori but needs to be learned from data. Since graphs are often preferred to be sparse, it is a challenge to preserve connectedness of the graph while simultaneously enforcing sparsity. In this paper, we showed that connectedness is an analytical property and that it can be imposed on a graph as a convex constraint, making it possible to guarantee connectedness when learning sparse graphs. For the consensus problem, we showed that the graph is guaranteed to be disconnected for certain values of the regularization parameter when no constraint is imposed. We illustrated the effect of the constraint when learning a graph for synthetic data and for temperature data.
Acknowledgement
This work was partially supported by the Swedish Research Council under contract 2015-05484. We also want to thank the anonymous reviewers for providing useful comments.
References
 [1] P. K. Varshney, “Distributed Bayesian detection: Parallel fusion network,” in Distributed Detection and Data Fusion, pp. 36–118. Springer, 1997.
 [2] A. H. Sayed, “Diffusion adaptation over networks,” Academic Press Library in Signal Processing, vol. 3, pp. 323–454, 2013.
 [3] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends® in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
 [4] S. Boyd, P. Diaconis, and L. Xiao, “Fastest mixing Markov chain on a graph,” SIAM Review, vol. 46, no. 4, pp. 667–689, 2004.
 [5] L. Page, S. Brin, R. Motwani, and T. Winograd, “The PageRank citation ranking: Bringing order to the web,” Tech. Rep., 1999.
 [6] A. N. Langville and C. D. Meyer, Google’s PageRank and beyond: The science of search engine rankings, Princeton University Press, 2011.
 [7] A. Sandryhaila and J. M. F. Moura, “Big data analysis with signal processing on graphs: Representation and processing of massive data sets with irregular structure,” IEEE Signal Processing Magazine, vol. 31, no. 5, pp. 80–90, Sept 2014.
 [8] A. Sandryhaila and J. M. F. Moura, “Discrete signal processing on graphs,” IEEE Transactions on Signal Processing, vol. 61, no. 7, pp. 1644–1656, 2013.
 [9] S. Chen, A. Sandryhaila, J. M. F. Moura, and J. Kovačević, “Signal recovery on graphs: Variation minimization,” IEEE Transactions on Signal Processing, vol. 63, no. 17, pp. 4609–4624, Sept 2015.
 [10] A. Venkitaraman, S. Chatterjee, and P. Händel, “On Hilbert transform of signals on graphs,” in 2015 International Conference on Sampling Theory and Applications (SampTA), May 2015.
 [11] A. Sandryhaila and J. M. F. Moura, “Discrete signal processing on graphs: Frequency analysis,” IEEE Transactions on Signal Processing, vol. 62, no. 12, pp. 3042–3054, 2014.
 [12] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, “The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,” IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 83–98, 2013.
 [13] J. Friedman, T. Hastie, and R. Tibshirani, “Sparse inverse covariance estimation with the graphical LASSO,” Biostatistics, vol. 9, no. 3, pp. 432–441, 2008.
 [14] J. Mei and J. M. F. Moura, “Signal processing on graphs: Estimating the structure of a graph,” in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015, pp. 5495–5499.
 [15] F. Lin, M. Fardad, and M. R. Jovanovic, “Identification of sparse communication graphs in consensus networks,” in Communication, Control, and Computing (Allerton), 2012 50th Annual Allerton Conference on. IEEE, 2012, pp. 85–89.
 [16] G. Gnecco, R. Morisi, and A. Bemporad, “Sparse solutions to the average consensus problem via l1-norm regularization of the fastest mixing Markov-chain problem,” in Decision and Control (CDC), 2014 IEEE 53rd Annual Conference on, Dec 2014, pp. 2228–2233.
 [17] X. Dong, D. Thanou, P. Frossard, and P. Vandergheynst, “Learning graphs from signal observations under smoothness prior,” CoRR, vol. abs/1406.7842, 2014.
 [18] C. Hegde, P. Indyk, and L. Schmidt, “A nearly-linear time framework for graph-structured sparsity,” in Proceedings of the 32nd International Conference on Machine Learning (ICML-15), 2015, pp. 928–937.
 [19] F. K. Chung, Spectral Graph Theory, CBMS Regional Conference Series in Mathematics, vol. 92. AMS, 1997.
 [20] J. Friedman, T. Hastie, and R. Tibshirani, “Applications of the LASSO and grouped LASSO to the estimation of sparse graphical models,” Tech. Rep., 2010.
 [21] S. Sojoudi, “Equivalence of graphical LASSO and thresholding for sparse graphs,” Journal of Machine Learning Research, vol. 17, no. 115, pp. 1–21, 2016.
 [22] A. Jung, P. Berger, G. Hannak, and G. Matz, “Scalable graph signal recovery for big data over networks,” in 2016 IEEE 17th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), July 2016, pp. 1–6.
 [23] D. I. Shuman, B. Ricaud, and P. Vandergheynst, “Vertex-frequency analysis on graphs,” Applied and Computational Harmonic Analysis, vol. 40, no. 2, pp. 260–291, 2016.
 [24] J. Hou, L. P. Chau, Y. He, and H. Zeng, “Robust Laplacian matrix learning for smooth graph signals,” in 2016 IEEE International Conference on Image Processing (ICIP), Sept 2016, pp. 1878–1882.
 [25] S. Chen, R. Varma, A. Singh, and J. Kovačević, “Representations of piecewise smooth signals on graphs,” in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016, pp. 6370–6374.
 [26] Swedish Meteorological and Hydrological Institute (SMHI), “Swedish temperature data: October to December 2014,” http://opendatacatalog.smhi.se/explore/, Accessed: 2016-08-15.
 [27] M. Grant and S. Boyd, “CVX: Matlab software for disciplined convex programming, version 2.1,” http://cvxr.com/cvx, Mar. 2014.