Learning Sparse Graph Laplacian with k Eigenvector Prior via Iterative GLASSO and Projection

# Learning Sparse Graph Laplacian with k Eigenvector Prior via Iterative GLASSO and Projection

## Abstract

Learning a suitable graph is an important precursor to many graph signal processing (GSP) pipelines, such as graph spectral signal compression and denoising. Previous graph learning algorithms either i) make some assumptions on connectivity (e.g., graph sparsity), or ii) make simple graph edge assumptions such as positive edges only. In this paper, given an empirical covariance matrix computed from data as input, we consider a structural assumption on the graph Laplacian matrix : the first eigenvectors of are pre-selected, e.g., based on domain-specific criteria, and the remaining eigenvectors are then learned from data. One example use case is image coding, where the first eigenvector is pre-chosen to be constant, regardless of available observed data. We first prove that the subspace of symmetric positive semi-definite (PSD) matrices with the first eigenvectors being in a defined Hilbert space is a convex cone. We then construct an operator to project a given positive definite (PD) matrix to , inspired by the Gram-Schmidt procedure. Finally, we design an efficient hybrid graphical lasso / projection algorithm to compute the most suitable graph Laplacian matrix given . Experimental results show that given the first eigenvectors as a prior, our algorithm outperforms competing graph learning schemes using a variety of graph comparison metrics.

\name

Author(s) Name(s)1 \addressAuthor Affiliation(s) \nameSaghar Bagheri   Gene Cheung   Antonio Ortega   Fen Wang 2 \address York University, Toronto, Canada            Xidian University, Xi’an, China
University of Southern California, Los Angeles, CA, USA \ninept {keywords} Graph learning, Graph signal processing

## 1 Introduction

Graph signal processing (GSP) [19] is the study of signals residing on graphs. While GSP tools have been demonstrated to be effective in a wide range of applications from image compression and denoising to matrix completion [5, 1, 7, 14, 20, 24], a fundamental first step in GSP is the selection of an appropriate graph (or graph Laplacian matrix ) that suitably describes pairwise (dis)similarities. Previous graph learning algorithms fall into two categories: i) statistical approaches like graphical lasso that, given an empirical covariance matrix , estimate a sparse inverse matrix with no graph-structured assumptions [11, 4]; and ii) edge-based approaches that assume edge properties such as edge signs and absence of self-loops[9, 6, 8]. Neither of these two categories of methods make assumptions about the eigen-structure of the graph Laplacian matrix .

In this paper, we introduce explicit eigen-structure assumptions on the graph Laplacian matrix —that the first eigenvectors of are pre-selected or computed based on domain-specific criteria—into a graph learning framework. Consider the example of image compression, where one can deduce from domain knowledge that the most common pixel block is the constant signal, and thus should be the first eigenvector regardless of (possibly limited) training data. Consider also political voting records, where the most common pattern is voting along party affiliations in a two-party political system, and thus the first eigenvector should be piecewise constant (i.e., nodes of one political party are assigned 1, while nodes of the other party are -1). There are also practical cases where the first eigenvectors can be pre-chosen for fast computation. For example, one can use fast graph Fourier transform (FGFT) [17] to construct a set sparse eigenvectors based on Givens rotation matrices, so that the first transform coefficients can be computed speedily.

We propose an optimization strategy where a graph Laplacian matrix is learned optimally from data, while restricting the first eigenvectors to be those chosen ahead of time. We first prove that the subspace of symmetric positive semi-definite (PSD) matrices with the first eigenvectors taken from a given set is a convex cone. We next construct an operator to project a given positive definite (PD) matrix to , inspired by the Gram-Schmidt procedure [13]. Finally, we design an efficient hybrid graphical lasso / projection algorithm to compute the most suitable graph Laplacian matrix given input empirical covariance matrix . Experimental results demonstrate that given the first eigenvectors as a prior, our algorithm outperformed competing graph learning methods in terms of different graph comparison metrics.

## 2 Preliminaries

### 2.1 Graph Definitions

Suppose we are given a graph with nodes and edges connecting nodes and with weight . Denote by the adjacency matrix, where . may include self-loops ’s. Assuming that the edges are undirected, is symmetric. Define next the diagonal degree matrix where . The combinatorial graph Laplacian matrix [19] is then defined as . To properly account for self-loops, the generalized graph Laplacian matrix is defined as . In this paper, we assume that the target graph Laplacian to be learned from data may have self-loops, and edge weights can be positive or negative. This means that our graph Laplacian can account for both pairwise correlation and anti-correlation.

### 2.2 Hilbert Space

Following terminologies in [23], we first define a vector space of real, symmetric matrices in . Note that is closed under addition and scalar multiplication. We next define the standard inner product for matrices in as:

 ⟨P,Q⟩ =Tr(Q⊤P)=∑i,jPijQij (1)

This definition of inner product induces the Frobenius norm:

 ⟨P,P⟩ =∥P∥2F=∑i,jP2ij (2)

We can now define a Hilbert space in as a space of real, symmetric matrices endowed with an inner product . Further, we are interested in the subspace of positive semi-definite (PSD) matrices, i.e., . One can easily show that subspace is a convex cone.

## 3 Projection Operator

We first overview our algorithm development. We prove that the subspace of symmetric PSD matrices that share a common ordered set of first eigenvectors is a convex cone. This means that, given an empirical covariance matrix estimated from signal observations , minimizing a convex objective computed using and a graph Laplacian matrix while restricting is a convex optimization problem. We then develop a projection operator to project a positive definite (PD) matrix to . We describe our optimization algorithm using the projection operator in Section 4.

### 3.1 Subspace of Matrices with Common First K Eigenvector

Denote by the subspace of PSD matrices that share the first eigenvectors , assumed to be orthonormal, i.e.,

 u⊤juk=δj−k,   ∀j,k∈IK (3)

where and is the discrete impulse function that evaluates to 1 if and 0 otherwise. We can define using the Rayleigh quotient and the min-max theorem [12] as

 H+u={L∈H+|uk=argminx|x⊥uj,∀j
###### Lemma 3.1.

is a convex cone.

###### Proof.

We prove by induction. For the base case when , let be two matrices that share the first eigenvector . Consider a positive linear combination , where . We examine the smallest eigenvalue of using the Rayleigh quotient again, i.e.,

 λmin(L) =minxx⊤Lxx⊤x=minxx⊤(c1L1+c2L2)xx⊤x \lx@stackrel(a)≥c1(minxx⊤L1xx⊤x)+c2(minyy⊤L2yy⊤y) \lx@stackrel(b)=c1λmin(L1)+c2λmin(L2)

The inequality in is true since right-hand side (RHS) has more degrees of freedom than left-hand side (LHS) to minimize the two non-negative terms ( and are PSD matrices in ). is true by definition of Rayleigh quotient, with being the minimizing argument for both terms by assumption. Thus, the argument that minimizes the Rayleigh quotient for is also , and therefore is the first eigenvector of . Since this analysis is true for all , is a convex cone for .

Consider next the inductive step, where we assume the case is true, and we prove the case. Let be two matrices that share the first eigenvectors . Consider a positive linear combination , where . By the inductive assumption, also shares the first eigenvectors . For the -th eigenvector, consider the -th eigenvalue computed as:

 =minx|x⊥{uk}Kk=1x⊤Lxx⊤x=minx|x⊥{uk}Kk=1x⊤(c1L1+c2L2)xx⊤x ≥c1(minx|x⊥{uk}Kk=1x⊤L1xx⊤x)+c2(miny|y⊥{uk}Kk=1y⊤L2yy⊤y) =c1λKo+1(L1)+c2λKo+1(L2)

Since the argument that minimizes the Rayleigh quotient for both and is , this is also the argument that minimizes the Rayleight quotient for . Thus the ()-th eigenvector for is also . Thus, we conclude that has as its first eigenvectors. Since both the base case and the inductive step are true, is a convex cone. ∎

### 3.2 Projection to Convex Cone H+u

To project a PD matrix to convex cone , we focus on the inverse and ensure of the known eigenvector set is the argument that maximizes the Rayleigh quotient of , then of is the argument that maximizes the Rayleigh quotient while being orthogonal to , and so on.

Specifically, we first compute . Define as the subspace spanned by rank-1 matrix with scalar . Denote by the orthogonal subspace in so that . We first write as its projection to plus its orthogonal component, i.e.,

 C=⟨C,U1⟩U1 + CΩ⊥ (5)

where is the orthogonal component of in subspace . We show that given is PD, inner product is non-negative.

.

###### Proof.

Since is PD by assumption, without loss of generality we can write . Then

 ⟨C,U1⟩ =Tr((BB⊤)⊤u1u⊤1)=Tr(BB⊤u1u⊤1) =Tr(B⊤u1u⊤1B)=⟨u⊤1B,u⊤1B⟩

The last term is a Frobenius norm . ∎

Define . Suppose can be expressed as the span of orthogonal and , where , and . Suppose also that the inner products of with the orthogonal components, , are within range , . This means that the respective Rayleigh quotients are no larger than :

 (6)

Then is surely the last eigenvector of corresponding to largest eigenvalue . If this is not the case, then we need to approximate ’s projection to as just to ensure is the last eigenvector:

 (7)

Jointly optimizing rank-1 matrices ’s and in (7) is difficult. We propose a greedy algorithm instead next.

### 3.3 Gram-Schmidt-inspired Algorithm

Our algorithm is inspired by the Gram-Schmidt procedure [13] that iteratively computes a set of orthonormal vectors given a set of linearly independent vectors. Similarly, our algorithm iteratively computes one orthogonal rank-1 matrix at each iteration , for . Iteration uses , which is known.

Consider iteration . From (5), we first define the residual signal as . Consider first the case where , and thus rank-1 matrix orthogonal to is known. We need only that the inner product of approximation and is no larger than . Thus, we set

 μ2={⟨E1,U2⟩,if  ⟨E1,U2⟩≤μ1μ1,o.w. (8)

Consider next the case where , and we must identify a new rank-1 matrix orthogonal to to reconstruct . In this case, to minimize error , we seek a “most aligned” with , i.e.,

 maxv2 ⟨E1,v2v⊤2⟩,  s.t.  {⟨v2v⊤2,U1⟩=0∥v2∥2=1 (9)

Essentially, (9) seeks a rank-1 approximation of matrix , while constraining to be orthogonal to . The objective is equivalent to , which is convex for a maximization problem.

To convexify the problem, we first approximate where is the last eigenvector3 of , and is the best rank-1 approximation of . We then relax the norm equality constraint and rewrite (9) into the following optimization:

 maxv2e⊤v2,  s.t.  {v⊤2u1=0∥v2∥22≤1 (10)

Optimization (10) is now convex and solvable in polynomial time, using algorithms such as proximal gradient (PG) [21].

Having computed in (10), we project onto and threshold its inner product to within as done in the case, i.e.,

 μ2={⟨E1,V2⟩,if ⟨E1,V2⟩≤μ1μ1,o.w. (11)

Notice that . We omit the proof of the more general case for brevity. The projection of on is thus . We then compute new residual signal :

 E2=E1−μ2V2 (12)

Given residual , we again compute a rank-1 matrix —orthogonal to and —that is most aligned with , and compute projection of to , and so on.

More generally, at each iteration , we threshold the inner product , and compute the next residual signal . On the other hand, at each iteration , we first approximate where is ’s last eigenvector. We then compute an optimal :

 maxvt+1 e⊤vt+1,  s.t.  ⎧⎪ ⎪⎨⎪ ⎪⎩v⊤t+1uk=0,i∈IKv⊤t+1vτ=0,  τ∈{K+1,…,t}∥vt+1∥22≤1 (13)

We then threshold the inner product , where . We compute the next residual signal .

We finally note that our constructed operator is indeed a projection, since it is provably idempotent [23], i.e., for any PD matrix .

## 4 Graph Laplacian Matrix Estimation

Given an empirical covariance matrix computed from data, we formulate the following optimization problem to estimate a graph Laplacian matrix starting from the GLASSO formulation in [18]:

 minL∈H+u  %Tr(L¯C)−logdetL+ρ∥L∥1 (14)

where is a shrinkage parameter for the norm. The only difference from GLASSO is that (14) has an additional constraint . Because is a convex set, (14) is a convex problem.

We solve (14) iteratively using our developed projection operator in Section 3 and a variant of the block Coordinate descent (BCD) algorithm in [26]. Specifically, we first solve the dual of GLASSO as follows. We first note that the norm in (14) can be written as

 ∥L∥1=max∥U∥∞≤1  Tr(LU) (15)

where is the maximum absolute value element of the symmetric matrix . Then the dual problem of GLASSO that solves for an estimated covriance matrix is

 minC−1∈H+  −logdetC,   s.t.  ∥C−¯C∥∞≤ρ (16)

where implies that the primal and dual variables are related via [2]. To solve (16), we update one row-column pair in in (16) in each iteration. Specifically, we first reorder the rows / columns of so that the optimizing row / column are swapped to the last. We then partition and into blocks:

 (17)

[2] showed that the optimal is the solution to the following linearly constrained quadratic programming problem:

 c12=argminy{y⊤C−111y},  s.t.  ∥y−¯c12∥∞≤ρ (18)

Our algorithm to solve (14) is thus as follows. We minimize the GLASSO terms in (14) by solving its dual (16)—iteratively updating one row / column of using (18). We then project to convex cone using our projection operator. We repeat these two steps till convergence. Note that both steps are computed using covariance directly, and thus inversion to graph Laplacian is not necessary until convergence, when we output a solution.

## 5 Experimental Results

### 5.1 Experiments Setups for Synthetic Datasets

We conducted experiments using synthetic datasets to evaluate the performance of our method (Proj-Lasso) and of competing schemes: Graphical Lasso (GLASSO) [11], graph Laplacian learning with Gaussian probabilistic prior (GL-SigRep) [8] and diagonally dominant generalized graph Laplacian estimation under structural and Laplacian constraints (DDGL) [9]. The convergence tolerance for those algorithms was set to , and the regularization parameter was .

To simulate ground truth graphs, we first randomly located 20 nodes in 2D space and used the Erdos-Renyi model [10] to determine their connectivity with probability . We then computed edge weights using a Gaussian kernel, i.e., , where is the Euclidean distance between and and is 0.5. Edge weights smaller than were removed for sparsity. To introduce negative weights, we flipped the sign of each edge with probability 0.5 and then computed the generalized graph Laplacian . To generate data from , we first computed covariance for . We then generated data set from multivariate normal distribution . An empirical covariance matrix was then computed from and used as input to different graph learning algorithms.

We employed three popular graph similarity metrics to evaluate graph Laplacian matrices learned: relative error (RE), DeltaCon and -distance [16, 22, 3, 25]. Specifically, RE computes the relative Frobenius norm error between the ground truth Laplacian matrix and the learned matrix ; DeltaCon compares the similarities between all node pairs in the two graphs, and -distance metric computes the eigenvalues’ distance between two matrices.

### 5.2 Results assuming the First K Eigenvectors

Table 1 shows the graph learning performance of different methods evaluated using the aforementioned three metrics. For our proposed Proj-Lasso, we also simulated its performance using different ’s—the number of known first eigenvectors—which in our experiments were the first eigenvectors of the ground truth matrix . As shown in Table 1, using RE as metric, Proj-Lasso outperformed GLASSO, DDGL and GL-SigRep when and became even better as increased. Using -distance as metric, Proj-Lasso was always better than its competitors. Using DeltaCon as metric, all the methods are comparable. We observe that Proj-Lasso’s performance evaluated using RE and -distance improved significantly as increased, while the results of DeltaCon metric were not sensitive to .

We show visual comparisons of learned Laplacian matrices in Fig. 1, demonstrating that Proj-Lasso had the best performance, i.e., Proj-Lasso is visually closer to the ground truth matrix than others.

### 5.3 Results using Eigenvectors from Givens Rotation Method

[17] developed a method based on Givens rotation matrices to approximately diagonalize Laplacian , i.e.,

 L≈S1⋯SJ^ΛS⊤J⋯S⊤1 (19)

where are Givens rotation matrices that are both sparse and orthogonal, and is a near-diagonal matrix. can be interpreted as a fast Graph Fourier Transform (FGFT) that approximates the original eigen-matrix of . is sparse since each is sparse, and thus computation of transform coefficients can be efficient. is a parameter to trade off the complexity of the transform and the GFT approximation error.

In this experiment, we assumed that the first rows of are chosen as the first eigenvectors in our prior, which are the input to our algorithm Proj-Lasso. The algorithm then computed the remaining eigenvectors to compose Laplacian . We set . We compared the performance of Proj-Lasso against the scheme that uses all the rows of . Table 2 shows learning performances using different numbers of rows from ( is Proj-Lasso, and is Givens method). We observe that Proj-Lasso has smaller error for both metrics RE and -distance.

## 6 Conclusion

Given observable graph signals, we propose a new graph learning algorithm, where we assume that the first eigenvectors of the graph Laplacian matrix are pre-chosen based on domain knowledge, or pre-computed using an alternative criterion. We construct an operator in Hilbert space to project a positive definite (PD) matrix into the convex cone of positive semi-definite (PSD) matrices that share the first eigenvectors. We design an efficient algorithm combining block coordinate descent (BCD) in GLASSO and our projection operator, both optimizing the covariance matrix directly. Experimental results show that our algorithm outperformed competing graph learning schemes when the first eigenvectors are known.

### Footnotes

1. thanks: Thanks to XYZ agency for funding.
2. thanks: Gene Cheung acknowledges the support of the NSERC grants [RGPIN-2019-06271], [RGPAS-2019-00110].
3. Extreme eigenvectors of sparse symmetric matrices can be computed efficiently using LOBPCG [15].

### References

1. Y. Bai, G. Cheung, X. Liu and W. Gao (2019-03) Graph-based blind image deblurring from a single photograph. In IEEE Transactions on Image Processing, Vol. 28, no.3, pp. 1404–1418. Cited by: §1.
2. O. Banerjee and L. Ghaoui (2007-08) Model selection through sparse max likelihood estimation. Journal of Machine Learning Research 9, pp. . External Links: Document Cited by: §4.
3. H. Bunke, P.J. Dickinson, M. Kraetzl and W.D. Wallis (2006) A graph-theoretic approach to enterprise network dynamics. Progress in Computer Science and Applied Logic, Birkhäuser Boston. External Links: ISBN 9780817644857, LCCN 2006928981, Link Cited by: §5.1.
4. T. Cai, W. Liu and X. Luo (2011) A constrained minimization approach to sparse precision matrix estimation. In Journal of the American Statistical Association, Vol. 106, pp. 594–607. Cited by: §1.
5. G. Cheung, E. Magli, Y. Tanaka and M. Ng (2018-05) Graph spectral image processing. In Proceedings of the IEEE, Vol. 106, no.5, pp. 907–930. Cited by: §1.
6. G. Cheung, W.-T. Su, Y. Mao and C.-W. Lin (2018-12) Robust semisupervised graph classifier learning with negative edge weights. In IEEE Transactions on Signal and Information Processing over Networks, Vol. 4, no.4, pp. 712–726. Cited by: §1.
7. C. Dinesh, G. Cheung and I. V. BajiÄ (2020) Point cloud denoising via feature graph laplacian regularization. IEEE Transactions on Image Processing 29 (), pp. 4143–4158. External Links: Document Cited by: §1.
8. X. Dong, D. Thanou, P. Frossard and P. Vandergheynst (2016) Learning laplacian matrix in smooth graph signal representations. IEEE Transactions on Signal Processing 64 (23), pp. 6160–6173. External Links: Document Cited by: §1, Figure 1, §5.1.
9. H. Egilmez, E. Pavez and A. Ortega (2017-07) Graph learning from data under laplacian and structural constraints. In IEEE Journal of Selected Topics in Signal Processing, Vol. 11, no.6, pp. 825–841. Cited by: §1, Figure 1, §5.1.
10. P. Erdös and A. Rényi (1959) On random graphs i. Publicationes Mathematicae Debrecen 6, pp. 290. Cited by: §5.1.
11. J. Friedman, T. Hastie and R. Tibshirani (2008-08) Sparse inverse covariance estimation with the graphical lasso. Biostatistics (Oxford, England) 9, pp. 432–41. External Links: Document Cited by: §1, Figure 1, §5.1.
12. G. Golub and C. F. V. Loan (2012) Matrix computations (johns hopkins studies in the mathematical sciences). Johns Hopkins University Press. Cited by: §3.1.
13. W. Hoffmann (1989) Iterative algorithms for Gram-Schmidt orthogonalization. Computing 41 (4), pp. 335–348. Cited by: §1, §3.3.
14. W. Hu, G. Cheung, A. Ortega and O. Au (2015-01) Multi-resolution graph Fourier transform for compression of piecewise smooth images. In IEEE Transactions on Image Processing, Vol. 24, no.1, pp. 419–433. Cited by: §1.
15. A. V. Knyazev (2001) Toward the optimal preconditioned eigensolver: locally optimal block preconditioned conjugate gradient method. SIAM journal on scientific computing 23 (2), pp. 517–541. Cited by: footnote 1.
16. D. Koutra, J. T. Vogelstein and C. Faloutsos (2013) DELTACON: A principled massive-graph similarity function. CoRR abs/1304.4657. External Links: Link, 1304.4657 Cited by: §5.1.
17. L. Le Magoarou, N. Tremblay and R. Gribonval (2017) Analyzing the approximation error of the fast graph fourier transform. In 2017 51st Asilomar Conference on Signals, Systems, and Computers, Vol. , pp. 45–49. External Links: Document Cited by: §1, §5.3, Table 2.
18. R. Mazumder and T. Hastie (2012) The graphical lasso: new insights and alternatives. Electron. J. Statist. 6, pp. 2125–2149. External Links: Cited by: §4.
19. A. Ortega, P. Frossard, J. Kovacevic, J. M. F. Moura and P. Vandergheynst (2018-05) Graph signal processing: overview, challenges, and applications. In Proceedings of the IEEE, Vol. 106, no.5, pp. 808–828. Cited by: §1, §2.1.
20. J. Pang, G. Cheung, A. Ortega and O. C. Au (2015-04) Optimal graph Laplacian regularization for natural image denoising. In IEEE International Conference on Acoustics, Speech and Signal Processing, Brisbane, Australia. Cited by: §1.
21. N. Parikh and S. Boyd (2014) Proximal algorithms. Foundations and Trends in Optimization 1 (3), pp. 127–239. Cited by: §3.3.
22. M. Tantardini, F. Ieva, L. Tajoli and C. Piccardi (2019-12) Comparing methods for comparing networks. Scientific Reports 9, pp. . External Links: Document Cited by: §5.1.
23. M. Vetterli, J. Kovavcević and V.K. Goyal (2014) Foundations of signal processing. Cambridge University Press. External Links: ISBN 9781139839099, Link Cited by: §2.2, §3.3.
24. F. Wang, Y. Wang, G. Cheung and C. Yang (2020) Graph sampling for matrix completion using recurrent gershgorin disc shift. IEEE Transactions on Signal Processing 68 (), pp. 2814–2829. External Links: Document Cited by: §1.
25. R. Wilson and P. Zhu (2008-09) A study of graph spectra for comparing graphs and trees. Pattern Recognition 41, pp. 2833–2841. External Links: Document Cited by: §5.1.
26. S. J. Wright (2015) Coordinate descent algorithms. Math. Program. 151 (1), pp. 3–34. External Links: Cited by: §4.
You are adding the first comment!
How to quickly get a good reply:
• Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
• Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
• Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters

420011

How to quickly get a good answer:
• Keep your question short and to the point
• Check for grammar or spelling errors.
• Phrase it like a question
Test
Test description