Graph Filtering For Data Reduction and Reconstruction
A novel approach is put forth that utilizes data similarity, quantified on a graph, to improve upon the reconstruction performance of principal component analysis. The tasks of data dimensionality reduction and reconstruction are formulated as graph filtering operations that exploit data node connectivity in a graph via the adjacency matrix. The unknown reducing and reconstruction filters are determined by optimizing a mean-square error cost that entails the data as well as their graph adjacency matrix. Working in the graph spectral domain enables the derivation of simple gradient descent recursions to update the matrix filter taps. Numerical tests on real image datasets demonstrate the better reconstruction performance of the novel method over standard principal component analysis.
Ioannis D. Schizas
Department of Electrical Engineering
University of Texas at Arlington
Index Terms— Graph filtering, dimensionality reduction, reconstruction
Data dimensionality reduction and reconstruction have been extensively studied, with the workhorse approach being the principal component analysis (PCA) framework, which determines compression and reconstruction matrices that minimize the mean-square error (MSE). Standard PCA relies on data correlations within each data vector to find an MSE-optimal data representation in a reduced-dimensional space. Our goal here is to exploit similarity among different data vectors, manifested as edge weights on a graph, when performing dimensionality reduction, in order to improve the data reconstruction performance.
Graph signal processing is an emerging field where similarity among the available data is exploited, via shift operators, to improve performance in a variety of tasks including sampling, filtering, clustering and reconstruction [2, 12, 16]. The concept of sampling a graph signal on a subset of nodes and reconstructing it wherever it is not available has been extensively explored [2, 6, 11, 13, 17]. In these works, the notion of bandlimited signals is extended to the graph spectral domain, and techniques exploiting the Laplacian eigenspace are devised to reconstruct the signal values at every node of the graph from a subset of nodes.
Dimensionality reduction on graphs has been proposed by expanding the PCA or nonnegative matrix factorization formulations with a Laplacian regularization term that accounts for similarity among single-hop neighboring data entities in a graph [8, 5, 7, 15, 14]. In that line of work, dimensionality reduction is performed to improve data clustering performance. In contrast, our goal here is data dimensionality reduction and reconstruction that exploits data similarity, quantified here by the graph adjacency matrix.
The tasks of data dimensionality reduction and reconstruction are carried out via graph filtering, while the order of the matrix filters determines the neighborhood size utilized in forming the compressed and reconstructed data. The novel formulation seeks matrix filters that minimize the reconstruction MSE over the graph. A computationally efficient gradient descent approach is proposed to recursively determine the filters. For zero-order filters the novel framework boils down to standard PCA. Numerical tests using real image datasets demonstrate the superiority of the novel graph-based dimensionality reduction and reconstruction framework over standard PCA.
2 Problem Setting and Preliminaries
Consider a collection of data , where each data vector has scalar entries. Columns in could correspond to a collection of images, sensor measurements and so on [9, 5]. In many practical applications the data vectors lie on a low dimensional vector space , where .
One of the most effective ways to reduce the dimensionality of the data is principal component analysis (PCA). PCA, being the dimensionality reduction workhorse, extracts the principal components by projecting the data onto a low-dimensional subspace in which the data exhibit the largest variability. PCA determines a dimensionality-reducing matrix of size , with , and a reconstruction matrix , which are found by minimizing the reconstruction MSE
where corresponds to a centered version of the data, with for , and denotes the Frobenius norm. It turns out that , where contains in its columns the principal eigenvectors of the sample-average covariance matrix .
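The standard PCA solution just described can be sketched in a few lines of numpy. The names below (`X` for the data matrix with one vector per column, `p` for the reduced dimension) are illustrative assumptions, as is the identification of the reconstruction matrix with the transpose of the reducing matrix, which holds for the MSE-optimal PCA solution:

```python
import numpy as np

def pca_reduce_reconstruct(X, p):
    """PCA compression/reconstruction sketch.

    X : (D, N) data matrix, one data vector per column.
    p : reduced dimension (p < D).
    Returns the centered data, the p x D reducing matrix, and the
    reconstruction of the centered data.
    """
    # Center the data (subtract the sample mean of the columns).
    Xc = X - X.mean(axis=1, keepdims=True)
    # Sample-average covariance and its principal eigenvectors.
    S = (Xc @ Xc.T) / Xc.shape[1]
    eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues in ascending order
    U_p = eigvecs[:, -p:]                  # top-p principal eigenvectors
    C = U_p.T                              # p x D reducing matrix
    X_hat = U_p @ (C @ Xc)                 # reconstruction, with B = C^T
    return Xc, C, X_hat
```

When the centered data lie exactly in a p-dimensional subspace, this reconstruction is exact; otherwise it is the MSE-optimal rank-p approximation.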
PCA is designed to estimate the low dimensional subspace using , without taking into account similarity among different data vectors. However, the dataset may contain groups of data vectors that exhibit similarity in some sense, e.g., images depicting a similar object or having similar texture. Standard PCA does not take into account data similarity information that can potentially identify structurally similar data and lead to better reconstruction.
Data similarity measures, if available, can be encoded in a graph. Specifically, let the scalar quantify the similarity between data vectors and for . Then, an undirected graph with nodes in the set and edges in can summarize the similarity among the different data in . Note that since the graph is undirected, . The similarity quantities can be collected in the so-called adjacency matrix , an symmetric matrix whose eigenvalue decomposition can be written as , where is a diagonal matrix containing the eigenvalues, while is a unitary matrix containing the eigenvectors of .
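The eigendecomposition of the symmetric adjacency matrix, which underlies the spectral-domain analysis later on, can be computed as follows (matrix names are illustrative):

```python
import numpy as np

def adjacency_eig(W):
    """Eigendecomposition W = V Lam V^T of a symmetric adjacency matrix.

    W : (N, N) symmetric adjacency matrix of an undirected graph.
    Returns the diagonal eigenvalue matrix Lam and the orthonormal
    eigenvector matrix V.
    """
    assert np.allclose(W, W.T), "undirected graph => symmetric adjacency"
    eigvals, V = np.linalg.eigh(W)  # V is orthonormal for symmetric W
    return np.diag(eigvals), V
```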
PCA is redesigned in this work to exploit data similarities summarized in the adjacency matrix , via graph filtering, and improve reconstruction performance.
3 Data Reduction and Reconstruction via Graph Filtering
where denotes a graph shift operator that in this paper will be the adjacency matrix . Building upon (2) we define the following data reducing graph matrix filtering operation
where is obtained after stacking the columns of on top of each other, while refers to an identity matrix of size and denotes the Kronecker product.
Vector contains the reduced-dimensionality data vectors with entries for each node , while each is produced by compressing and linearly combining data vectors from neighboring nodes (up to hops away from node ) using the dimensionality-reducing matrices for . The motivation behind this reducing filtering step is that data vectors within a neighborhood of a few hops exhibit large similarity, and these data can be used jointly to better compress the contents of . Note that for , (3) boils down to , which pertains to standard PCA.
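The reducing filtering step can be sketched as follows. The specific operation below, summing compressed shifted copies of the data over powers of the adjacency matrix, is an assumed form consistent with the description above; tap matrices `C_list[l]` and the data layout are illustrative:

```python
import numpy as np

def graph_reduce(X, W, C_list):
    """Graph matrix filter for dimensionality reduction (assumed form).

    Assumed operation: Y = sum_l C_l @ X @ W^l, so each reduced vector
    y_n compresses and linearly combines data vectors from nodes up to
    L hops away from node n.

    X      : (D, N) data matrix, one vector per graph node.
    W      : (N, N) symmetric adjacency (graph shift) matrix.
    C_list : list of L+1 reducing matrices, each of size (p, D).
    Returns Y of shape (p, N).
    """
    N = W.shape[0]
    Wl = np.eye(N)                 # W^0
    Y = np.zeros((C_list[0].shape[0], N))
    for C in C_list:
        Y += C @ X @ Wl            # shift by l hops, then compress
        Wl = Wl @ W
    return Y
```

For a single tap (order zero) the shift is the identity and the operation collapses to an ordinary matrix compression, matching the PCA special case noted above.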
Similarly, graph filtering can be utilized as in (3) to reconstruct the data vectors using the reduced vectors , the adjacency matrix and reconstruction matrices in the following way
For simplicity it has been assumed that the order of the reducing and reconstruction filters is ; nonetheless, the proposed framework allows for different orders. Note that for the cost function in (3) boils down to , which corresponds to the standard PCA formulation that does not take into account data similarity information.
3.1 Graph Spectrum MSE Reformulation
The cost function in (3) is reformulated next to facilitate the determination of the matrix filter taps . Multiplying and in (3) by the unitary matrix has no effect on the cost, i.e., . Let denote the graph Fourier transform (GFT) of the data with respect to the adjacency matrix . In detail, the GFT at the th frequency (th eigenvalue of ) is given as , where corresponds to the th entry of . After the unitary transformation of the reconstruction MSE, and using the property that for , the minimization problem in (3) can be rewritten as
where , while and . Thus, (6) can be viewed as a spectral version of (3) and convolution has been transformed into a multiplication between the filters’ spectral response and the GFT of the data vectors. Note that can be viewed as the spectral response of the reconstruction matrix filter at eigenvalue , similarly corresponds to the spectral response of the reducing matrix filter at .
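The convolution-to-multiplication property underlying this spectral reformulation can be checked numerically. In the sketch below, the data layout, the tap matrices, and the filter form are the same illustrative assumptions used earlier; the GFT takes the data at each node and projects across nodes onto the eigenvectors of the adjacency matrix:

```python
import numpy as np

def gft(X, W):
    """Graph Fourier transform of vector-valued node data (sketch).

    With W = V Lam V^T, the GFT of the data at the i-th frequency
    (i-th eigenvalue of W) is x_tilde_i = sum_n V[n, i] * x_n,
    i.e. X_tilde = X @ V when data vectors sit in the columns of X.
    """
    eigvals, V = np.linalg.eigh(W)
    return X @ V, eigvals, V

def spectral_response(C_list, lam):
    """Spectral response sum_l lam^l * C_l of a matrix filter at eigenvalue lam."""
    return sum((lam ** l) * C for l, C in enumerate(C_list))
```

Filtering each GFT coefficient by the filter's spectral response at the corresponding eigenvalue and transforming back reproduces the vertex-domain filtering output exactly.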
The cost function in (6) can be rewritten as follows
where and .
The equalities in (8) can be utilized to show the following result (the proof has been omitted due to space considerations).
The reducing matrix filter taps in can be written as a linear combination of the transformed data vectors , i.e.,
where , while .
The result of Corollary 1 can be utilized to replace with in (6), thereby reducing the number of primary optimization variables. Note that contains entries that need to be found, whereas has entries to be determined. For applications where , Cor. 1 can be used to introduce computational savings when solving (6).
3.2 Gradient Descent Based Algorithm
where and denotes the th column of , i.e., and .
Similarly, the gradient can be calculated as
The computational complexity (number of additions and multiplications) of the gradient descent recursion in (12) is of the order of , while for (13) the complexity is of the order of . Complexity is proportional to the dimensionality of the data vectors and the order of the filters, and quadratic in .
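A minimal gradient descent sketch for the filter taps is given below. Since the paper's recursions (12)-(13) operate in the spectral domain with line-search step sizes, the code instead descends an assumed, equivalent vertex-domain form of the reconstruction MSE with a fixed step, purely for illustration; all names and the random initialization are assumptions:

```python
import numpy as np

def gd_filters(X, W, p, L, step=1e-3, iters=200):
    """Gradient-descent sketch for the reducing/reconstruction filter taps.

    Descends an assumed vertex-domain form of the reconstruction MSE,
        J = || X - sum_l B_l Y W^l ||_F^2,   Y = sum_k C_k X W^k,
    over taps C_k (p x D) and B_l (D x p), k, l = 0..L.  Returns the
    taps and the cost history.
    """
    D, N = X.shape
    rng = np.random.default_rng(0)
    C = [0.1 * rng.standard_normal((p, D)) for _ in range(L + 1)]
    B = [0.1 * rng.standard_normal((D, p)) for _ in range(L + 1)]
    Wp = [np.linalg.matrix_power(W, l) for l in range(L + 1)]  # powers of W
    hist = []
    for _ in range(iters):
        Y = sum(C[k] @ X @ Wp[k] for k in range(L + 1))        # reduce
        E = X - sum(B[l] @ Y @ Wp[l] for l in range(L + 1))    # residual
        hist.append(np.linalg.norm(E) ** 2)
        gY = -2 * sum(B[l].T @ E @ Wp[l] for l in range(L + 1))
        for l in range(L + 1):
            B[l] = B[l] + 2 * step * E @ Wp[l] @ Y.T           # -step * dJ/dB_l
            C[l] = C[l] - step * gY @ Wp[l] @ X.T              # -step * dJ/dC_l
    return C, B, hist
```

In practice, as the paper notes, a line-search step size and a stopping rule on the norm of successive tap differences would replace the fixed step and iteration count.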
Optimal step-size selection: We resort to line search, where the step-sizes in and are set to minimize the cost function in (6) after substituting and with the updating recursions in (12) and (13), minimizing with respect to the or parameters. We demonstrate the process for . After substituting in (7) with the right-hand side of (12), it turns out that the optimal choice for during iteration can be obtained as
where with . Further, the quantities and are
Then, it follows readily that the optimal step-size in (3.2) is equal to .
Initialization: and can be initialized using the solution of standard PCA, to which our framework boils down when . Let the standard PCA compression and reconstruction matrices be denoted as . Then, we can initialize as . From Corollary 1 it holds (when ) that , from which we can obtain . The gradient descent based approach is tabulated as Alg. 1. and are updated until the norm of the difference between successive iterates drops below a desired threshold .
Remark: Note that the original data consist of scalars, which can be prohibitively many. When applying the dimensionality-reducing matrix filter, each data vector is described by scalars corresponding to the entries of . Thus, a total of scalars are utilized to characterize the dimensionality-reduced data. Notice that to form the reconstructed data in (4), the entries of and , the different entries of the symmetric adjacency matrix , and the scalars in are needed. The cost of storing the entries of , and for the graph-based data reduction scheme is higher than the scalars required in standard PCA for and the . Nonetheless, the graph-based approach achieves better reconstruction accuracy, as detailed next. Here compression occurs as long as
Thus, for high-dimensional data (such as images) and a limited number of data vectors, the right-hand side of (18) can be approximated as . Hence, as long as there is meaningful data reduction.
4 Numerical Simulations
We test and compare the performance of the graph-based reduction and reconstruction approach against standard PCA (where ) on the MNIST database of handwritten digits and the Extended Yale-B (EYB) face image dataset [10, 9]. The MNIST dataset consists of grayscale images of handwritten digits. The EYB database contains frontal color images of size of individuals. From the MNIST dataset we randomly pick images of randomly selected digits, giving rise to a graph with nodes, each associated with a data vector of size . The process is repeated times to perform averaging when testing the performance. In a similar fashion, EYB is used to randomly pick roughly images of randomly chosen individuals, giving rise to a graph with nodes. Each facial image is rescaled to a size of and converted to grayscale, thus here has entries.
For the MNIST dataset the adjacency matrix is built such that its th entry is given as , whereas for the EYB dataset a Gaussian similarity kernel is employed, where and . A k-nearest-neighbor rule is applied where, for each node, connectivity with the most similar neighbors is preserved.
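A graph construction of this kind can be sketched as follows. The Gaussian kernel with k-nearest-neighbor sparsification matches the EYB description above; the bandwidth `sigma` and all other names are illustrative assumptions, since the paper's exact kernel parameters are not reproduced here:

```python
import numpy as np

def knn_gaussian_adjacency(X, k, sigma):
    """Gaussian-kernel adjacency with k-nearest-neighbor sparsification (sketch).

    X     : (D, N) data, one vector per node.
    k     : number of most-similar neighbors kept per node.
    sigma : Gaussian kernel bandwidth (illustrative assumption).
    """
    # Pairwise squared Euclidean distances between data vectors.
    D2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    W = np.exp(-D2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                    # no self-loops
    mask = np.zeros_like(W, dtype=bool)
    for n in range(W.shape[0]):
        nearest = np.argsort(W[n])[-k:]         # k largest similarities
        mask[n, nearest] = True
    mask = mask | mask.T                        # symmetrize (undirected graph)
    return np.where(mask, W, 0.0)
```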
Fig. 1 depicts the reconstruction MSE, for the MNIST-derived dataset, versus the reduced dimension for standard PCA (), as well as for graph matrix filters of different orders and . Clearly, the introduction of graph filtering leads to much lower reconstruction MSE, which improves as increases. However, beyond a certain filter order the MSE reduction becomes negligible. Similar conclusions can be drawn from Fig. 2, which depicts the reconstruction MSE for the EYB-derived dataset. The utilization of the similarity information in the adjacency matrix of the graph boosts the reconstruction performance over PCA ().
A novel graph-filtering-based data reduction and reconstruction scheme was proposed. The novel formulation incorporates graph filtering into the reconstruction MSE, taking into account similarities among data vectors. Working in the graph spectral domain enables the derivation of computationally efficient gradient descent techniques to determine the reducing and reconstruction matrix filters. Numerical tests on the EYB and MNIST image datasets demonstrate the improvement in reconstruction quality with respect to standard PCA.
-  A. Anis, A. Gadde, and A. Ortega, “Towards a Sampling Theorem for Signals on Arbitrary Graphs,” Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), pp. 3864–3868, May 2014.