Signal Representations on Graphs: Tools and Applications

# Signal Representations on Graphs: Tools and Applications

Siheng Chen, , Rohan Varma, , Aarti Singh, Jelena Kovačević,  S. Chen and R. Varma are with the Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, 15213 USA. Email: sihengc@andrew.cmu.edu, rohanv@andrew.cmu.edu. A. Singh is with the Department of Machine Learning, Carnegie Mellon University, Pittsburgh, PA. Email: aarti@cs.cmu.edu. J. Kovačević is with the Departments of Electrical and Computer Engineering and Biomedical Engineering, Carnegie Mellon University, Pittsburgh, PA. Email: jelenak@cmu.edu. The authors gratefully acknowledge support from the NSF through awards 1130616,1421919, the University Transportation Center grant (DTRT12-GUTC11) from the US Department of Transportation, and the CMU Carnegie Institute of Technology Infrastructure Award.
###### Abstract

We present a framework for representing and modeling data on graphs. Based on this framework, we study three typical classes of graph signals: smooth graph signals, piecewise-constant graph signals, and piecewise-smooth graph signals. For each class, we provide an explicit definition of the graph signals and construct a corresponding graph dictionary with desirable properties. We then study how such graph dictionary works in two standard tasks: approximation and sampling followed with recovery, both from theoretical as well as algorithmic perspectives. Finally, for each class, we present a case study of a real-world problem by using the proposed methodology.

{keywords}

Discrete signal processing on graphs, signal representations

## I Introduction

Signal processing on graphs is a framework that extends classical discrete signal processing to signals with an underlying complex and irregular structure. The framework models that underlying structure by a graph and signals by graph signals, generalizing concepts and tools from classical discrete signal processing to graph signal processing. Recent work includes graph-based transforms [1, 2, 3], sampling and interpolation on graphs [4, 5, 6], graph signal recovery [7, 8, 9, 10], uncertainty principle on graphs [11, 12], graph dictionary learning [13, 14], community detection [15, 16, 17] , and many others.

In this paper, we consider the signal representations on graphs. Signal representation is one of the most fundamental tasks in our discipline. For example, in classical signal processing, we use the Fourier basis to represent the sine waves; we use the wavelet basis to represent smooth signals with transition changes. Signal representation is highly related to approximation, compression, denoising, inpainting, detection, and localization [18, 19]. Previous works along those lines consider representations based on the graph Fourier domain, which emphasize the smoothness and global behavior of a graph signal [20], as well as the representations based on the graph vertex domain, which emphasize the connectivity and localization of a graph signal [21, 22].

We start by proposing a representation-based framework, which provides a recipe to model real-world data on graphs. Based on this framework, we study three typical classes of graph signals: smooth graph signals, piecewise-constant graph signals, and piecewise-smooth graph signals. For each class, we provide an explicit definition for the graph signals and construct a corresponding graph dictionary with desirable properties. We then study how the proposed graph dictionary works in two standard tasks: approximation and sampling followed with recovery, both from theoretical as well as algorithmic perspectives. Finally, for each class, we present a case study of a real-world problem by using the proposed methodology.

Contributions. The main contribution of this paper is to build a novel and unified framework to analyze graph signals. The framework provides a general solution allowing us to study real-world data on graphs. Based on the framework, we study three typical classes of graph signals:

• Smooth graph signals. We explicitly define the smoothness criterion and construct corresponding representation dictionaries. We propose a generalized uncertainty principle on graphs and study the localization phenomenon of graph Fourier bases. We then investigate how the proposed graph dictionary works in approximation and sampling followed with recovery. Finally, we demonstrate a case study on a co-authorship network.

• Piecewise-constant graph signals. We explicitly define piecewise-constant graph signals and construct the multiresolution local sets, a local-set-based piecewise-constant dictionary and local-set-based piecewise-constant wavelet basis, which provide a multiresolution analysis on graphs and promote sparsity for such graph signals. We then investigate how the proposed local-set-based piecewise-constant dictionary works for approximation and sampling followed with recovery. Finally, we demonstrate a case study on epidemics processes.

• Piecewise-smooth graph signals. We explicitly define piecewise-smooth graph signals and construct a local-set-based piecewise-smooth dictionary, which promotes sparsity for such graph signals. We then investigate how the proposed local-set-based piecewise-smooth dictionary works in approximation. Finally, we demonstrate a case study on environmental change detection.

Outline of the paper. Section II introduces and reviews the background on signal processing on graphs; Section III proposes a representation-based framework to study graph signals, which lays the foundation for this paper; Sections IVV, and VI present representations of smooth, piecewise constant, and piecewise smooth graph signals, respectively, including their effectiveness in approximation and sampling followed by recovery, as well as validation on three different real-world problems: co-authorship network, epidemics processes, and environmental change detection. Section VII concludes the paper.

## Ii Signal Processing on Graphs

Let be a directed, irregular and weighted graph, where is the set of nodes, is the set of weighted edges, and is the weighted adjacency matrix, whose element measures the underlying relation between the th and the th nodes. Let be a degree vector, where . Given a fixed ordering of nodes, we assign a signal coefficient to each node; a graph signal is then defined as a vector,

 x=[x1,x2,⋯,xN]T∈RN,

with the signal coefficient corresponding to the node .

To represent the graph structure by a matrix, two basic approaches have been considered. The first one is based on algebraic signal processing [23], and is the one we follow. We use the graph shift operator as a graph representation, which is an elementary filtering operation that replaces a signal coefficient at a node with a weighted linear combination of coefficients at its neighboring nodes. Some common choices of a graph shift are weighted adjacency matrix , normalized adjacency matrix , and transition matrix  [24]. The second one is based on the spectral graph theory [25], where the graph Laplacian matrix is used as a graph representation, which is a second-order difference operator on graphs. Some common choices of a graph Laplacian matrix are the unnormalized Laplacian , the normalized Laplacian , and the transition Laplacian .

A graph shift emphasizes the similarity while a graph Laplacian matrix emphasizes the difference between each pair of nodes. The graph shift and graph Laplacian matrix often appear in pairs; see Table I. Another overview is presented in [26]. We use to represent a graph structure matrix, which can be either a graph shift or a graph Laplacian matrix. Based on this graph structure matrix, we are able to generalize many tools from traditional signal processing to graphs, including filtering [27, 28], Fourier transform [29, 1], wavelet transforms [30, 2], and many others. We now briefly review the graph Fourier transform.

The graph Fourier basis generalizes the traditional Fourier basis and is used to represent the graph signal in the graph spectral domain. The graph Fourier basis is defined to be the eigenvector matrix of , that is,

 R=VΛV−1,

where the th column vector of is the graph Fourier basis vector corresponding to the eigenvalue as the th diagonal element in . The graph Fourier transform of is where is called the graph Fourier transform matrix. When is symmetric, then is orthornormal; the graph Fourier basis vector is the th row vector of . The inverse graph Fourier transform is The vector represents the frequency coefficients corresponding to the graph signal , and the graph Fourier basis vectors can be regarded as graph frequency components. In this paper, we use , to denote the inverse graph Fourier transform matrix and graph Fourier transform matrix for a general graph structure matrix, which can be an adjacency matrix, a graph Laplacian, or a transition matrix. When we emphasize that and is generated from a certain graph structure matrix, we add a subscript. For example, is the graph Fourier transform matrix of the weighted adjacency matrix .

The ordering of graph Fourier basis vectors depends on their variations. The variations of a graph signal are defined in different ways. When the graph representation matrix is the graph shift, the variation of a graph signal is defined as,

 SA(x)=∥∥∥x−1|λmax(A)|Ax∥∥∥22,

where is the eigenvalue of with the largest magnitude. We can show that when the eigenvalues of the graph shift are sorted in a nonincreasing order , the variations of the corresponding eigenvectors follow a nondecreasing order .

When the graph representation matrix is the graph Laplacian matrix, the variation of a graph signal is defined as,

 SL(x)=N∑i,j=1Ai,j(xi−xj)2=xTLx.

Similarly, we can show that when the eigenvalues of the graph Laplacian are sorted in a nondecreasing order , the variations of the corresponding eigenvectors follows a nondecreasing order .

The variations of graph Fourier basis vectors thus allow us to provide the ordering: the Fourier basis vectors with small variations are considered as low-frequency components while the vectors with large variations are considered as high-frequency components [31]. We will discuss the difference between these two variations in Section IV.

Based on the above discussion, the eigenvectors associated with large eigenvalues of the graph shift (small eigenvalues of the graph Laplacian) represent low-frequency components and the eigenvectors associated with small eigenvalues of the graph shift (large eigenvalues of graph Laplacian) represent high-frequency components. In the following discussion, we assume all graph Fourier bases are ordered from low frequencies to high frequencies.

## Iii Foundations

In this section, we introduce a representation-based framework with three components, including graph signal models, representation dictionaries and the related tasks, in a general and abstract level. This lays a foundation for the following sections, which are essentially special cases that follow this general framework.

As shown in Figure 1, when studying a task with a graph, we first model the given data with some graph signal model. The model describes data by capturing its important properties. Those properties can be obtained from observations, domain knowledge, or statistical learning algorithms. We then use a representation dictionary to represent the graph signal model. In the following discussion, we go through each component one by one.

### Iii-a Graph Signal Model

In classical signal processing, people often work with signals with some specific properties, instead of arbitrary signals. For example, smooth signals have been studied extensively over decades; sparse signals are intensively studied recently. Here we also need a graph signal model to describe a class of graph signals with specific properties. In general, there are two approaches to mathematically model a graph signal, including a descriptive approach and a generative approach.

For the descriptive approach, we describe the properties of a graph signal by bounding the output of some operator. Let be a graph signal, be a function operating on , we define a class of graph signals by restricting

 f(x)≤C, (1)

where is some constant. For example, we define smooth graph signals by restricting be small [32].

For the generative approach, we describe the properties of a graph signal by using a graph dictionary. Let be a graph dictionary, we define a class of graph signals by restricting

 x = Da,

where is a vector of expansion coefficients. For the descriptive approach, we do not need to know everything about a graph signal, instead, we just need to its output of some operator; for the generative approach, we need to reconstruct a graph signal, which requires to know everything about a graph signal.

### Iii-B Graph Dictionary

For a certain graph signal model, we aim to find some dictionary to provide accurate and concise representations.

#### Iii-B1 Design

In general, there are two approaches to design a graph dictionary, including a passive approach and a active one.

For the passive approach, the graph dictionary is designed only based on the graph structure; that is,

 D=g(R),

where is some operator on the graph structure matrix . For example, can be the eigenvector matrix of , which is the graph Fourier basis. In classical signal processing, the Fourier basis, wavelet bases, wavelet frames, and Gabor frames are all constructed using this approach, where the graph is a line graph or a lattice graph [23].

For the active approach, the graph dictionary is designed based on both graph structure and a set of given graph signals; that is,

 D=g(A,X),

where is a matrix representation of a set of graph signals. We can fit to those given graph signals and provide a specialized dictionary. Some related works see [13, 14].

#### Iii-B2 Properties

The same class of graph signals can be modeled by various dictionaries. For example, whenever is an identity matrix, it can represent arbitrary graph signals, but may not be appealing to represent a non-sparse signals. Depending on the application, we may have different requirements for the constructed graph dictionary. Here are some standard properties of a graph dictionary , we aim to study.

• Frame bounds. For any in a certain graph signal model,

 α1∥x∥2≤∥Dx∥2≤α2∥x∥2,

where are some constants;

• Sparse representations. For any in a certain graph signal model, there exists a sparse coefficient with , which satisfies

 ∥x−Da∥22≤ϵ,

where are some constants;

• Uncertainty principles. For any in a certain graph signal model, the following is satisfied

 ∥a1∥0+∥a2∥0≥C,

where and , and .

### Iii-C Graph Signal Processing Tasks

We mainly consider two standard tasks in signal processing, approximation and sampling followed with recovery.

#### Iii-C1 Approximation

Approximation is a standard task to evaluate a representation. The goal is to use a few expansion coefficients to approximate a graph signal. We consider approximating a graph signal by using a linear combination of a few atoms from and solving the following sparse coding problem,

 x∗,a∗= argminx,a d(x′,x), subject to: x′=Da, ∥a∥0≤K.

where is some evaluation metric. The objective function measures the difference between the original signal and the approximated one, which evaluates how well the designed graph dictionary represents a given graph signal. The same formulation can also be used for denoising graph signals.

#### Iii-C2 Sampling and Recovery

The goal is to recover an original graph signal from a few samples. We consider a general sampling and recovery setting. We consider any decrease in dimension via a linear operator as sampling, and, conversely, any increase in dimension via a linear operator as recovery [19]. Let be a sampling pattern matrix, which is constrained by a given application and the sampling operator is

 Ψ=CF∈RM×N, (3)

where selects rows from . For example, when we choose the th row of as the th sample, the th row of is

 Ci,j={1,j=k;0,otherwise.

There are three sampling strategies: (1) uniform sampling when designing , , where row indices are chosen from from independently and uniformly; and experimentally design sampling, where row indices can be chosen beforehand; and active sampling, where we will use feedback as samples are sequentially collected to decide the next row to be sampled. Each sampling strategy can be implemented by two approaches, including a random approach and a deterministic one. The sampling pattern matrix constraints the following sampling patterns: when is an identity matrix, is a subsampling operator; when is a Gaussian random matrix, is a compressed sampling operator.

In the sampling phase, we take samples with the sampling operator ,

 xΨ=Ψy=Ψ(x+ϵ),

is a vector of samples and is noise with zero mean and as variance. In the recovery phase, we reconstruct the graph signal by using a recovery operator ,

 x′=ΦxΨ,

where recovers either exactly or approximately. The evaluation metric can be the mean square error or other metrics. Without any property of , it is hopeless to design an efficient sampling and recovery strategy. Here we focus on a special graph signal model, which can be described by a graph dictionary. The prototype of designing sampling and recovery strategies is

 Ψ∗(D),Φ∗(D)=minΨ,Φmaxa d(x′,x), subject to x′=ΦΨ(x+ϵ), x=Da,

where is some evaluation metric. The optimal sampling and recovery strategies are influenced by the given graph dictionary . We often consider fixing either the sampling strategy or the recovery strategy and optimizing over the other one.

## Iv Representations of Smooth Graph Signals

Smooth graph signals are mostly studied in the previous literature; however, many works only provide a heuristic. Here we rigorously define graph signal models and design the corresponding graph dictionaries.

### Iv-a Graph Signal Models

We introduce four smoothness criteria for graph signals; while they have been implicitly mentioned previously, none have been rigorously defined. The goal here is not to conclude which criterion or representation approach works best; instead, we aim to study the properties of various smoothness criteria and model a smooth graph signal with a proper criterion.

###### Definition 1.

A graph signal with unit norm is pairwise Lipschitz smooth with parameter when it satisfies

 |xi−xj| ≤ C d(vi,vj), for all i,j=0,1,…,N−1,

with the distance between the th and the th nodes.

We can choose the geodesic distance, the diffusion distance [33], or some other distance metric for . Similarly to the traditional Lipschitz criterion [18], the pairwise Lipschitz smoothness criterion emphasizes pairwise smoothness, which zooms into the difference between each pair of adjacent nodes.

###### Definition 2.

A graph signal with unit norm is total Lipschitz smooth with parameter when it satisfies

 ∑(i,j)∈EWi,j(xi−xj)2 ≤ C.

The total Lipschitz smoothness criterion generalizes the pairwise Lipschitz smoothness criterion while still emphasizing pairwise smoothness, but in a less restricted manner; it is also known as the Laplacian smoothness criterion [34].

###### Definition 3.

A graph signal with unit norm is local normalized neighboring smooth with parameter when it satisfies

 ∑i⎛⎝xi−1∑j:(i,j)∈EWi,j∑j:(i,j)∈EWi,jxj⎞⎠2 ≤ C.

The local normalized neighboring smoothness criterion compares each node to the local normalized average of its immediate neighbors.

###### Definition 4.

A graph signal with unit norm is global normalized neighboring smooth with parameter when it satisfies

 ∑i⎛⎝xi−1|λmax(W)|∑j:(i,j)∈EWi,jxj⎞⎠2 ≤ C.

The global normalized neighboring smoothness criterion compares each node to the global normalized average of its immediate neighbors. The difference between the local normalized neighboring smoothness criterion and the global normalized neighboring smoothness criterion is the normalization factor. For the local normalized neighboring smoothness criterion, each node has its own normalization factor; for the global normalized neighboring smoothness criterion, all nodes have the same normalization factor.

The four criteria quantify smoothness in different ways: the pairwise and the total Lipschitz ones focus on the variation of two signal coefficients connected by an edge with the pairwise Lipschitz one more restricted, while the local and global neighboring smoothness criterion focuses on comparing a node to the average of its neighbors.

### Iv-B Graph Dictionary

As shown in (1), The graph signal models in Definitions 1234 are introduced in a descriptive approach. Following these, we are going to translate the descriptive approach into a generative approach; that is, we represent the corresponding signal classes satisfying each of the four criteria by some representation graph dictionary.

#### Iv-B1 Design

We first construct polynomial graph signals that satisfy the Lipschitz smoothness criterion.

###### Definition 5.

A graph signal is polynomial with degree when

 x = Dpoly(K)a = [1D(1)D(2)…D(K)]a∈RN,

where and is a graph polynomial dictionary with . Denote this class by .

In classical signal processing, polynomial time signals can be expressed as , ; we can rewrite this as in the above definition as , with . The columns of are denoted as , , and called atoms; the elements of each atom are . Since polynomial time signals are shift-invariant, we can set any time point as the origin; such signals are thus characterized by degrees of freedom , . This is not true for graph signals, however; they are not shift-invariant and any node can serve as the origin (see Figure 3). In the above definition, are now matrices with the number of atoms equal to the number of nodes (with each atom corresponding to the node serving as the origin). The dictionary thus contains atoms.

###### Theorem 1.

(Graph polynomial dictionary represents the pairwise Lipschiz smooth signals) is a subset of the pairwise Lipschitz smooth with some parameter .

###### Proof.

Let , that is,

 x=[1D(1)]a,

Then, we write the pairwise Lipschitz smooth criterion as

 |xi−xj| = |∑k(d(vk,vi)−d(vk,vj))ak| ≤ ∑k|d(vk,vi)−d(vk,vj)||ak| ≤ ∑k|ak||d(vi,vj)| = ∥a∥1|d(vi,vj)|.

The parameter , which corresponds to the energy of the original graph signal. ∎

We now construct bandlimited signals that satisfy the total Lipschitz, local normalized neighboring and global normalized neighboring smoothness smoothness criteria.

###### Definition 6.

A graph signal is bandlimited with respect to a graph Fourier basis with bandwidth when

 x=V(K)a,

where and is a submatrix containing the first columns of . Denote this class by  [10].

When is the eigenvector matrix of the unnormalized graph Laplacian matrix, we denote it as and can show that signals in are total Lipschitz smooth; when is the eigenvector matrix of the transition matrix, we denote it as and can show that signals in are local normalized neighboring smooth; when is the eigenvector matrix of the weighted adjacency matrix, we denote it as and can show that signals in are global normalized neighboring smooth.

###### Theorem 2.

(The graph Fourier basis of the graph Laplacian represents the total Lipschiz smooth signals) For any , is a subset of the total Lipschitz smooth with parameter , when .

###### Proof.

Let be a graph signal with bandwidth , that is,

 x=K∑k=1ˆxkv(L)k,

Then, we write the total Lipschitz smooth criterion as

 ∑(i,j)∈EWi,j(xi−xj)2 = xTLx = (K∑k=1ˆxkv(L)k)T(K−1∑k=0ˆxkλkv(L)k) = K∑k=1λ(L)kˆx2k ≤ λ(L)KK∑k=1ˆx2k = λ(L)K.

###### Theorem 3.

(The graph Fourier basis of the transition matrix represents the local normalized neighboring smooth signals) For any , is a subset of the local normalized neighboring smooth with parameter , when .

###### Proof.

Let be a graph signal with bandwidth , that is,

 x=K∑k=1ˆxkv(P)k,

Then, we write the local normalized neighboring smooth criterion as

 ∣∣ ∣∣xi−1∑j∈NiWi,j∑j∈NiWi,jxj∣∣ ∣∣ = ∣∣ ∣∣(K∑k=1ˆxkv(P)k)i−∑j∈NiPi,j(K∑k=1ˆxkv(P)k)j∣∣ ∣∣ = ∣∣ ∣∣K∑k=1ˆxk⎛⎝(v(P)k)i−∑j∈NiPi,j(v(P)k)j⎞⎠∣∣ ∣∣ = ∣∣ ∣∣K∑k=1ˆxk(1−λk)(v(P)k)i∣∣ ∣∣ ≤ (1−λ(P)K)∣∣ ∣∣K−1∑k=0ˆxk(v(P)k)i∣∣ ∣∣ = (1−λ(P)K)|xi|.

The last equality follows from the fact that and are eigenvectors and eigenvalues of .

 ∑i⎛⎝xi−1∑j∈Ni∑j∈NiWi,jxj⎞⎠2 = N∑i=1|xi−1∑j∈NiWi,j∑j∈NiWi,jxj|2 ≤ N−1∑i=0(1−λ(P)K)2|xi|2 = (1−λ(P)K)2.

###### Theorem 4.

(The graph Fourier basis of the adjacency matrix represents the global normalized neighboring smooth signals) For any , is a subset of the global normalized neighboring smooth with parameter , when .

The proof is similar to Theorem 3. Note that for graph Laplacian, the eigenvalues are sorted in an ascending order; for the transition matrix and the adjacency matrix, the eigenvalues are sorted in a descending order.

Each of these three models generates smooth graph signals according to one of the four criteria in Definitions 123 and 4: models Lipschitz smooth signals; models total Lipschitz smooth signals; with models the local normalized neighboring smooth signals; and models the global normalized neighboring smooth signals; the corresponding graph representation dictionaries are , , , and .

#### Iv-B2 Properties

We next study the properties of graph representation dictionaries for smooth graph signals, especially for graph Fourier bases , , and . We first visualize them in Figures 2. Figure 2 compares the first four graph Fourier basis vectors of , and in a geometric graph. We see that tends to localize in some small regions; and have similar behaviors.

We then check the properties mentioned in Section III-B2.

Frame Bound. Graph polynomial dictionary is highly redundant and the frame bound is loose. When the graph is undirected, the adjacency matrix is symmetric, then and are orthonormal. It is hard to draw any meaningful conclusion when the graph is directed; we leave it for the future work.

Sparse Representations. Graph polynomial dictionary provides sparse representations for polynomial graph signals. On the other hand, the graph Fourier bases provide sparse representations for the bandlimited graph signals. For approximately bandlimited graph signals, there are some residuals coming from the high-frequency components.

Uncertainty Principles. In classical signal processing, it is well known that signals cannot localize in both time and frequency domains at the same time [19, 35]. Some previous works extend this uncertainty principle to graphs by studying how well a graph signal exactly localize in both the graph vertex and graph spectrum domain [11, 12]. Here we study how well a graph signal approximately localize in both the graph vertex and graph spectrum domain. We will see that the localization depends on the graph Fourier basis.

###### Definition 7.

A graph signal is -vertex concentrated on a graph vertex set when it satisfies

 ∥x−IΓx∥22≤ϵ,

where is a diagonal matrix, with when and , otherwise.

The vertex set represents a region that supports the main energy of signals. When is small, a -vertex concentrated signal is approximately sparse.

###### Definition 8.

A graph signal is -spectrum concentrated on a graph spectrum band when it satisfies

 ∥x−VΩUΩx∥22≤ϵ,

where is a submatrix of with columns selected by and is a submatrix of with rows selected by .

The graph spectrum band provides a bandlimited space that supports the main energy of signals. An equivalent formulation is . Definition 8 is a simpler version of the approximately bandlimited space in [6].

###### Theorem 5.

(Uncertainty principle of the graph vertex and spectrum domains) Let a unit norm signal supported on an undirected graph be -vertex concentrated and -spectrum concentrated at the same time. Then,

 |Γ|⋅|Ω|≥(1−(ϵΩ+ϵΓ))2∥UΩ∥2∞.
###### Proof.

We first show .

 (VΩUΩIΓx)s = ∑k∈ΩVs,k(∑i∈ΓUk,ixi) = ∑i∈Γ(∑k∈ΩVs,kUk,i)xi = ∑iq(s,i)xi,

where

 q(s,i)={∑k∈ΩVs,kUk,i,i∈Γ;0,otherwise.

Let be a graph signal with . Then, . We then have

 ∥VΩUΩIΓ∥22 ≤ ∥VΩUΩIΓ∥2HS = ∑i∈Γ∑s|q(s,i)|2 =∑i∈Γ∥∥y(i)∥∥22 = ∑i∈Γ∥∥∥ˆy(i)∥∥∥22 = ∑i∈Γ∑k(1k∈ΩUk,i)2 ≤

We then show that . Based on the assumption, we have

 ∥x−VΩUΩIΓx∥2 = ∥x−IΓx∥2+∥IΓx−VΩUΩIΓx∥2 ≤ ϵΩ+ϵΓ.

Since has a unit norm, by the triangle inequality, we have

 ∥VΩUΩIΓ∥2≥1−(ϵΩ+ϵΓ).

Finally, we combine two results and obtain

 |Γ|⋅|Ω|≥∥VΩUΩIΓ∥22∥UΩ∥2∞ > (1−(ϵΩ+ϵΓ))2∥UΩ∥2∞

We see that the lower bound involves with the maximum magnitude of . In classical signal processing, is the discrete Fourier transform matrix, so ; the lower bound is and signals cannot localize in both time and frequency domain. However, for complex and irregular graphs, the energy of a graph Fourier basis vector may concentrate on a few elements, that is, , as shown in Figures 2(a)(b)(c)(d). It is thus possible that graph signals can be localized in both the vertex and spectrum domain. We now illustrate this localization phenomenon

Localization Phenomenon. A part of this section has been shown in [36]. We show it here for the completeness. The localization of a graph signal means that most elements in a graph signal are zeros, or have small values; only a small number of elements have large magnitudes and the corresponding nodes are clustered in one subgraph with a small diameter.

Prior work uses inverse participation ratio (IPR) to quantify localization [37]. The IPR of a graph signal is

 IPR=∑Ni=1x4i(∑Ni=1x2i)2.

A large IPR indicates that is localized, while a small IPR indicates that is not localized. The range of IPR is from to . For example, is the most delocalized vector with , while is the most localized vector with . IPR has some shortcomings: a) IPR only promotes sparsity, that is, a high IPR does not necessarily mean that the nonzero elements concentrate in a clustered subgraph, which is the essence of localization (Figure  4); b) IPR does not work well for large-scale datasets. When is large, even if only a small set of elements are non zero, IPR tends to be small.

To solve this, we propose a novel measure to quantify the localization of a graph signal. We use energy concentration ratio (ECR) to quantify the energy concentration property. The ECR is defined as

 ECR=S∗N,

where is the smallest that satisfies with the first elements with the largest magnitude in . This indicates that 95% energy of a graph signal is concentrated in the first elements with largest magnitude. ECR ranges from 0 to 1: when the signal is energy-concentrated, the ECR is small; when the energy of the signal is evenly distributed, the ECR is 1.

We next use normalized geodesic distance (NGD) to quantify the clustered property. Let be the set of nodes that possesses 95% energy of the whole signal. The normalized geodesic distance is defined as:

 NGD=1D∑i,j∈M,i≠jd(vi,vj)n(n−1)/2,

where is the diameter of the graph, is the geodesic distance between nodes and . Here we use the normalized average geodesic distance as a measure to determine whether the nodes are highly connected. We use the average geodesic distance instead of the largest geodesic distance to avoid the influence of outliers. The NGD ranges from 0 to 1: when the nodes are clustered in a small subgraph, the NGD is small; when the nodes are dispersive, the NGD is large.

We use ECR and NGD together to determine the localization of graph signals. When the two measures are small, the energy of the signal is concentrated in a small set of highly connected nodes, which can be interpreted as localization.

Each graph Fourier basis vector is regarded as a graph signal. When most basis vectors in the graph Fourier basis are localized, that is, the corresponding ECRs and NGDs are small, we call that graph Fourier basis localized.

We now investigate the localization phenomenon of graph Fourier bases of several real-world networks, including the arXiv general relativity and quantum cosmology (GrQc) collaboration network [38], arXiv High Energy Physics - Theory (Hep-Th) collaboration network [38] and the Facebook ‘friend circles’ network [38]. We find similar localization phenomena among different datasets. Due to the limited space, we only show the result of arXiv GrQc collaboration network. The arXiv GrQc network represents the collaborations between authors based on the submitted papers in general relativity and quantum cosmology category of arXiv. When the author and author coauthored a paper, the graph contains an undirected edge between node and . The graph contains 5242 nodes and 14496 undirected edges. Since the graph is not connected, we choose a connect component with 4158 nodes and 13422 edges.

We investigate the Fourier bases of the weighted adjacency matrix, the transition matrix and the unnormalized graph Laplacian matrix in the arXiv GrQc network. Figure 5 illustrates the ECRs and NGDs of the first 50 graph Fourier basis vectors (low-frequency components) and the last 50 graph Fourier basis vectors (high-frequency components) of the three graph representation matrices, where the ECR and NGD are plotted as a function of the index of the corresponding graph Fourier basis vectors. We find that a large number of graph Fourier basis vectors has small ECRs and NGDs, which indicates that graph Fourier basis vectors of various graph representation matrices are localized. Among various graph representation matrices, the graph Fourier basis vectors of graph Laplacian matrix tend to be more localized, especially in high-frequency components. In low-frequency components, the graph Fourier basis vectors of adjacency matrix are more localized.

To combine the uncertainty principle previously, when the graph Fourier basis shows localization phenomenon, it is possible that a graph signal can be localized in both graph vertex and spectrum domain. Based the graph Fourier basis of the graph Laplacian, a high-frequency bandlimited signals can be well localized in the graph vertex domain; based on the graph Fourier basis of adjacency matrix, a low-frequency bandlimited signals can be well localized in the graph vertex domain. The Fourier transform is famous for capturing the global behaviors and works as a counterpart of the delta functions; however, this may not be true on graphs. The study of new role of the graph Fourier transform will be an interesting future work.

### Iv-C Graph Signal Processing Tasks

As mentioned in Section III-C, we focus on two tasks: approximation and sampling following with recovery.

#### Iv-C1 Approximation

We compare the graph Fourier bases based on different graph structure matrices.

Algorithm. We consider nonlinear approximation for the graph Fourier bases, that is, after expanding with a representation, we should choose the largest-magnitude expansion coefficients so as to minimize the approximation error. Let and be a pair of biorthonormal basis and be a signal. Here the graph Fourier transform matrix and the graph Fourier basis . The nonlinear approximation to is

 x∗=∑k∈IK⟨x,ˆϕk⟩ϕk, (4)

where is the index set of the largest-magnitude expansion coefficients. When a basis promotes sparsity for , only a few expansion coefficients are needed to obtain a small approximation error. Note that (8) is a special case of (III-C1) when the distance metric is the norm and is a basis.

Since the the graph polynomial dictionary is redundant, we solve the following sparse coding problem,

 x∗= argmina subject to: ∥a∥0≤K,

where is the graph polynomial dictionary with order and are expansion coefficients. The idea is to use a linear combination of a few atoms from to approximate the original signal. When is an orthonormal basis, the closed-form solution is exactly (8). We solve (III-C1) by using the orthogonal matching pursuit, which is a greedy algorithm [39]. Note that (IV-C1) is a special case of (III-C1) when the distance metric is the norm.

Experiments. We test the four representations on two datasets, including the Minnesota road graph [40] and the U.S city graph [7].

The Minnesota road graph is a standard dataset including 2642 intersections and 3304 roads  [40]. we construct a graph by modeling the intersections as nodes and the roads as undirected edges. We collect a dataset recording the wind speeds at those 2642 intersections [41]. The data records the hourly measurement of the wind speed and direction at each intersection. In this paper, we present the data of wind speed on January 1st, 2015. Figure 6(a) shows a snapshot of the wind speeds on the entire Minnesota road. The expansion coefficients obtained by using four representations are shown in Figure 6(b), (c), (d) and (e). The energies of the frequency coefficients of , and mainly concentrate on the low-frequency bands; and are more concentrated; is redundant and the corresponding expansion coefficients are not sparse.

To make a more serious comparison, we evaluate the approximation error by using the normalized mean square error, that is,

 Normalized MSE=∥x∗−x∥22∥x∥22, (6)

where is the approximation signal and is the original signal. Figure 6(f) shows the approximation errors given by the four representations. The x-axis is the number of coefficients used in approximation, which is in (8) and (III-C1) and the y-axis is the approximation error, where lower means better. We see that and tie ; all of them are much better than . This means that the wind speeds on the Minnesota road graph are well modeled by pairwise Lipschitz smooth, total Lipschitz smooth and local normalized neighboring smooth criteria. The global normalized neighboring smooth criterion is not appropriate for this dataset.

The U.S weather station graph is a network representation of weather stations across the U.S. We assign an edge when two weather stations are within 500 miles. The graph includes nodes and undirected, unweighted edges. Each weather station has days of recordings (one recording per day), for a total of 365 graph signals. As an example, see Figure 7(a). The expansion coefficients obtained by using four representations are shown in Figure 7(b), (c), (d) and (e). Similarly to the wind speed dataset, the energies of the frequency coefficients of , and are mainly concentrated on the low-frequency bands; and are more concentrated; is redundant and the corresponding expansion coefficients are not sparse.

The evaluation metric of the approximation error is also the normalized mean square error. Figure 7(f) shows the approximation errors given by the four representations. The results are averages over 365 graph signals. Again, we see that , and perform similarly; all of them are much better than . This means that the wind speeds on the Minnesota road graph are well modeled by pairwise Lipschitz smooth, total Lipschitz smooth and local normalized neighboring smooth criteria. The global normalized neighboring smooth criterion is not appropriate for this dataset.

The results from two real-world datasets suggest that we should consider using pairwise Lipschitz smooth, total Lipschitz smooth and local normalized neighboring smooth criteria to model real-world smooth graph signals. In terms of the representation dictionary, among , and , is redundant; is not orthonormal. We thus prefer using because it is an orthonormal basis.

#### Iv-C2 Sampling and Recovery

The goal is to recover a smooth graph signal from a few subsamples. A subsample is collected from an individual node each time; that is, we constraint the sampling pattern matrix in (3) be an identity matrix. Previous works show that in this senario, experimentally designed sampling is equivalent to active sampling and is better than uniform sampling asymptotically [42]. Here we compare various sampling strategies based on experimentally designed sampling, which are implemented in a deterministic approach. The random approach sees [6].

Algorithm. We follow the sampling and recovery framework in Section III-C2. Let a smooth graph signal be . For example, , then are the frequency coefficients. In general, we assume that the energy of the expansion coefficient is concentrated in a few known supports, that is, , where and is the known. We aim to recover and further approximate by using . We consider the partial least squares (PLS), , where

 a∗PLS = argmina∥∥ΨTxΨ−ΨTΨDΩaΩ∥∥22 = (DTΩΨTΨDΩ)−1DΩΨTΨy = (ΨDΩ)†Ψ(DΩaΩ+DΩcaΩc+ϵ),

where is the noisy version of the graph signal with is noise, and , which is a diagonal matrix with , when the th node is sampled, and 0, otherwise. The recovery error is then

 x∗PLS−x = DΩ(ΨDΩ)†Ψ(DΩcaΩc+ϵ),

where the first term is bias and the second term is the variance from noise.

We aim to optimize the sampling strategy by minimizing the recovery error and there are six cases to be considered:

1. minimizing the bias in the worst case; that is,

 minΨ∥∥(ΨDΩ)†ΨDΩc∥∥2,

where is the largest singular value;

2. minimizing the bias in expectation; that is,

 minS∥∥(ΨDΩ)†ΨDΩc∥∥2F,

where is the Frobenius norm;

3. minimizing the variance from noise in the worst case; that is,

 minΨ∥∥(ΨDΩ)†Ψ∥∥2;
4. minimizing the variance from noise in expectation; that is,

 minΨ∥∥(ΨDΩ)†Ψ∥∥F;
5. minimizing the recovery error in worst case, that is,

 minΨ∥∥(ΨDΩ)†ΨDΩc∥∥2+c∥∥(ΨDΩ)†Ψ∥∥2,

where is related to the signal-to-noise ratio;

6. minimizing the recovery error in expectation; that is,

 minΨ∥∥(ΨDΩ)†ΨDΩc∥∥F+c∥∥(ΨDΩ)†Ψ∥∥F,

where is related to the signal-to-noise ratio.

We call the resulting sampling operator of each setting as an optimal sampling operator with respect to setting (). As shown in the previous work, (c) can be solved by a heuristic greedy search algorithm as shown in [10] and a dual set algorithm as shown in [43]; (d) can be solved by a convex relaxation as shown in [44]. When we deal with a huge graph, can be a huge matrix, solving (a), (b), (e), and (f) can lead to computational issues. When we only want to avoid the interference from a small subset of , we can use a similar heuristic greedy search algorithm to solve (a) and (e) and a similar convex relaxation to solve (b) and (f). Note that the sampling strategies are designed based the recovery strategy of partial least squares; it is not necessary to be optimal in general. Some other recovery strategies are variation minimization algorithms [45, 46, 7, 26]. Comparing to the efficient sampling strategies [26], the optimal sampling operator is more general because it does not require to have any specific property. For example, can contain high-frequency atoms.

Relations to Matrix Approximation. We show that the nodes sampled by the optimal sampling operator are actually the prototype nodes that maximally preserve the connectivity information in the graph structure matrix. In the task of matrix approximation, we aim to find a subset of rows to minimize the reconstruction error. When we specify the matrix to be a graph structure matrix, we solve

 minΨ∈RM×N∥∥R−R(ΨR)†(ΨR)∥∥ξ,

where is the subsampling operator in (3) and [43] shows that

 ∥∥R−R(ΨR)†(ΨR)∥∥2ξ≤∥R−RK∥2ξ∥∥(ΨV(K))†Ψ∥∥22,

where is the best rank approximation of and is the first columns of the graph Fourier transform matrix of . We see that when , the rows sampled by the optimal sampling operator in (c) minimize the reconstruction error of the graph structure matrix. When sampling nodes, we always lose information, but the optimal sampling operator selects most representative nodes that maximally preserves the connectivity information in the graph structure matrix.

Experiments. The main goal is to compare sampling strategies based on various recovery strategies and datasets.

We compare sampling strategies in total. We consider the sampling strategies implementing the optimal sampling operator in three ways: solving the setting (c) by the greedy search algorithm (Opt(G)), solving the setting (c) by the dual set algorithm (Opt(D)) and solving the setting (d) by solving the convex relaxation algorithm (Opt(C)). We also consider the efficient sampling strategies with three parameter settings (Eff(), where is the connection order, varying as [26], where the efficient sampling strategies provide fast implementations by taking the advantages of the properties of the graph Laplacian.

We test on recovery strategies in total, including the partial least squares (PLS) (IV-C2), harmonic functions (HF) [45], total variation minimization (TVM) [7] and graph Laplacian based variation minimization with connection order 1 (LVM()) [26]. Note that the proposed optimal sampling operator is based on PLS and the compared efficient sampling strategy is based on LVM.

We test the sampling strategies on two real-world datasets, including the wind speeds on the Minnesota road graph and the temperature measurements on the U.S city graph. As shown in Figures 6 and 7, these signals are smooth on the corresponding graphs, but are not bandlimited.

In Section IV-C1, we conclude that the graph Fourier basis based on graph Laplacian models the graph signals in these two datasets well. We thus specify the first columns of as in (IV-C2), where and is the sample size. The physical meaning of is the bandwidth of a bandlimited space, which means that we design the samples based on a small bandlimited space. We use the same bandwidths in the recovery strategies of the partial least squares and the iterative projection between two convex sets.

Figures 8 and 9 shows the comparison of sampling strategies based on recovery strategies on the temperature dataset and the wind dataset, respectively. For each figure, the -axis is the number of samples and the -axis is the recovery error, which is evaluated by Normalized MSE (6) in a logarithm scale. For both datasets, three optimal sampling operators are competitive with three efficient sampling strategies under each of recovery strategies. Since efficient sampling strategies need to compute the eigenvectors corresponds to the small eigenvalues, the computation is sometimes unstable when the graph is not well connnected. Within three optimal sampling operators, the greedy search algorithm provides the best performance. As shown in [26], the computational complexity of greedy search algorithm is , where is the number of samples, which is computationally inefficient; however, since we are under the experimentally designed sampling, all the algorithms are designed offline. When the sample size is not too huge, we prefer using a slower, but more accurate sampling strategy. The dual set algorithm also provides competitive performance and the computational complexity of dual set algorithm is , which is more efficient than the greedy search algorithm. Thus, when one needs a small number of samples, we recommend the optimal sampling operator implemented by the greedy search algorithm; when one needs a large number of samples, we recommend the optimal sampling operator implemented by the dual set algorithm.

In [41], the sampling followed with recovery of wind speeds is used to plan routes for autonomous aerial vehicles.

#### Iv-C3 Case Study: Coauthor Network

We aim to use the proposed approximation and sampling with recovery techniques to large-scale graphs. Here we present some preliminary results.

We collect the publications on three journals, including IEEE Transactions on Signal Processing (TSP), Image Processing (TIP) and Information Theory (TIT). The dataset includes papers on TSP contributed by authors, papers on TIP contributed by authors, papers on TIT contributed by authors. We construct a coauthor network where nodes are unique authors and edges indicate the coauthorship. The edge weight when author and wrote at least one journal together, and , otherwise. The graph includes unique authors and coauthorships. We emphasize unique authors because some of them may publish papers on more than one journals.

We form a graph signal by count the total number of papers on TSP for each unique author, which describes a distribution of the contribution to the signal processing community. Intuitively, the graph signal is smooth because people in the same community often write papers together. We then check which graph Fourier basis can well represent this smooth graph signal. Since the graph is huge, the full eigendecomposition is computational inefficient and we only compute eigenvectors. For the adjacency matrix, we compute the eigenvectors corresponding the largest