
# Efficient Sampling of Bandlimited Graph Signals

Abolfazl Hashemi,  Rasoul Shafipour,  Haris Vikalo,  and Gonzalo Mateos.  Work in this paper was supported in part by the NSF award CCF-1750428. Abolfazl Hashemi and Haris Vikalo are with the Department of Electrical and Computer Engineering, University of Texas at Austin, Austin, TX 78712, USA. Rasoul Shafipour and Gonzalo Mateos are with the Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY 14627, USA. Part of the results in this paper were presented at the Forty-Third IEEE International Conference on Acoustics, Speech, and Signal Processing, Calgary, Canada, April 2018 [1] and may appear at the Sixth IEEE Global Conference on Signal and Information Processing, Anaheim, California, USA, November 2018 [2].
###### Abstract

We study the problem of sampling and reconstructing bandlimited graph signals where the objective is to select a subset of nodes of pre-specified cardinality that ensures interpolation of the original signal with the lowest possible reconstruction error. First, we consider a non-Bayesian scenario and propose an efficient iterative sampling procedure that in the noiseless case enables exact recovery of the original signal from the set of selected nodes. In the case of noisy measurements, a bound on the reconstruction error of the proposed algorithm is established. Then, we consider the Bayesian scenario where we formulate the sampling task as the problem of maximizing a monotone weak submodular function, and propose a randomized-greedy algorithm for finding a sub-optimal subset. We derive the worst-case performance guarantee on the mean-square error achieved by the randomized-greedy algorithm for general non-stationary graph signals. We further study the problem of identifying unknown support of bandlimited signals and show that under a pragmatic sufficient condition, the proposed framework perfectly discovers the support using a small number of samples. The efficacy of the proposed methods is illustrated through numerical simulations on synthetic and real-world graphs.

graph signal processing, sampling, reconstruction, weak submodularity, iterative algorithms

## I Introduction

Network data that are naturally supported on the vertices of a graph are becoming increasingly ubiquitous, with examples ranging from measurements of neural activity at different regions of the brain [3] to vehicle trajectories over road networks [4]. Predicated on the assumption that the properties of a network process relate to the underlying graph, the goal of graph signal processing (GSP) is to broaden the scope of traditional signal processing tasks and develop algorithms that fruitfully exploit this relational structure [5, 6]. Consider a network represented by a graph 𝒢 consisting of a node set 𝒩 of cardinality N and a weighted adjacency matrix A whose entry A_{ij} denotes the weight of the edge connecting node i to node j. A graph signal is a vertex-valued network process that can be represented by a vector x of size N supported on 𝒩, where its i-th component denotes the signal value at node i.

A cornerstone problem in GSP that has drawn considerable attention in recent years pertains to sampling and reconstruction of graph signals [7, 8, 9, 10, 11, 12, 13, 14]. The task of selecting a subset of nodes whose signals enable reconstruction of the information in the entire graph with minimal loss is known to be NP-hard. Conditions for exact reconstruction of graph signals from noiseless samples were put forth in [7, 8, 9, 10]. Existing approaches for sampling and reconstruction of graph signals can be categorized in two main groups – selection sampling [10] and aggregation sampling [12]. The focus of the current paper is on the former.

Sampling of noise-corrupted signals using randomized schemes, including uniform and leverage score sampling, is studied in [15]; there, optimal sampling distributions and performance bounds are derived. Building on the ideas of variable density sampling from compressed sensing, [16] derives random sampling schemes and proves that O(k log k) samples are sufficient to recover all k-bandlimited signals with high probability. Moreover, [16] provides a fast technique for accurate estimation of the optimal sampling distribution. Recent work [17] relies on loop-erased random walks on graphs to speed up sampling of bandlimited signals. In [11, 14], reconstruction of graph signals and their power spectral density is studied, and schemes based on the greedy sensor selection algorithm [18, 19] are developed. However, the performance guarantees in [15, 14] are restricted to the case of stationary graph signals, i.e., the covariance matrix in the nodal or spectral domain is required to have a certain structure (e.g., diagonal; see also [20, 21, 22]).

An influential work [23] presents a method that enables recovery of some bandlimited functions on a simple undirected unweighted graph using the signal values observed on the so-called uniqueness sets of vertices; see also [24] and [25]. An iterative local-set-based algorithm based on graph partitioning, which improves the convergence rate of bandlimited graph signal reconstruction, is proposed in [26]. Another sampling approach relies on collecting observations at a single node instead of a subset of nodes via successive applications of the so-called graph shift operator and aggregating the results [12]. Specifically, shifted versions of the signal are sampled at a single node which, under certain conditions, enables recovery of the signal at all nodes. While the aggregation sampling in [12] reduces to classical sampling for time signals, the required inspection of the invertibility of the submatrix of eigenvectors is computationally expensive. Moreover, the recovery of graph signals from their partial samples collected via the aggregation scheme requires the first k eigenvalues (k being the signal bandwidth) to be distinct, which may not be the case in certain applications.

A main challenge in sampling and reconstruction is the problem of identifying the support of bandlimited graph signals [12, 27, 28, 24, 9]. In [24, 9], support identification of smooth graph signals is studied. However, the techniques in [24, 9] rely solely on a user-defined sampling strategy and the graph Laplacian, and disregard the availability of observations of the graph signal. A similar scheme is developed in [12] for aggregation sampling where, under established assumptions on the topology of a graph, conditions for exact support identification from noiseless measurements are established. In particular, the aggregation sampling method of [12] requires twice as many samples as the bandwidth of the graph signal (i.e., m = 2k) to guarantee perfect recovery in the noiseless setting. An alternating minimization approach that jointly recovers the unknown support of the signal and designs a sampling strategy in an iterative fashion is proposed in [27]. However, the convergence of the alternating scheme in [27] is not guaranteed and the conditions for exact support identification are unknown [27].

In this paper, we consider the task of sampling and reconstruction of bandlimited graph signals in various settings.

We first study the non-Bayesian scenario where no prior information about the signal covariance is available. Based on ideas from compressed sensing, we develop a novel and efficient iterative sampling approach that exploits the low-cost selection criterion of the orthogonal matching pursuit (OMP) algorithm [29] to recursively select a subset of nodes of the graph. We theoretically demonstrate that in the noiseless case the original k-bandlimited signal can be recovered exactly from the set of selected nodes with cardinality k. In the case of ℓ₂-norm bounded noise, we establish a bound on the worst-case reconstruction error of the proposed algorithm that turns out to be proportional to the bound on the ℓ₂-norm of the noise term. The proposed scheme requires only that the adjacency matrix be normal, a regularly made assumption in prior works on sampling of graph signals. Therefore, the proposed iterative algorithm guarantees recovery for a wide class of graph structures.

Next, we study the Bayesian scenario where the graph signal is a non-stationary network process with a known non-diagonal covariance. Following [14, 18, 19], we formulate the sampling task as the problem of maximizing a monotone weak submodular function that is directly related to the mean square error (MSE) of the linear estimator of the original graph signal. To find a sub-optimal solution to this combinatorial optimization problem, we propose a randomized-greedy algorithm that is significantly faster than the greedy sampling method developed in [14, 18, 19]. We theoretically analyze performance of the proposed randomized-greedy algorithm and demonstrate that the resulting MSE is a constant factor away from the MSE of the optimal sampling set. Unlike the prior work in [14], our results do not require stationarity of the graph signal. Furthermore, in contrast to existing theoretical works, we do not restrict our study to the case of additive white noise. Instead, we assume that the noise coefficients are independent but the power of noise is different for individual nodes of the graph.

We then extend our results to the case where the support of the bandlimited signal is unknown. In particular, we show that in the noiseless scenario the proposed framework still requires sampling only k nodes to exactly identify the unknown support and recover the entire graph signal. In the noisy scenario, we derive a sufficient condition for exact identification of the support. Specifically, we demonstrate that if the smallest entry of the sampled graph signal satisfies a certain sufficient condition, the proposed framework can perfectly identify the unknown support from the sampled nodes.

Finally, simulation studies on both synthetic and real-world graphs verify our theoretical findings and illustrate that the proposed sampling framework compares favorably to competing alternatives in terms of both accuracy and runtime.

The rest of the paper is organized as follows. Section II reviews the relevant background and concepts. In Section III, we formally state the sampling problem and develop the proposed iterative selection sampling method. In Section IV, we study the Bayesian setting, introduce the randomized-greedy algorithm for the sampling task, and theoretically analyze its performance. Extensions of the proposed frameworks to the case of unknown support are developed in Section V. Section VI presents simulation results, while concluding remarks are stated in Section VII.

## II Preliminaries

In this section, we briefly overview notation, concepts, and definitions that are used in the development of the proposed algorithmic and theoretic frameworks.

### II-A Notation

Bold capital letters denote matrices while bold lowercase letters represent vectors. Sets are denoted by calligraphic letters, e.g., 𝒮, and |𝒮| denotes the cardinality of 𝒮. A_{ij} denotes the (i, j) entry of A, A_{i,:} (A_{:,j}) is the i-th row (j-th column) of A, A_{S,:} (A_{:,S}) is the submatrix of A that contains the rows (columns) indexed by the set S, and λ_max(A) and λ_min(A) are the largest and the smallest eigenvalues of A, respectively. P_A^⊥ := I − A^† A is the projection operator onto the orthogonal complement of the subspace spanned by the rows of A, where A^† denotes the Moore-Penrose pseudo-inverse of A and I is the identity matrix. Finally, supp(x) returns the support of x, i.e., the set of indices of the nonzero entries of x.

### II-B Bandlimited graph signals

Let x ∈ ℝ^N be a graph signal that is k-bandlimited in a given basis U. This means that the signal’s so-called graph Fourier transform (GFT) is k-sparse. There are several choices for U in the literature, with most aiming to decompose a graph signal into different modes of variation with respect to the graph topology. For instance, U can be defined via the Jordan decomposition of the adjacency matrix A [30, 31], through the eigenvectors of the Laplacian when the graph is undirected [5], or it can be obtained as the result of an optimization procedure [32, 33]. In this paper, we assume that the adjacency matrix A is normal, which in turn implies that U is unitary and U^{−1} = U^⊤.

Recall that since x is k-bandlimited, its GFT x̄ is sparse with at most k nonzero entries. Let K := supp(x̄) be the support set of x̄, where |K| ≤ k. Then one can write x = U_K x̄_K, where U_K := U_{:,K} and x̄_K ∈ ℝ^k collects the nonzero entries of x̄. In the sequel, without loss of generality we assume U_K does not contain all-zero rows; otherwise, one could omit the all-zero rows of U_K and their corresponding nodes from the graph as they provide no meaningful information about the graph signal. Moreover, we start by assuming that the support set K is known. In Section V, we discuss how to tackle sampling scenarios where K is unknown or only a probabilistic prior on K is available.
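To make the notation concrete, the following sketch (not part of the paper's algorithms; the graph, bandwidth k, and support K are arbitrary illustrative choices) builds a k-bandlimited signal from the eigenvectors of a symmetric, hence normal, adjacency matrix and verifies that its GFT is supported on K:

```python
import numpy as np

rng = np.random.default_rng(0)

# Symmetric (hence normal) adjacency matrix of a small random graph.
N, k = 8, 3
A = rng.random((N, N))
A = (A + A.T) / 2
np.fill_diagonal(A, 0)

# Graph Fourier basis: eigenvectors of A (orthonormal since A is symmetric).
_, U = np.linalg.eigh(A)

# A k-bandlimited signal: its GFT is supported on an (arbitrary) set K.
K = [0, 2, 5]                      # assumed support of the GFT, |K| = k
x_bar_K = rng.standard_normal(k)   # nonzero frequency components
x = U[:, K] @ x_bar_K              # x = U_K x̄_K

# The GFT U^T x is k-sparse with support K.
x_bar = U.T @ x
support = np.flatnonzero(np.abs(x_bar) > 1e-10)
```

Because U is orthonormal, the GFT of x recovers exactly the chosen support.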

### II-C Submodularity concepts

An important concept in contemporary combinatorial optimization is the notion of submodular functions that has recently found applications in many signal processing tasks. Relevant concepts are formally defined below.

###### Definition 1 (Submodularity and monotonicity).

Let 𝒩 be a ground set. A set function f : 2^𝒩 → ℝ is submodular if

 f(S∪{j})−f(S)≥f(T∪{j})−f(T)

for all subsets S ⊆ T ⊆ 𝒩 and j ∈ 𝒩∖T. The term f_j(S) := f(S∪{j}) − f(S) is the marginal value of adding element j to set S. Furthermore, f is monotone if f(S) ≤ f(T) for all S ⊆ T ⊆ 𝒩.

In many applications, the objective function of the combinatorial optimization problem of interest is not submodular. The notion of set functions with bounded curvature captures these scenarios by generalizing the concept of submodularity.

###### Definition 2 (Curvature).

The maximum element-wise curvature of a monotone non-decreasing set function f is defined as

 C_f = max_{l∈[N−1]} max_{(S,T,i)∈𝒳_l} f_i(T)/f_i(S),

where 𝒳_l := {(S, T, i) : S ⊆ T ⊆ 𝒩, |T∖S| = l, i ∈ 𝒩∖T}.

The maximum element-wise curvature essentially quantifies how close the set function is to being submodular. It is worth noting that a set function is submodular if and only if its maximum element-wise curvature satisfies C_f ≤ 1. When 1 < C_f < ∞, f is called a weak submodular function.
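As an illustration of Definition 2, the following brute-force sketch numerically evaluates the maximum element-wise curvature of a small set function; the log-determinant objective here is only an assumed example (it is submodular, so the computed curvature should not exceed one), not a function used elsewhere in the paper:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
N = 5
u = rng.standard_normal((N, 3))   # one 3-dimensional vector per element

def f(S):
    """Illustrative monotone set function: log det(I + sum_{j in S} u_j u_j^T)."""
    M = np.eye(3)
    for j in S:
        M += np.outer(u[j], u[j])
    return np.linalg.slogdet(M)[1]

def marginal(i, S):
    return f(S | {i}) - f(S)

# Brute-force the maximum element-wise curvature: the largest ratio
# f_i(T) / f_i(S) over nested sets S ⊆ T and elements i outside T.
ground = set(range(N))
Cf = 0.0
for r in range(N):
    for T in map(set, combinations(ground, r)):
        for s in range(len(T) + 1):
            for S in map(set, combinations(sorted(T), s)):
                for i in ground - T:
                    Cf = max(Cf, marginal(i, T) / marginal(i, S))
```

For this submodular choice the brute-force curvature evaluates to one; a weak submodular function would instead yield a value greater than one.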

## III Sampling of Bandlimited Graph Signals

In this section, we study the problem of sampling bandlimited signals with known support. In particular, we assume that a graph signal x has a k-sparse GFT in a basis U and that A = U Λ U^⊤, where A is the adjacency matrix of the undirected graph 𝒢; alternatively, we may use the Laplacian matrix to characterize the undirected graph. We can also consider any orthogonal basis U for the most general directed graphs; see, e.g., [32]. We first consider the noise-free scenario (Section III-A) and then extend our results to the case of sampling and reconstruction from noisy measurements (Section III-B).

### III-A Sampling bandlimited graph signals

In selection sampling (see, e.g., [10]), sampling a graph signal x amounts to finding a selection matrix C ∈ {0,1}^{k×N} such that x̃ = Cx, where x̃ denotes the sampled graph signal. Since x is bandlimited with support K and x = U_K x̄_K, it holds that x̃ = C U_K x̄_K. The original signal can then be reconstructed as

 x̂ = U_K x̄_K = U_K (C U_K)^{−1} x̃. (1)

According to (1), a necessary and sufficient condition for perfect reconstruction (i.e., x̂ = x) from noiseless observations is the invertibility of the k×k matrix C U_K. However, as argued in [12, 7] (see, e.g., Section III-A in [12]), current random selection sampling approaches cannot construct a sampling matrix C that ensures C U_K is invertible for an arbitrary graph; moreover, invertibility of C U_K is examined by inspection, which for large graphs requires intensive computational effort. To overcome these issues, motivated by the well-known OMP algorithm in compressed sensing [29], we propose a simple iterative scheme with O(Nk²) complexity that with probability one guarantees perfect recovery of x from the sampled signal x̃.

The proposed approach (see Algorithm 1) works as follows. First, the algorithm chooses an arbitrary node of the graph with index l, for some l ∈ 𝒩, to initialize the residual. Then, in the i-th iteration the algorithm identifies a node with index j_s to be included in the sampling set S according to

 j_s = argmax_{j ∈ 𝒩∖S} |r_{i−1}^⊤ u_j|² / ∥u_j∥₂², (2)

where r_{i−1} is a residual vector initialized as r₀ = u_l. This procedure is repeated for k iterations to construct the sampling set S.
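A minimal sketch of this selection rule follows; it is not the paper's exact Algorithm 1 (in particular, the residual is kept orthogonal to the already-selected rows by an explicit projection rather than the modified Gram-Schmidt recursion, and the problem data are random illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: U_K holds the k retained columns of the GFT basis of a
# symmetric (hence normal) adjacency matrix; its rows are the vectors u_j.
N, k = 10, 4
A = rng.random((N, N))
A = (A + A.T) / 2
_, U = np.linalg.eigh(A)
U_K = U[:, :k]

def iterative_sampling(U_K, k, rng):
    """OMP-flavored node selection in the spirit of Algorithm 1 (sketch only)."""
    S = []
    g = rng.standard_normal(k)            # arbitrary initial direction
    for _ in range(k):
        if S:
            B = U_K[S, :]                 # rows selected so far
            r = g - B.T @ np.linalg.pinv(B.T) @ g   # project g off span(B)
        else:
            r = g
        # selection rule (2): maximize |r^T u_j|^2 / ||u_j||_2^2 over j not in S
        scores = (U_K @ r) ** 2 / np.sum(U_K ** 2, axis=1)
        scores[S] = -np.inf
        S.append(int(np.argmax(scores)))
    return S

S = iterative_sampling(U_K, k, rng)
CU = U_K[S, :]                            # C U_K for the selection matrix C
# Perfect reconstruction from noiseless samples, as in (1).
x = U_K @ rng.standard_normal(k)
x_hat = U_K @ np.linalg.solve(CU, x[S])
```

Any node with a nonzero score lies outside the span of the rows chosen so far, so the selected rows stay linearly independent and CU is invertible by construction.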

Theorem 1 demonstrates that Algorithm 1 returns a sampling set that ensures perfect recovery of the graph signal in the noise-free scenario.

###### Theorem 1.

Let S denote the sampling set constructed by Algorithm 1 and let C be the corresponding sampling matrix such that x̃ = Cx. Then, the matrix C U_K is always invertible.

###### Proof.

See Appendix A. ∎

Theorem 1 states that as long as the adjacency matrix A is normal, the proposed selection scheme guarantees perfect reconstruction of the original signal from its noiseless samples. Therefore, in contrast to existing random selection sampling and aggregation sampling schemes [10, 16, 12] that require strong conditions on A (e.g., the eigenvalues of A to be distinct), Algorithm 1 guarantees recovery for a wider class of graphs.

The worst-case computational complexity of Algorithm 1 is analyzed next. In the i-th iteration, step 6 costs O(Nk) as one needs to search over the N rows of U_K and compute inner products of k-dimensional vectors in order to evaluate the selection criterion. Step 8 is a matrix-vector product whose complexity is O(k²). Note that in our implementations we use the modified Gram-Schmidt (MGS) algorithm to update the residual vector with a significantly lower complexity. Thus the total cost of the i-th iteration is O(Nk). Since there are k iterations, the overall complexity of Algorithm 1 is O(Nk²).

### III-B Sampling in the presence of noise

Here we extend the proposed selection sampling scheme to scenarios where only noisy observations of the graph nodes are available. Note that because of the noise, perfect reconstruction is no longer possible. Nevertheless, we provide an upper bound on the reconstruction error of the proposed sampling scheme as a function of the noise covariance and the sampling matrix C. Another distinguishing aspect of sampling and reconstruction in the presence of noise is that, to achieve better reconstruction accuracy, it may be desirable to select m > k nodes as the sampling set. This stands in contrast to the noiseless case where, as we proved, sampling k nodes is sufficient for perfect reconstruction if the sampling set is constructed by Algorithm 1.

Let y = x + n be the noise-corrupted signal, where n denotes the zero-mean noise vector with covariance matrix Q. We also assume that the support K is known. Therefore, since x = U_K x̄_K, the samples and the nonzero frequency components of x are related via the linear model

 x̃ = y_S = U_{S,r} x̄_K + n_S, (3)

where x̃ := y_S collects the samples, U_{S,r} := (U_K)_{S,:}, and n_S := (n)_S. The reconstructed signal in the Fourier domain, x̄̂_K, satisfies the normal equations [34],

 U_{S,r}^⊤ Q_S^{−1} U_{S,r} x̄̂_K = U_{S,r}^⊤ Q_S^{−1} x̃, (4)

where Q_S is the covariance of n_S. If the matrix U_{S,r}^⊤ Q_S^{−1} U_{S,r} is invertible, we can recover the original graph signal up to an error term, as stated in the following proposition.

###### Proposition 1.

Let S be the sampling set constructed by Algorithm 1 and let C be the corresponding sampling matrix. Then with probability one the matrix U_{S,r}^⊤ Q_S^{−1} U_{S,r} is invertible. Furthermore, if ∥n_S∥₂ ≤ ε_n, the reconstruction error of the signal x̂ reconstructed from x̃ satisfies

 ∥x̂ − x∥₂ ≤ σ_max((U_{S,r}^⊤ Q_S^{−1} U_{S,r})^{−1} U_{S,r}^⊤ Q_S^{−1}) ε_n. (5)
###### Proof.

See Appendix B. ∎

Proposition 1 states an explicit bound on the reconstruction error of the proposed sampling scheme for general noise models with bounded ℓ₂-norm. In addition to this performance bound, another advantage of the proposed selection sampling scheme is that, unlike the aggregation sampling strategy, it preserves structural properties of the noise. More specifically, if n is white with Q = σ²I, the effective noise term in aggregation sampling becomes correlated (see Section IV-A in [12]), while under the proposed sampling method the noise term remains unchanged and hence white.
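The weighted least-squares reconstruction (4) and the error bound of Proposition 1 can be checked numerically; in this sketch the sampling set is drawn at random purely for brevity (Algorithm 1 would normally supply it), and all problem data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: reduced basis U_K whose rows live on the graph nodes.
N, k, m = 12, 3, 6
A = rng.random((N, N))
A = (A + A.T) / 2
_, U = np.linalg.eigh(A)
U_K = U[:, :k]

x_bar = rng.standard_normal(k)
x = U_K @ x_bar

# Heteroscedastic noise: a different variance at every node.
sigma2 = rng.uniform(0.01, 0.05, size=N)
y = x + rng.standard_normal(N) * np.sqrt(sigma2)

S = rng.choice(N, size=m, replace=False)   # sampling set (random, for brevity)
U_Sr = U_K[S, :]                           # U_{S,r}
Q_S = np.diag(sigma2[S])                   # noise covariance on the samples
W = np.linalg.inv(Q_S)

# Normal equations (4): weighted least squares for the frequency components,
# followed by interpolation of the full signal.
x_bar_hat = np.linalg.solve(U_Sr.T @ W @ U_Sr, U_Sr.T @ W @ y[S])
x_hat = U_K @ x_bar_hat
err = np.linalg.norm(x_hat - x)

# Error bound in the spirit of (5): sigma_max of the reconstruction operator
# times the norm of the realized sample noise.
n_S = y[S] - x[S]
M = np.linalg.inv(U_Sr.T @ W @ U_Sr) @ U_Sr.T @ W
bound = np.linalg.svd(M, compute_uv=False)[0] * np.linalg.norm(n_S)
```

Since the columns of U_K are orthonormal, the error in the vertex domain equals the error in the frequency domain, so the realized error never exceeds the bound.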

Remark. Compared to the noiseless scenario where the main challenge is to ensure C U_K is invertible, in the presence of noise a sampling scheme with the lowest reconstruction error is desired. Although Proposition 1 states a performance bound for any sampling matrix constructed by Algorithm 1, the statistics of the noise are not exploited in the process of constructing C. This issue also arises in state-of-the-art random selection sampling and aggregation sampling schemes [10, 16, 12], where one resorts to an exhaustive search over the space of all sampling matrices to find one that results in the lowest mean-square error. In the Bayesian setting studied in Section IV, where one assumes a prior distribution on x̄_K, the original signal can be reconstructed up to an error term for any sampling set S. Therefore, invertibility is not a concern in the Bayesian case, and there we focus on constructing a sampling set with the lowest reconstruction error.

## IV Bayesian Sampling of Graph Signals

So far we have considered the problem of sampling in scenarios where the graph signal is not stochastic. In this section, we consider the problem of sampling and interpolation in a Bayesian setting where the graph signal is a non-stationary network process. To this end, we adopt the following definition of stationarity, recently proposed in [21].

###### Definition 3.

A zero-mean stochastic graph signal x is graph wide-sense stationary (GWSS) if and only if the matrix

 U^⊤ E[x x^⊤] U (6)

is diagonal.

In addition to our novel algorithmic contributions, the setting we consider in this section is more general than those considered in [14, 35, 36, 37]. Specifically, in contrast to the prior work in [14], we assume that the signal is not necessarily stationary with respect to 𝒢 and that x̄_K is a zero-mean random vector with (generally non-diagonal) covariance matrix P = E[x̄_K x̄_K^⊤]. Furthermore, we do not restrict our study to the case of additive white noise. Rather, we consider a more practical setting where the noise coefficients are independent but the power of the noise differs across the individual nodes of the graph. That is, if y = x + n denotes the noise-corrupted signal, n is the zero-mean noise vector with covariance matrix Q = diag(σ₁², …, σ_N²). Note that this particular scenario is not explored in the prior work [14] or the related sensor selection and experimental design schemes [35, 36, 37].

Let S ⊆ 𝒩 denote a sampling set of m graph nodes. Since x = U_K x̄_K, the samples y_S and the nonzero frequency components x̄_K of x are related via the Bayesian linear model

 y_S = U_{S,r} x̄_K + n_S. (7)

As before, in order to find x̂ it suffices to estimate x̄_K based on y_S. The least mean-square estimator of x̄_K, denoted by x̄̂_K, satisfies the Bayesian counterparts of the normal equations in the Gauss-Markov theorem (see, e.g., [34, Ch. 10]). Accordingly, it is given by

 x̄̂_K = Σ̄_S U_{S,r}^⊤ Q_S^{−1} y_S, (8)

where

 Σ̄_S = (P^{−1} + ∑_{j∈S} (1/σ_j²) u_j u_j^⊤)^{−1} (9)

is the error covariance matrix of x̄̂_K. Therefore, x̂ = U_K x̄̂_K, and its error covariance matrix can be obtained as Σ_S = U_K Σ̄_S U_K^⊤.
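For a fixed sampling set, the estimator (8), the error covariance (9), and the resulting MSE can be computed directly; the prior covariance P, noise variances, and sampling set below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

N, k, m = 12, 3, 6
A = rng.random((N, N))
A = (A + A.T) / 2
_, U = np.linalg.eigh(A)
U_K = U[:, :k]

# Prior on the frequency components: zero mean, non-diagonal covariance P.
G = rng.standard_normal((k, k))
P = G @ G.T + 0.1 * np.eye(k)

sigma2 = rng.uniform(0.01, 0.05, size=N)   # per-node noise variances
S = rng.choice(N, size=m, replace=False)   # an arbitrary sampling set
U_Sr = U_K[S, :]
Q_S_inv = np.diag(1.0 / sigma2[S])

# Error covariance (9) and the LMMSE estimate (8).
Sigma_bar = np.linalg.inv(np.linalg.inv(P) + U_Sr.T @ Q_S_inv @ U_Sr)

x_bar = np.linalg.cholesky(P) @ rng.standard_normal(k)   # draw the prior signal
y_S = U_Sr @ x_bar + rng.standard_normal(m) * np.sqrt(sigma2[S])
x_bar_hat = Sigma_bar @ U_Sr.T @ Q_S_inv @ y_S
x_hat = U_K @ x_bar_hat                    # interpolated signal
mse = np.trace(U_K @ Sigma_bar @ U_K.T)    # Tr(Σ_S), the objective in (10)
```

Since U_K has orthonormal columns, the trace of the vertex-domain error covariance equals the trace of (9), and observing any samples strictly reduces the prior uncertainty Tr(P).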

The problem of sampling for near-optimal reconstruction can now be formulated as the task of choosing S so as to minimize the mean square error (MSE) of the estimator x̂. Since the MSE is defined as the trace of the error covariance matrix, we obtain the following optimization problem,

 min_S Tr(Σ_S) s.t. S ⊆ 𝒩, |S| ≤ m. (10)

Using the cyclic property of the trace and the fact that U_K^⊤ U_K = I, (10) simplifies to

 min_S Tr(Σ̄_S) s.t. S ⊆ 𝒩, |S| ≤ m. (11)

The optimization problem (11) is NP-hard, and evaluating all possibilities to find the exact solution is intractable even for relatively small graphs. Instead, we propose an alternative that finds a near-optimal solution in polynomial time. In [14], following the greedy sensor selection approach of [18, 19], a greedy algorithm is proposed for the described Bayesian setting and its performance is analyzed under the assumption that the graph signal is stationary and the noise is white. In applications dealing with extremely large graphs, the greedy algorithm proposed in [14, 18, 19] might be computationally infeasible. Moreover, the graph signal is not necessarily stationary and, perhaps more importantly, different nodes of the graph experience different levels of noise. To address these challenges, motivated by the algorithm recently developed in [38] for maximization of strictly submodular functions, we develop a randomized-greedy algorithm for Bayesian sampling of graph signals that is significantly faster than the greedy algorithm. In addition, by leveraging the notion of weak submodularity, we establish performance bounds for the general setting of non-stationary graph signals.

### IV-A Randomized-greedy selection sampling

Following [18, 19, 14], we start by formulating (11) as a set function maximization task. Let f(S) := Tr(P) − Tr(Σ̄_S). Then (11) can equivalently be written as

 max_S f(S) s.t. S ⊆ 𝒩, |S| ≤ m. (12)

In Proposition 2 below, by applying the matrix inversion lemma [39] we establish that f is monotone and weakly submodular. Moreover, we derive an efficient recursion to find the marginal gain of adding a new node to the sampling set S.

###### Proposition 2.

f is a weakly submodular, monotonically increasing set function with f(∅) = 0, and for all j ∈ 𝒩∖S it holds that

 f(S∪{j}) − f(S) = (u_j^⊤ Σ̄_S² u_j)/(σ_j² + u_j^⊤ Σ̄_S u_j), and (13)
 Σ̄_{S∪{j}} = Σ̄_S − (Σ̄_S u_j u_j^⊤ Σ̄_S)/(σ_j² + u_j^⊤ Σ̄_S u_j). (14)
###### Proof.

See Appendix C. ∎

Proposition 2 enables efficient construction of the sampling set in an iterative fashion. To further reduce the computational cost, we propose a randomized-greedy algorithm for selection sampling with minimal MSE that selects a sampling set in the following way. Starting with S = ∅, at each iteration of the algorithm a subset R of size s is sampled uniformly at random and without replacement from 𝒩∖S. The marginal gain of each node in R is found using (13), and the node with the highest marginal gain is added to S. Then, the algorithm employs (14) to update Σ̄_S for the subsequent iteration. This procedure is repeated until a stopping criterion, e.g., a condition on the cardinality of S, is met. Regarding the size of R, we follow the suggestion in [38] and set s = (N/m) log(1/ε), where ε is a predetermined parameter that controls the trade-off between the computational cost and the MSE of the reconstructed signal; the randomized-greedy algorithm with smaller ε produces sampling solutions with lower MSE, while larger ε incurs lower computational cost. Note that if ε = e^{−m}, then s = N and the randomized-greedy algorithm in each iteration considers all the available nodes, hence matching the greedy scheme proposed in [18, 19, 14]. However, as we illustrate in our simulation studies, the proposed randomized-greedy algorithm is significantly faster than the greedy method in [18, 19, 14] for larger ε while returning essentially the same sampling solution. The randomized-greedy algorithm is formalized as Algorithm 2.
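The randomized-greedy procedure can be sketched as follows, with marginal gains computed via (13) and the covariance updated via the rank-one recursion (14); the problem data and the choice of ε are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem data.
N, k, m = 30, 4, 8
A = rng.random((N, N))
A = (A + A.T) / 2
_, U_full = np.linalg.eigh(A)
U = U_full[:, :k]                          # rows u_j^T of the reduced basis
G = rng.standard_normal((k, k))
P = G @ G.T + 0.1 * np.eye(k)              # prior covariance of the GFT
sigma2 = rng.uniform(0.01, 0.05, size=N)   # per-node noise variances

def randomized_greedy(U, P, sigma2, m, eps, rng):
    N = U.shape[0]
    s = int(np.ceil(N / m * np.log(1 / eps)))   # size of random candidate set
    S, Sigma = [], P.copy()                     # error covariance starts at P
    for _ in range(m):
        rest = np.setdiff1d(np.arange(N), S)
        R = rng.choice(rest, size=min(s, len(rest)), replace=False)
        # Marginal gains via (13): (u^T Sigma^2 u) / (sigma^2 + u^T Sigma u).
        SU = Sigma @ U[R].T                     # one column per candidate
        gains = np.sum(SU * SU, axis=0) / (sigma2[R] + np.sum(U[R].T * SU, axis=0))
        j = int(R[np.argmax(gains)])
        # Rank-one update of the error covariance via (14).
        v = Sigma @ U[j]
        Sigma = Sigma - np.outer(v, v) / (sigma2[j] + U[j] @ v)
        S.append(j)
    return S, Sigma

S, Sigma = randomized_greedy(U, P, sigma2, m, eps=0.1, rng=rng)
mse = np.trace(Sigma)                      # Tr of the error covariance, as in (11)
```

Each iteration touches only s candidates rather than all N, which is the source of the speedup over plain greedy; setting eps = e^{-m} makes the candidate set the whole ground set and recovers the greedy method.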

### IV-B Theoretical analysis

To begin the performance analysis of the proposed randomized-greedy algorithm, we first state Lemma 1 [40], which upper-bounds the difference between the values of the objective corresponding to two sets having different cardinalities.

###### Lemma 1.

[40] Let f denote a monotone set function with maximum element-wise curvature C_f. Let S and T be any two sampling sets such that S ⊆ T with |T∖S| = r. Then, it holds that

 f(T) − f(S) ≤ C(r) ∑_{j∈T∖S} f_j(S), (15)

where C(r) is a constant determined by C_f and r [40]. Moreover, C(r) is decreasing (increasing) with respect to r if C_f ≤ 1 (C_f > 1).

###### Lemma 2.

(Weyl’s inequality [39]) Let A and B be two real positive definite matrices. Then it holds that

 λ_max(A + B) ≤ λ_max(A) + λ_max(B), (16)

where λ_max(·) denotes the largest eigenvalue of its argument.

Theorem 2 below states that if f is characterized by a bounded maximum element-wise curvature, Algorithm 2 returns a sampling subset yielding an MSE that is, on average, within a multiplicative factor of the MSE associated with the optimal sampling set.

###### Theorem 2.

Let C_max denote the maximum element-wise curvature of f, the objective function in (12), and let α ∈ (0, 1) and c > 0 be the constants determined by C_max, m, and ε that are specified in Appendix D. Let S_rg be the sampling set returned by the randomized-greedy algorithm and let O denote the optimal solution of (11). Then

 E[Tr(Σ̄_{S_rg})] ≤ α Tr(Σ̄_O) + (1 − α) Tr(P). (17)
###### Proof.

The proof of Theorem 2 relies on the argument that if s = (N/m) log(1/ε), then with high probability the random set R in each iteration of Algorithm 2 contains at least one node from the optimal sampling set. See Appendix D for the complete proof. ∎

Compared to the results of [38], where the maximization of strictly submodular and monotone functions is considered, Theorem 2 relaxes the submodularity assumption and shows that submodularity is not required for near-optimal performance of the randomized-greedy algorithm. Rather, if the set function is weakly submodular, Algorithm 2 still selects a sampling set with an MSE near that of the optimal sampling set. In addition, even when the objective function is submodular, the approximation factor in Theorem 2 is tighter than that of [38] as a result of the analysis presented in the proof of Theorem 2. Moreover, a major assumption in [38] is that R is constructed by sampling with replacement. In contrast, we assume R is constructed by sampling without replacement and carry out the analysis in this setting.

Next, we study the performance of the randomized-greedy algorithm using tools from probably approximately correct (PAC) learning theory [41, 42]. That is, not only is the sampling set selected by Algorithm 2 near-optimal in expectation, but the MSE associated with the selected sampling set is, with high probability, close to the smallest achievable MSE. The randomization of Algorithm 2 can be interpreted as an approximation of the marginal gains of the nodes selected by the greedy scheme [14, 18, 19]. More specifically, in the i-th iteration the marginal gain of the node selected by Algorithm 2 is an η_i-fraction of the marginal gain of the node that the greedy algorithm in [14, 18, 19] would select, where η_i, i ∈ [m], are random variables. Following this argument and employing the Bernstein inequality [43], we arrive at Theorem 3, which states that the randomized-greedy algorithm selects a near-optimal sampling set with high probability.

###### Theorem 3.

Instate the notation and hypotheses of Theorem 2. Assume {η_i}_{i=1}^m is a collection of random variables such that E[η_i] ≥ μ for all i ∈ [m]. Then, it holds that

 Tr(Σ̄_{S_rg}) ≤ (1 − e^{−∑_{i=1}^m η_i/(mc)}) Tr(Σ̄_O) + e^{−∑_{i=1}^m η_i/(mc)} Tr(P). (18)

Moreover, if {η_i}_{i=1}^m are independent, then with probability at least 1 − q it holds that

 Tr(Σ̄_{S_rg}) ≤ (1 − e^{−(1−q)με/c}) Tr(Σ̄_O) + e^{−(1−q)με/c} Tr(P) (19)

for some q ∈ (0, 1).

###### Proof.

See Appendix E. ∎

In our simulation studies (see Section VI), we empirically verify the results of Theorems 2 and 3 and illustrate that Algorithm 2 performs favorably compared to the competing greedy scheme both on average and for each individual sampling task.

Finally, in Theorem 4 we extend the results of [14] derived for stationary graph signals and show that the maximum element-wise curvature of f is bounded even for non-stationary graph signals and in scenarios where the statistics of the noise vary across the nodes of the graph.

###### Theorem 4.

Let C_max be the maximum element-wise curvature of f. Then it holds that

 C_max ≤ max_{j∈𝒩} (λ²_max(P)/λ²_min(P)) (1 + λ_max(P)/σ_j²)³. (20)
###### Proof.

See Appendix F. ∎

It was shown in [14] that if x is stationary and the noise is white, i.e., σ_j² = σ² for some σ > 0 and all j ∈ 𝒩, then the curvature of the MSE objective is bounded. Theorem 4, however, holds even in scenarios where the signal is non-stationary and the noise is not white.

## V Efficient Support Recovery

In the previous sections, we assumed that the support K of the bandlimited graph signal (that is, the set of nonzero frequency components) is known. However, in many practical settings, the support of the signal might be unknown and one needs to recover it prior to, or concurrently with, the sampling step. In this section, we address this task and provide a novel framework to efficiently identify the support.

Existing selection and aggregation sampling methods capable of support identification (e.g., [12]) require twice as many samples as the bandwidth of the graph signal (i.e., m = 2k) for perfect recovery of K and hence exact interpolation of x from the sampled signal x̃. Furthermore, the conditions needed to ensure exactness of aggregation sampling may be challenging to meet in practical settings where the graph is arbitrary. See Fig. 1 for an illustrative example.

Here we propose a novel approach to recover the support for a broad range of graphs. We then combine support recovery and the sampling schemes of the previous sections into an adaptive graph sampling framework with support identification.

### V-a Support recovery from historical signals

Suppose that for a given graph and its Fourier basis $\mathbf{V}$, the observed signals measured at all of the vertices are collected in a matrix $\mathbf{Y}\in\mathbb{R}^{N\times P}$. The goal is to perform support recovery of the underlying bandlimited signals, which involves estimating their sparse GFTs. We assume that $\mathbf{V}$ has full rank and that the GFTs share a common support. The latter assumption means that the matrix of GFTs follows a block-sparse model whose rows are either entirely zero or entirely nonzero. Recall that $\mathcal{K}$ is the shared support set of the GFTs, with $|\mathcal{K}|=K$. Then, upon defining

$$\Psi_{\mathcal{K}}:=\left\{\bar{\mathbf{X}}=[\bar{\mathbf{x}}_{1},\cdots,\bar{\mathbf{x}}_{P}]\in\mathbb{R}^{N\times P}\ \text{such that the $i$-th row of }\bar{\mathbf{X}}\text{ is }\mathbf{0}\ \text{for all}\ i\notin\mathcal{K}\right\}, \qquad (21)$$

the bandlimited signal recovery problem (and thus also finding the support $\mathcal{K}$) boils down to solving

$$\min_{\bar{\mathbf{X}}}\ \|\bar{\mathbf{X}}-\bar{\mathbf{Y}}\|_{F}^{2}\quad \text{s.t.}\quad \bar{\mathbf{X}}\in\Psi_{\mathcal{K}}, \qquad (22)$$

where $\bar{\mathbf{Y}}:=\mathbf{V}^{-1}\mathbf{Y}$ collects the GFTs of the observed signals. Note that support recovery is a byproduct of solving (22). Inspired by [44], we use the $\ell_{2,0}$ mixed norm to reformulate (22) as

$$\min_{\bar{\mathbf{X}}}\ \|\bar{\mathbf{X}}-\bar{\mathbf{Y}}\|_{F}^{2}\quad \text{s.t.}\quad \|\bar{\mathbf{X}}\|_{2,0}\leq K, \qquad (23)$$

where the $\ell_{p,q}$ mixed norm of a matrix is defined in terms of its rows $\bar{\mathbf{x}}_{i}$ as

$$\|\bar{\mathbf{X}}\|_{p,q}:=\left(\sum_{i=1}^{N}\|\bar{\mathbf{x}}_{i}\|_{p}^{q}\right)^{1/q}. \qquad (24)$$

The solution of (23) is obtained by row-wise $\ell_{2}$-norm thresholding of $\bar{\mathbf{Y}}$. Specifically, upon calculating the $\ell_{2}$-norms of the rows of $\bar{\mathbf{Y}}$ (i.e., $\|\bar{\mathbf{y}}_{i}\|_{2}$), we identify the $K$-th largest such norm, denoted $\zeta$. Then, the solution $\bar{\mathbf{X}}^{\star}$ has rows given by

$$\bar{\mathbf{x}}_{i}^{\star}=\begin{cases}\bar{\mathbf{y}}_{i}&\|\bar{\mathbf{y}}_{i}\|_{2}\geq\zeta,\\ \mathbf{0}&\text{otherwise}.\end{cases} \qquad (25)$$

The running time of finding the solution to (23) according to (25) is $\mathcal{O}(NP+N\log N)$: computing the $\ell_{2}$-norms of the $N$ rows of $\bar{\mathbf{Y}}$ takes $\mathcal{O}(NP)$ operations, after which an off-the-shelf sorting algorithm (e.g., merge sort) finds $\zeta$ in $\mathcal{O}(N\log N)$.
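The thresholding rule (25) can be sketched in a few lines of numpy (our illustration; the function name `recover_support` and the toy data are not from the paper):

```python
import numpy as np

def recover_support(Y_bar, K):
    """Solve (23) by row-wise l2-norm thresholding: keep the K rows of
    Y_bar with largest l2 norm and zero out the rest. Returns the
    thresholded matrix and the estimated support (row indices)."""
    row_norms = np.linalg.norm(Y_bar, axis=1)      # O(N*P)
    support = np.sort(np.argsort(row_norms)[-K:])  # K largest rows, O(N log N)
    X_star = np.zeros_like(Y_bar)
    X_star[support] = Y_bar[support]
    return X_star, support

# Toy example: N = 6 rows, P = 4 signals, true support {1, 4}.
rng = np.random.default_rng(0)
Y = 0.01 * rng.standard_normal((6, 4))    # small noise on all rows
Y[[1, 4]] += rng.standard_normal((2, 4))  # energetic rows on the support
X_hat, K_hat = recover_support(Y, K=2)
print(K_hat)  # → [1 4]
```

The sort-based selection mirrors the complexity accounting above: one pass over the rows for the norms, then one sort to locate the threshold $\zeta$.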

Analyzing support identifiability is challenging in the presence of noise. To render the analysis tractable, we assume an energy-constrained noise model, i.e., we impose the constraint that the noise energy does not exceed a threshold $\delta$. In the following theorem, we provide sufficient conditions for exact recovery of the support from noisy observations under this model.

###### Theorem 5.

Consider the support recovery of bandlimited graph signals with unknown shared support $\mathcal{K}$, where $|\mathcal{K}|=K$. Let $\mathbf{y}_{p}$ denote the noisy measurement of the $p$-th graph signal, where $\mathbf{n}_{p}$ models the noise. Assume $\mathbf{V}$ is orthogonal and the noise has bounded norm; in particular, $\|\mathbf{n}_{p}\|_{2}\leq\delta$ for all $p$. Then the GFTs and, as a byproduct, the support $\mathcal{K}$, are identifiable by solving (22) if

$$\min_{i\in\mathcal{K}}\|\bar{\mathbf{x}}_{i}\|_{2}>2\delta\sqrt{P}. \qquad (26)$$
###### Proof.

See Appendix G. ∎

Inspection of (26) shows that in the noiseless scenario (i.e., $\delta=0$), the support is identifiable even for $P=1$. In the noisy case with $P=1$, if the smallest nonzero GFT component of that single sampled graph signal exceeds $2\delta$, the proposed framework can still perfectly identify the unknown support. Moreover, in the presence of noise, it is conceivable that as $P$ grows the chance of (26) being satisfied increases, so the support is rendered identifiable as the number of observations grows; see also the numerical tests in Section VI.

### V-B Joint sampling and support recovery

In Section V-A we demonstrated that if (26) is met, the support is identifiable from historical samples of the graph signal even in the presence of noise. After recovering the support, one can employ Algorithm 1 (Algorithm 2) to construct a sampling set in the non-Bayesian (Bayesian) scenario. In this section, we consider the noisy scenario where the sufficient condition of Section V-A is not satisfied. Since exact identification of the support is then not guaranteed, we propose an alternating scheme that jointly identifies the support and constructs the sampling set in an iterative fashion. Specifically, in the first iteration we initialize the sampling set and identify an approximate support from the historical samples according to the results of Section V-A, i.e., we solve

$$\hat{\bar{\mathbf{X}}}=\operatorname*{arg\,min}_{\|\bar{\mathbf{X}}\|_{2,0}\leq K}\ \|\bar{\mathbf{X}}-\mathbf{V}_{\mathcal{S}_{0},r}^{-1}\mathbf{X}\|_{F}^{2} \qquad (27)$$

and set the estimated support to the row support of $\hat{\bar{\mathbf{X}}}$. Then, we run either Algorithm 1 or Algorithm 2 on the identified support to construct a sampling set in the non-Bayesian or Bayesian setting, respectively. This procedure continues until convergence or until a stopping criterion is met, e.g., reaching a predetermined maximum number of iterations. The alternating scheme is formalized as Algorithm 3. A theoretical analysis of the convergence properties of Algorithm 3 is of interest and is the subject of future work.
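Under stated assumptions, the alternating scheme can be sketched as follows. This is a hypothetical reading of Algorithm 3: the initialization from all nodes, the support-unchanged convergence test, and the helper names are our assumptions, with `select_samples` standing in for Algorithm 1 or Algorithm 2:

```python
import numpy as np

def joint_support_and_sampling(Y, V, K, select_samples, max_iter=10):
    """Alternate between (i) support recovery by row thresholding of the
    GFT of the currently sampled signals and (ii) sampling-set
    construction for the recovered support (via `select_samples`)."""
    S = np.arange(V.shape[0])   # start by sampling all nodes (assumption)
    support_prev = None
    for _ in range(max_iter):
        # Support step: GFT of the sampled signals, then keep K largest rows.
        Y_bar = np.linalg.pinv(V[S]) @ Y[S]
        row_norms = np.linalg.norm(Y_bar, axis=1)
        support = np.sort(np.argsort(row_norms)[-K:])
        # Sampling step: rebuild the sampling set for this support.
        S = select_samples(V[:, support], K)
        if support_prev is not None and np.array_equal(support, support_prev):
            break               # converged: support unchanged
        support_prev = support
    return S, support
```

Any sampling routine with the signature `select_samples(U_r, K) -> node indices` can be plugged in for the sampling step.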

## VI Numerical Simulations

To assess the proposed support recovery and sampling algorithms, we study their performance in recovery of signals supported on synthetic and real-world graphs.

### VI-A Synthetic Erdős-Rényi random graphs

We first consider undirected Erdős-Rényi random graphs of size with edge probability and [45]. Bandlimited graph signals are generated by forming using randomly selected eigenvectors of the graph adjacency matrix, where or . The non-zero frequency components are drawn independently from a zero-mean Gaussian distribution with standard deviation . Measurements are formed by corrupting the signals with additive Gaussian noise having dB power. Assuming unknown frequency support, we test the accuracy of support recovery via the proposed framework (23), (25), using the measurements collected across all nodes. Fig. 2(a) shows the support recovery error – measured by the ratio of the number of elements common to the true support and the identified frequency support to the bandwidth () – as a function of the number of observations (). The results are obtained by averaging over Monte-Carlo simulations. We observe that as increases, the recovery error decreases monotonically [cf. Theorem 5] for different degrees of connectivity () and bandwidths (). The recovery performance is not impacted by varying edge probabilities since we only need $\mathbf{V}$ to be full rank, which is satisfied for all the undirected graphs considered in this study. Moreover, since with higher bandwidth the energy in the GFT components and the chance of satisfying (26) increase, the support recovery task becomes easier, as predicted by Theorem 5.

Next, we consider the task of sampling and reconstruction of noise-corrupted bandlimited graph signals with known support. Specifically, we consider undirected Erdős-Rényi random graphs of size and edge probability . We generate by forming using the first eigenvectors of the graph adjacency matrix, where is varied linearly from to . The non-zero frequency components are drawn independently from a zero-mean Gaussian distribution with standard deviation . The signal is corrupted by a Gaussian noise term with . We compare the recovery performance of the proposed scheme in Algorithm 1 with state-of-the-art uniform, leverage score, and optimal random sampling schemes [10, 15, 16]. We define the recovery error as the ratio of the error energy to the true signal’s energy. Furthermore, the success rate [10] is defined as the fraction of instances where is invertible [cf. (1)]. The results, averaged over 100 independent instances, are shown in Fig. 2(b). As seen in Fig. 2(b) (top), the proposed scheme consistently achieves a lower recovery error than the competing schemes. Moreover, as shown in Fig. 2(b) (bottom), as the bandwidth increases the success rate of the random sampling schemes decreases, while the success rate of the proposed scheme is always one, as formally established in Theorem 1.

### VI-B Real graph: interpolation of industrial sectors’ production

Next, we analyze data from the Bureau of Economic Analysis of the U.S. Department of Commerce, which publishes an annual input-output table organized by economic sectors (dataset from https://www.bea.gov). Specifically, we represent industrial sectors, as defined by the North American Industry Classification System, by nodes, and construct weighted edges and the graph signal similar to [12]. The (undirected) edge weight between two nodes represents the average total production of the two sectors, with one sector's output used as input to the other, expressed in trillions of dollars per year. This edge weight is averaged over the years , , and . Also, two artificial nodes are connected to all nodes, representing the added value generated and the level of production destined to the market of final users. Thus, the final graph has nodes. Weights lower than are thresholded to zero, and the eigenvalue decomposition of the corresponding adjacency matrix is performed. The graph signal can be regarded as the unidimensional total production – in trillions of dollars – of each sector during the year 2011. The signal is shown to be approximately (low-pass) bandlimited in [12, Fig. 4(a) (top)] with a bandwidth of .

We interpolate the sectors’ production by observing a few nodes using Algorithm 1 and assuming that the signal is low-pass (i.e., with smooth variations over the built network). We then vary the sample size and compare the recovery performance of the proposed scheme with the state-of-the-art uniform, leverage score, and optimal random sampling schemes [10, 15, 16], averaged over Monte-Carlo simulations, as shown in Fig. 2(c) (top). As the figure indicates, the proposed algorithm outperforms all three random sampling schemes. However, Algorithm 1 does not achieve perfect recovery in this noiseless scenario because the signal is not truly bandlimited. Moreover, Fig. 2(c) (bottom) shows a realization of the graph signal superimposed with the reconstructed signal obtained using Algorithm 1 for all nodes excluding the two artificial ones. The recovery error of the reconstructed signal is approximately ; as Fig. 2(c) (bottom) illustrates, the reconstruction closely approximates the true signal.

### VI-C Synthetic graph: Localization of UAVs

We now tackle a UAV localization problem in which the goal is to estimate the absolute positions of robots from on-board sensor measurements. Specifically, consider a network of UAVs moving in a 2D plane and assume that each UAV is equipped with two systems: a laser scanner that measures the relative position of other UAVs within a sensing radius, and a GPS system that finds the absolute position of the UAV. While the laser system can find the relative position of nearby UAVs with minimal power consumption, the GPS system requires intensive power to receive the location of the UAV from a control unit located potentially far from the network. Due to this intrinsic power constraint, we consider the case in which energy constraints prevent all UAVs from collecting GPS data, i.e., only a subset of the UAVs can use GPS. The objective is to select the most representative subset of the UAVs so as to minimize the mean-squared error of the estimated global positions of all UAVs. To this end, we employ the proposed randomized-greedy scheme in Algorithm 2 with various values of $\epsilon$ to find a sampling set (a subset of the UAVs) and compare its recovery error to that of the greedy sampling scheme [14, 18, 19]. Note that two graph signals, namely the x and y coordinates of the UAVs, are supported on the network. Further, since UAVs that are close to each other have similar locations, both graph signals are smooth and hence bandlimited.
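The randomized-greedy idea can be sketched generically. This is our illustration, not the paper's exact Algorithm 2: the candidate-set size and the `gain` oracle (marginal gain of adding node `j` to set `S`) are assumptions:

```python
import math
import numpy as np

def randomized_greedy(ground_set, K, gain, eps=0.1, rng=None):
    """At each of K steps, score only a small random candidate set of
    size ~ (N/K) * log(1/eps) instead of all remaining elements, and add
    the candidate with the largest marginal gain gain(S, j)."""
    rng = rng if rng is not None else np.random.default_rng()
    N = len(ground_set)
    s = min(N, math.ceil((N / K) * math.log(1.0 / eps)))
    S, remaining = [], list(ground_set)
    for _ in range(K):
        # Random candidate subset of the not-yet-selected elements.
        cand = rng.choice(len(remaining), size=min(s, len(remaining)),
                          replace=False)
        best = max(cand, key=lambda i: gain(S, remaining[i]))
        S.append(remaining.pop(best))
    return S
```

With `eps` small enough that the candidate-set size reaches `N`, the procedure reduces to plain greedy selection; larger `eps` trades accuracy for speed, consistent with the runtime comparison reported below.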

We run Monte Carlo simulations with 1000 instances in which we consider 1000 UAVs distributed uniformly in a grid; the range of the laser system is set to 0.3 and the power of the noise affecting the laser measurements is set to . The recovery error and running time as a function of the signals’ bandwidth – which is also the size of the sampling set – are shown in Fig. 4(a) and Fig. 4(b), respectively. As we see in Fig. 4(a), the performance of the proposed scheme and the greedy algorithm are fairly similar, and as the bandwidth increases the recovery error decreases. Furthermore, as $\epsilon$ gets smaller, the gap between the performance of the proposed scheme and the greedy algorithm shrinks until it becomes negligible. The running time comparison in Fig. 4(b) reveals that for the largest sampling set considered (i.e., ), the proposed scheme is more than 2x faster than the greedy method. Additionally, the complexity of the proposed scheme is linear in , while that of the greedy method is quadratic, as predicted by our theoretical results.

### VI-D Real graph: Semi-supervised face clustering

Clustering faces of individuals is an important task in computer vision [46, 47, 48]. In real-world settings, labeling all images is practically infeasible; however, acquiring labels for even a small subset of data that represents all images may drastically improve clustering accuracy. The proposed randomized-greedy selection sampling framework can be employed in this setting to acquire labels for a small number of images and thus achieve improved clustering accuracy. To this end, we test the randomized-greedy algorithm on the EYaleB dataset [46] (see Fig. 3), which contains frontal face images of 38 individuals under 64 different illumination conditions. Similar to prior works (see, e.g., [47, 48]), in our studies the images are down-sampled to from the original size of . In each of independent instances of the Monte Carlo simulation, we randomly pick subjects and all of their images as the data points to be clustered; this results in a clustering problem with data points. To construct the underlying graph and capture the similarity of data points, we employ the sparse subspace clustering (SSC) scheme recently proposed in [47] to find the adjacency matrix and the Laplacian matrix . The graph signal supported on the constructed similarity graph is discrete-valued, and the value of each node is an integer in . This graph signal is smooth and bandlimited, as similar images are unlikely to correspond to different individuals. The performance of Algorithm 2 with various values of $\epsilon$, the greedy sampling method, random sampling schemes, and the unsupervised clustering method is compared in Fig. 4(c) as a function of the sampling ratio (). For clarity of presentation, we only show the result of the best method among the uniform, leverage score, and optimal random sampling approaches [10, 16]. As we see in Fig. 4(c), the greedy and randomized-greedy schemes deliver the best clustering performance; as the size of the sampling set increases, the accuracy of the semi-supervised schemes improves and the gap between the random sampling methods and the proposed scheme decreases. Furthermore, our simulation studies reveal that acquiring labels for only 8 data points using the proposed scheme results in more than improvement in clustering accuracy as compared to the unsupervised method.

## VII Acknowledgment

We would like to thank the authors of [12] for providing the data used for the economy network analysis.

## VIII Conclusion

We considered the task of sampling and reconstruction of bandlimited graph signals, where the objective is to select a node subset of prescribed cardinality that enables interpolation of the original signal with the lowest reconstruction error. We first studied the non-Bayesian scenario and proposed an efficient iterative sampling approach that exploits the low-cost selection criterion of the orthogonal matching pursuit algorithm to recursively select a subset of nodes of the graph. We showed theoretically that in the noiseless case the original $K$-bandlimited signal is exactly recovered from the set of selected nodes, which has cardinality $K$. In the case of noisy measurements, we established a worst-case performance bound on the reconstruction error of the proposed algorithm. In the Bayesian scenario, where the graph signal is a non-stationary random process, we formulated the sampling task as the problem of maximizing a monotone weak submodular function that is directly related to the mean-square error (MSE) of the linear estimator of the original signal. We proposed a randomized-greedy algorithm to find a sub-optimal subset of sampling nodes, analyzed its performance, and showed that the resulting MSE is a constant factor away from the MSE of the optimal sampling set. Unlike prior work, our guarantees do not require stationarity of the graph signal, and the analysis is not restricted to additive white noise: the noise coefficients are assumed independent, but the noise power may vary across individual nodes of the graph. We further extended these results to the case where the support of the bandlimited signal is unknown. In particular, we demonstrated that, under a certain sufficient condition, the proposed framework ensures exact recovery of signals with unknown support from historical samples of the graph signal. Simulation studies demonstrated the efficacy of the proposed sampling algorithms in comparison to competing alternatives.

## Appendix A Proof of Theorem 1

To prove the theorem, it suffices to show that Algorithm 1 selects a subset of rows of which are linearly independent. Consider the iteration where is identified and assume that until this iteration contains indices of a collection of linearly independent vectors . If , since is orthogonal to the span of , is not in the span of these vectors. Hence, is also a collection of linearly independent vectors and by an inductive argument we conclude that rows of are linearly independent. Now assume for some . Since does not have all-zero rows, this condition implies . Therefore, all the remaining rows of which are not selected lie in the span of . Since by assumption , this condition implies that the rank of is at most which contradicts the fact that is a basis and has full column-rank. Therefore, holds only for and thus rows of are linearly independent. This completes the proof.
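The inductive argument above applies to any selection rule that always picks a row with a nonzero component orthogonal to the span of the rows already chosen. A minimal sketch of such a rule follows; this is our illustrative reading of Algorithm 1's OMP-style criterion, not a verbatim restatement, and `select_rows` is our name:

```python
import numpy as np

def select_rows(U_r, K):
    """Greedily pick K row indices of U_r by choosing, at each step, the
    row with the largest residual after projecting out the span of the
    previously selected rows (Gram-Schmidt deflation)."""
    S, basis = [], []  # basis: orthonormal vectors spanning chosen rows
    for _ in range(K):
        resid = U_r.copy()
        for q in basis:                       # deflate the current span
            resid -= np.outer(resid @ q, q)
        i = int(np.argmax(np.linalg.norm(resid, axis=1)))
        S.append(i)
        basis.append(resid[i] / np.linalg.norm(resid[i]))
    return S

# For a full-column-rank U_r, the selected K x K submatrix is invertible.
rng = np.random.default_rng(1)
U_r = np.linalg.qr(rng.standard_normal((8, 3)))[0]  # 8 x 3, orthonormal columns
S = select_rows(U_r, 3)
print(np.linalg.matrix_rank(U_r[S]))  # → 3
```

Each selected row has a nonzero residual, so by the proof's induction the chosen rows are linearly independent and the selected submatrix is invertible.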

## Appendix B Proof of Proposition 1

According to Theorem 1, if , is invertible. Therefore, since is positive definite and invertible, it is easy to see that is also invertible. Now consider the case where is a tall full-rank matrix. Let be the Cholesky decomposition of . Since is a positive definite matrix, is full rank and invertible. Therefore, is also a full-rank matrix. Thus, is full rank and invertible. Hence, for any given sampling set $\mathcal{S}$ constructed by Algorithm 1, (4) simplifies to

$$\hat{\bar{\mathbf{x}}}=\left(\mathbf{U}_{\mathcal{S},r}^{\top}\mathbf{Q}_{\mathcal{S}}^{-1}\mathbf{U}_{\mathcal{S},r}\right)^{-1}\mathbf{U}_{\mathcal{S},r}^{\top}\mathbf{Q}_{\mathcal{S}}^{-1}\tilde{\mathbf{x}}. \qquad (28)$$

Since , the reconstructed signal can be obtained according to

$$\begin{aligned}\hat{\mathbf{x}}&=\mathbf{U}\left(\mathbf{U}_{\mathcal{S},r}^{\top}\mathbf{Q}_{\mathcal{S}}^{-1}\mathbf{U}_{\mathcal{S},r}\right)^{-1}\mathbf{U}_{\mathcal{S},r}^{\top}\mathbf{Q}_{\mathcal{S}}^{-1}\tilde{\mathbf{x}}\\&=\mathbf{U}\left(\mathbf{U}_{\mathcal{S},r}^{\top}\mathbf{Q}_{\mathcal{S}}^{-1}\mathbf{U}_{\mathcal{S},r}\right)^{-1}\mathbf{U}_{\mathcal{S},r}^{\top}\mathbf{Q}_{\mathcal{S}}^{-1}\left(\mathbf{U}_{\mathcal{S},r}\bar{\mathbf{x}}_{K}+\mathbf{n}_{\mathcal{S}}\right)\\&=\mathbf{x}+\mathbf{U}\left(\mathbf{U}_{\mathcal{S},r}^{\top}\mathbf{Q}_{\mathcal{S}}^{-1}\mathbf{U}_{\mathcal{S},r}\right)^{-1}\mathbf{U}_{\mathcal{S},r}^{\top}\mathbf{Q}_{\mathcal{S}}^{-1}\mathbf{n}_{\mathcal{S}}.\end{aligned} \qquad (29)$$

Therefore,

$$\begin{aligned}\|\hat{\mathbf{x}}-\mathbf{x}\|_{2}&\leq\left\|\mathbf{U}\left(\mathbf{U}_{\mathcal{S},r}^{\top}\mathbf{Q}_{\mathcal{S}}^{-1}\mathbf{U}_{\mathcal{S},r}\right)^{-1}\mathbf{U}_{\mathcal{S},r}^{\top}\mathbf{Q}_{\mathcal{S}}^{-1}\mathbf{n}_{\mathcal{S}}\right\|_{2}\\&\stackrel{(a)}{\leq}\left\|\mathbf{U}\left(\mathbf{U}_{\mathcal{S},r}^{\top}\mathbf{Q}_{\mathcal{S}}^{-1}\mathbf{U}_{\mathcal{S},r}\right)^{-1}\mathbf{U}_{\mathcal{S},r}^{\top}\mathbf{Q}_{\mathcal{S}}^{-1}\mathbf{n}\right\|_{2}\\&\stackrel{(b)}{\leq}\sigma_{\max}\!\left(\mathbf{U}\left(\mathbf{U}_{\mathcal{S},r}^{\top}\mathbf{Q}_{\mathcal{S}}^{-1}\mathbf{U}_{\mathcal{S},r}\right)^{-1}\mathbf{U}_{\mathcal{S},r}^{\top}\mathbf{Q}_{\mathcal{S}}^{-1}\right)\epsilon_{n}\\&\stackrel{(c)}{\leq}\sigma_{\max}(\mathbf{U})\,\sigma_{\max}\!\left(\left(\mathbf{U}_{\mathcal{S},r}^{\top}\mathbf{Q}_{\mathcal{S}}^{-1}\mathbf{U}_{\mathcal{S},r}\right)^{-1}\mathbf{U}_{\mathcal{S},r}^{\top}\mathbf{Q}_{\mathcal{S}}^{-1}\right)\epsilon_{n}\\&\stackrel{(d)}{\leq}\sigma_{\max}\!\left(\left(\mathbf{U}_{\mathcal{S},r}^{\top}\mathbf{Q}_{\mathcal{S}}^{-1}\mathbf{U}_{\mathcal{S},r}\right)^{-1}\mathbf{U}_{\mathcal{S},r}^{\top}\mathbf{Q}_{\mathcal{S}}^{-1}\right)\epsilon_{n}\end{aligned} \qquad (30)$$

where (a) and (b) follow by the assumption $\|\mathbf{n}\|_{2}\leq\epsilon_{n}$, (c) stems from the submultiplicative property of the $\ell_{2}$-norm, and (d) is by the fact that $\sigma_{\max}(\mathbf{U})\leq 1$ as it is a submatrix of an orthogonal matrix.
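The chain of inequalities above bounds the noisy-case error; in the noiseless case, (28)-(29) give exact recovery. This can be verified with a quick numerical check (our sketch; the shapes, the sampling set, and the variable names are assumptions matching the notation):

```python
import numpy as np

rng = np.random.default_rng(2)
N, r = 8, 3
U = np.linalg.qr(rng.standard_normal((N, N)))[0]  # orthogonal Fourier basis
x_bar = rng.standard_normal(r)                    # bandlimited GFT coefficients
x = U[:, :r] @ x_bar                              # bandlimited graph signal
S = [0, 2, 5, 7]                   # any sampling set with U_{S,r} full rank
Q_inv = np.diag(1.0 / rng.uniform(0.5, 2.0, len(S)))  # inverse noise covariance

U_Sr = U[S][:, :r]
x_tilde = x[S]                                    # noiseless samples
# Weighted least-squares estimator, cf. (28)-(29):
x_bar_hat = np.linalg.solve(U_Sr.T @ Q_inv @ U_Sr, U_Sr.T @ Q_inv @ x_tilde)
x_hat = U[:, :r] @ x_bar_hat
print(np.allclose(x_hat, x))  # → True
```

With zero noise the residual term in (29) vanishes, so the estimator returns the signal exactly regardless of the (positive definite) weighting matrix.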

## Appendix C Proof of Proposition 2

We first verify that

$$f(\emptyset)=\operatorname{Tr}\left(\mathbf{P}-\bar{\boldsymbol{\Sigma}}_{\emptyset}\right)=\operatorname{Tr}\left(\mathbf{P}-\mathbf{P}\right)=0.$$

Next, to show monotonicity, we establish a recursive relation for the marginal gain of selecting a new node of the graph. More specifically, for $j\notin\mathcal{S}$ it holds that

$$f_{j}(\mathcal{S})=\operatorname{Tr}\left(\mathbf{P}-\bar{\boldsymbol{\Sigma}}_{\mathcal{S}\cup\{j\}}\right)-\operatorname{Tr}\left(\mathbf{P}-\bar{\boldsymbol{\Sigma}}_{\mathcal{S}}\right) \qquad (31)$$

where the first step follows easily by applying the Sherman–Morrison formula [39], and the simplification is due to properties of the matrix trace. Finally, since $\bar{\boldsymbol{\Sigma}}_{\mathcal{S}}$ is the error covariance matrix, it is symmetric and positive definite. Hence,