# Adaptive Least Mean Squares Estimation

of Graph Signals

###### Abstract

The aim of this paper is to propose a least mean squares (LMS) strategy for adaptive estimation of signals defined over graphs. Assuming the graph signal to be band-limited, over a known bandwidth, the method enables reconstruction, with guaranteed performance in terms of mean-square error, and tracking from a limited number of observations over a subset of vertices. A detailed mean square analysis provides the performance of the proposed method, and leads to several insights for designing useful sampling strategies for graph signals. Numerical results validate our theoretical findings, and illustrate the performance of the proposed method. Furthermore, to cope with the case where the bandwidth is not known beforehand, we propose a method that performs a sparse online estimation of the signal support in the (graph) frequency domain, which enables online adaptation of the graph sampling strategy. Finally, we apply the proposed method to build the power spatial density cartography of a given operational region in a cognitive network environment.

## I Introduction

In many applications that are of current interest, like social networks, vehicular networks, big data or biological networks, the signals of interest are defined over the vertices of a graph [1]. This has motivated the development of tools for analyzing signals defined over a graph, or graph signals for short [1, 2, 3]. Graph signal processing (GSP) aims at extending classical discrete-time signal processing tools to signals defined over a discrete domain whose elementary units (vertices) are related to each other through a graph. This framework subsumes as a very simple special case discrete-time signal processing, where the vertices are associated to time instants and edges link consecutive time instants. A peculiar aspect of GSP is that, since the signal domain is dictated by the graph topology, the analysis tools come to depend on the graph topology as well. This paves the way to a plethora of methods, each emphasizing different aspects of the problem. An important feature to have in mind about graph signals is that the signal domain is not a metric space, as in the case, for example, of biological networks, where the vertices may be genes, proteins, enzymes, etc, and the presence of an edge between two molecules means that those molecules undergo a chemical reaction. This marks a fundamental difference with respect to time signals where the time domain is inherently a metric space. Processing signals defined over a graph has been considered in [2], [4, 5, 6]. A central role in GSP is of course played by spectral analysis of graph signals, which passes through the introduction of the so called Graph Fourier Transform (GFT). Alternative definitions of GFT have been proposed, depending on the different perspectives used to extend classical tools [7], [8], [1], [9], [2]. Two basic approaches are available, proposing the projection of the graph signal onto the eigenvectors of either the graph Laplacian, see, e.g., [7], [1], [9] or of the adjacency matrix, see, e.g. [2], [10]. The first approach applies to undirected graphs and builds on the spectral clustering properties of the Laplacian eigenvectors and the minimization of the norm graph total variation; the second approach was proposed to handle also directed graphs and it is based on the interpretation of the adjacency operator as the graph shift operator, which lies at the heart of all linear shift-invariant filtering methods for graph signals [11], [12]. A further very recent contribution proposes to build the graph Fourier basis as the set of orthonormal signals that minimize the (directed) graph cut size [13].

After the introduction of the GFT, an uncertainty principle for graph signals was derived in [14], [15], [16], [17], [18]. with the aim of assessing the link between the spread of a signal on the vertices of the graph and on its dual domain, as defined by the GFT. A simple closed form expressions for the fundamental tradeoff between the concentrations of a signal in the graph and the transformed domains was given in [18].

One of the basic problems in GSP is the development of a graph sampling theory, whose aim is to recover a band-limited (or approximately band-limited) graph signal from a subset of its samples. A seminal contribution was given in [7], later extended in [19] and, very recently, in [10], [18], [20], [21], [22]. Dealing with graph signals, the recovery problem may easily become ill-conditioned, depending on the location of the samples. Hence, for any given number of samples enabling signal recovery, the identification of the sampling set plays a key role in the conditioning of the recovery problem. It is then particularly important to devise strategies to optimize the selection of the sampling set. Alternative signal reconstuction methods have been proposed, either iterative as in [23], [20], [24], or single shot, as in [10], [18]. Frame-based approaches to reconstruct signals from subsets of samples have been proposed in [7], [20], [18].

The theory developed in the last years for GSP was then applied to solve specific learning tasks, such as semi-supervised classification on graphs [25, 26, 27], graph dictionary learning [28, 29], learning graphs structures [30], smooth graph signal recovery from random samples [31, 32], inpainting [33], denoising [34], and community detection on graphs [35]. Finally, in [36, 37], the authors proposed signal recovery methods aimed to recover graph signals that are assumed to be smooth with respect to the underlying graph, from sampled, noisy, missing, or corrupted measurements.

Contribution: The goal of this paper is to propose LMS strategies for the adaptive estimation of signals defined on graphs. To the best of our knowledge, this is the first attempt to merge the well established theory of adaptive filtering [38], [39], with the emerging field of signal processing on graphs. The proposed method hinges on the graph structure describing the observed signal and, under a band-limited assumption, it enables online reconstruction and tracking from a limited number of observations taken over a subset of vertices. An interesting feature of our proposed strategy is that this subset is allowed to vary over time, in adaptive manner. A detailed mean square analysis illustrates the role of the sampling strategy on the reconstruction capability, stability, and mean-square performance of the proposed algorithm. Based on these results, we also derive adaptive sampling strategies for LMS estimation of graph signals. Several numerical results confirm the theoretical findings, and assess the performance of the proposed strategies. Furthermore, we consider the case where the graph signal is band-limited but the bandwidth is not known beforehand; this case is critical because the selection of the sampling strategy fundamentally depends on such prior information. To cope with this issue, we propose an LMS method with adaptive sampling, which estimates and tracks the signal support in the (graph) frequency domain, while at the same time adapting the graph sampling strategy. Numerical results illustrate the tracking capability of the aforementioned method in the presence of time-varying graph signals. As an example, we apply the proposed strategy to estimate and track the spatial distribution of the electromagnetic power in a cognitive radio framework. The resulting graph signal turns out to be smooth, i.e. the largest part of its energy is concentrated at low frequencies, but it is not perfectly band-limited. As a consequence, recovering the overall signal from a subset of samples is inevitably affected by aliasing [22]. Numerical results show the tradeoff between complexity, i.e. number of samples used for processing, and mean-square performance of the proposed strategy, when applied to such cartography task. Intuitively, processing with a larger bandwidth and a (consequent) larger number of samples, improves the performance of the algorithm, at the price of a larger complexity.

The paper is organized as follows. In Sec. II, we introduce some basic graph signal processing tools, which will be useful for the following derivations. Sec. III introduces the proposed LMS algorithm for graph signals, illustrates its mean-square analysis, and derives useful graph sampling strategies. Then, in Sec. IV we illustrate the proposed LMS strategy with adaptive sampling, while Sec. V considers the application to power density cartography. Finally, Sec. VI draws some conclusions.

## Ii Graph Signal Processing Tools

We consider a graph consisting of a set of nodes , along with a set of weighted edges , such that , if there is a link from node to node , or , otherwise. The adjacency matrix of a graph is the collection of all the weights . The degree of node is . The degree matrix is a diagonal matrix having the node degrees on its diagonal. The Laplacian matrix is defined as:

(1) |

If the graph is undirected, the Laplacian matrix is symmetric and positive semi-definite, and admits the eigendecomposition , where collects all the eigenvectors of in its columns, whereas is a diagonal matrix containing the eigenvalues of . It is well known from spectral graph theory [40] that the eigenvectors of are well suited for representing clusters, since they minimize the norm graph total variation.

A signal over a graph is defined as a mapping from the vertex set to the set of complex numbers, i.e. . In many applications, the signal admits a compact representation, i.e., it can be expressed as:

(2) |

where is exactly (or approximately) sparse. As an example, in all cases where the graph signal exhibits clustering features, i.e. it is a smooth function within each cluster, but it is allowed to vary arbitrarily from one cluster to the other, the representation in (2) is compact, i.e. the only nonzero (or approximately nonzero) entries of are the ones associated to the clusters.

The GFT of a signal is defined as the projection onto the orthogonal set of vectors [1], i.e.

(3) |

The GFT has been defined in alternative ways, see, e.g., [1], [9], [2], [10]. In this paper, we basically follow the approach based on the Laplacian matrix, assuming an undirected graph structure, but the theory could be extended to handle directed graphs with minor modifications. We denote the support of in (2) as , and the bandwidth of the graph signal is defined as the cardinality of , i.e. . Clearly, combining (2) with (3), if the signal exhibits a clustering behavior, in the sense specified above, computing its GFT is the way to recover the sparse vector in (2).

Localization Operators: Given a subset of vertices , we define a vertex-limiting operator as the diagonal matrix

(4) |

where is the set indicator vector, whose -th entry is equal to one, if , or zero otherwise. Similarly, given a subset of frequency indices , we introduce the filtering operator

(5) |

where is a diagonal matrix defined as . It is immediate to check that both matrices and are self-adjoint and idempotent, and so they represent orthogonal projectors. The space of all signals whose GFT is exactly supported on the set is known as the Paley-Wiener space for the set [7]. We denote by the set of all finite -norm signals belonging to the Paley-Wiener space associated to the frequency subset . Similarly, we denote by the set of all finite -norm signals with support on the vertex subset . In the rest of the paper, whenever there will be no ambiguities,, we will drop the subscripts referring to the sets. Finally, given a set , we denote its complement set as , such that and . Similarly, we define the complement set of as . Thus, we define the vertex-projector onto as and, similarly, the frequency projector onto the frequency domain as .

Exploiting the localization operators in (4) and (5), we say that a vector is perfectly localized over the subset if

(6) |

with defined as in (4). Similarly, a vector is perfectly localized over the frequency set if

(7) |

with given in (5). As previously stated, represents the (not necessarily contiguous) bandwidth of the graph signal. The localization properties of graph signals were studied in [22] and later extended in [18] to derive the fundamental trade-off between the localization of a signal in the graph and on its dual domain. An interesting consequence of that theory is that, differently from continuous-time signals, a graph signal can be perfectly localized in both vertex and frequency domains. The conditions for having perfect localization are stated in the following theorem, which we report here for completeness of exposition; its proof can be found in [22].

###### Theorem 1

There is a vector , perfectly localized over both vertex set and frequency set (i.e. ) if and only if the operator (or ) has an eigenvalue equal to one; in such a case, is an eigenvector of associated to the unitary eigenvalue.

Equivalently, the perfect localization properties can be expressed in terms of the operators and . Indeed, since the operators and have the same singular values [18], perfect localization onto the sets and can be achieved if and only if

(8) |

Building on these previous results on GSP, in the next section we introduce the proposed LMS strategy for adaptive estimation of graph signals.

## Iii LMS Estimation of Graph Signals

The least mean square algorithm, introduced by Widrow and Hoff [41], is one of the most popular methods for adaptive filtering. Its applications include echo cancelation, channel equalization, interference cancelation and so forth. Although there exist algorithms with faster convergence rates such as the Recursive Least Square (RLS) methods [38], LMS-type methods are popular because of their ease of implementation, low computational costs and robustness. For these reasons, a huge amount of research was produced in the last decades focusing on improving the performance of LMS-type methods, exploiting in many cases some prior information that is available on the observed signals. For instance, if the observed signal is known to be sparse in some domain, such prior information can help improve the estimation performance, as demonstrated in many recent efforts in the area of compressed sensing [42, 43]. Some of the early works that mix adaptation with sparsity-aware reconstruction include methods that rely on the heuristic selection of active taps [44], and on sequential partial updating techniques [45]; some other methods assign proportional step-sizes to different taps according to their magnitudes, such as the proportionate normalized LMS (PNLMS) algorithm and its variations [46]. In subsequent studies, motivated by the LASSO technique [47] and by connections with compressive sensing [43, 48], several algorithms for sparse adaptive filtering have been proposed based on LMS [49], RLS [50], and projection-based methods [51]. Finally, sparsity aware distributed methods were proposed in [52, 53, 54, 55, 56].

In this paper, we aim to exploit the intrinsic sparsity that is present in band-limited graph signals, thus designing proper sampling strategies that guarantee adaptive reconstruction of the signal, with guaranteed mean-square performance, from a limited number of observation sampled from the graph. To this aim, let us consider a signal defined over the graph . The signal is initially assumed to be perfectly band-limited, i.e. its spectral content is different from zero only on a limited set of frequencies . Later on, we will relax such an assumption. Let us consider partial observations of signal , i.e. observations over only a subset of nodes. Denoting with the sampling set (observation subset), the observed signal at time can be expressed as:

(9) |

where is the vertex-limiting operator defined in (4), which takes nonzero values only in the set , and is a zero-mean, additive noise with covariance matrix . The second equality in (9) comes from the bandlimited assumption, i.e. , with denoting the operator in (5) that projects onto the (known) frequency set . We remark that, differently from linear observation models commonly used in adaptive filtering theory [38], the model in (9) has a free sampling parameter that can be properly selected by the designer, with the aim of reducing the computational/memory burden while still guaranteeing theoretical performance, as we will illustrate in the following sections. The estimation task consists in recovering the band-limited graph signal from the noisy, streaming, and partial observations in (9). Following an LMS approach [41], the optimal estimate for can be found as the vector that solves the following optimization problem:

(10) | |||

where denotes the expectation operator. The solution of problem (10) minimizes the mean-squared error and has a bandwidth limited to the frequency set . For stationary , the optimal solution of (10) is given by the vector that satisfies the normal equations [38]:

(11) |

Exploiting (9), this statement can be readily verified noticing that minimizes the objective function of (10) and is bandlimited, i.e. it satisfies . Nevertheless, in many linear regression applications involving online processing of data, the expectation may be either unavailable or time-varying, and thus impossible to update continuously. For this reason, adaptive solutions relying on instantaneous information are usually adopted in order to avoid the need to know the signal statistics beforehand. A typical solution proceeds to optimize (10) by means of a steepest-descent procedure. Thus, letting be the instantaneous estimate of vector , the LMS algorithm for graph signals evolves as illustrated in Algorithm 1, where is a (sufficiently small) step-size, and we have exploited the fact that is an idempotent operator, and (i.e., is band-limited) for all . Algorithm 1 starts from an initial signal that belongs to the Paley-Wiener space for the set , and then evolves implementing an alternating orthogonal projection onto the vertex set (through ) and the frequency set (through ).

Start with chosen at random. Given a sufficiently small step-size , for each time , repeat:

(12) |

The properties of the LMS recursion in (12) crucially depend on the choice of the sampling set , which defines the structure of the operator [cf. (4)]. To shed light on the theoretical behavior of Algorithm 1, in the following sections we illustrate how the choice of the operator affects the reconstruction capability, mean-square stability, and steady-state performance of the proposed LMS strategy.

### Iii-a Reconstruction Properties

It is well known from adaptive filters theory [38] that the LMS algorithm in (12) is a stochastic approximation method for the solution of problem (10), which enables convergence in the mean-sense to the true vector (if the step-size is chosen sufficiently small), while guaranteing a bounded mean-square error (as we will see in the sequel). However, since the existence of a unique band-limited solution for problem (12) depends on the adopted sampling strategy, the first natural question to address is: What conditions must be satisfied by the sampling operator to guarantee reconstruction of signal from the selected samples? The answer is given in the following Theorem, which gives a necessary and sufficient condition to reconstruct graph signals from partial observations using Algorithm 1.

###### Theorem 2

Problem (10) admits a unique solution, i.e. any band-limited signal can be reconstructed from its samples taken in the set , if and only if

(13) |

i.e. if the matrix does not have any eigenvector that is perfectly localized on and bandlimited on .

Proof. From (11), exploiting the relation , it holds

(14) |

Hence, it is possible to recover from (14) if is invertible. This happens if the sufficient condition (13) holds true. Conversely, if (or, equivalently, ), from (8) we know that there exist band-limited signals that are perfectly localized over . This implies that, if we sample one of such signals over the set , we get only zero values and then it would be impossible to recover from those samples. This proves that condition (13) is also necessary.

A necessary condition that enables reconstruction, i.e. the non-existence of a non-trivial vector satisfying , is that . However, this condition is not sufficient, because matrix in (9) may loose rank, or easily become ill-conditioned, depending on the graph topology and sampling strategy defined by . This suggests that the location of samples plays a key role in the performance of the LMS reconstruction algorithm in (12). For this reason, in Section III.D we will consider a few alternative sampling strategies satisfying different optimization criteria.

### Iii-B Mean-Square Analysis

When condition (13) holds true, Algorithm 1 can reconstruct the graph signal from a subset of samples. In this section, we study the mean-square behavior of the proposed LMS strategy, illustrating how the sampling operator affects its stability and steady-state performance. From now on, we view the estimates as realizations of a random process and analyze the performance of the LMS algorithm in terms of its mean-square behavior. Let be the error vector at time . Subtracting from the left and right hand sides of (12), using (9) and relation , we obtain:

(15) |

Applying a GFT to each side of (15) (i.e., multiplying by ), and exploiting the structure of matrix in (5), we obtain

(16) |

where is the GFT of the error . From (16) and the definition of in (5), since for all , we can analyze the behavior of the error recursion (16) only on the support of , i.e. . Thus, letting be the matrix having as columns the eigenvectors of the Laplacian matrix associated to the frequency indices , the error recursion (16) can be rewritten in compact form as:

(17) |

The evolution of the error in the compact transformed domain is totally equivalent to the behavior of from a mean-square error point of view. Thus, using energy conservation arguments [57], we consider a general weighted squared error sequence , where is any Hermitian nonnegative-definite matrix that we are free to choose. In the sequel, it will be clear the role played by a proper selection of the matrix . Then, from (17) we can establish the following variance relation:

(18) |

where denotes the trace operator, and

(19) |

Let and , where the notation stacks the columns of on top of each other and is the inverse operation. We will use interchangeably the notation and to denote the same quantity . Exploiting the Kronecker product property

and the trace property

in the relation (III-B), we obtain:

(20) |

where

(21) | ||||

(22) |

The following theorem guarantees the asymptotic mean-square stability (i.e., convergence in the mean and mean-square error sense) of the LMS algorithm in (12).

###### Theorem 3

Assume model (9) holds. Then, for any bounded initial condition, the LMS strategy (12) asymptotically converges in the mean-square error sense if the sampling operator and the step-size are chosen to satisfy (13) and

(23) |

with denoting the maximum eigenvalue of the symmetric matrix . Furthermore, it follows that, for sufficiently small step-sizes:

(24) |

Proof. Letting , recursion (20) can be equivalently recast as:

(25) |

where denotes the initial condition. We first note that if is stable, as . In this way, the first term on the RHS of (25) vanishes asymptotically. At the same time, the convergence of the second term on the RHS of (25) depends only on the geometric series of matrices , which is known to be convergent to a finite value if the matrix is a stable matrix [58]. In summary, if is stable, the RHS of (25) asymptotically converges to a finite value, and we conclude that will converge to a steady-state value. From (22), we deduce that is stable if matrix is stable as well. This holds true under the two following conditions. The first condition is that matrix must have full rank, i.e. (13) holds true, where we have exploited the relation

Now recalling that, for any Hermitian matrix , it holds [58], with denoting the spectral radius of , the second condition guaranteing the stability of is that , which holds true for any step-sizes satisfying (23). This concludes the first part of the Theorem.

We now prove the second part. Selecting in (25), we obtain the following bound

(26) |

where . Taking the limit of (26) as , since if conditions (13) and (23) hold, we obtain

(27) |

From (22), we have

(28) |

where , , and in (a) we have exploited . Substituting (III-B) in (27), we get

(29) |

It is easy to check that the upper bound (29) does not exceed for any stepsize . Thus, we conclude that (24) holds for sufficiently small step-sizes.

### Iii-C Steady-State Performance

Taking the limit of (20) as (assuming conditions (13) and (23) hold true), we obtain:

(30) |

Expression (30) is a useful result: it allows us to derive several performance metrics through the proper selection of the free weighting parameter (or ). For instance, let us assume that one wants to evaluate the steady-state mean square deviation (MSD) of the LMS strategy in (12). Thus, selecting in (30), we obtain

(31) |

If instead one is interested in evaluating the mean square deviation obtained by the LMS algorithm in (12) when reconstructing the value of the signal associated to -th vertex of the graph, selecting in (30), we obtain

(32) |

where , with denoting the -th canonical vector. In the sequel, we will confirm the validity of these theoretical expressions by comparing them with numerical simulations.

### Iii-D Sampling Strategies

As illustrated in the previous sections, the properties of the proposed LMS algorithm in (12) strongly depend on the choice of the sampling set , i.e. on the vertex limiting operator . Indeed, building on the previous analysis, it is clear that the sampling strategy must be carefully designed in order to: a) enable reconstruction of the signal; b) guarantee stability of the algorithm; and c) impose a desired mean-square error at convergence. In particular, we will see that, when sampling signals defined on graphs, besides choosing the right number of samples, whenever possible it is also fundamental to have a strategy indicating where to sample, as the samples’ location plays a key role in the performance of the reconstruction algorithm in (12). To select the best sampling strategy, one should optimize some performance criterion, e.g. the MSD in (III-C), with respect to the sampling set , or, equivalently, the vertex limiting operator . However, since this formulation translates inevitably into a selection problem, whose solution in general requires an exhaustive search over all the possible combinations, the complexity of such procedure becomes intractable also for graph signals of moderate dimensions. Thus, in the sequel we will provide some numerically efficient, albeit sub-optimal, greedy algorithms to tackle the problem of selecting the sampling set.

Greedy Selection - Minimum MSD: This strategy aims at minimizing the MSD in (III-C) via a greedy approach: the method iteratively selects the samples from the graph that lead to the largest reduction in terms of MSD. Since the proposed greedy approach starts from an initially empty sampling set, when , matrix in (III-C) is inevitably rank deficient. Then, in this case, the criterion builds on the pseudo-inverse of the matrix in (III-C), denoted by , which coincides with the inverse as soon as . The resulting algorithm is summarized in the table entitled “Sampling strategy 1”, where we made explicit the dependence of matrices and on the sampling operator . In the sequel, we will refer to this method as the Min-MSD strategy.

Greedy Selection - Maximum : In this case, the strategy aims at maximizing the volume of the parallelepiped build with the selected rows of matrix . The algorithm starts including the row with the largest norm in , and then it adds, iteratively, the rows having the largest norm and, at the same time, are as orthogonal as possible to the vectors already in . The rationale underlying this strategy is to design a well suited basis for the graph signal that we want to estimate. This criterion coincides with the maximization of the the pseudo-determinant of the matrix (i.e. the product of all nonzero eigenvalues), which is denoted by . In the sequel, we motivate the rationale underlying this strategy. Let us consider the eigendecomposition . From (III-C), we obtain:

(33) |

where , . From (III-D), we notice how the MSD of the LMS algorithm in (12) strongly depends on the values assumed by the eigenvalues , . In particular, we would like to design matrix in (22) such that its eigenvalues are as far as possible from 1. From (22), it is easy to understand that

. Thus, requiring , , to be as far as possible from 1 translates in designing the matrix such that its eigenvalues are as far as possible from zero. Thus, a possible surrogate criterion for the approximate minimization of (III-D) can be formulated as the selection of the sampling set (i.e. operator ) that maximizes the determinant (i.e. the product of all eigenvalues) of the matrix . When , matrix is inevitably rank deficient, and the strategy builds on the pseudo-determinant of . Of course, when , the pseudo determinant coincides with the determinant. The resulting algorithm is summarized in the table entitled “Sampling strategy 2”. In the sequel, we will refer to this method as the Max-Det sampling strategy.

Greedy Selection - Maximum : Finally, using similar arguments as before, a further surrogate criterion for the minimization of (III-D) can be formulated as the maximization of the minimum nonzero eigenvalue of the matrix , which is denoted by . This greedy strategy exploits the same idea of the sampling method introduced in [10] in the case of batch signal reconstruction. The resulting algorithm is summarized in the table entitled “Sampling strategy 3”. We will refer to this method as the Max- sampling strategy.

In the sequel, we will illustrate some numerical results aimed at comparing the performance achieved by the proposed LMS algorithm using the aforementioned sampling strategies.

### Iii-E Numerical Results

In this section, we first illustrate some numerical results aimed at confirming the theoretical results in (III-C) and (III-C). Then, we will illustrate how the sampling strategy affects the performance of the proposed LMS algorithm in (12). Finally, we will evaluate the effect of a graph mismatching in the performance of the proposed algorithm.

Performance: Let us consider the graph signal shown in Fig. 1 and composed of nodes, where the color of each vertex denotes the value of the signal associated to it. The signal has a spectral content limited to the first ten eigenvectors of the Laplacian matrix of the graph in Fig. 1, i.e. . The observation noise in (9) is zero-mean, Gaussian, with a diagonal covariance matrix, where each element is chosen uniformly random between 0 and 0.01. An example of graph sampling, obtained selecting vertexes using the Max-Det sampling strategy, is also illustrated in Fig. 1, where the sampled vertexes have thicker marker edge. To validate the theoretical results in (III-C), in Fig. 2 we report the behavior of the theoretical MSD values achieved at each vertex of the graph, comparing them with simulation results, obtained averaging over 200 independent simulations and 100 samples of squared error after convergence of the algorithm. The step-size is chosen equal to and, together with the selected sampling strategy , they satisfy the reconstruction and stability conditions in (13) and (23). As we can notice from Fig. 2, the theoretical predictions match well the simulation results.

Effect of sampling strategies: It is fundamental to assess the performance of the LMS algorithm in (12) with respect to the adopted sampling set . As a first example, using the Max-Det sampling strategy, in Fig. 3 we report the transient behavior of the MSD, considering different number of samples taken from the graph, i.e. different cardinalities of the sampling set. The results are averaged over 200 independent simulations, and the step-sizes are tuned in order to have the same steady-state MSD for each value of . As expected, from Fig. 3 we notice how the learning rate of the algorithm improves by increasing the number of samples. Finally, in Fig. 4 we illustrate the steady-state MSD of the LMS algorithm in (12) comparing the performance obtained by four different sampling strategies, namely: a) the Max-Det strategy; b) the Max- strategy; c) the Min-MSD strategy; and d) the random sampling strategy, which simply picks at random nodes. We consider the same parameter setting of the previous simulation. The results are averaged over 200 independent simulations. As we can notice from Fig. 4, the LMS algorithm with random sampling can perform quite poorly, especially at low number of samples. This poor result of random sampling emphasizes that, when sampling a graph signal, what matters is not only the number of samples, but also (and most important) where the samples are taken. Comparing the other sampling strategies, we notice from Fig. 4 that the Max-Det and Max- strategies perform well also at low number of samples ( is the minimum number of samples that allows signal reconstruction). As expected, the Max-Det strategy outperforms the Max- strategy, because it considers all the modes of the MSD in (III-D), as opposed to the single mode associated to the minimum eigenvalue considered by the Max- strategy. It is indeed remarkable that, for low number of samples, Max-Det outperforms also Min-MSD, even if the performance metric is MSD. There is no contradiction here because we need to remember that all the proposed methods are greedy strategies, so that there is no claim of optimality in all of them. However, as the number of samples increases above the limit , the Min-MSD strategy outperforms all other methods. This happens because the Min-MSD strategy takes into account information from both graph topology and spatial distribution of the observation noise (cf. (III-C)). Thus, when the number of samples is large enough to have sufficient degrees of freedom in selecting the samples’ location, the Min-MSD strategy has the capability of selecting the vertexes in a good position to enable a well-conditioned signal recovery, with possibly low additive noise, thus improving the overall performance of the LMS algorithm in (12). Conversely, when the number of samples is very close to its minimum value, the Min-MSD criterion may give rise to ill-conditioning of the signal recovery strategy because the low noise samples may be in sub-optimal positions with respect to signal recovery. This explains its losses with respect to Max-Det and Max- strategies, for low values of the number of samples. This analysis suggests that an optimal design of the sampling strategy for graph signals should take into account processing complexity (in terms of number of samples), prior knowledge (e.g., graph structure, noise distribution), and achievable performance.

Effect of graph mismatching: In this last example, we aim at illustrating how the performance of the proposed method is affected by a graph mismatching during the processing. To this aim, we take as a benchmark the graph signal in Fig. 1, where the signal bandwidth is set equal to . The bandwidth defines also the sampling operator , which is selected through the Max-Det strategy, introduced in Sec. III.D, using samples. Now, we assume that the LMS processing in (12) is performed keeping fixed the sampling operator , while adopting an operator in (5) that uses the same bandwidth as for the benchmark case (i.e., the same matrix ), but different GFT operators , which are generated as the eigenvectors of Laplacian matrices associated to graphs that differs from the benchmark in Fig. 1 for one (removed) link. The aim of this simulation is to quantify the effect of a single link removal on the performance of the LMS strategy in (12). Thus, in Fig. 5, we report the transient MSD versus the iteration index of the proposed LMS strategy, considering four different links that are removed from the original graph. The four removed links are those shown in Fig. 1 using thicker lines; the colors and line styles are associated to the four behaviors of the transient MSD in Fig. 5. The results are averaged over 100 independent simulations, using a step-size . The theoretical performance in (III-C) achieved by the ideal LMS, i.e. the one perfectly matched to the graph, are also reported as a benchmark. As we can see from Fig. 5, the removal of different links from the graph leads to very different performance obtained by the algorithm. Indeed, while removing Link 1 (i.e., the red one), the algorithm performs as in the ideal case, the removal of links 2, 3, and 4, progressively determine a worse performance loss. This happens because the structure of the eigenvectors of the Laplacian of the benchmark graph is more or less preserved by the removal of specific links. Some links have almost no effects (e.g., Link 1), whereas some others (e.g., Link 4) may lead to deep modification of the structure of such eigenvectors, thus determining the mismatching of the LMS strategy in (12) and, consequently, its performance degradation. This example opens new theoretical questions that aim at understanding which links affect more the graph signals’ estimation performance in situations where both the signal and the graph are jointly time-varying. We plan to tackle this exciting case in future work.

## Iv LMS Estimation with Adaptive Graph Sampling

The LMS strategy in (12) assumes perfect knowledge of the support where the signal is defined in the graph frequency domain, i.e. . Indeed, this prior knowledge allows to define the projector operator in (5) in a unique manner, and to implement the sampling strategies introduced in Sec. III.D. However, in many practical situations, this prior knowledge is unrealistic, due to the possible variability of the graph signal over time at various levels: the signal can be time varying according to a given model; the signal model may vary over time, for a given graph topology; the graph topology may vary as well over time. In all these situations, we cannot always assume that we have prior information about the frequency support , which must then be inferred directly from the streaming data in (9). Here, we consider the important case where the graph is fixed, and the spectral content of the signal can vary over time in an unknown manner. Exploiting the definition of GFT in (3), the signal observations in (9) can be recast as:

(34) |

The problem then translates in estimating the coefficients of the GFT , while identifying its support, i.e. the set of indexes where is different from zero. The support identification is deeply related to the selection of the sampling set. Thus, the overall problem can be formulated as the joint estimation of sparse representation and sampling strategy from the observations in (34), i.e.,

(35) |

where is the (discrete) set that constraints the selection of the sampling strategy , is a sparsifying penalty function (typically, or norms), and is a parameter that regulates how sparse we want the optimal GFT vector . Problem (35) is a mixed integer nonconvex program, which is very complicated to solve, especially in the adaptive context considered in this paper. Thus, to favor low complexity online solutions for (35), we propose an algorithm that alternates between the optimization of the vector and the selection of the sampling operator . The rationale behind this choice is that, given an estimate for the support of vector , i.e. , we can select the sampling operator in a very efficient manner through one of the sampling strategies illustrated in Sec. III.D. Then, starting from a random initialization for and a full sampling for (i.e., ), the algorithm iteratively proceeds as follows. First, fixing the value of the sampling operator at time , we update the estimate of the GFT vector using an online version of the celebrated ISTA algorithm [59, 60], which proceeds as:

(36) |

, where is a (sufficiently small) step-size, and is a thresholding function that depends on the sparsity-inducing penalty in (35). Several choices are possible, as we will illustrate in the sequel. The aim of recursion (36) is to estimate the GFT of the graph signal in (9), while selectively shrinking to zero all the components of that are outside its support, i.e., which do not belong to the bandwidth of the graph signal. Then, the online identification of the support of the GFT enables the adaptation of the sampling strategy, which can be updated using one of the strategies illustrated in Sec. III.D. Intuitively, the algorithm will increase (reduce) the number of samples used for the estimation, depending on the increment (reduction) of the current signal bandwidth. The main steps of the LMS algorithm with adaptive graph sampling are listed in Algorithm 2.

Thresholding functions : Several different functions can be used to enforce sparsity. A commonly used thresholding function comes directly by imposing an norm constraint in (35), which is commonly known as the Lasso [47]. In this case, the vector threshold function is the component-wise thresholding function applied to each element of vector , with

(37) |

The function in (37) tends to shrink all the components of the vector and, in particular, sets to zero the components whose magnitude are within the threshold . Since the Lasso constraint is known for introducing a large bias in the estimate, the performance would deteriorate for vectors that are not sufficiently sparse, i.e. graph signals with large bandwidth. To reduce the bias introduced by the Lasso constraint, several other thresholding functions can be adopted to improve the performance also in the case of less sparse systems. A potential improvement can be made by considering the non-negative Garotte estimator as in [61], whose thresholding function is defined as a vector whose entries are derived applying the threshold

(38) |

. Finally, to completely remove the bias over the large components, we can implement a hard thresholding mechanism, whose function is defined as a vector whose entries are derived applying the threshold

(39) |

In the sequel, numerical results will illustrate how different thresholding functions such as (37), (38), and (39), affect the performance of Algorithm 2.

### Numerical Results

In this section, we illustrate some numerical results aimed at assessing the performance of the proposed LMS method with adaptive graph sampling, i.e. Algorithm 2. In particular, to illustrate the adaptation capabilities of the algorithm, we simulate a scenario with a time-varying graph signal with nodes, which has the same topology shown in Fig. 1, and spectral content that switches between the first 5, 15, and 10 eigenvectors of the Laplacian matrix of the graph. The elements of the GFT inside the support are chosen to be equal to 1. The observation noise in (9) is zero-mean, Gaussian, with a diagonal covariance matrix , with . In Fig. 6 we report the transient behavior of the normalized Mean-Square Deviation (NMSD), i.e.

versus the iteration index, considering the evolution of Algorithm 2 with three different thresholding functions, namely: a) the Lasso threshold in (37), the Garotte threshold in (38), and the hard threshold in (39). Also, in Fig. 7, we illustrate the behavior of the estimate of the cardinality of versus the iteration index (cf. Step 2 of Algorithm 2), obtained by the three aforementioned strategies at each iteration. The value of the cardinality of of the true underlying graph signal is also reported as a benchmark. The curves are averaged over 100 independent simulations. The step-size is chosen to be , the sparsity parameter , and thus the threshold is equal to for all strategies. The sampling strategy used in Step 3 of Algorithm 2 is the Max-Det method introduced in Sec. III.D, where the number of samples to be selected at each iteration is chosen to be equal to the current estimate of cardinality of the set . As we can notice from Fig. 6, the LMS algorithm with adaptive graph sampling is able to track time-varying scenarios, and its performance is affected by the adopted thresholding function. In particular, from Fig. 6, we notice how the algorithm based on the hard thresholding function in (39) outperforms the other strategies in terms of steady-state NMSD, while having the same learning rate. The Garotte based algorithm has slightly worse performance with respect to the method exploiting hard thresholding, due to the residual bias introduced at large values by the thresholding function in (38). Finally, we can notice how the LMS algorithm based on Lasso may lead to very poor performance, due to misidentifications of the true graph bandwidth. This can be noticed from Fig. 7 where, while the Garotte and hard thresholding strategies are able to learn exactly the true bandwidth of the graph signal (thus leading to very good performance in terms of NMSD, see Fig. 6), the Lasso strategy overestimates the bandwidth of the signal, i.e. the cardinality of the set (thus leading to poor estimation performance, see Fig. 6). Finally, to illustrate an example of adaptive sampling, in Figs. 10, 10, and 10 we report the samples (depicted as black nodes) chosen by the proposed LMS algorithm based on hard thresholding at iterations , , and . As we can notice from Figs. 6, 7 and 10, 10, and 10, the algorithm always selects a number of samples equal to the current value of the signal bandwidth, while guaranteeing good reconstruction performance.

## V Application to Power Spatial Density Estimation in Cognitive Networks

The advent of intelligent networking of heterogeneous devices such as those deployed to monitor the 5G networks, power grid, transportation networks, and the Internet, will have a strong impact on the underlying systems. Situational awareness provided by such tools will be the key enabler for effective information dissemination, routing and congestion control, network health management, risk analysis, and security assurance. The vision is for ubiquitous smart network devices to enable data-driven statistical learning algorithms for distributed, robust, and online network operation and management, adaptable to the dynamically evolving network landscape with minimal need for human intervention. In this context, the unceasing demand for continuous situational awareness in cognitive radio (CR) networks calls for innovative signal processing algorithms, complemented by sensing platforms to accomplish the objectives of layered sensing and control. These challenges are embraced in the study of power cartography, where CRs collect data to estimate the distribution of power across space, namely the power spatial density (PSD). Knowing the PSD at any location allows CRs to dynamically implement a spatial reuse of idle bands. The estimated PSD map need not be extremely accurate, but precise enough to identify idle spatial regions.

In this section, we apply the proposed framework for LMS estimation of graph signals to spectrum cartography in cognitive networks. We consider a 5G scenario, where a dense deployment of radio access points (RAPs) is envisioned to provide a service environment characterized by very low latency and high rate access. Each RAP collects streaming data related to the spectrum utilization of primary users (PU’s) at its geographical position. This information can then be sent to a processing center, which collects data from the entire system, through high speed wired links. The aim of the center is then to build a spatial map of the spectrum usage, while processing the received data on the fly and envisaging proper sampling techniques that enable a proactive sensing of the system from only a limited number of RAP’s measurements. As we will see in the sequel, the proposed approach hinges on the graph structure of the signal received from the RAP’s, thus enabling real-time PSD estimation from a small set of observations that are smartly sampled from the graph.

Numerical examples: Let us consider an operating region where 100 RAPs are randomly deployed to produce a map of the spatial distribution of power generated by the transmissions of two active primary users. The PU’s emit electromagnetic radiation with power equal to 1 Watt. For simplicity, the propagation medium is supposed to introduce a free-space path loss attenuation on the PU’s transmissions. The graph among RAPs is built from a distance based model, i.e. stations that are sufficiently close to each other are connected through a link (i.e. , if nodes and are neighbors). In Fig. 11, we illustrate a pictorial description of the scenario, and of the resulting graph signal. We assume that each RAP is equipped with an energy detector, which estimates the received signal using 100 samples, considering an additive white Gaussian noise with variance . The resulting signal is not perfectly band-limited, but it turns out to be smooth over the graph, i.e. neighbor nodes observe similar values. This implies that sampling such signals inevitably introduces aliasing during the reconstruction process. However, even if we cannot find a limited (lower than ) set of frequencies where the signal is completely localized, the greatest part of the signal energy is concentrated at low frequencies. This means that if we process the data using a sufficient number of observations and (low) frequencies, we should still be able to reconstruct the signal with a satisfactory performance.

To illustrate an example of cartography based on the LMS algorithm in (12), in Fig. 12 we report the behavior of the steady-state NMSD versus the number of samples taken from the graph, for different bandwidths used for processing. The step-size is chosen equal to 0.5, while the adopted sampling strategy is the Max-Det method introduced in Sec. III.D. The results are averaged over 200 independent simulations. As expected, from Fig. 12, we notice that the steady-state NMSD of the LMS algorithm in (12) improves by increasing the number of samples and bandwidths used for processing. Interestingly, in Fig. 12 we can see a sort of threshold behavior: the NMSD is large for , when the signal is undersampled, whereas the values become lower and stable as soon as . Finally, we illustrate an example that shows the tracking capability of the proposed method in time-varying scenarios. In particular, we simulate a situation the two PU’s switch between idle and active modes: for only the first PU transmits; for both PU’s transmit; for only the second PU’s transmits. In Fig. 13 we show the behavior of the transient NMSD versus iteration index, for different number of samples and bandwidths used for processing. The results are averaged over 200 independent simulations. From Fig. 13, we can see how the proposed technique can track time-varying scenarios. Furthermore, its steady-state performance improves with increase in the number of samples and bandwidths used for processing. These results, together with those achieved in Fig. 12, illustrate an existing tradeoff between complexity, i.e. number of samples used for processing, and mean-square performance of the proposed LMS strategy. In particular, using a larger bandwidth and a (consequent) larger number of samples for processing, the performance of the algorithm improves, at the price of a larger computational complexity.

## Vi Conclusions

In this paper we have proposed LMS strategies for adaptive estimation of signals defined over graphs. The proposed strategies are able to exploit the underlying structure of the graph signal, which can be reconstructed from a limited number of observations properly sampled from a subset of vertexes, under a band-limited assumption. A detailed mean square analysis illustrates the deep connection between sampling strategy and the properties of the proposed LMS algorithm in terms of reconstruction capability, stability, and mean-square error per