
# Graph Sampling for Covariance Estimation

Sundeep Prabhakar Chepuri, Geert Leus. The authors are with the Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, The Netherlands. Email: {s.p.chepuri;g.j.t.leus}@tudelft.nl. This work was supported by the KAUST-MIT-TUD consortium grant OSR-2015-Sensors-2700. A conference precursor of this manuscript appeared in the Ninth IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM), Rio de Janeiro, Brazil, June 2016 [sundeep16sam].
###### Abstract

In this paper the focus is on subsampling as well as reconstructing the second-order statistics of signals residing on nodes of arbitrary undirected graphs. Second-order stationary graph signals may be obtained by graph filtering zero-mean white noise and they admit a well-defined power spectrum whose shape is determined by the frequency response of the graph filter. Estimating the graph power spectrum forms an important component of stationary graph signal processing and related inference tasks such as Wiener prediction or inpainting on graphs. The central result of this paper is that by sampling a significantly smaller subset of vertices and using simple least squares, we can reconstruct the second-order statistics of the graph signal from the subsampled observations, and more importantly, without any spectral priors. To this end, both a nonparametric approach as well as parametric approaches including moving average and autoregressive models for the graph power spectrum are considered. The results specialize for undirected circulant graphs in that the graph nodes leading to the best compression rates are given by the so-called minimal sparse rulers. A near-optimal greedy algorithm is developed to design the subsampling scheme for the non-parametric and the moving average models, whereas a particular subsampling scheme that allows linear estimation for the autoregressive model is proposed. Numerical experiments on synthetic as well as real datasets related to climatology and processing handwritten digits are provided to demonstrate the developed theory.

\IEEEkeywords

Graph signal processing, stationary graph signals, sparse sampling, graph power spectrum estimation, compressive covariance sensing.

## I Introduction

Graphs are mathematical objects that can be used for describing and explaining relationships in complex datasets, which appear commonly in modern data analysis. The nodes of the graph denote the entities themselves and the edges encode the pairwise relationships between these entities. Some examples of such complex-structured data beyond traditional time series include gene regulatory networks [barabasi2004network], brain networks [bullmore2009complex], transportation networks [guimera2005worldwide], social and economic networks [jackson2010social], and so on. Processing signals residing on the nodes of a graph, while taking into account the relationships between them as encoded by the edges, has recently received significant interest. In particular, generalizing as well as drawing parallels of classical time-frequency analysis tools to graph data analysis, while incorporating the irregular structure on which the graph signals are defined, is an emerging area of research [shuman2013Emerging, sandryhaila2014big].

Graph signals can be stochastic in nature, and they can be modeled as the output of a graph filter [sandryhaila2013discrete] whose input is also a random signal (e.g., white noise). We are interested in sampling and processing stationary graph signals, i.e., stochastic graph signals whose second-order statistics are invariant with respect to the graph shift, analogous to wide-sense stationarity for time series. Second-order stationary graph signals are characterized by a well-defined graph power spectrum. They can be generated by graph filtering white noise (or any other stationary graph signal), and the graph power spectrum of the filtered signal is then characterized by the squared magnitude of the frequency response of the filter; see [benjamin15eusipco, perraudin2016stationary, marques2016stationary, girault2015signal].

The second-order statistics of graph signals, or equivalently the graph power spectrum, are essential to solve inference problems on graphs in the Bayesian setting such as smoothing, prediction, inpainting, and deconvolution; see [girault2014semi] and [perraudin2016stationary] for some Bayesian inference problems. These inference problems are solved by designing optimum (in the minimum mean squared error sense) Wiener-like filters and the graph power spectrum forms a crucial component of such filter designs. In order to compute the graph power spectrum, traditional methods require the processing of signals on all graph nodes. The sheer quantity of data and scale of the graph often inhibit this reconstruction method. Therefore, the main question that we address in this paper is, can we reconstruct the graph power spectrum by observing a small subset of graph nodes?

### I-A Related works and main results

The notion of stationarity of signals on graphs and related definitions can be found in [benjamin15eusipco, perraudin2016stationary, marques2016stationary, girault2015signal], and it will be briefly explained in the next section as well. Several techniques for graph power spectrum estimation have been discussed in [perraudin2016stationary] and [marques2016stationary], and they are based on observations from all the nodes. In this paper, we consider the problem of reconstructing the second-order statistics of signals on graphs, but from subsampled observations. The fact that we are reconstructing the graph power spectrum, instead of the graph signal, enables us to subsample the graph signal (or sparsely sample the graph nodes), even without any spectral priors (e.g., sparsity, bandlimited with known support). This is a new and different perspective as compared to subsampling for graph signal reconstruction [anis2014towards, tsitsvero2015uncertainty, varma2015spectrum, chen2015discrete], which imposes some spectral prior that enables graph signal reconstruction. The proposed concept basically generalizes the field of compressive covariance sensing [ariananda2012compressive, romero2013wideband, Romero16CCSspm] to the graph setting.

The aim of this paper is to reconstruct second-order statistics of stationary graph signals from observations available at a few nodes using simple reconstruction methods such as least squares. The contributions are summarized as the following main results:

• Non-parametric approach: Without any spectral priors, second-order statistics of length-N stationary graph signals can be recovered using least squares from a reduced subset of \mathcal{O}(\sqrt{N}) observations, i.e., by observing \mathcal{O}(\sqrt{N}) graph nodes. In this case, the processing is done in the graph spectral domain.

• Circulant graphs: As a special case, when the graphs are circulant, the identifiability results are elegant. That is, the subset of nodes resulting in the best compression rates are given by the so-called minimal sparse rulers. This is reminiscent of compressive covariance sensing [Romero16CCSspm] for data that reside on a regular support such as time series, which is a specific instance of a circulant graph.

• Parametric approach: It is also possible to model the graph power spectrum using a small number of parameters, e.g., the graph signals may be modeled by moving average or autoregressive graph filters. The reconstruction of the second-order statistics of the graph signal then boils down to the estimation of moving average or autoregressive coefficients. Such a parameterization allows for a higher compression. When the graph power spectrum is modeled using a moving average graph filter, the second-order statistics can be recovered using least squares from \mathcal{O}(\sqrt{Q}) observations, where Q=\min\{2L-1,N\} with L being the number of moving average filter coefficients. When the graph power spectrum is modeled using an autoregressive graph filter, P autoregressive filter coefficients can be recovered using linear least squares by observing \mathcal{O}(P) nodes.

• Subsampler design: The proposed samplers are deterministic and they perform node subsampling. Subsampler design, therefore, becomes a discrete combinatorial optimization problem. For the spectral domain and moving average case, the subsampler can be designed using a near-optimal greedy algorithm. However, for the autoregressive approach, the sampler design depends also on (unobserved) data, and thus a mean squared error optimal design is not possible. This is due to the fact that we restrict ourselves to a low-complexity linear estimator for the autoregressive filter coefficients. Nevertheless, we present a suboptimal technique to design a subsampler for the autoregressive case as well.

### I-B Outline and notation

The remainder of the paper is organized as follows. The preliminary concepts of graph signal processing are discussed in Section II. The proposed least squares based reconstruction of the second-order statistics from the subsampled observations is discussed in Section III. Connections of compressive covariance sensing for time series with sensing data residing on circulant graphs are discussed in Section IV. In Section V, the graph power spectrum is represented with a small number of parameters under moving average and autoregressive models, and these parameters are then reconstructed using least squares from subsampled observations. In Section LABEL:sec:finitedata, we discuss the validity of the results provided in this paper for finite data records. Under the assumption that the data follows a Gaussian distribution, the maximum likelihood estimator and the related Cramér-Rao bound are also derived. In Section LABEL:sec:subsampdesgn, the design of sparse sampling matrices based on low-complexity greedy algorithms is discussed. A few examples to illustrate the proposed framework are provided in Section LABEL:sec:numericalresults. Finally, the paper concludes with Section LABEL:sec:conc.

The notation used in this paper is described as follows. Upper (lower) boldface letters are used for matrices (column vectors). Overbar \bar{(\cdot)} denotes complex conjugation, (\cdot)^{T} denotes the transpose, and (\cdot)^{H} denotes the complex conjugate (Hermitian) transpose. (\cdot)^{-T} is a shorthand notation for \left((\cdot)^{-1}\right)^{T}. \mathrm{diag}[\cdot] refers to a diagonal matrix with its argument on the main diagonal. {\rm diag_{r}}[\cdot] represents a diagonal matrix with the argument on its diagonal, but with the all-zero rows removed. {\boldsymbol{1}} ({\boldsymbol{0}}) denotes the vector of all ones (zeros). {\boldsymbol{I}} is an identity matrix. \mathbb{E}\{\cdot\} denotes the expectation operation. The \ell_{0}-(quasi) norm of {\boldsymbol{w}}=[w_{1},w_{2},\ldots,w_{N}]^{T} refers to the number of non-zero entries in {\boldsymbol{w}}, i.e., {\|{\boldsymbol{w}}\|}_{0}:=|\{n\,:\,w_{n}\neq 0\}|. The \ell_{1}-norm of {\boldsymbol{w}} is denoted by {\|{\boldsymbol{w}}\|}_{1}=\sum_{n=1}^{N}|w_{n}|. The notation \thicksim is read as “is distributed according to”. Unless otherwise noted, logarithms are natural. {\rm tr}\{\cdot\} is the matrix trace operator. {\rm det}\{\cdot\} is the matrix determinant. {\rm rank}(\cdot) denotes the rank of a matrix. \lambda_{\rm min}\{{\boldsymbol{A}}\} (\lambda_{\rm max}\{{\boldsymbol{A}}\}) denotes the minimum (maximum) eigenvalue of a symmetric matrix {\boldsymbol{A}}. {\boldsymbol{A}}\succeq{\boldsymbol{B}} means that {\boldsymbol{A}}-{\boldsymbol{B}} is a positive semidefinite matrix. \mathbb{S}^{N} (\mathbb{S}^{N}_{+}) denotes the set of symmetric (symmetric positive semi-definite) matrices of size N\times N. |\mathcal{U}| denotes the cardinality of the set \mathcal{U}. \otimes denotes the Kronecker product, \circ denotes the Khatri-Rao or columnwise Kronecker product, and {\rm vec}(\cdot) refers to the matrix vectorization operator. 
For a full column rank tall matrix {\boldsymbol{A}}, the left inverse is given by {\boldsymbol{A}}^{\dagger}=({\boldsymbol{A}}^{H}{\boldsymbol{A}})^{-1}{\boldsymbol{A}}^{H}. The column span of {\boldsymbol{A}} and the null space of {\boldsymbol{A}} are denoted by {\rm ran}({\boldsymbol{A}}) and {\rm null}({\boldsymbol{A}}), respectively. Properties that are frequently used in this paper:

• {\rm vec}({\boldsymbol{A}}{\boldsymbol{B}}{\boldsymbol{C}})=({\boldsymbol{C}}^{T}\otimes{\boldsymbol{A}}){\rm vec}({\boldsymbol{B}});

• {\rm vec}({\boldsymbol{A}}{\rm diag}[{\boldsymbol{b}}]{\boldsymbol{C}})=({\boldsymbol{C}}^{T}\circ{\boldsymbol{A}}){\boldsymbol{b}}.
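Both vectorization identities are easy to verify numerically. The following sketch (our own illustration, not from the paper) does so in NumPy for random complex matrices, using column-major vectorization to match the {\rm vec}(\cdot) operator:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random complex test matrices; sizes are arbitrary illustrative choices.
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
B = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))
C = rng.standard_normal((5, 2)) + 1j * rng.standard_normal((5, 2))
b = rng.standard_normal(3) + 1j * rng.standard_normal(3)

def vec(M):
    # Column-stacking (column-major) vectorization, matching vec(.).
    return M.flatten(order="F")

def khatri_rao(X, Y):
    # Column-wise Kronecker product: k-th column is kron(X[:, k], Y[:, k]).
    return np.stack([np.kron(X[:, k], Y[:, k]) for k in range(X.shape[1])],
                    axis=1)

# Property 1: vec(ABC) = (C^T ⊗ A) vec(B).
lhs1 = vec(A @ B @ C)
rhs1 = np.kron(C.T, A) @ vec(B)
assert np.allclose(lhs1, rhs1)

# Property 2: vec(A diag[b] C) = (C^T ∘ A) b, with ∘ the Khatri-Rao product.
lhs2 = vec(A @ np.diag(b) @ B)
rhs2 = khatri_rao(B.T, A) @ b
assert np.allclose(lhs2, rhs2)
```

The second property is the workhorse of Section III, where it turns a covariance model with a diagonal middle factor into a linear system in the power spectrum.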

## II Preliminaries

In this section, we introduce some preliminary concepts related to deterministic and stochastic signals defined on graphs.

### II-A Graph signals and filtering

Consider a dataset with N elements denoted as {\boldsymbol{x}}\in\mathbb{C}^{N}, which live on an irregular structure represented by an undirected graph \mathcal{G}=(\mathcal{V},\mathcal{E}), where the vertex set \mathcal{V}=\{v_{1},\cdots,v_{N}\} denotes the set of nodes, and the edge set \mathcal{E} reveals any connection between the nodes, i.e., (i,j)\in\mathcal{E} means that node i is connected to node j. The nth entry of {\boldsymbol{x}}, i.e., x_{n}, is indexed by node v_{n} of the graph \mathcal{G}. Therefore, we refer to the dataset {\boldsymbol{x}} as a length-N graph signal.

Let us introduce an operator {\boldsymbol{S}}\in\mathbb{C}^{N\times N}, where the (i,j)th entry of {\boldsymbol{S}}, denoted by s_{i,j}, can be nonzero only if i=j or (i,j)\in\mathcal{E}, and is zero otherwise. The pattern of {\boldsymbol{S}} captures the local structure of the graph. More specifically, for a graph signal {\boldsymbol{x}}, the signal {\boldsymbol{S}}{\boldsymbol{x}} denotes the unit shifted version of {\boldsymbol{x}}. Hence {{\boldsymbol{S}}} is referred to as the graph-shift operator [sandryhaila2013discrete]. Different choices for {\boldsymbol{S}} include the graph Laplacian {\boldsymbol{L}} [shuman2013Emerging], the adjacency matrix {\boldsymbol{A}} [sandryhaila2013discrete], or their respective variants. For undirected graphs, {\boldsymbol{S}} is symmetric (more generally, Hermitian), and thus it admits the following eigenvalue decomposition

 \displaystyle{\boldsymbol{S}}={\boldsymbol{U}}{\boldsymbol{\Lambda}}{\boldsymbol{U}}^{H}=[{\boldsymbol{u}}_{1},\cdots,{\boldsymbol{u}}_{N}]\,{\rm diag}[\lambda_{1},\cdots,\lambda_{N}]\,[{\boldsymbol{u}}_{1},\cdots,{\boldsymbol{u}}_{N}]^{H}, (1)

where the eigenvectors \{{\boldsymbol{u}}_{n}\}_{n=1}^{N} and the eigenvalues \{\lambda_{n}\}_{n=1}^{N} of {\boldsymbol{S}} provide the notion of frequency in the graph setting [shuman2013Emerging, sandryhaila2014big]. Specifically, \{{\boldsymbol{u}}_{n}\}_{n=1}^{N} forms an orthonormal Fourier-like basis for graph signals with the graph frequencies denoted by \{\lambda_{n}\}_{n=1}^{N}. Hence, the graph Fourier transform of a graph signal, {\boldsymbol{x}}_{f}=[x_{f,1},x_{f,2},\ldots,x_{f,N}]^{T}\in\mathbb{C}^{N}, is given by

 {\boldsymbol{x}}_{f}:={\boldsymbol{U}}^{H}{\boldsymbol{x}}\,\Leftrightarrow\,{\boldsymbol{x}}={\boldsymbol{U}}{\boldsymbol{x}}_{f}. (2)

The frequency content of graph signals can be modified using linear shift-invariant graph filters [sandryhaila2013discrete, shuman2013Emerging]. Let us refer to the system {\boldsymbol{H}}\in\mathbb{C}^{N\times N} as a graph filter. If the eigenvalues of {\boldsymbol{S}} are distinct, a shift-invariant graph filter, which satisfies {\boldsymbol{H}}({\boldsymbol{S}}{\boldsymbol{x}})={\boldsymbol{S}}({\boldsymbol{H}}{\boldsymbol{x}}), can be expressed as a polynomial in {\boldsymbol{S}} as [sandryhaila2013discrete]

 \displaystyle{\boldsymbol{H}}=h_{0}{\boldsymbol{I}}+h_{1}{\boldsymbol{S}}+\cdots+h_{L-1}{\boldsymbol{S}}^{L-1}={\boldsymbol{U}}\left[h_{0}{\boldsymbol{I}}+h_{1}{\boldsymbol{\Lambda}}+\cdots+h_{L-1}{\boldsymbol{\Lambda}}^{L-1}\right]{\boldsymbol{U}}^{H}, (3)

where the filter {\boldsymbol{H}} is of degree L-1 with filter coefficients {\boldsymbol{h}}=[h_{0},h_{1},\ldots,h_{L-1}]^{T}\in\mathbb{C}^{L}, and L\leq N as N is the degree of the minimal polynomial (equal to the characteristic polynomial) of {\boldsymbol{S}}. The diagonal matrix

 {\boldsymbol{H}}_{f}=\sum_{l=0}^{L-1}h_{l}{\boldsymbol{\Lambda}}^{l}={\rm diag}[{\boldsymbol{V}}_{L}{\boldsymbol{h}}]={\rm diag}[h_{f,1},\cdots,h_{f,N}] (4)

can be viewed as the frequency response of the graph filter. Here, {\boldsymbol{V}}_{L} is an N\times L Vandermonde matrix with the (i,j)th entry as \lambda_{i}^{j-1}.
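To make (3) and (4) concrete, the following sketch (our own illustration; the graph and filter coefficients are arbitrary choices, not from the paper) builds a polynomial graph filter on a small random undirected graph and checks that it is diagonalized by the graph Fourier basis with frequency response {\rm diag}[{\boldsymbol{V}}_{L}{\boldsymbol{h}}]:

```python
import numpy as np

rng = np.random.default_rng(1)
N, L = 6, 3  # illustrative graph size and filter length

# Random symmetric 0/1 adjacency matrix as the graph-shift operator S.
S = np.triu((rng.random((N, N)) < 0.5).astype(float), 1)
S = S + S.T

# Eigendecomposition S = U Λ U^H; U is real orthonormal since S is symmetric.
lam, U = np.linalg.eigh(S)

h = rng.standard_normal(L)  # filter coefficients h_0, ..., h_{L-1}
H = sum(h[l] * np.linalg.matrix_power(S, l) for l in range(L))

# Vandermonde matrix V_L with (i, j)th entry λ_i^{j-1}, as in (4).
V_L = np.vander(lam, L, increasing=True)
h_f = V_L @ h  # diagonal of the frequency response H_f

# The polynomial filter equals U diag[V_L h] U^H ...
assert np.allclose(H, U @ np.diag(h_f) @ U.T)
# ... and is shift invariant: H(Sx) = S(Hx).
x = rng.standard_normal(N)
assert np.allclose(H @ (S @ x), S @ (H @ x))
```

The first assertion is exactly the second equality in (3) rewritten with (4); the second is the shift-invariance property that motivates the polynomial form.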

### II-B Stationary graph signals

Let {\boldsymbol{x}}=[x_{1},x_{2},\cdots,x_{N}]^{T}\in\mathbb{C}^{N} be a stochastic signal defined on the vertices of the graph \mathcal{G} with expected value {\boldsymbol{m}}_{\boldsymbol{x}}=\mathbb{E}\{{\boldsymbol{x}}\} and covariance matrix {\boldsymbol{R}}_{\boldsymbol{x}}=\mathbb{E}\{({\boldsymbol{x}}-{\boldsymbol{m}}_{\boldsymbol{x}})({\boldsymbol{x}}-{\boldsymbol{m}}_{\boldsymbol{x}})^{H}\}. Efforts to generalize some of the concepts of statistical time invariance or stationarity of signals defined over regular structures to random graph signals have been made in [benjamin15eusipco, perraudin2016stationary, marques2016stationary, girault2015signal]. For the sake of completeness, we summarize the definitions from [benjamin15eusipco, perraudin2016stationary, marques2016stationary, girault2015signal] as follows.

###### Definition 1 (Second-order stationarity).

A random graph signal {\boldsymbol{x}} is second-order stationary, if and only if, the following properties hold:

• The mean of the graph signal is collinear to an eigenvector of {\boldsymbol{S}} corresponding to the smallest eigenvalue, i.e., {\boldsymbol{m}}_{\boldsymbol{x}}=m_{\boldsymbol{x}}{\boldsymbol{u}}_{1}.

• Matrices {\boldsymbol{S}} and {\boldsymbol{R}}_{\boldsymbol{x}} can be simultaneously diagonalized.

Since we assume that the eigenvalues of {\boldsymbol{S}} are distinct and {\boldsymbol{U}} forms an orthonormal basis, property 2 in the above definition essentially means the statistical orthogonality of spectral components, i.e., \mathbb{E}\{x_{f,i}{\bar{x}_{f,j}}\}=0 for i\neq j [girault2015signal].

For simplicity, from now on we will focus on graph signals with zero mean, where we assume that m_{\boldsymbol{x}} is either known or m_{\boldsymbol{x}} can be set to zero by preprocessing the data as discussed in Section LABEL:sec:numericalresults. We can generate zero-mean second-order stationary graph signals by graph filtering zero-mean white noise. Let {\boldsymbol{n}}=[n_{1},n_{2},\ldots,n_{N}]^{T}\in\mathbb{C}^{N} be zero-mean unit-variance noise with covariance matrix {\boldsymbol{R}}_{\boldsymbol{n}}={\boldsymbol{I}}. Then, a zero-mean second-order stationary graph signal {\boldsymbol{x}} can be modeled as {\boldsymbol{x}}={\boldsymbol{H}}{\boldsymbol{n}}, where {\boldsymbol{H}} can be any valid graph filter. The filtered signal will have zero mean and covariance matrix {\boldsymbol{R}}_{\boldsymbol{x}}=\mathbb{E}\{({\boldsymbol{H}}{\boldsymbol{n}})({\boldsymbol{H}}{\boldsymbol{n}})^{H}\} given by

 \displaystyle{\boldsymbol{R}}_{\boldsymbol{x}}={\boldsymbol{H}}{\boldsymbol{R}}_{\boldsymbol{n}}{\boldsymbol{H}}^{H}={\boldsymbol{U}}{\rm diag}[|h_{f,1}|^{2},\cdots,|h_{f,N}|^{2}]{\boldsymbol{U}}^{H}={\boldsymbol{U}}{\rm diag}[{\boldsymbol{p}}]{\boldsymbol{U}}^{H}, (5)

where h_{f,n}=h_{0}+h_{1}\lambda_{n}+\cdots+h_{L-1}\lambda_{n}^{L-1} is defined in (4). This conforms to the second property listed in Definition 1. More generally, graph filtering any second-order stationary graph signal also results in a second-order stationary graph signal (this is easy to verify using property 2 in Definition 1). The nonnegative vector {\boldsymbol{p}} in (5) is referred to as the graph power spectral density or graph power spectrum. We now formally introduce the graph power spectrum through the following definition.

###### Definition 2 (Graph power spectrum).

The graph power spectral density of a second-order stationary graph signal is a real-valued nonnegative length-N vector {\boldsymbol{p}}=[p_{1},p_{2},\ldots,p_{N}]^{T}\in\mathbb{R}_{+}^{N} with entries given by

 p_{n}={\boldsymbol{u}}_{n}^{H}{\boldsymbol{R}}_{\boldsymbol{x}}{\boldsymbol{u}}_{n},\,n=1,2,\ldots,N. (6)

Alternatively, p_{n}=|h_{f,n}|^{2}\geq 0, for n=1,2,\ldots,N, where h_{f,n}=h_{0}+h_{1}\lambda_{n}+\cdots+h_{L-1}\lambda_{n}^{L-1} is defined in (4).

Second-order stationarity is preserved by linear graph filtering. This means that stationary graph signals with a prescribed graph power spectrum can be generated by filtering white noise, where the graph power spectrum of the filtered signal is reshaped according to the frequency response of the graph filter [benjamin15eusipco, perraudin2016stationary, marques2016stationary]. As a result, the graph power spectrum reveals critical information about the second-order stationary graph signal, and thus estimating the graph power spectrum or recovering the second-order statistics of a graph signal is useful in many applications.
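The generation-and-estimation loop described above can be sketched numerically. In this illustration (ours, not one of the paper's experiments; graph, filter, and snapshot count are arbitrary), stationary signals are generated by filtering white noise through a graph filter, and the power spectrum recovered via (6) from the sample covariance is compared against |h_{f,n}|^{2}:

```python
import numpy as np

rng = np.random.default_rng(2)
N, L, Ns = 8, 3, 200_000  # nodes, filter length, number of snapshots

# Random symmetric graph-shift operator and its Fourier basis.
S = np.triu((rng.random((N, N)) < 0.4).astype(float), 1)
S = S + S.T
lam, U = np.linalg.eigh(S)

h = np.array([1.0, 0.5, 0.2])                 # arbitrary filter coefficients
h_f = np.vander(lam, L, increasing=True) @ h  # frequency response, as in (4)
H = U @ np.diag(h_f) @ U.T                    # graph filter

p_true = h_f**2  # graph power spectrum p_n = |h_{f,n}|^2

# Ns snapshots of x = H n, with n zero-mean unit-variance white noise.
X = H @ rng.standard_normal((N, Ns))

R_hat = (X @ X.T) / Ns                          # sample covariance of x
p_hat = np.einsum("ni,nm,mi->i", U, R_hat, U)   # p_n = u_n^H R_x u_n, eq. (6)

# Close up to finite-sample fluctuations (Ns is large here).
assert np.allclose(p_hat, p_true, atol=0.1 * max(1.0, p_true.max()))
```

The tolerance in the final check only accounts for finite N_{s}; the finite-data analysis appears later in the paper.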

We end this section by summarizing the list of assumptions made in this paper.

1. The shift operator {\boldsymbol{S}} is known.

2. The orthonormal basis {\boldsymbol{U}} and the distinct eigenvalues \{\lambda_{n}\}_{n=1}^{N} of {\boldsymbol{S}} are known a priori.

## III Non-parametric Spectral Domain Approach

The size of the datasets often inhibits a direct computation of the second-order statistics, e.g., by observing all the N nodes and using (6) to compute the graph power spectrum. This would computationally cost \mathcal{O}(N^{3}). As such, compression or data reduction is preferred especially for large-scale data in the graph setting [sandryhaila2014big]. In the context of graph signal processing, most works consider subsampling the graph signal {\boldsymbol{x}} assuming some spectral prior to reconstruct it [anis2014towards, tsitsvero2015uncertainty, varma2015spectrum, chen2015discrete]. This approach is, in principle, also possible for recovering the second-order statistics of {\boldsymbol{x}}. However, when the goal is to reconstruct the second-order statistics of {\boldsymbol{x}} (and not {\boldsymbol{x}} itself), it is computationally advantageous, and allows for a stronger compression, when we avoid the intermediate step of reconstructing and storing {\boldsymbol{x}}. In this paper, we will therefore focus on recovering graph second-order statistics directly from subsampled graph signals. We refer to this problem as graph covariance subsampling.

The extension of compressive covariance sensing [ariananda2012compressive, romero2013wideband, Romero16CCSspm] to graph covariance subsampling is non-trivial. This is because for second-order (or wide-sense) stationary signals with a regular support, the covariance matrix has a clear structure (e.g., Toeplitz, circulant) that enables an elegant subsampler design, but for second-order stationary graph signals residing on arbitrary graphs, the covariance matrix does not admit any clear structure that can be easily exploited, in general.

Consider the problem of estimating the graph power spectrum of the second-order stationary graph signal {\boldsymbol{x}}\in\mathbb{C}^{N} from a set of K\ll N linear observations stacked in the vector {\boldsymbol{y}}\in\mathbb{C}^{K}, given by

 {\boldsymbol{y}}={\boldsymbol{\Phi}}{\boldsymbol{x}}, (7)

where {\boldsymbol{\Phi}} is a known K\times N selection matrix with Boolean entries, i.e., {\boldsymbol{\Phi}}\in\{0,1\}^{K\times N} (we will discuss the subsampler design in Section LABEL:sec:subsampdesgn) and where several realizations of {\boldsymbol{y}} may be available. The matrix {\boldsymbol{\Phi}} is referred to as the subsampling or sparse sampling matrix, where the compression is achieved by setting K\ll N. For applications where graph nodes correspond to sensing devices (e.g., weather stations in climatology, electroencephalography (EEG) probes in brain networks), such a sparse sampling scheme results in a significant reduction in the hardware, storage and communications costs next to the reduction in the processing costs.

The covariance matrices {\boldsymbol{R}}_{\boldsymbol{x}}=\mathbb{E}\{{\boldsymbol{x}}{\boldsymbol{x}}^{H}\}\in\mathbb{C}^{N\times N} and {\boldsymbol{R}}_{\boldsymbol{y}}=\mathbb{E}\{{\boldsymbol{y}}{\boldsymbol{y}}^{H}\}\in\mathbb{C}^{K\times K} contain the second-order statistics of {\boldsymbol{x}} and {\boldsymbol{y}}, respectively. In practice, typically, multiple snapshots, say N_{s} snapshots, are observed to form a sample covariance matrix. Forming the sample covariance matrix from N_{s} snapshots of {\boldsymbol{x}} costs \mathcal{O}(N^{2}N_{s}), while forming the sample covariance matrix from N_{s} snapshots of {\boldsymbol{y}} only costs \mathcal{O}(K^{2}N_{s}). We now state the problem of interest as follows.

###### Problem.

(Recovering second-order statistics) For a known undirected graph \mathcal{G}, given a number of realizations, say N_{s}, of the subsampled length-K graph signal {\boldsymbol{y}} or the subsampled covariance matrix {\boldsymbol{R}}_{\boldsymbol{y}}, recover the graph power spectrum {\boldsymbol{p}} and thus the covariance matrix {\boldsymbol{R}}_{\boldsymbol{x}}.

Let us decompose the graph signal {\boldsymbol{x}} in terms of its graph Fourier transform coefficients as [cf. (2)]

 {\boldsymbol{x}}=\sum_{i=1}^{N}x_{f,i}{\boldsymbol{u}}_{i}.

This allows us to represent the covariance matrix {\boldsymbol{R}}_{\boldsymbol{x}}=\mathbb{E}\{{\boldsymbol{x}}{\boldsymbol{x}}^{H}\} in the graph Fourier domain using the graph power spectrum {\boldsymbol{p}} as

 \displaystyle{\boldsymbol{R}}_{\boldsymbol{x}}=\sum_{i=1}^{N}\mathbb{E}\{|x_{f,i}|^{2}\}{\boldsymbol{u}}_{i}{\boldsymbol{u}}_{i}^{H}=\sum_{i=1}^{N}p_{i}{\boldsymbol{u}}_{i}{\boldsymbol{u}}_{i}^{H}=\sum_{i=1}^{N}p_{i}{\boldsymbol{Q}}_{i}, (8)

where we use the fact that for i\neq j we have \mathbb{E}\{x_{f,i}\bar{x}_{f,j}\}=0 and {\boldsymbol{Q}}_{i}={\boldsymbol{u}}_{i}{\boldsymbol{u}}_{i}^{H} is a size-N rank-one matrix. Here, we expand {\boldsymbol{R}}_{\boldsymbol{x}} using a set of N Hermitian matrices \{{\boldsymbol{Q}}_{1},{\boldsymbol{Q}}_{2},\ldots,{\boldsymbol{Q}}_{N}\} as a basis. Vectorizing {\boldsymbol{R}}_{\boldsymbol{x}} in (8) results in

 {\boldsymbol{r}}_{\boldsymbol{x}}={\rm vec}({\boldsymbol{R}}_{\boldsymbol{x}})=\sum_{i=1}^{N}p_{i}{\rm vec}({\boldsymbol{Q}}_{i})={\boldsymbol{\Psi}}_{\rm s}{\boldsymbol{p}},

where we have stacked {\rm vec}({\boldsymbol{Q}}_{i})=\bar{{\boldsymbol{u}}}_{i}\otimes{\boldsymbol{u}}_{i} to form the N^{2}\times N matrix {\boldsymbol{\Psi}}_{\rm s} as

 {\boldsymbol{\Psi}}_{\rm s}=[\bar{{\boldsymbol{u}}}_{1}\otimes{\boldsymbol{u}}_{1},\cdots,\bar{{\boldsymbol{u}}}_{N}\otimes{\boldsymbol{u}}_{N}]=\bar{{\boldsymbol{U}}}\circ{\boldsymbol{U}}.

The subscript “{\rm s}” in the matrix {\boldsymbol{\Psi}}_{\rm s}, which is constructed using the graph Fourier basis vectors, stands for spectral domain.
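The construction of {\boldsymbol{\Psi}}_{\rm s} and the identity {\boldsymbol{r}}_{\boldsymbol{x}}={\boldsymbol{\Psi}}_{\rm s}{\boldsymbol{p}} can be checked in a few lines (our own sketch; the graph and spectrum are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 6  # illustrative graph size

# Random symmetric graph-shift operator and its orthonormal eigenbasis U.
S = np.triu((rng.random((N, N)) < 0.5).astype(float), 1)
S = S + S.T
lam, U = np.linalg.eigh(S)  # U is real here, so conj(U) = U

p = rng.random(N)                # arbitrary nonnegative power spectrum
R_x = U @ np.diag(p) @ U.T       # covariance as in (5)

# Khatri-Rao product Ψ_s = conj(U) ∘ U: i-th column is conj(u_i) ⊗ u_i.
Psi_s = np.stack([np.kron(np.conj(U[:, i]), U[:, i]) for i in range(N)],
                 axis=1)  # N^2 x N

r_x = R_x.flatten(order="F")     # column-major vec(R_x)
assert np.allclose(Psi_s @ p, r_x)
# Full column rank of Ψ_s (Lemma 1): the columns are in fact orthonormal
# here, since <conj(u_i) ⊗ u_i, conj(u_j) ⊗ u_j> = |<u_i, u_j>|^2.
assert np.linalg.matrix_rank(Psi_s) == N
```

The rank assertion anticipates Lemma 1 below: for an orthonormal {\boldsymbol{U}}, the columns of {\boldsymbol{\Psi}}_{\rm s} are mutually orthogonal, so full column rank is immediate in this numerical check.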

Using the compression scheme described in (7), the covariance matrix {\boldsymbol{R}}_{\boldsymbol{y}}\in\mathbb{C}^{K\times K} of the subsampled graph signal {\boldsymbol{y}} can be related to {\boldsymbol{R}}_{\boldsymbol{x}} as

 \displaystyle{\boldsymbol{R}}_{\boldsymbol{y}}={\boldsymbol{\Phi}}{\boldsymbol{R}}_{\boldsymbol{x}}{\boldsymbol{\Phi}}^{T}=\sum_{i=1}^{N}p_{i}{\boldsymbol{\Phi}}{\boldsymbol{Q}}_{i}{\boldsymbol{\Phi}}^{T}. (9)

This means that the expansion coefficients of {\boldsymbol{R}}_{\boldsymbol{y}} with respect to the set \{{\boldsymbol{\Phi}}{\boldsymbol{Q}}_{1}{\boldsymbol{\Phi}}^{T},{\boldsymbol{\Phi}}{\boldsymbol{Q}}_{2}{\boldsymbol{\Phi}}^{T},\cdots,{\boldsymbol{\Phi}}{\boldsymbol{Q}}_{N}{\boldsymbol{\Phi}}^{T}\} are the same as those of {\boldsymbol{R}}_{\boldsymbol{x}} with respect to the set \{{\boldsymbol{Q}}_{1},{\boldsymbol{Q}}_{2},\cdots,{\boldsymbol{Q}}_{N}\}, i.e., the expansion coefficients are preserved under linear compression. It is not yet clear, though, whether these expansion coefficients, which basically represent the power spectrum, can be uniquely recovered from {\boldsymbol{R}}_{\boldsymbol{y}}.

Vectorizing {\boldsymbol{R}}_{\boldsymbol{y}} as

 {\boldsymbol{r}}_{\boldsymbol{y}}={\rm vec}({\boldsymbol{R}}_{\boldsymbol{y}})=({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\rm vec}({\boldsymbol{R}}_{\boldsymbol{x}})\in\mathbb{C}^{K^{2}}

we obtain

 \displaystyle{\boldsymbol{r}}_{\boldsymbol{y}}=\sum_{i=1}^{N}p_{i}({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}})(\bar{{\boldsymbol{u}}}_{i}\otimes{\boldsymbol{u}}_{i})=\sum_{i=1}^{N}p_{i}({\boldsymbol{\Phi}}\bar{{\boldsymbol{u}}}_{i}\otimes{\boldsymbol{\Phi}}{\boldsymbol{u}}_{i})=({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}_{\rm s}{\boldsymbol{p}}. (10)

This linear system with N unknowns has a unique solution if ({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}_{\rm s} has full column rank, which requires K^{2}\geq N. Assuming that this is the case, the graph power spectrum (thus the second-order statistics of {\boldsymbol{x}}) can be estimated in closed form via least squares:

 \widehat{\boldsymbol{p}}=[({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}_{\rm s}]^{\dagger}{\boldsymbol{r}}_{\boldsymbol{y}}. (11)

Computing this least squares solution costs \mathcal{O}(K^{2}N^{2}) [golub1996matrix]. Although for the non-parametric approach the cost of computing (11) is of the same order as that of the uncompressed case, the cost reduction will be prominent for the problems discussed later on in Section V. Further, to compute (11), we have assumed that the true covariance matrix {\boldsymbol{R}}_{\boldsymbol{y}} is available; the practical scenario with finite data records is discussed in Section LABEL:sec:finitedata.
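A minimal end-to-end sketch of this recovery (our own illustration: random graph, randomly chosen node subset, and the true covariance {\boldsymbol{R}}_{\boldsymbol{y}}, i.e., no finite-sample effects; this does not reproduce the designed samplers of the later sections):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 9  # illustrative graph size

S = np.triu((rng.random((N, N)) < 0.5).astype(float), 1)
S = S + S.T
lam, U = np.linalg.eigh(S)

p = rng.random(N)             # true graph power spectrum
R_x = U @ np.diag(p) @ U.T

Psi_s = np.stack([np.kron(np.conj(U[:, i]), U[:, i]) for i in range(N)],
                 axis=1)      # N^2 x N

# Observe K = 4 of the 9 nodes (K^2 = 16 >= N). Retry random selections
# until (Φ ⊗ Φ)Ψ_s has full column rank, i.e., Φ is a valid subsampler.
K = 4
for _ in range(100):
    sel = rng.choice(N, size=K, replace=False)
    Phi = np.zeros((K, N))
    Phi[np.arange(K), sel] = 1.0
    M = np.kron(Phi, Phi) @ Psi_s          # K^2 x N system matrix of (10)
    if np.linalg.matrix_rank(M) == N:
        break

R_y = Phi @ R_x @ Phi.T                    # compressed covariance, eq. (9)
r_y = R_y.flatten(order="F")
p_hat, *_ = np.linalg.lstsq(M, r_y, rcond=None)  # least squares, eq. (11)

assert np.allclose(p_hat, p)   # exact recovery from K = 4 of N = 9 nodes
```

Since the true {\boldsymbol{r}}_{\boldsymbol{y}} lies exactly in the range of the full-column-rank system matrix, least squares recovers {\boldsymbol{p}} exactly; with a sample covariance instead, \widehat{\boldsymbol{p}} would only be a consistent estimate.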

###### Definition 3.

A wide matrix {\boldsymbol{\Phi}} is a valid graph covariance subsampler if it yields a full column rank matrix ({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}_{\rm s}.

We now derive the conditions under which {\boldsymbol{\Phi}} is a valid graph covariance subsampler. To do this, we first introduce two important lemmas.

###### Lemma 1.

Since the matrix {\boldsymbol{U}}\in\mathbb{C}^{N\times N} is full rank, the matrix {\boldsymbol{\Psi}}_{\rm s}=\bar{{\boldsymbol{U}}}\circ{\boldsymbol{U}} of size {N^{2}\times N} has full column rank.

###### Proof.

See Appendix LABEL:app:proofselfkr. ∎

###### Lemma 2.

If the matrix {\boldsymbol{\Phi}}\in\mathbb{R}^{K\times N} has full row rank, then the matrix {\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}} of size K^{2}\times N^{2} also has full row rank.

###### Proof.

Follows from the singular value decomposition of {\boldsymbol{\Phi}} and the property ({\boldsymbol{A}}\otimes{\boldsymbol{B}})({\boldsymbol{C}}\otimes{\boldsymbol{D}})=({\boldsymbol{A}}{\boldsymbol{C}}\otimes{\boldsymbol{B}}{\boldsymbol{D}}). ∎

Using the above two lemmas, we can provide the necessary and sufficient conditions under which the solution in (11) is unique.

###### Theorem 1.

A full row rank matrix {\boldsymbol{\Phi}}\in\mathbb{R}^{K\times N} is a valid graph covariance subsampler if and only if the matrix ({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}_{\rm s} is tall, i.e., K^{2}\geq N, and {\rm null}({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}})\cap{\rm ran}({\boldsymbol{\Psi}}_{\rm s})=\{{\boldsymbol{0}}\}.

###### Proof.

See Appendix LABEL:app:theo1. ∎
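In practice, the condition of Definition 3 (equivalently, Theorem 1) can be tested directly as a rank check on ({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}_{\rm s}. The helper below is our own sketch (the function name is ours, not from the paper):

```python
import numpy as np

def is_valid_subsampler(Phi, U):
    """Check Definition 3: Φ is a valid graph covariance subsampler iff
    (Φ ⊗ Φ)Ψ_s has full column rank N (which requires K^2 >= N)."""
    N = U.shape[0]
    Psi_s = np.stack([np.kron(np.conj(U[:, i]), U[:, i]) for i in range(N)],
                     axis=1)
    M = np.kron(Phi, Phi) @ Psi_s
    return M.shape[0] >= N and np.linalg.matrix_rank(M) == N

# Illustrative graph Fourier basis from a random symmetric shift operator.
rng = np.random.default_rng(5)
N = 5
S = np.triu((rng.random((N, N)) < 0.6).astype(float), 1)
S = S + S.T
_, U = np.linalg.eigh(S)

# The identity sampler (no compression) is always valid, by Lemma 1.
assert is_valid_subsampler(np.eye(N), U)
# A single-node sampler (K = 1, K^2 < N) can never be valid.
assert not is_valid_subsampler(np.eye(N)[:1], U)
```

The greedy subsampler design discussed later selects the rows of {\boldsymbol{\Phi}}; a check like this only verifies that a given selection is identifiable, it does not design one.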

Although the linear system of equations (10) can be solved using (unconstrained) least squares, nonnegativity constraints or any spectral prior can be easily accounted for while solving (10) as summarized in the following remark.

###### Remark 1 (Spectral priors).

Any available prior information about the graph spectrum might allow for a higher compression with K^{2}<N, or an improvement of the solution (11). Suppose we have some prior knowledge about the graph spectrum, i.e., {\boldsymbol{p}}\in\mathcal{P} with \mathcal{P} being the constraint set. For instance, suppose we know a priori that (a) the spectrum is bandlimited (e.g., lowpass) with known support such that \mathcal{P}=\{{\boldsymbol{p}}\,|\,p_{n}=0,n\notin[N_{l},N_{u}]\}, where [N_{l},N_{u}] denotes the support set, (b) the spectrum is sparse, but with unknown support such that \mathcal{P}:=\{{\boldsymbol{p}}\,|\,\sum_{n=1}^{N}p_{n}=S\}, where S denotes the sparsity order (here, we use the convex relaxation of the cardinality constraint), or (c) the power spectrum is nonnegative (by definition), for which \mathcal{P}:=\{{\boldsymbol{p}}\,|\,p_{n}\geq 0,\forall n\}. With such spectral priors, the following constrained least squares problem may be solved

 \widehat{\boldsymbol{p}}=\operatorname*{arg\,min}_{{\boldsymbol{p}}\in\mathcal{P}}\,\|{\boldsymbol{r}}_{\boldsymbol{y}}-({\boldsymbol{\Phi}}\bar{{\boldsymbol{U}}}\circ{\boldsymbol{\Phi}}{\boldsymbol{U}}){\boldsymbol{p}}\|_{2}^{2}.
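As a concrete illustration of prior (c), a nonnegativity-constrained fit can be prototyped with an off-the-shelf nonnegative least squares solver. The sketch below is ours; the random symmetric shift operator, the spectrum, and the node selection are assumptions made for the example. It applies scipy.optimize.nnls to the real-valued model that arises when the shift operator is symmetric, so the eigenbasis {\boldsymbol{U}} is real.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)
N, K = 8, 4                                    # K**2 = 16 >= N = 8

# symmetric shift operator => real orthogonal eigenbasis U (example only)
S = rng.standard_normal((N, N))
S = (S + S.T) / 2
lam, U = np.linalg.eigh(S)

p_true = rng.uniform(0.0, 2.0, size=N)         # nonnegative power spectrum
R_x = U @ np.diag(p_true) @ U.T                # stationary covariance

idx = rng.choice(N, size=K, replace=False)     # random node selection
Phi = np.eye(N)[idx]
R_y = Phi @ R_x @ Phi.T
r_y = R_y.reshape(-1, order="F")               # vec(R_y), column-major

# model matrix with columns (Phi u_n) kron (Phi u_n), size K^2 x N
A = np.stack([np.kron(Phi @ U[:, n], Phi @ U[:, n]) for n in range(N)], axis=1)

p_hat, _ = nnls(A, r_y)                        # nonnegativity-constrained fit
print(np.max(np.abs(p_hat - p_true)))
```

In this noiseless example the constrained fit coincides with the unconstrained solution; the nonnegativity constraint becomes important with finite data records, where sample covariances are noisy.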

In what follows, we will discuss and illustrate the connections with compressive covariance sensing [ariananda2012compressive, Romero16CCSspm] for datasets that reside on regular structures (e.g., time series) using a circulant graph (e.g., a cycle graph). We will also see that designing a compression matrix is much more elegant for such circulant graphs.

## IV Circulant Graphs

Discrete-time finite or periodic data can be represented using directed cycle graphs, where the direction of the edge represents the evolution of time from past to future. The edge directions may be ignored in some cases, e.g., when we are only interested in exploiting the regular Fourier transform, when we are dealing with the spatial domain, or when the underlying data is a time-reversible stochastic process that is invariant under the reversal of the time scale [weiss1975time]. In such cases, the data can be represented using an undirected cycle graph, see Fig. 1.

Consider the adjacency matrix of this undirected cycle graph as its graph-shift operator, which will be an N\times N symmetric circulant matrix. We know that a circulant matrix can be diagonalized with a discrete Fourier transform matrix. In other words, the graph Fourier transform matrix {\boldsymbol{U}} related to this graph will consist of the orthonormal vectors

 {\boldsymbol{u}}_{n}=\frac{1}{\sqrt{N}}\,[{\omega}_{n}^{0},{\omega}_{n},{\omega}_{n}^{2},\cdots,{\omega}_{n}^{N-1}]^{T}

with {\omega}_{n}=\exp(-\imath 2\pi n/N), so that {\boldsymbol{U}} is a (scaled) Vandermonde matrix (here, \imath^{2}=-1). In general, for circulant graphs with circulant graph-shift operators, an eigenvalue decomposition is not required to compute the graph Fourier transform matrix {\boldsymbol{U}} or the model matrix {\boldsymbol{\Psi}}_{\rm s}, which was introduced in Section III.
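To see this concretely, the short check below (our illustration) verifies that the normalized DFT matrix diagonalizes the adjacency matrix of the undirected cycle graph, with eigenvalues 2\cos(2\pi n/N), so no eigenvalue decomposition is needed.

```python
import numpy as np

N = 8
# adjacency matrix of the undirected cycle graph: neighbors i-1 and i+1 (mod N)
S = np.roll(np.eye(N), 1, axis=1) + np.roll(np.eye(N), -1, axis=1)

# normalized DFT matrix: U[m, n] = exp(-1j*2*pi*m*n/N) / sqrt(N)
m = np.arange(N)
U = np.exp(-2j * np.pi * np.outer(m, m) / N) / np.sqrt(N)

D = U.conj().T @ S @ U                 # graph Fourier transform of S
off = D - np.diag(np.diag(D))          # off-diagonal residue, numerically ~0
print(np.max(np.abs(off)))
# eigenvalues of the cycle adjacency: 2*cos(2*pi*n/N)
print(np.allclose(np.diag(D).real, 2 * np.cos(2 * np.pi * m / N)))
```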

Let the set \mathcal{K}\subset\mathcal{N} denote the indices of the selected graph nodes. Now, if we can smartly select the entries of {{\boldsymbol{u}}}_{n} such that the related entries of \bar{{\boldsymbol{u}}}_{n}\otimes{\boldsymbol{u}}_{n} contain all the distinct values \{{\omega}_{n}^{m}\} for m=0,\cdots,N-1, the matrix ({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}_{\rm s} will be a full-column rank Vandermonde matrix. In particular, this means that, for every m=0,\ldots,N-1, there must exist at least one pair of elements n_{i},n_{j}\in\mathcal{K} that satisfies n_{i}-n_{j}=m, where the difference n_{i}-n_{j} is due to the Kronecker product \bar{{\boldsymbol{u}}}_{n}\otimes{\boldsymbol{u}}_{n}. Sets \mathcal{K} having this property are called sparse rulers [Romero16CCSspm]. Furthermore, if the set contains a minimum number of elements, they are called minimal sparse rulers, which results in the best possible compression.

Let us illustrate this with an example for N=10. In this case, the set \mathcal{K}=\{0,1,4,7,9\} with K=|\mathcal{K}|=5 elements is a minimal sparse ruler. In other words, by choosing the subsampling matrix {\boldsymbol{\Phi}}=\operatorname*{\mathrm{diag}}_{\rm r}[{\boldsymbol{w}}] with {\boldsymbol{w}}=[1,1,0,0,1,0,0,1,0,1]^{T} we can ensure that the matrix ({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}_{\rm s} is full column rank, and hence the second-order statistics of {\boldsymbol{x}} can be estimated using (11) by subsampling only K=5 nodes. Here, we achieve a compression rate of K/N=0.5. Similarly, for N=80, the minimal sparse ruler has K=15 elements, and this results in a compression rate of K/N=0.1875 (we will see an example related to N=80 and K=15 in Section LABEL:sec:numericalresults). Sparse rulers for other values of N are tabulated in [linebarger1993difference].
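The sparse-ruler property of \mathcal{K}=\{0,1,4,7,9\} can be verified by brute force; the snippet below (our illustration) checks that every difference m=0,\ldots,N-1 is realized by a pair of selected nodes.

```python
N = 10
K = [0, 1, 4, 7, 9]                      # candidate minimal sparse ruler

# all nonnegative pairwise differences n_i - n_j realized by selected nodes
diffs = {ni - nj for ni in K for nj in K if ni >= nj}
print(diffs == set(range(N)))            # every m = 0, ..., N-1 is covered
```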

Computing minimal sparse rulers is a combinatorial problem with no known closed-form solution. Nevertheless, subsamplers such as coprime [vaidyanathan2011sparse] and nested sparse samplers [pal2010nested], which can be computed using a closed-form expression for any N, are also valid covariance subsamplers. However, they are not minimal sparse rulers and thus they do not provide the best compression rate.

Subsampler design for reconstructing the second-order statistics of signals residing on a circulant graph is as elegant as that for reconstructing the second-order statistics of stationary time-series. The design of subsamplers for general graphs, however, is more challenging. This is the subject of Section LABEL:sec:subsampdesgn.

## V Parametric Models

In this section, we will focus on a parametric representation of the graph power spectrum. In particular, the focus will be on moving average and autoregressive parametric models. Typically, the model order (i.e., the number of parameters) is much smaller than the length of the graph signal, and since we now have to recover only these parameters, a much stronger compression can be achieved. This also means that we only need to store or transmit a few parameters, which can be used to generate realizations of second-order stationary graph signals (we will illustrate this with an example in Section LABEL:sec:numericalresults).

Parametric methods can be viewed as an alternative approach, where going to the graph spectral domain may be avoided, and instead, all the processing is done directly in the graph vertex domain.

### V-A Graph moving average models

As before, we assume that the stationary graph signal {\boldsymbol{x}} is generated by graph filtering zero-mean unit-variance white noise. Recall that in Section III we did not impose any structure on the graph filter, but now we will assume that the graph filter has a finite impulse response with an all-zero form as in (3); see [perraudin2016stationary, marques2016stationary].

Let us begin by writing the graph signal {\boldsymbol{x}} as

 {\boldsymbol{x}}={\boldsymbol{H}}({\boldsymbol{h}}){\boldsymbol{n}}=\sum_{l=0}^{L-1}h_{l}{\boldsymbol{S}}^{l}{\boldsymbol{n}}={\boldsymbol{U}}\left(\sum_{l=0}^{L-1}h_{l}{\boldsymbol{\Lambda}}^{l}\right){\boldsymbol{U}}^{H}{\boldsymbol{n}}

with covariance matrix

 {\boldsymbol{R}}_{\boldsymbol{x}}={{\boldsymbol{H}}}({\boldsymbol{h}}){{\boldsymbol{H}}}^{H}({\boldsymbol{h}})={\boldsymbol{U}}\left(\sum_{l=0}^{L-1}h_{l}{\boldsymbol{\Lambda}}^{l}\right)\left(\sum_{l=0}^{L-1}\bar{h}_{l}{\boldsymbol{\Lambda}}^{l}\right){\boldsymbol{U}}^{H}, (12)

where {\boldsymbol{x}} is a graph moving average (G-MA) signal of order L-1 with G-MA coefficients \{h_{l}\}_{l=0}^{L-1}, and the length-L vector {\boldsymbol{h}}=[h_{0},h_{1},\ldots,h_{L-1}]^{T} collects the G-MA coefficients. Moving average models are particularly useful to represent a smooth graph power spectrum [perraudin2016stationary, marques2016stationary].

The expression (12) basically means that we can express the covariance matrix {\boldsymbol{R}}_{\boldsymbol{x}} as a polynomial of the graph shift operator:

 {\boldsymbol{R}}_{\boldsymbol{x}}=\sum_{k=0}^{Q-1}b_{k}{\boldsymbol{S}}^{k}, (13)

where Q=\min\{2L-1,N\} unknown expansion coefficients \{b_{k}\}_{k=0}^{Q-1} collected in the vector {\boldsymbol{b}}=[b_{0},b_{1},\cdots,b_{Q-1}]^{T}\in\mathbb{R}^{Q} completely characterize the covariance matrix {\boldsymbol{R}}_{\boldsymbol{x}}. In other words, we assume a linear parametrization of the covariance matrix {\boldsymbol{R}}_{\boldsymbol{x}} using the set of Q Hermitian matrices \{{\boldsymbol{S}}^{0},{\boldsymbol{S}},\cdots,{\boldsymbol{S}}^{Q-1}\} as a basis.

The expansion coefficients {\boldsymbol{b}} depend on the G-MA coefficients {\boldsymbol{h}}. To see this, let us consider an example G-MA model with L=3 having coefficients {\boldsymbol{h}}=[h_{0},h_{1},h_{2}]^{T}, for which (13) simplifies to

 {\boldsymbol{R}}_{\boldsymbol{x}}=h_{0}^{2}{\boldsymbol{I}}+2h_{0}h_{1}{\boldsymbol{S}}+(h_{1}^{2}+2h_{0}h_{2}){\boldsymbol{S}}^{2}+2h_{1}h_{2}{\boldsymbol{S}}^{3}+h_{2}^{2}{\boldsymbol{S}}^{4}. (14)

This means that {\boldsymbol{b}}({\boldsymbol{h}}) will be of length 2L-1 with entries {\boldsymbol{b}}({\boldsymbol{h}})=[h_{0}^{2},2h_{0}h_{1},h_{1}^{2}+2h_{0}h_{2},2h_{1}h_{2},h_{2}^{2}]^{T} that are related to the G-MA parameters {\boldsymbol{h}}. To arrive at a simple (unconstrained) least squares estimator, we will ignore this structure in {\boldsymbol{b}} (we will discuss how to account for this structure at the end of this subsection). Therefore, with a slight abuse of notation, we will henceforth refer to {\boldsymbol{b}}({\boldsymbol{h}}) as the G-MA coefficients.
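The structure of {\boldsymbol{b}}({\boldsymbol{h}}) is simply that of a squared polynomial, so it can be computed by convolving {\boldsymbol{h}} with itself. The sketch below (our illustration, with example coefficient values) reproduces the entries in (14) for L=3.

```python
import numpy as np

h = np.array([2.0, -1.0, 0.5])           # example G-MA coefficients, L = 3

# b(h): coefficients of the squared polynomial (h0 + h1*t + h2*t^2)^2,
# which has length 2L - 1
b = np.convolve(h, h)

h0, h1, h2 = h
expected = [h0**2, 2*h0*h1, h1**2 + 2*h0*h2, 2*h1*h2, h2**2]
print(np.allclose(b, expected))          # matches the entries in (14)
```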

Depending on the shape of the power spectrum, Q can be much smaller than the number of graph nodes (i.e., the length of the vector {\boldsymbol{p}}), thus allowing a higher compression. In any case, the value of Q will be at most N, recalling that N is the degree of the minimal (and characteristic) polynomial of {\boldsymbol{S}}. That is to say, for Q>N, the set of matrices \{{\boldsymbol{S}}^{0},{\boldsymbol{S}},\cdots,{\boldsymbol{S}}^{Q-1}\} is linearly dependent.

Vectorizing {\boldsymbol{R}}_{\boldsymbol{x}} in (13) yields

 {\boldsymbol{r}}_{\boldsymbol{x}}={\rm vec}({\boldsymbol{R}}_{\boldsymbol{x}})=\sum_{k=0}^{Q-1}b_{k}{\rm vec}({\boldsymbol{S}}^{k})={\boldsymbol{\Psi}}_{\rm MA}{\boldsymbol{b}}, (15)

where we have stacked {\rm vec}({\boldsymbol{S}}^{k}) to form the columns of the matrix {\boldsymbol{\Psi}}_{\rm MA}\in\mathbb{R}^{N^{2}\times Q} as

 {\boldsymbol{\Psi}}_{\rm MA}=\left[{\rm vec}({\boldsymbol{S}}^{0}),{\rm vec}({\boldsymbol{S}}^{1}),\cdots,{\rm vec}({\boldsymbol{S}}^{Q-1})\right],

and the subscript “{\rm MA}” in {\boldsymbol{\Psi}}_{\rm MA} stands for moving average.

The covariance matrix of the subsampled graph signal {\boldsymbol{y}} in (7) will then be

 {\boldsymbol{R}}_{\boldsymbol{y}}={\boldsymbol{\Phi}}{\boldsymbol{R}}_{\boldsymbol{x}}{\boldsymbol{\Phi}}^{T}=\sum_{k=0}^{Q-1}b_{k}{\boldsymbol{\Phi}}{\boldsymbol{S}}^{k}{\boldsymbol{\Phi}}^{T}. (16)

As in the graph spectral domain approach discussed in Section III, the G-MA coefficients \{b_{k}\}_{k=0}^{Q-1} of {\boldsymbol{R}}_{\boldsymbol{y}} with respect to the set \{{\boldsymbol{\Phi}}{\boldsymbol{S}}^{0}{\boldsymbol{\Phi}}^{T},{\boldsymbol{\Phi}}{\boldsymbol{S}}{\boldsymbol{\Phi}}^{T},\cdots,{\boldsymbol{\Phi}}{\boldsymbol{S}}^{Q-1}{\boldsymbol{\Phi}}^{T}\} are the same as those of {\boldsymbol{R}}_{\boldsymbol{x}} with respect to the set \{{\boldsymbol{S}}^{0},{\boldsymbol{S}},\cdots,{\boldsymbol{S}}^{Q-1}\}.

Vectorizing {\boldsymbol{R}}_{\boldsymbol{y}}, we get a set of K^{2} equations in Q unknowns, given by

 {\boldsymbol{r}}_{\boldsymbol{y}}={\rm vec}({\boldsymbol{R}}_{\boldsymbol{y}})=({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\rm vec}({\boldsymbol{R}}_{\boldsymbol{x}})=({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}_{\rm MA}{\boldsymbol{b}}. (17)

If the matrix ({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}_{\rm MA} has full column rank, which requires K^{2}\geq Q, then the overdetermined system (17) can be uniquely solved using least squares as

 \widehat{\boldsymbol{b}}=[({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}_{\rm MA}]^{\dagger}{\boldsymbol{r}}_{\boldsymbol{y}}. (18)
###### Corollary 1.

A full row rank matrix {\boldsymbol{\Phi}}\in\mathbb{R}^{K\times N} is a valid graph covariance subsampler if and only if the matrix ({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}}){\boldsymbol{\Psi}}_{\rm MA} is tall, i.e., K^{2}\geq Q, and {\rm null}({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}})\cap{\rm ran}({\boldsymbol{\Psi}}_{\rm MA})=\{{\boldsymbol{0}}\}.

###### Proof.

Follows from Theorem 1. ∎
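Putting (16)–(18) together, the following sketch (our illustration) recovers the G-MA coefficients {\boldsymbol{b}} from the subsampled covariance on the N=10 undirected cycle graph, using the sparse-ruler selection \{0,1,4,7,9\} from Section IV; the particular value of {\boldsymbol{b}} is an assumption made for the example.

```python
import numpy as np

N, Q = 10, 5
# shift operator: adjacency matrix of the undirected cycle graph
S = np.roll(np.eye(N), 1, axis=1) + np.roll(np.eye(N), -1, axis=1)

b_true = np.array([1.0, 0.5, 0.25, 0.1, 0.05])   # example G-MA coefficients
R_x = sum(bk * np.linalg.matrix_power(S, k) for k, bk in enumerate(b_true))

# sparse-ruler selection {0, 1, 4, 7, 9}, K = 5, K^2 = 25 >= Q
Phi = np.eye(N)[[0, 1, 4, 7, 9]]
r_y = (Phi @ R_x @ Phi.T).reshape(-1, order="F")  # vec(R_y)

# Psi_MA with columns vec(S^k), compressed by (Phi kron Phi)
Psi_MA = np.stack(
    [np.linalg.matrix_power(S, k).reshape(-1, order="F") for k in range(Q)],
    axis=1)
A = np.kron(Phi, Phi) @ Psi_MA                    # K^2 x Q system matrix

# least squares solve of (17), i.e., the estimator (18)
b_hat, *_ = np.linalg.lstsq(A, r_y, rcond=None)
print(np.allclose(b_hat, b_true))
```

Since the sparse ruler makes the compressed model matrix full column rank, the noiseless least squares solve returns the true coefficients.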

Although knowing the moving average filter coefficients {\boldsymbol{b}} is equivalent to knowing {\boldsymbol{R}}_{\boldsymbol{x}}, it is instructive to study the relation between {\boldsymbol{b}} and the power spectrum {\boldsymbol{p}}. We can relate the vectors {\boldsymbol{p}} and {\boldsymbol{b}} by using (6) and (13). That is, we can write p_{n}=\sum_{k=0}^{Q-1}b_{k}{\lambda}_{n}^{k}, or in matrix-vector form we have

 {\boldsymbol{p}}={\boldsymbol{V}}_{Q}{\boldsymbol{b}},

where {\boldsymbol{V}}_{Q} is an N\times Q Vandermonde matrix with (i,j)th entry equal to \lambda_{i}^{j-1}. To recover {\boldsymbol{p}} from {\boldsymbol{b}}, however, we need all the N eigenvalues of {\boldsymbol{S}} to construct {\boldsymbol{V}}_{Q}.

This relation between {\boldsymbol{p}} and {\boldsymbol{b}} can be used to show the equivalence between the linear models (10) and (17) as follows. The fact that {\boldsymbol{S}}^{k}={\boldsymbol{U}}{\boldsymbol{\Lambda}}^{k}{\boldsymbol{U}}^{H} from (1) allows us to express {\boldsymbol{\Psi}}_{\rm MA} in (17) as {\boldsymbol{\Psi}}_{\rm MA}=({\bar{\boldsymbol{U}}}\circ{\boldsymbol{U}}){\boldsymbol{V}}_{Q}. Using this in (17), we obtain {\boldsymbol{r}}_{\boldsymbol{y}}=({\boldsymbol{\Phi}}\otimes{\boldsymbol{\Phi}})({\bar{\boldsymbol{U}}}\circ{\boldsymbol{U}}){\boldsymbol{V}}_{Q}{\boldsymbol{b}}=({\boldsymbol{\Phi}}{\bar{\boldsymbol{U}}}\circ{\boldsymbol{\Phi}}{\boldsymbol{U}}){\boldsymbol{p}}.
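This factorization is easy to confirm numerically; the check below (ours, with a random symmetric shift operator as an assumption for the example) verifies {\boldsymbol{\Psi}}_{\rm MA}=(\bar{{\boldsymbol{U}}}\circ{\boldsymbol{U}}){\boldsymbol{V}}_{Q}.

```python
import numpy as np

rng = np.random.default_rng(2)
N, Q = 6, 4

# random symmetric shift operator: S = U diag(lam) U^T with real U
S = rng.standard_normal((N, N))
S = (S + S.T) / 2
lam, U = np.linalg.eigh(S)

# Psi_MA with columns vec(S^k)
Psi_MA = np.stack(
    [np.linalg.matrix_power(S, k).reshape(-1, order="F") for k in range(Q)],
    axis=1)

# khatri-rao conj(U) o U (U is real here) and Vandermonde V_Q of eigenvalues
Psi_s = np.stack([np.kron(U[:, n], U[:, n]) for n in range(N)], axis=1)
V_Q = np.vander(lam, Q, increasing=True)   # (i, j) entry is lam_i ** j

print(np.allclose(Psi_MA, Psi_s @ V_Q))
```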

In the following, we exploit the structure in {\boldsymbol{b}}, which we ignored while solving (17), to develop a constrained least squares estimator.

###### Remark 2 (Constrained least squares).

To reveal the structure in {\boldsymbol{b}}({\boldsymbol{h}}), let us recall the example (14) with L=3. The coefficients in {\boldsymbol{b}}({\boldsymbol{h}}) are related to the squared polynomial p(t)=(h_{0}+h_{1}t+h_{2}t^{2})^{2}, which can also be written as

 p(t)={\boldsymbol{h}}^{T}\left[\begin{array}{ccc}1&t&t^{2}\\ t&t^{2}&t^{3}\\ t^{2}&t^{3}&t^{4}\end{array}\right]{\boldsymbol{h}}