Nonparametric collective spectral density estimation with an application to clustering the brain signals

Mehdi Maadooliat^{1} (Department of Mathematics, Statistics and Computer Science, Marquette University, Milwaukee, WI 53201-1881 USA. E-mail: mehdi@mscs.mu.edu), Ying Sun^{2} (CEMSE Division, King Abdullah University of Science and Technology, Thuwal 23955-6900 Saudi Arabia. E-mail: ying.sun@kaust.edu.sa) and Tianbo Chen^{2}

July 15, 2019

Abstract

In this paper, we develop a method for the simultaneous estimation of spectral density functions (SDFs) for a collection of stationary time series that share some common features. Due to the similarities among the SDFs, each log-SDF can be represented using a common set of basis functions. The basis shared by the collection of log-SDFs is estimated as a low-dimensional manifold of a large space spanned by a pre-specified rich basis. A collective estimation approach pools information and borrows strength across the SDFs to achieve better estimation efficiency. Moreover, each estimated spectral density has a concise representation in terms of the coefficients of the basis expansion, and these coefficients can be used for visualization, clustering, and classification purposes. The Whittle pseudo-maximum likelihood approach is used to fit the model, and an alternating blockwise Newton-type algorithm is developed for the computation. A web-based Shiny app, available at “https://ncsde.shinyapps.io/NCSDE”, is developed for visualizing, training, and learning the SDFs collectively using the proposed technique. Finally, we apply our method to cluster similar brain signals recorded by the electroencephalogram, identifying brain regions that are synchronized according to their spectral densities.

MSC 2010 subject classifications: Primary 62G05, 62M15; secondary 62P10.

Keywords and phrases: Collective estimation; nonparametric estimation; roughness penalty; time series clustering; spectral density function; Whittle likelihood

Short title: Nonparametric collective spectral density estimation

## 1 Introduction

The spectral density plays an important role in time series analysis in the frequency domain. Suppose that X_{t} is a zero-mean, weakly stationary time series with the autocovariance function \gamma(h) defined as \gamma(h)=E(X_{t}X_{t+h}), h=0,\pm 1,\pm 2,\ldots. If the autocovariance function is absolutely summable, i.e., \sum_{h=-\infty}^{\infty}|\gamma(h)|<\infty, then the autocovariance sequence \gamma(h) has the spectral representation

\gamma(h)=\displaystyle\int_{-1/2}^{1/2}f(\omega)\,e^{2\pi i\omega h}\,d\omega,

where f(\omega) is the spectral density of X_{t}, which has the inverse Fourier representation

f(\omega)=\sum_{h=-\infty}^{\infty}\gamma(h)e^{-2\pi i\omega h},\quad-1/2\leq\omega\leq 1/2.

Given the time series \{x_{t}, t=1,\ldots,n\}, replacing \gamma(h) by the sample autocovariance \hat{\gamma}(h) yields the periodogram

{I_{n}(\omega_{j})}=\displaystyle\sum_{h=-(n-1)}^{n-1}\hat{\gamma}(h)e^{-2\pi i\omega_{j}h},\quad j=0,1,\ldots,n-1,

which can be calculated as I_{n}(\omega_{j})=|d(\omega_{j})|^{2}, where \displaystyle d(\omega_{j})=n^{-1/2}\sum_{t=1}^{n}x_{t}e^{-2\pi i\omega_{j}t} is the discrete Fourier transform of x_{t} at the fundamental frequencies \omega_{j}=j/n.
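As a quick illustration (not part of the methodology itself), the periodogram can be computed through the FFT exactly as described above. The following numpy sketch assumes a real-valued series stored in a one-dimensional array:

```python
import numpy as np

def periodogram(x):
    """Raw periodogram I_n(omega_j) = |d(omega_j)|^2 at the fundamental
    frequencies omega_j = j/n, computed via the FFT."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = np.fft.fft(x) / np.sqrt(n)   # DFT values d(omega_j), j = 0, ..., n-1
    return np.abs(d) ** 2            # I_n(omega_j) = |d(omega_j)|^2
```

By Parseval's identity, the periodogram ordinates sum to \sum_{t}x_{t}^{2}, which provides a convenient sanity check on any implementation.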

However, the raw periodogram is not a consistent estimator of the spectral density of a stationary random process. One classical way to obtain a consistent estimator is to smooth the periodogram across frequencies. yuen1979smoothed analyzed the performance of three periodogram-smoothing methods for spectrum estimation. wahba1980automatic developed an objective, optimal smoothing procedure that estimates the log-spectral density by spline smoothing of the log-periodogram. Discrete spectral average and lag-window estimators, both consistent, were introduced in brockwell2013time. brillinger2001time introduced kernel smoothing of the periodogram. One critical issue in periodogram smoothing is span selection. lee1997simple proposed a span selector based on unbiased risk estimation, ombao2001simple proposed using the Gamma-deviance generalized cross-validation (Gamma GCV), and lee2001stabilized proposed a bandwidth selector that couples the so-called plug-in and unbiased-risk-estimation ideas.

Another popular class of methods for spectral density estimation is based on likelihoods. For example, capon1983maximum used the maximum likelihood method to estimate the spectral density of signals with noise, and in guler2001ar the spectral density of brain signal data was analyzed using maximum likelihood. The well-known Whittle likelihood was developed for time series analysis in whittle1953estimation, whittle1954some, and whittle1962gaussian. pawitan1994nonparametric proposed a penalized Whittle likelihood for estimating spectral density functions, chow1985sieve developed a penalized likelihood-type method for the nonparametric estimation of the spectral density of Gaussian processes, and fan1998automatic used local polynomial techniques to fit the Whittle likelihood for spectral density estimation. Suppose there is a time series {\bf X} of length n from a zero-mean stationary Gaussian process with a parametric covariance function \gamma(h;{\boldsymbol{\theta}}), where {\boldsymbol{\theta}} is the unknown parameter. If we let \Gamma_{n,{\boldsymbol{\theta}}} be the covariance matrix of the random vector {\bf X}, then the likelihood function is

L({\boldsymbol{\theta}})=\displaystyle\frac{1}{(2\pi)^{n/2}|\Gamma_{n,{\boldsymbol{\theta}}}|^{1/2}}\exp\left(-\frac{1}{2}{\bf X}^{\top}\Gamma^{-1}_{n,{\boldsymbol{\theta}}}{\bf X}\right),

and \ell({\boldsymbol{\theta}})=-2\,\hbox{log}\,L({\boldsymbol{\theta}}), i.e., twice the negative log-likelihood, satisfies

\ell({\boldsymbol{\theta}})\propto\hbox{log}(|\Gamma_{n,{\boldsymbol{\theta}}}|)+{\mathrm{tr}}\left({\bf X}{\bf X}^{\top}\Gamma^{-1}_{n,{\boldsymbol{\theta}}}\right).

Whittle1954 proposed an approximation of the above log-likelihood function, known as the Whittle likelihood approximation:

\displaystyle\ell_{W}({\boldsymbol{\theta}})=n\int_{-1/2}^{1/2}\left\{\hbox{log}({f_{\boldsymbol{\theta}}(\omega)})+I_{n}(\omega){f_{\boldsymbol{\theta}}(\omega)}^{-1}\right\}d\omega, \qquad (1)

where {f_{\boldsymbol{\theta}}(\omega)} is the spectral density function. For a discrete frequency range, the Whittle approximation (1) can be written as

\ell_{W}({\boldsymbol{\theta}})\doteq\sum_{-1/2<\omega_{j}<1/2}\Bigl\{\hbox{log}({f_{\boldsymbol{\theta}}(\omega_{j})})+{I_{n}(\omega_{j})}{f_{\boldsymbol{\theta}}(\omega_{j})}^{-1}\Bigr\}.

Using the fast Fourier transform (FFT), \ell_{W}({\boldsymbol{\theta}}) can be obtained efficiently with only O\left(n\hbox{log}_{2}n\right) operations.
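To make the discrete Whittle approximation concrete, the sketch below evaluates the sum above for a hypothetical white-noise model f_{\boldsymbol{\theta}}(\omega)=\theta (the model and the variable names are ours, for illustration only). For this model, the criterion is minimized at \theta equal to the mean periodogram ordinate:

```python
import numpy as np

def whittle_criterion(f, I):
    """Discrete Whittle criterion sum_j { log f(omega_j) + I_n(omega_j)/f(omega_j) },
    to be minimized over the candidate SDF values f evaluated at the
    Fourier frequencies."""
    f = np.asarray(f, dtype=float)
    return np.sum(np.log(f) + I / f)

# Hypothetical white-noise model f_theta(omega) = theta (constant SDF):
# the minimizer is theta-hat = mean of the periodogram ordinates.
```

Differentiating \sum_{j}\{\hbox{log}\,\theta+I_{n}(\omega_{j})/\theta\} with respect to \theta confirms that the minimum is attained at the mean periodogram ordinate.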

We apply our method to the neuroscience study of functional connectivity in the brain with electroencephalogram (EEG) data. The EEG is a method of monitoring spontaneous electrical activity in the brain over a period of time. EEGs are typically recorded from multiple electrodes, referred to as EEG channels, placed on the scalp. Clustering the EEG channels is useful for understanding how different brain regions communicate with each other. To identify synchronized EEG channels that show similar oscillations or waveforms, many time series clustering algorithms have been developed to cluster the smoothed periodograms estimated separately from the EEG channels (see Hennig:2015, for some examples).

In this paper, we propose collectively estimating multiple spectral densities that share common features. Collective estimation was first proposed by Maadooliat2016 for the estimation of probability density functions in protein structure analysis. For multiple stationary time series that may share similar spectral density functions, we propose a nonparametric spectral density estimation approach that minimizes a penalized Whittle likelihood. The dimension is reduced by representing the log-SDFs with a common set of basis functions, while the individual SDFs are allowed to differ by introducing random effects into the model. The collectively estimated spectral densities can then be used for clustering purposes. Collective estimation is a novel, statistically more efficient approach for estimating multiple spectral densities that share common features.

The rest of the paper is organized as follows. Section 2 presents the core of the proposed method: subsection 2.1 derives the blockwise Newton-Raphson algorithm, and the subsequent subsections provide the implementation details, including the choice of penalties and the identifiability restrictions. We then report simulation results that illustrate the proposed collective estimation approach and compare it with competitive non-collective estimation approaches, present the application to the EEG data, and conclude the paper.

## 2 Methodology

We consider a collection of stationary time series {\bf X}_{i}, i=1,\cdots,m, where {\bf X}_{i}^{\top}=(X_{i1},X_{i2},\cdots,X_{in}), with associated spectral density f_{i}. We aim to estimate the spectral density functions jointly. The rationale of this collective spectral density estimation approach is to improve the efficiency of the estimates by using a shared basis to represent the SDFs.

We assume that each log-spectral density function can be represented by a linear combination of a common set of basis functions \{\phi_{k}(\omega),k=1,\cdots,K\}, and that each has its own set of coefficients. More specifically, we assume that \hbox{log}\{f_{i}(\omega)\}=u_{i}(\omega), with

u_{i}(\omega)=\sum_{k=1}^{K}\phi_{k}(\omega)\alpha_{ik},\qquad i=1,\dots,m. \qquad (2)

Equivalently, the spectral density functions can be written as

f_{i}(\omega)=\exp\{u_{i}(\omega)\}=\exp\Biggl\{\sum_{k=1}^{K}\phi_{k}(\omega)\alpha_{ik}\Biggr\},\quad i=1,\dots,m. \qquad (3)

For identifiability, we require that 1,\phi_{1},\dots,\phi_{K} be linearly independent. We would like K to be small so that the number of parameters to be estimated remains manageable even when we estimate a large number of spectral densities (i.e., when m is large).

In our setting, the basis functions are not prespecified and need to be determined from the data. To this end, we suppose that these basis functions fall in a low-dimensional subspace of a function space spanned by a rich family of fixed basis functions, \{b_{\ell}(\omega),\ell=1,\cdots,L\} (L\gg K), such that

\phi_{k}(\omega)=\sum_{\ell=1}^{L}b_{\ell}(\omega)\theta_{\ell k},\qquad k=1,\dots,K. \qquad (4)

For identifiability, we require that 1,\ b_{1},\dots,b_{L} be linearly independent. A large enough L ensures the necessary flexibility to represent the unknown spectral densities. For univariate cases, the fixed basis can be the monomials, the B-splines, or the Fourier basis; bivariate splines can be used as the fixed basis functions for bivariate spectral densities.

To simplify the presentation, we now introduce some vectors and matrices to denote the quantities of interest: {\boldsymbol{\phi}}(\omega)=(\phi_{1}(\omega),\cdots,\phi_{K}(\omega))^{\top}, {\boldsymbol{\alpha}}_{i}=(\alpha_{i1},\cdots,\alpha_{iK})^{\top}, {\bf b}(\omega)=(b_{1}(\omega),\cdots,b_{L}(\omega))^{\top}, {\boldsymbol{\theta}}_{k}=(\theta_{1k},\cdots,\theta_{Lk})^{\top}, and {\boldsymbol{\Theta}}=({\boldsymbol{\theta}}_{1},\cdots,{\boldsymbol{\theta}}_{K}). Then, from (2) and (4), we can rewrite u_{i}(\omega) in vector-matrix form as

u_{i}(\omega)={\boldsymbol{\phi}}(\omega)^{\top}{\boldsymbol{\alpha}}_{i}={\bf b}(\omega)^{\top}{\boldsymbol{\Theta}}{\boldsymbol{\alpha}}_{i},\qquad i=1,\dots,m. \qquad (5)

One may combine the equations above for i=1,\dots,m into a matrix form by evaluating the log-SDFs over the discrete frequencies {\boldsymbol{\omega}}=(\omega_{1},\cdots,\omega_{\tilde{n}})^{\top}. The m equations given in (5) can be written as {\bf U}={\bf B}{\boldsymbol{\Theta}}{\bf A}^{\top}, where {\bf U}=\left(u_{1}({\boldsymbol{\omega}}),\cdots,u_{m}({\boldsymbol{\omega}})\right) is an \tilde{n}\times m matrix that represents the log-SDFs, {\bf B}=\left({\bf b}(\omega_{1}),\cdots,{\bf b}(\omega_{\tilde{n}})\right)^{\top} is an \tilde{n}\times L matrix that evaluates the rich basis functions at the discrete frequencies {\boldsymbol{\omega}}, and {\bf A}=({\boldsymbol{\alpha}}_{1},\dots,{\boldsymbol{\alpha}}_{m})^{\top}. The unknown parameters can then be collectively written as the pair ({\boldsymbol{\Theta}},{\bf A}). There is an identifiability issue caused by the non-uniqueness of this parametrization; it can be resolved by introducing some restrictions on the parameterization, which we discuss in a later subsection.

We could have used the fixed basis \{b_{\ell}(\omega),\ell=1,\cdots,L\} directly in (2) and (3); however, that would either be too restrictive (if L is small) or produce a large number of parameters (if L is large). Alternatively, if we were to model the individual density functions separately using the fixed basis \{b_{\ell}(\omega),\ell=1,\cdots,L\}, then we would write

u_{i}(\omega)={\bf b}(\omega)^{\top}\boldsymbol{\psi}_{i},\qquad i=1,\dots,m. \qquad (6)

We let \mathbf{\Psi}=(\boldsymbol{\psi}_{1},\dots,\boldsymbol{\psi}_{m}) be the L\times m matrix of coefficients from the basis expansions given in (6). By comparing (5) and (6), we obtain \mathbf{\Psi}={\boldsymbol{\Theta}}{\bf A}^{\top}, which is a rank K matrix. Thus, the collective modeling approach introduces a low-rank structure to the coefficient matrix in the basis expansion of the log-spectral densities. This dimensionality reduction allows us to significantly reduce the number of parameters to be estimated and, thus, gain estimation efficiency.
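The parameter saving can be seen directly from the shapes involved. A small numpy sketch with hypothetical dimensions L=30, K=3, m=50 (the numbers are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
L, K, m = 30, 3, 50                    # rich basis size, shared basis size, number of series
Theta = rng.standard_normal((L, K))    # coefficients of the K shared basis functions
A = rng.standard_normal((m, K))        # series-specific coefficients alpha_i (as rows)

Psi = Theta @ A.T                      # L x m coefficient matrix of all log-SDFs; rank K
collective_params = L * K + m * K      # 240 parameters under the collective model
separate_params = L * m                # 1500 parameters when each series is fit separately
```

The factorization \mathbf{\Psi}={\boldsymbol{\Theta}}{\bf A}^{\top} forces \mathbf{\Psi} to have rank K, which is exactly the low-rank structure described above.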

### 2.1 Estimation using penalized Whittle likelihood

The periodogram I_{i,n} is a rough estimate of the spectral density, f_{i}, associated with the time series {\bf X}_{i} observed over n time points:

I_{i,n}(\omega)=\Bigl|\,n^{-1/2}\displaystyle\sum_{t=1}^{n}X_{it}e^{-2\pi it\omega}\Bigr|^{2}.

We consider the periodogram at the Fourier frequencies \bm{\omega}=\bm{J}/n for \bm{J}=\{-\lfloor(n-1)/2\rfloor,\cdots,\lfloor n/2\rfloor\}.

The Whittle likelihood for estimating m spectral densities has the following form:

\ell_{W}({\boldsymbol{\Theta}},{\bf A})=-\sum_{i=1}^{m}\sum_{{j}\in\bm{J}}\Biggl\{u_{i}(\omega_{j})+I_{i,n}(\omega_{j})\exp\bigl[-u_{i}(\omega_{j})\bigr]\Biggr\}, \qquad (7)

where the u_{i}(\omega) are defined in (5). The likelihood is concave in {\boldsymbol{\alpha}}_{i} when the other parameters are fixed, and concave in {\boldsymbol{\theta}}_{k} when the other parameters are fixed. Applying the roughness penalty approach to function estimation (Green1994), we estimate the model parameters by minimizing the penalized likelihood criterion

-2\,\ell_{W}({\boldsymbol{\Theta}},{\bf A})+\lambda\sum_{k=1}^{K}\,{\sf PEN}(\phi_{k}), \qquad (8)

where {\sf PEN}(\phi_{k}) is a roughness penalty function that regularizes the estimated basis function \phi_{k} to ensure that it is a smooth function, and \lambda>0 is a penalty parameter. The penalty function can be written in a quadratic form as

\sum_{k=1}^{K}\,{\sf PEN}(\phi_{k})=\sum_{k=1}^{K}{\boldsymbol{\theta}}_{k}^{\top}{\bf R}{\boldsymbol{\theta}}_{k}={\mathrm{tr}}\{{\boldsymbol{\Theta}}^{\top}{\bf R}{\boldsymbol{\Theta}}\}. \qquad (9)

Two choices for the penalty matrices ({\bf R}_{1} and {\bf R}_{2}), based on the second derivative of the basis functions and the difference operator, are given in subsection 2.3.
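For reference, the objective that the alternating algorithm minimizes, i.e., twice the Whittle sum plus the quadratic penalty (9), can be sketched in numpy as follows (the function and matrix names are ours; B holds the rich basis evaluated at the frequencies):

```python
import numpy as np

def penalized_whittle(Theta, A, B, I, lam, R):
    """Penalized Whittle criterion: twice the Whittle sum over all series
    and frequencies, plus the quadratic roughness penalty (9).
    B: n~ x L rich basis matrix, I: n~ x m periodograms (one column per
    series), Theta: L x K, A: m x K, R: L x L penalty matrix."""
    U = B @ Theta @ A.T                     # n~ x m matrix of log-SDFs u_i(omega_j)
    whittle = np.sum(U + I * np.exp(-U))    # sum over frequencies and series
    penalty = lam * np.trace(Theta.T @ R @ Theta)
    return 2.0 * whittle + penalty
```

Increasing \lambda only adds \lambda\,{\mathrm{tr}}\{{\boldsymbol{\Theta}}^{\top}{\bf R}{\boldsymbol{\Theta}}\} to the objective, which is the handle the roughness penalty gives us on the smoothness of the estimated basis.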

We use an alternating blockwise Newton-Raphson algorithm to minimize the penalized Whittle likelihood approximation. Our algorithm cycles through updating {\boldsymbol{\alpha}}_{i} for i=1,\dots,m and {\boldsymbol{\theta}}_{k} for k=1,\dots,K until convergence. Following the usual step-halving strategy for the Newton-Raphson iteration (with step size \tau), the updating formulas are

\displaystyle{\boldsymbol{\alpha}}_{i}^{new}={\boldsymbol{\alpha}}_{i}^{old}-\tau\,\biggl[\frac{\partial^{2}}{\partial{\boldsymbol{\alpha}}_{i}\partial{\boldsymbol{\alpha}}_{i}^{\top}}\{\ell_{W}({\boldsymbol{\Theta}},{\bf A})\}\biggr]^{-1}\biggl[\frac{\partial}{\partial{\boldsymbol{\alpha}}_{i}}\{\ell_{W}({\boldsymbol{\Theta}},{\bf A})\}\biggr]\bigg|_{{\boldsymbol{\Theta}}={\boldsymbol{\Theta}}^{old},{\bf A}={\bf A}^{old}} \qquad (10)

\displaystyle\phantom{{\boldsymbol{\alpha}}_{i}^{new}}={\boldsymbol{\alpha}}_{i}^{old}-\tau\biggl[{\boldsymbol{\Theta}}^{\top}\sum_{j}\Bigl\{{\bf b}(\omega_{j})I_{i,n}(\omega_{j})\exp\bigl[-u_{i}(\omega_{j})\bigr]{\bf b}(\omega_{j})^{\top}\Bigr\}{\boldsymbol{\Theta}}\biggr]^{-1}\times

\displaystyle\qquad\biggl[{\boldsymbol{\Theta}}^{\top}\sum_{j}\Bigl\{{\bf b}(\omega_{j})-{\bf b}(\omega_{j})I_{i,n}(\omega_{j})\exp\bigl[-u_{i}(\omega_{j})\bigr]\Bigr\}\biggr]\bigg|_{{\boldsymbol{\Theta}}={\boldsymbol{\Theta}}^{old},{\bf A}={\bf A}^{old}}

and

\displaystyle{\boldsymbol{\theta}}_{k}^{new}={\boldsymbol{\theta}}_{k}^{old}-\tau\,\biggl[\frac{\partial^{2}}{\partial{\boldsymbol{\theta}}_{k}\partial{\boldsymbol{\theta}}_{k}^{\top}}\{\ell_{W}({\boldsymbol{\Theta}},{\bf A})\}-\lambda{\bf R}\biggr]^{-1}\biggl[\frac{\partial}{\partial{\boldsymbol{\theta}}_{k}}\{\ell_{W}({\boldsymbol{\Theta}},{\bf A})\}-\lambda{\bf R}{\boldsymbol{\theta}}_{k}\biggr]\bigg|_{{\boldsymbol{\Theta}}={\boldsymbol{\Theta}}^{old},{\bf A}={\bf A}^{old}}

\displaystyle\phantom{{\boldsymbol{\theta}}_{k}^{new}}={\boldsymbol{\theta}}_{k}^{old}-\tau\biggl[\sum_{i=1}^{m}\alpha_{ik}^{2}\sum_{j}\Bigl\{{\bf b}(\omega_{j})I_{i,n}(\omega_{j})\exp\bigl[-u_{i}(\omega_{j})\bigr]{\bf b}(\omega_{j})^{\top}\Bigr\}+\lambda{\bf R}\biggr]^{-1}\times

\displaystyle\qquad\biggl[\sum_{i=1}^{m}\alpha_{ik}\sum_{j}\Bigl\{{\bf b}(\omega_{j})-{\bf b}(\omega_{j})I_{i,n}(\omega_{j})\exp\bigl[-u_{i}(\omega_{j})\bigr]\Bigr\}+\lambda{\bf R}{\boldsymbol{\theta}}_{k}\biggr]\bigg|_{{\boldsymbol{\Theta}}={\boldsymbol{\Theta}}^{old},{\bf A}={\bf A}^{old}}. \qquad (11)
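A minimal numpy sketch of one Newton update of {\boldsymbol{\alpha}}_{i}, using the Hessian and gradient displayed in (10) (the function name and the toy dimensions are ours):

```python
import numpy as np

def alpha_step(alpha_i, Theta, B, I_i, tau=1.0):
    """One Newton update of alpha_i as in (10).
    B: n~ x L rich basis matrix, Theta: L x K, I_i: length-n~ periodogram
    of series i, tau: step-halving factor."""
    u = B @ Theta @ alpha_i                  # u_i(omega_j) at all frequencies
    w = I_i * np.exp(-u)                     # weights I_{i,n}(omega_j) exp[-u_i(omega_j)]
    Phi = B @ Theta                          # n~ x K evaluated shared basis
    H = Phi.T @ (Phi * w[:, None])           # Theta^T sum_j { b w b^T } Theta
    g = Phi.T @ (1.0 - w)                    # Theta^T sum_j { b - b w }
    return alpha_i - tau * np.linalg.solve(H, g)
```

As a sanity check, when the evaluated shared basis is constant (so u_{i} is a single free level), iterating this step converges to the log of the mean periodogram ordinate, the closed-form white-noise solution.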

### 2.2 Selecting the tuning parameter

We may select the penalty parameter by minimizing the AIC (Akaike1974),

{\rm AIC}(\lambda)=-2\,\ell_{W}(\widehat{{\boldsymbol{\Theta}}},\widehat{{\bf A}})+2\,{\sf df}(\lambda), \qquad (12)

where \ell_{W}({\boldsymbol{\Theta}},{\bf A}) is the log-likelihood defined in (7), and the degrees of freedom {\sf df}(\lambda) is defined as

\displaystyle{\sf df}(\lambda)=\sum_{k=1}^{K}{\rm trace}\biggl\{\biggl[\sum_{i=1}^{m}\alpha_{ik}^{2}\sum_{j}\Bigl\{{\bf b}(\omega_{j})I_{i,n}(\omega_{j})\exp\bigl[-u_{i}(\omega_{j})\bigr]{\bf b}(\omega_{j})^{\top}\Bigr\}+\lambda{\bf R}\biggr]^{-1}\times\biggr.

\displaystyle\qquad\biggl.\biggl[\sum_{i=1}^{m}\alpha_{ik}^{2}\sum_{j}\Bigl\{{\bf b}(\omega_{j})I_{i,n}(\omega_{j})\exp\bigl[-u_{i}(\omega_{j})\bigr]{\bf b}(\omega_{j})^{\top}\Bigr\}\biggr]\biggr\}. \qquad (13)

The parameters in these formulas are replaced by their estimated values. The AIC can be derived as an approximation of the leave-one-out cross-validation (OSullivan1988; Gu2002).
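The trace in the degrees-of-freedom formula has the familiar ridge form {\rm trace}\{({\bf M}_{k}+\lambda{\bf R})^{-1}{\bf M}_{k}\}, where {\bf M}_{k} denotes the weighted Gram matrix appearing in the Newton step (this shorthand is ours). A small sketch:

```python
import numpy as np

def whittle_df(M_list, R, lam):
    """Effective degrees of freedom: sum over k of
    trace{ (M_k + lam*R)^{-1} M_k }, where M_k is the weighted Gram
    matrix for the k-th basis function."""
    return sum(np.trace(np.linalg.solve(M + lam * R, M)) for M in M_list)
```

At \lambda=0 this equals the full parameter dimension, and it decreases monotonically as \lambda grows, which is what makes it a sensible model-complexity measure inside the AIC.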

Selecting the tuning parameter that minimizes the AIC requires training the model for a grid of values of \lambda and then picking the one that minimizes the criterion function, which can be computationally expensive. Instead, we present an alternative procedure that updates the value of the tuning parameter within the Newton-Raphson iterations. This idea has been used to iteratively update the smoothing parameter in generalized linear mixed models (Schall1991). Schellhase2012 and Najibi2017 extended this approach to density estimation. We borrow their formulation and use the parameter estimates in the i^{th} step to update the tuning parameter, \hat{\lambda}_{i+1}, through

\hat{\lambda}_{i+1}^{-1}=\frac{{\rm trace}(\hat{\boldsymbol{\Theta}}_{i}^{\top}{\bf R}\hat{\boldsymbol{\Theta}}_{i})}{{\sf df}(\hat{\lambda}_{i})-(a-1)}, \qquad (14)

where a is the order of the differences (derivatives) used in the penalty matrix {\bf R} (see subsection 2.3). In our implementation of the new procedure, updating the tuning parameter within the Newton-Raphson iterations does not, on average, increase the number of iterations required for convergence. The new procedure therefore obtains the final result roughly p times faster than the old procedure, where p is the number of values of \lambda used in the grid search to minimize the AIC.
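Update (14) itself is a one-line computation; a sketch (the function name is ours):

```python
import numpy as np

def update_lambda(Theta_hat, R, df, a=2):
    """Iterative tuning-parameter update (14): the reciprocal of the new
    lambda is trace(Theta^T R Theta) divided by df(lambda) - (a - 1)."""
    return (df - (a - 1)) / np.trace(Theta_hat.T @ R @ Theta_hat)
```

In practice this is evaluated once per outer iteration with the current estimates \hat{\boldsymbol{\Theta}}_{i} and {\sf df}(\hat{\lambda}_{i}), replacing the grid search over \lambda.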

### 2.3 Choices of penalties

#### 2.3.1 Second derivative of the basis

For the univariate spectral density estimation, if we use the usual squared-second-derivative penalty {\sf PEN}(\phi_{k})=\int\{\phi_{k}^{\prime\prime}(\omega)\}^{2}\,d\omega and \phi_{k}(\omega)={\bf b}(\omega)^{\top}{\boldsymbol{\theta}}_{k}, then {\bf R}_{1}=\displaystyle\int\ddot{{\bf b}}(\omega)\ddot{{\bf b}}(\omega)^{\top}\,d\omega with \ddot{{\bf b}}(\omega)=(b_{1}^{\prime\prime}(\omega),\dots,b_{L}^{\prime\prime}(\omega))^{\top}.
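In practice, {\bf R}_{1} can be approximated by numerical quadrature once the second derivatives of the basis are available on a fine grid. A sketch using the composite trapezoidal rule, checked below against the monomial basis functions \omega,\omega^{2},\omega^{3} on [0,1] (an illustrative basis of our choosing, for which the integrals have closed forms):

```python
import numpy as np

def second_derivative_penalty(bdd, grid):
    """Numerical R_1 = int bdd(omega) bdd(omega)^T d omega on an equispaced
    grid, via the composite trapezoidal rule.  bdd is n_grid x L, holding
    the second derivatives of the L basis functions evaluated on `grid`."""
    h = grid[1] - grid[0]
    w = np.full(len(grid), h)
    w[0] = w[-1] = h / 2.0               # trapezoid end-point weights
    return (bdd * w[:, None]).T @ bdd    # (L x L) penalty matrix
```

For B-splines the same construction applies, with the analytic second derivatives of the spline basis evaluated on the grid.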

#### 2.3.2 Difference operator

We can also control the roughness of the estimated functions by using the difference penalty (eilers1996) to achieve the appropriate level of smoothness. The variability is controlled through a difference operator of order a, \Delta_{a}, acting on the elements of {\boldsymbol{\theta}}_{k}: \Delta_{1}\theta_{\ell k}:=\theta_{\ell k}-\theta_{(\ell-1)k}, and \Delta_{a} is obtained recursively. For example, the second-order difference operator, \Delta_{2}, has the following form:

\Delta_{2}\theta_{\ell k}:=\Delta_{1}\Delta_{1}\theta_{\ell k}=\theta_{\ell k}-2\theta_{(\ell-1)k}+\theta_{(\ell-2)k}.

We can write the difference operators \Delta_{a} in matrix form, \mbox{\boldmath$L$}_{a}, an (L-a)\times L matrix. For example, when a=1 we have

\mbox{\boldmath$L$}_{1}=\left[\begin{array}{ccccc}1&-1&0&\cdots&0\\ 0&1&-1&\ddots&\vdots\\ \vdots&\ddots&\ddots&\ddots&0\\ 0&\cdots&0&1&-1\end{array}\right].
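The difference matrices are straightforward to build, and the associated quadratic penalty matrix is then {\bf R}_{2}=\mbox{\boldmath$L$}_{a}^{\top}\mbox{\boldmath$L$}_{a}, the standard P-spline construction (the function name below is ours):

```python
import numpy as np

def diff_matrix(L_dim, a):
    """Order-a difference operator L_a as an (L_dim - a) x L_dim matrix:
    each row of L_1 has the [1, -1] pattern displayed above, and higher
    orders are obtained by applying the first difference recursively."""
    D = np.eye(L_dim)
    for _ in range(a):
        D = D[:-1, :] - D[1:, :]
    return D

# Quadratic difference penalty matrix R_2 = L_a^T L_a for, e.g., a = 2:
R2 = diff_matrix(10, 2).T @ diff_matrix(10, 2)
```

Since the rows of \mbox{\boldmath$L$}_{a} are the a-th order differences, the penalty {\boldsymbol{\theta}}_{k}^{\top}{\bf R}_{2}{\boldsymbol{\theta}}_{k}=\|\mbox{\boldmath$L$}_{a}{\boldsymbol{\theta}}_{k}\|^{2} shrinks adjacent coefficients toward a polynomial of degree a-1.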