# An efficient estimator for locally stationary Gaussian long-memory processes

Pontificia Universidad Católica de Chile, Department of Statistics, Vicuña Mackenna 4860, Macul, Santiago, Chile

Received April 2009; revised January 2010.
###### Abstract

This paper addresses the estimation of locally stationary long-range dependent processes, a methodology that allows the statistical analysis of time series data exhibiting both nonstationarity and strong dependence. A time-varying parametric formulation of these models is introduced and a Whittle likelihood technique is proposed for estimating the parameters involved. Large sample properties of these Whittle estimates such as consistency, normality and efficiency are established in this work. Furthermore, the finite sample behavior of the estimators is investigated through Monte Carlo experiments. These simulations show that the estimates behave well even for relatively small sample sizes.

doi:10.1214/10-AOS812. The Annals of Statistics, Vol. 38, No. 5 (2010), 2958–2997.

Running title: Locally stationary long-memory processes.

Supported in part by Fondecyt Grant 1085239.

Wilfredo Palma (wilfredo@mat.puc.cl) and Ricardo Olea

AMS subject classifications: Primary 62M10; secondary 60G15.
Keywords: nonstationarity, local stationarity, long-range dependence, Whittle estimation, consistency, asymptotic normality, efficiency.

## 1 Introduction

Even though stationarity is a very attractive theoretical assumption, in practice most time series data fail to meet this condition. As a consequence, several approaches to deal with nonstationarity have been proposed in the literature. Among these methodologies, differentiation and trend removal are popular choices. Other approaches include, for instance, the evolutionary spectral techniques first discussed by Priestley (1965). In a similar spirit, during the last decades a number of new time-varying dependence models have been proposed. One of these methodologies, the so-called locally stationary processes developed by Dahlhaus (1996, 1997), has been widely discussed in the recent time series literature; see, for example, Dahlhaus (2000), von Sachs and MacGibbon (2000), Jensen and Whitcher (2000), Guo et al. (2003), Genton and Perrin (2004), Orbe, Ferreira and Rodriguez-Poo (2005), Dahlhaus and Polonik (2006, 2009), Chandler and Polonik (2006), Fryzlewicz, Sapatinas and Subba Rao (2006) and Beran (2009), among others. This approach allows the stochastic process to be nonstationary, while assuming that the time variation of the model is sufficiently smooth that it can be locally approximated by stationary processes.

On the other hand, during the last decades, long-range dependent data have arisen in disciplines as diverse as meteorology, hydrology, economics, etc., see, for example, the recent surveys by Doukhan, Oppenheim and Taqqu (2003) and Palma (2007). As a consequence, statistical methods for modeling that type of data are of great interest to scientists and practitioners from many fields. At the same time, many of these long-memory data also display nonstationary behavior, see, for instance, Granger and Ding (1996), Jensen and Whitcher (2000) and Beran (2009). Nevertheless, most of the currently available methods for dealing with long-range dependence are incapable of modeling time series with these features. In particular, much of the theory of locally stationary processes applies only to time series with short memory, such as time-varying autoregressive moving average (ARMA) processes and not to time series exhibiting both nonstationarity and strong dependence. In order to treat that type of data, this paper addresses a class of strongly dependent locally stationary processes. In particular, these models include a Hurst parameter which evolves over time. Following Dahlhaus (1997), we propose a Whittle maximum likelihood estimation technique for fitting Gaussian long-memory locally stationary models. This is an extension of the spectrum-based likelihood estimator introduced by Whittle (1953). A great advantage of this estimation procedure is its computational efficiency, since it only requires the calculation of the periodogram by means of the fast Fourier transform. Additionally, we prove in this article that the proposed Whittle estimator is asymptotically consistent, normally distributed and efficient. Thus, this paper provides a framework for modeling and making statistical inferences about several types of nonstationarities that may be difficult to handle with other techniques. 
For instance, changes in the variance of a time series could be spotted by simple inspection of the data. However, variations on the dependence structure of the data are far more difficult to uncover and model.

The remainder of this paper is structured as follows. Section 2 discusses a class of long-memory locally stationary processes and proposes a quasi maximum likelihood estimator based on an extended version of the Whittle spectrum-based methodology. The consistency, asymptotic normality and efficiency of these quasi maximum likelihood estimators are established. Applications of the asymptotic results to some specific locally stationary processes are also presented in this section. Proofs of the theorems are provided in Section 3. Note that the techniques employed by Dahlhaus (1997) to show the asymptotic properties of the Whittle estimates are no longer valid for the class of long-memory locally stationary processes discussed in this paper. This difficulty is due to the fact that these processes have an unbounded time-varying spectral density at zero frequency. Consequently, several technical results must be introduced and proved. Section 4 reports the results from several Monte Carlo experiments which allow us to gain some insight into the finite sample behavior of the Whittle estimates. Conclusions are presented in Section 5, while auxiliary lemmas are provided in a technical appendix. Additional examples and simulations, along with a comparison of the Whittle estimator with a kernel maximum likelihood estimation approach and two real-life applications of the proposed methodology, can be found in Palma and Olea (2010). The bandwidth selection problem for the locally stationary Whittle estimator is also discussed in that paper, from an empirical perspective.

## 2 Definitions and main results

### 2.1 Long-memory locally stationary processes

A class of Gaussian locally stationary processes with transfer function A^{0} can be defined by the spectral representation

 Y_{t,T}=\int_{-\pi}^{\pi}A^{0}_{t,T}(\lambda)e^{i\lambda t}\,dB(\lambda), (1)

for t=1,\ldots,T, where B(\lambda) is a Brownian motion on [-\pi,\pi] and there is a positive constant K and a 2\pi-periodic function A\dvtx(0,1]\times\mathbb{R}\to\mathbb{C} with A(u,-\lambda)=\overline{A(u,\lambda)} such that

 \sup_{t,\lambda}\biggl|A^{0}_{t,T}(\lambda)-A\biggl(\frac{t}{T},\lambda\biggr)\biggr|\leq\frac{K}{T}, (2)

for all T. The transfer function A^{0}_{t,T}(\lambda) of this class of nonstationary processes changes smoothly over time, so that these processes can be locally approximated by stationary ones. An example of this class of locally stationary processes is given by the infinite moving average expansion

 Y_{t,T}=\sigma\biggl(\frac{t}{T}\biggr)\sum_{j=0}^{\infty}\psi_{j}\biggl(\frac{t}{T}\biggr)\varepsilon_{t-j}, (3)

where \{\varepsilon_{t}\} is a zero-mean and unit variance Gaussian white noise and \{\psi_{j}(u)\} are coefficients satisfying \sum_{j=0}^{\infty}\psi_{j}(u)^{2}<\infty for all u\in[0,1]. In this case, the transfer function of process (3) is given by A^{0}_{t,T}(\lambda)=\sigma(\frac{t}{T})\sum_{j=0}^{\infty}\psi_{j}(\frac{t}{T})e^{-i\lambda j}=A(\frac{t}{T},\lambda), so that condition (2) is satisfied. The model defined by (3) generalizes the Wold expansion for a linear stationary process, allowing the coefficients of the infinite moving average expansion to vary smoothly over time. A particular case of (3) is the generalized version of the fractional noise process described by the discrete-time equation

 Y_{t,T}=\sigma\biggl(\frac{t}{T}\biggr)(1-B)^{-d({t/T})}\varepsilon_{t}=\sigma\biggl(\frac{t}{T}\biggr)\sum_{j=0}^{\infty}\eta_{j}\biggl(\frac{t}{T}\biggr)\varepsilon_{t-j}, (4)

for t=1,\ldots,T, where \{\varepsilon_{t}\} is a Gaussian white noise sequence with zero mean and unit variance and the infinite moving average coefficients \{\eta_{j}(u)\} are given by

 \eta_{j}(u)=\frac{\Gamma[j+d(u)]}{\Gamma(j+1)\Gamma[d(u)]}, (5)

where \Gamma(\cdot) is the Gamma function and d(\cdot) is a smoothly time-varying long-memory coefficient. For simplicity, the locally stationary fractional noise process (4) will be denoted as LSFN.
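To fix ideas, the LSFN model (4)–(5) can be simulated by truncating the moving average expansion. The Python sketch below (our own illustration, not part of the paper) uses the stable recursion \eta_{0}(u)=1, \eta_{j}(u)=\eta_{j-1}(u)(j-1+d(u))/j, which follows directly from (5); the truncation lag J and the function names are illustrative choices.

```python
import numpy as np

def lsfn_coefficients(d, J):
    """MA coefficients eta_j = Gamma(j+d) / (Gamma(j+1) Gamma(d)) from (5),
    computed via the recursion eta_0 = 1, eta_j = eta_{j-1} (j-1+d) / j."""
    eta = np.empty(J + 1)
    eta[0] = 1.0
    for j in range(1, J + 1):
        eta[j] = eta[j - 1] * (j - 1 + d) / j
    return eta

def simulate_lsfn(T, d_fun, sigma_fun, J=1000, rng=None):
    """Truncated simulation of the LSFN model (4):
    Y_{t,T} ~ sigma(t/T) * sum_{j=0}^{J} eta_j(t/T) eps_{t-j}.
    The truncation at lag J is an approximation, not part of the model."""
    rng = np.random.default_rng(rng)
    eps = rng.standard_normal(T + J)          # eps_{1-J}, ..., eps_T
    Y = np.empty(T)
    for t in range(1, T + 1):
        u = t / T
        eta = lsfn_coefficients(d_fun(u), J)
        # eps_{t-j}, j = 0..J, sits at indices t+J-1 down to t-1
        Y[t - 1] = sigma_fun(u) * (eta @ eps[t - 1 + J::-1][:J + 1])
    return Y

# long-memory parameter increasing linearly from 0.1 to 0.4, constant scale
Y = simulate_lsfn(500, d_fun=lambda u: 0.1 + 0.3 * u,
                  sigma_fun=lambda u: 1.0, J=200, rng=1)
```

The per-observation recomputation of the coefficients makes the sketch O(TJ); for longer series the coefficients can be tabulated on a grid of u values.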

A natural extension of the LSFN model is the locally stationary autoregressive fractionally integrated moving average (LSARFIMA) process defined by the equation

 \Phi(t/T,B)Y_{t,T}=\sigma(t/T)\Theta(t/T,B)(1-B)^{-d(t/T)}\varepsilon_{t}, (6)

for t=1,\ldots,T, where for u\in[0,1], \Phi(u,B)=1+\phi_{1}(u)B+\cdots+\phi_{P}(u)B^{P} is an autoregressive polynomial, \Theta(u,B)=1+\theta_{1}(u)B+\cdots+\theta_{Q}(u)B^{Q} is a moving average polynomial, d(u) is a long-memory parameter, \sigma(u) is a noise scale factor and \{\varepsilon_{t}\} is a Gaussian white noise sequence with zero mean and unit variance. This class of models extends the well-known ARFIMA process, which is obtained when the components \Phi(u,B), \Theta(u,B), d(u) and \sigma(u) appearing in (6) do not depend on u. Note that by Theorem 4.3 of Dahlhaus (1996), under some regularity conditions on the polynomial \Phi(u,B), the model defined by (6) satisfies (1) and (2); see Jensen and Whitcher (2000) for details.

### 2.2 Estimation

Let \theta\in\Theta be a parameter vector specifying model (1) where the parameter space \Theta is a subset of a finite-dimensional Euclidean space. Given a sample \{Y_{1,T},\ldots,Y_{T,T}\} of the process (1) we can estimate \theta by minimizing the Whittle log-likelihood function

 \mathcal{L}_{T}(\theta)=\frac{1}{4\pi}\frac{1}{M}\int_{-\pi}^{\pi}\sum_{j=1}^{M}\biggl\{\log f_{\theta}(u_{j},\lambda)+\frac{I_{N}(u_{j},\lambda)}{f_{\theta}(u_{j},\lambda)}\biggr\}\,d\lambda, (7)

where f_{\theta}(u,\lambda)=|A_{\theta}(u,\lambda)|^{2} is the time-varying spectral density of the limiting process specified by the parameter \theta, I_{N}(u,\lambda)=\frac{|D_{N}(u,\lambda)|^{2}}{2\pi H_{2,N}(0)} is a tapered periodogram with

 D_{N}(u,\lambda)=\sum_{s=0}^{N-1}h\biggl(\frac{s}{N}\biggr)Y_{[uT]-N/2+s+1,T}e^{-i\lambda s},\qquad H_{k,N}(\lambda)=\sum_{s=0}^{N-1}h\biggl(\frac{s}{N}\biggr)^{k}e^{-i\lambda s},

T=S(M-1)+N, u_{j}=t_{j}/T, t_{j}=S(j-1)+N/2, j=1,\ldots,M and h(\cdot) is a data taper. The intuition behind this extended version of the Whittle estimation procedure (7) is as follows: the sample \{Y_{1,T},\ldots,Y_{T,T}\} is subdivided into M blocks of length N each, shifting S places from block to block. For instance, if we split a time series of T=652 observations into M=100 blocks of length N=256 each, shifting S=4 positions forward each time, we get the blocks (Y_{1,652},Y_{2,652},\ldots,Y_{256,652}),\ldots,(Y_{397,652},Y_{398,652},\ldots,Y_{652,652}). Then, the spectrum is locally estimated by means of the data tapered periodogram on each one of these M=100 blocks, and these local estimates are averaged to form (7). Finally, the Whittle estimator of the parameter vector \theta is given by

 \widehat{\theta}_{T}=\arg\min\mathcal{L}_{T}(\theta), (8)

where the minimization is over the parameter space \Theta. The analysis of the asymptotic properties of the Whittle locally stationary estimates (8) is discussed in detail next. Before stating these results, we introduce a set of regularity conditions.
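As a concrete illustration of the objective (7) for an LSFN model, consider the Python sketch below. It is our own implementation sketch, not taken from the paper: the integral over \lambda is approximated by a Riemann sum on the positive Fourier frequencies of each block (the zero frequency is excluded because the spectral density is unbounded there), and the function names d_fun and sigma_fun are hypothetical.

```python
import numpy as np

def cosine_bell(x):
    # data taper (9): h(x) = (1/2)[1 - cos(2*pi*x)]
    return 0.5 * (1.0 - np.cos(2.0 * np.pi * x))

def lsfn_spectrum(u, lam, d_fun, sigma_fun):
    # time-varying LSFN spectral density: sigma(u)^2/(2 pi) (2 sin(lam/2))^(-2 d(u))
    return sigma_fun(u) ** 2 / (2 * np.pi) * (2 * np.sin(lam / 2)) ** (-2 * d_fun(u))

def whittle_objective(Y, d_fun, sigma_fun, N=256, S=4):
    """Blockwise Whittle objective (7): average over M blocks of the local
    tapered periodogram contrasted with f_theta(u_j, .), discretized on the
    positive Fourier frequencies of each block."""
    T = len(Y)
    M = (T - N) // S + 1                      # so that T = S(M-1) + N
    h = cosine_bell(np.arange(N) / N)
    H2 = np.sum(h ** 2)                       # H_{2,N}(0)
    lam = 2 * np.pi * np.arange(1, N // 2) / N
    acc = 0.0
    for j in range(M):
        u_j = (S * j + N / 2) / T             # block midpoint u_j = t_j / T
        D = np.fft.fft(h * Y[S * j:S * j + N])[1:N // 2]
        I = np.abs(D) ** 2 / (2 * np.pi * H2)         # tapered periodogram
        f = lsfn_spectrum(u_j, lam, d_fun, sigma_fun)
        acc += np.sum(np.log(f) + I / f)
    # factor 2 for the symmetric negative frequencies; 2*pi/N approximates d(lambda)
    return acc * 2 * (2 * np.pi / N) / (4 * np.pi * M)
```

Minimizing this objective over the parameters entering d_fun and sigma_fun, for example with a generic numerical optimizer, gives a discretized version of the estimator (8).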

### 2.3 Assumptions

The first assumption below is concerned with the time-varying spectral density of the process. The second assumption is related to the data tapering function and the third assumption is concerned with the block sampling scheme. It is assumed that the parameter space \Theta is compact. In what follows, K is always a positive constant that may differ from line to line.

A1. The time-varying spectral density of the limiting process (1) is strictly positive and satisfies

 f_{\theta}(u,\lambda)\sim C_{f}(\theta,u)|\lambda|^{-2d_{\theta}(u)},

as |\lambda|\to 0, where C_{f}(\theta,u)>0, 0<\inf_{\theta,u}d_{\theta}(u), \sup_{\theta,u}d_{\theta}(u)<\frac{1}{2} and d_{\theta}(u) has bounded first derivative with respect to u. There is an integrable function g(\lambda) such that |\nabla_{\theta}\log f_{\theta}(u,\lambda)|\leq g(\lambda) for all \theta\in\Theta, u\in[0,1] and \lambda\in[-\pi,\pi]. The function A(u,\lambda) is twice differentiable with respect to u and satisfies

 \int_{-\pi}^{\pi}A(u,\lambda)A(v,-\lambda)\exp(ik\lambda)\,d\lambda\sim C(\theta,u,v)k^{d_{\theta}(u)+d_{\theta}(v)-1},

as k\to\infty, where |C(\theta,u,v)|\leq K for u,v\in[0,1] and \theta\in\Theta. The function f_{\theta}(u,\lambda)^{-1} is twice differentiable with respect to \theta, u and \lambda.

A2. The data taper h(u) is a positive, bounded function for u\in[0,1], symmetric around \frac{1}{2}, with a bounded derivative.

A3. The sample size T and the subdivision integers N, S and M tend to infinity satisfying S/N\to 0, \sqrt{T}\log^{2}N/N\to 0, \sqrt{T}/M\to 0 and N^{3}\log^{2}N/T^{2}\to 0.

###### Example

As an illustration of the assumptions described above, consider the extension of the usual fractional noise process with time-varying Hurst parameter, described by (4) and (5). The spectral density of this LSFN process is given by

 f_{\theta}(u,\lambda)=\frac{\sigma^{2}}{2\pi}\biggl(2\sin\frac{\lambda}{2}\biggr)^{-2d_{\theta}(u)}.

Note that this function is integrable over \lambda\in[-\pi,\pi] for every u\in[0,1] as long as d_{\theta}(u)<\frac{1}{2} for all u\in[0,1] and \theta\in\Theta. Furthermore, we have that f_{\theta}(u,\lambda)\sim\frac{\sigma^{2}}{2\pi}|\lambda|^{-2d_{\theta}(u)}, as \lambda\to 0. By assuming that |\nabla_{\theta}d_{\theta}(u)|\leq K, we have that |\nabla_{\theta}\log f_{\theta}(u,\lambda)|=|\nabla_{\theta}d_{\theta}(u)||\log(2\sin\frac{\lambda}{2})^{2}|\leq K|\log|\lambda||, which is an integrable function of \lambda on [-\pi,\pi]. In addition, from (5) the function A(u,\lambda) of this process satisfies

 \int_{-\pi}^{\pi}A(u,\lambda)A(v,-\lambda)\exp(ik\lambda)\,d\lambda=\frac{\Gamma[1-d_{\theta}(u)-d_{\theta}(v)]\Gamma[k+d_{\theta}(u)]}{\Gamma[1-d_{\theta}(u)]\Gamma[d_{\theta}(u)]\Gamma[k+1-d_{\theta}(v)]},

for k\geq 0. Thus, by Stirling’s approximation, we get

 \int_{-\pi}^{\pi}A(u,\lambda)A(v,-\lambda)\exp(ik\lambda)\,d\lambda\sim\frac{\Gamma[1-d_{\theta}(u)-d_{\theta}(v)]}{\Gamma[1-d_{\theta}(u)]\Gamma[d_{\theta}(u)]}k^{d_{\theta}(u)+d_{\theta}(v)-1},

for k\to\infty. Besides, a simple calculation shows that f_{\theta}(u,\lambda)^{-1} is twice differentiable with respect to u and \lambda as long as d_{\theta}(u) is twice differentiable with respect to u. Thus, under these conditions the time-varying spectral density f_{\theta}(u,\lambda) satisfies assumption A1. On the other hand, an example of data taper that satisfies assumption A2 is the cosine bell function

 h(x)=\tfrac 12[1-\cos(2\pi x)]. (9)

Note that if S=\mathcal{O}(N^{a}) and M=\mathcal{O}(N^{b}) then T=\mathcal{O}(N^{a+b}) for a+b\geq 1. Thus, by choosing exponents a and b such that (a,b)\in\mathcal{C}=\{a<1,\frac{3}{2}<a+b<2,a<b\}, assumption A3 is fulfilled. Observe that \mathcal{C} is a nonempty set.
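The Stirling approximation step in this example is easy to check numerically. The following sketch (our own illustration, with hypothetical values of d_{\theta}(u) and d_{\theta}(v)) evaluates the exact Gamma-function expression through log-gamma and compares it with the claimed asymptote C(\theta,u,v)k^{d_{\theta}(u)+d_{\theta}(v)-1}:

```python
from math import lgamma, exp

def cov_uv(k, du, dv):
    # exact value of the integral of A(u,.)A(v,-.)exp(ik.) for the LSFN example:
    # Gamma[1-du-dv] Gamma[k+du] / (Gamma[1-du] Gamma[du] Gamma[k+1-dv])
    return exp(lgamma(1 - du - dv) + lgamma(k + du)
               - lgamma(1 - du) - lgamma(du) - lgamma(k + 1 - dv))

def limit_const(du, dv):
    # C(theta, u, v) = Gamma[1-du-dv] / (Gamma[1-du] Gamma[du])
    return exp(lgamma(1 - du - dv) - lgamma(1 - du) - lgamma(du))

du, dv = 0.2, 0.4            # hypothetical values of d(u) and d(v)
ratios = [cov_uv(k, du, dv) / (limit_const(du, dv) * k ** (du + dv - 1))
          for k in (10, 100, 1000)]
```

The ratios approach 1 as k grows, consistent with the Stirling asymptotics above.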

### 2.4 Main results

Some fundamental large sample properties of the Whittle quasi-likelihood estimators (8), including consistency, asymptotic normality and efficiency are established next. In addition, we establish an asymptotic result about the estimation of the time-varying long-memory parameter for a class of locally stationary processes. The proofs of these four results are provided in Section 3.

###### Theorem 2.1 ((Consistency))

Let \theta_{0} be the true value of the parameter \theta. Under assumptions A1–A3, the estimator \widehat{\theta}_{T} satisfies \widehat{\theta}_{T}\to\theta_{0}, in probability, as T\to\infty.

###### Theorem 2.2 ((Normality))

Let \theta_{0} be the true value of the parameter \theta. If assumptions A1–A3 hold, then the Whittle estimator \widehat{\theta}_{T} satisfies a central limit theorem

 \sqrt{T}(\widehat{\theta}_{T}-\theta_{0})\to N[0,\Gamma(\theta_{0})^{-1}],

in distribution, as T\to\infty, where

 \Gamma(\theta)=\frac{1}{4\pi}\int_{0}^{1}\int_{-\pi}^{\pi}[\nabla\log f_{\theta}(u,\lambda)][\nabla\log f_{\theta}(u,\lambda)]^{\prime}\,d\lambda\,du. (10)
###### Theorem 2.3 ((Efficiency))

Assuming that conditions A1–A3 hold, the Whittle estimator \widehat{\theta}_{T} is asymptotically Fisher efficient.

###### Remark

Recall that for a stationary fractional noise process FN(d), the asymptotic variance of the maximum likelihood estimate of the long-memory parameter, \widehat{d}, satisfies

 \lim_{T\to\infty}T\operatorname{Var}(\widehat{d})=\frac{6}{\pi^{2}}.

On the other hand, suppose that we consider a LSFN process where the long-memory parameter varies according to, for example, d(u)=\alpha_{0}+\alpha_{1}u. Thus, in order to estimate d(u), the parameters \alpha_{0} and \alpha_{1} must be estimated. Let \widehat{\alpha}_{0} and \widehat{\alpha}_{1} be their Whittle estimators, respectively, so that \widehat{d}(u)=\widehat{\alpha}_{0}+\widehat{\alpha}_{1}u. According to Theorem 2.2, the asymptotic variance of this estimate of d(u) satisfies

 \lim_{T\to\infty}T\operatorname{Var}[\widehat{d}(u)]=\frac{24}{\pi^{2}}(1-3u+3u^{2}),

and then integrating over u we get

 \lim_{T\to\infty}T\int_{0}^{1}\operatorname{Var}[\widehat{d}(u)]\,du=\frac{12}{\pi^{2}}.

Since two parameters are being estimated, on average the asymptotic variance of the estimate \widehat{d}(u) is twice the asymptotic variance of \widehat{d} from a stationary FN process. This result can be generalized to the case where three or more coefficients are estimated and to more complex trends, as established in the following theorem.

###### Theorem 2.4

Consider a LSFN process (4) with time-varying long-memory parameter d_{\beta}(u)=\sum_{j=1}^{p}\beta_{j}g_{j}(u), where \{g_{j}(u)\} are basis functions as defined in (12) below. Let \widehat{d}(u)=\sum_{j=1}^{p}\widehat{\beta}_{j}g_{j}(u) be the estimator of d_{\beta}(u) for u\in[0,1]. Then under assumptions A1–A3 we have that

 \lim_{T\to\infty}T\int_{0}^{1}\operatorname{Var}[\widehat{d}(u)]\,du=\frac{6p}{\pi^{2}}. (11)

###### Remark

Note that, according to Theorem 2.4, the limiting average of the variance of \widehat{d}(u) given by (11) does not depend on the basis functions g_{j}(\cdot) for j=1,\ldots,p.
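Both the 12/\pi^{2} computation above and the basis-invariance in (11) can be verified numerically: writing \Gamma=(\pi^{2}/6)G with G the Gram matrix of the basis, \int_{0}^{1}g(u)^{\prime}\Gamma^{-1}g(u)\,du=(6/\pi^{2})\operatorname{tr}(G^{-1}G)=6p/\pi^{2}. The Python sketch below is our own illustration; the function name and the trapezoid quadrature are not from the paper.

```python
import numpy as np

def limiting_avg_variance(basis, npts=2001):
    """Evaluate lim_T T * int_0^1 Var[d_hat(u)] du = int g' Gamma^{-1} g du,
    where Gamma = (pi^2/6)[<g_i, g_j>] as in (10) for the LSFN model."""
    u = np.linspace(0.0, 1.0, npts)
    w = np.full(npts, u[1] - u[0])             # composite trapezoid weights
    w[0] *= 0.5
    w[-1] *= 0.5
    G = np.array([g(u) for g in basis])        # p x npts basis matrix
    Gamma = (np.pi ** 2 / 6) * (G * w) @ G.T
    quad = np.einsum('iu,ij,ju->u', G, np.linalg.inv(Gamma), G)
    return float(quad @ w)                     # integral of g(u)' Gamma^{-1} g(u)

# p = 2, linear trend d(u) = b1 + b2 u: the average variance is 12/pi^2
lin = [lambda u: np.ones_like(u), lambda u: u]
# p = 3 with two different bases: by (11) both averages equal 18/pi^2
poly = [lambda u, j=j: u ** j for j in range(3)]
cosb = [lambda u: np.ones_like(u),
        lambda u: np.cos(np.pi * u),
        lambda u: np.cos(2 * np.pi * u)]
```

The trace identity makes the answer independent of the (linearly independent) basis, which is exactly the content of the remark.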

### 2.5 Illustrations

As an illustration of the asymptotic results discussed above, consider the class of LSARFIMA models defined by (6). The evolution of these models can be specified in terms of a general class of functions. For example, let \{g_{j}(u)\}, j=1,2,\ldots, be a basis for a space of smoothly varying functions and let d_{\theta}(u) be the time-varying long-memory parameter in model (6). Then we could write d_{\theta}(u) in terms of the basis \{g_{j}(u)\} as follows:

 \ell[d_{\theta}(u)]=\sum_{j=0}^{k}\alpha_{j}g_{j}(u), (12)

for unknown values of k and \theta=(\alpha_{0},\alpha_{1},\ldots,\alpha_{k})^{\prime}, where \ell(\cdot) is a known link function. In this situation, estimating \theta involves determining k and estimating the coefficients \alpha_{0},\ldots,\alpha_{k}. Important examples of this approach are the classes of polynomials generated by the basis \{g_{j}(u)=u^{j}\}, Fourier expansions generated by the basis \{g_{j}(u)=e^{iuj}\} and wavelets generated by, for instance, the Haar or Daubechies systems. Extensions of these cases can also be considered. For example, the basis functions could also include parameters as in the case \{g_{j}(u)=e^{iu\beta_{j}}\}, where \{\beta_{j}\} are unknown values.

In order to illustrate the application of the theoretical results established in Section 2.4, we discuss next a number of combinations of polynomial and harmonic evolutions of the long-memory parameter, the noise variance, and the autoregressive and moving average components of the LSARFIMA process (6). Additional examples are provided in Section 2 of Palma and Olea (2010).

###### Example

Consider first the case P=Q=0 in model (6), where d(u) and \sigma(u) are specified by

 \ell_{1}[d(u)]=\sum_{j=0}^{p}\alpha_{j}g_{j}(u),\qquad\ell_{2}[\sigma(u)]=\sum_{j=0}^{q}\beta_{j}h_{j}(u),

for u\in[0,1], where \ell_{1}(\cdot) and \ell_{2}(\cdot) are differentiable link functions, and g_{j}(\cdot) and h_{j}(\cdot) are basis functions. The parameter vector in this case is \theta=(\alpha_{0},\ldots,\alpha_{p},\beta_{0},\ldots,\beta_{q})^{\prime} and the matrix \Gamma can be written as

 \Gamma=\begin{pmatrix}\Gamma_{\alpha}&0\\ 0&\Gamma_{\beta}\end{pmatrix}, (13)

where

 \Gamma_{\alpha}=\frac{\pi^{2}}{6}\biggl[\int_{0}^{1}\frac{g_{i}(u)g_{j}(u)\,du}{[\ell_{1}^{\prime}(d(u))]^{2}}\biggr]_{i,j=0,\ldots,p},\qquad\Gamma_{\beta}=2\biggl[\int_{0}^{1}\frac{h_{i}(u)h_{j}(u)\,du}{[\sigma(u)\ell_{2}^{\prime}(\sigma(u))]^{2}}\biggr]_{i,j=0,\ldots,q}.

###### Example

As a particular case of the parameter specification of the previous example, consider the case P=Q=0 in model (6) where d(u) and \sigma(u) are both specified by polynomials,

 d(u)=\alpha_{0}+\alpha_{1}u+\cdots+\alpha_{p}u^{p},\qquad\sigma(u)=\beta_{0}+\beta_{1}u+\cdots+\beta_{q}u^{q},

for u\in[0,1]. Similar to Example 2.5, in this case the parameter vector is \theta=(\alpha_{0},\ldots,\alpha_{p},\beta_{0},\ldots,\beta_{q})^{\prime}, \ell_{1}(u)=\ell_{2}(u)=u and the matrix \Gamma given by (10) can be written as in (13) with

 \Gamma_{\alpha}=\biggl[\frac{\pi^{2}}{6(i+j+1)}\biggr]_{i,j=0,\ldots,p},\qquad\Gamma_{\beta}=2\biggl[\int_{0}^{1}\frac{u^{i+j}\,du}{(\beta_{0}+\beta_{1}u+\cdots+\beta_{q}u^{q})^{2}}\biggr]_{i,j=0,\ldots,q}.

The above integrals can be evaluated by standard calculus procedures, see, for example, Gradshteyn and Ryzhik [(2000), page 64], or by numerical integration.

###### Example

Consider now a similar setup to Example 2.5 with p=q=1, but with link function \ell(\cdot)=\log(\cdot), so that

 d(u)=e^{\alpha_{0}+\alpha_{1}u},\qquad\sigma(u)=e^{\beta_{0}+\beta_{1}u},

for u\in[0,1]. Then \Gamma can be written as (13) with

 \Gamma_{\alpha}=\frac{\pi^{2}}{6}\frac{e^{2\alpha_{0}}}{4\alpha_{1}^{3}}\begin{pmatrix}2\alpha_{1}^{2}(e^{2\alpha_{1}}-1)&\alpha_{1}\bigl((2\alpha_{1}-1)e^{2\alpha_{1}}+1\bigr)\\ \alpha_{1}\bigl((2\alpha_{1}-1)e^{2\alpha_{1}}+1\bigr)&(2\alpha_{1}^{2}-2\alpha_{1}+1)e^{2\alpha_{1}}+1\end{pmatrix},\qquad\Gamma_{\beta}=\begin{pmatrix}2&1\\ 1&2/3\end{pmatrix}.

###### Example

Continuing with the assumption P=Q=0 in model (6), suppose now that d(u) and \sigma(u) are defined by the harmonic expansions

 d(u)=\alpha_{0}+\alpha_{1}\cos(\lambda_{1}u)+\cdots+\alpha_{p}\cos(\lambda_{p}u),\qquad\sigma(u)=\beta_{0}+\beta_{1}\cos(\omega_{1}u)+\cdots+\beta_{q}\cos(\omega_{q}u),

for u\in[0,1], where \lambda_{0}=0, \lambda_{i}^{2}\neq\lambda_{j}^{2} for all i,j=0,\ldots,p, i\neq j, \omega_{0}=0 and \omega_{i}^{2}\neq\omega_{j}^{2} for all i,j=0,\ldots,q, i\neq j. For simplicity, the values of the frequencies \{\lambda_{j}\} and \{\omega_{j}\} are assumed to be known. As in Example 2.5, in this case the parameter vector is \theta=(\alpha_{0},\ldots,\alpha_{p},\beta_{0},\ldots,\beta_{q})^{\prime} and the matrix \Gamma appearing in (10) can be written as in (13) with

 \Gamma_{\alpha}=\frac{\pi^{2}}{12}\biggl[\frac{\sin(\lambda_{i}-\lambda_{j})}{\lambda_{i}-\lambda_{j}}+\frac{\sin(\lambda_{i}+\lambda_{j})}{\lambda_{i}+\lambda_{j}}\biggr]_{i,j=0,\ldots,p}

and

 \Gamma_{\beta}=\frac{\pi^{2}}{12}\biggl[\frac{\sin(\omega_{i}-\omega_{j})}{\omega_{i}-\omega_{j}}+\frac{\sin(\omega_{i}+\omega_{j})}{\omega_{i}+\omega_{j}}\biggr]_{i,j=0,\ldots,q}.
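In a numerical implementation of these matrices, \sin(x)/x must be read as 1 in the coinciding-frequency and zero-frequency cases (\lambda_{i}=\lambda_{j}, or \lambda_{i}+\lambda_{j}=0). A short sketch with hypothetical frequencies (our own illustration, not from the paper):

```python
import numpy as np

def sinxx(x):
    # sin(x)/x with the limit value 1 at x = 0; np.sinc(t) = sin(pi t)/(pi t)
    return np.sinc(np.asarray(x) / np.pi)

def gamma_entry(li, lj):
    # entry (pi^2/12)[sin(li-lj)/(li-lj) + sin(li+lj)/(li+lj)] of Gamma_alpha
    return (np.pi ** 2 / 12) * (sinxx(li - lj) + sinxx(li + lj))

lams = [0.0, 1.5, 3.0]   # lambda_0 = 0 and two hypothetical known frequencies
Gamma_alpha = np.array([[gamma_entry(a, b) for b in lams] for a in lams])
```

Note that the (0,0) entry equals \pi^{2}/6, matching the constant-term entry of \Gamma_{\alpha} in the polynomial example above.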
###### Example

Consider now the case P=Q=1 in model (6) where \sigma(u)=1 and d(u), \Phi(u,B), \Theta(u,B) are specified by

 d(u)=\alpha_{1}u,\qquad\Phi(u,B)=1+\phi(u)B,\quad\phi(u)=\alpha_{2}u,\qquad\Theta(u,B)=1+\theta(u)B,\quad\theta(u)=\alpha_{3}u,

for u\in[0,1]. In this case, the parameter vector is \theta=(\alpha_{1},\alpha_{2},\alpha_{3})^{\prime}, with 0<\alpha_{1}<\frac{1}{2} and |\alpha_{j}|<1 for j=2,3, and the matrix \Gamma from (10) can be written as

 \Gamma=\begin{pmatrix}\gamma_{11}&\gamma_{12}&\gamma_{13}\\ \gamma_{21}&\gamma_{22}&\gamma_{23}\\ \gamma_{31}&\gamma_{32}&\gamma_{33}\end{pmatrix},

where

 \gamma_{11}=\frac{1}{2\alpha_{1}^{3}}\log\frac{1+\alpha_{1}}{1-\alpha_{1}}-\frac{1}{\alpha_{1}^{2}},\qquad\gamma_{12}=\frac{g(\alpha_{1}\alpha_{2})}{(\alpha_{1}\alpha_{2})^{3/2}}-\frac{1}{\alpha_{1}\alpha_{2}},

 \gamma_{13}=\frac{1}{2\alpha_{1}}\biggl\{\biggl[\frac{1}{2}-\frac{1}{\alpha_{1}}\biggr]-\biggl[1-\frac{1}{\alpha_{1}^{2}}\biggr]\log(1+\alpha_{1})\biggr\},\qquad\gamma_{22}=\frac{1}{2\alpha_{2}^{3}}\log\frac{1+\alpha_{2}}{1-\alpha_{2}}-\frac{1}{\alpha_{2}^{2}},

 \gamma_{23}=\frac{1}{2\alpha_{2}}\biggl\{\biggl[1-\frac{1}{\alpha_{2}^{2}}\biggr]\log(1+\alpha_{2})-\biggl[\frac{1}{2}-\frac{1}{\alpha_{2}}\biggr]\biggr\},\qquad\gamma_{33}=\frac{\pi^{2}}{18},

with g(x)=\operatorname{arctanh}(\sqrt{x}) for x\in(0,1) and g(x)=\arctan(\sqrt{-x}) for x\in(-1,0).

## 3 Proofs

This section is devoted to the proof of Theorems 2.1–2.4. Before presenting the proofs of these results, we introduce and prove three useful propositions which are of independent interest. These propositions involve the large sample properties of the functional operator defined next. Consider the function \phi\dvtx[0,1]\times[-\pi,\pi]\to\mathbb{R} and define the functional operator

 J(\phi)=\int_{0}^{1}\int_{-\pi}^{\pi}\phi(u,\lambda)f(u,\lambda)\,d\lambda\,du, (14)

where f(u,\lambda) is the time-varying spectral density of the limit process (1). Define the sample version of J(\cdot) as

 J_{T}(\phi)=\frac{1}{M}\sum_{j=1}^{M}\int_{-\pi}^{\pi}\phi(u_{j},\lambda)I_{N}(u_{j},\lambda)\,d\lambda, (15)

where M and u_{j}, j=1,\ldots,M are given in Section 2. Furthermore, define the matrix

 Q(u)=\biggl(\int_{-\pi}^{\pi}\phi(u,\lambda)e^{i\lambda(s-t)}\,d\lambda\biggr)_{s,t=1,\ldots,N}, (16)

and the block-diagonal matrix Q(\phi)=\operatorname{diag}[Q(u_{1}),\ldots,Q(u_{M})]. For notational simplicity, in what follows we sometimes drop \theta from d_{\theta}(u), writing it as d(u).

###### Remark

Since the function A(u,\lambda) and the spectral density f(u,\lambda) of a locally stationary long-memory process are unbounded at zero frequency, the techniques used next to prove the large sample properties of J(\phi) and the quasi-likelihood estimators are different from those used in the short-memory context. For instance, the function A(u,\lambda) does not satisfy the key assumption A.1 of Dahlhaus (1997), and the coefficients \psi_{j}(\frac{t}{T}) of (3) fail to meet conditions (2) and (3) of Dahlhaus and Polonik (2009). Due to the unboundedness of f(u,\lambda) at the origin, our proofs exploit the properties of the Fourier transforms

 \widehat{f}(u,\cdot)=\int_{-\pi}^{\pi}f(u,\lambda)e^{i\lambda\cdot}\,d\lambda,\qquad\widehat{f}(u,v,\cdot)=\int_{-\pi}^{\pi}A(u,\lambda)A(v,-\lambda)e^{i\lambda\cdot}\,d\lambda.

### 3.1 Propositions

###### Proposition 1

Let f(u,\lambda) be a time-varying spectral density satisfying assumption A1 and assume that the function \phi(u,\lambda) appearing in (14) is symmetric in \lambda and twice differentiable with respect to u. Let \widehat{f}(u,k) and \widehat{\phi}(u,k) be their Fourier coefficients, respectively. If there is a positive constant K such that

 |\widehat{f}(u,k)\widehat{\phi}(u,k)|\leq K\biggl(\frac{\log k}{k^{2}}\biggr),

for all u\in[0,1] and k>1, then, under assumptions A2 and A3 we have that

 E[J_{T}(\phi)]=J(\phi)+\mathcal{O}\biggl(\frac{\log^{2}N}{N}\biggr)+\mathcal{O}\biggl(\frac{1}{M}\biggr).

Proof.

From definition (15), we can write

 \displaystyle E[J_{T}(\phi)]=\frac{1}{M}\sum_{j=1}^{M}\int_{-\pi}^{\pi}\phi(u_{j},\lambda)E[I_{N}(u_{j},\lambda)]\,d\lambda
 \displaystyle=\frac{1}{2\pi MH_{2,N}(0)}\sum_{j=1}^{M}\int_{-\pi}^{\pi}\phi(u_{j},\lambda)E|D_{N}(u_{j},\lambda)|^{2}\,d\lambda
 \displaystyle=\frac{1}{2\pi MH_{2,N}(0)}\sum_{j=1}^{M}\int_{-\pi}^{\pi}\phi(u_{j},\lambda)\sum_{t,s=0}^{N-1}h\biggl(\frac{t}{N}\biggr)h\biggl(\frac{s}{N}\biggr)c(u_{j},t,s)e^{i\lambda(s-t)}\,d\lambda,

where

 c(u,t,s)=E\bigl{(}Y_{[uT]-N/2+t+1,T}Y_{[uT]-N/2+s+1,T}\bigr{)}.

Thus,

 \displaystyle E[J_{T}(\phi)]=\frac{1}{2\pi MH_{2,N}(0)}\sum_{j=1}^{M}\sum_{t,s=0}^{N-1}h\biggl(\frac{t}{N}\biggr)h\biggl(\frac{s}{N}\biggr)c(u_{j},t,s)\int_{-\pi}^{\pi}\phi(u_{j},\lambda)e^{i\lambda(s-t)}\,d\lambda
 \displaystyle=\frac{1}{2\pi MH_{2,N}(0)}\sum_{j=1}^{M}\sum_{t,s=0}^{N-1}h\biggl(\frac{t}{N}\biggr)h\biggl(\frac{s}{N}\biggr)c(u_{j},t,s)\widehat{\phi}(u_{j},s-t)
 \displaystyle=\frac{1}{2\pi MH_{2,N}(0)}\sum_{j=1}^{M}\sum_{t=0}^{N-1}\sum_{k=0}^{N-t}h\biggl(\frac{t}{N}\biggr)h\biggl(\frac{t}{N}+\frac{k}{N}\biggr)c(u_{j},t,t+k)\widehat{\phi}(u_{j},k)(2-\delta_{k}),

where \delta_{k}=1 for k=0 and \delta_{k}=0 for k\neq 0. By assumption A2 and Taylor’s theorem,

 h\biggl(\frac{t}{N}+\frac{k}{N}\biggr)=h\biggl(\frac{t}{N}\biggr)+h^{\prime}(\xi_{t,k,N})\frac{k}{N},

for some \xi_{t,k,N}\in(\frac{t}{N},\frac{t+k}{N}), for k\geq 0. Thus,

 \displaystyle E[J_{T}(\phi)]=\frac{1}{2\pi MH_{2,N}(0)}\sum_{j=1}^{M}\sum_{t=0}^{N-1}\sum_{k=0}^{N-t}h^{2}\biggl(\frac{t}{N}\biggr)c(u_{j},t,t+k)\widehat{\phi}(u_{j},k)(2-\delta_{k})
 \displaystyle{}+\frac{1}{2\pi MH_{2,N}(0)}\sum_{j=1}^{M}\sum_{t=0}^{N-1}\sum_{k=0}^{N-t}h\biggl(\frac{t}{N}\biggr)h^{\prime}(\xi_{t,k,N})\frac{k}{N}c(u_{j},t,t+k)\widehat{\phi}(u_{j},k)(2-\delta_{k}).

Under assumption A1, we can expand c(u,t,t+k) by Taylor's theorem as

 c(u,t,t+k)=\widehat{f}(u,k)\biggl[1+\varphi_{1}(u,k)\frac{t+1-N/2}{T}+\varphi_{2}(u,k)\frac{k}{T}\biggr]+R(u,t,k,N,T),

where

 \varphi_{1}(u,k)=\frac{C_{1}(\theta,u,u)}{C(\theta,u,u)}+2d^{\prime}(u)\log k,\qquad C_{1}(\theta,u,u)=\frac{\partial C(\theta,u,u+v)}{\partial u}\bigg|_{v=0},

 \varphi_{2}(u,k)=\frac{C_{2}(\theta,u,u)}{C(\theta,u,u)}+d^{\prime}(u)\log k,\qquad C_{2}(\theta,u,u)=\frac{\partial C(\theta,u,u+v)}{\partial v}\bigg|_{v=0},

d^{\prime}(u)=\frac{\partial d_{\theta}(u)}{\partial u}, C(\theta,u,v) is defined in assumption A1 and the remainder term is given by

 R(u,t,k,N,T)=\mathcal{O}\biggl\{\widehat{f}(u,k)\biggl[\biggl(\frac{k}{T}\biggr)^{2}+\biggl(\frac{t}{T}\biggr)^{2}\biggr]\log^{2}k\biggr\}.

Thus, since by assumption A1 |d^{\prime}(u)|\leq K for all u\in[0,1], we have |\varphi_{j}(u,k)|\leq K\log k for j=1,2 and k>1. Now we can write

 \displaystyle E[J_{T}(\phi)]=\frac{1}{2\pi MH_{2,N}(0)}\sum_{j=1}^{M}\sum_{t=0}^{N-1}h^{2}\biggl(\frac{t}{N}\biggr)\sum_{k=0}^{N-t}\biggl[\widehat{f}(u_{j},k)+\varphi_{1}(u_{j},k)\widehat{f}(u_{j},k)\frac{t+1-N/2}{T}
 \displaystyle{}+\varphi_{2}(u_{j},k)\widehat{f}(u_{j},k)\frac{k}{T}+R(u_{j},t,k,N,T)\biggr]\widehat{\phi}(u_{j},k)(2-\delta_{k}). (3.1)

Since by assumption |\widehat{f}(u,k)\widehat{\phi}(u,k)|\leq K\log k/k^{2}, for k>1, uniformly in u\in[0,1], we conclude that there is a finite limit A(u)<\infty such that

 A(u)=\lim_{N\to\infty}\sum_{k=0}^{N}\widehat{f}(u,k)\widehat{\phi}(u,k)(2-\delta_{k}).

Consequently,

by Lemma 7. Hence,

But,

 \Biggl|\sum_{k=N}^{\infty}\widehat{f}(u_{j},k)\widehat{\phi}(u_{j},k)(2-\delta_{k})\Biggr|\leq K\sum_{k=N}^{\infty}\frac{\log k}{k^{2}}\leq K\frac{\log N}{N},

and consequently,

 \Biggl|\sum_{t=0}^{N-1}h^{2}\biggl(\frac{t}{N}\biggr)\sum_{k=N}^{\infty}\widehat{f}(u_{j},k)\widehat{\phi}(u_{j},k)(2-\delta_{k})\Biggr|=\mathcal{O}(\log N).

Therefore,

 \sum_{t=0}^{N-1}h^{2}\biggl(\frac{t}{N}\biggr)\sum_{k=0}^{N-t}\widehat{f}(u_{j},k)\widehat{\phi}(u_{j},k)(2-\delta_{k})=A(u_{j})\sum_{t=0}^{N-1}h^{2}\biggl(\frac{t}{N}\biggr)+\mathcal{O}(\log^{2}N).

On the other hand, by analyzing the term involving the second summand of (3.1) we get

by Lemma 8. Now, since h(\cdot) is symmetric around 1/2, we have

 \sum_{t=0}^{N-1}h^{2}\biggl(\frac{t}{N}\biggr)\biggl(\frac{t+1-N/2}{T}\biggr)=\mathcal{O}\biggl(\frac{1}{T}\biggr).

Besides, |\sum_{k=0}^{N-1}\varphi_{1}(u_{j},k)\widehat{f}(u_{j},k)\widehat{\phi}(u_{j},k)(2-\delta_{k})|\leq K\sum_{k=1}^{N}\frac{(\log k)^{2}}{k^{2}}<\infty. Consequently,

The third term of (3.1) can be bounded as follows:

 \Biggl|\sum_{k=0}^{N-t}\varphi_{2}(u_{j},k)\widehat{f}(u_{j},k)\widehat{\phi}(u_{j},k)k\Biggr|\leq K\sum_{k=1}^{N}\frac{\log k}{k}\leq K\log^{2}N,

and then

 \Biggl|\sum_{t=0}^{N-1}h^{2}\biggl(\frac{t}{N}\biggr)\sum_{k=0}^{N-t}\varphi_{2}(u_{j},k)\widehat{f}(u_{j},k)\widehat{\phi}(u_{j},k)\frac{k}{T}\Biggr|\leq K\frac{N}{T}\log^{2}N.

The last term of (3.1) can be bounded as follows:

 \Biggl|\sum_{k=0}^{N-t}R(u_{j},t,k,N,T)\widehat{\phi}(u_{j},k)(2-\delta_{k})\Biggr|\leq K\log^{2}N\biggl(\frac{N}{T}\biggr)^{2},

and then

 \Biggl|\sum_{t=0}^{N-1}\sum_{k=0}^{N-t}h^{2}\biggl(\frac{t}{N}\biggr)R(u_{j},t,k,N,T)\widehat{\phi}(u_{j},k)(2-\delta_{k})\Biggr|\leq K\frac{N^{3}}{T^{2}}\log^{2}N.

Note that by assumption A3, the term above converges to zero as N,T\to\infty. Therefore, the first term in (3.1) can be written as

 \displaystyle\frac{1}{2\pi MH_{2,N}(0)}\sum_{j=1}^{M}\sum_{t=0}^{N-1}\sum_{k=0}^{N-t}h^{2}\biggl(\frac{t}{N}\biggr)c(u_{j},t,t+k)\widehat{\phi}(u_{j},k)(2-\delta_{k}) \displaystyle\qquad=\frac{1}{2\pi M}\sum_{j=1}^{M}A(u_{j})+\mathcal{O}\biggl(\frac{\log^{2}N}{N}\biggr).

Now, by Lemma 1 we can write A(u)=2\pi\int_{-\pi}^{\pi}\phi(u,\omega)f(u,\omega)\,d\omega, and then

 \frac{1}{2\pi M}\sum_{j=1}^{M}A(u_{j})=J(\phi)+\mathcal{O}\biggl(\frac{1}{M}\biggr). (19)
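The passage from the coefficient sum defining A(u) to the integral 2\pi\int\phi f used in (19) is a Parseval-type identity for even functions. As a quick numerical sanity check of the normalization \widehat{g}(k)=\int_{-\pi}^{\pi}g(\lambda)e^{i\lambda k}\,d\lambda (with simple illustrative choices of \phi and f, not the paper's):

```python
import numpy as np

# Parseval-type identity behind A(u): for even functions phi, f on [-pi, pi]
# with coefficients g_hat(k) = int_{-pi}^{pi} g(lam) e^{i k lam} dlam,
#   sum_{k>=0} f_hat(k) phi_hat(k) (2 - delta_k) = 2 pi int phi(lam) f(lam) dlam.
n = 200_000
dx = 2 * np.pi / n
lam = -np.pi + dx * (np.arange(n) + 0.5)   # midpoint grid on [-pi, pi]

phi = np.cos(lam)                 # illustrative even test function
f = 1.0 + 0.5 * np.cos(lam)       # illustrative even "spectral density"

def coef(g, k):
    # Fourier coefficient of an even function (midpoint rule)
    return np.sum(g * np.cos(k * lam)) * dx

lhs = sum(coef(f, k) * coef(phi, k) * (2 - (k == 0)) for k in range(50))
rhs = 2 * np.pi * np.sum(phi * f) * dx
print(lhs, rhs)   # both approximately pi^2
```

With these choices both sides equal \pi^{2}, since only the k=1 coefficients are nonzero.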

On the other hand, the second term in (3.1) can be bounded as follows:

Since |\varphi_{i}(u_{j},k)|\leq K\log k for i=1,2,j=1,\ldots,M and k>1, we conclude that

 |c(u_{j},t,t+k)\widehat{\phi}(u_{j},k)(2-\delta_{k})|\leq K\frac{N}{T}\frac{\log k}{k^{2}}.

Therefore, since |h^{\prime}(u)|\leq K for u\in[0,1] by assumption A2, we have

 \Biggl|\sum_{k=0}^{N-t}c(u_{j},t,t+k)\widehat{\phi}(u_{j},k)\frac{k}{N}h^{\prime}(\xi_{t,k,N})\Biggr|\leq\frac{K}{T}\sum_{k=1}^{N}\frac{\log k}{k}\leq K\frac{\log^{2}N}{T}.

Consequently,

 \Biggl|\sum_{t=0}^{N-1}h\biggl(\frac{t}{N}\biggr)\sum_{k=0}^{N-t}c(u_{j},t,t+k)\widehat{\phi}(u_{j},k)(2-\delta_{k})\frac{k}{N}h^{\prime}(\xi_{t,k,N})\Biggr|\leq KN\frac{\log^{2}N}{T}.

Hence, the second term of (3.1) is bounded by K(\log^{2}N)/T. From this and (19), the required result is obtained.

###### Proposition 2

Let f(u,\lambda) be a time-varying spectral density satisfying assumption A1. Let \phi_{1},\phi_{2}:[0,1]\times[-\pi,\pi]\to\mathbb{R} be two functions such that \phi_{1}(u,\lambda) and \phi_{2}(u,\lambda) are symmetric in \lambda, twice differentiable with respect to u, and their Fourier coefficients satisfy |\widehat{\phi}_{1}(u,k)|,|\widehat{\phi}_{2}(u,k)|\leq K|k|^{-2d(u)-1} for u\in[0,1] and |k|>1. If assumptions A2 and A3 hold, then

 \lim_{T\to\infty}T\operatorname{cov}[J_{T}(\phi_{1}),J_{T}(\phi_{2})]=4\pi\int_{0}^{1}\int_{-\pi}^{\pi}\phi_{1}(u,\lambda)\phi_{2}(u,\lambda)f(u,\lambda)^{2}\,d\lambda\,du.
{pf}

We can write

But,

Now, an application of Theorem 2.3.2 of Brillinger (1981) yields

Thus,

 T\operatorname{cov}(J_{T}(\phi_{1}),J_{T}(\phi_{2}))=\frac{T}{[2\pi MH_{2,N}(0)]^{2}}\bigl[B_{N}^{(1)}+B_{N}^{(2)}\bigr], (20)

where

 \displaystyle B_{N}^{(1)} = \sum_{j,k=1}^{M}\int_{\Pi}\phi_{1}(u_{j},\lambda)\phi_{2}(u_{k},\mu)H_{N}\biggl(A_{t_{j}-N/2+1+\cdot,T}^{0}(x)h\biggl(\frac{\cdot}{N}\biggr),\lambda-x\biggr){}\times H_{N}\biggl(\overline{A_{t_{k}-N/2+1+\cdot,T}^{0}(x)}h\biggl(\frac{\cdot}{N}\biggr),x-\mu\biggr){}\times H_{N}\biggl(A_{t_{j}-N/2+1+\cdot,T}^{0}(y)h\biggl(\frac{\cdot}{N}\biggr),-y-\lambda\biggr){}\times H_{N}\biggl(\overline{A_{t_{k}-N/2+1+\cdot,T}^{0}(y)}h\biggl(\frac{\cdot}{N}\biggr),y+\mu\biggr)e^{i(x+y)(t_{j}-t_{k})}\,dx\,dy\,d\mu\,d\lambda,

with \Pi=[-\pi,\pi]^{4}, and

 \displaystyle B_{N}^{(2)} = \sum_{j,k=1}^{M}\int_{\Pi}\phi_{1}(u_{j},\lambda)\phi_{2}(u_{k},\mu)H_{N}\biggl(A_{t_{j}-N/2+1+\cdot,T}^{0}(x)h\biggl(\frac{\cdot}{N}\biggr),\lambda-x\biggr){}\times H_{N}\biggl(\overline{A_{t_{k}-N/2+1+\cdot,T}^{0}(x)}h\biggl(\frac{\cdot}{N}\biggr),x+\mu\biggr){}\times H_{N}\biggl(A_{t_{j}-N/2+1+\cdot,T}^{0}(y)h\biggl(\frac{\cdot}{N}\biggr),-y-\lambda\biggr){}\times H_{N}\biggl(\overline{A_{t_{k}-N/2+1+\cdot,T}^{0}(y)}h\biggl(\frac{\cdot}{N}\biggr),y-\mu\biggr)e^{i(x+y)(t_{j}-t_{k})}\,dx\,dy\,d\mu\,d\lambda.

The term B_{N}^{(1)} can be written as follows:

 \displaystyle B_{N}^{(1)} = \sum_{j,k=1}^{M}\int_{\Pi}\phi_{1}(u_{j},\lambda)\phi_{2}(u_{k},\mu)A(u_{j},x)A(u_{k},-x)A(u_{j},y)A(u_{k},-y){}\times H_{N}(\lambda-x)H_{N}(x-\mu)H_{N}(\mu+y)H_{N}(-y-\lambda)e^{i(x+y)(t_{j}-t_{k})}\,dx\,dy\,d\lambda\,d\mu+R_{N} \displaystyle= \sum_{j,k=1}^{M}\int_{\Pi}\phi_{1}(u_{j},x)A(u_{j},x)A(u_{k},-x)\phi_{2}(u_{k},y)A(u_{j},y)A(u_{k},-y){}\times H_{N}(\lambda-x)H_{N}(x-\mu)H_{N}(\mu+y)H_{N}(-y-\lambda)e^{i(x+y)(t_{j}-t_{k})}\,dx\,dy\,d\lambda\,d\mu \displaystyle{}+\Delta_{N}^{(1)}+\Delta_{N}^{(2)}+R_{N},

with

 \displaystyle\Delta_{N}^{(1)} = \sum_{j,k=1}^{M}\int_{\Pi}[\phi_{1}(u_{j},\lambda)-\phi_{1}(u_{j},x)]\phi_{2}(u_{k},\mu)A(u_{j},x)A(u_{k},-x)A(u_{j},y)A(u_{k},-y){}\times H_{N}(\lambda-x)H_{N}(x-\mu)H_{N}(\mu+y)H_{N}(-y-\lambda)e^{i(x+y)(t_{j}-t_{k})}\,dx\,dy\,d\lambda\,d\mu, \displaystyle\Delta_{N}^{(2)} = \sum_{j,k=1}^{M}\int_{\Pi}\phi_{1}(u_{j},x)[\phi_{2}(u_{k},\mu)-\phi_{2}(u_{k},y)]A(u_{j},x)A(u_{k},-x)A(u_{j},y)A(u_{k},-y){}\times H_{N}(\lambda-x)H_{N}(x-\mu)H_{N}(\mu+y)H_{N}(-y-\lambda)e^{i(x+y)(t_{j}-t_{k})}\,dx\,dy\,d\lambda\,d\mu,

and by Lemma 2 the remainder term R_{N} can be bounded as follows:

 \displaystyle\qquad|R_{N}| \leq \frac{N}{T}\Biggl|\sum_{j,k=1}^{M}\int_{\Pi}\phi_{1}(u_{j},\lambda)\phi_{2}(u_{k},\mu)A(u_{j},x)A(u_{k},-x)A(u_{j},y)y^{-d(u_{k})}L_{N}(y+\mu){}\times H_{N}(\lambda-x)H_{N}(x-\mu)H_{N}(-y-\lambda)e^{i(x+y)(t_{j}-t_{k})}\,dx\,dy\,d\lambda\,d\mu\Biggr|.

By integrating with respect to \mu the term \Delta_{N}^{(1)} can be written as

 \displaystyle\Delta_{N}^{(1)} = \sum_{j,k=1}^{M}\sum_{t,s=0}^{N-1}h\biggl(\frac{t}{N}\biggr)h\biggl(\frac{s}{N}\biggr)\widehat{\phi}_{2}(u_{k},t-s){}\times\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}[\phi_{1}(u_{j},\lambda)-\phi_{1}(u_{j},x)]A(u_{j},x)A(u_{k},-x)A(u_{j},y)A(u_{k},-y){}\times H_{N}(\lambda-x)H_{N}(-y-\lambda)e^{i(x+y)(t_{j}-t_{k})-ixt-iys}\,dx\,dy\,d\lambda,

and by integrating with respect to y we get

 \displaystyle\Delta_{N}^{(1)} = \sum_{j,k=1}^{M}\sum_{t,s,p=0}^{N-1}h\biggl(\frac{t}{N}\biggr)h\biggl(\frac{s}{N}\biggr)h\biggl(\frac{p}{N}\biggr)\widehat{\phi}_{2}(u_{k},t-s)\widehat{f}(u_{j},u_{k},t_{j}-t_{k}-s+p){}\times\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}[\phi_{1}(u_{j},\lambda)-\phi_{1}(u_{j},x)]A(u_{j},x)A(u_{k},-x)H_{N}(\lambda-x)e^{ix(t_{j}-t_{k}-t)+i\lambda p}\,dx\,d\lambda \displaystyle= \sum_{j,k=1}^{M}\sum_{t,s,p=0}^{N-1}h\biggl(\frac{t}{N}\biggr)h\biggl(\frac{s}{N}\biggr)h\biggl(\frac{p}{N}\biggr)\widehat{\phi}_{2}(u_{k},t-s)\widehat{f}(u_{j},u_{k},t_{j}-t_{k}-s+p)\varepsilon_{N}(u_{j},u_{k},p,t_{j}-t_{k}-t),

where \widehat{f}(u,v,k) and \varepsilon_{N}(u,v,p,r) are given by \widehat{f}(u,v,k)=\int_{-\pi}^{\pi}A(u,\lambda)A(v,-\lambda)e^{i\lambda k}\,d\lambda, and

But h(\frac{m}{N})=h(\frac{p}{N})+h^{\prime}(\xi_{p,m})\frac{m-p}{N} for some \xi_{p,m}\in[0,1]. Thus,

where the term \varepsilon_{N}^{(1)}(u_{j},u_{k},p,r) is given by

 \varepsilon_{N}^{(1)}(u_{j},u_{k},p,r)=\int_{-\pi}^{\pi}g(\omega)e^{ir\omega}\sum_{m=0}^{N-1}e^{im\omega}\,d\omega-2\pi g(0),

with g(\omega)=\int_{-\pi}^{\pi}\phi_{1}(u_{j},\lambda)A(u_{j},\omega+\lambda)A(u_{k},-\omega-\lambda)e^{i(p+r)\lambda}\,d\lambda. Observe that, by Lemma 1, for every u_{j},u_{k},p,r, \varepsilon_{N}^{(1)}(u_{j},u_{k},p,r)\to 0 as N\to\infty; consequently, we can write

 \displaystyle\varepsilon_{N}^{(1)}(u_{j},u_{k},p,r) = \int_{-\pi}^{\pi}g(\omega)\sum_{m=N}^{\infty}e^{i(m+r)\omega}\,d\omega = \sum_{m=N}^{\infty}\widehat{\phi}_{1}(u_{j},p-m)\widehat{f}(u_{j},u_{k},r+m).

On the other hand, by assumption A1, |\widehat{f}(u_{j},u_{k},r+m)|\leq K|r+m|^{d(u_{j})+d(u_{k})-1}. Thus, the term \varepsilon_{N}^{(2)}(u_{j},u_{k},p,r) is bounded by

where for notational simplicity we have dropped \theta from d_{\theta}(\cdot). Thus,

 \displaystyle\varepsilon_{N}(u_{j},u_{k},p,r) = h\biggl(\frac{p}{N}\biggr)\sum_{m=N}^{\infty}\widehat{\phi}_{1}(u_{j},p-m)\widehat{f}(u_{j},u_{k},r+m)+\mathcal{O}\bigl(N^{-2d(u_{j})}r^{d(u_{j})+d(u_{k})-1}\bigr).

Hence, \Delta_{N}^{(1)} can be written as

 \displaystyle\Delta_{N}^{(1)} = \sum_{j,k=1}^{M}\sum_{t,s,p=0}^{N-1}h\biggl(\frac{t}{N}\biggr)h\biggl(\frac{s}{N}\biggr)h\biggl(\frac{p}{N}\biggr)\widehat{\phi}_{2}(u_{k},t-s)\widehat{f}(u_{j},u_{k},t_{j}-t_{k}){}\times\Biggl\{h\biggl(\frac{p}{N}\biggr)\sum_{m=N+1}^{\infty}\widehat{\phi}_{1}(u_{j},p-m)\widehat{f}(u_{j},u_{k},t_{j}-t_{k}+m)+O\bigl(N^{-2d(u_{j})}|t_{j}-t_{k}-t|^{d(u_{j})+d(u_{k})-1}\bigr)\Biggr\} \displaystyle:= \Delta_{N}^{(1.1)}+\Delta_{N}^{(1.2)},

say. Therefore, |\Delta_{N}^{(1)}|\leq|\Delta_{N}^{(1.1)}|+|\Delta_{N}^{(1.2)}|. Observe that since \phi_{2}(u,\lambda)\sim C|\lambda|^{2d(u)} as \lambda\to 0 and d(u)>0 for all u\in[0,1], we conclude that \phi_{2}(u,0)=\sum_{k=-\infty}^{\infty}\widehat{\phi}_{2}(u,k)=0. Thus,

where for simplicity we assume that h(x)=0 for x outside [0,1]. Now, by an application of Taylor’s theorem we can write h(\frac{t}{N}+\frac{k}{N})=h(\frac{t}{N})+h^{\prime}(\xi(t,k))\frac{k}{N} for some \xi(t,k)\in(\frac{t}{N}-\frac{|k|}{N},\frac{t}{N}+\frac{|k|}{N}). Hence,

Note that \sum_{k=1-N}^{N-1}\widehat{\phi}_{2}(u,k)=-2\sum_{k=N}^{\infty}\widehat{\phi}_{2}(u,k). Therefore, |\sum_{k=1-N}^{N-1}\widehat{\phi}_{2}(u,k)|\leq K\sum_{k=N}^{\infty}k^{-2d(u)-1}\leq KN^{-2d(u)}. Consequently, |\sum_{t=0}^{N-1}h(\frac{t}{N})^{2}\sum_{k=1-N}^{N-1}\widehat{\phi}_{2}(u,k)|\leq KN^{1-2d(u)}. On the other hand, |\sum_{t=0}^{N-1}\sum_{k=1-N}^{N-1}h(\frac{t}{N})h^{\prime}(\xi(t,k))\widehat{\phi}_{2}(u,k)\frac{k}{N}|\leq K\sum_{k=1}^{N}k^{-2d(u)}\leq KN^{1-2d(u)}. Hence,

 \Biggl|\sum_{t=0}^{N-1}\sum_{k=1-N}^{N-1}h\biggl(\frac{t}{N}\biggr)h\biggl(\frac{t}{N}+\frac{k}{N}\biggr)\widehat{\phi}_{2}(u,k)\Biggr|\leq KN^{1-2d(u)}.

Thus, we conclude that

 \displaystyle\bigl|\Delta_{N}^{(1.1)}\bigr| \leq K\sum_{j,k=1,j\neq k}^{M}\sum_{p=0}^{N-1}N^{1-2d(u_{k})}|S(j-k)+p|^{d(u_{j})+d(u_{k})-1}\sum_{m=N+1}^{\infty}|p-m|^{-2d(u_{j})-1}|S(j-k)+m|^{d(u_{j})+d(u_{k})-1} \displaystyle\leq K\sum_{j,k=1,j\neq k}^{M}\sum_{p=0}^{N-1}\biggl|\frac{S}{N}(j-k)+\frac{p}{N}\biggr|^{d(u_{j})+d(u_{k})-1}\frac{1}{N}\sum_{m=N+1}^{\infty}\biggl|\frac{p}{N}-\frac{m}{N}\biggr|^{-2d(u_{j})-1}\biggl|\frac{S}{N}(j-k)+\frac{m}{N}\biggr|^{d(u_{j})+d(u_{k})-1}\frac{1}{N} \displaystyle\leq K\sum_{j,k=1,j\neq k}^{M}\int_{0}^{1}\int_{1}^{\infty}\biggl|\frac{S}{N}(j-k)+x\biggr|^{d(u_{j})+d(u_{k})-1}|x-y|^{-2d(u_{j})-1}\biggl|\frac{S}{N}(j-k)+y\biggr|^{d(u_{j})+d(u_{k})-1}\,dy\,dx.

Let \delta>0 and define I_{1}(\delta)=\{1\leq j,k\leq M: k<j\mbox{ or }k-j>\frac{N}{S}(1+\delta)\} and I_{2}(\delta)=\{1\leq j,k\leq M: 0<k-j\leq\frac{N}{S}(1+\delta)\}. Therefore, the sum above can be split as \sum_{j,k=1,j\neq k}^{M}=\sum_{I_{1}(\delta)}+\sum_{I_{2}(\delta)}:=|\Delta_{N}^{(1.1.1)}|+|\Delta_{N}^{(1.1.2)}|, say. Observe that over I_{1}(\delta) we have that |\frac{S}{N}(j-k)+x|^{-\alpha}\leq K|\frac{S}{N}(j-k)|^{-\alpha} for \alpha>0. Hence,

 \displaystyle\bigl|\Delta_{N}^{(1.1.1)}\bigr| \leq K\sum_{I_{1}(\delta)}\biggl|\frac{S}{N}(j-k)\biggr|^{d(u_{j})+d(u_{k})-1}\int_{0}^{1}\int_{1}^{\infty}|x-y|^{-2d(u_{j})-1}\biggl|\frac{S}{N}(j-k)+y\biggr|^{d(u_{j})+d(u_{k})-1}\,dy\,dx \displaystyle\leq K\sum_{j,k=1,j\neq k}^{M}\biggl|\frac{S}{N}(j-k)\biggr|^{d(u_{j})+d(u_{k})-1}\int_{0}^{1}\int_{1}^{\infty}|x-y|^{-2d(u_{j})-1}\biggl|\frac{S}{N}(j-k)+y\biggr|^{d(u_{j})+d(u_{k})-1}\,dy\,dx.

Since the integrands in the above expression are all positive, an application of Tonelli’s theorem yields

 \displaystyle\bigl|\Delta_{N}^{(1.1.1)}\bigr| \leq K\sum_{j,k=1,j\neq k}^{M}\biggl|\frac{S}{N}(j-k)\biggr|^{d(u_{j})+d(u_{k})-1}\int_{1}^{\infty}\int_{0}^{1}|x-y|^{-2d(u_{j})-1}\biggl|\frac{S}{N}(j-k)+y\biggr|^{d(u_{j})+d(u_{k})-1}\,dx\,dy \displaystyle\leq K\sum_{j,k=1,j\neq k}^{M}\biggl|\frac{S}{N}(j-k)\biggr|^{d(u_{j})+d(u_{k})-1}\int_{1}^{\infty}\bigl[(y-1)^{-2d(u_{j})}-y^{-2d(u_{j})}\bigr]\biggl|\frac{S}{N}(j-k)+y\biggr|^{d(u_{j})+d(u_{k})-1}\,dy.

Then, by Lemma 3 we conclude that

 \displaystyle\bigl|\Delta_{N}^{(1.1.1)}\bigr| \leq K\sum_{j,k=1,j\neq k}^{M}\biggl|\frac{S}{N}(j-k)\biggr|^{2d(u_{j})+2d(u_{k})-2} \displaystyle\leq K\Biggl[\sum_{\substack{j,k=1,j\neq k\\ d(u_{j})+d(u_{k})\leq 1/2}}^{M}\biggl|\frac{S}{N}(j-k)\biggr|^{2d(u_{j})+2d(u_{k})-2}+\sum_{\substack{j,k=1,j\neq k\\ d(u_{j})+d(u_{k})>1/2}}^{M}\biggl|\frac{S}{N}(j-k)\biggr|^{2d(u_{j})+2d(u_{k})-2}\Biggr].

For the first summand above, we have the upper bound

 \sum_{j,k=1,j\neq k}^{M}|j-k|^{-1}\biggl(\frac{N}{S}\biggr)^{2}\leq K\biggl(\frac{N}{S}\biggr)^{2}M\log M,

while the second summand can be bounded as follows:

Thus,

 \bigl|\Delta_{N}^{(1.1.1)}\bigr|\leq K\biggl(\frac{N}{S}\biggr)^{2}M\log M+M^{2}. (23)

On the other hand, if z=\frac{S}{N}(k-j), then 0<z\leq 1+\delta for j,k\in I_{2}(\delta). Thus, an application of Lemma 9 yields, for 0<\delta<2,

 \bigl|\Delta_{N}^{(1.1.2)}\bigr|\leq K\sum_{I_{2}(\delta)}\biggl|1-\frac{S}{N}(k-j)\biggr|^{2d-1},

where d:=\inf_{0\leq u\leq 1}d(u)>0. Hence, by defining p=k-j and P=N/S we can write

 \displaystyle\bigl|\Delta_{N}^{(1.1.2)}\bigr| \leq KM\sum_{p=1}^{P(1+\delta)}\biggl|1-\frac{p}{P}\biggr|^{2d-1} \leq KM\frac{N}{S}\int_{0}^{1+\delta}|1-x|^{2d-1}\,dx\leq KM\frac{N}{S}.

Note that from assumption A3, N/S\to\infty. Thus, by combining the above bound and (23) we conclude that

 \bigl|\Delta_{N}^{(1.1)}\bigr|\leq K\biggl(\frac{N}{S}\biggr)^{2}M\log M+M^{2}. (24)

A similar bound can be found for |\Delta_{N}^{(1.2)}| and consequently for |\Delta_{N}^{(1)}|. Furthermore, an analogous argument yields a similar bound for the term |\Delta_{N}^{(2)}| appearing in (3.1). Now, we focus on obtaining an upper bound for the remaining term R_{N} from (3.1). By integrating that expression with respect to \lambda we get

 \displaystyle|R_{N}| \leq \frac{N}{T}\Biggl|\sum_{j,k=1}^{M}\sum_{t,s=0}^{N-1}h\biggl(\frac{t}{N}\biggr)h\biggl(\frac{s}{N}\biggr)\widehat{\phi}_{1}(u_{j},s-t){}\times\int_{\Pi}\phi_{2}(u_{k},\mu)A(u_{j},x)A(u_{k},-x)A(u_{j},y)y^{-d(u_{k})}L_{N}(y+\mu)H_{N}(x-\mu)e^{ix(t_{j}-t_{k}+t)+iy(t_{j}-t_{k}+s)}\,dx\,dy\,d\mu\Biggr|,

where the function L_{N}(\cdot) is defined as

 L_{N}(x)=\cases{N,&\quad$|x|\leq 1/N$,\cr 1/|x|,&\quad$1/N<|x|\leq\pi$.}
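The only property of L_N used in the bounds below is that its integral over [-\pi,\pi] grows like \log N. A small numerical illustration of this (our own, for intuition only; the exact value 2+2\log(\pi N) follows by direct integration):

```python
import numpy as np

# The kernel from the proof: L_N(x) = N for |x| <= 1/N and 1/|x| for
# 1/N < |x| <= pi.  Its integral over [-pi, pi] equals
# 2 + 2*log(pi*N) = O(log N), the source of the log N factors below.
def L(x, N):
    ax = np.abs(x)
    return np.where(ax <= 1.0 / N, 1.0 * N, 1.0 / np.maximum(ax, 1.0 / N))

for N in [10, 100, 1000]:
    n = 1_000_000
    dx = 2 * np.pi / n
    x = -np.pi + dx * (np.arange(n) + 0.5)   # midpoint grid
    approx = np.sum(L(x, N)) * dx
    exact = 2 + 2 * np.log(np.pi * N)
    print(N, approx, exact)
```

The two columns agree, growing logarithmically rather than linearly in N despite the height-N spike.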

Hence,

 \displaystyle|R_{N}| \leq \frac{N}{T}\Biggl|\sum_{j,k=1}^{M}\sum_{t,s,p=0}^{N-1}h\biggl(\frac{t}{N}\biggr)h\biggl(\frac{s}{N}\biggr)h\biggl(\frac{p}{N}\biggr)\widehat{\phi}_{1}(u_{j},s-t)\widehat{f}(u_{j},u_{k},t_{j}-t_{k}+t-p){}\times\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}\phi_{2}(u_{k},\mu)A(u_{j},y)y^{-d(u_{k})}L_{N}(\mu+y)e^{iy(t_{j}-t_{k}+s)+ip\mu}\,dy\,d\mu\Biggr| \displaystyle\leq K\frac{N}{T}\sum_{j,k=1}^{M}\sum_{t,s,p=0}^{N-1}|\widehat{\phi}_{1}(u_{j},s-t)||\widehat{f}(u_{j},u_{k},t_{j}-t_{k}+t-p)|\biggl|\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}\phi_{2}(u_{k},\mu)A(u_{j},y)y^{-d(u_{k})}L_{N}(\mu+y)e^{iy(t_{j}-t_{k}+s)+ip\mu}\,dy\,d\mu\biggr| \displaystyle\leq K\frac{N}{T}\sum_{j,k=1}^{M}\sum_{t,s,p=0}^{N-1}|\widehat{\phi}_{1}(u_{j},s-t)||\widehat{f}(u_{j},u_{k},t_{j}-t_{k}+t-p)|\biggl|\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}L_{N}(\mu+y)y^{-d(u_{j})-d(u_{k})}\,dy\,d\mu\biggr| \displaystyle\leq K\frac{N\log N}{T}\sum_{j,k=1}^{M}\sum_{t,s,p=0,t\neq s}^{N-1}|s-t|^{-2d(u_{j})-1}|S(j-k)+t-p|^{d(u_{j})+d(u_{k})-1} \displaystyle\leq K\frac{N^{2}\log N}{T}\sum_{j,k=1}^{M}\sum_{t,s,p=0,t\neq s}^{N-1}|s-t|^{-2d(u_{j})-1}S^{d(u_{j})+d(u_{k})-1}|j-k|^{d(u_{j})+d(u_{k})-1} \displaystyle\leq K\frac{N^{3}\log N}{T}M^{2}\sum_{j,k=1}^{M}(SM)^{d(u_{j})+d(u_{k})-1}\biggl|\frac{j}{M}-\frac{k}{M}\biggr|^{d(u_{j})+d(u_{k})-1}\frac{1}{M^{2}}.

Since by assumption A3, T/N^{2}\to 0, we conclude that

 |R_{N}|\leq K\frac{N^{3}M^{2}}{T^{2-d}}\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}|x-y|^{2d-1}\,dx\,dy\leq KN^{3}M^{2}T^{d-2}. (25)

Thus, from (24) and (25), we conclude

where

 C_{N}=\mathcal{O}\biggl(\frac{\log M}{S}+\frac{T}{N^{2}}+NT^{d-1}\biggr). (26)

Therefore, by assumption A3 we conclude that C_{N}=o(1). By following successive decompositions as in (3.1), we replace \phi_{2}(u_{k},y) by \phi_{2}(u_{k},x), A(u_{k},-y) by A(u_{k},-x) and A(u_{j},y) by A(u_{j},x), respectively. Thus,

 \displaystyle\frac{TB_{N}^{(1)}}{[2\pi MH_{2,N}(0)]^{2}} = \frac{T}{[2\pi MH_{2,N}(0)]^{2}}\sum_{j,k=1}^{M}\int_{\Pi}\phi_{1}(u_{j},x)A(u_{j},x)A(u_{k},-x)\phi_{2}(u_{k},x)A(u_{j},x)A(u_{k},-x){}\times H_{N}(\lambda-x)H_{N}(x-\mu)H_{N}(\mu+y)H_{N}(-y-\lambda)e^{i(x+y)(t_{j}-t_{k})}\,dx\,dy\,d\lambda\,d\mu+o(1).

By integrating with respect to \mu and \lambda, we get

By assumption A3, for S<N we can write

Observe that by the assumptions of this proposition the products \phi_{1}(u,x)f(u,x) and \phi_{2}(u,x)f(u,x) are differentiable with respect to u. Furthermore, note that by assumption A3, \lim_{T,S\to\infty}\frac{S|p|}{T}=0 for any |p|\leq\frac{N}{S}. Consequently,

for any |p|<\frac{N}{S} as M,N,S,T\to\infty. On the other hand,

 \displaystyle\frac{2\pi T^{2}N^{2}}{S^{2}[MH_{2,N}(0)]^{2}}\sum_{t=0}^{N-1}\sum_{p=-t/S}^{(N-t)/S}h^{2}\biggl(\frac{t}{N}\biggr)h^{2}\biggl(\frac{t}{N}+\frac{pS}{N}\biggr)\frac{S}{N^{2}} \displaystyle\qquad\to 2\pi\int_{0}^{1}\int_{-x}^{1-x}h^{2}(x)h^{2}(x+y)\,dy\,dx\biggl(\int_{0}^{1}h^{2}(x)\,dx\biggr)^{-2}=2\pi,

as M,N,S,T\to\infty. Therefore, in this case

 \frac{T}{[2\pi MH_{2,N}(0)]^{2}}B_{N}^{(1)}\to 2\pi\int_{0}^{1}\int_{-\pi}^{\pi}\phi_{1}(u,x)\phi_{2}(u,x)f(u,x)^{2}\,dx\,du,

as M,N,S,T\to\infty. Similarly, we have that

 \frac{T}{[2\pi MH_{2,N}(0)]^{2}}B_{N}^{(2)}\to 2\pi\int_{0}^{1}\int_{-\pi}^{\pi}\phi_{1}(u,x)\phi_{2}(u,x)f(u,x)^{2}\,dx\,du,

as M,N,S,T\to\infty. Therefore, by virtue of (20) this proposition is proved.

###### Proposition 3

Let \operatorname{cum}_{p}(\cdot) be the pth order cumulant with p\geq 3. Then, T^{p/2}\operatorname{cum}_{p}(J_{T}(\phi))\to 0, as T\to\infty.

{pf}

Observe that J_{T}(\phi) can be written as

 J_{T}(\phi)=\frac{1}{2\pi MH_{2,N}(0)}Y^{\prime}Q(\phi)Y,

where the block-diagonal matrix Q(\phi) is defined in (16) and Y\in\mathbb{R}^{NM} is a Gaussian random vector defined by Y=(Y(u_{1})^{\prime},\ldots,Y(u_{M})^{\prime})^{\prime}, Y(u)=(Y_{1}(u),\ldots,Y_{N}(u))^{\prime}, Y_{t}(u)=h(\frac{t}{N})Y_{[uT]-N/2+t+1,T}, with Y_{[uT]-N/2+t+1,T} satisfying (1). For simplicity, denote the matrix Q(\phi) by Q. Since Y is Gaussian,

 \operatorname{cum}_{p}[J_{T}(\phi)]=\frac{2^{p-1}(p-1)!}{(2\pi MH_{2,N}(0))^{p}}\operatorname{tr}(RQ)^{p},

where R=\operatorname{Var}(Y). Let |A|=[\operatorname{tr}(AA^{\prime})]^{1/2} be the Euclidean norm of a matrix A and let \|A\|=\sup_{\|x\|=1}[(Ax)^{\prime}Ax]^{1/2} be its spectral norm. Now, since |\operatorname{tr}(QB)|\leq|Q||B| and |QB|\leq\|Q\||B|, we get |\operatorname{tr}(RQ)^{p}|\leq\|RQ\|^{p-2}|RQ|^{2}.
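The cumulant formula for Gaussian quadratic forms used above, \operatorname{cum}_{p}[Y^{\prime}QY]=2^{p-1}(p-1)!\operatorname{tr}(RQ)^{p}, can be sanity-checked by simulation in its p=2 case, \operatorname{Var}(Y^{\prime}QY)=2\operatorname{tr}[(RQ)^{2}]. The toy matrices below are ours, not the R and Q of the proof:

```python
import numpy as np

# For Y ~ N(0, R) and symmetric Q, Var(Y'QY) = 2 tr[(RQ)^2]
# (the p = 2 case of cum_p[Y'QY] = 2^{p-1} (p-1)! tr[(RQ)^p]).
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
R = A @ A.T + n * np.eye(n)          # an arbitrary covariance matrix
Q = np.diag([1.0, -0.5, 2.0, 0.25])  # an arbitrary symmetric weight matrix

theory = 2.0 * np.trace(np.linalg.matrix_power(R @ Q, 2))
Y = rng.multivariate_normal(np.zeros(n), R, size=500_000)
quad = np.einsum('ij,jk,ik->i', Y, Q, Y)   # Y_i' Q Y_i for each draw
print(theory, quad.var())                  # agree up to Monte Carlo error
```

Note that tr[(RQ)^2] is nonnegative even when Q has negative entries, since RQ is similar to the symmetric matrix R^{1/2}QR^{1/2}.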

On the other hand, for fixed \lambda, decompose the function \phi(\cdot,\lambda) as \phi(\cdot,\lambda)=\phi_{+}(\cdot,\lambda)-\phi_{-}(\cdot,\lambda) where \phi_{+}(\cdot,\lambda),\phi_{-}(\cdot,\lambda)\geq 0. Thus, we can write Q=Q(\phi)=Q(\phi_{+}-\phi_{-})=Q(\phi_{+})-Q(\phi_{-}):=Q_{+}-Q_{-}, say. Now, by Lemma 6 we conclude that

 \|RQ\|=\|RQ_{+}-RQ_{-}\|\leq\|RQ_{+}\|+\|RQ_{-}\|\leq K(MN^{1-2d}T^{2d-1}),

and by Proposition 2 we have that |RQ|^{2}\leq K\frac{M^{2}N^{2}}{T}. Thus,

 |\operatorname{tr}(RQ)^{p}|\leq K(MN^{1-2d}T^{2d-1})^{p-2}\frac{M^{2}N^{2}}{T}.

Consequently,

 |T^{p/2}\operatorname{cum}_{p}[J_{T}(\phi)]|\leq KM^{1-p/2}\biggl(\frac{N}{T}\biggr)^{(1-2d)(p-2)}\biggl(\frac{\sqrt{T}}{N}\biggr)^{p-2}.

Since p\geq 3 and, by assumption A3, N/T\to 0 and \sqrt{T}/N\to 0 as T,N\to\infty, the required result is obtained.

### 3.2 Proof of theorems

{pf*}

Proof of Theorem 2.1 To prove the consistency of the Whittle estimator, it suffices to show that

 {\sup_{\theta}}|\mathcal{L}_{T}(\theta)-\mathcal{L}(\theta)|\to 0,

in probability, as T\to\infty, where \mathcal{L}(\theta):=\frac{1}{4\pi}\int_{0}^{1}\int_{-\pi}^{\pi}[\log f_{\theta}(u,\lambda)+\frac{f_{\theta_{0}}(u,\lambda)}{f_{\theta}(u,\lambda)}]\,d\lambda\,du. Define g_{\theta}(u,\lambda)=f_{\theta}(u,\lambda)^{-1}. By assumption A1, g_{\theta}(u,\lambda) is continuous in \theta, \lambda and u. Thus, g_{\theta} can be approximated by the Cesàro sum of its Fourier series

 \displaystyle g^{(L)}_{\theta}(u,\lambda) = \frac{1}{4\pi^{2}}\sum_{\ell=-L}^{L}\sum_{m=-L}^{L}\biggl(1-\frac{|\ell|}{L}\biggr)\biggl(1-\frac{|m|}{L}\biggr)\widehat{g}_{\theta}(\ell,m)\exp(-i2\pi u\ell-i\lambda m),

such that \sup_{\theta}|g_{\theta}(u,\lambda)-g^{(L)}_{\theta}(u,\lambda)|<\varepsilon; see, for example, Theorem 1.5(ii) of Körner (1988). Following Theorem 3.2 of Dahlhaus (1997), we can write

where

 \widehat{g}_{\theta}(\ell,m)=\int_{0}^{1}\int_{-\pi}^{\pi}g_{\theta}(u,\lambda)\exp(i2\pi u\ell+i\lambda m)\,du\,d\lambda.

Consequently, |\widehat{g}_{\theta}(\ell,m)|\leq 2\pi\sup_{(\theta,u,\lambda)}|g_{\theta}(u,\lambda)|. Moreover, by assumption A1, |g_{\theta}(u,\lambda)| is continuous in \theta, u and \lambda. Thus, since the parameter space is compact, we have that |\widehat{g}_{\theta}(\ell,m)|\leq K for some positive constant K. Now, by defining, for fixed \ell,m=1,\ldots,L, \phi(u,\lambda)=\cos(2\pi u\ell)\cos(\lambda m) or \phi(u,\lambda)=\sin(2\pi u\ell)\cos(\lambda m) in Proposition 1, and \phi_{1}(u,\lambda)=\phi_{2}(u,\lambda)=\cos(2\pi u\ell)\cos(\lambda m) or \phi_{1}(u,\lambda)=\phi_{2}(u,\lambda)=\sin(2\pi u\ell)\cos(\lambda m) in Proposition 2, we deduce that

and

 \frac{1}{M}\sum_{j=1}^{M}\int_{-\pi}^{\pi}\{I_{N}(u_{j},\lambda)+f(u_{j},\lambda)\}\,d\lambda\to 2\int_{0}^{1}\int_{-\pi}^{\pi}f(u,\lambda)\,d\lambda\,du, (28)

in probability, as M\to\infty. Now, from the limits (3.2) and (28), this theorem follows.

{pf*}Proof of Theorem 2.2 Let \widehat{\theta}_{T} be the parameter value that minimizes the Whittle log-likelihood function \mathcal{L}_{T}(\theta) given by (7) and let \theta_{0} be the true value of the parameter. By the mean value theorem, there exists a vector \bar{\theta}_{T} satisfying \|\bar{\theta}_{T}-\theta_{0}\|\leq\|\widehat{\theta}_{T}-\theta_{0}\|, such that

 \nabla\mathcal{L}_{T}(\widehat{\theta}_{T})-\nabla\mathcal{L}_{T}(\theta_{0})=[\nabla^{2}\mathcal{L}_{T}(\bar{\theta}_{T})](\widehat{\theta}_{T}-\theta_{0}). (29)

Therefore, it suffices to show that (a) \nabla^{2}\mathcal{L}_{T}(\theta_{0})\to\Gamma(\theta_{0}), as T\to\infty; (b) \nabla^{2}\mathcal{L}_{T}(\bar{\theta}_{T})-\nabla^{2}\mathcal{L}_{T}(\theta_{% 0})\to 0 in probability, as T\to\infty; and (c) \sqrt{T}\nabla\mathcal{L}_{T}(\theta_{0})\to N[0,\Gamma(\theta_{0})], in distribution, as T\to\infty. To this end, observe that

 \displaystyle\nabla^{2}\mathcal{L}_{T}(\theta) = \frac{1}{4\pi}\frac{1}{M}\sum_{j=1}^{M}\int_{-\pi}^{\pi}\{[I_{N}(u_{j},\lambda)-f_{\theta}(u_{j},\lambda)]\nabla^{2}f_{\theta}(u_{j},\lambda)^{-1}-\nabla f_{\theta}(u_{j},\lambda)[\nabla f_{\theta}(u_{j},\lambda)^{-1}]^{\prime}\}\,d\lambda \displaystyle= \frac{1}{4\pi}\frac{1}{M}\Biggl\{\sum_{j=1}^{M}\int_{-\pi}^{\pi}\phi(u_{j},\lambda)[I_{N}(u_{j},\lambda)-f_{\theta}(u_{j},\lambda)]\,d\lambda+\sum_{j=1}^{M}\int_{-\pi}^{\pi}\nabla\log f_{\theta}(u_{j},\lambda)[\nabla\log f_{\theta}(u_{j},\lambda)]^{\prime}\,d\lambda\Biggr\} \displaystyle= \frac{1}{4\pi}[J_{T}(\phi)-J(\phi)]+\Gamma(\theta)+\mathcal{O}\biggl(\frac{1}{M}\biggr),

where \phi(u,\lambda)=\nabla^{2}f_{\theta}(u,\lambda)^{-1}. Hence, an application of Proposition 1 and Proposition 2 yields parts (a) and (b). On the other hand, part (c) can be proved by means of the cumulant method. That is, by showing that all the cumulants of \sqrt{T}\nabla\mathcal{L}_{T}(\theta_{0}) converge to zero, excepting the second order cumulant. To this end, note that

 \displaystyle\nabla\mathcal{L}_{T}(\theta_{0}) = \frac{1}{4\pi}\frac{1}{M}\sum_{j=1}^{M}\int_{-\pi}^{\pi}[I_{N}(u_{j},\lambda)-f_{\theta_{0}}(u_{j},\lambda)]\nabla f_{\theta_{0}}(u_{j},\lambda)^{-1}\,d\lambda \displaystyle= \frac{1}{4\pi}J_{T}(\phi)-\frac{1}{4\pi M}\sum_{j=1}^{M}\int_{-\pi}^{\pi}f_{\theta_{0}}(u_{j},\lambda)\nabla f_{\theta_{0}}(u_{j},\lambda)^{-1}\,d\lambda \displaystyle= \frac{1}{4\pi}[J_{T}(\phi)-J(\phi)]+\mathcal{O}\biggl(\frac{1}{M}\biggr),

where \phi(u,\lambda)=\nabla f_{\theta_{0}}(u,\lambda)^{-1}. Hence, by Proposition 1 and assumption A3, the first-order cumulant of \sqrt{T}\nabla\mathcal{L}_{T}(\theta_{0}) satisfies

 \sqrt{T}E[\nabla\mathcal{L}_{T}(\theta_{0})] = \mathcal{O}\biggl(\frac{\sqrt{T}\log^{2}N}{N}\biggr)+\mathcal{O}\biggl(\frac{\sqrt{T}}{M}\biggr)\to 0,

as T\to\infty. Furthermore, by (3.2) we have that the second-order cumulant of \sqrt{T}\nabla\mathcal{L}_{T}(\theta_{0}) can be written as

 T\operatorname{cov}[\nabla\mathcal{L}_{T}(\theta_{0}),\nabla\mathcal{L}_{T}(\theta_{0})]=\frac{1}{16\pi^{2}}T\operatorname{cov}[J_{T}(\phi),J_{T}(\phi)].

Therefore, by Proposition 2 we have that

 \displaystyle\lim_{T\to\infty}T\operatorname{cov}[\nabla\mathcal{L}_{T}(\theta_{0}),\nabla\mathcal{L}_{T}(\theta_{0})] \displaystyle\qquad=\frac{1}{4\pi}\int_{0}^{1}\int_{-\pi}^{\pi}\nabla f_{\theta_{0}}(u,\lambda)^{-1}[\nabla f_{\theta_{0}}(u,\lambda)^{-1}]^{\prime}f_{\theta_{0}}(u,\lambda)^{2}\,d\lambda\,du \displaystyle\qquad=\frac{1}{4\pi}\int_{0}^{1}\int_{-\pi}^{\pi}\nabla\log f_{\theta_{0}}(u,\lambda)[\nabla\log f_{\theta_{0}}(u,\lambda)]^{\prime}\,d\lambda\,du=\Gamma(\theta_{0}).

Finally, for p>2, Proposition 3 gives T^{p/2}\operatorname{cum}_{p}[\nabla\mathcal{L}_{T}(\theta_{0})]\to 0, as T\to\infty, proving part (c).

{pf*}Proof of Theorem 2.3 By observing that the Fisher information matrix evaluated at the true parameter, \Gamma_{T}(\theta_{0}), is given by

 \Gamma_{T}(\theta_{0})=T\operatorname{cov}[\nabla\mathcal{L}_{T}(\theta_{0}),\nabla\mathcal{L}_{T}(\theta_{0})],

the result is an immediate consequence of Proposition 2.

{pf*}Proof of Theorem 2.4 Let V^{(T)}=[V_{ij}^{(T)}]_{i,j=1,\ldots,p}=\operatorname{Var}(\widehat{\beta}); then

 \displaystyle\int_{0}^{1}\operatorname{Var}[\widehat{d}(u)]\,du = \int_{0}^{1}\sum_{i=1}^{p}\sum_{j=1}^{p}g_{i}(u)V_{ij}^{(T)}g_{j}(u)\,du = \sum_{i=1}^{p}\sum_{j=1}^{p}V_{ij}^{(T)}\int_{0}^{1}g_{i}(u)g_{j}(u)\,du = \sum_{i=1}^{p}\sum_{j=1}^{p}V_{ij}^{(T)}b_{ij},

where b_{ij}=\int_{0}^{1}g_{i}(u)g_{j}(u)\,du=b_{ji}. Therefore, by Theorem 2.2

 \lim_{T\to\infty}T\int_{0}^{1}\operatorname{Var}[\widehat{d}(u)]\,du=\sum_{i=1}^{p}\sum_{j=1}^{p}\lim_{T\to\infty}\bigl[TV_{ij}^{(T)}\bigr]b_{ij}=\sum_{i=1}^{p}\sum_{j=1}^{p}a_{ij}b_{ij},

where A=(a_{ij})_{i,j=1,\ldots,p}=\Gamma^{-1} and

 \Gamma_{ij}=\frac{1}{4\pi}\int_{0}^{1}\int_{-\pi}^{\pi}\frac{\partial}{\partial\beta_{i}}\log f(u,\lambda)\,\frac{\partial}{\partial\beta_{j}}\log f(u,\lambda)\,d\lambda\,du.

But, \log f(u,\lambda)=\log(\sigma^{2})-\log(2\pi)-d_{\beta}(u)\log|1-e^{i\lambda}|^{2}. Thus,

 \frac{\partial}{\partial\beta_{i}}\log f(u,\lambda)=-g_{i}(u)\log|1-e^{i\lambda}|^{2}.

Hence, \Gamma_{ij}=\int_{0}^{1}g_{i}(u)g_{j}(u)\,du\times\frac{1}{4\pi}\int_{-\pi}^{\pi}(\log|1-e^{i\lambda}|^{2})^{2}\,d\lambda=\frac{\pi^{2}}{6}b_{ij}. Therefore, \Gamma=\frac{\pi^{2}}{6}B and A=\frac{6}{\pi^{2}}B^{-1}. Consequently, since A and B are symmetric matrices, \lim_{T\to\infty}T\int_{0}^{1}\operatorname{Var}[\widehat{d}(u)]\,du=\operatorname{tr}(AB)=\frac{6}{\pi^{2}}\operatorname{tr}(I_{p})=\frac{6p}{\pi^{2}}.
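The constant \frac{\pi^{2}}{6} above rests on the identity \frac{1}{4\pi}\int_{-\pi}^{\pi}(\log|1-e^{i\lambda}|^{2})^{2}\,d\lambda=\frac{\pi^{2}}{6}, which can be verified numerically. The following sketch (plain NumPy; illustrative only, not part of the proof) uses a midpoint rule, which sidesteps the integrable \log^{2} singularity at \lambda=0:

```python
import numpy as np

# Midpoint rule: no node falls exactly on the integrable log^2 singularity at lambda = 0.
M = 2_000_000
lam = -np.pi + (np.arange(M) + 0.5) * (2 * np.pi / M)
integrand = np.log(2.0 - 2.0 * np.cos(lam)) ** 2      # (log|1 - e^{i lam}|^2)^2
val = np.sum(integrand) * (2 * np.pi / M) / (4 * np.pi)
print(val, np.pi**2 / 6)                              # both close to 1.6449
```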

## 4 Simulations

In order to gain some insight into the finite sample performance of the Whittle estimator discussed in Section 2, we report next a number of Monte Carlo experiments for the LSARFIMA model

 Y_{t,T}=\sigma(t/T)(1-\vartheta B)(1-B)^{-d(t/T)}\varepsilon_{t},

for t=1,\ldots,T, with d(u)=\alpha_{0}+\alpha_{1}u, \sigma(u)=\beta_{0}+\beta_{1}u and Gaussian white noise \{\varepsilon_{t}\} with unit variance. The samples of this LSARFIMA process are generated by means of the innovation algorithm; see, for example, Brockwell and Davis (1991), page 172. In this implementation, the covariances of the process \{Y_{t,T}\} are given by

 E[Y_{s,T}Y_{t,T}]=\sigma\biggl(\frac{s}{T}\biggr)\sigma\biggl(\frac{t}{T}\biggr)\frac{\Gamma[1-d(s/T)-d(t/T)]\Gamma[s-t+d(s/T)]}{\Gamma[1-d(s/T)]\Gamma[d(s/T)]\Gamma[s-t+1-d(t/T)]}
 \times\biggl[1+\vartheta^{2}-\vartheta\frac{s-t-d(t/T)}{s-t-1+d(s/T)}-\vartheta\frac{s-t+d(s/T)}{s-t+1-d(t/T)}\biggr],

for s,t=1,\ldots,T, s\geq t. Let \theta=(\alpha_{0},\alpha_{1},\beta_{0},\beta_{1},\vartheta)^{\prime} be the parameter vector. The Whittle estimates in these Monte Carlo simulations have been computed using the cosine bell data taper (9). Figure 1 displays the contour curves of the empirical mean squared error (MSE) of the Whittle estimator \widehat{\theta}, defined in this case as the average of \|\widehat{\theta}-\theta\|^{2} over 100 replications of \widehat{\theta}, where \theta is the true value of the parameter. These contour curves correspond to \theta=(0.20,0.25,0.5,0.3,0.5), for sample sizes T=512 and T=1024, respectively. In these graphs, the darkest regions represent the minimal empirical MSE, while lighter regions indicate greater MSE values. Note that for the case T=512, shown in the left panel, the minimal empirical MSE region is located around N\approx 105 and S\approx 35. For the sample size T=1024, displayed in the right panel, the minimal empirical MSE is reached close to N\approx 200 and S\approx 45. As noted in these graphs, there is a degree of flexibility in selecting N and S as long as they belong to the areas with minimal empirical MSE. Contour curves for other parameter values \theta, such as those presented in Tables 1 and 2, are similar to Figure 1 and produce similar empirically optimal regions for N and S.

Tables 1 and 2 report the results from the Monte Carlo simulations for several parameter values, based on 1000 replications. These tables show the average of the estimates as well as their theoretical and empirical standard deviations (SD). The theoretical SDs are based on Theorem 2.2 with matrix \Gamma_{\theta} given by

 \Gamma_{\theta}=\pmatrix{\Gamma_{\alpha}&0&\gamma_{\alpha\vartheta}\cr 0&\Gamma_{\beta}&0\cr\gamma_{\alpha\vartheta}^{\prime}&0&\gamma_{\vartheta}},

where \gamma_{\alpha\vartheta}=[\frac{\log(1-\vartheta)}{\vartheta},\frac{\log(1-\vartheta)}{2\vartheta}]^{\prime}, \gamma_{\vartheta}=\frac{1}{1-\vartheta^{2}}, and the matrices \Gamma_{\alpha} and \Gamma_{\beta} are given in Example 2.5. The bandwidth parameters N and S for each table are based on the values found in Figure 1 for \theta=(0.20,0.25,0.5,0.3,0.5). As mentioned above, these values are very similar for the other parameter choices reported in Tables 1 and 2. Observe from these tables that the estimated parameters are close to their true values. Moreover, the empirical standard deviations are close to their theoretical counterparts.

These simulations suggest that the finite sample performance of the proposed estimators is very good in terms of both bias and standard deviation, despite the fact that many of these simulations test the method with large values of the long-memory parameter, that is, values close to \frac{1}{2}. In Table 1, for example, for the combination \alpha_{0}=0.20, \alpha_{1}=0.25, the maximum value of d(u) is 0.45. Additional Monte Carlo experiments with other model specifications are reported in Palma and Olea (2010). Those simulations explore the empirical optimal selection of N and S and the finite sample performance of the Whittle estimators. Note, however, that further research is needed to establish the optimal selection of N and S from a theoretical perspective. A comparison of the performance of the Whittle method with the kernel maximum likelihood estimation approach proposed by Beran (2009), as well as two data illustrations, are also discussed in that paper.

## 5 Final remarks

A class of locally stationary long-memory processes has been addressed in this paper, which is capable of modeling nonstationary time series data exhibiting time-varying long-range dependence. A computationally efficient Whittle estimation method has been proposed and it has been shown that these estimators possess very desirable asymptotic properties such as consistency, normality and efficiency. Moreover, several Monte Carlo simulations indicate that the estimates perform well even for relatively small sample sizes.

## Appendix

This appendix contains nine auxiliary lemmas used to prove the theorems stated in Section 2 and the propositions stated in Section 3. Proofs of these results are provided in Palma and Olea (2010).

###### Lemma 1

Let f(u,\lambda) be a time-varying spectral density satisfying assumption A1 and let \phi\dvtx[0,1]\times[-\pi,\pi]\to\mathbb{R} be a function such that \phi(u,\lambda) is continuously differentiable in \lambda. Consider the function defined by

 g(u,\lambda)=\int_{-\pi}^{\pi}\phi(u,\lambda+\omega)f(u,\omega)\,d\omega,

and its Fourier coefficients \widehat{g}(u,k)=\int_{-\pi}^{\pi}g(u,\lambda)e^{-ik\lambda}\,d\lambda. Under assumption A1, for every u\in[0,1] we have that \lim_{n\to\infty}\sum_{k=-n}^{n}\widehat{g}(u,k)=2\pi g(u,0).
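As a concrete numerical illustration of Lemma 1 (a sketch under specific assumed choices, not part of the proof), take d(u)\equiv d=0.2, f(u,\omega)=|2\sin(\omega/2)|^{-2d} and \phi(u,\lambda)=\lambda^{2}. Then g(u,\lambda)=F_{0}\lambda^{2}+F_{2}, where F_{j}=\int_{-\pi}^{\pi}\omega^{j}f(u,\omega)\,d\omega (the odd term vanishes by symmetry), so \widehat{g}(u,0)=\frac{2\pi^{3}}{3}F_{0}+2\pi F_{2} and \widehat{g}(u,k)=\frac{4\pi(-1)^{k}}{k^{2}}F_{0} for k\neq 0. The partial sums of these coefficients then converge to 2\pi g(u,0)=2\pi F_{2}:

```python
import numpy as np

d = 0.2
M = 200_000
# midpoint grid in omega avoids the integrable |omega|^{-2d} singularity at 0
w = -np.pi + (np.arange(M) + 0.5) * (2 * np.pi / M)
f = np.abs(2.0 * np.sin(w / 2.0)) ** (-2.0 * d)
h = 2 * np.pi / M
F0 = np.sum(f) * h            # int f(omega) d omega
F2 = np.sum(w**2 * f) * h     # int omega^2 f(omega) d omega

# partial sum of the Fourier coefficients of g(lambda) = F0*lambda^2 + F2
n = 2000
k = np.arange(1, n + 1)
S = F0 * (2 * np.pi**3 / 3 + 8 * np.pi * np.sum((-1.0) ** k / k**2)) + 2 * np.pi * F2
print(S, 2 * np.pi * F2)      # partial sum approaches 2*pi*g(u,0) = 2*pi*F2
```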

###### Lemma 2

Consider the function \phi\dvtx[0,1]\times[-\pi,\pi]\to\mathbb{C}, such that \partial\phi(u,\gamma)/\partial u exists and |\partial\phi(u,\gamma)/\partial u|\leq K|\gamma|^{-2d(u)}, where 0\leq d(u)\leq d for all u\in[0,1]. Then, for any 0\leq t\leq N we have that

 H_{N}\biggl[\phi\biggl(\frac{\cdot}{T},\gamma\biggr)h\biggl(\frac{\cdot}{N}\biggr),\lambda\biggr]=\phi\biggl(\frac{t}{T},\gamma\biggr)H_{N}(\lambda)+\mathcal{O}\biggl[\frac{N}{T}|\gamma|^{-2d}L_{N}(\lambda)\biggr].
###### Lemma 3

Consider d_{1},d_{2}\in[0,1/2) and for any \ell\in\mathbb{Z} define the integral I(\ell)=\int_{1}^{\infty}[(x-1)^{-2d_{1}}-x^{-2d_{1}}]|\ell+x|^{d_{1}+d_{2}-1}\,dx. Then I(\ell)=\mathcal{O}(|\ell|^{d_{1}+d_{2}-1}).

###### Lemma 4

Let \phi(u,\lambda) be a positive function, symmetric in \lambda, such that \phi(u,\lambda)\geq C|\lambda|^{2d(u)}, for \lambda\in[-\pi,\pi], where d(u) is a positive bounded function for u\in[0,1] and C>0. Let Q(u) for u\in[0,1] be the matrix defined in (16). Then there exists K>0 such that X^{\prime}Q(u)^{-1}X\leq KX^{\prime}XN^{2d(u)}, for all vectors X\in\mathbb{R}^{N}.

###### Lemma 5

Let \phi(u,\lambda) be a positive function, symmetric in \lambda, such that \phi(u,\lambda)\geq C|\lambda|^{2d(u)}, for \lambda\in[-\pi,\pi], where d(u) is a positive bounded function for u\in[0,1] and C>0. Let Q(u) for u\in[0,1] and Q(\phi) be the matrices defined in (16). Then there exists K>0 such that

 |X^{\prime}[Q(\phi)^{-1}-Q(\varphi)]X|\leq KX^{\prime}XN^{2d+1/2},

where \varphi(u,\cdot)=\phi(u,\cdot)^{-1}/4\pi^{2}, d=\sup d(u)<\infty and X\in\mathbb{R}^{NM}.

###### Lemma 6

Let \phi(u,\lambda) be a positive function, symmetric in \lambda, such that \phi(u,\lambda)\geq C|\lambda|^{2d(u)}, for \lambda\in[-\pi,\pi], where d(u) is a positive bounded function for u\in[0,1] and C>0. Let Q(\phi) be the block-diagonal matrix defined in (16). Then there exists K>0 such that

 \sup_{X}\biggl|\frac{X^{\prime}RX}{X^{\prime}Q(\phi)^{-1}X}\biggr|\leq KMN^{1-2d}T^{2d-1},

where d=\sup d(u)<\frac{1}{2} and X\in\mathbb{R}^{NM}.

###### Lemma 7

Let f(\lambda) and \phi(\lambda) be two real-valued functions defined for \lambda\in[-\pi,\pi] with Fourier coefficients \widehat{f}(k) and \widehat{\phi}(k), respectively, satisfying |\widehat{f}(k)\widehat{\phi}(k)|\leq K/k^{2} for some positive constant K and |k|>0. Let C(N) be given by C(N)=\sum_{t=0}^{N-1}h^{2}(\frac{t}{N})\sum_{k=N-t}^{N-1}\widehat{f}(k)\widehat{\phi}(k) with a bounded data taper, |h(u)|<K for all u\in[0,1]. Then there exists a positive constant K such that |C(N)|\leq K\log^{2}N.

###### Lemma 8

Define D(N,T)=\frac{1}{N}\sum_{t=0}^{N-1}\sum_{k=N-t+1}^{N-1}\frac{\varphi(k)}{k^{2}-d^{2}}(\frac{t-N/2}{T}), where the function \varphi satisfies |\varphi(k)|<C\log N for all 0\leq k\leq N, N>1, and C is a positive constant. Then there exists a constant K>0 such that |D(N,T)|\leq K\frac{\log^{2}N}{T}.

###### Lemma 9

Let z\in[0,1+\delta] with 0<\delta<2 and 2\beta>2\alpha>0. Then the positive double integral I(z)=\int_{0}^{1}|z-x|^{\alpha-1}\int_{1}^{\infty}(y-x)^{-\beta}(y-z)^{\alpha-1}\,dy\,dx satisfies I(z)\leq K|1-z|^{2\alpha-\beta}.

## Acknowledgments

We are deeply thankful to the Associate Editor and two anonymous referees for their careful reading of the manuscript and for their constructive comments which led to substantial improvements.

## References

• Beran (2009) Beran, J. (2009). On parameter estimation for locally stationary long-memory processes. J. Statist. Plann. Inference 139 900–915. \MR2479836
• Brillinger (1981) Brillinger, D. R. (1981). Time Series: Data Analysis and Theory, 2nd ed. Holden-Day, Oakland, CA. \MR0595684
• Brockwell and Davis (1991) Brockwell, P. J. and Davis, R. A. (1991). Time Series: Theory and Methods, 2nd ed. Springer, New York. \MR1093459
• Chandler and Polonik (2006) Chandler, G. and Polonik, W. (2006). Discrimination of locally stationary time series based on the excess mass functional. J. Amer. Statist. Assoc. 101 240–253. \MR2268042
• Dahlhaus (1996) Dahlhaus, R. (1996). On the Kullback–Leibler information divergence of locally stationary processes. Stochastic Process. Appl. 62 139–168. \MR1388767
• Dahlhaus (1997) Dahlhaus, R. (1997). Fitting time series models to nonstationary processes. Ann. Statist. 25 1–37. \MR1429916
• Dahlhaus (2000) Dahlhaus, R. (2000). A likelihood approximation for locally stationary processes. Ann. Statist. 28 1762–1794. \MR1835040
• Dahlhaus and Polonik (2006) Dahlhaus, R. and Polonik, W. (2006). Nonparametric quasi-maximum likelihood estimation for Gaussian locally stationary processes. Ann. Statist. 34 2790–2824. \MR2329468
• Dahlhaus and Polonik (2009) Dahlhaus, R. and Polonik, W. (2009). Empirical spectral processes for locally stationary time series. Bernoulli 15 1–39. \MR2546797
• Doukhan, Oppenheim and Taqqu (2003) Doukhan, P., Oppenheim, G. and Taqqu, M. S., eds. (2003). Theory and Applications of Long-Range Dependence. Birkhäuser, Boston, MA. \MR1956041
• Fryzlewicz, Sapatinas and Subba Rao (2006) Fryzlewicz, P., Sapatinas, T. and Subba Rao, S. (2006). A Haar–Fisz technique for locally stationary volatility estimation. Biometrika 93 687–704. \MR2261451
• Genton and Perrin (2004) Genton, M. and Perrin, O. (2004). On a time deformation reducing nonstationary stochastic processes to local stationarity. J. Appl. Probab. 41 236–249. \MR2036285
• Gradshteyn and Ryzhik (2000) Gradshteyn, I. S. and Ryzhik, I. M. (2000). Table of Integrals, Series, and Products, 6th ed. Academic Press, San Diego, CA. \MR1773820
• Granger and Ding (1996) Granger, C. W. J. and Ding, Z. (1996). Varieties of long memory models. J. Econometrics 73 61–77. \MR1410001
• Guo et al. (2003) Guo, W., Dai, M., Ombao, H. C. and von Sachs, R. (2003). Smoothing spline ANOVA for time-dependent spectral analysis. J. Amer. Statist. Assoc. 98 643–652. \MR2011677
• Jensen and Whitcher (2000) Jensen, M. J. and Whitcher, B. (2000). Time-varying long memory in volatility: Detection and estimation with wavelets. Technical report, EURANDOM.
• Körner (1988) Körner, T. W. (1988). Fourier Analysis. Cambridge Univ. Press, Cambridge. \MR0924154
• Orbe, Ferreira and Rodriguez-Poo (2005) Orbe, S., Ferreira, E. and Rodriguez-Poo, J. (2005). Nonparametric estimation of time varying parameters under shape restrictions. J. Econometrics 126 53–77. \MR2118278
• Palma (2007) Palma, W. (2007). Long-Memory Time Series: Theory and Methods. Wiley, Hoboken, NJ. \MR2297359
• Palma and Olea (2010) Palma, W. and Olea, R. (2010). Supplement to “An efficient estimator for locally stationary Gaussian long-memory processes.” DOI: 10.1214/10-AOS812.
• Priestley (1965) Priestley, M. B. (1965). Evolutionary spectra and non-stationary processes. J. Roy. Statist. Soc. Ser. B 27 204–237. \MR0199886
• von Sachs and MacGibbon (2000) von Sachs, R. and MacGibbon, B. (2000). Non-parametric curve estimation by wavelet thresholding with locally stationary errors. Scand. J. Statist. 27 475–499. \MR1795776
• Whittle (1953) Whittle, P. (1953). Estimation and information in stationary time series. Ark. Mat. 2 423–434. \MR0060797