# Uniform convergence rates for a class of martingales with application in non-linear cointegrating regression

Qiying Wang (qiying.wang@sydney.edu.au) and Nigel Chan (chanhiungai@gmail.com)

School of Mathematics and Statistics, The University of Sydney, NSW 2006, Australia.

Manuscript dates: October 2011; September 2012.
###### Abstract

For a class of martingales, this paper provides a framework for uniform consistency with broad applicability. The main condition imposed relates only to the conditional variance of the martingale, and it holds for stationary mixing time series, stationary iterated random functions, Harris recurrent Markov chains and processes whose innovations form a linear process. Using the established results, this paper investigates the uniform convergence of the Nadaraya–Watson estimator in a non-linear cointegrating regression model. Our results provide not only a sharp convergence rate, but also the optimal range over which the uniform convergence holds. This paper also considers uniform upper and lower bound estimates for a functional of a Harris recurrent Markov chain, which are of independent interest.

Keywords: Harris recurrent Markov chain; martingale; non-linearity; non-parametric regression; non-stationarity; uniform convergence

Volume 20, issue 1 (2014), pages 207–230. DOI: 10.3150/12-BEJ482

## 1 Introduction

Let $(u_k, x_k)$, with $x_k \in \mathbb{R}^d$ and $d \ge 1$, be a sequence of random vectors. A common functional of interest, $S_n(x)$, of $(u_k, x_k)$ is defined by

$$S_n(x)=\sum_{k=1}^{n}u_k f[(x_k+x)/h],\qquad x\in\mathbb{R}^d, \tag{1}$$

where $h$ is a certain sequence of positive constants and $f$ is a real function on $\mathbb{R}^d$. Such functionals arise in non-parametric estimation problems, where $f$ may be a kernel function or a squared kernel function and the sequence $h$ is the bandwidth used in the non-parametric regression.
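As an illustration of the functional (1), the following minimal sketch evaluates $S_n(x)$ numerically in the scalar case $d=1$. The function names `gaussian_kernel` and `s_n`, and the choice of a Gaussian density for $f$, are assumptions made for this example only and are not part of the paper.

```python
import math


def gaussian_kernel(z):
    """Standard Gaussian density, used here as one possible choice of f."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def s_n(x, u, xs, h, f=gaussian_kernel):
    """Evaluate S_n(x) = sum_k u_k * f((x_k + x) / h) for scalar x_k (d = 1)."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    if len(u) != len(xs):
        raise ValueError("u and xs must have the same length")
    return sum(u_k * f((x_k + x) / h) for u_k, x_k in zip(u, xs))


if __name__ == "__main__":
    # Three observations (u_k, x_k), evaluated at x = 0 with bandwidth h = 1.
    print(s_n(0.0, [1.0, -1.0, 2.0], [0.0, 1.0, -1.0], 1.0))
```

Passing a squared kernel as `f` gives the conditional-variance-type sums that appear in the assumptions of Section 2.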

The uniform convergence of $S_n(x)$ when the $(u_k, x_k)$ satisfy certain stationarity conditions has been studied in many articles. Liero liero (), Peligrad peligrad () and Nze and Doukhan angonze () considered the uniform convergence over a fixed compact set, while Masry masry (), Bosq bosq () and Fan and Yao fan () gave uniform results over an unbounded set. These works mainly focus on random sequences that satisfy different types of mixing conditions. Investigating a more general framework, Andrews andrew () gave a result on kernel estimates when the data sequence is near-epoch dependent on another underlying mixing sequence. More recently, Hansen hansen () provided a set of general uniform consistency results, allowing for stationary strong mixing multivariate data with infinite support, kernels with unbounded support and general bandwidth sequences. Kristensen kristensen () further extended Hansen’s results to the heterogeneous dependent case under a mixing condition. Also see Wu, Huang and Huang wu1 () for kernel estimation in general time series settings.

In comparison to the extensive results where the $x_k$ come from stationary time series data, there is little investigation of the uniform convergence of $S_n(x)$ when the $x_k$ form a non-stationary time series. In this regard, Gao, Li and Tjøstheim gao3 () derived strong and weak consistency results for the case where the $x_k$ is a null-recurrent Markov chain. Wang and Wang wangwang () worked with partial sum processes whose increments form a general linear process. While the rate of convergence in Gao, Li and Tjøstheim gao3 () is sharp, they impose independence between $u_k$ and $x_k$. Using a quite different method, Wang and Wang wangwang () allowed for endogeneity between $u_k$ and $x_k$, but their results hold only for $x$ in a fixed compact set.

The aim of this paper is to present a general uniform consistency result for $S_n(x)$ with broad applicability. As a framework, our assumption on the $x_k$ relates only to the conditional variance of the martingale, that is, $\sum_{k=1}^{n}f^2[(x_k+x)/h]$. See Assumption 2.3 in Section 2. This is of course a “high level” condition, but it is in fact quite natural and holds for many interesting and important examples, including stationary mixing time series, stationary iterated random functions and Harris recurrent Markov chains. See Sections 2.2 and 2.3 for the identification of Assumption 2.3. This condition also holds for processes whose innovations form a linear process, but the identification is complicated and requires quite different techniques. We will report related work in a separate paper. By using the established result, we investigate the uniform convergence of the Nadaraya–Watson estimator in a non-linear cointegrating regression model. It confirms that the uniform asymptotics in Wang and Wang wangwang () can be extended to an unbounded set and that the independence between the $u_k$ and $x_k$ in Gao, Li and Tjøstheim gao3 () can be removed. More importantly, our result provides not only a sharp convergence rate, but also the optimal range over which the uniform convergence holds. It should be mentioned that our work on the uniform upper and lower bound estimation for a functional of a Harris recurrent Markov chain is of independent interest.

This paper is organized as follows. Our main results are presented in the next section, which establishes a framework for the uniform convergence of a class of martingales and provides uniform upper and lower bound estimates for a functional of a Harris recurrent Markov chain. An application of the main results to non-linear cointegrating regression is given in Section 3. All proofs are postponed to Section 4. Throughout the paper, we denote constants by which may be different at each appearance. We also use the notation .

## 2 Main results

### 2.1 Uniform convergence for a class of martingales

We make use of the following assumptions in the development of uniform convergence for the $S_n(x)$ defined by (1). Recall $x_t\in\mathbb{R}^d$ where $d\ge 1$ is an integer.

Assumption 2.1. is a martingale difference, where , satisfying , a.s., for some $p$ specified in Assumption 2.4 below.

Assumption 2.2. $f$ is a real function on $\mathbb{R}^d$ satisfying and for all and some constant .

Assumption 2.3. There exist positive constant sequences $c_n$ and $b_n$ with for some such that

$$\sup_{\|x\|\le b_n}\sum_{t=1}^{n}f^2[(x_t+x)/h]=O_P(c_n). \tag{2}$$

Assumption 2.4. and , where $p$ is defined as in Assumption 2.1 and $c_n$ is defined as in Assumption 2.3.

We remark that Assumption 2.1 ensures that $S_n(x)$ is a martingale for each fixed $x$ and is quite weak. Clearly, Assumption 2.1 is satisfied if $u_t$ is a sequence of i.i.d. random variables, which is independent of $x_t$, with and . The Lipschitz condition used in Assumption 2.2 is standard in the investigation of uniform consistency, where we do not require the $f$ to have compact support. Assumption 2.3 is a “high level” condition for the $x_t$. We use it here to provide a framework. In Sections 2.2 and 2.3, we will show that this condition is in fact quite natural and holds for many interesting and important examples. Assumption 2.4 provides the connections among the moment condition required in Assumption 2.1, the condition (2) and the bandwidth $h$. In many applications, we have , where and is a slowly varying function at infinity. See Section 2.3 and the examples in Section 2.2. In the typical situation that , if there exists a such that , the $p$ required in Assumption 2.1 can be specified to .

We have the following main result.

###### Theorem 2.1

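Under Assumptions 2.1–2.4, we have

$$\sup_{\|x\|\le b_n}\Bigl|\sum_{t=1}^{n}u_t f[(x_t+x)/h]\Bigr|=O_P[(c_n\log n)^{1/2}]. \tag{3}$$

If (2) is replaced by

$$\sup_{\|x\|\le b_n}\sum_{t=1}^{n}f^2[(x_t+x)/h]=O(c_n),\quad\text{a.s.}, \tag{4}$$

the result (3) can be strengthened to

$$\sup_{\|x\|\le b_n}\Bigl|\sum_{t=1}^{n}u_t f[(x_t+x)/h]\Bigr|=O[(c_n\log n)^{1/2}],\quad\text{a.s.} \tag{5}$$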
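The statement (3) can be explored numerically. The sketch below is only an illustration under assumptions that are not taken from the paper: $d=1$, a triangular function $f$ (bounded and Lipschitz), i.i.d. standard normal $x_t$ and $u_t$ that are independent of each other, the choice $c_n=nh$, and a finite grid in place of the supremum over $\|x\|\le b_n$. All function names are hypothetical.

```python
import math
import random


def triangular(z):
    """Bounded, Lipschitz function f(z) = max(0, 1 - |z|)."""
    return max(0.0, 1.0 - abs(z))


def sup_abs_sum(us, xs, h, grid):
    """Maximum over the grid of |sum_t u_t f((x_t + x) / h)|."""
    return max(
        abs(sum(u * triangular((x_t + x) / h) for u, x_t in zip(us, xs)))
        for x in grid
    )


def normalised_sup(n, h, b, grid_size, seed=0):
    """Simulate the left side of (3) on a grid and divide by (n h log n)^{1/2}."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]  # stationary regressor (assumed)
    us = [rng.gauss(0.0, 1.0) for _ in range(n)]  # i.i.d. errors, independent of xs
    grid = [-b + 2.0 * b * j / (grid_size - 1) for j in range(grid_size)]
    return sup_abs_sum(us, xs, h, grid) / math.sqrt(n * h * math.log(n))


if __name__ == "__main__":
    for n in (200, 800, 3200):
        print(n, normalised_sup(n, h=n ** -0.2, b=2.0, grid_size=41))
```

If the rate in (3) is of the right order, the printed ratios should remain of moderate size as $n$ grows; a simulation of this kind is only suggestive and does not verify the theorem.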

Theorem 2.1 can be extended to uniform convergence for the $S_n(x)$ over the unrestricted space $\mathbb{R}^d$. This requires an additional condition on the $x_t$ and tail decay for the function $f$.

###### Theorem 2.2

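In addition to Assumptions 2.1–2.4, and there exists a $k_0$ such that

$$b_n^{-k_0}\sum_{t=1}^{n}E\|x_t\|^{k_0}=O[(c_n\log n)^{1/2}]. \tag{6}$$

Then,

$$\sup_{x\in\mathbb{R}^d}\Bigl|\sum_{t=1}^{n}u_t f[(x_t+x)/h]\Bigr|=O_P[(c_n\log n)^{1/2}]. \tag{7}$$

Similarly, if (2) is replaced by (4) and (6) is replaced by

$$b_n^{-k_0}\sum_{t=1}^{n}\|x_t\|^{k_0}=O[(c_n\log n)^{1/2}],\quad\text{a.s.}, \tag{8}$$

then

$$\sup_{x\in\mathbb{R}^d}\Bigl|\sum_{t=1}^{n}u_t f[(x_t+x)/h]\Bigr|=O[(c_n\log n)^{1/2}],\quad\text{a.s.} \tag{9}$$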
Remark. Theorems 2.1 and 2.2 allow for the $x_t$ to be a stationary or non-stationary time series. See the examples in Section 2.2 and Section 2.3 below. More examples on non-stationary time series will be reported in a separate paper. The rates of convergence in both theorems are sharp. For instance, in well-known stationary situations such as those appearing in the examples of Section 2.2, the $c_n$ can be chosen as $c_n=nh$. Hence, when there are enough moment conditions on the $u_t$ (i.e., $p$ is large enough), we obtain the optimal rate , by taking . In the non-stationary situation, the rate of convergence is different. In particular, we have $c_n=\sqrt{n}h$ for the $x_t$ being a random walk as given in Corollary 2.1. The reason behind this fact is that the amount of time spent by the random walk around any particular point is of order $\sqrt{n}$ rather than $n$ as for a stationary time series. For more explanation in this regard, we refer to Wang and Phillips wangphillips1 (), wangphillips2 ().
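The contrast between the stationary and the random walk case can be illustrated by comparing the sum $\sum_{t=1}^{n}f^2[(x_t+x)/h]$ at $x=0$ with the normalisations $nh$ and $\sqrt{n}h$. The sketch below is a hypothetical experiment, not taken from the paper: it assumes a triangular $f$, Gaussian increments for the random walk (so that the step distribution has a density) and an i.i.d. standard normal sequence as the stationary benchmark.

```python
import math
import random


def triangular(z):
    """Bounded, Lipschitz function f(z) = max(0, 1 - |z|)."""
    return max(0.0, 1.0 - abs(z))


def occupation_sum(xs, x, h):
    """Compute sum_t f^2((x_t + x) / h), the quantity bounded in (2)."""
    return sum(triangular((x_t + x) / h) ** 2 for x_t in xs)


def gaussian_random_walk(n, rng):
    """Partial sums of i.i.d. N(0, 1) increments (assumed step distribution)."""
    path, position = [], 0.0
    for _ in range(n):
        position += rng.gauss(0.0, 1.0)
        path.append(position)
    return path


def compare_normalisations(n, h, seed=0):
    """Return the occupation sums scaled by sqrt(n) h (walk) and n h (stationary)."""
    rng = random.Random(seed)
    walk = gaussian_random_walk(n, rng)
    stationary = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return {
        "random_walk": occupation_sum(walk, 0.0, h) / (math.sqrt(n) * h),
        "stationary": occupation_sum(stationary, 0.0, h) / (n * h),
    }


if __name__ == "__main__":
    for n in (1000, 4000, 16000):
        print(n, compare_normalisations(n, h=0.2, seed=n))
```

For the random walk the scaled value is itself random in the limit (compare the local time limit discussed in Section 2.3), so it fluctuates across seeds, whereas the stationary value should settle near a constant.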

### 2.2 Identifications of Assumption 2.3

This section provides several stationary time series examples which satisfy Assumption 2.3. The first two examples come from Wu, Huang and Huang wu1 (), where more general settings on the $x_t$ are established. The third example discusses a strongly mixing time series and comes from Hansen hansen (). By making use of other related works such as Peligrad peligrad (), Nze and Doukhan angonze (), Masry masry (), Bosq bosq () and Andrews andrew (), similar results can be established for other mixing time series like -mixing and near-epoch-dependent time series. In these examples, we only consider the situation that $d=1$. The extension to $d>1$ is straightforward and hence the details are omitted. Throughout these examples, we use the notation .

An example on Harris recurrent Markov chains, which allows for stationary (positive recurrent) or non-stationary (null recurrent) series, is given in Section 2.3. In that section, we also consider the uniform lower bound, which is of independent interest. More examples on processes whose innovations are linear processes will be reported in a separate paper.

Example. Let $x_t$ be a linear process defined by

$$x_t=\sum_{k=0}^{\infty}\phi_k\varepsilon_{t-k},$$

where $\varepsilon_t$ is a sequence of i.i.d. random variables with and a density $p_\varepsilon$ satisfying and

$$\int_{\mathbb{R}}\bigl|p_\varepsilon^{(r)}(x)\bigr|^2\,dx<\infty,\qquad r=0,1,2,$$

where $p_\varepsilon^{(r)}$ denotes the $r$-order derivative of $p_\varepsilon$. Suppose that and , and, in addition to Assumption 2.2, $f$ has a compact support. It follows from Section 4.1 of Wu, Huang and Huang wu1 () that, for any and ,

$$\sup_{x\in\mathbb{R}}\Bigl|\frac{1}{n}\sum_{t=1}^{n}\bigl[f_h^2(x_t+x)-Ef_h^2(x_t+x)\bigr]\Bigr|=O\Bigl[\sqrt{\frac{\log n}{nh}}+n^{-1/2}l(n)\Bigr],\quad\text{a.s.}, \tag{10}$$

where $l(n)$ is a slowly varying function. Note that $x_t$ is a stationary process with a bounded density under the given conditions on $\varepsilon_t$. Simple calculations show that

$$\sup_{x\in\mathbb{R}}\sum_{t=1}^{n}f^2[(x_t+x)/h]=O_P(nh), \tag{11}$$

that is, $x_t$ satisfies Assumption 2.3.

Example. Consider the non-linear time series of the following form

$$x_k=R(x_{k-1},\varepsilon_k),$$

where $R$ is a bivariate measurable function and the $\varepsilon_k$ are i.i.d. innovations. This is the iterated random function framework that encompasses many popular non-linear time series models. For example, if , it is the threshold autoregressive (TAR) model (see Tong tong ()). If , then it is the autoregressive conditional heteroscedasticity (ARCH) model. Other non-linear time series models, including the random coefficient model, the bilinear autoregressive model and the exponential autoregressive model, can be fitted into this framework similarly. See Wu and Shao wu2 () for details.
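The recursion $x_k=R(x_{k-1},\varepsilon_k)$ is straightforward to simulate. The following minimal sketch uses a first-order threshold map of the form $R(x,\varepsilon)=\theta_1\max(x,0)+\theta_2\max(-x,0)+\varepsilon$; this specific form, the parameter values and the function names are assumptions chosen for illustration and are not taken from the text above.

```python
import random


def tar_map(x, eps, theta_pos=0.5, theta_neg=-0.3):
    """Hypothetical TAR(1) map R(x, eps) = theta_pos*max(x,0) + theta_neg*max(-x,0) + eps."""
    return theta_pos * max(x, 0.0) + theta_neg * max(-x, 0.0) + eps


def iterate_random_function(R, x0, innovations):
    """Return the path x_1, ..., x_n generated by x_k = R(x_{k-1}, eps_k)."""
    path = []
    x = x0
    for eps in innovations:
        x = R(x, eps)
        path.append(x)
    return path


if __name__ == "__main__":
    rng = random.Random(0)
    innovations = [rng.gauss(0.0, 1.0) for _ in range(10)]
    print(iterate_random_function(tar_map, 0.0, innovations))
```

Any other model in this framework can be simulated by passing a different two-argument function as `R`.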

In order to identify Assumption 2.3, we need some regularity conditions on the initial distribution of $x_0$ and the function $R$. Define

$$L_\varepsilon=\sup_{x\ne x'}\frac{|R(x,\varepsilon)-R(x',\varepsilon)|}{|x-x'|}. \tag{12}$$

Denote by the conditional density of at given . Further let and

 (13)

and can be interpreted as a prediction sensitivity measure. These quantities measure the change in the 1-step predictive distribution of with respect to a change in the initial value . Suppose that:

there exist and such that

;

in addition to Assumption 2.2, $f$ has a compact support.

It follows from Section 4.2 of Wu, Huang and Huang wu1 () that, for any and

$$\sup_{x\in\mathbb{R}}\Bigl|\frac{1}{n}\sum_{t=1}^{n}\bigl[f_h^2(x_t+x)-Ef_h^2(x_t+x)\bigr]\Bigr|=O\Bigl[\sqrt{\frac{\log n}{nh}}+n^{-1/2}l(n)\Bigr],\quad\text{a.s.}, \tag{14}$$

where $l(n)$ is a slowly varying function. Note that $x_t$ has a unique stationary distribution under the given conditions (i) and (ii). See Diaconis and Freedman diaconis (), for instance. Simple calculations show that

$$\sup_{x\in\mathbb{R}}\sum_{t=1}^{n}f^2[(x_t+x)/h]=O_P(nh), \tag{15}$$

that is, $x_t$ satisfies Assumption 2.3.

Example. Let $x_t$ be a strictly stationary time series with density . Suppose that:

(i) $x_t$ is strongly mixing with mixing coefficients that satisfy where and ;

(ii) for some satisfying and there is some such that for all , where is the joint density of ;

(iii) in addition to Assumption 2.2, $f$ has a compact support.

It follows from Theorem 4 (with ) of Hansen hansen () that, for any and with ,

$$\sup_{x\in\mathbb{R}}\Bigl|\frac{1}{n}\sum_{t=1}^{n}\bigl[f_h^2(x_t+x)-Ef_h^2(x_t+x)\bigr]\Bigr|=O_P\Bigl[\sqrt{\frac{\log n}{nh}}\Bigr]. \tag{16}$$

If in addition , the result (16) can be strengthened to almost sure convergence. Simple calculations show that

$$\sup_{x\in\mathbb{R}}\sum_{t=1}^{n}f^2[(x_t+x)/h]=O_P(nh), \tag{17}$$

that is, $x_t$ satisfies Assumption 2.3.

### 2.3 Uniform bounds for functionals of Harris recurrent Markov chain

Let $\{x_k\}$ be a Harris recurrent Markov chain with state space , transition probability $P$ and invariant measure $\pi$. We denote for the Markovian probability with the initial distribution , for the corresponding expectation and $P^k$ for the $k$-step transition of $\{x_k\}$. A subset $D$ of $E$ with is called a $D$-set of $\{x_k\}$ if for any ,

$$\sup_{x\in E}E_x\Bigl(\sum_{k=1}^{\tau_A}I_D(x_k)\Bigr)<\infty,$$

where and . As is well known, $D$-sets not only exist, but generate the entire sigma , and for any $D$-sets $C$, $D$ and any probability measures $\nu$, $\mu$ on ,

$$\lim_{n\to\infty}\sum_{k=1}^{n}\nu P^k(C)\Big/\sum_{k=1}^{n}\mu P^k(D)=\frac{\pi(C)}{\pi(D)}, \tag{18}$$

where . See Nummelin nummelin (), for instance.

Let a $D$-set $D$ and a probability measure $\nu$ on be fixed. Define

$$a(t)=\pi^{-1}(D)\sum_{k=1}^{[t]}\nu P^k(D),\qquad t\ge 0.$$

By recurrence, $a(t)\to\infty$. By virtue of (18), the asymptotic order of $a(t)$ depends only on . As in Chen chen (), a Harris recurrent Markov chain is called $\beta$-regular if

$$\lim_{\lambda\to\infty}a(\lambda t)/a(\lambda)=t^{\beta}\qquad\forall t>0, \tag{19}$$

where $0<\beta\le 1$. It is interesting to notice that, under the condition (19), the function $a(t)$ is regularly varying at infinity, that is, there exists a slowly varying function $l(t)$ such that $a(t)=t^{\beta}l(t)$. This implies that the definition of a $\beta$-regular Harris recurrent Markov chain is similar to that of $\beta$-null recurrent given in Karlsen and Tjøstheim karlsen1 () and Gao, Li and Tjøstheim gao3 (), but it is more natural and simple.
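The regular variation condition (19) can be checked numerically for a concrete candidate function. The sketch below assumes the hypothetical choice $a(t)=t^{\beta}\log t$ with $\beta=1/2$, which is regularly varying with slowly varying part $\log t$; it is an illustration only and does not correspond to any particular Markov chain.

```python
import math


def a(t, beta=0.5):
    """Hypothetical a(t) = t**beta * log(t), regularly varying with index beta."""
    return t ** beta * math.log(t)


def regular_variation_ratio(lam, t, beta=0.5):
    """Return a(lam * t) / a(lam), which should approach t**beta as lam grows."""
    return a(lam * t, beta) / a(lam, beta)


if __name__ == "__main__":
    t = 4.0
    for lam in (1e3, 1e12, 1e50, 1e200):
        print(lam, regular_variation_ratio(lam, t), t ** 0.5)
```

The convergence is slow here because the slowly varying factor is logarithmic, which is typical when the slowly varying part is not constant.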

The following theorem provides uniform upper and lower bounds for a functional of $x_k$. The upper bound implies that $x_k$ satisfies Assumption 2.3, allowing for the $x_k$ being stationary ($\beta=1$, positive recurrent Markov chain) and non-stationary ($0<\beta<1$, null recurrent Markov chain). The lower bound plays a key role in the investigation of the uniform consistency for the kernel estimator in a non-linear cointegrating regression, and hence is of independent interest. See Section 3 for more details. Both upper and lower bounds are optimal, as detailed in the remarks following Theorem 2.3 and Corollary 2.1.

###### Theorem 2.3

Suppose that:

(i) $\{x_k\}$ is a $\beta$-regular Harris recurrent Markov chain, where the invariant measure $\pi$ has a bounded density function on ;

(ii) in addition to Assumption 2.2, .

Then, for any $h$ satisfying for some , we have

$$\sup_{|x|\le n^{m}}\sum_{k=1}^{n}f^2[(x_k+x)/h]=O_P[a(n)h], \tag{20}$$

where $m$ can be any finite integer.

For a given sequence of constants $b_n$, if there exists a constant $C_0$ such that, uniformly for $n$ large enough,

$$\inf_{|x|\le b_n+1}\sum_{k=1}^{n}Ef^2[(x_k+x)/h]\ge a(n)h/C_0, \tag{21}$$

then, for any $h$ satisfying for some , we have

$$\Bigl\{\inf_{|x|\le b_n}\sum_{k=1}^{n}f^2[(x_k+x)/h]\Bigr\}^{-1}=O_P\{[a(n)h]^{-1}\}. \tag{22}$$
Remark. The result (22) implies that, for any $\eta$, there exists a constant $C_\eta$ such that

$$P\Bigl(\inf_{|x|\le b_n}\sum_{k=1}^{n}f^2[(x_k+x)/h]\ge a(n)h/C_\eta\Bigr)\ge 1-\eta. \tag{23}$$

This makes both bounds in (20) and (22) optimal. On the other hand, since the result (23) implies that

$$E\inf_{|x|\le b_n}\sum_{k=1}^{n}f^2[(x_k+x)/h]\ge a(n)h(1-\eta)/C_\eta$$

for any $\eta$, the condition (21) is close to minimal.

Note that a random walk is a $1/2$-regular Harris recurrent Markov chain. The following corollary on a random walk shows that the range can be taken to be optimal as well.

###### Corollary 2.1

Let be a sequence of i.i.d. random variables with , and the characteristic function of satisfying . Write . If in addition to Assumption 2.2, , then, for and where , we have

$$\sup_{|x|\le n^{m}}\sum_{k=1}^{n}f^2[(x_k+x)/h]=O_P(\sqrt{n}h) \tag{24}$$

for any integer $m$, and

$$\Bigl\{\inf_{|x|\le\tau_n\sqrt{n}}\sum_{k=1}^{n}f^2[(x_k+x)/h]\Bigr\}^{-1}=O_P\{(\sqrt{n}h)^{-1}\} \tag{25}$$

for any .

Remark. For a random walk defined as in Corollary 2.1, it was shown in Wang and Phillips wangphillips1 () that

$$\frac{1}{\sqrt{n}h}\sum_{t=1}^{n}f^2[(x_t+y_n)/h]\to_D\int f^2(s)\,ds\,L_W(1,y), \tag{26}$$

where $L_W$ is a local time of a Brownian motion $W$, and if and if . Since for any , it follows from (26) that the range in (25) cannot be extended to for any .

Remark. As in the examples of Section 2.2, we may obtain a better result if $x_t$ is stationary (positive recurrent) and satisfies certain other restrictive conditions. Indeed, Kristensen kristensen () provided such a result.

Let $\{x_t\}$ be a time-homogeneous, geometrically ergodic Markov chain. Denote the 1-step transition probability by , such that . Also denote the -step transition probability by , such that . Since is geometrically ergodic, it has a density . Further suppose that:

(strong Doeblin condition) there exists and such that for all ,

$$p^{s}(y\mid x)\ge\rho g(y); \tag{27}$$

exists and is uniformly continuous for all , for some ,

for some ,

in addition to Assumption 2.1, has a compact support. It follows from Kristensen kristensen () that, for any and ,

 (28)

which yields (20) with and (22) with and , where is a constant such that .

Remark. It is much more complicated if $x_t$ is a null recurrent Markov chain, even in the simple situation that $x_t$ is a random walk defined as in Corollary 2.1. In this regard, we have (26), but it is not clear at the moment if it is possible to establish a result like

$$\sup_{|x|\le b_n}\Bigl|\frac{1}{\sqrt{n}h}\sum_{t=1}^{n}f^2[(x_t+x)/h]-\int f^2(s)\,ds\,L_W(1,x)\Bigr|=O_P(c_n) \tag{29}$$

for some and . Note that (29) implies that

$$\frac{1}{\sqrt{n}h}\sum_{t=1}^{n}f^2[(x_t+y)/h]\to_P\int f^2(s)\,ds\,L_W(1,0) \tag{30}$$

for any fixed $y$. This is a stronger convergence than that given in (26). Our experience suggests that it might not be possible to prove (29) without enlarging the probability space that hosts the $x_t$.

## 3 Applications in non-linear cointegrating regression

Consider a non-linear cointegrating regression model:

$$y_t=m(x_t)+u_t,\qquad t=1,2,\ldots,n, \tag{31}$$

where $u_t$ is a stationary error process and $x_t$ is a non-stationary regressor. Let $K(x)$ be a non-negative real function and set $K_h(s)=h^{-1}K(s/h)$, where $h$ is the bandwidth. The conventional kernel estimate of $m(x)$ in model (31) is given by

$$\hat m(x)=\frac{\sum_{t=1}^{n}y_tK_h(x_t-x)}{\sum_{t=1}^{n}K_h(x_t-x)}. \tag{32}$$
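The estimator (32) can be computed directly. The following is a minimal sketch for scalar regressors, assuming $K_h(s)=h^{-1}K(s/h)$ so that the factor $h^{-1}$ cancels between numerator and denominator. The function names and the default Gaussian-shaped kernel are illustrative assumptions rather than choices made in the paper.

```python
import math


def gaussian_shape(z):
    """Unnormalised Gaussian kernel; constants cancel in the ratio (32)."""
    return math.exp(-0.5 * z * z)


def nadaraya_watson(x, xs, ys, h, K=gaussian_shape):
    """Kernel estimate of m(x): sum_t y_t K((x_t - x)/h) / sum_t K((x_t - x)/h)."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    weights = [K((x_t - x) / h) for x_t in xs]
    denominator = sum(weights)
    if denominator == 0.0:
        return float("nan")  # no observation carries weight near x
    return sum(w * y for w, y in zip(weights, ys)) / denominator


if __name__ == "__main__":
    xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
    ys = [x_t ** 2 for x_t in xs]  # noiseless responses from m(x) = x^2
    print(nadaraya_watson(0.0, xs, ys, h=0.5))
```

Returning `nan` when the denominator vanishes is a design choice for this sketch; with a compactly supported kernel this can occur at points the regressor never visits, which is why a lower bound such as (22) matters for uniform results.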

The point-wise limit behavior of $\hat m(x)$ has been investigated by many authors. Among them, Karlsen, Myklebust and Tjøstheim karlsen2 () discussed the situation where $x_t$ is a recurrent Markov chain. Wang and Phillips wangphillips2 (), wangphillips3 () and Cai, Li and Park cai () considered an alternative treatment by making use of local time limit theory and, instead of recurrent Markov chains, worked with partial sum representations whose increments form a general linear process. In another paper, Wang and Phillips wangphillips2 () considered the errors $u_t$ to be serially dependent and cross correlated with the regressor $x_t$ for small lags. For other related works, we refer to Kasparis and Phillips kasparis (), Park and Phillips park1 (), park2 (), Gao et al. gao1 (), gao2 (), Marmer marmer (), Chen, Li and Zhang chenetal (), Wang and Phillips wangphillips4 () and Wang wang ().

This section provides a uniform convergence result for the $\hat m(x)$ by making direct use of Theorems 2.1 and 2.3 in developing the asymptotics. For reading convenience, we list the assumptions as follows.

Assumption 3.1. (i) $\{x_t\}$ is a $\beta$-regular Harris recurrent Markov chain defined as in Section 2.3, where the invariant measure $\pi$ has a bounded density function on ; (ii) is a martingale difference, where , satisfying , where for some .

Assumption 3.2. The kernel $K$ satisfies that , and for any ,

$$|K(x)-K(y)|\le C|x-y|.$$

Assumption 3.3. There exists a real positive function $g(x)$ such that

$$|m(y)-m(x)|\le C|y-x|^{\alpha}g(x),$$

uniformly for some and any , where can be chosen sufficiently small and .

Assumption 3.1 is similar to, but weaker than, those appearing in Karlsen, Myklebust and Tjøstheim karlsen2 (), where the authors considered point-wise convergence in distribution.

Assumption 3.2 is a standard condition on $K$ as in the stationary situation. The Lipschitz condition on $K$ is not necessary if we only investigate the point-wise asymptotics. See the remarks following Theorem 3.1 for further details.

Assumption 3.3 requires a Lipschitz-type condition in a small neighborhood of the targeted set for the functionals to be estimated. This condition is quite weak and accommodates a wide set of functionals. Typical examples include that ; ; ; .

We have the following asymptotic results.

###### Theorem 3.1

Suppose Assumptions 3.1–3.3 hold, and where is given as in Assumption 3.1. It follows that

$$\sup_{|x|\le b_n'}\bigl|\hat m(x)-m(x)\bigr|=O_P\bigl\{[a(n)h]^{-1/2}\log^{1/2}n+h^{\alpha}\delta_n\bigr\}, \tag{33}$$

where , and $b_n$ satisfies that

$$\inf_{|x|\le b_n+1}\sum_{k=1}^{n}EK[(x_k+x)/h]\ge a(n)h/C_0$$

for some $C_0$ and all $n$ sufficiently large. In particular, for the random walk defined as in Corollary 2.1, we have

$$\sup_{|x|\le b_n'}\bigl|\hat m(x)-m(x)\bigr|=O_P\bigl\{(nh^2)^{-1/4}\log^{1/2}n+h^{\alpha}\delta_n\bigr\}, \tag{34}$$

where for some and .

Remark. When a high moment exists on the error $u_t$, the can be chosen sufficiently small so that there are more bandwidth choices in practice. It is understandable that the results (33) and (34) are meaningful only if , which depends on the tail of the unknown regression function $m(x)$, the bandwidth $h$ and the range . When $m(x)$ has a light tail such as , may be bounded by a constant. In this situation, the in (34) can be chosen to be for some . In contrast to Theorem 2.3 and the remark following Corollary 2.1, this kind of range might be optimal, that is, the cannot be improved to , for any , to establish the same rate of convergence as in (34).

Remark. Both results (33) and (34) are sharp. However, a better result can be obtained if we are only interested in the point-wise asymptotics for $\hat m(x)$. For instance, as in Wang and Phillips wangphillips1 (), wangphillips2 () with minor modification, we may show that, for each $x$,

$$\hat m(x)-m(x)=O_P\bigl\{(nh^2)^{-1/4}+h^{\alpha}\bigr\}, \tag{35}$$

whenever $x_t$ is a random walk defined as in Corollary 2.1. Furthermore, $\hat m(x)$ has an asymptotic distribution that is mixed normal, under minor additional conditions. For more details, we refer to Wang and Phillips wangphillips1 (), wangphillips2 ().

Remark. Wang and Wang wangwang () established a similar result to (34) with the $x_t$ being a partial sum of a linear process, but only for $x$ in a compact set and under a boundedness condition on . The setting on the $x_t$ in this paper is similar to that given in Gao, Li and Tjøstheim gao3 (), but our result provides the optimal range over which the uniform convergence holds and removes the independence between the error $u_t$ and $x_t$ required by Gao, Li and Tjøstheim gao3 ().

## 4 Proofs of main results

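Proof of Theorem 2.1. We split the set $\{x:\|x\|\le b_n\}$ into balls of the form

$$A_{nj}=\{x:\|x-y_j\|\le 1/m_n'\},$$

where , and are chosen so that . It follows that

$$\begin{aligned}
\sup_{\|x\|\le b_n}\Bigl|\sum_{t=1}^{n}u_t f[(x_t+x)/h]\Bigr|
&\le\max_{0\le j\le m_n}\sup_{x\in A_{nj}}\sum_{t=1}^{n}|u_t|\,\bigl|f[(x_t+x)/h]-f[(x_t+y_j)/h]\bigr|\\
&\quad+\max_{0\le j\le m_n}\Bigl|\sum_{t=1}^{n}u_t f[(x_t+y_j)/h]\Bigr|\\
&:=\lambda_{1n}+\lambda_{2n}.
\end{aligned} \tag{36}$$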

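Recalling the Lipschitz condition in Assumption 2.2, it is readily seen that

$$\begin{aligned}
\lambda_{1n}&\le\sum_{t=1}^{n}|u_t|\max_{0\le j\le m_n}\sup_{x\in A_{nj}}\bigl|f[(x_t+x)/h]-f[(x_t+y_j)/h]\bigr|\\
&\le C(hm_n')^{-1}\sum_{t=1}^{n}|u_t|\\
&\le C(c_n\log n)^{1/2}\frac{1}{n}\sum_{t=1}^{n}|u_t|=O[(c_n\log n)^{1/2}],\quad\text{a.s.}
\end{aligned}$$

by the strong law of large numbers.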
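The discretisation step behind the bound on $\lambda_{1n}$ can be illustrated in the scalar case: the range $|x|\le b_n$ is covered by intervals of radius $1/m_n'$ centred at points $y_j$, and the Lipschitz property bounds the change in $f$ when $x$ is replaced by the nearest centre. The sketch below assumes $d=1$ and a triangular $f$ with Lipschitz constant 1; the function names and the specific construction of the centres are illustrative assumptions, not the construction used in the proof.

```python
import math


def triangular(z):
    """Bounded function with Lipschitz constant 1."""
    return max(0.0, 1.0 - abs(z))


def covering_centres(b, radius):
    """Centres y_j of intervals [y_j - radius, y_j + radius] that cover [-b, b]."""
    count = max(1, math.ceil(b / radius))  # each interval has length 2 * radius
    return [-b + radius * (2 * j + 1) for j in range(count)]


def nearest_centre(x, centres):
    """Return the centre closest to x."""
    return min(centres, key=lambda c: abs(x - c))


def discretisation_bound(lipschitz_c, h, radius):
    """Upper bound C * radius / h on |f((x_t+x)/h) - f((x_t+y_j)/h)| within one ball."""
    return lipschitz_c * radius / h


if __name__ == "__main__":
    centres = covering_centres(b=1.0, radius=0.25)
    x, x_t, h = 0.1, 0.3, 0.5
    y = nearest_centre(x, centres)
    gap = abs(triangular((x_t + x) / h) - triangular((x_t + y) / h))
    print(centres, gap, discretisation_bound(1.0, h, 0.25))
```

In the proof the radius $1/m_n'$ is taken small enough that this bound, summed over $t$, is of the order required for $\lambda_{1n}$, while the number of centres enters only through the maximum over $j$ in $\lambda_{2n}$.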

In order to investigate , write and . Recalling and , we have

$$\begin{aligned}
\lambda_{2n}&\le\max_{0\le j\le m_n}\Bigl|\sum_{t=1}^{n}u_t^{*}f[(x_t+y_j)/h]\Bigr|+C\sum_{t=1}^{n}\bigl[|u_t-u_t'|+E(|u_t-u_t'|\mid\mathcal{F}_{t-1})\bigr]\\
&:=\lambda_{3n}+\lambda_{4n}.
\end{aligned}$$

Routine calculations show that, under and ,

 λ4n ≤ n∑t=1[|ut|I{|ut|>(cn/logn)1/2}+E(|ut|I{|ut|>(cn/logn)1/2}∣Ft−1)] ≤ C(cnlogn)(1−2p)/2n∑t=1[|ut|2p+E(|ut|2p∣Ft−1)] ≤