Semiparametric stationarity and fractional unit roots tests based on data-driven multidimensional increment ratio statistics

Jean-Marc Bardet and Béchir Dola

bardet@univ-paris1.fr,   bechir.dola@univ-paris1.fr
 
SAMM, Université Panthéon-Sorbonne (Paris I), 90 rue de Tolbiac, 75013 Paris, FRANCE
Abstract

In this paper, we show that the central limit theorem (CLT) satisfied by the data-driven Multidimensional Increment Ratio (MIR) estimator of the memory parameter established in Bardet and Dola (2012) for can be extended to a semiparametric class of Gaussian fractionally integrated processes with memory parameter . Since the asymptotic variance of this CLT can be estimated, data-driven MIR tests can be built for both the stationarity and the non-stationarity cases: two tests are constructed to distinguish the hypotheses and , as well as a fractional unit roots test distinguishing the case from the case . Simulations on numerous kinds of short-memory, long-memory and non-stationary processes show both the high accuracy and the robustness of this MIR estimator compared to those of usual semiparametric estimators. They also attest to the reasonable efficiency of MIR tests compared to other usual stationarity tests or fractional unit roots tests.

Keywords: Gaussian fractionally integrated processes; semiparametric estimators of the memory parameter; test of long-memory; stationarity test; fractional unit roots test.


1 Introduction

The set of fractionally integrated stochastic processes has been defined and used in many articles (see for instance Granger and Joyeux, 1980). Here we consider the following spectral version of this set for :
 
Set : is a stochastic process and there exists a continuous function satisfying:

  1. if , is a stationary process with a spectral density satisfying for all , with .

  2. if , is a stationary process with a spectral density satisfying for all , with .

The case is the case of long-memory processes, corresponds to short-memory processes, and corresponds to non-stationary processes having stationary increments. ARFIMA processes (which are linear processes), as well as fractional Gaussian noises (with parameter ) and fractional Brownian motions (with parameter ), are well-known examples of processes satisfying Assumption . The purpose of this paper is twofold: first, we establish the consistency of an adaptive data-driven semiparametric estimator of for any . Second, we use this estimator to build new semiparametric stationarity and fractional unit roots tests.
 
Numerous articles have been devoted to the estimation of in the case only. The books of Beran (1994) and Doukhan et al. (2003) provide extensive surveys of such parametric estimators (such as maximum likelihood or Whittle estimators) and semiparametric estimators (such as local Whittle, log-periodogram or wavelet based estimators). Here we focus on semiparametric estimators for processes satisfying Assumption . Although the first versions of local Whittle, log-periodogram and wavelet based estimators were considered in the case only (see for instance Robinson, 1995a and 1995b, Veitch et al., 2003), extensions have since been provided to estimate when also (see for instance Hurvich and Ray, 1995, Velasco, 1999a, Velasco and Robinson, 2000, Moulines and Soulier, 2003, Shimotsu and Phillips, 2005, Giraitis et al., 2003, 2006, Abadir et al., 2007 or Moulines et al., 2007). Moreover, adaptive data-driven versions of these estimators have been defined to avoid the choice of the trimming or bandwidth parameters generally required by these methods (see for instance Giraitis et al., 2000, Moulines and Soulier, 2003, Veitch et al., 2003, or Bardet and Bibi, 2012). The first objective of this paper is to propose, for the first time, an adaptive data-driven estimator of satisfying a CLT (hence providing confidence intervals and tests) that is valid for but also for . This objective is achieved by using Multidimensional Increment Ratio (MIR) statistics.
The original version of the Increment Ratio (IR) statistic was defined in Surgailis et al. (2008) from an observed trajectory of a process satisfying and for any as:

(1.1)
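The definition (1.1) can be computed in a few vectorized lines. The sketch below assumes the form used by Surgailis et al. (2008). With D_k = sum over t = k+1, ..., k+m of (X_{t+m} - X_t), the statistic averages |D_k + D_{k+m}| / (|D_k| + |D_{k+m}|) over all available k, reading 0/0 as 1. The function name `ir_statistic` is hypothetical, and the exact index range in (1.1) may differ slightly from the one used here.

```python
import numpy as np


def ir_statistic(x, m):
    """Increment Ratio statistic IR_N(m) of a path x (sketch, form assumed)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if m < 1 or n < 3 * m:
        raise ValueError("need m >= 1 and len(x) >= 3m")
    c = np.concatenate(([0.0], np.cumsum(x)))  # c[j] = x[0] + ... + x[j-1]
    s = c[m:] - c[:-m]                         # s[a] = x[a] + ... + x[a+m-1]
    inc = s[m:] - s[:-m]                       # inc[k] = sum_{t=k}^{k+m-1} (x[t+m] - x[t])
    a, b = inc[:-m], inc[m:]                   # pairs (D_k, D_{k+m})
    num = np.abs(a + b)
    den = np.abs(a) + np.abs(b)
    # assumed convention 0/0 = 1; the inner where avoids division warnings
    ratio = np.where(den > 0, num / np.where(den > 0, den, 1.0), 1.0)
    return float(ratio.mean())
```

For a Gaussian white noise, D_k and D_{k+m} have correlation -1/2 for every m, so the value should be close to the limit Λ(0) ≈ 0.588 whatever the choice of m.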

Under conditions on , if and , it is proved that the statistic converges to a deterministic monotone function on and a CLT is also established for when is large enough with respect to . As a consequence of this CLT and using the Delta-method, the estimator , where is a smooth and increasing function defined in (2.6), is a consistent estimator of that also satisfies a CLT (see more details below). However, this estimator was not entirely satisfactory. First, selecting requires knowledge of the second-order behavior of the spectral density, which is clearly unknown in practice. Second, its numerical accuracy is reasonable but clearly lower than that of local Whittle or log-periodogram estimators. As a consequence, in Bardet and Dola (2012), we built a data-driven Multidimensional IR (MIR) estimator computed from (see its precise definition in (3.2)) that improves on both these points, but only for . This is an adaptive data-driven semiparametric estimator of achieving the minimax convergence rate (up to a multiplicative logarithmic factor) and requiring no tuning of any auxiliary parameter (such as bandwidth or trimming parameters). Moreover, its numerical performance is comparable to that of local Whittle, log-periodogram or wavelet based estimators.
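The estimator Λ^{-1}(IR_N(m)) needs only the limit function Λ and its inverse. The sketch below assumes the closed form of Surgailis et al. (2008), which is Λ(d) = Λ_0(ρ(d)). Here ρ(d) is the correlation of two consecutive increment sums, and Λ_0(r) = E[|Z_1 + Z_2| / (|Z_1| + |Z_2|)] for a standard Gaussian pair with correlation r. These expressions come from our reading of that article, not from the definition referred to above, and should be checked against it. The helper names are hypothetical. The inverse is computed by bisection, which is valid because Λ is increasing.

```python
import math


def _rho_raw(d):
    # assumed correlation of consecutive increment sums, (-3^(2d+1) + 2^(2d+3) - 7) / (8 - 2^(2d+2))
    return (-3 ** (2 * d + 1) + 2 ** (2 * d + 3) - 7) / (8 - 2 ** (2 * d + 2))


def rho(d):
    if abs(d - 0.5) < 1e-6:
        # removable singularity at d = 1/2: average the two one-sided values
        return 0.5 * (_rho_raw(0.5 - 1e-6) + _rho_raw(0.5 + 1e-6))
    return _rho_raw(d)


def lambda0(r):
    # E[|Z1 + Z2| / (|Z1| + |Z2|)] for a standard Gaussian pair with correlation r
    q = math.sqrt((1 + r) / (1 - r))
    return (2 / math.pi) * math.atan(q) + (q / math.pi) * math.log(2 / (1 + r))


def big_lambda(d):
    return lambda0(rho(d))


def big_lambda_inv(ir, lo=-0.5, hi=1.25, tol=1e-10):
    """Invert the increasing map d -> Lambda(d) on [lo, hi] by bisection."""
    if ir <= big_lambda(lo):
        return lo
    if ir >= big_lambda(hi):
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if big_lambda(mid) < ir:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Two sanity checks support the assumed formulas. ρ(0) = -1/2 matches the white-noise computation. ρ(1) = 1/4 matches a direct computation for the random walk when m is large.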
Here we extend this previous work to the case . Hence we obtain a CLT satisfied by for all with an explicit asymptotic variance depending only on . In particular, this yields confidence intervals for using Slutsky's lemma. The case is now covered, which opens new perspectives: our data-driven estimator can be used to build a stationarity (or non-stationarity) test since is the “border number” between stationarity and non-stationarity. The case is also covered, which provides another application of : testing fractional unit roots, that is, deciding between and .  
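Once the asymptotic standard deviation of the CLT is replaced by a consistent estimate, Slutsky's lemma gives a confidence interval for d. The sketch below is generic and does not reproduce the paper's normalization. Its argument `se` stands for an estimate of the standard deviation of the MIR estimator, which in practice would combine the asymptotic variance of the CLT (evaluated at the estimate) with the normalizing rate. `confidence_interval` is a hypothetical helper.

```python
from statistics import NormalDist


def confidence_interval(d_hat, se, level=0.95):
    """Two-sided asymptotic confidence interval d_hat +/- z * se."""
    if not 0 < level < 1 or se <= 0:
        raise ValueError("need 0 < level < 1 and se > 0")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # (1 + level)/2 Gaussian quantile
    return d_hat - z * se, d_hat + z * se
```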

There exist several well-known stationarity (or non-stationarity) tests. We may cite the parametric tests defined by Elliott et al. (1996) or Ng and Perron (1996, 2001). Among nonparametric stationarity tests, we may cite the LMC test (see Leybourne and McCabe, 2000) and the KPSS (Kwiatkowski, Phillips, Schmidt, Shin) test (see Kwiatkowski et al., 1992), improved by the V/S test (see Giraitis et al., 2003). Among non-stationarity tests, we may cite the Augmented Dickey-Fuller test (see Said and Dickey, 1984) and the Phillips and Perron test (PP test in the sequel, see Phillips and Perron, 1988). All these tests are unit roots tests (except the V/S test, which is also a short-memory test), that is, roughly speaking, tests based on the model with . A right-tailed test for a process satisfying Assumption is therefore a refinement of a basic unit roots test, since the case is a particular case of and the case a particular case of . Thus, a stationarity (or non-stationarity) test based on the estimator of provides a useful complement to usual unit roots tests.
This principle of a stationarity test linked to has already been investigated in many articles. We can cite Robinson (1994), Tanaka (1999), Ling and Li (2001), Ling (2003) or Nielsen (2004). It has also been used to define fractional unit roots tests, like the Fractional Dickey-Fuller test defined by Dolado et al. (2002) or the cointegration rank test defined by Breitung et al. (2002). However, all these papers provide parametric tests, with a specified model (for instance ARFIMA or ARFIMA-GARCH processes). The extensions proposed by Lobato and Velasco (2007) and Dolado et al. (2008) make it possible to apply these tests to I processes with an ARMA component, but they require knowledge of the order of this component. Several papers have recently been devoted to the construction of semiparametric tests, see for instance Giraitis et al. (2006), Abadir et al. (2007) or Surgailis et al. (2008). But these semiparametric tests require knowledge of the second-order expansion of the spectral density at the zero frequency to adjust a trimming or a bandwidth parameter. When this asymptotic expansion is not smooth enough, an a priori choice of this parameter always induces a bias in the estimator, and therefore in the test.
The MIR estimator does not have this drawback. It converges to following a CLT with minimax convergence rate without any a priori choice of a parameter. This result is established for time series belonging to the Gaussian semiparametric class defined below (see the beginning of Section 2), which is a restriction of the general set . As a consequence, we construct a stationarity test which accepts the stationarity assumption when , with a threshold depending only on the type I error of the test and on . A non-stationarity test accepting the non-stationarity assumption when is also proposed. By the same principle, also provides a fractional unit roots test for deciding between and , i.e. whether or not, where is a threshold depending on the type I error of the test.
In Section 5, numerous simulations are performed on several models of time series (short and long-memory processes). First, the new MIR estimator is compared to the most efficient and well-known semiparametric estimators for several values of . The performance of is convincing: this estimator is accurate and robust for all the considered processes and is globally as efficient as local Whittle, log-periodogram or wavelet based estimators. Second, the new stationarity and non-stationarity tests are compared to the most famous unit roots tests (KPSS, V/S, ADF and PP tests) for numerous I processes. The results are quite surprising: even on AR or ARIMA processes, the and tests provide convincing results, comparable to those obtained with ADF and PP tests, although the latter are specifically built for these processes. For long-memory processes (such as ARFIMA processes), the results are clear: the and tests are accurate tests of (non-)stationarity, while ADF and PP tests are only helpful when is close to or . The new MIR fractional unit roots test provides satisfying results for all considered processes. By contrast, the fractional Dickey-Fuller test developed by Dolado et al. (2002) performs well only for ARFIMA processes. The efficient Wald test introduced by Lobato and Velasco (2007) performs well only for a class of long-memory processes that contains ARFIMA processes but not ARFIMA processes with .
 
Section 2 is devoted to the definition and asymptotic behavior of MIR estimators of , and Section 3 studies an adaptive MIR estimator. The stationarity and non-stationarity tests are presented in Section 4. Section 5 deals with the results of simulations, Section 6 provides concluding remarks and Section 7 contains all the proofs.

2 The Multidimensional Increment Ratio statistic

Now we consider a semiparametric class which is a refinement of the general class . For and define:
 
Assumption is a Gaussian process such that there exist , , and satisfying:

  1. if , is a stationary process with a spectral density satisfying for all

    (2.1)
  2. if , is a stationary process with a spectral density satisfying for all

    (2.2)

Note that Assumption is a particular (but still general) case of the set defined above.

Remark 1.
  • The extension of the definition from to is classical since the conditions on the process are replaced by conditions on its increments.

  • The condition on the derivative is not usual. However, it is not very restrictive since it is satisfied by all the classical long-range dependent processes.

  • In the literature, all the theoretical results concerning the IR statistic for time series have been obtained under Gaussian assumptions. In Surgailis et al. (2008) and Bardet and Dola (2012), simulations showed that the obtained limit theorems should also be valid for linear processes. However, a theoretical proof of such a result would require limit theorems for functionals of multidimensional linear processes, which are difficult to establish. Numerical experiments nevertheless suggest that the Gaussian assumption could be replaced by the assumption that is a linear process with finite fourth-order moments, as was done in Giraitis and Surgailis (1990).

In this section, under Assumption , we establish central limit theorems which extend to the case the results already obtained in Bardet and Dola (2012) for . Let be a process satisfying Assumption and be a path of . The statistic (see its definition in (1.1)) was first defined in Surgailis et al. (2008) as a way to estimate the memory parameter. In Bardet and Surgailis (2011), a simple version of the IR statistic was also introduced to measure the roughness of continuous time processes, and its connection with the level crossing index was established by geometrical arguments. The main interest of such a statistic is its strong robustness to additive or multiplicative trends.
 
As in Bardet and Dola (2012), let with and , and define the random vector . In the sequel we naturally extend the results obtained for to by the convention: (which does not change the asymptotic results).
For , let be a standard fractional Brownian motion, i.e. a centered Gaussian process having stationary increments and such that . Now, using obvious modifications of Surgailis et al. (2008), for and , define the stationary multidimensional centered Gaussian processes such that for ,

(2.3)

Using a continuous extension when of the covariance of , we also define the stationary multidimensional centered Gaussian processes with covariance such that:

where for , using the convention . Now, we establish a multidimensional CLT satisfied by for all :

Proposition 1.

Assume that Assumption holds with and . Then

(2.4)

with where for ,

(2.5)

The proof of this proposition as well as all the other proofs can be found in Section 7.
 
In the sequel, we will assume that is a positive definite matrix for all . Extensive numerical experiments seem to give strong evidence of such a property. Now, the CLT (2.4) can be used for estimating . To begin with,

Property 2.1.

Let satisfy Assumption with and . Then, there exists a non-vanishing constant depending only on and such that for large enough,

(2.6)
and (2.7)
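The choice of m in the next paragraph follows from a short bias-variance computation. We assume, as in Bardet and Dola (2012), that (2.6) gives a bias of order m^{-β} with constant C and that (2.7) gives a variance of order m/N. Writing m = N^{α}:

```latex
% assumed orders of magnitude from (2.6)-(2.7), with m = N^{\alpha}
\sqrt{N/m}\,\bigl(\mathbb{E}[IR_N(m)]-\Lambda(d)\bigr)
  \sim C\,N^{\frac{1-\alpha}{2}}\,N^{-\alpha\beta}
  = C\,N^{\frac{1-\alpha(1+2\beta)}{2}}
  \xrightarrow[N\to\infty]{} 0
  \iff \alpha > \frac{1}{1+2\beta},
\\[4pt]
\mathbb{E}\bigl[(IR_N(m)-\Lambda(d))^2\bigr]
  \asymp N^{-2\alpha\beta} + N^{\alpha-1},
\qquad
\alpha^{*}=\frac{1}{1+2\beta}
  \;\Longrightarrow\;
  N^{-2\alpha^{*}\beta}=N^{\alpha^{*}-1}=N^{-\frac{2\beta}{1+2\beta}}.
```

So the bias vanishes in the CLT as soon as α > 1/(1+2β). The balance point α* = 1/(1+2β) corresponds to an error of order N^{-β/(1+2β)}. Since α must stay strictly above α*, this is consistent with the logarithmic loss mentioned in Section 3.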

Therefore, by choosing and such that when , the term can be replaced by in Proposition 1. Then, using the Delta-method with the function (which is an increasing function), we obtain:

Theorem 1.

Let for . Assume that Assumption holds with and . Then if with and ,

(2.8)

This result is an extension to the case from the case already obtained in Bardet and Dola (2012). Note that the consistency of is ensured when but the previous CLT does not hold (the asymptotic variance of diverges to when , see Surgailis et al., 2008).
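The passage from Proposition 1 to Theorem 1 is a coordinatewise Delta-method. The one-scale version below is a simplification of (2.4), stated as an assumption. It takes √(N/m)(IR_N(m) − Λ(d)) → N(0, σ²(d)) and uses that Λ is differentiable with Λ'(d) > 0. By the mean value theorem, with ξ_N between d and Λ^{-1}(IR_N(m)):

```latex
\sqrt{N/m}\,\bigl(\Lambda^{-1}(IR_N(m)) - d\bigr)
  = \frac{\sqrt{N/m}\,\bigl(IR_N(m)-\Lambda(d)\bigr)}{\Lambda'(\xi_N)}
  \xrightarrow[N\to\infty]{\mathcal L}
  \mathcal N\Bigl(0,\ \frac{\sigma^2(d)}{\Lambda'(d)^2}\Bigr).
```

The last step uses ξ_N → d in probability, the continuity of Λ' and Slutsky's lemma. In the multidimensional setting every coordinate involves the same d, so under this assumption the covariance matrix of (2.4) is simply divided by Λ'(d)².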
 
Now define

(2.9)

The function is and therefore, under assumptions of Theorem 1,

Thus, a pseudo-generalized least squares estimator (PGLSE) of can be defined by

with and denoting its transpose. By a Gauss-Markov type theorem (see again Bardet and Dola, 2012), the asymptotic variance of is smaller than that of any , . Hence, under the assumptions of Theorem 1, we obtain:

(2.10)
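The PGLSE is the generalized least squares estimate of a common value. Let J = (1, ..., 1)^T and let Σ be the covariance of the vector of estimates at the p scales. The estimate is then (J^T Σ^{-1} J)^{-1} J^T Σ^{-1} d_vec, with variance (J^T Σ^{-1} J)^{-1}. The sketch below computes both for a user-supplied covariance matrix. In the paper the matrix comes from plugging an estimate of d into the asymptotic covariance of Theorem 1, which is not reproduced here. The function name `pglse` is hypothetical.

```python
import numpy as np


def pglse(d_vec, sigma):
    """GLS estimate of a common value from correlated estimates.

    d_vec: estimates at the p scales; sigma: (p, p) positive definite covariance.
    Returns the estimate and its variance (J' S^-1 J)^-1.
    """
    d_vec = np.asarray(d_vec, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    j = np.ones_like(d_vec)
    w = np.linalg.solve(sigma, j)  # S^-1 J, solved rather than inverting S
    var = 1.0 / (j @ w)            # (J' S^-1 J)^-1
    return float(var * (w @ d_vec)), float(var)
```

With a diagonal covariance this reduces to the inverse-variance weighted mean. The returned variance never exceeds the smallest diagonal entry, which is the Gauss-Markov property mentioned above.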

3 The adaptive data-driven version of the estimator

Applying Theorem 1 and CLT (2.10) requires knowledge of , which is unknown in practice. The procedure defined in Bardet and Bibi (2012) or Bardet and Dola (2012) can be used to obtain a data-driven selection of an optimal sequence derived from an estimation of . Since the case was studied in Bardet and Dola (2012), we consider here and, for , define

(3.1)

which corresponds to the sum of the pseudo-generalized squared distances between the points and the PGLSE of . Note that by the previous convention, and . Then is minimized on a discretization of , which defines:

Remark 2.

The choice of the discretization set is dictated by our proof of the convergence of . If the interval is stepped in points, with , the proof used here cannot establish this convergence. However, may be replaced in the previous expression of by any function of that is negligible compared to the functions with (for instance, or with ).
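The data-driven choice of α can be sketched as a grid search. For each α on the grid, we take the vector of estimates at the p scales built from N^α and fit its PGLSE. We then keep the α that minimizes the generalized squared distance (3.1) between the vector and its fitted constant. In this sketch a caller-supplied function `estimates_at` (hypothetical) produces the per-α estimates and weight matrix. The grid is passed in explicitly, since its exact form is the one discussed in Remark 2.

```python
import numpy as np


def select_alpha(alphas, estimates_at):
    """Grid-search sketch of the data-driven choice of alpha.

    estimates_at(alpha) -> (d_vec, sigma): estimates at the p scales and a
    positive definite covariance used as weight matrix (caller-supplied).
    Returns the minimizing alpha and the criterion value at each grid point.
    """
    crit = []
    for a in alphas:
        d_vec, sigma = estimates_at(a)
        d_vec = np.asarray(d_vec, dtype=float)
        j = np.ones_like(d_vec)
        w = np.linalg.solve(sigma, j)
        d_gls = (w @ d_vec) / (j @ w)                      # PGLSE at this alpha
        r = d_vec - d_gls * j                              # residual vector
        crit.append(float(r @ np.linalg.solve(sigma, r)))  # r' S^-1 r
    best = int(np.argmin(crit))
    return alphas[best], crit
```

In the actual procedure, `estimates_at` would apply Λ^{-1} to the IR statistics at the scales j N^α, j = 1, ..., p.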

From the central limit theorem (2.8) one deduces the following limit theorem:

Proposition 2.

Assume that Assumption holds with and . Then,

Finally define

and the estimator

(3.2)

(the definition and use of instead of are explained just before Theorem 2 in Bardet and Dola, 2012). The following theorem provides the asymptotic behavior of the estimator :

Theorem 2.

Under assumptions of Proposition 2,

(3.3)

Moreover,

The convergence rate of is the same (up to a multiplicative logarithmic factor) as that of the minimax estimator of in this semiparametric framework (see Giraitis et al., 1997). As already established in Surgailis et al. (2008), the use of IR statistics makes robust to smooth additive or multiplicative trends (see also the results of simulations below). The additional advantage of with respect to other adaptive estimators of (see Moulines and Soulier, 2003, for an overview of frequency domain estimators of ) is the central limit theorem (3.3) satisfied by . This central limit theorem provides asymptotic confidence intervals for . Such intervals cannot be obtained, for instance, with the FEXP adaptive estimator (see Iouditsky et al., 2001) or the local periodogram adaptive estimator (see Giraitis et al., 2000 or Henry, 2007). Moreover, can be used for , i.e. for both stationary and non-stationary processes, without any modification of its definition. These two advantages make it possible to define stationarity and fractional unit roots tests based on .

4 Stationarity, non-stationarity and fractional unit roots tests

Assume that is an observed trajectory of a process . We define here new stationarity, non-stationarity and fractional unit roots tests for based on .

4.1 A stationarity test

There exist many stationarity and non-stationarity tests. The most famous stationarity tests are certainly the following unit roots tests:

  • The KPSS (Kwiatkowski, Phillips, Schmidt, Shin) test (see Kwiatkowski et al., 1992);

  • The V/S test (see its presentation in Giraitis et al., 2001), which was first defined for testing the presence of long-memory versus short-memory. As already noted in Giraitis et al. (2003, 2006), the V/S test is also more powerful than the KPSS test for testing stationarity.

  • A test based on unidimensional IR statistic and developed in Surgailis et al. (2008).

More precisely, we consider here the following statistical hypothesis test:

  • Hypothesis (stationarity): is a process satisfying Assumption with and .

  • Hypothesis (non-stationarity): is a process satisfying Assumption with and .

We use a test based on to decide between these two hypotheses. Hence, from the previous CLT (3.3) and with a significance level , define

(4.1)

where (see (3.3)) and is the quantile of a standard Gaussian random variable .
 
Then we define the following decision rule:

“Hypothesis (stationarity) is accepted when and rejected when .”
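In practice the rule amounts to a one-sided test on the estimated memory parameter. The sketch below assumes the statistic has the usual Wald form (d_hat − d0)/se, where `se` is a consistent estimate of the standard deviation of the estimator derived from (3.3). It returns whether H0 is rejected at level `alpha`. Setting d0 = 0.5 gives the stationarity test, and other values of d0 give the extension of Remark 3. The function and argument names are hypothetical, and the exact normalization of (4.1) is not reproduced.

```python
from statistics import NormalDist


def right_tailed_test(d_hat, se, d0=0.5, alpha=0.05):
    """Test H0: d < d0 against H1: d >= d0 (reject for large d_hat).

    Returns (statistic, reject_H0, p_value) under the Gaussian approximation.
    """
    stat = (d_hat - d0) / se
    q = NormalDist().inv_cdf(1 - alpha)  # (1 - alpha)-quantile of N(0, 1)
    p_value = 1 - NormalDist().cdf(stat)
    return stat, stat > q, p_value
```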

Remark 3.

In fact, the stationarity test defined in (4.1) can also be seen as a semiparametric test of versus with . It is obviously possible to extend it to any value by defining . The particular case will be considered below as a fractional unit roots test.

From previous results, it is clear that:

Property 1.

Under Hypothesis , the asymptotic type I error of the test is and under Hypothesis , the test power tends to .

Moreover, this test can be used as a unit roots (UR) test. Indeed, consider the following typical UR testing problem. Let , with , and an ARIMA with or . Then, a (simplified) UR testing problem is to decide between:

  • : and is a stationary ARMA process.

  • : and is a stationary ARMA process.

Then,

Property 2.

Under Hypothesis , the type I error of this unit roots test problem using decreases to when and under Hypothesis , the test power tends to .

4.2 A non-stationarity test

Unit roots tests are also often used as non-stationarity tests. Among the most famous non-stationarity tests in a nonparametric framework, we may cite:

  • The Augmented Dickey-Fuller (ADF) test (see Said and Dickey, 1984);

  • The Phillips and Perron (PP) test (see for instance Phillips and Perron, 1988).

Using the statistic we propose a new non-stationarity test for deciding between:

  • Hypothesis (non-stationarity): is a process satisfying Assumption with and .

  • Hypothesis (stationarity): is a process satisfying Assumption with and .

Then, the decision rule of the test under the significance level is the following:

“Hypothesis is accepted when and rejected when .”

where

(4.2)
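The non-stationarity test and the fractional unit roots test of Section 4.3 are both left-tailed versions of the same construction. H0 is rejected when the estimate falls significantly below d0, with d0 = 0.5 for (4.2) and d0 = 1 for (4.5). The sketch below makes the same assumptions as the right-tailed case: a Wald-type statistic, `se` estimated from (3.3), and hypothetical names.

```python
from statistics import NormalDist


def left_tailed_test(d_hat, se, d0=0.5, alpha=0.05):
    """Test H0: d >= d0 (or d = d0) against H1: d < d0 (reject for small d_hat).

    Returns (statistic, reject_H0, p_value) under the Gaussian approximation.
    """
    stat = (d_hat - d0) / se
    q = NormalDist().inv_cdf(alpha)  # alpha-quantile, negative for alpha < 1/2
    p_value = NormalDist().cdf(stat)
    return stat, stat < q, p_value
```

Calling `left_tailed_test(d_hat, se, d0=1.0)` gives the decision of the fractional unit roots test under the same assumptions.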

Then,

Property 3.

Under Hypothesis , the asymptotic type I error of the test is and under Hypothesis the test power tends to .

As before, this test can also be used as a unit roots test where , with , and an ARIMA with or . We consider here a “second” simplified unit roots testing problem, which is to decide between:

  • : and is a stationary ARMA process.

  • : and is a stationary ARMA process.

Then,

Property 4.

Under Hypothesis , the type I error of the unit roots test problem using decreases to when and under Hypothesis the test power tends to .

4.3 A fractional unit roots test

Fractional unit roots tests have also been defined to specify the possible long-memory property of the process in a unit roots test. In our Gaussian framework, they consist of testing

  • Hypothesis : is a “random walk”-type process such that:

    (4.3)

    with a process satisfying Assumption with . Therefore is a process satisfying Assumption .

  • Hypothesis : is a process satisfying the following relation:

    (4.4)

    where is a process satisfying Assumption with , , and is the fractional integration operator of order , i.e. and .
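Fractional integration of order d is usually defined through the binomial expansion (1 − B)^{-d} = Σ_k π_k B^k, where B is the backshift operator, π_0 = 1 and π_k = π_{k−1}(k − 1 + d)/k. The sketch below uses this standard expansion to apply a truncated fractional integration to a finite sample. Values before the first observation are taken as zero, which is a simplification of ours. With d = 1 the sketch reduces to the cumulative sum, i.e. the random-walk case of (4.3). The helper names are hypothetical.

```python
import numpy as np


def frac_integration_weights(d, n):
    """First n coefficients pi_k of (1 - B)^(-d) = sum_k pi_k B^k."""
    w = np.ones(n)
    for k in range(1, n):
        w[k] = w[k - 1] * (k - 1 + d) / k
    return w


def frac_integrate(e, d):
    """Truncated (1 - B)^(-d) e_t, taking e_s = 0 before the sample."""
    e = np.asarray(e, dtype=float)
    w = frac_integration_weights(d, e.size)
    # x_t = sum_{k=0}^{t} pi_k e_{t-k}: causal convolution cut to len(e)
    return np.convolve(w, e)[: e.size]
```

Because the truncation is causal, integrating of order d and then of order −d returns the original sample exactly.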

A direct computation shows that if satisfies (4.4), then satisfies Assumption . There exist several fractional unit roots tests (see for example Robinson, 1994, Tanaka, 1999, Dolado et al., 2002, or more recently Kew and Harris, 2009). The estimator can clearly be used in such a framework for testing fractional unit roots by comparing to . Hence, the decision rule of the test under the significance level is the following:

“Hypothesis is accepted when and rejected when .”

where

(4.5)

Then, as before:

Property 5.

Under Hypothesis , the asymptotic type I error of the test is and under Hypothesis the test power tends to .

5 Results of simulations

5.1 Numerical procedure for computing the estimator and tests

First of all, the software used in this section (written in Matlab) is freely available at http://samm.univ-paris1.fr/-Jean-Marc-Bardet.
 
The concrete procedure for applying the MIR-test of stationarity is the following:

  1. using additional simulations (performed on ARMA, ARFIMA and FGN processes, and not presented here to avoid overloading the paper), we observed that the value of the parameter has little influence on the accuracy of the test (the value of fluctuates by less than when varies). However, to optimize our procedure (in the sense of minimizing the simulated mean square error of the estimator), we chose as a stepwise function of :

  2. as the values of and are essential for computing the thresholds of the tests, we have estimated them and obtained:

  3. then after computing presented in Section 3, the adaptive estimator defined in (3.2), the test statistics defined in (4.1), defined in (4.2) and defined in (4.5) are computed.

5.2 Monte-Carlo experiments on several time series

In the sequel, the results are obtained from generated independent trajectories of each process defined below. These processes are generated with the circulant matrix method, as detailed in Doukhan et al. (2003). The simulations are performed for different values of and , on processes satisfying Assumption :

  1. the usual ARIMA processes with respectively or and an innovation process which is a Gaussian white noise. Such processes satisfy Assumption or (respectively);

  2. the ARFIMA processes with parameter such that and an innovation process which is a Gaussian white noise. Such ARFIMA processes satisfy Assumption (note that ARIMA processes are particular cases of ARFIMA processes).

  3. the Gaussian stationary processes with the spectral density

    (5.1)

    with , and . Therefore the spectral density implies that Assumption holds. In the sequel we will first use and , implying that the second order term of the spectral density is “less negligible” than in the case of ARFIMA processes, and then , implying that the second order term of the spectral density is “more negligible” than in the case of ARFIMA processes.

  4. the Gaussian stationary processes , such as its spectral density is

    (5.2)

    with . Therefore the spectral density implies that Assumption holds, but not stricto sensu.

  5. the Gaussian non-stationary process which can be written as , where the additive and multiplicative trends are respectively and (we chose a smooth but non-polynomial additive trend).
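The circulant matrix method mentioned above (often called the Davies-Harte method) embeds the n × n Toeplitz covariance matrix into a 2n × 2n circulant matrix, whose eigenvalues are given by a discrete Fourier transform. The sketch below applies it to an ARFIMA(0, d, 0) process with unit innovation variance and −0.5 < d < 0.5. For this process γ(0) = Γ(1 − 2d)/Γ(1 − d)² and γ(k) = γ(k − 1)(k − 1 + d)/(k − d). It illustrates the technique under these assumptions and is not the simulation code used in this section. The function names are hypothetical.

```python
import math
import numpy as np


def arfima0d0_acvf(d, n):
    """Autocovariances gamma(0..n) of ARFIMA(0, d, 0), unit innovation variance."""
    g = np.empty(n + 1)
    g[0] = math.gamma(1 - 2 * d) / math.gamma(1 - d) ** 2
    for k in range(1, n + 1):
        g[k] = g[k - 1] * (k - 1 + d) / (k - d)
    return g


def circulant_sample(acvf, rng):
    """Gaussian path of length n = len(acvf) - 1 with autocovariances acvf[0..n-1].

    acvf[n] only closes the circulant of size 2n.
    """
    n = len(acvf) - 1
    c = np.concatenate((acvf, acvf[-2:0:-1]))  # first row of the circulant
    lam = np.fft.fft(c).real                   # its eigenvalues
    if lam.min() < -1e-10 * lam.max():
        raise ValueError("circulant embedding is not nonnegative definite")
    lam = np.clip(lam, 0.0, None)
    m = c.size
    z = rng.standard_normal(m) + 1j * rng.standard_normal(m)
    y = np.fft.fft(np.sqrt(lam / m) * z)
    # the real part has the target covariance (the imaginary part would give
    # a second, independent path)
    return y.real[:n]
```

A typical call is `circulant_sample(arfima0d0_acvf(0.3, 1000), np.random.default_rng())`. ARFIMA(p, d, q) processes would require their own autocovariances.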

5.2.1 Comparison of with other semiparametric estimators of

Here we first compare the performance of the data-driven MIR estimator with other famous semiparametric estimators of :

  • is the original version of the IR based estimator defined in Surgailis et al. (2008). As recommended in that article, we chose .

  • is the global log-periodogram estimator introduced by Moulines and Soulier (2003), also called FEXP estimator, with bias-variance balance parameter . Such an estimator was shown to be consistent for . This semiparametric estimator is an adaptive data-driven estimator of .