Semiparametrically Point-Optimal Hybrid Rank Tests for Unit Roots


Bo Zhou (Hong Kong University of Science and Technology), Ramon van den Akker and Bas J.M. Werker (Tilburg University). E-mail: bozhouhkust@ust.hk; r.vdnakker@gmail.com; b.j.m.werker@tilburguniversity.edu
Abstract

We propose a new class of unit root tests that exploits invariance properties in the Locally Asymptotically Brownian Functional limit experiment associated to the unit root model. The invariance structures naturally suggest tests that are based on the ranks of the increments of the observations, their average, and an assumed reference density for the innovations. The tests are semiparametric in the sense that they are valid, i.e., have the correct (asymptotic) size, irrespective of the true innovation density. For a correctly specified reference density, our test is point-optimal and nearly efficient. For arbitrary reference densities, we establish a Chernoff-Savage type result, i.e., our test performs as well as commonly used tests under Gaussian innovations but has improved power under other, e.g., fat-tailed or skewed, innovation distributions. To avoid nonparametric estimation, we propose a simplified version of our test that exhibits the same asymptotic properties, except for the Chernoff-Savage result that we are only able to demonstrate by means of simulations.

Running title: Semiparametrically optimal unit root tests

MSC subject classifications: Primary 62G10, 62G20; secondary 62P20, 62M10.

Keywords: unit root test, semiparametric power envelope, limit experiment, LABF, maximal invariant, rank statistic.

1 Introduction

The monographs of Patterson (2011, 2012) and Choi (2015) provide an overview of the literature on unit root tests. This literature traces back to White (1958) and includes seminal papers such as Dickey and Fuller (1979, 1981), Phillips (1987), Phillips and Perron (1988), and Elliott, Rothenberg and Stock (1996). The present paper fits into the stream of literature that focuses on "optimal" testing for unit roots. Important earlier contributions here are Dufour and King (1991), Saikkonen and Luukkonen (1993), and Elliott, Rothenberg and Stock (1996). The latter paper derives the asymptotic power envelope for unit root testing in models with Gaussian innovations. Rothenberg and Stock (1997) and Jansson (2008) consider the non-Gaussian case.

The present paper considers testing for unit roots in a semiparametric setting. Following earlier literature, we focus on a simple AR(1) model driven by possibly serially correlated errors. The innovations driving these serially correlated errors are i.i.d. with a distribution that is treated as a nuisance parameter. Apart from some smoothness and the existence of relevant moments, no assumptions are imposed on this distribution. From earlier work it is known that the unit root model leads to Locally Asymptotically Brownian Functional (LABF) limit experiments (in the Le Cam sense; see Jeganathan, 1995). As a consequence, no uniformly most powerful test exists, even in case the innovation distribution would be known; see also Elliott, Rothenberg and Stock (1996). In the semiparametric case the limit experiment becomes more difficult to handle due to the infinite-dimensional nuisance parameter. Jansson (2008) derives the semiparametric power envelope by mimicking ideas that apply to Locally Asymptotically Normal (LAN) models. However, the test proposed there requires a nonparametric estimator of the score function, which complicates its implementation. The point-optimal tests proposed in the present paper only require nonparametric estimation of a real-valued cross-information factor, and we also provide a simplified version that avoids any nonparametric estimation.

The main contribution of this manuscript is twofold. First, we derive the semiparametric power envelopes of unit root tests with serially correlated errors for two cases: symmetric and possibly non-symmetric innovation distributions (Section 3). Our method of derivation is novel and exploits the invariance structures embedded in the semiparametric unit root model. To be precise, we use a "structural" description of the LABF limit experiment (Section 3.2), obtained from Girsanov's theorem. This limit experiment corresponds to observing an infinite-dimensional Ornstein-Uhlenbeck process on the time interval $[0,1]$. The unknown innovation density in the semiparametric unit root model takes the form of an unknown drift parameter in this limit experiment. Within the limit experiment, Section 3.3 derives the maximal invariant, i.e., a reduction of the data which is invariant with respect to the nuisance parameters (that is, the unknown drift in the limiting Ornstein-Uhlenbeck experiment). It turns out that this maximal invariant takes a rather simple form: all processes associated to density perturbations have to be replaced by their associated bridges (i.e., a process $W$ on $[0,1]$ is replaced by $W(s) - sW(1)$, $s \in [0,1]$). The power envelopes for invariant tests in the limit experiment then readily follow from the Neyman-Pearson lemma. An application of the Asymptotic Representation Theorem (see, e.g., Theorem 15.1 in van der Vaart (2000)) subsequently yields the local asymptotic power envelope (Theorem 3.3). In case the innovation density is known to be symmetric, the semiparametric power envelope coincides with the parametric power envelope. This implies the existence of an adaptive testing procedure (see also Jansson (2008)). Moreover, we note that our analysis of invariance structures in the LABF experiment is also of independent interest and could, for example, be exploited in the analysis of optimal inference for cointegration or predictive regression models. Also, the analysis gives an alternative interpretation of the test proposed in Elliott, Rothenberg and Stock (1996) (the ERS test), as this test is also based on an invariant, though not the maximal one (see Remark 3.3).

As a second contribution, we provide two new classes of easy-to-implement unit root tests that are semiparametrically optimal in the sense that their asymptotic power curves are tangent to the associated semiparametric power envelopes (Section 4.1). The form of the maximal invariant developed before suggests how to construct such tests based on the ranks or signed-ranks (depending on whether the innovation density is known to be symmetric or not) of the increments of the observations, the average of these increments, and an assumed reference density. These tests are semiparametric in the sense that the reference density need not equal the true innovation density, while they are still valid (i.e., have the correct asymptotic size). The reference density is not restricted to be Gaussian, which it generally needs to be in more classical QMLE results. When the reference density is correctly specified (i.e., happens to equal the true innovation density), the asymptotic power curve of our test is tangent to the semiparametric power envelope, and this in turn gives the optimality property. A feasible version of this oracle test is obtained by replacing the true density by a nonparametric estimate; the corresponding simulation results are provided in Section 5.

In relation to the classical literature on efficient rank-based testing (for instance, Hájek and Šidák (1967), Hallin and Puri (1988), and Hallin, Van den Akker and Werker (2011)), our approach can be interpreted as follows. In the aforementioned papers, the invariance arguments (that is, using the ranks of the innovations) are applied in the sequence of models at hand. We, on the other hand, only apply the invariance arguments in the limit experiment. In this way, we can extend these ideas to non-LAN experiments. For the LAN case, both approaches would effectively lead to the same results. Our tests, despite the absence of a LAN structure, satisfy a Chernoff and Savage (1958) type result (Corollary 4.1): for any reference density, our test outperforms, at any true density, its classical counterpart, which in this case is the ERS test. We provide, in Section 4.2, even simpler alternative classes of tests that require no nonparametric estimation at all. These simplified classes of tests coincide with their corresponding originals for a correctly specified reference density and, hence, share the same optimality properties. In case of a misspecified reference density, the alternative classes still seem to enjoy the Chernoff-Savage type property, though only for a Gaussian reference density. This is in line with the traditional Chernoff-Savage results for Locally Asymptotically Normal models.

The remainder of this paper is organized as follows. Section 2 introduces the model assumptions and some notation. Next, Section 3 contains the analysis of the limit experiment; in particular, we study invariance properties in the limit experiment leading to our new derivation of the semiparametric power envelopes. The classes of hybrid rank tests we propose are introduced in Section 4. Section 5 provides the results of a Monte Carlo study and Section 6 contains a discussion of possible extensions of our results. All proofs are organized in the Appendix.

2 The model

Consider observations $Y_1, \dots, Y_T$ generated from the classical component specification, for $t = 1, \dots, T$,

$Y_t = \mu + X_t,$   (2.1)
$X_t = \rho X_{t-1} + U_t,$   (2.2)
$A(L) U_t = \varepsilon_t,$   (2.3)

where $\mu \in \mathbb{R}$, the innovations $\varepsilon_t$ form an i.i.d. sequence with density $f$, and $A(L)$ is the AR(p) lag polynomial. Moreover, it is assumed that $X_0 = 0$. We impose the following assumptions on this innovation density.

Assumption 1.

  (a) The density $f$ is absolutely continuous with a.e. derivative $\dot f$, i.e., for all $a < b$ we have $f(b) - f(a) = \int_a^b \dot f(e) \, \mathrm{d}e$.

  (b) $\int \varepsilon f(\varepsilon) \, \mathrm{d}\varepsilon = 0$ and $\sigma^2 := \int \varepsilon^2 f(\varepsilon) \, \mathrm{d}\varepsilon < \infty$.

  (c) The standardized Fisher information for location, $J_f := \sigma^2 \int \phi_f^2(\varepsilon) f(\varepsilon) \, \mathrm{d}\varepsilon$, where $\phi_f := -\dot f / f$ is the location score, is finite.

  (d) The density $f$ is positive, i.e., $f(\varepsilon) > 0$ for all $\varepsilon \in \mathbb{R}$.

The imposed smoothness on $f$ in (a) is mild and standard (see, e.g., Le Cam (2012), van der Vaart (2000)). The finite-variance assumption in (b) is important to our asymptotic results, as it is essential to the weak convergence, to a Brownian motion, of the partial-sum process generated by the innovations. (Let us already mention that, although not allowed for in our theoretical results, we will also assess the finite-sample performance of the proposed tests (Section 5) for innovation distributions with infinite variance. For tests specifically developed for such cases we refer to Hasan (2001), Ahn, Fotopoulos and He (2001), and Callegari, Cappuccio and Lubian (2003).) The zero-intercept assumption in (b), i.e., that the innovations have mean zero, excludes a deterministic trend in the model. Such a trend leads to an entirely different asymptotic analysis, see Hallin, Van den Akker and Werker (2011). The Fisher information in (c) has been standardized by premultiplying with the variance $\sigma^2$, so that it becomes scale invariant, i.e., invariant with respect to $\sigma$. In other words, $J_f$ only depends on the shape of the density and not on its variance $\sigma^2$. The positivity of the density in (d) is mainly made for notational convenience.
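As a numerical illustration of the standardized Fisher information in Assumption 1(c), the sketch below approximates it by quadrature for two candidate innovation densities. The chosen densities and the integration grid are arbitrary illustrative choices, not part of the paper's procedure.

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

def standardized_fisher_info(logpdf, var, grid):
    """Approximate the standardized Fisher information for location:
    variance times the integral of the squared location score (-d/de log f) against f."""
    lp = logpdf(grid)
    score = -np.gradient(lp, grid)          # numerical location score -f'(e)/f(e)
    return var * trapezoid(score**2 * np.exp(lp), grid)

grid = np.linspace(-15.0, 15.0, 200001)
# The Gaussian attains the minimal value 1; heavier tails give a larger value.
print(standardized_fisher_info(stats.norm.logpdf, 1.0, grid))                 # ~1.0
print(standardized_fisher_info(lambda e: stats.t.logpdf(e, df=5),
                               stats.t.var(df=5), grid))                      # ~1.25
```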

The assumption on the initial condition, $X_0 = 0$, is less innocent than it may appear. Indeed, it is known, see Müller and Elliott (2003) and Elliott and Müller (2006), that, even asymptotically, the initial condition can contain non-negligible statistical information. Nevertheless, we impose it for the sake of simplicity; it is stronger than necessary and can be relaxed to the level of generality in Elliott, Rothenberg and Stock (1996).

Let $\mathcal{F}$ denote the set of densities satisfying Assumption 1. We also investigate in the present paper the special case of symmetric densities. For that purpose, we denote by $\mathcal{F}_S$ the set of densities which satisfy Assumption 1 and, at the same time, are symmetric about zero. Of course, $\mathcal{F}_S \subset \mathcal{F}$.

With respect to the autocorrelation structure $A(L)$, we impose the following assumption.

Assumption 2.

The lag polynomial $A(L)$ is of finite order $p$ and satisfies $A(z) \neq 0$ for all complex $z$ with $|z| \leq 1$.

Let $\mathcal{A}$ denote the set of coefficient vectors in $\mathbb{R}^p$ for which the induced lag polynomial satisfies Assumption 2. For mathematical convenience we restrict the lag polynomial to be of finite order $p$. One may expect many of the results in the present paper to extend to the case $p = \infty$ (see, e.g., Jeganathan (1997)).

The main goal of this paper is to develop tests, with optimality features, for the semiparametric unit root hypothesis

$\mathrm{H}_0: \rho = 1,$

i.e., apart from Assumptions 1-2, no further structure is imposed on the innovation density $f$, the intercept $\mu$, and the autocorrelation structure $A(L)$.
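To fix ideas, the following minimal sketch simulates data from a component specification of the form (2.1)-(2.3), both under the null and under a local alternative. The sample size, intercept, AR coefficient, and Student-t innovations are illustrative choices only and are not taken from the paper.

```python
import numpy as np

def simulate_component_model(T=250, rho=1.0, mu=0.5, a=(0.3,), df=5, seed=0):
    """Simulate Y_1, ..., Y_T from Y_t = mu + X_t, X_t = rho * X_{t-1} + U_t,
    with AR(p) errors U_t = a_1 U_{t-1} + ... + a_p U_{t-p} + eps_t, X_0 = 0,
    and i.i.d. Student-t innovations (finite variance for df > 2)."""
    rng = np.random.default_rng(seed)
    p = len(a)
    eps = rng.standard_t(df, size=T + p)
    U = np.zeros(T + p)
    for t in range(p, T + p):
        U[t] = sum(a[j] * U[t - 1 - j] for j in range(p)) + eps[t]
    X = np.zeros(T + 1)
    for t in range(1, T + 1):
        X[t] = rho * X[t - 1] + U[p + t - 1]
    return mu + X[1:]

# unit root (null) and a local-to-unity alternative rho = 1 + h/T with h = -10
T = 250
Y_null = simulate_component_model(T=T, rho=1.0)
Y_alt = simulate_component_model(T=T, rho=1.0 - 10.0 / T)
print(Y_null[:3], Y_alt[:3])
```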

In the following section, we derive the (asymptotic) power envelope of tests that are (locally and asymptotically) invariant with respect to the nuisance parameters $f$, $\mu$, and $A(L)$. We consider both the non-symmetric case ($f \in \mathcal{F}$) and the symmetric case ($f \in \mathcal{F}_S$). Section 4 is subsequently devoted to tests, depending on a reference density that can be freely chosen, that are point-optimal with respect to this power envelope, and proves a Chernoff-Savage type result.

3 The power envelope for invariant tests

This section first introduces some notation and preliminaries (Section 3.1). Afterwards, we derive the limit experiment (in the Le Cam sense) corresponding to the component unit root model (2.1)-(2.3) and provide a "structural" representation of this limit experiment (Section 3.2). In Section 3.3 we discuss, exploiting this structural representation, a natural invariance restriction, to be imposed on tests for the unit root hypothesis, with respect to the infinite-dimensional nuisance parameter associated to the innovation density. We derive the maximal invariant and obtain from it the power envelope for invariant tests in the limit experiment. Finally, in Section 3.4, we exploit the Asymptotic Representation Theorem to translate these results into (asymptotically) optimal invariant tests in the sequence of unit root models. Again we consider both the case of unrestricted densities ($f \in \mathcal{F}$) and that of symmetric densities ($f \in \mathcal{F}_S$).

3.1 Preliminaries

We first introduce local reparameterizations for the parameter of interest $\rho$ and the nuisance autocorrelation structure $A(L)$. Then we discuss a convenient parametrization of perturbations to the innovation density $f$, which we use to deal with the semiparametric nature of the testing problem. These perturbations follow the standard approach to local alternatives in (semiparametric) models commonly used in experiments that are Locally Asymptotically Normal (LAN). We will see that, with respect to all parameters but $\rho$, the model is actually LAN; compare also Remark 3.1 below. Moreover, we introduce some partial sum processes that we need in the sequel, as well as their Brownian limits.

Local reparameterizations of $\rho$ and $A(L)$

It is well known, and goes back to Phillips (1987), Chan and Wei (1988) and Phillips and Perron (1988), that the contiguity rate for the unit root testing problem, i.e., the fastest convergence rate at which it is possible to distinguish (with non-trivial power) the unit root $\rho = 1$ from a stationary alternative $\rho < 1$, is given by $T$. Therefore, in order to compare performances of tests at this proper rate of convergence, we reparametrize the autoregression parameter into its local-to-unity form, i.e.,

$\rho = \rho^{(T)}(h) = 1 + \frac{h}{T}, \quad h \leq 0.$   (3.1)

The appropriate local reparameterization for the lag polynomial is of the traditional form with rate $\sqrt{T}$, i.e.,

$A^{(T)}(L) = A(L) + \frac{1}{\sqrt{T}} D_\gamma(L),$   (3.2)

where the local perturbation is defined by $D_\gamma(L) = \sum_{j=1}^{p} \gamma_j L^j$ with local parameter $\gamma = (\gamma_1, \dots, \gamma_p)' \in \mathbb{R}^p$. As $\mathcal{A}$ is open, $A^{(T)}$ satisfies Assumption 2 for $T$ large enough.
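The local-to-unity reparameterization in (3.1) can be illustrated by simulation: under an autoregression parameter of the form $1 + h/T$, the scaled path of the autoregressive component is, for large T, approximately an Ornstein-Uhlenbeck path. The sketch below is illustrative only; the value h = -10 and the Gaussian innovations are arbitrary choices.

```python
import numpy as np

def scaled_ar1_path(T, h, seed=0):
    """Return the path s -> X_{floor(sT)} / sqrt(T), s = 0, 1/T, ..., 1, for an AR(1)
    with autoregression parameter rho = 1 + h/T and standard normal innovations."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(T)
    rho = 1.0 + h / T
    X = np.zeros(T + 1)
    for t in range(1, T + 1):
        X[t] = rho * X[t - 1] + eps[t - 1]
    return X / np.sqrt(T)

# For large T this scaled path is close (in distribution) to an Ornstein-Uhlenbeck
# path with mean-reversion parameter h; h = 0 gives (approximately) a Brownian motion.
path = scaled_ar1_path(T=5000, h=-10.0)
print(path[::1000])
```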

Perturbations to the innovation density

To describe the local perturbations to the density $f$, we need a separable Hilbert space of perturbation directions, contained in the space of Lebesgue-measurable functions $b$ satisfying $\int b^2 f \, \mathrm{d}\lambda < \infty$. Because of the separability, there exists a countable orthonormal basis $b_k$, $k \in \mathbb{N}$, of this space (see, e.g., Rudin (1987, Theorem 3.14)). This basis can be chosen such that each $b_k$ is bounded and two times continuously differentiable with bounded derivatives. Moreover, $\int b_k f \, \mathrm{d}\lambda = 0$ and $\int \varepsilon b_k(\varepsilon) f(\varepsilon) \, \mathrm{d}\varepsilon = 0$, so that the perturbations introduced below again integrate to one and have mean zero. Hence each function $b$ in this space can be written as $b = \sum_{k=1}^{\infty} \eta_k b_k$ for some $\eta = (\eta_k)_{k \in \mathbb{N}} \in \ell_2$. Besides the sequence space $\ell_2$ we also need the subspace of sequences with finite support, i.e., sequences for which only finitely many coordinates are nonzero.

Of course, the set of finite-support sequences is a dense subspace of $\ell_2$. Given the orthonormal basis $b_k$, $k \in \mathbb{N}$, and a finite-support sequence $\eta$, we introduce the following perturbation to the density $f$:

$f_\eta^{(T)}(\varepsilon) = f(\varepsilon) \Big( 1 + \frac{1}{\sqrt{T}} \sum_{k=1}^{\infty} \eta_k b_k(\varepsilon) \Big).$   (3.3)

The rate $T^{-1/2}$ is already indicative of the standard LAN behavior of the nuisance parameter $\eta$, as will formally follow from Proposition 3.2 below.

For the symmetric case, we can assume that the basis functions $b_k$, $k \in \mathbb{N}$, are also symmetric about zero.

The following proposition shows that the perturbations, both for the non-symmetric and for the symmetric case, are valid in the sense that they satisfy the conditions on the innovation density that we imposed on the model throughout (Assumption 1). The proof is organized in the Appendix.

Proposition 3.1.

Let $f \in \mathcal{F}$ and let $\eta$ have finite support. Then there exists $T_0$ such that for all $T \geq T_0$ we have $f_\eta^{(T)} \in \mathcal{F}$. If we further restrict $f \in \mathcal{F}_S$ and each $b_k$ is chosen symmetric about zero, then there exists $T_0$ such that for all $T \geq T_0$ we have $f_\eta^{(T)} \in \mathcal{F}_S$.
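The following sketch illustrates Proposition 3.1 numerically for a single perturbation direction: for small T the perturbed function in (3.3) may fail to be nonnegative, while for T large enough it is a proper density. The standard normal baseline and the particular perturbation direction are illustrative choices only.

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

# One bounded, smooth perturbation direction with E[b(eps)] = 0 and E[eps * b(eps)] = 0
# under the standard normal baseline (E[cos(eps)] = exp(-1/2); eps * b(eps) is odd).
def b(e):
    return np.cos(e) - np.exp(-0.5)

def perturbed_density(e, eta, T):
    """f_eta^(T)(e) = f(e) * (1 + eta * b(e) / sqrt(T)), with f the N(0,1) density."""
    return stats.norm.pdf(e) * (1.0 + eta * b(e) / np.sqrt(T))

grid = np.linspace(-10.0, 10.0, 100001)
for T in (4, 100):
    dens = perturbed_density(grid, eta=3.0, T=T)
    print(T, bool(dens.min() >= 0.0), round(float(trapezoid(dens, grid)), 6))
# Small T: the perturbation can push the function below zero; large T: a proper density.
```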

Remark 3.1.

In semiparametric statistics one typically parametrizes perturbations to a density by a so-called "nonparametric" score function $b$, i.e., the perturbation takes the form $f(\varepsilon)(1 + T^{-1/2} b(\varepsilon))$ for a suitable function $b$; see, for example, Bickel et al. (1998) for details. By using the basis $b_k$, $k \in \mathbb{N}$, we instead tackle all such perturbations simultaneously via the infinite-dimensional nuisance parameter $\eta$. Of course, one would need to use all of $\ell_2$ as parameter space to "generate" all score functions $b$. We instead restrict attention to sequences $\eta$ with finite support, which ensures that (3.3) is a density (for large $T$). For our purposes this restriction is without cost. Intuitively, this is because the finite-support sequences form a dense subspace of $\ell_2$ (so if a property is "sufficiently continuous" one only needs to establish it on this subspace, as it then extends to the closure).

Partial sum processes

To describe the limit experiment in Section 3.2, we introduce some partial sum processes and their limits. These results are fairly classical but, for completeness, precise statements are organized in Lemma A.1 in the supplementary material.

Define, for $s \in [0,1]$, the partial sum processes

Note that we pick the starting point of the sums at $t = 2$, so that these partial sum processes are (maximally) invariant with respect to the intercept $\mu$ (otherwise the first term would involve $\mu$).

Using Assumption 1 we find, under the null hypothesis, joint weak convergence of these partial sum processes to Brownian motions; in particular, $W_\varepsilon$, $W_\phi$, and $W_{b_k}$, $k \in \mathbb{N}$, denote the Brownian limits of the partial sums of the innovations, of their location scores, and of the basis functions evaluated at the innovations, respectively. (All weak convergences in this paper are in product spaces of cadlag functions equipped with the uniform topology.) These limiting Brownian motions are defined on a common probability space. Let us already mention that we will introduce a collection of probability measures on this space, representing the limit experiment, in Section 3.2. We use the notational convention that probability measures related to the limit experiment (i.e., to the "W-processes") are denoted by $\mathrm{P}$, while probability measures related to the finite-sample unit root model, i.e., observing $Y_1, \dots, Y_T$, are denoted by $\mathrm{P}^{(T)}$.

We remark that stochastic integrals with respect to these partial sum processes can be shown to converge weakly, under the null hypothesis, to the associated stochastic integrals with respect to the limiting Brownian motions. Weak convergence of ordinary integrals of the partial sum processes follows from an application of the continuous mapping theorem. Again, details can be found in the proof of Proposition 3.2 in the Appendix.
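The sketch below constructs a scaled partial-sum process from i.i.d. draws; by Donsker's theorem such a path approximates a Brownian motion for large T, mirroring the role of the partial-sum processes introduced above. The innovation distribution and the sample size are arbitrary illustrative choices.

```python
import numpy as np

def partial_sum_process(x):
    """Scaled partial-sum path: the value at grid point t/T equals T^{-1/2} * (x_1 + ... + x_t)."""
    return np.concatenate(([0.0], np.cumsum(x))) / np.sqrt(len(x))

rng = np.random.default_rng(1)
T = 10_000
eps = rng.standard_t(df=5, size=T)
eps = eps / eps.std()                    # standardize so the limit is a standard Brownian motion
W_T = partial_sum_process(eps)
print(W_T[[0, T // 4, T // 2, T]])       # a few points of one (approximate) Brownian motion path
```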

Behavior of the limit processes under the null hypothesis when $f \in \mathcal{F}$

As $\varepsilon$ and $b_k$ are orthogonal for each $k$, it holds that $W_\varepsilon$ and $W_{b_k}$, $k \in \mathbb{N}$, are all mutually independent. Moreover, we have

As $\phi_f$ is the score of the location model, it is well known (see, for example, Bickel et al. (1998)) that we have (under Assumption 1) $\int \phi_f(\varepsilon) f(\varepsilon) \, \mathrm{d}\varepsilon = 0$ and $\int \varepsilon \phi_f(\varepsilon) f(\varepsilon) \, \mathrm{d}\varepsilon = 1$. Consequently, again because $\varepsilon$ and $b_k$ are orthogonal for each $k$, we can decompose

$\phi_f(\varepsilon) = \frac{\varepsilon}{\sigma^2} + \sum_{k=1}^{\infty} \alpha_k b_k(\varepsilon),$   (3.4)

with coefficients $\alpha_k = \int \phi_f(\varepsilon) b_k(\varepsilon) f(\varepsilon) \, \mathrm{d}\varepsilon$. This establishes

(3.5)

Moreover, we have, for ,

(3.6)

and

(3.7)

As for : since, under the null, is independent of the innovation and , , it follows

(3.8)

and

(3.9)

We define the covariance matrix of as

(3.10)

with

Behavior of the limit processes under the null hypothesis when $f \in \mathcal{F}_S$

In this case, the density $f$ is an even function and so are the basis functions $b_k$, $k \in \mathbb{N}$. The location score $\phi_f$ is now an odd function. Therefore, $\phi_f$ can no longer be decomposed in terms of $\varepsilon$ and the $b_k$'s as in (3.4) and (3.5). Instead of (3.7) we now have

for all and . All the other results mentioned above still hold.

3.2 A structural representation of the limit experiment

The results in the previous section are needed to study the asymptotic behavior of log-likelihood ratios. These in turn determine the limit experiment, which we use to study asymptotically optimal procedures that are invariant with respect to the nuisance parameters. Thus, fix the innovation density $f$, the intercept $\mu$, and the lag polynomial $A(L)$, and consider the law of $Y_1, \dots, Y_T$ under (2.1)-(2.3) with $\rho$ given by (3.1), lag polynomial given by (3.2), and innovation density (3.3). The following proposition shows that the semiparametric unit root model is of the Locally Asymptotically Brownian Functional (LABF) type introduced in Jeganathan (1995).

Proposition 3.2.

Let $f \in \mathcal{F}$, $\mu \in \mathbb{R}$, $A(L)$ satisfy Assumption 2, $h \leq 0$, $\gamma \in \mathbb{R}^p$, and let $\eta$ have finite support. Let $\Delta$ denote differencing, i.e., $\Delta Y_t = Y_t - Y_{t-1}$.

  1. Then we have, under ,

    where the central-sequence , with , is given by

    and

  2. Moreover, with , , and , , we have, still under and as ,

    (3.11)

    where

  3. For all , and the right-hand side of (3.11) has unit expectation under .

Of course, Proposition 3.2 still holds for ; in that case we have . The proof of (i) follows by an application of Proposition 1 in Hallin, Van den Akker and Werker (2015), which provides generally applicable sufficient conditions for the quadratic expansion of log-likelihood ratios. Part (ii) is not surprising and follows from the weak convergence of the partial sum processes to Brownian motions (and of integrals involving the partial sum processes to stochastic integrals) discussed above. Finally, Part (iii) follows by verifying the Novikov condition. For the sake of completeness, detailed proofs are organized in the Appendix.

Part (iii) of the proposition implies that we can introduce, for , , and , new probability measures on the measurable space (on which the processes , , and were defined) by their Radon-Nikodym derivatives with respect to :

Proposition 3.2 then implies that the sequence of (local) unit root experiments (each $T$ yields an experiment) converges weakly (in the Le Cam sense) to the experiment described by the probability measures introduced above. Formally, we define the sequence of experiments of interest by

for , and the limit experiment by, with the Borel -field on ,

Note that the latter experiment indeed depends on as the measure depends on .

Corollary 3.1.

Let $f \in \mathcal{F}$, $\mu \in \mathbb{R}$, and $A(L)$ satisfy Assumption 2. Then the sequence of experiments defined above converges to the limit experiment as $T \to \infty$.

The Asymptotic Representation Theorem (see, for example, Chapter 9 in van der Vaart (2000)) implies that for any statistic that converges in distribution under the local laws of the sequence of experiments, there exists a (possibly randomized) statistic defined in the limit experiment with the same limiting distributions. This allows us to study (asymptotically) optimal inference: the "best" procedure in the limit experiment also yields a bound for the sequence of experiments. If one is able to construct a statistic (for the sequence) that attains this bound, it follows that the bound is sharp and the statistic is called (asymptotically) optimal. This is precisely what we do: Section 3.3 establishes the bound and in Section 4 we introduce statistics attaining it.

To obtain more insight into the limit experiment, the following proposition, which follows by a direct application of Girsanov's theorem, provides a "structural" description of the limit experiment.

Proposition 3.3.

Let , , , and . The processes , , and , , defined by the starting values , , and the stochastic differential equations, for ,

are zero-drift Brownian motions under . Their joint law is that of under .

For the case , Proposition 3.3 still applies with . Moreover, for this case, we denote by the associated sequence of experiments and by the associated limit experiment.

Remark 3.2.

Parts (i) and (ii) of Proposition 3.2 show that the intercept $\mu$ vanishes in the limit. More explicitly, in the proof of this proposition, we replace, in the likelihood ratio, the terms involving $\mu$ by their counterparts without $\mu$ and then show that the difference is asymptotically negligible. On the other hand, one could also "localize" $\mu$ (at an appropriate rate) as in Jansson (2008). As shown in that paper, the term of the likelihood ratio associated to this local parameter is independent of the other terms. By this additively separable structure, we can treat $\mu$ "as if" it were known. Either way, inference for $\rho$ is invariant with respect to $\mu$ in the limit. Analogously, in the finite-sample experiments, $\mu$ is eliminated (automatically) by using the increments $\Delta Y_t$, $t = 2, \dots, T$, which are (maximally) invariant with respect to $\mu$ (see Section 4).

3.3 The limit experiment: invariance and power envelope

In this section, we consider the limit experiments for the non-symmetric and the symmetric case. In these experiments, we observe the processes of the structural representation continuously on the time interval $[0,1]$, and we are interested in the power envelopes for testing the hypothesis

$\mathrm{H}_0: h = 0 \quad \text{against} \quad \mathrm{H}_1: h < 0.$   (3.12)

To eliminate the nuisance parameters $\gamma$ and $\eta$, we first propose a statistic that is sufficient for the parameter of interest and whose distribution does not depend on $\gamma$. Afterwards, using Proposition 3.3, we discuss a natural invariance structure with respect to the infinite-dimensional nuisance parameter $\eta$. We derive the maximal invariant and apply the Neyman-Pearson lemma to obtain the power envelopes of invariant tests within the two limit experiments, in Section 3.3.1 and Section 3.3.2, respectively.

We begin with the elimination of . The statistic serves as a sufficient statistic for the parameter . This is because, according to the structural version of (or ) in Proposition 3.3, the distribution of is only affected by and so is the distribution of the process only affected by and . It then follows that the distribution of the statistic conditional on the statistic is only a function of and and, in turn, does not depend on . Next, observe that the sufficient statistic is independent of the process and thus has a distribution that does not depend on . This allows us to restrict attention to the sufficient statistic to conduct inference for , and the nuisance parameter disappears from the likelihoods.

3.3.1 Elimination of $\eta$ and the associated power envelope

The elimination of the nuisance parameter $\eta$ is more involved and differs between the two limit experiments. We start with the non-symmetric case.

In Proposition 3.3, if one applies the decompositions (3.5) and (3.7) to the first equation (of ) and the fourth equation (of ), one retrieves the second equation (of ). This essentially allows us to omit the process and restrict the observations to the processes and , .

Now we formalize the invariance structure with respect to $\eta$. Introduce, for a finite-support sequence $c = (c_k)_{k \in \mathbb{N}}$, the transformation $g_c$ defined by, for $s \in [0,1]$,

$\big( g_c W \big)_{b_k}(s) = W_{b_k}(s) + c_k s, \quad k \in \mathbb{N},$   (3.13)

while the remaining processes are left unchanged, i.e., $g_c$ adds a drift to each of the perturbation processes. Proposition 3.3 implies that the law of the transformed processes under the measure with nuisance parameter $\eta$ is the same as the law of the original processes under the measure with nuisance parameter $\eta + c$. Hence our testing problem (3.12) is invariant with respect to the transformations $g_c$. Therefore, following the invariance principle, it is natural to restrict attention to test statistics that are invariant with respect to these transformations as well, i.e., test statistics $t$ that satisfy

$t \circ g_c = t \quad \text{for every finite-support sequence } c.$   (3.14)

Given a process $W = (W(s))_{s \in [0,1]}$, let us define the associated bridge process $W^B$ by $W^B(s) = W(s) - sW(1)$. Now note that we have, for every process $W$, every $c \in \mathbb{R}$, and every $s \in [0,1]$,

$\big( W(s) + cs \big) - s \big( W(1) + c \big) = W^B(s),$

i.e., taking the bridge of a process ensures invariance with respect to adding drifts to that process. Define the mapping $M$ as the map that replaces each of the processes $W_{b_k}$, $k \in \mathbb{N}$, by its associated bridge $W_{b_k}^B$, leaving the remaining processes unchanged. It follows that statistics that are measurable with respect to the $\sigma$-field,

(3.15)

are invariant (with respect to the transformations $g_c$). It is, however, not (immediately) clear that we did not throw away too much information. Formally, we need this $\sigma$-field to be maximally invariant, which means that each invariant statistic is measurable with respect to it. The following theorem, which once more exploits the structural description of the limit experiment, shows that this indeed is the case.

Theorem 3.1.

The $\sigma$-field in (3.15) is maximally invariant for the group of transformations $g_c$, with $c$ of finite support, in the limit experiment.
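As a quick numerical check of the bridge-based invariance underlying Theorem 3.1: adding a linear drift to a (discretized) path leaves its associated bridge unchanged. The discretization and the drift value below are illustrative only.

```python
import numpy as np

def bridge(path, s):
    """Bridge transform W(s) - s * W(1) of a path observed on the grid s (s[0] = 0, s[-1] = 1)."""
    return path - s * path[-1]

rng = np.random.default_rng(2)
n = 1000
s = np.linspace(0.0, 1.0, n + 1)
W = np.concatenate(([0.0], np.cumsum(rng.standard_normal(n)))) / np.sqrt(n)  # Brownian path on [0,1]

c = 2.7                                   # an arbitrary drift coefficient (nuisance direction)
W_shifted = W + c * s                     # the transformation adds the linear drift c * s
print(np.allclose(bridge(W, s), bridge(W_shifted, s)))   # True: the bridge removes the drift
```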

The above theorem implies that invariant inference must be based on this maximal invariant. An application of the Neyman-Pearson lemma, using the maximal invariant as observation, yields the power envelope for the class of invariant tests. To be precise, consider the likelihood ratios restricted to the invariant $\sigma$-field, which are given by the conditional expectations of the likelihood ratios given this $\sigma$-field; the conditional expectation indeed does not depend on $\eta$ precisely because of the invariance. It also does not depend on $\gamma$ as a result of the arguments stated at the beginning of this section.

To calculate this conditional expectation, we first introduce , i.e., the bridge process associated to . Following the decomposition in (3.5), we can decompose with

Note that part is -measurable. Under the random variables , , are independent of and , . Indeed, the independence of holds by construction and the independence of is a well-known, and easy to verify, property of Brownian bridges. We thus obtain, since is -measurable as well,

with