T-statistic for Autoregressive process

Eric Benhamou, A.I. SQUARE CONNECT, 35 Boulevard d'Inkermann, 92200 Neuilly sur Seine, France; LAMSADE, Université Paris Dauphine, Place du Maréchal de Lattre de Tassigny, 75016 Paris, France. E-mail: eric.benhamou@aisquareconnect.com, eric.benhamou@dauphine.eu
Abstract

In this paper, we discuss the distribution of the t-statistic under the assumption that the underlying discrete time process is normal and autoregressive. This generalizes the classical result for the traditional t-distribution, where the underlying discrete time process follows an uncorrelated normal distribution. For an AR(1) process, however, the underlying process is correlated: all traditional results break down and the resulting t-statistic follows a new distribution that converges asymptotically to a normal. We give an explicit formula for this new distribution, obtained as the ratio of two dependent distributions (a normal and the distribution of the norm of another, independent normal distribution). We also provide a modified statistic that follows a non-central t-distribution. Its derivation comes from finding an orthogonal basis for the initial Toeplitz covariance matrix. Our findings are consistent with the asymptotic distribution of the t-statistic derived for a large number of observations or zero correlation. The exact form of this distribution has applications in multiple fields; in particular, it provides a way to derive the exact distribution of the Sharpe ratio under normal AR(1) assumptions.

AMS 1991 subject classification: 62E10, 62E15

Keywords: t-Student, Autoregressive process, Toeplitz matrix, circulant matrix, non-central Student distribution

1 Introduction

Let $X_1,\dots,X_n$ be a random sample from a cumulative distribution function (cdf) with a constant mean $\mu$, and define the following statistic, referred to as the t-statistic:

$$T_n=T(\mathbf{X}_n)=\frac{\sqrt{n}\,(\bar{X}_n-\mu)}{s_n} \qquad (1)$$

where $\bar{X}_n$ is the empirical mean, $s_n^2$ the Bessel-corrected empirical variance, and $\mathbf{X}_n$ the full history of the random sample, defined by:

$$\bar{X}_n=\frac{1}{n}\sum_{i=1}^{n}X_i,\qquad s_n^2=\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X}_n)^2,\qquad \mathbf{X}_n=(X_1,\dots,X_n)^T \qquad (2)$$

It is well known that if the sample comes from a normal distribution, $T_n$ has the Student t-distribution with $n-1$ degrees of freedom. The proof is quite simple (we provide a few in appendix A.1). If the variables have a nonzero mean, the distribution is referred to as a non-central t-distribution with non-centrality parameter given by

$$\eta=\frac{\sqrt{n}\,\mu}{\sigma} \qquad (3)$$
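As a small illustration, the statistic (1)-(2) can be computed directly from a sample (a minimal sketch; the helper name `t_statistic`, the sample values and the hypothesized mean `mu0` are ours, not the paper's):

```python
import math


def t_statistic(sample, mu0=0.0):
    """Compute T_n = sqrt(n) * (mean - mu0) / s_n with the
    Bessel-corrected standard deviation s_n, as in equations (1)-(2)."""
    n = len(sample)
    mean = sum(sample) / n
    # Bessel-corrected empirical variance s_n^2
    s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return math.sqrt(n) * (mean - mu0) / math.sqrt(s2)


sample = [1.2, 0.8, 1.5, 0.9, 1.1, 1.3]
print(t_statistic(sample, mu0=1.0))
```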

Extensions of the t-statistic to weaker conditions have been widely studied.

Mauldon (1956) raised the question of determining the pdfs for which the t-statistic, as defined in (1), is t-distributed with $n-1$ degrees of freedom. This characterization problem can be generalized to finding all the pdfs for which a certain statistic possesses a property that is characteristic of these pdfs. Kagan et al. (1973), Bondesson (1974) and Bondesson (1983), to cite a few, tackled Mauldon's problem. Bondesson (1983) proved that the necessary and sufficient condition for the t-statistic to have Student's t-distribution with $n-1$ degrees of freedom for all sample sizes is the normality of the underlying distribution. It is not necessary that $\mathbf{X}_n$ be an independent sample. Indeed, consider $\mathbf{X}_n$ as a random vector, each component of which has the same marginal distribution function. Efron (1969) pointed out that the weaker condition of symmetry can replace the normality assumption. Later, Fang et al. (2001) showed that if the vector $\mathbf{X}_n$ has a spherical distribution, then the t-statistic has a t-distribution. A natural question, which gave birth to this paper, is whether the resulting Student distribution is preserved when the underlying observations follow an AR(1) process. This question and its answer have more implications than a simple theoretical problem. Indeed, if one wants to test the statistical significance of a coefficient in a regression, one may perform a t-test and rely on the fact that the resulting distribution is a Student one. If the observations are not independent but suffer from auto-correlation, the building blocks supporting the test break down. Surprisingly, as this problem is not easy, there has been little research on it. Even if it is related to the Dickey-Fuller statistic (whose distribution is not closed form and needs to be computed by Monte Carlo simulation), it is not the same statistic.
Mikusheva (2015) applied an Edgeworth expansion precisely to the Dickey-Fuller statistic, but not to the original t-statistic. The neighboring Dickey-Fuller statistic has the great advantage of being equal to the ratio of two known continuous time stochastic processes, which makes the problem easier. In the sequel, we first review the problem and comment on the particular case of zero correlation and the resulting consequence for the t-statistic. We emphasize the differences and challenges that arise when the underlying observations are no longer independent. We study the numerator and denominator of the t-statistic and derive their distributions. We prove in particular that it is only in the case of normal noise in the underlying AR(1) process that the numerator and denominator are independent. We then provide a few approximations for this statistic and conclude.

2 AR(1) process

The assumption that the underlying process (or observations) follows an AR(1) process writes:

$$\begin{cases} X_t=\mu+\epsilon_t, & t\geq 1\\ \epsilon_t=\rho\,\epsilon_{t-1}+\sigma v_t, & t\geq 2\end{cases} \qquad (4)$$

where $(v_t)$ is an independent white noise process (i.i.d. variables with zero mean and unit variance). To have a stationary process, we impose

$$|\rho|<1 \qquad (5)$$

It is easy to check that equation (4) is equivalent to

$$X_t=\mu+\rho\,(X_{t-1}-\mu)+\sigma v_t,\qquad t\geq 2 \qquad (6)$$

We can also easily check that the variance and covariance of the observations are given by

$$\mathbb{V}(X_t)=\frac{\sigma^2}{1-\rho^2}\ \text{for}\ t\geq 1,\qquad \mathrm{Cov}(X_t,X_u)=\frac{\sigma^2\,\rho^{|t-u|}}{1-\rho^2}\ \text{for}\ t,u\geq 1 \qquad (7)$$

Both expressions in (7) are independent of time, and the covariance only depends on $|t-u|$, implying that $(X_t)$ is a stationary process.
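The stationary moments (7) can be checked by simulating the recursion (6) (a hedged sketch: `simulate_ar1` is our helper name, seeded for reproducibility, with $\epsilon_1$ drawn from the stationary distribution):

```python
import math
import random


def simulate_ar1(n, mu, rho, sigma, seed=42):
    """Simulate X_t = mu + rho*(X_{t-1} - mu) + sigma*v_t as in (6),
    starting from the stationary distribution of epsilon_1."""
    rng = random.Random(seed)
    # stationary initial condition: epsilon_1 ~ N(0, sigma^2/(1-rho^2))
    eps = rng.gauss(0.0, sigma / math.sqrt(1.0 - rho ** 2))
    xs = [mu + eps]
    for _ in range(n - 1):
        eps = rho * eps + sigma * rng.gauss(0.0, 1.0)
        xs.append(mu + eps)
    return xs


# empirical check of the stationary variance and lag-1 covariance of (7)
rho, sigma = 0.5, 1.0
xs = simulate_ar1(200_000, mu=0.0, rho=rho, sigma=sigma)
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
cov1 = sum((xs[i] - mean) * (xs[i + 1] - mean)
           for i in range(len(xs) - 1)) / (len(xs) - 1)
print(var, sigma ** 2 / (1 - rho ** 2))        # both close to 4/3
print(cov1, rho * sigma ** 2 / (1 - rho ** 2)) # both close to 2/3
```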

2.1 Case of Normal errors

If, in addition, we assume that the $(v_t)$ are distributed according to a normal distribution, we can fully characterize the distribution of $X$ and rewrite our model in reduced matrix form as follows:

$$X=\begin{pmatrix}X_1\\ \vdots\\ X_n\end{pmatrix}=\mu\cdot\mathbb{1}_n+\sigma\cdot\epsilon=\mu\begin{pmatrix}1\\ \vdots\\ 1\end{pmatrix}+\sigma\begin{pmatrix}\epsilon_1\\ \vdots\\ \epsilon_n\end{pmatrix} \qquad (8)$$

where $\epsilon\sim\mathcal{N}(0,\Omega)$, hence $X\sim\mathcal{N}(\mu\mathbb{1}_n,\sigma^2\Omega)$.

The matrix $\Omega$ is a symmetric Toeplitz matrix defined as

$$\Omega=\frac{1}{1-\rho^2}\begin{pmatrix}1&\rho&\dots&\rho^{n-2}&\rho^{n-1}\\ \rho&1&\dots&\rho^{n-3}&\rho^{n-2}\\ \vdots&\vdots&\ddots&\vdots&\vdots\\ \rho^{n-2}&\rho^{n-3}&\dots&1&\rho\\ \rho^{n-1}&\rho^{n-2}&\dots&\rho&1\end{pmatrix}=MM^{T} \qquad (9)$$

Its Cholesky decomposition is given by

$$M=\frac{1}{\sqrt{1-\rho^2}}\begin{pmatrix}1&0&\dots&0&0\\ \rho&\sqrt{1-\rho^2}&\dots&0&0\\ \vdots&\vdots&\ddots&\vdots&\vdots\\ \rho^{n-2}&\rho^{n-3}\sqrt{1-\rho^2}&\dots&\sqrt{1-\rho^2}&0\\ \rho^{n-1}&\rho^{n-2}\sqrt{1-\rho^2}&\dots&\rho\sqrt{1-\rho^2}&\sqrt{1-\rho^2}\end{pmatrix} \qquad (10)$$
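As a numerical sanity check of (9)-(10), one can build $\Omega$ and $M$ entrywise for a small $n$ and verify that the product of $M$ with its transpose reproduces $\Omega$ (a sketch; helper names are ours):

```python
import math


def omega(n, rho):
    """Covariance matrix (9): Omega[i][j] = rho^|i-j| / (1 - rho^2)."""
    return [[rho ** abs(i - j) / (1 - rho ** 2) for j in range(n)]
            for i in range(n)]


def chol_m(n, rho):
    """Lower triangular factor M of (10), 0-based indices: first column
    rho^i / sqrt(1-rho^2), remaining sub-diagonal entries rho^(i-j)."""
    c = math.sqrt(1 - rho ** 2)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m[i][0] = rho ** i / c
        for j in range(1, i + 1):
            m[i][j] = rho ** (i - j)
    return m


def times_transpose(a):
    """Return A A^T."""
    n = len(a)
    return [[sum(a[i][k] * a[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


n, rho = 5, 0.4
om, prod = omega(n, rho), times_transpose(chol_m(n, rho))
err = max(abs(om[i][j] - prod[i][j]) for i in range(n) for j in range(n))
print(err)  # numerically zero (up to rounding)
```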

It is worth noting the splitting of $M$ into the scalar factor $\frac{1}{\sqrt{1-\rho^2}}$ and another matrix, as follows:

$$M=\frac{1}{\sqrt{1-\rho^2}}\,\widetilde{M} \qquad (11)$$

where $\widetilde{M}$ denotes the lower triangular matrix displayed in (10).

The inverse of $\Omega$ is given by

$$A=\Omega^{-1}=\begin{pmatrix}1&-\rho&\dots&0&0\\ -\rho&1+\rho^2&\dots&0&0\\ \vdots&\vdots&\ddots&\vdots&\vdots\\ 0&0&\dots&1+\rho^2&-\rho\\ 0&0&\dots&-\rho&1\end{pmatrix}=L^{T}L \qquad (12)$$

Its Cholesky decomposition is given by

$$L=\begin{pmatrix}\sqrt{1-\rho^2}&0&\dots&0&0\\ -\rho&1&\dots&0&0\\ \vdots&\vdots&\ddots&\vdots&\vdots\\ 0&0&\dots&1&0\\ 0&0&\dots&-\rho&1\end{pmatrix} \qquad (13)$$
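Similarly, (12)-(13) can be checked numerically: $L^TL$ should reproduce the tridiagonal matrix $A$, and $A\Omega$ should be the identity (a sketch with hypothetical helper names):

```python
import math


def a_matrix(n, rho):
    """Tridiagonal inverse (12)."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0 if i in (0, n - 1) else 1.0 + rho ** 2
        if i + 1 < n:
            a[i][i + 1] = a[i + 1][i] = -rho
    return a


def l_matrix(n, rho):
    """Cholesky factor (13): sqrt(1-rho^2) in the top-left corner,
    1 on the rest of the diagonal, -rho on the sub-diagonal."""
    l = [[0.0] * n for _ in range(n)]
    l[0][0] = math.sqrt(1 - rho ** 2)
    for i in range(1, n):
        l[i][i] = 1.0
        l[i][i - 1] = -rho
    return l


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(a):
    n = len(a)
    return [[a[j][i] for j in range(n)] for i in range(n)]


n, rho = 5, 0.4
A, L = a_matrix(n, rho), l_matrix(n, rho)
# check A = L^T L
ltl = mat_mul(transpose(L), L)
err1 = max(abs(A[i][j] - ltl[i][j]) for i in range(n) for j in range(n))
# check A * Omega = I, with Omega[i][j] = rho^|i-j| / (1 - rho^2)
Om = [[rho ** abs(i - j) / (1 - rho ** 2) for j in range(n)] for i in range(n)]
AO = mat_mul(A, Om)
err2 = max(abs(AO[i][j] - (1.0 if i == j else 0.0))
           for i in range(n) for j in range(n))
print(err1, err2)  # both numerically zero
```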

Notice, in the various matrices, the dissymmetry between the first term and the rest. It shows up, for instance, in the first diagonal term of $L$, which is $\sqrt{1-\rho^2}$, while all other diagonal terms are equal to 1. Similarly, in the matrix $M$, the first column is quite different from the other ones, as it is a fraction over $\sqrt{1-\rho^2}$.

2.2 T-statistics issue

The t-statistic given by equation (1) is not easy to compute. For the numerator, we have that $\sqrt{n}\,(\bar{X}_n-\mu)$ follows a normal distribution. The proof is immediate, as $\bar{X}_n$ is a linear combination of the Gaussian vector generated by the AR(1) process (for a quick proof of the fact that any linear combination of a Gaussian vector is normal, see B.1). In section 3, we come back to the exact computation of the characteristics of the distributions of the numerator and denominator, as this will be useful in the rest of the paper.

As for the denominator, for a non null correlation $\rho$, the distribution of $s_n^2$ is not a known distribution.

The variables $X_i-\bar{X}_n$ are normally distributed with zero mean, but with variances that differ across $i$.

Hence the square of each of these normal variables follows a Gamma distribution. However, we cannot obtain a closed form for the distribution of their sum, as the variances of the different terms differ and the terms are not independent either. If the correlation is null, and only in this specific case, we can apply Cochran's theorem to prove that $(n-1)\,s_n^2/\sigma^2$ follows a chi-squared distribution with $n-1$ degrees of freedom. In the general case, however, we need to rely on approximations that will be presented in the rest of the paper.

Another interesting approach is to use the Cholesky decomposition of the inverse of the covariance matrix of our process to infer a modified t-statistic, which now has independent terms, and is defined as follows.

Let us take the modified process $U$ defined by

$$U=LX \qquad (14)$$

The vector $U$ is distributed according to a normal $\mathcal{N}(\mu L\mathbb{1}_n,\sigma^2 I_n)$. We can compute the modified t-statistic on $U$ as follows:

$$\tilde{T}_n=\frac{\sqrt{n}\,(\bar{U}_n-\mu)}{\tilde{s}_n} \qquad (15)$$

where

$$\bar{U}_n=\frac{1}{n}\sum_{i=1}^{n}U_i,\qquad \tilde{s}_n^2=\frac{1}{n-1}\sum_{i=1}^{n}(U_i-\bar{U}_n)^2 \qquad (16)$$

In this specific case, the distribution of $\tilde{T}_n$ is a (non-central) Student t-distribution with $n-1$ degrees of freedom. We now work on the numerator and denominator of the t-statistic in the specific case of AR(1) with a non null correlation $\rho$.
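A sketch of the modified statistic (14)-(16), with helper names of our own: the sample is whitened with $L$ before the usual t-statistic is applied; for $\rho=0$, $L$ is the identity and the modified statistic coincides with the ordinary one.

```python
import math


def l_matrix(n, rho):
    """Cholesky factor L of (13)."""
    L = [[0.0] * n for _ in range(n)]
    L[0][0] = math.sqrt(1 - rho ** 2)
    for i in range(1, n):
        L[i][i] = 1.0
        L[i][i - 1] = -rho
    return L


def modified_t(sample, mu, rho):
    """Modified statistic (15): whiten the sample with U = L X (14),
    then compute a t-statistic on U."""
    n = len(sample)
    L = l_matrix(n, rho)
    u = [sum(L[i][j] * sample[j] for j in range(n)) for i in range(n)]
    ubar = sum(u) / n
    s2 = sum((ui - ubar) ** 2 for ui in u) / (n - 1)
    return math.sqrt(n) * (ubar - mu) / math.sqrt(s2)


def ordinary_t(sample, mu):
    n = len(sample)
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
    return math.sqrt(n) * (xbar - mu) / math.sqrt(s2)


x = [0.3, -0.1, 0.7, 0.2, -0.4, 0.5]
# with rho = 0, L is the identity and both statistics coincide
print(modified_t(x, 0.0, 0.0), ordinary_t(x, 0.0))
```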

3 Expectation and variance of numerator and denominator

The numerator of the t-statistic writes

$$\sqrt{n}\,(\bar{X}_n-\mu)=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(X_i-\mu) \qquad (17)$$

Its expectation is null, as each term has zero expectation. Its variance is given by:

Lemma.

$$\mathrm{Var}\!\left(\sqrt{n}\,(\bar{X}_n-\mu)\right)=\frac{\sigma^2}{1-\rho^2}\left[\frac{1+\rho}{1-\rho}-\frac{2\rho\,(1-\rho^n)}{n\,(1-\rho)^2}\right] \qquad (18)$$
$$=\frac{\sigma^2}{(1-\rho)^2}\left[1-\frac{2\rho\,(1-\rho^n)}{n\,(1-\rho)(1+\rho)}\right] \qquad (19)$$

Proof: see B.2.

This lemma is interesting, as it states that the variance of the sample mean numerator converges to $\frac{\sigma^2}{(1-\rho)^2}$ for large $n$. It is useful to keep the two forms of the variance. The first one (equation (18)) is useful in the following computations, as it shares the denominator term $1-\rho^2$. The second one (equation (19)) gives the asymptotic form.
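The closed form (18) can be cross-checked against the direct computation $\mathrm{Var}(\sqrt{n}(\bar{X}_n-\mu))=\frac{1}{n}\sum_{i,j}\mathrm{Cov}(X_i,X_j)$ using (7) (a sketch; helper names are ours):

```python
def var_numerator_direct(n, rho, sigma=1.0):
    """(1/n) * sum_{i,j} Cov(X_i, X_j), with Cov from (7)."""
    c = sigma ** 2 / (1 - rho ** 2)
    return sum(c * rho ** abs(i - j)
               for i in range(n) for j in range(n)) / n


def var_numerator_formula(n, rho, sigma=1.0):
    """Closed form (18)."""
    return sigma ** 2 / (1 - rho ** 2) * (
        (1 + rho) / (1 - rho)
        - 2 * rho * (1 - rho ** n) / (n * (1 - rho) ** 2))


for n in (2, 5, 20):
    for rho in (-0.6, 0.3, 0.9):
        print(n, rho, var_numerator_direct(n, rho),
              var_numerator_formula(n, rho))  # the two columns agree
```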

The denominator writes:

$$s_n=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X}_n)^2} \qquad (20)$$

In the following, we denote by $Y_i=X_i-\mu$ the zero mean variables and work with these variables to make computations easier. We also write, for each $Y_j$, a variable orthogonal to $\bar{Y}_n$ whose variance (which we sometimes call its squared norm, to ease notation) equals that of $Y_j$; this decomposition makes the impact of the correlation explicit.

As studying this denominator is not easy because of the presence of the square root, it is easier to investigate the properties of its square, given by

$$s_n^2=\frac{\sum_{i=1}^{n}(Y_i-\bar{Y}_n)^2}{n-1}=\frac{\sum_{i=1}^{n}Y_i^2-n\bar{Y}_n^2}{n-1} \qquad (21)$$

The mean of $\bar{Y}_n$ is zero, while the previous lemma (rescaled by $1/n$) gives its variance:

$$\mathrm{Var}(\bar{Y}_n)=\frac{\sigma^2}{n\,(1-\rho^2)}\left[\frac{1+\rho}{1-\rho}-\frac{2\rho\,(1-\rho^n)}{n\,(1-\rho)^2}\right]=\frac{\sigma^2}{n\,(1-\rho)^2}\left[1-\frac{2\rho\,(1-\rho^n)}{n\,(1-\rho)(1+\rho)}\right] \qquad (22)$$
Lemma.

The covariance between $\bar{Y}_n$ and each variable $Y_j$ is useful and given by

$$\mathrm{Cov}(\bar{Y}_n,Y_j)=\frac{\sigma^2}{n\,(1-\rho^2)}\,\frac{1+\rho-\rho^{n+1-j}-\rho^{j}}{1-\rho} \qquad (23)$$

In addition, we have a few remarkable identities:

$$\sum_{j=1}^{n}\mathrm{Cov}(\bar{Y}_n,Y_j)=\frac{\sigma^2}{1-\rho^2}\left[\frac{1+\rho}{1-\rho}-\frac{2\rho\,(1-\rho^n)}{n\,(1-\rho)^2}\right] \qquad (24)$$
$$\sum_{j=1}^{n}\left(\mathrm{Cov}(\bar{Y}_n,Y_j)\right)^2=\frac{\sigma^4}{(1-\rho^2)^2}\left[\frac{(1+\rho)^2+2\rho^{n+1}}{(1-\rho)^2}\,\frac{1}{n}-\frac{4(1+\rho)^2\rho\,(1-\rho^n)-2\rho^2(1-\rho^{2n})}{(1-\rho)^2(1-\rho^2)}\,\frac{1}{n^2}\right] \qquad (25)$$

Proof: see B.3.
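A quick numerical check of (23)-(25) against the covariance matrix (7) (helper names are ours; $j$ is 1-based as in the lemma):

```python
def cov_matrix(n, rho, sigma=1.0):
    """Covariance matrix of (Y_1, ..., Y_n), from (7)."""
    c = sigma ** 2 / (1 - rho ** 2)
    return [[c * rho ** abs(i - j) for j in range(n)] for i in range(n)]


def cov_bar_direct(n, rho, j, sigma=1.0):
    """Cov(Ybar_n, Y_j) = (1/n) sum_i Cov(Y_i, Y_j), with j 1-based."""
    S = cov_matrix(n, rho, sigma)
    return sum(S[i][j - 1] for i in range(n)) / n


def cov_bar_formula(n, rho, j, sigma=1.0):
    """Closed form (23)."""
    return (sigma ** 2 / (n * (1 - rho ** 2))
            * (1 + rho - rho ** (n + 1 - j) - rho ** j) / (1 - rho))


n, rho = 7, 0.5
# identity (24): the sum of covariances
sum24 = sum(cov_bar_formula(n, rho, j) for j in range(1, n + 1))
rhs24 = (1 / (1 - rho ** 2)) * ((1 + rho) / (1 - rho)
                                - 2 * rho * (1 - rho ** n)
                                / (n * (1 - rho) ** 2))
# identity (25): the sum of squared covariances
lhs25 = sum(cov_bar_formula(n, rho, j) ** 2 for j in range(1, n + 1))
rhs25 = (1.0 / (1 - rho ** 2) ** 2) * (
    ((1 + rho) ** 2 + 2 * rho ** (n + 1)) / (1 - rho) ** 2 / n
    - (4 * (1 + rho) ** 2 * rho * (1 - rho ** n)
       - 2 * rho ** 2 * (1 - rho ** (2 * n)))
    / ((1 - rho) ** 2 * (1 - rho ** 2)) / n ** 2)
print(sum24, rhs24)
print(lhs25, rhs25)
```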

We can now easily compute the expectation and variance of the denominator as follows.

Proposition.

The expectation of $s_n^2$ is given by:

$$\mathbb{E}\!\left[s_n^2\right]=\frac{\sigma^2}{1-\rho^2}\left(1-\frac{2\rho}{(1-\rho)(n-1)}+\frac{2\rho\,(1-\rho^n)}{n\,(n-1)(1-\rho)^2}\right) \qquad (26)$$

Proof: see B.4.
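Formula (26) can be verified against the direct identity $\mathbb{E}[s_n^2]=\frac{1}{n-1}\left(\sum_i \mathrm{Var}(Y_i)-n\,\mathrm{Var}(\bar{Y}_n)\right)$, with both variances computed from the covariance matrix (7) (a sketch; helper names are ours):

```python
def exp_s2_direct(n, rho, sigma=1.0):
    """E[s_n^2] = (sum_i Var(Y_i) - n Var(Ybar_n)) / (n - 1),
    computed from the covariance matrix (7)."""
    c = sigma ** 2 / (1 - rho ** 2)
    var_bar = sum(c * rho ** abs(i - j)
                  for i in range(n) for j in range(n)) / n ** 2
    return (n * c - n * var_bar) / (n - 1)


def exp_s2_formula(n, rho, sigma=1.0):
    """Closed form (26)."""
    return sigma ** 2 / (1 - rho ** 2) * (
        1 - 2 * rho / ((1 - rho) * (n - 1))
        + 2 * rho * (1 - rho ** n) / (n * (n - 1) * (1 - rho) ** 2))


for n in (3, 10, 50):
    for rho in (-0.5, 0.2, 0.8):
        print(n, rho, exp_s2_direct(n, rho),
              exp_s2_formula(n, rho))  # the two columns agree
```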

Proposition.

The second moment of $s_n^2$ is given by:

$$\mathbb{E}\!\left[s_n^4\right]=\frac{\sigma^4}{(1-\rho^2)^2}\,\frac{1}{(n-1)^2}\left[n^2-1+\rho\left(n\,A_1+A_2+\frac{1}{n}A_3+\frac{1}{n^2}A_4\right)\right] \qquad (27)$$

with

$$A_1=-\frac{4}{1-\rho^2} \qquad (28)$$
$$A_2=-\frac{2\,(3+9\rho+11\rho^2+3\rho^3+6\rho^n+12\rho^{n+1}+6\rho^{n+2}-2\rho^{2n+2})}{(1-\rho^2)^2} \qquad (29)$$
$$A_3=\frac{4\,(1-\rho^n)(1-3\rho+4\rho^2-8\rho^{n+1})}{(1-\rho)^3(1+\rho)} \qquad (30)$$
$$A_4=\frac{12\rho\,(1-\rho^n)^2}{(1-\rho)^4} \qquad (31)$$

Proof: see B.5.
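For a zero-mean Gaussian vector $Y$ with covariance $S$, the quadratic-form identity $\mathbb{E}[(Y^TPY)^2]=(\mathrm{tr}\,PS)^2+2\,\mathrm{tr}(PSPS)$ gives an independent way to evaluate $\mathbb{E}[s_n^4]$ numerically, and hence to cross-check (27); a sketch (helper name is ours), with the known iid value $\sigma^4(n+1)/(n-1)$ recovered at $\rho=0$:

```python
def exp_s4_wick(n, rho, sigma=1.0):
    """E[s_n^4] for the Gaussian AR(1) vector via Wick's theorem:
    s_n^2 = Y'PY/(n-1) with the centering projector P = I - J/n, so
    E[s_n^4] = [tr(PS)^2 + 2 tr(PSPS)] / (n-1)^2, S from (7)."""
    S = [[sigma ** 2 * rho ** abs(i - j) / (1 - rho ** 2)
          for j in range(n)] for i in range(n)]
    P = [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)]
         for i in range(n)]
    PS = [[sum(P[i][k] * S[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    tr = sum(PS[i][i] for i in range(n))
    tr2 = sum(PS[i][k] * PS[k][i] for i in range(n) for k in range(n))
    return (tr ** 2 + 2 * tr2) / (n - 1) ** 2


n = 10
print(exp_s4_wick(n, 0.0))  # iid case: (n+1)/(n-1) = 11/9 here
print(exp_s4_wick(n, 0.5))  # AR(1) value that (27) must reproduce
```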

Combining the two results leads to the following proposition.

Proposition.

The variance of $s_n^2$ is given by:

$$\mathrm{Var}\!\left[s_n^2\right]=\frac{\sigma^4}{(1-\rho^2)^2}\,\frac{1}{n-1}\left[2+\frac{\rho}{n-1}\left(n\,B_1+B_2+\frac{1}{n}B_3+\frac{1}{n^2}B_4\right)\right] \qquad (32)$$

with

$$B_1=-\frac{2}{1+\rho} \qquad (33)$$
$$B_2=-\frac{2}{1-\rho}-\frac{4\rho^2}{(1-\rho)^2}-\frac{2\,(1-\rho^n)}{(1-\rho)^2} \qquad (34)$$
$$\qquad-\frac{2\,(12\rho^{n+1}+6\rho^{n+2}-2\rho^{2n+2}+6\rho^n+3\rho^3+11\rho^2+9\rho+3)}{(1-\rho^2)^2} \qquad (35)$$
$$B_3=\frac{(1-\rho^n)(13-4\rho+15\rho^2-\rho^n-32\rho^{n+1}+\rho^{n+2})}{(1-\rho)^3(1+\rho)} \qquad (36)$$
$$B_4=-\frac{4\,(1-3\rho)(1-\rho^n)^2}{(1-\rho)^4} \qquad (37)$$

Proof: see B.6.

It is worth noting that a direct approach, as explained in Benhamou (2018), could also give the results for the first and second moments and the variance of the numerator and denominator.

4 Resulting distribution

The previous section shows that under the AR(1) assumption, the t-statistic no longer follows a Student distribution but is the ratio of a normal variable, whose first and second moments have been given above, and the norm of a Gaussian vector, whose moments have also been provided. To go further, one needs to rely on numerical integration. This is the subject of further research.
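Pending such numerical integration, the distribution can be explored by seeded Monte Carlo (a sketch with our own helper names); the dispersion of $T_n$ visibly exceeds that of the Student case as soon as $\rho$ is away from zero:

```python
import math
import random


def ar1_t_stats(n, rho, n_sims=4000, mu=0.0, sigma=1.0, seed=7):
    """Monte Carlo draws of T_n = sqrt(n)(Xbar - mu)/s_n under AR(1)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_sims):
        eps = rng.gauss(0.0, sigma / math.sqrt(1 - rho ** 2))
        xs = [mu + eps]
        for _ in range(n - 1):
            eps = rho * eps + sigma * rng.gauss(0.0, 1.0)
            xs.append(mu + eps)
        xbar = sum(xs) / n
        s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
        out.append(math.sqrt(n) * (xbar - mu) / math.sqrt(s2))
    return out


for rho in (0.0, 0.6):
    ts = ar1_t_stats(30, rho)
    m = sum(ts) / len(ts)
    sd = (sum((t - m) ** 2 for t in ts) / len(ts)) ** 0.5
    print(rho, m, sd)  # dispersion grows markedly with rho
```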

5 Conclusion

In this paper, we have given the explicit first and second moments and the variance of the numerator and denominator of the t-statistic under the assumption of an AR(1) underlying process. We have seen that these moments are very sensitive to the correlation assumption and that the distribution is far from a Student distribution.

Appendix A Various Proofs for the Student density

A.1 Deriving the t-Student density

Let us first remark that, in the t-statistic, the factor $\sigma$ cancels out, making the degree of freedom appear as follows:

$$T_n=\frac{\bar{X}-\mu}{s_n/\sqrt{n}}=\frac{\bar{X}-\mu}{\frac{\sigma}{\sqrt{n}}}\,\frac{1}{\frac{s_n}{\sigma}}=\frac{U}{\frac{s_n}{\sigma}}=\sqrt{n-1}\,\frac{U}{\sqrt{\frac{\sum(X_i-\bar{X})^2}{\sigma^2}}}=\sqrt{n-1}\,\frac{U}{V} \qquad (38)$$

In the above expression, it is well known that, if $X_i\sim\mathcal{N}(\mu,\sigma^2)$, then the renormalized variable $U=\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}$ follows a standard normal distribution, $V^2=\frac{\sum(X_i-\bar{X})^2}{\sigma^2}$ follows a chi-squared distribution with $n-1$ degrees of freedom, and $U$ and $V$ are independent. Hence, we need to prove that the distribution of $T=\sqrt{k}\,\frac{U}{V}$ is a Student distribution, with $U\sim\mathcal{N}(0,1)$ and $V^2\sim\chi^2_k$ mutually independent, where $k$ is the degree of freedom of the chi-squared distribution.

The core of the proof relies on two steps that can be proved by various means.
Step 1 is to prove that the distribution of $T$ is given by

$$f_T(t)=\frac{1}{\Gamma\!\left(\frac{k}{2}\right)2^{\frac{k+1}{2}}\sqrt{\pi k}}\int_0^{\infty}e^{-w\left(\frac{t^2}{2k}+\frac{1}{2}\right)}\,w^{\frac{k-1}{2}}\,dw \qquad (39)$$

Step 2 is to compute explicitly the integral in equation (39).
Step 1 can be done by transformation theory, using the Jacobian of the inverse transformation, or by the property of the ratio distribution. Step 2 can be done using the Gamma function, Gamma distribution properties, the Mellin transform or the Laplace transform.

A.2 Proving step 1

A.2.1 Using transformation theory

The joint density of $U$ and $V^2$ is:

$$f_{U,V^2}(u,v)=\underbrace{\frac{1}{(2\pi)^{1/2}}e^{-u^2/2}}_{\text{pdf of }\mathcal{N}(0,1)}\;\underbrace{\frac{1}{\Gamma\!\left(\frac{k}{2}\right)2^{k/2}}\,v^{\frac{k}{2}-1}e^{-v/2}}_{\text{pdf of }\chi^2_k} \qquad (40)$$

with the distribution support given by $-\infty<u<\infty$ and $0\leq v<\infty$.

Making the transformation $t=\frac{u}{\sqrt{v/k}}$ and $w=v$, we can compute the inverse: $u=t\,(w/k)^{1/2}$ and $v=w$. The Jacobian (the determinant of the Jacobian matrix of the inverse transformation) is given by

$$J(t,w)=\begin{vmatrix}(w/k)^{1/2}&\frac{t}{2}(kw)^{-1/2}\\ 0&1\end{vmatrix} \qquad (41)$$

whose value is $(w/k)^{1/2}$. The marginal pdf of $T$ is therefore given by:

$$f_T(t)=\int_0^{\infty}f_{U,V^2}\!\left(t\,(w/k)^{1/2},\,w\right)J(t,w)\,dw \qquad (42)$$
$$=\int_0^{\infty}\frac{1}{(2\pi)^{1/2}}e^{-(t^2w/k)/2}\,\frac{1}{\Gamma\!\left(\frac{k}{2}\right)2^{k/2}}\,w^{\frac{k}{2}-1}e^{-w/2}\,(w/k)^{1/2}\,dw \qquad (43)$$
$$=\frac{1}{\Gamma\!\left(\frac{k}{2}\right)2^{\frac{k+1}{2}}\sqrt{\pi k}}\int_0^{\infty}e^{-w\left(\frac{t^2}{2k}+\frac{1}{2}\right)}\,w^{\frac{k-1}{2}}\,dw \qquad (44)$$

which proves the result. ∎
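The integral representation (39) can be checked numerically against the closed form of the Student pdf (a sketch; the trapezoidal rule, cutoff and step count are arbitrary choices of ours):

```python
import math


def student_pdf_closed(t, k):
    """Closed-form Student t pdf with k degrees of freedom."""
    return (math.gamma((k + 1) / 2) / math.gamma(k / 2)
            / math.sqrt(math.pi * k)
            * (k / (t ** 2 + k)) ** ((k + 1) / 2))


def student_pdf_integral(t, k, upper=80.0, steps=200_000):
    """Trapezoidal evaluation of the integral representation (39)."""
    pref = 1.0 / (math.gamma(k / 2) * 2 ** ((k + 1) / 2)
                  * math.sqrt(math.pi * k))
    a = t ** 2 / (2 * k) + 0.5
    h = upper / steps

    def g(w):
        return math.exp(-a * w) * w ** ((k - 1) / 2)

    total = 0.5 * (g(1e-12) + g(upper))
    total += sum(g(i * h) for i in range(1, steps))
    return pref * total * h


print(student_pdf_closed(1.0, 3), student_pdf_integral(1.0, 3))
```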

A.2.2 Using the ratio distribution

The square root of $V^2\sim\chi^2_k$, denoted $\hat{V}$, is distributed as a chi-distribution with $k$ degrees of freedom, which has density

$$f_{\hat{V}}(\hat{v})=\frac{2^{1-\frac{k}{2}}}{\Gamma\!\left(\frac{k}{2}\right)}\,\hat{v}^{k-1}\exp\left\{-\frac{\hat{v}^2}{2}\right\} \qquad (45)$$

Define $X=\frac{\hat{V}}{\sqrt{k}}$. Then, by change of variable, we can compute the density of $X$:

$$f_X(x)=f_{\hat{V}}(\sqrt{k}\,x)\left|\frac{\partial\hat{v}}{\partial x}\right| \qquad (46)$$
$$=\frac{2^{1-\frac{k}{2}}}{\Gamma\!\left(\frac{k}{2}\right)}\,k^{\frac{k}{2}}\,x^{k-1}\exp\left\{-\frac{kx^2}{2}\right\} \qquad (47)$$

The Student's t random variable, defined as $T=\frac{U}{X}$, has a distribution given by the ratio distribution:

$$f_T(t)=\int_{-\infty}^{\infty}|x|\,f_U(xt)\,f_X(x)\,dx \qquad (48)$$

We can notice that $f_X(x)=0$ over the interval $(-\infty,0)$, since $X$ is a non-negative random variable. We are therefore entitled to eliminate the absolute value, and the integral reduces to

$$f_T(t)=\int_0^{\infty}x\,f_U(xt)\,f_X(x)\,dx \qquad (49)$$
$$=\int_0^{\infty}x\,\frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{(xt)^2}{2}\right\}\frac{2^{1-\frac{k}{2}}}{\Gamma\!\left(\frac{k}{2}\right)}\,k^{\frac{k}{2}}\,x^{k-1}\exp\left\{-\frac{k}{2}x^2\right\}dx \qquad (50)$$
$$=\frac{1}{\sqrt{2\pi}}\,\frac{2^{1-\frac{k}{2}}}{\Gamma\!\left(\frac{k}{2}\right)}\,k^{\frac{k}{2}}\int_0^{\infty}x^{k}\exp\left\{-\frac{1}{2}(k+t^2)x^2\right\}dx \qquad (51)$$

To conclude, we make the change of variable $w=kx^2$, which leads to

$$f_T(t)=\frac{1}{\Gamma\!\left(\frac{k}{2}\right)2^{\frac{k+1}{2}}\sqrt{\pi k}}\int_0^{\infty}e^{-w\left(\frac{t^2}{2k}+\frac{1}{2}\right)}\,w^{\frac{k-1}{2}}\,dw \qquad (52)$$

A.3 Proving step 2

The first step is quite relevant, as it shows that the integral to compute takes various forms depending on the change of variable used.

A.3.1 Using the Gamma function

Using the change of variable $u=w\left(\frac{t^2}{2k}+\frac{1}{2}\right)$ and knowing that $\int_0^{\infty}e^{-u}u^{\frac{k+1}{2}-1}\,du=\Gamma\!\left(\frac{k+1}{2}\right)$, we can easily conclude as follows:

$$f_T(t)=\frac{1}{\Gamma\!\left(\frac{k}{2}\right)2^{\frac{k+1}{2}}\sqrt{\pi k}}\int_0^{\infty}e^{-w\left(\frac{t^2}{2k}+\frac{1}{2}\right)}\,w^{\frac{k-1}{2}}\,dw \qquad (53)$$
$$=\frac{1}{\Gamma\!\left(\frac{k}{2}\right)2^{\frac{k+1}{2}}\sqrt{\pi k}}\left(\frac{2k}{t^2+k}\right)^{\frac{k+1}{2}}\int_0^{\infty}e^{-u}\,u^{\frac{k-1}{2}}\,du \qquad (54)$$
$$=\frac{1}{\Gamma\!\left(\frac{k}{2}\right)2^{\frac{k+1}{2}}\sqrt{\pi k}}\,\Gamma\!\left(\frac{k+1}{2}\right)\left(\frac{2k}{t^2+k}\right)^{\frac{k+1}{2}} \qquad (55)$$
$$=\frac{\Gamma\!\left(\frac{k+1}{2}\right)}{\Gamma\!\left(\frac{k}{2}\right)}\,\frac{1}{\sqrt{\pi k}}\left(\frac{k}{t^2+k}\right)^{\frac{k+1}{2}} \qquad (56)$$

A.3.2 Using Gamma distribution properties

Another way to conclude is to notice, in the integral of (39), the kernel of a Gamma distribution pdf with parameters $\alpha=\frac{k+1}{2}$ and $\beta=\frac{t^2+k}{2k}$. The generic pdf of the Gamma distribution is $f(w)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}\,w^{\alpha-1}e^{-\beta w}$ and it sums to one over $[0,\infty)$, hence

$$f_T(t)=\frac{1}{\Gamma\!\left(\frac{k}{2}\right)2^{\frac{k+1}{2}}\sqrt{\pi k}}\int_0^{\infty}e^{-w\left(\frac{t^2}{2k}+\frac{1}{2}\right)}\,w^{\frac{k-1}{2}}\,dw \qquad (57)$$
$$=\frac{1}{\Gamma\!\left(\frac{k}{2}\right)2^{\frac{k+1}{2}}\sqrt{\pi k}}\,\frac{\Gamma\!\left(\frac{k+1}{2}\right)}{\left(\frac{t^2+k}{2k}\right)^{\frac{k+1}{2}}} \qquad (58)$$
$$=\frac{\Gamma\!\left(\frac{k+1}{2}\right)}{\Gamma\!\left(\frac{k}{2}\right)}\,\frac{1}{\sqrt{\pi k}}\left(\frac{k}{t^2+k}\right)^{\frac{k+1}{2}} \qquad (59)$$

A.3.3 Using the Mellin transform

The integral of equation (39) can be seen as a Mellin transform of the function $g(x)=e^{-x\left(\frac{t^2}{2k}+\frac{1}{2}\right)}$, whose solution is well known and given by

$$\mathcal{M}_g\!\left(\frac{k+1}{2}\right)\equiv\int_0^{\infty}x^{\frac{k+1}{2}-1}g(x)\,dx=\frac{\Gamma\!\left(\frac{k+1}{2}\right)}{\left(\frac{t^2+k}{2k}\right)^{\frac{k+1}{2}}} \qquad (60)$$

As previously, this concludes the proof. ∎

A.3.4 Using the Laplace transform

We can use a classical result on the Laplace transform of the function $f(u)=u^{\alpha}$, as follows:

$$\mathcal{L}_f(s)=\int_0^{\infty}e^{-us}\,u^{\alpha}\,du=\frac{\Gamma(\alpha+1)}{s^{\alpha+1}} \qquad (61)$$

Hence the integral in (39) is simply the value of the Laplace transform of the polynomial function $u^{\frac{k-1}{2}}$ taken at $s=\frac{t^2}{2k}+\frac{1}{2}$, whose value is $\Gamma\!\left(\frac{k+1}{2}\right)/\left(\frac{t^2}{2k}+\frac{1}{2}\right)^{\frac{k+1}{2}}$. This enables us to conclude similarly to the proof using the Gamma function. ∎
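The Laplace-transform identity (61) is easy to confirm numerically (a sketch; cutoff and step count are arbitrary choices of ours):

```python
import math


def laplace_power(alpha, s, upper=100.0, steps=200_000):
    """Trapezoidal evaluation of L(s) = int_0^inf e^(-u s) u^alpha du."""
    h = upper / steps

    def f(u):
        return math.exp(-s * u) * u ** alpha

    total = 0.5 * (f(1e-12) + f(upper))
    total += sum(f(i * h) for i in range(1, steps))
    return total * h


alpha, s = 1.0, 2.0 / 3.0  # the values appearing in (39) for k = 3, t = 1
print(laplace_power(alpha, s), math.gamma(alpha + 1) / s ** (alpha + 1))
```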

A.3.5 Using other transforms

Indeed, as the Laplace transform is related to other transforms, we could also prove the result with the Laplace-Stieltjes, Fourier, Z or Borel transforms.

A.4 Sum of independent normals

We want to prove that if $X_i\overset{iid}{\sim}\mathcal{N}(0,1)$, then $\sum_{i=1}^{n}(X_i-\bar{X}_n)^2\sim\chi^2_{n-1}$. There are multiple proofs of this result:

• Recursive derivation

• Cochran’s theorem

A.4.1 Recursive derivation

Lemma.

Let us recall a simple lemma:

• If $Z\sim\mathcal{N}(0,1)$ is a random variable, then $Z^2\sim\chi^2_1$; that is, the square of a standard normal random variable is a chi-squared random variable with one degree of freedom.

• If $Z_1,\dots,Z_m$ are independent with $Z_i\sim\chi^2_{k_i}$, then $\sum_{i=1}^{m}Z_i\sim\chi^2_{k_1+\dots+k_m}$; that is, independent chi-squared variables add up to a chi-squared variable whose degree of freedom is the sum of the individual degrees of freedom.

The proof of this simple lemma can be established by variable transformation for the first part and by moment generating functions for the second part. We can now prove the following proposition.
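Both parts of the lemma can be illustrated by a seeded Monte Carlo check that a sum of $k$ squared standard normals has mean $k$ and variance $2k$ (the helper name is ours):

```python
import random


def chi2_moments(k, n_sims=100_000, seed=11):
    """Monte Carlo mean and variance of a sum of k squared standard
    normals; a chi-squared with k df has mean k and variance 2k."""
    rng = random.Random(seed)
    draws = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))
             for _ in range(n_sims)]
    m = sum(draws) / n_sims
    v = sum((d - m) ** 2 for d in draws) / n_sims
    return m, v


m, v = chi2_moments(4)
print(m, v)  # close to 4 and 8
```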

Proposition.

If $X_1,\dots,X_n$ is a random sample from a $\mathcal{N}(\mu,\sigma^2)$ distribution, then

• $\bar{X}_n$ and $s_n^2$ are independent random variables;

• $\bar{X}_n$ has a $\mathcal{N}(\mu,\sigma^2/n)$ distribution, where $\mathcal{N}$ denotes the normal distribution;

• $(n-1)\,s_n^2/\sigma^2$ has a chi-squared distribution with $n-1$ degrees of freedom.

Proof.

Without loss of generality, we assume that $\mu=0$ and $\sigma=1$. We first show that $s_n^2$ can be written only in terms of $(X_2-\bar{X}_n,\dots,X_n-\bar{X}_n)$. This comes from:

$$s_n^2=\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X}_n)^2=\frac{1}{n-1}\left[(X_1-\bar{X}_n)^2+\sum_{i=2}^{n}(X_i-\bar{X}_n)^2\right] \qquad (62)$$
$$=\frac{1}{n-1}\left[\left(\sum_{i=2}^{n}(X_i-\bar{X}_n)\right)^2+\sum_{i=2}^{n}(X_i-\bar{X}_n)^2\right] \qquad (63)$$

where we have used the fact that $\sum_{i=1}^{n}(X_i-\bar{X}_n)=0$, hence $X_1-\bar{X}_n=-\sum_{i=2}^{n}(X_i-\bar{X}_n)$.

We now show that $\bar{X}_n$ and $s_n^2$ are independent. The joint pdf of the sample is given by

$$f(x_1,\dots,x_n)=\frac{1}{(2\pi)^{n/2}}\,e^{-\frac{1}{2}\sum_{i=1}^{n}x_i^2},\qquad -\infty<x_i<\infty \qquad (64)$$

We make the transformation

$$y_1=\bar{x},\quad y_2=x_2-\bar{x},\quad\dots,\quad y_n=x_n-\bar{x} \qquad (65\text{--}68)$$

The Jacobian of the transformation is equal to $n$. Hence, using $x_1=y_1-\sum_{i=2}^{n}y_i$ and $x_i=y_i+y_1$ for $i\geq 2$,

$$f(y_1,\dots,y_n)=\frac{n}{(2\pi)^{n/2}}\,e^{-\frac{1}{2}\left(y_1-\sum_{i=2}^{n}y_i\right)^2}\,e^{-\frac{1}{2}\sum_{i=2}^{n}(y_i+y_1)^2},\qquad -\infty<y_i<\infty \qquad (69)$$
$$=\left[\left(\frac{n}{2\pi}\right)^{1/2}e^{-\frac{n}{2}y_1^2}\right]\left[\frac{n^{1/2}}{(2\pi)^{(n-1)/2}}\,e^{-\frac{1}{2}\left(\left(\sum_{i=2}^{n}y_i\right)^2+\sum_{i=2}^{n}y_i^2\right)}\right] \qquad (70)$$

which proves that $y_1=\bar{x}$ is independent of $(y_2,\dots,y_n)$, or equivalently, that $\bar{X}_n$ is independent of $s_n^2$. To finalize the proof, we derive a recursive equation for $s_n^2$. We first notice the following relationship between $\bar{x}_{n+1}$ and $\bar{x}_n$:

$$\bar{x}_{n+1}=\frac{\sum_{i=1}^{n+1}x_i}{n+1}=\frac{x_{n+1}+n\,\bar{x}_n}{n+1}=\bar{x}_n+\frac{1}{n+1}\,(x_{n+1}-\bar{x}_n) \qquad (71)$$

We have therefore:

$$n\,s_{n+1}^2=\sum_{i=1}^{n+1}(x_i-\bar{x}_{n+1})^2=\sum_{i=1}^{n+1}\left[(x_i-\bar{x}_n)-\frac{1}{n+1}(x_{n+1}-\bar{x}_n)\right]^2 \qquad (72)$$
$$=\sum_{i=1}^{n+1}\left[(x_i-\bar{x}_n)^2-2\,(x_i-\bar{x}_n)\,\frac{x_{n+1}-\bar{x}_n}{n+1}+\frac{1}{(n+1)^2}(x_{n+1}-\bar{x}_n)^2\right] \qquad (73)$$
$$=\sum_{i=1}^{n}(x_i-\bar{x}_n)^2+(x_{n+1}-\bar{x}_n)^2-\frac{2\,(x_{n+1}-\bar{x}_n)^2}{n+1}+\frac{n+1}{(n+1)^2}(x_{n+1}-\bar{x}_n)^2 \qquad (74)$$
$$=(n-1)\,s_n^2+\frac{n}{n+1}\,(x_{n+1}-\bar{x}_n)^2 \qquad (75)$$
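The recursions (71) and (75) can be checked on fixed data (a sketch using the standard library's Bessel-corrected `statistics.variance`; the data values are arbitrary):

```python
import statistics

# fixed data: first n = 6 points, then one extra observation
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0]
head, x_new = xs[:-1], xs[-1]
n = len(head)
xbar_n = sum(head) / n

# mean recursion (71)
xbar_np1 = sum(xs) / (n + 1)
print(xbar_np1, xbar_n + (x_new - xbar_n) / (n + 1))

# variance recursion (75): n s_{n+1}^2 = (n-1) s_n^2 + n/(n+1) (x_{n+1} - xbar_n)^2
lhs = n * statistics.variance(xs)
rhs = (n - 1) * statistics.variance(head) + n / (n + 1) * (x_new - xbar_n) ** 2
print(lhs, rhs)  # equal by (75)
```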

We can now obtain the result by induction. The result is true for $n=2$, since $s_2^2=\frac{1}{2}(X_2-X_1)^2$ with $\frac{X_2-X_1}{\sqrt{2}}\sim\mathcal{N}(0,1)$, hence $s_2^2\sim\chi^2_1$. Suppose it is true for $n$, that is, $(n-1)\,s_n^2\sim\chi^2_{n-1}$; then, since