Modelling high-dimensional time series efficiently by means of constrained spatio–temporal models

# Modelling high-dimensional time series efficiently by means of constrained spatio–temporal models

Maria Lucia Parrella***Maria Lucia Parrella, Università di Salerno - Via Giovanni Paolo II 137 - 84084 Fisciano (SA), Italy, tel: +3908996221, fax: +39089962049, email: mparrella@unisa.it
Department of Economics and Statistics, University of Salerno, Italy
mparrella@unisa.it
###### Abstract

Many econometric analyses involve spatio–temporal data, where the concept of space may be interpreted in a wide sense (from a physical, economic, or social point of view). A considerable amount of literature has addressed spatio–temporal models, with Spatial Dynamic Panel Data (SDPD) being widely investigated and applied. In real data applications, checking the validity of the theoretical assumptions underlying the SDPD models is essential but sometimes difficult. At other times, the assumptions are clearly violated. For example, the spatial matrix (which gives the spatial weights and influences the correlation structure of the process) is assumed to be known, but it may actually be unknown and needs to be estimated. In such cases, the properties and performance of the SDPD model’s estimator are generally affected. Motivated by such considerations, we propose a new model (called stationary SDPD) and a new estimation procedure based on very simple and clear assumptions that can be easily checked using real data. The new model is highly adaptive, and the estimation procedure has a rate of convergence that is not affected by the dimension of the time series (under general assumptions), notwithstanding the relatively high number of parameters to be estimated. A further result of this work is that the new model may also be used to represent a wide class of multivariate time series that are not explicitly spatio–temporal, with respect to a “latent spatial matrix,” which needs to be estimated. Therefore, it can be used as a valid alternative to vector autoregressive (VAR) models with two immediate advantages: i) a faster rate of convergence of the estimation procedure and ii) the possibility of estimating the model even when the dimension is higher than the time series length, overcoming the curse of dimensionality typical of the VAR models. The simulation study shows that the new estimation procedure performs well compared with the classic alternative procedure, even when the spatial matrix is unknown, and therefore, to be estimated.
Keywords: multivariate time series, spatio–temporal models, VAR models, curse of dimensionality, high dimension.
JEL: C23, C50, C51.

## 1 Introduction

Spatio–temporal models are widely used by practitioners. Explaining economic, environmental, social, or biological phenomena, such as peer influence, neighbourhood effects, contagion, epidemics, interdependent preferences, climate change, and so on, are only some of the interesting applications of such models. A widely used spatio–temporal model is the spatial dynamic panel data model (SDPD) proposed and analysed by [Lee & Yu(2010a)]. See [Lee & Yu(2010b)] for a survey. To improve adaptivity of SDPD models, [Dou et al.(2016)] recently proposed a generalized model that assigns different coefficients to varied locations and assumes heteroskedastic and spatially correlated errors. The model is

 yt=D(λ0)Wyt+D(λ1)yt−1+D(λ2)Wyt−1+\boldmathεt, (1.1)

where the vector is of order and contains the observations at time from different locations; the errors are serially uncorrelated; the spatial matrix is a weight matrix with zero main diagonal and is assumed to be known; with are diagonal matrices, and are the vectors with the spatial coefficients for . The generalized SDPD model in (1.1) guarantees adaptivity by means of its parameters. It is characterized by the sum of three terms: the spatial component, driven by matrix and the spatial parameter ; the dynamic component, driven by the autoregressive parameter ; and the spatial–dynamic component, driven by matrix and the spatial–autoregressive parameter . If the vectors are scalars for all , then model (1.1) reduces to the classic SDPD of [Lee & Yu(2010a)].

The errors in model (1.1) are serially uncorrelated and may show heteroskedasticity and cross-correlation over space, so that is a full matrix. This is a novelty compared with the SDPD model of [Lee & Yu(2010a)], where the errors must be cross-uncorrelated and homoskedastic in order to get consistency of the estimators. A setup similar to ours for the errors has been also considered by [Kelejian & Prucha(2010)] and [Su(2012)], but not for panel models. However, their estimators are generally based on the instrumental variables technique, in order to overcome the endogeneity of the zero-lag regressor. For the generalized SDPD model, instead, [Dou et al.(2016)] propose a new estimation procedure based on a generalized Yule–Walker approach. They show the consistency of the estimators under regularity assumptions. They also derive the convergence rate and the conditions under which the estimation procedure does not suffer for high-dimensional setups, notwithstanding the large number of parameters to be estimated (which become infinite with the dimension ).

In real data applications, it is important to check the validity of the assumptions required for the consistency of the estimation procedure. See, for example, the assumptions and asymptotic analysis in [Lee & Yu(2010a)] and [Dou et al.(2016)] as well as the references therein. Checking such assumptions on real data is often not easy; at times, they are clearly violated. Moreover, the spatial matrix is assumed to be known, but in many cases, this is not true, and it must be estimated. For example, the spatial weights can be associated with “similarities” between spatial units and measured by estimated correlations. Another example is when the spatial weights are zeroes/ones, depending on the “relationships” between the spatial units, but the neighbourhood structure of is unknown (i.e., it is not known where the ones must be allocated). In such cases, the performance of the SDPD models needs to be investigated. Readers are advised to refer to recent papers on spatial matrix estimation (see, among others, [Lam & Souza(2016)]).

Motivated by the above considerations, we propose a new version of the SDPD model obtained by adding a constraint to the spatial parameters of the generalized SDPD of [Dou et al.(2016)]. New estimators of the parameters are proposed and investigated theoretically and empirically.

The new model is called stationary SDPD and has several advantages. First, the structure of the model and the interpretation of the parameters are simpler than the generalized SDPD model, with the consequence that the assumptions underlying the theoretical results are clearer and can be checked easily with real data. Moreover, the estimation procedure is fast and simple to implement.

Second, the proposed estimators of the parameters are always unbiased and reach the convergence rate (where is the temporal length of the time series) even in the high-dimensional case, although the number of parameters tends to infinity with the dimension .

Last, but not least, our model allows wider application than the classic SDPD model, and it is general enough to represent a wide range of multivariate linear processes that can be implicitly interpreted (when they are not explicitly interpretable) as spatio–temporal processes, with respect to a “latent spatial matrix,” which needs to be estimated. A big implication of this is that our model is not necessarily confined to the representation of strict spatio–temporal processes (where the spatial matrix is known), but it can also be considered as a valid alternative to the general VAR models (where there is no spatial matrix), with two relevant advantages: i) more efficient estimation of the model and ii) the possibility of estimating the model even when , thus avoiding the curse of dimensionality that characterizes the VAR models. Surprisingly, the simulation results show the remarkably better performance of our model and the new estimation procedure compared with the standard VAR model and the standard estimation procedure, even when the spatial matrix is latent and, therefore, to be estimated (see section 5.3).

The rest of the paper is organized as follows. Section 2 presents the new model and discusses the issue of identifiability. Section 3 describes the estimation procedure. The theoretical results are shown in section 4. The empirical performance of the estimation procedure is investigated in section 5. Finally, all the proofs are provided in the Appendix.

## 2 A constrained spatio–temporal model: the stationary SDPD

In the sequel, we assume that are the observations from a stationary process defined by (2.1). The transpose of a matrix is denoted with . We assume that the process has mean zero and denote with the covariance matrix of the process at the lag . The generalized SDPD model in (1.1) can be reformulated as follows.

 [Ip−D(λ0)W]yt=D(λ1)[Ip−D(λ+2)W]yt−1+\boldmathεt, (2.1)

where is a vector obtained by dividing the elements of by the corresponding elements of (assuming, for now, that all the coefficients in are different from zero). Note that model (2.1) is equivalent to a multivariate (auto)regression between a linear combination of and a linear combination of the lag , where the weights of the two linear combinations depend on and the coefficients and , respectively.

 z(λ0,W)t=D(λ1)z(λ+2,W)t−1+\boldmathεt. (2.2)

Some special cases may arise from model (2.1) by adding restrictions on the parameters . First, if we assume that the spatial parameters are constant over space, that is, is scalar for , then we obtain the classic SDPD model of [Lee & Yu(2010a)].

Another constrained model, proposed and analysed in this paper, may be derived by assuming that . The reason underlying the choice of this constraint is a generalized assumption of stationarity. In time series analysis, stationarity means that the dependence structure of the process is constant (in some sense) over time. In particular, second-order stationarity assumes that correlations between the observations depend on the lag but not on , implying that is constant for all . However, in spatio–temporal time series, there are two kinds of correlations: temporal correlations, involving observations at different time points, and spatial correlations, involving observations at different spatial units. As we refer to stationarity, it makes sense to assume that spatial correlations are also time-invariant, which means that the weights in (2.2) must not change over time, thus, , also implying that is the same for all . Therefore, we add the constraint , and the model becomes

 [Ip−D(λ0)W]yt=D(λ1)[Ip−D(λ0)W]yt−1+\boldmathεt. (2.3)

We denote the model as stationary SDPD. Model (2.3) has several advantages that will be shown in the following sections. Above all, imposing spatio–temporal stationarity helps gain efficiency while still preserving the spatial adaptability that characterizes the generalized SDPD model of [Dou et al.(2016)]. Moreover, our model allows representation of a wide range of multivariate processes by means of a simple model subject to few assumptions that can be easily checked using real data.

Finally, it is worthwhile to stress the difference between the SDPD model of [Lee & Yu(2010a)], the generalized SDPD model of [Dou et al.(2016)], and the stationary SDPD model proposed here. The first model imposes that the spatial relationships be the same for all units, since the coefficients (with ) do not change with . Instead, the stationary SDPD model in (2.3) allows varied coefficients for different spatial units, as in the generalized SDPD of [Dou et al.(2016)], but they are assumed to be time-invariant thanks to a constraint on the time-lagged parameters. Of course, the estimation procedures vary for the three cases in terms of the convergence rates. The constrained model underlying our stationary SDPD allows the estimation procedure to reach the convergence rate and to guarantee unbiased estimators, whatever the dimension and even when at any rate. This is a big improvement with respect to the other two models. In fact, for the classic SDPD model, the estimators are characterized by a convergence rate (which is faster than that of our model, since they have only three parameters to estimate instead of ), but a bias of order exists, and it does not vanish when (see Theorem 3 of [Lee & Yu(2010a)]). On the other hand, the convergence rates of the estimators in the generalized SDPD model are slower than those of our model and deteriorate when at a rate faster than (see Theorem 2 of [Dou et al.(2016)]).

### 2.1 Identification of parameters in the case of cross-uncorrelated errors

In this section, we assume, for simplicity, that the matrix is diagonal (i.e., there is heteroskedasticity but no cross-correlation in the error process) and discuss the identifiability of the model. In the next section, we generalize the problem by also adding some cross-correlations in the error process.

Defining , model (2.3) can be reformulated as

 z(0)t = D(λ1)z(0)t−1+% \boldmathεt, (2.4)

which is a transformed VAR process with uncorrelated components, since is diagonal. Given that we assume that is also diagonal, the coefficients for , represent the slopes of univariate autoregressive models with respect to the latent variables . Therefore, . From (2.3), it follows that

 [Ip−D(λ0)W]Σ1=D(λ1)[Ip−D(λ0)W]Σ0 (2.5)

and for the -th equation,

 (eTi−λ0iwTi)Σ1=λ1i(eTi−λ0iwTi)Σ0, (2.6)

where is the column vector with its -th component equal to one and all the others equal to zero, while is the column vector containing the -th row of matrix .

Under general assumptions, (2.6) admits only one solution with respect to and (see Theorem 1), which can be found among the extreme points of as a function of . To provide insight into this, the first two plots of figure 1 show two examples based on model 1 used in the simulation study. Denote with ( the true values of the coefficients used in model 1 for a given location (in particular, in figure 1, the first two plots refer to locations and ). The solid line shows as a function of . The two dots show the points of this function where the first derivative is zero. The vertical and horizontal dashed lines identify which one of the two points satisfies the sufficient condition in (2.6). As expected, it coincides with the true values ( used to generate the time series. Theorem 1, shown in the Appendix, formalizes this result.

###### Theorem 1

Consider model (2.3) for a stationary process with mean zero, and assume that the error process is such that is diagonal (i.e., there is heteroskedasticity but no cross-correlation in the errors). Under assumptions in section 4, the following results hold:

1. There exist a unique couple of values satisfying the following system of equations:

 (eTi−λ0iwTi)Σ1−λ1i(eTi−λ0iwTi)Σ0=0T,i=1,…,p, (2.7)

where is the -th unit vector, and contains the -th row of the spatial matrix .

2. Such a point, , is also the solution of the following second-order polynomial equation:

 = 0 (2.8)

Remark 1: Theorem 1 not only shows that the stationary SDPD model is well identified, because there is a unique solution for , but it also suggests a way to estimate such parameters. In fact, we can find all the solutions to equation (2.8) and then check which one satisfies the sufficient condition in (2.6). This estimation procedure is described in section 3.

### 2.2 Identification of parameters in the case of cross-correlated errors

Now, we relax the assumption on the error by letting be a full matrix (i.e., there is heteroskedasticity and cross-correlation in the error process). This setup allows the process to include some spurious cross-correlation not explained by . In this case, the coefficients still give the correlations between the latent variables and , but now, the equations in model (2.4) are correlated. The main consequence of this is that the true values do not identify an extreme point of the correlation function (see case in figure 1). Anyway, the sufficient condition in (2.6) is still valid, and the true coefficients can be identified by introducing a “constrained” condition.

###### Theorem 2

Consider model (2.3) for a stationary process with mean zero, and assume that the error process is such that is a full matrix (i.e., there is heteroskedasticity and cross-correlation in the errors). Under assumptions in section 4, the following results hold:

1. There exist a unique couple of values satisfying the following system of equations

 (eTi−λ0iwTi)Σ1−λ1i(eTi−λ0iwTi)Σ0=0T,i=1,…,p,

where is the -th unit vector, and contains the -th row of the spatial matrix ;

2. such a point, , is also the solution of the following second-order polynomial equation.

 = (eTi−λ∗0iwTi)(ΣT1−Σ1)wi.

Remark 2: When the errors are not cross-correlated, the matrix is symmetric by Lemma 1 in the Appendix, so that point 2 in Theorem 2 becomes the same as in Theorem 1. Therefore, Theorem 2 includes Theorem 1 as a special case.

## 3 Estimation procedure

We present here a simple algorithm for the estimation of the parameters for . First, estimate the matrices and through some consistent estimators and . For example, and , where . Alternatively, the threshold estimator analyzed in [Chen et al.(2013)] can be considered in the high dimensional setup. Then, for each location , implement the following steps.

1. Define as the -th unit vector and , then compute:

 ˆa0i = eTiˆΣ0ei,ˆa1i=eTiˆΣ1ei,ˆa2i=eTi(ˆΣT1−ˆΣ1)wi, ˆb0i = −2eTiˆΣ0wi,ˆb1i=−eTi(ˆΣ1+ˆΣT1)wi, ˆc0i = wTiˆΣ0wi,ˆc1i=wTiˆΣ1wi.
2. Find the two roots and of the following two-order polynomial equation.

 ˆt0i+ˆt1iλi0+ˆt2iλ2i0=0, (3.1)

where , , and .

3. Estimate and by

 ˆλ0i = argminj=1,2vTijvij, (3.2) ˆλ1i = (eTi−ˆλ0iwTi)ˆΣ1(ei−ˆλ0iwi)(eTi−ˆλ0iwTi)ˆΣ0(ei−ˆλ0iwi), (3.3)

where , and .

Remark 3: Assumption in section 4 guarantees that matrix has full rank. However, the above estimation procedure may suffer for some locations if matrix is near singularity. Such a case may come about because of the presence of some almost linearly dependent rows in the matrix, which may cause the quantity to be almost zero for those rows (see Lemma 2). As a result, the procedure loose efficiency for the estimation of for those locations (but it still works for ). Something similar may happen if there are some (almost) zero rows in , which is excluded by assumption . Anyway, it is worthwhile to stress that the estimation procedure works efficiently for all the other locations. In fact, the procedure does not require the inversion of matrix , so it is able to isolate and separate the effects of “collinear” locations (or uncorrelated locations) from the other locations and to guarantee consistent and efficient estimations for the last locations.

## 4 Theoretical results

In this section, we show the theoretical foundations of our proposal. In particular, we present the assumptions and show the consistency and the asymptotic normality of the estimators, for the cases of finite dimension and high dimension. Moreover, we show that the stationary SDPD model can be used to represent a wide range of multivariate linear processes with respect to a “latent spatial matrix,” and therefore, it is of wider application than classic spatio–temporal contexts.

The reduced form of model (2.3) is

 yt=A∗yt−1+\boldmathε∗t, (4.1)

where and

 A∗=[Ip−D(λ0)W]−1D(λ1)[Ip−D(λ0)W]. (4.2)

Note that the errors have mean zero and are serially uncorrelated. Model (4.1) has a VAR representation, so it is stationary when all the eigenvalues of matrix are smaller than one in absolute value. From (4.2), we can note that contains the eigenvalues of whereas only affects its eigenvectors (see the proof of Theorem 1). Therefore, we must consider the following assumptions:

• and , for all , and vector is not scalar;

• for all and vector is such that matrix has full rank;

• the errors are serially independent and such that and for all , for some ;

• the spatial matrix is nonsingular and has zero main diagonal.

Assumption implies stationarity. Moreover, it guarantees that there are at least two distinct values in vector so that model (2.3) is identifiable, as shown in Theorem 1. Assumption is clearly necessary to assure that matrix can be inverted so that the reduced model in (4.1) is well defined (Remark 3 indicates what happens when this assumption is not satisfied). Incidentally, it is worthwhile to note that our setup automatically solves the problem concerning the parameter space of , highlighted at the end of section 2.2 by [Kelejian & Prucha(2010)]. So, in the empirical applications of our model, it is possible to use any kind of normalization for (i.e., row-factor normalization or single-factor normalization), since the vector would automatically rescale accordingly (see section 4.1 for more details on this aspect). This means that the coefficients can also take values outside the classic interval . Assumption assures that the results in [Hannan(1976)] can be applied to show the asymptotic normality of the estimators. Assumption is classic in spatio–temporal models and guarantees that the model is well defined and identifiable with respect to all the parameters, also for (see Lemma 2 and Theorem 4).

Under assumptions , it is immediately evident that the estimators and , presented in section 3, are both consistent following Theorem 11.2.1 in [Brockwell & Davis(1987)]. For asymptotic normality, the following theorem can be stated.

###### Theorem 3

Consider and , the estimators obtained by the algorithm in section 3. Under assumptions , we have for finite

 √T(ˆλji−λji)\lx@stackreld⟶N(0,DTjiVjiDji)j=0,1;i=1,…,p,

where are the vectors, and are the matrices of order with (see the proof).

Note that the estimators are unbiased for all and for all .

In the high dimension, we have infinite parameters to estimate ( in total, where ). Therefore, we must assure that the consistency of the estimators is still valid in such a case. As expected, the properties of matrix influence the consistency and the convergence rates of the estimators when . For example, denote with the number of nonzero elements in vector . If as , for all , then the effective dimension of model (2.3) is finite and Theorem 3 can still be applied for the consistency and the asymptotic normality of the estimators , even if . The following Theorem 4, instead, shows the consistency of the estimators under more general vectors , with as .

### 4.1 Asymptotics for high dimensional setups

In model (2.3), the spatial correlation between a given location -th and the other locations is summarized by . If the vector is rescaled by a factor , then we can have an equivalent model by rescaling the spatial coefficient by the inverse of the same factor, since . In such a way, we may consider irrelevant a row-normalization of matrix if we let the coefficients in rescale accordingly. Such an approach is not new and follows the idea of [Kelejian & Prucha(2010)]. We use this approach here in order to simplify the analysis and the interpretation of the stationary SDPD model in the high dimensional setup.

In fact, when and , the vectors may change with and this may have an influence on the scale order of the process. This happens, for example, if we consider a row-normalized spatial matrix , since the weights become infinitely small for infinitely large . Looking at the (4.2), model (2.3) appears to become spatially uncorrelated for because matrix tends to be asymptotically diagonal (for and given). As a consequence, the model appears to become not identifiable in the high dimension with respect to the parameters . To avoid this, we assume here that also the coefficients may depend on the dimension , borrowing the idea of [Kelejian & Prucha(2010)]. In such a way, we can derive the conditions for the identifiability of the model in the high dimension and better convergence rates for the estimators. This is shown by the following theorem.

###### Theorem 4

Consider and , the estimators obtained by the algorithm in section 3. Assume that the number of nonzero values in is for all . Under assumptions , for we have the following cases:

• if the vectors are normalized by norm then

 ∣∣ˆλji−λji∣∣=Op(T−1/2)for j=0,1;i=1,…,p,

provided that ;

• if the vectors are normalized by norm and then

 ∣∣ˆλji−λji∣∣=Op(T−1/2)for j=0,1;i=1,…,p;
• for generic (not normalized but bounded) vectors and we have

 ∣∣ˆλji−λji∣∣=Op(pT−1/2)for j=0,1;i=1,…,p.

As shown by Theorem 4, cases (i) and (ii), if we consider a row-normalized spatial matrix , our estimation procedure is consistent for any value of and with at any rate. In other words, the convergence rate is not affected by the dimension . However, there are some differences between the two cases of and normalization. In the first case, we need to impose that the spatial coefficients increases in the order as (otherwise the model becomes not identifiable in the high dimension), whereas in the last case of norm they can remain constant for . In case (iii), which is more general because it is valid for any , we need to impose in order to guarantee the consistency of the estimators.

To complete this section, we want to show the class of processes that can be analysed by our stationary SDPD model. Under assumption , any stationary SDPD model can be equivalently represented as a VAR process as in (4.1), with respect to an autoregressive matrix coefficient defined in (4.2). Now, by exploiting the simple structure of our model, we can show the conditions under which the opposite is true. The following corollary derives from standard results.

###### Corollary 1

Given a stationary multivariate process , with satisfying assumption , a necessary and sufficient condition to represent the process by a stationary SDPD model is that matrix is diagonalizable. Therefore, matrix must have linearly independent eigenvectors. This is (alternatively) assured by one of the following sufficient conditions:

• the eigenvalues of matrix are all distinct, or

• the eigenvalues of matrix consist of distinct values having geometric multiplicities , such that .

By corollary 1 and assumptions , the VAR processes that cannot be represented and consistently estimated by our stationary SDPD model are those characterized by a matrix with linear dependent eigenvectors (i.e., those with algebraic multiplicities) or those with complex eigenvalues. In order to apply our model to those cases also, we should generalize the estimation procedure using the Jordan decomposition of matrix . However, we leave this topic to future study.

## 5 Simulation study

This section contains the results of a simulation study implemented to evaluate the performance of the proposed estimation procedure. In section 5.1, we describe the settings and check the validity of the assumptions for the simulated models. Then, in section 5.2, we evaluate the consistency of the estimation procedure and the convergence rate for the estimators using a known spatial matrix. Finally, section 5.3, we analyse the case when the spatial matrix is unknown, and therefore, to be estimated.

### 5.1 Settings

We consider three different spatial matrices. In the first, we randomly generate a matrix of order , and we post-multiply this matrix by its transpose in order to force symmetry. The resulting spatial matrix is denoted with . Note that such a matrix is full, and it may have positive and negative elements. In the other two cases, the spatial matrix is sparse and has only positive entries: is generated by setting to one only four values in each row while is generated by setting to one elements in each row. For all three cases, we check the rank to guarantee that the spatial matrix has linearly independent rows. Moreover, we set to zero the main diagonal, and we rescale the elements so that each row has norm equal to one ( row-normalization).

For the error process, we generate independent Gaussian series with mean zero and standard error , where the values are generated randomly from a uniform distribution for . Then, we define the cross-correlated error process , where

 {εti=eti−0.7∗et2for i=3,…,p,εti=etiotherwise.

We generate all from a uniform distribution . The settings above guarantee that assumptions hold. We generate different models with dimensions and sample sizes . Note that we may have . For each configuration of settings, we generate 500 Monte Carlo replications of the model and report the estimation results. All the analyses have been made in R.

### 5.2 Empirical performance of the estimators when W is known

Figure 2 shows the box plots of the estimations for increasing sample sizes and fixed dimension . The four plots at the top refer to the estimation of while the four plots at the bottom refer to that of . Each plot focuses on a different location , where . The true values of the coefficients are shown through the horizontal lines. Note that we have for the first two box plots in each plot, since for this model. The box plots are centred on the true value of the parameters, and the variance reduces for increasing values of , showing consistency of the estimators and a good performance for small /large also.

To evaluate the estimation error, for each realized time series, we compute the average error and the average squared error using the equations below.

 AE(ˆλj)=1pp∑i=1(ˆλji−λji),ASE(ˆλj)=1pp∑i=1(ˆλji−λji)2,j=1,2. (5.1)

Table 1 reports the mean values of (with the standard deviations in brackets) computed over 500 simulated time series for different values of and .

As shown in the table, the estimation error decreases when the sample size increases. It is interesting to note that the estimation error does not increase for increasing values of the dimension . This is more evident from figure 3, which shows the box plots of the average errors (at the top) and (at the bottom) computed for 500 replications of the model, with varying values of , sample sizes , and spatial matrix . We can note from the figure that and are unbiased for all and . Moreover, the variability of the box plots decreases for and fixed : this is a consequence of averaging the absolute error over the locations using equation (5.1).

### 5.3 Estimation results when the spatial matrix is unknown

In this section, we evaluate the performance of the proposed estimation procedure when the spatial matrix is unknown and needs to be estimated. In this case, the estimation error has to be evaluated with respect to matrix in order to include the effects of both and on the final estimations. So, using (4.2), we define the two estimators

 ˆA∗SDPD(W) = [Ip−D(ˆλ0)W]−1D(ˆλ1)[Ip−D(ˆλ0)W]and (5.2) ˆA∗SDPD(ˆW) = [Ip−D(ˆλ0)ˆW]−1D(ˆλ1)[Ip−D(ˆλ0)ˆW], (5.3)

where matrix is assumed to be known in the first case and unknown in the second. When is unknown, we estimate it by the (row-normalized) correlation matrix at lag zero, but other more efficient estimators of can be considered alternatively.

For the sake of comparison, remembering the VAR representation of our model in (4.1), we also estimate matrix using the classic Yule–Walker estimator of the VAR model .

To give a measure of the estimation error, we define

 ASE(A∗(1))=1pp∑i=1(ˆA∗1i−A∗1i)2, (5.4)

where for are the true coefficients in the first row of matrix , and are their estimated values. The box plots in figure 4 summarize the results of the estimations from 500 replications of the model with (at the top) and (at the bottom). We report the average squared error computed by (5.4) in three different cases: the classic Yule–Walker estimator of the VAR model on the left, our estimator proposed in (5.2) with the known spatial matrix in the middle, and our estimator proposed in (5.3) with the estimated spatial matrix on the right.

Figure 4 shows interesting results. First, note that the classic estimator cannot be applied when , and this is a serious drawback of the classic VAR models. On the other hand, the stationary SDPD model is equivalently used to represent the same process but it can always generate an estimation result for all values of and regardless of whether is known or unknown. Moreover, if we compare the box plots, we can note that both the median and the variability of the estimators and are remarkably lower than those relative to the classic estimator (when available) for all sample sizes and dimensions . This deserves a further remark: while it is expected that the estimator performs better than (given that it exploits the knowledge of the true spatial matrix ), it is surprising to also see that the estimator outperforms the classic estimator , notwithstanding the fact that they function under the same conditions (only the time series is observed and no spatial matrix is known). Of course, the ASE of the estimator slightly increases compared to that of the estimator , but its variability remains more or less the same.

## Appendix A Appendix

###### Lemma 1

Consider model (2.3) and its transformation (2.4), assuming a stationary process with white noise errors . If the matrix is diagonal (i.e., there is no cross-correlation in the error process), then the matrix is symmetric.

Proof. Assume for simplicity that the stationary process has zero mean. By (2.4), we have

 z(0)t(z(0)t−1)T = D(λ1)z(0)t−1(z(0)t−1)T+\boldmathεt(z(0)t−1)T E{z(0)t(z(0)t−1)T} = D(λ1)E{z(0)t−1(z(0)t−1)T}+E{\boldmathεt(z(0)t−1)T} Σ01 = D(λ1)Σ00 Σ01(Σ00)−1 = D(λ1). (A.1)

Now, under the assumption that is diagonal (i.e., there is no spatial correlation in the error process), it is easy to show that is also diagonal. Therefore, by (A.1) and the diagonality of matrix , the matrix must also be diagonal.

So, defining and remembering that , we can write

 Σ01 = cov(z(0)t,z(0)t−1)=E{W0ytyTt−1W0T}=W0Σ1W0T Σ1 = (W0)−1Σ01(W0T)−1≡((W0)−1Σ01(W0T)−1)T.

All this implies that matrix is symmetric.

###### Lemma 2

Consider model (2.3) and its transformation (2.4). Under assumptions -, for fixed we have for all . Moreover, , and .

Proof. Without loss of generality, we assume that and for all . Given (2.6), we can write so that

 eTi[Σ1−λ1iΣ0]=−λ0iwTi[Σ1−λ1iΣ0], (A.2)

from which we can see that the -th row of matrix is equal to a linear combination of the other rows using the weights . Remember that has the -th component equal to zero and by assumption . Therefore, matrix is singular and has a rank of less than . Without loss of generality, we assume that the coefficients are distinct for all . If we post-multiply by , we get

 [Σ1−λ1iΣ0]Σ−10=Σ1Σ−10−λ1iIp,

which has rank since matrix is diagonalizable by (4.2), and is one of its eigenvalues (see Theorem 4.8 of [Schott(2005)]). Given that , we conclude that matrix has rank ; so it has linearly independent rows. By (A.2), we must conclude that the linear independent rows are those extracted by vector , which identify a nonsingular submatrix extracted from . This sub-matrix is therefore definite positive or definite negative, and the first part of the Lemma is proved.

The last part of the Lemma is a direct consequence of the first one. In particular, note that the zero value is not admissible for the coefficient since it would be in contrast with the assumption . In fact, if for a given , it means that there is not spatial correlation between the -th location and the others, so that cannot be different from the null vector. Therefore, assumption implies that for all (but see also Remark 3).

Proof of Theorem 1. Consider a given . To show point 1, from (2.7) and assumption , we can write

 aTiΣ1Σ−10=λ1iaTi, (A.3)

where . So, it is evident that is an eigenvalue of the matrix , and is an eigenvector associated with . Given that by assumption , the eigenvector is a continuous function of . Now, remember that has all zeroes except for a single instance of one in position while has a zero in position . So, the vector has a one in position , and this can be used in order to univocally identify the eigenvector among the eigenvectors associated with . Therefore, we can conclude that only one combination of eigenvalue/eigenvector satisfies (A.3) identifying a unique couple of values .

Now, we analyse what happens if there are multiplicities in the vector . Under assumption , multiplicities of order in are allowed since there are linearly independent eigenvectors in matrix . But if the multiplicity is of order , which means that all the s in vector are the same, the equation system in (2.7) would admit infinite solutions, and model (2.3) would not be identifiable (note that in such a case, the true model would be the classic SDPD of [Lee & Yu(2010a)] instead of the stationary SDPD proposed here). However, the last case is excluded by assumption .

So, under assumptions , the equation in (2.7) actually admits only one solution. Denote such a solution with . For point 2 of the Theorem, we show that this solution is also an extreme point of the following function:

 λ1i=cov(z(0)it,z(0)i,t−1)/var(z(0)i,t−1)≡λ1i(λ0i). (A.4)

Now, we use the following arguments. We find the extreme points of as a function of , and then, we check if one of these satisfies the constraint in (2.7).

The first-order condition to find the extreme points of (A.4) requires that

 ∂∂