1 Introduction
###### Abstract

We consider multivariate time series on dynamic networks with a fixed number of vertices. Each component of the time series is assigned to a vertex of the underlying network. The dependency of the various components of the time series is modeled dynamically by means of the edges. We make use of a multivariate doubly stochastic time series framework, that is we assume linear processes for which the coefficient matrices are stochastic processes themselves. We explicitly allow for dependence in the dynamics of the coefficient matrices, including of course an i.i.d. structure as is typically assumed in random coefficients models. Autoregressive moving average models are defined in this framework and stationarity conditions are discussed for network autoregressive models. Estimators of the parameters are discussed for various parameterizations of such network autoregressive models and how this can be used to forecast such a process. The finite sample behavior of the forecast approach is investigated and a real data example is presented.

Time Series Modeling on Dynamic Networks
J. Krampe

TU Braunschweig; j.krampe@tu-bs.de

## 1 Introduction

Consider a vertex-labeled network with vertices . The number of vertices is fixed over time, whereas, the edges are time dependent. Thus, over time edges may vanish or new ones may appear. Throughout this work, directed edges are considered and multi-edges can occur. Such a dynamic network with a fixed number of vertices can be described by a time dependent adjacency matrix, here denoted by , where is - valued and gives the number of edges at time from vertex to vertex . The notation is used here for the -th entry of the -th row of . It is considered that the network is driven by some random process, hence, the corresponding adjacency matrix process is a stochastic process. Such networks could describe social networks, where some actors (e.g. persons) are represented by the vertices and these actors have some form of relation (e.g. friendship, communication) which is represented by the edges, see for instance blei2007statistical; morris1997concurrent. Since these relations could change over time, the corresponding network is considered as dynamic. Social media networks such as Facebook or Twitter are examples for dynamic networks. The actors in such networks often posses attributes. These attributes can be static (e.g a person’s name or birthday) or dynamic (e.g. personal income, time a person does sports or political views). These dynamic attributes may be affected by the attributes of other actors, especially by actors with which the considered actor is connected. Such an attribute is denoted as a network-influenced attribute. In this work the dynamic attributes are denoted by a -dimensional time series , where each component of the time series is assigned to a vertex (actor) of the underlying network. In the social-economical literature the influence of connected actors on the attributes is denoted as peer effects, see goldsmith2013social; manski1993identification.

In this work the focus is on the network-influenced attributes and not on the network itself. Consequently, this work is not about modeling a dynamic network. For modeling these dynamic networks, many models for static networks have been extended to the dynamic case as it is done by hanneke2010discrete; krivitsky2014separable for the Exponential Random Graph Models (ERGM), see Section 6.5 in Kolaczyk:2009, or by xu2015stochastic for the stochastic block model (SBM), see goldenberg2010survey. This work gives a framework which models the network-influenced dynamic attributes, that means modeling a time series on a dynamic network in which the edges influence the dependency of the time series. knight2016modelling; zhu2017network have considered these network-influenced attributes for non-random edges, which mainly covers static networks. In the context of a static network, network-influenced attributes can be considered as an ordinary multivariate time series with additional information and can be modeled by using vector autoregressive (VAR) models, see luetkepohl2007new. However, VAR models have many parameters which is why knight2016modelling; zhu2017network focus on how to use the network structure to reduce the number of parameters so that high dimensions become feasible. In contrast, this work deals with a random network structure and consequently the process cannot be modeled appropriately by using VAR models. That is why we make use of a multivariate doubly stochastic time series framework. That is, we consider linear processes or autoregressive models in which the coefficient matrices are stochastic processes themselves. Doubly stochastic time series models were introduced in tjostheim1986some. In this work, a slightly different notion more similar to the one of pourahmadi1986stationarity; pourahmadi1988stationarity is used.

This paper is structured as follows. In section 2 time series on dynamic networks are defined and some basic properties are given. In section 3 the focus is on statistical results; for instance, estimation of a network AR and its usage to forecast is discussed. Some of the forecasting results are underlined by a simulation study which is given in section 4. A real data example is given in section LABEL:net.real.data. Proofs can be found in section LABEL:net.proofs. Figures S1 to S3 can be found in the supplementary material given at https://www.tu-braunschweig.de/Medien-DB/stochastik/tsmnet_arxiv_sup.pdf.

## 2 Time Series Modeling on Dynamic Networks

Recall that the dynamic network with vertices is described by the -valued stochastic process . As mentioned before, the doubly stochastic framework is used to model the time series on the random network . That means, besides some innovation process , the time series is also driven by the stochastic process . Furthermore, it is considered that the underlying network is strictly stationary in order to get a stationary process . Since some interesting features come only into play if non-centered innovations are considered, the innovations posses a mean . The induced norms are used as matrix norms, i.e., . If not stated otherwise, the -norm is used. The main process structure of is defined as follows:

###### Definition 2.1.

Let be a -valued, strictly stationary stochastic process and let be measurable functions. Furthermore, let be an i.i.d. sequence of -valued random vectors with (positive definite and ), and and are mutually independent. If the following -limes exists,

we denote the process given by a (doubly stochastic) network linear process (DSNLP).

Let and be measurable functions. A process fulfilling equation is denoted as a (doubly stochastic) network autoregressive moving average process of order (DSNARMA)

The notation , where and , is used to simplify the notation of DSNLP. Notice that is a stochastic process and independent of . Defining this with a stochastic process not necessarily generated by a random network process leads to doubly stochastic linear processes. For such processes similar results can be established. Since this work focuses on networks, the focus is on doubly stochastic network processes.

There is no single feasible model which covers all kinds of dynamic networks. Instead, there exist several models and each of them is suitable for a specific kind of network. The intuition behind the assumption that and are mutually independent is that the time series can be modeled regardless of what network is underneath. Thus, it does not matter if its a sparse or dense network or if it has properties like small-world-network. In this work, apart from a mixing condition, the dynamic network does not need to fulfill any further conditions. Hence, this assumptions gives flexibility in a way that the time series and the dynamic network can be modeled separately. One is not fixed to one specific network model as it would be the case for a jointly modeling approach. Instead, the idea is that the approach describe here is used to model the time series and one of the several models for dynamic networks can be used to model the network. However, this assumptions is more restrictive. It implies that the influence between the network and the time series is unidirectional; can influence , however, is not influenced by . Some real life examples may violate this assumption. For instance, when considering the influence of peers on obesity, christakis2007spread or grades, goldsmith2013social, the influence can go both ways. But if shorter periods are considered, the influence of on may not come into play. Furthermore, restricting peers to relatives as it is done in the work of christakis2007spread it seems reasonable to assume that one’s properties like obesity or grades do not influence one’s relatives, hence does not influence .

If is a deterministic sequence the DSNARMA are closely related to time-varying ARMA models, which, for instance, are used in the locally stationary framework; see dahlhaus1999nonlinear; wiesel2013time. Furthermore, if is i.i.d. the doubly stochastic framework reduces to the framework of random coefficient models, see for instance nicholls and for the multivariate setting NICHOLLS1981185. However, assuming independence between different time-points for the process , seems to be inappropriate in the framework of dynamic networks. Some form of influence of the recent history seems to be more reasonable, see blei2007statistical. As already mentioned, the focus is not on modeling the network, which is why the dependence structure of the network is not further specified here. However, in order to derive statistical results, the dependence structure needs to be restricted and we consider -mixing (see section 3 for details). Since the innovation process and the network process are independent, and both are stationary, it is an arbitrary choice at which time point the network process is used to define . Thus, a process given by has the same properties. The only difference is the interpretation. Thus, when choosing a definition, one has to answer: ’Does the current network determine how recent effects influence the process. Or does the network, which was present when recent effects occurred, determine how recent effects influence the process?’ If not stated otherwise, we follow the latter interpretation and use the corresponding Definition 2. Since directed edges are considered, two natural dependence concepts occur; the concept that the influence goes in direction with the edge and vice versa. The general definition of DSNLP and DSNARMA can handle both concepts, however, the model given by as well as the models specified in section 3 are defined in the sense that the influence goes in edge direction. That means, if social media data such as from Twitter is considered, a person could be influenced by the persons whom follows. Thus, these persons would have a directed edge to . It is also possible to define it the other way around, see wasserman1994social. However, if represents flow in a network, such as traffic amount at given locations, it seems more appropriate to define the influence in direction of the flow, thus, of the edges.

Consider the following example for the functions and in Definition . The component-wise multiplication of matrices is denote by , thus, for . Let . With and we get the following (doubly stochastic) network autoregressive moving average process of order

In this model, the ’influence’ between components is in direction with the edges and each edge is given a weight. Since indicates that an edge from to is present, we work here with . This model (3) is inspired by knight2016modelling and it coincides with the definition of network autoregressive (moving average) process of order with neighborhood order for all lags, see knight2016modelling. Higher neighborhood orders can be achieved by using more than one adjacency matrix at a time.

In the following Lemma we specify conditions which ensure stationarity of DSNLPs:

###### Lemma 2.2.

Let be a doubly stochastic linear process as defined in . If

1. for all (component-wise)

2. (component-wise),

is fulfilled, then converges component-wise in the -Limit and the autocovariance function is given by and

 ΓX(h)=∞∑s=0E(Bh,s+hΣB⊤0,s)+∞∑j=0∞∑s=0Cov(Bh,jμ––,B0,sμ),h≥0 (4)

and the mean function by .

The latter term of the autocovariance function, , comes only into play for non-centered innovations and is driven by the linear dependency structure of the network. Consequently, it can be seen that the linear dependency of the network directly influences the linear dependency of the process . As a consequence, even an DSNMA process may posses a nonzero autocovariance for lags higher than . In order to better understand this, consider a small toy example with three vertices and two possible edges, and , and only one is present at a time. Let be i.i.d. random variables with uniform distribution on , i.e., . Which edge is present at time is given by the random variables in the following way. If , then if , then else . If (that means ), then if , then else . Consequently, in this network we flip between the edges and and if one edge is present at time it is more likely (with probability ) that it is present at time than flipping to the other edge. We have dependency between different time points as well as between edges. and . Let be given by

Thus, is a DSNMA process and the influence goes in direction with the edges. Since no edge goes into vertex or , and are white noise. This can be also seen in the autocovariance function which is displayed in its two parts in Figure 1. The left-hand-side figure displays the first part; . The dependency of the network has no influence on the first part, thus, this part would remain the same if is replaced by its expected value. That is why this part of the autocovariance function has the structure one expects from a vector moving average (VMA) process of order . The right-hand-side figures display the latter part of the autocovariance function; . As already mentioned, this part is completely driven by the linear dependence structure of the network. For the two edges, we have the following linear dependency: . This explains the geometric decay in the autocovariance function of the third component of , whereas the absolute value of the autocovariance function of the third component is mainly given by the difference of the mean of the innovations of the first two components. Hence, a greater difference of the innovations mean makes it harder to identify the linear dependency between components and , or and respectively. In this particular example with mean , no linear dependency between the different components can be identified for moderate sample sizes. A sample autocorrelation function as well as a realization of the third component of is displayed in Figure 2 for a sample size . Instead, looking from the perspective of the classical time series analysis, the sample autocorrelation function looks like three uncorrelated components where the first two components are white noises and the third could be an AR process. Hence, this examples gives two important aspects to keep in mind: Firstly, the linear dependency of the network can influence the linear dependency of the time series directly. Secondly, the problem that the autocovariance function may not suffice to identify doubly stochastic network models such as DSNAR.

In order to give conditions under which there exists a solution of (2), we firstly consider a DSARMA(1,0), a doubly stochastic autoregressive process of order , given by . In the univariate case pourahmadi1988stationarity gives conditions for the existence of a stationary solution of such processes. We transfer his ideas to the multivariate case in the following lemma:

###### Lemma 2.3 (Multivariate Version of (pourahmadi1988stationarity, Lemma. 2.1)).

Consider a doubly stochastic autoregressive process of order , thus, we have

A stationary solution of is given by

 X––t=∞∑j=0[j∏s=1At−s]ε–t−j=:∞∑j=0Bt,jε–t−j, (7)

where , if (component-wise)

 ∞∑j=0 E|B0,j|=∞∑j=0E∣∣ ∣∣j∏s=1A−s∣∣ ∣∣ <∞, (8) ∞∑j=0 E|B0,jΣdB⊤0,j|=∞∑j=0E⎡⎢⎣∣∣ ∣∣j∏s=1A−sΣd(j∏s=1A−s)⊤∣∣ ∣∣⎤⎥⎦ <∞. (9)

The mean function is then given by and the ACF is given by

 ΓX(h)= ∞∑j=0E⎡⎢⎣(j+h∏s=1Ah−s)Σ(j∏s2=1A−s)⊤⎤⎥⎦ +∞∑j1=0∞∑j2=0Cov(j1∏s1=1A−s1μ––,j2∏s2=1A−s2μ––),h≥0,

. The solution 7 fits into the framework of .

The conditions and may not be easy to check. That is why the following Lemma gives conditions which ensure and :

###### Lemma 2.4.

Let and is -mixing. If there exists a such that

 Elog∥q∏s=1A−s∥<0, (10)

then and is fulfilled, hence and

 ∞∑j=0E⎡⎢⎣∣∣ ∣∣j∏s=1A−sΣd(j∏s=1A−s)⊤∣∣ ∣∣⎤⎥⎦<∞.

In the same manner as a VAR model can be written as an extended VAR model, see (luetkepohl2007new, p. 15), a -dimensional DSNAR model can be written as a -dimensional DSNAR model. Consider a DSNAR model given by . Then define and

such that . We denote the process as the stacked process. Thus, the results for DSNAR models can be transfered to DSNAR models. That is why the focus in this work is on DSNAR models.

Some note to the condition . Consider the simplify setting that is deterministic, thus we consider a simple VAR process . Let the considered VAR process be stable, which is given if for all , see Chapter 2 in luetkepohl2007new. Let be the coefficient matrix of a stacked VAR process. For a stable VAR process we have that all eigenvalues of have modulus less than , see Chapter 2 in luetkepohl2007new. However, for such a stacked coefficient matrix we have that , see Lemma E.2 in basu2015. But, there exists a such that . To see this, let be the Jordan canonical form. Furthermore, we have and , where is the greatest absolute eigenvalue of , see Appendix A.6 in luetkepohl2007new for a representation of . Since , there exists a such that . Consequently, Lemma 2.4 is not limited to DSNAR processes and can be also applied to stacked DSNAR models.

## 3 Statistical Results for Doubly Stochastic Network Processes

Lemma 2.2 gives conditions for the existence of the ACF and the mean function. In the following passage we are interested in estimating these quantities based on observation . Since the dependency of influences the dependency of process , conditions for the dependency of are required to ensure, for instance, an absolutely summable ACF. In order to include many dynamic network models, we are working with an -mixing condition of the dependency of . This, for instance, includes Markovian dynamic networks, see (bradley2007, Theorem 21.22), such as Temporal ERGMs, see hanneke2010discrete. Under the condition that the network process is -mixing, consistency and asymptotic normality of the sample mean are shown in the following theorem. Since most ideas of the proofs of Theorem 3.1 and 3.2 can be reused in the network AR estimation, the consistency of the sample mean and the autocovariance function are shown here in such detail. It is referred to Lemma 3.6 for more details to the used moment conditions.

###### Theorem 3.1.

Let be an -valued doubly stochastic network linear process, with the following assumptions

1. , , and measurable, . is a strictly stationary, -valued, -mixing process fulfilling .

2. The innovations are -valued and i.i.d. with for all . and are independent.

3. for all , (component-wise).

Then, the autocovariance function is absolutely summable and is given by The mean function is given by . Consider observations . We have, as ,

 (11)

In the context of single stochastic linear processes, Assumption 3 is similar to the assumption of dealing with linear processes with absolutely summable coefficients. If the process in Assumption 1 is a Markov process, then the mixing condition is fulfilled under moderate conditions, see for instance Theorem 21.22 in bradley2007 and notice that for some -fields . Under similar conditions as in Theorem 3.1, -consistency of the sample autocovariance can be derived.

###### Theorem 3.2.

Let be an -valued doubly stochastic network linear process, with the following assumptions

1. , , and measurable, . is a strictly stationary, -valued, -mixing process fulfilling .

2. The innovations are -valued and i.i.d. with for all . and are independent.

3. for all ,

(component-wise).

4. for all .

Then, given observations , the sample autocovariance function , where , is a consistent estimator, we have .

Theorem 3.2 gives a consistent estimator for the autocovariance function which helps to identify VAR models, however, as seen in the example in section 2, the autocovariance function is not helpful to identify DSNAR models. Nevertheless, in order to forecast doubly stochastic network processes, an estimation of DSNAR models seems helpful. That is why in the following passage the focus is on deriving consistent estimators for DSNAR models. Hence, a DSNAR processes as defined in and given by is considered. Notice that even if is Markovian a DSNAR processes can generally not be written as a Hidden Markov models (HMM). This is because given is not a sequence of conditionally independent variables and cannot be written as a noisy functional of only, which is required by a HMM (see bickel1998asymptotic for details to HMM). Consequently, techniques used for HMM cannot be applied here. Instead, the same setting as in zhu2017network is considered, thus, the process as well as the network is observed; we have observations and . A DSNAR model is given by the measurable function and the mean of the innovations . We consider three different parametrization-settings for . Ranging from seeing as an arbitrary function to the setting for all edges common parameters. We start here with the general setting that is an arbitrary measurable function. This may sound like an nonparametric setting, however if we consider the case that the number of multi-edges is limited then the process is discrete and bounded. That is why has only a finite number of possible states. Let the possible states of be denoted by and which reduces the problem to a parametric one. However, the number of parameters can be challenging — we will come back to this later. Let be the set of indices at which time points the state is observed. Then we have The least squares approach is used to derive consistent estimators for as well as . Hence, we have

 argmin^μj,^αk;j⋅∑t∈Rk(X––t+1;j−^μj−^αk;j⋅)2,j=1,…,d,k=1,…,N. (12)

This leads to the following linear system:

 ∑t∈Rk(X––t+1;jX––tX––t+1;j)=∑t∈Rk(X––tX––tX––⊤t1X––⊤t)(^μj^αk;j⋅).

By one elementary operation and denoting , we have

 (~∑tX––t+1;jX––t−~∑tX––t+1;j~∑tX––t~∑tX––t+1;j)=(0~∑tX––tX––⊤t−~∑tX––t~∑tX––⊤t1~∑tX––⊤t)(^μj^αk;j⋅) (13)

As can be seen in , the method to derive consistent estimators for is to estimate a somehow localized version of the autocovariance function. Define the conditional covariance as , where if , else . Following similar ideas and conditions as used in Theorem 3.2, as and convergence is meant in probability, we obtained