VaryingCoefficient Models for Dynamic Networks
Abstract
Dynamic networks are commonly used in applications where relational data is observed over time. Statistical models for such data should capture not only the temporal dependencies between networks observed in time, but also the structural dependencies among the nodes and edges in each network. As a consequence, effectively making inference on dynamic networks is a computationally challenging task, and many models established for dynamic networks are intractable even for moderately sized networks. In this paper, we propose and investigate a family of dynamic network models, known as varyingcoefficient exponential random graph models (VCERGMs), that characterize the evolution of network topology through smoothly varying parameters in an exponential family of distributions. The VCERGM provides an interpretable dynamic network model that enables the inference of temporal heterogeneity in a dynamic network. We establish how to fit the VCERGM through maximum pseudolikelihood techniques, and thus provide a computationally tractable method for statistical inference of complex dynamic networks. We furthermore devise a bootstrap hypothesis testing framework for testing the temporal heterogeneity of an observed dynamic network sequence. We apply the VCERGM to the US Congress covoting network and a restingstate brain connectivity case study and show that our method provides relevant and interpretable patterns describing each data set. Comprehensive simulation studies demonstrate the advantages of our proposed method over existing methods.
1.2
Keywords: Exponential random graph model, Temporal graphs, Basis spline, Pseudo likelihood, Penalized logistic regression
1 Introduction
Networks have been extensively used to explore, model, and analyze the relational structure of individual units, or actors, in a complex system. In a network model, nodes represent the actors of the system, and edges are placed between nodes if the corresponding actors share a relationship. In many applications, the relationships among the actors of a modeled system change over time, necessitating the use of dynamic networks. Two diverse examples, which we analyze later in our application study, include the Congressional covoting networks in Figure 1 and resting state brain connectivity networks in Figure 2. A prominent way to analyze relational network systems is through the use of probabilistic models, or graphical models, which describe the generative mechanism of an observed network. Although there is a rich body of literature on graphical models for static networks [15, 16], the development of interpretable and computationally tractable models for dynamic networks is in its early stages.
An important feature of dynamic networks that needs to be captured in any statistical model is the extent to which its local and global features change through time. We refer to this property as temporal heterogeneity. Heterogeneity directly affects the underlying process that best describes the formation of network. In parametric models, heterogeneity may result in significant changes in parameters that characterize the observed network. Consider the U.S. Senate covoting network shown in Figure 1. One can readily observe an evolution of the network to form distinct clusters of Republicans and Democrats by the 113th Congress. Moreover, this configuration is in stark contrast with the sparse, seemingly random configuration formed in the 40th Congress. On the other hand, the resting state functional magnetic resonance imaging (fMRI) network shown in Figure 2 remains fairly stable through time with only minor local changes in edge formation. These contrasting examples exemplify the need to explicitly model the heterogeneity of a network. We further analyze these dynamic networks in Sections 6.1 and 6.2.
The examples above highlight the importance of accounting for evolving structure when modeling a dynamic network. In this paper, we propose a probabilistic model for dynamic networks called the varyingcoefficient exponential random graph model (VCERGM). The model parameterizes timevarying topological features of dynamic networks in continuous time. Our model builds on two major statistical methodologies. One is the exponential family of random graph models [24, 48] that characterizes the marginal effect of local and global network features on the likelihood of the network. The other major component is a varyingcoefficient specification [21], which flexibly models the changes of effect parameters over time. The VCERGM characterizes the temporal heterogeneity of dynamic network by modeling the parameter associated with each topological feature as a smooth function of time.
By quantifying temporal heterogeneity of a network via fluctuating parameters, we are able to analyze key properties of the local and global features of a dynamic network through the VCERGM. In addition to serving as a means to test for heterogeneity of a dynamic sequence, our model can also be directly used for interpolation of missing networks or edges. For networks at unobserved time points, the VCERGM provides robust estimates that reflect the structure of the unobserved networks without being strongly influenced by outliers in the sequence. Importantly, estimation of the VCERGM can be done with maximum pseudolikelihood estimation (MPLE), enabling computationally efficient inference for dynamic networks with many nodes or time points.
Several dynamic network models that have been investigated. We briefly describe these here. The exponential random graph model (ERGM) is a family of probability distributions on unweighted static network. The ERGM has been adapted to dynamic networks in the pivotal work of [20]. The method is called the temporal exponential random graph model (TERGM). The TERGM models the difference in topological features between every two consecutive networks in a similar fashion to the ERGM. However, it ignores the heterogeneity of the differences, and cannot fully capture the timevarying patterns of the network structure. In fact, we show that in a wide range of situations the TERGM degenerates to a collection of independent and identically distributed ERGMs (see supplementary materials).
The TERGM has been further investigated in many different perspectives. [19] devised the hidden TERGM (HTERGM), which utilizes a hidden Markov process to express the nature of rewiring networks and model a timespecific network topology. [32] generalized the TERGM to the separable TERGM (STERGM). The STERGM flexibly models the formation and dissolution of networks by separately parameterizing prevalence and duration of fluctuations.
Another method for dynamic network modeling is the stochastic actororiented model (SAOM) [41]. It provides an alternative to dyadic models and instead is a localized actorbased model, which characterizes network evolution as a consequence of each actors’ connectivity. Even if the SAOM considers the fluctuation between two time points, it does not provide explicit form to parametrize the fluctuation in network topology. [37] and [38] generalized the latent space model developed by [22] to dynamic networks. The dynamics of network structure is modeled through random effects in a latent space. It focuses on the transition between two time points and provides limited description on overall network.
Compared to timeinvariant models described above, an alternative actorbased model was introduced in [23], where dynamic networks are modeled using multilinear tensor regression. This work adapted autoregressive models to dictate temporal dependence in a sequence of networks, and like the SAOM, proposed an actorbased dependence structure between edges in each network. It directly models the temporal heterogeneity but may not be adequate for larger networks due to its computational complexity. In the meantime, [29] emphasizes on capturing timevarying attributes of dynamic networks and parametrizes the evolving relationship of each edge between nodes as a smooth function of time. Along with kernel smoothing approach, the regularization is utilized to ensure the smoothness. The parameters in the model provide a valuable intuition in understanding the topological change of each edge, but fitting this model for larger networks can be computationally expensive considering the number of parameters.
The VCERGM exploits a varyingcoefficient framework to model the temporal heterogeneity of topological features. In general, varyingcoefficient models form a family of semiparametric models in which the coefficient of some parametric model evolves with characteristics of the coefficient in a nonparametric manner. Varyingcoefficient models were first used to characterize nonlinear effects of covariates on realvalued response variables [21]. Later it was extended to the dynamic generalized linear models [25, 52]. A detailed review of varyingcoefficient models and their applications are provided in [12]. In our proposed model, we model the coefficients of the topological features of a network as a function of time. The marginal parametric model takes the form of an exponential random graph model (ERGM). In this setting, the varying coefficients effectively capture the dynamic pattern of network structure by modeling the variation in the sufficient statistics of an observed sequence of networks.
The remainder of this paper is organized as follows. In Section 2 we introduce the varyingcoefficient exponential random graph model for dynamic networks. Section 3 describes the pseudolikelihood estimation of the VCERGM via penalized logistic regression. A hypothesis testing procedure for formally testing temporal fluctuation in a sequence of timevarying graphs is introduced in Section 4. We assess the performance of the VCERGM on a series of simulated dynamic networks, comparing it to alternative generative graph models in Section 5. In Section 6, we apply the VCERGM to the functional magnetic resonance imaging (fMRI) and covoting networks and analyze the dynamic nature of each of these applications. We conclude with a discussion of our method and open areas for future research in Section 7.
2 Model
We begin by describing the exponential family of random graph models (ERGMs) and their temporal extension, the TERGM, since our proposed model is closely related to these specifications. We then describe our proposed model, the VCERGM.
2.1 Exponential Random Graph Models
Suppose that is an unweighted network with vertices, whose th entry is an indicator that specifies whether or not node and node are connected by an edge. The ERGM is a probability distribution that characterizes the likelihood of via a function of network statistics that describe the topological structure of . One may include, for example, a statistic that quantifies the tendency of the nodes in the network to exhibit reciprocal ties using the reciprocity statistic . [27] describes a means of determining which network statistics to include in a model based on collection of goodness of fit diagnostics. Given , the ERGM posits that is a binary random matrix generated from the following probability mass function
(1) 
where parameterizes the influence of the network statistics on the likelihood of . The ERGM has been successfully applied in a wide variety of fields, ranging from social networks to brain connectivity networks [18, 39, 44]. Recent tutorials of exponential random graph models and their applications are provided in [5, 14, 36].
The TERGM is an important and popular statistical model for inference of dynamic networks [9, 20], and can be described as follows. Consider a dynamic network that is observed at discrete and nonoverlapping time periods, where each graph from is unweighted, and observed for the set of vertices . The TERGM is a generative model for that characterizes the conditional probability of given via an exponential family of probability distributions. Under the first order TERGM, exhibits a onestep Markov dependence between sequential networks as follows.
(2) 
Under (2), one can fully specify the joint probability mass function of by parameterizing the onestep transitions from to . One models these dependencies using a function of transition statistics . These statistics represent the temporal potential over cliques across two sequential networks and can represent, for example, the change in the clustering or the change in overall connectivity between each pair of networks. For a chosen , the firstorder TERGM specifies the likelihood of for as
(3) 
where parameterizes the influence of the transition statistics on the conditional likelihood of given . Suppose that the marginal distribution is specified. The TERGM characterizes the joint distribution of the dynamic sequence by
(4) 
We note that in general if one is able to specify appropriate transition statistics, then the TERGM in (3) and (4) is readily generalized to higherorder Markov dependency; however, for discussion of our equivalence statement in the supplementary materials, we consider the firstorder TERGM only.
2.2 VaryingCoefficient Exponential Random Graph Models
Let be a stochastic sequence of temporally ordered networks observed continuously up to some time . At each time point , represents an unweighted, directed or undirected network with nodes. Our goal is to provide a dynamic network model for that directly accounts for the temporal heterogeneity of its local and global network structure.
The VCERGM consists of two components  (i) an ERGM representation for the marginal likelihood of each observed network, and (ii) the coupling of networks over time via a varyingcoefficient model, where the coefficients at time parameterize the marginal likelihood of the network . We first specify a set of functions for , which quantify the same topological features of network with size . Given and the coefficient vector , the marginal likelihood of at time has an ERGM representation given by
(5) 
A large collection of topological features can be used in the VCERGM. Traditionally, the network statistics are raw counts of different features in an observed network, such as the number of edges (edge density), the number of triangles, or the number of reciprocal edges in a directed network. However, in dynamic networks, networks at different time points may have differing numbers of nodes. Thus it is not appropriate to use the raw counts directly. Instead, one should standardize the network statistics to make them comparable over time. We propose to standardize the raw count of each feature by its maximal possible value, and use the standardized statistics in model (5). Table 1 provides examples of standardized network statistics for a binary graph with nodes, where represents whether there is an edge from node to node in .
Type  Network Statistic  Definition  
Directed  Edge density  
Reciprocity  
Cyclic triad  
Undirected  Twostar  
Triangle  
The coefficients in model (5) characterize the influence of the corresponding network statistics on determining the network structure. When a dynamic network evolves gradually over time, it is reasonable to believe the coefficients will also change gradually. In such a case, can be represented by smooth functions of with continuous second order derivatives over [35]. The temporal dependence among the graphs in are captured by the dynamic pattern of the coefficient functions. This is the fundamental assumption of our model. In the special case where all the separate functions in are constant, the generative models underlying the dynamic networks are identical over time and the VCERGM reduces to a family of marginally identically distributed ERGMs. In Section 4, we introduce a formal hypothesis testing procedure to test the temporal heterogeneity of the coefficients.
3 Estimation
3.1 SplineBased Representation of TimeVarying Coefficients
Without any constraint, the collection of coefficients contain an infinite number of parameters, making inference on (5) intractable. To address this problem, we approximately represent these smooth functions as a linear combination of basis functions. Possible strategies of defining basis functions include piecewise polynomials [7], Fourier series [30], and wavelets [6]. For inferential purposes, we employ basis splines (bsplines) [7, 10] as a way to reduce the dimensionality of estimation. Bsplines are commonly used due to its flexibility in incorporating smoothing constraints.
In particular, we first specify a collection of basis functions , , and then approximate by a linear combination of these functions
where quantifies the contribution of the th basis function on . Let denote the basis coefficient matrix and let be the length vector of basis functions. We can represent the coefficients as
(6) 
The set of basis functions represents the smoothness of , and the coefficient matrix determines the shape and trajectory of the fluctuations through time. Under the basis representation in (6), the distribution of in (5) is fully specified by the parameters in the coefficient matrix .
3.2 Fast Estimation via Maximum Pseudo Likelihood
For an observed dynamic sequence of unweighted graphs , our goal is to estimate the coefficients given the sequence . Let be a vector of length of which elements are the basis functions evaluated at time . By applying the basis representation in (6), we denote as the smooth function evaluated at time . Therefore, this estimation reduces to the task of estimating the coefficient matrix . A major obstacle in obtaining the maximum likelihood estimators of the parameters in Model (5), similar to that of fitting an ERGM, is that calculation of the normalizing constant in the denominator is computationally intractable. Although numerical approaches such as the Markov chain Monte Carlo method can be used to estimate for small networks [28, 50], the computational cost is prohibitive for moderate to large networks, let alone a sequence of networks. To alleviate the computational complexity, we exploit a maximum pseudolikelihood approach, originally adapted for fitting the ERGM [43, 46, 48]. We show that the maximum pseudolikelihood estimator (MPLE) for the VCERGM can be efficiently obtained via maximum likelihood estimation of a logistic regression model. Below we describe the estimation procedure in more detail.
Without loss of generality, we assume that the numbers of nodes in different networks are the same over time, i.e., for all . In the case where varies over time, one can simply use the normalized network statistics described in Section 2. For each observed time point , let denote the binary random variable that describes whether or not there is an edge between node and node at time . Furthermore, let collection of binary random variables that describe whether or not there is an edge between all other pairs of nodes other than the node pair and . For each observed time point , assume the conditional independence between edges. The marginal pseudolikelihood function of given at time is defined as
(7) 
Subsequently, the marginally independent composite pseudo likelihood of model (5) is
The MPLE is obtained by maximizing .
Let denote an matrix whose th entry and let be the same matrix except . Define as the vector describing the elementwise difference in the network statistics when changes from 0 to 1. One can readily show that for each , the following relationship holds for all :
(8) 
Let and let . Similarly, define as the matrix whose th row contains the change in the th network statistic when each edge changes from 0 to 1. Let be the operator that stacks the columns of into a column vector, and let represent the Kronecker product operator. Combining (6) and (3.2) yields
(9) 
Let , and define the design matrix as
The relationship in (9) connects to a logistic regression where represents a design matrix with its coefficient . In [43], it was shown that maximizing the pseudolikelihood PL( in (7) is equivalent to finding the maximum likelihood estimator (MLE) of in the logistic regression model given in (3.2) with independent entries . It follows from the independence of and for that maximizing PL() is equivalent to calculating the MLE of in the logistic regression model treating as mutually independent variables.
3.3 Penalized Logistic Regression
To obtain smooth estimates of the timevarying coefficients , we further consider a roughness penalty on the coefficients of the basis functions [see 10, 25, for example]. A commonly used penalty, which we use throughout this paper, is the integrated squared second derivative defined for th row of , denoted as , as
where a smoothness matrix in this case can be specified as
For networks observed at discrete time points , the th element of is
For more examples of possible penalties, see the Chapter 5 in Ramsay [35]. As the same collection of basis functions are used to express , , via basis representation, we impose the same on all . Consequently, we add the penalty term
to the logistic log pseudo likelihood function. In conclusion, we calculate the penalized pseudolikelihood estimator by maximizing the following penalized log likelihood with tuning parameter :
(10) 
To fit (10), we implement an iteratively reweighted least squares (IRLS) algorithm. A detailed description of this procedure is available in supplementary materials.
4 Testing for Heterogeneity
A key assumption of the VCERGM is that the effects of a specified collection of statistics vary through time. This assumption reflects heterogeneity in an observed sequence of graphs , and provides intuition as to whether or not summaries of can be treated in aggregate. One can formally test for heterogeneity in using a likelihood ratio test (LRT), which we will now describe.
We begin with the preconceived notion that is homogeneous, namely that the coefficients under model (5) are fixed as constants over time. This serves as the null model, under which the VCERGM is equivalent to fitting independent and identically distributed ERGMs. With fixed constants , the null hypothesis corresponding to a homogeneous sequence of graphs can be written as
(11) 
Setting the function is equivalent to writing for all . As a result, the null hypothesis in (11) can be expressed more succinctly as
where is length vector of 1’s. The coefficients under the null hypothesis are the restricted form of the VCERGM where the basis coefficients for each network statistic are constants for all basis functions.
The likelihood ratio test (LRT) is commonly used for conducting the test for heterogeneity in varyingcoefficient models [4, 13, 11, 12]. Due to the dependence between entries in each graph, we utilize a pseudo likelihood ratio test (pLRT) [42]. As previously emphasized in Section 3, the joint pseudo likelihood consists of the distribution of given the rest of the data for all . Furthermore, maximizing the pseudo likelihood simplifies the estimation process as fitting a logistic regression. Namely, with observed networks with nodes, the pLRT compares the pseudo log likelihood function below under the null and alternative hypotheses:
Let and be the estimates of under the null and alternative hypotheses. The estimate can be calculated by fitting the VCERGM specified in (5) and is the estimate from the VERCM with a restriction of constant basis coefficients. Accordingly, let and denote the pseudo log likelihood functions under the null and alternative, respectively. Then, the test statistic is
(12)  
We reject the null hypothesis when where is the critical value of the test with significance level . In general, there exist two strategies commonly used to compute . The first approach is to apply Wilks phenomenon for deriving the asymptotic null distribution of the test statistic [2, 13]. Assumptions required to validate the asymptotic chisquared distribution of the test statistic are presented in [13]. Even though Wilks phenomenon is readily applicable to our case, calculating the degrees of freedom for asymptotic chisquared distribution is difficult, and the chisquared distribution may not be appropriate when the network size is relatively small. Another approach, which is preferable for moderate network size, involves generating bootstrap samples to construct the null distribution of [4, 12, 26]. Analogous to the work in [8, 33, 45], the steps of obtaining the critical value or calculating the value with parametric bootstrapping can be described as follows. For a large value of , the test statistics (12) calculated based on bootstrap samples successfully represent the null distribution of .

Create bootstrap samples. For each bootstrap, indexed by , is a sample from .

For each bootstrap sample , estimate under the null and alternative hypotheses and denote them as and , respectively.

Calculate the test statistic for each bootstrap sample as

The critical value is determined as the th quantile of . The value is the proportion of times that the bootstrap test statistic values exceed the observed test statistic . Define an indicator function which takes a value of 1 if is true and 0 otherwise. Then the value can be written as
The value is then used to determine whether or not to reject the null hypothesis. For values below a specified significance value, , one rejects the null hypothesis in (11) and decides that the sequence of networks does exhibit heterogeneity in its parameters. In our applications below, we choose when evaluating any hypothesis test.
5 Simulation Study
The goal of our simulation study is twofold: (i) to evaluate the power of the hypothesis testing procedure described in Section 4, and (ii) to assess the goodness of fit of the VCERGM on dynamic networks with various magnitudes of temporal heterogeneity. In Section 5.1, we evaluate the sensitivity of the hypothesis test in (11) for detecting temporal heterogeneity in a sequence of networks with fluctuating parameters using both the bootstrap procedure and the chisquared approximation. Section 5.2 assesses the performance of the VCERGM under various varyingcoefficient specifications. We compare the performance of the VCERGM with other competing methods using measurements of bias, variance, and robustness. We further investigate how the VCERGM performs when the networks are observed at unequally spaced time points due to missing networks.
5.1 Power Evaluation for Testing Heterogeneity
We first investigate the power of the hypothesis test for heterogeneity that we introduce in Section 4. To do so, we investigate both Type I and Type II errors of the test on dynamic networks over various magnitudes of temporal heterogeneity. We simulate 100 sequences of dynamic networks , where each sequence , contains networks with 30 nodes observed at equallyspaced times under the VCERGM that models the temporal contributions of the edge density statistic. We set the coefficient on the edge density term, , to be a sinusoidal curve with amplitude and period . In particular, we model
We vary the number of observed time points from 10 to 100, and the amplitude from 0 to 0.3 in increments of 0.05. For each value of and , we calculate the proportion of cases that we reject the null hypothesis at a level out of the 100 simulated dynamic network sequences. Table 2 reports these proportions when using the bootstrap procedure as well as the chisquared approximation.
When , is a constant function and as a result the proportion of rejections in this case provides an estimate for the Type I error of each test. From Table 2, we see that both strategies obtain a Type I error at or below 0.05, as desired. For , the proportion of rejections provides an estimate of the power of the test. We see that for higher signal (larger ) and for a larger number of observed networks (larger ), we obtain a higher power, as expected. Across , we see in general that the bootstrap procedure is consistently more powerful than the use of the chisquared reference distribution for each amplitude value . For the power of both tests reaches 1, indicating that heterogeneity is successfully identified by both tests. These results suggest that both tests are powerful for large enough signal size, and that the bootstrap procedure outperforms the chisquared approximation for small signal sizes (between and ).
Bootstrap  Chisquared asymptotic  

30  50  70  100  30  50  70  100  
0  0.03  0.01  0.01  0.03  0  0  0  0.01  0.03  0 
0.05  0.15  0.36  0.46  0.59  0.71  0.01  0.23  0.36  0.52  0.66 
0.1  0.42  0.77  0.91  0.93  0.97  0.1  0.71  0.9  0.9  0.98 
0.15  0.74  0.98  1  1  0.99  0.35  0.97  1  1  1 
0.2  0.98  1  1  1  1  0.71  1  1  1  1 
0.25  1  1  1  1  1  0.91  1  1  1  1 
0.3  1  1  1  1  1  0.99  1  1  1  1 
5.2 Estimation Performance
We now evaluate the performance of VCERGM to accurately estimate fluctuating parameters , . We consider four different settings for : (i) sinusoidal curve of varying amplitude ; (ii) quadratic curve of varying strength ; (iii) dynamic ErdősRényi random graph with probability of edges; and (iv) nonsmooth (spiky) functions as a form of a sequence of random numbers with varying mean and standard deviation for normal distribution. For each setting of varying coefficients, we model the occurrence of graphs using the VCERGM with edge density and reciprocity statistics (see Table 1). We simulate 100 dynamic sequences of directed graphs where each sequence is observed at equallyspaced time points. We assume that the network size remains constant through time and consider estimation with networks of three different sizes . Furthermore, we repeat (i)(iv) with 1, 5, and 10 randomly chosen networks removed from the time series to evaluate the performance on dynamic networks with observations missing at random.
For each simulated dynamic network, we compare the VCERGM with two other dynamic network models. First, we fit crosssectional ERGMs, where the ERGM in model (1) is fit separately at each of the observed time points. As an alternative competitive method, we also develop an ad hoc 2step procedure, which adapts an ad hoc smoothing procedure after fitting crosssectional ERGMs for observed networks. To assess the performance of each method, we calculate the integrated absolute error (IAE) of the estimated coefficient curves. It measures the sum of pointwise absolute difference between estimated curve and true curve at observed time points , namely
The mean and standard deviation (SD) are calculated to evaluate the performance of our proposed method compared to crosssectional ERGMs and ad hoc 2step procedure. We provide the summary of IAE for each method on dynamic networks with 30 nodes in Table 3 with missing networks. Settings for the results are (i) sinusoidal curves with (edges) and (reciprocity); (ii) quadratic curves with (edges) and (reciprocity); (iii) ErdősRényi with ; (iv) a sequence of random numbers from (edges) and (reciprocity). The performances of crosssectional ERGMs, ad hoc 2step procedure, and VCERGM become more comparable with larger network size. For results of and case, see Tables 5 and 6 in supplementary materials.
Without missing network, Figure 3 shows that crosssectional ERGMs are more likely to introduce unexpected spikes or increased variability in estimating true , compared to VCERGM. Overall, the VCERGM presents smaller deviation from true with smaller variability compared to crosssectional ERGMs and ad hoc 2step procedure. In the first three settings, the VCERGM performs better than crosssectional ERGMs and ad hoc 2step procedure with smaller IAE. In case of nonsmooth functions, with true not smooth but wiggly, the ad hoc 2step procedure shows better performance than the VCERGM with respect to IAE. Both Table 3 and Figure 3 indicate that the VCERGM potentially misses random wiggles, which causes greater bias on average compared to crosssectional ERGMs. Regardless, overall trend that the true nonsmooth presents is fairly well estimated by the VCERGM and the variability of the estimates is still smaller for the VERGM compared to crosssectional ERGMs. Overall, the performance of ad hoc 2step procedure and VCERGM is comparable, but the VCERGM is more principled in terms of incorporating timevarying coefficients in the modeling step. For all four settings, the VCERGM is computationally more efficient than the crosssectional ERGMs. We conduct an additional simulation study specifically tailored to compare the computing time between methods and the results are presented in Table 4.
When there exist missing networks, crosssectional ERGMs are no longer available to provide the estimates at unobserved time points. Therefore, the IAE is calculated only for ad hoc 2step procedure and VCERGM. Notably, the performance of the VCERGM remains stable across each number of missing networks. Crosssectional ERGMs and the 2step approach, on the other hand, suffer more than the VCERGM in the case of missing networks. Indeed, as shown in Table 3, the VCERGM outperforms these competitive methods in the case that observations are missing and is better able to capture the true coefficient curve in these cases.
Missing  Edges  Reciprocity  
ERGM  ERGM2  VCERGM  ERGM  ERGM2  VCERGM  
Sinusoidal  0  12.92 (7.52)  7.55 (8.26)  5.58 (7.48)  14.35 (4.21)  8.01 (4.9)  6.3 (1.91) 
1  7.84 (8.26)  6.42 (7.32)  8.34 (4.86)  6.25 (1.91)  
5  8.44 (8.29)  6.13 (7.46)  9.08 (4.79)  7.21 (1.75)  
10  12.15 (7.68)  11.89 (6.75)  11.37 (3.92)  9.47 (1.57)  
Quadratic  0  6.16 (3.93)  2.75 (4.43)  2.72 (4.41)  8.16 (0.88)  2.74 (1.47)  3.11 (1.18) 
1  2.77 (4.42)  2.75 (4.4)  2.75 (1.49)  3.12 (1.18)  
5  2.89 (4.44)  2.92 (4.41)  2.83 (1.44)  3.25 (1.13)  
10  3.7 (4.32)  3.64 (4.29)  3.44 (1.55)  3.68 (1.19)  
ErdősRényi  0  15.46 (9.83)  6.74 (11.32)  6.13 (11.34)  15.45 (2.24)  5.14 (2.43)  4.58 (1.43) 
1  6.74 (11.34)  6.18 (11.35)  5.12 (2.51)  4.62 (1.47)  
5  6.8 (11.29)  6.28 (11.28)  5.27 (2.5)  4.76 (1.53)  
10  6.98 (11.33)  6.46 (11.27)  5.73 (2.98)  5 (1.64)  
Nonsmooth  0  26.42 (3.37)  25.53 (3.47)  33.35 (1.92)  20.95 (4.44)  19.65 (4.17)  23.28 (2.95) 
1  26.21 (3.44)  33.56 (1.89)  19.86 (4.17)  23.37 (3.01)  
5  26.71 (3.43)  32.82 (1.93)  20.16 (4.2)  23.56 (2.94)  
10  34.79 (2.92)  36.01 (1.76)  22.17 (3.9)  23.38 (2.88) 
In order to compare the computational efficiency, we vary the number of time points and record the computing time for VCERGM and crosssectional ERGMs. Table 4 summarizes the computing times of 500 simulated dynamic network sequences and displays how computing time changes as the number of time points changes. According to Table 4, the VCERGM takes significantly less time than crosssectional ERGMs to complete the parameter estimation. Even if both VCERGM and crosssectional ERGMs show linear increase in computing time, the rate of change is much smaller for VCERGM. Both methods entail separate steps to construct design matrix and response vector at each time point, but the crosssectional ERGMs require separate MPLE steps while VCERGM only needs one estimation. In other words, the longer the time series of networks are, the more efficient VCERGM is compared to crosssectional ERGMs.
Number of time points  

20  40  60  80  100  
ERGM  1.00 (0.05)  2.09 (0.08)  3.25 (0.13)  4.25 (0.11)  5.30 (0.12) 
VCERGM  0.43 (0.03)  0.82 (0.04)  1.25 (0.07)  1.63 (0.07)  2.01 (0.06) 
6 Applications
In this section we apply the VCERGM to two case studies which portray differing amounts of temporal heterogeneity. First, we analyze how the covoting patterns among U.S. Senators have changed through time. In this example, we analyze the effects of political affiliation (Republican or Democrat) on the likelihood of the voting networks. In our second example, we investigate the structural changes of the restingstate functional magnetic resonance imaging (rfMRI) records of healthy individuals. For each case study, we first test for temporal heterogeneity of any statistic included in the model. We fit the VCERGM on the two data sets and also fit crosssectional ERGMs and ad hoc 2step procedure for comparison. In the first example, we witness a clear evidence of temporal heterogeneity that the importance of political affiliation on the likelihood of the voting networks significantly changes over time. On the contrary, relatively stable rfMRI networks show little fluctuations through time and our method is shown to accommodate it successfully as well.
6.1 Political Voting Network
We begin by analyzing the dynamic network that describes the covoting patterns among U.S. Senators from 1867 (Congress 40) to 2015 (Congress 113). Three of the voting networks are shown in Figure 1. This network was first investigated in Moody and Mucha [34] and has been subsequently analyzed in Wilson et al. [49]. The network is based off of the roll call voting data from http://voteview.com, which contains the voting decision of each Senator (yay, nay, or abstain) for every bill brought to Congress. We model the covoting tendencies of the Senators using a dynamic network where nodes represent Senators and an edge is formed between two nodes if the two Senators vote concurrently (both yay or both nay) on at least 75% of the bills to which they were both present.
As shown in Figure 1, there exist great fluctuations in network structure through time. Previous analyses in Moody and Mucha [34], Wilson et al. [49] have identified significant changes in the community structure of the network over time, and that this community structure is closely associated with political affiliation of the Senators. To account for these fluctuations, we incorporate a nodematch statistic for political affiliation which counts the number of edges shared between Senators who have the same political affiliation. Furthermore, we include a statistic which models popularity (twostar) and clustering (triangle) of the Senator networks. See Table 1 for more details of these two statistics. We note that it is of separate interest to perform model selection for the VCERGM; however, we pursue this in future work. Here, we compare and contrast the results of a simple model when using several viable dynamic network models.
The estimated parameters from i) crosssectional ERGMs (ERGM), ii) ad hoc 2step procedure (ERGM2) and iii) VCERGM are presented in Figure 4. Notably, all three network statistics exhibit temporal heterogeneity. The asymptotic chisquared value for testing heterogeneity is . Consistent with our simulation results, the crosssectional ERGMs exhibits spiky estimates, but the ad hoc smoothing recovers the lack of smoothness efficiently and produces similar estimates as the VCERGM. The political nodematch parameter estimate reveals an important trend in the political network. We see that the coefficient value for this term generally increases over time, indicating the increasing importance of political affiliation on the covoting habits of the Senators throughout U.S. history. In particular since Congress 95, this coefficient has significantly increased. This finding matches the current theory of “political polarization” described in [34]. Figure 4 also suggests that the triangle coefficient has decreased in magnitude suggesting that clustering has become less and less important over time.
6.2 fMRI Dataset
We next analyze the structure of brain connectivity in the data provided by the WUMinn Consortium Human Connectome Project (HCP). The dataset is available at https://db.humanconnectome.org. See Van Essen et al. [47] for an overview of data acquisition and analysis. The dataset includes the restingstate functional magnetic resonance imaging (rfMRI) of 500 subjects. For each subject, a 15minute run of rfMRI is recorded. We set 47 local windows and calculate a partial precision matrix between 50 brain regions based on observations within each window. For a transition from precision matrices to sequence of dynamic networks, we define the edge density of a network as the proportion of edges in the network. Once the edge density is specified, the threshold of partial precision value can be determined to form an edge between the brain regions. With the edge density of 10%, for example, the greatest 10% of partial precision values would form edges.
References [39] and [40] were among the first to fit ERGMs to functional brain networks. Their final model included network statistics such as geometrically weighted edgewise shared partner (GWESP) and geometrically weighted nonedgewise shared partner (GWNSP). For sake of comparison, we model our rfMRI networks with three network statistics: edges, triangle and twostar and compare i) crosssectional ERGMs (ERGM), ii) ad hoc 2step procedure (ERGM2) and iii) VCERGM. We note that model selection for the VCERGM (and other dynamic network models in general) is an important area for future research.
Figure 5 shows the results of two individuals from this study. As the data are the restingstate fMRI records, little fluctuation is expected in parameters over time. For both individuals, both ad hoc 2step procedure and VCERGM provide estimates with a small range of fluctuation for all three network statistics. Overall, the ad hoc 2step procedure and VCERGM provide relatively similar estimates, while both estimates cross the crosssectional ERGM estimates. The estimates from crosssectional ERGMs are extremely jagged that they may introduce inaccurate inference with regard to explaining the topological change in brain networks over time. The VCERGM not only produces fairly static estimates but also captures small variations through time more sensitively than ad hoc 2step procedure. Therefore, even with relatively stable dynamic networks, the VCERGM performs consistently well.
7 Discussion
In this paper, we introduce varyingcoefficient models for dynamic networks. In particular, we described the formulation and estimation of the VCERGM, a model that incorporates temporal changes in the coefficients of an exponential random graph family of models. We demonstrated the advantages of applying the VCERGM over competing methods through simulations and two dynamic network case studies. First, the VCERGM provides an intuitive explanation of how a network changes through time. Both the crosssectional ERGMs and ad hoc 2step procedure seemed to capture the temporal heterogeneity in a sense. However, by incorporating the temporal heterogeneity in the modeling step, the VCERGM provides a compact and meaningful model to formally explain the temporal structure of dynamic networks. Second, the VCERGM is robust to perturbations in observed temporal data. By imposing smoothness on the coefficients, we are able to provide robust estimates that are resistant to outliers and noise. Third, the VCERGM enables interpolation for missing networks through time. In practice, one can only observe a finite number of networks in a dynamic sequence, which may be observed in unequally spaced time increments. Estimates of the coefficients to the VCERGM can be evaluated at any time point in the domain and immediately interpreted as the impact of network statistics at that time point. By presenting the results with unequallyspaced networks, we illustrated how the varyingcoefficients through time can be useful especially in terms of interpolation.
Our work provides several avenues for future research. First, it is important to consider the evaluation of goodness of fit and model selection in a dynamic context. Through empirical exploration, we found that the network statistics used to fit a model are often highly correlated. For example, if there exists a triangle in a network, it is more likely to find twostars in the network. Model identifiability should be investigated both in static ERGM models and the VCERGM to ensure appropriate model selection. For static ERGMs, one generally assesses goodness of fit through a comparison of quantitative summaries of simulated networks from the fitted model with the summaries of the observed network [27]. However, for dynamic networks this type of goodness of fit comparison captures only the marginal aspects of the dynamic sequence. How exactly to assess the quality of a dynamic model is still an open problem. A second avenue for future work involves adapting the varyingcoefficient framework introduced here to networks with weighted edges. To do this, one can extend the exponential models of networks for integervalued weights from [31] or to the models of networks for continuousvalued weights considered in [9, 50].
In many dynamic networks, it is often of interest to identify changepoints in the network, namely points in time where the network undergoes significant local or global structural change [51, 1]. It would be interesting to further analyze how to utilize dynamic network models like the VCERGM to identify such changes. The test for heterogeneity that we use in the paper may provide some idea of how to formally test for a change  through identification of a change in network parameter  however, in future research we plan to purse this idea further.
Finally, we will investigate adapting a varyingcoefficient model for networks with a Markov dependency. In this paper, we investigated the dynamics of coefficients for marginal network statistics. However, this model can readily be extended to networks with a Markov dependency like that described by the first order TERGM. In general, for any nonnegative integer , one can model higher order dependencies between successive networks. For example if we assume the onestep Markov dependence in (2), we can model the one step transition between and using a suite of statistics as
(13) 
In model (13), can be modeled as smooth functions that describe the impact of the onestep transition statistics from to . Therefore, model (13) effectively captures the rate of change of the temporal potential between sequential graphs rather than the rate of change of the marginal features as done in this work. Like the TERGM, we can generalize the VCERGM to a higher order Markov dependency, say order , by specifying appropriate transition statistics .
Acknowledgements
The fMRI data were provided [in part] by the Human Connectome Project, WUMinn Consortium (Principal Investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657) funded by the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research; and by the McDonnell Center for Systems Neuroscience at Washington University.
References
 Bindu and Thilagam [2016] PV Bindu and P Santhi Thilagam. Mining social networks for anomalies: Methods and challenges. Journal of Network and Computer Applications, 68:213–229, 2016.
 Boucheron and Massart [2011] Stéphane Boucheron and Pascal Massart. A highdimensional wilks phenomenon. Probability theory and related fields, 150(34):405–433, 2011.
 Brockwell and Davis [2013] Peter J Brockwell and Richard A Davis. Time series: theory and methods. Springer Science & Business Media, 2013.
 Cai et al. [2000] Zongwu Cai, Jianqing Fan, and Runze Li. Efficient estimation and inferences for varyingcoefficient models. Journal of the American Statistical Association, 95(451):888–902, 2000.
 Cranmer and Desmarais [2011] Skyler J. Cranmer and A. Desmarais, Bruce. Inferential network analysis with exponential random graph models. Political Analysis, 19(1):66–86, 2011.
 Daubechies et al. [1992] Ingrid Daubechies et al. Ten lectures on wavelets, volume 61. SIAM, 1992.
 De Boor et al. [1978] Carl De Boor, Carl De Boor, EtatsUnis Mathématicien, Carl De Boor, and Carl De Boor. A practical guide to splines, volume 27. SpringerVerlag New York, 1978.
 De Brabanter et al. [2006] Joseph De Brabanter, Kristiaan Pelckmans, JAK Suykens, and Bart De Moor. Generalized likelihood ratio statistics based on bootstrap techniques for autoregressive models. IFAC Proceedings Volumes, 39(1):790–795, 2006.
 Desmarais and Cranmer [2012] Bruce A Desmarais and Skyler J Cranmer. Statistical inference for valuededge networks: the generalized exponential random graph model. PloS one, 7(1):e30136, 2012.
 Eilers and Marx [1996] Paul HC Eilers and Brian D Marx. Flexible smoothing with bsplines and penalties. Statistical Science, pages 89–102, 1996.
 Fan and Zhang [2000] Jianqing Fan and Wenyang Zhang. Simultaneous confidence bands and hypothesis testing in varyingcoefficient models. Scandinavian Journal of Statistics, 27(4):715–731, 2000.
 Fan and Zhang [2008] Jianqing Fan and Wenyang Zhang. Statistical methods with varying coefficient models. Statistics and its Interface, 1(1):179, 2008.
 Fan et al. [2001] Jianqing Fan, Chunming Zhang, and Jian Zhang. Generalized likelihood ratio statistics and wilks phenomenon. Annals of Statistics, pages 153–193, 2001.
 Fellows and Handcock [2012] Ian Fellows and Mark S Handcock. Exponentialfamily random network models. arXiv preprint arXiv:1208.0121, 2012.
 Fienberg [2012] Stephen E Fienberg. A brief history of statistical models for network analysis and open challenges. Journal of Computational and Graphical Statistics, 21(4):825–839, 2012.
 Goldenberg et al. [2010] Anna Goldenberg, Alice X Zheng, Stephen E Fienberg, and Edoardo M Airoldi. A survey of statistical network models. Foundations and Trends® in Machine Learning, 2(2):129–233, 2010.
 Golub et al. [1979] Gene H Golub, Michael Heath, and Grace Wahba. Generalized crossvalidation as a method for choosing a good ridge parameter. Technometrics, 21(2):215–223, 1979.
 Goodreau et al. [2009] Steven M Goodreau, James A Kitts, and Martina Morris. Birds of a feather, or friend of a friend? using exponential random graph models to investigate adolescent social networks. Demography, 46(1):103–125, 2009.
 Guo et al. [2007] Fan Guo, Steve Hanneke, Wenjie Fu, and Eric P Xing. Recovering temporally rewiring networks: A modelbased approach. In Proceedings of the 24th international conference on Machine learning, pages 321–328. ACM, 2007.
 Hanneke et al. [2010] Steve Hanneke, Wenjie Fu, Eric P Xing, et al. Discrete temporal models of social networks. Electronic Journal of Statistics, 4:585–605, 2010.
 Hastie and Tibshirani [1993] Trevor Hastie and Robert Tibshirani. Varyingcoefficient models. Journal of the Royal Statistical Society. Series B (Methodological), pages 757–796, 1993.
 Hoff et al. [2002] Peter D Hoff, Adrian E Raftery, and Mark S Handcock. Latent space approaches to social network analysis. Journal of the American Statistical Association, 97(460):1090–1098, 2002.
 Hoff et al. [2015] Peter D Hoff et al. Multilinear tensor regression for longitudinal relational data. The Annals of Applied Statistics, 9(3):1169–1193, 2015.
 Holland and Leinhardt [1981] P.W. Holland and S. Leinhardt. An exponential family of probability distributions for directed graphs. Journal of the American Statistical Association, 76(373):33–50, 1981.
 Hoover et al. [1998] Donald R Hoover, John A Rice, Colin O Wu, and LiPing Yang. Nonparametric smoothing estimates of timevarying coefficient models with longitudinal data. Biometrika, 85(4):809–822, 1998.
 Huang et al. [2002] Jianhua Z Huang, Colin O Wu, and Lan Zhou. Varyingcoefficient models and basis function approximations for the analysis of repeated measurements. Biometrika, 89(1):111–128, 2002.
 Hunter et al. [2008] David R Hunter, Steven M Goodreau, and Mark S Handcock. Goodness of fit of social network models. Journal of the American Statistical Association, 103(481):248–258, 2008.
 Hunter and Handcock [2006] D.R. Hunter and M.S. Handcock. Inference in curved exponential family models for networks. Journal of Computational and Graphical Statistics, 15(3):565–583, 2006.
 Kolar et al. [2010] Mladen Kolar, Le Song, Amr Ahmed, and Eric P Xing. Estimating timevarying networks. The Annals of Applied Statistics, pages 94–123, 2010.
 Konidaris et al. [2011] George Konidaris, Sarah Osentoski, and Philip Thomas. Value function approximation in reinforcement learning using the fourier basis. In Proceedings of the TwentyFifth AAAI Conference on Artificial Intelligence, pages 380–385. AAAI Press, 2011.
 Krivitsky [2012] Pavel N Krivitsky. Exponentialfamily random graph models for valued networks. Electronic Journal of Statistics, 6:1100, 2012.
 Krivitsky and Handcock [2014] Pavel N Krivitsky and Mark S Handcock. A separable model for dynamic networks. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(1):29–46, 2014.
 McLachlan [1987] Geoffrey J McLachlan. On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture. Applied statistics, pages 318–324, 1987.
 Moody and Mucha [2013] James Moody and Peter J Mucha. Portrait of political party polarization. Network Science, 1(01):119–121, 2013.
 Ramsay [2006] James O Ramsay. Functional data analysis. Wiley Online Library, 2006.
 Robins et al. [2007] Garry Robins, Pip Pattison, Yuval Kalish, and Dean Lusher. An introduction to exponential random graph (p*) models for social networks. Social Networks, 29(2):173–191, 2007.
 Sarkar and Moore [2005] Purnamrita Sarkar and Andrew W Moore. Dynamic social network analysis using latent space models. ACM SIGKDD Explorations Newsletter, 7(2):31–40, 2005.
 Sewell and Chen [2015] Daniel K Sewell and Yuguo Chen. Latent space models for dynamic networks. Journal of the American Statistical Association, 110(512):1646–1657, 2015.
 Simpson et al. [2011] Sean L Simpson, Satoru Hayasaka, and Paul J Laurienti. Exponential random graph modeling for complex brain networks. PloS one, 6(5):e20039, 2011.
 Simpson et al. [2012] Sean L Simpson, Malaak N Moussa, and Paul J Laurienti. An exponential random graph modeling approach to creating groupbased representative wholebrain connectivity networks. Neuroimage, 60(2):1117–1126, 2012.
 Snijders [2001] Tom AB Snijders. The statistical evaluation of social network dynamics. Sociological methodology, 31(1):361–395, 2001.
 Staicu et al. [2014] AnaMaria Staicu, Yingxing Li, Ciprian M Crainiceanu, and David Ruppert. Likelihood ratio tests for dependent data with applications to longitudinal and functional data analysis. Scandinavian Journal of Statistics, 41(4):932–949, 2014.
 Strauss and Ikeda [1990] D. Strauss and M. Ikeda. Pseudolikelihood estimation for social networks. Journal of the American Statistical Association, 85(409):204–212, 1990.
 Székely et al. [2016] Eszter Székely, Magda Pappa, James D Wilson, Shankar Bhamidi, Vincent Jaddoe, Frank Verhulst, Henning Tiemeier, and Philip Shaw. Childhood peer network characteristics: Genetic influences and links with early mental health trajectories. Journal of Child Psychology and Psychiatry and Allied Disciplines, 57(6):687–694, 2016.
 Tekle et al. [2016] Fetene B Tekle, Dereje W Gudicha, and Jeroen K Vermunt. Power analysis for the bootstrap likelihood ratio test for the number of classes in latent class models. Advances in Data Analysis and Classification, pages 1–16, 2016.
 Van Duijn et al. [2009] Marijtje AJ Van Duijn, Krista J Gile, and Mark S Handcock. A framework for the comparison of maximum pseudolikelihood and maximum likelihood estimation of exponential family random graph models. Social Networks, 31(1):52–62, 2009.
 Van Essen et al. [2012] David C Van Essen, Kamil Ugurbil, E Auerbach, D Barch, TEJ Behrens, R Bucholz, A Chang, Liyong Chen, Maurizio Corbetta, Sandra W Curtiss, et al. The human connectome project: a data acquisition perspective. Neuroimage, 62(4):2222–2231, 2012.
 Wasserman and Pattison [1996] Stanley Wasserman and Philippa Pattison. Logit models and logistic regressions for social networks: I. an introduction to markov graphs and . Psychometrika, 61(3):401–425, 1996.
 Wilson et al. [2016] James D Wilson, Nathaniel T Stevens, and William H Woodall. Modeling and estimating change in temporal networks via a dynamic degree corrected stochastic block model. arXiv preprint arXiv:1605.04049, 2016.
 Wilson et al. [2017] James D Wilson, Matthew J Denny, Shankar Bhamidi, Skyler J Cranmer, and Bruce A Desmarais. Stochastic weighted graphs: Flexible model specification and simulation. Social Networks, 49:37–47, 2017.
 Woodall et al. [2017] William H Woodall, Meng J Zhao, Kamran Paynabar, Ross Sparks, and James D Wilson. An overview and perspective on social network monitoring. IISE Transactions, 49(3):354–365, 2017.
 Zhang et al. [2015] Xiaoke Zhang, JaneLing Wang, et al. Varyingcoefficient additive models for functional data. Biometrika, 102(1):15–32, 2015.
SUPPLEMENTARY MATERIAL
 Stochastic Equivalence under the Difference Statistic Specification:

Comparing the firstorder TERGM with model (1), we see that the TERGM is closely related to the ERGM in that it characterizes the conditional distribution of given using an ERGM representation. Perhaps not surprisingly, these two models are much more closely related than this relationship.
Consider a simple univariate time series represented by the stochastic process for . Without any other information about , a natural nonparametric manner to investigate the rate of change in involves analyzing the difference between sequential observations and , namely analyzing . The analysis of in univariate and multivariate time series is known as differencing, and is a common first step in the analysis of time series data [3]. In the context of the TERGM, differencing corresponds to the analysis of difference statistics, where one specifies transition statistics of the form
(14) where is a topological summary of an input network with vertices. Statistics of the form in (14) can capture, for example, the differences in the edge weight of the network from time to , or the difference in the number of triangles from one network to the next. Although incorporating difference statistics in the TERGM is a natural firststep in the analysis of temporal networks, it turns out that doing so is equivalent to modeling each network as an independent realization from the same exponential family probability mass function. This is made precise in the next proposition.
Proposition 1.
Let denote the family of unweighted dynamic graph sequences on vertices with discrete observations. Suppose that is generated under the TERGM in (3), where for
Suppose is a difference statistic of the form (14) where for some . Then for all , is independent of and can be generated as an independent realization of an ERGM with the following probability mass function
Proposition 1 reveals that under the difference statistic model specification, a dynamic network under the TERGM reduces to an independent and identically distributed sequence of graphs under a corresponding ERGM. Hence under this family of specifications, the TERGM does not capture temporal dependence in the underlying dynamic network sequence. Although in practice one may utilize statistics that are not of the form (4), this relatively simple example motivates further investigation between the relationship of the ERGM and the TERGM.
 Iterative Reweighted Least Squares (IRLS):

The penalized logistic regression problem for fitting a VCERGM is to maximize the following penalized log likelihood function:
The tuning parameter controls the amount of roughness. We implement the iteratively reweighted least squares (IRLS) to fit the logistic regression with the penalty term. Consider a link function and a convex function . The IRLS without penalty term updates at the th iteration
(15) where and is a diagonal matrix with
 Additional Simulation Results:

Tables below show the mean and standard deviation of IAE associated with fitting ERGMs and VCERGMs to temporal networks of size 50 and 100 with 0, 1, 5, and 10 randomly missing networks. The results are from the settings (i) sinusoidal curves with (edges) and (reciprocity); (ii) quadratic curves with (edges) and (reciprocity); (iii) ErdősRényi with ; (iv) a sequence of random numbers from (edges) and (reciprocity).
Missing Edges Reciprocity ERGM ERGM2 VCERGM ERGM ERGM2 VCERGM Sinusoidal 0 10.56 (16.1) 7.74 (16.88) 7.61 (16.92) 8.23 (3.22) 4.87 (4) 4.78 (3.96) 1 7.91 (16.81) 8.28 (16.71) 5.32 (3.9) 5.01 (3.9) 5 9 (16.55) 8.63 (16.65) 6.36 (3.66) 6.3 (3.54) 10 13.17 (15.62) 13.9 (15.4) 9.26 (3.05) 8.92 (2.95) Quadratic 0 6.02 (11.37) 4.14 (11.82) 4.17 (11.8) 5.18 (1.51) 2.14 (1.96) 2.42 (1.84) 1 4.17 (11.79) 4.19 (11.78) 2.17 (1.97) 2.44 (1.86) 5 4.39 (11.72) 4.39 (11.71) 2.29 (1.95) 2.58 (1.83) 10 5.09 (11.54) 5.12 (11.54) 2.77 (1.93) 2.93 (1.84) ErdősRényi 0 14.7 (22.94) 10.03 (24.32) 10.3 (24.25) 8.88 (1.55) 2.77 (1.21) 3.25 (0.89) 1 10 (24.33) 10.33 (24.24) 2.79 (1.21) 3.3 (0.88) 5 10.12 (24.33) 10.46 (24.24) 2.8 (1.26) 3.34 (0.91) 10 10.01 (24.34) 10.43 (24.22) 2.97 (1.44) 3.4 (0.85) Nonsmooth 0 25.88 (7.26) 25.35 (7.34) 32.74 (6.16) 19.87 (8.52) 19.48 (8.61) 23.25 (7.21) 1 26.34 (7.17) 33.1 (6.07) 19.8 (8.58) 23.47 (7.22) 5 26.46 (7.13) 32.25 (6.2) 19.77 (8.68) 23.61 (7.29) 10 34.87 (5.9) 36.28 (5.59) 22.64 (7.81) 23.98 (7.18) Table 5: Simulation results with 50 nodes and (0, 1, 5, 10) missing networks: Mean and standard deviation of the integrated absolute errors (IAE) for each method. Missing Edges Reciprocity ERGM ERGM2 VCERGM ERGM ERGM2 VCERGM Sinusoidal 0 25.96 (35.76) 25.54 (36.02) 25.51 (36.04) 8.88 (6.62) 8.19 (7.03) 8.14 (7.01) 1 25.38 (36.09) 25.44 (36.05) 8.45 (6.82) 8.39 (6.85) 5 26.3 (35.63) 26.28 (35.63) 8.86 (6.61) 8.78 (6.5) 10 29.13 (34.25) 29.29 (34.16) 11.69 (4.91) 11.61 (4.85) Quadratic 0 15.45 (27.98) 14.96 (28.23) 15.04 (28.18) 4.29 (3.36) 3.13 (3.78) 3.24 (3.63) 1 14.87 (28.24) 14.94 (28.21) 3.23 (3.71) 3.32 (3.58) 5 15 (28.13) 15.03 (28.11) 3.34 (3.71) 3.41 (3.6) 10 15.63 (27.81) 15.63 (27.81) 3.77 (3.66) 3.83 (3.55) ErdősRényi 0 37.23 (47.52) 36.48 (48.06) 36.61 (47.97) 3.6 (0.92) 1.02 (0.5) 1.48 (0.39) 1 36.46 (48.07) 36.59 (47.99) 0.99 (0.46) 1.5 (0.4) 5 36.67 (47.94) 36.78 (47.87) 1 (0.47) 1.5 (0.39) 10 36.42 (48.02) 36.58 (47.91) 1.27 (0.56) 1.67 (0.45) Nonsmooth 0 32.48 (19.59) 32.64 (19.57) 37.05 (18.12) 27.37 (15.95) 27.51 (15.8) 28 (14.77) 1 33.48 (19.26) 37.57 (17.9) 27.78 (15.74) 28.31 (14.77) 5 33.25 (19.35) 36.69 (18.2) 27.61 (15.77) 28.37 (14.74) 10 40.36 (17.2) 41.84 (16.62) 29.54 (14.52) 29.37 (14.18) Table 6: Simulation results with 100 nodes and (0, 1, 5, 10) missing networks: Mean and standard deviation of the integrated absolute errors (IAE) for each method.  Rpackage for VCERGM:

Rpackage VCERGM containing code to fit the VCERGM, conduct hypothesis test for heterogeneity, and compare the VCERGM with crosssectional ERGMs as described in the article. The package also contains political voting network dataset as provided as an example in Section 6.1 in the article.