Variational perturbation and extended Plefka approaches to dynamics on random networks: the case of the kinetic Ising model
We describe and analyze some novel approaches for studying the dynamics of Ising spin glass models. We first briefly consider the variational approach based on minimizing the Kullback-Leibler divergence between independent trajectories and the real ones and note that this approach only coincides with the mean field equations from the saddle point approximation to the generating functional when the dynamics is defined through a logistic link function, which is the case for the kinetic Ising model with parallel update. We then spend the rest of the paper developing two ways of going beyond the saddle point approximation to the generating functional. In the first one, we develop a variational perturbative approximation to the generating functional by expanding the action around a quadratic function of the local fields and conjugate local fields whose parameters are optimized. We derive analytical expressions for the optimal parameters and show that when the optimization is suitably restricted, we recover the mean field equations that are exact for the fully asymmetric random couplings . However, without this restriction the results are different. We also describe an extended Plefka expansion in which in addition to the magnetization, we also fix the correlation and response functions. Finally, we numerically study the performance of these approximations for Sherrington-Kirkpatrick type couplings for various coupling strengths and the degrees of coupling symmetry, for both temporally constant but random, as well as time varying external fields. We show that the dynamical equations derived from the extended Plefka expansion outperform the others in all regimes, although it is computationally more demanding. The unconstrained variational approach does not perform well in the small coupling regime, while it approaches dynamical TAP equations of  for strong couplings.
The kinetic Ising spin glass model is a prototypical model for studying the dynamics of disordered systems. Previous work on this topic focused both on studying the average – over couplings – behavior of various order parameters, such as magnetizations, correlations and response functions, and in more recent years, developing approximate methods for relating the dynamics of a given realization of the model to its parameters. The latter line of work has received a lot of attention in recent years, in part, because of the applications it has on developing approximate inference methods for point processes which in turn are receiving particular attention due to the on going improvements in data acquisition techniques in various disciplines in life sciences.
Most of the early work on the topic dealt with systems with symmetric interactions, until Crisanti and Sompolinsky  studied the disorder averaged dynamics of Ising models with various degrees of symmetry and Kappen and Spanjers  derived naive mean field and TAP equations for the stationary state of the Ising model for arbitrary couplings, in both cases considering Glauber dynamics. Roudi and Hertz  derived dynamical TAP equations (hereafter denoted by RH-TAP) for both discrete time parallel and continuous time Glauber dynamics using Plefka’s method , originally used for studying equilibrium spin glass models, extended to dynamics. This was followed by  who reported another derivation of these equations using information geometry following the approach of . Mezard and Sakellariou  developed a mean field method (hereafter denoted by MS-MF) which is exact for large networks with independent random couplings and an elegant generalized mean field methods was followed in .
In the current paper we follow up on these efforts and report some new results on the dynamics of kinetic Ising model with parallel dynamics. We first look at the relationship between the saddle point approximation to the path integral representation of the dynamics and the simplest variational approach based on minimizing the Kullback-Leibler (KL) divergence between the true distribution of the spin trajectories and a factorized distribution. Although for the standard kinetic Ising model the two methods yield the same equations of motion, we see that this is not in general the case when the probability of spin configurations at a given time given those of the previous time is not a logistic function of the fields. After this, we consider two approaches for going beyond the saddle point solution of the path integral representation of the dynamics of the standard kinetic Ising model with parallel dynamics (defined in more detail in the following sections).
In one of these approaches, which we refer to as Gaussian Average variational method, we perform a Taylor expansion of the action in the path integral representation of the generating functional around a quadratic function of the fields and conjugate fields. As described in described in detail in section (4), we then choose the parameters of this function such that the resulting functional minimally depends on these parameters. We derive analytical expressions for these optimal solutions and show that for a fully asymmetric network under a further assumption about the interaction between the fields and the conjugate fields, we can recover the equations of motion for the magnetization identical to MS-MF equations . Without this assumption we observe that the resulting equations are different from MS-MF. In the second approach, we go beyond the saddle point by performing an extended Plefka expansion. The standard Plefka expansion for the equilibrium model involves performing a small coupling approximation of the free energy at fixed magnetization and is the approach that was originally taken in . As we show here, however, similar to the soft spin models [8, 9], a better description of the dynamics can be achieved by not only fixing the magnetizations but also pairwise correlation and response functions while expanding around the uncoupled model.
2 The dynamical model
We consider the synchronous dynamics of interacting binary spins in the time window defined by
is the total field acting on spin at time composed of the external field and the fields felt from other spins in the system. The function is a generic transfer function or conditional probability of the state of the spin at time given the field at time . Our goal will be to calculate the mean magnetizations of the spins.
The generating functional of the distribution , expressed as a path integral integral form is
having set and . Notice that we assumed the initial state to be uniformly distributed, manifested in the factor in (4), and that we refer to the two auxiliary variables with the compact notation and .
The magnetization of spin at time can then be obtained as the first derivative of the log-generating functional:
Let us make a brief note on how the the integral representation of the generating functional in (4)-(2) has been derived. This is done by first replacing in (2) by and integrating over all while enforcing that at each time step and for each spin by inserting -functions, , in the integral. One then writes this delta function in its the integral representation
which is how the appear in the equations. This rewriting of the generating functional constitutes the first steps in the Martin-Siggia-Rose-De Domenicis-Peliti formalism [10, 11] once it is adapted for hard spins. For more details about this approach and a pedagogical review on its application to soft and hard spin dynamics see  and .
A logistic transfer function in (2), such that , yielding the following probability distribution over spin paths
where we have set and
3 Mean Field
As a prologue to our more important results in the following sections, in this section we review the derivation of mean field equations for the dynamical model in (2) using two approaches. These are the saddle point approximation to the path integral representation of the generating functional in (4), and the minimization of the KL distance between the true distribution in (1) and a factorized one. Despite being formally different methods, in the literature they are both often referred to as mean field and it is indeed well know that for the specific case of the equilibrium Ising model, they lead to the very same set of equations, known as naïve mean field equations . Throughout this section the transfer function in (2) is considered a generic function of the field . Only towards the end of this section we are going to consider as a logistics function of the kinetic Ising model.
3.1 Saddle point mean field
In the equilibrium case, one way to derive the naïve mean field equations is as the equations describing the saddle point approximation to a path integral representation of the free energy, while the TAP equations are those derived by calculating the Gaussian integral around the saddle point . (Another way is by means of Plefka expansion, which at this point we do not discuss but will get back to later on ). Let us consider this saddle point approach for the kinetic model in (2) and the corresponding generating functional (4). Defining a complex measure as
These are the saddle point mean field equations for a general function . Note that the marginal here yields the same expression as the conditional probability in (2), namely except that in (11m), the fluctuating field has been replaced by an effective (mean) field , in analogy with the physical intuition behind the original formulation of the mean field theory by Weiss .
3.2 Mean field from KL distance
A second way of deriving mean field equations, usually employed in the machine learning community, is based on a variational approximation. Within this framework, one approximates the model distribution with a Markovian process that factorizes over the spin trajectories . In other words, assuming
one minimizes the Kullback-Leibler divergence, , between the approximate distribution and the model . In the case of the model defined in (2) and an approximate distribution satisfying (11n) and (11o), the KL-divergence can be rewritten as
where the first line is just the definition of the KL-divergence, in the second line we have exploited the Markovian property of and and assumed , while in the last line we have use the factorizability of over spin trajectories. Notice that the last equality is valid for any choice of and that we have defined as
and denotes all components of apart from . Observe that thanks to the Markovian property of the two distributions and we were able to reduce the average over a dimensional space to a sum of averages over dimensional spaces.
In order to determine the variational mean field equations, one has to minimize the KL-divergence in the space of marginals and transition probabilities . Given that these are not independent, we enforce the constraints:
using Lagrange multipliers , ultimately optimizing the following cost function:
The stationary points of in (11ps) are the zeros of the functional derivatives
that can be reduced to the relation:
It is worth emphasizing that this solution is valid for any Markov chain and any approximate Markov distribution that factorizes over the spin trajectories. From now on we will require the spins at time to be conditionally independent under the model distribution, as in (2). This assumption and a little algebra allow us to simplify (11ptu) as follows:
where we imposed the normalizability to .
If there are no self-couplings in the model distribution , the right-hand side of (11ptv) will not depend on and consequently the solution for the joint distribution will factorize in time. The spin independent 1-st order Markov chain that best approximates the model defined in (2) with , is actually a 0-th order Markov chain. Additionally the absence of self-interactions in makes (11ptv) an explicit relation between the marginal of spin at time and the marginals of all spins but at the previous time step . Since we are dealing with a system of binary units, marginals are fully determined by their first moments, thus the marginal of spin at time , in (11ptv), becomes a function of the magnetizations at time . Taking one step further one can easily verify that the first moments of (11ptv) equal the naïve mean field magnetizations of (11m) if the transition probability belongs to the exponential family with the field as natural parameter
where is a generic function of the state . For the kinetic Ising model is the identity function and the equations for the magnetizations read:
4 Gaussian Average method
What we have shown so far is that the saddle point approximation to the generating functional for the kinetic Ising model and the one based on the KL divergence match each other, although this is not the case for non-logistics transfer functions. In this section, we study an improvement over the saddle point approximation. Our approach is to find the optimal Gaussian distribution for approximating the generating functional perturbatively, and then using the resulting approximation to calculate the magnetizations. This can be thought as an extension to complex measures of a standard variational method: it was taken by Müschlegel and Zittartz  for the equilibrium Ising model, while a general framework is set in . We describe this approach in detail in this section.
where we have defined the complex gaussian measure
parametrized by the interaction matrix and the mean . Here we split the vectors into and into similar to . From now on we will use the form of the action in (9) since we are going to focus on the standard parallel update kinetic Ising model.
The choice of a quadratic form for allows us to easily calculate many of the terms in (11pty), simplifying the expression for the log-generating functional as
where we have replaced (9) in (11pty) and we have performed the change of variables . Notice that when not stated otherwise the sum over runs from to . From now on we will just drop off the superscript from variables .
If all measures were real probability measures, the first order approximation on the right hand side of (11pty) would be an upper bound to the free energy . In this case a minimization of the bound with respect to the variational parameters would be the obvious choice for optimizing the approximation. Since integrations in our case are over complex measures this argument cannot be applied. Instead, we base our optimisation on the idea of the Variational Perturbation method : if the Taylor series expansion of the log generating functional (4)-(9) would be continued to infinite order it would represent the functional and the resulting series would be entirely independent of the parameters of the gaussian measure (11ptza). On the other hand, the truncated series (11pty) inherits a dependence on the variational parameters , , . Hence, one would expect that the truncation represents the most sensible approximation if it depends the least on these parameters. One should therefore choose their optimal values such that the approximation to is the most insensitive to variations of these parameters. This simply corresponds to computing the stationary values of the log generating functional in the , , space. This requirement of minimum sensitivity to the variational parameters was introduced in  as an approximation protocol.
Using the logic in the previous paragraph and setting the first derivative of the expression for in (11ptzaa) with respect to to zero, one gets the equation for stationary , the first moment of the gaussian form for :
where we have defined for :
while for and we have respectively:
Looking for the stationary points of (11pty) with respect to corresponds to solving the following set of equations:
where we have defined .
4.2 Equations for the magnetizations
In the previous subsection we derived expressions for the parameters of the gaussian used for perturbative approximation of the log-generating functional at fixed . Now we want to derive an expression for the magnetizations using (6). We will first perform the derivative of (11ptzaa) with respect to ; notice that even , and are dependent, such that (6) reads:
However, since in our optimization scheme we looked for the stationary values of with respect to the variational parameters, will only consist of its explicit derivative with respect to , leading to:
4.3 The optimized values of the parameters
In principle, one needs to solve the full set of equations (11ptzab)-(4.1) and take the limit of to calculate the magnetization in (11ptzadah). This is obviously a very difficult task to do analytically given the high dimensional integrals that appear in (11ptzac)-(4.1) and that the equations have to be solved simultaneously. The solutions, however, can be very much simplified if we assume
the blocks are of size , and
Observe that the matrix in (11ptzadaj) is a symmetric complex matrix (not hermitian), whose hermitian part is positive symmetric.(Recall that the hermitian part of a matrix is defined as .) This is consistent with its derivation given that — as pointed out in — the gaussian integral converges only if the hermitian part of is a positive symmetric matrix.
In (11ptzadala) and (11ptzadalb) we implicitly state that : as a matter of fact it can be proven to be a mere consequence of the block structure of the matrix , as shown in A. Since this means that the gaussian integral and the model log generating functional match in the limit .
We are now left to evaluate a multidimensional integral in (11ptzadalam). In fact the integration in (11ptzadalam) can be reduced to a one-dimensional integral marginalizing the multivariate gaussian distribution, yielding
where the integral is now over , a normally distributed, zero mean unit variance, random variable.
For performing the one dimensional integral in (11ptzadalan), we need to compute the entries of the inverse of matrix . In B we demonstrate that, given as defined in (11ptzadaj)-(11ptzadalb), the entries of in which we are interested in can be calculated recursively as
Recalling the definitions of the matrices and , one can verify that the magnetizations in (11ptzadalan) only depend on the past magnetizations with , . Since this dependence goes back to it is natural to wonder if the error in estimating the past magnetizations would accumulate impairing the inference process. We notice (not included in section 6) that for the Gaussian Average method knowledge of the history of the experimental magnetization — knowing when computing with (11ptzadalan) — doesn’t affects the reconstruction significantly. Whether we are using experimental magnetizations or approximate ones in (11ptzadalap), we observe that grows exponentially with time for strong couplings while it converges to a finite value for weak couplings. This behavior can be understood by studying the stability of the map (11ptzadalap) of into that defines a dynamical system, as we do in C. Averaging over the disorder one realizes that this dynamical system is chaotic for couplings strength above a certain critical value. Its critical value depends on the degree of symmetry of the connectivity and on the presence of an external field.
4.4 The solution
In principle, the value of limit of of that satisfy the optimality equations, may be non-zero. In this section, we justify the choice of that we made in the previous section. We first note that zero is a good candidate for the optimal value of — here indicates the average under the complex measure in (11ptza)-(11ptzb) — since
where for the kinetic Ising model has been defined in (9) and indicates the average under the complex measure . This choice for the mean in the s can be justified by analogy with the mean in the s: the stationary value for latter is also the saddle point value of the kinetic Ising generating functional, while the saddle point in the s is conventionally set to zero.
Furthermore, we can show that yields a consistent solution. To do this we first note that by inverting the matrix , as shown in B, two point correlation functions and are both zero, where notation indicates averages under the gaussian measure , with . Consequently, we have