# Marginal process framework:

A model reduction tool for Markov jump processes

###### Abstract

Markov jump process models have many applications across science. Often, these models are defined on a state-space of product form and only one of the components of the process is of direct interest. In this paper, we extend the marginal process framework, which provides a marginal description of the component of interest, to the case of fully coupled processes. We use entropic matching to obtain a finite-dimensional approximation of the filtering equation, which governs the transition rates of the marginal process. The resulting equations can be seen as a combination of two projection operations applied to the full master equation, so that we obtain a principled model reduction framework. We demonstrate the resulting reduced description on the totally asymmetric exclusion process. An important class of Markov jump processes are stochastic reaction networks, which have applications in chemical and biomolecular kinetics, ecological models and models of social networks. We obtain a particularly simple instantiation of the marginal process framework for mass-action systems by using product-Poisson distributions for the approximate solution of the filtering equation. We investigate the resulting approximate marginal process analytically and numerically.

###### pacs:

©2018 American Physical Society.

## I Introduction

Markov jump processes (MJP) have many applications across science and engineering. The master equation (ME) van Kampen (2007), which governs the time evolution of the probability distribution of the process, is generally too complicated to solve analytically, and frequently infeasible to solve numerically. A popular alternative is the stochastic simulation (Gillespie) algorithm Gillespie (1977), which produces samples of trajectories of the process. For larger systems, however, this approach can be computationally very expensive, especially if the system exhibits multi-scale behavior.

In many cases, the MJP is defined on a product-form state-space and only one component is of direct interest, while the other component can be considered a nuisance variable.
The question arises whether it is possible to derive a reduced description for the stochastic process corresponding to the component of interest only.
Mathematically, the remaining component then has to be marginalized out of the full stochastic process.
This approach has been used for reaction networks in the case when the variables of interest do not influence the nuisance variables Zechner and Koeppl (2014), and also in other contexts Giesecke *et al.* (2011), mostly to speed up stochastic simulations.

A large number of other model reduction methods have been proposed in the literature. Many of these are based on time-scale separation or abundance separation, a non-exhaustive list being Cao *et al.* (2005); Hellander and Lötstedt (2007); Jahnke (2011); Thomas *et al.* (2012); Constable *et al.* (2013); Kang and Kurtz (2013); Kim *et al.* (2017).
Other approaches based on marginalization have recently been published Rubin *et al.* (2014); Bravi *et al.* (2016); Bravi and Sollich (2017) and focus on stochastic differential equation models.

In this article, we extend the marginalization approach of Giesecke *et al.* (2011); Zechner and Koeppl (2014) to a general MJP with full coupling between variable of interest and nuisance variable.
Marginalization requires the solution of the (in general, infinite-dimensional) filtering equation, which describes the evolution of the conditional probability of the nuisance variable given the trajectory of the marginal process.
We use entropic matching Ramalho *et al.* (2013); Bronstein and Koeppl (2018) to obtain a finite-dimensional approximation. The filtering equation and entropic matching can be interpreted as the result of projection operations consecutively applied to the full ME of the joint process. In this way, we obtain a principled model reduction method.
Our focus is on the marginal process framework as a theoretical tool for model reduction, rather than as a method for more efficient stochastic simulation.

A particularly important class of MJPs with applications in chemical kinetics, biological population models and models of social interactions are reaction networks.
For reaction networks with mass-action kinetics, a particularly simple reduced description is obtained when a product-Poisson ansatz distribution is used for the approximate solution of the filtering equation. We call the resulting reduced model the *Poisson-marginal process* and investigate it in detail.
Analogously, for exclusion processes, a product-Bernoulli ansatz distribution leads to what is the simplest possible reduced model within our framework. We investigate this reduced process for the example of the totally asymmetric exclusion process (TASEP) on the line with open boundaries.

This paper is organized as follows: After describing the problem setting in Sec. II, we provide an outline of the proposed method in Sec. III, using a simple model of constitutive gene expression as a running example. The general form of the marginal process framework is derived in Sec. IV. The finite-dimensional approximations necessary for a tractable description of the marginal process are discussed in Sec. V, where we also apply our method to the TASEP as a first example. The Poisson-marginal process for mass-action reaction networks is discussed in Sec. VI.

## Ii Setting

We consider an MJP on a product-form state-space , where and are countable sets. The marginal probability distribution of such a process (with initial distribution at time 0) is governed by the ME

where is the rate of transitioning from state to state . We will also require the backwards evolution operator , which acts on functions and is given by

(1) |

It is the adjoint of with respect to the pairing . Recall that governs the moment equations of the stochastic process. Thus, for a function , the expectation with respect to the distribution evolves according to

(2) | ||||

We are particularly interested in reaction networks, consisting of species and reactions that are specified as

(3) |

for . Here we have divided the set of all species into the species of interest, the *subnet*, and the remaining species , the *environment*.
We are interested in the case when the copy numbers of some of the species might be small, so that a fully stochastic description in terms of an MJP is necessary.
The process then describes the state of the subnet species, while describes the state of the environment species.
The state-space is given by and .
For a state , for each there exists a transition to the state with rate and change vector with and .
The operators and take the form

In all examples treated in this article, we will employ mass-action kinetics, which are given by with

(4) |

where denotes the falling factorial, is a reaction rate constant and where we have introduced the system size in terms of which we will analyze the behavior of the approximate marginal process.

We will also require a description of the process in terms of the reactions. We associate with each reaction channel a counting process which counts the number of firings of reaction over the time interval . The process can again be seen as a reaction network of the form (3) with values in , which can only change by increments of size 1 in any one of its components at a single time. The state of the original process is recovered from the state of these counting processes via

## Iii Outline of the method

As explained in the introduction, our goal is to derive a marginal process description for the process of interest . While the joint process is Markovian, this is no longer the case for the marginal process . The effect of the nuisance variables is implicitly contained in the memory of the process . We now illustrate our proposed method on a simple reaction network from biology. We focus on the underlying ideas and postpone derivations to later sections.

A very simple model of constitutive gene expression is given by the reaction network

(5) | ||||

mRNA |

Assuming that we are interested primarily in the protein dynamics, we will consider the mRNA to be a nuisance species. Our goal is to obtain a marginal description of the protein dynamics. Thus, the mRNA plays the role of the nuisance variable and the protein the role of the variable of interest . Assuming mass-action kinetics, the transition rates of the four reactions in the state are given in Table 1.

The steps to obtain a tractable approximate description of the marginal process are as follows: (i) Determine how the transition rates of the marginal process at time depend on the process history . (ii) Find a description for these marginal transition rates in terms of an evolution equation driven by the process history . The resulting equations are generally infinite-dimensional, but provide an exact description of the marginal process. (iii) Choose an approximation to obtain finite-dimensional equations. We will now carry out these steps for our simple example network.

#### Description of the marginal process

Since the first two reactions in Table 1 do not change the state of , the marginal process consists of two reactions, corresponding to the last two reactions in Table 1. Generally, since the marginal process is no longer Markovian, the transition rates for these reactions will depend on the entire history instead of just on the current state . However, the transition rate of the fourth reaction in Table 1 does not depend on the mRNA abundance. Consequently, its marginal transition rate remains unchanged and is given by . In particular, it depends only on the current state of the marginal process. In contrast to this, the rate for the third reaction does depend on the mRNA abundance. As will be derived in Sec. IV.1, the corresponding marginal transition rate is given by . This is an intuitive result, expressing the fact that in absence of information about the mRNA abundance, the marginal transition rate is given by the expectation of the transition rate conditional on all available information, i.e. conditional on the entire process history .

#### Filtering equation

We are now tasked with computing the expectation . A convenient way to do this is to derive an evolution equation, driven by the marginal process , for the so-called filtering distribution with respect to which this expectation is computed. The resulting equation is called the filtering equation. As the process is a jump process, the trajectory is piecewise constant. The filtering equation for will thus consist of two parts: Continuous evolution (described by a differential equation) as long as remains constant, and discontinuous jumps whenever jumps. This is schematically illustrated in Fig. 1.

As will be derived in Sec. IV.3, the continuous evolution is given by

(6) | ||||

Here the expectation is computed with respect to the distribution itself. The first two terms on the right-hand side of (6) simply correspond to the ME for the mRNA alone, the dynamics of which do not depend on the protein abundance. The last term, however, is new and describes how the information that is contained in the trajectory impacts our state of knowledge about the mRNA abundance. Note that the right-hand side of this equation does not depend on the state of the marginal process . This is because the reaction network has a “feed-forward” structure. In general, the filtering equation will depend on the state of .

As explained above, the filtering distribution will also jump instantaneously whenever the driving process jumps (see Fig. 1). At a jump of at time , the corresponding jump of the filtering distribution will depend on which reaction caused the change in . Since the protein decay reaction does not depend on the mRNA abundance, no information about mRNA abundance is obtained when this reaction fires. Therefore, in this case, in analogy to the continuous part (6) of the filtering equation in which the protein decay reaction likewise plays no role. When the protein abundance increases via the third reaction in Table 1, however, we instantaneously receive a finite amount of information about the mRNA state. To understand this, note for example that reaction three can fire only if there is at least one mRNA molecule present, i.e. . Thus, the filtering distribution immediately after the jump, , certainly has to satisfy . As will be shown in Sec. IV.3, the jump in the filtering distribution when reaction three fires is given by

(7) |

In principle, (6) and (7) provide a full, exact description of the marginal process, allowing us to compute the marginal transition rates at any time from the history of the marginal process . For some simple processes, the corresponding equations can be solved in closed form, as will be demonstrated in Sec. IV.4. In general, however, these equations constitute an infinite-dimensional system that does not provide a sufficiently simple description of the marginal process dynamics. We thus have to look for finite-dimensional approximations.

#### Finite-dimensional approximation

Since we are interested only in the expectation of the filtering distribution , it seems reasonable to consider the first-order moment equations for (6) and (7). However, the equations for the mean are not closed, because the second-order moment enters: We obtain

(8) |

from (6) and

(9) |

from (7). To find a tractable description of the marginal process, we employ moment closure to obtain a finite-dimensional system of equations. As will be explained in more detail in Sec. VI, in this article we want to obtain the simplest possible description of the (approximate) marginal process, and so choose a first-order closure, incorporating the mean of the filtering distribution only. A natural choice for such a closure ansatz is the Poisson distribution. Writing for the mean of the Poisson ansatz distribution, we obtain

(10) |

from (8) and

(11) |

from (9). These equations complete our description of the approximate marginal process, which we denote by (and refer to as the Poisson-marginal process) to distinguish it from the exact marginal process . Using (10) and (11), we can compute the marginal transition rates at time based on the full history of the approximate marginal process . We use the fact that knowing the history is equivalent to knowing the histories and of the two processes and (as defined in Sec. II) that count firings of reactions three and four. Solving (10) and (11) in terms of the process histories, we obtain for the marginal rate of the third reaction the expression

The Stieltjes integral here reduces to a sum, because is piecewise constant.

In order to evaluate the quality of our chosen approximation, we can compute the mean and the variance of the approximate marginal process and of the exact marginal process . Using results from Sec. VI.1, we find that the means of the exact and approximate marginal processes coincide at all times, assuming the initial conditions are chosen appropriately. At stationarity, the means are given by . The variances of the processes, however, differ. We compute the relative error of the variance approximation at stationarity and find

One particular regime where the error vanishes is time-scale separation, when with constant.
It is thus natural to compare our approach with an approximation that directly invokes time-scale separation. As mentioned in the introduction, there exist a large number of approaches. For our simple network, however, there is one particularly natural option (e.g. Cao *et al.* (2005)): We consider the process with the rate constant .
One easily checks that at stationarity, the time-scale separation ansatz reproduces the correct mean.
We compare the approximation of the full distributions at stationarity numerically in Fig. 2. We see that the Poisson-marginal process systematically improves on time-scale separation.

## Iv The marginal process and the filtering equation

In this section, we introduce the marginal process framework in full generality and derive the necessary equations for the case of a general MJP defined on a product-form state-space. We then specialize to the case of reaction networks, where it is useful to additionally introduce a slightly modified version of the marginal process.

### iv.1 The marginal process

As explained in Sec. III, the marginal process is in general no longer Markovian, so that the transition rates at time will depend on the entire history of the process over the time interval , instead of just on the current state .
We now proceed to compute these marginal transition rates in a way analogous to Zechner *et al.* (2014).

For the marginal process, the probability for a transition into the state to happen in the time interval , conditional on the process history (and assuming ), is given by

where is the total rate for jumps from the state leading to any state in . Thus, the marginal transition rate is given by

(12) |

i.e. by the expectation of the total transition rate conditional on the entire history of the marginal process up to time .

The distribution with respect to which the expectation is computed is the filtering distribution for the stochastic process given the “observed” trajectory of the stochastic process . The filtering distribution is the solution to the problem of estimating the state of the unobserved variable given the available information about the observed variable.

We see that, in order to obtain a useful description of the marginal process, we require a sufficiently simple description of the filtering distribution, or at least of the marginal transition rates that are computed with respect to the filtering distribution. One way to obtain such a description is to formulate an evolution equation for the filtering distribution driven by the marginal process . For the case of two fully coupled Markov jump processes, we are not aware of the required results existing in the literature, so we provide an elementary derivation. For an overview of stochastic filtering in general, see Bain and Crisan (2009).

### iv.2 The filtering equation

The filtering distribution is, in principle, defined over the state-space of the nuisance variable. It is, however, convenient and natural to consider it as a distribution over the joint state-space via

where is the Kronecker delta. This simply expresses the fact that conditional on , the state of is known to be with probability one. Depending on the situation, either of these two views will be more convenient, so that in the following, we will repeatedly switch between considering the filtering distribution to be defined either on or on .

For the derivations below, the following two operators will be useful: A summation operator and an evaluation operator (which depends on a state ), both of which act on functions . They are defined by

(13) | ||||

We can now derive the filtering equation. The filtering distribution will evolve according to a differential equation in between jumps of the process , and will jump whenever jumps. The intuition here is that, over an infinitesimal time-interval , if the observed process does not jump, we receive only an infinitesimal amount of information so that the change in the filtering distribution should also be infinitesimal. When, however, does jump, we receive a finite amount of information and correspondingly, the filtering distribution has to jump, too. See also Fig. 1.

Assuming that we have observed the process over a time-interval , these observations can be partitioned into the observations up to time , and the observation . We assume sufficiently small such that at most one jump occurred during the time-interval . Using Bayes’ theorem, we have

Multiplying this equation by , using that

and noting the definition of and in (13), we find

(14) |

In the denominator, we also used that

We now have to distinguish the cases and . When , i.e. remained constant over the time-interval , we have . Subtracting from (14), dividing by and taking the limit , we obtain

(15) |

This is the differential equation that the filtering distribution satisfies in between jumps of the process . It turns out that (15) can also be obtained as an orthogonal projection of the full (joint) ME computed with respect to the Fisher-Rao information metric. This point of view is described in Appendix B and will allow us to better understand the finite-dimensional approximation of the filtering equation introduced in Sec. V.2 below.

When , i.e. when jumps during the time-interval , we have . Taking the limit in (14), we obtain an expression for the filtering distribution immediately after the jump, , in terms of the filtering distribution immediately before the jump, , given by

(16) |

where is the value of after the jump.

We now write down expressions (15) and (16) explicitly in terms of the transition rates. The explicit expressions are simpler if we regard the filtering distribution as being defined only over , i.e. . We define

the total rate of those transitions out of state that change the -component. In between jumps of , we then have

(17) | ||||

where denotes the expectation computed using the filtering distribution . The first term on the right-hand side of (17) is an ME for the nuisance component involving only those transitions which do not change the -component of the state. Note that the corresponding transition rates can still depend on the current state of . The second term in (17) accounts for the observations. Here the observations contain information by virtue of the fact that does not jump as long as (17) is in effect. From this equation, we also see that the effect of “feedback” from the variable of interest to the nuisance variable is very simple: Because is constant between its jumps, is simply fixed to its current value in the transition rates entering (17).

When does jump, so that , the corresponding jump in the filtering distribution is given by

(18) |

The combination of (17) and (18) with the marginal transition rates (12) provides a full description of the marginal process . For simple processes, these expressions can be evaluated and solved in closed form, as will be demonstrated for a simple reaction network in Sec. IV.4. Before discussing the example, we specialize the discussion to reaction networks.

### iv.3 Reaction networks

As we can see from (17), the transitions of the MJP are naturally partitioned into two groups: Those that change the state of the -component, and those that do not. We will denote by the subset of indices of those reactions which can modify , and by the indices of all remaining reactions. This partitioning also results in a partitioning of the counting processes (defined in Sec. II) into two processes and with the former containing the reactions in and the latter the remaining reactions in . Note that the state of the subnet can then be recovered from alone, while the state of the environment generally requires knowledge of both and :

We now specialize the results obtained for the marginal process for general MJPs to the case of reaction networks. At this point, there arises an issue regarding the precise definition of the history of the marginal process on which we condition in the marginal transition rates (12). Generally, it can happen that a reaction network contains two different reactions, with different change vectors and , for which however the components corresponding to the subnet are identical, . For example, this is the case for the simple gene expression model with negative feedback

(19) |

Here are the two possible states of a gene, and X is the gene product which is produced when the gene is in state . The gene product can also reversibly bind to the gene and switch it to state , in which production of X is no longer possible. When the gene product X is considered to constitute the subnet, the reactions and both lead to an increase of X of size 1. Similarly, and both lead to a decrease of X of size 1.

For such a reaction network, we obtain two different marginal processes depending on whether the history of the process is defined to be just the trajectory (as was done in Sec. IV.1), or the trajectory of the counting processes of all reactions which change the subnet. In the former case we will speak of the marginal process and in the latter case of the marginal process . Both marginal processes are meaningful, and only minor changes in the derivations presented in Sec. IV.1 and IV.2 are necessary. We will present expressions for both cases, because each version of the marginal process has advantages and disadvantages.

The marginal transition rate for the process for reaction is given by

(20) |

where . This is different from (12), which for reaction networks reads

for a transition with change vector , and where the summation runs over all such that .

The filtering equation, similarly, exists in two variants, depending on which form of the marginal process we consider. It turns out however that the continuous part (15) of the filtering equation is the same for both variants and explicitly reads

(21) | ||||

For the marginal process as defined in Sec. IV.1, the jump in when jumps is given by

(22) |

where the sums in numerator and denominator each run over all reaction indices such that . If instead we consider the marginal process , a transition leads to a jump in the filtering distribution given by

(23) |

The absence of summations in (23) will be useful in Sec. VI.1. Here we proceed to discuss a simple example for which only the marginal process is useful.

### iv.4 Example: A case with finite-dimensional filtering equations

We consider the simple gene expression model (19), with the gene product X chosen to constitute the subnet. For this model, every reaction changes the state of X, so that the marginal process would be equal to the full process and thus of no interest. Consequently, we instead consider the (one-dimensional) marginal process . This process has two reactions, and , with rates at time given by

respectively, where is the filtering distribution mean of the gene state , and where we assumed that only a single copy of the gene is present. The filtering distribution , initially defined on , is fully determined by a single number due to the conservation relation . Similarly, for the expectation values with respect to we have . We can now write down the (one-dimensional) filtering equation using (21) and (22). In between jumps of X, the result reads

(24) |

where . This is solved, for an initial value of at time , by

where we used that is constant and equal to in between jumps. When the reaction fires, the filtering distribution mean jumps to 1. This is clear because both reactions of (19) that cause a change in of size lead to the gene being in state . More interesting is the case when the reaction fires. Then the jump in the filtering distribution mean is given by

(25) |

This completes the description of the marginal process. If we consider as an auxiliary variable and use it to augment the process state, the resulting process is a piecewise-deterministic Markov process with two reactions and deterministic evolution in between jumps given by (24). We show a sample from this augmented process in Fig. 3.

While simple systems such as the one discussed in this section can be treated without approximation, more complicated systems will require an approximate solution of the filtering equations, as was already mentioned in Sec. III. We address this issue next.

## V Finite-dimensional approximations of the filtering equation

The filtering equation is in general infinite-dimensional and, just as the ME, far too complicated for either analytical or numerical solution. Thus, regardless of whether one is interested in the marginal process for analytical investigation or for stochastic simulation, an approximate treatment of the filtering equation is necessary.

### v.1 Moment equations and moment closure

A standard approach to obtain a finite-dimensional approximation is moment closure Kuehn (2016), which in the context of stochastic filtering is also known as assumed-density filtering Brigo *et al.* (1999).
While moment closures have often been considered *ad-hoc* approximations, a simple variational derivation has been obtained recently Bronstein and Koeppl (2018).

We first derive the filtering moment equations in their general form, starting from (15) and (16). The moments of the filtering distribution will evolve according to a differential equation in between jumps of the marginal process, and will jump whenever the marginal process jumps. Again considering to be defined on , we consider the moment equation for a function , which can be obtained as demonstrated in (2) and is given by

(26) | ||||

in between jumps of , where we used that . When jumps, we have

(27) |

We skip explicit expressions in terms of rates for general MJPs and instead write down the simpler explicit expressions for reaction networks, which read (now for a function )

(28) | ||||

in between jumps. Focusing on the marginal process , at a jump of via reaction we have

(29) |

The moment equations are, as is generally the case, not closed: Choosing, for instance, for a reaction network to obtain first-order moments, the resulting equations will depend on moments of order higher than one. In this way, an infinite hierarchy of moment equations is obtained. We require a way to close the system of equations.

In this paper, we consider a special case of moment closure, which has a dual interpretation based on minimization of relative entropy Ramalho *et al.* (2013); Bronstein and Koeppl (2018) and on projection using the Fisher-Rao information metric Brigo *et al.* (1999).
We next present a derivation analogous to Bronstein and Koeppl (2018), which allows for a unified treatment of the continuous and discrete parts of the filtering equation.

### v.2 Entropic matching

A finite-dimensional approximation of a distribution can be obtained by choosing a distribution from within a finite-dimensional parametric family with parameters . There are strong arguments Leike and Enßlin (2017) for choosing this approximation so that it minimizes the relative entropy