Efficient reduction of Kappa models by static inspection of the rule-set

# Efficient reduction of Kappa models by static inspection of the rule-set

## Abstract

When designing genetic circuits, the typical primitives used in major existing modelling formalisms are gene interaction graphs, where edges between genes denote either an activation or inhibition relation. However, when designing experiments, it is important to be precise about the low-level mechanistic details as to how each such relation is implemented. The rule-based modelling language Kappa allows to unambiguously specify mechanistic details such as DNA binding sites, dimerisation of transcription factors, or co-operative interactions. However, such a detailed description comes with complexity and computationally costly execution. We propose a general method for automatically transforming a rule-based program, by eliminating intermediate species and adjusting the rate constants accordingly. Our method consists of searching for those interaction patterns known to be amenable to equilibrium approximations (e.g. Michaelis-Menten scheme). The reduced model is efficiently obtained by static inspection over the rule-set, and it represents a particular theoretical limit of the original model. The Bhattacharyya distance is proposed as a metric to estimate the reduction error for a given observable. The tool is tested on a detailed rule-based model of a -phage switch, which lists rules and agents. The reduced model has rules and agents, and provides a dramatic reduction in simulation time of several orders of magnitude.

\numberofauthors

3

## 1 Introduction

One of the main goals of synthetic biology is to design and control genetic circuits in an analogous way to how electronic circuits are manipulated in human made computer systems. The field has demonstrated success in engineering simple genetic circuits that are encoded in DNA and perform their function in the cellular environment [20], [24]. However, there remains a need for rigorous quantitative characterisation of such small circuits and their mutual compatibility [33], which in electronic circuits is easily guaranteed by impedance matching. The important ingredient towards such characterisation is having an appropriate language for capturing model requirements, for prototyping the circuits, and for predicting their quantitative behaviour before committing to the time-intensive experimental implementation.

Quantitative modelling of biomolecular systems is particularly challenging, because one deals with stochastic, highly dimensional, non-linear dynamical systems. For these reasons, modellers often immediately apply ad-hoc simplifications which neglect the mechanistic details, but allow to predict (simulate) the system’s behaviour as a function of time. For example, the fact that protein activates protein is often modelled immediately in terms of a reaction with the Hill kinetic coefficient (e.g. ), while the mechanism in fact includes the formation of a macromolecular complex and its binding to a molecular target. While such models are easier to execute, the simplification makes models hard to edit or refine. For example - a new experimental insight about an interaction mechanism cannot be easily integrated properly into the model, since several mechanistic steps are merged into a single kinetic rate. Moreover, an abstract model does not provide precise enough design guide for circuit synthesis, and sometimes, only the more detailed models explain certain behaviours (e.g., in [15], it is shown that only when incorporating the mRNA, the model explains certain experimentally observed facts).

Rule-based languages, such as Kappa [18] or BioNetGen [4], are designed to naturally capture the protein-centric and concurrent nature of biochemical signalling: the internal protein structure is maintained in form of a site-graph, and interactions can take place upon testing only patterns, local contexts of molecular species. A site-graph is a graph where each node contains different types of sites, and edges can emerge from these sites. Nodes typically encode proteins and their sites are the protein binding-domains or modifiable residues; the edges indicate bonds between proteins. Then, every species is a connected site-graph, and a reaction mixture is a multi-set of connected site-graphs. The executions of rule-based models are traces of a continuous-time Markov chain (CTMC), defined according to the principles of chemical kinetics. In general, rule-based models are advantageous to the classical reaction models (Petri nets) for two major reasons. First, the explicit graphical representation of molecular complexes makes models easy to read, write, edit or compose (by simply merging two collections of rules). For example, the reaction of dimerization between two lambda CI molecules is classically written , where the convention is that represents a free monomer, and represents a free dimer. On the other hand, the same reaction written in Kappa amounts to:

 'CI2:'CI(ci,or),CI(ci,or)↔CI(ci!1,or),CI(ci!1,or)@k2+,k2−,

where the binding sites ci and or are binding sites of the protein CI, and CI denotes that the identifier of the rule-based bond account for the physical interaction between the two CI monomers, is . Secondly, a rule set can be executed, or subjected to formal static analysis: for example, it provides efficient simulations [11], [29] automated answers about the reachability of a particular molecular complex [13], or about causal relations between rule executions [10].

The downside of incorporating too many mechanistic details in the model, is that they lead to computationally costly execution. For this reason, we define and implement an efficient method for automatically detecting and applying equilibrium approximations. As a result, one obtains a smaller model, where some species are eliminated, and the kinetic rates are appropriately adjusted. In this way, the experimentalist can choose to obtain the predictions more efficiently but less accurately, however without losing track of the underlying low-level mechanisms.

To the best of our knowledge, there exist no efficient methods to quantify the error induced by time-scale separation approximations for biochemical reaction networks. The bottleneck is the complexity of the original system, whose behaviour is computationally costly to analyse - often even to run a single simulation trace. The correctness of our approach relies on the fact that the approximate model is equal to the original one, in the artificial limit where certain reactions happen at a sufficiently larger time-scale than others, and they are seemingly equilibrated shortly upon the reactions initiate. Faced with designing or modelling biological circuits with many connections and highly heterogeneous hardware, the ability of predicting solutions that lack a precise error measurement is of secondary importance. What is desirable is to have a prototype or model of a circuit that displays the desired behaviour at a qualitative level, which later on can be further improved. Furthermore, most kinetic rates are rarely precisely determined experimentally, and hence precise quantitative error estimates are not necessarily relevant and on top are time consuming, when faced with imprecise input characteristics of the underlying prototype model.

Implementation and testing. The tool is implemented in OCaml, and it is tested on a detailed rule-based model of a -phage switch [37], [38]. Simulations were carried out on the complete chemical reaction genetic circuit model which contains 96 rules, 16 agents and 61 species. The model is reduced to only 11 rules and 5 agents. It is worthwhile emphasising that our reduction method is general – applicable to any rule-set, and that the reduced model is obtained almost instantaneously.

Related work. Our method is inspired from the work presented in [36] and [32], with the adaptations which arise due to the differences between the reaction lists and the rule-based language. Apart from rule-based models, other formalisms were proposed for characterisation of synthetic devices, such as, e.g., linear temporal logic [2]. In a broader context, the principle of obtaining conclusions about system’s dynamics by analyzing their model description, originates from, and is exhaustively studied in the field of formal program verification and model checking [7], [6], while it is recently gaining recognition in the context of programs used for modeling biochemical networks. An example is the related work of detecting fragments for reducing the deterministic or stochastic rule-based models [16], [19], [17], detecting the information flow for ODE models of biochemical signaling [26, 5], or the reaction network theory [8]. Related works on systematic model reduction techniques are based on separating time-scales [28, 39, 30, 23, 29], or they propose numerical algorithms which focus of efficiently obtaining the evolution of the probability distribution over time (the master equation) [34, 27].

Paper outline. Section 2 introduces two concepts: (i) the classical stochastic and deterministic model of chemical reaction networks, and (ii) the rule-based modelling language Kappa. Section 3 illustrates the equilibrium approximation schemes (the generalized Michaelis-Menten and fast dimerisation) and discusses the theoretical guarantees about the approximate system. In Section 4, we outline the algorithms for detecting the approximation schemes (operator site reduction and dimerisation reduction). Finally, in Section 5, we describe the -phage model and we compare the results and the CPU time for the original and approximate model.

## 2 Preliminaries

### 2.1 Stochastic chemical reaction networks

For a well-mixed reaction system with molecular species , the state of a system can be represented as a multi set of those species, denoted by . The dynamics of such as system is determined by a set of reactions . Each reaction is a triple , written down in the following form:

 a1jS1,…,anjSn\lx@stackrelkj→a′1jS1,…,a′njSn, such that a′ij=aij+νij.

The vectors and are often called respectively the consumption and production vectors due to reaction , and is the kinetic rate of reaction . If the reaction occurs, after being in state , the next state will be . This will be possible only if for all . Under certain physical assumptions [21], the species multiplicities follow a continuous-time Markov chain (CTMC) , defined over the state space . Hence, the probability of moving to the state from after time is

 P(X(t+Δ)=x+νk∣X(t)=x)=λj(x)Δ+o(Δ),

with the propensity of th reaction, assumed to follow the principle of mass-action: . The binomial coefficient reflects the probability of choosing molecules of species out of available ones.

### 2.2 Deterministic limit

In the continuous, deterministic model of a chemical reaction network, the state is represented by listing the concentrations of each species. The dynamics is given by a set of differential equations in form

 ddtzi=νijr∑j=1cjn∏i=1zi(t)aij, (1)

where is a deterministic rate constant, computed from the stochastic one and the volume from ( denotes the -norm of the vector ). The deterministic model is a limit of the stochastic model when all species in a reaction network are highly abundant. Denote by the number of times that the -th reaction had happened until the time . Then, the state of the stochastic model at time is

 X(t)=X(0)+r∑j=1Rj(t)νj. (2)

The value of is a random variable, that can be described by a non-homogenous Poisson process, with parameter , that is, . Then, the evolution of the state is given by the expression

 X(t)=X(0)+r∑j=1ξj(∫t0λj(X(s))ds)νj. (3)

By scaling the species multiplicities with the volume: , adjusting the propensities accordingly, in the limit of infinite volume , the scaled process follows an ordinary differential equation (1) [31].

It is worth mentioning here that the above scaling from stochastic to the deterministic model is a special case of a more general framework presented in [30], referred to as the multiscale stochastic reaction networks. Intuitively, the deterministic model is a special case where all species are scaled to concentrations and reaction rates are scaled always in the same way, depending on their arity. The reductions shown in this paper will be based on a variant of multiscale framework, where some species are scaled to concentrations and others are kept in copy numbers, and where reaction rates have varying scales as well.

### 2.3 Rule-based Models

In this section, we introduce the rule-based modeling language Kappa, which is used to specify chemical reaction networks, by explicitly denoting chemical species in form of site-graphs. A simple example of a Kappa model is presented in Fig. 1.

For the stochastic semantics of Kappa, that is a continuous-time Markov chain (CTMC) assigned to a rule-based model, we refer to [12] or [17]. Intuitively, any rule-based system can be expanded to an equivalent reaction system (with potentially infinitely many species and reactions). The stochastic semantics of a Kappa system is then the CTMC assigned to that equivalent reaction system. Even though the semantics of a Kappa system is defined as the semantics of the equivalent reaction system, in practice, using Kappa models can be advantageous for several reasons - they are easy to read, write, edit or compose, they can compactly represent potentially infinite set of reactions or species, and, most importantly, they can be symbolically executed.

We present Kappa in a process-like notation. We start with an operational semantics.

Given a set , denotes the power set of (i.e. the set of all subsets of ). We assume a finite set of agent names , representing different kinds of proteins; a finite set of sites , corresponding to protein domains; a finite set of internal states , and , two signature maps from to , listing the domains of a protein which can bear respectively an internal state and a binding state. We denote by the signature map that associates to each agent name the combined interface .

\newdef

definitionDefinition {definition} (Kappa agent) A Kappa agent is defined by its type and its interface . In , the interface is a sequence of sites in , with internal states (as subscript) and binding states (as superscript). The internal state of the site may be written as , which means that either it does not have internal states (when ), or it is not specified. A site that bears an internal state is written (in such a case ). The binding state of a site can be specified as , if it is free, otherwise it is bound (which is possible only when ). There are several levels of information about the binding partner: we use a binding label when we know the binding partner, or a wildcard bond when we only know that the site is bound. The detailed description of the syntax of a Kappa agent is given by the following grammar:

 a::=N(σ)(agent)N::=A∈A(agent name% )σ::=ε∣s,σ(interface)s::=nλι(site)n::=x∈S(site name% )ι::=ϵ∣m∈I(internal state)λ::=ϵ∣−∣i∈N(binding state)

We generally omit the symbol .

{definition}

(Kappa expression) Kappa expression is a set of agents A and fictitious agents . Thus the syntax of a Kappa expression is defined as follows:

 E ::=ε∣a , E∣∅ , E.

The structural equivalence , defined as the smallest binary equivalence relation between expressions that satisfies the rules given as follows

 E , A(σ,s,s′,σ′) , E′≡E , A(σ,s′,s,σ′) , E′E , a , a′ , E′≡E , a′ , a , E′E≡E , ∅i,j∈N and i does not occur % in EE[i/j]≡Ei∈N and i occurs only % once in EE[ϵ/i]≡E

stipulates that neither the order of sites in interfaces nor the order of agents in expressions matters, that a fictitious agent might as well not be there, that binding labels can be injectively renamed and that dangling bonds can be removed.

{definition}

(Kappa pattern,mixture and species)
A Kappa pattern is a Kappa expression which satisfies the following five conditions: (i) no site name occurs more than once in a given interface; (ii) each site name in the interface of the agent occurs in ; (iii) each site which occurs in the interface of the agent with a non empty internal state occurs in ; (iv) each site which occurs in the interface of the agent with a non empty binding state occurs in ; and (v) each binding label occurs exactly twice if it does at all — there are no dangling bonds. A mixture is a pattern that is fully specified, i.e. each agent documents its full interface , a site can only be free or tagged with a binding label , a site in bears an internal state in , and no fictitious agent occurs. A species is a connected mixture, i.e. for each two agents and there is a finite sequence of agents s.t. there is a bond between a site of and of and for , there is a site of agent and a site of agent .

{definition}

(species occurring in a pattern) Given Kappa patterns and , if defines a Kappa species, and is a substring of , we say that a species occurs in a pattern .

{definition}

(Kappa rule) A Kappa rule is defined by two Kappa patterns and , and a rate , and is written: .

A rule is well-defined, if the expression is obtained from by finite application of the following operations: (i) creation (some fictitious agents are replaced with some fully defined agents of the form A, moreover documents all the sites occurring in and all site in bears an internal state in ), (ii) unbinding (some occurrences of the wild card and binding labels are removed), (iii) deletion (some agents with only free sites are replaced with fictitious agent ), (iv) modification (some non-empty internal states are replaced with some non-empty internal states), (v) binding (some free sites are bound pair-wise by using binding labels in ).

In our static inspection of rules, we will test species (fully defined connected mixtures). To this end, we adopt the terminology of reactant, modifier and product from [32].

{definition}

(reactant, modifier, product) Given a rule , a Kappa species is called

• a reactant, if it occurs in pattern and does not occur in pattern ,

• a modifier, if the number of occurrences in pattern equals the number of occurrences in pattern ,

• a product, if it does not occur in pattern , and it occurs in pattern .

{definition}

(Kappa system) A Kappa system is given by an initial mixture , a set of Kappa patterns called observables, and a finite set of rules .

## 3 Model Approximation

In this section, we present the mathematical analysis underlying the approximation algorithms presented in Section 4. Our reductions will be based on three reduction schemes: enzymatic catalysis reduction, generalized enzymatic catalysis reduction and fast dimerization reduction.

### 3.1 Enzymatic reduction

Assume the elementary enzymatic transformation from a substrate to a product , through the intermediate complex :

 E+Sk1\vbox \ooalign{\raise 1.0pt\hbox{\relbar\joinrel⇀\joinrel}\cr\lower 1% .0pt\hbox{↽\joinrel\relbar\joinrel}} k2E:S\lx@stackrelk3→E+P, (4)

which our algorithm will convert to the well-known Michaelis-Menten form

 S\lx@stackrelk3ETK1+KxS→P, (5)

where denotes the total concentration of the enzyme, and .

The above approximation is generally considered to be sufficiently good under different assumptions, such as, for example, that the rate of dissociation of the complex to the substrate is much faster than its dissociation to the product (i.e. ), also known as the equilibrium approximation. Even if the equilibrium condition is not satisfied, it can be compensated in a situation where the total number of substrates significantly outnumbers the enzyme concentration - , known as the quasi-steady-state assumption.

Whenever one of the above assumptions holds, the quantity of the intermediate complex can be assumed to be rapidly reaching equilibrium, that is, . Then, it is straightforward to derive the rate of direct conversion from substrate to product:

 ddtxP=k3ETK1+KxSxS,

which exactly corresponds to the equation for the rule (5).

The informal terminology of being ‘significantly faster’, motivated the rigorous study of the limitations of the approximations based on separating time scales. While the enzymatic (Michaelis-Menten) approximation has been first introduced and subsequently studied in the context of deterministic models (e.g. [35], Ch.6), it was more recently that the time-scale separation was investigated in the stochastic context [39], [25], [9], [28], [40], [22]. Notably, the following result from [14] (also shown as a special case of the multi scale stochastic analysis from [30]), shows that, under an appropriate scaling of species’ abundance and reaction rates, the original model and the approximate model converge to the same process.

###### Theorem 1

(Darden [14], Kang [30]). Consider the reaction network (4) (equivalently the rule-based system depicted in Fig. 2), and denote by , , and the copy numbers of the respective species due to the random-time change model (2). Denote by and and assume that . Assume that , , , , and . Then converges to and

 ddtvE(s)=ET1+^KxE(s) % \;\;\; and \;\;\;ddtxS=−ETγ3^KxS(t)1+^KxS(t),

where .

The assumptions listed in the theorem capture the that: (i) and are scaled to concentrations, while and remain in copy numbers; (ii) the stochastic reaction rate is an order of magnitude smaller than the rates and (as a consequence of being related to the bimolecular, and not unimolecular reaction).

A complete proof is provided in [30]. We here outline the general idea. Let be a natural number, and let , , , . Writing out the scaled random time-change model for the substrate gives:

 ZS(t)=ZS(0) −N−1ξ1(N∫t0γ1ZS(s)ZE(s)ds) +N−1ξ2(N∫t0γ2ZS:E(s)ds),

and writing out the scaled random time-change model for the complex gives:

 ZE:S(t)=ZE:S(0) +ξ1(N∫t0γ1ZS(s)ZE(s)ds) −ξ2(N∫t0γ2ZS:E(s)ds) −ξ3(N∫t0γ3ZS:E(s)ds).

After dividing the latter with , and applying the law of large numbers, we obtain the balance equations analogous to assuming that the complex is at equilibrium. This equation implies the expression for . The equation for follows from the model of : we first use the conservation law and then substitute the obtained value of .

In order to confirm that the reduction is appropriate, our goal is now to show that the scaled versions of the original model (4) and the reduced model (5) are equivalent in the limit when . Let be the scaled random time change for the product in the original model, and in the reduced model. Notice that, from the balance equations, . According to the reduced system (5), the random time change for the product is given by

 ^ZP(t) =^ZP(0)+N−1ξ(∫t0k3ETK1+KN^ZS(s)N^ZS(s)ds) =^ZP(0)+N−1ξ(∫t0Nγ3ET^K1+^K^ZS(s)^ZS(s)ds).

Passing to the limit, we obtain the desired relation .

The above Theorem does not provide the means of computing the approximation error, or an algorithm which suggests which difference in time-scales is good enough for an approximation to perform well. Rather, this result shows that the enzymatic approximation is justified in the limit when the assumptions about the reaction rates and species’ abundance are met. In other words, when , the scaled versions of the original and reduced models – e.g. and – both converge to at the same, well-behaved process. This provides confidence that the actual process is a good approximation of the process .

In our reduction algorithm (Section 4), we will apply the reduction whenever the pattern (4) is detected. In order to ensure the validity of the approximation in the context of other rules, we will additionally check that the enzyme (resp. complex ) have initially low copy number (zero resp.), and that they don’t appear in any other rule (unless it is another enzymatic catalysis  scheme).

###### Example 1

To illustrate the meaning of the Theorem 1, we apply our reduction method on a small example shown in Fig. 1. We plot the mean and we compute the standard deviation of the protein level for the original and for the reduced model. Then, we scale up the parameters and , as well as the initial concentration of transcription factor , in order to mimic the effect of choosing a larger in Theorem 1. The deviation between the curves is decreased, as can be seen in Fig. 2. In order to obtain the error of using the reduced system instead of the original one, we compute the Bhattacharyya distance for each time point, for the actual parameter set and for the scaled parameter set. As expected, the distance is overall smaller in the scaled system. Especially in the scaled system (green line), we can observe that initially, the distance is larger, and then it decreases with time. This is because the original system takes time to reach the equilibrium state which is, in the reduced system, assumed immediately.

### 3.2 Generalised enzymatic reduction

The enzymatic approximation can be generalized to a situation where many sets of substrates compete for binding to the same enzyme. Consider a sub-network of reactions where the -th such reaction reads:

 E+Si,1+…+Si,mi ki\vbox \ooalign{\raise 1.0pt\hbox{\relbar\joinrel⇀\joinrel}\cr\lower 1% .0pt\hbox{↽\joinrel\relbar\joinrel}} k−iE:Si,1:…:Si,mi\lx@stackrel^ki→E+Pi.

The resulting approximation is

 Si,1+…+Si,mi\lx@stackrel^kiETK1xSiZ→Pi,

where , and . The latter expression follows from for all .

The correctness of the generalized enzymatic reduction can be shown with the same technique as Theorem 1. Each substrate and product should be scaled to concentrations, while all intermediate complexes and the enzyme remain in copy numbers. The relations between the reaction rates are equivalent.

### 3.3 Fast dimerization reduction

Consider now the dimerisation reaction . Assuming that both rates and are fast comparing to other reactions involving or , it is common to assume that the reaction is equilibrated, that is, , where and denote the copy number at time , of monomers and dimers respectively. Such assumption allows us to eliminate the dimerization reactions, and only the total amount of molecules needs to be tracked in the system. The respective monomer and dimer concentrations can be expressed as fractions of the total concentration:

 xM(t)=14K(√8KMT(t)+1−1), and xM2(t)=MT(t)2−12xM(t),

where and .

The correctness of the generalized enzymatic and dimerization reduction can be shown with the same technique as Theorem 1. In the context of multiscale stochastic reaction networks, both reaction rates should be treated as fast.

## 4 Reduction Algorithm

The idea of the reduction is to transform a Kappa system to a Kappa system with fewer rules and fewer agents, while still capturing the observables and the relevant dynamics. Our algorithm statically analyzes the rule-set, in search for one of the following mechanistic schemes:

• the modifier elimination and similar rule composition, that are the patterns amenable to the exact reduction (providing the equivalent rule-based model with fewer rules), as well as

• the generalized enzymatic catalysis, enzymatic catalysis and fast dimerization reductions, three interaction patterns amenable to approximate reduction based on time-scale separation (Section 3).

Recall that the generic framework for time-scale separation in biochemical reaction networks is shown in [30]. A special case of this framework is Theorem 1, which confirms that using the classical enzymatic approximation in stochastic setting is adequate. After detecting one of the five interaction patterns, our algorithms, similarly as in [32], perform additional checks, in order to avoid the situations where the equilibrium assumptions are violated due to interleavings with the rest of the reaction network.

The top-level algorithm is shown in Alg. 1. We next describe each of the five interaction patterns in more detail.

### 4.1 Similar rule composition

In similar rule composition scheme, rules have the same reactants, modifiers and products, but different rates. Our algorithm combines them into a single rule, by summing their rate laws. Notice that this reduction is exact, that is, applying the similar rule composition does not change the semantics of the rule-based system.

### 4.2 Modifier elimination

This reduction can be applied when a species only appears as a modifier throughout a rule-based system. Such a species will never change its copy number throughout the dynamics, and therefore, its quantity will be constant. The species being always a modifier does affect the dynamics of the system, and all the rule rates where the species was involved need to be adapted. Concretely – after the species is eliminated, each rate law will be multiplied by the initial copy number of this species. Notice that modifier elimination reduction is exact, that is, applying the modifier elimination does not change the semantics of the rule-based system.

### 4.3 Fast dimerization reduction

The algorithm searches for dimerisation rules. Suppose that a pair of reversible reactions is detected. Before proceeding to the reduction, we check whether a dimer is produced elsewhere, or if the monomer is a modifier elsewhere. These checks are necessary because they prevent from deviating from the assumed equilibrium. Finally, if all checks passed, the dimerization reaction can be eliminated. A new species is introduced, and, wherever the monomer or dimer were involved, they are replaced by the species , and the rate is adapted accordingly, by the expressions shown in Section 3.3.

### 4.4 Generalised enzymatic reduction

The algorithm searches for the scheme described in Section 3.2, by searching for candidate enzymes. Each pattern is tested as to whether it is catalyzing some enzymatic reduction. If a pattern indeed is an enzyme (operator) in an enzymatic reaction scheme, a set of all patterns which compete to bind to is formed, as well as the set of their complexes . Then, before proceeding with the reduction, additional tests must be performed: (i) pattern must be a species, and it is not an observable, (ii) must be small in copy number, that is, its initial copy number is smaller than a threshold, (iii) can neither be produced, nor degraded, (iv) complex is not an observable and is never appearing in another rule of and has initially zero abundance. Then, the patterns and can be eliminated from the rule-set and the reaction rates are adjusted according to the description in Section 3.2.

Often times, enzymatic catalysis reduction is appropriate to eliminate the binding of the transcription factor to the operator site. In this context, the operator site takes the role of the enzyme, and transcription factor(s) the role of the substrate. Whenever a candidate enzyme is detected, and the other algorithm checks pass, the rates are appropriately scaled. The competitive enzymatic reduction is suitable in a situation when more transcription factors compete for binding the enzyme, each in a different reaction. In other words, the algorithm finds rules where different substrates compete for the same enzyme.

###### Example 2

We illustrate the competitive enzymatic transformation on a small subnetwork of the -phage model, which will be introduced in Section 5. The four rules presented below model the binding of the agent RNAP to the operator site of the agent PRE and subsequent production of protein CI. Agent PRE binds either only RNAP (at rate and ), or simultaneously with CII (at rate and ). The protein can be produced whenever PRE and PRE are bound, but the rates will be different depending on whether only RNAP is bound to the operator (rate ), or, in addition, CII is bound to the operator (rate ):

 PRE(cii,rnap),RNAP(p1,p2) ↔PRE(cii,rnap!1),RNAP(p1!1,p2)@k1+,k1− PRE(cii,rnap),CII(pre),RNAP% (p1,p2) ↔PRE(cii!1,rnap!2% ),CII(pre!1),% RNAP(p1!2,p2)@ka+,ka− PRE(cii,rnap!1),RNAP(p1!1,p2) →PRE(cii,rnap!1),RNAP(p1!1,p2),10CI(ci,or)@kb PRE(cii!1,rnap!2),CII(pre!1),RNAP(p1!2,p2) →PRE(cii!1,rnap!2),CII(pre!1),% RNAP(p1!2,p2),10CI(ci,or)@ka

After the competitive enzymatic reduction, the operator PRE is eliminated from each of the two competing enzymatic catalysis  patterns. Finally, the production of CI is modelled only as a function of RNAP and CII, and the rate is appropriately modified:

 RNAP(p1,p2),% CII(pre) →RNAP(p1,p2),CII(pre),10CI(pr,ci)@knew.

## 5 Case Study: λ-phage

The phage is a virus that infects E.coli cells, and replicates using one of the two strategies: lysis or lysogeny. In the lysis strategy, phage uses the machinery of the E.coli cell to replicate itself and then lyses the cell wall, killing the cell and allowing the newly formed viruses to escape and infect other cells, while in the lysogeny scenario, it inserts its DNA into the host cell’s DNA and replicates through normal cell division, remaining in a latent state in the host cell (it can always revert to the lysis strategy). The decision between lysis and lysogeny is known to be influenced by environmental parameters, as well as the multiplicity of infection and variations in the average phage input [1].

The key element controlling the decision process is the operator (shown in Fig. 5), which is composed of three operator sites (, , ) to which transcription factors can bind, in order to activate or repress the two promoters ( and ) overlapping the operator sites. When RNAP (RNA polymerase, an enzyme that produces primary transcript RNA) binds to , it initiates transcription to the left, to produce mRNA transcripts from the cI gene; RNAP bound to the promoter, on the other hand, initiates transcription to the right, producing transcripts from the cro gene. The two promoters form a genetic switch, since transcripts can typically only be produced in one direction at a time.

The cI gene codes for the CI protein, also known as the repressor: in its dimer form (two CI monomers react to form a dimer, ), it is attracted the the operator sites in the phage’s DNA, repressing the promoter from which Cro production is initiated and further activating CI production. Similarly, the cro gene codes for the Cro protein, which also dimerizes in order to bind to operator sites and prevent production from , or even its own production.

While and can bind to any of the three operator sites at any time, they have a different affinity to each site. The has its strongest affinity to the operator site, next to the site, and finally to the site (in other words, first turns off , then activates , and finally, represses its own production), while has the reverse affinity (it first turns off CI production, then turns off its own production).

The feedback through the binding of the products as transcription factors coupled with the affinities described makes the operator behave as a genetic bistable switch. In one state, Cro is produced locking out production of CI. In this state, the cell follows the lysis pathway since genes downstream of Cro produce the proteins necessary to construct new viruses and lyse the cell. In the other state, CI is produced locking out production of Cro. In this state, the cell follows the lysogeny pathway since proteins necessary to produce new viruses are not produced. Instead, proteins to insert the DNA of the phage into the host cell are produced.

What’s more, in the lysogeny state, the cell develops an immunity to further infection: the cro genes found on the DNA inserted by further infections of the virus are also shut off by molecules that are produced by the first virus to commit to lysogeny. Once a cell commits to lysogeny, it becomes very stable and does not easily change over to the lysis pathway. An induction event is necessary to cause the transition from lysogeny to lysis. For example, lysogens (i.e., cells with phage DNA integrated within their own DNA) that are exposed to UV light end up following the lysis pathway.

### 5.1 Results and discussion

We applied our reduction algorithm to a Kappa model of the phage decision circuit that we built using the reaction-based model presented in [36], [32].

#### Intermediate tests

Intermediate tests were carried out on the portion of the circuit that is involved in CI production from the he promoter and in CII production from the promoter. Initially, this model contained 10 rules, 5 proteins and 4 species; after applying the reduction algorithm, it was reduced to 4 rules and 3 proteins.

#### Full model

Simulations were carried out on the complete chemical reaction genetic circuit model which contains 96 rules, 16 proteins and 61 species (the contact map is shown in Fig. 7). After applying the reduction, the Kappa model is reduced to 11 rules and 5 proteins.

In Fig. 3, we plot the mean for the CI copy number obtained from runs of the original and of the reduced model, and the graphs show agreement.

In Fig. 4, we compared the probability of lysogeny before and after the reduction of the model (lysogeny profile is detected if there are molecules of CI before there are molecules of Cro). The graphs show overall agreement in predicting the lysogeny profile. More precisely, for two and less MOI’s (multiplicities of infection), the probability of lysogeny is almost negligible; For three MOIs, both graphs show that lysogeny and lysis are equally probable (the reduced model reports slightly larger probability), and for five or more MOI’s, both graphs show that lysogeny is highly probable. While one simulation of the original model takes about 40 mins, one simulation of the abstracted model takes about 5 seconds. Once again, the results are similar, with a significant improvement in simulation speed.

## 6 Conclusion and future work

In this paper, we presented a method for automated reduction of rule-based models. The reduction is based on equilibrium approximations: certain rules and species are eliminated and the rates are approximated accordingly. More concretely, a number of reaction patterns known to be amenable to equilibrium approximations are recognised by static inspection of the rules. The crucial aspect of the presented approach is that each approximation step can be retrieved at any time, and no information about the original, detailed model is lost. The presented method can be seen as the first step towards a systematic time-scale separation of stochastic rule-based models. The guarantees of the presented reduction method are given for the asymptotic behaviour. Bhattacharyya distance is proposed as a metric to quantify the reduction error with respect to the observable. We plan to further investigate how to practically access the approximation error. To this end, the error can be measured with respect to a given observable, or, more generally, with respect to a given property specified in, for example, linear temporal logic (LTL).

We implemented the tool and evaluated it on a case study of a lambda phage bistable switch. The simulation of one cell cycle was improved from CPU time to , and the profiles of the observables show agreement. We plan to extend the set of approximation patterns so to obtain good reductions for complex models of signaling pathways. More precisely, while our tool is applicable to any rule-based model, the chosen set of approximation patterns are tailored for GRNs and may not provide significant reductions when applied to the signaling pathways. To illustrate this, we applied the reduction to the EGF/insuling crosstalk model, and we observe that the number of dimerisation events does not take the significant portion of all events (see Fig. 6), at least not as radically as it was the case with the lambda phage example. To this end, we plan to include more patterns for reducing signaling pathways, by, for example, approximating multiple phosphorylation events.

## Acknowledgements

The authors would like to thank to Jérôme Feret, for the inspiring discussions and useful suggestions.

\balancecolumns

### References

1. A. Arkin, J. Ross, and H. H. McAdams. Stochastic kinetic analysis of developmental pathway bifurcation in phage -infected escherichia coli cells. Genetics, 149(4):1633–1648, 1998.
2. E. Bartocci, L. Bortolussi, and L. Nenzi. A temporal logic approach to modular design of synthetic biological circuits. In Computational Methods in Systems Biology, pages 164–177. Springer, 2013.
3. A. Beica and T. Petrov. Kared, 2014.
4. M. L. Blinov, J. R. Faeder, B. Goldstein, and W. S. Hlavacek. Bionetgen: software for rule-based modeling of signal transduction based on the interactions of molecular domains. Bioinformatics, 20(17):3289–3291, 2004.
5. N. M. Borisov, N. I. Markevich, J. B. Hoek, and B. N. Kholodenko. Signaling through receptors and scaffolds: independent interactions reduce combinatorial complexity. Biophysical journal, 89(2):951–966, 2005.
6. J. R. Burch, E. M. Clarke, K. L. McMillan, D. L. Dill, and L.-J. Hwang. Symbolic model checking: states and beyond. Information and computation, 98(2):142–170, 1992.
7. P. Cousot. Abstract interpretation based formal methods and future challenges. In Informatics - 10 Years Back. 10 Years Ahead., pages 138–156, London, UK, 2001. Springer-Verlag.
8. G. Craciun and M. Feinberg. Multiple equilibria in complex chemical reaction networks: Ii. the species-reaction graph. SIAM Journal on Applied Mathematics, 66(4):1321–1338, 2006.
9. A. Crudu, A. Debussche, and O. Radulescu. Hybrid stochastic simplifications for multiscale gene networks. BMC systems biology, 3(1):89, 2009.
10. V. Danos, J. Feret, W. Fontana, R. Harmer, and J. Krivine. Rule-based modelling of cellular signalling. In CONCUR 2007–Concurrency Theory, pages 17–41. Springer, 2007.
11. V. Danos, J. Feret, W. Fontana, and J. Krivine. Scalable simulation of cellular signaling networks, invited paper. In Z. Shao, editor, Proceedings of the Fifth Asian Symposium on Programming Systems, APLAS’2007, Singapore, volume 4807 of Lecture Notes in Computer Science, pages 139–157, Singapore, 29 November – 1 December 2007. Springer, Berlin, Germany.
12. V. Danos, J. Feret, W. Fontana, and J. Krivine. Abstract interpretation of cellular signalling networks. Lecture Notes in Computer Science, 4905:83–97, 2008.
13. V. Danos, J. Feret, W. Fontana, and J. Krivine. Abstract interpretation of reachable complexes in biological signalling networks. In Proceedings of the 9th International Conference on Verification, Model Checking and Abstract Interpretation (VMCAI’08), volume 4905, pages 42–58, 2008.
14. T. A. Darden. A pseudo-steady-state approximation for stochastic chemical kinetics. PhD thesis, University of California, Berkeley, 1979.
15. D. Del Vecchio. Design and analysis of an activator-repressor clock in e. coli. In American Control Conference, 2007. ACC’07, pages 1589–1594. IEEE, 2007.
16. J. Feret, V. Danos, J. Krivine, R. Harmer, and W. Fontana. Internal coarse-graining of molecular systems. Proceedings of the National Academy of Sciences, 106(16):6453–6458, April 2009.
17. J. Feret, T. Henzinger, H. Koeppl, and T. Petrov. Lumpability abstractions of rule-based systems. Theor. Comput. Sci., 431:137–164, May 2012.
18. J. Feret and J. Krivine. Kasim: a simulator for kappa, 2008-2013.
19. A. Ganguly, T. Petrov, and H. Koeppl. Markov chain aggregation and its applications to combinatorial reaction networks. Journal of mathematical biology, November 2013.
20. T. S. Gardner, C. R. Cantor, and J. J. Collins. Construction of a genetic toggle switch in escherichia coli. Nature, 403(6767):339–342, 2000.
21. D. Gillespie. Exact stochastic simulation of coupled chemical reactions. Journal of Physical Chemistry, 81:2340–2361, 1977.
22. D. T. Gillespie, Y. Cao, K. R. Sanft, and L. R. Petzold. The subtle business of model reduction for stochastic chemical kinetics. The Journal of chemical physics, 130(6):064103, 2009.
23. A. N. Gorban and O. Radulescu. Dynamical robustness of biological networks with hierarchical distribution of time scales. arXiv preprint q-bio/0701020, 2007.
24. C. C. Guet, M. B. Elowitz, W. Hsing, and S. Leibler. Combinatorial synthesis of genetic networks. Science, 296(5572):1466–1470, 2002.
25. E. L. Haseltine and J. B. Rawlings. Approximate simulation of coupled fast and slow reactions for stochastic chemical kinetics. The Journal of chemical physics, 117(15):6959–6969, 2002.
26. H.Conzelmann, J.Saez-Rodriguez, T.Sauter, B.N.Kholodenko, and E. Gilles. A domain-oriented approach to the reduction of combinatorial complexity in signal transduction networks. BMC Bioinformatics, 7, 2006.
27. T. A. Henzinger, M. Mateescu, and V. Wolf. Sliding window abstraction for infinite Markov chains. In CAV, pages 337–352, 2009.
28. B. Hepp, A. Gupta, and M. Khammash. Adaptive hybrid simulations for multiscale stochastic reaction networks. arXiv preprint arXiv:1402.3523, 2014.
29. J. S. Hogg, L. A. Harris, L. J. Stover, N. S. Nair, and J. R. Faeder. Exact hybrid particle/population simulation of rule-based models of biochemical systems. PLoS computational biology, 10(4):e1003544, 2014.
30. H.-W. Kang, T. G. Kurtz, et al. Separation of time-scales and model reduction for stochastic reaction networks. The Annals of Applied Probability, 23(2):529–583, 2013.
31. T. G. Kurtz. Limit theorems for sequences of jump Markov processes approximating ordinary differential processes. Journal of Applied Probability, 8(2):344–356, 1971.
32. H. Kuwahara, C. J. Myers, M. S. Samoilov, N. A. Barker, and A. P. Arkin. Automated abstraction methodology for genetic regulatory networks. In Transactions on computational systems biology VI, pages 150–175. Springer, 2006.
33. R. Kwok. Five hard truths for synthetic biology. Nature, 463(7279):288–290, 2010.
34. B. Munsky and M. Khammash. The finite state projection algorithm for the solution of the chemical master equation. The Journal of chemical physics, 124:044104, 2006.
35. J. D. Murray. Mathematical biology i: An introduction, vol. 17 of interdisciplinary applied mathematics, 2002.
36. C. J. Myers. Engineering genetic circuits. CRC Press, 2011.
37. M. Ptashne. A genetic switch: Gene control and phage lambda. 1986.
38. M. A. Ptashne. Genes & signals. 2001.
39. C. V. Rao and A. P. Arkin. Stochastic chemical kinetics and the quasi-steady-state assumption: application to the gillespie algorithm. The Journal of chemical physics, 118(11):4999–5010, 2003.
40. K. Sanft, D. Gillespie, and L. Petzold. Legitimacy of the stochastic michaelis-menten approximation. Systems Biology, IET, 5(1):58–69, 2011.
You are adding the first comment!
How to quickly get a good reply:
• Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
• Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
• Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters