A The full Bio-PEPA model

# Complementary approaches to understanding the plant circadian clock

## Abstract

Circadian clocks are oscillatory genetic networks that help organisms adapt to the 24-hour day/night cycle. The clock of the green alga Ostreococcus tauri is the simplest plant clock discovered so far. Its many advantages as an experimental system facilitate the testing of computational predictions.

We present a model of the Ostreococcus clock in the stochastic process algebra Bio-PEPA and exploit its mapping to different analysis techniques, such as ordinary differential equations, stochastic simulation algorithms and model-checking. The small number of molecules reported for this system tests the limits of the continuous approximation underlying differential equations. We investigate the difference between continuous-deterministic and discrete-stochastic approaches. Stochastic simulation and model-checking allow us to formulate new hypotheses on the system behaviour, such as the presence of self-sustained oscillations in single cells under constant light conditions.

We investigate how to model the timing of dawn and dusk in the context of model-checking, which we use to compute how the probability distributions of key biochemical species change over time. These show that the relative variation in expression level is smallest at the time of peak expression, making peak time an optimal experimental phase marker. Building on these analyses, we use approaches from evolutionary systems biology to investigate how changes in the rate of mRNA degradation impacts the phase of a key protein likely to affect fitness. We explore how robust this circadian clock is towards such potential mutational changes in its underlying biochemistry. Our work shows that multiple approaches lead to a more complete understanding of the clock.

## 1 Introduction

The daily cycles in sunlight, temperature and other environmental parameters are highly important to most organisms. To follow and anticipate these cycles, living cells generate biochemical rhythms with a period of approximately 24 hours (circadian). The majority of the known circadian clocks, including those in eukaryotes, are based on one or more interlocking transcriptional feedback loops between a set of key genes. Crucial to the function of the clock is its ability to entrain to environmental signals (i.e. to adjust its internal rhythm by synchronising with external cycles), so that the phase of gene expression is maintained under changes to the length of the day (photoperiod). Such entrainment acts through various photoreceptor pathways, where light affects kinetic parameters of the core clock. In addition to circadian entrainment, a defining feature of circadian clocks is that they exhibit continued oscillations in constant light conditions [?].

The circadian clocks of many organisms are organised around complex feedback loop architectures, making the determination of design principles a challenging computational problem. Although research has revealed much about the clock of the foremost model plant organism, Arabidopsis thaliana, there are still unidentified components and inconsistencies between computational models and experimental observations [?]. For this reason, to increase our mechanistic understanding of circadian clocks, it is desirable to investigate simpler systems that possess functional similarity with more complex networks. The circadian clock of the green alga Ostreococcus tauri [?] is the simplest plant clock discovered so far, and is thus an ideal model system for understanding plant circadian function with the help of experiments, simulations and theory. A quantitative model describing the biochemical reactions of the Ostreococcus clock can serve as a focal point for this research, yielding a low-dimensional test system for various mathematical analysis techniques.

Bio-PEPA [?] is a stochastic process algebra specifically defined to model and analyse biochemical systems. Exploiting the defined formal mappings of Bio-PEPA models into a number of equivalent representations, it is possible to analyse Bio-PEPA models using different mathematical and computational techniques, including ordinary differential equations (ODEs), stochastic simulation algorithms (SSAs) and model-checking.

In previous work we used both ODEs and SSAs to model the clock of the fungus Neurospora crassa, demonstrating that combining different analysis methods is important for fully quantifying the relationship between feedback architecture and circadian behaviour [?]. Here we build on this approach, applying a broader range of computational techniques to the Ostreococcus clock. We develop and analyse a Bio-PEPA model of the clock, focusing on various stochastic methods which are the most appropriate in this case due to the low copy numbers characteristic of the system. In particular, we exploit the automatic generation of PRISM models from Bio-PEPA to carry out a novel application of the PRISM model-checker [?] to a circadian model, computing time-dependent probability distributions for the clock components. We use the model to quantify the variability and robustness of the clock’s functional behaviour with respect to the following factors: (i) internal stochastic noise, the inevitable consequence of a system comprising a small number of molecules; (ii) environmental changes, such as photoperiod variations and transitions between constant light/darkness; and (iii) mutational changes that affect the biochemical reaction rates of our model, representing perturbations to the system that occur on an evolutionary timescale.

The rest of our paper is structured as follows. After an overview of Bio-PEPA in Section 2, the Ostreococcus clock is introduced in Section 3, followed by the description of the corresponding Bio-PEPA model. In Section 4 we analyse the model using various approaches. We first use stochastic simulation to investigate how different light conditions affect the oscillations of the clock. We then explore approaches for modelling light entrainment in a continuous-time Markov chain (CTMC) before using model-checking to compute the time-dependent probability distributions of protein levels. This enables us to identify the phase markers that are most robust to stochastic fluctuations. Finally we use ideas from a recently developed framework for evolutionary systems biology [?] to test how mutational changes in mRNA degradation rate affect the phase of oscillations in comparison to the inherent stochastic noise that is present at the individual cell level. The full Bio-PEPA model is given in Appendix A.

## 2 An overview of Bio-PEPA

Bio-PEPA [?] is a stochastic process algebra, recently developed for the modelling and analysis of biological systems. We give here a brief overview of the main features of the language. For a detailed presentation of its syntax and semantics, see [?].

The main components of a Bio-PEPA system are the species components, describing the behaviour of each species, and the model component, specifying all interactions and initial amounts of species. The syntax of Bio-PEPA components is given by:

 \vspace∗−.5emS::=(α,κ) {op} S∣S+S∣Cwith % op=\reactant∣\product∣\activator∣\inhibitor∣\modifierP::=P\syncsIP∣S(x)

where is the species component and is the model component. In the prefix term , is the stoichiometry coefficient of species in reaction , and the prefix combinatorop” represents the role of in the reaction. Specifically, indicates a reactant, a product, an activator, an inhibitor and a generic modifier. The notation is a shorthand for when . The operator “” expresses a choice between possible actions, and the constant is defined by an equation . The process denotes synchronisation between components and ; the set determines the activities on which the operands are forced to synchronise, with denoting a synchronisation on all common action types. In the model component , the parameter represents the initial number of molecules present. In addition to species and model components, a Bio-PEPA system consists of kinetic rates, parameters and, if needed, locations, events and other auxiliary information for the species.

The formal representation offered by Bio-PEPA allows for different kinds of analysis through the defined mapping into continuous-deterministic and discrete-stochastic analysis methods (see [?] for details). More on Bio-PEPA can be found at [?], including two software tools, the Bio-PEPA Eclipse Plug-in and the Bio-PEPA Workbench [?]. Both tools process Bio-PEPA models automatically and either compute time-series results directly using various SSA or ODE solvers, or generate representations that can be used by other tools.

## 3 The Ostreococcus clock

Ostreococcus tauri is an exceptionally small green alga with a highly reduced genome [?]. Experiments and homology searches indicate that its circadian clock is very simple compared to higher plants, such as Arabidopsis thaliana. Only a handful of the clock genes identified in other plants have been found in Ostreococcus, and only two of these appear to be central to the clock. The first of these, which we refer to as TOC1, is homologous to Arabidopsis TOC1 (TIMING OF CAB EXPRESSION 1) and other PRRs (PSEUDO RESPONSE REGULATORs). The other gene, here called LHY, is homologous to Arabidopsis LHY (LATE ELONGATED HYPOCOTYL) and CCA1 (CIRCADIAN CLOCK ASSOCIATED 1[?]. An ODE model of the Ostreococcus clock as a negative feedback loop between these two genes was introduced in [?], where it was applied to drug treatments and other perturbations. The full model includes details of the luciferase assay used to measure mRNA and protein levels, but here we use only the central parts of the model, which describe the dynamics of the native mRNAs and proteins. The model is illustrated in Figure 1.

TOC1 transcription requires light, which is buffered by a “light accumulator” (acc) and is inhibited by the presence of nucleic LHY protein (LHY_n) for most of the day. TOC1 activates LHY transcription through an unknown mechanism, proposed in [?] to work as follows: TOC1 mRNA is translated into inactive TOC1 protein (TOC1_i), which is activated slowly during the day but quickly after dusk. The active form (TOC1_a) drives LHY transcription throughout the night but is quickly degraded after dawn. LHY mRNA is translated into cytosolic LHY (LHY_c), which is quickly translocated to the nucleus, thereby closing the feedback loop. Light also accelerates the rate of LHY degradation.

The model parameters were estimated by fitting simulated time-courses to equivalent data obtained from experiments over a wide range of light conditions. Some experiments alternated 12 hours of light and dark (denoted LD 12:12), others used longer or shorter days (such as LD 16:8 or 8:16), and many included transitions between different conditions, often into constant light (LL) [?].

### 3.1 A Bio-PEPA model of the clock

A model of the clock as described above was implemented in Bio-PEPA. Here we describe its main features; for the full model, including kinetic laws and parameters, see Appendix A.

One of the key issues involved in obtaining a realistic stochastic model is the correct scaling of the initial concentrations and kinetic parameters of the continuous ODE model in [?] so as to obtain the respective molecule counts and rate constants for the Bio-PEPA discrete-state model. Since the absolute values of the initial concentrations are not known, the initial values in the original ODE model are given in arbitrary relative units. However, the peak number of TOC1 and LHY protein molecules was estimated experimentally over a number of free-running cycles in LL conditions using a TopCount luminometer. From this, approximate initial values for our discrete-state model were computed, yielding a rough estimate for the scaling factor of . After such rescaling the Bio-PEPA model can be analysed by ODEs and SSAs, which both give results in molecule counts.

The proteins and mRNAs shown in Figure 1 are modelled as the following Bio-PEPA species components that describe the possible reactions they can participate in and how their amounts are affected by the occurrence of each reaction. Reactions are associated with functional rates representing the corresponding kinetic law.
TOC1_mRNA LHY_mRNA TOC1_i LHY_c TOC1_a LHY_n acc

For instance, the transcription of TOC1_mRNA is modelled by reaction , which involves three different species (TOC1_mRNA, LHY_n, and acc), and is positively regulated by the light-accumulator acc and negatively regulated by LHY_n. The kinetic law for this reaction is given by a Hill function, commonly used for describing transcription in clock models [?, ?, ?, ?]:

 \vspace∗−.1exΩ⋅tmp_toc1_transcription1+tmp_toc1_transcription+(R_toc1_lhyΩ⋅LHY_n)H_toc1_lhy.

Here, species names represent molecule counts, and , , and are parameters. For comparisons with experiments we also defined the observables and .

Bio-PEPA functional rates allow the definition of general kinetic laws. We use this facility to represent the entrainment of the system to light/dark cycles through the time-dependent function below:

 \vspace∗−.1exlight_time=H(((time−24⋅⌊time24⌋)−tdawn)⋅(tdusk−(time−24⋅⌊time24⌋))).

This allows us to model light-dependent reaction rates by returning the value 1 in day-time and 0 during night-time. The parameters and give the time of the day (in hours) at which dawn and dusk occur, respectively; is the Heaviside step function that returns 1 for and 0 otherwise.

## 4 Analysis methods and results

Each technique that can be used to analyse Bio-PEPA models has its particular strengths: ordinary differential equations (ODEs) easily predict mean values and quantify dynamical changes in terms of bifurcations, stochastic simulation algorithms (SSAs) allow variability in the system’s responses to be measured, and model-checking enables complex queries about the model to be formulated and verified automatically. Here we analyse the clock model using these three analysis methods. After briefly describing each method, we explain why it is better suited for investigating a particular aspect of the system, and report some of the results obtained.

### 4.1 Stochastic simulation: population versus single cell behaviour

Following the formulation of Gillespie’s stochastic simulation algorithm [?], the stochastic analysis of biochemical systems has received increasing attention due to the impact that stochastic variability can have on system behaviour. This is particularly relevant for systems such as gene regulatory networks, where some molecules are present in copy numbers so small that random fluctuations are too large for the continuous approximation behind ODEs to be justified.

Within this framework, a single molecule-by-molecule stochastic simulation run can be viewed as a faithful representation of behaviour at the cellular level (assuming the underlying model is accurate). Observing the mean behaviour over a larger number of runs is then equivalent to observing a population of cells. Most current experimental techniques only allow population-level assays. However, as progress in high-resolution imaging techniques reduces the minimum population size that can be measured, it will also become possible to consider the effect of stochastic noise, which is expected to be more evident in smaller populations.

In the rest of this section we report results obtained by solving the clock model using the Dormand–Prince ODE solver and the Gibson–Bruck SSA, both available in the Bio-PEPA Eclipse Plug-in [?]. We consider three different light conditions: constant dark (DD), constant light (LL), and alternating light/dark cycles (LD). We also consider an experiment in which the system is transferred from constant light into constant dark (LL-DD). For each of these, we compare results obtained by numerical integration of the deterministic model with those obtained by stochastic simulation. The initial conditions are those of the original model at dawn following entrainment to 24 hour light/dark cycles (LD 12:12). Figures 2 and 4 report the computed time-series behaviours for all settings.

The species of interest are TOC1_mRNA, LHY_mRNA and the corresponding experimentally observable total protein amounts (Total_TOC1 and Total_LHY as defined in Section 3.1).

#### DD system.

The rapid damping behaviour of the system in constant dark (DD) is shown in Figure ac. This damping is seen in all analysis methods: ODEs, individual SSA runs, and the mean stochastic behaviour calculated over 10000 independent SSA runs. Despite the fact that the self-sustained oscillations observed by averaging over the SSA runs stop very quickly (within 1-2 days), when looking at individual SSA runs we note that occasionally a non-zero number of molecules can be briefly observed, even after several circadian cycles (Figure c).

#### LL system.

In constant light (LL), the deterministic system also exhibits damped oscillations, with all species tending to non-zero constant values after about 7-8 days (Figure d). Similar behaviour is observed by averaging over 10000 SSA runs (Figure e), though the oscillations damp more rapidly and the steady-state value is slightly different (e.g. the LHY copy number is about 130 in ODEs compared to about 160 in the SSA). However, a 10-fold increase of the scaling factor to yields a perfect quantitative agreement between the SSA and ODEs (see Figure 8 in Appendix C). Although the precise source of the discrepancy is not known at present, we hypothesise that it is caused by a breakdown of the continuous approximation underpinning ODEs, due to the low copy numbers obtained at small values.

The other, most notable, difference between the deterministic and stochastic models is the behaviour of individual SSA runs (Figure f), for which persistent irregular oscillations (in both phase and amplitude) are observed. Because of phase diffusion effects, however, these oscillations cannot be detected in the mean behaviour: hence, in this case, neither the simple average over multiple SSA runs nor the ODE solution gives us a correct indication of the real behaviour of the system, emphasising the importance of observing single realisations.

In view of these findings, we hypothesise that single cell experimental data may exhibit sustained oscillations, whereas the behaviour observed in a large population of cells would be closer to the rapid damping reproduced by both the ODE and SSA average. This is due to the fact that free-running oscillations are unlikely to be synchronised over different cells. Furthermore, visual inspection of the solutions obtained by averaging over different numbers of SSA runs suggests that it should be possible to discern the stochastic effects with a population of around 100 cells. The experimental data currently available for this system are at the level of a population comprising at least 10000 cells. However, we anticipate that the development of new experimental techniques for measuring gene expression in single cells or small populations will enable this hypothesis to tested experimentally in the not-too-distant future.

#### LL-DD system.

As an additional experiment, we consider a system which is kept in constant light (LL) for 160 hours, and then transferred into constant dark (DD). The time-series results are reported in Figures gi. It can be seen that single realisations of the SSA — approximating the behaviour of individual cells — exhibit immediate cessation of self-sustained oscillations following the LL-DD transfer. This behaviour can be understood by considering the fixed points of the deterministic model.

The LL fixed point is located far from the origin in phase space and is of the stable focus type. Trajectories of the ODE — approximating the behaviour of a large population of cells — spiral around the fixed point as they converge to it, producing slowly damping oscillations. In the corresponding stochastic model, fluctuations kick individual realisations of the system between these spiralling trajectories, thereby preventing the system from remaining close to the fixed point for long periods (see Figure 3). This leads to the irregular self-sustained oscillations observed. By contrast, the DD fixed point of the ODE system is located at the origin. As species concentrations must be positive, the fixed point cannot be a stable focus, and is instead a stable node. Trajectories of the ODE converge directly onto it, generating oscillations that quickly damp to zero. Individual realisations of the stochastic model are thus repeatedly perturbed between rapidly convergent trajectories. Consequently, they quickly approach the DD steady-state following the LL-DD transition, remaining in its vicinity thereafter (see Figure 3).

The model thus predicts that the DD behaviour of the LL-DD system at the single-cell level mirrors that at the population level. If further experiments were to reveal that self-sustained oscillations are observed during the DD phase, this would indicate that the model requires modification to convert the DD fixed point into a stable focus bounded away from the origin.

#### LD system.

The light conditions considered so far are experimental settings useful for observing the system’s endogenous dynamics. It is also informative, however, to observe the behaviour of the clock under natural conditions (alternating 24-hour cycles of light and dark). We present results obtained for three different photoperiods: 6 hours light/18 hours dark (LD 6:18), 12 hours light/12 hours dark (LD 12:12), and 18 hours light/6 hours dark (LD 18:6).

As described in Section 3, exposure to periodic external stimuli such as light/dark cycles has the effect of resetting the free-running oscillations observed in constant light, so as to establish stable phase relationships with the forcing stimulus. Compared with the free-running LL system, the entrainment to LD cycles regularises the dynamics of the system, markedly reducing the variability of oscillations, particularly in terms of phase. As a consequence, persistent regular oscillations with a stable phase relationship to the light/dark cycle are observed in both ODEs (Figures a, d, g) and when averaging over multiple SSA runs (Figures b, e, h). This phase regularisation can also be seen in individual SSA runs of the entrained system (compare Figures c, f, i) with the simulations of the free-running clock in Figure f). These observations are consistent with previous stochastic analyses of clock models [?, ?]. We also note that, as for the LL system, the deterministic and mean stochastic behaviour, whilst very similar, are not in perfect agreement.

### 4.2 Model-checking: time-dependent probability distributions

Model-checking [?] is a formal verification method that allows modellers to state properties of a given model and to automatically check whether they are met. Probabilistic model-checking (see, for instance, [?, ?]) adds probabilistic measures in the evaluation of queries. Recently, model-checking has grown in popularity within the field of systems biology due to its ability to directly answer complex questions on a model’s behaviour. Traditional model-checking verification differs from simulation-based analysis in that the verification of a property is obtained from a computation over the entire state-space of the continuous-time Markov chain (CTMC) underlying the model. The major drawback of this approach is the state-space explosion problem: the model’s dimension is often too large for computational viability.

Statistical model-checking (see, for instance, [?]) is an alternative query-based approach: it estimates the probability distributions and computes approximate results of queries (together with an estimate of the error) by generating random realisations of the CTMC and averaging the results obtained by evaluating the queries on each of them. The advantage of statistical model-checking over exact verification approaches is that it does not need to build the explicit state-space of the model, which is often intractable, and it does not rely on the transient solution of the CTMC. In essence, statistical model-checking is a verification technique which allows modellers to perform additional analyses of a stochastic system by automatically evaluating queries over multiple simulation traces. The obvious drawback of statistical model-checking is that it only considers a finite number of behaviours of the system (i.e. paths in the CTMC) and, hence, the accuracy of the results is strongly related to that number. However, exact verification of biological systems is generally infeasible, and statistical model-checking often represents a good practical solution. Another issue of probabilistic model-checking is that the transient solution of the CTMC can incur the same averaging effect discussed previously relating to ODEs and mean SSA behaviour: computing the expected value of the model variables might not be sufficient because this would be exactly the same as the deterministic behaviour. Results of reward-based properties, for instance, are computed in terms of expected values and this, especially in the case of oscillations which are out of phase, does not give satisfactory results.

PRISM [?] is a probabilistic model-checker, which can be used to verify properties of a CTMC model. It also includes a discrete-event simulator for statistical model-checking. PRISM has been used to analyse systems from a wide range of application domains, and recently also biochemical systems [?]. Models are described using the state-based PRISM language, and it is possible to specify quantitative properties of the system using a property specification language which includes the temporal logic CSL (Continuous Stochastic Logic) [?, ?].

Using the Bio-PEPA Workbench [?], we generated a PRISM model of the clock (together with a set of reward structures and some standard CSL properties which are automatically generated). In the PRISM model, one module is defined for each species, and module local variables are used to record the current quantity of each species. The transitions correspond to the activities of the Bio-PEPA model and the updates take the stoichiometry into account. Transition rates are specified in an auxiliary module which defines the functional rates corresponding to all the reactions. In order to have a finite CTMC, lower and upper bounds are defined for each variable. In the following, we focus on statistical model-checking: in this case, the choice of the values for the bounds has no effect on the performance of the analysis (since the CTMC is not built) and so we can use arbitrarily high values that are guaranteed not to be reached.

#### Modelling the light in PRISM.

In order to model the entrained clock (LD system), time-dependent events must be represented. However, because of the intrinsic nature of model-checking algorithms, which involve the numerical solution of the CTMC underlying stochastic models, deterministic events and time-dependent functions cannot be explicitly specified in PRISM.

We investigated several approaches to address this problem. A first possibility is to split the model-checking algorithm into different (two or more, depending on the time window we are interested in) analysis steps over different time intervals, each with constant light conditions: two different CTMCs would be considered (one for the day-time system and one for the night-time one) with the algorithm switching back and forth between them. The main issue with this approach is how to merge the results obtained over the different time periods. For some particular queries, such as those relating to reachability, this can be done by splitting them into a number of subqueries such that the result (i.e. probability distributions) of one query can be used as the initial state for the next one. For an arbitrary CSL query, however, this cannot be done, and this strongly limits the kind of queries that could be verified.

The alternative approach we consider here is to represent the light by approximating time using a monotonically increasing stochastic variable. The main issue of this approach is that we introduce an additional stochastic effect which is absent in the system we have described so far (i.e. where light is modelled as a deterministic on/off switch). In practice this does not matter, provided that the stochastic variability introduced is kept smaller than the variability of any experiments the model may be compared to. The introduction of an additional variable to model time also causes an increase in the state-space, but this is not an issue here since we focus on statistical model-checking only. An extract from the PRISM model showing how we model time and the day/night switch is provided in Appendix B.

#### Time-dependent probability distributions of protein levels.

Using model-checking we can compute the time-dependent probability distributions for each of the model species. For instance, by verifying the CSL property

 \Prob\Eventually[T,T](LHY_c+LHY_n=i)

for time instant and protein level , we can observe how the probability distribution for LHY protein changes over time during the first 96 hours of simulation.

Each of the plots in Figure 5 refers to a different light condition (DD, LL, and LD 12:12), showing how the probability of being at a particular level changes over time in each case. The plots also report the mean value and the standard deviation of LHY expression. In all cases, the initial amount is , and then the probability mass gradually moves away from this initial value. In DD, as expected from the results of Section 4.1, the bulk of the probability mass rapidly moves close to zero. In LL we can clearly observe the effect of phase diffusion, resulting in a probability distribution spread almost equally across a wide range of values. By contrast, in LD we are able to observe clear oscillations in the probability distribution; we also note that the amplitude of peak expression is more variable than that of trough expression, with a much broader spread of the probability mass around the mean.

In the following, we focus on the case of alternating light/dark cycles (LD 12:12). In Figure 6(a) we report the probability distribution for LHY protein, together with its mean and standard deviation in a single 24-hour cycle (from 120 to 144 hours). Figure 6(b) plots the oscillation in the corresponding coefficient of variation . This provides a normalised measure of the sensitivity of the LHY protein oscillation to stochastic fluctuations as a function of circadian time. Small values of , therefore, correspond to robust phase markers (a high signal-to-noise ratio) and large values poor phase markers (a low-signal-to-noise ratio). Commonly used phase measures in circadian research are the times of peak and trough expression together with the time at which the oscillation falls to its midpoint level. It can be seen in Figure 6(b) that the coefficient of variation is minimal around the peak, suggesting that the latter is the best of the standard phase markers for analysing experimental LHY data.

As discussed in Section 4.1, for the DD system, both the deterministic solution and SSA average quickly attain a constant value. Species amounts can, however, be greater than zero for short time intervals in individual SSA runs (see Figure c). The following CSL property computes the probability that the total LHY level remains in the range between 96 and 500 hours.

 \vspace∗−.2ex\Prob\Globally[96,500](LHY_c+LHY_n≤0+e)

The results reported in Table 1 quantify the observed trend, showing that there is a small probability that LHY exceeds for , and that this probability decreases with increasing .

### 4.3 Distribution of Mutational Effects: robustness analysis

Thus far we have investigated aspects of the inherent random noise caused by the small size of the system and the discrete nature of its components. We have also explored the consequences of variations in the light environment. We now consider the impact of mutational noise caused by DNA changes that alter reaction rates by affecting the structure of corresponding proteins. This analysis differs from the above in that it requires simulation of vast numbers of different parameter combinations. We do this in the context of a recently developed framework for evolutionary systems biology that describes how the effects of DNA changes propagate through various levels of organisation and abstraction until they impact fitness at the highest level of biological functionality [?]. We limit our analysis to what has been described as the “third” level of the adaptive landscape [?], which maps a combination of biochemical reaction rates to a computable system-level property likely to affect fitness via a quantitative mechanistic model.

We generated a StochKit [?] model using the Bio-PEPA Workbench [?] in order to build on the code base of an analysis framework that has previously been used to investigate the evolutionary systems biology of a different, highly simplified circadian clock system that lacks entrainment [?]. We added a class for computing phase in particularly noisy circadian clocks, using two thresholds to block stochastic noise from generating artificially short ‘cycles’ with very low amplitude. Thresholds were set at 20% and 35% of the distance between the highest peak and lowest trough observed over 10 days, since minima are less variable than maxima (see Figure f). We used the peak of total TOC1 as a phase marker since our model checking results showed peaks to have higher signal/noise ratios (Figure 6(b)).

We consider the parameter combination used in the rest of the paper to be the wild-type and introduce mutational noise by multiplying degradation rates of all mRNAs in the system (deg7 and deg9 in Figure 1) by a uniformly distributed factor ([0.5,1.5], where 1.0 represents the wild-type). To differentiate between internal stochastic noise and mutational noise we ran 40000 simulations in two sets, each analysing 10000 time-courses for the wild-type and 10000 for mutants. In the first set (Figure 7(a–c)), the system size corresponds to that of a single cell. The resulting noise is substantial as indicated by the large width of the distribution of observed phase values (Figure 7(a)). In the second set of simulations (Figure 7(d–f)), we increased a million fold. This results in an excellent approximation of corresponding time-courses produced by ODEs; these simulations are thus denoted as ‘mean’ here. The resulting internal stochastic noise is minimal (see the extremely narrow distribution in Figure 7(d)).

Our ‘mean’ results show how phase is affected by mutational changes in the mRNA degradation rate (Figure 7(d,e)). If the same mutational changes are introduced in the noisy single cell system, they are much more difficult to detect (see Figure 7(a,b)). As a different way of visualising results, Figure 7(c,f) plots the high-level consequences of mutations (phase) against their low-level effect (the factor affecting mRNA degradation). The resulting graphs for the ‘mean’ system show clearly how a change in mRNA degradation rate is expected to affect the mean change in phase (Figure 7(f)). Figure 7(c) shows the corresponding plot for single cell simulations. It demonstrates that a wave of TOC1 peaks starts around the time of dusk (18h in these simulations) and continues for a few hours. The time at which the wave starts is minimally affected by the mutations we investigated, in stark contrast to the variance, which is much lower for higher mRNA degradation rates. In other words, a lower rate of mRNA degradation leads to greater internal stochastic noise. This increase in variance explains why the average phase is strongly shifted towards later hours for smaller mRNA degradation rates in the ‘mean’ system (Figure 7(f)), since the mean is strongly affected by large values in skewed distributions as found in (Figure 7(c)). Taken together, these results demonstrate that a higher mRNA degradation rate will on average move the phase forward and make it more reliable (decrease the phase variance), whereas a decrease will move it backwards and make it less reliable.

These subtle patterns in the mutational robustness of the clock could not have been uncovered without running large numbers of simulations with varying parameter combinations. Such work requires a different infrastructure for data analysis from projects that analyse only a few parameter sets.

## 5 Conclusions

We have studied the circadian network of Ostreococcus tauri by developing a process algebra model of the clock, based on an existing deterministic representation that was parameterised according to quantitative experimental data. We have investigated several key aspects of the clock, such as the conditions necessary for persistent oscillations in its constituent genes and proteins, as well as the effect of different environmental and mutational changes on the phases of these oscillations. We used the Bio-PEPA stochastic process algebra as a modelling language and applied a range of the analysis methods supported. Because of the low copy numbers of the molecular species involved in the clock network, we focused on stochastic analysis methods which enable the system’s intrinsic variability to be observed.

In particular, we used stochastic simulation to explore how the clock responds to changes in the light environment, and compared the results obtained against the behaviour of the corresponding deterministic system. We predict that the qualitative behaviour of the free-running (LL) clock will be dependent on the size of the cellular population; while damped oscillations will be observed in large populations (simulated by the SSA average and the deterministic model), self-sustained oscillations may be detectable in single cells (simulated by individual runs of the SSA). Model-checking was further used to investigate how the variability of the clock’s behaviour changes over a circadian cycle. By computing the time-dependent probability distributions of the clock proteins, we identified the time of peak expression as the most robust phase marker, suggesting its use as an experimental measure.

Finally, we added mutational noise to our system by randomly changing the overall rate of mRNA degradation and observing how this affects the phase of the oscillations of a key clock protein, likely to have an impact on fitness. We found that the large amount of stochastic noise at the single cell level makes it hard to observe functional changes that may be induced by mutations, without averaging over many observations.

A number of the novel hypotheses we have formulated in this modelling study may provide new biological insights into the behaviour of the Ostreococcus clock and will hopefully inspire subsequent experimental research. In addition, further theoretical work can build on the novel model-checking results reported here, to explore additional ways in which systems biology models can be automatically analysed using approaches based on concurrency theory. Our results demonstrate that the integration of different computational techniques is critical for fully quantifying the architectural [?] and mutational [?] robustness of the circadian clock.

### Acknowledgements

The authors thank Gerben van Ooijen for TopCount data and Jane Hillston and Andrew Millar for their helpful comments. The Centre for Systems Biology at Edinburgh is a Centre for Integrative Systems Biology (CISB) funded by BBSRC and EPSRC, ref. BB/D019621/1. CT is supported by The International Human Frontier Science Program Organization.

## Appendix A The full Bio-PEPA model

Kinetic parameters:
= = = = = = = = = = = = = = = = = = =

Time of the day at which dawn and dusk occur:
= = // for the LD 12:12 system

Time-dependent function representing light in LD system:

 light_time=H(((time−24⋅⌊time24⌋)−tdawn)⋅(tdusk−(time−24⋅⌊time24⌋)))

Scaling factor:
=

Initial values:
= = = = = = =

= = = = =

Functional Rates:
: //  light accumulator increase: mass action : //  light accumulator decrease: mass action : //  TOC1 transcription: Hill kinetics : //  TOC1 degradation: mass action : //  TOC1 translation: mass action : //  TOC1 conversion: mass action : //  TOC1 mRNA degradation: mass action : //  LHY transcription: Hill kinetics : //  LHY mRNA degradation: mass action : //  LHY translation: mass action : //  LHY nuclear transport: mass action : //  LHY degradation, cytosol: mass action : //  LHY degradation, nucleus: mass action

Species components:

Species observables:
= =

Model component:

 LHY_c(LHY_c_init) \syncs∗ LHY_mRNA(LHY_mRNA_init) \syncs∗ TOC1_a(TOC1_a_init) \syncs∗
 TOC1_mRNA(TOC1_mRNAinit) \syncs∗ acc(acc_init) \syncs∗ TOC1_i(TOC1_i_init) \syncs∗ LHY_n(LHY_n_init)

## Appendix B Modelling light/dark cycles in PRISM

const int max_days_simulated = 21;

module time

min : [0..59] init 0;
hour : [0..23] init 0;
day : [0..max_days_simulated] init 0;
light_time : [0..1] init 0;

[change_min] (min < 59) -> 60: (min'=min + 1);

[change_hour_dawn] (min = 59 & hour = (tdawn-1) ) -> 60: (min'=0) & (hour' = hour + 1) & (light_time' = 1);
[change_hour_dusk] (min = 59 & hour = (tdusk-1) ) -> 60: (min'=0) & (hour' = hour + 1) & (light_time' = 0);
[change_hour] (min = 59 & hour < 23 & hour != (tdawn-1) & (hour != tdusk-1) ) ->
60: (min'=0) & (hour' = hour + 1);

[time_change_day] (min = 59 & hour = 23 & day < max_days_simulated) ->
60: (min'=0) & (hour'=0) & (day' = day + 1);
[time_end_day] (min = 59 & hour = 23 & day = max_days_simulated) -> 60: (min'=0) & (hour'=0) & (day'=0);

endmodule


## Appendix C Additional simulation and analysis results

Figure 8 shows the comparison between the ODE results and the mean SSA behaviour with scaling factors and . We observe a slight difference between the SSA results with the different scaling factors. The reference value is (estimated from experimental protein counts), and consequently the predicted biological behaviour is that shown in Figure b. Note, instead, that the ODE results (Figure a) agree perfectly with the SSA results for the larger value (Figure c).

Table 2 lists the eigenvalues of linearisation at the fixed points of the deterministic model. These determine the dynamics in the vicinity of the steady-states [1]. All eigenvalues of the DD fixed point are negative and real, identifying it as a stable node [1]. The LL fixed point retains 3 of the DD eigenvalues; the remaining ones comprise two complex conjugate pairs with negative real parts. The steady-state is therefore a stable focus [1]. The positions of the fixed points were estimated using the Nelder-Mead simplex algorithm [2], as implemented in the MATLAB routine fminsearch. Derivatives were computed analytically using the MATLAB Symbolic Math Toolbox.

## References

### References

1. J. Guckenheimer & P. Holmes (1983): Nonlinear Oscillations, Dynamical Systems and Bifurcations of Vector Fields. Springer.
2. J.C.Lagarias, J. A. Reeds, M. H. Wright & P. E. Wright (1998): Convergence properties of the Nelder-Mead simplex method in low dimensions. SIAM J. Optimiz. 9(1), pp. 112–147.
You are adding the first comment!
How to quickly get a good reply:
• Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
• Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
• Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minumum 40 characters