Modeling metabolic networks including gene expression and uncertainties

Sponsored by the German Federal Research Ministry (BMBF), Fkz. 031L0017A


H. Lindhorst henning.lindhorst@ovgu.de S. Lucia sergio.lucia@tuberlin.de R. Findeisen rolf.findeisen@ovgu.de S. Waldherr steffen.waldherr@kuleuven.be Otto-von-Guericke-University Magdeburg, Institute for Automation Engineering, Universitätsplatz 2, D-39112 Magdeburg, Germany Technische Universität Berlin, Laboratory for Internet of Things and Smart Buildings. Ernst-Reuter-Platz 7, D-10587 Berlin, Germany KU Leuven, Department of Chemical Engineering, Celestijnenlaan 200F bus 2424, 3001 Heverlee, Belgium
Abstract

Constraint-based methods, such as Flux Balance Analysis (FBA), are widely used to model cellular growth processes without relying on extensive information on the regulatory features. The regulation is instead substituted by an optimization problem, usually aiming at maximal biomass accumulation. A recent extension of these methods, the dynamic enzyme-cost Flux Balance Analysis (deFBA), is a fully dynamic modeling method allowing for the prediction of necessary enzyme levels under changing environmental conditions. However, this method was designed for deterministic settings in which all dynamics, parameters, etc. are exactly known. In this work, we present a theoretical framework extending the deFBA to handle uncertainties and provide a robust solution. We use ideas from multi-stage nonlinear Model Predictive Control (MPC) and its ability to represent the evolution of uncertainties by an exponentially growing scenario tree. While this representation yields a deterministic optimization problem in the presence of uncertainties, its computational cost also grows exponentially. We counter this by using a receding prediction horizon and reshape the standard deFBA into the short-term deFBA (sdeFBA). This leads us, along with a further simplification of the scenario tree, to the robust deFBA (rdeFBA). This framework is capable of handling uncertainties in the model itself as well as uncertainties experienced by the modeled system. We apply these algorithms to two case studies: a minimal enzymatic nutrient uptake network, and an abstraction of the core metabolic process in bacteria.

keywords:
Flux Balance Analysis, Robust Optimization, Predictive control
journal: ArXiv

1 Introduction

Bacteria encounter variations in their natural habitats and have developed complex regulatory mechanisms to cope with them. Without full knowledge of these mechanisms, we usually rely on optimization-based methods to predict the behavior of the bacteria. The most prominent approach utilizing this concept is Flux Balance Analysis (FBA) varma1994metabolic (). FBA optimizes the fluxes at steady state such that a biomass-producing flux is maximized. This simple approach has led to a number of extensions, such as the iterative FBA methods presented in varma1994stoichiometric () and mahadevan2002dynamic (). These calculate optimal fluxes for each iterative step with the standard FBA and update the environmental conditions according to the calculated fluxes. While such models can already include changes in the environment, they tend to react very abruptly to these changes because adaptation processes are not modeled. These adaptation processes are mainly changes in the utilized pathways, which require the presence of the corresponding enzymes to work correctly. The relation between pathway usage and enzyme levels was first modeled in Resource Balance Analysis (RBA) goelzer2011cell (). These models optimize the growth rate in a fixed environment while including enzymatic flux constraints to limit uptake and metabolic fluxes. The solution consists of the flux rates and the enzyme distribution under the quasi-stationarity assumption, leading to optimized exponential growth. These methods culminated in the dynamic enzyme-cost FBA (deFBA) waldherr2015 (), which predicts all reaction rates and enzyme levels over a time course. It is especially useful for investigating the adaptation process over the course of nutrient changes, e.g. the switch from aerobic to anaerobic growth conditions or the depletion of a preferred carbon source.
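The steady-state optimization at the heart of FBA can be written as a small linear program. The following sketch (a toy two-flux network of our own, not a model from this paper) maximizes the biomass-producing flux subject to the steady-state constraint and an uptake bound:

```python
import numpy as np
from scipy.optimize import linprog

# Toy FBA instance (illustrative): one internal metabolite M with
# v1: uptake -> M and v2: M -> biomass, so steady state means v1 - v2 = 0.
S = np.array([[1.0, -1.0]])           # stoichiometry of M w.r.t. (v1, v2)
c = np.array([0.0, -1.0])             # maximize v2 (linprog minimizes c @ v)
bounds = [(0.0, 10.0), (0.0, None)]   # uptake flux capped at 10

res = linprog(c, A_eq=S, b_eq=np.zeros(1), bounds=bounds)
v_opt = res.x                         # optimal flux distribution
```

At the optimum the uptake bound is active, so all available uptake is routed into the biomass flux.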

All of these modeling methods rely heavily on experimental measurements for the stoichiometry, flux bounds, catalytic constants, etc. These measurements naturally include errors and thus introduce a form of uncertainty to the biological models. While there are some extensions to the FBA handling uncertainties in the stoichiometry of the network zavlanos2011robust () or the steady-state assumption almass2014robust (), these are only applicable to static problems without the inclusion of enzyme catalysis.

In this work we present a general framework for the class of FBA methods that is capable of portraying uncertain parameters and ensures results that are robust with respect to these uncertainties. Robustness is important because otherwise we might obtain infeasible results; e.g. in a model with uncertain nutrient dynamics, a non-robust solution might predict uptake of an already depleted source. With the deFBA being the most complex FBA method to date, we have chosen it as the basis for our framework and combine it with ideas originating in robust Model Predictive Control (MPC) bemporad1999robust (). As most MPC applications naturally include uncertainties, these methods have already proven themselves in real-world applications. The classic approach in robust MPC is min-max MPC campo1987robust (). Its solution is a series of control inputs minimizing the objective for a worst-case realization of the uncertainties. This approach could be included in the deFBA, but for more complex networks the problem may lead to a very conservative solution or even become infeasible, as shown in scokaert1998min (). Another class of robust methods is tube-based MPC rakovic2011fully (). Here the solution of the nominal control problem is combined with a so-called ancillary controller ensuring that the evolution of the real, uncertain system is confined to a predefined tube centered around the nominal trajectory. This method guarantees stability, and the solution satisfies all constraints under the uncertainties. Yet, optimality of the solution is not addressed, so the method cannot be used in the biological setting, as optimality of the solution is the governing assumption for all FBA methods. Instead, we use multi-stage nonlinear MPC (NMPC) lucia2013 (), lucia2014 () as the robust method to be combined with the deFBA. Multi-stage NMPC uses a representation of uncertainties by means of a scenario tree on a finite prediction horizon.
These trees are constructed by modeling the influence of an uncertainty as a discrete event, which creates additional vertices in the tree. Hence, the number of paths in this tree grows exponentially in the number of uncertainties and the observed time frame. This leads to very large optimization problems if applied to deFBA models with a large number of measured values and predictions over long time spans. Consequently, we apply the idea of the receding prediction horizon to the deFBA to minimize the size of the scenario tree and thus the problem size. Our first contribution is the short-term deFBA (sdeFBA), which utilizes the receding prediction horizon to form an iterative scheme for solving deFBA problems. In this context, we suggest a systematic way to choose the prediction horizon for the sdeFBA such that the results are qualitatively similar or superior to the deFBA solutions. Using the sdeFBA can be beneficial in itself for some applications, but most importantly it can be used as a predictor inside an MPC scheme.

The next step in formulating the robust framework is the integration of the scenario tree into the sdeFBA. We call the result the robust deFBA (rdeFBA). While it is capable of handling all kinds of uncertainties relevant to metabolic networks, we focus on measurement errors in the catalytic constants. The effects of uncertainty in these constants are usually underestimated, but should be given proper thought, as our examples show. We consider a minimal example to clarify the differences between the presented methods, and a medium-sized example describing the core metabolic process in bacteria.

In summary, our proposed framework is an extension of the deFBA that allows us to include different kinds of uncertainties in models of metabolic networks. The result follows a strict optimality principle and is guaranteed to remain admissible with respect to the constraints under the uncertainties. Furthermore, we depict the impact of measurement errors by comparing robust and regular solutions of the examples mentioned above.

2 Dynamic Enzyme-cost FBA

Let us recap the deFBA and its mathematical description waldherr2015 (). This method was developed to analyze the biomass fluxes under changing environmental conditions, as well as the enzyme levels necessary to realize them.

2.1 Modeling metabolic networks

The networks we are investigating consist of a set of molecular species, which we divide into external species $y$, internal species $x$, and a set of macromolecules $p$. All species are measured in molar amounts. We assign each macromolecule $p_i$ its molecular weight $b_i$, which we use to measure the total biomass $B = b^T p$. We denote the reactions between the species by the resulting reaction fluxes $v$, which are divided into three classes:

  • Exchange reactions transport matter between the inside and the outside of the cell;

  • Metabolic reactions convert the metabolites among each other;

  • Biomass reactions convert the metabolites into macromolecules or vice versa.

We are mainly interested in the time evolution of the species and reactions, $z(t)$, $v(t)$, etc. For convenience and readability, we drop the dependency on time unless it is important. Writing $z = (y, x, p)$ for the vector of all species amounts, the effect of the fluxes $v$ on the time evolution of the species is given by the stoichiometric matrix $S$ as

(1) $\dot z = S v$,

where the matrix entries $S_{ij}$ describe the stoichiometry of species $i$ in reaction $j$. The macromolecules are usually composed of a large number of small metabolite units, which may lead to ill-conditioned dynamics (1) with large coefficients in the stoichiometric matrix. We tackle this by scaling the macromolecule amounts and the corresponding fluxes by a large factor as

(2)

Furthermore, we assume the internal metabolites to be in quasi-steady state, $\dot x = 0$. This assumption is valid because the metabolic reactions are much faster than the biomass reactions, and the external species are slow variables due to the large external volume (cf. waldherr2015 ()). This gives us the boundary layer condition of the scaled system as

(3) $S_x v = 0$

where $S_x$ collects the rows of the stoichiometric matrix belonging to the internal metabolites, expressed in the scaled fluxes. For convenience we will keep writing $v$ for the scaled fluxes. Let us now discuss the other constraints in the model. The most important one is the enzyme capacity constraint. Most of the macromolecules are enzymes whose amounts limit the possible reaction fluxes. We assume that the first entries in the vector $p$ correspond to these enzymes. The maximal reaction rates are determined by the reaction-specific catalytic constants $k_{cat,j}$. We differentiate between the forward constant $k^+_{cat,j}$ and the backward constant $k^-_{cat,j}$ if the reaction is reversible and both directions are catalyzed by the same enzyme. Since some enzymes are capable of catalyzing multiple reactions, we denote the set of reactions catalyzed by the enzyme $E_i$ as

(4) $\mathrm{cat}(i) = \{ j : \text{reaction } j \text{ is catalyzed by } E_i \}$.

The constraint for the enzyme $E_i$ then reads, in short,

(5) $\sum_{j \in \mathrm{cat}(i)} |v_j| / k_{cat,j} \le p_i$.

To eliminate the absolute value in constraint (5) and transform it into linear constraints, we have to include all possible sign combinations of the occurring $v$-values. As an illustration, consider an enzyme $E$ that catalyzes reaction 1 reversibly and reaction 2 irreversibly. Then we can write (5) as

(6) $v_1 / k^+_{cat,1} + v_2 / k_{cat,2} \le p_E, \qquad -v_1 / k^-_{cat,1} + v_2 / k_{cat,2} \le p_E$.

Concatenating these inequality rows over all enzymes yields a matrix $H_c$ acting on the fluxes and a matrix $H_E$ selecting the catalyzing enzymes. We include the scaling by splitting $v$ into the different kinds of fluxes as

(7)

and use the short notation

(8) $H_c v \le H_E p$.
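The sign-combination construction behind the enzyme capacity rows can be sketched in code. The helper below is our own illustration (the name `capacity_rows` and the toy numbers are not from the paper); for a single enzyme it enumerates one linear row per combination of flux directions, with coefficients $1/k_{cat}$:

```python
from itertools import product

def capacity_rows(reactions, n):
    """One row h per sign combination, so that h @ v <= e replaces
    sum_j |v_j| / kcat_j <= e for a single enzyme (illustrative sketch).

    reactions: list of (j, kcat_fwd, kcat_bwd); kcat_bwd is None for an
    irreversible reaction (v_j >= 0, only the forward term is needed).
    n: total number of fluxes in the network.
    """
    per_reaction = []
    for j, kf, kb in reactions:
        terms = [(j, 1.0 / kf)]
        if kb is not None:                 # reversible: also cover v_j < 0
            terms.append((j, -1.0 / kb))
        per_reaction.append(terms)
    rows = []
    for combo in product(*per_reaction):   # all sign combinations
        row = [0.0] * n
        for j, coeff in combo:
            row[j] = coeff
        rows.append(row)
    return rows

# Enzyme catalyzing reaction 0 reversibly (kf=2, kb=1) and reaction 1
# irreversibly (kf=4): two sign combinations, hence two rows.
rows = capacity_rows([(0, 2.0, 1.0), (1, 4.0, None)], n=2)
```

For this example the helper yields the two rows [1/2, 1/4] and [-1/1, 1/4], mirroring the two inequalities in (6).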

Usually, any organism needs a certain amount of structural macromolecules to keep working, e.g. the cell wall separating it from the outside. We express this necessity by enforcing certain percentages of the scaled biomass to be made up of the structural components, e.g. for a structural macromolecule $s$

(9) $b_s p_s \ge \varphi_s \, b^T p$,

with $\varphi_s$ being the minimal fraction of the total biomass for $s$. As before, the extension of (9) to the network level can be expressed by collecting the individual constraints into

(10)

where each row of the collected constraint matrix corresponds to (9) for a different index $s$. We call (10) the biomass composition constraint. Moreover, we consider the positivity constraint on all species,

(11) $y \ge 0, \quad x \ge 0, \quad p \ge 0,$

and biomass-independent flux constraints

(12) $v_{\min} \le v \le v_{\max}$.

The last step in constructing the optimization problem is the introduction of the objective functional

(13) $J = \int_0^{t_{end}} b^T p(t) \, dt$,

depending on the initial values and the chosen end-time $t_{end}$, with $b$ the vector of molecular weights. Since $b^T p$ is the total biomass, this objective functional represents the accumulation of biomass over time and corresponds to the assumption that the modeled organism optimizes itself to grow as fast as possible. Finally, we can formulate the full optimal control problem as

(14) maximize (13) over the fluxes $v$, subject to the dynamics (1), the steady-state condition (3), the enzyme capacity constraint (8), the biomass composition constraint (10), the positivity constraint (11), and the flux bounds (12).

Depending on the initial conditions, this linear dynamic optimization problem determines the optimal fluxes and the corresponding enzyme levels at all times within $[0, t_{end}]$.
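In practice, problems of this type are solved after time discretization, which turns the dynamic problem into a finite-dimensional linear program. The following sketch uses a hypothetical one-enzyme model with dynamics $\dot e = v$ and capacity $v \le k_{cat} e$ (our own toy, not a network from the paper) and a simple forward-Euler discretization in place of the collocation schemes discussed later:

```python
import numpy as np
from scipy.optimize import linprog

kcat, dt, N, e0 = 1.0, 0.1, 10, 1.0   # toy parameters (assumed values)
nE, nV = N + 1, N                     # decision vector z = [e_0..e_N, v_0..v_{N-1}]

c = np.zeros(nE + nV)
c[:nE] = -dt                          # maximize dt * sum_k e_k (biomass integral)

A_eq = np.zeros((N + 1, nE + nV))
b_eq = np.zeros(N + 1)
A_eq[0, 0] = 1.0
b_eq[0] = e0                          # initial condition e_0 = e0
for k in range(N):                    # Euler dynamics: e_{k+1} = e_k + dt * v_k
    A_eq[k + 1, k + 1] = 1.0
    A_eq[k + 1, k] = -1.0
    A_eq[k + 1, nE + k] = -dt

A_ub = np.zeros((N, nE + nV))
for k in range(N):                    # enzyme capacity: v_k <= kcat * e_k
    A_ub[k, nE + k] = 1.0
    A_ub[k, k] = -kcat

res = linprog(c, A_ub=A_ub, b_ub=np.zeros(N), A_eq=A_eq, b_eq=b_eq,
              bounds=[(0.0, None)] * (nE + nV))
e_traj = res.x[:nE]                   # optimal discrete biomass trajectory
```

The optimizer saturates the capacity constraint at every step, reproducing discrete exponential growth $e_k = e_0 (1 + k_{cat}\,dt)^k$.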

The problem with this approach is that the optimization can lead to artificial results. For example, the optimal solution may only predict the production of the macromolecules with the best biomass yield. While this is an optimal solution from a mathematical standpoint, we know that biological systems usually do not behave this way. We illustrate this behavior in the following example.

2.2 Example: Enzymatic growth

Figure 1: Numerical results for the example in Section 2.2. Plot A⃝ shows a mixture of linear and exponential growth, plot B⃝ shows pure linear growth, and plot C⃝ shows exponential growth until the end-time.

We present a simple model, which we have already analyzed in detail in Waldherr2017 (), to give the reader an idea of how changing the catalytic constants or other system parameters, e.g. the end-time, can impact the quality of the results. In this model the organism can either invest in the production of catalyzing enzymes to increase its capability for growth, or it can produce storage components, which have a better biomass yield. We use a minimal model consisting of one external nutrient, one internal metabolite, two macromolecules, and three irreversible reactions with stoichiometry as shown in (15a)-(15c).

(15a)
(15b)
(15c)

The external nutrient represents a collection of components necessary for growth, such as carbon, nitrogen, etc. Further processed components made from these nutrients are collected as the internal metabolite. We differentiate the macromolecules into the group of enzymes, collecting the whole enzymatic machinery needed for growth, and non-enzymatic macromolecules. The latter reflect storage components such as lipids, starch, and glycogen. We handle the large difference in the stoichiometric values by scaling with a large factor as described in (2). The scaled dynamics for this model are

(16)

According to the steady-state assumption (3), we do not allow accumulation of the metabolites

(17)

We further assume all three reactions are catalyzed by the enzymatic macromolecules with their respective catalytic constants. Hence, the enzyme capacity constraint (8) for the scaled system is given as

(18)

We collect the molecular weights of the species in the biomass vector and define the objective functional as

(19)

with initial values and fixed end-time. We have already shown analytically in Waldherr2017 () that this system can exhibit essentially three different growth modes, depending on the chosen parameters and the end-time. Growth in this sense means the change of the total biomass over time.

  • A stationary phase with zero growth, which occurs as soon as nutrients deplete.

  • A linear growth phase with the optimal solution

    (20)

    meaning the solution predicts only an increase in the storage, while keeping the amount of enzymes constant.

  • An exponential growth phase with the optimal solution

    (21)

If we assume unlimited nutrients, we would expect a biological system to work exclusively in the exponential phase and produce no storage at all. But the formulation as an optimization problem can generate solutions containing linear growth phases, depending on the system parameters and the end-time. For this example, we fix the end-time and vary one of the catalytic constants around its nominal value. The other parameters are shown in Table 1. The initial amounts are chosen large enough to ensure nutrients are not limiting. We refrain from plotting the results for the nutrient, as these are given by the dynamics

(22)

We can observe a piecewise linear solution plotted in Figure 1 A⃝. As mentioned, we chose the parameters such that the yield of the storage is larger, but depending on the parameters exponential growth is preferred, as it enables quicker nutrient uptake. Up to a switching time, the solution shows an exponential increase in enzymes, but afterwards all resources are used to maximize biomass by producing only storage. This contradicts our expectation of exclusively exponential growth in a non-limiting nutrient situation, but slight changes to the varied catalytic constant can change these results completely. Using the minimal value, we observe a solution consisting of a single linear growth phase in B⃝. This change in the reproductive capabilities of the enzymes is enough to make purely linear growth the most effective solution. Following Waldherr2017 (), we can even pinpoint the exact conditions necessary for optimal linear growth

(23)

Equation (23) also shows that the end-time is as important as the other system parameters for the quality of the solution. The right plot C⃝ uses the maximal value of the varied catalytic constant and shows purely exponential growth. This is interesting because the instantaneous yield of the storage is still better in this case; however, the exponential solution is able to outgrow the linear growth mode over long time spans.

Inspecting the results of this minimal example, we now face two questions: can we eliminate the mathematical artifacts from solutions like the one presented in A⃝, and how can we handle the uncertain catalytic constants? Interestingly, the answers to these questions are highly related. We first extend the deFBA with means from Model Predictive Control (MPC) to answer the former.

parameter
value 150 100 150 2 1
Table 1: Numerical values for the parameters in Example 1

3 Short term deFBA

The previous example shows artifacts of the optimization technique in form of linear growth phases (cf. Figure 1 A⃝ and B⃝). We present in this section an extension of the deFBA method, which helps to eliminate these artifacts and can reduce the computational cost of finding the optimal solution.

3.1 Using a receding prediction horizon

An obvious mismatch between our models and a real microorganism is the vast difference in knowledge about the development of the environment. The deFBA assumes a deterministic environment and solves an optimization problem with exact information for all times. Real biological systems, on the other hand, act more like Model Predictive Controllers (MPC) rawlings2009 (), meaning they react and adapt to changing conditions on the fly. Hence, we incorporate the basic idea of a reacting system into our model via a receding prediction horizon.

In the original deFBA formulation (14) we solve one large optimization problem up to the end-time. Here, we instead define a prediction horizon representing the time over which the system can plan ahead. Thus, we solve an iterative scheme as opposed to a single optimization problem. We introduce a step size to discretize time on the equidistant grid

(24)

Each of the individual optimization problems over the control times , is defined as

(25)

After solving the problem (25) for a fixed iteration, the first part of the calculated trajectories is appended to the solution curves. Then the next iteration is initialized with the updated values. For our numerical results we discretize all variables on the grid (24) and denote the discrete approximations of the species states and reaction fluxes accordingly. We further express the prediction horizon as a number of grid steps. The dynamics of the system are substituted by a collocation method. As an example, the problem (25) can be discretized with a 1-step collocation method as

(26)

We call the problem (26) the short-term deFBA (sdeFBA). Solving the system starting at a given time step gives us the optimal fluxes over the whole prediction horizon. However, we only implement the first flux vector and reformulate (26) for the next iteration.
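The receding-horizon scheme itself is independent of how the horizon problem is solved. The following skeleton is our own sketch; `plan` and `step` are hypothetical stand-ins for the horizon problem (26) and the system dynamics. Only the chronologically first flux of each plan is implemented:

```python
def sdefba_loop(x0, step, plan, n_iter):
    """Receding-horizon iteration (sketch): plan(x) returns the flux
    trajectory over the prediction horizon from state x; step(x, v)
    advances the dynamics by one grid interval."""
    x, states, fluxes = x0, [x0], []
    for _ in range(n_iter):
        v_plan = plan(x)         # solve the horizon problem from x
        v0 = v_plan[0]           # implement only the first flux vector
        x = step(x, v0)          # advance the real system by dt
        fluxes.append(v0)
        states.append(x)
    return states, fluxes

# Stub planner for a one-enzyme toy (de/dt = v, v <= kcat * e), where the
# optimal plan simply saturates the capacity constraint at every step.
kcat, dt = 1.0, 0.1
plan = lambda e: [kcat * e] * 5      # planned fluxes over a 5-step horizon
step = lambda e, v: e + dt * v       # forward-Euler dynamics
states, fluxes = sdefba_loop(1.0, step, plan, 10)
```

In this toy setting each iteration grows the enzyme amount by the factor $1 + k_{cat}\,dt$.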

This approach is beneficial in multiple ways, e.g. we already mentioned the possibility of eliminating mathematical artifacts; see also Example 4.2. But the most relevant advantage for us is the availability of algorithms for handling uncertainties in such problems via multi-stage NMPC lucia2013 ().

As briefly discussed in Example 2.2, different end-times may lead to qualitatively different results, and we face the same problem in choosing the prediction horizon. In most applications of MPC the prediction horizon is dictated by real-time requirements for the computation and is chosen as large as possible to maximize performance. For our approach, we are interested in a prediction horizon such that a solution including exponential growth phases is attainable. This means we need to choose the horizon sufficiently large that linear solutions become suboptimal, whilst keeping it as small as possible for the sake of low computational cost.

3.2 Choosing the prediction horizon

Figure 2: Illustration of the method to choose the prediction horizon. The upper bound on linear growth is shown in red, the lower bound on exponential growth in blue, and the optimal solution in brown (x).

Our idea to choose the prediction horizon is based on the results of Example 2.2. Looking at the plots A⃝ and B⃝ from Figure 1, we can see that for short times linear growth is faster than exponential growth. But an exponential solution outgrows a linear one after enough time, as sketched in Figure 2. Hence, we calculate the maximal achievable linear growth rate and compare it to a suboptimal solution with enforced exponential growth. The time at which the suboptimal exponential solution outgrows the best linear one is then chosen as the prediction horizon, to ensure the solver will choose an exponential growth solution if available. In the following we show how to compute these solutions.

Based on the initial biomass composition, we define an upper bound on linear growth by maximizing the slope of the biomass curve at the initial time, with the macromolecule amounts fixed at their initial values. This translates to an optimization problem similar to the standard FBA varma1994metabolic ()

(27)

Please note that (27) is independent of the real nutrient situation as it represents a strict upper bound. We use the solution to obtain the corresponding biomass trajectory by solving the differential equation

(28)

To calculate a lower bound on the exponential growth solution, we construct another optimization problem inspired by Resource Balance Analysis (RBA) goelzer2011cell (). In RBA one looks for enzyme levels which maximize the growth rate of the system. We enforce balanced growth baroukh2014drum (), meaning the biomass composition is fixed during growth. As before, we solve this only at the initial time and identify the flux vector realizing this growth rate.

(29)

The biomass curve with the maximal growth rate can be identified by solving the system dynamics

(30)

With solutions for linear and exponential growth in hand, we now calculate the contact time $T_c$ at which the integrated biomass curves meet. The integral of the biomass curve for exponential growth is derived from (30) as

(31)

and the corresponding integral for the linear case (28) is computed as

(32)

Both solutions (31) and (32) satisfy the deFBA constraints (14), but usually are suboptimal solutions of the original deFBA problem. Therefore, we interpret (31) as a lower bound for exponential growth. We calculate the contact time $T_c$ by solving

(33)

The trivial solution $T = 0$ always exists for (33). If this is the only solution, the sub-optimal exponential biomass curve is larger at all times, and we can choose the prediction horizon arbitrarily and still get an exponential solution from the sdeFBA. Secondly, there may exist another solution $T_c > 0$ of (33). This implies that on a horizon longer than $T_c$ the sub-optimal exponential solution outgrows the best linear solution (cf. Figure 2). Thus, as long as nutrients do not deplete and we choose the prediction horizon larger than $T_c$, the optimal solution for (25) must be an exponential growth phase. Because we calculated $T_c$ from a comparison of upper and lower bounds without including the actual nutrient dynamics, a smaller prediction horizon may also give exponential growth in specific situations.
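Numerically, the contact time can be found with a scalar root solver once the two integrated bounds are available in closed form. The sketch below uses illustrative numbers of our own (the initial biomass B0, the linear slope a, and the exponential rate mu are assumptions, not values from the paper); the linear integral follows from (28), the exponential one from (30):

```python
import numpy as np
from scipy.optimize import brentq

B0, a, mu = 1.0, 2.0, 1.0   # assumed initial biomass, linear slope, growth rate

def int_lin(T):
    # integral of B0 + a*t over [0, T], cf. the linear bound (28)/(32)
    return B0 * T + 0.5 * a * T ** 2

def int_exp(T):
    # integral of B0 * exp(mu*t) over [0, T], cf. the exponential bound (30)/(31)
    return B0 * (np.exp(mu * T) - 1.0) / mu

# T = 0 always solves (33); bracket away from zero to find the second root,
# the contact time where the exponential integral catches up.
Tc = brentq(lambda T: int_exp(T) - int_lin(T), 1e-3, 10.0)
```

Any prediction horizon longer than this contact time then makes the linear growth mode suboptimal on the horizon.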

4 Robust deFBA

4.1 Using multi-stage MPC

Having established the sdeFBA in the last section, we can now show how we include uncertainties in our method via the multi-stage NMPC approach presented in lucia2013 (); lucia2014 (); see also delaPena2005b () for linear systems. As mentioned before, we want to model very different uncertainties, but mostly unknowns in the model structure such as parameter uncertainties, unknown stoichiometry, etc. While our framework can handle different kinds of uncertainties at the same time, we focus in this work on the inclusion of measurement errors in the catalytic constants as the leading example.

The multi-stage NMPC is based on a representation of the uncertainties by a branching tree as shown in Figure 3 (left). Each branch represents the effect of an uncertainty and the optimal fluxes to handle this effect. As we can only handle a finite number of scenarios, we assume all uncertainties are bounded at all times. A path in the tree from the root to a final leaf is called a scenario. To make the notation easier to read, we use an upper index for scenarios and a lower index for prediction steps. With each new iterative step, we thus get another set of scenarios, depending on the number of uncertainties. The main challenge of this approach is that the scenario tree grows exponentially with the prediction horizon. However, previous results lucia2014 () have shown that full coverage of the scenario tree is usually not necessary, and we can assume that the uncertainties stay constant after a certain time span. We choose this time span to be identical to the step size of the time grid used for the discrete approximation of the system. This way we can simplify the tree to the form presented in Figure 3 (right). Constructing the scenarios with the extreme values of each uncertainty, we can ensure the calculated fluxes are feasible for all scenarios. Thus, considering $m$ uncertainties, we get a total of $2^m$ different scenarios, with the length of each scenario defined by the prediction horizon. Figure 3 (right) shows how to handle a system with three uncertainties, where each branch corresponds to an element of the set

(34)

The chronologically first set of fluxes in each iteration is the most important one, as it is the one we implement into the final solution. These fluxes have to be feasible under all uncertainties.

Figure 3: Scenario trees for different robust horizons. (Left) Example of a full branching tree. (Right) Reduced scenario tree for a robust horizon and three uncertainties. Only maximal deviations from nominal values are included.
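Enumerating the extreme-value scenarios is straightforward. The sketch below builds all $2^m$ combinations for three uncertain catalytic constants (the 20% error bound and the resulting eight scenarios match the setting used later in Section 4.2; the nominal numbers and dictionary keys are our own illustration):

```python
from itertools import product

# Three uncertain catalytic constants with a relative measurement error
# of 20% (cf. Section 4.2); nominal values and key names are illustrative.
nominal = {"kcat1": 150.0, "kcat2": 100.0, "kcat3": 150.0}
err = 0.20

# Two extreme realizations per uncertain parameter -> 2^3 = 8 scenarios.
extremes = {k: (v * (1.0 - err), v * (1.0 + err)) for k, v in nominal.items()}
scenarios = [dict(zip(extremes, combo)) for combo in product(*extremes.values())]
```

Each scenario fixes one extreme realization per uncertain constant, so the robust problem optimizes over all eight parameter combinations at once.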

Multi-stage MPC requires implementing the scenario tree in the discrete sdeFBA (26) by optimizing over all scenarios at once. The new objective function is chosen as the weighted sum of the individual objectives (13)

(35)

with the corresponding start- and end-times and scenario weights. The weights can be used if additional information on the likelihood of the scenarios is given, but usually we weight all scenarios equally. Further, we define an index set over the scenarios for the sake of clarity. The robust approach is then defined as an iterative scheme over

(36a)
(36b)
(36c)
(36d)
(36e)
(36f)
(36g)
(36h)
Additionally, we have to include the non-anticipativity constraint, which enforces the copies of the first part of the trajectories to be identical for all scenarios.
(36i)

The first part of the problem (36a)-(36h) is identical to the approach we presented for the sdeFBA, but optimizes over all scenarios at once. We call the full scheme (36) the robust deFBA (rdeFBA). In the case of measurement errors, the only influence of the uncertainties is found in the capacity matrices, but the framework allows making all entries in the constraints dependent on the scenario. As with the sdeFBA, we implement only the first time interval into the solution. For both examples presented in this work, the dynamics are discretized using orthogonal collocation on finite elements with the do-mpc tool lucia2016_CEP (). Within do-mpc, the resulting optimization problems are solved by IPOPT wachter2006 (), with exact first and second order derivatives computed via CasADi andersson2012 (). Do-mpc also offers an efficient implementation of multi-stage NMPC. While the rdeFBA constructs linear programs, the tools we are using support non-linear problems, so we can easily include non-linear dynamics or constraints in our models.

We calculate the prediction horizon for the robust deFBA in the way suggested in Section 3.2. Keep in mind that changes in the catalytic constants may lower the achievable growth rate of the system significantly. Hence, it may be necessary to calculate the prediction horizon in the scenario with minimal catalytic constants.

4.2 Example: Enzymatic Growth

We revisit Example 2.2 to see how the robust solution differs from the nominal one for this minimal example. We assume all three catalytic constants of the reactions include an error of up to 20% of their nominal values. Hence, we are looking at eight different scenarios consisting of all combinations of extremal values

(37)

We calculated the prediction horizon via the algorithm described in Section 3.2 and fixed the end-time. As in the previous example, the initial amount of nutrients is chosen so large that they do not limit growth. Figure 4 shows the results for different scenarios. Plot A⃝ is the reference solution of the sdeFBA for the nominal values of the catalytic constants. Due to the prediction horizon being large in comparison to the short end-time, this solution shows only exponential growth. In particular, we do not observe the artificial linear growth phase (cf. Figure 1 A⃝). Thus, the impact of the fixed end-time is reduced via the sdeFBA.

The robust solution is presented in plot B⃝. Obviously, the inclusion of the uncertainties decreases the overall growth rate significantly. This is due to the fact that the calculated trajectories are robust: they must be feasible for all scenarios at the same time. The scenario using the minimal catalytic constants acts in this sense as an overall upper bound on the growth rate. We call this the minimal scenario. The result of the sdeFBA using these minimal values is identical to the robust solution shown in plot B⃝.

Figure 4: Plot A⃝ shows the reference solution of the sdeFBA with nominal catalytic constants; plot B⃝ shows the robust solution.

Thus, we conclude that for such a simple system the robust optimization is unnecessary. But we can see the impact of the uncertainties in Example 4.3, which considers a larger network including multiple pathways for biomass production.

4.3 Example: Core Carbon Network

In this section, we consider a more sophisticated model for the uptake of different nutrients and their transformation into biomass. It corresponds to the core network found in most microorganisms. This example was previously studied in (waldherr2015) with help of the deFBA under dynamic nutrient conditions. Here, we only present one environmental setting and compare the results of the different methods. All transport, metabolic, and biomass reactions are shown in Tables 2-3. The enzymes are labeled either as catalyzing metabolic reactions or as catalyzing transport reactions. The catalytic constants of the enzymes are derived from typical values in metabolism (bar2011moderately; schomburg2002brenda; milo2010bionumbers). The constants for the biomass reactions are taken from measurements of translation progression rates in E. coli (young1976). The model considers ribosomes R as the essential part of the enzymatic machinery, producing all biomass components including themselves. Structural components are collected in the macromolecule S. Identifying S with the exterior membrane of the cell, it also represents the surface area and thus limits the exchange of O2, D, and E. In this model, the structural component must make up more than 35% of the total biomass for the cell to work correctly,

(38)
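The biomass composition constraint (38) can be checked pointwise along a solution trajectory; a minimal sketch, assuming illustrative species names and that biomass is the weight-sum of macromolecule amounts:

```python
def structural_constraint_ok(amounts, mol_weights, min_fraction=0.35):
    """Check the biomass-composition constraint: the structural
    macromolecule S must make up at least `min_fraction` of the
    total biomass.

    `amounts` maps macromolecule names to molar amounts and
    `mol_weights` maps them to molecular weights; both names are
    illustrative, not the paper's notation.
    """
    total_biomass = sum(amounts[m] * mol_weights[m] for m in amounts)
    return amounts["S"] * mol_weights["S"] >= min_fraction * total_biomass


# S contributes 4 of 10 mass units (40 %) -> constraint satisfied.
assert structural_constraint_ok({"S": 4.0, "R": 3.0},
                                {"S": 1.0, "R": 2.0})
```

In the actual deFBA problem this inequality enters as a linear constraint on the macromolecule trajectories at every discretization point, not as an a-posteriori check.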

For the rdeFBA we assume that three of the $k_{\mathrm{cat}}$-values are subject to a measurement error. We have chosen the values

(39)

where and . Large values for E make the production of ATP utilizing O2 more effective, while is essential for the production of the components G. The $k_{\mathrm{cat}}$-value describes the efficiency of self-reproduction of the ribosomes R. Larger values lead to faster growth, and deFBA simulations have shown that this may also impact the decision whether the cell stays in exponential growth or switches to a linear growth phase. We call any phase in which any of the transporter enzymes is being produced an exponential growth phase, whereas in a linear growth phase only the best-yield macromolecule S is produced. Notice that in an exponential growth phase the cell produces only enough S to fulfill the biomass composition constraint (38). In a linear growth phase the production of S is maximized, as the structural component has the largest biomass yield.
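The distinction between the two growth phases can be read off a solution directly from the enzyme synthesis fluxes; a minimal sketch, with hypothetical enzyme names:

```python
def growth_phase(synthesis_flux, transporter_enzymes, tol=1e-9):
    """Classify a time point of a (s)deFBA solution.

    Following the definition in the text: a phase counts as
    'exponential' if any transporter enzyme is being produced,
    and as 'linear' otherwise (only structure S is made).
    `synthesis_flux` maps enzyme names to their current production
    fluxes; names are illustrative.
    """
    if any(synthesis_flux.get(e, 0.0) > tol for e in transporter_enzymes):
        return "exponential"
    return "linear"
```

For instance, `growth_phase({"T_carb1": 0.1}, ["T_carb1", "T_o2"])` would be classified as exponential, while a flux pattern producing only S falls into the linear phase.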

This network already captures a large variety of possible behaviors and decisions regarding the biomass composition. Varying the environmental conditions, especially the presence of oxygen, leads to very different enzyme levels and pathway usage. We therefore use oxygen dynamics comparable to an aerated batch process

$\frac{\mathrm{d}\,O_2(t)}{\mathrm{d}t} = v_{\mathrm{in}} - k_{\mathrm{vent}}\, O_2(t) - v_{O_2}(t)$ (40)

with the oxygen inflow $v_{\mathrm{in}}$ (in mol/min), the ventilation rate $k_{\mathrm{vent}}$ (in 1/min), and the cellular uptake flux $v_{O_2}$, which is an optimization variable.
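Within the discretized optimization, this balance becomes a linear difference constraint; its qualitative behavior can be illustrated with a simple forward-Euler simulation (all parameter values below are placeholders, not the ones used in the paper):

```python
def simulate_oxygen(o2_0, v_in, k_vent, v_uptake, dt, n_steps):
    """Forward-Euler sketch of the aerated-batch oxygen balance
    dO2/dt = v_in - k_vent * O2 - v_up(t).

    `v_uptake` is a function of time standing in for the cellular
    uptake flux, which in the deFBA is an optimization variable.
    """
    o2 = o2_0
    trajectory = [o2]
    for k in range(n_steps):
        o2 = o2 + dt * (v_in - k_vent * o2 - v_uptake(k * dt))
        o2 = max(o2, 0.0)  # the oxygen amount cannot become negative
        trajectory.append(o2)
    return trajectory


# Without cellular uptake, O2 relaxes towards v_in / k_vent.
traj = simulate_oxygen(0.0, 1.0, 0.5, lambda t: 0.0, 0.1, 2000)
```

Without uptake the oxygen level settles at the inflow/ventilation balance $v_{\mathrm{in}} / k_{\mathrm{vent}}$; any cellular uptake lowers this quasi-steady state, which couples the metabolic decisions back to the environment.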

We use the same objective functional as before, namely

(41)

We set the initial environment to Carb1(0) = 2 mol and Carb2(0) = 30 mol. We use the Resource Balance Analysis (goelzer2011cell) to identify suitable initial values leading to an optimal growth rate in this environment for the nominal $k_{\mathrm{cat}}$-values. This ensures that differences between the robust and the non-robust solution originate only in the modeling method. We constrained the total amount of macromolecules at time zero, with a fixed scaling factor for the macromolecules. The resulting initial values are shown in Table 3. We calculated the prediction horizon with the algorithm presented in Section 3.2 and use it for the sdeFBA as well as the rdeFBA, together with a fixed step size for the discretization.

The results for the different methods are shown in Figure 5. At the top left, the biomass is plotted for the different optimizers. The sdeFBA delivers qualitatively the same results as the deFBA, with minor numerical differences. Hence, we only show the sdeFBA results in more detail, since the enzyme plots for the deFBA are indistinguishable from those of the sdeFBA. The overall growth rate of the robust deFBA is much lower than that of the non-robust methods. The cause becomes clear when looking at the other plots in Figure 5. In the top right plot we compare the time development of the nutrients. Besides a time delay analogous to the left plot, we see that the robust solution does not deplete the Carb1 source first. This is surprising, as the parameters are chosen such that Carb1 is the preferred carbon source.

The two plots at the bottom of Figure 5 show the time development of the biomass composition and give more insight into the different solutions. While the sdeFBA predicts an initial increase in ribosomes along with an increase in enzyme production capacity, the rdeFBA focuses on the production of structure S. This behavior can be explained by the uncertainty in the ribosomal $k_{\mathrm{cat}}$-value. The robust solution optimizes over all scenarios at once, and four of those include a 10-fold increased efficiency for the self-replication of the ribosomes. Therefore, the solver decides against an initial production of ribosomes. Furthermore, the rdeFBA predicts an increased production of E for increased ATP production from oxygen. The sdeFBA solution, on the other hand, keeps the amount of structural proteins at the enforced minimum (38) until the prediction horizon includes the time of nutrient depletion. Only then does the solution predict an increase in structure molecules S, as it is the most effective biomass component in terms of nutrient-to-biomass conversion.
Overall, it is reassuring that all solutions reach an identical end value in total biomass. Hence, the effectiveness of the production is not affected by our changes in the catalytic constants. At the same time, however, we observe clear differences in the qualitative behavior of the solutions and, even more, in the value of the objective (41).

Reaction Enzyme $k_{\mathrm{cat}}$
Exchange reactions
Carb1 → A 3000
Carb2 → A 2000
3000
3000
S 1000
S 1000
S 1000
Metabolic reactions
A + ATP → B 1800
B → C + 2 ATP + 2 NADH 1800
C → 2 ATP + 3 D 1800
C + 4 NADH → 3 E 1800
B → F 1800
C → G 1800
G + ATP + 2 NADH → H 1800
G → 0.8 C + 2 NADH 1800
O2 + NADH → ATP 1800
Table 2: Exchange and metabolic reactions, together with their catalytic constants $k_{\mathrm{cat}}$. For reversible reactions, the same value is used for the forward and the backward reaction.
Biomass reactions