A Scalable Algorithm to Explore the Gibbs Energy Landscape of Genomescale Metabolic Networks
Abstract
Abstract – The integration of various types of genomic data into predictive models of biological networks is one of the main challenges currently faced by computational biology. Constraintbased models in particular play a key role in the attempt to obtain a quantitative understanding of cellular metabolism at genome scale. In essence, their goal is to frame the metabolic capabilities of an organism based on minimal assumptions that describe the steady states of the underlying reaction network via suitable stoichiometric constraints, specifically mass balance and energy balance (i.e. thermodynamic feasibility). The implementation of these requirements to generate viable configurations of reaction fluxes and/or to test given flux profiles for thermodynamic feasibility can however prove to be computationally intensive. We propose here a fast and scalable stoichiometrybased method to explore the Gibbs energy landscape of a biochemical network at steady state. The method is applied to the problem of reconstructing the Gibbs energy landscape underlying metabolic activity in the human red blood cell, and to that of identifying and removing thermodynamically infeasible reaction cycles in the Escherichia coli metabolic network (iAF1260). In the former case, we produce consistent predictions for chemical potentials (or logconcentrations) of intracellular metabolites; in the latter, we identify a restricted set of loops (23 in total) in the periplasmic and cytoplasmic core as the origin of thermodynamic infeasibility in a large sample () of flux configurations generated randomly and compatibly with the prior information available on reaction reversibility.
Author Summary – The operation of biological systems is constrained under all circumstances by the laws of physics. Thermodynamics, in particular, dictates preferential directions in which biochemical reactions should flow at stationarity. When applied to cellular reaction systems (like metabolic networks), it favors the emergence of some (thermodynamically feasible) ways to organize the flow of matter while prohibiting others. The development of detailed predictive models for the biochemical activity of a cell relies on the possibility to integrate the laws of thermodynamics in genomescale reconstructions of cellular metabolic networks. In this work we have devised an efficient relaxation algorithm to implement thermodynamic constraints in genomescale models. Besides allowing to check for thermodynamic feasibility of reaction flow configurations, it is also capable of providing information on other relevant physicochemical quantities. We have applied it to two cellular metabolic networks of different complexity, namely that of human red blood cells and that of the bacterium Escherichia coli. In the former case, we have obtained predictions for the intracellular chemical state (in terms of metabolite concentrations and reaction free energies) consistent with empirical knowledge; in the latter, we have effectively corrected thermodynamically infeasible flux configurations.
pacs:
Valid PACS appear hereIntroduction
Constraintbased models of cellular metabolism are important tools to analyze and predict the chemical activity and response to perturbations of cells without relying on kinetic details that are often unavailable. In such frameworks, the metabolic capabilities of a cell are inferred from the overall configuration space compatible with minimal physicochemical constraints describing the nonequilibrium steady state of the underlying reaction network. First, feasible reaction flux vectors need to satisfy massbalance conditions. Then, according to the second law of thermodynamics, in an open chemical network at steady state and constant temperature and pressure the direction of each reaction should ensure a decrease in Gibbs energy. Thermodynamic consistency of flux configurations satisfying massbalance alone is in general not guaranteed due to the presence of infeasible cycles book ; beard1 ; beard2 , even if reaction reversibility is preassigned based on careful estimations of chemical potentials in physiologic conditions alberty (a procedure that was recently extended to genomescale fleming ; hatz ). Besides the flux organization, several other aspects involved in the analysis of genomescale metabolic networks hinge directly on the explicit inclusion of thermodynamic constraints into the models, like the estimation of metabolite concentrations or the identification of reactions subject to regulation kummel2 .
Much work has been concerned with implementing thermodynamic constraints in genomescale models of metabolism. The removal of thermodynamic inconsistencies was proved to be useful in estimating concentrations and affinities besides fluxes in FluxBalanceAnalysis kau ; palrev , whose goal is to identify massbalanced flux configurations maximizing a predetermined physiological objective function sauer ; biomass . This has been achieved for instance by building mixed integerlinear or nonlinear optimization problems that minimize the Euclidean distance of concentration levels from experimentally known values hoppe , or ensure the absence of cycles in the resulting flux pattern tbmfa ; cobrall ; hepat . On the other hand, information on feasible Gibbs energy ranges can be retrieved by exploiting the patterns of reaction interconnections encoded in the stoichiometry to narrow the experimental bounds kummel2 . These procedures may however require reliable prior thermodynamic information on metabolites and/or reactions, a type of knowledge that is often unavailable. An important lesson of this kind of approaches is that the key input for the thermodynamic profiling of a reaction network is often provided by the stoichiometric matrix beardstoc .
The scalability of algorithms to solve mixed integerlinear (or nonlinear) programming problems may become an issue when the underlying network size is large or when one is interested in sampling the solution space (for both free energies and fluxes) rather than focusing on a potentially small set of configurations (e.g. optima). Luckily, however, solutions to computationally hard problems can often be generated efficiently with the help of heuristic algorithms based on simple local rules. The use of messagepassing algorithms to characterize the highdimensional volume of the solution space of FBA models BMC (with a convex, continuous solution space) or to solve large combinatorial constraintsatisfaction problems KSAT (with a discrete and possibly fragmented solution space) is an example of the success of this kind of strategy.
Our goal in this paper is to obtain information about the landscape of Gibbs free energies compatible with a given vector of reaction directions by following a route that allows to use all stoichiometric information via heuristics inspired by perceptron learning. In a nutshell, the method we propose consists in exploiting the network’s structure to iteratively build up correlations between the chemical potentials of the reacting species starting from a seed of empirical biochemical knowledge, until a thermodynamically consistent profile is achieved. The resulting algorithm is completely scalable and can be employed for different purposes, like checking the feasibility of flux configurations, identifying and removing infeasible cycles, estimating reaction affinities, and obtaining bounds for (log)concentrations and free energies of formation. In the following, we describe the method in detail, providing a mathematical proof of convergence as well as theoretical arguments highlighting the main idea behind the procedure. As applications, we focus on two metabolic networks of rather different complexity. First, we shall obtain a detailed reconstruction of the Gibbs energy landscape underlying metabolic activity in the human red blood cell (hRBC) starting from the flux maps obtained in palsson ; andrea . Then, the metabolic network of Escherichia coli, iAF1260 bigcoli , will be analyzed to eliminate infeasible cycles from randomly generated flux configurations.
Materials and Methods
Materials
The cellular systems analyzed in this study are (i) the model of the hRBC metabolism developed inWiback:2002fk and discussed in palsson , and (ii) the reconstructed metabolic network of the bacterium Escherichia coli iAF1260, presented in bigcoli . The former consists of 35 intracellular reactions among metabolites subject to uptake fluxes. The latter includes reactions among metabolites. The basic information extracted from these models is the stoichiometric matrix (with and the number of metabolites and reactions, respectively), denoted below as . For Escherichia coli, we will consider the inner matrix of periplasmic and cytoplasmic reactions without repetitions, which consists of reactions among chemical species, once periplasmic and cytoplasmic metabolites are distinguished. According to the reversibility assignment given in bigcoli , of the processes above are unidirectional, with being reversible. The biochemical data we shall refer to or make use of include the standard free energies of formation of metabolites, given in kummel , where they are computed at , , and ionic strenght , according to the prescriptions of alberty . The estimated intracellular concentration ranges for the hRBC were extracted from the Bionumbers database Milo:2010fk and refer to measurements in different settings. It is worth to notice that the experimental errors on such values reflect the intrinsic uncertainty due to statistical celltocell fluctuations (since measurements of concentration levels are usually carried out by averaging over numbers of cells ranging from to ) and analytical error. The stoichiometric matrix and thermodynamic potentials employed for the analysis of the hRBC are respectively available as Supporting Datasets S1 and S2.
Methods
Algorithm to compute chemical potentials
According to the second law of thermodynamics, in an open system at constant temperature and pressure the Gibbs energy (where , , are respectively the energy, volume and entropy of the system) never increases spontaneously. This means that the direction ( for forward, for backward) of every chemical reaction occurring in the system should be opposite to the Gibbs energy change induced by reaction , i.e.
(1) 
The equality holds if reaction is in equilibrium. Denoting by the stoichiometric coefficient of reactant in reaction , with the standard sign convention to distinguish substrates () from products (), the vector of ’s for a wellmixed system can be written in terms of the chemical potentials (where is the Gibbs energy per mole of species ) as (see e.g. book , Ch. 1)
(2) 
Given flux vectors such that (i.e. steadystate flux configurations), equation (2) implies that , i.e. that the ‘loop law’ holds. The Gibbs energy landscape reconstruction problem consists, given a vector of reaction directions (), in generating vectors that satisfy the system of linear inequalities
(3) 
Note that the solution space of (3) for fixed directions is convex (while nonconvexity can arise if directions are allowed to vary). Relaxation methods (see e.g. LP , Ch. 12, or gof ) are among the most effective procedures to find solutions of systems such as (3). These techniques date back at least to the Jacobi method for solving systems of linear equalities, and were extended to inequalities in the 1950’s agmon ; motz . In essence, they are iterative methods in which variables are updated so that at every iteration one of the violated inequalities is fixed. While this readjusts the entire vector without a guarantee that constraints that were previously satisfied will be broken, convergence to a solution (if it exists) is guaranteed if the update step is chosen wisely. We shall employ a relaxation algorithm known as MinOver, which was developed in the context of neural network learning mez , and has been employed, in a slightly modified form jstat , to explore the space of flux states compatible with minimal stability constraints à la Von Neumann pnas ; kyoto . Figure 1 displays a flowchart of the procedure for the present case.
One starts from a ‘trial’ probability distribution of chemical potential vectors. Its role for the moment is simply that of initializing the algorithm, which is done by generating a random vector under . For simplicity, one may think that , with prescribed distributions , e.g. uniform over a given interval: in this case each initial is selected randomly and independently from its trial distribution . On the other hand, might contain prior biochemical information, e.g. by being a uniform distribution centered around the known experimental values of and of sufficiently large width to span several orders of magnitude in concentrations. (The precise construction of for our case studies is discussed below.) The algorithm is based on the following steps:

Generate a chemical potential vector from .

Compute from (3) and (i.e., is the index of the least satisfied constraint).

If then is a thermodynamically consistent chemical potential vector for ; exit (or go to 1 to obtain a different solution).

If , update as
(4) (where is a constant and is the th column of matrix ), go to 2 and iterate.
As is generally true in MinOver schemes, the reinforcement term in (4) drives the gradual adjustment of chemical potentials by ensuring that, at every iteration, the least satisfied constraint (labeled ) is improved. Convergence to a solution, if one exists, is guaranteed for any . To see it, suppose that a vector exists, such that
(5) 
with a constant (in other words, is a solution of (3)). From Eq. (4), the chemical potential vector at the th iteration step, , satisfies
(6)  
where is the value taken by at step . Similarly, one finds
(7) 
with . As a consequence,
(8) 
However by the CauchySwartz inequality , and an upper bound for the number of steps can be obtained from which to leading orders in and gives
(9) 
where , and are constants proportional to . Hence starting from an initial random vector of chemical potentials the algorithm is able to obtain a new vector ensuring that (3) is satisfied. Reinitializing the algorithm from a different random vector sampled from allows to retrieve another solution and in turn explore the space of ’s satisfying (3).
In essence, the final outcome of multiple (random) initializations of the above algorithm is a set of correlated probability distributions for the ’s (at odds with the , which was assumed to be a product measure, so that no correlations were present initially). The algorithmic origin of such interdependencies can be understood considering that chemical potentials are updated dynamically through a series of reinforcement steps of the form . It follows that the final value of can be written as , where is the trial chemical potential sampled from (i.e. the initial value of ) and is an index which is updated (increased or decreased by one according to the sign of the reaction) each time reaction tries to invert. The connected correlations between chemical potentials (where is an average over all possible choices of the initial conditions) can thus be decomposed as
(10) 
where is the variance of and if and otherwise (so that ). In the Gaussian approximation for , the leading term (of ) in the above sum is
(11) 
Therefore to leading order, the dynamics tends to correlate (resp. anticorrelate) the chemical potentials of metabolites typically appearing on the same (resp. opposite) side of the reaction equations. In a sense, the above scheme allows to modify by building up correlations between chemical potentials according to the interconnections encoded in . Note that, at odds with the method proposed in kummel2 , the resulting ’s can exceed the initial bounds defined by .
If there is no prior information on the direction of some reactions (e.g. because they are putatively reversible), the corresponding constraints (3) are formally absent, as if . However, the above method still allows to retrieve information about the chemical potential of a metabolite involved in them, provided it is not employed in reversible processes only, in which case its clearly is never updated.
Finally, some observations are in order about the solution space of (3), which in general has the form of an unbounded cone passing trough the origin. If one is interested in uniform sampling the space of ’s making thermodynamically feasible, boundedness is an essential precondition. The simplest way to obtain a bounded solution space consist in clamping some ’s, i.e. in keeping them fixed at definite values throughout the updating. Note that fixing some ’s is also crucial to set a scale for chemical potentials. The same effect can be achieved by assigning hard ranges of variability for potentials in the form of bounds like (e.g. according to experimental or biochemical information). Such inequalities can simply be added to the list of thermodynamic constraints (3). Alternatively, one may add other types of global, nonhomogeneous constraints bearing a physical justification. For instance, if uptake fluxes are included in , the chemical potentials of the external metabolites should be fixed; or, volume constraints for the feasible ranges of concentrations could be added if the standard free energies of formation are known. Once this is done, the system (3) becomes inhomogeneous and its solution space is formed by the union of a convex polyhedron and a cone; boundedness can be achieved if and only if the cone shrinks to the null vector, which occurs if the related homogeneous system of equations has no solutions apart from the trivial one, solodov . In synthesis, one can obtain a bounded solution space for (3) by clamping the chemical potential of a sufficient number of metabolites to ensure that the homogeneous system of equalities associated to the inhomogeneous system of inequalities obtained by clamping some ’s in (3) admits the null vector as its only solution.
A number of interesting theoretical and computational questions arise at this stage, regarding e.g. the minimal amount of prior information on chemical potentials needed to bound the solution space of (3), or computationally efficient and scalable methods to obtain uniform sampling (besides Monte Carlo, which may be infeasible at high dimensions as suggested by the “curse of dimensionality”, see e.g. Schellenberger:2009fk ). To our knowledge, there is no mathematical proof that MinOver schemes are capable of sampling a bounded solution space uniformly, although low dimensional tests suggest that this might indeed be the case pnas . Our goal in the present paper, rather than uniformly sampling the space of chemical potentials granting feasibility, is that of exploring feasible configurations “close” to the prior biochemical information we have injected. To quantify this idea, we have explicitly compared the solutions obtained via the above procedure with those retrieved from the minimization of a cost function (the average Euclidean distance between the prior and the solution) and by a standard relaxation method (see Results: Exploring the Gibbs energy landscape of the hRBC metabolic network). The heuristics we present indeed turns out to roughly minimize , with the advantage of being considerably faster than a penalty method. In addition, it allows to access refined information on the Gibbs energy landscape (i.e. compute chemical potentials and Gibbs energy changes) even when the initialization is complemented by a noisy or inconsistent biochemical prior, since the algorithm correctly identifies and gradually removes inconsistencies. The solutions thus obtained identify a restricted and statistically wellbehaved set of chemical potentials with physiological significance. We are currently unable to go beyond this point. In addition, we shall see that by the same method one can verify the thermodynamic consistency of, and eventually adjust, specific flux configurations of biochemical reaction networks.
Algorithm to identify and remove loops
The algorithm just discussed generates chemical potential vectors given a thermodynamically feasible vector of reaction directions. A generic assignment of reaction directions, however, could be such that the system (3) has no solutions apart from the trivial one. In accordance with the FarkasMinkowski lemma solodov this happens if and only if there is at least one infeasible loop, i.e. if there is a set of intracellular reactions for which positive constants exist such that
(12) 
In presence of a loop, relaxation methods (MinOver included) do not converge, just because the least satisfied constraint moves along the loop causing the iteration to cycle indefinitely. More precisely, in presence of a loop the MinOver dynamics becomes periodic or almost periodic. For a periodic dynamics so that, using (4), for any we have
(13) 
Comparing this with (12) it can be gathered that reactions updated over a period define a loop of length (setting to the number of times constraint has been updated). This suggests a simple way to correct configurations of reaction directions that are thermodynamically infeasible to start with:

While running the algorithm to compute chemical potentials for a large number of iteration steps , keep track of the last say least unsatisfied constraints, i.e. store for , with a reasonably large number (e.g. ), and count the number of different reactions appearing in the series.

Search, within such subset of reactions, for a loop of length by looking for solutions to equation (12) with for reactions only, for all uples, starting from and increasing .

If a loop is found, change the direction of one of its reversible reactions chosen with uniform probability.
In the Results section we shall use this heuristics to spot loops and eliminate them in all of the infeasible configurations we shall generate for a large metabolic network (E. coli iAF 1260). For this network, it turns out that in all runs and that accounting for short loops (of length up to ) suffices to correct all randomlygenerated configurations we tested. This is rather important since, in principle, step (ii) of the above procedure could be exponential in .
A computer code implementing the algorithms to compute chemical potentials and identify and remove infeasible loops is downloadable from http://chimera.roma1.infn.it/SYSBIO/
Results
Exploring the Gibbs energy landscape of the hRBC metabolic network
As a first application, we have employed the MinOver scheme outlined above to analyze the thermodynamic landscape of the hRBC metabolic network. As a starting point, we have considered the flux configurations obtained in palsson and andrea respectively by Monte Carlo sampling of the solution space of mass balance equations (MBE) and by MinOver sampling of states compatible with Von Neumann’s constraints (VNC). In brief, MBE describe steadystate fluxes in terms of Kirchhofftype laws enforced at metabolite nodes of the network as . In such a scenario, intracellular concentrations are clamped. VNC, instead, ‘soften’ the massbalance equations by requiring that, for intracellular metabolites, . In the underlying steady state intracellular concentrations can grow in time if flux vectors allowing for it exist. Once the nutrient availability is set, VNC define a selfconsistent flux problem where the system selects how much of the nutrients to use and, eventually, which metabolites are globally produced. For intracellular metabolites, VNC correspond to a stability requirement since, in a dynamical setup, a violation of one of the inequalities implies the existence of a metabolite whose total amount used as input in metabolic processes exceeds the total amount returned as output (see e.g. gale ; Martino:2005mi ). VNC are also closely linked to the metabolite producibility problem introduced in ib00 . We shall thus make use of the reaction direction vectors obtained in palsson ; andrea for the hRBC. In summary:

according to palsson , the net flux of all reactions is in the forward direction, except PGI, R5PI and ApK, which are found to operate bidirectionally;

according to andrea , the net flux of all reactions is in the forward direction, except R5PI (which is found to be operating bidirectionally), and PGI and ApK (which are found to have a net backward flux).
As a first step, we tested the thermodynamic feasibility of these direction assignments, solving (3) by starting from the vector . A solution is found for both MBE and VNC assignments. Both sets of assignments then turn out to be (expectedly) thermodynamically feasible. In this case, however, it is the emerging thermodynamic landscape that is of interest to us. As the initial distribution of chemical potentials we selected a product of independent uniform distributions . The for compounds with known empirical bounds on concentrations were centered at a value computed from the Gibbs energy of formation and the estimated intracellular concentration under the hypothesis of a dilute solution, i.e. at
(14) 
and were taken to span two standard deviations in concentrations. The for metabolites whose concentration estimates were unavailable (namely 6PGL, RL5P, X5P, R5P, S7P) were taken to be centered again at (14) but with , and were assumed to span four orders of magnitude uniformly in the chemical potential scale. Finally, the chemical potential of water was clamped.
Results for the estimated concentrations and Gibbs energy changes (computed from (2) using the final chemical potential vectors) are showcased in Figures 2 and 3, respectively.
Note that several of the bounds encoded in (black markers) indicate that the Gibbs energy change in a reaction is positive and in contrast with the reaction direction assignment from the flux problem. Specifically this happens for GAPDH, PGK, PGM, LDH, G6PDH, TA and PNPase, due either to the actual experimental estimates for the affinities or to the initial uncertainty we place on concentrations. Such bounds turn out to be altered by MinOver in a direction compatible with direction assignments based on mass balance, as final affinities display significant changes with respect to the picture embedded in . At the same time, we observe that in some cases the fluctuations of do exceed the initial boxes defined by , leading to an estimate for the concentration range also for metabolites whose level has not been experimentally probed (6PGL, RL5P, X5P, R5P, S7P). Our predictions for the levels of (1,3)diphosphoglycerate ((1,3)DPG), 2phosphoglycerate (2PG) and phosphoenolpyruvate (PEP) slightly differ from the experimental estimates. This is most likely a consequence of the fact that we are forcing the phosphoglycerate kinase (PGK) and the glyceraldehyde phosphate dehydrogenase (GAPDH) reactions in the forward direction, in agreement with the steady state direction assignments for glycolysis, even if the experimental values would classify them as reversible. In addition, we obtain levels of key metabolites like ATP and inorganic phosphate that differ slightly from experimental estimates, while our predictions for ADP and AMP fail under the MBE and VNC direction assignments, respectively. On one hand this could be due to errors in the prior information on standard free energies but on the other hand, precise experimental estimates of the levels of such highly interchanging metabolites might be difficult to achieve.
We can now quantify the extent to which the solutions we generate are “close” to the prior. In Figure 4 we compare the Gibbs energy changes obtained for the hRBC using MBE directions in three different ways: (a) the MinOver algorithm introduced here, with update given by (4); (b) a penalty method defined by the update rule
(15) 
where and are constants and the sum extends over all ’s such that ; (c) a standard relaxation method defined by the update rule
(16) 
In short, algorithm (b) minimizes the Euclidean distance between the prior and the solution under the thermodynamic constraint, which is enforced through the term proportional to . The relaxation method (c) simply corresponds to a particular choice of the constant appearing in (4): in specific, the step size is required to be proportional to the amount by which the least satisfied constraint is violated. One sees that the penalty method and MinOver produce almost identical solutions. Measuring explicitly the average Euclidean distance between the prior and the solution (the average being taken over the choice of the priors), one indeed finds (penalty method) versus (MinOver), while for the relaxation method (c). Note that the gap between the performance of MinOver and that of the penalty method (, corresponding to a relative error on of just over 1%) provides a very rough estimate of the average distance between the solutions obtained by MinOver and those obtained by cost function minimization, and is much smaller than the spread of the initial configurations of chemical potentials, which in this case is about . Comparing running times for this case, moreover, one sees that MinOver is about times faster than the penalty method ( versus roughly seconds), while the standard relaxation method is even faster (about seconds).
Identifying and removing thermodynamically infeasible loops in the Escherichia coli metabolic network model iAF1260
The application that we have just discussed shows that the algorithm we present can provide information on the Gibbs energy landscape, even correcting inconsistent input knowledge. We shall now employ the procedure outlined in the Materials and Methods section to efficiently identify and eliminate loops from thermodynamically infeasible flux configurations of the reconstructed metabolic network of Escherichia coli iAF1260 bigcoli . We shall focus specifically on the periplasmic and cytoplasmic core of the network, which presents the advantage that the cycles identified here are independent of the transport and environment selections. The network includes reactions among chemical species.
Since we are not focusing on the reconstruction of the Gibbs energy landscape but simply on the existence of solutions of (3), a detailed biochemical prior is not needed. Therefore, for the present purposes we have taken to be product of identical uniform distributions. To begin with, we have fixed the direction of reactions that are putatively irreversible in the reconstructed network ( in total) and verified that this assignment is indeed thermodynamically feasible by finding a solution of (3) restricted to irreversible processes. Then we integrated the above assignments by fixing randomly and independently with equal probability the directions of the processes that are putatively reversible. A large ensemble of vectors ( instances) thus obtained was tested for thermodynamic feasibility. Note that by excluding the possibility that reactions are not operating we are considering a worstcase scenario in which all reactions bear a nonzero flux. Only about 1.5% of these configurations turned out to be thermodynamically feasible, i.e. free from loops. We focus on infeasible instances, for which no vector of chemical potentials was found to satisfy (3), applying to them the loop identification and removal protocol. Figure 5 shows the histogram of convergence times for the above procedure, i.e. the times required to verify that a given vector of flux directions is thermodynamically infeasible and to correct it.
On an Intel dual core at 3.06 GHz the average CPU time for convergence is of the order of a few seconds, while it exceeds 10 seconds in about 5% of the (random) instances. In the worst case within our ensemble, convergence time was around 100 seconds.
We have furthermore studied the set of loops that were thus identified and corrected. Quite remarkably this analysis revealed that thermodynamical infeasibility is related to the presence of a small set of cycles, 23 in total. These are reported in Table 1 and three of them are depicted explicitly in Figure 6.
Note that some of these cycles include a single reversible reaction. This implies that in order to ensure that such cycles will not be present in the final configuration it is necessary to fix the direction of the reactions SERt2rpp, GLUt2rpp, ACt2rpp, GLYCLTt2rpp, THRt2rpp, SUCOAS, PPAKr and PROt2rpp opposite to that shown in Table 1 (see the Supporting Table S1 for abbreviations). In turn, this is easily seen to impose a further constraint on the direction of the reversible fluxes CAt6pp and GLUABUTt7pp (via cycles of length 4). Finally we checked that excluding these loops guarantees thermodynamic feasibility of randomly generated flux configurations
Discussion
Ideally, constraintbased models of metabolic activity allow to appraise the energetic potential of cells based on minimal constraints related to local massbalance and thermodynamic feasibility rules, possibly complemented with optimization principles that can encode for functional constraints. As a result, the flow of matter in nonequilibrium steady states could be characterized in terms of the Gibbs energy change of reactions, which specifies the directionality of interconversions, and of the average number of turnovers per time per volume, i.e. the flux, without the need of detailed information on enzyme kinetics or transport mechanisms. Thermodynamic constraints, strongly linked to overall intracellular conditions like ionic strength and pH stock , are particularly subtle and rich of consequences. It has indeed been argued that the Gibbs energy landscape contains important regulatory information kummel2 . Reactions far from equilibrium are expected to be roughly insensitive to fluctuations in metabolite concentrations, so that they will be driven mostly by enzyme regulation. On the other hand, reactions close to equilibrium (i.e. with a net Gibbs energy change close to zero) bear a high sensitivity to variations in metabolite levels and are therefore unlikely targets for tight regulation. Besides, knowledge of reaction free energies (or more precisely of the chemical potentials of the metabolites involved) provides clues on metabolite levels which would be hard to obtain from massbalance constraints only. Therefore, developing effective procedures to deal with the complexity of flux models encompassing both mass and energy constraints at genome scale is a central challenge for computational systems biology.
Many important steps have been taken recently to tackle it. At one level, thermodynamic feasibility can be translated into topologic constraints (‘absence of loops’) for the flux configuration emerging from massbalance constraints expa . This suggests than an improvement in reversibility assignments (e.g. along the lines of kummel ) can be a key to ensure energy balance a priori in metabolic network reconstructions, with the caveat that the possibility that a reaction reverses can depend on the boundary conditions (e.g. the external supply of a certain metabolite) or on intracellular perturbations (e.g. a knockout causing the accumulation of an intermediate metabolite) hoppe . Another possibility consists in building mixed integerlinear constraintbased models that include thermodynamic requirements in the form of consensus rules (using information on standard Gibbs free energies) tbmfa or as additional constraints on metabolite levels (using information on measured intracellular concentrations) hoppe . Here we have followed a different though related route that in our view complements the approaches just described. The starting point is the fact that, given a flux configuration, thermodynamic constraints can be written as simple stoichiometric inequalities for the chemical potentials. Feasibility implies the existence of a vector of chemical potentials satisfying such system. We have presented an algorithm that is able to construct solutions starting from a possibly limited and noisy biochemical prior. Our approach differs substantially from previous methods in that it relies on modifying the structure of correlations between chemical potentials (after fixing some variables to set a scale) using the stoichiometric information on reaction connectivity to drive the updating process. In this sense, the important difference with kummel2 is that the prior information is used only to initialize the algorithm, and a large flexibility is allowed in deforming it until a viable solution is obtained.
The usefulness of the algorithm in the analysis of genomescale networks has been tested in two different cases. For the metabolic network of the human red blood cell, our approach has proved capable of reconstructing the Gibbs energy landscape correcting inconsistent prior information. In turn, this has lead to predictions for intracellular metabolite levels. It is important to stress that the bounds on concentrations we have obtained (which vary rather heterogeneously across compounds) only reflect stoichiometric information. For the metabolic network of Escherichia coli, instead, we have focused on the problem of correcting thermodynamically infeasible flux states in the core formed by the periplasmic and cytoplasmic matrix. We have thoroughly analyzed a large ensemble of configurations of reaction directions, identifying the cycles responsible for thermodynamic inconsistency and correcting them in a very modest amount of CPU time. Quite intriguingly, we have related infeasibility to the existence of a relatively small number of short cycles in the flux configuration, whose removal suffices to ensure thermodynamic feasibility in worstcase flux configurations.
The main advantage of our method consists in our view in its efficient implementation. On the critical side, we point to two aspects that deserve further study. In first place, our tool requires flux configurations as inputs, i.e. it is still unable to produce thermodynamically feasible configurations of fluxes and chemical potentials starting from no previous reversibility hypothesis. However it may provide the basis of a more general procedure for the analysis of genomescale metabolic networks that couples flux and thermodynamic profiling, a challenging open problem in computational biology. Secondly, our method relies on prior biochemical information and it would be desirable for it to be effective even if much or most biochemical priors are unknown. As we pointed out, some information has to be injected into the problem for the sake of definiteness. The interesting question is therefore what is the minimum necessary prior needed to reconstruct the Gibbs energy landscape and how are predictions affected by restricted priors. Such problems are mathematical in nature and could bear a particularly high significance for modeling purposes.
We remark that the corrected flux configurations thus obtained, like the starting ones (which were drawn from a uniform product measure over reactions), are not guaranteed to be consistent with any steady state assumption. On the other hand, see Supporting Text S1, starting from a massbalanced configuration one retrieves another massbalanced configuration. Clearly, a method that directly generates thermodynamically steady state flux vectors and chemical potentials would be highly welcome. Nevertheless, we note that our analysis focuses on a worstcase scenario where all reactions bear a nonzero flux. In a more realistic case where reactions may be inactive it is reasonable to expect that the flux directions we generate will allow for a steady state. In turn, the identification of these loops might be weakly conditioned on the sample of flux assignments we employed. We can’t rule out the possibility that assigning directions based e.g. on mass balance constraints provides a selection criterion for flux configurations that differs substantially from the uniform measure we employed. However we expect that our sample allows to correctly identify loops involving at most reversible reactions, being the number of distinct direction assignments in our sample. Only loops that include a larger number of reversible processes might therefore play a role in a differently selected set of direction assignments.
Acknowledgments – This work was supported by the DREAM Seed Project of the Italian Institute of Technology (IIT) and by the joint IIT/Sapienza Lab Nanomedicine . The IIT Platform “Computation” is gratefully acknowledged.
Cycle ID  Lenght  Formula 

1  3  SERt4pp NAt3pp SERt2rpp(R) 
2  3  NAt3pp GLUt4pp GLUt2rpp(R) 
3  3  NAt3pp ACt2rpp(R) ACt4pp 
4  3  NAt3pp GLYCLTt2rpp(R) GLYCLTt4pp 
5  3  PROt4pp PROt2rpp(R) NAt3pp 
6  3  HPYRRx TRSARr(R) HPYRI(R) 
7  3  CRNDt2rpp(R) CRNt2rpp(R) CRNt8pp 
8  3  VPAMT ALATAL(R) VALTA(R) 
9  3  ABUTt2pp GLUABUTt7pp(R) GLUt2rpp(R) 
10  3  NAt3pp THRt2rpp(R) THRt4pp 
11  3  ADK3(R) ADK1(R) NDPK1(R) 
12  3  ACCOAL PPCSCT SUCOAS(R) 
13  3  PPAKr(R) ACCOAL PTA2 
14  4  ACt4pp CAt6pp(R) ACt2rpp(R) CA2t3pp 
15  4  CA2t3pp GLYCLTt2rpp(R) GLYCLTt4pp CAt6pp(R) 
16  4  SERt4pp CAt6pp(R) SERt2rpp(R) CA2t3pp 
17  4  GLUt4pp CAt6pp(R) CA2t3pp GLUt2rpp(R) 
18  4  CA2t3pp PROt2rpp(R) PROt4pp CAt6pp(R) 
19  4  THRt4pp CAt6pp(R) THRt2rpp(R) CA2t3pp 
20  5  ADK1(R) ACKr(R) ACS PTAr(R) PPKr(R) 
21  5  R15BPK PPM(R) PRPPS(R) ADK1(R) R1PK 
22  6  R1PK NDPK1(R) PPM(R) PRPPS(R) R15BPK ADK3(R) 
23  6  ADK3(R) ACKr(R) PTAr(R) NDPK1(R) PPKr(R) ACS 
References
 (1) Beard DA, Qian H (2008) Chemical biophysics. Cambridge University Press, Cambridge UK
 (2) Beard D, Liang S, Qian H (2002) Energy balance for analysis of complex metabolic networks. Biophys. J. 83: 7986
 (3) Beard D, Babson E, Curtis E, Qian H (2004) Thermodynamic constraints for biochemical networks. J. Theor. Biology 228: 327333
 (4) Alberty RA, CornishBowden A, Goldberg RN, Hammes GG, Tipton K, Westerhoff HV (2011) Recommendations for terminology and databases for biochemical thermodynamics. Biophys. Chem. 155: 89103
 (5) Jankowski MD, Henry CS, Broadbelt LJ, Hatzimanikatis V (2008) Group contribution method for thermodynamic analysis of complex metabolic networks. Biophys. J. 95: 14871499
 (6) Fleming RMT, Thiele I, Nasheuer HP (2009) Quantitative assignment of reaction directionality in constraintbased models of metabolism: application to Escherichia coli. Biophys. Chem. 145: 4756
 (7) Kümmel A, Panke S, Heinemann M (2006) Putative regulatory sites unraveled by networkembedded thermodynamic analysis of metabolome data. Mol. Sys. Biology 2: 2006.0034
 (8) Kauffman KJ, Prakash P, Edwards JS (2003) Advances in fluxbalance analysis. Curr. Opin. Biotechnol. 14: 491496
 (9) Orth JD, Thiele I, Palsson BØ (2010) What is flux balance analysis? Nature Biotechnology 28: 245248
 (10) Schütz R, Kuepfer L, Sauer U (2007) Systematic evaluation of objective functions for predicting intracellular fluxes in Escherichia coli. Mol. Sys. Biol. 3: 119
 (11) Feist AM, Palsson BØ (2010). The biomass objective function. Curr. Opin. Microbiol. 13: 344349
 (12) Hoppe A, Hoffmann S, Holzhütter HG (2007) Including metabolite concentrations into fluxbalance analysis: thermodynamic realizability as a constraint on flux distributions in metabolic networks. BMC Systems Biology 1: 23
 (13) Henry CS, Broadbelt LJ, Hatzimanikatis V (2007) Thermodynamicsbased metabolic flux analysis. Biophys. J. 92: 17921805
 (14) Schellenberger J, Lewis NE, Palsson BØ (2011) Elimination of thermodynamically infeasible loops in steadystate metabolic models. Biophys. J. 100: 544553
 (15) Beard DA, Qian H (2005) Thermodynamicbased computational profiling of cellular regulatory control in hepatocyte metabolism. Am. J. Physiol. Endocrinol. Metab. 288: E633E644
 (16) Yang F, Qian H, Beard DA (2005) Ab initio prediction of thermodynamically feasible reaction directions from biochemical network stoichiometry. Metab. Eng. 7: 251259
 (17) Braunstein A, Mulet R, Pagnani A (2008) Estimating the size of the solution space of metabolic networks. BMC Bioinformatics 9: 240
 (18) Mézard M, Parisi G, Zecchina R (2002) Analytic and algorithmic solution of random satisfiability problems. Science 297: 812815
 (19) Price ND, Schellenberger J, Palsson BØ (2004) Uniform sampling of steadystate flux spaces: means to design experiments and to interpret enzymopathies. Biophys. J. 87: 21722186
 (20) De Martino A, Granata D, Marinari E, Martelli C, Van Kerrebroeck V (2010) Optimal fluxes, reaction replaceability, and response to enzymopathies in the human red blood cell. J. Biomed. Biotechol. 2010: 415148
 (21) Feist AM, Henry CS, Reed JL, Krummenacker M, Joyce AR, Karp PD, Broadbelt LJ, Hatzimanikatis V, Palsson BØ (2007) A genomescale metabolic reconstruction for Escherichia coli K12 MG1655 that accounts for 1260 ORFs and thermodynamic information. Mol. Sys. Bio. 3: 121
 (22) Wiback A, Palsson BØ (2002) Extreme pathway analysis of human red blood cell metabolism. Biophys. J. 83: 808818
 (23) Kümmel A, Panke S, Heinemann M (2006) Systematic assignment of thermodynamic constraints in metabolic network models. BMC Bionformatics 7: 512
 (24) BioNumbers database, see R. Milo et al. (2010) BioNumbers–the database of key numbers in molecular and cell biology. Nucl. Acids Res. 38 (suppl. 1): D750D753
 (25) Schrijver A (1986) Theory of linear and integer programming. John Wiley & Sons Ltd
 (26) Goffin JL (1980) The relaxation method for solving systems of linear inequalities. Math. Oper. Res. 5: 388414
 (27) Agmon S (1954) The relaxation method for linear inequalities. Canadian J. Math 6: 382392
 (28) Motzkin TS, Schoenberg IJ (1954) The relaxation method for linear inequalities. Canadian J. Math 6: 393404
 (29) Krauth W, Mézard M (1987) Learning algorithms with optimal stability in neural networks. J. Phys. A: Math. Gen. 20: L745L752
 (30) De Martino A, Martelli C, Monasson R, Perez Castillo I (2007) Von Neumann’s expanding model on random graphs. J. Stat. Mech. P05012
 (31) Martelli C, De Martino A, Marinari E, Marsili M, Perez Castillo I (2009) Identifying essential genes in Escherichia coli from a metabolic optimization principle. Proc. Nat. Acad. Sci. USA 106: 26072611
 (32) De Martino A, Marinari E (2010) The solution space of metabolic networks: producibility, robustness and fluctuations. J. Phys. Conf. Ser. 233: 012019
 (33) Solodovnikov AS (1980) Systems of linear inequalities. The University of Chicago Press
 (34) Schellenberger J, Palsson B Ø(2009) Use of Randomized Sampling for Analysis of Metabolic Networks. J. Biol. Chemistry 284: 54575461
 (35) D Gale (1960) The theory of linear economic models. The University of Chicago Press
 (36) De Martino A, Marsili M (2005) Typical properties of optimal growth in the Von Neumann expanding model for large random economies. J. Stat. Mech. L09003
 (37) Imielinski M, Belta C, Rubin H, Halasz A (2006) Systematic Analysis of Conservation Relations in Escherichia coli GenomeScale Metabolic Network Reveals Novel Growth Media. Biophys. J. 90: 26592672
 (38) Maskow T, Von Stockar U (2005) How reliable are thermodynamic feasibility statements of biochemical pathways? Biotech. Bioeng. 92: 223230
 (39) Price ND, Famili I, Beard DA, Palsson BØ (2002) Extreme pathways and Kirchhoff’s second law. Biophys J. 83: 28792882