An Algorithm for Detecting Fixed Points of Boolean Networks
In the applications of Boolean networks to modeling biological systems, an important computational problem is the detection of the fixed points of these networks. This is an NP-complete problem in general. There have been various attempts to develop algorithms to address the computation need for large size Boolean networks. The existing methods are usually based on known algorithms and thus limited to the situations where these known algorithms can apply. In this paper, we propose a novel approach to this problem. We show that any system of Boolean equations is equivalent to one Boolean equation, and thus it is possible to divide the polynomial equation system which defines the fixed points of a Boolean network into subsystems that can be solved easily. After solving these subsystems and thus reducing the number of states, we can combine the solutions to obtain all fixed points of the given network. This approach does not depend on other algorithms and it is straightforward and easy to implement. We show that our method can handle large size Boolean networks, and demonstrate its effectiveness by using MAPLE to compute the fixed points of Boolean networks with hundreds of nodes and thousands of interactions.
Boolean networks were introduced in  as random models of genetic regulatory networks to study biological systems. A recent research focus of Boolean networks is to develop theories and algorithms to address questions arise from biological applications , , , -, , . To aid the study of complex biological systems, where experiments are usually expensive and time consuming, researchers use mathematical models built based on partial experimental information of these biological systems. Boolean networks offer relatively simpler such models which are capable of capturing some of the key dynamical properties ,, such as the stable states, of the underlying systems. As discrete time finite state dynamical systems, Boolean networks will eventually revert to certain sets of states called attractors. These attractors encode the long term behaviors of the underlying biological systems, and can be divided into two categories: stable states (fixed points) and cyclic states. The purpose of this paper is to develop an effective method for detecting the fixed points of these networks.
Different approaches for the detection of fixed points of Boolean networks exist in the literature. In , an approach which is search/recursive in nature was given. According to , the proposed algorithms can identify all fixed points of a random Boolean network with maximum indegree (the number of variables that each nodes depends on) with an average time ( is the number of variables the whole Boolean network depends on, which is also the number of nodes). In the worst case, however, it can take up to time . Another approach, which is based on the -satisfiability problem related algorithms and methods, has been developed in several recent publications (see , , and the references therein). In , the result of applying algorithms of solving constraint satisfaction problems to the detection of fixed points of some randomly generated Boolean networks was reported. According to , this method performs well for Boolean networks with indegree , and the computation will be exponential with indegree . This is because there exist polynomial time algorithms for the satisfiability problem, but the satisfiability problem is NP-complete for , . According to , the algorithm there can detect a fixed point of an AND/OR (only one of these operations is allowed for each node) Boolean network with non-restricted indegree in time . The satisfiability problem concerns whether or not there is a solution, not how to find all solutions. Thus further developments are needed before these algorithms can find more real applications. A computational algebra approach to the theory of dynamical systems over finite fields was developed in , , . A key concept of computational algebra, the Gröbner bases of a polynomial system, can also be employed in the detection of the fixed points of a Boolean network by first compute a Gröbner basis of the polynomial system that defines the fixed points, and thus make the system easier to solve. Though the computation of a Gröbner basis of a polynomial system over a finite field is faster than the computations over the real or the complex numbers, especially for the Boolean case, our capability to perform such a computation is still rather limited, due to the fact that the computation of a Gröbner basis may require time that is doubly-exponential.
There are also publications concerning the connection between fixed points and the topology structures of Boolean networks. The fact that genetic networks with canalyzing Boolean rules are always stable was reported in , and these Boolean networks were subsequently studied by . The problem of when a Boolean network in which all the up-dating rules are defined by monomials is a fixed point system was investigated in . In , via minimizing a cost function over a family of Boolean networks having a common set of fixed points, the intervention in a family of Boolean networks was studied. In , the impact of function perturbations to a Boolean network’s fixed points in the form of a one-bit change of the truth table was investigated. In , the consistency of partial information on a Boolean network and a given set of fixed points was considered, and a testable necessary and sufficient condition for consistency was derived. Some discussions on the effects of topology of Boolean networks to their long term behaviors can be found in , , .
The main result of this paper is a method for solving systems of Boolean equations arise from applications to biological systems. It is known that, though there are hardly any biological networks (for example, gene regulatory networks) with each and every node depends on other nodes, these networks are also not densely connected. There can be nodes with many connections, but most nodes depend on a few other nodes. We have observed that, though it is impossible to treat large size Boolean network using the exhaustive enumeration method since a Boolean network with nodes has states, with today’s standard home and office PCs, the computing time for solving a system of Boolean equations for variables, even using exhaustive enumeration, is rather short. For example, if , then the computation usually takes about seconds (see examples later). Therefore, if we can divide a system of Boolean equations into subsystems according to the number of variables involved (different subsystems can have common variables) such that each subsystem can be solved easily, say by using the exhaustive enumeration method, then by patching the solutions of the subsystems together, we should be able to find all the solutions. This turned out working quite well, since biological Boolean networks usually permit such a division, and the number of solutions of a subsystem with variables is . So by solving the subsystems first, we can reduce a seemingly intractable enumeration problem to a feasible one. This approach is straightforward, does not rely on any other algorithm, and is capable of solving large systems. This method also applies to Boolean networks which have not been considered in the literature so far (to the best of the author’s knowledge). For example, it applies to “community-like” networks, i.e. those networks where the nodes in each community (with reasonable size) can be densely connected while the communities of the network are sparsely connected, these networks can have the average number of connections of each node .
2. Theory and Algorithm
A Boolean network with nodes can be given by a Boolean polynomial function
where is the state space of all sequences of length formed by and , and are Boolean polynomials in variables . We can use either the logical operations OR (), AND (), and NOT (), or the modulo arithmetic operations addition and multiplication, to perform the calculations for Boolean variables and polynomials. The correspondences are given by:
To study the dynamical properties of a Boolean network, we consider the time-discrete dynamical system defined by:
That is, the functions , give the updating rules for the nodes, and the state of the th node at time is given by the function value . For gene regulatory networks, the variables represent the genes and the functions give the gene regulatory rules. If , then the corresponding gene is expressed (ON); and if , then the gene is not expressed (OFF).
The state space graph of a Boolean network is a directed graph with the vertices (states) given by the set , and with the directed edges defined by the function : there is a directed edge from vertex to vertex if the value of at is . The dependency graph of is a directed graph with nodes such that there exists a direct edge from node to node ( is allowed) if and only if depends on the variable .
A state is a fixed point of if it is a solution of the system of equations
To describe our method, we change the above system of equations to a different equivalent form. We consider the set of Boolean polynomials
Let . If , we write
Recall that a set of nonempty subsets of is a partition of if
We can now state the following theorem.
Let be defined by (2.1) and let be a partition of . Then a state is a fixed point of if and only if
In particular, is a fixed point of if and only if .
The system of equations given by (2.2) is equivalent to
which in turn is equivalent to
If is a partition of , then
so if and only if all . ∎
By using the above argument, one can convert a satisfiability problem to a problem of finding a fixed point of a Boolean network, and vice versa. This implies immediately that detecting a fixed point of a Boolean network is an NP-complete problem.
The above theorem also implies that any system of Boolean equations is equivalent to a single Boolean equation.
As a consequence of the above theorem, we have the following procedure of detecting the fixed points of a Boolean network (i.e. an algorithm for solving a system of Boolean equations).
Boolean network fixed points detection algorithm.
INPUT: A Boolean network defined as in (2.1).
OUTPUT: Fixed points of .
1. Choose a threshold level (a positive integer) such that any Boolean equation with the number of variables can be solved easily.
2. Set . Simplify the system (reduce the number of variables) using obvious relations such as or (for ) by making the substitutions or into the ’s.
3. Divide into subsets such that for each , the number of variables involved in the subsystem is (but as close to as possible), and solve each subsystem separately.
4. Combine the solutions of each subsystem to obtain the fixed points of .
Note that the threshold level depends on the hardware and the method employed to solve these equations. For exhaustive enumeration method on standard PCs, we can use . Note that different subsystems are allowed to have common variables, and for each , one can just solve (or ). Note also that parallel computation can be used in both step 3 and step 4.
Algorithm analysis. It is clear that the success of the above algorithm depends on whether the whole system can be divided into subsystems according to the threshold level such that the number of subsystems () is relatively small compare to the total number of nodes (). For example, this will not be the case if the dependency graph is a complete graph. As mentioned in the introduction, biological networks as well as community-like networks can be divided. Basically, Boolean networks with small average connections, for example , can always be divided, but those with average connections may or may not be divisible depending on the actual networks and the method employed to solve them. Assume that exhaustive enumeration is used to solve the subsystems. From the actual gene regulatory networks in the literature, we can assume that, with the threshold level , the average number of equations in each subsystem is between and (see examples in the next section). If solving one of these subsystems takes about seconds, then the total time of solving these subsystems is approximately equal to seconds. So the computation time is up to the time needed for combining the solutions of these subsystems. This depends on the number of fixed points of the Boolean network . In general, the more fixed points has, the longer the computation (compare the examples in the next section), since if has a large number of fixed points, then even verifying that all these points are fixed points could be a problem.
In this section, we present several examples for our algorithm. The first three are gene regulatory networks from the references. The last two were simulated based on the gene regulatory networks published in the literature. The subsystems were solved using exhaustive enumerations. All computations were done using MAPLE on a Dell laptop with the system: Intel(R)Core(TM)2 Duo CPU T9900@3.06GHz with 3.5 GB RAM.
Example 3.1. Our first example is the gene regulatory network published in . This Boolean network models the expression pattern of the segment polarity genes in the fruit fly Drosophila melanogaster and has nodes. The polynomial system is given in the Appendix. There is no need to divide the system, after step 2 of Algorithm 2.1, the resulted equation to be solve is
The computation for solving this equation took second, and fixed points were detected (see supplement MAPLE worksheet).
Example 3.2. This example is the T-LGL survival signaling Boolean network given by the diagram of Fig. 2B in . This network has nodes (see Appendix). The equation we obtained after step 2 of Algorithm 2.1 is
The computation for solving this equation took second, and fixed points were detected (see supplement MAPLE worksheet).
Example 3.3. This example is the T-Cell receptor signaling Boolean model given by Fig. 1 in . This network has nodes. We derived the Boolean polynomial functions of the nodes according to the interactions given in the diagram (see Appendix). After step 2 of Algorithm 2.1, with the number of variables threshold at , the entire system of equations that defines the fixed points was divided into subsystems. Subsystem 1 involves variables and equations, which was solved in seconds. Subsystem 2 involves variables and equations, which was solved in seconds. Subsystem 3 involves variables and equations, which was solved in second. Putting the solutions of these subsystems together took seconds. A total of fixed points were detected (see supplement MAPLE printout).
Example 3.4. This is a simulated Boolean network with nodes. The dependency graph (Fig. 1) is generated from the polynomial functions (see supplement MAPLE printout). With threshold at , after step 2 of Algorithm 2.1, the fixed point system of equations was divided into subsystems. The solutions of subsystems 1 to 4 were combined first, then the solutions of subsystems 5 to 7 were combined, and finally the resulted two sets of solutions were combined to obtain the fixed points. The total computation time was approximately hours and a total of fixed points were detected. The majority of the computation time was used to combine the solutions of subsystems 1 to 4 with the solutions of subsystems 5 to 7.
Example 3.5. This is a simulated Boolean network with nodes and interactions. The dependency graph (Fig. 2) is generated from the polynomial functions (see supplement MAPLE printout). With the threshold level at , the whole system was divided into subsystems, the dividing time was seconds, in which seconds were used in reading the inputed network. The total time for solving these subsystems was minutes, and the total time for combining these solutions was minutes. fixed points were detected.
4. Concluding Remarks
We have developed a new approach to solve systems of Boolean equations. With the computation of the fixed points of complex biological Boolean networks in mind, we developed our approach based on the characteristic of these networks, though it also applies to Boolean networks broadly. Our algorithm is self-contained, not an application of other algorithms, and thus it applies to Boolean networks beyond those have been considered before. The approach is especially adaptable to large networks assembled from smaller components , since these networks are naturally divisible. To demonstrate the effectiveness of our algorithm, we provided several examples of Boolean networks. The first two examples were included to show that exhaustive enumeration method can solve this problem for Boolean networks of sizes between and in less than a second with today’s standard PCs, which provides the supporting evidence for our approach. The third example is the Boolean network published in . According to the authors, this network was the largest Boolean model of a cellular network known to them at the time of publication. Our algorithm used less than seconds to detect all fixed points of this Boolean network using MAPLE. The two simulated examples are substantially larger than the one in . Thus we believe that our method will offer a useful tool for analyzing Boolean models, in particular, Boolean networks of biological systems.
-  Albert, R. and Othmer, H. (2003), The topology of the regulatory interactions predicts the expression pattern of the segment polarity genes in Drosophila melanogaster, J. Theor. Biol. 223, 1-18.
-  Bonneau, R. (2008), Learning biological networks: from modules to dynamics, Nature Chemical Biology 4, 658-664.
-  Choudhary, A., Datta, A., Bittner, M. L., and Dougherty, E. R. (2006), Intervention in a family of Boolean networks, Bioinformatics 22 (2), 226-232.
-  Colón-Reyes, O., Laubenbacher, R., and Pareigis, B. (2004), Boolean monomial dynamical systems, Annals of Combinatorics 8, 425-439.
-  Davidson, E. H. and Levine, M. S. (2008), Properties of developmental gene regulatory networks, PNAS 105(51), 20063-20066.
-  Devloo, V., Hansen, P., and Labbé, M. (2003), Identification of all steady states in large networks by logical analysis, Bulletin of Mathematical Biology 65, 1025-1051.
-  Jarrah, A., Laubenbacher, R., Stigler, B., and Stillman, M. (2007), Reverse-engineering of polynomial dynamical systems, Adv. in Appl. Math., 39 (4), 477-489.
-  Jarrah, A., Raposa, B., and Laubenbacher, R. (2007), Nested canalyzing, unate cascade, and polynomial functions, Physica D, 233, 167-174.
-  Karlebach, G. and Shamir, R. (2008), Modelling and analysis of gene regulatory networks, Nature Reviews Molecular Cell Biology 9, 770-780.
-  Kauffman, S. A. (1969), Metabolic stability and epigenesis in randomly constructed genetic nets, J. Theor. Biol. 22, 437-467.
-  Kauffman, S., Peterson, C., Samuelsson, B., and Troein, C. (2004), Genetic networks with canalyzing Boolean rules are always stable, PNAS 101(49), 17102-17107.
-  Kinoshita, S-i., Iguchi, K., and Yamada, H. S. (2009), Intrinsic properties of Boolean dynamics in complex networks, J. Theor. Biol. 256, 351-369.
-  Laubenbacher, R. and Stigler, B. (2004), A computational algebra approach to the reverse engineering of gene regulatory networks, J. Theor. Biol. 229, 523-537.
-  Nochomovitz, Y. D. and Li, H. (2006), Highly designable phenotypes and mutational buffers emerge from a systematic mapping between network topology and dynamic output, PNSA 103(11), 4180-4185.
-  Oikonomou, P. and Cluzel, P., (2006), Effects of topology on network evolution, Nature Physics 2, 532-536.
-  Pal, R., Ivanov, I., Datta, A., Bittner, M. L., and Dougherty, E. R. (2005), Generating Boolean networks with a prescribed attractor structure, Bioinformatics 21(21), 4021-4025.
-  Pomerance, A., Ott, E., Girvan, M., and Losert, W. (2009), The effect of network topology on the stability of discrete state models of genetic control, PNAS 106(20), 8209-8214.
-  Purnick, P. E. M., and Weiss, R. (2009), The second wave of synthetic biology: from modules to systems, Nature Reviews Molecular Cell Biology 10, 410-422.
-  Saez-Rodriguez, J. et al. (2007), A logical model provides insights into T cell receptor signaling, PLoS Computational Biology 3(8): e163.
-  Tamura, T. and Akutsu, T. (2009), Detecting a Singleton Attractor in a Boolean Network Utilizing SAT Algorithms, IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E92-A (2), 493-501.
-  Xiao, Y. and Dougherty, E. R. (2007), The impact of function perturbations in Boolean networks, Bioinformatics, 23(10), 1265-1273.
-  Zhang, R. et al. (2008), Network model of survival signaling in large granular lymphocyte leukemia, PNAS 105(42), 16308-16313.
-  Zhang, S-Q. et al. (2007), Algorithms for finding small attractors in Boolean networks, EURASIP Journal on Bioinformatics and Systems Biology, doi:10.1155/2007/20180.
-  Zou, Y.M. (2010), Modeling and analyzing complex biological networks incooperating experimental information on both network topology and stable states, Bioinformatics 26(16), 2037-2041.
We provide the correspondences between gene names and the variables for the three Boolean networks cited from the literature. We refer the reader to the references for the original networks. The lengthy polynomial systems of Example 4 and 5 are provided in the MAPLE sheets.
Boolean Networks of 
We introduce the variables as follows:
Then the Boolean network is given by the following polynomial functions:
The Boolean Network of 
This is the Boolean network given by the diagram of Fig. 2B in . We introduce the variables as follows:
Then the Boolean network is given by
The Boolean network of 
We introduce the variables as follows:
Then the Boolean network is given by