Efficient Algorithms and Routing Protocols for Handling Transient Single Node Failures
Single node failures represent more than 85% of all node
failures in today’s large communication networks such as the
Internet. Also, these node failures are usually
transient. Consequently, having the routing paths
globally recomputed does not pay off
since the failed nodes recover fairly quickly, and the recomputed
routing paths need to be discarded. Instead, we develop algorithms
and protocols for dealing with such transient single node failures
by suppressing the failure (instead of advertising it across the
network), and routing messages to the destination via alternate paths
that do not use the failed node.
We compare our solution to that of , which also
discusses such a proactive recovery scheme for handling
transient node failures. We show that our algorithms are faster
by an order of magnitude while our paths are equally good.
We show via simulation results that our paths are usually
within 15% of the optimal for randomly generated graph with
KEY WORDS: Network Protocols, Node Failure Recovery, Transient Node Failures, Alternate Path Routing.
1 Introduction
Let $G = (V, E)$ be an edge weighted graph that represents a computer network, where the weight (a positive real number), denoted by $cost(e)$, of an edge $e$ represents the cost (time) required to transmit a packet through the edge (link). The number of vertices ($|V|$) is $n$ and the number of edges ($|E|$) is $m$. It is well known that a shortest paths tree of a node $s$, $T_s$, specifies the fastest way of transmitting a message to node $s$ originating at any given node in the graph, under the assumption that messages can be transmitted at the specified costs. Under normal operation these routes are the fastest, but when the system carries heavy traffic on some links they might not be the best routes. These trees can be constructed (in polynomial time) by finding a shortest path between every pair of nodes. In this paper we consider the case when the nodes in the network are susceptible to transient faults. These are sporadic faults of at most one node at a time (the nodes are single- or multi-processor computers) that last for a relatively short period of time. This type of situation has been studied in the past because it represents most of the node failures occurring in networks. Single node failures represent more than 85% of all node failures [7]. Also, these node failures are usually transient, with 46% lasting less than a minute, and 86% lasting less than 10 minutes [7]. Because nodes fail for relatively short periods of time, propagating information about the failure throughout the network is not recommended.
In this paper we consider the case where the network is biconnected (2-node-connected), meaning that the deletion of a single node does not disconnect the network. Based on our previous assumptions about failures, a message originating at node $x$ with destination $s$ will be sent along the path specified by $T_s$ until it reaches node $s$ or a node (other than $s$) that failed. In the latter case, we need to use a recovery path to $s$ from that point. Since we assume single node faults and the graph is biconnected, such a path always exists. We call this problem of finding the recovery paths the Single Node Failure Recovery (SNFR) problem. It is important to recognize that the recovery path depends heavily on the protocol being deployed in the system. Later on we discuss our (simple) routing protocol.
1.1 Preliminaries
Our communication network is modeled by an edge-weighted biconnected undirected graph $G = (V, E)$, with $n = |V|$ and $m = |E|$. Each edge $e \in E$ has an associated cost (weight), denoted by $cost(e)$, which is a non-negative real number. We use $p_G(s, t)$ to denote a shortest path between $s$ and $t$ in graph $G$, and $d_G(s, t)$ to denote its cost (weight).
A shortest path tree $T_s$ for a node $s$ is a collection of $n - 1$ edges of $G$ which form a spanning tree of $G$ such that the path from node $v$ to $s$ in $T_s$ is a shortest path from $v$ to $s$ in $G$. We say that $T_s$ is rooted at node $s$. With respect to this root we define the set of nodes that are the children of each node $x$ as follows. In $T_s$ we say that every node $y$ that is adjacent to $x$ such that $x$ is on the path in $T_s$ from $y$ to $s$, is a child of $x$. For each node $x$ in the shortest paths tree, $k_x$ denotes the number of children of $x$ in the tree, and $C_x = \{x_1, x_2, \ldots, x_{k_x}\}$ denotes this set of children of the node $x$. Also, $x$ is said to be the parent of each $x_i \in C_x$ in the tree $T_s$. With respect to $s$, the parent node, $p$, of a node $c$ is sometimes referred to as the primary neighbor or primary router of $c$, while $c$ is referred to as an upstream neighbor or upstream router of $p$. The children of a particular node are said to be siblings of each other. $V_x(T)$ denotes the set of nodes in the subtree of $x$ in the tree $T$ and $E_x \subseteq E$ denotes the set of all edges incident on the node $x$ in the graph $G$. We use $nextHop(x, y)$ to denote the next node from $x$ on the shortest path tree from $x$ to $y$. Note that by definition, $nextHop(x, s)$ is the parent of $x$ in $T_s$.
Finally, we use $\rho_x$ to denote the escape edge in $E \setminus T_s$ that the node $x$ uses to recover from the failure of its parent. As we discuss later, having the information of a single escape edge $\rho_x$ for each node $x \in V$, $x \neq s$, is sufficient to construct the entire alternate path for any node to recover from the failure of its parent, even though the path may actually contain multiple non-tree edges.
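To make the definitions above concrete, the following is a minimal sketch that builds a shortest paths tree rooted at a destination and derives the children sets from it. It uses a binary-heap Dijkstra from Python's standard library; the 6-node network, the node names, and the function names are illustrative assumptions and do not come from the paper.

```python
import heapq

def shortest_paths_tree(graph, s):
    """graph: dict node -> {neighbor: positive weight}, undirected.
    Returns (dist, parent) describing a shortest paths tree rooted at s."""
    dist = {s: 0}
    parent = {s: None}
    heap = [(0, s)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        for v, w in graph[u].items():
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                parent[v] = u  # u is the primary neighbor of v
                heapq.heappush(heap, (nd, v))
    return dist, parent

def children_sets(parent):
    """Invert the parent map: children[x] plays the role of the set of children of x."""
    children = {v: [] for v in parent}
    for v, p in parent.items():
        if p is not None:
            children[p].append(v)
    return children

# Hypothetical biconnected network (not the one in Figure 1).
graph = {
    "s": {"x": 1, "t": 2},
    "x": {"s": 1, "a": 1, "b": 1},
    "a": {"x": 1, "c": 1, "b": 3},
    "b": {"x": 1, "a": 3, "c": 4},
    "c": {"a": 1, "t": 5, "b": 4},
    "t": {"s": 2, "c": 5},
}
dist, parent = shortest_paths_tree(graph, "s")
children = children_sets(parent)
```

In this example network the node `x` has the two children `a` and `b`, and `parent[v]` corresponds to the next hop from `v` towards `s`.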
1.2 Related Work
One popular approach of tackling the issues related to transient failures of network elements is that of using proactive recovery schemes. These schemes typically work by precomputing alternate paths at the network setup time for the failure scenarios, and then using these alternate paths to re-route the traffic when the failure actually occurs. Also, the information of the failure is suppressed in the hope that it is a transient failure. The local rerouting based solutions proposed in [1, 6, 9, 10, 11] fall into this category.
Refs. [8, 11] present protocols based on local re-routing for dealing with transient single link and single node failures respectively. They demonstrate via simulations that the recovery paths computed by their algorithm are usually within 15% of the theoretically optimal alternate paths.
Wang and Gao’s Backup Route Aware Protocol [10] also uses some precomputed backup routes in order to handle transient single link failures. One problem central to their solution asks for the availability of reverse paths at each node. However, they do not discuss the computation of these reverse paths. Interestingly, the alternate paths that our algorithm computes qualify as the reverse paths required by the BRAP protocol of [10].
Slosiar and Latin  studied the single link failure recovery problem and presented an time for computing the link-avoiding alternate paths. A faster algorithm, with a running time of for this problem was presented in . Our central protocol presented in this paper can be generalized to handle single link failures as well. Unlike the protocol of , this single link failure recovery protocol would use optimal recovery paths.
1.3 Problem Definition
The Single Node Failure Recovery (SNFR) problem is defined as follows: given a biconnected undirected edge weighted graph $G = (V, E)$, and the shortest paths tree $T_s$ of a node $s$ in $G$, where $C_x = \{x_1, x_2, \ldots, x_{k_x}\}$ denotes the set of children of the node $x$ in $T_s$, for each node $x \in V$ with $x \neq s$, find a path from each $x_i \in C_x$ to $s$ in the graph $G(V \setminus \{x\}, E \setminus E_x)$, where $E_x$ is the set of edges incident on vertex $x$.
In other words, for each node $x$ in the graph, we are interested in finding alternate paths from each of its children to the source node $s$ when the node $x$ fails (we use source and destination interchangeably). Note that we don’t consider the problem to be well defined when the node $s$ fails.
The above definition of alternate paths matches that in [10] for reverse paths: for each node $x \in V$, find a path from $x$ to the node $s$ that does not use the primary neighbor (parent node) of $x$ in $T_s$.
1.4 Main Results
We discuss our efficient algorithm for the SNFR problem that has a running time of  (by contrast, the alternate path algorithms of [6, 8, 11] have a time complexity of  per destination). The primary routing tables can be computed using the Fibonacci heap [3] based implementation of Dijkstra’s shortest paths algorithm [2] in $O(m + n \log n)$ time. We further develop protocols based on this algorithm for recovering from single node transient failures in communication networks. In the failure free case, our protocol does not use any extra resources.
The recovery paths computed by our algorithm are not necessarily the shortest recovery paths. However, we demonstrate via simulation results that they are very close to the optimal paths.
We compare our results with those of  wherein the authors have also studied the same problem and presented protocols based on local rerouting for dealing with transient single node failures. One important difference between the algorithms of [6, 8, 11] and ours is that, unlike our algorithm, these are based primarily on recomputations. Consequently, our algorithm is faster by an order of magnitude than those in [6, 8, 11], and as shown by our simulation results, our recovery paths are usually comparable, and sometimes better.
2 Algorithm for Single Node Failure Recovery
A naive algorithm for the SNFR problem is based on recomputation: for each node $x \in V$ with $x \neq s$, compute the shortest paths tree of $s$ in the graph $G(V \setminus \{x\}, E \setminus E_x)$. Of interest are the paths from $s$ to each of the nodes $x_i \in C_x$. This naive algorithm invokes a shortest paths algorithm $n - 1$ times, and thus takes $O(mn + n^2 \log n)$ time when it uses the Fibonacci heap [3] implementation of Dijkstra’s shortest paths algorithm [2]. While these paths are optimal recovery paths for recovering from the node failure, their structure can differ considerably from each other, and from the original shortest paths (in the absence of any failures), to the extent that routing messages along these paths may involve recomputing large parts of the primary routing tables at the nodes through which these paths pass. The recovery paths computed by our algorithm have a well defined structure, and they overlap with the paths in the original shortest paths tree ($T_s$) to an extent that storing the information of a single edge, the escape edge $\rho_x$, at each node $x$ provides sufficient information to infer the entire recovery path.
2.1 Basic Principles and Observations
We start by describing some basic observations about the characteristics of the recovery paths. We also categorize the graph edges according to their role in providing recovery paths for a node when its parent fails.
Figure 1 illustrates a scenario of a single node failure. In this case, the node $x$ has failed, and we need to find recovery paths to $s$ from each $x_i \in C_x$. When a node $x$ fails, the shortest paths tree of $s$, $T_s$, gets split into $k_x + 1$ components: one containing the source node $s$, and each of the remaining ones containing the subtree of a child $x_i \in C_x$.
Notice that the edge (Figure 1), which has one end point in the subtree of , and the other outside the subtree of provides a candidate recovery path for the node . The complete path is of the form . Since is outside the subtree of , the path is not affected by the failure of . Edges of this type (from a node in the subtree of to a node outside the subtree of ) can be used by to escape the failure of node . Such edges are called green edges. For example, edge is a green edge.
Next, consider the edge (Figure 1) between a node in the subtree of and a node in the subtree of . Although there is no green edge with an end point in the subtree of , the edges and together offer a candidate recovery path that can be used by to recover from the failure of . Part of this path connects to (), after which it uses the recovery path of (via ’s green edge, ). Edges of this type (from a node in the subtree of to a node in the subtree of a sibling for some ) are called blue edges. Another example of a blue edge is edge which can be used by the node to recover from the failure of .
Note that edges like and (Figure 1) with both end points within the subtree of the same child of do not help any of the nodes in to find a recovery path from the failure of node . We do not consider such edges in the computation of recovery paths, even though they may provide a shorter recovery path for some nodes (e.g. may offer a shorter recovery path to ). The reason for this is that routing protocols would need to be quite complex in order to use this information. We carefully organize the green and blue edges in a way that allows us to retain only the useful edges and eliminate useless (red) ones efficiently.
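The edge categories above can be illustrated with a short sketch that labels each edge relative to a failed node. It assumes the shortest paths tree is given as a children map, and it tests subtree membership with DFS entry and exit times (one common labeling scheme, not necessarily the one used in the paper). The example tree, the label `"unused"` for edges inside a single child's subtree, and all function names are assumptions made for illustration.

```python
def dfs_intervals(children, root):
    """Entry/exit times such that u lies in the subtree of v
    iff enter[v] <= enter[u] and exit_[u] <= exit_[v]."""
    enter, exit_ = {}, {}
    clock = 0
    stack = [(root, False)]
    while stack:
        node, finished = stack.pop()
        if finished:
            exit_[node] = clock
            clock += 1
            continue
        enter[node] = clock
        clock += 1
        stack.append((node, True))
        for c in children.get(node, []):
            stack.append((c, False))
    return enter, exit_

def classify_edges(edges, children, root, x):
    """Label edges as green, blue, or unused with respect to the failure of x."""
    enter, exit_ = dfs_intervals(children, root)

    def owner(u):
        # Child of x whose subtree contains u, or None if u is outside.
        for c in children.get(x, []):
            if enter[c] <= enter[u] and exit_[u] <= exit_[c]:
                return c
        return None

    labels = {}
    for (u, v) in edges:
        if x in (u, v):
            continue  # edges incident on the failed node are unusable
        cu, cv = owner(u), owner(v)
        if cu is None and cv is None:
            continue  # entirely outside the subtree of x
        if cu is None or cv is None:
            labels[(u, v)] = "green"
        elif cu != cv:
            labels[(u, v)] = "blue"
        else:
            labels[(u, v)] = "unused"
    return labels

# Hypothetical tree rooted at s: s -> {x, t}, x -> {a, b}, a -> {c}.
children = {"s": ["x", "t"], "x": ["a", "b"], "a": ["c"]}
edges = [("c", "t"), ("a", "b"), ("b", "c"), ("a", "c")]
labels = classify_edges(edges, children, "s", "x")
```

The `owner` helper scans the children of the failed node for each endpoint, which keeps the sketch short but is slower than the heap-based organization the paper describes.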
We now describe the construction of a new graph $R_x$, the recovery graph for $x$, which will be used to compute recovery paths for the elements of $C_x$ when the node $x$ fails. A single source shortest paths computation on this graph suffices to compute the recovery paths for all $x_i \in C_x$.
The graph $R_x$ has $k_x + 1$ nodes, where $k_x = |C_x|$. A special node, $s_x$, represents the source node $s$ in the original graph $G$. Apart from $s_x$, we have one node, denoted by $y_i$, for each $x_i \in C_x$. We add all the green and blue edges defined earlier to the graph $R_x$ as follows. A green edge with an end point in the subtree of $x_i$ (by definition, green edges have the other end point outside the subtree of $x$) translates to an edge between $y_i$ and $s_x$. A blue edge with an end point in the subtree of $x_i$ and the other in the subtree of $x_j$ translates to an edge between nodes $y_i$ and $y_j$. However, the weight of each edge added to $R_x$ is not the same as the weight of the green or blue edge in $G$ used to define it. The weights are specified below.
Note that the candidate recovery path of $x_i$ that uses the green edge $(u, v)$, with $u$ in the subtree of $x_i$ and $v$ outside the subtree of $x$, has total cost equal to:
$$weight(s_x, y_i) = d_G(x_i, u) + cost(u, v) + d_G(v, s) \qquad (1)$$
As discussed earlier, a blue edge provides a path connecting two children of $x$, say $x_i$ and $x_j$. Once the path reaches $x_j$, the remaining part of the recovery path of $x_i$ coincides with that of $x_j$. If $(p, q)$ is the blue edge connecting the subtrees of $x_i$ and $x_j$ (the cheapest one corresponding to the edge $(y_i, y_j)$), with $p$ in the subtree of $x_i$ and $q$ in the subtree of $x_j$, the length of the subpath from $x_i$ to $x_j$ is:
$$weight(y_i, y_j) = d_G(x_i, p) + cost(p, q) + d_G(q, x_j) \qquad (2)$$
We assign this weight to the edge corresponding to the blue edge $(p, q)$ that is added in $R_x$ between $y_i$ and $y_j$.
The construction of our graph $R_x$ is now complete. Computing the shortest paths tree of $s_x$ in $R_x$ provides enough information to compute the recovery paths for all nodes $x_i \in C_x$ when $x$ fails.
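The recovery graph construction can be sketched for a single failed node as follows. This is a simplified illustration, not the paper's procedure: it scans all edges directly instead of using merged heaps, keeps parallel edges rather than only the cheapest one, and finds the owning child by climbing parent pointers. Green edges are weighted by the tree distance from the child to the inner end point, plus the edge cost, plus the distance from the outer end point to the source; blue edges by the two tree distances to the respective children plus the edge cost. The network, the reserved node name `"s_x"`, and the function names are assumptions.

```python
import heapq

def child_owner(u, x, parent):
    """Child of x whose subtree contains u, or None if u is outside."""
    while u is not None and parent[u] != x:
        u = parent[u]
    return u

def build_recovery_graph(x, edges, dist, parent, src="s_x"):
    """Adjacency lists of the recovery graph for x. Each entry is
    (neighbor, weight, escape edge as seen from the neighbor's subtree)."""
    adj = {src: []}
    for c, p in parent.items():
        if p == x:
            adj[c] = []
    for (u, v), w in edges.items():
        if x in (u, v):
            continue
        cu, cv = child_owner(u, x, parent), child_owner(v, x, parent)
        if cu == cv:
            continue  # both outside, or both inside the same child's subtree
        if cu is None:
            u, v, cu, cv = v, u, cv, cu  # orient so that u is inside a subtree
        if cv is None:  # green edge
            weight = (dist[u] - dist[cu]) + w + dist[v]
            adj[src].append((cu, weight, (u, v)))
        else:  # blue edge, usable in both directions
            weight = (dist[u] - dist[cu]) + w + (dist[v] - dist[cv])
            adj[cu].append((cv, weight, (v, u)))
            adj[cv].append((cu, weight, (u, v)))
    return adj

def escape_edges(adj, src="s_x"):
    """Dijkstra on the recovery graph; the parent edge of each child
    in the resulting tree is recorded as its escape edge."""
    best, escape = {src: 0}, {}
    heap, done = [(0, src)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w, edge in adj[u]:
            if v not in best or d + w < best[v]:
                best[v], escape[v] = d + w, edge
                heapq.heappush(heap, (d + w, v))
    return best, escape

# Hypothetical network: all edges with costs, plus its shortest paths tree.
edges = {("s", "x"): 1, ("s", "t"): 2, ("x", "a"): 1, ("x", "b"): 1,
         ("a", "c"): 1, ("c", "t"): 5, ("a", "b"): 3, ("b", "c"): 4}
dist = {"s": 0, "x": 1, "t": 2, "a": 2, "b": 2, "c": 3}
parent = {"s": None, "x": "s", "t": "s", "a": "x", "b": "x", "c": "a"}
best, escape = escape_edges(build_recovery_graph("x", edges, dist, parent))
```

In this example, `a` escapes through the green edge `("c", "t")`, while `b` first crosses the blue edge `("b", "a")` and then continues along the recovery path of `a`.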
2.2 Description of the Algorithm and its Analysis
We now incorporate the basic observations described earlier into a formal algorithm for the SNFR problem. Then we analyze the complexity of our algorithm and show that it has a nearly optimal running time of .
Our algorithm is a depth-first recursive algorithm over $T_s$. We maintain the following information at each node $x$:
Green Edges: The set of green edges in $G$ that offer a recovery path for $x$ to escape the failure of its parent.
Blue Edges: A set of edges $(p, q)$ in $G$ such that $x$ is the nearest-common-ancestor of $p$ and $q$ with respect to the tree $T_s$.
The set of green edges for node $x$ is maintained in a min heap (priority queue) data structure, which is denoted by $H_x$. Each heap element is a tuple consisting of a green edge and a value that defines its priority as an element of the heap. Note that an extra element is included in the priority in order to maintain the invariant that the priority of an edge in any heap remains constant as the path to $s$ is traversed. Initially $H_x$ contains an entry for each edge of $x$ which serves as a green edge for it (i.e. an edge of $x$ whose other end point does not lie in the subtree of the parent of $x$). A linked list, $B_x$, stores tuples consisting of a blue edge and its weight as defined by equation (2).
The heap $H_x$ is built by merging together the heaps of the nodes in $C_x$, the set of children of $x$. Consequently, not all the elements in $H_x$ may be green edges for $x$. Using a dfs labeling scheme similar to the one in , we can quickly determine whether the edge retrieved by a find-min operation on $H_x$ is a valid green edge for $x$ or not. If not, we remove the entry corresponding to the edge from $H_x$ via a delete-min operation. Note that since the deleted edge cannot serve as a green edge for $x$, it cannot serve as one for any of the ancestors of $x$, and it doesn’t need to be added back to the heap of any ancestor. We continue deleting the minimum weight edges from $H_x$ till either $H_x$ becomes empty or we find a green edge valid for $x$ to escape the failure of its parent, in which case we add it to the recovery graph of the parent.
After adding the green edges to the recovery graph $R_x$, we add the blue edges from the list $B_x$ to $R_x$.
Finally, we compute the shortest paths tree of the node $s_x$ (the node representing $s$) in the graph $R_x$ using a standard shortest paths algorithm (e.g. Dijkstra’s algorithm [2]). The escape edge for the node $x_i$ is stored as the parent edge of its representative node $y_i$ in the shortest paths tree of $s_x$ in $R_x$. Since the communication graph is assumed to be bi-connected, there exists a path from each node $x_i \in C_x$ to $s$, provided that the failing node is not $s$.
For brevity, we omit the detailed analysis of the algorithm. The time complexity of the algorithm follows from the fact that (1) An edge can be a blue edge in the recovery graph of exactly one node: that of the nearest-common-ancestor of its two end points, and (2) An edge can be deleted at most once from any heap. We state the result as the following theorem.
Given an undirected weighted graph and a specified node , the recovery path from each node to to escape from the failure of the parent of is computed by our procedure in time.
3 Single Node Failure Recovery Protocol
When routing a message to a node $s$, if a node $x$ needs to forward the message to another node $y$, the node $y$ is the parent of $x$ in the shortest paths tree $T_s$ of $s$. The SNFR algorithm computes the recovery path from $x$ to $s$ which does not use the node $y$. In case a node has failed, the protocol re-routes the messages along these alternate paths that have been computed by the SNFR algorithm.
3.1 Embedding the Escape Edge
In our protocol, the node $x$ that discovers the failure of $y$ embeds information about the escape edge to use in the message. The escape edge is the same as the edge $\rho_x$ identified for the node $x$ to use when its parent ($y$, in this example) has failed. We describe two alternatives for embedding the escape edge information in the message, depending on the particular routing protocol being used.
In several routing protocols, including TCP, the message headers are not of fixed size, and other header fields (e.g. Data Offset in TCP) indicate where the actual message data begins. For our purpose, we need an additional header space for two node identifiers (e.g. IP addresses, and the port numbers) which define the two end points of the escape edge. It is important to note that this extra space is required only when the messages are being re-routed as part of a failure recovery. In absence of failures, we do not need to modify the message headers.
In some cases, it may not be feasible or desirable to add the information about the escape edge to the protocol headers. In such situations, the node $x$ that discovers the failure of its parent node during the delivery of a message $M$ constructs a new message, $M'$, that contains information for recovering from the failure. In particular, the recovery message $M'$ contains (a) $M$: the original message, and (b) $\rho_x$: the escape edge to be used by $x$ to recover from the failure of its parent.
With either of the above two approaches, a lightweight application is used to determine if a message is being routed in a failure free case or as part of a failure recovery, and take appropriate actions. Depending on whether the escape edge information is present in the message, the application decides which node to forward the message to. This process consumes almost negligible additional resources. As a further optimization, this application can use a special reserved port on the routers, and messages would be sent to it only during the failure recovery mode. This would ensure that no additional resources are consumed in the failure free case.
3.2 Protocol Illustration
For brevity we do not formally specify our protocol, but only illustrate how it works. Consider the network in Figure 1. If notices that has failed, it adds information in the message (using one of the two options discussed above) about as the escape edge to use, and reroutes the message to . clears the escape edge information, and sends the message to , after which it follows the regular path to . If has not recovered when the message reaches , reroutes with message to with as the escape edge to use. This continues till the message reaches a node outside the subtree of , or till recovers.
Note that since the alternate paths are used only during failure recovery, and the escape edges dictate the alternate paths, the protocol ensures loop free routing, even though the alternate paths may form loops with the original routing (shortest) paths.
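The forwarding behaviour illustrated above can be simulated with a small sketch. It assumes each node knows its parent in the shortest paths tree and its precomputed escape edge, and that a message is carried from a node down to the inner end point of its escape edge along tree edges. The example tree, the escape edges, and the function name are hypothetical and are not taken from Figure 1.

```python
def recovery_route(start, dest, failed, parent, escape):
    """Sequence of nodes visited by a message from start to dest
    while the node `failed` is down."""
    route = [start]
    node = start
    while node != dest:
        if parent[node] != failed:
            node = parent[node]  # regular forwarding along the tree
            route.append(node)
            continue
        # node has discovered that its parent failed: use its escape edge (u, v)
        u, v = escape[node]
        descent = []
        w = u
        while w != node:  # tree path from node down to u, collected bottom-up
            descent.append(w)
            w = parent[w]
        route.extend(reversed(descent))
        route.append(v)  # cross the escape edge; v resumes regular forwarding
        node = v
    return route

# Hypothetical tree rooted at s and escape edges for the children of x.
parent = {"s": None, "x": "s", "t": "s", "a": "x", "b": "x", "c": "a"}
escape = {"a": ("c", "t"), "b": ("b", "a")}
route_b = recovery_route("b", "s", "x", parent, escape)
route_a = recovery_route("a", "s", "x", parent, escape)
```

The message from `b` crosses a blue edge into the subtree of `a`, where `a` in turn applies its own escape edge. The sketch does not handle a missing escape edge or a failed destination.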
4 Simulation Results and Comparisons
We present the simulation results for our algorithm, and compare the lengths of the recovery paths generated by our algorithm to the theoretically optimal paths as well as with the ones computed by the algorithm in . In the implementation of our algorithm, we have used standard data structures (e.g. binary heaps instead of Fibonacci heaps [3]; binary heaps suffer from a linear-time merge/meld operation as opposed to constant time for the latter). Consequently, our algorithms have the potential to produce much better running times than what we report.
We ran our simulations on randomly generated graphs, varying the following parameters: the number of nodes, and the average degree of a node. The edge weights are randomly generated numbers between 100 and 1000. In order to guarantee that the graph is 2-node-connected (biconnected), we ensure that the generated graph contains a Hamiltonian cycle. Finally, for each set of these parameters, we simulate our algorithm on multiple random graphs to compute the average value of a metric for the parameter set. The algorithms have been implemented in the Java programming language (126.96.36.199 patch), and were run on an Intel machine (Pentium IV 3.06GHz with 2GB RAM).
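One way to generate test graphs of the kind described above is sketched below. It is not the authors' generator: it assumes integer edge weights drawn uniformly from 100 to 1000 inclusive, simple graphs without parallel edges, and at least three nodes; the function name and parameters are illustrative.

```python
import random

def random_biconnected_graph(n, avg_degree, seed=None):
    """Return {(u, v): weight} with u < v for a random graph on n >= 3 nodes.
    A Hamiltonian cycle over a random node order guarantees biconnectivity;
    random extra edges are then added up to the requested average degree."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    edges = {}
    for i in range(n):
        u, v = order[i], order[(i + 1) % n]
        edges[(min(u, v), max(u, v))] = rng.randint(100, 1000)
    # avg_degree * n / 2 edges in total, capped by the complete graph
    target = min(n * avg_degree // 2, n * (n - 1) // 2)
    while len(edges) < target:
        u, v = rng.sample(range(n), 2)
        key = (min(u, v), max(u, v))
        if key not in edges:
            edges[key] = rng.randint(100, 1000)
    return edges

g = random_biconnected_graph(20, 4, seed=1)
```

Rejection sampling of extra edges is adequate for sparse graphs; for average degrees close to the number of nodes, enumerating and shuffling the missing edges would be a better choice.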
The stretch factor is defined as the ratio of the lengths of recovery paths generated by our algorithm to the lengths of the theoretically optimal paths. The optimal recovery path lengths are computed by recomputing the shortest paths tree of $s$ in the graph $G(V \setminus \{x\}, E \setminus E_x)$. In Figures 2 and 3, the Fir labels relate to the performance of the alternate paths algorithm used by the Failure Insensitive Routing protocol of , while the Crp labels relate to the performance of our algorithm for the SNFR problem.
Though  doesn’t present a detailed analysis of their algorithm, from our analysis, their algorithm needs at least  time per sink node in the system. Figures 2 and 3 compare the performance of our algorithm (CRP) to that of  (FIR). The plots for the running times of our algorithm and that of  fall in line with the theoretical analysis that our algorithms are faster by an order of magnitude than those of . Interestingly, the stretch factors of the two algorithms are very close for most of the cases, and stay within 15%. Our CRP algorithm runs within 50 seconds for graphs of up to 600-700 nodes, while the FIR algorithm’s runtime shoots up to as high as 5 minutes as the number of nodes increases.
The metrics are plotted against the variation in (1) the number of nodes (Figure ), and (2) the average degree of the nodes (Figure ). The average degree of a node is fixed at  for the cases where we vary the number of nodes (Figure ), and the number of nodes is fixed at  for the cases where we plot the impact of varying average node degree (Figure ).
As expected, the stretch factors improve as the number of nodes increases. Our algorithm falls behind in finding the optimal paths in cases when the recovery path passes through the subtrees of multiple siblings. Instead of finding the best exit point out of the subtree, in order to keep the protocol simple and the paths well structured, our paths go to the root of the subtree and then follow its alternate path beyond that. These paths are formed using the blue edges. Paths discovered using a node’s green edges are optimal such paths. In other words, if most of the edges of a node are green, our algorithm is more likely to find paths close to the optimal ones. Since the average degree of the nodes is kept fixed in these simulations, increasing the number of nodes increases the probability of the edges being green. A similar logic explains the plots in Figure .
When the number of nodes is fixed, increasing the average degree of a node results in an increase in the number of green edges for the nodes, as well as the stretch factors. (When the average degree is very small, there are only a few alternate paths available, and the algorithms usually find the better ones among them, resulting in smaller stretch factors.)
5 Concluding Remarks
In this paper we have presented an efficient algorithm for the SNFR problem, and developed protocols for dealing with transient single node failures in communication networks. Via simulation results, we show that our algorithms are much faster than those of , while the stretch factors of our paths are usually better or comparable.
Previous algorithms [6, 8, 11] for computing alternate paths are much slower, and thus impose a much longer network setup time as compared to our approach. The setup time becomes critical in more dynamic networks, where the configuration changes due to events other than transient node or link failures. Note that in several kinds of configuration changes (e.g. permanent node failure, node additions, etc), recomputing the routing paths (or other information) cannot be avoided, and it is desirable to have shorter network setup times.
For the case where we need to solve the SNFR problem for all nodes in the graph, our algorithm would need time, which is still very close to the time required () to build the routing tables for the all-pairs setting. The space requirement still stays linear in and .
The directed version of the SNFR problem, where one needs to find the optimal (shortest) recovery paths, can be shown to have a lower bound of  using a construction similar to those used for proving the same lower bound on the directed version of SLFR and replacement paths problems. The bound holds under the path comparison model of [5] for shortest paths algorithms.
References
[1] A. M. Bhosle and T. F. Gonzalez. Algorithms for single link failure recovery and related problems. J. of Graph Alg. and Appl., 8(3):275-294, 2004.
[2] E. W. Dijkstra. A note on two problems in connection with graphs. Numerische Mathematik, 1:269-271, 1959.
[3] M. L. Fredman and R. E. Tarjan. Fibonacci heaps and their uses in improved network optimization algorithms. JACM, 34:596-615, 1987.
[4] J. Hershberger, S. Suri, and A. M. Bhosle. On the difficulty of some shortest path problems. ACM Transactions on Algorithms, 3(1), 2007.
[5] D. R. Karger, D. Koller, and S. J. Phillips. Finding the hidden path: Time bounds for all-pairs shortest paths. In 32nd IEEE FOCS, pages 560-568, 1991.
[6] S. Lee, Y. Yu, S. Nelakuditi, Z.-L. Zhang, and C.-N. Chuah. Proactive vs reactive approaches to failure resilient routing. In Proc. of IEEE INFOCOM, 2004.
[7] A. Markopoulou, G. Iannaccone, S. Bhattacharya, C. Chuah, and C. Diot. Characterization of failures in an IP backbone. In Proc. of IEEE INFOCOM, 2004.
[8] S. Nelakuditi, S. Lee, Y. Yu, Z.-L. Zhang, and C.-N. Chuah. Fast local rerouting for handling transient link failures. IEEE/ACM Trans. Netw., 15(2):359-372, 2007.
[9] R. Slosiar and D. Latin. A polynomial-time algorithm for the establishment of primary and alternate paths in ATM networks. In IEEE INFOCOM, pages 509-518, 2000.
[10] F. Wang and L. Gao. A backup route aware routing protocol - fast recovery from transient routing failures. In INFOCOM, 2008.
[11] Z. Zhong, S. Nelakuditi, Y. Yu, S. Lee, J. Wang, and C.-N. Chuah. Failure inferencing based fast rerouting for handling transient link and node failures. In Proc. of IEEE INFOCOM, 4:2859-2863, 2005.