A practical Single Source Shortest Path algorithm for random directed graphs with arbitrary weight in expecting linear time
Fujian Zhangzhou No.1 High School
Abstract In this paper I present a new algorithm, called the Raffica algorithm, for the Single-Source Shortest Path (SSSP) problem. On random graphs, the algorithm runs in linear expected time. For random grid graphs with hop-diameter , it is also linear. The algorithm solves SSSP with arbitrary weights; when a negative cycle exists, it finds the cycle in linear expected time as well. It can therefore solve random Systems of Difference Constraints quickly, in  expected time. Using the same idea, I prove that the queue-optimized Bellman-Ford algorithm, usually called SPFA (Footnote: The name SPFA has not been widely accepted over the decades, and the algorithm was not first presented by Duan; the name is used below only for convenience. Its expected time complexity used to be unknown.), has expected time complexity , where  is the expected hop-diameter, contrary to Duan’s claim.
Keywords: Time complexity; Negative cycles; Single Source Shortest Path; Raffica algorithm; Random Graph.
The Single Source Shortest Path (SSSP) problem is one of the most basic problems in graph optimization studied in computer science. It is widely used in mathematical modeling, for example in traffic regulation and Systems of Difference Constraints.
Prior work Dijkstra gave a solution to the problem[?]; his algorithm processes the vertices in order of distance. Building on Dijkstra’s algorithm, later work used priority queues to obtain a  approach. A Fibonacci heap reduces the cost to , which is almost linear. Buckets solve the problem in  [?], where  is the maximum weight in the graph. Exploiting properties of the RAM model gives  or . With multi-level buckets[?], SSSP can be solved in expected linear time on graphs whose edge lengths follow a natural distribution. All of these methods, however, handle only directed graphs with non-negative weights.
Thorup[?] found an algorithm that solves SSSP on undirected graphs in linear time. It is a very important algorithm in theory; in practice, however, it runs extremely slowly.
Sometimes the problem must be solved with arbitrary weights, for example when solving Systems of Difference Constraints. With arbitrary weights, negative cycles may appear. The Bellman-Ford algorithm[?] is the basic method: it relaxes edges and iterates  times. The best algorithm for SSSP with arbitrary weights used to be the queue-optimized Bellman-Ford algorithm[?], whose complexity was considered to be . Duan claimed that the complexity of the queue-optimized Bellman-Ford algorithm is  and named it the Shortest Path Faster Algorithm (SPFA). The claim was later proved wrong, but SPFA does run fast when the graph is not specially constructed and contains no negative cycles. On graphs with negative cycles, however, it is slow, with expected complexity . It is also slow, , on grid graphs, which means it does not fit many real-life situations.
My contribution I found a simple but effective method to improve SPFA; the improved algorithm is called the Raffica algorithm. Since the single-source shortest paths form a tree, I maintain such a tree, called the Auxiliary Tree, during a breadth-first search. In priority-queue-based Dijkstra’s algorithm the relaxing operation happens only once on each vertex, whereas in the Raffica algorithm it may happen many times on a vertex. Using the Auxiliary Tree, the algorithm cuts away so much redundant relaxing that each vertex is relaxed an expected  times on a random graph. The operation that cuts the relaxing down is called 'Raffica'. When the Auxiliary Tree cannot be maintained, a cycle has appeared on the tree, which means that a negative cycle exists and the problem has no solution.
It solves SSSP in  expected time both on the G(n, p) random graph model and on the traffic problem model, a graph that is basically a grid with  diameter. Finding a negative cycle also costs  in expectation. I also show how to prove that SPFA’s expected time complexity is , unlike the claim in [?].
Negative cycle detection is also a common model; a System of Difference Constraints is one example. Compared with SPFA, the Raffica algorithm runs in linear expected time, which is much faster. In this setting it is the best way to detect a negative cycle.
The Raffica algorithm is  in the worst case, and so is SPFA. However, the worst case of the Raffica algorithm hardly ever appears in traffic problems (in other words, on random near-grid graphs), while SPFA easily falls into its worst case on grid graphs. The only known ways to improve the worst case of my algorithm are to reconstruct the graph or to replace the breadth-first search with another search method. A general reconstruction method is still to be found. Nevertheless, the algorithm remains practical for traffic problems and random graphs.
Following these conclusions, the All-Pairs Shortest Path (APSP) problem can be solved in  expected time on random graphs, which is better than Floyd's algorithm[?]. A dynamic SSSP with arbitrary weights can be maintained in  expected time by simply re-solving the problem. The minimum average weight cycle problem can be solved by bisection (dichotomy) combined with the Raffica algorithm, giving a  expected-time solution, where  stands for the maximum weight in the graph.
The following graph is a directed weighted graph , . As an SSSP problem, we denote the source vertex as . is the count of edges while is the count of vertices.
Auxiliary Tree is a tree used in Raffica algorithm.
The hop-diameter, or simply the diameter, of the graph is defined as the maximum number of vertices on a shortest path over all . It is denoted as . On a sparse random graph,  in expectation[?].
The SP Tree is the Shortest Path Tree of the SSSP. The Auxiliary Tree converges during the Raffica algorithm and finally equals the SP Tree.
stands for the parent of  on any tree.
The depth of a vertex indexed  is denoted as  or . It is the number of vertices on the path from the source to . .
Both the Auxiliary Tree and the SP Tree are rooted at ;
is an array of labels recording whether a vertex should be in the queue, and  records whether the vertex is in the queue.
is the distance of the SSSP problem.
Relaxing is an operation on an edge, taken from the Bellman-Ford algorithm and Dijkstra’s algorithm. An edge  is relaxable when .
Let us define an iteration. The iteration count of a vertex is the number of vertices visited on the way from the source vertex to it, in other words, its depth on the Auxiliary Tree. An iteration of a BFS-like algorithm (i.e. SPFA and the Raffica algorithm below) is a series of relaxations with equal iteration count.
During an iteration, I call a vertex that is in the queue a Dark Point.
For example, if vertex 1 is the source , then the iteration count of vertex 1 is 1, the count of vertices 2 and 3 is 2, and the count of vertices 4 and 5 is 3.
The random graph uses this model: , where  and . Without loss of generality, let the weights follow the Uniform Distribution . The weights are non-negative because, as  tends to infinity, a random graph with negative weights easily forms a negative cycle, which leaves the problem without a solution. This assumption does not affect the result. (Footnote: For a more general conclusion, the result still holds whenever the weights follow, or can be regarded as following, a differentiable distribution.)
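To make the model concrete, here is a minimal Python sketch of a generator for a directed G(n, p) graph. The function name `random_digraph`, the adjacency-list layout and the weight range [0, 1) are assumptions made for illustration; the exact parameters of the uniform distribution are not recoverable from the text above.

```python
import random

def random_digraph(n, p, seed=None):
    """Directed G(n, p): every ordered pair (u, v) with u != v becomes
    an edge independently with probability p."""
    rng = random.Random(seed)
    adj = [[] for _ in range(n)]  # adj[u] = list of (v, weight)
    for u in range(n):
        for v in range(n):
            if u != v and rng.random() < p:
                # Weight drawn uniformly from [0, 1); this range is an assumption.
                adj[u].append((v, rng.random()))
    return adj
```

Passing a fixed `seed` makes experiments repeatable; a sparse graph corresponds to choosing `p` on the order of a constant divided by `n`.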
The Bellman-Ford algorithm[?] is a classic algorithm for SSSP on graphs with arbitrary weights. It relaxes edges iteratively, and after  iterations we get the answer. If any vertex is still relaxed in the  th iteration, a negative cycle exists (the relaxed vertices lie on, or are reachable from, such a cycle).
The time complexity is obviously . There are many improvements, such as Yen’s; the SPFA described below is also one of them. In practice, the running time can be reduced to  by stopping as soon as an iteration changes no distance.
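The procedure above, including the early exit when an iteration changes nothing, can be sketched in Python as follows. This is a minimal illustration rather than the author's listing; the function name `bellman_ford` and the edge-list format `(u, v, w)` are assumptions.

```python
def bellman_ford(n, edges, source):
    """edges: list of (u, v, w). Returns the distance list,
    or None if a negative cycle is reachable from the source."""
    INF = float("inf")
    dist = [INF] * n
    dist[source] = 0
    for _ in range(n - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:  # the edge is relaxable
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return dist  # early exit: no distance changed in this iteration
    # One extra pass: any further relaxation implies a negative cycle.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            return None
    return dist
```

Unreachable vertices keep the distance `inf`, since `inf + w < inf` is never true.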
SPFA is the queue optimized Bellman-Ford Algorithm.
SPFA uses a queue to hold vertices, somewhat like BFS, and stores the graph as adjacency lists. During SPFA we scan and relax edges, updating distances and pushing every vertex whose distance was improved into the queue.
The following pseudocode describes how SPFA works.
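As a minimal sketch of the procedure just described (not the author's original listing), SPFA might look as follows in Python. The names `spfa`, `in_queue` and `count` are assumptions, and the negative-cycle test by counting queue entries is the standard one: without a negative cycle no vertex can enter the queue more than n times.

```python
from collections import deque

def spfa(adj, source):
    """adj[u] = list of (v, w). Returns the distance list,
    or None if a negative cycle is reachable from the source."""
    n = len(adj)
    dist = [float("inf")] * n
    in_queue = [False] * n
    count = [0] * n  # how many times each vertex entered the queue
    dist[source] = 0
    queue = deque([source])
    in_queue[source] = True
    count[source] = 1
    while queue:
        u = queue.popleft()
        in_queue[u] = False
        for v, w in adj[u]:
            if dist[u] + w < dist[v]:  # relaxable edge
                dist[v] = dist[u] + w
                if not in_queue[v]:
                    queue.append(v)
                    in_queue[v] = True
                    count[v] += 1
                    if count[v] > n:
                        return None  # too many entries: negative cycle
    return dist
```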
Raffica algorithm is an improved SPFA. Raffica algorithm is based on this theorem:
The solution of the SSSP forms a tree.
Obviously the solution includes all reachable vertices. Suppose the solution of the SSSP includes a cycle, and  is a vertex on the cycle; then going from  around the cycle and back to  is still a shortest path. If the total weight of the cycle is positive, this cannot be a shortest path, because going around the cycle gives a worse answer. If the weight of the cycle is zero, we need not go around the cycle. If it is negative, we would go around the cycle infinitely many times, so the answer does not exist. So the solution of SSSP does not include any cycle.
The method is pretty simple. We maintain the Auxiliary Tree: when we relax an edge, we make its head a child of its tail on the Auxiliary Tree. If the head already has a parent, we break that tree edge and set the tail as the new parent. This operation is called 'Raffica'.
It is easy to see that this maintains a tree, except when the head of the relaxed edge is an ancestor of its tail on the tree.
Now we prove that in this exceptional case the SSSP has a negative cycle.
Being able to relax an edge means that going through it gives a strictly shorter distance to its head. When a chain of such inequalities closes into a cycle during the iteration, summing them shows that the total weight of the cycle is negative.
If there is a negative cycle, the SSSP problem has no solution. So, as long as a solution exists, what we maintain is always a tree.
Because the BFS traverses all reachable vertices, a reachable negative cycle will always be found.
So finding a cycle on the Auxiliary Tree is a necessary and sufficient condition for the existence of a reachable negative cycle.
We introduce the second theorem:
Shortest Path(SP) can be divided into smaller SP.
That is, if a shortest path includes a vertex , the parts of the path before and after  are themselves shortest paths of smaller problems.
Consider the moment during an iteration when some vertex is in the queue and one of its ancestors has just been Raffica-ed. Before the Raffica, . Here  means the length of the shortest path from  to . After the Raffica, the distance of the ancestor becomes lower, so the distance of the vertex in the queue should become lower too, by Theorem 2.
The vertex that is still in the queue holds an earlier distance, so we need not keep it in the queue any longer; only the Raffica-ed ancestor needs to be in the queue. Otherwise there will be redundant relaxing. So, on every Raffica, we clear the in-queue labels of the subtree of the Raffica-ed vertex.
Consider the situation in the picture. In a relaxation , 2 was the earlier parent of 3. After , 2 no longer has the child 3, and 1 obtains the child 3.
Suppose that 4 and 5 are in the queue at this time, and consider the updates  and . They still use the old data; in other words, they still assume that the best paths are  and . But we already know that  is better than , so 4 and 5 should leave the queue while 3 stays in the queue with the newest result, which takes  as the best. In the next iteration, 4 and 5 will take  and  as the shortest paths.
Without this Raffica step, as in SPFA, we would first update the children of 4 and 5 using  and , and then update them again using  and . If many Rafficas occur and the SP Tree is tall, SPFA will be very slow, because new data must override old data in every iteration. Although there are many Rafficas, the Raffica algorithm does not, on average, update whole subtrees of the SP Tree; it updates subtrees of the Auxiliary Tree, which is not very tall.
In fact, graphs that need many Rafficas and have a tall SP Tree are common: a grid graph is one example, where SPFA runs slowly and the Raffica algorithm runs fast.
To find a negative cycle, when relaxing an edge (u, v) we check whether v is an ancestor of u on the Auxiliary Tree. If it is, there is a negative cycle.
A simple DFS over the subtree of v suffices for this check, because the DFS costs the same time as the Raffica.
Now we show the pseudocode.
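The following Python sketch is one possible reading of the description above, not the author's original listing. All identifiers (`raffica`, `active`, `in_queue`, `children`) are assumptions: `active` plays the role of the "should be in queue" label and `in_queue` records physical membership in the queue. On each relaxation of an edge (u, v), a DFS over the subtree of v clears the labels of that subtree and reports a negative cycle if it meets u.

```python
from collections import deque

def raffica(adj, source):
    """adj[u] = list of (v, w). Returns (dist, parent),
    or None if a negative cycle is reachable from the source."""
    n = len(adj)
    dist = [float("inf")] * n
    parent = [-1] * n
    children = [set() for _ in range(n)]  # Auxiliary Tree
    active = [False] * n    # label: vertex should still be scanned
    in_queue = [False] * n  # vertex is physically in the queue
    dist[source] = 0
    queue = deque([source])
    active[source] = in_queue[source] = True
    while queue:
        u = queue.popleft()
        in_queue[u] = False
        if not active[u]:
            continue  # label was cleared by a Raffica on an ancestor
        active[u] = False
        for v, w in adj[u]:
            if dist[u] + w >= dist[v]:
                continue  # edge is not relaxable
            # DFS over the subtree of v: clear labels, look for u.
            stack = [v]
            while stack:
                x = stack.pop()
                if x == u:
                    return None  # v is an ancestor of u: negative cycle
                active[x] = False
                stack.extend(children[x])
            # Raffica: detach v from its old parent, hang it under u.
            if parent[v] != -1:
                children[parent[v]].discard(v)
            parent[v] = u
            children[u].add(v)
            dist[v] = dist[u] + w
            active[v] = True
            if not in_queue[v]:
                queue.append(v)
                in_queue[v] = True
    return dist, parent
```

In this sketch the descendants of v stay on the tree with their old distances; they are lowered again once v, which is back in the queue, is scanned.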
Now we formally prove the correctness of the Raffica algorithm and SPFA. (Footnote: Duan[?] did not prove this algorithm rigorously, for example why the algorithm terminates. Had he done so, he might have come up with the Raffica algorithm earlier than me.)
For the Raffica algorithm, the Auxiliary Tree is always an SP Tree of 'the graph consisting of all the vertices in the Auxiliary Tree and the edges traversed so far'.
We use mathematical induction to prove that the algorithm returns a correct answer. First, a tree consisting of a single vertex obviously meets the condition. Now consider a relaxation: if it causes no Raffica, the condition obviously still holds. If it causes a Raffica, the subtree of v is cut, so the condition holds as well.
Next we prove that the algorithm terminates. First we prove that a vertex cannot be Raffica-ed more than N-2 times. Consider a vertex . Apart from the first visit, because the Raffica algorithm goes through the SP Tree, the BFS takes  iterations to end, which is the maximum possible height of the SP Tree. In each iteration only one vertex can Raffica , because on a path from  to a leaf there is no more than one vertex in the queue.
After the Rafficas of all vertices, what remains is a plain BFS, so the algorithm certainly terminates.
SPFA The only difference between the Raffica algorithm and SPFA is that SPFA does not clear the in-queue labels on a Raffica, so it is easy to see that SPFA is also correct.
The worst case scenario of the Raffica algorithm is like the following picture:
Point A has  out-edges and will be Raffica-ed  times. We may reconstruct the graph by splitting the out-edges or by randomizing. The worst-case time complexity can be improved to , but this is trivial.
In real life there may be a few vertices with as many out-edges as in the figure above. Later I show that this case can be improved. The unimprovable worst case appears only in extremely contrived situations, unlike the worst case of SPFA.
Raffica algorithm A BFS is obviously ; we consider the extra complexity caused by Rafficas.
First, the number of Rafficas is no more than  (by the correctness proof).
On the Auxiliary Tree, a vertex and its ancestor cannot be in the queue at the same time. Suppose there is a leaf vertex .  is a series of edges from the root to the leaf.
When an edge  in the final SP tree is accessed,  and  will not be Raffica-ed.
No path is shorter than those of the final SP Tree. Once such an edge is accessed, no solution is better than this one, so neither  nor  will be Raffica-ed.
Due to [?], the expecting diameter of a random graph follows:
|Concentrated on mostly 2|
|Concentrated on mostly 2|
|Concentrated on mostly 3|
|Concentrated on mostly 4|
|, is a small constant||Concentrated on mostly|
|Graph will be mostly disconnected.|
When the graph is dense, the expected diameter is a constant, so both SPFA and the Raffica algorithm run in . So we need only consider sparse graphs, where the diameter is .
Consider each path from the root to the leaf.
Basically, the expected density of Raffica tends to increase with depth (and converges to 0.5, as proved later), which makes it hard to use the linearity of expectation. (Footnote: The density is the number of iterations between two Rafficas.) The extra cost of  is the size of the subtree rooted at . The total cost of the Raffica operations is the sum of all the extra costs.
The count of nodes on the depth is more than [?], where , . Denote the count by , and it is sure that it is not only monotonically increasing, but larger than a concave function.
For , there always exists an that for any , the density of Raffica on depth is larger than . Easily, the cost of a Raffica is , which is a certain value(). When the depth increases, the cost of Raffica decreases. And due to the concave, the density of cost also decreases. The expecting time of Raffica is:
At depth , a vertex can Raffica another only if its distance is shorter. The expected distance at depth  is:
The reason is that the minimum value of  vertices under the Uniform Distribution tends to . The small o means the term is lower than , but it certainly increases with depth. And since the graph is sparse, , so the depth tends to infinity, and the distance of the leaves also tends to infinity.
Denote the Probability Density Function that depth as . Suppose that there are 2 nodes named and , the depth is and respectively.
The probability that an edge from the is:
Despite we don’t know the exact formula of now, we can still find out some property of . For every , there is:
And, , and is partial derivative by y, then:
Due to sandwich theorem, .
As the depth grows, the size of the Auxiliary Tree increases, so the number of Raffica targets grows. Since both the probability and the number of targets grow, the density of Raffica certainly increases with depth. Now we prove that,
Due to [?],
Transform the problem:
For it is a sparse graph, when , in other words, , tends to 0. Back to the original inequality,
Theorems R1 and R2 show that, as the depth tends to infinity, the probability that any edge Rafficas another tends to .
Now we know that the total cost of the Rafficas is . The remaining question is how much time the other operations cost. Checking and updating a subtree of  may look very slow, because each DFS may cost . However, the subtree to be maintained and checked has the same size as the one handled by the Raffica operation, so the DFS-based operations cost the same as the Rafficas. In conclusion, the Raffica algorithm has  expected time complexity on random graphs.
SPFA First we give an upper bound, using the same analysis. The difference between SPFA and the Raffica algorithm is that SPFA does not clear the in-queue labels of the subtree of the Dark Point.
According to [?], the diameter of a random graph is , where . And according to another paper [?], the average distance in the graph is , where is the degree of the vertex indexed . So,
For tends to infinity, the average distance tends to , and:
And obviously . By the sandwich theorem, . The checked-change Bellman-Ford algorithm (Footnote: It checks whether any distance changes after an iteration.) has  expected time complexity. Every vertex  undergoes  Rafficas, so  is the extra cost of the Rafficas, and the upper bound is .
For a more precise analysis, consider the probability that a Raffica succeeds. As the depth tends to infinity, this probability tends to . Each Raffica costs time proportional to the average height , because the Dark Points are not erased; the path from the root to a leaf may contain more than one Dark Point. The lower bound is therefore also .
When SPFA is used to find a negative cycle, its expected time complexity is , while that of the Raffica algorithm is . This is because SPFA detects a negative cycle by counting how many times each vertex enters the queue: if a vertex enters the queue  times, a negative cycle exists. In each iteration an expected  vertices are in the queue. The Raffica algorithm concludes that a negative cycle exists as soon as it first finds a cycle on the Auxiliary Tree, so its time complexity is .
Now consider a grid or near-grid graph. The diameter of such graphs is , and the out-degrees are . The weights of the edges follow a uniform distribution . (Footnote: Similarly, if the weights follow a normal rather than a uniform distribution, SPFA and the Raffica algorithm only run faster.) Grid and near-grid graphs are often seen in real life; I call this a traffic problem.
SPFA runs slowly on such graphs, while the Raffica algorithm runs in linear time.
By a similar analysis, as  tends to infinity the density of Raffica stays constant. Therefore each Raffica has constant expected cost, and the total is .
Now consider SPFA. The diameter of the grid graph is , and the time of a Raffica is . The total expected cost is .
In fact, SPFA’s time complexity depends on the height of the SP tree and the density of Raffica. The grid graph is only one of the graphs on which SPFA is slow: whenever both the height and the density are large, SPFA runs slowly. This situation often appears in traffic SSSP problems.
A System of Difference Constraints is a set of difference constraints of the form , which can easily be transformed into an SSSP problem with arbitrary weights. It is widely used in applications such as temporal reasoning.
We define a vertex S as the super source vertex. Transform each inequality to . This is the familiar triangle inequality. For each inequality we add an edge  with weight , and for each vertex X we add an edge  with weight 0. Taking S as the source, the solution of the SSSP is a solution of the System of Difference Constraints. If there is a negative cycle, no solution exists.
The above graph stands for these constraints:
This system has no solutions because is a negative cycle.
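The transformation can be sketched in Python as follows. The constraint form x_i - x_j <= c is an assumption (the usual convention, since the exact formula is not legible above), as are the names `solve_difference_constraints` and `S`. For brevity the sketch uses a plain Bellman-Ford pass as the solver instead of the Raffica algorithm.

```python
def solve_difference_constraints(num_vars, constraints):
    """constraints: list of (i, j, c), each meaning x_i - x_j <= c (assumed form).
    Returns one feasible assignment as a list, or None if the system is infeasible."""
    S = num_vars  # index of the super source vertex
    # x_i <= x_j + c becomes an edge j -> i with weight c.
    edges = [(j, i, c) for i, j, c in constraints]
    # The super source reaches every variable with weight 0.
    edges += [(S, x, 0) for x in range(num_vars)]
    dist = [float("inf")] * (num_vars + 1)
    dist[S] = 0
    for _ in range(num_vars + 1):  # one pass per vertex of the extended graph
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return dist[:num_vars]  # shortest distances satisfy every constraint
    return None  # still changing after the last pass: negative cycle
```

For example, the two constraints x_0 - x_1 <= 1 and x_1 - x_0 <= -2 form a cycle of total weight -1, so the sketch returns `None`.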
We often only want to know whether the system has any solution, that is, whether the SSSP problem has a negative cycle.
The Bellman-Ford algorithm and SPFA cannot solve the find-negative-cycle problem quickly, even in expectation, because both detect a negative cycle by counting the relaxations of every vertex.
The graph has one special feature: the super source S is connected to every vertex. Apart from this, it is a random graph, and these connections obviously do not increase the time complexity. The total expected time is still .
The dynamic SSSP model is also common in real life. According to [?], the best algorithm for dynamic SSSP used to be the Incremental Algorithm, at  per edge. If the graph is random or grid-like, even re-solving the problem from scratch is better than the Incremental Algorithm in expectation.
The average weight of a cycle is the total weight of the cycle divided by the number of edges on the cycle. Karp[?] found an algorithm that solves the minimum average weight cycle problem in . On random or grid graphs, we have a  expected-time algorithm, where  stands for the maximum weight of the edges.
Using bisection (dichotomy), we choose a value . If the minimum average cycle weight is exactly , subtracting  from every edge yields a zero cycle. If  is greater than the answer, subtracting  from every edge yields a negative cycle. If  is lower, there will be no negative or zero cycle.
The Raffica algorithm detects negative cycles in  expected time, so the total time complexity is .
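The bisection can be sketched in Python as follows. The names `has_negative_cycle` and `min_mean_cycle` and the fixed iteration count are assumptions, and the negative-cycle test here is a plain Bellman-Ford pass started from all vertices at once, used as a stand-in for the Raffica algorithm.

```python
def has_negative_cycle(n, edges, shift):
    """True if some cycle is negative after subtracting `shift` from every edge.
    All distances start at 0, which acts like a virtual source."""
    dist = [0.0] * n
    for _ in range(n):
        changed = False
        for u, v, w in edges:
            if dist[u] + (w - shift) < dist[v]:
                dist[v] = dist[u] + (w - shift)
                changed = True
        if not changed:
            return False
    return True  # still changing in the n-th pass

def min_mean_cycle(n, edges, iterations=60):
    """Approximate minimum average cycle weight, or None if the graph is acyclic."""
    if not edges:
        return None
    lo = min(w for _, _, w in edges)
    hi = max(w for _, _, w in edges)
    # With shift hi + 1 every edge is negative, so any cycle is negative.
    if not has_negative_cycle(n, edges, hi + 1):
        return None
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if has_negative_cycle(n, edges, mid):
            hi = mid  # mid is greater than the answer
        else:
            lo = mid  # mid is at most the answer
    return (lo + hi) / 2
```

On the edges `(0, 1, 1), (1, 2, 2), (2, 0, 3), (1, 0, 5)` the two cycles have average weights 2 and 3, and the sketch converges to 2 up to floating-point precision.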
|Non-negative weighted Random Graph|
|Negative Cycle Random Graph||unable||unable|
|Arbitrary Weight Random Graph||unable||unable|
|Traffic Problem|
|Non-negative Weighted Random Graph|
|Arbitrary Weight Random Graph||unable|
We give two tables comparing the classical algorithms with my algorithm on SSSP and APSP.
As the tables show, on random graphs and traffic problems the Raffica algorithm has linear complexity, which is asymptotically the fastest possible. Thorup’s algorithm is also linear and should be equally fast, but it handles only undirected graphs and is extremely slow in actual running time. Mine has neither limitation, which means it can replace Dijkstra’s algorithm on random graphs and traffic problems.
On graphs with arbitrary weights, my algorithm is better than the other algorithms in almost every case.
Figure 3 is the worst case. This case may appear often in real life, but it can easily be transformed into the scenario of Figure 6: for a vertex with a large out-degree, we distribute its out-edges among those vertices. The correctness of the transformation is easy to see, and the transformed graph is easier (linear) for the Raffica algorithm to solve.
Figure 7 shows another situation. Vertex 0 has a subtree that looks like a binary tree. It cannot be handled like the previous case, but it hardly ever occurs in real life. What we can do is change the search method: neither BFS nor DFS, but an IDFS. I cannot yet quantify the effect of this optimization.
In conclusion, the worst case arises when some vertex has a subtree on the SP Tree with  vertices at a particular depth, and the vertices are visited in a particular order so that those vertices are updated many times.
Another feature of the worst case is that some vertices are visited many times. But even if we split the in-edges, the in-degree structure may again look like a binary tree.
In this way we can split the in-degree and out-degree, improving the worst case to . Reconstructing the graph in general remains an open problem.
We can also use a priority queue to improve the worst case: with an evaluation function, vertices whose subtrees are small get higher priority. This solves the binary-tree situation, but it cannot handle every situation.
Thanks to Mr. Wang, Mr. Wen and everyone who has supported me, and to those who applied this algorithm to real problems and gave me priceless test cases and examples. I am now looking for a (new) university to study CS.
-  R. K. Ahuja, K. Mehlhorn, J. Orlin, and R. E. Tarjan. Faster algorithms for the shortest path problem. Journal of the ACM (JACM), 37(2):213–223, 1990.
-  R. Bellman. On a routing problem. Quarterly of Applied Mathematics, 16(1):87–90, 1958.
-  Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11):1222–1239, 2002.
-  K. Chatterjee, M. Henzinger, S. Krinninger, V. Loitzenbauer, and M. A. Raskin. Approximating the minimum cycle mean. Theoretical Computer Science, 547(1):104–116, 2014.
-  B. V. Cherkassky, L. Georgiadis, A. V. Goldberg, R. E. Tarjan, and R. F. Werneck. Shortest-path feasibility algorithms:an experimental evaluation. Journal of Experimental Algorithmics, 14(12):124312–124312–11, 2010.
-  B. V. Cherkassky and A. V. Goldberg. Negative-cycle detection algorithms. Mathematical Programming, 85(2):277–311, 1999.
-  F. R. K. Chung and L. Lu. The diameter of sparse random graphs. Advances in Applied Mathematics, 26(4):257–279, 2001.
-  T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. MIT Press, 2005.
-  E. W. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik, 1(1):269–271, 1959.
-  F. Duan. A faster algorithm for the shortest-path problem called spfa. Xinan Jiaotong Daxue Xuebao/journal of Southwest Jiaotong University, 29(2), 1994.
-  J. Edmonds and R. M. Karp. Theoretical improvements in algorithmic efficiency for network flow problems. Journal of the Acm, 19(2):248–264, 1972.
-  F. Chung and L. Lu. The average distance in a random graph with given expected degrees. Internet Mathematics, 1(1):91–113, 2004.
-  R. W. Floyd. Algorithm 97: Shortest path. Communications of The ACM, 5(6):345, 1962.
-  A. V. Goldberg. Scaling algorithms for the shortest paths problem. In Acm/sigact-Siam Symposium on Discrete Algorithms, 25-27 January 1993, Austin, Texas, pages 222–231, 1993.
-  A. V. Goldberg. Scaling algorithms for the shortest paths problem. SIAM Journal on Computing, 24(3):494–504, 1995.
-  A. V. Goldberg and T. Radzik. A heuristic improvement of the bellman-ford algorithm. Applied Mathematics Letters, 6(3):3–6, 1993.
-  T. Hagerup. Improved shortest paths on the word ram. international colloquium on automata languages and programming, pages 61–72, 2000.
-  R. M. Karp. A characterization of the minimum cycle mean in a digraph. Discrete Mathematics, 23(3):309 – 311, 1978.
-  P. Lotstedt and L. R. Petzold. Numerical solution of nonlinear differential equations with algebraic constraints i: Convergence results for backward differentiation formulas. Mathematics of Computation, 46(174):491–516, 1986.
-  J. Lysgaard. A two-phase shortest path algorithm for networks with node coordinates. European Journal of Operational Research, 87(2):368–374, 1995.
-  G. E. Pantziou, P. G. Spirakis, and C. D. Zaroliagis. Efficient parallel algorithms for shortest paths in planar digraphs. Bit Numerical Mathematics, 32(2):215–236, 1992.
-  J. Pedersen, T. Knudsen, and O. Madsen. Topological routing in large-scale networks, 02 2004.
-  G. Ramalingam, J. Song, L. Joskowicz, and R. E. Miller. Solving systems of difference constraints incrementally. Algorithmica, 23(3):261–275, 1999.
-  M. Thorup. Undirected single-source shortest paths with positive integer weights in linear time. J. ACM, 46(3):362–394, May 1999.