The maximum disjoint paths problem on multi-relations social networks
Motivated by applications to social network analysis (SNA), we study the problem of finding the maximum number of disjoint uni-color paths in an edge-colored graph. We show the NP-hardness and the approximability of the problem, and propose both approximation and exact algorithms. Since short paths are much more significant in SNA, we also study the length-bounded version of the problem, in which the lengths of paths are required to be upper bounded by a fixed integer $l$. It is shown that the problem can be solved in polynomial time for $l = 3$ and is NP-hard for $l \geq 4$. We also show that the problem can be approximated with ratio $(l-1)/2 + \epsilon$ in polynomial time for any $\epsilon > 0$. In particular, for $l = 4$, we develop an efficient 2-approximation algorithm.
Keywords: algorithm, social network analysis, disjoint paths, approximation algorithm, NP-complete.
A social network is usually modeled by a graph $G = (V, E)$, in which $V$ is the set of actors and $E$ is the binary relation we are interested in. In the terminology of graph theory, $V$ is the node set and $E$ is the edge set. The connectivity, or node connectivity, of two nodes is the minimum number of nodes whose removal separates the two nodes. By Menger’s theorem, it is equal to the maximum number of disjoint paths between the two nodes, and it can also be thought of as a simpler form of the maximum flow between them. In social network analysis (SNA), connectivity is a basic measure of information flow between nodes and is also used to define cohesive groups and centralities [3, 6, 16]. Thus computing the connectivity of two nodes is an important problem in SNA.
When there is more than one kind of relation, we can model a multi-relations social network by a graph with more than one edge set. Let $k$ be a positive integer. A $k$-relations social network can be described by $G = (V, \mathcal{E})$, in which $V$ is the set of nodes and $\mathcal{E} = \{E_1, E_2, \ldots, E_k\}$ is a collection of edge sets. For $1 \le i \le k$, $E_i$ represents the $i$-th relation, and we shall say the edges in $E_i$ are of color $i$. Note that there may be edges of different colors between one pair of nodes. For a fixed $k$, a graph is called a $k$-colors graph if there are at most $k$ colored edge sets, and simply a “color graph” if the number of colors is not fixed or need not be specified.
A path is uni-color if all the edges of the path are of the same color. Two paths are internally disjoint if they have no common internal node, and a set of paths is internally disjoint if its paths are pairwise internally disjoint. In this paper we shall simply say “disjoint”. The decision version of the main problem discussed in this paper is defined as follows.
Problem: The disjoint paths problem on color graphs (CDP)
Instance: A color graph $G$, two nodes $s$ and $t$, and a positive integer $p$.
Question: Are there $p$ disjoint uni-color paths from $s$ to $t$?
In general, the graph may be directed or undirected, but in this paper we only consider undirected graphs. We shall use the name “$k$-CDP” for the decision problem whose input is a $k$-colors graph. The maximization version, denoted by the Max CDP problem, asks for the maximum number of disjoint uni-color paths between two given nodes, which will be called their colored connectivity. When there is only one color, the maximum number of disjoint paths, i.e., the traditional connectivity, can be computed in polynomial time by solving the maximum flow problem. But the colored connectivity problem, to the best of our knowledge, has not been studied yet. A related but different problem studied in the literature is the minimum color path problem, which is motivated by communication reliability and whose goal is to find a path or two disjoint paths with the minimum number of colors [11, 17]. Other related problems include the minimum color-cost path problem [7] and properly colored path problems; see [5] for example.
The motivation for studying the colored connectivity is natural. Most research in SNA considers only a single relation, but in practice there is usually more than one kind of relation. The Max CDP problem arises if the information flow or the influence spreads only along relations of the same kind. The spreading of computer viruses is one example: a virus usually spreads only through one or several particular software products. Conversation among people is another example: people usually discuss different topics with acquaintances of different relations. Disjoint paths also play an important role in data communication when security or traffic congestion is a concern. Thus the scenario of the Max CDP problem may also occur if different types of links between nodes are considered, due to either different media or different protocols.
The results and the organization of this paper are as follows. In Section 2, we first show that the CDP problem is NP-complete even for 2-colors graphs and that the Max CDP problem cannot be approximated with ratio less than two, unless NP=P. We then give an -time $k$-approximation algorithm for $k$-colors graphs. Throughout this paper, $m$ and $n$ denote the numbers of edges and nodes of the input graph $G$, respectively. An extreme example is given to show the tightness of the ratio. We also give an time exact algorithm for the problem. Since, in social network analysis, short paths are considered much more significant than long paths, we also study the length-bounded version of the Max CDP problem, namely $l$-LCDP, in which the lengths of solution paths are required to be upper bounded by a fixed integer $l$. In Section 3, we show that the $l$-LCDP problem can be solved by graph matching for $l = 3$ and is NP-hard for $l \geq 4$. We also show that, for any fixed $l$, the $l$-LCDP problem can be approximated with ratio $(l-1)/2 + \epsilon$ in polynomial time. In particular, for a $k$-colors graph, we give an efficient 2-approximation for $l = 4$ with time complexity , in which is the number of paths found by the algorithm. In most applications, it is a linear time algorithm.
2 Complexity and approximability
In this section, we show the complexity and the approximability of the CDP problem. First, in Section 2.1, we show that the problem is NP-complete; the proof also implies that the Max CDP problem is NP-hard and cannot be approximated with ratio less than two, unless NP=P. In Section 2.2, we give a simple $k$-approximation algorithm for $k$-colors graphs and an extreme example to show the sharpness of the ratio. In Section 2.3 we propose an algorithm for finding the exact solution. For a $k$-colors graph $G = (V, \mathcal{E})$, we shall denote $(V, E_i)$ by $G_i$.
To show the NP-hardness of the CDP problem, we introduce the following similar problem, named MCDP in short.
Problem: The multi-pairs disjoint paths problem on $k$-colors graphs
Instance: A $k$-colors graph $G$, $k$ pairs $(s_i, t_i)$, $1 \le i \le k$, of nodes.
Question: Is there a color-$i$ path $P_i$ from $s_i$ to $t_i$ for each $i$ such that $P_i$ and $P_j$ are internally disjoint for all $i$ and $j$ with $i \ne j$?
The reduction from the MCDP problem to the CDP problem is quite straightforward. We first assume that all nodes in the given pairs are distinct; the other case will be explained later. For an instance of the MCDP problem with two pairs $(s_1, t_1)$ and $(s_2, t_2)$, we construct a graph $G'$ from $G$ by adding two new nodes $s$ and $t$, as well as four edges $(s, s_1)$, $(t_1, t)$, $(s, s_2)$, and $(t_2, t)$. The edges $(s, s_1)$ and $(t_1, t)$ have color one and the other two new edges have color two. Clearly there exist two disjoint uni-color $(s,t)$-paths in $G'$ if and only if the answer of the MCDP problem is “yes”. Therefore if the MCDP problem is NP-complete, so is the CDP problem. In the case that , we can add a duplicate of such that has the same neighbors as , and the edges incident to are and instead. Other cases in which two nodes in the given pairs are not distinct can be handled similarly. We shall show the NP-completeness of the MCDP problem by transformation from the SAT problem. Recall that the MCDP problem on 1-color graphs is polynomial-time solvable when the number of pairs is fixed [12, 13, 14].
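The construction in the reduction can be sketched in a few lines. This is only an illustration under assumptions that are not in the original text: the graph is stored as a dictionary from a color (1 or 2) to a list of undirected edges, the four terminals are distinct, and the function name `mcdp_to_cdp` and the default node names `"s"` and `"t"` are hypothetical.

```python
def mcdp_to_cdp(colored_edges, pairs, s="s", t="t"):
    """Build a CDP instance from a two-pair MCDP instance.

    colored_edges: dict mapping a color (1 or 2) to a list of undirected edges.
    pairs: [(s1, t1), (s2, t2)], assumed to be four distinct nodes.
    s, t: names of the two new nodes, assumed not to occur in the graph.
    """
    (s1, t1), (s2, t2) = pairs
    # Copy the edge lists so the input instance is left untouched.
    new_graph = {color: list(edges) for color, edges in colored_edges.items()}
    # Two new color-1 edges attach the first pair to s and t.
    new_graph.setdefault(1, []).extend([(s, s1), (t1, t)])
    # Two new color-2 edges attach the second pair to s and t.
    new_graph.setdefault(2, []).extend([(s, s2), (t2, t)])
    return new_graph
```

Since the new node `s` has exactly one edge of each color, two disjoint uni-color paths from `s` to `t` must use one color each, which is why they correspond to the two required paths of the MCDP instance.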
Let , be the clauses of the SAT problem and , , the variables. We construct a 2-colors graph as follows. The node set mainly consists of , and some other nodes for some “switches” (explained later). The edges of color 1 and 2 are depicted in Figure 1.
is an -stages graph, in which the -th stage corresponding to a clause for , and the -th and the -th stages are and , respectively. Two consecutive stages are connected as a complete bipartite graph. Note that, for simplicity, the super scripts of nodes are not shown in the figure. Different nodes are used to represent a same literal or appearing in different clauses.
For color 2, all occurrences of a same literal, i.e, or for all , are connected to form a path, and the four paths of two consecutive variables are connected by a switch as shown in the figure.
If and only if there is a truth assignment satisfying all the clauses, there are an -path in and an -path in , which are disjoint.
If the instance of SAT problem is satisfiable, we may have an -path in passing through all literals which are assigned False. That is, for each , the path passes through if False; and through otherwise. Since this truth assignment satisfies all clauses, each clause has a literal assigned True, and therefore there is a path from to in .
Conversely, suppose that there are two such disjoint paths. Since there is an -path in , each stage has a node not used by the path in . We observe that, in , any -path passes through all occurrences of either or for every . Therefore if we assign True if it is not passed by the path in and assign False otherwise, every clause has a literal assigned True and the instance is satisfiable. ∎
Since the MCDP and the CDP problems are apparently in NP, we obtain the following theorem.
The MCDP problem is NP-complete. The CDP problem is NP-complete even for determining if there exist 2 paths in a 2-colors graph.
The Max CDP problem is NP-hard and cannot be approximated in polynomial time with ratio $2 - \epsilon$ for any $\epsilon > 0$, unless NP=P.
Since deciding whether there are one or two disjoint uni-color paths is NP-complete, it is impossible to approximate the optimum with ratio less than two in polynomial time, unless NP=P. ∎
2.2 An approximation algorithm
By we denote the connectivity of and in graph , i.e., the maximum number of disjoint paths between them. When the subscript is omitted, denotes the maximum number of disjoint paths of uni-color. We show the following greedy algorithm is a -approximation algorithm for -colors graphs.
For each color $i$, find a maximum set of disjoint $(s,t)$-paths in $G_i$. Select the color with the maximum number of paths and put these paths into the solution. Remove all internal nodes of these paths, and then repeat the previous step until no path remains.
The Max CDP problem can be -approximated in time for -colors graphs.
Apparently the optimal solution . The approximation ratio follows from that the number of paths found by the algorithm is at least . The value , i.e., connectivity in a uni-color graph, can be found by solving a maximum flow problem [9, p. 212] and therefore takes time [1, 2]. In total the algorithm takes time, or time since . ∎
Figure 2 illustrates a tight example of the $k$-approximation algorithm. The optimal solution contains $k$ disjoint paths (the horizontal ones), one for each color. But if we choose the bold path of color 1 at the first iteration, the algorithm will find only one path.
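The greedy procedure of this subsection can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the graph is a dictionary from color to a list of undirected edges, computes node-disjoint paths with a plain BFS augmenting-path flow on a node-split graph (so it does not achieve the running time stated in the theorem), and handles direct edges between the two terminals up front. All function names are hypothetical.

```python
from collections import defaultdict, deque

def disjoint_paths(edges, s, t, banned=frozenset()):
    """Maximum set of internally node-disjoint s-t paths in one uni-color
    graph, ignoring every node in `banned`."""
    cap = defaultdict(int)   # residual capacities
    adj = defaultdict(set)   # residual adjacency (both directions)
    arcs = []                # original unit-capacity arcs

    def add_arc(a, b):
        if cap[(a, b)] == 0:
            cap[(a, b)] = 1
            arcs.append((a, b))
            adj[a].add(b)
            adj[b].add(a)

    nodes = set()
    for u, v in edges:
        if u in banned or v in banned or u == v:
            continue
        nodes.update((u, v))
        add_arc((u, "out"), (v, "in"))
        add_arc((v, "out"), (u, "in"))
    for x in nodes - {s, t}:
        add_arc((x, "in"), (x, "out"))  # an internal node is used at most once

    src, dst = (s, "out"), (t, "in")
    while True:  # augment along BFS paths until none is left
        parent, queue = {src: None}, deque([src])
        while queue and dst not in parent:
            a = queue.popleft()
            for b in adj[a]:
                if b not in parent and cap[(a, b)] > 0:
                    parent[b] = a
                    queue.append(b)
        if dst not in parent:
            break
        b = dst
        while parent[b] is not None:
            a = parent[b]
            cap[(a, b)] -= 1
            cap[(b, a)] += 1
            b = a

    # Decompose the unit flow into paths by following saturated arcs.
    flow = [(a, b) for a, b in arcs if cap[(a, b)] == 0]
    nxt = {a: b for a, b in flow if a != src}
    paths = []
    for a, b in flow:
        if a != src:
            continue
        path, cur = [s], b
        while cur != dst:
            if cur[1] == "in":
                path.append(cur[0])
            cur = nxt[cur]
        paths.append(path + [t])
    return paths

def greedy_max_cdp(colored_edges, s, t):
    """Greedy k-approximation: repeatedly take the best single color."""
    solution, rest = [], {}
    for color, edges in colored_edges.items():
        if any({u, v} == {s, t} for u, v in edges):
            solution.append((color, [s, t]))  # a direct edge is always usable
        rest[color] = [(u, v) for u, v in edges if {u, v} != {s, t}]
    used = set()
    while True:
        color, paths = max(
            ((c, disjoint_paths(e, s, t, used)) for c, e in rest.items()),
            key=lambda item: len(item[1]),
            default=(None, []),
        )
        if not paths:
            return solution
        for path in paths:
            solution.append((color, path))
            used.update(path[1:-1])  # remove the internal nodes
```

Every iteration removes at least one internal node, so the loop terminates after at most as many rounds as there are nodes.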
2.3 An exact algorithm
First, for any color $i$, if $(s,t) \in E_i$, this single-edge path must be in the optimal solution, so we can put it into the solution and remove the edge. Therefore, in the remainder of this paper, we assume $(s,t) \notin E_i$ for any $i$. For a $k$-colors graph $G$, define a node coloring $c: V \to \{1, 2, \ldots, k\}$. Two nodes $u$ and $v$ are said to be assigned the same color if $c(u) = c(v)$. For convenience, nodes $s$ and $t$ are regarded as having the same color as any node in any coloring. Let $E_i^c$, $1 \le i \le k$, denote the subset of $E_i$ in which the two endpoints are assigned the same color by $c$. Let $E^c = \bigcup_i E_i^c$ and let $G^c$ be the uni-color graph induced by the edge set $E^c$. Suppose that $P^*$ is an optimal solution of the Max CDP problem. Let $c^*$ be a node coloring such that $c^*(v) = i$ if $v$ is on a path of color $i$ in $P^*$, and $c^*(v)$ is arbitrary otherwise.
We can observe that any path in must also be a path in and any path in corresponds to a uni-color path in . Thus, equals the -connectivities on and can be computed in time. If we individually solve the maximum flow problems for all colorings, the total time complexity will be . By the following observations, the complexity can be reduced to . Using the generalized Gray code, all the colorings can be arranged in an order such that two consecutive colorings differ at only one node, and thus the maximum flow corresponding to can be obtained from that corresponding to by performing at most two breadth-first-searches on the residual graph. The next theorem states the result but the detailed proof is omitted here.
There exists an time algorithm for the Max CDP problem on -colors graphs.
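The coloring-based idea behind the exact algorithm can be illustrated by plain enumeration. The sketch below is an assumption-laden simplification: it tries every node coloring independently and recomputes the connectivity from scratch, so it does not use the Gray-code ordering or the incremental flow update described above. It also assumes that an edge of color $i$ is kept only when each of its endpoints other than the two terminals is assigned color $i$, and that no edge joins the two terminals directly. The function names are hypothetical.

```python
from collections import defaultdict, deque
from itertools import product

def connectivity(edges, s, t):
    """Maximum number of internally node-disjoint s-t paths."""
    cap, adj = defaultdict(int), defaultdict(set)

    def add_arc(a, b):
        cap[(a, b)] = 1
        adj[a].add(b)
        adj[b].add(a)

    for u, v in edges:
        add_arc((u, "out"), (v, "in"))
        add_arc((v, "out"), (u, "in"))
        for x in (u, v):
            if x not in (s, t):
                add_arc((x, "in"), (x, "out"))  # unit node capacity

    src, dst, value = (s, "out"), (t, "in"), 0
    while True:
        parent, queue = {src: None}, deque([src])
        while queue and dst not in parent:
            a = queue.popleft()
            for b in adj[a]:
                if b not in parent and cap[(a, b)] > 0:
                    parent[b] = a
                    queue.append(b)
        if dst not in parent:
            return value
        value += 1
        b = dst
        while parent[b] is not None:
            a = parent[b]
            cap[(a, b)] -= 1
            cap[(b, a)] += 1
            b = a

def exact_max_cdp(colored_edges, s, t):
    """Try every coloring of the internal nodes and keep the best value."""
    internal = list({x for e in colored_edges.values() for uv in e for x in uv} - {s, t})
    colors = list(colored_edges)
    best = 0
    for assignment in product(colors, repeat=len(internal)):
        c = dict(zip(internal, assignment))
        # Keep a color-i edge only if its non-terminal endpoints have color i.
        kept = [(u, v) for i, e in colored_edges.items() for u, v in e
                if all(x in (s, t) or c[x] == i for x in (u, v))]
        best = max(best, connectivity(kept, s, t))
    return best
```

The enumeration visits one coloring per assignment of colors to internal nodes, so it is only practical for very small graphs.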
3 Length-bounded cases
In this section we discuss the Max CDP problem with bounded length. The length of a path is the number of edges in the path. When the path lengths are required to be upper bounded by a fixed integer $l$, we call the problem $l$-LCDP. An edge will be denoted by , and denote a path of color and visiting in this order. The cases of $l \le 2$ can be easily solved, and we shall discuss the cases of $l = 3$ and 4.
3.1 A polynomial time algorithm for 3-LCDP
The set of all common neighbors of nodes and of color is denoted by . Recall that we have assumed for all , and we need only consider paths of length at least 2. An -path of length two has the form , i.e., any co-neighbor of and may contribute a path. The next claim comes from that any -path of length two is disjoint to any others of length 2 and may intersect at most one -path of longer length.
If for any , there is an optimal solution of the 3-LCDP problem containing the path .
Algorithm 1 is the proposed method for solving the 3-LCDP problem exactly. Besides the above claim, the correctness of the algorithm is due to the next claim which can be shown by observing that a set of disjoint -paths corresponds to a matching on , and vice versa. We remind that defining as a set of ordered pairs is only for the sake of making step 9 easier. The maximum matching on a directed graph is the same as the one on an undirected graph.
Suppose that . A maximum matching of the graph constructed in Algorithm 1 corresponds to an optimal solution of the 3-LCDP problem.
The time complexity is dominated by the step of finding a maximum cardinality matching of a general graph, which can be done in time .
The 3-LCDP problem on color graphs can be exactly solved in time.
3.2 The complexity of 4-LCDP and an approximation algorithm
For the length-bounded case, the notations and are analogous to the ones without superscript but those paths are of length at most .
The $l$-LCDP problem on $k$-colors graphs is NP-hard for fixed $k \geq 2$ and $l \geq 4$.
It is sufficient to show the case of and . We show the NP-hardness by transforming from a restrict version of the SAT problem in which there are at most 3 occurrences of each variable. This version of SAT problem still remains NP-complete [4, 15]. Let , , be the clauses and , , the variables. For any variable , if all the occurrences of are positive, we can assign True and remove from all clauses. The case of all occurrences are negative is similar. Therefore we can assume the occurrences of each variable are neither all positive nor all negative. As a result, both and occur at most twice for any . Given an instance of the restrict SAT problem, we construct a 2-colors graph as in Figure 3. Since the number of occurrences of each literal is at most two, any -path of any color has length at most 4.
Since both the degree of and are , , and the maximum is achieved if for any clause there is a literal not used in . On the other hand, since the degree of is . We can also easily find disjoint -paths in as long as for each we use either or as the internal nodes. If the SAT instance is satisfiable, let be a truth assignment satisfying all the clauses. We choose as internal nodes in if is assigned False in ; and otherwise. Then we can have disjoint -paths of color 1 since there exists a literal assigned True in each clause and thus not used in color 2. The total number of disjoint paths is . Conversely if there are disjoint -paths, there are exactly paths in and paths in . Therefore for each variable either itself or its negation is used in . Since there are disjoint paths in , each clause contains at least one literal not used in . So we can assign True if it is not used in and False otherwise, and all the clauses are satisfied. ∎
The $k$-approximation algorithm in Section 2.2 also works for the length-bounded case. We may achieve a better approximation ratio for small $l$.
For any fixed $l$ and $\epsilon > 0$, the $l$-LCDP problem can be approximated with ratio $(l-1)/2 + \epsilon$ in polynomial time.
Problem: The Maximum Set Packing (MSP) problem
Instance: A collection of -element subsets , , of a universal set of total elements.
Goal: A maximum disjoint sub-collection of .
By transforming to the MSP problem, the $l$-LCDP problem can be approximated with ratio $(l-1)/2 + \epsilon$ for any $\epsilon > 0$. A direct transformation is as follows. Let $(G, s, t)$ be an instance of the $l$-LCDP problem.
For each uni-color -path of length at most , create a subset consisting of the internal nodes of the path. There are at most subsets and for each .
The elements are all the nodes in the graph except and .
Any disjoint sub-collection corresponds to a set of disjoint uni-color paths.
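The transformation above can be sketched as a short enumeration. This is an illustrative sketch only: it assumes the graph is a dictionary from color to a list of undirected edges, enumerates the uni-color paths by depth-first search, and skips direct edges between the two terminals, which the paper assumes have already been removed. The function name `path_subsets` and the symbol `l` for the length bound are assumptions.

```python
from collections import defaultdict

def path_subsets(colored_edges, s, t, l):
    """Map each set of internal nodes to one uni-color s-t path of length
    at most l that uses exactly those internal nodes."""
    subsets = {}
    for color, edges in colored_edges.items():
        adj = defaultdict(set)
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)
        stack = [[s]]  # partial simple paths starting at s
        while stack:
            path = stack.pop()
            for w in adj[path[-1]]:
                if w == t:
                    if len(path) >= 2:  # skip the direct edge (s, t)
                        subsets.setdefault(frozenset(path[1:]), (color, path + [t]))
                elif w not in path and len(path) < l:
                    # Extending by w still leaves room for a final edge to t.
                    stack.append(path + [w])
    return subsets
```

Two paths are internally disjoint exactly when their subsets are disjoint, so any set packing routine can be run on the keys of the returned dictionary and translated back to paths through its values.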
The stop condition of the while-loop can be implemented by enumerating all possible subsets, testing if they are disjoint in time, and counting the intersected subsets in in time. Since is fixed, , and is also a constant determined by , this step takes . Since is increased at least one after each iteration and bounded by , the naive implementation has time complexity , which is polynomial for fixed and . Theorem 3.3 follows from Eq. (1), the transformation and the above analysis of the time complexity.
3.3 An efficient 2-approximation algorithm for 4-LCDP
In particular, when and , by substituting , the approximation ratio by Eq. (1) is . That is, it takes time to compute a 2-approximation of the 4-LCDP problem. Although this running time is polynomial, it becomes intractable even for graphs of moderate size. In the following, we aim at developing a more efficient algorithm for and . Let denote the solution found so far, in which is the set of internal nodes of an -path. Let be the nodes not used yet. When , the while-condition can be implemented by
For each , determine if there are two disjoint -paths of length at most 4 in .
The key point is how to determine if in a color graph without generating all possible paths. We shall use the following notations. The distance, or shortest path length, between and in graph is denoted by . A node is an -cut node in graph if its removal separates the two nodes, i.e., after removing . The set of all such cut nodes is denoted by .
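The two notions just introduced, distance and cut node, can be sketched with breadth-first search. This is a naive illustration under assumptions: each uni-color graph is given as a dictionary from a node to the set of its neighbors, and the cut nodes are found by deleting one node at a time, which is simpler but slower than what a linear-time test procedure would need. The function names are hypothetical.

```python
from collections import deque

def distance(adj, s, t, removed=frozenset()):
    """Shortest path length from s to t, or infinity if t is unreachable
    once the nodes in `removed` are deleted."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return dist[u]
        for v in adj.get(u, ()):
            if v not in dist and v not in removed:
                dist[v] = dist[u] + 1
                queue.append(v)
    return float("inf")

def cut_nodes(adj, s, t):
    """Nodes other than s and t whose removal disconnects s from t."""
    return {v for v in adj
            if v not in (s, t) and distance(adj, s, t, {v}) == float("inf")}
```

If the two terminals are already disconnected, every other node satisfies this definition, so the caller is expected to check the distance first.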
Algorithm 3 is correct and takes time.
The algorithm returns True iff for some color or there are two uni-color disjoint paths of two colors. Clearly, what we need to show is the correctness of the procedure Test.
By the assumption that for all , we need not consider the case that or . The test procedure starts with a repeat-until loop to remove any -cut node of one graph from the other. Note that the loop is necessary since removing nodes from a graph may result in new cut nodes. But the loop will only be executed at most four times since each graph has one -cut node originally and can have at most three -cut nodes or otherwise and will have distance more than 4 (including , i.e., disconnected).
Step 6 deals with the case that the distance between and in either graph exceeds 4 or there exists any common -cut node. At the beginning of step 7, we have that and the distance between and at either graph is at least two. Let . If , there exists a (unique) -path of color . Immediately the output should be True since and is not in . The case that is similar. If , there is a path or in . Since is not in and , recalling that we have removed any -cut node of from , the result should also be True.
The remaining case is . Recall that each graph has at least one -cut node. Any length-4 -path in contains exactly three internal nodes, said , in which and therefore not in . Furthermore neither nor is in . Hence, removing the three nodes destroys at most two paths in . If there are more than two, not disjoint surely, length-4 -paths in , the output should be True. Similarly it holds if there are more than two such paths in . The remaining case is that there are one or two paths in either graph, and the answer can be obtained by the following method. First we choose a path in and check if the removal of the internal nodes separates and in . If not, we find two disjoint paths. Otherwise we choose the other path in if any, and do it again.
By the above discussion, the test procedure takes linear time, i.e., . The whole algorithm calls the test procedure for each pair of and , and therefore the total time complexity is since the other steps of Algorithm 3 can be done in time. ∎
Combining Algorithms 2 and 3, we obtain the next theorem. The time complexity is obtained as follows. To implement the while-condition of Algorithms 2, we need to call Algorithm 3 at most times, where is the number of paths found so far. Let be the number of paths found by the algorithm. Since the while-loop may be executed at most times, the total time complexity is .
There exists an time 2-approximation algorithm for the 4-LCDP problem on a -colors graph, in which is the number of paths found by the algorithm.
Finally we would like to remark the following. In most of the applications, both and are small integers, and thus the approximation algorithm runs in linear time. Furthermore, since we need only consider the graphs induced by for each color , the algorithm is in fact a local algorithm and is therefore efficient even for large-scale social networks.
-  Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. MIT Press and McGraw-Hill (2001)
-  Dinic, E.A.: Algorithm for solution of a problem of maximum flow in networks with power estimation. Sov. Math. Dokl. 11, 1277–1280 (1970)
-  Freeman, L.C., Borgatti, S.P., White, D.R.: Centrality in valued graphs: A measure of betweenness based on network flow. Soc. Netw. 13(2), 141–154 (1991)
-  Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, New York (1979)
-  Gutin, G., Kim, E.J.: Properly coloured cycles and paths: results and open problems. Golumbic Festschrift, LNCS 5420, 200–208 (2009)
-  Hanneman, R.A., Riddle, M.: Introduction to Social Network Methods
-  Hassin, R., Monnot, J., Segev D.: Approximation algorithms and hardness results for labeled connectivity problems. J. Comb. Optim. 14(4), 437–453 (2007)
-  Hurkens, C.A.J., Schrijver, A.: On the size of systems of sets every of which have an SDR, with an application to the worst-case ratio of heuristics for packing problems. SIAM J. Discret. Math. 2, 68–72 (1989)
-  McHugh, J.A.: Algorithmic Graph Theory. Prentice Hall (1990)
-  Micali, S., Vazirani, V.V.: An algorithm for finding maximum matching in general graphs. FOCS, 17–27 (1980)
-  Mohan, G., Murthy, C.: Lightpath restoration in WDM optical networks. IEEE Netw., 24–32 (2000)
-  Robertson, N., Seymour, P.D.: Graph minors. XIII. The disjoint paths problem. J. Comb. Theory, Series B 63, 65–110 (1995)
-  Seymour, P.D.: Disjoint paths in graphs. Discret. Math. 29, 293–309 (1980)
-  Shiloach Y.: A polynomial solution to the undirected two paths problem. J. ACM 27, 445–456 (1980)
-  Tovey, C.A.: A simplified NP-complete satisfiability problem. Discret. Appl. Math. 8(1), 85–89 (1984)
-  Wasserman S., Faust, K.: Social Network Analysis, Cambridge University Press, Cambridge (1994)
-  Yuan, S., Varma, S., Jue, J.P.: Minimum-color path problems for reliability in mesh networks. IEEE INFOCOM 2005, volume 4, 2658–2669 (2005)