The maximum disjoint paths problem on multi-relations social networks
Abstract
Motivated by applications to social network analysis (SNA), we study the problem of finding the maximum number of disjoint unicolor paths in an edge-colored graph. We show the NP-hardness and the approximability of the problem, and propose both approximation and exact algorithms. Since short paths are much more significant in SNA, we also study the length-bounded version of the problem, in which the lengths of paths are required to be upper bounded by a fixed integer $l$. It is shown that the problem can be solved in polynomial time for $l = 3$ and is NP-hard for $l \geq 4$. We also show that the problem can be approximated with ratio $(l-1)/2 + \epsilon$ in polynomial time for any $\epsilon > 0$. Particularly, for $l = 4$, we develop an efficient 2-approximation algorithm.
Keywords:
algorithm, social network analysis, disjoint paths, approximation algorithm, NP-complete.
1 Introduction
A social network is usually modeled by a graph $G = (V, E)$, in which $V$ is the set of actors and $E$ is the binary relation we are interested in. In the terminology of graph theory, $V$ is the node set and $E$ is the edge set. The connectivity, or node connectivity, of two nodes is the minimum number of nodes whose removal separates the two nodes. By Menger’s theorem, it is equal to the maximum number of disjoint paths between the two nodes, and it can also be thought of as a simple form of the maximum flow between them. In social network analysis (SNA), connectivity is a basic measure of information flow between nodes and is also used to define cohesive groups and centralities [3, 6, 16]. Thus computing the connectivity of two nodes is an important problem in SNA.
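The link between node connectivity and maximum flow can be illustrated with a short sketch. The function name `connectivity`, the edge-list input format, and the node-splitting construction below are assumptions of this example rather than details taken from the paper; it uses plain breadth-first augmenting paths and is meant only to show the idea on small graphs.

```python
from collections import defaultdict, deque


def connectivity(edges, s, t):
    """Maximum number of internally disjoint s-t paths in an undirected graph.

    `edges` is an iterable of pairs (u, v), each undirected edge listed once.
    Every node v other than s and t is split into (v, "in") -> (v, "out")
    with capacity 1, so at most one path can pass through it.
    """
    cap = defaultdict(int)   # residual capacities
    adj = defaultdict(set)   # residual adjacency (both directions)

    def arc(u, v):
        cap[(u, v)] += 1
        adj[u].add(v)
        adj[v].add(u)

    def inn(v):
        return v if v in (s, t) else (v, "in")

    def out(v):
        return v if v in (s, t) else (v, "out")

    edges = list(edges)
    for v in {x for e in edges for x in e} - {s, t}:
        arc(inn(v), out(v))
    for u, v in edges:
        arc(out(u), inn(v))
        arc(out(v), inn(u))

    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1
```

For example, `connectivity([("s", "a"), ("a", "t"), ("s", "b"), ("b", "t")], "s", "t")` counts the two routes through `a` and `b`.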
When there is more than one kind of relation, we can model a multi-relations social network by a graph with more than one edge set. Let $k$ be a positive integer. A $k$-relations social network can be described by $G = (V, \mathcal{E})$, in which $V$ is the set of nodes and $\mathcal{E} = \{E_1, E_2, \ldots, E_k\}$ is a collection of edge sets. For $1 \le i \le k$, $E_i$ represents the $i$-th relation, and we shall say the edges in $E_i$ are of color $i$. Note that there may be edges of different colors between one pair of nodes. For a fixed $k$, a graph is called a $k$-colors graph if there are at most $k$ colored edge sets, and simply a “color graph” if the number of colors is not fixed or need not be specified.
A path is unicolor if all the edges of the path are of the same color. Two paths are internally disjoint if they have no common internal node, and a set of paths is internally disjoint if the paths are mutually internally disjoint. In this paper we shall simply say “disjoint” to mean internally disjoint. The decision version of the main problem discussed in this paper is defined as follows.
Problem: The disjoint paths problem on color graphs (CDP)
Instance: A color graph $G$, two nodes $s$ and $t$, and a positive integer.
Question: Are there at least as many disjoint unicolor paths from $s$ to $t$ as the given integer?
In general, the graph may be directed or undirected, but in this paper we only consider undirected graphs. We shall use the name “$k$-CDP” for the decision problem in which the input is a $k$-colors graph. The maximization version, called the Max CDP problem, asks for the maximum number of disjoint unicolor paths between two given nodes, which will be called their colored connectivity. When there is only one color, the maximum number of disjoint paths, i.e., the traditional connectivity, can be computed in polynomial time by solving a maximum flow problem. But the colored connectivity problem, to the best of our knowledge, has not been studied yet. A related but different problem studied in the literature is the minimum color path problem, which is motivated by communication reliability and whose goal is to find a path or two disjoint paths with the minimum number of colors [11, 17]. Other related problems include the minimum color-cost path problem [7] and properly colored path problems; see [5] for example.
The motivation for studying colored connectivity is natural. Most research in SNA considers only a single relation, but in practice there is more than one kind of relation. The Max CDP problem arises if information flow or influence spreads only along relations of the same kind. Computer virus spreading is an example: a virus usually spreads only along one or several particular pieces of software. Conversation among people is another example: people usually talk about different topics with those to whom they are related in different ways. Disjoint paths also play an important role in data communication when security or traffic congestion is a concern. Thus the scenario of the Max CDP problem may also occur if different types of links between nodes are considered, due to either different media or different protocols.
The results and the organization of this paper are as follows. In Section 2, we first show that the CDP problem is NP-complete even for 2-colors graphs and that the Max CDP problem cannot be approximated with ratio less than two unless NP=P. We then give a polynomial-time $k$-approximation algorithm for $k$-colors graphs, and an extreme example is given to show the tightness of the ratio. Throughout this paper, $m$ and $n$ denote the numbers of edges and nodes of the input graph $G$, respectively. We also give an exact algorithm for the problem. Since, in social network analysis, short paths are considered much more significant than long paths, we also study the length-bounded version of the Max CDP problem, namely $l$-LCDP, in which the lengths of solution paths are required to be upper bounded by a fixed integer $l$. In Section 3, we show that the $l$-LCDP problem can be solved by graph matching for $l = 3$ and is NP-hard for $l \geq 4$. We also show that, for any fixed $l$, the $l$-LCDP problem can be approximated with ratio $(l-1)/2 + \epsilon$ in polynomial time. Particularly, for a $k$-colors graph, we give an efficient 2-approximation algorithm for $l = 4$ whose time complexity depends on the number of paths found by the algorithm. In most applications, it is a linear time algorithm.
2 Complexity and approximability
In this section, we show the complexity and the approximability of the CDP problem. First, in Section 2.1, we show that the problem is NP-complete, and the proof also implies that the Max CDP problem is NP-hard and cannot be approximated with ratio less than two unless NP=P. In Section 2.2, we give a simple approximation algorithm for $k$-colors graphs and an extreme example to show the sharpness of the ratio. In Section 2.3 we propose an algorithm for finding the exact solution. For a $k$-colors graph $G$, we shall denote the graph $(V, E_i)$ by $G_i$.
2.1 NP-completeness
To show the NP-hardness of the CDP problem, we introduce the following similar problem, named MCDP for short.
Problem: The multi-pairs disjoint paths problem on $k$-colors graphs
Instance: A $k$-colors graph $G$ and $k$ pairs $(s_i, t_i)$, $1 \le i \le k$, of nodes.
Question: Is there a color-$i$ path $P_i$ from $s_i$ to $t_i$ for each $i$ such that $P_i$ and $P_j$ are internally disjoint for all $i \neq j$?
The reduction from the MCDP problem to the CDP problem is quite straightforward. We first assume that all nodes in the given pairs are distinct; the other case will be explained later. For an instance of the MCDP problem with $k = 2$ and pairs $(s_1, t_1)$ and $(s_2, t_2)$, we construct a graph $G'$ from $G$ by adding two new nodes $s$ and $t$, as well as four edges $(s, s_1)$, $(t_1, t)$, $(s, s_2)$, and $(t_2, t)$. The edges $(s, s_1)$ and $(t_1, t)$ have color one and the other two new edges have color two. Clearly there exist two disjoint unicolor paths from $s$ to $t$ in $G'$ if and only if the answer to the MCDP instance is “yes”. Therefore if the MCDP problem is NP-complete, so is the CDP problem. In the case that $s_1 = s_2$, we can add a duplicate $s_1'$ of $s_1$ such that $s_1'$ has the same neighbors as $s_1$, and the edges incident to $s$ are $(s, s_1)$ and $(s, s_1')$ instead. Other cases in which two nodes in the given pairs are not distinct can be handled similarly. We shall show the NP-completeness of the MCDP problem by transformation from the SAT problem. Recall that the MCDP problem on 1-color graphs is polynomial-time solvable when the number of pairs is fixed [12, 13, 14].
Let , be the clauses of the SAT problem and , , the variables. We construct a 2colors graph as follows. The node set mainly consists of , and some other nodes for some “switches” (explained later). The edges of color 1 and 2 are depicted in Figure 1.
is an stages graph, in which the th stage corresponding to a clause for , and the th and the th stages are and , respectively. Two consecutive stages are connected as a complete bipartite graph. Note that, for simplicity, the super scripts of nodes are not shown in the figure. Different nodes are used to represent a same literal or appearing in different clauses.
For color 2, all occurrences of a same literal, i.e, or for all , are connected to form a path, and the four paths of two consecutive variables are connected by a switch as shown in the figure.
Lemma 1
There is a truth assignment satisfying all the clauses if and only if there are an $(s_1, t_1)$-path in $G_1$ and an $(s_2, t_2)$-path in $G_2$ that are disjoint.
Proof
If the SAT instance is satisfiable, we can take an $(s_2, t_2)$-path in $G_2$ passing through all literals which are assigned False. That is, for each $i$, the path passes through $x_i$ if $x_i$ is False, and through $\bar{x}_i$ otherwise. Since this truth assignment satisfies all clauses, each clause has a literal assigned True, and therefore there is a path from $s_1$ to $t_1$ in $G_1$ that uses only literals assigned True and is thus disjoint from the path in $G_2$.
Conversely, suppose that there are two such disjoint paths. Since there is an $(s_1, t_1)$-path in $G_1$, each stage has a node not used by the path in $G_2$. We observe that, in $G_2$, any $(s_2, t_2)$-path passes through all occurrences of either $x_i$ or $\bar{x}_i$ for every $i$. Therefore, if we assign $x_i$ True if it is not passed by the path in $G_2$ and False otherwise, every clause has a literal assigned True and the instance is satisfiable. ∎
Since the MCDP and the CDP problems are apparently in NP, we obtain the following theorem.
Theorem 2.1
The MCDP problem is NP-complete. The CDP problem is NP-complete even for determining if there exist 2 paths in a 2-colors graph.
Corollary 1
The Max CDP problem is NP-hard and cannot be approximated in polynomial time with ratio $2 - \epsilon$ for any $\epsilon > 0$, unless NP=P.
Proof
Since deciding whether there are one or two disjoint unicolor paths is NP-complete, it is impossible to approximate the optimum with ratio less than two in polynomial time, unless NP=P. ∎
2.2 An approximation algorithm
By we denote the connectivity of and in graph , i.e., the maximum number of disjoint paths between them. When the subscript is omitted, denotes the maximum number of disjoint paths of unicolor. We show the following greedy algorithm is a approximation algorithm for colors graphs.
For each color $i$, find a maximum set of disjoint paths of color $i$. Select the color for which this set is largest and put these paths into the solution. Remove all internal nodes of these paths, and then repeat the previous step until no path remains.
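The greedy procedure above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the names `disjoint_paths` and `greedy_max_cdp`, the dictionary `color_edges` mapping each color to a list of undirected edges (each listed once), and the use of breadth-first augmenting paths on a node-split graph are all choices made for this example. Direct edges between `s` and `t` are taken once per color up front, since they have no internal node to remove.

```python
from collections import defaultdict, deque


def disjoint_paths(edges, s, t):
    """Return a maximum set of internally disjoint s-t paths as node lists."""
    cap, adj = defaultdict(int), defaultdict(set)

    def arc(u, v):
        cap[(u, v)] += 1
        adj[u].add(v)
        adj[v].add(u)

    def inn(v):
        return v if v in (s, t) else (v, "in")

    def out(v):
        return v if v in (s, t) else (v, "out")

    for v in {x for e in edges for x in e} - {s, t}:
        arc(inn(v), out(v))          # node capacity 1
    for u, v in edges:
        arc(out(u), inn(v))
        arc(out(v), inn(u))
    full = dict(cap)                 # capacities before any flow is pushed

    while True:                      # augment along shortest residual paths
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break
        v = t
        while parent[v] is not None:
            cap[(parent[v], v)] -= 1
            cap[(v, parent[v])] += 1
            v = parent[v]

    succ = defaultdict(list)         # arcs that carry one unit of flow
    for (u, v), c in full.items():
        if c - cap[(u, v)] > 0:
            succ[u].append(v)
    paths = []
    for v in succ[s]:
        path = [s]
        while v != t:                # follow the flow through split nodes
            if v[1] == "in":
                path.append(v[0])
            v = succ[v][0]
        paths.append(path + [t])
    return paths


def greedy_max_cdp(color_edges, s, t):
    """Greedy selection: repeatedly take the color with the most disjoint paths."""
    direct = {(s, t), (t, s)}
    # A direct s-t edge is a path with no internal node; take it once per color.
    solution = [(c, [s, t]) for c, es in color_edges.items() if direct & set(es)]
    removed = set()
    while True:
        best_color, best_paths = None, []
        for color, es in color_edges.items():
            alive = [(u, v) for u, v in es if (u, v) not in direct
                     and u not in removed and v not in removed]
            paths = disjoint_paths(alive, s, t)
            if len(paths) > len(best_paths):
                best_color, best_paths = color, paths
        if not best_paths:
            return solution
        for path in best_paths:
            solution.append((best_color, path))
            removed.update(path[1:-1])
```

Each round removes at least one internal node, so the loop terminates after at most as many rounds as there are nodes.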
Theorem 2.2
The Max CDP problem can be approximated in time for colors graphs.
Proof
Apparently the optimal solution . The approximation ratio follows from that the number of paths found by the algorithm is at least . The value , i.e., connectivity in a unicolor graph, can be found by solving a maximum flow problem [9, p. 212] and therefore takes time [1, 2]. In total the algorithm takes time, or time since . ∎
Figure 2 illustrates a tight example for the approximation algorithm. The optimal solution contains $k$ disjoint paths (the horizontal ones), one for each color. But if we choose the bold path of color 1 at the first iteration, the algorithm will find only one path.
2.3 An exact algorithm
First, for any color , if , this path of single edge must be in the optimal solution, and we can put it into the solution and remove this edge. Therefore, in the remaining paragraphs of this paper, we assume for any . For a colors graph , define a node coloring . Two nodes are said to be assigned the same color if . For the convenience, nodes and are thought of having the same color as any node in any coloring. Let , , denote the subset of in which the two endpoints are assigned the same color by . Let and be the unicolor graph induced by the edge set . Suppose that is an optimal solution of the Max CDP problem. Let be a node coloring such that if is on a path of color in ; and is arbitrary otherwise.
We can observe that any path in must also be a path in and any path in corresponds to a unicolor path in . Thus, equals the connectivities on and can be computed in time. If we individually solve the maximum flow problems for all colorings, the total time complexity will be . By the following observations, the complexity can be reduced to . Using the generalized Gray code, all the colorings can be arranged in an order such that two consecutive colorings differ at only one node, and thus the maximum flow corresponding to can be obtained from that corresponding to by performing at most two breadthfirstsearches on the residual graph. The next theorem states the result but the detailed proof is omitted here.
Theorem 2.3
There exists an time algorithm for the Max CDP problem on colors graphs.
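The idea of searching over node colorings can be illustrated by a plain enumeration. The sketch below is an assumption-laden example, not the algorithm of Theorem 2.3: it tries every assignment of a color to each node other than `s` and `t`, keeps only the edges whose endpoints agree with the edge color (with `s` and `t` matching every color), and solves one connectivity problem per coloring. It omits the Gray-code ordering and the incremental flow updates described above, assumes there is no direct edge between `s` and `t`, and uses hypothetical names (`exact_max_cdp`, `color_edges`).

```python
from collections import defaultdict, deque
from itertools import product


def connectivity(edges, s, t):
    """Maximum number of internally disjoint s-t paths (node-split max flow)."""
    cap, adj = defaultdict(int), defaultdict(set)

    def arc(u, v):
        cap[(u, v)] += 1
        adj[u].add(v)
        adj[v].add(u)

    def inn(v):
        return v if v in (s, t) else (v, "in")

    def out(v):
        return v if v in (s, t) else (v, "out")

    edges = list(edges)
    for v in {x for e in edges for x in e} - {s, t}:
        arc(inn(v), out(v))
    for u, v in edges:
        arc(out(u), inn(v))
        arc(out(v), inn(u))
    flow = 0
    while True:
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        v = t
        while parent[v] is not None:
            cap[(parent[v], v)] -= 1
            cap[(v, parent[v])] += 1
            v = parent[v]
        flow += 1


def exact_max_cdp(color_edges, s, t):
    """Try every node coloring; exponential, intended for tiny graphs only."""
    colors = list(color_edges)
    inner = list({x for es in color_edges.values() for e in es for x in e} - {s, t})
    best = 0
    for assignment in product(colors, repeat=len(inner)):
        f = dict(zip(inner, assignment))

        def ok(x, c):                # s and t match every color
            return x in (s, t) or f[x] == c

        kept = {(u, v) for c, es in color_edges.items()
                for u, v in es if ok(u, c) and ok(v, c)}
        best = max(best, connectivity(kept, s, t))
    return best
```

Because every internal node carries exactly one color, any path that survives the filtering uses edges of a single color, so the connectivity of the filtered graph counts disjoint unicolor paths.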
3 Length-bounded cases
In this section we discuss the Max CDP problem with bounded length. The length of a path is the number of edges in this path. When the path lengths are required to be upper bounded by a fixed integer $l$, we name the problem $l$-LCDP. An edge will be denoted by , and denote a path of color and visiting in this order. The cases of $l \le 2$ can be easily solved, and we shall discuss the cases of $l = 3$ and 4.
3.1 A polynomial time algorithm for 3-LCDP
The set of all common neighbors of nodes and of color is denoted by . Recall that we have assumed for all , and we need only consider paths of length at least 2. An path of length two has the form , i.e., any coneighbor of and may contribute a path. The next claim comes from that any path of length two is disjoint to any others of length 2 and may intersect at most one path of longer length.
Claim
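If a node $v$ is a common neighbor of $s$ and $t$ of color $i$ for some $i$, there is an optimal solution of the 3-LCDP problem containing the length-two path of color $i$ through $v$.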
Algorithm 1 is the proposed method for solving the 3LCDP problem exactly. Besides the above claim, the correctness of the algorithm is due to the next claim which can be shown by observing that a set of disjoint paths corresponds to a matching on , and vice versa. We remind that defining as a set of ordered pairs is only for the sake of making step 9 easier. The maximum matching on a directed graph is the same as the one on an undirected graph.
Claim
Suppose that $s$ and $t$ have no common neighbor of any color. A maximum matching of the graph constructed in Algorithm 1 corresponds to an optimal solution of the 3-LCDP problem.
The time complexity is dominated by the step of finding a maximum cardinality matching of a general graph, which can be done in time [10].
Theorem 3.1
The 3LCDP problem on color graphs can be exactly solved in time.
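The two claims above suggest the following sketch. It is not Algorithm 1 itself, whose listing is not reproduced here: the function names, the `color_edges` dictionary, and the construction of candidate pairs are assumptions of this example. For brevity, a brute-force recursion stands in for a polynomial-time maximum matching algorithm such as the one in [10], so the sketch is only suitable for very small inputs. It also assumes there is no direct edge between `s` and `t`.

```python
from collections import defaultdict


def max_matching(pairs):
    """Brute-force maximum matching over a list of frozenset node pairs."""
    if not pairs:
        return []
    first, rest = pairs[0], pairs[1:]
    without = max_matching(rest)
    with_first = [first] + max_matching([p for p in rest if not (p & first)])
    return with_first if len(with_first) > len(without) else without


def solve_3lcdp(color_edges, s, t):
    """Disjoint unicolor s-t paths of length at most 3, as (color, path) pairs."""
    nbr = {c: defaultdict(set) for c in color_edges}
    for c, es in color_edges.items():
        for u, v in es:
            nbr[c][u].add(v)
            nbr[c][v].add(u)

    solution, used = [], set()
    # Step 1: every common neighbor of s and t yields a length-2 path.
    for c in color_edges:
        for v in nbr[c][s] & nbr[c][t]:
            if v not in used:
                used.add(v)
                solution.append((c, [s, v, t]))

    # Step 2: a length-3 path s-u-v-t of color c becomes the pair {u, v}.
    candidate = {}
    for c in color_edges:
        for u in nbr[c][s] - used - {t}:
            for v in (nbr[c][u] & nbr[c][t]) - used - {s, t, u}:
                candidate.setdefault(frozenset((u, v)), (c, [s, u, v, t]))

    # Pairwise disjoint pairs correspond to pairwise disjoint length-3 paths.
    for pair in max_matching(list(candidate)):
        solution.append(candidate[pair])
    return solution
```

Replacing `max_matching` with a general-graph matching routine would keep the structure of the sketch unchanged.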
3.2 The complexity of 4-LCDP and an approximation algorithm
For the lengthbounded case, the notations and are analogous to the ones without superscript but those paths are of length at most .
Theorem 3.2
The $l$-LCDP problem on $k$-colors graphs is NP-hard for fixed $k \geq 2$ and $l \geq 4$.
Proof
It is sufficient to show the case of and . We show the NPhardness by transforming from a restrict version of the SAT problem in which there are at most 3 occurrences of each variable. This version of SAT problem still remains NPcomplete [4, 15]. Let , , be the clauses and , , the variables. For any variable , if all the occurrences of are positive, we can assign True and remove from all clauses. The case of all occurrences are negative is similar. Therefore we can assume the occurrences of each variable are neither all positive nor all negative. As a result, both and occur at most twice for any . Given an instance of the restrict SAT problem, we construct a 2colors graph as in Figure 3. Since the number of occurrences of each literal is at most two, any path of any color has length at most 4.
Since both the degree of and are , , and the maximum is achieved if for any clause there is a literal not used in . On the other hand, since the degree of is . We can also easily find disjoint paths in as long as for each we use either or as the internal nodes. If the SAT instance is satisfiable, let be a truth assignment satisfying all the clauses. We choose as internal nodes in if is assigned False in ; and otherwise. Then we can have disjoint paths of color 1 since there exists a literal assigned True in each clause and thus not used in color 2. The total number of disjoint paths is . Conversely if there are disjoint paths, there are exactly paths in and paths in . Therefore for each variable either itself or its negation is used in . Since there are disjoint paths in , each clause contains at least one literal not used in . So we can assign True if it is not used in and False otherwise, and all the clauses are satisfied. ∎
The approximation algorithm in Section 2.2 also works for the length-bounded case. We may achieve a better approximation ratio for small $l$.
Theorem 3.3
For any fixed $k$ and $l$, the $l$-LCDP problem can be approximated with ratio $(l-1)/2 + \epsilon$ in polynomial time for any $\epsilon > 0$.
To show Theorem 3.3, we introduce the following problem, and Algorithm 2 is a approximation algorithm shown in [8].
Problem: The Maximum Set Packing (MSP) problem
Instance: A collection of element subsets , , of a universal set of total elements.
Goal: A maximum disjoint subcollection of .
Let OPT denote the maximum number of disjoint subsets and APP denote the result obtained by Algorithm 2. It was shown in [8] that
(1) 
By transforming to the MSP problem, the CDP problem can be approximated with ratio for any . A direct transformation is as follows. Let be an instance of the LCDP problem.

For each unicolor path of length at most , create a subset consisting of the internal nodes of the path. There are at most subsets and for each .

The elements are all the nodes in the graph except and .

Any disjoint subcollection corresponds to a set of disjoint unicolor paths.
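The transformation listed above can be sketched in a few lines. The names `path_subsets` and `greedy_packing`, the `color_edges` dictionary, and the depth-first enumeration are assumptions of this example. Note that `greedy_packing` is only a simple maximal-packing baseline added for illustration; it is not Algorithm 2 and does not carry the guarantee of Eq. (1).

```python
from collections import defaultdict


def path_subsets(color_edges, s, t, l):
    """Internal-node sets of all unicolor s-t paths with at most l edges."""
    subsets = set()
    for c, es in color_edges.items():
        nbr = defaultdict(set)
        for u, v in es:
            nbr[u].add(v)
            nbr[v].add(u)

        def extend(path):
            for v in nbr[path[-1]]:
                if v == t:
                    if len(path) > 1:      # skip a direct s-t edge
                        subsets.add((c, frozenset(path[1:])))
                elif v not in path and len(path) < l:
                    extend(path + [v])

        extend([s])
    return subsets


def greedy_packing(subsets):
    """Pick pairwise disjoint subsets, smallest first (baseline only)."""
    chosen, covered = [], set()
    for c, nodes in sorted(subsets, key=lambda x: len(x[1])):
        if not (nodes & covered):
            chosen.append((c, nodes))
            covered |= nodes
    return chosen
```

Each subset has at most `l - 1` elements, since a path with at most `l` edges has at most `l - 1` internal nodes.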
The stop condition of the whileloop can be implemented by enumerating all possible subsets, testing if they are disjoint in time, and counting the intersected subsets in in time. Since is fixed, , and is also a constant determined by , this step takes . Since is increased at least one after each iteration and bounded by , the naive implementation has time complexity , which is polynomial for fixed and . Theorem 3.3 follows from Eq. (1), the transformation and the above analysis of the time complexity.
3.3 An efficient 2-approximation algorithm for 4-LCDP
Particularly, when and , by substituting , the approximation ratio by Eq. (1) is . That is, it takes time to compute a 2approximation of the 4LDCP problem. Although in polynomial time, it becomes intractable even for graphs of moderate size. In the following, we aim at developing a more efficient algorithm for and . Let denote the solution found so far, in which is the set of internal nodes of an path. Let be the nodes not used yet. When , the whilecondition can be implemented by
For each , determine if there are two disjoint paths of length at most 4 in .
The key point is how to determine if in a color graph without generating all possible paths. We shall use the following notations. The distance, or shortest path length, between and in graph is denoted by . A node is an cut node in graph if its removal separates the two nodes, i.e., after removing . The set of all such cut nodes is denoted by .
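The two notions just introduced, the distance between two nodes and the set of cut nodes, can be computed naively as in the sketch below. The adjacency-dictionary input `nbr` and the function names are assumptions of this example, and the cut-node routine simply reruns a breadth-first search once per node, so it is not the linear-time procedure analyzed in Lemma 2. The sketch is meant for the case where `s` and `t` are connected; otherwise every node trivially counts as a cut node.

```python
from collections import deque
from math import inf


def distance(nbr, s, t, removed=()):
    """Shortest path length from s to t, ignoring the nodes in `removed`."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return dist[u]
        for v in nbr.get(u, ()):
            if v not in dist and v not in removed:
                dist[v] = dist[u] + 1
                queue.append(v)
    return inf   # t is unreachable


def cut_nodes(nbr, s, t):
    """Nodes other than s and t whose removal disconnects s from t."""
    return {v for v in nbr
            if v not in (s, t) and distance(nbr, s, t, {v}) == inf}
```

For instance, on the graph with edges s-a, s-b, a-b, and a-t, the only cut node is `a` and the distance from `s` to `t` is 2.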
Lemma 2
Algorithm 3 is correct and takes time.
Proof
The algorithm returns True iff for some color or there are two unicolor disjoint paths of two colors. Clearly, what we need to show is the correctness of the procedure Test.
By the assumption that for all , we need not consider the case that or . The test procedure starts with a repeatuntil loop to remove any cut node of one graph from the other. Note that the loop is necessary since removing nodes from a graph may result in new cut nodes. But the loop will only be executed at most four times since each graph has one cut node originally and can have at most three cut nodes or otherwise and will have distance more than 4 (including , i.e., disconnected).
Step 6 deals with the case that the distance between and in either graph exceeds 4 or there exists any common cut node. At the beginning of step 7, we have that and the distance between and at either graph is at least two. Let . If , there exists a (unique) path of color . Immediately the output should be True since and is not in . The case that is similar. If , there is a path or in . Since is not in and , recalling that we have removed any cut node of from , the result should also be True.
The remaining case is . Recall that each graph has at least one cut node. Any length4 path in contains exactly three internal nodes, said , in which and therefore not in . Furthermore neither nor is in . Hence, removing the three nodes destroys at most two paths in . If there are more than two, not disjoint surely, length4 paths in , the output should be True. Similarly it holds if there are more than two such paths in . The remaining case is that there are one or two paths in either graph, and the answer can be obtained by the following method. First we choose a path in and check if the removal of the internal nodes separates and in . If not, we find two disjoint paths. Otherwise we choose the other path in if any, and do it again.
By the above discussion, the test procedure takes linear time, i.e., . The whole algorithm calls the test procedure for each pair of and , and therefore the total time complexity is since the other steps of Algorithm 3 can be done in time. ∎
Combining Algorithms 2 and 3, we obtain the next theorem. The time complexity is obtained as follows. To implement the whilecondition of Algorithms 2, we need to call Algorithm 3 at most times, where is the number of paths found so far. Let be the number of paths found by the algorithm. Since the whileloop may be executed at most times, the total time complexity is .
Theorem 3.4
There exists an time 2approximation algorithm for the 4LCDP problem on a colors graph, in which is the number of paths found by the algorithm.
Finally we would like to remark the following. In most of the applications, both and are small integers, and thus the approximation algorithm runs in linear time. Furthermore, since we need only consider the graphs induced by for each color , the algorithm is in fact a local algorithm and is therefore efficient even for largescale social networks.
References
 [1] Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. MIT Press and McGraw-Hill (2001)
 [2] Dinic, E.A.: Algorithm for solution of a problem of maximum flow in networks with power estimation. Sov. Math. Dokl. II, 1277–1280 (1970)
 [3] Freeman, L.C., Borgatti, S.P., White, D.R.: Centrality in valued graphs: A measure of betweenness based on network flow. Soc. Netw. 13(2), 141–154 (1991)
 [4] Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, New York (1979)
 [5] Gutin, G., Kim, E.J.: Properly coloured cycles and paths: results and open problems. Golumbic Festschrift, LNCS 5420, 200–208 (2009)
 [6] Hanneman, R.A., Riddle, M.: Introduction to Social Network Methods, http://www.faculty.ucr.edu/hanneman/nettext/ (2005)
 [7] Hassin, R., Monnot, J., Segev, D.: Approximation algorithms and hardness results for labeled connectivity problems. J. Comb. Optim. 14(4), 437–453 (2007)
 [8] Hurkens, C.A.J., Schrijver, A.: On the size of systems of sets every of which have an SDR, with an application to the worst-case ratio of heuristics for packing problems. SIAM J. Discret. Math. 2, 68–72 (1989)
 [9] McHugh, J.A.: Algorithmic Graph Theory. Prentice Hall (1990)
 [10] Micali, S., Vazirani, V.V.: An algorithm for finding maximum matching in general graphs. FOCS, 17–27 (1980)
 [11] Mohan, G., Murthy, C.: Lightpath restoration in WDM optical networks. IEEE Netw., 24–32 (2000)
 [12] Robertson, N., Seymour, P.D.: Graph minors. XIII. The disjoint paths problem. J. Comb. Theory, Series B 63, 65–110 (1995)
 [13] Seymour, P.D.: Disjoint paths in graphs. Discret. Math. 29, 293–309 (1980)
 [14] Shiloach, Y.: A polynomial solution to the undirected two paths problem. J. ACM 27, 445–456 (1980)
 [15] Tovey, C.A.: A simplified NPcomplete satisfiability problem. Discret. Appl. Math. 8(1), 85–89 (1984)
 [16] Wasserman, S., Faust, K.: Social Network Analysis. Cambridge University Press, Cambridge (1994)
 [17] Yuan, S., Varma, S., Jue, J.P.: Minimum-color path problems for reliability in mesh networks. IEEE INFOCOM 2005, volume 4, 2658–2669 (2005)