An exact algorithm with the time complexity of for the weighted mutually exclusive set cover problem
In this paper, we introduce an exact algorithm with a time complexity of for the weighted mutually exclusive set cover problem, where is the number of subsets in the problem. This problem has important applications in recognizing mutated genes that cause different cancers.
Department of Biomedical Informatics,
University of Pittsburgh, Pittsburgh, PA 15219, USA
Email: firstname.lastname@example.org, email@example.com
The set cover problem is as follows: given a ground set of elements and a collection of subsets of , find a minimum number of subsets in such that . If we add the constraint that all subsets in the solution be pairwise disjoint, then the set cover problem becomes the mutually exclusive set cover problem. If we further assign each subset in a real-valued weight and search for the solution with the minimum weight, i.e. the solution that minimizes the sum of the weights of its subsets, then the problem becomes the weighted mutually exclusive set cover problem.
Recently, the weighted mutually exclusive set cover problem has found important applications in cancer research, where it is used to identify driver mutations , i.e. somatic mutations that cause cancers. Somatic mutations change the structures (and therefore the functions) of signaling proteins and thus perturb the cancer pathways that regulate the expression of genes in certain important biological processes, such as cell death and cell proliferation. The perturbations within a common cancer pathway are often found to be mutually exclusive in a single cancer cell, i.e. each tumor usually has only one perturbation on a given cancer pathway (one perturbation is enough to cause the disease; hence, there is no need to wait for another perturbation). Modern lab techniques can identify the somatic mutations and gene expression of cancer cells. After preprocessing the data, we obtain the following information for an important biological process, e.g. cell death: 1) which cancer cells have disturbed expression of genes in the biological process; 2) which genes have been mutated in those cancer cells; 3) how likely each mutation is to be related to the given biological process (i.e. each mutation is assigned a real-valued weight). The next step is to find a set of mutations such that each cancer cell has one and only one mutation in the solution set (mutual exclusivity) and the sum of the weights of all genes in the solution set is minimized, which is the weighted mutually exclusive set cover problem.
While there has been little research on the mutually exclusive set cover problem or its weighted version, the set cover problem has received much attention. The set cover problem, which is equivalent to the hitting set problem, is a fundamental NP-hard problem and one of Karp’s 21 NP-complete problems . One research direction for the set cover problem is approximation algorithms, e.g. papers  gave polynomial-time approximation algorithms that find solutions whose sizes are at most times the size of the optimal solution, where is a constant. The second direction uses , the number of subsets in the solution, as the parameter to design fixed-parameter tractable (FPT) algorithms for the equivalent hitting set problem. Those algorithms require that each element in be included in at most subsets in , i.e. the sizes of all subsets in the hitting set problem are upper bounded by ; this is also called the -hitting set problem. For example, paper  gave an algorithm for the -hitting set problem, and paper  further improved the time complexity to . The third direction is designing algorithms that use as the parameter under the condition that is much less than . Papers  designed algorithms with time complexities of for the problem. Paper  also extended the algorithm to solve the weighted mutually exclusive set cover problem with the same time complexity. Paper  improved the time complexity to under the condition that at least elements in are included in at most subsets in . This algorithm can also be extended to the weighted mutually exclusive set cover problem with the same time complexity. However, in the cancer application, neither is less than nor is each element in included in a bounded number of subsets in . Hence, there is a need to design new algorithms.
In this paper, we design a new algorithm that uses as the parameter (in the cancer application, is smaller than , where can be as large as several hundred). Trivially, using as the parameter, we can solve the problem in time of , where the algorithm simply tests every combination of subsets in . To the best of our knowledge, no previous algorithm improves on this trivial algorithm when is used as the parameter. This paper gives the first non-trivial algorithm, with a time complexity of , for the weighted mutually exclusive set cover problem. We have tested this algorithm on cancer data, and the program finishes the computation in practical time when is less than 100.
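The trivial algorithm mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the paper: the function name `brute_force_cover`, the list-based input layout, and the example data are all hypothetical, and every subset is assumed to be a Python set contained in the ground set.

```python
from itertools import combinations

def brute_force_cover(universe, subsets, weights):
    """Test every combination of subsets (hypothetical helper).

    Returns (weight, indices) of a minimum-weight mutually exclusive
    set cover, or None if no such cover exists.
    """
    universe = frozenset(universe)
    m = len(subsets)
    best = None
    for r in range(m + 1):
        for combo in combinations(range(m), r):
            chosen = [subsets[i] for i in combo]
            # The chosen subsets are pairwise disjoint and cover the
            # universe exactly when their sizes add up to |universe|
            # and their union equals the universe.
            if sum(len(s) for s in chosen) != len(universe):
                continue
            if frozenset().union(*chosen) != universe:
                continue
            w = sum(weights[i] for i in combo)
            if best is None or w < best[0]:
                best = (w, combo)
    return best

# Illustrative instance (made-up data).
example_subsets = [{1, 2}, {3, 4}, {2, 3}, {1, 4}, {1, 2, 3, 4}]
example_weights = [1.0, 1.5, 0.5, 0.5, 3.0]
example_result = brute_force_cover({1, 2, 3, 4}, example_subsets, example_weights)
```

The loop visits every one of the exponentially many combinations, which is why this approach is only usable for very small instances.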
2 The weighted mutually exclusive set cover problem is NP-hard
The formal definition of the weighted mutually exclusive set cover problem is: given a ground set of elements, a collection of subsets of , and a weight function , if such that , and for any , then we say is a mutually exclusive set cover of and is the weight of ; the goal of the problem is to find a mutually exclusive set cover of with the minimum weight, or report that no such solution exists.
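To make the definition concrete, the following Python sketch checks whether a chosen family of subsets is a mutually exclusive set cover and computes its weight. The helper names `is_mutually_exclusive_cover` and `solution_weight`, the dictionary-based representation, and the sample data are assumptions introduced here for illustration only.

```python
from collections import Counter

def is_mutually_exclusive_cover(universe, chosen):
    """True if every element of the universe lies in exactly one chosen subset."""
    counts = Counter(x for s in chosen for x in s)
    # Covering: the covered elements are exactly the universe.
    # Mutual exclusivity: no element is counted more than once.
    return set(counts) == set(universe) and all(c == 1 for c in counts.values())

def solution_weight(chosen_names, weights):
    """Sum of the weights of the chosen subsets."""
    return sum(weights[name] for name in chosen_names)

# Illustrative instance (made-up data).
universe = {1, 2, 3}
subsets = {"A": {1, 2}, "B": {3}, "C": {2, 3}}
weights = {"A": 2.0, "B": 1.0, "C": 0.5}

ok = is_mutually_exclusive_cover(universe, [subsets["A"], subsets["B"]])
overlap = is_mutually_exclusive_cover(universe, [subsets["A"], subsets["C"]])
```

Here the family consisting of `A` and `B` is a valid solution, while `A` and `C` together cover the ground set but share element 2, so they are rejected.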
As we have not found a proof of NP-hardness for the weighted mutually exclusive set cover problem in the literature, in this section we prove that the mutually exclusive set cover problem is NP-hard, which implies that the weighted mutually exclusive set cover problem is NP-hard as well.
We prove the NP-hardness of the mutually exclusive set cover problem by reducing another NP-hard problem, the maximum set packing problem, to it. Recall that the maximum set packing problem is: given a collection of subsets, find an such that the subsets in are pairwise disjoint and is maximized.
3 The main algorithm
In this section, we will introduce our new algorithm to solve the weighted mutually exclusive set cover problem.
Let be an instance of the weighted mutually exclusive set cover problem. We can use a bipartite graph to represent such that all nodes on one side are subsets in while nodes on the other side are elements in , and if an element of is in subset , i.e. , then an edge is added between and . For convenience, let us introduce some notation. Figure ? illustrates the following notation.
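The bipartite representation can be sketched as two adjacency maps, one per side of the graph. The function name `build_bipartite`, the dictionary layout, and the sample instance are hypothetical choices made for this illustration; the sketch assumes that every subset contains only elements of the ground set.

```python
def build_bipartite(universe, subsets):
    """Return (subset_to_elements, element_to_subsets) adjacency maps.

    An edge joins a subset node and an element node whenever the
    element belongs to the subset.
    """
    element_to_subsets = {x: set() for x in universe}
    subset_to_elements = {}
    for name, members in subsets.items():
        subset_to_elements[name] = set(members)
        for x in members:
            # Raises KeyError if x is not in the ground set (assumption above).
            element_to_subsets[x].add(name)
    return subset_to_elements, element_to_subsets

# Illustrative instance (made-up data).
s2e, e2s = build_bipartite({1, 2, 3}, {"A": {1, 2}, "B": {3}, "C": {2, 3}})
# Degree of an element node = number of subsets that contain it.
degrees = {x: len(names) for x, names in e2s.items()}
```

In this form the degree of an element node is simply the size of its adjacency set, which is the quantity a minimum-degree selection rule would inspect.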
For any , let , , . For any in , let , , , .
The main algorithm, Algorithm-1, is shown in Figure ?. Basically, Algorithm-1 first finds an with minimum degree and then branches on one subset in (as in steps 6.2.2 and 6.2.3). For convenience, if , then we say that Algorithm-1 is doing a -branch. Because of steps 3, 4, and 5, when the program arrives at step 6, we must have: 1) ; 2) for any , ; 3) there exists a such that .
Algorithm-1 essentially searches for the solution by traversing a search tree; hence, if we know the number of leaves in the search tree, we obtain the time complexity of Algorithm-1. Next, we estimate the number of leaves in the search tree by studying the different cases of branching. We begin with the -branch.
Now, we consider the case of doing a -branch. Recall that when Algorithm-1 is doing a -branch, for all .
Let us consider the case of doing a -branch for .
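The general idea of selecting a minimum-degree element and branching on the subsets that contain it can be illustrated with the simplified search below. This is not Algorithm-1: it omits the reduction steps 3 to 5 and the specific branching rules of step 6, so it does not achieve the time bound claimed in this paper. The function name `min_weight_exact_cover` and the sample data are hypothetical, and every subset is assumed to be a Python set.

```python
def min_weight_exact_cover(universe, subsets, weights):
    """Simplified branching search (a sketch, not the paper's Algorithm-1).

    Returns (weight, names) of a minimum-weight mutually exclusive
    set cover, or None if no such cover exists.
    """
    def solve(remaining, candidates):
        if not remaining:
            return 0.0, []
        # Pick the uncovered element contained in the fewest candidates.
        x = min(remaining,
                key=lambda e: sum(1 for n in candidates if e in subsets[n]))
        best = None
        for name in candidates:
            s = subsets[name]
            if x not in s:
                continue
            # Taking s excludes every candidate that intersects it.
            rest = [n for n in candidates if not (subsets[n] & s)]
            sub = solve(remaining - s, rest)
            if sub is None:
                continue
            total = weights[name] + sub[0]
            if best is None or total < best[0]:
                best = (total, [name] + sub[1])
        return best

    universe = frozenset(universe)
    # Keep only non-empty subsets that lie inside the ground set.
    start = [n for n in subsets if subsets[n] and subsets[n] <= universe]
    return solve(universe, start)

# Illustrative instance (made-up data).
demo_subsets = {"A": {1, 2}, "B": {3, 4}, "C": {2, 3}, "D": {1, 4},
                "E": {1, 2, 3, 4}}
demo_weights = {"A": 1.0, "B": 1.5, "C": 0.5, "D": 0.5, "E": 3.0}
demo = min_weight_exact_cover({1, 2, 3, 4}, demo_subsets, demo_weights)
```

Branching on a minimum-degree element keeps the number of children per search-tree node small, which is the same intuition behind counting the leaves of the search tree in the analysis.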
In this paper, we first proved that the weighted mutually exclusive set cover problem is NP-hard. Then we designed the first non-trivial algorithm, which uses as the parameter, with a time complexity of for the problem. The weighted mutually exclusive set cover problem has been used to find driver mutations in cancers . Our new algorithm finds the optimal solution for the problem, which is better than the solutions found by the heuristic algorithms in previous research . Exclusivity is the extreme case. In practical applications, a cancer cell may have more than one mutation perturbing a common pathway. Hence, a modified model is to find a set of mutations with minimum weight sum such that each cancer cell has at least one and at most t (t=2 or 3) mutations in the solution, which leads to the small overlapped set cover problem. Also, in applications, some mutations in cancer cells may not be detected because of errors. Thus, it is not always ideal to find a set of mutations that covers all cancer cells. A modified model is to find a set of mutually exclusive mutations that covers at least percent ( or ) of cancer cells, which leads to the maximal set cover problem. Our future research will design efficient algorithms for these two new problems.
- N. Alon, D. Moshkovitz, and S. Safra, Algorithmic Construction of Sets for k-Restrictions, ACM Transactions on Algorithms, 2(2), pp. 153-177, 2006.
- A. Björklund, T. Husfeldt, M. Koivisto, Set Partitioning via Inclusion-Exclusion, SIAM Journal on Computing, Special Issue for FOCS 2006.
- J. Chen, I. Kanj, and W. Jia, Vertex Cover: Further Observations and Further Improvements, Journal of Algorithms, 41, pp. 280-301, 2001.
- G. Ciriello, E. Cerami, C. Sander, N. Schultz, Mutual exclusivity analysis identifies oncogenic network modules, Genome Research, 22(2), pp. 398-406, 2012.
- U. Feige, A Threshold of ln n for Approximating Set Cover, J. of the ACM, 45(4), pp. 634-652, 1998.
- H. Fernau, A top-down approach to search-trees: Improved algorithmics for 3-Hitting Set, Algorithmica, 57, pp. 97-118, 2010.
- Q. Hua, Y. Wang, D. Yu, F. Lau, Dynamic programming based algorithms for set multicover and multiset multicover problems, Theoretical Computer Science, 411, pp. 2467-2474, 2010.
- R. Karp, Reducibility Among Combinatorial Problems, In R. E. Miller and J. W. Thatcher (editors). Complexity of Computer Computations. New York: Plenum, pp. 85-103, 1972.
- S. Kolliopoulos, N. Young, Approximation algorithms for covering/packing integer programs, J. Comput. Syst. Sci., 71(4), pp. 495-505, 2005.
- S. Lu, X. Lu, A graph model and an exact algorithm for finding transcription factor modules, 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine, pp. 355-359, 2011.
- C. Lund and M. Yannakakis, On the Hardness of Approximating Minimization Problems, J. of the ACM, 41(5), pp. 960-981, 1994.
- C. Miller, S. Settle, E. Sulman, K. Aldape, A. Milosavljevic, Discovering functional modules by identifying recurrent and mutually exclusive mutational patterns in tumors, BMC Medical Genomics, 4, p. 34, 2011.
- R. Niedermeier and P. Rossmanith, An Efficient Fixed-parameter Algorithm for 3-Hitting Set, J. of Discrete Algorithms, 1(1), pp. 89-102, 2003.