Towards Effective Exact Algorithms for the Maximum Balanced Biclique Problem


Yi Zhou, André Rossi, Jin-Kao Hao
LERIA, University of Angers, 2 bd Lavoisier, 49045 Angers, France
Institut Universitaire de France, Paris, France
hao@info.univ-angers.fr
Abstract

The Maximum Balanced Biclique Problem (MBBP) is a prominent model with numerous applications. Yet, the problem is NP-hard and thus computationally challenging. We propose novel ideas for designing effective exact algorithms for MBBP. Firstly, we introduce an Upper Bound Propagation procedure to pre-compute an upper bound involving each vertex. Then we extend an existing branch-and-bound algorithm by integrating the pre-computed upper bounds. We also present a set of new valid inequalities induced from the upper bounds to tighten an existing mathematical formulation for MBBP. Lastly, we investigate another exact algorithm scheme which enumerates a subset of balanced bicliques based on our upper bounds. Experiments show that compared to existing approaches, the proposed algorithms and formulations are more efficient in solving a set of random graphs and large real-life instances.

Keywords: Combinatorial optimization; Clique; Exact algorithms; Techniques for tight bounds; Mathematical formulation.

Corresponding author: hao@info.univ-angers.fr.

1 Introduction

Given a bipartite graph $G = (U \cup V, E)$ with two disjoint vertex sets $U$, $V$ and an edge set $E$, a biclique $C = A \cup B$ (or $K_{|A|,|B|}$) is the union of two subsets of vertices $A \subseteq U$, $B \subseteq V$ such that $\{a, b\} \in E$ for all $a \in A$, $b \in B$. In other words, the subgraph induced by the vertex set $A \cup B$ is a complete bipartite graph. If $|A| = |B|$, then $C$ is a balanced biclique. The Maximum Balanced Biclique Problem (MBBP) is to find a balanced biclique of maximum cardinality. As $|A| = |B| = |C|/2$ holds for a balanced biclique $C$, MBBP amounts to finding a balanced biclique of maximum half-size. MBBP is a special case of the conventional maximum clique problem [14].
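The definitions above can be checked mechanically. The following Python sketch is only an illustration under our own assumptions: a graph is stored as a dictionary mapping each vertex to its set of neighbors, and the helper names (`build_adj`, `is_biclique`, `balanced_half_size`) and the toy graph `EXAMPLE_EDGES` are hypothetical, not taken from the paper (in particular, the toy graph is not the graph of Figure 1).

```python
from typing import Dict, Iterable, List, Set, Tuple

# Illustrative sketch; all names below are hypothetical.
Graph = Dict[int, Set[int]]  # vertex -> set of neighbors (assumed representation)


def build_adj(edges: List[Tuple[int, int]]) -> Graph:
    """Build symmetric adjacency sets from (u, v) pairs with u in U and v in V."""
    adj: Graph = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def is_biclique(adj: Graph, A: Iterable[int], B: Iterable[int]) -> bool:
    """True if every vertex of A is adjacent to every vertex of B."""
    B = set(B)
    return all(B <= adj.get(a, set()) for a in A)


def balanced_half_size(adj: Graph, A: Iterable[int], B: Iterable[int]) -> int:
    """Half-size |A| = |B| if A and B form a balanced biclique, otherwise -1."""
    A, B = set(A), set(B)
    if len(A) != len(B) or not is_biclique(adj, A, B):
        return -1
    return len(A)


# Hypothetical toy graph (not Figure 1): U = {1, 2, 3, 4}, V = {5, 6, 7, 8}.
EXAMPLE_EDGES = [(1, 5), (1, 6), (1, 7), (2, 5), (2, 6), (2, 7), (3, 5), (3, 6), (4, 8)]
```

For instance, with `adj = build_adj(EXAMPLE_EDGES)`, `balanced_half_size(adj, {1, 2}, {5, 6})` returns 2, whereas `balanced_half_size(adj, {1, 2, 3}, {5, 6})` returns -1 because the two sides differ in size even though they form a biclique.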

MBBP is a prominent model with a large range of applications, such as nanoelectronic system design [1, 13], biclustering of gene expression data in computational biology [4] and PLA-folding in VLSI theory [10]. In terms of computational complexity, the decision version of MBBP is NP-complete [6, 2], whereas the maximum biclique problem in bipartite graphs (without requiring $|A| = |B|$) is polynomially solvable by a maximum matching algorithm [4].

Considerable effort has been devoted to the pursuit of effective algorithms for MBBP, both theoretically and practically. Heuristic algorithms represent the most popular approach for MBBP, though they do not guarantee the optimality of the final solution found. The majority of existing heuristic algorithms solve the equivalent maximum balanced independent set (a vertex set such that no two vertices are adjacent) problem in the complement graph, rather than directly seeking the maximum balanced biclique from the given graph. For example, several greedy heuristic algorithms were proposed based on vertex-deletion on the complement graph from 2006 to 2014 [1, 13, 15, 16], while in [17], an evolutionary algorithm combining structure mutation and repair-assisted restart was studied.

On the other hand, according to our literature review, there are only two studies on exact algorithms in the literature. In [13], a recursive exact algorithm searching for a maximum balanced independent set of a given half-size in the complement graph was proposed. However, the computational time of this algorithm becomes prohibitive when the size of the given graph exceeds (32,32), i.e., 32 vertices in each vertex set. In [9], a branch-and-bound (B&B) algorithm for MBBP in general graphs (including non-bipartite graphs) was studied. The algorithm incorporates a clique cover technique for upper bound estimation (equivalent to using graph coloring to estimate the upper bound for the maximum clique problem) and employs lex symmetry breaking techniques for general graphs. As far as we know, this algorithm is currently the best-performing exact algorithm, even though its bounding and symmetry breaking techniques are only effective for non-bipartite graphs.

In addition to specifically designed exact algorithms, general Mixed Integer Programming (MIP) constitutes an interesting alternative for addressing hard combinatorial problems such as MBBP. Commercial MIP solvers, like IBM CPLEX, can even solve some hard instances that cannot be handled by other approaches. Meanwhile, the success of a MIP solver highly depends on the tightness of the mathematical formulation of the problem. For MBBP, a MIP formulation based on the complement graph was proposed in [5]. Another mathematical formulation, which defines the constraints on the original graph, was presented in [17]. However, this formulation cannot be used directly by MIP solvers as it contains non-linear constraints.

In this work, we introduce new ideas for developing effective exact algorithms for MBBP, which can be applied to solve very large MBBP instances from applications like social networks. Our main contributions can be summarized as follows.

  • We elaborate an Upper Bound Propagation (UBP) procedure inspired by [12], which produces an upper bound on the half-size of the maximum balanced biclique involving each vertex of the bipartite graph. UBP propagates the initial upper bound involving each vertex and achieves an even tighter upper bound for each vertex. UBP is independent of the search procedure and is performed before the algorithm starts. An extended exact algorithm, denoted ExtBBClq, is proposed by taking advantage of UBP to improve BBClq, the branch-and-bound algorithm introduced in [9].

  • Based on the upper bounds returned by UBP, we introduce new valid inequalities to tighten the MIP formulation of MBBP introduced in [5]. Our computational experiments suggest that using the tightened model improves the performance of the MIP solver CPLEX.

  • We also present a new exact algorithm (ExtUniBBClq) to supplement the family of B&B-based algorithms for MBBP. Unlike BBClq, which goes through every possible balanced biclique, the new algorithm only enumerates the possible partial sets (half-sets) of the balanced bicliques in the graph. ExtUniBBClq also integrates UBP as a pre-processing procedure and generally performs well on the benchmark instances.

The remainder of the paper is organized as follows. Section 2 introduces the notations used throughout the paper and Section 3 reviews the BBClq algorithm introduced in [9]. In Section 4, we present our Upper Bound Propagation procedure for upper bound estimation and explain how to use it to improve BBClq. Then, in Section 5, we show how the upper bounds lead to new valid inequalities that tighten the MIP formulation of [5]. Furthermore, we introduce the novel ExtUniBBClq algorithm in Section 6. Computational results and experimental analyses are presented in Section 7, followed by conclusions and future research directions.

2 Notations

Figure 1: A bipartite graph $G = (U \cup V, E)$ with vertex sets $U$ and $V$.

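Given a bipartite graph, denoted $G = (U \cup V, E)$ unless specifically stated otherwise, let $C = A \cup B$ be a balanced biclique of $G$ (i.e., $A \subseteq U$, $B \subseteq V$ and $|A| = |B|$). The half-size of the balanced biclique is the cardinality of $A$ (or $B$). For example, Figure 1 contains a balanced biclique of half-size 2. For any $S \subseteq U \cup V$, $G[S]$ denotes the subgraph of $G$ induced by $S$. Given a vertex $v$ in $G$, the set of vertices adjacent to $v$ is denoted by $N(v)$, and $\deg(v) = |N(v)|$ is the degree of $v$. The upper bound involving vertex $v$, denoted by $ub(v)$, is an upper bound on the half-size of the maximum balanced biclique containing $v$. For example, in Figure 1, 2 is a valid value of $ub(v)$ for any vertex $v$ of degree 2, since the half-size of a balanced biclique containing $v$ cannot exceed the degree of $v$.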

3 Review of the BBClq algorithm

Algorithm 1 shows the BBClq algorithm, a recursive exact algorithm introduced in [9]. BBClq is adapted from a well-known B&B algorithm for the maximum clique problem [3] and recursively builds up two sets $A$ and $B$ such that $A \cup B$ forms a biclique. The algorithm maintains a candidate set $P_A$ ($P_B$) that includes the vertices which are eligible to move into $A$ ($B$) while ensuring that $A \cup B$ remains a biclique (i.e., $P_A \subseteq \bigcap_{b \in B} N(b)$ and $P_B \subseteq \bigcap_{a \in A} N(a)$). Initially, the algorithm sets $lb$, the global lower bound on the maximum biclique half-size, to 0 and starts the search by calling BBClq($G$, $\emptyset$, $\emptyset$, $U$, $V$).

At each recursive call to BBClq, a vertex $u$ (called the branch vertex) is selected and removed from $P_A$ (lines 7-8). The algorithm then considers the two branches (possibilities) $u \in A$, in lines 9-12, and $u \notin A$, in the next iteration of the while loop. The bounding procedure (line 9) prunes the branch $u \in A$ if the estimated upper bound in this context is not larger than the global lower bound $lb$. The upper bound estimation method, which is classically a key factor in the performance of a B&B algorithm, is introduced in the following section. If the current branch is not pruned, the search goes on by extending $A$ with the new vertex $u$ and by removing from $P_B$ the vertices not adjacent to $u$ (every vertex in $B$ must be adjacent to every vertex in $A$). After updating the two sets, the algorithm recursively calls BBClq in line 12, swapping the roles of $A$ and $B$, since $A$ and $B$ are extended alternately in order to satisfy the balance requirement. The above process is repeated in the next recursive call of BBClq.

When the algorithm loops back to line 4, as just mentioned, it explores the other branch, in which $u \notin A$. The while loop stops when $P_A$ becomes empty or when the remaining vertices in $P_A$ cannot lead to a solution better than the global lower bound, i.e., $|A| + |P_A| \le lb$ (lines 5-6). Moreover, since the two sets are extended alternately, $|A| = |B|$ or $|A| = |B| - 1$ holds each time BBClq is called; the lower bound is therefore updated in lines 1-3 whenever $\min(|A|, |B|) > lb$, and the incumbent solution is stored as the best solution found so far. As a result, the best solution is an optimal biclique $A^* \cup B^*$ with $\min(|A^*|, |B^*|) = lb$, but it may not be perfectly balanced ($|A^*| \neq |B^*|$). Thus, in line 13, the maximum balanced biclique (of half-size $lb$) is retrieved from this biclique by make_balance(). This procedure simply removes vertices from the larger of the two sets until a balanced biclique is obtained.

Figure 1 is now used to illustrate BBClq. Initially, $lb = 0$ and BBClq($G$, $\emptyset$, $\emptyset$, $U$, $V$) is called. According to the minimum degree heuristic of [9], a vertex of minimum degree is chosen as the first branch vertex. Since the current upper bound is clearly greater than 0, the algorithm proceeds with a recursive call to BBClq to explore the solutions containing this vertex. As a result, a biclique of half-size 1 is found and $lb$ is updated to 1. Likewise, the algorithm selects vertex 5 as the second branch vertex in the following loop and, if no upper bounding technique is applied, proceeds with a recursive call to BBClq. This call has to explore the expansion of the current biclique by adding vertex 8 or 10. However, with the upper bound estimation technique proposed in this paper, this call is not even made, since the upper bound involving vertex 5 is 1 ($ub(5) = 1 \le lb$). The algorithm finds the optimal solution after the third loop, which explores the third branch vertex and calls BBClq recursively. There is no additional iteration since the remaining candidates cannot improve on $lb$ (lines 5-6).

Input: Graph instance $G$; $A$, $B$ - current sets that form a biclique; $P_A$, $P_B$ - the sets of eligible vertices that can be added to $A$ and $B$, respectively
Output: A maximum balanced biclique of $G$.
1  if $\min(|A|, |B|) > lb$ then
2      $lb \leftarrow \min(|A|, |B|)$;
3      Record current best biclique in $C^* \leftarrow A \cup B$;
4  while $P_A \neq \emptyset$ do
5      if $|A| + |P_A| \le lb$ then
6          return;
7      $u \leftarrow$ the vertex of $P_A$ with minimum degree;
8      $P_A \leftarrow P_A \setminus \{u\}$;
9      if (estimated upper bound of the branch $u \in A$) $> lb$ then
10         $A' \leftarrow A \cup \{u\}$;
11         $P'_B \leftarrow P_B \cap N(u)$;
12         BBClq($G$, $B$, $A'$, $P'_B$, $P_A$);
13 return make_balance($C^*$);
Algorithm 1 BBClq($G$, $A$, $B$, $P_A$, $P_B$), the B&B algorithm for MBBP taken from [9]
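A compact Python sketch of Algorithm 1 may help to follow the role swapping. It is not the authors' C++ implementation: the bound used at line 9 is a simple size-based estimate that we assume for illustration (neither side can grow beyond its remaining candidates), the optional `ub` argument mimics how ExtBBClq plugs in the UBP bounds (Section 4.2), and the function name `bbclq` and the dictionary-of-sets graph representation are hypothetical. Comments refer to the line numbers used in the description of Algorithm 1.

```python
from typing import Dict, List, Optional, Set, Tuple

# Illustrative sketch of BBClq; `bbclq` is a hypothetical name.
Graph = Dict[int, Set[int]]  # vertex -> set of neighbors


def bbclq(adj: Graph, U: Set[int], V: Set[int],
          ub: Optional[Dict[int, int]] = None) -> Tuple[Set[int], Set[int]]:
    """Return (X, Y) with X subset of U, Y subset of V: a maximum balanced biclique.

    If `ub` (per-vertex bounds, e.g. from UBP) is given, it tightens the
    branch bound, in the spirit of ExtBBClq.
    """
    adj = {v: adj.get(v, set()) for v in U | V}  # tolerate isolated vertices
    best = {"lb": 0, "A": set(), "B": set()}

    def search(A: Set[int], B: Set[int], PA: List[int], PB: List[int]) -> None:
        # Lines 1-3: record the incumbent when min(|A|, |B|) improves.
        if min(len(A), len(B)) > best["lb"]:
            best["lb"] = min(len(A), len(B))
            best["A"], best["B"] = set(A), set(B)
        # Minimum-degree vertices go last, so pop() returns them first.
        PA = sorted(PA, key=lambda x: (len(adj[x]), x), reverse=True)
        while PA:
            if len(A) + len(PA) <= best["lb"]:        # lines 5-6
                return
            u = PA.pop()                               # lines 7-8
            PB_new = [v for v in PB if v in adj[u]]    # line 11
            # Line 9 (assumed bound): neither side can outgrow its candidates.
            bound = min(len(A) + 1 + len(PA), len(B) + len(PB_new))
            if ub is not None:
                bound = min(bound, ub[u])
            if bound > best["lb"]:
                # Line 12: swap roles so that the two sides grow alternately.
                search(B, A | {u}, PB_new, list(PA))

    search(set(), set(), sorted(U), sorted(V))
    X, Y = best["A"], best["B"]
    if X & V or Y & U:            # roles alternate during search; map back
        X, Y = Y, X
    k = min(len(X), len(Y))       # make_balance: trim the larger side
    return set(sorted(X)[:k]), set(sorted(Y)[:k])
```

Passing per-vertex bounds through `ub` leaves the search itself unchanged and only prunes more branches, which is the design choice described for ExtBBClq.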

4 Upper bound propagation and its use to improve BBClq

We introduce in this section our Upper Bound Propagation procedure (UBP) which is then used as a pre-processing technique to reinforce the BBClq algorithm presented in the last section.

4.1 The upper bound propagation procedure

The original BBClq algorithm computes a clique cover (by addressing the graph coloring problem on the complement graph) to estimate the upper bound in a general graph, relying on the fact that the sets $A$ and $B$ must be independent sets. However, when the given graph is bipartite, the upper bound found by this technique is trivial, since the two vertex sets are already independent sets. Here, we introduce our Upper Bound Propagation procedure to produce, for each vertex, an upper bound on the half-size of any balanced biclique involving that vertex. UBP is based on the following propositions.

Proposition 1

For each vertex $v \in U \cup V$, $\deg(v)$ is an upper bound on the half-size of the maximum balanced biclique involving $v$.

This proposition is obviously true since the half-size of a balanced biclique cannot exceed the degree of any vertex in the biclique.

Proposition 2

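Given a vertex $u \in U$, let $c(u, w) = |N(u) \cap N(w)|$ denote the number of common neighbors of $u$ and a vertex $w \in U$. Let $k$ be the maximum integer such that there exist at least $k$ vertices $w \in U$ satisfying $c(u, w) \ge k$; then $k$ is an upper bound on the half-size of the maximum balanced biclique involving $u$.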

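Proof: Consider any balanced biclique $A \cup B$ involving $u$, with $u \in A$. Every vertex $w \in A$ (including $u$ itself) is adjacent to all the vertices of $B \subseteq N(u)$, so $c(u, w) \ge |B| = |A|$. Hence at least $|A|$ vertices $w \in U$ satisfy $c(u, w) \ge |A|$, and the half-size $|A|$ cannot exceed the maximum value $k$ defined above, which is therefore an upper bound involving $u$. Note that this proposition also holds for any vertex in $V$, with the roles of $U$ and $V$ swapped.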
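Proposition 2 amounts to an h-index-style computation over common-neighbor counts $|N(u) \cap N(w)|$. The sketch below is ours: the names `h_index` and `prop2_bound` are hypothetical, a plain linear scan replaces the binary search used in Algorithm 2, and the count is taken for every vertex $w$ on the same side as $u$, including $w = u$ (whose count is $\deg(u)$).

```python
from typing import Dict, Iterable, Set

# Illustrative sketch of Proposition 2; names are hypothetical.
Graph = Dict[int, Set[int]]


def h_index(values: Iterable[int]) -> int:
    """Largest k such that at least k of the values are >= k (linear scan)."""
    k = 0
    for i, x in enumerate(sorted(values, reverse=True), start=1):
        if x < i:
            break
        k = i
    return k


def prop2_bound(adj: Graph, side: Set[int], u: int) -> int:
    """Proposition 2 bound for vertex u, where `side` is the vertex set containing u.

    Counts |N(u) & N(w)| for every w in `side` (w = u contributes deg(u)).
    """
    counts = [len(adj[u] & adj[w]) for w in side]
    return h_index(counts)
```

On a small hypothetical graph where vertex 1 has three neighbors but only two other vertices share at least two of them, the bound drops from the degree 3 to 2.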

Proposition 3

Given a vertex $v \in U \cup V$, let $k$ be the largest integer such that there exist $k$ vertices in $N(v)$ having upper bounds of at least $k$. Then $k$ is an upper bound on the half-size of the maximum balanced biclique involving $v$.

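Proof: We prove this proposition by contradiction. Suppose $k$ is not an upper bound; then there exists a balanced biclique $A \cup B$ involving $v$, say with $v \in A$, of half-size $h > k$. Since $B \subseteq N(v)$ and $|B| = h$, every vertex $w \in B$ belongs to a balanced biclique of half-size $h$ and must therefore have an upper bound of at least $h$ (i.e., $ub(w) \ge h$). Thus $N(v)$ contains at least $h$ vertices with upper bounds of at least $h > k$, which contradicts the condition that $k$ is the maximum integer such that there exist in $N(v)$ at least $k$ vertices $w$ having $ub(w) \ge k$.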
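Proposition 3 applies the same h-index computation to the current bounds of the neighbors instead of to common-neighbor counts. A minimal sketch, again with hypothetical names (`h_index`, `prop3_bound`) and bounds stored in a plain dictionary; the result is capped by the current bound so that a bound never increases.

```python
from typing import Dict, Iterable, Set

# Illustrative sketch of Proposition 3; names are hypothetical.
Graph = Dict[int, Set[int]]


def h_index(values: Iterable[int]) -> int:
    """Largest k such that at least k of the values are >= k."""
    k = 0
    for i, x in enumerate(sorted(values, reverse=True), start=1):
        if x < i:
            break
        k = i
    return k


def prop3_bound(adj: Graph, ub: Dict[int, int], v: int) -> int:
    """Tightened bound for v from the bounds of its neighbors (never above ub[v])."""
    return min(ub[v], h_index(ub[w] for w in adj[v]))
```

Starting from the degrees (Proposition 1), a vertex whose three neighbors have degrees 3, 3 and 2 gets its bound lowered from 3 to 2.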

Consider the example of Figure 1. According to Proposition 1, the initial upper bound involving each vertex is its degree. Then, following Proposition 2, the upper bound of one vertex can be improved (decreased) below its degree, and, similarly, the upper bounds of four other vertices can be improved to 2, 2, 1 and 1, respectively. By Proposition 3, two further upper bounds can then be deduced that are better than the degrees.

Based on these propositions, we devise the UBP procedure (see Algorithm 2) to calculate an upper bound involving each vertex. Initially, $ub(v)$ is set to $\deg(v)$ for every vertex $v$ (Proposition 1); then the upper bound of each vertex in $U$ is improved according to Proposition 2 (lines 2-9). From line 10 to the end of Algorithm 2, the procedure propagates the upper bounds based on Proposition 3 until they cannot be improved any more. The propagation procedure is guaranteed to converge, since an upper bound is only ever replaced by a strictly smaller nonnegative integer. Experiments in Section 7 show that, for both random and large real-life instances, UBP converges very fast, within a limited number of iterations.

In both lines 7 and 14, we use binary search to find, for a given multiset of integers $S$, the largest integer $k$ such that at least $k$ integers in $S$ are larger than or equal to $k$. The procedure works as follows: first, $S$ is sorted in decreasing order; then, each iteration compares the middle element of the current range with its index in $S$ (i.e., its position in the sorted list). If the middle element is greater (respectively smaller) than its index, the next iteration proceeds with the second half (respectively the first half) of the range. Once $S$ is sorted, this binary search performs at most $O(\log |S|)$ comparisons.
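The dichotomic search described above can be written as follows. This is a sketch under our own conventions: the function name `max_k_binary_search` is hypothetical, positions are 1-based as in the text, and ties are handled by testing `>=`. The search is valid because, in a decreasingly sorted list, the test "element at position $i$ is at least $i$" holds on a prefix and fails afterwards.

```python
from typing import List

# Illustrative sketch; `max_k_binary_search` is a hypothetical name.


def max_k_binary_search(values: List[int]) -> int:
    """Largest k such that at least k entries of `values` are >= k.

    Sort in decreasing order, then binary-search the last 1-based position i
    with s[i] >= i. Since s decreases while i increases, the test is true on a
    prefix only, so the answer is the length of that prefix.
    """
    s = sorted(values, reverse=True)
    lo, hi = 0, len(s)            # invariant: the answer lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi + 1) // 2  # candidate k (1-based position)
        if s[mid - 1] >= mid:
            lo = mid              # middle element large enough: go right
        else:
            hi = mid - 1          # middle element too small: go left
    return lo
```

For example, `max_k_binary_search([1, 2, 3, 4, 5])` returns 3: three values are at least 3, but only two are at least 4.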

Actually, we can also tighten the initial upper bound involving each vertex in $V$ by repeating the process of lines 2-9 with $U$ replaced by $V$, before the propagation procedure (lines 10-17) starts. However, this procedure requires considerable memory and time, especially for large graphs. For example, a matrix storing the common-neighbor counts of all pairs of vertices in $V$ requires $O(|V|^2)$ memory, and filling it requires one update for every pair of vertices sharing a neighbor. Thus the overhead is not negligible. As a compromise, we set a threshold on the size of the vertex set: the procedure of lines 2-9 is applied to improve the upper bound involving each vertex only when the cardinality of the vertex set ($U$ or $V$) is less than the threshold. In the following experiments, the threshold has empirically been set to 30000.

Input: Graph instance $G = (U \cup V, E)$
Output: An upper bound $ub(v)$ for each vertex $v$ in $G$.
1  $ub(v) \leftarrow \deg(v)$ for all $v \in U \cup V$;
2  $c(u, w) \leftarrow 0$ for all $u, w \in U$;
3  for $v \in V$ do
4      for $u \in N(v)$, $w \in N(v)$ do
5          $c(u, w) \leftarrow c(u, w) + 1$;
6  for $u \in U$ do
7      Binary search for the largest integer $k$ such that $|\{w \in U : c(u, w) \ge k\}| \ge k$;
8      if $k < ub(u)$ then
9          $ub(u) \leftarrow k$;
10 $changed \leftarrow$ true;
11 while $changed$ do
12     $changed \leftarrow$ false;
13     for $v \in U \cup V$ do
14         Binary search for the largest integer $k$ such that $|\{w \in N(v) : ub(w) \ge k\}| \ge k$;
15         if $k < ub(v)$ then
16             $ub(v) \leftarrow k$;
17             $changed \leftarrow$ true;
18 return $ub$
Algorithm 2 Upper bound propagation procedure
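Putting Propositions 1-3 together, Algorithm 2 can be sketched in Python as below. This is our own illustrative reading, not the authors' implementation: common-neighbor counts are recomputed per vertex from its two-hop neighborhood instead of being stored in a matrix, the Proposition 2 step is applied to a side only when that side has fewer than `threshold` vertices (30000 in the experiments), and a linear-scan `h_index` stands in for the binary search of lines 7 and 14. The names `ubp` and `h_index` are hypothetical.

```python
from typing import Dict, Iterable, Set

# Illustrative sketch of UBP; names are hypothetical.
Graph = Dict[int, Set[int]]


def h_index(values: Iterable[int]) -> int:
    """Largest k such that at least k of the values are >= k."""
    k = 0
    for i, x in enumerate(sorted(values, reverse=True), start=1):
        if x < i:
            break
        k = i
    return k


def ubp(adj: Graph, U: Set[int], V: Set[int], threshold: int = 30000) -> Dict[int, int]:
    """Upper bound ub[v] on the half-size of any balanced biclique containing v."""
    # Proposition 1: start from the degrees.
    ub = {v: len(adj.get(v, set())) for v in U | V}
    # Lines 2-9 (Proposition 2), only on sides smaller than the threshold.
    for side in (U, V):
        if len(side) >= threshold:
            continue
        for u in side:
            # Vertices sharing a neighbor with u (u itself included if deg(u) > 0).
            two_hop = set().union(*(adj[x] for x in adj.get(u, set())))
            counts = (len(adj[u] & adj[w]) for w in two_hop)
            ub[u] = min(ub[u], h_index(counts))
    # Lines 10-17 (Proposition 3): propagate until no bound decreases.
    changed = True
    while changed:
        changed = False
        for v in U | V:
            k = h_index(ub[w] for w in adj.get(v, set()))
            if k < ub[v]:
                ub[v] = k
                changed = True
    return ub
```

Because each update only lowers a bound and the update rule is monotone, the order in which vertices are visited does not change the final bounds, only the number of sweeps.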

To see how tight the upper bounds provided by UBP are, consider again the example of Figure 1: the final upper bounds returned by UBP are actually all tight, i.e., for every vertex, the bound equals the half-size of the largest balanced biclique containing that vertex.
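Tightness claims of this kind can be checked on small graphs by brute force. The sketch below (hypothetical name `exact_vertex_bounds`; exponential in $|U|$, so only suitable for toy instances) computes, for each vertex, the half-size of the largest balanced biclique containing it. A valid upper bound can never be below these values, and equality means the bound is tight.

```python
from itertools import combinations
from typing import Dict, Set

# Illustrative brute-force checker; `exact_vertex_bounds` is a hypothetical name.
Graph = Dict[int, Set[int]]


def exact_vertex_bounds(adj: Graph, U: Set[int], V: Set[int]) -> Dict[int, int]:
    """Exact half-size of the largest balanced biclique containing each vertex.

    Enumerates every nonempty A subset of U. With CN(A) the common neighbors
    of A, a balanced biclique of half-size min(|A|, |CN(A)|) exists inside
    A and CN(A), and it can be chosen to contain any given vertex of either.
    """
    best = {v: 0 for v in U | V}
    order = sorted(U)
    for r in range(1, len(order) + 1):
        for A in combinations(order, r):
            cn = set.intersection(*(adj.get(a, set()) for a in A)) & V
            h = min(len(A), len(cn))
            for x in list(A) + list(cn):
                best[x] = max(best[x], h)
    return best
```

Comparing this dictionary with the output of a UBP implementation on random toy graphs is a cheap way to detect an invalid bound (any vertex whose UBP value is smaller than the exact value).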

4.2 Combining UBP with BBClq: ExtBBClq

As UBP is independent of the search algorithm, we use it as a pre-processing procedure for BBClq to obtain an extended version named ExtBBClq. In ExtBBClq, we use the same branching heuristic as in the original BBClq algorithm: the vertex of minimum degree in $P_A$ is given the highest priority for branching. To implement ExtBBClq efficiently, we sort the adjacency arrays $N(v)$ ($v \in U \cup V$) in ascending order of vertex index before the search starts, so that the intersection operation in line 11 of Algorithm 1 can be carried out by binary search, at a cost of $O(\log \deg(u))$ per vertex of $P_B$. More importantly, to make use of the upper bound information calculated by UBP, ExtBBClq does not call an upper bound estimation method in line 9; instead, it uses the pre-computed $ub(u)$ returned by UBP as the upper bound of the current branch.
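The binary-search intersection mentioned above can be sketched with Python's standard `bisect` module. The helper name `filter_by_sorted_adjacency` is hypothetical; it keeps the candidates of $P_B$ that appear in the sorted neighbor array of the branch vertex, preserving their order.

```python
from bisect import bisect_left
from typing import List

# Illustrative sketch; `filter_by_sorted_adjacency` is a hypothetical name.


def filter_by_sorted_adjacency(candidates: List[int], nbrs_sorted: List[int]) -> List[int]:
    """Keep the candidates that occur in `nbrs_sorted` (ascending vertex indices).

    Each membership test is a binary search costing O(log len(nbrs_sorted));
    the relative order of `candidates` is preserved.
    """
    kept = []
    for v in candidates:
        i = bisect_left(nbrs_sorted, v)
        if i < len(nbrs_sorted) and nbrs_sorted[i] == v:
            kept.append(v)
    return kept
```

For example, `filter_by_sorted_adjacency([8, 5, 7, 6], [5, 6, 7])` returns `[5, 7, 6]`.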

5 A tighter mathematical formulation

In this section, we propose a tightened mathematical formulation for MBBP that takes advantage of the UBP procedure. Let us first recall the mathematical formulation of MBBP introduced in [5]:

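$\max \sum_{u \in U} x_u \qquad (1)$

subject to:

$x_u + x_v \le 1, \quad \forall \{u, v\} \in \bar{E} \qquad (2)$
$\sum_{u \in U} x_u = \sum_{v \in V} x_v \qquad (3)$
$x_w \in \{0, 1\}, \quad \forall w \in U \cup V \qquad (4)$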

where each vertex $w$ of $G$ is associated with a binary variable $x_w$ indicating whether the vertex is part of the biclique, and $\bar{E} = \{\{u, v\} : u \in U, v \in V, \{u, v\} \notin E\}$ is the set of edges in the complement bipartite graph of $G$. Constraint (2) requires that no pair of non-adjacent vertices be selected at the same time (i.e., the solution must form a biclique). Constraint (3) enforces that the biclique is balanced, and constraint (4) defines the binary variables.
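To make the model concrete, the following sketch enumerates all 0/1 assignments of a tiny instance and keeps those satisfying constraints (2)-(4), assuming objective (1) maximizes $\sum_{u \in U} x_u$, i.e., the half-size. It is only a correctness check of the model on toy graphs, not a substitute for a MIP solver such as CPLEX; the function name `solve_model_brute_force` is hypothetical.

```python
from itertools import product
from typing import List, Set, Tuple

# Illustrative model check; `solve_model_brute_force` is a hypothetical name.


def solve_model_brute_force(U: List[int], V: List[int],
                            edges: Set[Tuple[int, int]]) -> int:
    """Optimal value of model (1)-(4) on a tiny graph, by full enumeration.

    Assumes objective (1) is max sum_{u in U} x_u; `edges` holds pairs (u, v).
    """
    non_edges = [(u, v) for u in U for v in V if (u, v) not in edges]  # E-bar
    best = 0
    for bits in product((0, 1), repeat=len(U) + len(V)):   # constraint (4)
        x = dict(zip(U + V, bits))
        if any(x[u] + x[v] > 1 for u, v in non_edges):      # constraint (2)
            continue
        if sum(x[u] for u in U) != sum(x[v] for v in V):    # constraint (3)
            continue
        best = max(best, sum(x[u] for u in U))              # objective (1)
    return best
```

The enumeration visits $2^{|U|+|V|}$ assignments, so it is only meant for graphs with a handful of vertices.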

To make use of the upper bounds returned by UBP, let (or ) be the set of all the vertices in (respectively in ) such that ( is a positive integer) for all . Then the following inequality is valid:

Indeed, the vertices in can only be involved in balanced cliques having half-size less than . We consider this inequality for as it dominates the inequalities associated with lower values of .

Before tightening this inequality, we observe that since , we have . Then for each such that , or equivalently for all , we can lift the term associated with :

Let be any maximal subset of containing such that for all and , then is empty. The term ‘maximal subset’ means that no vertex can be added to .

We can deduce the following valid inequality:

It can be observed that the valid inequalities built from two vertices and of may possibly be identical, especially if . This is not an issue since modern solvers remove duplicate constraints automatically during presolving.

Naturally, the lower is, the tighter these inequalities are. Consider the example of Figure 1, since the upper bound involving each vertex is given by UBP, we can produce the following valid inequalities ():

  • Vertex 1 (and also 4) leads to

  • Vertex 5 leads to

  • Vertex 6 leads to

  • Vertex 10 leads to

The LP relaxation of the original formulation (1)-(4) yields an objective of 2.5, and nearly all the variables are fractional. Adding these four inequalities yields an objective of 2 and an integer solution, which proves to be optimal.

6 A novel MBBP algorithm ExtUniBBClq

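We observe that for any biclique $A \cup B$ such that $A \subseteq U$ and $B = \bigcap_{a \in A} N(a)$, the maximum balanced biclique in the subgraph $G[A \cup B]$ is $A' \cup B'$ with $A' \subseteq A$, $B' \subseteq B$ and $|A'| = |B'| = \min(|A|, |B|)$, i.e., its half-size is $\min(|A|, |B|)$. Moreover, if $|A| > |B|$, any subset of $A$ of size $|B|$ has a common neighborhood containing $B$, so nothing is lost by considering only the subsets $A$ with $|A| \le |B|$. In other words, the half-size of the maximum balanced biclique in $G$ is the cardinality of the largest subset $A \subseteq U$ which satisfies $|A| \le |\bigcap_{a \in A} N(a)|$. As a result, instead of building the two sets of a balanced biclique alternately, we can directly enumerate the eligible subsets $A$ of $U$ (or of $V$) such that $|A| \le |\bigcap_{a \in A} N(a)|$. Based on this observation, we propose a new algorithm (Algorithm 3) which builds the maximum eligible subset from $U$.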
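The observation can be checked by brute force on a toy graph: the largest subset $A \subseteq U$ with $|A| \le |\bigcap_{a \in A} N(a)|$ gives the maximum half-size. The function name `max_eligible_subset` is hypothetical and the enumeration is exponential; Algorithm 3 replaces it with a branch-and-bound search.

```python
from itertools import combinations
from typing import Dict, Set

# Illustrative brute force; `max_eligible_subset` is a hypothetical name.
Graph = Dict[int, Set[int]]


def max_eligible_subset(adj: Graph, U: Set[int], V: Set[int]) -> Set[int]:
    """Largest A subset of U with |A| <= |common neighbors of A| (brute force)."""
    best: tuple = ()
    order = sorted(U)
    for r in range(1, len(order) + 1):
        for A in combinations(order, r):
            common = set(V).intersection(*(adj.get(a, set()) for a in A))
            if len(A) <= len(common) and len(A) > len(best):
                best = A
    return set(best)
```

Any $|A|$ vertices of the common neighborhood, together with the returned $A$, form a maximum balanced biclique.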

Input: Graph instance $G$; $A$ - the current subset of $U$; $P$ - the candidate subset of $U$; $B$ - the common neighbors of the vertices in $A$, i.e., $B = \bigcap_{a \in A} N(a)$
Output: A maximum balanced biclique of $G$
1  if $|A| \le |B|$ AND $|A| > lb$ then
2      $lb \leftarrow |A|$;
3      Record current best biclique $C^* \leftarrow A \cup B$;
4  if $|A| > |B|$ OR $|B| \le lb$ then
5      return make_balance($C^*$);
6  while $P \neq \emptyset$ do
7      if $|A| + |P| \le lb$ then
8          return;
9      $u \leftarrow$ the vertex of $P$ with the largest upper bound $ub(u)$;
10     $P \leftarrow P \setminus \{u\}$;
11     if $ub(u) \le lb$ then
12         return;
13     $A' \leftarrow A \cup \{u\}$;
14     $B' \leftarrow B \cap N(u)$;
15     $P' \leftarrow \{w \in P : ub(w) > lb\}$;
16     ExtUniBBClq($G$, $A'$, $P'$, $B'$);
17 return make_balance($C^*$);
Algorithm 3 ExtUniBBClq($G$, $A$, $P$, $B$), a new B&B procedure for MBBP based on enumerating one vertex set.
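The following Python sketch mirrors the structure of ExtUniBBClq described in this section. It is an illustration under our own assumptions, not the authors' code: the function name `ext_uni_bbclq` is hypothetical, the graph is a dictionary of neighbor sets, `ub` must contain valid per-vertex upper bounds (for example, those returned by UBP, or simply the degrees), and the comments refer to the line numbers mentioned in the description of Algorithm 3.

```python
from typing import Dict, List, Set, Tuple

# Illustrative sketch of ExtUniBBClq; `ext_uni_bbclq` is a hypothetical name.
Graph = Dict[int, Set[int]]  # vertex -> set of neighbors


def ext_uni_bbclq(adj: Graph, U: Set[int], V: Set[int],
                  ub: Dict[int, int]) -> Tuple[Set[int], Set[int]]:
    """Enumerate subsets A of U, keeping B = common neighbors of A.

    `ub` must hold valid per-vertex upper bounds (e.g. from UBP).
    """
    best = {"lb": 0, "A": set(), "B": set()}

    def search(A: List[int], P: List[int], B: List[int]) -> None:
        # Lines 1-3: if |A| <= |B|, A with |A| vertices of B is balanced.
        if len(A) <= len(B) and len(A) > best["lb"]:
            best["lb"], best["A"], best["B"] = len(A), set(A), set(B)
        # Adding vertices to A can only shrink B, so stop early here.
        if len(A) > len(B) or len(B) <= best["lb"]:
            return
        P = list(P)  # ascending ub, so P[-1] has the largest bound
        while P:
            if len(A) + len(P) <= best["lb"]:                 # lines 7-8
                return
            u = P.pop()                                        # lines 9-10
            if ub[u] <= best["lb"]:                            # lines 11-12
                return
            B_new = [v for v in B if v in adj.get(u, set())]   # line 14
            P_new = [w for w in P if ub[w] > best["lb"]]       # filter P
            search(A + [u], P_new, B_new)                      # recursive call

    search([], sorted(U, key=lambda u: (ub[u], u)), sorted(V))
    A, B = best["A"], best["B"]
    return A, set(sorted(B)[:len(A)])  # make_balance: keep |A| vertices of B
```

Sorting the candidates once by ascending bound makes the vertex with the largest bound available at the end of the list, and filtering keeps that order, so no re-sorting is needed inside the recursion.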

The framework of ExtUniBBClq is similar to that of Algorithm 1, except that ExtUniBBClq only builds the set $A$ recursively, such that $A \cup B$ forms a biclique with $B = \bigcap_{a \in A} N(a)$. Moreover, ExtUniBBClq makes use of the upper bound involving each vertex returned by UBP; therefore, UBP has to be called before ExtUniBBClq starts. Each call of ExtUniBBClq receives the current set $A$, the candidate set $P$ containing the vertices that can be moved into $A$, and the set $B$ of common neighbors of the vertices in $A$ (i.e., $B = \bigcap_{a \in A} N(a)$). The algorithm initializes $lb$ to 0 and begins with ExtUniBBClq($G$, $\emptyset$, $U$, $V$).

As in BBClq, in each call of ExtUniBBClq, a branch vertex $u$ (the one with the maximum upper bound) is removed from $P$ (lines 9-10), and the algorithm explores two branches: the branch where $u \in A$, before the end of the current loop (lines 11-15), and the branch where $u \notin A$, in the next loop. In lines 11-12, when the upper bound associated with vertex $u$ is not larger than the lower bound, ExtUniBBClq stops the current search immediately, since $ub(u)$ is the largest upper bound of all vertices in $P$. In lines 13-15, the search goes on by expanding $A$ and rebuilding $B$. Note that we filter out from $P$ the unpromising vertices whose upper bound is not larger than the lower bound. At the end of the loop, ExtUniBBClq is called to further enlarge the set $A$. After it returns, the algorithm moves to the next loop, entering the other branch with $u \notin A$. In lines 7-8, the search is stopped when $|A| + |P|$ is not large enough to build a better solution. In line 5 and at the end of the procedure, make_balance is called so that the final solution is a strictly balanced biclique of half-size $lb$.

At the start of each call to ExtUniBBClq, we update the lower bound if needed (lines 1-3) and terminate the current search if $A$ is not eligible ($|A| > |B|$) or if $|B|$ is not larger than the lower bound. For an efficient implementation, we pre-sort the array representing $P$ (initially $U$) in ascending order of the upper bound involving each vertex. Consequently, the last element of the array is always the vertex with the largest upper bound. We also sort the arrays representing $B$ (initially $V$) and $N(u)$ ($u \in U$) in ascending order of vertex index, so that the intersection operation in line 14 can be accomplished in linear time.

We illustrate the principle of this procedure with the example of Figure 1 again. First, the lower bound $lb$ is initialized to 0 and ExtUniBBClq($G$, $\emptyset$, $U$, $V$) is called. In the first while loop, the vertex of $P$ with the largest upper bound is selected as the branch vertex, and $A$, $P$ and $B$ are updated accordingly. The next call to ExtUniBBClq expands the incumbent set $A$, which leads to a solution of half-size 2 ($lb$ is updated to 2). The second while loop, which branches on the next vertex of $P$ and builds the corresponding candidate set, is stopped early, since the largest remaining upper bound is equal to $lb$. Thus, the whole search stops, returning the biclique of half-size 2 as the optimal solution.

7 Computational experiments

This section is dedicated to a computational evaluation of the proposed algorithms for MBBP, based on the following two sets of benchmark graphs which are commonly used in the literature [7, 17].

  • Random graphs. In these graphs, every possible edge occurs independently with a fixed probability. Each graph has $n$ vertices in each vertex set (i.e., $|U| = |V| = n$), and an edge exists between a pair of vertices $u \in U$, $v \in V$ with probability $p$. A theoretical analysis in [5] showed that, with high probability (when $n$ is sufficiently large), the maximum half-size of the balanced bicliques in such graphs lies within a specific range determined by $n$ and $p$.

  • Real-life networks. This set includes 30 bipartite networks from the Koblenz Network Collection (KONECT) [8], which contains hundreds of networks derived from real-life applications, including social networks, hyperlink networks, authorship networks, physical networks, interaction networks and communication networks. We summarize the main features of the selected instances in Table 1, including the numbers of vertices and edges. Information irrelevant for solving MBBP, such as multiple edges and vertex or edge weights, has been filtered out.
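Random instances of the first kind can be generated as follows. The generator is our own sketch (the function name `random_bipartite`, the vertex numbering and the use of a seeded `random.Random` are assumptions), not the script used to produce the benchmark graphs.

```python
import random
from typing import Dict, Set, Tuple

# Illustrative generator; `random_bipartite` is a hypothetical name.
Graph = Dict[int, Set[int]]


def random_bipartite(n: int, p: float, seed: int = 0) -> Tuple[Set[int], Set[int], Graph]:
    """Return (U, V, adj) with |U| = |V| = n; each U-V edge appears with probability p."""
    rng = random.Random(seed)
    U = set(range(n))
    V = set(range(n, 2 * n))
    adj: Graph = {v: set() for v in U | V}
    for u in range(n):
        for v in range(n, 2 * n):
            if rng.random() < p:  # independent Bernoulli(p) trial per pair
                adj[u].add(v)
                adj[v].add(u)
    return U, V, adj
```

For example, `random_bipartite(50, 0.5, seed=1)` yields an instance of the $n = 50$, $p = 0.5$ configuration; fixing the seed makes runs reproducible.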

instance |U| |V| |E| instance |U| |V| |E|
actor-movie 127823 383640 1470418 edit-dewiki 425842 3195148 57323775
bibsonomy-2ui 5794 767447 2555080 edit-frwiktionary 5017 1907247 7399298
bookcrossing_full-rating 40523 105278 1149739 escorts 6624 10106 50632
dblp-author 1425813 4000150 8649016 flickr-groupmemberships 103631 395979 8545307
dbpedia-genre 7783 258934 463497 github 56519 120867 440237
dbpedia-location 53407 172091 293697 gottron-trec 556077 1173225 83629405
dbpedia-occupation 101730 127577 250945 jester1 100 73421 4136360
dbpedia-producer 48833 138844 207268 moreno_crime 551 829 1476
dbpedia-recordlabel 18421 168337 233286 opsahl-ucforum 522 899 33720
dbpedia-starring 76099 81085 281396 pics_ut 17122 82035 2298816
dbpedia-team 34461 901166 1366466 reuters 283911 781265 60569726
dbpedia-writer 46213 89356 144340 stackexchange-stackoverflow 96680 545196 1301942
discogs_affiliation 270771 1754823 14414659 unicodelang 254 614 1255
discogs_lgenre 6624 10106 50632 wiki-en-cat 182947 1853493 3795796
discogs_style 383 1617943 24085580 youtube-groupmemberships 30087 94238 293360
Table 1: The basic information of selected real-life networks.

We compare the performance of 5 algorithms including both the existing approaches and the new approaches proposed in this work. The first 3 algorithms are B&B algorithms.

  • BBClq: the algorithm introduced in [9]. However, compared with the original algorithm, symmetry breaking and clique cover techniques are removed as they are irrelevant for bipartite graphs.

  • ExtBBClq: the extended version of BBClq that integrates UBP, as presented in Section 4.2.

  • ExtUniBBClq: the new algorithm introduced in Section 6.

To compare the original mathematical formulation and the tightened formulation presented in this work, we use IBM CPLEX 12.6.1 to solve the benchmark instances with both formulations.

  • Original: the original mathematical formulation of MBBP from [5].

  • Tightened: the formulation with the additional inequalities introduced in Section 5.

n p UBP-time UBP-iter BBClq ExtBBClq ExtUniBBClq MIP-Original MIP-Tightened
50 0.1 0.00 3.1 0.00 0.00 0.00 4.02 0.41
50 0.3 0.00 2.8 0.02 0.01 0.01 4.62 4.70
50 0.5 0.00 3.3 0.45 0.15 0.3 6.48 11.52
50 0.7 0.00 3.0 24.19 5.12 11.60 8.49 7.47
50 0.9 0.00 3.2 10174.28(5) 680.11 4405.93(28) 0.39 0.33
100 0.1 0.00 3.8 0.01 0.00 0.00 30.26 4.77
100 0.3 0.00 3.2 0.96 0.42 0.78 457.77 510.60
100 0.5 0.01 3.3 118.97 32.22 52.75 7137.27 4392.06
100 0.7 0.01 3.1 [13.50-] 9540.81(17) [13.73-] [13.57-20.30] [13.40-20.00]
100 0.9 0.00 2.8 [26.07-] [25.37-] [26.87-] 9977.23(6) 10358.21(4)
150 0.1 0.00 3.7 0.06 0.04 0.05 305.04 225.67
150 0.3 0.01 3.0 11.28 3.44 8.75 9953.88(15) 9937.61(14)
150 0.5 0.02 3.3 4716.49 933.95 2281.69 [8.86-28.38] [8.89-28.46]
150 0.7 0.04 3.4 [14.73-] [14.73-] [15.00-] [14.82-41.86] [14.62-42.08]
150 0.9 0.05 3.3 [29.53-] [27.93-] [30.23-] [35.04-55.58] [34.71-55.54]
200 0.1 0.01 3.3 0.19 0.07 0.26 1725.24 1239.80
200 0.3 0.03 3.2 84.08 21.95 59.22 [6.00-38.44] [6.00-22.52]
200 0.5 0.05 3.3 [9.97-] 10761.34(3) [10.03-] [8.95-55.68] [9.05-50.76]
200 0.7 0.08 3.3 [15.30-] [15.23-] [15.93-] [15.45-64.32] [15.67-65.78]
200 0.9 0.11 3.3 [31.93-] [30.10-] [32.60-] [38.70-79.19] [38.21-79.52]
Table 2: Computational results of the 5 algorithms for the random graphs.
instance BEST UBP-time UBP-iter BBClq ExtBBClq ExtUniBBClq MIP-Original MIP-Tightened
actor-movie 8* 5.54 27 6533.01 1671.29 807.25 - -
bibsonomy-2ui 8* 1.56 7 491.36 13.84 9.13 - -
bookcrossing_full-rating 13* 5.11 33 3102.66 426.37 [10-] - -
dblp-author 10* 19.86 21 [1-] 403.16 30.06 - -
dbpedia-genre 7* 3.94 9 171.86 5.83 16.35 - -
dbpedia-location 5* 0.18 8 633.98 0.52 0.39 - -
dbpedia-occupation 6* 0.27 8 909.03 1.29 1.57 - -
dbpedia-producer 6* 0.27 11 535.44 0.62 0.65 - -
dbpedia-recordlabel 6* 24.33 7 214.45 24.67 24.04 - -
dbpedia-starring 6* 1.07 31 530.61 4.67 1.39 - -
dbpedia-team 6* 3.08 15 2982.24 241.06 1170.25 - -
dbpedia-writer 6* 0.19 12 283.16 0.35 0.23 - -
discogs_affiliation 26* 12.01 17 [1-] 1688.95 [18-] - -
discogs_lgenre 15* 0.06 1 37.08 1.01 0.17 - -
discogs_style 38 17.42 22 [23-] [38-] [23-] - -
edit-dewiki 40 93.68 23 [1-] [40-] [14-] - -
edit-frwiktionary 19* 9.56 9 944.21 152.5 [19-] - -
escorts 6* 5.69 6 7.68 10.05 10.55 - -
flickr-groupmemberships 36 47.37 36 [34-] [36-] [18-] - -
github 12* 1.01 16 677.72 150.66 [12-] - -
gottron-trec 83 549.21 35 [33-] [38-] [83-] - -
jester1 100* 1.86 1 1204.64 1123.66 4.24 248.87 -
moreno_crime 2* 0.05 3 0.05 0.06 0.06 3483.58 55.22
opsahl-ucforum 5* 0.09 10 0.18 0.13 0.26 [4-285] [5-7]
pics_ut 27 21.84 7 [27-] [23-] [23-] - -
reuters 39 611.76 61 [35-] [39-] [12-] - -
stackexchange-stackoverflow 9* 4.62 29 4107.56 265.8 3690.8 - -
unicodelang 4* 0.02 5 0.01 0.02 0.03 1218.46 19.58
wiki-en-cat 14* 7.67 20 [1-] 28.72 121.99 - -
youtube-groupmemberships 12* 0.96 21 222.88 11.49 1784.76 - -
Table 3: Computational results of the 5 algorithms for KONECT instances.

All the experiments are conducted on a computer with an Intel Xeon E5-2670 processor (2.5 GHz, 2 GB RAM) running CentOS 6.5. The BBClq, ExtBBClq and ExtUniBBClq algorithms are implemented in C++ and compiled with g++ using the optimization option -O3 (the code of our algorithms will be available online). For each instance, a cut-off time limit of 3 hours (10800 seconds) is given to each trial. When running the DIMACS machine benchmark program ‘dfmax.c’ (ftp://dimacs.rutgers.edu/pub/dsj/clique/) compiled without optimization flags, the run times on our machine are 0.46, 2.68 and 10.70 seconds for graphs r300.5, r400.5 and r500.5, respectively.

The experimental results for random graphs are summarized in Table 2. For each configuration, i.e., each pairwise combination of $n$ (the cardinality of $U$ or $V$) and $p$ (the edge density), we generated 30 graphs independently. Column “time” under “UBP” reports the pre-processing time (the time of running UBP) and column “iter” shows the average number of iterations of the while loop (i.e., lines 11-17 of Algorithm 2) needed to stabilize the upper bounds of all the vertices. For each algorithm, we report the average time in seconds needed to solve the corresponding instances, pre-processing included. Note that 0.00 means that the instances are solved in less than 0.01 second. If some of the 30 instances cannot be solved within 3 hours, the number of solved instances is appended in parentheses. If none of the 30 instances can be solved, we report in square brackets the average best lower bound for the first three algorithms, and both the average best lower bound and the average upper bound for MIP. The shortest times among the first three algorithms and between the two mathematical formulations are highlighted in bold.

For the tested random graphs, the time consumption of UBP is insignificant with respect to the whole search time. Meanwhile, the number of iterations for propagating upper bounds is also small (close to 3 for all the configurations). In terms of computational time, ExtBBClq is the fastest algorithm on most of these instances, while ExtUniBBClq also performs better than BBClq. The MIP formulation is found to be quite competitive with the other three algorithms when the density of the instance reaches 0.9. A possible explanation for this phenomenon is that the formulation of dense instances involves fewer constraints and may be solved more easily. In addition, as the graph density increases, the maximum biclique and the maximum balanced biclique tend to become closer and closer, which means that the balancing constraint, which makes the problem NP-hard, tends to be less and less active. Since the maximum biclique problem is polynomially solvable on bipartite graphs, a balanced biclique is likely to be found easily by a MIP solver (as the quality of the linear programming relaxation improves with density), whereas high density is the worst situation for enumerative approaches. As expected, the tightened MIP is often solved faster than the original MIP, and the gap to optimality is generally smaller for the instances that could not be solved to optimality.

We report the results for the set of large real-life instances in Table 3. Column “instance” gives the name of the graph. Column “best” shows the best half-size found by all algorithms, and an extra “*” indicates that this value is proven optimal. Column “UBP” again reports the pre-processing time and the number of iterations needed to propagate the upper bounds. For each algorithm, we report the computational time to solve the instance. As in Table 2, when optimality is not proven, the best lower bound is reported for the first three algorithms, while for MIP both the lower bound and the upper bound are presented.

For the large real-life instances (see Table 3), the time spent by UBP is still insignificant with respect to the total search time. The number of iterations also remains below 40 for these very large instances. We note that the new algorithms with UBP (ExtBBClq and ExtUniBBClq) dominate the original algorithms in terms of computational time. The extended version ExtBBClq reduces the time of BBClq from hundreds of seconds to less than 30 seconds for 10 instances. It is also the only algorithm that solves discogs_affiliation. ExtUniBBClq is faster than ExtBBClq on 8 instances and achieves a substantial speed-up on jester-1. CPLEX is no longer able to give a lower bound for most instances (although we still observe that the tightened formulation leads to better performance on the 2 instances it solves).

7.1 The Size of the B&B Tree

In this section, we compare the sizes of the B&B trees generated by the algorithms when solving MBBP instances. Although BBClq and ExtBBClq do not explicitly declare B&B nodes, we treat each call of the BBClq() procedure as one B&B node. In CPLEX, the number of nodes in the search tree is directly available. We exclude ExtUniBBClq because its search scheme and branching heuristic differ from those of BBClq and ExtBBClq.
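To make the node-counting convention concrete, the sketch below counts every call of a recursive search procedure as one B&B node. The search itself is a deliberately simple exact method chosen for illustration, not BBClq or ExtBBClq: it grows a subset A of U in index order, maintains the common neighborhood N(A) in V, and prunes with the bound min(|A| + number of remaining candidates, |N(A)|), which is valid because adding vertices to A can only shrink N(A). The function name and graph encoding are hypothetical.

```python
def max_balanced_biclique(adj_u, n_v):
    """Exact half-size of a maximum balanced biclique, plus the B&B node count.

    adj_u[u] is the set of V-neighbors of vertex u in U; V = {0, ..., n_v - 1}.
    Every call of `expand` is counted as one B&B node.
    """
    best = 0
    nodes = 0

    def expand(size_a, common, cand):
        # size_a: |A|; common: V-vertices adjacent to all of A; cand: U-vertices still addable
        nonlocal best, nodes
        nodes += 1
        # A plus any |A| vertices of `common` is a balanced biclique of half-size min(|A|, |common|)
        best = max(best, min(size_a, len(common)))
        for idx, u in enumerate(cand):
            # bound: A grows by at most len(cand) - idx vertices, and `common` can only shrink
            if min(size_a + len(cand) - idx, len(common)) <= best:
                break
            new_common = common & adj_u[u]
            if len(new_common) <= best:
                continue  # no descendant of this child can beat `best`
            expand(size_a + 1, new_common, cand[idx + 1:])

    expand(0, set(range(n_v)), list(range(len(adj_u))))
    return best, nodes
```

For example, on the complete bipartite graph K_{3,3}, `max_balanced_biclique([{0, 1, 2} for _ in range(3)], 3)` returns `(3, 4)`: a half-size of 3 found after four calls, that is, four B&B nodes under this convention.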

Firstly, we generate 18 random instances with the cardinality of each vertex set fixed to 50 and the density ranging from 0.1 to 0.95, and we solve these instances with BBClq, ExtBBClq and CPLEX with the two formulations. We then compare the numbers of B&B tree nodes between BBClq and ExtBBClq on the one hand, and between CPLEX with the original and tightened formulations on the other hand. The results are shown in Figure 2.

Figure 2: The base-10 log scale number of B&B tree nodes explored by BBClq, ExtBBClq, and CPLEX with the original and tightened formulations to solve the random graphs of different densities.

From Figure 2, we can observe that the bounding technique and the extra inequalities significantly reduce the B&B tree size on sparse graphs (whose densities are below 0.2). When the density of the random graph is lower than 0.9, ExtBBClq always enumerates fewer B&B nodes than BBClq. However, when the density increases to 0.9, the B&B tree of ExtBBClq has the same size as that of BBClq. Since we have improved the performance of the intersection operation in the implementation of ExtBBClq (see Section 4.2), ExtBBClq still outperforms BBClq when the sizes of the B&B trees are equal (see Table 2). As to the two mathematical formulations, CPLEX can solve the tightened formulation without expanding any B&B node when the graph is either quite sparse or very dense.

Secondly, we compare the sizes of the B&B trees for the real-life instances. Figure 3 shows the number of B&B tree nodes of BBClq and ExtBBClq for the 21 instances that both algorithms can solve within 3 hours (see Table 3). We no longer compare the two mathematical formulations, as CPLEX fails to solve the majority of these large instances. Figure 3 indicates that ExtBBClq enumerates fewer tree nodes than BBClq on all 21 instances. This is especially true for dbpedia-producer, dbpedia-writer and moreno_crime, where ExtBBClq prunes more than half of the B&B nodes explored by BBClq.

Figure 3: The base-10 log scaled number of B&B tree nodes explored by the BBClq and ExtBBClq algorithms for the 21 solvable instances.

8 Conclusion and Future Work

In this paper, we proposed new ideas for designing exact algorithms for the NP-hard Maximum Balanced Biclique Problem. We introduced the Upper Bound Propagation (UBP) procedure to estimate a tight upper bound involving each vertex. UBP starts from an initial bound for each vertex and improves the upper bounds through a propagation procedure. Based on UBP, we extended the B&B algorithm (BBClq) of [9] and proposed new valid inequalities to tighten the MIP formulation of [5]. Furthermore, we presented a new exact algorithm (ExtUniBBClq) which enumerates eligible vertex subsets rather than feasible bicliques as BBClq does. Experiments showed the effectiveness of the newly proposed ideas on random graphs as well as large real-life instances. Further experiments also confirmed that our bounding technique reduces the size of the B&B search tree for the majority of benchmark instances.

As future work, it would be interesting to investigate the bit-parallel technique [11] within our algorithms to further improve their performance. Also, a branch-and-cut approach based on the tightened formulation constitutes another promising perspective for solving the problem. Finally, the idea of upper bound propagation could be adapted to other similar optimization problems.
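The bit-parallel idea of [11] stores vertex sets as bit vectors, so that intersecting neighborhoods, the core operation of clique-style branching, becomes a bitwise AND and a set cardinality becomes a population count. The following minimal sketch illustrates that representation using Python's arbitrary-precision integers as bit vectors; it illustrates the general technique under that assumption, not the implementation of [11] or of the algorithms in this paper, and all helper names are hypothetical.

```python
def to_bitset(vertices):
    """Encode a set of vertex indices as an integer bit vector (bit v set <=> v in the set)."""
    mask = 0
    for v in vertices:
        mask |= 1 << v
    return mask


def from_bitset(mask):
    """Decode a bit vector back into a sorted list of vertex indices."""
    return [v for v in range(mask.bit_length()) if (mask >> v) & 1]


def popcount(mask):
    """Number of vertices stored in the bit vector."""
    return bin(mask).count("1")


def common_neighborhood(nbr_bits, subset, n_v):
    """AND together the neighborhood bit vectors of all vertices in `subset`."""
    result = (1 << n_v) - 1  # start from the full vertex set V = {0, ..., n_v - 1}
    for u in subset:
        result &= nbr_bits[u]  # one AND replaces an explicit set intersection
    return result
```

In a C++ setting, the same idea is usually realized with arrays of 64-bit words and a hardware population-count instruction, so one AND processes 64 vertices at a time.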

References

  • [1] A. A. Al-Yamani, S. Ramsundar, D. K. Pradhan, A defect tolerance scheme for nanotechnology circuits, IEEE Transactions on Circuits and Systems 54 (11) (2007) 2402–2409.
  • [2] N. Alon, R. A. Duke, H. Lefmann, V. Rödl, R. Yuster, The algorithmic aspects of the regularity lemma, Journal of Algorithms 16 (1) (1994) 80–109.
  • [3] R. Carraghan, P. M. Pardalos, An exact algorithm for the maximum clique problem, Operations Research Letters 9 (6) (1990) 375–382.
  • [4] Y. Cheng, G. M. Church, Biclustering of expression data, in: Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology, vol. 8, 2000, pp. 93–103.
  • [5] M. Dawande, P. Keskinocak, J. M. Swaminathan, S. Tayur, On bipartite and multipartite clique problems, Journal of Algorithms 41 (2) (2001) 388–403.
  • [6] M. R. Garey, D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman & Co., New York, NY, USA, 1979.
  • [7] S. J. Hardiman, L. Katzir, Estimating clustering coefficients and size of social networks via random walk, in: Proceedings of the 22nd International Conference on World Wide Web, ACM, 2013, pp. 539–550.
  • [8] J. Kunegis, KONECT: The Koblenz Network Collection, in: Proceedings of the 22nd International Conference on World Wide Web, ACM, 2013, pp. 1343–1350.
  • [9] C. McCreesh, P. Prosser, An exact branch and bound algorithm with symmetry breaking for the maximum balanced induced biclique problem, in: International Conference on AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, LNCS 8451, Springer, 2014, pp. 226–234.
  • [10] S. Ravi, E. L. Lloyd, The complexity of near-optimal programmable logic array folding, SIAM Journal on Computing 17 (4) (1988) 696–710.
  • [11] P. San Segundo, D. Rodríguez-Losada, A. Jiménez, An exact bit-parallel algorithm for the maximum clique problem, Computers & Operations Research 38 (2) (2011) 571–581.
  • [12] M. Soto, A. Rossi, M. Sevaux, Three new upper bounds on the chromatic number, Discrete Applied Mathematics 159 (18) (2011) 2281–2289.
  • [13] M. B. Tahoori, Application-independent defect tolerance of reconfigurable nanoarchitectures, ACM Journal on Emerging Technologies in Computing Systems 2 (3) (2006) 197–218.
  • [14] Q. Wu, J.-K. Hao, A review on algorithms for maximum clique problems, European Journal of Operational Research 242 (3) (2015) 693–709.
  • [15] B. Yuan, B. Li, A low time complexity defect-tolerance algorithm for nanoelectronic crossbar, in: Proceedings of the International Conference on Information Science and Technology, IEEE, 2011, pp. 143–148.
  • [16] B. Yuan, B. Li, A fast extraction algorithm for defect-free subcrossbar in nanoelectronic crossbar, ACM Journal on Emerging Technologies in Computing Systems (JETC) 10 (3) (2014) 25.
  • [17] B. Yuan, B. Li, H. Chen, X. Yao, A new evolutionary algorithm with structure mutation for the maximum balanced biclique problem, IEEE Transactions on Cybernetics 45 (5) (2015) 1040–1053.