Towards Effective Exact Algorithms for the Maximum Balanced Biclique Problem
Abstract
The Maximum Balanced Biclique Problem (MBBP) is a prominent model with numerous applications. Yet, the problem is NP-hard and thus computationally challenging. We propose novel ideas for designing effective exact algorithms for MBBP. Firstly, we introduce an Upper Bound Propagation procedure to precompute an upper bound involving each vertex. Then we extend an existing branch-and-bound algorithm by integrating the precomputed upper bounds. We also present a set of new valid inequalities induced from the upper bounds to tighten an existing mathematical formulation for MBBP. Lastly, we investigate another exact algorithm scheme which enumerates a subset of balanced bicliques based on our upper bounds. Experiments show that compared to existing approaches, the proposed algorithms and formulations are more efficient in solving a set of random graphs and large real-life instances.
Keywords: Combinatorial optimization; Clique; Exact algorithms; Techniques for tight bounds; Mathematical formulation.
1 Introduction
Given a bipartite graph $G = (U, V, E)$ with two disjoint vertex sets $U$, $V$ and an edge set $E \subseteq U \times V$, a biclique $(A, B)$ (or $A \cup B$) is the union of two subsets of vertices $A \subseteq U$, $B \subseteq V$ such that $\forall i \in A$, $\forall j \in B$, $\{i, j\} \in E$. In other words, the subgraph induced by vertex set $A \cup B$ is a complete bipartite graph. If $|A| = |B|$, then biclique $(A, B)$ is a balanced biclique. The Maximum Balanced Biclique Problem (MBBP) is to find a balanced biclique of maximum cardinality. As $|A| = |B|$ holds for a balanced biclique $(A, B)$, MBBP is then to find the maximum half-size balanced biclique. MBBP is a special case of the conventional maximum clique problem [14].
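As a small illustration of the definition above, the following sketch checks whether two vertex subsets form a balanced biclique. The adjacency representation (a dictionary mapping each vertex of the first set to its neighbors in the second set) and the toy graph are assumptions made for this example only; they do not come from the paper.

```python
def is_balanced_biclique(adj, A, B):
    """Return True if every vertex of A is adjacent to every vertex of B
    and both sides have the same cardinality.

    adj: dict mapping each vertex of the first vertex set to the set of its
         neighbors in the second vertex set (illustrative representation).
    """
    A, B = set(A), set(B)
    complete = all(B <= adj[a] for a in A)  # B must lie inside every neighborhood
    return complete and len(A) == len(B)


# Hypothetical toy graph, not the graph of Figure 1.
adj = {"u1": {"v1", "v2"}, "u2": {"v1", "v2"}, "u3": {"v2"}}
print(is_balanced_biclique(adj, {"u1", "u2"}, {"v1", "v2"}))  # True
print(is_balanced_biclique(adj, {"u1", "u3"}, {"v1", "v2"}))  # False: u3 is not adjacent to v1
```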
MBBP is a prominent model with a large range of applications, such as nanoelectronic system design [1, 13], biclustering of gene expression data in computational biology [4] and PLA-folding in VLSI theory [10]. In terms of computational complexity, the decision version of MBBP is NP-complete [6, 2], though the maximum biclique problem in bipartite graphs (without the balance requirement) is polynomially solvable by a maximum matching algorithm [4].
Considerable effort has been devoted to the pursuit of effective algorithms for MBBP, both theoretically and practically. Heuristic algorithms represent the most popular approach for MBBP, though they do not guarantee the optimality of the final solution found. The majority of existing heuristic algorithms solve the equivalent maximum balanced independent set (a vertex set such that no two vertices are adjacent) problem in the complement graph, rather than directly seeking the maximum balanced biclique from the given graph. For example, several greedy heuristic algorithms were proposed based on vertex-deletion on the complement graph from 2006 to 2014 [1, 13, 15, 16], while in [17], an evolutionary algorithm combining structure mutation and repair-assisted restart was studied.
On the other hand, according to our literature review, there are only two studies on exact algorithms in the literature. In [13], a recursive exact algorithm for searching a maximum balanced independent set with a given half-size in the complement graph was proposed. However, the computational time of this algorithm becomes prohibitive when the number of vertices of the given graph exceeds (32,32). In [9], a branch-and-bound (B&B) algorithm for MBBP for general graphs (including non-bipartite graphs) was studied. The algorithm incorporates a clique cover technique for upper bound estimation (an equivalent technique of using graph coloring to estimate the upper bound for the maximum clique problem) and employs lex symmetry breaking techniques for general graphs. As far as we know, this algorithm is currently the best performing exact algorithm, even though the bounding technique and symmetry breaking techniques are only effective for non-bipartite graphs.
In addition to specifically designed exact algorithms, general Mixed Integer Programming (MIP) constitutes an interesting alternative for addressing hard combinatorial problems such as MBBP. Commercial MIP solvers, like IBM CPLEX, can even solve some hard instances which cannot be handled by other approaches. Meanwhile, the success of a MIP solver highly depends on the tightness of the mathematical formulation of the problem. For MBBP, a MIP formulation based on the complement graph has been proposed in [5]. Another mathematical formulation which defines the constraints on the original graph was presented in [17]. However, this formulation is not applicable to MIP solvers as it contains nonlinear constraints.
In this work, we introduce new ideas for developing effective exact algorithms for MBBP, which can be applied to solve very large MBBP instances from applications like social networks. Our main contributions can be summarized as follows.

We elaborate an Upper Bound Propagation (UBP) procedure inspired by [12], which produces an upper bound of the maximum balanced biclique involving each vertex in the bipartite graph. UBP propagates the initial upper bound involving each vertex and achieves an even tighter upper bound for each vertex. UBP is independent from the search procedure and is performed before the start of the algorithm. An extended exact algorithm, denoted by ExtBBClq, is proposed by taking advantage of UBP to improve BBClq, the branch-and-bound algorithm introduced in [9].

Based on the upper bounds returned by UBP, we introduce new valid inequalities to tighten the MIP formulation of MBBP introduced in [5]. Our computational experiments suggest that using the tightened model improves the performance of the MIP solver CPLEX.

We also present a new exact algorithm (ExtUniBBClq) to supplement the family of B&B-based algorithms for MBBP. Unlike BBClq which goes through every possible balanced biclique, the new algorithm only enumerates the possible partial sets (half-sets) of the balanced bicliques in the graph. ExtUniBBClq also integrates UBP as a preprocessing procedure and performs generally well for the benchmark instances.
The remainder of the paper is organized as follows. Section 2 introduces the notations that will be used throughout the paper and Section 3 reviews the BBClq algorithm introduced in [9]. In Section 4, we present our Upper Bound Propagation procedure for upper bound estimation and explain how to use it to improve BBClq. Then, in Section 5, we show how the upper bounds can lead to new valid inequalities to tighten the MIP formulation of [5]. Furthermore, we introduce the novel ExtUniBBClq algorithm in Section 6. Computational results and experimental analyses are presented in Section 7, followed by conclusions and directions for future work.
2 Notations
Given a bipartite graph ( if not specifically stated), let be a balanced biclique of (i.e., ). The half-size of the balanced biclique is the cardinality of (or ). For example, in Figure 1, is a balanced biclique of half-size 2. For all , denotes the subgraph of induced by . Given a vertex in , the set of vertices adjacent to is denoted by and is the degree of vertex . The upper bound involving vertex , denoted by , is an upper bound of the half-size of the maximum balanced biclique containing vertex . For example, in Figure 1, a possible value for could be 2, since .
3 Review of the BBClq algorithm
Algorithm 1 shows the BBClq algorithm, which is a recursive exact algorithm introduced in [9]. BBClq is adapted from a well-known B&B algorithm for the maximum clique problem [3] and recursively builds up two sets and such that forms a biclique. The algorithm maintains a candidate set () that includes vertices which are eligible to move into () while ensuring that is a biclique (i.e., , ). Initially, the algorithm sets , the global lower bound on the maximum biclique half-size, to 0 and starts the search by calling BBClq.
At each recursive call to BBClq, a vertex (called branch vertex) is moved from (lines 7-8). The algorithm then considers the branches (possibilities) of in lines 9-12 and in the next while loop. The bounding procedure (line 9) prunes the branch of if the upper bound after estimation in this context is not larger than the global lower bound. The upper bound estimating method, which is classically a key point concerning the performance of a B&B algorithm, will be introduced in the following section. If the current branch is not pruned, the search goes on by reconstructing with a new vertex and by filtering from those vertices not adjacent to (every vertex in must be adjacent to every vertex in ). After updating the two sets, the algorithm recursively calls BBClq in line 12, swapping the roles of and , as and are extended alternately for the sake of satisfying the balance requirement. The above process is repeated in the next recursive call of BBClq.
When the algorithm loops back to line 4, as we just mentioned, it explores another branch implying . The while loop stops when becomes empty or when the remaining vertices in do not allow to build a solution better than the global lower bound (lines 5-6). Besides, since or holds each time BBClq is called, we update the lower bound in lines 1-3 once and store the incumbent solution as the best solution found so far. As a result, the best solution is an optimal biclique with ( or ), but it may not be totally balanced (). Thus, in line 13, the procedure of retrieving the maximum balanced biclique (of half-size ) from a biclique is accomplished by make_balance(). This procedure simply removes vertices from the larger set or until a balanced biclique is obtained.
Figure 1 is now used to illustrate BBClq. Initially, and BBClq is called. According to the minimal degree heuristic in [9], vertex is chosen as the first branch vertex. Clearly, the current upper bound is greater than 0, the algorithm proceeds to BBClq to explore the solutions containing vertex . As a result, the solution is found and is updated to 1. Likewise, the algorithm selects as the second branch vertex in the following loop, proceeds to BBClq if no upper bounding technique is applied. We can see that this recursive call to BBClq has to explore the case of expanding the given biclique by adding vertex 8 or 10. However, with the upper bounding estimating technique proposed in this paper, the call of BBClq will not even start since the upper bound involving vertex 5 is 1 (). The algorithm finds the optimal solution after the third loop (which explores and calls BBClq). There will be no additional iteration as ().
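The recursive scheme reviewed in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of [9]: the names `bbclq`, `search` and `make_balance` as written here, the list-based candidate sets, the arbitrary branching order (the minimal degree heuristic is omitted) and the toy graph are all choices made for this example, and the only bounds used are the trivial ones based on candidate-set sizes.

```python
def make_balance(A, B):
    """Trim the larger side so both sides have equal size (any subset of a
    biclique side still yields a biclique)."""
    k = min(len(A), len(B))
    return A[:k], B[:k]


def bbclq(U, V, adj):
    """adj maps every vertex (of both sides) to the set of its neighbors."""
    best = {"lb": 0, "sol": ([], [])}

    def search(A, B, CA, CB):
        # Invariant: |A| == |B| or |A| + 1 == |B|, so the half-size is |A|.
        if len(A) > best["lb"]:
            best["lb"] = len(A)
            best["sol"] = make_balance(A, B)
        CA = list(CA)  # local copy; the caller's list is left untouched
        while CA:
            if len(A) + len(CA) <= best["lb"]:
                return  # side A can no longer exceed the lower bound
            v = CA.pop()  # branch vertex (arbitrary order in this sketch)
            new_CB = [w for w in CB if w in adj[v]]  # keep candidates adjacent to v
            if len(B) + len(new_CB) > best["lb"]:  # trivial bound on the other side
                # Swap roles: the other side is extended next.
                search(B, A + [v], new_CB, CA)

    search([], [], list(U), list(V))
    return best["lb"], best["sol"]


# Hypothetical toy graph, not the graph of Figure 1.
edges = [("u1", "v1"), ("u1", "v2"), ("u2", "v1"), ("u2", "v2"), ("u3", "v3")]
U, V = ["u1", "u2", "u3"], ["v1", "v2", "v3"]
adj = {x: set() for x in U + V}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

half_size, (side1, side2) = bbclq(U, V, adj)
print(half_size)  # 2
```

Replacing the trivial test `len(B) + len(new_CB) > best["lb"]` by a stronger per-vertex bound is where a precomputed upper bound would plug in.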
[Algorithm 1: The BBClq algorithm]
4 Upper bound propagation and its use to improve BBClq
We introduce in this section our Upper Bound Propagation procedure (UBP) which is then used as a preprocessing technique to reinforce the BBClq algorithm presented in the last section.
4.1 The upper bound propagation procedure
The original BBClq algorithm calculates a clique cover (based on addressing the graph coloring problem on the complement graph) to estimate the upper bound in a general graph relying on the fact that sets and are independent sets. However, when the given graph is bipartite, the upper bound found by this technique is trivial as the two vertex sets are initially independent sets. Here, we introduce our Upper Bound Propagation to produce, for each vertex, an upper bound on the half-size of any maximum balanced biclique involving that vertex. UBP is based on the following propositions.
Proposition 1
For each vertex $i$, its degree $|N(i)|$ (where $N(i)$ is the set of vertices adjacent to $i$) is an upper bound on the half-size of the maximum balanced biclique involving $i$.
This proposition is obviously true since the half-size of a balanced biclique cannot exceed the degree of any vertex in the biclique.
Proposition 2
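Given a vertex $i \in U$, let $N_2(i) = \bigcup_{j \in N(i)} N(j)$, where $N(j)$ denotes the set of vertices adjacent to $j$. Let $k$ be the maximum integer such that there exist at least $k$ vertices $j$ in $N_2(i)$ satisfying $|N(i) \cap N(j)| \ge k$; then $k$ is an upper bound on the half-size of the maximum balanced biclique involving $i$.
Proof: Clearly, in the maximum balanced biclique $(A, B)$ involving $i$, for any vertex $j \in A$ (including $i$), we have $|N(i) \cap N(j)| \ge |B| = |A|$. Therefore, the maximum possible value $k$ such that $k$ vertices in $N_2(i)$ share at least $k$ adjacent vertices with $i$ is an upper bound involving $i$. Note that this proposition also holds given any vertex in $V$.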
Proposition 3
Given a vertex $i$, let $k$ be the largest integer such that there exist $k$ vertices in $N(i)$ (the set of vertices adjacent to $i$) having upper bounds at least $k$. Then $k$ is an upper bound on the half-size of the maximum balanced biclique involving $i$.
Proof: We prove this proposition by contradiction. Suppose $k$ is not an upper bound; then there exists a balanced biclique $(A, B)$ involving $i$ of half-size $k'$ such that $k' > k$, implying that all the $k'$ vertices on the side of the biclique opposite to $i$ (which is a subset of $N(i)$) must have an upper bound of at least $k'$, which contradicts the condition that $k$ is the maximum integer such that there exist in $N(i)$ at least $k$ vertices having an upper bound of at least $k$.
Consider the example of Figure 1, according to Proposition 1, we have , . Then, following Proposition 2, can be improved (decreased) to since , (). Similarly can also be improved to 2, 2, 1, 1 respectively. By Proposition 3, it can be deduced that and (, ), which are better upper bounds than the degrees.
Based on these propositions, we devise the UBP procedure (see Algorithm 2) to calculate an upper bound involving each vertex. Initially is set to , then the upper bound of each vertex in is improved according to Proposition 2 (lines 2-9). From line 10 to the end of Algorithm 2, the procedure aims at propagating the upper bound based on Proposition 3 until the upper bounds cannot be improved any more. The propagation procedure is guaranteed to converge as the upper bounds cannot be smaller than 0. Experiments in Section 7 show that, for both random and real-life large instances, UBP converges very fast, in only a limited number of iterations.
In both lines 7 and 14, we use binary search to find, for a given set of integers, the maximum element such that there are at least integers in that are larger than or equal to . The procedure works as follows: first, is sorted by decreasing order, then, an iteration starts by comparing the middle element with its index in (i.e., its position in the sorted list). If the middle element is greater (respectively lesser) than its index, the next iteration proceeds with the second half (respectively the first half) of . This binary search procedure based on dichotomy performs at most operations.
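The search described in this paragraph can be sketched as below: after sorting in decreasing order, the test "the element at position p is at least p" holds on a prefix of the sorted list and fails afterwards, so the boundary can be located by bisection. The function name and the 0-based indexing are choices made for this example.

```python
def max_k_with_k_values_at_least_k(values):
    """Largest k such that at least k of the given integers are >= k."""
    s = sorted(values, reverse=True)
    lo, hi = 0, len(s)  # the answer lies in [0, len(s)]
    while lo < hi:
        mid = (lo + hi) // 2
        # s[mid] is the (mid + 1)-th largest value (0-based index mid).
        if s[mid] >= mid + 1:
            lo = mid + 1  # at least mid + 1 values are >= mid + 1: look further right
        else:
            hi = mid  # the boundary is at mid or to its left
    return lo


print(max_k_with_k_values_at_least_k([3, 0, 6, 1, 5]))  # 3
print(max_k_with_k_values_at_least_k([1, 1, 1]))        # 1
```

Once the list is sorted, the bisection itself takes a logarithmic number of comparisons; the sort dominates the cost unless the values are already kept in order.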
Actually, we can also tighten the initial upper bound involving each vertex in by repeating the process in lines 2-9 after replacing with before the propagating procedure (lines 10-17) starts. However, this procedure requires considerable memory and time, especially for large graphs. For example, the matrix representing requires a memory of , and the computational time of computing () is bounded by . Thus the overhead is not negligible. As a compromise, we set a threshold on the size of the vertex set. We apply the procedure of lines 2-9 to improve the upper bound involving each vertex only when the cardinality of the vertex set ( or ) is less than the threshold. In the following experiments, the threshold has empirically been set to 30000.
[Algorithm 2: The Upper Bound Propagation (UBP) procedure]
To see how tight the upper bounds provided by UBP are, consider the example of Figure 1, the final upper bound achieved by UBP is for and for . These upper bounds are actually all tight.
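The overall UBP procedure of this subsection can be sketched as follows. This is an illustrative reading of Propositions 1 to 3 under several assumptions, not a transcription of Algorithm 2: the function names, the dictionary-based adjacency, the in-place update order, the application of Proposition 2 to both vertex sets (without the size threshold), and the toy graph are all choices made for this example.

```python
def h_value(values):
    """Largest k such that at least k of the values are >= k (linear scan)."""
    s = sorted(values, reverse=True)
    k = 0
    while k < len(s) and s[k] >= k + 1:
        k += 1
    return k


def ubp(U, V, adj, use_prop2=True):
    """Return (ub, iterations); ub[i] bounds the half-size of any balanced
    biclique containing vertex i. adj maps each vertex to its neighbor set."""
    # Proposition 1: the degree is an upper bound.
    ub = {i: len(adj[i]) for i in list(U) + list(V)}
    if use_prop2:
        # Proposition 2: count same-side vertices sharing many neighbors with i.
        for i in ub:
            two_hop = set().union(*(adj[j] for j in adj[i]))  # contains i if deg > 0
            common = [len(adj[i] & adj[j]) for j in two_hop]
            ub[i] = min(ub[i], h_value(common))
    # Proposition 3: propagate until no bound can be lowered.
    iterations, changed = 0, True
    while changed:
        changed = False
        iterations += 1
        for i in ub:
            k = h_value([ub[j] for j in adj[i]])
            if k < ub[i]:
                ub[i] = k
                changed = True
    return ub, iterations


# Hypothetical toy graph, not the graph of Figure 1.
edges = [("u1", "v1"), ("u1", "v2"), ("u2", "v1"), ("u2", "v2"),
         ("u3", "v2"), ("u3", "v3")]
U, V = ["u1", "u2", "u3"], ["v1", "v2", "v3"]
adj = {x: set() for x in U + V}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

ub, iterations = ubp(U, V, adj)
print(ub["u3"], ub["v2"])  # 1 2
```

On this toy graph, `u3` has degree 2 but its bound drops to 1, and `v2` has degree 3 but its bound drops to 2, which matches the largest balanced biclique containing each of them.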
4.2 Combining UBP with BBClq: ExtBBClq
As UBP is independent of the search algorithm, we use it as a preprocessing procedure for BBClq to obtain an extended version named ExtBBClq. In ExtBBClq, we use the same branching heuristic as in the original BBClq algorithm: the vertex of the minimum degree in is given the highest priority for branching. To efficiently implement ExtBBClq, we sort the arrays () in ascending order of index number before the beginning of BBClq, so that the intersection operation in line 11 (Algorithm 1) can be accomplished in asymptotic time by binary search. More importantly, to make use of the upper bound information calculated by UBP, in ExtBBClq, instead of calculating the upper bound by calling the upper bound estimation method (i.e., in line 9), we use the precomputed returned by UBP as the upper bound in the current branch.
5 A tighter mathematical formulation
In this section, we propose a tightened mathematical formulation for MBBP that takes advantage of the UBP procedure. Let us first recall the mathematical formulation of MBBP introduced in [5]:
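$$\max \sum_{i \in U} x_i \tag{1}$$
subject to:
$$x_i + x_j \le 1, \quad \forall \{i, j\} \in \bar{E} \tag{2}$$
$$\sum_{i \in U} x_i - \sum_{j \in V} x_j = 0 \tag{3}$$
$$x_i \in \{0, 1\}, \quad \forall i \in U \cup V \tag{4}$$
where $U$ and $V$ are the two vertex sets of the bipartite graph $G$, each vertex $i$ of $U \cup V$ is associated to a binary variable $x_i$ indicating whether the vertex is part of the biclique, and $\bar{E}$ is the set of edges in the complement bipartite graph of $G$. Constraint (2) requires that each pair of nonadjacent vertices cannot be selected at the same time (i.e., the solution must form a biclique). Constraint (3) enforces that the biclique is balanced.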
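To make the role of each constraint concrete, the sketch below checks a 0/1 assignment against the constraints as described in words above (non-adjacent pairs exclude each other, both sides select the same number of vertices) and solves a tiny instance by exhaustive enumeration instead of a MIP solver. Taking the number of selected vertices on the first side as the objective is an assumption of this example, as are the function names and the toy graph.

```python
from itertools import product


def is_feasible(x, U, V, adj):
    """Check a 0/1 assignment x (dict vertex -> 0/1) against the constraints."""
    binary = all(x[v] in (0, 1) for v in U + V)
    # Each edge of the complement graph (a non-adjacent pair) allows at most one endpoint.
    no_conflict = all(x[i] + x[j] <= 1
                      for i, j in product(U, V) if j not in adj[i])
    balanced = sum(x[i] for i in U) == sum(x[j] for j in V)
    return binary and no_conflict and balanced


def brute_force_half_size(U, V, adj):
    """Enumerate all assignments; only sensible for very small graphs."""
    best = 0
    for bits in product((0, 1), repeat=len(U) + len(V)):
        x = dict(zip(U + V, bits))
        if is_feasible(x, U, V, adj):
            best = max(best, sum(x[i] for i in U))
    return best


# Hypothetical toy graph, not the graph of Figure 1.
U, V = ["u1", "u2", "u3"], ["v1", "v2", "v3"]
adj = {"u1": {"v1", "v2"}, "u2": {"v1", "v2"}, "u3": {"v2", "v3"}}
print(brute_force_half_size(U, V, adj))  # 2
```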
To make use of the upper bounds returned by UBP, let (or ) be the set of all the vertices in (respectively in ) such that ( is a positive integer) for all . Then the following inequality is valid:
Indeed, the vertices in can only be involved in balanced bicliques having half-size less than . We consider this inequality for as it dominates the inequalities associated with lower values of .
Before tightening this inequality, we observe that since , we have . Then for each such that , or equivalently for all , we can lift the term associated with :
Let be any maximal subset of containing such that for all and , then is empty. The term ‘maximal subset’ means that no vertex can be added to .
We can deduce the following valid inequality:
It can be observed that the valid inequalities built from two vertices and of may possibly be identical, especially if . This is not an issue since modern solvers remove duplicate constraints automatically during presolving.
Naturally, the lower is, the tighter these inequalities are. Consider the example of Figure 1, since the upper bound involving each vertex is given by UBP, we can produce the following valid inequalities ():

Vertex 1 (and also 4) leads to

Vertex 5 leads to

Vertex 6 leads to

Vertex 10 leads to
The LP relaxation of the original formulation (1)-(4) yields an objective of 2.5, and nearly all the variables are fractional. Adding these four inequalities yields an objective of 2 and an integer solution, which proves to be optimal.
6 A novel MBBP algorithm ExtUniBBClq
We observe that for any biclique such that and , the maximum balanced biclique in subgraph is with , and the maximum half-size of any is still . In other words, the half-size of the maximum balanced biclique in is the cardinality of the maximum subset which satisfies . As a result, instead of building the two sets of the balanced biclique alternately, we can directly enumerate the eligible subset from (or from ) such that . Based on this observation, we propose a new algorithm (Algorithm 3) which builds the maximum eligible subset from (as ).
[Algorithm 3: The ExtUniBBClq algorithm]
The framework of ExtUniBBClq is similar to Algorithm 1 except that ExtUniBBClq only builds the set recursively such that forms a clique and . Moreover, in ExtUniBBClq, we make use of the upper bound involving each vertex returned by UBP. Therefore, UBP has to be called before the start of ExtUniBBClq. For each call of ExtUniBBClq, a current set , as well as the candidate set that contains vertices that can be moved into , and set which is the common adjacent vertices of vertex in (i.e., ) are given. The algorithm initializes to 0 and begins from ExtUniBBClq.
As in BBClq, in each call of ExtUniBBClq, a branch vertex (with the maximum upper bound) is moved out from (lines 9-10) and the algorithm goes to two branches: the branch where before the end of current loop (lines 11-15) and another branch where in the next loop. In lines 11-12, when the upper bound associated with vertex is not larger than the lower bound, ExtUniBBClq stops the current search immediately since is the largest upper bound of all vertices in . In lines 13-15, the search goes on by expanding and rebuilding . Note that we filter out unpromising vertices from which have an upper bound not larger than the lower bound. At the end of the loop, ExtUniBBClq is called to further enlarge set . After it returns, the algorithm moves to the next loop, entering another branch with . In lines 7-8, the search is stopped when is not large enough to build a better solution. In lines 5 and 13, the make_balance procedure is called so that the final solution is a strictly balanced biclique of half-size .
At the start of each call to ExtUniBBClq, we update the lower bound if needed (lines 1-3) and terminate the current search if is not eligible () or is not larger than the lower bound. For an efficient implementation, we presort the array which represents (the initial ) in ascending order of the upper bound involving each vertex. Consequently, the last element in the array will always be the vertex with the largest upper bound. We also sort the arrays representing (the initial ) and () in ascending order of the index of each vertex so that the intersection operation in line 14 can be accomplished in linear time.
We illustrate the principle of this procedure by using the example of Figure 1 again. Firstly, the lower bound is initialized to 0 and ExtUniBBClq is then called. In the first while loop, vertex is selected as the branch vertex as is the largest upper bound of all vertices in ; then we get , , . The next call to ExtUniBBClq expands the incumbent set , which leads to a solution (and is updated to 2). The second while loop which branches on vertex and builds candidate set , is stopped earlier as the largest upper bound () is equal to . Thus, the whole search stops, returning as the optimal solution.
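The one-sided enumeration idea of this section can be sketched as follows. This is only an illustration under stated assumptions, not a transcription of Algorithm 3: the names `ext_uni_bbclq` and `search`, the set-based common neighborhood, the re-sorting of candidates at each call, and the toy graph are choices made for this example. For self-containment the per-vertex upper bounds passed in are simply the degrees (Proposition 1); in the paper they are the bounds returned by UBP.

```python
def ext_uni_bbclq(U, V, adj, ub):
    """Enumerate subsets A of U whose common neighborhood has at least |A| vertices.
    adj maps each vertex of U to its neighbor set; ub maps it to an upper bound."""
    best = {"lb": 0, "A": [], "B": []}

    def search(A, C, common):
        # common: vertices of V adjacent to every vertex of A.
        if len(A) > len(common):
            return  # A is not eligible: it cannot be matched by |A| common neighbors
        if len(A) > best["lb"]:
            best["lb"] = len(A)
            best["A"], best["B"] = list(A), sorted(common)[:len(A)]  # balance the sides
        C = sorted(C, key=lambda w: ub[w])  # vertex with the largest bound is last
        while C:
            if len(A) + len(C) <= best["lb"]:
                return  # not enough candidates left to improve
            v = C.pop()
            if ub[v] <= best["lb"]:
                return  # v has the largest bound, so no remaining vertex can help
            new_common = common & adj[v]
            if len(new_common) <= best["lb"]:
                continue  # any biclique containing A + [v] has half-size <= lb
            new_C = [w for w in C if ub[w] > best["lb"]]  # drop unpromising vertices
            search(A + [v], new_C, new_common)

    search([], list(U), set(V))
    return best["lb"], best["A"], best["B"]


# Hypothetical toy graph, not the graph of Figure 1.
U, V = ["u1", "u2", "u3"], ["v1", "v2", "v3"]
adj = {"u1": {"v1", "v2"}, "u2": {"v1", "v2"}, "u3": {"v2", "v3"}}
ub = {u: len(adj[u]) for u in U}  # degrees used as (weak) upper bounds

half_size, A, B = ext_uni_bbclq(U, V, adj, ub)
print(half_size)  # 2
```

Tighter bounds in `ub` only strengthen the two pruning tests; the enumeration itself is unchanged.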
7 Computational experiments
This section is dedicated to a computational evaluation of the proposed algorithms for MBBP, based on the following two sets of benchmark graphs which are commonly used in the literature [7, 17].

Random graphs. In these graphs, every possible edge occurs independently with fixed probability. For each graph, there are vertices in each vertex set (i.e., ) and the probability that an edge exists between a pair of vertices is (). A theoretical analysis in [5] showed that the maximum half-size of the balanced bicliques in such graphs is in the range with high probability (when is sufficiently large).

Real-life networks. This set includes 30 bipartite networks from the Koblenz Network Collection (KONECT) [8], which contains hundreds of networks derived from real-life applications, including social networks, hyperlink networks, authorship networks, physical networks, interaction networks and communication networks. We summarize the main features of the selected instances in Table 1, including the numbers of vertices and edges. Information irrelevant to solving MBBP, such as multiple edges and vertex or edge weights, has been filtered out.
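The random graphs of the first benchmark set can be generated along the following lines. The function name, the vertex labels, the parameter names `n` (vertices per set) and `p` (edge probability), and the optional seed are assumptions made for this sketch; the paper does not specify its generator.

```python
import random


def random_bipartite(n, p, seed=None):
    """Bipartite graph with n vertices per side; each of the n * n possible
    edges is present independently with probability p."""
    rng = random.Random(seed)
    U = [f"u{i}" for i in range(n)]
    V = [f"v{j}" for j in range(n)]
    adj = {x: set() for x in U + V}
    for u in U:
        for v in V:
            if rng.random() < p:  # rng.random() is uniform in [0, 1)
                adj[u].add(v)
                adj[v].add(u)
    return U, V, adj


U, V, adj = random_bipartite(50, 0.3, seed=1)
density = sum(len(adj[u]) for u in U) / (len(U) * len(V))
print(f"observed density: {density:.3f}")
```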
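Table 1: Main features of the selected real-life instances (two instances per row).
instance  first vertex set size  second vertex set size  edges  instance  first vertex set size  second vertex set size  edges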

actormovie  127823  383640  1470418  editdewiki  425842  3195148  57323775 
bibsonomy2ui  5794  767447  2555080  editfrwiktionary  5017  1907247  7399298 
bookcrossing_fullrating  40523  105278  1149739  escorts  6624  10106  50632 
dblpauthor  1425813  4000150  8649016  flickrgroupmemberships  103631  395979  8545307 
dbpediagenre  7783  258934  463497  github  56519  120867  440237 
dbpedialocation  53407  172091  293697  gottrontrec  556077  1173225  83629405 
dbpediaoccupation  101730  127577  250945  jester1  100  73421  4136360 
dbpediaproducer  48833  138844  207268  moreno_crime  551  829  1476 
dbpediarecordlabel  18421  168337  233286  opsahlucforum  522  899  33720 
dbpediastarring  76099  81085  281396  pics_ut  17122  82035  2298816 
dbpediateam  34461  901166  1366466  reuters  283911  781265  60569726 
dbpediawriter  46213  89356  144340  stackexchangestackoverflow  96680  545196  1301942 
discogs_affiliation  270771  1754823  14414659  unicodelang  254  614  1255 
discogs_lgenre  6624  10106  50632  wikiencat  182947  1853493  3795796 
discogs_style  383  1617943  24085580  youtubegroupmemberships  30087  94238  293360 
We compare the performance of 5 algorithms including both the existing approaches and the new approaches proposed in this work. The first 3 algorithms are B&B algorithms.

BBClq: the algorithm introduced in [9]. However, compared with the original algorithm, symmetry breaking and clique cover techniques are removed as they are irrelevant for bipartite graphs.

ExtBBClq: the extended version of BBClq combining UBP with our new branching heuristic presented in Section 4.2

ExtUniBBClq: the new algorithm introduced in Section 6.
To compare the original mathematical formulation and the tightened formulation presented in this work, we use IBM CPLEX 12.6.1 to solve the benchmark instances with both formulations.
Table 2: Results on random graphs (times in seconds).
vertices per set  density  UBP time  UBP iter  BBClq  ExtBBClq  ExtUniBBClq  MIP Original  MIP Tightened
50  0.1  0.00  3.1  0.00  0.00  0.00  4.02  0.41 
50  0.3  0.00  2.8  0.02  0.01  0.01  4.62  4.70 
50  0.5  0.00  3.3  0.45  0.15  0.3  6.48  11.52 
50  0.7  0.00  3.0  24.19  5.12  11.60  8.49  7.47 
50  0.9  0.00  3.2  10174.28(5)  680.11  4405.93(28)  0.39  0.33 
100  0.1  0.00  3.8  0.01  0.00  0.00  30.26  4.77 
100  0.3  0.00  3.2  0.96  0.42  0.78  457.77  510.60 
100  0.5  0.01  3.3  118.97  32.22  52.75  7137.27  4392.06 
100  0.7  0.01  3.1  [13.50]  9540.81(17)  [13.73]  [13.57-20.30]  [13.40-20.00]
100  0.9  0.00  2.8  [26.07]  [25.37]  [26.87]  9977.23(6)  10358.21(4)
150  0.1  0.00  3.7  0.06  0.04  0.05  305.04  225.67
150  0.3  0.01  3.0  11.28  3.44  8.75  9953.88(15)  9937.61(14)
150  0.5  0.02  3.3  4716.49  933.95  2281.69  [8.86-28.38]  [8.89-28.46]
150  0.7  0.04  3.4  [14.73]  [14.73]  [15.00]  [14.82-41.86]  [14.62-42.08]
150  0.9  0.05  3.3  [29.53]  [27.93]  [30.23]  [35.04-55.58]  [34.71-55.54]
200  0.1  0.01  3.3  0.19  0.07  0.26  1725.24  1239.80
200  0.3  0.03  3.2  84.08  21.95  59.22  [6.00-38.44]  [6.00-22.52]
200  0.5  0.05  3.3  [9.97]  10761.34(3)  [10.03]  [8.95-55.68]  [9.05-50.76]
200  0.7  0.08  3.3  [15.30]  [15.23]  [15.93]  [15.45-64.32]  [15.67-65.78]
200  0.9  0.11  3.3  [31.93]  [30.10]  [32.60]  [38.70-79.19]  [38.21-79.52]
Table 3: Results on real-life instances (times in seconds).
instance  BEST  UBP time  UBP iter  BBClq  ExtBBClq  ExtUniBBClq  MIP Original  MIP Tightened
actormovie  8*  5.54  27  6533.01  1671.29  807.25     
bibsonomy2ui  8*  1.56  7  491.36  13.84  9.13     
bookcrossing_fullrating  13*  5.11  33  3102.66  426.37  [10]     
dblpauthor  10*  19.86  21  [1]  403.16  30.06     
dbpediagenre  7*  3.94  9  171.86  5.83  16.35     
dbpedialocation  5*  0.18  8  633.98  0.52  0.39     
dbpediaoccupation  6*  0.27  8  909.03  1.29  1.57     
dbpediaproducer  6*  0.27  11  535.44  0.62  0.65     
dbpediarecordlabel  6*  24.33  7  214.45  24.67  24.04     
dbpediastarring  6*  1.07  31  530.61  4.67  1.39     
dbpediateam  6*  3.08  15  2982.24  241.06  1170.25     
dbpediawriter  6*  0.19  12  283.16  0.35  0.23     
discogs_affiliation  26*  12.01  17  [1]  1688.95  [18]     
discogs_lgenre  15*  0.06  1  37.08  1.01  0.17     
discogs_style  38  17.42  22  [23]  [38]  [23]     
editdewiki  40  93.68  23  [1]  [40]  [14]     
editfrwiktionary  19*  9.56  9  944.21  152.5  [19]     
escorts  6*  5.69  6  7.68  10.05  10.55     
flickrgroupmemberships  36  47.37  36  [34]  [36]  [18]     
github  12*  1.01  16  677.72  150.66  [12]     
gottrontrec  83  549.21  35  [33]  [38]  [83]     
jester1  100*  1.86  1  1204.64  1123.66  4.24  248.87   
moreno_crime  2*  0.05  3  0.05  0.06  0.06  3483.58  55.22 
opsahlucforum  5*  0.09  10  0.18  0.13  0.26  [4285]  [57] 
pics_ut  27  21.84  7  [27]  [23]  [23]     
reuters  39  611.76  61  [35]  [39]  [12]     
stackexchangestackoverflow  9*  4.62  29  4107.56  265.8  3690.8     
unicodelang  4*  0.02  5  0.01  0.02  0.03  1218.46  19.58 
wikiencat  14*  7.67  20  [1]  28.72  121.99     
youtubegroupmemberships  12*  0.96  21  222.88  11.49  1784.76     
All the experiments are conducted on a computer with an Intel Xeon E5-2670 processor (2.5GHz and 2GB RAM) running CentOS 6.5. The BBClq, ExtBBClq and ExtUniBBClq algorithms are implemented in C++ and compiled with g++ using the optimization option -O3 (the code of our algorithms will be available online). For each instance, a cutoff time limit of 3 hours (10800 seconds) is given to each trial. When solving the DIMACS machine benchmark procedure "dfmax.c" (dfmax: ftp://dimacs.rutgers.edu/pub/dsj/clique/) without compilation optimization flag, the run time on our machine is 0.46, 2.68 and 10.70 seconds for graphs r300.5, r400.5 and r500.5 respectively.
The experimental results for random graphs are summarized in Table 2. For each configuration which is a pairwise combination of (the cardinality of or ) and (the edge density), we generated 30 graphs independently. Column “time” reports the preprocessing time (the time of running UBP) and column “iter” shows the average number of while loops (i.e., lines 1117, Algorithm 2) needed to stabilize the upper bound of all the vertices. For each algorithm, we report the average time consumed to solve the corresponding instance (the time for preprocessing is also included). Note that 0.00 means the instance can be solved in less than 0.01 second. If some of the 30 instances cannot be solved within 3 hours, we also append the number of solved instances in brackets. If none of the 30 instances can be solved, for the first three algorithms, we report the average best lower bound, for MIP, we report both the average best lower bound and average upper bound. The shortest times among the first three algorithms and between the two mathematical formulations are highlighted by bold font.
For the tested random graphs, the time consumption of UBP is insignificant with respect to the whole search time. Meanwhile, the number of iterations for propagating upper bounds is also small (around 3 for all the configurations). In terms of computational time of all the algorithms, ExtBBClq is generally the fastest algorithm to solve most of these instances while ExtUniBBClq also performs better than BBClq. The MIP formulation is found to be quite competitive compared with the other three algorithms when the density of the instance reaches 0.9. A possible explanation for this phenomenon is that the formulation of dense instances involves fewer constraints and may be solved more easily. In addition, when graph density increases, the maximum biclique and the maximum balanced biclique tend to be closer and closer, which means that the balancing constraint, which makes the problem NP-hard, tends to be less and less active. Since the maximum biclique problem is easy on bipartite graphs, a balanced biclique is likely to be found easily by a MIP solver (as the quality of the linear programming relaxation improves with density) whereas high density is the worst situation for enumerative approaches. As expected, the tightened MIP is often solved faster than the original MIP, and the gap to optimality is generally smaller for those instances that could not be solved to optimality.
We report the results for the set of large real-life instances in Table 3. Column “instance” indicates the name of the graph. Column “best” shows the best half-size found by all algorithms. An extra “*” indicates the optimality of this best value. Column “UBP” also reports the time of preprocessing and the number of iterations to propagate the upper bounds. For each algorithm, we report the computational time to solve the instances. As in Table 2, when optimality is not proven, the best lower bound is reported for the first three algorithms while for MIP, both lower bound and upper bound are presented.
For the large real-life instances (see Table 3), the time spent by UBP is still insignificant with respect to the total search time. The number of iterations also remains limited (fewer than 40) for these very large instances. We note that the new algorithms with UBP (ExtBBClq and ExtUniBBClq) dominate the original algorithms in terms of computational time. The extended version ExtBBClq reduces the time of BBClq from hundreds of seconds to less than 30 seconds for 10 instances. It is also the only algorithm that solves discogs_affiliation. ExtUniBBClq is faster than ExtBBClq on 8 instances. It also achieves a substantial speedup on jester1 (where ). CPLEX is no longer able to give a lower bound for most instances (but we still observe that the tightened formulation leads to a better performance for the 2 solved instances).
7.1 The Size of B&B tree
In this section, we compare the sizes of the B&B trees generated by the algorithms to solve MBBP instances. Although BBClq and ExtBBClq do not declare B&B nodes explicitly, we can treat each call of the BBClq() procedure as the enumeration of one B&B node. In CPLEX, the number of nodes in the search tree is directly available. We exclude ExtUniBBClq because its search scheme and branching heuristic differ from those of BBClq and ExtBBClq.
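The counting convention above (one procedure call equals one B&B node) can be illustrated with a minimal sketch. The code below is not BBClq or ExtBBClq: it is a deliberately simple branch-and-bound that enumerates subsets of one side of the graph, and the names `max_balanced_biclique`, `expand`, `adj` and `right` are hypothetical. It only shows how a node counter can be attached to a recursive search.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def max_balanced_biclique(
    adj: Dict[Hashable, Set[Hashable]], right: Iterable[Hashable]
) -> Tuple[int, int]:
    """Return (best half-size, number of B&B nodes).

    adj maps each left vertex to the set of its right neighbours.
    Illustrative stand-in only, not the BBClq procedure of the paper.
    """
    best = 0
    nodes = 0

    def expand(size_a: int, common: Set[Hashable], cand: List[Hashable]) -> None:
        nonlocal best, nodes
        nodes += 1  # one recursive call is counted as one B&B node
        # The chosen left vertices and their common neighbourhood contain
        # a balanced biclique of half-size min(size_a, len(common)).
        best = max(best, min(size_a, len(common)))
        for i, u in enumerate(cand):
            remaining = len(cand) - i
            # Bound: no extension using cand[i:] can exceed this value.
            if min(size_a + remaining, len(common)) <= best:
                return
            new_common = common & adj[u]
            # Branch only if the shrunken neighbourhood can still improve best.
            if len(new_common) > best:
                expand(size_a + 1, new_common, cand[i + 1:])

    expand(0, set(right), list(adj))
    return best, nodes
```

For example, `max_balanced_biclique({0: {"a", "b"}, 1: {"a", "b"}, 2: {"c"}}, {"a", "b", "c"})` returns a half-size of 2 together with the number of calls made. A tighter bound prunes earlier and therefore lowers the second value, which is the quantity compared in this section.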
First, we generate 18 random instances with fixing to 50 () and (density) ranging from 0.1 to 0.95, and we solve these instances with BBClq, ExtBBClq and CPLEX using the two formulations. We then compare the number of B&B tree nodes of BBClq against that of ExtBBClq, and the number of nodes generated by CPLEX with the original formulation against that with the tightened formulation. The results are shown in Figure 2.
From Figure 2, we observe that the bounding technique and the extra inequalities significantly reduce the B&B tree size on sparse graphs (whose densities are below 0.2). When the density of the random graph is lower than 0.9, ExtBBClq always enumerates fewer B&B nodes than BBClq. However, when the density increases to 0.9, the B&B tree of ExtBBClq has the same size as that of BBClq. Since we have improved the performance of the intersection operation in the implementation of ExtBBClq (see Section 4.2), ExtBBClq still outperforms BBClq when the sizes of the B&B trees are equal (see Table 2). As to the two mathematical formulations, CPLEX can solve the tightened formulation without expanding any B&B nodes when the graph is either quite sparse () or very dense ().
Second, we compare the sizes of the B&B trees for the real-life instances. Figure 3 shows the number of B&B tree nodes of BBClq and ExtBBClq for the 21 instances that can be solved by both algorithms within 3 hours (see Table 3). We no longer compare the two mathematical formulations, as CPLEX fails to solve the majority of these large instances. Figure 3 indicates that ExtBBClq enumerates fewer tree nodes for all the real-life instances. This is especially true for dbpediaproducer, dbpediawriter and moreno_crime, where ExtBBClq prunes more than half of the B&B nodes compared to BBClq.
8 Conclusion and Future Work
In this paper, we proposed new ideas for designing exact algorithms for the NP-hard Maximum Balanced Biclique Problem. We introduced the Upper Bound Propagation (UBP) procedure to estimate a tight upper bound involving each vertex. UBP starts from the initial bound of each vertex and improves the upper bounds by a propagation procedure. Based on UBP, we extended the B&B algorithm (BBClq) of [9] and proposed new valid inequalities to tighten the MIP formulation of [5]. Furthermore, we presented a new exact algorithm (ExtUniBBClq) which enumerates eligible vertex subsets rather than feasible bicliques as BBClq does. Experiments showed the effectiveness of the newly proposed ideas on random graphs as well as on large real-life instances. Further experiments also confirmed that our bounding technique reduces the size of the B&B search tree for the majority of benchmark instances.
As future work, it would be interesting to investigate the bit-parallel technique [11] within our algorithms to further improve their performance. Also, a branch-and-cut approach based on the tightened formulation constitutes another promising perspective for solving the problem. Finally, the idea of upper bound propagation could be adapted to other similar optimization problems.
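To make the bit-parallel idea concrete, the following sketch shows only the underlying bitset encoding, not the maximum clique algorithm of [11] nor any implementation used in this paper. It assumes that the vertices of one side are numbered 0, 1, 2, ... and stores each neighbourhood as an integer bitmask, so that a neighbourhood intersection becomes a single bitwise AND. The helper names `to_mask`, `popcount` and `common_neighbourhood` are hypothetical.

```python
from typing import Iterable, List


def to_mask(vertices: Iterable[int]) -> int:
    """Encode a set of vertex indices as an integer bitmask."""
    mask = 0
    for v in vertices:
        mask |= 1 << v
    return mask


def popcount(mask: int) -> int:
    """Number of vertices stored in the bitmask."""
    return bin(mask).count("1")


def common_neighbourhood(
    neighbour_masks: List[int], chosen: Iterable[int], all_right: int
) -> int:
    """Intersect the neighbourhoods of the chosen left vertices.

    neighbour_masks[u] is the bitmask of right neighbours of left vertex u;
    all_right is the bitmask of every right vertex.
    """
    common = all_right
    for u in chosen:
        common &= neighbour_masks[u]  # one AND replaces a set intersection
    return common
```

With `masks = [to_mask([0, 1, 2]), to_mask([1, 2]), to_mask([2, 3])]`, the call `popcount(common_neighbourhood(masks, [0, 1], to_mask(range(4))))` counts the right vertices adjacent to both left vertices 0 and 1. Whether this encoding pays off inside the proposed algorithms is the open question raised above.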
References
 [1] A. A. Al-Yamani, S. Ramsundar, D. K. Pradhan, A defect tolerance scheme for nanotechnology circuits, IEEE Transactions on Circuits and Systems 54 (11) (2007) 2402–2409.
 [2] N. Alon, R. A. Duke, H. Lefmann, V. Rödl, R. Yuster, The algorithmic aspects of the regularity lemma, Journal of Algorithms 16 (1) (1994) 80–109.
 [3] R. Carraghan, P. M. Pardalos, An exact algorithm for the maximum clique problem, Operations Research Letters 9 (6) (1990) 375–382.
 [4] Y. Cheng, G. M. Church, Biclustering of expression data, in: Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology, vol. 8, 2000, pp. 93–103.
 [5] M. Dawande, P. Keskinocak, J. M. Swaminathan, S. Tayur, On bipartite and multipartite clique problems, Journal of Algorithms 41 (2) (2001) 388–403.
 [6] M. R. Garey, D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman & Co., New York, NY, USA, 1979.
 [7] S. J. Hardiman, L. Katzir, Estimating clustering coefficients and size of social networks via random walk, in: Proceedings of the 22nd International Conference on World Wide Web, ACM, 2013, pp. 539–550.
 [8] J. Kunegis, KONECT: the Koblenz network collection, in: Proceedings of the 22nd International Conference on World Wide Web, ACM, 2013, pp. 1343–1350.
 [9] C. McCreesh, P. Prosser, An exact branch and bound algorithm with symmetry breaking for the maximum balanced induced biclique problem, in: International Conference on AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, LNCS 8451, Springer, 2014, pp. 226–234.
 [10] S. Ravi, E. L. Lloyd, The complexity of near-optimal programmable logic array folding, SIAM Journal on Computing 17 (4) (1988) 696–710.
 [11] P. San Segundo, D. Rodríguez-Losada, A. Jiménez, An exact bit-parallel algorithm for the maximum clique problem, Computers & Operations Research 38 (2) (2011) 571–581.
 [12] M. Soto, A. Rossi, M. Sevaux, Three new upper bounds on the chromatic number, Discrete Applied Mathematics 159 (18) (2011) 2281–2289.
 [13] M. B. Tahoori, Application-independent defect tolerance of reconfigurable nanoarchitectures, ACM Journal on Emerging Technologies in Computing Systems 2 (3) (2006) 197–218.
 [14] Q. Wu, J.-K. Hao, A review on algorithms for maximum clique problems, European Journal of Operational Research 242 (3) (2015) 693–709.
 [15] B. Yuan, B. Li, A low time complexity defect-tolerance algorithm for nanoelectronic crossbar, in: Proceedings of the International Conference on Information Science and Technology, IEEE, 2011, pp. 143–148.
 [16] B. Yuan, B. Li, A fast extraction algorithm for defect-free subcrossbar in nanoelectronic crossbar, ACM Journal on Emerging Technologies in Computing Systems (JETC) 10 (3) (2014) 25.
 [17] B. Yuan, B. Li, H. Chen, X. Yao, A new evolutionary algorithm with structure mutation for the maximum balanced biclique problem, IEEE Transactions on Cybernetics 45 (5) (2015) 1040–1053.