Generic CPSupported CMSA for Binary Integer Linear Programs
Abstract
Construct, Merge, Solve & Adapt (CMSA) is a general hybrid metaheuristic for solving combinatorial optimization problems. At each iteration, CMSA (1) constructs feasible solutions to the tackled problem instance in a probabilistic way and (2) solves a reduced problem instance (if possible) to optimality. The construction of feasible solutions is hereby problemspecific, usually involving a fast greedy heuristic. The goal of this paper is to design a problemagnostic CMSA variant whose exclusive input is an integer linear program (ILP). In order to reduce the complexity of this task, the current study is restricted to binary ILPs. In addition to a basic problemagnostic CMSA variant, we also present an extended version that makes use of a constraint propagation engine for constructing solutions. The results show that our technique is able to match the upper bounds of the standalone application of CPLEX in the context of rather easytosolve instances, while it generally outperforms the standalone application of CPLEX in the context of hard instances. Moreover, the results indicate that the support of the constraint propagation engine is useful in the context of problems for which finding feasible solutions is rather difficult.
inline \fxsetfaceenv \fxusethemecolor
1 Introduction
Construct, Merge, Solve & Adapt (CMSA) [6] is a hybrid metaheuristic that can be applied to any combinatorial optimization problem for which is known a way of generating feasible solutions, and whose subproblems can be solved to optimality by a blackbox solver. Moreover, note that CMSA is thought for those problem instances for which the application of the standalone blackbox solver is not feasible due to the problem instance size and/or difficulty. The main idea of CMSA is to generate reduced subinstances of the original problem instances, based on feasible solutions that are constructed at each iteration, and to solve these reduced instances by means of the blackbox solver. Obviously, the parameters of CMSA have to be adjusted in order for the size of the reduced subinstances to be such that the blackbox solver can solve them efficiently. CMSA has been applied to several NPhard combinatorial optimization problems, including minimum common string partition [6, 4], the repetitionfree longest common subsequence problem [5], and the multidimensional knapsack problem [15].
A possible disadvantage of CMSA is the fact that a problemspecific way of probabilistically generating solutions is used in the abovementioned applications. Therefore, the goal of this paper is to design a CMSA variant that can be easily applied to different combinatorial optimization problems. One way of achieving this goal is the development of a solver for a quite general problem. Combinatorial optimization problems can be conveniently expressed as Integer Linear Programs (ILPs) in the format , where indicates a constraints matrix, and are the cost and righthandside vectors, respectively and is a vector of decision variables whose values are restricted to integral numbers. In this paper we propose a generic CMSA for binary integer programs (BIPs), that are obtained when . This type of problem is generic enough to model a wide range of combinatorial optimization problems, from the classical traveling salesman problem [2] to protein threading problems [19] and a myriad of applications listed in the MIPLIB 2010 collection of problem instances [13]. As CMSA is an algorithm that makes use of a solution construction mechanism at each iteration, one of the challenges that we address in this paper is the fast production of feasible solutions for general BIPs. For this purpose we support the proposed generic CMSA with a constraint propagation (CP) tool for increasing the probability to generate feasible solutions.
This paper is organized as follows. The next section discusses related work. In Section 3, the original version of CMSA is presented, which assumes that the type of the tackled problem is known. The generic CMSA proposal for general BIPs is described in Section 4. Finally, an extensive experimental evaluation is presented in Section 5 and conclusions as well as an outlook to future work are provided in Section 6.
2 Related Works
The development of fast, reliable general purpose combinatorial optimization solvers is a topic that occupies operations research practitioners since many years. The main reason being that the structure of real world optimization problems usually does not remain fixed over time: constraints usually change over time and solvers optimized for a very particular problem structure may lose their efficiency in this process. Thus, a remarkable interest in integer linear programming (ILP) software packages exists, with several commercially successful products such as IBM CPLEX, Gurobi and XPRESS. This success can be attributed to the continuous improvements concerning the performance of these solvers [12, 11] and the availability of high level languages such AMPL [10]. The application of these solvers to instances with very different structures creates many challenges. From a practical point of view, the most important one is, possibly, the ability of quickly providing high quality feasible solutions: even though a complete search is executed, it is quite often the case that time limits need to be imposed and a truncated search is performed. Thus, several methods have been proposed to try to produce feasible solutions in the first steps of the search process. One of the best known approaches is the socalled feasibility pump [8, 9].
In the context of metaheuristics, Kochenberger et al. [14] developed a general solver for unconstrained binary quadratic programming (UBQP) problems. A whole range of important combinatorial optimization problems such as set partitioning and coloring can be easily modeled as special cases of the UBQP problem. Experiments showed that their general solver was able to produce high quality solutions much faster than the general purpose ILP solver CPLEX for hard problems such as the the set partitioning problem. Brito & Santos [18] proposed a local search approach for solving BIPs, obtaining some encouraging results when comparing to the COINOR CBC BranchandCut solver. In the context of constraint programming, Benoist et al. [3] proposed a fast heuristic solver (LocalSolver) based on local search. Experiments showed that LocalSolver outperformed several other solvers, especially for what concerns executions with very restricted computation times. In this paper we propose a CMSA solver for solving BIPs. This format is more restricted than the LocalSolver format, where nonlinear functions can be used, but much more general than the UBQP, which can be easily modeled as a special case of binary programming. One advantage of BIPs is that several high performance solvers can be used to solve the subproblems generated within CMSA, a feature that will be explored in the next sections.
3 Original CMSA in the Context of BIPs
As already mentioned, in this work we focus on solving BIPs. Any BIP can be expressed in the following way:
(1) 
where is an matrix, is the righthandsize vector of size , is a cost vector, and is the vector of binary decision variables. Note that is the number of constraints of this BIP.
In the following we describe the original CMSA algorithm from [6]. However, instead of providing a general description as in [6], our description is already tailored for the application to BIPs. In order to clarify this fact, the algorithm described below is labeled CMSABIP. In general, the main idea of CMSA algorithms is to take profit from an efficient complete solver even in the context of problem instances that are too large to be solved directly. The general idea of CMSA is as follows. At each iteration, CMSA probabilistically generates solutions to the tackled problem instance. Next, the solution components that are found in these solutions are added to a subinstance of the original problem instance. Subsequently, an exact solver is used to solve the subinstance (if possible in the given time) to optimality.^{1}^{1}1In the context of problems that can be modelled as BIPs, any blackbox ILP solver such as, for example, CPLEX can be used for this purpose Moreover, the algorithm is equipped with a mechanism for deleting seemingly useless solution components from the subinstance. This is done such that the subinstance has a moderate size and can be solved rather quickly to optimality.
In the context of CMSABIP, any combination of a variable with one of its values is a solution component denoted by . Given a BIP instance, the complete set of solution components if denoted by . Any subinstance of the given BIP is a subset of , that is, . Such a subinstance is feasible, if contains for each variable () at least one solution component , that is, either , or , or both. Moreover, a solution to the given BIP is any binary vector that fulfills the constraints from Eq. (1). Note that a feasible solution contains solution components: .
The pseudocode of the CMSABIP algorithm is given in Algorithm 1. At each iteration the following is done. First, the bestsofar solution is initialized to null, indicating that no such solution exists yet. Moreover, subinstance is initialized to the empty set. Note, also, that each solution component has a socalled age value denoted by . All these age values are initialized to zero at the start of the algorithm. Then, at each iteration, solutions are probabilistically generated in function ProbabilisticSolutionGeneration(); see line 6 of Algorithm 1. As mentioned above, problemspecific heuristics are generally used for this purpose. The solution components found in the constructed solutions are then added to . Next, an ILP solver is applied in function ApplyILPSolver() to find a possibly optimal solution to the restricted problem instance (see below for a more detailed description). Note that null is returned in case the ILP solver cannot find any feasible solution. If is better than the current bestsofar solution , solution is taken as the new bestsofar solution. Next, subinstance is adapted on the basis of solution in conjunction with the age values of the solution components; see function Adapt(, , ) in line 14. This is done as follows. First, the age of all solution components from that are not in is incremented. Moreover, the age of each solution component from is reinitialized to zero. Subsequently, those solution components from with an age value greater than —which is a parameter of the algorithm—are removed from . This causes that solution components that never appear in solutions derived by the ILP solver do not slow down the solver in subsequent iterations. On the other side, components which appear in the solutions returned by the ILP solver should be maintained in .
Finally, the BIP that is solved at each iteration in function ApplyILPSolver() is generated by adding the following constraints to the original BIP. For each the following is done. If only contains solution component , the additional constraint is added to the original BIP. Otherwise, if only contains solution component , the additional constraint is added to the original BIP. Nothing is added to the original BIP in case contains both solution components. Note that the ILP solver is applied with a computation time limit of CPU seconds, which is a parameter of the algorithm.
4 Generic Way of Generating Solutions for BIPs
In those cases in which the optimization problem modeled by the given BIP is not known, we need a generic way of generating solutions to the given BIP in order to be able to apply the CMSABIP algorithm described in the previous section. In the following we first describe a basic solution construction mechanims, and afterwards an alternative mechanism which uses a CP tool for increasing the probability to generate feasible solutions. The first algorithm variant is henceforth denoted as GenCmsaBip (standing for generic CMSABIP) and the second algorithm variant as Gen/CpCmsaBip (standing for generic CMSABIP with CP support).
4.1 Basic Solution Construction Mechanism
Before starting with the first CMSABIP iteration, a node heuristic of the applied ILP solver might be used in order to obtain a first feasible solution. In our case, we used the node heuristic of CPLEX. If, in this way, a feasible solution can be obtained it is stored in . Otherwise, is set to null. If, after this step, has value null, the LP relaxation of the given BIP is solved. However, in order not to spend too much computation time on this step, a computation time limit of seconds is applied. After this, the possibly optimal solution of the LP relaxation is stored in vector . Then, whenever function ProbabilisticSolutionGeneration() is called, the following is done. First, a socalled sampling vector for sampling new (possibly feasible) solutions by randomized rounding is generated. If , is generated based on and a socalled determinism rate as follows:
for all . In case , is—for all —generated on the basis of :
After generating , a possibly infeasible binary solutions is generated from by randomized rounding. Note that this is done in the order .
4.2 CP Supported Construction Mechanism
Our algorithm makes use of the Constraint Propagation (CP) engine cprop that implements ideas from [1, 17].^{2}^{2}2The used CP tool can be obtained at https://github.com/hgs/cprop. The support of CP is used in the following two ways. First, all constraints are processed and implications derived from the constraint set are detected and the problem is preprocessed to keep those variables fixed throughout the search process. Second, the solution construction mechanism changes in the following way. Instead of deriving values for the variables in the order , a random order is chosen for each solution construction. That is, at step , instead of deriving a value for variable , instead a value for variable is derived. Then, after deciding for a value for variable , the CP tool checks if this assignment will produce an infeasible solution. If this is the case, variable is fixed to the alternative value. If, again, the CP tool determines that this setting cannot lead to a feasible solution, the solution construction proceeds as described in Section 4.1, that is, finalizing the solution construction without further CP support. Otherwise—that is, if a feasible value can be chosen for the current variable—the CP might indicate possible implications consisting of further variables that, as a consequence, have to be fixed to certain values. All these implications are dealt with, before dealing with the next nonfixed variable according to .^{3}^{3}3Note that, after fixing a value for , the value of might already be fixed due to one of the implications dealt with earlier.
4.3 An Additional Algorithmic Aspect
Instead of using fixed values for CMSABIP parameters and , we implemented the following scheme. For both parameters we use a lower bound and an upper bound. At the start of CMSABIP, the values of and are set to the lower bound. Whenever an iteration improves , the values of and are set back to their respective lower bounds. Otherwise, the values of and are increased by a factor determined by substracting the lower bound value from the upper bound value and dividing the result by 5.0. Finally, whenever the value of , respectively , exceeds its upper bound, it is set back to the lower bound value. This procedure is inspired by variable neighborhood search (VNS) [16].
5 Experimental Evaluation
In the following we present an experimental evaluation of GenCmsaBip and Gen/CpCmsaBip in comparison to the standalone application of the ILP solver IBM ILOG CPLEX v12.7. Note that the same version of CPLEX was applied within both CMSA variants. Moreover, in all cases CPLEX was executed in onethreaded mode. In order to ensure a fair comparison, CPLEX was executed with two different parameter settings in the standalone mode: the default parameter settings, and with the MIP emphasis parameter set to a value of 4 (which means that the focus of CPLEX is on finding good solutions rather than on proving optimality). The default version of CPLEX is henceforth denoted by CplexOpt and the heuristic version of CPLEX by CplexHeur. All techniques were implemented in ANSI C++ (with the Concert Library of ILOG for implementing everything related to the ILP models), and using GCC 5.4.0 for compiling the software. Moreover, the experimental evaluation was performed on a cluster of PCs with Intel(R) Xeon(R) CPU 5670 CPUs of 12 nuclei of 2933 MHz and at least 40 Gigabytes of RAM.
5.1 Considered Problem Instances
The properties of the 30 selected BIPs are described in Table 1. The first 27 instances are taken from MIPLIB 2010 (http://miplib.zib.de/miplib2010.php), which is one of the bestknown libraries for integer linear programming. More specifically, the ILPs on MIPLIB are classified into three hardness categories: easy, hard, and open. From each one of the these categories we chose (more or less randomly) 9 BIPs. In addition, we selected three instances from recent applications found in the literature:

mcsp20004 is an instance of the minimum common string partition problem with input strings of length 2000 and an alphabet size of four [6]. The hardness of this instance is due to a massive amount of constraints.

rflcs20483ndiv8 is an instance of the repetitionfree longest common subsequence problem with two input strings of length 2048 and an alphabet size of 768 [5]. This instance is hard to solve due to the large number of variables.

rcjs20testS6 is an instance of the resource constraint job scheduling problem considered in [7]. Finding feasible solutions for this problem is, for general purpose ILP solvers, rather time consuming.
5.2 Parameter Setting
Both generic CMSA variants have the following parameters for which wellworking values must be found: (1) the number of solution constructions per iteration (), (2) the maximumm age of solution components (), (3) a computation time limit for solving the LP relaxation (), (4) a lower and an upper bound for the determinism rate ( (LB) and (UB)), and (5) a lower and an upper bound for the computation time limit of the ILP solver at each iteration ( (LB) and (UB)).
Concerning , it became clear during preliminary experiments that this parameter has not the same importance for GenCmsaBip and Gen/CpCmsaBip as it has for a problemspecific CMSA. In other words, while a setting of and is essentially different to a setting of and for a problemspecific CMSA, this is not the case for the generic CMSA variants. This is related to the way of constructing solutions. In a problemspecific CMSA, the greedy heuristic that is used in a probabilistic way biases the search towards a certain area of the search space. This is generally beneficial, but may have the consequence that some solution components that are needed for highquality solutions have actually a low probability to be included in the constructed solutions. A setting of provides opportunites—that is, applications of the ILP solver—to find highquality solutions that incorporate such solution components. In contrast, the way of constructing solutions in the generic CMSA variants does not produce this situtaiton. Therefore, we decided for a setting of for all further experiments. Apart from , after preliminary experiments we also fixed the following parameter values:



The lower bound of is set to 30.0 and the upper bound to 100.0
The parameter that has the strongest impact on the performance of the generic CMSA variants is . We noticed that both generic CMSA variants are quite sensitive to the setting of the lower and the upper bound for this parameter. However, in order to avoid a finetuning for each single problem instance, we decided to identify four representative parameter value configurations in order to cover the characteristics of the 30 selected problem instances. Both generic CMSA variants are then applied with all four parameter configurations to all 30 problem instances. For each problem instance we take the result of the respective best configuration as the final result (and we indicate with which configuration this result was obtained). The four utilized parameter configurations are described in Table 2.
5.3 Results
All four approaches (GenCmsaBip, Gen/CpCmsaBip, CplexOpt, and CplexHeur) were applied with a computation time limit of 1000 CPU seconds to each one of the 30 problem instances. However, as GenCmsaBip and Gen/CpCmsaBip are stochastic algorithms, they are applied 10 times to each instance, while the two CPLEX variants are applied exactly once to each instance. The numerical results are provided in Table 3, which has the following structure. The first column contains the problem instance name, and the second column provides the value of an optimal solution (if known).^{4}^{4}4Note that all considered BIPs are in standard form, that is, they must be minimized. The results of GenCmsaBip and Gen/CpCmsaBip are presented in three columns for each algorithm. The first column (with heading ’Best’) contains the best result obtained over 10 runs, the second column (with heading ’Avg.’) shows the average of the best results obtained in each of the 10 runs, and the third column indicates the configuration (out of 4) that has produced the corresponding results. Finally, the results of CplexOpt and CplexHeur are both presented in two columns. The first column shows the value of the best feasible solution produced within the allowed computation time, and the second column shows the best gap (in percent) at the end of each run. Note that a gap of 0.0 indicates that optimality was proven.
The following observations can be made:

Concerning the BIPs classified as easy (see the first nine table rows), it can be noticed that CplexHeur always generates an optimal solution, even though optimality can not be proven in two cases. Gen/CpCmsaBip–that is, the generic CPsupported CMSA variant—also produces an optimal solution in at least one out of 10 runs for all nine problem instances. However, in three cases, the algorithm fails to produce an optimal solution in all 10 runs per instance. The results of the basic generic CMSA variant (GenCmsaBip) are quite similar. However, for instance ex9 it is not able to produce any feasible solution, and for instance netdiversion the results are clearly inferior to those of Gen/CpCmsaBip. Nevertheless, the support of CP also comes with a cost. This can be seen when looking at the anytime behaviour of the algorithms as shown for six exemplary cases in Figure 1. In particular, Gen/CpCmsaBip is often not converging as fast to good solutions as GenCmsaBip.

The increased difficulty of the instances labelled as hard (see table rows 1018), produces more differences between the four approaches. In fact, sometimes one of the CPLEX variants is clearly better than the two CMSA versions (see, for example, instance ivu52), and sometimes the generic CMSA variants outperform the CPLEX versions (such as, for example, for instance opm2z12s14). As the CPsupport is more costly for these instances, the results of GenCmsaBip are generally a bit better than those of Gen/CpCmsaBip. The effect of the increased cost of the CP support can also nicely be observed in the anytime behaviour of the algorithms for two hard instances in Figure (c)c and Figure (b)b.

In the context of the nine open instances, the generic CMSA variants clearly outperform the standalone application of CPLEX, with the exception of instance t1722. The same holds for the three additional, difficult problem instances (last three table rows). Note, especially, that for instance rcjs20testS6 the support of CP pays off again, as it is difficult to find feasible solutions for this instance.
Summarizing, with increasing problem size/difficulty, the advantage of the generic CMSA variants over the standalone application of CPLEX becomes more and more pronounced. However, as the cost of the CP support also increases with growing problem size, Gen/CpCmsaBip is only able to outperform GenCmsaBip when finding feasible solutions is really difficult. However, we noticed that the CP support has also an additional effect, which is shown in the graphics of Figure 2. Each boxplot shows the final results (obtained by 10 runs per instance) of each of the four parameter configurations for both generic CMSA variants. Interestingly, the use of CP during solution construction flattens out the quality differences between the four parameter configurations. This can be seen in all three boxplots. In Figure (b)b, for example, the results of GenCmsaBip are good with configurations 3 and 4, while they are significantly worse with configurations 1 and 2. In contrast, the results of Gen/CpCmsaBip, while also being best with configurations 3 and 4, are only slightly worse with configurations 1 and 2. In that sense, the CPsupport makes the algorithm more robust with respect to the parameter setting.
6 Conclusions
In this work, we developed a problemagnostic CMSA algorithm for solving binary linear integer programs. The main challenge was on constructing solutions to unknown problems in such a way that feasibility is quickly reached. For this purpose, in addition to the basic algorithm, we developed an algorithm variant that makes use of a constraint programming tool. Concerning the results, we were able to observe that the use of the constraint programming tool pays off for those instances for which it reaching feasiblity is rather difficult. In general, with growing problem size and/or difficulty, both CMSA variants have an increasing advantage over the standalone appliaction of the ILP solver CPLEX. In a sense, our generic CMSA can be seen as a better way of using an ILP solver in many cases. Concerning future work, we plan to extend our work towards general ILPs. Moreover, we plan to work on a mechanism for automatically adjusting the algorithm parameters at the start of a run.
References
 [1] Tobias Achterberg. Constraint Integer Programming’. PhD thesis, 2007.
 [2] David L. Applegate, Robert E. Bixby, Vasek Chvátal, and William J. Cook. The traveling salesman problem: a computational study. Princeton university press, 2007.
 [3] Thierry Benoist, Bertrand Estellon, Frédéric Gardi, Romain Megel, and Karim Nouioua. Localsolver 1.x: a blackbox localsearch solver for 01 programming. 4OR, 9(3):299, 2011.
 [4] Christian Blum. Construct, merge, solve and adapt: Application to unbalanced minimum common string partition. In Maria J. Blesa, Christian Blum, Angelo Cangelosi, Vincenzo Cutello, Alessandro Di Nuovo, Mario Pavone, and ElGhazali Talbi, editors, Proceedings of HM 2016 – 10th International Workshop on Hybrid Metaheuristics, volume 9668 of Lecture Notes in Computer Science, pages 17–31. Springer International Publishing, 2016.
 [5] Christian Blum and Maria J. Blesa. A comprehensive comparison of metaheuristics for the repetitionfree longest common subsequence problem. Journal of Heuristics, 2017. In press.
 [6] Christian Blum, Pedro Pinacho, Manuel LópezIbáñez, and José A. Lozano. Construct, merge, solve & adapt: A new general algorithm for combinatorial optimization. Computers & Operations Research, 68:75–88, 2016.
 [7] Andreas T. Ernst and Gaurav Singh. Lagrangian particle swarm optimization for a resource constrained machine scheduling problem. In 2012 IEEE Congress on Evolutionary Computation, pages 1–8, 2012.
 [8] Matteo Fischetti, Fred Glover, and Andrea Lodi. The feasibility pump. Mathematical Programming, 104(1):91–104, 2005.
 [9] Matteo Fischetti and Domenico Salvagnin. Feasibility pump 2.0. Mathematical Programming Computation, 1(2):201–222, 2009.
 [10] Robert Fourer, David Gay, and Brian Kernighan. AMPL, volume 117. Boyd & Fraser Danvers, MA, 1993.
 [11] Gerald Gamrath, Thorsten Koch, Alexander Martin, Matthias Miltenberger, and Dieter Weninger. Progress in Presolving for Mixed Integer Programming. Mathematical Programming Computation, 7:367–398, 2015.
 [12] E.L. Johnson, G. Nemhauser, and W.P. Savelsbergh. Progress in Linear ProgrammingBased Algorithms for Integer Programming: An Exposition. INFORMS Journal on Computing, 12, 2000.
 [13] Thorsten Koch, Tobias Achterberg, Erling Andersen, Oliver Bastert, Timo Berthold, Robert E Bixby, Emilie Danna, Gerald Gamrath, Ambros M Gleixner, Stefan Heinz, et al. Miplib 2010. Mathematical Programming Computation, 3(2):103, 2011.
 [14] Gary Kochenberger, JinKao Hao, Fred Glover, Mark Lewis, Zhipeng Lü, Haibo Wang, and Yang Wang. The unconstrained binary quadratic programming problem: a survey. Journal of Combinatorial Optimization, 28(1):58–81, 2014.
 [15] Evelia Lizárraga, Maria J. Blesa, and Christian Blum. Construct, merge, solve and adapt versus large neighborhood search for solving the multidimensional knapsack problem: Which one works better when? In Bin Hu and Manuel LópezIbáñez, editors, Proceedings of EvoCOP 2017 – 17th European Conference on Evolutionary Computation in Combinatorial Optimization, volume 10197 of Lecture Notes in Computer Science, pages 60–74. Springer International Publishing, 2017.
 [16] Nenad Mladenović and Pierre Hansen. Variable neighborhood search. Computers & Operations Research, 24(11):1097–1100, 1997.
 [17] Tuomas Sandholm and Rob Shields. Nogood Learning for Mixed Integer Programming. Technical report, 2006.
 [18] Samuel Souza Brito, Haroldo Gambini Santos, and Bruno Henrique Miranda Santos. A local search approach for binary programming: Feasibility search. In Maria J. Blesa, Christian Blum, and Stefan Voß, editors, Proceedings of HM 2014 – 9th International Workshop on Hybrid Metaheuristics, volume 8457 of Lecture Notes in Computer Science, pages 45–55. Springer International Publishing, 2014.
 [19] Jinbo Xy, Ming Li, Dongsup Kim, and Ying Xu. RAPTOR: optimal protein threading by linear programming. Journal of Bioinformatics and Computational Biology, 1(1):95–117, 2003.