RLT2based Parallel Algorithms for Solving Large Quadratic Assignment Problems on Graphics Processing Unit Clusters
Abstract.
This paper discusses efficient parallel algorithms for obtaining strong lower bounds and exact solutions for large instances of the Quadratic Assignment Problem (QAP). Our parallel architecture is comprised of both multicore processors and Compute Unified Device Architecture (CUDA) enabled NVIDIA Graphics Processing Units (GPUs) on the Blue Waters Supercomputing Facility at the University of Illinois at UrbanaChampaign. We propose novel parallelization of the Lagrangian Dual Ascent algorithm on the GPUs, which is used for solving a QAP formulation based on Level2 Refactorization Linearization Technique (RLT2). The Linear Assignment subproblems (LAPs) in this procedure are solved using our accelerated Hungarian algorithm [Date, Ketan, Rakesh Nagi. 2016. GPUaccelerated Hungarian algorithms for the Linear Assignment Problem. Parallel Computing 57 5272]. We embed this accelerated dual ascent algorithm in a parallel branchandbound scheme and conduct extensive computational experiments on single and multiple GPUs, using problem instances with up to 42 facilities from the QAPLIB. The experiments suggest that our GPUbased approach is scalable and it can be used to obtain tight lower bounds on large QAP instances. Our accelerated branchandbound scheme is able to comfortably solve Nugent and Taillard instances (up to 30 facilities) from the QAPLIB, using modest number of GPUs.
Key words and phrases:
Quadratic Assignment Problem; Linear Assignment Problem; Branchandbound; Parallel Algorithm; Graphics Processing Unit; CUDA; RLT2.1. Introduction
The Quadratic Assignment Problem (QAP) is one of the oldest mathematical problems in the literature and it has received substantial attention from the researchers around the world. QAP was originally introduced by Koopmans and Beckmann (1957) as a mathematical model to locate indivisible economical activities (such as facilities) on a set of locations and the cost of the assignment is a function of both distance and flow. The objective is to assign each facility to a location so as to minimize a quadratic cost function. The generalized mathematical formulation for the QAP, given by Lawler (1963), can be written as follows:
(1.1)  
s.t.  (1.2)  
(1.3)  
(1.4) 
The decision variable , if facility is assigned to location and 0 otherwise. Constraints (1.2) and (1.3) enforce that each facility should be assigned to exactly one location and each location should be assigned to exactly one facility. is the fixed cost of assigning facility to location ; is the flow between the pair of facilities and ; and is the distance between the pair of locations and .
Despite having the same constraint set as the Linear Assignment Problem (LAP), the QAP is a strongly NPhard problem (Sahni and Gonzalez, 1976), i.e., it cannot be solved efficiently within a guaranteed time limit. Additionally, it is difficult to find a provable optimal solution to the QAP. The quadratic nature of the objective function also adds to the solution complexity. One of the ways of solving the QAP is to convert it into a Mixed Integer Linear Program (MILP) by introducing additional variables and constraints. Different linearizations have been proposed by Lawler (1963), Kaufman and Broeckx (1978), Frieze and Yadegar (1983) and Adams and Johnson (1994). Table 1 presents a comparison of these various linearizations in terms of number of variables and constraints. Many formulations and algorithms have been developed over the years for solving the QAP optimally or suboptimally. For a list of references on the QAP, readers are directed to the survey papers by Burkard (2002) and Loiola et al. (2007).
Linearization Model  Binary Variables  Continuous Variables  Constraints 

Lawler (1963)  –  
Kaufman and Broeckx (1978)  
Frieze and Yadegar (1983)  
Adams and Johnson (1994) RLT1  
Adams et al. (2007) RLT2  
Hahn et al. (2012) RLT3 
The main advantage of formulating the QAP as an MILP is that we can relax the integrality restrictions on the variables and solve the resulting linear program. The objective function value obtained from this LP solution can be used as a lower bound in the exact solution methods such as branchandbound. The most promising formulation was obtained by Adams and Johnson (1994) by applying level1 refactorization and linearization technique (RLT1) to the QAP. This was considered to be one of the best linearizations at the time, because it yielded strong LP relaxation bound. Adams and Johnson (1994) developed an iterative algorithm based on the Lagrangian dual ascent to obtain a lower bound for the QAP. Later Hahn and Grant (1998) developed an augmented dual ascent scheme (with simulated annealing), which yielded a lower bound which was close to the LP relaxation bound. This linearization technique was extended to RLT2 by Adams et al. (2007), which contains variables and constraints; and RLT3 by Hahn et al. (2012), which contains variables and constraints. These two formulations provide much stronger lower bounds as compared to RLT1, and for many problem instances they are able to match the optimal objective value of the QAP. However, it is extremely difficult to solve these linearization models using primal methods, because of the curse of dimensionality. Ramakrishnan et al. (2002) used Approximate Dual Projective (ADP) method to solve the LP relaxation of the RLT2 formulation of Ramachandran and Pekny (1996), which was limited to the problems with size . Adams et al. (2007) and Hahn et al. (2012) developed a dual ascent based algorithms to find strong lower bounds on RLT2 and RLT3 respectively, and used them to solve QAPs with . As observed by Hahn et al. (2012), LP relaxations of RLT2 and RLT3 provide strong lower bounds on the QAP, with RLT3 being the strongest. However, due to the large number of variables and constraints in RLT3, tremendous amount of memory is required to handle even the small/mediumsized QAP instances. In comparison, the RLT2 formulation has lesser memory requirements and it provides sufficiently strong lower bounds.
For obtaining a lower bound on the QAP using RLT2 dual ascent, we need to solve LAPs and update Lagrangian multipliers during each iteration, which can become computationally intensive. However, as described in Section 2, this algorithm can benefit from parallelization on an appropriate parallel computing architecture. In recent years, there have been significant advancements in the graphics processing hardware. Since graphics processing tasks generally require high data parallelism, the Graphics Processing Units (GPUs) are built as computeintensive, massively parallel machines, which provide a costeffective solution for high performance computing applications. Recently Gonçalves et al. (2017) developed a GPUbased dual ascent algorithm for RLT2, which shows significant parallel speed up as compared to the sequential algorithm. Although it is very promising, their algorithm is limited to a single GPU and not scalable for large problems. In our previous work (Date and Nagi, 2016), we developed a GPUaccelerated Hungarian algorithm, which was shown to outperform stateoftheart sequential and multithreaded CPU implementations. In this work, we are proposing a distributed version of the RLT2 Dual Ascent algorithm (which makes use of our GPUaccelerated LAP solver) and a parallel branchand bound algorithm, specifically designed for the CUDA enabled NVIDIA GPUs, for solving large instances of the QAP to optimality. These algorithms make use of the hybrid MPI+CUDA architecture, on the GPU cluster offered by the Blue Waters Supercomputing facility at the University of Illinois at UrbanaChampaign. This research is radical because, to the best of our knowledge, this is the first scalable GPUbased algorithm that can be used for solving large QAPs in a grid setting.
The rest of the paper is organized as follows. Section 2 describes the RLT2 formulation and the concepts of the sequential dual ascent algorithm. Sections 3 and 4 describe the various stages of our parallel algorithm, and an implementation on the multiGPU architecture. Section 5 contains the experimental results on the instances from the QAPLIB (Burkard et al., 1997). Finally, the paper is concluded in Section 6 with a summary and some directions for future research.
2. RLT2 Formulation and Dual Ascent Algorithm
In this section we will explain in detail the concepts of RLT2 formulation and the dual ascent algorithm from Adams and Johnson (1994) and Adams et al. (2007), which can be used to obtain lower bounds on the QAP.
2.1. RLT2 Formulation for the QAP
As explained by Adams et al. (2007), the refactorizationlinearization technique can be applied to the QAP formulation (1.1)(1.4), to obtain an instance of MILP. Henceforth, it is assumed that the indices , , , , etc., go from to unless otherwise stated. Initially, in the “refactorization” step, the constraints (1.2) and (1.3) are multiplied by variables . After removing the invalid variables of the form and omitting the trivial constraints we obtain new constraints of the form ; and . Then, in the “linearization” step, the product is replaced by a new variable with cost coefficient ; and a set of constraints of the form are introduced to signify the symmetry of multiplication. The resulting formulation is called RLT1 by Adams et al. (2007), which is depicted below:
(2.1)  
s.t.  
(2.2)  
(2.3)  
(2.4)  
(2.5) 
Result 1.
The RLT1 formulation is equivalent to the QAP, i.e., a feasible solution to RLT1 is also feasible to the QAP with the same objective function value (Adams and Johnson, 1994).
Now the refactorizationlinearization technique is applied on the RLT1 formulation. During the refactorization step, the constraints (2.2)(2.5) are multiplied by variables . The product is replaced by a new variable , with a cost coefficient of . The resulting RLT2 formulation is depicted below:
(2.6)  
s.t.  
(2.7)  
(2.8)  
(2.9)  
(2.10) 
Result 2.
The RLT2 formulation is equivalent to the QAP, i.e., a feasible solution to RLT2 is also feasible to the QAP with the same objective function value (Adams et al., 2007).
The main advantage of using RLT2 formulation is that its LP relaxation (LPRLT2) obtained by relaxing the binary restrictions on yields much stronger lower bounds than any other linearization from the literature. However, since this formulation has a large number of variables and constraints, primal methods are likely to fail for large QAPs (as observed by Ramakrishnan et al. (2002). Adams and Johnson (1994) and Adams et al. (2007) addressed this problem by developing a solution procedure based on Lagrangian dual ascent. In the next sections we will briefly discuss Lagrangian duality and then explain the Lagrangian dual ascent algorithm for RLT2.
2.2. Lagrangian Duality
Duality is an important concept in the theory of optimization. The Primal problem (P) and its Dual (D) share a very special relationship, known as the “weak duality.” If represents the objective function of a problem, then the weak duality states that for minimization problem. Many algorithms make use of this relationship, in cases where one of these problems is easier to solve than its counterpart. The basis of these constructive dual techniques is the Lagrangian relaxation. Let us consider the following simple optimization problem:
(2.11) 
Here, represent the complicating constraints and represent simple constraints. Then we can relax the complicating constraints and add them to the objective function using nonnegative Lagrange multipliers , which gives rise to the following Lagrangian relaxation:
(2.12) 
For any , provides a lower bound on , i.e., . To find the best possible lower bound, we solve the Lagrangian dual problem . Hence, the primary goal in these solution procedures is to systematically search for the Lagrange multipliers which maximize the objective function value of the Lagrangian dual problem. The following two solution procedures are most commonly used for obtaining these dual multipliers.
Subgradient Lagrangian Search.
The subgradient search method operates on two important observations: (1) is a piecewiselinear concave function of ; and (2) At some point , if is a solution to LRP(), then represents a valid subgradient of the function . The subgradient search procedure is very similar to the standard gradient ascent procedure, where we advance along the (sub)gradients of the objective function until we reach some solution that is no longer improving. At that point, we calculate the new (sub)gradients and continue. The only disadvantage of using subgradients instead of the gradient is that it is difficult to characterize an accurate stepsize which is valid for all the active subgradients. Therefore, taking an arbitrary step along the subgradients might worsen the objective function from time to time. However, for specific stepsize rules, it is proved that the procedure converges to the optimal solution asymptotically.
Lagrangian Dual Ascent.
Instead of using the subgradients in a naive fashion, they can be used to precisely figure out both the ascent direction and the stepsize that gives us the “best” possible improvement in the objective function . This is the crux of the dual ascent procedure. During each iteration of the dual ascent procedure, an optimization problem is solved to find a direction for some dual solution , which creates a positive inner product with every subgradient of , i.e., . If no such direction is found, then the solution and corresponding is an optimal solution. Otherwise, the “best” stepsize is established which gives the maximum improvement in the objective function along to find a new dual solution. The most difficult part of the dual ascent algorithm is to find the stepsize that will provide a guaranteed ascent, while maintaining the feasibility of all the previous primal solutions, and more often than not, finding the optimal stepsize is an NPhard problem. However, the salient feature of the Lagrangian dual of RLT2 linearization is that the improving direction and stepsize can be found without having to solve any optimization problem. This can be achieved by doing simple sensitivity analysis and maintaining the complementary slackness for the nonbasic , and variables in the corresponding LAPs. In the next section, we will discuss the features of the RLT2 linearization and its Lagrangian dual.
2.3. Sequential RLT2DA Algorithm
Let us consider the LP relaxation of the RLT2 formulation. Initially, the constraints (2.4) and (2.9) are relaxed and added to the objective function using the Lagrange multipliers , to obtain the Lagrangian relaxation LRLT2. Let represent the dual variables corresponding to the constraints (1.2), (1.3), (2.2), (2.3), (2.7), (2.8) respectively. Then for some fixed , the Lagrangian relaxation and its dual can be written as follows.
(2.13)  
s.t.  
(2.14) 
(2.15)  
s.t.  (2.16)  
(2.17)  
(2.18)  
(2.19) 
LAP Solution.
The problem can be solved using the decomposition principle explained by Adams et al. (2007). To maximize the dual objective function (2.15) with respect to constraints (2.16), we need large values of and , for which the term needs to be maximized subject to constraints (2.17). This requires large values of and , for which, the term needs to be maximized with respect to constraints (2.18). Thus we have a three stage problem, as seen in Fig. 1.
In the first stage, for each , with and , we need to solve the problems:
(2.20) 
which are nothing but ZLAPs in their dual form (with modified cost coefficients). In the second stage, for each , we need to solve the problems:
(2.21) 
which are nothing but YLAPs in their dual form (with modified cost coefficients). In the final stage, we need to solve a single XLAP (with modified cost coefficients):
(2.22) 
which gives us the required lower bound on RLT2. We can see that there are LAPs and the number of cost coefficients in each LAP is . The worst case complexity of any primaldual LAP algorithm for an input matrix with cost coefficients, is . Therefore, the overall solution complexity for solving DLRLT2() is .
Dual Ascent.
The Lagrangian dual problem for LRLT2 is to find the best set of multipliers , so as to maximize the objective function value , i.e.,
(2.23) 
Since LRLT2() exhibits integrality property, due to the theorem by Geoffrion (1974), the objective function value cannot exceed the linear programming relaxation bound , obtained by relaxing the binary restrictions (1.4). Therefore, we can assert the following inequality:
(2.24) 
To solve LDRLT2, one could employ a standard dual ascent algorithm. However, for LDRLT2, finding an ascent direction and a stepsize can be done relatively easily, without having to solve any optimization problem. To this end, we will now describe the principle behind the Lagrangian dual ascent for LDRLT2.

Let denote the reduced cost (or dual slack) of an LAP variable. Then, for some variable in an optimal LAP solution,
(2.25) 
From Equation (2.9), we know that for any , the variable is one of the six “symmetrical” variables appearing in that particular constraint, and in an optimal QAP solution, the values of all the six variables should be the same.

If some and one of its symmetrical variables , then for the constraint , the direction provides a natural direction of ascent for , because it is a valid subgradient of LRLT2. To obtain a new dual solution, a step may be taken along this direction, i.e., may be increased (i.e., may be decreased) and may be decreased (i.e., may be increased), using a valid stepsize.

While determining the stepsize, the most important criterion is that the feasibility of the current dual variables must be maintained. According the constraint (2.18), infeasibility is incurred in the dual space if the new . This means that is allowed to decrease by at most , and consequently, the symmetrical cost coefficient can be increased by the same amount. Since is basic, there is a good chance that this adjustment will increase by some nonnegative value, and therefore, this is a “strong” direction of ascent.

For some other pair of variables, if , then the direction is also a valid direction, i.e., the cost coefficient can be decreased by at most and can be increased by the same amount, without incurring any dual infeasibility. Since both variables are nonbasic, there will be no change in . This direction is a “weak” direction of ascent.

In an “optimal” dual ascent scheme, we would need to find ascent directions which will be “strong” for every pairwise constraint from Equation (2.9), and finding such directions would require significant computational effort. However, we can easily find a direction that is “strong” only for a subset of pairwise constraints, which may provide a nonnegative increase in . In other words, we can select a nonbasic variable , decrease its cost coefficient by some amount and increase the cost coefficients of the five symmetrical variables by some fraction of . If some of the directions happen to be “strong,” then the objective function will experience nonnegative increase, otherwise it will remain the same. This is the crux of the dual ascent procedure. Mathematically, we adjust the dual multipliers using the rule:
(2.26) Here, , , and . Similarly, for updating the dual multipliers corresponding to constraints (2.4), we can write the following rule.
(2.27) Here, . We refer to it as “Type 1 ascent rule.”

Now, let us consider the constraint (2.17). After applying Type 1 rule and solving the corresponding ZLAP(s); for some , it is possible that , i.e., . In this case, can be decreased by , by decreasing the cost coefficients by an amount . This allows us to increase the cost coefficients of the symmetrical variables, providing the objective functions of the corresponding LAPs a chance to grow. Mathematically, we adjust the dual multipliers using the rule:
(2.28) Here, , , and . We refer to it as “Type 2 ascent rule.”

Finally, we can use a similar rule for constraint (2.16). For some , if , we can decrease the cost coefficients , by an amount of . This is equivalent to decreasing the cost coefficients by an amount . Consequently, we can increase the cost coefficients of the symmetrical variables, potentially improving the objective function value of the corresponding LAPs. Mathematically, we adjust the dual multipliers using the rule:
(2.29) Here, , , and . We refer to it as “Type 3 ascent rule.”

We can also implement a “Type 4 ascent rule,” in which we can generate two fractions and such that . Then we decrease the current lower bound by the fraction , which is equivalent to decreasing the cost coefficients by and cost coefficients by . This is equivalent to decreasing the corresponding cost coefficients by an amount ; and by an amount . Consequently, we can increase the cost coefficients of the symmetrical variables, potentially improving the objective function values of the corresponding LAPs. This step deteriorates the current lower bound, however, the resulting redistribution provides a much greater increase in . This step can be implemented in the same spirit as the Simulated Annealing (SA) approach with a specific temperature schedule. Hahn and Grant (1998) reported stronger lower bounds for SA based dual ascent for RLT1, as compared to those of the naive dual ascent of Adams and Johnson (1994). Although it was not mentioned explicitly, we suspect that this approach was also used in dual ascent for RLT2 by Adams et al. (2007). In Section 5, we compare the lower bounds for various problems, with and without SA.

The overall stepsize rule for Lagrangian dual ascent is a combination of the four rules discussed above. The solution complexity of the dual ascent phase is , which is same as the upper bound on the number of cost coefficients.
Procedure RLT2DA.
Once the dual multipliers are updated, the LAPs need to be resolved to obtain an improved , which is also a lower bound on the QAP. Thus the RLT2DA procedure iterates between the LAP solution phase and the dual ascent phase, until a specified optimality gap has been achieved; a specified iteration limit has been reached; or a feasible solution to the QAP has been found. The steps of RLT2DA procedure are depicted in Algorithm 1.
Feasibility Check.
To check whether the primaldual feasibility has been achieved or not, the complementary slackness principle can be used. After solving the XLAP and obtaining a primal solution , we construct feasible and vectors; and check whether the dual slack values and corresponding to this primal solution are compliant with Equation (2.25). If this is true, then a feasible solution has been found, which also happens to be optimal to the QAP. Otherwise we continue to update the dual multipliers and resolve LRLT2.
Algorithm Correctness.
For the sake of completeness, we will now prove that the RLT2DA provides us with a sequence of nondecreasing lower bounds on the QAP. This result has been adapted from the result by Adams and Johnson (1994).
Theorem 1.
Given the input parameters , , and , the RLT2DA provides a nondecreasing sequence of lower bounds.
Proof.
Let us consider LRLT2 at some iterations and , with the corresponding dual multipliers and . To prove the theorem we need to show that . Consider the following dual of . Note that we have not shown the conditions on the indices for the sake of brevity.
(2.30)  
s.t.  (2.31)  
(2.32)  
(2.33)  
(2.34) 
Let us assume, without the loss of generality, that . We can substitute in Equation (2.33) with the following expression arising from the three dual ascent rules (6), (7), and (8).
(2.35) 
where, represents the sum of fractional slacks , , and , of the five symmetrical variables of , as given by the rules (6)–(8). After substituting Equation (2.35) in Equation (2.33) and rearranging the terms, we obtain the following constraint:
(2.36) 
After replacing Equation (2.33) with Equation (2.36), and aggregating the and terms, we can write the following expression:
(2.37) 
where, represents the constraint set:
(2.38)  
(2.39)  
(2.40)  
(2.41) 
If we split the constraint set into two constraint sets and , such that,
(2.42)  
(2.43)  
(2.44)  
(2.45) 