# Between Steps: Intermediate Relaxations between big-M and Convex Hull Formulations

## Abstract

This work develops a class of relaxations in between the big-M and convex hull formulations of disjunctions, drawing advantages from both. The proposed “$P$-split” formulations split convex additively separable constraints into $P$ partitions and form the convex hull of the partitioned disjuncts. The parameter $P$ represents the trade-off of model size vs. relaxation strength. We examine the novel formulations and prove that, under certain assumptions, the relaxations form a hierarchy starting from a big-M equivalent and converging to the convex hull. We computationally compare the proposed formulations to big-M and convex hull formulations on a test set including K-means clustering, P_ball problems, and ReLU neural networks. The computational results show that the intermediate $P$-split formulations can form strong outer approximations of the convex hull with fewer variables and constraints than the extended convex hull formulations, giving significant computational advantages over both the big-M and convex hull.

###### Keywords:

Disjunctive programming · Relaxation comparison · Formulations · Mixed-integer programming · Convex MINLP

## 1 Introduction

There are well-known trade-offs between the big-M and convex hull relaxations of disjunctions in terms of problem size and relaxation tightness. Convex hull formulations [4, 6, 9, 16, 20, 36] provide a sharp formulation for a single disjunction, i.e., the continuous relaxation provides the best possible lower bound. The convex hull is often represented by so-called extended (a.k.a. perspective/multiple-choice) formulations [5, 7, 11, 14, 15, 17, 38], which introduce multiple copies of each variable in the disjunction(s). On the other hand, the big-M formulation only introduces one binary variable for each disjunct and results in a smaller problem in terms of both number of variables and constraints; however, in general it provides a weaker relaxation than the convex hull and may require a solver to explore significantly more nodes in a branch-and-bound tree [10, 38]. Even though the big-M formulation is weaker, in some cases it can computationally outperform extended convex hull formulations, as the simpler subproblems can offset the larger number of explored nodes. Anderson et al. [1] describe a folklore observation in mixed-integer programming (MIP) that extended convex hull formulations tend to perform worse than expected. The observation is supported by the numerical results in Anderson et al. [1] and in this paper.

This paper presents a framework for generating formulations for disjunctions between the big-M and convex hull with the intention of combining the best of both worlds: a tight, yet computationally efficient, formulation. The main idea behind the novel formulations is partitioning the constraints of each disjunct and moving most of the variables out of the disjunction. Forming the convex hull of the resulting disjunctions results in a smaller problem, while retaining some features of the convex hull. We call the new formulation the $P$-split, as the constraints are split into $P$ parts. While many efforts have been devoted to computationally efficient convex hull formulations [3, 11, 19, 33, 37, 39, 40, 41] and techniques for deriving the convex hull of MIP problems [2, 22, 25, 31, 35], our primary goal is not to generate the convex hull. Rather, we provide a straightforward framework for generating a family of relaxations that approximate the convex hull for a general class of disjunctions using a smaller problem formulation. Our experiments show that the $P$-split formulations can give a significant computational advantage over both the big-M and convex hull formulations.

This paper is organized as follows: the $P$-split formulation is presented in Section 2, together with properties of the $P$-split relaxations and how they compare to the big-M and convex hull relaxations. We also present a non-extended realization of the $P$-split formulation for the special case of a two-term disjunction. Finally, a numerical comparison of the formulations is presented in Section 3 using instances with both linear and nonlinear disjunctions.

### 1.1 Background

We consider optimization problems containing disjunctions of the form

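$$\bigvee_{l \in \mathcal{D}} \left[\, g_k(x) \le b_k \quad \forall k \in \mathcal{C}_l \,\right], \qquad x \in \mathcal{X} \subset \mathbb{R}^n, \tag{1}$$

where $\mathcal{D}$ contains the indices of the disjuncts, $\mathcal{C}_l$ the indices of the constraints in disjunct $l$, and $\mathcal{X}$ is a convex compact set. This paper assumes the following:

###### Assumption 1

The functions $g_k: \mathbb{R}^n \to \mathbb{R}$ are convex additively separable functions, i.e., $g_k(x) = \sum_{i=1}^{n} h_{i,k}(x_i)$ where $h_{i,k}: \mathbb{R} \to \mathbb{R}$ are convex functions, and each disjunct is non-empty on $\mathcal{X}$.

###### Assumption 2

All functions $h_{i,k}$ are bounded over $\mathcal{X}$.

###### Assumption 3

Each disjunct contains far fewer constraints than the number of variables in the disjunction, i.e., $|\mathcal{C}_l| \ll n$.

The first two assumptions are needed for the $P$-split formulation to be valid and result in a convex MIP. While the first assumption simplifies our analysis of $P$-split formulations, it could easily be relaxed to partially additively separable functions. Furthermore, the computational experiments only consider problems with linear or quadratic constraints, which ensures that the convex hull of the disjunction is representable by a polyhedron or (rotated) second-order cone constraints [6]. Assumption 3 characterizes problem structures favorable for the presented formulations. Problems with such a structure include, e.g., clustering [28, 32], mixed-integer classification [24, 30], optimization over trained neural networks [1, 8, 12, 13, 34], and coverage optimization [18].

## 2 Relaxations between convex hull and big-M

The formulations in this section apply to disjunctions with multiple constraints per disjunct. However, to simplify the derivation, we only consider disjunctions with one constraint per disjunct, i.e., $|\mathcal{C}_l| = 1$ for all $l \in \mathcal{D}$. The extension to multiple constraints per disjunct simply applies the splitting procedure to each constraint.

To derive the new formulations, we partition the variables into $P$ sets and form the corresponding index sets $\mathcal{I}_1, \dots, \mathcal{I}_P$. The constraint for each disjunct is then split into $P$ constraints, by introducing auxiliary variables $\alpha_s^l \in \mathbb{R}$

$$\bigvee_{l \in \mathcal{D}} \left[ \sum_{s=1}^{P} \sum_{i \in \mathcal{I}_s} h_{i,l}(x_i) \le b_l \right] \;\longrightarrow\; \bigvee_{l \in \mathcal{D}} \left[ \begin{array}{l} \sum_{i \in \mathcal{I}_s} h_{i,l}(x_i) \le \alpha_s^l \quad \forall s \in \{1, \dots, P\} \\ \sum_{s=1}^{P} \alpha_s^l \le b_l \end{array} \right]. \tag{2}$$

By Assumption 2, function $h_{i,l}$ is bounded on $\mathcal{X}$, and bounds on the auxiliary variables are given by

$$\underline{\alpha}_s^l := \min_{x \in \mathcal{X}} \sum_{i \in \mathcal{I}_s} h_{i,l}(x_i), \qquad \bar{\alpha}_s^l := \max_{x \in \mathcal{X}} \sum_{i \in \mathcal{I}_s} h_{i,l}(x_i). \tag{3}$$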

The $P$-split formulation does not require tight bounds, but weak bounds result in an overall weaker relaxation.
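As a rough illustration of how such bounds might be computed, the sketch below partitions variable indices into $P$ contiguous groups and bounds each partial sum. It assumes the special case of affine terms (each term is `w[i] * x[i]`) and a box domain `lo <= x <= hi`; the function names and the small data set are hypothetical and not taken from the paper. Under these assumptions the variables are independent, so the interval bounds are tight.

```python
from typing import List, Sequence, Tuple


def partition_indices(n: int, P: int) -> List[List[int]]:
    """Split indices 0..n-1 into P contiguous groups of near-equal size."""
    if not 1 <= P <= n:
        raise ValueError("need 1 <= P <= n")
    base, extra = divmod(n, P)
    groups, start = [], 0
    for s in range(P):
        size = base + (1 if s < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def split_bounds(
    w: Sequence[float],
    lo: Sequence[float],
    hi: Sequence[float],
    groups: Sequence[Sequence[int]],
) -> List[Tuple[float, float]]:
    """Bounds of sum_{i in group} w[i] * x[i] over the box lo <= x <= hi.

    Assumption: affine terms and a box domain, so each term attains its
    extreme value at one of its own variable bounds.
    """
    bounds = []
    for group in groups:
        lower = sum(min(w[i] * lo[i], w[i] * hi[i]) for i in group)
        upper = sum(max(w[i] * lo[i], w[i] * hi[i]) for i in group)
        bounds.append((lower, upper))
    return bounds


# Hypothetical data: one linear constraint over four variables, split in two.
w = [1.0, -2.0, 0.5, 3.0]
lo = [0.0, 0.0, -1.0, -1.0]
hi = [1.0, 1.0, 1.0, 2.0]
groups = partition_indices(len(w), 2)
bounds = split_bounds(w, lo, hi, groups)
print(groups, bounds)
```

For nonlinear convex terms or non-box domains, each bound would instead require solving a small optimization problem, or could be replaced by any valid, possibly weaker, bound.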

The splitting creates a lifted formulation by introducing auxiliary variables. Both formulations in (2) have the same feasible set in the $x$ variables. We relax the disjunction by treating the split constraints as global constraints

(4) | |||||

Lemma 1 relates the $P$-split representation to the original disjunction. The property is rather simple, but for completeness we have stated it as a lemma.

###### Lemma 1

The feasible set of the $P$-split representation projected onto the $x$-space is equal to the feasible set of the original disjunctions in (2).

###### Proof

Using the extended formulation [4] to represent the convex hull of the disjunction in (4) results in the $P$-split formulation

(-split) | ||||||

which forms a convex MIP problem. To clarify our terminology: a 2-split formulation is a formulation ($P$-split) where the constraints of the original disjunction are split up into two parts, i.e., $P = 2$. We assume that the disjunction is part of a larger optimization problem that may contain multiple disjunctions. Therefore, we need to enforce integrality on the binary variables even if we recover the convex hull of the disjunction. Proposition 1 shows the correctness of the ($P$-split) formulation of the original disjunction.

###### Proposition 1

###### Proof

Proposition 1 states that the $P$-split formulation is correct for integer feasible solutions, but it does not give any insight on the quality of the continuous relaxation. The following subsections further analyze the properties of the ($P$-split) formulation and its relation to the big-M and convex hull formulations.

###### Remark 1

A ($P$-split) formulation introduces continuous and binary variables. Unlike the extended convex hull formulation (which introduces continuous and binary variables), the number of “extra” variables is independent of $n$, i.e., the number of variables in the original disjunction. As we later show, there are applications where $P \ll n$ for which ($P$-split) formulations can be smaller and computationally more tractable than the extended convex hull formulation.

### 2.1 Properties of the $P$-split formulation

This section focuses on the strength of the continuous relaxation of the $P$-split formulation, and how it compares to convex hull and big-M formulations. To simplify the analyses, we only consider disjunctions with a single constraint per disjunct. However, the results directly extend to the case of multiple constraints per disjunct by applying the same procedure to each individual constraint.

We first analyze the 1-split, as summarized in the following theorem.

###### Theorem 2.1

The 1-split formulation is equivalent to the big-M formulation.

###### Proof

We eliminate the disaggregated variables from the 1-split formulation using Fourier-Motzkin elimination. Furthermore, we eliminate trivially redundant constraints, e.g., , resulting in

(5) | ||||||

The auxiliary variables are removed by combining the first and second constraints in (5). The smallest valid big-M coefficients are , which enables us to write (5) as

(6) | ||||||

∎

Since the 1-split formulation introduces auxiliary variables, but has the same continuous relaxation as the big-M formulation, there are no clear advantages of the 1-split formulation vs the big-M formulation.

We now examine the other extreme, where constraints are fully disaggregated, i.e., the $n$-split. Its relation to the convex hull is given in the following theorem.

###### Theorem 2.2

If all constraint functions are affine, then the $n$-split formulation (where constraints are split for each variable) provides the convex hull of the disjunction.

###### Proof

In the linear case, the original disjunction is given by

(7) |

and the $n$-split formulation can be written compactly as

(8) |

The $n$-split formulation is given by the convex hull of (8) through the extended formulation. Here, defines a bijective mapping between the and variable spaces (only true for an $n$-split). A reverse mapping is given by . The linear transformations preserve an exact representation of the feasible sets, i.e.,

(9) |

For any point in the convex hull of (8) and

(10) | |||

Applying the reverse mapping to (10) gives

(11) |

By construction, . The point is given by a convex combination of points that all satisfy the constraints of one of the disjuncts in (7) and, therefore, belongs to the convex hull of (7). The same technique easily shows that any point in the convex hull of disjunction (7) also belongs to the convex hull of disjunction (8).∎

Theorem 2.2 does not hold with nonlinear functions, since the mapping may not be bijective or a homomorphism. In general, the $n$-split formulation will not obtain the convex hull of nonlinear disjunctions, as Section 2.2 shows by example, but it can provide a strong outer approximation.

#### Two-term disjunctions

We further analyze the special case of a two-term disjunction, for which we also present a non-lifted $P$-split formulation in the following theorem.

###### Theorem 2.3

For a two-term disjunction, the $P$-split formulation has the following non-extended realization

(12) | ||||||

where .

###### Proof

The equality constraints for the disaggregated variables () enable us to easily eliminate the variables from ($P$-split), resulting in

(13) | ||||

(14) | ||||

(15) | ||||

(16) | ||||

(17) | ||||

(18) | ||||

(19) |

Next, we use Fourier-Motzkin elimination to project out the variables. Combining the constraints in (15) and (16) only results in trivially redundant constraints, e.g., . Eliminating the first variable creates two new constraints by combining (13) with (15)–(16). The first constraint is obtained by removing and from (13) and adding to the left-hand side. The second constraint is obtained by removing from (13) and subtracting from the left-hand side. Eliminating the next variable is done by repeating the procedure of combining the two new constraints with the corresponding inequalities in (15)–(16). Each elimination step doubles the number of constraints originating from inequality (13). Eliminating all the variables and results in the first set of constraints

(20) |

The variables and are eliminated by the same steps, resulting in the second set of constraints in (12). ∎

To further analyze the tightness of different $P$-split relaxations, we require that the bounds on the auxiliary variables be independent, as defined below:

###### Definition 2

We say that the upper and lower bounds for the constraint

are independent on if

(21) | ||||

hold for all .

Independent bounds are not restricted to linear constraints, but the most general case of independent bounds is linear disjunctions with the variable domain defined as a box. Independent bounds enable us to establish a strict relation on the tightness of different $P$-split formulations, which is presented in the next corollary.

###### Corollary 1

For a two-term disjunction with independent bounds, a $(P+1)$-split formulation, obtained by splitting one variable group in the $P$-split, is always as tight or tighter than the corresponding $P$-split formulation.

###### Proof

The non-extended formulation (12) for the $(P+1)$-split comprises the constraints in the $P$-split formulation and some additional constraints. ∎

From Corollary 1 it follows that the $P$-split formulations represent a hierarchy of relaxations, and we formally state this property in the following corollary.

###### Corollary 2

For a linear two-term disjunction the $P$-split formulations form a hierarchy of relaxations, starting from the big-M relaxation ($P = 1$) and converging to the convex hull relaxation ($P = n$).

###### Proof

Theorems 2.1 and 2.2 give equivalence to big-M and convex hull. By Corollary 1, the $(P+1)$-split is as tight or tighter than the $P$-split relaxation. ∎

### 2.2 Illustrative example

To see the differences between $P$-split formulations, consider the disjunction

(ex-1) | ||||

The tightest valid bounds on all the auxiliary variables are given by

(22) |

These bounds are derived from the fact that one of the two constraints in the disjunction must hold, and are symmetric for the two sets of auxiliary variables. The continuously relaxed feasible sets of the $P$-split formulations of disjunction (ex-1) are shown in Fig. 1, which shows that the relaxations overall tighten with increasing number of splits $P$. The 4-split formulation does not give the convex hull, but provides a good approximation. For this example, the independent bound property does not hold and the relaxations do not form a proper hierarchy. To show why the independent bound property is needed, we compare the non-extended representations of the 1-split and 2-split formulations:

(1-s) | ||||

(2-s1) | ||||

(2-s2) | ||||

(2-s3) |

The 1-split formulation is given by (1-s), and the 2-split by (2-s1)–(2-s3). The 2-split contains additional constraints (2-s1) and (2-s2), but (2-s3) is a weaker version of (1-s). If the independent bound property were true, then (2-s3) and (1-s) would be identical and the relaxations would form a proper hierarchy.

## 3 Numerical comparison

To compare how the formulations perform computationally, we apply the $P$-split, big-M, and convex hull formulations to several test problems. We consider three types of optimization problems that have a suitable structure for the $P$-split formulation (Assumptions 1–3) and that are known to be challenging.

#### K-means clustering

Using the formulation by Papageorgiou and Trespalacios [28], the K-means clustering problem [26] is given by

(23) | ||||||

s.t. |

where are the cluster centers, are -dimensional data points, and . The tightest upper bounds for the auxiliary variables in the $P$-split formulations are given by the largest squared Euclidean distance between any two data points in the subspace corresponding to the auxiliary variable. By introducing auxiliary variables for the differences , we can express the convex hull of the disjunctions by rotated second-order cone constraints [6] in a form suitable for Gurobi. We use the G2 data set [27] to generate low-dimensional test instances, and the MNIST data set [23] to generate high-dimensional test instances. For the MNIST-based problems, we select the first images of each class ranging from 0 to the number of clusters. Details about the problems are presented in Table 1.

#### P_ball problems

The task is to assign -points to -dimensional unit balls such that the total distance between all points is minimized and only one point is assigned to each unit ball [21]. Upper bounds on the auxiliary variables in the $P$-split formulation are given by the same technique as for the $M$-coefficients in [21], but in the subspace corresponding to the auxiliary variable. By introducing auxiliary variables for the differences between the points and the centers, we are able to express the convex hull by second-order cone constraints [6] in a form suitable for Gurobi. We have generated a few larger instances to obtain more challenging problems, and details of the problems are given in Table 1.

#### ReLU neural networks

Optimization over a ReLU neural network (NN) is used to quantify extreme outputs [1, 8]. Each ReLU activation function () can be expressed as a two-part disjunction using the $P$-split formulation, by separating . We sort the variables by index and assign them to splits of even size. Upper bounds on node outputs and auxiliary variables can be computed using simple interval arithmetic. We created several instances (Table 1) that minimize the prediction of single-output NNs trained on the -dimensional Ackley/Rastrigin functions. All NNs were implemented in PyTorch [29] and trained for 1000 epochs, using a Latin hypercube of 10 samples. Note that more samples may be required to accurately represent the target functions, but here we are solely concerned with the performance of various optimization formulations.
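The interval arithmetic mentioned above can be illustrated with a small sketch that propagates box bounds through ReLU layers. The weights, biases, and function names below are hypothetical and chosen only to make the arithmetic easy to follow; they are not the networks used in the experiments.

```python
from typing import List, Sequence, Tuple


def affine_interval(
    weights: Sequence[float],
    bias: float,
    lo: Sequence[float],
    hi: Sequence[float],
) -> Tuple[float, float]:
    """Interval of bias + sum_i weights[i] * x[i] for lo <= x <= hi."""
    lower = bias + sum(min(w * a, w * b) for w, a, b in zip(weights, lo, hi))
    upper = bias + sum(max(w * a, w * b) for w, a, b in zip(weights, lo, hi))
    return lower, upper


def relu_layer_bounds(
    W: Sequence[Sequence[float]],
    b: Sequence[float],
    lo: Sequence[float],
    hi: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Propagate box bounds through y = max(0, W x + b), one row per node."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        lower, upper = affine_interval(row, bias, lo, hi)
        out_lo.append(max(0.0, lower))
        out_hi.append(max(0.0, upper))
    return out_lo, out_hi


# Hypothetical two-layer network on the input box [-1, 1]^2.
W1, b1 = [[1.0, -1.0], [2.0, 1.0]], [0.0, -1.0]
W2, b2 = [[1.0, -1.0]], [0.5]

lo1, hi1 = relu_layer_bounds(W1, b1, [-1.0, -1.0], [1.0, 1.0])
lo2, hi2 = relu_layer_bounds(W2, b2, lo1, hi1)
print(lo1, hi1, lo2, hi2)
```

Because each layer treats its inputs as independent, the bounds typically loosen with depth. Applying `affine_interval` to only the weights of one split would give bounds for that split's partial sum in the same way.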

Table 1: Details of the test problems.

| name | data points | data dimension | number of clusters |
|---|---|---|---|
| Cluster_g1 | 20 | 32 | 2 |
| Cluster_g2 | 25 | 32 | 2 |
| Cluster_g3 | 20 | 16 | 3 |
| Cluster_m1 | 5 | 784 | 3 |
| Cluster_m2 | 8 | 784 | 2 |
| Cluster_m3 | 10 | 784 | 2 |

| name | number of balls | number of points | ball dimension |
|---|---|---|---|
| P_ball_1 | 10 | 5 | 8 |
| P_ball_2 | 10 | 5 | 16 |
| P_ball_3 | 8 | 5 | 32 |

| name | input dimension | hidden layers | function |
|---|---|---|---|
| NN_1 | 2 | [50, 50, 50] | Ackley |
| NN_2 | 10 | [50, 50, 50] | Ackley |
| NN_3 | 3 | [100, 100] | Rastrigin |

#### Computational setup

Optimization performance is dependent on both the tightness and the computational complexity of the continuous relaxation. The default (automatic) parameter selection in Gurobi caused large variations in the results that were due to different solution strategies rather than differences between formulations. Therefore, we used the parameter settings MIPFocus = 3, Cuts = 1, and MIQCPMethod = 1 for all problems. We found that using PreMIQCPForm = 2 drastically improves the performance of the extended convex hull formulations for the clustering and P_ball problems. However, it resulted in worse performance for the other formulations and, therefore, we only used it with the convex hull. Since the NN problems only contain linear constraints, only the MIPFocus and Cuts parameters apply to these problems. The default values were used for all other settings. All problems were solved using Gurobi 9.0.3 on a desktop computer with an i7-8700K processor and 16 GB RAM.
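The parameter choices described above could be organized along the following lines. This is only a sketch: the helper names and the `formulation` labels are hypothetical, and the code assumes a model object exposing `setParam(name, value)` as a `gurobipy.Model` does. A small recording stand-in is used here so the snippet runs without a Gurobi installation.

```python
from typing import Dict


def solver_params(formulation: str, has_quadratic: bool) -> Dict[str, int]:
    """Return the Gurobi parameters described in the text.

    `formulation` is a hypothetical label: "big_m", "p_split" or "convex_hull".
    """
    params = {"MIPFocus": 3, "Cuts": 1}
    if has_quadratic:
        # MIQCP-specific settings only matter for the nonlinear problems.
        params["MIQCPMethod"] = 1
        if formulation == "convex_hull":
            # Used only with the extended convex hull formulation.
            params["PreMIQCPForm"] = 2
    return params


def apply_params(model, params: Dict[str, int]) -> None:
    """Apply parameters to any object with a setParam(name, value) method."""
    for name, value in params.items():
        model.setParam(name, value)


class RecordingModel:
    """Stand-in for a solver model (assumption: mimics setParam only)."""

    def __init__(self) -> None:
        self.params: Dict[str, int] = {}

    def setParam(self, name: str, value: int) -> None:
        self.params[name] = value


hull_model = RecordingModel()
apply_params(hull_model, solver_params("convex_hull", has_quadratic=True))

nn_model = RecordingModel()
apply_params(nn_model, solver_params("p_split", has_quadratic=False))
print(hull_model.params, nn_model.params)
```

With an actual Gurobi installation, `RecordingModel()` would be replaced by the model built for the respective formulation.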

Different variable partitionings can lead to differences in the $P$-split formulations. For all the problems, the variables are simply partitioned based on their ordered indices. For the K-means clustering and P_ball problems, we have used the smallest valid M-coefficients and tight bounds for the auxiliary variables. The K-means clustering and P_ball problems both have analytical expressions for all the bounds. For the NN problems, tight bounds are not easily obtained, and the bounds are obtained using interval arithmetic.

### 3.1 Numerical results

Table 2 shows the elapsed CPU time and number of nodes explored to solve each problem. The results show that $P$-split formulations can drastically reduce the number of explored nodes compared to the big-M formulation, even with only a few splits. The differences are clearest for the nonlinear problems, where both the CPU times and numbers of nodes are reduced by several orders of magnitude. As expected, the convex hull formulation results in the fewest explored nodes.
However, the $P$-split formulations have a simpler^{4}

Table 2: CPU time and number of explored nodes for each problem.

| instance | | big-M | 2-split | 4-split | 8-split | 16-split | 32-split | convex hull |
|---|---|---|---|---|---|---|---|---|
| Cluster_g1 | time | 1800 | 81.0 | 13.9 | 2.9 | 1.7 | 3.5 | 42.0 |
| | nodes | 8998 | 2946 | 1096 | 256 | 98 | 91 | 73 |
| Cluster_g2 | time | 1800 | 106.3 | 7.7 | 4.3 | 2.1 | 4.5 | 40.6 |
| | nodes | 10431 | 1736 | 481 | 217 | 104 | 86 | 77 |
| Cluster_g3 | time | 1800 | 1800 | 870.6 | 407.2 | 597.5 | NA | 1800 |
| | nodes | 28906 | 40820 | 19307 | 14923 | 16806 | | 7797 |
| P_ball_1 | time | 403.0 | 235.4 | 285.1 | 18.5 | NA | NA | 42.2 |
| | nodes | 29493 | 7919 | 5518 | 2202 | | | 1437 |
| P_ball_2 | time | 1800 | 483.6 | 326.6 | 41.6 | 30.6 | NA | 28.2 |
| | nodes | 19622 | 13602 | 5871 | 3921 | 1261 | | 531 |
| P_ball_3 | time | 1800 | 1800 | 1800 | 149.3 | 91.1 | 78.7 | 114.0 |
| | nodes | 7537 | 6035 | 6708 | 7042 | 3572 | 631 | 554 |

| instance | | big-M | 14-split | 28-split | 56-split | 196-split | 392-split | convex hull |
|---|---|---|---|---|---|---|---|---|
| Cluster_m1 | time | 1800 | 1800 | 129.5 | 76.8 | 32.0 | 33.2 | 313.3 |
| | nodes | 10680 | 9651 | 2926 | 1462 | 524 | 195 | 228 |
| Cluster_m2 | time | 1800 | 1116.5 | 156.1 | 27.1 | 97.0 | 54.2 | 1260.1 |
| | nodes | 4867 | 6220 | 1915 | 805 | 2752 | 1155 | 131 |
| Cluster_m3 | time | 1800 | 1800 | 429.5 | 60.0 | 23.2 | 19.8 | 1800 |
| | nodes | 4419 | 4197 | 3095 | 1502 | 741 | 397 | 93 |

| instance | | 1-split/big-M | 2-split | 4-split | 8-split | 16-split | 32-split | 50-split/convex hull* |
|---|---|---|---|---|---|---|---|---|
| NN_1 | time | 36.1 | 29.4 | 41.8 | 57.0 | 85.7 | 145.1 | 198.5 |
| | nodes | 24177 | 12377 | 11229 | 7415 | 11117 | 9793 | 11734 |
| NN_2 | time | 21.6 | 35.5 | 50.7 | 131.4 | 287.3 | 776.1 | 1800 |
| | nodes | 19746 | 20157 | 14003 | 11174 | 6687 | 12685 | 4016 |
| NN_3 | time | 141.8 | 210.6 | 206.5 | 275.5 | 305.8 | 429.1 | 556.6 |
| | nodes | 116996 | 101113 | 86582 | 84455 | 69022 | 56873 | 48153 |

*50-split is not the convex hull of each node for NN_3, which has layers of 100 nodes.

Note that the $P$-split formulations are in general robust towards the choice of $P$. For the clustering and P_ball problems, all $P$-split formulations outperformed the big-M formulation both in terms of solution times and numbers of explored nodes. For the cases where the smallest $P$-split formulations timed out, Gurobi terminated with a much smaller gap compared to that of the big-M formulation. The $P$-split formulations also outperform the convex hull formulations in terms of solution time for a wide range of $P$ in all but one of the test problems.

For the NN problems, which have linear disjunctions, the situation is somewhat different. Here, while increasing $P$ still decreased the number of explored nodes, the improvements are less significant, and the trend is not completely monotonic. Note that bounds on the inputs to layers 2–3 are computed using interval arithmetic, resulting in overall weaker relaxations for all formulations. The weaker bounds in layers 2–3 reduce the benefits of both the $P$-split and convex hull formulations, and may favor the simpler big-M formulation. As the reduction in explored nodes is less drastic, smaller formulations perform the best in terms of CPU time, supporting claims that extended formulations may perform worse than expected [1, 39]. This may also be a consequence of Gurobi efficiently handling linear problems when it detects big-M-type constraints. Ignoring the big-M (1-split), the 2- and 4-splits have the lowest CPU time for all NNs, and all the split formulations solve the problems significantly faster than the convex hull formulation.

## 4 Conclusions

We have presented a general framework for generating intermediate relaxations in between the big-M and convex hull. The numerical results show a great potential of the intermediate relaxations, by providing a good approximation of the convex hull through a computationally simpler problem. For several of the test problems, the intermediate relaxations result in a similar number of explored nodes as the convex hull formulation while reducing the total solution time by an order of magnitude.

## Acknowledgements

The research was funded by a Newton International Fellowship by the Royal Society (NIF\R1\182194) to JK, a grant by the Swedish Cultural Foundation in Finland to JK, and by Engineering & Physical Sciences Research Council (EPSRC) Fellowships to RM and CT (grant numbers EP/P016871/1 and EP/T001577/1). CT also acknowledges support from an Imperial College Research Fellowship.

### Footnotes

- email: {j.kronqvist, r.misener, c.tsay}@imperial.ac.uk
- email: {j.kronqvist, r.misener, c.tsay}@imperial.ac.uk
- email: {j.kronqvist, r.misener, c.tsay}@imperial.ac.uk
- The extended convex hull formulations for the nonlinear problems require auxiliary variables and (rotated) second-order cone constraints. All $P$-split formulations have fewer variables and constraints and only contain linear/convex-quadratic constraints.

### References

- Anderson, R., Huchette, J., Ma, W., Tjandraatmadja, C., Vielma, J.P.: Strong mixed-integer programming formulations for trained neural networks. Mathematical Programming pp. 1–37 (2020)
- Balas, E.: Disjunctive programming and a hierarchy of relaxations for discrete optimization problems. SIAM Journal on Algebraic Discrete Methods 6(3), 466–486 (1985)
- Balas, E.: On the convex hull of the union of certain polyhedra. Operations Research Letters 7(6), 279–283 (1988)
- Balas, E.: Disjunctive programming: Properties of the convex hull of feasible points. Discrete Applied Mathematics 89(1-3), 3–44 (1998)
- Balas, E.: Disjunctive Programming. Springer International Publishing (2018). https://doi.org/10.1007/978-3-030-00148-3
- Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications, vol. 2. Siam (2001)
- Bonami, P., Lodi, A., Tramontani, A., Wiese, S.: On mathematical programming with indicator constraints. Mathematical Programming 151(1), 191–223 (2015)
- Botoeva, E., Kouvaros, P., Kronqvist, J., Lomuscio, A., Misener, R.: Efficient verification of ReLU-based neural networks via dependency analysis. In: AAAI-20 Proceedings. pp. 3291–3299 (2020)
- Ceria, S., Soares, J.: Convex programming for disjunctive convex optimization. Mathematical Programming 86(3), 595–614 (1999)
- Conforti, M., Cornuéjols, G., Zambelli, G.: Integer Programming. Graduate Texts in Mathematics, vol. 271. Springer (2014)
- Conforti, M., Wolsey, L.A.: Compact formulations as a union of polyhedra. Mathematical Programming 114(2), 277–289 (2008)
- Fischetti, M., Jo, J.: Deep neural networks and mixed integer linear optimization. Constraints 23(3), 296–309 (2018)
- Grimstad, B., Andersson, H.: ReLU networks as surrogate models in mixed-integer linear programs. Computers & Chemical Engineering 131, 106580 (2019)
- Grossmann, I.E., Lee, S.: Generalized convex disjunctive programming: Nonlinear convex hull relaxation. Computational optimization and applications 26(1), 83–100 (2003)
- Günlük, O., Linderoth, J.: Perspective reformulations of mixed integer nonlinear programs with indicator variables. Mathematical programming 124(1-2), 183–205 (2010)
- Helton, J.W., Nie, J.: Sufficient and necessary conditions for semidefinite representability of convex hulls and sets. SIAM Journal on Optimization 20(2), 759–791 (2009)
- Hijazi, H., Bonami, P., Cornuéjols, G., Ouorou, A.: Mixed-integer nonlinear programs featuring “on/off” constraints. Computational Optimization and Applications 52(2), 537–558 (2012)
- Huang, C.F., Tseng, Y.C.: The coverage problem in a wireless sensor network. Mobile networks and Applications 10(4), 519–528 (2005)
- Jeroslow, R.G.: A simplification for some disjunctive formulations. European Journal of Operational research 36(1), 116–121 (1988)
- Jeroslow, R.G., Lowe, J.K.: Modelling with integer variables. In: Mathematical Programming at Oberwolfach II, pp. 167–184. Springer (1984)
- Kronqvist, J., Misener, R.: A disjunctive cut strengthening technique for convex MINLP. Optimization and Engineering pp. 1–31 (2020)
- Lasserre, J.B.: An explicit exact SDP relaxation for nonlinear 0-1 programs. In: International Conference on Integer Programming and Combinatorial Optimization. pp. 293–303. Springer (2001)
- LeCun, Y., Cortes, C., Burges, C.: MNIST handwritten digit database. ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist 2 (2010)
- Liittschwager, J., Wang, C.: Integer programming solution of a classification problem. Management Science 24(14), 1515–1525 (1978)
- Lovász, L., Schrijver, A.: Cones of matrices and set-functions and 0–1 optimization. SIAM Journal on Optimization 1(2), 166–190 (1991)
- MacQueen, J., et al.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability. vol. 1, pp. 281–297. Oakland, CA, USA (1967)
- Fränti, P., Mariescu-Istodor, R., Zhong, C.: XNN graph. LNCS 10029, 207–217 (2016)
- Papageorgiou, D.J., Trespalacios, F.: Pseudo basic steps: bound improvement guarantees from Lagrangian decomposition in convex disjunctive programming. EURO Journal on Computational Optimization 6(1), 55–83 (2018)
- Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al.: Pytorch: An imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems. pp. 8026–8037 (2019)
- Rubin, P.A.: Solving mixed integer classification problems by decomposition. Annals of Operations Research 74, 51–64 (1997)
- Ruiz, J.P., Grossmann, I.E.: A hierarchy of relaxations for nonlinear convex generalized disjunctive programming. European Journal of Operational Research 218(1), 38–47 (2012)
- Sağlam, B., Salman, F.S., Sayın, S., Türkay, M.: A mixed-integer programming approach to the clustering problem with an application in customer segmentation. European Journal of Operational Research 173(3), 866–879 (2006)
- Sawaya, N.W., Grossmann, I.E.: Computational implementation of non-linear convex hull reformulation. Computers & Chemical Engineering 31(7), 856–866 (2007)
- Serra, T., Kumar, A., Ramalingam, S.: Lossless compression of deep neural networks. In: Integration of Constraint Programming, Artificial Intelligence, and Operations Research. pp. 417–430. Springer (2020)
- Sherali, H.D., Adams, W.P.: A hierarchy of relaxations between the continuous and convex hull representations for zero-one programming problems. SIAM Journal on Discrete Mathematics 3(3), 411–430 (1990)
- Stubbs, R.A., Mehrotra, S.: A branch-and-cut method for 0-1 mixed convex programming. Mathematical Programming 86(3), 515–532 (1999)
- Trespalacios, F., Grossmann, I.E.: Algorithmic approach for improved mixed-integer reformulations of convex generalized disjunctive programs. INFORMS Journal on Computing 27(1), 59–74 (2015)
- Vielma, J.P.: Mixed integer linear programming formulation techniques. Siam Review 57(1), 3–57 (2015)
- Vielma, J.P.: Small and strong formulations for unions of convex sets from the cayley embedding. Mathematical Programming 177(1-2), 21–53 (2019)
- Vielma, J.P., Ahmed, S., Nemhauser, G.: Mixed-integer models for nonseparable piecewise-linear optimization: unifying framework and extensions. Operations research 58(2), 303–315 (2010)
- Vielma, J.P., Nemhauser, G.L.: Modeling disjunctive constraints with a logarithmic number of binary variables and constraints. Mathematical Programming 128(1-2), 49–72 (2011)