On the combinatorics of the 2-class classification problem

Ricardo C. Corrêa
Departamento de Ciência da Computação
Instituto Multidisciplinar
Universidade Federal Rural do Rio de Janeiro
Brazil
Diego Delle Donne, Javier Marenco
Instituto de Ciencias

Argentina
Abstract

A set of points $X = X_B \cup X_R \subseteq \mathbb{R}^d$ is linearly separable if the convex hulls of $X_B$ and $X_R$ are disjoint, hence there exists a hyperplane separating $X_B$ from $X_R$. Such a hyperplane provides a method for classifying new points, according to the side of the hyperplane on which they lie. When such a linear separation is not possible, it may still be possible to partition $X_B$ and $X_R$ into prespecified numbers of groups, in such a way that every group from $X_B$ is linearly separable from every group from $X_R$. We may also discard some points as outliers, and seek to minimize the number of outliers necessary to find such a partition. Based on these ideas, Bertsimas and Shioda proposed the classification and regression by integer optimization (CRIO) method in 2007. In this work we explore the integer programming aspects of the classification part of CRIO, in particular theoretical properties of the associated formulation. We are able to find facet-inducing inequalities coming from the stable set polytope, hence showing that this classification problem has exploitable combinatorial properties.

Keywords: classification, integer programming, polyhedral combinatorics

1 Introduction

The data classification problem is a widely studied topic within the machine learning and data mining communities. Briefly speaking, this problem aims at finding a partition of a Euclidean space that represents the underlying pattern of a set of points. Many computational methods for tackling this problem exist, including decision trees [4], linear and quadratic programming [5, 8], and support vector machines [11], among others. Only recently has the use of discrete optimization, and in particular integer programming, been proposed within these settings [1, 3, 10].

In this work we are interested in the method called classification and regression via integer optimization (CRIO), proposed by Bertsimas and Shioda in [3]. This pioneering work presents a remarkable application of integer programming to this field, by proposing to classify points by hyperplanes separating pairs of groups of points. Given two sets of points in $\mathbb{R}^d$, it is not always possible to find a hyperplane separating them, as Figure 1(a) shows. However, assuming that the underlying pattern can be expressed in terms of convex sets, it may be possible to subdivide each set into groups, and then find hyperplanes separating each pair of groups coming from different sets, as in Figure 1(b). In [3] a method is proposed to find such linearly separable groups, which includes the solution of a mixed integer program as a key step. This model also identifies a number of points that deviate from the underlying pattern and need to be disregarded in order to enable the desired classification. Further developments [1, 10] and computational experiments performed with instances from the literature show that this approach is promising as an effective alternative in practice.

There exist general algorithms for solving integer programs, and very strong implementations of these algorithms are available. However, since the general integer programming problem is NP-hard, the running times of these approaches may be intractably large, depending on the instance size and the model structure. Due to this fact, it is usual to study polyhedra associated with specific integer programming formulations, in order to find strong valid inequalities that may be helpful within these algorithms. Such polyhedral explorations may also reveal interesting structures and relations among different problems. In this work we are interested in such issues concerning a 0-1 integer program derived from CRIO in order to partition each set into groups, find the separating hyperplanes among them, and identify a minimum number of disregarded points. The 0-1 integer program studied results from a projection of a mixed integer one. The main goal that motivated this work is the question of whether there exist standard combinatorial structures inherent to the CRIO method.

The remainder of this work is organized as follows. Section 2 introduces the problem in detail and sets out the notation used throughout the paper. Section 3 contains our initial polyhedral study, including several families of facet-inducing inequalities. A first affirmative answer to the existence of combinatorial structures within the associated polyhedra, in the form of facets coming from the stable set polytope, is the subject of Section 4. Finally, Section 5 closes the paper with conclusions and lines for future research.

2 The 2-class problem

Let $X = \{x_1, \dots, x_m\} \subseteq \mathbb{R}^d$ be a set of samples (also referred to as points) and consider a partition of the index set $[m]$ into two classes $B$ and $R$, defining subsets $X_B = \{x_i : i \in B\}$ and $X_R = \{x_j : j \in R\}$. We start with some preliminary definitions before stating the mixed integer formulation.

2.1 Linear separability

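The set $X$ is linearly separable if and only if $\mathrm{conv}(X_B) \cap \mathrm{conv}(X_R) = \emptyset$, where $\mathrm{conv}(S)$ denotes the convex hull of $S$. It is worth noting that $X$ being linearly separable is equivalent to $\mathbb{R}^d$ being partitionable into two convex sets respecting $X_B$ and $X_R$. Hence, the linear separability of $X$ is characterized by the fact that the set of all $\lambda \in \mathbb{R}^m$, $\lambda \geq 0$, such that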

\[ \sum_{i \in B} \lambda_i x_i = \sum_{j \in R} \lambda_j x_j, \qquad \sum_{i \in B} \lambda_i = \sum_{j \in R} \lambda_j = 1 \]

must be empty. Applying Farkas’ Lemma, we get that this characterization is equivalent to stating that there exist $p \in \mathbb{R}^d$ and $q, r \in \mathbb{R}$ such that

\[ r - q < 0, \qquad p x_i + q \leq 0 \ \text{ for } i \in B, \qquad p x_j + r \geq 0 \ \text{ for } j \in R. \]

Adding $q - r$ to both sides of the last inequality and defining $\delta = q - r > 0$, we get

\[ p x_i + q \leq 0 \ \text{ for } i \in B, \qquad p x_j + q \geq \delta \ \text{ for } j \in R. \]

If these conditions hold for some $(p, q)$ and $\delta > 0$, then set $q := q - \delta/2$, and divide $p$, $q$, and $\delta$ by $\delta/2$ to conclude that $X$ is linearly separable if and only if the set of hyperplanes $(p, q)$ such that

\[ p x_i + q \leq -1 \ \text{ for } i \in B, \qquad p x_j + q \geq 1 \ \text{ for } j \in R \]

is not empty. The example in Figure 1 clearly does not satisfy this property and is therefore not linearly separable.
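The normalization step can be illustrated numerically. The following is a minimal sketch, not part of the original derivation: it takes a hyperplane in the intermediate form (values at most 0 on the blue points, at least some positive margin `delta` on the red points), shifts the offset by `delta/2`, and rescales by `delta/2`. The function name `normalize_hyperplane` and the sample data are hypothetical.

```python
from fractions import Fraction

def value(p, q, x):
    """Evaluate p.x + q for a coefficient vector p, offset q and point x."""
    return sum(pc * xc for pc, xc in zip(p, x)) + q

def normalize_hyperplane(p, q, delta):
    """Map a hyperplane with p.x_i + q <= 0 on B and p.x_j + q >= delta on R
    (delta > 0) to one with values <= -1 on B and >= 1 on R."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    half = Fraction(delta) / 2
    q_shifted = Fraction(q) - half            # values become <= -delta/2 and >= delta/2
    p_new = [Fraction(c) / half for c in p]   # rescale so the margins become -1 and 1
    q_new = q_shifted / half
    return p_new, q_new

# Hypothetical 2D data: blue points on the left, red points on the right.
X_B = [(0, 0), (1, 2)]
X_R = [(3, 1), (4, 0)]
p, q, delta = (1, 0), -1, 2   # x1 - 1 <= 0 on X_B and x1 - 1 >= 2 on X_R

p_new, q_new = normalize_hyperplane(p, q, delta)
blue_values = [value(p_new, q_new, x) for x in X_B]
red_values = [value(p_new, q_new, x) for x in X_R]
```

Exact rational arithmetic is used so that the comparisons against -1 and 1 are not affected by rounding.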

2.2 Piecewise linear separability

In several situations of interest, the underlying pattern of the points in $X$ cannot be expressed by two convex sets only. In order to cope with such a scenario, let $L_B$ and $L_R$, $L_B \cap L_R = \emptyset$, be two sets of group indices specified for $X_B$ and $X_R$, respectively. An assignment of points in $X_B$ to indices in $L_B$ defines groups of points in $X_B$. Group $k$ is the subset of $X_B$ assigned to index $k \in L_B$. Groups of points in $X_R$ are defined similarly. A piecewise linear separation of $X$ is an assignment of points in $X_B$ to indices in $L_B$ and of points in $X_R$ to indices in $L_R$ such that groups $k$ and $\ell$ are linearly separable, for all $k \in L_B$ and $\ell \in L_R$. A hyperplane $(p_{k\ell}, q_{k\ell})$ separating groups $k$ and $\ell$ is such that

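\[
\begin{aligned}
p_{k\ell} x_i + q_{k\ell} &\leq -1 && \text{for } i \in B \text{ such that } x_i \text{ is assigned to } k, \\
p_{k\ell} x_j + q_{k\ell} &\geq 1 && \text{for } j \in R \text{ such that } x_j \text{ is assigned to } \ell.
\end{aligned}
\]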
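As an illustration of this definition, the sketch below checks whether a given assignment, together with one candidate hyperplane per pair of groups, is a piecewise linear separation. It only verifies supplied hyperplanes and does not search for them. The function name, the group labels, and the one-dimensional data are hypothetical.

```python
def value(p, q, x):
    """Evaluate p.x + q."""
    return sum(pc * xc for pc, xc in zip(p, x)) + q

def is_piecewise_separation(points, assign_B, assign_R, hyperplanes):
    """points: index -> coordinates; assign_B: i -> blue group k;
    assign_R: j -> red group l; hyperplanes: (k, l) -> (p, q)."""
    for k in set(assign_B.values()):
        for l in set(assign_R.values()):
            if (k, l) not in hyperplanes:
                return False              # no certificate for this pair of groups
            p, q = hyperplanes[(k, l)]
            for i, group in assign_B.items():
                if group == k and value(p, q, points[i]) > -1:
                    return False          # blue point on the wrong side
            for j, group in assign_R.items():
                if group == l and value(p, q, points[j]) < 1:
                    return False          # red point on the wrong side
    return True

# Hypothetical instance on the real line: blue points at 0 and 4, a red point at 2.
points = {1: (0,), 2: (4,), 3: (2,)}

# Two blue groups and one red group: each blue group is separable from the red one.
two_groups = is_piecewise_separation(
    points,
    assign_B={1: "b1", 2: "b2"},
    assign_R={3: "r1"},
    hyperplanes={("b1", "r1"): ((1,), -1), ("b2", "r1"): ((-1,), 3)},
)

# A single blue group with the first hyperplane fails at the point 4.
one_group = is_piecewise_separation(
    points,
    assign_B={1: "b1", 2: "b1"},
    assign_R={3: "r1"},
    hyperplanes={("b1", "r1"): ((1,), -1)},
)
```

In this instance the red point lies between the two blue points, so no single hyperplane can certify a one-group assignment, whereas splitting the blue points into two groups works.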

2.3 Mixed integer programming formulation

We now state a mixed integer programming formulation inspired by the one that constitutes the key step in the CRIO method [3]. The input of the problem is formed by the set of points $X$, the partition of point indices into $B$ and $R$, and the sets $L_B$ and $L_R$ of group indices. The objective is to find an assignment of points to groups that induces a piecewise linear separation of $X$.

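For every group $k \in L_B$ and every group $\ell \in L_R$, the formulation contains the variables $p_{k\ell} \in \mathbb{R}^d$ and $q_{k\ell} \in \mathbb{R}$, in such a way that $(p_{k\ell}, q_{k\ell})$ is the hyperplane separating the groups $k$ and $\ell$. For $i \in B$ and $k \in L_B$, the binary variable $z_{ik}$ represents whether $x_i$ is assigned to group $k$ or not. For $j \in R$ and $\ell \in L_R$, the binary variable $z_{j\ell}$ represents whether $x_j$ is assigned to group $\ell$ or not. In this setting, we can provide the formulation corresponding to the maximization of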

\[ \sum_{i \in B} \sum_{k \in L_B} z_{ik} + \sum_{j \in R} \sum_{\ell \in L_R} z_{j\ell} \qquad (1) \]

subject to

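\[
\begin{aligned}
p_{k\ell} x_i + q_{k\ell} &\leq M - (M+1) z_{ik} && \forall i \in B,\ \forall k \in L_B,\ \forall \ell \in L_R, && (2) \\
p_{k\ell} x_j + q_{k\ell} &\geq -M + (M+1) z_{j\ell} && \forall j \in R,\ \forall \ell \in L_R,\ \forall k \in L_B, && (3) \\
\sum_{k \in L_B} z_{ik} &\leq 1 && \forall i \in B, && (4) \\
\sum_{\ell \in L_R} z_{j\ell} &\leq 1 && \forall j \in R, && (5) \\
(p_{k\ell}, q_{k\ell}) &\in \mathbb{R}^{d+1} && \forall k \in L_B,\ \forall \ell \in L_R, && (6) \\
z_{ik} &\in \{0,1\} && \forall (i,k) \in (B \times L_B) \cup (R \times L_R), && (7)
\end{aligned}
\]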

where $M$ is a big positive number. Note that the feasibility of the model does not depend on the actual value of $M$: if $M$ is small then some solutions are lost, but the problem remains feasible.
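To make the role of the big-M constant concrete, the following sketch evaluates constraints (2)-(5) for a given candidate solution. It is a plain feasibility checker, not a solver, and the names `satisfies_formulation`, `planes`, and the one-dimensional data are hypothetical. When $z_{ik} = 1$ the right-hand side of (2) becomes $-1$, and when $z_{ik} = 0$ it relaxes to $M$.

```python
def value(p, q, x):
    """Evaluate p.x + q."""
    return sum(pc * xc for pc, xc in zip(p, x)) + q

def satisfies_formulation(points, B, R, L_B, L_R, z, planes, M):
    """Check constraints (2)-(5) for binary z and hyperplanes planes[(k, l)] = (p, q)."""
    for k in L_B:
        for l in L_R:
            p, q = planes[(k, l)]
            for i in B:   # constraint (2)
                if value(p, q, points[i]) > M - (M + 1) * z[(i, k)]:
                    return False
            for j in R:   # constraint (3)
                if value(p, q, points[j]) < -M + (M + 1) * z[(j, l)]:
                    return False
    if any(sum(z[(i, k)] for k in L_B) > 1 for i in B):   # constraint (4)
        return False
    if any(sum(z[(j, l)] for l in L_R) > 1 for j in R):   # constraint (5)
        return False
    return True

# Hypothetical instance: blue points at 0 and 4, a red point at 2 (on the real line).
points = {1: (0,), 2: (4,), 3: (2,)}
B, R = [1, 2], [3]
L_B, L_R = ["b1", "b2"], ["r1"]
z = {(1, "b1"): 1, (1, "b2"): 0, (2, "b1"): 0, (2, "b2"): 1, (3, "r1"): 1}
planes = {("b1", "r1"): ((1,), -1), ("b2", "r1"): ((-1,), 3)}

objective = sum(z.values())   # value of (1) for this assignment
ok_large_M = satisfies_formulation(points, B, R, L_B, L_R, z, planes, M=10)
ok_small_M = satisfies_formulation(points, B, R, L_B, L_R, z, planes, M=2)
```

With `M=10` the candidate is accepted. With `M=2` the same candidate is rejected, because the unassigned blue point at 4 evaluates to 3 on the hyperplane of groups `b1` and `r1`, which exceeds the relaxed bound; this is an instance of a solution being lost when $M$ is too small.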

Definition 1.

Given an instance of the problem, we call the convex hull of the points satisfying (2)-(7).

In addition, we can introduce a binary variable $o_i$ for each $i \in [m]$ to specify whether sample $x_i$ is an outlier or not, i.e., whether it is not assigned to any group. With these settings, the objective function can be written as

\[ \text{minimize} \ \sum_{i \in [m]} o_i \]

and constraints (4) and (5) should be replaced by the following constraints

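\[
\begin{aligned}
\sum_{k \in L_B} z_{ik} &= 1 - o_i && \forall i \in B, && (8) \\
\sum_{\ell \in L_R} z_{j\ell} &= 1 - o_j && \forall j \in R. && (9)
\end{aligned}
\]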

2.4 Integer programming formulation

We now discuss an integer programming formulation resulting from a projection of (2)-(7) onto the space of the $z$-variables. For this purpose, let $Q_{k\ell}$ denote the set of points satisfying (2)-(3) and (6)-(7) for $k \in L_B$ and $\ell \in L_R$, where $z_{Bk}$ and $z_{R\ell}$ are vectors constituted by the variables $z_{ik}$, for all $i \in B$, and $z_{j\ell}$, for all $j \in R$, respectively. Rewrite (2)-(3) for $k$ and $\ell$ as

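\[
\begin{bmatrix} X_B^\top & \mathbf{1} \\ -X_R^\top & -\mathbf{1} \end{bmatrix}
\begin{bmatrix} p_{k\ell} \\ q_{k\ell} \end{bmatrix}
+ (M+1) \cdot \begin{bmatrix} z_{Bk} \\ z_{R\ell} \end{bmatrix}
\leq M \cdot \mathbf{1},
\]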

and denote by the combination of , for all and . The projection of onto is defined as

\[ \mathrm{Proj}_z(Q_{k\ell}) = \{ (z_{Bk}, z_{R\ell}) \in \{0,1\}^{B \cup R} : \exists\, (p_{k\ell}, q_{k\ell}, z_{Bk}, z_{R\ell}) \in Q_{k\ell} \} \]

The combination of , for all and , gives , which in turn gives the set of all possibly intersecting group assignments. By Theorem 1.1 of [2], is given by the group assignments satisfying

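\[ (M+1) \Big( \sum_{i \in B} \upsilon_{ik} z_{ik} + \sum_{j \in R} \upsilon_{j\ell} z_{j\ell} \Big) \leq M \Big( \sum_{i \in B} \upsilon_{ik} + \sum_{j \in R} \upsilon_{j\ell} \Big) \]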

for all $\upsilon \geq 0$ such that

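\[
\begin{aligned}
\sum_{i \in B} \upsilon_{ik} x_i &= \sum_{j \in R} \upsilon_{j\ell} x_j, && (10) \\
\sum_{i \in B} \upsilon_{ik} &= \sum_{j \in R} \upsilon_{j\ell}. && (11)
\end{aligned}
\]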

It is worth noting that the convex hull of groups and intersect in a group assignment if and only if there exists with , , and such that (10)-(11) hold. Hence,

\[ (M+1) \Big( \sum_{i \in B} \upsilon_{ik} z_{ik} + \sum_{j \in R} \upsilon_{j\ell} z_{j\ell} \Big) \leq 2M \qquad (12) \]

prevents such a group assignment from being chosen. The integer programming formulation consists in maximizing (1) over all binary points of type (7) satisfying (4)-(5) and (12) for the extreme rays of the set defined by (10)-(11).
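A small numerical check may help to see how (12) cuts off an intersecting assignment. The sketch below assumes multipliers normalized so that each side of (11) sums to 1, which is what the right-hand side $2M$ suggests; the instance, the key layout `("B", index)`, and the function name `lhs_12` are hypothetical.

```python
from fractions import Fraction

def lhs_12(M, upsilon, z):
    """Left-hand side of (12): (M + 1) times the upsilon-weighted sum of the z values."""
    return (M + 1) * sum(weight * z[key] for key, weight in upsilon.items())

# Hypothetical 1D instance: blue points 0 and 4 in group k, red point 2 in group l.
x = {("B", 0): 0, ("B", 1): 4, ("R", 0): 2}
upsilon = {("B", 0): Fraction(1, 2), ("B", 1): Fraction(1, 2), ("R", 0): Fraction(1)}

# The multipliers satisfy (10) and (11): 0/2 + 4/2 = 2 and 1/2 + 1/2 = 1.
blue_combo = sum(w * x[key] for key, w in upsilon.items() if key[0] == "B")
red_combo = sum(w * x[key] for key, w in upsilon.items() if key[0] == "R")
blue_weight = sum(w for key, w in upsilon.items() if key[0] == "B")
red_weight = sum(w for key, w in upsilon.items() if key[0] == "R")

M = 10
all_assigned = {key: 1 for key in upsilon}   # both blue points in k, the red point in l
one_dropped = dict(all_assigned)
one_dropped[("B", 1)] = 0                    # remove the blue point at 4 from group k

cut_off = lhs_12(M, upsilon, all_assigned) > 2 * M   # 22 > 20
allowed = lhs_12(M, upsilon, one_dropped) <= 2 * M   # 33/2 <= 20
```

The assignment placing all three points in the two groups violates (12), while dropping one blue point from the group satisfies it for this value of $M$.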

A final remark is in order with respect to this integer programming formulation. The projection of onto , , can be seen as the convex hull of the group assignments in satisfying (4)-(5). In addition, the valid inequalities discussed in the next sections involve -variables only. Consequently, Corollary 2.2 of [2] can then be applied to conclude that the facetness results of those sections are valid for as well.

3 Polyhedral study

In this section we are interested in facets of , with a particular interest in combinatorial structures originating facet-inducing inequalities. For , , and , we denote by the unit vector associated with the -th coordinate of the variable vector . Correspondingly, for and , we denote by the unit vector associated with the variable . For and , let be the unit vector associated with the variable . Finally, for and , we denote by the unit vector associated with the variable .

Proposition 1.

is full-dimensional.

Proof.

In order to prove this proposition, we construct the following affinely independent feasible solutions.

1. Let be the solution having all variables set to null values. Since in b for every , then no point is assigned to any group, and all constraints are satisfied, hence b is feasible.

2. For any , , and , consider the solution , with and if , where is the -th coordinate of . This solution is feasible since all points are outliers due to the fact that all variables are null, and it is affinely independent w.r.t. the previous solutions, which have .

3. Similarly, for any and , the solution is feasible and affinely independent w.r.t. the previously-constructed solutions.

4. For any and , construct the solution . Constraint (2) for , , and takes the form (since in this solution), hence it is satisfied. The remaining constraints are trivially satisfied. Furthermore, this solution is affinely independent w.r.t. the previous solutions, which have .

5. Similarly, for any and , the solution is feasible and affinely independent w.r.t. the previous ones.

The existence of these solutions shows that is full-dimensional. ∎

The solutions constructed within the proof of Proposition 1 allow us to show the following facetness results in a quite straightforward way, hence the proof of the following proposition is omitted.

Proposition 2.
• The model constraints (4) and (5) are facet-inducing.

• The bound is facet-inducing, for every and .

• The bound is facet-inducing, for every and .

3.1 Convex-inclusion inequalities

We now explore families of valid inequalities for , and study their facetness properties. We first present a family of valid inequalities involving a point and a set of points in whose convex hull contains . In this setting, we may consider the valid inequality given by the following proposition. We adopt the notation , for any .

Proposition 3.

Let , and such that . The convex-inclusion inequality

\[ \sum_{i \in S} z_{ik} + \sum_{\ell \in L_R} z_{j\ell} \leq s \qquad (13) \]

is valid for .

Proof.

Let be a feasible solution. If is an outlier in this solution (i.e., for every ), then (13) is trivially satisfied, so assume for some . Since is contained in the convex hull of the points , then there is no hyperplane separating from . This implies that a solution having all the points in assigned to the same group would not be feasible, hence , and (13) is satisfied. ∎
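The validity argument can be sanity-checked by brute force on a tiny instance. The sketch below works on the real line, where two finite groups are linearly separable exactly when one lies strictly to the left of the other, and enumerates every assignment (including outliers, encoded as `None`). It assumes that $s$ in (13) stands for $|S|$; the data and all names are hypothetical.

```python
from itertools import product

def separable_1d(group_a, group_b):
    """On the real line, two finite groups are separable iff their hulls are disjoint."""
    if not group_a or not group_b:
        return True
    return max(group_a) < min(group_b) or max(group_b) < min(group_a)

def feasible(x_B, x_R, a_B, a_R, L_B, L_R):
    """Every blue group must be separable from every red group."""
    for k in L_B:
        blue = [x for x, g in zip(x_B, a_B) if g == k]
        for l in L_R:
            red = [x for x, g in zip(x_R, a_R) if g == l]
            if not separable_1d(blue, red):
                return False
    return True

# Hypothetical instance: the red point 2 lies in the convex hull of the blue points 0 and 4.
x_B = [0, 4]
x_R = [2, 5]
L_B = [0, 1]
L_R = [0, 1]
S = [0, 1]   # positions in x_B of the points forming S
j = 0        # position in x_R of the point contained in conv(S)

violations = 0   # feasible assignments violating (13) for some k
tight = 0        # feasible assignments satisfying (13) with equality for some k
for a_B in product(L_B + [None], repeat=len(x_B)):
    for a_R in product(L_R + [None], repeat=len(x_R)):
        if not feasible(x_B, x_R, a_B, a_R, L_B, L_R):
            continue
        for k in L_B:
            lhs = sum(1 for i in S if a_B[i] == k) + (a_R[j] is not None)
            violations += lhs > len(S)
            tight += lhs == len(S)
```

On this instance no feasible assignment violates the inequality, and some feasible assignments attain equality, in line with the proposition.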

Xavier and Campêlo [12] proposed a general facet-generating procedure that takes a valid (facet-inducing) inequality for a polytope and a valid (facet-inducing) inequality for the face of defined by , and produces a new valid (facet-inducing) inequality for by combining them. It is interesting to note that inequalities (13) are obtained with the proposed procedure by using (5) as along with , which is valid when .

If is a feasible solution, we say that a constraint is strictly satisfied by if the latter does not satisfy the constraint with equality.

Theorem 1.

Assume . The inequality (13) defines a facet of if and only if is minimal w.r.t. the property (i.e., for every ).

Proof.

Assume first that is minimal w.r.t. the property . Let be the face of defined by (13), and let and be such that for every . We shall verify that is a multiple of the coefficient vector of (13), thus showing that is a facet of . To this end, let be the solution obtained by setting for every , and all the remaining -variables to 0 (i.e., all the points in are outliers). We set for all , and the remaining -variables are set to 0. Note that constraints (2) for and corresponding to the groups and are strictly satisfied (i.e., without equality). This point is feasible and satisfies (13) with equality.

Claim 1: . For , , and , consider the solution , which is feasible if is small enough. Indeed, constraints (2) for and are satisfied since they are strictly satisfied by b. In this solution only one group is nonempty, and the solution satisfies (13) with equality. Together with b, the existence of this solution implies .

Claim 2: . Similarly, for and , the solution is feasible (for a small enough ) and also satisfies (13) with equality. Again, the combination of this solution with b implies .

Claim 3: for and . Let and consider the solution constructed by setting for , , and the remaining -variables to 0 (including ). Since , then there exists a hyperplane separating from . Let and , set for all , and set the remaining p- and -variables to 0. The solution thus constructed is feasible and satisfies (13) with equality, hence . Since and , we conclude that .

Claim 4: for and . If is not contained in , then there exists a hyperplane separating from . Otherwise, if is contained in , since the points in are affinely independent (due to the minimality of ), then there exists such that . In both cases, there exists some such that we can find a hyperplane separating from . Construct a solution by setting for all , , and for any (recall that ), and the remaining -variables to 0. Finally, set equal to the hyperplane separating from , and equal to the hyperplane separating from . This solution is feasible and satisfies (13) with equality. By resorting to Claim 1, Claim 2, and Claim 3, the existence of b and shows that .

Claim 5: for and . Take any and consider the solution obtained from by setting equal to the hyperplane separating from , and equal to the hyperplane separating from . This solution is feasible and satisfies (13) with equality. By resorting to the previous claims, the existence of and b shows that .

Claim 6: for and . Similarly to the proof of Claim 3.1, the solution is feasible and satisfies (13) with equality, so .

By combining these claims, we conclude that is a multiple of the coefficient vector of (13), hence is a facet of .

For the converse direction, suppose that for some . This implies that any solution having cannot satisfy (13) with equality, since such a solution must have for every and for some , in order to attain equality. However, since then no hyperplane separating from exists, hence such a solution cannot be feasible. This implies that every solution in the face of induced by (13) satisfies and, since is full-dimensional, such a face is not a facet of . ∎

The symmetrical inequalities considering a point and a set of points in whose convex hull includes have the same properties as (13).

3.2 Obstacle inequalities

Given two distinct points , a set is an obstacle between and if (see Figure 2). We say that is a trivial obstacle if or , and that is a minimal obstacle if , for every . We denote by the affine space generated by the points in .

The presence of an obstacle that is a subset of between two points of implies that and are not linearly separable. However, it is interesting to remark that the converse is not true. There may not exist an obstacle between two points of even when and are not linearly separable; although in such a case an obstacle will exist between some pair of points of (recall that and are not polytopes but finite sets of points). An example of this is given by the sets

\[ X_B = \{(1,1,0,0),\, (-2,1,0,0),\, (1,-2,0,0)\}, \qquad X_R = \{(0,0,1,1),\, (0,0,-2,1),\, (0,0,1,-2)\} \]

for which .
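This example can be checked directly. The sketch below assumes that an obstacle between two points means a set whose convex hull meets the segment joining them. Since the points of $X_B$ lie in the plane $x_3 = x_4 = 0$ and those of $X_R$ in the plane $x_1 = x_2 = 0$, the two hulls can only meet at the origin, so it suffices to test whether the origin belongs to each hull and to each segment between two points of the same class. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

X_B = [(1, 1, 0, 0), (-2, 1, 0, 0), (1, -2, 0, 0)]
X_R = [(0, 0, 1, 1), (0, 0, -2, 1), (0, 0, 1, -2)]

def centroid(points):
    """Average of the points, which is a convex combination of them."""
    n = len(points)
    return tuple(Fraction(sum(coords), n) for coords in zip(*points))

def segment_contains_origin(a, b):
    """Return True if the origin can be written as a + t (b - a) with 0 <= t <= 1."""
    d = [bc - ac for ac, bc in zip(a, b)]
    for ac, dc in zip(a, d):
        if dc != 0:
            t = Fraction(-ac, dc)   # the only candidate parameter
            break
    else:
        return all(ac == 0 for ac in a)   # degenerate segment (a == b)
    return 0 <= t <= 1 and all(ac + t * dc == 0 for ac, dc in zip(a, d))

origin = (0, 0, 0, 0)
hulls_meet = centroid(X_B) == origin and centroid(X_R) == origin

segment_hits = [
    segment_contains_origin(a, b)
    for cls in (X_B, X_R)
    for a, b in combinations(cls, 2)
]
```

Both centroids coincide with the origin, so the two convex hulls intersect, yet none of the six segments contains the origin, so under the assumed definition no obstacle exists between two points of the same class.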

Proposition 4.

Let be such that is an obstacle between two points , . For and , the obstacle inequality

\[ z_{j_1 \ell} + z_{j_2 \ell} + \sum_{i \in S} z_{ik} \leq s + 1 \qquad (14) \]

is valid for .

Proof.

Let be a feasible solution. Since the left-hand side of (14) contains binary variables, we need only consider the case and for every . These variable values imply that and are assigned to the group , whereas all the points in are assigned to the group . This is not possible in a feasible solution, since implies that there is no hyperplane separating from . Hence, satisfies (14). Since is an arbitrary solution, then (14) is valid for . ∎

We now explore the facetness of the obstacle inequalities. To this end, we first state the following preliminary lemmas.

Lemma 1.

Let be such that is an obstacle between , . If , then is trivial or non-minimal.

Proof.

Assume that is nontrivial. Let be a facet of containing a point of . Such a facet exists since otherwise is a polyhedron defined by a system of linear equations (this implies that is a singleton like in Figure 2(b)), contradicting either the hypothesis that is nontrivial or the hypothesis asserting that . Therefore, the set of vertices of is a proper subset of and forms an obstacle between and . ∎

Lemma 2.

Let be such that is a nontrivial minimal obstacle between , . Let , and define . Then, or .

Proof.

If , then the lemma trivially holds. Thus, assume that . If , then could be written as an affine combination of and any point x in (such an exists since is a nontrivial obstacle between and ). The situation where is analogous with the roles of and interchanged. Therefore, or implies , which contradicts Lemma 1.

So assume . It follows from that there exists a hyperplane separating from such that , , and for all . Additionally, or is violated by and by all points in simultaneously. In the former case, since holds for all points in and, in the latter case, .

Lemma 3.

Let be such that is a nontrivial minimal obstacle between , , and let , . Then, there exists such that and are linearly separable, where .

Proof.

Lemma 1 and the fact that is a nontrivial minimal obstacle yield that the intersection between and is a singleton, say (this is so because a second point in would result in ). Resorting to similar arguments, we conclude that is either empty or a singleton. The same applies to . Additionally, if then either or is empty. It follows that the intersection between