Cluster Analysis is Convex

# Cluster Analysis is Convex

Madhushini Narayana Prasad Graduate Program in Operations Research and Industrial Engineering, The University of Texas at Austin, USA Grani A. Hanasusanto Graduate Program in Operations Research and Industrial Engineering, The University of Texas at Austin, USA
###### Abstract

In this paper, we show that the popular -means clustering problem can equivalently be reformulated as a conic program of polynomial size. The arising convex optimization problem is NP-hard, but amenable to a tractable semidefinite programming (SDP) relaxation that is tighter than the current SDP relaxation schemes in the literature. In contrast to the existing schemes, our proposed SDP formulation gives rise to solutions that can be leveraged to identify the clusters. We devise a new approximation algorithm for -means clustering that utilizes the improved formulation and empirically illustrate its superiority over the state-of-the-art solution schemes.

## 1 Introduction

Given an input set of data points, cluster analysis endeavors to discover a fixed number of disjoint clusters so that the data points in the same cluster are closer to each other than to those in other clusters. Cluster analysis is fundamental to a wide array of applications in science, engineering, economics, psychology, marketing, etc. [15, 14]. One of the most popular approaches for cluster analysis is the -means clustering [21, 19, 14]. The goal of -means clustering is to partition the data points into clusters so that the sum of squared distances to the respective cluster centroids is minimized. Formally, the -means clustering seeks for a solution to the mathematical optimization problem

 minK∑i=1∑n∈Pi∥xn−ci∥2s.t.Pi⊆{1,…,N},ci∈RD∀i∈{1,…,K}ci=1|Pi|∑n∈PixnP1∪⋯∪PK={1,…,N},Pi∩Pj=∅∀i,j∈{1,…,K}:i≠j. (1)

Here, are the input data points, while are the output clusters. The vectors in (1) determine the cluster centroids, while the constraints on the last row of (1) ensure that the subsets constitute a partition of the set .

Due to its combinatorial nature, the -means clustering problem (1) is generically NP-hard [2]. A popular solution scheme for this intractable problem is the heuristic algorithm developed by Lloyd [19]. The algorithm initializes by randomly selecting the cluster centroids. It then proceeds by alternating between the assignment step and the update step. In the assignment step the algorithm designates each data point to the closest centroid, while in the update step the algorithm determines new cluster centroids according to current assignment.

Another popular approach arises in the form of convex relaxation schemes [24, 4, 26]. In this approach, tractable semidefinite programming (SDP) lower bounds for (1) are derived. Solutions of these optimization problems are then transformed into the cluster assignments via well-constructed rounding procedures. Such convex relaxation schemes have a number of theoretically appealing properties. If the data points are supported on disjoint balls then exact recovery is possible with high probability whenever the distance between any two balls is sufficiently large [4, 13]. A stronger model-free result is achievable if the cardinalities of the clusters are prescribed to the problem [26].

A closely related problem is the non-negative matrix factorization with orthogonality constraints (ONMF). Given an input data matrix , the ONMF problem seeks for non-negative matrices and so that both the product is close to in view of the Frobenius norm and the orthogonality constraint is satisfied. Although ONMF is not precisely equivalent to -means, solutions to this problem have the clustering property [10, 18, 11, 16]. In [25], it is shown that the ONMF problem is in fact equivalent to a weighted variant of the -means clustering problem.

In this paper, we attempt to obtain equivalent convex reformulations for the ONMF and the -means clustering problems. To derive these reformulations, we adapt the results by Burer and Hong [8] who showed that any (non-convex) quadratically constrained quadratic program can be reformulated as a linear program over the convex cone of completely positive matrices. The resulting optimization problem is called a completely positive program. Such a transformation does not immediately mitigate the intractability of the original problem, since solving a generic completely positive program is NP-hard. However, the complexity of the problem is now entirely absorbed in the cone of completely positive matrices which admits tractable semidefinite representable outer approximations [23, 9, 17]. Replacing the cone with these outer approximations gives rise to SDP relaxations of the original problem that in principle can be solved efficiently.

As byproducts of our derivations, we identify a new condition that makes the ONMF and the -means clustering problems equivalent and we obtain a new SDP relaxation for the -means clustering problem that is tighter than the well-known relaxation proposed by Peng and Wei [24]. The contributions of this paper can be summarized as follows.

1. We disclose a new connection between ONMF and -means clustering. We show that -means clustering is equivalent to ONMF if an additional requirement on the binarity of the solution to the latter problem is imposed. This amends the previous incorrect result by Ding et al. [10, Section 2] and Li and Ding [18, Theorem 1] who claimed that both problems are equivalent.111To our best understanding, they have shown only one of the implications that establish an equivalence.

2. We derive exact conic programming reformulations for the ONMF and -means clustering problems that are principally amenable to numerical solutions. To our best knowledge, we are the first to obtain equivalent convex reformulations for these problems.

3. In view of the equivalent convex reformulation, we derive a tighter SDP relaxation for the -means clustering problem whose solutions can be used to construct high quality estimates of the cluster assignment.

4. We devise a new approximation algorithm for the -means clustering problem that leverages the improved relaxation and numerically highlight its superiority over the state-of-the-art SDP approximation scheme and the Lloyd’s algorithm.

The remainder of the paper is structured as follows. We derive a conic programming reformulation for the ONMF problem in Section 2. We extend this result to the setting of -means clustering in Section 3. Section 4 develops an SDP relaxation and designs a new approximation algorithm for -means clustering. Finally, we empirically assess the performance of our proposed algorithm in Section 5.

### Notation:

For any , we define as the index set . We denote by the identity matrix and by the vector of all ones. We also define as the -th canonical basis vector. Their dimensions will be clear from the context. The trace of a square matrix is denoted as . We define as the diagonal matrix whose diagonal elements comprise the entries of . For any non-negative vector , we define the cardinality of all positive elements of by . For any matrix , we denote by the vector that corresponds to the -th column of . The set of all symmetric matrices in is denoted as , while the cone of positive semidefinite matrices in is denoted as . The cone of completely positive matrices over a set is denoted as . For any and any cone , the relations and denote that is an element of and , respectively. The -dimensional second-order cone is defined as . We denote by the intersection of the -dimensional second-order cone and the non-negative orthant.

## 2 Orthogonal Non-Negative Matrix Factorization

In this section, we consider the ONMF problem given by

 min∥X−FU⊤∥2Fs.t.F∈RD×K+,U∈RN×K+U⊤U=I. (2)

Here, is a matrix whose columns comprise the data points in . We remark that the problem (2) is generically intractable since we are minimizing a non-convex quadratic objective function over the Stiefel manifold [1, 3]. By expanding the Frobenius norm in the objective function and noting that , we find that (2) is equivalent to the problem

 min\textuptr(X⊤X−2XUF⊤+F⊤F)s.t.F∈RD×K+,U∈RN×K+U⊤U=I. (3)

If all elements of are non-negative, then we can reduce (2) into a simpler problem involving only the decision matrix .

###### Lemma 1.

If is a non-negative matrix then the problem (2) is equivalent to the non-convex program

 min\textuptr(X⊤X−X⊤XUU⊤)s.t.U∈RN×K+U⊤U=I. (4)
###### Proof.

Solving the minimization over analytically in (3), we find that the solution is feasible and optimal. Substituting this solution into the objective function of (3), we arrive at the equivalent problem (4). This completes the proof. ∎

In the following, we first derive a convex reformulation for the simpler problem (4). We remark that this problem is still intractable due to the non-convexity of the objective function and the constraint system. Thus, any resulting convex formulation will in general remain intractable. To derive our reformulation, we rely on the following lemma which is proven in [6].

###### Lemma 2 ([6, Lemma 2.2]).

Let the non-negative matrix and vector satisfy the linear constraints

 f⊤ℓz=gℓ,f⊤ℓZfℓ=g2ℓ∀ℓ∈L,

where the constraint system , , defines a bounded polytope in . Consider the completely positive matrix decomposition

 [Zzz⊤1]=∑t∈T[ζtηt][ζtηt]⊤,

with and for all , and define the subsets and of . Then we have

1. for all and ;

2. for all .

We are now ready to state our first main result.

###### Theorem 1.

Problem (4) is equivalent to the following completely positive program.

 min\textuptr(X⊤X)−∑i∈[K]\textuptr(X⊤XVii)s.t.pi∈SOCN+1+,Qij∈R(N+1)×(N+1)+,ui∈RN+,Vij∈RN×N+∀i,j∈[K]pi=[ui1],Qij=[Vijuiu⊤j1]∀i,j∈[K]\textuptr(Vii)=1∀i∈[K]\textuptr(Vij)=0∀i,j∈[K]:i≠j⎡⎢ ⎢ ⎢ ⎢ ⎢⎣Q11⋯Q1Kp1⋮⋱⋮⋮QK1⋯QKKpKp⊤1⋯p⊤K1⎤⎥ ⎥ ⎥ ⎥ ⎥⎦∈C((SOCN+1+)K×R+) (5)
###### Proof.

By employing the notation for column vectors , we can reformulate (4) equivalently as the problem

 min\textuptr(X⊤X)−∑i∈[I]\textuptr(X⊤Xuiu⊤i)s.t.U∈RN×K+u⊤iui=1∀i∈[K]u⊤iuj=0∀i,j∈[K]:i≠j. (6)

Next, we introduce the auxiliary decision variables , , , that satisfy

 pi=[ui1],Qij=[uiu⊤juiu⊤j1]∀i,j∈[K].

Since , and consequently , we can without loss of generality impose the vector to reside in the cone . Thus, we obtain the equivalent reformulation

 min\textuptr(X⊤X)−∑i∈[K]\textuptr(X⊤XVii)s.t.pi∈SOCN+1+,Qij∈R(N+1)×(N+1)+,ui∈RN+,Vij∈RN×N+∀i,j∈[K]pi=[ui1],Qij=[Vijuiu⊤j1]∀i,j∈[K]\textuptr(Vii)=1∀i∈[K]\textuptr(Vij)=0∀i,j∈[K]:i≠j⎡⎢ ⎢ ⎢ ⎢ ⎢⎣Q11⋯Q1Kp1⋮⋱⋮⋮QK1⋯QKKpKp⊤1⋯p⊤K1⎤⎥ ⎥ ⎥ ⎥ ⎥⎦=⎡⎢ ⎢ ⎢ ⎢ ⎢⎣p1⋮pK1⎤⎥ ⎥ ⎥ ⎥ ⎥⎦⎡⎢ ⎢ ⎢ ⎢ ⎢⎣p1⋮pK1⎤⎥ ⎥ ⎥ ⎥ ⎥⎦⊤. (7)

The last constraint of (7) implies that the left-hand side matrix has rank . If we relax this constraint to the requirement that the matrix should merely belong to the completely positive cone , then we arrive at the completely positive program (5). The arising problem is a relaxation whose optimal value constitutes a lower bound on the optimal value of (7). Exactness of this reformulation is then obtained by adapting the procedures in [8]. Since we will employ and extend the same reformulation techniques in the subsequent section, we provide the derivations for the specific instance above as follows.

We first consider the completely positive decomposition of the rank-1 matrix given by

 ⎡⎢ ⎢ ⎢ ⎢ ⎢⎣Q11⋯Q1Kp1⋮⋱⋮⋮QK1⋯QKKpKp⊤1⋯p⊤K1⎤⎥ ⎥ ⎥ ⎥ ⎥⎦=∑t∈T⎡⎢ ⎢ ⎢ ⎢ ⎢⎣θt1⋮θtKηt⎤⎥ ⎥ ⎥ ⎥ ⎥⎦⎡⎢ ⎢ ⎢ ⎢ ⎢⎣θt1⋮θtKηt⎤⎥ ⎥ ⎥ ⎥ ⎥⎦⊤,

where is a finite index set, and and for every and . We next partition the index set into the sets and , which gives rise to the decomposition

 ⎡⎢ ⎢ ⎢ ⎢ ⎢⎣Q11⋯Q1Kp1⋮⋱⋮⋮QK1⋯QKKpKp⊤1⋯p⊤K1⎤⎥ ⎥ ⎥ ⎥ ⎥⎦=∑t∈T+η2t⎡⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢⎣θt1ηt⋮θtKηt1⎤⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥⎦⎡⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢⎣θt1ηt⋮θtKηt1⎤⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥⎦⊤+∑t∈T0⎡⎢ ⎢ ⎢ ⎢ ⎢⎣θt1⋮θtK0⎤⎥ ⎥ ⎥ ⎥ ⎥⎦⎡⎢ ⎢ ⎢ ⎢ ⎢⎣θt1⋮θtK0⎤⎥ ⎥ ⎥ ⎥ ⎥⎦⊤. (8)

For every and , we partition the vector into the constituents and such that

 θti=[atibti].

Since , we have by construction

 ∥∥∥atiηt∥∥∥≤btiηt∀t∈T+ and ∥ati∥≤bti∀t∈T0. (9)

Next, for every we consider the following completely positive decomposition involving only the matrix and the vector in (8).

 [Qiipip⊤i1]=⎡⎢ ⎢⎣Viiuiuiu⊤i11u⊤i11⎤⎥ ⎥⎦=∑t∈T+η2t⎡⎢ ⎢ ⎢ ⎢⎣atiηtbtiηt1⎤⎥ ⎥ ⎥ ⎥⎦⎡⎢ ⎢ ⎢ ⎢⎣atiηtbtiηt1⎤⎥ ⎥ ⎥ ⎥⎦⊤+∑t∈T0⎡⎢ ⎢⎣atibti0⎤⎥ ⎥⎦⎡⎢ ⎢⎣atibti0⎤⎥ ⎥⎦⊤ (10)

Lemma 2 then implies that

 btiηt=1∀t∈T+ and bti=0∀t∈T0.

Substituting these identities into (9), we obtain

 (11)

and

 ati=0∀t∈T0. (12)

Next, the constraint in (5) together with the decomposition in (10) gives rise to

 ∑t∈T+η2t(atiηt)⊤atiηt+∑t∈T0(ati)⊤ati=1.

Thus, (12) yields

 ∑t∈T+η2t(atiηt)⊤atiηt=1.

Since , the point on the right-hand-side of the equation lies in the convex hull of the scalars , . Thus, in view of the inequalities in (11), we must have for all and . A similar argument allows us to further conclude that the penultimate constraint in (5) yields for all and such that . In summary, we find that for any index the solution satisfying

 ^ui=atiηt∀i∈[K] (13)

is feasible to the original problem (6). Verifying the objective value of the relaxed problem (5), we further find that it constitutes a convex combination of the objective values of (6) evaluated at the respective points in (13), i.e.,

 \textuptr(X⊤X)−∑i∈[K]\textuptr(X⊤XVii)=\textuptr(X⊤X)−∑i∈[K]\textuptr⎛⎝X⊤X∑t∈T+η2tatiηtatiηt⊤⎞⎠=∑t∈T+η2t⎛⎝\textuptr(X⊤X)−∑i∈[K]\textuptr⎛⎝X⊤Xatiηtatiηt⊤⎞⎠⎞⎠.

Thus, there exists an index such that

 \textuptr(X⊤X)−∑i∈[K]\textuptr(X⊤XVii)≥\textuptr(X⊤X)−∑i∈[K]\textuptr⎛⎝X⊤Xat⋆iηt⋆at⋆iηt⋆⊤⎞⎠.

As the solution in (13) corresponding to is feasible in (6), we conclude that the optimal value of (5) constitutes an upper bound on the optimal value of (6). Our previous argument that (5) is a relaxation of (6) thus implies that both problems are indeed equivalent. This completes the proof.

By employing the same reformulation techniques as in the proof of Theorem 1, we can show that the generic problem (2) is amenable to an exact convex reformulation.

###### Proposition 1.

Problem (2) is equivalent to the following completely positive program.

 min\textuptr(X⊤X)+∑i∈[K]\textuptr(Gii−2XWii)s.t.pi∈SOCN+1+×RD+,Qij∈R(N+1+D)×(N+1+D)+∀i,j∈[K]ui∈RN+,Vij∈RN×N+,hi∈RD+,Gjj∈RD×D+,Wij∈RN×D+∀i,j∈[K]pi=⎡⎢⎣ui1hi⎤⎥⎦,Qij=⎡⎢ ⎢⎣VijuiWiju⊤j1h⊤jW⊤jihiGij⎤⎥ ⎥⎦∀i,j∈[K]\textuptr(Vii)=1∀i∈[K]\textuptr(Vij)=0∀i,j∈[K]:i≠j⎡⎢ ⎢ ⎢ ⎢ ⎢⎣Q11⋯Q1Kp1⋮⋱⋮⋮QK1⋯QKKpKp⊤1⋯p⊤K1⎤⎥ ⎥ ⎥ ⎥ ⎥⎦∈C((SOCN+1+×RD+)K×R+) (14)

## 3 K-means Clustering

Building upon the results in the previous section, we now derive an exact completely positive programming reformulation for the -means clustering problem (1). To this end, we note that the problem can equivalently be solved via the following mixed-integer nonlinear program [12].

 Z⋆=min∑i∈[K]∑n:πin=1∥xn−ci∥2s.t.πi∈{0,1}N,ci∈RD∀i∈[K]ci=1e⊤πi∑n:πin=1xn∀i∈[K]e⊤πi≥1∀i∈[K]∑i∈[K]πi=e (15)

Here, is the centroid of the -th cluster, while is the assignment vector for the -th cluster, i.e., if and only if the data point is assigned to the cluster . The last constraint in (15) ensures that each data point is assigned to a cluster, while the constraint system in the penultimate row ensures that there are exactly clusters. We now show that we can solve the -means clustering problem by solving a modified problem (4) with an additional constraint .

###### Theorem 2.

The following non-convex program solves the -means clustering problem.

 Z⋆=min\textuptr(X⊤X)−∑i∈[K]\textuptr(X⊤Xuiu⊤i)s.t.U∈RN×K+u⊤iui=1∀i∈[K]u⊤iuj=0∀i,j∈[K]:i≠j∑i∈[K]uiu⊤ie=e (Z)
###### Proof.

We first observe that the centroids in (15) can be expressed as

 ci=1e⊤πi∑n∈[N]πinxn∀i∈[K].

Substituting these terms into the objective function and expanding the squared norm yield

Thus, (15) can be simplified into the problem

 min\textuptr(X⊤X)−∑i∈[K]1e⊤πi\textuptr(X⊤Xπiπ⊤i)s.t.πi∈{0,1}N∀i∈[K]e⊤πi≥1∀i∈[K]∑i∈[K]πi=e. (16)

For any feasible solution to (16) we define the vectors that satisfy

 ui=πi√e⊤πi∀i∈[K].

We argue that the solution is feasible to () and yields the same objective value. Indeed, we have

 u⊤iui=π⊤iπie⊤πi=1∀i∈[K]

because and for all . We also have

 ∑i∈[K]uiu⊤ie=∑i∈[K]πi√e⊤πie⊤πi√e⊤πi=e,

and

 u⊤iuj=0∀i,j∈[K]:i≠j

since the constraint in (16) ensures that each data point is assigned to at most cluster. Verifying the objective value of this solution, we obtain

 \textuptr(X⊤X)−∑i∈[K]\textuptr(X⊤Xuiu⊤i)=\textuptr(X⊤X)−∑i∈[K]1e⊤πi\textuptr(X⊤Xπiπ⊤i).

Thus, we conclude that problem () constitutes a relaxation of (16).

To show that () is indeed an exact reformulation, consider any feasible solution to this problem. For any fixed , the complementary constraint in () means that

 uin>0⟹ujn=0andujn>0⟹uin=0for alln∈[N].

Thus, in view of the last constraint in (), we must have for every . Using this observation, we next define the binary vectors that satisfy

 πi=uiu⊤ie∈{0,1}N∀i∈[K].

For every , we find that since . Furthermore, we have

 ∑i∈[K]πi=∑i∈[K]uiu⊤ie=e.

Substituting the constructed solution into the objective function of (16), we obtain

 \textuptr(X⊤X)−∑i∈[K]1e⊤πi\textuptr(X⊤Xπiπ⊤i)=\textuptr(X⊤X)−∑i∈[K](u⊤ie)2e⊤uiu⊤ie\textuptr(X⊤Xuiu⊤i)=\textuptr(X⊤X)−∑i∈[K]\textuptr(X⊤Xuiu⊤i).

Thus, any feasible solution to () can be used to construct a feasible solution to (16) that yields the same objective value. Our previous argument that (16) is a relaxation of () then implies that both problems are indeed equivalent. This completes the proof. ∎

###### Remark 1.

The constraint in () ensures that there are no fractional values in the resulting cluster assignment vectors . While the formulation (4) is only applicable for instances of ONMF with non-negative input data , the reformulation () remains valid for any instances of -means clustering, even if the input data matrix contains negative elements.

###### Remark 2.

We can reformulate the objective function of () as , where is the matrix with elements , . To obtain this reformulation, define . Then we have

 12\textuptr(DY)=12∑p,q∈[N]∥xp−xq∥2Ypq=12∑p,q∈[N](x⊤pxp+x⊤qxq−2x⊤pxq)Ypq=12⎛⎝2∑p∈[N]∑q∈[N]x⊤pxpYpq⎞⎠−∑p,q∈[N]x⊤pxqYpq=⎛⎝∑p∈[N]x⊤pxp⎞⎠−⎛⎝∑p,q∈[N]x⊤pxqYpq⎞⎠=\textuptr(X⊤X)−\textuptr(X⊤XY).

Here, the fourth equality holds because of the last constraint of () which ensures that for all .

We are now well-positioned to derive an equivalent completely positive program for the -means clustering problem. To simplify our notation we will employ the sets

 U(N,K)={U∈RN×K+:u⊤iui=1∀i∈[K],u⊤iuj=0∀i,j∈[K]:i≠j} and
 V(N,K)={(Vij)i,j∈[K]∈RN2×K2+:\textuptr(Vii)=1∀i∈[K],\textuptr(Vij)=0∀i,j∈[K]:i≠j}

in all reformulations in the remainder of this section.

###### Theorem 3.

The following completely positive program solves the -means clustering problem.

 Z⋆=min\textuptr(X⊤X)−∑i∈[K]\textuptr(X⊤XVii)s.t.pi∈SOCN+