Cluster Analysis is Convex
Abstract
In this paper, we show that the popular K-means clustering problem can equivalently be reformulated as a conic program of polynomial size. The arising convex optimization problem is NP-hard, but amenable to a tractable semidefinite programming (SDP) relaxation that is tighter than the current SDP relaxation schemes in the literature. In contrast to the existing schemes, our proposed SDP formulation gives rise to solutions that can be leveraged to identify the clusters. We devise a new approximation algorithm for K-means clustering that utilizes the improved formulation and empirically illustrate its superiority over the state-of-the-art solution schemes.
1 Introduction
Given an input set of data points, cluster analysis endeavors to discover a fixed number of disjoint clusters so that the data points in the same cluster are closer to each other than to those in other clusters. Cluster analysis is fundamental to a wide array of applications in science, engineering, economics, psychology, marketing, etc. [15, 14]. One of the most popular approaches for cluster analysis is K-means clustering [21, 19, 14]. The goal of K-means clustering is to partition the data points into clusters so that the sum of squared distances to the respective cluster centroids is minimized. Formally, K-means clustering seeks a solution to the mathematical optimization problem
(1)	$$\begin{array}{cl} \displaystyle \min_{P_1, \dots, P_K,\; c_1, \dots, c_K} & \displaystyle \sum_{i=1}^{K} \sum_{x_j \in P_i} \| x_j - c_i \|_2^2 \\ \text{s.t.} & c_i \in \mathbb{R}^d \quad \forall i \in [K] \\ & P_1 \cup \cdots \cup P_K = \{x_1, \dots, x_N\}, \quad P_i \cap P_{i'} = \emptyset \quad \forall i \neq i' \end{array}$$
Here, $x_1, \dots, x_N \in \mathbb{R}^d$ are the input data points, while $P_1, \dots, P_K$ are the output clusters. The vectors $c_1, \dots, c_K$ in (1) determine the cluster centroids, while the constraints on the last row of (1) ensure that the subsets $P_1, \dots, P_K$ constitute a partition of the set $\{x_1, \dots, x_N\}$.
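To make the formulation concrete (the variable names below are illustrative, not the paper's notation), the following sketch evaluates the objective of (1) for a candidate partition; for a fixed partition, choosing each centroid as the cluster mean can only decrease the objective:

```python
import numpy as np

def kmeans_objective(X, partition, centroids):
    """Sum of squared distances from each point to its cluster centroid.

    X         : (N, d) array whose rows are the data points
    partition : list of K disjoint index lists covering {0, ..., N-1}
    centroids : (K, d) array of candidate centroids
    """
    return sum(np.sum((X[idx] - c) ** 2)
               for idx, c in zip(partition, centroids))

# Tiny instance: two groups of two points in the plane.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
partition = [[0, 1], [2, 3]]

# For a fixed partition, the per-cluster means minimize the objective.
means = np.array([X[idx].mean(axis=0) for idx in partition])
assert kmeans_objective(X, partition, means) <= \
    kmeans_objective(X, partition, np.zeros((2, 2)))
```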
Due to its combinatorial nature, the K-means clustering problem (1) is generically NP-hard [2]. A popular solution scheme for this intractable problem is the heuristic algorithm developed by Lloyd [19]. The algorithm is initialized by randomly selecting the cluster centroids. It then proceeds by alternating between an assignment step and an update step. In the assignment step the algorithm designates each data point to the closest centroid, while in the update step the algorithm recomputes the cluster centroids according to the current assignment.
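The two alternating steps can be sketched as follows (a minimal NumPy sketch of Lloyd's heuristic with a simple empty-cluster safeguard; not the paper's implementation):

```python
import numpy as np

def lloyd(X, K, seed=0, max_iter=100):
    """Minimal sketch of Lloyd's heuristic for K-means.

    X is an (N, d) array of data points; returns a label per point and
    the K centroids. Alternates an assignment step (closest centroid)
    with an update step (centroid = mean of the assigned points).
    """
    rng = np.random.default_rng(seed)
    # Initialization: pick K distinct data points as starting centroids.
    centroids = X[rng.choice(len(X), size=K, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        # Assignment step: designate each point to its closest centroid.
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Update step: recompute each centroid from the current assignment,
        # keeping the old centroid if a cluster happens to be empty.
        new = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                        else centroids[i] for i in range(K)])
        if np.allclose(new, centroids):
            break  # assignments are stable: a local optimum
        centroids = new
    return labels, centroids

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
labels, centroids = lloyd(X, K=2)
```

At termination each point is assigned to its nearest returned centroid, which is exactly the fixed-point property that makes Lloyd's algorithm a local (but not global) search for (1).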
Another popular approach arises in the form of convex relaxation schemes [24, 4, 26]. In this approach, tractable semidefinite programming (SDP) lower bounds for (1) are derived. Solutions of these optimization problems are then transformed into cluster assignments via well-constructed rounding procedures. Such convex relaxation schemes have a number of theoretically appealing properties. If the data points are supported on disjoint balls, then exact recovery is possible with high probability whenever the distance between any two balls is sufficiently large [4, 13]. A stronger model-free result is achievable if the cardinalities of the clusters are prescribed to the problem [26].
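For reference, the relaxation of Peng and Wei [24] replaces the partition by a normalized clustering matrix $Z$; the sketch below writes its feasible set in the standard form ($Z$ entrywise nonnegative, $Z\mathbf{e} = \mathbf{e}$, $\operatorname{tr}(Z) = K$, $Z$ positive semidefinite) and checks that every genuine partition induces a feasible $Z$ whose relaxed objective equals the K-means objective of that partition:

```python
import numpy as np

def clustering_matrix(partition, N):
    """Normalized clustering matrix: Z[j, l] = 1/|P_i| if j, l share cluster i."""
    Z = np.zeros((N, N))
    for idx in partition:
        Z[np.ix_(idx, idx)] = 1.0 / len(idx)
    return Z

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
partition = [[0, 1], [2, 3]]
Z = clustering_matrix(partition, len(X))

# Feasibility for the relaxation: Z >= 0, Z e = e, tr(Z) = K, Z PSD.
assert np.all(Z >= 0)
assert np.allclose(Z @ np.ones(4), np.ones(4))
assert np.isclose(np.trace(Z), 2.0)
assert np.all(np.linalg.eigvalsh(Z) >= -1e-9)

# The relaxed objective (1/2) <D, Z>, with D the matrix of pairwise
# squared distances, reproduces the K-means objective of the partition.
D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
obj = 0.5 * np.sum(D * Z)
```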
A closely related problem is nonnegative matrix factorization with orthogonality constraints (ONMF). Given an input data matrix, the ONMF problem seeks nonnegative factor matrices whose product is close to the input matrix in view of the Frobenius norm, while one of the factors satisfies an orthogonality constraint. Although ONMF is not precisely equivalent to K-means, solutions to this problem have the clustering property [10, 18, 11, 16]. In [25], it is shown that the ONMF problem is in fact equivalent to a weighted variant of the K-means clustering problem.
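The clustering property is easy to verify numerically: nonnegativity together with orthonormal columns forces the columns to have disjoint supports, so each row carries at most one nonzero entry and thus encodes a cluster label (a small sketch with made-up numbers):

```python
import numpy as np

# A nonnegative matrix with orthonormal columns (V^T V = I): its columns
# are normalized indicator vectors of disjoint index sets.
V = np.array([
    [1 / np.sqrt(2), 0.0],
    [1 / np.sqrt(2), 0.0],
    [0.0, 1 / np.sqrt(3)],
    [0.0, 1 / np.sqrt(3)],
    [0.0, 1 / np.sqrt(3)],
])
assert np.allclose(V.T @ V, np.eye(2))

# Nonnegativity plus orthogonality forces disjoint supports: for i != j,
# 0 = <v_i, v_j> is a sum of nonnegative terms, so v_i * v_j = 0 entrywise.
assert np.allclose(V[:, 0] * V[:, 1], 0.0)

# Hence each row has at most one nonzero entry, which acts as a cluster label.
labels = V.argmax(axis=1)
```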
In this paper, we attempt to obtain equivalent convex reformulations for the ONMF and the K-means clustering problems. To derive these reformulations, we adapt the results by Burer and Hong [8], who showed that any (nonconvex) quadratically constrained quadratic program can be reformulated as a linear program over the convex cone of completely positive matrices. The resulting optimization problem is called a completely positive program. Such a transformation does not immediately mitigate the intractability of the original problem, since solving a generic completely positive program is NP-hard. However, the complexity of the problem is now entirely absorbed in the cone of completely positive matrices, which admits tractable semidefinite-representable outer approximations [23, 9, 17]. Replacing the cone with these outer approximations gives rise to SDP relaxations of the original problem that can, in principle, be solved efficiently.
As byproducts of our derivations, we identify a new condition under which the ONMF and K-means clustering problems are equivalent, and we obtain a new SDP relaxation for the K-means clustering problem that is tighter than the well-known relaxation proposed by Peng and Wei [24]. The contributions of this paper can be summarized as follows.

We disclose a new connection between ONMF and K-means clustering. We show that K-means clustering is equivalent to ONMF if an additional requirement on the binarity of the solution to the latter problem is imposed. This amends a previous incorrect result by Ding et al. [10, Section 2] and Li and Ding [18, Theorem 1], who claimed that both problems are equivalent.^1

^1 To the best of our understanding, they have shown only one of the implications that establish an equivalence.

We derive exact conic programming reformulations for the ONMF and K-means clustering problems that are, in principle, amenable to numerical solution. To the best of our knowledge, we are the first to obtain equivalent convex reformulations for these problems.

In view of the equivalent convex reformulation, we derive a tighter SDP relaxation for the K-means clustering problem whose solutions can be used to construct high-quality estimates of the cluster assignment.

We devise a new approximation algorithm for the K-means clustering problem that leverages the improved relaxation, and we numerically highlight its superiority over the state-of-the-art SDP approximation scheme and Lloyd's algorithm.
The remainder of the paper is structured as follows. We derive a conic programming reformulation for the ONMF problem in Section 2. We extend this result to the setting of K-means clustering in Section 3. Section 4 develops an SDP relaxation and designs a new approximation algorithm for K-means clustering. Finally, we empirically assess the performance of our proposed algorithm in Section 5.
Notation:
For any $N \in \mathbb{N}$, we define $[N]$ as the index set $\{1, \dots, N\}$. We denote by $\mathbb{I}$ the identity matrix and by $\mathbf{e}$ the vector of all ones. We also define $\mathbf{e}_i$ as the $i$-th canonical basis vector. Their dimensions will be clear from the context. The trace of a square matrix $A$ is denoted as $\operatorname{tr}(A)$. We define $\operatorname{diag}(v)$ as the diagonal matrix whose diagonal elements comprise the entries of the vector $v$. For any nonnegative vector $v$, we denote the cardinality of all positive elements of $v$ by $\|v\|_0$. For any matrix $A$, we denote by $a_i$ the vector that corresponds to the $i$-th column of $A$. The set of all symmetric matrices in $\mathbb{R}^{N \times N}$ is denoted as $\mathbb{S}^N$, while the cone of positive semidefinite matrices in $\mathbb{S}^N$ is denoted as $\mathbb{S}^N_+$. The cone of completely positive matrices over a set $\mathcal{K}$ is denoted as $\mathcal{C}(\mathcal{K})$. For any $x, y$ and any cone $\mathcal{K}$, the relations $x \succeq_{\mathcal{K}} y$ and $x \preceq_{\mathcal{K}} y$ denote that $x - y$ is an element of $\mathcal{K}$ and that $y - x$ is an element of $\mathcal{K}$, respectively. The $N$-dimensional second-order cone is defined as $\mathcal{L}^N = \{(x, t) \in \mathbb{R}^{N-1} \times \mathbb{R} : \|x\|_2 \le t\}$. We denote by $\mathcal{L}^N_+$ the intersection of the $N$-dimensional second-order cone and the nonnegative orthant.
2 Orthogonal Non-Negative Matrix Factorization
In this section, we consider the ONMF problem given by
(2) 
Here, $X$ is a matrix whose columns comprise the input data points. We remark that problem (2) is generically intractable since we are minimizing a nonconvex quadratic objective function over the Stiefel manifold [1, 3]. By expanding the Frobenius norm in the objective function and exploiting the orthogonality constraint, we find that (2) is equivalent to the problem
(3) 
If all elements of $X$ are nonnegative, then we can reduce (2) to a simpler problem involving only one of the decision matrices.
Lemma 1.
If $X$ is a nonnegative matrix, then problem (2) is equivalent to the nonconvex program
(4) 
Proof.
In the following, we first derive a convex reformulation for the simpler problem (4). We remark that this problem is still intractable due to the nonconvexity of the objective function and of the constraint system. Thus, any resulting convex reformulation will in general remain intractable. To derive our reformulation, we rely on the following lemma, which is proven in [6].
Lemma 2 ([6, Lemma 2.2]).
Let the nonnegative matrix and vector satisfy the linear constraints
where the constraint system , , defines a bounded polytope in . Consider the completely positive matrix decomposition
with and for all , and define the subsets and of . Then we have

for all and ;

for all .
We are now ready to state our first main result.
Theorem 1.
Problem (4) is equivalent to the following completely positive program.
(5) 
Proof.
By employing the notation for column vectors , we can reformulate (4) equivalently as the problem
(6) 
Next, we introduce the auxiliary decision variables , , , that satisfy
Since , and consequently , we can without loss of generality impose the vector to reside in the cone . Thus, we obtain the equivalent reformulation
(7) 
The last constraint of (7) implies that the left-hand side matrix has rank 1. If we relax this constraint to the requirement that the matrix should merely belong to the completely positive cone, then we arrive at the completely positive program (5). The arising problem is a relaxation whose optimal value constitutes a lower bound on the optimal value of (7). Exactness of this reformulation is then obtained by adapting the procedures in [8]. Since we will employ and extend the same reformulation techniques in the subsequent section, we provide the derivations for this specific instance as follows.
We first consider the completely positive decomposition of the rank-1 matrix given by
where is a finite index set, and and for every and . We next partition the index set into the sets and , which gives rise to the decomposition
(8) 
For every and , we partition the vector into the constituents and such that
Since , we have by construction
(9) 
Next, for every we consider the following completely positive decomposition involving only the matrix and the vector in (8).
(10) 
Lemma 2 then implies that
Substituting these identities into (9), we obtain
(11) 
and
(12) 
Next, the constraint in (5) together with the decomposition in (10) gives rise to
Thus, (12) yields
Since , the point on the right-hand side of the equation lies in the convex hull of the scalars , . Thus, in view of the inequalities in (11), we must have for all and . A similar argument allows us to further conclude that the penultimate constraint in (5) yields for all and such that . In summary, we find that for any index the solution satisfying
(13) 
is feasible to the original problem (6). Verifying the objective value of the relaxed problem (5), we further find that it constitutes a convex combination of the objective values of (6) evaluated at the respective points in (13), i.e.,
Thus, there exists an index such that
As the solution in (13) corresponding to is feasible in (6), we conclude that the optimal value of (5) constitutes an upper bound on the optimal value of (6). Our previous argument that (5) is a relaxation of (6) thus implies that both problems are indeed equivalent. This completes the proof. ∎
By employing the same reformulation techniques as in the proof of Theorem 1, we can show that the generic problem (2) is amenable to an exact convex reformulation.
Proposition 1.
Problem (2) is equivalent to the following completely positive program.
(14) 
3 K-means Clustering
Building upon the results of the previous section, we now derive an exact completely positive programming reformulation for the K-means clustering problem (1). To this end, we note that the problem can equivalently be solved via the following mixed-integer nonlinear program [12].
(15)	$$\begin{array}{cl} \displaystyle \min_{\pi_1, \dots, \pi_K,\; c_1, \dots, c_K} & \displaystyle \sum_{i=1}^{K} \sum_{j=1}^{N} \pi_{ij} \, \| x_j - c_i \|_2^2 \\ \text{s.t.} & c_i \in \mathbb{R}^d, \;\; \pi_i \in \{0, 1\}^N \quad \forall i \in [K] \\ & \mathbf{e}^\top \pi_i \ge 1 \quad \forall i \in [K] \\ & \displaystyle \sum_{i=1}^{K} \pi_i = \mathbf{e} \end{array}$$
Here, $c_i$ is the centroid of the $i$-th cluster, while $\pi_i$ is the assignment vector for the $i$-th cluster, i.e., $\pi_{ij} = 1$ if and only if the data point $x_j$ is assigned to the cluster $i$. The last constraint in (15) ensures that each data point is assigned to a cluster, while the constraint system in the penultimate row ensures that there are exactly $K$ clusters. We now show that we can solve the K-means clustering problem by solving a modified version of problem (4) that includes an additional binarity constraint.
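Because (15) is a finite problem, tiny instances can be solved exactly by enumerating all label vectors, which provides a convenient ground truth when experimenting with heuristics and relaxations (a brute-force sketch, exponential in the number of data points; not part of the paper's method):

```python
import itertools
import numpy as np

def kmeans_bruteforce(X, K):
    """Exact K-means on tiny instances by enumerating all label vectors."""
    N = len(X)
    best_val, best_labels = np.inf, None
    for labels in itertools.product(range(K), repeat=N):
        labels = np.array(labels)
        # Skip assignments that do not use all K clusters.
        if len(np.unique(labels)) < K:
            continue
        # Objective with each centroid fixed to its cluster mean.
        val = sum(
            np.sum((X[labels == i] - X[labels == i].mean(axis=0)) ** 2)
            for i in range(K)
        )
        if val < best_val:
            best_val, best_labels = val, labels
    return best_val, best_labels

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
val, labels = kmeans_bruteforce(X, 2)
```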
Theorem 2.
The following nonconvex program solves the K-means clustering problem.
() 
Proof.
We first observe that the centroids in (15) can be expressed as
Substituting these terms into the objective function and expanding the squared norm yield
Thus, (15) can be simplified to the problem
(16) 
For any feasible solution to (16) we define the vectors that satisfy
We argue that the solution is feasible to () and yields the same objective value. Indeed, we have
because and for all . We also have
and
since the constraint in (16) ensures that each data point is assigned to at most one cluster. Verifying the objective value of this solution, we obtain
Thus, we conclude that problem () constitutes a relaxation of (16).
To show that () is indeed an exact reformulation, consider any feasible solution to this problem. For any fixed , the complementary constraint in () means that
Thus, in view of the last constraint in (), we must have for every . Using this observation, we next define the binary vectors that satisfy
For every , we find that since . Furthermore, we have
Substituting the constructed solution into the objective function of (16), we obtain
Thus, any feasible solution to () can be used to construct a feasible solution to (16) that yields the same objective value. Our previous argument that (16) is a relaxation of () then implies that both problems are indeed equivalent. This completes the proof. ∎
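The centroid substitution at the start of the proof can be checked numerically: fixing each centroid to the mean of its cluster collapses the objective of (15) to a constant minus a term that depends only on the assignment vectors (symbols and shapes in this sketch are illustrative):

```python
import numpy as np

# Data matrix with the points x_j as columns; two binary assignment vectors.
X = np.array([[0.0, 0.0, 5.0, 5.0],
              [0.0, 1.0, 5.0, 6.0]])           # shape (d, N)
pis = [np.array([1.0, 1.0, 0.0, 0.0]),
       np.array([0.0, 0.0, 1.0, 1.0])]

# Left-hand side: sum of squared distances to the cluster means.
lhs = 0.0
for pi in pis:
    c = X @ pi / pi.sum()                      # centroid = X pi / (e^T pi)
    lhs += sum(pi[j] * np.sum((X[:, j] - c) ** 2) for j in range(4))

# Right-hand side: tr(X^T X) minus the assignment-dependent term
# pi^T X^T X pi / (e^T pi), summed over the clusters.
G = X.T @ X                                    # Gram matrix X^T X
rhs = np.trace(G) - sum(pi @ G @ pi / pi.sum() for pi in pis)

assert np.isclose(lhs, rhs)
```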
Remark 1.
The constraint in () ensures that there are no fractional values in the resulting cluster assignment vectors. While the formulation (4) is only applicable to instances of ONMF with nonnegative input data, the reformulation () remains valid for any instance of K-means clustering, even if the input data matrix contains negative elements.
Remark 2.
We are now well-positioned to derive an equivalent completely positive program for the K-means clustering problem. To simplify our notation, we will employ the sets
in all reformulations in the remainder of this section.
Theorem 3.
The following completely positive program solves the K-means clustering problem.