When Do Birds of a Feather Flock Together? k-Means, Proximity, and Conic Programming
Abstract
Given a set of data, one central goal is to group them into clusters based on some notion of similarity between the individual objects. One of the most popular and widely used approaches is k-means, despite the computational hardness of finding its global minimum. We study and compare the properties of different convex relaxations by relating them to corresponding proximity conditions, an idea originally introduced by Kumar and Kannan. Using conic duality theory, we present an improved proximity condition under which the Peng-Wei relaxation of k-means recovers the underlying clusters exactly. Our proximity condition improves upon that of Kumar and Kannan and is comparable to that of Awasthi and Sheffet, where proximity conditions are established for projective k-means. In addition, we provide a necessary proximity condition for the exactness of the Peng-Wei relaxation. For the special case of equal cluster sizes, we establish a different and completely localized proximity condition under which the Amini-Levina relaxation yields exact clustering, thereby addressing an open problem posed by Awasthi and Sheffet in the balanced case.
Our framework is not only deterministic and model-free but also comes with a clear geometric meaning, which allows for further analysis and generalization. Moreover, it can be conveniently applied to analyze various data generative models such as the stochastic ball model and the Gaussian mixture model. With this method, we improve the current minimum separation bound for the stochastic ball model and achieve state-of-the-art results for learning Gaussian mixture models.
1 Introduction
k-means clustering is one of the most well-known and widely used clustering methods in unsupervised learning. Given a finite set of data points in Euclidean space, the goal is to partition them into k clusters by minimizing the total squared distance between each data point and the corresponding cluster center. It is a problem related to Voronoi tessellations [10]. However, k-means is combinatorial in nature since it is essentially equivalent to an integer programming problem [22]. Thus, minimizing the k-means objective function turns out to be an NP-hard problem, even if there are only two clusters [2] or if the data points lie in a plane [19].
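To make the objective concrete, the following minimal sketch evaluates the total squared distance to the cluster centroids for a given labeling. The function name `kmeans_objective` and the toy data are illustrative assumptions, not part of the paper.

```python
import numpy as np

def kmeans_objective(points, labels):
    """Sum of squared distances from each point to the centroid of its cluster."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for lab in np.unique(labels):
        cluster = points[labels == lab]
        centroid = cluster.mean(axis=0)          # sample mean of the cluster
        total += np.sum((cluster - centroid) ** 2)
    return total

# Four points forming two well-separated pairs (hypothetical toy data).
points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
good = kmeans_objective(points, [0, 0, 1, 1])    # pairs grouped by proximity
bad = kmeans_objective(points, [0, 1, 0, 1])     # pairs grouped across the gap
```

Comparing `good` and `bad` shows how strongly the objective penalizes a partition that mixes the two groups.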
Despite its hardness, numerous efforts have been made to develop effective and efficient heuristic algorithms for the k-means problem in practice. A famous example is Lloyd’s algorithm [17], which was originally introduced for vector quantization and then became popular in data clustering due to its high efficiency and simplicity of implementation. One of the earliest convergence analyses of Lloyd’s algorithm was given by Selim and Ismail [22]: under certain conditions, the algorithm converges to a stationary point within a finite number of iterations but may fail to converge to a local minimum. A smoothed analysis given by Arthur, Manthey and Röglin [4] shows that the smoothed (expected) number of iterations is polynomially bounded, while the worst-case running time can be exponential even when the data points lie in a plane [24].
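For readers unfamiliar with Lloyd’s algorithm, the sketch below alternates the two standard steps (assign each point to its nearest center, then move each center to the mean of its points) until the assignment stops changing. It is a bare-bones illustration; the function name `lloyd`, the stopping rule, and the handling of empty clusters are assumptions made here.

```python
import numpy as np

def lloyd(points, init_centers, max_iter=100):
    """Plain Lloyd iterations starting from the given initial centers."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    labels = None
    for _ in range(max_iter):
        # Squared distances between every point and every center.
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                  # assignment is stable
        labels = new_labels
        for j in range(len(centers)):
            members = points[labels == j]
            if len(members) > 0:                   # keep old center if cluster is empty
                centers[j] = members.mean(axis=0)
    return labels, centers

points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels, centers = lloyd(points, init_centers=[[0.0, 0.0], [10.0, 0.0]])
```

The result depends on the initialization, which is why the literature discussed here pairs Lloyd’s algorithm with a careful starting point.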
We are particularly interested in the semidefinite programming (SDP) relaxation of k-means by Peng and Wei [21], who observed that the k-means objective function can be written as the inner product between a projection matrix and a distance matrix constructed from the data, and that the combinatorial constraints on the projection matrix can be convexified. Thus, whenever the Peng-Wei relaxation produces an output corresponding to a partition of the data set, the k-means problem is solved in polynomial time [27]. The details of the Peng-Wei relaxation are explained in Section 2.
Theoretical properties of the Peng-Wei relaxation have also been studied under specific stochastic models in the literature. Minimum separation conditions were established in [5, 13] to guarantee exact clustering for the stochastic ball model with balanced clusters (i.e., each cluster has the same number of points), while a similar study was conducted in [20] for the Gaussian mixture model.
Despite these efforts, the Peng-Wei relaxation is not yet thoroughly understood. Several fundamental questions remain unexplored or require better answers, such as:

How do the number of clusters and the data dimension affect the performance of the Peng-Wei relaxation?

How does the performance of the Peng-Wei relaxation depend on the balancedness of the cluster sizes and the covariance structures within each cluster?

Can the global minimum separation condition be localized?

In the special case of equal cluster sizes, does the tighter Amini-Levina relaxation [3] improve on the Peng-Wei relaxation? If so, in which sense?
The studies in [5, 13, 20] reveal certain information about the Peng-Wei relaxation under the assumption of sufficient minimum center separation: guaranteed exact recovery in the case of the stochastic ball model [5, 13] and learning of centers for the Gaussian mixture model [20]. The price of such guarantees is that the requirement on the minimum center separation imposes the same criterion on all clusters. In other words, each pair of clusters, regardless of their shapes and cardinalities, must have their centers separated by a uniform distance determined by the entire data set. As a consequence of this “global” condition, the effect of an isolated but huge cluster ripples throughout the entire data set by raising the minimum center separation. Thus, a more “localized” condition, i.e., a condition on the center separation for each pair of clusters that relies largely on local information, is much desired. Such a condition might pave the way to addressing the aforementioned fundamental questions regarding the Peng-Wei relaxation.
To that end, in this paper we introduce a proximity condition that relates the pairwise center distances to more localized quantities. Interestingly, it turns out that our proximity condition improves the one in [15] and is comparable to that in [6], the state-of-the-art proximity conditions in the literature on SVD-based projective k-means. Furthermore, under the Amini-Levina relaxation for clusters of equal cardinality, the associated proximity condition becomes “fully localized”, as it only involves information about pairs of clusters.
1.1 Organization of our paper
Our paper is organized as follows. In the remainder of this introductory section we present our proximity condition, discuss its implications for various stochastic cluster models, and briefly compare our results to the state of the art. In Section 2, we discuss k-means and its convex relaxation introduced by Peng and Wei. In Section 3, we show that the Peng-Wei relaxation yields the solution of the k-means objective as long as our proximity condition (1.1) is satisfied. A different proximity condition for the exactness of the Amini-Levina relaxation is discussed in the same section. In Section 4, we consider the application of our framework to the stochastic ball model and the Gaussian mixture model. Numerical simulations that illustrate our theoretical findings are presented in Section 5. All proofs can be found in Sections 6–8.
1.2 Proximity conditions under deterministic models
The idea of proximity conditions originates from the work [15] by Kumar and Kannan, who use a proximity condition to characterize the performance of Lloyd’s algorithm with an initialization given by an SVD-based projection under deterministic models. The result was later improved by Awasthi and Sheffet [6], who perform a finer analysis and redesign the proximity condition for the same algorithm. To the best of our knowledge, no proximity condition of this type has been established for the Peng-Wei relaxation so far, and we fill this gap in this paper.
Conceptually speaking, our proximity condition can be interpreted as follows:
For each pair of clusters, every point is closer to the center of its own cluster, while the bisector hyperplane of the centers keeps all points in the two clusters at a certain distance determined by global information of the data set.
Roughly speaking, the proximity condition characterizes, for each pair of clusters, how much closer each point is to the within-cluster center than to the cross-cluster center. This is conceptually much more localized than minimum separation, which compares all pairwise center distances to a uniform quantity.
Let us introduce some necessary notation before we proceed to the exact statement of our proximity condition. Given a set of data points with mutually disjoint clusters , we can reindex according to the clusters: for all . Denote by the number of elements in .
Denote the data matrix of the th cluster by
Furthermore, define
In other words, is the sample mean (cluster center) of the th cluster, is the unit vector pointing from to , and is the centered data matrix of the th cluster. Now we are ready to give a mathematical characterization of the proximity condition.
Condition 1.1 (Proximity condition).
The partition satisfies the proximity condition if for any , there holds
(1.1) 
Here, is the operator norm of the matrix .
The proximity condition has a very intuitive geometric interpretation, see also Figure 1. Suppose the partition of data points satisfies the proximity condition. Then each pair of clusters and can be separated by a plane through the bisector of their sample means and . Moreover, the distance between every point in those two clusters and the bisector must be greater than the right hand side of (1.1). This geometric interpretation can be further illustrated by rewriting (1.1): Denote by the distance between the two centers and . Moreover, define
Clearly, is the maximum signed projection distance over all the data points in the clusters and . As illustrated in Figure 1, one can easily check that the left hand side of proximity condition (1.1) is in fact equal to which is the shortest distance between the midpoint and the projections of all the data points in and on the line connecting and . This observation gives us the following proposition.
Proposition 1.2.
The proximity condition (1.1) is equivalent to
(1.2) 
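The geometric side of the condition can be sketched numerically. The snippet below computes, for one pair of clusters, the signed projections of all points onto the line through the two sample means, measured from the midpoint, and returns the smallest distance to the bisector hyperplane. It illustrates only this margin; the threshold on the right-hand side of (1.1) is not reproduced here, and the function name `bisector_margin` and the orientation of the unit vector are assumptions.

```python
import numpy as np

def bisector_margin(cluster_a, cluster_b):
    """Smallest signed distance of any point to the bisector of the two sample means.

    A positive value means every point lies on the same side as its own center;
    a negative value means some point crosses the bisector hyperplane.
    """
    A = np.asarray(cluster_a, dtype=float)
    B = np.asarray(cluster_b, dtype=float)
    c_a, c_b = A.mean(axis=0), B.mean(axis=0)      # sample means (cluster centers)
    w = (c_b - c_a) / np.linalg.norm(c_b - c_a)    # unit vector from c_a to c_b
    midpoint = (c_a + c_b) / 2.0
    s_a = (A - midpoint) @ w                       # should be negative for cluster a
    s_b = (B - midpoint) @ w                       # should be positive for cluster b
    return min((-s_a).min(), s_b.min())

separated = bisector_margin([[0, 0], [0, 2]], [[10, 0], [10, 2]])
crossing = bisector_margin([[0, 0], [0, 2], [9, 1]], [[10, 0], [10, 2]])
```

In the second example one point of the first cluster lies beyond the bisector, so the margin is negative and no proximity condition of this kind can hold for that partition.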
Besides showing that the proximity condition (1.1) guarantees the exactness of the Peng-Wei relaxation, we also obtain a necessary proximity condition. If a deterministic mixture fails to fulfill the necessary condition, exact recovery by the Peng-Wei relaxation is provably impossible.
Awasthi and Sheffet raised an open question in [6]: can the pairwise separation condition be fully localized, i.e., depend only on information about the corresponding pair of clusters? We apply the Amini-Levina relaxation [3], originally intended to address the weak assortativity issue in community detection in networks, to convexify the k-means problem in the case of balanced clusters. Surprisingly, we end up with a completely localized proximity condition for the exactness of the convex relaxation, thus solving Awasthi and Sheffet’s open problem for the balanced case.
Furthermore, beyond the scope of the Peng-Wei relaxation of k-means, the proximity condition itself provides an algorithm that can accept answers to the NP-hard k-means problem (although it is not able to reject an answer). For a given solution to k-means, one can simply check whether the proximity condition holds; if it does, the solution is provably the unique global minimum. The time cost is proportional to . Assuming the number of clusters and the dimension of data are fixed, the time complexity is linear in the total number of points , which improves on the quasilinear-time algorithm proposed in [13].
1.3 Comparison to existing proximity conditions in the literature
As mentioned before, in the literature on projective k-means, proximity conditions have been proposed in [15] and later improved in [6]. In this section we compare our proximity conditions with these existing results.
Denote . By our notation, the original Kumar-Kannan proximity condition [15] is equivalent to
for some large absolute constant . The fact that implies . Therefore, our proximity condition (1.2) is strictly weaker than the Kumar-Kannan condition by at least a factor of .
The comparison between (1.1) and the Awasthi-Sheffet conditions in [6] is less straightforward. Theorem 4 therein states that consistent clustering is guaranteed by projective k-means plus Lloyd’s algorithm as long as
(1.3) 
Compared to our proximity condition (1.1), the second term on the right-hand side of (1.3) could be more stringent given the fact , whereas the first term is less stringent than ours since
Therefore, it is fair to say our proximity condition is comparable to the Awasthi-Sheffet condition.
1.4 Implications under stochastic models
We should emphasize that in order to prove our main results, we benefit a lot from the existing primaldual analyses in [5, 13]. The major difference between our analysis and [5, 13] is that we aim at deriving proximity conditions under deterministic models rather than establishing minimum separation results under stochastic models.
However, we are still curious about what minimum separation conditions our proximity condition can yield when applied to both the stochastic ball model and the Gaussian mixture model. Before presenting the conditions given by our proximity condition, we first review the state-of-the-art results on both models.
Existing work on the Peng-Wei relaxation:
The stochastic ball model can be viewed as a special case of mixture models where the distributions of sample data points are compactly supported on disjoint unit balls in . The clusters are balanced and the covariance structure is fairly rigid since all the distributions are assumed to be identical and isotropic.
Let be the minimal separation between the cluster centers. In [5], it is proven that the Peng-Wei relaxation achieves exact recovery provided , where the lower bound of is independent of the number of clusters . Another bound of is given in [13] stating that exact recovery is guaranteed if which is near-optimal in the regime.
The Gaussian mixture model (GMM) as a stochastic model is more flexible. This model is characterized by its density function, which is a weighted sum of the density functions of Gaussian or subgaussian distributions. In [20], assuming the Gaussian distributions are identical and isotropic, Mixon, Villar and Ward prove that the Peng-Wei relaxation learns the Gaussian centers for balanced clusters when the center separations are above , where is the common covariance of all Gaussian distributions.
Existing work on other algorithms:
Clustering Gaussian mixture models has received extensive attention in the machine learning and statistics communities. Besides [20], a lot of progress has been made in developing efficient algorithms for this task. Among them is a family of algorithms referred to here as projective k-means [25, 1, 14, 15, 6, 9, 18]. In general, projective k-means works in two steps: first project all the data points onto a lower-dimensional space, usually based on the singular value decomposition (SVD), and then classify each point by heuristic methods such as single-linkage clustering in [1] or Lloyd’s algorithm in [6].
Vempala and Wang [25] show that if each pairwise center separation is larger than a quantity determined by the number of clusters , the dimension and the variances of the clusters, the projective algorithm can classify a mixture of isotropic Gaussians with high probability. Achlioptas and McSherry [1] show that SVD-based projection followed by single-linkage clustering is able to classify all the sampled data points accurately if the center separation of each pair of clusters is greater than the operator norm of the covariance matrix and the weights of the two clusters plus a term which depends on the concentration properties of the distributions in the mixture. The algorithm studied by Kumar and Kannan in [15] (the work that first devised the idea of a proximity condition) also begins with an SVD-based projection and proceeds with Lloyd’s algorithm, initialized by an unspecified near-optimal solution to the k-means problem. As stated before, its technical results are improved by Awasthi and Sheffet in [6]. Recently, Lu and Zhou [18] provided a more detailed estimate of the misclassification rate for each iteration of Lloyd’s algorithm with an initialization given by spectral methods [14].
Our results:
We can easily apply the proximity condition to the stochastic ball model and the Gaussian mixture model. The corresponding recovery guarantees are competitive with or improve upon other state-of-the-art results.

For the stochastic ball model, we show that is sufficient to guarantee exact recovery by the Peng-Wei relaxation, which improves the separation condition in [13] when is large. Moreover, our result applies to a broader class of stochastic ball models where each cluster can have a different number of points and may even follow a different probability distribution, as long as the support of the density function is contained within a unit ball.

For the Gaussian mixture model, we summarize our result for the Peng-Wei relaxation and other state-of-the-art results for both the Peng-Wei relaxation and projective k-means in Table 1. It has been shown in [20] that the centers of a Gaussian mixture can be accurately estimated by the Peng-Wei relaxation provided the minimal separation is . In contrast, our proximity condition provides a different minimal separation condition , which is smaller than if is large and not too large. Our separation condition is better than [15] and comparable to [6] for projective k-means. Though our bound loses a factor vis-à-vis the one in [25] for the special case of spherical Gaussian mixtures, we can handle more general Gaussian mixtures where the density functions do not have to be spherical or identical.
Authors | Separation bounds | Algorithms | Exact | Year
Vempala and Wang [25] | | Projective k-means | Yes | 2004
Achlioptas and McSherry [1] | | Projective k-means | Yes | 2005
Kumar and Kannan [15] | | Projective k-means | Yes | 2010
Awasthi and Sheffet [6] | | Projective k-means | Yes | 2012
Lu and Zhou [18] | | Projective k-means | No | 2016
Mixon, Villar, and Ward [20] | | SDP k-means | No | 2017
Our work | | SDP k-means | Yes |
1.5 Notation
Let be the indicator vector of . is an vector with all entries equal to 1. Given any two real matrices and in , we define the inner product as . For a vector , is equal to the largest entry of . We denote if is a nonnegative matrix, i.e., each entry is nonnegative; if is a symmetric positive semidefinite matrix. Besides, we also use the notation listed below throughout the paper.
Dimension of data  
Number of clusters  
Set of data points in  
The th cluster  
Total number of data points  
Number of points in the th cluster  
Set of symmetric matrices  
Set of positive semidefinite matrices  
Set of nonnegative matrices  
Data matrix of all data points  
Data matrix of the th cluster  
Centered data matrix of the th cluster  
Squared distance matrix  
Ground-truth solution to the SDP relaxation of k-means  
Submatrix of any matrix given by  
The th data point in the th cluster  
Population mean of the th cluster in a generative model  
Sample mean of the th cluster  
Unit vector pointing from to  
Signed projection distance given by  
Distance between and  
Maximum signed projection distance determined by and 
2 k-means and the Peng-Wei relaxation
In this section, we briefly review the formulation of k-means and its SDP relaxation introduced by Peng and Wei [21]. Let be a set of data points in . k-means attempts to divide into disjoint clusters by seeking a solution to the following minimization problem:
where form a partition of (i.e., and if ). For any given partition , choosing as the centroid minimizes the objective function. Therefore, the k-means problem is equivalent to:
(2.1) 
Given an arbitrary partition of , let be the indicator function of the th cluster. That is,
A simple calculation can reveal that
and hence,
where is the distance matrix with the th entry being given by . Therefore, we can rewrite the k-means problem as
(2.2) 
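The rewriting can be checked numerically. The sketch below builds the normalized partition matrix (the sum over clusters of the outer product of the cluster indicator with itself, divided by the cluster size) and the matrix of squared pairwise distances, then compares half their inner product with the centroid form of the objective. The factor 1/2 is used here only to match the centroid form; a constant factor does not change the minimizer. All names and the random data are illustrative assumptions.

```python
import numpy as np

def partition_matrix(labels):
    """Sum over clusters of (indicator)(indicator)^T / cluster size."""
    labels = np.asarray(labels)
    X = np.zeros((len(labels), len(labels)))
    for lab in np.unique(labels):
        ind = (labels == lab).astype(float)
        X += np.outer(ind, ind) / ind.sum()
    return X

def squared_distance_matrix(points):
    """Matrix whose (i, j) entry is the squared Euclidean distance between points i and j."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    return (diff ** 2).sum(axis=2)

rng = np.random.default_rng(0)
points = rng.normal(size=(12, 3))
labels = np.array([0] * 5 + [1] * 4 + [2] * 3)

X = partition_matrix(labels)
D = squared_distance_matrix(points)

# Centroid form: squared distances to the cluster means.
centroid_form = sum(
    ((points[labels == a] - points[labels == a].mean(axis=0)) ** 2).sum()
    for a in range(3)
)
# Inner-product form: half of <D, X>.
inner_product_form = 0.5 * np.sum(D * X)
```

The two quantities agree up to rounding for any partition, which is what makes the matrix reformulation possible.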
It is self-evident that (2.2) is a nonconvex problem due to the combinatorial nature of the feasible set. Indeed, (2.2) is an NP-hard problem [2]. Despite this, it can be easily verified that satisfies the following four properties:
Replacing the constraint in (2.2) by the above four properties leads to the SDP relaxation of k-means introduced by Peng and Wei in [21],
(2.3) 
which will be the focus of this paper.
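As a sanity check, the following sketch verifies numerically that the normalized partition matrix of a small example is feasible for constraints of the kind commonly stated for the Peng-Wei relaxation: positive semidefinite, entrywise nonnegative, rows summing to one, and trace equal to the number of clusters. That these are exactly the four properties intended above is an assumption, since they are not legible in this copy; the helper name `partition_matrix` is also illustrative.

```python
import numpy as np

def partition_matrix(labels):
    """Sum over clusters of (indicator)(indicator)^T / cluster size."""
    labels = np.asarray(labels)
    X = np.zeros((len(labels), len(labels)))
    for lab in np.unique(labels):
        ind = (labels == lab).astype(float)
        X += np.outer(ind, ind) / ind.sum()
    return X

labels = np.array([0, 0, 0, 1, 1, 2])   # three clusters of sizes 3, 2, 1
k = 3
X = partition_matrix(labels)
ones = np.ones(len(labels))

is_psd = np.linalg.eigvalsh(X).min() >= -1e-12      # positive semidefinite
is_nonnegative = bool((X >= 0).all())               # entrywise nonnegative
rows_sum_to_one = np.allclose(X @ ones, ones)       # X 1 = 1
trace_is_k = np.isclose(np.trace(X), k)             # Tr(X) = k
is_projection = np.allclose(X @ X, X)               # holds for partitions, dropped by the relaxation
```

The last check is the nonconvex projection property that a partition matrix has in addition; a relaxation of this type keeps only the convex constraints.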
The Peng-Wei relaxation is a convex problem and can be solved in polynomial time using the interior-point method [27]. We denote by the optimal solution to the Peng-Wei relaxation. Clearly, every feasible point of (2.2) is also feasible for (2.3); so once the optimal solution to (2.3) has the form , it must be an optimal solution to the k-means problem. Therefore, the question of central importance is:
When is the solution to (2.3) of the form ?
3 Exact recovery guarantees
3.1 Exact clustering and proximity conditions
In a nutshell, our main theorem states that the proximity condition (1.1) implies the exactness of the Peng-Wei relaxation (2.3):
Theorem 3.1 (Main theorem).
Since the global minimum of (2.3) never exceeds that of (2.1), Theorem 3.1 implies that the proximity condition provides a simple algorithm that is able to accept answers to the k-means problem.
Corollary 3.2 (Algorithm accepting answers to k-means).
If a partition satisfies the proximity condition (1.1), then it is the unique global minimum of the k-means objective function.
Note that each data point appears times on the left-hand side of (1.1), and it takes amount of time to compute each matrix operator norm using the Golub-Reinsch SVD algorithm [11]. Thus, the time cost to examine the proximity condition is proportional to .
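The dominant per-cluster computation mentioned above is the operator norm of a centered data matrix. A minimal sketch, assuming the points of a cluster are stored as rows and using NumPy's SVD-based spectral norm, could look as follows; the function name `centered_operator_norm` is an assumption.

```python
import numpy as np

def centered_operator_norm(cluster):
    """Largest singular value of the cluster's data matrix after subtracting the sample mean."""
    A = np.asarray(cluster, dtype=float)
    centered = A - A.mean(axis=0)          # rows are points; columns are coordinates
    return np.linalg.norm(centered, 2)     # ord=2 on a matrix gives the spectral norm

norm_value = centered_operator_norm([[0.0, 0.0], [0.0, 2.0]])
```

Whether points are stored as rows or columns does not matter here, since a matrix and its transpose have the same operator norm.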
To the best of our knowledge, it is not known whether the k-means problem is in NP. The proximity condition does not change this fact. We want to emphasize that the polynomial-time examination of the proximity condition (1.1) does not imply that an answer to the k-means problem can be verified in polynomial time, since it does not accept all correct answers. A different approach, which leverages the dual certificate associated with the Peng-Wei relaxation to test the optimality of a candidate k-means solution under certain conditions, can be found in [13]. The algorithm proposed in [13] tests the optimality of a candidate solution in quasilinear time. Hence, our method improves the time complexity by a logarithmic factor.
While the main theorem provides a sufficient condition for the Peng-Wei relaxation to exactly recover a given partition, the following theorem gives a necessary condition.
Theorem 3.3 (Necessary condition).
Suppose is a global minimum of (2.3). Then the partition must satisfy
(3.1) 
Notice that as long as is a solution to (2.3), must be a global minimum of k-means. In other words, it is harder for a deterministic mixture to be exactly recovered by the Peng-Wei relaxation than to be the global minimum of k-means. It remains unclear whether this necessary condition (Theorem 3.3) is necessary only for the Peng-Wei relaxation or for k-means itself as well.
3.2 Balanced case: Amini-Levina relaxation and proximity condition
One special case of interest is the balanced case where each cluster has the same number of points, i.e. . We have seen in Section 2 that the k-means problem can be rewritten as (2.2):
(3.2) 
With the balanced assumption, i.e., the cardinalities of all clusters being the same, it is easy to verify that obeys the following four constraints:
This leads to the Amini-Levina relaxation of k-means, which was first introduced in [3] for community detection in the balanced case in order to address the weak assortativity issue:
(3.3) 
As with the analysis of the Peng-Wei relaxation, once the optimal solution to (3.3) takes the form , the Amini-Levina relaxation gives an optimal solution to the k-means problem under the balanced assumption. Once again, we ask the same question as for Peng and Wei’s relaxation: When is the solution to (3.3) of the form ?
Unsurprisingly, the answer is another proximity condition specially tailored for Amini and Levina’s relaxation.
Condition 3.4 (Proximity condition for balanced clusters).
A partition with satisfies the proximity condition for balanced clusters if for any , there holds
(3.4) 
Similar to the general case, the proximity condition for balanced clusters also has an equivalent formulation:
(3.5) 
Theorem 3.5 (Exact recovery for balanced clusters).
Compared with the proximity condition for Peng and Wei’s relaxation (1.1), the proximity condition for Amini and Levina’s relaxation distinguishes itself by decoupling the clusters in the sense that each of the inequalities in (3.4) only depends on the two clusters involved in the inequality. In the case of balanced clusters, this immediately solves the open question posed by Awasthi and Sheffet [6], which asks if such a proximity condition exists.
The completely localized proximity condition is particularly meaningful when there are a few abnormal clusters whose covariance matrices are huge in operator norm but which are far away from all the other clusters. In this case, the proximity condition for Amini and Levina’s relaxation has a far better chance than that for Peng and Wei’s relaxation of detecting a reasonable partition of the data set. Figure 2 provides such an example.
Analogously, we can also prove a necessary condition for the Amini-Levina relaxation, which can be compared with Theorem 3.3 for the general case.
Theorem 3.6 (Necessary condition for balanced clusters).
Suppose is a global minimum of (3.3). Then the partition must satisfy
(3.6) 
4 Results under random models
Next we apply the proximity condition (1.1) to data sets generated from the generalized stochastic ball model and the Gaussian mixture model, respectively. We first give a formal definition of each model and then present the minimal separation condition which is sufficient to guarantee the exact recovery of the underlying clusters by the Peng-Wei relaxation. The minimal separation conditions are established by verifying the proximity condition (1.1) for those two random models. For proofs, see Sections 8.2 and 8.3.
4.1 Stochastic ball model
The generalized stochastic ball model is defined as follows; we only assume that the support of the density function is contained in the unit ball of for all clusters.
Definition 4.1 (Generalized stochastic ball model).
Let be a set of deterministic vectors in . For each , is a distribution supported on the unit ball of with a covariance matrix and are i.i.d. zero-mean random vectors drawn from the distribution . The th cluster is formed by , where for .
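A simple instance of this model can be simulated as follows. The sketch uses the uniform distribution on the unit ball for every cluster, which is only one admissible choice, and allows different cluster sizes; the function names, centers, and sizes are illustrative assumptions.

```python
import numpy as np

def sample_unit_ball(n, m, rng):
    """Draw n points uniformly from the unit ball in R^m."""
    g = rng.normal(size=(n, m))
    directions = g / np.linalg.norm(g, axis=1, keepdims=True)   # uniform on the sphere
    radii = rng.uniform(size=(n, 1)) ** (1.0 / m)               # radius law for a uniform ball
    return directions * radii

def stochastic_ball_model(centers, sizes, rng):
    """Each cluster is a center plus zero-mean noise supported on the unit ball."""
    centers = np.asarray(centers, dtype=float)
    m = centers.shape[1]
    points, labels = [], []
    for a, (mu, n_a) in enumerate(zip(centers, sizes)):
        points.append(mu + sample_unit_ball(n_a, m, rng))
        labels.append(np.full(n_a, a))
    return np.vstack(points), np.concatenate(labels)

rng = np.random.default_rng(0)
centers = [[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]]      # hypothetical centers
points, labels = stochastic_ball_model(centers, [50, 80, 20], rng)
```

For the uniform distribution on the unit ball in dimension m, the covariance is the identity divided by m + 2, a standard fact that can be checked empirically with a larger sample.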
Corollary 4.1.
Denote , , , and For the generalized stochastic ball model, we draw points from the th ball for each . The Peng-Wei relaxation achieves exact recovery with probability at least if and
(4.1) 
where and . In particular, if for all , and each is a uniform distribution over the unit ball of , then (4.1) can be simplified to
by noting that
Remark 4.2.
As the number of data points goes to infinity provided and are fixed, the value of vanishes. So asymptotically the minimal separation condition reduces to when and . Note that we only assume that the distribution is supported on the unit ball, so rotation-invariant distributions, which are assumed in [13, 12], are also included. Compared with the result in [13, 12] where is required, we have achieved a better bound when is large.
We can also apply the necessary lower bound (Theorem 3.3) to the generalized stochastic ball model. To illustrate this, let us study a special case where the following Corollary holds.
Corollary 4.3.
For the generalized ball model, if for all we have , then with high probability, the Peng-Wei relaxation fails to achieve exact recovery provided that is large enough and
If for any , is the uniform distribution over the unit ball, the bound becomes
4.2 Gaussian mixture model
The definition of the Gaussian mixture model is given below, followed by the minimal separation condition for the exactness of the Peng-Wei relaxation.
Definition 4.2 (Gaussian mixture model).
Consider a mixture of Gaussian distributions in with a set of weights obeying and . The probability density function of this mixture model is
where is the probability density function of the Gaussian distribution .
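Sampling from such a mixture is straightforward: first draw the component index according to the weights, then draw the point from the corresponding Gaussian. The following sketch illustrates this with hypothetical weights, means, and covariances; the function name `sample_gmm` is an assumption.

```python
import numpy as np

def sample_gmm(n, weights, means, covariances, rng):
    """Draw n points from a Gaussian mixture; returns the points and their component labels."""
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    labels = rng.choice(len(weights), size=n, p=weights)   # component of each sample
    points = np.empty((n, means.shape[1]))
    for a in range(len(weights)):
        idx = np.flatnonzero(labels == a)
        if len(idx) == 0:
            continue                                        # no sample from this component
        points[idx] = rng.multivariate_normal(means[a], covariances[a], size=len(idx))
    return points, labels

rng = np.random.default_rng(1)
weights = [0.5, 0.3, 0.2]                                   # nonnegative, summing to one
means = [[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]]
covariances = [np.eye(2), np.diag([2.0, 0.5]), 0.25 * np.eye(2)]
points, labels = sample_gmm(200, weights, means, covariances, rng)
```

Unlike the stochastic ball model, the cluster sizes are random here and the components need not be spherical or identical.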
Corollary 4.4.
Denote , and . For the Gaussian mixture model, the Peng-Wei relaxation achieves exact recovery with probability at least if
where if . In particular, if and for all , then the above condition reduces to
and if .
5 Numerical experiments
Consider applying the Peng-Wei relaxation to the generalized stochastic ball model. When the total number of the data points becomes large enough, the parameter vanishes and the sufficient lower bound predicted by Corollary 4.1 as in (4.1) becomes
(5.1) 
The state-of-the-art bound for the stochastic ball model proved in [5, 13] is
(5.2) 
The exact phase transition bound, above which exact recovery can be achieved by the Peng-Wei relaxation of k-means, is smaller than both of the above sufficient lower bounds. As one would expect, the actual lower bound is hard to find in practice. The major difficulty occurs when the number of clusters is greater than 2. In this case, when creating an instance of the stochastic ball model with prescribed minimal separation distance , there are infinitely many possible ways to place the centers, and this cannot be resolved by translation, rotation, and scaling. To address this, we investigate the worst case where centers are packed as compactly as possible while points in each cluster are chosen in the most scattered way. We have a better chance of finding a more accurate lower bound under this arrangement.
Three instructive centroidal geometries, i.e., the geometries formed by the locations of the centers, are considered; we call them the circle-shaped, line-shaped, and hive-shaped geometry, respectively. Centers are packed compactly under these shapes, especially the hive-shaped geometry. We can rescale the three geometries to change the minimal separation distance . An illustration of these geometries is shown in Figure 3.
We let the number of data points in each cluster be . Hence, the total number of points . As a result, . These points are equispaced points on the unit circle centered at . The data points are chosen in this way since it maximizes the variance. Because the data is isotropic and the variance is equal to , we have .
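The claim about equispaced points on the unit circle can be checked directly: for three or more equispaced points the sample mean is the circle's center and the sample covariance is isotropic. The sketch below computes both; the function name `equispaced_circle` and the choice of 20 points are assumptions made for illustration.

```python
import numpy as np

def equispaced_circle(n, center=(0.0, 0.0)):
    """n equispaced points on the unit circle around the given center."""
    angles = 2.0 * np.pi * np.arange(n) / n
    return np.asarray(center, dtype=float) + np.column_stack([np.cos(angles), np.sin(angles)])

pts = equispaced_circle(20, center=(3.0, -1.0))
mean = pts.mean(axis=0)
centered = pts - mean
cov = centered.T @ centered / len(pts)     # sample covariance (normalized by n)
total_variance = np.trace(cov)
```

The covariance equals half the identity, so the total variance is 1, the largest value attainable by any distribution supported on a unit disc, which is why this configuration is the most scattered one.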
For and chosen above, we can see that our bound improves on the state-of-the-art result and is thus a meaningful addition to it. Nevertheless, it is not yet tight. Figure 4 shows that the actual lower bound is almost independent of the parameter , while our theory still relies on the assumption that .
Another parameter that may affect the bound is the dimension . To reveal the dependence of the bound on the dimension, we fix the number of clusters to be and let the dimension vary between and . The center separation is chosen among equispaced values between and . The number of points in each cluster is equal to , so there are in total. The distribution for each ball is the uniform distribution on the unit sphere centered at . For any fixed pair of and , we generate instances of the stochastic ball model.
From Figure 5, it is evident that neither our bound nor the state-of-the-art bound is tight. The blue line, which represents the bound , fits our empirical result the best. Based on the observed dependence of the empirical lower bound on the parameters and in Figures 4 and 5, we formulate the conjecture stated below.
Conjecture 5.1.
For a mixture generated by the generalized stochastic ball model, the PengWei relaxation achieves exact recovery with high probability if
(5.3) 
provided that the total number of points is large enough.
After the completion of this manuscript, a semidefinite relaxation based on graph cuts was proposed in [16] to overcome the performance limits of the Peng-Wei relaxation, which provides an alternative way to learn the stochastic ball models.
6 Proofs for Section 3.1
We will prove the main theorem and related results under the proximity condition given in Proposition 1.2. The proof for the equivalence of the two proximity conditions is presented at the end of this section. The key ingredient in the proof of the main theorem is to construct a dual variable to certify the optimality of the desired solution based on the conic duality theorem in convex optimization [7].
6.1 Conic duality
We first rewrite (2.3) as a cone program in standard form, which naturally leads to its dual formulation. Noting that is a symmetric variable, the Peng-Wei relaxation of k-means (2.3) is equivalent to the following optimization problem:
(6.1) 
Let , the intersection of two self-dual cones: the positive semidefinite cone and the nonnegative cone . By definition, it is a pointed and closed convex cone with a nonempty interior.
We can express (6.1) in the form of a standard cone program,
(6.2) 
Thus, using the standard derivation in Lagrangian duality theory [8], the dual problem of (6.1) is readily obtained as
(6.3) 
where is the dual variable with respect to the affine constraints and
(6.4) 
is the adjoint operator of under the canonical inner product over .
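For orientation, a generic primal/dual pair of cone programs can be written as follows. This is a template under assumed notation, not a reproduction of (6.2) and (6.3): the symbols $C$, $\mathcal{A}$, $b$, $\lambda$, $Z$, $Q$, and $B$ are placeholders, and the sign convention may differ from the one used in this paper.

```latex
% Generic standard-form cone program (primal) and its dual.
\begin{align*}
\text{(primal)}\quad & \min_{X}\ \langle C, X\rangle
  \quad \text{s.t.}\quad \mathcal{A}(X) = b,\quad X \in \mathcal{K}, \\
\text{(dual)}\quad & \max_{\lambda}\ \langle b, \lambda\rangle
  \quad \text{s.t.}\quad Z := C - \mathcal{A}^{*}(\lambda) \in \mathcal{K}^{*}.
\end{align*}
% When K is the intersection of the positive semidefinite cone and the
% nonnegative cone, its dual cone consists of sums
%   Z = Q + B,  with Q positive semidefinite and B entrywise nonnegative.
% Weak duality for any feasible pair (X, \lambda):
\begin{equation*}
\langle C, X\rangle - \langle b, \lambda\rangle
  = \langle C, X\rangle - \langle \mathcal{A}(X), \lambda\rangle
  = \langle C - \mathcal{A}^{*}(\lambda), X\rangle
  = \langle Z, X\rangle \;\ge\; 0 .
\end{equation*}
```

The last identity shows why a zero duality gap and complementary slackness, $\langle Z, X\rangle = 0$, are the same condition for a feasible pair.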
6.2 Optimality condition
This subsection presents a necessary and sufficient condition for to be the global minimum of the Peng-Wei relaxation. The result is summarized in Proposition 6.5, which follows from complementary slackness in conic duality theory. Moreover, a stronger sufficient condition for the uniqueness of is established in Proposition 6.6.
Theorem 6.1 (Conic Duality Theorem, Theorem 2.4.1 in [7]).
There hold:

If the primal problem is strictly feasible and bounded below, then the dual program is solvable and the optimal values of the primal/dual problems are equal to each other;
If the dual problem is strictly feasible and bounded above, then the primal program is solvable and the optimal values of the primal/dual problems are equal to each other;

Assume either the primal problem or the dual problem is bounded and strictly feasible. Then is a pair of primal/dual optima if and only if either the duality gap is zero or complementary slackness holds.
The following lemma, tailored to (6.1) and (6.3), simply follows from the strict feasibility of (6.1) or (6.3) and Theorem 6.1.
Lemma 6.2.
Proof.
Consider , where for . Note that and . So is in the interior of . It is also easy to verify that satisfies the other two equality constraints. This shows (6.1) is strictly feasible. In addition, we can see that the objective function in (6.1) is also nonnegative since both and are entrywise nonnegative. In conclusion, the primal problem is strictly feasible and bounded below by 0.
Note that is a strictly positive symmetric matrix. For the dual problem (6.3), we can take and let be a sufficiently large positive number such that
is in the interior of . Hence, the dual program is also strictly feasible. Its optimal value is bounded above because, by weak duality, it never exceeds the optimal value of the primal problem.
Therefore, the application of Theorem 6.1 implies that is a pair of primal/dual optima if and only if the complementary slackness holds, i.e., where and ∎
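The strict feasibility argument in the proof above can be checked numerically on a candidate point. The sketch below assumes the usual Peng-Wei constraints (unit row sums, trace equal to the number of clusters k, entrywise nonnegativity, positive semidefiniteness) and tests one matrix of the form alpha*I + beta*J, where J is the all-ones matrix. This candidate and the function name are illustrative assumptions and need not coincide with the point used in the proof.

```python
def strictly_feasible_candidate(N, k):
    """Check that X = alpha*I + beta*J is strictly feasible for 1 < k < N.

    Returns the matrix X and a dict of boolean checks.
    """
    alpha = (k - 1) / (N - 1)
    beta = (N - k) / (N * (N - 1))
    X = [[beta + (alpha if i == j else 0.0) for j in range(N)] for i in range(N)]
    checks = {
        # Each row sums to alpha + N*beta = 1.
        "row_sums": all(abs(sum(row) - 1.0) < 1e-9 for row in X),
        # The trace is N*(alpha + beta) = k.
        "trace": abs(sum(X[i][i] for i in range(N)) - k) < 1e-9,
        # Strict entrywise positivity.
        "entries_positive": all(x > 0.0 for row in X for x in row),
        # Eigenvalues of alpha*I + beta*J: alpha (N-1 times) and alpha + N*beta.
        "eigenvalues_positive": alpha > 0.0 and alpha + N * beta > 0.0,
    }
    return X, checks


# Hypothetical sizes: N = 12 points and k = 3 clusters.
X, checks = strictly_feasible_candidate(12, 3)
```

Since every entry and every eigenvalue is strictly positive, the candidate lies in the interior of both cones while satisfying the two equality constraints.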
Remark 6.3.
The complementary slackness is indeed equivalent to the zero duality gap since the optimal values of both problems are attained and there holds
In the following lemma, we will derive a more explicit expression for complementary slackness which will be used in the analysis later. By definition of , the matrix must be in the form of
(6.5) 
where , and both of them are symmetric.
Lemma 6.4.
Proof.
It suffices to prove (6.6) from since the other direction is trivial. Note that the complementary slackness is equivalent to for some and . Since and , it follows that From and , we have
where Since both and are positive semidefinite matrices, we have