Good Clusterings Have Large Volume

Abstract

The clustering of a data set into disjoint clusters is one of the core tasks in data analytics. Many clustering algorithms exhibit a strong contrast between favorable performance in practice and poor theoretical worst-case behavior. Prime examples are least-squares assignments and the popular $k$-means algorithm.

We are interested in this contrast and approach it using methods of polyhedral theory. Several popular clustering algorithms are readily connected to finding a vertex of the so-called bounded-shape partition polytopes. The vertices correspond to clusterings with extraordinary separation properties, in particular allowing the construction of a separating power diagram such that each cluster has its own cell.

The geometric structure of these polytopes reveals useful information. First, we are able to quantitatively measure the space of all sites that allow the construction of a separating power diagram for a clustering by the volume of its normal cone. This gives rise to a new quality criterion for clusterings and explains why good clusterings are also the most likely to be found by some classical algorithms.

Second, we characterize the edges of the bounded-shape partition polytopes. Through this, we obtain an explicit description of the normal cones. This allows us to compute measures with respect to the new quality criterion, and even compute “most stable” sites for the separation of clusters. The hardness of these computations depends on the number of edges incident to a vertex, which may be exponential. However, this computational effort is rewarded with a wealth of information that can be gained from the results, which we highlight using a toy example.

keywords:
clustering; linear programming; power diagram; polyhedron; normal cone; stability

1 Introduction

Informed decision-making based on large data sets is one of the big challenges in operations research. We are interested in one of the fundamental tasks in data analytics: the clustering of a data set into disjoint clusters. Data is often represented as a finite set $X = \{x_1, \dots, x_n\}$ in $d$-dimensional Euclidean space $\mathbb{R}^d$. A clustering then is a partition of $X$ into $k$ parts $C_1, \dots, C_k$ such that $C_1 \cup \dots \cup C_k = X$ and $C_i \cap C_j = \emptyset$ for $i \neq j$.

There is a wealth of literature on clustering methods; we refer to the three surveys Berkhin (2002); Jain et al. (1999); Xu and Wunsch (2005). For most clustering algorithms, there is a strong contrast between extremely favorable behavior in practice and the lack of provable guarantees on their performance in theory. A prime example is the popular $k$-means algorithm Lloyd (1982); MacQueen (1967). It exhibits a stunning discrepancy between its excellent behavior in practice and its known worst-case behavior: in practice, it typically terminates after just a few iterations and produces human-interpretable results. In the theoretical worst case, it may take exponentially many iterations even for two-dimensional data Vattani (2011), and it is easy to construct artificial examples for which its results do not capture the structure of the underlying data at all. In the present paper, we use methods of polyhedral theory to better understand this difference between practical performance and theoretical worst-case behavior.

The study of polyhedra arising from applications has been a popular approach in operations research. There are numerous cases where the combinatorial properties of these polyhedra revealed deeper insight into the underlying application Anderes et al. (2016); Brieden and Gritzmann (2012); De Loera and Kim (2014); Doignon and Regenwetter (1997); Grötschel and Wakabayashi (1990); Guralnick and Perkinson (2006); Kalman (2014); Onn (1993); Suck (1992). For an introduction to polyhedral theory, we recommend Nemhauser and Wolsey (1999); Schrijver (1986); Ziegler (1995). Further, we refer the reader to the book review Suck (1997) of Ziegler’s classical textbook Lectures on Polytopes Ziegler (1995) for an overview and further literature on the power of polyhedral theory for applications.

The so-called assignment polytopes are closely related to our setting and have been well studied Balinski and Russakoff (1974); Gill and Linusson (2009); Gottlieb and Rao (1990). It is possible to represent the partition of $n$ data points into $k$ clusters by using decision variables $y_{ij}$ that indicate whether data point $x_j$ is assigned to cluster $C_i$ ($y_{ij} = 1$) or not ($y_{ij} = 0$). Further, many applications specify lower bounds $s^-_i$ and upper bounds $s^+_i$ on the number of points that may be assigned to cluster $C_i$. This gives rise to a simple set of linear constraints that describe all clusterings:

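$$ s^-_i \le \sum_{j=1}^{n} y_{ij} \le s^+_i \quad (i \le k), \qquad \sum_{i=1}^{k} y_{ij} = 1 \quad (j \le n), \qquad y_{ij} \ge 0 \quad (i \le k,\ j \le n). $$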

The first set of constraints ensures that the prescribed cluster size bounds are respected; the second set guarantees that each data point is assigned to a cluster. With the relaxed constraints $y_{ij} \ge 0$ in place of $y_{ij} \in \{0, 1\}$, we obtain a polytope. The left-hand sides of the constraints form a totally unimodular matrix and the right-hand sides are integral, so the vertices of this polytope are $0/1$-vectors, i.e. all $y_{ij}$ satisfy $y_{ij} \in \{0, 1\}$. Each vertex describes a clustering, and vice versa.
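To make the constraint system concrete, the following minimal sketch (not part of the original work; the function names `is_clustering_matrix` and `count_clusterings` are hypothetical) checks whether a 0/1 matrix y, with y[i][j] = 1 meaning that point $x_j$ lies in cluster $C_i$, satisfies the three constraint families above, and counts the feasible 0/1 matrices of a tiny instance by brute force. Clusters are treated as labeled, as in the rest of the text.

```python
from itertools import product


def is_clustering_matrix(y, s_minus, s_plus):
    """Check the linear constraints above for a k x n 0/1 matrix y.

    y[i][j] == 1 means that data point x_j is assigned to cluster C_i.
    """
    k, n = len(y), len(y[0])
    if any(v not in (0, 1) for row in y for v in row):
        return False
    # Cluster size bounds: s^-_i <= sum_j y_ij <= s^+_i.
    if any(not s_minus[i] <= sum(y[i]) <= s_plus[i] for i in range(k)):
        return False
    # Every data point is assigned to exactly one cluster.
    return all(sum(y[i][j] for i in range(k)) == 1 for j in range(n))


def count_clusterings(k, n, s_minus, s_plus):
    """Enumerate all k x n 0/1 matrices (tiny instances only) and count the feasible ones."""
    count = 0
    for bits in product((0, 1), repeat=k * n):
        y = [list(bits[i * n:(i + 1) * n]) for i in range(k)]
        if is_clustering_matrix(y, s_minus, s_plus):
            count += 1
    return count
```

For n = 3 points, k = 2 clusters and size bounds 1 ≤ |C_i| ≤ 2, all 2^3 = 8 assignments except the two that put every point into one cluster are feasible, so `count_clusterings(2, 3, [1, 1], [2, 2])` returns 6.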

This polytope is a useful way to describe all clusterings, but we are particularly interested in “good” clusterings. To this end, instead of considering the above polytope, we study a special type of projection first introduced in Barnes et al. (1992); Hwang et al. (1998), the so-called bounded-shape partition polytope. (For a formal definition, see Section 2.3.) The vertices of bounded-shape partition polytopes exhibit several favorable properties Barnes et al. (1992), such as being a minimizer of the least-squares functional among all clusterings with the same cluster sizes. Most importantly, however, these vertices have strong separation properties: they allow for the construction of a separating power diagram, a generalized Voronoi diagram in $\mathbb{R}^d$ with one polyhedral cell containing the data points of each cluster Borgwardt (2010). In machine learning and other parts of operations research, this separation property is sometimes called piecewise-linear separability Bennett and Mangasarian (1992). See Figure 1 for a small example.

Contributions and Outline

In Section 2, we introduce some notation and review related work. We then use the known vertex characterization of Barnes et al. (1992) as a starting point for the new contributions of this paper, which we briefly summarize in the following paragraphs. Section 3 is a complete and self-contained presentation of our main results. Section 4 contains the necessary proofs. We conclude our discussion with a short outlook in Section 5. Some of the presented work is based on the authors’ dissertation Borgwardt (2010) and master’s thesis Happach (2016).

Section 3.1. First, we devise a new measure for the quality of a clustering, which is quite different from the stability measures used in the literature v. Luxburg (2010). We call it the volume of a clustering, since it is directly related to the volume of the normal cone of the vertex encoding this particular clustering. A large volume corresponds to a clustering of high quality, which provides an informal explanation of why some clustering algorithms work well in practice: for example, the computation of a least-squares assignment for fixed cluster sizes is in one-to-one correspondence with linear optimization over a single-shape partition polytope Borgwardt (2010). When choosing random sites, one is likely to find a clustering of large volume. The $k$-means algorithm can be interpreted as the repeated computation of least-squares assignments with changing sites in each iteration. Informally, it is likely to terminate with a clustering of large volume: the good clusterings are the ones most likely to be found.

Section 3.2. Second, we discuss the structure of the normal cone. The key requirement for this is an explicit representation of the normal cone. As there is no explicit description of the facets of the bounded-shape partition polytope (it is only defined as the projection of another polytope), this is a difficult task. Nonetheless, we are able to find an explicit representation: in a generalization of previous results Borgwardt (2010); Fukuda et al. (2003), we characterize the edges of all bounded-shape partition polytopes, which, informally, correspond to so-called movements or cyclical movements of items between clusters. This characterization allows us to construct the inner cone of a vertex, which is the polar of the normal cone. With this, we obtain the desired explicit representation of the normal cone, which allows us to investigate the structure of the site vectors located in its interior. In particular, we can identify convex areas that contain the sites of the representatives of all site vectors inducing the clustering (up to scaling, which does not change the clustering). We provide some proof-of-concept computations and a running example in which we compare clusterings of different volumes.

Section 3.3. Finally, we introduce a new stability criterion for the sites of a clustering and provide an algorithm for the computation of optimal sites in the sense of this criterion. These sites are maximally stable with respect to perturbation, i.e. all sites can be perturbed in any direction by the largest possible amount while staying in the normal cone. We use a classical approach from computational geometry to find such sites: we roll a $p$-norm unit ball into the normal cone, with the origin of the cone as gravity source, and compute where it gets “stuck”. The center of the ball gives the desired sites. This computation is readily expressed as a mathematical program. Its hardness comes from the fact that there can be exponentially many edges. Of course, this hardness is not surprising: there are related problems, like the $k$-means problem, for which finding a global optimum (globally optimal sites) is known to be NP-hard, even for $k = 2$ Aloise et al. (2009) or for data in the Euclidean plane Mahajan et al. (2012).

2 Preliminaries

We begin with some standard notation.

2.1 Clusterings, Least Squares Assignments and Power Diagrams

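Let $d, k, n \in \mathbb{N}$ be fixed throughout this paper. Let $X = \{x_1, \dots, x_n\} \subset \mathbb{R}^d$ be a set of distinct non-zero data points and for $m \in \mathbb{N}$ define $[m] := \{1, \dots, m\}$. We call a partition $C = (C_1, \dots, C_k)$ of $X$ a clustering and denote by $(|C_1|, \dots, |C_k|)$ its shape. For $i \in [k]$, we call $C_i$ the $i$-th cluster of $C$ and $|C_i|$ its size. Let $s^-, s^+ \in \mathbb{Z}^k_{\ge 0}$ with $s^-_i \le s^+_i$ for all $i \in [k]$ be the lower and upper bounds on the cluster sizes.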

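A clustering $C$ is said to be feasible if its shape satisfies $s^-_i \le |C_i| \le s^+_i$ for all $i \in [k]$, i.e. componentwise. We only consider feasible clusterings in this paper. The clustering $C$ is called separable if all pairs of clusters are linearly separable, i.e. for all $i \neq j \in [k]$ there are $a \in \mathbb{R}^d$ and $\alpha \in \mathbb{R}$ such that $a^T x < \alpha < a^T y$ for all $x \in C_i$, $y \in C_j$. The hyperplane $\{z \in \mathbb{R}^d \mid a^T z = \alpha\}$ separates the clusters, and $\{z \mid a^T z < \alpha\}$ and $\{z \mid a^T z > \alpha\}$ are the respective open half-spaces. A constrained least squares assignment (LSA) to sites $a_1, \dots, a_k \in \mathbb{R}^d$ is a clustering $C$ minimizing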

$$ \sum_{i=1}^{k} \sum_{x \in C_i} \|x - a_i\|_2^2 \qquad (1) $$

over all clusterings with the same shape as $C$. If $C$ minimizes (1) over all feasible clusterings, we say that $C$ is a general LSA. We call $a = (a_1, \dots, a_k) \in \mathbb{R}^{d \cdot k}$ the site vector. Note that there is a close connection between (constrained) least squares assignments and so-called power diagrams Aurenhammer et al. (1998); Borgwardt (2010). Recall that power diagrams are a generalization of the well-known Voronoi diagrams Aurenhammer (1987). There are several equivalent ways to define a power diagram; we briefly recall the definition that is most convenient for our purposes Borgwardt (2015). Let $a \in \mathbb{R}^{d \cdot k}$ be a site vector with distinct sites $a_1, \dots, a_k$ and let $\alpha \in \mathbb{R}^k$. For $i \in [k]$, we call

$$ P_i := \{x \in \mathbb{R}^d \mid (a_j - a_i)^T x \le \alpha_i - \alpha_j \text{ for all } j \in [k] \setminus \{i\}\} $$

the $i$-th (power) cell of the power diagram $(P_1, \dots, P_k)$. For a convex set $S$, we call $\mathrm{int}(S)$ the interior and $\mathrm{bd}(S)$ the boundary of $S$. Aurenhammer et al. (1998) showed the following connection between constrained LSAs and power diagrams: if $C$ is a constrained LSA to the site vector $a$, then there is a power diagram with site vector $a$ satisfying $C_i \subseteq P_i$ for all $i \in [k]$. Conversely, if a power diagram with site vector $a$ satisfies $C_i \subseteq P_i$ for all $i \in [k]$, then $C$ is a constrained LSA to the site vector $a$. If $C_i \subseteq P_i$ ($C_i \subseteq \mathrm{int}(P_i)$) for all $i \in [k]$, we say that the power diagram (strongly) induces the clustering $C$, and we speak of a separating power diagram. We want to stress that this separation property is stronger than the mere existence of separating hyperplanes: the constructed cells also partition the underlying space.
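The objective (1) and the notion of a constrained LSA can be illustrated with a brute-force sketch for tiny inputs. This is only a toy under stated assumptions (hypothetical names `lsa_cost` and `constrained_lsa`, exponential running time, `shape` assumed to sum to the number of points); the text itself relies on linear optimization over the single-shape partition polytope instead.

```python
from itertools import permutations


def lsa_cost(clusters, sites):
    """Objective (1): sum of squared Euclidean distances of points to their cluster's site."""
    return sum(
        sum((xc - ac) ** 2 for xc, ac in zip(x, a))
        for cluster, a in zip(clusters, sites)
        for x in cluster
    )


def constrained_lsa(points, shape, sites):
    """Brute-force constrained LSA: minimize (1) over all clusterings with the given shape.

    Exponential in len(points); only meant for tiny toy instances.
    """
    k = len(shape)
    labels = [i for i in range(k) for _ in range(shape[i])]  # multiset of cluster indices
    best, best_cost = None, float("inf")
    for perm in set(permutations(labels)):  # every distinct labeling with this shape
        clusters = [[x for x, lab in zip(points, perm) if lab == i] for i in range(k)]
        cost = lsa_cost(clusters, sites)
        if cost < best_cost:
            best, best_cost = clusters, cost
    return best, best_cost
```

With points (0, 0), (0, 1), (5, 0), (5, 1), shape (2, 2) and sites (0, 0.5) and (5, 0.5), the optimal clustering puts the two left points into the first cluster with cost 4 · 0.25 = 1.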

2.2 Movements Between Clusterings

In order to compare two clusterings $C$ and $C'$, we use the clustering difference graph (CDG), cf. Borgwardt (2010), which is the labeled directed multigraph with node set $[k]$ and edge set constructed as follows: for each $x \in X$ with $x \in C_i$ and $x \in C'_j$, $i \neq j$, there is an edge $(i, j)$ with label $x$. W.l.o.g. we delete isolated nodes in the CDG; these correspond to clusters that are identical in $C$ and $C'$. Recall that a node is isolated if it is not incident to any edge. We can derive the clustering $C'$ from $C$ by applying operations corresponding to the edges of the CDG:

Let $(i_1, \dots, i_{t+1})$ be an edge path in the CDG with labels $x_{j_1}, \dots, x_{j_t}$. Applying the movement

$$ M: C_{i_1} \xrightarrow{x_{j_1}} C_{i_2} \xrightarrow{x_{j_2}} \cdots \xrightarrow{x_{j_t}} C_{i_{t+1}} $$

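to $C$ means deriving the clustering $C'$ by setting $C'_{i_l} := (C_{i_l} \setminus \{x_{j_l}\}) \cup \{x_{j_{l-1}}\}$ for all $l = 2, \dots, t$, $C'_{i_1} := C_{i_1} \setminus \{x_{j_1}\}$, $C'_{i_{t+1}} := C_{i_{t+1}} \cup \{x_{j_t}\}$, and $C'_i := C_i$ for all $i \notin \{i_1, \dots, i_{t+1}\}$.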

If $i_1 = i_{t+1}$, i.e. in case of a cycle, we speak of a cyclical movement. We then obtain $C'_{i_1} = (C_{i_1} \setminus \{x_{j_1}\}) \cup \{x_{j_t}\}$, and all cluster sizes remain the same. The inverse (cyclical) movement is defined via the corresponding path (cycle) in the CDG of $C'$ and $C$. Clearly, one can obtain any clustering $C'$ from any other clustering $C$ by (greedily) decomposing their clustering difference graph into paths and cycles and applying the corresponding (cyclical) movements to $C$. If both clusterings have the same shape, then their CDG decomposes into cycles, i.e. only cyclical movements are required to transform one into the other Borgwardt (2010).
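The clustering difference graph and the application of a (cyclical) movement can be sketched as follows, assuming clusterings are encoded as label lists (`labels[m]` is the 0-based index of the cluster containing data point $x_m$). The helpers `cdg_edges` and `apply_movement` are hypothetical; a movement is represented as its sequence of edges (source cluster, target cluster, point index).

```python
def cdg_edges(labels_c, labels_c2):
    """Edges (i, j, m) of the CDG: point x_m lies in C_i and in C'_j with i != j."""
    return [(i, j, m) for m, (i, j) in enumerate(zip(labels_c, labels_c2)) if i != j]


def apply_movement(labels, path):
    """Apply a (cyclical) movement given as a list of (source, target, point index) edges."""
    new = list(labels)
    for src, tgt, m in path:
        # Each moved point must currently belong to the source cluster of its edge.
        assert new[m] == src, "point must lie in the source cluster"
        new[m] = tgt
    return new
```

Applying the cyclical movement $C_0 \to C_1 \to C_2 \to C_0$ with labels $x_0, x_2, x_4$ to the labels [0, 0, 1, 1, 2, 2] gives [1, 0, 2, 1, 0, 2]; all cluster sizes stay 2, and the CDG of the two clusterings consists of exactly these three edges.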

The following figures show a clustering of twelve data points in $\mathbb{R}^2$ with three clusters (black, blue, red), as well as two clusterings which can be derived from it by applying a movement (Figure 2) and a cyclical movement (Figure 3), respectively. The convex hull of each cluster is shaded in the respective color in order to highlight changes.

2.3 Bounded-Shape and Single-Shape Partition Polytopes

Let $N_P(v)$ be the normal cone of a polytope $P$ at a point $v \in P$. Let denote the linear subspace spanned by a set . Note that, if is a face of the normal cone $N_P(v)$, then is a face of .

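The polytope we investigate was introduced in Barnes et al. (1992) and further analyzed in Hwang et al. (1998). For a clustering $C$ and $i \in [k]$, let $\sigma_i := \sum_{x \in C_i} x$. The clustering vector of $C$ is $w(C) := (\sigma_1, \dots, \sigma_k) \in \mathbb{R}^{d \cdot k}$, and $P^\pm(X, k, s^-, s^+) := \mathrm{conv}\{w(C) \mid C \text{ is feasible}\}$ is called the bounded-shape partition polytope. If $s^- = s^+ = s$, then we speak of the single-shape partition polytope $P^=_s$. The bounds will typically be clear from the context; in this case, we use the simpler notation $P^\pm$ and $P^=$.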
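The clustering vector $w(C)$ simply stacks the cluster sums $\sigma_i$. A minimal sketch, assuming a clustering is given as a label list with `labels[m]` the 0-based cluster index of $x_m$ (the helper name `clustering_vector` is hypothetical):

```python
def clustering_vector(points, labels, k):
    """Return w(C) = (sigma_1, ..., sigma_k) flattened to length d*k,
    where sigma_i is the sum of the points assigned to cluster i."""
    d = len(points[0])
    w = [0.0] * (d * k)
    for x, i in zip(points, labels):
        for t in range(d):
            w[i * d + t] += x[t]  # block i of w holds sigma_i
    return w
```

For the points (1, 2), (3, 4), (5, 6) with labels [0, 1, 0] and k = 2, this yields $\sigma_1 = (6, 8)$ and $\sigma_2 = (3, 4)$, i.e. $w(C) = (6, 8, 3, 4)$.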

Note that $P^\pm$ is a projection of the generalized assignment polytope investigated in Gottlieb and Rao (1990). We want to stress that, since the bounded-shape and single-shape partition polytopes are defined as convex hulls, we do not have explicit information on (facet-defining) valid inequalities. This will be important in our discussion later on, as we have to study the edge structure of the polytopes (which may have exponential size) in order to construct the normal cones. The single-shape partition polytopes are of special interest for our purposes, since the computation of an optimal constrained LSA corresponds precisely to linear optimization over the corresponding single-shape partition polytope Borgwardt (2010). Another interesting special case is the all-shape partition polytope investigated in Fukuda et al. (2003), which is obtained by choosing $s^-$ and $s^+$ such that every shape is feasible. We obtain the following connection between single-shape and bounded-shape partition polytopes Happach (2016).

Lemma

The bounded-shape partition polytope is equal to the convex hull of single-shape partition polytopes with feasible shapes, i.e.

$$ P^\pm(X, k, s^-, s^+) = \mathrm{conv}\Big(\bigcup_{s^- \le s \le s^+} P^=_s\Big). $$

Proof

Denote by $U$ the union of all single-shape partition polytopes with feasible shapes, i.e.

$$ U = \bigcup_{s^- \le s \le s^+} P^=_s. $$

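Let $w(C)$ be the clustering vector of a feasible clustering $C$ with shape $s$. Then $w(C) \in P^=_s$ and, since $C$ is feasible, $s^- \le s \le s^+$, so $w(C) \in U$. This holds for all such clustering vectors and, as $P^\pm$ is their convex hull, we obtain $P^\pm \subseteq \mathrm{conv}(U)$.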

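On the other hand, clearly $P^=_s \subseteq P^\pm$ for all $s^- \le s \le s^+$. Thus, $U \subseteq P^\pm$. Taking the convex hull on both sides yields $\mathrm{conv}(U) \subseteq \mathrm{conv}(P^\pm) = P^\pm$, where the last equality is due to the fact that the bounded-shape partition polytope is convex.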

Note that Lemma 2.3 implies that every vertex of $P^\pm$ is a vertex of a single-shape partition polytope $P^=_s$ and that the normal cone of a vector in the bounded-shape partition polytope is contained in the one of the single-shape partition polytope, i.e. $N_{P^\pm}(w(C)) \subseteq N_{P^=}(w(C))$. In 1992, Barnes et al. (1992) gave a concrete characterization of the vertices of $P^\pm$.

Proposition (Barnes, Hoffman, Rothblum 1992)

The clustering vector $w(C)$ of a clustering $C$ is a vertex of $P^\pm$ if and only if there are $a_1, \dots, a_k \in \mathbb{R}^d$ and $\alpha \in \mathbb{R}^k$ satisfying the following statements.

1. If for , then .

2. If for , then .

3. If $x_l \in C_i$ for $i \in [k]$, then for all $j \in [k] \setminus \{i\}$

$$ (a_j - a_i)^T x_l < \alpha_i - \alpha_j. $$
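Condition 3 can be tested directly for given sites $a_1, \dots, a_k$ and scalars $\alpha$. Since $(a_j - a_i)^T x < \alpha_i - \alpha_j$ is equivalent to $a_j^T x + \alpha_j < a_i^T x + \alpha_i$, the condition says that the index of each point's own cluster is the unique maximizer of $a_i^T x + \alpha_i$, i.e. each point lies in the interior of its own power cell. The sketch below (hypothetical name `satisfies_condition_3`, clusters given as 0-based labels) checks only this strict separation; it does not check conditions 1 and 2.

```python
def satisfies_condition_3(points, labels, sites, alpha):
    """Check (a_j - a_i)^T x < alpha_i - alpha_j for every x in C_i and every j != i."""
    k = len(sites)
    for x, i in zip(points, labels):
        for j in range(k):
            if j == i:
                continue
            lhs = sum((aj - ai) * xc for aj, ai, xc in zip(sites[j], sites[i], x))
            if not lhs < alpha[i] - alpha[j]:
                return False  # x is not in the interior of its own power cell
    return True
```

For the points (0, 0), (0, 1), (5, 0), (5, 1) split into a left and a right cluster, the sites (0, 0) and (1, 0) with $\alpha = (0, -2.5)$ satisfy the condition, whereas $\alpha = (0, -6)$ moves the cell boundary past the right cluster and violates it.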

A proof is given in Barnes et al. (1992) and, with more technical details, in Hwang et al. (1998). We call a clustering that corresponds to a vertex a vertex clustering. Condition 3 states linear separability of the clusters of a vertex clustering with separation directions $a_j - a_i$ and right-hand sides $\alpha_i - \alpha_j$ for all $i \neq j$. This implies the existence of a separating power diagram. If the scalars additionally satisfy conditions 1 and 2, then the resulting separation fulfills some further extraordinary properties: for example, if two clusters satisfy , then and are “-separable” Aviran et al. (2002); in particular, they can be separated by a hyperplane containing the origin. Any vector can be chosen to construct suitable such that the properties of Proposition 2.3 are satisfied Barnes et al. (1992). This gives the following corollary Borgwardt (2010); Happach (2016).

Corollary

Let such that is a vertex of and let . Then there is a separating power diagram with site vector such that for all . If , then this power diagram satisfies for all . If we choose , then there is an index such that .

Further, the normal cone of a vertex clustering of the single-shape partition polytope encodes exactly all site vectors that allow a separating power diagram Borgwardt (2010). Clearly, any positive scaling of a site vector in the normal cone stays in the normal cone. Thus $a$ and $\lambda a$ for $\lambda > 0$ yield the same constrained LSA. This was first proven by Aurenhammer et al. (1998), without using polyhedral theory.

Theorem (Aurenhammer, Hoffmann, Aronov 1998)

Let $a$ be the site vector of a constrained LSA. For all $\lambda > 0$, the site vectors $a$ and $\lambda a$ yield the same constrained LSA.

Before we state our main results, note that our assumption that the zero vector is not contained in $X$ is not a restriction, since the overall structure of a data set does not change when the whole set is translated by the same vector. This will make some of our arguments easier in the following. Moreover, we can interpret any movement as a translation of the clustering vector. Let $C, C'$ be clusterings such that their CDG is a path or cycle corresponding to a (cyclical) movement $M$. Then the difference $w(C') - w(C)$ of the clustering vectors is called the vector of the movement $M$. Note that, if $v$ is the vector of a movement, then the vector of the inverse movement is given by $-v$.
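The vector of a movement and its sign flip under inversion can be checked on a toy example. In this minimal sketch (hypothetical helpers `clustering_vector` and `movement_vector`; clusterings given as 0-based label lists), the vector of a movement is computed as $w(C') - w(C)$.

```python
def clustering_vector(points, labels, k):
    """w(C) = (sigma_1, ..., sigma_k), flattened to length d*k."""
    d = len(points[0])
    w = [0.0] * (d * k)
    for x, i in zip(points, labels):
        for t in range(d):
            w[i * d + t] += x[t]
    return w


def movement_vector(points, labels_before, labels_after, k):
    """Vector of the movement transforming C (labels_before) into C' (labels_after)."""
    before = clustering_vector(points, labels_before, k)
    after = clustering_vector(points, labels_after, k)
    return [b - a for a, b in zip(before, after)]
```

Moving the point (3, 4) from $C_1$ to $C_2$ gives the vector (-3, -4, 3, 4): the point's coordinates leave the first block and enter the second. The inverse movement yields exactly the negated vector.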

3 Main results

We begin each of the three sections with a brief overview.

3.1 Volume of Clusterings

Overview: A vertex of a bounded-shape partition polytope can be computed efficiently by linear programming. The linear objective vector itself lists a set of sites for the construction of a separating power diagram for the clusters Barnes et al. (1992); Borgwardt (2010); Hwang et al. (1998). In fact, the normal cone of the vertex with respect to the single-shape partition polytope represents precisely all the sites that can be used for the construction of a separating power diagram for this clustering. Sites in the normal cone of a vertex of the bounded-shape partition polytope satisfy even stronger separation properties. Power diagrams are invariant under scaling in the sense that a power diagram constructed for sites $a_1, \dots, a_k$ can equivalently be expressed as a power diagram constructed for the sites $\lambda a_1, \dots, \lambda a_k$ for all $\lambda > 0$. Combining all these properties allows us to quantitatively measure the space of all sites that allow the construction of a separating power diagram for a given clustering by the volume of its normal cone. This gives rise to a quality measure that we call the volume of a clustering.

Instead of looking at each site vector $a$ individually, we consider its equivalence class $\{\lambda a \mid \lambda > 0\}$ and choose the unit vector $a / \|a\|_2$ as a representative. This allows us to introduce a notion of the “distance of sites”. Let $L(\gamma)$ be the length of a curve $\gamma$ and $S^{d \cdot k}$ be the Euclidean unit sphere in $\mathbb{R}^{d \cdot k}$.

Definition (Distance of Sites)

Let $a, a' \in S^{d \cdot k}$ be two (normalized) site vectors. The distance of their equivalence classes is defined as the distance of the site vectors on the unit sphere, i.e.

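$$ d(a, a') := \inf\{L(\gamma) \mid \gamma: [0, 1] \to \mathbb{R}^{d \cdot k},\ \gamma(0) = a,\ \gamma(1) = a',\ \gamma(t) \in S^{d \cdot k}\ \forall t \in [0, 1]\}. $$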

Note that $d$ is a metric that takes values between $0$ and $\pi$, and that the infimum always exists, because $L(\gamma) \ge 0$ for all curves $\gamma$.
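On the unit sphere, the infimum in this definition is attained by a great-circle arc, so for normalized vectors $d(a, a') = \arccos(a^T a')$. The minimal sketch below (hypothetical name `site_distance`) first normalizes two arbitrary non-zero site vectors, so it directly compares their equivalence classes under positive scaling.

```python
import math


def site_distance(a, b):
    """Great-circle distance between the classes {lam * a} and {lam * b}, lam > 0."""
    na = math.sqrt(sum(t * t for t in a))
    nb = math.sqrt(sum(t * t for t in b))
    cos = sum(s * t for s, t in zip(a, b)) / (na * nb)
    # Clamp to [-1, 1] to guard against rounding errors before taking arccos.
    return math.acos(max(-1.0, min(1.0, cos)))
```

Orthogonal site vectors have distance $\pi/2$, positive multiples of each other have distance 0, and opposite vectors attain the maximum value $\pi$.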

If a vertex clustering $C$ of $P^\pm$ or $P^=$ has a large normal cone, then a randomly chosen site vector is likely to lie in its cone, so, by the previous observations, the chosen site vector defines a separating power diagram inducing $C$. We are interested in measuring the volume of the normal cones of the bounded-shape and single-shape partition polytopes in order to characterize “good” clusterings.

We follow the notation of Bonifas et al. (2014), who used the volume of normal cones for the investigation of polytope diameters. For a cone $K \subseteq \mathbb{R}^{d \cdot k}$, we call $B(K) := K \cap S^{d \cdot k}$ the base of $K$. The volume of $K$ is defined as the $(d \cdot k - 1)$-dimensional volume of $B(K)$ and is denoted by $\mathrm{vol}(K)$. A set $S \subseteq S^{d \cdot k}$ is called spherically convex if, for all $a, a' \in S$, the geodesic $\gamma$ connecting $a$ and $a'$ with $\gamma(0) = a$ and $\gamma(1) = a'$ is also contained in $S$. Recall that a geodesic is a curve on the sphere of shortest length. Note that the base of a cone is itself spherically convex. These notions allow us to introduce a new term: the volume of a clustering.

Definition (Volume of a Clustering)

Let $C$ be a feasible clustering and let $N_{P^\pm}(w(C))$ and $N_{P^=}(w(C))$ be its normal cones with respect to $P^\pm$ and $P^=$, respectively. We define

$$ \mu^\pm(C) := \frac{\mathrm{vol}(N_{P^\pm}(w(C)))}{\mathrm{vol}(\mathbb{R}^{d \cdot k})} $$

to be the BHR volume of $C$ (as a tribute to the first authors discussing this polytope Barnes et al. (1992)) and

$$ \mu^=(C) := \frac{\mathrm{vol}(N_{P^=}(w(C)))}{\mathrm{vol}(\mathbb{R}^{d \cdot k})} $$

to be the LSA volume of $C$.

The volumes of a clustering put the volume of the respective normal cones in relation to the volume of the whole space. By definition, $\mu^\pm(C), \mu^=(C) \in [0, 1]$, since $\mathbb{R}^{d \cdot k}$ is a cone containing all cones in $\mathbb{R}^{d \cdot k}$ and $\mathrm{vol}(\mathbb{R}^{d \cdot k})$ equals the area of the surface of $S^{d \cdot k}$. Note that, in practice, permutations of the clusters/sites yield the same clustering, and the respective normal cones have the same volume. For our theoretical purposes, it is more useful to consider each permutation as an individual clustering. However, note that, if all cluster bounds are symmetric, then the volumes defined above only take values between $0$ and $1/k!$, because they only take one of the $k!$ possible permutations into account. We would like to stress that, depending on $k$, even small values already represent good clusterings. By Lemma 2.3, $\mu^\pm(C) \le \mu^=(C)$ for all clusterings $C$.
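A simple way to approximate $\mu^\pm$ or $\mu^=$ numerically, assuming the normal cone is given by inequalities $v^T z \le 0$ (for instance, with the $v$'s being the edge directions at the vertex), is Monte Carlo sampling: directions of standard Gaussian vectors are uniformly distributed on the sphere, so the fraction of samples landing in the cone estimates $\mathrm{vol}(N)/\mathrm{vol}(\mathbb{R}^{d \cdot k})$. This is a hypothetical sketch (`estimate_cone_fraction`), not the volume computation used for the experiments in this paper, and its accuracy is limited to roughly $1/\sqrt{\text{samples}}$.

```python
import random


def estimate_cone_fraction(normals, dim, samples=100_000, seed=0):
    """Monte Carlo estimate of vol(N) / vol(R^dim) for N = {z : v^T z <= 0 for all v in normals}.

    Standard Gaussian vectors are rotation invariant, so their directions are uniformly
    distributed on the unit sphere; the hit rate estimates the fraction of the sphere
    covered by the base of N.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        if all(sum(vi * zi for vi, zi in zip(v, z)) <= 0 for v in normals):
            hits += 1
    return hits / samples
```

For the positive orthant in $\mathbb{R}^3$, described by the normals $-e_1, -e_2, -e_3$, the exact fraction is $1/8 = 0.125$, which the estimate reproduces up to sampling error.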

It does not matter whether we consider the whole space or just the affine hull of the polytope in Definition 3.1, because all normal cones have the same lineality space equal to with for , respectively. Since we consider the volume of a cone relative to the whole space, we can restrict ourselves to the affine hull of the polytopes. For further details on the restriction to the affine hull we refer to Happach (2016).

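In fact, one can show that $P^\pm$ and $P^=$ are contained in a $(d \cdot k - d)$-dimensional affine subspace. In order to see that $P^\star$ for $\star \in \{\pm, =\}$ is not full-dimensional, consider a vector $a = (\bar a, \dots, \bar a) \in \mathbb{R}^{d \cdot k}$ with $\bar a \in \mathbb{R}^d$ and an arbitrary clustering vector $w(C)$. Then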

$$ a^T w(C) = \sum_{i=1}^{k} \bar a^T \sigma_i = \sum_{i=1}^{k} \sum_{x \in C_i} \bar a^T x = \sum_{j=1}^{n} \bar a^T x_j, $$

so $a^T w(C)$ does not depend on $C$. The dimension of the set of all vectors consisting of $k$ copies of a $d$-dimensional vector is equal to $d$, so the dimension of $P^\star$ is at most $d \cdot k - d$.
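The computation above can be checked numerically on a toy instance: for $a = (\bar a, \dots, \bar a)$, the value $a^T w(C)$ is the same for every assignment of the points to $k$ clusters. The sketch (hypothetical name `invariant_values`) enumerates all $k^n$ labelings of a tiny point set, builds each clustering vector, and collects the distinct values of $a^T w(C)$.

```python
from itertools import product


def invariant_values(points, k, abar):
    """Collect a^T w(C) over all k^n labelings, where a = (abar, ..., abar)."""
    d = len(abar)
    a = list(abar) * k  # k stacked copies of abar
    values = set()
    for labels in product(range(k), repeat=len(points)):
        w = [0.0] * (d * k)  # clustering vector (sigma_1, ..., sigma_k)
        for x, i in zip(points, labels):
            for t in range(d):
                w[i * d + t] += x[t]
        values.add(round(sum(ai * wi for ai, wi in zip(a, w)), 9))
    return values
```

With the points (1, 2), (3, -1), (0.5, 4), $k = 3$ and $\bar a = (2, -1)$, every labeling gives $\sum_j \bar a^T x_j = 0 + 7 - 3 = 4$, so the returned set is {4.0}.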

Theorem

The measures $\mu^\pm$ and $\mu^=$ are well-defined. For $\star \in \{\pm, =\}$, $C$ is a vertex clustering of $P^\star$ if and only if $\mu^\star(C) > 0$, and then

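$$ \mu^\star(C) = \frac{\Gamma\left(\frac{d \cdot k}{2}\right)}{2\pi^{\frac{d \cdot k}{2}}} \int_{\mathrm{bd}(B(N_{P^\star}(w(C))))} d(z, a)\, da, \qquad (2) $$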

with and being the well-known Gamma function.

Note that the coefficient of the integral in (2) is just the inverse of the area of the surface of $S^{d \cdot k}$. We postpone the proof of this theorem to Section 4.1.
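The coefficient in (2) follows from the standard formula $2\pi^{m/2}/\Gamma(m/2)$ for the surface area of the unit sphere in $\mathbb{R}^m$, here with $m = d \cdot k$. A small sketch with hypothetical function names, using only the standard library:

```python
import math


def sphere_surface_area(m):
    """Surface area of the unit sphere in R^m: 2 * pi^(m/2) / Gamma(m/2)."""
    return 2 * math.pi ** (m / 2) / math.gamma(m / 2)


def integral_coefficient(m):
    """Coefficient Gamma(m/2) / (2 * pi^(m/2)) in front of the integral in (2), m = d*k."""
    return math.gamma(m / 2) / (2 * math.pi ** (m / 2))
```

For $m = 2$ this gives the circumference $2\pi$ of the unit circle, for $m = 3$ the area $4\pi$ of the unit sphere, and the coefficient is exactly the reciprocal of these values.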

3.2 Site Vectors in the Normal Cone

Overview: In this section we derive an explicit description of the normal cone of a vertex clustering of the bounded-shape partition polytope via a characterization of the edges of the polytope. This characterization generalizes previously known results on several special cases Borgwardt (2010); Fukuda et al. (2003); Gao et al. (1999). As a corollary, we also obtain a description of the normal cones of the single-shape partition polytopes. Informally, two clusterings correspond to neighboring vertices of the polytope if they differ by only a single movement or cyclical movement. Further, there has to exist a set of sites for which both clusterings allow the construction of a separating power diagram. The explicit information on the normal cone allows us to represent the set of sites that define a separating power diagram for a vertex clustering as convex areas in the original space of the data set. These areas contain a representative of each equivalence class of site vectors that induce the clustering. A running example of two clusterings with different volumes serves as an illustration.

It is well-known that the edges incident to a vertex of a polytope are normal vectors to the facets of the normal cone of the vertex. Therefore, in order to obtain an explicit representation of the normal cone, we characterize the edges of the bounded-shape partition polytope. Our characterization generalizes previous results for special cases Gao et al. (1999) and Borgwardt (2010).

The most closely related characterization is due to Fukuda et al. (2003), who explicitly stated the neighborhood of a vertex of the all-shape partition polytope. They showed that edges incident to correspond to movements of the form or with and , . Moreover, they proved that, if there are and such that , then and are in the same cluster in a vertex clustering, so one can assume that no such pair exists. In particular, if there are no multiples in the point set $X$, i.e. for all then all edges of correspond to movements which move a single element from one cluster to another. The following theorem extends these results to general lower and upper bounds $s^-$ and $s^+$.

Theorem

Let $C$, $C'$ be two clusterings such that $w(C)$ and $w(C')$ are adjacent vertices of $P^\pm$. Let further no three points in $X$ lie on a single line. Then $C$ and $C'$ differ by a single (cyclical) movement or by two movements and there are distinct such that both are of the form .

The different cases that can occur for the clustering difference graph according to Theorem 3.2 are depicted in Figure 7. Note that the case of two movements is degenerate and can only occur if all sites lie on a line. For the single-shape partition polytope, we obtain the following corollary Borgwardt (2010).

Corollary

Let $C$, $C'$ be two clusterings such that $w(C)$ and $w(C')$ are adjacent vertices of $P^=$. If no four points in $X$ lie on a single line, then $C$ and $C'$ only differ by a single cyclical movement.

We dedicate Section 4.2 to the proof of this theorem and corollary. Figure 8 shows two vertex clusterings of which are connected by an edge. The site vector of the plotted power diagram lies in the (boundary of the) normal cones of both vertices. As one can see, the data points on the boundary of the cells – which exist due to Corollary 2.3 – move to the other cell.

Figure 9 illustrates two possible clusterings of 27 data points (colored dots) in $\mathbb{R}^2$. These clusterings were computed by running the $k$-means algorithm twenty times with three random initial sites. In every iteration, the $k$-means algorithm computes an LSA to the current sites and updates each site to be the arithmetic mean of its corresponding cluster. This is repeated until the clustering does not change anymore. Note that the $k$-means algorithm is deterministic, but its result strongly depends on the choice of the initial sites. Whereas the clustering on the left-hand side (up to permutation of the colors) was the result in about half of the runs, the one on the right-hand side was computed only once. Intuitively, the left clustering captures the structure of the data better than the one on the right-hand side. This fits with our quantitative measure when looking at the normal cones of the respective vertices of the all-shape partition polytope and the single-shape partition polytope:

Both volumes of the clustering on the left-hand side are higher (, ) than the ones of the clustering on the right-hand side (, ). The area of the surface of the -dimensional sphere is . The volumes of the respective normal cones were computed with MATLAB using the function Volume Computation of Convex Bodies by Ben Cousins with an error tolerance of 0.001 Cousins (2015). The higher volumes of the left clustering can also be verified by computing the edges of the respective normal cones and projecting these (normalized) site vectors to the $d$-dimensional components corresponding to the sites. The sites of a site vector in the convex hull of the edges (in the normal cone) are then located in the convex hulls of the sites of the edges in the $d$-dimensional space.
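For reference, the $k$-means procedure described above can be sketched as Lloyd's iteration: assign every point to its nearest site, then move every site to the mean of its cluster, until the clustering no longer changes. This is a minimal, hypothetical sketch (`kmeans`, with initial sites drawn at random from the data points), not the implementation used for Figure 9; the nearest-site assignment corresponds to a general LSA when no cluster size bounds are imposed.

```python
import random


def _sq_dist(x, a):
    return sum((xc - ac) ** 2 for xc, ac in zip(x, a))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd-style k-means: alternate nearest-site assignment and mean updates."""
    rng = random.Random(seed)
    sites = [list(p) for p in rng.sample(points, k)]  # random initial sites
    labels = None
    for _ in range(max_iter):
        # Assignment step: every point goes to its nearest site.
        new_labels = [min(range(k), key=lambda i: _sq_dist(x, sites[i])) for x in points]
        if new_labels == labels:
            break  # the clustering did not change: stop
        labels = new_labels
        # Update step: each site becomes the arithmetic mean of its cluster.
        for i in range(k):
            members = [x for x, lab in zip(points, labels) if lab == i]
            if members:  # keep the old site if a cluster became empty
                sites[i] = [sum(c) / len(members) for c in zip(*members)]
    return labels, sites
```

On two well-separated groups of three points each, the iteration separates the groups regardless of which two data points are drawn as initial sites; on less structured data, different seeds may end in different clusterings, as observed for Figure 9.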

Figures 10 and 11 illustrate the three areas of the $d$-dimensional sites for the respective clusterings w.r.t. the all-shape and the single-shape partition polytope, respectively. In order to be comparable, the $(d \cdot k)$-dimensional site vectors (edges of the normal cones) were normalized to Euclidean norm 4. Of course, by the invariance under scaling of sites, this does not change the induced clustering.

As one can see, the areas on the left-hand side are larger than the ones on the right-hand side, especially for the all-shape partition polytope. Note that these areas do not mean that one can choose three sites arbitrarily within the three areas and obtain an optimal site vector for the respective clustering. Instead, for any site within one of these areas, there are sites in the respective other areas such that the corresponding site vector lies inside the normal cone of the clustering. Further, these areas contain the sites of representatives of all equivalence classes of site vectors. For the single-shape partition polytope, these areas depict representatives of all site vectors for which the respective clusterings are optimal constrained LSAs, see Figure 11.

3.3 Stability of Site Vectors

Overview: For any vertex clustering $C$ with clustering vector $w(C)$, the normal cone $N_{P^=}(w(C))$ contains all site vectors for which $C$ is an optimal constrained LSA, and the normal cone $N_{P^\pm}(w(C))$ contains all sites with the properties stated in Corollary 2.3. So far, we observed that clusterings with large normal cones are good in two senses: they are likely to be computed with randomly chosen sites, and they have high volume, i.e. each site vector strictly inducing this clustering can be perturbed by the same (high) amount without changing the clustering. Given a vertex clustering and its normal cone, we now characterize a most stable site vector in the normal cone. After introducing our notion of stability, which depends on the choice of a $p$-norm, we provide an optimization problem whose optimal solution gives us a site vector with the highest possible stability for this clustering. Moreover, we present how the optimal solutions for different $p$-norms are connected and how one can obtain an approximate solution for any $p$-norm via the Euclidean norm.

Let us begin by defining a notion for the stability of site vectors.

Definition (Stability of Site Vectors)

Let $p \in [1, \infty]$. For $a \in N_{P^\pm}(w(C))$, the BHR stability of the site vector $a$ w.r.t. $\|\cdot\|_p$ is

$$ \tau_p^\pm(a) := \max\{\delta > 0 \mid \bar a \in N_{P^\pm}(w(C)) \text{ for all } \bar a \text{ with } \|a - \bar a\|_p \le \delta\}. $$

For $a \in N_{P^=}(w(C))$, we call

$$ \tau_p^=(a) := \max\{\delta > 0 \mid \bar a \in N_{P^=}(w(C)) \text{ for all } \bar a \text{ with } \|a - \bar a\|_p \le \delta\} $$

the LSA stability of the site vector $a$ w.r.t. $\|\cdot\|_p$.

The stability measures indicate the amount by which we can perturb the site vector within the respective normal cone without changing the induced clustering. Note that we can extend the above definition to the equivalence classes of site vectors (and therefore to all site vectors) by inserting the corresponding representative into $\tau_p^\pm$ and $\tau_p^=$. Geometrically, a “most stable” site vector in the above sense can be described informally by dropping a $p$-norm unit ball into the normal cone with the origin as gravity center and computing where it “gets stuck” because it is blocked by the facets of the cone. The center of this unit ball then gives us a vector which lies “most centrally” within the normal cone. Figure 14 illustrates two 2D examples of this approach.
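For a polyhedral cone written as $\{z \mid v_j^T z \le 0 \ \forall j\}$, the stability can be evaluated in closed form: by Hölder's inequality, the $p$-norm ball of radius $\delta$ around $a$ satisfies $v^T z \le 0$ everywhere if and only if $v^T a + \delta \|v\|_q \le 0$, where $1/p + 1/q = 1$. Hence $\tau_p(a) = \min_j (-v_j^T a)/\|v_j\|_q$ for $a$ in the interior of the cone; redundant valid inequalities never lower this minimum. The sketch below (hypothetical names `dual_norm` and `stability`) assumes such a description of the normal cone is available, e.g. from the incident edges of the vertex.

```python
def dual_norm(v, p):
    """||v||_q with 1/p + 1/q = 1 (p = 1 gives q = inf, p = inf gives q = 1)."""
    if p == 1:
        return max(abs(t) for t in v)
    if p == float("inf"):
        return sum(abs(t) for t in v)
    q = p / (p - 1)
    return sum(abs(t) ** q for t in v) ** (1 / q)


def stability(a, normals, p=2):
    """Largest delta such that the p-norm ball around a stays in {z : v^T z <= 0 for all v}.

    Assumes a lies in the interior of the cone, so that the result is positive.
    """
    return min(-sum(vi * ai for vi, ai in zip(v, a)) / dual_norm(v, p) for v in normals)
```

For the quadrant $\{z \ge 0\}$ in $\mathbb{R}^2$ and $a = (2, 3)$, the stability is 2 in the Euclidean norm, the distance to the nearer facet. For the half-plane $z_1 + z_2 \ge 0$ and $a = (1, 1)$, it is $\sqrt{2}$ for $p = 2$ but only 1 for $p = \infty$.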

The optimization problem below yields a most stable site vector in the sense of Definition 3.3. We postpone the proofs of the theorems to Section 4.3.

Theorem

Let $p \in [1,\infty]$, and let $w(C)$ be a vertex with incident edges $v_1, \dots, v_t$. Then an optimal solution of the following optimization problem is a site vector inducing the clustering $C$ with the highest possible stability w.r.t. $p$.

$$\min \|z\|_2^2 \quad \text{s.t.} \quad v_j^T z \le \gamma_p^j \;\; \forall j \in [t], \quad z \in \mathbb{R}^{d\cdot k} \tag{3}$$

with $\gamma_p^j$ being the optimal objective values of the problems

$$\min v_j^T z \quad \text{s.t.} \quad z \in B_1^p(0),\; z \in \mathbb{R}^{d\cdot k} \tag{4}$$

for each $j \in [t]$. Here $B_1^p(0)$ denotes the closed $p$-norm ball with center $0$ and radius $1$.
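The auxiliary problems (4) need not be handed to a general-purpose solver: by Hölder's inequality, $\min\{v_j^T z : \|z\|_p \le 1\} = -\|v_j\|_q$ with $1/p + 1/q = 1$, so $\gamma_p^j = -\|v_j\|_q$. The sketch below computes this value together with an explicit minimizer (the equality case of Hölder's inequality) and checks both numerically; the function names are hypothetical.

```python
import numpy as np


def dual_exponent(p: float) -> float:
    """Hoelder conjugate q of p, i.e. 1/p + 1/q = 1."""
    if p == 1:
        return np.inf
    if np.isinf(p):
        return 1.0
    return p / (p - 1.0)


def gamma(v, p) -> float:
    """Optimal value of problem (4): min v^T z subject to ||z||_p <= 1."""
    return -float(np.linalg.norm(np.asarray(v, dtype=float), ord=dual_exponent(p)))


def minimizer(v, p) -> np.ndarray:
    """A point of the closed unit p-ball attaining gamma(v, p); v must be nonzero."""
    v = np.asarray(v, dtype=float)
    if np.isinf(p):                       # corner of the unit cube
        return -np.sign(v)
    if p == 1:                            # vertex of the cross-polytope
        z = np.zeros_like(v)
        i = int(np.argmax(np.abs(v)))
        z[i] = -np.sign(v[i])
        return z
    q = dual_exponent(p)                  # equality case of Hoelder's inequality
    w = np.abs(v) ** (q - 1.0)
    return -np.sign(v) * w / np.linalg.norm(w, ord=p)


v = np.array([3.0, -1.0, 0.0, 2.0])
for p in (1, 1.5, 2, 4, np.inf):
    z = minimizer(v, p)
    # gamma(v, p) and v @ z agree, and ||z||_p equals 1
    print(p, gamma(v, p), v @ z, np.linalg.norm(z, ord=p))
```

For $p \in \{1, \infty\}$ the minimizer is a vertex of the polytope $B_1^p(0)$, which is what a simplex-type LP solver would return as well.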

The stability of an optimal solution of (3) is 1. Note that the edges $v_j$ encode single (cyclical) movements for all $j \in [t]$ (Theorem 3.2 and Corollary 3.2). Problem (3) is a quadratic optimization problem with linear constraints, and the auxiliary problems (4) are linear optimization problems over a convex set. For $p \in \{1, \infty\}$ the feasible regions of (4) are polytopes. Note, however, that there might be exponentially many edges and, thus, we might have to solve exponentially many auxiliary problems.
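Once the edges are known, problem (3) is a small convex quadratic program. A minimal sketch, assuming the edge directions $v_j$ are given as the rows of a matrix `V` (a hypothetical input format) and using the closed form $\gamma_p^j = -\|v_j\|_q$ of (4). It applies Hildreth's method, i.e. exact coordinate ascent on the dual variables $\lambda \ge 0$ while maintaining $z = -\tfrac12 V^T \lambda$. This classical algorithm is swapped in for illustration and is not necessarily the method used by the authors; any convex QP solver could be used instead. Each sweep visits every edge once, so the cost grows with the (possibly exponential) number $t$ of edges.

```python
import numpy as np


def dual_exponent(p: float) -> float:
    """Hoelder conjugate q of p, i.e. 1/p + 1/q = 1."""
    if p == 1:
        return np.inf
    if np.isinf(p):
        return 1.0
    return p / (p - 1.0)


def most_stable_site(V, p, max_sweeps=100_000, tol=1e-12):
    """Approximately solve (3): min ||z||_2^2 s.t. v_j^T z <= gamma_p^j for all j.

    Hildreth's method: for each constraint j, maximize the dual function exactly
    in lam[j] (keeping lam[j] >= 0) and update z = -V^T lam / 2 accordingly.
    """
    V = np.atleast_2d(np.asarray(V, dtype=float))
    gam = -np.linalg.norm(V, ord=dual_exponent(p), axis=1)   # values of (4)
    sq = np.einsum("ij,ij->i", V, V)                          # ||v_j||_2^2
    lam = np.zeros(V.shape[0])
    z = np.zeros(V.shape[1])
    for _ in range(max_sweeps):
        largest_move = 0.0
        for j in range(V.shape[0]):
            # Exact dual coordinate step, clipped so that lam[j] stays >= 0.
            step = max(-lam[j], 2.0 * (V[j] @ z - gam[j]) / sq[j])
            lam[j] += step
            z -= 0.5 * step * V[j]
            largest_move = max(largest_move, abs(step) * np.sqrt(sq[j]) / 2)
        if largest_move < tol:              # z no longer changes noticeably
            break
    return z


V = np.array([[-1.0, 0.0], [1.0, -1.0]])   # hypothetical cone {x_1 >= 0, x_2 >= x_1}
print(most_stable_site(V, 2))        # approaches (1, 1 + sqrt(2))
print(most_stable_site(V, np.inf))   # approaches (1, 3)
```

A violated constraint is handled by projecting $z$ onto the hyperplane $v_j^T z = \gamma_p^j$, while a positive multiplier $\lambda_j$ of a slack constraint is reduced again; this is what lets the method settle on the facets that actually block the ball.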

Problem (3) models the facets of the normal cone blocking the unit ball. We justify the assumption that the ball is blocked by facets, and not by lower-dimensional faces of the cone, with a short counterexample in Section 4.3 after the proof of Theorem 3.3; it shows that lower-dimensional faces do not guarantee a stable site vector.

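Fix $p \in [1,\infty)$, let $z^{(p)}$ be an optimal solution of the optimization problem (3), and write $z^{(p)} = (z_1, \dots, z_k)$ with sites $z_i \in \mathbb{R}^d$. Then we can perturb one site, say $z_1$, within a $p$-norm ball with radius 1 without changing the clustering. If we choose $\delta < (1/k)^{1/p}$, then for $\tilde z = (\tilde z_1, \dots, \tilde z_k)$ such that $\|\tilde z_i - z_i\|_p \le \delta$ for all $i \in [k]$, we obtain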

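$$\|\tilde z - z^{(p)}\|_p = \Big(\sum_{i=1}^{k} \|\tilde z_i - z_i\|_p^p\Big)^{1/p} \le \Big(\sum_{i=1}^{k} \delta^p\Big)^{1/p} = \delta \cdot k^{1/p} < \Big(\frac{1}{k}\Big)^{1/p} \cdot k^{1/p} = 1.$$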

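Thus, $\tilde z$ lies in the closed $p$-norm ball of radius 1 around $z^{(p)}$ and hence in the normal cone, i.e. we can perturb all sites simultaneously, each within a $p$-norm ball of radius $\delta < (1/k)^{1/p}$, without changing the clustering.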

If $p = \infty$, we can even choose any $\delta < 1$, because then

$$\|\tilde z - z^{(\infty)}\|_\infty = \max\{\|\tilde z_i - z_i\|_\infty \mid i \in [k]\} \le \delta < 1.$$
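The perturbation argument only uses that the $p$-norm of the stacked vector in $\mathbb{R}^{d\cdot k}$ is assembled from the $p$-norms of its $k$ site blocks. The following sketch checks this numerically: every site is moved by a random vector of $p$-norm exactly $\delta$, just below the respective bound, and the total displacement is measured. The random stand-in for $z^{(p)}$ and the helper names are hypothetical; since only the difference $\tilde z - z^{(p)}$ enters, the base point does not affect the result.

```python
import numpy as np


def perturb_all_sites(z, delta, p, rng):
    """Move every site (row of z) by a random vector of p-norm exactly delta."""
    moves = rng.normal(size=z.shape)
    moves *= delta / np.linalg.norm(moves, ord=p, axis=1, keepdims=True)
    return z + moves


def total_distance(z, z_tilde, p) -> float:
    """p-norm of the stacked displacement in R^{d*k}."""
    return float(np.linalg.norm((z_tilde - z).ravel(), ord=p))


rng = np.random.default_rng(0)
k, d = 3, 2
z = rng.normal(size=(k, d))              # stand-in for the k sites of z^(p)

for p in (1.0, 2.0, 3.0):
    delta = 0.99 * (1.0 / k) ** (1.0 / p)    # any delta < (1/k)^(1/p)
    dist = total_distance(z, perturb_all_sites(z, delta, p, rng), p)
    print(p, dist)                       # delta * k^(1/p) = 0.99 up to rounding

dist_inf = total_distance(z, perturb_all_sites(z, 0.99, np.inf, rng), np.inf)
print("inf", dist_inf)                   # max of the block norms, 0.99 up to rounding
```

For finite $p$ the admissible radius per site shrinks like $k^{-1/p}$, whereas for $p = \infty$ it does not depend on the number of clusters.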

The following figures illustrate the two clusterings of Figure 9 together with the optimal sites (crosses in the respective colors) corresponding to Theorem 3.3 for the all-shape and single-shape partition polytopes with , respectively. The $(d\cdot k)$-dimensional optimal solutions of (3) were again scaled to Euclidean norm 4. Figures 15 and 16 depict the area of possible perturbation when only the first (black) site is perturbed and the other two sites are kept fixed, such that the clustering does not change.

In Figures 17, 18, 19 and 20, all sites can be perturbed simultaneously within the drawn $p$-norm balls without changing the clustering. Figures 19 and 20 show the areas of perturbation for the single-shape partition polytope, i.e. the corresponding LSA sites can be perturbed simultaneously within these areas without changing the LSA. The different sizes of the $p$-norm balls are due to the scaling of the optimal solutions of (3). Note that scaling the sites by a positive factor does not change the clustering, but these scaling factors do affect the radius of the $p$-norm ball; in fact, they directly correspond to the stability of the depicted optimal sites. The BHR stability measures of the left clustering are and , whereas for the right clustering we have and .
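The correspondence between scaling factors and stability can be made explicit. Scaling a site vector $a$ by $\lambda > 0$ scales every distance $-v_j^T a / \|v_j\|_q$ to the facets by $\lambda$, so the stability is positively homogeneous, $\tau(\lambda a) = \lambda\, \tau(a)$. An optimal solution $z$ of (3) has stability exactly 1 (if it had stability $\tau > 1$, then $z/\tau$ would be feasible with smaller norm), so after rescaling to Euclidean norm 4 its stability equals $4/\|z\|_2$. A minimal sketch on a hypothetical 2D toy cone, assuming the normal cone is $\{x : Vx \le 0\}$ with the edge directions as rows of `V`:

```python
import numpy as np


def stability(a, V, p) -> float:
    """p-norm distance from a to the boundary of the cone {x : V @ x <= 0}."""
    a = np.asarray(a, dtype=float)
    V = np.atleast_2d(np.asarray(V, dtype=float))
    q = np.inf if p == 1 else (1.0 if np.isinf(p) else p / (p - 1.0))  # dual exponent
    slack = -(V @ a)
    if np.any(slack <= 0):
        return 0.0                                # a is not in the interior
    return float(np.min(slack / np.linalg.norm(V, ord=q, axis=1)))


V = np.array([[-1.0, 0.0], [1.0, -1.0]])      # hypothetical cone {x_1 >= 0, x_2 >= x_1}
z = np.array([1.0, 1.0 + np.sqrt(2.0)])        # optimal solution of (3) for p = 2 here
scaled = 4.0 * z / np.linalg.norm(z)           # rescaled to Euclidean norm 4

print(stability(z, V, 2))                      # 1 up to rounding
print(stability(scaled, V, 2), 4.0 / np.linalg.norm(z))   # the two values agree
```

Comparing the stabilities of differently clustered solutions therefore only makes sense after fixing a common normalization, such as the Euclidean norm 4 used in the figures.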

For the LSA stability measures, the differences between the left and right clusterings are not as large, since the single-shape partition polytope fixes the cluster sizes. We obtain , (left) and , (right). One sees that the sites on the left-hand side can be perturbed much more than those on the right-hand side, which again supports the intuition that the left clustering is better than the right one.

Next, we show how to obtain a feasible approximate solution of (3). It is well known that for all there is a positive constant depending only on and the dimension of the space such that for all .

Theorem

Let , and be an optimal solution of (3). Then for all the vector satisfies . In particular, , i.e. both site vectors are in the same equivalence class. Moreover, the objective value of satisfies where is the optimal solution of (3) when considering the -norm ball.
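A norm-equivalence argument shows one way of turning a Euclidean solution into a feasible point of (3) for another $p$. This is a sketch under our own assumptions and does not necessarily reproduce the constant or the exact statement of the theorem above. Let $n = d\cdot k$, let $q$ be the dual exponent of $p$, and let $c = n^{\max(0,\, 1/2 - 1/p)}$, so that $\|v\|_q \le c\, \|v\|_2$ for all $v \in \mathbb{R}^n$. If $z^{(2)}$ solves (3) for $p = 2$, then $v_j^T (c\, z^{(2)}) \le -c\, \|v_j\|_2 \le -\|v_j\|_q = \gamma_p^j$. Hence $c\, z^{(2)}$ is feasible for (3) w.r.t. $p$ and has $p$-stability at least 1. Being a positive multiple of $z^{(2)}$, it induces the same clustering. The helper names and the 2D toy cone are hypothetical.

```python
import numpy as np


def stability(a, V, p) -> float:
    """p-norm distance from a to the boundary of the cone {x : V @ x <= 0}."""
    a = np.asarray(a, dtype=float)
    V = np.atleast_2d(np.asarray(V, dtype=float))
    q = np.inf if p == 1 else (1.0 if np.isinf(p) else p / (p - 1.0))  # dual exponent
    slack = -(V @ a)
    if np.any(slack <= 0):
        return 0.0
    return float(np.min(slack / np.linalg.norm(V, ord=q, axis=1)))


def norm_equivalence_constant(n: int, p) -> float:
    """c with ||v||_q <= c * ||v||_2 on R^n, where 1/p + 1/q = 1."""
    return float(n ** max(0.0, 0.5 - 1.0 / p))   # 1.0 / inf == 0.0 in Python


def rescale_euclidean_solution(z2, p) -> np.ndarray:
    """Scale an optimal solution of (3) for p = 2 so that it is feasible for p."""
    z2 = np.asarray(z2, dtype=float)
    return norm_equivalence_constant(z2.size, p) * z2


V = np.array([[-1.0, 0.0], [1.0, -1.0]])      # hypothetical cone {x_1 >= 0, x_2 >= x_1}
z2 = np.array([1.0, 1.0 + np.sqrt(2.0)])       # optimal solution of (3) for p = 2 here
for p in (1, 3, np.inf):
    zp = rescale_euclidean_solution(z2, p)
    print(p, zp, stability(zp, V, p))          # stability is at least 1 (up to rounding)
```

The rescaled vector is only feasible, not optimal: for $p = \infty$ on this cone it has squared norm $2\|z^{(2)}\|_2^2 \approx 13.66$, while the optimum of (3) is $(1, 3)$ with squared norm 10.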

Note that an upper bound on the objective value of the approximate solution implies a lower bound on its stability