
On spectral partitioning of signed graphs
A preliminary version posted at arXiv.

Andrew Knyazev, Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA 02139-1955. Email: knyazev@merl.com
Abstract

We argue that the standard graph Laplacian is preferable for spectral partitioning of signed graphs compared to the signed Laplacian. Simple examples demonstrate that partitioning based on signs of components of the leading eigenvectors of the signed Laplacian may be meaningless, in contrast to partitioning based on the Fiedler vector of the standard graph Laplacian for signed graphs. We observe that negative eigenvalues are beneficial for spectral partitioning of signed graphs, making the Fiedler vector easier to compute.



1 Background and Motivation

Spectral clustering groups together related data points and separates unrelated data points, using spectral properties of matrices associated with the weighted graph, such as graph adjacency and Laplacian matrices; see, e.g., [22, 23, 24, 25, 30, 28, 2, 32]. The graph Laplacian matrix is obtained from the graph adjacency matrix that represents graph edge weights describing similarities of graph vertices. The graph weights are commonly defined using a function measuring distances between data points, where the graph vertices represent the data points and the graph edges are drawn between pairs of vertices, e.g., if the distance between the corresponding data points has been measured.

Classical spectral clustering bisects the graph according to the signs of the components of the Fiedler vector, defined as the eigenvector of the graph Laplacian, constrained to be orthogonal to the vector of ones, and corresponding to the smallest eigenvalue; see [6].

Some important applications, e.g., Slashdot Zoo [19] and correlation [1] clustering, naturally lead to signed graphs, i.e., with both positive and negative weights. Negative values in the graph adjacency matrix result in more difficult spectral graph theory; see, e.g., [3].

Applying the original definition of the graph Laplacian to signed graphs breaks many useful properties of the graph Laplacian, e.g., leading to negative eigenvalues, which make the definition of the Fiedler vector ambiguous. The row-sums of the adjacency matrix may vanish, invalidating the definition of the normalized Laplacian. These difficulties can be avoided in the signed Laplacian, e.g., [8, 18, 20], defined similarly to the graph Laplacian, but with the diagonal entries positive and large enough to make the signed Laplacian positive semi-definite.

We argue that the original graph Laplacian is a more natural and beneficial choice, compared to the popular signed Laplacian, for spectral partitioning of signed graphs. We explain why the definition of the Fiedler vector should be based on the smallest eigenvalue, no matter whether it is positive or negative, motivated by the classical model of transversal vibrations of a mass-spring system, e.g., [10, 5], but with some springs having negative stiffness, cf. [14].

Inclusions with negative stiffness can occur in mechanics if the inclusion stores energy [21], e.g., being pre-stressed and constrained. We design inclusions with negative stiffness by pre-tensing the spring to be repulsive [4]. Allowing only the transversal movement of the masses, as in [5], gives the necessary constraints.

The resulting eigenvalue problem for the vibrations remains mathematically the same for the original graph Laplacian, no matter if some entries in the adjacency matrix of the graph are negative. In contrast, to motivate the signed Laplacian, the “inverting amplifier” model in [20, Sec. 7] uses a questionable argument, where the sign of negative edges changes in the denominator of the potential, but not in its numerator.

Turning to the justification of spectral clustering via relaxation, we compare the standard “ratio cut,” e.g., [24, 25], and the “signed ratio cut” of [20], noting that minimizing the signed ratio cut may favor cutting edges with small positive weights over edges with negative weights. We illustrate the behavior of the Fiedler vector for an intuitively trivial case of partitioning a linear graph modelled by vibrations of a string. We demonstrate numerically and analyze deficiencies of the signed Laplacian vs. the standard Laplacian for spectral clustering on a few simple examples.

Graph-based signal processing introduces eigenvectors of the graph Laplacian as natural substitutions for the Fourier basis. The construction of the graph Laplacian of [16] is extended in [15] to the case of some negative weights, leading to edge enhancing denoising of an image that can be used as a precursor for image segmentation along the edges. We extend the use of negative weights to graph partitioning in the present paper.

The rest of the paper is organized as follows. We introduce spectral clustering in Section 2 via the eigendecomposition of the graph Laplacian. Section 3 deals with a simple, but representative, example—a linear graph—and motivates spectral clustering by utilizing properties of low frequency mechanical vibration eigenmodes of a discrete string, as an example of a mass-spring model. Negative edge weights are then naturally introduced in Section 4 as corresponding to repulsive springs, and the effects of negative weights on the eigenvectors of the Laplacian are informally predicted by the repulsion of the masses connected by the repulsive spring. In Section 5, we present simple motivating examples, discuss how the original and signed Laplacians are introduced via relaxation of combinatorial optimization, and numerically compare their eigenvectors and the gaps in their spectra. Possible future research directions are spotlighted in Section 6.

2 Brief introduction to spectral clustering

Let entries of the real symmetric $n$-by-$n$ data similarity matrix $W$ be called weights, and let the matrix $D$ be diagonal, made of the row-sums of the matrix $W$. The matrix $W$ may be viewed as a matrix of scores that digitize similarities of pairs of data points. Similarity matrices are commonly determined from their counterparts, distance matrices, which consist of pairwise distances between the data points. The similarity is small if the distance is large, and vice versa. Traditionally, all the weights/entries in $W$ are assumed to be non-negative, which is automatically satisfied for distance-based similarities. We are interested in clustering in a more general case of both positive and negative weights, e.g., associated with pairwise correlations of the data vectors.

Data clustering is commonly formulated as graph partitioning, defined on data represented in the form of a graph $G = (V, E)$, with vertices in $V$ and edges in $E$, where the entries of the $n$-by-$n$ graph adjacency matrix $W$ are the weights of the corresponding edges. The graph is called signed if some edge weights are negative. A partition of the vertex set $V$ into subsets generates sub-graphs of $G$ with desired properties.

A partition in the classical case of non-weighted graphs minimizes the number of edges between the separated sub-graphs, while maximizing the number of edges within each of the sub-graphs. The goal of partitioning of signed graphs, e.g., into two vertex subsets $V_1$ and $V_2$, can be to minimize the total weight of the positive cut edges, while at the same time maximizing the absolute total weight of the negative cut edges. For uniform partitioning, one also needs to well-balance the sizes/volumes of $V_1$ and $V_2$. Traditional approaches to graph partitioning are combinatorial and naturally fall under the category of NP-hard problems, solved using heuristics, such as relaxing the combinatorial constraints.

Data clustering via graph spectral partitioning is a state-of-the-art tool, which is known to produce high quality clusters at reasonable costs of numerical solution of an eigenvalue problem for a matrix associated with the graph, e.g., $Lx = \lambda x$ for the graph Laplacian matrix $L = D - W$, where the scalar $\lambda$ denotes the eigenvalue corresponding to the eigenvector $x$. To simplify our presentation for the signed graphs, we mostly avoid the normalized Laplacian $D^{-1}L = I - D^{-1}W$, where $I$ is the identity matrix, e.g., since $D$ may be singular.
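For concreteness, these definitions take a few lines of numpy; the small matrix W below is an arbitrary illustration of ours, not an example from this paper:

```python
import numpy as np

# A small symmetric similarity matrix W; entries are the edge weights.
W = np.array([[0.0, 1.0, 0.2],
              [1.0, 0.0, 1.0],
              [0.2, 1.0, 0.0]])

D = np.diag(W.sum(axis=1))  # diagonal degree matrix of row-sums of W
L = D - W                   # standard graph Laplacian

# The row-sums of L vanish, so the vector of ones is a trivial
# eigenvector with the zero eigenvalue:
print(L @ np.ones(3))       # ~ [0. 0. 0.]
```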

The Laplacian matrix $L$ always has the number $0$ as an eigenvalue, and the column-vector of ones is always a trivial eigenvector of $L$ corresponding to the zero eigenvalue. Since the graph adjacency matrix is symmetric, the graph Laplacian matrix is also symmetric, so all eigenvalues of $L$ are real and the eigenvectors can be chosen to be mutually orthogonal. All eigenvalues are non-negative if the graph weights are all non-negative.

A nontrivial eigenvector of the matrix $L$ corresponding to the smallest eigenvalue of $L$, commonly called the Fiedler vector after the author of [6], bisects the graph into only two parts, according to the signs of the entries of the eigenvector. Since the Fiedler vector, as any other nontrivial eigenvector, is orthogonal to the vector of ones, it must have entries of opposite signs; thus, the sign-based bisection always generates a non-trivial two-way graph partitioning. We explain in Section 3 why such a partitioning method is intuitively meaningful.
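A minimal sketch of the sign-based bisection, using dense eigendecomposition (adequate for small graphs; iterative eigensolvers for large graphs are discussed in Section 5.5). The helper name fiedler_vector and the test matrix are our own illustrative choices; the loop implements the definition above: the smallest eigenvalue whose eigenvector is not the trivial constant one.

```python
import numpy as np

def fiedler_vector(W):
    """Eigenvector of L = D - W for the smallest eigenvalue,
    excluding the trivial constant eigenvector."""
    n = len(W)
    L = np.diag(W.sum(axis=1)) - W
    vals, vecs = np.linalg.eigh(L)     # eigenvalues in ascending order
    ones = np.ones(n) / np.sqrt(n)
    for k in range(n):                 # first non-constant eigenvector
        if abs(vecs[:, k] @ ones) < 0.9:
            return vecs[:, k]

# Two tight pairs {0, 1} and {2, 3}, connected only by weak edges:
W = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.1, 0.0],
              [0.1, 0.1, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
print(fiedler_vector(W) >= 0)          # expected bisection: {0, 1} vs. {2, 3}
```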

A multiway spectral partitioning is obtained from “low frequency eigenmodes,” i.e., eigenvectors corresponding to a cluster of the smallest eigenvalues of the Laplacian matrix. A cluster of (nearly)-multiple eigenvalues naturally leads to the need of considering a Fiedler invariant subspace of $L$, spanned by the corresponding eigenvectors, extending the Fiedler vector, since the latter may not be unique or well defined numerically in this case. The Fiedler invariant subspace provides a geometric embedding of the graph’s vertices, reducing the graph partitioning problem to the problem of clustering of a point cloud of the embedded graph vertices in a low-dimensional Euclidean space. However, the simple sign-based partitioning from the Fiedler vector has no evident extension to the Fiedler invariant subspace.

Practical multiway spectral partitioning can be performed using various competing heuristic algorithms, greatly affecting the results. While these same heuristic algorithms can as well be used in our context of signed graphs, for clarity of presentation we restrict ourselves in this work only to two-way partitioning using the component signs of the Fiedler vector.

The presence of negative weights in signed graphs brings new challenges to spectral graph partitioning:

  • negative eigenvalues of the graph Laplacian make the definition of the Fiedler vector ambiguous, e.g., whether the smallest negative or positive eigenvalue, or maybe the smallest-by-absolute-value eigenvalue, should be used in the definition;

  • difficult spectral graph theory, cf. [8] and [23];

  • possible zero diagonal entries of the degree matrix $D$ in the normalized Laplacian $D^{-1}L$, cf. [30];

  • violating the maximum principle—the cornerstone of a theory of connectivity of clusters [6];

  • breaking the connection of spectral clustering to random walks and Markov chains, cf. [24];

  • the quadratic form $x^T L x$ is not an “energy,” e.g., in the heat (diffusion) equation; cf. a forward-and-backward diffusion in [9, 31];

  • the graph Laplacian can no longer be viewed as a discrete analog of the Laplace-Beltrami operator on a Riemannian manifold that motivates spectral manifold learning; e.g., [11, 29].

Some of these challenges can be addressed by defining a signed Laplacian as follows. Let the matrix $\bar{D}$ be diagonal, made of the row-sums of the absolute values of the entries of the matrix $W$, which thus are positive, so that $\bar{D}^{-1}$ is well-defined. We define the signed Laplacian $\bar{L} = \bar{D} - W$ following, e.g., [8, 18, 20]. The signed Laplacian is positive semi-definite, with all eigenvalues non-negative. The Fiedler vector of the signed Laplacian is defined in [8, 18, 20] as an eigenvector corresponding to the smallest eigenvalue and different from the trivial constant vector. We finally note the recent work [7], although it is not a part of our current investigation.
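In the same vein, a sketch of the signed Laplacian under the definition cited above (the absolute-value row-sums go on the diagonal); the test matrix is again only illustrative:

```python
import numpy as np

def signed_laplacian(W):
    """Signed Laplacian: diagonal of row-sums of |W|, minus W."""
    return np.diag(np.abs(W).sum(axis=1)) - W

# A small signed adjacency matrix (illustrative values):
W = np.array([[ 0.0,  1.0, -1.0],
              [ 1.0,  0.0,  1.0],
              [-1.0,  1.0,  0.0]])

# The signed Laplacian is positive semi-definite, in contrast
# to the standard L = D - W, which here has a negative eigenvalue:
print(np.linalg.eigvalsh(signed_laplacian(W)))
print(np.linalg.eigvalsh(np.diag(W.sum(axis=1)) - W))
```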

In the rest of the paper, we justify spectral partitioning of signed graphs using the original definition of the graph Laplacian $L = D - W$, and argue that better quality clusters can generally be expected from the eigenvectors of the original $L$, rather than from the signed Laplacian $\bar{L}$. We use the intuitive mass-spring model to explain novel effects of negative stiffness or spring repulsion on the eigenmodes of the standard Laplacian, but we are unaware of a physical model for the signed Laplacian.

Figure 1: Low frequency eigenmodes of a string (left) and two disconnected pieces of the string (right).

3 Linear graph Laplacian and low frequency eigenmodes of a string

Spectral clustering can be justified intuitively via a well-known identification of the graph Laplacian matrix with a classical problem of vibrations of a mass-spring system without boundary conditions, with masses and springs, where the stiffness of the springs is related to the weights of the graph; see, e.g., [26]. References [27, 26] consider lateral vibrations, where [27] allows springs with negative stiffness. We prefer the same model, but with transversal vibrations, as in [5], although the linear eigenvalue problem is the same, for the original graph Laplacian, no matter whether the vibrations are lateral or transversal, under the standard assumptions of infinitesimal displacements from the equilibrium and no damping. The transversal model allows relating the linear mass-spring system to the discrete analog of an ideal string [10, Fig. 2] and provides the necessary constraints for us to introduce a specific physical realization of inclusions with the negative stiffness by pre-tensing some springs to be repulsive. We start with the simplest example, where the mass-spring system is a discrete string.

3.1 All edges with unit weights

Let $w_{i,i+1} = w_{i+1,i} = 1$, $i = 1, \ldots, n-1$, with all other entries zero, so that the graph Laplacian is the tridiagonal matrix

$$L = \begin{bmatrix} 1 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 1 \end{bmatrix} \qquad\qquad (3.1)$$

that has the nonzero entries $1$ and $-1$ in the first row, $-1$ and $1$ in the last row, and $-1$, $2$, $-1$ in every other row—a standard finite-difference approximation of the negative second derivative of functions with vanishing first derivatives at the end points of the interval. Its eigenvectors are the basis vectors of the discrete cosine transform; see the first five low frequency eigenmodes (the eigenvectors corresponding to the smallest eigenvalues) of $L$ displayed in the left panel of Figure 1. Let us note that these eigenmodes all turn flat at the end points of the interval.
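A quick numerical check of (3.1), with the size n = 100 as an illustrative assumption; the eigenvectors match the discrete cosine transform basis and are flat at both ends:

```python
import numpy as np

n = 100
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L[0, 0] = L[-1, -1] = 1          # free ends: all row-sums of L vanish

vals, vecs = np.linalg.eigh(L)   # ascending; vals[0] ~ 0, trivial mode
# DCT-II vector for mode k: cos(pi * k * (j + 1/2) / n), j = 0..n-1
k = 2
j = np.arange(n)
mode = np.cos(np.pi * k * (j + 0.5) / n)
mode /= np.linalg.norm(mode)
print(abs(vecs[:, k] @ mode))    # ~ 1.0: the eigenvector is this cosine
```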

The flatness is attributed to the vanishing first derivatives, which manifest themselves, e.g., in the fact that the Laplacian row sums always vanish, including in the first and last rows, corresponding to the “boundary.”

Eigenvectors of matrix (3.1) are well-known in mechanics, as they represent the shapes of the transversal vibration modes of a discrete analog of a string—a linear system of masses connected with springs. Figure 2 illustrates a system with four masses and three springs.

Figure 2: Traditional linear mass-spring system.

The frequencies squared of the vibration modes are the eigenvalues $\lambda$ of the graph Laplacian, e.g., [10, p. 15]. The eigenvectors of the graph Laplacian can be called eigenmodes because of this mechanical analogy. The smallest eigenvalues correspond to the low frequencies $\sqrt{\lambda}$, explaining the terminology used in the caption of Figure 1. Our system of masses is not attached, thus there is always a trivial eigenmode, where the whole system moves up/down, i.e., the eigenvector is constant, with the zero frequency/eigenvalue $\lambda = 0$.

If the system consists of $k$ completely separate components, each component can independently move up/down in a zero frequency vibration, resulting in the total multiplicity $k$ of the zero frequency/eigenvalue, where the corresponding eigenvectors are all piecewise constant, with discontinuities between the components. Such a system represents a graph consisting of $k$ completely separate sub-graphs and can be used to motivate $k$-way spectral partitioning.

In our case $k = 2$, the Fiedler vector is chosen orthogonal to the trivial constant eigenmode, and thus is not only piecewise constant, but also has strictly positive and negative components, determining the two-way spectral partitioning.

Figure 2 shows transversal displacements of the masses from the equilibrium plane for the first nontrivial mode, which is the Fiedler vector, where the two masses on the left side of the system move synchronously up, while the two masses on the right side of the system move synchronously down. This is the same eigenmode as drawn in red color in Figure 1 left panel for a similar linear system with a number of masses large enough to visually appear as a continuous string. Performing the spectral bisection (two-way partitioning) according to the signs of the Fiedler vector, one puts the data points corresponding to the masses in the left half of the mass-spring system into one cluster and those in the right half into the other. The Fiedler vector is not piecewise constant, since the partitioned components are not completely separate.

The amplitudes of the Fiedler vector components are also very important. The amplitude of the component squared after proper scaling can be interpreted as a probability of the corresponding data point to belong to the cluster determined according to the sign of the component. For example, the Fiedler vector in Figure 2 has small absolute values of its components in the middle of the system. With the number of masses increased, the components in the middle of the system approach zero. Perturbations of the graph weights may lead to the sign changes in the small components, putting the corresponding data points into a different cluster.

3.2 A string with a single weak link (small edge weight)

Next, we set a very small value $0 < w_{i,i+1} \ll 1$ for some index $i$, keeping all other entries of the matrix $W$ the same as before. In terms of clustering, this example represents a situation where there is an intuitively evident bisection, with one cluster containing all data points with the indexes $1, \ldots, i$ and the other with $i+1, \ldots, n$. In terms of our mass-spring system interpretation, we have a discrete string with one weak link, i.e., one spring with such a small stiffness that it makes the two pieces of the string nearly disconnected.

Let us check how the low frequency eigenmodes react to such a change. The first five eigenvectors of the corresponding Laplacian are shown in Figure 1, right panel. We observe that all the eigenvectors plotted in Figure 1 are aware of the softness (small stiffness) of the spring between the masses with the indexes $i$ and $i+1$. Moreover, their behavior around the soft spring is very specific—they are all flat on both sides of the soft spring!

The presence of the flatness in the low frequency modes of the graph Laplacian on both sides of the soft spring is easy to explain mathematically. When the value $w_{i,i+1}$ is small relative to the other entries, the matrix $L$ becomes nearly block diagonal, with two blocks that approximate the graph Laplacian matrices on the sub-strings to the left and to the right of the soft spring. The low frequency eigenmodes of the graph Laplacian thus approximate combinations of the low frequency eigenmodes of the graph Laplacians on the sub-intervals.

However, each of the low frequency eigenmodes of the graph Laplacian on the sub-interval is flat on both ends of the sub-interval, as explained above. Combined, it results in the flatness in the low frequency modes of the graph Laplacian on both sides of the soft spring.

The flatness is also easy to explain in terms of mechanical vibrations. The soft spring between the masses with the indexes $i$ and $i+1$ makes these masses nearly disconnected, so the system can be well approximated by two independent disconnected discrete strings with free boundary conditions, to the left and to the right of the soft spring. Thus, the low frequency vibration modes of the system are visually discontinuous at the soft spring and nearly flat on both sides of the soft spring.
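The flatness around a weak link is easy to reproduce numerically; the string length and the weak-edge location below are arbitrary illustrative choices:

```python
import numpy as np

n, i = 100, 49                       # 100 masses; weak spring after mass i
w = np.ones(n - 1)
w[i] = 1e-3                          # one very soft spring
W = np.diag(w, 1) + np.diag(w, -1)   # adjacency of the linear graph
L = np.diag(W.sum(axis=1)) - W

vals, vecs = np.linalg.eigh(L)
fiedler = vecs[:, 1]                 # smallest nontrivial eigenvalue
print(fiedler[i-3:i+5].round(4))     # nearly flat on each side of the jump
```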

Can we do better and make the flat ends bend in the opposite directions, making it easier to determine the bisection, e.g., using low-accuracy computations of the eigenvectors? In [15], where graph-based edge-preserving signal denoising is analyzed, we have suggested enhancing the edges of the signal by introducing negative edge weights in the graph, cf. [9]. In the next section, we put in a spring which separates the masses by repulsing them and see how the repulsive spring affects the low-frequency vibration modes.

4 Negative weights for spectral clustering

In our mechanical vibration model of a spring-mass system, the masses that are tightly connected have a tendency to move synchronously in low-frequency free vibrations. Analyzing the signs of the components, corresponding to different masses, of the low-frequency vibration modes determines the clusters.

The mechanical vibration model describes conventional clustering when all the springs are pre-tensed to create attracting forces between the masses, where the mass-spring system is subject to transverse vibrations, i.e., the masses are constrained to move only in a transverse direction, perpendicular to the plane of the mass-spring system. However, one can also pre-tense some of the springs to create repulsive forces between some pairs of masses, as illustrated in Figure 3. For example, the second mass is connected by the attractive spring to the first mass, but by the repulsive spring to the third mass in Figure 3. The repulsion has no effect in the equilibrium, since the masses are constrained to displacements only in the transversal direction, i.e., perpendicular to the equilibrium plane. When the second mass deviates, shown as the white circle in Figure 3, from its equilibrium position, shown as the black circle in Figure 3, the transversal component of the attractive force from the attractive spring on the left is oriented toward the equilibrium position, while the transversal component of the repulsive force from the repulsive spring on the right is in the opposite direction, resulting in opposite signs in the equation of the balance of the two forces. Since the stiffness is the ratio of the force and the displacement, the attractive spring on the left has effective positive stiffness, while the repulsive spring represents the inclusion with effective negative stiffness, due to the opposite directions of the corresponding forces.

In the context of data clustering formulated as graph partitioning, that corresponds to negative entries in the adjacency matrix. The negative entries in the adjacency matrix are not allowed in conventional spectral graph partitioning. However, the model of mechanical vibrations of the spring-mass system with repulsive springs is still valid, allowing us to extend the conventional approach to the case of negative weights.

Figure 3: Linear mass-spring system with repulsion.

The masses which are attracted move together in the same direction in low-frequency free vibrations, while the masses which are repulsed have the tendency to move in the opposite direction. Moreover, the eigenmode vibrations of the spring-mass system relate to the corresponding wave equation, where the repulsive phenomenon makes it possible for the time-dependent solutions of the wave equation to exponentially grow in time, if they correspond to negative eigenvalues.

Figure 3 shows the same linear mass-spring system as Figure 2, except that the middle spring is repulsive, bending the shape of the Fiedler vector in the opposite directions on different sides of the repulsive spring. The clusters in Figure 2 and Figure 3 are the same, based on the signs of the Fiedler vectors. However, the data points corresponding to the middle masses being repulsed more clearly belong to different clusters in Figure 3, compared to Figure 2, because the corresponding components in the Fiedler vector are larger by absolute value in Figure 3 vs. Figure 2. Determination of the clusters using the signs of the Fiedler vector is easier for larger components, since they are less likely to be computed with a wrong sign due to data noise or inaccuracy of computations, e.g., small number of iterations.

Figure 4: The same eigenmodes, but negative weights, original (left) and signed (right) Laplacians.

Figure 4 left panel displays the five eigenvectors, including the trivial one, for the five smallest eigenvalues of the same tridiagonal graph Laplacian as that corresponding to the right panel in Figure 1, except that the small positive entry $w_{i,i+1}$ of the weights for the same $i$ is substituted by a negative value in Figure 4. Figure 4 right panel displays the five leading eigenvectors of the corresponding signed Laplacian. The left panel of Figure 4 illustrates the predicted phenomenon of the repulsion, in contrast to the right panel. The Fiedler vector of the Laplacian, displayed in blue color in the left panel of Figure 4, is most affected by the repulsion, compared to the higher frequency vibration modes. This effect gets more pronounced if the negative weight increases by absolute value, as we observe in other tests not shown here.

The Fiedler vector of the signed Laplacian with the negative weight, displayed in blue color in the right panel of Figure 4, looks piecewise constant, just the same as the Fiedler vector of the Laplacian with the nearly zero weight shown in red color in Figure 1 right panel. We now prove that this is not a coincidence. Let us consider a linear graph corresponding to Laplacian (3.1). We first remove one of the middle edges, between the vertices $i$ and $i+1$, and define the corresponding graph Laplacian $L_0$. Second, we put this edge back, but with the negative unit weight, and define the corresponding signed Laplacian $\bar{L}$. It is easy to verify that

$$\bar{L} \;=\; L_0 \;+\; \begin{bmatrix} \ddots & & & & \\ & 1 & 1 & & \\ & 1 & 1 & & \\ & & & & \ddots \end{bmatrix}, \qquad\qquad (4.2)$$

with the $2$-by-$2$ block of ones located in the rows and columns $i$ and $i+1$,

where all dotted entries are zeros.

The Fiedler vector of $L_0$ is evidently piece-wise constant with one discontinuity at the missing edge, since the graph Laplacian $L_0$ corresponds to the two disconnected discrete string pieces. Let $y$ denote the Fiedler vector of $L_0$ shifted by the vector of ones and scaled so that its components of the opposite signs are simply $1$ and $-1$, while still $L_0\,y = 0$. Since $y_i + y_{i+1} = 0$ at the end points of the negative edge, we get $\bar{L}\,y = L_0\,y$ from (4.2); thus, also $\bar{L}\,y = 0$, i.e., $y$ is the Fiedler vector of both matrices $\bar{L}$ and $L_0$, where in the latter our only negative weight is simply nullified.
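This argument is easy to verify numerically; the string length and the edge location below are illustrative choices of ours:

```python
import numpy as np

n, i = 10, 4                          # edge between vertices i and i+1
w = np.ones(n - 1)

w[i] = 0.0                            # edge removed
W0 = np.diag(w, 1) + np.diag(w, -1)
L0 = np.diag(W0.sum(axis=1)) - W0     # Laplacian of two disconnected pieces

w[i] = -1.0                           # edge restored, negative unit weight
W = np.diag(w, 1) + np.diag(w, -1)
L_bar = np.diag(np.abs(W).sum(axis=1)) - W   # signed Laplacian

y = np.where(np.arange(n) <= i, 1.0, -1.0)   # piecewise constant +1 / -1
print(np.allclose(L0 @ y, 0), np.allclose(L_bar @ y, 0))  # True True
```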

Figure 5: Laplacian eigenmodes, original (left) and signed (right), for a “noisy” mass-spring string with a negative weight at one edge.

5 Comparing the original vs. signed Laplacians

We present a few simple motivating examples, discuss how the original and signed Laplacians are introduced via relaxation of combinatorial optimization, and compare their eigenvectors and gaps in the spectra, computed numerically for these examples.

5.1 Linear graph with noise

We consider another standard linear mass-spring system with one repulsive spring between a pair of adjacent masses, but add to the graph adjacency matrix an extra full random matrix with entries uniformly distributed on a small positive interval, modelling noise in the data. It turns out that in this example the two smallest eigenvalues of the signed Laplacian form a cluster, making the individual eigenvectors unstable with respect to the additive noise and leading to meaningless spectral clustering, if based on the signs of the components of either of the two eigenvectors. Specifically, the exact Laplacian eigenmodes are shown in Figure 5: the original Fiedler vector (left panel) and both eigenvectors of the signed Laplacian (right panel). The Fiedler vector of the original Laplacian clearly suggests the perfect cut. Neither the first nor the second (giving it the benefit of the doubt) exact eigenvector of the signed Laplacian results in meaningful clusters, using the signs of the eigenvector components as suggested in [20].
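A hedged sketch of this experiment; the sizes, the noise level, and the random seed are our own assumptions, so the computed results only qualitatively follow Figure 5:

```python
import numpy as np

rng = np.random.default_rng(0)
n, i = 100, 49
w = np.ones(n - 1)
w[i] = -1.0                                   # one repulsive spring
W = np.diag(w, 1) + np.diag(w, -1)
noise = 0.01 * rng.random((n, n))
W = W + noise + noise.T                       # symmetric "noise" similarities
np.fill_diagonal(W, 0.0)

L = np.diag(W.sum(axis=1)) - W                # original Laplacian
L_bar = np.diag(np.abs(W).sum(axis=1)) - W    # signed Laplacian

for name, A in (("original", L), ("signed", L_bar)):
    vecs = np.linalg.eigh(A)[1]
    signs = np.sign(vecs[:, 0])               # leading (smallest) eigenvector
    print(name, int(np.count_nonzero(np.diff(signs))))
# Expected qualitatively: one sign change, at the repulsive edge, for the
# original Laplacian; many scattered sign changes for the signed one.
```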

5.2 “Cobra” graph

Let us consider the mass-spring system in Figure 6, assuming all springs have the same strength, except for one weak spring, and where one of the springs repulses the two masses it connects. Intuition suggests two alternative partitionings: (a) cutting the weak spring, thus separating the “tail” of the system, and (b) cutting the repulsive spring and one of the attracting springs, separating a small group of masses from the rest of the system. Partitioning (a) cuts the weak, but attractive, spring, while partitioning (b) cuts one repulsive and one attracting spring of the same absolute strength, “canceling” each other’s influence. If the cost function minimized by the partitioning were the total sum of the weights of the removed edges, partitioning (a) would be costlier than (b). Within the variants of partitioning (b), the most balanced one is preferred. Let us now examine the Fiedler vectors of the spectral clustering approaches under our consideration.

The graph corresponding to the mass-spring system in Figure 6, assuming all edges have unit weights, except for the small positive weight of the weak edge and the negative unit weight of the repulsive edge, has the adjacency matrix

(5.3)
Figure 6: Mass-spring system with repulsive springs.

Let us also consider a graph like the one corresponding to the mass-spring system in Figure 6, but with the repulsive spring eliminated. We nullify the negative weight in the graph adjacency matrix, replacing it by zero, and denote the corresponding graph Laplacian matrix by $L_0$.

Figure 7 displays the corresponding Fiedler vectors of the original Laplacian (top left), the original Laplacian with the negative weight nullified (top right), and both main modes of the signed Laplacian (bottom). The original Laplacian (top left) suggests the meaningful balanced clustering of variant (b). Dropping the negative weight results in cutting the weakly connected tail of the cobra, see Figure 7 top right. The first eigenvector of the signed Laplacian in Figure 7 bottom right appears meaningless for clustering, even though it is far from looking constant. The second eigenvector of the signed Laplacian in Figure 7 bottom left suggests cutting off a single vertex, which is not well balanced.

Figure 7: Laplacian eigenvectors: the original (top left), the original with the negative weight nullified (top right), and the signed Laplacian first (bottom right) and second (bottom left) eigenvectors, for the “cobra” mass-spring system of Figure 6.

5.3 “Dumbbell” graph

Figure 8: Dumbbell graph, with two negative edges marked thick red.
Figure 9: Dumbbell graph, eigenvectors of the original Laplacian (left) and the signed Laplacian (right).

Our final example is the “dumbbell” graph, displayed in Figure 8, consisting of two complete sub-graphs, of slightly unequal sizes to break the symmetry, attracted by two edges with positive weights and at the same time repelled by two other edges with negative weights, where all weights are unit by absolute value. Since the weights of the edges between the two complete sub-graphs average to zero, intuition suggests cutting all these edges, separating the two complete sub-graphs.

Figure 9 displays the corresponding eigenvectors of the original (left) and the signed (right) Laplacian. The signs of the components of the Fiedler vector in the left panel clearly point to the intuitively expected bisection, keeping the two complete sub-graphs intact. The eigenvector of the signed Laplacian in Figure 9 (right) is quite different and suggests clustering the two vertices of the first complete sub-graph incident to the positive bridging edges together with the second complete sub-graph, cutting off not only the edges with negative weights, but also a large number of edges with positive weights connecting these two vertices within the first complete sub-graph. The positive components of this eigenvector at these two vertices counter-intuitively cut them off from the first complete sub-graph vertex set and cluster them with the vertices of the second complete sub-graph, due to the presence of the two edges with positive weights.
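A sketch of a dumbbell-like test; the clique sizes and the endpoints of the four bridging edges are hypothetical stand-ins for those in Figure 8:

```python
import numpy as np

n1, n2 = 10, 12                     # two cliques of slightly unequal sizes
n = n1 + n2
W = np.zeros((n, n))
W[:n1, :n1] = 1.0                   # first complete sub-graph
W[n1:, n1:] = 1.0                   # second complete sub-graph
np.fill_diagonal(W, 0.0)
bridges = [(0, n1, 1.0), (1, n1 + 1, 1.0),        # two positive edges
           (2, n1 + 2, -1.0), (3, n1 + 3, -1.0)]  # two negative edges
for a, b, wt in bridges:
    W[a, b] = W[b, a] = wt

vals, vecs = np.linalg.eigh(np.diag(W.sum(axis=1)) - W)
ones = np.ones(n) / np.sqrt(n)
k = next(k for k in range(n) if abs(vecs[:, k] @ ones) < 0.9)
v = vecs[:, k]                      # Fiedler vector of the original Laplacian
print(np.unique(np.sign(v[:n1])), np.unique(np.sign(v[n1:])))
# Expected: opposite constant signs, one per clique.
```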

5.4 Spectral clustering via relaxation

A common approach to formulate spectral graph partitioning is via relaxation of combinatorial minimization problems, even though it is difficult to mathematically analyze how different cost functions in the combinatorial formulation affect clustering determined via their relaxed versions.

Let us compare the standard “ratio cut,” e.g., [24, 25], leading to the traditional graph Laplacian, and the “signed ratio cut” of [20], used to justify the definition of the signed Laplacian. Let a graph with the set of vertices $V$ be cut into two sub-graphs induced by $V_1$ and $V_2$. The cut value $\mathrm{cut}(V_1, V_2)$ is defined as the number of cut edges for unweighted graphs and the sum of the weights of cut edges for weighted graphs. In signed graphs, thus, $\mathrm{cut}(V_1, V_2) = \mathrm{cut}^+(V_1, V_2) - \mathrm{cut}^-(V_1, V_2)$, where $\mathrm{cut}^+$ ($\mathrm{cut}^-$) denotes the sum of the absolute values of the weights of the positive (negative) cut edges. The combinatorial balanced graph partitioning minimizes the ratio of $\mathrm{cut}(V_1, V_2)$ and the sizes of the partitions; its relaxation gives the spectral partitioning using the Fiedler vector of the graph Laplacian.

The signed ratio cut of [20] is defined by substituting for the “cut” the “signed cut,” $\mathrm{scut}(V_1, V_2) = 2\,\mathrm{cut}^+(V_1, V_2) + \mathrm{links}^-(V_1) + \mathrm{links}^-(V_2)$, where $\mathrm{links}^-(V_i)$ denotes the total absolute weight of the negative edges within $V_i$. However, the total absolute value $\mathrm{links}^-(V)$ of all negative edges in the signed graph remains constant, no matter what $V_1$ is. We notice that, up to this constant value, $\mathrm{scut}(V_1, V_2)$ is equal to $2\,\mathrm{cut}^+(V_1, V_2) - \mathrm{cut}^-(V_1, V_2)$.

This expression is similar to that of $\mathrm{cut}(V_1, V_2)$, but the term $\mathrm{cut}^+(V_1, V_2)$ appears with the multiplier $2$, which suggests that the cuts minimizing quantities involving $\mathrm{scut}$ could tend to ignore the edges with negative weights, focusing instead on cutting the edges with small positive weights. In deep contrast, the positive and negative weights play equal roles in the definition of $\mathrm{cut}(V_1, V_2)$.
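The bookkeeping behind this claim is one line of algebra (notation as above, following [20]):

$$\mathrm{scut}(V_1, V_2) = 2\,\mathrm{cut}^+(V_1, V_2) + \mathrm{links}^-(V_1) + \mathrm{links}^-(V_2) = 2\,\mathrm{cut}^+(V_1, V_2) - \mathrm{cut}^-(V_1, V_2) + \mathrm{links}^-(V),$$

since every negative edge lies either inside $V_1$, inside $V_2$, or across the cut, so that $\mathrm{links}^-(V_1) + \mathrm{links}^-(V_2) = \mathrm{links}^-(V) - \mathrm{cut}^-(V_1, V_2)$, and the total $\mathrm{links}^-(V)$ does not depend on the partitioning.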

5.5 Comparing the eigenvectors

It is challenging to directly quantitatively compare various spectral clustering formulations where the clusters are determined from eigenvectors, since the eigenvectors depend on matrix coefficients in a complex way. We have to rely on simple examples, where we can visualize shapes of the eigenvectors and informally argue which kinds of shapes are beneficial for clustering.

To add to the trouble, there is apparently still no algorithm universally accepted by experts for an ultimate determination of multiway clusters from several eigenvectors. With this in mind, we restrict ourselves to determining the clusters from the component signs of only one eigenvector—the Fiedler vector for the traditional Laplacian, assuming the corresponding eigenvalues are simple. For the signed Laplacian, the analog of the Fiedler vector is defined in [20] as corresponding to the smallest, or second smallest, eigenvalue of the signed Laplacian, depending on whether the trivial constant eigenvector is absent or present.

In practice, however, this single eigenvector that determines the clustering is computed only approximately, typically being contaminated by other eigenvectors corresponding to the nearby eigenvalues, especially when the eigenvalues are clustered, so one needs to take these other eigenvectors into account. Our first goal is to check the shapes of several exact eigenmodes already displayed in Figures 1 and 4 and to argue which shapes can be more suitable for automatic partitioning.

Figure 4 right panel displays the eigenmodes of the signed Laplacian for the same weights as in the left panel for the original Laplacian. We observe that, indeed, as we have proved above, one of the eigenvectors is piece-wise constant, as in Figure 1 right panel.

Moreover, the shapes of the other eigenmodes of the signed Laplacian in Figure 4 right panel also look more similar to those in Figure 1 right panel, corresponding to the zero weight, than to those in Figure 4 left panel, corresponding to the original graph Laplacian with the same weights.

The displayed eigenvectors of both the original and signed Laplacian exhibit jumps in the same location of the negative weight in Figure 4. However, the jumps are more pronounced in Figure 4 left panel (original Laplacian) due to sharp edges, compared to those in Figure 4 right panel (signed Laplacian), making the location of the former jumps potentially easier to detect automatically than the latter ones, if the eigenvectors are perturbed due to, e.g., numerical inaccuracies.

Figure 10: Approximate Laplacian eigenmode, unit (a: top left), zero (b: top right), and negative weight at one edge for the original (c: bottom left) and signed (d: bottom right) Laplacians.

Now we turn our attention to the single eigenvector, but approximated using an iterative eigenvalue/eigenvector solver (eigensolver); e.g., [13, 33]. To set up a direct numerical comparison for our string example, we need to choose a practical eigensolver, so let us briefly discuss computational aspects of spectral clustering. The Fiedler vector, or a group of the eigenvectors, corresponding to the left-most eigenvalues of a symmetric eigenvalue problem needs to be computed iteratively. The size of the Laplacian matrix is equal to the number of data points, which in modern applications is often extremely large. Most textbook eigensolvers, especially based on matrix transformations, become impractical for large scale problems, where in some cases the Laplacian matrix itself cannot be easily stored, even if it is sparse. We follow [13] advocating the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) method; see [12]. LOBPCG does not need to store the matrix in memory, but requires only the result of multiplying the matrix by a given vector, or a block of vectors. This characteristic makes LOBPCG applicable to eigenvalue analysis problems of very high dimensions, and results in good parallel scalability to large matrix sizes processed on many parallel processors; e.g., see reference [17], describing our open source and publicly available implementation of LOBPCG. We refer to [33] for performance and timing.

Available convergence theory of LOBPCG in [12] requires the matrix be symmetric, but not necessarily with all non-negative eigenvalues, i.e., a possible presence of negative eigenvalues still satisfies the convergence assumptions. The calculation of the product of the matrix by a vector is the main cost per iteration, no matter if the weights are positive or negative.

We perform a few iterations of LOBPCG, without preconditioning and starting from a random initial approximation—the same for various choices of the weights and for different Laplacians for our discrete string example. The number of iterations is chosen small enough to amplify the influence of inaccuracy in approximating the eigenvector iteratively. We display a representative case in Figure 10, showing the approximately computed Laplacian eigenmodes with the unit (a), zero (b), and negative (c) weight at one edge, as well as for the signed Laplacian (d), corresponding to the exact eigenfunctions in Figures 1 and 4. Initial large contributions from other eigenmodes, shown in Figures 1 and 4, remain unresolved, as anticipated. Two-way partitioning according to the signs of the components of the computed eigenmode of the Laplacian with the negative weight nullified, Figure 10 (b), or of the signed Laplacian, Figure 10 (d), would result in wrong clusters.
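A sketch of this experiment with SciPy's implementation of LOBPCG; the matrix size, edge location, iteration count, and seed are illustrative assumptions, not the values used for Figure 10:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import lobpcg

n, i = 100, 49
w = np.ones(n - 1)
w[i] = -1.0                           # the one negative (repulsive) edge
deg = np.append(w, 0.0) + np.append(0.0, w)
L = diags([deg, -w, -w], [0, 1, -1])  # tridiagonal Laplacian, kept sparse

rng = np.random.default_rng(0)
X = rng.standard_normal((n, 2))       # random initial block of 2 vectors
# Few iterations on purpose; a non-convergence warning is expected.
vals, vecs = lobpcg(L, X, largest=False, maxiter=10)

labels = vecs[:, 0] >= 0              # signs of the approximate eigenmode
print(vals)                           # smallest eigenvalue ~ negative
print(np.flatnonzero(np.diff(labels.astype(int))))  # sign change near i
```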

In a sharp contrast, the exact eigenmode (the blue line in Figure 4 left panel) of the original Laplacian with the negative weight demonstrates a sharp edge with a large jump between its components of the opposite signs at the correct location of the negative edge. This large jump is inherited by the corresponding approximate eigenmode in Figure 10 (c), differentiating it from all other approximate eigenmodes in Figure 10. The opposite signs of the components of the eigenmode in Figure 10 (c) allow determining the correct bisection. Large absolute values of the components around the jump location in Figure 10 (c) make such a determination robust with respect to perturbations and data noise.

There are two reasons why the computed eigenmode in Figure 10 (c) visually much better approximates the exact Fiedler vector compared to other cases in Figure 10. The first one is that the shape of the exact Fiedler eigenmode (the blue line in Figure 4 left panel) is pronounced and quite different from those of other eigenfunctions in Figure 4 left panel. The second reason is related to condition numbers of eigenvectors, primarily determined by gaps in the matrix spectrum.

The convergence speed of an iterative approximation to an eigenvector, as well as the eigenvector sensitivity with respect to perturbations in the matrix entries, e.g., due to noise in the data, is mostly determined by a quantity called the condition number of the eigenvector, defined for symmetric matrices as the ratio of the spread of the matrix spectrum to the gap in the eigenvalues. The larger the condition number is, the slower the typical convergence is and the more sensitive to the perturbations the eigenvector becomes. The trivial zero eigenvalue of the original Laplacian can be excluded from the spectrum, if the influence of the corresponding trivial eigenvector, made of ones, may be ignored. For the eigenvector corresponding to the smallest nontrivial eigenvalue, the gap is simply the difference between this eigenvalue and the nearest other eigenvalue.
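In symbols (our notation; the text above states this in words): for the eigenvector $x_j$ of a symmetric matrix with the eigenvalue $\lambda_j$ and spectrum $\lambda_{\min} \le \cdots \le \lambda_{\max}$,

$$\kappa(x_j) \;=\; \frac{\lambda_{\max} - \lambda_{\min}}{\min_{k \neq j} |\lambda_k - \lambda_j|},$$

the spread of the spectrum divided by the gap separating $\lambda_j$ from the rest of the eigenvalues; the smaller the gap, the worse conditioned the eigenvector $x_j$.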

What happens in our example, as we see numerically, is that the largest eigenvalue remains basically the same for all the variants, so we only need to check the gap. It turns out that the gap for the signed Laplacian is several times smaller, for all tested values of the negative weight, compared to the gap for the case of the zero weight, explaining why we see no improvement in Figures 10 (b) and (d), compared to (a). In contrast, introducing the negative weight in the original Laplacian tends to make the target smallest eigenvalue smaller, even negative, in our test for the discrete string, while barely changing the other eigenvalues nearby. As a result, the gap with the negative weight is several times larger compared to the baseline case of the zero weight. Thus, the eigenvector condition number for the signed Laplacian is larger, while for the original Laplacian it is smaller, depending on the negative weight, compared to the baseline eigenvector condition number for the Laplacian with the zero weight. We conclude that in this example the signed Laplacian leads to a much larger condition number of the eigenvector of interest and thus is numerically inferior for spectral clustering, compared to the original Laplacian.

6 Possible extensions for future work

We concentrate on the model of the system of masses connected with springs only because it directly leads to the standard definition of the graph Laplacian, giving us a simple way to justify our introduction of negative weights. Similarly, we restrict the vibrations to be transversal, since then we can use the classical two-way partitioning definition based on the signs of the components of the Fiedler vector. The negative weights can as well be introduced in other models for spectral clustering—we describe two examples below; cf. [14].

The first model is based on vibration modes of a wave equation of a system of interacting quasi-particles subjected to vibrations. Each quasi-particle of the vibration model corresponds to one of the data points. Interaction coefficients of the vibration model are determined by pair-wise comparison of the data points. The interaction is attractive/absent/repulsive and the interaction coefficient is positive/zero/negative if the data points in the pair are similar/not comparable/disparate, respectively. The strength of the interaction and the amplitude of the corresponding interaction coefficient represent the level of similarity or disparity.

The eigenmodes are defined as eigenvectors of an eigenvalue problem resulting from the usual separation of the time and spatial variables. In low-frequency or unstable vibration modes, the quasi-particles are expected to move synchronically in the same direction if they are tightly connected by the attractive interactions, but in the opposite directions if the interactions are repulsive, or in the complementary directions (where available) if the interaction is absent.

Compared to the transversal vibrations already considered, where the masses can only move up or down, determining the clusters by analyzing the shapes of the vibrations is, on the one hand, less straightforward than simply using the signs of the components but, on the other hand, may allow reliable detection of more than two clusters from a single eigenmode. For example, a quasi-particle representing an elementary volume of an elastic body in three-dimensional space has six degrees of freedom, which may allow definition of up to twelve clusters from a single vibration mode. Multiway algorithms of spectral graph partitioning have to be adapted to this case, where a quasi-particle associated with a graph vertex has multiple degrees of freedom.

A second, alternative, model is a system of interacting quasi-particles subjected to concentration or diffusion, described by concentration-diffusion equations. Every quasi-particle of the concentration-diffusion model corresponds to a point in the data. Conductivity coefficients of interactions of the quasi-particles are determined by pair-wise comparison of data points. The interaction is diffusive and the interaction conductivity coefficient is positive if the data points in the pair are similar. The interaction is absent and the interaction conductivity coefficient is zero if the data points in the pair are not comparable. Finally, the interaction is concentrative and the interaction conductivity coefficient is negative if the data points in the pair are disparate. The strength of the interaction and the amplitude of the interaction coefficient represent the level of similarity or disparity.

As in the first model, the eigenvalue problem is obtained by the separation of the time and spatial variables in the time dependent diffusion equation. The clusters are defined by the quasi-particles that concentrate together in unstable or slowest eigenmodes, corresponding to the left part of the spectrum.

A forward-and-backward diffusion in [9, 31] provides a different interpretation of a similar diffusion equation, but the negative sign in the conductivity coefficient is moved to the time derivative, reversing the time direction. Here, the time is going forward (backward) on the graph edges with the positive (negative) weights. Having the time forward and backward in different parts of the same model seems unnatural.

Finally, our approach allows reversing the signs of all weights, thus treating the minimum cut and the maximum cut problems in the same manner, e.g., applying the same spectral clustering techniques to the original Laplacian, in contrast to the signed Laplacian.

7 Conclusions

Spectral clustering has been successful in many applications, ranging from traditional resource allocation, image segmentation, and information retrieval to more recent bio- and material-informatics, providing good results at a reasonable cost. Improvements of cluster quality and algorithm performance are important, e.g., for big data or real-time clustering. We introduce negative weights in the graph adjacency matrix for incorporating disparities in data via spectral clustering that traditionally only handles data with similarities.

Incorporating the disparities in the data into spectral clustering is expected to be of significance and have impact in any application domain where the data disparities naturally appear, e.g., if the data comparison involves correlation or covariance. If data features are represented by elements of a vector space equipped with a vector scalar product, the scalar product can be used for determining the pair-wise comparison function having both negative and non-negative values.

Traditional spectral clustering, with only non-negative weights, remains largely intact when negative weights are introduced. Eigenvectors corresponding to the algebraically smallest eigenvalues (that can be negative) of the graph Laplacian define clusters of higher quality, compared to those obtained via the signed Laplacian. The mass-spring system with repulsive springs justifies well the use of the standard Laplacian for clustering, in contrast to the signed Laplacian that may result in counter-intuitive partitions.

References

  • [1] N. Bansal, A. Blum, and S. Chawla, Correlation clustering, Machine Learning, 56 (2004), pp. 89–113.
  • [2] M. Bolla, Spectral Clustering and Biclustering, Ch. 2: Multiway cuts and spectra, John Wiley & Sons, Ltd, 2013.
  • [3] J. C. Bronski and L. DeVille, Spectral theory for dynamics on graphs containing attractive and repulsive interactions, SIAM Journal on Applied Mathematics, 74 (2014), pp. 83–105.
  • [4] D. Chronopoulos, I. Antoniadis, and T. Ampatzidis, Enhanced acoustic insulation properties of composite metamaterials having embedded negative stiffness inclusions, Extreme Mechanics Letters, 12 (2017), pp. 48–54. Frontiers in Mechanical Metamaterials.
  • [5] J. Demmel, CS267: Notes for Lecture 23. Graph Partitioning, Part 2, 1999, https://people.eecs.berkeley.edu/~demmel/cs267/lecture20/lecture20.html.
  • [6] M. Fiedler, Algebraic connectivity of graphs, Czechoslovak Math. J, 23 (1973), pp. 298–305.
  • [7] A. Fox, T. Manteuffel, and G. Sanders, Numerical methods for Gremban’s expansion of signed graphs, SIAM Journal on Scientific Computing, 39 (2017), pp. S945–S968.
  • [8] J. Gallier, Spectral theory of unsigned and signed graphs. Applications to graph clustering: A survey, arXiv:1601.04692 [cs.LG], (2016), https://arxiv.org/abs/1601.04692.
  • [9] G. Gilboa, N. Sochen, and Y. Y. Zeevi, Forward-and-backward diffusion processes for adaptive image enhancement and denoising, IEEE Tran. Image Processing, 11 (2002), pp. 689–703.
  • [10] S. H. Gould, Variational Methods for Eigenvalue Problems: An Introduction to the Weinstein Method of Intermediate Problems (Second Edition), University of Toronto Press, 1966.
  • [11] J. Ham, D. D. Lee, S. Mika, and B. Schölkopf, A kernel view of the dimensionality reduction of manifolds, in Proceedings of the Twenty-first International Conference on Machine Learning, ICML ’04, New York, NY, USA, 2004, ACM, p. 47.
  • [12] A. Knyazev, Toward the optimal preconditioned eigensolver: locally optimal block preconditioned conjugate gradient method, SIAM J. Scientific Computing, 23 (2001), pp. 517–541.
  • [13] A. Knyazev, Modern preconditioned eigensolvers for spectral image segmentation and graph bisection, in Proceedings of the workshop Clustering Large Data Sets; Third IEEE International Conference on Data Mining (ICDM 2003), Boley, Dhillon, Ghosh, and Kogan, eds., Melbourne, Florida, 2003, IEEE Computer Society, pp. 59–62.
  • [14] A. Knyazev, Method for kernel correlation-based spectral data processing, June 2014. United States Patent Application 20150363361.
  • [15] A. Knyazev, Edge-enhancing filters with negative weights, in 2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP), Dec 2015, pp. 260–264.
  • [16] A. Knyazev and A. Malyshev, Conjugate gradient acceleration of non-linear smoothing filters, in 2015 IEEE Global Conf. on Signal and Information Processing (GlobalSIP), Dec 2015, pp. 245–249.
  • [17] A. V. Knyazev, M. E. Argentati, I. Lashuk, and E. E. Ovtchinnikov, Block Locally Optimal Preconditioned Eigenvalue Xolvers (BLOPEX) in Hypre and PETSc., SIAM J. Scientific Computing, 29 (2007), pp. 2224–2239.
  • [18] R. Kolluri, J. R. Shewchuk, and J. F. O’Brien, Spectral surface reconstruction from noisy point clouds, in Proc. 2004 Eurographics/ACM SIGGRAPH Symp. Geometry Processing, SGP ’04, New York, USA, 2004, pp. 11–21.
  • [19] J. Kunegis, A. Lommatzsch, and C. Bauckhage, The Slashdot Zoo: Mining a social network with negative edges, in Proc. 18th Int. Conf. World Wide Web, WWW ’09, New York, NY, USA, 2009, ACM, pp. 741–750.
  • [20] J. Kunegis, S. Schmidt, A. Lommatzsch, J. Lerner, E. W. D. Luca, and S. Albayrak, Spectral analysis of signed graphs for clustering, prediction and visualization, in Proceedings of the 2010 SIAM International Conference on Data Mining, SIAM, 2010, pp. 559–570.
  • [21] R. S. Lakes, T. Lee, A. Bersie, and Y. C. Wang, Extreme damping in composite materials with negative-stiffness inclusions, Nature, 410 (2001), pp. 565–567.
  • [22] J. Liu and J. Han, Spectral clustering, in Data Clustering: Algorithms and Applications, C. Aggarwal and C. Reddy, eds., Chapman and Hall/CRC Press, 2013.
  • [23] U. von Luxburg, A tutorial on spectral clustering, Stat. Comp., 17 (2007), pp. 395–416.
  • [24] M. Meila and J. Shi, Learning segmentation by random walks, in Adv. Neural Information Processing Systems 13, T. K. Leen, T. G. Dietterich, and V. Tresp, eds., MIT Press, 2001, pp. 873–879.
  • [25] A. Y. Ng, M. I. Jordan, and Y. Weiss, On spectral clustering: Analysis and an algorithm, in Adv. Neural Information Processing Systems 14, T. G. Dietterich, S. Becker, and Z. Ghahramani, eds., MIT Press, 2002, pp. 849–856.
  • [26] J. Park, M. Jeon, and W. Pedrycz, Spectral clustering with physical intuition on spring-mass dynamics, J. Franklin Inst., 351 (2014), pp. 3245–3268.
  • [27] E. Pasternak, A. V. Dyskin, and G. Sevel, Chains of oscillators with negative stiffness elements, Journal of Sound and Vibration, 333 (2014), pp. 6676–6687.
  • [28] A. Pothen, H. D. Simon, and K.-P. Liou, Partitioning sparse matrices with eigenvectors of graphs, SIAM Journal on Matrix Analysis and Applications, 11 (1990), pp. 430–452.
  • [29] L. Rossi, A. Torsello, and E. R. Hancock, Unfolding kernel embeddings of graphs: Enhancing class separation through manifold learning, Pattern Recognition, 48 (2015), pp. 3357–3370.
  • [30] J. Shi and J. Malik, Normalized cuts and image segmentation, IEEE Trans. Pattern Anal. Mach. Intell., 22 (2000), pp. 888–905.
  • [31] L. Tang and Z. Fang, Edge and contrast preserving in total variation image denoising, EURASIP J. Adv. Signal Processing, 2016 (2016), pp. 1–21.
  • [32] L. Wu, X. Wu, A. Lu, and Y. Li, On spectral analysis of signed and dispute graphs, in 2014 IEEE Int. Conf. Data Mining, Dec 2014, pp. 1049–1054.
  • [33] D. Zhuzhunashvili and A. Knyazev, Preconditioned spectral clustering for stochastic block partition streaming graph challenge, in IEEE High Performance Extreme Computing Conference (HPEC) IEEE Xplore, 2017, pp. 1–6, https://doi.org/10.1109/HPEC.2017.8091045.