Cheeger Inequalities for Submodular Transformations

Yuichi Yoshida (supported by JST ERATO Grant Number JPMJER1305 and JSPS KAKENHI Grant Number JP17H04676)
National Institute of Informatics
yyoshida@nii.ac.jp
Abstract

The Cheeger inequality for undirected graphs, which relates the conductance of an undirected graph and the second smallest eigenvalue of its normalized Laplacian, is a cornerstone of spectral graph theory. It has been extended to directed graphs and hypergraphs using normalized Laplacians for those objects, which are no longer linear but piecewise linear transformations.

In this paper, we introduce the notion of a submodular transformation $F\colon \{0,1\}^n \to \mathbb{R}^m$, which applies $m$ submodular functions to the $n$-dimensional input vector, and then introduce the notions of its Laplacian and normalized Laplacian. With these notions, we unify and generalize the existing Cheeger inequalities by showing a Cheeger inequality for submodular transformations, which relates the conductance of a submodular transformation and the smallest non-trivial eigenvalue of its normalized Laplacian. This result recovers the Cheeger inequalities for undirected graphs, directed graphs, and hypergraphs, and derives novel Cheeger inequalities for mutual information and directed information.

Computing the smallest non-trivial eigenvalue of a normalized Laplacian of a submodular transformation is NP-hard under the small set expansion hypothesis. In this paper, we present a polynomial-time $O(\log n)$-approximation algorithm for the symmetric case, which is tight, and a polynomial-time $O(\log^2 n + \log n \cdot \log m)$-approximation algorithm for the general case.

We expect the algebra concerned with submodular transformations, or submodular algebra, to be useful in the future not only for generalizing spectral graph theory but also for analyzing other problems that involve piecewise linear transformations, e.g., deep learning.

1 Introduction

1.1 Background

Spectral graph theory is concerned with the relations between the properties of a graph and the eigenvalues and eigenvectors of matrices associated with the graph (see [7] for a book-length treatment). One of the most seminal results in spectral graph theory is the Cheeger inequality [1, 2], which we briefly review below. Let $G = (V, E)$ be an undirected graph. The conductance of a vertex set $\emptyset \subsetneq S \subsetneq V$ is defined as

\[ \phi_G(S) = \frac{\mathrm{cut}_G(S)}{\min\{\mathrm{vol}_G(S), \mathrm{vol}_G(V \setminus S)\}}, \]

where the cut size of $S$, denoted by $\mathrm{cut}_G(S)$, is the number of edges between $S$ and $V \setminus S$, and the volume of $S$, denoted by $\mathrm{vol}_G(S)$, is the sum of degrees of the vertices in $S$. The conductance $\phi_G$ of $G$ is the minimum conductance of a vertex set $\emptyset \subsetneq S \subsetneq V$. The problem of finding a vertex set of small conductance has been intensively studied because such a set can be regarded as a tight community [9, 16]. Although computing $\phi_G$ is an NP-hard problem, we can approximate it well using the Cheeger inequality, which relates $\phi_G$ and an eigenvalue of a matrix constructed from $G$ known as the normalized Laplacian. Here, the Laplacian of $G$ is the matrix $L_G = D_G - A_G$, where $D_G$ is the diagonal matrix consisting of the degrees of vertices and $A_G$ is the adjacency matrix, and the normalized Laplacian of $G$ is the matrix $\mathcal{L}_G = D_G^{-1/2} L_G D_G^{-1/2}$. Then, the Cheeger inequality [1, 2] states that

\[ \frac{\lambda_G}{2} \le \phi_G \le \sqrt{2\lambda_G}, \tag{1} \]

where $\lambda_G$ is the second smallest eigenvalue of $\mathcal{L}_G$ (note that the smallest eigenvalue is zero with the corresponding trivial eigenvector $D_G^{1/2}\mathbf{1}$, where $\mathbf{1}$ is the all-one vector). Indeed, the second inequality of (1) yields an algorithm that computes a set of conductance at most $\sqrt{2\lambda_G}$ from an eigenvector corresponding to $\lambda_G$. Moreover, the Cheeger inequality is tight in the sense that computing a set with a conductance $o(\sqrt{\lambda_G})$ is NP-hard [25] assuming the small set expansion hypothesis (SSEH) [24].
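As a small sanity check of inequality (1), the following sketch computes the conductance of a cycle by brute force and compares it with the second smallest normalized-Laplacian eigenvalue. The helper name `conductance` is illustrative, the graph is assumed to have no isolated vertices, and the eigenvalue uses the standard closed form $1 - \cos(2\pi/n)$ for the $n$-cycle rather than a numerical eigensolver.

```python
import math
from itertools import combinations

def conductance(n, edges):
    """Brute-force conductance of an undirected graph on vertices 0..n-1.

    Assumes every vertex has positive degree, so volumes are never zero.
    """
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    total = sum(deg)
    best = math.inf
    for r in range(1, n):
        for S in combinations(range(n), r):
            S = set(S)
            cut = sum((u in S) != (v in S) for u, v in edges)
            vol = sum(deg[v] for v in S)
            best = min(best, cut / min(vol, total - vol))
    return best

n = 8
cycle = [(i, (i + 1) % n) for i in range(n)]
phi = conductance(n, cycle)
# For the n-cycle the normalized Laplacian is I - A/2, whose second
# smallest eigenvalue is 1 - cos(2*pi/n).
lam = 1 - math.cos(2 * math.pi / n)
print(lam / 2, phi, math.sqrt(2 * lam))
```

The enumeration is exponential in $n$ and is only meant for tiny instances.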

Extensions of the Cheeger inequality were recently proposed for directed graphs [28] and hypergraphs [5, 18] by using modified notions of conductance and a normalized Laplacian. We note that normalized Laplacians for directed graphs and hypergraphs are no longer linear but piecewise linear transformations. We can show that those normalized Laplacians always have the eigenvalue of zero associated with a trivial eigenvector, and that they also have a non-trivial eigenvalue in the sense that the corresponding eigenvector is orthogonal to the trivial eigenvector. Then, the extended Cheeger inequalities [5, 18, 28] relate the conductance of a directed graph or a hypergraph with the smallest non-trivial eigenvalue of its normalized Laplacian. However, as those normalized Laplacians are no longer linear transformations, computing its smallest non-trivial eigenvalue becomes NP-hard under the SSEH [5, 18]. Although a polynomial-time -approximation algorithm is known for hypergraphs on vertices, which is tight under the SSEH [5, 18], no non-trivial polynomial-time approximation algorithm is known for directed graphs.

1.2 Our contributions

In this paper, we unify and extend the existing Cheeger inequalities discussed above by introducing the notions of a submodular transformation and its normalized Laplacian. A set function $F\colon \{0,1\}^V \to \mathbb{R}$ is called submodular if $F(S) + F(T) \ge F(S \cup T) + F(S \cap T)$ for every $S, T \subseteq V$. We note that the cut function $F$ associated with an undirected graph, a directed graph, or a hypergraph is submodular, where $F(S)$ for a vertex set $S$ represents the number of edges, arcs, or hyperedges leaving $S$ and entering $V \setminus S$. We say that a function $F\colon \{0,1\}^V \to \mathbb{R}^E$ is a submodular transformation if $F_e$ is a submodular function for every $e \in E$.

To derive a Cheeger inequality for a submodular transformation , we need to define the conductance of a set with respect to and the normalized Laplacian associated with . First, we define the degree of as the number of ’s to which is relevant. (See Section 2 for the formal definition.) For a set , we define the volume of as and the cut size of as . Then, we define the conductance of a set as

\[ \phi_F(S) = \frac{\min\{\mathrm{cut}_F(S), \mathrm{cut}_F(V \setminus S)\}}{\min\{\mathrm{vol}_F(S), \mathrm{vol}_F(V \setminus S)\}}. \]

We define the conductance of as .

Example 1.1.

Let be an undirected graph. Now, we consider a submodular transformation , where is the cut function of the undirected graph with a single edge . Then, for a vertex coincides with the usual degree of , and for a vertex set coincides with the usual cut size of . As is symmetric, that is, holds for every vertex set , coincides with the conductance of in the graph sense.
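The definitions above can be sketched in code for tiny instances. This is an illustrative brute-force sketch, not the paper's algorithm: it assumes that the cut size is $\mathrm{cut}_F(S) = \sum_{e} F_e(S)$ and that the volume is the sum of degrees, where the degree of $v$ counts the component functions to which $v$ is relevant. The helper names (`edge_cut`, `arc_cut`, `degree`, `conductance`) are hypothetical.

```python
from itertools import combinations

def edge_cut(u, v):
    """Cut function of the graph with the single undirected edge {u, v}."""
    return lambda S: int((u in S) != (v in S))

def arc_cut(u, v):
    """Cut function of the graph with the single arc (u, v)."""
    return lambda S: int(u in S and v not in S)

def subsets(V):
    for r in range(len(V) + 1):
        for T in combinations(V, r):
            yield frozenset(T)

def degree(F, v, V):
    """Number of components F_e whose value can change when v is toggled."""
    others = [u for u in V if u != v]
    return sum(
        any(F_e(T | {v}) != F_e(T) for T in subsets(others))
        for F_e in F
    )

def conductance(F, V):
    """Minimum of phi_F(S) over nonempty proper subsets S (exponential time)."""
    V = list(V)
    deg = {v: degree(F, v, V) for v in V}
    full = frozenset(V)
    best = float("inf")
    for S in subsets(V):
        if not S or S == full:
            continue
        T = full - S
        cut = min(sum(F_e(S) for F_e in F), sum(F_e(T) for F_e in F))
        vol = min(sum(deg[v] for v in S), sum(deg[v] for v in T))
        if vol > 0:
            best = min(best, cut / vol)
    return best

F_cycle = [edge_cut(i, (i + 1) % 4) for i in range(4)]    # undirected 4-cycle
F_dicycle = [arc_cut(i, (i + 1) % 3) for i in range(3)]   # directed 3-cycle
print(conductance(F_cycle, range(4)), conductance(F_dicycle, range(3)))
```

For the undirected 4-cycle this agrees with the usual graph conductance, as Example 1.1 states.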

Using a submodular transformation , we can define its Laplacian . We defer its definition to Section 3 as we need several other notions to define it. Here, we note that is set-valued and forms a convex polytope in . However, the measure of the set consisting of with not being a single point is zero. Hence, we can almost always regard as a function that maps a vector in to another vector in . Moreover, for with consisting of a single point, acts as a linear transformation. Hence, we can basically regard as a piecewise linear function.

Next, we define the normalized Laplacian as , where is a diagonal matrix with . We say that is an eigenvalue of if there exists a non-zero vector such that . As with the normalized Laplacian for an undirected graph, when , we can show that is positive-semidefinite, that is, all the eigenvalues are non-negative, and that , that is, is the smallest eigenvalue of with the corresponding trivial eigenvector . Then, we can also show that there exists a non-trivial eigenvalue in the sense that the corresponding eigenvector is orthogonal to . We denote by the smallest non-trivial eigenvalue of .

Example 1.2.

For an undirected graph , we define a submodular transformation as in Example 1.1. Then, essentially equals to the usual normalized Laplacian for because consists of a single vector . (See Example 3.3 for details.) Moreover, is equal to the second smallest eigenvalue of .

We show the following Cheeger inequality that relates and :

Theorem 1.3.

Let be a submodular transformation with and for every . Then, we have

\[ \frac{\lambda_F}{2} \le \phi_F \le 2\sqrt{\lambda_F}. \]

We now see several instantiations of Theorem 1.3.

Example 1.4 (Undirected graphs).

For an undirected graph $G = (V, E)$, we define a submodular transformation $F$ as in Example 1.1. Then, Theorem 1.3 reduces to the Cheeger inequality for undirected graphs (with a slightly worse coefficient in the right inequality, that is, $2$ instead of $\sqrt{2}$).

Example 1.5 (Directed graphs).

Let be a directed graph. Then, we define a submodular transformation so that, for each arc , is the cut function of the directed graph with a single arc . Then, for a vertex is the number of arcs to which is incident as a head or a tail, and for a vertex set is the number of arcs leaving and entering . Then, the Cheeger inequality derived from Theorem 1.3 coincides with that in [28].

Example 1.6 (Hypergraphs).

Let be a hypergraph. Then, we define a submodular transformation so that, for each hyperedge , is the cut function of the hypergraph with a single hyperedge . Then, for a vertex is the number of hyperedges incident to , and for a vertex set is the number of hyperedges containing a vertex in and another vertex in . Then, the Cheeger inequality derived from Theorem 1.3 coincides with that in [5, 18].

Theorem 1.3 also derives some novel Cheeger inequalities for joint distributions.

Example 1.7 (Mutual information).

Let be a set of Boolean random variables with . Then, it is known that the mutual information as a function of satisfies submodularity. From the fact that the random variables are Boolean, is bounded by . Now, we define a submodular transformation (or, function) as , divided by for normalization. Then, for , and . Since is symmetric, we have . Intuitively speaking is small when there is a partition of into large sets and such that we obtain little information on by observing , and vice versa. We can bound from below and above by Theorem 1.3 using .

Example 1.8 (Directed information).

Let be a finite set with and for each , we consider a sequence of Boolean random variables, where we regard as the random variable associated with at time . Then, for a set and , we define as the set of random variables associated with available at time , and define . For two sets , the directed information from to , denoted by , is defined as , which measures the amount of information that flows from to . Directed information has many applications in causality analysis [20, 21, 22]. The directed information as a function of is known to be submodular but is not necessarily symmetric [29].

As in Example 1.7, we define a submodular transformation (or, function) as , divided by for normalization. Then, we can bound from below and above by Theorem 1.3 using .

We note that we can easily generalize Examples 1.7 and 1.8 to the case with multiple joint distributions.

The right inequality in Theorem 1.3 is algorithmic in the following sense: Given a vector orthogonal to , we can compute in polynomial time a set such that , where is the Rayleigh quotient of defined as

\[ R_F(x) = \frac{\langle x, L_F(x) \rangle}{\|x\|_2^2}. \]

Here, we can show that has the same value for any , and hence we denote it by by abusing the notation. We can show that is the minimum of subject to and being orthogonal to the trivial eigenvector, that is, .

Example 1.9.

For a submodular transformation associated with an undirected graph (see Example 1.1), we have . For a submodular transformation associated with a directed graph (see Example 1.5), we have . For a submodular transformation associated with a hypergraph (see Example 1.6), we have .

As opposed to the matrix case, it is NP-hard to compute under the SSEH. Hence, we consider approximating . First, we provide the following approximation algorithm for symmetric submodular transformations. Here, we say that a submodular transformation is symmetric if for every .

Theorem 1.10.

There is an algorithm that, given and (a value oracle of) a non-negative symmetric submodular transformation with , computes a non-zero vector such that and

\[ \lambda_F \le R_F(x) \le O\!\left(\frac{\log n}{\epsilon^2}\,\lambda_F + \epsilon B\right), \]

with a probability of at least in time, where , , and is the maximum Euclidean norm of a point in the base polytopes of ’s.

The definition of the base polytope is deferred to Section 2. We do not need the condition because it follows from and the symmetry of . The left inequality is trivial because is the minimum of subject to and . We note that the approximation ratio of is tight [5, 18] under the SSEH even when the submodular transformation is constructed from a hypergraph as in Example 1.6.

For general submodular transformations, we give the following algorithm:

Theorem 1.11.

There is an algorithm that, given and (a value oracle of) a non-negative submodular transformation with , computes a non-zero vector such that and

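\[ \lambda_F \le R_F(x) \le O\!\left(\frac{\log n \,\log\bigl(n^{1/\epsilon^2} m\bigr)}{\epsilon^2}\,\lambda_F + \epsilon B\right) = O\!\left(\left(\frac{\log^2 n}{\epsilon^4} + \frac{\log n \log m}{\epsilon^2}\right)\lambda_F + \epsilon B\right), \]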

with a probability of at least in time, where , , and is the maximum Euclidean norm of a point in the base polytopes of ’s.

Again, the left inequality is trivial. Although the approximation ratio for the general case is slightly worse than that for the symmetric case, it remains polylogarithmic in and .
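The equality between the two bounds in Theorem 1.11 is just an expansion of the logarithm, which can be checked as follows:

\[ \frac{\log n \cdot \log\bigl(n^{1/\epsilon^2} m\bigr)}{\epsilon^2} = \frac{\log n}{\epsilon^2}\left(\frac{\log n}{\epsilon^2} + \log m\right) = \frac{\log^2 n}{\epsilon^4} + \frac{\log n \log m}{\epsilon^2}. \]

For constant $\epsilon$, this is $O(\log^2 n + \log n \log m)$, compared with $O(\log n)$ in the symmetric case.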

Now, we provide concrete bounds on for some specific cases. For the cut functions explained in Examples 1.4, 1.5, and 1.6, we have , and hence the approximated eigenvalue satisfies . Then, we have by Theorem 1.3. Hence, the lower bound is meaningful when , which always holds when . For the mutual and directed information explained in Examples 1.7 and 1.8, we have , and hence we have by Theorem 1.3. Hence, the lower bound is meaningful when .

1.3 Proof sketch

The proof of our Cheeger inequality for submodular transformations (Theorem 1.3) is similar to those of the existing Cheeger inequalities [1, 2, 5, 18, 28], although we have to use some specific properties of submodular functions.

In order to prove Theorems 1.10 and 1.11, that is, to approximate the smallest non-trivial eigenvalue of the normalized Laplacian of a submodular transformation, we use semidefinite programming (SDP). To this end, we first rephrase its Rayleigh quotient using Lovász extensions. For a set function $F$, we define its Lovász extension $f$ as $f(x) = \max_{w \in B(F)} \langle w, x \rangle$, where $B(F)$ is the base polytope of $F$ (see Section 2 for the definition). Then, for a submodular transformation $F$, the numerator of $R_F(x)$ can be written as

\[ \langle x, L_F(x) \rangle = \sum_{e \in E} f_e(x)^2 = \sum_{e \in E} \Bigl(\max_{w \in B(F_e)} \langle w, x \rangle\Bigr)^2, \tag{2} \]

where is the Lovász extension of . Now the goal is to minimize this numerator (2) subject to and .

In the symmetric case, we can show that it is possible to further rephrase the numerator of as

\[ \langle x, L_F(x) \rangle = \sum_{e \in E} f_e(x)^2 = \sum_{e \in E} \max_{w \in B(F_e)} \langle w, x \rangle^2. \]

A problem here is that is a polytope and we cannot express the maximum over in an SDP. Although it is not difficult to show that we only have to take the maximum over extreme points of , the number of extreme points can be in general, which is prohibitively large. To address this issue, we replace with an -cover (see Theorem 1.10 for the definition of ), which is a set of points such that for any , there exists a point with . Using the properties of submodular functions, we can show that there is an -cover of size roughly (instead of being exponential in ), and we can efficiently compute it by exploiting Wolfe’s algorithm [27], which is useful for judging whether a given point is close to a base polytope. Then, we can solve the resulting SDP in polynomial time in and . The additive error of in Theorem 1.10 (and Theorem 1.11 as well) occurs when replacing by its -cover .

For each variable in the Rayleigh quotient, we introduce an SDP variable for a large . Then after solving the SDP, we round the obtained solution using the Gaussian rounding, that is, , where is sampled from a standard normal distribution . Then, we can show that the value of is roughly equal to . Note that, as each is normally distributed, for each acts as a normal random variable. Then, the value is larger than the SDP value by a factor of , caused when taking the maximum of many squared normal variables for each . We can also show that the denominator is at least half and the constraint is satisfied with high probability, and hence we establish Theorem 1.10.

The general case is more involved as we should stick to the numerator of the form (2). To see the difficulty, suppose that the numerator of the Rayleigh quotient is zero in the SDP relaxation, that is, we obtained an SDP solution satisfying for every , where is a unit vector representing the value of one. Here, this value is supposed to represent . Hence for the vector obtained by rounding , we expect that . However, if we adopt the Gaussian rounding as with the symmetric case, then for each acts as a normal random variable. This means that, with a high probability, we have , and hence the approximation ratio can be arbitrarily large.

The above-mentioned problem is avoided by decomposing as , where is the projection matrix to the subspace orthogonal to . Then, we construct two vectors and such that and for each , where and is sampled from the standard normal distribution . This rounding procedure places more importance on the direction than on other directions. Then, with an additional constraint in the SDP, we can show that the Rayleigh quotient of at least one of them achieves -approximation.

We have mentioned that the smallest non-trivial eigenvalue of the normalized Laplacian of a submodular transformation is obtained as the minimum of the Rayleigh quotient subject to and . As opposed to symmetric matrices, the relation between the eigenvalues of and the Rayleigh quotient is not immediate because is not a linear transformation. Indeed, it is not clear whether has a non-trivial eigenvalue at all. To show this, we consider the following diffusion process associated with : , that is, at each moment we move the current vector to a direction chosen from . The idea of using such a diffusion process was already mentioned in [5, 18, 28]. The fact that is not continuous in means that may not proceed beyond a certain point. For example, after moving to along the direction for an infinitesimal time, it could be that the direction is in and returns to by moving along the direction for an infinitesimal time. In order to avoid this problem, we need to choose a direction at each moment so that the direction also exists in , where is the vector obtained by moving along the direction for an infinitesimal time. Fortunately, we can show that such a direction always exists by using Kakutani’s fixed point theorem [10]. Then, analyzing the point at which converges in the diffusion process, we can guarantee that there exists a small non-trivial eigenvalue of and it is achieved by the minimum of the Rayleigh quotient subject to and .

1.4 Discussions

For undirected graphs, several extensions of the Cheeger inequality have been proposed. For a graph $G$, the order-$k$ conductance of disjoint vertex sets $S_1, \ldots, S_k$ is defined as their maximum conductance, and the order-$k$ conductance of a graph is the minimum order-$k$ conductance of $k$ disjoint vertex sets taken from the graph. Then, the higher order Cheeger inequality [15, 19] bounds the order-$k$ conductance of a graph from below and above by the $k$-th smallest eigenvalue of its normalized Laplacian. The standard conductance is also analyzed using the $k$-th smallest eigenvalue [13, 14]. In [26], it is argued that the largest eigenvalue of a normalized Laplacian can be used to bound from below and above the bipartiteness ratio, which measures the extent to which the graph is approximated by a bipartite graph. Its higher order version is also studied [17]. It would be interesting to generalize these extended Cheeger inequalities to submodular transformations.

We believe that the notion of a submodular transformation will be useful not only for generalizing spectral graph theory but also for analyzing various problems that involve piecewise linear functions. To see this, we introduce the notion of a Lovász transformation, which is a function of the form such that is the Lovász extension of some submodular function for each .

Lovász transformations are piecewise linear in general, and can express any linear transformation. In particular, we can express a deep neural network using several Lovász transformations, as explained below. A typical feed-forward neural network used in deep learning is of the following form:

\[ f(x) = W_L\,\sigma_{L-1}\bigl(W_{L-1} \cdots \sigma_2(W_2\,\sigma_1(W_1 x))\bigr), \]

where is a matrix with and and is a rectified linear unit (ReLU), which applies the following operation coordinate-wise: . Then, in the regression setting with the -norm loss, given training examples , we aim to find that minimizes the loss function . As the loss function is non-convex, we cannot hope to obtain the global minimum in polynomial time, and hence we want to analyze the structure of local minima. When ReLUs are not applied in a neural network, every local minimum is known to be a global minimum (under a plausible assumption) [11]. However, the proof heavily relies on elegant properties of linear transformations and it does not generalize to the case with ReLUs.

Note that the function $x \mapsto \max\{x(u) - x(v), 0\}$ is the Lovász extension of the cut function of the directed graph consisting of a single arc $(u, v)$. Using this fact, we can express the feed-forward neural network as iterated applications of Lovász transformations. First, define as the matrix obtained from by adding the all-zero row vector. Then, we define as . Finally, we define as

\[ f'(x) = W_L\,\sigma'_{L-1}\bigl(W'_{L-1} \cdots \sigma'_2(W'_2\,\sigma'_1(W'_1 x))\bigr). \]

We can observe that acts as a ReLU because the last element of the vector given to is always zero. Hence, we have . This observation implies that we could deepen the understanding of deep learning by studying Lovász transformations.
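The following sketch illustrates the observation for a single layer. It assumes that the modified activation maps an extended vector $y$ to the coordinates $\max\{y(j) - y(\mathrm{last}), 0\}$, that is, each output is the Lovász extension of the cut function of the arc from $j$ to an appended coordinate; this reading and the names `sigma_prime` and `arc_lovasz` are assumptions made for illustration.

```python
def relu(y):
    return [max(t, 0.0) for t in y]

def arc_lovasz(y, u, v):
    """Lovász extension of the cut function of the single arc (u, v)."""
    return max(y[u] - y[v], 0.0)

def sigma_prime(y_ext):
    """Apply one arc-type Lovász extension per coordinate.

    The last coordinate of y_ext plays the role of the appended entry,
    which is zero whenever y_ext = W' x with an all-zero last row of W'.
    """
    last = len(y_ext) - 1
    return [arc_lovasz(y_ext, j, last) for j in range(last)]

def matvec(W, x):
    return [sum(w * t for w, t in zip(row, x)) for row in W]

W = [[1.0, -2.0], [0.5, 0.5]]
W_ext = W + [[0.0, 0.0]]          # W with an all-zero row appended
x = [1.0, 2.0]
print(sigma_prime(matvec(W_ext, x)), relu(matvec(W, x)))
```

Because the appended coordinate is always zero, the two outputs coincide for every input $x$.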

Indeed, the smallest non-trivial eigenvalue of the Laplacian of a submodular transformation is equal to for the corresponding Lovász transformation , which can be regarded as the smallest non-trivial singular value of . (The connection will become clear in Section 3.) Hence, this work can be seen as the first step toward extending linear algebra to the algebra with Lovász transformations, or submodular algebra.

1.5 Organization

In Section 2, we review basic properties of submodular functions. In Section 3, we formally define submodular transformation and its Laplacian, and observe their basic properties. We prove the Cheeger inequality for submodular transformations in Section 4. We consider the covering number of the base polytope of a submodular function in Section 5. Then, we provide polynomial-time approximation algorithms for the smallest non-trivial eigenvalue of a normalized submodular Laplacian for the symmetric and general cases in Sections 6 and 7, respectively. In Section 8, we show that the (normalized) Laplacian of a submodular transformation has a non-trivial eigenvalue and it can be obtained by minimizing the Rayleigh quotient.

2 Preliminaries

For an integer , we define as the set . For a subset , we define as the indicator vector of , that is, if and otherwise. When , we simply write . For a vector and a subset , we define as the vector such that for every and for every . The support of a vector , denoted by , is defined as the set . For a polyhedron , , and , we define and . For a polytope , we define as the maximum -norm of a point in .

For a set function , we define . For a set function , a set , and an element , we define as the marginal gain .

A function is referred to as submodular if

\[ f(S) + f(T) \ge f(S \cup T) + f(S \cap T) \]

for every . We say that a function is symmetric if for every . A submodular function is referred to as normalized if . In this work, we only consider normalized submodular functions.

We consider a variable relevant in if adding (or removing) from the input set may change the value of , that is, there exists some such that . We consider irrelevant otherwise. The support of , denoted by , is the set of relevant variables of .

Let be a submodular function. The submodular polyhedron and the base polytope of are defined as

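\[ P(F) = \Bigl\{ x \in \mathbb{R}^V \Bigm| \sum_{v \in S} x(v) \le F(S) \ \ \forall S \subseteq V \Bigr\} \quad \text{and} \quad B(F) = \Bigl\{ x \in P(F) \Bigm| \sum_{v \in V} x(v) = F(V) \Bigr\}. \]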

As the name suggests, it is known that the base polytope is bounded (Theorem 3.12 of [8]).

The Lovász extension of a submodular function is defined as

\[ f(x) = \max_{w \in B(F)} \langle w, x \rangle. \]

We note that $f(\mathbf{1}_S) = F(S)$ for every $S \subseteq V$, and hence we can uniquely recover a submodular function from its Lovász extension.

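We define $\partial f(x)$ as the set of vectors $w \in B(F)$ that attain $f(x) = \max_{w \in B(F)} \langle w, x \rangle$. (Footnote: We adopted the notation $\partial f(x)$ because each vector in $\partial f(x)$ is a subgradient of $f$ at $x$ [8]. However, we do not use this property in the work presented in this paper.) The following is well known: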

Lemma 2.1 (Theorem 3.22 of [8]).

Let be the Lovász extension of a submodular function. Then, every extreme point of is obtained as follows: Let be an ordering of with such that . Then, for every .

In particular, every extreme point of can be obtained by following this approach by setting .

The algorithm for computing based on the ordering of values in is known as Edmonds’ algorithm in the literature. By Lemma 2.1, as long as the ordering of values does not change, we can use the same for computing .
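The greedy procedure can be sketched as follows. This is the standard textbook form of the algorithm, written under the assumptions that $F$ is normalized ($F(\emptyset) = 0$) and is given as a value oracle on frozensets; the function name `lovasz_extension` is illustrative.

```python
def lovasz_extension(F, V, x):
    """Greedy evaluation of the Lovász extension of a submodular F.

    Sorts the elements by decreasing x-value and assigns each element its
    marginal gain along that ordering. Returns (f(x), w), where w is an
    extreme point of the base polytope B(F) attaining max <w, x>.
    """
    order = sorted(V, key=lambda v: x[v], reverse=True)
    w, prefix, prev = {}, set(), F(frozenset())
    for v in order:
        prefix.add(v)
        cur = F(frozenset(prefix))
        w[v] = cur - prev          # marginal gain of v given earlier elements
        prev = cur
    return sum(w[v] * x[v] for v in V), w

# Single undirected edge {0, 1} and single arc (0, 1) on V = {0, 1}.
edge = lambda S: int((0 in S) != (1 in S))
arc = lambda S: int(0 in S and 1 not in S)
V = [0, 1]
print(lovasz_extension(edge, V, [3.0, 1.0]))
print(lovasz_extension(arc, V, [1.0, 3.0]))
```

On these two examples the values are $|x(0) - x(1)|$ and $\max\{x(0) - x(1), 0\}$, respectively, and evaluating at an indicator vector returns $F(S)$. As long as the ordering of the entries of $x$ is unchanged, the returned $w$ is the same.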

3 Submodular Transformations and their Laplacians

In this section, we introduce the notion of a submodular transformation and its Laplacian and normalized Laplacian.

For a function and , let be the -th component of , that is, . Then, we define a submodular transformation as follows:

Definition 3.1 (Submodular transformation).

We say that is a submodular transformation if the function is a submodular function for every .

For a submodular transformation , we always use the symbols and to denote and . We say that a submodular transformation is symmetric if for every . The Lovász extension of a submodular transformation is such that is the Lovász extension of for each . The Lovász extensions of submodular transformations are collectively referred to as Lovász transformations. For a submodular transformation , we will use symbols and to denote those functions.

In Section 3.1, we define the Laplacian of a submodular transformation, which we collectively refer to as a submodular Laplacian, and study its basic spectral properties. In Section 3.2, we discuss the normalized version of a submodular Laplacian.

3.1 Submodular Laplacians

We define the Laplacian associated with a submodular transformation as follows:

Definition 3.2 (Submodular Laplacian).

Let be a submodular transformation. Then, the Laplacian of is defined as

where is the Lovász extension of for each .

We can verify that, for every , we have , and hence we write to denote by abusing the notation. Let be the Lovász extension of . Then, we have for any . Hence, we can symbolically understand as because , and this is the intuition behind the definition of .
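As a hedged illustration (a reconstruction for the reader, not necessarily the exact statement of Definition 3.2), one set-valued form that is consistent with identity (2) is the following, where $\partial f_e(x)$ is assumed to denote the set of maximizers of $\langle w, x \rangle$ over $B(F_e)$:

\[ L_F(x) = \Bigl\{ \sum_{e \in E} w_e \langle w_e, x \rangle \Bigm| w_e \in \partial f_e(x) \ (e \in E) \Bigr\}. \]

Under this form, every element $y \in L_F(x)$ satisfies

\[ \langle x, y \rangle = \sum_{e \in E} \langle w_e, x \rangle^2 = \sum_{e \in E} f_e(x)^2, \]

so the inner product does not depend on the choice of the maximizers $w_e$.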

Example 3.3.

For an undirected graph , we define a submodular transformation as in Example 1.1. Then for an edge , we have if , if , and is of the form for if . Then, we can verify that , where is the usual Laplacian of .
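The agreement with the usual graph Laplacian can be checked numerically on a small example. The sketch below uses the fact that the Lovász extension of the cut function of a single edge $\{u, v\}$ is $|x(u) - x(v)|$, and compares $\sum_e f_e(x)^2$ with the quadratic form $x^\top L_G x$. The helper names are illustrative.

```python
def laplacian(n, edges):
    """Usual graph Laplacian D - A as a dense list-of-lists matrix."""
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    return L

def quad_form(L, x):
    n = len(x)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))

def sum_sq_lovasz(edges, x):
    """Sum over edges of f_e(x)^2 with f_e(x) = |x(u) - x(v)|."""
    return sum(abs(x[u] - x[v]) ** 2 for u, v in edges)

path = [(0, 1), (1, 2)]
x = [3, -1, 2]
print(quad_form(laplacian(3, path), x), sum_sq_lovasz(path, x))
```

Both quantities equal $\sum_{\{u,v\} \in E} (x(u) - x(v))^2$, which is the graph instance of identity (2).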

A pair is called an eigenpair of a submodular Laplacian if . Such and are called eigenvalue and eigenvector of , respectively. When a submodular transformation satisfies , its Laplacian satisfies the following elegant spectral properties:

Lemma 3.4.

Let be a submodular transformation with . Then, is an eigenpair of .

We have . ∎

Lemma 3.5.

Let be a submodular transformation. Then,