Graph Similarity and Approximate Isomorphism
The graph similarity problem, also known as approximate graph isomorphism or graph matching problem, has been extensively studied in the machine learning community, but has not received much attention in the algorithms community: Given two graphs of the same order with adjacency matrices , a well-studied measure of similarity is the Frobenius distance
where ranges over all permutations of the vertex set of , where denotes the matrix obtained from by permuting rows and columns according to , and where is the Frobenius norm of a matrix . The (weighted) graph similarity problem, denoted by GSim (WSim), is the problem of computing this distance for two graphs of same order. This problem is closely related to the notoriously hard quadratic assignment problem (QAP), which is known to be NP-hard even for severely restricted cases.
It is known that GSim (WSim) is NP-hard; we strengthen this hardness result by showing that the problem remains NP-hard even for the class of trees. Identifying the boundary of tractability for WSim is best done in the framework of linear algebra. We show that WSim is NP-hard as long as one of the matrices has unbounded rank or negative eigenvalues: hence, the realm of tractability is restricted to positive semi-definite matrices of bounded rank. Our main result is a polynomial time algorithm for the special case where one of the matrices has a bounded clustering number, a parameter arising from spectral graph drawing techniques.
Graph isomorphism has been a central open problem in algorithmics for the last 50 years. The question of whether graph isomorphism is in polynomial time is still wide open, but at least we know that it is in quasi-polynomial time . On the practical side, the problem is largely viewed as solved; there are excellent tools [9, 16, 21, 22] that efficiently decide isomorphism on all but very contrived graphs . However, for many applications, notably in machine learning, we only need to know whether two graphs are “approximately isomorphic”, or more generally, how “similar” they are. The resulting graph similarity problem has been extensively studied in the machine learning literature under the name graph matching (e.g. [1, 10, 15, 29, 30]), and also in the context of the schema matching problem in database systems (e.g. ). Given the practical significance of the problem, surprisingly few theoretical results are known. Before we discuss these known and our new results, let us state the problem formally.
It is not obvious how to define the distance between two graphs, but the distance measure that we study here seems to be the most straightforward one, and it certainly is the one that has been studied most. For two -vertex graphs and with adjacency matrices and , we define the Frobenius distance between and to be
Here ranges over all permutations of the vertex set of , denotes the matrix obtained from by permuting rows and columns according to , and the norm is the Frobenius norm of a matrix . Note that counts the number of edge mismatches in an optimal alignment of the two graphs. The graph similarity problem, denoted by GSim, is the problem of computing for graphs of the same order, or, depending on the context, the decision version of this problem (decide whether for a given ). We can easily extend the definitions to weighted graphs and denote the weighted graph similarity problem by WSim. In practice, this is often the more relevant problem. Instead of the adjacency matrices of graphs, we may also use the Laplacian matrices of the graphs to define distances. Recall that the Laplacian matrix of a graph is the matrix , where is the diagonal matrix in which the entry is the degree of the th vertex, or in the weighted case, the sum of the weights of the incident edges. Let be the corresponding distance measure. Intuitively, in the definition of we prefer permutations that map vertices of similar degrees onto one another. Technically, is interesting, because the Laplacian matrices are positive semidefinite (if the weights are nonnegative). Both the (weighted) similarity problem and its version for the Laplacian matrices are special cases of the problem MSim of computing for given symmetric matrices . In the Laplacian case, these matrices are positive semidefinite.111Note that the notion of similarity that we use here has nothing to do with the standard notion of “matrix similarity” from linear algebra.
The graph similarity problem is closely related to quadratic assignment problem (QAP) : given two -matrices , the goal is to find a permutation that minimizes . The usual interpretation is that we have facilities that we want to assign to locations. The entry is the flow from the th to the th facility, and the entry is the distance from the th to the th location. The goal is to find an assignment of facilities to locations that minimizes the total cost, where the cost for each pair of facilities is defined as the flow times the distance between their locations. The QAP has a large number of real-world applications, as for instance hospital planning , typewriter keyboard design , ranking of archeological data , and scheduling parallel production lines . On the theoretical side, the QAP contains well-known optimization problems as special cases, as for instance the Travelling Salesman Problem, the feedback arc set problem, the maximum clique problem, and all kinds of problems centered around graph partitioning, graph embedding, and graph packing.
In the maximization version max-QAP of QAP the objective is to maximize (see [19, 24]). Both QAP and max-QAP are notoriously hard combinatorial optimization problems, in terms of practical solvability  as well as in terms of theoretical hardness results even for very restricted special cases [5, 8, 7]. It is easy to see that MSim is equivalent to max-QAP, because in reductions between QAP and MSim the sign of one of the two matrices is flipped. Most of the known results for GSim and its variants are derived from results for (max)QAP.
It seems to be folklore knowledge that GSim is NP-complete. For example, this can be seen by a reduction from the Hamiltonian path problem: take to be the -vertex input graph and a path of length ; then if and only if has a Hamiltonian path. By the same argument, we can actually reduce the subgraph isomorphism problem to GSim. Arvind, Köbler, Kuhnert, and Vasudev  study several versions of what they call approximate graph isomorphism; their problem Min-PGI is the same as our GSim. They prove various hardness of approximation results. Based on an earlier QAP-approximation algorithm due to Arora, Frieze, and Kaplan , they also obtain a quasi-polynomial time approximation algorithm for the related problem Max-PGI. Further hardness results were obtained by Makarychev, Manokaran, and Sviridenko  and O’Donnell, Wright, Wu, and Zhou , who prove an average case hardness result for a variant of GSim problem that they call robust graph isomorphism. Keldenich  studied the similarity problem for a wide range matrix norms (instead of the Frobenius norm) and proved hardness for essentially all of them.
Spectral Graph Visualization.
Since WSim and MSim are essentially linear algebraic problems, it is reasonable to hope that the spectral structure of the input (adjacency) matrices is closely related with the computational complexity of these problems. In this regard, we remark that spectral graph drawing is a well-established technique for visualizing graphs via their spectral properties. Formally, let be a -vertex graph: a graph drawing is a map , where the ambient space has dimension . For spectral graph drawings, this map is typically defined as follows. We select a suitable matrix representation of the graph and select up to eigenvectors of this matrix. Then, the mapping is defined by the rows of the matrix . The choice of the matrix representation and the selection of eigenvectors usually depends on the problem at hand. The most useful matrix representation in the spectral drawing framework is the well-known Laplacian matrix: the eigenvectors corresponding to smallest eigenvalues define the drawing of interest.
Observe that the graph drawing defined above is not injective in general. Given such a drawing , we define the clustering number of a graph to be the cardinality of the set . The elements of correspond to subsets of : every vertex in such a ‘cluster’ has identical adjacency.
So where does all this leave us? Well, GSim is obviously an extremely hard optimization problem. We start our investigations by adding to the body of known hardness results: we prove that GSim remains NP-hard even if both input graphs are trees (Theorem 3.2). Note that in strong contrast to this, the subgraph isomorphism problem becomes easy if both input graphs are trees . The reduction from Hamiltonian path sketched above shows that GSim is also hard if one input graph is a path. We prove that GSim is tractable in the very restricted case that one of the input graphs is a path and the other one is a tree (Theorem 3.3).
As WSim and MSim are essentially linear algebraic problems, it makes sense to look for algebraic tractability criteria. We explore bounded rank (of the adjacency matrices) as a tractability criteria for WSim and MSim. Indeed, the NP-hardness reductions for GSim involve graphs which have adjacency matrices of high rank (e.g. paths, cycles). We show that the problem GSim (and WSim) remains NP-hard as long as one of the matrices has unbounded rank or negative eigenvalues. (Theorems 3.4, 3.5 and 3.6). Consequently, the realm of tractability for WSim (and MSim) is restricted to the class of positive semi-definite matrices of bounded rank. We feel that for a problem as hard as QAP or MSim, identifying any somewhat natural tractable special case is worthwhile. Our main result (Theorem 4.1) is a polynomial time algorithm for MSim if both input matrices are positive semidefinite (as it is the case for the Laplacian version of WSim) and have bounded-rank, and where one of the matrices has a bounded clustering number.
For the proof of Theorem 4.1, we can re-write the (squared) objective function as , where ranges over all permutation matrices. This is a convex function, and it would be feasible to minimize it over a convex domain. The real difficulty of the problem lies in the fact that we are optimizing over the complicated discrete space of permutation matrices. Our approach relies on a linearization of the solution space, and the key insight (Lemma 4.2) is that the optimal solution is essentially determined by polynomially many hyperplanes. To prove this, we exploit the convexity of the objective function in a peculiar way.
We denote the set by . Unless specified otherwise, we will always assume that the vertex set of an -vertex graph is . We denote the degree of a vertex by .
Given an matrix , the row (column) of is denoted by (). The multiset is denoted by . Given , the sum is denoted by . We denote the identity matrix by .
A real symmetric matrix is called positive semi-definite (p.s.d), denoted by , if the scalar is non-negative for every . The following conditions are well-known to be equivalent.
Every eigenvalue of is non-negative.
for some matrix . In other words, there exist vectors such that .
Given two vectors , their dot product is defined to be . Given , the inner product of w.r.t. M, denoted by , is defined to be . The usual dot product corresponds to the case , the identity matrix.
Every symmetric matrix of rank has a spectral decomposition . Here, is a diagonal matrix with the eigenvalues on the diagonal. The matrix is a matrix with the corresponding eigenvectors as the columns .
Graphs and Matrices.
The Laplacian matrix of a (weighted) undirected graph , denoted by , is defined as follows. Let be the symmetric (weighted) adjacency matrix of . Let be a diagonal matrix, such that is the sum of weights of the edges incident on the vertex. For simple undirectred graphs, . Define the Laplacian of as . This definition allows us to express the quadratic form
The above expression immediately implies that is positive semi-definite.
Recall the following definitions from Section 1. Given a -vertex graph , a graph drawing is a map , where the ambient dimension . We will use the adjacency matrix of a graph to generate spectral graph drawings as follows. Let the rank of be , and let be a spectral decomposition. Denote , where are the eigenvectors of . The mapping of our interest is defined by the rows of the matrix . Given any two spectral decompositions and , it holds that for some orthogonal matrix . Since is invertible, the number of distinct tuples in the set is equal to the corresponding number for the set . This allows us to define the clustering number of a graph : it is equal to the cardinality of the set , where is defined via some spectral decomposition of , as above. The above definitions generalize to weighted (undirected) graphs in an analogous manner.
The trace of a matrix , denoted by , is defined to be . The trace inner product of two matrices and , denoted by , is the scalar . The Frobenius norm of a matrix is defined in the introduction. It is easy to check that .
Given two -vertex graphs and and a permutation , a -mismatch between and is a pair such that and (or vice-versa). In other words, does not preserve adjacency for the pair . The following claim will be useful as a combinatorial interpretation of the Frobenius norm. Let denote the number of -mismatches between and .
The only non-zero terms in the expansion of summation correspond to -mismatches. Since every mismatch contributes and is counted twice in the summation, the claim follows. ∎
2.2 Convex Optimization
A hyperplane in the Euclidean space is a -dimensional affine subspace. The usual representation of a hyperplane is a linear equation for some . The convex sets and are called the open half-spaces corresponding to , denoted by respectively.
Two sets are weakly linearly separated if there exists a hyperplane such that and . In this case, we call them to be weakly linearly separated along . A family of sets is weakly linearly separated if for every , the sets are weakly linearly separated. Let be a partition of a set into sets . The partition is said to be mutually linearly separated if the family of sets is weakly linearly separated.
A subset is called convex if for every , , . A function is called convex on a convex set if for every , . The following theorem about linearization of convex differentiable functions is well-known and is stated without proof. The gradient of a function , denoted by , is the vector-valued function . Given , let denote the vector .
Theorem 2.1 (Convex function linearization)
Let be a convex function. For all , .
Next, we show that the linearization of a convex function can be useful in understanding its optima over a finite domain. We prove the following lemma about convex functions, which is interesting in its own right.
Let be a finite subset of . Let , such that is convex, and let be defined as . Let .
Then there exist a such that:
In other words, for every which maximizes over , there exists a partially “linearized” function such that maximizes over . Moreover, every maximizer of over is a maximizer of over . This additional condition is necessary so that this “linearization” does not create spurious optimal solutions.
Let . Since is convex, we can use Theorem 2.1 to linearize around . Hence, there exists a such that , or equivalently,
for all . Hence with , for all we have
where the inequality holds by (2) and because maximizes . Hence maximizes as well, which proves (i).
For (ii), consider . To prove that , it suffices to prove that . By (i), we have . Thus
where the inequality holds by (2) with and as maximizes . ∎
Let be a finite subset of . For all , let be a convex function, and let be defined ny . Let .
Then there are such that:
Inductively apply the lemma to the functions
Finally, we state an important fact about the convexity of quadratic functions. Given a p.s.d. matrix , the quadratic function is defined as .
Lemma 2.2 (Convexity of p.s.d)
is convex on .
For all , . Using , we can show that . Combining, we have . Hence, is convex. ∎
2.3 Simulation of Simplicity
In this section, we describe an elegant technique for handling degeneracy in the input data for geometrical algorithms that is due to Edelsbrunner and Mücke . We also state an important lemma which will be directly useful for our algorithmic results in Section 4.
An input set of points is said to be in general position, if there is no subset with that lies on a common hyperplane. If we are optimizing a certain function of this input on a discrete space , infinitesimally small perturbations of will not change the set of optimal solutions. Hence we may always assume (modulo infinitesimal perturbations) that such input sets are in general position and do not contain degenerate subsets. From the algorithmic point of view, the caveat is that these perturbations might be so small that we cannot even represent them efficiently.
In this context, Edelsbrunner and Mücke  developed a useful technique to handle degeneracy in input data, called Simulation-of-Simplicity. The idea is to introduce conceptual perturbations which eliminate all degeneracies: the perturbations are never computed explicitly in practice. In fact, the perturbations are just certain conveniently chosen polynomials in a parameter , so that after adding these polynomials to the coordinates the perturbed set agrees with the input set for . For our purposes, we require such a perturbation of an input set of points that brings them into general position. We select perturbations for and as follows. We perturb the coordinate of vector by adding . In our algorithmic application, we need to consistently answer queries of the type: “Given points (with ) and a point , does the point lie below, on, or above the hyperplane determined by ?” We can implement and answer such queries in time as follows. The answer to the query depends on the sign of the determinant of the following matrix , which is also the signed volume of the parallelopiped defined by the vectors .
The determinant of matrix is a polynomial in the , which can be computed in time by using the Leibniz expansion
It is easy to see that this polynomial is not identically zero, as every term in the Leibniz expansion yields a different polynomial. This property ensures the non-degeneracy in our conceptual perturbations. We impose a lexicographic ordering on as follows: . This induces a natural lexicographic ordering on the monomials in the polynomial . The lexicographically least monomial in this ordering has either a positive or a negative coefficient: we interpret the sign of this coefficient as the relative position of with respect to the hyperplane determined by . We refer the reader to  for further details. We summarize the above discussion in the following lemma.
Given a set of points in ,
The lexicographic ordering of the yields a canonical perturbation of the points such that the resulting set is in general position.
There exists an time subroutine which computes the relative position of a canonically perturbed point with respect to the hyperplane determined by canonically perturbed points.
3 Hardness Results
In this section, we show several new hardness results for problems and MSim. As we will observe, these problems turn out to be algorithmically intractable, even for severely restricted cases. We begin by recalling the following observation.
Theorem 3.1 (Folklore)
GSim is NP-hard for the class of simple undirected graphs.
In fact, the problem turns out to be NP-hard even for very restricted graph classes. The following theorem is the main hardness result of this section.
GSim is NP-hard for the class of trees.
On the other hand, if we restrict one of the input instances to be a path, the problem can be solved in polynomial time. The following theorem provides a positive example of tractability of GSim.
An input instance of GSim, where is a path and is a tree, can be solved in polynomial time.
The above results exhibit the hardness of GSim, and consequently, the hardness of the more general problems WSim and MSim. Since the graphs (for instance cycles and paths) involved in the hardness reductions have adjacency matrices of high rank, it is natural to ask whether MSim would become tractable for matrices of low rank. Our following theorem shows that MSim is NP-hard even for matrices of rank at most . The underlying reason for hardness is the well-known problem QAP, which shares the optimization domain .
MSim is NP-hard for symmetric matrices of rank at most .
The key to the above reduction is the fact that one of the matrices has non-negative Eigenvalues while the other matrix has non-positive Eigenvalues. We show that the MSim is NP-hard even for positive semi-definite matrices. The main idea is to reformulate the hardness reduction in Theorem 3.1 in terms of Laplacian matrices.
MSim is NP-hard for positive semi-definite matrices.
In fact, we show that the problem remains NP-hard, even if one of the matrices is of rank . The proof follows by modifying the matrices in the proof of Theorem 3.4 so that they are positive semi-definite.
MSim is NP-hard for positive semi-definite matrices, even if one of the matrices has rank .
Therefore, the realm of tractability for MSim is restricted to positive definite matrices of low rank. In the next section, we prove algorithmic results in this direction.
4 Algorithmic Results
In this section, we present the main algorithmic result of this paper. As established in the previous section, the domain of tractability for MSim is restricted to p.s.d. matrices with low rank. The main theorem of this section is stated as follows. Given an instance of MSim, let . Let be the clustering number of .
There is a algorithm for MSim. Here, the notation hides factors polynomial in the size of input representation.
In order to prove Theorem 4.1, we define a closely related optimization problem, called the Quadratic-Vector-Partition (QVP). Let be the set of all (ordered) partitions of into sets of size . I.e., an element is an ordered partition of , where . Given a set of vectors , we will employ two important notations. Denote to be the point-set corresponding to . Denote , .
The input instance to QVP is a set of vectors , along with two matrices and . The matrix is a p.s.d matrix of size . The matrix is a diagonal matrix with positive entries. The objective is to search for a partition which maximizes the following quadratic objective function .
Informally, the goal is to ‘cluster’ the set into sets of cardinalities such that the quadratic function above is maximized. The connection to MSim arises due to the following observation. We can interpret a permutation as a bijection where and are the respective spectral decompositions. Since , we must have and consequently, . Since the set has only distinct tuples (the clustering number), it suffices to examine the partitions of into sets of certain fixed cardinalities. It remains then to show that the minimization of the objective function for MSim can be reformulated as the maximization of the objective function for QVP.
The proof of Theorem 4.1 proceeds in three steps. First, in Section 4.1, we show a reduction from MSim to QVP. In particular, the dimension and the parameter for the QVP instance are equal to the rank and the clustering number in Theorem 4.1 respectively. Second, in Section 4.2, we show that the optimal solutions for a QVP instance have a nice geometrical structure. In particular, the convex-hulls of the point-sets in the partition are mutually disjoint (upto some caveats). Third, in Section 4.3, we describe a algorithm for QVP. The algorithm essentially enumerates all partitions with the optimal solution structure. This finishes the proof of Theorem 4.1.
4.1 Reduction to Qvp
In this subsection, we prove the following reduction lemma. Given two matrices , let . Let be the cluster-number of .
Given a MSim instance , we can compute a QVP-instance , where of size and , , in time such that the following holds. Given an optimal solution for the QVP-instance , we can compute in time.
4.2 Optimal Structure of Qvp
In this section, we show that the optimal solutions for a QVP instance have, in fact, a nice geometrical structure. Let denote the set of optimal solutions for a QVP instance , where of size . Recall from Section 2 that a partition of is mutually linearly separated if for every , there exists a hyperplane which weakly linearly separates and .
Let be an optimal partition for a QVP instance . The corresponding partition is mutually linearly separated.
The proof of Lemma 4.2 proceeds in three steps. Claim 4.1 shows that we can reformulate QVP as a convex programming problem in . Claim 4.2 stipulates certain necessary conditions for optimality, in this reformulated version. Using Claim 4.3, we revert back to the original QVP formulation in . This allows us to interpret the optimality conditions in Claim 4.2 as the mutually linearly separated property in Lemma 4.2.
Given a partition of , let be the vector of length corresponding to the coordinates of vectors . Formally, denotes the vector , . Recall that is a diagonal matrix with positive entries, say . The following claim shows that we can describe our problem as a convex programming problem in . The objective function is a sum of vector norms (squared).
The proof is deferred to the appendix.
The second step constitutes the key insight to the proof of Lemma 4.2. We show that an optimal solution for the convex program of Claim 4.1 must be an optimal solution for some linear program. The proof of this claim builds on the statements in Subsection 2.2 about linearization of convex objective functions. Recall that the set denote the set of optimal solutions for the QVP instance .
For every , there exist vectors such that is an optimal solution for the objective function
Moreover, the set of optimal solutions of is a subset of .
The proof is deferred to the appendix.
For every , there exist vectors such that is an optimal solution for the objective function
Moreover, the set of optimal solutions of is a subset of .
The proof is deferred to the appendix.
We finish with the proof of Lemma 4.2.
Recall the notation . Suppose there exist such that and are not (weakly) linearly separated. We claim that this is a contradiction. Indeed, we can isolate the terms and rewrite them as . Now we (weakly) linearly separate the set along the direction , that is, we choose a partition of such that . Then , because and are not (weakly) linearly separated by , and , because . Hence , which contradicts the maximality of . Therefore, it must be the case that the sets and are already (weakly) linearly separated along .
4.3 Algorithm for Qvp
We proceed with an informal description of the algorithm. Recall that a QVP instance is where . The output is an ordered partition of satisfying , for some fixed . Our strategy is simple: we enumerate all partitions of such that the sets are weakly linearly separated for every . By Lemma 4.2, this suffices to obtain an optimal partition. We briefly describe our algorithm. We first guess the separating hyperplanes , , where weakly linearly separates and . Let be the set of hyperplanes defined by -subsets of . It is sufficient to pick from the set , since a hyperplane in can be equivalently replaced by a hyperplane in , without changing the underlying (weakly) linear separation. These hyperplanes partition into convex regions. For every , we check its relative position with respect to these hyperplanes. We assign to one of the sets , depending of its relative position. We claim that every weakly linearly separated family of sets can be discovered on some branch of our computation. The choice of hyperplanes implies a branching. Therefore, the overall branching factor is . Algorithm 4.3 gives a formal description of our algorithm.
There are two caveats. First, we also pick an orientation for every hyperplane . The orientation indicates that (and vice-versa). Second, there may exist some points which lie on the hyperplanes, and hence, their assignments cannot be determined by their relative positions to these hyperplanes. To handle this degeneracy, we use the Simulation-of-Simplicity technique and assume general position. Therefore, there are at most such ambigious points. Since this is a bounded number, we can brute-force try all possible sets for such points. This leads to a branching factor of . The overall branching factor is still . We now proceed to give a formal description as Algorithm 4.3.
Given a QVP instance, Algorithm 4.3 correctly computes an optimal solution in time.
We first show the correctness. By Lemma 4.2, it suffices to show that Algorithm 4.3 computes all partitions of such that the family of sets is weakly linearly separated. We claim that Algorithm 4.3 discovers every such family of sets in Step 1. Indeed, for such a family , there exist hyperplanes which weakly linearly separate the sets , for . By S-o-S technique of Section 2.3, we can assume general position for the input set . It can be shown that for every hyperplane , we can equivalently find another hyperplane in with the following property. If is a partition of such that weakly linearly separates , then also weakly linearly separates . (Refer to Claim A.1 in the appendix). Therefore, there exists a branch of the algorithm in Step 1 such that we discover the hyperplanes . Steps 1 (b)-(d) ensure that we recover the partition .
The running time can be bounded as follows. The branching in Step 1 is bounded by . In Step 1 (b), the number of calls to the Simulation of Simplicity subroutine is bounded by , since we have points, regions and queries . By Lemma 2.3, every call to this subroutine has a cost In Step 1 (c), there is an additional branching factor of for brute-force assignment of points in . These are precisely the points which lie on some hyperplane , and hence . This incurs an additonal branching. The remaining steps are usual polynomial time computations. The overall running time is thus bounded by .
Finally, we summarize this section with the proof of our main theorem.