Graph Similarity and Approximate Isomorphism
Abstract
The graph similarity problem, also known as approximate graph isomorphism or graph matching problem, has been extensively studied in the machine learning community, but has not received much attention in the algorithms community: Given two graphs of the same order with adjacency matrices $A$ and $B$, a well-studied measure of similarity is the Frobenius distance $d(G,H) := \min_{\pi} \|A^{\pi} - B\|_F$,
where $\pi$ ranges over all permutations of the vertex set of $G$, where $A^{\pi}$ denotes the matrix obtained from $A$ by permuting rows and columns according to $\pi$, and where $\|M\|_F$ is the Frobenius norm of a matrix $M$. The (weighted) graph similarity problem, denoted by GSim (WSim), is the problem of computing this distance for two graphs of the same order. This problem is closely related to the notoriously hard quadratic assignment problem (QAP), which is known to be NP-hard even for severely restricted cases.
It is known that GSim (WSim) is NP-hard; we strengthen this hardness result by showing that the problem remains NP-hard even for the class of trees. Identifying the boundary of tractability for WSim is best done in the framework of linear algebra. We show that WSim is NP-hard as long as one of the matrices has unbounded rank or negative eigenvalues: hence, the realm of tractability is restricted to positive semidefinite matrices of bounded rank. Our main result is a polynomial-time algorithm for the special case where one of the matrices has a bounded clustering number, a parameter arising from spectral graph drawing techniques.
1 Introduction
Graph isomorphism has been a central open problem in algorithmics for the last 50 years. The question of whether graph isomorphism is in polynomial time is still wide open, but at least we know that it is in quasi-polynomial time [4]. On the practical side, the problem is largely viewed as solved; there are excellent tools [9, 16, 21, 22] that efficiently decide isomorphism on all but very contrived graphs [25]. However, for many applications, notably in machine learning, we only need to know whether two graphs are “approximately isomorphic”, or more generally, how “similar” they are. The resulting graph similarity problem has been extensively studied in the machine learning literature under the name graph matching (e.g. [1, 10, 15, 29, 30]), and also in the context of the schema matching problem in database systems (e.g. [23]). Given the practical significance of the problem, surprisingly few theoretical results are known. Before we discuss these known results and our new ones, let us state the problem formally.
Graph similarity.
It is not obvious how to define the distance between two graphs, but the distance measure that we study here seems to be the most straightforward one, and it certainly is the one that has been studied most. For two $n$-vertex graphs $G$ and $H$ with adjacency matrices $A$ and $B$, we define the Frobenius distance between $G$ and $H$ to be

$$d_F(G,H) := \min_{\pi} \|A^{\pi} - B\|_F. \qquad (1)$$

Here $\pi$ ranges over all permutations of the vertex set of $G$, $A^{\pi}$ denotes the matrix obtained from $A$ by permuting rows and columns according to $\pi$, and the norm $\|M\|_F := \big(\sum_{i,j} M_{ij}^2\big)^{1/2}$ is the Frobenius norm of a matrix $M = (M_{ij})$. Note that $\tfrac{1}{2}\, d_F(G,H)^2$ counts the number of edge mismatches in an optimal alignment of the two graphs. The graph similarity problem, denoted by GSim, is the problem of computing $d_F(G,H)$ for graphs $G, H$ of the same order, or, depending on the context, the decision version of this problem (decide whether $d_F(G,H) \le d$ for a given $d$). We can easily extend the definitions to weighted graphs and denote the weighted graph similarity problem by WSim. In practice, this is often the more relevant problem. Instead of the adjacency matrices of graphs, we may also use the Laplacian matrices of the graphs to define distances. Recall that the Laplacian matrix of a graph $G$ is the matrix $L_G := D_G - A_G$, where $A_G$ is the adjacency matrix of $G$ and $D_G$ is the diagonal matrix in which the entry $(D_G)_{ii}$ is the degree of the $i$-th vertex, or, in the weighted case, the sum of the weights of the incident edges. Let $d_L(G,H) := \min_{\pi} \|L_G^{\pi} - L_H\|_F$ be the corresponding distance measure. Intuitively, in the definition of $d_L$ we prefer permutations that map vertices of similar degrees onto one another. Technically, $d_L$ is interesting because the Laplacian matrices are positive semidefinite (if the weights are nonnegative). Both the (weighted) graph similarity problem and its version for the Laplacian matrices are special cases of the problem MSim of computing $\min_{\pi} \|A^{\pi} - B\|_F$ for given symmetric matrices $A, B \in \mathbb{R}^{n \times n}$. In the Laplacian case, these matrices are positive semidefinite.
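For intuition, the distance (1) can be evaluated by brute force on very small graphs. The following sketch tries all $n!$ permutations; the function names are ours, chosen for illustration:

```python
import itertools
import math

def frobenius_distance(A, B):
    """Brute-force d_F: minimize ||A^pi - B||_F over all permutations pi.

    A, B are n x n adjacency matrices (nested lists). Exponential in n,
    so this is only a sanity check for tiny graphs.
    """
    n = len(A)
    best = math.inf
    for pi in itertools.permutations(range(n)):
        # (A^pi)_{ij} = A_{pi(i), pi(j)}: rows and columns permuted by pi.
        d2 = sum((A[pi[i]][pi[j]] - B[i][j]) ** 2
                 for i in range(n) for j in range(n))
        best = min(best, d2)
    return math.sqrt(best)

# Triangle vs. path on 3 vertices: the best alignment leaves exactly one
# edge mismatch, so d_F^2 = 2.
K3 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
P3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```

Each mismatched edge contributes twice to the squared norm, which is why the triangle/path example yields $d_F^2 = 2$.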
The QAP.
The graph similarity problem is closely related to the quadratic assignment problem (QAP) [6]: given two $n \times n$ matrices $A$ and $B$, the goal is to find a permutation $\pi$ that minimizes $\sum_{i,j} A_{ij} B_{\pi(i)\pi(j)}$. The usual interpretation is that we have $n$ facilities that we want to assign to $n$ locations. The entry $A_{ij}$ is the flow from the $i$-th to the $j$-th facility, and the entry $B_{ij}$ is the distance from the $i$-th to the $j$-th location. The goal is to find an assignment of facilities to locations that minimizes the total cost, where the cost for each pair of facilities is defined as the flow times the distance between their locations. The QAP has a large number of real-world applications, for instance hospital planning [12], typewriter keyboard design [27], ranking of archeological data [18], and scheduling parallel production lines [14]. On the theoretical side, the QAP contains well-known optimization problems as special cases, for instance the Travelling Salesman Problem, the feedback arc set problem, the maximum clique problem, and all kinds of problems centered around graph partitioning, graph embedding, and graph packing.
In the maximization version maxQAP of QAP, the objective is to maximize $\sum_{i,j} A_{ij} B_{\pi(i)\pi(j)}$ (see [19, 24]). Both QAP and maxQAP are notoriously hard combinatorial optimization problems, in terms of practical solvability [28] as well as in terms of theoretical hardness results, even for very restricted special cases [5, 8, 7]. It is easy to see that MSim is equivalent to maxQAP: expanding the square gives $\|A^{\pi} - B\|_F^2 = \|A\|_F^2 + \|B\|_F^2 - 2\sum_{i,j} A_{\pi(i)\pi(j)} B_{ij}$, so minimizing the distance amounts to maximizing the cross term; in reductions between QAP and MSim, the sign of one of the two matrices is flipped. Most of the known results for GSim and its variants are derived from results for (max)QAP.
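The equivalence between MSim and maxQAP can be checked directly on small instances via the identity $\min_{\pi} \|A^{\pi} - B\|_F^2 = \|A\|_F^2 + \|B\|_F^2 - 2\,\mathrm{maxQAP}(A,B)$. The brute-force sketch below is our own illustration, not code from the paper:

```python
import itertools

def max_qap(A, B):
    """maxQAP value: maximize sum_{i,j} A[i][j] * B[pi(i)][pi(j)] over pi."""
    n = len(A)
    return max(sum(A[i][j] * B[pi[i]][pi[j]] for i in range(n) for j in range(n))
               for pi in itertools.permutations(range(n)))

def min_frobenius_sq(A, B):
    """MSim value squared: minimize ||A^pi - B||_F^2 over pi."""
    n = len(A)
    return min(sum((A[pi[i]][pi[j]] - B[i][j]) ** 2
                   for i in range(n) for j in range(n))
               for pi in itertools.permutations(range(n)))

A = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]   # triangle
B = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]   # path
norm_sq = lambda M: sum(v * v for row in M for v in row)
# The identity: min ||A^pi - B||^2 == ||A||^2 + ||B||^2 - 2 * maxQAP(A, B).
```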
Previous Work.
It seems to be folklore knowledge that GSim is NP-complete. For example, this can be seen by a reduction from the Hamiltonian path problem: take $G$ to be the $n$-vertex input graph and $H$ a path on $n$ vertices; then $d_F(G,H)^2 = 2(|E(G)| - n + 1)$ if and only if $G$ has a Hamiltonian path. By the same argument, we can actually reduce the subgraph isomorphism problem to GSim. Arvind, Köbler, Kuhnert, and Vasudev [3] study several versions of what they call approximate graph isomorphism; their problem Min-PGI is the same as our GSim. They prove various hardness of approximation results. Based on an earlier QAP-approximation algorithm due to Arora, Frieze, and Kaplan [2], they also obtain a quasi-polynomial time approximation algorithm for the related problem Max-PGI. Further hardness results were obtained by Makarychev, Manokaran, and Sviridenko [19] and O’Donnell, Wright, Wu, and Zhou [26], who prove an average-case hardness result for a variant of the GSim problem that they call robust graph isomorphism. Keldenich [17] studied the similarity problem for a wide range of matrix norms (instead of the Frobenius norm) and proved hardness for essentially all of them.
Spectral Graph Visualization.
Since WSim and MSim are essentially linear-algebraic problems, it is reasonable to hope that the spectral structure of the input (adjacency) matrices is closely related to the computational complexity of these problems. In this regard, we remark that spectral graph drawing is a well-established technique for visualizing graphs via their spectral properties. Formally, let $G$ be an $n$-vertex graph: a graph drawing is a map $\rho : V(G) \to \mathbb{R}^d$, where the ambient space has dimension $d \le n$. For spectral graph drawings, this map is typically defined as follows. We select a suitable matrix representation of the graph and select $d$ eigenvectors $v_1, \dots, v_d$ of this matrix. Then, the mapping $\rho$ is defined by the rows of the $n \times d$ matrix $[v_1 \cdots v_d]$. The choice of the matrix representation and the selection of eigenvectors usually depend on the problem at hand. The most useful matrix representation in the spectral drawing framework is the well-known Laplacian matrix: the eigenvectors corresponding to the $d$ smallest eigenvalues define the drawing of interest.
Observe that the graph drawing $\rho$ defined above is not injective in general. Given such a drawing $\rho$, we define the clustering number of a graph $G$ to be the cardinality of the image set $\rho(V(G))$. The elements of $\rho(V(G))$ correspond to subsets of $V(G)$: every vertex in such a ‘cluster’ has identical adjacency.
Our results.
So where does all this leave us? Well, GSim is obviously an extremely hard optimization problem. We start our investigations by adding to the body of known hardness results: we prove that GSim remains NP-hard even if both input graphs are trees (Theorem 3.2). Note that, in strong contrast to this, the subgraph isomorphism problem becomes easy if both input graphs are trees [20]. The reduction from Hamiltonian path sketched above shows that GSim is also hard if one input graph is a path. We prove that GSim is tractable in the very restricted case where one of the input graphs is a path and the other one is a tree (Theorem 3.3).
As WSim and MSim are essentially linear-algebraic problems, it makes sense to look for algebraic tractability criteria. We explore bounded rank (of the adjacency matrices) as a tractability criterion for WSim and MSim. Indeed, the NP-hardness reductions for GSim involve graphs which have adjacency matrices of high rank (e.g. paths, cycles). We show that the problem GSim (and WSim) remains NP-hard as long as one of the matrices has unbounded rank or negative eigenvalues (Theorems 3.4, 3.5 and 3.6). Consequently, the realm of tractability for WSim (and MSim) is restricted to the class of positive semidefinite matrices of bounded rank. We feel that for a problem as hard as QAP or MSim, identifying any somewhat natural tractable special case is worthwhile. Our main result (Theorem 4.1) is a polynomial-time algorithm for MSim if both input matrices are positive semidefinite (as is the case for the Laplacian version of WSim) and have bounded rank, and one of the matrices has a bounded clustering number.
For the proof of Theorem 4.1, we can rewrite the (squared) objective function as $\|AP - PB\|_F^2$, where $P$ ranges over all permutation matrices. This is a convex function, and it would be feasible to minimize it over a convex domain. The real difficulty of the problem lies in the fact that we are optimizing over the complicated discrete space of permutation matrices. Our approach relies on a linearization of the solution space, and the key insight (Lemma 4.2) is that the optimal solution is essentially determined by polynomially many hyperplanes. To prove this, we exploit the convexity of the objective function in a peculiar way.
2 Preliminaries
2.1 Notation
We denote the set $\{1, \dots, n\}$ by $[n]$. Unless specified otherwise, we will always assume that the vertex set of an $n$-vertex graph is $[n]$. We denote the degree of a vertex $v$ by $\deg(v)$.
Matrices.
Given an $n \times n$ matrix $M$, the $i$-th row (column) of $M$ is denoted by $M_{i\cdot}$ ($M_{\cdot i}$). The multiset $\{M_{1\cdot}, \dots, M_{n\cdot}\}$ of rows is denoted by $\mathrm{rows}(M)$. Given a finite set $X \subseteq \mathbb{R}^d$, the sum $\sum_{x \in X} x$ is denoted by $S(X)$. We denote the $n \times n$ identity matrix by $I_n$.
A real symmetric $n \times n$ matrix $M$ is called positive semidefinite (p.s.d.), denoted by $M \succeq 0$, if the scalar $x^T M x$ is nonnegative for every $x \in \mathbb{R}^n$. The following conditions are well-known to be equivalent.

$M \succeq 0$.

Every eigenvalue of $M$ is nonnegative.

$M = N^T N$ for some matrix $N$. In other words, there exist vectors $u_1, \dots, u_n \in \mathbb{R}^n$ such that $M_{ij} = u_i \cdot u_j$.
Given two vectors $x, y \in \mathbb{R}^d$, their dot product is defined to be $x \cdot y = \sum_{i=1}^{d} x_i y_i$. Given $M \succeq 0$, the inner product of $x$ and $y$ w.r.t. $M$, denoted by $\langle x, y \rangle_M$, is defined to be $x^T M y$. The usual dot product corresponds to the case $M = I$, the identity matrix.
Every symmetric $n \times n$ matrix $A$ of rank $k$ has a spectral decomposition $A = V \Lambda V^T$. Here, $\Lambda$ is a $k \times k$ diagonal matrix with the nonzero eigenvalues $\lambda_1, \dots, \lambda_k$ of $A$ on the diagonal. The matrix $V$ is an $n \times k$ matrix with the corresponding eigenvectors $v_1, \dots, v_k$ as the columns.
Graphs and Matrices.
The Laplacian matrix of a (weighted) undirected graph $G$, denoted by $L_G$, is defined as follows. Let $A$ be the symmetric (weighted) adjacency matrix of $G$. Let $D$ be the diagonal matrix such that $D_{ii}$ is the sum of the weights of the edges incident on the $i$-th vertex. For simple undirected graphs, $D_{ii} = \deg(i)$. Define the Laplacian of $G$ as $L_G := D - A$. This definition allows us to express the quadratic form
$$x^T L_G x = \sum_{\{i,j\} \in E(G)} w_{ij} (x_i - x_j)^2.$$
The above expression immediately implies that $L_G$ is positive semidefinite when the edge weights $w_{ij}$ are nonnegative.
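The quadratic-form identity is easy to check concretely; a small sketch with helper names of our own choosing:

```python
def laplacian(A):
    """L_G = D - A for a (weighted) adjacency matrix A given as nested lists."""
    n = len(A)
    return [[(sum(A[i]) if i == j else 0) - A[i][j] for j in range(n)]
            for i in range(n)]

def quad_form(L, x):
    """Evaluate x^T L x."""
    n = len(L)
    return sum(L[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

# On the path 1-2-3 with unit weights, x^T L x must equal
# (x_1 - x_2)^2 + (x_2 - x_3)^2, a sum of squares (hence L is p.s.d.).
P3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x = [1, 2, 4]
edge_sum = (x[0] - x[1]) ** 2 + (x[1] - x[2]) ** 2
```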
Clustering Number.
Recall the following definitions from Section 1. Given an $n$-vertex graph $G$, a graph drawing is a map $\rho : V(G) \to \mathbb{R}^d$, where the ambient dimension $d \le n$. We will use the adjacency matrix $A$ of a graph to generate spectral graph drawings as follows. Let the rank of $A$ be $k$, and let $A = V \Lambda V^T$ be a spectral decomposition. Denote $V = [v_1 \cdots v_k]$, where $v_1, \dots, v_k$ are the eigenvectors of $A$. The mapping $\rho$ of our interest is defined by the rows of the $n \times k$ matrix $V$. Given any two spectral decompositions $V \Lambda V^T$ and $W \Lambda W^T$, it holds that $W = VQ$ for some orthogonal $k \times k$ matrix $Q$. Since $Q$ is invertible, the number of distinct rows of $W$ is equal to the corresponding number for $V$. This allows us to define the clustering number $c(G)$ of a graph $G$: it is equal to the cardinality of the set $\{\rho(1), \dots, \rho(n)\}$, where $\rho$ is defined via some spectral decomposition of $A$, as above. The above definitions generalize to weighted (undirected) graphs in an analogous manner.
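Since $A = V \Lambda V^T$ and $V$ has full column rank, one can check that two vertices land on the same point of the drawing exactly when their rows of $A$ coincide. The clustering number can therefore be computed without any eigendecomposition; a minimal sketch of ours:

```python
def clustering_number(A):
    """Clustering number of the graph with adjacency matrix A.

    Vertices mapped to the same point of the spectral drawing are exactly
    the vertices with identical rows of A, so it suffices to count the
    distinct rows (avoiding floating-point eigenvector comparisons).
    """
    return len({tuple(row) for row in A})

# Star K_{1,3}: the three leaves have identical adjacency, so the
# drawing has two distinct points (center and leaves).
star = [[0, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0]]
```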
Frobenius Norm.
The trace of an $n \times n$ matrix $M$, denoted by $\mathrm{tr}(M)$, is defined to be $\sum_{i=1}^{n} M_{ii}$. The trace inner product of two matrices $M$ and $N$, denoted by $\langle M, N \rangle_{\mathrm{tr}}$, is the scalar $\mathrm{tr}(M^T N)$. The Frobenius norm $\|M\|_F$ of a matrix $M$ is defined in the introduction. It is easy to check that $\|M\|_F^2 = \mathrm{tr}(M^T M) = \langle M, M \rangle_{\mathrm{tr}}$.
Given two $n$-vertex graphs $G$ and $H$ and a permutation $\pi$ of $[n]$, a mismatch between $G$ and $H$ (w.r.t. $\pi$) is a pair $\{i,j\}$ such that $\{i,j\} \in E(H)$ and $\{\pi(i), \pi(j)\} \notin E(G)$ (or vice versa). In other words, $\pi$ does not preserve adjacency for the pair $\{i,j\}$. The following claim will be useful as a combinatorial interpretation of the Frobenius norm. Let $\mathrm{mis}(G,H,\pi)$ denote the number of mismatches between $G$ and $H$ w.r.t. $\pi$.
Claim 2.1
$\|A^{\pi} - B\|_F^2 = 2 \cdot \mathrm{mis}(G,H,\pi)$.
Proof.
The only nonzero terms in the expansion of the summation $\sum_{i,j} (A^{\pi}_{ij} - B_{ij})^2$ correspond to mismatches. Since every mismatch contributes $1$ and is counted twice in the summation (once as $(i,j)$ and once as $(j,i)$), the claim follows. ∎
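Claim 2.1 can be verified exhaustively on small 0/1 matrices; an illustration of ours:

```python
import itertools

def mismatches(A, B, pi):
    """Count pairs {i, j}, i < j, where pi fails to preserve adjacency."""
    n = len(A)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if A[pi[i]][pi[j]] != B[i][j])

A = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]  # path with middle vertex 0
B = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path with middle vertex 1
for pi in itertools.permutations(range(3)):
    frob_sq = sum((A[pi[i]][pi[j]] - B[i][j]) ** 2
                  for i in range(3) for j in range(3))
    # Each mismatch contributes twice: once as (i, j) and once as (j, i).
    assert frob_sq == 2 * mismatches(A, B, pi)
```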
2.2 Convex Optimization
A hyperplane $h$ in the Euclidean space $\mathbb{R}^d$ is a $(d-1)$-dimensional affine subspace. The usual representation of a hyperplane is a linear equation $a \cdot x = b$ for some $a \in \mathbb{R}^d$, $b \in \mathbb{R}$. The convex sets $\{x : a \cdot x > b\}$ and $\{x : a \cdot x < b\}$ are called the open halfspaces corresponding to $h$, denoted by $h^+$ and $h^-$ respectively.
Two sets $X, Y \subseteq \mathbb{R}^d$ are weakly linearly separated if there exists a hyperplane $h$ such that $X \cap h^- = \emptyset$ and $Y \cap h^+ = \emptyset$. In this case, we say that they are weakly linearly separated along $h$. A family of sets $X_1, \dots, X_p$ is weakly linearly separated if for every $i \neq j$, the sets $X_i$ and $X_j$ are weakly linearly separated. Let $X_1, \dots, X_p$ be a partition of a set $X \subseteq \mathbb{R}^d$ into $p$ sets. The partition is said to be mutually linearly separated if the family of sets $X_1, \dots, X_p$ is weakly linearly separated.
A subset $D \subseteq \mathbb{R}^d$ is called convex if for every $x, y \in D$ and $\lambda \in [0,1]$, $\lambda x + (1-\lambda) y \in D$. A function $f : D \to \mathbb{R}$ is called convex on a convex set $D$ if for every $x, y \in D$ and $\lambda \in [0,1]$, $f(\lambda x + (1-\lambda) y) \le \lambda f(x) + (1-\lambda) f(y)$. The following theorem about the linearization of convex differentiable functions is well-known and is stated without proof. The gradient of a differentiable function $f$, denoted by $\nabla f$, is the vector-valued function $\big(\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_d}\big)$. Given $x \in \mathbb{R}^d$, let $\nabla f(x)$ denote the gradient vector of $f$ at $x$.
Theorem 2.1 (Convex function linearization)
Let $f : \mathbb{R}^d \to \mathbb{R}$ be a convex differentiable function. For all $x, y \in \mathbb{R}^d$, $f(y) \ge f(x) + \nabla f(x) \cdot (y - x)$.
Next, we show that the linearization of a convex function can be useful in understanding its optima over a finite domain. We prove the following lemma about convex functions, which is interesting in its own right.
Lemma 2.1
Let $S$ be a finite subset of $\mathbb{R}^d$. Let $g, h : \mathbb{R}^d \to \mathbb{R}$, such that $g$ is convex and differentiable, and let $f$ be defined as $f = g + h$. Let $x^* \in \mathrm{argmax}_{x \in S} f(x)$.
Then there exists a vector $c \in \mathbb{R}^d$ such that:

$x^* \in \mathrm{argmax}_{x \in S} \ell(x)$, where $\ell(x) := c \cdot x + h(x)$;

$\mathrm{argmax}_{x \in S} \ell(x) \subseteq \mathrm{argmax}_{x \in S} f(x)$.
In other words, for every $x^*$ which maximizes $f$ over $S$, there exists a partially “linearized” function $\ell$ such that $x^*$ maximizes $\ell$ over $S$. Moreover, every maximizer of $\ell$ over $S$ is a maximizer of $f$ over $S$. This additional condition is necessary so that this “linearization” does not create spurious optimal solutions.
Proof.
Let $c := \nabla g(x^*)$. Since $g$ is convex, we can use Theorem 2.1 to linearize $g$ around $x^*$. Hence, for all $y \in \mathbb{R}^d$ we have $g(y) \ge g(x^*) + c \cdot (y - x^*)$, or equivalently,
$$c \cdot y \le g(y) - g(x^*) + c \cdot x^* \qquad (2)$$
for all $y \in \mathbb{R}^d$. Hence with $\ell(x) := c \cdot x + h(x)$, for all $y \in S$ we have
$$\ell(y) = c \cdot y + h(y) \le f(y) - g(x^*) + c \cdot x^* \le f(x^*) - g(x^*) + c \cdot x^* = \ell(x^*),$$
where the first inequality holds by (2) and the second because $x^*$ maximizes $f$ over $S$. Hence $x^*$ maximizes $\ell$ as well, which proves (i).
For (ii), consider $y^* \in \mathrm{argmax}_{x \in S} \ell(x)$. To prove that $y^* \in \mathrm{argmax}_{x \in S} f(x)$, it suffices to prove that $f(y^*) \ge f(x^*)$. By (i), we have $\ell(y^*) \ge \ell(x^*)$. Thus
$$f(y^*) = g(y^*) + h(y^*) \ge g(x^*) - c \cdot x^* + \ell(y^*) \ge g(x^*) - c \cdot x^* + \ell(x^*) = f(x^*),$$
where the first inequality holds by (2) with $y = y^*$ and the second as $y^*$ maximizes $\ell$. ∎
Corollary 2.1
Let $S$ be a finite subset of $\mathbb{R}^d$. For all $i \in [m]$, let $g_i : \mathbb{R}^d \to \mathbb{R}$ be a convex differentiable function, and let $f$ be defined by $f = \sum_{i=1}^{m} g_i$. Let $x^* \in \mathrm{argmax}_{x \in S} f(x)$.
Then there are vectors $c_1, \dots, c_m \in \mathbb{R}^d$ such that:

$x^* \in \mathrm{argmax}_{x \in S} \ell(x)$, where $\ell(x) := \sum_{i=1}^{m} c_i \cdot x$;

$\mathrm{argmax}_{x \in S} \ell(x) \subseteq \mathrm{argmax}_{x \in S} f(x)$.
Proof.
Inductively apply Lemma 2.1, linearizing one summand at a time: in the $j$-th step ($j = 1, \dots, m$), apply the lemma with $g = g_j$ and $h = \sum_{i < j} c_i \cdot x + \sum_{i > j} g_i(x)$, which yields the vector $c_j$.
∎
Finally, we state an important fact about the convexity of quadratic functions. Given a p.s.d. matrix $M \in \mathbb{R}^{d \times d}$, the quadratic function $f_M : \mathbb{R}^d \to \mathbb{R}$ is defined as $f_M(x) := x^T M x$.
Lemma 2.2 (Convexity of p.s.d)
$f_M$ is convex on $\mathbb{R}^d$.
Proof.
For all $x, y \in \mathbb{R}^d$ and $\lambda \in [0,1]$, $f_M(\lambda x + (1-\lambda) y) = \lambda^2 f_M(x) + (1-\lambda)^2 f_M(y) + 2\lambda(1-\lambda)\, x^T M y$. Using $(x-y)^T M (x-y) \ge 0$, we can show that $2\, x^T M y \le x^T M x + y^T M y$. Combining, we have $f_M(\lambda x + (1-\lambda) y) \le \lambda f_M(x) + (1-\lambda) f_M(y)$. Hence, $f_M$ is convex. ∎
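A quick numerical sanity check of Lemma 2.2: build a p.s.d. matrix as $M = N^T N$ and test the defining inequality at random points (a sketch of our own, with a small tolerance for floating-point error):

```python
import random

def quad(M, x):
    """Evaluate x^T M x."""
    d = len(M)
    return sum(M[i][j] * x[i] * x[j] for i in range(d) for j in range(d))

random.seed(0)
d = 4
N = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(d)]
# M = N^T N is p.s.d. by construction (condition (iii) above).
M = [[sum(N[k][i] * N[k][j] for k in range(d)) for j in range(d)]
     for i in range(d)]
for _ in range(100):
    x = [random.uniform(-5, 5) for _ in range(d)]
    y = [random.uniform(-5, 5) for _ in range(d)]
    lam = random.random()
    z = [lam * xi + (1 - lam) * yi for xi, yi in zip(x, y)]
    assert quad(M, z) <= lam * quad(M, x) + (1 - lam) * quad(M, y) + 1e-9
```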
2.3 Simulation of Simplicity
In this section, we describe an elegant technique for handling degeneracy in the input data for geometrical algorithms that is due to Edelsbrunner and Mücke [11]. We also state an important lemma which will be directly useful for our algorithmic results in Section 4.
An input set $P = \{p_1, \dots, p_n\} \subseteq \mathbb{R}^d$ of points is said to be in general position if there is no subset of $d+1$ points that lies on a common hyperplane. If we are optimizing a certain function of this input over a discrete space, infinitesimally small perturbations of $P$ will not change the set of optimal solutions. Hence we may always assume (modulo infinitesimal perturbations) that such input sets are in general position and do not contain degenerate subsets. From the algorithmic point of view, the caveat is that these perturbations might be so small that we cannot even represent them efficiently.
In this context, Edelsbrunner and Mücke [11] developed a useful technique to handle degeneracy in input data, called Simulation of Simplicity. The idea is to introduce conceptual perturbations which eliminate all degeneracies: the perturbations are never computed explicitly in practice. In fact, the perturbations are just certain conveniently chosen polynomials in a parameter $\varepsilon$, so that after adding these polynomials to the coordinates the perturbed set agrees with the input set for $\varepsilon = 0$. For our purposes, we require such a perturbation of an input set $P$ of points that brings them into general position. We select perturbations $\varepsilon(i,j)$ for $i \in [n]$ and $j \in [d]$ as follows: we perturb the $j$-th coordinate of the vector $p_i$ by adding $\varepsilon(i,j)$. In our algorithmic application, we need to consistently answer queries of the type: “Given $d$ points $p_{i_1}, \dots, p_{i_d}$ (with $i_1 < \dots < i_d$) and a point $p_{i_0}$, does the point $p_{i_0}$ lie below, on, or above the hyperplane determined by $p_{i_1}, \dots, p_{i_d}$?” We can implement and answer such queries in $d^{O(d)}$ time as follows. The answer to the query depends on the sign of the determinant of the $(d+1) \times (d+1)$ matrix $\Gamma$ whose rows are $(1, p_{i_0}), (1, p_{i_1}), \dots, (1, p_{i_d})$ (with perturbed coordinates), which is also the signed volume of the parallelepiped defined by the vectors $p_{i_1} - p_{i_0}, \dots, p_{i_d} - p_{i_0}$.
The determinant of the matrix $\Gamma$ is a polynomial in $\varepsilon$, which can be computed in $d^{O(d)}$ time by using the Leibniz expansion
$$\det(\Gamma) = \sum_{\sigma} \mathrm{sgn}(\sigma) \prod_{i} \Gamma_{i, \sigma(i)}.$$
It is easy to see that this polynomial is not identically zero, as every term in the Leibniz expansion yields a different polynomial in $\varepsilon$. This property ensures the non-degeneracy of our conceptual perturbations. We impose a lexicographic ordering on the perturbations $\varepsilon(i,j)$ as follows: $\varepsilon(i,j) \prec \varepsilon(i',j')$ if and only if $(i,j)$ lexicographically precedes $(i',j')$. This induces a natural lexicographic ordering on the monomials in the polynomial $\det(\Gamma)$. The lexicographically least monomial with a nonzero coefficient in this ordering has either a positive or a negative coefficient: we interpret the sign of this coefficient as the relative position of $p_{i_0}$ with respect to the hyperplane determined by $p_{i_1}, \dots, p_{i_d}$. We refer the reader to [11] for further details. We summarize the above discussion in the following lemma.
Lemma 2.3
Given a set $P$ of $n$ points in $\mathbb{R}^d$,

the lexicographic ordering of the perturbations $\varepsilon(i,j)$ yields a canonical perturbation of the points such that the resulting set is in general position;

there exists a $d^{O(d)}$-time subroutine which computes the relative position of a canonically perturbed point with respect to the hyperplane determined by $d$ canonically perturbed points.
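The heart of the position query is the sign of the lifted determinant. The exact-arithmetic sketch below (names ours) omits the symbolic $\varepsilon$-perturbation, which would kick in exactly when the computed sign is $0$:

```python
from fractions import Fraction

def det(M):
    """Exact determinant by Laplace expansion (fine for the small d used here)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = Fraction(0)
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def orientation(p0, points):
    """Position of p0 relative to the hyperplane through `points` (d points
    in R^d), as the sign of the lifted determinant: +1, -1, or 0. A return
    value of 0 is the degenerate case that Simulation of Simplicity resolves
    via the conceptual perturbations described above."""
    rows = [[Fraction(1)] + [Fraction(c) for c in p] for p in [p0] + points]
    d = det(rows)
    return (d > 0) - (d < 0)
```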
3 Hardness Results
In this section, we show several new hardness results for the problems GSim, WSim, and MSim. As we will observe, these problems turn out to be algorithmically intractable even for severely restricted cases. We begin by recalling the following observation.
Theorem 3.1 (Folklore)
GSim is NP-hard for the class of simple undirected graphs.
In fact, the problem turns out to be NP-hard even for very restricted graph classes. The following theorem is the main hardness result of this section.
Theorem 3.2
GSim is NP-hard for the class of trees.
On the other hand, if we restrict one of the input graphs to be a path, the problem can be solved in polynomial time. The following theorem provides a positive example of the tractability of GSim.
Theorem 3.3
An input instance $(G, H)$ of GSim, where $G$ is a path and $H$ is a tree, can be solved in polynomial time.
The above results exhibit the hardness of GSim, and consequently, the hardness of the more general problems WSim and MSim. Since the graphs involved in the hardness reductions (for instance cycles and paths) have adjacency matrices of high rank, it is natural to ask whether MSim becomes tractable for matrices of low rank. Our following theorem shows that MSim is NP-hard even for matrices of rank at most one. The underlying reason for the hardness is the close relation to the well-known problem QAP, which shares the same solution space, the set of permutations of $[n]$.
Theorem 3.4
MSim is NP-hard for symmetric matrices of rank at most one.
The key to the above reduction is the fact that one of the matrices has nonnegative eigenvalues while the other matrix has nonpositive eigenvalues. We show that MSim remains NP-hard even for positive semidefinite matrices. The main idea is to reformulate the hardness reduction of Theorem 3.1 in terms of Laplacian matrices.
Theorem 3.5
MSim is NP-hard for positive semidefinite matrices.
In fact, we show that the problem remains NP-hard even if one of the matrices has rank one. The proof follows by modifying the matrices in the proof of Theorem 3.4 so that they are positive semidefinite.
Theorem 3.6
MSim is NP-hard for positive semidefinite matrices, even if one of the matrices has rank one.
Therefore, the realm of tractability for MSim is restricted to positive semidefinite matrices of low rank. In the next section, we prove algorithmic results in this direction.
4 Algorithmic Results
In this section, we present the main algorithmic result of this paper. As established in the previous section, the domain of tractability for MSim is restricted to p.s.d. matrices of low rank. The main theorem of this section is stated as follows. Given an instance $(A, B)$ of MSim, let $k := \max\{\mathrm{rank}(A), \mathrm{rank}(B)\}$. Let $c$ be the clustering number of $B$.
Theorem 4.1
There is an $n^{O(kc^2)}$ time algorithm for MSim. Here, the $O$-notation hides factors polynomial in the size of the input representation.
In order to prove Theorem 4.1, we define a closely related optimization problem, called Quadratic Vector Partition (QVP). Let $\mathcal{P} = \mathcal{P}(n_1, \dots, n_c)$ be the set of all ordered partitions of $[n]$ into $c$ sets of sizes $n_1, \dots, n_c$. I.e., an element $I = (I_1, \dots, I_c) \in \mathcal{P}$ is an ordered partition of $[n]$, where $|I_j| = n_j$. Given a set of vectors $X = \{x_1, \dots, x_n\} \subseteq \mathbb{R}^d$, we will employ two important notations. Denote by $X_j := \{x_i : i \in I_j\}$ the point set corresponding to $I_j$. Denote $s_j := \sum_{x \in X_j} x$, for $j \in [c]$.
The input instance to QVP is a set of vectors $X = \{x_1, \dots, x_n\} \subseteq \mathbb{R}^d$, along with two matrices $M$ and $W$. The matrix $M$ is a p.s.d. matrix of size $d \times d$. The matrix $W$ is a $c \times c$ diagonal matrix with positive entries $w_1, \dots, w_c$. The objective is to search for a partition $I \in \mathcal{P}$ which maximizes the following quadratic objective function $Q(I) := \sum_{j=1}^{c} w_j \langle s_j, s_j \rangle_M$.
Informally, the goal is to ‘cluster’ the set $X$ into sets of cardinalities $n_1, \dots, n_c$ such that the quadratic function above is maximized. The connection to MSim arises due to the following observation. We can interpret a permutation $\pi$ as a bijection between the rows of $V_A$ and the rows of $V_B$, where $A = V_A \Lambda_A V_A^T$ and $B = V_B \Lambda_B V_B^T$ are the respective spectral decompositions. Since $\|A^{\pi} - B\|_F^2 = \|A\|_F^2 + \|B\|_F^2 - 2\langle A^{\pi}, B \rangle_{\mathrm{tr}}$, we must have $\mathrm{argmin}_{\pi} \|A^{\pi} - B\|_F = \mathrm{argmax}_{\pi} \langle A^{\pi}, B \rangle_{\mathrm{tr}}$, and consequently, it suffices to maximize the trace inner product. Since the set of rows of $V_B$ has only $c$ distinct tuples ($c$ being the clustering number), it suffices to examine the partitions of the rows of $V_A$ into $c$ sets of certain fixed cardinalities. It remains then to show that the minimization of the objective function for MSim can be reformulated as the maximization of the objective function for QVP.
The proof of Theorem 4.1 proceeds in three steps. First, in Section 4.1, we show a reduction from MSim to QVP. In particular, the dimension $d$ and the number of parts for the QVP instance are equal to the rank bound $k$ and the clustering number $c$ in Theorem 4.1, respectively. Second, in Section 4.2, we show that the optimal solutions for a QVP instance have a nice geometrical structure. In particular, the convex hulls of the point sets in the partition are mutually disjoint (up to some caveats). Third, in Section 4.3, we describe an $n^{O(dc^2)}$ time algorithm for QVP. The algorithm essentially enumerates all partitions with the optimal solution structure. This finishes the proof of Theorem 4.1.
4.1 Reduction to QVP
In this subsection, we prove the following reduction lemma. Given two matrices $A, B \in \mathbb{R}^{n \times n}$, let $k := \max\{\mathrm{rank}(A), \mathrm{rank}(B)\}$. Let $c$ be the clustering number of $B$.
Lemma 4.1
Given an MSim instance $(A, B)$, we can compute a QVP instance $(X, M, W)$, where $X \subseteq \mathbb{R}^d$ is of size $n$ with $d \le k$ and the number of parts is $c$, in polynomial time, such that the following holds. Given an optimal solution for the QVP instance $(X, M, W)$, we can compute $\min_{\pi} \|A^{\pi} - B\|_F$ in polynomial time.
4.2 Optimal Structure of QVP
In this section, we show that the optimal solutions for a QVP instance have, in fact, a nice geometrical structure. Let $\mathrm{opt}(X, M, W)$ denote the set of optimal solutions for a QVP instance $(X, M, W)$, where $X \subseteq \mathbb{R}^d$ is of size $n$. Recall from Section 2 that a partition of $X$ into sets $X_1, \dots, X_c$ is mutually linearly separated if for every $j \neq j'$, there exists a hyperplane which weakly linearly separates $X_j$ and $X_{j'}$.
Lemma 4.2
Let $I = (I_1, \dots, I_c)$ be an optimal partition for a QVP instance $(X, M, W)$. The corresponding partition $X_1, \dots, X_c$ of $X$ is mutually linearly separated.
The proof of Lemma 4.2 proceeds in three steps. Claim 4.1 shows that we can reformulate QVP as a convex programming problem in $\mathbb{R}^{dc}$. Claim 4.2 stipulates certain necessary conditions for optimality in this reformulated version. Using Claim 4.3, we revert back to the original QVP formulation in $\mathbb{R}^d$. This allows us to interpret the optimality conditions of Claim 4.2 as the mutually-linearly-separated property of Lemma 4.2.
Given a partition $I = (I_1, \dots, I_c)$ of $[n]$, let $S(I)$ be the vector of length $dc$ corresponding to the coordinates of the vectors $s_1, \dots, s_c$. Formally, $S(I)$ denotes the vector $(s_1, \dots, s_c) \in \mathbb{R}^{dc}$. Recall that $W$ is a diagonal matrix with positive entries, say $w_1, \dots, w_c$. The following claim shows that we can describe our problem as a convex programming problem in $\mathbb{R}^{dc}$. The objective function is a sum of (squared) vector norms.
Claim 4.1
The function $F : \mathbb{R}^{dc} \to \mathbb{R}$ defined by $F(y_1, \dots, y_c) := \sum_{j=1}^{c} w_j \langle y_j, y_j \rangle_M$ is convex, and $Q(I) = F(S(I))$ for every partition $I \in \mathcal{P}$.
The proof is deferred to the appendix.
The second step constitutes the key insight of the proof of Lemma 4.2. We show that an optimal solution for the convex program of Claim 4.1 must be an optimal solution for some linear program. The proof of this claim builds on the statements in Subsection 2.2 about the linearization of convex objective functions. Recall that $\mathrm{opt}(X, M, W)$ denotes the set of optimal solutions for the QVP instance $(X, M, W)$.
Claim 4.2
For every $I^* \in \mathrm{opt}(X, M, W)$, there exist vectors $a_1, \dots, a_c \in \mathbb{R}^d$ such that $S(I^*)$ is an optimal solution for the objective function
$$\ell(S(I)) := \sum_{j=1}^{c} a_j \cdot s_j \quad \text{over } \{S(I) : I \in \mathcal{P}\}.$$
Moreover, the set of optimal solutions of $\ell$ is a subset of $\{S(I) : I \in \mathrm{opt}(X, M, W)\}$.
The proof is deferred to the appendix.
Finally, we undo the transformation of Claim 4.1 and revert back to $\mathbb{R}^d$ in the following claim. Consequently, we can reformulate the optimality conditions of Claim 4.2 as follows.
Claim 4.3
For every $I^* \in \mathrm{opt}(X, M, W)$, there exist vectors $a_1, \dots, a_c \in \mathbb{R}^d$ such that $I^*$ is an optimal solution for the objective function
$$\ell(I) := \sum_{j=1}^{c} a_j \cdot s_j \quad \text{over } \mathcal{P}.$$
Moreover, the set of optimal solutions of $\ell$ is a subset of $\mathrm{opt}(X, M, W)$.
The proof is deferred to the appendix.
We finish with the proof of Lemma 4.2.
Proof of Lemma 4.2. Since $I = (I_1, \dots, I_c)$ is an optimal partition for the QVP instance $(X, M, W)$, by Claim 4.3 there exist vectors $a_1, \dots, a_c \in \mathbb{R}^d$ such that $I$ is an optimal solution for the objective function
$$\ell(I) = \sum_{j=1}^{c} a_j \cdot s_j.$$
Recall the notation $X_j = \{x_i : i \in I_j\}$. Suppose there exist $j \neq j'$ such that $X_j$ and $X_{j'}$ are not (weakly) linearly separated. We claim that this is a contradiction. Indeed, we can isolate the terms $a_j \cdot s_j + a_{j'} \cdot s_{j'}$ and rewrite them as $\sum_{x \in X_j} (a_j - a_{j'}) \cdot x + \sum_{x \in X_j \cup X_{j'}} a_{j'} \cdot x$. Now we (weakly) linearly separate the set $X_j \cup X_{j'}$ along the direction $a_j - a_{j'}$, that is, we choose a partition $X_j^* \cup X_{j'}^*$ of $X_j \cup X_{j'}$ with $|X_j^*| = |X_j|$ such that $(a_j - a_{j'}) \cdot x \ge (a_j - a_{j'}) \cdot y$ for all $x \in X_j^*$ and $y \in X_{j'}^*$. Then $\sum_{x \in X_j^*} (a_j - a_{j'}) \cdot x > \sum_{x \in X_j} (a_j - a_{j'}) \cdot x$, because $X_j$ and $X_{j'}$ are not (weakly) linearly separated along $a_j - a_{j'}$, and $\sum_{x \in X_j^* \cup X_{j'}^*} a_{j'} \cdot x = \sum_{x \in X_j \cup X_{j'}} a_{j'} \cdot x$, because $X_j^* \cup X_{j'}^* = X_j \cup X_{j'}$. Hence the modified partition has a strictly larger $\ell$-value, which contradicts the maximality of $\ell(I)$. Therefore, it must be the case that the sets $X_j$ and $X_{j'}$ are already (weakly) linearly separated along a hyperplane normal to $a_j - a_{j'}$.
4.3 Algorithm for QVP
In this subsection, we describe an $n^{O(dc^2)}$ time algorithm for QVP. Along with the reduction stated in Lemma 4.1, this finishes the proof of Theorem 4.1.
We proceed with an informal description of the algorithm. Recall that a QVP instance is $(X, M, W)$, where $X = \{x_1, \dots, x_n\} \subseteq \mathbb{R}^d$. The output is an ordered partition $I = (I_1, \dots, I_c)$ of $[n]$ satisfying $|I_j| = n_j$, for some fixed $n_1, \dots, n_c$. Our strategy is simple: we enumerate all partitions $X_1, \dots, X_c$ of $X$ such that the sets $X_j$ and $X_{j'}$ are weakly linearly separated for every $j \neq j'$. By Lemma 4.2, this suffices to obtain an optimal partition. We briefly describe our algorithm. We first guess the separating hyperplanes $h_{jj'}$, $1 \le j < j' \le c$, where $h_{jj'}$ weakly linearly separates $X_j$ and $X_{j'}$. Let $H$ be the set of hyperplanes defined by subsets of $d$ points of $X$. It is sufficient to pick the $h_{jj'}$ from the set $H$, since any hyperplane can be equivalently replaced by a hyperplane in $H$ without changing the underlying (weak) linear separation. These hyperplanes partition $\mathbb{R}^d$ into convex regions. For every $x \in X$, we check its relative position with respect to these hyperplanes. We assign $x$ to one of the sets $X_1, \dots, X_c$, depending on its relative position. We claim that every weakly linearly separated family of sets can be discovered on some branch of our computation. The choice of hyperplanes implies a branching of $|H|^{\binom{c}{2}} = n^{O(dc^2)}$. Therefore, the overall branching factor is $n^{O(dc^2)}$.
There are two caveats. First, we also pick an orientation for every hyperplane $h_{jj'}$: the orientation indicates that $X_j \subseteq h_{jj'}^+ \cup h_{jj'}$ (and vice versa, $X_{j'} \subseteq h_{jj'}^- \cup h_{jj'}$). Second, there may exist some points which lie on the chosen hyperplanes, and hence their assignments cannot be determined by their relative positions with respect to these hyperplanes. To handle this degeneracy, we use the Simulation-of-Simplicity technique and assume general position. Therefore, there are at most $d \binom{c}{2}$ such ambiguous points. Since this is a bounded number, we can try all possible assignments of such points by brute force. This leads to a branching factor of $c^{O(dc^2)}$. The overall branching factor is still $n^{O(dc^2)}$. We now proceed to give a formal description as Algorithm 4.3.
Algorithm 4.3
Input: $X = \{x_1, \dots, x_n\} \subseteq \mathbb{R}^d$, matrices $M$, $W$.
Output: A partition $I = (I_1, \dots, I_c)$ of $[n]$ where $|I_j| = n_j$ for some fixed $n_1, \dots, n_c$.

1. For every choice of hyperplanes $h_{jj'}$, $1 \le j < j' \le c$, from the set $H$, together with an orientation for each:

(a) Let $Y := \emptyset$.

(b) For every $x \in X$ and $j \in [c]$, check if $x$ belongs to the convex region $R_j$ corresponding to the intersection of the open halfspaces of the hyperplanes $h_{jj'}$ (taken according to the chosen orientations).
We use the Simulation-of-Simplicity subroutine of Section 2.3 to check the relative position of $x$ with respect to the hyperplanes.

(c) If $x$ belongs to some region $R_j$, we assign $x$ to the set $X_j$. Otherwise, we add $x$ to the set $Y$.

(d) For every point $y \in Y$, try each of the $c$ possible assignments to $X_1, \dots, X_c$.

(e) Check if the constraints $|X_j| = n_j$ are satisfied; otherwise, reject this branch of the computation.

2. For every partition computed above, evaluate the QVP objective function $Q$ and output an optimal solution.
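The role of Lemma 4.2 in this enumeration can be illustrated in the simplest setting, $d = 1$ and $c = 2$: a separating hyperplane is just a threshold, so an optimal partition must be an interval of the sorted order, and only the two extreme “threshold” partitions need to be tried. A toy sketch under our own simplifying assumptions ($M$ and $W$ reduced to scalars):

```python
from itertools import combinations

def qvp_1d_two_parts(xs, n1, w1=1.0, w2=1.0, m=1.0):
    """Toy QVP with d = 1, c = 2: maximize w1*m*s1^2 + w2*m*s2^2 over
    partitions of xs into parts of sizes n1 and len(xs) - n1.

    The objective is convex in s1, so it is maximized when s1 is extremal:
    part 1 is either the n1 smallest or the n1 largest points.
    """
    xs = sorted(xs)
    candidates = [(xs[:n1], xs[n1:]), (xs[-n1:], xs[:-n1])]
    value = lambda p: w1 * m * sum(p[0]) ** 2 + w2 * m * sum(p[1]) ** 2
    return max(candidates, key=value)

# Cross-check the two-candidate shortcut against brute force over all
# size-n1 subsets.
xs, n1 = [1, 2, 3, -4, 5], 2
brute = max(sum(c) ** 2 + (sum(xs) - sum(c)) ** 2 for c in combinations(xs, n1))
```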
Claim 4.4
Given a QVP instance, Algorithm 4.3 correctly computes an optimal solution in $n^{O(dc^2)}$ time.
Proof.
We first show the correctness. By Lemma 4.2, it suffices to show that Algorithm 4.3 computes all partitions $X_1, \dots, X_c$ of $X$ such that the family of sets $X_1, \dots, X_c$ is weakly linearly separated. We claim that Algorithm 4.3 discovers every such family of sets in Step 1. Indeed, for such a family, there exist hyperplanes $h_{jj'}$ which weakly linearly separate the sets $X_j$ and $X_{j'}$, for $1 \le j < j' \le c$. By the SoS technique of Section 2.3, we can assume general position for the input set $X$. It can be shown that for every hyperplane $h_{jj'}$, we can equivalently find another hyperplane $h'$ in $H$ with the following property: if $h_{jj'}$ weakly linearly separates $X_j$ and $X_{j'}$, then $h'$ also weakly linearly separates $X_j$ and $X_{j'}$. (Refer to Claim A.1 in the appendix.) Therefore, there exists a branch of the algorithm in Step 1 such that we discover the hyperplanes $h_{jj'}$ (or equivalent replacements from $H$). Steps 1(b)–(d) ensure that we recover the partition $X_1, \dots, X_c$.
The running time can be bounded as follows. The branching in Step 1 is bounded by