An exponential lower bound for Individualization-Refinement algorithms for Graph Isomorphism
The individualization-refinement paradigm provides a strong toolbox for testing isomorphism of two graphs and indeed, the currently fastest implementations of isomorphism solvers all follow this approach. While these solvers are fast in practice, from a theoretical point of view, no general lower bounds concerning the worst case complexity of these tools are known. In fact, it is an open question whether individualization-refinement algorithms can achieve upper bounds on the running time similar to the more theoretical techniques based on a group theoretic approach.
In this work we give a negative answer to this question and construct a family of graphs on which algorithms based on the individualization-refinement paradigm require exponential time. In contrast to a previous construction of Miyazaki, which only applies to a specific implementation within the individualization-refinement framework, our construction is immune to changing the cell selector or adding various heuristic invariants to the algorithm. Furthermore, our graphs also provide exponential lower bounds in the case when the $k$-dimensional Weisfeiler-Leman algorithm is used to replace the standard color refinement operator, and the arguments even work when the entire automorphism group of the inputs is initially provided to the algorithm.
The individualization-refinement paradigm provides a strong toolbox for testing isomorphism of two graphs. To date, algorithms that implement the individualization-refinement paradigm constitute the fastest practical algorithms for the graph isomorphism problem and for the task of canonically labeling combinatorial objects.
Originally exploited by McKay’s software package nauty [MR635936] as early as 1981, the basic principle is, in a nutshell, to classify vertices using a refinement operator according to an isomorphism-invariant property. In a basic form one usually uses the so-called color refinement operator, also called the 1-dimensional Weisfeiler-Leman algorithm, for this purpose. Whenever the refinement is not sufficient, vertices within a selected color class (usually called a cell) are individualized one by one in a backtracking manner so as to artificially distinguish them from other vertices. This yields a backtracking tree that is traversed to explore the structure of the input graphs. Additional pruning with the use of invariants and the exploitation of automorphisms of the graphs makes the approach viable in practice, leading to the fastest isomorphism solvers currently available. The use of invariants also allows us to define a smallest leaf, which can be used to canonically label the graph, i.e., to rearrange the vertices in a canonical fashion so as to obtain a standard copy of the graph.
There are several highly efficient isomorphism software packages implementing the paradigm. Among them are nauty/traces [mckay], bliss [bliss], conauto [conauto] and saucy [saucy]. While they all follow the basic individualization-refinement paradigm, these algorithms differ drastically in design principles and algorithmic realization. In particular, they differ in the way the search tree is traversed, they use different low level subroutines, have diverse ways to perform tasks such as automorphism detection, and they use different cell selection strategies as well as vertex invariants and refinement operators.
With Babai’s [DBLP:conf/stoc/Babai16] recent quasi-polynomial time algorithm for the graph isomorphism problem, the theoretical worst case complexity of algorithms for the graph isomorphism problem was drastically improved from a previous best $e^{O(\sqrt{n \log n})}$ (see [DBLP:conf/stoc/BabaiL83]) to $n^{O((\log n)^{c})}$ for some constant $c$. As an open question, Babai asks [DBLP:conf/stoc/Babai16] for the worst case complexity of algorithms based on individualization-refinement techniques. About this worst case complexity, very little had been known.
In 1995 Miyazaki [DBLP:conf/dimacs/Miyazaki95] constructed a family of graphs on which the then current implementation of nauty has exponential running time. For this purpose these graphs are designed to specifically fool the cell selection process into exponential behavior. However, as Miyazaki also argues, with a different cell selection strategy the examples can be solved in polynomial time within the individualization-refinement paradigm.
In this paper we provide general lower bounds for individualization-refinement algorithms with arbitrary combinations of cell selection, refinement operators, invariants and even given perfect automorphism pruning. More precisely, the graphs we provide yield an exponential size search tree (i.e., $2^{\Omega(n)}$ nodes) for any combination of refinement operator, invariants, and cell selector which are not stronger than the $k$-dimensional Weisfeiler-Leman algorithm for some fixed dimension $k$. The natural class of algorithms for which we thus obtain lower bounds encompasses all software packages mentioned above, even with various combinations of switches that can be turned on and off in the execution of the algorithm to tune the algorithms towards specific input graphs. Our graphs are asymmetric, i.e., have no non-trivial automorphisms, and thus no strategy for automorphism detection can help the algorithm to circumvent the exponential lower bound.
Our construction makes use of a construction of Cai-Fürer-Immerman [cfi] and the multipede construction of Gurevich and Shelah [DBLP:journals/jsyml/GurevichS96] that yields for every dimension $k$ non-isomorphic finite rigid structures that are not distinguishable by the $k$-dimensional Weisfeiler-Leman algorithm. In more detail, our construction starts with a bipartite base graph that is obtained by a simple random process. With high probability such a graph has strong expansion properties ensuring a variant of the meagerness property of [DBLP:journals/jsyml/GurevichS96] suitable for our purposes. Additionally, with high probability the graph has an almost-disjointness property for neighborhoods of vertices from one bipartition class. To the base graph we apply a bipartite variant of the construction of [cfi]. By individualizing a small fraction of vertices, we can guarantee that the final graphs are rigid (have no non-trivial automorphisms). For our theoretical analysis, we define a closure operator that gives us control over the effect of the Weisfeiler-Leman algorithm on the graphs. Due to the disjointness property this effect is limited. Exploiting automorphisms of subgraphs of the input, we then proceed to argue that there is an exponential number of colorings of the graph that cannot be distinguished. These statements can be combined to show that the search tree of every algorithm within the individualization-refinement framework has exponential size.
Some of the packages above have a mechanism called component recursion (see [DBLP:conf/tapas/JunttilaK11]). We show that even this strategy cannot yield improvements for our examples. We should point out that component recursion was used in Goldberg’s result [Goldberg1983229] which shows that with the right cell selection strategy and the use of component recursion (in that paper called sections) individualization-refinement algorithms have exponential upper bounds, matching our lower bounds.
We also should remark that, seen as colored graphs, our graphs have bounded color class size and as such isomorphism of the graphs can be decided in polynomial time using simple group theoretic techniques (see [BabaiRandom, DBLP:conf/focs/FurstHL80]).
Since the software packages that follow the individualization-refinement paradigm are designed for practical purposes rather than to obtain theoretical worst case guarantees, the question arises how meaningful the lower bounds provided in the paper are. However, in separate work [benchmark-paper], we investigate practical benchmark graphs. It turns out that constructions related to the ones discussed in this paper in fact yield graphs which, experimentally, pose by far the most challenging graph isomorphism instances available to date.
A graph is a pair with vertex set and edge relation . In this paper all graphs are finite simple, undirected graphs. The neighborhood of is denoted . For a set let .
An isomorphism from a graph $G = (V,E)$ to another graph $G' = (V',E')$ is a bijective mapping $\varphi\colon V \to V'$ which preserves the edge relation, that is $\{v,w\} \in E$ if and only if $\{\varphi(v),\varphi(w)\} \in E'$ for all $v,w \in V$. Two graphs $G$ and $G'$ are isomorphic ($G \cong G'$) if there is an isomorphism from $G$ to $G'$. We write $\varphi\colon G \cong G'$ to indicate that $\varphi$ is an isomorphism from $G$ to $G'$. The isomorphism type of a graph $G$ is the class of graphs isomorphic to $G$. An automorphism of a graph $G$ is an isomorphism from $G$ to itself. By $\mathrm{Aut}(G)$ we denote the group of automorphisms of $G$. A graph $G$ is rigid (or asymmetric) if its automorphism group is trivial, that is, the only automorphism of $G$ is the identity map.
A vertex coloring of a graph is a map into some set of colors . Mostly, we will use vertex colorings into the natural numbers .
Isomorphisms between two colored graphs and are required to preserve vertex colors. Slightly abusing notation we will sometimes not differentiate between and , if the coloring is apparent from context.
We will also consider vertex colored graphs with a distinguished sequence of not necessarily distinct vertices , where for some . For a tuple we let be the length of the tuple. Two such graphs with distinguished sequences and are isomorphic if and there is an isomorphism from to preserving vertex colors and satisfying for all . In analogy to the definition above we write .
2.2 The Weisfeiler-Leman algorithm
The -dimensional Weisfeiler-Leman algorithm is a procedure that, given a graph and a coloring of the -tuples of the vertices, computes an isomorphism-invariant refinement of the coloring. Let be colorings of the -tuples of vertices of , where is some set of colors. We say refines () if for all we have
Let be a colored graph (where is a coloring of the vertices) and let be some integer. We set to be the coloring, where each -tuple is colored by the isomorphism type of its underlying induced ordered subgraph. More precisely, we define in such a way that if and only if for all it holds that and for all we have and . For we recursively define for the coloring by setting , where is the multiset defined as
For the definition is analogous but the multiset is defined as i.e., iterating only over neighbors of .
By definition, every coloring induces a refinement of the partition of the -tuples of the graph with coloring . Thus, there is some minimal such that the partition induced by the coloring is not strictly finer than the one induced by the coloring on . For this minimal , we call the coloring the stable coloring of and denote it by .
For , the -dimensional Weisfeiler-Leman algorithm takes as input a colored graph and returns the colored graph . For two colored graphs and , we say that the -dimensional Weisfeiler-Leman algorithm distinguishes and with respect to the initial colorings and if there is some color such that the sets and have different cardinalities. We write if the -dimensional Weisfeiler-Leman algorithm does not distinguish between and .
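The 1-dimensional case (color refinement) can be illustrated with a short sketch. The code below is only an illustration under our own assumptions: graphs are given as adjacency dictionaries, colors are integers, and the names `color_refinement` and `distinguishes` are hypothetical. To compare two graphs we refine their disjoint union, so that color names are consistent across both graphs, and then compare the number of vertices per color as in the definition above.

```python
from collections import Counter

def color_refinement(adj, coloring):
    """Return a stable integer coloring refining `coloring`.

    adj: dict vertex -> set of neighbours; coloring: dict vertex -> int.
    """
    colors = dict(coloring)
    while True:
        # New colour: old colour together with the multiset of neighbour colours.
        signature = {
            v: (colors[v], tuple(sorted(Counter(colors[u] for u in adj[v]).items())))
            for v in adj
        }
        # Rename signatures in sorted order so that names do not depend on vertex names.
        palette = {sig: i for i, sig in enumerate(sorted(set(signature.values())))}
        refined = {v: palette[signature[v]] for v in adj}
        # Each round refines the previous partition, so an unchanged number
        # of classes means the partition is stable.
        if len(set(refined.values())) == len(set(colors.values())):
            return refined
        colors = refined

def distinguishes(adj1, adj2):
    """True if colour refinement yields different colour histograms (uniform start)."""
    union = {(1, v): {(1, u) for u in nb} for v, nb in adj1.items()}
    union.update({(2, v): {(2, u) for u in nb} for v, nb in adj2.items()})
    stable = color_refinement(union, {v: 0 for v in union})
    hist1 = Counter(c for (side, _), c in stable.items() if side == 1)
    hist2 = Counter(c for (side, _), c in stable.items() if side == 2)
    return hist1 != hist2

cycle6 = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
path3 = {0: {1}, 1: {0, 2}, 2: {1}}
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
```

In this sketch the 6-cycle and the disjoint union of two triangles are not distinguished, since both are 2-regular, whereas the path on three vertices and the triangle are distinguished already by vertex degrees.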
We extend the definition to vertex-colored graphs with distinguished vertices. Let and be graphs with vertex colorings and and let and be sequences of vertices. Define as the coloring given by . We call this the coloring obtained from by individualizing . Similarly we define as . Then we say that is not distinguished from , in symbols , if and are not distinguished by the -dimensional Weisfeiler-Leman algorithm with respect to the initial colorings and . We denote by the vertex coloring that is induced by the stable coloring with respect to the initial coloring , that is, where is the stable coloring of the -tuples with respect to the initial coloring .
There is a close connection between the Weisfeiler-Leman algorithm and fixed-point logic with counting. In fact the stable coloring computed by $k$-dimensional Weisfeiler-Leman comprehensively captures the information that can be obtained in fixed-point logic with counting using at most $k+1$ variables. We refer to [cfi, IL90] for more details.
We will not require details about the information computed by the Weisfeiler-Leman algorithm and rather use the following pebble game that is known to capture the same information. Let be a fixed number. For graphs on the same number of vertices and with vertex colorings and , respectively, we define the bijective -pebble game on and as follows:
The game has two players called Spoiler and Duplicator.
The game proceeds in rounds. Each round is associated with a pair of positions with and .
The initial position of the game is .
Each round consists of the following steps. Suppose the current position of the game is .
Spoiler chooses some .
Duplicator picks a bijection .
Spoiler chooses and sets .
The new position is then the pair consisting of and .
Spoiler wins the game if for the current position the induced graphs are not isomorphic. More precisely, Spoiler wins if there is an such that or or there are such that or . If the play never ends Duplicator wins.
We say that Spoiler (respectively Duplicator) wins the bijective -pebble game if Spoiler (respectively Duplicator) has a winning strategy for the game.
Theorem 2.1 (cf. [cfi, IL90]).
Let be two graphs. Then if and only if Duplicator wins the pebble game .
3 Individualization-refinement algorithms
An extensive description of the paradigm of individualization-refinement algorithms is given in [mckay]. These algorithms capture information about the structure of a graph by coloring the vertices. An initially uniform coloring is first refined in an isomorphism-invariant manner as follows.
A refinement operator is an isomorphism-invariant function that takes a graph , a coloring and a sequence and outputs a coloring such that has a unique color for every . In this context isomorphism-invariant means that implies . A typical choice for such a refinement would be the 1-dimensional Weisfeiler-Leman algorithm described above, where the vertices in are artificially given special colors.
A vertex with a unique color is called a singleton and a coloring is called discrete if all vertices are singletons. Due to the isomorphism-invariance, every isomorphism must preserve the refined colors. Thus, in case the refinement operator produces a discrete coloring on a graph , it is trivial to check whether this graph is isomorphic to another graph . Indeed, the refinement of must also be discrete and there is at most one color preserving bijection between the vertex sets which can be trivially checked for being an isomorphism. However, if the coloring of is not discrete we need to do more work. In this case we select a color class, usually called a cell, and then individualize a single vertex from the class. Here individualization means to refine the coloring by making the vertex a singleton. Since such an operation is not necessarily isomorphism-invariant, we branch over all choices of this vertex within the chosen cell. To the coloring with the newly individualized vertex, we apply the refinement operator again and proceed in a recursive fashion. To explain this in more detail we first need to clarify how the cell is chosen.
Let be a graph and be a coloring of the vertices. A cell selector is an isomorphism-invariant function which takes as input a graph and a coloring and either outputs with if such a color exists or otherwise. In this context isomorphism-invariant means that implies . The performance of an individualization-refinement algorithm can drastically depend on the cell selection strategy. A typical strategy would be to take the first class of smallest size.
Let be a graph with an initial coloring . Let be a cell selector and a refinement operator. Inductively define the search tree as follows. The root of the tree is labeled with the empty sequence . Let be a node of the search tree. Let be the coloring computed by the refinement operator for the current sequence and let be the color selected by the cell selector. If then is a leaf of the search tree and the coloring is discrete. Otherwise, for each , there is a child node labeled with . The vertices of the search tree are referred to as nodes and we identify them with the sequence of vertices they are labeled with.
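The inductive definition of the search tree can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the implementation of any of the packages mentioned above: the refinement operator is color refinement with the individualized vertices given special colors, the cell selector picks a smallest non-singleton cell (ties broken by color name), and the names `refine`, `select_cell` and `search_tree` are hypothetical. No pruning by invariants or automorphisms is performed.

```python
from collections import Counter

def refine(adj, coloring, individualized):
    """Colour refinement after individualizing the given vertex sequence.

    Assumes the initial colours are non-negative integers, so the negative
    colours used for individualized vertices are unique.
    """
    colors = dict(coloring)
    for i, v in enumerate(individualized):
        colors[v] = -(i + 1)
    while True:
        signature = {
            v: (colors[v], tuple(sorted(Counter(colors[u] for u in adj[v]).items())))
            for v in adj
        }
        palette = {sig: i for i, sig in enumerate(sorted(set(signature.values())))}
        refined = {v: palette[signature[v]] for v in adj}
        if len(set(refined.values())) == len(set(colors.values())):
            return refined
        colors = refined

def select_cell(colors):
    """Return a smallest non-singleton cell, or None if the colouring is discrete."""
    cells = {}
    for v, c in colors.items():
        cells.setdefault(c, set()).add(v)
    candidates = [(len(cell), c) for c, cell in cells.items() if len(cell) > 1]
    if not candidates:
        return None
    return cells[min(candidates)[1]]

def search_tree(adj, coloring, node=()):
    """List all nodes (vertex sequences) of the unpruned search tree below `node`."""
    cell = select_cell(refine(adj, coloring, node))
    nodes = [node]
    if cell is not None:
        for v in sorted(cell):
            nodes += search_tree(adj, coloring, node + (v,))
    return nodes

cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
nodes = search_tree(cycle4, {v: 0 for v in cycle4})
```

For the 4-cycle with a uniform initial coloring this sketch produces a tree with 13 nodes, of which 8 are leaves, one for each automorphism of the cycle.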
Pruning with invariants.
Together a cell selector and a refinement operator are sufficient to build a correct isomorphism test. Indeed, two graphs are isomorphic if and only if they have isomorphic leaves in their search tree. For these leaves, due to having a discrete coloring, isomorphism is trivial to check. However, there are two further ingredients that are crucial for the efficiency of practical individualization-refinement algorithms. These are the use of node invariants and the exploitation of automorphisms. Let be a totally ordered set. A node invariant is an isomorphism-invariant function that takes a graph , a coloring and a sequence and outputs an element such that for all vertex sequences of equal length
(i) if then it also holds for all that and
(ii) if and are discrete and then .
Here isomorphism-invariant means that implies .
Let be a node invariant and define
Finally, define the search tree as the subtree induced by the node set . Observe that Property (i) implies that is indeed a tree. By using the invariant we thus cut off the parts of the search tree that do not contain nodes that are minimal among all nodes on their level. However, due to isomorphism invariance, the property that two graphs are isomorphic if and only if they have isomorphic leaves remains.
The use of an invariant also makes it easy to define a canonical labeling using a leaf for which the invariant is smallest. For the purpose of obtaining our lower bounds we will not require detailed information on the concept of a canonical labeling and rather refer to [mckay].
Pruning with automorphisms.
The second essential ingredient needed for the practicality of individualization-refinement algorithms is the exploitation of automorphisms. Indeed, if for two nodes in , labeled with and , respectively, we have then it is sufficient to explore only one of the subtrees corresponding to the two nodes (we refer to [mckay] for correctness arguments). Thus automorphisms that are detected by the algorithm can be used to cut off further parts of the search tree. An efficient strategy for the detection of automorphisms is an essential part of individualization-refinement algorithms and here the various packages differ drastically (see [mckay]). Making our lower bounds only stronger, in this paper we take the following standpoint. We will assume that all automorphisms of the input graph are provided to the algorithm in the beginning at no cost. In fact the following lower bound will be sufficient for our purposes.
Proposition 3.1.
The running time of an individualization-refinement algorithm with cell selector , refinement operator and invariant on a graph is bounded from below by .
The program nauty has an extensive selection of refinement operators/invariants that can be activated via switches. To name a couple, there are various options to count for each vertex the number of vertices reachable by paths with vertex colors of a certain type, options to count substructures such as triangles, quadrangles, cliques up to size 10, independent sets up to size 10, and even options to count the number of Fano planes (the projective plane with 7 points and 7 lines). Finally, the user can implement their own invariant via a provided interface.
Our goal is to make a comprehensive statement about individualization-refinement algorithms independent of the choices for , , and . However, there is an intrinsic limitation here. For example a complete invariant that can distinguish any two non-isomorphic graphs would yield a polynomial-size search tree. The same holds for a refinement operator that refines every coloring into the orbit partition under the automorphism group. However, we do not know how to compute these two examples efficiently. In fact computing either of these is at least as hard as the isomorphism problem itself.
Of course it is nonsensical to allow that an individualization-refinement algorithm uses a subroutine that already solves the graph isomorphism problem. It becomes apparent that there is a limitation to the operators we can allow. However, within this limitation we strive to be as general as possible. With this in mind, throughout this paper we require that the information computed by refinement operators, invariants and cell selectors can be captured by a fixed dimension of the Weisfeiler-Leman algorithm. This is the case for all available choices in all the practical algorithms. In what follows, we describe the requirement more formally.
We say a cell selector is -realizable if whenever . Similarly a node invariant is -realizable if whenever . Intuitively this means that whenever the -dimensional Weisfeiler-Leman algorithm cannot distinguish between the graphs associated with two nodes of the refinement tree then the cell selector and the node invariant have to behave in the same way on both nodes. Finally a refinement operator is -realizable if holds for all triples .
We want to stress the fact that all the operators used in all practical implementations (e.g. nauty/traces, bliss, conauto, etc.) are $k$-realizable for some small constant $k$. In fact, from a theoretical point of view it would always be better to directly use the Weisfeiler-Leman algorithm as a refinement operator, since it is polynomial-time computable. Let us remark that the only reason why the individualization-refinement algorithms do not do this is the excessive running time and space consumption.
Based on these definitions we can now formulate our main result, which implies an exponential lower bound for individualization-refinement algorithms within the framework.
For every constant there is a family of rigid graphs with such that for every -realizable cell selector , every -realizable refinement operator , and every -realizable node invariant it holds that
Together with Proposition 3.1 this implies exponential lower bounds on the running time of individualization-refinement algorithms.
For the theorem we construct graphs which have large search trees. To prove that these search trees are large, we use the following lemma stating that for every two tuples for which holds, either both tuples are nodes in the search tree or neither of them is.
Suppose and let be a -realizable cell selector, a -realizable node invariant and a -realizable refinement operator. Furthermore, let be a graph and suppose . Let . Then for every with .
We prove the statement by induction on . For the statement trivially holds since . So suppose . Let be the tuple obtained from by deleting the last entry and similarly define . Clearly, and . So by induction hypothesis it follows that . Let and . Because and are -realizable we get that . So and also since . Thus, . Furthermore, which implies that . ∎
In the light of the lemma, to prove our lower bound it suffices to construct a graph whose search tree has a node with an exponential number of equivalent tuples. To argue the existence of these we roughly proceed in two steps. First, we show that to obtain a discrete partition we have to individualize a linear number of vertices; in other words, we show that the search tree is of linear height. Thus, there is a node in the corresponding search tree such that is linear in the number of vertices of . Then, in a second step, we show that if is sufficiently large, there are exponentially many equivalent tuples. To find such equivalent tuples we prove a limitation of the effect of the -dimensional Weisfeiler-Leman algorithm after individualizing the vertices from . Intuitively we identify a subgraph containing which encapsulates the effect of the Weisfeiler-Leman algorithm. We then exploit the existence of many automorphisms of this subgraph to demonstrate the existence of many equivalent tuples.
4 The multipede construction
We describe a construction of a graph from a bipartite base graph . The construction is a combination of the Cai-Fürer-Immerman construction [cfi] giving graphs with large Weisfeiler-Leman dimension and the construction of Gurevich and Shelah of multipedes [DBLP:journals/jsyml/GurevichS96] that yields rigid structures with such properties.
The Cai-Fürer-Immerman gadget.
For a non-empty finite set we define the CFI gadget to be the following graph. For each there are vertices and and for every with even there is a vertex . For every with even there are edges for all and for all . As an example the graph is depicted in Figure 4.1. The graph is colored so that forms a color class for each and so that forms a color class.
Let and . We say that swaps exactly the pairs of if for and for .
Lemma 4.1 ([cfi]).
Let . Then there is an automorphism swapping exactly the pairs of if and only if is even. Additionally, if such an automorphism exists, it is unique.
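Lemma 4.1 can be checked by brute force on a small gadget. The following sketch assumes the standard reading of the definition above: outer vertices $a_w, b_w$ for each element $w$, one middle vertex for each even-size subset $A$, and edges from the middle vertex of $A$ to $a_w$ for $w \in A$ and to $b_w$ for $w \notin A$. The function names and the tuple encoding of vertices are our own choices for illustration.

```python
from itertools import combinations, permutations

def cfi_gadget(W):
    """Return (W, middles, edges) for the CFI gadget on the set W."""
    W = sorted(W)
    # One middle vertex per subset of W of even size.
    middles = [frozenset(A) for r in range(0, len(W) + 1, 2) for A in combinations(W, r)]
    edges = set()
    for A in middles:
        for w in W:
            outer = ("a", w) if w in A else ("b", w)
            edges.add(frozenset({outer, ("m", A)}))
    return W, middles, edges

def automorphisms_swapping(W, middles, edges, S):
    """Count colour-preserving automorphisms swapping a_w, b_w exactly for w in S."""
    def outer_image(x):
        kind, w = x
        if w in S:
            kind = "b" if kind == "a" else "a"
        return (kind, w)

    count = 0
    # Try every permutation of the middle vertices (only feasible for tiny W).
    for perm in permutations(middles):
        middle_image = dict(zip(middles, perm))

        def image(x):
            return ("m", middle_image[x[1]]) if x[0] == "m" else outer_image(x)

        # A vertex bijection mapping every edge to an edge is an automorphism.
        if all(frozenset(image(x) for x in e) in edges for e in edges):
            count += 1
    return count

W, middles, edges = cfi_gadget({1, 2, 3})
counts = {S: automorphisms_swapping(W, middles, edges, set(S))
          for r in range(len(W) + 1) for S in combinations(W, r)}
```

For a three-element set, `counts` contains exactly one automorphism for each subset of even size and none for subsets of odd size, in line with the existence and uniqueness statements of the lemma.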
The multipede graphs.
Let be a bipartite graph. We define the multipede graph construction as follows. We replace every by two vertices and . For each let and for define . We then replace every by the CFI gadget and identify the vertices and with and for all . The middle vertices will be denoted by for with even and the set of all middle vertices of is denoted by . For we define . We also define a vertex coloring for the graphs, however we only specify the color classes, since the actual names of the colors will be irrelevant to us. In this manner, we require that for each , the pair of vertices forms a color class, and for each the set of corresponding middle vertices forms a color class. The resulting (vertex colored) graph will be denoted by . An example of this construction is shown in Figure 4.2.
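The construction can be made concrete with a small sketch. It assumes the base graph is given as two vertex lists `V`, `W` and a set `E` of pairs `(v, w)`, and it encodes the vertices of the multipede graph as tuples; these encodings and the name `multipede` are our own illustrative choices.

```python
from itertools import combinations

def multipede(V, W, E):
    """Build the multipede graph of a bipartite base graph.

    Returns (adj, color_classes): adjacency sets and the list of colour classes.
    """
    nbrs = {w: sorted(v for v in V if (v, w) in E) for w in W}
    # Every v in V is replaced by the pair a(v), b(v), which forms a colour class.
    adj = {(side, v): set() for v in V for side in "ab"}
    color_classes = [{("a", v), ("b", v)} for v in V]
    for w in W:
        cell = set()
        # Every w in W is replaced by a CFI gadget on N(w): one middle vertex
        # for each even-size subset A of N(w).
        for r in range(0, len(nbrs[w]) + 1, 2):
            for A in combinations(nbrs[w], r):
                m = ("m", w, frozenset(A))
                adj[m] = set()
                cell.add(m)
                for v in nbrs[w]:
                    outer = ("a", v) if v in A else ("b", v)
                    adj[m].add(outer)
                    adj[outer].add(m)
        # The middle vertices of w form one colour class.
        color_classes.append(cell)
    return adj, color_classes

V = [0, 1, 2]
W = ["x", "y"]
E = {(0, "x"), (1, "x"), (2, "x"), (0, "y"), (1, "y")}
adj, color_classes = multipede(V, W, E)
```

In this example the vertex `"x"` of degree 3 contributes 4 middle vertices and `"y"` of degree 2 contributes 2, so the resulting graph has 12 vertices in 5 color classes.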
For we further define the graph similar to but refine the coloring so that for each both and form a color class. Hence, .
Since we are aiming to construct rigid graphs we start by identifying properties of that correspond to having few automorphisms.
Let be a bipartite graph. We say is odd if for every there exists some such that is odd.
Let be an odd bipartite graph. Then is rigid for every .
Let be an automorphism. Due to the coloring of the vertices, we know stabilizes every set for and every set for .
Let . Suppose towards a contradiction that . Since is odd there is some such that is odd. Then restricts to an automorphism of the gadget swapping an odd number of the outer pairs. This contradicts the properties of CFI gadgets (cf. Lemma 4.1).
So and thus for all . From this it easily follows that is the identity mapping. ∎
For a bipartite graph let be the matrix with if and only if . We denote by the transpose and by the -rank of a matrix .
Let be a bipartite graph. Then .
For a matrix we define the set . To show the lemma it suffices to argue that .
For we define the vector by setting if and only if . Observe that the mapping is injective. Furthermore since for each the automorphism swaps an even number of neighbors of .
For the backward direction let . Then, for each , the set has even cardinality. Thus, by the properties of the CFI-gadgets (cf. Lemma 4.1), there is a unique automorphism that swaps exactly those pairs for which . ∎
The arguments show that a bipartite graph is odd if and only if .
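The rank criterion is easy to test on small base graphs. The sketch below rests on our reading of the definitions above, which should be treated as an assumption: a vector over the two-element field describes which pairs are swapped, it is admissible if every vertex on the other side sees an even number of swapped neighbors, and the graph is odd exactly when the zero vector is the only admissible one. Rows of the matrix are encoded as integer bitmasks; all function names are hypothetical.

```python
def neighbourhood_masks(V, W, E):
    """One bitmask over V per vertex w in W, marking the neighbours of w."""
    index = {v: i for i, v in enumerate(V)}
    return [sum(1 << index[v] for v in V if (v, w) in E) for w in W]

def gf2_rank(rows):
    """Rank over the two-element field of a matrix given as row bitmasks."""
    basis = {}  # leading-bit position -> reduced row
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in basis:
                basis[lead] = row
                break
            row ^= basis[lead]  # eliminate the leading bit
    return len(basis)

def kernel_size(V, W, E):
    """Brute-force count of vectors x with an even number of marked neighbours at every w."""
    masks = neighbourhood_masks(V, W, E)
    return sum(
        all(bin(m & x).count("1") % 2 == 0 for m in masks)
        for x in range(1 << len(V))
    )

def is_odd(V, W, E):
    """True if only the zero vector is admissible."""
    return kernel_size(V, W, E) == 1

V = [0, 1, 2]
E_odd = {(0, "x"), (1, "x"), (1, "y"), (2, "y"), (2, "z")}
E_even = {(0, "x"), (1, "x"), (1, "y"), (2, "y")}
```

With `W = ["x", "y", "z"]` and `E_odd` the rank equals `len(V)` and the graph is odd. With `W = ["x", "y"]` and `E_even` the rank drops to 2 and the all-ones vector is admissible, so there are $2^{3-2} = 2$ admissible vectors.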
Let be an odd bipartite graph. Then there is some with such that the induced subgraph is odd.
We can also use to quantify how many vertices must be individualized to make rigid.
Let be a bipartite graph. Then there is with such that is rigid.
Let be the standard basis for (that is, if and only if ). Furthermore, for , let be the -th row of and let be a minimal subset of such that spans the entire space . Finally, let . Clearly, .
We argue that is rigid. Let and let be the vector obtained by setting if and only if . Then for all . Furthermore for all by the same argument as in the proof of Lemma 4.4. Since spans the entire space it follows by the standard linear algebra arguments that . Thus is the identity. ∎
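The choice of the set in the proof is constructive. As a hedged sketch, the following code greedily extends the rows of the matrix by standard basis vectors until they span the whole space and returns the corresponding vertices. The bitmask encoding and the name `individualization_set` are our own assumptions, and the greedy order is just one of several valid choices of a minimal set.

```python
def individualization_set(V, W, E):
    """Vertices whose unit vectors extend the rows of the matrix to a spanning set."""
    index = {v: i for i, v in enumerate(V)}
    basis = {}  # leading-bit position -> reduced row over the two-element field

    def insert(row):
        """Add row to the basis; return True if it was independent of the span so far."""
        while row:
            lead = row.bit_length() - 1
            if lead not in basis:
                basis[lead] = row
                return True
            row ^= basis[lead]
        return False

    # First insert one row per vertex w in W (the neighbourhood of w).
    for w in W:
        insert(sum(1 << index[v] for v in V if (v, w) in E))
    # Then keep exactly those unit vectors that enlarge the span.
    return [v for v in V if insert(1 << index[v])]

V = [0, 1, 2]
W = ["x", "y"]
E = {(0, "x"), (1, "x"), (1, "y"), (2, "y")}
X = individualization_set(V, W, E)
```

Here the two rows have rank 2 over the two-element field, so a single vertex suffices, matching the count of $|V|$ minus the rank used in the proof.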
5 The Weisfeiler-Leman refinement and the $k$-closure
We wish to understand the effect of the Weisfeiler-Leman refinement on graphs obtained with the construction from the previous section. To this end we define the $k$-closure.
Let . For define the -attractor of as
A set is -closed if . The -closure of is the unique minimal superset which is -closed, that is
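Since the closure is the minimal closed superset, it can be computed by iterating the attractor until nothing changes. The sketch below has to assume a concrete attractor rule, because the displayed formula is not reproduced here: we assume that the $k$-attractor of $X$ adds the whole neighborhood of every $w$ that has at most $k$ neighbors outside $X$. This rule, as well as the names `attractor` and `closure`, should be read as assumptions for illustration only.

```python
def attractor(V, W, E, X, k):
    """Assumed k-attractor: X plus N(w) for every w with at most k neighbours outside X."""
    X = set(X)
    result = set(X)
    for w in W:
        nb = {v for v in V if (v, w) in E}
        if len(nb - X) <= k:
            result |= nb
    return result

def closure(V, W, E, X, k):
    """Smallest superset of X that is a fixed point of the attractor."""
    X = set(X)
    while True:
        nxt = attractor(V, W, E, X, k)
        if nxt == X:
            return X
        X = nxt

V = [0, 1, 2, 3]
W = ["x", "y", "z"]
E = {(0, "x"), (1, "x"), (1, "y"), (2, "y"), (2, "z"), (3, "z")}
```

Under the assumed rule, on this path-like base graph the 1-closure of `{0}` grows step by step to all of `V`, while the 1-closure of the empty set stays empty. Iterating to a fixed point yields the minimal closed superset because the attractor is monotone.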
As observed in [DBLP:journals/jsyml/GurevichS96] the 1-closure describes exactly the information the 1-dimensional Weisfeiler-Leman captures.
Let be a bipartite graph and suppose . Then
for all .
The backward direction follows by an inductive argument from the properties of the CFI gadgets. For the forward direction it is easy to check that the corresponding partition on directly extends to a stable partition for the graph . ∎
Let be a bipartite graph. Slightly abusing notation, for a set define . For define (for a graph and a set the graph is the induced subgraph of with vertex set ).
In this work we essentially argue that the previous lemma can be used to show that for a 1-closed set and a sequence of vertices from , for every automorphism we have . The 1-closure thus gives us a method to find tuples that cannot be distinguished by the 1-dimensional Weisfeiler-Leman algorithm. However, we require such a statement also for higher dimensions. Obtaining a similar statement characterizing the effect of the $k$-dimensional Weisfeiler-Leman algorithm seems to be much more intricate and it is easy to see that the $k$-closure does not achieve this. However, under some additional assumptions, we show that the forward direction of the previous lemma still holds and thus the $k$-closure gives us a tool to control the effect of the $k$-dimensional Weisfeiler-Leman algorithm which is sufficient for our purposes.
Let and suppose . Let be a bipartite graph and be a -closed set. Furthermore suppose that for distinct we have . Let be a sequence of vertices with and let . Then .
We prove that Duplicator has a winning strategy in the bijective -pebble game played on and . Towards this end we say a vertex (respectively ) is pebbled if there exists (respectively ) which is pebbled. Furthermore we say that a vertex is fixed if there is some pebbled with . For a tuple of length at most of pebbled vertices let
Now let , and suppose there is an isomorphism from to mapping to and to . Observe that extends and for we can choose .