###### Abstract

Chromatic polynomials are important objects in graph theory and statistical physics, but as a result of computational difficulties, their study is limited to graphs that are small, highly structured, or very sparse. We have devised and implemented two algorithms that approximate the coefficients of the chromatic polynomial $P(G,x)$, where $P(G,k)$ is the number of proper $k$-colorings of a graph $G$ for $k \in \mathbb{N}$. Our algorithms are based on a method of Knuth that estimates the order of a search tree. We compare our results to the true chromatic polynomial in several known cases, and compare our error with that of previous approximation algorithms.

Approximating the Chromatic Polynomial

Yvonne Kemper (National Institute of Standards and Technology, Gaithersburg, MD; yvonnekemper@gmail.com) and Isabel Beichl (isabel.beichl@nist.gov)

## 1 Introduction

The chromatic polynomial $P(G,x)$ of a graph $G$ has received much attention as a result of the now-resolved four-color problem, but its relevance extends beyond combinatorics, and its domain beyond the natural numbers. To start, the chromatic polynomial is related to the Tutte polynomial, and evaluating these polynomials at different points provides information about graph invariants and structure. In addition, $P(G,x)$ is central in applications such as scheduling problems [26] and the $q$-state Potts model in statistical physics [10, 25]. The former occur in a variety of contexts, from algorithm design to factory procedures. For the latter, the relationship between $P(G,x)$ and the Potts model connects statistical mechanics and graph theory, allowing researchers to study phenomena such as the behavior of ferromagnets.

Unfortunately, computing $P(G,x)$ for a general graph is known to be #P-hard [16, 22], and deciding whether or not a graph is $k$-colorable is NP-hard [11]. Polynomial-time algorithms have been found for certain subclasses of graphs, including chordal graphs [21] and graphs of bounded clique-width [12, 20], and recent advances have made it feasible to study $P(G,x)$ for graphs of up to thirty vertices [29, 30]. Still, the best known algorithm for computing $P(G,x)$ for an arbitrary graph of order $n$ has complexity $O(2^n n^{O(1)})$ [4], and the best current implementation is limited to graphs of modest order and size [14, 15].

Approximation methods have received less attention, though they can provide significant information; see Section 2.1. The only previous approach, developed by Lin [19], is based on a theorem of Bondy [5] that gives upper and lower bounds for $P(G,x)$. Lin's algorithm is a greedy method that uses a particular ordering on the vertices to derive upper and lower polynomial bounds; the final result is a mean of these two polynomials. While this algorithm runs in polynomial time, it gives a single fixed estimate, the accuracy of which cannot be improved by further computation.

In the following, we take a different approach and adapt a Monte Carlo sampling method used by Beichl, Cloteaux, Sullivan, and others (e.g. [2]) to be the basis of two approximation algorithms. We have computed approximations of the coefficients of $P(G,x)$ for Erdős–Rényi (ER) random graphs of sizes well beyond the reach of exact methods (larger graphs are possible), and report evaluation times. Though ER graphs are known to have certain structural properties and invariants [17], they are frequently used as test cases and first-order approximations of unknown networks, and the ability to approximate their polynomials is both demonstrative and useful. To evaluate the accuracy of our algorithm in certain cases, we compute approximations of $P(G,x)$ for graphs with known chromatic polynomial formulas, as well as for a variety of random graphs small enough to compute precisely. We compare the relative error in evaluation for Lin's algorithm and ours, and the error in individual coefficients for our algorithm.

In Section 2, we give the relevant graph theory background and briefly discuss the connections with the Potts model and other applications. The Monte Carlo method we modify is based on an idea of Knuth that is detailed in Section 2.2, and we present our algorithms in Sections 3 and 4. An overview of our results appears in Section 5, including discussions of complexity and variance. We conclude in Section 6.

## 2 Background

We recall just the relevant definitions; further background on graphs and the chromatic polynomial may be found in [8, 31]. In this paper, a graph is given as $G = (V, E)$, where $V$ is a collection of vertices and $n = |V|$ is the order of $G$, and where $E$ is a collection of edges between two vertices and $m = |E|$ is the size of $G$. Two vertices $u$ and $v$ are adjacent if connected by an edge; that is, if $\{u, v\} \in E$. A subset $S \subseteq V$ of vertices is independent if no two vertices in $S$ are adjacent. A circuit is a closed path with no repeated edges or vertices (except the first and last vertex). A proper $k$-coloring of a graph is a labeling of its vertices with at most $k$ colors such that adjacent vertices receive different colors, and the smallest number of colors necessary to properly color a graph $G$ is the chromatic number of $G$, denoted $\chi(G)$. Finally, the chromatic polynomial of a graph $G$ is the polynomial $P(G,x)$ such that $P(G,k)$ is the number of proper colorings of $G$ using at most $k$ colors. It is well known that the degree of $P(G,x)$ is $n$ and that the coefficients alternate in sign. As an example, we consider the kite graph.

###### Example 2.1

Let $G$ be the kite graph. We draw it here with a proper $3$-coloring. The chromatic polynomial of $G$ is $P(G,x) = x(x-1)(x-2)^2 = x^4 - 5x^3 + 8x^2 - 4x$.

### 2.1 The Potts Model and Other Motivation

As mentioned above, the chromatic polynomial has recently grown in interest as a result of its connection to the Potts model. To see this, we expand $P(G,q)$, where $q \in \mathbb{N}$, as a sum over products of Kronecker deltas $\delta_{ab}$:

$$P(G,q) = \sum_{\sigma_{v_1}=1}^{q} \cdots \sum_{\sigma_{v_n}=1}^{q} \; \prod_{\{v_i, v_j\} \in E(G)} \left(1 - \delta_{\sigma_{v_i}\sigma_{v_j}}\right), \qquad (1)$$

where $\delta_{ab} = 1$ if $a = b$ and $\delta_{ab} = 0$ otherwise, and $\sigma = (\sigma_{v_1}, \ldots, \sigma_{v_n})$ is a coloring of the vertices. If $\{v_i, v_j\} \in E(G)$ and $\sigma_{v_i} = \sigma_{v_j}$, then $1 - \delta_{\sigma_{v_i}\sigma_{v_j}} = 0$, indicating an improper coloring – this assignment of $\sigma$'s is thus not included in the sum. In this manner, we may also interpret $\sigma$ as a 'global microscopic state of an anti-ferromagnetic Potts model with the individual $\sigma_{v_i}$'s being local states or spin values' [30]. Thus, (1) can be used to count energy-minimizing global states. In practice, scientists use solutions of graphs of small order to predict the behavior of larger graphs. Our method could provide a quick check for the accuracy of these extrapolations, and allows scientists to make better predictions for still larger numbers of vertices.

As mentioned above, coloring problems correspond directly to situations in which jobs must be scheduled, but cannot happen simultaneously. In this case, using an approximation of $P(G,x)$ it is possible to estimate the number of possible solutions given specific parameters. An approximation further gives an estimate of the chromatic number of a graph, useful as a lower bound for the graph bandwidth problem [6]. In particular, when plotting an approximation of $P(G,x)$, the integer $x$ at which it increases rapidly serves as an indication of when $G$ becomes colorable. Approximating the coefficients additionally provides information about the broken circuit complex of a graph, an object fundamental in understanding shellability and homology of geometric lattices, matroids, linear hyperplane arrangements, etc. (See [31] for details on matroids and broken circuit complexes.) It may also be possible to study the structure of these complexes using the approximations generated by the algorithms.

### 2.2 Knuth’s Algorithm

Knuth's algorithm [18] is a way to approximate the run-time of a large backtrack program. Specifically, it estimates the number of nodes of the search tree without actually visiting all of them, and in some cases, it finishes very quickly. This estimation is accomplished by selecting a path and noting the number of children $n_i$ at each stage $i$ in the path (we set $n_0 = 1$ for the root node). Each path begins at the root and ends at a leaf, and at each node on the path a child to follow is selected uniformly at random. The estimate of the number of nodes $C$ is then

$$C \approx n_0 + n_0 n_1 + n_0 n_1 n_2 + \cdots + n_0 n_1 \cdots n_l = n_0 \left(1 + n_1 \left(1 + n_2 \left(1 + \cdots \right)\right)\right), \qquad (2)$$

where $l$ is the length of the path from root to leaf. The idea that this is a reasonable estimate is based on the likelihood that a node selected at random at level $k$ is similar to other nodes at level $k$. Moreover, the expected value of the sum in (2) is the true value of $C$. To see this, first we express

$$C = \sum_{T \in ST(G)} 1.$$

where $ST(G)$ denotes the set of nodes of the search tree. Then, of all the nodes in level $k$, the probability of picking a particular node $T_k$ is

$$p(T_k) = \frac{1}{n_0 n_1 \cdots n_{k-1} n_k},$$

where $n_i$ is as in (2). Each time we include $T_k$ in the path, we include the value $1/p(T_k) = n_0 n_1 \cdots n_k$ as part of our approximation, and by linearity of expectation:

$$E\!\left(\sum_{k=0}^{n-1} n_0 n_1 \cdots n_{k-1} n_k\right) = E\!\left(\sum_{k=0}^{n-1} \frac{1}{p(T_k)}\right) = \sum_{k=0}^{n-1} E\!\left(\frac{1}{p(T_k)}\right) = \sum_{k=0}^{n-1} \left(\sum_{T_k} \frac{1}{p(T_k)}\, p(T_k)\right) = \sum_{T \in ST(G)} 1 = C.$$

The average over sufficiently many samples is thus a good approximation for the true value of $C$. We estimate the coefficients of $P(G,x)$ by computing estimates of the number of nodes on each level, a slight modification.
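As a concrete illustration, Knuth's estimator can be sketched in a few lines. The `children` callback and the toy binary tree below are illustrative assumptions, not part of the original method; on a uniform tree every sample returns the exact node count.

```python
import random

def knuth_estimate(root, children):
    """One sample of Knuth's estimator: walk a random root-to-leaf path,
    accumulating the products of branching factors from eq. (2).
    Returns an unbiased estimate of the number of nodes in the tree."""
    total, weight, node = 0, 1, root
    while True:
        total += weight          # this node, weighted by 1 / p(reaching it)
        kids = children(node)
        if not kids:
            return total
        weight *= len(kids)      # multiply in n_k, the branching factor
        node = random.choice(kids)

# Toy search tree: a complete binary tree of depth 3 with 1+2+4+8 = 15
# nodes; because the tree is uniform, every sample is exact.
def kids(node):
    level, i = node
    return [] if level == 3 else [(level + 1, 2 * i), (level + 1, 2 * i + 1)]

print(knuth_estimate((0, 0), kids))  # → 15
```

On non-uniform trees individual samples vary, but their average converges to the true node count, which is the property the BC and FF algorithms exploit.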

## 3 The Broken Circuit Algorithm to Approximate P(G,x)

In what follows, we assume all graphs are simple (removing parallel edges does not change $P(G,x)$, and graphs with loops have no proper colorings, i.e. $P(G,x) \equiv 0$) and connected (the chromatic polynomial of a graph with multiple components is equal to the product of the chromatic polynomials of its components). Let a graph $G$ have $n$ vertices and $m$ edges. For our algorithm, we make use of a particular combinatorial interpretation of the coefficients of $P(G,x)$. Before stating this interpretation, we recall the notion of a broken circuit.

###### Definition 3.1

Given a graph $G$ with a total linear ordering $<$ on the edges $E$, a broken circuit (BC) is a circuit $C$ with its smallest edge $e$ removed; that is, $C \setminus \{e\}$, where $e < f$ for all $f \in C$ with $f \neq e$.

More on broken circuits can be found in [31, Ch. 7]. A classic result [32] proves

$$P(G,x) = \sum_{i=0}^{n-1} (-1)^i \, b_i \, x^{n-i}, \qquad (3)$$

where $b_i = \#\{S \subseteq E : |S| = i \text{ and } S \text{ contains no broken circuits}\}$.

Notice that the linear ordering on $E$ is not specified. Amazingly, any ordering will work, as long as we are consistent [31, Thm. 7.3.7], though certain orderings will be advantageous in our approximation; see Section 3.1. To illustrate Whitney's theorem, consider again the kite graph from Example 2.1. For the sake of brevity, we will refer to the edges as $a$, $b$, $c$, $d$, and $e$. Let $a < b < c < d < e$. There are three circuits (the two triangles and the four-cycle), and so we have three broken circuits, each obtained by removing the smallest edge of the corresponding circuit. Pictorially, we forbid the subgraphs shown in Figure 1. The possible "no broken circuit" (NBC) subgraphs are shown in Figure 2, and we see that the (absolute value of the) coefficient of $x^{n-i}$ in $P(G,x)$ coincides with the number of possible NBC subgraphs of size $i$.
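Whitney's theorem is easy to verify by brute force on a small graph. The sketch below assumes the kite graph is $K_4$ minus an edge (consistent with the five edges and three circuits described above); it enumerates circuits, forms the broken circuits, and counts NBC subsets of each size.

```python
from itertools import combinations

# Kite graph (assumed: K4 minus an edge): vertices 0..3, edges listed in
# the fixed linear order a < b < c < d < e, so an edge's index is its rank.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
m = len(edges)

# Circuits: edge subsets in which every incident vertex has degree 2.
# (For this small graph no disconnected such subset exists, so the
# simple degree test suffices.)
circuits = []
for r in range(3, m + 1):
    for sub in combinations(range(m), r):
        deg = {}
        for i in sub:
            for v in edges[i]:
                deg[v] = deg.get(v, 0) + 1
        if all(d == 2 for d in deg.values()):
            circuits.append(sub)

# A broken circuit is a circuit with its smallest edge removed.
broken = [set(c) - {min(c)} for c in circuits]

# b_i = number of i-edge subsets containing no broken circuit (eq. (3)).
b = [0] * 4
for r in range(4):
    for sub in combinations(range(m), r):
        s = set(sub)
        if not any(bc <= s for bc in broken):
            b[r] += 1

print(b)  # → [1, 5, 8, 4], i.e. P(G,x) = x^4 - 5x^3 + 8x^2 - 4x
```

The counts match the coefficient magnitudes of $x(x-1)(x-2)^2$, the kite graph's chromatic polynomial.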

To compute an approximation of the coefficients of $P(G,x)$ for a general graph using Knuth's idea, we must first design an appropriate search tree. To this end, assign a total linear ordering $<$ on $E$, and let $\mathcal{T}_<(G)$ be the tree such that the nodes at level $i$ correspond to the NBC subgraphs with $i$ edges, for $0 \le i \le n-1$. Each node at level $i$ is labeled with an NBC subgraph $S$ with $i$ edges, and has as children the NBC subgraphs of size $i+1$ that can be obtained by adding one edge to $S$. In particular, the root (level zero) has $m$ children, and the nodes at level $n-1$ have no children. (If a subgraph contains a circuit, then it necessarily contains a broken circuit. The maximal circuit-free subgraphs of a graph are the spanning trees, all of which have size $n-1$.) Note that each NBC subgraph with $i$ edges labels $i!$ different nodes, as we can select the edges in any order. Putting all of this together, we have that by approximating the number of unique nodes at each level, we approximate the coefficients of $P(G,x)$. Note that the search tree is labeled with respect to the ordering $<$. Though the number of unique nodes on each level is the same for any ordering, the shape of the tree is affected by the linear ordering; again, see Section 3.1.

To avoid constructing the entire search tree, we take a sample by building a single NBC spanning tree using a version of Kruskal's spanning tree algorithm [3]. To do this, we start with all of the vertices and none of the edges, and one at a time add edges that do not introduce broken circuits, stopping when we have a spanning tree. At each stage $k$, $1 \le k \le n-1$, we record the number of edges we could add (i.e. $n_k$, the number of children), and choose the next edge uniformly at random. Then, we approximate:

$$b_k \approx \frac{n_0 n_1 \cdots n_{k-1} n_k}{k!} = b_{k-1} \, \frac{n_k}{k}. \qquad (4)$$

We initialize with the leading coefficient $b_0 = 1$. The basic algorithm is summarized as pseudocode in Algorithm 1. To illustrate the BC algorithm, we perform one branch traversal for the kite graph in Figure 3, with the same ordering as before, obtaining one estimate of $P(G,x)$. Theoretically, the algorithm returns one of three different polynomials for the kite graph, each with an associated probability, and the expected value over these outcomes is exactly $P(G,x)$, which is what we wanted.
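The sampling step can be sketched as follows. This is an illustrative reimplementation, not the authors' code: it uses naive path-finding instead of union–find, and the broken-circuit test checks whether some non-forest edge is smaller than every edge on the forest path joining its endpoints. The kite graph and its edge order are assumed as in the running example.

```python
import random

# Kite graph (assumed: K4 minus an edge); EDGES is listed in the chosen
# linear order, so an edge's index is its rank.
EDGES = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
N = 4

def path_edges(forest, u, v):
    """Edge indices on the u-v path in `forest`, or None if disconnected."""
    adj = {}
    for i in forest:
        a, b = EDGES[i]
        adj.setdefault(a, []).append((b, i))
        adj.setdefault(b, []).append((a, i))
    stack, via = [u], {u: None}
    while stack:
        x = stack.pop()
        if x == v:
            path = []
            while via[x] is not None:
                x, i = via[x]
                path.append(i)
            return path
        for y, i in adj.get(x, []):
            if y not in via:
                via[y] = (x, i)
                stack.append(y)
    return None

def is_nbc(forest):
    """A forest contains a broken circuit iff some non-forest edge is
    smaller than every edge on the forest path joining its endpoints."""
    for g, (u, v) in enumerate(EDGES):
        if g in forest:
            continue
        path = path_edges(forest, u, v)
        if path is not None and g < min(path):
            return False
    return True

def bc_sample():
    """One Knuth-style sample: estimates b_0, ..., b_{n-1} via eq. (4)."""
    forest, b = set(), [1.0]                            # b_0 = 1
    for k in range(1, N):
        cands = [i for i in range(len(EDGES)) if i not in forest
                 and path_edges(forest, *EDGES[i]) is None  # stays a forest
                 and is_nbc(forest | {i})]                  # stays NBC
        b.append(b[-1] * len(cands) / k)                # b_k = b_{k-1} n_k / k
        forest.add(random.choice(cands))
    return b

SAMPLES = 2000
sums = [0.0] * N
for _ in range(SAMPLES):
    for k, v in enumerate(bc_sample()):
        sums[k] += v
print([s / SAMPLES for s in sums])  # ≈ [1, 5, 8, 4]
```

Averaging the samples recovers the true coefficient magnitudes $1, 5, 8, 4$ of the kite graph, up to sampling noise.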

### 3.1 Ordering the Edges

Different edge orderings can significantly affect the magnitude of the variance of the samples, as they alter the uniformity of the search tree. The ideal ordering would result in every node at every level having the same number of children; this is not usually possible (trees are one exception), but a "good" ordering will get as close to this as possible. A "bad" ordering, on the other hand, results in a wide range of numbers of children. In Example 3.2, we show how two different orderings of the edges of the kite graph affect the structure of $\mathcal{T}_<(G)$.

###### Example 3.2

On the left of each box in Figure 4, we have the kite graph with two edge orderings, one "good" and one "bad". On the right, we show the tree $\mathcal{T}_<(G)$ corresponding to each ordering. The leaves of each tree are labeled with the sequence of edges along the branch: the edge added first, then the second, and finally the third. The "good" ordering (when combined with the improvement described in Section 3.2) is in fact the best we can do for the kite graph.

Experimentally, we found that the ordering that gave the least variance (while not being too computationally expensive) was based on a perfect elimination ordering of the vertices.

###### Definition 3.3

Let $G$ be a graph with $n$ vertices. A perfect elimination ordering of the vertices is an ordering $v_1, v_2, \ldots, v_n$ such that the neighbors of $v_i$ form a clique in $G_i$, where $G_1 = G$ and $G_{i+1} = G_i \setminus \{v_i\}$.

See [19, 24] for more on perfect elimination orderings and the algorithmic aspects thereof. Not every graph has a perfect elimination ordering (in fact, only chordal graphs do; this is one characterization of that class of graphs), but we approximate such an ordering as follows. Let $G_1 = G$, and $G_{i+1} = G_i \setminus \{v_i\}$. Then, let

$$v_i = \begin{cases} v \in G_i & \text{such that the neighbors of } v \text{ form a clique in } G_i\text{, or} \\ w \in G_i & \text{such that the degree of } w \text{ is minimal in } G_i\text{, if no such } v \text{ exists.} \end{cases}$$

Now, write each edge in terms of its endpoints' positions in this vertex ordering. An edge $\{v_i, v_j\}$ (with $i < j$) is smaller than $\{v_k, v_l\}$ (with $k < l$) if and only if $i < k$, or if $i = k$ and $j < l$. The complexity of this ordering is analyzed in Section 5.2. We conjecture that this is the best ordering possible; the intuitive and theoretical bases of this conjecture have their sources in matroid theory and broken circuit complexes.
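The approximate-elimination rule above can be sketched directly; this is an illustrative version (the exact tie-breaking in the authors' implementation may differ): at each step we remove a simplicial vertex (one whose remaining neighbors form a clique) if one exists, and otherwise a vertex of minimum remaining degree.

```python
def approx_peo(n, edges):
    """Approximate perfect elimination ordering: repeatedly remove a
    simplicial vertex if one exists, else one of minimum degree."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    order, remaining = [], set(range(n))
    while remaining:
        pick = None
        for v in sorted(remaining):           # deterministic scan order
            nbrs = adj[v] & remaining
            if all(b in adj[a] for a in nbrs for b in nbrs if a != b):
                pick = v                      # simplicial: neighbors are a clique
                break
        if pick is None:                      # fall back to minimum degree
            pick = min(remaining, key=lambda v: len(adj[v] & remaining))
        order.append(pick)
        remaining.discard(pick)
    return order

# On a chordal graph (here a triangle with a pendant vertex), every step
# finds a simplicial vertex, so a true perfect elimination ordering results:
print(approx_peo(4, [(0, 1), (0, 2), (1, 2), (2, 3)]))  # → [0, 1, 2, 3]
```

Sorting the edges lexicographically by the resulting vertex ranks then yields the edge ordering used by the BC algorithm.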

### 3.2 An Improvement to the BC Algorithm

A deeper understanding of the structure of the NBC subgraphs allows for an improvement to the BC algorithm. Given a graph $G$ and a total linear ordering $<$ on its edges, let $e_0$ be the smallest edge with respect to $<$. Then, $e_0$ is in every NBC spanning tree. (If $e_0 \notin T$ for some spanning tree $T$ of $G$, then $T \cup \{e_0\}$ contains a circuit $C$ with $e_0 \in C$. Then, $C \setminus \{e_0\} \subseteq T$ is a broken circuit.) Now, let $\mathcal{A}$ be the set of NBC subgraphs of $G$ that do not contain $e_0$. As every NBC subgraph of $G$ is a subgraph of one (or more) of the NBC spanning trees, $e_0$ can be added to any subgraph $S \in \mathcal{A}$, and the result is still an NBC subgraph. Let $\mathcal{B} = \{S \cup \{e_0\} : S \in \mathcal{A}\}$. Then, $\mathcal{A} \cap \mathcal{B} = \emptyset$, and every NBC subgraph is contained in $\mathcal{A} \cup \mathcal{B}$. Therefore, to compute $P(G,x)$, it is sufficient to know the number of NBC subgraphs of each dimension in $\mathcal{A}$. In particular, if $a_i$ is the number of NBC subgraphs in $\mathcal{A}$ with $i$ edges, we can write

$$P(G,x) = a_0 x^n - (a_0 + a_1) x^{n-1} + (a_1 + a_2) x^{n-2} - \cdots + (-1)^{n-2} (a_{n-3} + a_{n-2}) x^2 + (-1)^{n-1} a_{n-2} \, x.$$

The change from the original algorithm is to include $e_0$ in the NBC tree we are building as an initial step, then perform our usual sampling, where the root of the tree is labeled by the subgraph containing only $e_0$. We approximate the number of uniquely labeled nodes on each level in the same manner as before.

We then compute our approximation of the coefficients of $P(G,x)$ using the relationships $b_0 = a_0 = 1$, $b_{n-1} = a_{n-2}$, and $b_i = a_{i-1} + a_i$ for $1 \le i \le n-2$. This decreases the variance for two reasons. First, we remove the inherent imbalance in the number of children of each node on the second level. (The node labeled with the subgraph containing just the edge $e_0$ will have at least as many children as any other NBC subgraph with just one edge.) Second, we are performing one level of approximation fewer, which reduces the compounding variance.
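The conversion from the $a_i$ counts back to the coefficient magnitudes $b_i$ is mechanical. In this sketch, the input values $a = [1, 4, 4]$ are the NBC-subgraph counts avoiding the smallest edge for the kite graph under the assumed ordering, given here for illustration.

```python
def coeffs_from_improved(a):
    """Convert counts a_i of NBC subgraphs avoiding the smallest edge e0
    into the coefficient magnitudes b_i:
    b_0 = a_0, b_i = a_{i-1} + a_i for 1 <= i <= n-2, b_{n-1} = a_{n-2}."""
    b = [a[0]]
    for i in range(1, len(a)):
        b.append(a[i - 1] + a[i])
    b.append(a[-1])
    return b

# Kite graph under the assumed ordering: a = [1, 4, 4] for sizes 0, 1, 2,
# recovering the full coefficient magnitudes.
print(coeffs_from_improved([1, 4, 4]))  # → [1, 5, 8, 4]
```

In the improved algorithm the $a_i$ are themselves Knuth-style estimates, so the same conversion is applied to the sampled averages.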

## 4 The Falling Factorial Algorithm to Approximate P(G,x)

Like the BC algorithm, the Falling Factorial (FF) algorithm makes use of a variation of Knuth's method to approximate the coefficients of $P(G,x)$, but is based on a different expansion of the chromatic polynomial. As before, graphs are simple and connected, with $m$ edges and $n$ vertices. We can express

$$P(G,x) = \sum_{i=0}^{n-1} p_{n-i} \, \langle x \rangle_{n-i},$$

where $p_{n-i}$ is the number of ways to partition the vertices of $G$ into $n-i$ independent sets and $\langle x \rangle_k = x(x-1)(x-2)\cdots(x-k+1)$ denotes the falling factorial. See [7, 23] for more details of this expansion. In Figure 5, we show the partitions of an example graph. We have arranged the possible partitions as nodes of a tree, where the partition into $n$ singleton independent sets is the root, the leaves represent partitions with minimal numbers of parts, and each partition at level $i$ is a refinement of its children at level $i+1$. Notice that unlike the BC tree, maximal paths may be of different lengths. The leaves of paths of longest length correspond to colorings with the minimum number of colors. Such a tree exists for any graph $G$, and does not depend on any labeling of $V$ or $E$, so we denote it simply $\mathcal{T}(G)$. We can approximate the number of nodes on each level using Knuth's method, though in this instance, counting the number of repetitions of nodes becomes more involved.

### 4.1 Counting Repetitions

Say we have a partition with $k$ blocks appearing on level $n-k$ of the tree (note that partitions with $k$ blocks will only appear on this level). The elements of each block represent the vertices of $G$, and so are distinguishable, but the blocks themselves are not. Duplicates of a partition occur because independent subsets may be combined in many ways to form it, and to account for these in our approximation, we must determine how many ways there are to refine the partition to singletons using stepwise refinements.

###### Lemma 4.1

Let $G$ be a graph, and let $\mathcal{T}(G)$ be the tree described in Section 4. A partition labeling a node on level $n-k$ of the tree with blocks of sizes $\beta_1$, $\beta_2$, …, $\beta_k$ is duplicated

$$\left( \prod_{i=1}^{k} \frac{\beta_i! \, (\beta_i - 1)!}{2^{\beta_i - 1}} \right) \cdot \frac{(n-k)!}{(\beta_1 - 1)! \, (\beta_2 - 1)! \cdots (\beta_k - 1)!} \qquad (5)$$

times.

• Say we have a partition with $k$ blocks appearing on level $n-k$ of the tree $\mathcal{T}(G)$. We know that the number of ways to transform $\beta$ distinguishable objects into singletons with a sequence of refinements is $\beta!(\beta-1)!/2^{\beta-1}$ [9]. Thus, if we consider one block $B_i$ of size $\beta_i$, we have $\beta_i!(\beta_i-1)!/2^{\beta_i-1}$ possible paths from the singletons in $B_i$ to $B_i$. The product in parentheses in (5) is the number of ways to choose paths for all $k$ blocks. The right-hand side of (5) then counts the number of ways to merge the steps of the paths; this is a simple multichoose evaluation. A path to a block with $\beta_i$ elements has by definition $\beta_i - 1$ steps, so the number of ways to order all the steps of a particular selection of paths is:

$$\binom{\sum_{i=1}^{k} (\beta_i - 1)}{(\beta_1 - 1), (\beta_2 - 1), \ldots, (\beta_k - 1)} = \frac{\left( (\beta_1 - 1) + (\beta_2 - 1) + \cdots + (\beta_k - 1) \right)!}{(\beta_1 - 1)! \, (\beta_2 - 1)! \cdots (\beta_k - 1)!}.$$

Notice that $\sum_{i=1}^{k} (\beta_i - 1) = n - k$. Thus, the total number of distinct paths in $\mathcal{T}(G)$ from the root to the partition is precisely the value in (5).

As the number of duplicates depends only on the sizes of the blocks and how many blocks there are (and not on the contents of the blocks), we will denote the number of duplicates as $F(\beta_1, \beta_2, \ldots, \beta_k)$.
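The duplicate count in (5) is straightforward to evaluate; the sketch below implements it as reconstructed above and sanity-checks a few small cases by hand-countable merge sequences.

```python
from math import factorial

def F(betas):
    """Duplicate count from eq. (5) for a partition with block sizes
    `betas`: paths within each block times interleavings of the steps."""
    within = 1
    for b in betas:
        within *= factorial(b) * factorial(b - 1) // 2 ** (b - 1)
    steps = sum(b - 1 for b in betas)      # equals n - k
    interleave = factorial(steps)
    for b in betas:
        interleave //= factorial(b - 1)
    return within * interleave

# All singletons: reached in exactly one way (the root itself).
print(F([1, 1, 1, 1]))  # → 1
# One pair plus singletons: a single merge, so one path.
print(F([2, 1, 1]))     # → 1
# Two pairs: the two merges can happen in either order.
print(F([2, 2]))        # → 2
# One triple: choose which pair merges first, 3 ways.
print(F([3, 1]))        # → 3
```

Dividing the Knuth-style level estimate by $F$ removes the overcounting of repeated partition labels.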

### 4.2 Pseudocode for the FF Algorithm

Our algorithm starts with a partition of $n$ non-empty blocks (i.e. each vertex is in its own independent set). At each level, we find all pairs of blocks that could be combined and still be an independent set, and record this number. One pair is selected uniformly at random, and the chosen blocks are merged. This is repeated until we cannot merge any more blocks while maintaining the independent set condition (again, the number of levels in a path can vary). This procedure is repeated many times, and the results of each sample are averaged. Zero values (e.g. when one path is shorter than another) are included in the average. Then, the number of independent vertex partitions with $n-i$ blocks is approximated as

$$p_{n-i} \approx \frac{c_0 c_1 c_2 \cdots c_i}{F(\beta_1, \beta_2, \ldots, \beta_{n-i})},$$

where $c_0 = 1$, $c_1$ is the number of children of the root, $c_2$ the number of children of the child selected in the first level, and so on. We summarize this as pseudocode in Algorithm 2.

As before, the central question is how we determine the $c_i$. This is far simpler than with broken circuits. We encode the blocks as the columns (and rows) of a matrix $A$, where initially, $A$ is just the (unsigned) adjacency matrix of $G$. Merging two blocks $B_i$ and $B_j$ is permitted (i.e. $B_i \cup B_j$ is independent) when $A_{ij} = 0$. In terms of the graph, this means there are no edges connecting any of the vertices in $B_i$ with any of the vertices in $B_j$. Thus, the number of possible pairs is equal to the number of zeros above (or below: the matrix is symmetric) the diagonal of $A$. When we merge two blocks $B_i$ and $B_j$, this corresponds to adding column $j$ to column $i$, and row $j$ to row $i$, then deleting row and column $j$ (or vice-versa; for simplicity, we let $B_i$ be the block of smaller index). The while loop consists of repeating this process until there are no more zeros above the diagonal in the latest updated $A$.
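A single FF sample can be sketched as follows. For clarity this illustrative version keeps blocks as explicit vertex lists and tests independence directly against the edge set, which is equivalent to the zero test on the merged adjacency matrix described above; the `F` helper is the duplicate count of eq. (5), reconstructed as in Lemma 4.1.

```python
import random
from math import factorial

def F(betas):
    """Duplicate count from eq. (5)."""
    within, steps = 1, sum(b - 1 for b in betas)
    for b in betas:
        within *= factorial(b) * factorial(b - 1) // 2 ** (b - 1)
    inter = factorial(steps)
    for b in betas:
        inter //= factorial(b - 1)
    return within * inter

def ff_sample(n, edges):
    """One FF sample: returns estimates of p_k (number of partitions of
    V into k independent sets), keyed by the number of blocks k."""
    E = {frozenset(e) for e in edges}
    blocks = [[v] for v in range(n)]
    est = {n: 1.0}                     # p_n = 1: the all-singletons partition
    weight = 1.0                       # running product c_0 c_1 ... c_i
    while True:
        mergeable = [(i, j) for i in range(len(blocks))
                     for j in range(i + 1, len(blocks))
                     if all(frozenset((u, v)) not in E
                            for u in blocks[i] for v in blocks[j])]
        if not mergeable:
            return est
        weight *= len(mergeable)       # c_i: children of the current node
        i, j = random.choice(mergeable)
        blocks[i] += blocks.pop(j)     # merge block j into block i
        est[len(blocks)] = weight / F([len(b) for b in blocks])
    # (unreachable)

# Kite graph (assumed: K4 minus an edge).  Its expansion is
# P(G,x) = <x>_4 + <x>_3, so p_4 = 1, p_3 = 1, and p_2 = 0.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
print(ff_sample(4, edges))  # → {4: 1.0, 3: 1.0}
```

For the kite graph the tree has a single branch, so every sample is exact; on larger graphs the per-level estimates are averaged over many samples, with zeros included for levels a given path never reaches.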

## 5 Analysis and Experimental Results

### 5.1 Implementation

We implemented our algorithms in both MATLAB and C. (Certain commercial equipment, instruments, or materials are identified in this paper to foster understanding. Such identification does not imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the materials or equipment identified are necessarily the best available for the purpose.) The C implementation allows for the analysis of graphs of dramatically larger size and order than previously possible. The largest graph (that we could find and may report coefficients for) with no explicit formula for its chromatic polynomial for which the polynomial has been computed is the truncated icosahedron, with 60 vertices and 90 edges [14]. Using our algorithms, we have computed approximations for graphs of substantially greater order and size, and larger still are possible.

Larger graphs do introduce the complication of storing extremely big numbers. For instance, the middle coefficients of $P(C_n, x)$, where $C_n$ is the cycle graph on $n$ vertices, are essentially binomial coefficients and grow astronomically with $n$. To accommodate these numbers, we make use of an idea of Beichl, Cloteaux, and Sullivan using logarithms to accurately update the running average of the samples and compute the mean and variance; for specific details, see [2, Sec. 4].
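One standard way to maintain such a running average, in the spirit of the technique in [2] (the exact scheme there may differ), is to keep only logarithms and combine them with a log-sum-exp update, so the astronomically large samples are never materialized.

```python
import math

def log_mean_update(log_mean, k, log_sample):
    """Given log of the mean of k samples and the log of sample k+1,
    return log of the mean of k+1 samples, entirely in log space."""
    # mean' = (k * mean + sample) / (k + 1), computed via log-sum-exp
    a = math.log(k) + log_mean if k > 0 else float("-inf")
    b = log_sample
    hi = max(a, b)
    return hi + math.log(math.exp(a - hi) + math.exp(b - hi)) - math.log(k + 1)

# The average of 1e300 and 3e300 is 2e300 -- near the edge of double
# range, but trivial in log space:
lm = log_mean_update(float("-inf"), 0, 300 * math.log(10))
lm = log_mean_update(lm, 1, math.log(3) + 300 * math.log(10))
print(abs(lm - (300 * math.log(10) + math.log(2))) < 1e-9)  # → True
```

The same trick extends to the running second moment, from which the variance of the samples can be recovered without overflow.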

### 5.2 Complexity of the BC Algorithm

Ordering the edges of $G$ as described in Section 3.1 requires polynomial time: first to order the vertices [19], and then $O(m \log m)$ to sort the edges with respect to that order. The BC algorithm is based on Kruskal's spanning tree algorithm, with the extra condition of avoiding broken circuits. We use Tarjan's implementations [27, 28] of find and union, which together have complexity $O(m \, \alpha(m, n))$, where $\alpha$ is the inverse Ackermann function [1].

To build the candidate list at each level, we test (at most) $m$ edges to see if they could be added to the tree without introducing any broken circuits. To do this, we pretend to add each edge to the current subgraph, then check whether any edge of $G$ would create a circuit with the enlarged subgraph. If so, we find the circuit and check whether that edge is the smallest edge in the circuit (in which case the circuit minus that edge is a broken circuit contained in the subgraph). We repeat the latter two operations at most $m$ times while building the list. After building the list, we pick from it an edge at random and add it to the NBC tree. A spanning tree has $n-1$ edges, thus we repeat the above $n-1$ times. Therefore, for one sample, the entire BC algorithm has complexity polynomial in $n$ and $m$, and each sample following the first is cheaper, since the preprocessing need not be repeated. Furthermore, since we assume our graph is connected, $m \ge n-1$, and we may state the complexity in terms of $m$ alone. As the number of edges just decreases by one in the improved version of BC, this does not change the complexity of the algorithm.

### 5.3 Complexity of the FF Algorithm

The FF algorithm requires no preprocessing outside of populating the adjacency matrix, which takes $O(n^2)$ time. The first section of Algorithm 2 requires $O(n^2)$ time to assign vertices to the blocks and determine $c_1$ (i.e. to count the number of zeros in the adjacency matrix).

In the worst-case scenario, the while loop will be repeated $n-1$ times. At iteration $i$, the number of zeros is bounded above by $(n-i)^2$. To determine this set, and then to pick randomly from it, we require $O(n^3)$ time over all iterations. Then, to determine each $c_i$, we capitalize on the fact that only two blocks change. By using the value of $c_{i-1}$ to compute $c_i$, each iteration requires only $O(n)$, thus $O(n^2)$ for the entire loop. We therefore have a complexity of $O(n^3)$ for the first sample, and the same order for each subsequent sample, with only the initial matrix setup saved.

This complexity is significantly better than that of the BC algorithm, and in practice, it finishes in far less time. However, this algorithm is less useful than the BC algorithm, for several reasons. First, as the tree is less uniform (maximal paths are of different lengths, for instance), we have much larger variance, and must take more samples. Moreover, there is no way to alter the uniformity of the search tree, as in the BC algorithm. Second, as partitions corresponding to proper colorings with the minimum number of colors can have low probability, we may never select such a path, even after many samples. Third, transitioning between the falling factorial and broken circuit expansions of $P(G,x)$ requires multiplication by a large matrix of Stirling numbers: adding, subtracting, and multiplying large numbers is computationally difficult, and often results in inaccuracies. We therefore restrict ourselves to the BC algorithm in the following analysis.

### 5.4 No FPRAS for P(G,x)

It is known [13] that no fully polynomial-time randomized approximation scheme (FPRAS) exists for the chromatic polynomial, unless $\mathrm{NP} = \mathrm{RP}$. That is, if there existed a polynomial-time approximation scheme that could give the coefficients of $P(G,x)$ to within a certain probabilistic error bound, we would be able to decide if a graph was $k$-colorable, for any $k$, in polynomial time – a problem known to be NP-hard (except for $k \le 2$). Our complexity sections might appear to contradict the fact that no FPRAS exists for $P(G,x)$, but it is important to keep in mind that these are complexities for a single sample – in order to get within a certain error bound, we might have to take an exponential number of samples.

However, we will show in the next sections that despite these limitations our algorithm still produces a reasonable estimate in a short amount of time, as judged by the convergence of the running averages of the coefficients, as well as by comparison in the case of known graph polynomials.

### 5.5 Run-Time

In Figure 7, we show the time in seconds required to take ten samples of ER graphs of increasing order using the BC algorithm. We used an empirical method based on the convergence of the running averages of the coefficients to decide an appropriate number of samples. In particular, we take a large number of samples, and if the running average does not vary more than about one percent of the mean for each coefficient, we stop. If not, more samples are taken. To illustrate this idea, in Figure 6, we show the running averages for each coefficient over the course of the samples for a moderately sized ER graph, for the BC algorithm. As the first two coefficients have no variance ($b_0 = 1$ and $b_1 = m$ for every sample), we do not include the convergence graphs for $b_0$ and $b_1$. As the expected value of the algorithm is the precise value, it is reasonable to assume that when the running average flattens, it is converging to the correct answer. Naturally, larger graphs will require larger numbers of samples, especially for the coefficients with larger variance. The benefit of this algorithm, however, is that each sample is independent. If an insufficient number of samples has been taken, we can simply take more, and include these in the average. Again, the BC algorithm lends itself perfectly to parallelization: each sample is independent, so we may run as many copies of the same algorithm in parallel as we wish.

Timme et al.'s innovative techniques [29] have allowed scientists to consider the chromatic polynomials of certain large graphs. In that paper, the chromatic polynomial of a large grid graph was reported to be computed exactly over a period of hours on a single Linux machine with an Intel Pentium 4, 2.8 GHz 32-bit processor. Our BC algorithm took only seconds for the (successive) samples on a single Linux machine with an Intel Xeon Quad-Core 3.4 GHz processor. The average relative error – that is, the difference between the true value and the approximate value, divided by the true value – for the coefficients was 0.0062.

### 5.6 Variance

We give the average relative variance for the BC algorithm for graphs of a range of orders in Figure 8. To get the data for these numbers, we took ten graphs of each order, approximated the coefficients, and found the variance for each coefficient for each graph. We then averaged the variance over all the coefficients for all the graphs of a particular order. The variance of our samples could be quite large; large variance is inherent in the Knuth method, as low-probability events can occur. The variances in Figure 8 are with respect to the perfect elimination ordering method.