
# A Sequential Importance Sampling Algorithm for Estimating Linear Extensions

Isabel Beichl, National Institute of Standards and Technology · Alathea Jensen, George Mason University · Francis Sullivan, Institute for Defense Analyses
August 1, 2017
###### Abstract

In recent decades, a number of profound theorems concerning approximation of hard counting problems have appeared. These include estimation of the permanent, estimating the volume of a convex polyhedron, and counting (approximately) the number of linear extensions of a partially ordered set. All of these results have been achieved using probabilistic sampling methods, specifically Markov chain Monte Carlo (MCMC) techniques. In each case, a rapidly mixing Markov chain is defined that is guaranteed to produce, with high probability, an accurate result after only a polynomial number of operations.

Although of polynomial complexity, none of these results leads to a practical computational technique, nor do they claim to. The polynomials are of high degree, and a non-trivial amount of computing is required to get even a single sample. Our aim in this paper is to present practical Monte Carlo methods for one of these problems, counting linear extensions. Like related work on estimating the coefficients of the reliability polynomial, our technique is based on improving the so-called Knuth counting algorithm by incorporating an importance function into the node selection step, giving a sequential importance sampling (SIS) method. We define and report performance on two importance functions.

## 1 Introduction

For a partially ordered set, any total order which respects the partial order is known as a linear extension. Determining the number of linear extensions of a given partially ordered set (poset) is a fundamental problem in the study of ordering with many applications.

The most common applications are in scheduling problems, where the number of linear extensions gives the size of the solution space. For example, if we are trying to find a schedule that is optimal with respect to some cost function, the size of the solution space can be valuable in deciding how to carry out the search. If the solution space is small, then we may be able to perform an exhaustive search for the best order, but if the solution space is large, then we may wish to settle for an approximate solution.

Computing the number of linear extensions exactly is #P-complete, as shown by Brightwell and Winkler, but there are fully polynomial-time randomized approximation schemes for estimating the number, such as Markov chain Monte Carlo (MCMC) methods. These algorithms, while of great theoretical importance, are not practical because the polynomials are of high degree, even for the improved versions due to Bubley and Dyer and to Wilson.

Our aim is a practical method based on sequential importance sampling (SIS), in the spirit of earlier SIS applications to estimating the permanent and the reliability polynomial. Our method may have exponential rather than polynomial complexity in some cases, but, as is typical of SIS, a modest amount of computation is often sufficient to learn at least something plausible about the number of linear extensions for any poset of interest. This is not the case for MCMC methods, where significant computation may be required to obtain even a single sample chosen from a probability distribution close to the limit distribution.

In Section 2, we give definitions and notation necessary to describe our algorithm. In Section 3, we give the algorithm and explain how it works. The algorithm takes an importance function as an input, and in Section 5 we discuss the relative merits of several importance functions. In Section 4, we give an expression for the variance of the estimates produced by the algorithm and discuss methods for bounding the variance. In Section 6, we discuss possible improvements to the algorithm, and in Section 7 we give results of our numerical experiments with randomly generated posets.

## 2 Preliminaries

For a partially ordered set (poset) $P$, we use the notation $u \preceq v$ to mean that $u$ precedes $v$ in the partial order or that $u = v$. If $u$ and $v$ are not equal but $u \preceq v$, we write $u \prec v$. If $u \prec v$ and there is no $w$ such that $u \prec w \prec v$, we say that $v$ covers $u$.

By the descendants of an element $v$ we mean all $u$ such that $u \preceq v$. We will refer to the number of descendants of $v$ as $d(v)$. By the ancestors of $v$ we mean all $u$ such that $v \preceq u$.

A linear extension of $P$ is any ordering of the elements of $P$ into a sequence that respects the partial order, with each element appearing before all of its proper descendants. We denote the set of all linear extensions of a poset $P$ by $\Lambda(P)$, and the number of linear extensions by $L = |\Lambda(P)|$. If $P$ has $n$ elements, then $L$ is some integer between $1$ and $n!$.

In practice, posets often arise from directed acyclic graphs (DAGs), and in this context, a linear extension is usually called a topological ordering or topological sort. Each DAG has a unique poset that corresponds to its reachability relation; that is, $u \preceq v$ in the poset if and only if there is a directed walk from $v$ to $u$ in the DAG.

Determining the reachability poset of a DAG amounts to taking the transitive reduction or the transitive closure of the DAG. This can be accomplished by boolean matrix multiplication and hence has the same complexity as that procedure, as shown by Aho, Garey, and Ullman.
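As an illustration (not the authors' implementation), the closure can be computed with Warshall's algorithm, which has the same cubic flavor as naive boolean matrix multiplication:

```python
from itertools import product

def transitive_closure(adj):
    """Warshall's algorithm on a boolean adjacency matrix:
    adj[u][v] is True iff the DAG has an edge u -> v.
    Returns the reachability (transitive closure) matrix."""
    n = len(adj)
    reach = [row[:] for row in adj]
    # k varies slowest, so intermediate vertices are admitted one at a time
    for k, i, j in product(range(n), repeat=3):
        if reach[i][k] and reach[k][j]:
            reach[i][j] = True
    return reach
```

For the chain 0 → 1 → 2, the closure additionally records that 2 is reachable from 0.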

In general, different DAGs can have the same reachability relation, but we can create a unique DAG from a poset by directing an edge from $u$ to $v$ if and only if $u$ covers $v$ in the poset.

## 3 The Algorithm

Generating a single linear extension of a poset is well-understood and can be done in time linear in the size of the DAG. For example, any depth-first or breadth-first search of a DAG will generate a linear extension as a side-effect. For a breadth-first search, we can order the elements by their first visit times in ascending order, and for a depth-first search, we can order the elements by their last visit times in descending order. A greedy search will also generate a linear extension.

The standard method of obtaining a linear extension, which was first described in a 1962 paper by Kahn, corresponds to a breadth-first search. The procedure is as follows. We choose some minimal element of the poset and delete it, possibly causing some new elements to become minimal. We choose another minimal element of the poset and delete it. We repeat this until there are no elements left in the poset. The order in which the elements were deleted is then a linear extension of the poset.
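A minimal sketch of Kahn's procedure, assuming the poset is given as DAG edges (u, v) with u preceding v (the function name and input convention are ours):

```python
def kahn_linear_extension(n, edges):
    """One linear extension of the poset on {0,...,n-1} given by DAG
    edges (u, v), meaning u must appear before v.  Repeatedly removes
    a minimal element (in-degree zero), as in Kahn's method."""
    indeg = [0] * n
    succ = [[] for _ in range(n)]
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    minimal = [v for v in range(n) if indeg[v] == 0]
    order = []
    while minimal:
        u = minimal.pop()          # any minimal element works; the choice is arbitrary
        order.append(u)
        for w in succ[u]:
            indeg[w] -= 1
            if indeg[w] == 0:
                minimal.append(w)
    return order
```

For the diamond poset with edges (0,1), (0,2), (1,3), (2,3), every run places 0 first and 3 last.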

We can also perform the same procedure with maximal elements instead of minimal, and in this case the reverse order of the deletions gives the linear extension. In this paper, we will use maximal elements instead of minimal.

Since all linear extensions can be obtained by sequential deletion of maximal elements, $L$ is equal to the number of ways to execute this procedure. We can think of the choices available to us at each step as forming a decision tree, where each branching corresponds to a choice of a maximal element, and each leaf corresponds to one complete linear extension. Hence the number of leaves is $L$.

In the classic Knuth algorithm  for estimating the number of leaves of a tree, we walk a path from the root to a leaf of the decision tree by choosing which branch to take at each level uniformly at random from the branches available. We then take the product of the number of branches available at each step along the path to be an estimate of the number of leaves in the tree. Knuth showed that this estimate is unbiased, and so the mean of the estimates over many samples will approach the number of leaves.

In our case, this means that to produce an estimate of $L$, we begin our estimate with the number of maximal elements in the poset, then delete one of the maximal elements uniformly at random. We then multiply our estimate by the new number of maximal elements, and again delete one of the maximal elements uniformly at random. We continue in this fashion until the poset is empty.
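The step just described can be sketched as follows; the edge-list convention (pairs (u, v) with u preceding v, so maximal elements are those with out-degree zero) and function names are our own:

```python
import random

def knuth_estimate(n, edges, rng=random):
    """One Knuth-style estimate of the number of linear extensions.
    Maximal elements are deleted uniformly at random, and the number
    of choices available at each step is multiplied into the estimate."""
    outdeg = [0] * n
    pred = [[] for _ in range(n)]
    for u, v in edges:
        outdeg[u] += 1
        pred[v].append(u)
    maximal = [v for v in range(n) if outdeg[v] == 0]
    estimate = 1
    for _ in range(n):
        estimate *= len(maximal)
        m = maximal.pop(rng.randrange(len(maximal)))
        for u in pred[m]:
            outdeg[u] -= 1
            if outdeg[u] == 0:
                maximal.append(u)
    return estimate
```

On an antichain (no relations) the decision tree is uniform, so every sample equals the exact count n!; on a chain every sample equals 1.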

One of the chief difficulties with Knuth’s algorithm is that if the decision tree is far from uniform, the variance of the estimates may be large, and an exponential number of samples will be required for an accurate estimate. Unfortunately, the degree of non-uniformity is unknown in general. One way to mitigate the problem of non-uniform decision trees is to choose paths through the tree non-uniformly, according to an importance function. This is known as sequential importance sampling (SIS).

Consider a function $r$ from the elements of a poset to the positive real numbers. We will refer to $r$ as the importance function and to $r(v)$ as the importance of $v$. By the importance of a set $S$, denoted $r(S)$, we mean the sum over $S$ of the importance of its elements.

To obtain an estimate of $L$ using SIS, we again traverse a single path from the root to a leaf of the decision tree. However, at each decision point, instead of choosing from among the maximal elements uniformly at random, we choose a maximal element with probability proportional to its importance.

To be more precise, if $m$ is a maximal element and $M$ is the set of all currently maximal elements, we choose $m$ from $M$ with probability $r(m)/r(M)$. The factor multiplied into our estimate at that branch point is then $r(M)/r(m)$ rather than $|M|$, and our estimate of $L$ is the product over each chosen $m$ of $r(M)/r(m)$.

For example, in the decision tree shown in Figure 1, the highlighted path gives an estimate of

$$\frac{r(a)+r(b)}{r(b)} \cdot \frac{r(a)+r(d)}{r(a)} \cdot \frac{r(c)+r(d)}{r(d)} \cdot \frac{r(c)}{r(c)}$$

Because the probability of obtaining any given linear extension is the reciprocal of the estimate associated with that linear extension, when we take the expected value of the estimates, each linear extension contributes exactly 1 to the sum, and so the expected value of the estimates is $L$. Hence SIS gives an unbiased estimate for $L$.

Pseudocode for finding a single estimate using this method appears in Algorithm 1. For a poset with $n$ elements, the outer loop runs exactly $n$ times, while the inner loops run at most $n$ times. Finding maximal elements can also be accomplished in linear time. Hence the complexity of the algorithm is $O(n^2)$.
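As an illustration of Algorithm 1's structure, the following Python sketch shows one way the loop might look; the names, the edge-list convention (pairs (u, v) with u preceding v, so maximal elements have out-degree zero), and passing the importance function as a callable are our assumptions, not the authors' code:

```python
import random

def sis_estimate(n, edges, r, rng=random):
    """One SIS estimate of the number of linear extensions L.
    At each step a maximal element m is chosen with probability
    r(m)/r(M), and the factor r(M)/r(m) enters the estimate."""
    outdeg = [0] * n
    pred = [[] for _ in range(n)]
    for u, v in edges:
        outdeg[u] += 1
        pred[v].append(u)
    maximal = [v for v in range(n) if outdeg[v] == 0]
    estimate = 1.0
    for _ in range(n):
        weights = [r(m) for m in maximal]
        total = sum(weights)                       # r(M)
        m = rng.choices(maximal, weights=weights)[0]
        estimate *= total / r(m)                   # factor r(M)/r(m)
        maximal.remove(m)
        for u in pred[m]:
            outdeg[u] -= 1
            if outdeg[u] == 0:
                maximal.append(u)
    return estimate
```

With a constant importance function this reduces to Knuth's estimator; averaging many samples gives an unbiased estimate of L.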

## 4 Variance

Here we calculate the relative variance of our estimates. Note that by “relative variance” we mean the ratio of the variance to the square of the mean. This is also the square of the coefficient of variation. We denote this quantity by $RV(P)$ for a poset $P$.

The relative variance of our algorithm is given explicitly by the following theorem.

###### Theorem 4.1.

For a poset $P$ with $L$ linear extensions, the relative variance of the estimates given by Algorithm 1 is

$$RV(P) = \left\langle \frac{f_P(\lambda)}{L} \right\rangle_u - 1$$

where $f_P(\lambda)$ is the estimate associated with linear extension $\lambda$ and $\langle \cdot \rangle_u$ denotes the mean under the uniform distribution over all linear extensions.

###### Proof.

Let $\Lambda(P)$ be the set of all linear extensions of poset $P$, let $f_P(\lambda)$ be the SIS estimate associated with extension $\lambda$, and let $L = |\Lambda(P)|$. Note that $L$ is also the expected value of $f_P(\lambda)$.

Recall that the probability of selecting extension $\lambda$ is precisely $1/f_P(\lambda)$. Hence

$$E[f_P(\lambda)^2] = \sum_{\lambda \in \Lambda(P)} f_P(\lambda)^2 \cdot \frac{1}{f_P(\lambda)} = \sum_{\lambda \in \Lambda(P)} f_P(\lambda)$$

Then, using our definition of relative variance, we have

$$\begin{aligned}
RV(P) &= \frac{E[f_P(\lambda)^2] - E[f_P(\lambda)]^2}{E[f_P(\lambda)]^2} = \frac{E[f_P(\lambda)^2]}{L^2} - 1\\
&= \frac{1}{L^2}\left(\sum_{\lambda \in \Lambda(P)} f_P(\lambda)\right) - 1\\
&= \frac{1}{L}\left(\sum_{\lambda \in \Lambda(P)} \frac{f_P(\lambda)}{L}\right) - 1 = \left\langle \frac{f_P(\lambda)}{L} \right\rangle_u - 1
\end{aligned}$$

We can also express the relative variance as a recursive function of the relative variance of certain sub-posets, as detailed in the following theorem.

###### Theorem 4.2.

For a poset $P$, let $M$ be the set of maximal elements of $P$, and for $m \in M$, let $L_m$ be the number of linear extensions of $P$ that begin with $m$. Then the relative variance of the estimates produced by Algorithm 1 using importance function $r$ is given by

$$RV(P) + 1 = \frac{r(M)}{L^2} \sum_{m \in M} \frac{L_m^2}{r(m)} \left( RV(P \setminus m) + 1 \right)$$
###### Proof.

First, observe that the linear extensions of $P$ that begin with a particular $m \in M$ are equivalent to the extensions in $\Lambda(P \setminus m)$, but with $m$ appended to the beginning of each extension in $\Lambda(P \setminus m)$.

Furthermore, if $\lambda$ is an extension in $\Lambda(P)$ that begins with $m$, then the estimate produced by Algorithm 1 for $\lambda$ in $P$ is related to the estimate for the equivalent extension, call it $\lambda'$, in $P \setminus m$, by the following formula.

$$f_P(\lambda) = \frac{r(M)}{r(m)} f_{P \setminus m}(\lambda') \tag{1}$$

Note that every extension $\lambda \in \Lambda(P)$ begins with some element $m \in M$, and so for each $\lambda \in \Lambda(P)$, there is a $\lambda' \in \Lambda(P \setminus m)$ such that equation (1) holds.

In the proof of Theorem 4.1 we showed that

$$RV(P) + 1 = \frac{1}{L^2} \sum_{\lambda \in \Lambda(P)} f_P(\lambda) \tag{2}$$

Then, by substituting equation (1) into equation (2) and manipulating the summation expressions, we get

$$\begin{aligned}
RV(P) + 1 &= \frac{1}{L^2} \sum_{m \in M} \sum_{\lambda \in \Lambda(P \setminus m)} \frac{r(M)}{r(m)} f_{P \setminus m}(\lambda)\\
&= \frac{r(M)}{L^2} \sum_{m \in M} \frac{1}{r(m)} \sum_{\lambda \in \Lambda(P \setminus m)} f_{P \setminus m}(\lambda)\\
&= \frac{r(M)}{L^2} \sum_{m \in M} \frac{L_m^2}{r(m)} \cdot \frac{1}{L_m^2} \sum_{\lambda \in \Lambda(P \setminus m)} f_{P \setminus m}(\lambda)
\end{aligned} \tag{3}$$

Theorem 4.1 tells us that

$$RV(P \setminus m) + 1 = \frac{1}{L_m^2} \sum_{\lambda \in \Lambda(P \setminus m)} f_{P \setminus m}(\lambda)$$

Hence, from the last line of equation (3), we have

$$RV(P) + 1 = \frac{r(M)}{L^2} \sum_{m \in M} \frac{L_m^2}{r(m)} \left( RV(P \setminus m) + 1 \right)$$

This recursive form of the relative variance suggests a way to bound it, which is described in our next theorem.

###### Theorem 4.3.

Let $\mathcal{P}_i$ be the set of all posets of size $i$. Let $A_i$ be given by

$$A_i := \max_{P \in \mathcal{P}_i} \left( \max_{m \in M} \frac{r(M)}{r(m)} \cdot \frac{L_m}{L} \right)$$

where $M$, $L_m$, and $L$ are defined as before with respect to $P$. Then for all posets $P$ of size $n$, the relative variance of the estimates produced by Algorithm 1 using importance function $r$ is bounded by

$$RV(P) \le A_1 A_2 \cdots A_n - 1$$
###### Proof.

The proof is by induction on the size $n$ of the poset $P$.

For the basis step, consider a poset of size 1. The single element, call it $m$, is also the only maximal element, and so $r(M) = r(m)$ and $L_m = L = 1$. Thus $A_1 = 1$. Since this poset has only one possible linear extension, the algorithm always produces the same estimate, making the relative variance zero. Thus the theorem statement is true for $n = 1$.

Assume the theorem statement is true for all posets of size $n-1$ and less. Then for a poset $P$ of size $n$, the theorem statement holds for the sub-posets $P \setminus m$, where $m \in M$. That means

$$RV(P \setminus m) + 1 \le A_1 A_2 \cdots A_{n-1}$$

Then, beginning with Theorem 4.2 and applying the inductive hypothesis and the definition of $A_n$, we have

$$RV(P) + 1 = \frac{r(M)}{L^2} \sum_{m \in M} \frac{L_m^2}{r(m)} \left( RV(P \setminus m) + 1 \right) \le A_1 \cdots A_{n-1} \sum_{m \in M} \frac{r(M)}{r(m)} \cdot \frac{L_m}{L} \cdot \frac{L_m}{L} \le A_1 \cdots A_{n-1} A_n \sum_{m \in M} \frac{L_m}{L}$$

Since every linear extension begins with exactly one maximal element, $\sum_{m \in M} L_m = L$, so

$$\sum_{m \in M} \frac{L_m}{L} = 1$$

Hence,

$$RV(P) + 1 \le A_1 A_2 \cdots A_n$$

From Theorem 4.3, we can see that the ideal importance function is $r(m) = L_m$; that is, the importance function which maps each maximal element to the number of extensions that begin with that element. This importance function would give us $A_i = 1$ for all $i$, making $RV(P) = 0$ for all posets $P$.

Of course, this ideal importance function is impossible to achieve, since if we knew its values, we would not need an estimation algorithm. In Section 5, we discuss more practical importance functions.

## 5 The Importance Function

If the importance function is uniform across all elements at each decision point, then our algorithm is simply Knuth’s algorithm again. On the other hand, if a good importance function is available, one that reflects the actual structure of the decision tree, excellent results can sometimes be obtained, as in the permanent and reliability polynomial applications cited earlier. Here we present two importance functions, each with its own strengths and weaknesses, which we will discuss.

### 5.1 The Number of Descendants

For the first importance function, we define $r(v)$ to be $d(v)$, the number of descendants of $v$, including $v$ itself. An example is shown in Figure 2.

The idea for this importance function came from the observation that each vertex must appear in a linear extension before all of its descendants, which themselves have fewer descendants. In fact, if we order the vertices by decreasing number of descendants, we obtain a linear extension.

We also obtain a useful lower bound from this importance function. The sum over the maximal elements of the number of their descendants must always be at least the number of vertices which have not yet been chosen; hence a lower bound for our samples, and so for $L$ itself, is

$$|\Lambda(P)| \ge \frac{n!}{d(v_1)\,d(v_2)\cdots d(v_n)}$$

Here the denominator is the product of $d(v)$ over all $v \in P$, and the numerator reflects the observation that every $v$ is processed exactly once when generating a sample.

If the poset can be represented by a DAG that is a forest, then no maximal elements share descendants, and so every sample is equal to this lower bound, which is therefore exact. In fact, these observations serve as an alternative proof of the formula for the number of linear extensions of a forest. We also note in passing that this gives a way to sample from linear extensions of a forest uniformly at random.
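As a sanity check on the forest formula, the following is our own illustration (not the authors' code); edges (u, v) mean u precedes v, the forest is given as child lists, and d(v) is taken to be the size of v's subtree, including v:

```python
from itertools import permutations
from math import factorial

def exact_extensions(n, edges):
    """Brute force: count permutations of {0,...,n-1} in which u
    appears before v for every edge (u, v)."""
    count = 0
    for perm in permutations(range(n)):
        pos = {v: i for i, v in enumerate(perm)}
        if all(pos[u] < pos[v] for u, v in edges):
            count += 1
    return count

def forest_count(n, children):
    """Hook-length-style formula for a forest: n! divided by the
    product of the subtree sizes d(v)."""
    d = [0] * n
    def subtree_size(v):
        d[v] = 1 + sum(subtree_size(c) for c in children[v])
        return d[v]
    roots = set(range(n)) - {c for cs in children for c in cs}
    for root in roots:
        subtree_size(root)
    prod = 1
    for v in range(n):
        prod *= d[v]
    return factorial(n) // prod
```

For the forest with edges (0,1), (0,2), (3,4), both the brute-force count and the formula give 20.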

In addition to a lower bound, this importance function also gives us an upper bound for each sample. Consider that when constructing a linear extension, each element of the poset becomes maximal exactly once. An element $v$ which was not maximal in the original poset becomes maximal when its last remaining ancestor, together with that ancestor's edge to $v$, is deleted. If we collect this edge for each element of the poset, we have a spanning forest, $F$, of the poset $P$.

The number of linear extensions of this forest is clearly an upper bound for $L$, and can be calculated exactly as

$$|\Lambda(F)| = \frac{n!}{d_F(v_1)\,d_F(v_2)\cdots d_F(v_n)}$$

where $d_F(v)$ is the number of descendants of $v$ in $F$. Hence this upper bound depends on which forest we construct.

### 5.2 The Available Spaces Quotient

The second importance function we will discuss is

$$r(v) = \frac{i + d(v) - 1}{i - d(v) + 1}$$

where $i$ is the number of elements that have not yet been added to the linear extension.

Unlike the first importance function, this importance function changes depending on how far along in constructing the linear extension we are. This clearly adds to the running time of the algorithm, but it does not increase the order of its complexity.

The motivation for this function came from the intuition that in addition to favoring elements with more descendants, we should also favor elements with fewer available spaces left. At any point during the construction of a linear extension, if there are $i$ elements left to be added to the extension, then the number of available spaces for element $v$ is $i - d(v) + 1$, since $v$ must be chosen before all of its proper descendants.

By placing the formula for the number of available spaces in the denominator, the importance of an element increases dramatically as the number of spaces available for that element decreases. The particular formula for the numerator, on the other hand, was chosen as a result of numerical experimentation.
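In code, the available spaces quotient is a simple function of the current step (the function name and two-argument form are our own):

```python
def asq_importance(i, d_v):
    """Available spaces quotient for an element v with d_v descendants
    (including v itself) when i elements remain to be placed.  The
    denominator i - d_v + 1 is the number of positions still open for v."""
    return (i + d_v - 1) / (i - d_v + 1)
```

When d_v = i, only one position remains for v and its importance peaks; when d_v = 1, the function equals 1.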

This importance function, unlike the first, does not give exact estimates for any known set of posets, nor has it given us any bounds on the number of linear extensions. Its sole virtue is that it significantly reduces the observed variance of estimates in numerical testing. See Section 7 for numerical results.

## 6 Improvements on the Algorithm

Here we discuss a method of reducing the variance of our estimates by modifying the way in which we construct linear extensions.

Consider a DAG $D$ that, when undirected, consists of several connected components $D_1, D_2, \ldots, D_k$. Each linear extension of $D$ can be obtained by first partitioning the $|D|$ positions into sets of sizes $|D_1|, |D_2|, \ldots, |D_k|$, and then filling the positions selected for each component with a linear extension of that component. Hence

$$|\Lambda(D)| = \frac{|D|!}{|D_1|!\,|D_2|!\cdots|D_k|!}\,|\Lambda(D_1)| \cdot |\Lambda(D_2)| \cdots |\Lambda(D_k)|$$

This suggests a recursive algorithm whose pseudocode appears in Algorithm 2.
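The recombination step used by this recursion can be sketched as follows (the function name is ours):

```python
from math import factorial

def combine_components(sizes, counts):
    """|Lambda(D)| from component sizes |D_i| and per-component
    extension counts |Lambda(D_i)|: a multinomial coefficient times
    the product of the per-component counts."""
    result = factorial(sum(sizes))
    for s in sizes:
        result //= factorial(s)   # exact: multinomial coefficients are integers
    for c in counts:
        result *= c
    return result
```

For two disjoint chains of sizes 2 and 3 (one extension each), the formula gives 5!/(2! 3!) = 10 interleavings.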

The order of the time complexity for Algorithm 2 is, at worst, $O(n^3)$. This is because we have added the additional step of searching for connected components, which executes $n$ times and in the worst case takes quadratic time in the size of the poset being searched. See Section 7 for numerical results on the variance of Algorithm 2.

## 7 Numerical Results

Numerical tests of Algorithm 1 were implemented in C++ using a sparse representation of the posets. Given the poset elements $v_1, \ldots, v_n$, for each pair of elements $v_i$ and $v_j$ with $i < j$, the relation between $v_i$ and $v_j$ was given a 20% probability to exist, using a pseudorandom number generator. The posets were then transitively completed.

For each tested value of $n$, we generated posets in this manner, and SIS estimates were performed on each poset. The relative variance of the estimates was calculated for each poset, and the results were averaged for each value of $n$.

The results for the different importance functions were of differing orders of magnitude; therefore, they are compared in Figure 3 on a log-log scale. Linearity on a log-log scale suggests a power-function relationship in which the exponent is the slope of the line.

Figure 3: Comparison of the average relative variance of the three importance functions, on a log-log scale.

Numerical tests of Algorithm 2, the Recursive Connected Components algorithm, were performed in the same manner as for Algorithm 1. The results in Figure 4 show a modest reduction in the average relative variance for each of the three importance functions when recursion is used. The relative size of this reduction appears to diminish as the poset size increases.

Figure 4: Comparison of the Recursive Connected Components algorithm with the original algorithm for the three importance functions, on a log-log scale.

These numerical results show that the variance of our method does not grow too quickly, and that our importance functions significantly reduce the variance as compared to a uniform importance function.

## Acknowledgements

We thank Myra Deng and Amanda Crawford for work done on early versions of the algorithm.

## References

•  A.V. Aho, M.R. Garey, and J.D. Ullman, The transitive reduction of a directed graph, SIAM J. Comput. 1 (1972), no. 2, 131–137.
•  I. Beichl and F. Sullivan, Approximating the permanent via importance sampling with applications to the dimer covering problem, Journal of Computational Physics 149 (1999), 128–147.
•  Graham Brightwell and Peter Winkler, Counting linear extensions, Order 8 (1991), no. 3, 225–242.
•  Russ Bubley and Martin Dyer, Faster random generation of linear extensions, Discrete Mathematics 201 (1999), no. 1, 81–88.
•  David Harris, Francis Sullivan, and Isabel Beichl, Fast sequential importance sampling to estimate the graph reliability polynomial, Algorithmica 68 (2014), no. 4, 916–939.
•  Arthur B. Kahn, Topological sorting of large networks, Commun. ACM 5 (1962), no. 11, 558–562.
•  Donald E. Knuth, Estimating the efficiency of backtrack programs, Mathematics of Computation 29 (1975), no. 129, 121–136.
•  David Bruce Wilson, Mixing times of lozenge tiling and card shuffling Markov chains, The Annals of Applied Probability 14 (2004), no. 1, 274–325.