Ranked Enumeration of Minimal Triangulations
Tree decompositions facilitate computations on complex graphs by grouping vertices into bags interconnected in an acyclic structure; hence their importance in a plethora of problems such as query evaluation over databases and inference over probabilistic graphical models. Different applications take varying benefits from different tree decompositions, and hence, measure them by diverse (sometimes complex) cost functions. For generic cost functions (such as width or fill-in), an optimal tree decomposition can be computed in some cases, notably when the number of minimal separators is bounded by a polynomial (due to Bouchitté and Todinca); we refer to this assumption as “poly-MS.” Yet, in general, finding an optimal tree decomposition is computationally intractable even for these cost functions, and approximations or heuristics are commonly used. Furthermore, the generic cost functions hardly cover the benefit measures needed in practice. Therefore, it has recently been proposed to devise algorithms for enumerating many decomposition candidates for applications to select from using specialized, or even machine-learned, cost functions.
We present the first algorithm for enumerating the minimal triangulations of a graph by increasing cost, for a wide class of cost functions. Consequently, we get ranked enumeration of the (non-redundant) tree decompositions of a graph, for a class of cost functions that substantially generalizes the above generic ones. On the theoretical side, we establish the guarantee of polynomial delay if poly-MS is assumed, or if we are interested only in decompositions of a width bounded by a constant. Lastly, we describe an experimental evaluation on graphs of various domains (join queries, Bayesian networks and random graphs), and explore both the applicability of the poly-MS assumption and the performance of our algorithm relative to the state of the art.
A tree decomposition of a graph $G$ is a tree $T$ such that each vertex of $T$ is associated with a bag of vertices of $G$, every edge of $G$ appears in at least one bag, and every vertex of $G$ occurs in a connected subtree of $T$. Tree decompositions are useful in common scenarios where problems are intractable on general structures, but are nevertheless tractable on acyclic ones. A beneficial tree decomposition is one with properties that allow for efficient computation. This benefit is typically estimated by a cost function, the most popular being the width, which is the cardinality of the largest bag (minus one), and the fill-in, which is the number of missing edges among bag neighbors. The generalization to hypergraphs is that of a generalized hypertree decomposition, which is a tree decomposition of the primal graph (which has an edge between every two vertices that share a hyperedge), together with a coverage of each bag by hyperedges, giving rise to specialized costs (Gottlob et al., 2005b) such as hypertree width (Gottlob et al., 2002), generalized hypertree width (Gottlob et al., 2009), and fractional hypertree width (Marx, 2010). The applications of tree decompositions and generalized hypertree decompositions include optimization of join queries in databases (Tu and Ré, 2015; Gottlob et al., 2005a), solvers for constraint satisfaction problems (Kolaitis and Vardi, 2000), RNA analysis in bioinformatics (Zhao et al., 2006), computation of Nash equilibria in game theory (Gottlob et al., 2005a), inference in probabilistic graphical models (Lauritzen and Spiegelhalter, 1988), and weighted model counting (Kenig and Gal, 2015).
Computing an optimal tree decomposition is NP-hard for the classic cost measures, including the ones aforementioned. Therefore, heuristic algorithms are often used (Berry et al., 2002; Berry et al., 2006a). But even regardless of the computational hardness, applications often require specialized costs that are not covered by the classics. For instance, in the context of weighted model counting there are costs associated with the “CNF-tree” of the formula (Kenig and Gal, 2015; Gottlob et al., 2005b). In the work of Kalinsky et al. (Kalinsky et al., 2017) on database join optimization, the execution cost is dominated by the effectiveness of the adhesions (intersection of neighboring bags) for caching, particularly the associated skew. They show real-life scenarios where isomorphic tree decompositions (of minimum width) feature orders-of-magnitude difference in performance. Abseher et al. (Abseher et al., 2017) designed a machine-learning framework to learn the cost function of a tree decomposition in various problems, using various features of the tree decomposition.
Motivated by the above phenomena, Carmeli et al. (Carmeli et al., 2017) have embarked on the challenge of enumerating tree decompositions; that is, generating tree decompositions one by one so that an application can stop the enumeration at any time and select the decomposition that best suits its needs. As they point out, in enumeration it is essential to avoid redundancy: it does not make sense to generate a tree decomposition that is useless or clearly subsumed by another. For example, if a graph is already a tree, then there is no need to further group its vertices. Hence, following Carmeli et al. (Carmeli et al., 2017), we consider the task of enumerating the proper tree decompositions, which are intuitively the ones that cannot be improved by splitting a bag or removing it altogether. They have shown that the proper tree decompositions are precisely the clique trees of the minimal triangulations. A triangulation of a graph $G$ is a chordal graph obtained from $G$ by adding edges, called fill edges. A triangulation is minimal if no triangulation has a strict subset of the fill edges. In fact, Carmeli et al. (Carmeli et al., 2017) proved that for enumerating tree decompositions, it suffices to enumerate the minimal triangulations.
While algorithms for generating pools of tree decompositions have been proposed in the past for small graphs (representing database queries) (Tu and Ré, 2015), Carmeli et al. (Carmeli et al., 2017) have presented the first algorithm that has both completeness and efficiency guarantees; that is, it can generate all minimal triangulations (and by implication all proper tree decompositions), and it does so in incremental polynomial time, which means that the time between producing the $N$th result and the $(N+1)$st result is polynomial in $N$ and in the size of the input (Johnson et al., 1988). Nevertheless, there can be exponentially many minimal triangulations, and an effective enumeration algorithm needs to produce earlier the triangulations that are likely to be of low cost. Ideally, we would like the algorithm to enumerate the minimal triangulations by increasing relevant cost such as width (of some version) or fill-in. Carmeli et al. (Carmeli et al., 2017) use heuristics to affect the enumeration order, but provide no guarantees. Of course, without making assumptions they could not guarantee efficient ranked enumeration, since it is already intractable to compute the first (best) triangulation.
Yet, in some important classes of graphs there is a polynomial-time algorithm for computing a tree decomposition of a minimum width and/or fill-in. These include the chordal and weakly chordal graphs, interval graphs, circular-arc graphs, and cographs. One of the most significant properties of graph classes that allow for polynomial-time computation of optimal tree decompositions is due to Bouchitté and Todinca (Bouchitté and Todinca, 2001; Bouchitté and Todinca, 2002): having a polynomial upper bound (in the size of the graph) on the number of minimal separators. All of the above graph classes satisfy this property (see (Fomin et al., 2015)). A minimal separator of a graph $G$ is a set $S$ of nodes such that for some nodes $u$ and $v$ it is the case that $S$ separates $u$ from $v$, but no proper subset of $S$ does so. (See Section 5 for the formal definition.) We refer to this property as poly-MS. Various problems have been studied in the context of the poly-MS assumption (Fomin et al., 2015), including graph isomorphism (Otachi and Schweitzer, 2014).
The decomposition algorithm of Bouchitté and Todinca (Bouchitté and Todinca, 2001; Bouchitté and Todinca, 2002) consists of two main steps. First, they construct the set of minimal separators of the input graph, for example using the algorithm of Berry et al. (Berry et al., 1999), and from these compute the set of all potential maximal cliques (which are essentially the bags of the proper tree decompositions) (Bouchitté and Todinca, 2002). Second, they use the potential maximal cliques in order to find an optimal triangulation. In fact, their algorithm has two variants—one for minimal width and one for minimal fill-in. The second step has been later generalized to allow for positive weights on bags (in the case of width) and edges (in the case of fill) by Furuse and Yamazaki (Furuse and Yamazaki, 2014), again presenting two corresponding variants of their algorithm.
Our first contribution is a generalization of the concepts of width and fill-in to general cost functions over tree decompositions. These cost functions satisfy two properties. First, they assign the same cost to tree decompositions with the same bags; hence, these are essentially costs over the set of bags. Second, and more importantly, they are monotonic in the following (informal) sense. Suppose that we cut a tree decomposition $d$ along an edge, and replace one of the sides with an alternative subtree (which is a tree decomposition of a subgraph of the original graph), resulting in a tree decomposition $d'$; if the alternative subtree does not cost more than the one it replaced, then the cost of $d'$ is no greater than that of $d$. We call such a cost function split monotone, and refer the reader to Section 3 for the precise definition. Importantly, split-monotone cost functions generalize existing costs such as fill-in, width and generalized/fractional hypertree width, as well as the weighted width and fill-in of Furuse and Yamazaki (Furuse and Yamazaki, 2014). Moreover, we can come up with various motivated split-monotone costs that are not among the classic ones, such as the sum over the (exponents of the) bag cardinalities and linear combinations of width and fill-in. We present a generalization of the algorithm of Bouchitté and Todinca (Bouchitté and Todinca, 2001) to general split-monotone cost functions. As we explain later, the importance of supporting general cost functions is not just for the sake of richer costs; even if we are interested just in width or fill-in, we need the flexibility of the cost function in order to incorporate constraints that we later use to devise our algorithm for ranked enumeration.
Our main theoretical contribution is an algorithm that enumerates minimal triangulations by increasing cost, for any split-monotone cost function that is polynomial-time computable (e.g., the aforementioned ones). We provide two variants of the algorithm, each yielding a different complexity result. The first variant enumerates all minimal triangulations, and does so with polynomial delay if the input is from a poly-MS class of graphs. The second enumerates all minimal triangulations of a bounded width, and it does so with polynomial delay if the bound on the width is a fixed constant (that affects the degree of the polynomial). Polynomial delay (Johnson et al., 1988) means that the time between every two consecutive answers is polynomial in the size of the input (graph), a guarantee that is stronger than incremental polynomial time. Due to the previously discussed connection between proper tree decompositions and minimal triangulations, we get algorithms with the same guarantees for the enumeration of proper tree decompositions. Observe that these algorithms imply polynomial-time procedures for computing the top-$k$ minimal triangulations and/or proper tree decompositions. To the best of our knowledge, these are the first enumeration algorithms for minimal triangulations (and proper tree decompositions) with completeness, efficiency, and order guarantees.
Our technique for ranked enumeration deploys the generic procedure of Lawler-Murty (Lawler, 1972; Murty, 1968) for ranked enumeration. This procedure can be described abstractly as follows. There is a set of items, and the goal of the procedure is to enumerate itemsets by increasing cost. To do so, the procedure assumes that one can compute in polynomial time a lowest-cost itemset subject to constraints, and there are two types of constraints: inclusion constraint—a specific item needs to be present in the itemset, and an exclusion constraint—the item needs to be absent. So, for our deployment, we need to define what the items and itemsets are, and we need to solve the corresponding constrained optimization problem.
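The abstract procedure can be sketched as follows. This is a minimal illustration, not the implementation used in this paper: the `optimize` oracle, the brute-force oracle, and the toy weights are all hypothetical, and the sketch assumes that no valid itemset strictly contains another (as is the case for maximal sets).

```python
import heapq
from itertools import combinations, count


def lawler_murty(optimize):
    """Yield (cost, itemset) pairs by nondecreasing cost.

    `optimize(include, exclude)` is a hypothetical oracle: it returns
    (cost, itemset) for a lowest-cost itemset that contains every item of
    `include` and no item of `exclude`, or None if there is none.
    Assumption: no valid itemset strictly contains another one.
    """
    tie = count()  # tie-breaker, so the heap never compares itemsets
    first = optimize(frozenset(), frozenset())
    if first is None:
        return
    heap = [(first[0], next(tie), first[1], frozenset(), frozenset())]
    while heap:
        cost, _, best, include, exclude = heapq.heappop(heap)
        yield cost, best
        # Split the rest of this subproblem: the j-th part keeps the first
        # j-1 free items of `best` and forbids the j-th one.
        forced = set(include)
        for item in sorted(best - include):
            sub_include = frozenset(forced)
            sub_exclude = exclude | {item}
            result = optimize(sub_include, sub_exclude)
            if result is not None:
                heapq.heappush(
                    heap,
                    (result[0], next(tie), result[1], sub_include, sub_exclude))
            forced.add(item)


def brute_force_oracle(itemsets, cost):
    """Toy oracle that scans an explicit list of valid itemsets."""
    def optimize(include, exclude):
        fits = [s for s in itemsets if include <= s and not (s & exclude)]
        if not fits:
            return None
        best = min(fits, key=lambda s: (cost(s), sorted(s)))
        return cost(best), best
    return optimize


# Toy instance: the valid itemsets are all 2-element subsets of {a, b, c, d},
# and the cost of an itemset is the sum of its item weights.
weights = {"a": 1, "b": 2, "c": 4, "d": 8}
pairs = [frozenset(p) for p in combinations(weights, 2)]
oracle = brute_force_oracle(pairs, lambda s: sum(weights[i] for i in s))
ranked = list(lawler_murty(oracle))
```

Each subproblem is described only by its inclusion and exclusion constraints, which is why the whole procedure reduces to the constrained optimization problem.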
Here, we use a result by Parra and Scheffler (Parra and Scheffler, 1997) who show that a minimal triangulation is fully identified by its set of minimal separators. Moreover, due to Rose (Rose, 1970) it is known that a minimal triangulation has fewer minimal separators than nodes. Hence, to adopt Lawler-Murty we define items as node sets, and itemsets as the collections of minimal separators of the minimal triangulations. To efficiently solve the constrained optimization problem, we show that inclusion and exclusion constraints on minimal separators can be compiled into any split-monotone cost function so that the resulting cost remains split monotone. Furthermore, if the original cost function can be computed in polynomial time, then so can the new cost function with the constraints compiled in.
Finally, we describe an implementation of our algorithm and an experimental study. We conduct experiments over the datasets of Carmeli et al. (Carmeli et al., 2017) that consist of three types of graphs: probabilistic graphical models (from the 2011 Probabilistic Inference Challenge), database queries (TPC-H), and random (Erdős-Rényi) graphs. We conduct a comparison of our algorithm against the enumeration of Carmeli et al. (Carmeli et al., 2017) on both the execution time and the quality (width/fill) of the generated triangulations. In addition, we explore the validity of the poly-MS assumption on our datasets; that is, we provide statistics on the number of minimal separators, and explore the portion of the instances where this number is “manageable.”
The remainder of the paper is organized as follows. We first present preliminary definitions and terminology in Section 2. Then, in Section 3 we describe the notion of split monotonicity, and we give our main theoretical results in Section 4. The proofs of these results, and in particular the algorithms that realize the results, are presented in Sections 5–7. Specifically, Section 5 provides background on the central concepts of minimal separators and potential maximal cliques, Section 6 presents our algorithm for computing a minimum-cost minimal triangulation, and Section 7 discusses the adaptation of Lawler-Murty to our enumeration. Finally, we describe our implementation and experimental study in Section 8, and conclude in Section 9.
We begin by introducing the basic notation, terminology and formal concepts that we use throughout the paper.
2.1. Graphs and Cliques
All the graphs in this paper are undirected. We denote by $V(G)$ and $E(G)$ the set of vertices and edges, respectively, of a graph $G$. An edge in $E(G)$ is a pair $\{u,v\}$ of distinct vertices in $V(G)$.
A set $C$ of vertices of a graph $G$ is a clique (of $G$) if every two vertices in $C$ are connected by an edge of $G$. The set $C$ is a maximal clique (of $G$) if $C$ is not strictly contained in any other clique of $G$. We denote by $\mathrm{Clq}(G)$ and $\mathrm{MaxClq}(G)$ the sets of all cliques and maximal cliques of $G$, respectively.
2.2. Tree Decompositions
A tree decomposition of a graph $G$ is a pair $(T,\beta)$, where $T$ is a tree and $\beta$ is a function that maps every vertex of $T$ to a set of nodes of $G$, so that all of the following hold.
Vertices are covered: for every vertex $u$ of $G$ there is a vertex $v$ of $T$ such that $u \in \beta(v)$.
Edges are covered: for every edge $e$ of $G$ there is a vertex $v$ of $T$ such that $e \subseteq \beta(v)$.
The junction-tree property: for all vertices $u$ and $v$ of $T$, the intersection $\beta(u) \cap \beta(v)$ is contained in $\beta(w)$ for every vertex $w$ along the path between $u$ and $v$.
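The three conditions can be checked mechanically. The following is a small sketch under our own representation choices (the function names, the dictionary of bags, and the example on a 4-cycle are illustrative assumptions); the junction-tree property is tested in its equivalent form that, for every graph vertex, the tree nodes whose bags contain it form a connected subtree.

```python
def _connected(nodes, adj):
    """True if `nodes` induce a connected subgraph of the tree given by `adj`."""
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y in nodes and y not in seen:
                seen.add(y)
                stack.append(y)
    return seen == nodes


def is_tree_decomposition(graph_vertices, graph_edges, tree_edges, bags):
    """`bags` maps each tree node to a set of graph vertices."""
    tree_nodes = set(bags)
    adj = {t: set() for t in tree_nodes}
    for s, t in tree_edges:
        adj[s].add(t)
        adj[t].add(s)
    # The underlying structure must be a tree: connected, with n - 1 edges.
    if len(tree_edges) != len(tree_nodes) - 1 or not _connected(tree_nodes, adj):
        return False
    # Vertices are covered.
    if any(all(u not in bags[t] for t in tree_nodes) for u in graph_vertices):
        return False
    # Edges are covered.
    for u, v in graph_edges:
        if not any(u in bags[t] and v in bags[t] for t in tree_nodes):
            return False
    # Junction-tree property, via connectedness of each vertex's occurrences.
    for u in graph_vertices:
        holders = {t for t in tree_nodes if u in bags[t]}
        if not _connected(holders, adj):
            return False
    return True


# Illustrative input: the 4-cycle 1-2-3-4-1.
cycle_vertices = [1, 2, 3, 4]
cycle_edges = [(1, 2), (2, 3), (3, 4), (4, 1)]
good = is_tree_decomposition(
    cycle_vertices, cycle_edges, [("A", "B")],
    {"A": {1, 2, 3}, "B": {1, 3, 4}})
# Vertex 1 occurs in bags A and C but not in B, which lies between them.
bad = is_tree_decomposition(
    cycle_vertices, cycle_edges, [("A", "B"), ("B", "C")],
    {"A": {1, 2}, "B": {2, 3}, "C": {3, 4, 1}})
```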
Let be a graph, and let be a tree decomposition of . A set , for , is called a bag of . We denote by the set .
Let and be two tree decompositions of a graph . We say that bag-contains in if there is an injection such that for all . We say that and are bag equivalent if bag-contains and vice versa. We say that strictly subsumes if is obtained from by splitting a bag or removing it altogether. More formally, strictly subsumes if there is a mapping such that for all , and for at least one it is the case that whenever (hence, either no node is mapped to or node that is mapped to is a strict subset of ) (Carmeli et al., 2017). (Footnote 1: This is a simplified, yet equivalent, definition to that of Carmeli et al. (Carmeli et al., 2017).)
Example 2.1.
Figure 1(b) depicts five tree decompositions of the graph of Figure 1(a). Each rectangle (with rounded corners) corresponds to a node of the tree, and the bag is depicted inside the rectangle. As an example, if we denote , then is a path of three nodes (corresponding to the three rectangles), and for the top node we have .
Observe that and are bag equivalent, since they have the exact same bags (though connected differently). The tree decomposition strictly subsumes , since the latter is obtained from the former by adding to the bottom bag. Moreover, strictly subsumes , since the former is obtained from the latter by splitting the bottom bag into two: the middle and bottom nodes of . Therefore, and are not proper. We will later show that and (and, hence, ) are proper.∎
2.3. Minimal Triangulations
Let $G$ be a graph. A cycle in $G$ is a path that starts and ends with the same vertex. A chord of a cycle $c$ is an edge that connects two nodes that are non-adjacent in $c$. We say that $G$ is chordal if every cycle of length greater than three has a chord. Whether a given graph is chordal can be decided in linear time (Tarjan and Yannakakis, 1984).
A triangulation of a graph $G$ is a chordal graph $H$ that is obtained from $G$ by adding edges. The fill set of a triangulation $H$ of $G$ is the set of edges added to $G$, that is, $E(H) \setminus E(G)$. A minimal triangulation of $G$ is a triangulation $H$ of $G$ such that the fill set of $H$ does not strictly contain the fill set of any other triangulation; that is, there is no chordal graph $H'$ with $V(H') = V(G)$ and $E(G) \subseteq E(H') \subsetneq E(H)$. In particular, if $G$ is already chordal then $G$ is the only minimal triangulation of itself.
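To make these definitions concrete, here is a small sketch that tests chordality and checks minimality of a triangulation directly from the definition. It is for illustration only: the chordality test repeatedly removes a simplicial vertex (a simpler, slower alternative to the linear-time test cited above), the minimality check tries every strict subset of the fill set and is therefore exponential, and all names and example graphs are our own assumptions.

```python
from itertools import combinations


def is_chordal(vertices, edges):
    """Chordality test by repeatedly deleting a simplicial vertex,
    that is, a vertex whose remaining neighbors form a clique."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    remaining = set(vertices)
    while remaining:
        simplicial = next(
            (v for v in remaining
             if all(b in adj[a]
                    for a, b in combinations(adj[v] & remaining, 2))),
            None)
        if simplicial is None:
            return False  # no simplicial vertex, hence not chordal
        remaining.remove(simplicial)
    return True


def is_minimal_triangulation(vertices, edges, fill):
    """Check the definition: adding `fill` gives a chordal graph, and adding
    any strict subset of `fill` does not."""
    base = {frozenset(e) for e in edges}
    fill = {frozenset(e) for e in fill} - base

    def chordal_with(extra):
        return is_chordal(vertices, base | set(extra))

    if not chordal_with(fill):
        return False
    return not any(chordal_with(sub)
                   for r in range(len(fill))
                   for sub in combinations(fill, r))


# Illustrative inputs: the 4-cycle and the 5-cycle.
c4_vertices, c4_edges = [1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (4, 1)]
c5_vertices = [1, 2, 3, 4, 5]
c5_edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]
```

For instance, adding the single chord between 1 and 3 to the 4-cycle is a minimal triangulation, whereas adding both chords is a triangulation that is not minimal.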
2.3.1. Clique Trees
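Let $G$ be a graph. A clique tree of $G$ is a tree decomposition $(T,\beta)$ of $G$ such that $\beta$ is a bijection between the vertices of $T$ and the maximal cliques of $G$. In other words, $(T,\beta)$ is a clique tree of $G$ if its bags are exactly the maximal cliques of $G$ and no two bags are the same. The following is known, and recorded for later use.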
Theorem 2.2.
Example 2.3.
Continuing Example 2.1, observe that the graph of Figure 1(a) is not chordal. As evidence, it has the chordless cycle . Figure 1(c) depicts two minimal triangulations, and , of the graph of Figure 1(a). The reader can verify that and of Figure 1(b) are clique trees of and , respectively. In particular, we conclude that and are proper tree decompositions. ∎
2.4. Ranked Enumeration
An enumeration problem $P$ is a collection of pairs $(x, Y)$ where $x$ is an input and $Y$ is a finite set of answers for $x$, denoted by $P(x)$. A solver for an enumeration problem $P$ is an algorithm that, when given an input $x$, produces (or prints) a sequence of answers such that every answer in $P(x)$ is printed precisely once. A solver for an enumeration problem is also referred to as an enumeration algorithm.
Johnson, Papadimitriou and Yannakakis (Johnson et al., 1988) introduced several different notions of efficiency for enumeration algorithms, and we recall these now. Let $P$ be an enumeration problem, and let $A$ be a solver for $P$. We say that $A$ runs in:
polynomial total time if the total execution time of $A$ is polynomial in $|x| + |P(x)|$;
polynomial delay if the time between printing every two consecutive answers is polynomial in $|x|$;
incremental polynomial time if, after printing a sequence $N$ of answers, the time to print the next answer is polynomial in $|x| + |N|$, where $|N|$ is the size of the representation of $N$.
Observe that a solver that enumerates with polynomial delay also enumerates with incremental polynomial time, which, in turn, implies polynomial total time.
Let $P$ be an enumeration problem. A cost function for $P$ is a function $c$ that associates a numerical cost $c(x,y)$ to each input $x$ and answer $y$ for $x$. A solver $A$ for $P$ is said to enumerate by increasing $c$, where $c$ is a cost function for $P$, if for every two answers $y_1$ and $y_2$ produced by $A$, if $y_1$ is produced before $y_2$ then $c(x,y_1) \le c(x,y_2)$.
3. Monotone Cost Functions
By a cost function over tree decompositions we refer to a function $\kappa$ that maps a graph $G$ and a tree decomposition $d$ for $G$ to a numerical (positive, negative or zero) value $\kappa(G,d)$. In this section we define a class of such cost functions that includes many of the common costs such as width and fill. This class is defined by means of monotonicity, as we formally define next.
Let be a graph, and let be a tree decomposition of . Every edge of connects two unique subtrees of —one connected to and one connected to . Let be an edge of , let and be the two subtrees connected by , and let and be the restrictions of to and , respectively. Let and , and let and be the subgraphs of induced by the nodes in the bags of and , respectively. Then we say that splits (by ) as . The following proposition is straightforward.
Proposition 3.1.
Let be a graph, a tree decomposition of . If splits as , then is a tree decomposition of and is a tree decomposition of .
From Proposition 3.1 it follows that if is a cost function and splits as , then both and are defined. We can now define properties of cost functions.
Definition 3.2.
Let be a cost function over tree decompositions. We say that is:
invariant under bag equivalence, if for all graphs and tree decompositions and of , if and are bag equivalent then .
split monotone if for all graphs and tree decompositions and of , if and split as and , respectively, and for , then .
If is invariant under bag equivalence, then it is essentially a scoring function over the collection of bags, and in that case we say that is a bag cost.
For illustration, the following most popular cost functions are both split-monotone bag costs.
: the maximum cardinality of a bag, minus one.
: the number of edges required to saturate all bags.
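Both costs depend only on the set of bags, which the following sketch makes explicit. The function names and the representation of bags as Python sets are our own choices for illustration.

```python
from itertools import combinations


def width(bags):
    """Maximum cardinality of a bag, minus one."""
    return max(len(bag) for bag in bags) - 1


def fill_in(graph_edges, bags):
    """Number of distinct missing edges needed to make every bag a clique."""
    present = {frozenset(e) for e in graph_edges}
    needed = {frozenset(pair) for bag in bags for pair in combinations(bag, 2)}
    return len(needed - present)


# Illustrative input: the 4-cycle 1-2-3-4-1 with two different sets of bags.
cycle_edges = [(1, 2), (2, 3), (3, 4), (4, 1)]
two_bags = [{1, 2, 3}, {1, 3, 4}]
one_bag = [{1, 2, 3, 4}]
```

With the two bags, the width is 2 and only the edge between 1 and 3 is missing; the single bag has width 3 and requires both chords. A pair that is missing in several bags is counted once.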
Other such cost functions are the generalizations of width and fill-in introduced by Furuse and Yamazaki (Furuse and Yamazaki, 2014), where it is assumed that each bag has a cost , and each edge has a cost . Then, they define to be the maximal score of a bag, and to be the sum of costs of the edges required to saturate all bags. As a special case, if the graph is the primal graph of a hypergraph, then can be the minimal number of hyperedges needed to cover , or the minimal weight of a fractional edge cover of , thereby establishing the popular cost functions of hypertree width (Gottlob et al., 1999) and fractional hypertree width (Grohe and Marx, 2014).
Finally, another intuitive split-monotone bag cost is
that effectively establishes the lexicographic ordering of the width followed by the fill-in of .
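One way to obtain such a lexicographic ordering, offered here as an illustration under our own assumptions and not necessarily the exact formula intended above, is to scale the width by a factor that exceeds any possible fill-in. With $n = |V(G)|$, the fill-in of any tree decomposition is at most $\binom{n}{2} < n^2$, which suggests:

```latex
\kappa(G, d) \;=\; n^{2} \cdot \mathrm{width}(G, d) \;+\; \mathrm{fill}(G, d),
\qquad n = |V(G)| .
```

Under this choice, a decomposition of smaller width always has a smaller cost, and ties in width are broken by the fill-in. The symbols $\kappa$, $d$, and $n$ are names chosen for this illustration.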
We can then use a bag cost as a cost function over triangulations , by defining the cost as where is any clique tree of . Since is invariant under bag equivalence (being a bag cost), then the choice of does not matter. By a slight abuse of notation, we use to denote the resulting cost function over triangulations of .
4. Main Theoretical Results
In this section we present the main theoretical results of the paper. These results are upper bounds (existence of algorithms) on problems of ranked enumeration of tree decompositions and minimal triangulations. In the next two sections we will describe the algorithms that realize these results.
Recall that a graph may have an exponential number of minimal separators. Our main result holds for the case where this number is reasonable, a case that was deeply investigated in past research (Furuse and Yamazaki, 2014; Bouchitté and Todinca, 2002; Montealegre and Todinca, 2016; Liedloff et al., 2015). Formally, we consider classes $\mathcal{G}$ of graphs such that for some polynomial $p$ it is the case that the number of minimal separators of $G$ is at most $p(|V(G)|)$ for all $G \in \mathcal{G}$. We then say shortly that $\mathcal{G}$ is a poly-MS class of graphs (where “MS” stands for Minimal Separators). Later in this paper we empirically study the applicability of this assumption on real and synthetic datasets.
Before presenting our results, we recall some relevant results from the literature. Carmeli et al. (Carmeli et al., 2017) showed that, without making any assumption, one can enumerate in incremental polynomial time the set of all proper tree decompositions and the set of all minimal triangulations. Note, however, that no guarantee is made on the order of enumeration.
Theorem 4.1.
(Carmeli et al., 2017) Given a graph $G$, one can enumerate in incremental polynomial time all proper tree decompositions, and all minimal triangulations.
Parra and Scheffler (Parra and Scheffler, 1997) showed that minimal triangulations are in one-to-one correspondence with the maximal independent sets of the graph that has the minimal separators as vertices, and an edge between every two crossing separators. Combining that with results on the enumeration of maximal independent sets (Johnson et al., 1988; Cohen et al., 2008), we get that for poly-MS classes of graphs, the minimal triangulations can be enumerated with polynomial delay (again with no guarantees on the order). Moreover, as we explain in the next section, such enumeration automatically translates into an algorithm for enumerating the proper tree decompositions with polynomial delay. Hence, we get the following.
Theorem 4.2.
(see Carmeli et al., 2017) If $\mathcal{G}$ is a poly-MS class of graphs, then on graphs of $\mathcal{G}$ one can enumerate with polynomial delay all proper tree decompositions, and all minimal triangulations.
Bouchitté and Todinca showed that on poly-MS classes of graphs, a tree decomposition (or triangulation) of a minimal width or fill-in can be found in polynomial time.
Theorem 4.3.
(Bouchitté and Todinca, 2002) Let $\mathcal{G}$ be a poly-MS class of graphs. On graphs of $\mathcal{G}$ one can find in polynomial time a minimal-cost tree decomposition (or triangulation) when the cost is either the width or the fill-in.
We now turn to our results. The main result generalizes Theorems 4.2 and 4.3 in two directions. First, the enumeration is ranked. Second, the cost function is not just width or fill-in, but in fact every bag cost that is split monotone and computable in polynomial time.
Theorem 4.4.
Let $\mathcal{G}$ be a poly-MS class of graphs, and let $\kappa$ be a bag cost that is split monotone and computable in polynomial time. On graphs of $\mathcal{G}$ one can enumerate with polynomial delay all:
proper tree decompositions by increasing $\kappa$;
minimal triangulations by increasing $\kappa$.
Finally, the next result applies to general graphs, and assumes that we are interested only in tree decompositions of a bounded width. In this case, we get a ranked enumeration with polynomial delay without assuming an upper bound on the number of minimal separators.
Theorem 4.5.
Let $k$ be a fixed natural number, and let $\kappa$ be a bag cost that is split monotone and computable in polynomial time. Given a graph $G$, one can enumerate with polynomial delay all:
proper tree decompositions of $G$ of width at most $k$ by increasing $\kappa$;
minimal triangulations of $G$ of width at most $k$ by increasing $\kappa$.
5. Minimal Separators and Potential Maximal Cliques
In this section we recall some concepts and results from the literature that our enumeration algorithm (described in the next two sections) builds upon. We begin with some additional general notation.
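|$V(G)$ / $E(G)$|Vertex/edge set of $G$|
|$\mathrm{MaxClq}(G)$|Maximal cliques of $G$|
|$K(U)$|Full graph (clique) over node set $U$|
|$\mathrm{MinSep}(G)$|Minimal separators of $G$|
| |Full $S$-components of $G$|
| |Realization of $(S,C)$, that is, $G[S \cup C] \cup K(S)$|
| |Potential maximal cliques of $G$|
| |Minimal separators associated to $\Omega$ of $G$|
| |Full blocks associated to $\Omega$ in $G$|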
5.1. Additional Graph Notation
A subgraph of a graph is a graph with and . Let be a set of vertices of . We denote by the subgraph of that is induced by ; that is, is the graph with and .
Let be a graph, and a set of vertices of . We denote by the graph obtained from by removing all vertices in (along with their incident edges); that is, is the graph . A -component (of ) is a connected component of the graph . We denote the set of all -components of by , or only if is clear from the context. Recall that a connected component is a subset of such that contains a path from each node of to every other node of , and to none of the nodes outside .
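As a small illustration, the components that remain after removing a vertex set can be computed by a plain graph search. The function name and the example graph below are our own assumptions.

```python
def components_without(vertices, edges, removed):
    """Connected components of the graph after deleting the vertices in
    `removed` (along with their incident edges)."""
    removed = set(removed)
    adj = {v: set() for v in vertices if v not in removed}
    for u, v in edges:
        if u in adj and v in adj:
            adj[u].add(v)
            adj[v].add(u)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            x = stack.pop()
            if x in component:
                continue
            component.add(x)
            stack.extend(adj[x] - component)
        seen |= component
        components.append(frozenset(component))
    return components


# Illustrative input: a star with center b and leaves a, c, d.
star_vertices = ["a", "b", "c", "d"]
star_edges = [("a", "b"), ("b", "c"), ("b", "d")]
after_removing_center = components_without(star_vertices, star_edges, {"b"})
```

Removing the center leaves three singleton components, while removing a leaf leaves a single component.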
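The union of two graphs $G_1$ and $G_2$, denoted $G_1 \cup G_2$, is the graph $G$ with $V(G) = V(G_1) \cup V(G_2)$ and $E(G) = E(G_1) \cup E(G_2)$.
Let $G$ be a graph, and $U$ a set of vertices of $G$. We denote by $K(U)$ the complete graph over the vertex set $U$; that is, $K(U)$ is the graph with $V(K(U)) = U$ and $E(K(U)) = \{\{u,v\} \mid u,v \in U, u \neq v\}$ (hence, $U$ itself is a clique of $K(U)$). By saturating $U$ (in $G$) we refer to the operation of connecting every two non-adjacent vertices in $U$ by a new edge, thereby making $U$ a clique of $G$. In other words, saturating $U$ refers to the operation of replacing $G$ with $G \cup K(U)$.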
5.2. Minimal Separators
Let $G$ be a graph, and let $u$ and $v$ be vertices of $G$. A $(u,v)$-separator (w.r.t. $G$) is a set $S \subseteq V(G) \setminus \{u,v\}$ such that $u$ and $v$ belong to different connected components in $G \setminus S$; that is, $G \setminus S$ does not contain any path between $u$ and $v$ (or equivalently, every path between $u$ and $v$ visits one or more vertices of $S$). We say that $S$ is a minimal $(u,v)$-separator if no proper subset of $S$ is a $(u,v)$-separator. We say that $S$ is a minimal separator of $G$ if there are vertices $u$ and $v$ such that $S$ is a minimal $(u,v)$-separator. We denote by $\mathrm{MinSep}(G)$ the set of all minimal separators of $G$.
Let be a graph, and let and be two minimal separators of . We say that crosses , in notation , if there are vertices and in such that is a -separator. If is clear from the context, we may omit it and write simply . It is known that is a symmetric relation: if crosses then crosses (Parra and Scheffler, 1997; Kloks et al., 1997). Hence, if then we may also say that and are crossing. When and are non-crossing, then we also say that and are parallel.
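The definitions of minimal separators and of crossing can be tested directly on small graphs. The sketch below follows the definitions literally and is meant only as an illustration (names and the 5-cycle example are our own). For minimality it suffices to try removing one vertex at a time, since any superset of a separator that avoids the two endpoints is again a separator.

```python
from itertools import combinations


def _adjacency(vertices, edges):
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def _separates(adj, sep, u, v):
    """True if u and v lie in different components once `sep` is removed."""
    seen, stack = {u}, [u]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in sep and y not in seen:
                seen.add(y)
                stack.append(y)
    return v not in seen


def is_minimal_separator(vertices, edges, sep):
    """True if `sep` is a minimal (u, v)-separator for some pair u, v."""
    sep = set(sep)
    adj = _adjacency(vertices, edges)
    outside = [x for x in vertices if x not in sep]
    for u, v in combinations(outside, 2):
        if _separates(adj, sep, u, v) and all(
                not _separates(adj, sep - {x}, u, v) for x in sep):
            return True
    return False


def crosses(vertices, edges, first, second):
    """True if `first` separates two vertices of `second`."""
    first = set(first)
    adj = _adjacency(vertices, edges)
    candidates = [x for x in second if x not in first]
    return any(_separates(adj, first, u, v)
               for u, v in combinations(candidates, 2))


# Illustrative input: the 5-cycle 1-2-3-4-5-1.
c5_vertices = [1, 2, 3, 4, 5]
c5_edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]
```

In the 5-cycle, the minimal separators are exactly the pairs of non-adjacent vertices; for example, {1, 3} and {2, 4} are crossing, whereas {1, 3} and {1, 4} are parallel.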
Example 5.1.
We continue with our running example. Figure 2(a) depicts three minimal separators , and of the graph of Figure 1(a). For instance is a minimal -separator, is a minimal -separator, and is a minimal -separator. Note that is a -separator but not a minimal -separator, since a strict subset of , namely , is a -separator. Also note that and are crossing, since is a -separator (and also is a -separator).
This example shows that, albeit being “minimal,” a minimal separator can be a strict subset of another; for instance .
It can be verified that , and are the only minimal separators of . Hence . ∎
Next, we recall a few central results from the literature that are needed for our algorithm. Dirac (Dirac, 1961) has shown a characterization of chordal graphs by means of their minimal separators.
Theorem 5.2.
(Dirac (Dirac, 1961)) A graph $G$ is chordal if and only if every minimal separator of $G$ is a clique.
Parra and Scheffler (Parra and Scheffler, 1997) established the following connection between minimal triangulations and maximal sets $M$ of pairwise-parallel minimal separators. By that we mean that every two distinct members of $M$ are parallel, and moreover, every minimal separator not in $M$ is crossing at least one member of $M$.
Theorem 5.3.
(Parra and Scheffler (Parra and Scheffler, 1997)) Let $G$ be a graph.
Let $M$ be a maximal set of pairwise-parallel minimal separators of $G$, and let $H$ be obtained from $G$ by saturating each member of $M$. Then $H$ is a minimal triangulation of $G$ having $\mathrm{MinSep}(H) = M$.
Conversely, if $H$ is a minimal triangulation of $G$, then $\mathrm{MinSep}(H)$ is a maximal set of pairwise-parallel minimal separators in $G$, and $H$ is obtained from $G$ by saturating each member of $\mathrm{MinSep}(H)$.
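The first direction of the theorem can be observed on a small example. In the 5-cycle 1-2-3-4-5-1, the separators {1, 3} and {1, 4} are parallel, and each of the other three minimal separators crosses one of them, so they form a maximal pairwise-parallel set. The sketch below, with names of our own choosing, saturates both and checks by brute force that the result is a minimal triangulation.

```python
from itertools import combinations


def saturate(edges, vertex_set):
    """Edge set obtained by making `vertex_set` a clique."""
    return ({frozenset(e) for e in edges}
            | {frozenset(p) for p in combinations(vertex_set, 2)})


def is_chordal(vertices, edges):
    """Chordality test by repeatedly deleting a simplicial vertex."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    remaining = set(vertices)
    while remaining:
        simplicial = next(
            (v for v in remaining
             if all(b in adj[a]
                    for a, b in combinations(adj[v] & remaining, 2))),
            None)
        if simplicial is None:
            return False
        remaining.remove(simplicial)
    return True


c5_vertices = [1, 2, 3, 4, 5]
c5_edges = {frozenset(e) for e in [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]}
parallel_separators = [{1, 3}, {1, 4}]

triangulated = set(c5_edges)
for separator in parallel_separators:
    triangulated = saturate(triangulated, separator)
fill = triangulated - c5_edges

# Minimality by definition: no strict subset of the fill set is enough.
smaller_fill_works = any(
    is_chordal(c5_vertices, c5_edges | set(sub))
    for r in range(len(fill))
    for sub in combinations(fill, r))
```

Here the fill set consists of the two chords incident to vertex 1, the resulting graph is chordal, and no strict subset of the fill set yields a chordal graph.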
Blair and Peyton (Blair and Peyton, 1993) characterized the minimal separators of a chordal graph by means of its clique tree.
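Theorem 5.4.
(Blair and Peyton (Blair and Peyton, 1993)) Let $G$ be a chordal graph and $(T,\beta)$ be any clique tree of $G$. A vertex set $S$ is a minimal separator of $G$ if and only if there exist two vertices $u$ and $v$ of $T$ such that $u$ and $v$ are adjacent in $T$ and $S = \beta(u) \cap \beta(v)$.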
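Read computationally, this characterization says that the minimal separators of a chordal graph are the intersections of bags that are adjacent in a clique tree. A minimal sketch, with a representation and example of our own choosing:

```python
def minimal_separators_from_clique_tree(tree_edges, bags):
    """Intersections of the bags at the two endpoints of every tree edge."""
    return {frozenset(bags[s] & bags[t]) for s, t in tree_edges}


# Illustrative input: the 4-cycle 1-2-3-4-1 with the chord between 1 and 3.
# Its maximal cliques are {1, 2, 3} and {1, 3, 4}.
clique_tree_edges = [("A", "B")]
clique_tree_bags = {"A": {1, 2, 3}, "B": {1, 3, 4}}
separators = minimal_separators_from_clique_tree(
    clique_tree_edges, clique_tree_bags)
```

The only minimal separator found is {1, 3}. Since a tree with $n$ nodes has $n-1$ edges, this view also bounds the number of minimal separators by the number of maximal cliques minus one.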
Rose (Rose, 1970) proved that a chordal graph has fewer minimal separators than vertices.
Theorem 5.5.
(Rose (Rose, 1970)) If $G$ is chordal, then $|\mathrm{MinSep}(G)| < |V(G)|$.
A graph may have exponentially many minimal separators. Berry et al. (Berry et al., 1999) gave an algorithm that enumerates the minimal separators in polynomial total time. (Carmeli et al. (Carmeli et al., 2017) noticed that, with a minor modification, that algorithm can enumerate with polynomial delay.)
Theorem 5.6.
(Berry et al. (Berry et al., 1999)) The minimal separators of a graph can be enumerated in polynomial total time.
5.3. Components and Blocks
Let be a graph, and let be a minimal separator of . An -component is said to be full if every vertex in is connected to one or more vertices in . We denote by the set of full -components. A block (of ) is a pair where is a minimal separator and is an -component (i.e., and in our notation). By a slight abuse of notation, we often identify the block with the vertex set . A block is full if is a full component ( in our notation). The realization of the block , denoted , is the induced graph of after saturating ; that is:
When is clear from the context, we may remove it from the subscripts and write simply and .
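The definitions of this subsection can be made concrete with a short Python sketch that computes the components left after removing a separator, filters the full ones, and builds the realization of a block. All names are hypothetical, and graphs are assumed to be dictionaries mapping each node to its set of neighbors.

```python
from itertools import combinations

def components(adj, removed):
    """Connected components of the graph after deleting the nodes in `removed`."""
    seen, comps = set(removed), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps

def full_components(adj, s):
    """Components of G - s in which every node of s has at least one neighbor."""
    return [c for c in components(adj, s) if all(adj[v] & c for v in s)]

def realization(adj, s, c):
    """Induced graph on s | c, with the separator s saturated into a clique."""
    nodes = set(s) | set(c)
    r = {v: adj[v] & nodes for v in nodes}
    for u, v in combinations(s, 2):
        r[u].add(v)
        r[v].add(u)
    return r

# The 5-cycle 1-2-3-4-5-1 with the minimal separator {1, 3}.
cycle5 = {1: {2, 5}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4, 1}}
full = full_components(cycle5, {1, 3})
block_graph = realization(cycle5, {1, 3}, {4, 5})
```

In this example both components, {2} and {4, 5}, are full, and the realization of the block ({1, 3}, {4, 5}) is the 4-cycle 1-3-4-5-1, where the edge between 1 and 3 comes from the saturation.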
Example 5.7.
Recall from Example 5.1 that for the graph of our running example (Figure 1(a)) we have , where the are depicted in Figure 2(a). Figure 2(b) shows the different realizations of the -blocks. The edges that have been added in the saturation are colored red. Note that all of the blocks are full, except for where no node of is connected to . ∎
5.4. Potential Maximal Cliques
Let be a graph. A vertex set is a Potential Maximal Clique (PMC for short) if there is a minimal triangulation of such that is a maximal clique of . Due to Theorem 2.2 we conclude that a vertex set is a PMC if and only if it is a bag of some proper tree decomposition of . We denote by the set of potential maximal cliques of . Bouchitté and Todinca (Bouchitté and Todinca, 2001) established the following connection between PMCs and blocks.
Theorem 5.8.
(Bouchitté and Todinca (Bouchitté and Todinca, 2001)) Let be a graph, , and . Let be the set of all nodes in that are neighbors of nodes in . Then and is a full block of .
The minimal separator and block of Theorem 5.8 are said to be associated to (in ). We denote by and the sets of minimal separators and full blocks, respectively, associated to . When is clear from the context, we may omit it and write simply and .
Theorem 5.9.
(Bouchitté and Todinca (Bouchitté and Todinca, 2001)) Let be a graph and . The set of minimal separators such that is exactly the set .
Example 5.10.
Bouchitté and Todinca (Bouchitté and Todinca, 2002) have shown that, given a graph and its set of minimal separators, the set of potential maximal cliques of can be computed in polynomial time.
Theorem 5.11.
(Bouchitté and Todinca (Bouchitté and Todinca, 2002)) can be computed in polynomial time in the size of and .
6. Computing an Optimal Minimal Triangulation
In this section we present an algorithm for computing a minimum-cost minimal triangulation, assuming that the cost function is a split-monotone bag cost. Our algorithm terminates in polynomial time if the cost function can be evaluated in polynomial time, and moreover, the input graphs belong to a poly-MS class of graphs. Our algorithm generalizes an algorithm by Bouchitté and Todinca (Bouchitté and Todinca, 2001) for computing the treewidth and minimum fill-in over a poly-MS class of graphs. Later in this section we will consider the restriction to triangulations of a bounded width, and the incorporation of inclusion and exclusion constraints that are critical for the enumeration algorithm of the next section.
6.1. Algorithm Description
To describe the algorithm, we first give some background. The Bouchitté-Todinca algorithm is based on the observation that a minimal triangulation of a graph is composed of minimal triangulations over realizations of its blocks. This is formalized in the following theorem:
Theorem 6.1.
(Bouchitté and Todinca (Bouchitté and Todinca, 2001)) The following hold for a graph .
If is a minimal triangulation of and , then for all the graph is a minimal triangulation of the realization .
Conversely, let and . For let be a minimal triangulation of . Then is a minimal triangulation of .
Observe that implies that is a potential maximal clique. Theorem 6.1 provides a characterization of the minimal triangulations in terms of the minimal triangulations of the block realizations. How, then, do we compute the minimal triangulations of the block realizations? This is shown in the following result.
Theorem 6.2.
(Bouchitté and Todinca (Bouchitté and Todinca, 2001)) Let be a graph, , and a full block of . Let . Let be a graph with and . The following are equivalent.
is a minimal triangulation of .
There exists such that and , where and is a minimal triangulation of for all .
Now, consider an input for our algorithm and . We will assume that for each our algorithm has computed a minimal triangulation for the realization . We then define the following.
1: Compute and
2: the set of full blocks of
3: for by increasing cardinality do
Our algorithm, , is depicted in Figure 3. It applies dynamic programming based on Equation (1). The algorithm is parameterized by a split-monotone bag cost , takes as input a graph , and computes a minimum- minimal triangulation of .
The algorithm begins by computing and . Assuming that belongs to a poly-MS class of graphs, this step can be done efficiently by applying Theorems 5.6 and 5.11. Then, the set of full blocks of is computed and traversed in order of ascending cardinality (beginning with such that is minimal) in the loop of line 3. In the iteration of , the optimal triangulation of is computed.
When processing a block , the algorithm selects a potential maximal clique where , to be saturated according to Equation (1), such that the cost of the resulting triangulation of is minimized. The saturated node set is stored as (line 4). By a slight abuse of notation, we denote by the set . The chosen optimal triangulation of is then saved as for later use (line 5). The processing order of the blocks allows larger blocks to evaluate each potential maximal clique based on the previously calculated optimal triangulation for each of the realizations of its smaller blocks. That is, for each block , the term in Equation (1) will refer to previously computed .
Bouchitté and Todinca (Bouchitté and Todinca, 2001; Bouchitté and Todinca, 2002) proved that the running time of this algorithm is polynomial in the number of minimal separators of the input graph. We prove the algorithm’s correctness in the appendix. We summarize the correctness and efficiency of the algorithm in the following lemma.
Lemma 6.3.
Let be a split-monotone bag cost computable in polynomial time. returns a minimal triangulation of minimal cost in time polynomial in the number of minimal separators of the graph. Hence, if belongs to a poly-MS class of graphs, then MinTriang terminates in polynomial time.
6.2. Incorporating Constraints
For the enumeration process we describe in the following section, we need to be able to apply constraints on the solution returned from . We consider two types of constraints: an inclusion constraint and an exclusion constraint, both represented as a minimal separator . A minimal triangulation satisfies sets and of inclusion and exclusion constraints, respectively, if and . We denote such a pair as , and say that satisfies if it satisfies both and .
Yet, while triangulating a realization of , we need to take into consideration two problems. First, may include nodes that are not in , so will be violated for the wrong reasons. Second, it might be the case that a minimal separator of is not a minimal separator of , but it will be a minimal separator in a triangulation that contains the triangulation. Therefore, we use the following equivalent definition (see Theorem 5.2). We say that satisfies , in notation , if for all with it holds that is a clique of if and is not a clique of if .
To incorporate constraints into our algorithm, we can simply alter our cost function and set a very high cost ( or any cost greater than that of all minimal triangulations of ) to triangulations that violate the constraints. The resulting cost, denoted , is then defined as follows.
To compute an optimal minimal triangulation over , we will show that it is a split-monotone bag cost whenever is. The proof is in the appendix.
Lemma 6.4.
Let be a cost function, a graph, and a set of constraints over the minimal triangulations of .
If is a split-monotone bag cost, then so is .
If can be computed in polynomial time in the size of , then can be computed in polynomial time in the size of , and .
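The constrained cost can be sketched in Python as a wrapper around an arbitrary cost function. Following the clique-based definition of constraint satisfaction given above, a separator is checked only if all of its nodes belong to the triangulated graph, and a violation is mapped to an infinite cost. The names `constrained`, `satisfies`, and `fill_in` are hypothetical, and fill-in merely stands in for a generic cost.

```python
import math
from itertools import combinations

def is_clique(adj, s):
    return all(v in adj[u] for u, v in combinations(s, 2))

def satisfies(h_adj, include, exclude):
    """Check the constraints only for separators fully contained in the nodes of H."""
    nodes = set(h_adj)
    for s in include:
        if s <= nodes and not is_clique(h_adj, s):
            return False
    for s in exclude:
        if s <= nodes and is_clique(h_adj, s):
            return False
    return True

def constrained(cost, include, exclude):
    """Wrap `cost` so that triangulations violating the constraints cost infinity."""
    def wrapped(g_adj, h_adj):
        if not satisfies(h_adj, include, exclude):
            return math.inf
        return cost(g_adj, h_adj)
    return wrapped

def fill_in(g_adj, h_adj):
    """Number of edges of H that are missing from G."""
    def edge_count(adj):
        return sum(len(nb) for nb in adj.values()) // 2
    return edge_count(h_adj) - edge_count(g_adj)

# A chordless 4-cycle G and its triangulation H obtained by adding the chord a-c.
g = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
h = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"a", "c"}}
ac, bd = frozenset({"a", "c"}), frozenset({"b", "d"})
with_ac = constrained(fill_in, include=[ac], exclude=[bd])(g, h)
without_ac = constrained(fill_in, include=[], exclude=[ac])(g, h)
```

With the inclusion constraint on {a, c}, the wrapped cost equals the plain fill-in of one edge; with the exclusion constraint on {a, c}, the same triangulation is priced at infinity and would never be preferred over a feasible one.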
Theorem 6.5.
Let be a poly-MS class of graphs and a split-monotone bag cost computable in polynomial time over . For all and constraints over , returns a minimum minimal triangulation in polynomial time.
6.3. Bounded Width
Another application of our algorithm is for problems where we are interested only in tree decompositions of a bounded (constant) width , without making the poly-MS assumption. Bounding the width of the result can be accomplished by attaching a high cost () to triangulations with maximal cliques of a larger size than , as we have done with constraints. Furthermore, any minimal separator larger than cannot be saturated in our output. Blocks of these separators will be assigned a high cost as well, and can be completely disregarded in the main loop (line 3). The limit on the width bounds the number of minimal separators and potential maximal cliques our algorithm should consider. Hence, if this limit is considered constant then we get a polynomial bound on the execution time of our algorithm (again, without assuming poly-MS). This is summarized in the following theorem.
Theorem 6.6.
Let be a fixed natural number, and a split-monotone bag cost computable in polynomial time. Given a graph and constraints , can return, in polynomial time, a minimum minimal triangulation of width at most (unless none exists).
7. From Optimization to Enumeration
In this section we discuss our algorithm for ranked enumeration of minimal triangulations. Before we do so, let us explain why it suffices to solve the problem only for minimal triangulations. The formal statement is as follows.
Proposition 7.1.
Let be a class of graphs, and a bag cost. If, on graphs of , the minimal triangulations can be enumerated with polynomial delay by increasing , then so can the proper tree decompositions.
So, in the remainder of this section we restrict the discussion to the enumeration of minimal triangulations.
Our algorithm is a direct and standard application of Lawler-Murty’s procedure (Lawler, 1972; Murty, 1968), which reduces ranked enumeration to optimization under inclusion and exclusion constraints. Specifically, the goal of this procedure is to enumerate sets of items by an increasing cost function . Here, an item is a minimal separator of and each set is a maximal set of pairwise-parallel minimal separators. Recall from Theorem 5.3 that each such set can be identified by a minimal triangulation . In particular, the score is . The inclusion and exclusion constraints are then precisely those described in the previous section. As the adaptation is standard, we defer its details to the appendix due to lack of space.
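For readers unfamiliar with the procedure, the following Python sketch shows a generic form of Lawler-Murty partitioning over sets of items. It is not the implementation of the paper: the optimizer is a hypothetical brute-force search over a small hand-written table of solutions, standing in for the constrained optimization of the previous section. The sketch also assumes that no solution strictly contains another, which holds for maximal sets of pairwise-parallel minimal separators.

```python
import heapq
from itertools import count

def lawler_murty(optimize):
    """Yield (cost, solution) pairs by increasing cost.

    `optimize(include, exclude)` returns the cheapest (cost, solution) with
    include <= solution and no item of `exclude` in it, or None if none exists.
    """
    tie = count()  # unique tie-breaker, so the heap never compares sets
    heap = []
    first = optimize(frozenset(), frozenset())
    if first is not None:
        heapq.heappush(heap, (first[0], next(tie), first[1], frozenset(), frozenset()))
    while heap:
        cost, _, sol, inc, exc = heapq.heappop(heap)
        yield cost, sol
        # Partition the rest of this subspace: the i-th part keeps the first
        # i-1 free items of `sol` and forbids the i-th one.
        fixed = set(inc)
        for item in sol - inc:
            sub_inc, sub_exc = frozenset(fixed), exc | {item}
            best = optimize(sub_inc, sub_exc)
            if best is not None:
                heapq.heappush(heap, (best[0], next(tie), best[1], sub_inc, sub_exc))
            fixed.add(item)

# Hypothetical solution table: each solution is a set of items with a cost.
SOLUTIONS = {frozenset("ab"): 3, frozenset("bc"): 1, frozenset("cd"): 2, frozenset("ad"): 5}

def brute_force_optimize(include, exclude):
    feasible = [(c, s) for s, c in SOLUTIONS.items()
                if include <= s and not (s & exclude)]
    return min(feasible, key=lambda pair: pair[0], default=None)

ranked = list(lawler_murty(brute_force_optimize))
```

Every solution other than the one just printed lies in exactly one of the generated subspaces, so each solution is output once, and the priority queue guarantees nondecreasing cost. The delay between consecutive answers is dominated by the calls to the optimizer, one per free item of the printed solution.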
We now consider the case of enumerating the minimal triangulations of a bounded width, as stated in Theorem 4.5 (Section 4). The correctness of our reduction to Lawler-Murty is based on the fact that we can identify minimal triangulations by their minimal separators. Now, let be a fixed natural number, and suppose that we are interested in only the minimal triangulations of width bounded by . Then the cardinality of each minimal separator of a minimal triangulation of interest is also bounded by . This is true because the width of a triangulation is at least as large as the cardinality of each of its separators, as each minimal separator of a triangulation is necessarily a clique (Theorem 5.2).
8. Implementation and Experimental Evaluation
In this section, we describe an experimental study. The goal of our study is twofold. First and foremost, we explore the performance of our enumeration algorithm, which we refer to as . As stated earlier, this algorithm is mainly based on the algorithm of Figure 3, and adopts Lawler-Murty’s procedure (Lawler, 1972; Murty, 1968) for reducing ranked enumeration to optimization under constraints. The second goal of our experimental study is to explore the applicability of the poly-MS assumption in reality, and particularly to get an insight on how often realistic graphs have a manageable number of minimal separators.
8.1. Experimental Setup
We begin by describing the general setup for our experiments.
All algorithms were implemented in C++, with STL data structures. We used some of the code of Carmeli et al. (Carmeli et al., 2017), which can be found on GitHub (https://github.com/NofarCarmeli/MinTriangulationsEnumeration). Specifically, we have used their implementation of the algorithm for enumerating the minimal separators by Berry et al. (Berry et al., 1999). To calculate the potential maximal cliques of a graph, we implemented the algorithm by Bouchitté and Todinca (Bouchitté and Todinca, 2002). It is important to note that the implementation of these two algorithms is direct, with no attempt at optimization. While these algorithms might take a significant portion of the time, improving their implementation is beyond the scope of this paper, and we leave it for future investigation.
We ran all experiments on a -core server with GB of RAM running Ubuntu LTS. The experiments ran single threaded, though the algorithm could be implemented on multiple threads to reduce the delay between answers, by parallelizing the main loop of using more advanced ideas such as those of Golenberg et al. (Golenberg et al., 2011).
8.1.3. Compared Algorithms
We compared our algorithm to the enumeration algorithm by Carmeli, Kenig and Kimelfeld (Carmeli et al., 2017), which enumerates with incremental polynomial time and has no guarantees on the order; we refer to that algorithm as . To the best of our knowledge, no other published algorithms for enumerating minimal triangulations or tree decompositions with completeness guarantees exist, with the exception of DunceCap (Tu and Ré, 2015) that is designed for small query graphs; for more details about its performance we refer the reader to Carmeli et al. (Carmeli et al., 2017).
The algorithm requires a black-box minimal triangulator. In our experiments we used (Berry et al., 2006b) for this purpose, as it was found to allow for enumeration of triangulations of smaller width and fill (Carmeli et al., 2017). In principle, we could also have used our , but we chose not to do so since requires a long initialization step, and CKK applies its triangulator to many graphs that change between execution calls.
We used the datasets of Carmeli et al. (Carmeli et al., 2017). These include graphs of three types: probabilistic graphical models from the PIC2011 challenge (http://www.cs.huji.ac.il/project/PASCAL/showNet.php), Gaifman graphs of conjunctive queries translated from the TPC-H benchmark (see (Carmeli et al., 2017)), and random graphs. Random graphs were generated by the Erdős-Rényi model, where the number of nodes is and every pair of nodes is (independently) connected by an edge with probability .
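The random-graph model can be sketched in a few lines of Python. The function name, the adjacency-dictionary output, and the `seed` parameter are assumptions made for this illustration; the experiments themselves were run with the C++ implementation described above.

```python
import random
from itertools import combinations

def erdos_renyi(n, p, seed=None):
    """Sample a graph on nodes 0..n-1 where each pair is an edge with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        # Each pair is decided independently of all others.
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def edge_count(adj):
    return sum(len(nb) for nb in adj.values()) // 2

empty = erdos_renyi(6, 0.0, seed=1)
complete = erdos_renyi(6, 1.0, seed=1)
```

Fixing the seed makes a sampled graph reproducible, which is convenient when the same graph has to be fed to several enumeration algorithms.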
8.2. The Poly-MS Assumption
We start with our exploration of the poly-MS assumption, as it is needed as context for the experimental evaluation of our enumeration algorithm, described in the next sections. In this study, we attempted to generate all minimal separators, and then all potential maximal cliques, on our datasets. We describe the rates of success, and for each successful case the corresponding number of results.
8.2.1. Real-Life Graphs
In Figure 4 we report, for each dataset, the number of graphs for which each computation terminated in predefined time lengths. The chart uses the following encoding.
Terminated: Graphs where the time required to compute is under a minute, and the time required to compute is under minutes.
MS terminated: Graphs where the time required to compute is under a minute, but the time to compute is over minutes.
Not terminated: Graphs where the time to compute is over minutes. There were almost no graphs that took between a minute and minutes to compute.
As expected, many graphs violate the poly-MS assumption (otherwise the NP-hard problem of computing the treewidth and fill-in would actually be tractable in all of these graphs). In some of the datasets, all of the graphs were found infeasible. The good news, which we found surprising, is that the portion of graphs with a manageable number of minimal separators is quite substantial (around 50%). The reader can also observe that in most cases, when we were able to compute the minimal separators we were also able to compute the potential maximal cliques (which is consistent with the known theory (Bouchitté and Todinca, 2002)). Figure 5 shows the distribution of the number of minimal separators (in log scale) over the MS terminated cases of the PIC2011. One can observe that these numbers are comparable to the number of edges, and are quite often much smaller.
8.2.2. Random Graphs
We ran a similar experiment on random graphs. As said earlier, our random graphs are from assorted and . We drew graphs with nodes, drawing three graphs from each probability . This allowed us to observe the correlation between the fraction of edges in the graph and the number of minimal separators. Figure 6 reports the result of these tests. When the computation time exceeded minutes we stopped the execution; as can be observed, this happened in the case of and (shown by the red marks). The reader can observe an interesting phenomenon: the number of minimal separators is small for either sparse or dense graphs. In between (around ) this number blows up.