Scaling up Probabilistic Inference in Linear and Non-Linear Hybrid Domains by Leveraging Knowledge Compilation


Anton Fuxjaeger
University of Edinburgh
anton.fuxjaeger@ed.ac.uk
   Vaishak Belle
University of Edinburgh &
Alan Turing Institute
vaishak@ed.ac.uk
Abstract

Weighted model integration (WMI) extends weighted model counting (WMC) in providing a computational abstraction for probabilistic inference in mixed discrete-continuous domains. WMC has emerged as an assembly language for state-of-the-art reasoning in Bayesian networks, factor graphs, probabilistic programs and probabilistic databases. In this regard, WMI shows immense promise to be much more widely applicable, especially as many real-world applications involve attribute and feature spaces that are continuous and mixed. Nonetheless, state-of-the-art tools for WMI are limited and less mature than their propositional counterparts.

In this work, we propose a new implementation regime that leverages propositional knowledge compilation for scaling up inference. In particular, we use sentential decision diagrams, a tractable representation of Boolean functions, as the underlying model counting and model enumeration scheme. Our regime performs competitively with state-of-the-art WMI systems, but is also shown, for the first time, to handle non-linear constraints over non-linear potentials.

1 Introduction

Weighted model counting (WMC) is a basic reasoning task on propositional knowledge bases. It extends the model counting task, or #SAT, which is to count the number of satisfying assignments to a given propositional formula [10]. In WMC, one accords a weight to every model and computes the sum of the weights of all models. The weight of a model is often factorized into weights of assignments to individual variables. WMC has emerged as an assembly language for numerous formalisms, providing state-of-the-art probabilistic reasoning for Bayesian networks [13], factor graphs [16], probabilistic programs [22], and probabilistic databases [36]. Exact WMC solvers are based on knowledge compilation [17, 30] or exhaustive DPLL search [33]. (Approximate WMC algorithms have been proposed as well; for example, see [39].) These successes have been primarily enabled by the development of efficient data structures, e.g., arithmetic circuits (ACs), for representing Boolean theories, together with fast model enumeration strategies. In particular, the development of ACs has enabled a number of developments beyond inference, such as parameter and structure learning [6, 28, 32, 24]. For example, the learning of distributions becomes possible by keeping the circuit fixed and finding the best set of probabilistic parameters that fit the data. Many circuit languages exploit context-specific independence [11], which becomes particularly useful when handling hard constraints [28] that are troublesome to deal with in classical probabilistic models. Finally, having a data structure in hand means that multiple queries can be evaluated efficiently: that is, exhaustive search need not be re-run for each query.

However, WMC is limited to discrete finite-outcome distributions only, and little was understood about whether a suitable extension exists for continuous and discrete-continuous random variables until recently. The framework of weighted model integration (WMI) [8] extended the usual WMC setting by allowing real-valued variables over symbolic weight functions, as opposed to purely numeric weights in WMC. The key idea is to use formulas involving real-valued variables to define a hyper-rectangle, or a hyper-rhombus, or in general, any arbitrary region of the event space of a continuous random variable, and use the symbolic weights to define the corresponding density function for that region. WMC is based on propositional SAT technology and, by extension, WMI is based on satisfiability modulo theories (SMT), which enables us to, for example, reason about the satisfiability of linear constraints over the reals [5]. Thus, for every assignment to the Boolean and continuous variables, the WMI problem defines a density. The WMI for a knowledge base (KB) is computed by integrating these densities over the domain of solutions to the KB, which is a mixed discrete-continuous space, yielding the value for a probabilistic query. The approach is closely related to the mixture-of-polynomials density estimation for hybrid Bayesian networks [35]. Applications of WMI (and closely related formulations) for probabilistic graphical modeling and probabilistic programming tasks have also been emerging [14, 1, 29, 7, 12].

Given the popularity of WMC, WMI shows immense promise to be much more widely applicable, especially as many real-world applications, including time-series models, involve attribute and feature spaces that are continuous and mixed. However, state-of-the-art tools for WMI are limited and significantly less mature than their propositional counterparts. Initial developments on WMI [8] were based on the so-called block-clause strategy, which naively enumerates the models of a theory and is impractical on all but small problems (see [27] for a formulation). In later work, [9] attempted to piggyback on propositional component caching methods [2] but were restricted to interval formulas. Recently, a solver based on predicate abstraction was introduced by [29] with strong performance, but since no explicit circuit is constructed, it is not clear how tasks like parameter learning can be realized. Following that development, [25] proposed the use of extended algebraic decision diagrams [34], an extension of algebraic decision diagrams [3], as a compilation language for WMI. Their approach also performs comparably to [29].

However, while this progress is noteworthy, there are still many significant differences from the body of work on propositional circuit languages. For example, properties such as canonicity have received considerable attention for these latter languages [37]. Many of these languages allow (weighted) model counting to be computed in time linear in the size of the obtained circuit [38]. To take advantage of these results, in this work we revisit the problem of how to leverage propositional circuit languages for WMI more carefully and develop a generic implementation regime to that end. In particular, we leverage sentential decision diagrams (SDDs) [18] via abstraction. SDDs are tractable circuit representations that are at least as succinct as ordered binary decision diagrams (OBDDs) [18]. Both of these support queries such as model counting (MC) and model enumeration (ME) in time linear in the size of the obtained circuit. (We use the term querying to mean both probabilistic conditional queries as well as weighted model counting, because the latter simply corresponds to the case where the query is true.) Because SDDs have such desirable properties, a number of papers have dealt with more involved issues, such as learning their structure from data directly [6, 28], and thus the structure of the underlying graphical models.

In essence, our implementation regime uses SDDs as the underlying querying language for WMI in order to perform tractable and scalable probabilistic inference in hybrid domains. The regime neatly separates the model enumeration from the integration, which is demonstrated by allowing a choice of two integration schemes. The first is a provably efficient and exact integration approach for polynomial densities [21, 4, 20] and the second is an unmodified integration library available in the programming language platform (Python in our case). The results obtained are very promising with regards to the empirical behavior: we perform competitively with the existing state-of-the-art WMI solver [29]. But perhaps most significantly, owing to the generic nature of our regime, we can scale the same approach to non-linear constraints, with possibly non-linear potentials, which has not been considered in the WMI setting before.

We structure the work as follows. We will first present a brief review on probabilistic inference in graphical models by means of WMC and WMI. We will then go on to review the formulation of SDDs, outlining the essentials for our framework. Then, we discuss polytime model enumeration in SDDs. Next we will introduce our scheme for computing the WMI of a given hybrid knowledge base (HKB) by means of knowledge compilation followed by probabilistic inference on the compiled representation. Finally, we will evaluate the framework empirically and conclude.

2 Background

In this section, we will introduce the preliminaries for this work. We will discuss graphical models, the logical background, WMC, WMI and SDDs in that order.

Graphical Models

Throughout this paper we will refer to Boolean and continuous random variables as B_1, …, B_m and X_1, …, X_n respectively, for some finite m and n. Lower case letters, b and x, will represent the instantiations of these variables. Similarly, bold upper case letters will denote sets of variables and bold lower case letters will denote their instantiations.

Now consider a probabilistic model defined on B and X, and let (b, x) be one element of the corresponding probability space, denoting a particular assignment to the values in the respective domains. A graphical model can then be used to describe dependencies between the variables and define a joint density function over those variables compactly. The graphical models we consider in this paper are Markov networks, which are undirected models. (Encoding a directed graphical model as a weighted propositional theory is discussed in [13].) Roughly, the nodes of the graph are Boolean functions taking real and Boolean variables and the edges are logical implications. We can then compactly factorize the joint density function in terms of the cliques of the graph [26]:

Pr(b, x) = (1/Z) ∏_k φ_k(b_k, x_k)

where b_k and x_k are those random variables participating in the kth clique and φ_k is a non-negative, real-valued potential function. Here φ_k is not necessarily denoting a probability, and so Z is used as a normalizing constant, also called the partition function, defined as [8]:

Z = Σ_b ∫ ∏_k φ_k(b_k, x_k) dx

Logical Background

Propositional satisfiability (SAT), where the task is to find a satisfying assignment to a formula in propositional logic, has proven very useful in many areas of computer science. An instance of satisfiability modulo theories (SMT) [10] is a generalization of classical SAT in allowing first-order formulas with respect to some decidable background theory. For example, a formula may be a quantifier-free linear arithmetic formula over the reals, and the corresponding background theory is the fragment of first-order logic over the arithmetic signature that restricts the interpretation of these symbols to standard arithmetic.

Throughout this paper we will consider two different background theories: quantifier-free linear (LRA) and non-linear (NRA) arithmetic over the reals. A problem instance (input) to our WMI solver is then a formula with respect to one of those background theories, in combination with propositional logic, for which satisfaction is defined in the obvious way [5]. Such an instance is referred to as a hybrid knowledge base (HKB).

Weighted Model Counting

Weighted model counting (WMC) [13] is a strict generalization of model counting [10]. In WMC, each model of a given propositional knowledge base (PKB) Δ has an associated weight and we are interested in computing the sum of the weights of the models that satisfy Δ. (As is convention, the underlying propositional language and propositional letters are left implicit. We often refer to the set of literals L to mean the set of all propositional atoms as well as their negations constructed from the propositions mentioned in Δ.)

In order to create an instance of the WMC problem given a PKB Δ and literals L, we define a weight function w mapping the literals to non-negative numeric weights. We can then use the literals of a given model to define the weight of that model, as well as the weighted model count, as follows:

Definition 1.

: Given a PKB Δ over literals L and weight function w : L → R≥0, we define the weight of a model M ⊨ Δ as:

WEIGHT(M, w) = ∏_{l ∈ M} w(l)

Further, we define the weighted model count (WMC) as:

WMC(Δ, w) = Σ_{M ⊨ Δ} WEIGHT(M, w)

Based on the above definition, it can be observed that if w(l) = 1 for all l ∈ L, the weight of a model obtained via WEIGHT is always 1, and thus we are just counting the number of satisfying models, in other words, computing the model count.
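
To make the definition concrete, the following brute-force sketch (our own illustration over a CNF encoding, not part of the solver described later) enumerates all assignments, keeps the satisfying ones, and sums the products of literal weights. Exact solvers avoid this exponential enumeration via search or compilation.

from itertools import product

def wmc(clauses, weights):
    """Brute-force WMC over a CNF given as a list of clauses (lists of signed ints).
    weights maps each literal (signed int) to a non-negative weight."""
    variables = sorted({abs(l) for clause in clauses for l in clause})
    total = 0.0
    for values in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, values))
        holds = lambda l: model[abs(l)] if l > 0 else not model[abs(l)]
        if all(any(holds(l) for l in clause) for clause in clauses):
            weight = 1.0
            for v in variables:
                weight *= weights[v] if model[v] else weights[-v]
            total += weight
    return total

# Example: Delta = (a or b) with w(a)=0.3, w(not a)=0.7, w(b)=0.6, w(not b)=0.4.
# WMC = 1 - w(not a) * w(not b) = 0.72
print(wmc([[1, 2]], {1: 0.3, -1: 0.7, 2: 0.6, -2: 0.4}))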

Probabilistic Inference in Markov Networks by Weighted Model Counting

WMC can be used to calculate probabilities of a given graphical model [13]. As discussed earlier, in the simplest instance, an undirected model can be represented as a weighted propositional theory, in that the nodes of the graph are Boolean functions taking real and Boolean variables and the edges are logical implications, from which one gets:

Theorem 2.

: [8] Let G be a Markov network over Boolean random variables B and potentials {φ_k}. Now let Δ be the PKB that encodes the network structure and w be the corresponding weight function. Then

Pr(q | e) = WMC(Δ ∧ q ∧ e, w) / WMC(Δ ∧ e, w)

for some evidence e and query q, where q and e are PKBs as well, defined over the propositions mentioned in Δ.

Weighted Model Integration

While WMC is very powerful as an inference tool in discrete and Boolean domains, it suffers from the inherent limitation of only admitting inference in discrete probability distributions. This is because its underlying machinery enumerates all models (or expands the complete network polynomial), which is exponential in the number of variables but still finite and countable in the discrete case. For the continuous case, we need a language to reason about uncountable event spaces, as well as to represent density functions. WMI [8] was proposed as a strict generalization of WMC for hybrid domains, with the idea of annotating an SMT theory with polynomial weights.

Definition 3.

: [8] Suppose Δ is an HKB over Boolean and real variables B and X, and literals L. Suppose w maps each literal in L to an expression over X. Then the WMI is defined as:

WMI(Δ, w) = Σ_{M ⊨ Δ⁻} VOL(M, w)

where:

VOL(M, w) = ∫_{M⁺} WEIGHT(M, w) dX

and WEIGHT is defined as described in Def 1. Here Δ⁻ denotes the propositional abstraction of Δ and M⁺ the refinement of the literals true in M.

Intuitively, the WMI of an SMT theory is defined in terms of the models of its propositional abstraction Δ⁻. For each such model we compute its volume, that is, we integrate the WEIGHT-values of the literals that are true in the model. The interval of the integral is defined in terms of the refinement of the literals. The weight function is to be seen as mapping an expression to its density function, which is usually another expression mentioning the variables in X.
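
As a small illustration of the definition (an example of our own, not taken from [8]), consider an HKB Δ = (0 ≤ x) ∧ (x ≤ 1), where the literal 0 ≤ x carries the weight x² and all other literals carry weight 1. The abstraction of Δ has a single model in which both literals are true, so WMI(Δ, w) reduces to a single volume, which the sympy sketch below evaluates symbolically.

import sympy as sp

x = sp.Symbol('x')

# The single model's refinement constrains x to [0, 1] and its WEIGHT is x**2,
# so VOL = integral of x**2 over [0, 1], which is also WMI(Delta, w).
volume = sp.integrate(x**2, (x, 0, 1))
print(volume)   # 1/3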

Probabilistic Inference in Markov Networks by Weighted Model Integration

Firstly, note that if Δ is a formula in propositional logic over literals L with weight function w, then WMI(Δ, w) = WMC(Δ, w). Second, the results on graphical models generalize as follows:

Theorem 4.

: [8] Let G be a Markov network over the Boolean and real-valued random variables B and X. Let Δ and w be the corresponding encodings. Then for any q and e that are HKBs over B and X,

Pr(q | e) = WMI(Δ ∧ q ∧ e, w) / WMI(Δ ∧ e, w)

Example 5.

: [8] Suppose is the following formula:

Let be the propositional abstraction of , where is the propositional abstraction of . For weights, let , , and . There are now 3 models of :

  1. : since , by definition we have and so

  2. :

  3. :

Thus, .
Now suppose that we are interested in the probability of the query given that is observed. Suppose is the abstraction of . First, corresponds to the weight of a single interpretation, that of item 2, yielding a value of 100. Next, also corresponds to the weight of a single interpretation , an extension to that in item 2. In this case:

Therefore, the conditional probability is .

Sentential Decision Diagram

Sentential decision diagrams (SDDs) were first introduced in [18] and are graphical representations of propositional knowledge bases. SDDs are a strict subset of deterministic decomposable negation normal form (d-DNNF), a popular representation for probabilistic reasoning applications [13] owing to its desirable properties: decomposability and determinism ensure tractable probabilistic (and logical) inference, for example enabling MAP queries in Markov networks. SDDs, however, satisfy two even stronger properties, also found in ordered binary decision diagrams (OBDDs), namely structured decomposability and strong determinism. They are strict supersets of OBDDs as well, inheriting their key properties: canonicity and polynomial-time support for Boolean combination. Finally, SDDs also come with an upper bound on their size in terms of treewidth.

Structured Decomposability, Strong Determinism and Vtrees

Consider a Boolean function f(X, Y) over disjoint sets of variables X and Y. If p_1, …, p_n and s_1, …, s_n are further Boolean functions and f = (p_1(X) ∧ s_1(Y)) ∨ … ∨ (p_n(X) ∧ s_n(Y)), then {(p_1, s_1), …, (p_n, s_n)} is called an (X, Y)-decomposition of f, since it allows us to express f purely in terms of functions on X and functions on Y [31]. Formally, a conjunction is decomposable if each pair of its conjuncts shares no variables. Now, if the p_i are pairwise mutually exclusive (p_i ∧ p_j is unsatisfiable for i ≠ j), the decomposition is considered to be strongly deterministic. In such a case each pair (p_i, s_i) is called an element of the decomposition, and p_i and s_i the element's prime and sub respectively [18]. The decomposition used by SDDs has structural properties as well, which build on the notion of vtrees [31].

Definition 6.

: A vtree for a set of variables Z is a full, rooted binary tree whose leaves are in one-to-one correspondence with the variables in Z.

Figure 1(a) represents a possible vtree for the example function of Figure 1. In this paper, v is used for a vtree node, and v^l and v^r are used to represent its left and right child respectively. Furthermore, each vtree induces a total variable order that is obtained by a left-right traversal of the tree.

The Syntax and Semantics of SDDs

Here ⟨·⟩ is used to specify the mapping from an SDD to the Boolean function it represents.

Definition 7.

: α is an SDD that respects vtree v iff:

(a) α = ⊥ or α = ⊤; or
(b) α = X or α = ¬X, and v is a leaf node labelled with the variable X; or
(c) α = {(p_1, s_1), …, (p_n, s_n)}, v is an internal node, p_1, …, p_n are SDDs that respect subtrees of v^l, s_1, …, s_n are SDDs that respect subtrees of v^r, and ⟨p_1⟩, …, ⟨p_n⟩ form a partition (they are mutually exclusive, exhaustive and non-false).

The semantics are given by ⟨⊥⟩ = false, ⟨⊤⟩ = true, ⟨X⟩ = X, ⟨¬X⟩ = ¬X, and

⟨{(p_1, s_1), …, (p_n, s_n)}⟩ = (⟨p_1⟩ ∧ ⟨s_1⟩) ∨ … ∨ (⟨p_n⟩ ∧ ⟨s_n⟩)     (1)

The size of an SDD α, denoted |α|, is obtained by summing the sizes of all its decompositions.

Boolean- and literal-SDD nodes are called terminal nodes; all other nodes are decomposition (decision) nodes. Graphically, we represent a decision node by a circle with a number indicating the vtree node it respects, and the elements of the decision node by boxes. Figure 1 depicts an SDD and the vtree it respects for an example Boolean function. SDDs can therefore be used to represent a Boolean function in compact form.

Figure 1: A vtree (a) and an SDD (b) that respects it, for an example Boolean function [18]

Canonicity

Canonicity of SDDs is an especially interesting property introduced and proved in [18]. This will ultimately provide a bound on the SDDs we will be constructing in our framework.

Theorem 8.

: [18] Let α and β be compressed and trimmed SDDs respecting nodes in the same vtree. Then ⟨α⟩ = ⟨β⟩ iff α = β.

Polytime Apply Operation

The Apply operation allows us to construct an SDD from any propositional KB in CNF by simply converting each clause into an SDD and then recursively combining all of the SDDs. The details can be found in [18] and the technique follows the same ideas as the Apply operation of OBDDs. Any two SDDs can be conjoined as well as disjoined in polynomial time with respect to the sizes of the SDDs. Furthermore, we can also negate an SDD in polytime, by applying an exclusive-or with true [18]. Formally [18], the Apply operation, for a given Boolean connective ∘ and SDDs α and β, returns the combined SDD α ∘ β in O(|α| · |β|) time, where |·| denotes the size of an SDD.
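
The snippet below sketches these operations using the PySDD Python bindings to the SDD library; the class and method names (Vtree, SddManager.from_vtree, literal, model_count, models) are our assumption about that package's interface and are independent of the implementation described later in this paper. It builds a manager over a balanced vtree, combines literals with Apply via the overloaded Boolean operators, and then counts and enumerates models of the compiled circuit.

from pysdd.sdd import SddManager, Vtree

# A balanced vtree over 4 variables and a manager that respects it.
vtree = Vtree(var_count=4, vtree_type="balanced")
mgr = SddManager.from_vtree(vtree)
a, b, c, d = [mgr.literal(i) for i in range(1, 5)]

# Apply: conjoin, disjoin and negate SDDs in time polynomial in the operand sizes.
f = (a & b) | (~c & d)

print(f.model_count())      # model counting on the compiled SDD
for model in f.models():    # model enumeration
    print(model)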

Upper Bound on SDDs

Finally, the upper bound of SDDs discusses its size for an arbitrary CNF formula:

Theorem 9.

: [18] A CNF with n variables and treewidth w has a compressed and trimmed SDD of size O(n 2^w).

While this means that the size of the SDD is still exponential in the treewidth w, it is linear in the number of variables n. This means that SDDs come with a tighter bound on their size than binary decision diagrams (BDDs), whose corresponding bound is squared exponential in the treewidth.

3 Method

Over the past few years there have been a number of papers on exact probabilistic inference  [6, 29, 34] using the formulation of WMI. What we propose in this section is a novel way of doing weighted model integration by using SDDs as the underlying model counting, enumeration and querying language. Here predicate abstraction and knowledge compilation enable us to compile the abstracted PKB into an SDD, which has the desirable property of a fully parallelisable polytime model-enumeration algorithm. Recall that polytime here refers to the complexity of the algorithm with respect to the size of the tree (SDD) [19].

In practice, computing the probability of a given query q for some evidence e consists of calculating the WMI of two separate but related HKBs. That is, we have to compute the WMI of a given HKB Δ conjoined with the evidence e and the query q, dividing it by the WMI of Δ conjoined with the evidence e. This formulation, introduced by [8] and explained in more detail in Section 2, can be written as:

Pr(q | e) = WMI(Δ ∧ q ∧ e, w) / WMI(Δ ∧ e, w)

We will give a quick overview of the whole pipeline for computing the WMI value of a given KB, before discussing in detail the individual computational steps.

WMI-SDD: The Pipeline

As a basis for performing probabilistic inference, we first have to be able to calculate the WMI of a given HKB with corresponding weight function w. As we are interested in doing so by using SDDs as a query language, the WMI breaks down into a sequence of sub-computations depicted as the WMI-SDD pipeline in Figure 2.

Figure 2: Pictorial Depiction of the Proposed Pipeline for WMI

Input/Outputs of the pipeline

The input of the pipeline is composed of two things: the HKB with respect to some background theory (e.g., LRA) on the one hand, and the weight function on the other. Here, atoms are defined as usual for the respective language [5] and can be understood as formulas that cannot be broken down further into a conjunction, disjunction or negation of yet smaller atoms. This means that an HKB whose formula is, say, a disjunction of two arithmetic atoms should be abstracted with one propositional variable per atom, rather than with a single propositional variable standing for the entire disjunction.

The first step is to arrange atoms in a form that we call 'separable'. The corresponding background theory determines whether a correct rewriting of formulas is possible to satisfy this condition:

Definition 10.

: A given HKB satisfies the separability condition if every atom within the formula can be rewritten in one of the following forms: x ≤ f, x ≥ f, or x = f, where f is any term over VARS \ {x}, with VARS being the set of all variables (Boolean and continuous) that appear in the atom. Here x may be any continuous variable appearing in the atom; such a variable is then called the leading variable (leadVar).

For some background theories, this conversion is immediate. In an LRA formula, any given atom can be rewritten as an inequality or equality with a single variable on one side and a linear function on the other side. But this is not a given for HKBs with background theory NRA. To illustrate the condition: an atom from which the leading variable can be isolated on one side by the usual algebraic manipulations fulfills the imposed condition, whereas an atom in which the leading variable cannot be isolated in this way does not fulfill the condition and cannot be dealt with by our implementation regime.

Considering the motivation of performing probabilistic inference, where we deal with evidence and queries in addition to an HKB, as discussed in Section 2, we note that the HKB, the query and the evidence all have to fulfill the separability condition. As queries and evidence are applied by means of a logical connective with the HKB, they should generally be thought of as HKBs themselves.

The weight function, on the other hand, is only restricted by the condition that the resulting integrand must be integrable for any given model. As long as this condition is met, we can accept arbitrary functions over the variables (Boolean and continuous) of the KB.

Step 1: Predicate Abstraction

The aim of this step in the WMI framework is twofold. On the one hand, it is given an HKB and is tasked with producing a PKB and the corresponding mapping from propositional variables to continuous refinements, by means of propositional abstraction. On the other hand, this part of the framework also rearranges the continuous refinements such that a single variable is separated from the rest of the equation on one side of the inequality/equality. Considering these two sub-parts, we will give more detail on propositional abstraction first, before talking in more depth about the rearrangement of the refinements.

On a conceptual level, the predicate abstraction closely follows the theoretical formulation introduced in [8]. The HKB is recursively traversed and every encountered atom is replaced with a propositional variable, while the logical structure (connectives and parentheses) of the KB is preserved.

We make use of the imposed separable property in order to rewrite the individual refinements into bounds for a given variable. These bounds can easily be negated and will be used at a later stage to construct the intervals of integration for a given model. Now the process of rewriting a single atom corresponds to symbolically solving an equation for one variable and it is implemented as an arithmetic solver.

The variable we choose to isolate from the rest of the equation (that is, the leading variable) is determined by a variable order, which in turn enforces the order of integration in a later stage of the pipeline.

For example, assume that the chosen variable order is the usual alphabetical one over the variable names. Then predicates are rewritten such that from all variables referenced in the atom, the one highest up in the variable order is chosen as the leading variable and separated from the rest of the equation, resulting in a bound for the given variable. This ensures that for any predicate the bound for the leading variable does not reference any variable that precedes it alphabetically, which in turn ensures that calculating the integrals is possible.
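
As an illustration of this rewriting step (a sketch of ours using sympy, rather than the solver's own arithmetic routines), consider the hypothetical atom x + 2y − z ≤ 3 under the alphabetical order, so that x is the leading variable:

import sympy as sp

x, y, z = sp.symbols('x y z')

# Solve the associated equality for the leading variable x; the direction of
# the inequality is preserved because the coefficient of x is positive.
bound = sp.solve(sp.Eq(x + 2*y - z, 3), x)[0]
print(sp.Le(x, bound))   # x <= -2*y + z + 3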

Example 11.

: To illustrate this with an example, consider the HKB ;

After abstraction we are given the PKB, where the abstracted variables correspond to the following atoms: , , and . As mentioned above, we construct the order of the continuous variables alphabetically, resulting in for the proposed example. Once the order has been constructed, we can rewrite each predicate as a bound for the variable appearing first in the order: , , and . This ensures that the integral computes a number for every possible model of the KB. Considering for example the model , the bounds of the integral would be as follows: and yields a number.

In the case of non-linear refinements, the step of rearranging the variable could give rise to new propositions, which in turn have to be added to the PKB. Consider, for example, a predicate whose refinement is a product constraint, say of the form x · y ≤ c, that should be rewritten with x as the leading variable. Now, as the variable y might be negative or zero, we are unable to simply divide both sides by y, but rather have to split up the equation in the following way: (y > 0 ∧ x ≤ c/y) ∨ (y < 0 ∧ x ≥ c/y) ∨ (y = 0 ∧ 0 ≤ c), which can be further abstracted by introducing one proposition per new atom. Once created, we can replace the original predicate with this Boolean combination in the PKB and add all the new predicates to our list of propositions.

Step 2: Knowledge Compilation

Within this step of our pipeline, the PKB constructed in the previous step is compiled into a canonical SDD. In practice, we first convert the PKB to CNF before passing it to the SDD library (http://reasoning.cs.ucla.edu/sdd/). The library has a number of optimizations in place, including dynamic minimization [15]. However, the algorithm is still constrained by the asymptotically exponential nature of the problem.

Once the SDD is created, it is imported back into our internal data structure, which is designed for retrieving all satisfying models of a given SDD.

Step 3: Model Enumeration

Retrieving all satisfying models of a given PKB is a crucial part of the WMI formulation and we now focus on this step in our pipeline. In essence, we make use of knowledge compilation in order to compile the given PKB into a data structure that allows us to enumerate all satisfying models in polynomial time with respect to the size of the tree. As discussed in the background section, SDDs are our data structure of choice; their properties, including canonicity, make them especially appealing for our pipeline.

The algorithm we developed for retrieving the satisfying models makes full use of the structural properties of SDDs such that it realizes the theoretical property of polytime ME. By recursively traversing the tree bottom-up, models are created for each node in the SDD with respect to the vtree node it respects. Those models are then passed upwards in the tree, where they are combined with those of the other branches. This is possible due to the structured decomposability property of the data structure (SDD). It should also be noted at this point that parallelisation of the algorithm is possible as well, due to the decomposability properties of SDDs. This is a highly desirable attribute when it comes to scaling to very large datasets.
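
The following simplified sketch (our own illustration over a toy node representation, not the authors' internal data structure) shows the bottom-up combination: the models of a decision node are obtained by pairing the models of each prime with the models of the corresponding sub, which is sound because primes and subs mention disjoint sets of variables.

from itertools import product

# Toy SDD nodes: ('true',), ('false',), ('lit', var, polarity),
# or ('decision', [(prime, sub), ...]).

def enum_models(node):
    """Return the partial models (dicts var -> bool) of a toy SDD node.
    Variables not mentioned by a partial model are unconstrained and would be
    expanded over in a full enumerator."""
    kind = node[0]
    if kind == 'false':
        return []
    if kind == 'true':
        return [{}]
    if kind == 'lit':
        _, var, polarity = node
        return [{var: polarity}]
    models = []
    for prime, sub in node[1]:
        for mp, ms in product(enum_models(prime), enum_models(sub)):
            combined = dict(mp)
            combined.update(ms)   # prime and sub share no variables
            models.append(combined)
    return models

# (A and B) or (not A and C) as a decision node with primes A and not A
node = ('decision', [(('lit', 'A', True), ('lit', 'B', True)),
                     (('lit', 'A', False), ('lit', 'C', True))])
print(enum_models(node))   # [{'A': True, 'B': True}, {'A': False, 'C': True}]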

Example 12.

: Let us reconsider Example 11, in which case all satisfying models of the PKB are given by:

To illustrate the effectiveness of the ME algorithm via knowledge compilation, consider Figure 3, which depicts the run-time of two algorithms for ME on a propositional KB. The first one is our algorithm that compiles the KB into an SDD before enumerating all the models. The other algorithm uses the Z3 SMT solver (https://github.com/Z3Prover/z3): it checks if a given KB is satisfiable; if that is the case, it retrieves a model, adds its negation to the KB and loops back to the start; if the KB is not satisfiable, it terminates.

Figure 3: Model enumeration with z3 vs WMI-SDD

Step 4: Integration

The task of this part of the framework is to compute the volume (VOL) (as defined in Def 3) for every satisfying model that was found in the previous step. The volume for a given model of the PKB is computed by integrating the weight function over the literals true in the model, where the bounds of the integral correspond to the refinement and truth value of a given propositional variable within the model. All such volumes are then summed together and give the WMI value of the given HKB.

Computing a volume for a given model consists of two things: firstly, we have to combine the refinements of predicates correctly, creating the bounds of integration, before actually integrating the weight function with respect to those variables and bounds. As discussed in the predicate abstraction and rewriting step, a given predicate (that has a refinement) consists at this point of a leading variable and a bound for this variable. The procedure for combining the bounds into an interval is given in Algorithm 1.

1:procedure Combine(leadVar, predicates, model)
2:     interval ← (−∞, ∞)
3:     for pred in predicates do
4:         if pred.leadVar != leadVar then
5:              continue
6:         if pred is false in model then
7:              bound ← negate(pred.bound)
8:         else
9:              bound ← pred.bound
10:        interval ← combine(interval, bound)
11:     return interval
Algorithm 1 Combining the intervals for a leadVar and model

Here the negation function negates the bound of a given predicate in the usual way, so for example the negation of an upper bound on the leading variable becomes a lower bound with the same term. The function combine then combines intervals via intersections; for example, intersecting an upper bound of 5 with a lower bound of 1 yields the interval (1, 5). This procedure is done for every variable referenced in the weight function, ensuring that we have a bound of integration for every such variable. Naturally, not all abstracted models have to be models of the original SMT theory, and this becomes reflected in the interval bounds. For example, suppose that a model makes both x ≤ 1 and x ≥ 5 true, abstracted as two propositions: the propositional abstraction then erroneously retrieves a model whose interval bounds are empty. After a new interval is created, we check whether such an impossible interval is obtained, in which case it is simply disregarded. (Naturally, the model should not be considered a model of the SMT theory either.) Correspondingly, the process of computing the WMI value is continued with the next model. Now that all the real bounds of integration are defined for the given model, the only step left before integrating is to enumerate all possible instantiations of the Boolean variables referenced in the weight function.
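
A runnable rendering of this procedure (our own simplified sketch with purely numeric bounds; in the actual pipeline the bounds are symbolic terms over later variables) could look as follows:

from dataclasses import dataclass

@dataclass
class Pred:
    name: str       # propositional variable abstracting the atom
    lead_var: str   # leading variable of the refinement
    kind: str       # 'upper' for leadVar <= bound, 'lower' for leadVar >= bound
    bound: float

def combine_bounds(lead_var, predicates, model):
    """Intersect all bounds that the model imposes on lead_var."""
    lower, upper = float('-inf'), float('inf')
    for pred in predicates:
        if pred.lead_var != lead_var:
            continue
        kind = pred.kind
        if not model[pred.name]:
            # a false literal flips the bound: not(x <= b) gives x >= b
            kind = 'lower' if kind == 'upper' else 'upper'
        if kind == 'upper':
            upper = min(upper, pred.bound)
        else:
            lower = max(lower, pred.bound)
    return (lower, upper)   # inconsistent (and discarded) if lower > upper

# A model where (x <= 3) holds and (x <= 1) is false yields the interval (1, 3).
preds = [Pred('p1', 'x', 'upper', 3.0), Pred('p2', 'x', 'upper', 1.0)]
print(combine_bounds('x', preds, {'p1': True, 'p2': False}))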

When it comes to the implementation of this part of the framework, we used two different integration methods. We support the integration module of the SciPy Python package (https://scipy.org/) to compute the definite integral for a given weight function, a set of intervals and the instantiations of the Boolean variables. Using this package allowed us to formalize the method as described above and perform inference in non-linear domains. However, this formulation is not exact and suffers from a slow runtime. For this reason we also implemented the pipeline using LattE (https://www.math.ucdavis.edu/~latte/), an exact integration software package that is particularly well-suited for piecewise polynomial density approximations.
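
For instance, a volume with variable-dependent bounds can be handed to scipy.integrate.nquad, whose ranges may be callables of the outer integration variables; the weight x·y and the bounds below are our own illustration, not a benchmark instance.

from scipy import integrate

# Volume of the model (0 <= y <= 2) and (y <= x <= 3) under the weight x*y:
# the innermost integral is over x, whose lower bound depends on y.
weight = lambda x, y: x * y
ranges = [lambda y: (y, 3),   # bounds for x, given y
          (0, 2)]             # bounds for y
value, abs_error = integrate.nquad(weight, ranges)
print(value)   # approximately 7.0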

4 Empirical Evaluation

Here, we evaluate the proposed framework on the time it needs to compute the WMI of a given HKB and weight function. It is a proof-of-concept system for WMI via SDDs. In order to evaluate the framework, we randomly generate problems, as described below, and compare the time to the WMI-PA framework developed in [29]. (We were unable to compare the performance with the framework developed in [25] owing to compatibility issues in the experimental setup. Since it is reported to perform comparably to [29], all comparisons made in this paper are in reference to the pipeline developed in [29].)

Problem Set Generator

A problem is generated based on 3 factors: the number of variables, the number of clauses and the percentage of real variables it should have. The algorithm for generating a problem is given in Algorithm 2.

1:procedure generate(nbClauses, nbVars, percContinuous)
2:     nbRealVars ← nbVars · percContinuous
3:     generate a Boolean atom for each Boolean variable ID
4:     generate a real atom for each real variable ID
5:     clauses ← []
6:     for idx in 1..nbClauses do
7:          sample a subset of the generated atoms
8:          append their disjunction to clauses
9:     return the conjunction of all clauses
Algorithm 2 Generating problem sets

Here, the Boolean and real atom generators produce a new atom for the given variable ID. When generating a new Boolean atom, we simply return a Boolean variable with the given ID, whereas generating a real atom is more intricate and depends on the kind of HKB we are generating (i.e., LRA vs NRA). For both background theories, we generate a constant interval for a given variable ID with some fixed probability. Otherwise, we pick two random subsets of all other real variables for the upper and lower bound respectively. Now, if we are generating an HKB with respect to the background theory LRA, we sum all variables in the upper as well as the lower bound, to create a linear function as the upper and lower bound for the variable with the given ID. Similarly, when generating an HKB with respect to the background theory NRA, we combine the variables of a given set by multiplication rather than by addition. Finally, when creating such an interval, we additionally add a constant interval for the same variable ID to make sure our integration is definite and evaluates to a real number.

In order to evaluate our framework, we let the number of variables range from 2 to 28, testing several clause counts for each number of variables. For each variable-clause pair, we generate two problem instances where the percentage of continuous variables is set to 50%.

Results

First, we discuss the performance of our framework on non-linear hybrid domains. As part of this experiment, the generated HKBs consist of non-linear atoms which are products of variables (e.g., atoms mentioning terms such as x · y). Since we are, to the best of our knowledge, the first to be able to compute the WMI for such KBs, a comparison to other algorithms cannot be made. However, Figure 4 plots the average time spent in each computational step for all problems that have the same number of variables. Here we see that the overall time increases with the number of variables, as expected. While most of the steps have a rather small impact on the overall computational time, the integration step has by far the greatest cost. This is in part due to the SciPy integration method, which was used for these benchmarks, as it can cope with non-linear bounds but is not as efficient as the LattE integration package. Finally, we want to point out the surprisingly small cost of compiling the PKB into an SDD, which reinforces our decision to use knowledge compilation.

Figure 4: Runtime analysis of WMI-SDD framework for non-linear HKBs

Next, we discuss the performance of the WMI-SDD framework on linear HKBs against the current state-of-the-art WMI solver, the WMI-PA framework [29]. The results are plotted in Figure 5 and demonstrate the overall impact of using knowledge compilation as part of the framework. While the additional step of compiling the abstracted PKB into an SDD results in longer computational time for small problem instances, the trade-off shows its advantage as we increase the number of variables. Considering the logarithmic scale of the y-axis, the difference between the two algorithms becomes quite substantial as the number of variables exceeds 20. By extension, we believe the WMI-SDD framework shows tremendous promise for scaling WMI to large domains in the future.

Figure 5: Total runtime comparison WMI-SDD vs WMI-PA for linear HKBs

Before concluding this section, we remark that readers familiar with propositional model counters are likely to be surprised by the total variable size being less than 50 in our experiments and in other WMI solvers [29]. Contrast this with SDD evaluations that scale to hundreds of propositional variables [18, 15]. The main bottleneck here is symbolic integration, even if in isolation solvers such as LattE come with strong polynomial time bounds [4]. This is because integration has to be performed for each model, and so with n variables and a knowledge base whose abstraction admits exponentially many models, there can be on the order of 2^n integration computations in the worst case. That is, there can be 2^n models on abstraction, and in each model, we may have up to n integration variables.

There are a number of possible ways to address that concern. First, a general solution is to simply focus on piecewise constant potentials, in which case, after abstraction, WMI over an HKB immediately reduces to a WMC task over the corresponding PKB. Second, parallelisation can be enabled. For example, we can decompose a CNF formula into components, which are CNF formulas themselves, the idea being that components do not share variables [23]. In this case, the model count of a formula Δ with components Δ_1, …, Δ_k would be MC(Δ) = ∏_i MC(Δ_i). Third, one can keep a dictionary of partial computations of the integration (that is, cache the computed integrals), and leverage these values where applicable. While we do not explore such possibilities in this article, we feel that the ability of SDDs to scale, as well as their ability to enable parallelisation, can be seen as additional justification for our approach. We also suspect that it should be fairly straightforward to implement such choices given the modular way our solver is realized.
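
For the third suggestion, one simple (hypothetical) realization is to memoize symbolic integrals by the weight expression and the integration bounds, so that abstracted models sharing the same integrand and intervals reuse the computed value; the sketch below uses sympy and is our illustration, not part of the evaluated system.

from functools import lru_cache
import sympy as sp

@lru_cache(maxsize=None)
def cached_volume(weight_str, bounds):
    """Integrate the weight over the given (hashable) bounds, caching results.
    bounds is a tuple of (variable name, lower, upper) triples."""
    expr = sp.sympify(weight_str)
    for var, lo, hi in bounds:
        expr = sp.integrate(expr, (sp.Symbol(var), lo, hi))
    return expr

print(cached_volume("x*y", (("x", 0, 1), ("y", 0, 2))))   # 1
print(cached_volume("x*y", (("x", 0, 1), ("y", 0, 2))))   # served from the cache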

5 Summary

In this paper we introduced a novel way of performing WMI by leveraging efficient predicate abstraction and knowledge compilation. Using SDDs to represent the abstracted HKBs enabled us to make full use of the structural properties of SDDs and devise an efficient algorithm for retrieving all satisfying models. The evaluations demonstrate the competitiveness of our framework and reinforce our hypothesis that knowledge compilation is worth considering even in continuous domains. We were also able to deal with non-linear constraints for the first time.

In the future, we would like to better explore how the integration bottleneck can be addressed, possibly by caching integration computations.

Acknowledgements

This work was supported by the Engineering and Physical Sciences Research Council
(grant EP/L01503X/1), EPSRC Centre for Doctoral Training in Pervasive Parallelism at the University of Edinburgh, School of Informatics. This work is also partly supported by the EPSRC grant Towards Explainable and Robust Statistical AI: A Symbolic Approach.

References

  • [1] Aws Albarghouthi, Loris D’Antoni, Samuel Drews, and Aditya Nori. Quantifying program bias. arXiv preprint arXiv:1702.05437, 2017.
  • [2] Fahiem Bacchus, Shannon Dalmao, and Toniann Pitassi. Solving #SAT and Bayesian inference with backtracking search. Journal of Artificial Intelligence Research, 34:391–442, 2009.
  • [3] R. Iris Bahar, Erica A. Frohm, Charles M. Gaona, Gary D. Hachtel, Enrico Macii, Abelardo Pardo, and Fabio Somenzi. Algebraic decision diagrams and their applications. Formal Methods in System Design, 10(2-3):171–206, 1997.
  • [4] Velleda Baldoni, Nicole Berline, Jesus De Loera, Matthias Köppe, and Michèle Vergne. How to integrate a polynomial over a simplex. Mathematics of Computation, 80(273):297–325, 2011.
  • [5] Clark W. Barrett, Roberto Sebastiani, Sanjit A. Seshia, and Cesare Tinelli. Satisfiability modulo theories. In Armin Biere, Marijn Heule, Hans van Maaren, and Toby Walsh, editors, Handbook of Satisfiability,  185 of Frontiers in Artificial Intelligence and Applications, pages 825–885. IOS Press, 2009.
  • [6] Jessa Bekker, Jesse Davis, Arthur Choi, Adnan Darwiche, and Guy Van den Broeck. Tractable learning for complex probability queries. In Advances in Neural Information Processing Systems, pages 2242–2250, 2015.
  • [7] Vaishak Belle. Weighted model counting with function symbols. In Proceedings of the 33rd Conference on Uncertainty in Artificial Intelligence (UAI), 2017.
  • [8] Vaishak Belle, Andrea Passerini, and Guy Van den Broeck. Probabilistic inference in hybrid domains by weighted model integration. In Proceedings of 24th International Joint Conference on Artificial Intelligence (IJCAI), pages 2770–2776, 2015.
  • [9] Vaishak Belle, Guy Van den Broeck, and Andrea Passerini. Component caching in hybrid domains with piecewise polynomial densities. In AAAI, pages 3369–3375, 2016.
  • [10] Armin Biere, Marijn Heule, and Hans van Maaren. Handbook of Satisfiability, volume 185. IOS Press, 2009.
  • [11] Craig Boutilier, Nir Friedman, Moises Goldszmidt, and Daphne Koller. Context-specific independence in bayesian networks. In Proceedings of the Twelfth international conference on Uncertainty in artificial intelligence, pages 115–123. Morgan Kaufmann Publishers Inc., 1996.
  • [12] Rodrigo De Salvo Braz, Ciaran O’Reilly, Vibhav Gogate, and Rina Dechter. Probabilistic inference modulo theories. arXiv preprint arXiv:1605.08367, 2016.
  • [13] Mark Chavira and Adnan Darwiche. On probabilistic inference by weighted model counting. Artificial Intelligence, 172(6-7):772–799, 2008.
  • [14] Dmitry Chistikov, Rayna Dimitrova, and Rupak Majumdar. Approximate counting in smt and value estimation for probabilistic programs. Acta Informatica, 54(8):729–764, 2017.
  • [15] Arthur Choi and Adnan Darwiche. Dynamic minimization of sentential decision diagrams. In AAAI, 2013.
  • [16] Arthur Choi, Doga Kisa, and Adnan Darwiche. Compiling probabilistic graphical models using sentential decision diagrams. In European Conference on Symbolic and Quantitative Approaches to Reasoning and Uncertainty, pages 121–132. Springer, 2013.
  • [17] Adnan Darwiche. New advances in compiling cnf to decomposable negation normal form. In Proceedings of the 16th European Conference on Artificial Intelligence, pages 318–322. Citeseer, 2004.
  • [18] Adnan Darwiche. SDD: A new canonical representation of propositional knowledge bases. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), 2011.
  • [19] Adnan Darwiche and Pierre Marquis. A knowledge compilation map. Journal of Artificial Intelligence Research, 17(1):229–264, 2002.
  • [20] Jesus De Loera, Brandon Dutra, Matthias Koeppe, Stanislav Moreinis, Gregory Pinto, and Jianqiu Wu. Software for exact integration of polynomials over polyhedra. arXiv preprint arXiv:1108.0117, 2011.
  • [21] Jesús A De Loera, Raymond Hemmecke, Jeremiah Tauzer, and Ruriko Yoshida. Effective lattice point counting in rational convex polytopes. Journal of symbolic computation, 38(4):1273–1302, 2004.
  • [22] Daan Fierens, Guy Van den Broeck, Joris Renkens, Dimitar Shterionov, Bernd Gutmann, Ingo Thon, Gerda Janssens, and Luc De Raedt. Inference and learning in probabilistic logic programs using weighted boolean formulas. Theory and Practice of Logic Programming, 15(3):358–401, 2015.
  • [23] Carla P. Gomes, Ashish Sabharwal, and Bart Selman. Model counting. In Armin Biere, Marijn Heule, Hans van Maaren, and Toby Walsh, editors, Handbook of Satisfiability,  185 of Frontiers in Artificial Intelligence and Applications, pages 633–654. IOS Press, 2009.
  • [24] Doga Kisa, Guy Van den Broeck, Arthur Choi, and Adnan Darwiche. Probabilistic sentential decision diagrams. In KR, 2014.
  • [25] Samuel Kolb, Martin Mladenov, Scott Sanner, Vaishak Belle, and Kristian Kersting. Efficient symbolic integration for probabilistic inference. In IJCAI, pages 5031–5037, 2018.
  • [26] Daphne Koller and Nir Friedman. Probabilistic graphical models: principles and techniques. MIT press, 2009.
  • [27] Shuvendu K Lahiri, Robert Nieuwenhuis, and Albert Oliveras. Smt techniques for fast predicate abstraction. In International Conference on Computer Aided Verification, pages 424–437. Springer, 2006.
  • [28] Yitao Liang, Jessa Bekker, and Guy Van den Broeck. Learning the structure of probabilistic sentential decision diagrams. In Proceedings of the 33rd Conference on Uncertainty in Artificial Intelligence (UAI), 2017.
  • [29] Paolo Morettin, Andrea Passerini, and Roberto Sebastiani. Efficient weighted model integration via SMT-based predicate abstraction. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2017.
  • [30] Christian Muise, Sheila A. McIlraith, J. Christopher Beck, and Eric I. Hsu. Dsharp: Fast d-DNNF compilation with sharpSAT. In Canadian Conference on Artificial Intelligence, pages 356–361. Springer, 2012.
  • [31] Thammanit Pipatsrisawat and Adnan Darwiche. A lower bound on the size of decomposable negation normal form. In AAAI, 2010.
  • [32] Hoifung Poon and Pedro Domingos. Sum-product networks: A new deep architecture. In Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, pages 689–690. IEEE, 2011.
  • [33] Tian Sang, Paul Beame, and Henry A Kautz. Performing bayesian inference by weighted model counting. In AAAI,  5, pages 475–481, 2005.
  • [34] Scott Sanner, Karina Valdivia Delgado, and Leliane Nunes De Barros. Symbolic dynamic programming for discrete and continuous state mdps. arXiv preprint arXiv:1202.3762, 2012.
  • [35] Prakash P Shenoy and James C West. Inference in hybrid bayesian networks using mixtures of polynomials. International Journal of Approximate Reasoning, 52(5):641–657, 2011.
  • [36] Dan Suciu, Dan Olteanu, Christopher Ré, and Christoph Koch. Probabilistic databases. Synthesis lectures on data management, 3(2):1–180, 2011.
  • [37] Guy Van den Broeck and Adnan Darwiche. On the role of canonicity in knowledge compilation. In AAAI, pages 1641–1648, 2015.
  • [38] Jonas Vlasselaer, Joris Renkens, Guy Van den Broeck, and Luc De Raedt. Compiling probabilistic logic programs into sentential decision diagrams. In Proceedings Workshop on Probabilistic Logic Programming (PLP), Workshop on Probabilistic Logic Programming (PLP), Vienna, 17 July 2014, pages 1–10, July 2014.
  • [39] Wei Wei and Bart Selman. A new approach to model counting. In International Conference on Theory and Applications of Satisfiability Testing, pages 324–339. Springer, 2005.