Complexity of Propositional Abduction for Restricted Sets of Boolean Functions

Supported by ANR Algorithms and complexity 07-BLAN-0327-04 and DFG grant VO 630/6-1.


Nadia Creignou, Johannes Schmidt
Laboratoire d’Informatique Fondamentale, CNRS, Université d’Aix-Marseille II
163, avenue de Luminy, 13288 Marseille Cedex 9, France

Michael Thomas
Institut für Theoretische Informatik, Gottfried Wilhelm Leibniz Universität
Appelstr. 4, 30167 Hannover, Germany

Abstract

Abduction is a fundamental and important form of non-monotonic reasoning. Given a knowledge base explaining how the world behaves, it aims at finding an explanation for some observed manifestation. In this paper we focus on propositional abduction, where the knowledge base and the manifestation are represented by propositional formulae. The problem of deciding whether there exists an explanation has been shown to be $\Sigma^p_2$-complete in general. We consider variants obtained by restricting the allowed connectives in the formulae to certain sets of Boolean functions. We give a complete classification of the complexity for all possible sets of Boolean functions. In this way, we identify easier cases, namely $\mathrm{NP}$-complete and polynomial cases, and we highlight sources of intractability. Further, we address the problem of counting the explanations and draw a complete picture for the counting complexity.


Copyright © 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Introduction

Abduction is a fundamental and important form of non-monotonic reasoning. Given consistent knowledge about the world, we want to explain some observation. The task of finding such an explanation, or merely deciding whether one exists, is called abduction. Its application areas today include medical diagnosis (?), text analysis (?), system diagnosis (?), configuration problems (?), and temporal knowledge bases (?), and it has connections to default reasoning (?).

There are several approaches to formalizing the problem of abduction. In this paper, we focus on logic-based abduction in which the knowledge base is given as a set of propositional formulae. We are interested in deciding whether there exists an explanation, i.e., a set of literals that is consistent with the knowledge base and that, together with the knowledge base, entails the observation.

From a complexity-theoretic viewpoint, the abduction problem is very hard in the sense that it is $\Sigma^p_2$-complete and thus situated at the second level of the polynomial hierarchy (?). This intractability result raises the question of which restrictions lead to fragments of lower complexity. Several such restrictions have been considered in previous work. One of the most famous among them is Schaefer’s framework, where formulae are restricted to generalized conjunctive normal form with clauses from a fixed set of relations (???).

A similar yet different approach is to require formulae to be constructed from a restricted set $B$ of Boolean functions. Such formulae are called $B$-formulae. This approach was first taken by Lewis, who showed that the satisfiability problem is $\mathrm{NP}$-complete if and only if this set of Boolean functions has the ability to express the negation of implication connective $\not\rightarrow$ (?). Since then, this approach has been applied to a wide range of problems including equivalence and implication problems (??), satisfiability and model checking in modal and temporal logics (??), default logic (?), and circumscription (?), among others.

We follow this approach and show that Post’s lattice allows us to completely classify the complexity of propositional abduction for several variants and all possible sets of allowed Boolean functions. We first examine the case where the manifestation is represented by a literal. We show that, depending on the set $B$ of allowed connectives, the abduction problem is either $\Sigma^p_2$-complete, or $\mathrm{NP}$-complete, or in $\mathrm{P}$ and $\oplus\mathrm{L}$-hard, or in $\mathrm{L}$. More precisely, we prove that this abduction problem is $\Sigma^p_2$-complete as soon as $B$ can express one of the functions $x \vee (y \wedge \neg z)$, $x \wedge (y \vee \neg z)$ or $(x \wedge y) \vee (x \wedge \neg z) \vee (y \wedge \neg z)$. It drops to $\mathrm{NP}$-complete when all functions in $B$ are monotonic and $B$ can express one of the functions $x \vee (y \wedge z)$, $x \wedge (y \vee z)$ or $(x \wedge y) \vee (x \wedge z) \vee (y \wedge z)$. The problem becomes solvable in polynomial time and is $\oplus\mathrm{L}$-hard if $B$-formulae may depend on more than one variable while being representable as linear equations. Finally, the complexity drops to $\mathrm{L}$ in all remaining cases.

We then examine several variants of the propositional abduction problem. The variants considered are obtained by restricting the representation of the manifestation to be, respectively, a clause, a term or a $B$-formula. We present a complete classification in all cases. An overview of the results is given in Figure 1. Our results highlight the sources of intractability and exhibit properties of Boolean functions that lead to an increase of the complexity of abduction.

In (?) the authors obtained a complexity classification of the abduction problem for formulae in generalized conjunctive normal form, with clauses from a fixed set of relations. The two classifications are in the same vein, since both classify the complexity of abduction under local restrictions on the knowledge base. However, the two results are incomparable, in the sense that neither classification can be deduced from the other. They only overlap in the particular case of the linear connective $\oplus$, for which both types of sets of formulae can be seen as systems of linear equations. This special abduction case has been shown to be decidable in polynomial time in (?).

Besides the decision problem, another natural question concerns the number of explanations, that is, the counting problem for abduction. The study of the counting complexity of abduction was started by Hermann and Pichler (?). We prove here a trichotomy theorem showing that counting the full explanations of propositional abduction problems is either $\#{\cdot}\mathrm{coNP}$-complete or $\#\mathrm{P}$-complete or in $\mathrm{FP}$, depending on the set of allowed connectives.

The rest of the paper is structured as follows. We first give the necessary preliminaries. Afterwards, we define the abduction problem considered herein. We then classify the complexity of the abduction of a single literal. These results are complemented with the complexity of the abduction problem for clauses, terms and restricted formulae. Next, we consider the counting problem and finally conclude with a discussion of the results.

Preliminaries

Complexity Theory

We require standard notions of complexity theory. For the decision problems, the arising complexity degrees encompass the classes $\mathrm{L}$, $\mathrm{P}$, $\mathrm{NP}$, and $\Sigma^p_2$. For more background information, the reader is referred to (?). We furthermore require the class $\oplus\mathrm{L}$, defined as the class of languages $A$ such that there exists a nondeterministic logspace Turing machine that exhibits an odd number of accepting paths on input $x$ if and only if $x \in A$, for all $x$ (?). It holds that $\mathrm{L} \subseteq \oplus\mathrm{L} \subseteq \mathrm{P}$. For our hardness results we consider logspace many-one reductions, defined as follows: a language $A$ is logspace many-one reducible to some language $A'$ (written $A \le^{\log}_m A'$) if there exists a logspace-computable function $f$ such that $x \in A$ if and only if $f(x) \in A'$.

A counting problem is represented using a witness function $w$, which for every input $x$ returns a finite set of witnesses. This witness function gives rise to the following counting problem: given an instance $x$, find the cardinality $|w(x)|$ of the witness set $w(x)$. The class $\#\mathrm{P}$ is the class of counting problems naturally associated with decision problems in $\mathrm{NP}$. According to (?), if $\mathcal{C}$ is a complexity class of decision problems, we define $\#{\cdot}\mathcal{C}$ to be the class of all counting problems whose witness function $w$ is such that the size of every witness $y$ of $x$ is polynomially bounded in the size of $x$, and checking whether $y \in w(x)$ is in $\mathcal{C}$. Thus, we have $\#\mathrm{P} = \#{\cdot}\mathrm{P}$ and $\#\mathrm{P} \subseteq \#{\cdot}\mathrm{coNP}$. Completeness of counting problems is usually proved by means of Turing reductions. A stronger notion is the parsimonious reduction, where the exact number of solutions is preserved by the reduction function.

Propositional formulae

We assume familiarity with propositional logic. The set of all propositional formulae is denoted by $\mathcal{L}$. A model for a formula $\varphi$ is a truth assignment to the set of its variables that satisfies $\varphi$. Further, we denote by $\varphi[\alpha/\beta]$ the formula obtained from $\varphi$ by replacing all occurrences of $\alpha$ with $\beta$. For a given set $\Gamma$ of formulae, we write $\mathrm{Vars}(\Gamma)$ to denote the set of variables occurring in $\Gamma$. We identify finite $\Gamma$ with the conjunction of all the formulae in $\Gamma$, $\bigwedge_{\varphi \in \Gamma} \varphi$. For any formula $\varphi$, we write $\Gamma \models \varphi$ if $\Gamma$ entails $\varphi$, i.e., if every model of $\Gamma$ also satisfies $\varphi$.

A literal $l$ is a variable $x$ or its negation $\neg x$; $x$ is called the atom of $l$ and is denoted by $|l|$. Given a set of variables $V$, $\mathrm{Lits}(V)$ denotes the set of all literals formed upon the variables in $V$, i.e., $\mathrm{Lits}(V) = V \cup \{\neg x \mid x \in V\}$. A clause is a disjunction of literals and a term is a conjunction of literals.
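To make the entailment relation concrete, the following minimal Python sketch checks it by enumerating all assignments. Representing formulae as Python predicates over a dictionary, as well as the helper names `models` and `entails`, are assumptions made purely for illustration; the exponential enumeration is not meant as an efficient procedure.

```python
from itertools import product

def models(formula, variables):
    """Yield every assignment (dict: variable -> bool) that satisfies `formula`."""
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if formula(assignment):
            yield assignment

def entails(gamma, phi, variables):
    """Return True if every model of all formulae in `gamma` also satisfies `phi`."""
    conjunction = lambda a: all(g(a) for g in gamma)
    return all(phi(a) for a in models(conjunction, variables))

# Illustrative formulae over the variables x and y.
x_implies_y = lambda a: (not a["x"]) or a["y"]
fact_x = lambda a: a["x"]
query_y = lambda a: a["y"]
```

With these definitions, `entails([x_implies_y, fact_x], query_y, ["x", "y"])` holds, whereas the implication alone does not entail `y`.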

Clones of Boolean Functions

A clone is a set of Boolean functions that is closed under superposition, i.e., it contains all projections (that is, the functions $f(a_1,\dots,a_n) = a_k$ for $1 \le k \le n$ and $n \in \mathbb{N}$) and is closed under arbitrary composition. Let $B$ be a finite set of Boolean functions. We denote by $[B]$ the smallest clone containing $B$ and call $B$ a base for $[B]$. All closed classes of Boolean functions were identified by Post (?). Post also found a finite base for each of them and detected their inclusion structure, hence the name of Post’s lattice (see Figure 1).

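In order to define the clones, we require the following notions, where $f$ is an $n$-ary Boolean function:

  • $f$ is $c$-reproducing if $f(c,\dots,c) = c$, $c \in \{0,1\}$.

  • $f$ is monotonic if $a_1 \le b_1, \dots, a_n \le b_n$ implies $f(a_1,\dots,a_n) \le f(b_1,\dots,b_n)$.

  • $f$ is $c$-separating of degree $k$ if for all $A \subseteq f^{-1}(c)$ of size $|A| = k$ there exists an $i \in \{1,\dots,n\}$ such that $(a_1,\dots,a_n) \in A$ implies $a_i = c$, $c \in \{0,1\}$.

  • $f$ is $c$-separating if $f$ is $c$-separating of degree $|f^{-1}(c)|$.

  • $f$ is self-dual if $f \equiv \neg f(\neg x_1,\dots,\neg x_n)$.

  • $f$ is affine if $f \equiv x_1 \oplus \cdots \oplus x_n \oplus c$ with $c \in \{0,1\}$.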
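Some of the properties above can be checked mechanically on small functions. The sketch below is an illustration only: it assumes that a Boolean function is given as a Python callable on 0/1 arguments together with its arity, all helper names are hypothetical, and `is_affine` uses the general form $c_0 \oplus c_1 x_1 \oplus \cdots \oplus c_n x_n$ (so fictitious variables are allowed). The separating properties are omitted.

```python
from itertools import product

def tuples(n):
    """All 0/1 tuples of length n."""
    return list(product((0, 1), repeat=n))

def is_c_reproducing(f, n, c):
    """f(c, ..., c) == c."""
    return f(*([c] * n)) == c

def is_monotonic(f, n):
    """a <= b componentwise implies f(a) <= f(b)."""
    return all(f(*a) <= f(*b)
               for a in tuples(n) for b in tuples(n)
               if all(x <= y for x, y in zip(a, b)))

def is_self_dual(f, n):
    """f(a) equals the negation of f applied to the complemented tuple."""
    return all(f(*a) == 1 - f(*[1 - x for x in a]) for a in tuples(n))

def is_affine(f, n):
    """f agrees with c0 xor (c1 & x1) xor ... xor (cn & xn) everywhere."""
    c0 = f(*([0] * n))
    # Coefficient c_i is read off from the i-th unit vector.
    coeffs = [f(*[1 if j == i else 0 for j in range(n)]) ^ c0 for i in range(n)]
    return all(f(*a) == c0 ^ (sum(c & x for c, x in zip(coeffs, a)) % 2)
               for a in tuples(n))

# Illustrative functions.
maj = lambda x, y, z: int(x + y + z >= 2)
xor3 = lambda x, y, z: x ^ y ^ z
or2 = lambda x, y: x | y
```

For instance, the majority function passes the monotonicity and self-duality checks but not the affinity check, while the ternary exclusive-or is affine and self-dual but not monotonic.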

A list of all clones with definitions and finite bases is given in Table 1; see also, e.g., (?). A propositional formula using only functions from $B$ as connectives is called a $B$-formula. The set of all $B$-formulae is denoted by $\mathcal{L}(B)$. Let $f$ be an $n$-ary Boolean function. A $B$-formula such that is a $B$-representation of $f$ if for all it holds that if and only if every with and for all relevant , satisfies . Such a $B$-representation exists for every $f \in [B]$. Yet, it may happen that the $B$-representation of some function uses some input variable more than once.

Example 1

Let . An -representation of the function is .

Name | Definition | Base
$\mathsf{BF}$ | All Boolean functions | $\{\wedge, \neg\}$
$\mathsf{V}$ | $f$ is a disjunction of variables or constants | $\{\vee, 0, 1\}$
$\mathsf{E}$ | $f$ is a conjunction of variables or constants | $\{\wedge, 0, 1\}$
$\mathsf{N}$ | $f$ depends on at most one variable | $\{\neg, 0, 1\}$

Table 1: The list of all Boolean clones with definitions and bases, where and .

Figure 1: Post’s lattice showing the complexity of the abduction problem for all sets of Boolean functions and considered restrictions of the manifestations.

Observe that if $B_1$ and $B_2$ are two sets of Boolean functions such that $[B_1] \subseteq [B_2]$, then every function of $B_1$ can be expressed by a $B_2$-formula, its so-called $B_2$-representation.

The Abduction Problem

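Let $B$ be a finite set of Boolean functions. We are interested in a propositional abduction problem parameterized by the set $B$ of allowed connectives. We define the abduction problem for $B$-formulae as

  • Instance: $P = (\Gamma, A, \varphi)$, where

    • $\Gamma$ is a set of $B$-formulae, $\Gamma \subseteq \mathcal{L}(B)$,

    • $A$ is a set of variables, $A \subseteq \mathrm{Vars}(\Gamma)$,

    • $\varphi$ is a formula, with $\mathrm{Vars}(\varphi) \subseteq \mathrm{Vars}(\Gamma)$.

  • Question: Is there a set $E \subseteq \mathrm{Lits}(A)$ such that $\Gamma \wedge E$ is satisfiable and $\Gamma \wedge E \models \varphi$ (or equivalently $\Gamma \wedge E \wedge \neg\varphi$ is unsatisfiable)?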

The set $\Gamma$ represents the knowledge base. The set $A$ is called the set of hypotheses and $\varphi$ is called the manifestation or query. Furthermore, if such a set $E$ exists, it is called an explanation or a solution of the abduction problem. It is called a full explanation if $\mathrm{Vars}(E) = A$. Observe that every explanation can be extended to a full one.
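The definition can be read as a brute-force procedure. The following Python sketch, under the assumption that formulae are given as predicates over a dictionary of truth values, enumerates all full explanations of a small instance; the function names and the toy instance are hypothetical and serve only as an illustration of the definition, not as an efficient algorithm.

```python
from itertools import product

def assignments(variables):
    """Yield all truth assignments (dict: variable -> bool) to `variables`."""
    for values in product([False, True], repeat=len(variables)):
        yield dict(zip(variables, values))

def full_explanations(gamma, hypotheses, manifestation, variables):
    """Return every full explanation as a dict hypothesis -> bool,
    where True stands for the positive and False for the negative literal."""
    others = [v for v in variables if v not in hypotheses]
    found = []
    for e in assignments(hypotheses):
        # Models of the knowledge base that agree with the literals chosen in e.
        mods = [a for a in ({**e, **rest} for rest in assignments(others))
                if all(g(a) for g in gamma)]
        consistent = len(mods) > 0                          # knowledge base + e satisfiable
        entailed = all(manifestation(a) for a in mods)      # knowledge base + e entails it
        if consistent and entailed:
            found.append(e)
    return found

# Toy instance: rain -> wet, sprinkler -> wet; manifestation: wet.
gamma = [lambda a: (not a["rain"]) or a["wet"],
         lambda a: (not a["sprinkler"]) or a["wet"]]
explanations = full_explanations(gamma, ["rain", "sprinkler"],
                                 lambda a: a["wet"],
                                 ["rain", "sprinkler", "wet"])
```

The length of the returned list is the number of full explanations, which is the quantity considered by the counting problem; the decision problem only asks whether the list is non-empty.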

We will consider several restrictions on the manifestations of this problem. To indicate these restrictions, we introduce a second argument : in the abduction problem , is required to be a single literal if , a clause if , a term if , and a -formula if .

Let us start with a lemma that clarifies the role of the two constants $0$ and $1$ in our problem.

Lemma 1

Let be a finite set of Boolean functions

  1. If , then

  2. If and , then

Proof

To reduce to we transform any instance of the first problem in replacing every occurrence of by a fresh variable and adding the unit clause to the knowledge base. To prove , let be an instance of the first problem and be a fresh variable. If , then we can suppose w.l.o.g. that does not contain . We map to , where is the -representation of .

The Complexity of Abduction with Literal Manifestations

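Theorem 1

Let $B$ be a finite set of Boolean functions. Then the abduction problem for propositional $B$-formulae with a single literal as manifestation is

  1. $\Sigma^p_2$-complete if $\mathsf{S}_{02} \subseteq [B]$ or $\mathsf{S}_{12} \subseteq [B]$ or $\mathsf{D}_1 \subseteq [B]$,

  2. $\mathrm{NP}$-complete if $\mathsf{S}_{00} \subseteq [B] \subseteq \mathsf{M}$ or $\mathsf{D}_2 \subseteq [B] \subseteq \mathsf{M}$ or $\mathsf{S}_{10} \subseteq [B] \subseteq \mathsf{M}$,

  3. in $\mathrm{P}$ and $\oplus\mathrm{L}$-hard if $\mathsf{L}_2 \subseteq [B] \subseteq \mathsf{L}$, and

  4. in $\mathrm{L}$ in all other cases.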

Remark 1

For such a classification a natural question is: given $B$, how hard is it to determine the complexity of the corresponding abduction problem? Solving this task requires checking whether certain clones are included in $[B]$ (for lower bounds) and whether $B$ itself is included in certain clones (for upper bounds). As shown in (?), the complexity of checking whether certain Boolean functions are included in a clone depends on the representation of the Boolean functions. If all functions are given by their truth table, then the problem is in quasi-polynomial-size $\mathrm{AC}^0$, while if the input functions are given in a compact way, i.e., by circuits, then the problem becomes $\mathrm{coNP}$-complete.

We split the proof of Theorem 1 into several propositions.

Proposition 1

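Let $B$ be a finite set of Boolean functions such that $[B] \subseteq \mathsf{N}$ or $[B] \subseteq \mathsf{E}$ or $[B] \subseteq \mathsf{V}$. Then the abduction problem for $B$-formulae with a single literal as manifestation is in $\mathrm{L}$.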

Proof

Let be an instance of .

For $[B] \subseteq \mathsf{N}$ or $[B] \subseteq \mathsf{E}$, the knowledge base is equivalent to a set of literals, hence the instance has the empty set as a solution if it possesses a solution at all. Finally, notice that satisfiability of a set of such formulae can be tested in logarithmic space (?).

For each formula is equivalent to either a constant or disjunction. It holds that has a solution if and only if contains a formula such that , and is satisfiable. This can be tested in logarithmic space, as substitution of symbols and evaluation of -formulae can all be performed in logarithmic space.

Proposition 2

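Let $B$ be a finite set of Boolean functions such that $\mathsf{L}_2 \subseteq [B] \subseteq \mathsf{L}$. Then the abduction problem for $B$-formulae with a single literal as manifestation is $\oplus\mathrm{L}$-hard and contained in $\mathrm{P}$.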

Proof

In this case, deciding whether an instance of has a solution logspace reduces to the problem of deciding whether a propositional abduction problem in which the knowledge base is a set of linear equations has a solution. This has been shown to be decidable in polynomial time in (?).

As for the -hardness, let be such that . Consider the -complete problem to determine whether a system of linear equations over has a solution (?). Note that is closed under complement, so deciding whether such a system has no solution is also -complete. Let be such a system of linear equations over variables . Then, for all , the equation is of the form with and . We map to a set of affine formulae over variables via

Now define

is obviously satisfied by the assignment mapping all propositions to . It furthermore holds that has no solution if and only if is unsatisfiable. Hence, we obtain that has no solution if and only if the propositional abduction problem has an explanation.
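The solvability question for linear systems used in this reduction can be illustrated by plain Gaussian elimination over the two-element field. The sketch below is an assumption-laden illustration: the input encoding (coefficient rows plus right-hand sides) and the function name are hypothetical, and the procedure runs in polynomial time rather than reflecting the parity-logspace bound discussed in the text.

```python
def solvable_gf2(rows, rhs):
    """Decide whether the system rows * x = rhs has a solution over GF(2).

    rows: list of coefficient lists with entries 0/1; rhs: list of 0/1.
    """
    aug = [list(r) + [b] for r, b in zip(rows, rhs)]   # augmented matrix
    n_vars = len(rows[0]) if rows else 0
    pivot_row = 0
    for col in range(n_vars):
        # Look for a row at or below pivot_row with a 1 in this column.
        sel = next((r for r in range(pivot_row, len(aug)) if aug[r][col]), None)
        if sel is None:
            continue
        aug[pivot_row], aug[sel] = aug[sel], aug[pivot_row]
        # Eliminate this column from all other rows (addition is XOR).
        for r in range(len(aug)):
            if r != pivot_row and aug[r][col]:
                aug[r] = [a ^ b for a, b in zip(aug[r], aug[pivot_row])]
        pivot_row += 1
    # The system is inconsistent exactly if some row now reads 0 = 1.
    return not any(all(v == 0 for v in row[:-1]) and row[-1] == 1 for row in aug)
```

For example, the system consisting of $x \oplus y = 1$ and $x \oplus y = 0$ is rejected, while $x \oplus y = 1$ together with $y = 1$ is accepted.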

It remains to transform the constructed set of affine formulae into a set of $B$-formulae in logarithmic space. Since $\mathsf{L}_2 \subseteq [B]$, we have $x \oplus y \oplus z \in [B]$. We insert parentheses in every formula in such a way that we get a ternary $\oplus$-tree of logarithmic depth whose leaves are either a proposition or the constant 1. Then we replace every node by its equivalent $B$-formula. Thus we get a $B$-formula of size polynomial in the size of the original one. Lemma 1 allows us to conclude.

Note that the $B$-formulae replacing the connectives $\oplus$ might use some input variable more than once. Therefore, the logarithmic-depth tree is built in order to avoid an exponential explosion of the formula size during the replacement.
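The parenthesization step can be sketched as follows. This is a minimal illustration under stated assumptions: leaves are given as strings, inner nodes are tuples tagged with the hypothetical label `"xor3"`, and the number of leaves is odd, since every ternary node merges three subformulae into one.

```python
def xor3_tree(leaves):
    """Arrange an odd number of leaves into a ternary xor tree of logarithmic depth."""
    if len(leaves) % 2 == 0:
        raise ValueError("a ternary xor tree needs an odd number of leaves")
    level = list(leaves)
    while len(level) > 1:
        cut = len(level) - len(level) % 3
        # Group consecutive triples; up to two leftover items move up unchanged.
        nxt = [("xor3", *level[i:i + 3]) for i in range(0, cut, 3)]
        nxt.extend(level[cut:])
        level = nxt
    return level[0]

def depth(tree):
    """Nesting depth of the tree; a leaf has depth 0."""
    return 0 if isinstance(tree, str) else 1 + max(depth(c) for c in tree[1:])

def evaluate(tree, assignment):
    """Evaluate the tree under a 0/1 assignment to its leaves."""
    if isinstance(tree, str):
        return assignment[tree]
    return (evaluate(tree[1], assignment) ^ evaluate(tree[2], assignment)
            ^ evaluate(tree[3], assignment))
```

Each round shrinks the number of subformulae to roughly a third, so the depth grows logarithmically; replacing every node by a fixed representation then multiplies the size by a constant factor per level, which keeps the result polynomial even if that representation duplicates its arguments.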

Observe that the abduction problem for $B$-formulae is self-reducible for the above cases, i.e., for the sets $B$ covered by Propositions 1 and 2. Roughly speaking, this means that, given an instance and a literal, we can efficiently compute a second instance such that the question whether the first instance has an explanation containing that literal reduces to the question whether the second instance admits solutions. It is well known that for self-reducible problems whose decision problem is in $\mathrm{P}$, the lexicographically first solution can be computed in $\mathrm{FP}$. It is an easy exercise to extend this algorithm to enumerate all solutions in lexicographical order with polynomial delay and polynomial space. Thus, according to Propositions 1 and 2, the explanations can be enumerated with polynomial delay and polynomial space in all of these cases.

Proposition 3

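Let $B$ be a finite set of Boolean functions such that $\mathsf{S}_{00} \subseteq [B] \subseteq \mathsf{M}$ or $\mathsf{D}_2 \subseteq [B] \subseteq \mathsf{M}$ or $\mathsf{S}_{10} \subseteq [B] \subseteq \mathsf{M}$. Then the abduction problem for $B$-formulae with a single literal as manifestation is $\mathrm{NP}$-complete.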

Proof

We first show that the problem is efficiently verifiable. Let an instance and a candidate $E$ for an explanation be given. Define $\Gamma'$ as the set of formulae obtained from the knowledge base $\Gamma$ by replacing each occurrence of a proposition $x$ with $1$ if $x \in E$, and with $0$ if $\neg x \in E$. It holds that $E$ is a solution for the instance if $\Gamma'$ is satisfiable and $\Gamma'$ together with the negated manifestation is not. These tests can be performed in polynomial time, because $\Gamma'$ is a set of monotonic formulae (?). Hence, the problem is in $\mathrm{NP}$.
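One way to see why such tests are cheap for monotonic formulae is that a set of monotonic formulae is satisfiable exactly if the all-ones assignment satisfies it. The sketch below illustrates this idea for a positive literal as manifestation; it is an assumption about how the verification could be carried out, not the procedure of the cited reference, and the function name and encoding are hypothetical.

```python
def is_explanation_monotone(gamma, variables, e, q):
    """Check a candidate explanation for monotone formulae.

    gamma: list of monotone predicates over a dict variable -> 0/1;
    e: dict fixing the hypothesis variables to 0/1;
    q: manifestation, assumed to be a variable (positive literal) not fixed by e.
    """
    # Largest assignment compatible with e: every free variable is set to 1.
    top = {v: 1 for v in variables}
    top.update(e)
    satisfiable = all(g(top) for g in gamma)
    # Largest assignment compatible with e that falsifies q.
    top_without_q = dict(top)
    top_without_q[q] = 0
    refuted = not all(g(top_without_q) for g in gamma)
    return satisfiable and refuted

# Toy knowledge base consisting of the single monotone formula a or q.
gamma = [lambda s: s["a"] | s["q"]]
```

Here `is_explanation_monotone(gamma, ["a", "q"], {"a": 0}, "q")` succeeds, because fixing `a` to 0 forces `q`, whereas fixing `a` to 1 does not.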

Next we give a reduction from the -complete problem , i.e., the problem to decide whether there exists an assignment that satisfies exactly two propositions in each clause of a given formula in conjunctive normal form with exactly three positive propositions per clause, see (?). Let with , , be the given formula. We map to the following instance . Let , , , be fresh, pairwise distinct propositions and let . We define as

(1)
(2)
(3)

We show that there is an assignment that sets to true exactly two propositions in each clause of if and only if has a solution. First, suppose that there exists an assignment such that for all , there is a permutation of such that  and . Thus (1) and (2) are satisfied, and (3) is equivalent to . From this, it is readily observed that is a solution to .

Conversely, suppose that has an explanation that is w.l.o.g. full. Then is satisfiable and . Let be an assignment that satisfies . Then, for any , if , and otherwise. Since entails and as the only occurrence of is in (3), we obtain that sets to each and at least one proposition in each clause of . Consequently, from (2) follows that sets to at least two propositions in each clause of . Therefore, sets to exactly two propositions in each clause of .

It remains to show that can be transformed into an -instance for all considered . Observe that and . Therefore due to Lemma 1 it suffices to consider the case . Using the associativity of rewrite (3) as an -tree of logarithmic depth and replace all the connectives in by their B-representation ().
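For reference, the source problem of this reduction (deciding whether some assignment sets exactly two propositions to true in every clause of a positive 3-CNF formula) can be stated as a brute-force search. The encoding of clauses as triples of variable names and the function name are assumptions made for this illustration.

```python
from itertools import combinations, product

def two_in_three_sat(clauses, variables):
    """Return an assignment with exactly two true propositions per clause, or None.

    clauses: list of 3-tuples of variable names (all occurrences positive).
    """
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(sum(assignment[v] for v in clause) == 2 for clause in clauses):
            return assignment
    return None

# All four triples over {a, b, c, d}: no assignment works, since the number of
# true occurrences would have to be 8, which is not a multiple of 3.
unsat_clauses = list(combinations(["a", "b", "c", "d"], 3))
```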

Proposition 4

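Let $B$ be a finite set of Boolean functions such that $\mathsf{S}_{02} \subseteq [B]$ or $\mathsf{S}_{12} \subseteq [B]$ or $\mathsf{D}_1 \subseteq [B]$. Then the abduction problem for $B$-formulae with a single literal as manifestation is $\Sigma^p_2$-complete.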

Proof

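Membership in $\Sigma^p_2$ is easily seen to hold: given an instance, guess an explanation $E$ and subsequently verify that the knowledge base together with $E$ is satisfiable and that the knowledge base together with $E$ and the negated manifestation is not.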

Observe that . By virtue of Lemma 1 and the fact that , it suffices to consider the case . In (?) it has been shown that the propositional abduction problem remains -complete when the knowledge base is a set of CNF-formulae. From such an instance we build an instance of by rewriting first each formula as a tree of logarithmic depth and then replacing all the connectives , and by their -representation, thus concluding the proof.

Complexity of the Variants

We now turn to the study of the complexity of some variants of the abduction problem. It is obvious that and that . Therefore, all hardness results still hold for the variants and . Also, it can be easily checked that the hardness results in the previous sections still hold when the query is required to be a positive literal. For this reason the hardness results also carry over to the variant .

It is an easy exercise to prove that all algorithms that have been developed for a single query can be naturally extended to clauses. Therefore, the complexity classification for clauses as manifestations is exactly the same as for single literals.

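Theorem 2

Let $B$ be a finite set of Boolean functions. Then the abduction problem for propositional $B$-formulae with a clause as manifestation is

  1. $\Sigma^p_2$-complete if $\mathsf{S}_{02} \subseteq [B]$ or $\mathsf{S}_{12} \subseteq [B]$ or $\mathsf{D}_1 \subseteq [B]$,

  2. $\mathrm{NP}$-complete if $\mathsf{S}_{00} \subseteq [B] \subseteq \mathsf{M}$ or $\mathsf{D}_2 \subseteq [B] \subseteq \mathsf{M}$ or $\mathsf{S}_{10} \subseteq [B] \subseteq \mathsf{M}$,

  3. in $\mathrm{P}$ and $\oplus\mathrm{L}$-hard if $\mathsf{L}_2 \subseteq [B] \subseteq \mathsf{L}$, and

  4. in $\mathrm{L}$ in all other cases.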

More interestingly, we will prove in the next sections that allowing terms as manifestations increases the complexity for the clones $[B]$ with $\mathsf{V}_2 \subseteq [B] \subseteq \mathsf{V}$ (from membership in $\mathrm{L}$ to $\mathrm{NP}$-completeness), while allowing $B$-formulae as manifestations makes the classification dichotomous, $\mathrm{P}$/$\Sigma^p_2$-complete, thus skipping the intermediate $\mathrm{NP}$ level.

The Complexity of Abduction with Term Manifestations

Proposition 5

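Let $B$ be a finite set of Boolean functions such that $\mathsf{V}_2 \subseteq [B] \subseteq \mathsf{V}$. Then the abduction problem for $B$-formulae with a term as manifestation is $\mathrm{NP}$-complete.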

Proof

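Let $B$ be a finite set of Boolean functions such that $\mathsf{V}_2 \subseteq [B] \subseteq \mathsf{V}$ and let an instance be given. Hence, the knowledge base $\Gamma$ is a set of $B$-formulae and the manifestation is a term, i.e., a conjunction of literals $l_1 \wedge \cdots \wedge l_k$. Observe that a set $E$ of literals over the hypotheses is a solution if $\Gamma \wedge E$ is satisfiable and, for every $i$, $\Gamma \wedge E \wedge \neg l_i$ is not. Given a set $E$, these verifications, which require substitution of symbols and evaluation of the resulting formulae, can be performed in polynomial time, thus proving membership in $\mathrm{NP}$.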

To prove -hardness, we give a reduction from 3Sat. Let be a 3-CNF-formula, . Let enumerate the variables occurring in . Let and be fresh, pairwise distinct variables. We map to , where

We show that is satisfiable if and only if has a solution. First assume that is satisfied by the assignment . Define and as the extension of mapping and for all . Obviously, . Furthermore, for all , because any satisfying assignment of sets to either or and thus . Hence is an explanation for .

Conversely, suppose that has a full explanation . The facts that and that each occurs only in the clauses enforce that, for every , contains or . Because of the clause , it cannot contain both. Therefore in the value of is determined by the value of and is its dual. From this it is easy to conclude that the assignment defined by if , and otherwise, satisfies . Finally can be transformed into an -instance, because every formula in is the disjunction of at most three variables and .

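Theorem 3

Let $B$ be a finite set of Boolean functions. Then the abduction problem for propositional $B$-formulae with a term as manifestation is

  1. $\Sigma^p_2$-complete if $\mathsf{S}_{02} \subseteq [B]$ or $\mathsf{S}_{12} \subseteq [B]$ or $\mathsf{D}_1 \subseteq [B]$,

  2. $\mathrm{NP}$-complete if $\mathsf{V}_2 \subseteq [B] \subseteq \mathsf{M}$ or $\mathsf{S}_{10} \subseteq [B] \subseteq \mathsf{M}$ or $\mathsf{D}_2 \subseteq [B] \subseteq \mathsf{M}$,

  3. in $\mathrm{P}$ and $\oplus\mathrm{L}$-hard if $\mathsf{L}_2 \subseteq [B] \subseteq \mathsf{L}$, and

  4. in $\mathrm{L}$ in all other cases.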

The Complexity of Abduction with $B$-Formula Manifestations

Proposition 6

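Let $B$ be a finite set of Boolean functions such that $\mathsf{S}_{00} \subseteq [B]$ or $\mathsf{D}_2 \subseteq [B]$ or $\mathsf{S}_{10} \subseteq [B]$. Then the abduction problem for $B$-formulae with a $B$-formula as manifestation is $\Sigma^p_2$-complete.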

Proof

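We prove $\Sigma^p_2$-hardness by giving a reduction from the $\Sigma^p_2$-hard problem of evaluating quantified Boolean formulae with an $\exists\forall$ prefix (?). Let an instance be given by a closed formula $\exists x_1 \cdots \exists x_n \forall y_1 \cdots \forall y_m\, \varphi$ with $\varphi$ a 3-DNF-formula. First observe that this formula is true if and only if there exists a consistent set $X$ of literals over $\{x_1,\dots,x_n\}$ such that $X \cap \{x_i, \neg x_i\} \neq \emptyset$, for all $1 \le i \le n$, and $\neg X \vee \varphi$ is (universally) valid (or equivalently $X \wedge \neg\varphi$ is unsatisfiable).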
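The truth condition of such quantified formulae can be spelled out by exhaustive search. The sketch below assumes an existential block followed by a universal block and a matrix in disjunctive normal form; the encoding of terms as lists of (variable, polarity) pairs and the function names are hypothetical choices for illustration.

```python
from itertools import product

def eval_dnf(terms, assignment):
    """A DNF is true if some term has all of its literals satisfied."""
    return any(all(assignment[v] == polarity for v, polarity in term)
               for term in terms)

def exists_forall(terms, xs, ys):
    """Return a witness assignment to xs such that the DNF holds for every
    assignment to ys, or None if there is no such witness."""
    for x_values in product([False, True], repeat=len(xs)):
        witness = dict(zip(xs, x_values))
        if all(eval_dnf(terms, {**witness, **dict(zip(ys, y_values))})
               for y_values in product([False, True], repeat=len(ys))):
            return witness
    return None

# (x and y) or (x and not y) holds for all y once x is true.
true_instance = [[("x", True), ("y", True)], [("x", True), ("y", False)]]
# (x and y) alone fails for y = False, whatever x is.
false_instance = [[("x", True), ("y", True)]]
```

A returned witness corresponds to the consistent set of literals over the existentially quantified variables mentioned in the observation above.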

Denote by the negation normal form of and let be obtained from by replacing all occurrences of with a fresh proposition , , and all occurrences of with a fresh proposition , . That is, . Thus where every is a disjunction of three propositions. To we associate the propositional abduction problem defined as follows:

Suppose that is true. Then there exists an assignment such that no extension of satisfies . Define as the set of literals over set to by . Defining , we obtain with abuse of notation

which is unsatisfiable by assumption. As is satisfied by any assignment setting in addition all , , and all , , to , we have proved that is an explanation for .

Conversely, suppose that has an explanation . Due to the clause in , we also may assume that for all