Complexity Classifications for Propositional Abduction in Post’s Framework

Supported by ANR Algorithms and complexity 07-BLAN-0327-04 and DFG grant VO 630/6-1. An earlier version appeared in the Proc. of the 12th International Conference on the Principles of Knowledge Representation and Reasoning, KR’2010, Toronto, Canada.


Nadia Creignou and Johannes Schmidt
LIF, UMR CNRS 6166, Aix-Marseille Université
163, Avenue de Luminy, 13288 Marseille Cedex 9, France
creignou@lif.univ-mrs.fr, johannes.schmidt@lif.univ-mrs.fr

Michael Thomas
Institut für Theoretische Informatik, Gottfried Wilhelm Leibniz Universität
Appelstr. 4, 30167 Hannover, Germany
thomas@thi.uni-hannover.de

Abstract

In this paper we investigate the complexity of abduction, a fundamental and important form of non-monotonic reasoning. Given a knowledge base explaining the world’s behavior, it aims at finding an explanation for some observed manifestation. In this paper we consider propositional abduction, where the knowledge base and the manifestation are represented by propositional formulae. The problem of deciding whether there exists an explanation has been shown to be $\Sigma_2^p$-complete in general. We focus on formulae in which the allowed connectives are taken from certain sets of Boolean functions. We consider different variants of the abduction problem by restricting both the manifestations and the hypotheses. For all these variants we obtain a complexity classification for all possible sets of Boolean functions. In this way, we identify easier cases, namely $\mathrm{NP}$-complete, $\mathrm{coNP}$-complete and polynomial cases. Thus, we get a detailed picture of the complexity of the propositional abduction problem, thereby highlighting sources of intractability. Further, we address the problem of counting the explanations and draw a complete picture of the counting complexity.

Keywords: abduction, computational complexity, Post’s lattice, propositional logic, Boolean connective

1 Introduction

This paper is dedicated to the computational complexity of abduction, a fundamental and important form of non-monotonic reasoning. Given consistent knowledge about the world, abductive reasoning is used to generate explanations (or at least to decide whether one exists) for observed manifestations. Abduction has taken on fundamental importance in Artificial Intelligence, with application areas spanning medical diagnosis [BATJ89], text analysis [HSAM93], system diagnosis [SW01], configuration problems [AFM02] and temporal knowledge bases [BL00]; it also has connections to default reasoning [SL90].

There are several approaches to formalizing the problem of abduction. In this paper, we focus on logic-based abduction, in which the knowledge base is given as a set $\Gamma$ of propositional formulae. We are interested in deciding whether there exists an explanation $E$, i.e., a set of literals consistent with $\Gamma$ such that $\Gamma$ and $E$ together entail the observation.

From a complexity-theoretic viewpoint, the abduction problem is very hard, since it is $\Sigma_2^p$-complete and thus situated at the second level of the polynomial hierarchy [EG95]. This intractability result raises the question of which restrictions lead to fragments of lower complexity. Several such restrictions have been considered in previous works. One of the best known is Schaefer’s framework, where formulae are restricted to generalized conjunctive normal form with clauses from a fixed set of relations [CZ06, NZ05, NZ08].

A similar yet different approach is to require formulae to be constructed from a restricted set $B$ of Boolean functions. Such formulae are called $B$-formulae. This approach was first taken by Lewis, who showed that the satisfiability problem for $B$-formulae is $\mathrm{NP}$-complete if and only if $B$ has the ability to express the negation of the implication connective, $x \not\rightarrow y \equiv x \wedge \neg y$ [Lew79]. Since then, this approach has been applied to a wide range of problems including equivalence and implication problems [Rei03, BMTV09a], satisfiability and model checking in modal and temporal logics [BHSS06, BSS08], default logic [BMTV09b], and circumscription [Tho09], among others.

We follow this approach and show that Post’s lattice allows us to completely classify the complexity of propositional abduction for several variants and all possible sets of allowed Boolean functions. We consider two main variants of the abduction problem. In the first one, explanations may be built from positive and negative literals. We refer to this problem as symmetric abduction, Abd for short. The second variant, P-Abd, is the so-called positive abduction, where we allow only positive literals in the explanations.

We first examine the symmetric variant in the case where the manifestation is a positive literal. We show that, depending on the set $B$ of allowed connectives, the abduction problem is either $\Sigma_2^p$-complete, or $\mathrm{NP}$-complete, or in $\mathrm{P}$ and $\oplus\mathrm{L}$-hard, or in Logspace. More precisely, we prove that this abduction problem is $\Sigma_2^p$-complete as soon as $B$ can express one of the functions $x \vee (y \wedge \neg z)$, $x \wedge (y \vee \neg z)$ or $(x \wedge y) \vee (x \wedge \neg z) \vee (y \wedge \neg z)$. It drops to $\mathrm{NP}$-completeness when all functions in $B$ are monotonic and $B$ has the ability to express one of the functions $x \vee (y \wedge z)$, $x \wedge (y \vee z)$ or $(x \wedge y) \vee (x \wedge z) \vee (y \wedge z)$. The problem becomes solvable in polynomial time and is $\oplus\mathrm{L}$-hard if $B$-formulae may depend on more than one variable while being representable as linear equations. Finally, the complexity drops to Logspace in all remaining cases. We then complete our study of symmetric abduction with analogous complexity classifications of the variants of Abd obtained by restricting the manifestation to be a clause, a term or a $B$-formula, respectively.

These results are subsequently extended to positive abduction. An overview can be found in Figures 1 and 2.

Note that in [CZ06], the authors obtained a complexity classification of the abduction problem in Schaefer’s framework. The two classifications are in the same vein, since both classify the complexity of abduction under local restrictions on the knowledge base. However, the two results are incomparable in the sense that neither classification can be deduced from the other. They only overlap in the particular case of the linear connective $\oplus$, for which both types of formulae can be seen as systems of linear equations. This special abduction case has been shown to be decidable in polynomial time in [Zan03].

Besides the decision problem, another natural question concerns the number of explanations, which leads to the counting problem for abduction. The study of the counting complexity of abduction was initiated by Hermann and Pichler [HP07]. We prove here a trichotomy theorem showing that counting the full explanations of symmetric abduction is either $\#\cdot\mathrm{coNP}$-complete, or $\#\mathrm{P}$-complete, or in $\mathrm{FP}$, depending on the set of allowed connectives. We also consider the counting problem associated with positive abduction, for which we distinguish two frequently used settings: counting either all positive explanations or only the subset-minimal ones. For both formalizations of the counting problem, we obtain a potentially dichotomous classification with one open case.

The rest of the paper is structured as follows. We first give the necessary preliminaries in Section 2. The abduction problems considered herein are defined in Section 3. In Section 4 we classify the complexity of symmetric abduction, where we first consider the case where the manifestation is a single positive literal (Section 4.1), and then turn to variants in which the manifestations are clauses, terms and restricted formulae (Section 4.2). Section 5 then studies the complexity of positive abduction. An overview of these results is given in Section 6. Finally, Section 7 is dedicated to the counting problem, and Section 8 contains some concluding remarks.

2 Preliminaries

Complexity Theory

We require standard notions of complexity theory. For the decision problems, the arising complexity degrees encompass the classes Logspace, $\mathrm{P}$, $\mathrm{NP}$, $\mathrm{coNP}$ and $\Sigma_2^p$. For more background information, the reader is referred to [Pap94]. We furthermore require the class $\oplus\mathrm{L}$, defined as the class of languages $A$ such that there exists a nondeterministic logspace Turing machine $M$ that exhibits an odd number of accepting paths on input $x$ if and only if $x \in A$, for all $x$ [BDHM92]. It holds that Logspace $\subseteq \oplus\mathrm{L} \subseteq \mathrm{P}$. For our hardness results we employ logspace many-one reductions, defined as follows: a language $A$ is logspace many-one reducible to some language $C$ (written $A \le_m^{\log} C$) if there exists a logspace-computable function $f$ such that $x \in A$ if and only if $f(x) \in C$.

A counting problem is represented using a witness function $w$, which for every input $x$ returns a finite set of witnesses. This witness function gives rise to the following counting problem: given an instance $x$, find the cardinality $|w(x)|$ of the witness set $w(x)$. The class $\#\mathrm{P}$ is the class of counting problems naturally associated with decision problems in $\mathrm{NP}$. Following [HV95], if $\mathcal{C}$ is a complexity class of decision problems, we define $\#\cdot\mathcal{C}$ to be the class of all counting problems whose witness function $w$ is such that the size of every witness $y$ of $x$ is polynomially bounded in the size of $x$, and checking whether $y \in w(x)$ is in $\mathcal{C}$. Thus, we have $\#\mathrm{P} = \#\cdot\mathrm{P}$ and $\#\mathrm{P} \subseteq \#\cdot\mathrm{coNP}$. Completeness of counting problems is usually proved by means of Turing reductions. A stronger notion is the parsimonious reduction, where the exact number of solutions is preserved by the reduction function.

Propositional formulae

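We assume familiarity with propositional logic. The set of all propositional formulae is denoted by $\mathcal{L}$. A model for a formula $\varphi$ is a truth assignment to the set of its variables that satisfies $\varphi$. Further, we denote by $\varphi[\alpha/\beta]$ the formula obtained from $\varphi$ by replacing all occurrences of $\alpha$ with $\beta$. For a given set $\Gamma$ of formulae, we write $\mathrm{Vars}(\Gamma)$ to denote the set of variables occurring in $\Gamma$. We identify a finite $\Gamma$ with the conjunction of all the formulae in $\Gamma$, $\bigwedge_{\varphi \in \Gamma} \varphi$. Naturally, $\Gamma[\alpha/\beta]$ then stands for $\bigwedge_{\varphi \in \Gamma} \varphi[\alpha/\beta]$. For any formula $\psi$, we write $\Gamma \models \psi$ if $\Gamma$ entails $\psi$, i.e., if every model of $\Gamma$ also satisfies $\psi$.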

A literal $l$ is a variable $x$ or its negation $\neg x$. Given a set of variables $V$, $\mathrm{Lits}(V)$ denotes the set of all literals formed upon the variables in $V$, i.e., $\mathrm{Lits}(V) = V \cup \{\neg x \mid x \in V\}$. A clause is a disjunction of literals and a term is a conjunction of literals.

Clones of Boolean Functions

A clone is a set of Boolean functions that is closed under superposition, i.e., it contains all projections (that is, the functions $f(x_1,\dots,x_n) = x_k$ for all $1 \le k \le n$ and $n \in \mathbb{N}$) and is closed under arbitrary composition. Let $B$ be a finite set of Boolean functions. We denote by $[B]$ the smallest clone containing $B$ and call $B$ a base for $[B]$. In 1941 Post identified the set of all clones of Boolean functions [Pos41]. He gave a finite base for each of the clones and showed that they form a lattice under the usual $\subseteq$-relation, hence the name Post’s lattice (see, e.g., Figure 1). To define the clones we introduce the following notions, where $f$ is an $n$-ary Boolean function:

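  • $f$ is $c$-reproducing if $f(c,\dots,c) = c$, $c \in \{0,1\}$.

  • $f$ is monotonic if $a_1 \le b_1, \dots, a_n \le b_n$ implies $f(a_1,\dots,a_n) \le f(b_1,\dots,b_n)$.

  • $f$ is $c$-separating of degree $k$ if for all $A \subseteq f^{-1}(c)$ of size $|A| = k$ there exists an $i \in \{1,\dots,n\}$ such that $(a_1,\dots,a_n) \in A$ implies $a_i = c$, $c \in \{0,1\}$.

  • $f$ is $c$-separating if $f$ is $c$-separating of degree $|f^{-1}(c)|$.

  • $f$ is self-dual if $f \equiv \mathrm{dual}(f)$, where $\mathrm{dual}(f)(x_1,\dots,x_n) = \neg f(\neg x_1,\dots,\neg x_n)$.

  • $f$ is affine if $f \equiv x_{i_1} \oplus \cdots \oplus x_{i_m} \oplus c$ for some indices $1 \le i_1 < \cdots < i_m \le n$ and $c \in \{0,1\}$.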
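To make these definitions concrete, the following sketch tests the properties above for a Boolean function given as a Python callable, by brute force over all $2^n$ inputs. It is only an illustration under our own conventions: the helper names (`truth_table`, `is_reproducing`, `is_separating`, and so on) are ours, and the affinity test relies on the standard fact that an affine function is determined by its values on the all-zero vector and on the unit vectors.

```python
from itertools import combinations, product

def truth_table(f, n):
    """Map every input tuple in {0,1}^n to the value of f (as 0/1)."""
    return {a: int(bool(f(*a))) for a in product((0, 1), repeat=n)}

def is_reproducing(f, n, c):
    """f is c-reproducing iff f(c, ..., c) = c."""
    return truth_table(f, n)[(c,) * n] == c

def is_monotonic(f, n):
    """a <= b componentwise implies f(a) <= f(b)."""
    t = truth_table(f, n)
    return all(t[a] <= t[b] for a in t for b in t
               if all(x <= y for x, y in zip(a, b)))

def is_separating(f, n, c, k=None):
    """f is c-separating of degree k; k defaults to |f^{-1}(c)|."""
    pre = [a for a, v in truth_table(f, n).items() if v == c]
    k = len(pre) if k is None else k
    # Every k-subset of f^{-1}(c) must agree on some coordinate, with value c.
    return all(any(all(a[i] == c for a in A) for i in range(n))
               for A in combinations(pre, k))

def is_self_dual(f, n):
    """f(a) = not f(not a) for all inputs a."""
    t = truth_table(f, n)
    return all(t[a] == 1 - t[tuple(1 - x for x in a)] for a in t)

def is_affine(f, n):
    """f = c xor w_1 x_1 xor ... xor w_n x_n for some c, w_i in {0,1}."""
    t = truth_table(f, n)
    c = t[(0,) * n]
    # Coefficient w_i is read off the unit vector e_i: f(e_i) = c xor w_i.
    w = [t[tuple(int(j == i) for j in range(n))] ^ c for i in range(n)]
    return all(t[a] == c ^ (sum(x & y for x, y in zip(w, a)) % 2) for a in t)
```

For instance, the majority function $(x \wedge y) \vee (x \wedge z) \vee (y \wedge z)$ passes `is_monotonic` and `is_self_dual` but fails `is_affine`, whereas $x \oplus y \oplus z$ is both affine and self-dual.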

A list of all clones with definitions and finite bases is given in Table 1. A propositional formula using only functions from $B$ as connectives is called a $B$-formula. The set of all $B$-formulae is denoted by $\mathcal{L}(B)$.

Let be an -ary Boolean function. A -formula such that is a -representation of if for all it holds that if and only if every with and for all relevant satisfies . Such a $B$-representation exists for every $f \in [B]$. Yet, it may happen that the $B$-representation of some function uses some input variable more than once.

Example 1

Let . An -representation of the function is .

Name Definition Base
All Boolean functions
is a disjunction of variables or constants
is a conjunction of variables or constants
depends on at most one variable
Table 1: The list of all Boolean clones with definitions and bases, where and .
Figure 1: Post’s lattice showing the complexity of the symmetric abduction problem for all sets of Boolean functions and the most interesting restrictions of the manifestations. In the legend, abbreviates Logspace and the suffixes “-h” and “-c” indicate hardness and completeness respectively.
Figure 2: Post’s lattice showing the complexity of the positive abduction problem for all sets of Boolean functions and the most interesting restrictions of the manifestations. In the legend, abbreviates Logspace and the suffixes “-h” and “-c” indicate hardness and completeness respectively.

3 The Abduction Problem

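Let $B$ be a finite set of Boolean functions. We are interested in the propositional abduction problem parameterized by the set $B$ of allowed connectives. We define the abduction problem for $B$-formulae as follows:

  • Instance: $\mathcal{P} = (\Gamma, A, \varphi)$, where

    • $\Gamma$ is a set of $B$-formulae,

    • $A$ is a set of variables,

    • $\varphi$ is a formula with $\mathrm{Vars}(\varphi) \subseteq \mathrm{Vars}(\Gamma)$.

  • Question: Is there a set $E \subseteq \mathrm{Lits}(A)$ such that $\Gamma \wedge E$ is satisfiable and $\Gamma \wedge E \models \varphi$ (or equivalently $\Gamma \wedge E \wedge \neg\varphi$ is unsatisfiable)?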

The set $\Gamma$ represents the knowledge base. The set $A$ is called the set of hypotheses and $\varphi$ is called the manifestation or query. Furthermore, if such a set $E$ exists, it is called an explanation or a solution of the abduction problem. It is called a full explanation if $\mathrm{Vars}(E) = A$. Observe that every explanation can be extended to a full one: take a model $\sigma$ of $\Gamma \wedge E$ and add to $E$ the literals over $A \setminus \mathrm{Vars}(E)$ that are true under $\sigma$. We will consider several restrictions of the manifestations of this problem. To indicate them, we introduce a second argument, meaning that $\varphi$ is required to be

  • (resp. , ): a single literal (resp. positive literal, negative literal),

  • (resp. , ): a clause (resp. positive clause, negative clause),

  • (resp. , ): a term (resp. positive term, negative term),

  • : a -formula.

We refer to the problem defined above as symmetric abduction, since every variable of the hypotheses may be taken positively or negatively to construct an explanation. We will also consider positive abduction, where we are interested in purely positive explanations only. To indicate this, we add the prefix “P-”. Thus, for an instance $\mathcal{P} = (\Gamma, A, \varphi)$ of positive abduction, every solution $E$ of $\mathcal{P}$ has to satisfy $E \subseteq A$.
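A direct, exponential-time check of these definitions may help to fix intuitions. The following sketch is ours and is not an algorithm from this paper: formulae are Python predicates over an assignment (a dict from variable names to booleans), an explanation is a dict mapping each chosen hypothesis to its polarity, and `positive=True` restricts the search to positive explanations as in positive abduction.

```python
from itertools import product

def satisfiable(variables, constraint):
    """Brute force: does some assignment over `variables` satisfy `constraint`?"""
    return any(constraint(dict(zip(variables, bits)))
               for bits in product((False, True), repeat=len(variables)))

def explanations(kb, hyps, phi, variables, positive=False):
    """Yield every explanation E of (kb, hyps, phi) as a dict {var: polarity}.

    `variables` must list every variable occurring in kb, hyps and phi.
    """
    # Each hypothesis is left out (None), taken positively, or negatively.
    choices = (None, True) if positive else (None, True, False)
    for pick in product(choices, repeat=len(hyps)):
        E = {x: v for x, v in zip(hyps, pick) if v is not None}

        def kb_and_e(s, E=E):
            return all(g(s) for g in kb) and all(s[x] == v for x, v in E.items())

        # E must be consistent with kb, and kb together with E must entail phi,
        # i.e. kb and E and not phi must be unsatisfiable.
        if satisfiable(variables, kb_and_e) and not satisfiable(
                variables, lambda s: kb_and_e(s) and not phi(s)):
            yield E
```

For example, with the knowledge base $\{a \rightarrow q, b \rightarrow q\}$, hypotheses $\{a, b\}$ and manifestation $q$, the generator yields five symmetric explanations, three of which are full ($\{a, b\}$, $\{a, \neg b\}$, $\{\neg a, b\}$); with `positive=True` it yields the three positive ones $\{a\}$, $\{b\}$ and $\{a, b\}$.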

The following lemma clarifies the role of the constants in our abduction problem. It often reduces the number of cases to be considered.

Lemma 1

Let be a finite set of Boolean functions.

  1. If , then

  2. If and , then

Proof

To reduce to we transform any instance of the first problem by replacing every occurrence of by a fresh variable and adding the unit clause to the knowledge base. The same reduction works for . To prove , let be an instance of the first problem and be a fresh variable. Since , we can suppose w.l.o.g. that does not contain . We map to , where is the $B$-representation of and . ∎

Of course this lemma holds also for purely positive/negative queries, clauses or terms, i.e., can be replaced by or , respectively.

Observe that if $B$ and $B'$ are two sets of Boolean functions such that $[B'] \subseteq [B]$, then every function of $B'$ can be expressed by a $B$-formula, namely by its $B$-representation. This way there is a canonical reduction from the abduction problem for $B'$-formulae to that for $B$-formulae if $[B'] \subseteq [B]$: replace all $B'$-connectives by their $B$-representation. Note that this reduction is not necessarily polynomial: since the $B$-representation of some function may use some input variable more than once, the formula size may grow exponentially. Nevertheless, we will use this reduction frequently, avoiding an exponential blow-up by exploiting special structures of the $B'$-formulae.
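The effect of the tree structure on the size of the canonical reduction can be seen with a small count. In the sketch below, which is our own illustration with a hypothetical duplication factor `k`, every binary node of a formula tree is replaced by a representation that uses each of its two inputs `k` times. A left-deep chain of $n$ leaves then grows exponentially in $n$, whereas a balanced tree of depth $\lceil \log_2 n \rceil$, obtained from the same formula by associativity, grows only polynomially, roughly like $n^{1 + \log_2 k}$.

```python
def chain(leaves):
    """Left-deep binary tree (((x1 o x2) o x3) o ...) as nested pairs."""
    tree = leaves[0]
    for leaf in leaves[1:]:
        tree = (tree, leaf)
    return tree

def balanced(leaves):
    """Equivalent tree of depth ceil(log2 n), obtained by associativity."""
    if len(leaves) == 1:
        return leaves[0]
    mid = len(leaves) // 2
    return (balanced(leaves[:mid]), balanced(leaves[mid:]))

def size_after_substitution(tree, k):
    """Leaf occurrences after replacing every binary node by a representation
    that uses each of its two inputs exactly k times (k is hypothetical)."""
    if not isinstance(tree, tuple):
        return 1
    left, right = tree
    return k * (size_after_substitution(left, k) + size_after_substitution(right, k))

leaves = [f"x{i}" for i in range(16)]
print(size_after_substitution(chain(leaves), 2))     # 98302
print(size_after_substitution(balanced(leaves), 2))  # 256
```

With $n = 16$ and $k = 2$, the chain yields 98302 leaf occurrences while the balanced tree yields $256 = 16^2$, which is why the proofs below first rewrite formulae as trees of logarithmic depth before substituting representations.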

4 The complexity of Abd

We commence with the symmetric abduction problem Abd. The results of this section are summarized in Figure 1. We first focus on the case where the manifestation is a single positive literal.

4.1 The complexity of

Theorem 4.1

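Let $B$ be a finite set of Boolean functions. Then the symmetric abduction problem for propositional $B$-formulae with a positive literal manifestation is

  1. $\Sigma_2^p$-complete if $S_{02} \subseteq [B]$ or $S_{12} \subseteq [B]$ or $D_1 \subseteq [B]$,

  2. $\mathrm{NP}$-complete if $S_{00} \subseteq [B] \subseteq M$ or $S_{10} \subseteq [B] \subseteq M$ or $D_2 \subseteq [B] \subseteq M$,

  3. in $\mathrm{P}$ and $\oplus\mathrm{L}$-hard if $L_2 \subseteq [B] \subseteq L$, and

  4. in Logspace in all other cases.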

Remark 1

For such a classification a natural question is: given $B$, how hard is it to determine the complexity of the abduction problem for $B$-formulae? Solving this task requires checking whether certain clones are included in $[B]$ (for lower bounds) and whether $[B]$ itself is included in certain clones (for upper bounds). As shown in [Vol09], the complexity of checking whether certain Boolean functions are included in a clone depends on the representation of the Boolean functions. If all functions are given by their truth tables, then the problem is in quasi-polynomial-size $\mathrm{AC}^0$, while if the input functions are given in a compact way, e.g., by circuits, then the problem becomes $\mathrm{coNP}$-complete.

We split the proof of Theorem 4.1 into several propositions.

Proposition 1

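Let $B$ be a finite set of Boolean functions such that $[B] \subseteq V$ or $[B] \subseteq E$ or $[B] \subseteq N$. Then the symmetric abduction problem for $B$-formulae with a positive literal manifestation is in Logspace.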

Proof

Let be an instance of .

For or , is equivalent to a set of literals, hence has the empty set as a solution if possesses a solution at all. Finally notice that satisfiability of a set of -formulae can be tested in logarithmic space [Sch05].

For each formula is equivalent to either a constant or disjunction. It holds that has a solution if and only if contains a formula such that , and is satisfiable. This can be tested in logarithmic space, as substitution of symbols and evaluation of -formulae can all be performed in logarithmic space. ∎

Proposition 2

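Let $B$ be a finite set of Boolean functions such that $L_2 \subseteq [B] \subseteq L$. Then the symmetric abduction problem for $B$-formulae with a positive literal manifestation is $\oplus\mathrm{L}$-hard and contained in $\mathrm{P}$.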

Proof

In this case, deciding whether an instance of the problem has a solution logspace-reduces to deciding whether a propositional abduction problem whose knowledge base is a set of linear equations has a solution. The latter has been shown to be decidable in polynomial time in [Zan03].

As for the $\oplus\mathrm{L}$-hardness, let be such that . Consider the $\oplus\mathrm{L}$-complete problem to determine whether a system of linear equations over $\mathrm{GF}(2)$ has a solution [BDHM92]. Note that $\oplus\mathrm{L}$ is closed under complement, so deciding whether such a system has no solution is also $\oplus\mathrm{L}$-complete. Let be such a system of linear equations over variables . Then, for all , the equation is of the form with and . We map to a set of affine formulae over variables via

if and
if .

Now define

is obviously satisfied by the assignment mapping all propositions to . It furthermore holds that has no solution if and only if is unsatisfiable. Hence, we obtain that has no solution if and only if the propositional abduction problem has an explanation.

It remains to transform into a set of $B$-formulae in logarithmic space. Since , we have . We insert parentheses in every formula of in such a way that we get a ternary $\oplus$-tree of logarithmic depth whose leaves are either a proposition or the constant 1. Then we replace every node by its equivalent $B$-formula. Thus we get a $B$-formula of size polynomial in the size of the original one. Lemma 1 allows us to conclude. ∎
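The source problem of this reduction, solvability of a linear system over $\mathrm{GF}(2)$, is decided in polynomial time by Gaussian elimination. The sketch below illustrates this with our own encoding, where each equation is a bitmask of its variables together with a right-hand side. It is not the abduction algorithm of [Zan03]; it only decides the $\oplus\mathrm{L}$-complete problem used in the hardness proof above.

```python
def gf2_solvable(equations):
    """Decide whether a system of linear equations over GF(2) is solvable.

    Each equation is a pair (mask, c): bit i of `mask` is set iff x_i occurs,
    and c in {0, 1} is the right-hand side, e.g. (0b101, 1) is x_0 xor x_2 = 1.
    """
    pivots = {}  # leading bit -> reduced row (mask, c) having that leading bit
    for mask, c in equations:
        # Reduce the new row by the stored rows, highest variable first.
        while mask:
            top = mask.bit_length() - 1
            if top not in pivots:
                pivots[top] = (mask, c)  # new independent row
                break
            pmask, pc = pivots[top]
            mask ^= pmask  # adding two equations over GF(2) is XOR
            c ^= pc
        else:
            # The row reduced to 0 = c; it is contradictory iff c = 1.
            if c:
                return False
    return True
```

For instance, the system $x_0 \oplus x_1 = 1$, $x_1 \oplus x_2 = 1$, $x_0 \oplus x_2 = 1$ is reported unsolvable, since adding all three equations yields $0 = 1$.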

Observe that in the cases , , and , the abduction problem for $B$-formulae is self-reducible. Roughly speaking, this means that given an instance $\mathcal{P}$ and a literal $l$, we can efficiently compute an instance $\mathcal{P}'$ such that the question whether $\mathcal{P}$ has an explanation $E$ with $l \in E$ reduces to the question whether $\mathcal{P}'$ admits solutions. It is well known that for self-reducible problems whose decision problem is in $\mathrm{P}$, the lexicographically first solution can be computed in polynomial time. It is an easy exercise to extend this algorithm to enumerate all solutions in lexicographic order with polynomial delay and polynomial space. Thus, if or or , the explanations of can be enumerated with polynomial delay and polynomial space according to Propositions 1 and 2.
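The enumeration argument above is the classical backtracking (flashlight) search over a self-reducible problem. The sketch below assumes a hypothetical decision procedure `extendable(partial)` that reports whether a partial choice of literals over the hypotheses extends to a full explanation; in the polynomial cases, this is presumably where the decision algorithms of Propositions 1 and 2 would be run on the reduced instance. Since every branch that is entered leads to at least one output, consecutive outputs are separated by a number of oracle calls linear in $|A|$, and the recursion only needs space for one partial choice.

```python
def enumerate_full(hyps, extendable):
    """Yield all full choices of literals over `hyps` accepted by the oracle,
    in lexicographic order (negative literal before positive one).

    `extendable(partial)` is a hypothetical decision procedure: given a dict
    var -> bool fixing some hypotheses, it says whether the choice can be
    completed to a full explanation.
    """
    def extend(i, partial):
        if i == len(hyps):
            yield dict(partial)  # copy, since `partial` is mutated below
            return
        for value in (False, True):
            partial[hyps[i]] = value
            if extendable(partial):  # prune branches without solutions
                yield from extend(i + 1, partial)
            del partial[hyps[i]]

    if extendable({}):
        yield from extend(0, {})
```

For instance, with hypotheses $a, b$ and full explanations $\{a, b\}$, $\{a, \neg b\}$, $\{\neg a, b\}$, an oracle that simply checks compatibility with this list makes the generator output $\{\neg a, b\}$, $\{a, \neg b\}$, $\{a, b\}$ in this order.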

Proposition 3

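Let $B$ be a finite set of Boolean functions such that $S_{00} \subseteq [B] \subseteq M$ or $S_{10} \subseteq [B] \subseteq M$ or $D_2 \subseteq [B] \subseteq M$. Then the symmetric abduction problem for $B$-formulae with a positive literal manifestation is $\mathrm{NP}$-complete.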

Proof

We first show that is efficiently verifiable. Let be an -instance and be a candidate for an explanation. Define as the set of formulae obtained from by replacing each occurrence of the proposition with if , and each occurrence of the proposition with if . It holds that is a solution for if is satisfiable and is not. These tests can be performed in polynomial time, because is a set of monotonic formulae [Lew79]. Hence, .

Next we give a reduction from an $\mathrm{NP}$-complete variant of satisfiability, namely the problem to decide whether there exists an assignment that sets to true exactly two propositions in each clause of a given formula in conjunctive normal form with exactly three positive propositions per clause, see [Sch78]. Let with , , be the given formula. We map to the following instance . Let , , be fresh, pairwise distinct propositions and let . The set is defined as

(1)
(2)
(3)

We show that there is an assignment that sets to true exactly two propositions in each clause of if and only if has a solution. First, suppose that there exists an assignment such that for all , there is a permutation of such that   and . Thus (1) and (2) are satisfied, and (3) is equivalent to . From this, it is readily observed that is a solution to .

Conversely, suppose that has an explanation that is w.l.o.g. full. Then is satisfiable and . Let be an assignment that satisfies . Then, for any , if , and otherwise. Since entails and as the only occurrence of is in (3), we obtain that sets to each and at least one proposition in each clause of . Consequently, from (2) it follows that sets to at least two propositions in each clause of . Therefore, sets to exactly two propositions in each clause of .

It remains to show that can be transformed into an -instance for all considered . Observe that and . Therefore due to Lemma 1 it suffices to consider the case . Using the associativity of rewrite (3) as an -tree of logarithmic depth and replace all the connectives in by their B-representation (). ∎

Proposition 4

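Let $B$ be a finite set of Boolean functions such that $S_{02} \subseteq [B]$ or $S_{12} \subseteq [B]$ or $D_1 \subseteq [B]$. Then the symmetric abduction problem for $B$-formulae with a positive literal manifestation is $\Sigma_2^p$-complete.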

Proof

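Membership in $\Sigma_2^p$ is easily seen to hold: given an instance $(\Gamma, A, q)$, guess an explanation $E \subseteq \mathrm{Lits}(A)$ and subsequently verify that $\Gamma \wedge E$ is satisfiable and $\Gamma \wedge E \wedge \neg q$ is not; both tests can be performed with an $\mathrm{NP}$ oracle.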

Observe that . By virtue of Lemma 1 and the fact that , it suffices to consider the case . In [EG95] it has been shown that the propositional abduction problem remains $\Sigma_2^p$-complete when the knowledge base is a set of clauses. From such an instance we build an instance of our problem by first rewriting each clause as an $\vee$-tree of logarithmic depth and then replacing the occurring connectives $\vee$ and $\neg$ by their $B$-representation, thus concluding the proof. ∎

4.2 Variants of

We now consider the symmetric abduction problem for different variants of the manifestations: clause, term and $B$-formula. Let us first make a remark on the cases where the manifestation is a (not necessarily positive) literal or a negative literal.

Remark 2

obeys the same classification as since all bounds, upper and lower, easily carry over. For the problem becomes trivial if . For is solvable in polynomial time according to [Zan03]. For the remaining clones (i.e., for , , and ), we can again easily adapt the proofs of . This way we obtain a dichotomous classification for into -complete and -complete cases; thus skipping the intermediate level.

For clauses, it is obvious that . Therefore, all hardness results continue to hold for the . It is an easy exercise to prove that all algorithms that have been developed for a single query can be naturally extended to clauses. Therefore, the complexity classifications for the problems , and are exactly the same as for , and , respectively.

Theorem 4.2

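Let $B$ be a finite set of Boolean functions. Then the symmetric abduction problem for propositional $B$-formulae with a positive clause manifestation is

  1. $\Sigma_2^p$-complete if $S_{02} \subseteq [B]$ or $S_{12} \subseteq [B]$ or $D_1 \subseteq [B]$,

  2. $\mathrm{NP}$-complete if $S_{00} \subseteq [B] \subseteq M$ or $S_{10} \subseteq [B] \subseteq M$ or $D_2 \subseteq [B] \subseteq M$,

  3. in $\mathrm{P}$ and $\oplus\mathrm{L}$-hard if $L_2 \subseteq [B] \subseteq L$, and

  4. in Logspace in all other cases.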

Notably, we will prove in the following subsections that allowing for terms as manifestations increases the complexity for the clones (from membership in Logspace to $\mathrm{NP}$-completeness), while allowing $B$-formulae as manifestations makes the classification dichotomous again: all problems become either - or -complete.

4.2.1 The complexity of

Proposition 5

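Let $B$ be a finite set of Boolean functions such that . Then the symmetric abduction problem for $B$-formulae with a positive term manifestation is $\mathrm{NP}$-complete.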

Proof

Let be a finite set of Boolean functions such that and let be an instance of . Hence, is a set of -formulae and is a term, . Observe that is a solution for if is satisfiable and for every , is not. Given a set , these verifications, which require substitution of symbols and evaluation of an -formula, can be performed in polynomial time, thus proving membership in $\mathrm{NP}$.

To prove $\mathrm{NP}$-hardness, we give a reduction from 3Sat. Let be a 3-CNF-formula, . Let enumerate the variables occurring in . Let and be fresh, pairwise distinct variables. We map to , where

We show that is satisfiable if and only if has a solution. First assume that is satisfied by the assignment . Define and as the extension of mapping and for all . Obviously, . Furthermore, for all , because any satisfying assignment of sets to either or and thus . Hence is an explanation for .

Conversely, suppose that has a full explanation . The facts that and that each occurs only in the clauses enforce that, for every , contains or . Because of the clause , it cannot contain both. Therefore in the value of is determined by the value of and is its dual. From this it is easy to conclude that the assignment defined by if , and otherwise, satisfies . Finally can be transformed into an -instance, because every formula in is the disjunction of at most three variables and .∎

Theorem 4.3

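Let $B$ be a finite set of Boolean functions. Then the symmetric abduction problem for propositional $B$-formulae with a positive term manifestation is

  1. $\Sigma_2^p$-complete if $S_{02} \subseteq [B]$ or $S_{12} \subseteq [B]$ or $D_1 \subseteq [B]$,

  2. $\mathrm{NP}$-complete if or or ,

  3. in $\mathrm{P}$ and $\oplus\mathrm{L}$-hard if $L_2 \subseteq [B] \subseteq L$, and

  4. in Logspace in all other cases.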

Proof
  1. The $\Sigma_2^p$-hardness follows directly from Proposition 4.

  2. For the clones , see Proposition 5. In all other clones, the $\mathrm{NP}$-hardness follows from a straightforward generalization of the proof of Proposition 3.

  3. Membership in $\mathrm{P}$ follows directly from [NZ08, Theorem 67], the $\oplus\mathrm{L}$-hardness from Proposition 2.

  4. Analogous to Proposition 1. ∎

Remark 3

All upper and lower bounds for easily carry over to . It is also easily seen that is classified exactly as , see Remark 2.

4.2.2 The complexity of

Proposition 6

Let $B$ be a finite set of Boolean functions such that or or . Then the symmetric abduction problem for $B$-formulae with a $B$-formula manifestation is $\Sigma_2^p$-complete.

Proof

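We prove $\Sigma_2^p$-hardness by giving a reduction from the $\Sigma_2^p$-complete problem of deciding the truth of quantified Boolean formulae of the form $\exists X \forall Y \varphi$ [Wra77]. Let an instance of this problem be given by a closed formula $\exists X \forall Y \varphi$ with $\varphi$ being a 3-DNF-formula. First observe that $\exists X \forall Y \varphi$ is true if and only if there exists a consistent set $E \subseteq \mathrm{Lits}(X)$ such that $x \in E$ or $\neg x \in E$ for all $x \in X$, and $E \rightarrow \varphi$ is (universally) valid (or equivalently $E \wedge \neg\varphi$ is unsatisfiable).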
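The characterization used in this first step can be checked by brute force on small instances. In the sketch below, which is our own illustration (the DNF encoding as lists of (variable, polarity) pairs is an assumption), `via_full_explanation` searches for a full, consistent set $E \subseteq \mathrm{Lits}(X)$ such that $E \wedge \neg\varphi$ is unsatisfiable, and `exists_forall` evaluates $\exists X \forall Y \varphi$ directly for comparison.

```python
from itertools import product

def dnf_value(terms, sigma):
    """Evaluate a DNF; each term is a list of (variable, polarity) pairs."""
    return any(all(sigma[v] == b for v, b in term) for term in terms)

def exists_forall(X, Y, terms):
    """Direct brute-force evaluation of the closed formula  exists X forall Y phi."""
    return any(all(dnf_value(terms, {**dict(zip(X, xs)), **dict(zip(Y, ys))})
                   for ys in product((False, True), repeat=len(Y)))
               for xs in product((False, True), repeat=len(X)))

def via_full_explanation(X, Y, terms):
    """Return a full consistent E over Lits(X) with E and not-phi unsatisfiable, or None."""
    for xs in product((False, True), repeat=len(X)):
        E = dict(zip(X, xs))  # exactly one of x, not-x for every x in X
        has_countermodel = any(not dnf_value(terms, {**E, **dict(zip(Y, ys))})
                               for ys in product((False, True), repeat=len(Y)))
        if not has_countermodel:
            return E
    return None
```

For $\varphi = (x \wedge y) \vee (x \wedge \neg y)$, both functions agree: the formula $\exists x \forall y\, \varphi$ is true and the set $E = \{x\}$ is returned.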

Denote by the negation normal form of and let be obtained from by replacing all occurrences of with a fresh proposition , , and all occurrences of with a fresh proposition , . That is,