Dichotomies in Ontology-Mediated Querying with the Guarded Fragment
We study the complexity of ontology-mediated querying when ontologies are formulated in the guarded fragment of first-order logic (GF). Our general aim is to classify the data complexity on the level of ontologies where query evaluation w.r.t. an ontology $\mathcal{O}$ is considered to be in PTime if all (unions of conjunctive) queries can be evaluated in PTime w.r.t. $\mathcal{O}$ and coNP-hard if at least one query is coNP-hard w.r.t. $\mathcal{O}$. We identify several large and relevant fragments of GF that enjoy a dichotomy between PTime and coNP, some of them additionally admitting a form of counting. In fact, almost all ontologies in the BioPortal repository fall into these fragments or can easily be rewritten to do so. We then establish a variation of Ladner’s Theorem on the existence of NP-intermediate problems and use this result to show that for other fragments, there is provably no such dichotomy. Again for other fragments (such as full GF), establishing a dichotomy implies the Feder-Vardi conjecture on the complexity of constraint satisfaction problems. We also link these results to Datalog-rewritability and study the decidability of whether a given ontology enjoys PTime query evaluation, presenting both positive and negative results.
|University of Liverpool|
|University of Bremen|
|University of Liverpool|
|University of Liverpool|
Ontology-Based Data Access; Query Answering; Dichotomies
In Ontology-Mediated Querying, incomplete data is enriched with an ontology that provides domain knowledge, enabling more complete answers to queries [?, ?, ?]. This paradigm has recently received a lot of interest, with a significant fraction of the research being concerned with the (data) complexity of querying [?, ?] and, closely related, with the rewritability of ontology-mediated queries into more conventional database query languages [?, ?, ?, ?, ?]. A particular emphasis has been put on designing ontology languages that result in PTime data complexity, and on delineating these from the coNP-hard cases. This question and related ones have given rise to a considerable array of ontology languages, including many description logics (DLs) [?, ?] and a growing number of classes of tuple-generating dependencies (TGDs), also known as Datalog$^\pm$ and as existential rules [?, ?]. A general and uniform framework is provided by the guarded fragment (GF) of first-order logic and extensions thereof, which subsume many of the mentioned ontology languages [?, ?].
In practical applications, ontologies often need to use language features that are only available in computationally expensive ontology languages, but do so in a way such that one may hope for hardness to be avoided. This observation has led to a more fine-grained study of data complexity than on the level of ontology languages, initiated in [?], where the aim is to classify the complexity of individual ontologies while quantifying over the actual query: query evaluation w.r.t. an ontology $\mathcal{O}$ is in PTime if every CQ can be evaluated in PTime w.r.t. $\mathcal{O}$ and it is coNP-hard if there is at least one CQ that is coNP-hard to evaluate w.r.t. $\mathcal{O}$. In this way, one can identify tractable ontologies within ontology languages that are, in general, computationally hard. Note that an even more fine-grained approach is taken in [?], where one aims to classify the complexity of each pair $(\mathcal{O}, q)$ with $\mathcal{O}$ an ontology and $q$ an actual query. Both approaches are reasonable, the first one being preferable when the queries to be answered are not fixed at the design time of the ontology; this is actually often the case because ontologies are typically viewed as general purpose artifacts to be used in more than a single application. In this paper, we follow the former approach.
The main aim of this paper is to identify fragments of GF (and of extensions of GF with different forms of counting) that result in a dichotomy between PTime and coNP when used as an ontology language and that cover as many real-world ontologies as possible, considering conjunctive queries (CQs) and unions thereof (UCQs) as the actual query language. We also aim to provide insight into which fragments of GF (with and without counting) do not admit such a dichotomy, to understand the relation between PTime data complexity and rewritability into Datalog (with inequality in rule bodies, in case we start from GF with equality or counting), and to clarify whether it is decidable whether a given ontology has PTime data complexity. Note that we concentrate on fragments of GF because for the full guarded fragment, proving a dichotomy between PTime and coNP implies the long-standing Feder-Vardi conjecture on constraint satisfaction problems [?] which indicates that it is very difficult to obtain (if it holds at all). In particular, we concentrate on the fragment of GF that is invariant under disjoint unions, which we call uGF, and on fragments thereof and their extension with forms of counting. Invariance under disjoint unions is a fairly mild restriction that is shared by many relevant ontology languages, and it admits a very natural syntactic characterization.
Our results are summarized in Figure 1. We first explain the fragments shown in the figure and then survey the obtained results. A uGF ontology is a set of sentences of the form $\forall \vec{x}\,(R(\vec{x}) \rightarrow \varphi(\vec{x}))$ where $R(\vec{x})$ is a guard (possibly equality) and $\varphi(\vec{x})$ is a GF formula that does not contain any sentences as subformulas and in which equality is not used as a guard. The depth of such a sentence is the quantifier depth of $\varphi(\vec{x})$ (and thus the outermost universal quantifier is not counted). A main parameter that we vary is the depth, which is typically very small in real-world ontologies. In Figure 1, the depth is the first parameter displayed in brackets. As usual, the subscript $\cdot_2$ indicates the restriction to two variables while a superscript $\cdot^-$ means that the guard in the outermost universal quantifier can only be equality, $=$ means that equality is allowed (in non-guard positions), $f$ indicates the ability to declare binary relation symbols to be interpreted as partial functions, and GC$_2$ denotes the two-variable guarded fragment extended with counting quantifiers, see [?, ?]. While guarded fragments are displayed in black, description logics (DLs) are shown in grey and smaller font size. We use standard DL names except that ‘$\mathcal{F}$’ denotes globally functional roles while ‘$\mathcal{F}_\ell$’ refers to counting concepts of the form $(\leq 1\, R)$. We do not explain DL names here, but refer to the standard literature [?].
The bottommost part of Figure 1 displays fragments for which there is a dichotomy between PTime and coNP, the middle part shows fragments for which such a dichotomy implies the Feder-Vardi conjecture (from now on called CSP-hardness), and the topmost part is for fragments that provably have no dichotomy (unless PTime $=$ NP). The vertical lines indicate that the linked results are closely related, often indicating a fundamental difficulty in further generalizing an upper bound. For example, uGF$^-_2(1,=)$ enjoys a dichotomy while uGF$_2(1,=)$ is CSP-hard, which demonstrates that generalizing the former result by dropping the restriction that the guard of the outermost quantifier has to be equality (indicated by $\cdot^-$) is very challenging (if it is possible at all). (Footnote 1: A tentative proof of the Feder-Vardi conjecture has very recently been announced in [?], along with an invitation to the research community to verify its validity.) Our positive results are thus optimal in many ways. All results hold both when CQs and when UCQs are used as the actual query; in this context, it is interesting to note that there is a GF ontology (which is not a uGF ontology) for which CQ-answering is in PTime while UCQ-answering is coNP-hard. In the cases which enjoy a dichotomy, we also show that PTime query evaluation coincides with rewritability into Datalog (with inequality in the rule bodies if we start from a fragment with equality or counting). In contrast, for all fragments that are CSP-hard or have no dichotomy, these two properties provably do not coincide. This is of course independent of whether or not the Feder-Vardi conjecture holds.
For ontologies of depth 1, we also show that it is decidable and ExpTime-complete whether a given ontology admits PTime query evaluation (equivalently: rewritability into Datalog). For uGC, we show a NExpTime upper bound. For ontologies of depth 2, we establish NExpTime-hardness. The proof indicates that more sophisticated techniques are needed to establish decidability, if the problem is decidable at all (which we leave open).
To understand the practical relevance of our results, we have analyzed 411 ontologies from the BioPortal repository [?]. After removing all constructors that do not fall within , an impressive 405 ontologies turned out to have depth 2 and thus belong to a fragment with dichotomy (sometimes modulo an easy complexity-preserving rewriting). For , still 385 ontologies had depth 1 and so belonged to a fragment with dichotomy. As a concrete and simple example, consider the two uGC-ontologies
which both enjoy PTime query evaluation (and thus rewritability into Datalog), but where query evaluation w.r.t. the union is coNP-hard. Note that such subtle differences cannot be captured when data complexity is studied on the level of ontology languages, at least when basic compositionality conditions are desired.
We briefly highlight some of the techniques used to establish our results. An important role is played by the notions of materializability and unravelling tolerance of an ontology $\mathcal{O}$. Materializability means that for every instance $D$, there is a universal model of $\mathcal{O}$ and $D$, defined in terms of query answers rather than in terms of homomorphisms (which, as we show, need not coincide in our context). Unravelling tolerance means that the ontology cannot distinguish between an instance and its unravelling into a structure of bounded treewidth. While non-materializability of $\mathcal{O}$ implies that query evaluation w.r.t. $\mathcal{O}$ is coNP-hard, unravelling tolerance of $\mathcal{O}$ implies that query evaluation w.r.t. $\mathcal{O}$ is in PTime (in fact, even rewritable into Datalog). To establish dichotomies, we prove for the relevant fragments that materializability implies unravelling tolerance which, depending on the fragment, can be technically rather subtle. To prove CSP-hardness or non-dichotomies, very informally speaking, we need to express properties in the ontology that a (positive existential) query cannot ‘see’. This is often very subtle and can often be achieved only partially. While the latter is not a major problem for CSP-hardness (where we need to deal with CSPs that ‘admit precoloring’ and are known to behave essentially in the same way as traditional CSPs), it poses serious challenges when proving non-dichotomy. To tackle this problem, we establish a variation of Ladner’s theorem on NP-intermediate problems such that instead of the word problem for NP Turing machines, it speaks about the run fitting problem, which is to decide whether a given partially described run of a Turing machine (which corresponds to a precoloring in the CSP case) can be extended to a full run that is accepting. Our proofs of decidability of whether an ontology admits PTime query evaluation are also rather subtle and technical, involving e.g. mosaic techniques.
Due to space constraints, throughout the paper we defer proof details to the appendix.
Related Work. Ontology-mediated querying was first considered in [?, ?]; other important papers include [?, ?, ?]. It is a form of reasoning under integrity constraints, a traditional topic in database theory, see e.g. [?, ?] and references therein, and it is also related to deductive databases, see e.g. the monograph [?]. Moreover, ontology-mediated querying has drawn inspiration from query answering under views [?, ?]. In recent years, there has been significant interest in complete classifications of the complexity of hard querying problems. In the context of ontology-mediated querying, relevant references include [?, ?, ?]. In fact, this paper closes a number of open problems from [?] such as that ontologies of depth two enjoy a dichotomy and that materializability (and thus PTime complexity and Datalog-rewritability) is decidable in many relevant cases. Other areas of database theory where complete complexity classifications are sought include consistent query answering [?, ?, ?, ?], probabilistic databases [?], and deletion propagation [?, ?].
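We assume an infinite set $\Delta_D$ of data constants, an infinite set $\Delta_N$ of labelled nulls disjoint from $\Delta_D$, and a set $\Sigma$ of relation symbols containing infinitely many relation symbols of any arity $\geq 1$. A (database) instance $D$ is a non-empty set of facts $R(a_1,\dots,a_k)$, where $R \in \Sigma$, $k$ is the arity of $R$, and $a_1,\dots,a_k \in \Delta_D$. We generally assume that instances are finite, unless otherwise specified. An interpretation $\mathfrak{A}$ is a non-empty set of atoms $R(a_1,\dots,a_k)$, where $R \in \Sigma$, $k$ is the arity of $R$, and $a_1,\dots,a_k \in \Delta_D \cup \Delta_N$. We use $\mathrm{sig}(\mathfrak{A})$ and $\mathrm{dom}(\mathfrak{A})$ to denote the set of relation symbols and, respectively, constants and labelled nulls in $\mathfrak{A}$. We always assume that $\mathrm{sig}(\mathfrak{A})$ is finite while $\mathrm{dom}(\mathfrak{A})$ can be infinite. Whenever convenient, interpretations $\mathfrak{A}$ are presented in the form $\mathfrak{A} = (A, (R^{\mathfrak{A}})_{R \in \mathrm{sig}(\mathfrak{A})})$ where $A = \mathrm{dom}(\mathfrak{A})$ and $R^{\mathfrak{A}}$ is a $k$-ary relation on $A$ for each $R \in \mathrm{sig}(\mathfrak{A})$ of arity $k$. An interpretation $\mathfrak{A}$ is a model of an instance $D$, written $\mathfrak{A} \models D$, if $D \subseteq \mathfrak{A}$. We thus make a strong open world assumption (interpretations can make true additional facts and contain additional constants and nulls) and also assume standard names (every constant in $D$ is interpreted as itself in $\mathfrak{A}$). Note that every instance is also an interpretation.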
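Assume $\mathfrak{A}$ and $\mathfrak{B}$ are interpretations. A homomorphism $h$ from $\mathfrak{A}$ to $\mathfrak{B}$ is a mapping from $\mathrm{dom}(\mathfrak{A})$ to $\mathrm{dom}(\mathfrak{B})$ such that $R(a_1,\dots,a_k) \in \mathfrak{A}$ implies $R(h(a_1),\dots,h(a_k)) \in \mathfrak{B}$ for all $a_1,\dots,a_k \in \mathrm{dom}(\mathfrak{A})$ and $R \in \Sigma$ of arity $k$. We say that $h$ preserves a set $N$ of constants and labelled nulls if $h(a) = a$ for all $a \in N$ and that $h$ is an isomorphic embedding if it is injective and $R(h(a_1),\dots,h(a_k)) \in \mathfrak{B}$ entails $R(a_1,\dots,a_k) \in \mathfrak{A}$. An interpretation $\mathfrak{A} \subseteq \mathfrak{B}$ is a subinterpretation of $\mathfrak{B}$ if $R(a_1,\dots,a_k) \in \mathfrak{B}$ and $a_1,\dots,a_k \in \mathrm{dom}(\mathfrak{A})$ implies $R(a_1,\dots,a_k) \in \mathfrak{A}$; if $\mathrm{dom}(\mathfrak{A}) = A$, we denote $\mathfrak{A}$ by $\mathfrak{B}_{|A}$ and call it the subinterpretation of $\mathfrak{B}$ induced by $A$.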
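Conjunctive queries (CQs) $q$ of arity $k$ take the form $q(\vec{x}) \leftarrow \phi$, where $\vec{x} = x_1,\dots,x_k$ is the tuple of answer variables of $q$, and $\phi$ is a conjunction of atomic formulas $R(y_1,\dots,y_n)$ with $R \in \Sigma$ of arity $n$ and $y_1,\dots,y_n$ variables. As usual, all variables in $\vec{x}$ must occur in some atom of $\phi$. Any CQ $q(\vec{x}) \leftarrow \phi$ can be regarded as an instance $D_q$, often called the canonical database of $q$, in which each variable $y$ of $q$ is represented by a unique data constant $a_y$, and which contains, for each atom $R(y_1,\dots,y_n)$ in $\phi$, the atom $R(a_{y_1},\dots,a_{y_n})$. A tuple $\vec{a} = (a_1,\dots,a_k)$ of constants is an answer to $q(x_1,\dots,x_k)$ in $\mathfrak{A}$, in symbols $\mathfrak{A} \models q(\vec{a})$, if there is a homomorphism $h$ from $D_q$ to $\mathfrak{A}$ with $h(a_{x_1},\dots,a_{x_k}) = \vec{a}$. A union of conjunctive queries (UCQ) $q$ takes the form $q_1(\vec{x}) \vee \dots \vee q_n(\vec{x})$, where each $q_i(\vec{x})$ is a CQ. The $q_i$ are called disjuncts of $q$. A tuple $\vec{a}$ of constants is an answer to $q$ in $\mathfrak{A}$, denoted by $\mathfrak{A} \models q(\vec{a})$, if $\vec{a}$ is an answer to some disjunct of $q$ in $\mathfrak{A}$.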
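The homomorphism-based semantics of CQs and UCQs can be illustrated by a small brute-force evaluator. This is only a sketch under assumptions of our own: facts are encoded as Python tuples `(relation, arg1, ..., argk)`, the atoms of a query play the role of its canonical database, and the function names are hypothetical. It does not distinguish data constants from labelled nulls and makes no attempt at efficiency.

```python
from itertools import product

# A fact is a tuple (relation_name, arg1, ..., argk); an instance or
# interpretation is a set of such facts (an encoding chosen for this sketch).

def domain(facts):
    """All elements occurring as arguments in a set of facts."""
    return {a for fact in facts for a in fact[1:]}

def is_homomorphism(h, source, target):
    """Check that R(a1..ak) in source implies R(h(a1)..h(ak)) in target."""
    return all((f[0], *(h[a] for a in f[1:])) in target for f in source)

def cq_answers(answer_vars, atoms, interpretation):
    """Answers to the CQ with the given answer variables and atoms.

    The atoms are read as the canonical database of the query; every
    homomorphism from it into the interpretation contributes one answer.
    Brute force: exponential in the number of query variables.
    """
    variables = sorted(domain(atoms))
    elements = sorted(domain(interpretation))
    answers = set()
    for image in product(elements, repeat=len(variables)):
        h = dict(zip(variables, image))
        if is_homomorphism(h, atoms, interpretation):
            answers.add(tuple(h[x] for x in answer_vars))
    return answers

def ucq_answers(disjuncts, interpretation):
    """Union of the answers to the disjuncts (pairs of answer vars, atoms)."""
    return set().union(*(cq_answers(xs, atoms, interpretation)
                         for xs, atoms in disjuncts))

# q(x) <- R(x,y), R(y,z) evaluated over a path a -> b -> c
D = {("R", "a", "b"), ("R", "b", "c")}
print(cq_answers(("x",), {("R", "x", "y"), ("R", "y", "z")}, D))  # {('a',)}
```

Evaluating a query over a single interpretation in this way is not the same as computing certain answers, which quantify over all models of an instance and an ontology.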
We now introduce the fundamentals of ontology-mediated querying. An ontology language $\mathcal{L}$ is a set of first-order sentences over signature $\Sigma$ (that is, function symbols are not allowed) and an $\mathcal{L}$-ontology $\mathcal{O}$ is a finite set of sentences from $\mathcal{L}$. We introduce various concrete ontology languages throughout the paper, including fragments of the guarded fragment and description logics. An interpretation $\mathfrak{A}$ is a model of an ontology $\mathcal{O}$, in symbols $\mathfrak{A} \models \mathcal{O}$, if it satisfies all its sentences. An instance $D$ is consistent w.r.t. $\mathcal{O}$ if there is a model of $D$ and $\mathcal{O}$.
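An ontology-mediated query (OMQ) is a pair $(\mathcal{O}, q)$, where $\mathcal{O}$ is an ontology and $q$ a UCQ. The semantics of an ontology-mediated query is given in terms of certain answers, defined next. Assume that $q$ has arity $k$ and $D$ is an instance. Then a tuple $\vec{a}$ of length $k$ in $\mathrm{dom}(D)$ is a certain answer to $q$ on $D$ given $\mathcal{O}$, in symbols $\mathcal{O}, D \models q(\vec{a})$, if $\mathfrak{A} \models q(\vec{a})$ for all models $\mathfrak{A}$ of $D$ and $\mathcal{O}$. The query evaluation problem for an OMQ $(\mathcal{O}, q(\vec{x}))$ is to decide, given an instance $D$ and a tuple $\vec{a}$ in $\mathrm{dom}(D)$, whether $\mathcal{O}, D \models q(\vec{a})$.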
We use standard notation for Datalog programs (a brief introduction is given in the appendix). An OMQ $(\mathcal{O}, q(\vec{x}))$ is called Datalog-rewritable if there is a Datalog program $\Pi$ such that for all instances $D$ and $\vec{a} \in \mathrm{dom}(D)$, $\mathcal{O}, D \models q(\vec{a})$ iff $D \models \Pi(\vec{a})$. Datalog$^{\neq}$-rewritability is defined accordingly, but allows the use of inequality in the body of Datalog rules. We are mainly interested in the following properties of ontologies.
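Let $\mathcal{O}$ be an ontology and $\mathcal{Q}$ a class of queries. Then
$\mathcal{Q}$-evaluation w.r.t. $\mathcal{O}$ is in PTime if for every $q \in \mathcal{Q}$, the query evaluation problem for $(\mathcal{O}, q)$ is in PTime.
$\mathcal{Q}$-evaluation w.r.t. $\mathcal{O}$ is Datalog-rewritable (resp. Datalog$^{\neq}$-rewritable) if for every $q \in \mathcal{Q}$, the query evaluation problem for $(\mathcal{O}, q)$ is Datalog-rewritable (resp. Datalog$^{\neq}$-rewritable).
$\mathcal{Q}$-evaluation w.r.t. $\mathcal{O}$ is coNP-hard if there is a $q \in \mathcal{Q}$ such that the query evaluation problem for $(\mathcal{O}, q)$ is coNP-hard.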
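As ontology languages, we consider fragments of the guarded fragment (GF) of FO, the two-variable guarded fragment of FO with counting, and DLs. Recall that GF formulas [?] are obtained by starting from atomic formulas over $\Sigma$ and equalities $x = y$ and then using the boolean connectives and guarded quantifiers of the form
$$\forall \vec{y}\,(\alpha(\vec{x},\vec{y}) \rightarrow \varphi(\vec{x},\vec{y})), \qquad \exists \vec{y}\,(\alpha(\vec{x},\vec{y}) \wedge \varphi(\vec{x},\vec{y}))$$
where $\varphi(\vec{x},\vec{y})$ is a guarded formula with free variables among $\vec{x},\vec{y}$ and $\alpha(\vec{x},\vec{y})$ is an atomic formula or an equality $x = y$ that contains all variables in $\vec{x},\vec{y}$. The formula $\alpha$ is called the guard of the quantifier.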
In ontologies, we only allow GF sentences $\varphi$ that are invariant under disjoint unions, that is, for all families $\mathfrak{B}_i$, $i \in I$, of interpretations with mutually disjoint domains, the following holds: $\mathfrak{B}_i \models \varphi$ for all $i \in I$ if, and only if, $\bigcup_{i \in I} \mathfrak{B}_i \models \varphi$. We give a syntactic characterization of GF sentences that are invariant under disjoint unions. Denote by openGF the fragment of GF that consists of all (open) formulas whose subformulas are all open and in which equality is not used as a guard. The fragment uGF of GF is the set of sentences obtained from openGF by a single guarded universal quantifier: if $\varphi(\vec{y})$ is in openGF, then $\forall \vec{y}\,(\alpha(\vec{y}) \rightarrow \varphi(\vec{y}))$ is in uGF, where $\alpha(\vec{y})$ is an atomic formula or an equality $y = y$ that contains all variables in $\vec{y}$. We often omit equality guards in uGF sentences of the form $\forall y\,(y = y \rightarrow \varphi(y))$ and simply write $\forall y\, \varphi$. A uGF ontology is a finite set of sentences in uGF.
Theorem 1. A GF sentence is invariant under disjoint unions iff it is equivalent to a uGF sentence.
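The direction from right to left is straightforward. For the converse direction, observe that every GF sentence is equivalent to a Boolean combination of uGF sentences. Now assume that $\varphi$ is a GF sentence and invariant under disjoint unions. Let $\mathrm{cons}(\varphi)$ be the set of all sentences $\chi$ in uGF with $\varphi \models \chi$. By compactness of FO it is sufficient to show that $\mathrm{cons}(\varphi) \models \varphi$. If this is not the case, take a model $\mathfrak{A}_0$ of $\mathrm{cons}(\varphi)$ refuting $\varphi$ and take for any sentence $\chi$ in uGF that is not in $\mathrm{cons}(\varphi)$ an interpretation $\mathfrak{A}_\chi$ satisfying $\varphi$ and refuting $\chi$. Let $\mathfrak{A}_1$ be the disjoint union of all $\mathfrak{A}_\chi$. By preservation of $\varphi$ under disjoint unions, $\mathfrak{A}_1$ satisfies $\varphi$. By reflection of $\varphi$ for disjoint unions, the disjoint union $\mathfrak{A}$ of $\mathfrak{A}_0$ and $\mathfrak{A}_1$ does not satisfy $\varphi$. Thus $\mathfrak{A}_1$ satisfies $\varphi$ and $\mathfrak{A}$ does not satisfy $\varphi$ but by construction $\mathfrak{A}$ and $\mathfrak{A}_1$ satisfy the same sentences in uGF. This is impossible since $\varphi$ is equivalent to a Boolean combination of uGF sentences. ∎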
The following example shows that some very simple Boolean combinations of uGF sentences are not invariant under disjoint unions.
Then is not preserved under disjoint unions since and are models of but refutes ; does not reflect disjoint unions since the disjoint union of and is a model of but refutes . We will use these ontologies later to explain why we restrict this study to fragments of GF that are invariant under disjoint unions.
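The failure of preservation under disjoint unions can be replayed on a toy case. The sketch below uses a sentence of our own choosing, $(\forall x\, A(x)) \vee (\forall x\, B(x))$, which is a Boolean combination of uGF sentences; it is meant only as an illustration and is not claimed to be the ontology of the example above. Interpretations are encoded as sets of tuples `(relation, arg1, ...)`, which is an assumption of this sketch.

```python
def domain(facts):
    """All elements occurring as arguments in a set of facts."""
    return {a for fact in facts for a in fact[1:]}

def holds_everywhere(rel, facts):
    """True iff the unary relation rel holds for every domain element."""
    return all((rel, a) in facts for a in domain(facts))

def phi(facts):
    """Assumed example sentence: (forall x A(x)) or (forall x B(x))."""
    return holds_everywhere("A", facts) or holds_everywhere("B", facts)

def disjoint_union(*interpretations):
    """Union of interpretations whose domains must be pairwise disjoint."""
    doms = [domain(i) for i in interpretations]
    assert all(doms[i].isdisjoint(doms[j])
               for i in range(len(doms)) for j in range(i))
    return set().union(*interpretations)

A1 = {("A", "a")}   # satisfies forall x A(x)
A2 = {("B", "b")}   # satisfies forall x B(x)
print(phi(A1), phi(A2), phi(disjoint_union(A1, A2)))  # True True False
```

Both components are models of the sentence, but their disjoint union is not, so the sentence is not preserved under disjoint unions.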
When studying uGF ontologies, we are going to vary several parameters. The depth of a formula $\varphi$ in openGF is the nesting depth of guarded quantifiers in $\varphi$. The depth of a sentence $\forall \vec{y}\,(\alpha(\vec{y}) \rightarrow \varphi(\vec{y}))$ in uGF is the depth of $\varphi(\vec{y})$, thus the outermost guarded quantifier is not counted. The depth of a uGF ontology is the maximum depth of its sentences. We indicate restricted depth in brackets, writing e.g. uGF$(2)$ to denote the set of all uGF sentences of depth at most 2.
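The depth measure can be made concrete with a small abstract syntax tree. The class names below (`Atom`, `Not`, `And`, `Or`, `GuardedQuantifier`) and the sample sentence are hypothetical choices for this sketch; only the rule that each guarded quantifier adds one level, and that the outermost universal quantifier of a uGF sentence is not counted, is taken from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:
    rel: str
    args: tuple

@dataclass(frozen=True)
class Not:
    sub: object

@dataclass(frozen=True)
class And:
    left: object
    right: object

@dataclass(frozen=True)
class Or:
    left: object
    right: object

@dataclass(frozen=True)
class GuardedQuantifier:
    kind: str      # "exists" or "forall"
    bound: tuple   # quantified variables
    guard: Atom    # guard atom of the quantifier
    body: object

def depth(phi):
    """Nesting depth of guarded quantifiers in an openGF formula."""
    if isinstance(phi, Atom):
        return 0
    if isinstance(phi, Not):
        return depth(phi.sub)
    if isinstance(phi, (And, Or)):
        return max(depth(phi.left), depth(phi.right))
    if isinstance(phi, GuardedQuantifier):
        return 1 + depth(phi.body)
    raise TypeError(f"unexpected formula node: {phi!r}")

def sentence_depth(guard, body):
    """Depth of forall x (guard -> body): the outermost quantifier is not counted."""
    return depth(body)

# Hypothetical sentence: forall x (A(x) -> exists y (R(x,y) and B(y)))
body = GuardedQuantifier("exists", ("y",), Atom("R", ("x", "y")), Atom("B", ("y",)))
print(sentence_depth(Atom("A", ("x",)), body))  # 1
```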
is in uGF since the openGF formula has depth 1.
For every GF sentence $\varphi$, one can construct in polynomial time a conservative extension $\varphi'$ in uGF by converting into Scott normal form [?]. Thus, the satisfiability and CQ-evaluation problems for full GF can be polynomially reduced to the corresponding problems for uGF.
We use uGF$^-$ to denote the fragment of uGF where only equality guards are admitted in the outermost universal quantifier applied to an openGF formula. Thus, the sentence in Example 2 (1) is a uGF sentence of depth 1, but not a uGF$^-$ sentence of depth 1. It is, however, equivalent to the following uGF$^-$ sentence of depth 1:
An example of a uGF sentence of depth 1 that is not equivalent to a uGF$^-$ sentence of depth 1 is given in Example 3 below. Intuitively, uGF sentences of depth 1 can be thought of as uGF$^-$ sentences of ‘depth $1.5$’ because giving up $\cdot^-$ allows an additional level of ‘real’ quantification (meaning: over guards that are not forced to be equality), but only in a syntactically restricted way.
The two-variable fragment of uGF is denoted with uGF$_2$. More precisely, in uGF$_2$ we admit only the two fixed variables $x$ and $y$ and disallow the use of relation symbols of arity exceeding two. We also consider two extensions of uGF$_2$ with forms of counting. First, uGF$_2(f)$ denotes the extension of uGF$_2$ with function symbols, that is, a uGF$_2(f)$ ontology is a finite set of uGF$_2$ sentences and of functionality axioms $\forall x \forall y_1 \forall y_2\,((R(x,y_1) \wedge R(x,y_2)) \rightarrow y_1 = y_2)$ [?]. Second, we consider the extension uGC$_2$ of uGF$_2$ with counting quantifiers. More precisely, the language openGC$_2$ is defined in the same way as the two-variable fragment of openGF, but in addition admits guarded counting quantifiers [?, ?]: if $n \in \mathbb{N}$, $\{z_1, z_2\} = \{x, y\}$, and $\alpha(z_1,z_2) \in \{R(z_1,z_2), R(z_2,z_1)\}$ for some $R \in \Sigma$ and $\varphi(z_1,z_2)$ is in openGC$_2$, then $\exists^{\geq n} z_1\,(\alpha(z_1,z_2) \wedge \varphi(z_1,z_2))$ is in openGC$_2$. The ontology language uGC$_2$ is then defined in the same way as uGF$_2$, using openGC$_2$ instead of openGF$_2$. The depth of formulas in uGC$_2$ is defined in the expected way, that is, guarded counting quantifiers and guarded quantifiers both contribute to it.
The above restrictions can be freely combined and we use the obvious names to denote such combinations. For example, uGF$^-_2(1,f)$ denotes the two-variable fragment of uGF with function symbols and where all sentences must have depth 1 and the guard of the outermost quantifier must be equality. Note that uGF admits equality, although in a restricted way (only in non-guard positions, with the possible exception of the guard of the outermost quantifier). We shall also consider fragments of uGF that admit no equality at all except as a guard of the outermost quantifier. To emphasize that the restricted use of equality is allowed, we from now on use the equality symbol in brackets whenever equality is present, as in uGF$(=)$, uGF$^-(1,=)$, and uGC$^-_2(1,=)$. Conversely, uGF, uGF$^-(1)$, and uGC$^-_2(1)$ from now on denote the corresponding fragments where equality is only allowed as a guard of the outermost quantifier.
Description logics are a popular family of ontology languages that are related to the guarded fragments of FO introduced above. We briefly review the basic description logic $\mathcal{ALC}$; further details on this and other DLs mentioned in this paper can be found in the appendix and in [?]. DLs generally use relations of arity one and two only. An $\mathcal{ALC}$ concept is formed according to the syntax rule
$$C, D ::= A \mid \top \mid \bot \mid \neg C \mid C \sqcap D \mid C \sqcup D \mid \exists R.C \mid \forall R.C$$
where $A$ ranges over unary relations and $R$ over binary relations. An $\mathcal{ALC}$ ontology $\mathcal{O}$ is a finite set of concept inclusions $C \sqsubseteq D$, with $C$ and $D$ $\mathcal{ALC}$ concepts. The semantics of $\mathcal{ALC}$ concepts $C$ can be given by translation to openGF formulas $C^*(x)$ with one free variable $x$ and two variables overall. A concept inclusion $C \sqsubseteq D$ then translates to the uGF$^-_2$ sentence $\forall x\,(C^*(x) \rightarrow D^*(x))$. The depth of an $\mathcal{ALC}$ concept is the maximal nesting depth of $\exists R$ and $\forall R$. The depth of an $\mathcal{ALC}$ ontology is the maximum depth of concepts that occur in it. Thus, every $\mathcal{ALC}$ ontology of depth $n$ is a uGF$^-_2$ ontology of depth $n$. When translating into uGF$_2$ instead of into uGF$^-_2$, the depth might decrease by one because one can exploit the outermost quantifier (which does not contribute to the depth). A more detailed description of the relationship between DLs and fragments of uGF is given in the appendix.
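The translation of concepts into two-variable guarded formulas can be sketched as follows. The sketch assumes the usual constructors of a basic DL (atomic concepts, negation, conjunction, disjunction, existential and universal restrictions) and uses the textbook standard translation; the nested-tuple encoding and the function names are hypothetical, and the exact translation used in the appendix may differ in detail.

```python
def translate(concept, x="x", y="y"):
    """Standard translation of a concept into a guarded formula (as a string)
    with free variable x, reusing only the two variables x and y."""
    kind = concept[0]
    if kind == "atom":
        return f"{concept[1]}({x})"
    if kind == "not":
        return f"¬{translate(concept[1], x, y)}"
    if kind in ("and", "or"):
        op = "∧" if kind == "and" else "∨"
        return f"({translate(concept[1], x, y)} {op} {translate(concept[2], x, y)})"
    if kind == "exists":   # ("exists", role, filler)
        return f"∃{y}({concept[1]}({x},{y}) ∧ {translate(concept[2], y, x)})"
    if kind == "forall":   # ("forall", role, filler)
        return f"∀{y}({concept[1]}({x},{y}) → {translate(concept[2], y, x)})"
    raise ValueError(f"unknown constructor: {kind}")

def concept_depth(concept):
    """Maximal nesting depth of existential and universal restrictions."""
    kind = concept[0]
    if kind == "atom":
        return 0
    if kind == "not":
        return concept_depth(concept[1])
    if kind in ("and", "or"):
        return max(concept_depth(concept[1]), concept_depth(concept[2]))
    return 1 + concept_depth(concept[2])

def translate_inclusion(lhs, rhs):
    """Sentence for the inclusion lhs ⊑ rhs, with the equality guard omitted."""
    return f"∀x({translate(lhs)} → {translate(rhs)})"

c = ("exists", "R", ("exists", "S", ("atom", "A")))
print(translate(c))      # ∃y(R(x,y) ∧ ∃x(S(y,x) ∧ A(x)))
print(concept_depth(c))  # 2
```

Swapping the roles of the two variables at every restriction is what keeps the translation within two variables.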
The concept inclusion has depth 2, but is equivalent to the uGF sentence
Note that for any ontology $\mathcal{O}$ in any DL considered in this paper one can construct in a straightforward way in polynomial time a conservative extension $\mathcal{O}'$ of $\mathcal{O}$ of depth one. In fact, many DL algorithms for satisfiability or query evaluation assume that the ontology is of depth one and normalized.
We also consider the extensions of $\mathcal{ALC}$ with inverse roles (denoted in the name of the DL by the letter $\mathcal{I}$), role inclusions (denoted by $\mathcal{H}$), qualified number restrictions (denoted by $\mathcal{Q}$), partial functions as defined above (denoted by $\mathcal{F}$), and local functionality expressed by $(\leq 1\, R)$ (denoted by $\mathcal{F}_\ell$). The depth of ontologies formulated in these DLs is defined in the obvious way. Thus, $\mathcal{ALCHIQ}$ ontologies (which admit all the constructors introduced above) translate into uGC$_2$ ontologies, preserving the depth.
For any syntactic object $O$ (such as an ontology or a query), we use $|O|$ to denote the number of symbols needed to write $O$, counting relation names, variable names, and so on as a single symbol and assuming that numbers in counting quantifiers and DL number restrictions are coded in unary.
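We introduce guarded tree decompositions and rooted acyclic queries [?]. A set $G \subseteq \mathrm{dom}(\mathfrak{A})$ is guarded in the interpretation $\mathfrak{A}$ if $G$ is a singleton or there are $R \in \Sigma$ and $R(a_1,\dots,a_k) \in \mathfrak{A}$ such that $G = \{a_1,\dots,a_k\}$. By $S(\mathfrak{A})$, we denote the set of all guarded sets in $\mathfrak{A}$. A tuple $(a_1,\dots,a_k) \in \mathrm{dom}(\mathfrak{A})^k$ is guarded in $\mathfrak{A}$ if $\{a_1,\dots,a_k\}$ is a subset of some guarded set in $\mathfrak{A}$. A guarded tree decomposition of $\mathfrak{A}$ is a triple $(T, E, \mathrm{bag})$ with $(T,E)$ an acyclic undirected graph and bag a function that assigns to every $t \in T$ a set $\mathrm{bag}(t)$ of atoms such that $\mathfrak{A} = \bigcup_{t \in T} \mathrm{bag}(t)$ and
$\mathrm{dom}(\mathrm{bag}(t))$ is guarded for every $t \in T$;
$\{t \in T \mid a \in \mathrm{dom}(\mathrm{bag}(t))\}$ is connected in $(T,E)$, for every $a \in \mathrm{dom}(\mathfrak{A})$.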
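We say that $\mathfrak{A}$ is guarded tree decomposable if there exists a guarded tree decomposition of $\mathfrak{A}$. We call $(T, E, \mathrm{bag})$ a connected guarded tree decomposition (cg-tree decomposition) if, in addition, $(T,E)$ is connected (i.e., a tree) and $\mathrm{dom}(\mathrm{bag}(t)) \cap \mathrm{dom}(\mathrm{bag}(t')) \neq \emptyset$ for all $(t,t') \in E$. In this case, we often assume that $(T,E)$ has a designated root $r$, which allows us to view $(T,E)$ as a directed tree whenever convenient.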
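Guarded sets and guarded tuples are easy to compute for a finite instance. The following sketch assumes facts are encoded as tuples `(relation, arg1, ..., argk)`; the function names are hypothetical. It follows the definition above directly: a set is guarded if it is a singleton or the set of arguments of some fact.

```python
def domain(facts):
    """All elements occurring as arguments in a set of facts."""
    return {a for fact in facts for a in fact[1:]}

def guarded_sets(facts):
    """All guarded sets: singletons plus the argument sets of facts."""
    singletons = {frozenset([a]) for a in domain(facts)}
    from_facts = {frozenset(fact[1:]) for fact in facts}
    return singletons | from_facts

def maximal_guarded_sets(facts):
    """Guarded sets that are not strictly contained in another guarded set."""
    sets = guarded_sets(facts)
    return {g for g in sets if not any(g < other for other in sets)}

def is_guarded_tuple(tup, facts):
    """A tuple is guarded if its elements all lie in one guarded set."""
    return any(set(tup) <= g for g in guarded_sets(facts))

facts = {("R", "a", "b"), ("T", "a", "b", "c"), ("A", "d")}
print(maximal_guarded_sets(facts) == {frozenset("abc"), frozenset("d")})  # True
print(is_guarded_tuple(("a", "c"), facts), is_guarded_tuple(("a", "d"), facts))  # True False
```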
A CQ $q(\vec{x}) \leftarrow \phi$ is a rooted acyclic query (rAQ) if there exists a cg-tree decomposition $(T, E, \mathrm{bag})$ of the instance $D_q$ with root $r$ such that $\mathrm{dom}(\mathrm{bag}(r))$ is the set of answer variables of $q$. Note that, by definition, rAQs are non-Boolean queries.
is not an rAQ since is not guarded tree decomposable. By adding the conjunct to one obtains an rAQ.
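We will frequently use the following construction: let $D$ be an instance and $\mathcal{G}$ a set of guarded sets in $D$. Assume that $\mathfrak{B}_G$, $G \in \mathcal{G}$, are interpretations such that $\mathrm{dom}(\mathfrak{B}_G) \cap \mathrm{dom}(D) = G$ and $\mathrm{dom}(\mathfrak{B}_{G_1}) \cap \mathrm{dom}(\mathfrak{B}_{G_2}) = G_1 \cap G_2$ for any two distinct guarded sets $G_1$ and $G_2$ in $\mathcal{G}$. Then the interpretation
$$\mathfrak{B} = D \cup \bigcup_{G \in \mathcal{G}} \mathfrak{B}_G$$
is called the interpretation obtained from $D$ by hooking $\mathfrak{B}_G$ to $D$ for all $G \in \mathcal{G}$. If the $\mathfrak{B}_G$ are cg-tree decomposable interpretations with $\mathrm{dom}(\mathrm{bag}(r)) = G$ for the root $r$ of a (fixed) cg-tree decomposition of $\mathfrak{B}_G$, then $\mathfrak{B}$ is called a forest model of $D$ defined using $\mathcal{G}$. If $\mathcal{G}$ is the set of all maximal guarded sets in $D$, then we call $\mathfrak{B}$ simply a forest model of $D$. The following result can be proved using standard guarded tree unfolding [?, ?].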
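Lemma 1. Let $\mathcal{O}$ be a uGF or uGC ontology, $D$ a possibly infinite instance, and $\mathfrak{A}$ a model of $D$ and $\mathcal{O}$. Then there exist a forest model $\mathfrak{B}$ of $D$ and $\mathcal{O}$ and a homomorphism $h$ from $\mathfrak{B}$ to $\mathfrak{A}$ that preserves $\mathrm{dom}(D)$.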
We introduce and study materializability of ontologies as a necessary condition for query evaluation to be in PTime. In brief, an ontology $\mathcal{O}$ is materializable if for every instance $D$, there is a model $\mathfrak{A}$ of $\mathcal{O}$ and $D$ such that for all queries, the answers on $\mathfrak{A}$ agree with the certain answers on $D$ given $\mathcal{O}$. We show that this sometimes, but not always, coincides with the existence of universal models defined in terms of homomorphisms. We then prove that in uGF and uGC, non-materializability implies coNP-hard query answering while this is not the case for GF. Using these results, we further establish that in uGF and uGC, whether query evaluation w.r.t. an ontology is in PTime, Datalog-rewritable, or coNP-hard does not depend on the query language, that is, all these properties agree for rAQs, CQs, and UCQs. Again, this is not the case for GF.
Definition 2 (Materializability)
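Let $\mathcal{O}$ be an FO-ontology, $\mathcal{Q}$ a class of queries, and $\mathcal{M}$ a class of instances. Then
an interpretation $\mathfrak{B}$ is a $\mathcal{Q}$-materialization of $\mathcal{O}$ and an instance $D$ if it is a model of $\mathcal{O}$ and $D$ and for all $q(\vec{x}) \in \mathcal{Q}$ and $\vec{a}$ in $\mathrm{dom}(D)$, $\mathfrak{B} \models q(\vec{a})$ iff $\mathcal{O}, D \models q(\vec{a})$.
$\mathcal{O}$ is $\mathcal{Q}$-materializable for $\mathcal{M}$ if for every instance $D \in \mathcal{M}$ that is consistent w.r.t. $\mathcal{O}$, there is a $\mathcal{Q}$-materialization of $\mathcal{O}$ and $D$.
If $\mathcal{M}$ is the class of all instances, we simply speak of $\mathcal{Q}$-materializability of $\mathcal{O}$.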
We first observe that the materializability of ontologies does not depend on the query language (although concrete materializations do).
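Theorem 2. Let $\mathcal{O}$ be a uGF or uGC ontology and $\mathcal{M}$ a class of instances. Then the following conditions are equivalent:
(1) $\mathcal{O}$ is rAQ-materializable for $\mathcal{M}$;
(2) $\mathcal{O}$ is CQ-materializable for $\mathcal{M}$;
(3) $\mathcal{O}$ is UCQ-materializable for $\mathcal{M}$.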
The only non-trivial implication is (1) $\Rightarrow$ (2). It can be proved by using Lemma 1 and showing that if $\mathfrak{A}$ is an rAQ-materialization of an ontology $\mathcal{O}$ and an instance $D$, then any forest model $\mathfrak{B}$ of $\mathcal{O}$ and $D$ which admits a homomorphism to $\mathfrak{A}$ that preserves $\mathrm{dom}(D)$ is a CQ-materialization of $\mathcal{O}$ and $D$. ∎
Because of Theorem 2, we from now on speak of materializability without reference to a query language and of materializations instead of UCQ-materializations (which are then also CQ-materializations and rAQ-materializations).
A notion closely related to materializations is that of (homomorphically) universal models as used e.g. in data exchange [?, ?]. A model of an ontology $\mathcal{O}$ and an instance $D$ is hom-universal if there is a homomorphism preserving $\mathrm{dom}(D)$ from it into any model of $\mathcal{O}$ and $D$. We say that an ontology $\mathcal{O}$ admits hom-universal models if there is a hom-universal model for $\mathcal{O}$ and any instance $D$. It is well-known that hom-universal models are closely related to what we call UCQ-materializations. In fact, in many DLs and in uGC, materializability of an ontology coincides with admitting hom-universal models (although for concrete models, being hom-universal is not the same as being a materialization). We show in the long version that this is not the case for ontologies in uGF (with three variables). The proof also shows that admitting hom-universal models is not a necessary condition for query evaluation to be in PTime (in contrast to materializability).
A uGC ontology is materializable iff it admits hom-universal models. This does not hold for uGF ontologies.
The following theorem links materializability to computational complexity, thus providing the main reason for our interest in this notion. The proof is by reduction of 2+2-SAT [?], a variation of a related proof from [?].
Theorem 3. Let $\mathcal{O}$ be an FO-ontology that is invariant under disjoint unions. If $\mathcal{O}$ is not materializable, then rAQ-evaluation w.r.t. $\mathcal{O}$ is coNP-hard.
We remark that, in the proof of Theorem 3, we use instances and rAQs that use additional fresh (binary) relation symbols, that is, relation symbols that do not occur in $\mathcal{O}$.
The ontology from Example 1 shows that Theorem 3 does not hold for GF ontologies, even if they are of depth 1 and use only a single variable. In fact, this ontology is not CQ-materializable, but CQ-evaluation w.r.t. it is in PTime (both of which are easy to see).
Theorem 4. For all uGF$(=)$ and uGC$_2(=)$ ontologies $\mathcal{O}$, the following are equivalent:
(1) rAQ-evaluation w.r.t. $\mathcal{O}$ is in PTime;
(2) CQ-evaluation w.r.t. $\mathcal{O}$ is in PTime;
(3) UCQ-evaluation w.r.t. $\mathcal{O}$ is in PTime.
This remains true when ‘in PTime’ is replaced with ‘Datalog$^{\neq}$-rewritable’ and with ‘coNP-hard’ (and with ‘Datalog-rewritable’ if $\mathcal{O}$ is a uGF ontology).
By Theorem 3, we can concentrate on ontologies that are materializable. For the non-trivial implication from Point 1 to Point 3, we exploit materializability to rewrite UCQs into a finite disjunction of queries $q \wedge \bigwedge_i q_i$ where $q$ is a “core CQ” that only needs to be evaluated over the input instance (ignoring labelled nulls) and each $q_i$ is an rAQ. This is similar to squid decompositions in [?], but more subtle due to the presence of subqueries that are not connected to any answer variable of $q$. Similar constructions are also used to deal with Datalog-rewritability and with coNP-hardness. ∎
The ontology from Example 1 shows that Theorem 4 does not hold for GF ontologies, even if they use only a single variable and are of depth 1 up to an outermost universal quantifier with an equality guard.
CQ-evaluation w.r.t. this ontology is in PTime, whereas UCQ-evaluation w.r.t. it is coNP-hard.
The lower bound essentially follows the construction in the proof of Theorem 3 and the upper bound is based on a case analysis, depending on which relations occur in the CQ and in the input instance.
While materializability of an ontology is a necessary condition for PTime query evaluation in uGF and uGC, we now identify a sufficient condition called unravelling tolerance that is based on unravelling instances into cg-tree decomposable instances (which might be infinite). In fact, unravelling tolerance is even a sufficient condition for Datalog-rewritability and we will later establish our dichotomy results by showing that, for the ontology languages in question, materializability implies unravelling tolerance.
We start with introducing suitable forms of unravelling (also called guarded tree unfolding, see [?] and references therein). The uGF-unravelling of an instance is constructed as follows. Let be the set of all sequences where , , are maximal guarded sets of and
In the following, we associate each with a set of atoms . Then we define as and note that is a cg-tree decomposition of where if for some .
Set . Take an infinite supply of copies of any . We set if is a copy of . We define (up to isomorphism) proceeding by induction on the length of the sequence . For any , is an instance whose domain is a set of copies of such that the mapping is an isomorphism from onto the subinstance of induced by . To define for when , take for any a fresh copy of and define with domain such that the mapping is an isomorphism from onto . The following example illustrates the construction of .
(1) Consider the instance depicted below with the maximal guarded sets . Then the unravelling of consists of three isomorphic chains (we depict only one such chain):
(2) Next consider the instance depicted below which has the shape of a tree of depth one with root and has three maximal guarded sets . Then the unravelling of consists of three isomorphic trees of depth one of infinite outdegree (again we depict only one):
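The tree of sequences underlying the unravelling can be explored to a bounded depth with a short program. The following is only a sketch under stated assumptions: the side conditions on the sequences did not survive in the text above, so the code assumes that consecutive sets are distinct and overlap and that a step never returns to the set visited two steps earlier, and it assumes that the stronger uGC condition requires the two overlaps of a set with its neighbours to differ. The function names and the encoding of facts as tuples are hypothetical, and only the index sequences are enumerated, not the copied instances attached to them.

```python
def maximal_guarded_sets(facts):
    """facts: iterable of tuples (relation, arg1, ..., argn).
    A guarded set is taken to be the set of constants of some fact;
    it is maximal if no other guarded set strictly contains it."""
    guarded = {frozenset(f[1:]) for f in facts}
    return sorted((g for g in guarded if not any(g < h for h in guarded)),
                  key=sorted)


def unravelling_sequences(mgs, max_len, counting=False):
    """Enumerate all sequences of maximal guarded sets of length <= max_len.

    Assumed conditions (not taken verbatim from the source):
      * consecutive sets are distinct and share an element;
      * uGF variant: a step never returns to the set two positions back;
      * uGC variant (counting=True): the overlap with the previous set
        differs from the overlap with the next set.
    """
    def extends(seq, nxt):
        last = seq[-1]
        if nxt == last or not (nxt & last):
            return False
        if len(seq) >= 2:
            prev = seq[-2]
            if counting:
                return (last & prev) != (last & nxt)
            return nxt != prev
        return True

    layer = [(g,) for g in mgs]
    result = list(layer)
    for _ in range(max_len - 1):
        layer = [seq + (g,) for seq in layer for g in mgs if extends(seq, g)]
        result.extend(layer)
    return result


# A triangle and a star with three leaves (relation name R is illustrative).
triangle = [("R", "a", "b"), ("R", "b", "c"), ("R", "c", "a")]
star = [("R", "a", "b1"), ("R", "a", "b2"), ("R", "a", "b3")]

tri_seqs = unravelling_sequences(maximal_guarded_sets(triangle), 3)
star_ugf = unravelling_sequences(maximal_guarded_sets(star), 3)
star_ugc = unravelling_sequences(maximal_guarded_sets(star), 3, counting=True)
```

Under these assumptions, the star admits sequences of length three in the uGF variant but none in the uGC variant, which matches the intuition that the uGC-unravelling does not create additional successors of the root.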
By construction, the mapping is a homomorphism from onto and the restriction of to any guarded set is an isomorphism. It follows that for any uGF ontology , UCQ , and in , if , then . This implication does not hold for ontologies in the guarded fragment with functions or counting. To see this, let
Then for the instance from Example 5 (2) but . For this reason the uGF-unravelling is not appropriate for the guarded fragment with functions or counting. By replacing Condition (c) by the stronger condition
we obtain an unravelling that we call uGC-unravelling and that we apply whenever all relations have arity at most two. One can show that the uGC-unravelling of an instance preserves the number of -successors of constants in and that, in fact, the implication ‘’ holds for every uGC ontology , UCQ , and tuple in the uGC-unravelling of .
We are now ready to define unravelling tolerance. For a maximal guarded set in , the copy in of a tuple in is the unique in such that for .
A uGF (resp. uGC) ontology is unravelling tolerant if for every instance , every rAQ , and every tuple in such that the set of elements of is maximally guarded in the following are equivalent:
where is the copy of in
where is the uGF-unravelling (resp. the uGC-unravelling) of .
We have seen above that the implication (2) ⇒ (1) in Definition 3 holds for every uGF and uGC ontology and every UCQ. Note that it is pointless to define unravelling tolerance using the implication (1) ⇒ (2) for UCQs or CQs that are not acyclic. The following example shows that (1) ⇒ (2) does not always hold for rAQs.
Consider the uGF ontology that contains the sentences
For instances not using , states that is entailed for all that are -connected to some -cycle in with an odd number of constants. Thus, for the instance from Example 5 (1) we have for every but for any .
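The difference between the instance and its unravelling in this example comes down to the presence of an odd cycle. The following sketch makes that concrete under two assumptions that are not stated in the text: cycles are read as undirected, and the relevant relation is binary and given as a list of pairs. The function name is hypothetical. The check is the standard two-colouring test, since a connected component contains an odd cycle exactly when it is not bipartite.

```python
from collections import deque


def elements_near_odd_cycle(edges):
    """Return all constants whose connected component (edges read as
    undirected) contains a cycle with an odd number of constants."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    colour, result = {}, set()
    for start in adj:
        if start in colour:
            continue
        colour[start] = 0
        component, odd = [start], False
        queue = deque([start])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in colour:
                    colour[y] = 1 - colour[x]
                    component.append(y)
                    queue.append(y)
                elif colour[y] == colour[x]:
                    # Two adjacent constants with the same colour:
                    # the component is not bipartite.
                    odd = True
        if odd:
            result.update(component)
    return result


# The triangle contains an odd cycle; a finite prefix of a chain does not.
triangle_edges = [("a", "b"), ("b", "c"), ("c", "a")]
chain_edges = [("a0", "b0"), ("b0", "c0"), ("c0", "a1"), ("a1", "b1")]

in_triangle = elements_near_odd_cycle(triangle_edges)
in_chain = elements_near_odd_cycle(chain_edges)
```

Every finite prefix of a chain is bipartite, so no element of the chain is reported, whereas every element of the triangle is.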
We now show that, as announced, unravelling tolerance implies that query evaluation is Datalog-rewritable.
For all and ontologies , unravelling tolerance of implies that rAQ-evaluation w.r.t. is Datalog-rewritable (and Datalog-rewritable if is formulated in uGF).
We sketch the proof for the case that is a ontology; similar constructions work for the other cases. Suppose that is unravelling tolerant, and that is a rAQ. We construct a Datalog program that, given an instance , computes the certain answers of on given , where w.l.o.g. we can restrict our attention to answers such that the set of elements of is maximally guarded in . By unravelling tolerance, it is enough to check if , where is the copy of in and is the uGF-unravelling of .
The Datalog program assigns to each maximally guarded tuple in a set of types. Here, a type is a maximally consistent set of uGF formulas with free variables in , where the variable represents the element . It can be shown that we only need to consider types with formulas of the form or , where is obtained from a subformula of or by substituting a variable in for each of its free variables, or is an atomic formula in the signature of with free variables in . In particular, the set of all types is finite. We further restrict our attention to types that are realizable in some model of , i.e., there is a model of containing all elements of that is a model of each formula in under the interpretation . The Datalog program ensures the following:
for any two maximally guarded tuples , in that share an element, and any type assigned to there is a type assigned to that is compatible with (intuitively, the two types agree on all formulas that only talk about elements shared by and );
a tuple is an answer to if all types assigned to contain , or some maximally guarded tuple in has no type assigned to it.
It can be shown that is an answer to iff .
The interesting part is the “if” part. Suppose is not an answer to . Then, each maximally guarded tuple in is assigned at least one type, and for some type assigned to we have . We use this type assignment to label each maximally guarded tuple of with a type so that (1) for each maximally guarded tuple of that shares an element with the two types and are compatible; and (2) , where is the copy of in . We can now show that the interpretation obtained from by hooking to , for all maximally guarded tuples of , is a model of and with . ∎
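The two conditions enforced by the Datalog program in the proof above can be read as a type elimination procedure. The following is a minimal sketch of that reading, not the program from the proof: types are modelled as plain sets of facts, the candidate types per tuple and the compatibility test are supplied by the caller, and all names (`eliminate_types`, `is_answer`, `agree_on_shared`) are hypothetical. Realizability of types in a model of the ontology is assumed to be decided elsewhere.

```python
def eliminate_types(tuples, candidate_types, compatible):
    """Repeatedly discard a type of a tuple t if some tuple u sharing an
    element with t has no remaining type compatible with it."""
    assigned = {t: set(candidate_types[t]) for t in tuples}
    changed = True
    while changed:
        changed = False
        for t in tuples:
            for u in tuples:
                if t == u or not set(t) & set(u):
                    continue
                keep = {ty for ty in assigned[t]
                        if any(compatible(t, ty, u, uy) for uy in assigned[u])}
                if keep != assigned[t]:
                    assigned[t] = keep
                    changed = True
    return assigned


def is_answer(assigned, target, query_fact):
    """The target tuple is an answer if some tuple has no type left,
    or if every remaining type of the target contains the query fact."""
    if any(not types for types in assigned.values()):
        return True
    return all(query_fact in ty for ty in assigned[target])


def agree_on_shared(t, ty, u, uy):
    """Toy compatibility: both types contain the same facts about the
    elements shared by t and u. Facts are (predicate, element) pairs."""
    shared = set(t) & set(u)
    return ({f for f in ty if f[1] in shared}
            == {f for f in uy if f[1] in shared})


# Toy input: (a, b) may see b as A or as B; (b, c) only allows A at b.
tuples = [("a", "b"), ("b", "c")]
candidates = {
    ("a", "b"): {frozenset({("A", "b")}), frozenset({("B", "b")})},
    ("b", "c"): {frozenset({("A", "b")})},
}
assigned = eliminate_types(tuples, candidates, agree_on_shared)
```

In the toy input, the type that labels b with B is eliminated because the neighbouring tuple offers no compatible type, so the fact A at b is contained in every remaining type of the first tuple.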
We prove dichotomies between PTime and coNP for query evaluation in the five ontology languages displayed in the bottommost part of Figure [?]. In fact, the dichotomy is even between Datalog-rewritability and coNP. The proof establishes that for ontologies formulated in any of these languages, CQ-evaluation w.r.t. is Datalog-rewritable iff it is in PTime iff is unravelling tolerant iff is materializable for the class of (possibly infinite) cg-tree decomposable instances iff is materializable and that, if none of this is the case, CQ-evaluation w.r.t. is coNP-hard. The main step towards the dichotomy result is provided by the following theorem.
Let be an ontology formulated in one of uGF, uGF, uGF, uGC, or an ontology of depth 2. If is materializable for the class of (possibly infinite) cg-tree decomposable instances with , then is unravelling tolerant.
We sketch the proof for uGF and uGF ontologies and then discuss the remaining cases. Assume that satisfies the precondition from Theorem 6. Let be an instance and its uGF unravelling. Let be a tuple in a maximal guarded set in and be the copy in of . Further let be an rAQ such that . We have to show that . Using the condition that is materializable for the class of cg-tree decomposable instances with , it can be shown that there exists a materialization of and .
By Lemma 1 we may assume that is a forest model which is obtained from by hooking cg-tree decomposable to maximal guarded in . Now we would like to obtain a model of and the original by hooking for any maximal guarded in the interpretation to rather than to . However, the resulting model is then not guaranteed to be a model of . The following example illustrates this. Let contain
Thus in every model of each node has an -successor in and having an -successor that is not in is propagated along . is unravelling tolerant. Consider the instance from Example 5 (1) depicted here again with the maximal guarded sets .
We have seen that the unravelling of consists of three chains. An example of a forest model of and is given in the figure. Even in this simple example, naively hooking the models , , to the original instance leads to an interpretation that does not satisfy , as the propagation condition for -successors not in is violated. To ensure that we obtain a model of we first define a new instance by adding to each maximal guarded set in a copy of any entailed rAQ. The following facts are needed for this to work:
Automorphisms: for any with there is an automorphism of mapping onto and such that for all . (This is trivial in the example above.) It is for this property that we need that is obtained from using maximal guarded sets only and the assumption that . It follows that if then the same rAQs are entailed at and in .
Homomorphism preservation: if there is a homomorphism from instance to instance then entails . Ontologies in uGF and uGF have this property as they use neither equality nor counting. Because of homomorphism preservation the answers in to rAQs are invariant under moving from to . Note that the remaining ontology languages in Theorem 6 do not have this property.
Now using that is materializable w.r.t. one can uniformize a materialization of that is a forest model in such a way that the automorphisms for extend to automorphisms of the resulting model which also still satisfies . In the example, after uniformization all chains will behave in the same way in the sense that every node receives an -successor not in . We then obtain a forest model of by hooking the interpretations to the maximal guarded sets in . and are guarded bisimilar. Thus is a model of and , as required.
For uGF and uGC the intermediate step of constructing is not required as sentences have smaller depth and no uniformization is needed to satisfy the ontology in the new model. For ontologies of depth 2 uniformization by constructing is needed and has to be done carefully to preserve functionality when adding copies of entailed rAQs to . ∎
We can now prove our main dichotomy result.
Let be an ontology formulated in one of uGF, uGF, uGF, uGC, or an ontology of depth 2. Then the following conditions are equivalent (unless ):
is materializable for the class of cg-tree decomposable instances with ;
is unravelling tolerant;
query evaluation w.r.t.