Finite Query Answering in Expressive Description Logics with Transitive Roles

Tomasz Gogacz
University of Warsaw, Poland
t.gogacz@mimuw.edu.pl
   Yazmín Ibáñez-García
TU Wien, Austria
yazmin.garcia@tuwien.ac.at
   Filip Murlak
University of Warsaw, Poland
fmurlak@mimuw.edu.pl
Abstract

We study the problem of finite ontology mediated query answering (FOMQA), the variant of OMQA where the represented world is assumed to be finite, and thus only finite models of the ontology are considered. We adopt the most typical setting with unions of conjunctive queries and ontologies expressed in description logics (DLs). The study of FOMQA is relevant in settings that are not finitely controllable. This is the case not only for DLs without the finite model property, but also for those allowing transitive role declarations. When transitive roles are allowed, evaluating queries is challenging: FOMQA is undecidable for {\cal SHOIF} and only known to be decidable for the Horn fragment of {\cal{ALCI\hskip-0.43ptF}}. We show decidability of FOMQA for three proper fragments of {\cal{SOIF}}: {\cal{S\hskip-0.86ptO\hskip-0.258ptI}}, {\cal{S\hskip-0.86ptO\hskip-1.032ptF}}, and {\cal{S\hskip-0.258ptI\hskip-0.43ptF}}. Our approach is to characterise models relevant for deciding finite query entailment. Relying on a certain regularity of these models, we develop automata-based decision procedures with optimal complexity bounds.

1 Introduction

Evaluating queries in the presence of background knowledge has been extensively studied in several communities. A particularly prominent take on this problem is ontology mediated query answering (OMQA) where background knowledge represented by an ontology is leveraged to infer more complete answers to queries [6]. A widely accepted family of ontology languages with varying expressive power is offered by Description Logics (DLs) [3], while the most commonly studied query language is that of (unions of) conjunctive queries.

Often, the intended models of the ontology are finite and this additional assumption allows one to infer more properties: finite ontology mediated query answering (FOMQA) is the variant of OMQA restricted to finite models. For some logics the finite variant and the unrestricted variant of the problem coincide; we then say that OMQA is finitely controllable. Studying FOMQA is interesting in settings lacking finite controllability. This is the case not only for DLs lacking the finite model property (e.g., DLs allowing both inverse roles and number restrictions), but also for logics allowing transitive role declarations. Indeed, it has recently been proved that FOMQA is undecidable for {\cal SHOIF} ontologies [25], whereas the only fragment known to be decidable is Horn-{\cal{ALCI\hskip-0.43ptF}} [14]; more expressive fragments of {\cal SHOIF} are entirely uncharted. In this paper, we establish decidability for three of them: {\cal{S\hskip-0.86ptO\hskip-0.258ptI}}, {\cal{S\hskip-0.86ptO\hskip-1.032ptF}}, and {\cal{S\hskip-0.258ptI\hskip-0.43ptF}}.

OMQA is closely related to query answering under integrity constraints in database theory: given a finite database instance and a set of constraints, determine answers to a query that are certain to hold over any extension of the given instance that satisfies the constraints. Among important classes of constraints are inclusion dependencies (IDs) and functional dependencies (FDs). This problem, often called open-world query answering (OWQA), has also been studied in the variant considering only finite extensions of the given database instance (finite OWQA), which is directly relevant for our work. OWQA over IDs is known to be finitely controllable [15, 24]. Rosati’s techniques were extended to show finite controllability for the guarded fragment of first-order logic [5]. Under combinations of IDs and FDs, OWQA is undecidable, both unrestricted and finite, but multiple decidable fragments have been isolated. For instance, for non-conflicting IDs and FDs [7], unrestricted OWQA is decidable. However, finite OWQA is undecidable already for non-conflicting IDs and keys, which are less expressive than FDs [24]. The work of [1] investigates finite OWQA for unary IDs and FDs over arbitrary signatures.

Combinations of unary IDs and unary FDs can be expressed in relatively simple DLs. This relationship and the techniques developed by [10] have been exploited in the study of finite satisfiability for simple DLs [23]. Indeed, finite satisfiability has been studied extensively [8, 18, 16, 21], but FOMQA has received limited attention in the DL community. The mentioned results on the guarded fragment give finite controllability for DLs up to {\cal{ALCHOI}\textit{b}}. For non-finitely-controllable DLs, only the already mentioned results about {\cal SHOIF} and Horn-{\cal{ALCI\hskip-0.43ptF}} are known. For Datalog{}^{\pm}, finite controllability holds for several fragments [13, 2, 4, 9]. Finally, [22] studies finite query answering for expressive fragments of first-order logic and establishes undecidability for the two-variable fragment with counting quantifiers (\mathcal{C}^{2}), and decidability for its guarded fragment, \mathcal{GC}^{2}. Decidability of \mathcal{GC}^{2} has no direct implications for DLs with nominals or transitive roles, but it proves useful in the study of {\cal{S\hskip-0.258ptI\hskip-0.43ptF}}.

Contributions.

We show that the combined complexity of FOMQA is in 2ExpTime for {\cal{S\hskip-0.86ptO\hskip-0.258ptI}}, {\cal{S\hskip-0.86ptO\hskip-1.032ptF}} and {\cal{S\hskip-0.258ptI\hskip-0.43ptF}}. These bounds are tight by existing matching lower bounds for OMQA for less expressive logics enjoying finite controllability [19, 17]. We present a direct construction of finite counter-models from arbitrary tree-like counter-models for {\cal{ALC\hskip-1.075ptO\hskip-0.258ptI}}, thus re-proving finite controllability. An extension of this construction builds finite counter-models from special tree-like models of {\cal{S\hskip-0.86ptO\hskip-0.258ptI}} and {\cal{S\hskip-0.86ptO\hskip-1.032ptF}}, which are guaranteed to exist whenever finite counter-models exist. This way finite query entailment reduces to entailment over a certain class of tree-like models recognisable by tree automata. For {\cal{S\hskip-0.258ptI\hskip-0.43ptF}}, we show that to some extent one can separate the reasoning about transitive and non-transitive (possibly functional) roles, and design a procedure that uses the decidability results for {\cal{S\hskip-0.86ptO\hskip-0.258ptI}} and {\cal{ALCI\hskip-0.43ptF}} as black boxes. The latter is derived from the work of [22].

2 Preliminaries

The DL {\cal{SOIF}} extends the classical DL {\cal ALC} with transitivity declarations on roles (\mathcal{S}), nominals (\mathcal{O}), inverses ({\cal{I}}), and role functionality declarations (\mathcal{F}) [3]. We assume a signature of countably infinite disjoint sets of concept names \mathsf{N_{C}}=\{A_{1},A_{2},\dots\}, role names \mathsf{N_{R}}=\{r_{1},r_{2},\dots\} and individual names \mathsf{N_{I}}=\{a_{1},a_{2},\dots\}. {\cal{SOIF}}-concepts C,D are defined by the grammar:

C,D::=\top\mid A\mid\neg C\mid C\sqcap D\mid\{a\}\mid\exists r.C\,,

where r\in\mathsf{N_{R}}\cup\{r^{-}\mid r\in\mathsf{N_{R}}\} is a role. Roles of the form r^{-} are called inverse roles. A {\cal{SOIF}} TBox \mathcal{T} is a finite set of concept inclusions (CIs) C\sqsubseteq D, transitivity declarations \mathsf{Tr}(r), functionality declarations \mathsf{Fn}(r), where C,D are {\cal{SOIF}}-concepts and r is a role. We assume that if the TBox contains \mathsf{Tr}(r), then it contains neither \mathsf{Fn}(r) nor \mathsf{Fn}(r^{-}). With an appropriate extension of the signature, each {\cal{SOIF}} TBox can be transformed into an equivalent TBox whose each CI has one of the following normal forms:

\bigsqcap_{i}A_{i}\sqsubseteq\bigsqcup_{j}B_{j}\,,\quad A\equiv\{a\}\,,\quad A\sqsubseteq\forall r.B\,,\quad A\sqsubseteq\exists r.B\,,

where empty conjunction is equivalent to \top and empty disjunction to \bot. We also assume that for each concept name A used in \mathcal{T} there is a complementary concept name \bar{A} axiomatised with CIs \top\sqsubseteq A\sqcup\bar{A} and A\sqcap\bar{A}\sqsubseteq\bot.

{\cal{S\hskip-0.86ptO\hskip-0.258ptI}}, {\cal{S\hskip-0.86ptO\hskip-1.032ptF}} and {\cal{S\hskip-0.258ptI\hskip-0.43ptF}} TBoxes are restrictions of {\cal{SOIF}} TBoxes. {\cal{S\hskip-0.86ptO\hskip-0.258ptI}} TBoxes do not contain functionality declarations, whereas concept inclusions in {\cal{S\hskip-0.86ptO\hskip-1.032ptF}} and {\cal{S\hskip-0.258ptI\hskip-0.43ptF}} do not contain inverse roles and nominals, respectively. Because the inverse of a transitive role is transitive anyway, for {\cal{S\hskip-0.86ptO\hskip-0.258ptI}}, {\cal{S\hskip-0.258ptI\hskip-0.43ptF}}, and {\cal{SOIF}} we shall assume that if \mathsf{Tr}(r) is present in the TBox, then so is \mathsf{Tr}(r^{-}).

An ABox is a finite set of concept and role assertions of the form A(a) and r(a,b), where A\in\mathsf{N_{C}}, r\in\mathsf{N_{R}} and \{a,b\}\subseteq\mathsf{N_{I}}. A knowledge base (KB) is a pair {\cal{K}}=(\mathcal{T},\mathcal{A}). We write |{\cal{K}}| for |\mathcal{A}|+|\mathcal{T}|. We use \mathsf{CN}({\cal{K}}), \mathsf{Rol}({\cal{K}}), \mathsf{Nom}({\cal{K}}), and \mathsf{Ind}({\cal{K}}) to denote, respectively, the set of all concept names, roles, nominals, and individuals occurring in {\cal{K}}. We stress that if r occurs in {\cal{K}}, but r^{-} does not, then r^{-}\notin\mathsf{Rol}({\cal{K}}).

A unary type is a subset of \mathsf{CN}({\cal{K}}) that contains exactly one of the concept names A, \bar{A} for each A\in\mathsf{CN}({\cal{K}}). We write \mathsf{Tp}({\cal{K}}) for the set of all unary types.
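The definition of unary types can be made concrete with a small enumeration. The following is a minimal sketch, assuming the complementary name of A is encoded as the string "not_A" and that only the base names (one per complementary pair) are passed in; the function name `unary_types` is hypothetical.

```python
from itertools import product


def unary_types(base_names):
    """Enumerate all unary types over the given base concept names.

    For every base name A, a type contains exactly one of A and its
    complement, encoded here (by assumption) as the string 'not_A'.
    """
    names = sorted(base_names)
    result = []
    for choice in product([True, False], repeat=len(names)):
        result.append(frozenset(
            name if positive else "not_" + name
            for name, positive in zip(names, choice)))
    return result


types = unary_types({"A", "B"})
print(len(types))
```

With m complementary pairs there are 2^m unary types, which is the source of the exponential factors appearing later in the complexity analysis.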

The semantics is defined via interpretations {\cal{I}}=(\Delta^{\cal{I}},\cdot^{\cal{I}}) with a non-empty domain \Delta^{\cal{I}} and an interpretation function \cdot^{\cal{I}} assigning to each A\in\mathsf{CN}({\cal{K}}) a set A^{\mathcal{I}}\subseteq\Delta^{\mathcal{I}} and to each role name r with r\in\mathsf{Rol}({\cal{K}}) or r^{-}\in\mathsf{Rol}({\cal{K}}), a binary relation r^{\mathcal{I}}\subseteq\Delta^{\mathcal{I}}\times\Delta^{\mathcal{I}}. The interpretation of complex concepts and roles is defined as usual [3]. We only consider interpretations complying with the standard name assumption in the sense that a^{\cal{I}}=a for every a\in\mathsf{N_{I}}.

An interpretation {\cal{I}} satisfies \alpha\in\mathcal{T}\cup\mathcal{A}, written as {\cal{I}}\models\alpha, if the following holds: if \alpha is a CI C\sqsubseteq D then C^{\cal{I}}\subseteq D^{\cal{I}}, if \alpha is a transitivity declaration \mathsf{Tr}(r) then r^{{\cal{I}}} is transitive, if \alpha is a functionality declaration \mathsf{Fn}(r) then r^{{\cal{I}}} is a partial function, if \alpha is an assertion A(a) then a\in A^{{\cal{I}}}, and if \alpha is an assertion r(a,b) then (a,b)\in r^{\cal{I}}.

Finally, {\cal{I}} is a model of: a TBox \mathcal{T}, denoted {\cal{I}}\models\mathcal{T}, if {\cal{I}}\models\alpha for all \alpha\in\mathcal{T}; an ABox \mathcal{A}, denoted {\cal{I}}\models\mathcal{A}, if {\cal{I}}\models\alpha for all \alpha\in\mathcal{A}; and a KB {\cal{K}} if {\cal{I}}\models\mathcal{T} and {\cal{I}}\models\mathcal{A}.
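For finite interpretations, the satisfaction conditions above can be checked directly. The sketch below assumes a hypothetical encoding of an interpretation as a dictionary with keys "domain", "concepts" and "roles", and covers only transitivity, functionality, and CIs of the first normal form (a conjunction of concept names included in a disjunction of concept names).

```python
def is_transitive(rel):
    """r is transitive: (a, b) and (b, d) in r imply (a, d) in r."""
    return all((a, d) in rel for (a, b) in rel for (c, d) in rel if b == c)


def is_partial_function(rel):
    """r is a partial function: no element has two distinct successors."""
    seen = {}
    for a, b in rel:
        if seen.setdefault(a, b) != b:
            return False
    return True


def satisfies_atomic_ci(interp, lhs, rhs):
    """Check a CI whose left side is a conjunction of the names in lhs
    and whose right side is a disjunction of the names in rhs.
    An empty lhs behaves like top, an empty rhs like bottom."""
    for d in interp["domain"]:
        in_lhs = all(d in interp["concepts"].get(a, set()) for a in lhs)
        in_rhs = any(d in interp["concepts"].get(b, set()) for b in rhs)
        if in_lhs and not in_rhs:
            return False
    return True


interp = {
    "domain": {1, 2, 3},
    "concepts": {"A": {1}, "B": {1, 2}},
    "roles": {"r": {(1, 2), (2, 3), (1, 3)}, "f": {(1, 2), (2, 2)}},
}
print(is_transitive(interp["roles"]["r"]),
      is_partial_function(interp["roles"]["f"]),
      satisfies_atomic_ci(interp, ["A"], ["B"]))
```

Existential and universal restrictions can be checked in the same element-by-element fashion; they are omitted here to keep the sketch short.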

Interpretation {\cal{I}} is a subinterpretation of interpretation {\cal{J}}, written as {\cal{I}}\subseteq{\cal{J}}, if \Delta^{\cal{I}}\subseteq\Delta^{\cal{J}}, A^{\cal{I}}\subseteq A^{\cal{J}}, and r^{\cal{I}}\subseteq r^{\cal{J}} for all A\in\mathsf{CN}({\cal{K}}), r\in\mathsf{Rol}({\cal{K}}). An interpretation {\cal{I}} is a subinterpretation of {\cal{J}} induced by \Delta_{0}\subseteq\Delta^{\cal{J}}, written as {\cal{I}}={\cal{J}}\upharpoonright{\Delta_{0}}, if \Delta^{\cal{I}}=\Delta_{0}, A^{\cal{I}}=A^{\cal{J}}\cap\Delta_{0}, and r^{\cal{I}}=r^{\cal{J}}\cap\Delta_{0}\times\Delta_{0} for all A\in\mathsf{CN}({\cal{K}}), r\in\mathsf{Rol}({\cal{K}}). We write {\cal{J}}\setminus X for the subinterpretation of {\cal{J}} induced by \Delta^{\cal{J}}\setminus X.

Let {\cal{I}} and {\cal{J}} be interpretations of {\cal{K}}. A homomorphism from {\cal{I}} to {\cal{J}}, written as h:{\cal{I}}\to{\cal{J}}, is a function h:\Delta^{\cal{I}}\to\Delta^{\cal{J}} that preserves roles, concepts, and individual names; that is, (h(d),h(d^{\prime}))\in r^{\cal{J}} whenever (d,d^{\prime})\in r^{\cal{I}}, r\in\mathsf{Rol}({\cal{K}}), h(d)\in A^{\cal{J}} whenever d\in A^{\cal{I}}, A\in\mathsf{CN}({\cal{K}}), and h(a)=a for all a\in\mathsf{Ind}(\mathcal{K}). Note that {\cal{I}}\subseteq{\cal{J}} iff the identity mapping \mathrm{id} is a homomorphism \mathrm{id}:{\cal{I}}\to{\cal{J}}.
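The homomorphism conditions translate into a direct check on finite interpretations. This is a minimal sketch under the assumption that interpretations are dictionaries with keys "domain", "concepts" and "roles" (a hypothetical encoding) and that h is given as a dictionary defined on the whole source domain.

```python
def is_homomorphism(h, src, dst, individuals=()):
    """Check that h preserves individual names, concepts and roles."""
    # Individual names must be mapped to themselves.
    if any(h.get(a) != a for a in individuals):
        return False
    # Concept membership must be preserved.
    for name, ext in src["concepts"].items():
        target = dst["concepts"].get(name, set())
        if any(h[d] not in target for d in ext):
            return False
    # Role edges must be preserved.
    for name, rel in src["roles"].items():
        target = dst["roles"].get(name, set())
        if any((h[d], h[e]) not in target for d, e in rel):
            return False
    return True


path = {"domain": {1, 2, 3}, "concepts": {"A": {1}},
        "roles": {"r": {(1, 2), (2, 3)}}}
loop = {"domain": {0}, "concepts": {"A": {0}},
        "roles": {"r": {(0, 0)}}}
collapse = {1: 0, 2: 0, 3: 0}
print(is_homomorphism(collapse, path, loop))
```

The example collapses a path onto a single reflexive element, which is a homomorphism; no homomorphism exists in the opposite direction because the path has no reflexive edge.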

Let \mathsf{N_{V}} be a countably infinite set of variables. An atom is an expression of the form A(x) or r(x,y) with A\in\mathsf{N_{C}}, r\in\mathsf{N_{R}}, and x,y\in\mathsf{N_{V}}, referred to as concept atoms and role atoms, respectively. A conjunctive query (CQ) Q is an existentially quantified conjunction q of atoms, \exists x_{1}\cdots\exists x_{n}\,q\,. For simplicity we restrict it to be Boolean; that is, \textit{var}(Q)=\{x_{1},\dots,x_{n}\}. This is without loss of generality since the case of non-Boolean CQs can be reduced to the case of Boolean queries; see e.g. [26].

A match for Q in \mathcal{I} is a total function \pi:\textit{var}(Q)\to\Delta^{\cal{I}} such that {\cal{I}},\pi\models q under the standard semantics of first-order logic. An interpretation {\cal{I}} satisfies Q, written as {\cal{I}}\models Q, if there exists a match for Q in {\cal{I}}. Note that we do not consider queries with constants (i.e., individual names); such queries can be viewed as non-Boolean queries with a fixed valuation of free variables, and thus are covered by the reduction to the Boolean case. We do consider unions of conjunctive queries (UCQs), which are disjunctions of CQs. An interpretation {\cal{I}} satisfies a UCQ Q if it satisfies one of its disjuncts. It follows immediately that UCQs are preserved under homomorphisms; that is, if {\cal{I}}\models Q and there is a homomorphism from {\cal{I}} to \mathcal{J}, then also \mathcal{J}\models Q.
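Over a finite interpretation, matches can be found by exhaustive search. The sketch below is a brute-force illustration only, exponential in the number of variables; it assumes a hypothetical encoding where a concept atom A(x) is the tuple ("A", "x"), a role atom r(x,y) is ("r", "x", "y"), a CQ is a list of atoms, and a UCQ is a list of CQs.

```python
from itertools import product


def holds(atom, pi, interp):
    """Evaluate a single atom under the variable assignment pi."""
    if len(atom) == 2:  # concept atom A(x)
        return pi[atom[1]] in interp["concepts"].get(atom[0], set())
    # role atom r(x, y)
    return (pi[atom[1]], pi[atom[2]]) in interp["roles"].get(atom[0], set())


def find_match(atoms, interp):
    """Return some match for the Boolean CQ, or None if there is none."""
    variables = sorted({v for atom in atoms for v in atom[1:]})
    domain = list(interp["domain"])
    for values in product(domain, repeat=len(variables)):
        pi = dict(zip(variables, values))
        if all(holds(atom, pi, interp) for atom in atoms):
            return pi
    return None


def satisfies_ucq(ucq, interp):
    """A UCQ is satisfied if one of its disjuncts has a match."""
    return any(find_match(cq, interp) is not None for cq in ucq)


triangle = {"domain": {0, 1, 2}, "concepts": {},
            "roles": {"r": {(0, 1), (1, 2), (2, 0)}}}
two_cycle = [("r", "x", "y"), ("r", "y", "x")]
three_cycle = [("r", "x", "y"), ("r", "y", "z"), ("r", "z", "x")]
print(satisfies_ucq([two_cycle, three_cycle], triangle))
```

In the example, the directed triangle satisfies the query asking for a 3-cycle but not the one asking for a 2-cycle, so the union is satisfied through its second disjunct.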

A query Q is entailed by a KB {\cal{K}}, denoted as {\cal{K}}\models Q, if every model of {\cal{K}} satisfies Q. A model of {\cal{K}} that does not satisfy Q is called a counter-model. The query entailment problem asks whether a KB {\cal{K}} entails a (U)CQ Q; equivalently, it asks whether no counter-model exists. It is well known that the query answering problem can be reduced to query entailment.

In this paper, we address the problem of finite query entailment, which is a variant of query entailment where only finite interpretations are considered: an interpretation \mathcal{I} is finite if \Delta^{\mathcal{I}} is finite, and a query Q is finitely entailed by \mathcal{K}, denoted as \mathcal{K}\models_{\sf{fin}}Q, if every finite model of {\cal{K}} satisfies Q.

3 From tree-shaped to finite counter-models

Let us fix an {\cal{ALC\hskip-1.075ptO\hskip-0.258ptI}} knowledge base {\cal{K}} and a union of conjunctive queries Q. Because we have nominals in our logic, we can assume without loss of generality that {\cal{K}}’s ABox does not contain role assertions.

The construction of a finite counter-model begins from a tree-shaped counter-model. An interpretation {\cal{I}} is tree-shaped if the interpretation {\cal{I}}\setminus\mathsf{Nom}({\cal{K}}) is a finite collection of trees of bounded degree, with elements of \mathsf{Ind}({\cal{K}})\setminus\mathsf{Nom}({\cal{K}}) occurring only in the roots. It is well known that a tree-shaped counter-model can be obtained from an arbitrary counter-model {\cal{M}} by the standard unravelling procedure. To turn a tree-shaped counter-model into a finite counter-model we use a variant of the blocking principle: a systematic policy of reusing elements. For example, rather than adding a fresh r-successor of unary type \tau, one could add an r-edge to some previously added element of unary type \tau (if there is one). This would give a finite model for {\cal{K}}, but not necessarily a counter-model for Q: a query asking for a cycle of length 42 might be unsatisfied in the original model, but the blocking principle introduces many new cycles, possibly one of length 42 among them. This is in fact the key difficulty to overcome: we need a blocking principle that does not introduce cycles shorter than the size of the query.
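The danger of naive blocking can be seen on a toy example. The sketch below assumes a single role r, represents a finite prefix of an infinite r-chain whose elements all share one unary type, and compares it with the structure obtained by redirecting an edge back to an earlier element of the same type. The helper name is hypothetical; it tests for a closed walk of a given length, which is exactly what a cyclic CQ of that length can match.

```python
def has_closed_walk(edges, length):
    """True if some walk x0 -> x1 -> ... -> x0 with `length` edges exists."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
    for start in succ:
        frontier = {start}
        for _ in range(length):
            frontier = {v for u in frontier for v in succ.get(u, ())}
        if start in frontier:
            return True
    return False


# Finite prefix of an infinite chain: no cycles at all.
chain = {(i, i + 1) for i in range(5)}
# Naive blocking: the successor of element 2 is replaced by element 0.
blocked = {(0, 1), (1, 2), (2, 0)}

print(has_closed_walk(chain, 42), has_closed_walk(blocked, 42))
```

The blocked structure is finite, but it now matches the query asking for a cycle of length 42, because going 14 times around the 3-cycle yields a closed walk of length 42.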

The first step is to look at sufficiently large neighbourhoods, rather than just unary types.

Definition 1.

For d\in\Delta^{\cal{I}}\setminus\mathsf{Nom}({\cal{K}}), the n-neighbourhood N_{n}^{{\cal{I}}}(d) is the subinterpretation of {\cal{I}} induced by \mathsf{Nom}({\cal{K}}) and all elements e\in\Delta^{\cal{I}}\setminus\mathsf{Nom}({\cal{K}}) within distance n from d in {\cal{I}}\setminus\mathsf{Nom}({\cal{K}}), enriched with a fresh concept interpreted as \{d\}. For a\in\mathsf{Nom}({\cal{K}}), N_{n}^{{\cal{I}}}(a) is the subinterpretation induced by \mathsf{Nom}({\cal{K}}), enriched similarly.
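The domain of an n-neighbourhood can be computed by breadth-first search. The following is a minimal sketch, assuming the dictionary encoding of interpretations used for illustration (keys "domain", "concepts", "roles") and assuming that distance is measured along role edges in either direction, ignoring all edges that touch a nominal. It returns only the domain; the induced subinterpretation and the fresh concept marking the centre are omitted.

```python
from collections import deque


def neighbourhood_domain(interp, nominals, d, n):
    """Domain of the n-neighbourhood of d: all nominals, plus the
    non-nominal elements within distance n of d once nominals are removed."""
    nominals = set(nominals)
    if d in nominals:
        return set(nominals)
    # Undirected adjacency over non-nominal elements only.
    adj = {}
    for rel in interp["roles"].values():
        for a, b in rel:
            if a in nominals or b in nominals:
                continue
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    dist = {d: 0}
    queue = deque([d])
    while queue:
        u = queue.popleft()
        if dist[u] == n:
            continue  # do not expand beyond radius n
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist) | nominals


interp = {
    "domain": {0, 1, 2, 3, 4, "a"},
    "concepts": {},
    "roles": {"r": {(0, 1), (1, 2), (2, 3), (3, 4)},
              "s": {("a", 0), ("a", 4)}},
}
print(neighbourhood_domain(interp, {"a"}, 2, 1))
```

Note that elements 0 and 4 are both adjacent to the nominal "a", yet 4 does not belong to the 2-neighbourhood of 0: paths through nominals do not count towards distance.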

Replacing unary types with large neighbourhoods is not enough, because nearby elements can have arbitrarily large isomorphic neighbourhoods: in the integers with the successor relation all n-neighbourhoods are isomorphic. The next step is to enrich the initial counter-model in such a way that overlapping neighbourhoods are not isomorphic, following an idea from [12].

Definition 2.

A colouring with k colours of an interpretation {\cal{I}} is an extension {\cal{J}} of {\cal{I}} with \Delta^{\mathcal{J}}=\Delta^{\mathcal{I}}, such that \mathcal{J} coincides with \mathcal{I} on every symbol in the signature of \mathcal{I}, and interprets k fresh concept names B_{1},\dots,B_{k} such that B_{1}^{\cal{J}},\dots,B_{k}^{\cal{J}} is a partition of \Delta^{\mathcal{J}}. We say that d\in B_{i}^{\cal{J}} has colour B_{i}. A colouring \mathcal{J} of {\cal{I}} is n-proper if for each d\in\Delta^{\mathcal{J}} all elements of N_{n}^{{\cal{J}}}(d) have different colours.

Because \mathsf{Nom}({\cal{K}}) is contained in each neighbourhood, in n-proper colourings each nominal has a unique colour.

Lemma 1.

If {\cal{I}}\setminus\mathsf{Nom}({\cal{K}}) has bounded degree, then for all n\geq 0 there exists an n-proper colouring of {\cal{I}} with finitely many colours.

We write {\cal{I}}_{n} for an arbitrarily chosen n-proper colouring of {\cal{I}}. Because the neighbourhoods have bounded size and we used only finitely many colours, there are only finitely many n-neighbourhoods in {\cal{I}}_{n} up to isomorphism. The blocking principle described below relies on this.
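One way to see why finitely many colours suffice is a greedy argument, sketched below for finite graphs without nominals. This is an illustration under stated assumptions and not necessarily the construction behind Lemma 1: the graph is given as a hypothetical adjacency dictionary, and each element receives the smallest colour not yet used within distance 2n, so that any two elements sharing an n-neighbourhood end up with different colours.

```python
from collections import deque


def ball(adj, d, radius):
    """All elements within the given distance of d."""
    dist = {d: 0}
    queue = deque([d])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def greedy_proper_colouring(adj, n):
    """Give each element the least colour unused within distance 2n."""
    colour = {}
    for d in adj:
        taken = {colour[e] for e in ball(adj, d, 2 * n) if e in colour}
        c = 0
        while c in taken:
            c += 1
        colour[d] = c
    return colour


def is_n_proper(adj, colour, n):
    """Check that every n-neighbourhood is coloured injectively."""
    for d in adj:
        cols = [colour[e] for e in ball(adj, d, n)]
        if len(cols) != len(set(cols)):
            return False
    return True


# A cycle of length 12 as a finite stand-in for the integers with successor.
cycle = {i: {(i - 1) % 12, (i + 1) % 12} for i in range(12)}
colouring = greedy_proper_colouring(cycle, 1)
print(is_n_proper(cycle, colouring, 1))
```

The number of colours used is bounded by the maximal size of a ball of radius 2n, which is finite whenever the degree is bounded; with nominals present, each nominal would additionally receive its own colour.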

Let {\cal{I}} be a tree-shaped counter-model for Q. We turn it into a finite counter-model for Q as follows. Because {\cal{I}}\setminus\mathsf{Nom}({\cal{K}}) has bounded degree, we can consider an n-proper colouring {\cal{I}}_{n} of {\cal{I}}. For each branch \pi in {\cal{I}}_{n}\setminus\mathsf{Nom}({\cal{K}}), let d_{\pi} be the first node on \pi such that some earlier node e_{\pi} on \pi satisfies N^{{\cal{I}}_{n}}_{n}(d_{\pi})\simeq N^{{\cal{I}}_{n}}_{n}(e_{\pi}). The new interpretation {\cal{F}}_{n} is obtained as follows. {\cal{F}}_{n}\setminus\mathsf{Nom}({\cal{K}}) includes the branch \pi up to the predecessor of node d_{\pi} and the edge originally leading to d_{\pi} is redirected to e_{\pi}. Because the degree in {\cal{I}}_{n}\setminus\mathsf{Nom}({\cal{K}}) is bounded, the domain of {\cal{F}}_{n}\setminus\mathsf{Nom}({\cal{K}}) is a finite subset of the domain of {\cal{I}}_{n}\setminus\mathsf{Nom}({\cal{K}}). The whole interpretation {\cal{F}}_{n} is obtained by including \mathsf{Nom}({\cal{K}}) into the domain and copying from {\cal{I}}_{n} all edges connecting elements of \mathsf{Nom}({\cal{K}}) with each other and with the elements of {\cal{F}}_{n}\setminus\mathsf{Nom}({\cal{K}}).

Because we started from a model of {\cal{K}}, for all n\geq 0,

{\cal{F}}_{n}\models{\cal{K}}\,.

We claim that for sufficiently large n, {\cal{F}}_{n} is a counter-model for Q. In order to prove this, we introduce yet another interpretation, containing {\cal{I}}_{n} and {\cal{F}}_{n} as subinterpretations.

Definition 3.

Let i\leq n and let d, e be elements of {\cal{I}}_{n}. We say that (d,e) is an i-link along role r if either d has an r-successor e^{\prime} in {\cal{I}}_{n} such that N^{{\cal{I}}_{n}}_{i}(e^{\prime})\simeq N^{{\cal{I}}_{n}}_{i}(e), or e has an r-predecessor d^{\prime} in {\cal{I}}_{n} such that N^{{\cal{I}}_{n}}_{i}(d^{\prime})\simeq N^{{\cal{I}}_{n}}_{i}(d).

Notice that for i<j, each j-link is also an i-link. Note also that (d,e) is an i-link along role r if and only if (e,d) is an i-link along r^{-}.

Definition 4.

For i\leq n, let {\cal{I}}_{n}^{i} be the interpretation obtained from {\cal{I}}_{n} by including into the interpretation of each role r all i-links along r; that is, for every role r and every i-link (d,e) along r, (d,e)\in r^{{\cal{I}}_{n}^{i}}.

Clearly, we have

{\cal{I}}_{n}\subseteq{\cal{I}}_{n}^{n}\subseteq{\cal{I}}_{n}^{n-1}\subseteq\dots\subseteq{\cal{I}}_{n}^{1}\subseteq{\cal{I}}_{n}^{0}\,,

but the domains of all these interpretations coincide. We keep referring to the edges present in {\cal{I}}_{n}^{i} but not in {\cal{I}}_{n} as i-links, even though they are ordinary edges now.

Theorem 1.

Let P be a CQ with at most k binary atoms and let n\geq k^{2}. For each homomorphism h:P\to{\cal{I}}_{n}^{n} there exists a homomorphism h^{\prime}:P\to{\cal{I}}_{n} such that

N^{{\cal{I}}_{n}}_{n-k^{2}}(h(x))\simeq N^{{\cal{I}}_{n}}_{n-k^{2}}(h^{\prime}(x))

for all x\in\mathit{var}(P).

Theorem 1 holds for any interpretation {\cal{I}} of any {\cal{SOIF}} KB.

Before proving Theorem 1, let us see that it implies that {\cal{F}}_{k^{2}}\not\models Q, where k is a common upper bound on the number of binary atoms in the CQs constituting Q. Because {\cal{F}}_{k^{2}} is obtained from {\cal{I}}_{k^{2}} by adding some k^{2}-links and restricting the domain, it follows that {\cal{F}}_{k^{2}}\subseteq{\cal{I}}_{k^{2}}^{k^{2}}. Consequently, if there were a homomorphism h:P\to{\cal{F}}_{k^{2}}\subseteq{\cal{I}}_{k^{2}}^{k^{2}} for some CQ P constituting Q, Theorem 1 would yield a homomorphism h^{\prime}:P\to{\cal{I}}_{k^{2}}, contradicting {\cal{I}}\not\models Q. Thus, we have proved finite controllability for {\cal{ALC\hskip-1.075ptO\hskip-0.258ptI}}.

Corollary 1.

For each {\cal{ALC\hskip-1.075ptO\hskip-0.258ptI}} KB {\cal{K}} and UCQ Q,

{\cal{K}}\models Q\text{ iff }{\cal{K}}\models_{\mathsf{fin}}Q\,.

Proof of Theorem 1.

Let h(P) denote the subinterpretation of {\cal{I}}_{n}^{n} obtained by restricting the domain to h(\textit{var}(P)), and only keeping in each role r edges (h(x),h(y)) such that r(x,y) is an atom from P. We say that h uses an r-edge of {\cal{I}}_{n}^{n} if this r-edge is present in h(P).

Let \ell be the number of links in {\cal{I}}_{n}^{n} used by h. Then \ell\leq k, because P contains at most k binary atoms. The theorem follows by applying the following claim at most \ell times: For each homomorphism h:P\to{\cal{I}}_{n}^{i} with k\leq i\leq n that uses at least one link, there exists a homomorphism h^{\prime}:P\to{\cal{I}}_{n}^{i-k} that uses strictly fewer links and satisfies

N^{{\cal{I}}_{n}}_{i-k}(h(x))\simeq N^{{\cal{I}}_{n}}_{i-k}(h^{\prime}(x))

for all x\in\textit{var}(P). Let us prove the claim.

Let (d,e) be a link used by h: an s-edge in h(P)\subseteq{\cal{I}}_{n}^{i} that is not an s-edge in {\cal{I}}_{n}. Then (d,e) is an i-link in {\cal{I}}_{n}. By symmetry it suffices to consider the case when d has an s-successor e^{\prime} in {\cal{I}}_{n} such that N^{{\cal{I}}_{n}}_{i}(e)\simeq N^{{\cal{I}}_{n}}_{i}(e^{\prime}). Let

g:N^{{\cal{I}}_{n}}_{i}(e)\to N^{{\cal{I}}_{n}}_{i}(e^{\prime})

be the witnessing isomorphism. Because g is identity over \mathsf{Nom}({\cal{K}})\subseteq\mathsf{Ind}({\cal{K}}), we have e\notin\mathsf{Nom}({\cal{K}}); indeed, otherwise e^{\prime}=g(e)=e and (d,e) would be an s-edge in {\cal{I}}_{n}. Let E be the connected component of e in

h(P)\cap({\cal{I}}_{n}\setminus\mathsf{Nom}({\cal{K}}))\,,

where by {\cal{J}}^{\prime}\cap{\cal{J}}^{\prime\prime} we mean the interpretation {\cal{J}} such that \Delta^{\cal{J}}=\Delta^{{\cal{J}}^{\prime}}\cap\Delta^{{\cal{J}}^{\prime\prime}}, A^{\cal{J}}=A^{{\cal{J}}^{\prime}}\cap A^{{\cal{J}}^{\prime\prime}} for all concept names A, and r^{\cal{J}}=r^{{\cal{J}}^{\prime}}\cap r^{{\cal{J}}^{\prime\prime}} for all role names r. Because h(P) has at most k edges and (d,e) is an s-edge in h(P) but not in E, there are at most k-1 edges in E. We shall bring E close to d in {\cal{I}}_{n} by pulling it back by the i-link (d,e).

As E is a connected subinterpretation of {\cal{I}}_{n}\setminus\mathsf{Nom}({\cal{K}}) and has at most k-1 edges, each element of E lies within distance k-1 from e. In particular, E\subseteq N^{{\cal{I}}_{n}}_{i}(e). Hence, E is contained in the domain of g and we can define

h^{\prime}:P\rightarrow{\cal{I}}_{n}^{i-k}

as follows. For each x\in\textit{var}(P), let h^{\prime}(x)=g(h(x)) if h(x)\in E, and h^{\prime}(x)=h(x) otherwise. The additional claim of the theorem follows immediately because g preserves (i-k)-neighbourhoods of elements within distance k from e. We only need to verify that h^{\prime} is indeed a homomorphism and that it uses fewer links than h.

Let r(x,y) be an atom of the query P. There are three cases to consider. First, suppose that h(x),h(y)\notin E. Then

(h^{\prime}(x),h^{\prime}(y))=(h(x),h(y))\,.

We have that (h(x),h(y)) is an r-edge in {\cal{I}}_{n}^{i-k} because h is a homomorphism into {\cal{I}}_{n}^{i}\subseteq{\cal{I}}_{n}^{i-k}. Obviously, h^{\prime} uses no new links for such atoms.

Next, suppose that h(x),h(y)\in E. Then

(h^{\prime}(x),h^{\prime}(y))=(g(h(x)),g(h(y)))\,.

Moreover, (h(x),h(y)) is an r-edge in {\cal{I}}_{n}^{i} because h is a homomorphism. Suppose it is a link along r. Then, h(x) has an r-successor in {\cal{I}}_{n} with the same colour as h(y), or h(y) has an r-predecessor in {\cal{I}}_{n} with the same colour as h(x). Because both h(x) and h(y) lie within distance k-1 from e, this successor or predecessor belongs to N^{{\cal{I}}_{n}}_{i}(e), along with h(x) and h(y). But this is impossible because all elements of N^{{\cal{I}}_{n}}_{i}(e) have different colours. Hence, (h(x),h(y)) is an r-edge in N^{{\cal{I}}_{n}}_{i}(e) and (g(h(x)),g(h(y))) is an r-edge in N^{{\cal{I}}_{n}}_{i}(e^{\prime}). That is, (g(h(x)),g(h(y))) is an r-edge in {\cal{I}}_{n}^{i-k}, and is not a link along r.

Finally, suppose that h(x)\notin E and h(y)\in E (the symmetric case is analogous). Because h is a homomorphism, (h(x),h(y)) is an r-edge in {\cal{I}}_{n}^{i}. Now there are two subcases. Assume first that (h(x),h(y)) is also an r-edge in {\cal{I}}_{n}. By the definition of E it is not an r-edge in {\cal{I}}_{n}\setminus\mathsf{Nom}({\cal{K}}), so it must be an r-edge between a nominal and an element of E. As such, it is also an r-edge in N^{{\cal{I}}_{n}}_{i}(e). Consequently,

(h^{\prime}(x),h^{\prime}(y))=(h(x),g(h(y)))=(g(h(x)),g(h(y)))

is an r-edge in N^{{\cal{I}}_{n}}_{i}(e^{\prime}) and we conclude as before.

Assume now that (h(x),h(y)) is an i-link along r. We need to check that (h(x),g(h(y))) is an r-edge in {\cal{I}}_{n}^{i-k}. Since h(y) and g(h(y)) are at distance at most k-1 from e and e^{\prime}, respectively, and N^{{\cal{I}}_{n}}_{i}(e)\simeq N^{{\cal{I}}_{n}}_{i}(e^{\prime}), it follows that

N^{{\cal{I}}_{n}}_{i-k}(h(y))\simeq N^{{\cal{I}}_{n}}_{i-k}(g(h(y)))\,.

Because (h(x),h(y)) is an i-link, it is also an (i-k)-link. If h(x) has an r-successor f in {\cal{I}}_{n} such that

N^{{\cal{I}}_{n}}_{i-k}(f)\simeq N^{{\cal{I}}_{n}}_{i-k}(h(y))\simeq N^{{\cal{I}}_{n}}_{i-k}(g(h(y)))\,,

then (h(x),g(h(y))) is an (i-k)-link along r, unless the successor f is g(h(y)) itself; in either case (h(x),g(h(y))) is an r-edge in {\cal{I}}_{n}^{i-k}. The remaining possibility is that h(y) has an r-predecessor f in {\cal{I}}_{n} such that

N^{{\cal{I}}_{n}}_{i-k}(f)\simeq N^{{\cal{I}}_{n}}_{i-k}(h(x))\,.

Because h(y) lies within distance k-1 from e,

N^{{\cal{I}}_{n}}_{i-k}(f)\subseteq N^{{\cal{I}}_{n}}_{i}(e)\,.

Hence, g(f) is an r-predecessor of g(h(y)) such that

N^{{\cal{I}}_{n}}_{i-k}(g(f))\simeq N^{{\cal{I}}_{n}}_{i-k}(h(x))\,.

Consequently, (h(x),g(h(y))) is an (i-k)-link along r, unless g(f) is h(x) itself; in either case (h(x),g(h(y))) is an r-edge in {\cal{I}}_{n}^{i-k}.

Thus h^{\prime} is a homomorphism and uses links only for the atoms of P for which h uses links. To see that h^{\prime} uses strictly fewer links than h, recall that instead of the i-link (d,e) along s, it uses the s-edge (d,e^{\prime}), which is not a link. ∎
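The iteration of the claim in the proof above can be unfolded as a short induction. The following worked derivation tracks the indices; the names h_j and m (the number of applications actually needed) are introduced here for illustration only.

```latex
\begin{align*}
& h_0 = h : P \to \mathcal{I}_n^{n}, \qquad
  h_j : P \to \mathcal{I}_n^{\,n-jk} \quad (1 \le j \le m),\\
& h_j \text{ uses at most } \ell - j \text{ links}, \qquad m \le \ell \le k,\\
& n-(j-1)k \;\ge\; k \quad\text{since } n \ge k^{2} \ge jk
  \quad\text{(so the claim applies at step } j\text{)},\\
& N^{\mathcal{I}_n}_{n-jk}(h_{j-1}(x)) \simeq N^{\mathcal{I}_n}_{n-jk}(h_{j}(x))
  \quad\text{for all } x \in \mathit{var}(P),\\
& n-jk \;\ge\; n-k^{2} \;\Longrightarrow\;
  N^{\mathcal{I}_n}_{n-k^{2}}(h(x)) \simeq N^{\mathcal{I}_n}_{n-k^{2}}(h_m(x)).
\end{align*}
```

The last line uses that an isomorphism between larger neighbourhoods restricts to an isomorphism between smaller neighbourhoods around the same centres, and that isomorphisms compose. Since h_m uses no links, every edge it uses is already present in {\cal{I}}_{n}, so h^{\prime}=h_m is the homomorphism required by Theorem 1.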

4 {\cal{S\hskip-0.86ptO\hskip-0.258ptI}} and {\cal{S\hskip-0.86ptO\hskip-1.032ptF}}

The goal of this section is to prove the following theorem.

Theorem 2.

The finite query entailment problem for both {\cal{S\hskip-0.86ptO\hskip-0.258ptI}} and {\cal{S\hskip-0.86ptO\hskip-1.032ptF}} is 2ExpTime-complete.

The lower bounds follow immediately from the results on unrestricted query entailment for {\cal{ALC\hskip-1.075ptO}} [19] and {\cal ALCI} [17], and Corollary 1; the challenge is to prove the upper bounds. We develop our argument with {\cal{S\hskip-0.86ptO\hskip-0.258ptI}} in mind, but it adapts easily to {\cal{S\hskip-0.86ptO\hskip-1.032ptF}} (see appendix).

Let us fix a {\cal{S\hskip-0.86ptO\hskip-0.258ptI}} knowledge base {\cal{K}} and a union of conjunctive queries Q. As for {\cal{ALC\hskip-1.075ptO\hskip-0.258ptI}}, we can assume that {\cal{K}}’s ABox contains no role assertions.

Because {\cal{K}} is normalised, complete information about restrictions on the types of neighbours of a node is encoded in its unary type. Now, we would like the unary type to determine also the neighbouring nominals. This can be assumed without loss of generality, because one can always extend {\cal{K}} by adding for each a\in\mathsf{Nom}({\cal{K}}) and r\in\mathsf{Rol}({\cal{K}}) fresh concept names A_{r,a}, A_{r^{-},a} axiomatised with A_{r,a}\equiv\exists r.\{a\}, \{a\}\equiv\forall r.A_{r^{-},a}, and normalise the resulting KB.

Let {\cal{I}}^{*} be the interpretation obtained from interpretation {\cal{I}} by closing transitively the interpretation of each transitive role. Note that each existential restriction satisfied in {\cal{I}} is also satisfied in {\cal{I}}^{*}. The same holds for quantifier-free CIs, and for universal restrictions involving non-transitive roles. For universal restrictions involving transitive roles, we ensure this property by adding a fresh concept name B^{\prime} for each B\in\mathsf{CN}({\cal{K}}) and CIs A\sqsubseteq\forall r.B^{\prime}, B^{\prime}\sqsubseteq\forall r.B^{\prime}, B^{\prime}\sqsubseteq B for each CI of the form A\sqsubseteq\forall r.B with r transitive.
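The effect of the transitive closure on universal restrictions can be observed on a small example. The sketch below assumes the dictionary encoding of interpretations used for illustration (keys "domain", "concepts", "roles"); the function names are hypothetical, and the closure is computed by a simple fixpoint iteration rather than an optimised algorithm.

```python
def transitive_closure(rel):
    """Least transitive relation containing rel (naive fixpoint)."""
    closure = set(rel)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def close_transitive_roles(interp, transitive_roles):
    """Build the starred interpretation: close only the transitive roles."""
    roles = {name: (transitive_closure(rel) if name in transitive_roles
                    else set(rel))
             for name, rel in interp["roles"].items()}
    return {"domain": set(interp["domain"]),
            "concepts": {a: set(ext) for a, ext in interp["concepts"].items()},
            "roles": roles}


def satisfies_forall(interp, a, r, b):
    """Check the CI stating that every r-successor of an a-element is in b."""
    return all(e in interp["concepts"].get(b, set())
               for d, e in interp["roles"].get(r, set())
               if d in interp["concepts"].get(a, set()))


interp = {"domain": {1, 2, 3, 4},
          "concepts": {"A": {1}, "B": {2}},
          "roles": {"r": {(1, 2), (2, 3), (3, 4)}, "s": {(1, 2), (2, 3)}}}
starred = close_transitive_roles(interp, {"r"})
print(satisfies_forall(interp, "A", "r", "B"),
      satisfies_forall(starred, "A", "r", "B"))
```

In the example the universal restriction holds before the closure and fails afterwards, because element 1 acquires the new r-successors 3 and 4. This is the situation that the auxiliary concept names B^{\prime} are designed to rule out.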

The last assumption we would like to make about {\cal{K}} is that the unary type of each element of \mathsf{Nom}({\cal{K}}) is fully specified in the ABox; that is, for all a\in\mathsf{Nom}({\cal{K}}) and A\in\mathsf{CN}({\cal{K}}), the ABox contains either A(a) or \bar{A}(a). This can be done without loss of generality, because {\cal{K}}\models_{\mathsf{fin}}Q iff {\cal{K}}^{\prime}\models_{\mathsf{fin}}Q for each {\cal{K}}^{\prime} that can be obtained from {\cal{K}} by completing assertions about nominals. This adds the factor 2^{|\mathsf{Nom}({\cal{K}})|\cdot|\mathsf{CN}({\cal{K}})|} to the running time of the decision procedure, but the overall complexity bound is not affected, because it is exponential in the size of {\cal{K}} anyway.

Building on the results of the previous section, we show that the existence of a finite counter-model for Q is equivalent to the existence of a possibly infinite counter-model of a special form, which generalises tree-shaped models. The special form is based on the notion of clique-forests.

Definition 5.

A clique-forest for an interpretation {\cal{I}} of {\cal{K}} is a forest (a sequence of trees) whose each node v is labelled with a subinterpretation {\cal{I}}_{v} of {\cal{I}}\setminus\mathsf{Nom}({\cal{K}}) such that

  • the sets \Delta^{{\cal{I}}_{v}} are a partition of \Delta^{{\cal{I}}\setminus\mathsf{Nom}({\cal{K}})};

  • each {\cal{I}}_{v} is either a single element with all roles empty (element node) or a clique over some transitive role with all other roles empty and no repetitions of unary types (clique node);

  • apart from edges within cliques, in {\cal{I}}\setminus\mathsf{Nom}({\cal{K}}) there is exactly one edge between \Delta^{{\cal{I}}_{u}} and \Delta^{{\cal{I}}_{v}} for every two adjacent nodes u and v: assuming u is the parent of v, it is an r-edge from an element of \Delta^{{\cal{I}}_{u}} to an element of \Delta^{{\cal{I}}_{v}} for some r\in\mathsf{Rol}({\cal{K}}).

Definition 6.

An interpretation {\cal{I}} of {\cal{K}} is a {\cal{S\hskip-0.86ptO\hskip-0.258ptI}}-forest if it admits a clique-forest that consists of at most |{\cal{K}}|^{2} trees of branching at most |{\cal{K}}|^{2}, such that each element of \mathsf{Ind}({\cal{K}})\setminus\mathsf{Nom}({\cal{K}}) occurs in some root.

Let {\cal{K}}^{*} denote the KB obtained from {\cal{K}} by dropping transitivity declarations.

Definition 7.

A counter-example for Q is a {\cal{S\hskip-0.86ptO\hskip-0.258ptI}}-forest {\cal{I}} such that {\cal{I}}\models{\cal{K}}^{*} and {\cal{I}}^{*}\not\models Q.

If {\cal{I}} is a counter-example for Q, thanks to the initial preprocessing, {\cal{I}}^{*} is a counter-model for Q. One could also show that if there is a counter-model for Q, then there is a counter-example for Q. But we are interested in finite counter-models and for that we need an additional condition. Recall that a path is simple if it does not revisit elements.

Definition 8.

An interpretation {\cal{I}} is safe if it does not contain an infinite simple r-path for any transitive role r.

The whole argument now splits into two parts: the equivalence between the existence of a finite counter-model and the existence of a safe counter-example, and the effective regularity of the set of clique-forests of safe counter-examples. Together they show that finite query entailment can be solved by testing emptiness of an appropriate doubly-exponential automaton (with Büchi acceptance condition), which can be done in polynomial time. We begin with the second part, as it is needed to prove the first one.

Theorem 3.

Given a union Q of CQs, each of size at most m, one can compute (in time polynomial in the size of the output) an automaton of size 2^{|Q|\cdot|{\cal{K}}|^{{\cal{O}}(m)}} that recognises clique-forests of safe counter-examples for Q.

The proof of Theorem 3 is a routine automata construction (detailed in the appendix). Let us focus on the first part of the argument.

Theorem 4.

Q has a finite counter-model iff Q has a safe counter-example.

Suppose first that there exists a finite counter-model {\cal{M}} for Q. We build a {\cal{S\hskip-0.86ptO\hskip-0.258ptI}}-forest {\cal{I}} out of it using a version of the standard unravelling. We begin by taking copies of all elements of \mathsf{Ind}({\cal{K}}) with unary types copied accordingly. Then, recursively, for each added element d^{\prime} and each CI A\sqsubseteq\exists r.B that is not yet satisfied for d^{\prime} in {\cal{I}} proceed as follows. The element d^{\prime} is a copy of some d from {\cal{M}} of the same unary type. Therefore there exists an element e in {\cal{M}} witnessing the CI. If e\in\mathsf{Nom}({\cal{K}}), then it is already included in {\cal{I}}, and we just add an r-edge from d^{\prime} to e. Assume e\notin\mathsf{Nom}({\cal{K}}). If r is not a transitive role, we just add a copy of e as an r-successor of d^{\prime}. Assume that r is a transitive role. Let X be the strongly connected component of r that contains e and let X_{0} be a minimal set that contains at least one element from each nonempty C^{\cal{M}}\cap\big{(}X\setminus\mathsf{Nom}({\cal{K}})\big{)}, where C ranges over \mathsf{CN}({\cal{K}}). By minimality, |X_{0}|\leq|{\cal{K}}|. We add to {\cal{I}} an r-clique over a copy of X_{0}, with an r-edge from d^{\prime} to the copy of some element f\in B^{\cal{M}}\cap X_{0}; f exists because e\in B^{\cal{M}}\cap\big{(}X\setminus\mathsf{Nom}({\cal{K}})\big{)}. Note that no other edges among newly added elements are present: existential restrictions for these nodes will be witnessed in the following steps of the construction. Let {\cal{I}} be the interpretation obtained in the limit. By construction, {\cal{I}} admits a clique-forest. For each element at most one successor per CI is added. Because each clique node contains up to |{\cal{K}}| elements, the branching of the clique-forest is bounded by |{\cal{K}}|^{2}. The same bound holds for the number of trees in the clique-forest: we begin from |\mathsf{Ind}({\cal{K}})| nodes, but then the ones corresponding to elements of \mathsf{Nom}({\cal{K}}) are removed and their children become roots. Hence, {\cal{I}} is a {\cal{S\hskip-0.86ptO\hskip-0.258ptI}}-forest. Because we do not unravel cliques in transitive roles, it is safe.

Lemma 2.

{\cal{I}} is a safe counter-example for Q.

Assume now that there exists a safe counter-example {\cal{I}} for Q. By Theorem 3, the set of clique-forests of safe counter-examples for Q can be recognised by an automaton. It is well known that the automaton then accepts a regular forest, which has only finitely many non-isomorphic subtrees. Hence, without loss of generality we can assume that the clique-forest of {\cal{I}} has p non-isomorphic subtrees for some p. Using the methodology from the previous section we shall turn {\cal{I}} into a finite counter-model for Q. The main obstacle is that Q uses transitive roles, which are not fully represented in {\cal{I}}. Our solution is to replace Q with a different query that can be evaluated directly over {\cal{I}}. This is done by exploiting a bound on the length of simple r-paths for transitive roles r, guaranteed by the regularity of the clique-forest of {\cal{I}}.

Definition 9.

An interpretation is \ell-bounded if for each transitive role r, each simple r-path has length at most \ell.
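For a finite interpretation, \ell-boundedness can be checked by brute force, as in the sketch below. It assumes that the length of a path is its number of edges (as in the proof of Theorem 5, where a path of length \ell+1 is read as a query with \ell+1 atoms) and that each role is given as a set of pairs; the function names are hypothetical. The search takes exponential time in the worst case and is meant only to make the definition concrete.

```python
def longest_simple_path(edges):
    """Number of edges on a longest simple path of a finite directed graph."""
    successors = {}
    for a, b in edges:
        successors.setdefault(a, set()).add(b)
    nodes = {x for edge in edges for x in edge}

    def extend(node, visited):
        # Longest simple continuation from `node` that avoids `visited`.
        best = 0
        for nxt in successors.get(node, ()):
            if nxt not in visited:
                best = max(best, 1 + extend(nxt, visited | {nxt}))
        return best

    return max((extend(v, {v}) for v in nodes), default=0)


def is_l_bounded(roles, transitive, bound):
    """Check that no transitive role admits a simple path longer than `bound`."""
    return all(
        longest_simple_path(roles.get(r, set())) <= bound for r in transitive
    )
```

For instance, an r-clique over three elements admits simple r-paths with two edges but none with three, so it is 2-bounded and not 1-bounded.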

Lemma 3.

{\cal{I}}\setminus\mathsf{Nom}({\cal{K}}) is \ell-bounded for \ell=2p\cdot|{\cal{K}}|.

Proof.

Let r be a transitive role in {\cal{K}}. Each r-path going down the clique-forest of {\cal{I}} contains at most p nodes. Indeed, if there were a longer r-path, then a subtree would occur twice on that path, which immediately leads to an infinite simple r-path in {\cal{I}}\setminus\mathsf{Nom}({\cal{K}}), contradicting the safety of {\cal{I}}. Each simple path in the clique-forest can be split into an r-path going up and an r-path going down. Each of them has at most p nodes. Because each node contains at most |{\cal{K}}| elements, it follows that each simple r-path in {\cal{I}}\setminus\mathsf{Nom}({\cal{K}}) has length at most 2p\cdot|{\cal{K}}|. ∎

Lemma 4.

For each {\cal{J}}, if {\cal{J}}\setminus\mathsf{Nom}({\cal{K}}) is \ell-bounded, then {\cal{J}} is \ell^{*}-bounded for \ell^{*}=(\ell+2)\cdot(|\mathsf{Nom}({\cal{K}})|+1).

Let Q^{*} be obtained from Q by replacing each transitive atom s(x,y) by the disjunction

\bigvee_{i\leq\ell^{*}}s^{i}(x,y)\,,

where s^{i}(x,y) is the conjunctive query expressing the i-fold composition of s. Assuming that each disjunct of Q contains at most k binary atoms, Q^{*} can be rewritten as a union of conjunctive queries, each using at most k\cdot\ell^{*} binary atoms.
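The rewriting of Q into Q^{*} can be sketched for a single CQ as follows. The encoding is an assumption: atoms are tuples, with (A, x) for a unary atom and (s, x, y) for a binary one, the index i is taken to range over 1,\dots,\ell^{*}, and the names compose and expand_cq are hypothetical.

```python
from itertools import count, product


def compose(role, x, y, i, fresh):
    """Atoms of s^i(x, y): a chain of i role-edges from x to y via i-1 fresh variables."""
    chain = [x] + [next(fresh) for _ in range(i - 1)] + [y]
    return [(role, chain[j], chain[j + 1]) for j in range(i)]


def expand_cq(cq, transitive, bound):
    """Rewrite one CQ into the list of CQs obtained by choosing, for every
    transitive atom, a composition length between 1 and `bound`."""
    fresh = (f"_z{n}" for n in count())
    is_trans = lambda atom: len(atom) == 3 and atom[0] in transitive
    trans_atoms = [a for a in cq if is_trans(a)]
    other_atoms = [a for a in cq if not is_trans(a)]
    disjuncts = []
    for lengths in product(range(1, bound + 1), repeat=len(trans_atoms)):
        disjunct = list(other_atoms)
        for (role, x, y), i in zip(trans_atoms, lengths):
            disjunct += compose(role, x, y, i, fresh)
        disjuncts.append(disjunct)
    return disjuncts
```

A CQ with k binary atoms yields disjuncts with at most k\cdot\ell^{*} binary atoms each, since every transitive atom is replaced by at most \ell^{*} atoms and every other binary atom is kept as it is.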

Lemma 5.

For all \ell^{*}-bounded {\cal{J}}, {\cal{J}}^{*}\models Q iff {\cal{J}}\models Q^{*}.

By Lemmas 3-5, we conclude that {\cal{I}}\not\models Q^{*}. Now we can use the blocking principle. Because clique nodes have at most |{\cal{K}}| elements and each node has at most |{\cal{K}}|^{2} children, {\cal{I}}\setminus\mathsf{Nom}({\cal{K}}) has bounded degree and we can consider the n-properly coloured {\cal{I}}_{n}, for any n. On each branch \pi in {\cal{I}}_{n}\setminus\mathsf{Nom}({\cal{K}}), let D_{\pi} be the first node for which some earlier node E_{\pi} satisfies N^{{\cal{I}}_{n}}_{n}(d_{\pi})\simeq N^{{\cal{I}}_{n}}_{n}(e_{\pi}), where d_{\pi}\in D_{\pi} and e_{\pi}\in E_{\pi} are the endpoints of the edges connecting D_{\pi} and E_{\pi} to their parent nodes. The new interpretation {\cal{F}}_{n} is obtained as usual: we include the branch \pi up to the predecessor of node D_{\pi} and the edge originally leading to d_{\pi} is redirected to e_{\pi}; edges connecting the elements of \mathsf{Nom}({\cal{K}}) with each other and with the elements of the included parts of the branches are copied from {\cal{I}}_{n}.

Because we started from {\cal{I}}\models{\cal{K}}^{*}, it is routine to check that {\cal{F}}_{n}\models{\cal{K}}^{*} for all n. By the initial preprocessing, ({\cal{F}}_{n})^{*}\models{\cal{K}}. Let us fix

n=\max((k\cdot\ell^{*})^{2},(\ell+1)^{2}+\ell)\,.

By Theorem 1, {\cal{F}}_{n}\not\models Q^{*}. We conclude ({\cal{F}}_{n})^{*}\not\models Q using Lemmas 4-5 and Theorem 5 below.

Definition 10.

A link (d,e) in {\cal{I}} along r is external if either no r-path from the witnessing e^{\prime} to d is disjoint from \mathsf{Nom}({\cal{K}}) or dually no r-path from e to the witnessing d^{\prime} is disjoint from \mathsf{Nom}({\cal{K}}).

By construction, all links in {\cal{I}}_{n} along transitive roles included into {\cal{F}}_{n} are external.

Theorem 5.

Assume that {\cal{I}}\setminus\mathsf{Nom}({\cal{K}}) has bounded degree and is \ell-bounded. Let n>(\ell+1)^{2}+\ell and let {\cal{J}} be a subinterpretation of {\cal{I}}_{n}^{n} in which all links along transitive roles are external. Then, {\cal{J}}\setminus\mathsf{Nom}({\cal{K}}) is also \ell-bounded.

Proof.

Suppose there is a simple s-path \pi in {\cal{J}}\setminus\mathsf{Nom}({\cal{K}}) of length \ell+1, for some transitive role s. We can view \pi as a conjunctive query with \ell+1 s-atoms. By applying Theorem 1 to \pi we lift the inclusion homomorphism \pi\subseteq{\cal{J}}\subseteq{\cal{I}}_{n}^{n} to a homomorphism h:\pi\to{\cal{I}}_{n} that preserves \ell-neighbourhoods. Because \pi is disjoint from \mathsf{Nom}({\cal{K}}), so is its image. Hence, we can view h as a homomorphism

h:\pi\to{\cal{I}}_{n}\setminus\mathsf{Nom}({\cal{K}})\,.

Because {\cal{I}}_{n}\setminus\mathsf{Nom}({\cal{K}}) is \ell-bounded, it suffices to show that h is injective to obtain a contradiction.

Observe first that h is injective over segments of \pi that do not contain links. Indeed, because {\cal{I}}_{n} is n-properly coloured and n\geq|\pi|, in each such segment all elements have different colours. Hence, it suffices to show that the images of the segments are disjoint. Suppose the images of some two different segments overlap on an element from a strongly connected component X of s in {\cal{I}}_{n}\setminus\mathsf{Nom}({\cal{K}}). Hence, all segments between these two are entirely mapped to X. In particular, there exists an n-link (d,e) along s such that h(d)\in X and h(e)\in X. We claim this is impossible.

By symmetry we can assume that d has an s-successor e^{\prime} such that no s-path from e^{\prime} to d is disjoint from \mathsf{Nom}({\cal{K}}) and N^{{\cal{I}}_{n}}_{n}(e^{\prime})\simeq N^{{\cal{I}}_{n}}_{n}(e). In particular, e^{\prime} and e have the same colour. Because n>1, we have e^{\prime}\in N^{{\cal{I}}_{n}}_{n}(d). We obtain a contradiction by finding another element in N^{{\cal{I}}_{n}}_{n}(d) of the same colour as e.

Let D be the strongly connected component of s in {\cal{I}}_{n}\setminus\mathsf{Nom}({\cal{K}}) that contains d. Because {\cal{I}}_{n}\setminus\mathsf{Nom}({\cal{K}}) is \ell-bounded, all elements of D are within distance \ell<n from d. Consequently, D is isomorphic to X, because h preserves \ell-neighbourhoods. Hence, there exists an element e^{\prime\prime}\in D\subseteq N^{{\cal{I}}_{n}}_{n}(d) of the same colour as e. Because e^{\prime}\notin D, we have e^{\prime}\neq e^{\prime\prime}, as required for the contradiction. ∎

5 {\cal{S\hskip-0.258ptI\hskip-0.43ptF}}

For {\cal{ALCI\hskip-0.43ptF}}, a tight upper bound on the complexity of finite query entailment can be obtained by revisiting some known and implicitly proven results on the guarded fragment with two variables and counting [22, 21]. We consider a slightly more general problem of finite entailment modulo types, which will be useful later. For a KB {\cal{K}}, a query Q, and a set of unary types T\subseteq\mathsf{Tp}({\cal{K}}) we write {\cal{K}}\models_{\mathsf{fin}}^{T}Q if for each interpretation {\cal{I}} that only realises types from T, if {\cal{I}}\models{\cal{K}} then {\cal{I}}\models Q. This problem reduces to finite query entailment by including into Q one CQ for each type not listed in T, but this makes Q exponential in the size of \mathsf{CN}({\cal{K}}) and leads to a worse complexity upper bound.

Theorem 6.

Given an {\cal{ALCI\hskip-0.43ptF}} KB {\cal{K}}, a union Q of CQs, each of size at most m, and a set T\subseteq\mathsf{Tp}({\cal{K}}), one can decide whether {\cal{K}}\models_{\mathsf{fin}}^{T}Q in time 2^{{\cal{O}}(|{\cal{K}}|+|Q|\cdot m^{m})}.

Corollary 2.

The finite query entailment problem for {\cal{ALCI\hskip-0.43ptF}} is 2ExpTime-complete.

Relying on Theorem 6 and our previous results for {\cal{S\hskip-0.86ptO\hskip-0.258ptI}}, we extend the upper bound of Corollary 2 to {\cal{S\hskip-0.258ptI\hskip-0.43ptF}}.

Let us fix a UCQ Q and a {\cal{S\hskip-0.258ptI\hskip-0.43ptF}} KB {\cal{K}}. We work again with counter-models of a special shape, this time based on tree partitions. We assume a proviso that the ABox of {\cal{K}} does not contain transitive and non-transitive roles simultaneously; we lift it by the end of the section.

Definition 11.

A tree partition of an interpretation {\cal{I}} is a tree T whose each node v is labelled with a finite subinterpretation {\cal{I}}_{v} of {\cal{I}}, called a bag, such that \bigcup_{v\in T}{\cal{I}}_{v}={\cal{I}} and for each element some bag containing it is the parent of all other bags containing it. The maximal bag size is called the width of T.

Definition 12.

An interpretation {\cal{I}} is a {\cal{S\hskip-0.258ptI\hskip-0.43ptF}}-tree if it admits a tree partition such that

  • the root bag contains \mathsf{Ind}({\cal{K}}),

  • each bag contains edges in transitive roles only (tr bag) or in non-transitive roles only (nt bag),

  • each element is in exactly two bags, one tr and one nt,

  • each two adjacent bags share exactly one element.

Lemma 6.

There exists a finite counter-model for Q iff there exists a {\cal{S\hskip-0.258ptI\hskip-0.43ptF}}-tree counter-model for Q of finite width.

Proof.

Let {\cal{F}} be a finite counter-model for Q. We turn it into a {\cal{S\hskip-0.258ptI\hskip-0.43ptF}}-tree counter-model {\cal{I}} using a very simple unravelling procedure. For each \mu\in\{\textsc{tr},\textsc{nt}\}, let {\cal{F}}_{\mu} be the interpretation obtained from {\cal{F}} by restricting the set of roles to \mu roles. By the proviso, the ABox of {\cal{K}} contains only \mu_{0} roles for some \mu_{0}\in\{\textsc{tr},\textsc{nt}\}. We construct the {\cal{S\hskip-0.258ptI\hskip-0.43ptF}}-tree top down. In the root we put {\cal{F}}_{\mu_{0}} itself. Then, iteratively, for each element d that belongs only to a \mu bag we add a child bag obtained by taking an isomorphic copy of {\cal{F}}_{\nu} for \nu\neq\mu, in which all elements except d are replaced with their fresh copies; in particular, each individual different from d is replaced with an ordinary anonymous element of the same unary type. It is routine to verify that the resulting interpretation {\cal{I}} is a model of {\cal{K}}. The natural homomorphism from {\cal{I}} to {\cal{F}} ensures that {\cal{I}}\not\models Q. The width of {\cal{I}} is |{\cal{F}}|.

Let us now take a {\cal{S\hskip-0.258ptI\hskip-0.43ptF}}-tree {\cal{I}} of width \ell that is a counter-model for Q. We use the methodology developed for {\cal{S\hskip-0.86ptO\hskip-0.258ptI}} to turn {\cal{I}} into a finite counter-model. Because |{\cal{I}}_{v}|\leq\ell, {\cal{I}} has degree at most 2\cdot\ell\cdot|{\cal{K}}|. Because each r-path for any transitive role r is contained within a single tr bag, it follows that {\cal{I}} is (\ell-1)-bounded.

For the purpose of the coloured blocking principle, we need to ensure that each infinite branch of the tree partition of our interpretation contains infinitely many tr bags that consist of a single edge (pointing up or down the tree). We achieve this by performing an additional unravelling of {\cal{I}}. We start with a copy of the root bag in the tree partition of {\cal{I}}, where elements of \mathsf{Ind}({\cal{K}}) are preserved and other elements are replaced with their fresh copies. Let d^{\prime} be an element in the interpretation under construction that so far belongs to only one bag X^{\prime}. By construction, d^{\prime} is a copy of some element d of {\cal{I}}. If X^{\prime} is a tr bag, add a copy of the nt bag that contains d, with d replaced with d^{\prime} and other elements replaced with their fresh copies. Assume that X^{\prime} is an nt bag. For each tr role r and each r-successor e of d, add three new bags. First, add a bag consisting of d^{\prime}, a fresh copy e^{\prime} of e, and an r-edge from d^{\prime} to e^{\prime}. Then, for each \mu\in\{\textsc{tr},\textsc{nt}\}, add a copy of the \mu-bag containing e, with e replaced with e^{\prime} and all other elements replaced with their fresh copies (different for each \mu).

Let {\cal{J}} be the interpretation obtained in the limit. Because in the tree partition of {\cal{I}} tr bags and nt bags alternate, in the tree partition of {\cal{J}} nt bags have only new single-edge tr bag children, new single-edge tr bags have one nt bag child and one tr bag child, and copies of original tr bags have only nt bag children. Consequently, on each infinite branch, there are infinitely many single-edge tr bags.

Interpretations of transitive roles in {\cal{J}} need not be transitive relations, but it is straightforward to check that {\cal{J}} is a model of {\cal{K}}^{*}; in particular, functionality declarations were not affected because the new single-edge bags involve only tr roles (non-functional). Moreover, {\cal{J}}^{*}\not\models Q because {\cal{J}} maps homomorphically to {\cal{I}} and, consequently, so does {\cal{J}}^{*}. The degree in {\cal{J}} is bounded by 2\cdot\ell\cdot|{\cal{K}}|+1, because each element belongs to one tr bag and one nt bag of size at most \ell, and possibly one single-edge bag. Finally, {\cal{J}} is 2\ell-bounded because in the worst case a simple r-path for any transitive role r goes first through a bag with at most \ell elements, then two single-edge bags, and then another bag with at most \ell elements.

We can now apply the coloured blocking principle. Suppose each disjunct of Q uses at most k binary atoms. Let \ell^{*}=2\ell and let Q^{*} be obtained from Q by replacing each transitive role atom S(x,y) by the disjunction

\bigvee_{i\leq\ell^{*}}S^{i}(x,y)\,,

and rewriting the resulting query as a UCQ. Each CQ in Q^{*} has at most k\cdot\ell^{*} binary atoms. Because {\cal{J}} has bounded degree, we can consider its n-proper colouring {\cal{J}}_{n} for any n. On each branch \pi of the tree partition of {\cal{J}}_{n}, let D_{\pi} be the first single-edge tr bag for which some earlier single-edge tr bag E_{\pi} satisfies N^{{\cal{J}}_{n}}_{n}(d_{\pi})\simeq N^{{\cal{J}}_{n}}_{n}(e_{\pi}), where d_{\pi}\in D_{\pi} and e_{\pi}\in E_{\pi} are the elements that D_{\pi} and E_{\pi} share with their respective parents. The new structure {\cal{F}}_{n} is obtained like before: we include the branch \pi up to the predecessor of node D_{\pi} and the edge in D_{\pi} is redirected to the successor of e_{\pi} in E_{\pi}. Because {\cal{J}} is a model of {\cal{K}}^{*} and we only redirected edges in non-functional roles, it follows that {\cal{F}}_{n} is a model of {\cal{K}}^{*}. Consequently, {\cal{F}}_{n}^{*}\models{\cal{K}}. Let us now fix

n=\max((k\cdot\ell^{*})^{2},(\ell^{*}+1)^{2}+\ell^{*})\,.

By Theorem 1, we get {\cal{F}}_{n}\not\models Q^{*}. Because {\cal{J}} is \ell^{*}-bounded and we clearly used only external links in the construction of {\cal{F}}_{n}, by Lemma 5 and Theorem 5 we obtain {\cal{F}}_{n}^{*}\not\models Q. ∎

Thus, it suffices to consider counter-models that are {\cal{S\hskip-0.258ptI\hskip-0.43ptF}}-trees of finite width, but there is a priori no bound on the width, which hinders direct application of automata. Instead, we show that one can test existence of {\cal{S\hskip-0.258ptI\hskip-0.43ptF}}-tree counter-models without manipulating {\cal{S\hskip-0.258ptI\hskip-0.43ptF}}-trees directly.

Our first step is to adjust the structure of Q’s disjuncts to the structure of {\cal{S\hskip-0.258ptI\hskip-0.43ptF}}-trees. To keep this as simple as possible, we make a second proviso that each CQ constituting Q is connected. We eliminate it towards the end of the section. Let P be one of the CQs constituting Q. It is convenient to think of P as an interpretation with the domain \textit{var}(P) and interpretations of concepts and roles given by the atoms of P. Whenever P is mapped homomorphically into a {\cal{S\hskip-0.258ptI\hskip-0.43ptF}}-tree {\cal{I}}, the image of P is a {\cal{S\hskip-0.258ptI\hskip-0.43ptF}}-tree as well. Indeed, because P is connected, a witnessing tree partition of the image of P is naturally induced by the tree partition of {\cal{I}}. Hence, if Q is a union of n CQs of size at most m, over {\cal{S\hskip-0.258ptI\hskip-0.43ptF}}-trees Q is equivalent to

Q_{1}\cup Q_{2}\cup\dots\cup Q_{p}\,, (*)

where the queries Q_{i} are all non-isomorphic {\cal{S\hskip-0.258ptI\hskip-0.43ptF}}-trees obtained as homomorphic images of the CQs of Q, each using a fresh set of variables, and p\leq n\cdot m^{m}.

Figure 1: Queries Q_{\textsc{tr},x} and Q_{\textsc{nt},x} for x\in\textit{var}(Q_{i}).

For all \mu\in\{\textsc{tr},\textsc{nt}\} and x\in\bigcup_{i}\textit{var}(Q_{i}), let Q_{\mu,x} be the query obtained by taking all bags that are reachable from the \mu bag containing x without visiting the other bag containing x, as illustrated in Figure 1. For all x\in\textit{var}(Q_{i}) it holds that Q_{i}=Q_{\textsc{tr},x}\land Q_{\textsc{nt},x}.

Let {\cal{K}}_{Q} be obtained from {\cal{K}} by extending the TBox as follows: for each \mu\in\{\textsc{tr},\textsc{nt}\} and x\in\bigcup_{i}\textit{var}(Q_{i}), we add a fresh concept name A_{\mu,x} and the complementary concept name \bar{A}_{\mu,x}, together with the usual axiomatisation. The interpretation of A_{\mu,x} is intended to collect elements d such that Q_{\mu,x} can be matched with x mapped to d.

A specialisation \widetilde{Z} of a bag Z of query Q_{i} is obtained by including for each x\in\textit{var}(Z) and each \mu\in\{\textsc{tr},\textsc{nt}\} either the atom A_{\mu,x}(x) or the atom \bar{A}_{\mu,x}(x), where \bar{A}_{\mu,x} is the concept name complementary to A_{\mu,x}. A specialisation \widetilde{Z} of a \mu-bag Z of Q_{i} is consistent if for all x it holds that: \widetilde{Z} contains A_{\mu,x}(x) iff for all y\in\textit{var}(\widetilde{Z})\setminus\{x\}, \widetilde{Z} contains A_{\nu,y}(y) with \nu\neq\mu. An interpretation {\cal{I}} (with the extended set of concept names) is consistent if it does not match inconsistent specialisations of bags of queries Q_{1},Q_{2},\dots,Q_{p}.
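The consistency condition on specialisations can be made concrete with the sketch below. The encoding is an assumption: a specialisation is a dictionary mapping each pair (kind, x), with kind in \{\textsc{tr},\textsc{nt}\}, to True when A_{kind,x}(x) is included and to False when \bar{A}_{kind,x}(x) is included; the function names are hypothetical.

```python
from itertools import product

ROLE_KINDS = ("tr", "nt")


def other(mu):
    """The role kind different from mu."""
    return "nt" if mu == "tr" else "tr"


def specialisations(variables):
    """Enumerate all choices of A_{kind,x}(x) (True) or its complement (False)."""
    keys = [(kind, x) for x in variables for kind in ROLE_KINDS]
    for values in product((True, False), repeat=len(keys)):
        yield dict(zip(keys, values))


def is_consistent(mu, variables, spec):
    """A specialisation of a mu-bag is consistent if, for every x, it contains
    A_{mu,x}(x) exactly when it contains A_{nu,y}(y) for all other variables y."""
    nu = other(mu)
    return all(
        spec[(mu, x)] == all(spec[(nu, y)] for y in variables if y != x)
        for x in variables
    )
```

Under this encoding, the \nu-atoms can be chosen freely and the \mu-atoms are then forced, so a bag with j variables has 2^{2j} specialisations, of which 2^{j} are consistent.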

For a {\cal{S\hskip-0.258ptI\hskip-0.43ptF}} KB {\cal{L}} and \mu\in\{\textsc{tr},\textsc{nt}\} we write {\cal{L}}\!\upharpoonright\!\mu for the KB obtained by dropping all ABox assertions, CIs, and declarations that involve \nu-roles for \nu\neq\mu.

Definition 13.

T\subseteq\mathsf{Tp}({\cal{K}}_{Q}) is a counter-witness for Q if

  • for all x\!\in\!\bigcup_{i}\textit{var}(Q_{i}), each \tau\!\in\!T contains \bar{A}_{\textsc{tr},x} or \bar{A}_{\textsc{nt},x};

  • assuming {\cal{K}} uses only \mu_{0}-roles in the ABox, there exists a consistent finite model of {\cal{K}}_{Q}\!\upharpoonright\!\mu_{0} that realises only types from T; and

  • for all \tau\in T and \mu\in\{\textsc{tr},\textsc{nt}\} there exists a consistent finite model of the TBox of {\cal{K}}_{Q}\!\upharpoonright\!\mu that realises type \tau and realises only types from T.

Lemma 7.

Q admits a {\cal{S\hskip-0.258ptI\hskip-0.43ptF}}-tree counter-model of finite width iff there exists a counter-witness for Q.

Proof.

Let {\cal{I}} be a {\cal{S\hskip-0.258ptI\hskip-0.43ptF}}-tree counter-model for Q; we do not need to assume that {\cal{I}} has finite width. Let {\cal{I}}_{Q} be obtained by extending {\cal{I}} with the unique interpretation of the concept names A_{\mu,x} and \bar{A}_{\mu,x} faithful to their intended meaning: if Q_{\mu,x} can be matched in {\cal{I}} with x mapped to d, then d\in\left(A_{\mu,x}\right)^{{\cal{I}}_{Q}}, and otherwise d\in\left(\bar{A}_{\mu,x}\right)^{{\cal{I}}_{Q}}. By construction, {\cal{I}}_{Q} is consistent, and so is each of its bags. Let T be the set of types realised in {\cal{I}}_{Q}. Because {\cal{I}}\not\models Q, no type from T contains both A_{\textsc{tr},x} and A_{\textsc{nt},x}, which gives the first condition in Definition 13. The root bag of {\cal{I}}_{Q} witnesses the second condition. As each element of {\cal{I}}_{Q} belongs to a tr bag and an nt bag, each \tau\in T is realised in some tr bag and in some nt bag. These bags witness the third condition.

Conversely, let T\subseteq\mathsf{Tp}({\cal{K}}_{Q}) be a counter-witness for Q. Let {\cal{I}}_{0} be the interpretation guaranteed by the second condition, and let {\cal{I}}_{\mu,\tau} be interpretations guaranteed by the third condition. From them we build a {\cal{S\hskip-0.258ptI\hskip-0.43ptF}}-tree counter-model for Q in a top-down fashion. The root bag is {\cal{I}}_{0}. Take an element d that so far only belongs to a \mu-bag. By construction, the type \tau of d belongs to T. Let \nu\neq\mu. We add to the {\cal{S\hskip-0.258ptI\hskip-0.43ptF}}-tree under construction a copy of {\cal{I}}_{\nu,\tau}, with one element of type \tau replaced by d. Because {\cal{K}} is normalised, the resulting {\cal{S\hskip-0.258ptI\hskip-0.43ptF}}-tree {\cal{I}} is a model of {\cal{K}}. The tree partition of {\cal{I}} has finite width because each bag is a copy of one of the finitely many finite interpretations {\cal{I}}_{0} and {\cal{I}}_{\mu,\tau}.

It remains to see that {\cal{I}}\not\models Q. We first prove by induction on the size of Q_{\mu,x} that for each homomorphism f:Q_{\mu,x}\to{\cal{I}}, it holds that f(x)\in A_{\mu,x}^{\,{\cal{I}}}. Let Z_{x} and Z_{f(x)} be the \mu-bags of x and f(x), respectively. By the inductive assumption, f(y)\in A_{\nu,y}^{\,{\cal{I}}} for all y\in Z_{x}\setminus\{x\} and \nu\neq\mu. Because Z_{f(x)} matches only consistent specialisations, there is a consistent specialisation \widetilde{Z}_{x} of Z_{x} such that f induces a homomorphism from \widetilde{Z}_{x} to Z_{f(x)}. From the consistency of \widetilde{Z}_{x} it follows that f(x)\in A_{\mu,x}^{\,{\cal{I}}}. Now, if {\cal{I}}\models Q, then there is a homomorphism f:Q_{i}\to{\cal{I}} for some i. Then, f(x)\in A^{\,{\cal{I}}}_{\textsc{tr},x}\cap A^{\,{\cal{I}}}_{\textsc{nt},x} for all x\in\textit{var}(Q_{i}). Because all types realised in {\cal{I}} occur in T, this contradicts Definition 13. ∎

Theorem 7.

The finite query entailment problem for {\cal{S\hskip-0.258ptI\hskip-0.43ptF}} is in 2ExpTime.

Proof.

Let {\cal{K}} be a {\cal{S\hskip-0.258ptI\hskip-0.43ptF}} KB using only tr or only nt roles in the ABox and let Q be a union of connected CQs, each of size at most m. By Lemmas 6-7, testing {\cal{K}}\models_{\mathsf{fin}}Q amounts to deciding if there exists a counter-witness for Q, which can be done using the following variant of type elimination [20, 27]. Let T_{0} be the set of types from \mathsf{Tp}({\cal{K}}_{Q}) that contain either \bar{A}_{\textsc{tr},x} or \bar{A}_{\textsc{nt},x} for all x\in\bigcup_{i}\textit{var}(Q_{i}). For T\subseteq T_{0}, let F(T) be the set of types \tau\in T_{0} such that for all \mu\in\{\textsc{tr},\textsc{nt}\} there exists a consistent finite model of the TBox of {\cal{K}}_{Q}\!\upharpoonright\!\mu that realises type \tau and realises only types from T. Then, a set T is a counter-witness if it is a fixed point of the operator F and satisfies the second condition of Definition 13. Notice that F is a monotone operator on subsets of T_{0}. Consequently, F has the greatest fixed point and it can be obtained by iterating F on T_{0}:

T_{0}\supseteq F(T_{0})\supseteq F^{2}(T_{0})\supseteq\dots\supseteq F^{i}(T_{0})=F^{i+1}(T_{0})

for some i\leq|T_{0}|. Thus, a counter-witness for Q exists iff F^{i}(T_{0}) satisfies the second condition of Definition 13. It remains to see how to test this condition and how to compute F(T) for a given T. Both these tasks reduce to finite query entailment modulo types for simpler logics.
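The iteration above can be illustrated with a minimal Python sketch. It only shows the generic computation of the greatest fixed point of a monotone operator by repeated application starting from T_{0}; the operator `toy_f` and the dictionary `needs` are hypothetical stand-ins for the real F, whose evaluation requires the finite entailment tests described in the remainder of the proof.

```python
from typing import Callable, FrozenSet, Hashable, Iterable

Type = Hashable


def greatest_fixed_point(
    t0: Iterable[Type],
    step: Callable[[FrozenSet[Type]], Iterable[Type]],
) -> FrozenSet[Type]:
    """Iterate a monotone operator `step` on subsets of t0, starting from t0.

    For a monotone operator with step(T) contained in t0, the chain
    t0, step(t0), step(step(t0)), ... is decreasing, so it stabilises
    after at most |t0| strict decreases.
    """
    current = frozenset(t0)
    for _ in range(len(current) + 1):
        nxt = frozenset(step(current))
        if nxt == current:
            return current
        current = nxt
    raise ValueError("no fixed point reached; is the operator monotone?")


# Hypothetical toy operator: a type survives if one of the types it
# "needs" as a witness is still present in the current set.
needs = {1: {2}, 2: {1}, 3: {4}, 4: {5}}
T0 = frozenset(needs)


def toy_f(current: FrozenSet[Type]) -> FrozenSet[Type]:
    return frozenset(t for t in T0 if needs[t] & current)


result = greatest_fixed_point(T0, toy_f)
```

In this toy run, type 4 is eliminated first (its witness 5 is not in T_{0}), then type 3, after which the set stabilises.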

A given T satisfies the second condition of Definition 13 iff it is not the case that {\cal{K}}_{Q}\!\upharpoonright\!\mu_{0}\models_{\mathsf{fin}}^{T}Q^{\prime}, where the UCQ Q^{\prime} is the union of all inconsistent specialisations of the bags of queries Q_{1},Q_{2},\dots,Q_{p} (*5). The size of {\cal{K}}_{Q}\!\upharpoonright\!\mu_{0} is bounded by the size of {\cal{K}}_{Q} which is |{\cal{K}}|+{\cal{O}}(mp), and Q^{\prime} is a union of at most p\cdot 2^{2m} CQs of size {\cal{O}}(m).

If \mu_{0}=\textsc{nt}, then {\cal{K}}_{Q}\!\upharpoonright\!\mu_{0} is an {\cal{ALCI\hskip-0.43ptF}} KB. By Theorem 6, we can decide if {\cal{K}}_{Q}\!\upharpoonright\!\mu_{0}\models_{\mathsf{fin}}^{T}Q^{\prime} in time 2^{{\cal{O}}(|{\cal{K}}_{Q}\upharpoonright\mu_{0}|+|Q^{\prime}|\cdot m^{m})}, which is 2^{{\cal{O}}(|{\cal{K}}|+mp\cdot 2^{\textrm{poly}(m)})}.

If \mu_{0}=\textsc{tr}, then {\cal{K}}_{Q}\!\upharpoonright\!\mu_{0} is a {\cal{S\hskip-0.86ptO\hskip-0.258ptI}} KB (with no nominals used). Using our previous results on {\cal{S\hskip-0.86ptO\hskip-0.258ptI}}, we can decide if {\cal{K}}_{Q}\!\upharpoonright\!\mu_{0}\models_{\mathsf{fin}}Q^{\prime} in time 2^{|Q^{\prime}|\cdot|{\cal{K}}_{Q}\upharpoonright\mu_{0}|^{{\cal{O}}(m)}}, which is 2^{mp\cdot(|{\cal{K}}|+mp)^{{\cal{O}}(m)}}. We can easily incorporate the set of types T without increasing the complexity: if the ABox contains some type not in T the algorithm immediately accepts; otherwise, the automaton is constructed like before, except that the set of all types is replaced everywhere with T.

To compute F(T) for a given T we need to test for each \tau\in T and \mu\in\{\textsc{tr},\textsc{nt}\} whether there is a consistent finite model of the TBox of {\cal{K}}_{Q}\!\upharpoonright\!\mu that realises type \tau and realises only types from T. For each \tau and \mu this test can be done just like above, except that in {\cal{K}}_{Q}\!\upharpoonright\!\mu we replace the ABox with \{A(b)\mid A\in\tau\} where b is a fresh individual name. The complexity bounds for a single test carry over. To compute the fixed point we need at most 2^{2mp+|{\cal{K}}|} iterations of F, each requiring at most 2^{2mp+|{\cal{K}}|} {\cal{S\hskip-0.86ptO\hskip-0.258ptI}} tests and at most 2^{2mp+|{\cal{K}}|} {\cal{ALCI\hskip-0.43ptF}} tests. These factors are absorbed by the asymptotic bounds on the cost of single tests. Substituting the bound p\leq|Q|\cdot m^{m} we obtain the bound 2^{(|{\cal{K}}|+|Q|)^{\mathrm{poly}(m)}} for the total running time.

Let us now lift the provisos. Take an arbitrary {\cal{S\hskip-0.258ptI\hskip-0.43ptF}} KB {\cal{K}} and arbitrary UCQ Q. Like for {\cal{S\hskip-0.86ptO\hskip-0.258ptI}}, we can assume that each individual has its unary type fully specified in the ABox. Consider two KBs {\cal{K}}_{1} and {\cal{K}}_{2} obtained from {\cal{K}} by removing from the ABox of {\cal{K}} all transitive and all non-transitive roles, respectively. One can prove (see appendix) that {\cal{K}}\not\models_{\mathsf{fin}}Q iff there exist finite interpretations {\cal{F}}_{1}\models{\cal{K}}_{1} and {\cal{F}}_{2}\models{\cal{K}}_{2} such that for each disjunct P of Q, for each V\subseteq\textit{var}(P), for each function h:V\to\mathsf{Ind}({\cal{K}}), for each partition of the atoms of P into P_{1} and P_{2} with \textit{var}(P_{1})\cap\textit{var}(P_{2})\subseteq V, for some i it holds that {\cal{F}}_{i}\not\models h(P_{i}), where h(P_{i}) is a CQ with constants obtained from P_{i} by applying h to variables in V. For each P, V, h and each partition P_{1}, P_{2} of P, guess whether it is h(P_{1}) or h(P_{2}) that will not hold. Let Q_{i} be the union of all chosen h(P_{i}); note that this is a union of exponentially many CQs of size bounded by the maximal size of Q’s CQs. 
(The number of possible Q_{i} is doubly exponential, so eliminating this nondeterminism adds a doubly exponential factor to the running time.) It holds that {\cal{K}}\not\models_{\mathsf{fin}}Q iff {\cal{K}}_{i}\not\models_{\mathsf{fin}}Q_{i} for all i, and each {\cal{K}}_{i} respects the proviso. For the second proviso, consider R=R_{1}\cup\dots\cup R_{p} with R_{j}=R^{1}_{j}\land\dots\land R^{q_{j}}_{j}, where R_{j}^{k} are connected CQs over disjoint sets of variables and constants. 
Then for any KB {\cal{L}}, {\cal{L}}\not\models_{\mathsf{fin}}R iff {\cal{L}}\not\models_{\mathsf{fin}}R_{1}^{k_{1}}\cup\dots\cup R_{p}^{k_{p}} for some k_{1},\dots,k_{p}. The number of sequences k_{1},\dots,k_{p} to check is singly exponential in p. Applying this construction to {\cal{K}}_{i} and Q_{i}, we arrive at the case where both provisos are satisfied. Because Q_{i} is an exponential union of CQs, this step introduces a doubly exponential factor to the running time, but the size bounds for the involved KBs and UCQs are not affected. After eliminating constants from Q_{i} in the usual way, we can use the algorithm described above. ∎
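The reduction to unions of connected CQs used for the second proviso can be sketched in Python as follows. This is only an illustration under assumptions of our own: a CQ is encoded as a list of tuples (predicate followed by its terms), and the function names are hypothetical. The sketch enumerates the candidate unions R_{1}^{k_{1}}\cup\dots\cup R_{p}^{k_{p}}; deciding finite entailment for each candidate is not shown.

```python
from itertools import product


def connected_components(atoms):
    """Split a CQ (list of atoms) into its connected components.

    Two atoms are connected if they share a term; a small union-find
    over terms groups them.
    """
    parent = {}

    def find(t):
        parent.setdefault(t, t)
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for atom in atoms:
        terms = atom[1:]
        for t in terms:
            find(t)
        for t in terms[1:]:
            parent[find(t)] = find(terms[0])

    groups = {}
    for atom in atoms:
        groups.setdefault(find(atom[1]), []).append(atom)
    return list(groups.values())


def candidate_unions(ucq):
    """Yield every way of picking one connected component per disjunct."""
    return product(*(connected_components(cq) for cq in ucq))


# Hypothetical example: the first disjunct has two components.
ucq = [
    [("A", "x"), ("r", "x", "y"), ("B", "z")],
    [("s", "u", "v")],
]
candidates = list(candidate_unions(ucq))
```

The number of candidates is the product of the numbers of components of the disjuncts, which matches the singly exponential bound in p stated above when each disjunct has boundedly many components.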

6 Conclusions and Discussion

We have established decidability of finite query entailment of {\cal{S\hskip-0.86ptO\hskip-0.258ptI}}, {\cal{S\hskip-0.86ptO\hskip-1.032ptF}} and {\cal{S\hskip-0.258ptI\hskip-0.43ptF}}, and proved that the combined complexity coincides with that of unrestricted query entailment (2ExpTime-complete in all cases). Decidability of finite query entailment for {\cal{SOIF}} remains open.

Since existing 2ExpTime-hardness proofs hold for finite query answering for both {\cal ALCI} and {\cal{ALC\hskip-1.075ptO}}, our upper bound is tight for all logics containing either of these. For {\cal SF} and its fragments, the best known lower bound is the co-NExpTime-hardness of query answering in \mathcal{S} [11].

One crucial aspect in our techniques is the ability to define a suitable notion of decomposition of counter-models. This appears to be more challenging for logics with role inclusions, and we conjecture that for fragments of {\cal{SOIF}} extended with role inclusions a different approach is needed. A promising direction for future work is to push our techniques to establish tight bounds for Horn fragments of {\cal{SOIF}}.

Acknowledgements.

This work was done in the course of several meetings in Vienna and Warsaw, made possible by a grant from the Austrian Agency for International Cooperation in Education and Research (OeAD-GmbH) awarded for the project Logic-based Methods in Data Management and Knowledge Representation within the call WTZ Poland 2017-19. The whole project and this work in particular is a result of three great brainstorming sessions with Claire David, Magdalena Ortiz, and Mantas Simkus back in 2016. The first author was supported by Poland’s National Science Centre grant 2016/21/D/ST6/01485. Last but not least, we salute the anonymous reviewers of KR 2018 for their hard work.

References

  • [1] Antoine Amarilli and Michael Benedikt. Finite open-world query answering with number restrictions. In LICS, pages 305–316. IEEE Computer Society, 2015.
  • [2] Giovanni Amendola, Nicola Leone, and Marco Manna. Finite model reasoning over existential rules. TPLP, 17(5-6):726–743, 2017.
  • [3] Franz Baader, Diego Calvanese, Deborah L. McGuinness, Daniele Nardi, and Peter F. Patel-Schneider. The Description Logic Handbook: Theory, Implementation and Applications. Cambridge University Press, New York, NY, USA, 2nd edition, 2010.
  • [4] Jean-François Baget, Michel Leclère, Marie-Laure Mugnier, and Eric Salvat. On rules with existential variables: Walking the decidability line. Artif. Intell., 175(9-10):1620–1654, 2011.
  • [5] Vince Bárány, Georg Gottlob, and Martin Otto. Querying the guarded fragment. Logical Methods in Computer Science, 10(2), 2014.
  • [6] Meghyn Bienvenu and Magdalena Ortiz. Ontology-mediated query answering with data-tractable description logics. In Reasoning Web, volume 9203 of Lecture Notes in Computer Science, pages 218–307. Springer, 2015.
  • [7] Andrea Calì, Domenico Lembo, and Riccardo Rosati. On the decidability and complexity of query answering over inconsistent and incomplete databases. In PODS, pages 260–271. ACM, 2003.
  • [8] Diego Calvanese. Finite model reasoning in description logics. In KR, pages 292–303. Morgan Kaufmann, 1996.
  • [9] Cristina Civili and Riccardo Rosati. A broad class of first-order rewritable tuple-generating dependencies. In Datalog 2.0, pages 68–80, 2012.
  • [10] Stavros S. Cosmadakis, Paris C. Kanellakis, and Moshe Y. Vardi. Polynomial-time implication problems for unary inclusion dependencies. J. ACM, 37(1):15–46, 1990.
  • [11] Thomas Eiter, Carsten Lutz, Magdalena Ortiz, and Mantas Simkus. Query answering in description logics with transitive roles. In IJCAI, pages 759–764, 2009.
  • [12] Tomasz Gogacz and Jerzy Marcinkowski. On the BDD/FC conjecture. In PODS, pages 127–138, 2013.
  • [13] Tomasz Gogacz and Jerzy Marcinkowski. Converging to the chase – A tool for finite controllability. J. Comput. Syst. Sci., 83(1):180–206, 2017.
  • [14] Yazmin Angélica Ibáñez-García, Carsten Lutz, and Thomas Schneider. Finite model reasoning in Horn description logics. In KR. AAAI Press, 2014.
  • [15] David S. Johnson and Anthony C. Klug. Testing containment of conjunctive queries under functional and inclusion dependencies. J. Comput. Syst. Sci., 28(1):167–189, 1984.
  • [16] Yevgeny Kazakov. RIQ and SROIQ are harder than SHOIQ. In KR, pages 274–284. AAAI Press, 2008.
  • [17] Carsten Lutz. The complexity of conjunctive query answering in expressive description logics. In IJCAR, volume 5195 of Lecture Notes in Computer Science, pages 179–193. Springer, 2008.
  • [18] Carsten Lutz, Ulrike Sattler, and Lidia Tendera. The complexity of finite model reasoning in description logics. Inf. Comput., 199(1-2):132–171, 2005.
  • [19] Nhung Ngo, Magdalena Ortiz, and Mantas Simkus. Closed predicates in description logics: Results on combined complexity. In AMW, volume 1644 of CEUR Workshop Proceedings. CEUR-WS.org, 2016.
  • [20] Vaughan R. Pratt. Models of program logics. In FOCS, pages 115–122. IEEE Computer Society, 1979.
  • [21] Ian Pratt-Hartmann. Complexity of the guarded two-variable fragment with counting quantifiers. J. Log. Comput., 17(1):133–155, 2007.
  • [22] Ian Pratt-Hartmann. Data-complexity of the two-variable fragment with counting quantifiers. Inf. Comput., 207(8):867–888, 2009.
  • [23] Riccardo Rosati. Finite model reasoning in DL-Lite. In ESWC, volume 5021 of Lecture Notes in Computer Science, pages 215–229. Springer, 2008.
  • [24] Riccardo Rosati. On the finite controllability of conjunctive query answering in databases under open-world assumption. J. Comput. Syst. Sci., 77(3):572–594, 2011.
  • [25] Sebastian Rudolph. Undecidability results for database-inspired reasoning problems in very expressive description logics. In KR, pages 247–257. AAAI Press, 2016.
  • [26] Sebastian Rudolph and Birte Glimm. Nominals, inverses, counting, and conjunctive queries or: Why infinity is your friend! J. Artif. Intell. Res., 39:429–481, 2010.
  • [27] Sebastian Rudolph, Markus Krötzsch, and Pascal Hitzler. Type-elimination-based reasoning for the description logic SHIQbs using decision diagrams and disjunctive datalog. Logical Methods in Computer Science, 8(1), 2012.

Appendix A Proof of Lemma 1

Let n\geq 0. Because {\cal{I}}\setminus\mathsf{Nom}({\cal{K}}) has bounded degree, 2n-neighbourhoods in {\cal{I}} have size bounded by some m. We colour the elements of {\cal{I}} one by one, with m colours. Pick an uncoloured element d. At most m-1 colours are already used in N_{2n}^{{\cal{I}}}(d). Assign to d any colour that is not yet used in N_{2n}^{{\cal{I}}}(d). This procedure gives an n-proper colouring. Indeed, consider different e, e^{\prime} from N_{n}^{{\cal{I}}}(d) for some d\in{\cal{I}}. Without loss of generality we can assume that e was coloured before e^{\prime}. But e belongs to N_{2n}^{{\cal{I}}}(e^{\prime}), so the colours of e and e^{\prime} are different by construction.
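The greedy procedure from this proof can be sketched in Python for a finite graph. The sketch rests on assumptions that are not taken from the text: the interpretation is given as a symmetric adjacency dictionary `adj` (nominals already removed), and N_{k}(d) is read as the set of elements at distance at most k from d. The function names are hypothetical.

```python
from collections import deque


def neighbourhood(adj, d, radius):
    """Return the set of elements at distance at most `radius` from d (BFS)."""
    dist = {d: 0}
    queue = deque([d])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def n_proper_colouring(adj, n):
    """Colour elements one by one, avoiding colours used in the 2n-neighbourhood."""
    colour = {}
    for d in adj:
        used = {colour[e] for e in neighbourhood(adj, d, 2 * n) if e in colour}
        c = 0
        while c in used:  # smallest colour not yet used nearby
            c += 1
        colour[d] = c
    return colour


# Hypothetical example: a path with six elements, n = 1.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
colouring = n_proper_colouring(path, 1)
```

Because every 2n-neighbourhood has at most m elements, the smallest free colour is always among the first m, so the sketch uses at most m colours, as in the argument above.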

Appendix B Proof of Theorem 3

To make clique-forests accessible to automata, we encode them as finitely labelled forests. Let \mathsf{TRol}({\cal{K}}) be the set of transitive roles from \mathsf{Rol}({\cal{K}}), and let [X]^{\leq k} be the family of subsets of X of size at most k. In the encoding, nodes are labelled with elements of the alphabet

\Sigma=\mathsf{Tp}({\cal{K}})\cup\left(\mathsf{TRol}({\cal{K}})\times[\mathsf{Tp}({\cal{K}})]^{\leq|{\cal{K}}|}\right)

and edges are labelled with elements of the alphabet

\Gamma=\mathsf{Tp}({\cal{K}})\times\mathsf{Rol}({\cal{K}})\times\mathsf{Tp}({\cal{K}})\,.

To produce the encoding of a clique-forest for {\cal{I}}\setminus\mathsf{Nom}({\cal{K}}), we order its trees in such a way that the root of the ith tree is the ith element of \mathsf{Ind}({\cal{K}})\setminus\mathsf{Nom}({\cal{K}}) with respect to some fixed ordering. Then, we label each element node with the single unary type it realises, and each clique node with its single nonempty role and the set of unary types it realises. Finally, if in {\cal{I}} there is an r-edge from an element of type \tau in some parent node to an element of type \sigma in some child node, then in the encoding the edge from the parent node to the child node is labelled with (\tau,r,\sigma). Because unary types do not repeat within cliques, this uniquely determines the endpoints. We do not represent nominals explicitly in the encoding, but thanks to the initial preprocessing, all relevant information about them is contained in the unary types of the remaining elements.

Thus, our automata run over forests built of at most N=|{\cal{K}}|^{2} trees, with branching bounded by N, nodes labelled with elements of the alphabet \Sigma, and edges labelled with elements of the alphabet \Gamma. In such automata, the transition relation has the form

\delta\subseteq Q\times\Sigma\times(\Gamma\times Q)^{\leq N}\,,

where Q is the set of states. The automata process the forests top down. The initial states are specified for each tree separately: the automaton has a set I\subseteq Q^{\leq N} of sequences of initial states. A run is a labelling of the input forest with states in such a way that the sequence of states in the roots belongs to I, and if a node has state q, label \alpha, and its children are connected via edges with labels \beta_{1},\beta_{2},\dots,\beta_{n} and have states q_{1},q_{2},\dots,q_{n}, then

(q,\alpha,(\beta_{1},q_{1}),\dots,(\beta_{n},q_{n}))\in\delta\,.

We use the Büchi acceptance condition: we specify a set F\subseteq Q of marked states that need to be revisited, and consider a run accepting if on each branch marked states occur infinitely often. A forest is accepted by the automaton if there exists an accepting run over it.

An automaton has trivial acceptance condition if F=Q. Then, each run is accepting but the automaton may still reject some forests, because there may be no run for them: a branch of the computation can get stuck if no transition is consistent with the current state, label and edge labels. An automaton is weak if on each branch of each run, once a marked state is visited, all subsequent states are marked. Notice that all automata with trivial acceptance condition are weak. Given a weak automaton and an arbitrary Büchi automaton it is particularly easy to construct an automaton recognising trees accepted by both input automata: it suffices to take the standard (synchronous) product automaton and mark all states that contain a marked state on both coordinates.

The automaton recognising safe counter-examples for Q is obtained as a product of automata verifying independently various parts of the condition.

The first thing to check is the consistency of the encoding: if an edge has label (\tau,r,\sigma), then \tau must occur in the label of the parent node, and \sigma must occur in the label of the child node. To check this, it suffices to examine for each node the labels of all edges incident to it plus the label of the node itself. When a transition is made, all these are available except the label on the edge to the parent: it must be stored in the state. The automaton has {\cal{O}}(|\Gamma|)=2^{{\cal{O}}(|{\cal{K}}|)} states and trivial acceptance condition.

The second thing to check is that the {\cal{S\hskip-0.86ptO\hskip-0.258ptI}}-forest is a model of {\cal{K}}^{*}. Checking that the {\cal{S\hskip-0.86ptO\hskip-0.258ptI}}-forest is a model of the ABox {\cal{A}} of {\cal{K}}^{*} amounts to testing if the roots of the trees are labelled with appropriate types. This can be done easily by an automaton with {\cal{O}}(|{\cal{K}}|) states and trivial acceptance condition. To verify that the TBox is satisfied we need to check each CI. For CIs of the form

\bigsqcap_{i}A_{i}\sqsubseteq\bigsqcup_{j}B_{j}

we have a two-state automaton with trivial acceptance condition that simply tests that each type used in the encoding satisfies this CI; if the type of some a\in\mathsf{Nom}({\cal{K}}) specified in {\cal{A}} violates this CI, the automaton rejects everything. CIs of the form

A\sqsubseteq\forall r.B

are also easy to handle. If {\cal{A}} contains A(a), A_{r,b}(a), \bar{B}(b) for some a and b, the automaton rejects everything. Otherwise, it suffices to check that in the input {\cal{S\hskip-0.86ptO\hskip-0.258ptI}}-forest there is no r-edge from an element whose unary type contains A to an element whose unary type contains \bar{B}. This amounts to verifying that none of the following are used in the encoding:

  • node labels (r,T) such that A\in\tau\in T and \bar{B}\in\sigma\in T for some \tau, \sigma;

  • edge labels (\tau,r,\sigma) with A\in\tau and \bar{B}\in\sigma;

  • edge labels (\sigma,r^{-},\tau) with A\in\tau and \bar{B}\in\sigma;

  • unary types containing both A and A_{r,b} for some b such that \bar{B}(b)\in{\cal{A}};

  • unary types containing both \bar{B} and A_{r^{-},b} for some b such that A(b)\in{\cal{A}}.

These conditions simply disallow certain labels; they can be checked by a two-state automaton with trivial acceptance condition.

Finally, let us take a CI of the form

A\sqsubseteq\exists r.B\,.

For ordinary elements this condition can be tested in a similar way as above, except that one needs access to the label of the current node and all edges incident to it. Like for the initial consistency check, it suffices to store in the state the label of the edge to the parent. Nominals have to be treated separately, because they are not explicitly represented in the tree: for each a such that A(a)\in{\cal{A}} and there is no b such that A_{r,b}(a)\in{\cal{A}} and B(b)\in{\cal{A}}, we have a two-state weak automaton looking for a label that uses a type \tau such that B\in\tau and A_{r^{-},a}\in\tau. Note that this automaton has a non-trivial acceptance condition, but it is weak: as soon as it finds an appropriate label, it loops in a marked state. Summing up, the total size of the state-space of the KB component is 2^{{\cal{O}}(|{\cal{K}}|^{2})}.

The third thing to check is that the query Q is not satisfied. We begin by replacing query Q with a query Q^{\prime} such that {\cal{I}}^{*}\models Q iff ({\cal{I}}\setminus\mathsf{Nom}({\cal{K}}))^{*}\models Q^{\prime} for each model {\cal{I}} of {\cal{K}}. The query Q^{\prime} is obtained in two steps. In the first step, for each CQ P constituting Q, we add to Q each CQ that can be obtained from P by subdividing some transitive atoms; that is, by replacing some atoms of the form r(x,y) for some transitive r, with two atoms r(x,z) and r(z,y) for a fresh variable z. In the second step, for each CQ P of the modified Q, we add to Q each CQ that can be obtained from P by performing the following operation any number of times. Let \mathsf{tp}(x) be the set of all A such that P contains A(x). Choose x\in\textit{var}(P) and a\in\mathsf{Nom}({\cal{K}}) such that A(a)\in{\cal{A}} whenever A\in\mathsf{tp}(x). Drop all atoms of the form A(x) from P. Replace in P each atom of the form r(y,x) by A_{r,a}(y) and each atom of the form r(x,y) by A_{r^{-},a}(y). It is easy to see that the resulting query Q^{\prime} has the desired property. After the first step, the number of CQs grows by the factor 2^{m} and their size is at most 2m. After the second step, the number of CQs grows by the factor |{\cal{K}}|^{2m} and their size is still at most 2m. Thus, the size of the resulting query is at most |Q|\cdot 2^{m}\cdot|{\cal{K}}|^{2m}, and its CQs have size at most 2m.
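The first preprocessing step (subdividing transitive atoms) can be sketched in Python. The encoding is an assumption made for illustration: a CQ is a list of tuples, role atoms have the form (r, x, y), concept atoms the form (A, x), and fresh variables are named `_z0`, `_z1`, and so on, which are assumed not to occur in the input. The function name is hypothetical.

```python
from itertools import combinations, count


def subdivisions(cq, transitive):
    """Return all CQs obtained from cq by subdividing some transitive atoms.

    Each chosen atom r(x, y) with r transitive is replaced by r(x, z) and
    r(z, y) for a fresh variable z; every atom is subdivided at most once.
    """
    candidates = [
        i for i, atom in enumerate(cq) if len(atom) == 3 and atom[0] in transitive
    ]
    results = []
    for k in range(len(candidates) + 1):
        for chosen in combinations(candidates, k):
            fresh = count()
            new_cq = []
            for i, atom in enumerate(cq):
                if i in chosen:
                    r, x, y = atom
                    z = f"_z{next(fresh)}"  # assumed-fresh variable name
                    new_cq += [(r, x, z), (r, z, y)]
                else:
                    new_cq.append(atom)
            results.append(new_cq)
    return results


# Hypothetical example with two transitive roles r and s.
cq = [("A", "x"), ("r", "x", "y"), ("s", "y", "w")]
variants = subdivisions(cq, {"r", "s"})
```

With t transitive atoms the function returns 2^{t} variants, each with at most twice as many atoms as the input, in line with the factor 2^{m} and the size bound 2m stated above.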

Thus, it suffices to construct for each CQ P of Q^{\prime} an automaton that tests if ({\cal{I}}\setminus\mathsf{Nom}({\cal{K}}))^{*}\models P where {\cal{I}} is the {\cal{S\hskip-0.86ptO\hskip-0.258ptI}}-forest represented by the input encoding. Its states are composed from an edge label \beta=(\sigma,r,\tau) and a set of partial functions

f:\textit{var}(P)\multimap\!\to\{\mathsf{succ},\mathsf{other}\}\,,

representing all partial matchings of P in the interpretation ({\cal{I}}\setminus\mathsf{Nom}({\cal{K}}))^{*} restricted to elements represented in the subtree rooted at the current node. The label \beta=(\sigma,r,\tau) is always the label on the edge from the parent to the current node (if the current node is the root of the input tree, \beta is arbitrary). Under this assumption there is a unique element of type \tau in the current node. We refer to this element as the current element. Similarly, in the parent node there is exactly one element of type \sigma; we call it the parent element. In ({\cal{I}}\setminus\mathsf{Nom}({\cal{K}}))^{*} these two elements are connected by an r-edge. The identifier \mathsf{succ} stands for any element (represented in the current subtree) that is an r-successor of the parent element in ({\cal{I}}\setminus\mathsf{Nom}({\cal{K}}))^{*}. If r is non-transitive, this simply means the current element. If r is transitive, it means any element r-reachable from the current element. The identifier \mathsf{other} stands for any other element represented in the current subtree. All states are initial. Transitions are defined only for states that contain only functions that are not total, and the acceptance condition is trivial. It is clear that such an automaton is correct provided that the transition relation ensures the intended semantics of the states. Let us see how to define it.

First, we describe when

\bigg{(}\big{(}(\sigma,r,\tau),\Phi\big{)},\tau,\big{(}(\tau,r_{i},\tau_{i}),((\tau,r_{i},\tau_{i}),\Phi_{i})\big{)}^{n}_{i=1}\bigg{)}

is a transition of the automaton. Let \Psi be the set of all constant partial functions h:\textit{var}(P)\multimap\!\to\{\tau\} such that \mathsf{tp}(x)\subseteq h(x) for all x\in{\sf dom}(h). We say that functions h\in\Psi and f_{1}\in\Phi_{1},f_{2}\in\Phi_{2},\dots,f_{n}\in\Phi_{n} are compatible if they have disjoint domains and for each atom s(x,y) of P

  • if x\in{\sf dom}(h), y\in{\sf dom}(f_{i}) then r_{i}=s, f_{i}(y)=\mathsf{succ};

  • if y\in{\sf dom}(h), x\in{\sf dom}(f_{i}) then r_{i}=s^{-}, f_{i}(x)=\mathsf{succ};

  • if x\in{\sf dom}(f_{i}), y\in{\sf dom}(f_{j}), i\neq j then s is transitive, r_{i}=r_{j}^{-}=s, and f_{i}(x)=f_{j}(y)=\mathsf{succ}. (Due to the initial preprocessing of Q, this condition is actually redundant, but we include it to make the correctness more apparent.)

If r is non-transitive, the condition that each transition of the form above has to satisfy is that \Phi is the set of all functions f that can be obtained from any compatible functions h\in\Psi and f_{1}\in\Phi_{1},f_{2}\in\Phi_{2},\dots,f_{n}\in\Phi_{n} by setting

f(x)=\begin{cases}\mathsf{succ}&\text{if }h(x)=\tau\\ \mathsf{other}&\text{if }f_{i}(x)=\mathsf{succ}\\ &\text{or }f_{i}(x)=\mathsf{other}\end{cases}

and if r is transitive, we set

f(x)=\begin{cases}\mathsf{succ}&\text{if }h(x)=\tau\\ &\text{or }f_{i}(x)=\mathsf{succ},r_{i}=r\\ \mathsf{other}&\text{if }f_{i}(x)=\mathsf{other}\\ &\text{or }f_{i}(x)=\mathsf{succ},r_{i}\neq r\end{cases}

with the convention that whenever we write g(x)=\gamma for a partial function g, we implicitly assume that x\in{\sf dom}(g).
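The two case distinctions above for element nodes can be sketched as a single Python function. The encoding is hypothetical: partial functions are dictionaries, `h` maps variables to the type of the current element, and each child contributes a pair (r_i, f_i) where f_i maps variables to the tags `"succ"` or `"other"`. The sketch assumes that the given functions are already compatible (in particular, that their domains are disjoint) and does not check this.

```python
SUCC, OTHER = "succ", "other"


def combine(h, children, r, r_is_transitive):
    """Build f from compatible h and (r_i, f_i) pairs at an element node.

    Variables matched to the current element are successors of the parent.
    A variable matched below a child stays a successor only if r is
    transitive, the child edge carries the same role r, and the variable
    was tagged `succ` in the child; otherwise it becomes `other`.
    """
    f = {x: SUCC for x in h}
    for r_i, f_i in children:
        for x, tag in f_i.items():
            if r_is_transitive and tag == SUCC and r_i == r:
                f[x] = SUCC
            else:
                f[x] = OTHER
    return f


# Hypothetical example: x is matched at the current element, y below an
# r-child, z below an s-child.
children = [("r", {"y": SUCC}), ("s", {"z": SUCC})]
f_transitive = combine({"x": "tau"}, children, "r", True)
f_plain = combine({"x": "tau"}, children, "r", False)
```

In the transitive case y remains tagged `succ` because it is r-reachable from the current element, while in the non-transitive case only x keeps that tag.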

For transitions of the form

\bigg{(}\big{(}(\sigma,r,\tau),\Phi\big{)},(r^{\prime},T),\big{(}(\sigma_{i},r_{i},\tau_{i}),((\sigma_{i},r_{i},\tau_{i}),\Phi_{i})\big{)}^{n}_{i=1}\bigg{)}

the condition is similar. For \Psi we take the set of all partial functions h:\textit{var}(P)\multimap\!\to T such that \mathsf{tp}(x)\subseteq h(x) for all x\in{\sf dom}(h) and the only role atoms in P with both variables in {\sf dom}(h) are s atoms. Functions h\in\Psi and f_{1}\in\Phi_{1},f_{2}\in\Phi_{2},\dots,f_{n}\in\Phi_{n} are compatible if they have disjoint domains and for each atom s(x,y) of P

  • if x\in{\sf dom}(h), y\in{\sf dom}(f_{i}) then r_{i}=s, f_{i}(y)=\mathsf{succ}, and either h(x)=\sigma_{i} or s=r^{\prime} and s is transitive;

  • if y\in{\sf dom}(h), x\in{\sf dom}(f_{i}) then r_{i}=s^{-}, f_{i}(x)=\mathsf{succ}, and either h(y)=\sigma_{i} or s=r^{\prime} and s is transitive;

  • if x\in{\sf dom}(f_{i}), y\in{\sf dom}(f_{j}), i\neq j then s is transitive, r_{i}=r_{j}^{-}=s , f_{i}(x)=f_{j}(y)=\mathsf{succ}, and either \sigma_{i}=\sigma_{j} or s=r^{\prime}.

If r is non-transitive, \Phi is the set of all partial functions f that can be obtained from any compatible functions h\in\Psi and f_{1}\in\Phi_{1},f_{2}\in\Phi_{2},\dots,f_{n}\in\Phi_{n} by setting

f(x)=\begin{cases}\mathsf{succ}&\text{if }h(x)=\tau\\ \mathsf{other}&\text{if }h(x)=\tau^{\prime}\neq\tau\\ &\text{or }f_{i}(x)=\mathsf{other}\end{cases}

and if r is transitive, we set

f(x)=\begin{cases}\mathsf{succ}&\text{if }h(x)=\tau\\ &\text{or }f_{i}(x)=\mathsf{succ},r_{i}=r,\sigma_{i}=\tau\\ &\text{or }f_{i}(x)=\mathsf{succ},r_{i}=r,r^{\prime}=r\\ \mathsf{other}&\text{if }f_{i}(x)=\mathsf{other}\\ &\text{or }f_{i}(x)=\mathsf{succ},r_{i}\neq r\\ &\text{or }f_{i}(x)=\mathsf{succ},\sigma_{i}\neq\tau,r^{\prime}\neq r\\ \end{cases}\;.

To see that this transition relation ensures the intended semantics of the states one needs to argue that each partial matching is accurately represented. This can be done by induction on the size of the image. For size one, the matching will be accounted for based solely on the labels. For larger images, use the inductive hypothesis for restrictions of the match to variables mapped to the trees rooted at the children of the current node.

The total size of the state-space of the query component is 2^{{\cal{O}}(3^{2m}\cdot|Q|\cdot 2^{m}\cdot|{\cal{K}}|^{2m})}=2^{|Q|\cdot|{\cal{K}}|^{{\cal{O}}(m)}}.

The last component of the automaton checks that the {\cal{S\hskip-0.86ptO\hskip-0.258ptI}}-forest is safe. Observe that it is unsafe if in the input forest there is a branch with consecutive node and edge labels \alpha_{1}\beta_{1}\alpha_{2}\beta_{2}\dots such that for some transitive r and all i large enough, \beta_{i}=(\tau_{i},r,\sigma_{i}) and either \sigma_{i}=\tau_{i+1} (edges are incident in the {\cal{S\hskip-0.86ptO\hskip-0.258ptI}}-tree) or \alpha_{i+1}=(r,T_{i}) (edges are incident with an r-clique). An automaton can easily check that there is no such branch. Each time it sees a transitive role it moves to an unmarked state, storing the role. It moves to a marked state as soon as the condition above is broken. The automaton has {\cal{O}}(|{\cal{K}}|) states.

The automaton recognising safe counter-examples can be obtained from these components by the simple product construction described above, because only the last component is not weak. The resulting product automaton has 2^{|Q|\cdot|{\cal{K}}|^{{\cal{O}}(m)}} states. An automaton with k states has total size {\cal{O}}(k\cdot|\Sigma|\cdot(k\cdot|\Gamma|)^{N}+k^{N}), which in our case is {\cal{O}}(k^{|{\cal{K}}|^{2}}\cdot 2^{2\cdot|{\cal{K}}|^{3}+|{\cal{K}}|^{2}\log|{\cal{K}}|+|{\cal{K}}|^{2}+\log|{\cal{K}}|}). Thus, the size of the product automaton is also 2^{|Q|\cdot|{\cal{K}}|^{{\cal{O}}(m)}}.

Appendix C Proof of Lemma 2

There exists a natural homomorphism h:{\cal{I}}\to{\cal{M}}, mapping copies of elements from {\cal{M}} to their originals. The homomorphism h induces a homomorphism from {\cal{I}}^{*} to {\cal{M}}, so {\cal{I}}^{*} cannot satisfy Q. Because {\cal{I}} was obtained using a variant of the standard unravelling procedure, verifying {\cal{I}}\models{\cal{K}}^{*} is routine. Let us see that {\cal{I}} is safe. Suppose that {\cal{I}} does contain an infinite simple r-path \pi for some transitive role r. Because \mathsf{Nom}({\cal{K}}) is finite, by skipping a finite prefix we can assume that \pi never visits \mathsf{Nom}({\cal{K}}). The image of \pi under the homomorphism h from the previous paragraph forms an r-path h(\pi) in {\cal{M}}. Because {\cal{M}} is finite, h(\pi) eventually stabilises in a single strongly connected component X of r in {\cal{M}}. By skipping a finite prefix of \pi we can assume that h(\pi)\subseteq X. In the construction of {\cal{I}}, nominals are only copied once, so only nominals get mapped to nominals by h. Consequently, h(\pi)\subseteq X\setminus\mathsf{Nom}({\cal{K}}). From the construction of {\cal{I}} it further follows that by skipping another finite prefix we can assume that the first element of \pi belongs to an r-clique X_{0} that contains a representative of each C\in\mathsf{CN}({\cal{K}}) with a representative in X\setminus\mathsf{Nom}({\cal{K}}). Because X_{0} is finite and \pi is infinite and simple, \pi eventually leaves X_{0}. Let d be the first element of \pi outside of X_{0}. There exists C\in\mathsf{CN}({\cal{K}}) such that d\in C^{\cal{I}} but C^{\cal{I}}\cap X_{0}=\emptyset, for otherwise there would be no reason to add d to {\cal{I}}. On the other hand, h(d)\in C^{\cal{M}}\cap\big{(}X\setminus\mathsf{Nom}({\cal{K}})\big{)}, which implies C^{\cal{I}}\cap X_{0}\neq\emptyset and gives a contradiction.

Appendix D Adaptation of the argument for {\cal{S\hskip-0.86ptO\hskip-0.258ptI}} to {\cal{S\hskip-0.86ptO\hskip-1.032ptF}}

The argument for {\cal{S\hskip-0.86ptO\hskip-1.032ptF}} is almost identical to the one for {\cal{S\hskip-0.86ptO\hskip-0.258ptI}}; differences are few and easy to delimit. All constructions remain the same, but each time we check that some interpretation is a model of {\cal{K}}, we need to verify the functionality declarations. These are generally ensured by the absence of inverses in CIs. We list all necessary modifications below.

  1. {\cal{S\hskip-0.86ptO\hskip-1.032ptF}}-forests are defined exactly like {\cal{S\hskip-0.86ptO\hskip-0.258ptI}}-forests. Because \mathsf{Rol}({\cal{K}}) contains no inverses, all edges between instances {\cal{I}}_{v} point down the tree.

  2. In the construction of the automaton from Theorem 3 we include an additional component for each functionality declaration \mathsf{Fn}(r). To check functionality of r for ordinary nodes it suffices to examine the label of the node and the labels on all incident edges, which only requires storing in the state the label of the edge to the parent. Additionally, for all a\in\mathsf{Nom}({\cal{K}}), if the ABox contains A_{r,b}(a) and A_{r,b^{\prime}}(a) for some b\neq b^{\prime}, the automaton trivially rejects everything; if the ABox contains A_{r,b}(a) for only one b, the automaton checks that no type used in the input forest contains A_{r^{-},a}; if the ABox contains no A_{r,b}(a), the automaton checks that a type with A_{r^{-},a} occurs at most once in the input forest. The total number of states in the described component is 2^{{\cal{O}}(|{\cal{K}}|^{2})}, so including it does not affect the overall upper bound.

  3. Checking that the unravelling procedure produces a model of {\cal{K}} (Lemma 2) requires verifying the functionality declarations. This is routine.

  4. After {\cal{F}}_{n} has been constructed from an {\cal{S\hskip-0.86ptO\hskip-1.032ptF}}-forest using the coloured blocking principle, we need to check that it satisfies all functionality declarations of {\cal{K}}. This follows immediately from the fact that each redirected edge is a forward edge.
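The case analysis for nominals in item 2 can be illustrated by a direct check over an explicitly given list of types. This is only a sketch: the automaton performs the same test while reading the forest, whereas the function below assumes the types are available as a list. The function name, the tuple encoding of A_{r^{-},a}, and the representation of types as Python sets are hypothetical choices made for illustration.

```python
from typing import Hashable, Iterable, Set


def nominal_functionality_ok(
    a: str,
    r: str,
    abox_successors: Set[str],
    forest_types: Iterable[Set[Hashable]],
) -> bool:
    """Check functionality of role r at nominal a.

    abox_successors: all b such that the ABox contains A_{r,b}(a).
    forest_types: the types used in the input forest; the marker
    ("inv", r, a) is an assumed encoding of the concept A_{r^-,a}.
    """
    marker = ("inv", r, a)
    occurrences = sum(1 for t in forest_types if marker in t)

    if len(abox_successors) >= 2:
        # Two distinct ABox successors: reject everything.
        return False
    if len(abox_successors) == 1:
        # One ABox successor: no type in the forest may contain A_{r^-,a}.
        return occurrences == 0
    # No ABox successor: a type with A_{r^-,a} may occur at most once.
    return occurrences <= 1
```

Counting occurrences rather than collecting distinct types matters in the last case, since the same type used at two different nodes would already give a two r-successors.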

Appendix E Proof of Theorem 6

Each {\cal{ALCI\hskip-0.43ptF}} KB can be expressed in the guarded fragment with two variables and counting (\mathcal{GC}^{2}). Hence, the following result is relevant for us.

Theorem 8.

[[22], Theorem 4] For any \mathcal{GC}^{2}-sentence \phi and any positive conjunctive query \psi both finite and infinite query entailment are in co-NP in terms of data complexity.

Because we are interested in combined complexity and UCQs, we have to inspect the proof rather than just using the theorem as a black box.

The first step of the proof is to show that if \phi (together with some ground atoms) entails \psi then \phi entails a treeification of \psi, which can be rewritten as a \mathcal{GC}^{2} formula \psi_{\mathcal{GC}^{2}}. It is easy to see that the same argument applies to UCQs. For a single CQ \psi there are at most |\psi|^{|\psi|} possible treeifications, therefore for our UCQ Q we have at most n\cdot m^{m} possible treeifications.
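The counting bound above can be checked on small inputs with the following sketch. It assumes that n denotes the number of disjuncts of Q and m the maximal size of a disjunct; the function name and the list-of-sizes input are hypothetical.

```python
from typing import List, Tuple


def treeification_bounds(cq_sizes: List[int]) -> Tuple[int, int]:
    """Return (sum of |psi|^|psi| over the disjuncts, n * m^m).

    cq_sizes lists the sizes of the disjuncts of the UCQ; n is their
    number and m their maximum (assumed reading of n and m).
    """
    if not cq_sizes:
        return 0, 0
    n = len(cq_sizes)
    m = max(cq_sizes)
    per_disjunct_total = sum(s ** s for s in cq_sizes)
    uniform_bound = n * m ** m
    return per_disjunct_total, uniform_bound
```

The first component is never larger than the second, because each summand s^s is at most m^m.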

The next step is to use finite query answering for \mathcal{GC}^{2}.

Theorem 9.

[[21], Theorem 1] Finite satisfiability for \mathcal{GC}^{2} is in EXPTIME.

Once again, to obtain the precise bounds we need a bit more than the stated theorem provides. The proof of Theorem 9 uses the well-known technique of inequality systems, developed by Pratt-Hartmann. The provided algorithm is polynomial in the size of the formula and exponential in the size of the signature, under the assumption that the formula is in normal form. Normalising an arbitrary formula \phi increases the size of the signature by {\cal{O}}(|\phi|), which we can afford.

In the inequality system from the proof of Theorem 9, each variable represents a star type realised in a hypothetical counter-model. The star type of an element is a refinement of its unary type, so we can incorporate the restriction on allowed unary types into the proof at no cost: we simply remove variables whose associated unary type is not in T. This does not complicate the inequality system in any way.

Appendix F Proof of the claim in the proof of Theorem 7

The claim can be equivalently formulated as follows: {\cal{K}}\not\models_{\mathsf{fin}}Q iff there exist finite interpretations {\cal{F}}_{1}\models{\cal{K}}_{1} and {\cal{F}}_{2}\models{\cal{K}}_{2} such that for each disjunct P of Q, for each V\subseteq\textit{var}(P), for each function h:V\to\mathsf{Ind}({\cal{K}}), for each partition of the atoms of P into P_{1} and P_{2} with \textit{var}(P_{1})\cap\textit{var}(P_{2})\subseteq V, one cannot simultaneously extend h to homomorphisms h_{i}:P_{i}\to{\cal{F}}_{i} for all i.

Suppose first that there is a finite interpretation {\cal{F}} such that {\cal{F}}\models{\cal{K}} and {\cal{F}}\not\models Q. We construct interpretations {\cal{F}}_{1} and {\cal{F}}_{2}, as specified in the claim, by unravelling {\cal{F}} as in the proof of Lemma 6, but only to a finite depth. For {\cal{F}}_{1} we start from {\cal{F}}_{\textsc{tr}}, and for each element d in {\cal{F}}_{\textsc{tr}} we add a copy of {\cal{F}} with all elements fresh except d. For {\cal{F}}_{2} we start from {\cal{F}}_{\textsc{nt}}, for each element d in {\cal{F}}_{\textsc{nt}} add a copy of {\cal{F}}_{\textsc{tr}} with all elements fresh except d, and then for each element e that only belongs to a copy of {\cal{F}}_{\textsc{tr}}, add a copy of {\cal{F}} with all elements fresh except e. In both cases, we close the interpretations of transitive roles under transitivity. Because copies of {\cal{F}} share elements only with copies of {\cal{F}}_{\textsc{tr}}, functionality requirements do not get violated in the unravelling process. It follows that {\cal{F}}_{i}\models{\cal{K}}_{i}. Consider a disjunct P of Q, a set V\subseteq\textit{var}(P), a function h:V\to\mathsf{Ind}({\cal{K}}), and a partition of P into P_{1} and P_{2} such that \textit{var}(P_{1})\cap\textit{var}(P_{2})\subseteq V. Suppose that h can be extended to a homomorphism h_{i}:P_{i}\to{\cal{F}}_{i} for all i. By the construction of {\cal{F}}_{i}, there exists a homomorphism f_{i}:{\cal{F}}_{i}\to{\cal{F}}. Consequently, we obtain a match for P in {\cal{F}} by taking f_{1}\circ h_{1}\cup f_{2}\circ h_{2}, which is a contradiction. Thus, {\cal{F}}_{1} and {\cal{F}}_{2} are as we wanted.
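The one-level gluing used for {\cal{F}}_{1} can be sketched as follows. The code is illustrative only: it handles roles but not concept names, it assumes that the base interpretation is a sub-interpretation of the full one and that domain elements are atomic values such as integers or strings (so that the fresh pairs cannot clash with them), and all function names are hypothetical. The returned map origin plays the role of the homomorphism f_{1} sending each copy back to its original.

```python
from typing import Dict, Hashable, Set, Tuple

Edge = Tuple[Hashable, Hashable]
Edges = Dict[str, Set[Edge]]


def glue_copies(base_domain: Set[Hashable], base_edges: Edges,
                full_domain: Set[Hashable], full_edges: Edges):
    """For each d in the base, attach a copy of the full interpretation
    whose elements are all fresh except d itself."""
    domain = set(base_domain)
    edges: Edges = {r: set(ps) for r, ps in base_edges.items()}
    origin = {d: d for d in base_domain}  # copy -> original element
    for d in base_domain:
        # e becomes the fresh pair (d, e); only d keeps its identity.
        rename = {e: (d if e == d else (d, e)) for e in full_domain}
        for e, fresh in rename.items():
            domain.add(fresh)
            origin[fresh] = e
        for r, ps in full_edges.items():
            edges.setdefault(r, set()).update(
                (rename[x], rename[y]) for x, y in ps)
    return domain, edges, origin


def transitive_closure(pairs: Set[Edge]) -> Set[Edge]:
    """Close a binary relation under transitivity (naive fixpoint)."""
    closure = set(pairs)
    while True:
        new = {(x, w) for (x, y) in closure
               for (z, w) in closure if y == z} - closure
        if not new:
            return closure
        closure |= new


def is_homomorphism(edges: Edges, origin, target_edges: Edges) -> bool:
    """Check that origin maps every edge to an edge of the target."""
    return all((origin[x], origin[y]) in target_edges.get(r, set())
               for r, ps in edges.items() for x, y in ps)
```

As a usage example, one can glue copies onto the restriction of a three-element interpretation to a transitive role t, close t under transitivity, and confirm with is_homomorphism that origin still maps the result into the original interpretation. The three-level construction for {\cal{F}}_{2} would apply glue_copies twice and is not shown.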

Conversely, assume that we have finite interpretations {\cal{F}}_{1} and {\cal{F}}_{2} as in the claim. We first unravel them as above. To obtain {\cal{F}}^{\prime}_{1} we start from ({\cal{F}}_{1})_{\textsc{tr}} and for each d in ({\cal{F}}_{1})_{\textsc{tr}} we add a copy of {\cal{F}}_{1} with all elements fresh except d. For {\cal{F}}^{\prime}_{2} we start from ({\cal{F}}_{2})_{\textsc{nt}}, for each d in ({\cal{F}}_{2})_{\textsc{nt}} we add a copy of ({\cal{F}}_{2})_{\textsc{tr}} with all elements fresh except d, and then for each e that belongs only to a copy of ({\cal{F}}_{2})_{\textsc{tr}}, we add a copy of {\cal{F}}_{2} with all elements fresh except e. Again, we close the interpretations of transitive roles under transitivity. By construction, {\cal{F}}^{\prime}_{1} and {\cal{F}}^{\prime}_{2} also satisfy the condition in the claim. To construct {\cal{F}}, first delete all subtrees of the tree partitions of {\cal{F}}^{\prime}_{1} and {\cal{F}}^{\prime}_{2} rooted in second-level nodes that contain an element of \mathsf{Ind}({\cal{K}}), and then take the union of the two resulting interpretations. This is consistent because all a\in\mathsf{Ind}({\cal{K}}) have their types fully specified. An argument similar to the one above shows that {\cal{F}}\models{\cal{K}} and {\cal{F}}\not\models Q.
