Ontology Focusing: Knowledge-enriched Databases on Demand
We propose a novel framework to facilitate the on-demand design of data-centric systems by exploiting domain knowledge from an existing ontology. Its key ingredient is a process that we call focusing, which allows one to obtain a schema for a (possibly knowledge-enriched) database semi-automatically, given an ontology and a specification of the scope of the desired system. We formalize the inputs and outputs of focusing, and identify relevant computational problems: finding a schema via focusing, testing its consistency, and answering queries in the knowledge-enriched databases it produces. These definitions are fully independent of the ontology language. We then instantiate the framework using selected description logics as ontology languages, and popular classes of queries for specifying the scope of the system. For several representative combinations, we study the decidability and complexity of the identified computational problems. As a by-product, we isolate (and solve) variants of classical decision problems in description logics that are interesting in their own right.
In the design of data-centric systems, coming up with the right data organization (in terms of database schemas, integrity constraints, conceptual models, etc.) is of paramount importance. If well-chosen, it can make the implementation of the remaining functionality more evident, as it binds the developers to one shared and unambiguous view of the data to be managed by the target system. Unfortunately, coming up with the right data organization remains challenging and time-consuming, despite the many techniques and tools that are available to aid the design of data-centric systems. In addition, modern systems face further challenges, like incompleteness of information, or the need for interoperability with multiple other systems (Abiteboul et al., 2018).
We propose a novel way to exploit domain knowledge captured in ontologies in the design of data-centric systems. Ontologies, understood here as logical theories expressing domain knowledge, provide a shared understanding of the domain to different users and applications; justified by expected reusability, considerable resources have been invested in constructing high-quality ontologies for many domains (Horrocks, 2008). In data management they have already proved to be a powerful tool. Successful applications include data integration and querying incomplete data sources, where they are used on-line during the system operation to infer additional facts from incomplete data, and to provide a unified view of heterogeneous data sources (Poggi et al., 2008; Benedikt et al., 2018; Xiao et al., 2018). Here, we would like to use an existing ontology to produce, quickly and with moderate effort, data-centric systems on demand. To achieve this goal, two key challenges need to be overcome.
- Specificity:
Ontologies are typically broad, containing many terms irrelevant for the intended application. We need methods to restrict their scope to obtain more manageable conceptualizations.
- Data completeness:
Ontologies have an open-world semantics that treats data as incomplete, while every meaningful data-centric application will call for some completeness assumptions on (parts of) the data.
As a response to these challenges, we propose a process called focusing that allows us to trim away irrelevant information, and establish completeness assumptions. The goal of this process is to find so-called focusing solutions. Syntactically, a focusing solution provides a schema for a database and specifies how its instances are enriched with the knowledge from the ontology, by prescribing which ‘parts’ of the ontology are relevant and which are complete. Semantically, focusing solutions define a set of intended models for these knowledge-enriched databases: those that give the expected answers for the relevant queries.
Our main contribution is formalizing focusing solutions. The notion is independent from the ontology language and gives several options for specifying the scope of the system. We identify key computational problems relevant for obtaining and using focusing solutions. As an advanced proof of concept, we instantiate our general notions by considering a few choices of ontology languages and scope specifications. For these combinations, we study the decidability and complexity of the introduced computational problems. As a by-product, we isolate (and solve) variants of classical reasoning tasks that seem to be interesting in their own right.
2. General Framework
We now discuss the key features and design choices in the framework we propose. We begin with a motivating use case.
2.1. Emergency response in a smart city
City authorities rely on an ontology to quickly build situation-specific applications supporting response to emergency events. The ontology contains, among others: (1) Data about the city, e.g., districts, population, facilities such as hospitals, schools, and sport centers, together with associated details like their capacity, size, and existing services; (2) The city’s risk assessment data, detailing possible emergencies; (3) Knowledge about public health emergencies, e.g., types of emergencies include disease outbreaks, radiation emergencies, and disasters and weather emergencies; the latter include extreme heat, floods, hurricanes, and wild fires; (4) Emergency response knowledge, like types of responders (paramedics, firefighters, police officers, etc.); (5) General knowledge about relevant topics like weather, buildings, or cities.
Having such an ontology for this purpose is possible and realistic. In fact, many cities now compile and store data concerning (1) and (2), and there are vast repositories of online resources and readily available ontologies for (3)–(5); for example, see (OpenSensingCity Project, 2014). Moreover, maintaining such an ontology can be part of the routine preparation measures carried out in emergency response departments when there are no emergencies.
When a disaster happens, an automated focusing engine is fed with relevant parameters, to obtain focusing solutions that can help quickly build tools specific to this kind of disaster. For example, in many emergencies, so-called ‘community assessment’ questionnaires are used to gather critical public health data such as availability of drinking water and electricity in households. Existing guidelines suggest first designing the database that will be used to gather and analyze the data, and using it to guide the questionnaire design. Using an ontology like the one we have described, one could instead propose the database tables automatically. In the event of a disaster like a flood, one could quickly create an application for storing, for example, the current alert level in the different districts of the city and the complete list of operating shelters together with currently available resources, while disregarding any knowledge in the ontology related to other possible disasters, such as wild fires or extreme heat.
2.2. What should focusing be?
The starting point is an ontology, expressed as a theory in some logical formalism over a relational signature. Based on some specification of the intended scope of the system, we need to decide which predicates are to be supported by the system and how the database will interact with the ontology at run time. Let us assume that we only allow ourselves to make two decisions about each predicate: whether it will evolve during the run of the system and so should be stored in the database (dynamic vs static predicates), and whether it should be viewed as a complete representation of the data it represents (closed vs open predicates). The focusing solutions are designed to support these decisions, as well as the specification of the scope of the system. We now informally describe the four components of focusing solutions.
A bare-bones focusing solution is a database schema, collecting all dynamic predicates. Such a system allows storing relevant data and updating it as the reality evolves, but neither gives any completeness guarantees, nor improves the specificity.
We address the data completeness issue by declaring some predicates as closed. Semantically, these declarations trim the set of represented models, keeping only those that agree with the current database instance on the closed predicates. In particular, if we decide to close all dynamic predicates, we end up with a standard database, with the ontology acting as a set of integrity constraints.
The actual intention behind declaring a predicate as closed is to commit the system to keep the stored content of the predicate identical to its real-world interpretation. Thus, closed dynamic predicates should be used to represent changing aspects of reality that are fully observable to the system. In our motivating example, these could include the precise list of districts for which an evacuation was ordered, the open shelters, assignments of personnel to tasks, shelters, etc. Some of these aspects may be fully observable because they are actually controlled by the system (maybe the evacuation orders are issued by the system itself); for others we might rely on updates from some other trusted system.
We address the specificity issue by declaring some predicates as fixed. When we fix a predicate, we require its extension to be determined by the ontology alone. That is, in an intended model, a fixed predicate will contain a tuple of constants only if the ontology alone entails the corresponding atom. When we fix a predicate, it effectively becomes static and closed: even if it is stored in the database, it has only one allowed extension.
If a predicate is not populated by the ontology (which is quite common since most ontologies focus on terminological rather than assertional knowledge), fixing it enforces its extension to be empty. By fixing irrelevant predicates, we avoid reasoning about them and make the ontology more specific for the current situation. This reflects the intuition that the more specific our knowledge of the situation is, the more inferences we can draw from the ontology about our situation.
Fixed predicates will typically include aspects of reality that are captured in the ontology but irrelevant to our application, like the specifics of other types of emergencies not related to the current one. We can also fix predicates that are relevant, but correspond to immutable aspects of reality whose extension is uniquely (and accurately) determined by the ontology. Examples include the list of all districts of a city, or of all hospitals in a district.
A fixed predicate can in principle act as an additional integrity constraint. This happens if declaring the predicate as fixed discards all intended models for some instance, thus making the instance inconsistent. This is undesired, and will be explicitly forbidden in the definition of focusing solution.
Our focusing solutions already provide a database schema and an interface to the ontology that addresses the specificity and data completeness issues, but so far we have been assuming that the designer makes all the choices, as appropriate for the intended scope of the system. In order to support these choices using automated reasoning, or at least check that they are correct, we need the designer to provide a formal specification of the scope. We propose to specify the scope in terms of queries that will be posed when the system is running. We shall have three families of such queries.
Determined queries are the ones for which we want a guarantee that the answers entailed by the data and the ontology are complete with respect to all possible models. Equivalently, the answers do not depend on the concrete model, as long as it is compatible with the database contents and the ontology. Declaring a query as determined should be viewed as a demand of the designer: this query needs to be determined, how do we guarantee this?
Fixed queries generalize fixed predicates: complete answers to these queries are entailed by the ontology alone. Declaring a query as fixed is a decision rather than a demand: We freeze the answers as the ones entailed by the ontology alone, and models yielding more answers should be discarded. The more assumptions of this kind, the easier it is to make other queries determined.
Closed queries generalize closed (dynamic) predicates. For these queries, we are making the assumption that complete answers can be obtained from the data alone, by directly evaluating the query. Declaring a query as closed is a promise made by the designer: I am prepared to maintain the data in such a way that this query is closed. Again, the more assumptions of this kind, the easier it is to make other queries determined.
Note that, in fact, all three families of queries can be viewed as completeness assertions: closed queries talk about completeness of the data, fixed queries talk about completeness of the ontology, and determined queries talk about completeness of the combination of both. Allowing fixed and closed queries, rather than just predicates, gives a bit more flexibility to the designer. For example, it might not be reasonable to assume complete knowledge about all buildings in the city, or all hospitals in the country, but maintaining up-to-date information about those buildings in the city that are hospitals is a perfectly reasonable requirement.
The focusing problem is to find a focusing solution that guarantees that certain queries are determined, assuming that certain other queries are closed or fixed. These solutions are in general not unique and there are many trade-offs involved. For example, the more predicates are closed, the easier it is for a query to be determined, but we must pay the maintenance costs for each predicate we close. It seems desirable to have as few closed predicates as possible, provided we can guarantee that the suitable queries are determined. On the other hand, if suitable queries are already determined, it may be desirable to fix as many predicates as is possible without making any instances inconsistent. In the following subsection we formalize the outcome of this discussion.
2.3. What is a focusing solution?
We shall keep our notions independent from the specific formalisms used to express ontologies and queries.
We assume an infinite set $\mathsf{Const}$ of constants and an infinite set $\mathsf{Rel}$ of relation symbols. Each relation symbol $R$ has a non-negative integer arity, denoted by $\mathit{ar}(R)$. An atom is an expression of the form $R(c_1, \dots, c_n)$, where $R \in \mathsf{Rel}$, $n = \mathit{ar}(R)$, and $c_1, \dots, c_n \in \mathsf{Const}$. A (database) instance is any set of atoms. A signature is any set of relation symbols. An instance $I$ is over a signature $\Sigma$, if $R(c_1, \dots, c_n) \in I$ implies $R \in \Sigma$. The active domain of $I$, denoted by $\mathit{adom}(I)$, is the set of all constants in the atoms of $I$.
We assume an infinite set of theories. Each theory $T$ is associated with a set of instances that are called models of $T$. We write $I \models T$ if $I$ is a model of $T$.
We assume an infinite set of queries. Each query $q$ has a non-negative integer arity, denoted $\mathit{ar}(q)$. Each query $q$ is associated with a function that maps every instance $I$ to an $n$-ary relation $q(I) \subseteq \mathsf{Const}^n$, where $n = \mathit{ar}(q)$. Queries of arity 0 are called Boolean; their associated functions map instances to subsets of $\{()\}$, where $()$ is the empty tuple. A Boolean query $q$ holds in an instance $I$, if $() \in q(I)$. We need the notion of certain answers. Given an $n$-ary query $q$, a theory $T$ and an instance $I$, we let $\mathrm{cert}(q, T, I)$ denote the set of $n$-tuples $\bar{a}$ satisfying the following implication: if $I \subseteq J$ and $J \models T$, then $\bar{a} \in q(J)$.
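The notions of instance, active domain, and certain answers can be illustrated with a small Python sketch. It is only an approximation under stated assumptions: atoms are encoded as pairs of a relation name and a tuple of constants, queries are plain Python functions from instances to sets of tuples, and the (generally infinite) set of models of a theory is replaced by an explicitly given finite list `models`. All function and variable names are hypothetical.

```python
# An atom is a pair (relation_name, tuple_of_constants);
# an instance is a frozenset of atoms.

def active_domain(instance):
    """All constants that occur in some atom of the instance."""
    return {c for (_, args) in instance for c in args}

def is_over(instance, signature):
    """True if every atom uses a relation symbol from the signature."""
    return all(rel in signature for (rel, _) in instance)

def certain_answers(query, models, instance):
    """Tuples returned by the query in every candidate model extending the instance.

    ASSUMPTION: `models` is a finite stand-in for the set of models of a theory.
    Returns None when no candidate model extends the instance, since in that
    case every tuple is vacuously a certain answer.
    """
    extensions = [m for m in models if instance <= m]
    if not extensions:
        return None
    result = set(query(extensions[0]))
    for m in extensions[1:]:
        result &= set(query(m))
    return result

# Toy data: an atomic query over the (hypothetical) predicate Hospital.
def hospital(instance):
    return {args for (rel, args) in instance if rel == "Hospital"}

I = frozenset({("Hospital", ("h1",))})
M1 = frozenset({("Hospital", ("h1",)), ("Hospital", ("h2",))})
M2 = frozenset({("Hospital", ("h1",)), ("Building", ("h1",))})

answers = certain_answers(hospital, [M1, M2], I)
```

In this toy setting only `h1` is a hospital in both candidate models, so it is the only certain answer; `h2` is an answer in `M1` alone.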
A focusing configuration is a database schema together with three sets of queries representing completeness assertions about the data and about the theory, and determinacy assertions.
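Definition 1 (Focusing configuration).
A (focusing) configuration is a tuple $\mathcal{F} = (\Sigma, \mathcal{Q}_{CL}, \mathcal{Q}_{FIX}, \mathcal{Q}_{DET})$, where $\Sigma$ is a signature, and $\mathcal{Q}_{CL}$, $\mathcal{Q}_{FIX}$, and $\mathcal{Q}_{DET}$ are sets of queries. An instance is legal for $\mathcal{F}$ in case it is over $\Sigma$.
Let us explain how the four ingredients of a focusing configuration work. First, the signature $\Sigma$ is a database schema that determines database instances of interest: an instance is legal for $\mathcal{F}$ if it is over the signature $\Sigma$.
The queries from $\mathcal{Q}_{CL}$ specify completeness assertions about data, effectively restricting the set of models represented by a legal instance as follows.
Definition 2 (Query-based Completeness).
Given a theory $T$, an instance $I$, and a set $Q$ of queries, we let $\mathrm{Mod}_{CL}(T, I, Q)$ be the set of all instances $J$ such that
$J \models T$, $I \subseteq J$, and $q(J) = q(I)$ for all $q \in Q$.
Intuitively, $\mathrm{Mod}_{CL}(T, I, Q)$ contains exactly the models of $T$ and $I$ that provide no new information about the queries in $Q$ compared to the information given by $I$ alone.
The queries in $\mathcal{Q}_{FIX}$ specify completeness assertions about the theory, further restricting the set of represented models.
Definition 3 (Query-based Fixing).
For a theory $T$, an instance $I$, and a set $Q$ of queries, we let $\mathrm{Mod}_{FIX}(T, I, Q)$ be the set of all instances $J$ such that
$J \models T$, $I \subseteq J$, and $q(J) = \mathrm{cert}(q, T, \emptyset)$ for all $q \in Q$, where $\mathrm{cert}(q, T, \emptyset)$ is the set of certain answers to $q$ over $T$ and the empty instance.
Intuitively, $\mathrm{Mod}_{FIX}(T, I, Q)$ contains exactly the models of $T$ and $I$ that provide no new information about the queries in $Q$ compared to the information given by $T$ alone.
Thus, a configuration $\mathcal{F} = (\Sigma, \mathcal{Q}_{CL}, \mathcal{Q}_{FIX}, \mathcal{Q}_{DET})$ tells us to consider only database instances over $\Sigma$ as legal. For each such instance $I$, we are to assume that the information retrieved from $I$ alone by the queries in $\mathcal{Q}_{CL}$ is complete, and we are to restrict our attention to models where the answers to queries in $\mathcal{Q}_{FIX}$ are frozen to the facts inferred from the initial theory $T$ (without any data). These are the intended models represented by a concrete legal database instance $I$.
Definition 4 (Intended models).
For a theory $T$, a configuration $\mathcal{F} = (\Sigma, \mathcal{Q}_{CL}, \mathcal{Q}_{FIX}, \mathcal{Q}_{DET})$, and an instance $I$ over $\Sigma$, let
$\mathrm{Mod}_{\mathcal{F}}(T, I) = \mathrm{Mod}_{CL}(T, I, \mathcal{Q}_{CL}) \cap \mathrm{Mod}_{FIX}(T, I, \mathcal{Q}_{FIX})$.
We are now ready to provide the definition of focusing, which makes the role of $\mathcal{Q}_{DET}$ precise: we shall demand that freezing answers to the queries from $\mathcal{Q}_{FIX}$ does not affect consistency of instances and that the answers to the queries in $\mathcal{Q}_{DET}$ coincide over all intended models.
Definition 5 (Focusing).
A focusing solution for a theory $T$ is a configuration $\mathcal{F} = (\Sigma, \mathcal{Q}_{CL}, \mathcal{Q}_{FIX}, \mathcal{Q}_{DET})$ such that the next two conditions are obeyed for all instances $I$ over $\Sigma$:
(1) if $\mathrm{Mod}_{CL}(T, I, \mathcal{Q}_{CL}) \neq \emptyset$, then $\mathrm{Mod}_{\mathcal{F}}(T, I) \neq \emptyset$;
(2) for all $q \in \mathcal{Q}_{DET}$, and all $J, J' \in \mathrm{Mod}_{\mathcal{F}}(T, I)$, we have $q(J) = q(J')$.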
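To make the two conditions of the focusing definition concrete, here is a minimal brute-force sketch in Python. It rests on several assumptions that are not part of the framework itself: the models of the theory are given as a finite list `models`, only finitely many legal instances are checked, a closed query is read as "the query returns the same answers on the model as on the instance", and a fixed query is read as "the query returns on the model exactly the certain answers over the theory with empty data". All names are hypothetical.

```python
def pred(name):
    """Atomic query returning the extension of one predicate."""
    return lambda inst: {args for (rel, args) in inst if rel == name}

def cert(query, models, instance):
    """Certain answers over a finite list of candidate models (ASSUMPTION).
    Returns the empty set if no candidate model extends the instance."""
    answer_sets = [set(query(m)) for m in models if instance <= m]
    return set.intersection(*answer_sets) if answer_sets else set()

def mod_closed(models, instance, closed):
    """Models extending the instance that add no answers to closed queries."""
    return [m for m in models
            if instance <= m and all(q(m) == q(instance) for q in closed)]

def mod_fixed(models, instance, fixed):
    """Models extending the instance whose answers to fixed queries equal
    the answers entailed by the theory alone (empty data)."""
    frozen = [cert(q, models, frozenset()) for q in fixed]
    return [m for m in models
            if instance <= m and all(q(m) == a for q, a in zip(fixed, frozen))]

def intended(models, instance, closed, fixed):
    fixed_ok = mod_fixed(models, instance, fixed)
    return [m for m in mod_closed(models, instance, closed) if m in fixed_ok]

def is_focusing_solution(models, legal_instances, closed, fixed, determined):
    for inst in legal_instances:
        good = intended(models, inst, closed, fixed)
        # Condition 1: fixing must not make a consistent instance inconsistent.
        if mod_closed(models, inst, closed) and not good:
            return False
        # Condition 2: determined queries agree on all intended models.
        for q in determined:
            if len({frozenset(q(m)) for m in good}) > 1:
                return False
    return True

# Toy theory "every open shelter is a shelter" over a single constant s1,
# represented by its three models restricted to that constant.
S, O = ("Shelter", ("s1",)), ("Open", ("s1",))
models = [frozenset(), frozenset({S}), frozenset({S, O})]
legal = [frozenset(), frozenset({O})]          # instances over {Open}
open_q, shelter_q = pred("Open"), pred("Shelter")

ok = is_focusing_solution(models, legal, [open_q], [], [open_q])
undetermined = is_focusing_solution(models, legal, [open_q], [], [shelter_q])
inconsistent = is_focusing_solution(models, legal, [open_q], [shelter_q], [shelter_q])
```

In the toy example, closing `Open` makes the query `Open` determined but leaves `Shelter` undetermined, and fixing `Shelter` violates the first condition: the instance containing `Open(s1)` is consistent before fixing but has no intended model afterwards, which is the situation where a fixed predicate would act as an additional integrity constraint.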
Let us see how this definition captures the scenario discussed in the previous subsection. Suppose that the designer specifies a set $Q_D$ of determined queries that need to be guaranteed, as well as a set $Q_C$ of queries that are promised to be closed, and a set $Q_F$ of queries that are chosen to be interpreted as fixed. We now want to find a correct focusing solution of the form
$(\Sigma, \mathcal{Q}_{CL}, \mathcal{Q}_{FIX}, Q_D)$
with $Q_C \subseteq \mathcal{Q}_{CL}$ and $Q_F \subseteq \mathcal{Q}_{FIX}$. There may be many such focusing solutions, and they are all good in the sense that they guarantee correct answers to $Q_D$. This leaves some space to accommodate additional preferences of the designer; we discuss it in the following subsection.
2.4. Getting and using focusing solutions
We shall now identify key reasoning problems crucial in obtaining and using focusing solutions. For each problem the input includes a focusing configuration ; some problems additionally input theories, database instances, and queries. To be able to speak about concrete formalisms, we parameterize the problems by query and ontology languages. We write to indicate that the language used for expressing theories is . Similarly, we use , , , and for .
The main problem is recognizing focusing solutions among focusing configurations.
|Input:||A pair with , and|
|Question:||Is a focusing solution to ?|
Thus, in the focusing problem a candidate focusing configuration is given, and the task is to decide if it is a focusing solution. In the scenario discussed previously, the input consists of closed queries , fixed queries , and determined queries , and the output is a focusing solution with and . If we consider for and a query language that gives us a finite number of candidates, like atomic queries (which covers the basic scenario with closed and fixed predicates), the recognition problem can be used directly in the search for such a solution by applying exhaustive search. This search can be guided by some preferences of the designer. For example, a basic strategy could be to minimize the set of closed queries, and then maximize the set of fixed ones. More sophisticated strategies could involve a specified order in which the designer prefers to close predicates, reflecting, for instance, the cost of maintaining them (size statistics, availability, acquisition costs, etc). One could also consider semi-automated approaches, like a dialog approach where successive solutions are proposed to the designer, who adjusts the specification, or even the ontology, and accepts some suggested choices while rejecting others, thus converging to a satisfactory focusing solution. We leave investigating such strategies to future research.
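The basic strategy mentioned above (minimize the closed set, then maximize the fixed set) can be sketched as an exhaustive search over predicates. This is only an illustration under assumptions: `is_solution` is a hypothetical oracle standing for a decision procedure for the recognition problem, minimality and maximality are measured by cardinality, and each optional predicate is either closed, fixed, or left open. None of these choices is prescribed by the framework.

```python
from itertools import combinations

def subsets_by_size(items, ascending=True):
    """Enumerate subsets of `items`, smallest first or largest first."""
    sizes = range(len(items) + 1) if ascending else range(len(items), -1, -1)
    for k in sizes:
        for combo in combinations(items, k):
            yield set(combo)

def search_focusing_solution(predicates, required_closed, required_fixed, is_solution):
    """Return (closed, fixed) with as few closed predicates as possible and,
    among those, as many fixed predicates as possible; None if no candidate
    is accepted by the oracle."""
    optional = [p for p in predicates
                if p not in required_closed and p not in required_fixed]
    for extra_closed in subsets_by_size(optional, ascending=True):
        closed = set(required_closed) | extra_closed
        rest = [p for p in optional if p not in extra_closed]
        for extra_fixed in subsets_by_size(rest, ascending=False):
            fixed = set(required_fixed) | extra_fixed
            if is_solution(closed, fixed):
                return closed, fixed
    return None

# Toy oracle (purely illustrative): the configuration is accepted only if
# Open is closed and Shelter is not fixed.
def toy_oracle(closed, fixed):
    return "Open" in closed and "Shelter" not in fixed

result = search_focusing_solution(
    ["Open", "Shelter", "Wildfire"], set(), set(), toy_oracle)
```

With the toy oracle, the search closes only `Open` and fixes only `Wildfire`. The number of oracle calls is exponential in the number of optional predicates, so this is practical only for small signatures or with additional guidance from the designer.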
An additional criterion that might be useful in the search for suitable focusing solutions is the existence of consistent database instances. This is embodied in the following decision problem. Here stands for a collection of parameters.
|Input:||A triple , where|
|Question:||Is for each legal for ?|
Obviously a single consistent database instance does not make a focusing solution very useful, but the criterion can help eliminate some utterly useless solutions.
Assuming that a satisfactory focusing solution is found, how do we use it? Two reasoning tasks are crucial in the operation of the system resulting from focusing. First, whenever a tuple is inserted, deleted, or updated, the system needs to check that the resulting database instance is still consistent. This makes the feasibility of the following problem essential.
|Input:||A triple , where|
Finally, the system is there to be queried. This makes the following entailment problem relevant.
|Input:||A tuple , where|
|Question:||Is true in all ?|
Note that is a special case of non-, but not conversely, so complexity of the two problems might differ.
3. Concrete problems
To illustrate some concrete settings that may be useful, in this section we instantiate the reasoning problems with some selected languages. We discuss how the problems can be solved in these concrete cases, and provide complexity results for them.
3.1. Ontology and Query Languages
Here we recall the concrete formalisms for expressing theories and queries that we study. As ontology languages we look at description logics (DLs), and as query languages we consider instance queries, atomic queries, and conjunctive queries (CQs).
Description Logics (DLs)
DLs are a family of logics specifically tailored for writing ontologies (Baader et al., 2017). Most DLs are fragments of first-order logic that allow only unary and binary relation symbols. This and other restrictions allow DLs to be equipped with a special syntax, in which formulas can be written more concisely. We now introduce the main DL of this paper, called .
Let be a countably infinite set of unary relation symbols, called concept names, and let be a countably infinite set of binary relation symbols, called role names. If , then and the expression are roles ( is also called the inverse of ). For a role , we let . We let . The set of concepts is defined inductively as follows: (a) every concept name is a concept; (b) the expressions and are concepts; (c) the expression , where , is also a concept (called nominal); (d) if are concepts, and is a role, then , , , and are also concepts. A concept inclusion is an expression of the form , where are concepts. A role inclusion is an expression of the form , where are roles. A functionality assertion is an expression of the form , where is a role. An () ontology is a finite set of concept inclusions, role inclusions, and functionality assertions. The DL that is obtained by disallowing functionality assertions, role inclusions, or nominals is indicated by, respectively, dropping ‘’, ‘’, or ‘’ from its name. An ontology is in the DL , if it contains only concept inclusions, and they are built using only the constructs and (i.e., nominals, , and , as well as role inclusions and functionality assertions, are forbidden). We use and to denote the sets of concept names and role names that appear in , respectively. We let .
The semantics of ontologies is given using instances, defined in Section 2.3. Assume an instance and an ontology . We define a function that maps every concept to a set , and every role to a relation . For a concept name , and a role name , we let and . The function is then extended to the remaining concepts and roles as follows:
We say is a model of , if the following are satisfied: (1) for all concept and role inclusions , and (2) and imply , for all and . We note that by defining the semantics using database instances, instead of general first-order structures (or interpretations, as they are called in DLs), we are effectively interpreting ontologies under the Standard Name Assumption (SNA). That is, the domain of interpretation is always the set , and the interpretation of constants is given by the identity function. However, the active domain of the instance, as defined in Section 2.3, may well be a proper subset of .
In order to simplify presentation, when providing upper bounds we can concentrate w.l.o.g. on ontologies in normal form, which is defined as follows. A simple concept is any concept in . We use to denote the set of simple concepts that appear in . An ontology is in normal form if all its statements are of one of the following forms:
where are simple concepts, , , and are roles. It is well known that any ontology can be transformed into an ontology in normal form such that and have the same models up to the signature of .
We will also study DLs whose definition is based on the above normal form. In particular, ontologies are ontologies in normal form that additionally satisfy the following:
for all , we have ,
for all , we have , and
functional roles have no subroles: and imply .
The DL is obtained by applying the above restrictions, but additionally prohibiting nominals, as well as and ; that is, we only have concept inclusions of the form and the two forms (i) and (ii) shown above. Observe that
We let be the class of conjunctive queries (primitive positive first-order formulas) over the signature , with the usual semantics. Occasionally we talk about , the class of unions of conjunctive queries (UCQs), corresponding to positive existential first-order formulas. We also consider the class of atomic queries, and the class of instance queries, that is, atomic queries over concepts. If , we can view as a set of predicates.
3.2. Recognizing focusing solutions
In this section we deal with recognizing focusing solutions for ontologies. The two conditions in the definition of focusing solutions (Definition 5) are closely related to natural variants of two classical problems in description logics.
The second condition of Definition 5 (determinacy) boils down to a variant of the classical query entailment problem that only considers models where selected predicates have finite extensions (the remaining predicates may have a finite or an infinite extension).
|Input:||A tuple where|
|Question:||Is satisfied in every model of such that
This mixed variant generalizes the usual finite and unrestricted variants of OMQA. Recall that a logic has finite controllability if, for each theory expressed in this logic, non-entailed queries have finite counter-models. For logics enjoying finite controllability, the three variants of OMQA coincide. This is the case for , which makes our problem 2ExpTime-complete (Calvanese et al., 2009).
In the first condition of Definition 5 (fixing must preserve consistency), the computational crux of the matter is the following problem.
|Input:||A tuple where|
|Question:||Do all over with admit with ?|
The nullability problem is closely related to the query emptiness problem, which is known to be NExpTime-complete for atomic queries, ontologies, and no closed predicates (Baader et al., 2016). We show that nullability of atomic queries for with closed instance queries (i.e., closed concepts) is in . Allowing closed roles rather quickly leads to undecidability, but for sufficiently restricted DLs we regain decidability.
Theorem 1 ().
The following hold:
Given an ontology , , , and , we need to check that for each over with non-empty there exists with . We first show that if this is not the case, then there is a witnessing instance over with .
In what follows, a type is any subset of such that and . For an instance and , the type of in is the set of concepts such that .
Let be an instance over such that and for all . Let . Let be obtained from by identifying constants that have the same type in ; when some constants are identified, all edges incident with them become incident with the resulting constant; nominals remain themselves.
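The quotient construction used here, identifying constants that have the same type while leaving nominals untouched, can be sketched in Python as follows. The sketch makes simplifying assumptions: the type of a constant is taken to be just the set of concept names (unary predicates) that hold for it, rather than a set of simple concepts, and the encoding of atoms and all names are hypothetical.

```python
def type_of(instance, constant):
    """Concept names that hold for the constant (simplified notion of type)."""
    return frozenset(rel for (rel, args) in instance if args == (constant,))

def quotient_by_type(instance, nominals=frozenset()):
    """Identify constants with the same type; nominals remain themselves.

    Edges incident with identified constants become incident with the
    chosen representative of their type.
    """
    constants = {c for (_, args) in instance for c in args}
    representative = {}  # type -> constant chosen to represent it
    rename = {}
    for c in sorted(constants):
        if c in nominals:
            rename[c] = c
        else:
            rename[c] = representative.setdefault(type_of(instance, c), c)
    return frozenset((rel, tuple(rename[c] for c in args))
                     for (rel, args) in instance)

# Toy instance: a and b share the type {A}; c and d share the type {B}.
I = frozenset({
    ("A", ("a",)), ("A", ("b",)),
    ("B", ("c",)), ("B", ("d",)),
    ("r", ("a", "c")), ("r", ("b", "d")),
})

merged = quotient_by_type(I)
with_nominal = quotient_by_type(I, nominals=frozenset({"b"}))
```

Since the result has at most one non-nominal constant per type, its size is bounded in terms of the number of types and nominals, which is what makes a small witnessing instance possible.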
It is easy to see that . A suitable witness is obtained from by identifying constants in that have the same type in . By construction, is an extension of that is a model of and preserves closed concepts.
Suppose that for some . We shall turn into such that . Let us replace the copy of contained in by a copy of . That is, . We now lift the interpretation of concepts and roles from . For , let be the unique constant in that has the same type in as . For , let . We let iff for and iff for . By construction, is an extension of that is a model of , preserves closed concepts, and realizes the same types. It follows that and .
We can now describe the algorithm. We guess universally an instance over with . We should accept if either or , where is and is . Both these checks amount to deciding if there is a model of a given ontology that extends a given instance while preserving a given set of closed concepts. For ontologies this problem is in NP in terms of data complexity (Lutz et al., 2015). More precisely, it can be solved by a nondeterministic algorithm with running time , which is sufficient to guarantee the upper bound. One way to do this is as follows. Guess the restriction of to . Use type elimination to compute realizable types. Start from the set of types that are realized in or do not contain closed concepts. Iteratively remove types that cannot have their existential restrictions satisfied using available types; for types realized in ignore existential restrictions that are already fulfilled in . When the set of available types stabilizes, accept iff it contains each type realized in .
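The type elimination step of the algorithm above can be illustrated with a deliberately simplified Python sketch. It is not the procedure from the proof: it assumes that types are sets of concept names, that the ontology contributes only axioms of the two shapes "A is included in exists r.B" and "A is included in forall r.B", and it ignores inverse roles, nominals, closed concepts, and the existential restrictions already fulfilled in the given instance. All names are hypothetical.

```python
def compatible(t, role, succ, universals):
    """succ may serve as a role-successor of t: every universal restriction
    (a, role, b) triggered by t is satisfied by succ."""
    return all(b in succ for (a, r, b) in universals if r == role and a in t)

def eliminate_types(types, existentials, universals):
    """Iteratively remove types that cannot have an existential restriction
    satisfied using the types that are still available."""
    available = set(types)
    changed = True
    while changed:
        changed = False
        for t in list(available):
            for (a, role, b) in existentials:
                if a in t and not any(
                        b in s and compatible(t, role, s, universals)
                        for s in available):
                    available.discard(t)
                    changed = True
                    break
    return available

def accepts(available, realized):
    """Accept iff every type realized in the instance survived elimination."""
    return set(realized) <= available

# Toy input: A requires an r-successor in B, and B requires an r-successor
# in C, but no candidate type contains C (C is assumed unsatisfiable).
candidates = [frozenset(), frozenset({"A"}), frozenset({"B"}), frozenset({"A", "B"})]
existentials = [("A", "r", "B"), ("B", "r", "C")]

survivors = eliminate_types(candidates, existentials, universals=[])
```

In the toy run the types containing `B` are removed first, because no available type contains `C`; after that, the type `{A}` loses all possible successors and is removed as well, leaving only the empty type.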
Undecidability is obtained via a reduction from the halting problem for deterministic Turing machines, relying on the ability to enforce that the counter witness to nullability is a grid, and the upper bound for the third problem is obtained by a polynomial reduction to the first problem (see Appendix for details). ∎
Let us now see that the two introduced problems indeed help solve the focusing problem, and that lower bounds and undecidability propagate to focusing as well.
Theorem 2 ().
are 2ExpTime-complete, and
We begin with the upper bound for the first problem. Let be an ontology and let with , , , and . As a first step, we reduce to the case with only one fixed query , that additionally satisfies . Let be the following extension of . For each , introduce a fresh concept name . If and , axiomatize as
If and , axiomatize as
Finally, add yet another fresh concept name , axiomatized as
Let . By construction, is a focusing solution for iff is a focusing solution for . Notice that has polynomial size, but computing it involves finding for each . As we have already mentioned, for this is in 2ExpTime (Calvanese et al., 2009), so we are within the intended complexity bounds.
Thus, without loss of generality we can assume that our input is and such that and . For such inputs, the first condition of Definition 5 is an instance of , which we shall prove to be in (Theorem 1). For the second condition of Definition 5, we first observe that we can eliminate the single fixed query without affecting the set of intended models: it suffices to add the concept inclusion to . Then, to reduce the second condition to mixed query answering, we first introduce fresh duplicates and for all and that do not occur in . Next, we construct and for all by replacing original predicates in and with their duplicates. It is straightforward to see that for all and iff over all models of with finite predicates . Query containment can be reduced to query answering in the usual way, by incorporating the query into the ontology using nominals; the presence of finiteness assumptions (in both problems) does not affect the construction. This way we have reduced the second condition to polynomial-size instances of mixed query answering. As we have argued, the latter is in 2ExpTime for , which gives the desired upper bound for the focusing problem.
To obtain the upper bound for the second problem, note that in the absence of fixed predicates the first condition of Definition 5 trivializes, and the second condition can be checked exactly as before, but without the first preprocessing step.
For the third problem the upper bound follows directly from either of the previous cases. To show the lower bound, we reduce from the standard unrestricted query answering problem for (Ngo et al., 2016). Let be an ontology and let . Let be the ontology obtained from by introducing a fresh concept name , axiomatized with and for all , , and replacing each concept inclusion with . Each model of can be turned into a model of by letting ; each model of can be turned into a model of by restricting it to . Consequently, , but there is a model of that satisfies , because outside of no restrictions are imposed by . It follows that is a focusing solution for iff entails .
The goal of this section is to prove the following theorem.
Theorem 3.
is 2ExpTime-complete in combined complexity, and coNP-complete in data complexity.
Let be an KB, a set of CQs, a Boolean CQ, and an instance. Unless specified otherwise, we will always consider -instances where is the signature of , i.e., the set of concept and role names occurring in . Our goal is to decide if for every instance , . This is clearly equivalent to the problem of finding a counter model for : an instance such that . Moreover, it suffices to consider counter models that are almost forests. An instance is a tree extension of if such that and is a collection of trees of bounded degree in which elements of occur only in leaves (and never in roots, even if they are also leaves). Note that the partition of into and is unique. We call the forest of .
Lemma 0.
The following are equivalent:
There exists such that ;
There exists such that and is a tree extension of .
Next we observe that for every query can be reformulated as a non-entailment problem of a UCQ capturing ‘bad matches’ of queries in . Intuitively, a match of a query in is bad if , or if it maps an answer variable to some not in . To this end, we assume below that we have two concept names and such that for each instance in we have and .
Lemma 0.
For every in , let
Then iff and .
Putting all the pieces together, the problem thus reduces to deciding the existence of a tree-like model of that extends and is a counter model for the UCQ .
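To make the notion of a counter model for a UCQ concrete, the following minimal sketch checks by brute force whether a finite instance admits no match of any disjunct. It is an illustration under stated assumptions, not the procedure developed in this section: the fact and atom encodings and the names `has_match` and `is_countermodel` are hypothetical, every query term is treated as a variable, and the check covers only the query part (it does not test that the instance is a model of the ontology or a tree extension).

```python
from itertools import product

# Assumed encoding (hypothetical): a fact is a tuple (predicate, c1, ..., ck),
# an instance is a finite set of facts, a Boolean CQ is a list of atoms
# (predicate, v1, ..., vk) over variables, and a UCQ is a list of Boolean CQs.

def active_domain(instance):
    """Collect all constants that occur in some fact of the instance."""
    return {c for fact in instance for c in fact[1:]}

def has_match(cq, instance):
    """Return True iff some assignment of the variables of cq to domain
    elements maps every atom of cq to a fact of the instance."""
    variables = sorted({v for atom in cq for v in atom[1:]})
    domain = sorted(active_domain(instance))
    # Brute force: try every assignment of variables to domain elements.
    for values in product(domain, repeat=len(variables)):
        h = dict(zip(variables, values))
        if all((atom[0],) + tuple(h[v] for v in atom[1:]) in instance
               for atom in cq):
            return True
    return False

def is_countermodel(ucq, instance):
    """Return True iff no disjunct of the UCQ has a match in the instance."""
    return not any(has_match(cq, instance) for cq in ucq)

# Small usage example with made-up facts.
instance = {("A", "a"), ("r", "a", "b"), ("B", "b")}
q1 = [("r", "x", "y"), ("B", "y")]   # an r-edge into a B-element
q2 = [("r", "x", "x")]               # an r-self-loop

print(is_countermodel([q2], instance))      # True: there is no self-loop
print(is_countermodel([q1, q2], instance))  # False: q1 matches via x=a, y=b
```

The search enumerates all assignments, so its running time is exponential only in the number of variables per disjunct and polynomial in the size of the instance once that number is fixed.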
Following an approach similar to that used in (Eiter et al., 2012; Lutz et al., 2015), we first establish the coNP upper bound in data complexity: we decompose counter models and then use a guess-and-check algorithm to find such decompositions.
Given an instance and an element , the -type of in is defined as . A (realizable) unary type for is a subset of such that there is a model of and with . Further, for unary types and role we write if there is a model of and such that , and . We now extend the notion of types to capture small substructures of tree extensions of .
Definition 0.
Let . A -type for is a finite tree extension of such that
the forest of is a single tree of out-degree at most and depth at most ;
for each role , and all , ;
for each at depth at most and each in there exists such that ;
for all in .
We write for the root of the unique tree in the forest of .
For an element , we use to denote the subinstance of induced by and the subtree of depth at most of the only tree in the forest of , rooted at . We write if there is an isomorphism from to such that .
Definition 0.
A set of -types for is coherent if the following are satisfied:
for all , and ;
for every and in there exists such that there is an -edge from to and ;
for every and every successor of , there is such that with .
Let be a union of Boolean CQs, each using at most variables. Then iff for each coherent set of -types for it holds that .
Thus, we can test whether , by universally guessing a coherent set of types and verifying that . The size of an -type for is at most . Because all -types in a coherent set coincide over , up to isomorphism there are at most different -types. Hence, we can impose the same bound on the size of the sets guessed in the algorithm. Further, checking whether a coherent set of -types satisfies a union of Boolean CQs, each using at most variables, can be done in time . We apply this algorithm to . We can take for