Consistent Query Answering under Spatial Semantic Constraints

M. Andrea Rodríguez
Universidad de Concepción, Chile
andrea@udec.cl
   Leopoldo Bertossi
Carleton University, Canada
bertossi@scs.carleton.ca
Faculty Fellow of the IBM Center for Advanced Studies. Also affiliated to Universidad de Concepción, Chile.
   Mónica Caniupán
Universidad del Bío-Bío, Chile
mcaniupa@ubiobio.cl
Abstract

Consistent query answering is an inconsistency-tolerant approach to obtaining semantically correct answers from a database that may be inconsistent with respect to its integrity constraints. In this work we formalize the notion of consistent query answer for spatial databases and spatial semantic integrity constraints. To do this, we first characterize conflicting spatial data and then define admissible instances that restore consistency while staying close to the original instance. In this way we obtain a repair semantics, which is used as an instrumental concept to define and possibly derive consistent query answers. We then concentrate on a class of spatial denial constraints and spatial queries for which there exists an efficient strategy to compute consistent query answers. This study applies inconsistency tolerance in spatial databases, raising research issues that shift the goal from the consistency of a spatial database to the consistency of query answering.

1 Introduction

Consistency in database systems is defined as the satisfaction by a database instance of a set of integrity constraints (ICs) that restricts the admissible database states. Although consistency is a desirable and usually enforced property of databases, it is not uncommon to find inconsistent spatial databases due to data integration, unenforced integrity constraints, legacy data, or time-lag updates. In the presence of inconsistencies, there are alternative courses of action: (a) ignore inconsistencies, (b) restore consistency via updates on the database, or (c) accept inconsistencies, without changing the database, but compute the “consistent or correct” answers to queries. For many reasons, the first two alternatives may not be appropriate [6], especially in the case of virtual data integration [5], where centralized and global changes to the data sources are not allowed. The latter alternative has been investigated in the relational case [4, 10]. In this paper we explore this approach in the spatial domain, i.e., for spatial databases and with respect to spatial semantic integrity constraints (SICs).

Extracting consistent data from inconsistent databases could be qualified as an “inconsistency tolerant” approach to querying databases. A piece of data will be part of a consistent answer if it is not logically related to the inconsistencies in the database with respect to its set of ICs. We introduce this idea using an informal and simple example.

Example 1

Consider a database instance with a relation LandP, denoting land parcels, with a thematic attribute (idl) and a spatial attribute, geometry, of data type polygon. An IC stating that the geometries of two different land parcels must be disjoint or just touch, i.e., they cannot internally intersect, is expected to be satisfied. However, the instance in Figure 1 does not satisfy this IC and is therefore inconsistent: the land parcels with idls and overlap. Notice that these geometries are only partially in conflict, and what is not in conflict can be considered consistent data.

Figure 1: An inconsistent spatial database (relation LandP, with attributes idl and geometry)

Suppose that a query requests all land parcels whose geometries intersect with a query window, which represents the spatial region shown in Figure 1 as a rectangle with dashed borders. Although the database instance is inconsistent, we can still obtain useful and meaningful answers. In this case, only the intersection of and is in conflict, but the rest of both geometries can be considered consistent and should be part of any “database repair” if we decide to restore consistency by means of minimal geometric changes. Thus, since the non-conflicting parts of geometries and intersect the query window, we would expect an answer including land parcels with identities and .
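The intuition behind this example can be sketched in code. The following Python fragment is only a minimal illustration and not the method developed later in the paper: geometries are approximated by sets of unit grid cells (an assumption made to keep the example self-contained), and the parcel identifiers `p1`, `p2`, `p3`, the helper names, and the data are all hypothetical.

```python
from itertools import product

def cells(x1, y1, x2, y2):
    """Hypothetical helper: unit grid cells covering [x1, x2) x [y1, y2)."""
    return frozenset(product(range(x1, x2), range(y1, y2)))

def non_conflicting_part(idl, parcels):
    """Remove from a parcel every cell it shares with a different parcel."""
    g = parcels[idl]
    for other, h in parcels.items():
        if other != idl:
            g = g - h
    return g

def range_query(parcels, window):
    """Ids of parcels whose non-conflicting part intersects the query window."""
    return sorted(i for i in parcels
                  if non_conflicting_part(i, parcels) & window)

# Hypothetical data: p1 and p2 overlap on the column x = 3.
parcels = {
    "p1": cells(0, 0, 4, 4),
    "p2": cells(3, 0, 7, 4),
    "p3": cells(8, 0, 10, 4),  # not in conflict, outside the window
}
window = cells(1, 1, 6, 3)
answer = range_query(parcels, window)
```

In this toy instance both conflicting parcels are still returned, because the parts of their geometries that are not in conflict reach into the window.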

If we concentrate just on (in)consistency issues in databases (leaving aside consistent query answering for a moment), we see that, in contrast to (in)consistency handling in relational databases, which has been extensively investigated, not much research of this kind has been done for spatial databases. In particular, there is not much work on the formalization of semantic spatial ICs, the satisfaction of ICs, and the checking and maintenance of ICs in the spatial domain. However, some papers address the specification of some kinds of integrity constraints [8, 20], and the checking of topological consistency at multiple representations and for data integration [13, 14, 31].

More recently, [12] proposes qualitative reasoning with description logic to describe consistency between geographic data sets. In [22] a set of abstract relations between entity classes is defined; and they could be used to discover redundancies and conflicts in sets of SICs. A proposal for fixing (changing) spatial database instances under different types of spatial inconsistencies is given in [29]. According to it, changes are applied over geometries in isolation; that is, they are not analyzed in combination with multiple SICs. In [27] some issues around query answering under violations of functional dependencies involving geometric attributes were raised. However, the problem of dealing with an inconsistent spatial database, while still obtaining meaningful answers, has not been systematically studied so far.

Consistent query answering (CQA) from inconsistent databases as a strategy of inconsistency tolerance has an extensive literature (cf. [4, 6, 10] for surveys). It was introduced and studied in the context of relational databases in [2], where consistent answers to queries are defined as those that are invariant under all the minimal forms of restoring consistency of the original database. Thus, the notion of repair of an instance with respect to a set of ICs becomes a fundamental concept for defining consistent query answers. A repair semantics defines the admissible and consistent alternative instances to an inconsistent database at hand. More precisely, a repair of an inconsistent relational instance is a consistent instance obtained from by deleting or inserting whole tuples. The set of tuples by which and differ is minimal under set inclusion [2]. Other types of repair semantics have been studied in the relational case. For example, in [16, 32] repairs are obtained by allowing updates of attribute values in tuples.
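As a small illustration of the relational notion just described, the sketch below enumerates repairs by brute force for a single key constraint, where consistency can be restored by tuple deletions only, so that repairs are the maximal consistent subsets of the instance. The relation, its contents, and the function names are hypothetical, and the exponential enumeration is meant only to make the definition concrete.

```python
from itertools import combinations

def satisfies_key(tuples):
    """True if no two tuples share the key (first field) with different values."""
    seen = {}
    for key, value in tuples:
        if seen.setdefault(key, value) != value:
            return False
    return True

def repairs(instance):
    """All consistent subsets that are maximal under set inclusion."""
    instance = frozenset(instance)
    consistent = [frozenset(s)
                  for r in range(len(instance) + 1)
                  for s in combinations(instance, r)
                  if satisfies_key(s)]
    return [s for s in consistent if not any(s < t for t in consistent)]

def consistent_answers(instance):
    """Tuples that belong to every repair (query: return all tuples)."""
    return frozenset.intersection(*repairs(instance))

# Hypothetical instance: the key "ann" is associated with two values.
emp = {("ann", 10), ("ann", 20), ("bob", 30)}
```

Here the two repairs each keep one of the conflicting tuples, and only the tuple for `bob` is a consistent answer.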

In comparison to the relational case, spatial databases offer new alternatives and challenges when defining a repair semantics. This is due, in particular, to the use of complex attributes to represent geometries, their combination with thematic attributes, and the nature of spatial (topological) relations.

In this work we define a repair semantics for spatial databases with respect to a subset of spatial semantic integrity constraints (a.k.a. topo-semantic integrity constraints) [29], which impose semantic restrictions on topological predicates and combinations thereof. In particular, we treat spatial semantic integrity constraints that can be expressed as denial constraints. For example, they can specify that “two land parcels cannot internally intersect”. This class of constraints is neither standardized nor integrated into current spatial database management systems (DBMSs); the constraints depend on the application and must be defined and handled by the database developers. They are very important because they capture the semantics of the intended models. Spatial semantic integrity constraints will be simply called spatial integrity constraints (SICs). Other spatial integrity constraints [11] are domain (topological or geometric) constraints, and they refer to the geometry, topology, and spatial relations of the spatial data types. One of them could specify that “polygons must be closed”. Many of these geometric constraints are now commonly integrated into spatial DBMSs [23].

A definition of a repair semantics for spatial DBs and CQA for spatial range queries was first proposed in [28], where we discussed the idea of shrinking geometries to resolve conflicts between tuples and applied it to CQA for range queries. In this paper we complement and extend our previous work with the following main contributions: (1) We formalize the repair semantics of a spatial database instance under violations of SICs. This is done through virtual changes of geometries that participate in violations of SICs. Unlike [28], we identify the admissible local transformations and we use them to provide an inductive definition of database repair. (2) Based on this formalization, a consistent answer to a spatial query is defined as an answer obtained from all the admissible repairs. Extending the results in [28], we now define CQA not only for range queries but also for spatial join queries. (3) Although the repair semantics and consistent query answers can be defined for a fairly broad class of SICs and queries, naive algorithms that compute consistent answers by computing all repairs take exponential time, as will become clear later. For this reason, CQA for a relevant subset of SICs and range and join queries is done via a core computation. This amounts to querying directly the intersection of all repairs of an inconsistent database instance, but without actually computing the repairs. We show cases where this core can be specified as a view of the original, inconsistent database. (4) We present an experimental evaluation with real and synthetic data sets that compares the cost of CQA with the cost of evaluating queries directly over the inconsistent database (i.e., ignoring inconsistencies).

The rest of the paper is organized as follows. In Section 2 we describe the spatial data model upon which we define the repair semantics and consistent query answers. A formal definition of repair for spatial inconsistent databases under SICs is introduced in Section 3. In Section 4 we define consistent answers to conjunctive queries. We analyze in particular the cases of range and join queries with respect to their computational properties. This leads us, in Section 5, to propose polynomial time algorithms (in data complexity) for consistent query answering with respect to a relevant class of SICs and queries. An experimental evaluation of the cost of CQA is provided in Section 6. Final conclusions and future research directions are given in Section 7.

2 Preliminaries

Current models of spatial databases are typically seen as extensions of the relational data model (known as extended-relational or object-relational models) with the definition of abstract data types to specify spatial attributes. We now introduce a general spatio-relational database model that includes spatio-relational predicates (they could also be purely relational) and spatial ICs. It uses some of the definitions introduced in [25]. The model is independent of the geometric data model (e.g. Spaghetti [30], topological [18, 30], raster [19], or polynomial model [24]) underlying the representation of spatial data types.

A spatio-relational database schema is of the form , where: (a) is the possibly infinite database domain of atomic thematic values. (b) is a set of thematic, non-spatial, attributes. (c) is a finite set of spatio-relational predicates whose attributes belong to or are spatial attributes. Spatial attributes take admissible values in , the power set of , for an that depends on the dimension of the spatial attribute. (d) is a fixed set of binary spatial predicates, with a built-in interpretation. (e) is a fixed set of geometric operators that take spatial arguments, also with a built-in interpretation. (f) is a fixed set of built-in relational predicates, like comparison predicates, e.g. , which apply to thematic attribute values.

Each database predicate has a type , with , indicating the number of thematic attributes, and the spatial dimension of the single spatial attribute (it takes values in ). (Footnote 1: For simplicity, we use one spatial attribute, but it is not difficult to consider a greater number of spatial attributes.) In Example 1, , since it has one thematic attribute (idl) and one spatial attribute (geometry) defined by a 2D polygon. In this work we assume that each relation has a key of the form (1) formed by thematic attributes only:

(1)

where the are sequences of distinct variables representing thematic attributes of , and the are variables for geometric attributes. Here means geometric equality; that is, the identity of two geometries.

A database instance of a spatio-relational schema is a finite collection of ground atoms (or spatial database tuples) of the form , where , contains the thematic attribute values, and , where is the class of admissible geometries (cf. below). The extension in a particular instance of a spatio-relational predicate is a subset of . For simplicity, and to fix ideas, we will consider the case where .

Among the different abstraction mechanisms for modelling single spatial objects, we concentrate on regions for modelling real objects that have an extent. They are useful in a broad class of applications in Geographic Information Systems (GISs). More specifically, our model will be compatible with the specification of spatial operators (i.e., spatial relations or geometric operations) as found in current spatial DBMSs [23]. Following current implementations of DBMSs, regions could be defined as finite sets of polygons that, in their turn, are defined through their vertices. This would make regions finitely representable. However, in this work geometries will be treated at a more abstract level, which is independent of the spatial model used for geometric representation. In consequence, an admissible geometry of the Euclidean plane is either the empty geometry, , which corresponds to the empty subset of the plane, or is a closed and bounded region with a positive area. It holds , for every region . From now on, empty geometries and regions of are called admissible geometries and they form the class .

Geometric attributes are complex data types, and their manipulation may have an important effect on the computational cost of certain algorithms and algorithmic problems. As usual, we are interested in data complexity, i.e., in terms of the size of the database. The size of a spatio-relational database can be defined as a function of the number of tuples and the representation size of geometries in those tuples.

We concentrate on binary spatial predicates that represent topological relations between regions. They have a fixed semantics, and become the elements of . There are eight basic binary relations over regions of : Disjoint (DJ), Touches (TO), Equals (EQ), Inside (IS), CoveredBy (CB), Includes (IC), Covers (CV), and Overlaps (OV) [15, 26]. (Footnote 2: The names of relations chosen here are in agreement with the names used in current SQL languages [23], but differ slightly from the names found in the research literature. The relations found in current SQL languages are represented in Figure 2 with thick borders.) The semantics of the topological relations follows the point-set topology defined in [15], which is not defined for empty geometries. We will apply this semantics to our non-empty admissible geometries. For the case of the empty set, a separate definition will be given below. According to [15], an atom becomes true if four conditions are simultaneously true. Those conditions are expressed in terms of emptiness ($\emptyset$) and non-emptiness ($\neg\emptyset$) of the intersection of their boundaries ($\partial$) and interiors ($^\circ$). The definitions can be found in Table 1. For example, for non-empty regions , is true iff all of , , , and simultaneously hold.

Relation
DJ(x,y)
TO(x,y)
EQ(x,y)
IS(x,y)
CB(x,y)
IC(x,y)
CV(x,y)
OV(x,y)
Table 1: Definition of topological relations between regions based on point-set topology
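To make the four conditions concrete, the following Python sketch classifies a pair of regions by the emptiness or non-emptiness of the four intersections. It rests on several assumptions that are not part of this paper: regions are restricted to closed axis-aligned rectangles with positive area, the column order (boundary/boundary, interior/interior, boundary/interior, interior/boundary) is our choice, the table entries are the standard four-intersection conditions as we understand them from [15], and all function names are hypothetical.

```python
from typing import NamedTuple

class Rect(NamedTuple):
    """Closed axis-aligned rectangle with positive area (x1 < x2, y1 < y2)."""
    x1: float
    y1: float
    x2: float
    y2: float

def interiors_meet(a, b):
    return max(a.x1, b.x1) < min(a.x2, b.x2) and max(a.y1, b.y1) < min(a.y2, b.y2)

def closures_meet(a, b):
    return max(a.x1, b.x1) <= min(a.x2, b.x2) and max(a.y1, b.y1) <= min(a.y2, b.y2)

def contains_rect(a, b):
    """True if b is a subset of a."""
    return a.x1 <= b.x1 and a.y1 <= b.y1 and b.x2 <= a.x2 and b.y2 <= a.y2

def contains_in_interior(a, b):
    """True if b lies entirely in the interior of a."""
    return a.x1 < b.x1 and a.y1 < b.y1 and b.x2 < a.x2 and b.y2 < a.y2

def four_intersection(x, y):
    """Non-emptiness of (bd x & bd y, int x & int y, bd x & int y, int x & bd y)."""
    ii = interiors_meet(x, y)
    bb = (closures_meet(x, y)
          and not contains_in_interior(x, y)
          and not contains_in_interior(y, x))
    bi = ii and not contains_rect(x, y)  # boundary of x enters interior of y
    ib = ii and not contains_rect(y, x)  # boundary of y enters interior of x
    return (bb, ii, bi, ib)

TABLE = {
    (False, False, False, False): "Disjoint",
    (True, False, False, False): "Touches",
    (True, True, False, False): "Equals",
    (False, True, True, False): "Inside",
    (True, True, True, False): "CoveredBy",
    (False, True, False, True): "Includes",
    (True, True, False, True): "Covers",
    (True, True, True, True): "Overlaps",
}

def classify(x, y):
    return TABLE[four_intersection(x, y)]
```

For instance, `classify(Rect(0, 0, 2, 2), Rect(1, 1, 3, 3))` yields `"Overlaps"`, and classifying a rectangle against itself yields `"Equals"` rather than `"CoveredBy"`.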

In this work we exclude the topological relation Disjoint from . This decision is discussed in Section 3, where we introduce the repair semantics. In addition to the basic topological relations, we consider three derived relations that exist in current SQL languages and can be logically defined in terms of the other basic predicates: Intersects (IT), Within (WI), and Contains (CO). We also introduce a fourth relation, IIntersects (II), that holds when the interiors of two geometries intersect. It can be logically defined as the disjunction of Overlaps, Within, and Contains (cf. Figure 2). For all the topological relations in , their converse (inverse) relation is within the set. Some of them are symmetric, like Equals, Touches, and Overlaps. For the non-symmetric relations, the converse relation of Covers is CoveredBy, of Includes is Inside, and of Contains is Within.

As mentioned before, the formal definitions of the topological relations [15, 26] do not consider the empty geometry as an argument. Indeed, to the best of our knowledge, no clear semantics for topological predicates with empty geometries exists. However, in our case we extend the definitions in order to deal with this case. This will allow us to use a classical bi-valued logic, where atoms are always true or false, but never undefined. According to our extended definition, for any , is false if or . In particular, is false, for every admissible region . In order to make comparisons with the empty region, we will introduce and use a special predicate on admissible geometries, such that is true iff .

Figure 2:  Subsumption lattice of topological relations between regions: OV (Overlaps), CB (CoveredBy), IS (Inside), EQ (Equals), CV (Covers), IC (Includes), TO (Touches), DJ (Disjoint), IT (Intersects), II (IIntersects), WI (Within), and CO (Contains).

Notice that the semantics of the topological predicates, even for non-empty regions, may differ from the intuitive set-theoretic semantics one could assign to them. For example, for an admissible and non-empty geometry , is false (due to the conditions in the last two columns in Table 1). In consequence, the constraint is satisfied.

Given a database instance, additional spatial information is usually computed from the explicit geometric data by means of the spatial operators in associated with . Some relevant operators are: , (binary), , , , and . (Footnote 3: Operator returns the geometry that represents the point set union of all geometries in a given set, an operator also known as a spatial aggregation operator. Although this function is part of SQL for several spatial databases (Postgres/PostGIS, Oracle), it is not explicitly defined in the OGC specification [23].) (Cf. [23] for the complete set of spatial predicates defined within the Open GIS Consortium.) There are several spatial operators used in this work; however, we will identify a particular subset of spatial operators in , i.e., , which will be defined for all admissible geometries and used to shrink geometries with the purpose of restoring consistency, as we describe in Section 3.

Definition 1

The set of admissible operations contains the following geometric operations on admissible geometries and :
      (1) is the topological closure of the set-difference.

(2) is the geometry obtained by buffering a distance around , where is a distance unit. returns a closed region containing geometry , such that every point in the boundary of is at a distance from some point of the boundary of . In particular,

Notice that these operators, when applied to admissible geometries, produce admissible geometries.
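The two admissible operations of Definition 1 can be mimicked on a discrete toy model. The sketch below assumes that a geometry is a finite set of unit grid cells, so that set difference already yields a closed region, and that buffering by an integer number of cells (a square, Chebyshev-style dilation) stands in for the Euclidean buffer; both choices, and the function names, are assumptions made only for illustration.

```python
from itertools import product

def cells(x1, y1, x2, y2):
    """Hypothetical helper: unit grid cells covering [x1, x2) x [y1, y2)."""
    return frozenset(product(range(x1, x2), range(y1, y2)))

def difference(g1, g2):
    """Cells of g1 that are not in g2 (stand-in for the closed set-difference)."""
    return frozenset(g1 - g2)

def buffer(g, d):
    """Grow g by d cells in every direction (approximation of buffering)."""
    return frozenset((i + di, j + dj)
                     for (i, j) in g
                     for di in range(-d, d + 1)
                     for dj in range(-d, d + 1))

g = cells(0, 0, 2, 2)                # a 2 x 2 square
ring = difference(buffer(g, 1), g)   # the band of width 1 around g
```

In this model the empty set of cells plays the role of the empty geometry, and both operations map sets of cells to sets of cells.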

Remark 1

The value of in Definition 1 is instance dependent. It should be precomputed from the spatial input data. For this work, we consider to be a fixed value associated with the minimum distance between geometries in the cartographic scale of the database instance.

A schema determines a many-sorted, first-order (FO) language of predicate logic. It can be used to syntactically characterize and express SICs. For simplicity, we concentrate on denial SICs, which are sentences of the form given in (2). (Footnote 4: Denial constraints are easier to handle in the relational case, as consistency with respect to them is achieved by tuple deletions only [6].)

(2)

Here, are finite sequences of geometric and thematic variables, respectively, and . Thus, each is a finite tuple of thematic variables and will be treated as a set of attributes, such that means that the variables in are also variables in . Also, stands for ; and stands for , with the universal quantifiers ranging over all the non-empty admissible geometries (i.e. regions). Here, , , is a formula containing built-in atoms over thematic attributes, and . A constraint of the form (2) prohibits certain combinations of database atoms. Since topological predicates for empty geometries are always false, the restricted quantification over non-empty geometries in the constraints could be eliminated. However, we do not want to make the satisfaction of the constraints rely on our particular definition of the topological predicates for the empty region. In this way, our framework becomes more general, robust and modular, in the sense that it would be possible to redefine the topological predicates for the empty region without affecting our approach and results.

Example 2

Figure 3 shows an instance for the schema , . Dark rectangles represent buildings and white rectangles represent land parcels. In , the thematic attributes are and , whereas is the spatial attribute of dimension . Similarly for , which has only as a thematic attribute.

The following sentences are denial SICs:  (The symbol stands for the universal closure of the formula that follows it.)

(3)
(4)

The SIC (3) says that geometries of land parcels with different ids cannot internally intersect (i.e., they can only be disjoint or touch). The SIC (4) establishes that building blocks cannot (partially) overlap land parcels.

Figure 3: A spatial database instance (relation LandP, with attributes idl, name, owner, and geometry; relation Building, with attributes idb and geometry)

A database instance for schema can be seen as an interpretation structure for the language . For a set of SICs in ,   denotes that each of the constraints in is true in (or satisfied by) . In this case, we say that is consistent with respect to . Correspondingly, is inconsistent with respect to , denoted , when there is a that is violated by , i.e., not satisfied by . The instance in Example 2 is consistent with respect to its SICs.
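Checking an instance against a denial SIC amounts to searching for a combination of tuples that makes the body of the constraint true. The sketch below does this for a constraint in the spirit of SIC (3), i.e., land parcels with different ids must not internally intersect. It assumes rectangular geometries and uses hypothetical data and function names; it illustrates the satisfaction test only, not any repair.

```python
from itertools import combinations
from typing import NamedTuple

class Rect(NamedTuple):
    """Closed axis-aligned rectangle with positive area (x1 < x2, y1 < y2)."""
    x1: float
    y1: float
    x2: float
    y2: float

def interiors_intersect(a, b):
    """IIntersects for rectangles: the open interiors share a point."""
    return max(a.x1, b.x1) < min(a.x2, b.x2) and max(a.y1, b.y1) < min(a.y2, b.y2)

def violations(landp):
    """Pairs of parcel ids that together violate the denial constraint."""
    return [(t1["idl"], t2["idl"])
            for t1, t2 in combinations(landp, 2)
            if t1["idl"] != t2["idl"]
            and interiors_intersect(t1["geometry"], t2["geometry"])]

def is_consistent(landp):
    return not violations(landp)

# Hypothetical instance: parcels 1 and 2 only touch, parcels 2 and 3 overlap.
landp = [
    {"idl": 1, "geometry": Rect(0, 0, 2, 2)},
    {"idl": 2, "geometry": Rect(2, 0, 4, 2)},
    {"idl": 3, "geometry": Rect(3, 1, 5, 3)},
]
```

With this data, `violations(landp)` reports only the pair `(2, 3)`; removing the third tuple leaves a consistent instance, since touching parcels are allowed.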

In what follows, we will assume that the set of SICs under consideration is logically consistent; i.e., that there exists a non-empty database instance (not necessarily the one at hand), such that . For example, any set of SICs containing a constraint of the form is logically inconsistent. The analysis of whether a set of SICs is logically consistent or not is out of the scope of this work.

3 A Repair Semantics

Different alternatives for update-based consistency restoration of spatial databases are discussed in [28]. One of the key criteria to decide about the update to apply is minimality of geometric changes. Another important criterion may be the semantics of spatial objects, which makes changes over the geometry of one type of object more appropriate than others. For this work, the repair semantics is a rule applied automatically. It assumes that no previous knowledge about the quality and relevance of geometries exists and, therefore, it assumes that geometries are all equally important.

On the basis of the minimality condition on geometric changes and the monotonicity property of some topological predicates [28], we propose to solve inconsistencies with respect to SICs of the form (2) through shrinking of geometries. Notice that this repair semantics will be used as an instrumental concept to formalize consistent query answers (no actual modification over the database occurs). As such, it defines what part of the geometry is not in conflict with respect to a set of integrity constraints and can, therefore, be part of a consistent answer.

Shrinking geometries eliminates conflicting parts of geometries without adding new uncertain geometries by enlargement. In this way, we are considering a proper subset of the possible changes to fix spatial databases proposed in [29]. We disregard translating objects, because they will carry potentially new conflicts; and also creating new objects (object splitting), because we would have to deal with null or unknown thematic attributes.

The SICs of the form (2) exclude the topological predicate Disjoint. The reason is that falsifying a Disjoint atom by shrinking geometries is not possible, unless we make one of them empty. However, doing so would heavily depend upon our definition of this topological predicate for empty regions. Since we opted for not making our approach and results depend on this particular definition, we prefer to exclude the Disjoint predicate from our considerations. The study of other repair semantics that sensibly include the topological predicate Disjoint is left for future work.

Technically, a database violates a constraint , with , when there are data values , with , for the variables in the constraint such that becomes true in the database under those values. (Footnote 5: For simplicity and without loss of generality, in the examples we consider denial constraints with at most two spatio-relational predicates and one topological predicate. However, a denial constraint of the form (2) may have more spatio-relational predicates and topological predicates.) This is denoted with . When this is the case, it is possible to restore consistency of by shrinking or such that becomes false.

We can compare geometries, usually an original geometry and its shrunk version, by means of a distance function that refers to their areas. We assume that is an operator that computes the area of a geometry.

Definition 2

For regions ,  .

Since we will compare a region with a region obtained by shrinking , it will hold . Indeed, when comparing , the distance function can be simplified by . (Footnote 6: stands for geometric inclusion.) We will assume that it is possible to compare geometries through the distance function by correlating their tuples, one by one. This requires a correspondence between instances.
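A small sketch may help fix the idea of an area-based distance. It assumes, as one plausible reading of Definition 2, that the distance between two regions is the area of their symmetric difference, which reduces to a plain difference of areas when one region is obtained by shrinking the other. Geometries are modeled as sets of unit grid cells, and the names `cells`, `area`, and `delta` are hypothetical.

```python
from itertools import product

def cells(x1, y1, x2, y2):
    """Hypothetical helper: unit grid cells covering [x1, x2) x [y1, y2)."""
    return frozenset(product(range(x1, x2), range(y1, y2)))

def area(g):
    """Area of a cell-based geometry: one unit per cell."""
    return len(g)

def delta(g1, g2):
    """Assumed distance: area of the symmetric difference of g1 and g2."""
    return area(g1 ^ g2)

g = cells(0, 0, 4, 4)          # original geometry, area 16
g_shrunk = cells(0, 0, 3, 4)   # obtained from g by removing one column
```

For the shrinking case, `delta(g, g_shrunk)` coincides with `area(g) - area(g_shrunk)`, and the distance from a geometry to itself is zero.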

Definition 3

Let be database instances of schema . is -indexed if is a bijective function from to , such that, for all : , for some region .

In a -indexed instance we can compare tuples one by one with their counterparts in instance . In particular, we can see how the geometric attribute values differ. In some cases there is an obvious function , for example, when there is a key from a subset of to the spatial attribute , or when relations have a surrogate key for identification of tuples. In these cases we simply use the notion of -indexed. When the context is clear, we also use instead of .

Example 3

(Example 2 cont.) Consider the relational schema . For the instance given in Example 2, the following instance is -indexed:

LandP
idl name owner geometry

Here, , etc.

When restoring consistency, it may be necessary to consider different combinations of tuples and SICs. Eventually, we should obtain a new instance, hopefully consistent, that we have to compare to the original instance in terms of their distance.

Definition 4

Let be spatial database instances over the same schema , with -indexed. The distance  between and  is the numerical  value , where is the projection of tuple on its spatial attribute .
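The instance-level distance can be sketched in the same spirit. The fragment below assumes that the distance between two indexed instances is the sum, over corresponding tuples, of the area-based distance between their geometries, and that tuples are matched through their key values; geometries are sets of unit grid cells, and the data and function names are hypothetical.

```python
from itertools import product

def cells(x1, y1, x2, y2):
    """Hypothetical helper: unit grid cells covering [x1, x2) x [y1, y2)."""
    return frozenset(product(range(x1, x2), range(y1, y2)))

def delta(g1, g2):
    """Assumed geometry distance: area of the symmetric difference."""
    return len(g1 ^ g2)

def instance_distance(original, candidate):
    """Sum of geometry distances over tuples matched by their key."""
    if original.keys() != candidate.keys():
        raise ValueError("instances are not indexed by the same keys")
    return sum(delta(original[k], candidate[k]) for k in original)

def only_shrinks(original, candidate):
    """True if every geometry in candidate is contained in its counterpart."""
    return all(candidate[k] <= original[k] for k in original)

# Hypothetical instance: p1 and p2 overlap on the column x = 3.
original = {"p1": cells(0, 0, 4, 4), "p2": cells(3, 0, 7, 4)}
shrink_p1 = {"p1": cells(0, 0, 3, 4), "p2": original["p2"]}
shrink_both = {"p1": cells(0, 0, 3, 4), "p2": cells(4, 0, 7, 4)}
```

Shrinking only one of the two conflicting parcels gives distance 4, whereas shrinking both gives distance 8, so the latter would not be preferred under a minimality criterion.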

Now it is possible to define a “repair semantics”, which is independent of the geometric operators used to shrink geometries.

Definition 5

Let be a spatial database instance over schema , a set of SICs, such that . (a) An s-repair of with respect to is a database instance over , such that: (i) . (ii) is -indexed. (iii) For every tuple , if , then . (b) A minimal s-repair of is an s-repair of such that, for every s-repair of , it holds .

Proposition 1

If is consistent with respect to , then is also its only minimal s-repair.

Proof:   For , it holds: (i) , (ii) is -indexed, (iii) for every tuple , if , then . In this case, . Any other consistent instance obtained by shrinking any of ’s geometries and still obtaining admissible geometries gives .

This is an “ideal and natural” repair semantics that defines a collection of semantic repairs. The definition is purely set-theoretic and topological in essence. It is worth exploring the properties of this semantics and its impact on properties of consistent query answers (as invariant under minimal s-repairs) and on logical reasoning about them. However, a given database instance may have a continuum of s-repairs, since between any two points there are infinitely many points. We want to avoid this for representational and computational reasons.

In this work we will consider an alternative repair semantics that is more operational in nature (cf. Definition 8), leaving the previous one for reference. This operational definition of repair makes it possible to deal with repairs in current spatial DBMSs and in terms of standard geometric operators (cf. Lemma 1). Under this definition, there will always be a finite number of repairs for a given instance. Consistency will be restored by applying a finite sequence of admissible transformation operations to conflicting geometries.

It is easy to see that each true relationship (atom) of the form , with , can be falsified by applying an admissible transformation in to or . Actually, they can be falsified in a canonical way. These canonical falsification operations for the different topological atoms are presented in Table 2. They have the advantages of: (a) being defined in terms of the admissible operators, (b) capturing the repair process in terms of the elimination of conflicting parts of geometries, and (c) changing one of the geometries participating in a conflict.

x Pred. x A true atom becomes a false atom with
x x 1.  If :
, .
2.  If :
, .
3. If :
, .
4. If :
, .
x x 1. If :
, .
2. If :
, .
3. , .
x x 1.  If :
, .
2.  If :
, .
3. , .
x x 1. , .
2. , .
x x1. , .
2. , .
(See Remark 1 for definition of )
x x 1. , .
2. , .
Table 2: Admissible transformations

More specifically, in Table 2 we indicate, for each relation , alternative operations that falsify a true atom of the form . Each of them makes changes on one of the geometries, leaving the other geometry unchanged. The list of canonical transformations in this table prescribes particular ways of applying the admissible operators of Definition 1. Later on, they will also become the admissible or legal ways of transforming geometries with the purpose of restoring consistency.

For example, Table 2 shows that for , there are in principle four ways to make false an atom that is true. These are the alternatives 1. to 4. in that entry, where alternatives 1. and 2. change geometry ; and alternatives 3. and 4. change geometry . Only one of these alternatives that satisfies its condition is expected to be chosen to falsify the atom. A minimal way to change a geometry depends on the relative size between overlapping and non-overlapping areas: (i) when the overlapping area between and is smaller than or equal to their non-overlapping areas, a minimal change over geometry is , and over is (cases 1. and 3. for in Table 2). (ii) When the non-overlapping areas of or are smaller than the overlapping area, a minimal change over geometry is , and over geometry is (cases 2. and 4. for in Table 2).
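The choice between the two ways of changing the first geometry of an overlapping pair can be sketched as follows. The code assumes that, when the overlapping area is at most the non-overlapping area, the overlapping part is removed, and that otherwise the non-overlapping part is removed so that only the overlap is kept; this second branch is our reading of the minimality argument above. Geometries are sets of unit grid cells, and the names are hypothetical.

```python
from itertools import product

def cells(x1, y1, x2, y2):
    """Hypothetical helper: unit grid cells covering [x1, x2) x [y1, y2)."""
    return frozenset(product(range(x1, x2), range(y1, y2)))

def difference(g1, g2):
    """Cells of g1 that are not in g2."""
    return frozenset(g1 - g2)

def shrink_first(x, y):
    """Shrink x, leaving y unchanged, by removing the smaller of the
    overlapping and non-overlapping parts of x."""
    overlap = x & y
    non_overlap = difference(x, y)
    if len(overlap) <= len(non_overlap):
        return non_overlap                # take the overlap away from x
    return difference(x, non_overlap)     # keep only the overlap

x = cells(0, 0, 4, 1)
small_overlap = shrink_first(x, cells(3, 0, 6, 1))   # overlap 1, rest 3
large_overlap = shrink_first(x, cells(1, 0, 7, 1))   # overlap 3, rest 1
```

In the first call the result no longer shares any cell with the second geometry; in the second call the result is contained in the second geometry, so the two no longer overlap in the sense of the Overlaps predicate.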

For the case when is true, the transformations in Table 2 make either geometry, or empty to falsify the atom. However, there are other alternatives that, by shrinking geometries, would achieve the same result while producing smaller changes in terms of the affected area. A natural candidate update consists in applying the transformation (similarly and alternatively for ). In this case, we just take away from the part of the internal area of width surrounding the boundary of , to make it different from . We did not follow this alternative for practical reasons: having two geometries that are topologically equal could, in many cases, be the result of duplicate data, and one of them should be eliminated. Moreover, this alternative, in comparison with the one officially adopted in this work, may create new conflicts with respect to other SICs. Avoiding them whenever possible will be used later, when designing a polynomial algorithm for CQA based on the core of an inconsistent database instance (see Section 5).

Table 2 shows that Touches and Intersects are predicates for which the eliminated area is not completely delimited by the real boundary of objects. Actually, we need to separate the touching boundaries. We do so by buffering a distance around one of the geometries and taking the overlapping part from the other one. (Footnote 7: The buffer operator does not introduce new points in the geometric representation of objects, but it translates the boundary a distance outwards.)

The following result is obtained directly from Table 2.

Lemma 1

For each topological predicate and true ground atom , there are geometries obtained by means of the corresponding admissible transformation in Table 2, such that becomes false.

The following definition introduces, for each geometric predicate , a binary geometric operator such that, if is true, then returns a geometry such that becomes false. The definition is based on the transformations that affect geometry in Table 2.

Definition 6

Let be a topological predicate. We define an admissible transformation operator as follows:

  • If is false, then .

  • If is true, then:

It can be easily verified that the admissible operations , applied to admissible geometries, produce admissible geometries. They can be seen as macros defined in terms of the basic operations in Definition 1, and inspired by Table 2. The idea is that the operator takes , for which is true, and makes the latter false by transforming into , i.e., becomes false.

Definition 6 can also be used to formalize the transformations on geometry indicated in Table 2. First, notice that for the converse predicate of predicate it holds: true iff . Secondly, the converse of a transformation operator can be defined by  . In consequence, we can apply to , obtaining the desired transformation of geometry . In this way, all the cases in Table 2 are covered. For example, if we want to make false a true atom , we can apply , but also .
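The use of the converse predicate to change the second geometry can be illustrated with a small sketch. It again uses a toy raster model (a geometry is a set of grid cells), and all names (`within`, `converse`, `make_operator`) are hypothetical. The operator body, which simply cuts the second geometry out of the first, is a stand-in chosen for this one pair of predicates and is not the entry of Table 2; only the argument-swapping mechanism is the point here.

```python
from typing import Callable, FrozenSet, Tuple

Cell = Tuple[int, int]
Geometry = FrozenSet[Cell]
Predicate = Callable[[Geometry, Geometry], bool]
Operator = Callable[[Geometry, Geometry], Geometry]


def within(g1: Geometry, g2: Geometry) -> bool:
    """Toy Within: g1 is non-empty and every cell of g1 lies in g2."""
    return bool(g1) and g1 <= g2


def converse(p: Predicate) -> Predicate:
    """Converse predicate: true on (g1, g2) iff p is true on (g2, g1)."""
    return lambda g1, g2: p(g2, g1)


def make_operator(p: Predicate) -> Operator:
    """Hypothetical operator: identity when p is false, else remove g2 from g1."""
    def tr(g1: Geometry, g2: Geometry) -> Geometry:
        return g1 - g2 if p(g1, g2) else g1
    return tr


contains = converse(within)            # toy Contains as the converse of Within
tr_within = make_operator(within)
tr_contains = make_operator(contains)

small = frozenset({(0, 0)})
big = frozenset({(0, 0), (0, 1), (1, 0), (1, 1)})

# Two ways to falsify the true atom within(small, big):
new_small = tr_within(small, big)      # changes the first geometry
new_big = tr_contains(big, small)      # changes the second, via the converse
```

After either update the original atom is false: `within(new_small, big)` and `within(small, new_big)` both fail, while the geometry that was not targeted is left unchanged.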

Example 4

Table 3 illustrates the application of the admissible transformations to restore consistency of predicates . The dashed boundary is the result of applying Buffer.

[Table 3: Examples of admissible transformations. The graphical content of the table is not recoverable.]

We now define the notion of accessible instance that results from an original instance, after applying admissible transformation operations to geometries. The application of sequences of operators solves violations of SICs. Accordingly, the accessible instances are defined by induction.

Definition 7

Let be a database instance. is an accessible instance from (with respect to a finite set of SICs ), if is obtained after applying, a finite number of times, the following inductive rules (any of them, when applicable):
(1).   .

(2).  There is an accessible instance from , such that, for some with a topological predicate , through tuples and in , for which is true (Footnote 8: may have more than one topological predicate.); and
         (a) ,  or
         (b) .
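The inductive definition can be read as a closure computation: start from the original instance and repeatedly pick a violating pair of tuples, changing either the first or the second geometry. The sketch below enumerates this closure on a toy model. It rests on several assumptions that are not in the text: geometries are sets of grid cells, the only SIC is a hypothetical denial constraint stating that no two distinct tuples may overlap, and the operator `cut` (remove the shared part) stands in for the admissible transformation. All function names are illustrative.

```python
from itertools import permutations
from typing import Dict, FrozenSet, List, Set, Tuple

Cell = Tuple[int, int]
Geometry = FrozenSet[Cell]
# An instance is a sorted tuple of (key, geometry) pairs, so that it is hashable.
Instance = Tuple[Tuple[str, Geometry], ...]


def overlaps(g1: Geometry, g2: Geometry) -> bool:
    """Toy Overlaps: the regions share cells and neither contains the other."""
    common = g1 & g2
    return bool(common) and common != g1 and common != g2


def cut(g1: Geometry, g2: Geometry) -> Geometry:
    """Hypothetical transformation: remove from g1 the part shared with g2."""
    return g1 - g2


def violations(inst: Instance) -> List[Tuple[str, str]]:
    """Ordered pairs of keys whose geometries violate the toy constraint."""
    d = dict(inst)
    return [(a, b) for a, b in permutations(d, 2) if overlaps(d[a], d[b])]


def accessible(inst: Instance) -> Set[Instance]:
    """All instances reachable from inst; inst itself is included (rule 1)."""
    seen: Set[Instance] = {inst}
    frontier: List[Instance] = [inst]
    while frontier:
        current = frontier.pop()
        d: Dict[str, Geometry] = dict(current)
        # Ordered pairs cover both options of rule 2: changing the first
        # geometry of (a, b), or the first geometry of (b, a).
        for a, b in violations(current):
            changed = dict(d)
            changed[a] = cut(d[a], d[b])
            nxt: Instance = tuple(sorted(changed.items()))
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


D: Instance = (
    ("p", frozenset({(0, 0), (1, 0)})),
    ("q", frozenset({(1, 0), (2, 0)})),
)
reachable = accessible(D)
consistent = [inst for inst in reachable if not violations(inst)]
```

The loop terminates in this toy setting because every step strictly removes cells from some geometry. Tuples are never added or deleted, so every reachable instance keeps the keys of the original one, only the geometries change.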

Example 5

Consider the database instance in Figure 4(a) that is inconsistent with respect to SIC (3). An accessible instance from this inconsistent database is in Figure 4(b), where only has changed. This can be expressed in the following way: .

Figure 4: An accessible instance: (a) original instance, and (b) accessible transformation (geometry with thick boundary changed)

Given a database , possibly inconsistent, we are interested in those accessible instances that are consistent, i.e., . Moreover, with the repairs in mind, we have to make sure that admissible instances from can still be indexed with .

Proposition 2

Let be an accessible instance from . Then, is -indexed to via an index function , that can be defined by induction on .

Proof:   To simplify the presentation, we will assume that has an index (or surrogate key) , that is a one-to-one mapping from to an initial segment of . Let be an accessible instance from . We define for tuples in by induction on :
(1). If and , .

(2). If there is an accessible instance from and through the atoms , , and with and the converse relation of :
(a)  , and and , or (b)  , and and