# Incremental Tabling in Support of Knowledge Representation and Reasoning

Terrance Swift
NOVALincs
###### Abstract

Resolution-based Knowledge Representation and Reasoning (KRR) systems, such as Flora-2, Silk or Ergo, can scale to tens or hundreds of millions of facts, while supporting reasoning that includes HiLog, inheritance, defeasibility theories, and equality theories. These systems handle the termination and complexity issues that arise from the use of these features through heavy use of tabled resolution. In fact, such systems table by default all rules defined by users, unless they are simple facts.

Performing dynamic updates within such systems is nearly impossible unless the tables themselves can be made to react to changes. Incremental tabling as first implemented in XSB [Saha (2006)] partially addressed this problem, but the implementation was limited in scope and not always easy to use. In this paper, we introduce automatic incremental tabling which at the semantic level supports updates in the 3-valued well-founded semantics, while guaranteeing full consistency of all tabled queries. Automatic incremental tabling also has significant performance improvements over previous implementations, including lazy recomputation, and control over the dependency structures used to determine how tables are updated.

## 1 Introduction

Tabled Logic Programming has supported a variety of applications that would be difficult to implement in Prolog alone, including model checking, program analysis, ontology-based deductions and decision making for collaborative agents. Typically such applications are written mainly as Prolog programs, but with a subset of the predicates tabled in order to ensure termination, reduce complexity, use well-founded negation, or exploit other features.

However, systems such as Flora-2 [Yang et al. (2013)] and its extensions: Silk (cf. silk.semwebcentral.org), Ergo (cf. coherentknowledge.com/publications) and the RAVE system (cf. www.sri.com/about/people/grit-denker) have been recently developed for knowledge representation and reasoning (KRR), and rely on tabled resolution for their computational underpinning. For instance, Flora-2 [Yang et al. (2013)], which is based on XSB [Swift and Warren (2012)], supports the non-monotonic inheritance of F-logic, prioritized defeasibility with multiple levels of conflicts, rule identifiers, function symbols, logical constraints, and HiLog. Silk and Ergo, both based on Flora-2, support all of the above features plus omni axioms, which are contrapositional rules whose bodies and heads are comprised of any formulas that can be supported by the Lloyd-Topor transformation [Lloyd and Topor (1984)].

As an example of using these features, given the sentence: A contractile vacuole is inactive in an isotonic environment from [Reece et al. (2010)], a tool called Linguist (www.haleyai.com) produces a Silk or Ergo formula in a mostly automatic manner (knowledge engineers may have to choose between translations in ambiguous cases), resulting in the axiom: forall(?x6)^contractile(vacuole)(?x6) ==> forall(?x9)^isotonic(environment)(?x9) ==> inactive(in(?x9))(?x6); Such an axiom is next translated into several Flora-2 rules about conditions of contractile vacuoles, inactive contractile vacuoles, and isotonic environments. These Flora-2 rules are then transformed to support HiLog, defeasibility and other features, resulting in numerous normal rules executed in XSB. Once a knowledge base has been constructed from axioms such as the one above, queries can be made such as: If a Paramecium swims from a hypotonic environment to an isotonic environment, will its contractile vacuole become more active? The translation of queries is similar to that of knowledge, but may include hypothetical information, e.g., that ?x is a Paramecium swimming from a hypotonic environment to an isotonic environment. Knowledge bases themselves are built from a collection of rules and omni axioms usually written by different knowledge engineers using a shared background vocabulary. The limited coordination among knowledge engineers is critical for producing knowledge bases at a low cost.

All of the KRR-systems mentioned above employ what may be called pervasive tabling, where a predicate is tabled unless it is explicitly declared non-tabled. Such programs have an operational behavior that is vastly different from (tabled) Prolog. Among other matters, as many of these tables represent background knowledge, it is critical for good system performance to reuse tables between queries. However, because queries may include hypothetical knowledge, and because knowledge bases are created by interactively adding or modifying rules, good performance demands the use of incremental tabling [Saha and Ramakrishnan (2005), Saha (2006)].

The main idea behind incremental tabling is to maintain an Incremental Dependency Graph (IDG), indicating how tabled goals depend both on dynamic code and on one another. When an update is made to dynamic code, the IDG is traversed, and affected tables are updated if necessary. However, while previous versions of incremental tabling were robust enough to support a commercial application [Ramakrishnan et al. (2007)], they were not sufficient to support high-level KRR applications. Most significantly, a programmer had to decide when tables were updated: either an update was forced immediately upon an assert or retract, or the programmer performed “bulk” updates, after which a command propagated the updates to all affected tables. This methodology was complicated and had semantic drawbacks: unless an update was manually invoked, there was no guarantee that tables would be updated and no provision for stronger forms of view consistency. In fact, because of the brittleness caused by the need for low-level control along with other drawbacks, previous versions of incremental tabling (designated here as manual incremental tabling) were suitable only for careful use by tabling experts.

Support for pervasive tabling requires that a tabling engine be redesigned in several ways, including the mechanisms whereby tables are updated. This paper introduces automatic incremental tabling to support applications that rely on pervasive tabling such as the KRR-systems described above. The paper's major contributions are:

• A description of core changes that allow table updates to be made in a safe and efficient manner: first, tables are updated automatically and efficiently by lazy recomputation; second, updates always guarantee view consistency for incremental tables.

• A description of how incremental recomputation is extended to support updates according to the three-valued well-founded semantics.

• Introduction of the notion of IDG abstraction to reduce the size of the IDG when necessary.

• Detailed performance analyses of automatic incremental tabling for both small program fragments and for KRR-style examples over Extensional Databases (EDBs) of up to 10,000,000 facts. These results indicate that automatic incremental tabling efficiently supports the KRR uses previously mentioned, and may also provide a basis for reactive KRR.

Automatic incremental tabling is available in the current version of XSB. In addition to the extensions mentioned above, its implementation is based on a significant rewriting of the previous implementation of manual incremental tabling. Incremental tabling is not yet available in tabling engines other than XSB. However, while automatic incremental tabling adds data structures such as the IDG, it interfaces with a tabling engine mostly through routines for maintaining table space. Accordingly, most of the features described below are relatively portable, as tabling engines have similar table space operations, and sometimes similar data structures.

## 2 A Review of Manual Incremental Tabling

In this section we describe the previous version of incremental tabling using the main data structures and algorithms of [Saha (2006)], which form the starting point for the features of automatic incremental tabling described in later sections. The description is as self-contained as possible, but sometimes uses the terminology of the SLG-WAM [Sagonas and Swift (1998)].

Fig. 1 shows an XSB program where predicates are declared to use incremental tabling. In general both tables and dynamic code may be declared with various attributes: not only incremental as here, but also subsumptive, trie-indexed, and so on. Note that tnot/1 is an XSB operator for tabled negation. Execution of the query t_1(X) creates the Incremental Dependency Graph (IDG) schematically shown in Fig. 1.

The IDG has a node for each tabled subgoal but not for non-tabled subgoals such as nt_1(X) – though the bindings made by the rules for nt_1/1 are implicitly propagated. Leaf nodes in the IDG correspond to predicates such as p/1 and q/1 that are declared to be both dynamic and incremental. Each downward edge in the IDG represents an element of the direct dependency relation; the inverse relation is the direct affected relation. Note that paths in the IDG may be cyclic.

At the level of data structures, each node in the IDG is represented via an IDG node frame (Fig. 2). For a tabled incremental subgoal t/n, the IDG node frame is created by the tabletry instruction, which registers it into the subgoal trie for t/n (in XSB, the default data structure for tabled subgoals and their answers is based on tries [Ramakrishnan et al. (1999)]; while XSB offers basic support for answers that are “hash-consed” [Zhou and Have (2012)] and not maintained as tries, our presentation assumes subgoal and answer tries throughout), and links it with the subgoal frame, which contains information about each tabled subgoal. For dynamic incremental subgoals a new SLG-WAM instruction, try_dynamic_incremental, performs these tasks. Each time a (tabled or dynamic) incremental subgoal S is called, the IDG may be updated. If S is new, an IDG node frame is created; in addition, whether or not S is new, if S has a nearest tabled subgoal T as an ancestor, an edge between S and T is added if not already present. As answers are derived for S, their count is maintained in the nbr_of_answers field of the IDG node frame.

At a high level, the use of the IDG is easy to understand. If a fact, say p(g(2)), is asserted, the incremental update subsystem must call traverse_affected_nodes() (Fig. 3) to traverse the IDG. Separate traversals start from each leaf node with which p(g(2)) unifies, and each traversal increments the falsecount field of the IDG node frames it visits (cf. Fig. 2), marking them as invalid (i.e., as having a falsecount greater than 0). As it is unclear whether sensible semantics can be given to updating a subgoal that is incomplete (i.e., that is still being computed), a permission error is thrown if this is attempted. In our running example, assuming that no nodes in the IDG are already invalid, the algorithm will traverse depth-first through all nodes affected by p(g(X)) (directly or indirectly). In so doing, the affected non-leaf nodes are added to a global invalid list in the same order. In our example, the nodes for t_5(X), t_4(X) and t_1(X) are traversed, and the invalid list represents this sequence.

Several properties of the traversal are worth noting. First, use of the falsecount field in traverse_affected_nodes() prevents the same node from being traversed multiple times. Also, note that invalidation simply represents some change in the underlying data so that retracts are handled in the same manner as asserts, and both positive and negative dependencies are treated in the same way. In fact, since the traversal starts with dependency leaf nodes that unify with a given atom, propagation of a rule update is handled in the same manner as a fact update: traverse_affected_nodes() is invoked for leaf nodes that unify with the rule head. In either case, the unification of leaf nodes with a given atom can also prevent unnecessary updates: for instance, if the fact q(g(2)) were added, it would not cause any update, since no leaf node of the IDG unifies with this fact.
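To make the traversal concrete, the following Python sketch models IDG node frames and the invalidation pass on the running example. The field names falsecount and nbr_of_answers follow the text; the class layout and the helper name add_edge are illustrative, not XSB's actual C structures.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(eq=False)          # eq=False: compare nodes by identity
class IDGNode:
    # A sketch of an IDG node frame (cf. Fig. 2); hypothetical layout.
    goal: str
    falsecount: int = 0       # > 0 marks the node invalid
    nbr_of_answers: int = 0
    affected: List["IDGNode"] = field(default_factory=list)  # inverse of "depends on"

def add_edge(dependent: IDGNode, dependency: IDGNode) -> None:
    # dependent directly depends on dependency, so dependency affects dependent
    if dependent not in dependency.affected:
        dependency.affected.append(dependent)

def traverse_affected_nodes(node: IDGNode, invalid_list: List[IDGNode],
                            is_leaf: bool = True) -> None:
    # Depth-first invalidation starting from an updated dynamic leaf (Fig. 3).
    node.falsecount += 1
    if node.falsecount > 1:   # already invalid: do not re-traverse
        return
    if not is_leaf:           # non-leaf nodes go onto the global invalid list
        invalid_list.append(node)
    for parent in node.affected:
        traverse_affected_nodes(parent, invalid_list, is_leaf=False)

# Running example: t_1 depends on t_4, t_4 on t_5, t_5 on leaf p; leaf q is unaffected
p, q = IDGNode("p(g(X))"), IDGNode("q(g(X))")
t_5, t_4, t_1 = IDGNode("t_5(X)"), IDGNode("t_4(X)"), IDGNode("t_1(X)")
add_edge(t_5, p); add_edge(t_4, t_5); add_edge(t_1, t_4)

invalid = []
traverse_affected_nodes(p, invalid)       # e.g., after assert(p(g(2)))
print([n.goal for n in invalid])          # -> ['t_5(X)', 't_4(X)', 't_1(X)']
```

As in the text, asserting a fact that unifies only with q's leaf would leave the invalid list empty, since no affected edges lead out of that leaf here.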

## 3 Supporting Well-Founded Negation

A necessary extension for incremental tabling to support KRR applications of the type mentioned in the introduction is to support full well-founded negation. KRR applications make use of the undefined truth value to represent conflicts in the defeasibility theory used by a program, as well as to handle infinite models through a type of answer abstraction called restraint [Grosof and Swift (2013)], and to support debugging of KRR programs (cf. [Naish (2006)]).

As mentioned in the previous section, the IDG maintains information about the dependency and affected relations without representing whether these changes are positive or negative. One advantage of this is that manual incremental tabling is correct for stratified negation — here, meaning well-founded negation with two-valued models. However, to support full well-founded negation, the update process must handle tables in which some atoms are undefined. To explain how this is done, we overview those aspects of well-founded negation in SLG resolution [Chen and Warren (1996)] that are relevant to the incremental update algorithms.

Essentially, a query evaluation by SLG resolution builds up a partial model of those parts of a program that are relevant to the query. To make this specific, an SLG evaluation E is modeled as a sequence of states, called forests. Let F be one such forest in E. F contains a set of tabled subgoals that have been encountered so far in E. Each such tabled subgoal S in F is associated with a table containing computed answers for S; S may be marked as completed in F if it has been determined that all necessary resolution has been performed to derive answers for S. To support 3-valued interpretations of F, answers are distinguished as unconditional answers representing true derivations, and conditional answers representing derivations of atoms with truth value undefined. Accordingly, let T be a completed table for a subgoal S in F, and let A be an atom in the ground instantiation of S. A is true if it is in the ground instantiation of some unconditional answer in T, and A is false if A is not in the ground instantiation of any answer in T (conditional or unconditional).

Formally, for a subgoal S, a conditional answer has the form Sθ :- DelayList, where θ is termed the answer substitution and DelayList, the delay list, is a list of literals needed to prove Sθ but whose resolution has been delayed because they do not have a well-founded derivation (based on the current state of the evaluation if S is not completed). During the course of an evaluation, if a literal L in a delay list becomes true or false, the SLG simplification operation respectively removes L from the delay list or indicates that the conditional answer itself is false.
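As an illustration, the following Python sketch models simplification over the set of delay lists supporting a single answer (cf. Example 3.1 below). The representation — plain lists of literal strings — and the function name are hypothetical simplifications of XSB's delay tries.

```python
def simplify(delay_lists, literal, is_true):
    # SLG simplification sketch over the delay lists (derivations) that
    # support one conditional answer.  Returns (remaining_lists, status):
    #   'true'      -- some delay list emptied: the answer is unconditional
    #   'false'     -- every supporting derivation was refuted
    #   'undefined' -- the answer is still conditional
    remaining = []
    for dl in delay_lists:
        if is_true:
            dl = [l for l in dl if l != literal]  # drop the now-proved literal
            if not dl:
                return [], "true"
            remaining.append(dl)
        elif literal in dl:                       # a delayed literal failed:
            continue                              # this derivation is false
        else:
            remaining.append(dl)
    return (remaining, "undefined") if remaining else ([], "false")

# An answer supported by two delay lists, as p(2) is in Example 3.1:
support = [["not q(2)"], ["not q(3)"]]
support, status = simplify(support, "not q(2)", is_true=False)
print(status)                                     # -> undefined
support, status = simplify(support, "not q(3)", is_true=True)
print(status)                                     # -> true
```

The run shows why an answer remains undefined while at least one unrefuted, non-empty delay list survives, and becomes unconditional as soon as any delay list empties.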

###### Example 3.1

The goal p(X) to the program

    p(1).
    p(2) :- not q(2).
    p(2) :- not q(3).
    q(X) :- not p(X).

has an unconditional answer p(1) along with two conditional answers: p(2):- not q(2) and p(2):- not q(3). Note that the delay lists for answers to p(2) contain only undefined literals upon which p(2) directly depends (e.g., not q(2), not q(3)), but not indirect dependencies such as not p(3).

As mentioned above, XSB represents both tabled subgoals and their answers using tries, a representation that is supported by other Prologs such as YAP [Santos Costa et al. (2012)] and Ciao [Hermenegildo et al. (2012)]. In XSB this representation is extended as follows [Sagonas et al. (2000)]. If an atom A is undefined, the leaf node of the answer representing A points to an answer information frame, which in turn points to other answers conditional on A as well as to a delay trie representing all delay lists upon which A is conditional. In Example 3.1 the delay trie for p(2) would contain the lists [not q(2)] and [not q(3)]. Whenever an unconditional answer A is derived in a table, the answer information frame and delay trie for conditional answers to A are deallocated if they exist.

To extend incremental recomputation to correctly handle changes involving conditional answers, several previously unconsidered cases must be addressed for a given answer substitution, AS, in a table T. Each case below considers only the answers for AS in T.

• Informational Weakening 1: there were previously no answers for AS; after the update there are one or more conditional answers for AS.

• Informational Weakening 2: there was previously an unconditional answer for AS; after the update there are one or more conditional answers for AS.

• No Informational Change: there were previously one or more conditional answers for AS; after the update further conditional answers were added, or some but not all conditional answers for AS were deleted.

• Informational Strengthening 1: there were previously one or more conditional answers for AS; after the update AS becomes true, with an unconditional answer.

• Informational Strengthening 2: there were previously one or more conditional answers for AS; after the update AS becomes false, with no answers.

The cases above are grouped by their action on the information ordering of truth values, where both true and false are stronger than undefined. From the perspective of table updates, no action need be taken in the case of No Informational Change, as the truth value of AS is unchanged. To see this, recall that delay lists contain only direct dependencies. Thus any answer that is conditional on AS will contain AS or its negation in its delay lists, so that changes to the delay lists of AS need not be propagated. Strengthening and weakening of answers are addressed by the extensions to incremental_reeval() shown underlined in Fig. 4.

Fig. 4 reflects a bilattice combining the information ordering and the truth ordering (where false < undefined < true). As discussed, SLG simplification propagates changes of an answer’s truth value when it is informationally strengthened. Changes to the truth value of an answer that reflect a strengthening in the truth ordering can be detected during re-derivation (Informational Weakening 1, Informational Strengthening 1). Changes that reflect a weakening in the truth ordering must wait until the re-derivation is complete (Informational Weakening 2, Informational Strengthening 2).
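The case analysis can be summarized as a small classification over before/after answer states for an answer substitution. The sketch below is purely illustrative; the states 'none', 'conditional' and 'unconditional' summarize the strongest answer present for the substitution.

```python
def classify_update(before: str, after: str) -> str:
    # before/after in {'none', 'conditional', 'unconditional'}: the strongest
    # information available for a given answer substitution.
    cases = {
        ("none", "conditional"):          "Informational Weakening 1",
        ("unconditional", "conditional"): "Informational Weakening 2",
        ("conditional", "conditional"):   "No Informational Change",
        ("conditional", "unconditional"): "Informational Strengthening 1",
        ("conditional", "none"):          "Informational Strengthening 2",
    }
    # Remaining transitions (e.g. none <-> unconditional) involve only
    # two-valued truth and were already handled by manual incremental tabling.
    return cases.get((before, after), "two-valued case")

print(classify_update("conditional", "none"))   # -> Informational Strengthening 2
```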


Fig. 4 also indicates how the two strengthening cases are handled. In Informational Strengthening 1, an answer substitution that was previously conditional is now unconditional; this can be caught as soon as the unconditional answer is derived (line 12). In Informational Strengthening 2, an answer substitution that was previously conditional is now false; this cannot be caught until the post-reevaluation traversal (line 15). In both cases, SLG simplification is invoked, making use of fine-grained dependency pointers among answers and subgoals [Sagonas et al. (2000)]. Thus, these cases are handled in a more “incremental” manner than the others: the truth values of answers are updated directly, without the need for further re-evaluation of affected subgoals.

## 4 Ensuring Transparency through Lazy Recomputation and View Consistency

Perhaps the main drawback of manual incremental tabling is the level of control it requires from a programmer. A programmer can specify that an incremental update is to be done immediately after an assert or retract, but this is inefficient when multiple updates are required. Alternatively, a programmer can specify that an assert or retract simply invalidate affected subgoals, but later must make a call to reevaluate subgoals on the invalid list. In either case, if choice points exist to an incrementally tabled subgoal that is completed, the semantics of an update are undefined (and in fact the program may crash). In addition to these issues, manual incremental tabling may cause unnecessary work as all affected goals are recomputed even if they are never re-queried. We show how these problems are fixed in automatic incremental tabling.

### 4.1 Lazy Recomputation

The implementation of line 15 of Fig. 5 uses a general interrupt mechanism whereby a given goal G may dynamically interrupt the current execution environment so that G is immediately executed, with the success and failure continuations of G set to (a modification of) the interrupted computation. (In XSB, as in other Prologs, such interrupts are used to handle unification of attributed variables, signaling among Prolog threads, and other tasks.) In line 17, the interrupt mechanism intersperses a call to recompute_dependent_tables(), which traverses the invalid list and recomputes subgoals. When recompute_dependent_tables() finishes, its continuation makes a fresh call to the original subgoal, which will see a completed and valid table, and will then simply backtrack through its answers (starting with line 18 of Fig. 5).
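This control flow can be sketched in Python, with the invalid list kept in the dependency-first order produced by invalidation. The classes, the compute closures, and the prefix-only recomputation policy are simplifications for illustration, not XSB's interrupt-based implementation.

```python
class Table:
    def __init__(self, name, compute):
        self.name = name
        self.compute = compute        # closure standing in for re-derivation
        self.answers = []
        self.falsecount = 0           # > 0: the table is invalid

invalid_list = []                     # ordered as built during invalidation
recomputed = []                       # trace of lazy re-evaluations

def recompute_dependent_tables(table):
    # Walk the invalid list up to (and including) the called table,
    # re-evaluating only what this call actually needs.
    while invalid_list:
        t = invalid_list.pop(0)
        t.answers = t.compute()
        t.falsecount = 0
        recomputed.append(t.name)
        if t is table:
            break

def call_table(table):
    # Sketch of the interrupt on calling a completed table: an invalid table
    # triggers recomputation first; the fresh call then sees a valid table.
    if table.falsecount > 0:
        recompute_dependent_tables(table)
    return list(table.answers)

t5 = Table("t_5", lambda: [1, 2])
t4 = Table("t_4", lambda: [x * 10 for x in t5.answers])
t1 = Table("t_1", lambda: [x + 1 for x in t4.answers])
for t in (t5, t4, t1):                # an update has invalidated all three
    t.falsecount = 1
    invalid_list.append(t)

print(call_table(t1))                 # -> [11, 21]
```

The point of laziness is visible here: had only t5 been re-queried, t4 and t1 would have stayed invalid (and unrecomputed) until some later call needed them.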


There are situations where it is convenient or necessary to abolish an incremental table rather than simply incrementally updating it. An example of this occurs when an exception is thrown. If an exception is thrown over a choice point to a completed table no action need be taken; however, if an exception is thrown over a choice point to an incomplete tabled subgoal S, XSB abolishes the table for S as its computation has become compromised. In fact, if S is part of a larger recursive component, the other subgoals in the recursive component are also compromised, so XSB allows exceptions to be caught only by subgoals that are “in between” recursive components, an implementation choice that helps XSB remain stable when exceptions are thrown in heavily tabled computations. Consider the issues that arise when an exception or other action causes an incremental table for S to be abolished under manual incremental tabling. Incremental tabling depends for correctness on the connectivity of the IDG: if S is abolished, any tables that S affects must also be abolished. Lazy recomputation, by contrast, can simply abolish S after calling traverse_affected_nodes(). When a call is made to an affected node, portions of the IDG that were removed through abolishing are reconstructed during the calls made by incremental_reeval(), since the previously abolished subgoals are treated as new.

### 4.2 View Consistency

A fundamental principle of databases is to support view consistency: that is, to ensure that the answers to a query Q are those derivable at the time Q was begun, unaffected by any subsequent updates. Accordingly, the ISO standard for Prolog [ISO working group JTC1/SC22 (1995)] specifies that an update U to dynamic code should not affect the behavior of choice points that were created before U. Extending view consistency to incremental tables is critical for understandable system behavior, especially when KRR features such as hypothetical reasoning must be supported. Because XSB’s incremental tabling does not allow updates that affect tables that are still being computed (Section 2), supporting view consistency effectively means ensuring consistency for choice points into completed tables. As such choice points correspond to database cursors, we term them Open Cursor Choice Points (OCCPs).

The approach to view consistency adopted by automatic incremental tabling is summarized in this section, with further details provided in Appendix A. A main goal is to avoid overhead when there are no choice points whose “view” needs to be maintained (including those of non-incremental tables). For this purpose, an occp_num field is maintained in the subgoal frame of a completed incremental table T to indicate whether there are OCCPs for T (Appendix A.1). occp_num is incremented when the subgoal for T is called, and decremented when the last answer for T has been returned to the call, or when a cut or throw removes the call from the choice point stack. Only if occp_num > 0 must an OCCP’s view be preserved. Automatic incremental tabling performs this preservation during incremental_reeval() by calling preserve_occp_views() (Fig. 4, line 4). While preserve_occp_views() is fully described in Appendix A.2, its main actions are as follows. The choice point stack is traversed, and for each OCCP C for T, the answer substitutions that have not yet been resolved by C are determined and then copied from T into the heap as a list (making sure that their heap space is frozen so they are not lost upon backtracking). For each answer substitution that corresponds to an answer whose truth value is undefined, the copying includes a special marker undef. Next, the structure of C is altered, and its instruction is modified to backtrack through the list on the heap rather than through the table. Once preserve_occp_views() has executed, incremental_reeval() proceeds as it would otherwise do. Later, when the modified version of C is backtracked into, a new instruction, preservedViewMember, is called to return the answer substitutions for the preserved view (Appendix A.3) using the correct truth value. When the answers in the list have been exhausted, the heap space used for the list is unfrozen if it is safe to do so.
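The effect of preserve_occp_views() can be illustrated with a Python sketch in which an open cursor switches from the live answer list to a frozen snapshot of its unconsumed answers. The class and method names are illustrative only; truth-value markers and heap freezing are omitted.

```python
class TableCursor:
    # An open cursor (OCCP) over a completed table's answer list.
    def __init__(self, table):
        self.table = table            # live answer list (shared, mutable)
        self.snapshot = None          # set by preserve_view() on update
        self.i = 0

    def preserve_view(self):
        # Sketch of preserve_occp_views(): copy the answers this cursor has
        # not yet consumed, so a re-evaluation cannot change what it sees.
        self.snapshot = list(self.table[self.i:])
        self.i = 0

    def next_answer(self):
        # Sketch of preservedViewMember: read the snapshot if one exists,
        # otherwise backtrack through the live table.
        src = self.snapshot if self.snapshot is not None else self.table
        if self.i < len(src):
            self.i += 1
            return src[self.i - 1]
        return None

answers = ["a", "b", "c"]
cur = TableCursor(answers)
first = cur.next_answer()             # consumes 'a' from the live table
cur.preserve_view()                   # an update is about to re-evaluate
answers[:] = ["a", "c", "d"]          # the table changes under the cursor
print([first, cur.next_answer(), cur.next_answer()])   # -> ['a', 'b', 'c']
```

The cursor still sees the pre-update view ['a', 'b', 'c'] even though the live table now holds ['a', 'c', 'd'], which is the view-consistency guarantee the section describes.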

## 5 Abstracting the IDG

The IDG is clearly essential to efficiently update incremental tables, but in certain situations constructing the IDG can cause non-trivial overheads in query time and table space. These overheads can be addressed in many cases by abstracting the IDG. When a tabled subgoal S is called, rather than creating an edge between S and its nearest tabled ancestor (if any), one could abstract S, the ancestor, or both. The semantics and implementation of subgoal abstraction were defined in [Riguzzi and Swift (2013)]; here we appeal to an intuitive notion of depth abstraction: given a subgoal S and integer d, subterms of S at depth greater than d are replaced by unique new variables. For instance, in Fig. 1, abstracting q(f(1)) at level 1 gives q(f(X)); abstracting it at level 0 gives q(X).
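Depth abstraction can be sketched as follows, representing compound terms as tuples (functor, arg1, ...) and fresh variables as strings; this representation is illustrative, not XSB's internal one.

```python
from itertools import count

def abstract_term(term, level, _depth=0, _fresh=None):
    # Replace every subterm at depth greater than `level` with a unique
    # fresh variable, written here as '_V0', '_V1', ...
    if _fresh is None:
        _fresh = count()
    if _depth > level:
        return f"_V{next(_fresh)}"
    if isinstance(term, tuple):                     # compound term
        return (term[0],) + tuple(
            abstract_term(a, level, _depth + 1, _fresh) for a in term[1:])
    return term                                     # atoms and numbers

q_f_1 = ("q", ("f", 1))
print(abstract_term(q_f_1, 1))    # -> ('q', ('f', '_V0'))   i.e. q(f(X))
print(abstract_term(q_f_1, 0))    # -> ('q', '_V0')          i.e. q(X)
```

Note that at level 0 only the outermost functor survives, which is exactly why, as discussed below, the IDG's representation of a goal can be abstracted while the goal itself keeps its argument instantiations for indexing.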

Figure 6 illustrates an important case where abstracting the IDG can be critical to good performance for incremental tabling. In the case of left-linear recursion, if no abstraction is used a new node will be created for each call to edge/2 as shown on the left side of this figure. If a large number of data elements are in fact reachable, the size of the IDG can be very large. If calls to the edge/2 predicate make use of depth-0 abstraction, the graph may be much smaller as seen on the right side of Fig. 6. Whether abstracting an IDG in this manner is useful or not is application dependent; however, performance results in the next section illustrate cases where abstraction greatly reduces both query time and space.
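The growth can be quantified with a small sketch: for left-recursive reachability, each distinct call pattern to edge/2 contributes its own IDG leaf unless calls are abstracted to depth 0. The graph, the counting function and its name are illustrative assumptions.

```python
def idg_edge_leaves(edges, start, abstract_to_depth0):
    # Count the IDG leaves created by the edge/2 calls made while computing
    # all nodes reachable from `start` by left-linear recursion.
    leaves, seen, frontier = set(), {start}, [start]
    while frontier:
        n = frontier.pop()
        # one leaf per call pattern edge(n,_), or a single abstracted leaf
        leaves.add("edge(_,_)" if abstract_to_depth0 else f"edge({n},_)")
        for a, b in edges:
            if a == n and b not in seen:
                seen.add(b)
                frontier.append(b)
    return len(leaves)

chain = [(i, i + 1) for i in range(1, 101)]   # a 100-edge chain graph
print(idg_edge_leaves(chain, 1, False))       # -> 101 (one leaf per call pattern)
print(idg_edge_leaves(chain, 1, True))        # -> 1   (a single abstracted leaf)
```

On this toy chain the unabstracted IDG grows linearly in the number of reachable nodes, while the depth-0-abstracted IDG stays constant — the same trade-off the benchmarks in Section 6 measure at scale.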

Abstracting the edge/2 predicate has subtle differences from abstracting tabled subgoals. In the first place, the edge/2 predicate of Fig. 6 is not tabled. Furthermore, the actual edge/2 subgoal itself should not be abstracted to depth 0 since losing the first argument instantiation would prevent the use of indexing. Rather, only the IDG’s representation of the subgoal should be abstracted. Fortunately, in XSB the code to intern dynamic goals for the IDG shares code used for tabling, so that extending abstraction to handle dynamic incremental predicates is relatively straightforward. In XSB, abstraction of dynamic code for the IDG can be specified via the declaration:
:- dynamic edge/2 as incremental, abstract(0).


For programs that make heavy use of incremental tabling, the question arises of how to use it efficiently. Consider a goal D to a dynamic predicate that directly affects a set of tabled subgoals to a given predicate. The IDG will contain an edge between D and each such subgoal, and to the extent that these subgoals unify, updates that unify with D will perform redundant work as the various subgoals are updated. Conversely, an IDG that contains a large set of dynamic incremental goals, such as d(1), d(2), ..., d(n), may or may not work well compared to an IDG that simply contains d(X). If the instantiated goals affect different subgoals, maintaining the instantiation in the first argument can provide an important mechanism to reduce the amount of reevaluation needed when a change is made to one of them. However, to the extent that the instantiated goals affect the same set of subgoals, keeping them distinct is inefficient, as each dynamic goal requires space for an IDG leaf and edge.

The IDG for a set of incremental predicates can be coarsened using XSB’s implementation of subgoal abstraction [Riguzzi and Swift (2013)]. Subgoal abstraction is a mechanism that allows a tabled subgoal S to be abstracted to a subgoal S' of which S is an instance. Although subgoal abstraction was originally implemented to provide stronger termination properties for tabled evaluations, it also provides a mechanism to coarsen an IDG by coalescing different vertices for a given tabled predicate. In Version 1(beta) of XSB, subgoal abstraction is based on term depth, and can be declared as:

:- table p/2 as incremental, abstract(n).

so that terms whose depth is greater than n are abstracted: e.g., the level 2 term depth abstraction of p(f(g(1)),g(2)) is represented as p(f(g(X)),g(2)). An important case of this is 0-level abstraction, which would abstract the above term to p(X,Y). Automatic incremental tabling extends abstraction to incremental dynamic facts, via declarations such as:

:- dynamic edge/2 as incremental, abstract(0).

As will be shown in Section 6, the use of subgoal abstraction can have major effects on the time and space required for incremental tabling, indicating that determining cost metrics for how much abstraction to use for a given program and set of queries is an important open question for incremental tabling.

## 6 Performance Results and Analysis

The performance of manual incremental tabling in XSB has been analyzed previously, most extensively in [Saha (2006)]. By and large the behavior of manual incremental tabling features is not affected by the rewriting to support automatic incremental tabling. Accordingly, the performance questions addressed here analyze new features, scalability, and the behavior of incremental tabling for KRR-style computations. A summary of performance results is given in this section, with tables and other details provided in Appendix B.

Left-Linear Recursion Recursion is heavily used in KRR-style programs that make use of features such as HiLog or defeasibility. As a first test, queries were made to a left-recursive reachability predicate (Fig. 6), with and without IDG abstraction on the edge/2 predicate (cf. Appendix B.1). In the benchmarks, edge/2 consists of ground facts representing randomly generated graphs of 50,000 – 5,000,000 edges. As shown in Fig. 11, if IDG abstraction is not used, creating the IDG adds a CPU time overhead of roughly 50% and a table space overhead of about 300% compared to non-incremental tabling. By using IDG abstraction at depth 0, the table space overhead becomes approximately 30%, and the time overhead 5-10%. Fig. 11 also shows that for batch updates (0.02%-2% of the EDB), the overhead of re-evaluation is negligible, particularly if abstraction is used.

Performance Analysis on a Program with KRR Features The program in Fig. 15 represents a social network in which certain members of a population are at risk, and other members of the population may influence the behavior of the at-risk members. While the program contains stratified negation, its main computational challenge arises from its heavy use of equality between constants and functional terms – a reasoning capability similar in flavor to some description logics. This use of equality over functional terms quickly leads to non-termination and unsafe negative subgoals during query evaluation. As discussed in Appendix B.2, these behaviors are handled using various tabling mechanisms, so that the ability to incrementally maintain tables for queries to this program requires the ability to update three-valued models that arise from answer abstraction [Grosof and Swift (2013)], tabled negation and subgoal abstraction.

As a first benchmark of this program, a small EDB of about 10,000 facts was generated, and good_influence/2 was queried for 200 randomly chosen values for its first argument. In this case, incremental tabling caused a time overhead of about 240% and a space overhead of 280% – although further exploration of abstraction would likely reduce these numbers. Next, updates of substantial sizes were performed on the EDB, and the re-evaluation time was computed (Fig. 16). While most of these times are near the level of noise, recomputation of several of the predicates timed out. Analysis of these timeouts showed that they arose because the additional facts caused a large number of new tables to be created when the 200 queries were re-evaluated. Fig. 17 shows the times to assert or retract, plus the time to invalidate affected subgoals via traverse_affected_nodes(). Except for updates to parent_of_edb/2, which directly affect equality, invalidation did not take a significant amount of time.

Scalability Analysis on a Program with KRR Features As a next step, the computational burden of the equality relation in the previously mentioned program was reduced by specializing it as discussed in Appendix B.2.1. Tests were performed on EDBs from around 100,000 – 10,000,000 facts. As shown in Fig. 18, the space and time for these computations scales roughly linearly. For the EDB of about 10,000,000 facts, times were obtained for various large batch updates and for query re-evaluation (Figs. 19 and 20). Except for updates to parent_of_edb/2, re-evaluation time was very low compared to initial query time (even for initial queries using non-incremental tabling), illustrating the promise of incremental tabling for large, reactive systems. These benchmarks also demonstrate the scalability of this implementation, even for very large IDGs. In Fig. 18, the IDG contained over 750 million edges; after the update sequences mentioned above were applied, it contained more than 1 billion edges.

## 7 Discussion

This paper has introduced automatic incremental tabling, which improves previous versions of incremental tabling in both semantics and efficiency. The semantics of lazy recomputation (Section 4) together with the preservation of view consistency (Section 4.2 and Appendix A) guarantee that incremental tables will always reflect the state of the underlying knowledge base at the time they were queried. This view consistency takes tabled logic programming a step closer to deductive databases, and supports hypothetical reasoning in KRR applications. In addition, the ability to update 3-valued computations (Section 3) is necessary when defeasibility is used over the well-founded semantics, as well as for other features such as answer abstraction. In terms of efficiency, lazy recomputation avoids recomputing invalidated queries until they are requeried, and IDG abstraction (Section 5) can significantly reduce the amount of time and space required for queries. The efficiency and scalability of the resulting implementation was summarized in Section 6 and discussed in detail in Appendix B. Appendix C provides further information about how to use automatic incremental tabling in practice.

Although the major semantic issues for incremental tabling have been addressed in this paper, KRR-style computations incur a heavy computational burden, and the benchmark programs do show cases where transparent incremental tabling incurs more cost than is desirable. An important goal is to “guarantee” bounds for transparent incremental tabling when used on representative KRR programs. For instance, for constant bounds c1 and c2, initial query time using incremental tabling should never be more than c1 times that of non-incremental tabling; recomputation time should never be significantly more than initial query time; and the space for the IDG should never be more than c2 times the space of the tables themselves. Such bounds may be obtained through a mixture of program analysis (some of which may itself be incremental cf. [Hermenegildo et al. (2000)]) and adaptive incremental tabling algorithms. Even today, incremental tabling is starting to be used to prototype applications in stream deductive databases and event monitoring; continued efficiency improvements should make commercial applications in these areas possible.

## References

• Chen and Warren (1996) Chen, W. and Warren, D. S. 1996. Tabled Evaluation with Delaying for General Logic Programs. Journal of the ACM 43, 1, 20–74.
• Grosof and Swift (2013) Grosof, B. and Swift, T. 2013. Radial restraint: A semantically clean approach to bounded rationality for logic programs. In AAAI Conference on Artificial Intelligence. AAAI Press.
• Hermenegildo et al. (2000) Hermenegildo, M., Puebla, G., Marriott, K., and Stuckey, P. 2000. Incremental Analysis of Constraint Logic Programs. ACM TOPLAS 22, 2 (March), 187–223.
• ISO working group JTC1/SC22 (1995) ISO working group JTC1/SC22. 1995. Prolog International Standard ISO/IEC 13211-1. Tech. rep., International Standards Organization.
• Lindholm and O’Keefe (1987) Lindholm, T. and O’Keefe, R. A. 1987. Efficient implementation of a defensible semantics for Prolog. In Intl. Conf. on Logic Prog. 21–40.
• Lloyd and Topor (1984) Lloyd, J. and Topor, R. 1984. Making Prolog more expressive. Journal of Logic Prog. 1, 3, 225–240.
• Naish (2006) Naish, L. 2006. A three-valued semantics for logic programmers. Theory and Practice of Logic Programming 6, 5, 509–538.
• Ramakrishnan et al. (2007) Ramakrishnan, C., Ramakrishnan, I., and Warren, D. S. 2007. XcelLog: A deductive spreadsheet system. Knowledge Engineering Review 22, 3, 269–279.
• Ramakrishnan et al. (1999) Ramakrishnan, I. V., Rao, P., Sagonas, K., Swift, T., and Warren, D. S. 1999. Efficient access mechanisms for tabled logic programs. Journal of Logic Prog. 38, 1, 31–55.
• Reece et al. (2010) Reece, J., Urry, L., Cain, M., Wasserman, S., Minorsky, P., and Jackson, R. 2010. Campbell Biology. B. Cummings. 9th Edition.
• Riguzzi and Swift (2013) Riguzzi, F. and Swift, T. 2013. Well-definedness and efficient inference for probabilistic logic programming under the distribution semantics. Theory and Practice of Logic Programming 13, 2, 279–302.
• Sagonas and Swift (1998) Sagonas, K. and Swift, T. 1998. An abstract machine for tabled execution of fixed-order stratified logic programs. ACM TOPLAS 20, 3 (May), 586 – 635.
• Sagonas et al. (2000) Sagonas, K., Swift, T., and Warren, D. S. 2000. An abstract machine for efficiently computing queries to well-founded models. Journal of Logic Prog. 45, 1-3, 1–41.
• Saha (2006) Saha, D. 2006. Incremental evaluation of tabled logic programs. Ph.D. thesis, SUNY Stony Brook.
• Saha and Ramakrishnan (2005) Saha, D. and Ramakrishnan, C. 2005. Incremental and demand-driven points-to analysis using logic programming. In Principles and Practice of Decl. Prog. 117–128.
• Swift and Warren (2012) Swift, T. and Warren, D. 2012. XSB: Extending the power of Prolog using tabling. Theory and Practice of Logic Programming 12, 1-2, 157–187.
• Yang et al. (2013) Yang, G., Kifer, M., Wan, H., and Zhao, C. 2013. FLORA-2: User’s Manual Version 0.99.3.
• Zhou and Have (2012) Zhou, N. and Have, C. 2012. Efficient tabling of structured data with enhanced hash-consing. Theory and Practice of Logic Programming 12, 4-5, 547–563.

## Acknowledgements

The research in this paper was partially funded by Vulcan, Inc. and Coherent Knowledge Systems. The author would like to thank Paulo Moura for latex-related help, Fabrizio Riguzzi for making the University of Ferrara server available for benchmarks, and anonymous reviewers for their careful comments. Finally, the author would like to thank Michael Kifer for finding and reporting many, many bugs in automatic incremental tabling.

## Appendix A View Consistency and Table Updates

As discussed in Section 4.2, the approach to maintaining view consistency for automatic incremental tabling has three main parts. (1) A count of the OCCPs for an incremental table is always maintained. (2) When an update affects an incremental table, the view of an OCCP is preserved by copying its unconsumed answers onto the heap and altering the OCCP to use the copied answers. (3) A new instruction returns answers from the preserved views upon backtracking.
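The three steps can be illustrated with a toy Python model of a table and its open consumers. This is only a schematic simulation of the copy-on-update idea; the class and method names are invented here and do not correspond to the SLG-WAM's choice-point and heap machinery:

```python
class Table:
    def __init__(self, answers):
        self.answers = list(answers)
        self.consumers = []          # step 1: open consumers are tracked,
                                     # so their count is always available

    def open_consumer(self):
        c = Consumer(self)
        self.consumers.append(c)
        return c

    def update(self, new_answers):
        # Step 2: before the table changes, preserve each open
        # consumer's view by copying its unconsumed answers.
        for c in self.consumers:
            c.preserve()
        self.answers = list(new_answers)

class Consumer:
    def __init__(self, table):
        self.table = table
        self.pos = 0
        self.preserved = None        # copied view, if any

    def preserve(self):
        if self.preserved is None:
            self.preserved = self.table.answers[self.pos:]
            self.pos = 0

    def next(self):
        # Step 3: once a view is preserved, answers are returned from
        # the copy rather than from the (possibly updated) table.
        src = self.preserved if self.preserved is not None else self.table.answers
        if self.pos < len(src):
            a = src[self.pos]
            self.pos += 1
            return a
        return None

t = Table([1, 2, 3])
c = t.open_consumer()
first = c.next()                     # consumes 1 from the live table
t.update([10, 20])                   # the table changes underneath c
rest = [c.next(), c.next()]          # c still sees its old view: 2, 3
```

A consumer opened after the update sees the new answers, while the older consumer finishes its preserved view — mirroring the view-consistency guarantee described above.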

More than other aspects of automatic incremental tabling, the details of view consistency support rely on tabling data structures and algorithms used by XSB, some background for which is presented here.

• Answer Tries Steps 1 and 2 use the sequence of choice points set up when backtracking through an answer trie, the default data structure used by XSB to represent answers [Ramakrishnan et al. (1999)]. Answer tries are constructed to support substitution factoring, so that they contain only the information used to bind variables in the associated subgoals, i.e., the answer substitution introduced in Section 3. Each trie node contains an SLG-WAM instruction, so that returning an answer substitution directly corresponds to traversing a path from the root of a trie to a leaf, and backtracking through the trie corresponds to traversing the trie in a fixed depth-first order. A choice point is created whenever traversing a new node that has multiple children and is removed when all children have been traversed (through a trust-style instruction).

• Freeze Registers Steps 2 and 3 make use of the SLG-WAM’s HF (heap freeze) register, which is used to protect terms in the heap from being over-written when tabled computations are repeatedly suspended and resumed.

While these data structures are not unique to XSB, other engines that differ from XSB in their representation of answers or in their implementation of suspension and resumption may implement this approach with suitable modifications.

### a.1 Maintaining a Count of OCCPs for a Completed Incremental Table

To maintain a count of OCCPs, a field called occp_num is added to subgoal frames. In addition, the first choice point created in backtracking through an answer trie is modified so that it increases the OCCP number when it is created, and decreases the number when it is removed. Finally, any routines that remove choice points must also be modified to reset occp_num, including code that removes choice points upon executing a cut, and when executing a throw operation.

### a.2 Preserving Views and Altering OCCPs

Once the preserved answer list is constructed, the choice points set into the answer trie are coalesced via coalesce_choice_points() into a new choice point. This routine is easiest to illustrate by its results (Fig. 8). The coalesced choice point takes the address of the first trie choice point, but when it is backtracked into, it will restore the engine environment as it would be if backtracking into that first choice point; and when its choices are exhausted, it will backtrack into the choice point prior to the trie choice points. Of course, the coalesced choice point also contains a field for the preserved answers.

In Fig. 8 the values of the coalesced choice point come from the original trie choice points, with the exception of fields representing heap values. In the stack-oriented backtracking used by Prolog, the preserved answers can be protected by setting the backtracking boundary to the value of the H register after the construction of the preserved answer list. If there is a possibility that tabling will suspend and resume computations, preserve_occp_views() needs to freeze the heap space containing these answers so that the heap cells containing them will not be overwritten. If the HF register is above the bottom of the heap, then there is an active tabled computation, and the heap freeze register is set to the value of the H register after construction of the preserved answer list. The previous value of the HF will be reset using the coalesced choice point's previous_hfreg field once backtracking through it is done.


### a.3 Backtracking through Preserved Views

Fig. 9 shows the new SLG-WAM instruction that returns an answer through a preserved OCCP view when a coalesced choice point is backtracked into. The instruction reconstructs the SLG-WAM state at the time of its call (except for the heap register, which was adjusted to protect the preserved answers). Each answer substitution cell of the coalesced choice point is dereferenced to a heap or local stack cell, the dereferenced cell is bound to an element of the answer substitution, and the binding is trailed. Afterwards, the answer field is reset to point to the next list element if one is present; otherwise the HF register is set to its value before the view was preserved, and the B register is set to the previous choice point.

### a.4 Discussion of View Consistency in Automatic Incremental Tabling

Of course, other approaches to view consistency are possible besides the one just presented. Before the above was implemented, answer tries were extended to include timestamps indicating when a given answer was valid (analogous to that of [Lindholm and O’Keefe (1987)] for dynamic Prolog code). However, the time and space overhead of this approach was deemed to be too high. The actual implementation of the heap copying approach presented here uses XSB’s general tabling code as much as possible, so that the cost to traverse tries and copy answers is generally very low.

It should be noted that the approach to view consistency is more closely linked to the data structures of the XSB engine than are other features of automatic incremental tabling, as view consistency interfaces with XSB’s heap and stack freezing mechanisms.

## Appendix B Performance Results

In the benchmarks that follow, all times are measured in seconds, and all space is measured in bytes unless otherwise specified. (Except for those reported in Section B.2.1, the benchmarks below were performed on a MacBook Pro with a dual core 2.53 GHz Intel i5 chip and 4 gigabytes of RAM. The benchmarks for Section B.2.1 were performed on a server at the University of Ferrara with 3 Intel dual-core 3.47 GHz CPUs and 188 gigabytes of RAM running under Fedora Linux. The default 64-bit, single-threaded SVN repository version of XSB was used for all tests. Benchmark programs can be obtained at www.cs.sunysb.edu/~tswift/interpreters.html.)

### b.1 Transparent Incremental Tabling and Linear Left Recursion

Recursion is heavily used in KRR-style programs that make use of features such as Hilog or defeasibility. As a first test, queries to reach/2 with the first argument bound were made to a left recursive predicate (Fig. 12) with and without IDG abstraction on the edge/2 predicate. As discussed in Section 5 and shown in Fig. 6, the IDG created for such a query may differ greatly depending on whether abstraction is used. In the benchmarks, edge/2 consists of ground facts representing a randomly generated graph, parameterized by the number of possible nodes in the graph and the number of directed edges. Because of the left recursive form of reach/2 together with its query form, the IDG nodes for edge/2 are associated with the open subgoal from clause 1 of reach/2, and with subgoals whose first argument is instantiated by different values in clause 2 of reach/2. Using the re-evaluation strategies described in previous sections, any update to edge/2 will cause a re-evaluation of the queried reach/2 subgoal, so that (in this program fragment) maintaining nodes for the instantiated edge/2 subgoals provides no benefit, as their dependencies will be captured by the abstracted node.

As shown in Fig. 11, if IDG abstraction is not used, creating the IDG adds a CPU time overhead of roughly 50% and a table space overhead of about 300%. By using IDG abstraction at depth 0, the table space overhead becomes approximately 30%, and the time overhead 5-10%. Regardless of whether abstraction is used, Fig. 11 demonstrates scalability over two orders of magnitude; the time scales log-linearly due to the need to maintain indices. Fig. 11 also shows that for batch updates (0.02%-2% of EDB), the overhead of re-evaluation is negligible, particularly if abstraction is used.

### b.2 Analysis of Transparent Incremental Tabling on a Program with KRR-style Features

The program in Fig. 15 represents a social network in which certain members of a population are at risk, and other members of the population may influence the behavior of the at-risk members. Although the program is simplified and idealized in its content, computationally it requires the use of some sophisticated reasoning features. While the program contains stratified negation, its main computational challenge arises from its use of equality, which provides a reasoning capability similar in flavor to some description logics. The predicate equals/2 allows terms using the function symbol parent_of/1 (formed from the EDB predicate parent_of_edb/2) to be considered as equal to constants representing individuals.

The EDB for this program consists of 12 different dynamic predicates as seen at the bottom of the program. (The social network programs and supporting data can be found at http://www.cs.sunysb.edu/~tswift.) The use of the parent_of/1 function within equals/2 quickly leads to non-termination and unsafe negative subgoals during query evaluation. Unsafe negative subgoals are soundly addressed by XSB’s sk_not/1, which skolemizes non-ground variables in an atomic subgoal for the purpose of calling a negative subgoal. Non-termination is addressed in two ways. The use of subgoal abstraction in equals/2 ensures that there will be only a finite number of tabled queries to this predicate, and in general ensures termination for programs with finite models [Riguzzi and Swift (2013)]. However, the predicate possible_risk_association/2 produces an infinite number of answers for the benchmark data set. The use of answer abstraction (or restraint) for this predicate ensures sound (but not complete) terminating query evaluation [Grosof and Swift (2013)]. (Briefly, if an answer has an argument with depth greater than a given bound, the answer is rewritten so that terms with depth equal to the bound are replaced by new variables; then the answer is assigned the truth value undefined.)
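The restraint mechanism just described can be sketched in Python over the same tuple representation of terms, (functor, arg1, ...). The depth measure and names here are illustrative only, not XSB's actual implementation:

```python
UNDEFINED, TRUE = "undefined", "true"

def depth(t):
    """Nesting depth of a term: constants have depth 0."""
    if isinstance(t, tuple):
        return 1 + max((depth(a) for a in t[1:]), default=0)
    return 0

_n = [0]
def _truncate(t, remaining):
    """Cut a term off at a depth budget, inserting fresh variables."""
    if remaining == 0:
        _n[0] += 1
        return "_V%d" % _n[0]
    if isinstance(t, tuple):
        return (t[0],) + tuple(_truncate(a, remaining - 1) for a in t[1:])
    return t

def restrain(answer, bound):
    """If any argument of an answer exceeds `bound`, truncate the answer
    and assign it the truth value undefined; otherwise it stays true."""
    if all(depth(a) <= bound for a in answer[1:]):
        return answer, TRUE
    return _truncate(answer, bound + 1), UNDEFINED
```

For example, restraining p(g(2)) at bound 2 leaves it true, while restraining p(f(g(1))) at bound 1 truncates the deep argument and marks the answer undefined, yielding a three-valued model.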

Thus, the ability to incrementally maintain tables for queries to this program requires the ability to update three-valued models that arise from answer abstraction, combined with tabled negation and subgoal abstraction. As a first benchmark test, a small EDB of about 10,000 facts about a population of 10,000 persons was generated, and good_influence/2 was queried for 200 randomly chosen values for its first argument. If no incremental tabling was used, the combined CPU time for these queries averaged 1.14 seconds and table space was about 233 megabytes — as discussed further below, the relatively large cost for this query was almost entirely due to the use of equality. When transparent incremental tabling was used with no abstraction, the cost rose to 3.02 seconds and 865 megabytes. By applying IDG abstraction, the initial query time dropped to 2.73 seconds and 655 megabytes. The purpose of this set of declarations was only to test the overhead of automatic incremental tabling for queries and updates: they should not necessarily be considered to be “optimal” for these tests.

Fig. 16 shows times to re-evaluate the queries to good_influence/2 mentioned above after inserting randomly generated facts for a given predicate (the “Asserts” column), and then after retracting these inserted facts (the “Retracts” column). Most of the times in Fig. 16 are near the level of noise; however, recomputation of several of the predicates timed out. (Timeouts, denoted Tout in Fig. 16, were triggered after one minute; the short timeout period was to avoid excessive memory consumption on the laptop benchmarking machine. Retracts of bulk inserts could not be measured, and are designated as n/a. As the population size was 10,000, 12,500 distinct facts could not be generated for the unary EDB predicate has_disease/1.) Analysis of these timeouts showed that they arose because the additional facts caused a large number of new (sub-)tables to be created for the 200 queries. Usually, this only occurred after 12,500 facts were added, but for parent_of_edb/2, which strongly affects goals to equals/2, the addition of 500 facts led to a timeout, while the addition of 100 facts led to a 5.57 second recomputation time. Although the program is not wholly monotonic, it is largely so, and computations after retractions were always fast. Fig. 17 shows the times to assert or retract, plus the time taken to invalidate affected subgoals via traverse_affected_nodes(). Except for updates to parent_of_edb/2, invalidation did not take a significant amount of time.
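The division of work between update-time invalidation (the traverse_affected_nodes() phase timed in Fig. 17) and query-time recomputation can be sketched with a toy dependency graph in Python. The class and names below are illustrative; they are not XSB's actual IDG data structures:

```python
class IDG:
    def __init__(self):
        self.affects = {}       # node -> set of nodes it affects
        self.valid = {}         # node -> is the table up to date?
        self.evals = 0          # number of re-evaluations performed

    def add_dep(self, dependent, dependee):
        """Record that `dependent` was computed using `dependee`."""
        self.affects.setdefault(dependee, set()).add(dependent)
        self.valid.setdefault(dependent, True)
        self.valid.setdefault(dependee, True)

    def update_fact(self, node):
        """Update time: only mark transitively affected nodes invalid."""
        stack = [node]
        while stack:
            n = stack.pop()
            if self.valid.get(n, True):
                self.valid[n] = False
                stack.extend(self.affects.get(n, ()))

    def query(self, node):
        """Query time: lazily re-evaluate only if the table is invalid."""
        if not self.valid.get(node, True):
            self.evals += 1     # stands in for incremental_reeval()
            self.valid[node] = True
        return node

g = IDG()
g.add_dep("reach(a,Y)", "edge(_,_)")   # tabled subgoal depends on EDB
g.update_fact("edge(_,_)")             # an assert/retract invalidates...
g.update_fact("edge(_,_)")             # ...idempotently, and cheaply
g.query("reach(a,Y)")                  # ...while the query recomputes once
```

The sketch shows why invalidation is cheap (a marking traversal) while the expensive recomputation is deferred until a query actually needs the table, and is performed at most once per batch of updates.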

#### b.2.1 Scalability Analysis on a Program with KRR Features

As a next step, the equality relation in the previously mentioned program of Fig. 15 was specialized so that it had the form:

    :- table equals/2 as incremental, subgoal_abstract(3).

    equals(X,Y):- atomic(X), Y = parent_of(_), equals(Y,X).
    equals(parent_of(X),parent_of(X)).
    equals(parent_of(X),Y):- parent_of_edb(X,Y).
    equals(parent_of(parent_of(X)),Y):- parent_of_edb(X,Z), equals(parent_of(Z),Y1), Y1 = Y.

In this form, the first clause of equals/2 is changed so that symmetry is applied only if the first argument corresponds to a nominal individual (constant), and the second argument has a functional form. The fourth clause is changed so that subgoals of the form equals(parent_of(parent_of(X)),Y) are not called by this clause, but instead subgoals of the form equals(parent_of(Z),Y1) are called. These changes, which do not affect the semantics of the program, significantly reduce the time and space required for query evaluation, although goals to equals/2 are still computationally expensive to update.

With this change, a series of 200 queries as described above were tested on EDBs ranging from around 100,000–10,000,000 facts. As shown in Fig. 18, the space and time for these computations scales roughly linearly. For the EDB of about 10,000,000 facts, various batch updates were timed along with the time to re-evaluate queries (Figs. 19 and 20). Specifically, for batch sizes up to 312,500 facts, asserts of each EDB predicate were performed and timed; and then the asserted facts were retracted and timed. Except for updates to parent_of_edb/2, re-evaluation time was low compared to initial query time (even compared to the initial query time for non-incremental tabling). These benchmarks illustrate the scalability of this implementation of automatic incremental tabling even for very large IDGs. In Figs. 19 and 20, the IDG contained over 750 million edges; after the update sequences mentioned above were applied, it contained more than 1 billion edges.

## Appendix C A Note on Usability

The XSB manual contains information on how transparent incremental tabling may be used in practice; however, to make this paper self-contained, we provide an outline of some usability and system aspects.

XSB has a variety of tabling mechanisms that are used for different purposes. As seen from Fig. 15, automatic incremental tabling works properly with subgoal abstraction and with answer abstraction; as discussed in Section 3, automatic incremental tabling works properly with well-founded negation regardless of the tabled negation operator: for instance with sk_not/1 in Fig. 15, or with other XSB operators such as tnot/1. It also works properly with tabled attributed variables (supporting tabled constraints). A variety of dynamic code may be used as a basis for automatic incremental tabling including not only regular facts and rules, but also facts that are interned as XSB tries. Incremental tables, of whatever form, may be used alongside non-incremental tables, although special declarations must be made if an incremental table depends on a non-incremental table.

Within the current version of XSB, automatic incremental tabling does not yet work properly with call subsumption, answer subsumption, hash-consed tables, or multi-threaded tables; also, predicates that are tabled as incremental must use static code rather than dynamic code. Attempts to declare a predicate using an unsupported mixture of tabling features causes a compile-time permission error.

There are situations where it is convenient or necessary to abolish an incremental table rather than updating it. An example of this occurs when an exception is thrown. If an exception is thrown over a choice point to a completed table, no action need be taken; however, if an exception is thrown over a choice point to an incomplete tabled subgoal (including one that is being recomputed), XSB abolishes the table, as its computation has become compromised. In automatic incremental tabling, abolishing an incremental table is not problematic. If a table T is to be abolished, tables that depend on T must be invalidated before actually abolishing T itself. When a call is made to a subgoal with an invalidated affected node, portions of the IDG that were removed through abolishing T will be reconstructed during the calls made by incremental_reeval(), due to the actions of lazy recomputation.
