Column-Oriented Datalog Materialization for Large Knowledge Graphs
(Extended Technical Report)
Abstract
The evaluation of Datalog rules over large Knowledge Graphs (KGs) is essential for many applications. In this paper, we present a new method of materializing Datalog inferences, which combines a column-based memory layout with novel optimization methods that avoid redundant inferences at runtime. The proactive caching of certain subqueries further increases efficiency. Our empirical evaluation shows that this approach can often match or even surpass the performance of state-of-the-art systems, especially under restricted resources.
Jacopo Urbani, Dept. Computer Science, VU University Amsterdam, Amsterdam, The Netherlands, jacopo@cs.vu.nl
Ceriel Jacobs, Dept. Computer Science, VU University Amsterdam, Amsterdam, The Netherlands, c.j.h.jacobs@vu.nl
Markus Krötzsch, Faculty of Computer Science, Technische Universität Dresden, Dresden, Germany, markus.kroetzsch@tu-dresden.de
This is the extended version of the eponymous paper published at AAAI 2016 (?).
Introduction
Knowledge graphs (KGs) are widely used in industry and academia to represent large collections of structured knowledge. While many types of graphs are in use, they all rely on simple, highly normalized data models that can be used to uniformly represent information from many diverse sources. On the Web, the most prominent such format is RDF (?), and large KGs such as Bio2RDF (?), DBpedia (?), Wikidata (?), and YAGO (?) are published in this format.
The great potential of KGs is their ability to make connections – in a literal sense – between heterogeneous and often incomplete data sources. Inferring implicit information from KGs is therefore essential in many applications, such as ontological reasoning, data integration, and information extraction. The rule-based language Datalog offers a common foundation for specifying such inferences (?). While Datalog rules are rather simple types of if-then rules, their recursive nature makes them powerful. Many inference tasks can be captured in this framework, including many types of ontological reasoning commonly used with RDF. Datalog thus provides an excellent basis for exploiting KGs to the full.
Unfortunately, the implementation of Datalog inferencing on large KGs remains very challenging. The task is worst-case time-polynomial in the size of the KG, and hence tractable in principle, but huge KGs are difficult to manage. A preferred approach is therefore to materialize (i.e., precompute) inferences. Modern DBMS such as Oracle 11g and OWLIM materialize KGs of 100M–1B edges in times ranging from half an hour to several days (?; ?). Research prototypes such as Marvin (?), C/MPI (?), WebPIE (?), and DynamiTE (?) achieve scalability by using parallel or distributed computing, but often require significant hardware resources. ?, e.g., used up to 64 high-end machines to materialize a KG with 100B edges in 14 hours (?). In addition, all the above systems only support (fragments of) the OWL RL ontology language, which is subsumed by Datalog but significantly simpler.
? have recently presented a completely new approach to this problem (?). Their system RDFox exploits fast main-memory computation and parallel processing. A groundbreaking insight of this work is that this approach allows processing mid-sized KGs on commodity machines. This has opened up a new research field for in-memory Datalog systems, and ? have presented several advancements (?; ?; ?).
Inspired by this line of research, we present a new approach to in-memory Datalog materialization. Our goal is to further reduce memory consumption to enable even larger KGs to be processed on even simpler computers. To do so, we propose to maintain inferences in an ad-hoc column-based storage layout. In contrast to traditional row-based layouts, where a data table is represented as a list of tuples (rows), column-based approaches use a tuple of columns (value lists) instead. This enables more efficient joins (?) and effective, yet simple data compression schemes (?). However, these advantages are offset by the comparatively high cost of updating column-based data structures (?). This is a key challenge for using this technology during Datalog materialization, where frequent insertions of large numbers of newly derived inferences need to be processed. Indeed, to the best of our knowledge, no materialization approach has yet made use of columnar data structures. Our main contributions are as follows:

We design novel column-based data structures for in-memory Datalog materialization. Our memory-efficient design organizes inferences by rule and inference step.

We develop novel optimization techniques that reduce the amount of data that is considered during materialization.

We introduce a new memoization method (?) that caches results of selected subqueries proactively, improving the performance of our procedure and optimizations.

We evaluate a prototype implementation of our approach.
Evaluation results show that our approach can significantly reduce the amount of main memory needed for materialization, while maintaining competitive runtimes. This allowed us to materialize fairly large graphs on commodity hardware. Evaluations also show that our optimizations contribute significantly to this result.
Proofs for the claims in this paper can be found in the appendix.
Preliminaries
We define Datalog in the usual way; details can be found in the textbook by ? (?). We assume a fixed signature consisting of an infinite set of constant symbols, an infinite set of predicate symbols, and an infinite set of variable symbols. Each predicate $p$ is associated with an arity $\mathit{ar}(p)$. A term is a variable or a constant. We use symbols $s$, $t$ for terms; $x$, $y$, $z$, $v$, $w$ for variables; and $a$, $b$, $c$ for constants. Expressions like $\mathbf{t}$, $\mathbf{x}$, and $\mathbf{a}$ denote finite lists of such entities. An atom is an expression $p(\mathbf{t})$ where $p$ is a predicate and $|\mathbf{t}| = \mathit{ar}(p)$. A fact is a variable-free atom. A database instance is a finite set of facts. A rule is an expression of the form
(1) $H \leftarrow B_1, \ldots, B_n$
where $H$ and $B_1$, …, $B_n$ are head and body atoms, respectively. We assume rules to be safe: every variable in $H$ must also occur in some $B_i$. A program is a finite set of rules.
Predicates that occur in the head of a rule are called intensional (IDB) predicates; all other predicates are extensional (EDB). IDB predicates must not appear in databases. Rules with at most one IDB predicate in their body are linear.
A substitution is a partial mapping . Its application to atoms and rules is defined as usual. For a set of facts and a rule as in (1), we define . For a program , we define , and shortcuts and . The set is the materialization of with . This materialization is finite, and contains all facts that are logical consequences of .
Knowledge graphs are often encoded in the RDF data model (?), which represents labelled graphs as sets of triples of the form . Technical details are not relevant here. Schema information for RDF graphs can be expressed using the W3C OWL Web Ontology Language. Since OWL reasoning is complex in general, the standard offers three lightweight profiles that simplify this task. In particular, OWL reasoning can be captured with Datalog in all three cases, as shown by ? (?; ?) and (implicitly by translation to path queries) by ? (?).
The simplest encoding of RDF data for Datalog is to use a ternary EDB predicate triple to represent triples. We use a simple Datalog program as a running example:
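(2) $\mathsf{T}(x, v, y) \leftarrow \mathsf{triple}(x, v, y)$
(3) $\mathsf{Inverse}(v, w) \leftarrow \mathsf{T}(v, \mathsf{owl{:}inverseOf}, w)$
(4) $\mathsf{T}(y, w, x) \leftarrow \mathsf{Inverse}(v, w), \mathsf{T}(x, v, y)$
(5) $\mathsf{T}(y, v, x) \leftarrow \mathsf{Inverse}(v, w), \mathsf{T}(x, w, y)$
(6) $\mathsf{T}(x, \mathsf{hasPart}, z) \leftarrow \mathsf{T}(x, \mathsf{hasPart}, y), \mathsf{T}(y, \mathsf{hasPart}, z)$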
To infer new triples, we need an IDB predicate T, initialised in rule (2). Rule (3) “extracts” an RDF-encoded OWL statement that declares a property to be the inverse of another. Rules (4) and (5) apply this information to derive inverted triples. Finally, rule (6) is a typical transitivity rule for the RDF property hasPart.
We abbreviate hasPart, partOf and owl:inverseOf by hP, pO and iO, respectively. Now consider a database . Iteratively applying rules (2)–(6) to , we obtain the following new derivations in each step, where superscripts indicate the rule used to produce each fact:
No further facts can be inferred. For example, applying rule (5) to only yields duplicates of previous inferences.
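The derivation steps of the running example can be reproduced with a few lines of Python. The following is a minimal sketch of naive fixpoint materialization, not the method proposed in this paper. The concrete rule shapes, the predicate name `Inverse`, and the three input triples are assumptions chosen to agree with the prose description of rules (2)–(6); variables are written with a leading `?`.

```python
# Naive fixpoint evaluation of a tiny Datalog program (illustrative sketch).
# Atoms and facts are tuples (predicate, term, ...); "?x" denotes a variable.

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def match(atom, fact, subst):
    """Extend subst so that atom equals fact, or return None if impossible."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    extended = dict(subst)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended

def apply_rule(head, body, facts):
    """All instances of head whose body atoms are matched by facts."""
    substs = [{}]
    for atom in body:
        extended = []
        for subst in substs:
            for fact in facts:
                new_subst = match(atom, fact, subst)
                if new_subst is not None:
                    extended.append(new_subst)
        substs = extended
    return {(head[0],) + tuple(s.get(t, t) for t in head[1:]) for s in substs}

def materialize(rules, database):
    """Apply all rules until no new fact is derived."""
    facts = set(database)
    while True:
        new = set()
        for head, body in rules:
            new |= apply_rule(head, body, facts)
        new -= facts
        if not new:
            return facts
        facts |= new

# Assumed shapes of rules (2)-(6); hP, pO, iO abbreviate the RDF properties.
RULES = [
    (("T", "?x", "?v", "?y"), [("triple", "?x", "?v", "?y")]),
    (("Inverse", "?v", "?w"), [("T", "?v", "iO", "?w")]),
    (("T", "?y", "?w", "?x"), [("Inverse", "?v", "?w"), ("T", "?x", "?v", "?y")]),
    (("T", "?y", "?v", "?x"), [("Inverse", "?v", "?w"), ("T", "?x", "?w", "?y")]),
    (("T", "?x", "hP", "?z"), [("T", "?x", "hP", "?y"), ("T", "?y", "hP", "?z")]),
]
# Hypothetical input data: a two-edge hasPart chain plus one inverseOf statement.
DATABASE = {("triple", "a", "hP", "b"), ("triple", "b", "hP", "c"),
            ("triple", "hP", "iO", "pO")}
RESULT = materialize(RULES, DATABASE)
```

On this assumed input, the fixpoint contains the transitive fact `T(a, hP, c)` and the inverted `pO` facts; a further application of the fifth rule only reproduces facts that are already present.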
Semi-Naive Evaluation
Our goal is to compute the materialization . For this we use a variant of the well-known technique of semi-naive evaluation (SNE) (?) that is based on a more fine-grained notion of derivation step.
In each step of the algorithm, we apply one rule to the facts derived so far. We do this fairly, so that each rule will be applied arbitrarily often. This differs from standard SNE where all rules are applied in parallel in each step. We write for the rule applied in step , and for the set of new facts with predicate derived in step . Note that if is not the head predicate of . Moreover, for numbers , we define the set of all facts derived between steps and . Consider a rule
(7) 
where are IDB predicates and are EDB predicates. The naive way to apply in step to compute is to evaluate the following “rule” (treating sets of facts like predicates is a common abuse of notation for explaining SNE (?)):
(8) 
and to set . However, this would recompute all previous inferences of in each step where is applied. Assume that rule has last been evaluated in step . We can restrict to evaluating the following rules:
(9) 
for all . With the union of all sets of facts derived from these rules, we can define as before. It is not hard to see that the rules of form (9) consider all combinations of facts that are considered in rule (8). We call this procedure the one-rule-per-step variant of SNE. The procedure terminates if all rules in have been applied in the last steps without deriving any new facts.
Theorem 1
For every input database instance , and for every fair application strategy of rules, the one-rule-per-step variant of SNE terminates in some step with the result .
SNE is still far from avoiding all redundant computations. For example, any strategy of applying rules (2)–(6) above will lead to being derived by rule (4). This new inference will be considered in the next application of the second SNE variant of rule (5), leading to the derivation of . However, this fact must be a duplicate since it is necessary to derive in the first place.
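The idea of restricting each rule application to facts derived since the rule was last applied can be sketched for a single recursive rule, reach(x, z) ← reach(x, y), reach(y, z). The Python code below is a minimal illustration under stated assumptions: the predicate and function names are hypothetical, and the split of step ranges into "old" and "new" facts follows the standard semi-naive scheme rather than reproducing rules (8) and (9) literally.

```python
# Semi-naive evaluation with one table ("block") of new facts per derivation step.

def facts_in(blocks, lo, hi):
    """Union of all facts derived in steps lo..hi (inclusive)."""
    return {fact for step, table in blocks if lo <= step <= hi for fact in table}

def join(left, right):
    """Evaluate reach(x, z) <- left(x, y), right(y, z)."""
    return {(x, z) for (x, y) in left for (y2, z) in right if y == y2}

def transitive_closure(edges):
    blocks = [(1, set(edges))]  # step 1 applies reach(x, y) <- edge(x, y)
    step = 1                    # number of steps performed so far
    last = 0                    # step in which the recursive rule was last applied
    while True:
        everything = facts_in(blocks, 0, step)
        new = facts_in(blocks, last, step)    # derived since the last application
        old = facts_in(blocks, 0, last - 1)   # already seen by the last application
        # Two restricted variants of the rule: the "new" range is placed on the
        # first or on the second body atom. Pairs of two old facts are skipped.
        derived = join(new, old) | join(everything, new)
        delta = derived - everything          # duplicate elimination
        step += 1
        if not delta:
            return everything
        blocks.append((step, delta))
        last = step

CLOSURE = transitive_closure({(1, 2), (2, 3), (3, 4)})
```

Every pair of facts is still considered at least once, because any pair containing a new fact is covered by one of the two variants, while pairs of old facts were covered by an earlier application.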
Column-Oriented Datalog Materialization
Our variant of SNE provides us with a high-level materialization procedure. To turn this into an efficient algorithm, we use a column-based storage layout described next.
Our algorithms distinguish the data structures used for storing the initial knowledge graph (EDB layer) from those used to store derivations (IDB layer), as illustrated in Fig. 1. The materialization process accesses the KG by asking conjunctive queries to the EDB layer. There are well-known ways to implement this efficiently, such as (?), and hence we focus on the IDB layer here.
Our work is inspired by column-based databases (?), an alternative to traditional row-based databases for efficiently storing large data volumes. Their superior performance on analytical queries comes at the price of lower performance for data updates. Hence, we structure the IDB layer using a column-based layout in a way that avoids the need for frequent updates. To achieve this, we store each of the sets of inferences that are produced during the derivation in a separate column-oriented table. The table for is created when applying in step and never modified thereafter. We store the data for each rule application (step number, rule, and table) in one block, and keep a separate list of blocks for each IDB predicate. The set of facts derived for one IDB predicate is the union of the contents of all tables in the list of blocks for . Figure 1 illustrates this scheme, and shows the data computed for the running example.
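The block organization described above can be sketched as a small append-only data structure. This Python sketch is only an illustration: the class names `Block` and `IDBLayer`, the method names, and the rule identifiers are hypothetical and do not reflect the actual VLog implementation.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Block:
    step: int       # derivation step that produced the table
    rule: str       # identifier of the rule applied in that step
    columns: tuple  # one tuple of values per column, all of equal length

class IDBLayer:
    """Append-only store: one list of immutable blocks per IDB predicate."""

    def __init__(self):
        self._blocks = defaultdict(list)

    def add(self, predicate, step, rule, rows):
        rows = sorted(set(rows))  # tables are kept sorted and duplicate-free
        if rows:                  # empty results do not need a block
            columns = tuple(zip(*rows))
            self._blocks[predicate].append(Block(step, rule, columns))

    def blocks(self, predicate, lo=0, hi=float("inf")):
        """Blocks of a predicate that were derived in steps lo..hi."""
        return [b for b in self._blocks.get(predicate, []) if lo <= b.step <= hi]

    def facts(self, predicate):
        """The derived facts of a predicate: the union of all its tables."""
        return {row for b in self.blocks(predicate) for row in zip(*b.columns)}

layer = IDBLayer()
layer.add("T", 1, "rule2", [("b", "hP", "c"), ("a", "hP", "b")])
layer.add("T", 2, "rule6", [("a", "hP", "c")])
layer.add("T", 3, "rule6", [])  # nothing new was derived: no block is created
```

Because a block is never modified after creation, selecting the facts of a given step range or of a given rule amounts to filtering the block list.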
The columnar tables for are sorted by extending the order of integer indices used for constants to tuples of integers in the natural way (lexicographic order of tuples). Therefore, the first column is fully sorted, the second column is a concatenation of sorted lists for each interval of tuples that agree on the first component, and so on. Each column is compressed using run-length encoding (RLE), where maximal sequences of repeated constants are represented by pairs (?).
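The effect of sorting followed by run-length encoding can be illustrated in Python. In this sketch, constants are assumed to be integer identifiers, and each run is encoded as a (value, run length) pair; the exact pair layout used by the implementation is an assumption here.

```python
from itertools import groupby

def rle_encode(column):
    """Replace each maximal run of equal values by a (value, length) pair."""
    return [(value, len(list(run))) for value, run in groupby(column)]

def rle_decode(pairs):
    """Expand (value, length) pairs back into the original column."""
    return [value for value, length in pairs for _ in range(length)]

def encode_table(rows):
    """Sort rows lexicographically, split into columns, compress each column."""
    return [rle_encode(column) for column in zip(*sorted(rows))]

# Three facts whose middle component is the same constant (identifier 7).
ENCODED = encode_table([(2, 7, 3), (1, 7, 3), (1, 7, 2)])
```

The fully sorted first column and the constant second column compress well, while later columns are only sorted within groups and therefore typically contain shorter runs.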
Our approach enables valuable space savings for in-memory computation. Ordering tables improves compression rates, and rules with constants in their heads (e.g., (6)) lead to constant columns, which occupy almost no memory. Furthermore, columns of EDB relations can be represented by queries that retrieve their values from the EDB layer, rather than by a copy of these values. Finally, many inference rules simply “copy” data from one predicate to another, e.g., to define a subclass relationship, so we can often share column objects in memory rather than allocating new space.
We also obtain valuable time savings. Sorting tables means they can be used in merge joins, the most efficient type of join, where two sorted relations are compared in a single pass. This also enables efficient, set-at-a-time duplicate elimination, which we implement by performing outer merge joins between a newly derived result and all previously derived tables . The use of separate tables for each eliminates the cost of insertions, and at the same time enables efficient bookkeeping to record the derivation step and rule used to produce each inference. Step information is needed to implement SNE, but the separation of inferences by rule enables further optimizations (see next section).
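Both uses of sorted tables mentioned above can be sketched in Python. The functions below are illustrative only: `merge_join` joins two relations of (key, value) pairs sorted by key, and `merge_difference` removes duplicates with a single simultaneous pass, which is a simplified stand-in for the outer merge join used for duplicate elimination.

```python
def merge_join(left, right):
    """Join two lists of (key, value) pairs that are sorted by key."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            i += 1
        elif left[i][0] > right[j][0]:
            j += 1
        else:
            key = left[i][0]
            j_end = j
            while j_end < len(right) and right[j_end][0] == key:
                j_end += 1  # find the end of the matching run on the right
            while i < len(left) and left[i][0] == key:
                for k in range(j, j_end):
                    out.append((key, left[i][1], right[k][1]))
                i += 1
            j = j_end
    return out

def merge_difference(new, old):
    """Rows of the sorted list `new` that do not occur in the sorted list `old`."""
    out, j = [], 0
    for row in new:
        while j < len(old) and old[j] < row:
            j += 1  # skip old rows that are smaller than the current row
        if j == len(old) or old[j] != row:
            out.append(row)
    return out

JOINED = merge_join([(1, "a"), (2, "b"), (2, "c")], [(2, "x"), (2, "y"), (3, "z")])
FRESH = merge_difference([(1, 2), (1, 3), (2, 2)], [(1, 3), (4, 4)])
```

Neither function revisits a row of its inputs except within a run of equal join keys, which is what makes the single-pass comparison possible.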
There is also an obvious difficulty for using our approach. To evaluate an SNE rule (9), we need to find all answers to the rule’s body, viewed as a conjunctive query. This can be achieved by computing the following join:
(10) 
The join of the EDB predicates can be computed efficiently by the EDB layer; let denote the resulting relation. Proceeding from left to right, we now need to compute . However, our storage scheme stores the second relation in many blocks, so that we actually must compute , which could be expensive if there are many non-empty blocks in the range .
We reduce this cost by performing on-demand concatenation of tables: before computing the join, we consolidate () in a single data structure. This structure is either a hash table or a fully sorted table – the rule engine decides heuristically to use a hash or a merge join. In either case, we take advantage of our columnar layout and concatenate only columns needed in the join, often just a single column. The join performance gained with such a tailor-made data structure justifies the cost of on-demand concatenation. We delete the auxiliary structures after the join.
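The hash-table variant of on-demand concatenation can be sketched as follows in Python. This is an illustration under assumptions: blocks are modelled as plain lists of columns, only the join column is consolidated into a temporary index, and the function names are hypothetical.

```python
from collections import defaultdict

def build_hash_index(blocks, col):
    """Consolidate one column of several blocks: value -> [(block no, row no)]."""
    index = defaultdict(list)
    for block_no, columns in enumerate(blocks):
        for row_no, value in enumerate(columns[col]):
            index[value].append((block_no, row_no))
    return index

def hash_join(partial, key_pos, blocks, col):
    """Join rows of `partial` (on position key_pos) with column `col` of the blocks."""
    index = build_hash_index(blocks, col)  # auxiliary structure for this join only
    out = []
    for row in partial:
        for block_no, row_no in index.get(row[key_pos], ()):
            # The remaining columns are read from the original block on demand.
            out.append(row + tuple(column[row_no] for column in blocks[block_no]))
    return out                             # the index is discarded on return

# Two blocks of a binary relation, each stored as a list of two columns.
BLOCKS = [[(1, 2), (10, 20)], [(2, 3), (21, 30)]]
JOINED = hash_join([("p", 2), ("q", 5)], 1, BLOCKS, 0)
```

Only the join column is copied into the index, so the cost of consolidation grows with the size of one column rather than with the size of all tables.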
This approach is used whenever the union of many IDB tables is needed in a join. However, especially the expression may often refer to only one (non-empty) block, in which case we can work directly on its data. We use several optimizations that aim to exclude some non-empty blocks from a join so as to make this more likely, as described next.
Dynamic Optimization
Our storage layout is most effective when only a few blocks of fact tables must be considered for applying a rule, as this will make on-demand concatenation simpler or completely obsolete. An important advantage of our approach is that we can exclude individual blocks when applying a rule, based on any information that is available at this time.
We now present three different optimization techniques whose goal is precisely this. In each case, assume that we have performed derivation steps and want to apply rule of the form (7) in step , and that was the last step in which has been applied. We consider each of the versions of the SNE rule (9) separately. We start by gathering, for each IDB atom in the body of , the relevant range of non-empty tables . We also record which rule was used to create this table in step .
Mismatching Rules
An immediate reason for excluding from the join is that the head of does not unify with . This occurs when there are distinct constant symbols in the two atoms. In such a case, it is clear that none of the IDB facts in can contribute to matches of , so we can safely remove from the list of blocks considered for this body atom. For example, rule (3) can always ignore inferences of rule (6), since the constants hasPart and owl:inverseOf do not match.
We can even apply this optimization if the head of unifies with the body atom , by exploiting the information contained in partial results obtained when computing the join (10) from left to right. Simplifying notation, we can write (10) as follows:
(11) 
where denotes the relation obtained by joining the EDB atoms. We compute this ary join by applying binary joins from left to right. Thus, the decision about the blocks to include for only needs to be made when we have already computed the relation . This relation yields all possible instantiations for the variables that occur in the terms , and we can thus view as a set of possible partial substitutions that may lead to a match of the rule. Using this notation, we obtain the following result.
Theorem 2
If, for all , the atom does not unify with the head of , then the result of (10) remains the same when replacing the relation by .
This turns a static optimization technique into a dynamic, data-driven optimization. While the static approach required a mismatch of rules under all possible instantiations, the dynamic version considers only a subset of those, which is guaranteed to contain all actual matches. This idea can be applied to other optimizations as well. In any case, implementations must decide if the cost of checking a potentially large number of partial instantiations in is worth paying in the light of the potential savings.
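The static and the dynamic form of this check can be sketched in Python. The code assumes function-free atoms written as tuples, variables marked by a leading `?`, and variable names of the two atoms that are already disjoint; the function names and the example atoms are hypothetical.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def unifies(atom1, atom2):
    """True if two function-free atoms with disjoint variables unify."""
    if atom1[0] != atom2[0] or len(atom1) != len(atom2):
        return False
    binding = {}

    def resolve(term):
        while is_var(term) and term in binding:
            term = binding[term]
        return term

    for s, t in zip(atom1[1:], atom2[1:]):
        s, t = resolve(s), resolve(t)
        if s == t:
            continue
        if is_var(s):
            binding[s] = t
        elif is_var(t):
            binding[t] = s
        else:
            return False  # two distinct constants
    return True

def substitute(atom, subst):
    return (atom[0],) + tuple(subst.get(t, t) for t in atom[1:])

def can_skip_block(body_atom, producer_head, partial_substs=({},)):
    """A block can be skipped if no instantiated body atom unifies with the
    head of the rule that produced the block."""
    return all(not unifies(substitute(body_atom, s), producer_head)
               for s in partial_substs)

HEAD = ("T", "?x1", "hP", "?z1")  # head of an assumed transitivity rule
STATIC = can_skip_block(("T", "?v", "iO", "?w"), HEAD)
UNDECIDED = can_skip_block(("T", "?x", "?v", "?y"), HEAD)
DYNAMIC = can_skip_block(("T", "?x", "?v", "?y"), HEAD, [{"?v": "pO"}])
```

With the default single empty substitution, `can_skip_block` performs the static test; passing the partial substitutions obtained from the join computed so far yields the dynamic test.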
Redundant Rules
A rule is trivially redundant if its head atom occurs in its body. Such rules do not need to be applied, as they can only produce duplicate inferences. While trivially redundant rules are unlikely to occur in practice, the combination of two rules frequently has this form. Namely, if the head of unifies with , then we can resolve rule with , i.e., apply backward chaining, to obtain a rule of the form:
(12) 
where is a variant of the body of to which a most general unifier has been applied. If rule is trivially redundant, we can again ignore . Moreover, we can again turn this into a dynamic optimization method by using partially computed joins as above.
Theorem 3
If, for all , the rule is trivially redundant, then the result of (10) remains the same when replacing the relation by .
For example, assume we want to apply rule (5) of our initial example, and was derived by rule (4). Using backward chaining, we obtain , which is not trivially redundant. However, evaluating the first part of the body for our initial example data, we obtain just a single substitution . Now is trivially redundant. This optimization depends on the data, and cannot be found by considering rules alone.
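The backward-chaining step and the redundancy test can be sketched in Python. The two rules below are assumed shapes of the inverse rules of the running example (with the hypothetical predicate name `Inverse`), and the substitution `SIGMA` is an assumed result of evaluating the first part of the resolved body; all function names are illustrative.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def substitute(atom, subst):
    return (atom[0],) + tuple(subst.get(t, t) for t in atom[1:])

def rename(rule, suffix):
    """Rename all variables of a rule apart by appending a suffix."""
    head, body = rule

    def ren(atom):
        return (atom[0],) + tuple(t + suffix if is_var(t) else t for t in atom[1:])

    return ren(head), [ren(atom) for atom in body]

def mgu(atom1, atom2):
    """Most general unifier of two function-free atoms, or None."""
    if atom1[0] != atom2[0] or len(atom1) != len(atom2):
        return None
    binding = {}

    def walk(term):
        while is_var(term) and term in binding:
            term = binding[term]
        return term

    for s, t in zip(atom1[1:], atom2[1:]):
        s, t = walk(s), walk(t)
        if s == t:
            continue
        if is_var(t):
            binding[t] = s  # prefer to keep the variables of atom1
        elif is_var(s):
            binding[s] = t
        else:
            return None
    return {var: walk(var) for var in binding}

def resolve(rule, k, producer):
    """Replace body atom k of `rule` by the body of `producer`."""
    head, body = rule
    p_head, p_body = rename(producer, "'")
    theta = mgu(body[k], p_head)
    if theta is None:
        return None
    new_body = body[:k] + p_body + body[k + 1:]
    return substitute(head, theta), [substitute(a, theta) for a in new_body]

def trivially_redundant(rule, subst=None):
    """True if the (instantiated) head occurs in the (instantiated) body."""
    head, body = rule
    subst = subst or {}
    return substitute(head, subst) in [substitute(a, subst) for a in body]

RULE_4 = (("T", "?y", "?w", "?x"), [("Inverse", "?v", "?w"), ("T", "?x", "?v", "?y")])
RULE_5 = (("T", "?y", "?v", "?x"), [("Inverse", "?v", "?w"), ("T", "?x", "?w", "?y")])
RESOLVED = resolve(RULE_5, 1, RULE_4)
SIGMA = {"?v": "hP", "?v'": "hP", "?w": "pO"}
```

Under these assumptions, the resolved rule is not trivially redundant on its own, but it becomes so once `SIGMA` is applied, which mirrors the data-dependent nature of the optimization.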
Subsumed Rules
Many further optimizations can be realized using our novel storage layout. As a final example, we present an optimization that we have not implemented yet, but which we think is worth mentioning as it is theoretically sound and may show a promising direction for future work. Namely, we consider the case where some of the inferences of rule were already produced by another rule since the last application of in step . We say that rule is subsumed by rule if, for all sets of facts , . It is easy to compute this, based on the well-known method of checking subsumption of conjunctive queries (?). If this case is detected, can be ignored during materialization, leading to another form of static optimization. However, this is rare in practice. A more common case is that one specific way of applying is subsumed by .
Namely, when considering whether to use when applying rule , we can check if the resolved rule shown in (12) is subsumed by a rule that has already been applied after step . If yes, then can again be ignored. For example, consider the rules (2)–(6) and an additional rule
(13) 
which is a typical way to declare the domain of a property. Then we never need to apply rule (13) to inferences of rule (6), since the combination of these rules is subsumed by rule (13).
One can precompute these relationships statically, resulting in statements of the form “ does not need to be applied to inferences produced by in step if has already been applied to all facts up until step .” This information can then be used dynamically during materialization to eliminate further blocks. The special case was illustrated in the example. It is safe for a rule to subsume part of its own application in this way.
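Rule subsumption can be tested by searching for a homomorphism, as in conjunctive query containment. The Python sketch below is a brute-force illustration. The shape of the domain rule, the constant names `type` and `domain`, and the hand-written resolved rule are assumptions made for this example.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def subsumed_by(rule1, rule2):
    """True if every inference of rule1 is also an inference of rule2.

    The test looks for a mapping of the variables of rule2 to terms of rule1
    that maps the head of rule2 to the head of rule1 and every body atom of
    rule2 to some body atom of rule1. Variables of rule1 act as constants.
    """
    head1, body1 = rule1
    head2, body2 = rule2

    def extend(atom2, atom1, mapping):
        if atom2[0] != atom1[0] or len(atom2) != len(atom1):
            return None
        mapping = dict(mapping)
        for t2, t1 in zip(atom2[1:], atom1[1:]):
            if is_var(t2):
                if mapping.setdefault(t2, t1) != t1:
                    return None
            elif t2 != t1:
                return None
        return mapping

    def search(i, mapping):
        if i == len(body2):
            return True
        for atom1 in body1:
            extended = extend(body2[i], atom1, mapping)
            if extended is not None and search(i + 1, extended):
                return True
        return False

    start = extend(head2, head1, {})
    return start is not None and search(0, start)

# Assumed shape of a domain rule in the style of rule (13).
DOMAIN_RULE = (("T", "?x", "type", "?w"),
               [("T", "?v", "domain", "?w"), ("T", "?x", "?v", "?y")])
# The domain rule with its second body atom resolved against an assumed
# transitivity rule for hP (written out by hand).
RESOLVED = (("T", "?x", "type", "?w"),
            [("T", "hP", "domain", "?w"), ("T", "?x", "hP", "?u"), ("T", "?u", "hP", "?y")])
```

In this example the resolved rule is subsumed by the domain rule but not vice versa, so applying the domain rule to facts produced by the transitivity rule cannot yield anything that the domain rule does not already derive from the premises of those facts.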
Memoization
The application of a rule with IDB body atoms requires the evaluation of SNE rules of the form (9). Most of the joined relations range over (almost) all inferences of the respective IDB atom, starting from . Even if optimizations can eliminate many blocks in this range, the algorithm may spend considerable resources on computing these optimizations and the remaining on-demand concatenations, which may still be required. This cost occurs for each application of the rule, even if there were no new inferences for since the last computation.
Therefore, rules with fewer IDB body atoms can be evaluated faster. Especially rules with only one IDB body atom require only a single SNE rule using the limited range of blocks . To make this favorable situation more common, we can precompute the extensions of selected IDB atoms, and then treat these atoms as part of the EDB layer. We say that the precomputed IDB atom is memoized. For example, we could memoize the atom in (3). Note that we might memoize an atom without precomputing all instantiations of its predicate. A similar approach was used for OWL RL reasoning by ? (?), who proved the correctness of this transformation.
SNE is not efficient for selective precomputations, since it would compute large parts of the materialization. Goal-directed methods, such as QSQR or Magic Sets, focus on inferences needed to answer a given query and hence are more suitable (?). We found QSQR to perform best in our setting.
Which IDB atoms should be memoized? For specific inferencing tasks, this choice is often fixed. For example, it is very common to precompute the subproperty hierarchy. We cannot rely on such prior domain knowledge for general Datalog, and we therefore apply a heuristic: we attempt precomputation for all most general body atoms with QSQR, but set a timeout (default 1 sec). Memoization is only performed for atoms where precomputation completes before this time. This turns out to be highly effective in some cases.
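The timeout heuristic can be sketched in Python as follows. The sketch does not implement QSQR: `evaluate` is a hypothetical stand-in for any goal-directed procedure that yields answers incrementally, and the time budget is only checked between answers.

```python
import time

def memoize_with_timeout(candidates, evaluate, timeout=1.0):
    """Precompute each candidate atom, keeping only results that finish in time."""
    memo = {}
    for atom in candidates:
        deadline = time.monotonic() + timeout
        answers = []
        for answer in evaluate(atom):
            if time.monotonic() > deadline:
                break  # give up: this atom is not memoized
            answers.append(answer)
        else:
            if time.monotonic() <= deadline:
                memo[atom] = answers  # completed within the time budget
    return memo

# Two hypothetical subqueries: one is answered instantly, one is expensive.
def fast(atom):
    yield ("hP", "pO")

def slow(atom):
    time.sleep(0.3)  # simulates an expensive subquery
    yield ("x", "y")

def evaluate(atom):
    return slow(atom) if atom == "slow_atom" else fast(atom)

MEMO = memoize_with_timeout(["fast_atom", "slow_atom"], evaluate, timeout=0.1)
```

Atoms that end up in `MEMO` can afterwards be treated like EDB atoms; atoms whose precomputation was abandoned remain ordinary IDB atoms.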
Evaluation
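Table 1: Dataset statistics
Dataset | #triples (EDB facts) | VLog DB size | Rule set L | Rule set U | Rule set LE
LUBM1K | 133M | 5.5GB | 170 | 202 | 182
LUBM5K | 691M | 28GB | 170 | 202 | 182
DBpedia | 112M | 4.8GB | 9396 | - | -
Claros | 19M | 980MB | 2689 | 3229 | 2749
ClarosS | 500K | 41MB | 2689 | 3229 | 2749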
In this section, we evaluate our approach based on a prototype implementation called VLog. As our main goal is to support KG materialization under limited resources, we perform all evaluations on a laptop computer. Our source code and a short tutorial are available at https://github.com/jrbn/vlog.
Experimental Setup. The computer used in all experiments is a MacBook Pro with a 2.2GHz Intel Core i7 processor, 512GB SSD, and 16GB RAM, running MacOS Yosemite v10.10.5. All software (ours and competitors) was compiled from C++ sources using Apple Clang/LLVM v6.1.0.
We used largely the same data that was also used to evaluate RDFox (?). Datasets and Datalog programs are available online (http://www.cs.ox.ac.uk/isg/tools/RDFox/2014/AAAI/). The datasets we used are the cultural-heritage ontology Claros (?), the DBpedia KG extracted from Wikipedia (?), and two differently sized graphs generated with the LUBM benchmark (?). In addition, we created a random sample of Claros that we call ClarosS. Statistics on these datasets are given in Table 1.
All of these datasets come with OWL ontologies that can be used for inferencing. ? used a custom translation of these ontologies into Datalog. There are several types of rule sets: “L” denotes the custom translation of the original ontology; “U” is an (upper) approximation of OWL ontologies that cannot be fully captured in Datalog; “LE” is an extension of the “L” version with additional rules to make inferencing harder. All of these rules operate on a Datalog translation of the input graph, e.g., a triple might be represented by a fact . We added rules to translate EDB triples to IDB atoms. The W3C standard also defines another set of derivation rules for OWL RL that can work directly on triples (?). We use “O” to refer to 66 of those rules, where we omitted the rules for datatypes and equality reasoning (?, Tables 4 and 8).
VLog combines an on-disk EDB layer with an in-memory columnar IDB layer to achieve a good memory/runtime balance on limited hardware. The specifically developed on-disk database uses six permutation indexes, following standard practice in the field (?). No other tool is specifically optimized for our setting, but the leading in-memory system RDFox is most similar, and we therefore use it for comparison. As our current prototype does not use parallelism, we compared it to the sequential version of the original version of RDFox (?). We recompiled it with the “release” configuration and the sequential storage variant. Later RDFox versions perform equality reasoning, which would lead to some input data being interpreted differently (?; ?). We were unable to deactivate this feature, and hence did not use these versions. If not stated otherwise, VLog was always used with dynamic optimizations activated but without memoization.
Runtime and Memory Usage. Table 2 reports the runtime and memory usage for materialization on our test data, and the total number of inferences computed by VLog. Not all operations could be completed on our hardware: oom denotes an out-of-memory error, while tout denotes a timeout after 3h. Memory denotes the peak RAM usage as measured using OS APIs.
The number of IDB facts inferred by VLog is based on a strict separation of IDB and EDB predicates, using rules like (2) to import facts used in rules. This is different from the figure reported for RDFox, which corresponds to unique triples (inferred or given). We have compared the output of both tools to ensure correctness.
RDFox has been shown to achieve excellent speedups using multiple CPUs, so our sequential runtime measurements are not RDFox’s best performance but a baseline for fast in-memory computation in a single thread. Memory usage can be compared more directly, since the parallel version of RDFox uses only slightly more memory (?). As we can see, VLog requires only 6%–46% of the working memory used by RDFox. As we keep EDB data on disk, the comparison with a pure in-memory system like RDFox should take the on-disk file sizes into account (Table 1); even when we add these, VLog uses less memory in all cases where RDFox terminates. In spite of these memory savings, VLog shows comparable runtimes, even when considering an (at most linear) speedup when parallelizing RDFox.
Dynamic Optimization. Our prototype supports the optimizations “Mismatching Rules” (MR) and “Redundant Rules” (RR) discussed earlier. Table 3 shows the runtimes obtained by enabling both, one, or none of them.
Both MR and RR have little effect on LUBM and DBpedia. We attribute this to the rather “shallow” rules used in both cases. In contrast, both optimizations are very effective on Claros, reducing runtime by a factor of almost five. This is because SNE leads to some expensive joins that produce only duplicates and that the optimizations can avoid.
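Table 4: Runtimes with and without memoization
Data/Rules | No Mem.: total (s) | Memoization: #atoms | Memoization: precomputation (s) | Memoization: materialization (s) | Memoization: total (s)
LUBM1K/L | 38 | 39 | 1.4 | 40.4 | 41.5
LUBM1K/O | 1514 | 41 | 6.5 | 230 | 236.5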
Memoization. To evaluate the impact of memoization, we materialized LUBM1K with and without this feature, using the L and O rules. Table 4 shows total runtimes with and without memoization, the number of IDB atoms memoized, and the time used to compute their memoization.
For the L rules, memoization has no effect on materialization runtime despite the fact that 39 IDB atoms were memoized. For the O rules, in contrast, memoization decreases materialization runtime by a factor of six, at an initial cost of 6.5 seconds. We conclude that this procedure is indeed beneficial, but only if we use the standard OWL RL rules. Indeed, rules such as (4), which we used to motivate memoization, do not occur in the L rules. In a sense, the construction of L rules internalizes certain EDB facts and thus precomputes their effect before materialization.
Discussion and Conclusions
We have introduced a new column-oriented approach to perform Datalog in-memory materialization over large KGs. Our goal was to perform this task in an efficient manner, minimizing memory consumption and CPU power. Our evaluation indicates that it is a viable alternative to existing Datalog engines, leading to competitive runtimes at a significantly reduced memory consumption.
Our evaluation has also highlighted some challenges to address in future work. First, we observed that the execution of large joins can become problematic when many tables must be scanned for removing duplicates. This was the primary reason why the computation did not finish in time on some large datasets. Second, our implementation does not currently exploit multiple processors, and it will be interesting to see how techniques of intra-/inter-query parallelism can be applied in this setting. Third, we plan to study mechanisms for efficiently merging inferences back into the input KG, which is not part of Datalog but useful in practice. Finally, we would also like to continue extending our dynamic optimizations to more complex cases, and to develop further optimizations that take advantage of our design.
Many further continuations of this research come to mind. To the best of our knowledge, this is the first work to exploit a column-based approach for Datalog inferencing, and it does indeed seem as if the research on large-scale in-memory Datalog computation has only just begun.
Acknowledgments This work was partially funded by COMMIT, the NWO VENI project 639.021.335, and the DFG in Emmy Noether grant KR 4381/11 and in CRC 912 HAEC within the cfAED Cluster of Excellence.
References
 [Abadi et al. 2009] Abadi, D. J.; Marcus, A.; Madden, S.; and Hollenbach, K. 2009. SW-Store: a vertically partitioned DBMS for Semantic Web data management. VLDB J. 18(2):385–406.
 [Abadi, Madden, and Ferreira 2006] Abadi, D.; Madden, S.; and Ferreira, M. 2006. Integrating compression and execution in column-oriented database systems. In Proceedings of SIGMOD, 671–682.
 [Abiteboul, Hull, and Vianu 1995] Abiteboul, S.; Hull, R.; and Vianu, V. 1995. Foundations of Databases. Addison Wesley.
 [Bischoff et al. 2014] Bischoff, S.; Krötzsch, M.; Polleres, A.; and Rudolph, S. 2014. Schema-agnostic query rewriting for SPARQL 1.1. In Proc. 13th Int. Semantic Web Conf. (ISWC’14), volume 8796 of LNCS, 584–600. Springer.
 [Bishop et al. 2011] Bishop, B.; Kiryakov, A.; Ognyanoff, D.; Peikov, I.; Tashev, Z.; and Velkov, R. 2011. OWLIM: a family of scalable semantic repositories. Semantic Web Journal 2(1):33–42.
 [Bizer et al. 2009] Bizer, C.; Lehmann, J.; Kobilarov, G.; Auer, S.; Becker, C.; Cyganiak, R.; and Hellmann, S. 2009. DBpedia – A crystallization point for the Web of Data. J. of Web Semantics 7(3):154–165.
 [Bonet and Koenig 2015] Bonet, B., and Koenig, S., eds. 2015. Proc. AAAI’15. AAAI Press.
 [Callahan, Cruz-Toledo, and Dumontier 2013] Callahan, A.; Cruz-Toledo, J.; and Dumontier, M. 2013. Ontology-based querying with Bio2RDF’s linked open data. J. of Biomedical Semantics 4(S1).
 [Cyganiak, Wood, and Lanthaler 2014] Cyganiak, R.; Wood, D.; and Lanthaler, M., eds. 2014. RDF 1.1 Concepts and Abstract Syntax. W3C Recommendation. Available at http://www.w3.org/TR/rdf11-concepts/.
 [Guo, Pan, and Heflin 2005] Guo, Y.; Pan, Z.; and Heflin, J. 2005. LUBM: A benchmark for OWL knowledge base systems. Web Semantics: Science, Services and Agents on the World Wide Web 3:158–182.
 [Hoffart et al. 2013] Hoffart, J.; Suchanek, F. M.; Berberich, K.; and Weikum, G. 2013. YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia. Artif. Intell., Special Issue on Artificial Intelligence, Wikipedia and Semi-Structured Resources 194:28–61.
 [Idreos et al. 2012] Idreos, S.; Groffen, F.; Nes, N.; Manegold, S.; Mullender, K. S.; and Kersten, M. L. 2012. MonetDB: two decades of research in column-oriented database architectures. IEEE Data Eng. Bull. 35(1):40–45.
 [Kolovski, Wu, and Eadon 2010] Kolovski, V.; Wu, Z.; and Eadon, G. 2010. Optimizing enterprise-scale OWL 2 RL reasoning in a relational database system. In Proc. 9th Int. Semantic Web Conf. (ISWC’10), volume 6496 of LNCS, 436–452. Springer.
 [Krötzsch 2011] Krötzsch, M. 2011. Efficient rule-based inferencing for OWL EL. In Walsh, T., ed., Proc. 22nd Int. Joint Conf. on Artificial Intelligence (IJCAI’11), 2668–2673. AAAI Press/IJCAI.
 [Krötzsch 2012] Krötzsch, M. 2012. The not-so-easy task of computing class subsumptions in OWL RL. In Proc. 11th Int. Semantic Web Conf. (ISWC’12), volume 7649 of LNCS, 279–294. Springer.
 [Motik et al. 2009] Motik, B.; Cuenca Grau, B.; Horrocks, I.; Wu, Z.; Fokoue, A.; and Lutz, C., eds. 2009. OWL 2 Web Ontology Language: Profiles. W3C Recommendation. Available at http://www.w3.org/TR/owl2-profiles/.
 [Motik et al. 2014] Motik, B.; Nenov, Y.; Piro, R.; Horrocks, I.; and Olteanu, D. 2014. Parallel materialisation of Datalog programs in centralised, main-memory RDF systems. In Proc. AAAI’14, 129–137. AAAI Press.
 [Motik et al. 2015a] Motik, B.; Nenov, Y.; Piro, R.; and Horrocks, I. 2015a. Combining rewriting and incremental materialisation maintenance for datalog programs with equality. In Proc. 24th Int. Joint Conf. on Artificial Intelligence (IJCAI’15), 3127–3133. AAAI Press.
 [Motik et al. 2015b] Motik, B.; Nenov, Y.; Piro, R.; and Horrocks, I. 2015b. Handling owl:sameAs via rewriting. In Bonet and Koenig (2015), 231–237.
 [Motik et al. 2015c] Motik, B.; Nenov, Y.; Piro, R.; and Horrocks, I. 2015c. Incremental update of datalog materialisation: the backward/forward algorithm. In Bonet and Koenig (2015), 1560–1568.
 [Neumann and Weikum 2010] Neumann, T., and Weikum, G. 2010. The RDF-3X engine for scalable management of RDF data. VLDB J. 19(1):91–113.
 [Oren et al. 2009] Oren, E.; Kotoulas, S.; Anadiotis, G.; Siebes, R.; ten Teije, A.; and van Harmelen, F. 2009. Marvin: Distributed reasoning over large-scale Semantic Web data. J. of Web Semantics 7(4):305–316.
 [Russell and Norvig 2003] Russell, S., and Norvig, P. 2003. Artificial Intelligence: A Modern Approach. Prentice Hall, second edition.
 [Urbani et al. 2012] Urbani, J.; Kotoulas, S.; Maassen, J.; Van Harmelen, F.; and Bal, H. 2012. WebPIE: A Web-scale Parallel Inference Engine using MapReduce. Journal of Web Semantics 10:59–75.
 [Urbani et al. 2013] Urbani, J.; Margara, A.; Jacobs, C.; van Harmelen, F.; and Bal, H. 2013. Dynamite: Parallel materialization of dynamic RDF data. In The Semantic Web–ISWC 2013. Springer. 657–672.
 [Urbani et al. 2014] Urbani, J.; Piro, R.; van Harmelen, F.; and Bal, H. 2014. Hybrid reasoning on OWL RL. Semantic Web 5(6):423–447.
 [Urbani, Jacobs, and Krötzsch 2016] Urbani, J.; Jacobs, C.; and Krötzsch, M. 2016. Column-oriented Datalog materialization for large knowledge graphs. In Proc. AAAI’16. AAAI Press.
 [Vrandečić and Krötzsch 2014] Vrandečić, D., and Krötzsch, M. 2014. Wikidata: A free collaborative knowledge base. Commun. ACM 57(10).
 [Weaver and Hendler 2009] Weaver, J., and Hendler, J. A. 2009. Parallel materialization of the finite RDFS closure for hundreds of millions of triples. In Proc. 8th Int. Semantic Web Conf. (ISWC’09), volume 5823 of LNCS, 682–697. Springer.
Appendix A: Proofs
Proof of Theorem 1
We first observe that the naive approach (8) terminates and leads to a unique least model . Recall that the latter was defined by applying all rules in parallel in each step. Now consider an arbitrary, fair sequence of individual applications of rules , each applied naively as in (8). Let denote the set of all facts derived in this way up until step . Clearly, the rule-by-rule inference is sound, i.e., for all derivation steps . It remains to show that it is also complete in the sense that for some . Since we apply rules fairly, there is a sequence of derivation step indices such that every rule has been applied in each interval of the form . Formally, for every in the sequence, and for every rule , there is such that . It follows that (in words: the sequential application of rules derives at least the inferences that a parallel application of rules would derive). Therefore, by a simple induction, for every . Since for some finite (?), we have . Together with soundness, this implies that , as required.
Now to show that the semi-naive application strategy based on rules of the form (9) is also sound, we merely need to show that it produces the same inferences as the naive rule-by-rule application would produce (based on the same, fair sequence of rules). Let refer to the facts derived for in step using the semi-naive procedure, and let denote the set of facts produced for in step by the naive procedure. We show by induction that holds for every predicate and every step .
The induction base is trivial, since . For the induction step, assume that the claim holds for all . Let of form (7) be the rule applied in step , and assume that was last applied in step (set to if it was never applied). is computed by evaluating (9) for every , while is obtained by evaluating (8).
For every inference obtained from (8), there is a ground substitution such that the rule
is applicable and . Being applicable here means that for every (and likewise for expressions ). Now whenever , there is an index such that .
Given an inference and ground substitution as above, is also inferred by a rule of the form (9). Indeed, let be the largest index from the range such that . Then the following ground instantiation of (9) is applicable:
This follows from the induction hypothesis and the definition of . Note that the case where for all can be disregarded, since it follows by the induction hypothesis that such inferences have already been produced when applying rule in step . This completes the induction and the proof.
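The equivalence argued above can be illustrated on a small instance. The following Python sketch applies two rules one at a time in a fixed round-robin (hence fair) order, once naively and once in a semi-naive fashion, and both variants reach the same set of facts. The two-rule transitive-closure program, the function names `join`, `naive` and `semi_naive`, and the scheme of tagging each fact with the step that first derived it are assumptions made for this illustration only; they are not taken from the system described in the paper.

```python
def join(left, right):
    """Derive path(x, z) from path(x, y) in `left` and path(y, z) in `right`."""
    successors = {}
    for (y, z) in right:
        successors.setdefault(y, set()).add(z)
    return {(x, z) for (x, y) in left for z in successors.get(y, ())}


def naive(edges):
    """Apply r1 and r2 in turn over all known facts until nothing changes."""
    path = set()
    while True:
        before = len(path)
        path |= set(edges)        # r1: path(x, y) <- edge(x, y)
        path |= join(path, path)  # r2: path(x, z) <- path(x, y), path(y, z)
        if len(path) == before:
            return path


def semi_naive(edges):
    """Same rule order, but r2 only joins combinations that involve at least
    one fact derived since r2 was last applied."""
    step_of = {}                # path fact -> step in which it was first derived
    last = {"r1": 0, "r2": 0}   # step of the last application (0 = never applied)
    step = 0

    def block(lo, hi):
        # All path facts first derived in some step within [lo, hi].
        return {f for f, s in step_of.items() if lo <= s <= hi}

    while True:
        changed = False
        for rule in ("r1", "r2"):
            i, j = step, last[rule]
            step += 1
            if rule == "r1":
                # Body has only EDB atoms: nothing new after the first application.
                tmp = set(edges) if j == 0 else set()
            else:
                new, old, everything = block(j, i), block(0, j - 1), block(0, i)
                # Either the first atom is new and the second is old,
                # or the second atom is new and the first is arbitrary.
                tmp = join(new, old) | join(everything, new)
            for fact in tmp:
                if fact not in step_of:   # keep only genuinely new facts
                    step_of[fact] = step
                    changed = True
            last[rule] = step
        if not changed:
            return set(step_of)
```

The two joins in the `r2` branch cover exactly the pairs of body matches in which at least one fact is new, which is the case distinction used in the induction step; pairs of old facts are skipped because they were already considered in the previous application of the rule.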
Proof of Theorem 2
This claim is immediate from the definitions. In detail, consider and as in the claim of the theorem. Moreover, let be the set of all complete rule body matches that could be computed without taking any optimization into account. Clearly, , i.e., contains only tuples compatible with . By the assumption in the theorem, for all , the atom does not unify with the head of . Therefore, does not contain any fact that is compatible with , i.e., (where the join here is meant to join the positions in accordance with the terms used in ). This implies that , and thus does not need to be considered for finding matches of when computing .
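The condition used in this argument, that an atom does not unify with a rule head, is a purely syntactic test. A minimal Python sketch follows, under several assumptions that are not part of the paper: atoms are function-free and encoded as `(predicate, terms)` pairs, variables are strings starting with `?`, the atom and the head have been renamed apart so that they share no variables, and `relevant_blocks` is a hypothetical helper that keeps only those fact blocks whose producing rule head unifies with the atom.

```python
def is_var(term):
    """Variables are strings that start with '?' (a convention of this sketch)."""
    return isinstance(term, str) and term.startswith("?")


def unifies(atom, head):
    """Return True if two function-free atoms have a common instance."""
    (pred_a, args_a), (pred_h, args_h) = atom, head
    if pred_a != pred_h or len(args_a) != len(args_h):
        return False
    subst = {}

    def walk(term):
        # Follow variable bindings until an unbound variable or a constant.
        while is_var(term) and term in subst:
            term = subst[term]
        return term

    for s, t in zip(args_a, args_h):
        s, t = walk(s), walk(t)
        if s == t:
            continue
        if is_var(s):
            subst[s] = t
        elif is_var(t):
            subst[t] = s
        else:
            return False  # two distinct constants clash
    return True


def relevant_blocks(atom, blocks):
    """Keep only blocks (head, facts) whose rule head unifies with the atom."""
    return [(head, facts) for head, facts in blocks if unifies(atom, head)]
```

For example, the atom `("T", ("?x", "rdf:type", "?y"))` does not unify with the head `("T", ("?a", "rdfs:subClassOf", "?b"))`, so a block produced by a rule with that head would be skipped when matching the atom.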
Proof of Theorem 3
The claim is again rather immediate, but we spell it out in detail for completeness. Assume we apply a rule of the form (7) in step , after it was last applied in step . We use similar notation for (partial) joins as in the proof of Theorem 2 in the previous section. In addition, let be of the following form:
As shown in Theorem 1, is the same set of facts that would be produced by evaluating a naive version of in step , i.e., by using a computation of the form
Note that . Let denote the result of the following join
We can again consider the elements of as substitutions over the variables of , where we assume without loss of generality that shares no variables with .
Now consider the situation as in the claim where we apply a particular semi-naive rule of the form (9) and have partially evaluated the rule body until . Consider any (where the join identifies positions/variables as necessary to unify the atoms and ). By the definition of redundancy, contains an atom such that (in particular ). As assigns values to all variables in , we find that is a list of ground terms. By definition of , . Since and , we get . Therefore, applying rule with any substitution that extends in step is redundant. Since the argument holds for all assignments in , and since the projection to of to variables in is a superset of , we find that all tuples from can be ignored when applying .