Turning 30: New Ideas in Inductive Logic Programming


Abstract

Common criticisms of state-of-the-art machine learning include poor generalisation, a lack of interpretability, and a need for large amounts of training data. We survey recent work in inductive logic programming (ILP), a form of machine learning that induces logic programs from data, which has shown promise at addressing these limitations. We focus on new methods for learning recursive programs that generalise from few examples, a shift from using hand-crafted background knowledge to learning background knowledge, and the use of different technologies, notably answer set programming and neural networks. As ILP approaches 30, we also discuss directions for future research.

1 Introduction

Common criticisms of state-of-the-art machine learning include poor generalisation, a lack of interpretability, and a need for large numbers of training examples [32, 7, 5]. In this paper, we survey recent work in inductive logic programming (ILP) [41], a form of machine learning that induces logic programs from data, which has shown promise at addressing these limitations.

Compared to most machine learning approaches, ILP has several advantages. ILP systems can learn using background knowledge (BK), i.e. relational theories, such as using a theory of light to understand images [36]. Because of the expressivity of logic programs, ILP systems can learn complex relational theories, such as cellular automata [21], event calculus theories [24], and Petri nets [3]. Because hypotheses are logic programs, they can be read by humans, which is crucial for explainable AI and ultra-strong machine learning [33]. Due to the strong inductive bias imposed by the BK, ILP systems can generalise from small numbers of examples, often a single example [31]. Finally, because of their symbolic nature, ILP systems naturally support lifelong and transfer learning [15], which is considered essential for human-like AI [26].

                      Old ILP       New ILP
Recursion             Limited       Yes
Predicate invention   No            Yes
Hypotheses            First-order   Higher-order, ASP
Optimality            No            Yes
Technology            Prolog        Prolog, ASP, NNs
Table 1: A simplified comparison between old and new ILP systems.

Some of these advantages come from recent developments, which we survey in this paper. To aid the reader, we coarsely compare old and new ILP systems. We use FOIL [43], Progol [42], TILDE [6], and Aleph [51] as representative old systems and ILASP [29], Metagol [10], and ∂ILP [19] as representative new systems. This comparison, shown in Table 1, is, of course, vastly oversimplified, and there are many exceptions to it. We discuss each development (each row in the table) in turn. We discuss new methods for learning recursive logic programs, which allow for greater generalisation from fewer examples (Section 2); a shift from using hand-crafted BK to learning BK, namely through predicate invention and transfer learning, which has been shown to improve predictive accuracies and reduce learning times (Section 3); learning programs of various expressivity, notably Datalog and answer set programs (Section 4); new methods for learning optimal programs, such as programs with efficient time complexity (Section 5); and recent work that uses different underlying technologies to perform the learning, notably to leverage recent improvements in ASP/SMT solvers and neural networks (Section 6). Finally, as ILP approaches 30, we conclude by proposing directions for future research.

1.1 What is ILP?

Given positive and negative examples and BK, the goal of an ILP system is to induce (learn) a hypothesis (a logic program) which, with the BK, entails as many positive and as few negative examples as possible [41]. Whereas most machine learning approaches represent data as vectors, ILP represents data (examples, BK, and hypotheses) as logic programs.

Example 1.

To illustrate ILP, suppose you want to learn a string transformation program from the following examples.

Input      Output
machine    e
learning   g
algorithm  m

Most forms of machine learning would represent these examples as vectors. By contrast, in ILP, we would represent these examples as logical atoms, such as f([m,a,c,h,i,n,e], e), where f is the target predicate that we want to learn (i.e. the relation to generalise). BK (features) is likewise represented as a logical theory (a logic program). For instance, for the string transformation problem, we could provide BK that contains logical definitions for string operations, such as empty(A), which holds when the list A is empty, head(A,B), which holds when B is the head of the list A, and tail(A,B), which holds when B is the tail of the list A. Given the aforementioned examples and BK, an ILP system could induce the hypothesis (a logic program) shown in Figure 1.

f(A,B):-tail(A,C),empty(C),head(A,B).
f(A,B):-tail(A,C),f(C,B).
Figure 1: A hypothesis (logic program) for the string transformation problem in Example 1. Each line of the program is a logical clause (or rule). The first rule says that the relation f(A,B) holds when the three literals tail(A,C), empty(C), and head(A,B) hold, i.e. B is the last element of A when the tail of A is empty and B is the head of A. The second rule is recursive and says that the relation f(A,B) holds when the two literals tail(A,C) and f(C,B) hold, i.e. f(A,B) holds when the same relation holds for the tail of A.
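
For concreteness, the following minimal sketch shows how the BK and the hypothesis from Figure 1 fit together as a runnable Prolog program. The definitions of empty/1, head/2, and tail/2 are illustrative assumptions, not the exact BK used by any particular ILP system.

% Illustrative background knowledge.
empty([]).
head([H|_],H).
tail([_|T],T).
% The induced hypothesis from Figure 1.
f(A,B):-tail(A,C),empty(C),head(A,B).
f(A,B):-tail(A,C),f(C,B).
% Example query: ?- f([m,a,c,h,i,n,e],B). gives B = e.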

2 Recursion

A recursive logic program is one where the same predicate appears in the head and body of a rule. Learning recursive programs has long been considered a difficult problem for ILP [39]. To illustrate the importance of recursion, consider the string transformation problem in Example 1. With recursion, an ILP system can learn the compact program shown in Figure 1. Because of the symbolic representation and the recursive nature, the program generalises to lists of arbitrary size and to lists that contain arbitrary elements (e.g. integers, characters, etc.). Without recursion, an ILP system would need to learn a separate clause to find the last element for each possible list length, such as:

f(A,B):-tail(A,C),empty(C),head(A,B).
f(A,B):-tail(A,C),tail(C,D),empty(D),head(C,B).
f(A,B):-tail(A,C),tail(C,D),tail(D,E),empty(E),head(D,B).

In other words, without recursion, it is often difficult for an ILP system to generalise from small numbers of examples [12].

Older ILP systems struggle to learn recursive programs, especially from small numbers of training examples. A common limitation with existing approaches is that they rely on bottom clause construction [42]. In this approach, for each example, an ILP system creates the most specific clause that entails the example, and then tries to generalise the clause to entail other examples. However, this sequential covering approach requires examples of both the base and inductive cases, and in that order.

Interest in recursion resurged with the introduction of meta-interpretive learning (MIL) [37, 38, 8] and the MIL system Metagol [10]. The key idea of MIL is to use metarules, or program templates, to restrict the form of inducible programs, and thus the hypothesis space¹. A metarule is a higher-order clause. For instance, the chain metarule is P(A,B) ← Q(A,C), R(C,B), where the letters P, Q, and R denote higher-order variables and A, B, and C denote first-order variables. The goal of a MIL system, such as Metagol, is to find substitutions for the higher-order variables. For instance, the chain metarule allows Metagol to induce programs such as f(A,B):-tail(A,C),head(C,B)². Metagol induces recursive programs using recursive metarules, such as the tail recursion metarule P(A,B) ← Q(A,C), P(C,B).
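
To make this concrete, the following sketch shows one possible way to encode metarules as Prolog terms; this encoding is an illustrative assumption and differs in detail from Metagol's actual implementation.

% metarule(Name, HigherOrderVars, Head, Body), with atoms written as lists.
metarule(chain,   [P,Q,R], [P,A,B], [[Q,A,C],[R,C,B]]).
metarule(tailrec, [P,Q],   [P,A,B], [[Q,A,C],[P,C,B]]).
% Substituting P/f, Q/tail, R/head into the chain metarule yields the
% first-order clause f(A,B):-tail(A,C),head(C,B):
% ?- metarule(chain, [f,tail,head], Head, Body).
% Head = [f,A,B], Body = [[tail,A,C],[head,C,B]].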

Following MIL, many ILP systems can learn recursive programs [29, 19, 22]. With recursion, ILP systems can now generalise from small numbers of examples, often a single example [31, 14]. The ability to learn recursive programs has opened up ILP to new application areas, including learning string transformation programs [31], robot strategies [9], and answer set grammars [27].

3 Learning Background Knowledge

A key characteristic of ILP is the use of BK as a form of inductive bias. BK is similar to features used in most forms of machine learning. However, whereas features are vectors, BK usually contains facts and rules (extensional and intensional definitions) in the form of a logic program. For instance, when learning string transformation programs, we may want to supply helper background relations, such as head/2³ and tail/2. For other domains, we may want to supply more complex BK, such as a theory of light to understand images [36] or higher-order operations, such as map/3, filter/3, and fold/4, to solve programming puzzles [8].

As with choosing appropriate features, choosing appropriate BK is crucial for good learning performance. ILP has traditionally relied on hand-crafted BK, often designed by domain experts, i.e. feature engineering. This approach is clearly limited because obtaining suitable BK can be difficult and expensive. Indeed, the over-reliance on hand-crafted BK is a common criticism of ILP [19].

Two recent avenues of research attempt to overcome this limitation. The first idea is to enable an ILP system to automatically invent new predicate symbols. The second idea is to perform lifelong and transfer learning to discover knowledge that can be reused to help learn other programs. We discuss these ideas in turn.

3.1 Predicate Invention

Rather than expecting a user to provide all the necessary BK, the goal of predicate invention is for an ILP system to automatically invent new auxiliary predicate symbols. This idea is analogous to how humans create new functions when manually writing programs, such as to reduce code duplication or to improve readability.

Whilst predicate invention has attracted interest since the beginnings of ILP [35], most previous attempts have been unsuccessful, resulting in no support for predicate invention in popular ILP systems [43, 42, 6, 51]. A key limitation of early ILP systems is that the search for invented predicates is complex and under-specified in various ways. For instance, it was unclear how many arguments an invented predicate should have, and how the arguments should be ordered.

As with recursion, interest in predicate invention has resurged with the introduction of MIL. MIL avoids the issues of older ILP systems by using metarules to define the hypothesis space and, in turn, reduce the complexity of inventing a new predicate symbol. As mentioned, a metarule is a higher-order clause. For instance, the chain metarule (P(A,B) ← Q(A,C), R(C,B)) allows Metagol to induce programs such as f(A,B):-tail(A,C),tail(C,B), which would drop the first two elements from a list. To induce longer clauses, such as one that drops the first three elements from a list, Metagol can use the same metarule but can invent a new predicate symbol and then chain their application, such as to induce the program:

f(A,B):-tail(A,C),inv1(C,B).
inv1(A,B):-tail(A,C),tail(C,B).

To learn this program, Metagol invents the predicate symbol inv1 and induces a definition for it using the chain metarule. Metagol uses this new predicate symbol in the definition for the target predicate f.

Predicate invention allows Metagol (and other ILP systems) to learn programs by expanding their BK. A major side-effect of this metarule- and predicate-invention-driven approach is that problems are forced to be decomposed into smaller problems, whose solutions can be reused. For instance, suppose you want to learn a program that drops the first four elements of a list. Metagol could learn the following program, where the invented predicate symbol inv1 is used twice:

f(A,B):-inv1(A,C),inv1(C,B).
inv1(A,B):-tail(A,C),tail(C,B).
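
As a quick check that this decomposition behaves as described, the following minimal sketch (with tail/2 defined as an illustrative assumption) runs as plain Prolog:

tail([_|T],T).
inv1(A,B):-tail(A,C),tail(C,B).  % drops the first two elements
f(A,B):-inv1(A,C),inv1(C,B).     % reuses inv1 to drop four
% ?- f([a,b,c,d,e,f],B). gives B = [e,f].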

Predicate invention has been shown to help reduce the size of target programs, which in turn reduces sample complexity and improves predictive accuracy [14, 8]. Following Metagol, other newer ILP systems support predicate invention [19, 22], often using a metarule-guided approach. Such systems all share the general principle of introducing new predicate symbols when their current BK is insufficient to learn a hypothesis.

In contrast to the aforementioned approaches, a different idea is to invent new predicates to improve knowledge representation. For instance, CURLED [17] learns new predicates by clustering constants and relations in the provided BK, turning each identified cluster into a new BK predicate. The key insight of CURLED is not to use a single similarity measure, but rather a set of various similarities. This choice is motivated by the fact that different similarities are useful for different tasks, but in the unsupervised setting the task itself is not known in advance. CURLED therefore invents predicates by producing different clusterings according to the features of the objects, the community structure, and so on.

ALPs [18] perform predicate invention inspired by a (neural) auto-encoding principle: they learn an encoding logic program that maps the provided data to a new, compressive latent representation (defined in terms of the invented predicates), and a decoding logic program that can reconstruct the provided data from its latent representation. Both approaches have demonstrated improved performance on supervised tasks, even though the predicate invention step is task-agnostic.

3.2 Lifelong Learning

An advantage of a symbolic representation is that learned knowledge can be remembered, i.e. explicitly stored in the BK. The second line of research that addresses the limitations of hand-crafted BK therefore leverages transfer learning. The general idea is to reuse knowledge gained from solving one problem to help solve a different problem.

One notable application of transfer learning is the Metabias system [31] which, given a set of tasks, uses Metagol to try to learn a solution for each task using at most one clause. If Metagol finds a solution for any task, it adds the solution to the BK and removes the task from the set. It then tries to find solutions for the remaining tasks, but can now (1) use an additional clause, and (2) reuse solutions from solved tasks. This process repeats until all the tasks are solved or a maximum program size is reached. In other words, Metabias automatically identifies easier problems, learns programs for them, and then reuses the solutions to help learn programs for more difficult problems. One of the key ideas of Metabias is to not only save the induced target relation to the BK, but to also add its constituent parts discovered through predicate invention. The authors experimentally show that their multi-task approach performs substantially better than a single-task approach because learned programs are frequently reused. Moreover, they show that this approach leads to a hierarchy of BK composed of reusable programs, where each program builds on simpler programs, which can be seen as deep inductive logic programming.

Metabias saves all learned programs (including invented predicates) to the BK, which can be problematic because too much irrelevant BK is detrimental to learning performance. To address this problem, Forgetgol [15] introduces the idea of forgetting. In this approach, Forgetgol continually grows and shrinks its hypothesis space by adding and removing learned programs to and from its BK. The authors show that forgetting can reduce both the size of the hypothesis space and the sample complexity of an ILP learner when learning from many tasks. This result shows the potential for ILP to be useful in a lifelong or continual learning setting, which is considered crucial for AI [26].

The aforementioned Metabias and Forgetgol approaches assume a corpus of user-supplied tasks to train from. This assumption is unrealistic in many situations. To overcome this limitation, Playgol [14] first plays by randomly sampling its own tasks, trying to solve them, and adding any solutions to the BK, which can be seen as a form of self-supervised learning. After playing, Playgol tries to solve the user-supplied tasks by reusing solutions learned whilst playing. The goal of Playgol is the same as that of the other approaches discussed in this section: to automatically discover reusable general programs so as to improve learning performance, but it does so with fewer labelled examples.

4 Expressiveness

ILP systems have traditionally induced Prolog programs. A recent development has been to use alternative hypothesis representations.

4.1 Datalog

Datalog is a syntactic subset of Prolog which disallows complex terms as arguments of predicates and imposes restrictions on the use of negation (and negation with recursion). These restrictions make Datalog attractive for two reasons. First, Datalog is a truly declarative language, whereas in Prolog reordering clauses can change the behaviour of a program. Second, a Datalog query is guaranteed to terminate, though this guarantee comes at the expense of Datalog not being a Turing-complete language, which Prolog is. Due to the aforementioned benefits, several works [1, 19, 49, 46] induce Datalog programs. The general motivation for reducing the expressivity of the representation language from Prolog to Datalog is to allow the problem to be encoded as a satisfiability problem, particularly to leverage recent developments in SAT and SMT. We discuss the advantages of this approach further in Section 6.1.
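
To make these restrictions concrete, the following transitive-closure program is Datalog (and also syntactically valid Prolog): its arguments are only variables and constants, and it uses no function symbols and no negation.

edge(a,b).
edge(b,c).
path(X,Y):-edge(X,Y).
path(X,Y):-edge(X,Z),path(Z,Y).

Under Datalog's bottom-up evaluation, any query over this program terminates regardless of clause order; under Prolog's top-down evaluation, termination would additionally depend on clause order and on cycles in the data.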

4.2 Answer Set Programming

ASP is a logic programming paradigm. Like Datalog, ASP is a truly declarative language. Compared to Datalog, ASP is more expressive, allowing, for instance, disjunction in the head of a clause, hard and weak constraints, and support for default inference and default assumptions. A key difference between ASP and Prolog is semantics. A definite logic program (a Prolog program) has a unique minimal model (the least Herbrand model). By contrast, an ASP program can have one, many, or even no models (answer sets). Due to its non-monotonicity, ASP is particularly attractive for expressing common-sense reasoning [30].
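
As a small illustration of the difference in semantics, the following two-rule program has two answer sets, {p} and {q}, whereas a definite Prolog program always has a single least Herbrand model. This distinction underlies the brave and cautious learning settings discussed next.

p :- not q.
q :- not p.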

Approaches to learning ASP programs can mostly be divided into two categories: brave learners, which aim to learn a program such that at least one answer set covers the examples, and cautious learners, which aim to find a program which covers the examples in all answer sets. ILASP (Inductive Learning of Answer Set Programs) [29] is a collection of ILP systems that learn ASP programs. ILASP is notable because it supports both brave and cautious learning, which are both needed to learn some ASP programs [30]. Moreover, ILASP differs from most Prolog-based ILP systems because it learns unstratified ASP programs, including programs with normal rules, choice rules, and both hard and weak constraints, which classical ILP systems cannot. Learning ASP programs allows for ILP to be used for new problems, such as inducing answer set grammars [27].

4.3 Higher-Order Programs

Imagine learning a droplasts program, which removes the last element of each sublist in a list, e.g. [alice,bob,carol] ↦ [alic,bo,caro]. Given suitable input data, Metagol can learn this first-order recursive program:

f(A,B):-empty(A),empty(B).
f(A,B):-head(A,C),tail(A,D),head(B,E),
        tail(B,F),f1(C,E),f(D,F).
f1(A,B):-reverse(A,C),tail(C,D),reverse(D,B).

Although semantically correct, the program is verbose. To learn more compact programs, an extension of Metagol [8] supports learning higher-order programs, where predicate symbols can be used as terms. For instance, for the same droplasts problem, this extension learns the higher-order program:

f(A,B):-map(A,B,f1).
f1(A,B):-reverse(A,C),tail(C,D),reverse(D,B).

To learn this program, Metagol invents the predicate symbol f1, which is used twice in the program: once as a term in the map(A,B,f1) literal and once as a predicate symbol in the f1(A,B) literal. Compared to the first-order program, this higher-order program is smaller because it uses map/3 to abstract away the manipulation of the list and to avoid the need to learn an explicitly recursive program (recursion is implicit in map/3). By learning higher-order programs, and thus reducing the size of the target program, this approach has been shown to reduce sample complexity and learning times and to improve predictive accuracies [8]. This example illustrates the value of higher-order abstractions and inventions, which allow ILP systems to learn more complex programs using fewer examples with less search.
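
The following minimal sketch shows why the higher-order program works, assuming illustrative definitions of map/3 and tail/2 (reverse/2 is a standard list predicate in most Prolog systems):

tail([_|T],T).
map([],[],_).
map([A|As],[B|Bs],F):-call(F,A,B),map(As,Bs,F).
f1(A,B):-reverse(A,C),tail(C,D),reverse(D,B).
f(A,B):-map(A,B,f1).
% ?- f([[a,l,i,c,e],[b,o,b],[c,a,r,o,l]],B).
% gives B = [[a,l,i,c],[b,o],[c,a,r,o]].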

5 Optimal Programs

In ILP there are often multiple (sometimes infinite) hypotheses that explain the data. Deciding which hypothesis to choose has long been a difficult problem. Older ILP systems were not guaranteed to induce optimal programs, where optimal typically means with respect to the size of the induced program, or the coverage of examples.

A key reason for this limitation was that most search techniques learned a single clause at a time, leading to the construction of sub-programs which were sub-optimal in terms of program size and coverage. For instance, programs induced by Aleph offer no guarantee of optimality with respect to the program size and coverage.

Newer ILP systems try to address this limitation. As with the ability to learn recursive programs, the main development is to take a global view of the induction task. In other words, rather than induce a single clause at a time from a subset of the examples, the idea is to induce a whole program. For instance, ILASP is given as input a hypothesis space with a set of candidate clauses. The ILASP task is to find a minimal subset of clauses that covers as many positive and as few negative examples as possible. To do so, ILASP uses ASP’s optimisation abilities to provably learn the program with the fewest literals. Likewise, Metagol and HEXMIL are guaranteed to induce programs with the fewest clauses.
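
The following toy encoding, written in ASP (clingo) syntax, conveys the idea of selecting a minimal subset of candidate clauses; it is an illustrative assumption, not ILASP's actual encoding. Each candidate clause is guarded by an active/1 atom, a choice rule selects which clauses to include, constraints require the positive example to be covered and the negative example to be excluded, and the optimisation statement prefers the smallest selection (ILASP optimises over the number of literals, but the principle is the same).

% Three candidate clauses for f/1, each guarded by an active/1 atom.
{ active(1..3) }.
f(X) :- active(1), p(X).
f(X) :- active(2), q(X).
f(X) :- active(3), r(X).
% Background facts.
p(a). q(b). r(a).
% Positive example f(a) must be covered; negative example f(b) must not be.
:- not f(a).
:- f(b).
% Prefer the fewest active clauses.
#minimize { 1,C : active(C) }.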

An advantage of learning optimal programs is learning performance. The idea is that the smallest program should provide better generalisation. When Law et al. compared ILASP (which is guaranteed to learn optimal programs) to Inspire [25] (which is not), ILASP achieved a higher F1 score, with both systems given identical hypothesis spaces and optimisation criteria.

In addition to performance advantages, the ability to learn optimal programs opens up ILP to new problems. For instance, learning efficient logic programs has long been considered a difficult problem in ILP [40, 39], mainly because there is no declarative difference between an efficient program, such as mergesort, and an inefficient program, such as bubble sort. To address this issue, Metaopt [11] extends Metagol to support learning efficient programs. Metaopt maintains a cost during the hypothesis search and uses this cost to prune the hypothesis space. To learn minimal time complexity logic programs, Metaopt minimises the number of resolution steps. For instance, imagine trying to learn a find duplicate program, which finds any duplicate element in a list, e.g. [p,r,o,g,r,a,m] ↦ r and [i,n,d,u,c,t,i,o,n] ↦ i. Given suitable input data, Metagol can induce the recursive program:

f(A,B):-head(A,B),tail(A,C),element(C,B).
f(A,B):-tail(A,C),f(C,B).

This program goes through the elements of the list checking whether the same element exists in the rest of the list. Given the same input, Metaopt induces the recursive program:

f(A,B):-mergesort(A,C),f1(C,B).
f1(A,B):-head(A,B),tail(A,C),head(C,B).
f1(A,B):-tail(A,C),f1(C,B).

This program first sorts the input list and then goes through it checking for duplicate adjacent elements. Although larger, both in terms of clauses and literals, the program learned by Metaopt is more efficient (O(n log n)) than the program learned by Metagol (O(n²)). The main implication of this work is that Metaopt can learn efficient robot strategies, efficient time complexity logic programs, and even efficient string transformation programs.
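
For concreteness, the following runnable sketch puts both programs in one file (renamed f_slow/2 and f_fast/2 so they can coexist). The BK definitions are illustrative assumptions: element/2 is list membership and mergesort/2 simply delegates to SWI-Prolog's built-in msort/2.

head([H|_],H).
tail([_|T],T).
element(L,X):-member(X,L).
mergesort(A,B):-msort(A,B).
% Program found by Metagol: for each element, scan the rest of the list.
f_slow(A,B):-head(A,B),tail(A,C),element(C,B).
f_slow(A,B):-tail(A,C),f_slow(C,B).
% Program found by Metaopt: sort first, then look for adjacent duplicates.
f_fast(A,B):-mergesort(A,C),f1(C,B).
f1(A,B):-head(A,B),tail(A,C),head(C,B).
f1(A,B):-tail(A,C),f1(C,B).
% ?- f_fast([p,r,o,g,r,a,m],B). gives B = r.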

Following this idea, FastLAS [28] is an ASP-based ILP system that takes as input a custom scoring function and computes an optimal solution with respect to the given scoring function. The authors show that this approach allows a user to optimise domain-specific performance metrics on real-world datasets, such as access control policies.

6 Different Technologies

Older ILP systems mostly use Prolog for reasoning. Recent work considers using different technologies.

6.1 Constraint Satisfaction and Satisfiability

There have been tremendous recent advances in SAT and SMT solvers. To leverage these advances, much recent work uses ASP to induce logic programs [47, 2, 37, 29, 23, 24, 22]. The main motivations are to leverage (1) the language benefits of ASP (Section 4.2), and (2) the efficiency and optimisation techniques of modern ASP solvers, such as CLASP [20], which supports conflict propagation and learning. With similar motivations, several works [1, 49, 46] synthesise Datalog programs by encoding the ILP task as an SMT problem.

These approaches have been shown to reduce learning times compared to standard Prolog-based approaches. However, some unresolved issues remain. A key issue is that most approaches encode an ILP problem as a single (often very large) satisfiability problem, and therefore often struggle to scale to very large problems [8]. Although preliminary work attempts to tackle this issue [28], more work is needed before these approaches can handle very large problems.

6.2 Neural Networks

With the recent rise of deep learning and neural networks, several approaches have explored using gradient-based methods to learn logic programs. These approaches all replace absolute logical reasoning with a relaxed version that yields continuous values reflecting the confidence of the conclusion. Although this approach limits the expressivity of hypotheses, it potentially allows for gradient-based methods to be used to learn from large datasets.

The research has primarily developed in three directions. The first concerns imitating logical reasoning with tensor calculus [53, 16]. These approaches represent predicates as binary tensors over the domain of constants and perform reasoning by chains of tensor products imitating the application of a clause. The second concerns relaxations of the subset-selection problem [19, 50], in which the task of a neural network is to select a subset of clauses from a space of pre-defined clauses. The third direction, neural theorem provers [48], turns the learning problem into learning to perform soft unification, which unifies not only matching symbols but also similar ones, over a fixed set of proofs.

The major challenges for neural approaches are generalisation beyond the training data and data efficiency. The majority of these approaches embed logical symbols, i.e. they replace symbols with vectors, and therefore a learned model is unable to work with unseen constants. Moreover, neural methods often require millions of examples [16] to learn concepts that symbolic ILP can learn from just a few.

7 Limitations and Future Work

The recent advances surveyed in this paper have opened new problems for future work to address.

Relevance

New methods for predicate invention (Section 3.1) and transfer learning (Section 3.2) have improved the abilities of ILP systems to learn large programs. Moreover, these techniques raise the potential for ILP to be used in lifelong learning settings. However, inventing and acquiring new BK could lead to a problem of too much BK, which can overwhelm an ILP system [15]. On this issue, a key under-explored topic is that of relevancy. Given a new induction problem with large amounts of BK, how does an ILP system decide which BK is relevant? Without efficient methods of relevance identification, it is unclear how efficient lifelong learning can be achieved.

Noisy BK

Lifelong learning is seen as key to AI, and recent work in ILP has shown promise in this direction (Section 3.2). However, unresolved issues remain. One key issue is the underlying uncertainty associated with adding learned programs to the BK. By the nature of induction, such programs are expected to be noisy, yet they are the building blocks for further inductive inference. Building noisy programs on top of other noisy programs could lead to eventual incoherence of the learned program.

Probabilistic ILP

A principled way to handle noise is to unify logical and probabilistic reasoning, which is the focus of statistical relational artificial intelligence (StarAI) [45]. While StarAI is a growing field, inducing probabilistic logic programs has received little attention, with a few notable exceptions [4, 44], as inference remains the main challenge. Addressing this issue, i.e. unifying probability and logic in an inductive setting, would be a major achievement [32], and the ILP developments outlined in this paper will be a crucial element of that progress.

Explainability

Explainability is one of the claimed advantages of a symbolic representation. Recent work [34] evaluates the comprehensibility of ILP hypotheses using Michie's [33] framework of ultra-strong machine learning, in which a learned hypothesis is expected not only to be accurate but also to demonstrably improve the performance of a human provided with that hypothesis. The paper empirically demonstrates improved human understanding directly through learned hypotheses. However, more work is required to better understand the conditions under which this can be achieved.

Summary

As ILP approaches 30, we think that the recent advances surveyed in this paper put ILP in a prime position to have a significant impact on AI over the next decade, especially in addressing the key limitations of state-of-the-art machine learning.

Footnotes

  1. The idea of using metarules to restrict the hypothesis space has been widely adopted by many approaches [52, 1, 48, 19, 49, 3, 50, 22]. However, despite their now widespread use, there is little work determining which metarules to use for a given learning task ([13] is an exception), which future work must address.
  2. Metagol can induce longer clauses through predicate invention, which is described in Section 3.1.
  3. Notation for a predicate symbol head with two arguments.

References

  1. A. Albarghouthi, P. Koutris, M. Naik and C. Smith (2017) Constraint-based synthesis of Datalog programs. In CP, Cited by: §4.1, §6.1, footnote 1.
  2. D. Athakravi, D. Corapi, K. Broda and A. Russo (2013) Learning through hypothesis refinement using answer set programming. In ILP, External Links: Link, Document Cited by: §6.1.
  3. M. Bain and A. Srinivasan (2018) Identification of biological transition systems using meta-interpreted logic programs. Machine Learning 107 (7), pp. 1171–1206. External Links: Link, Document Cited by: §1, footnote 1.
  4. E. Bellodi and F. Riguzzi (2015) Structure learning of probabilistic logic programs by searching the clause space. TPLP 15 (2), pp. 169–212. External Links: Link, Document Cited by: §7.
  5. Y. Bengio, T. Deleu, N. Rahaman, N. R. Ke, S. Lachapelle, O. Bilaniuk, A. Goyal and C. J. Pal (2019) A meta-transfer objective for learning to disentangle causal mechanisms. CoRR abs/1901.10912. External Links: Link, 1901.10912 Cited by: §1.
  6. H. Blockeel and L. D. Raedt (1998) Top-down induction of first-order logical decision trees. Artif. Intell. 101 (1-2), pp. 285–297. External Links: Link, Document Cited by: §1, §3.1.
  7. F. Chollet (2019) On the measure of intelligence. CoRR abs/1911.01547. External Links: Link, 1911.01547 Cited by: §1.
  8. A. Cropper, R. Morel and S. Muggleton (2019-12-03) Learning higher-order logic programs. Machine Learning. External Links: ISSN 1573-0565, Document, Link Cited by: §2, §3.1, §3, §4.3, §4.3, §6.1.
  9. A. Cropper and S. H. Muggleton (2015) Learning efficient logical robot strategies involving composable objects. In IJCAI, Cited by: §2.
  10. A. Cropper and S. H. Muggleton (2016) Metagol system. Note: https://github.com/metagol/metagol External Links: Link Cited by: §1, §2.
  11. A. Cropper and S. H. Muggleton (2019) Learning efficient logic programs. Machine Learning 108 (7), pp. 1063–1083. External Links: Link, Document Cited by: §5.
  12. A. Cropper, A. Tamaddoni-Nezhad and S. H. Muggleton (2015) Meta-interpretive learning of data transformation programs. In ILP, Cited by: §2.
  13. A. Cropper and S. Tourret (2019-11-20) Logical reduction of metarules. Machine Learning. External Links: ISSN 1573-0565, Document, Link Cited by: footnote 1.
  14. A. Cropper (2019) Playgol: learning programs through play. In IJCAI, Cited by: §2, §3.1, §3.2.
  15. A. Cropper (2020) Forgetting to learn logic programs. In AAAI, Cited by: §1, §3.2, §7.
  16. H. Dong, J. Mao, T. Lin, C. Wang, L. Li and D. Zhou (2019) Neural logic machines. In ICLR, Cited by: §6.2, §6.2.
  17. S. Dumančić and H. Blockeel (2017) Clustering-based relational unsupervised representation learning with an explicit distributed representation. In IJCAI, Cited by: §3.1.
  18. S. Dumančić, T. Guns, W. Meert and H. Blockeel (2019) Learning relational representations with auto-encoding logic programs. In IJCAI, Cited by: §3.1.
  19. R. Evans and E. Grefenstette (2018) Learning explanatory rules from noisy data. J. Artif. Intell. Res. 61, pp. 1–64. External Links: Link, Document Cited by: §1, §2, §3.1, §3, §4.1, §6.2, footnote 1.
  20. M. Gebser, B. Kaufmann and T. Schaub (2012) Conflict-driven answer set solving: from theory to practice. Artif. Intell. 187, pp. 52–89. External Links: Link, Document Cited by: §6.1.
  21. K. Inoue, T. Ribeiro and C. Sakama (2014) Learning from interpretation transition. Machine Learning 94 (1), pp. 51–79. External Links: Link, Document Cited by: §1.
  22. T. Kaminski, T. Eiter and K. Inoue (2019) Meta-interpretive learning using HEX-programs. In IJCAI, Cited by: §2, §3.1, §6.1, footnote 1.
  23. N. Katzouris, A. Artikis and G. Paliouras (2015) Incremental learning of event definitions with inductive logic programming. Machine Learning 100 (2-3), pp. 555–585. External Links: Link, Document Cited by: §6.1.
  24. N. Katzouris, A. Artikis and G. Paliouras (2016) Online learning of event definitions. TPLP 16 (5-6), pp. 817–833. External Links: Link, Document Cited by: §1, §6.1.
  25. M. Kazmi, P. Schüller and Y. Saygin (2017) Improving scalability of inductive logic programming via pruning and best-effort optimisation. Expert Syst. Appl. 87, pp. 291–303. External Links: Link, Document Cited by: §5.
  26. B. M. Lake, T. D. Ullman, J. B. Tenenbaum and S. J. Gershman (2016) Building machines that learn and think like people. CoRR abs/1604.00289. External Links: Link, 1604.00289 Cited by: §1, §3.2.
  27. M. Law, A. Russo, E. Bertino, K. Broda and J. Lobo (2019) Representing and learning grammars in answer set programming. In AAAI, Cited by: §2, §4.2.
  28. M. Law, A. Russo, E. Bertino, K. Broda and J. Lobo (2020) FastLAS: scalable inductive logic programming incorporating domain-specific optimisation criteria. In AAAI, Cited by: §5, §6.1.
  29. M. Law, A. Russo and K. Broda (2014) Inductive learning of answer set programs. In JELIA, Cited by: §1, §2, §4.2, §6.1.
  30. M. Law, A. Russo and K. Broda (2018) The complexity and generality of learning answer set programs. Artif. Intell. 259, pp. 110–146. External Links: Link, Document Cited by: §4.2, §4.2.
  31. D. Lin, E. Dechter, K. Ellis, J. B. Tenenbaum and S. Muggleton (2014) Bias reformulation for one-shot function induction. In ECAI, Cited by: §1, §2, §3.2.
  32. G. Marcus (2018) Deep learning: A critical appraisal. CoRR abs/1801.00631. External Links: Link, 1801.00631 Cited by: §1, §7.
  33. D. Michie (1988) Machine learning in the next five years. In EWSL, Cited by: §1.
  34. S.H. Muggleton, U. Schmid, C. Zeller, A. Tamaddoni-Nezhad and T. Besold (2018) Ultra-strong machine learning - comprehensibility of programs learned with ILP. Machine Learning 107, pp. 1119–1140. External Links: Link Cited by: §7.
  35. S. Muggleton and W. L. Buntine (1988) Machine invention of first order predicates by inverting resolution. In ICML, pp. 339–352. Cited by: §3.1.
  36. S. Muggleton, W. Dai, C. Sammut, A. Tamaddoni-Nezhad, J. Wen and Z. Zhou (2018) Meta-interpretive learning from noisy images. Machine Learning 107 (7), pp. 1097–1118. External Links: Link, Document Cited by: §1, §3.
  37. S. H. Muggleton, D. Lin, N. Pahlavi and A. Tamaddoni-Nezhad (2014) Meta-interpretive learning: application to grammatical inference. Machine Learning 94 (1), pp. 25–49. External Links: Link, Document Cited by: §2, §6.1.
  38. S. H. Muggleton, D. Lin and A. Tamaddoni-Nezhad (2015) Meta-interpretive learning of higher-order dyadic Datalog: predicate invention revisited. Machine Learning 100 (1), pp. 49–73. External Links: Link, Document Cited by: §2.
  39. S. Muggleton, L. D. Raedt, D. Poole, I. Bratko, P. A. Flach, K. Inoue and A. Srinivasan (2012) ILP turns 20 - biography and future challenges. Machine Learning 86 (1), pp. 3–23. External Links: Link, Document Cited by: §2, §5.
  40. S. Muggleton and L. D. Raedt (1994) Inductive logic programming: theory and methods. J. Log. Program. 19/20, pp. 629–679. External Links: Link, Document Cited by: §5.
  41. S. Muggleton (1991) Inductive logic programming. New Generation Comput. 8 (4), pp. 295–318. External Links: Link, Document Cited by: §1.1, §1.
  42. S. Muggleton (1995) Inverse entailment and progol. New Generation Comput. 13 (3&4), pp. 245–286. External Links: Link, Document Cited by: §1, §2, §3.1.
  43. J. R. Quinlan (1990) Learning logical definitions from relations. Machine Learning 5, pp. 239–266. External Links: Link, Document Cited by: §1, §3.1.
  44. L. D. Raedt, A. Dries, I. Thon, G. V. den Broeck and M. Verbeke (2015) Inducing probabilistic relational rules from probabilistic examples. In IJCAI, Cited by: §7.
  45. L. D. Raedt, K. Kersting, S. Natarajan and D. Poole (2016) Statistical relational artificial intelligence: logic, probability, and computation. Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan & Claypool Publishers. External Links: Link, Document Cited by: §7.
  46. M. Raghothaman, J. Mendelson, D. Zhao, M. Naik and B. Scholz (2020) Provenance-guided synthesis of Datalog programs. PACMPL. Cited by: §4.1, §6.1.
  47. O. Ray (2009) Nonmonotonic abductive inductive learning. J. Applied Logic 7 (3), pp. 329–340. External Links: Link, Document Cited by: §6.1.
  48. T. Rocktäschel and S. Riedel (2017) End-to-end differentiable proving. In NIPS, Cited by: §6.2, footnote 1.
  49. X. Si, W. Lee, R. Zhang, A. Albarghouthi, P. Koutris and M. Naik (2018) Syntax-guided synthesis of Datalog programs. In ESEC/SIGSOFT, Cited by: §4.1, §6.1, footnote 1.
  50. X. Si, M. Raghothaman, K. Heo and M. Naik (2019) Synthesizing Datalog programs using numerical relaxation. In IJCAI, Cited by: §6.2, footnote 1.
  51. A. Srinivasan (2001) The ALEPH manual. Machine Learning at the Computing Laboratory, Oxford University. Cited by: §1, §3.1.
  52. W. Y. Wang, K. Mazaitis and W. W. Cohen (2014) Structure learning via parameter learning. In CIKM, External Links: Link, Document Cited by: footnote 1.
  53. F. Yang, Z. Yang and W. W. Cohen (2017) Differentiable learning of logical rules for knowledge base reasoning. In NIPS 2017, Cited by: §6.2.