From Knowledge Graph Embedding to Ontology Embedding: Region Based Representations of Relational Structures

(Authors appear in strict alphabetical order.)

Víctor Gutiérrez-Basulto    Steven Schockaert
School of Computer Science and Informatics
Cardiff University, Cardiff, UK
{gutierrezbasultov, schockaerts1}@cardiff.ac.uk
Abstract

Recent years have witnessed the enormous success of low-dimensional vector space representations of knowledge graphs to predict missing facts or find erroneous ones. Currently, however, it is not yet well-understood how ontological knowledge, e.g. given as a set of (existential) rules, can be embedded in a principled way. To address this shortcoming, in this paper we introduce a framework based on convex regions, which can faithfully incorporate ontological knowledge into the vector space embedding. Our technical contribution is two-fold. First, we show that some of the most popular existing embedding approaches are not capable of modelling even very simple types of rules. Second, we show that our framework can represent ontologies that are expressed using so-called quasi-chained existential rules in an exact way, such that any set of facts which is induced using that vector space embedding is logically consistent and deductively closed with respect to the input ontology.


1 Introduction

Knowledge graphs (KGs), i.e. sets of (subject, predicate, object) triples, play an increasingly central role in fields such as information retrieval and natural language processing (??). A wide variety of KGs are currently available, including carefully curated resources such as WordNet (?), crowdsourced resources such as Freebase (?), ConceptNet (?) and WikiData (?), and resources that have been extracted from natural language such as NELL (?). However, despite the large scale of some of these resources, they are, perhaps inevitably, far from complete. This has sparked a large amount of research on the topic of automated knowledge base completion, e.g. random-walk based machine learning models (??) and factorization and embedding approaches (??). The main premise underlying these approaches is that many plausible triples can be found by exploiting the regularities that exist in a typical knowledge graph. For example, if we know (Peter Jackson, Directed, The fellowship of the ring) and (The fellowship of the ring, Has-sequel, The two towers), we may expect the triple (Peter Jackson, Directed, The two towers) to be somewhat plausible, if we can observe from the rest of the knowledge graph that sequels are often directed by the same person.

Due to their conceptual simplicity and high scalability, knowledge graph embeddings have become one of the most popular strategies for discovering and exploiting such regularities. These embeddings are $n$-dimensional vector space representations, in which each entity $e$ (i.e. each node from the KG) is associated with a vector $\mathbf{e} \in \mathbb{R}^n$ and each relation name $R$ is associated with a scoring function $s_R : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ that encodes information about the likelihood of triples. For ease of presentation, we will formulate KG embedding models such that $s_R(\mathbf{e},\mathbf{f}) < s_R(\mathbf{e}',\mathbf{f}')$ iff the triple $(e,R,f)$ is considered more likely than the triple $(e',R,f')$. Both the entity vectors and the scoring functions are learned from the information in the given KG. The main assumption is that the resulting vector space representation of the KG captures the important regularities from the considered domain. In particular, there will be triples $(e,R,f)$ which are not in the original KG, but for which $s_R(\mathbf{e},\mathbf{f})$ is nonetheless low. They thus correspond to facts which are plausible, given the regularities that are observed in the KG as a whole, but which are not contained in the original KG. The number of dimensions $n$ of the embedding essentially controls the cautiousness of the knowledge graph completion process: the fewer dimensions, the more regularities can be discovered by the model, but the higher the risk of unwarranted inferences. On the other hand, if the number of dimensions is too high, the embedding may simply capture the given KG, without suggesting any additional plausible triples.

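For example, in the seminal TransE model (?), relations are modelled as vector translations. In particular, the TransE scoring function is given by $s_R(\mathbf{e},\mathbf{f}) = d(\mathbf{e}+\mathbf{r},\mathbf{f})$, where $d$ is the Euclidean distance and $\mathbf{r} \in \mathbb{R}^n$ is a vector encoding of the relation name $R$. Another popular model is DistMult (?), which corresponds to the choice $s_R(\mathbf{e},\mathbf{f}) = -\sum_{i=1}^n e_i r_i f_i$, where we write $e_i$ for the $i$-th coordinate of $\mathbf{e}$, and similar for $\mathbf{f}$ and $\mathbf{r}$.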
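The two scoring functions can be sketched in a few lines of Python. This is only an illustration under the convention used in this section, where a lower score means a more plausible triple (hence the negated product for DistMult); the function names and the toy vectors are assumptions made for the example and are not taken from the cited models' implementations.

```python
import math

def transe_score(e, r, f):
    """Euclidean distance between the translated head e + r and the tail f."""
    translated = [ei + ri for ei, ri in zip(e, r)]
    return math.dist(translated, f)

def distmult_score(e, r, f):
    """Negated sum of coordinate-wise products, so that lower means more plausible."""
    return -sum(ei * ri * fi for ei, ri, fi in zip(e, r, f))

# Toy 2-dimensional vectors (hypothetical values, chosen by hand).
head, relation, tail = [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]
print(transe_score(head, relation, tail))
print(distmult_score([1.0, 2.0], [1.0, 1.0], [3.0, 4.0]))
```

Swapping the head and tail arguments leaves `distmult_score` unchanged, which is the symmetry limitation of DistMult discussed in Section 2.1.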

Given the embedding of a knowledge graph, it often makes sense to consider hard thresholds $\lambda_R$ such that a triple $(e,R,f)$ is considered valid iff $s_R(\mathbf{e},\mathbf{f}) \leq \lambda_R$. In fact, KG embeddings are often learned using a max-margin loss function which directly encodes this assumption. The vector space representation of a given relation name $R$ can then be viewed as a region $\eta(R)$ in $\mathbb{R}^{2n}$, defined as follows:

 $$\eta(R) = \{\mathbf{e} \oplus \mathbf{f} \mid s_R(\mathbf{e},\mathbf{f}) \leq \lambda_R\}$$

where we write $\oplus$ for vector concatenation. In particular, note that $(e,R,f)$ is considered a valid triple iff $\mathbf{e} \oplus \mathbf{f} \in \eta(R)$. Figure 1 illustrates the types of regions that are obtained for the TransE and DistMult models.

This region-based view of relations has a number of advantages. First, such regions can naturally be defined for relations of any arity, while the standard formulations of knowledge graph embedding models are typically restricted to binary relations. Second, and perhaps more fundamentally, it suggests a natural way to take into account prior knowledge about dependencies between different relation types. In particular, for many knowledge graphs, some kind of ontology is available, which can be viewed as a set of rules describing such dependencies. These rules naturally translate to spatial constraints on the regions $\eta(R)$. For instance, if we know that $R(X,Y) \to S(X,Y)$ holds, it would be natural to require that $\eta(R) \subseteq \eta(S)$. If a knowledge graph embedding captures the rules of a given ontology in this sense, we will call it a geometric model of the ontology.

By requiring that the embedding of a knowledge graph should be a geometric model of a given ontology, we can effectively exploit the knowledge contained in that ontology to obtain higher-quality representations. Indeed, there exists empirical support for the usefulness of rules for learning embeddings (????). A related advantage of geometric models over standard KG embeddings is that the set of triples which is considered valid based on the embedding is guaranteed to be logically consistent and deductively closed (relative to the given ontology). Finally, since geometric models are essentially “ontology embeddings”, they could be used for ontology completion, i.e. for finding plausible missing rules from the given ontology similar to how standard KG embedding models are used to find plausible missing triples from a KG.

Contributions. First, we will show that the most popular approaches to KG embedding are actually not compatible with the notion of a geometric model. For instance, as we will see, the representations obtained by DistMult (and its variants) can only model a very restricted class of subsumption hierarchies. This is problematic, as it not only means that we cannot impose the rules from a given ontology for learning knowledge graph embeddings, but also that the types of regularities that are captured by such rules cannot be learned from data either. We then investigate a novel framework in which relations are modelled as arbitrary convex regions in $\mathbb{R}^{k \cdot n}$, with $k$ the arity of the relation name. Apart from providing a unified view on many existing models, this framework suggests several natural approaches to knowledge graph embedding which have not previously been considered.

In this paper, we will investigate knowledge bases composed of an ontology and a database (set of facts, generalizing triples). We are particularly concerned with pinpointing the kinds of existential rules that can be faithfully captured by such convex geometric models. The choice of existential rules as a target relational language is motivated by the fact that it is, arguably, one of the most important logical formalisms for encoding ontologies, describing not only constraints on the currently available knowledge or data, but also intensional knowledge about the domain of discourse. We show that the class of so-called quasi-chained existential rules can be properly modelled by convex geometric models. While convex geometric models are thus still not general enough to capture arbitrary existential rules, this particular class does subsume several key ontology languages based on description logics. Finally, we show that to capture arbitrary existential rules, a further generalization is needed, based on a non-linear transformation of the vector concatenations.

2 Background

In this section we provide some background on knowledge graph embedding and existential rules.

2.1 Knowledge Graph Embedding

A wide variety of KG embedding methods have already been proposed, varying mostly in the type of scoring function that is used. One popular class of methods was inspired by the TransE model. In particular, several authors have proposed generalizations of TransE to address the issue that TransE is only suitable for one-to-one relations (??): if $(e,R,f)$ and $(e,R,g)$ were both in the KG, then the TransE training objective would encourage $\mathbf{f}$ and $\mathbf{g}$ to be represented as identical vectors. The main idea behind these generalizations is to map the entities to a relation-specific subspace before applying the translation. For instance, the TransR scoring function is given by $s_R(\mathbf{e},\mathbf{f}) = d(M_r\mathbf{e}+\mathbf{r}, M_r\mathbf{f})$, where $M_r$ is a relation-specific matrix (?). As a further generalization, in STransE a different matrix is used for the head entity $e$ and for the tail entity $f$, leading to the scoring function $s_R(\mathbf{e},\mathbf{f}) = d(M_r^h\mathbf{e}+\mathbf{r}, M_r^t\mathbf{f})$ (?).

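A key limitation of DistMult, introduced above, is the fact that it can only model symmetric relations. A natural solution is to represent each entity $e$ using two vectors $\mathbf{e}_h$ and $\mathbf{e}_t$, which are respectively used when $e$ appears in the head (i.e. as the first argument) or in the tail (i.e. as the second argument). In other words, the scoring function then becomes $s_R(\mathbf{e},\mathbf{f}) = -\sum_{i=1}^n e^h_i r_i f^t_i$, where we write $\mathbf{e}_h = (e^h_1,\dots,e^h_n)$ and similar for $\mathbf{f}_t$. The problem with this approach is that there is no connection at all between $\mathbf{e}_h$ and $\mathbf{e}_t$, which makes learning suitable representations more difficult. To address this, the ComplEx model (?) represents entities and relations as vectors of complex numbers, such that $\mathbf{e}_h$ and $\mathbf{e}_t$ only differ in their imaginary parts. Let us write $\langle \mathbf{a},\mathbf{b},\mathbf{c} \rangle$, with $\mathbf{a},\mathbf{b},\mathbf{c} \in \mathbb{R}^n$, for the bilinear product $\sum_{i=1}^n a_i b_i c_i$. Furthermore, for a complex vector $\mathbf{a} \in \mathbb{C}^n$, we write $\mathrm{re}(\mathbf{a})$ and $\mathrm{im}(\mathbf{a})$ for the real and imaginary parts of $\mathbf{a}$ respectively. It can be shown (?) that the scoring function of ComplEx is equivalent to

 $$s_R(\mathbf{e},\mathbf{f}) = -\langle \mathrm{re}(\mathbf{e}),\mathrm{re}(\mathbf{r}),\mathrm{re}(\mathbf{f}) \rangle - \langle \mathrm{re}(\mathbf{e}),\mathrm{im}(\mathbf{r}),\mathrm{im}(\mathbf{f}) \rangle - \langle \mathrm{im}(\mathbf{e}),\mathrm{re}(\mathbf{r}),\mathrm{im}(\mathbf{f}) \rangle + \langle \mathrm{im}(\mathbf{e}),\mathrm{im}(\mathbf{r}),\mathrm{re}(\mathbf{f}) \rangle$$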

Thanks to the fact that $\mathbf{e}_h$ and $\mathbf{e}_t$ thus have a partly shared representation, the ComplEx model can achieve state-of-the-art results. Recently, in (?), a simpler approach was proposed to address the symmetry issue of DistMult. The proposed model, called SimplE, avoids the use of complex vectors. In this model, the DistMult scoring function is used with a separate representation for head and tail mentions of an entity, but for each triple $(e,R,f)$ in the knowledge graph, the triple $(f,R^{-1},e)$ is additionally considered, with $R^{-1}$ denoting the inverse of $R$. This means that each such triple affects the representation of $\mathbf{e}_h$, $\mathbf{e}_t$, $\mathbf{f}_h$ and $\mathbf{f}_t$, and in this way, the main drawback of using separate representations for head and tail mentions is avoided. This model was experimentally found to perform similarly to ComplEx.

The RESCAL model (?) uses a bilinear scoring function $s_R(\mathbf{e},\mathbf{f}) = -\mathbf{e}^T M_r \mathbf{f}$, where the relation $R$ is modelled as an $n \times n$ matrix $M_r$. Note that DistMult can be seen as a special case of RESCAL in which only diagonal matrices are considered. Similarly, it is easy to verify that ComplEx also corresponds to a bilinear model, with a slightly different restriction on the type of considered matrices. Without any restriction on the type of considered matrices, however, the RESCAL model is prone to overfitting. The neural tensor model (NTN), proposed in (?), further generalizes RESCAL by using a two-layer neural network formulation, but similarly tends to suffer from overfitting in practice.
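The relationship between RESCAL and DistMult can be checked numerically. The sketch below is only an illustration: it leaves the sign convention aside and works with the plain bilinear form, and the helper names (`bilinear`, `diagonal`, `trilinear`) and the toy vectors are assumptions made for this example.

```python
def bilinear(e, M, f):
    """Bilinear form e^T M f for a square matrix M given as a list of rows."""
    return sum(e[i] * M[i][j] * f[j] for i in range(len(e)) for j in range(len(f)))

def diagonal(r):
    """Square matrix with the vector r on its diagonal and zeros elsewhere."""
    return [[r[i] if i == j else 0.0 for j in range(len(r))] for i in range(len(r))]

def trilinear(e, r, f):
    """Sum of coordinate-wise products, as used by DistMult."""
    return sum(ei * ri * fi for ei, ri, fi in zip(e, r, f))

# Toy vectors (hypothetical values).
e, r, f = [1.0, 2.0], [0.5, -1.0], [3.0, 4.0]

# A diagonal matrix reproduces the DistMult product and is symmetric in e and f.
print(bilinear(e, diagonal(r), f) == trilinear(e, r, f))
print(bilinear(e, diagonal(r), f) == bilinear(f, diagonal(r), e))

# A non-diagonal matrix can distinguish head from tail.
M = [[0.0, 1.0], [0.0, 0.0]]
print(bilinear(e, M, f), bilinear(f, M, e))
```

The last line shows why an unrestricted matrix can represent asymmetric relations, at the cost of $n^2$ parameters per relation.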

Expressivity. Intuitively, the reason why KG embedding models are able to identify plausible triples is that they can only represent knowledge graphs that exhibit a certain type of regularity. They can be seen as a particular class of dimensionality reduction methods: the lower the number of dimensions $n$, the stronger the KG model enforces some notion of regularity (where the exact kind of regularity depends on the chosen KG embedding model). However, when the number of dimensions is sufficiently high, it is desirable that any KG can be represented in an exact way, in the following sense: for any given set of triples $P$ which are known to be valid and any set of triples $N$ which are known to be false, given a sufficiently high number of dimensions $n$, there always exists an embedding and thresholds $\lambda_R$ such that

 $$\forall (e,R,f) \in P.\ s_R(\mathbf{e},\mathbf{f}) \leq \lambda_R \tag{1}$$
 $$\forall (e,R,f) \in N.\ s_R(\mathbf{e},\mathbf{f}) > \lambda_R \tag{2}$$

A KG embedding model is called fully expressive (?), or universal (?), if (1)–(2) can be guaranteed for any disjoint sets of triples $P$ and $N$. If a KG embedding model is not fully expressive, it means that there are a priori constraints on the kind of knowledge graphs that can be represented, which can lead to unwarranted inferences when using this model for KG completion. In contrast, for fully expressive models, the types of KGs that can be represented are determined by the number of dimensions, which is typically seen as a hyperparameter, i.e. this number is tuned separately for each KG to avoid (too many) unwarranted inferences.
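Conditions (1)–(2) amount to a simple check once an embedding and thresholds are fixed. The following sketch is illustrative only: the function name `represents_exactly`, the one-dimensional TransE-style embedding and all numeric values are assumptions chosen by hand.

```python
def represents_exactly(score, thresholds, positives, negatives):
    """Check (1)-(2): positive triples score at or below the threshold, negatives above it."""
    return (all(score(e, R, f) <= thresholds[R] for (e, R, f) in positives)
            and all(score(e, R, f) > thresholds[R] for (e, R, f) in negatives))

# Hypothetical 1-dimensional embedding with a translation-style score.
entity = {"a": 0.0, "b": 1.0, "c": 2.0}
relation = {"next": 1.0}
thresholds = {"next": 0.5}

def score(e, R, f):
    return abs(entity[e] + relation[R] - entity[f])

P = [("a", "next", "b"), ("b", "next", "c")]
N = [("a", "next", "c")]
print(represents_exactly(score, thresholds, P, N))
```

A model is fully expressive when, for every disjoint pair `P`, `N`, some choice of vectors and thresholds makes this check succeed; the sketch only verifies one given choice.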

It turns out that translation based methods such as TransE, STransE and related generalizations are not fully expressive (?), and in fact put rather severe restrictions on the types of relations that can be represented in the sense of (1)–(2). For instance, it was shown in (?) that translation based methods can only fully represent a knowledge graph if each of its relations $R$ satisfies the following properties for every subset of entities $E$:

1. If $R$ is reflexive over $E$, then $R$ is also symmetric and transitive over $E$.

2. If an entity $e_1$ is related by $R$ to every entity in $E$ and an entity $e_2$ is related by $R$ to some entity in $E$, then $e_2$ is also related by $R$ to every entity in $E$.

However, both ComplEx and SimplE have been shown to be fully expressive.

Modelling Textual Descriptions. Several methods have been proposed which aim to learn better knowledge graph embeddings by exploiting textual descriptions of entities (???) or by extracting information about the relationship between two entities from sentences mentioning both of them (?). Apart from improving the overall quality of the embeddings, a key advantage of such approaches is that they allow us to predict plausible triples involving entities which do not occur in the initial knowledge graph.

2.2 Existential Rules

Existential rules (a.k.a. Datalog$^\pm$) are a family of rule-based formalisms for modelling ontologies. An existential rule is a datalog-like rule with existentially quantified variables in the head, i.e. it extends traditional datalog with value invention. The appeal of existential rules comes from the fact that they are extensions of the prominent $\mathcal{EL}$ and DL-Lite families of description logics (DLs) (?). For instance, existential rules can describe $k$-ary relations, while DLs are constrained to unary and binary relations. We next formally introduce existential rules.

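Syntax. Let $\mathbf{C}$, $\mathbf{N}$ and $\mathbf{V}$ be infinite disjoint sets of constants, (labelled) nulls and variables, respectively. A term $t$ is an element in $\mathbf{C} \cup \mathbf{N} \cup \mathbf{V}$; an atom $\alpha$ is an expression of the form $R(t_1,\dots,t_m)$, where $R$ is a relation name (or predicate symbol, or simply predicate) with arity $m$ and terms $t_i$. We denote with $\mathrm{terms}(\alpha)$ the set $\{t_1,\dots,t_m\}$ and with $\mathrm{vars}(\alpha)$ the set $\mathrm{terms}(\alpha) \cap \mathbf{V}$. An existential rule $\sigma$ is an expression of the form

 $$B_1 \wedge \dots \wedge B_n \to \exists X_1,\dots,X_j.\ H_1 \wedge \dots \wedge H_k, \tag{3}$$

where $B_i$ for $1 \leq i \leq n$, and $H_i$ for $1 \leq i \leq k$, are atoms with terms in $\mathbf{C} \cup \mathbf{V}$ and $X_i$ for $1 \leq i \leq j$ are variables. From here on, we assume w.l.o.g that  (?). We use $\mathrm{body}(\sigma)$ and $\mathrm{head}(\sigma)$ to refer to $\{B_1,\dots,B_n\}$ and $\{H_1,\dots,H_k\}$, respectively. We call $X_1,\dots,X_j$ the existential variables of $\sigma$; if $j = 0$, $\sigma$ is called a datalog rule. We further allow negative constraints (or simply constraints), which are expressions of the form $B_1 \wedge \dots \wedge B_n \to \bot$ where the $B_i$s are as above and $\bot$ denotes the truth constant false. A finite set $\Sigma$ of existential rules and constraints is called an ontology; and a datalog program if $\Sigma$ contains only datalog rules (and constraints).

Let $\mathcal{R}$ be a set of relation names. A database $D$ is a finite set of facts over $\mathcal{R}$, i.e. atoms with terms in $\mathbf{C}$. A knowledge base (KB) $\mathcal{K}$ is a pair $(\Sigma, D)$ with $\Sigma$ an ontology (datalog program) and $D$ a database.

Semantics. An interpretation $I$ over $\mathcal{R}$ is a (possibly infinite) set of atoms over $\mathcal{R}$ with terms in $\mathbf{C} \cup \mathbf{N}$. An interpretation $I$ is a model of $\Sigma$ if it satisfies all rules and constraints: $\{B_1,\dots,B_n\} \subseteq I$ implies $\{H_1,\dots,H_k\} \subseteq I$ for every $\sigma$ defined as above in $\Sigma$, where existential variables might be witnessed by labelled nulls, and $\{B_1,\dots,B_n\} \not\subseteq I$ for all constraints defined as above in $\Sigma$; it is a model of a database $D$ if $D \subseteq I$; it is a model of a KB $\mathcal{K} = (\Sigma, D)$, written $I \models \mathcal{K}$, if it is a model of $\Sigma$ and $D$. We say that a KB $\mathcal{K}$ is satisfiable if it has a model. We refer to elements in $\mathbf{C} \cup \mathbf{N}$ simply as objects, call atoms whose terms are all objects ground, and refer to the set of all objects occurring in $I$ as the objects of $I$.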

Example 1.

Let be a database and an ontology composed by the following rules

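 $$\mathit{Wife}(X) \wedge \mathit{Married}(X,Y) \to \mathit{Husband}(Y) \tag{4}$$
 $$\mathit{Wife}(Y) \to \exists X.\ \mathit{Husband}(X) \wedge \mathit{Married}(X,Y) \tag{5}$$
 $$\mathit{Husband}(X) \wedge \mathit{Wife}(X) \to \bot \tag{6}$$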

Then, an example of a model of is the set of atoms where are labelled nulls. Note that e.g.  is not included in any model of due to (6).
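To make the semantics concrete, the sketch below checks whether a finite set of atoms is a model of rules (4)–(6) together with a database. It is a hand-written check for these three rules only, not a general reasoner, and the facts used (`anna`, the null `_n1`) are hypothetical and are not the ones from Example 1.

```python
def is_model(atoms, database):
    """Check that `atoms` contains the database and satisfies rules (4)-(6)."""
    wives = {a[1] for a in atoms if a[0] == "Wife"}
    husbands = {a[1] for a in atoms if a[0] == "Husband"}
    married = {(a[1], a[2]) for a in atoms if a[0] == "Married"}

    contains_db = database <= atoms
    # (4) Wife(X) and Married(X, Y) imply Husband(Y).
    rule4 = all(y in husbands for (x, y) in married if x in wives)
    # (5) Every wife Y has some husband X with Married(X, Y); X may be a null.
    rule5 = all(any((x, y) in married for x in husbands) for y in wives)
    # (6) Nobody is both a husband and a wife.
    rule6 = not (wives & husbands)
    return contains_db and rule4 and rule5 and rule6

# Hypothetical database and candidate model; "_n1" plays the role of a labelled null.
database = {("Wife", "anna")}
model = database | {("Husband", "_n1"), ("Married", "_n1", "anna")}

print(is_model(model, database))
print(is_model(database, database))
print(is_model(model | {("Husband", "anna")}, database))
```

The second call fails because rule (5) demands a witness for `anna`, and the third fails because of the constraint (6).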

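Notation. We use uppercase letters such as $X, Y, Z$ for variables. We write $\mathcal{R}_k$ for the set of relation names from $\mathcal{R}$ which have arity $k$. Given a KB $\mathcal{K}$, we use $\mathbf{C}(\mathcal{K})$, $\mathcal{R}(\mathcal{K})$ and $\mathcal{R}_k(\mathcal{K})$ to denote, respectively, the set of constants, relation names and $k$-ary relation names occurring in $\mathcal{K}$. For vectors $\mathbf{x}$ and $\mathbf{y}$, we denote their concatenation by $\mathbf{x} \oplus \mathbf{y}$.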

3 Geometric Models

In this section, we formalize how regions can be used for representing relations, and what it means for such representations to satisfy a given knowledge base. The resulting formalization will provide the foundations of a framework for knowledge base completion, based on embeddings that are jointly learned from a given database and ontology. We first define the geometric counterpart of an interpretation.

Definition 1 (Geometric interpretation).

Let $\mathcal{R}$ be a set of relation names and let $X$ be a set of objects. An $n$-dimensional geometric interpretation $\eta$ of $(\mathcal{R}, X)$ assigns to each $k$-ary relation name $R$ from $\mathcal{R}$ a region $\eta(R) \subseteq \mathbb{R}^{k \cdot n}$ and to each object $o$ from $X$ a vector $\eta(o) \in \mathbb{R}^n$.

An example of a 1-dimensional geometric interpretation is depicted in Figure 2. Note that in this case, the unary predicates Husband and Wife are represented as intervals, whereas the binary predicate Married is represented as a convex polygon in $\mathbb{R}^2$. We now define what it means for a geometric interpretation to satisfy a ground atom.

Definition 2 (Satisfaction of ground atoms).

Let $\eta$ be an $n$-dimensional geometric interpretation of $(\mathcal{R}, X)$. Let $R \in \mathcal{R}_k$ and let $o_1,\dots,o_k \in X$. We say that $\eta$ satisfies a ground atom $R(o_1,\dots,o_k)$, written $\eta \models R(o_1,\dots,o_k)$, if $\eta(o_1) \oplus \dots \oplus \eta(o_k) \in \eta(R)$.

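For a subset $Y$ of the objects, we will write $\phi_Y(\eta)$ for the set of ground atoms over $Y$ which are satisfied by $\eta$, i.e.:

 $$\phi_Y(\eta) = \{R(y_1,\dots,y_k) \mid R \in \mathcal{R}_k,\ y_1,\dots,y_k \in Y,\ \eta \models R(y_1,\dots,y_k)\}$$

If $Y$ contains all objects, we also abbreviate $\phi_Y(\eta)$ as $\phi(\eta)$. For example, if $\eta$ is the geometric interpretation from Figure 2, we find:

 $$\phi(\eta) = \{\mathit{Husband}(p), \mathit{Wife}(p), \mathit{Married}(p,q), \mathit{Married}(q,p)\}$$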
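Definition 2 and the set of satisfied ground atoms can be sketched as follows. This is a minimal illustration with a made-up 1-dimensional geometric interpretation: the object positions, the two intervals and the rectangle used for Married are assumptions and are not the values shown in Figure 2.

```python
from itertools import product

def satisfied_atoms(objects, regions):
    """Collect every ground atom whose concatenated object vectors fall in the region."""
    atoms = set()
    for (name, arity), contains in regions.items():
        for args in product(sorted(objects), repeat=arity):
            # Concatenate the vectors of the arguments, as in Definition 2.
            point = tuple(x for a in args for x in objects[a])
            if contains(point):
                atoms.add((name, *args))
    return atoms

# Hypothetical 1-dimensional interpretation: each object is a point on the line.
objects = {"p": (1.0,), "q": (4.0,)}
regions = {
    ("Husband", 1): lambda v: 0.0 <= v[0] <= 2.0,   # interval [0, 2]
    ("Wife", 1): lambda v: 3.0 <= v[0] <= 5.0,      # interval [3, 5]
    # Convex polygon in the plane: the rectangle [0, 2] x [3, 5].
    ("Married", 2): lambda v: 0.0 <= v[0] <= 2.0 and 3.0 <= v[1] <= 5.0,
}

print(sorted(satisfied_atoms(objects, regions)))
```

Each region is given here as a membership test, which keeps the sketch independent of how the regions are parameterized.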

As will become clear in Section 5, for any interpretation $I$ of a given knowledge base $\mathcal{K}$, we can always construct a geometric interpretation $\eta$ such that $\phi(\eta) = I$.

The notion of satisfaction in Definition 2 can be extended to propositional combinations of ground atoms in the usual way. Specifically, $\eta$ satisfies a rule $B_1 \wedge \dots \wedge B_n \to C$, with $B_1,\dots,B_n,C$ ground atoms, if $\eta \models C$ or $\eta \not\models B_i$ for some $i$. Now consider the case of a non-ground rule, e.g.:

 R(X,Y)∧S(Y,Z)→T(X,Z) (7)

Intuitively what we want to encode is whether $\eta$ satisfies every possible grounding of this rule, i.e. whether for any objects $x, y, z$ such that $\eta \models R(x,y)$ and $\eta \models S(y,z)$ it holds that $\eta \models T(x,z)$. However, since an important aim of vector space representations is to enable inductive generalizations, this property of $\eta$ should not only hold for the constants occurring in the given knowledge base, but also for any possible constants whose representation we might learn from external sources (???). As a result, we need to impose the following, stronger requirement for $\eta$ to satisfy (7): for every $\mathbf{x},\mathbf{y},\mathbf{z} \in \mathbb{R}^n$ such that $\mathbf{x} \oplus \mathbf{y} \in \eta(R)$ and $\mathbf{y} \oplus \mathbf{z} \in \eta(S)$, it has to hold that $\mathbf{x} \oplus \mathbf{z} \in \eta(T)$. Note that a rule like (7) thus naturally translates into a spatial constraint on the representation of the relation names. Finally, let us consider an existential rule:

 R(X,Y)→∃Z.S(X,Y,Z) (8)

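For $\eta$ to be a model of this rule, we require that for every $\mathbf{x},\mathbf{y} \in \mathbb{R}^n$ such that $\mathbf{x} \oplus \mathbf{y} \in \eta(R)$ there has to exist a $\mathbf{z} \in \mathbb{R}^n$ such that $\mathbf{x} \oplus \mathbf{y} \oplus \mathbf{z} \in \eta(S)$. These intuitions are formalized in the following definition of a geometric model.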

Definition 3.

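Let $\mathcal{K} = (\Sigma, D)$ be a knowledge base and $X$ a (possibly infinite) set of objects. A geometric interpretation $\eta$ of $(\mathcal{R}(\mathcal{K}), X)$, with $\mathcal{R}(\mathcal{K})$ the relation names occurring in $\mathcal{K}$, is called an $n$-dimensional geometric model of $\mathcal{K}$ if

1. $\phi(\eta) = M$, for some model $M$ of $\mathcal{K}$, and

2. for any set of points $\{\mathbf{v}_1,\dots,\mathbf{v}_m\} \subseteq \mathbb{R}^n$, $\eta$ can be extended to a geometric interpretation $\eta^*$ such that

(a) for each $\mathbf{v}_i$ there is a fresh constant $c_i$ such that $\eta^*(c_i) = \mathbf{v}_i$,

(b) $\phi(\eta^*) = M^*$ for some model $M^*$ of $\mathcal{K}$.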

The first point in Definition 3 ensures that we can view geometric models as geometric representations of classical models. The second point in Definition 3 ensures that we can use geometric models to introduce objects from external sources, without introducing any inconsistencies. It captures the fact that the logical dependencies between the relation names encoded in should be properly captured by the spatial relationships between their geometric representations. Naturally, might contain additional (in comparison to ) nulls to witness existential demands over the new constants. For datalog programs, however, is completely determined by and the fact that for , that is, only Conditions 1 and 2 (a) above are necessary. Further, note that in this case .

As an example, note that the geometric interpretation depicted in Figure 2 is indeed a geometric model of the rules from Example 1.

Practical Significance of Geometric Models. The framework presented in this section offers several key advantages over standard KG embedding methods. First, it allows us to take into account a given ontology when learning the vector space representations, which should lead to higher-quality representations, and thus more faithful predictions, in cases where such an ontology is available. Also note that the region based framework can be applied to relations of any arity. Conversely, the framework also naturally allows us to obtain plausible rules from a learned geometric model, as this geometric model may (approximately) satisfy rules which are not entailed by the given ontology. Moreover, our framework allows for a tight integration of deductive and inductive modes of inference, as the facts and rules that are satisfied by a geometric model are deductively closed and logically consistent.

Modelling Relations as Convex Regions. While, in principle, arbitrary subsets of $\mathbb{R}^{k \cdot n}$ can be used for representing $k$-ary relations, in practice the type of considered regions will need to be restricted in some way. This is needed to ensure that the regions can be efficiently learned from data and can be represented compactly. Moreover, the purpose of using vector space representations is to enable inductive inferences, but this is only possible if we impose sufficiently strong regularity conditions on the representations. For this reason, in this paper we will particularly focus on convex geometric interpretations, i.e. geometric interpretations in which each relation name is represented using a convex region. While this may seem like a strong assumption, the vast majority of existing KG embedding models in fact learn representations that correspond to such convex geometric interpretations. Moreover, when learning regions in high-dimensional spaces, strong assumptions such as convexity are needed to avoid overfitting, especially if the amount of training data is limited. Finally, the use of convex regions is also in accordance with cognitive models such as conceptual spaces (?), and more broadly with experimental findings in psychology, especially in cases where we are presented with few training examples (?).

One may wonder whether it is possible to go further and restrict attention e.g. to convex models that are induced by vector translations. For instance, we could consider regions $\eta(R)$ which are such that $\mathbf{e} \oplus \mathbf{f} \in \eta(R)$ means that we also have $\mathbf{e}' \oplus \mathbf{f}' \in \eta(R)$ whenever $\mathbf{f}' - \mathbf{e}' = \mathbf{f} - \mathbf{e}$, i.e. only the vector difference between $\mathbf{e}$ and $\mathbf{f}$ matters. Note that TransE and most of its generalizations aim to learn representations that correspond to such regions. Alas, as the next example illustrates, such translation-based regions do not have the desired generality, in the sense that they cannot properly capture even simple rules.

Example 2.

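For instance, consider rules (4)-(5) in Example 1. For ease of presentation, let us write $C_W$ for $\eta(\mathit{Wife})$ and $C_H$ for $\eta(\mathit{Husband})$, i.e. we assume that $\mathit{Wife}(c)$ holds for a constant $c$ iff $\eta(c) \in C_W$, and similarly for Husband. Let us furthermore assume that $C_W$ and $C_H$ are convex. Finally, assume that Married is characterized by a region $C_M$ in $\mathbb{R}^n$ such that $\mathit{Married}(c,d)$ holds iff $\eta(d) - \eta(c) \in C_M$. To capture the logical dependencies encoded by the rules, the following spatial relationships would then have to hold:

 $$C_H \supseteq \{\mathbf{p}+\mathbf{r} \mid \mathbf{p} \in C_W, \mathbf{r} \in C_M\} \tag{9}$$
 $$C_W \subseteq \{\mathbf{p}+\mathbf{r} \mid \mathbf{p} \in C_H, \mathbf{r} \in C_M\} \tag{10}$$

However, (9) and (10) entail that $C_W \subseteq C_H$. (Indeed, suppose that $\mathbf{p} \in C_W$, then by (10) there must exist some $\mathbf{q} \in C_H$ and $\mathbf{r} \in C_M$ such that $\mathbf{p} = \mathbf{q}+\mathbf{r}$. By (9) we furthermore have $\mathbf{p}+\mathbf{r} \in C_H$. Since $\mathbf{p}$ is between $\mathbf{q} = \mathbf{p}-\mathbf{r}$ and $\mathbf{p}+\mathbf{r}$, both of which belong to $C_H$, by the convexity of $C_H$ it follows that $\mathbf{p} \in C_H$.) Since, by rule (6), the concepts Wife and Husband are disjoint, we would have to choose $C_W = \emptyset$ and would not be able to represent any instances of these concepts.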

It is perhaps not surprising that translation based representations are not suitable for modelling rules, since they are already known not to be fully expressive in the sense of (1)–(2). As we discussed in Section 2.1, there are several bilinear models which are known to be fully expressive, and which may thus be thought of as more promising candidates for defining suitable types of regions. As we will show in the next section, however, these bilinear models are not able to represent ontologies either. In Section 5, we will then consider arbitrary convex geometric models, and show that they can correctly represent a large class of existential rules.

4 Limitations of Bilinear Models

As already mentioned, translation based approaches incur rather severe limitations on the kinds of databases and ontologies that can be modelled. In this section, we show that while bilinear models are fully expressive, and can thus model any database, they are not suitable for modelling ontologies. This motivates the need for novel embedding methods, which are better suited to modelling ontologies; this will be the focus of the next section.

Let us consider rules of the following form:

 R(X,Y)→S(X,Y) (11)

Such rules are by far the most common type of rules that are encountered in applications (noting that bilinear models are limited to binary relations). Let us consider a bilinear model in which each relation name $R$ is associated with an $n \times n$ matrix $M_r$ and a threshold $\lambda_r$. We then say that (11) is satisfied if for each $\mathbf{e},\mathbf{f} \in \mathbb{R}^n$, it holds that:

 $$(\mathbf{e}^T M_r \mathbf{f} \geq \lambda_r) \Rightarrow (\mathbf{e}^T M_s \mathbf{f} \geq \lambda_s) \tag{12}$$

where $\mathbf{e}^T$ denotes the transpose of $\mathbf{e}$. It turns out that bilinear models are severely limited in how they can model sets of rules of the form (11). This limitation stems from the following result.

Proposition 1.

Suppose that (12) is satisfied for the matrices and some thresholds . Then there exists some such that .

If then the rule (11) must be satisfied trivially, in the sense that the following rule is also satisfied for the matrix and threshold :

 ⊤→S(X,Y)

Let us consider the case where . Note that for the thresholds $\lambda_r$ and $\lambda_s$ we only need to consider the values -1 and 1, since other thresholds can always be simulated by rescaling the matrices $M_r$ and $M_s$. Now assume that the following rules are given:

 $$R_1(X,Y) \to S(X,Y)$$
 $$\dots$$
 $$R_k(X,Y) \to S(X,Y)$$

By Proposition 1, we know that for there is some such that . If we thus have that either the rule or the rule is satisfied (depending on whether and on whether is 1 or -1). This means in particular that we can always find two rankings and such that and:

 ∀1≤i

This clearly puts drastic restrictions on the type of subsumption hierarchies that can be modelled using bilinear models. Moreover, these limitations carry over to DistMult and ComplEx, as these are particular types of bilinear models. Due to the close links between DistMult and SimplE, it is also easy to see that the latter model has the same limitations.

In fact, the use of different vectors for head and tail mentions of entities in the SimplE model leads to even further limitations. To illustrate this, let us consider a rule of the following form:

 R(X,Y)∧S(Y,Z)→T(X,Z) (13)

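where we say that the SimplE representation defined by the vectors $\mathbf{r},\mathbf{r}_i,\mathbf{s},\mathbf{s}_i,\mathbf{t},\mathbf{t}_i$ and corresponding thresholds $\lambda_r,\lambda_{r_i},\lambda_s,\lambda_{s_i},\lambda_t,\lambda_{t_i}$ satisfies (13) if for all entity vectors $\mathbf{e}_h,\mathbf{e}_t,\mathbf{f}_h,\mathbf{f}_t,\mathbf{g}_h,\mathbf{g}_t$ it holds that:

 $$\begin{aligned} &\langle \mathbf{e}_h,\mathbf{r},\mathbf{f}_t \rangle \geq \lambda_r \wedge \langle \mathbf{f}_h,\mathbf{r}_i,\mathbf{e}_t \rangle \geq \lambda_{r_i} \\ &\wedge \langle \mathbf{f}_h,\mathbf{s},\mathbf{g}_t \rangle \geq \lambda_s \wedge \langle \mathbf{g}_h,\mathbf{s}_i,\mathbf{f}_t \rangle \geq \lambda_{s_i} \\ &\Rightarrow \langle \mathbf{e}_h,\mathbf{t},\mathbf{g}_t \rangle \geq \lambda_t \wedge \langle \mathbf{g}_h,\mathbf{t}_i,\mathbf{e}_t \rangle \geq \lambda_{t_i} \end{aligned} \tag{14}$$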

Then we can show the following result.

Proposition 2.

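Suppose $\mathbf{r},\mathbf{r}_i,\mathbf{s},\mathbf{s}_i,\mathbf{t},\mathbf{t}_i$ and $\lambda_r,\lambda_{r_i},\lambda_s,\lambda_{s_i},\lambda_t,\lambda_{t_i}$ define a SimplE representation satisfying (13). Then one of the following two rules is satisfied as well: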

 $$R(X,Y) \wedge S(Y,Z) \to \bot \tag{15}$$
 $$\top \to T(X,Z) \tag{16}$$

5 Relations as Arbitrary Convex Regions

We next identify the type of KBs $\mathcal{K}$ that are properly captured by convex geometric models, in the sense that for each finite model $M$ of $\mathcal{K}$, there exists a convex geometric model $\eta$ such that $\phi(\eta) = M$. As we shall see, it turns out that not all the ontology languages introduced in Section 2 can be faithfully represented in this way. Nevertheless, we show that this semantics is general enough to capture important fragments of these families.

We start by presenting our positive result, which shows that convex geometric models can capture KBs based on quasi-chained rules (QC). We say that an existential rule $\sigma$, defined as in (3) above, is quasi-chained if for all $i \in \{1,\dots,n\}$

 $$|(\mathrm{vars}(B_1) \cup \dots \cup \mathrm{vars}(B_{i-1})) \cap \mathrm{vars}(B_i)| \leq 1$$

An ontology is quasi-chained if all its rules are either quasi-chained or quasi-chained negative constraints.
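The quasi-chained condition is easy to test mechanically on a rule body. The sketch below is an illustration only: it assumes a hypothetical encoding in which an atom is a pair of a relation name and a tuple of terms, and in which variables are exactly the terms starting with an uppercase letter.

```python
def is_quasi_chained(body):
    """Check that each body atom shares at most one variable with the atoms before it."""
    seen = set()
    for _name, terms in body:
        # Assumed convention: variables start with an uppercase letter.
        variables = {t for t in terms if t[:1].isupper()}
        if len(seen & variables) > 1:
            return False
        seen |= variables
    return True

# Body of rule (7): R(X,Y) and S(Y,Z) share only Y.
print(is_quasi_chained([("R", ("X", "Y")), ("S", ("Y", "Z"))]))
# Body sharing both X and Y between two atoms.
print(is_quasi_chained([("R1", ("X", "Y")), ("R2", ("X", "Y"))]))
```

The check follows the order in which the body atoms are written, as in the definition above; for a constraint, the same function is applied to its body.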

We note that quasi-chainedness is a natural and useful restriction. Quasi-chained rules are indeed closely related to the well-known chain-datalog fragment of datalog (??) in which important properties, e.g. reachability, are still expressible. Furthermore, prominent Horn description logics can be expressed using decidable fragments of quasi-chained existential rules. For example, ontologies (which we assume to be in a suitable normal form (?)) can be embedded into the guarded fragment (?) of QC existential rules. Further, QC existential rules subsume linear existential rules (?), which only allow rule bodies that consist of a single atom and capture a $k$-ary extension of DL-Lite.

We next show that geometric models indeed properly capture quasi-chained ontologies.

Proposition 3.

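Let $\mathcal{K} = (\Sigma, D)$, with $\Sigma$ a quasi-chained ontology, and let $M$ be a finite model of $\mathcal{K}$. Then $\mathcal{K}$ has a convex geometric model $\eta$ such that $\phi(\eta) = M$.

To clarify the intuitions behind this proposition, we show how an $n$-dimensional geometric model $\eta$ satisfying $\phi(\eta) = M$ can be constructed, where $n$ is the number of objects occurring in $M$. Let $x_1,\dots,x_n$ be an enumeration of these objects, then for each $x_i$, $\eta(x_i)$ is defined as the vector in $\mathbb{R}^n$ with value 1 in the $i$-th coordinate and 0 in all others. Further, for each $k$-ary relation name $R$ occurring in $\mathcal{K}$, we define $\eta(R)$ as follows, where CH denotes the convex hull:

 $$\eta(R) = \mathrm{CH}\{\eta(y_1) \oplus \dots \oplus \eta(y_k) \mid R(y_1,\dots,y_k) \in M\} \tag{17}$$

A proof that $\phi(\eta) = M$, and that $\eta$ satisfies Conditions 1 and 2 from Definition 3, is provided in the appendix.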
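The construction in (17) can be sketched for a small finite model. The code below is an illustration under stated assumptions: the model and its object names are hypothetical, and instead of solving a general convex hull membership problem it uses the fact that concatenations of one-hot vectors are vertices of a product of simplices, so such a point lies in the hull of a set of generators of this form only if it is itself one of the generators. This shortcut is only valid for points that represent objects, which is all that is needed to compute the satisfied ground atoms.

```python
from itertools import product

def one_hot_model(model):
    """Assign one-hot vectors to objects and collect the hull generators of each region."""
    objects = sorted({o for atom in model for o in atom[1:]})
    n = len(objects)
    eta = {o: tuple(1.0 if i == j else 0.0 for j in range(n))
           for i, o in enumerate(objects)}
    generators = {}
    for name, *args in model:
        point = tuple(x for a in args for x in eta[a])
        generators.setdefault((name, len(args)), set()).add(point)
    return eta, generators

def satisfied_atoms(eta, generators):
    """Ground atoms over the known objects whose point lies in the corresponding hull."""
    atoms = set()
    for (name, arity), points in generators.items():
        for args in product(sorted(eta), repeat=arity):
            point = tuple(x for a in args for x in eta[a])
            # Vertex argument: for one-hot concatenations, hull membership
            # reduces to being one of the generators.
            if point in points:
                atoms.add((name, *args))
    return atoms

# Hypothetical finite model.
model = {("Wife", "anna"), ("Husband", "bob"), ("Married", "bob", "anna")}
eta, generators = one_hot_model(model)
print(satisfied_atoms(eta, generators) == model)
```

The sketch only covers the claim that the satisfied atoms coincide with the model; checking Condition 2 of Definition 3 for arbitrary points would require a genuine hull membership test, e.g. via linear programming.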

For the next corollary we assume that the quasi-chained ontology belongs to fragments enjoying the finite model property, i.e. if a KB is satisfiable, it has a finite model, e.g. where $\Sigma$ is weakly-acyclic (?), guarded, linear, or a quasi-chained datalog program. It follows directly from Proposition 3.

Corollary 1.

Let $\mathcal{K} = (\Sigma, D)$ with $\Sigma$ as above. It holds that $\mathcal{K}$ is satisfiable iff $\mathcal{K}$ has a convex geometric model.

A natural question is whether there is a way of defining a convex -dimensional geometric model for an considerably smaller than for some model . For the case of datalog rules, where , it turns out that this is in general not possible.

Proposition 4.

For each , there exists a KB with a datalog program, over a signature with constants and unary predicates such that does not have a convex geometric model in for .

To see this, consider the knowledge base with , for some , and consisting of the following rule

 A1(X)∧...∧An(X)→⊥ (18)

It is clear that is satisfiable. Now, let be an dimensional convex geometric model of . Clearly, for each , it holds that and thus . Using Helly’s Theorem (which states that if $F_1,\ldots,F_k$ are convex regions in $\mathbb{R}^d$, with $k>d$, and each $d+1$ among these regions have a non-empty intersection, then $\bigcap_{i=1}^{k} F_i\neq\emptyset$), it follows that contains some point . Further, let be the extension of to defined by . Then contains which together with (18) implies that does not have a convex model. Thus, cannot be an dimensional convex geometric model , and the dimensionality of any convex model of has to be at least .
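
For intuition, consider the one-dimensional case of Helly's Theorem, where convex regions are intervals and the theorem says that pairwise intersecting intervals share a common point. The sketch below illustrates only this special case, with hypothetical closed intervals given as `(low, high)` pairs; in the argument above, the regions of the unary predicates play the role of the intervals, and the common point is the vector assigned to the fresh constant.

```python
from itertools import combinations


def intersect(a, b):
    """Intersection of two closed intervals, or None if they are disjoint."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    return (low, high) if low <= high else None


def common_interval(intervals):
    """Intersection of all given closed intervals, or None if it is empty."""
    low = max(i[0] for i in intervals)
    high = min(i[1] for i in intervals)
    return (low, high) if low <= high else None


# Three hypothetical convex regions on the real line.
regions = [(0.0, 2.0), (1.0, 3.0), (1.5, 4.0)]

pairwise_ok = all(
    intersect(a, b) is not None for a, b in combinations(regions, 2)
)
print(pairwise_ok, common_interval(regions))
```

Any point of the resulting common interval belongs to all regions at once, which is exactly the situation that rule (18) forbids.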

Note that the model that we constructed above is -dimensional, but the lower bound from Proposition 4 only states that at least dimensions are needed in general. In fact, it is easy to see that such an -dimensional convex geometric model indeed exists for datalog programs. In particular, let be the hyperplane defined by then clearly for every constant and . In other words, each is located in an dimensional space, and is a subset of an dimensional space.

Finally, the main remaining question is whether the restriction to QC rules is necessary. Unfortunately, the next example illustrates that if a KB contains rules that do not satisfy this restriction, it may not be possible to construct a convex geometric model.

Example 3.

Consider consisting of the following rule:

 R1(X,Y)∧R2(X,Y)→⊥

and let . Then clearly is a model of the knowledge base . Now suppose this KB had a convex geometric model . Let be an extension of to the fresh constant , defined by . Note that we then have:

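$$\begin{aligned}
\eta^*(b)\oplus\eta^*(b) &= 0.5\,(\eta(a_1)\oplus\eta(a_1)) + 0.5\,(\eta(a_2)\oplus\eta(a_2)) \\
&= 0.5\,(\eta(a_1)\oplus\eta(a_2)) + 0.5\,(\eta(a_2)\oplus\eta(a_1))
\end{aligned}$$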

and thus, by the convexity of and , it follows that . This means that does not have a model, which contradicts the assumption that was a geometric model.
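
The identity used in this example can be verified numerically. The sketch below assumes that the fresh constant is mapped to the midpoint of the vectors of the two original constants, and that the model contains the atoms R1(a1,a1), R1(a2,a2), R2(a1,a2) and R2(a2,a1), as suggested by the displayed identity; the concrete vectors are arbitrary and all names are illustrative.

```python
from fractions import Fraction


def concat(u, v):
    """Concatenation of two vectors (the ⊕ operation)."""
    return tuple(u) + tuple(v)


def combine(weights, points):
    """Weighted sum of equally long vectors."""
    dim = len(points[0])
    return tuple(
        sum(w * p[i] for w, p in zip(weights, points)) for i in range(dim)
    )


half = Fraction(1, 2)
eta_a1 = (Fraction(1), Fraction(0))   # arbitrary vector for a1
eta_a2 = (Fraction(0), Fraction(3))   # arbitrary vector for a2
eta_b = combine([half, half], [eta_a1, eta_a2])  # midpoint for fresh constant

bb = concat(eta_b, eta_b)
# Convex combination of the two points generating the region of R1.
via_r1 = combine([half, half], [concat(eta_a1, eta_a1), concat(eta_a2, eta_a2)])
# Convex combination of the two points generating the region of R2.
via_r2 = combine([half, half], [concat(eta_a1, eta_a2), concat(eta_a2, eta_a1)])

print(bb == via_r1 == via_r2)
```

The same point is thus a convex combination of points in both regions, so any convex region containing the original tuples of R1 and R2 must contain it.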

6 Extended Geometric Models

As shown in Section 5, there are knowledge bases which have a finite model but which do not have a convex geometric model. To deal with arbitrary knowledge bases, one possible approach is to simply drop the convexity requirement. In this section, we briefly explore another solution, based on the idea that for each relation symbol , we can consider a function which embeds -tuples into another vector space. This can be formalized as follows.

Definition 4 (Extended convex geometric interpretation).

Let be a set of relation names and let be a set of objects. An -dimensional extended convex geometric interpretation of is a pair , where for each , is a mapping, for some , and assigns to each a convex region in and to each constant from a vector .

We can now adapt the definition of satisfaction of a ground atom as follows.

Definition 5 (Satisfaction of ground atoms).

Let be an extended convex geometric interpretation of . Let and let . We say that satisfies a ground atom , written , if .

The notion of extended convex geometric model is then defined as in Definition 3, by simply using extended convex geometric interpretations instead of (standard) geometric interpretations.

Note that we almost trivially have that every knowledge base which has a finite model also has an extended convex geometric model. Indeed, to construct such a model, we can choose for constants from arbitrarily, as long as if . We can then define as follows: if contains a ground atom such that , and otherwise. Finally we can define . It can be readily checked that the extended convex geometric interpretation which is constructed in this way is indeed an extended convex geometric model of .

The extended convex geometric model which is constructed in this way is uninteresting, however, as it does not allow us to use the geometric representations of the constants to induce any knowledge which is not already given in . Specifically, suppose and let be the extension of to , then for and , we have iff contains some atom such that . This means that in practice, we need to impose some restrictions on the functions . Note, however, that we cannot restrict to be linear, as that would lead to the same restrictions as we encountered for standard convex geometric models. For instance, it is easy to verify that the knowledge base from Example 3 cannot have an extended geometric model in which and are linear.
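
The following sketch illustrates why this trivial construction carries no inductive power. It assumes a particular encoding that is only one possible reading of the construction: each tuple function acts as an indicator that returns 1 on tuples of the given model and 0 elsewhere, and every relation is assigned the single-point region {1}. The one-dimensional constant vectors and all names are hypothetical.

```python
def make_trivial_extended_model(constants, facts):
    """Build the lookup-style extended model for a finite set of facts."""
    # Distinct (here one-dimensional) vectors for distinct constants.
    eta_const = {c: (float(i),) for i, c in enumerate(constants)}
    known = {
        (rel, tuple(eta_const[a] for a in args)) for rel, args in facts
    }

    def f(rel, vectors):
        # Assumed encoding: 1 for tuples of the model, 0 otherwise.
        return 1 if (rel, tuple(vectors)) in known else 0

    def satisfies(rel, vectors):
        # Every relation is assumed to be mapped to the region {1}.
        return f(rel, vectors) == 1

    return eta_const, satisfies


constants = ["a1", "a2"]
facts = {("R", ("a1", "a2"))}
eta_const, satisfies = make_trivial_extended_model(constants, facts)

fresh = (0.5,)  # vector of a fresh constant, distinct from all others
print(satisfies("R", (eta_const["a1"], eta_const["a2"])))
print(satisfies("R", (eta_const["a1"], fresh)))
```

A fresh constant whose vector differs from those of the known constants never takes part in a satisfied atom, no matter how close it lies to them.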

One possible alternative would be to encode each function as a neural network, but there are still several important open questions related to this choice. First, it is far from clear how we would then be able to check whether an extended convex geometric interpretation is a model of a given ontology. In contrast, for standard convex geometric interpretations, we can use standard linear programming techniques to check whether a given existential rule is satisfied. It is furthermore unclear which types of neural networks would be needed to guarantee that all types of existential rules can be captured.

7 Related Work

A number of approaches to KG completion have been proposed that are based on neural network architectures (???). Interestingly, some of these approaches can be seen as special cases of the extended convex geometric models we considered in Section 6. For example, in the E-MLP model (?), to predict whether is a valid triple, the concatenation of the vectors and is fed into a two-layer neural network.

Instead of constructing tuple representations from entity embeddings, some authors have also considered approaches that directly learn a vector space embedding of entity tuples (??). For each relation a vector can then be learned such that reflects the likelihood that a tuple represented by is an instance of . This model does not put any a priori restrictions on the kind of relations that can be modeled, although it is clearly not suitable for modelling rules (e.g. it is easy to see that this model carries over the limitations of bilinear models). Moreover, as enough information needs to be available about each tuple, this strategy has primarily been used for modelling knowledge extracted from text, where representations of word-tuples are learned from sentences that contain these words.

Note that KG embedding methods model relations in a soft way: their associated scoring function can be used to rank ground facts according to their likelihood of being correct, but no attempt is made at modelling the exact extension of relations. This means that logical dependencies among relations cannot be modeled, which makes such representations fundamentally different from the geometric representations that we have considered in this paper. Nonetheless, some authors have used logical rules to improve the predictions that are made in a KG completion setting. For example, in (?), a mixed integer programming formulation is used to combine the predictions made from a given KG embedding with a set of hard rules. Specifically, the aim of this approach is to determine the most plausible set of facts which is logically consistent with the given rules. Another strategy, used in (?), is to incorporate background knowledge in the loss function of the learning problem. Specifically, the authors propose to take advantage of relation inclusions, i.e. rules of the form , for learning better tuple embeddings. The main underlying idea is to translate such a rule to the soft constraint that should hold for each tuple . This is imposed in an efficient way by restricting tuple embeddings to vectors with non-negative coordinates and then requiring that for each coordinate of and corresponding coordinate of . However, this strategy cannot straightforwardly be generalized to other types of rules.
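
The relation inclusion strategy can be illustrated with a short sketch. It assumes that a tuple is scored against a relation by a dot product, that the rule states that one relation (here `r_vec`) is included in another (here `s_vec`), and that the coordinate-wise requirement orders the first vector below the second; the vectors and function names are hypothetical.

```python
def dot(u, v):
    """Dot product used as the (assumed) relation-tuple score."""
    return sum(a * b for a, b in zip(u, v))


def coordinatewise_leq(r_vec, s_vec):
    """Check the coordinate-wise ordering imposed on the relation vectors."""
    return all(a <= b for a, b in zip(r_vec, s_vec))


r_vec = (0.2, -1.0, 0.5)   # relation on the left-hand side of the inclusion
s_vec = (0.4, 0.0, 0.5)    # relation on the right-hand side of the inclusion

# Tuple embeddings are restricted to non-negative coordinates.
tuples = [(1.0, 0.0, 2.0), (0.0, 3.0, 0.1), (0.5, 0.5, 0.5)]

ordering_holds = coordinatewise_leq(r_vec, s_vec)
scores_ordered = all(dot(r_vec, t) <= dot(s_vec, t) for t in tuples)
print(ordering_holds, scores_ordered)
```

Because every tuple coordinate is non-negative, the coordinate-wise ordering of the relation vectors carries over to the scores of every tuple, without having to enumerate tuples during training.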

To overcome this shortcoming, neural network architectures dealing with arbitrary Datalog-like rules have been recently proposed (??). Other related approaches include (???). However, such methods essentially use neural network methods to simulate deductive inference, but do not explicitly model the extension of relations, and do not allow for the tight integration of induction and deduction that our framework supports. Moreover, these methods are aimed at learning (soft versions of) first-order rules from data, rather than constraining embeddings based on a given set of (hard) rules.

Within KR research, (?) recently made first steps towards the integration of ontological reasoning and deep learning, obtaining encouraging results. Indeed, the developed system was considerably faster than the state-of-the-art RDFox (?), while retaining high accuracy. Initial results have also been obtained in the use of ontological reasoning to derive human-interpretable explanations from the output of a neural network (?).

8 Conclusions and Future Work

We have studied knowledge base embeddings in which relations are represented as convex regions in a space of tuples. These tuples are simply represented as concatenations of the vector representations of the individual arguments. This means that they can be obtained using standard approaches for learning entity embeddings. Our main consideration in this paper was whether this approach allows us to faithfully encode symbolic knowledge about how different relations are related to each other. We have found that this is indeed the case for knowledge bases that are restricted to the important class of quasi-chained existential rules. We have also shown that this result can be generalized to arbitrary sets of existential rules with a finite model, by first applying an appropriate non-linear transformation to the tuple space.

We believe this paper provides an important step towards a comprehensive integration of neural embeddings and KR technologies. In particular, the proposed framework lays important foundations to further the current embedding-based approaches for completing knowledge bases, and more generally, to develop methods that combine deductive and inductive reasoning in a tighter way than current approaches. Moreover, due to the nature of the proposed representations, it seems feasible to develop data-driven approaches for completing not only the data component of a knowledge base (as for knowledge graphs), but also for extending the input ontology itself. Our framework can also be seen as a relational extension of traditional conceptual spaces (?), which is an intermediate knowledge representation setting, sitting in between neural network embeddings and symbolic knowledge bases.

As future work, the most natural next step is to develop practical methods for learning geometric models. From a theoretical point of view, one of the most important open problems is to characterize particular classes of extended convex geometric models, which are sufficiently expressive to model arbitrary existential rules (or interesting sub-classes). Indeed, the non-linear representation from Section 6 is too general to be practically useful, and we therefore need to characterize what types of knowledge bases can be captured by different kinds of simple neural network architectures. Finally, it would be interesting to extend our framework to model recently introduced ontology languages especially tailored for KGs (??), which include means for representing and reasoning about annotations on data and relations, capturing e.g. temporal provenance of data.

Acknowledgments

Víctor Gutiérrez-Basulto was supported by EU’s Horizon 2020 programme under the Marie Skłodowska-Curie grant 663830 and Steven Schockaert by ERC Starting Grant 637277 ‘FLEXILOG’.

References

• [Baader et al. 2017] Baader, F.; Horrocks, I.; Lutz, C.; and Sattler, U. 2017. An Introduction to Description Logic. Cambridge University Press.
• [Bollacker et al. 2008] Bollacker, K.; Evans, C.; Paritosh, P.; Sturge, T.; and Taylor, J. 2008. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 1247–1250.
• [Bordes et al. 2013] Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; and Yakhnenko, O. 2013. Translating embeddings for modeling multi-relational data. In Proc. NIPS. 2787–2795.
• [Calì, Gottlob, and Kifer 2013] Calì, A.; Gottlob, G.; and Kifer, M. 2013. Taming the infinite chase: Query answering under expressive relational constraints. J. Artif. Intell. Res. 48:115–174.
• [Calì, Gottlob, and Lukasiewicz 2012] Calì, A.; Gottlob, G.; and Lukasiewicz, T. 2012. A general datalog-based framework for tractable query answering over ontologies. J. Web Sem. 14:57–83.
• [Camacho-Collados, Pilehvar, and Navigli 2016] Camacho-Collados, J.; Pilehvar, M. T.; and Navigli, R. 2016. Nasari: Integrating explicit knowledge and corpus statistics for a multilingual representation of concepts and entities. Artificial Intelligence 240:36–64.
• [Carlson et al. 2010] Carlson, A.; Betteridge, J.; Kisiel, B.; Settles, B.; Hruschka Jr., E. R.; and Mitchell, T. M. 2010. Toward an architecture for never-ending language learning. In Proc. AAAI, 1306–1313.
• [Demeester, Rocktäschel, and Riedel 2016] Demeester, T.; Rocktäschel, T.; and Riedel, S. 2016. Lifted rule injection for relation embeddings. In Proc. EMNLP, 1389–1399.
• [Dong et al. 2014] Dong, X.; Gabrilovich, E.; Heitz, G.; Horn, W.; Lao, N.; Murphy, K.; Strohmann, T.; Sun, S.; and Zhang, W. 2014. Knowledge vault: A web-scale approach to probabilistic knowledge fusion. In SIGKDD, 601–610.
• [Fagin et al. 2005] Fagin, R.; Kolaitis, P. G.; Miller, R. J.; and Popa, L. 2005. Data exchange: semantics and query answering. Theor. Comput. Sci. 336(1):89–124.
• [Gärdenfors 2000] Gärdenfors, P. 2000. Conceptual Spaces: The Geometry of Thought. MIT Press.
• [Gardner and Mitchell 2015] Gardner, M., and Mitchell, T. M. 2015. Efficient and expressive knowledge base completion using subgraph feature extraction. In Proc. of EMNLP-15, 1488–1498.
• [Hohenecker and Lukasiewicz 2017] Hohenecker, P., and Lukasiewicz, T. 2017. Deep learning for ontology reasoning. arXiv preprint arXiv:1705.10342.
• [Kazemi and Poole 2018] Kazemi, S. M., and Poole, D. 2018. SimplE embedding for link prediction in knowledge graphs. arXiv preprint arXiv:1802.04868.
• [Krötzsch et al. 2017] Krötzsch, M.; Marx, M.; Ozaki, A.; and Thost, V. 2017. Attributed description logics: Ontologies for knowledge graphs. In Proc. of ISWC-17.
• [Lao, Mitchell, and Cohen 2011] Lao, N.; Mitchell, T. M.; and Cohen, W. W. 2011. Random walk inference and learning in a large scale knowledge base. In Proc. of EMNLP-11.
• [Lin et al. 2015] Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; and Zhu, X. 2015. Learning entity and relation embeddings for knowledge graph completion. In AAAI, 2181–2187.
• [Marx, Krötzsch, and Thost 2017] Marx, M.; Krötzsch, M.; and Thost, V. 2017. Logic on MARS: ontologies for generalised property graphs. In Proc. of IJCAI-17, 1188–1194.
• [Miller 1995] Miller, G. A. 1995. WordNet: a lexical database for English. Communications of the ACM 38:39–41.
• [Minervini et al. 2017] Minervini, P.; Demeester, T.; Rocktäschel, T.; and Riedel, S. 2017. Adversarial sets for regularising neural link predictors. In Proc. of UAI-17.
• [Nenov et al. 2015] Nenov, Y.; Piro, R.; Motik, B.; Horrocks, I.; Wu, Z.; and Banerjee, J. 2015. RDFox: A highly-scalable RDF store. In Proc. of ISWC-15, 3–20.
• [Nguyen et al. 2016] Nguyen, D. Q.; Sirts, K.; Qu, L.; and Johnson, M. 2016. STransE: a novel embedding model of entities and relationships in knowledge bases. In Proc. of NAACL-HLT, 460–466.
• [Nguyen 2017] Nguyen, D. Q. 2017. An overview of embedding models of entities and relationships for knowledge base completion. arXiv preprint arXiv:1703.08098.
• [Nickel, Tresp, and Kriegel 2011] Nickel, M.; Tresp, V.; and Kriegel, H.-P. 2011. A three-way model for collective learning on multi-relational data. In Proc. ICML, 809–816.
• [Niepert 2016] Niepert, M. 2016. Discriminative Gaifman models. In Proc. of NIPS-16, 3405–3413.
• [Riedel et al. 2013] Riedel, S.; Yao, L.; McCallum, A.; and Marlin, B. M. 2013. Relation extraction with matrix factorization and universal schemas. In Proc. HLT-NAACL, 74–84.
• [Rocktäschel and Riedel 2017] Rocktäschel, T., and Riedel, S. 2017. End-to-end differentiable proving. In Proc. NIPS, 3791–3803.
• [Rosseel 2002] Rosseel, Y. 2002. Mixture models of categorization. Journal of Mathematical Psychology 46(2):178–210.
• [Sarker et al. 2017] Sarker, M. K.; Xie, N.; Doran, D.; Raymer, M.; and Hitzler, P. 2017. Explaining trained neural networks with semantic web technologies: First steps. In Proc. of NeSy-17.
• [Shmueli 1987] Shmueli, O. 1987. Decidability and expressiveness of logic queries. In Proc. of PODS-87, 237–249.
• [Socher et al. 2013] Socher, R.; Chen, D.; Manning, C. D.; and Ng, A. 2013. Reasoning with neural tensor networks for knowledge base completion. In Proc. NIPS, 926–934.
• [Sourek et al. 2017] Sourek, G.; Svatos, M.; Zelezný, F.; Schockaert, S.; and Kuzelka, O. 2017. Stacked structure learning for lifted relational neural networks. In Proc. ILP, 140–151.
• [Speer, Chin, and Havasi 2017] Speer, R.; Chin, J.; and Havasi, C. 2017. ConceptNet 5.5: An open multilingual graph of general knowledge. In Proc. AAAI, 4444–4451.
• [Toutanova et al. 2015] Toutanova, K.; Chen, D.; Pantel, P.; Poon, H.; Choudhury, P.; and Gamon, M. 2015. Representing text for joint embedding of text and knowledge bases. In Proc. of EMNLP-15, 1499–1509.
• [Trouillon et al. 2016] Trouillon, T.; Welbl, J.; Riedel, S.; Gaussier, É.; and Bouchard, G. 2016. Complex embeddings for simple link prediction. In Proc. ICML, 2071–2080.
• [Turney 2005] Turney, P. D. 2005. Measuring semantic similarity by latent relational analysis. In Proc. IJCAI, 1136–1141.
• [Ullman and Gelder 1988] Ullman, J. D., and Gelder, A. V. 1988. Parallel complexity of logical query programs. Algorithmica 3:5–42.
• [Vrandečić and Krötzsch 2014] Vrandečić, D., and Krötzsch, M. 2014. Wikidata: a free collaborative knowledge base. Communications of the ACM 57:78–85.
• [Wang and Cohen 2016] Wang, W. Y., and Cohen, W. W. 2016. Learning first-order logic embeddings via matrix factorization. In Proc. of IJCAI-16, 2132–2138.
• [Wang et al. 2014] Wang, Z.; Zhang, J.; Feng, J.; and Chen, Z. 2014. Knowledge graph embedding by translating on hyperplanes. In AAAI, 1112–1119.
• [Wang et al. 2017] Wang, Q.; Mao, Z.; Wang, B.; and Guo, L. 2017. Knowledge graph embedding: A survey of approaches and applications. IEEE Trans. Knowl. Data Eng. 29(12):2724–2743.
• [Wang, Gemulla, and Li 2018] Wang, Y.; Gemulla, R.; and Li, H. 2018. On multi-relational link prediction with bilinear models. arXiv preprint arXiv:1709.04808.
• [Wang, Wang, and Guo 2015] Wang, Q.; Wang, B.; and Guo, L. 2015. Knowledge base completion using embeddings and rules. In Proc. IJCAI, 1859–1866.
• [Xiao et al. 2017] Xiao, H.; Huang, M.; Meng, L.; and Zhu, X. 2017. SSP: Semantic space projection for knowledge graph embedding with text descriptions. In Proc. AAAI, volume 17, 3104–3110.
• [Xie et al. 2016] Xie, R.; Liu, Z.; Jia, J.; Luan, H.; and Sun, M. 2016. Representation learning of knowledge graphs with entity descriptions. In Proc. of AAAI, 2659–2665.
• [Yang et al. 2015] Yang, B.; Yih, W.; He, X.; Gao, J.; and Deng, L. 2015. Embedding entities and relations for learning and inference in knowledge bases. In Proc. of ICLR-15.
• [Zhong et al. 2015] Zhong, H.; Zhang, J.; Wang, Z.; Wan, H.; and Chen, Z. 2015. Aligning knowledge and text embeddings by entity descriptions. In EMNLP, 267–272.

Appendix A

A.1 Proof of Proposition 1

Let us write for the element on the row and column of , and similarly for .

Lemma 1.

Suppose and are matrices for which (12) is satisfied. Let be such that . For each such that it holds that .

Proof.

Assume that there exists some index such that but . Then we show that (12) cannot be satisfied. We define and as follows:

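$$e_i = \begin{cases} 0 & \text{if } i \neq k \\ 1 & \text{otherwise} \end{cases} \qquad f_i = \begin{cases} 0 & \text{if } i \notin \{l,m\} \\ \dfrac{K}{r_{kl}} & \text{if } i = l \\ \dfrac{L}{s_{km}} & \text{if } i = m \end{cases}$$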

Then we have:

$$e^T M_r f = K + L \cdot \frac{r_{km}}{s_{km}} \qquad e^T M_s f = L$$

We can choose for an arbitrary value such that , and then choose such that . It follows that (12) is not satisfied for and . ∎
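
The choice of values in this proof can be checked on a small instance. The sketch below assumes that (12) requires the first bilinear score to reach its threshold only when the second one does as well, and that the lemma's hypotheses amount to $r_{kl} \neq 0$, $s_{kl} = 0$ and $s_{km} \neq 0$, as suggested by the computation above; the matrices, thresholds and indices are hypothetical.

```python
from fractions import Fraction as F


def bilinear(e, M, f):
    """Bilinear score e^T M f."""
    return sum(
        e[i] * M[i][j] * f[j] for i in range(len(e)) for j in range(len(f))
    )


k, l, m = 0, 0, 1
M_r = [[F(2), F(1)], [F(0), F(0)]]   # r_kl = 2, r_km = 1
M_s = [[F(0), F(4)], [F(0), F(0)]]   # s_kl = 0, s_km = 4
lam_r, lam_s = F(1), F(1)            # thresholds of the two relations

L = F(-2)                                # any value below lam_s
K = lam_r - L * M_r[k][m] / M_s[k][m]    # makes the first score equal lam_r

e = [F(1) if i == k else F(0) for i in range(2)]
f = [F(0), F(0)]
f[l] = K / M_r[k][l]
f[m] = L / M_s[k][m]

score_r = bilinear(e, M_r, f)
score_s = bilinear(e, M_s, f)
print(score_r >= lam_r, score_s < lam_s)
```

For these values the first score reaches its threshold while the second stays below its own, so the pair of vectors violates the assumed form of (12).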

Lemma 2.

Suppose and are matrices for which (12) is satisfied. Let be such that . For each such that it holds that .

Proof.

Entirely analogous to the proof of Lemma 1. ∎

Lemma 3.

Suppose the indices are such that for all , and assume . Then it holds that and cannot satisfy (12).

Proof.

Note that the assumptions imply that either or . Let us assume for instance that ; the case where is entirely analogous. We define and as follows:

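$$e_i = \begin{cases} 0 & \text{if } i \neq k \\ 1 & \text{otherwise} \end{cases} \qquad f_i = \begin{cases} 0 & \text{if } i \notin \{l,m\} \\ \dfrac{K}{s_{kl}} & \text{if } i = l \\ \dfrac{L}{s_{km}} & \text{if } i = m \end{cases}$$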

Then we have:

$$e^T M_r f = K \cdot \frac{r_{kl}}{s_{kl}} + L \cdot \frac{r_{km}}{s_{km}} \qquad e^T M_s f = K + L$$

Using the assumption we made that we can choose

$$K = \lambda_r \frac{s_{kl}}{r_{kl}} - L \cdot \frac{r_{km}\, s_{kl}}{s_{km}\, r_{kl}}$$

which guarantees . To guarantee that , for this particular choice of , we need to ensure:

$$\lambda_s > \lambda_r \frac{s_{kl}}{r_{kl}} - L \cdot \frac{r_{km}\, s_{kl}}{s_{km}\, r_{kl}} + L$$

Noting that since we assumed , this is equivalent to one of

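$$L < \frac{\lambda_s - \lambda_r \frac{s_{kl}}{r_{kl}}}{1 - \frac{r_{km}\, s_{kl}}{s_{km}\, r_{kl}}} \qquad\qquad L > \frac{\lambda_s - \lambda_r \frac{s_{kl}}{r_{kl}}}{1 - \frac{r_{km}\, s_{kl}}{s_{km}\, r_{kl}}}$$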

depending on the sign of