Query2box: Reasoning over Knowledge Graphs in Vector Space using Box Embeddings


Answering complex logical queries on large-scale incomplete knowledge graphs (KGs) is a fundamental yet challenging task. Recently, a promising approach to this problem has been to embed KG entities as well as the query into a vector space such that entities that answer the query are embedded close to the query. However, prior work models queries as single points in the vector space, which is problematic because a complex query represents a potentially large set of its answer entities, but it is unclear how such a set can be represented as a single point. Furthermore, prior work can only handle queries that use conjunctions ($\wedge$) and existential quantifiers ($\exists$). Handling queries with logical disjunctions ($\vee$) remains an open problem. Here we propose query2box, an embedding-based framework for reasoning over arbitrary queries with $\wedge$, $\vee$, and $\exists$ operators in massive and incomplete KGs. Our main insight is that queries can be embedded as boxes (i.e., hyper-rectangles), where a set of points inside the box corresponds to a set of answer entities of the query. We show that conjunctions can be naturally represented as intersections of boxes and also prove a negative result that handling disjunctions would require embedding with dimension proportional to the number of KG entities. However, we show that by transforming queries into a Disjunctive Normal Form, query2box is capable of handling arbitrary logical queries with $\wedge$, $\vee$, $\exists$ in a scalable manner. We demonstrate the effectiveness of query2box on three large KGs and show that query2box achieves up to 25% relative improvement over the state of the art.


1 Introduction

Knowledge graphs (KGs) capture different types of relationships between entities, e.g., Canada $\xrightarrow{\text{Citizen}}$ Hinton. Answering arbitrary logical queries, such as “where did Canadian citizens with Turing Award graduate?”, over such KGs is a fundamental task in question answering, knowledge base reasoning, as well as AI more broadly.

First-order logical queries can be represented as Directed Acyclic Graphs (DAGs) (Fig. 1(A)) and answered by matching the DAG against the KG (subgraph matching) to obtain a set of answers (Fig. 1(C)). While simple and intuitive, such an approach has two major drawbacks: (1) The computational complexity of subgraph matching is exponential in the query size, and thus cannot scale to modern KGs; (2) Subgraph matching is very sensitive to incompleteness, as it cannot correctly answer queries with missing relations. To remedy (2) one could impute missing relations (Koller et al., 2007; Džeroski, 2009; De Raedt, 2008; Nickel et al., 2016), but that would only make the KG denser, which would further exacerbate issue (1) (Dalvi and Suciu, 2007; Krompaß et al., 2014).

Figure 1: Query2Box reasoning framework. (A) A given conjunctive query “Where did Canadian citizens with Turing Award graduate?” can be represented with a dependency graph. (B) Computation graph specifies the reasoning procedure to obtain a set of answers for the query in (A). (C) Example knowledge graph, where green nodes/entities denote answers to the query. Bold arrows indicate subgraphs that match the query graph in (A). (D) In query2box, nodes of the KG are embedded as points in the vector space. We then obtain query embedding according to the computation graph (B) as a sequence of box operations: start with two nodes TuringAward and Canada and apply Win and Citizen projection operators, followed by an intersection operator (denoted as a shaded intersection of yellow and orange boxes) and another projection operator. The final embedding of the query is a green box and query’s answers are the entities inside the box.

Recently, a promising alternative approach has emerged, where logical queries as well as KG entities are embedded into a low-dimensional vector space such that entities that answer the query are embedded close to the query (Guu et al., 2015; Hamilton et al., 2018; Das et al., 2017). Such an approach robustly handles missing relations (Hamilton et al., 2018) and is also orders of magnitude faster, as answering an arbitrary logical query is reduced to simply identifying entities nearest to the embedding of the query in the vector space.

However, prior work embeds a query into a single point in the vector space. This is problematic because answering a logical query requires modeling a set of active entities while traversing the KG (Fig. 1(C)), and how to effectively model a set with a single point is unclear. Furthermore, it is also unnatural to define logical operators (e.g., set intersection) of two points in the vector space. Another fundamental limitation of prior work is that it can only handle conjunctive queries, a subset of first-order logic that only involves conjunction ($\wedge$) and existential quantifier ($\exists$), but not disjunction ($\vee$). It remains an open question how to handle disjunction effectively in the vector space.

Here we present query2box, an embedding-based framework for reasoning over KGs that is capable of handling arbitrary Existential Positive First-order (EPFO) logical queries (i.e., queries that include any set of $\wedge$, $\vee$, and $\exists$) in a scalable manner. First, to accurately model a set of entities, our key idea is to use a closed region rather than a single point in the vector space. Specifically, we use a box (axis-aligned hyper-rectangle) to represent a query (Fig. 1(D)). This provides three important benefits: (1) Boxes naturally model sets of entities they enclose; (2) Logical operators (e.g., set intersection) can naturally be defined over boxes similarly as in Venn diagrams (Venn, 1880); (3) Executing logical operators over boxes results in new boxes, which means that the operations are closed; thus, logical reasoning can be efficiently performed in query2box by iteratively updating boxes according to the query computation graph (Fig. 1(B)(D)).

We show that query2box can naturally handle conjunctive queries. We first prove a negative result that embedding EPFO queries to only single points or boxes is intractable as it would require embedding dimension proportional to the number of KG entities. However, we provide an elegant solution, where we transform a given EPFO logical query into a Disjunctive Normal Form (DNF) (Davey and Priestley, 2002), i.e., disjunction of conjunctive queries. Given any EPFO query, query2box represents it as a set of individual boxes, where each box is obtained for each conjunctive query in the DNF. We then return nearest neighbor entities to any of the boxes as the answers to the query. This means that to answer any EPFO query we first answer individual conjunctive queries and then take the union of the answer entities.

We evaluate query2box on three standard KG benchmarks and show: (1) query2box provides strong generalization as it can answer complex queries; (2) query2box can generalize to new logical query structures that it has never seen during training; (3) query2box is able to implicitly impute missing relations as it can answer any EPFO query with high accuracy even when relations involved in answering the query are missing in the KG; (4) query2box provides up to 25% relative improvement in accuracy of answering EPFO queries over state-of-the-art baselines.

2 Further Related Work

Most related to our work are embedding approaches for multi-hop reasoning over KGs (Bordes et al., 2013; Das et al., 2017; Guu et al., 2015; Hamilton et al., 2018). The crucial difference is that we provide a way to tractably handle a larger subset of first-order logic (EPFO queries vs. conjunctive queries) and that we embed queries as boxes, which provides better accuracy and generalization.

A second line of related work is on structured embeddings, which associate images, words, sentences, or knowledge base concepts with geometric objects such as regions (Erk, 2009; Vilnis et al., 2018; Li et al., 2019), densities (Vilnis and McCallum, 2014; He et al., 2015; Athiwaratkun and Wilson, 2018), and orderings (Vendrov et al., 2016; Lai and Hockenmaier, 2017; Li et al., 2017). While the above work uses geometric objects to model individual entities and their pairwise relations, we use the geometric objects to model sets of entities and reason over those sets. In this sense our work is also related to classical Venn Diagrams (Venn, 1880), where boxes are essentially the Venn Diagrams in vector space, but our boxes and entity embeddings are jointly learned, which allows us to reason over incomplete KGs.

Box embeddings have also been used to model the hierarchical nature of concepts in an ontology with uncertainty (Vilnis et al., 2018; Li et al., 2019). While our work is also based on box embeddings, we employ them for logical reasoning in massive heterogeneous knowledge graphs.

3 Query2Box: Logical Reasoning over KGs in Vector Space

Here we present query2box. We define an objective function that allows us to learn embeddings of entities in the KG and, at the same time, parameterized geometric logical operators over boxes. Then, given an arbitrary EPFO query (Fig. 1(A)), we identify its computation graph (Fig. 1(B)) and embed the query by executing a set of geometric operators over boxes (Fig. 1(D)). Entities that are enclosed in the final box embedding are returned as answers to the query (Fig. 1(D)).

In order to train our system, we generate a set of queries together with their answers at training time and then learn entity embeddings and geometric operators such that queries can be accurately answered. We show in the following sections that our approach is able to generalize to queries and logical structures never seen during training. Furthermore, as we show in experiments, our approach is able to implicitly impute missing relations and answer queries that would be impossible to answer with traditional graph traversal methods.

In the following we first only consider conjunctive queries (conjunction and existential operator) and then we extend our method to also include disjunction.

3.1 Knowledge Graphs and Conjunctive Queries

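We denote a KG as $\mathcal{G} = (\mathcal{V}, \mathcal{R})$, where $v \in \mathcal{V}$ represents an entity, and $r \in \mathcal{R}$ is a binary function $r: \mathcal{V} \times \mathcal{V} \to \{\text{True}, \text{False}\}$, indicating whether the relation $r$ holds between a pair of entities or not. In the KG, such binary output indicates the existence of the directed edge between a pair of entities, i.e., $v \xrightarrow{r} v'$ iff $r(v, v') = \text{True}$.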

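Conjunctive queries are a subclass of the first-order logical queries that use existential ($\exists$) and conjunction ($\wedge$) operations. They are formally defined as follows.

$$q[V_?] = V_? \,.\, \exists V_1, \ldots, V_k : e_1 \wedge e_2 \wedge \ldots \wedge e_n, \tag{1}$$
$$\text{where } e_i = r(v_a, V),\ V \in \{V_?, V_1, \ldots, V_k\},\ v_a \in \mathcal{V},\ r \in \mathcal{R},$$
$$\text{or } e_i = r(V, V'),\ V, V' \in \{V_?, V_1, \ldots, V_k\},\ V \neq V',\ r \in \mathcal{R},$$

where $v_a$ represents a non-variable anchor entity, $V_1, \ldots, V_k$ are existentially quantified bound variables, and $V_?$ is the target variable. The goal of answering the logical query $q$ is to find a set of entities $[\![q]\!] \subseteq \mathcal{V}$ such that $v \in [\![q]\!]$ iff $q[v] = \text{True}$. We call $[\![q]\!]$ the denotation set (i.e., answer set) of query $q$.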

As shown in Fig. 1(A), the dependency graph is a graphical representation of conjunctive query $q$, where nodes correspond to variable or non-variable entities in $q$ and edges correspond to relations in $q$. In order for the query to be valid, the corresponding dependency graph needs to be a Directed Acyclic Graph (DAG), with the anchor entities as the source nodes of the DAG and the query target $V_?$ as the unique sink node (Hamilton et al., 2018).

From the dependency graph of query $q$, one can also derive the computation graph, which consists of two types of directed edges that represent operators over sets of entities:

  • Projection: Given a set of entities $S \subseteq \mathcal{V}$, and relation $r \in \mathcal{R}$, this operator obtains $\cup_{v \in S} A_r(v)$, where $A_r(v) \equiv \{v' \in \mathcal{V} : r(v, v') = \text{True}\}$.

  • Intersection: Given a set of entity sets $\{S_1, S_2, \ldots, S_n\}$, this operator obtains $\cap_{i=1}^{n} S_i$.

For a given query $q$, the computation graph specifies the procedure of reasoning to obtain a set of answer entities, i.e., starting from a set of anchor nodes, the above two operators are applied iteratively until the unique sink target node is reached. The entire procedure is analogous to traversing KGs following the computation graph (Guu et al., 2015).
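To make the two set operators concrete, the following minimal sketch executes the computation graph of Fig. 1(B) directly on a toy KG. The triples are illustrative placeholders loosely modeled on Fig. 1 rather than real data, and the function names `project` and `intersect` are our own choices for this illustration.

```python
from functools import reduce

# Toy KG as (head, relation, tail) triples; illustrative placeholders only.
TRIPLES = {
    ("TuringAward", "Win", "Hinton"),
    ("TuringAward", "Win", "Bengio"),
    ("TuringAward", "Win", "Pearl"),
    ("Canada", "Citizen", "Hinton"),
    ("Canada", "Citizen", "Bengio"),
    ("Canada", "Citizen", "Trudeau"),
    ("Hinton", "Graduate", "Edinburgh"),
    ("Bengio", "Graduate", "McGill"),
    ("Pearl", "Graduate", "NYU"),
}

def project(entities, relation, triples=TRIPLES):
    """Projection: union over v in `entities` of all tails reachable via `relation`."""
    return {t for (h, r, t) in triples if h in entities and r == relation}

def intersect(entity_sets):
    """Intersection of a non-empty list of entity sets."""
    return reduce(set.intersection, entity_sets)

# Start from the anchor nodes and apply the operators until the sink is reached.
winners = project({"TuringAward"}, "Win")
citizens = project({"Canada"}, "Citizen")
answers = project(intersect([winners, citizens]), "Graduate")
print(answers)
```

This traversal only finds answers whose supporting edges are all present in the KG, which is exactly the limitation that motivates reasoning in the vector space.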

3.2 Reasoning over Sets of Entities Using Box Embeddings

So far we have defined conjunctive queries as computation graphs that can be executed directly over the nodes and edges in the KG. Now, we define logical reasoning in the vector space. Our intuition follows Fig. 1: Given a complex query, we shall decompose it into a sequence of logical operations, and then execute these operations in the vector space. This way we will obtain the embedding of the query, and answers to the query will be entities that are enclosed in the final query embedding box.

In the following, we detail our two methodological advances: (1) the use of box embeddings to efficiently model and reason over sets of entities in the vector space, and (2) how to tractably handle the disjunction operator ($\vee$), expanding the class of first-order logic that can be modeled in the vector space (Section 3.3).

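Box embeddings. To efficiently model a set of entities in the vector space, we use boxes (i.e., axis-aligned hyper-rectangles). The benefit is that, unlike a single point, the box has an interior; thus, if an entity is in a set, it is natural to model the entity embedding to be a point inside the box. Formally, we operate on $\mathbb{R}^d$, and define a box in $\mathbb{R}^d$ by $\mathbf{p} = (\text{Cen}(\mathbf{p}), \text{Off}(\mathbf{p})) \in \mathbb{R}^{2d}$ as:

$$\text{Box}_{\mathbf{p}} \equiv \{\mathbf{v} \in \mathbb{R}^d : \text{Cen}(\mathbf{p}) - \text{Off}(\mathbf{p}) \preceq \mathbf{v} \preceq \text{Cen}(\mathbf{p}) + \text{Off}(\mathbf{p})\}, \tag{2}$$

where $\preceq$ is element-wise inequality, $\text{Cen}(\mathbf{p}) \in \mathbb{R}^d$ is the center of the box, and $\text{Off}(\mathbf{p}) \in \mathbb{R}^d_{\geq 0}$ is the positive offset of the box, modeling the size of the box. Each entity $v \in \mathcal{V}$ in the KG is assigned a single vector $\mathbf{v} \in \mathbb{R}^d$ (i.e., a zero-size box), and the box embedding $\mathbf{p}$ models $\{v \in \mathcal{V} : \mathbf{v} \in \text{Box}_{\mathbf{p}}\}$, i.e., a set of entities whose vectors are inside the box. For the rest of the paper, we use bold face to denote the embedding, e.g., the embedding of $v$ is denoted by $\mathbf{v}$.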

Our framework reasons over KGs in the vector space following the computation graph of the query, as shown in Fig. 1(D): we start from the initial box embeddings of the source nodes (anchor entities) and sequentially update the embeddings according to the logical operators. Below, we describe how we set initial box embeddings for the source nodes, as well as how we model projection and intersection operators (defined in Sec. 3.1) as geometric operators that operate over boxes. After that, we describe our entity-to-box distance function and the overall objective that learns embeddings as well as the geometric operators.

Initial boxes for source nodes. Each source node represents an anchor entity $v \in \mathcal{V}$, which we can regard as a set that only contains the single entity. Such a single-element set can be naturally modeled by a box of size/offset zero centered at $\mathbf{v}$. Formally, we set the initial box embedding as $(\mathbf{v}, \mathbf{0})$, where $\mathbf{v} \in \mathbb{R}^d$ is the anchor entity vector and $\mathbf{0}$ is a $d$-dimensional all-zero vector.

Figure 2: The geometric intuition of the two operations and distance function in query2box. (A) Projection generates a larger box with a translated center. (B) Intersection generates a smaller box lying inside the given set of boxes. (C) Distance $d_{\text{box}}$ is the weighted sum of $d_{\text{out}}$ and $d_{\text{in}}$, where the latter is weighted less.

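Geometric projection operator. We associate each relation $r \in \mathcal{R}$ with relation embedding $\mathbf{r} = (\text{Cen}(\mathbf{r}), \text{Off}(\mathbf{r})) \in \mathbb{R}^{2d}$ with $\text{Off}(\mathbf{r}) \succeq \mathbf{0}$. Given an input box embedding $\mathbf{p}$, we model the projection by $\mathbf{p} + \mathbf{r}$, where we sum the centers and sum the offsets. This gives us a new box with the translated center and larger offset because $\text{Off}(\mathbf{r}) \succeq \mathbf{0}$, as illustrated in Fig. 2(A). The adaptive box size effectively models a different number of entities/vectors in the set.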
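As a minimal sketch of the two steps just described, the snippet below builds a zero-offset box for an anchor entity and applies one projection. Boxes are represented as `(center, offset)` pairs of plain Python lists; this representation, the function names, and the numeric values are assumptions made for illustration, not learned embeddings.

```python
def initial_box(entity_vec):
    """Model an anchor entity as a box with zero offset centered at its vector."""
    return (list(entity_vec), [0.0] * len(entity_vec))

def project_box(box, rel):
    """Projection: add the relation's center and offset to the box, dimension-wise."""
    (cen, off), (r_cen, r_off) = box, rel
    if any(o < 0 for o in r_off):
        raise ValueError("relation offsets must be non-negative")
    new_cen = [c + rc for c, rc in zip(cen, r_cen)]
    new_off = [o + ro for o, ro in zip(off, r_off)]
    return (new_cen, new_off)

canada = initial_box([1.0, 2.0])          # hypothetical 2-d entity vector
citizen = ([0.5, -1.0], [0.25, 0.5])      # hypothetical relation embedding
box = project_box(canada, citizen)
print(box)
```

Because relation offsets are non-negative, repeated projections can only keep or grow the box, which matches the intuition that each hop may enlarge the set of reachable entities.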

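Geometric intersection operator. We model the intersection of a set of box embeddings $\{\mathbf{p}_1, \ldots, \mathbf{p}_n\}$ as $\mathbf{p}_{\text{inter}} = (\text{Cen}(\mathbf{p}_{\text{inter}}), \text{Off}(\mathbf{p}_{\text{inter}}))$, which is calculated by performing attention over the box centers (Bahdanau et al., 2015) and shrinking the box offset using the sigmoid function:

$$\text{Cen}(\mathbf{p}_{\text{inter}}) = \sum_{i} \mathbf{a}_i \odot \text{Cen}(\mathbf{p}_i), \quad \mathbf{a}_i = \frac{\exp(\text{MLP}(\mathbf{p}_i))}{\sum_{j} \exp(\text{MLP}(\mathbf{p}_j))},$$
$$\text{Off}(\mathbf{p}_{\text{inter}}) = \text{Min}(\{\text{Off}(\mathbf{p}_1), \ldots, \text{Off}(\mathbf{p}_n)\}) \odot \sigma(\text{DeepSets}(\{\mathbf{p}_1, \ldots, \mathbf{p}_n\})),$$

where $\odot$ is the dimension-wise product, $\text{MLP}(\cdot): \mathbb{R}^{2d} \to \mathbb{R}^d$ is the Multi-Layer Perceptron, $\sigma(\cdot)$ is the sigmoid function, $\text{DeepSets}(\cdot)$ is the permutation-invariant deep architecture (Zaheer et al., 2017), and both $\text{Min}(\cdot)$ and $\exp(\cdot)$ are applied in a dimension-wise manner. Following Hamilton et al. (2018), we model all the deep sets by $\text{DeepSets}(\{\mathbf{x}_1, \ldots, \mathbf{x}_N\}) = \text{MLP}((1/N) \cdot \sum_{i=1}^{N} \text{MLP}(\mathbf{x}_i))$, where all the hidden dimensionalities of the two MLPs are the same as the input dimensionality. The intuition behind our geometric intersection is to generate a smaller box that lies inside a set of boxes, as illustrated in Fig. 2(B).$^{1}$ Different from the generic deep sets to model the intersection (Hamilton et al., 2018), our geometric intersection operator effectively constrains the center position and models the shrinking set size.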
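The structure of the intersection operator can be sketched without committing to a particular network. In the snippet below, `score_fn` stands in for the attention MLP and `gate_fn` for the deep-sets network; both are hypothetical placeholders supplied by the caller, and the `(center, offset)` list representation is likewise an assumption of this illustration.

```python
import math

def softmax_columns(logits):
    """Dimension-wise softmax across a list of equal-length logit vectors."""
    d = len(logits[0])
    weights = [[0.0] * d for _ in logits]
    for k in range(d):
        m = max(row[k] for row in logits)            # subtract max for stability
        exps = [math.exp(row[k] - m) for row in logits]
        z = sum(exps)
        for i, e in enumerate(exps):
            weights[i][k] = e / z
    return weights

def intersect_boxes(boxes, score_fn, gate_fn):
    """boxes: list of (center, offset). Returns the (center, offset) of the intersection box."""
    d = len(boxes[0][0])
    # Attention weights over boxes, computed per dimension from the 2d-dim box vector.
    attn = softmax_columns([score_fn(cen + off) for cen, off in boxes])
    center = [sum(a[k] * cen[k] for a, (cen, _) in zip(attn, boxes)) for k in range(d)]
    # Shrink the dimension-wise minimum offset by sigmoid(gate), which lies in (0, 1).
    gate = gate_fn(boxes)
    offset = [min(off[k] for _, off in boxes) / (1.0 + math.exp(-gate[k])) for k in range(d)]
    return center, offset

boxes = [([0.0, 0.0], [2.0, 1.0]), ([2.0, 4.0], [1.0, 3.0])]
zeros = lambda _: [0.0, 0.0]   # placeholder networks: uniform attention, sigmoid(0) = 0.5
center, offset = intersect_boxes(boxes, score_fn=zeros, gate_fn=zeros)
print(center, offset)
```

Since the attention weights are positive and sum to one in every dimension, the new center is always a dimension-wise convex combination of the input centers, and the new offset never exceeds the smallest input offset.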

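Entity-to-box distance. Given a query box $\mathbf{q} \in \mathbb{R}^{2d}$ and an entity vector $\mathbf{v} \in \mathbb{R}^d$, we define their distance as

$$d_{\text{box}}(\mathbf{v}; \mathbf{q}) = d_{\text{out}}(\mathbf{v}; \mathbf{q}) + \alpha \cdot d_{\text{in}}(\mathbf{v}; \mathbf{q}), \tag{3}$$

where $\mathbf{q}_{\max} = \text{Cen}(\mathbf{q}) + \text{Off}(\mathbf{q}) \in \mathbb{R}^d$, $\mathbf{q}_{\min} = \text{Cen}(\mathbf{q}) - \text{Off}(\mathbf{q}) \in \mathbb{R}^d$, and $0 < \alpha < 1$ is a fixed scalar, and

$$d_{\text{out}}(\mathbf{v}; \mathbf{q}) = \|\text{Max}(\mathbf{v} - \mathbf{q}_{\max}, \mathbf{0}) + \text{Max}(\mathbf{q}_{\min} - \mathbf{v}, \mathbf{0})\|_1,$$
$$d_{\text{in}}(\mathbf{v}; \mathbf{q}) = \|\text{Cen}(\mathbf{q}) - \text{Min}(\mathbf{q}_{\max}, \text{Max}(\mathbf{q}_{\min}, \mathbf{v}))\|_1.$$

As illustrated in Fig. 2(C), $d_{\text{out}}$ corresponds to the distance between the entity and the closest corner/side of the box. Analogously, $d_{\text{in}}$ corresponds to the distance between the center of the box and its side/corner (or the entity itself if the entity is inside the box).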

The key here is to downweight the distance inside the box by using $0 < \alpha < 1$. This means that as long as entity vectors are inside the box, we regard them as “close enough” to the query center (i.e., $d_{\text{out}}$ is 0, and $d_{\text{in}}$ is scaled by $\alpha$). When $\alpha = 1$, $d_{\text{box}}$ reduces to the ordinary $L_1$ distance, i.e., $\|\text{Cen}(\mathbf{q}) - \mathbf{v}\|_1$, which is used by the conventional TransE (Bordes et al., 2013) as well as prior query embedding methods (Guu et al., 2015; Hamilton et al., 2018).
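The distance can be written out in a few lines. The sketch below follows the description above: the outside part measures how far the entity lies beyond the box boundary, and the inside part measures the distance from the center to the entity clamped onto the box. The `(center, offset)` list representation and the value `alpha=0.5` are arbitrary choices for illustration.

```python
def box_distance(v, box, alpha=0.5):
    """Outside distance plus alpha times inside distance, both as L1 norms."""
    cen, off = box
    d_out = 0.0
    d_in = 0.0
    for x, c, o in zip(v, cen, off):
        q_min, q_max = c - o, c + o
        # How far x lies beyond the box in this dimension (0 if inside).
        d_out += max(x - q_max, 0.0) + max(q_min - x, 0.0)
        # Distance from the center to x clamped into [q_min, q_max].
        d_in += abs(c - min(q_max, max(q_min, x)))
    return d_out + alpha * d_in

unit_box = ([0.0, 0.0], [1.0, 1.0])
print(box_distance([0.5, 0.0], unit_box))             # entity inside the box
print(box_distance([3.0, 0.0], unit_box))             # entity outside the box
print(box_distance([3.0, 0.0], unit_box, alpha=1.0))  # plain L1 distance to the center
```

With `alpha=1.0` the two parts add up to the L1 distance to the center in every dimension, which is the reduction noted above.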

Training objective. Our next goal is to learn entity embeddings as well as geometric projection and intersection operators.

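Given a training set of queries and their answers, we optimize a negative sampling loss (Mikolov et al., 2013) to effectively optimize our distance-based model (Sun et al., 2019):

$$L = -\log \sigma\left(\gamma - d_{\text{box}}(\mathbf{v}; \mathbf{q})\right) - \sum_{j=1}^{k} \frac{1}{k} \log \sigma\left(d_{\text{box}}(\mathbf{v}'_j; \mathbf{q}) - \gamma\right), \tag{4}$$

where $\gamma$ represents a fixed scalar margin, $v \in [\![q]\!]$ is a positive entity (i.e., answer to the query $q$), $v'_j \notin [\![q]\!]$ is the $j$-th negative entity (non-answer to the query $q$), and $k$ is the number of negative entities.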
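For a single query, the loss above depends only on the distance to the positive entity and the distances to the sampled negatives. The sketch below computes it from precomputed distances; the function names and the numeric values are illustrative assumptions, and a real implementation would compute the distances from learned embeddings and backpropagate through them.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def negative_sampling_loss(d_pos, d_negs, gamma):
    """Margin-based loss: pull the positive within gamma, push negatives beyond it."""
    pos_term = -log_sigmoid(gamma - d_pos)
    neg_term = -sum(log_sigmoid(d - gamma) for d in d_negs) / len(d_negs)
    return pos_term + neg_term

# A well-separated case (positive close, negatives far) versus the reverse.
good = negative_sampling_loss(d_pos=0.0, d_negs=[5.0, 6.0], gamma=2.0)
bad = negative_sampling_loss(d_pos=5.0, d_negs=[0.0, 1.0], gamma=2.0)
print(good, bad)
```

The loss is small when the positive entity is closer than the margin and the negatives are farther than it, and it grows as either condition is violated.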

3.3 Tractable Handling of Disjunction Using Disjunctive Normal Form

So far we have focused on conjunctive queries, and our aim here is to tractably handle in the vector space a wider class of logical queries, called Existential Positive First-order (EPFO) queries (Dalvi and Suciu, 2012), that involve $\vee$ in addition to $\exists$ and $\wedge$. We specifically focus on EPFO queries whose computation graphs are a DAG, same as that of conjunctive queries (Section 3.1), except that we now have an additional type of directed edge, called union, defined as follows:

  • Union: Given a set of entity sets $\{S_1, S_2, \ldots, S_n\}$, this operator obtains $\cup_{i=1}^{n} S_i$.

A straightforward approach here would be to define another geometric operator for union and embed the query as we did in the previous sections. An immediate challenge for our box embeddings is that boxes can be located anywhere in the vector space, so their union would no longer be a simple box. In other words, union operation over boxes is not closed.

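Theoretically, we prove a general negative result that holds for any embedding-based method that embeds query $q$ into $\mathbf{q}$ and uses some distance function to retrieve entities, i.e., $d(\mathbf{v}; \mathbf{q}) \leq \beta$ iff $v \in [\![q]\!]$. Here, $d(\mathbf{v}; \mathbf{q})$ is the distance between entity and query embeddings, e.g., $d_{\text{box}}(\mathbf{v}; \mathbf{q})$ or $\|\mathbf{v} - \mathbf{q}\|_1$, and $\beta$ is a fixed threshold.

Theorem 1.

Consider any $M$ conjunctive queries $q_1, \ldots, q_M$ whose denotation sets $[\![q_1]\!], \ldots, [\![q_M]\!]$ are disjoint with each other, $\forall i \neq j$, $[\![q_i]\!] \cap [\![q_j]\!] = \varnothing$. Let $D$ be the VC dimension of the function class $\{\text{sign}(\beta - d(\cdot; \mathbf{q})) : \mathbf{q} \in \Xi\}$, where $\Xi$ represents the query embedding space and $\text{sign}(\cdot)$ is the sign function. Then, we need $D \geq M$ to model any EPFO query, i.e., $d(\mathbf{v}; \mathbf{q}) \leq \beta \Leftrightarrow v \in [\![q]\!]$ is satisfied for every EPFO query $q$.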

The proof is provided in Appendix A, where the key is that with the introduction of the union operation any subset of denotation sets can be the answer, which forces us to model the powerset in a vector space.

For a real-world KG, there are $M \approx |\mathcal{V}|$ conjunctive queries with non-overlapping answers. For example, in the commonly-used FB15k dataset (Bordes et al., 2013), derived from Freebase (Bollacker et al., 2008), we find $M$ = 13,365, while $|\mathcal{V}|$ is 14,951 (see Appendix B for the details).

Theorem 1 shows that in order to accurately model any EPFO query with the existing framework, the complexity of the distance function measured by the VC dimension needs to be as large as the number of KG entities. This implies that if we use common distance functions based on hyper-plane, Euclidean sphere, or axis-aligned rectangle,$^{2}$ their parameter dimensionality needs to be $\Theta(M)$, which is $\Theta(|\mathcal{V}|)$ for real KGs we are interested in. In other words, the dimensionality of the logical query embeddings needs to be $\Theta(|\mathcal{V}|)$, which is not low-dimensional; thus not scalable to large KGs and not generalizable in the presence of unobserved KG edges.

To rectify this issue, our key idea is to transform a given EPFO query into a Disjunctive Normal Form (DNF) (Davey and Priestley, 2002), i.e., disjunction of conjunctive queries, so that union operation only appears in the last step. Each of the conjunctive queries can then be reasoned in the low-dimensional space, after which we can aggregate the results by a simple and intuitive procedure. In the following, we describe the transformation to DNF and the aggregation procedure.

Figure 3: Illustration of converting a computation graph of an EPFO query into an equivalent computation graph of the Disjunctive Normal Form.

Transformation to DNF. Any first-order logic can be transformed into the equivalent DNF (Davey and Priestley, 2002). We perform such transformation directly in the space of computation graph, i.e., moving all the edges of type “union” to the last step of the computation graph. Let $G_q = (V_q, E_q)$ be the computation graph for a given EPFO query $q$, and let $V_{\text{union}} \subset V_q$ be a set of nodes whose in-coming edges are of type “union”. For each $v \in V_{\text{union}}$, define $P_v \subset V_q$ as a set of its parent nodes. We first generate $N = \prod_{v \in V_{\text{union}}} |P_v|$ different computation graphs $G_{q^{(1)}}, \ldots, G_{q^{(N)}}$ as follows, each with different choices of $v_{\text{parent}}$ in the first step.

  1. For every $v \in V_{\text{union}}$, select one parent node $v_{\text{parent}} \in P_v$.

  2. Remove all the edges of type “union.”

  3. Merge $v$ and $v_{\text{parent}}$, while retaining all other edge connections.

We then combine the obtained computation graphs as follows to give the final equivalent computation graph.

  1. Convert the target sink nodes of all the obtained computation graphs into the existentially quantified bound variables nodes.

  2. Create a new target sink node $V_?$, and draw directed edges of type “union” from all the above variable nodes to the new target node.

An example of the entire transformation procedure is illustrated in Fig. 3. By the definition of the union operation, our procedure gives the equivalent computation graph as the original one. Furthermore, as all the union operators are removed from $G_{q^{(1)}}, \ldots, G_{q^{(N)}}$, all of these computation graphs represent conjunctive queries, which we denote as $q^{(1)}, \ldots, q^{(N)}$. We can then apply the existing framework to obtain a set of embeddings for these conjunctive queries as $\{\mathbf{q}^{(1)}, \ldots, \mathbf{q}^{(N)}\}$.
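The idea of pushing every union to the last step can be illustrated on a simplified query representation. The sketch below works on expression trees rather than on the DAG-shaped computation graphs used above, so it is a simplification and not the graph procedure itself. The nested-tuple encoding (`"e"` for an anchor entity, `"p"` for projection, `"i"` for intersection, `"u"` for union) is an assumption made for this example.

```python
from itertools import product

def to_dnf(query):
    """Return the list of union-free (conjunctive) queries whose disjunction equals `query`."""
    kind = query[0]
    if kind == "e":                       # anchor entity: already conjunctive
        return [query]
    if kind == "p":                       # projection distributes over union
        _, rel, sub = query
        return [("p", rel, branch) for branch in to_dnf(sub)]
    if kind == "i":                       # pick one branch per operand, in every combination
        return [("i", *combo) for combo in product(*(to_dnf(s) for s in query[1:]))]
    if kind == "u":                       # union: concatenate the operands' branches
        return [branch for sub in query[1:] for branch in to_dnf(sub)]
    raise ValueError(f"unknown node type: {kind!r}")

# An "up" query: union of two one-hop queries followed by a projection.
up_query = ("p", "r3", ("u", ("p", "r1", ("e", "a")), ("p", "r2", ("e", "b"))))
for conj in to_dnf(up_query):
    print(conj)
```

Here the "up" query is rewritten into two path queries of length two, whose answers are then combined by a final union.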

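Aggregation. Next we define the distance function between the given EPFO query $q$ and an entity $v \in \mathcal{V}$. Since $q$ is logically equivalent to $q^{(1)} \vee \cdots \vee q^{(N)}$, we can naturally define the aggregated distance function using the box distance $d_{\text{box}}$:

$$d_{\text{agg}}(\mathbf{v}; q) = \text{Min}(\{d_{\text{box}}(\mathbf{v}; \mathbf{q}^{(1)}), \ldots, d_{\text{box}}(\mathbf{v}; \mathbf{q}^{(N)})\}), \tag{5}$$

where $d_{\text{agg}}$ is parameterized by the EPFO query $q$. When $q$ is a conjunctive query, i.e., $N = 1$, $d_{\text{agg}}(\mathbf{v}; q) = d_{\text{box}}(\mathbf{v}; \mathbf{q})$. For $N > 1$, $d_{\text{agg}}$ takes the minimum distance to the closest box as the distance to an entity. This modeling aligns well with the union operation; an entity is inside the union of sets as long as the entity is in one of the sets. Note that our DNF-query rewriting scheme is general and is able to extend any method that works for conjunctive queries (e.g., Hamilton et al., 2018) to handle the more general class of EPFO queries.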
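The aggregation step is independent of the particular distance used for each conjunctive query, which the following sketch makes explicit by taking the distance as a parameter. The stand-in distance `l1_to_center`, the entity vectors, and the boxes are hypothetical values chosen for the demo.

```python
def aggregated_distance(v, boxes, dist_fn):
    """Minimum of dist_fn(v, box) over the boxes of the DNF's conjunctive queries."""
    return min(dist_fn(v, box) for box in boxes)

def l1_to_center(v, box):
    """Stand-in distance for the demo: L1 distance to the box center."""
    cen, _ = box
    return sum(abs(x - c) for x, c in zip(v, cen))

def rank_entities(entity_vecs, boxes, dist_fn):
    """Entity names sorted from closest to farthest under the aggregated distance."""
    return sorted(entity_vecs,
                  key=lambda name: aggregated_distance(entity_vecs[name], boxes, dist_fn))

boxes = [([0.0, 0.0], [1.0, 1.0]), ([10.0, 10.0], [1.0, 1.0])]
entity_vecs = {"a": [0.5, 0.0], "b": [10.0, 9.0], "c": [5.0, 5.0]}
ranking = rank_entities(entity_vecs, boxes, l1_to_center)
print(ranking)
```

Entities near either box rank ahead of the entity that lies between the two boxes, reflecting that membership in any one of the sets suffices for the union.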

Computational complexity. The computational complexity of answering an EPFO query with our framework is equal to that of answering the $N$ conjunctive queries. In practice, $N$ might not be so large, and all the $N$ computations can be parallelized. Furthermore, answering each conjunctive query is very fast as it requires us to execute a sequence of simple box operations (each of which takes constant time) and then perform a range search (Bentley and Friedman, 1979) in the embedding space, which can also be done in constant time using techniques based on Locality Sensitive Hashing (Indyk and Motwani, 1998).

4 Experiments

Our goal in the experiment section is to evaluate the performance of query2box on discovering answers to complex logical queries that cannot be obtained by traversing the incomplete KG. This means, we will focus on answering queries where one or more missing edges in the KG have to be successfully predicted in order to obtain the additional answers.

Figure 4: Query structures considered in the experiments, where anchor entities and relations are to be specified to instantiate logical queries. Naming for each query structure is provided under each subfigure, where ‘p’, ‘i’, and ‘u’ stand for ‘projection’, ‘intersection’, and ‘union’, respectively. Models are trained on the first 5 query structures, and evaluated on all 9 query structures. For example, “3p” is a path query of length three, and “2i” is an intersection of cardinality two.

Dataset    1p    2p     3p     2i    3i    ip     pi     2u    up
FB15k      10.8  255.6  250.0  90.3  64.1  593.8  190.1  27.8  227.0
FB15k-237  13.3  131.4  215.3  69.0  48.9  593.8  257.7  35.6  127.7
NELL995    8.5   56.6   65.3   30.3  15.9  310.0  144.9  14.4  62.5

Table 1: Average number of answer entities of test queries with missing edges grouped by different query structures (for a KG with 10% edges missing).

4.1 Knowledge Graphs and Query Generation

We perform experiments on three standard KG benchmarks, FB15k (Bordes et al., 2013), FB15k-237 (Toutanova and Chen, 2015), and NELL995 (Xiong et al., 2017) (see Appendix E for NELL995 pre-processing details). Dataset statistics are summarized in Table 5 in Appendix F.

We follow the standard evaluation protocol in the KG literature: Given the standard split of edges into training, test, and validation sets, we first augment the KG to also include inverse relations and effectively double the number of edges in the graph. We then create three graphs. The first, $\mathcal{G}_{\text{train}}$, contains only training edges, and we use it to train node embeddings as well as box operators. We also generate two bigger graphs: $\mathcal{G}_{\text{valid}}$, which contains $\mathcal{G}_{\text{train}}$ plus the validation edges, and $\mathcal{G}_{\text{test}}$, which includes $\mathcal{G}_{\text{valid}}$ as well as the test edges.

We consider 9 kinds of diverse query structures shown and named in Fig. 4. We use 5 query structures for training and then evaluate on all the 9 query structures. We refer the reader to Appendix D for full details on query generation and Table 6 in Appendix F for statistics of the generated logical queries. Given a query $q$, let $[\![q]\!]_{\text{train}}$, $[\![q]\!]_{\text{val}}$, and $[\![q]\!]_{\text{test}}$ denote a set of answer entities obtained by running subgraph matching of $q$ on $\mathcal{G}_{\text{train}}$, $\mathcal{G}_{\text{valid}}$, and $\mathcal{G}_{\text{test}}$, respectively. At the training time, we use $[\![q]\!]_{\text{train}}$ as positive examples for the query and other random entities as negative examples. However, at the test/validation time we proceed differently. Note that we focus on answering queries where generalization performance is crucial and at least one edge needs to be imputed in order to answer the queries. Thus, rather than evaluating a given query on the full validation (or test) set $[\![q]\!]_{\text{val}}$ ($[\![q]\!]_{\text{test}}$) of answers, we validate the method only on answers that include missing relations. Given how we constructed $\mathcal{G}_{\text{train}} \subseteq \mathcal{G}_{\text{valid}} \subseteq \mathcal{G}_{\text{test}}$, we have $[\![q]\!]_{\text{train}} \subseteq [\![q]\!]_{\text{val}} \subseteq [\![q]\!]_{\text{test}}$ and thus we evaluate the method on $[\![q]\!]_{\text{val}} \backslash [\![q]\!]_{\text{train}}$ to tune hyper-parameters and then report results identifying answer entities in $[\![q]\!]_{\text{test}} \backslash [\![q]\!]_{\text{val}}$. This means we always evaluate on queries/entities that were not part of the training set and the method has not seen them before. Furthermore, for these queries, traditional graph traversal techniques would not be able to find the answers (due to missing relations).

Table 1 shows the average number of answer entities for different query structures. We observe that complex logical queries (especially 2p, 3p, ip, pi, up) indeed require modeling a much larger number of answer entities (often more than 10 times) than the simple 1p queries do. Therefore, we expect our box embeddings to work particularly well in handling complex queries with many answer entities.$^{3}$

4.2 Evaluation Protocol

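Given a test query $q$, for each of its non-trivial answers $v \in [\![q]\!]_{\text{test}} \backslash [\![q]\!]_{\text{val}}$, we use $d_{\text{box}}$ in Eq. 3 to rank $v$ among $\mathcal{V} \backslash [\![q]\!]_{\text{test}}$. Denoting the rank of $v$ by $\text{Rank}(v)$, we then calculate evaluation metrics for answering query $q$, such as Mean Reciprocal Rank (MRR) and Hits at $K$ (H@$K$):

$$\text{Metrics}(q) = \frac{1}{|[\![q]\!]_{\text{test}} \backslash [\![q]\!]_{\text{val}}|} \sum_{v \in [\![q]\!]_{\text{test}} \backslash [\![q]\!]_{\text{val}}} f_{\text{metrics}}(\text{Rank}(v)), \tag{6}$$

where $f_{\text{metrics}}(x) = \frac{1}{x}$ for MRR, and $f_{\text{metrics}}(x) = \mathbf{1}[x \leq K]$ for H@$K$.

We then average Eq. 6 over all the queries within the same query structure,$^{4}$ and report the results separately for different query structures. The same evaluation protocol is applied to the validation stage except that we evaluate on $[\![q]\!]_{\text{val}} \backslash [\![q]\!]_{\text{train}}$ rather than $[\![q]\!]_{\text{test}} \backslash [\![q]\!]_{\text{val}}$.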
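The per-query metrics can be sketched as follows, assuming the distance from the query to every entity has already been computed and stored in a dictionary. Each non-trivial answer is ranked only against non-answers, as described above. The function names, the toy distances, and the optimistic handling of ties (strict comparison) are assumptions of this illustration, and queries without non-trivial answers are assumed to have been filtered out beforehand.

```python
def rank_of(answer, distances, all_answers):
    """Rank of `answer` among entities outside `all_answers` (1 = closest)."""
    d = distances[answer]
    closer = sum(1 for e, de in distances.items() if e not in all_answers and de < d)
    return closer + 1

def query_metrics(distances, test_answers, val_answers, k=3):
    """MRR and Hits@k averaged over the non-trivial answers (test minus validation)."""
    hard = test_answers - val_answers
    ranks = [rank_of(v, distances, test_answers) for v in hard]
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = sum(1.0 for r in ranks if r <= k) / len(ranks)
    return mrr, hits

# Toy distances from one query to five entities; "a" and "b" are the test answers,
# of which only "b" requires a missing edge.
distances = {"a": 0.1, "b": 0.4, "c": 0.2, "d": 0.3, "e": 0.9}
mrr, hits_at_3 = query_metrics(distances, test_answers={"a", "b"}, val_answers={"a"})
print(mrr, hits_at_3)
```

In this toy case the answer "b" is preceded by two non-answers, so it is ranked third: it counts as a hit at $K = 3$ but not at $K = 1$.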

4.3 Baseline and Model Variants

We compare our framework query2box against the state-of-the-art gqe (Hamilton et al., 2018). gqe embeds a query to a single vector, and models projection and intersection operators as translation and deep sets (Zaheer et al., 2017), respectively. The $L_1$ distance is used as the distance between query and entity vectors. For a fair comparison, we also compare with gqe-double (gqe with doubled embedding dimensionality) so that query2box and gqe-double have the same number of parameters. Refer to Appendix G for the model hyper-parameters used in our experiments. Although the original gqe cannot handle EPFO queries, we apply our DNF-query rewriting strategy and in our evaluation extend gqe to handle general EPFO queries as well. Furthermore, we perform an extensive ablation study by considering several variants of query2box (abbreviated as q2b). We list our method as well as its variants below.

  • q2b (our method): The box embeddings are used to model queries, and the attention mechanism is used for the intersection operator.

  • q2b-avg: The attention mechanism for intersection is replaced with averaging.

  • q2b-deepsets: The attention mechanism for intersection is replaced with the deep sets.

  • q2b-avg-1p: The variant of q2b-avg that is trained with only 1p queries (see Fig. 4); thus, logical operators are not explicitly trained.

  • q2b-sharedoffset: The box offset is shared across all queries (every query is represented by a box with the same trainable size).

Dataset    Method      Avg    1p     2p     3p     2i     3i     ip     pi     2u     up
FB15k      q2b         0.484  0.786  0.413  0.303  0.593  0.712  0.211  0.397  0.608  0.330
FB15k      gqe         0.386  0.636  0.345  0.248  0.515  0.624  0.151  0.310  0.376  0.273
FB15k      gqe-double  0.384  0.630  0.346  0.250  0.515  0.611  0.153  0.320  0.362  0.271
FB15k-237  q2b         0.268  0.467  0.240  0.186  0.324  0.453  0.108  0.205  0.239  0.193
FB15k-237  gqe         0.228  0.402  0.213  0.155  0.292  0.406  0.083  0.170  0.169  0.163
FB15k-237  gqe-double  0.230  0.405  0.213  0.153  0.298  0.411  0.085  0.182  0.167  0.160
NELL995    q2b         0.306  0.555  0.266  0.233  0.343  0.480  0.132  0.212  0.369  0.163
NELL995    gqe         0.247  0.418  0.228  0.205  0.316  0.447  0.081  0.186  0.199  0.139
NELL995    gqe-double  0.248  0.417  0.231  0.203  0.318  0.454  0.081  0.188  0.200  0.139

Table 2: H@3 results of query2box vs. gqe on FB15k, FB15k-237 and NELL995.

Dataset    Method            Avg    1p     2p     3p     2i     3i     ip     pi     2u     up
FB15k      q2b               0.484  0.786  0.413  0.303  0.593  0.712  0.211  0.397  0.608  0.330
FB15k      q2b-avg           0.468  0.779  0.407  0.300  0.577  0.673  0.199  0.345  0.607  0.326
FB15k      q2b-deepsets      0.467  0.755  0.407  0.294  0.588  0.699  0.197  0.378  0.562  0.324
FB15k      q2b-avg-1p        0.385  0.812  0.262  0.173  0.463  0.529  0.126  0.263  0.653  0.187
FB15k      q2b-sharedoffset  0.372  0.684  0.335  0.232  0.442  0.559  0.144  0.282  0.417  0.252
FB15k-237  q2b               0.268  0.467  0.240  0.186  0.324  0.453  0.108  0.205  0.239  0.193
FB15k-237  q2b-avg           0.249  0.462  0.242  0.182  0.278  0.391  0.101  0.158  0.236  0.189
FB15k-237  q2b-deepsets      0.259  0.458  0.243  0.186  0.303  0.432  0.104  0.187  0.231  0.190
FB15k-237  q2b-avg-1p        0.219  0.457  0.193  0.132  0.251  0.319  0.083  0.142  0.241  0.152
FB15k-237  q2b-sharedoffset  0.207  0.391  0.199  0.139  0.251  0.354  0.082  0.154  0.150  0.142
NELL995    q2b               0.306  0.555  0.266  0.233  0.343  0.480  0.132  0.212  0.369  0.163
NELL995    q2b-avg           0.283  0.543  0.250  0.228  0.300  0.403  0.116  0.188  0.360  0.161
NELL995    q2b-deepsets      0.293  0.539  0.260  0.231  0.317  0.467  0.110  0.202  0.349  0.160
NELL995    q2b-avg-1p        0.274  0.607  0.229  0.182  0.277  0.315  0.097  0.180  0.443  0.133
NELL995    q2b-sharedoffset  0.237  0.436  0.219  0.201  0.278  0.379  0.096  0.174  0.217  0.137

Table 3: H@3 results of query2box vs. several variants on FB15k, FB15k-237 and NELL995.

4.4 Main Results

We start by comparing our q2b with state-of-the-art query embedding method gqe (Hamilton et al., 2018) on FB15k, FB15k-237, and NELL995. As listed in Tables 2, our method significantly and consistently outperforms the state-of-the-art baseline across all the query structures, including those not seen during training as well as those with union operations. On average, we obtain 9.8% (25% relative), 3.8% (15% relative), and 5.9% (24% relative) higher H@3 than the best baselines on FB15k, FB15k-237, and NELL995, respectively. Notice that naïvely increasing embedding dimensionality in gqe yields limited performance improvement. Our q2b is able to effectively model a large set of entities by using the box embedding, and achieves a significant performance gain compared with gqe-double (with same number of parameters) that represents queries as point vectors. Also notice that q2b performs well on new queries with the same structure as the training queries as well as on new query structures never seen during training, which demonstrates that q2b generalizes well within and beyond query structures.

We also conduct extensive ablation studies (Table 3). We summarize the results as follows:

Importance of attention mechanism. First, we show that our modeling of intersection using the attention mechanism is important. Given a set of box embeddings to intersect, q2b-avg is the most naïve way to calculate the center of the resulting box embedding, while q2b-deepsets is too flexible and neglects the fact that the center should be a weighted average of the input box centers. Compared with the two methods, q2b achieves better performance in answering queries that involve the intersection operation, e.g., 2i, 3i, pi, ip. Specifically, on FB15k-237, q2b obtains more than 4% and 2% absolute gain in H@3 compared to q2b-avg and q2b-deepsets, respectively.
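To make the contrast between plain averaging and attention-weighted averaging concrete, the following is a minimal sketch in pure Python. It is not the authors' implementation: the attention scores here are hypothetical hand-picked numbers, whereas in a trained model they would come from a learned network, and the function names are ours.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def average_center(centers):
    """q2b-avg style: unweighted mean of the input box centers."""
    n = len(centers)
    dim = len(centers[0])
    return [sum(c[d] for c in centers) / n for d in range(dim)]

def attention_center(centers, scores):
    """Attention style: per-dimension convex combination of the centers.

    scores[i][d] is a hypothetical unnormalized score for box i in
    dimension d; it stands in for the output of a learned network.
    """
    dim = len(centers[0])
    out = []
    for d in range(dim):
        weights = softmax([s[d] for s in scores])
        out.append(sum(w * c[d] for w, c in zip(weights, centers)))
    return out

# Two 2-dimensional box centers and illustrative scores.
centers = [[0.0, 2.0], [4.0, 6.0]]
scores = [[0.0, 0.0], [math.log(3.0), 0.0]]
avg = average_center(centers)            # equal weight on both boxes
att = attention_center(centers, scores)  # dimension 0 leans toward box 2
```

Because the weights are a softmax, the resulting center always stays inside the per-dimension range spanned by the input centers, which is the constraint the text says an unconstrained set function can ignore.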

Necessity of training on complex queries. Second, we observe that explicitly training on complex logical queries beyond one-hop path queries (1p in Fig. 4) improves the reasoning performance. Although q2b-avg-1p is able to achieve strong performance on 1p and 2u, where answering 2u is essentially answering two 1p queries with an additional minimum operation (see Eq. 5 in Section 3.3), q2b-avg-1p fails miserably in answering other types of queries involving logical operators. On the other hand, other methods (q2b, q2b-avg, and q2b-deepsets) that are explicitly trained on the logical queries achieve much higher accuracy, with up to 10% absolute average improvement of H@3 on FB15k.
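The "additional minimum operation" for union queries can be sketched as follows. This is a sketch under stated assumptions rather than the reference implementation: it assumes a box is stored as a (center, offset) pair and that the entity-to-box distance is an L1 outside distance plus a down-weighted L1 inside distance. The weight `alpha=0.2` is an illustrative value that is not stated in this section, and the function names are ours.

```python
def box_distance(point, center, offset, alpha=0.2):
    """Distance from an entity point to a single box.

    Assumed form: dist_outside + alpha * dist_inside, both L1 norms.
    alpha=0.2 is an illustrative choice, not a value from this section.
    """
    dist_out = 0.0
    dist_in = 0.0
    for v, c, o in zip(point, center, offset):
        lo, hi = c - o, c + o
        # How far the point lies outside the box along this dimension.
        dist_out += max(v - hi, 0.0) + max(lo - v, 0.0)
        # Distance from the center to the point clipped onto the box.
        clipped = min(hi, max(lo, v))
        dist_in += abs(c - clipped)
    return dist_out + alpha * dist_in

def union_distance(point, boxes, alpha=0.2):
    """Distance to a union query: minimum over its conjunctive boxes."""
    return min(box_distance(point, c, o, alpha) for c, o in boxes)

# A 2u-style query: two 1p boxes, scored by the nearest one.
boxes = [([0.0, 0.0], [1.0, 1.0]), ([10.0, 10.0], [1.0, 1.0])]
d = union_distance([10.0, 12.0], boxes)
```

An entity only needs to be close to one of the boxes to rank highly, which is why a model trained on 1p queries alone can still do well on 2u.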

Adaptive box size for different queries. Third, we investigate the importance of learning adaptive offsets (box size) for different queries. q2b-sharedoffset is a variant of our q2b where all the box embeddings share the same learnable offset. q2b-sharedoffset performs poorly across all types of queries. This is most likely because different queries have different numbers of answer entities, and the adaptive box size enables us to better model this. In fact, we find that the box offset varies significantly across different relations, and one-to-many relations tend to have larger offset embeddings (see Appendix H for the details).

5 Conclusion

In this paper we proposed a reasoning framework called query2box that can effectively model and reason over sets of entities as well as handle EPFO queries in a vector space. Given a logical query, we first transform it into DNF, embed each conjunctive query into a box, and output entities closest to their nearest boxes. Our approach is capable of handling all types of EPFO queries scalably and accurately. Experimental results on standard KGs demonstrate that query2box significantly outperforms the existing work in answering diverse logical queries.


We thank William Hamilton, Rex Ying, and Jiaxuan You for their helpful discussion. W.H. is supported by Funai Overseas Scholarship and Masason Foundation Fellowship. J.L. is a Chan Zuckerberg Biohub investigator. We gratefully acknowledge the support of DARPA under Nos. FA865018C7880 (ASED), N660011924033 (MCS); ARO under Nos. W911NF-16-1-0342 (MURI), W911NF-16-1-0171 (DURIP); NSF under Nos. OAC-1835598 (CINES), OAC-1934578 (HDR); Stanford Data Science Initiative, Wu Tsai Neurosciences Institute, Chan Zuckerberg Biohub, JD.com, Amazon, Boeing, Docomo, Huawei, Hitachi, Observe, Siemens, UST Global.

The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views, policies, or endorsements, either expressed or implied, of DARPA, NIH, ARO, or the U.S. Government.

Appendix A Proof of Theorem 1


To model any EPFO query, we need to at least model a subset of EPFO queries , where the corresponding denotation sets are . For the sake of modeling , without loss of generality, we consider assigning a single entity embedding to all , so there are kinds of entity vectors, . To model all queries in , it is necessary to satisfy the following.


where is the embedding of query . Eq. 7 means that we can learn the kinds of entity vectors such that for every query in , we can obtain its embedding to model the corresponding set using the distance function. Notice that this is agnostic to the specific algorithm to embed query into ; thus, our result is generally applicable to any method that embeds the query into a single vector.

Crucially, satisfying Eq. 7 is equivalent to being able to shatter , i.e., any binary labeling of the points can be perfectly fit by some classifier in the function class. To sum up, in order to model any EPFO query, we need to at least model any query in , which requires the VC dimension of the distance function to be larger than or equal to . ∎

Appendix B Details about Computing in Theorem 1

Given the full KG for the FB15k dataset, our goal is to find conjunctive queries such that are disjoint from each other. For conjunctive queries, we use two types of queries: ‘1p’ and ‘2i’ whose query structures are shown in Figure 4. On FB15k, we instantiate 308,006 queries of type ‘1p’, which we denote by . Out of all the queries in , 129,717 queries have more than one answer entity, and we denote the set of such queries by . We then generate a set of queries of type ‘2i’ by first randomly sampling two queries from and then taking their conjunction; we denote the resulting set of queries by .

Now, we use and to generate a set of conjunctive queries whose denotation sets are disjoint with each other. First, we prepare two empty sets , and . Then, for every , if holds, we let and . This procedure already gives us , where we have conjunctive queries whose denotation sets are disjoint with each other. We can further apply the analogous procedure for , which gives us a further increased , where we have conjunctive queries whose denotation sets are disjoint with each other. Therefore, we get .
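The greedy procedure described above can be sketched in a few lines of Python. This is an illustrative reading of the text, not the authors' script: the function name, the `(query_id, answer_set)` input format, and the toy data are assumptions made for the example.

```python
def select_disjoint(queries, selected=None, covered=None):
    """Greedily keep queries whose answer sets are pairwise disjoint.

    queries: iterable of (query_id, answer_set) pairs (hypothetical format).
    selected / covered: state from a previous pass, so that a second family
    of queries can extend the result of the first.
    """
    selected = [] if selected is None else list(selected)
    covered = set() if covered is None else set(covered)
    for query_id, answers in queries:
        answers = set(answers)
        # Keep the query only if none of its answers has been covered yet.
        if covered.isdisjoint(answers):
            selected.append(query_id)
            covered |= answers
    return selected, covered

# Toy example: first pass over '1p'-style queries, then over '2i'-style ones.
one_p = [("a", {1, 2}), ("b", {2, 3}), ("c", {4})]
two_i = [("d", {5}), ("e", {1, 5})]
selected, covered = select_disjoint(one_p)
selected, covered = select_disjoint(two_i, selected, covered)
```

The number of queries kept after both passes is a lower bound on how many conjunctive queries with mutually disjoint denotation sets exist; the greedy order affects the count but not its validity as a lower bound.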

Appendix C Experiments on Link Prediction

Method         FB15k          FB15k-237      NELL995
               H@3    MRR     H@3    MRR     H@3    MRR
query2box      0.613  0.516   0.331  0.295   0.382  0.303
query2box-1p   0.633  0.531   0.323  0.292   0.415  0.320
TransE         0.611  0.522   0.318  0.289   0.413  0.320
Table 4: Performance comparison on the simple link prediction task on the three datasets.

In Table 4, we report the link prediction performance (no multi-hop logical reasoning required) following the conventional metrics (taking the average over the triples of head, relation, and tail). Here query2box is trained on all five query structures shown in Figure 4, and query2box-1p is trained only on simple 1p queries. We found that our query2box is comparable to or slightly better than TransE on simple link prediction. Note that in the case of simple link prediction, we do not expect a huge performance gain from using box embeddings, as link prediction involves neither logical reasoning nor handling a large set of answer entities. Also, we see that even if we train query2box over diverse queries, its performance on link prediction is still comparable to TransE and query2box-1p, which are trained solely on the link prediction task.
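The conventional averaging used here differs from the per-query averaging described in footnote 4 for the main experiments. The sketch below illustrates the two ways of averaging H@K with made-up ranks; the function names and the input format (one list of answer ranks per query) are assumptions for the example.

```python
def hits_at_k_per_query(ranks_by_query, k=3):
    """Average over queries of the fraction of each query's answers ranked <= k."""
    per_query = [
        sum(rank <= k for rank in ranks) / len(ranks)
        for ranks in ranks_by_query
    ]
    return sum(per_query) / len(per_query)

def hits_at_k_per_pair(ranks_by_query, k=3):
    """Conventional style: pool all (query, answer) pairs, then average."""
    pooled = [rank for ranks in ranks_by_query for rank in ranks]
    return sum(rank <= k for rank in pooled) / len(pooled)

# One generic query with 8 well-ranked answers, one specific query with
# 2 poorly ranked answers (illustrative numbers only).
ranks_by_query = [[1] * 8, [10, 20]]
per_query = hits_at_k_per_query(ranks_by_query)
per_pair = hits_at_k_per_pair(ranks_by_query)
```

In this toy case the pooled score is dominated by the query with many answers, while the per-query score weights both queries equally. For 1p queries with a single answer each, the two averages coincide.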

Appendix D Details on Query Generation

Given , , and as defined in Section 4.1, we generate training, validation and test queries of different query structures. During training, we consider the first 5 kinds of query structures. For evaluation, we consider all the 9 query structures in Fig. 4, containing query structures that are both seen and unseen during training time. We instantiate queries in the following way.

Given a KG and a query structure (which is a DAG), we use pre-order traversal to assign an entity and a relation to each node and edge in the DAG of the query structure to instantiate a query. Namely, we start from the root of the DAG (which is the target node) and sample an entity uniformly from the KG to be the root. Then, for every node connected to the root in the DAG, we choose a relation uniformly from the incoming relations of the root entity in the KG, and a new entity from the set of entities that reach the root entity via that relation in the KG. We assign the relation to the edge and the new entity to the node, and continue the process following the pre-order traversal. This iterative process stops once an entity and a relation have been assigned to every node and edge in the DAG. The leaf nodes in the DAG serve as the anchor nodes. Note that during the entity and relation assignment, we specifically filter out all the degenerated queries, as shown in Fig. 5. Finally, we perform a post-order traversal of the DAG on the KG, starting from the anchor nodes, to obtain the set of answer entities for this query.
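As a rough illustration of this procedure, the sketch below instantiates the simplest kind of structure, a multi-hop path, by walking backward from a sampled target entity and then answers it by walking forward from the anchor. It is a simplified sketch, not the authors' generator: it handles only chain-shaped structures, omits the filtering of degenerated queries, and the function names and the toy KG are hypothetical.

```python
import random
from collections import defaultdict

def instantiate_path_query(triples, hops, rng):
    """Sample a `hops`-hop path query backward from a random target entity.

    Returns (anchor, relations, target) with relations ordered from anchor
    to target, or None if the walk reaches an entity with no incoming edge.
    """
    incoming = {}  # tail -> relation -> list of heads
    entities = set()
    for head, rel, tail in triples:
        incoming.setdefault(tail, {}).setdefault(rel, []).append(head)
        entities.update((head, tail))
    target = rng.choice(sorted(entities))  # root of the query DAG
    node = target
    relations = []
    for _ in range(hops):
        rels = incoming.get(node)
        if not rels:
            return None
        rel = rng.choice(sorted(rels))   # uniform over incoming relations
        node = rng.choice(rels[rel])     # an entity reaching `node` via rel
        relations.append(rel)
    relations.reverse()
    return node, relations, target

def answer_path_query(triples, anchor, relations):
    """Traverse forward from the anchor to collect all answer entities."""
    outgoing = defaultdict(set)
    for head, rel, tail in triples:
        outgoing[(head, rel)].add(tail)
    frontier = {anchor}
    for rel in relations:
        frontier = {t for e in frontier for t in outgoing.get((e, rel), ())}
    return frontier

# Toy KG as (head, relation, tail) triples.
toy_kg = [("a", "r1", "b"), ("b", "r2", "c"), ("a", "r1", "d"),
          ("d", "r2", "c"), ("c", "r3", "e")]
query = instantiate_path_query(toy_kg, hops=2, rng=random.Random(0))
```

By construction, the sampled target is always among the answers returned by the forward traversal, so every instantiated query has at least one answer.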

Figure 5: Example of the degenerated queries, including (1) and appear along one path and (2) same anchor node and relation in intersections.

When generating validation/test queries, we explicitly filter out trivial queries that can be fully answered by subgraph matching on /.

Appendix E Details of NELL995 Dataset

Here we detail our pre-processing of the NELL995 dataset, which was originally presented by Xiong et al. (2017). Following Allen et al. (2019), we first combine the validation and test sets with the training set to create the whole knowledge graph for NELL995. Then we create new validation and test set splits by randomly selecting 20,000 triples each from the whole knowledge graph. Note that we filter out all the entities that only appear in the validation and test sets but not in the training set.
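The re-splitting steps can be sketched as follows. This is one plausible reading, stated as an assumption: entities that appear only in the held-out splits are filtered by dropping the held-out triples that mention them. The function name and the toy data are hypothetical.

```python
import random

def resplit(train, valid, test, n_holdout, rng):
    """Merge the original splits, then draw new validation and test splits.

    Assumption: held-out triples touching an entity that is absent from the
    new training set are dropped.
    """
    all_triples = list(train) + list(valid) + list(test)
    rng.shuffle(all_triples)
    new_valid = all_triples[:n_holdout]
    new_test = all_triples[n_holdout:2 * n_holdout]
    new_train = all_triples[2 * n_holdout:]
    train_entities = {e for head, _, tail in new_train for e in (head, tail)}

    def seen_in_train(triple):
        return triple[0] in train_entities and triple[2] in train_entities

    new_valid = [t for t in new_valid if seen_in_train(t)]
    new_test = [t for t in new_test if seen_in_train(t)]
    return new_train, new_valid, new_test

# Toy chain-shaped KG with 30 triples; hold out 5 triples per split.
triples = [(f"e{i}", "r", f"e{i + 1}") for i in range(30)]
new_train, new_valid, new_test = resplit(
    triples[:20], triples[20:25], triples[25:], n_holdout=5,
    rng=random.Random(0),
)
```

Because of the filtering step, the final validation and test splits can contain slightly fewer triples than the number originally drawn.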

Appendix F Dataset Statistics

Table 5 summarizes the basic statistics of the three datasets used in our experiments. Table 6 summarizes the basic statistics of the generated logical queries.

Dataset Entities Relations Training Edges Validation Edges Test Edges Total Edges
FB15k 14,951 1,345 483,142 50,000 59,071 592,213
FB15k-237 14,505 237 272,115 17,526 20,438 310,079
NELL995 63,361 200 114,213 14,324 14,267 142,804
Table 5: Knowledge graph dataset statistics as well as the split into training, validation, and test sets.
Queries    Training            Validation         Test
Dataset    1p        others    1p       others    1p       others
FB15k      273,710   273,710   59,097   8,000     67,016   8,000
FB15k-237  149,689   149,689   20,101   5,000     22,812   5,000
NELL995    107,982   107,982   16,927   4,000     17,034   4,000
Table 6: Number of training, validation, and test queries generated for different query structures.

Appendix G Hyper-parameters

We use embedding dimensionality of and set , for the loss in Eq. 4. We train all types of training queries jointly. In every iteration, we sample a minibatch of 512 queries for each query structure (details in Appendix D), and we sample 1 answer entity and 128 negative entities for each query. We optimize the loss in Eq. 4 using the Adam optimizer (Kingma and Ba, 2015) with a learning rate of 0.0001. We train all models for 250 epochs, monitor the performance on the validation set, and report the test performance.

Appendix H Analysis of Learned Box Offset Size

Here we study the correlation between the box size (measured by the L1 norm of the box offset) and the average number of entities that are contained in 1p queries using the corresponding relation. Table 7 shows the top 10 relations with smallest/largest box sizes. We observe a clear trend that the size of the box has a strong correlation with the number of entities the box encloses. Specifically, we see that one-to-many relations tend to have larger offset embeddings, which demonstrates that larger boxes are indeed used to model sets of more points (entities).

Top 10 relations with smallest box size #Ent Box size Top 10 relations with largest box size #Ent Box size
/architecture/…/owner 1.0 2.3 /common/…/topic 3616.0 147.0
/base/…/dog_breeds 2.0 4.0 /user/…taxonomy 1.0 137.2
/education/…/campuses 1.0 4.3 /common/…/category 1.3 125.6
/education/…/educational_institution 1.0 4.6 /base/…/administrative_area_type 1.0 123.6
/base/…/collective 1.0 5.1 /medicine/…/legal_status 1.5 114.9
/base/…/member 1.0 5.1 /people/…/spouse 889.8 114.3
/people/…/appointed_by 1.0 5.2 /sports/…/team 397.9 113.9
/base/…/fashion_models_with_this_hair_color 2.0 5.2 /people/…/location_of_ceremony 132.0 108.4
/fictional_universe/…/parents 1.0 5.5 /sports/…/team 83.1 104.5
/american_football/…/team 2.0 6.7 /user/…/subject 495.0 104.2
Table 7: Top 10 relations with smallest/largest box size in FB15k.

Appendix I MRR Results

Dataset    Method      Avg    1p     2p     3p     2i     3i     ip     pi     2u     up
FB15k      q2b         0.410  0.654  0.373  0.274  0.488  0.602  0.194  0.339  0.468  0.301
FB15k      gqe         0.328  0.505  0.320  0.218  0.439  0.536  0.139  0.272  0.300  0.244
FB15k      gqe-double  0.326  0.490  0.300  0.222  0.438  0.532  0.142  0.280  0.285  0.242
FB15k-237  q2b         0.235  0.400  0.225  0.173  0.275  0.378  0.105  0.180  0.198  0.178
FB15k-237  gqe         0.203  0.346  0.193  0.145  0.250  0.355  0.086  0.156  0.145  0.151
FB15k-237  gqe-double  0.205  0.346  0.191  0.144  0.258  0.361  0.087  0.164  0.144  0.149
NELL995    q2b         0.254  0.413  0.227  0.208  0.288  0.414  0.125  0.193  0.266  0.155
NELL995    gqe         0.210  0.311  0.193  0.175  0.273  0.399  0.078  0.168  0.159  0.130
NELL995    gqe-double  0.211  0.309  0.192  0.174  0.275  0.408  0.080  0.170  0.156  0.129
Table 8: MRR results of query2box vs. gqe on FB15k, FB15k-237 and NELL995.
Dataset    Method            Avg    1p     2p     3p     2i     3i     ip     pi     2u     up
FB15k      q2b               0.410  0.654  0.373  0.274  0.488  0.602  0.194  0.339  0.468  0.301
FB15k      q2b-avg           0.396  0.648  0.368  0.270  0.476  0.564  0.182  0.295  0.465  0.300
FB15k      q2b-deepsets      0.402  0.631  0.371  0.269  0.499  0.605  0.181  0.325  0.437  0.298
FB15k      q2b-avg-1p        0.324  0.688  0.236  0.159  0.378  0.435  0.122  0.225  0.498  0.178
FB15k      q2b-sharedoffset  0.296  0.511  0.273  0.199  0.351  0.444  0.132  0.233  0.311  0.213
FB15k-237  q2b               0.235  0.400  0.225  0.173  0.275  0.378  0.105  0.180  0.198  0.178
FB15k-237  q2b-avg           0.219  0.398  0.222  0.171  0.236  0.328  0.100  0.145  0.193  0.177
FB15k-237  q2b-deepsets      0.230  0.395  0.224  0.172  0.264  0.372  0.101  0.168  0.194  0.176
FB15k-237  q2b-avg-1p        0.196  0.410  0.180  0.122  0.217  0.274  0.085  0.127  0.209  0.145
FB15k-237  q2b-sharedoffset  0.180  0.328  0.180  0.131  0.207  0.289  0.083  0.136  0.135  0.132
NELL995    q2b               0.254  0.413  0.227  0.208  0.288  0.414  0.125  0.193  0.266  0.155
NELL995    q2b-avg           0.235  0.406  0.219  0.200  0.251  0.342  0.114  0.174  0.259  0.149
NELL995    q2b-deepsets      0.246  0.405  0.226  0.207  0.275  0.403  0.107  0.182  0.256  0.153
NELL995    q2b-avg-1p        0.227  0.468  0.191  0.160  0.234  0.275  0.094  0.162  0.332  0.125
NELL995    q2b-sharedoffset  0.196  0.318  0.187  0.172  0.228  0.312  0.098  0.156  0.169  0.127
Table 9: MRR results of query2box vs. several variants on FB15k, FB15k-237 and NELL995.


  1. One possible choice here would be to directly use raw box intersection; however, we find that our richer learnable parameterization is more expressive and robust.
  2. For the detailed VC dimensions of these function classes, see Vapnik (2013). Crucially, their VC dimensions are all linear with respect to the number of parameters .
  3. On the simple link prediction (1p query) task, box embeddings provide minor empirical performance improvement over TransE, possibly because simple link prediction does not require modeling large sets of entities, as shown in Table 1. See Appendix C for full experimental results on link prediction.
  4. Note that our evaluation metric is slightly different from the conventional metric (Nickel et al., 2016; Hamilton et al., 2018; Guu et al., 2015), where the average is taken over query-answer pairs. The conventional metric is problematic as it can be significantly biased toward correctly answering generic queries with a huge number of answers, while dismissing fine-grained queries with only a few answers. Here, to treat queries equally regardless of the number of answers they have, we take the average over queries.


  1. On understanding knowledge graph representation. arXiv preprint arXiv:1909.11611. Cited by: Appendix E.
  2. Hierarchical density order embeddings. In International Conference on Learning Representations (ICLR), Cited by: §2.
  3. Neural machine translation by jointly learning to align and translate. In International Conference on Learning Representations (ICLR), Cited by: §3.2.
  4. Data structures for range searching. ACM Computing Surveys (CSUR) 11 (4), pp. 397–409. Cited by: §3.3.
  5. Freebase: a collaboratively created graph database for structuring human knowledge. In ACM SIGMOD international conference on Management of data (SIGMOD), pp. 1247–1250. Cited by: §3.3.
  6. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems (NeurIPS), pp. 2787–2795. Cited by: §2, §3.2, §3.3, §4.1.
  7. Efficient query evaluation on probabilistic databases. VLDB 16 (4), pp. 523–544. Cited by: §1.
  8. The dichotomy of probabilistic inference for unions of conjunctive queries. Journal of the ACM (JACM) 59 (6), pp. 30. Cited by: §3.3.
  9. Chains of reasoning over entities, relations, and text using recurrent neural networks. In European Chapter of the Association for Computational Linguistics (EACL), pp. 132–141. Cited by: §1, §2.
  10. Introduction to lattices and order. Cambridge university press. Cited by: §1, §3.3, §3.3.
  11. Logical and relational learning. Springer Science & Business Media. Cited by: §1.
  12. Relational data mining. In Data Mining and Knowledge Discovery Handbook, pp. 887–911. Cited by: §1.
  13. Representing words as regions in vector space. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning, pp. 57–65. Cited by: §2.
  14. Traversing knowledge graphs in vector space. In Empirical Methods in Natural Language Processing (EMNLP), pp. 318–327. Cited by: §1, §2, §3.1, §3.2, footnote 4.
  15. Embedding logical queries on knowledge graphs. In Advances in Neural Information Processing Systems (NeurIPS), pp. 2027–2038. Cited by: §1, §2, §3.1, §3.2, §3.2, §3.3, §4.3, §4.4, footnote 4.
  16. Learning to represent knowledge graphs with gaussian embedding. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 623–632. Cited by: §2.
  17. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the thirtieth annual ACM symposium on Theory of computing, pp. 604–613. Cited by: §3.3.
  18. Adam: a method for stochastic optimization. In International Conference on Learning Representations (ICLR), Cited by: Appendix G.
  19. Introduction to statistical relational learning. MIT press. Cited by: §1.
  20. Querying factorized probabilistic triple databases. In International Semantic Web Conference, pp. 114–129. Cited by: §1.
  21. Learning to predict denotational probabilities for modeling entailment. In Annual Meeting of the Association for Computational Linguistics (ACL), pp. 721–730. Cited by: §2.
  22. Improved representation learning for predicting commonsense ontologies. arXiv preprint arXiv:1708.00549. Cited by: §2.
  23. Smoothing the geometry of probabilistic box embeddings. In International Conference on Learning Representations (ICLR), Cited by: §2, §2.
  24. Efficient estimation of word representations in vector space. In International Conference on Learning Representations (ICLR), Cited by: §3.2.
  25. A review of relational machine learning for knowledge graphs. Proceedings of the IEEE 104 (1), pp. 11–33. Cited by: §1, footnote 4.
  26. Rotate: knowledge graph embedding by relational rotation in complex space. In International Conference on Learning Representations (ICLR), Cited by: §3.2.
  27. Observed versus latent features for knowledge base and text inference. In Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality, pp. 57–66. Cited by: §4.1.
  28. The nature of statistical learning theory. Springer science & business media. Cited by: footnote 2.
  29. Order-embeddings of images and language. In International Conference on Learning Representations (ICLR), Cited by: §2.
  30. I. on the diagrammatic and mechanical representation of propositions and reasonings. The London, Edinburgh, and Dublin philosophical magazine and journal of science 10 (59), pp. 1–18. Cited by: §1, §2.
  31. Probabilistic embedding of knowledge graphs with box lattice measures. In Annual Meeting of the Association for Computational Linguistics (ACL), Cited by: §2, §2.
  32. Word representations via gaussian embedding. In International Conference on Learning Representations (ICLR), Cited by: §2.
  33. Deeppath: a reinforcement learning method for knowledge graph reasoning. In Empirical Methods in Natural Language Processing (EMNLP), Cited by: Appendix E, §4.1.
  34. Deep sets. In Advances in Neural Information Processing Systems (NeurIPS), pp. 3391–3401. Cited by: §3.2, §4.3.