Multi-relational Poincaré Graph Embeddings
Abstract
Hyperbolic embeddings have recently gained attention in machine learning due to their ability to represent hierarchical data more accurately and succinctly than their Euclidean analogues. However, multi-relational knowledge graphs often exhibit multiple simultaneous hierarchies, which current hyperbolic models do not capture. To address this, we propose a model that embeds multi-relational graph data in the Poincaré ball model of hyperbolic space. Our Multi-Relational Poincaré model (MuRP) learns relation-specific parameters to transform entity embeddings by Möbius matrix-vector multiplication and Möbius addition. Experiments on the hierarchical WN18RR knowledge graph show that our multi-relational Poincaré embeddings outperform their Euclidean counterpart and existing embedding methods on the link prediction task, particularly at lower dimensionality.
Ivana Balažević Carl Allen Timothy Hospedales School of Informatics University of Edinburgh {ivana.balazevic, carl.allen, t.hospedales}@ed.ac.uk
Preprint. Under review.
1 Introduction
Hyperbolic space can be thought of as a continuous analogue of discrete trees, making it suitable for modelling hierarchical data structures Sarkar (2011); De Sa et al. (2018). Various types of hierarchical data have recently been embedded in hyperbolic space Nickel and Kiela (2017, 2018); Gulcehre et al. (2019); Tifrea et al. (2019), requiring relatively few dimensions and achieving promising results on downstream tasks. This demonstrates the advantage of modelling tree-like structures in spaces with constant negative curvature (hyperbolic) over zero-curvature spaces (Euclidean). More recently, tools needed to construct hyperbolic neural networks have been developed Ganea et al. (2018a); Bécigneul and Ganea (2019), facilitating the use of hyperbolic embeddings in downstream tasks.
Certain data structures, such as knowledge graphs, often exhibit multiple hierarchies simultaneously. For example, lion is near the top of the animal food chain but near the bottom in a tree of taxonomic mammal types Miller (1995). Despite the widespread use of hyperbolic geometry in representation learning, the only existing approach to embedding hierarchical multi-relational graph data in hyperbolic space Suzuki et al. (2019) does not outperform Euclidean models. The difficulty with representing multi-relational data in hyperbolic space lies in finding a way to represent entities (nodes), shared across relations, such that they form a different hierarchy under different relations, e.g. nodes near the root of the tree under one relation may be leaf nodes under another. Further, many state-of-the-art approaches to modelling multi-relational data, such as DistMult Yang et al. (2015), ComplEx Trouillon et al. (2016), and TuckER Balažević et al. (2019) (i.e. bilinear models), rely on the inner product as a similarity measure, and there is no clear correspondence to the Euclidean inner product in hyperbolic space Tifrea et al. (2019) by which these models can be converted. Existing translational approaches that use Euclidean distance to measure similarity, such as TransE Bordes et al. (2013) and STransE Nguyen et al. (2016), can be converted to the hyperbolic domain, but do not currently compete with the bilinear models in terms of predictive performance. However, it has recently been shown in the closely related field of word embeddings Allen and Hospedales (2019) that the difference (i.e. relation) between word pairs that form analogies manifests as a vector offset, justifying a translational approach to modelling relations.
In this paper, we propose MuRP, a theoretically inspired method to embed hierarchical multi-relational data in the Poincaré ball model of hyperbolic space. Unlike Euclidean space, where all space is equivalent in terms of distance, in the Poincaré ball, points closer to the origin have a relatively low distance to all other points, whereas distance grows exponentially towards the boundary of the ball. This makes the origin suitable for embedding the root of a tree and the space near the boundary suitable for leaf nodes. MuRP learns relation-specific parameters that transform entity embeddings by Möbius matrix-vector multiplication and Möbius addition Ungar (2001). The model outperforms not only its Euclidean counterpart, but also current state-of-the-art models on the link prediction task on the hierarchical WN18RR dataset. We also show that our Poincaré embeddings require far fewer dimensions than Euclidean embeddings to achieve comparable performance. We visualize the learned embeddings and analyze the properties of the Poincaré model compared to its Euclidean analogue, such as convergence rate, performance per relation, and influence of embedding dimensionality.
2 Background and preliminaries
Multi-relational link prediction A knowledge graph is a multi-relational graph representation of a collection $\mathcal{G}$ of facts (or triples) of the form $(e_s, r, e_o) \in \mathcal{E} \times \mathcal{R} \times \mathcal{E}$, where $\mathcal{E}$ denotes the set of entities and $\mathcal{R}$ denotes the set of binary relations between them. The presence of $(e_s, r, e_o) \in \mathcal{G}$ indicates that subject entity $e_s$ is related to object entity $e_o$ by relation $r$. In a multi-relational graph representation of $\mathcal{G}$, nodes correspond to entities and typed directed edges represent relations, i.e. nodes for $e_s$ and $e_o$ are linked by a directed edge of type $r$ if and only if $(e_s, r, e_o) \in \mathcal{G}$. Given a set of facts $\mathcal{G}$, the task of multi-relational link prediction is to predict triples that are true in the underlying knowledge domain. A perfect encoding of $\mathcal{G}$ would simply recall known facts. However, knowledge graphs are typically incomplete, so the aim is to infer other facts that are true but missing from $\mathcal{G}$. Typically, a score function $\phi : \mathcal{E} \times \mathcal{R} \times \mathcal{E} \to \mathbb{R}$ is learned, that assigns a score $s = \phi(e_s, r, e_o)$ to each triple, indicating the strength of prediction that a particular triple corresponds to a true fact. A non-linearity, such as the logistic sigmoid function, is often used to convert the score to a predicted probability $p = \sigma(s) \in [0, 1]$ of the triple being true.
Knowledge graph relations exhibit multiple properties, such as symmetry, asymmetry, and transitivity. Certain knowledge graph relations, such as “hypernym” and “has_part”, induce a hierarchical structure over entities, suggesting that embedding them in hyperbolic rather than Euclidean space may lead to improved representations Sarkar (2011); Nickel and Kiela (2017, 2018); Ganea et al. (2018b); Tifrea et al. (2019). Based on this intuition, we focus on embedding multirelational knowledge graph data in hyperbolic space.
Hyperbolic geometry of the Poincaré ball The Poincaré ball model is one of five isometric models of hyperbolic geometry Cannon et al. (1997), each offering different perspectives for performing mathematical operations in hyperbolic space. The isometry means there exists a one-to-one distance-preserving mapping from the metric space of one model $(\mathcal{M}_1, d_1)$ onto that of another $(\mathcal{M}_2, d_2)$, where $\mathcal{M}_i$ are sets and $d_i$ distance functions, or metrics, providing a notion of equivalence between the models.
The Poincaré ball $(\mathbb{B}^d_c, g^{\mathbb{B}})$ of radius $1/\sqrt{c}$ ($c > 0$) is a $d$-dimensional manifold $\mathbb{B}^d_c = \{\mathbf{x} \in \mathbb{R}^d : c\|\mathbf{x}\|^2 < 1\}$ equipped with the Riemannian metric $g^{\mathbb{B}}_{\mathbf{x}} = (\lambda^c_{\mathbf{x}})^2 g^{\mathbb{E}}$, which is conformal to the Euclidean metric $g^{\mathbb{E}} = \mathbf{I}_d$ (i.e. angle-preserving with respect to the Euclidean space Ganea et al. (2018a)) with the conformal factor $\lambda^c_{\mathbf{x}} = 2/(1 - c\|\mathbf{x}\|^2)$. The distance between two points $\mathbf{x}, \mathbf{y} \in \mathbb{B}^d_c$ is measured along a geodesic (i.e. shortest path between the points, see Figure 0(a)) and is given by:

$$d_{\mathbb{B}}(\mathbf{x}, \mathbf{y}) = \frac{2}{\sqrt{c}} \tanh^{-1}\!\big(\sqrt{c}\,\|{-\mathbf{x}} \oplus_c \mathbf{y}\|\big), \quad (1)$$

where $\|\cdot\|$ denotes the Euclidean norm and $\oplus_c$ represents Möbius addition Ungar (2001); Ganea et al. (2018a):

$$\mathbf{x} \oplus_c \mathbf{y} = \frac{(1 + 2c\langle\mathbf{x}, \mathbf{y}\rangle + c\|\mathbf{y}\|^2)\,\mathbf{x} + (1 - c\|\mathbf{x}\|^2)\,\mathbf{y}}{1 + 2c\langle\mathbf{x}, \mathbf{y}\rangle + c^2\|\mathbf{x}\|^2\|\mathbf{y}\|^2}, \quad (2)$$

with $\langle\cdot, \cdot\rangle$ being the Euclidean inner product.
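To make these operations concrete, here is a minimal NumPy sketch of Möbius addition (Equation 2) and the induced Poincaré distance (Equation 1). The function names are ours and the curvature defaults to $c = 1$; this is an illustrative sketch, not the authors' implementation.

```python
import numpy as np

def mobius_add(x, y, c=1.0):
    """Möbius addition x ⊕_c y in the Poincaré ball (Equation 2)."""
    xy = np.dot(x, y)
    x2 = np.dot(x, x)
    y2 = np.dot(y, y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c**2 * x2 * y2
    return num / den

def poincare_dist(x, y, c=1.0):
    """Geodesic distance between two points in the ball (Equation 1)."""
    diff = mobius_add(-x, y, c)
    return (2 / np.sqrt(c)) * np.arctanh(np.sqrt(c) * np.linalg.norm(diff))
```

Note that `mobius_add(np.zeros(d), y)` returns `y` (the origin is the identity element), and the hyperbolic distance between two points always exceeds their Euclidean distance, since the conformal factor satisfies $\lambda^c_{\mathbf{x}} \geq 2$.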
Each point $\mathbf{x} \in \mathbb{B}^d_c$ has a tangent space $T_{\mathbf{x}}\mathbb{B}^d_c$, a $d$-dimensional vector space, that is a local first-order approximation of the manifold around $\mathbf{x}$, which for the Poincaré ball is a $d$-dimensional Euclidean space, i.e. $T_{\mathbf{0}}\mathbb{B}^d_c = \mathbb{R}^d$. The exponential map $\exp^c_{\mathbf{x}} : T_{\mathbf{x}}\mathbb{B}^d_c \to \mathbb{B}^d_c$ allows one to move on the manifold from $\mathbf{x}$ in the direction of a vector $\mathbf{v}$, tangential to $\mathbb{B}^d_c$ at $\mathbf{x}$. The inverse is the logarithmic map $\log^c_{\mathbf{x}} : \mathbb{B}^d_c \to T_{\mathbf{x}}\mathbb{B}^d_c$. For the Poincaré ball, these are defined Ganea et al. (2018a) as:

$$\exp^c_{\mathbf{x}}(\mathbf{v}) = \mathbf{x} \oplus_c \left(\tanh\!\left(\sqrt{c}\,\frac{\lambda^c_{\mathbf{x}}\|\mathbf{v}\|}{2}\right)\frac{\mathbf{v}}{\sqrt{c}\,\|\mathbf{v}\|}\right), \quad (3)$$

$$\log^c_{\mathbf{x}}(\mathbf{y}) = \frac{2}{\sqrt{c}\,\lambda^c_{\mathbf{x}}}\tanh^{-1}\!\big(\sqrt{c}\,\|{-\mathbf{x}} \oplus_c \mathbf{y}\|\big)\,\frac{-\mathbf{x} \oplus_c \mathbf{y}}{\|{-\mathbf{x}} \oplus_c \mathbf{y}\|}. \quad (4)$$

Ganea et al. (2018a) show that matrix-vector multiplication in hyperbolic space (Möbius matrix-vector multiplication) can be obtained by projecting a point $\mathbf{x} \in \mathbb{B}^d_c$ onto the tangent space at $\mathbf{0}$ with $\log^c_{\mathbf{0}}(\mathbf{x})$, performing matrix multiplication by $\mathbf{M}$ in the Euclidean tangent space, and projecting back to $\mathbb{B}^d_c$ via the exponential map at $\mathbf{0}$, i.e.:

$$\mathbf{M} \otimes_c \mathbf{x} = \exp^c_{\mathbf{0}}\!\big(\mathbf{M}\,\log^c_{\mathbf{0}}(\mathbf{x})\big). \quad (5)$$
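At the origin $\lambda^c_{\mathbf{0}} = 2$, so Equations 3 and 4 simplify considerably, and Möbius matrix-vector multiplication (Equation 5) composes them. A NumPy sketch (our naming, curvature $c = 1$ by default):

```python
import numpy as np

def exp0(v, c=1.0):
    """Exponential map at the origin: tangent space -> Poincaré ball (Eq. 3 at x = 0)."""
    n = np.linalg.norm(v)
    if n < 1e-15:
        return np.zeros_like(v)
    return np.tanh(np.sqrt(c) * n) * v / (np.sqrt(c) * n)

def log0(y, c=1.0):
    """Logarithmic map at the origin: Poincaré ball -> tangent space (Eq. 4 at x = 0)."""
    n = np.linalg.norm(y)
    if n < 1e-15:
        return np.zeros_like(y)
    return np.arctanh(np.sqrt(c) * n) * y / (np.sqrt(c) * n)

def mobius_matvec(M, x, c=1.0):
    """Möbius matrix-vector multiplication M ⊗_c x (Equation 5)."""
    return exp0(M @ log0(x, c), c)
```

As a sanity check, `log0` inverts `exp0`, and multiplying by the identity matrix leaves a point in the ball unchanged.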
3 Related work
3.1 Hyperbolic geometry
Embedding hierarchical data in hyperbolic space has recently gained popularity in representation learning. Nickel and Kiela (2017) first embedded the transitive closure¹ of the WordNet noun hierarchy in the Poincaré ball, showing that low-dimensional hyperbolic embeddings can significantly outperform higher-dimensional Euclidean embeddings in terms of both representation capacity and generalization ability. The same authors subsequently embedded hierarchical data in the Lorentz model of hyperbolic geometry Nickel and Kiela (2018). [¹ In the transitive closure, each node in a directed graph is connected not only to its children, but to every descendant, i.e. all nodes to which there exists a directed path from the starting node.]
Ganea et al. (2018a) introduced Hyperbolic Neural Networks, connecting hyperbolic geometry with deep learning. They build on the definitions for Möbius addition, Möbius scalar multiplication, exponential and logarithmic maps of Ungar (2001) to derive expressions for linear layers, bias translation and application of non-linearity in the Poincaré ball. Hyperbolic analogues of several other algorithms have been developed since, such as Poincaré GloVe Tifrea et al. (2019) and Hyperbolic Attention Networks Gulcehre et al. (2019). More recently, Gu et al. (2019) note that data can be non-uniformly hierarchical and learn embeddings on a product manifold with components of different curvature: spherical, hyperbolic and Euclidean. To our knowledge, only Riemannian TransE Suzuki et al. (2019) seeks to embed multi-relational data in hyperbolic space, but the Riemannian translation method fails to outperform Euclidean baselines.
3.2 Link prediction for knowledge graphs
Bilinear models typically represent relations as linear transformations acting on entity vectors. An early model, RESCAL Nickel et al. (2011), optimizes a score function $\phi(e_s, r, e_o) = \mathbf{e}_s^\top \mathbf{M}_r \mathbf{e}_o$, containing the bilinear product between the subject entity embedding $\mathbf{e}_s$, a full rank relation matrix $\mathbf{M}_r$ and the object entity embedding $\mathbf{e}_o$. RESCAL is prone to overfitting due to the number of parameters per relation being quadratic relative to the number per entity. DistMult Yang et al. (2015) is a special case of RESCAL with diagonal relation matrices, reducing parameters per relation and controlling overfitting. However, due to its symmetry, DistMult cannot model asymmetric relations. ComplEx Trouillon et al. (2016) extends DistMult to the complex domain, enabling asymmetry to be modelled. TuckER Balažević et al. (2019) performs a Tucker decomposition of the tensor of triples, which enables information sharing between different relations via the core tensor. The authors show each of the linear models above to be a special case of TuckER.
Translational models regard a relation as a translation (or vector offset) from the subject to the object entity embeddings. These models include TransE Bordes et al. (2013) and its many successors, e.g. FTransE Feng et al. (2016), STransE Nguyen et al. (2016). The score function for translational models typically considers Euclidean distance between the translated subject entity embedding and the object entity embedding.
4 Multi-relational Poincaré embeddings
A set of entities can form different hierarchies under different relations. In the WordNet knowledge graph Miller (1995), the “hypernym”, “has_part” and “member_meronym” relations each induce different hierarchies over the same set of entities. For example, the noun chair is a parent node to different chair types (e.g. folding_chair, armchair) under the relation “hypernym” and both chair and its types are parent nodes to parts of a typical chair (e.g. backrest, leg) under the relation “has_part”. An ideal embedding model should capture all hierarchies simultaneously.
Score function Bilinear multi-relational models measure similarity between the subject entity embedding (after relation-specific transformation) and an object entity embedding using the Euclidean inner product Nickel et al. (2011); Yang et al. (2015); Trouillon et al. (2016); Balažević et al. (2019). However, a clear correspondence to the Euclidean inner product does not exist in hyperbolic space Tifrea et al. (2019). The Euclidean inner product can be expressed as a function of Euclidean distances and norms, i.e. $\langle\mathbf{x}, \mathbf{y}\rangle = \frac{1}{2}\big(\|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 - d_E(\mathbf{x}, \mathbf{y})^2\big)$, with $d_E(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\|$. Noting this, in Poincaré GloVe, Tifrea et al. (2019) absorb squared norms into biases and replace the Euclidean with the Poincaré distance to obtain the hyperbolic version of GloVe Pennington et al. (2014).
Separately, it has recently been shown in the closely related field of word embeddings that statistics pertaining to analogies naturally contain linear structures Allen and Hospedales (2019), explaining why similar linear structure appears amongst word embeddings of Word2Vec Mikolov et al. (2013a, b); Levy and Goldberg (2014). Analogies are word relationships of the form “$a$ is to $b$ as $c$ is to $d$”, such as “man is to woman as king is to queen”, and are in principle not restricted to two pairs (e.g. “…as brother is to sister”). It can be seen that analogies have much in common with relations in multi-relational graphs, as a difference between pairs of words (or entities) common to all pairs, e.g. if $(e_s^{(1)}, r, e_o^{(1)})$ and $(e_s^{(2)}, r, e_o^{(2)})$ hold, then we could say “$e_s^{(1)}$ is to $e_o^{(1)}$ as $e_s^{(2)}$ is to $e_o^{(2)}$”. Of particular relevance is the demonstration that the common difference, i.e. relation, between the word pairs (e.g. (man, woman) and (king, queen)) manifests as a common vector offset Allen and Hospedales (2019), justifying the previously heuristic translational approach to modelling relations.
Inspired by these two ideas, we define the basis score function for multi-relational graph embedding:

$$\phi(e_s, r, e_o) = -d\big(\mathbf{e}_s^{(r)}, \mathbf{e}_o^{(r)}\big)^2 + b_s + b_o = -d\big(\mathbf{R}\mathbf{e}_s,\; \mathbf{e}_o + \mathbf{r}\big)^2 + b_s + b_o, \quad (6)$$

where $d(\cdot, \cdot)$ is a distance function, $\mathbf{e}_s, \mathbf{e}_o \in \mathbb{R}^d$ and $b_s, b_o \in \mathbb{R}$ are the embeddings and scalar biases of the subject and object entities $e_s$ and $e_o$ respectively. $\mathbf{R} \in \mathbb{R}^{d \times d}$ is a diagonal relation matrix and $\mathbf{r} \in \mathbb{R}^d$ a translation vector (i.e. vector offset) of relation $r$. $\mathbf{e}_s^{(r)} = \mathbf{R}\mathbf{e}_s$ and $\mathbf{e}_o^{(r)} = \mathbf{e}_o + \mathbf{r}$ represent the subject and object entity embeddings after applying the respective relation-specific transformations, a stretch by $\mathbf{R}$ to $\mathbf{e}_s$ and a translation by $\mathbf{r}$ to $\mathbf{e}_o$.
Hyperbolic model Taking the hyperbolic analogue of Equation 6, we define the score function for our Multi-Relational Poincaré (MuRP) model as:

$$\phi_{MuRP}(e_s, r, e_o) = -d_{\mathbb{B}}\big(\mathbf{h}_s^{(r)}, \mathbf{h}_o^{(r)}\big)^2 + b_s + b_o = -d_{\mathbb{B}}\big(\exp^c_{\mathbf{0}}(\mathbf{R}\log^c_{\mathbf{0}}(\mathbf{h}_s)),\; \mathbf{h}_o \oplus_c \mathbf{r}_h\big)^2 + b_s + b_o, \quad (7)$$

where $\mathbf{h}_s, \mathbf{h}_o \in \mathbb{B}^d_c$ are hyperbolic embeddings of the subject and object entities $e_s$ and $e_o$ respectively, and $\mathbf{r}_h \in \mathbb{B}^d_c$ is a hyperbolic translation vector of relation $r$. The relation-adjusted subject entity embedding $\mathbf{h}_s^{(r)}$ is obtained by Möbius matrix-vector multiplication: the original subject entity embedding $\mathbf{h}_s$ is projected to the tangent space of the Poincaré ball at $\mathbf{0}$ with $\log^c_{\mathbf{0}}$, transformed by the diagonal relation matrix $\mathbf{R} \in \mathbb{R}^{d \times d}$, and then projected back to the Poincaré ball by $\exp^c_{\mathbf{0}}$. The relation-adjusted object entity embedding $\mathbf{h}_o^{(r)}$ is obtained by Möbius addition of the relation vector $\mathbf{r}_h$ to the object entity embedding $\mathbf{h}_o$. Since the relation matrix $\mathbf{R}$ is diagonal, the number of parameters of MuRP increases linearly with the number of entities and relations, making it scalable to large knowledge graphs. To obtain the predicted probability of a fact being true, we apply the logistic sigmoid to the score, i.e. $p = \sigma(\phi_{MuRP}(e_s, r, e_o))$.
To directly compare the properties of hyperbolic embeddings with the Euclidean, we implement the Euclidean version of Equation 6 with $d(\mathbf{e}_s^{(r)}, \mathbf{e}_o^{(r)}) = \|\mathbf{e}_s^{(r)} - \mathbf{e}_o^{(r)}\|$. We refer to this model as the Multi-Relational Euclidean (MuRE) model.
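To illustrate that the two score functions differ only in the choice of geometry, the following sketch implements Equations 6 and 7 for a single triple with a diagonal relation matrix stored as a vector. All function names are ours, the curvature is fixed at $c = 1$ by default, and this is an illustrative sketch rather than the authors' released implementation.

```python
import numpy as np

def _mobius_add(x, y, c=1.0):
    xy, x2, y2 = x @ y, x @ x, y @ y
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    return num / (1 + 2 * c * xy + c**2 * x2 * y2)

def _dist_poincare(x, y, c=1.0):
    return (2 / np.sqrt(c)) * np.arctanh(
        np.sqrt(c) * np.linalg.norm(_mobius_add(-x, y, c)))

def _exp0(v, c=1.0):
    n = np.linalg.norm(v) + 1e-15
    return np.tanh(np.sqrt(c) * n) * v / (np.sqrt(c) * n)

def _log0(y, c=1.0):
    n = np.linalg.norm(y) + 1e-15
    return np.arctanh(np.sqrt(c) * n) * y / (np.sqrt(c) * n)

def score_mure(e_s, e_o, R_diag, r_vec, b_s, b_o):
    """MuRE score (Equation 6): Euclidean stretch of subject, translation of object."""
    d = np.linalg.norm(R_diag * e_s - (e_o + r_vec))  # R is diagonal -> elementwise
    return -d**2 + b_s + b_o

def score_murp(h_s, h_o, R_diag, r_vec, b_s, b_o, c=1.0):
    """MuRP score (Equation 7): Möbius matvec of subject, Möbius addition to object."""
    h_s_r = _exp0(R_diag * _log0(h_s, c), c)   # relation-adjusted subject
    h_o_r = _mobius_add(h_o, r_vec, c)          # relation-adjusted object
    return -_dist_poincare(h_s_r, h_o_r, c)**2 + b_s + b_o
```

With an identity relation (unit stretch, zero offset) and zero biases, both scores are zero when subject and object embeddings coincide; the biases then shift the score, and hence the decision boundary, per entity pair.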
Geometric intuition We see from Equation 6 that the biases $b_s, b_o$ determine the radius of a hypersphere decision boundary centered at $\mathbf{e}_s^{(r)}$. Entities $e_s$ and $e_o$ are predicted to be related by $r$ if the relation-adjusted $\mathbf{e}_o^{(r)}$ falls within a hypersphere of radius determined by $b_s + b_o$ (see Figure 0(b)). Since biases are subject and object entity-specific, each subject-object pair induces a different decision boundary. The relation-specific parameters $\mathbf{R}$ and $\mathbf{r}$ determine the position of the relation-adjusted embeddings, but the radius of the entity-specific decision boundary is independent of the relation. The score function in Equation 6 resembles the score functions of existing translational models Bordes et al. (2013); Feng et al. (2016); Nguyen et al. (2016), with the main difference being the entity-specific biases, which can be seen to change the geometry of the model. Rather than considering an entity as a point in space, each bias defines an entity-specific sphere of influence surrounding the center given by the embedding vector (see Figure 0(c)). The overlap between spheres measures relatedness between entities. We can thus think of each relation as moving the spheres of influence in space, so that only the spheres of subject and object entities that are connected under that relation overlap.
4.1 Training and Riemannian optimization
To train both models, we generate $k$ negative samples for each true triple $(e_s, r, e_o)$, where we corrupt either the subject or the object entity with a randomly chosen entity from the set of all entities $\mathcal{E}$. Both models are trained to minimize the Bernoulli negative log-likelihood loss:

$$\mathcal{L}(y, p) = -\frac{1}{N}\sum_{i=1}^{N}\Big(y^{(i)}\log p^{(i)} + \big(1 - y^{(i)}\big)\log\big(1 - p^{(i)}\big)\Big), \quad (8)$$

where $p$ is the predicted probability, $y$ is the binary label indicating whether a sample is positive or negative and $N$ is the number of training samples.
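A minimal sketch of the loss (Equation 8) and the uniform corruption scheme described above; function names and the random-generator interface are ours:

```python
import numpy as np

def nll_loss(p, y):
    """Bernoulli negative log-likelihood (Equation 8), averaged over N samples."""
    eps = 1e-12  # numerical guard against log(0)
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def corrupt(triple, n_entities, k, rng):
    """Generate k negatives by replacing the subject or object with a random entity."""
    s, r, o = triple
    negs = []
    for _ in range(k):
        e = int(rng.integers(n_entities))
        # corrupt the subject or the object with equal probability
        negs.append((e, r, o) if rng.random() < 0.5 else (s, r, e))
    return negs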
For fairness of comparison, we optimize the Euclidean model using stochastic gradient descent (SGD) and the hyperbolic model using Riemannian stochastic gradient descent (RSGD) Bonnabel (2013). We note that the Riemannian equivalent of adaptive optimization methods has recently been developed Bécigneul and Ganea (2019), but leave replacing SGD and RSGD with their adaptive equivalent to future work. To compute the Riemannian gradient $\nabla_R\mathcal{L}$, the Euclidean gradient $\nabla_E\mathcal{L}$ is multiplied by the inverse of the Poincaré metric tensor:

$$\nabla_R\mathcal{L} = \frac{1}{(\lambda^c_{\theta})^2}\,\nabla_E\mathcal{L}. \quad (9)$$

Instead of the Euclidean update step $\theta \leftarrow \theta - \eta\nabla_E\mathcal{L}$, a first order approximation of the true Riemannian update, we use the exponential map at $\theta$ to project the gradient onto its corresponding geodesic on the Poincaré ball and compute the Riemannian update:

$$\theta \leftarrow \exp^c_{\theta}\big({-\eta}\,\nabla_R\mathcal{L}\big), \quad (10)$$

where $\eta$ denotes the learning rate.
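The RSGD step of Equations 9 and 10 can be sketched as follows (NumPy, our naming; the parameter is treated as a single vector in the ball with curvature $c = 1$, and the exponential map follows Equation 3):

```python
import numpy as np

def _mobius_add(x, y, c=1.0):
    xy, x2, y2 = x @ y, x @ x, y @ y
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    return num / (1 + 2 * c * xy + c**2 * x2 * y2)

def exp_map(x, v, c=1.0):
    """Exponential map at x (Equation 3)."""
    n = np.linalg.norm(v)
    if n < 1e-15:
        return x
    lam = 2.0 / (1 - c * (x @ x))  # conformal factor λ_x^c
    u = np.tanh(np.sqrt(c) * lam * n / 2) * v / (np.sqrt(c) * n)
    return _mobius_add(x, u, c)

def rsgd_step(theta, euclid_grad, lr, c=1.0):
    """One RSGD update: rescale the gradient (Eq. 9), follow the geodesic (Eq. 10)."""
    lam = 2.0 / (1 - c * (theta @ theta))
    riem_grad = euclid_grad / lam**2  # inverse metric tensor scaling
    return exp_map(theta, -lr * riem_grad, c)
```

Because the update moves along a geodesic via the exponential map, the parameter always remains inside the ball, with no explicit projection needed.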
5 Experiments
To evaluate both Poincaré and Euclidean models, we first test their performance on the knowledge graph link prediction task using the standard WN18RR and FB15k-237 datasets:
FB15k-237 Toutanova et al. (2015) is a subset of Freebase, a database of real world facts, created from FB15k Bordes et al. (2013) by removing the inverse of many relations from validation and test sets to make the dataset more challenging. FB15k-237 contains 14,541 entities and 237 relations.
WN18RR Dettmers et al. (2018) is a subset of WordNet, a hierarchical database of relations between words, created in the same way as FB15k-237 from WN18 Bordes et al. (2013). WN18RR contains 40,943 entities and 11 relations.
We evaluate each triple from the test set as in Bordes et al. (2013): we generate $2 \times n_e$ evaluation triples for each test triple (where $n_e$ denotes the number of entities in the dataset) by keeping the subject entity $e_s$ and relation $r$ fixed and replacing the object entity $e_o$ with all possible entities, and similarly keeping $e_o$ and $r$ fixed and varying $e_s$. The scores obtained for each evaluation triple are ranked. All true triples are removed from the evaluation triples apart from the current test triple, i.e. the commonly used filtered setting Bordes et al. (2013). We evaluate our models using the evaluation metrics standard across the link prediction literature: mean reciprocal rank (MRR) and hits@$k$, $k \in \{1, 3, 10\}$. Mean reciprocal rank is the average of the inverse of the rank assigned to the true triple over all evaluation triples. Hits@$k$ measures the percentage of times the true triple appears in the top $k$ ranked evaluation triples.
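The filtered ranking protocol and both metrics can be sketched as follows (our function names; `scores` holds one model score per candidate entity, and `known_idx` holds the indices of all entities known to complete the current triple, which are filtered out apart from the test entity itself):

```python
import numpy as np

def filtered_rank(scores, true_idx, known_idx):
    """Rank of the true entity after filtering the other known true entities."""
    s = scores.astype(float).copy()
    mask = [i for i in known_idx if i != true_idx]
    s[mask] = -np.inf  # filtered setting: exclude other true triples
    # rank = 1 + number of remaining candidates scoring above the true entity
    return 1 + int(np.sum(s > s[true_idx]))

def mrr_hits(ranks, ks=(1, 3, 10)):
    """Mean reciprocal rank and hits@k over a list of per-triple ranks."""
    ranks = np.asarray(ranks, dtype=float)
    mrr = float(np.mean(1.0 / ranks))
    hits = {k: float(np.mean(ranks <= k)) for k in ks}
    return mrr, hits
```

For example, with candidate scores `[0.9, 0.5, 0.7]`, a true entity at index 1 and a known true entity at index 0, the filtered rank is 2: index 0 is excluded, and only index 2 outscores the true entity.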
5.1 Implementation details
We implement both models in PyTorch and make our code publicly available.² We choose the learning rate by MRR on the validation set and find that the best learning rate is shared by both models, though it differs between WN18RR and FB15k-237. We initialize all embeddings near the origin where distances are small in hyperbolic space, similar to Nickel and Kiela (2017). We set the batch size to 128 and fix the number of negative samples per positive triple. In all experiments, we fix the curvature of MuRP, since preliminary experiments showed that any material change reduced performance. [² https://github.com/ibalazevic/multirelationalpoincare]
5.2 Link prediction results
Table 1 shows the results obtained for both datasets. As expected, MuRE performs slightly better on the non-hierarchical FB15k-237 dataset, whereas MuRP outperforms on WN18RR, which contains hierarchical relations (as shown in Section 5.3). Both MuRE and MuRP outperform previous state-of-the-art models on WN18RR on all metrics apart from hits@1, where MuRP obtains the second best result overall. In fact, this is maintained even at relatively low embedding dimensionality, demonstrating the ability of hyperbolic models to succinctly represent multiple hierarchies. On FB15k-237, MuRE is outperformed only by TuckER Balažević et al. (2019), a model capable of multi-task learning between relations, which is highly advantageous on that dataset due to its large number of relations compared to WN18RR, and thus relatively little data per relation in some cases.
WN18RR  FB15k237  
MRR  Hits@10  Hits@3  Hits@1  MRR  Hits@10  Hits@3  Hits@1  
TransE Bordes et al. (2013)  
DistMult Yang et al. (2015)  
ComplEx Trouillon et al. (2016)  
Neural LP Yang et al. (2017)  
MINERVA Das et al. (2018)  
ConvE Dettmers et al. (2018)  
ComplEx-N3 Lacroix et al. (2018)  
M-Walk Shen et al. (2018)  
TuckER Balažević et al. (2019)  
RotatE Sun et al. (2019)  
MuRE  
MuRE  
MuRP  
MuRP 
5.3 MuRE vs MuRP
Effect of dimensionality We compare the MRR achieved by MuRE and MuRP on WN18RR for embeddings of a range of dimensionalities $d$. As expected, the difference between MRRs is greatest at lower embedding dimensionality (see Figure 1(a)).
Convergence rate Figure 1(b) shows the MRR per epoch for MuRE and MuRP on the WN18RR training and validation sets, showing that MuRP also converges faster.
Performance per relation Since not every relation in WN18RR induces a hierarchical structure over the entities, we report the Krackhardt hierarchy score (Khs) Krackhardt (2014) of the entity graph formed by each relation to obtain a measure of the hierarchy it induces. The score is defined only for directed networks and measures the proportion of node pairs $(i, j)$ for which there exists a directed path from $i$ to $j$, but not from $j$ to $i$ (see Appendix A for further details). The score takes a value of one for all directed acyclic graphs, and zero for cycles and cliques. We also report the length of the longest path (i.e. tree depth) for hierarchical relations, as both need to be considered. To gain insight into which relations benefit most from embedding entities in hyperbolic space, we compare Hits@10 per relation of MuRE and MuRP for entity embeddings of low dimensionality. From Table 2 we see that both models achieve comparable performance on non-hierarchical, symmetric relations with a Krackhardt hierarchy score of 0, such as “similar_to” and “verb_group”, whereas MuRP generally outperforms MuRE on hierarchical relations. We also see that the difference between the performances of MuRE and MuRP is generally larger for relations that form deeper trees, fitting the hypothesis that hyperbolic space is of most benefit for modelling hierarchical relations.
Computing the Krackhardt hierarchy score for FB15k237, we find that of the relations have , however, the average of longest path lengths over those relations is with only relations having paths longer than 2, meaning that the vast majority of relational subgraphs consist of directed edges between pairs of nodes, rather than a tree.
Relation Name  MuRE  MuRP  Khs  Longest Path  
hypernym  
has_part  
member_meronym  
also_see  
synset_domain_topic_of  
instance_hypernym  
member_of_domain_region  
member_of_domain_usage  
derivationally_related_form  
similar_to  
verb_group 
Biases vs embedding vector norms We plot the norms versus the biases for MuRP and MuRE in Figure 3. This shows an overall correlation between embedding vector norm and bias (or radius of the sphere of influence) for both MuRE and MuRP. This makes sense intuitively, as the sphere of influence increases to “fill out the space” in regions that are less cluttered, i.e. further from the origin.
Spatial layout In Figure 4, we show a 40-dimensional subject embedding for the word asia and a random subset of 1500 object embeddings for the hierarchical WN18RR relation “has_part”, projected to 2 dimensions so that distances and angles of object entity embeddings relative to the subject entity embedding are preserved (see Appendix B for details of the projection method). We show subject and object entity embeddings before and after relation-specific transformation. For both MuRE and MuRP, we see that applying the relation-specific transformation separates true object entities from false ones. However, in the Poincaré model, where distances increase further from the origin, embeddings are moved further towards the boundary of the disk, where, loosely speaking, there is more space to separate and therefore distinguish them.
Quality of learned embeddings Here we analyze the false positives and false negatives predicted by both models. MuRP predicts 15 false positives and 0 false negatives, whereas MuRE predicts only 2 false positives and 1 false negative, so seemingly performs better. However, inspecting the false positives predicted by MuRP, we find they are all countries on the Asian continent (e.g. sri_lanka, palestine, malaysia, sakartvelo, thailand), so are actually correct, but missing from the dataset. MuRE’s predicted false positives (philippines and singapore) are both also correct but missing, whereas the false negative (bahrain) is indeed falsely predicted. We note that this suggests current evaluation methods may be unreliable.
6 Conclusion and future work
We introduce a novel, theoretically inspired, translational method for embedding multi-relational graph data in the Poincaré ball model of hyperbolic geometry. Our Multi-Relational Poincaré model MuRP learns relation-specific parameters to transform entity embeddings by Möbius matrix-vector multiplication and Möbius addition. We show that MuRP outperforms its Euclidean counterpart MuRE and existing models on the link prediction task on the hierarchical WN18RR knowledge graph dataset, and requires far lower dimensionality than its Euclidean analogue to achieve comparable performance. We analyze various properties of the Poincaré model compared to its Euclidean analogue and provide insight through a visualization of the learned embeddings.
Future work may include investigating the impact of recently introduced Riemannian adaptive optimization methods compared to Riemannian SGD. Also, given not all relations in a knowledge graph are hierarchical, we may look into combining the Euclidean and hyperbolic models to produce mixedcurvature embeddings that best fit the curvature of the data.
Acknowledgements
We thank Rik Sarkar, Ivan Titov and Jonathan Mallinson for helpful comments on this manuscript. Ivana Balažević and Carl Allen were supported by the Centre for Doctoral Training in Data Science, funded by EPSRC (grant EP/L016427/1) and the University of Edinburgh.
References
 Allen and Hospedales [2019] Carl Allen and Timothy Hospedales. Analogies Explained: Towards Understanding Word Embeddings. In International Conference on Machine Learning, 2019.
 Balažević et al. [2019] Ivana Balažević, Carl Allen, and Timothy M Hospedales. TuckER: Tensor Factorization for Knowledge Graph Completion. arXiv preprint arXiv:1901.09590, 2019.
 Bécigneul and Ganea [2019] Gary Bécigneul and Octavian-Eugen Ganea. Riemannian Adaptive Optimization Methods. In International Conference on Learning Representations, 2019.
 Bonnabel [2013] Silvere Bonnabel. Stochastic Gradient Descent on Riemannian Manifolds. IEEE Transactions on Automatic Control, 2013.
 Bordes et al. [2013] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. Translating Embeddings for Modeling Multi-relational Data. In Advances in Neural Information Processing Systems, 2013.
 Cannon et al. [1997] James W Cannon, William J Floyd, Richard Kenyon, Walter R Parry, et al. Hyperbolic Geometry. Flavors of Geometry, 31:59–115, 1997.
 Das et al. [2018] Rajarshi Das, Shehzaad Dhuliawala, Manzil Zaheer, Luke Vilnis, Ishan Durugkar, Akshay Krishnamurthy, Alex Smola, and Andrew McCallum. Go for a Walk and Arrive at the Answer: Reasoning over Paths in Knowledge Bases Using Reinforcement Learning. In International Conference on Learning Representations, 2018.
 De Sa et al. [2018] Christopher De Sa, Albert Gu, Christopher Ré, and Frederic Sala. Representation Tradeoffs for Hyperbolic Embeddings. In International Conference on Machine Learning, 2018.
 Dettmers et al. [2018] Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, and Sebastian Riedel. Convolutional 2D Knowledge Graph Embeddings. In Association for the Advancement of Artificial Intelligence, 2018.
 Feng et al. [2016] Jun Feng, Minlie Huang, Mingdong Wang, Mantong Zhou, Yu Hao, and Xiaoyan Zhu. Knowledge Graph Embedding by Flexible Translation. In KR, pages 557–560, 2016.
 Ganea et al. [2018a] Octavian Ganea, Gary Bécigneul, and Thomas Hofmann. Hyperbolic Neural Networks. In Advances in Neural Information Processing Systems, 2018a.
 Ganea et al. [2018b] Octavian-Eugen Ganea, Gary Bécigneul, and Thomas Hofmann. Hyperbolic Entailment Cones for Learning Hierarchical Embeddings. In International Conference on Machine Learning, 2018b.
 Gu et al. [2019] Albert Gu, Frederic Sala, Beliz Gunel, and Christopher Ré. Learning Mixed-Curvature Representations in Product Spaces. In International Conference on Learning Representations, 2019.
 Gulcehre et al. [2019] Caglar Gulcehre, Misha Denil, Mateusz Malinowski, Ali Razavi, Razvan Pascanu, Karl Moritz Hermann, Peter Battaglia, Victor Bapst, David Raposo, Adam Santoro, and Nando de Freitas. Hyperbolic Attention Networks. In International Conference on Learning Representations, 2019.
 Krackhardt [2014] David Krackhardt. Graph Theoretical Dimensions of Informal Organizations. In Computational organization theory. Psychology Press, 2014.
 Lacroix et al. [2018] Timothée Lacroix, Nicolas Usunier, and Guillaume Obozinski. Canonical Tensor Decomposition for Knowledge Base Completion. In International Conference on Machine Learning, 2018.
 Levy and Goldberg [2014] Omer Levy and Yoav Goldberg. Linguistic Regularities in Sparse and Explicit Word Representations. In Proceedings of the 18th conference on Computational Natural Language Learning, 2014.
 Mikolov et al. [2013a] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed Representations of Words and Phrases and their Compositionality. In Advances in Neural Information Processing Systems, 2013a.
 Mikolov et al. [2013b] Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. Linguistic Regularities in Continuous Space Word Representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2013b.
 Miller [1995] George A Miller. WordNet: a Lexical Database for English. Communications of the ACM, 1995.
 Nguyen et al. [2016] Dat Quoc Nguyen, Kairit Sirts, Lizhen Qu, and Mark Johnson. STransE: a Novel Embedding Model of Entities and Relationships in Knowledge Bases. In NAACL-HLT, 2016.
 Nickel et al. [2011] Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. A Three-Way Model for Collective Learning on Multi-Relational Data. In International Conference on Machine Learning, 2011.
 Nickel and Kiela [2017] Maximillian Nickel and Douwe Kiela. Poincaré Embeddings For Learning Hierarchical Representations. In Advances in Neural Information Processing Systems, 2017.
 Nickel and Kiela [2018] Maximillian Nickel and Douwe Kiela. Learning Continuous Hierarchies in the Lorentz Model of Hyperbolic Geometry. In International Conference on Machine Learning, 2018.
 Pennington et al. [2014] Jeffrey Pennington, Richard Socher, and Christopher Manning. Glove: Global Vectors for Word Representation. In Empirical Methods in Natural Language Processing, 2014.
 Sarkar [2011] Rik Sarkar. Low Distortion Delaunay Embedding of Trees in Hyperbolic Plane. In International Symposium on Graph Drawing, 2011.
 Shen et al. [2018] Yelong Shen, Jianshu Chen, Po-Sen Huang, Yuqing Guo, and Jianfeng Gao. M-Walk: Learning to Walk over Graphs using Monte Carlo Tree Search. In Advances in Neural Information Processing Systems, 2018.
 Sun et al. [2019] Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, and Jian Tang. RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space. In International Conference on Learning Representations, 2019.
 Suzuki et al. [2019] Atsushi Suzuki, Yosuke Enokida, and Kenji Yamanishi. Riemannian TransE: Multi-relational Graph Embedding in Non-Euclidean Space, 2019. URL https://openreview.net/forum?id=r1xRW3A9YX.
 Tifrea et al. [2019] Alexandru Tifrea, Gary Bécigneul, and OctavianEugen Ganea. Poincaré GloVe: Hyperbolic Word Embeddings. In International Conference on Learning Representations, 2019.
 Toutanova et al. [2015] Kristina Toutanova, Danqi Chen, Patrick Pantel, Hoifung Poon, Pallavi Choudhury, and Michael Gamon. Representing Text for Joint Embedding of Text and Knowledge Bases. In Empirical Methods in Natural Language Processing, 2015.
 Trouillon et al. [2016] Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. Complex Embeddings for Simple Link Prediction. In International Conference on Machine Learning, 2016.
 Ungar [2001] Abraham A Ungar. Hyperbolic Trigonometry and its Application in the Poincaré Ball Model of Hyperbolic Geometry. Computers & Mathematics with Applications, 41(1-2):135–147, 2001.
 Yang et al. [2015] Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. In International Conference on Learning Representations, 2015.
 Yang et al. [2017] Fan Yang, Zhilin Yang, and William W Cohen. Differentiable Learning of Logical Rules for Knowledge Base Reasoning. In Advances in Neural Information Processing Systems, 2017.
Appendix A Krackhardt hierarchy score
Let $\mathbf{R} \in \{0, 1\}^{n \times n}$ be the binary reachability matrix of a directed graph $G$ with $n$ nodes, with $\mathbf{R}_{ij} = 1$ if there exists a directed path from node $i$ to node $j$ and $\mathbf{R}_{ij} = 0$ otherwise. The Krackhardt hierarchy score of $G$ Krackhardt [2014] is defined as:

$$\text{Khs}_G = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} \mathbf{R}_{ij}\,(1 - \mathbf{R}_{ji})}{\sum_{i=1}^{n}\sum_{j=1}^{n} \mathbf{R}_{ij}}. \quad (11)$$
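A direct NumPy implementation of this score, computing the reachability matrix from an adjacency matrix by repeated boolean squaring, might look like the following (function name ours; self-paths are excluded so that cycles and cliques score zero):

```python
import numpy as np

def khs(adj):
    """Krackhardt hierarchy score (Equation 11) from a binary adjacency matrix."""
    n = adj.shape[0]
    # transitive closure: boolean powers of (A + I) reach all path lengths
    R = ((adj + np.eye(n, dtype=int)) > 0).astype(int)
    for _ in range(int(np.ceil(np.log2(max(n, 2))))):
        R = ((R @ R) > 0).astype(int)
    np.fill_diagonal(R, 0)  # ignore trivial i -> i reachability
    total = R.sum()
    if total == 0:
        return 0.0
    asym = np.sum(R * (1 - R.T))  # pairs reachable one way but not the other
    return asym / total
```

A three-node chain (a directed acyclic graph) scores 1.0, while a two-node cycle scores 0.0, matching the properties stated in Section 5.3.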
Appendix B Dimensionality reduction method
To project high-dimensional embeddings to 2 dimensions for visualization purposes, we use the following method to compute the projected dimensions $(x_k, y_k)$ of each entity $e_k$:

1. $x_k = \frac{\langle \mathbf{e}_k, \mathbf{e}_s \rangle}{\|\mathbf{e}_s\|}$, where $\mathbf{e}_s$ is the original high-dimensional subject entity embedding and $k \in \{1, \dots, n\}$ indexes the $n$ object entity embeddings.

2. $y_k = \left\|\mathbf{e}_k - x_k \frac{\mathbf{e}_s}{\|\mathbf{e}_s\|}\right\|$.

This projects the reference subject entity embedding onto the $x$-axis ($y = 0$) and all object entity embeddings are positioned relative to it, according to their component $x_k$ aligned with the subject entity and their “remaining” component $y_k$.
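A NumPy sketch of this projection (our naming); since $y_k$ is the norm of the component of $\mathbf{e}_k$ orthogonal to $\mathbf{e}_s$, the projection preserves both each object's norm and its distance to the subject exactly, which can be checked algebraically:

```python
import numpy as np

def project_2d(subject, objects):
    """Project object embeddings to 2D, preserving norms and distances to the subject."""
    u = subject / np.linalg.norm(subject)     # unit vector along the subject embedding
    xs = objects @ u                          # component aligned with the subject
    ys = np.linalg.norm(objects - np.outer(xs, u), axis=1)  # remaining component
    subj_2d = np.array([np.linalg.norm(subject), 0.0])      # subject lies on the x-axis
    return subj_2d, np.stack([xs, ys], axis=1)
```

The preservation follows from the orthogonal decomposition $\|\mathbf{e}_k - \mathbf{e}_s\|^2 = (x_k - \|\mathbf{e}_s\|)^2 + y_k^2$.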