TeRo: A Time-aware Knowledge Graph Embedding via Temporal Rotation
Abstract
In the last few years, there has been a surge of interest in learning representations of entities and relations in knowledge graphs (KGs). However, the recent availability of temporal knowledge graphs (TKGs), which contain time information for each fact, has created the need for reasoning over time in such TKGs. In this regard, we present a new approach of TKG embedding, TeRo, which defines the temporal evolution of an entity embedding as a rotation from the initial time to the current time in the complex vector space. Specifically, for facts involving time intervals, each relation is represented as a pair of dual complex embeddings to handle the beginning and the end of the relation, respectively. We show that our proposed model overcomes the limitations of the existing KG embedding models and TKG embedding models and has the ability to learn and infer various relation patterns over time. Experimental results on four different TKGs show that TeRo significantly outperforms the existing state-of-the-art models for link prediction. In addition, we analyze the effect of time granularity on link prediction over TKGs, which, as far as we know, has not been investigated in previous literature.
1 Introduction
This work is licensed under a Creative Commons Attribution 4.0 International License. License details: http://creativecommons.org/licenses/by/4.0/.
In recent years, a number of sizable Knowledge Graphs (KGs) have been constructed, including DBpedia [1], YAGO [20], NELL [4] and Freebase [2]. In these KGs, a fact is represented as a triple (s, r, o), where s (subject) and o (object) are entities (nodes), and r (relation) is the relation (edge) between them.
Several KG embedding (KGE) models have been developed to perform learning and inference over these KGs [3, 27, 23, 21, 28]. The most common learning task for these models is link prediction, i.e., completing a fact with a missing entity. For instance, one can use a KGE model to perform an object query like (Barack Obama, visits, ?). In this case, there are several valid answers to this question, regardless of the time factor. Obviously, the inclusion of time information can make this query more specific, e.g., (Barack Obama, visits, ?, 2014-07-08).
Some temporal KGs (TKGs), including ICEWS [12], GDELT [14], YAGO3 [17] and Wikidata [6], store billions of time-aware facts as quadruples (s, r, o, t), where t is the time annotation, e.g., (Barack Obama, visits, Ukraine, 2014-07-08). The availability of these TKGs, which exhibit complex temporal dynamics in addition to their multi-relational nature, has created the need for approaches that can characterize and reason over them. Traditional KGE models disregard time information, which makes them ineffective at link prediction on TKGs involving temporary relations (e.g., visits, lives in, etc.).
To tackle this problem, TKG embedding (TKGE) models encode time information in their embeddings. Such TKGE models [22, 13, 7, 16, 5, 10] have been shown to perform better at link prediction over TKGs than traditional KGE models. However, most of the existing TKGE models are extensions of TransE [3] and DistMult [27], and thus are not fully expressive for some relation patterns [21].
In this paper, we propose a novel TKGE approach, TeRo, which defines the temporal evolution of an entity embedding as a rotation from the initial time to the current time in the complex vector space. We show the limitations of the existing TKGE models and the advantage of our proposed model in learning various relation patterns over time.
Specifically, for facts involving time intervals, each relation is represented as a pair of dual complex embeddings which handle the beginning and the end of the relation, respectively. In this way, TeRo can adapt well to datasets where time annotations are represented in various forms: time points, beginning or end times, and time intervals.
As far as we know, most previous TKGE-related works use fixed time granularities for various TKGs. For example, the time granularity of the ICEWS datasets is fixed as 24 hours in [5, 25]. In this work, we adopt various time-division approaches for different TKG datasets and investigate the effect of the length of the time steps on the performance of our model.
To verify our approach, we compare the performance of our proposed model on link prediction over four different TKGs with state-of-the-art KGE models and existing TKGE models. The experimental results demonstrate that our proposed model significantly outperforms the baseline models by inferring various relation patterns and encoding time information.
2 Related Work
KGE models can be roughly classified into distance-based models and semantic matching models.
Distance-based models measure the plausibility of a fact as the distance between the two entity embeddings, usually after a translation or rotation carried out by the relation. A typical example of distance-based models is TransE [3]. TransE exhibits deficiencies when learning 1-n relations. Thus, various extensions of TransE [24, 9, 15, 18] were proposed to tackle this problem. They use different mapping methods to project entities from the entity space to relation-specific spaces. Specifically, RotatE [21] defines each relation as a rotation from the subject to the object. Nevertheless, these distance-based models are still unable to capture reflexive relations, i.e., relations r via which each entity is related to itself. In distance-based models, the translation vectors or the rotation phases of all reflexive relations are enforced to be 0, which prevents fully expressing the semantic characteristics of these relations.
Semantic matching models measure the plausibility of facts by matching the latent semantics of entities and relations embodied in their embedding representations. A few examples of such models include RESCAL [19], DistMult [27], ComplEx [23], QuatE [28] and GeomE [26]. DistMult cannot capture asymmetric relations since the score of a triple (s, r, o) is always equal to the score of its symmetric counterpart (o, r, s). ComplEx, QuatE and GeomE have been proven able to capture various relation patterns for static KGs, but cannot model temporary relations in TKGs due to their ignorance of time information.
Recent research illustrates that the performance of KGE models can be further improved by incorporating time information from TKGs. Some TKGE models are extended from TransE, e.g., TTransE [13], TA-TransE [7], HyTE [5] and ATiSE [25]. Others are temporal extensions of DistMult, e.g., Know-Evolve [22], TDistMult [16] and TA-DistMult [7]. Similar to TransE and DistMult, these TKGE models have issues with capturing reflexive relations or asymmetric relations. Specifically, DE-SimplE [8] incorporates time information into diachronic entity embeddings and has the capability of modeling various relation patterns. However, this approach only focuses on event-based TKG datasets and cannot model facts involving time intervals shaped like [2003-##-##, 2005-##-##].
3 A Novel TKGE Approach based on Temporal Rotation
Although various KGE models have been developed to learn multi-relational interactions between entities, all of them have problems with inferring temporary relations which are only valid at a certain time point or last for a certain time period. To illustrate this by example, assume we are given the quadruple (Barack Obama, visits, France, 2014-02-12) as a training sample, where the relation visits is a temporary relation. If we query (Barack Obama, visits, ?, 2014-07-08), a trained static KGE model will probably return the incorrect answer France due to the validness of the triple (Barack Obama, visits, France), while the correct answer is Ukraine considering the given time constraint. On the other hand, most of the existing TKGE models, which were extended from TransE [3] and DistMult [27], incorporate time information in the embedding space, but have limitations in learning reflexive relations or asymmetric relations, as discussed in Section 2.
To overcome the limitations of these existing KGE and TKGE models in learning and inferring over TKGs, we propose a new TKGE model, TeRo, which defines the temporal evolution of an entity embedding as a rotation in the complex vector space. Let $\mathcal{E}$ denote the set of entities, $\mathcal{R}$ denote the set of relations, and $\mathcal{T}$ denote the set of time steps. Then a TKG is a collection of factual quadruples $(s, r, o, t)$, where $s, o \in \mathcal{E}$ are the subject and object entities, $r \in \mathcal{R}$ is the relation, and $t$ denotes the actual time when the fact occurs. For any time $t$, we have a time step $\tau \in \mathcal{T}$ representing this actual time. We map $s$, $r$, $o$ and $\tau$ to their complex embeddings, i.e., $\mathbf{s}, \mathbf{r}, \mathbf{o}, \mathbf{t} \in \mathbb{C}^d$; then we define the functional mapping induced by each time step as an element-wise rotation from the time-independent entity embeddings $\mathbf{s}$ and $\mathbf{o}$ to the time-specific entity embeddings $\mathbf{s}_t$ and $\mathbf{o}_t$. The mapping function is defined as follows:
$\mathbf{s}_t = \mathbf{s} \circ \mathbf{t}, \qquad \mathbf{o}_t = \mathbf{o} \circ \mathbf{t}$ (1)
where $\circ$ denotes the element-wise (Hadamard) product between complex vectors. Here, we constrain the modulus of each element of $\mathbf{t}$, i.e., $t_i$, to be $|t_i| = 1$. By doing this, $t_i$ is of the form $e^{i\theta_{t,i}}$, which corresponds to a counterclockwise rotation by $\theta_{t,i}$ radians around the origin of the complex plane, and only affects the phases of the entity embeddings in the complex vector space. This idea is motivated by Euler's identity $e^{i\theta} = \cos\theta + i\sin\theta$, which indicates that a unitary complex number can be regarded as a rotation in the complex plane.
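The temporal rotation of Eq. (1) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' released implementation; the 4-dimensional embeddings are made-up:

```python
import numpy as np

def time_rotation(entity: np.ndarray, theta_t: np.ndarray) -> np.ndarray:
    """Rotate a complex entity embedding element-wise by a time step's phases.

    Each t_i = exp(i * theta_{t,i}) has unit modulus, so the product shifts
    only the phase of each coordinate and leaves its modulus untouched.
    """
    t = np.exp(1j * theta_t)  # unit-modulus complex time embedding
    return entity * t         # element-wise (Hadamard) product

rng = np.random.default_rng(0)
s = rng.normal(size=4) + 1j * rng.normal(size=4)  # hypothetical entity embedding
theta = rng.uniform(0.0, 2.0 * np.pi, size=4)     # hypothetical time-step phases

s_t = time_rotation(s, theta)
assert np.allclose(np.abs(s_t), np.abs(s))        # rotation preserves moduli
```

The modulus check at the end reflects the constraint $|t_i| = 1$: rotation changes phases only.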
For a single quadruple $(s, r, o, t) \in \mathcal{G}^+$, where $\mathcal{G}^+$ denotes the set of all positive quadruples, we regard the relation embedding $\mathbf{r}$ as a translation from the time-specific subject embedding $\mathbf{s}_t$ to the conjugate of the time-specific object embedding $\bar{\mathbf{o}}_t$. The score function is defined as:
$f(s, r, o, t) = \|\mathbf{s}_t + \mathbf{r} - \bar{\mathbf{o}}_t\|$ (2)
For a fact $(s, r, o, t)$ occurring in a certain time interval, i.e., $t = [t_b, t_e]$, where $t_b$ and $t_e$ denote the beginning time and the end time of the fact, we separate this fact into two quadruples, namely, $(s, r, o, t_b)$ and $(s, r, o, t_e)$. Here, we extend the relation set of a TKG which involves time intervals to a pair of dual relation sets, $\mathcal{R}_b$ and $\mathcal{R}_e$. A relation $r_b \in \mathcal{R}_b$ is used to handle the beginning of relation $r$, while a relation $r_e \in \mathcal{R}_e$ is used to handle the end of relation $r$. By doing this, we score a fact $(s, r, o, [t_b, t_e])$ as the mean value of the scores of the two quadruples $(s, r_b, o, t_b)$ and $(s, r_e, o, t_e)$, which represent the beginning and the end of this fact, respectively.
$f(s, r, o, [t_b, t_e]) = \frac{1}{2}\big(\|\mathbf{s}_{t_b} + \mathbf{r}_b - \bar{\mathbf{o}}_{t_b}\| + \|\mathbf{s}_{t_e} + \mathbf{r}_e - \bar{\mathbf{o}}_{t_e}\|\big)$ (3)
Specifically, for a fact missing either the beginning time or the end time, e.g., $(s, r, o, [t_b, -])$ or $(s, r, o, [-, t_e])$, the score of this fact is equal to the score of the quadruple involving the known time, i.e., $f(s, r, o, [t_b, -]) = \|\mathbf{s}_{t_b} + \mathbf{r}_b - \bar{\mathbf{o}}_{t_b}\|$ and $f(s, r, o, [-, t_e]) = \|\mathbf{s}_{t_e} + \mathbf{r}_e - \bar{\mathbf{o}}_{t_e}\|$.
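The scoring of point-in-time facts and interval facts can be sketched as follows, again in NumPy with hypothetical embeddings. This is an illustrative reconstruction of Eqs. (2) and (3), not the released code; lower scores mean more plausible facts:

```python
import numpy as np

def score(s, r, o, theta_t):
    """f(s, r, o, t) = || s∘t + r − conj(o∘t) ||, in the spirit of Eq. (2)."""
    t = np.exp(1j * theta_t)
    return np.linalg.norm(s * t + r - np.conj(o * t))

def score_interval(s, r_begin, r_end, o, theta_tb, theta_te):
    """Interval fact [t_b, t_e]: mean of begin and end scores, as in Eq. (3)."""
    return 0.5 * (score(s, r_begin, o, theta_tb) + score(s, r_end, o, theta_te))

# A "perfect" quadruple: choosing r = conj(o∘t) − s∘t drives the score to 0.
rng = np.random.default_rng(1)
s = rng.normal(size=3) + 1j * rng.normal(size=3)
o = rng.normal(size=3) + 1j * rng.normal(size=3)
theta = rng.uniform(0.0, 2.0 * np.pi, size=3)
r = np.conj(o * np.exp(1j * theta)) - s * np.exp(1j * theta)
assert score(s, r, o, theta) < 1e-9
```

For a fact with a missing beginning or end time, only the known side would be scored, as described above.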
In this paper, we use the same negative sampling loss as proposed in [21] for optimizing our model. This loss function has proven to be very effective for optimizing distance-based KGE models, e.g., TransE, RotatE [21] and ATiSE [25].
$L = -\log \sigma\big(\gamma - f(\xi)\big) - \sum_{k=1}^{\eta} \frac{1}{\eta} \log \sigma\big(f(\xi'_k) - \gamma\big)$ (4)
where $\xi = (s, r, o, t)$ is a positive training quadruple, $\xi'_k$ is the $k$-th negative sample corresponding to $\xi$, generated by randomly corrupting the subject or the object of $\xi$, e.g., $(s', r, o, t)$ or $(s, r, o', t)$, $\sigma$ denotes the sigmoid function, $\gamma$ is a fixed margin and $\eta$ is the ratio of negative over positive training samples.
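The loss of Eq. (4) can be sketched as follows, a hedged NumPy illustration of the RotatE-style negative sampling loss rather than the actual training code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))

def negative_sampling_loss(pos_score, neg_scores, gamma):
    """L = -log σ(γ - f(ξ)) - (1/η) Σ_k log σ(f(ξ'_k) - γ).

    pos_score: distance score of the positive quadruple (lower = better fit).
    neg_scores: scores of the η corrupted quadruples.
    gamma: fixed margin.
    """
    neg_scores = np.asarray(neg_scores, dtype=float)
    return float(-np.log(sigmoid(gamma - pos_score))
                 - np.mean(np.log(sigmoid(neg_scores - gamma))))

# A well-fitted positive (score 0) with badly-fitted negatives gives a small loss.
assert negative_sampling_loss(0.0, [10.0, 10.0], gamma=5.0) < 0.1
```

Minimizing this loss pushes positive scores below the margin $\gamma$ and negative scores above it.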
3.1 Learning Various Relation Patterns
Static KGE models, as well as some existing TKGE models which are temporal extensions of TransE or DistMult, have limitations in capturing key relation patterns, which are defined as follows.
Definition 1.
A relation r is a temporary relation if $(s, r, o, t_1) \in \mathcal{G}^+ \wedge (s, r, o, t_2) \notin \mathcal{G}^+$ holds true for some $t_1 \neq t_2$.
Definition 2.
A relation r is asymmetric if $(s, r, o, t) \in \mathcal{G}^+ \wedge (o, r, s, t) \notin \mathcal{G}^+$ holds true.
Definition 3.
A relation r is a reflexive relation if $(s, r, s, t) \in \mathcal{G}^+$ holds true.
As mentioned in Section 2, static KGE models cannot model temporary relations, e.g., 'visits', since $f(s, r, o, t_1) = f(s, r, o, t_2)$ for any $t_1, t_2$. Temporal extensions of DistMult, including TDistMult, TA-DistMult and Know-Evolve, cannot model asymmetric relations, e.g., 'parentOf', since $\langle \mathbf{s}_t, \mathbf{r}_t, \mathbf{o}_t \rangle = \langle \mathbf{o}_t, \mathbf{r}_t, \mathbf{s}_t \rangle$, where $\mathbf{s}_t$, $\mathbf{o}_t$, $\mathbf{r}_t$ are the time-specific entity/relation embeddings of the respective models. Temporal extensions of TransE, including HyTE, TTransE and TA-TransE, have difficulties modeling multiple reflexive relations, e.g., 'equalTo' and 'subsetOf', since $\mathbf{s}_t + \mathbf{r}_t = \mathbf{s}_t$ enforces $\mathbf{r}_t = \mathbf{0}$ for every reflexive relation.
By defining each time step as a rotation in the complex vector space, TeRo can capture all three of the above relation patterns. Given an observed fact $(s, r, o, t)$ where $\mathbf{s}_t + \mathbf{r} = \bar{\mathbf{o}}_t$:

As shown in Figure 1(b), if $r$ is a temporary relation, we can have $\theta_{t_1} \neq \theta_{t_2}$ for TeRo, so that $\mathbf{s}_{t_1} + \mathbf{r} = \bar{\mathbf{o}}_{t_1}$ holds true while $\mathbf{s}_{t_2} + \mathbf{r} \neq \bar{\mathbf{o}}_{t_2}$.

As shown in Figure 1(c), if $r$ is an asymmetric relation, we can have $\mathbf{s}_t + \mathbf{r} = \bar{\mathbf{o}}_t$ hold true while $\mathbf{o}_t + \mathbf{r} \neq \bar{\mathbf{s}}_t$.

As shown in Figure 1(d), if $r$ is a reflexive relation, we have $\mathbf{r} = \bar{\mathbf{s}}_t - \mathbf{s}_t$ for TeRo. Since this difference depends on the entity embedding, TeRo can represent multiple reflexive relations as different embeddings due to the conjugate operation on object embeddings.
3.2 Complexity
In Table 1, we summarize the scoring functions and the space complexities of several state-of-the-art TKGE approaches, our model, and TransE. $n_e$, $n_r$, $n_t$ and $n_{token}$ are the numbers of entities, relations, time steps and temporal tokens used in [7]; $d$ is the dimensionality of embeddings. $P_t$ denotes the temporal projection of embeddings [5]. LSTM denotes an LSTM neural network; $[\mathbf{r}; \mathbf{t}_{seq}]$ denotes the concatenation of the relation embedding and the sequence of temporal tokens [7]. $\mathbf{s}_t$ and $\mathbf{o}_t$ denote the temporal parts of time-specific diachronic entity embeddings [8]; $\mathbf{r}^{-1}$ denotes the inverse relation embedding of $\mathbf{r}$. $D_{KL}$ denotes the KL divergence between two Gaussian distributions; $P_{s,t}$, $P_{r,t}$ and $P_{o,t}$ denote the Gaussian embeddings of $s$, $r$ and $o$ at time $t$ [25].
As shown in Table 1, the space complexities of TeRo and TransE are close if $n_t \ll n_e$. In practice, we can achieve this condition by tuning the time granularity.
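As a back-of-the-envelope check of this claim, one can count real-valued parameters directly. This is a sketch under the assumption of one complex $d$-vector (i.e., $2d$ reals) per entity, relation and time step:

```python
def tero_param_count(n_entities, n_relations, n_timesteps, dim,
                     dual_relations=False):
    """Rough real-parameter count for a TeRo-style model.

    Each complex dimension costs 2 reals; with dual relation embeddings
    (begin/end), the relation budget doubles.
    """
    n_rel_emb = 2 * n_relations if dual_relations else n_relations
    return 2 * dim * (n_entities + n_rel_emb + n_timesteps)

# ICEWS14 with a 2-day unit: 183 time steps versus 6,869 entities, so time
# embeddings add less than 3% on top of a TransE-like entity/relation budget.
total = tero_param_count(6869, 230, 183, 500)
no_time = tero_param_count(6869, 230, 0, 500)
assert (total - no_time) / no_time < 0.03
```

This is why tuning the time granularity (Section 4.2) keeps the model's memory footprint close to TransE's.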
Model  Scoring Function  Space Complexity 

TransE  $\|\mathbf{s} + \mathbf{r} - \mathbf{o}\|$  $O(n_e d + n_r d)$ 
TTransE  $\|\mathbf{s} + \mathbf{r} + \mathbf{t} - \mathbf{o}\|$  $O(n_e d + n_r d + n_t d)$ 
HyTE  $\|P_t(\mathbf{s}) + P_t(\mathbf{r}) - P_t(\mathbf{o})\|$  $O(n_e d + n_r d + n_t d)$ 
TA-TransE  $\|\mathbf{s} + \mathrm{LSTM}([\mathbf{r}; \mathbf{t}_{seq}]) - \mathbf{o}\|$  $O(n_e d + n_r d + n_{token} d)$ 
TA-DistMult  $\langle \mathbf{s}, \mathrm{LSTM}([\mathbf{r}; \mathbf{t}_{seq}]), \mathbf{o} \rangle$  $O(n_e d + n_r d + n_{token} d)$ 
DE-SimplE  $\frac{1}{2}(\langle \mathbf{s}_t, \mathbf{r}, \mathbf{o}_t \rangle + \langle \mathbf{o}_t, \mathbf{r}^{-1}, \mathbf{s}_t \rangle)$  $O(n_e d + n_r d)$ 
ATiSE  $D_{KL}(P_{s,t} - P_{o,t}, P_{r,t})$  $O(n_e d + n_r d)$ 
TeRo  $\|\mathbf{s}_t + \mathbf{r} - \bar{\mathbf{o}}_t\|$  $O(n_e d + n_r d + n_t d)$
4 Experiments
4.1 Temporal Knowledge Graph Datasets
Common TKG benchmarks include GDELT [8], ICEWS14, ICEWS05-15, YAGO15k, Wikidata11k [7], YAGO11k and Wikidata12k [5]. In this work, we choose ICEWS14, ICEWS05-15, YAGO11k and Wikidata12k as datasets for the following reasons: 1. ICEWS14 and ICEWS05-15 are two well-established event-based datasets which are commonly used in previous literature [7, 8, 25]; these two datasets are subsets of ICEWS [12] corresponding to facts in 2014 and facts between 2005 and 2015, where all time annotations are time points. 2. YAGO15k, Wikidata11k, YAGO11k and Wikidata12k are subsets of YAGO3 [17] and Wikidata [6] where a part of the time annotations are time intervals. In YAGO15k and Wikidata11k, each time interval contains either only a beginning date or only an end date, shaped like 'occurSince 2003' or 'occurUntil 2005', and a part of the facts in YAGO15k exclude time information. Thus we prefer to use YAGO11k and Wikidata12k, where each fact includes time information and time annotations are represented in various forms, i.e., time points like [2003-01-01, 2003-01-01], beginning or end times like [2003-##-##, ####-##-##], and time intervals like [2003-##-##, 2005-##-##]. We list the statistics of the four datasets we use in Table 2.
Dataset  #Entities  #Relations  Time Span  #Training  #Validation  #Test 

ICEWS14  6,869  230  2014  72,826  8,941  8,963 
ICEWS05-15  10,094  251  2005–2015  368,962  46,275  46,092 
YAGO11k  10,623  10  -453–2844  16,406  2,050  2,051 
Wikidata12k  12,554  24  1479–2018  32,497  4,062  4,062 
4.2 Time Granularity
In some recent work [5, 25], the time span of a TKG dataset was split into a number of time steps. For ICEWS14 and ICEWS05-15, the time granularity was fixed as 1 day. For YAGO11k and Wikidata12k, month and day information was dropped, and years mentioned less frequently were clubbed into common time steps, while years with high frequency formed individual time steps, in order to alleviate the effect of the long-tail property of the time data. In other words, the lengths of different time steps differed so as to balance the numbers of triples across time steps. However, it has not been investigated whether the lengths of the time steps affect the performance of TKGE models.
In this work, we test our model with different time units, denoted as $u$, in a range of {1, 2, 3, 7, 14, 30, 90, 365} days for the ICEWS datasets. Dasgupta et al. \shortciteHyTE and Xu et al. \shortciteATiSE applied a minimum threshold of 300 triples per time step during construction for YAGO11k and Wikidata12k. We follow their time-division approaches for these two datasets and test different minimum thresholds, denoted as $thre$, in a range of {1, 3, 10, 30, 100, 300, 1000, 3000, 10000, 30000}. A change of the time granularity reconstructs the set of time steps $\mathcal{T}$. For ICEWS14, when the time unit is 1 day, we have 365 time steps in total and the date 2014-01-02 is represented by the second time step, i.e., $\tau_2$. If the time unit is changed to 2 days, the total number of time steps becomes 183 and the date 2014-01-02 is denoted as $\tau_1$. For YAGO11k, when the minimum threshold $thre = 1$, we have 396 time steps, since 396 different years in total exist as timestamps in YAGO11k. Years like 453, 100 and 2008 are all taken as independent time steps. When $thre$ for YAGO11k rises to 300, the number of time steps drops to 127 and years between 431 and 100 are clubbed into the same time step.
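The two time-division schemes described above can be sketched as follows. The helpers are hypothetical, not the released preprocessing code, and use 0-based step indices:

```python
from datetime import date

def step_fixed_unit(d: date, start: date, unit_days: int) -> int:
    """ICEWS-style: map a date to a fixed-length time step (0-based)."""
    return (d - start).days // unit_days

def steps_by_threshold(year_counts: dict, thre: int) -> dict:
    """YAGO/Wikidata-style: club consecutive years into one step until the
    step holds at least `thre` facts, to counter the long-tail distribution.

    year_counts maps year -> number of facts; returns year -> step index.
    """
    mapping, step, acc = {}, 0, 0
    for year in sorted(year_counts):
        mapping[year] = step
        acc += year_counts[year]
        if acc >= thre:      # step is full; open a new one
            step, acc = step + 1, 0
    return mapping

# With a 1-day unit, 2014-01-02 falls into the second step (index 1);
# with a 2-day unit it shares the first step (index 0) with 2014-01-01.
assert step_fixed_unit(date(2014, 1, 2), date(2014, 1, 1), 1) == 1
assert step_fixed_unit(date(2014, 1, 2), date(2014, 1, 1), 2) == 0
```

Raising `unit_days` or `thre` coarsens the granularity and shrinks the set of time steps, which is exactly the knob varied in the ablation study.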
4.3 Evaluation Metrics
We evaluate our model on the link prediction task over TKGs under the time-wise filtered setting defined in [25, 8]. This task is to complete a time-wise fact with a missing entity. For a test quadruple $(s, r, o, t)$, we first generate candidate quadruples by replacing $s$ or $o$ with all possible entities. Different from the time-unwise filtered setting [3], which filters out all triples appearing in the training, validation or test sets from the candidate list, we only filter out the quadruples existing in the dataset. This ensures that facts which do not appear at time $t$ are still considered as candidates for evaluating the given test quadruple. We obtain the final rank of the test quadruple among the filtered candidate quadruples by sorting their scores.
We use two commonly used evaluation metrics, i.e., Mean Reciprocal Rank (MRR) and Hits@k. The MRR is the mean of the reciprocals of all computed ranks, and Hits@k is the fraction of test quadruples ranked within the top k.
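Both metrics are straightforward to compute from the filtered ranks; a minimal sketch with made-up ranks:

```python
def mrr(ranks):
    """Mean Reciprocal Rank: mean of 1/rank over all test quadruples."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at(ranks, k):
    """Fraction of test quadruples whose filtered rank is at most k."""
    return sum(r <= k for r in ranks) / len(ranks)

ranks = [1, 2, 5, 10, 100]              # hypothetical filtered ranks
assert abs(mrr(ranks) - 0.362) < 1e-9   # (1 + 0.5 + 0.2 + 0.1 + 0.01) / 5
assert hits_at(ranks, 10) == 0.8
```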
4.4 Baselines
We compare our approach with several state-of-the-art KGE approaches and existing TKGE approaches, including TransE [3], DistMult [27], ComplEx-N3 [11], RotatE [21], QuatE [28], TTransE [13], TA-TransE, TA-DistMult [7], DE-SimplE [8] and ATiSE [25]. The results of most baselines are taken from recent work [8, 25] which used the same evaluation protocol as ours. DE-SimplE mainly focuses on event-based datasets and cannot model time intervals or time annotations missing month and day information, which are common in YAGO and Wikidata. Thus its results on YAGO11k and Wikidata12k are unobtainable. Since the original source code of TA-TransE and TA-DistMult [7] is not released, we re-implement these models according to the implementation details reported in the original paper in order to obtain their results on YAGO11k and Wikidata12k.
4.5 Experimental Setup
We implement our proposed model in PyTorch. The code is available at https://github.com/soledad921/ATISE.
We select the optimal hyperparameters by early stopping according to the MRR on the validation set. We restrict the number of iterations to 5000. Following the setup used in [25], the batch size is kept fixed for all datasets, the embedding dimensionality is tuned in {}, the ratio of negative over positive training samples is tuned in {} and the margin is tuned in {1, 2, 3, 5, 10, 20, , 120}. Regarding the optimizer, we choose Adagrad for TeRo and tune the learning rate in a range of {}. Specifically, the time granularity parameters $u$ and $thre$ are also regarded as hyperparameters for TeRo, as mentioned in Section 4.2.
The default configuration for TeRo is as follows: , . Below, we only list the nondefault parameters: , , on ICEWS14; , , on ICEWS0515; , , on YAGO11k; , , on Wikidata12k.
5 Results and Analysis
5.1 Comparative Study
Datasets  ICEWS14  ICEWS05-15  

Metrics  MRR  Hits@1  Hits@3  Hits@10  MRR  Hits@1  Hits@3  Hits@10 
TransE*  .280  .094    .637  .294  .090    .663 
DistMult*  .439  .323    .672  .456  .337    .691 
ComplEx-N3  .467  .347  .527  .716  .481  .362  .535  .729 
RotatE  .418  .291  .478  .690  .304  .164  .355  .595 
QuatE  .471  .353  .530  .712  .482  .370  .529  .727 
TTransE  .255  .074    .601  .271  .084    .616 
HyTE  .297  .108  .416  .655  .316  .116  .445  .681 
TA-TransE*  .275  .095    .625  .299  .096    .668 
TA-DistMult*  .477  .363    .686  .474  .346    .728 
DE-SimplE  .526  .418  .592  .725  .513  .392  .578  .748 
ATiSE  .550  .436  .629  .750  .519  .378  .606  .794 
TeRo  .562  .468  .621  .732  .586  .469  .668  .795 
Datasets  YAGO11k  Wikidata12k  

Metrics  MRR  Hits@1  Hits@3  Hits@10  MRR  Hits@1  Hits@3  Hits@10 
TransE  .100  .015  .138  .244  .178  .100  .192  .339 
DistMult  .158  .107  .161  .268  .222  .119  .238  .460 
ComplEx-N3  .167  .106  .154  .282  .233  .123  .253  .436 
RotatE  .167  .103  .167  .305  .221  .116  .236  .461 
QuatE  .164  .107  .148  .270  .230  .125  .243  .416 
TTransE  .108  .020  .150  .251  .172  .096  .184  .329 
HyTE  .105  .015  .143  .272  .180  .098  .197  .333 
TA-TransE  .127  .027  .160  .326  .178  .030  .267  .429 
TA-DistMult  .161  .103  .171  .292  .218  .122  .232  .447 
ATiSE  .170  .110  .171  .288  .280  .175  .317  .481 
TeRo  .187  .121  .197  .319  .299  .198  .329  .507 
Tables 3 and 4 list the link prediction results of our proposed model and the baseline models on the four datasets. TeRo surpassed all baseline embedding models on all metrics and all datasets, except that ATiSE obtained better Hits@3 and Hits@10 than TeRo on ICEWS14. Compared to ATiSE, TeRo achieved improvements of 1.2 MRR points, 6.7 MRR points, 1.7 MRR points and 1.9 MRR points on ICEWS14, ICEWS05-15, YAGO11k and Wikidata12k, respectively.
5.2 Ablation Study
In this work, we analyze the effect of the time granularity on the performance of our model. As mentioned in Section 4.2, we adopt two different time-division approaches for event-based datasets, i.e., the ICEWS datasets, and for time-wise KGs involving time intervals, i.e., YAGO11k and Wikidata12k. For ICEWS14 and ICEWS05-15, we use time steps with a fixed length, since the distribution of the numbers of facts over time in the ICEWS datasets is relatively uniform, as shown in Figure 2. The time granularities of the ICEWS datasets are equal to the lengths of the time units $u$. On the other hand, the distributions of the numbers of facts over time in YAGO11k and Wikidata12k are long-tailed. Thus we divide the time steps in YAGO11k and Wikidata12k by setting a minimum threshold $thre$ for the number of facts in each time step. The time granularities of these two datasets can be changed by setting different thresholds $thre$.
In ICEWS14, the time distribution is relatively uniform, and thus representing time with a small time granularity can provide more abundant time information. As shown in Figure 3, TeRo with small time granularities, e.g., 1 day, 2 days and 3 days, performed better on ICEWS14 than TeRo with large time granularities regarding MRR and Hits@3. Likewise, our experiments showed the optimal time unit for TeRo on ICEWS05-15 to be 2 days. For Wikidata12k, using a very small time granularity was non-optimal due to the long-tail property of the time data. On the other hand, using an overly large time granularity resulted in an ineffective incorporation of time information. Figure 3 demonstrates the low performance of TeRo with large time granularities. More concretely, when the time unit was 1 year, all time annotations in ICEWS14 were represented by a single uniform time embedding, which made this time embedding temporally meaningless. Table 5 shows a few examples of link prediction results on ICEWS14 for TeRo models with time units of two days and one year.
Link Prediction  TeRo with $u$ = 2 days  TeRo with $u$ = 365 days 

Colombia, Host a visit, ?, 2014-06-04  Kyung-wha Kang  John F. Kelly 
Head of Government (China), visits, ?, 2014-07-04  South Korea  Serbia 
UN Security Council, Criticize or denounce, ?, 2014-08-10  North Korea  Armed Band (South Sudan) 
South Korea, Host a visit, ?, 2014-06-20  Kim Jong-Un  National Security Advisor (Japan) 
Police (Australia), Accuse, ?, 2014-10-22  Criminal (Australia)  Citizen (Australia) 
As shown in Table 5, in many cases TeRo with $u$ = 2 days predicted correctly, while TeRo with $u$ = 365 days gave wrong predictions. We notice that the predictions of TeRo with $u$ = 365 days in Table 5 would be valid if we disregarded the time constraints. For instance, (Colombia, Host a visit, John F. Kelly) happened on 2014-03-27 and (UN Security Council, Criticize or denounce, Armed Band (South Sudan)) was true on 2014-08-07. As mentioned in Section 3, Host a visit and Criticize or denounce are temporary relations. The above results show that using a reasonable time granularity helps TeRo to incorporate time information effectively, and that the inclusion of time information enables TeRo to capture temporary relations and improve its performance on link prediction over TKGs.
5.3 Efficiency Study
TeRo has the same space complexity as TTransE [13] and HyTE [5]. Since we constrain the numbers of time steps of the four TKG datasets by tuning the time granularities (183 time steps in ICEWS14, 1,339 time steps in ICEWS05-15, 127 time steps in YAGO11k and 82 time steps in Wikidata12k), the numbers of time steps are much smaller than the numbers of entities in these datasets, which means that the space complexity of TeRo is close to that of TransE [3], as mentioned in Section 3.2. Regarding concrete memory consumption, the recent state-of-the-art TKGE models ATiSE [25] and DE-SimplE [8] require 1.8 times and 2.2 times as much memory as TeRo on ICEWS14 with the same embedding dimensionality. The training processes of TeRo with 500-dimensional embeddings on ICEWS14, ICEWS05-15, YAGO11k and Wikidata12k take 4.3 seconds, 25.9 seconds, 1.9 seconds and 4.1 seconds per epoch, respectively, on a single GeForce RTX 2080 device.
It is also noteworthy that representing each relation as a pair of dual complex embeddings helps to save training time on TKGs involving time intervals. Given a fact $(s, r, o, [t_b, t_e])$, some TKGE models, e.g., HyTE and ATiSE, discretize this fact into several quadruples involving consecutive time steps, i.e., $[(s, r, o, t_b), (s, r, o, t_{b+1}), \ldots, (s, r, o, t_e)]$. When $thre = 300$, each fact lasts on average for around 15 and 8 time steps in YAGO11k and Wikidata12k, respectively. In other words, this method of discretizing facts involving time intervals expands the sizes of the two datasets by around 15 and 8 times. In our model, we handle time intervals more efficiently by using two quadruples, $(s, r_b, o, t_b)$ and $(s, r_e, o, t_e)$, to represent the beginning and the end of each fact. In this way, we expand the sizes of the datasets to less than twice their original sizes.
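The difference between the two ways of handling an interval fact can be sketched with a toy fact and integer time steps; the `_begin`/`_end` suffixes stand in for the dual relations $r_b$ and $r_e$:

```python
def discretize(fact):
    """HyTE/ATiSE-style: one quadruple per time step covered by the interval."""
    s, r, o, (t_begin, t_end) = fact
    return [(s, r, o, t) for t in range(t_begin, t_end + 1)]

def dual_split(fact):
    """TeRo-style: exactly two quadruples with dual begin/end relations."""
    s, r, o, (t_begin, t_end) = fact
    return [(s, r + "_begin", o, t_begin), (s, r + "_end", o, t_end)]

fact = ("A", "isMarriedTo", "B", (3, 17))   # a fact spanning 15 time steps
assert len(discretize(fact)) == 15          # dataset grows 15-fold
assert len(dual_split(fact)) == 2           # dataset grows at most 2-fold
```

Whatever the interval length, the dual-relation split always produces exactly two training quadruples, which is the source of the training-time savings described above.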
For relations in YAGO11k, we analyze the similarities between the dual embeddings $\mathbf{r}_b$ and $\mathbf{r}_e$. As shown in Figure 4, for short-term relations, e.g., diedIn, the real parts of $\mathbf{r}_b$ and $\mathbf{r}_e$, as well as their imaginary parts, are highly similar, since $r_b$ and $r_e$ always happen at the same time and have the same semantics. By contrast, for long-term relations, e.g., isMarriedTo, the real parts of $\mathbf{r}_b$ and $\mathbf{r}_e$ show their semantic similarity while the imaginary parts capture their temporal dissimilarity.
6 Conclusion
In this work, we introduce TeRo, a new TKGE model which represents entities and relations as single or dual complex embeddings and temporal changes as rotations of the entity embeddings in the complex vector space. Our model is advantageous in its capability of modeling several key relation patterns and of handling time annotations in various forms. Experimental results show that TeRo remarkably outperforms the existing state-of-the-art KGE and TKGE models on link prediction over four well-established TKG datasets. In addition, we adopt two different time-division approaches for the various datasets and investigate the effect of the time granularity on the performance of our model.
Acknowledgements
This work is supported by the CLEOPATRA project (GA no. 812997), the German national funded BMBF project MLwin and the BOOST project.
References
 (2007) DBpedia: a nucleus for a web of open data. The Semantic Web, pp. 722–735. Cited by: §1.
 (2008) Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pp. 1247–1250. Cited by: §1.
 (2013) Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems, pp. 2787–2795. Cited by: §1, §1, §2, §3, §4.3, §4.4, §5.3.
 (2010) Toward an architecture for never-ending language learning. In AAAI, Vol. 5, pp. 3. Cited by: §1.
 (2018) HyTE: hyperplane-based temporally aware knowledge graph embedding. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 2001–2011. Cited by: §1, §1, §2, §3.2, §4.1, §4.2, §5.3.
 (2014) Introducing wikidata to the linked data web. In International Semantic Web Conference, pp. 50–65. Cited by: §1, §4.1.
 (2018) Learning sequence encoders for temporal knowledge graph completion. arXiv preprint arXiv:1809.03202. Cited by: §1, §2, §3.2, §4.1, §4.4, Table 3.
 (2020) Diachronic embedding for temporal knowledge graph completion. In AAAI, Cited by: §2, §3.2, §4.1, §4.3, §4.4, §5.3, Table 3.
 (2015) Knowledge graph embedding via dynamic mapping matrix. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Vol. 1, pp. 687–696. Cited by: §2.
 (2019) Recurrent event network for reasoning over temporal knowledge graphs. arXiv preprint arXiv:1904.05530. Cited by: §1.
 (2018) Canonical tensor decomposition for knowledge base completion. In International Conference on Machine Learning, pp. 2869–2878. Cited by: §4.4.
 (2015) ICEWS coded event data. Cited by: §1, §4.1.
 (2018) Deriving validity time in knowledge graph. In Companion of the The Web Conference 2018 on The Web Conference 2018, pp. 1771–1776. Cited by: §1, §2, §4.4, §5.3.
 (2013) GDELT: global data on events, location, and tone, 1979–2012. In ISA Annual Convention, Vol. 2, pp. 1–49. Cited by: §1.
 (2015) Learning entity and relation embeddings for knowledge graph completion. In Twentyninth AAAI conference on artificial intelligence, Cited by: §2.
 (2018) Embedding models for episodic knowledge graphs. Journal of Web Semantics, pp. 100490. Cited by: §1, §2.
 (2013) YAGO3: a knowledge base from multilingual Wikipedias. In CIDR, Cited by: §1, §4.1.
 (2019) Toward understanding the effect of loss function on the performance of knowledge graph embedding. Cited by: §2.
 (2011) A three-way model for collective learning on multi-relational data. In ICML, Vol. 11, pp. 809–816. Cited by: §2.
 (2007) YAGO: a core of semantic knowledge. In Proceedings of the 16th International Conference on World Wide Web, pp. 697–706. Cited by: §1.
 (2019) RotatE: knowledge graph embedding by relational rotation in complex space. arXiv preprint arXiv:1902.10197. Cited by: §1, §1, §2, §3, §4.4.
 (2017) Know-Evolve: deep temporal reasoning for dynamic knowledge graphs. Cited by: §1, §2.
 (2016) Complex embeddings for simple link prediction. arXiv preprint arXiv:1606.06357. Cited by: §1, §2.
 (2014) Knowledge graph embedding by translating on hyperplanes. In AAAI, pp. 1112–1119. Cited by: §2.
 (2019) Temporal knowledge graph embedding model based on additive time series decomposition. arXiv preprint arXiv:1911.07893. Cited by: §1, §2, §3.2, §3, §4.1, §4.2, §4.3, §4.4, §4.5, §5.3, Table 3, Table 4.
 (2020) Knowledge graph embeddings in geometric algebras. arXiv preprint arXiv:2010.00989. Cited by: §2.
 (2014) Embedding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575. Cited by: §1, §1, §2, §3, §4.4.
 (2019) Quaternion knowledge graph embeddings. In Advances in Neural Information Processing Systems, pp. 2731–2741. Cited by: §1, §2, §4.4.