TeRo: A Time-aware Knowledge Graph Embedding via Temporal Rotation



In the last few years, there has been a surge of interest in learning representations of entities and relations in knowledge graphs (KGs). However, the recent availability of temporal knowledge graphs (TKGs), which contain time information for each fact, has created the need for reasoning over time in such TKGs. In this regard, we present a new approach to TKG embedding, TeRo, which defines the temporal evolution of an entity embedding as a rotation from the initial time to the current time in the complex vector space. Specifically, for facts involving time intervals, each relation is represented as a pair of dual complex embeddings to handle the beginning and the end of the relation, respectively. We show that our proposed model overcomes the limitations of existing KG embedding models and TKG embedding models and is able to learn and infer various relation patterns over time. Experimental results on four different TKGs show that TeRo significantly outperforms existing state-of-the-art models for link prediction. In addition, we analyze the effect of time granularity on link prediction over TKGs, which, as far as we know, has not been investigated in previous literature.

1 Introduction


This work is licensed under a Creative Commons Attribution 4.0 International License. License details: http://creativecommons.org/licenses/by/4.0/.

In recent years, a number of sizable Knowledge Graphs (KGs) have been constructed, including DBpedia [1], YAGO [20], NELL [4] and Freebase [2]. In these KGs, a fact is represented as a triple (s, r, o), where s (subject) and o (object) are entities (nodes), and r (relation) is the relation (edge) between them.

Several KG embedding (KGE) models have been developed to perform learning and inference over these KGs [3, 27, 23, 21, 28]. The most common learning task for these models is link prediction, which is to complete a fact with the missing entity. For instance, one can use a KGE model to perform an object query like (Barack Obama, visits, ?). If the time factor is disregarded, there are several valid answers to this query. Including time information makes the query more specific, e.g., (Barack Obama, visits, ?, 2014-07-08).

Some temporal KGs (TKGs), including ICEWS [12], GDELT [14], YAGO3 [17] and Wikidata [6], store billions of time-aware facts as quadruples (s, r, o, t), where t is the time annotation, e.g., (Barack Obama, visits, Ukraine, 2014-07-08). The availability of these TKGs, which exhibit complex temporal dynamics in addition to their multi-relational nature, has created the need for approaches that can characterize and reason over them. Traditional KGE models disregard time information, which makes them ineffective at link prediction on TKGs involving temporary relations (e.g., visits, live in, etc.).

To tackle this problem, TKG embedding (TKGE) models encode time information in their embeddings. Such TKGE models [22, 13, 7, 16, 5, 10] were shown to perform better on link prediction over TKGs than traditional KGE models. However, most of the existing TKGE models are extensions of TransE [3] and DistMult [27], and thus are not fully expressive for some relation patterns [21].

In this paper, we propose a novel approach for TKGEs, TeRo, which defines the temporal evolution of an entity embedding as a rotation from the initial time to the current time in the complex vector space. We show the limitation of the existing TKGE models and the advantage of our proposed model on learning various relation patterns over time.

Specifically, for facts involving time intervals, each relation is represented as a pair of dual complex embeddings which are used to handle the beginning and the end of the relation, respectively. In this way, TeRo can adapt well to datasets where time annotations are represented in various forms: time points, beginning or end times, and time intervals.

As far as we know, most previous TKGE-related works use a specific time granularity for each TKG. For example, the time granularity of the ICEWS datasets is fixed as 24 hours in [5, 25]. In this work, we adopt various time-division approaches for different TKG datasets and investigate the effect of the length of the time steps on the performance of our model.

To verify our approach, we compare the performance of our proposed models on link prediction and time prediction tasks over four different TKGs with the state-of-the-art KGE models and the existing TKGE models. The experimental results demonstrate that our proposed model outperforms other baseline models significantly by inferring various relation patterns and encoding time information.

2 Related Work

KGE models can be roughly classified into distance-based models and semantic matching models.

Distance-based models measure the plausibility of a fact as the distance between the two entities, usually after a translation or rotation carried out by the relation. A typical example of distance-based models is TransE [3]. TransE exhibits deficiencies when learning 1-n relations. Thus, various extensions of TransE [24, 9, 15, 18] were proposed to tackle this problem. They use different mapping methods to project entities from entity space to relation space. In particular, RotatE [21] defines each relation as a rotation from the subject to the object. Nevertheless, these distance-based models are still unable to capture reflexive relations, i.e., relations $r$ for which each entity $e$ is related to itself via $r$. In distance-based models, the values or the phases of the vectors of all reflexive relations are forced to be 0, which prevents these models from fully expressing the semantic characteristics of these relations.

Semantic matching models measure the plausibility of facts by matching latent semantics of entities and relations embodied in their embedding representations. A few examples of such models include RESCAL [19], DistMult [27], ComplEx [23], QuatE [28] and GeomE [26]. RESCAL and DistMult cannot capture asymmetric relations since the score of the triple (s, r, o) is always equal to the score of its symmetric triple (o, r, s). ComplEx, QuatE and GeomE have been proven to be able to capture various relation patterns for static KGs, but cannot model temporary relations in TKGs because they ignore time information.

Recent research has shown that the performance of KGE models can be further improved by incorporating the time information in TKGs. Some TKGE models are extended from TransE, e.g., TTransE [13], TA-TransE [7], HyTE [5] and ATiSE [25]. Other TKGE models are temporal extensions of DistMult, e.g., Know-Evolve [22], TDistMult [16] and TA-DistMult [7]. Similar to TransE and DistMult, these TKGE models have issues with capturing reflexive relations or asymmetric relations. In particular, DE-SimplE [8] incorporates time information into diachronic entity embeddings and is capable of modeling various relation patterns. However, this approach only focuses on event-based TKG datasets, and cannot model facts involving time intervals shaped like [2003-##-##, 2005-##-##].

3 A Novel TKGE Approach based on Temporal Rotation

Although various KGE models have been developed to learn multi-relational interactions between entities, all of them have problems with inferring temporary relations, which are only valid at a certain time point or last for a certain time period. To illustrate this by example, assume we are given a quadruple (Barack Obama, visits, France, 2014-02-12) as a training sample, where the relation visits is a temporary relation. If we query (Barack Obama, visits, ?, 2014-07-08), a trained static KGE model probably returns the incorrect answer France due to the validity of the triple (Barack Obama, visits, France), while the correct answer is Ukraine considering the given time constraint. On the other hand, most of the existing TKGE models, which were extended from TransE [3] and DistMult [27], incorporate time information in the embedding space, but have limitations on learning reflexive relations or asymmetric relations as discussed in Section 2.

To overcome the limitations of these existing KGE and TKGE models on learning and inferring over TKGs, we propose a new TKGE model, TeRo, which defines the temporal evolution of an entity embedding as a rotation in the complex vector space. Let $\mathcal{E}$ denote the set of entities, $\mathcal{R}$ denote the set of relations, and $\mathcal{T}$ denote the set of time steps. Then a TKG is a collection of factual quadruples (s, r, o, t), where $s, o \in \mathcal{E}$ are the subject and object entities, $r \in \mathcal{R}$ is the relation, and $t$ denotes the actual time when the fact occurs. For any time $t$, we have a time step $\tau \in \mathcal{T}$ representing this actual time. We map $s$, $r$, $o$, $\tau$ to their complex embeddings, i.e., $\mathbf{s}, \mathbf{r}, \mathbf{o}, \boldsymbol{\tau} \in \mathbb{C}^k$; then we define the functional mapping induced by each time step $\tau$ as an element-wise rotation from the time-independent entity embeddings $\mathbf{s}$ and $\mathbf{o}$ to the time-specific entity embeddings $\mathbf{s}_t$ and $\mathbf{o}_t$. The mapping function is defined as follows:

$$\mathbf{s}_t = \mathbf{s} \circ \boldsymbol{\tau}, \qquad \mathbf{o}_t = \mathbf{o} \circ \boldsymbol{\tau},$$

where $\circ$ denotes the element-wise (Hadamard) product between complex vectors. Here, we constrain the modulus of each element of $\boldsymbol{\tau} \in \mathbb{C}^k$, i.e., $\tau_i \in \mathbb{C}$, to be $|\tau_i| = 1$. By doing this, $\tau_i$ is of the form $e^{i\theta_{\tau,i}}$, which corresponds to a counter-clockwise rotation by $\theta_{\tau,i}$ radians around the origin of the complex plane, and only affects the phases of the entity embeddings in the complex vector space. This idea is motivated by Euler's formula $e^{i\theta} = \cos\theta + i\sin\theta$, which indicates that a unit-modulus complex number can be regarded as a rotation in the complex plane.
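As a minimal sketch of this temporal rotation (plain Python with the standard library, not the released PyTorch implementation), the snippet below builds a unit-modulus time embedding from phases and applies it coordinate-wise to an entity embedding. The function names `time_embedding` and `rotate` and all numeric values are hypothetical and chosen for illustration only.

```python
import cmath
import math


def time_embedding(phases):
    """Build a unit-modulus time embedding from per-dimension phases (radians)."""
    return [cmath.exp(1j * theta) for theta in phases]


def rotate(entity, tau):
    """Element-wise product: rotate each entity coordinate by the time step."""
    return [e * t for e, t in zip(entity, tau)]


# Hypothetical values chosen for illustration only.
s = [1 + 1j, 0.5 - 2j]
tau = time_embedding([math.pi / 2, math.pi])
s_t = rotate(s, tau)

# Each coordinate keeps its modulus; only its phase changes.
moduli_before = [abs(z) for z in s]
moduli_after = [abs(z) for z in s_t]
print(s_t, moduli_before, moduli_after)
```

Parameterizing the time embedding by phases keeps the unit-modulus constraint satisfied by construction, which is one common way to enforce such a constraint during training.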

We regard the relation embedding $\mathbf{r}$ as a translation from the time-specific subject embedding $\mathbf{s}_t$ to the conjugate of the time-specific object embedding $\overline{\mathbf{o}}_t$ for a single quadruple (s, r, o, t) drawn from the set of all positive quadruples. The score function is defined as:

$$f(s, r, o, t) = \lVert \mathbf{s}_t + \mathbf{r} - \overline{\mathbf{o}}_t \rVert$$
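The scoring idea can be sketched in a few lines of plain Python: rotate subject and object by the time step, translate the subject by the relation, and measure the distance to the conjugate of the rotated object. The function name `tero_score` is hypothetical, and summing the complex moduli over dimensions is an assumption about the norm; lower scores mean more plausible facts.

```python
import cmath
import math


def tero_score(s, r, o, tau):
    """Distance between s_t + r and the conjugate of o_t, summed over dimensions."""
    total = 0.0
    for s_i, r_i, o_i, tau_i in zip(s, r, o, tau):
        s_t = s_i * tau_i            # time-specific subject coordinate
        o_t = o_i * tau_i            # time-specific object coordinate
        total += abs(s_t + r_i - o_t.conjugate())
    return total


# Hypothetical one-dimensional embeddings.
tau = [cmath.exp(1j * math.pi / 2)]   # rotation by 90 degrees
s, o = [1 + 0j], [2 + 0j]
r_good = [-3j]                         # equals conj(o_t) - s_t for these values
r_bad = [1 + 0j]

good = tero_score(s, r_good, o, tau)
bad = tero_score(s, r_bad, o, tau)
print(good, bad)
```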


For a fact $(s, r, o, t)$ occurring in a certain time interval, i.e., $t = [t_b, t_e]$ where $t_b$ and $t_e$ denote the beginning time and the end time of the fact, we separate this fact into two quadruples, namely, $(s, r_b, o, t_b)$ and $(s, r_e, o, t_e)$. Here, we extend the relation set $\mathcal{R}$ in a TKG which involves time intervals to a pair of dual relation sets, $\mathcal{R}_b$ and $\mathcal{R}_e$. A relation $r_b \in \mathcal{R}_b$ is used to handle the beginning of relation $r$, while a relation $r_e \in \mathcal{R}_e$ is used to handle the end of relation $r$. By doing this, we score a fact $(s, r, o, [t_b, t_e])$ as the mean value of the scores of the two quadruples $(s, r_b, o, t_b)$ and $(s, r_e, o, t_e)$, which represent the beginning and the end of this fact respectively:

$$f(s, r, o, [t_b, t_e]) = \frac{1}{2}\big(f(s, r_b, o, t_b) + f(s, r_e, o, t_e)\big)$$

Specifically, for a fact missing either the beginning time or the end time, e.g., $(s, r, o, [t_b, -])$ or $(s, r, o, [-, t_e])$, the score of this fact is equal to the score of the quadruple involving the known time, i.e., $f(s, r, o, [t_b, -]) = f(s, r_b, o, t_b)$ and $f(s, r, o, [-, t_e]) = f(s, r_e, o, t_e)$.
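The interval rule can be illustrated with a short plain-Python sketch: average the begin and end scores when both endpoints are known, and fall back to the single known endpoint otherwise. The helper names `tero_score` and `interval_score`, the norm (sum of complex moduli), and all values are assumptions made for illustration.

```python
import cmath
import math


def tero_score(s, r, o, tau):
    """Distance between s_t + r and the conjugate of o_t, summed over dimensions."""
    return sum(
        abs(s_i * tau_i + r_i - (o_i * tau_i).conjugate())
        for s_i, r_i, o_i, tau_i in zip(s, r, o, tau)
    )


def interval_score(s, r_b, r_e, o, tau_b=None, tau_e=None):
    """Mean of the begin and end scores; uses the single known endpoint otherwise."""
    scores = []
    if tau_b is not None:
        scores.append(tero_score(s, r_b, o, tau_b))
    if tau_e is not None:
        scores.append(tero_score(s, r_e, o, tau_e))
    if not scores:
        raise ValueError("a fact needs at least one known endpoint")
    return sum(scores) / len(scores)


# Hypothetical one-dimensional embeddings.
s, o = [1 + 0j], [2 + 0j]
r_b, r_e = [-3j], [1 + 0j]
tau_b = [cmath.exp(1j * math.pi / 2)]
tau_e = [cmath.exp(1j * math.pi)]

both = interval_score(s, r_b, r_e, o, tau_b, tau_e)
begin_only = interval_score(s, r_b, r_e, o, tau_b=tau_b)
end_only = interval_score(s, r_b, r_e, o, tau_e=tau_e)
print(both, begin_only, end_only)
```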

In this paper, we use the same loss function as the negative sampling loss proposed in [21] for optimizing our model. This loss function has been shown to be very effective for optimizing some distance-based KGE models, e.g., TransE, RotatE [21] and ATiSE [25].

$$\mathcal{L} = -\log \sigma\big(\gamma - f(\xi)\big) - \sum_{i=1}^{\eta} \frac{1}{\eta} \log \sigma\big(f(\xi'_i) - \gamma\big)$$

where $\xi$ is a positive training quadruple, $\xi'_i$ is the $i$-th negative sample corresponding to $\xi$, generated by randomly corrupting the subject or the object of $\xi$, such as $(s', r, o, t)$ and $(s, r, o', t)$, $\sigma(\cdot)$ denotes the sigmoid function, $\gamma$ is a fixed margin and $\eta$ is the ratio of negative over positive training samples.
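A hedged sketch of this loss for a single positive quadruple, assuming the uniformly weighted form of the negative sampling loss of [21] (the positive score is pushed below the margin, the negative scores above it, and the negative terms are averaged). The function names are hypothetical and the scores are made-up distances.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def negative_sampling_loss(pos_score, neg_scores, margin):
    """Loss for one positive quadruple and its negative samples (scores are distances)."""
    pos_term = -log_sigmoid(margin - pos_score)
    neg_term = -sum(log_sigmoid(n - margin) for n in neg_scores) / len(neg_scores)
    return pos_term + neg_term


# Hypothetical scores: a well-separated case and a badly separated case.
well_separated = negative_sampling_loss(0.0, [20.0, 20.0], margin=10.0)
badly_separated = negative_sampling_loss(20.0, [0.0, 0.0], margin=10.0)
print(well_separated, badly_separated)
```

The loss is close to zero when the positive distance is far below the margin and the negative distances are far above it, and grows roughly linearly when the two are swapped.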

3.1 Learning Various Relation Patterns

Static KGE models and some existing TKGE models which are the temporal extensions of TransE or DistMult have limitations on capturing some key relation patterns which are defined as follows.

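Definition 1.

A relation r is a temporary relation if $\exists\, s, o, t_1 \neq t_2$ such that $r(s, o, t_1) \wedge \neg r(s, o, t_2)$ holds True.

Definition 2.

A relation r is asymmetric if $\forall s, o, t$, $r(s, o, t) \Rightarrow \neg r(o, s, t)$ holds True.

Definition 3.

A relation r is a reflexive relation if $\forall s, t$, $r(s, s, t)$ holds True.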

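As mentioned in Section 2, static KGE models cannot model temporary relations, e.g., 'visits', since their scores do not depend on time, i.e., $f(s, r, o, t_1) = f(s, r, o, t_2)$ for any $t_1$ and $t_2$. Temporal extensions of DistMult (denoted as T-DistMult), including TDistMult, TA-DistMult and Know-Evolve, cannot model asymmetric relations, e.g., 'parentOf', since $\langle \mathbf{s}, \mathbf{r}, \mathbf{o} \rangle = \langle \mathbf{o}, \mathbf{r}, \mathbf{s} \rangle$, where s, o, r are time-specific entity/relation embeddings corresponding to different T-DistMult models. Temporal extensions of TransE (denoted as T-TransE), including HyTE, TTransE and TA-TransE, have difficulties modeling multiple reflexive relations, e.g., 'equalTo' and 'subsetOf', since $\mathbf{s} + \mathbf{r} = \mathbf{s}$ forces $\mathbf{r} = \mathbf{0}$ for every reflexive relation.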

Figure 1: (a) Illustration of TeRo with only one embedding dimension; (b) an example of modeling a temporary relation; (c) an example of modeling an asymmetric relation; (d) an example of modeling a reflexive relation.

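By defining each time step as a rotation in the complex vector space, TeRo can capture all of the above three relation patterns. Given an observed fact (s, r, o, t):

  • as shown in Figure 1(b), if $r$ is a temporary relation, we can have $\mathbf{s}_{t} + \mathbf{r} = \overline{\mathbf{o}}_{t}$ and $\mathbf{s}_{t'} + \mathbf{r} \neq \overline{\mathbf{o}}_{t'}$ for another time $t' \neq t$ for TeRo to make $r(s, o, t) \wedge \neg r(s, o, t')$ hold true.

  • as shown in Figure 1(c), if $r$ is an asymmetric relation, we can have $\mathbf{s}_t + \mathbf{r} = \overline{\mathbf{o}}_t$ and $\mathbf{o}_t + \mathbf{r} \neq \overline{\mathbf{s}}_t$ for TeRo to make $r(s, o, t) \wedge \neg r(o, s, t)$ hold true.

  • as shown in Figure 1(d), if $r$ is a reflexive relation, we have $\mathbf{s}_t + \mathbf{r} = \overline{\mathbf{s}}_t$, i.e., $\mathbf{r} = \overline{\mathbf{s}}_t - \mathbf{s}_t$, for TeRo, which is not forced to be the zero vector. Thus, TeRo can represent multiple reflexive relations as different embeddings due to the conjugate operations of object embeddings.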
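The three cases can be checked numerically with a single embedding dimension. The sketch below is an illustration under assumed values, not a result from our experiments: it fits a relation on one time step and then evaluates the score (a distance, so zero means a perfect fit) at another time step, with subject and object swapped, and for a reflexive fact. The helper `score` is hypothetical.

```python
import cmath
import math


def score(s, r, o, tau):
    """One-dimensional distance |s_t + r - conj(o_t)|; zero means a perfect fit."""
    return abs(s * tau + r - (o * tau).conjugate())


# Hypothetical one-dimensional embeddings and two time steps.
s, o = 1 + 1j, 2 - 1j
tau1 = cmath.exp(1j * math.pi / 2)
tau2 = cmath.exp(1j * math.pi)

# Fit the relation so that (s, r, o) holds exactly at time step tau1.
r = (o * tau1).conjugate() - s * tau1

# Temporary: the fact fits at tau1 but not at tau2.
temporary = (score(s, r, o, tau1), score(s, r, o, tau2))
# Asymmetric: (s, r, o) fits at tau1 but (o, r, s) does not.
asymmetric = (score(s, r, o, tau1), score(o, r, s, tau1))
# Reflexive: r = conj(s_t) - s_t is non-zero and still fits (s, r, s).
r_reflexive = (s * tau1).conjugate() - s * tau1
reflexive = (abs(r_reflexive), score(s, r_reflexive, s, tau1))
print(temporary, asymmetric, reflexive)
```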

3.2 Complexity

In Table 1, we summarize the scoring functions and the space complexities of several state-of-the-art TKGE approaches and our model as well as TransE. The space complexities are expressed in terms of the numbers of entities, relations, time steps and temporal tokens used in [7], together with the dimensionality of embeddings. The scoring functions use the following components: the temporal projection for embeddings [5]; an LSTM neural network applied to the concatenation of the relation embedding and the sequence of temporal tokens [7]; the temporal part and untemporal part of a time-specific diachronic entity embedding, together with the inverse relation embedding [8]; and the KL divergence between two Gaussian distributions, applied to the Gaussian embeddings of s, r and o at time t [25].

As shown in Table 1, the space complexity of TeRo and TransE will be close if the number of time steps is much smaller than the number of entities. In practice, we can achieve this condition by tuning the time granularity.

Model Scoring Function Space Complexity
Table 1: Comparison of our models with several baseline models for space complexity.

4 Experiments

4.1 Temporal Knowledge Graph Datasets

Common TKG benchmarks include GDELT [8], ICEWS14, ICEWS05-15, YAGO15k, Wikidata11k [7], YAGO11k and Wikidata12k [5]. In this work, we choose ICEWS14, ICEWS05-15, YAGO11k and Wikidata12k as datasets for the following reasons: (1) ICEWS14 and ICEWS05-15 are two well-established event-based datasets which are commonly used in previous literature [7, 8, 25]. These two datasets are subsets of ICEWS [12] corresponding to facts in 2014 and facts between 2005 and 2015, where all time annotations are time points. (2) YAGO15k, Wikidata11k, YAGO11k and Wikidata12k are subsets of YAGO3 [17] and Wikidata [6] where a part of the time annotations are time intervals. In YAGO15k and Wikidata11k, each time interval only contains either a beginning date or an end date, shaped like 'occurSince 2003' or 'occurUntil 2005', and a part of the facts in YAGO15k exclude time information. Thus we prefer to use YAGO11k and Wikidata12k, where each fact includes time information and time annotations are represented in various forms, i.e., time points like [2003-01-01, 2003-01-01], beginning or end times like [2003, ##], and time intervals like [2003, 2005]. We list the statistics of the four datasets we use in Table 2.

| Dataset | #Entities | #Relations | Time Span | #Training | #Validation | #Test |
|---|---|---|---|---|---|---|
| ICEWS14 | 6,869 | 230 | 2014 | 72,826 | 8,941 | 8,963 |
| ICEWS05-15 | 10,094 | 251 | 2005-2015 | 368,962 | 46,275 | 46,092 |
| YAGO11k | 10,623 | 10 | -453-2844 | 16,406 | 2,050 | 2,051 |
| Wikidata12k | 12,554 | 24 | 1479-2018 | 32,497 | 4,062 | 4,062 |

Table 2: Statistics of datasets.

4.2 Time Granularity

In some recent work [5, 25], the time span of a TKG dataset was split into a number of time steps. For ICEWS14 and ICEWS05-15, the time granularity was fixed as 1 day. For YAGO11k and Wikidata12k, month and day information was dropped, and less frequently mentioned years were grouped into the same time steps while years with high frequency formed individual time steps, in order to alleviate the effect of the long-tail property of time data. In other words, the time steps had different lengths so that the numbers of triples in different time steps were balanced. However, it has not been investigated whether the lengths of time steps affect the performance of TKGE models.
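The threshold-based division can be sketched as a greedy pass over the sorted years that closes a time step once it holds at least a minimum number of facts. This is only an illustrative reading of the procedure: the function name `group_years`, the handling of a leftover tail, and the toy data are assumptions, and the cited implementations may differ in detail.

```python
from collections import Counter


def group_years(years, min_facts):
    """Greedily merge consecutive years until each time step holds at least min_facts facts."""
    counts = Counter(years)
    steps, current, size = [], [], 0
    for year in sorted(counts):
        current.append(year)
        size += counts[year]
        if size >= min_facts:
            steps.append(current)
            current, size = [], 0
    if current:
        # Assumption: a leftover tail below the threshold joins the last step.
        if steps:
            steps[-1].extend(current)
        else:
            steps.append(current)
    return steps


# Hypothetical toy data: one entry per fact, valued by its year.
fact_years = [-453] + [100] * 2 + [2007] * 5 + [2008] * 6

fine = group_years(fact_years, min_facts=1)
coarse = group_years(fact_years, min_facts=3)
print(len(fine), fine)
print(len(coarse), coarse)
```

With a threshold of 1 every year is its own time step, while a larger threshold merges the sparse early years and leaves the frequent recent years on their own.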

In this work, we test our model with different time units in a range of {1, 2, 3, 7, 14, 30, 90, 365} days for the ICEWS datasets. Dasgupta et al. [5] and Xu et al. [25] applied a minimum threshold of 300 triples per interval during construction for YAGO11k and Wikidata12k. We follow their time-division approaches for these two datasets and test different minimum thresholds in a range of {1, 3, 10, 30, 100, 300, 1000, 3000, 10000, 30000}. Changing the time granularity reconstructs the set of time steps. For ICEWS14, when the time unit is 1 day, we have 365 time steps in total and the date 2014-01-02 is represented by the second time step. If the time unit is changed to 2 days, the total number of time steps becomes 183 and the date 2014-01-02 falls into the first time step. For YAGO11k, when the minimum threshold is 1, we have 396 time steps since 396 different years appear as timestamps in YAGO11k. Years like -453, 100 and 2008 are all taken as independent time steps. When the threshold for YAGO11k rises to 300, the number of time steps drops to 127 and years between -431 and 100 are grouped into the same time step.
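For the fixed-length division used on the ICEWS datasets, the mapping from a date to a time step is simple integer arithmetic. The sketch below reproduces the ICEWS14 numbers quoted above; the function names and the 1-based indexing are assumptions made for illustration.

```python
import math
from datetime import date


def time_step(day, start, unit_days):
    """1-based index of the time step that contains `day`."""
    return (day - start).days // unit_days + 1


def num_time_steps(start, end, unit_days):
    """Number of time steps needed to cover [start, end] inclusive."""
    total_days = (end - start).days + 1
    return math.ceil(total_days / unit_days)


start, end = date(2014, 1, 1), date(2014, 12, 31)
query_day = date(2014, 1, 2)

steps_1day = num_time_steps(start, end, 1)
steps_2day = num_time_steps(start, end, 2)
index_1day = time_step(query_day, start, 1)
index_2day = time_step(query_day, start, 2)
print(steps_1day, index_1day, steps_2day, index_2day)
```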

4.3 Evaluation Metrics

We evaluate our model on the link prediction task over TKGs under the time-wise filtered setting defined in [25, 8]. This task is to complete a time-wise fact with a missing entity. For a test quadruple (s, r, o, t), we first generate candidate quadruples by replacing s or o with all possible entities. Different from the time-unwise filtered setting [3], which filters from the candidate list the triples appearing in the training, validation or test set, we only filter the quadruples existing in the dataset. This ensures that facts which do not hold at time t are still considered as candidates for evaluating the given test quadruple. We obtain the final rank of the test quadruple among the filtered candidate quadruples by sorting their scores.

Two commonly used evaluation metrics are used here, i.e., Mean Reciprocal Rank and Hits@k. The Mean Reciprocal Rank (MRR) is the mean of the reciprocal values of all computed ranks, and Hits@k is the fraction of test quadruples ranking in the top k.
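The difference between the two filtering settings, and the two metrics, can be illustrated with a small plain-Python sketch. All names, scores and ranks below are hypothetical; scores are treated as distances, so a lower score ranks higher.

```python
def filtered_rank(scores, target, filtered_out):
    """Rank of `target` when lower scores are better, skipping filtered candidates."""
    target_score = scores[target]
    better = sum(
        1
        for entity, score in scores.items()
        if entity != target and entity not in filtered_out and score < target_score
    )
    return better + 1


def mean_reciprocal_rank(ranks):
    """Mean of the reciprocal ranks."""
    return sum(1.0 / rank for rank in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of ranks that fall within the top k."""
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


# Hypothetical scores for the object query (Barack Obama, visits, ?, 2014-07-08).
scores = {"Ukraine": 0.4, "France": 0.2, "Germany": 0.9}

# Time-wise filtering: France was visited at another time, so it stays a candidate.
time_wise_rank = filtered_rank(scores, "Ukraine", filtered_out=set())
# Time-unwise filtering would drop France because the triple holds at some time.
time_unwise_rank = filtered_rank(scores, "Ukraine", filtered_out={"France"})

ranks = [1, 2, 4]
print(time_wise_rank, time_unwise_rank, mean_reciprocal_rank(ranks), hits_at_k(ranks, 3))
```

Under time-wise filtering a model is penalized for ranking an entity that is correct only at a different time above the true answer, which is the behaviour a time-aware model is supposed to avoid.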

4.4 Baselines

We compare our approach with several state-of-the-art KGE approaches and existing TKGE approaches, including TransE [3], DistMult [27], ComplEx-N3 [11], RotatE [21], QuatE [28], TTransE [13], TA-TransE, TA-DistMult [7], DE-SimplE [8] and ATiSE [25]. The results of most baselines are taken from recent work [8, 25] which used the same evaluation protocol as ours. DE-SimplE, which mainly focuses on event-based datasets, cannot model time intervals or time annotations missing month and day information, which are common in YAGO and Wikidata. Thus its results on YAGO11k and Wikidata12k are unobtainable. Since the original source code of TA-TransE and TA-DistMult [7] is not released, we reimplement these models according to the implementation details reported in the original paper, in order to obtain their results on YAGO11k and Wikidata12k.

4.5 Experimental Setup

We implement our proposed model in PyTorch. The code is available at https://github.com/soledad921/ATISE.

We select the optimal hyperparameters by early validation stopping according to MRR on the validation set. We restrict the iterations to 5000. Following the setup used in [25], the batch size is kept for all datasets, the embedding dimensionality is tuned in {}, the ratio of negative over positive training samples is tuned in {} and the margin is tuned in {1, 2, 3, 5, 10, 20, , 120}. Regarding the optimizer, we choose Adagrad for TeRo and tune the learning rate in a range of {}. In addition, the two time granularity parameters, i.e., the time unit and the minimum threshold, are also regarded as hyperparameters for TeRo as mentioned in Section 4.2.

The default configuration for TeRo is as follows: , . Below, we only list the non-default parameters: , , on ICEWS14; , , on ICEWS05-15; , , on YAGO11k; , , on Wikidata12k.

5 Results and Analysis

5.1 Comparative Study

| Model | ICEWS14 MRR | ICEWS14 Hits@1 | ICEWS14 Hits@3 | ICEWS14 Hits@10 | ICEWS05-15 MRR | ICEWS05-15 Hits@1 | ICEWS05-15 Hits@3 | ICEWS05-15 Hits@10 |
|---|---|---|---|---|---|---|---|---|
| TransE* | .280 | .094 | - | .637 | .294 | .090 | - | .663 |
| DistMult* | .439 | .323 | - | .672 | .456 | .337 | - | .691 |
| ComplEx-N3 | .467 | .347 | .527 | .716 | .481 | .362 | .535 | .729 |
| RotatE | .418 | .291 | .478 | .690 | .304 | .164 | .355 | .595 |
| QuatE | .471 | .353 | .530 | .712 | .482 | .370 | .529 | .727 |
| TTransE | .255 | .074 | - | .601 | .271 | .084 | - | .616 |
| HyTE | .297 | .108 | .416 | .655 | .316 | .116 | .445 | .681 |
| TA-TransE* | .275 | .095 | - | .625 | .299 | .096 | - | .668 |
| TA-DistMult* | .477 | .363 | - | .686 | .474 | .346 | - | .728 |
| DE-SimplE | .526 | .418 | .592 | .725 | .513 | .392 | .578 | .748 |
| ATiSE | .550 | .436 | **.629** | **.750** | .519 | .378 | .606 | .794 |
| TeRo | **.562** | **.468** | .621 | .732 | **.586** | **.469** | **.668** | **.795** |

Table 3: Link prediction results on ICEWS14 and ICEWS05-15. *: results are taken from [7]. Further baseline results are taken from [8] and [25]. Dashes: results are unobtainable. The best results among all models are shown in bold.

| Model | YAGO11k MRR | YAGO11k Hits@1 | YAGO11k Hits@3 | YAGO11k Hits@10 | Wikidata12k MRR | Wikidata12k Hits@1 | Wikidata12k Hits@3 | Wikidata12k Hits@10 |
|---|---|---|---|---|---|---|---|---|
| TransE | .100 | .015 | .138 | .244 | .178 | .100 | .192 | .339 |
| DistMult | .158 | .107 | .161 | .268 | .222 | .119 | .238 | .460 |
| ComplEx-N3 | .167 | .106 | .154 | .282 | .233 | .123 | .253 | .436 |
| RotatE | .167 | .103 | .167 | .305 | .221 | .116 | .236 | .461 |
| QuatE | .164 | .107 | .148 | .270 | .230 | .125 | .243 | .416 |
| TTransE | .108 | .020 | .150 | .251 | .172 | .096 | .184 | .329 |
| HyTE | .105 | .015 | .143 | .272 | .180 | .098 | .197 | .333 |
| TA-TransE | .127 | .027 | .160 | **.326** | .178 | .030 | .267 | .429 |
| TA-DistMult | .161 | .103 | .171 | .292 | .218 | .122 | .232 | .447 |
| ATiSE | .170 | .110 | .171 | .288 | .280 | .175 | .317 | .481 |
| TeRo | **.187** | **.121** | **.197** | .319 | **.299** | **.198** | **.329** | **.507** |

Table 4: Link prediction results on YAGO11k and Wikidata12k. Some baseline results are taken from [25]. The best results among all models are shown in bold.

Tables 3 and 4 list all link prediction results of our proposed model and the baseline models on the four datasets. TeRo surpassed all baseline embedding models regarding all metrics on all datasets, except that ATiSE achieved better Hits@3 and Hits@10 than TeRo on ICEWS14 and TA-TransE achieved better Hits@10 than TeRo on YAGO11k (.326 vs. .319). Compared to ATiSE, TeRo improved MRR by 1.2, 6.7, 1.7 and 1.9 points on ICEWS14, ICEWS05-15, YAGO11k and Wikidata12k, respectively.

5.2 Ablation Study

In this work, we analyze the effect of the change of the time granularity on the performance of our model. As mentioned in Section 4.2, we adopt two different time-division approaches for event-based datasets, i.e., the ICEWS datasets, and time-wise KGs involving time intervals, i.e., YAGO11k and Wikidata12k. For ICEWS14 and ICEWS05-15, we use time steps with fixed length since the distribution of the numbers of facts in the ICEWS datasets over time is relatively uniform, as shown in Figure 2. The time granularities of the ICEWS datasets are equal to the lengths of the time units. On the other hand, the time distributions of the numbers of facts in YAGO11k and Wikidata12k are long-tailed. Thus we divide the time steps in YAGO11k and Wikidata12k by setting a minimum threshold for the number of facts in each time step. The time granularities of these two datasets can be changed by setting different thresholds.

Figure 2: Time distribution of numbers of facts.
Figure 3: Results of TeRo with different time granularities on ICEWS14 and Wikidata12k.

In ICEWS14, the time distribution is relatively uniform, and thus representing time with a small time granularity can provide more abundant time information. As shown in Figure 3, TeRo with small time granularities, e.g., 1 day, 2 days and 3 days, performed better on ICEWS14 than TeRo with large time granularities regarding MRR and Hits@3. Likewise, the optimal time unit for TeRo on ICEWS05-15 was found in our experiments to be 2 days. For Wikidata12k, using a very small time granularity was non-optimal due to the long-tail property of time data. On the other hand, using an overly large time granularity prevented time information from being incorporated effectively, as the low performance of TeRo with large time granularities in Figure 3 demonstrates. More concretely, when the time unit was 1 year, all time annotations in ICEWS14 were represented by a single time embedding, which made this time embedding temporally meaningless. Table 5 shows a few examples of link prediction results on ICEWS14 of TeRo models with time units of two days and one year.

| Link Prediction | TeRo with a time unit of 2 days | TeRo with a time unit of 365 days |
|---|---|---|
| (Colombia, Host a visit, ?, 2014-06-04) | Kyung-wha Kang | John F. Kelly |
| (Head of Government (China), visits, ?, 2014-07-04) | South Korea | Serbia |
| (UN Security Council, Criticize or denounce, ?, 2014-08-10) | North Korea | Armed Band (South Sudan) |
| (South Korea, Host a visit, ?, 2014-06-20) | Kim Jong-Un | National Security Advisor (Japan) |
| (Police (Australia), Accuse, ?, 2014-10-22) | Criminal (Australia) | Citizen (Australia) |

Table 5: Examples of link prediction results on ICEWS14. The correct predictions are written bold.

As shown in Table 5, in many cases, TeRo with a time unit of 2 days predicted correctly, while TeRo with a time unit of 365 days gave wrong predictions. We notice that these predictions of TeRo with a time unit of 365 days in Table 5 would be valid if we disregarded the time constraint. For instance, (Colombia, Host a visit, John F. Kelly) happened on 2014-03-27, and (UN Security Council, Criticize or denounce, Armed Band (South Sudan)) was true on 2014-08-07. As mentioned in Section 3, Host a visit and Criticize or denounce are temporary relations. The above results indicate that using a reasonable time granularity helps TeRo to incorporate time information effectively, and that the inclusion of time information enables TeRo to capture temporary relations and improve its performance on link prediction over TKGs.

5.3 Efficiency Study

TeRo has the same space complexity as TTransE [13] and HyTE [5]. Since we constrained the numbers of time steps of the four TKG datasets by tuning time granularities (183 time steps in ICEWS14, 1339 time steps in ICEWS05-15, 127 time steps in YAGO11k and 82 time steps in Wikidata12k), the numbers of time steps are much smaller than the numbers of entities in these datasets, which means that the space complexity of TeRo is close to the space complexity of TransE [3] as mentioned in Section 3.2. Regarding the concrete memory consumption, the recent state-of-the-art TKGE models ATiSE [25] and DE-SimplE [8] require 1.8 times and 2.2 times as much memory as TeRo on ICEWS14 with the same embedding dimensionality. Training TeRo with 500-dimensional embeddings on ICEWS14, ICEWS05-15, YAGO11k and Wikidata12k takes 4.3 seconds, 25.9 seconds, 1.9 seconds and 4.1 seconds per epoch on a single GeForce RTX 2080 device, respectively.

Figure 4: Visualization of the absolute difference vectors between $\mathbf{r}_b$ and $\mathbf{r}_e$ for relations deadIn and isMarriedTo (reshaped into 25×20 matrices): (a) real parts for relation deadIn; (b) imaginary parts for relation deadIn; (c) real parts for relation isMarriedTo; (d) imaginary parts for relation isMarriedTo.

It is also noteworthy that representing each relation as a pair of dual complex embeddings helps to save training time on TKGs involving time intervals. Given a fact $(s, r, o, [t_b, t_e])$, some TKGE models, e.g., HyTE and ATiSE, discretize this fact into several quadruples involving consecutive time points, i.e., $[(s, r, o, t_b), (s, r, o, t_b + 1), \ldots, (s, r, o, t_e)]$. When the minimum threshold is 300, each fact lasts on average for around 15 and 8 time steps in YAGO11k and Wikidata12k, respectively. In other words, discretizing facts involving time intervals expands the sizes of the two datasets by about 15 and 8 times. In our model, we propose a more efficient method to handle time intervals by using two different quadruples, $(s, r_b, o, t_b)$ and $(s, r_e, o, t_e)$, to represent the beginning and the end of each fact. In this way, the datasets grow to less than twice their original sizes.
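The difference in dataset growth between the two strategies can be seen on a single fact. The sketch below is illustrative only: the function names, the `_begin` and `_end` suffixes used to name the dual relations, and the example fact are hypothetical.

```python
def discretize(fact):
    """One quadruple per time step in [t_b, t_e], as in the discretizing baselines."""
    s, r, o, t_b, t_e = fact
    return [(s, r, o, t) for t in range(t_b, t_e + 1)]


def dual_quadruples(fact):
    """One quadruple for the beginning and one for the end of the fact."""
    s, r, o, t_b, t_e = fact
    quads = []
    if t_b is not None:
        quads.append((s, r + "_begin", o, t_b))
    if t_e is not None:
        quads.append((s, r + "_end", o, t_e))
    return quads


# Hypothetical fact spanning time steps 10..24, i.e., 15 time steps.
fact = ("Alice", "isMarriedTo", "Bob", 10, 24)
discretized = discretize(fact)
dual = dual_quadruples(fact)
print(len(discretized), len(dual))
```

A fact with a missing endpoint yields a single quadruple under the dual scheme, which is why the overall growth stays below a factor of two.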

For relations in YAGO11k, we analyze the similarities between the embeddings $\mathbf{r}_b$ and $\mathbf{r}_e$. As shown in Figure 4, for short-term relations, e.g., deadIn, the real parts of $\mathbf{r}_b$ and $\mathbf{r}_e$, as well as their imaginary parts, have high similarities since $r_b$ and $r_e$ always happen at the same time and have the same semantics. By contrast, for long-term relations, e.g., isMarriedTo, the real parts of $\mathbf{r}_b$ and $\mathbf{r}_e$ show their semantic similarities and the imaginary parts capture their temporal dissimilarities.

6 Conclusion

In this work, we introduce TeRo, a new TKGE model which represents entities or relations as single or dual complex embeddings and temporal changes as rotations of entity embeddings in the complex vector space. Our model is advantageous in its capability of modelling several key relation patterns and handling time annotations in various forms. Experimental results show that TeRo remarkably outperforms the existing state-of-the-art KGE models and TKGE models on link prediction over four well-established TKG datasets. In addition, we adopt two different time-division approaches for various datasets and investigate the effect of the time granularity on the performance of our model.


This work is supported by the CLEOPATRA project (GA no. 812997), the German nationally funded BMBF project MLwin and the BOOST project.


  1. S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak and Z. Ives (2007) DBpedia: a nucleus for a web of open data. The semantic web, pp. 722–735.
  2. K. Bollacker, C. Evans, P. Paritosh, T. Sturge and J. Taylor (2008) Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pp. 1247–1250.
  3. A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston and O. Yakhnenko (2013) Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems, pp. 2787–2795.
  4. A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. Hruschka Jr and T. M. Mitchell (2010) Toward an architecture for never-ending language learning. In AAAI, Vol. 5, pp. 3.
  5. S. S. Dasgupta, S. N. Ray and P. Talukdar (2018) HyTE: hyperplane-based temporally aware knowledge graph embedding. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 2001–2011.
  6. F. Erxleben, M. Günther, M. Krötzsch, J. Mendez and D. Vrandečić (2014) Introducing Wikidata to the linked data web. In International Semantic Web Conference, pp. 50–65.
  7. A. García-Durán, S. Dumančić and M. Niepert (2018) Learning sequence encoders for temporal knowledge graph completion. arXiv preprint arXiv:1809.03202.
  8. R. Goel, S. M. Kazemi, M. Brubaker and P. Poupart (2020) Diachronic embedding for temporal knowledge graph completion. In AAAI.
  9. G. Ji, S. He, L. Xu, K. Liu and J. Zhao (2015) Knowledge graph embedding via dynamic mapping matrix. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Vol. 1, pp. 687–696.
  10. W. Jin, C. Zhang, P. Szekely and X. Ren (2019) Recurrent event network for reasoning over temporal knowledge graphs. arXiv preprint arXiv:1904.05530.
  11. T. Lacroix, N. Usunier and G. Obozinski (2018) Canonical tensor decomposition for knowledge base completion. In International Conference on Machine Learning, pp. 2869–2878.
  12. Cited by: §1, §4.1.
  13. J. Leblay and M. W. Chekol (2018) Deriving validity time in knowledge graph. In Companion of the Web Conference 2018 on The Web Conference 2018, pp. 1771–1776.
  14. K. Leetaru and P. A. Schrodt (2013) GDELT: global data on events, location, and tone, 1979–2012. In ISA annual convention, Vol. 2, pp. 1–49.
  15. Y. Lin, Z. Liu, M. Sun, Y. Liu and X. Zhu (2015) Learning entity and relation embeddings for knowledge graph completion. In Twenty-ninth AAAI conference on artificial intelligence.
  16. Y. Ma, V. Tresp and E. A. Daxberger (2018) Embedding models for episodic knowledge graphs. Journal of Web Semantics, pp. 100490.
  17. F. Mahdisoltani, J. Biega and F. M. Suchanek (2013) YAGO3: a knowledge base from multilingual Wikipedias. In CIDR.
  18. M. Nayyeri, C. Xu, Y. Yaghoobzadeh, H. S. Yazdi and J. Lehmann (2019) Toward understanding the effect of loss function on the performance of knowledge graph embedding.
  19. M. Nickel, V. Tresp and H. Kriegel (2011) A three-way model for collective learning on multi-relational data. In ICML, Vol. 11, pp. 809–816.
  20. F. M. Suchanek, G. Kasneci and G. Weikum (2007) YAGO: a core of semantic knowledge. In Proceedings of the 16th international conference on World Wide Web, pp. 697–706.
  21. Z. Sun, Z. Deng, J. Nie and J. Tang (2019) RotatE: knowledge graph embedding by relational rotation in complex space. arXiv preprint arXiv:1902.10197.
  22. R. Trivedi, H. Dai, Y. Wang and L. Song (2017) Know-Evolve: deep temporal reasoning for dynamic knowledge graphs.
  23. T. Trouillon, J. Welbl, S. Riedel, É. Gaussier and G. Bouchard (2016) Complex embeddings for simple link prediction. arXiv preprint arXiv:1606.06357.
  24. Z. Wang, J. Zhang, J. Feng and Z. Chen (2014) Knowledge graph embedding by translating on hyperplanes. In AAAI, pp. 1112–1119.
  25. C. Xu, M. Nayyeri, F. Alkhoury, J. Lehmann and H. S. Yazdi (2019) Temporal knowledge graph embedding model based on additive time series decomposition. arXiv preprint arXiv:1911.07893.
  26. C. Xu, M. Nayyeri, Y. Chen and J. Lehmann (2020) Knowledge graph embeddings in geometric algebras. arXiv preprint arXiv:2010.00989.
  27. B. Yang, W. Yih, X. He, J. Gao and L. Deng (2014) Embedding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575.
  28. S. Zhang, Y. Tay, L. Yao and Q. Liu (2019) Quaternion knowledge graph embeddings. In Advances in Neural Information Processing Systems, pp. 2731–2741.