Temporal Knowledge Graph Embedding Model based on Additive Time Series Decomposition
Abstract
Knowledge Graph (KG) embedding has attracted increasing attention in recent years. Most KG embedding models learn from time-unaware triples. However, including the temporal information attached to triples can further improve the performance of a KGE model. In this regard, we propose ATiSE, a temporal KG embedding model which incorporates time information into entity/relation representations by using additive time series decomposition. Moreover, considering the temporal uncertainty during the evolution of entity/relation representations over time, we map the representations of temporal KGs into the space of multi-dimensional Gaussian distributions. The mean of each entity/relation embedding at a time step shows the current expected position, whereas its covariance (which is temporally stationary) represents its temporal uncertainty. Experimental results show that ATiSE not only achieves the state of the art on link prediction over temporal KGs, but can also predict the occurrence time of facts with missing time annotations, as well as the existence of future events. To the best of our knowledge, no other model is capable of performing all these tasks.
1 Introduction
Knowledge Graphs (KGs) are used for gathering and organizing scattered human knowledge into structured knowledge systems. YAGO [YAGO], NELL [NELL], DBpedia [Dbpedia] and Freebase [Freebase] are among the existing KGs that have been successfully used in various applications, including question answering, assistant systems and information retrieval. In these KGs, knowledge can be represented as RDF triples (s, p, o) in which s (subject) and o (object) are entities (nodes), and p (predicate) is the relation (edge) between them.
KG embedding attempts to learn the representations of entities and relations in high-dimensional latent feature spaces while preserving certain properties of the original graph. Recently, KG embedding has become a very active research topic due to the wide range of downstream applications. Different KG embedding models have been proposed so far to efficiently learn the representations of KGs and perform KG completion as well as inference [TransE, TransH, DISTMULT, KG2E, ComplEx, SimplE].
We notice that most existing KG embedding models solely learn from time-unknown facts and ignore the useful temporal information in KGs. In fact, there are many time-aware facts (or events) in some temporal KGs. For example, (Obama, wasBornIn, Hawaii) happened on August 4, 1961, and (Obama, presidentOf, USA) was true from 2009 to 2017. Temporal KGs, e.g., the Integrated Crisis Early Warning System (ICEWS) [ICEWS2015], the Global Database of Events, Language, and Tone (GDELT) [GDELT], YAGO3 [YAGO3] and Wikidata [Wikidata], store such temporal information either explicitly or implicitly. Traditional KGE models such as TransE learn only from time-unknown facts. Therefore, they cannot distinguish relations with similar semantic meaning. For instance, they often confuse relations such as wasBornIn and diedIn when predicting (person, ?, location).
To tackle this problem, temporal KGE (TKGE) models [leblay, HyTE, TATransE, DESimplE] encode time information in their embeddings. TKGE models outperform traditional KGE models on link prediction over temporal KGs, which confirms that incorporating time information can further improve the performance of a KGE model. Some existing TKGE models embed time information into a latent space, e.g., representing time as a vector. These models cannot capture some properties of time information such as the length of a time interval or the order of two time points. Moreover, they ignore the uncertainty during the temporal evolution. We argue that the evolution of entity representations is partly random, because the features of an entity at a certain time are not completely determined by past information. For example, (Steve Jobs, diedIn, California) happened on 2011-10-05. The semantic characteristics of this entity underwent a sudden change at this time point. However, due to the incompleteness of knowledge in KGs, this change cannot be predicted solely from the past evolutionary trend. Therefore, the representation of Steve Jobs should include a random component to handle this uncertainty, e.g., a Gaussian noise component.
To address the above problems, in this paper we propose a temporal KG embedding model, ATiSE, which uses additive time series decomposition to capture the evolution process of KG representations. ATiSE fits the evolution process of an entity or relation as a multi-dimensional additive time series composed of a trend component, a seasonal component and a random component. Our approach represents each entity and relation as a multi-dimensional Gaussian distribution at each time step in order to introduce a random component. The mean of an entity/relation representation at a certain time step indicates its current expected position, which is obtained from its initial representation, its linear change term and its seasonality term. The covariance, which describes the temporal uncertainty during its evolution, is denoted as a constant diagonal matrix for computational efficiency. Our contributions are as follows.

Learning representations for temporal KGs is a relatively unexplored problem, because most existing KG embedding models only learn from time-unknown facts. We propose ATiSE, a new KG embedding model which incorporates time information into the KG representations.

We specially consider the temporal uncertainty during the evolution process of KG representations. Thus, we model each entity as a Gaussian distribution at each time step and use the KL divergence between two Gaussian distributions to compute the scores of facts for optimization.

Different from the previous temporal KG embedding models which use time embedding to incorporate time information, ATiSE fits the evolution process of KG representations as a multidimensional additive time series. Our work establishes a previously unexplored connection between relational processes and time series analysis with a potential to open a new direction of research on reasoning over time.

The application of time series decomposition enables us to predict the occurrence of a fact at a future time. Other TKGE models are not able to perform this new task, i.e., future event prediction. ATiSE also outperforms other TKGE models and the state-of-the-art static KGE models on link prediction and time prediction. To the best of our knowledge, ATiSE is the only TKGE model which can perform all three of these tasks over TKGs.
The rest of the paper is organized as follows: in Section 2, we review related work; in Section 3, we introduce the architecture and the learning process of our proposed models; in Sections 4 and 5, we compare the performance of our models with state-of-the-art models; in Section 6, we conclude.
2 Related Work
A large amount of research has been done on KG embeddings. These approaches can generally be categorized into two groups, namely semantic matching models and translational distance models [survey1]. RESCAL [RESCAL] and its extensions, e.g., DistMult [DISTMULT], ComplEx [ComplEx] and SimplE [SimplE], are semantic matching models. These models measure the plausibility of facts by matching latent semantics of entities and relations embodied in their vector space representations. Examples of translational distance models include TransE [TransE], TransH [TransH] and TransD [TransD]. These models measure the plausibility of a fact as the distance between the two entities, usually after a translation carried out by the relation. In particular, KG2E [KG2E] takes into account the uncertainty of KG representations and represents entities and relations as random vectors drawn from multivariate Gaussian distributions. KG2E scores a fact by measuring the distance between the distributions of the entities and the relation. In addition, RotatE [RotatE] achieves state-of-the-art results on link prediction by using relational rotations in complex space instead of relational translations.
The above methods achieve good results on link prediction in KGs. Moreover, recent research shows that the performance of KG embedding models can be further improved by incorporating the time information in temporal KGs. TAE [TAE2016] imposes temporal order constraints on time-sensitive relation pairs, e.g., wasBornIn and diedIn, where the prior relation is supposed to lie close to the subsequent relation after a temporal transition. TAE only uses temporal order information between relations, not the exact time information in facts. TTransE [leblay] proposes scoring functions which incorporate time representations into a TransE-type score function in different ways. TA-TransE and TA-DistMult [TATransE] utilize recurrent neural networks to learn time-aware representations of relations and use the standard scoring functions of TransE and DistMult. HyTE [HyTE] encodes time in the entity-relation space by associating a corresponding hyperplane to each timestamp. Inspired by diachronic word embeddings, DE-SimplE [DESimplE] uses diachronic entity embeddings to represent entities at different time steps and exploits the same score function as SimplE [SimplE] to score the plausibility of a quadruple. DE-SimplE has been shown to achieve state-of-the-art results on link prediction over three benchmark TKG datasets. The above four methods represent each time step as a latent feature vector or a hyperplane matrix and update the entity/relation representations at different time steps with the corresponding time representations. This means that all entity/relation representations share the same evolution trend, and these models cannot represent a future point in time or predict a future event.
In this paper, we fit the temporal evolution of entity/relation representations by deploying additive time series decomposition. In contrast to the above TKGE models, each entity/relation representation in our model has its own evolution trend over time. Moreover, we map the entity and relation representations into a space of multi-dimensional Gaussian distributions to model the randomness during the temporal evolution of KG representations. Our model is able to predict the occurrence time of facts with missing time annotations, as well as the existence of future events. To the best of our knowledge, the above TKGE models are incapable of performing either of these tasks.
3 Our Method
In this section, we present a detailed description of our proposed method, ATiSE, which not only uses relational properties between entities in triples but also incorporates the associated temporal metadata by using additive time series decomposition.
3.1 Additive Time Series Embedding Model
A time series is a series of time-oriented data. Time series analysis is widely used in many fields, ranging from economics and finance to managing production operations and analyzing political and social policy [timeseries]. An important technique for time series analysis is additive time series decomposition, which decomposes a time series into three components, i.e., a trend component, a seasonal component and an irregular component (i.e., "noise").
In our method, we regard the evolution of an entity/relation representation as an additive time series. Figure 1 shows an instance of additive time series decomposition. In our model, the time series is the evolution process of one dimension of an entity/relation representation over time. For each entity/relation, we use a linear function and a sine function to fit the trend component and the seasonal component, respectively, due to their simplicity. Considering the efficiency of model training, we model the irregular term with Gaussian noise instead of a moving average model (MA model) [ARIMA], since training an MA model requires a global optimization algorithm which leads to a higher computational cost.
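The decomposition described above can be sketched numerically. The series, its parameters and the recovery step below are illustrative assumptions, not the paper's data or code:

```python
import numpy as np

# A synthetic series that is the sum of a linear trend, a sine seasonal term
# and Gaussian noise, mirroring y(t) = trend(t) + seasonal(t) + noise(t).
rng = np.random.default_rng(0)
t = np.arange(200, dtype=float)
trend = 0.05 * t                                # linear trend component
seasonal = 0.3 * np.sin(2 * np.pi * t / 12.0)   # seasonal component (period 12)
noise = rng.normal(0.0, 0.05, size=t.shape)     # irregular (Gaussian) component
y = trend + seasonal + noise

# The linear trend can be recovered by least squares, which is essentially
# what fitting ATiSE's linear trend term amounts to during training.
slope, intercept = np.polyfit(t, y, 1)
```

The seasonal and noise terms average out over many periods, so the fitted slope stays close to the true trend coefficient.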
To incorporate temporal information into traditional KGs, a new temporal dimension is added to fact triples, denoted as a quadruple $(s, p, o, t)$. It represents the creation of the relationship edge $p$ between subject entity $s$ and object entity $o$ at time step $t$. The score term $f_t(s, p, o)$ can represent the conditional probability or the confidence value of this event, where $\mathbf{e}_{s,t}$, $\mathbf{p}_t$ and $\mathbf{e}_{o,t}$ are the representations of $s$, $p$ and $o$ at time step $t$. In the case of a long-term fact $(s, p, o, [t_s, t_e])$, we consider it to be a positive triple for each time step between $t_s$ and $t_e$, where $t_s$ and $t_e$ denote the start and end time during which the triple is valid.
At each time step, the time-specific representation of an entity or a relation is updated as $\mathbf{e}_{i,t}$ or $\mathbf{p}_{i,t}$. In order to avoid information redundancy, we only incorporate time information into entity representations or relation representations, but not both. The model where time information is incorporated into relation representations is denoted as ATiSER, and the model with evolving entity representations is called ATiSEE. Thus, the score of a quadruple $(s, p, o, t)$ can be represented as $f(\mathbf{e}_s, \mathbf{p}_t, \mathbf{e}_o)$ or $f(\mathbf{e}_{s,t}, \mathbf{p}, \mathbf{e}_{o,t})$. Due to the similarity between ATiSEE and ATiSER, we take ATiSEE as an example to describe our method in this section.
In our proposed model ATiSEE, we utilize additive time series decomposition to fit the evolution processes of entity representations as:
(1) $\mathbf{e}_{i,t} = \mathbf{e}_i + \alpha_i \mathbf{w}_i t + \boldsymbol{\beta}_i \sin(2\pi \boldsymbol{\omega}_i t) + \mathcal{N}(0, \Sigma_i)$

where $\mathbf{e}_i$ is the time-independent latent representation of the $i$-th entity, which is subject to $\|\mathbf{e}_i\|_2 \le 1$. $\alpha_i \mathbf{w}_i t$ is the trend component, where the coefficient $\alpha_i$ denotes its evolutionary rate and the vector $\mathbf{w}_i$ represents the direction of its evolution, which is restricted to $\|\mathbf{w}_i\|_2 = 1$. $\boldsymbol{\beta}_i \sin(2\pi \boldsymbol{\omega}_i t)$ is the seasonal component, where $\boldsymbol{\beta}_i$ and $\boldsymbol{\omega}_i$ denote the amplitude vector and the frequency vector. The Gaussian noise $\mathcal{N}(0, \Sigma_i)$ is the random component, where $\Sigma_i$ denotes the corresponding covariance matrix.
In other words, for a fact $(s, p, o, t)$, the entity embeddings $\mathbf{e}_{s,t}$ and $\mathbf{e}_{o,t}$ obey Gaussian probability distributions: $\mathbf{e}_{s,t} \sim \mathcal{N}(\bar{\mathbf{e}}_{s,t}, \Sigma_s)$ and $\mathbf{e}_{o,t} \sim \mathcal{N}(\bar{\mathbf{e}}_{o,t}, \Sigma_o)$, where $\bar{\mathbf{e}}_{s,t}$ and $\bar{\mathbf{e}}_{o,t}$ are the mean vectors of $\mathbf{e}_{s,t}$ and $\mathbf{e}_{o,t}$, which do not include the random components. Similarly, the predicate $p$ is represented as $\mathbf{p} \sim \mathcal{N}(\boldsymbol{\mu}_r, \Sigma_r)$, where $\boldsymbol{\mu}_r$ is its mean vector.
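The mean part of this evolution (Eq. 1 without the noise term) can be sketched as follows; the dimensionality and all parameter values below are hypothetical:

```python
import numpy as np

def evolving_mean(e, alpha, w, beta, omega, t):
    """Mean of an entity embedding at time step t (Eq. 1 without the random
    component): e_i + alpha_i * w_i * t + beta_i * sin(2*pi*omega_i*t).
    Sampling Gaussian noise N(0, Sigma_i) on top would give the full
    stochastic representation."""
    return e + alpha * t * w + beta * np.sin(2.0 * np.pi * omega * t)

# Hypothetical 4-dimensional entity: unit-norm-ish base embedding and a
# unit-norm evolution direction, scalar amplitude/frequency for simplicity.
e = np.array([0.5, 0.5, 0.5, 0.5])
w = np.array([1.0, 0.0, 0.0, 0.0])   # evolution direction, ||w||_2 = 1
mu_t0 = evolving_mean(e, alpha=0.1, w=w, beta=0.2, omega=0.5, t=0)
mu_t3 = evolving_mean(e, alpha=0.1, w=w, beta=0.2, omega=0.5, t=3)
```

At $t = 0$ the trend and seasonal terms vanish, so the mean equals the time-independent embedding; later steps drift along $\mathbf{w}$ and oscillate with the seasonal term.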
Similar to translation-based KGE models, we consider the transformation result of ATiSEE from the subject to the object to be akin to the predicate in a positive fact. We use the following formula to express this transformation: $\mathbf{e}_{s,t} - \mathbf{e}_{o,t}$, which corresponds to the probability distribution $P_e \sim \mathcal{N}(\boldsymbol{\mu}_e, \Sigma_e)$. Here, $\boldsymbol{\mu}_e = \bar{\mathbf{e}}_{s,t} - \bar{\mathbf{e}}_{o,t}$ and $\Sigma_e = \Sigma_s + \Sigma_o$. As a result, combined with the probability distribution of the relation, $P_r \sim \mathcal{N}(\boldsymbol{\mu}_r, \Sigma_r)$, we measure the similarity between $P_e$ and $P_r$ to score the fact.
Table 1: Statistics of the datasets.

Dataset  #Entities  #Relations  #Time Steps  Time Span  #Training  #Validation  #Test
ICEWS14  6,869  230  365  2014  72,826  8,941  8,963
ICEWS05-15  10,094  251  4,017  2005–2015  368,962  46,275  46,092
YAGO11k  10,623  10  70  453–2844  16,408  2,050  2,051
Wikidata12k  12,554  24  81  1709–2018  32,497  4,062  4,062
The KL divergence is a straightforward way of measuring the similarity of two probability distributions. We optimize the following score function based on the KL divergence between the entity-transformed distribution and the relation distribution [KLdivergence].
(2) $f(s, p, o, t) = D_{KL}(P_e \,\|\, P_r) = \frac{1}{2} \left\{ \mathrm{tr}(\Sigma_r^{-1} \Sigma_e) + (\boldsymbol{\mu}_r - \boldsymbol{\mu}_e)^{\top} \Sigma_r^{-1} (\boldsymbol{\mu}_r - \boldsymbol{\mu}_e) - \ln \frac{\det(\Sigma_e)}{\det(\Sigma_r)} - d \right\}$

where $\mathrm{tr}(\Sigma)$ and $\Sigma^{-1}$ indicate the trace and inverse of the covariance matrix, respectively, and $d$ is the embedding dimensionality.
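For diagonal covariances the KL score above reduces to elementwise sums, which is the reason ATiSE restricts covariances to be diagonal. A minimal sketch (illustrative, not the paper's code), with covariances given as variance vectors:

```python
import numpy as np

def kl_diag(mu_e, var_e, mu_r, var_r):
    """KL(P_e || P_r) for Gaussians with diagonal covariances, passed as
    variance vectors. With diagonal matrices, the trace, inverse and
    log-determinant in Eq. (2) all become cheap elementwise operations."""
    d = mu_e.shape[-1]
    diff = mu_r - mu_e
    return 0.5 * (np.sum(var_e / var_r)            # tr(Sigma_r^-1 Sigma_e)
                  + np.sum(diff * diff / var_r)    # Mahalanobis term
                  - d
                  + np.sum(np.log(var_r) - np.log(var_e)))  # -ln det ratio
```

The score is zero when the two distributions coincide and grows as their means or variances diverge, so lower scores indicate more plausible facts.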
Considering the simplified diagonal covariance, we can compute the trace and inverse of the covariance matrices simply and efficiently for ATiSEE. Using the standard identities $\frac{\partial \ln \det \Sigma}{\partial \Sigma} = \Sigma^{-1}$, $\frac{\partial \,\mathrm{tr}(\Sigma_r^{-1}\Sigma_e)}{\partial \Sigma_e} = \Sigma_r^{-1}$ and $\frac{\partial\, \mathbf{x}^{\top}\Sigma^{-1}\mathbf{x}}{\partial \mathbf{x}} = 2\Sigma^{-1}\mathbf{x}$ [petersen2008matrix], we can compute the gradients of Equation (2) with respect to the time-independent latent feature vectors, the evolutionary direction vectors and the covariance matrices (here acting as vectors) as follows:

(3) $\frac{\partial f}{\partial \boldsymbol{\mu}_e} = \Sigma_r^{-1}(\boldsymbol{\mu}_e - \boldsymbol{\mu}_r), \qquad \frac{\partial f}{\partial \Sigma_e} = \frac{1}{2}\left(\Sigma_r^{-1} - \Sigma_e^{-1}\right)$

where $\boldsymbol{\mu}_e = \bar{\mathbf{e}}_{s,t} - \bar{\mathbf{e}}_{o,t}$ and $\Sigma_e = \Sigma_s + \Sigma_o$; the gradients with respect to the individual embedding parameters then follow by the chain rule.
3.2 Learning
We use the same loss function as the negative sampling loss of [RotatE] to effectively optimize our proposed models:

(4) $\mathcal{L} = \sum_{t \in \mathcal{T}} \sum_{\xi \in \mathcal{D}_t^{+}} \Big( -\log \sigma(\gamma - f(\xi)) - \sum_{\xi' \in \mathcal{D}_t^{-}} \frac{1}{|\mathcal{D}_t^{-}|} \log \sigma(f(\xi') - \gamma) \Big)$

where $\mathcal{T}$ is the set of time steps in the temporal KG, $\mathcal{D}_t^{+}$ is the set of positive triples with timestamp $t$, $\mathcal{D}_t^{-}$ is the set of negative samples corresponding to $\xi$, $\gamma$ is a fixed margin and $\sigma$ is the sigmoid function. In this paper, we not only generate negative samples by randomly corrupting the subjects or objects of the positives, e.g., $(s', p, o, t)$ and $(s, p, o', t)$, but also add extra negative samples which are present in the KG but do not exist in the subgraph for a particular time [HyTE]. We use this time-dependent negative sampling approach for time prediction. On the other hand, to compare our model with the baseline models fairly, we use the uniform negative sampling method [TransE] for link prediction and future event prediction. To avoid overfitting, we add some regularizations while learning the Gaussian embeddings. As described in Section 3.1, the norms of the original representations of entities and relations, as well as the norms of all evolutionary direction vectors, are bounded by 1. Besides, the following constraint is considered for the covariances when we minimize the loss [KG2E]:
(5) $c_{\min} I \le \Sigma_l \le c_{\max} I, \quad \forall\, l \in \mathcal{E} \cup \mathcal{R}$

where $\mathcal{E}$ and $\mathcal{R}$ are the sets of entities and relations, respectively, and $c_{\min}$ and $c_{\max}$ are two positive constants. During training, we achieve this regularization for diagonal covariance matrices by clipping each diagonal element into $[c_{\min}, c_{\max}]$. These constraints on the means and covariances are also enforced during initialization.
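The per-sample loss of Eq. (4) and the covariance constraint of Eq. (5) can be sketched as below. This is a minimal illustration; the function names and the clip-based enforcement are our assumptions, not the paper's implementation:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def negative_sampling_loss(pos_score, neg_scores, gamma):
    """Negative sampling loss in the style of RotatE (Eq. 4) for one positive
    quadruple and its negatives, assuming the score f is distance-like
    (lower = more plausible) and gamma is the fixed margin."""
    loss = -math.log(sigmoid(gamma - pos_score))
    loss -= sum(math.log(sigmoid(s - gamma)) for s in neg_scores) / len(neg_scores)
    return loss

def clip_variance(var, c_min, c_max):
    """Enforce c_min*I <= Sigma <= c_max*I (Eq. 5) for a diagonal covariance
    by elementwise clipping of its diagonal, as done for KG2E-style models."""
    return [min(max(v, c_min), c_max) for v in var]
```

Lowering the positive score (or raising the negative scores) reduces the loss, and clipping keeps every variance inside the admissible band.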
4 Experiments
To show the capability of ATiSE, we compare its variants with other state-of-the-art baselines and with LiTSE on link prediction. In particular, we also evaluate ATiSE and LiTSE on two other tasks: time prediction and future event prediction. A noteworthy point is that the other TKGE models are incapable of future event prediction.
4.1 Datasets
Common TKGs include the Integrated Crisis Early Warning System (ICEWS) [ICEWS2015], the Global Database of Events, Language, and Tone (GDELT) [GDELT], Wikidata [Wikidata] and YAGO3 [YAGO3]. TA-TransE and TA-DistMult use ICEWS14, ICEWS05-15, YAGO15k and Wikidata11k as datasets [TATransE]. However, the time intervals in YAGO15k and Wikidata11k contain either only start dates or only end dates, shaped like 'occursSince 2003' or 'occursUntil 2005'. In practice, most time intervals in Wikidata and YAGO3 are represented by both start dates and end dates, shaped like [2003-##-##, 2005-##-##]. Considering this point, we decided to abandon YAGO15k and Wikidata11k, and to use YAGO11k and Wikidata12k released in [HyTE] instead. In addition, DE-SimplE [DESimplE] extracts a subset of GDELT as a dataset for TKGE; however, the authors did not release this dataset.
To compare our model with the baselines, we test our models on the four datasets mentioned above, namely ICEWS14, ICEWS05-15, YAGO11k and Wikidata12k.
ICEWS is a repository that contains political events with specific time annotations, e.g., (Barack Obama, Make a visit, Ukraine, 2014-07-08). ICEWS14 and ICEWS05-15 are subsets of ICEWS [ICEWS2015] extracted by [TATransE]: ICEWS14 corresponds to the facts in 2014, and ICEWS05-15 corresponds to the facts from 2005 to 2015. These two datasets were filtered by only selecting the most frequently occurring entities in the graph [TATransE]. It is noteworthy that the time annotations in ICEWS are all time points.
YAGO11k is a subset of YAGO3 [YAGO3]. Different from ICEWS, a part of the time annotations in YAGO3 are represented as time intervals, e.g., (Paul Konchesky, playsFor, England national football team, [2003-##-##, 2005-##-##]). Similar to the setting in HyTE [HyTE], we only deal with year-level granularity by dropping the month and date information, and we treat the timestamps as 70 different time steps (intervals) in order to balance the numbers of triples across time steps.
Wikidata12k is a subset of Wikidata [Wikidata]. Similar to YAGO11k, Wikidata12k contains some facts involving time intervals. We treat timestamps as 81 different time steps by using the same setting as YAGO11k.
The statistics of the datasets are listed in Table 1. We compare our method and the other baselines by performing link prediction on ICEWS14, ICEWS05-15, YAGO11k and Wikidata12k. In addition, we evaluate the performance of our proposed models on time prediction over ICEWS14 and YAGO11k, as well as on future event prediction over ICEWS14 and ICEWS05-15.
Table 2: Link prediction results on ICEWS14 and ICEWS05-15. Hits@k values are given in %; "–" marks results not reported; starred rows list the results reported in [TATransE].

  ICEWS14  ICEWS05-15
Model  MRR  Hits@1  Hits@3  Hits@10  MRR  Hits@1  Hits@3  Hits@10
TransE  0.310  14.2  42.1  58.9  0.274  15.8  32.9  47.9
DistMult  0.430  31.7  50.2  66.6  0.309  17.6  40.1  60.5
KG2E  0.507  39.7  57.7  70.5  0.439  30.8  50.7  69.9
ComplEx  0.481  36.9  55.8  70.2  0.471  34.7  55.5  72.3
RotatE  0.500  38.1  58.6  71.2  0.505  38.9  54.9  68.7
TransE*  0.280  9.4  –  63.7  0.294  9.0  –  66.3
DistMult*  0.439  32.3  –  67.2  0.456  33.7  –  69.1
SimplE  0.458  34.1  51.6  68.7  0.478  35.9  53.9  70.8
TTransE  0.246  9.8  32.9  51.9  0.295  14.5  39.0  56.4
HyTE  0.321  13.4  44.9  63.5  0.340  16.8  45.5  63.1
TA-TransE  0.275  9.5  –  62.5  0.299  9.6  –  66.9
TA-DistMult  0.477  36.3  –  68.6  0.474  34.6  –  72.8
DE-SimplE  0.526  41.8  59.2  72.5  0.513  39.2  57.8  74.8
ATiSEE  0.569  46.3  63.9  76.3  0.520  39.7  59.5  77.3
ATiSER  0.571  46.5  64.3  75.5  0.484  35.0  55.8  74.9
Table 3: Link prediction results on Wikidata12k and YAGO11k. Hits@k values are given in %.

  Wikidata12k  YAGO11k
Model  MRR  Hits@1  Hits@3  Hits@10  MRR  Hits@1  Hits@3  Hits@10
TransE  0.438  37.1  46.2  56.5  0.196  6.7  26.5  41.0
DistMult  0.317  26.9  35.4  40.1  0.196  16.6  19.4  26.1
KG2E  0.462  41.6  48.4  59.7  0.205  17.8  20.7  28.7
ComplEx  0.343  26.3  38.5  48.8  0.190  15.7  19.6  25.2
RotatE  0.506  43.6  54.5  63.9  0.226  12.1  27.0  41.9
TTransE  0.295  14.5  39.0  56.4  0.210  8.4  27.6  40.3
HyTE  0.457  38.6  48.2  60.6  0.199  7.1  26.7  40.6
ATiSEE  0.530  44.5  57.0  68.5  0.260  19.9  27.9  39.6
ATiSER  0.555  48.2  57.9  69.4  0.245  18.6  26.0  35.6
4.2 Evaluation Protocol
In this paper, we report the experimental results on three tasks: Link Prediction, Time Prediction and Future Event Prediction.
Link Prediction:
This task is to complete a fact with a missing entity. For a test quadruple $(s, p, o, t)$, we generate corrupted quadruples by replacing $s$ or $o$ with all possible entities. We sort all the quadruples, i.e., the corrupted quadruples together with the test quadruple, by their scores and obtain the rank of the test quadruple. Two evaluation metrics are used here: the Mean Reciprocal Rank (MRR), which is the mean of the reciprocal values of all computed ranks, and Hits@k, which is the fraction of test quadruples ranked in the top k. We report results under the filtered setting described in [TransE].
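Given the collected ranks, both metrics are easy to compute; the helper below is illustrative, not the evaluation code used in the paper:

```python
def mrr_and_hits(ranks, k=10):
    """Mean Reciprocal Rank and Hits@k from the (filtered) ranks of the
    test quadruples. `ranks` holds one 1-based rank per test quadruple."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits_k = sum(1 for r in ranks if r <= k) / len(ranks)
    return mrr, hits_k
```

For example, ranks of 1, 2, 10 and 20 give an MRR of 0.4125 and Hits@10 of 0.75.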
Time Prediction:
This task is to complete a fact with a missing time annotation. For a test quadruple $(s, p, o, t)$, we generate corrupted quadruples by replacing $t$ with all possible time annotations. Following the filtered protocol, the corrupted quadruples must not themselves be part of the graph. We again obtain the rank of the test quadruple among the corrupted quadruples and report MRR and Hits@k.
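The filtered ranking over candidate timestamps can be sketched as below; `score_fn` is a hypothetical stand-in for any distance-like scorer such as Eq. (2):

```python
def time_rank_filtered(score_fn, s, p, o, true_t, time_steps, known_quads):
    """Filtered rank of the correct timestamp: candidate quadruples
    (s, p, o, t') that already appear in the graph are skipped, and the
    rank counts how many remaining candidates score strictly better than
    the true time step. `score_fn` is distance-like (lower = better)."""
    true_score = score_fn(s, p, o, true_t)
    rank = 1
    for t in time_steps:
        if t == true_t or (s, p, o, t) in known_quads:
            continue
        if score_fn(s, p, o, t) < true_score:
            rank += 1
    return rank
```

With a toy scorer that prefers timestamps near 5, the true timestamp 5 ranks first, while a less-preferred true timestamp is pushed down by the better-scoring candidates.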
Future Event Prediction:
We re-split the facts into training, validation and test sets in a proportion of 80%/10%/10%, such that all facts in the test set occur after the facts in the training/validation sets. We train the model on the training set and judge whether the quadruples in the test set are positive or not. The decision process is similar to triple classification [NTN]: for a fact $(s, p, o, t)$, if its score $f(s, p, o, t)$ is below a relation-specific threshold, it is predicted positive; otherwise negative. The thresholds are determined on the validation set.
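The threshold rule and a simple way to fit the relation-specific thresholds on validation scores can be sketched as follows (illustrative; the exact threshold search may differ from the [NTN]-style procedure):

```python
def predict_positive(score, threshold):
    """A quadruple is classified as a true (future) event iff its
    distance-like score falls below the relation-specific threshold."""
    return score < threshold

def pick_threshold(pos_scores, neg_scores):
    """Choose a relation-specific threshold on validation data by sweeping
    over the observed scores and maximizing classification accuracy."""
    def accuracy(th):
        correct = sum(s < th for s in pos_scores) + sum(s >= th for s in neg_scores)
        return correct / float(len(pos_scores) + len(neg_scores))
    return max(sorted(pos_scores + neg_scores), key=accuracy)
```

When the positive and negative validation scores are well separated, the sweep lands between the two groups and test quadruples are classified accordingly.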
For the link prediction task, we compare our method with several state-of-the-art KGE models and existing temporal KGE models, including TransE [TransE], DistMult [DISTMULT], KG2E [KG2E], ComplEx [ComplEx], RotatE [RotatE], SimplE [SimplE], TTransE [leblay], HyTE [HyTE], TA-TransE and TA-DistMult [TATransE], as well as DE-SimplE [DESimplE]. Besides link prediction, we also compare the performance of ATiSEE and ATiSER with HyTE and TTransE on time prediction over ICEWS14 and YAGO11k, based on our implementation. Considering that the above temporal KGE models are not capable of representing a future time step, we compare our models with the above static KGE models for future event prediction over ICEWS14 and ICEWS05-15.
4.3 Experimental Setup
We implemented our models and most of the baseline models in PyTorch, except DE-SimplE [DESimplE], TA-TransE and TA-DistMult [TATransE]. For these three models, we take the results reported in [DESimplE] and [TATransE] on ICEWS14 and ICEWS05-15. Their results on Wikidata12k and YAGO11k could not be obtained, since the codes of DE-SimplE, TA-TransE and TA-DistMult are not available and some technical details are missing in the original papers, which makes reimplementation difficult. For the fairness of the results, we used a similar experimental setup as in [DESimplE], but with a lower ratio of negative over positive training samples due to the limitation of computing resources. We used the Adagrad optimizer to train all the implemented models and selected the optimal hyperparameters by early stopping according to the MRR on the validation set, restricting the number of iterations to 5000. For all the models, the batch size was kept fixed across datasets. We tuned the embedding dimensionality, the learning rate and the ratio of negative over positive training samples over grids of candidate values. For translation-based models, the margins were varied in the range {1, 2, 3, 5, 10, 15, ..., 50}. For semantic matching models, the regularizer weights were chosen from the set {0.001, 0.01, 0.1}. Similar to the setting in KG2E [KG2E], we selected the pair of restriction values $(c_{\min}, c_{\max})$ for the covariances among {(0.003, 0.3), (0.005, 0.5), (0.01, 1), (0.03, 3)} for the Gaussian embedding models. The selected configurations were used for all three tasks.
5 Results and Analysis
The results for the different tasks reported below are based on the above-mentioned experimental setup.
5.1 Link Prediction
Tables 2 and 3 show the results for the link prediction task. On ICEWS14, ATiSER and ATiSEE outperformed all baseline models in terms of MRR, Hits@1, Hits@3 and Hits@10. On ICEWS05-15, ATiSEE had the best performance among all embedding models, and ATiSER beat almost all baselines except DE-SimplE [DESimplE]; DE-SimplE and ATiSER had similar results on Hits@10 (74.8% vs 74.9%). It is noteworthy that the ratio of negative over positive samples used in [DESimplE] was 500, much higher than our setting. [ComplEx] investigated the influence of this ratio on KG embedding models and discovered that increasing it can lead to better results. Thus, the results reported in [DESimplE] would likely become worse if the same ratio as ours were used. On YAGO11k and Wikidata12k, where some relations between entities persist over multiple time steps, ATiSEE and ATiSER outperformed all baseline models except DE-SimplE [DESimplE], TA-TransE and TA-DistMult [TATransE] in terms of MRR and Hits@1. ATiSER had the best performance on Wikidata12k regarding Hits@3 and Hits@10. On YAGO11k, ATiSEE had the best performance regarding Hits@3 and performed well on Hits@10.
The results of DE-SimplE, TA-TransE and TA-DistMult on Wikidata12k and YAGO11k could not be obtained because the codes of these models are not available and some implementation details are missing in the original papers. Moreover, DE-SimplE, TA-DistMult and TA-TransE mainly focus on event-based datasets where all time annotations are time points. Although TA-TransE and TA-DistMult can capture some special temporal modifiers in YAGO15k and Wikidata11k [TATransE], i.e., 'occursSince' and 'occursUntil', they are unable to learn the common time intervals in YAGO [YAGO3] and Wikidata [Wikidata], e.g., [2003-##-##, 2005-##-##].
Another noteworthy finding is that, in our implementation, KG2E significantly outperformed TransE and obtained results similar to ComplEx and RotatE. On the other hand, the results from [RotatE] and [ComplEx] showed that RotatE and ComplEx performed remarkably better than KG2E on static KGs, e.g., FB15k and WN18 [TransE].
These results illustrate that modeling the uncertainty in TKGs by mapping KG representations into the space of multi-dimensional Gaussian distributions can help improve the performance of KGE models on TKGs. To support this view, we conducted an ablation study on how the random components of the entity/relation representations in ATiSEE and ATiSER affect model performance.
5.2 Ablation Study
In this work, we also developed three other TKGE models as baselines, namely LiTSE, LiSTSE and LiRTSE. In LiTSE, each entity/relation representation only includes one component, i.e., a linear trend component. In LiSTSE, each entity/relation representation consists of two components, namely a linear trend component and a seasonal component. In LiRTSE, each entity/relation representation consists of a linear trend component and a random component. As with ATiSE, the variants of LiTSE, LiSTSE and LiRTSE are named LiTSEE, LiTSER, LiSTSEE, LiSTSER, LiRTSEE and LiRTSER. In LiTSEE, LiSTSEE and LiRTSEE, the evolving representation of an entity is respectively represented as:
(6) $\mathbf{e}_{i,t} = \mathbf{e}_i + \alpha_i \mathbf{w}_i t, \qquad \mathbf{e}_{i,t} = \mathbf{e}_i + \alpha_i \mathbf{w}_i t + \boldsymbol{\beta}_i \sin(2\pi \boldsymbol{\omega}_i t), \qquad \mathbf{e}_{i,t} = \mathbf{e}_i + \alpha_i \mathbf{w}_i t + \mathcal{N}(0, \Sigma_i)$
We use the same score function as ATiSE to measure the plausibility of a fact for LiRTSE. For LiTSE and LiSTSE, we use the following translation-based scoring function, similar to [TransE]:

(7) $f(s, p, o, t) = \|\mathbf{e}_{s,t} + \mathbf{p} - \mathbf{e}_{o,t}\|$

where $\|\cdot\|$ denotes the $L_1$ or $L_2$ norm.
We trained these models on ICEWS14 and YAGO11k, and searched for their optimal hyperparameters under the same experimental setup as ATiSE. By comparing the performance of LiTSE, LiSTSE, LiRTSE and ATiSE, we can analyze the impact of the different components of the entity/relation representations. The link prediction results of all these variants in terms of MRR and Hits@1 are shown in Table 4.
Table 4: Ablation study: link prediction results on ICEWS14 and YAGO11k. Hits@1 values are given in %.

  ICEWS14  YAGO11k
Model  MRR  Hits@1  MRR  Hits@1
LiTSEE  0.342  11.0  0.196  6.6
LiTSER  0.312  7.9  0.196  6.7
LiSTSEE  0.325  7.5  0.215  8.0
LiSTSER  0.297  6.2  0.214  7.7
LiRTSEE  0.567  45.8  0.257  19.0
LiRTSER  0.568  46.4  0.243  18.1
ATiSEE  0.569  46.3  0.260  19.9
ATiSER  0.571  46.5  0.245  18.6
From Table 4, we find that, compared to LiTSE and LiSTSE, LiRTSE and ATiSE achieved remarkable improvements on both datasets by adding the random component to the evolving entity/relation representations. From these results, we conclude that modeling the temporal uncertainty in TKGs can significantly improve the performance of KGE models on TKGs.
On the other hand, the performance of LiTSE on ICEWS14 was better than that of LiSTSE, and LiRTSE obtained results very close to ATiSE on ICEWS14, although LiSTSE and ATiSE use more parameters to learn the seasonal terms of the entity/relation representations. On the contrary, on YAGO11k the performance of LiSTSE and ATiSE improved over LiTSE and LiRTSE by adding seasonal terms to the entity/relation representations. This difference is caused by the difference between the datasets: ICEWS14 does not exhibit any seasonal patterns, since each fact in ICEWS14 occurs at an instant, whereas YAGO11k contains both short-term and long-term relations. Adding seasonal components to the evolving entity/relation representations helps distinguish short-term patterns from long-term patterns in YAGO11k. It can be seen from Table 5 that short-term relations learned by LiSTSER, e.g., wasBornIn, have high evolutionary rates, and their seasonal components have smaller amplitudes and higher frequencies than those of long-term relations, e.g., isMarriedTo.
Table 5: Statistics of relations in YAGO11k learned by LiSTSER: average time span (#TS, in time steps), evolutionary rate $\alpha$, seasonal amplitude $\|\boldsymbol{\beta}\|$ and seasonal frequency $\|\boldsymbol{\omega}\|$.

Relation  #TS  $\alpha$  $\|\boldsymbol{\beta}\|$  $\|\boldsymbol{\omega}\|$
wasBornIn  1.0  0.122  0.002  3.376
worksAt  18.7  0.044  0.071  0.137
playsFor  4.7  0.075  0.013  0.495
hasWonPrize  28.6  0.026  0.137  0.022
isMarriedTo  16.5  0.061  0.175  0.171
owns  24.9  0.018  0.234  0.034
graduatedFrom  38.1  0.008  0.056  0.015
diedIn  1.0  0.147  0.005  1.984
isAffiliatedTo  25.8  0.018  0.044  0.065
created  27.1  0.041  0.198  0.081
5.3 Time Prediction
As mentioned in the previous section, we corrupt the time information in positive facts to generate negative samples $(s, p, o, t')$ for time prediction. Table 6 shows the results of our proposed models for time prediction on ICEWS14 and YAGO11k. Since [TATransE] and [DESimplE] do not release their source codes and the implementation details are not completely clear in their papers, we compared our models with HyTE and TTransE based on our own implementations. The experimental results show that ATiSEE and ATiSER outperformed HyTE and TTransE on ICEWS14, and that ATiSEE also had the best performance on YAGO11k among the TKGE models.
Table 6: Time prediction results on ICEWS14 and YAGO11k. Hits@10 values are given in %.

  ICEWS14  YAGO11k
Model  MRR  Hits@10  MRR  Hits@10
TTransE  0.029  6.8  0.633  82.3
HyTE  0.139  22.9  0.718  85.9
ATiSEE  0.174  29.0  0.723  87.5
ATiSER  0.146  23.8  0.676  83.0
5.4 Future Event Prediction
Table 7 shows that ATiSEE remarkably outperformed the other embedding models on future event prediction. ComplEx achieved the best performance among the baseline models. Compared to ComplEx, ATiSEE and ATiSER improved the accuracy by 2.3% and 1.8% on ICEWS14, and by 3.9% and 1.4% on ICEWS05-15. These results show that incorporating time information into embedding models can improve their ability to predict future events, provided that the entity/relation representations at a future time step can be obtained. Previous TKGE models cannot perform future event prediction, since the entity/relation representations at a future time step are unknown in these models.
Table 7: Accuracy (%) of future event prediction on ICEWS14 and ICEWS05-15.

Model  ICEWS14  ICEWS05-15
TransE  69.6  64.2
DistMult  72.5  74.7
KG2E  70.6  71.8
ComplEx  72.9  76.6
RotatE  71.7  75.6
ATiSEE  75.2  80.5
ATiSER  74.7  78.0
6 Conclusion
We introduced ATiSE, a temporal KGE model that incorporates time information into KG representations by using additive time series decomposition. ATiSE fits the temporal evolution of KG representations as additive time series, which enables it to estimate the time information of a triple with a missing time annotation and to predict the occurrence of a future event. Considering the uncertainty during the temporal evolution of KG representations, ATiSE maps the representations of temporal KGs into the space of multi-dimensional Gaussian distributions, where the covariance of an entity/relation representation models its random component. Experimental results demonstrate that our method significantly outperforms state-of-the-art methods on link prediction, time prediction and future event prediction. Besides, the results of the ablation experiments show the effects of the different components of the entity/relation representations in ATiSE on model performance.
Our work establishes a previously unexplored connection between relational processes and time series analysis, with the potential to open a new direction of research on reasoning over time. In the future, we will explore using more sophisticated time series analysis techniques [timeseries] to model the temporal evolution of KG representations, e.g., the ARIMA model [ARIMA]. Along with accounting for temporal uncertainty, another benefit of using time series analysis is that it enables the embedding model to encode temporal rules. For instance, given two quadruples $(s, \text{wasBornIn}, o_1, t_1)$ and $(s, \text{diedIn}, o_2, t_2)$, there exists a temporal constraint $t_1 < t_2$. Since time is represented as a numerical variable in a time series model, it is feasible to incorporate such temporal rules into our models. We will investigate the possibility of encoding temporal rules into our proposed models.
We would like to thank the referees for their comments, which helped improve this paper considerably.