Temporal Knowledge Graph Embedding Model based on Additive Time Series Decomposition

Chengjin Xu University of Bonn, Germany, email: xuc@iai.uni-bonn.de    Mojtaba Nayyeri    Fouad Alkhoury    Jens Lehmann    Hamed Shariat Yazdi
Abstract

Knowledge Graph (KG) embedding has attracted increasing attention in recent years. Most KG embedding (KGE) models learn from time-unaware triples. However, including temporal information alongside triples can further improve the performance of a KGE model. In this regard, we propose ATiSE, a temporal KG embedding model which incorporates time information into entity/relation representations by using Additive Time Series decomposition. Moreover, considering the temporal uncertainty during the evolution of entity/relation representations over time, we map the representations of temporal KGs into the space of multi-dimensional Gaussian distributions. The mean of each entity/relation embedding at a time step shows the current expected position, whereas its covariance (which is temporally stationary) represents its temporal uncertainty. Experimental results show that ATiSE not only achieves state-of-the-art results on link prediction over temporal KGs, but can also predict the occurrence time of facts with missing time annotations, as well as the existence of future events. To the best of our knowledge, no other model is capable of performing all these tasks.

1 Introduction

Knowledge Graphs (KGs) are used for gathering and organizing scattered human knowledge into structured knowledge systems. YAGO [YAGO], NELL [NELL], DBpedia [Dbpedia] and Freebase [Freebase] are among the existing KGs that have been successfully used in various applications including question answering, assistant systems and information retrieval. In these KGs, knowledge can be represented as RDF triples (s, p, o) in which s (subject) and o (object) are entities (nodes), and p (predicate) is the relation (edge) between them.

KG embedding attempts to learn the representations of entities and relations in high-dimensional latent feature spaces while preserving certain properties of the original graph. Recently, KG embedding has become a very active research topic due to its wide range of downstream applications. Different KG embedding models have been proposed so far to efficiently learn the representations of KGs and perform KG completion as well as inference [TransE, TransH, DISTMULT, KG2E, ComplEx, SimplE].

We notice that most existing KG embedding models learn solely from time-unknown facts and ignore the useful temporal information in KGs. In fact, many temporal KGs contain time-aware facts (or events). For example, (Obama, wasBornIn, Hawaii) happened on August 4, 1961, and (Obama, presidentOf, USA) was true from 2009 to 2017. Temporal KGs, e.g. Integrated Crisis Early Warning System (ICEWS) [ICEWS2015], Global Database of Events, Language, and Tone (GDELT) [GDELT], YAGO3 [YAGO3] and Wikidata [Wikidata], store such temporal information either explicitly or implicitly. Traditional KGE models such as TransE learn only from time-unknown facts. Therefore, they cannot distinguish relations with similar semantic meaning. For instance, they often confuse relations such as wasBornIn and diedIn when predicting (person, ?, location).

To tackle this problem, temporal KGE (TKGE) models [leblay, HyTE, TA-TransE, DE-SimplE] encode time information in their embeddings. TKGE models outperform traditional KGE models on link prediction over temporal KGs, which shows that incorporating time information can further improve the performance of a KGE model. Some existing TKGE models embed time information into a latent space, e.g. by representing time as a vector. These models cannot capture some properties of time information, such as the length of a time interval or the order of two time points. Moreover, these models ignore the uncertainty during the temporal evolution. We argue that the evolution of entity representations involves randomness, because the features of an entity at a certain time are not completely determined by past information. For example, (Steve Jobs, diedIn, California) happened on 2011-10-05. The semantic characteristics of this entity should change suddenly at this time point. However, due to the incompleteness of knowledge in KGs, this change cannot be predicted from its past evolutionary trend alone. Therefore, the representation of Steve Jobs is supposed to include some random component to handle this uncertainty, e.g. a Gaussian noise component.

To address the above problems, we propose a temporal KG embedding model, ATiSE, which uses additive time series decomposition to capture the evolution process of KG representations. ATiSE fits the evolution process of an entity or relation as a multi-dimensional additive time series which is composed of a trend component, a seasonal component and a random component. Our approach represents each entity and relation as a multi-dimensional Gaussian distribution at each time step to introduce the random component. The mean of an entity/relation representation at a certain time step indicates its current expected position, which is obtained from its initial representation, its linear change term and its seasonality term. The covariance, which describes the temporal uncertainty during its evolution, is modeled as a constant diagonal matrix for computational efficiency. Our contributions are as follows.

  • Learning the representations for temporal KGs is a relatively unexplored problem because most existing KG embedding models only learn from time-unknown facts. We propose ATiSE, a new KG embedding model which incorporates time information into the KG representations.

  • We specifically consider the temporal uncertainty during the evolution process of KG representations. Thus, we model each entity as a Gaussian distribution at each time step and use the KL divergence between two Gaussian distributions to compute the scores of facts for optimization.

  • Different from the previous temporal KG embedding models which use time embedding to incorporate time information, ATiSE fits the evolution process of KG representations as a multi-dimensional additive time series. Our work establishes a previously unexplored connection between relational processes and time series analysis with a potential to open a new direction of research on reasoning over time.

  • The application of the time series decomposition technique enables us to predict the occurrence of a fact at a future time. Other TKGE models are not able to perform this new task, i.e., future event prediction. ATiSE also outperforms other TKGE models and the state-of-the-art static KGE models on link prediction and time prediction. To the best of our knowledge, ATiSE is the only TKGE model which can perform these three tasks over TKGs.

The rest of the paper is organized as follows: Section 2 reviews related work; Section 3 introduces the architecture and the learning process of our proposed models; Sections 4 and 5 compare the performance of our models with state-of-the-art models; Section 6 concludes the paper.

2 Related Work

A large amount of research has been done on KG embeddings. These approaches can generally be categorized into two groups, namely semantic matching models and translational distance models [survey1]. RESCAL [RESCAL] and its extensions, e.g. DistMult [DISTMULT], ComplEx [ComplEx] and SimplE [SimplE], are semantic matching models. These models measure the plausibility of facts by matching the latent semantics of entities and relations embodied in their vector space representations. Examples of translational distance models include TransE [TransE], TransH [TransH] and TransD [TransD]. These models measure the plausibility of a fact as the distance between the two entities, usually after a translation carried out by the relation. In particular, KG2E [KG2E] takes into account the uncertainties of KG representations and represents entities and relations as random vectors drawn from multivariate Gaussian distributions. KG2E scores a fact by measuring the distance between the distributions of the entities and the relation. In addition, RotatE [RotatE] achieves state-of-the-art results on link prediction by using relational rotations in complex space instead of relational translations.

The above methods achieve good results on link prediction in KGs. Moreover, recent research shows that the performance of KG embedding models can be further improved by incorporating the time information in temporal KGs. TAE [TAE2016] imposes temporal order constraints on time-sensitive relation pairs, e.g. wasBornIn and diedIn, where the prior relation is supposed to lie close to the subsequent relation after a temporal transition. TAE only uses temporal order information between relations, but not the exact time information in facts. TTransE [leblay] proposes scoring functions which incorporate time representations into a TransE-type score function in different ways. TA-TransE and TA-DistMult [TA-TransE] utilize recurrent neural networks to learn time-aware representations of relations and use the standard scoring functions from TransE and DistMult. HyTE [HyTE] encodes time in the entity-relation space by associating a corresponding hyperplane with each timestamp. Inspired by diachronic word embeddings, DE-SimplE [DE-SimplE] uses diachronic entity embeddings to represent entities at different time steps and exploits the same score function as SimplE [SimplE] to score the plausibility of a quadruple. DE-SimplE was shown to achieve state-of-the-art results on link prediction over three benchmark TKG datasets. The above four methods represent each time step as a latent feature vector or a hyperplane matrix and update the entity/relation representations at different time steps with the corresponding time representations. This means that all entity/relation representations share the same evolution trend, and these models can neither represent a future point in time nor predict a future event.

In this paper, we fit the temporal evolution of entity/relation representations by deploying additive time series decomposition. In contrast to the above TKGE models, each entity/relation representation in our model has its own evolution trend over time. Moreover, we map the entity and relation representations into a space of multi-dimensional Gaussian distributions to model the randomness during the temporal evolution of KG representations. Our model is able to predict the occurrence time of facts with missing time annotations, as well as the existence of future events. To the best of our knowledge, the above TKGE models are incapable of performing both of these tasks.

3 Our Method

In this section, we present a detailed description of our proposed method, ATiSE, which not only uses relational properties between entities in triples but also incorporates the associated temporal meta-data by using additive time series decomposition.

3.1 Additive Time Series Embedding Model

A time series is a series of time-oriented data. Time series analysis is widely used in many fields, ranging from economics and finance to managing production operations, to the analysis of political and social policy sessions [timeseries]. An important technique for time series analysis is additive time series decomposition. This technique decomposes a time series into three components, i.e., a trend component, a seasonal component and an irregular component (i.e. “noise”).

Figure 1: Illustration of additive time series decomposition.

In our method, we regard the evolution of an entity/relation representation as an additive time series. Figure 1 shows an instance of additive time series decomposition. In our model, the time series is the evolution process of one dimension of an entity/relation representation over time. For each entity/relation, we use a linear function and a sine function to fit the trend component and the seasonal component respectively, due to their simplicity. For training efficiency, we model the irregular term with Gaussian noise instead of a moving average model (MA model) [ARIMA], since training an MA model requires a global optimization algorithm, which would increase the computational cost.

To incorporate temporal information into traditional KGs, a new temporal dimension is added to fact triples, denoted as a quadruple $(s, p, o, t)$. It represents the creation of a relationship edge $p$ between subject entity $s$ and object entity $o$ at time step $t$. The score term $x_{s,p,o,t}$ can represent the conditional probability or the confidence value of this event, $x_{s,p,o,t} = f(e_s, r_p, e_o)$, where $e_s$, $e_o$ and $r_p$ are the representations of $s$, $o$ and $p$. For a long-term fact $(s, p, o, [t_s, t_e])$, we consider it to be a positive triple for each time step between $t_s$ and $t_e$, where $t_s$ and $t_e$ denote the start and end time of the period during which the triple $(s, p, o)$ is valid.
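The treatment of long-term facts can be illustrated with a minimal Python sketch. It assumes that time steps are consecutive integer indices and that both endpoints of the validity interval are included; the function name `expand_interval_fact` is hypothetical and not part of the original implementation.

```python
from typing import Iterator, Tuple

# A quadruple (subject, predicate, object, time step).
Quad = Tuple[str, str, str, int]


def expand_interval_fact(s: str, p: str, o: str,
                         t_start: int, t_end: int) -> Iterator[Quad]:
    """Yield one positive quadruple per time step in [t_start, t_end].

    Assumption: time steps are integers and the interval is inclusive
    on both ends.
    """
    if t_end < t_start:
        raise ValueError("t_end must not precede t_start")
    for t in range(t_start, t_end + 1):
        yield (s, p, o, t)


# Example taken from the introduction: a fact valid from 2009 to 2017.
quads = list(expand_interval_fact("Obama", "presidentOf", "USA", 2009, 2017))
```

With year-level time steps, this example yields nine positive quadruples, one for each year from 2009 to 2017.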

At each time step $t$, the time-specific representation of an entity $e_i$ or a relation $r_p$ should be updated as $e_{i,t}$ or $r_{p,t}$. In order to avoid information redundancy, we incorporate time information into either the entity representations or the relation representations, but not both. The model where time information is incorporated into relation representations is denoted as ATiSER. The other model, with evolving entity representations, is called ATiSEE. Thus, the score of a quadruple $(s, p, o, t)$ can be represented as $x_{s,p,o,t} = f(e_{s,t}, r_p, e_{o,t})$ or $x_{s,p,o,t} = f(e_s, r_{p,t}, e_o)$. Due to the similarity between ATiSEE and ATiSER, we take ATiSEE as an example to describe our method in this section.

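In our proposed model ATiSEE, we utilize additive time series decomposition to fit the evolution processes of entity representations as:

$$e_{i,t} = e_i + \alpha_{e,i}\, w_{e,i}\, t + \beta_{e,i} \sin(2\pi \omega_{e,i} t) + \mathcal{N}(0, \Sigma_{e,i}) \qquad (1)$$

where $e_i$ is the time-independent latent representation of the $i$th entity, which is subject to $\|e_i\|_2 = 1$. $e_i + \alpha_{e,i} w_{e,i} t$ is the trend component, where the coefficient $|\alpha_{e,i}|$ denotes its evolutionary rate and the vector $w_{e,i}$ represents the direction of its evolution, which is restricted to $\|w_{e,i}\|_2 = 1$. $\beta_{e,i} \sin(2\pi \omega_{e,i} t)$ is the seasonal component, where $\beta_{e,i}$ and $\omega_{e,i}$ denote the amplitude vector and the frequency vector. The Gaussian noise $\mathcal{N}(0, \Sigma_{e,i})$ is the random component, where $\Sigma_{e,i}$ denotes the corresponding covariance matrix.

In other words, for a fact $(s, p, o, t)$, the entity embeddings $e_{s,t}$ and $e_{o,t}$ obey Gaussian probability distributions: $P_{s,t} \sim \mathcal{N}(\bar{e}_{s,t}, \Sigma_s)$ and $P_{o,t} \sim \mathcal{N}(\bar{e}_{o,t}, \Sigma_o)$, where $\bar{e}_{s,t}$ and $\bar{e}_{o,t}$ are the mean vectors of $e_{s,t}$ and $e_{o,t}$, which do not include the random components. Similarly, the predicate is represented as $P_r \sim \mathcal{N}(r_p, \Sigma_r)$, where $\|r_p\|_2 = 1$.

Similar to translation-based KGE models, we consider the transformation result of ATiSEE from the subject to the object to be akin to the predicate in a positive fact. We use the following formula to express this transformation: $P_{s,t} - P_{o,t}$, which corresponds to the probability distribution $P_{e,t} \sim \mathcal{N}(\mu_{e,t}, \Sigma_e)$. Here, $\mu_{e,t} = \bar{e}_{s,t} - \bar{e}_{o,t}$ and $\Sigma_e = \Sigma_s + \Sigma_o$. As a result, combined with the probability of the relation $P_r$, we measure the similarity between $P_{e,t}$ and $P_r$ to score the fact.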
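The construction described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' implementation: it assumes that the seasonal term has the form amplitude times sin(2π · frequency · t) applied per dimension, that covariances are diagonal and stored as lists of variances, and that subject and object noise are independent so that their variances add. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def mean_embedding(e: Sequence[float], alpha: float, w: Sequence[float],
                   beta: Sequence[float], omega: Sequence[float],
                   t: float) -> List[float]:
    """Mean of an evolving entity embedding at time step t.

    trend:    e + alpha * w * t
    seasonal: beta * sin(2 * pi * omega * t)   (assumed form, per dimension)
    The random component has zero mean and therefore does not appear here.
    """
    return [e_k + alpha * w_k * t + b_k * math.sin(2.0 * math.pi * om_k * t)
            for e_k, w_k, b_k, om_k in zip(e, w, beta, omega)]


def transformed_distribution(mean_s: Sequence[float], var_s: Sequence[float],
                             mean_o: Sequence[float], var_o: Sequence[float]
                             ) -> Tuple[List[float], List[float]]:
    """Mean and diagonal variance of the subject-minus-object distribution.

    Assumption: subject and object are independent Gaussians, so the means
    subtract and the variances add.
    """
    mu = [a - b for a, b in zip(mean_s, mean_o)]
    var = [a + b for a, b in zip(var_s, var_o)]
    return mu, var
```

At t = 0 the mean reduces to the time-independent representation, and the covariance stays constant over time because only the mean depends on t.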

| Dataset | #Entities | #Relations | #Time Steps | Time Span | #Training | #Validation | #Test |
|---|---|---|---|---|---|---|---|
| ICEWS14 | 6,869 | 230 | 365 | 2014 | 72,826 | 8,941 | 8,963 |
| ICEWS05-15 | 10,094 | 251 | 4,017 | 2005-2015 | 368,962 | 46,275 | 46,092 |
| YAGO11k | 10,623 | 10 | 70 | -453-2844 | 16,408 | 2,050 | 2,051 |
| Wikidata12k | 12,554 | 24 | 81 | 1709-2018 | 32,497 | 4,062 | 4,062 |

Table 1: Statistics of datasets.

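KL divergence is a straightforward method of measuring the similarity of two probability distributions. We optimize the following score function based on the KL divergence between the entity-transformed distribution and the relation distribution [KLdivergence]:

$$x_{s,p,o,t} = f_t(e_s, r_p, e_o) = \mathcal{D}_{KL}(P_{e,t}, P_r) = \frac{1}{2}\left\{ \mathrm{tr}(\Sigma_r^{-1}\Sigma_e) + (r_p - \mu_{e,t})^{\top} \Sigma_r^{-1} (r_p - \mu_{e,t}) - \log\frac{\det(\Sigma_e)}{\det(\Sigma_r)} - k \right\} \qquad (2)$$

where $\mathrm{tr}(\Sigma)$ and $\Sigma^{-1}$ indicate the trace and inverse of the covariance matrix, respectively, $\mu_{e,t}$ and $\Sigma_e$ are the mean and covariance of the entity-transformed distribution $P_{e,t}$, $r_p$ and $\Sigma_r$ are those of the relation distribution $P_r$, and $k$ is the embedding dimension.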
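With diagonal covariances, the KL divergence between two Gaussians reduces to per-dimension sums. The sketch below uses the standard closed form of KL(P_e || P_r); the direction of the divergence and the function name `kl_score` are assumptions made for illustration, not details confirmed by the text.

```python
import math
from typing import Sequence


def kl_score(mu_e: Sequence[float], var_e: Sequence[float],
             mu_r: Sequence[float], var_r: Sequence[float]) -> float:
    """KL( N(mu_e, diag(var_e)) || N(mu_r, diag(var_r)) ).

    Lower values mean the entity-transformed distribution is closer to the
    relation distribution, i.e. the fact is more plausible.
    """
    k = len(mu_e)
    # tr(Sigma_r^{-1} Sigma_e) for diagonal matrices
    trace = sum(ve / vr for ve, vr in zip(var_e, var_r))
    # (mu_r - mu_e)^T Sigma_r^{-1} (mu_r - mu_e)
    maha = sum((mr - me) ** 2 / vr for me, mr, vr in zip(mu_e, mu_r, var_r))
    # log(det Sigma_e / det Sigma_r)
    log_det_ratio = sum(math.log(ve) - math.log(vr)
                        for ve, vr in zip(var_e, var_r))
    return 0.5 * (trace + maha - log_det_ratio - k)
```

Because every term is a sum over dimensions, the cost is linear in the embedding size, which is the efficiency benefit of restricting the covariance to a diagonal matrix.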

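Considering the simplified diagonal covariance, we can compute the trace and inverse of the matrix simply and efficiently for ATiSEE. The gradient of the log determinant is $\frac{\partial \log\det A}{\partial A} = (A^{-1})^{\top}$, the gradient $\frac{\partial\, x^{\top} A^{-1} y}{\partial A} = -A^{-\top} x y^{\top} A^{-\top}$, and the gradient $\frac{\partial\, \mathrm{tr}(A^{-1} B)}{\partial A} = -(A^{-1} B A^{-1})^{\top}$ [petersen2008matrix]. We can compute the gradients of Equation (2) with respect to the time-independent latent feature vectors, the evolutionary direction vectors and the covariance matrix (here acting as a vector) as follows: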

(3)

where , .

3.2 Learning

To optimize our proposed models effectively, we use the same loss function as the negative sampling loss in [RotatE]:

$$\mathcal{L} = \sum_{t \in [T]} \sum_{\xi \in \mathcal{D}_t^{+}} \sum_{\xi' \in \mathcal{D}_t^{-}} -\log \sigma(\gamma - f_t(\xi)) - \log \sigma(f_t(\xi') - \gamma) \qquad (4)$$

where $[T]$ is the set of time steps in the temporal KG, $\mathcal{D}_t^{+}$ is the set of positive triples with time stamp $t$, $\xi'$ is the negative sample corresponding to $\xi$, $\sigma$ is the sigmoid function and $\gamma$ is a fixed margin. In this paper, we not only generate negative samples by randomly corrupting the subjects or objects of the positives, such as $(s', p, o, t)$ and $(s, p, o', t)$, but also add extra negative samples which are present in the KG but do not exist in the subgraph for a particular time [HyTE]. We use this time-dependent negative sampling approach for time prediction. On the other hand, to compare our model fairly with the baseline models, we use the uniform negative sampling method [TransE] for link prediction and future event prediction. To avoid overfitting, we add some regularization while learning the Gaussian embeddings. As described in Section 3.1, the norms of the original representations of entities and relations, as well as the norms of all evolutionary direction vectors, are restricted by 1. Besides, the following constraint on the covariance is considered when we minimize the loss $\mathcal{L}$ [KG2E]:

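$$\forall l \in \mathcal{E} \cup \mathcal{R}, \quad c_{min} I \le \Sigma_l \le c_{max} I, \quad c_{min} > 0 \qquad (5)$$

where $\mathcal{E}$ and $\mathcal{R}$ are the sets of entities and relations respectively, and $c_{min}$ and $c_{max}$ are two positive constants. During the training process, we use $\Sigma_l \leftarrow \max(c_{min}, \min(c_{max}, \Sigma_l))$ to achieve this regularization for diagonal covariance matrices. These constraints on the mean and covariance are also applied during initialization.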
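The two kinds of constraints can be illustrated with a small Python sketch. It assumes that the covariance constraint is enforced by element-wise clipping of the diagonal entries into [c_min, c_max], and that the norm restriction is enforced by rescaling a vector to unit L2 norm; both helper names are hypothetical, and the exact projection used by the authors is not specified in the text.

```python
import math
from typing import List, Sequence


def clip_covariance(diag: Sequence[float],
                    c_min: float, c_max: float) -> List[float]:
    """Clip each diagonal covariance entry into [c_min, c_max]."""
    if not 0.0 < c_min <= c_max:
        raise ValueError("require 0 < c_min <= c_max")
    return [max(c_min, min(c_max, v)) for v in diag]


def normalize_to_unit_norm(vec: Sequence[float]) -> List[float]:
    """Rescale a vector to L2 norm 1 (assumed reading of 'restricted by 1').

    A zero vector is returned unchanged because it has no direction.
    """
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        return list(vec)
    return [v / norm for v in vec]
```

In a training loop, such projections would typically be applied after each parameter update, and once after initialization.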

4 Experiments

To show the capability of ATiSE, we compare its variants with other state-of-the-art baselines and Li-TSE on link prediction. In addition, we evaluate ATiSE and Li-TSE on two other tasks: time prediction and future event prediction. It is worth noting that other TKGE models are incapable of future event prediction.

4.1 Datasets

Common TKGs include Integrated Crisis Early Warning System (ICEWS) [ICEWS2015], Global Database of Events, Language, and Tone (GDELT) [GDELT], Wikidata [Wikidata] and YAGO3 [YAGO3]. TA-TransE and TA-DistMult use ICEWS14, ICEWS05-15, YAGO15k and Wikidata11k as datasets [TA-TransE]. However, all of the time intervals in YAGO15k and Wikidata11k contain only either start dates or end dates, in the form 'occurSince 2003' or 'occurUntill 2005'. In practice, most time intervals in Wikidata and YAGO3 are represented by both start dates and end dates, in the form [2003-##-##, 2005-##-##]. For this reason, we do not use YAGO15k and Wikidata11k, and instead use YAGO11k and Wikidata12k released in [HyTE]. In addition, DE-SimplE [DE-SimplE] extracts a subset of GDELT as a dataset for TKGE. However, the authors do not release this dataset.

To compare our model with baselines, we test our models on the four datasets mentioned above, namely ICEWS14, ICEWS05-15, YAGO11k and Wikidata12k.

ICEWS is a repository that contains political events with specific time annotations, e.g. (Barack Obama, Make a visit, Ukraine, 2014-07-08). ICEWS14 and ICEWS05-15 are subsets of ICEWS [ICEWS2015] extracted by [TA-TransE]: ICEWS14 corresponds to the facts in 2014, and ICEWS05-15 corresponds to the facts from 2005 to 2015. These two datasets are filtered by selecting only the most frequently occurring entities in the graph [TA-TransE]. It is noteworthy that the time annotations in ICEWS are all time points.

YAGO11k is a subset of YAGO3 [YAGO3]. Different from ICEWS, some of the time annotations in YAGO3 are represented as time intervals, e.g. (Paul Konchesky, playsFor, England national football team, [2003-##-##, 2005-##-##]). Similar to the setting in HyTE [HyTE], we only deal with year-level granularity by dropping the month and date information, and we treat timestamps as 70 different time steps (intervals) in order to balance the numbers of triples across time steps.

Wikidata12k is a subset of Wikidata [Wikidata]. Similar to YAGO11k, Wikidata12k contains some facts involving time intervals. We treat timestamps as 81 different time steps by using the same setting as YAGO11k.

The statistics of the datasets are listed in Table 1. We compare our method and other baselines by performing link prediction on ICEWS14, ICEWS05-15, YAGO11k and Wikidata12k. In addition, we evaluate the performance of our proposed models on time prediction over ICEWS14 and YAGO11k, as well as on future event prediction over ICEWS14 and ICEWS05-15.

| Model | ICEWS14 MRR | ICEWS14 Hits@1 | ICEWS14 Hits@3 | ICEWS14 Hits@10 | ICEWS05-15 MRR | ICEWS05-15 Hits@1 | ICEWS05-15 Hits@3 | ICEWS05-15 Hits@10 |
|---|---|---|---|---|---|---|---|---|
| TransE | 0.310 | 14.2 | 42.1 | 58.9 | 0.274 | 15.8 | 32.9 | 47.9 |
| DistMult | 0.430 | 31.7 | 50.2 | 66.6 | 0.309 | 17.6 | 40.1 | 60.5 |
| KG2E | 0.507 | 39.7 | 57.7 | 70.5 | 0.439 | 30.8 | 50.7 | 69.9 |
| ComplEx | 0.481 | 36.9 | 55.8 | 70.2 | 0.471 | 34.7 | 55.5 | 72.3 |
| RotatE | 0.500 | 38.1 | 58.6 | 71.2 | 0.505 | 38.9 | 54.9 | 68.7 |
| TransE† | 0.280 | 9.4 | - | 63.7 | 0.294 | 9.0 | - | 66.3 |
| DistMult† | 0.439 | 32.3 | - | 67.2 | 0.456 | 33.7 | - | 69.1 |
| SimplE‡ | 0.458 | 34.1 | 51.6 | 68.7 | 0.478 | 35.9 | 53.9 | 70.8 |
| TTransE | 0.246 | 9.8 | 32.9 | 51.9 | 0.295 | 14.5 | 39.0 | 56.4 |
| HyTE | 0.321 | 13.4 | 44.9 | 63.5 | 0.340 | 16.8 | 45.5 | 63.1 |
| TA-TransE† | 0.275 | 9.5 | - | 62.5 | 0.299 | 9.6 | - | 66.9 |
| TA-DistMult† | 0.477 | 36.3 | - | 68.6 | 0.474 | 34.6 | - | 72.8 |
| DE-SimplE‡ | 0.526 | 41.8 | 59.2 | 72.5 | 0.513 | 39.2 | 57.8 | 74.8 |
| ATiSEE | 0.569 | 46.3 | 63.9 | **76.3** | **0.520** | **39.7** | **59.5** | **77.3** |
| ATiSER | **0.571** | **46.5** | **64.3** | 75.5 | 0.484 | 35.0 | 55.8 | 74.9 |

Table 2: Link prediction results (filtered setting) on ICEWS14 and ICEWS05-15. Hits@k values are percentages. Rows 1-8: basic models with no time information. Rows 9-13: models which encode time information. † indicates results in this row were taken from [TA-TransE]. ‡ indicates results in this row were taken from [DE-SimplE]. Dashes: results could not be obtained. The best results among all models are written bold.

| Model | Wikidata12k MRR | Wikidata12k Hits@1 | Wikidata12k Hits@3 | Wikidata12k Hits@10 | YAGO11k MRR | YAGO11k Hits@1 | YAGO11k Hits@3 | YAGO11k Hits@10 |
|---|---|---|---|---|---|---|---|---|
| TransE | 0.438 | 37.1 | 46.2 | 56.5 | 0.196 | 6.7 | 26.5 | 41.0 |
| DistMult | 0.317 | 26.9 | 35.4 | 40.1 | 0.196 | 16.6 | 19.4 | 26.1 |
| KG2E | 0.462 | 41.6 | 48.4 | 59.7 | 0.205 | 17.8 | 20.7 | 28.7 |
| ComplEx | 0.343 | 26.3 | 38.5 | 48.8 | 0.190 | 15.7 | 19.6 | 25.2 |
| RotatE | 0.506 | 43.6 | 54.5 | 63.9 | 0.226 | 12.1 | 27.0 | **41.9** |
| TTransE | 0.295 | 14.5 | 39.0 | 56.4 | 0.210 | 8.4 | 27.6 | 40.3 |
| HyTE | 0.457 | 38.6 | 48.2 | 60.6 | 0.199 | 7.1 | 26.7 | 40.6 |
| ATiSEE | 0.530 | 44.5 | 57.0 | 68.5 | **0.260** | **19.9** | **27.9** | 39.6 |
| ATiSER | **0.555** | **48.2** | **57.9** | **69.4** | 0.245 | 18.6 | 26.0 | 35.6 |

Table 3: Link prediction results (filtered setting) on Wikidata12k and YAGO11k. Hits@k values are percentages. Rows 1-5: basic models with no time information. Rows 6-7: models which encode time information. The best results among all models are written bold. The results of some baselines could not be obtained since we could not reimplement these models, e.g., DE-SimplE [DE-SimplE], TA-TransE and TA-DistMult [TA-TransE].

4.2 Evaluation Protocol

In this paper, we report the experimental results on three tasks: Link Prediction, Time Prediction and Future Event Prediction.

Link Prediction:

This task is to complete a fact with a missing entity. For a test quadruple $(s, p, o, t)$, we generate corrupted quadruples by replacing $s$ or $o$ with all possible entities. We sort all the quadruples, including the corrupted quadruples and the test quadruple, by score and obtain the rank of the test quadruple. Two evaluation metrics are used here, i.e. MRR and Hits@k. The Mean Reciprocal Rank (MRR) is the mean of the reciprocal values of all computed ranks, and the fraction of test quadruples ranking in the top $k$ is called Hits@k. We report the filtered setting as described in [TransE].
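The ranking metrics can be sketched as follows. The sketch assumes that a lower score means a more plausible fact (as with a KL-based score) and that, in the filtered setting, candidate entities that form other known true facts are skipped when ranking. The function names and the dictionary-based interface are hypothetical.

```python
from typing import Dict, List, Set


def filtered_rank(scores: Dict[str, float], true_entity: str,
                  known_true: Set[str]) -> int:
    """Rank (1 = best) of the true entity among all candidate entities.

    scores:     candidate entity -> score of the corresponding quadruple
                (lower is better, by assumption)
    known_true: other entities that also yield a true fact; they are
                skipped in the filtered setting
    """
    target = scores[true_entity]
    better = sum(1 for ent, sc in scores.items()
                 if ent != true_entity and ent not in known_true
                 and sc < target)
    return better + 1


def mean_reciprocal_rank(ranks: List[int]) -> float:
    """Mean of the reciprocal ranks."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: List[int], k: int) -> float:
    """Fraction of test cases whose rank is at most k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)
```

Passing an empty `known_true` set gives the raw (unfiltered) rank, which is never better than the filtered one.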

Time Prediction:

This task is to complete a fact with a missing time annotation. For a test quadruple $(s, p, o, t)$, we generate corrupted quadruples by replacing $t$ with all possible time annotations. According to the filtered protocol, the corrupted quadruples must not be part of the graph itself. We then obtain the rank of the test quadruple among the corrupted quadruples and report MRR and Hits@k.
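A minimal sketch of the filtered ranking over time annotations is given below. It assumes integer time steps, a scoring callable where lower is better, and a set of known quadruples used for filtering; `filtered_time_rank` and the toy scorer in the usage line are hypothetical.

```python
from typing import Callable, Iterable, Set, Tuple

Quad = Tuple[str, str, str, int]


def filtered_time_rank(score: Callable[[str, str, str, int], float],
                       s: str, p: str, o: str, true_t: int,
                       time_steps: Iterable[int],
                       known_quads: Set[Quad]) -> int:
    """Rank (1 = best) of the true time step among corrupted time steps.

    Corrupted quadruples that already exist in the graph are skipped,
    following the filtered protocol.
    """
    target = score(s, p, o, true_t)
    better = 0
    for t in time_steps:
        if t == true_t or (s, p, o, t) in known_quads:
            continue
        if score(s, p, o, t) < target:
            better += 1
    return better + 1


# Toy scorer for illustration only: facts are most plausible near step 3.
toy_score = lambda s, p, o, t: abs(t - 3)
rank = filtered_time_rank(toy_score, "a", "r", "b", 2, range(6),
                          {("a", "r", "b", 3)})
```

In this toy case the only time step that scores better than the true one is itself a known fact, so it is filtered out and the true time step is ranked first.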

Future Event Prediction:

We re-split the facts into training, validation and test sets in a proportion of 80%/10%/10%. All facts in the test set occur after the facts in the training/validation sets. We train the model on the training set and judge whether the quadruples in the test set are positive or not. The decision process is similar to triple classification [NTN]: for a fact $(s, p, o, t)$, if its score $x_{s,p,o,t}$ is below a relation-specific threshold $\delta_p$, then it is classified as positive; otherwise as negative. The thresholds are determined on the validation set.
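The threshold-based decision can be sketched as follows. The sketch assumes that labeled validation examples (positives and sampled negatives) are available per relation, that each threshold is chosen to maximize validation accuracy, and that the comparison is inclusive (score <= threshold) so that observed scores can serve as candidate thresholds. These choices and the function names are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# (relation, score, is_positive)
Example = Tuple[str, float, bool]


def fit_thresholds(valid: List[Example]) -> Dict[str, float]:
    """Pick, for each relation, the threshold with the best validation accuracy."""
    by_rel: Dict[str, List[Tuple[float, bool]]] = {}
    for rel, score, label in valid:
        by_rel.setdefault(rel, []).append((score, label))

    thresholds: Dict[str, float] = {}
    for rel, items in by_rel.items():
        best_acc, best_thr = -1.0, 0.0
        # Candidate thresholds are the observed scores, in ascending order.
        for thr in sorted({s for s, _ in items}):
            acc = sum((s <= thr) == lab for s, lab in items) / len(items)
            if acc > best_acc:
                best_acc, best_thr = acc, thr
        thresholds[rel] = best_thr
    return thresholds


def classify(rel: str, score: float, thresholds: Dict[str, float]) -> bool:
    """A fact is predicted positive when its score does not exceed the threshold."""
    return score <= thresholds[rel]
```

Relations that never occur in the validation set have no threshold in this sketch; a practical implementation would need a fallback such as a global threshold.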

For the link prediction task, we compare our method with several state-of-the-art KGE models and existing time-aware KGE models, including TransE [TransE], DistMult [DISTMULT], KG2E [KG2E], ComplEx [ComplEx], RotatE [RotatE], SimplE [SimplE], TTransE [leblay], HyTE [HyTE], TA-TransE and TA-DistMult [TA-TransE] as well as DE-SimplE [DE-SimplE]. Besides link prediction, we also compare the performance of ATiSEE and ATiSER with HyTE and TTransE on time prediction over ICEWS14 and YAGO11k, based on our implementation. Since the above time-aware KGE models are not capable of representing a future time step, we compare our models with the above static KGE models for future event prediction over ICEWS14 and ICEWS05-15.

4.3 Experimental Setup

We implemented our models and most of the baseline models in PyTorch, except DE-SimplE [DE-SimplE], TA-TransE and TA-DistMult [TA-TransE]. For these three models, we take the results reported in [DE-SimplE] and [TA-TransE] on ICEWS14 and ICEWS05-15. The results of these models on Wikidata12k and YAGO11k could not be obtained since the code of DE-SimplE, TA-TransE and TA-DistMult is not available and some technical details are missing in the original papers, which makes reimplementation difficult. For fairness, we used a similar experimental setup to that in [DE-SimplE], but with a lower ratio of negatives over positive training samples due to limited computing resources. We used the Adagrad optimizer to train all the implemented models and selected the optimal hyperparameters by early stopping according to MRR on the validation set. We restricted the iterations to 5000. For all the models, the batch size was kept on both the datasets. We tuned the embedding dimensionalities in {}, the learning rates in {} and the ratio of negatives over positive training samples in {}. For translation-based models, the margins were varied in the range {1, 2, 3, 5, 10, 15, , 50}. For semantic matching models, the regularizer weights were chosen from the set {0.001, 0.01, 0.1}. Similar to the setting in KG2E [KG2E], we selected the pair of restriction values and for covariance among {(0.003, 0.3), (0.005, 0.5), (0.01, 1), (0.03, 3)} for Gaussian embedding models. The default configuration for our proposed models is as follows: , , , . Below, we only list the non-default parameters. For ATiSEE, the optimal configuration is as follows: , on ICEWS05-15; , , on Wikidata12k; , on YAGO11k. For ATiSER, the optimal configuration is as follows: on ICEWS14; , on ICEWS05-15; , on Wikidata12k; , on YAGO11k. The above configurations were used for all three tasks.

5 Results and Analysis

The obtained results for different tasks are based on the above mentioned experimental setup.

5.1 Link Prediction

Tables 2 and 3 show the results for the link prediction task. On ICEWS14, ATiSER and ATiSEE outperformed all baseline models regarding MR, MRR, Hits@10 and Hits@1. On ICEWS05-15, ATiSEE had the best performance among all embedding models, and ATiSER beat almost all baselines, the exception being DE-SimplE [DE-SimplE]. DE-SimplE and ATiSER had similar results on Hits@10 (74.8% vs 74.9%). It is noteworthy that the ratio of negatives over positive samples used in [DE-SimplE] was 500, much higher than in our setting. [ComplEx] investigated the influence of this ratio on KG embedding models and found that increasing it could lead to better results. Thus, the results reported in [DE-SimplE] would likely be worse if the same ratio as ours were used. On YAGO11k and Wikidata12k, where some relations between entities persist over multiple time steps, ATiSEE and ATiSER outperformed all baseline models for which results were available (i.e., all except DE-SimplE [DE-SimplE], TA-TransE and TA-DistMult [TA-TransE]) in terms of MRR and Hits@1. ATiSER had the best performance on Wikidata12k regarding Hits@3 and Hits@10. On YAGO11k, ATiSEE had the best performance regarding Hits@3 and performed well on Hits@10.

The results of DE-SimplE, TA-TransE and TA-DistMult on Wikidata12k and YAGO11k could not be obtained because the code of these models is not available and some implementation details are missing in the original papers. Moreover, DE-SimplE, TA-DistMult and TA-TransE mainly focus on event-based datasets where all time annotations are time points. Although TA-TransE and TA-DistMult can capture some special temporal modifiers in YAGO15k and Wikidata11k [TA-TransE], i.e. 'occurSince' and 'occurUntill', they are not able to learn the common time intervals in YAGO [YAGO3] and Wikidata [Wikidata], e.g., [2003-##-##, 2005-##-##].

Another noteworthy finding is that, in our implementation, KG2E significantly outperformed TransE and achieved results similar to ComplEx and RotatE. By contrast, the results from [RotatE] and [ComplEx] showed that RotatE and ComplEx performed remarkably better than KG2E on static KGs, e.g. FB15k and WN18 [TransE].

These results suggest that modeling uncertainty in TKGs by mapping KG representations into the space of multi-dimensional Gaussian distributions might help to improve the performance of KGE models on TKGs. To verify this, we conducted an ablation study on the effect of the random components of entity/relation representations in ATiSEE and ATiSER on model performance.

5.2 Ablation Study

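In this work, we also develop three other TKGE models as baselines, namely Li-TSE, LiS-TSE and LiR-TSE. In Li-TSE, each entity/relation representation only includes one component, i.e., a linear trend component. In LiS-TSE, each entity/relation representation consists of two components, namely a linear trend component and a seasonal component. In LiR-TSE, each entity/relation representation consists of a linear trend component and a random component. Similar to ATiSE, the variants of Li-TSE, LiS-TSE and LiR-TSE are named Li-TSEE, Li-TSER, LiS-TSEE, LiS-TSER, LiR-TSEE and LiR-TSER. In Li-TSEE, LiS-TSEE and LiR-TSEE, the evolving representation of an entity can be respectively represented as:

$$\begin{aligned} e_{i,t} &= e_i + \alpha_{e,i}\, w_{e,i}\, t \\ e_{i,t} &= e_i + \alpha_{e,i}\, w_{e,i}\, t + \beta_{e,i} \sin(2\pi \omega_{e,i} t) \\ e_{i,t} &= e_i + \alpha_{e,i}\, w_{e,i}\, t + \mathcal{N}(0, \Sigma_{e,i}) \end{aligned} \qquad (6)$$

Here $e_i$ is the time-independent representation, $\alpha_{e,i} w_{e,i} t$ the linear trend term, $\beta_{e,i} \sin(2\pi \omega_{e,i} t)$ the seasonal term and $\mathcal{N}(0, \Sigma_{e,i})$ the random component.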

We use the same score function as ATiSE to measure the plausibility of a fact for LiR-TSE. For Li-TSE and LiS-TSE, we use the following translation-based scoring function, similar to [TransE].

(7)

where .

We trained these models on ICEWS14 and YAGO11k, and searched for their optimal hyperparameters under the same experimental setup as ATiSE. By comparing the performance of Li-TSE, LiS-TSE, LiR-TSE and ATiSE, we can analyze the impact of the different components of entity/relation representations. The link prediction results of these models in terms of MRR and Hits@1 are shown in Table 4.

| Model | ICEWS14 MRR | ICEWS14 Hits@1 | YAGO11k MRR | YAGO11k Hits@1 |
|---|---|---|---|---|
| Li-TSEE | 0.342 | 11.0 | 0.196 | 6.6 |
| Li-TSER | 0.312 | 7.9 | 0.196 | 6.7 |
| LiS-TSEE | 0.325 | 7.5 | 0.215 | 8.0 |
| LiS-TSER | 0.297 | 6.2 | 0.214 | 7.7 |
| LiR-TSEE | 0.567 | 45.8 | 0.257 | 19.0 |
| LiR-TSER | 0.568 | 46.4 | 0.243 | 18.1 |
| ATiSEE | 0.569 | 46.3 | 0.260 | 19.9 |
| ATiSER | 0.571 | 46.5 | 0.245 | 18.6 |

Table 4: Link prediction results of ablation experiments.

Table 4 shows that, compared to Li-TSE and LiS-TSE, LiR-TSE and ATiSE achieved remarkable improvements on both datasets by adding the random component to the evolving entity/relation representations. From these results, we conclude that modeling temporal uncertainty in TKGs can significantly improve the performance of KGE models on TKGs.

On the other hand, Li-TSE performed better than LiS-TSE on ICEWS14, and LiR-TSE obtained results very close to those of ATiSE on ICEWS14, although LiS-TSE and ATiSE use more parameters to learn the seasonal terms of entity/relation representations. By contrast, on YAGO11k the performance of LiS-TSE and ATiSE was improved by adding seasonal terms to the entity/relation representations, compared to Li-TSE and LiR-TSE. The difference between these two results is caused by the difference between the datasets. ICEWS14 does not exhibit any seasonal patterns since each fact in ICEWS14 occurs at an instant. By contrast, YAGO11k contains both short-term relations and long-term relations. Adding seasonal components to the evolving entity/relation representations helps to distinguish short-term patterns from long-term patterns in YAGO11k. It can be seen from Table 5 that short-term relations learned by LiS-TSER, e.g., wasBornIn, have high evolutionary rates, and their seasonal components have smaller amplitudes and higher frequencies than those of long-term relations, e.g., isMarriedTo.

Relation         #TS     Evol. rate   Mean amplitude   Mean frequency
wasBornIn         1.0    0.122        0.002            3.376
worksAt          18.7    0.044        0.071            0.137
playsFor          4.7    0.075        0.013            0.495
hasWonPrize      28.6    0.026        0.137            0.022
isMarriedTo      16.5    0.061        0.175            0.171
owns             24.9    0.018        0.234            0.034
graduatedFrom    38.1    0.008        0.056            0.015
deadIn            1.0    0.147        0.005            1.984
isAffliatedTo    25.8    0.018        0.044            0.065
created          27.1    0.041        0.198            0.081
Table 5: Relations in YAGO11k and the mean number of time steps of their duration (#TS), together with the corresponding parameters learned by LiS-TSER: the evolutionary rate, the mean amplitude and the mean frequency of the seasonal component for each relation.
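To make the roles of the three parameters in Table 5 concrete, the sketch below evaluates a representation at a time step as the sum of a linear trend, a sinusoidal seasonal term and an optional Gaussian random term. The exact parameterization (a shared scalar rate, per-dimension amplitudes and frequencies, a sine-shaped seasonal term, isotropic noise) is an assumption made for illustration, as are all names and values.

```python
import math
import random


def additive_embedding(base, direction, rate, amplitude, frequency, t,
                       sigma=0.0, rng=None):
    """Return a representation at time step t as trend + seasonal + random.

    base, direction, amplitude and frequency are per-dimension lists;
    rate (evolutionary rate) and sigma (noise std) are scalars.
    """
    if rng is None:
        rng = random.Random()
    embedding = []
    for b, w, a, f in zip(base, direction, amplitude, frequency):
        trend = b + rate * w * t                         # linear evolution
        seasonal = a * math.sin(2.0 * math.pi * f * t)   # periodic term
        noise = rng.gauss(0.0, sigma) if sigma > 0.0 else 0.0
        embedding.append(trend + seasonal + noise)
    return embedding


# Toy two-dimensional example without the random term (sigma=0).
toy = additive_embedding(base=[1.0, 2.0], direction=[1.0, 0.0], rate=0.5,
                         amplitude=[2.0, 0.0], frequency=[1.0, 1.0], t=0.25)
```

Under this reading, a relation with a high rate and a small amplitude drifts quickly with little oscillation, whereas a relation with a low rate and a large amplitude stays near its base position and oscillates around it.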

5.3 Time Prediction

As mentioned in the previous section, we corrupted the time information in positive facts to generate negative samples for time prediction. Table 6 shows the results of our proposed models for time prediction on ICEWS14 and YAGO11k. Since the source code of [TA-TransE] and [DE-SimplE] has not been released and the implementation details are not completely clear from their papers, we compared our models with HyTE and TTransE based on our own implementations. The experimental results show that ATiSEE and ATiSER outperformed HyTE and TTransE on ICEWS14, and that ATiSEE also achieved the best performance on YAGO11k among the TKGE models.

            ICEWS14             YAGO11k
Model       MRR      Hits@10    MRR      Hits@10
TTransE     0.029     6.8       0.633    82.3
HyTE        0.139    22.9       0.718    85.9
ATiSEE      0.174    29.0       0.723    87.5
ATiSER      0.146    23.8       0.676    83.0
Table 6: Time prediction results.
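Time prediction can be viewed as ranking the true time step of a fact against all candidate time steps. The following sketch illustrates this under two assumptions that are ours for illustration: a lower score means a more plausible quadruple, and ties are resolved in favor of the true time step. The scoring function `toy_score` is a hypothetical stand-in for a trained model.

```python
def rank_true_time(score_fn, s, p, o, true_t, candidate_times):
    """Return the 1-based rank of true_t among candidate_times.

    score_fn(s, p, o, t) is assumed to return a lower value for more
    plausible quadruples; corrupted time steps that tie are not counted.
    """
    true_score = score_fn(s, p, o, true_t)
    better = sum(1 for t in candidate_times
                 if t != true_t and score_fn(s, p, o, t) < true_score)
    return better + 1


def toy_score(s, p, o, t):
    """Hypothetical model that considers time step 3 the most plausible."""
    return abs(t - 3)


rank_exact = rank_true_time(toy_score, "s", "p", "o", 3, range(6))
rank_off = rank_true_time(toy_score, "s", "p", "o", 5, range(6))
```

The ranks collected over all test facts can then be aggregated into MRR and Hits@10 in the same way as for link prediction.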

5.4 Future Event Prediction

Table 7 shows that ATiSEE remarkably outperformed the other embedding models on future event prediction. ComplEx achieved the best performance among all baseline models. Compared to ComplEx, ATiSEE and ATiSER improved by 2.3% and 1.8% on ICEWS14, and by 4.9% and 1.4% on ICEWS05-15, respectively. These results show that incorporating time information into embedding models can improve their ability to predict future events, provided that the entity/relation representations at a future time step can be obtained. Previous TKGE models cannot perform future event prediction, since their entity/relation representations at a future time step are unknown.

Model       ICEWS14    ICEWS05-15
TransE      69.6       64.2
DistMult    72.5       74.7
KG2E        70.6       71.8
ComplEx     72.9       76.6
RotatE      71.7       75.6
ATiSEE      75.2       80.5
ATiSER      74.7       78.0
Table 7: Future event prediction accuracy (%).
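The accuracy in Table 7 can be read as the share of future quadruples that are correctly classified as true or false. A minimal sketch of such a computation is given below; the thresholding rule (a quadruple is predicted to be true when its score does not exceed a threshold) and all names and values are assumptions for illustration and do not describe the exact evaluation protocol.

```python
def classification_accuracy(scores, labels, threshold):
    """Percentage of quadruples classified correctly.

    A quadruple is predicted true when its score is <= threshold
    (assumed convention: lower score means more plausible).
    """
    correct = sum(1 for score, label in zip(scores, labels)
                  if (score <= threshold) == label)
    return 100.0 * correct / len(labels)


# Hypothetical scores and ground-truth labels for four future quadruples.
toy_scores = [0.1, 0.9, 0.4, 0.7]
toy_labels = [True, False, False, False]
toy_accuracy = classification_accuracy(toy_scores, toy_labels, threshold=0.5)
```

In practice the threshold would be tuned on validation data, and the scores would come from representations evaluated at time steps beyond the training range.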

6 Conclusion

We introduce ATiSE, a temporal KGE model that incorporates time information into KG representations by using additive time series decomposition. ATiSE fits the temporal evolution of KG representations as additive time series, which enables it to estimate the time of a triple with a missing time annotation and to predict the occurrence of a future event. Considering the uncertainty during the temporal evolution of KG representations, ATiSE maps the representations of temporal KGs into the space of multi-dimensional Gaussian distributions. The covariance of an entity/relation representation represents its random component. Experimental results demonstrate that our method significantly outperforms the state-of-the-art methods on link prediction, time prediction and future event prediction. In addition, the ablation experiments show how the different components of the entity/relation representations in ATiSE affect model performance.

Our work establishes a previously unexplored connection between relational processes and time series analysis, with the potential to open a new direction of research on reasoning over time. In the future, we will explore using more sophisticated time series analysis techniques [timeseries], e.g., the ARIMA model [ARIMA], to model the temporal evolution of KG representations. Besides accounting for temporal uncertainty, another benefit of using time series analysis is that it enables the embedding model to encode temporal rules. For instance, given two quadruples () and (), there exists a temporal constraint . Since the time information is represented as a numerical variable in a time series model, it is feasible to incorporate such temporal rules into our models. We will investigate the possibility of encoding temporal rules into our proposed models.

Acknowledgements

We would like to thank the referees for their comments, which helped improve this paper considerably.
