Building Dynamic Knowledge Graphs from Text-based Games

Mikuláš Zelinka, Xingdi Yuan [1], Marc-Alexandre Côté [1]
Romain Laroche, Adam Trischler
Charles University, Faculty of Mathematics and Physics, Czech Republic
Microsoft Research, Montréal
[1] Equal contribution, work performed when first author was an intern at Microsoft Research, Montréal.

We are interested in learning how to update Knowledge Graphs (KG) from text. In this preliminary work, we propose a novel Sequence-to-Sequence (Seq2Seq) architecture to generate elementary KG operations. Furthermore, we introduce a new dataset for KG extraction built upon text-based game transitions (over 300k data points). We conduct experiments and discuss the results.

1 Introduction

Text-based games are complex, interactive simulations in which text describes the game state and players make progress by entering text actions. They can be seen as sequential decision making tasks where accomplishing certain goals earns rewards (points). Solving these games requires both Reinforcement Learning (RL) and Natural Language Processing (NLP) techniques.

Given the complex, partially observable nature of text-based games, an explicit structured memory – e.g., in the form of a graph – is a useful component for game-playing agents. In this work, we sidestep the game-playing aspect of the problem to focus solely on learning how to build and dynamically maintain a KG from text observations. Specifically, our proposed model learns to generate graph update operations that update an existing KG given new text information. We see this KG module as an independent block that can be leveraged by game-playing agents to improve their performance.

Related Work: Numerous recent works focus on learning or using KGs in textual environments. das18dynamickg leverage a machine reading comprehension (MRC) mechanism to query for entities and states in short text passages and use attention to address aliased entity occurrences and to track the entity states dynamically. xiong2018one focus on one-shot learning of new relations from only one training instance.

For text games specifically, ammanabrolu19graph leverage KGs to improve performance, relying on OpenIE for entity extraction and several game-specific rules for building and maintaining the KG. sinha2019clutrr introduce a dataset aimed at natural language understanding and generalization in reasoning about entity relations that was built using a KG-like structure.

To the best of our knowledge, all of the approaches for learning KGs are either concerned with building static KGs (rather than focusing on small, dynamic updates), or employ some domain-specific knowledge or rules to facilitate learning. In contrast, our proposed model learns to generate general update operations for modifying a KG.

Figure 1: Illustration of an example in TextWorld KG. By issuing an action at game step $t-1$, the environment returns a new observation, $O_t$. Given the KG at step $t-1$, a model is required to predict the new KG given the text observation.

2 The TextWorld KG Dataset

In this section, we introduce a new dynamic KG extraction dataset, TextWorld KG. TextWorld KG is based on a set of text-based games generated using TextWorld (cote18textworld); we use the dataset provided for the First TextWorld Problems competition. The TextWorld framework enables us to extract the underlying partial KG for every state, i.e., the subgraph that represents the agent’s partial knowledge of the world – what it has observed so far. All games share the same overarching theme: the agent finds itself hungry in a simple modern house with the goal of gathering ingredients and cooking a meal.

To build the TextWorld KG dataset, we collect game transitions obtained by following each game’s walkthrough (provided by TextWorld). Additionally, after each step in a walkthrough, we perform 5 additional actions sampled at random from the list of admissible commands (also provided by TextWorld), i.e., the set of game actions understood by the game in a given state. This presumably promotes the robustness and generalizability of an agent trained on the data: in the RL setting, where walkthroughs are absent, the agent will encounter off-the-path transitions during game playing. An agent pre-trained on such data is therefore more likely to work well in the RL setting. Formally, each data point in TextWorld KG is a tuple, {$G^{\text{seen}}_{t-1}$, $A_{t-1}$, $O_t$, $G^{\text{seen}}_t$}, where $G^{\text{seen}}_{t-1}$ is a partial KG (in the format of Resource Description Framework (RDF) triples) representing the information an agent has seen during all previous game steps up to $t-1$. After the agent issues an action $A_{t-1}$ (a string of words), the game engine returns a new text observation $O_t$ describing its effects on the world. The task is to predict the updated partial KG $G^{\text{seen}}_t$ given the above information. Note that the new observation might not contain new information, since some actions do not change the game state (e.g., looking in the same room twice). Table 1 shows some statistics about TextWorld KG and Figure 1 illustrates an example data point.

An important challenge posed by TextWorld is generalization. In each individual game instance, the interactable objects and their locations change along with the layout of the environment. Similarly, object names can be composed of multiple adjectives and a noun (e.g., red hot chili pepper), and at test time, players may encounter object names never seen during training. TextWorld KG inherits both of these challenging features.

#Train | #Valid | #Test | Avg. Obs. | Avg. #Operations | #Vertices | #Edges | Avg. #Connections
267,031 | 13,442 | 41,865 | 29.3 tokens | 3.1 | 99 | 10 | 43.1
Table 1: Statistics of TextWorld KG. Avg. Obs. is the average number of tokens an observation has. Avg. #Operations is the average number of update operations to generate per time step. #Vertices and #Edges correspond to the number of unique entities and relation types. Avg. #Connections is the average number of connections a graph has.

3 Learning to Update a KG

3.1 KG Definition

In a text-based game, at any given game step $t$, the game state can be represented as a graph $G_t = (V_t, E_t)$. In our setting, vertices $V_t$ represent entities (including objects, the player, and locations) and their states (e.g., closed, fried, sliced). Vertices are connected by edges $E_t$, which represent a set of relations between entities (e.g., north_of, in, is).

Since games are partially observable, at every step an agent only observes part of the full game state (e.g., the agent cannot know facts in a room it has not visited). Thus, an agent must build its belief about the world, $B_t$, from its observations. Ideally, the belief graph $B_t$ should match the ground truth graph, $G^{\text{seen}}_t$, which is a subgraph of $G_t$ representing what has been seen so far in the game.

TextWorld games are deterministic, so by progressively exploring and observing, an agent should discover more knowledge to push into its belief graph. Eventually, this ought to converge to a graph that accurately represents the entire game state.

3.2 Updating a KG

Instead of generating the entire belief graph $B_t$ at every game step, we generate a set of update operations $\Delta g_t$ such that $B_t = \text{Update}(B_{t-1}, \Delta g_t)$, where Update is an oracle function that applies $\Delta g_t$ to the previous belief graph $B_{t-1}$. In our case, each update operation in $\Delta g_t$ is represented as a text command. We define the following two types of update operation:

  • add(node1, node2, relation): add a directed edge, named relation, between node1 and node2; if any of these nodes does not exist, add that node first.

  • delete(node1, node2, relation): delete a directed edge, named relation, between node1 and node2; if any of the nodes or the edge does not exist, ignore this operation.

Given a new observation string $O_t$ and an agent’s current belief $B_{t-1}$, the agent is required to generate operations as defined above to merge newly observed information into its belief graph. For the example shown in Figure 1, the generated operations are listed in Table 2.

add (player, shed, at)
add (shed, backyard, west_of)
add (wooden door, shed, east_of)
add (toolbox, shed, in)
add (toolbox, closed, is)
add (workbench, shed, in)
delete (player, backyard, at)
Table 2: Update operations corresponding to the transition shown in Figure 1.
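The add and delete semantics above can be made concrete with a small sketch. It is not the authors' Update function: it assumes the graph is stored as a set of (node1, node2, relation) triples, so a node exists only while at least one edge mentions it, and the names parse_operation and update are hypothetical. The parser also assumes entity names contain no commas.

```python
import re

# Matches commands such as "add (wooden door, shed, east_of)".
_OP = re.compile(r"^\s*(add|delete)\s*\(\s*(.+?)\s*,\s*(.+?)\s*,\s*(.+?)\s*\)\s*$")


def parse_operation(command):
    """Parse an 'add (...)' or 'delete (...)' command into (op, triple)."""
    match = _OP.match(command)
    if match is None:
        raise ValueError(f"malformed operation: {command!r}")
    op, node1, node2, relation = match.groups()
    return op, (node1, node2, relation)


def update(graph, commands):
    """Return a new triple set with the commands applied in order."""
    new_graph = set(graph)
    for command in commands:
        op, triple = parse_operation(command)
        if op == "add":
            # Nodes are implicit in the triple set, so adding the edge
            # also "creates" any node that was missing.
            new_graph.add(triple)
        else:
            # Deleting an edge that does not exist is silently ignored.
            new_graph.discard(triple)
    return new_graph


belief = {("player", "backyard", "at")}
belief = update(belief, [
    "add (player, shed, at)",
    "add (shed, backyard, west_of)",
    "delete (player, backyard, at)",
])
```

After the call, belief holds the two added triples and no longer contains the player's old location.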

We formulate the update generation task as a Seq2Seq problem. Specifically, we adopt the decoding strategy from yuan18kpgen, where given an observation sequence $O_t$ and a belief graph $B_{t-1}$, the agent generates a sequence of tokens consisting of multiple graph update operations separated by a delimiter token.

As pointed out by meng19order, the order of ground-truth tokens and sequences (in our case, graph update operations) matters in Seq2Seq language generation. We therefore define a set of rules (e.g., always add before delete) to order ground-truth operations for teacher forcing during training.
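As an illustration of how ordered ground-truth operations might be flattened into one target sequence, consider the sketch below. Only the "add before delete" rule comes from the text; the alphabetical tie-break, the `<sep>` and `<|>` tokens, and the function names are assumptions made for the example.

```python
def order_operations(operations):
    """Sort (name, node1, node2, relation) tuples so adds precede deletes.

    Ties are broken alphabetically on the arguments; this tie-break is an
    assumption, not a rule stated in the paper.
    """
    rank = {"add": 0, "delete": 1}
    return sorted(operations, key=lambda op: (rank[op[0]], op[1:]))


def to_target_tokens(operations, op_sep="<sep>", arg_sep="<|>"):
    """Flatten ordered operations into a single token sequence.

    op_sep separates operations; arg_sep separates the arguments of one
    operation so multi-word entity names stay unambiguous. Both token
    names are hypothetical.
    """
    tokens = []
    for index, (name, node1, node2, relation) in enumerate(order_operations(operations)):
        if index > 0:
            tokens.append(op_sep)
        tokens += [name, *node1.split(), arg_sep, *node2.split(), arg_sep, relation]
    return tokens


target = to_target_tokens([
    ("delete", "player", "backyard", "at"),
    ("add", "wooden door", "shed", "east_of"),
    ("add", "player", "shed", "at"),
])
```

Fixing one canonical order means the same set of operations always maps to the same target sequence, which is what teacher forcing needs.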

3.3 Model Architecture

Figure 2: Graph update operation generation model.

We use a transformer-based Seq2Seq model (vaswani17allyouneed) to generate update operations. As shown in Figure 2 in Appendix A, the model consists of the following components:

  1. A text encoder, which reads text inputs (the concatenation of observation $O_t$ and the action at the previous game step, $A_{t-1}$), and generates hidden representations.

  2. A graph encoder, which encodes the previous belief $B_{t-1}$ into hidden representations.

  3. An attention-based representation aggregator, which combines the two above representations.

  4. A command generator, which takes aggregated representations and generates update operations token by token.

For space considerations, we elaborate on our model components in Appendix A. Following common practice in natural language generation (NLG), we train our operation generation model via teacher forcing. Specifically, during training, a right-shifted ground-truth target sequence is provided as input to the decoder and the model is trained with the negative log-likelihood (NLL) loss. At test time, the model starts generating from a start-of-sequence token and uses the previously generated token as input to the next step. The model terminates after generating an end-of-sequence token.
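The training and test procedures described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: token ids are integers, the model is replaced by a hypothetical callable next_token_probs that maps a prefix to a probability list, the loss is averaged over steps, and decoding is greedy (the text does not say which decoding rule is used).

```python
import math


def shift_right(target_ids, bos_id):
    """Decoder input for teacher forcing: prepend BOS, drop the last token."""
    return [bos_id] + target_ids[:-1]


def nll_loss(step_probs, target_ids):
    """Mean negative log-likelihood of the ground-truth tokens.

    step_probs[i] is the predicted distribution over the vocabulary at
    decoding step i; target_ids[i] is the ground-truth token id.
    """
    assert len(step_probs) == len(target_ids)
    total = -sum(math.log(probs[tok]) for probs, tok in zip(step_probs, target_ids))
    return total / len(target_ids)


def greedy_decode(next_token_probs, bos_id, eos_id, max_len=50):
    """Free-running decoding: feed each predicted token back in until EOS."""
    generated = [bos_id]
    for _ in range(max_len):
        probs = next_token_probs(generated)
        token = max(range(len(probs)), key=probs.__getitem__)
        generated.append(token)
        if token == eos_id:
            break
    return generated[1:]  # strip the start token
```

The max_len cap is a safety net for the sketch: a model that never emits the end-of-sequence token would otherwise loop forever.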

4 Experiments and Discussion

Model | TF-F1 | FR-F1
Transformer | 0.832 | 0.434
+ GCN | 0.928 | 0.664
+ R-GCN | 0.965 | 0.645
+ R-Emb | 0.962 | 0.697
Table 3: Test performance.

In this preliminary study, we test four graph encoder variants of the proposed model. First, as a baseline, we disable the graph encoder, which renders the model a standard Seq2Seq transformer. Second, we utilize a Graph Convolutional Network (GCN) (kipf16gcn) as the graph encoder. The GCN does not consider multiple relations (for models that do not consider relational information, we use single-relation KGs as ground truth during evaluation). Third, we enable conditioning on multiple relations by using a Relational Graph Convolutional Network (R-GCN) (schlichtkrull2018rgcn). Although R-GCN takes into account multiple relations, it does not consider information in relation labels. In our task, this information is important (e.g., east_of and west_of are symmetric relations). Therefore, we finally learn a vector representation for each relation that is conditioned on the label’s text embeddings. The resulting relation representation is used as an extra input to the R-GCN layer (R-Emb in Table 3). Table 3 shows the test results for all models.

During training, a model takes {$G^{\text{seen}}_{t-1}$, $A_{t-1}$, $O_t$} as input, where $G^{\text{seen}}_{t-1}$ is the input graph, and $A_{t-1}$ and $O_t$ are the text action issued at the previous game step and the resulting text observation, respectively. The model outputs a sequence describing the update operations to the graph; applying them yields the updated graph. During evaluation, we report two scores:

  • Teacher-force (TF) F1: we use the ground-truth $G^{\text{seen}}_{t-1}$ as input graph and compute the F1 score between the model’s generated graph update commands and the ground-truth commands. Note that the score is computed at the command level (i.e., if any token in a command is incorrect, this command is treated as incorrect).

  • Free-run (FR) F1: we initialize the belief graph $B_0$ at the beginning of each game with an empty graph. For each game step, we use $B_{t-1}$ (the graph generated by the model) as input. At the end of each game, we compute the F1 score between the final belief graph and the final ground-truth graph, where graphs are represented as RDF triples.
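Both scores reduce to an F1 computation between two sets: generated versus ground-truth commands in the teacher-force setting, and predicted versus ground-truth triples in the free-run setting. The sketch below shows that set-level computation. How scores are averaged across data points or games is not specified in the text, and treating two empty sets as a perfect match is an assumption.

```python
def f1_score(predicted, reference):
    """F1 between two collections of hashable items (commands or triples).

    Items are compared as whole units, so a command with a single wrong
    token simply fails to match its ground-truth counterpart.
    """
    predicted, reference = set(predicted), set(reference)
    if not predicted and not reference:
        return 1.0  # assumption: nothing to predict and nothing predicted
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


generated = {
    ("add", "player", "shed", "at"),
    ("add", "toolbox", "shed", "on"),   # wrong relation, counts as a miss
}
ground_truth = {
    ("add", "player", "shed", "at"),
    ("add", "toolbox", "shed", "in"),
    ("delete", "player", "backyard", "at"),
}
score = f1_score(generated, ground_truth)
```

Here one of two generated commands is correct (precision 1/2) and one of three ground-truth commands is recovered (recall 1/3), giving an F1 of 0.4.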

In general, although all model variants show good performance on TF-F1, they perform worse on FR-F1. This is not surprising since errors accumulate in the latter setting. On TF-F1, models using R-GCN outperform the GCN model by a noticeable margin (0.965 and 0.962 vs. 0.928), which suggests relational information is essential in the proposed task. On FR-F1, the plain R-GCN (0.645) scores slightly below the GCN (0.664), although the GCN is evaluated against single-relation ground-truth KGs. Interestingly, while the two R-GCN models perform similarly on TF-F1, the variant with relational embedding (considering information in relation labels) clearly outperforms the other on FR-F1 (0.697 vs. 0.645).

To better understand the behavior of our proposed models on TextWorld KG, we conduct an error analysis, which we show in Appendix B.

The next step for this project is to leverage the KG update module while playing text-based games. We believe that maintaining such a graph could help an RL agent (1) to avoid re-discovering known facts about the world and (2) to discover new world knowledge efficiently. We are also interested in finding ways of transferring learned graphs from one game to another to improve agents’ ability to generalize.


Appendix A Model Architecture

Text Encoder:

An observation string $O_t$ is provided in response to the text action $A_{t-1}$ an agent issued at the previous game step. We concatenate the two pieces of text as the input to the text encoder, $[O_t; A_{t-1}]$, in which $[\cdot\,;\cdot]$ indicates vector/string concatenation. The text encoder consists of an embedding layer and a stack of transformer blocks. The text encoder produces a sequence of hidden vectors $h_{\text{text}} \in \mathbb{R}^{L \times H}$, where $L$ is the length of the concatenated string and $H$ is the hidden size.

Graph Encoder:

At the same time, the graph encoder takes the agent’s belief KG $B_{t-1}$ (which stores the agent’s observations at all previous game steps) as input. We adopt different off-the-shelf graph neural networks (GNN) as the graph encoder (we describe them in more detail in the experiments section). After several layers of propagation, graph representations $h_{\text{graph}} \in \mathbb{R}^{N \times H}$ are generated, where $N$ is the number of nodes in the KG and $H$ is the hidden size.

Representation Aggregator:

We use an attention-based layer to aggregate text and graph representations (bahdanau2014attention; seo2016bidaf). Specifically, we use a weighted sum of text representations to represent graph information, which results in $h'_{\text{graph}}$; similarly, we use a weighted sum of graph representations to represent text information, resulting in $h'_{\text{text}}$.
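The two weighted sums can be illustrated with a dependency-free sketch. It is not the authors' layer: it assumes plain dot-product similarity followed by a softmax (the cited attention mechanisms use learned similarity functions), represents matrices as lists of lists, and the names attend and aggregate are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attend(queries, keys):
    """For each query vector, return a softmax-weighted sum of the key vectors."""
    hidden = len(keys[0])
    outputs = []
    for query in queries:
        weights = softmax([dot(query, key) for key in keys])
        outputs.append([sum(w * key[d] for w, key in zip(weights, keys))
                        for d in range(hidden)])
    return outputs


def aggregate(h_text, h_graph):
    """Return (graph rows built from text, text rows built from graph)."""
    graph_from_text = attend(h_graph, h_text)  # one text summary per node
    text_from_graph = attend(h_text, h_graph)  # one graph summary per token
    return graph_from_text, text_from_graph


h_text = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 tokens, hidden size 2
h_graph = [[1.0, 0.0], [0.0, 1.0]]              # 2 nodes, hidden size 2
graph_prime, text_prime = aggregate(h_text, h_graph)
```

Each output keeps the row count of its query side, so graph_prime has one row per node and text_prime has one row per token.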

Command Generator:

Finally, both $h'_{\text{graph}}$ and $h'_{\text{text}}$ are used to condition text generation. Input tokens are first converted into embeddings and then fed into a stack of decoder transformer blocks, which produce a probability distribution over the vocabulary. To prevent the model from utilizing future information, we follow vaswani17allyouneed and use a masked multi-head self-attention layer at the beginning of each block. To be able to both generate a word from the vocabulary and point to a word in the source text $O_t$, we adopt the pointer-softmax mechanism (gulcehre16pointer).
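The idea of combining generation and pointing can be sketched as a mixture of two distributions. This is a simplified reading of the pointer-softmax rather than the exact formulation in gulcehre16pointer: it assumes every source token has a vocabulary id, that the switch probability is given, and the function name is hypothetical.

```python
def pointer_softmax(vocab_probs, copy_probs, source_ids, switch):
    """Mix a vocabulary distribution with a copy distribution.

    vocab_probs: probabilities over the vocabulary (sums to 1).
    copy_probs:  attention probabilities over source positions (sums to 1).
    source_ids:  vocabulary id of the token at each source position.
    switch:      probability in [0, 1] of generating rather than copying.
    """
    mixed = [switch * p for p in vocab_probs]
    for position, token_id in enumerate(source_ids):
        # A token that occurs several times in the source accumulates
        # the copy mass of all its positions.
        mixed[token_id] += (1.0 - switch) * copy_probs[position]
    return mixed


mixed = pointer_softmax(
    vocab_probs=[0.5, 0.25, 0.25],
    copy_probs=[0.5, 0.5],
    source_ids=[2, 2],
    switch=0.5,
)
```

Because both inputs are valid distributions and the switch is a probability, the mixture is again a valid distribution over the vocabulary.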

Appendix B Error Analysis

Figure 3: Average TF-F1 scores grouped by verbs.

In Figure 3, we report average TF-F1 scores grouped by the verbs found in input actions $A_{t-1}$. We can observe that the vanilla transformer model performs poorly on go actions, which aligns with the fact that after issuing a go action, the resulting observation text does not contain any information about the agent’s previous location. On the other hand, the other models can retrieve that single piece of information from their belief graph.

We also notice that all models perform relatively poorly on prepare actions. This makes sense since in TextWorld games, the action of preparing a meal consumes multiple food ingredients at once in order to produce a meal object. The observation text following a prepare action only contains information about the newly produced meal and does not mention which food ingredients have been consumed. Even though the information about the recipe (i.e., the ingredients needed) is part of the KG, a model has to learn to retrieve multiple pieces of information at once from its belief graph to figure out which ingredients will be consumed.
