Toward Automated Quest Generation in Text-Adventure Games

Prithviraj Ammanabrolu, William Broniec, Alex Mueller,
Jeremy Paul, and Mark O. Riedl
School of Interactive Computing
Georgia Institute of Technology
Atlanta, GA, USA
{raj.ammanabrolu,wbroniec3,alexmueller,jeremypaul,riedl}@gatech.edu
Abstract

Interactive fictions, or text-adventures, are games in which a player interacts with a world entirely through textual descriptions and text actions. Text-adventure games are typically structured as puzzles or quests wherein the player must execute certain actions in a certain order to succeed. In this paper, we consider the problem of procedurally generating a quest, defined as a series of actions required to progress towards a goal, in a text-adventure game. Quest generation in text environments is challenging because quests must be semantically coherent. We present and evaluate two quest generation techniques: (1) Markov chains and (2) a neural generative model. We specifically look at generating quests about cooking and train our models on recipe data. We evaluate our techniques with human participant studies looking at perceived creativity and coherence.

1 Introduction

Natural language can be used to express creativity in the form of narrative. Prior research has shown that narrative is used in everything from environmental understanding (Bruner, 1991) to developing language (Johnston, 2008). Given this wide-ranging impact, it is natural to use narrative in language to understand human perceptions of creativity and what it takes to replicate them through computational models. Text-adventure games, or interactive fiction, in which a player interacts with a world entirely through text, provide us with a platform on which to explore these ideas on creativity in language. These games are usually structured as puzzles or quests in which a player must complete a sequence of actions in order to succeed. The world and corresponding quests in text-adventures are often painstakingly hand-crafted; we seek to reduce this authoring burden and explore methods that learn to automatically generate such content from raw, unstructured data, and to examine how the entire generation process can be used to better understand narrative intelligence, specifically looking at perceived creativity and coherence.

Automated generation of text-adventure games can broadly be split into two considerations: (1) the structure of the world, including the layout of rooms, textual description of rooms, objects, and non-player characters; and (2) the quest, consisting of the partial ordering of activities that the player must engage in to make progress toward the end of the game. In this work, we focus on methods of automatically generating such a quest and how it can be used to better understand narrative intelligence, specifically looking at perceived creativity and coherence. Quest generation requires narrative intelligence as a quest must maintain coherence throughout and progress towards a goal. Maintaining quest coherence also means following the constraints of the given game world. The quest has to fit within the confines of the world in terms of both genre and given affordances—e.g. using magic in a fantasy world. This is further complicated in the case of a text-adventure as a consequence of all interactions being in natural language—the potential output space is combinatorial in size. Because the player “sees” and “acts” entirely through text, any quest generation system must also take into account the lack of visual information and generate sufficiently descriptive text accordingly.

There are multiple variables that could potentially affect a player's perception of creativity in a text-adventure game such as the vocabulary used, the structure of the world, stylistic variations in writing, etc. We use the TextWorld framework (Côté et al., 2018) which lets us generate text-adventure game worlds based on a grammar. It lets us fix variables concerned with game world and logic generation and focus only on the generation of quests within this world. We use this framework's "home" theme—providing us with a textual simulation of a house—and restrict the types of quests that can be generated to those involving the completion of a cooking recipe. We then attempt to learn how to generate a quest to complete this recipe—as well as how to create the recipe itself—using a large-scale knowledge base of recipes. In these quests, players are provided with a list of ingredients and their locations, and they have to navigate the environment to find and prepare those ingredients to complete the given recipe. For example, given a recipe to make peanut butter cookies the quest would first tell the player to find eggs, peanut butter, flour, and baking soda. The player would then have to figure out that the first ingredient is in the fridge while the others are in the pantry and prepare each item accordingly. Generating this sort of quest requires knowledge of the ingredients, how they fit together, and how those ingredients interact with the environment.

The contribution of this work is thus twofold. We first detail a framework, and variations thereof, that can learn to generate creative quests in a text-adventure game. This framework includes two quest generation models using Markov chains as well as a neural language model. It also uses a semantically grounded knowledge graph to improve overall quest coherence. Our second contribution provides human subject evaluations that give us insight into how each variation of this framework affects human perception of creativity and coherence in such games.

2 Related Work

Although there has been much work recently on text-adventure gameplay (Bordes et al., 2010; He et al., 2016; Narasimhan et al., 2015; Fulda et al., 2017; Yang et al., 2018; Haroush et al., 2018; Côté et al., 2018; Tao et al., 2018; Ammanabrolu and Riedl, 2019a; Hausknecht et al., 2019a; Ammanabrolu and Riedl, 2019b; Hausknecht et al., 2019b), these works focus on creating agents that can play a given game as opposed to being able to automatically generate content for them.

Outside of this, there has been some work on learning to create content in the context of interactive narrative. These systems mainly work to overcome a significant bottleneck in the form of the human authoring required to create such works. Permar and Magerko (2013) present a method of generating cognitive scripts required for freeform activities in the form of pretend play. Specifically, they use interactive narrative—a form of pretend play that requires a high level of improvisation and creativity and uses cognitive scripts acquired from multiple experience sources. They take existing cognitive scripts and blend them in the vein of more traditional conceptual blending (Veale et al., 2000; Zook et al., 2011) to create new blended scripts. Closely related is Magerko et al. (2014) who present a Co-Creative Cognitive Architecture (CoCoA), detailing the set of components that support the design of co-creative agents in the context of interactive narrative. These methods all follow singular cognitive models that do not learn to generate content automatically.

Li et al. (2012) present Scheherazade, a system which learns a plot graph based on stories written by crowdsourcing the task of writing short stories through Amazon Mechanical Turk. This plot graph contains details relevant for the coherence of the story and includes plot events, temporal precedence, and mutual exclusion relations. The generated narrative contains events that can be executed from this plot graph by both players and non-player characters. Guzdial et al. (2015) introduce Scheherazade-IF, a system that learns to generate choose-your-own-adventure style interactive fictions in which the player chooses from prescribed options. More recently, Martin et al. (2017) introduce a pipeline system for improvisational storytelling agents capable of collaboratively creating stories. These agents first focus on creating a plot for the story and then expand that plot into natural language sentences. Urbanek et al. (2019) introduce Light, a dataset of crowdsourced text-adventure game dialogs focusing on giving collaborative agents the ability to generate contextually relevant dialog and emotes.

Giannatos et al. (2011) use genetic algorithms to create new story plot points for an existing game of interactive fiction using an encoding known as a precedence-constraint graph. This graph gives the system information regarding the ordering of events that must happen in the game in order to advance. They demonstrate the workings of their system by generating additional content for the popular interactive fiction game Anchorhead, and show that this can be integrated into the original game. This work, however, is offline and relies on an existing interactive fiction game and knowledge of its precedence-constraint graph.

The Game Forge system (Hartsook et al., 2011) also uses genetic algorithms to generate a game world and plot line for a related type of game, the computer role playing game (CRPG). This work focuses on generating layouts and plot structures to create novel game worlds with a fitness function based on a transition graph that encodes pre-built game requirements. Tamari et al. (2019) focus on extracting action graphs for sequential decision making problems such as material science experiments and turn them into text-adventure games. Although these works use graph structures in order to constrain the generation of the game, we use these graph structures only to maintain coherence and focus on content creation.

Although there are works that attempt to automatically evaluate the creativity of the output of a generative process by computationally modeling potential human responses—such as in storytelling (Purdy et al., 2018)—we choose to rely on a human subject study based on the definition of creativity presented in Boden (2007). Specifically, we focus on the concepts of novelty and value, though we collect data for other defined metrics as well. We use the definition of novelty stemming from the idea of p-creativity, i.e. a concept that is entirely new to a single agent—in this case, a subject in our evaluation study. Value, as a component of computational creativity, however, is not defined concretely in Boden's work for a general domain. Our definition of value in the context of text-adventure games relies on accomplishment or achievement.

Figure 1: Example of ingredient connections.

3 Content Generation

In this section, we present two models, based on Markov chains and a neural language model, to generate the content, i.e. recipes, for our quests. Content generation for a quest in a text-adventure game, in this case a recipe, can be thought of as generating a sequence of events in which prior elements affect the probability of subsequent events. Markov chains present a simple and well-studied method of generating such content. Neural language models, designed to predict an element of a sequence conditioned on a given number of prior elements, let us generate sequences of events with more prior context, i.e. in the absence of the Markov assumption.

3.1 Markov Chains

Our first quest generation model is based on the use of Markov chains. This generation process consists of two steps. We first learn a weighted ingredient graph, a Markov chain, from a large scale knowledge base of recipes and then probabilistically walk along this graph to generate the instructions for the recipe.

3.1.1 Ingredient Graph

Generating the recipe requires domain knowledge. For example, creating a recipe for peanut butter cookies requires an understanding that an ingredient like peanut butter fits well with eggs, flour, and sugar while something like fish does not. We represent this knowledge with an undirected graph of ingredients. Our ingredient graph is based on recipes scraped from allrecipes.com (https://github.com/kbrohkahn/recipe-parser). The raw, uncleaned dataset included over 20,000 recipes with over 4,000 unique ingredients. A list of ingredients was extracted from each recipe, and each of these lists was converted into a set of ingredient pairs (Fig. 2). In total, there were 118,116 unique ingredient pairings, and 73,088 of those pairings (62%) occurred only once. We reduced the number of distinct ingredients from 4,460 to 1,703 by merging items with the same base ingredient and by replacing name-brand items with a generic equivalent.

Each node in the graph represents a possible ingredient, and weighted connections between these nodes represent how well the ingredients go together. The weight of each edge is the total number of occurrences of that ingredient pair within the recipe corpus: the edge connecting eggs and white sugar has a weight of 3,774, while the edge between hot milk and orange juice has a weight of 1. Ingredient pairings that do not occur within the recipe corpus have no edge in the graph, which helps prevent our model from generating completely incoherent recipe pairings (e.g. hot sauce and baby food). Take the graph in Fig. 1 as an example: in this complete graph, all of the ingredients go well with each other except for fish and sugar, whose low-weight connection suggests that they would likely not go well together in a recipe.
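As a rough sketch of this construction (the function name and toy corpus are illustrative, not the authors' code), the pair counting can be implemented as follows:

```python
from collections import Counter
from itertools import combinations

def build_ingredient_graph(recipes):
    """Build a weighted, undirected ingredient co-occurrence graph.

    `recipes` is a list of ingredient lists; the weight of an edge
    is the number of recipes in which both ingredients appear.
    """
    edges = Counter()
    for ingredients in recipes:
        # Each recipe contributes one count to every unordered pair;
        # sorting gives each pair a canonical (a, b) key.
        for a, b in combinations(sorted(set(ingredients)), 2):
            edges[(a, b)] += 1
    return edges

# Toy corpus standing in for the scraped allrecipes.com data.
recipes = [
    ["eggs", "white sugar", "flour"],
    ["eggs", "white sugar", "butter"],
    ["fish", "lemon"],
]
graph = build_ingredient_graph(recipes)
```

Here `graph[("eggs", "white sugar")]` would be 2, since the pair co-occurs in two of the toy recipes, while fish and eggs, which never co-occur, share no edge at all.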

Figure 2: Ingredient extraction process.

3.1.2 Instruction Generation

With the ingredient graph created, we begin the process of instruction generation based on sub-graph mining and prior generative methods based on probabilistic graph walks (Fleishman, 1978). We start by selecting an initial random ingredient x, weighted by its frequency in the graph.

(1)

We probabilistically select one of its neighbors based on the conditional frequency of the pair. Each subsequent iteration computes conditional probabilities in the same way and selects the next ingredient accordingly. We exclude all candidate ingredients in which any bag-of-words token is contained by any already-selected ingredient, ensuring that a variety of different ingredients are selected.

(2)

In Eq. 2, B_{x_i} refers to the 1-gram bag-of-words representation of ingredient x_i.

However, computing only complete conditional probabilities would remove the chance for entirely new combinations to emerge. We therefore calculate just the partial probability of having shared ingredients, with a bias designed to favor such combinations.

(3)
(4)

This process is repeated recursively to generate a recipe with the desired number of ingredients.

(5)

Finally, the resultant combinations are checked against the original corpus to guarantee the novelty of the result.
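A minimal sketch of this walk is given below, under the following assumptions (the original equations are not reproduced here): the initial ingredient is weighted by its marginal edge frequency, each subsequent step is weighted by pair counts, and the bag-of-words exclusion is approximated by rejecting candidates whose tokens overlap with any already-chosen ingredient. The exact bias term of Eqs. 3–4 is omitted.

```python
import random

def bow(ingredient):
    """1-gram bag of words over an ingredient name."""
    return set(ingredient.split())

def generate_recipe(graph, n_ingredients, rng=random):
    """Probabilistic walk over a weighted ingredient graph.

    `graph` maps unordered pairs (a, b) to co-occurrence counts.
    Candidates sharing a bag-of-words token with an already chosen
    ingredient are excluded, approximating the containment filter.
    """
    # Marginal frequency of each ingredient across all edges.
    freq = {}
    for (a, b), w in graph.items():
        freq[a] = freq.get(a, 0) + w
        freq[b] = freq.get(b, 0) + w
    chosen = [rng.choices(list(freq), weights=list(freq.values()))[0]]
    while len(chosen) < n_ingredients:
        last = chosen[-1]
        candidates, weights = [], []
        for (a, b), w in graph.items():
            if last not in (a, b):
                continue
            other = b if a == last else a
            # Reject near-duplicates like "white sugar"/"brown sugar".
            if any(bow(other) & bow(c) for c in chosen):
                continue
            candidates.append(other)
            weights.append(w)
        if not candidates:
            break  # no valid neighbor remains
        chosen.append(rng.choices(candidates, weights=weights)[0])
    return chosen
```

This sketch only conditions each step on the most recent ingredient, which is exactly the Markov assumption the section describes.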

3.2 Neural Language Model

Here we use neural language models to generate both the ingredients for a recipe and the instructions that use them. We use the same knowledge base described in Sec. 3.1.1 and train two separate language models: one to generate the ingredients, and the other to generate the recipe given a set of ingredients.

The first language model uses a simple 4-layer LSTM to generate a sequence of ingredients, treating all the words in a single ingredient as a single token. For example, "peanut butter" is a single token in this model. We train this model using the sets of ingredients found in each recipe across the entire recipe dataset, with each set ending in an EOI (End of Ingredients) tag. Once trained, the model generates a sequence of ingredients until the EOI is reached, using the top-p sampling technique (Holtzman et al., 2019).
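The truncated sampling step can be sketched as follows; since the LSTM itself is standard, the sketch substitutes a stand-in `next_token_probs` function for the trained network, and the nucleus cutoff of 0.9 is an illustrative choice, not a reported hyperparameter:

```python
import random

def top_p_sample(probs, p=0.9, rng=random):
    """Nucleus (top-p) sampling: draw from the smallest set of
    tokens whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, total = [], 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        total += prob
        if total >= p:
            break  # tail tokens are truncated away
    tokens = [t for t, _ in nucleus]
    weights = [w for _, w in nucleus]
    return rng.choices(tokens, weights=weights)[0]

def generate_ingredients(next_token_probs, rng=random):
    """Emit ingredient tokens until the EOI marker is sampled.
    `next_token_probs(prefix)` stands in for the trained LSTM."""
    seq = []
    while True:
        token = top_p_sample(next_token_probs(seq), rng=rng)
        if token == "<EOI>":
            return seq
        seq.append(token)
```

Because each multi-word ingredient is one token, the sampler returns whole ingredients such as "peanut butter" rather than word fragments.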

To generate the actual recipe, we use GPT-2 (Radford et al., 2019) and fine-tune the pre-trained 345M-parameter model on the recipe data. The data used to fine-tune this model contains the recipe title, ingredients, and instructions in unstructured text form. Once the model has been fine-tuned, we use it to generate the title and instructions for the recipe conditioned on the ingredients generated by the first language model. The entire generated recipe consists of the ingredients, title, and instructions.
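The exact text layout used for fine-tuning is not specified; one plausible serialization, which puts the ingredients first so that they can act as the conditioning prompt at generation time, might look like this (all field labels here are hypothetical):

```python
def to_training_text(title, ingredients, instructions):
    """Serialize one recipe into the kind of unstructured text a
    GPT-2 fine-tuning run expects. Ingredients come first so that,
    at generation time, the same prefix can be fed as a prompt and
    the model continues with the title and instructions."""
    lines = ["Ingredients:"]
    lines += [f"- {i}" for i in ingredients]
    lines += [f"Title: {title}", "Instructions:"]
    lines += [f"{n}. {step}" for n, step in enumerate(instructions, 1)]
    return "\n".join(lines) + "\n<|endoftext|>"
```

At generation time, only the "Ingredients:" block (built from the LSTM's output) would be supplied, and the fine-tuned model would be asked to complete the remainder.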

4 Quest Assembly

We now use the generated content, i.e. the recipe, to assemble a quest—grounding the generated ingredients and instructions in the game world. This requires us to first determine the structure of the game world and the locations of objects within this world in addition to transforming the set of generated instructions into executable actions. We use two types of semantically grounded knowledge graphs to represent this information: the object and action graphs.

Figure 3: Object graph in the one room map.

The object graph is used to determine the structure of the world and the most likely locations of objects within this world. For example, we could have information that says that vegetables must be stored in a refrigerator. If a recipe requires carrots, then the carrots would automatically be placed in a refrigerator at the start of the game. This graph is constructed by hand and is built to make the game world and resulting quest as coherent as possible.
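Since the object graph is hand-built, its placement logic can be pictured as a simple category-to-container mapping; the rules below are illustrative stand-ins in the spirit of Fig. 3, not the authors' actual graph:

```python
# Hypothetical category -> container rules mirroring the hand-built
# object graph; the categories and containers are illustrative.
OBJECT_GRAPH = {
    "vegetable": "refrigerator",
    "meat": "refrigerator",
    "dairy": "refrigerator",
    "fruit": "kitchen island",
    "utensil": "drawer",
}

def place_objects(items, categories, graph=OBJECT_GRAPH):
    """Assign each required item to a container via its category,
    so the item is already in a coherent location at game start."""
    return {item: graph[categories[item]] for item in items}

placement = place_objects(
    ["carrot", "knife"],
    {"carrot": "vegetable", "knife": "utensil"},
)
```

Under these rules the carrot lands in the refrigerator and the knife in the drawer, matching the example in the text.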

We construct object graphs for two different room layouts. The first, the one room (1R) map, consists of a kitchen as well as the objects and actions that exist within it. The second map, the five room (5R) map, is an extension of the first map and contains four additional rooms.

The object graph for the 1R map, shown in Fig. 3, is largely inspired by the simple, pre-built game provided within TextWorld (Côté et al., 2018). This object graph determines how and where objects are placed within the environment during game generation, and the action graph (Fig. 6) dictates how generated instructions are transformed into executable actions in the game. The object graph was constructed logically: tools and utensils go in the drawer, meat and dairy belong in the refrigerator, and so on. Food item placements are deterministic and coherent. Vegetables always go in the refrigerator, and fruit always goes on the kitchen island. The action graph was also designed to prevent the player from conducting illogical actions.

Figure 4: Room layout in the five room map.

Figure 5: Object graph in the five room map.

The 5R map includes a dining room, garage, backyard, and garden in addition to the kitchen (Fig. 4). Its object graph (Fig. 5) is designed to maintain the same levels of coherency as the 1R map while allowing for more diverse gameplay, which could in turn lead to higher levels of perceived creativity. The additional rooms are selected based on their possible relationships to the domain of food and cooking, and each new room has its own unique objects that players can interact with. For example, the garage has an old refrigerator that can be used to store meat. These new rooms and objects also allow for dynamic food placement. Meat can be placed in one of two refrigerators, and fruits and vegetables can possibly be found in the garden. The existence of these new locations is not immediately clear to the player. The garage and backyard are additionally obscured by closed doors, adding to quest complexity. While the additional rooms and dynamic food placement allow for more diverse gameplay, they do not sacrifice coherency.

Figure 6: Example action graph for both maps.

The action graph contains information regarding the affordances of the objects in the world and what kinds of objects are required to complete a given generated instruction. For example, if a generated instruction tells us to prepare vegetables, i.e. cut them, then this graph tells us that there must be a knife somewhere in this world. This graph is partially extracted from static cooking guides online using a mixture of OpenIE (Angeli et al., 2015) and hand-authored rules to account for the irregularities of cooking guides. An example of an action graph is given in Fig. 6. A player can peel fruit and vegetables, for example, but cannot peel a steak. There are also strict rules on what tools are required for certain actions. A player can only cut something if they have a knife and can only peel something with a peeler. While this restricts how players can interact with the environment, it ultimately reinforces game coherency.

We also note that when generating the quests, both the Markov chain and neural language model based generation systems use the object graph to determine object placement, but only the Markov chain based model uses the action graph. This is because the instructions generated by the Markov chain model take the form of a sequence of ingredients, which requires the action graph to determine the actions and additional objects needed to turn this list of ingredients into a playable quest. The action graph would thus take an ingredient such as a carrot and determine first that it needs to be cut and then that a knife is required for this task. The neural language model, on the other hand, already generates the full action, including potentially required objects, and so does not make use of this graph.
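The carrot-and-knife expansion described above can be sketched as a lookup; the graph entries and the step phrasing below are hypothetical, standing in for the action graph extracted via OpenIE and hand-authored rules:

```python
# Hypothetical action-graph entries: category -> (action, tool).
# A real entry set would be extracted from cooking guides.
ACTION_GRAPH = {
    "vegetable": ("cut", "knife"),
    "fruit": ("peel", "peeler"),
}

def expand_to_quest(ingredients, categories, graph=ACTION_GRAPH):
    """Turn a generated ingredient list into quest steps plus the
    extra objects (tools) that must exist in the game world."""
    steps, tools = [], set()
    for ing in ingredients:
        action, tool = graph[categories[ing]]
        steps.append(f"{action} the {ing} with the {tool}")
        tools.add(tool)  # the world must contain this tool
    return steps, tools
```

The returned tool set is what forces, for instance, a knife to be placed somewhere in the world whenever a vegetable appears in the recipe.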

5 Experiments

Our experiments were designed to compare perceived creativity and coherence, testing our models in addition to factors such as complexity. We tested five types of designs: Human Designed (HD), Random Assignment (RA), Markov Chains Simple (MCS), Markov Chains Complex (MCC), and Language Model (LM). HD is exactly what it sounds like: a game created by a person. In this game, a human creates both the ingredients and instructions for a recipe and is also responsible for quest assembly, i.e. grounding the generated content in a given game world. We did not require experience in designing text-adventure games when picking a human to create this game, as the task can be performed even by novices given the easily understandable "home" theme of the game world. The game was manually crafted in terms of deciding which ingredients to place where and what the final recipe would be. RA is on the opposite end of the spectrum: as the name suggests, everything is placed in a random location, and the recipe can be totally random, with ingredients and instructions that might not normally be seen together. MCS and MCC use our Markov chains approach to generate quest content; the difference between them is that the former has four ingredients in its recipe while the latter has eight, varying the complexity to see how it affects perceived creativity. LM refers to the games generated using the recipes produced by the language model. We additionally had one-room and five-room variants of each model to test how the structure and length of the game affect players.

Evaluating the creativity of the output of any computational generation process is a difficult task which requires concrete definitions of the metrics being measured. We thus set up the experiment by deploying our game designs on Amazon Mechanical Turk for people to play and provide feedback. Specifically, each participant played one randomly selected game from the one-room layout and filled out a survey for that game, then played one randomly selected game from the five-room layout and filled out an identical survey. Subjects were provided with a simple practice game that they could play beforehand to familiarize themselves with TextWorld and its interface. We had 75 total participants for the entire study, with an average of 15 people playing each game. The only restriction for participants was that they had to be fluent in English; this was determined by means of prebuilt restrictions on Amazon Mechanical Turk and game completion verification.

The users were asked questions pertaining to two metrics: coherence and creativity. We measured creativity in the survey using the components of creativity as defined by Boden: novelty, surprise, and value. The survey posed questions measuring each of our defined metrics on 7-point Likert scales; for example, to measure novelty it asked "How original was the quest you played? 1: not at all novel, 7: exceptionally novel". The other factors were measured using similarly phrased questions. A one-way ANOVA test was then conducted, followed by Tukey HSD post-hoc analysis, to determine significance. The raw scores for each group as well as the significant results between pairs of different models are presented below.
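For reference, the between-group versus within-group decomposition behind the one-way ANOVA can be sketched with the standard library (the authors presumably used an off-the-shelf statistics package; this omits the p-value lookup and the Tukey HSD step, which both build on the same decomposition):

```python
from statistics import mean

def one_way_anova_F(groups):
    """F statistic for a one-way ANOVA over Likert-score groups:
    the ratio of between-group to within-group mean squares."""
    k = len(groups)                      # number of games compared
    n = sum(len(g) for g in groups)      # total number of ratings
    grand = mean(x for g in groups for x in g)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Variation of individual ratings around their group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

A large F indicates that the game conditions differ more from each other than the raters within each condition do, which is what licenses the post-hoc pairwise comparisons.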

6 Results and Discussion

Figure 7: Coherence scores for each game. Error bars indicate one standard deviation.

We present results for four metrics: coherence, unpredictability (or surprise), novelty (or originality), and value (or accomplishment) for each of the games. Additionally, we show the p-value of a one-way ANOVA test for the distributions in each category to determine statistical significance. This test tells us whether the differences in the means across the different games are significant for each category separately. The Tukey HSD post-hoc analysis further tells us which specific pairs of results are significant. We hypothesized that semantic grounding using the knowledge graph would enable our models to maintain coherence on par with the human-designed games. Further, given the stochastic nature of our generative models, we predicted that our models would rate as comparable in creativity to the human-designed games, with all models outperforming the randomly generated games. As seen in the figures below, these predictions hold.

We find that the results for each individual category are statistically significant in all cases. Additionally, all of the specific pairwise comparisons we make are significant; a full set of these pairwise results can be found in Appendix A. The rest of this section discusses each of these metrics in more detail.

Figure 8: Unpredictability (surprise) scores for each game. Error bars indicate one standard deviation.

Fig. 7 displays trends in the players' perception of coherence for each of the games. We first see that the one-room games were consistently rated more coherent than the five-room games, indicating that overall quest coherence—and thus the coherence of our generative system—degrades as the quest grows longer and more complex. Across the games, the RA models were considered the least coherent. The MCS model slightly outperforms the MCC model, showing that the Markov chain based models are more coherent the less complex the output. The LM achieves a higher score than both Markov chain models, maintaining coherence more easily than either. Most importantly, all of these methods are comparable in coherence to the human-authored games, i.e. our semantically grounded knowledge graph ensures that coherence is not lost when generating content.

Figure 9: Originality (novelty) scores for each game. Error bars indicate one standard deviation.

Similarly, Fig. 8 describes how surprising the games were to the players. The difference between the one-room and five-room games is much more pronounced here. Players find the five-room games, which are longer and more complex, much more surprising than their one-room counterparts, showing that complexity is an important factor in determining surprise. Another indication of this is that the MCC model is rated as more surprising than the less complex MCS model. The LM achieves comparable performance to MCC, and once again all models perform as well as the HD games.

Originality (Fig. 9), which we use as a proxy for novelty, exhibits trends similar to surprise. The longer, more complex games are deemed more original. Despite being random, the RA games are seen as less original than the rest of the games, perhaps indicating a link between perceptions of coherence and originality. The gaps in performance here are much less pronounced, however, and the Markov chain models slightly edge out the LM, with all three being comparable to the HD games.

To measure value, or utility, in a text-adventure game, we asked the players if they felt a sense of accomplishment after finishing the game. We see that players generally reported a higher sense of accomplishment after finishing more complex games, with the exception of the RA games, both of which performed poorly—likely because they are relatively incoherent. We also note that the LM showed the highest values here, surpassing the HD games. We hypothesize that this might be due to the player having to perform a wider range of actions, some relatively unintuitive, that are not constrained by our action graph.

Figure 10: Accomplishment (value) scores for each game. Error bars indicate one standard deviation.

7 Conclusions

We have demonstrated a framework to automatically generate cooking quests in a "home" themed text-adventure game, although our framework can be generalized to other themes as well. Quest generation in a given game world is a subset of the overall problem of generating entire text-adventure games. Content generated by both the Markov chains and neural language model can be grounded in a given game world using domain knowledge encoded in the form of a knowledge graph. The models each excel on different metrics: the Markov chains model is more surprising and novel, while the neural language model offers greater value and coherence. We also note, however, that the neural language model requires less domain knowledge than the Markov chains and is thus potentially more generalizable to other themes and types of quests.

Our human subject study shows an inverse relationship between creativity and coherence, but only once a certain threshold of coherence is passed: above that threshold, less coherent games are perceived as more creative, while games below it, such as those generated by the RA model, are perceived as less creative. Furthermore, our automatically generated games consistently perform at least as well as human designed games in this setting, both in terms of coherence and creativity, implying that the generative process can be automated without a loss in perceived game quality.

8 Acknowledgements

The authors would like to thank Dr. Ashok Goel and Fred Bane for their advice and encouragement of the ideas in this work.

References

  • P. Ammanabrolu and M. O. Riedl (2019a) Playing text-adventure games with graph-based deep reinforcement learning. In Proceedings of 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Cited by: §2.
  • P. Ammanabrolu and M. O. Riedl (2019b) Transfer in deep reinforcement learning using knowledge graphs. CoRR abs/1908.06556. Cited by: §2.
  • G. Angeli, J. Premkumar, M. Jose, and C. D. Manning (2015) Leveraging Linguistic Structure For Open Domain Information Extraction. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Cited by: §4.
  • M. A. Boden (2007) Creativity in a nutshell. Think 5 (15), pp. 83–96. Cited by: §2.
  • A. Bordes, N. Usunier, and J. Weston (2010) Towards understanding situated natural language. In Proceedings of the 2010 International Conference on Artificial Intelligence and Statistics, Cited by: §2.
  • J. Bruner (1991) The narrative construction of reality. Critical Inquiry 18 (1), pp. 1–21. Cited by: §1.
  • M. Côté, Á. Kádár, X. Yuan, B. Kybartas, T. Barnes, E. Fine, J. Moore, M. Hausknecht, L. E. Asri, M. Adada, et al. (2018) Textworld: a learning environment for text-based games. arXiv preprint arXiv:1806.11532. Cited by: §1, §2, §4.
  • A. I. Fleishman (1978) A method for simulating non-normal distributions. Psychometrika 43 (4), pp. 521–532. Cited by: §3.1.2.
  • N. Fulda, D. Ricks, B. Murdoch, and D. Wingate (2017) What can you do with a rock? affordance extraction via word embeddings. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pp. 1039–1045. External Links: Document, Link Cited by: §2.
  • S. Giannatos, M. J. Nelson, Y. Cheong, and G. N. Yannakakis (2011) Suggesting New Plot Elements for an Interactive Story. In Workshop on Intelligent Narrative Technologies (INT’11), External Links: Link Cited by: §2.
  • M. Guzdial, B. Harrison, B. Li, and M. Riedl (2015) Crowdsourcing open interactive narrative.. In FDG, Cited by: §2.
  • M. Haroush, T. Zahavy, D. J. Mankowitz, and S. Mannor (2018) Learning How Not to Act in Text-Based Games. In Workshop Track at ICLR 2018, pp. 1–4. Cited by: §2.
  • K. Hartsook, A. Zook, S. Das, and M. O. Riedl (2011) Toward supporting stories with procedurally generated game worlds. In 2011 IEEE Conference on Computational Intelligence and Games (CIG’11), Vol. , pp. 297–304. External Links: Document, ISSN 2325-4270 Cited by: §2.
  • M. Hausknecht, P. Ammanabrolu, M. Côté, and X. Yuan (2019a) Interactive fiction games: a colossal adventure. CoRR abs/1909.05398. Cited by: §2.
  • M. Hausknecht, R. Loynd, G. Yang, A. Swaminathan, and J. D. Williams (2019b) NAIL: a general interactive fiction agent. CoRR abs/1902.04259. Cited by: §2.
  • J. He, J. Chen, X. He, J. Gao, L. Li, L. Deng, and M. Ostendorf (2016) Deep Reinforcement Learning with a Natural Language Action Space. In Association for Computational Linguistics (ACL), Cited by: §2.
  • A. Holtzman, J. Buys, M. Forbes, and Y. Choi (2019) The curious case of neural text degeneration. CoRR abs/1904.09751. Cited by: §3.2.
  • J. R. Johnston (2008) Narratives Twenty-Five Years Later. In Top Lang Disorders, Vol. 28, pp. 93–98. External Links: Link Cited by: §1.
  • B. Li, S. Lee-Urban, D. S. Appling, and M. O. Riedl (2012) Crowdsourcing Narrative Intelligence . In Advances in Cognitive Systems, Vol. 1, pp. 1–18. External Links: Link Cited by: §2.
  • B. Magerko, J. Permar, M. Jacob, M. Comerford, and J. Smith (2014) An Overview of Computational Co-creative Pretend Play with a Human. In Proceedings of IVA 2014, External Links: Link Cited by: §2.
  • L. J. Martin, P. Ammanabrolu, X. Wang, S. Singh, B. Harrison, M. Dhuliawala, P. Tambwekar, A. Mehta, R. Arora, N. Dass, C. Purdy, and M. O. Riedl (2017) Improvisational Storytelling Agents. In Workshop on Machine Learning for Creativity and Design (NeurIPS 2017), Long Beach, CA. External Links: Link Cited by: §2.
  • K. Narasimhan, T. Kulkarni, and R. Barzilay (2015) Language Understanding for Text-based Games Using Deep Reinforcement Learning. In Conference on Empirical Methods in Natural Language Processing (EMNLP), Cited by: §2.
  • J. Permar and B. Magerko (2013) A Conceptual Blending Approach to the Generation of Cognitive Scripts for Interactive Narrative. In 9th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE’13), Cited by: §2.
  • C. Purdy, X. Wang, L. He, and M. Riedl (2018) Predicting Generated Story Quality with Quantitative Measures. In Proceedings of the 14th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE’18), Cited by: §2.
  • A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever (2019) Language models are unsupervised multitask learners. Cited by: §3.2.
  • R. Tamari, H. Shindo, D. Shahaf, and Y. Matsumoto (2019) Playing by the book: an interactive game approach for action graph extraction from text. In Proceedings of the Workshop on Extracting Structured Knowledge from Scientific Publications, Minneapolis, Minnesota, pp. 62–71. External Links: Link, Document Cited by: §2.
  • R. Y. Tao, M. Côté, X. Yuan, and L. E. Asri (2018) Towards solving text-based games by producing adaptive action spaces. In Proceedings of the 2018 NeurIPS Workshop on Wordplay: Reinforcement and Language Learning in Text-based Games, Cited by: §2.
  • J. Urbanek, A. Fan, S. Karamcheti, S. Jain, S. Humeau, E. Dinan, T. Rocktäschel, D. Kiela, A. Szlam, and J. Weston (2019) Learning to speak and act in a fantasy text adventure game. CoRR abs/1903.03094. Cited by: §2.
  • T. Veale, D. O’Donoghue, and M. T. Keane (2000) Computation and Blending. Cognitive Linguistics 11(3/4), pp. 253–282. External Links: Link Cited by: §2.
  • Z. Yang, S. Zhang, J. Urbanek, W. Feng, A. H. Miller, A. Szlam, D. Kiela, and J. Weston (2018) Mastering the dungeon: grounded language learning by mechanical turker descent. In Proceedings of the Sixth International Conference on Learning Representations (ICLR’18), Cited by: §2.
  • A. Zook, B. Magerko, and M. Riedl (2011) Formally modeling pretend object play. In Proceedings of the 8th ACM Conference on Creativity and Cognition, New York, NY, USA, pp. 147–156. External Links: ISBN 978-1-4503-0820-5, Link, Document Cited by: §2.

Appendix A Results

Figure 11: Coherence scores for each game. Error bars indicate one standard deviation.

Group 1   Group 2   Meandiff
HD-1      RA-5      -1.3958
LM-1      RA-5      -1.5
MCC-5     RA-5      -0.5333
MCS-1     RA-5      -1.1667
RA-1      RA-5      -1.2157
LM-5      MCS-1      0.1275
LM-5      MCS-5     -0.4392
LM-5      RA-5      -1.0392
LM-1      RA-1      -0.2843

Table 1: Coherence results for the post-hoc Tukey HSD test; only significant results are shown.
Figure 12: Originality (novelty) scores for each game. Error bars indicate one standard deviation.

Group 1   Group 2   Meandiff
HD-1      HD-5       1.8125
HD-1      LM-1       1.2292
HD-1      LM-5       1.4007
HD-1      MCC-5      1.6125
HD-1      MCS-1      0.5347
HD-1      MCS-5      1.5458
HD-1      RA-1       1.5772
HD-1      RA-5       0.9458
MCC-1     MCC-5      1.05
MCC-1     RA-1       1.0147

Table 2: Originality results for the post-hoc Tukey HSD test; only significant results are shown.
Figure 13: Unpredictability (surprise) scores for each game. Error bars indicate one standard deviation.

Group 1   Group 2   Meandiff
HD-1      HD-5       2.6106
HD-1      LM-5       2.7463
HD-1      MCC-5      2.6875
HD-1      MCS-5      1.8875
HD-1      MCC-1      0.9375
HD-1      RA-1       0.9228
HD-1      RA-5       1.4208
HD-5      LM-1      -2.3397
HD-5      MCC-1     -1.6731
HD-5      MCS-1     -2.5342
LM-1      LM-5       2.4755
LM-1      MCC-1      0.6667
LM-1      MCC-5      2.4167
LM-1      MCS-5      1.6167
LM-5      MCC-1     -1.8088
LM-5      MCS-1     -2.6699
LM-5      RA-1      -1.8235
LM-5      RA-5      -1.3255
MCC-1     MCC-5      1.75
MCC-1     MCS-1     -0.8611
MCC-5     RA-1      -1.7647
MCC-5     RA-5      -1.2667
MCS-1     MCS-5      1.8111
MCS-1     RA-1       0.8464
MCS-1     RA-5       1.3444

Table 3: Surprise results for the post-hoc Tukey HSD test; only significant results are shown.
Figure 14: Accomplishment (value) scores for each game. Error bars indicate one standard deviation.

Group 1   Group 2   Meandiff
HD-1      HD-5       1.899
HD-1      LM-1       2.1042
HD-1      LM-5       2.7904
HD-1      MCC-5      2.0375
HD-1      MCS-1      1.6042
HD-1      MCS-5      1.7708
HD-1      RA-1       1.6728
HD-5      LM-5       0.8914
HD-5      RA-5      -1.3282
LM-1      MCC-1     -1.25
LM-1      MCS-1     -0.5
LM-1      RA-5      -1.5333
LM-5      MCC-1     -1.9363
LM-5      MCS-1     -1.1863
LM-5      MCS-5     -1.0196
LM-5      RA-1      -1.1176
LM-5      RA-5      -2.2196
MCC-1     MCC-5      1.1833
MCC-5     RA-5      -1.4667
MCS-5     RA-5      -1.2

Table 4: Value results for the post-hoc Tukey HSD test; only significant results are shown.