Exploration in NetHack With Secret Discovery
Abstract
Roguelike games generally feature exploration problems as a critical, yet often repetitive element of gameplay. Automated approaches, however, face challenges in terms of optimality, as well as due to incomplete information, such as from the presence of secret doors. This paper presents an algorithmic approach to exploration of roguelike dungeon environments. Our design aims to minimize exploration time, balancing coverage and discovery of secret areas with resource cost. Our algorithm is based on the concept of occupancy maps popular in robotics, adapted to encourage efficient discovery of secret access points. Through extensive experimentation on NetHack maps we show that this technique is significantly more efficient than simpler greedy approaches. We further investigate optimized parameterization for the algorithm through a comprehensive data analysis. These results point towards better automation for players as well as heuristics applicable to fully automated gameplay.
I Introduction
Many video games place emphasis on the idea of exploration of the unknown. In roguelikes, a popular subset of Role-Playing Games (RPGs), exploration of the game space is a key game mechanic, essential to resource acquisition and game progress. The high level of repetition involved, however, makes automation of the exploration process useful, as an aid to game design, for relieving player tedium in relatively safe levels or under casual play, and to reduce control requirements for those operating with reduced interfaces [1]. Basic forms of automated exploration are found in several roguelikes, including the popular Dungeon Crawl Stone Soup.
Algorithmic approaches to exploration typically aim at being exhaustive. Even with full information, however, ensuring complete coverage can result in significant inefficiency, with coverage improvement coming at greater costs as exploration continues [2]. Diminishing returns are further magnified in the presence of “secret rooms,” areas which must be intentionally searched for at additional, nontrivial resource cost, and which are a common feature of roguelike games. In such contexts the complexity is less driven by the need to be thorough, and more given by the need to balance the time spent exploring a space with respect to the amount of benefit accrued (area revealed, items collected).
In this work we present a novel algorithm for exploration of an initially unknown environment. Our design aims to accommodate features common to roguelike games. In particular, we aim for an efficient, balanced approach to exploration, considering the cost of further exploration in relation to the potential benefit. We factor in the relative importance of different areas, focusing on room coverage versus full/corridor coverage, and address the existence of secret rooms (secret doors) as well. Our design is inspired by a variation of occupancy maps, adapted from robotics into video games [3]. In this way we can control how the space is explored, following a probability gradient that flows from places of higher potential benefit.
We compare this approach with a simpler, greedy algorithm more typical of a basic automated strategy, applying both to levels from the canonical roguelike, NetHack. This environment gives us a realistic and frequently mimicked game context, with uneven exploration potential (rooms versus corridors), critical resource limitations (every move consumes scarce food resources), and a nontrivial, dungeonlike map environment, including randomized placement and discovery of secret doors. Compared to the greedy approach, our algorithm shows improvement in overall efficiency, particularly with regard to discovery of secret areas. We enhance this investigation with a deep consideration of the many different parameterizations possible, showing the relative impact of a wide variety of algorithm design choices.
Our design is intended to provide a core system useful in higher level approaches to computing game solutions, as well as in helping good game design. For the former we hope to reduce the burden of exploration itself as a concern in research into techniques that fully automate gameplay.
Specific contributions of this work include:

We heavily adapt a known variation on occupancy maps to the task of performing efficient exploration of dungeonlike environments.

We further extend the exploration algorithm to address the presence of secret doors. Locating and stochastically revealing an unknown set of hidden areas adds notable complexity and cost to optimizing an exploration algorithm.

Our design is backed by extensive experimental work, validating the approach and comparing it with a simpler, greedy design, as well as exploring the impact of the variety of different parameterizations available in our approach.
This work builds on a previous short (poster) publication, wherein we described the basic exploration algorithm [4]. Here we significantly extend that work, incorporating discovery of secret doors into the greedy and occupancy map algorithms, performing additional experimental comparison in that context, and adding a nontrivial regression analysis to better understand the importance of the many individual parameters involved in the algorithm design.
II Related Work
Automated exploration or mapping of an environment has been frequently studied in several fields, primarily including robotics and with respect to the problem of graph traversal, with the latter having some connections to video games.
Exploration in robotics branches into many different topics, with some factors being the type of environment to be explored, the amount of prior knowledge about the environment, and the accuracy of robotic sensors. One frequently discussed approach is simultaneous localization and mapping (SLAM), where a robot must map a space while keeping precise track of its current position inside said space. Since the environment we deal with in this paper gives a top-down view and thus accurate information about player position, we can avoid this issue. Good surveys of robotic exploration algorithms, with coverage of the SLAM issue, can be found in [5] and [6]. A more general survey of robotic mapping with coverage of exploration can be found in [7].
One algorithm popular in robotics for exploring unknown environments is known as occupancy mapping [8, 9]. This approach, used in conjunction with a mobile robot and planning algorithm, maps out an initially unknown space by maintaining a grid of cells over the space, with each cell representing the probability that the corresponding area is occupied (by an obstacle/wall, e.g.). With this data structure, knowledge within a certain confidence margin can be established about which areas of the space are traversable, with the data from different sensors being combined to even out sensor inaccuracies.
This sort of representation of the learned map must then be leveraged to decide where to move next for efficient exploration. Strategies typically involve an ordering or choice of frontiers to visit, sometimes determined by an evaluation function which takes into account objectives like minimizing distance travelled or exploring the largest amount of map the fastest. Yamauchi described a strategy using occupancy maps to always move towards the closest frontier in order to explore a space [10], with a focus on how to detect frontiers in imprecise occupancy maps. González-Baños and Latombe discussed taking into account both distance to a frontier and the ‘utility’ of that frontier (a measure of the unexplored area potentially visible when at that position) [11], also taking into account robotic sensor issues. We use a similar cost-utility strategy for our evaluation function, with utility determined by probabilities in the occupancy map, and cost by distance to player. Juliá showed that a cost-utility method for frontier evaluation explores more of the map faster than the closest frontier approach, but in the end takes longer to explore the entire map than the latter since it must backtrack to explore areas of low utility [5]. Further discussion and comparison of evaluation functions can be found in [12].
The exploration problem in robotics is also related to the coverage path planning problem, where a robot must determine a path to take that traverses the entirety of a space. A cellular decomposition of the space is used in many such approaches. For example, Xu et al. presented an algorithm to guarantee complete coverage of a known environment (containing obstacles) while minimizing distance travelled based on the boustrophedon cellular decomposition method, which decomposes a space into slices [13]. See Choset [14] for a comprehensive discussion and survey of selected coverage approaches.
There have also been formulations of exploration in the context of graph traversal. An obvious correspondence exists with the travelling salesman problem. Kalyanasundaram and Pruhs described the ‘online TSP’ problem as exploring an unknown weighted graph, visiting all vertices while minimizing total cost, and presented an algorithm to do so efficiently [15]. Koenig analyzed a greedy approach to explore an unknown graph (to always move to the closest frontier), and showed that the upper bound for worst-case travel distances for full map exploration is reasonably small [16, 17]. Hsu and Hwang demonstrated a provably complete graph-based algorithm for autonomous exploration of an indoor environment [18].
Graph traversal for exploration can also be applied to video games. Chowdhury looked at approaches for computing a tour of a fully known environment in the context of exhaustive exploration strategies for non-player characters in video games [2]. Baier et al. proposed an algorithm to guide an agent through both known and partially known terrain in order to catch a moving target in video games [19]. Hagelbäck and Johansson explored the use of potential fields to discover unvisited portions of a real-time strategy game map with the goal of creating a better computer AI for the game [20]. Our work, in contrast, focuses on uneven exploration in sparse, dungeonlike environments, where exhaustive approaches compete with critical resource efficiency.
III Background
Three concepts underpin our work and will be briefly discussed below: the particular flavour of occupancy maps used as the basis for our exploration algorithm; the game used for our research environment; and a brief elucidation on the concept of secret rooms and their presence in video games.
Occupancy Maps in Games
Using the aforementioned occupancy maps from robotics as inspiration, Damián Isla created an algorithm geared towards searching for a moving target in a video game context [21]. The algorithm has been used in at least one game to date [3].
As in the original occupancy map, a discrete grid of probabilities is maintained over a space (e.g., a game map), but each probability now represents the confidence that the corresponding area contains the target. A non-player character (NPC) can then use said map to determine where best to move in order to locate a target (such as the player).
At each timestep, after the NPC moves, probabilities in the grid will update to account for the target’s movement in that timestep. At any timestep, the searcher (NPC) can only be completely confident that the target is not in the cells within its field of view (i.e., those cells will have a probability of 0 if the target is not present there). If the target is in sight, then the NPC can simply move towards them; if not, then probabilities in the grid will diffuse to their neighbours, to account for possible target movements in the areas outside the searcher’s current field of view. Diffusion for each cell $n$ outside the NPC’s field of view at time $t$ is performed as follows (assuming each cell has 4 neighbours):
$$P_{t+1}(n) = (1 - \lambda)\,P_t(n) + \frac{\lambda}{4} \sum_{n' \in \mathrm{neighbours}(n)} P_t(n')$$
where $\lambda \in [0, 1]$ controls the amount of diffusion.
Our implementation of occupancy maps borrows concepts from Isla’s formulation, namely the idea of diffusion, which is repurposed for an exploration context.
NetHack
NetHack is a popular roguelike video game created in 1987 and is used as the environment for our experiments. Gameplay occurs on a two-dimensional, text-based grid of size 80x20, wherein a player can move around, collect items, fight monsters, and travel to deeper dungeon levels. To win the game, a player must travel through all 53 levels of the dungeon, fight the high priest of Moloch and collect the Amulet of Yendor, then travel back up through all the levels while being pursued by the angry Wizard and finally ascend through the five elemental planes [22].
Levels in NetHack consist of large, rectangular rooms (around 8 on average) connected by maze-like corridors. Levels can be sparse, with many empty (nontraversable) tiles. For the most part, levels are created using a procedural content generator, an advantage for conducting research in exploration since an algorithm can be tested on many different map configurations. At the start of each level, the player can observe only their current room with the rest of the map hidden, and must explore to uncover more. An example of a typical NetHack map is presented in Figure 1; other maps can be seen in Figures 2 and 6.
Although map exploration is important, it is also essential to do so in a minimal fashion. Movement in NetHack is turn-based (each move taking one turn), and the more turns taken, the hungrier one becomes. Hunger can be satiated by food, which is randomly and sparingly placed within the rooms of a level [23]. Most food does not regenerate after having been picked up on a level, so a player must move to new levels at a brisk pace to maintain food supplies. A player that does not eat for an extended period will eventually starve to death and lose the game [24].
In this context, it is critical to minimize the number of actions taken to explore a level so that food resources are preserved. Rooms are critical to visit since they may contain food and items that increase player survivability, as well as the exit to the next level (needed to advance further in the game). Conversely, the corridors that connect rooms have no intrinsic value. Some may lead to dead ends or circle around to already visited rooms. Exploring all corridors of a level is typically considered a waste of valuable actions. Therefore, a good exploration strategy will minimize visitation of corridors while maximizing room visitation.
Secret areas
Secret areas are a popular element of game levels and motivate comprehensive exploration of a space. These areas are not immediately observable by a player and must be discovered through extra action on the player’s part. Secret areas can be a mechanism to reward players for thoroughly exploring an area, sometimes containing valuable rewards [25]. In certain genres, secret areas are irrelevant to player power but confer a sense of achievement for the player clever enough to find them. Gaydos & Squire found that hidden areas in the context of educational games are memorable moments for players and generate discussion amongst them [26]. Secret areas are common in many game genres, being perhaps most prevalent in the roguelike genre, with the prototypical roguelike games (Rogue, NetHack, et al.) all including procedurally generated secret areas. Unlike the areas discussed by Gaydos & Squire, however, procedurally generated secret areas seem to involve less excitement since the searching process becomes repetitive.
Not much work has been done in terms of algorithms to search for secret areas. In terms of NetHack specifically, the ‘BotHack’ autonomous player (the first bot to win the game) employs a simple secret area detection strategy. If either the exit to the next level has not yet been found and/or there is a large rectangular chunk of the level that is unexplored and has no neighbouring frontiers, it will start searching at positions that face that area [27, 28].
NetHack implementation
Secret areas in NetHack are created during level generation by marking certain traversable spots of the map as hidden. Both corridors as well as doors (areas that transition between rooms and corridors) can be marked as hidden (with a 1/8 chance for a door, and 1/100 chance for a corridor) [29]. On average, there are 7 hidden spots in a level. These hidden spots initially appear to the player as regular room walls (if generated as doors) or as empty spaces (if corridors) and cannot be traversed. The player can discover and make traversable a hidden spot by moving to a square adjacent to it and using the ‘search’ action, which consumes one turn. The player may have to search multiple times since revealing the secret position is stochastic.
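Because revealing a hidden spot is stochastic, the cost of confirming or ruling one out can be estimated with a short calculation. The sketch below assumes that each search independently reveals an adjacent hidden spot with a fixed probability `p`; that probability, the independence assumption, and both function names are illustrative and are not taken from the NetHack source.

```python
def detection_probability(p: float, searches: int) -> float:
    """Chance that at least one of `searches` independent attempts succeeds."""
    return 1.0 - (1.0 - p) ** searches


def searches_for_confidence(p: float, confidence: float) -> int:
    """Smallest number of searches whose detection probability reaches `confidence`."""
    if not 0.0 < p <= 1.0 or not 0.0 <= confidence < 1.0:
        raise ValueError("need 0 < p <= 1 and 0 <= confidence < 1")
    searches = 0
    while detection_probability(p, searches) < confidence:
        searches += 1
    return searches
```

Under these assumptions, each additional search buys less certainty than the previous one, which is one way to see why a fixed search budget per wall is a reasonable design choice.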
Since searching consumes actions like regular movement, the number of searches as well as the choice of locations searched must likewise be optimized to preserve food resources. Intuitively, we would like to search walls adjacent to large, unexplored areas of the map, for which there do not appear to be any neighbouring frontiers. Similarly, corridors that terminate in dead ends are also likely candidates for secret spots, as seen in Figure 2.
With the NetHack method of secret spot generation in mind, it becomes clear that it is not a good idea to attempt to discover every single hidden spot on a map. Some secret doors or corridors may lead to nowhere at all, or perhaps lead to another secret door which opens into a room that the player has already visited. Depending on the map configuration, the player may be able to easily spot such an occurrence and avoid wasting time searching in those areas. There is also a tradeoff between finding all disconnected rooms in a map and conserving turns; if there is only a small area of the map that seems to contain a hidden area, then spending a large effort trying to find it may not be worthwhile.
IV Exploration Approach
Below we detail the basic exploration algorithm involving occupancy maps, and contrast it with a simpler, greedy approach as well as an approximately optimal solution. Key to our algorithm is the idea of limiting exploration to a subset of interesting space in order to minimize exploration time, by taking into account frontier utility and distance. We begin by discussing the modified NetHack environment in which the algorithms will run, followed by an outline of each algorithm with and without support for detecting secret areas. Results and discussion close the section with an emphasis on analysis of algorithm parameters.
Environment
A modified version of the base NetHack game is used to test our exploration algorithms. Mechanics that might alter experiment results were removed, including monsters, starvation, weight limitations, locked doors, and certain dungeon features that introduce an irregular field of view. In addition, a switch to enable or disable generation of secret doors and corridors was added.
The maps used in testing are those generated by NetHack for the first level of the game. The same level generation algorithm is used throughout a large part of the game, so using maps from only the first level does not limit generality. Later levels can contain special, fixed structures, but there is no inherent obstacle to running our algorithm on these structures; we are just mainly interested in applying exploration to the general level design (basic room/corridor structure).
The algorithms below use the NetHack player field of view. When a player enters a room in NetHack, they are able to immediately perceive the entire room shape, size, and exits (doors). In corridors, knowledge is revealed about only the immediate neighbours to the player’s position. Our algorithms will gain the same information as the player in these cases. We do not, however, support ‘peeking’ into rooms, where a player can perceive a portion of a room by being parallel to and within a certain distance of one of its doors.
Greedy algorithm
A greedy algorithm is used as a baseline for our experiments, which simply always moves to the frontier closest to the player. This type of approach is often formalized as a graph exploration problem, where we start at a vertex $v$, learn the vertices adjacent to $v$, move to the closest unvisited vertex (using the shortest path) and repeat [16]. The algorithm terminates when no frontiers are left. We also take into account the particularities of the NetHack field of view as described above (when we enter a room, all positions in the room are set to visited, and its exits are added to the frontier list).
Note that this formulation will by nature uncover every traversable space on the map, both rooms and corridors alike.
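The closest-frontier loop can be sketched as follows. This is a simplified illustration rather than the implementation used in the experiments: every tile is treated like a corridor tile (only the four orthogonal neighbours are revealed on arrival), the room field-of-view rule is omitted, and the level encoding (`.` for traversable, anything else blocked) and all function names are assumptions made for the example.

```python
from collections import deque


def neighbours(cell):
    r, c = cell
    return [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]


def nearest_frontier(position, visited, frontiers):
    """Breadth-first search through visited tiles to the closest frontier."""
    queue, seen = deque([(position, 0)]), {position}
    while queue:
        cell, dist = queue.popleft()
        if cell in frontiers:
            return dist, cell
        for nxt in neighbours(cell):
            if nxt not in seen and (nxt in visited or nxt in frontiers):
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def greedy_explore(level, start):
    """Visit every reachable tile, always heading for the closest frontier."""
    walkable = {(r, c) for r, row in enumerate(level)
                for c, ch in enumerate(row) if ch == "."}
    visited = {start}
    frontiers = {n for n in neighbours(start) if n in walkable}
    position, moves = start, 0
    while frontiers:
        found = nearest_frontier(position, visited, frontiers)
        if found is None:          # defensive: frontier unreachable
            break
        dist, position = found
        moves += dist
        frontiers.discard(position)
        visited.add(position)
        # Arriving at a tile reveals its unvisited traversable neighbours.
        frontiers |= {n for n in neighbours(position)
                      if n in walkable and n not in visited}
    return moves, visited
```

On a small ring-shaped corridor such as `["...", ".#.", "..."]`, starting in a corner, the loop walks once around the ring and visits all eight tiles.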
Approximately optimal algorithm
For a lower bound on the number of moves needed to visit all rooms on a NetHack map, we present an approximately optimal algorithm. We call the algorithm ‘optimal’ since it will be given the full map and so can plan the best route to take for room visitation. It is only approximate since it will seek to visit the center of each room, while a faster version could move from room exit to room exit, avoiding the center and thus saving a couple of moves per each room on a map.
To run this algorithm, we construct a complete graph where each vertex represents the centroid of a room on the current NetHack map, and edges between room centroids represent the shortest distance between them in map units (calculated using A*). We then pass this graph to a travelling salesman problem solver, along with the player’s starting room. In order to prevent the TSP solver from returning to the initial centroid at the end, we add two ‘dummy’ vertices, one with a connection to every other vertex at a cost of 0, and the other connected to the starting room and the other dummy vertex at a cost of 0, as suggested by [30].
This solution will guarantee exploration of all rooms, but not necessarily all corridors (similar to the occupancy map algorithm, below). It is thus a lower bound for said algorithm, but of course it is not itself an exploration algorithm, since it must know the full map in advance.
Note that this problem is similar to the shortest Hamiltonian path problem, which attempts to find a path that visits each vertex on a map, but requires that each vertex only be visited once which may not be possible in many maps.
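The dummy-vertex construction can be illustrated with a small sketch. It assumes a precomputed symmetric matrix `dist` of room-to-room distances and substitutes a brute-force search over permutations for a real TSP solver, so it is only suitable for a handful of rooms; all function names are hypothetical.

```python
from itertools import permutations
import math

INF = math.inf


def add_dummy_vertices(dist, start):
    """Append dummy A (cost 0 to everything) and dummy B (cost 0 to start and A only)."""
    n = len(dist)
    aug = [list(row) + [0.0, INF] for row in dist]   # room -> A is free, room -> B is blocked
    aug[start][n + 1] = 0.0                          # start -> B is free
    aug.append([0.0] * (n + 2))                      # row for dummy A
    row_b = [INF] * n + [0.0, 0.0]                   # row for dummy B
    row_b[start] = 0.0
    aug.append(row_b)
    return aug


def cheapest_tour(matrix, start):
    """Brute-force closed tour (stand-in for a TSP solver)."""
    others = [v for v in range(len(matrix)) if v != start]
    best_cost, best_tour = INF, None
    for perm in permutations(others):
        tour = (start,) + perm
        cost = sum(matrix[a][b] for a, b in zip(tour, tour[1:] + (start,)))
        if cost < best_cost:
            best_cost, best_tour = cost, tour
    return best_cost, best_tour


def room_visit_order(dist, start):
    """Cheapest open path over all rooms that begins at `start`."""
    n = len(dist)
    cost, tour = cheapest_tour(add_dummy_vertices(dist, start), start)
    rooms = [v for v in tour if v < n]
    if tour[1] >= n:               # tour was found in the reverse direction
        rooms = [start] + rooms[:0:-1]
    return cost, rooms
```

Because dummy B can only sit between the start room and dummy A, any finite-cost tour closes through the two dummies for free, so its cost equals that of an open path beginning at the start room.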
Occupancy maps
With any exploration strategy, there are two key parts: the internal representation of the space to be explored, and how said representation is used in planning where to move next. Both components of our strategy will be described below, in addition to a detailed look at how diffusion, a concept from Damián Isla’s algorithm for searching for a moving target, is used as the engine that drives planning.
The main goal of the algorithm is to optimize exploration time by prioritizing visitation of areas most likely to confer benefit (rooms) while minimizing time spent in unhelpful areas (corridors). The combination of an occupancy map as representation with a frontier list and frontier evaluation function will allow for an identification of which frontiers are more likely to lead to helpful areas. As mentioned earlier, only the rooms (and not corridors) of a NetHack level contain food (necessary for survival) and other useful items, so minimizing corridor visitation (by ignoring certain frontiers) does not have any drawback with regard to food/item collection.
A key parameter of the algorithm is the probability threshold value. The threshold value controls in a general sense the cutoff for exploration in areas of lower benefit; a higher value will mark more frontiers as unhelpful and thus focus exploration on areas of higher benefit (giving a tradeoff between time and amount explored). This threshold can be fixed at the start of the algorithm, or in another formulation, it could vary depending on the percentage of map uncovered (ignoring more frontiers as more of the map gets uncovered).
Representation
To represent the map of a NetHack level we use something akin to an occupancy map, which will store information about the map as in robotics. However, there are a few key differences, since our goal is to have a data structure that helps us determine general areas that are beneficial to visit (i.e., locations of as yet undiscovered rooms in a NetHack map), not just locations of obstacles (walls).
In robotics, an occupancy map is used to mark areas that contain obstacles; here we use it to mark visited (open) areas. Each cell of the occupancy map contains a probability, like in robotics, but instead of representing a combination of sensor readings, here it is rather an estimate of how likely that cell/area is to contain an unexplored room. Thus, a cell probability of zero means there is no chance an unexplored room can be found in that cell; we thus assign zero probability to any already visited room/corridor cell. Specifically, whenever we observe a room/corridor, we add its coordinate(s) to our memory; at each timestep, we set the probability of each coordinate cell in our memory to 0 in the occupancy map. (These cell probabilities must be reset at each timestep since the diffusion step we run may alter them, as detailed below.) After setting a cell to 0 for the first time, we also renormalize all other cells in the grid to ensure the total probability sums to 1.
Figure 3 gives a visualization of a sample occupancy map, with darker areas corresponding to lower probabilities (e.g., visited rooms/diffused areas).
Diffusion
Diffusion of probabilities is a central concept in Isla’s algorithm, as mentioned earlier, and we here adapt it for two purposes: to elicit a gradient of probability that flows from visited areas into unknown areas, in order to better measure the utility of frontiers, as well as to separate the occupancy map into distinct components of high probability. We leave explanation of the latter purpose for a later section, here discussing the former, as well as describing how and when to run diffusion.
Diffusion affects the utility of a frontier. By dispersing the zero probability of visited rooms into surrounding areas, frontiers close to low probability areas can more easily be identified and ignored during exploration. This effect is desirable since these frontiers likely do not lead to as yet undiscovered rooms. These low utility frontiers are shown as red triangles in the occupancy map of Figure 3. In particular, a frontier is ignored when all of its neighbours have probability below the threshold value. For a more forgiving measure, the neighbours of neighbours could also be checked, or neighbours up to distance $d$ away.
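A minimal sketch of the frontier test follows, assuming a list-of-lists occupancy map and measuring the optional larger neighbourhood by Manhattan distance. The choice of metric, the `radius` parameter name, and the function names are assumptions for illustration.

```python
def cells_within(frontier, radius, rows, cols):
    """In-bounds cells at Manhattan distance 1..radius from the frontier."""
    fr, fc = frontier
    for r in range(fr - radius, fr + radius + 1):
        for c in range(fc - radius, fc + radius + 1):
            d = abs(r - fr) + abs(c - fc)
            if 0 < d <= radius and 0 <= r < rows and 0 <= c < cols:
                yield r, c


def is_low_utility(grid, frontier, threshold, radius=1):
    """True when every nearby cell has probability below the threshold."""
    rows, cols = len(grid), len(grid[0])
    return all(grid[r][c] < threshold
               for r, c in cells_within(frontier, radius, rows, cols))
```

With `radius=1` this is the strict test described above; a larger radius is the more forgiving variant, since a single high-probability cell further away is enough to keep the frontier.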
An example of how this diffusion is advantageous can be seen in the NetHack map of Figure 1. At the top of the map, there is a room in the centre that has an unopened door in its top-left corner. A few spaces past this wall, there are some observed corridors. When the occupancy map algorithm is run, the low probabilities from the visited corridors and room will diffuse towards each other, lowering the utility of the door frontier. This behaviour is desirable since there is no need to visit a door which has no chance to lead to an undiscovered room.
Diffusion is run at each timestep by imparting each cell with a fragment of the probabilities of its neighbouring cells, as given in the diffusion formula in section III. For extra diffusion, we also diffuse inward from the borders of the occupancy map. Specifically, when updating cells that lie on the borders, we treat their out-of-bounds neighbours as having a fixed low probability. Diffusing in this manner tends to increase separation of components of high probability (since rooms/corridors rarely extend to the edge of the map). More importantly, it lessens the utility of frontiers that lie near the borders, which are in fact most likely dead ends.
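One diffusion step with the border treatment can be sketched as below. The sketch assumes the update takes the form $(1 - \lambda) P(n) + (\lambda / 4) \sum P(n')$ over the four orthogonal neighbours, with out-of-bounds neighbours contributing a fixed value; the parameter names `lam` and `border_prob` and the default of 0 are illustrative, not values from the experiments.

```python
def diffuse(grid, lam, border_prob=0.0):
    """Return a new grid after one diffusion step over 4-neighbourhoods."""
    rows, cols = len(grid), len(grid[0])

    def value(r, c):
        # Out-of-bounds neighbours act as cells with a fixed low probability.
        if 0 <= r < rows and 0 <= c < cols:
            return grid[r][c]
        return border_prob

    return [
        [
            (1.0 - lam) * grid[r][c]
            + (lam / 4.0) * (value(r - 1, c) + value(r + 1, c)
                             + value(r, c - 1) + value(r, c + 1))
            for c in range(cols)
        ]
        for r in range(rows)
    ]
```

On a uniform grid, interior cells are unchanged by a step while edge and corner cells drop, which is the inward pull from the borders. Visited cells would then be reset to 0 after each step, as described in the representation section.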
Diffusion is only run at each timestep that a new part of the map (room/corridor) is observed. By diffusing only at these times, probabilities in the occupancy map will not change while we are travelling to a frontier through explored space (and neither will the length of distance travelled have an effect). Probabilities will diffuse at the rate that map spaces are uncovered, and stop when the map is completely known.
This scheduling is the opposite of the diffusion in Isla’s algorithm, which diffused when the search target was not observed to account for possible movements of the target. In our case, however, the ‘targets’ (unexplored rooms) are fixed.
Planning
With this representation in place, we now use the information it contains to select the most promising frontier to visit, while (as previously stated) ignoring frontiers that lie in areas of low probability. To do so, we need a global view of the areas of high utility in the map, in the form of collections of adjacent cells of high probability, or components. There are two basic parts to the process: identifying these components, and then evaluating them to find the most useful one. First we describe reasons for dealing with components instead of frontiers directly.
At any given time, there could be many frontiers: unvisited doors in rooms, unvisited spots in corridors, etc. Since we want to move to frontiers that have the highest probability of leading to an unvisited room, the utility of visiting any particular frontier should be in some way based on the amount of adjacent cells of high probability in the occupancy map. For each of these collections of cells, or components, there could be multiple adjacent frontiers, perhaps right next to each other, or bordering disparate sides of the component. To make computation easier and better elucidate differences in value between frontiers, we first determine these general components, evaluate them (based on utility and distance), then pick the frontier closest to the best component, instead of dealing with frontiers directly.
Components are retrieved by running a depth-first search (DFS) on the occupancy map, traversing any cell that has a probability value above the threshold. To further increase separation of components, we do not visit cells that have fewer than a certain number of traversable neighbours, which helps to deal with narrow alleys of high probability cells that could otherwise connect two disparate components.
Some components are ignored due to small size or absence of neighbouring frontiers. If a component is smaller than the minimum size of a NetHack room, it is impossible for a room to be there. Likewise, if a component has no neighbouring frontiers, it cannot contain a room since there is no access point (unless secret doors/corridors are enabled, as discussed later). Pseudocode for finding the components in the occupancy map is shown in Figure 5.
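A possible rendering of the component search in Python is given below. It interprets a "traversable" neighbour as an in-bounds cell above the threshold, counts a frontier as neighbouring if it lies inside or orthogonally adjacent to the component, and uses placeholder defaults for `min_neighbours` and `min_size`; these interpretations and values are assumptions rather than details stated in the text.

```python
def neighbours4(cell):
    r, c = cell
    return [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]


def find_components(grid, threshold, frontiers, min_neighbours=2, min_size=4):
    """Group high-probability cells into components, then filter them."""
    rows, cols = len(grid), len(grid[0])

    def is_high(cell):
        r, c = cell
        return 0 <= r < rows and 0 <= c < cols and grid[r][c] > threshold

    def is_open(cell):
        # Skip cells in narrow alleys: too few high-probability neighbours.
        return is_high(cell) and sum(is_high(n) for n in neighbours4(cell)) >= min_neighbours

    seen, components = set(), []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen or not is_open((r, c)):
                continue
            component, stack = set(), [(r, c)]
            seen.add((r, c))
            while stack:                      # iterative depth-first search
                cell = stack.pop()
                component.add(cell)
                for n in neighbours4(cell):
                    if n not in seen and is_open(n):
                        seen.add(n)
                        stack.append(n)
            touches_frontier = any(n in frontiers
                                   for cell in component
                                   for n in [cell] + neighbours4(cell))
            if len(component) >= min_size and touches_frontier:
                components.append(component)
    return components
```

When secret doors are enabled, the `touches_frontier` filter is the part that would be relaxed, since components without frontiers then become candidates for hidden rooms.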
The visualization of a sample occupancy map in Figure 3 gives an idea of this process, with three components highlighted in different colours using a crisscross pattern. Each of the three is cut off from the others because the neighbouring rooms have diffused towards the edges of the map (and the border has diffused towards them). Meanwhile, the unmarked component in the upper right is ignored since it has no neighbouring frontiers.
The list of remaining components is then passed through an evaluation function to determine which one best maximizes a combination of utility and distance values. Utility is calculated by summing the probabilities of all cells in the component. (The sum is then normalized by dividing by the sum of all probabilities in the map.) To determine distance to player, the component is first matched to the closest frontier on the open frontiers list (by calculating the Manhattan distance from each frontier to the closest cell in the component). Distance from component to player is then calculated as $d(\text{player}, \text{frontier}) + d(\text{frontier}, \text{component})$, with the first half calculated using A*, and the second half using Manhattan distance (since that part of the path goes through unknown space). This distance is then normalized by dividing by the sum of the distances for all frontiers for the specific component under evaluation. With the normalized utility and distance values, we pick the component that maximizes $\text{norm\_prob} + \alpha \cdot (1 - \text{norm\_dist})$, where $\alpha$ controls the balance of the two criteria.
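The evaluation step can be sketched as below. The sketch assumes a score of the form utility plus `alpha` times one minus normalized distance, takes the A* distance from the player to each frontier as a caller-supplied function `path_length`, and reads the normalization literally as the matched frontier's distance divided by the sum over all frontiers. Each of these choices, and every name in the code, is an assumption made for illustration.

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def gap(frontier, component):
    """Manhattan distance from a frontier to the closest cell of a component."""
    return min(manhattan(frontier, cell) for cell in component)


def evaluate(component, grid, frontiers, path_length, alpha):
    """Return (score, matched frontier) for one component."""
    total = sum(sum(row) for row in grid)
    utility = sum(grid[r][c] for r, c in component) / total
    matched = min(frontiers, key=lambda f: gap(f, component))
    # Known part of the route via path_length, unknown part via Manhattan distance.
    dists = {f: path_length(f) + gap(f, component) for f in frontiers}
    norm_dist = dists[matched] / sum(dists.values())
    return utility + alpha * (1.0 - norm_dist), matched


def pick_component(components, grid, frontiers, path_length, alpha):
    """Choose the best-scoring component and the frontier to walk to."""
    best = max(components,
               key=lambda comp: evaluate(comp, grid, frontiers, path_length, alpha)[0])
    return best, evaluate(best, grid, frontiers, path_length, alpha)[1]
```

In this form a small `alpha` favours large, high-probability components even when they are far away, while a large `alpha` favours whatever is nearest.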
Once the best component is determined, the algorithm moves to the frontier matched to that component. On arrival, it will learn new information about the game map, update the occupancy map, and run diffusion. Components will then be reevaluated and a new frontier chosen. Exploration terminates when no interesting frontiers remain. Pseudocode for the main planning loop of the exploration approach is presented in Figure 4.
Greedy algorithm for secret rooms
A trivial adaptation can be made to the basic greedy algorithm in order to support searching for secret areas. When entering a room, before proceeding to the next frontier, each wall of the room is searched for secret doors for a certain number of turns. Searches are also performed in dead-end corridors. If a secret door/corridor is discovered upon searching, it is added to the frontier list as usual. Exploration ends when no frontiers or search targets remain.
For efficiency, searching for doors in a room is done by first choosing the unsearched wall closest to the player, then selecting a spot next to the wall that is adjacent to the most walls still needing to be searched (since searching can be performed diagonally).
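The spot selection just described might look like the following sketch. The names are hypothetical, and two details are assumptions: closeness to the player is measured with Chebyshev distance (since diagonal moves are allowed) rather than a full path search, and ties are broken by coordinate order for determinism.

```python
from typing import Optional, Set, Tuple

Cell = Tuple[int, int]


def chebyshev(a: Cell, b: Cell) -> int:
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def adjacent(a: Cell, b: Cell) -> bool:
    """True for distinct cells that touch, including diagonally."""
    return chebyshev(a, b) == 1


def choose_search_spot(player: Cell, floor: Set[Cell],
                       unsearched_walls: Set[Cell]) -> Optional[Cell]:
    """Pick a floor cell next to the nearest unsearched wall that covers the most walls."""
    if not unsearched_walls:
        return None
    # Step 1: the unsearched wall closest to the player.
    target = min(unsearched_walls, key=lambda w: (chebyshev(player, w), w))
    # Step 2: among floor cells touching that wall, maximize walls covered per search.
    candidates = [f for f in sorted(floor) if adjacent(f, target)]
    if not candidates:
        return None

    def coverage(f: Cell) -> int:
        return sum(adjacent(f, w) for w in unsearched_walls)

    return max(candidates, key=lambda f: (coverage(f), -chebyshev(player, f)))
```

For a 3x3 room whose top wall is unsearched, the spot under the middle of the wall is preferred over the one under its end, because a single search there covers three wall cells instead of two.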
Note that this approach will not be capable of finding all secret corridors in a level, since they may (rarely) appear in regular (not dead-end) corridors. However, searching all corridors would be too costly for this edge case. The occupancy map approach described below also ignores these rare secret corridors.
Occupancy maps for secret rooms
The occupancy map algorithm has a natural extension to support the discovery of secret door and corridor spots. In the original case, components of high probability in the occupancy map with no neighbouring frontiers would be ignored, but here, these components are precisely those that we would like to investigate for potential hidden rooms. Below we detail the adjustments necessary for this extension.
The first modification relates to the component evaluation function. Since these ‘hidden’ components have by definition no bordering frontiers, the distance from player to frontier and frontier to component used in the evaluation must be adjusted. Instead of using a frontier to calculate distance, we will choose a particular room wall or dead-end corridor adjacent to the hidden component, and calculate distance using that.
The selection of such a room wall or dead-end corridor for a hidden component requires its own evaluation function. This function will likewise consider both utility and distance. Utility is given by the number of searches already performed at that spot. Distance is taken as the length from the spot to the player plus the length from the spot to the closest component cell. Distance to player is calculated using A*, and distance to closest cell by Manhattan distance. Walls whose distance from the component exceeds a certain maximum will be ignored. Both distance and search count are normalized, the former by dividing by the sum of distances for all walls, and the latter by dividing by the sum of search counts for all walls. We then pick the spot that minimizes a weighted combination of normalized distance and normalized search count, where the wall distance factor is the parameter that controls the balance of the two criteria. (The value is minimized in order to penalize larger distance and higher search counts.)
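A sketch of this wall selection is given below. The precise weighting is not reproduced here, so the score `sigma * norm_dist + (1 - sigma) * norm_count` is an assumed form, and `Wall`, `select_wall`, `sigma` and `max_wall_distance` are illustrative names; `path_len` stands in for a precomputed A* distance to the player.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Cell = Tuple[int, int]


@dataclass(frozen=True)
class Wall:
    pos: Cell
    path_len: int   # assumed precomputed A* length from the player to this spot
    searches: int   # searches already performed at this spot


def manhattan(a: Cell, b: Cell) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def select_wall(walls: List[Wall], component: Set[Cell],
                sigma: float, max_wall_distance: int) -> Optional[Wall]:
    """Pick the wall or dead-end spot with the lowest combined distance/search score."""
    candidates = []
    for wall in walls:
        to_component = min(manhattan(wall.pos, cell) for cell in component)
        if to_component > max_wall_distance:
            continue  # too far from the hidden component to be worth searching
        candidates.append((wall, wall.path_len + to_component))
    if not candidates:
        return None
    dist_sum = sum(d for _, d in candidates)
    count_sum = sum(w.searches for w, _ in candidates)

    def score(item: Tuple[Wall, int]) -> float:
        wall, dist = item
        norm_dist = dist / dist_sum if dist_sum else 0.0
        norm_count = wall.searches / count_sum if count_sum else 0.0
        # ASSUMED weighting: sigma = 1 considers distance only.
        return sigma * norm_dist + (1 - sigma) * norm_count

    return min(candidates, key=score)[0]
```

Lowering `sigma` shifts the choice away from nearby but heavily searched walls and towards walls that have not yet been searched.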
The selected wall/corridor spot is used in place of a frontier in component evaluation which proceeds as described earlier. If after evaluation a hidden component is selected, then we will move to the closest traversable spot adjacent to the component’s associated wall/corridor spot. In case of ties in closest distance, the spot adjacent to the most walls will be chosen to break the tie, since searches performed at a position will search all adjacent spots (including diagonally).
When the player reaches the search location, the algorithm will use the search action for a certain number of turns (a parameterized value), before reevaluating all components and potentially choosing a new target. If a secret door or corridor spot is discovered while searching, it is added to the open frontier list and its probability in the occupancy map is reset to the default value. (Diffusion is then run throughout the map since new information has been revealed.)
It is possible for a hidden component to not contain a secret area. Thus, if a wall or dead-end corridor surpasses a certain number of searches (a parameterized value) with no hidden spots being revealed, it will no longer be considered as a viable search target.
Exploration terminates when no components are left, or only hidden components remain and none have searchable walls below the maximum search parameter.
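The termination condition can be expressed compactly, as in the sketch below. The `Component` record and its fields are hypothetical, and treating a wall as still searchable while its count is strictly below `max_searches` is an assumption about how the boundary case is handled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    has_frontier: bool                      # False for a 'hidden' component
    wall_search_counts: List[int] = field(default_factory=list)  # nearby walls only


def exploration_finished(components: List[Component], max_searches: int) -> bool:
    """True when nothing is left to visit and no wall is still worth searching."""
    for comp in components:
        if comp.has_frontier:
            return False  # a regular component can still be reached via a frontier
        if any(count < max_searches for count in comp.wall_search_counts):
            return False  # a hidden component still has a searchable wall
    return True
```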
Figure 6 presents a visualization of a sample occupancy map with secret doors/corridors enabled and corresponding NetHack map. The component on the left side (marked with a grid pattern) has no neighbouring frontiers and is thus considered a hidden component; nearby walls that will be considered for searching during evaluation are marked with blue squares. (In this case, a low maximum wall distance is used, preventing walls in the lower room from being selected for evaluation.)
V Experimental Results
Results will be shown below for the greedy and occupancy map algorithms as a function of the exhaustive nature of their searching, followed by results for the algorithms that can search for secret areas. We will look first at the metrics to be used for comparison of the algorithms.
Exploration metrics
To evaluate the presented exploration strategies, we use as metrics the average number of actions per game (lower is better) as well as average percentage of rooms explored, taken over a number of test runs on randomized NetHack maps. As will be seen below, the presented algorithms tend to do quite well on these metrics. Thus, to get a more fine-grained view of map exploration which penalizes non-exhaustive exploration, we also use a third metric which we call the ‘exhaustive’ metric. This metric counts only the runs that explored all rooms on a map, with runs that did not counted as zero. We do not use amount of food collected as a metric since food is usually uniformly randomly distributed amongst map rooms, and so is highly correlated with the percentage of rooms explored.
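To make the three metrics concrete, the sketch below computes them from per-run records. The record layout `(actions, rooms_explored, rooms_total)` and the function name are assumptions for illustration; the ‘exhaustive’ metric is read here as the percentage of runs that explored every room.

```python
from typing import Dict, List, Tuple

Run = Tuple[int, int, int]  # (actions taken, rooms explored, rooms on the map)


def summarize_runs(runs: List[Run]) -> Dict[str, float]:
    """Average actions, average % of rooms explored, and the 'exhaustive' metric."""
    n = len(runs)
    avg_actions = sum(actions for actions, _, _ in runs) / n
    avg_rooms = 100.0 * sum(explored / total for _, explored, total in runs) / n
    # A run contributes 1 only if every room was explored, otherwise 0.
    exhaustive = 100.0 * sum(1 for _, explored, total in runs if explored == total) / n
    return {"avg_actions": avg_actions, "avg_rooms_pct": avg_rooms,
            "exhaustive_pct": exhaustive}
```

A run that misses a single room still contributes most of its value to the average percentage of rooms explored, but contributes nothing to the ‘exhaustive’ metric.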
For algorithms that support detection of secret areas, two further metrics are used: the average percentage of secret doors and corridors found, and the average percentage of ‘secret rooms’ found. Neither of these metrics is ideal, however, and it is important to understand limitations in evaluating secret room discovery.
The average percentage of secret doors/corridors found is problematic since it does not correlate well with actual benefit – only a handful of secret spots will lead to undiscovered rooms and so be worth searching for. Further, it is biased towards the greedy algorithm, since that algorithm will search all walls, and so have a higher chance to discover more secret doors than the occupancy map algorithm, which will only search areas selected by its evaluation function.
The average percentage of ‘secret rooms’ found is also problematic, due to the ambiguous classification of secret rooms. One of the possible ways to define secret rooms in the NetHack context is to classify them as any room not directly reachable from the player’s initial position in the level. In this case, the metric would be dependent on the individual level configuration: a map could exist such that the player actually starts in a ‘secret’ room, separated from the rest of the map by a hidden door, and the algorithm would only have to find that spot in order to get a full score for this metric.
Further, while almost all maps tend to contain secret doors or corridors, only approximately half of all maps contain secret rooms as defined above (in the other half, any secret doors/corridors that exist lead nowhere useful). This discrepancy also skews the secret room metric since maps containing no secret rooms will still get a full score using that metric.
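The skew described above is easy to see in a small sketch of the secret room metric. The function names and the `(found, total)` record layout are assumptions; the only behaviour taken from the text is that a map with no secret rooms receives a full score.

```python
from typing import List, Tuple


def secret_room_score(found: int, total: int) -> float:
    """Percentage of secret rooms found; maps without secret rooms score 100."""
    return 100.0 if total == 0 else 100.0 * found / total


def average_secret_room_score(runs: List[Tuple[int, int]]) -> float:
    """Average the per-map score over (found, total) pairs."""
    return sum(secret_room_score(found, total) for found, total in runs) / len(runs)
```

For example, an algorithm that never searches scores 50% on a set where half of the maps contain no secret rooms, even though it has discovered nothing.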
Exhaustive approaches
Figure 7 presents results for the exhaustive exploration approaches (those that explore all rooms on a map). Each result is an average over 200 runs on different randomly generated NetHack maps. The greedy algorithm comes in at around 324 average actions per game, while the average for the fastest occupancy map model (with parameters that gave complete exploration on 99.5% of all runs) is 292 actions.
The greedy algorithm by nature explores all corridors, while the occupancy map algorithm limits exploration to areas likely to contain new rooms. The greedy algorithm is also a bit more reliable for complete room discovery than the occupancy map algorithm: we cited in the figure the occupancy map model that discovered all rooms in 99.5% of runs, meaning a small number of runs failed to discover all rooms on the map (maybe missing one or two rooms in those cases).
In the same figure we present the result for the approximately optimal solution for room visitation, which visits all rooms in 122 actions on average. This approach can only be applied to a fully known map, and so does not lend itself to exploration, but is instructive as a lower bound. The large discrepancy between this result and the other two algorithms is the result of this algorithm knowing where all the rooms are; the true exploration approaches can make mistakes in guessing, perhaps wandering down a corridor that seems likely to lead to a room but instead terminates in a dead end.
Non-exhaustive approaches
Exhaustive approaches are fine in certain circumstances, but it is often acceptable to occasionally leave one or two rooms on a map unexplored, especially when there is a cost to movement. Figure 8 gives the results for the best-performing non-exhaustive occupancy map models in terms of actions taken vs. percentage of rooms explored. Each model (represented by blue dots) represents an average over 200 runs using a unique combination of model parameters. (A grid search over the parameter space was performed – the models shown lie on the upper-left curve of all models.)
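One way to extract such an upper-left curve from grid search results is to keep only the models that no other model beats on both metrics. The sketch below does this under the assumption that the curve corresponds to this non-dominated (Pareto) set; the function name and the `(avg_actions, pct_rooms)` layout are illustrative.

```python
from typing import List, Tuple

Model = Tuple[float, float]  # (average actions, average % of rooms explored)


def upper_left_frontier(models: List[Model]) -> List[Model]:
    """Keep models for which no other model is both cheaper and more thorough."""
    # Sort by fewest actions first; for equal actions, highest exploration first.
    ordered = sorted(set(models), key=lambda m: (m[0], -m[1]))
    frontier: List[Model] = []
    best_explored = float("-inf")
    for actions, explored in ordered:
        # A model survives only if it explores more than every cheaper model.
        if explored > best_explored:
            frontier.append((actions, explored))
            best_explored = explored
    return frontier
```

The result is ordered by increasing action count, so it can be plotted directly as the trade-off curve between cost and coverage.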
As seen in the figure, there is a mostly linear progression in terms of the two metrics. The relationship between the ‘exhaustive’ metric and total percentage of explored rooms is also consistent, with both linearly increasing.
The figure also shows that by sacrificing at most 10% of room discovery on average, the average number of actions taken can be decreased to 200, compared to the 292 average actions of the exhaustive (99.5%) approach.
To determine the importance of the various parameters of the occupancy map algorithm, a linear regression was performed. Parameter coefficients for average actions and percentage of rooms explored under the ‘exhaustive’ metric are shown in Figure 9. R-squared values for the regression were 0.742/0.693 (for average actions and room exploration) on test data. Running a random forest regressor on the same data gave the same general importances for each parameter with higher R-squared values of 0.993/0.993, but those importances are not presented here since they do not indicate the direction of the correlation.
The coefficients indicate that parameters directly associated with probabilities in the occupancy map have the greatest effect on average actions and percentage of rooms explored. These parameters include the diffusion factor (how much to diffuse to neighbours), border diffusion factor (how much to diffuse from outer borders), probability threshold (at what probability to ignore frontiers, etc.), and whether to vary the threshold as more of the map is explored. The border diffusion is probably important due to the small (80x20) map size; on larger maps, it is less likely that this parameter would have such an impact.
Meanwhile, parameters that influence component size and choice, like distance factor (importance of distance in component evaluation) and minimum number of neighbours for a cell to be visited by DFS (which separates components connected by small alleys) did not seem to have a pronounced effect on the metric values. This finding may suggest that the location of frontiers, and ignoring ones that lie in areas of low probability, has more of an impact than the separation of components.
The specific parameter values that led to the fastest performing exhaustive exploration model (presented in Figure 7) were as follows: diffusion factor of 1, distance importance of 0.75, border diffusion of 0.75, minimum room size of 7, DFS min. neighbours of 4, probability threshold of 0.15, vary threshold set to false, and frontier radius of 0. The parameters for the fastest model at 80% non-exhaustive exploration (the full map being explored about 30% of the time) using 167 actions on average (as shown in Figure 8) were: diffusion factor of 0.75, distance importance of 0.25, border diffusion of 0.5 (smaller values diffuse more), minimum room size of 7, DFS min. neighbours of 8, probability threshold of 0.5, vary threshold set to false, and frontier radius of 0.
One parameter, the minimum component size, had very little effect on results. Small component sizes will have low summed probability, giving the component a low evaluation score; diffusion will then eliminate it after a certain time, so removing it beforehand is unnecessary, except in terms of decreasing computation time (which is why it was introduced).
Secret rooms
Greedy algorithm for secret rooms
Figure 10 shows the results for the greedy algorithm with support for secret detection in terms of average actions versus exploration. Different colours represent the different settings for the number of searches per wall parameter (the number of times the algorithm will search a particular wall/corridor before moving on). Both the average percentage of secret rooms found and average percentage of secret doors and corridors found are displayed.
As expected, both metrics increase as the number of searches per wall increases, plateauing at around 95% discovery of both secret rooms and secret doors/corridors at around 2250 average actions per game. As mentioned earlier, the algorithm will only search for secret corridors in dead ends, so the 5% of hidden spots not found is most probably from secret corridors occurring (rarely) in other locations.
Another observation is that when the number of searches per wall is set to 0, the algorithm is reduced to the regular greedy algorithm, with no secret doors/corridors being found (since no searching is performed). The approximately 50% score for the secret rooms metric is due to the fact that, in that percentage of runs, there were no secret rooms at all, thus giving 100% exploration as mentioned in the metrics discussion.
Occupancy maps for secret rooms
Figure 11 gives the results for the best-performing secret-detecting occupancy map models in terms of best time versus highest secret room exploration. Each model represents an average over 200 runs using a unique combination of model parameters. (A grid search over the parameter space was performed; the models shown lie on the upper-left curve of all models.)
Results here are much better than those of the greedy algorithm, with approximately 90% secret room exploration at around 500 actions. The reason for the discrepancy between this result and the greedy algorithm (over 1600 actions for 90%) is that the occupancy map model has global knowledge of the map and can target particular walls for searching, in contrast to the greedy algorithm which searches every wall.
This global knowledge also explains the much lower percentage of secret doors/corridors discovered using this algorithm (20% for the model exploring 90% of secret rooms) compared to the greedy algorithm (80% for the model exploring the same percentage of secret rooms). This result is expected since exploration of secret doors/corridors only weakly correlates with secret room exploration (only a few secret doors/corridors will actually lead to otherwise inaccessible rooms).
Importances of the parameters for the secret-detecting occupancy map algorithm are shown in Figure 12. These importances were calculated by running a random forest regressor on the model results. R-squared value for the average actions coefficient was 0.864 on the test data, while for the secret room exploration coefficient, the value was much lower at 0.334, suggesting that some parameters are not linearly independent in relation to that variable.
The importances show that the three diffusion-related parameters (diffusion factor, border diffusion and probability threshold) continue to have a large impact on the average actions and secret room exploration metrics. In addition, other factors that did not have any importance in the earlier occupancy map algorithm have a significant impact here, particularly the minimum neighbours for DFS. This parameter affects the separation of components, suggesting that the use of components for this algorithm matters more than in the earlier case.
Parameters exclusive to this algorithm also had somewhat of an effect on the dependent variables, including the wall distance factor (importance of distance in the choice of walls to search for a hidden component) and maximum wall distance (maximum distance between a wall and a hidden component before it is removed from consideration for searching).
VI Conclusions and Future Work
Automated exploration is an interesting, surprisingly complex task. In strategy or roguelike games, the tedium of repetitive movement during exploration is a concern for players, and offering efficient automation can be helpful. Exploration is also a significant subproblem in developing more fully automated, learning AI, and techniques which can algorithmically solve exploration can be useful in allowing further automation to focus on applying AI to higher level strategy rather than basic movement concerns.
In this work we detailed an algorithm for efficient exploration of an initially unknown environment. Inspired by the occupancy map algorithm by Damián Isla for tracking a moving target, we built an occupancy map approach to select frontiers to visit when performing exploration of interesting areas of a map, while also considering complete coverage. Our design notably improves over a more straightforward, greedy design, particularly in the presence of secret areas, where exploration cost versus benefit is especially important.
Our further work on the occupancy map algorithm aims at increasing efficiency in exploration. In particular, a ‘local’ diffusion of probabilities (within a radius of the player position) instead of the current global diffusion may prove fruitful to explore. Further verification of the algorithm on other video games with different map configurations would also be interesting.
Acknowledgements
This work was supported by NSERC grant 249902.
References
 [1] K. Sutherland, “Playing roguelikes when you can’t see — Rock, Paper, Shotgun,” https://www.rockpapershotgun.com/2017/04/05/playingroguelikeswhenyoucantsee, 2017.
 [2] M. Chowdhury and C. Verbrugge, “Exhaustive exploration strategies for NPCs,” in Proceedings of the 1st International Joint Conference of DiGRA and FDG: 7th Workshop on Procedural Content Generation, August 2016.
 [3] D. Isla, “Third Eye Crime: Building a stealth game around occupancy maps,” in Proceedings of the 9th Annual AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, 2013.
 [4] J. Campbell and C. Verbrugge, “Exploration in NetHack using occupancy maps,” in Proceedings of the 12th International Conference on Foundations of Digital Games, August 2017.
 [5] M. Juliá, A. Gil, and O. Reinoso, “A comparison of path planning strategies for autonomous exploration and mapping of unknown environments,” Autonomous Robots, vol. 33, no. 4, pp. 427–444, 2012.
 [6] S. M. LaValle, Planning Algorithms. Cambridge, U.K.: Cambridge University Press, 2006, available at http://planning.cs.uiuc.edu/.
 [7] S. Thrun, “Robotic mapping: A survey,” in Exploring Artificial Intelligence in the New Millennium, G. Lakemeyer and B. Nebel, Eds. Morgan Kaufmann, 2002, pp. 1–35.
 [8] H. Moravec and A. E. Elfes, “High resolution maps from wide angle sonar,” in Proceedings of the IEEE International Conference on Robotics and Automation, March 1985, pp. 116–121.
 [9] H. Moravec, “Sensor fusion in certainty grids for mobile robots,” AI Magazine, vol. 9, no. 2, pp. 61–74, Jul 1988.
 [10] B. Yamauchi, “A frontier-based approach for autonomous exploration,” in Proceedings of the IEEE International Symposium on Computational Intelligence in Robotics and Automation, July 1997, pp. 146–151.
 [11] H. H. González-Baños and J.-C. Latombe, “Navigation strategies for exploring indoor environments,” International Journal of Robotics Research, vol. 21, no. 10-11, pp. 829–848, October 2002.
 [12] F. Amigoni, “Experimental evaluation of some exploration strategies for mobile robots,” in Proceedings of the IEEE International Conference on Robotics and Automation, May 2008, pp. 2818–2823.
 [13] A. Xu, C. Viriyasuthee, and I. Rekleitis, “Efficient complete coverage of a known arbitrary environment with applications to aerial operations,” Autonomous Robots, vol. 36, no. 4, pp. 365–381, April 2014.
 [14] H. Choset, “Coverage for robotics — a survey of recent results,” Annals of Mathematics and Artificial Intelligence, vol. 31, no. 1-4, pp. 113–126, May 2001.
 [15] B. Kalyanasundaram and K. R. Pruhs, “Constructing competitive tours from local information,” Theoretical Computer Science, vol. 130, no. 1, pp. 125–138, Aug 1994.
 [16] S. Koenig, C. Tovey, and W. Halliburton, “Greedy mapping of terrain,” in Proceedings of the IEEE International Conference on Robotics and Automation, vol. 4, February 2001, pp. 3594–3599.
 [17] C. Tovey and S. Koenig, “Improved analysis of greedy mapping,” in International Conference on Intelligent Robots and Systems, vol. 4, October 2003, pp. 3251–3257.
 [18] J. Y.-J. Hsu and L.-S. Hwang, “A graph-based exploration strategy of indoor environments by an autonomous mobile robot,” in Proceedings of the IEEE International Conference on Robotics and Automation, vol. 2, May 1998, pp. 1262–1268.
 [19] J. A. Baier, A. Botea, D. Harabor, and C. Hernández, “Fast algorithm for catching a prey quickly in known and partially known game maps,” IEEE Transactions on Computational Intelligence and AI in Games, vol. 7, no. 2, pp. 193–199, June 2015.
 [20] J. Hagelbäck and S. J. Johansson, “Dealing with fog of war in a Real Time Strategy game environment,” in IEEE Symposium on Computational Intelligence and Games, December 2008, pp. 55–62.
 [21] D. Isla, “Probabilistic target-tracking and search using occupancy maps,” in AI Game Programming Wisdom 3. Charles River Media, 2005.
 [22] NetHack Wiki, “Standard strategy — NetHack wiki,” https://nethackwiki.com/wiki/Standard_strategy, 2016.
 [23] ——, “Comestible — NetHack wiki,” https://nethackwiki.com/wiki/Comestible#Food_strategy, 2016.
 [24] ——, “Starvation — NetHack wiki,” https://nethackwiki.com/wiki/Starvation, 2015.
 [25] K. Hullett and J. Whitehead, “Design patterns in FPS levels,” in Proceedings of the Fifth International Conference on the Foundations of Digital Games, 2010, pp. 78–85.
 [26] M. J. Gaydos and K. D. Squire, “Role playing games for scientific citizenship,” Cultural Studies of Science Education, vol. 7, no. 4, pp. 821–844, 2012.
 [27] J. Krajicek, “BotHack - a NetHack bot framework,” https://github.com/krajj7/BotHack, 2015.
 [28] ——, “Framework for the implementation of bots for the game NetHack,” Master’s thesis, Charles University in Prague, 2015. [Online]. Available: https://is.cuni.cz/webapps/zzp/detail/151037/?lang=en
 [29] NetHack Dev Team, “NetHack 3.6.0: Download the source,” http://www.nethack.org/v360/downloadsrc.html, 1987–2015.
 [30] E. L. Lawler, J. K. Lenstra, A. H. G. R. Kan, and D. B. Shmoys, Eds., The Traveling Salesman Problem: A Guided Tour of Combinatorial Optimization. New York: Wiley, 1986.