Safe and Complete RealTime Planning and Exploration
in Unknown Environments
Abstract
We present a new framework for motion planning that wraps around existing kinodynamic planners and guarantees recursive feasibility when operating in a priori unknown, static environments. Our approach makes strong guarantees about overall safety and collision avoidance by utilizing a robust controller derived from reachability analysis. We ensure that motion plans never exit the safe backward reachable set of the initial state, while safely exploring the space. This preserves the safety of the initial state, and guarantees that that we will eventually find the goal if it is possible to do so while exploring safely. We implement our framework in the Robot Operating System (ROS) software environment and demonstrate it in a realtime simulation.
in: \AtEveryBibitem\clearlistlanguage\clearfieldpages
The first two authors contributed equally to this work.
I Introduction
Motion planning is a foundational problem in mobile robotics, and the community has devoted significant effort to building theoretical and practical tools for a wide variety of applications. Traditionally, the output of a motion planner is a desired plan or trajectory for a dynamical system model. This trajectory is then tracked by one or more layers of lowlevel controllers. Since the real, physical vehicle may follow higherorder, more complex dynamics than those used during planning, the trajectory it actually follows will not coincide with that which was planned. This presents a problem for planners which aim to provide collisionavoidance guarantees.
Recently, the FaSTrack framework [herbert2017fastrack] provides a mechanism for quantifying the maximum tracking error between a highorder dynamical model of the physical system and a lowerorder model used for planning. This analysis can be done offline, using a reachability computation, and supplied to a realtime motion planner for online collisionchecking. Other, similarly motivated work, e.g. [Singh2017], also seeks to quantify this maximum tracking error.
Still, a key challenge remains: in a priori unknown environments where obstacles are sensed online, it can be difficult to guarantee recursive feasibility. Informally, a planning algorithm is recursively feasible if it explores the environment safely and without losing its ability to reach the goal. The dangers of unsafe exploration are illustrated in Fig. 1 (left), in which a nonrecursively feasible planner enters a dead end which the system cannot exit. Most motion planners bypass these issues, for example by assuming full prior knowledge of the environment or by assuming that it is safe to stop and possible to do so instantaneously. While such techniques are effective in many scenarios, there are important applications and systems for which safe exploration really does matter, e.g. a fixedwing aircraft operating with limited visibility. More generally, it is important to consider recursive feasibility for systems such as unicycles, bicycles, and cars that have inertia and function at relatively high speeds. These issues are especially pronounced for nonholonomic systems.
We propose building a graph of forwardreachable states (for a given dynamical system planning model) within known free space, while simultaneously identifying those states from which the initial state is reachable. This graph implicitly represents a discrete underapproximation of the backwardreachable set of the initial state. We guarantee the safety of the physical system, as modeled by a higherorder dynamical model, using a robust control scheme [herbert2017fastrack]. Our framework, illustrated in Fig. 1, ensures:

Safety: all trajectories initiated by the physical system will be robustly collisionfree.

Liveness: if the goal is safely reachable from the initial state, it will always be safely reachable.

Completeness: if a goal was originally reachable by a plan that preserves the ability to return home, our framework guarantees that it will eventually be found.
Ii Related Work
Though we defer a formal definition until Section IIIB, ultimately, a motion planner is recursively feasible if it can explore unknown space while always remaining safe. There is an extensive body of literature in motion planning and safe exploration, which we cannot hope to fully summarize here. Rather, we identify two main categories of related work and discuss several of the most closely related approaches.
Iia Safe Exploration
Exploration has been studied within the context of Markov Decision Processes (MDPs) and Reinforcement Learning (RL). Moldovan and Abbeel [moldovan2012safe] propose an approach for generating a sequence of actions which preserve ergodicity with high probability. Other similar approaches, e.g. [achiam2017constrained, moldovan2012risk, berkenkamp2017safe], also design control policies which are riskaware or satisfy approximate constraints. Interestingly, Berkenkamp et. al. [berkenkamp2017safe] define safety in terms of Lyapunov stability. Though generally desirable, stability is insufficient to guarantee collision avoidance; in this work, we use a stronger setbased definition of safety. Indeed, our formulation of safe exploration is closely related with that of [akametalu2014reachability, fisac2018general], which characterize safety with reachable sets.
IiB Safe motion planning
Recent methods such as [herbert2017fastrack, Majumdar2017, Singh2017, fridovich2018planning] provide a variety of mechanisms for robust motion planning. Here, robustness is characterized in terms of an envelope around planned trajectories which the physical system is guaranteed to remain within. Our work relies upon this idea, building upon [herbert2017fastrack].
However, robust planning does not automatically guarantee recursive feasibility. Richards and How [richards2003model] and Rosolia and Borrelli [rosolia2018learning] directly address this problem within a model predictive control framework. The major differences between these works and our own are that [richards2003model] assumes linear timeinvariant system dynamics, while [rosolia2018learning] addresses an iterative task. Moreover, both assume a priori knowledge of all obstacles. Schouwenaars et. al. [schouwenaars2004decentralized] also plan in a receding horizon, but as in our work, recursive safety (though not liveness) is guaranteed by ensuring that all planned trajectories terminate in a safe loiter pattern.
Our work may also be viewed as a generalization of graphbased kinodynamic planners, e.g. the probabilistic roadmap [svestka1996probabilistic]. Importantly, our framework guarantees recursive feasibility in an a priori unknown environment, with potentially highorder system dynamics, and in the presence of environmental disturbances.
Iii Problem Formulation
Iiia Preliminaries
We consider an autonomous navigation task in a bounded a priori unknown static environment . The autonomous system has dynamic state , which includes, but is in general not limited to its location in the environment . We presume that for each point , the environment representation can assign a label . The system’s knowledge of the environment will be updated online according to measurements from a wellcharacterized sensor, with field of view . In this work, we will restrict our attention to deterministic sensing models, i.e. if is within the sensor’s field of view , it will be correctly identified as either . Probabilistic extensions are possible, though beyond the scope of this paper.
We assume known system dynamics, of the form:
where is the system’s control input and is a bounded disturbance.
In general, the dynamical model of the physical system will be nonlinear and highorder, making it challenging to compute trajectories in real time. Instead, we can use an approximate, lowerorder dynamical model for realtime trajectory computation, along with a framework which produces a known tracking controller for the fullorder model allowing it to follow the trajectories of the loworder model with a guarantee on accuracy. Let the simplified state of the system for planning purposes be , governed by approximate planning dynamics:
with the control input of the simplified system, which we will refer to as the planning system.
We use the FaSTrack framework [herbert2017fastrack] to provide a robust controlled invariant set in the relative state space between the planning reference and the full system. This relative state depends on the dynamical models used. A concrete example will be presented in Section V, and we direct the reader to [herbert2017fastrack, fridovich2018planning] for further discussion. The output of this robust analysis is twofold: the autonomous system is given an optimal tracking control law that will keep the relative state inside of this invariant set at runtime regardless of the loworder trajectory proposed by the planning algorithm. In turn, the planning algorithm can use the projection of the invariant set onto the planning state space as a guaranteed tracking error bound , for the purposes of collisionchecking at planning time. A feature of the FaSTrack framework is that the robust safety analysis depends only upon the relative dynamics, and not on the particular algorithm used for planning loworder trajectories. We inherit this modularity in our recursively feasible planning framework, which can be used with an arbitrary lowlevel motion planner. In Section V, we demonstrate our framework with a standard thirdparty algorithm from the Open Motion Planning Library (OMPL) [sucan2012ompl].
IiiB Recursive Feasibility: Safety and Liveness
We now define several important concepts more formally, as they pertain directly to the theoretical safety guarantees of our proposed framework. Let denote the trajectory followed by the planning system starting at state at time under some control signal over time.
Given a planned state , we refer to its footprint as the set of points that are occupied by the system in this state. We additionally define the robust footprint as the set of points that are occupied by some (with here denoting Minkowski addition). This represents the set of locations that may be occupied by the physical system while attempting to track the planned state . We will require that the system is at all times guaranteed to only occupy locations know to be FREE. For convenience, we will denote by the set of points that are labelled as FREE at time .
We then have the following definitions.
Definition 1
(Safety) A planned trajectory is known at time to be safe, i.e. collisionfree, if it satisfies the following criterion:
Observe that Definition 1 is not a statement about stability, as in e.g. [berkenkamp2017safe]. Dynamic stability is in fact neither a necessary nor a sufficient condition for safety understood as guaranteed collision (and failure) avoidance.
Definition 2
(Safe Reachable Set) The safe forward reachable set of a state at time is the set of states that are known at to be safely reachable from under some control signal .
Analogously, the safe backward reachable set of at is the set of states from which is known at time to be safely reachable under some control signal (this can also be thought of as the set of states that can be safely reached from in backward time, hence the name backward reachable set):
We now proceed to define viability in terms of these sets.
Definition 3
(Viability) A state is viable at time with respect to a goal state and a home state if at it is known to be possible to reach either or from while remaining safe, i.e. . A trajectory is viable at if all states along are viable at .
Definition 4
(Safely Explorable Set) The safely explorable set of a state is the collection of states that can eventually be visited by the system through a trajectory starting at state with no prior knowledge of whose states are, at each time , viable according to the known free space .
Based on the idea of the safely explorable set we can now introduce the important notion of liveness for the purposes of our work.
Definition 5
(Liveness) A state is live with respect to a goal state if it is possible to reach from while remaining in the safely explorable set for all time, i.e if . A trajectory is live if all states in are live.
Finally, we will refer to a planning algorithm as recursively feasible if, given that the initial state is live, all future states are both live and viable. We will show that our proposed framework is recursively feasible. Moreover, we will also show that it is complete, in the sense that, if is live with respect to , then we will eventually reach through continued guaranteed safe exploration.
Iv General Framework
Iva Overview
Our framework is comprised of two concurrent, asynchronous operations: building a graph of states which discretely underapproximate the forward and backward reachable sets of the initial “home” state, and traversing this graph to find recursively feasible trajectories. Namely, we define the graph of vertices and edges . Vertices are individual states in , and directed edges are trajectories between pairs of vertices. will be a discrete underapproximation of the current safe forward reachable set of the initial state . We also define the graph to contain only those vertices which are in the safe backward reachable set of the and the goal state , and the corresponding edges. We use the notation to mean that state is a vertex in , and likewise for .
We use following two facts extensively. They follow directly from the defintions above and our assumptions on deterministic sensing and a static environment.
Remark 1
(Permanence of Safety) A trajectory that is safe at time will continue to be safe for all .
Remark 2
(Permanence of Reachability) A state that is in the safe forward or backward reachable set of another state at time will continue to belong to this set for all , i.e. and .
IvB Building the graph
We incrementally build the graph by alternating between outbound expansion and inbound consolidation steps. In the outbound expansion step, new candidate states are sampled, and if possible, connected to . This marks them as part of the forward reachable set of . In the inbound consolidation step, we attempt to find a safe trajectory from forwardreachable states in back to a state in , which is known to be viable. Successful inbound consolidation marks a state as either able to reach or safely return to .
IvB1 Outbound expansion
This process incrementally expands a discrete underapproximation of the forward reachable set of the home state, . Note that, by Remark 2, can only grow as the environment is gradually explored over time and therefore any state added to at a given time is guaranteed to belong to for all .
We add states to via a Monte Carlo sampling strategy inspired by existing graphbased kinodynamic planners [karaman2011RRTPRM], illustrated in Fig. 1(a). We present a relatively simple strategy here, although more sophisticated options for sampling new states are possible, e.g. [karaman2013sampling, gammell2015batch].
Let be sampled uniformly at random from at time such that . We wish to establish whether or not is in the safe forward reachable set of home at , i.e. . This is accomplished by invoking a thirdparty motion planner, which will attempt to find a safe trajectory to from any of the points already known to be in . In Section V, we use a standard kinodynamic planner from the OMPL [sucan2012ompl] for this purpose.
We observe that repeatedly executing this procedure will, in the limit, result in a dense discrete underapproximation of . Formally, assuming that the lowlevel planner will find a valid trajectory to a sampled state if one exists, then for any , we have that the probability that a new sampled state will lie within a distance of from the nearest state goes to as . We formalize this observation below:
Lemma 1
(Dense Sampling) For all , assuming we sample candidate states uniformly and independently from and is compact, then letting be the th sampled state from we have that :
This follows directly from the properties of uniform sampling from compact sets.
This will be useful in proving completeness of our framework.
IvB2 Inbound consolidation
This process incrementally adds states in to a discrete approximation of the safe backward reachable set of . By Definition 3, any state added to this set is viable, which means that a trajectory will always exist from it to either or . This is a crucial element of our overall guarantee of recursive feasibility. We recall that .
Suppose that . We will attempt to add to by finding a safe trajectory from to any of the states currently in by invoking the lowlevel motion planner. If we succeed in finding such a trajectory, then by construction there exists a trajectory all the way to , so we add to . If is added to , we also add all of its ancestors in to , since there now exists a trajectory from each ancestor through to either or . This procedure is illustrated in Fig. 1(b).
IvC Exploring the graph
When requested, we must be able to supply a safe trajectory beginning at the current state reference tracked by the system. Recall from Section IIIA that under the robust tracking framework [herbert2017fastrack], the physical system’s state is guaranteed to remain within an error bound of measured on the planning state space . This property allows us to make guarantees in terms of planning model states rather than full physical system states .
Trajectories output by our framework must guarantee future safety for all time; that is, as the system follows we must always be able to find a safe trajectory starting from any future state. In addition, we require that remains safely reachable throughout the trajectory; this ensures that liveness is preserved (if it was in principle possible from to safely explore and reach then this possibility will not be lost by embarking on ). Note that liveness is an important property separate from safety: a merely safe planner may eventually trap the system in a periodic safe orbit that it cannot safely exit.
By construction, any cycle in is safe for all future times (Remark 1). Readily, this suggests that we could guarantee perpetual recursive feasibility by always returning the same cycle. However, this naive strategy would never reach the goal. Moreover, it would not incrementally explore the environment. In order to force the system to explore unknown regions of , we modify this naive strategy by routing the system through a randomly selected unvisited state , and then back to . The trajectory always ends in a periodic safe orbit between and . Note that this random selection does not need to be done naively (e.g. by uniform sampling of unvisited states in ), and efficient exploration strategies are certainly possible. In our examples we will use an greedy sampling heuristic by which, with probability , we select the unvisited closest to , and otherwise, with probability , we uniformly sample an unvisited state in . Fig. 3 illustrates this exploration procedure.
Of course, if is ever added to , we may simply return a trajectory from the current state to . This will always be possible because, by construction, every state in is safely reachable from every other state in (if necessary, looping through ).
IvD Algorithm summary
To summarize, our framework maintains graph representations of the forward reachable set of and the backward reachable set of . Over time, these graphs become increasingly dense (Lemma 1). Additionally, all output trajectories terminate at or in a cycle that includes . This implies our main theoretical result:
Theorem 1
(Recursive Feasibility)
Assuming that we are able to generate an initial viable trajectory (e.g. a loop through ), all subsequently generated trajectories will be viable and preserve the liveness of .
Thus, our framework guarantees recursive feasibility.
{proof}
By assumption, the initial trajectory output at is safe (Definition 1).
We now proceed by induction: assume that the th reference trajectory is viable for the knowledge of free space at the time at which it was generated, i.e. .
Assuming has not been reached yet at the time of the next planning request, , a new trajectory will be generated from initial state .
The new trajectory will be created by concatenating safe trajectories between states in and therefore will be a viable trajectory.
Such a trajectory can always be found, because it is always possible to choose , which, by the inductive hypothesis was a viable trajectory at time and, by Remark 2, continues to be viable at .
Therefore all planned trajectories will retain the ability to either safely reach or safely return to . In the former case, is immediately live (and since , must have been live too); in the latter, will inherit the liveness of , by observing that .
Corollary 1
(Dynamical System Exploration) Given that the safety of trajectories is evaluated using the robust footprint , and the relative state between the dynamical system and the planning reference is guaranteed to be contained in , Theorem 1 implies that the dynamical system can continually execute safe trajectories in the environment.
Moreover, we ensure that each output trajectory visits an unexplored state in , which implies that approaches the safely explorable set introduced in Definition 4. Together with Theorem 1, this implies the following completeness result:
Theorem 2
(Completeness)
In the limit of infinite runtime, our framework eventually finds the goal
with probability
if it is within the safely explorable set.
{proof}
By Theorem 1, all trajectories output will be viable; hence, the autonomous system will remain safe for all time (Corollary 1).
Further, since each generated trajectory visits a previously unvisited state in , by Lemma 1 it will eventually observe new regions in the safely explorable set if any exist. Moreover, those regions will eventually be sampled, added to , and visited by subsequent trajectories. Because we have assumed all sets of interest to be bounded, this implies that we will eventually add to as long as .
IvE Remarks
We conclude this section with several brief remarks regarding implementation.
In Sec. IVB, we specify that states should be connected to existing states in and . In practice, we find that connecting to one of the nearest neighbors (measured in the Euclidean norm over ) in the appropriate graph suffices.
In Sec. IVC, we describe traversing to find safe trajectories between vertices. For efficiency, we recommend maintaining the following at each vertex: costfromhome, costtohome, and costtogoal, where cost may be any consistent metric on trajectories (e.g. duration). If these quantities are maintained, then care must be taken to update them appropriately for descendants and ancestors of states that are added to and in Sec. IVB.
Finally, we observe that outbound expansion, inbound consolidation, and graph exploration may all be performed in parallel and asynchronously.
V Example
We demonstrate our framework in a realtime simulation, implemented within the Robot Operating System (ROS) software environment [quigley2009ros].
Va Setup
Let the highorder system dynamics be given by the following 6D model:
(1) 
where is acceleration due to gravity, the states are position and velocity in , and the controls are . These dynamics are a reasonably accurate model for a lightweight quadrotor operating near a hover and at zero yaw.
We consider the following lowerorder 3D dynamical model for planning:
(2) 
where is a constant tangential speed in the Frenet frame, states are absolute heading , and position in fixed frame, and control is the turning rate. We interpret these dynamics as a Dubins car operating at a fixed height of .
We take controls to be bounded in all dimensions independently by known constants: and . In order to compute the FaSTrack tracking error bound , we must solve a Hamilton Jacobi (HJ) reachability problem for the relative dynamics defined by (1) and (2). In this case, the relative dynamics are given by:
(3) 
where the relative states (distance), (bearing), (tangential velocity), and (normal velocity) are illustrated in Fig. 3(a).
Fig. 3(b) is a 3D rendering of the FaSTrack value function [herbert2017fastrack] computed using level set methods [LSToolbox]. The value function records the maximum relative distance between the high and loworder dynamical models (i.e. ). In order to guarantee that, at runtime, the distance between the two systems does not exceed this value, the value function is computed by solving a differential game in which seeks to maximize the relative distance and seeks to minimize it. Observe in Fig. 3(b) that level sets of the value function with sufficiently high value are wellapproximated by discs centered on the origin in . Thus, we approximate the TEB by such a disc for rapid collisionchecking during each call to the lowlevel motion planner. Since the highorder dynamics (1) do allow for variation in , we also incorporate a extent for which may be obtained by solving a similar differential game in the subsystem of (1), as in [fridovich2018planning].
We use the KPIECE1 kinodynamic planner [csucan2009kinodynamic] within the Open Motion Planning Library (OMPL) [sucan2012ompl] to plan all trajectories for the lowlevel dynamics while building the graphs and . For simplicity, we model static obstacles as spheres in and use an omnidirectional sensing model in which all obstacles within a fixed range of the vehicle are sensed exactly. We emphasize that these choices of environment and sensing models are deliberately simplified in order to more clearly showcase our framework. The framework itself is fully compatible with arbitrary representations of static obstacles and deterministic sensing models. Extensions to dynamic obstacles and probabilistic sensing are promising directions for future research.
VB Simulation Results
We test our framework in a variety of simulated environments, some of which are shown in our supplementary video. Here, we present more detailed results for a single environment which illustrates the importance of maintaining recursive feasibility.
Fig. 1 shows a canonical case in which our guarantee of recursive feasibility avoids collision where a nonrecursivelyfeasible approach would likely fail. Here, the goal is directly in front of the home position and the way there appears to be in . However, just beyond our sensor’s field of view , there is a narrow dead end. Standard planning techniques would likely either optimistically assume the unknown regions of the environment are free space, or plan in a receding horizon within known free space . In both cases, the strong incentive would be to enter the narrow dead end, and eventually crash (recall that the planner’s speed is fixed).
By contrast, our approach eventually takes a more circuitous—but recursively feasible—route to the goal. The evolution of planned viable trajectories is shown on the right in Fig. 1. Initially, we plan tight loops near , but over time we visit more of the safely explorable space , and eventually we find .
Vi Discussion & Conclusion
In this paper, we have introduced a novel framework for recursively feasible motion planning for dynamical systems. Our approach is based on the ideas of forward and backward reachability, and uses FaSTrack [herbert2017fastrack] to make a strong guarantee of safety over all time. Moreover, we also guarantee that, if the initial “home” state is live—i.e. the goal is safely explorable from —then each state along all motion plans will also be live, and eventually we will find a trajectory to .
To our knowledge, this is the first motion planning algorithm to make this guarantee of recursive feasibility. As such, we have presented it as generally as possible and without optimization. While we make no claims of optimality, we do believe that many of the advances in optimal samplebased planning could be readily applied to our work. We are also sanguine about implementing our work in hardware for different, more complicated dynamical systems.