Safety Considerations in Deep Control Policies with Safety Barrier Certificates Under Uncertainty
Abstract
Recent advances in deep machine learning have shown promise in solving complex perception and control loops via methods such as reinforcement and imitation learning. However, guaranteeing safety for such learned deep policies has been a challenge due to issues such as partial observability and the difficulty of characterizing the behavior of neural networks. While much of the emphasis in safe learning has been placed on the training phase, it is nontrivial to guarantee safety at deployment or test time. This paper shows how, under mild assumptions, Safety Barrier Certificates can be used to guarantee safety with deep control policies despite uncertainty arising from perception and other latent variables. Specifically, for scenarios where the dynamics are smooth and the uncertainty has finite support, the proposed framework wraps around an existing deep control policy and generates safe actions by dynamically evaluating and modifying the actions produced by the embedded network. Our framework uses control barrier functions to create spaces of control actions that are safe under uncertainty and, when the original actions are found to violate the safety constraint, uses quadratic programming to minimally modify them so that they lie in the safe set. Representations of the environment are built through Euclidean signed distance fields, which are then used to infer the safety of actions and to guarantee forward invariance. We implement this method in simulation in a drone-racing environment and show that it results in safer actions compared to a baseline that relies only on imitation learning to generate control actions.
I Introduction
Deep control policies trained via reinforcement learning (RL) and imitation learning (IL) can generate control outputs directly from sensor inputs. However, in contrast to simulations and games, applying such techniques to real-world safety-critical applications remains very challenging. The strong reliance on deep neural networks makes them vulnerable to overconfident or unpredictable outputs when presented with data distributions unseen during training. Additionally, in real-life scenarios there may be environmental factors (e.g. friction, wind, viscosity) and uncertainties due to machine perception that have not been explicitly modeled in the formulation. Robustness of predictions and actions to such exogenous variations is therefore paramount for safety-critical real-world robotics.
Much of the previous work on safety in deep control policies has focused on modifying the training phase. This includes reward engineering [20], constrained optimization to incorporate safety constraints [6] and worst-case optimization [27]. Providing safety guarantees that hold at the deployment phase (test time) is challenging, since it is difficult to characterize or enumerate the complete state space of the agent. For example, it is impossible to characterize a priori all the images a robot would see. Furthermore, the mathematical structure of deep policies makes them difficult to analyze formally.
In this work, we explore a runtime alternative that aims to keep the system safe by applying minimal deviations to the control signals produced by an embedded deep control policy. The framework attempts to continuously preserve safety via a barrier function, while the agent continues to make progress on the task it was trained for. In particular, this work extends Safety Barrier Certificates (SBC) [28] to handle safety considerations in deep control policies, and focuses solely on a test-time implementation, thus leaving the training phase unaffected. We specifically focus on the problem of autonomous drone racing, where a quadrotor needs to negotiate several gates without collisions while moving as fast as possible on a racing track. We encounter and address several technical challenges. First, deep control policies are often used when no explicit model of the system dynamics is available. In the absence of such a dynamics model, it is nontrivial to use SBCs. Second, the safety constraints for applications such as drone racing can be complex. For example, in our application, the quadrotors are not allowed to collide with the gates or other objects in the environment. Representations such as Euclidean signed distance fields (ESDF) are popular and useful for formally defining the safety conditions; however, it is unclear how SBCs can be applied to them. Finally, the safety framework also needs to account for any uncertainty and nondeterminism that might arise from environmental factors.
The core insight in this work is that, for many real-world applications, the system dynamics are well approximated by an ordinary differential equation that is uniformly continuous, bounded and Lipschitz continuous. This allows us to make a locally linear approximation while keeping the approximation error small. Similarly, we incorporate safety constraints defined over ESDFs via smooth approximations, and finally discuss safety under uncertainty and nondeterminism. We implement our framework in simulation and show that, under these assumptions, our method yields certifiable safety and improved obstacle avoidance compared to the original deep control policies, even under perception uncertainty. In summary, the main contributions of this paper include:

Enhancing any trained deep control policy using SBC under uncertainty.

Simplifying safety constraints for barrier-function-based avoidance in complex environments.

Improving the safety of deep control policies.
II Related Work
Recent research has focused on learning control policies directly from raw data using deep neural networks (DNNs), via either imitation or reinforcement learning [23], [18]. Much of the work in safe deep control focuses on training time and aims to induce risk-aversion via the reward function or through constrained optimization [20, 11, 1]. However, none of these approaches guarantees safety during the test or deployment phase. Formal verification and certification have also been proposed to address the use of deep neural networks in safety-critical applications. For example, [13, 19] focus on verification procedures for DNNs through analysis of activation functions and layers. Similarly, [9] perform verification for a feedback control network using a receding-horizon formulation that attempts to enforce properties such as reachability, safety and stability. [10] discuss control-theoretic modifications to reinforcement learning for safety analysis. The notion of probabilistic safety under uncertainty has also been explored previously via formal methods [24]. Much of this work results in computationally intensive procedures that cannot easily be used in real-time systems.
Safety Barrier Certificates (SBC) with permissive control barrier functions (CBF) have previously been used to guarantee runtime safety in both deterministic and nondeterministic settings [2, 3, 28]. CBFs have also been used for safe exploration while learning RL models [8, 22, 17]. The key idea is to first define a barrier function by considering a set of unsafe states and the system dynamics, and then use it to minimally modify a given controller so that the resulting solution is safe. The framework can be extended to handle uncertainty in the environment and guarantee safety probabilistically [21]. Recent work by [4] also proposed a real-time safety framework on top of learning-based planners, based on Hamilton-Jacobi reachability. This paper builds upon this line of work, where the key idea centers on wrapping a deep control policy within the SBC framework. However, unlike most applications of SBC, in our case no explicit system dynamics model is available. The use of CBFs to ensure safety was also introduced in [7, 29], which enforced CBF constraints and obtained control actions through a mixed-integer program or a quadratic program. These works assumed that the obstacles were convex, whereas our framework can handle nonconvex obstacles in a fast and simple way, forming a QP to be solved for CBF-based obstacle avoidance. In [26], the authors present a deep controller that performs autonomous drone landing while providing stability guarantees under unsteady dynamics. In comparison, we explicitly focus on observation uncertainty in addition to system uncertainty, since in most cases where deep controllers are applied, uncertainty stems from the sensing and perception parts of the system.
We demonstrate the framework on the task of autonomous drone racing, where, within a realistic simulator [25], a quadrotor uses an RGB camera to perceive and negotiate multiple gates as fast as possible on a racing track. Approaches to drone racing typically infer a simple representation of the environment and then either use classical control and planning methods [14, 16] or build deep controllers [15, 5]. In this paper we explore deep control policies and assume that at least one gate is always in the field of view of the quadrotor.
III Proposed Framework
Assume we have been given a policy $\pi$ that produces a control signal $u_\pi$. The goal of the proposed framework is to provide a projection $u^*$ of $u_\pi$ such that the system is safe with respect to the safety constraints. In this discussion, we make an important assumption: the uncertainties pertaining to the system dynamics as well as to the perception observations can be modeled as distributions with finite support. The finite-support assumption follows from knowledge of the application domain. As the system of interest is a quadrotor, physical actuation limits bound the uncertainties arising from the translational and rotational dynamics. At the same time, the perception-based localization system responsible for generating control actions only attempts to localize the drone gate within the viewing frustum of the camera, which bounds the observation uncertainty accordingly.
In order to use the SBC framework, we first need to (1) characterize the system dynamics, (2) define the safety constraints and (3) handle uncertainty. We describe each of these in detail below.
III-A System Dynamics Model
Due to the absence of an explicit system dynamics model in most deep control scenarios, we need to make certain assumptions. First, we consider a simplified dynamics model for a robot evolving as a continuous-time system:
$$\dot{x} = f(x, u) + w, \qquad y = g(x) + v, \qquad w \sim \mathcal{U}(-\bar{w}, \bar{w}), \qquad v \sim \mathcal{U}(-\bar{v}, \bar{v})$$ (1)
where $x \in \mathbb{R}^n$ is the system state, $y$ is the noisy observation, $u \in \mathbb{R}^m$ is the control input action, $f$ and $g$ are the unknown dynamics and observation models, and $w$ and $v$ are the process and measurement noises. $\mathcal{U}$ denotes the uniform distribution, with finite support defined by $\bar{w}$ and $\bar{v}$. Specifically, similar to [12], we assume that the underlying unknown dynamics of the system are uniformly continuous, bounded and Lipschitz continuous. Additionally, as mentioned earlier, we assume that approximation error, uncertainty and nondeterminism in the system can be explained via noise with finite support [21]. Thus, we can simplify the robot dynamics to stochastic control-affine single-integrator dynamics of the form $\dot{x} = u + w$, where $w$ accounts for both the model nonlinearity and any model uncertainties. This also allows us to make locally linear approximations of the dynamics with respect to the control input. For the quadrotor, the virtual control inputs provided through the single-integrator dynamics are mapped to the corresponding nonlinear physical model inside the simulation.
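The single-integrator model with bounded uniform noise can be simulated in a few lines. The sketch below is illustrative only: the command, noise bound and step size are hypothetical values, and a forward-Euler step stands in for the simulator's nonlinear quadrotor model. It shows the practical consequence of the finite-support assumption: the drift away from the nominal path is bounded by the noise bound times the elapsed time.

```python
import numpy as np

def rollout(x0, u, w_bar, dt, steps, seed=0):
    """Forward-Euler rollout of the single-integrator model x_dot = u + w,
    where w ~ U(-w_bar, w_bar) independently per axis lumps together model
    nonlinearity and uncertainty (finite support)."""
    rng = np.random.default_rng(seed)
    u = np.asarray(u, float)
    xs = [np.asarray(x0, float)]
    for _ in range(steps):
        w = rng.uniform(-w_bar, w_bar, size=u.shape)
        xs.append(xs[-1] + dt * (u + w))
    return np.array(xs)

# Hypothetical constant virtual velocity command (m/s) and noise bound.
u = np.array([1.0, 0.0, 0.5])
traj = rollout(np.zeros(3), u, w_bar=0.2, dt=0.05, steps=100)

# Finite support bounds the drift from the nominal path: at most w_bar * t per axis.
drift = np.abs(traj[-1] - 100 * 0.05 * u)
```

Here the elapsed time is 5 s, so each component of `drift` stays below 1.0 regardless of the random seed.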
III-B Safety Constraints
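Given our application domain of drone racing, we propose safety constraints based on rich representations common in robotics. Our obstacle model is static and is represented through a distance transform inspired by the Euclidean signed distance field (ESDF). For every obstacle $o$ with pose $p_o$, we define three subsets of the 3D metric space: the inside of the obstacle $\mathcal{O}_{in}(p_o)$, the outside of the obstacle $\mathcal{O}_{out}(p_o)$, and its border $\partial\mathcal{O}(p_o)$. For any point $x \in \mathbb{R}^3$ that is the position of the robot, we define a custom distance function to obstacle $o$ as follows:
$$d(x, p_o) = \begin{cases} -\min_{z \in \partial\mathcal{O}(p_o)} \|x - z\|_2, & x \in \mathcal{O}_{in}(p_o) \\ 0, & x \in \partial\mathcal{O}(p_o) \\ \min_{z \in \partial\mathcal{O}(p_o)} \|x - z\|_2, & x \in \mathcal{O}_{out}(p_o) \end{cases}$$ (2)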
Under the assumption that a robot cannot physically be inside an obstacle, i.e. $x \notin \mathcal{O}_{in}(p_o)$ for every state $x$ and every obstacle $o$ with pose $p_o$, the distance function defined in (2) has properties that are useful in defining the safety barrier. In particular, we make the following observation:
Remark 1
$d(x, p_o) = \min_{z \in \partial\mathcal{O}(p_o)} \|x - z\|_2 \geq 0$, where $d$ is Lipschitz continuous, differentiable almost everywhere and bounded over a finite support.
We define a state $x$ to be safe with respect to an obstacle $o$ with pose $p_o$ if the following conditions hold:
$$h_o(x) = d(x, p_o) - D_s$$ (3)
$$\mathcal{S}_o = \{x \in \mathbb{R}^3 \mid h_o(x) \geq 0\}$$ (4)
The set $\mathcal{S}_o$ contains the states that are safe with respect to obstacle $o$, where $D_s$ is a buffer safety radius. Naturally, the condition $h_o(x) \geq 0$ with $D_s \geq 0$ ensures that valid robot states lie only outside obstacles. For our application in drone racing, we consider square gates as obstacles (see Fig. 1). Additionally, considering the finite boundary of (2) and the fact that all our obstacles are identical, we precompute the signed distance field using (2) for a set of sampled poses of the robot within a region of interest relative to the gate, thus creating a distance map.
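A minimal sketch of this precomputation step, assuming a single box-shaped obstacle (one gate pole), a 4 m cubic region of interest sampled at 0.1 m, and a barrier of the form h = d - D_s with a hypothetical buffer radius D_s = 0.3 m. Vectorizing over the grid produces the whole distance map in one pass; the boolean mask marks the grid nodes that lie in the safe set.

```python
import numpy as np

def box_sdf_batch(P, center, half):
    """Signed distance from every point in P (shape (..., 3)) to an
    axis-aligned box: positive outside, negative inside."""
    q = np.abs(P - center) - half
    outside = np.linalg.norm(np.maximum(q, 0.0), axis=-1)
    inside = np.minimum(q.max(axis=-1), 0.0)
    return outside + inside

# Hypothetical region of interest: a 4 m cube around the obstacle, 0.1 m spacing.
ticks = np.linspace(-2.0, 2.0, 41)
X, Y, Z = np.meshgrid(ticks, ticks, ticks, indexing="ij")
grid = np.stack([X, Y, Z], axis=-1)                  # shape (41, 41, 41, 3)

# For brevity, a single vertical bar (one gate pole) stands in for the gate.
dist_map = box_sdf_batch(grid, np.zeros(3), np.array([0.05, 0.05, 1.0]))

D_s = 0.3                   # assumed buffer safety radius (metres)
h_map = dist_map - D_s      # barrier value h = d - D_s at every grid node
safe_mask = h_map >= 0.0    # True where the node lies in the safe set
```

Because all gates share the same shape, a map like this only needs to be computed once, offline, in the gate's frame.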
III-C Safety Under Uncertainty
In most deep control scenarios, the state variable used to define and evaluate safety is latent. Consequently, we assume that a state estimation routine is available that provides the system state with a bounded error. In our work on drone racing, we use a Variational Autoencoder (VAE) based module to estimate the pose of the gates. This estimate is not precise, but its error is considered to have finite support. To address safety under the uncertainty arising from such state estimation, we perform a worst-case safety computation. Formally, given the estimated obstacle pose $\hat{p}_o$, we define a new distance function as follows:
$$\tilde{d}(x, \hat{p}_o) = \min_{z \in \mathcal{O}_\epsilon(\hat{p}_o)} \|x - z\|_2$$ (5)
where $\mathcal{O}_\epsilon(\hat{p}_o) = \bigcup_{\|p - \hat{p}_o\| \leq \epsilon} \big(\mathcal{O}_{in}(p) \cup \partial\mathcal{O}(p)\big)$ is the set of points that is occupied when the obstacle is replicated at all possible poses within the error threshold $\epsilon$ of the predicted pose.
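The worst-case distance can be approximated by replicating the obstacle over sampled pose errors and taking the minimum distance. The sketch below assumes a box-shaped obstacle and purely translational uncertainty bounded per axis by a hypothetical `eps` (orientation errors are not modelled here). For a box, replicating it over all translations in the error box yields a larger box, so the sampled worst case can be checked against the distance to that inflated obstacle.

```python
import itertools
import numpy as np

def box_sdf(p, half):
    """Signed distance from p (shape (..., 3)) to an origin-centred box."""
    q = np.abs(p) - half
    return np.linalg.norm(np.maximum(q, 0.0), axis=-1) + np.minimum(q.max(axis=-1), 0.0)

def worst_case_distance(x, half, eps, n=11):
    """Distance from x to the obstacle replicated at sampled translation
    errors |delta_i| <= eps, i.e. the minimum over the plausible poses."""
    ticks = np.linspace(-eps, eps, n)
    offsets = np.array(list(itertools.product(ticks, repeat=3)))  # (n**3, 3)
    # Moving the obstacle by delta is equivalent to moving the query by -delta.
    return float(box_sdf(np.asarray(x, float) - offsets, half).min())

half = np.array([0.05, 0.05, 1.0])   # hypothetical pole-like obstacle
eps = 0.2                             # assumed bound on pose-estimation error
x = np.array([2.0, 0.0, 0.0])

d_nominal = float(box_sdf(x, half))
d_worst = worst_case_distance(x, half, eps)
d_inflated = float(box_sdf(x, half + eps))   # distance to the inflated obstacle
```

At this query point the nominal distance is 1.95 m while both the sampled worst case and the inflated-box distance are 1.75 m, i.e. the uncertainty shrinks the available clearance by exactly `eps`.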
Remark 2
We can represent a new obstacle $\tilde{o}$ with pose $\tilde{p}_o$ that comprises all possible positions of the original obstacle with pose $p$ such that $\|p - \hat{p}_o\| \leq \epsilon$.
This new obstacle allows us to consider the worst-case scenario under gate pose estimation uncertainty, and it can be handled using the same safety definition as in (4). A corresponding distance map can also be precomputed. This approach simplifies providing safety under uncertainty, as the underlying machinery stays unchanged. Fig. 1 and Fig. 2 show the distance maps with and without considering uncertainty, and the difference in the measurements of the obstacle. With $D_s$ the buffer safety radius, the safe set under worst-case uncertainty can now be defined by:
$$\tilde{\mathcal{S}}_o = \{x \mid \tilde{h}_o(x) \geq 0\}, \quad \text{where} \quad \tilde{h}_o(x) = \tilde{d}(x, \hat{p}_o) - D_s$$ (6)
Using Remark 2, it is easy to show that there exists an equivalent safe set that uses the original ESDF of the newly constructed obstacle $\tilde{o}$ with pose $\tilde{p}_o$:
$$\tilde{\mathcal{S}}_o = \{x \mid h_{\tilde{o}}(x) \geq 0\}, \quad \text{where} \quad h_{\tilde{o}}(x) = d(x, \tilde{p}_o) - D_s$$ (7)
IV Safety Barrier Certificates Under Uncertainty
Barrier certificates, or barrier functions, are used to ensure that robots remain in safe sets for all time. Controllers are expected to satisfy the barrier certificates while taking control actions that are as close as possible to the nominal action. For the discussion in this section, we adopt the perspective of the robot running the perception-action loop and express obstacle poses relative to the pose of the robot. This simplifies the notation for the distance function from $d(x, p_o)$ to $d(p)$, where $p$ is the obstacle pose relative to the robot. Under the assumption that at least one obstacle is in the field of view of the robot at any time, we represent the safe set and constraints as a function of the pose $p$ of the next obstacle relative to the robot. The set is again defined by all states for which the center of the robot lies outside the obstacle. In this simpler representation, with buffer safety radius $D_s$, the safe set is defined similarly to (4):
$$\mathcal{S} = \{p \mid h(p) \geq 0\}, \quad \text{where} \quad h(p) = d(p) - D_s$$ (8)
Based on the theory of Zeroing Control Barrier Functions (ZCBF) and SBC, certain conditions must be imposed on the controller to guarantee forward invariance of the safe set $\mathcal{S}$. Given a continuously differentiable ZCBF $h$ and an extended class-$\mathcal{K}$ function $\alpha$, the admissible control space can be defined as:
$$\mathcal{K}_{\mathrm{zcbf}}(p) = \{u \in \mathbb{R}^m \mid \dot{h}(p, u) + \alpha(h(p)) \geq 0\}$$ (9)
Any Lipschitz continuous controller whose actions lie in $\mathcal{K}_{\mathrm{zcbf}}(p)$ guarantees that the set $\mathcal{S}$ is forward invariant. Choosing the extended class-$\mathcal{K}$ function as $\alpha(h) = \gamma h$ with $\gamma > 0$, the SBC that defines the constraints on the admissible control space can be formulated as:
$$\mathcal{K}_{\mathrm{zcbf}}(p) = \{u \in \mathbb{R}^m \mid \dot{h}(p, u) + \gamma h(p) \geq 0\}$$ (10)
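To illustrate why a condition of the form h_dot >= -gamma*h yields forward invariance, consider a hypothetical one-dimensional robot whose barrier value h changes at exactly the commanded rate (h_dot = u). The admissible set is then the half-line u >= -gamma*h, and the minimally modified command is a simple clamp. This is a toy sketch with forward-Euler integration, for which invariance additionally requires gamma*dt <= 1; it is not the quadrotor controller.

```python
import numpy as np

def cbf_filter_1d(u_nom, h, gamma):
    """Closest input to u_nom satisfying u >= -gamma * h (the 1-D admissible
    set when h_dot = u). The clamp is Lipschitz in h."""
    return max(u_nom, -gamma * h)

def simulate(h0=1.0, u_nom=-1.0, gamma=2.0, dt=0.01, steps=500):
    """Nominal policy drives straight at the boundary h = 0; the filter
    lets h decay towards zero exponentially without ever crossing it."""
    h = h0
    history = [h]
    for _ in range(steps):
        u = cbf_filter_1d(u_nom, h, gamma)
        h = h + dt * u                     # Euler step of h_dot = u
        history.append(h)
    return np.array(history)

hist = simulate()
```

Without the filter, the nominal command would reach h = 1 + 500 * 0.01 * (-1) = -4, well inside the unsafe region; with it, h follows the nominal command until the constraint becomes active and then decays geometrically by a factor (1 - gamma*dt) per step.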
Remark 3
For all positions occupied by the robot where the distance map is differentiable, and assuming the initial position is collision-free, it can be shown that the constrained control space described by (10) induces a linear constraint on the robot controller. A proof and further discussion can be found in [21].
We recall that, since we precompute a distance map over a grid of sampled poses, the relevant constraints can be determined efficiently at runtime.
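The runtime lookup can be sketched as follows, assuming a regular grid, central finite differences (`numpy.gradient`) for the gradient, and nearest-node queries. The class name and the spherical test obstacle are hypothetical; trilinear interpolation would give smoother values at the cost of a few more lines.

```python
import numpy as np

class DistanceMap:
    """Hypothetical helper: a distance field precomputed on a regular grid,
    with its finite-difference gradient, queried by nearest grid node."""

    def __init__(self, dist, lo, spacing):
        self.dist = dist
        self.lo = np.asarray(lo, float)
        self.spacing = spacing
        # np.gradient returns one array per axis; stack them into (..., 3).
        self.grad = np.stack(np.gradient(dist, spacing, spacing, spacing), axis=-1)

    def _index(self, p):
        idx = np.rint((np.asarray(p, float) - self.lo) / self.spacing).astype(int)
        return tuple(np.clip(idx, 0, np.array(self.dist.shape) - 1))

    def query(self, p):
        """Return (distance, gradient) at the grid node nearest to p."""
        i = self._index(p)
        return self.dist[i], self.grad[i]

# Example field: a sphere of radius 0.5 m at the origin, 0.1 m grid spacing.
ticks = np.linspace(-2.0, 2.0, 41)
X, Y, Z = np.meshgrid(ticks, ticks, ticks, indexing="ij")
dmap = DistanceMap(np.sqrt(X**2 + Y**2 + Z**2) - 0.5, lo=(-2.0, -2.0, -2.0), spacing=0.1)
d, g = dmap.query([1.0, 0.0, 0.0])
```

Both the value and the gradient are plain array reads at runtime, which is what keeps the per-step constraint construction cheap.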
[Figure: (a) performance on safety; (b) performance on task success]
To ensure that $h$ defined in (8) is continuously differentiable, we can use a smooth approximation to (2) (for example, using the softmax trick). In our experiments, we simply work with ESDFs, noting that the regions of nondifferentiability (with respect to a drone gate's shape) arise only at places where the vehicle is guaranteed to be safe by a wide margin. Second, we transform the VAE's estimated gate position coordinates from spherical to Euclidean coordinates, where the quadrotor's yaw angle is set equal to the predicted angle with respect to the obstacle. These steps support the continuity of the derivative of $h$. Thus, we can formally rewrite (10) as:
$$\dot{h}(p, u) = \nabla h(p)^\top \dot{p} \geq -\gamma h(p)$$ (11)
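The softmax trick mentioned above can be made concrete as a log-sum-exp soft minimum over per-component distances (for example, one distance per gate bar). The sketch below is an illustration under that assumption; `beta` is a hypothetical sharpness parameter. The soft minimum never exceeds the true minimum and is at most log(n)/beta below it, so using it inside h errs on the conservative side, and its gradient is a softmax-weighted combination of the component gradients.

```python
import numpy as np

def softmin(d, beta):
    """Smooth under-approximation of min(d): -1/beta * log(sum(exp(-beta * d)))."""
    d = np.asarray(d, float)
    m = d.min()                      # shift by the minimum for numerical stability
    return m - np.log(np.exp(-beta * (d - m)).sum()) / beta

def softmin_weights(d, beta):
    """Partial derivatives of softmin w.r.t. each d_i (softmax of -beta * d).
    By the chain rule, grad softmin = sum_i weights[i] * grad d_i."""
    d = np.asarray(d, float)
    w = np.exp(-beta * (d - d.min()))
    return w / w.sum()

d = np.array([0.8, 1.0, 1.5, 2.0])   # hypothetical per-bar distances (metres)
beta = 20.0
s = softmin(d, beta)
weights = softmin_weights(d, beta)
```

Larger `beta` tightens the approximation but sharpens the transitions between bars, so it trades smoothness against conservativeness.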
[Figure: (a) trajectories on four track difficulty levels; (b) minimum distance, averaged over track difficulty level]
Similar to the distance map, its gradient $\nabla h$ can also be precomputed. Further, under our assumed locally linear dynamics (1), and since the obstacles are static, the relative pose evolves as $\dot{p} = -(u + w)$, so $\dot{h}$ in (11) becomes $-\nabla h(p)^\top (u + w)$. Given that $w$ has finite support ($\|w\|_\infty \leq \bar{w}$) and $h$ is Lipschitz continuous, we can compute a constant $c = \bar{w}\,\|\nabla h(p)\|_1$ that bounds the worst-case effect of $w$, resulting in a linear constraint on $u$ that guarantees the inequality in (11). Finally, we formulate our safety problem as a Quadratic Program (QP) that minimally changes the action if needed, i.e. modifies the original control action only if it is found to violate the safety constraints. Formally, we solve the following program using the safety constraints defined in (8) and the SBC (11):
$$u^* = \arg\min_{u \in \mathbb{R}^m} \|u - u_\pi\|_2^2$$ (12)
$$\text{s.t.} \quad -\nabla h(p)^\top u \geq -\gamma h(p) + c$$ (13)
$$u \geq -u_{\max}$$ (14)
$$u \leq u_{\max}$$ (15)
where $u_{\max}$ is the bound on the controller action (applied elementwise), $u_\pi$ is the original deep policy control and $u^*$ denotes the safe action. In practice, considering the worst case leads to different modifications of the original controller, which becomes more restricted near the obstacle (see, for example, Fig. 1 and Fig. 2).
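A compact sketch of the robust constraint and the QP, assuming relative-position dynamics p_dot = -(u + w) with |w_i| <= w_bar, an elementwise actuator bound `u_max`, and hypothetical numbers. Instead of a general-purpose QP solver, it exploits the structure of this particular QP (one linear constraint plus box bounds): the KKT conditions reduce it to finding a single scalar multiplier, which bisection finds reliably. A generic QP solver would return the same minimizer.

```python
import numpy as np

def sbc_constraint(grad_h, h, gamma, w_bar):
    """Return (a, b) such that a @ u >= b implies -grad_h @ (u + w) >= -gamma * h
    for every noise vector w with |w_i| <= w_bar (robust form of the SBC)."""
    a = -np.asarray(grad_h, float)
    c = w_bar * np.abs(a).sum()      # largest possible magnitude of a @ w
    return a, -gamma * h + c

def safe_action(u_pi, a, b, u_max, iters=100):
    """Solve min ||u - u_pi||^2 s.t. a @ u >= b and -u_max <= u <= u_max.

    KKT: the minimiser is u(lam) = clip(u_pi + lam * a) for some lam >= 0, and
    a @ u(lam) is nondecreasing in lam, so lam can be found by bisection."""
    u_pi = np.asarray(u_pi, float)

    def u_of(lam):
        return np.clip(u_pi + lam * a, -u_max, u_max)

    if a @ u_of(0.0) >= b:
        return u_of(0.0)             # (box-clipped) nominal action is already safe
    if a @ (np.sign(a) * u_max) < b:
        raise ValueError("no action within the actuator bounds satisfies the SBC")
    lo, hi = 0.0, 1.0
    while a @ u_of(hi) < b:          # grow the bracket until it becomes feasible
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if a @ u_of(mid) < b:
            lo = mid
        else:
            hi = mid
    return u_of(hi)                  # hi always satisfies the constraint

# Hypothetical scenario: gate pole straight ahead along +x, barrier value 0.2.
grad_h = np.array([1.0, 0.0, 0.0])
a, b = sbc_constraint(grad_h, h=0.2, gamma=1.0, w_bar=0.05)
u_pi = np.array([1.0, 0.3, 0.0])     # policy flies straight at the pole
u_star = safe_action(u_pi, a, b, u_max=1.0)
```

In this example the forward speed is cut from 1.0 to about 0.15 (gamma*h minus the noise margin 0.05), while the lateral component of the policy's action is left untouched, which is the "minimal modification" the QP is meant to deliver.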
V Experiments and Results
We performed experiments to verify the robustness of the method and to understand its limitations, using a drone-racing simulation built on top of AirSim [25]. Each experiment consisted of a quadrotor navigating through a set of ten racing tracks for three laps. Each track was around 50 m long and built from eight randomly positioned gates. Each experiment was associated with one of four difficulty levels (ranging from 0 to 1.5 in steps of 0.5), defined by the maximum offset between the centers of two consecutive gates, where a larger offset requires more maneuvering to stay on track.
We use two key metrics for evaluation: safety and the ability to solve the given task successfully. A trial consists of maneuvering through three consecutive laps of the track, and it is defined as safe when the quadrotor stays collision-free over the entire trial. The percentage of gates negotiated safely during a trial measures success on the task. We wish to explore whether the proposed framework allows us to be safe while still being competitive, as defined by the success criterion.
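The two evaluation metrics can be computed from per-trial logs as in the sketch below. The `Trial` record is a hypothetical log format, and the default of 8 gates over 3 laps follows the track description above; success is assumed here to be the percentage of all gates in the trial that were passed before any collision.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    gates_passed: int   # gates negotiated before the trial ended (hypothetical field)
    collided: bool      # True if the trial ended in a collision

def evaluate(trials, gates_per_lap=8, laps=3):
    """Return (safety rate in %, mean task success in %) over a set of trials.
    A trial is safe iff it is collision-free over all laps."""
    total_gates = gates_per_lap * laps
    safety_rate = 100.0 * sum(not t.collided for t in trials) / len(trials)
    success = [100.0 * t.gates_passed / total_gates for t in trials]
    return safety_rate, sum(success) / len(success)

trials = [Trial(24, False), Trial(10, True), Trial(24, False), Trial(6, True)]
safety_rate, mean_success = evaluate(trials)
```

For these four made-up trials, half are safe (50%) and the mean success is 200/3, about 66.7%, illustrating how a policy can be partially successful while still failing the safety criterion.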
For the perception module and baseline control policies, we use the networks from [5]: a variational autoencoder (VAE-constrained) that predicts the poses of the next gates, and constrained and unconstrained Behavior Cloning (BC) policies, which are the best-performing control networks in that work. We compare both deep control policies against our proposed safety framework with and without uncertainty (corresponding to gate pose localization). Our uncertainty estimate is based on the gate pose estimation errors computed empirically by [5].
Fig. 3 shows the performance of both baselines when augmented with our optimization method, in terms of the safety and success metrics. In our experiments, the success rate is nearly the same for all methods, with a slight advantage for the safety method that considers uncertainty. As the track difficulty increases to 1.5, the safety performance of the original policies deteriorates drastically, while that of the safety policies decreases more slowly. The best safety rates were achieved when considering uncertainty.
We also evaluated the experiments using the distance metric defined in (2). For every trial, we recorded the distance between the quadrotor and the next gate at each time step; the minimum of these values indicates how close the quadrotor came to a possible collision. If a trial ended in a collision, the score is zero. Fig. 4b shows the minimum distance values, averaged for each difficulty level. The results show that our proposed safety method considering uncertainty achieves the best performance overall. While we sometimes observe smaller minimum distances when considering uncertainty, this can be attributed to the fact that, under uncertainty, the obstacle is artificially inflated to a larger size. A visualization of safe control commands and trajectories is shown in Fig. 4a and Fig. 5, applied to the BC unconstrained network. In Fig. 4a, we show the differences between the original policy trajectories and the safety controls with and without considering uncertainty. The trajectories are almost the same for the first three difficulty levels, but when the difficulty level increases to 1.5, the only safe trajectory is the one obtained when considering uncertainty, although it leads off the track. For the same track, the original policy and the safety method without uncertainty collide with the second and fourth gates, respectively. A detailed control visualization is shown in Fig. 5, where the actions of the original policy violate the safety constraints and would lead to a possible collision with a gate, whereas the safety method with uncertainty computes a collision-free action.
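Similarly, the minimum-distance score can be sketched as below, assuming each record holds the difficulty level, the per-step distances to the next gate, and a collision flag (a hypothetical format). Collided trials score zero, and the scores are averaged per difficulty level.

```python
from collections import defaultdict

def min_distance_score(distances, collided):
    """Minimum per-step distance to the next gate over a trial; 0 on collision."""
    return 0.0 if collided else min(distances)

def average_by_difficulty(records):
    """records: iterable of (difficulty, per-step distances, collided flag)."""
    buckets = defaultdict(list)
    for level, distances, collided in records:
        buckets[level].append(min_distance_score(distances, collided))
    return {level: sum(s) / len(s) for level, s in sorted(buckets.items())}

# Hypothetical logs for two difficulty levels (distances in metres).
records = [
    (0.0, [2.1, 1.4, 0.9], False),
    (0.0, [1.8, 0.7], False),
    (1.5, [1.2, 0.3], True),
    (1.5, [1.0, 0.5, 0.6], False),
]
averages = average_by_difficulty(records)
```

Scoring collisions as zero makes the average penalize unsafe trials directly, so a method cannot look good on this metric merely by passing close to gates in the trials it survives.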
We observed a few limitations of the proposed method in our experiments. For example, when the angle between the current gate and the next gate was too sharp but the next gate was still in the field of view of the quadrotor, all methods occasionally caused a collision with the current gate. Another issue arose when the quadrotor started a trial facing a gate's pole in close proximity. In this situation, the quadrotor usually collided with the gate under all methods, possibly because of significant noise in the estimated gate position, with estimation errors exceeding the worst-case values considered. One way to overcome this issue is to optimize over the next two gates instead of only one.
VI Conclusions and Future Work
We have presented a framework for safe deep control policies for the task of drone racing. At the heart of our method are safety barrier certificates, used to minimally change the controller to ensure forward invariance of the safe set. The main idea for overcoming uncertainty in obstacle position is to consider the worst case within the error threshold of the predicted obstacle pose and to build a precomputed distance map through a Euclidean signed distance field. Our experiments show that the proposed method increases the safety rate of deep control policies while still achieving competitive results. Future work includes investigating the prediction of more than one gate position. We would also be interested in exploring the use of this method during the training of deep control policies, to balance safety and performance before execution.
Acknowledgment
We would like to thank Ratnesh Madaan and Rogerio Bonatti for their inputs regarding the baseline perception and control policies; as well as Matthew Brown and Nicholas Gyde for their help with the simulations.
References
[1] (2017) Constrained policy optimization. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, pp. 22–31.
[2] (2016) Control barrier function based quadratic programs for safety critical systems. IEEE Transactions on Automatic Control 62 (8), pp. 3861–3876.
[3] (June 2019) Control barrier functions: theory and applications. 2019 18th European Control Conference (ECC). ISBN 9783907144008.
[4] (2019) An efficient reachability-based framework for provably safe autonomous navigation in unknown environments. arXiv:1905.00532.
[5] (2019) Learning controls using cross-modal representations: bridging simulation and reality for drone racing. arXiv preprint arXiv:1909.06993.
[6] (2019) Safe reinforcement learning with scene decomposition for navigating complex urban environments. arXiv preprint arXiv:1904.11483.
[7] (2017) Obstacle avoidance for low-speed autonomous vehicles with barrier function. IEEE Transactions on Control Systems Technology 26 (1), pp. 194–206.
[8] (2019) End-to-end safe reinforcement learning through barrier functions for safety-critical continuous control tasks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 3387–3395.
[9] (2018) Learning and verification of feedback control systems using feedforward neural networks. IFAC-PapersOnLine 51 (16), pp. 151–156.
[10] (2019) Bridging Hamilton-Jacobi safety analysis and reinforcement learning. In 2019 International Conference on Robotics and Automation (ICRA), pp. 8550–8556.
[11] (2012) Safe exploration of state and action spaces in reinforcement learning. Journal of Artificial Intelligence Research 45, pp. 515–564.
[12] (2017) FaSTrack: a modular framework for fast and guaranteed safe motion planning. In 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pp. 1517–1522.
[13] (2017) Reluplex: an efficient SMT solver for verifying deep neural networks. In International Conference on Computer Aided Verification, pp. 97–117.
[14] (2019) Beauty and the beast: optimal methods meet learning for drone racing. In 2019 International Conference on Robotics and Automation (ICRA), pp. 690–696.
[15] (2018, 29–31 Oct) Deep drone racing: learning agile flight in dynamic environments. In Proceedings of The 2nd Conference on Robot Learning, A. Billard, A. Dragan, J. Peters and J. Morimoto (Eds.), Proceedings of Machine Learning Research, Vol. 87, pp. 133–145.
[16] (June 2019) OIL: observational imitation learning. Robotics: Science and Systems XV. ISBN 9780992374754.
[17] (2019) Temporal logic guided safe reinforcement learning using control barrier functions. arXiv preprint arXiv:1903.09885.
[18] (2015) Continuous control with deep reinforcement learning. arXiv:1509.02971.
[19] (2019) Algorithms for verifying deep neural networks. arXiv preprint arXiv:1903.06758.
[20] (2018) Towards optimally decentralized multi-robot collision avoidance via deep reinforcement learning. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 6252–6259.
[21] (2019) Airborne collision avoidance systems with probabilistic safety barrier certificates. In NeurIPS 2019.
[22] (2019) Barrier-certified adaptive reinforcement learning with applications to brushbot navigation. IEEE Transactions on Robotics 35 (5), pp. 1186–1205.
[23] (2018) Agile autonomous driving using end-to-end deep imitation learning. In Robotics: Science and Systems.
[24] (June 2016) Safe control under uncertainty with probabilistic signal temporal logic. In Proceedings of Robotics: Science and Systems, Ann Arbor, Michigan.
[25] (2018) AirSim: high-fidelity visual and physical simulation for autonomous vehicles. In Field and Service Robotics, pp. 621–635.
[26] (2019) Neural lander: stable drone landing control using learned dynamics. In 2019 International Conference on Robotics and Automation (ICRA), pp. 9784–9790.
[27] (2019) Worst cases policy gradients. arXiv preprint arXiv:1911.03618.
[28] (February 2017) Safety barrier certificates for collisions-free multirobot systems. IEEE Transactions on Robotics PP, pp. 1–14.
[29] (2019) Sampling-based motion planning via control barrier functions. In Proceedings of the 2019 3rd International Conference on Automation, Control and Robots, pp. 22–29.