# Hamilton-Jacobi formulation for Reach-Avoid Differential Games

Kostas Margellos  and  John Lygeros
###### Abstract.

A new framework for formulating reachability problems with competing inputs, nonlinear dynamics and state constraints as optimal control problems is developed. Such reach-avoid problems arise, among other areas, in the study of safety problems in hybrid systems. Earlier approaches to reach-avoid computations are either restricted to linear systems, or face numerical difficulties due to possible discontinuities in the Hamiltonian of the optimal control problem. The main advantage of the approach proposed in this paper is that it can be applied to a general class of target hitting continuous dynamic games with nonlinear dynamics, and that it is well suited to numerical solution, since the value function and the Hamiltonian of the system are both continuous. The performance of the proposed method is demonstrated by applying it to a two aircraft collision avoidance scenario under target window constraints and in the presence of wind disturbance. Target Windows are a novel concept in air traffic management, and represent spatial and temporal constraints that the aircraft have to respect to meet their schedule.

K. Margellos and J. Lygeros are with the Automatic Control Laboratory, Department of Electrical Engineering and Information Technology, Swiss Federal Institute of Technology (ETH), Physikstrasse 3, 8092 Zürich, e-mail: margellos, lygeros@control.ee.ethz.ch

## 1. Introduction

Reachability for continuous and hybrid systems has been an important topic of research in the dynamics and control literature. Numerous problems regarding safety of air traffic management systems [1], [2], flight control [3], [4], [5], ground transportation systems [6], [7], etc. have been formulated in the framework of reachability theory. In most of these applications the main aim was to design suitable controllers to steer or keep the state of the system in a "safe" part of the state space. The synthesis of such safe controllers for hybrid systems relies on the ability to solve target problems for the case where state constraints are also present. The sets that represent the solution to those problems are known as capture basins [8]. One direct way of computing these sets was proposed in [9], [10], and was formulated in the context of viability theory [8]. Following the same approach, the authors of [11], [12] formulated viability, invariance and pursuit-evasion gaming problems for hybrid systems and used non-smooth analysis tools to characterize their solutions. Computational tools to support this approach have already been developed by [13].

An alternative, indirect way of characterizing such problems is through the level sets of the value function of an appropriate optimal control problem. By using dynamic programming, for reachability/invariant/viability problems without state constraints, the value function can be characterized as the viscosity solution to a first order partial differential equation in the standard Hamilton-Jacobi form [14], [15], and [16]. Numerical algorithms based on level set methods have been developed by [17], [18], have been coded in efficient computational tools by [16], [19] and can be directly applied to reachability computations.

In the case where state constraints are also present, this target hitting problem is the solution to a reach-avoid problem in the sense of [1]. The authors of [1], [20] developed a reach-avoid computation, whose value function was characterized as a solution to a pair of coupled variational inequalities. In [19], [21], [22] the authors proposed another characterization, which involved only one Hamilton-Jacobi type partial differential equation together with an inequality constraint. These methods are hampered from a numerical computation point of view by the fact that the Hamiltonian of the system is in general discontinuous [20].

In [23], a scheme based on ellipsoidal techniques was proposed to compute reachable sets for control systems with constraints on the state. This approach was restricted to the class of linear systems. In [24], this approach was extended to a list of interesting target problems with state constraints. The calculation of a solution to the equations proposed in [23], [24] is in general not easy apart from the case of linear systems, where duality techniques of convex analysis can be used.

In this paper we propose a new framework for characterizing reach-avoid sets of nonlinear control systems as the solution to an optimal control problem. We consider the case where we have competing inputs and hence adopt the gaming formulation proposed in [15]. We first restrict our attention to a specific reach-avoid scenario, where the objective of the control input is to make the states of the system hit the target at the end of our time horizon without violating the state constraints, while the disturbance input tries to steer the trajectories of the system away from the target. We then generalize our approach to the case where the controller aims to steer the system towards the target not necessarily at the terminal time, but at some time within the specified time horizon. Both problems can be treated as pursuit-evasion games, and for a worst case setting we define a value function similar to [24] and prove that it is the unique continuous viscosity solution to a quasi-variational inequality of a form similar to [25], [26]. The advantage of this approach is that the properties of the value function and the Hamiltonian (both of them are continuous) enable us to use existing tools to compute the solution of the problem numerically.

To illustrate our approach, we consider a reach-avoid problem that arises in the area of air traffic management, in particular the problem of collision avoidance in the presence of 4D constraints, called Target Windows. Target Windows (TW) are spatial and temporal constraints and form the basis of the CATS research project [27], whose aim is to increase punctuality and predictability during the flight. In [28] a reachability approach of encoding TW constraints was proposed. We adopt this framework and consider a multi-agent setting, where each aircraft should respect its TW constraints while avoiding conflict with other aircraft in the presence of wind. Since both control and disturbance inputs (in our case the wind) are present, this problem can be treated as a pursuit-evasion differential game with state constraints, which are determined dynamically by performing conflict detection.

In Section 2 we pose two reach-avoid problems for continuous systems with competing inputs and state constraints, and formulate them in the optimal control framework. Section 3 provides the characterization of the value functions of these problems as the viscosity solutions of two variational inequalities. In Section 4 we present an application of this approach to a two aircraft collision avoidance scenario with realistic data. Finally, in Section 5 we provide some concluding remarks and directions for future work.

## 2. Differential games and Reach-Avoid problems

### 2.1. Differential game problem formulation

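Consider the continuous time control system $\dot{x}=f(x,u,v)$ and an arbitrary time horizon $T$, with $x\in\mathbb{R}^n$, $u\in U$, $v\in V$, and $f(\cdot,\cdot,\cdot):\mathbb{R}^n\times U\times V\to\mathbb{R}^n$. Let $\mathcal{U}_{[t,t']}$, $\mathcal{V}_{[t,t']}$ denote the sets of Lebesgue measurable functions from the interval $[t,t']$ to $U$ and $V$ respectively. Consider also two functions $l(\cdot):\mathbb{R}^n\to\mathbb{R}$ and $h(\cdot):\mathbb{R}^n\to\mathbb{R}$, to be used to encode the target and state constraints respectively.

Assumption 1. $U$ and $V$ are compact. $f$, $l$ and $h$ are bounded and Lipschitz continuous in $x$ and continuous in $u$ and $v$.

Under Assumption 1 the system admits a unique solution $x(\cdot):[t,T]\to\mathbb{R}^n$ for all $t\in[0,T]$, $x\in\mathbb{R}^n$, $u(\cdot)\in\mathcal{U}_{[t,T]}$ and $v(\cdot)\in\mathcal{V}_{[t,T]}$. For $\tau\in[t,T]$ this solution will be denoted as

$$\phi(\tau,t,x,u(\cdot),v(\cdot))=x(\tau). \tag{1}$$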

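Let $C_f$ be a bound such that for all $x,\hat{x}\in\mathbb{R}^n$ and for all $u\in U$, $v\in V$,

$$|f(x,u,v)|\le C_f \quad\text{and}\quad |f(x,u,v)-f(\hat{x},u,v)|\le C_f|x-\hat{x}|.$$

Let also $C_l$ and $C_h$ be such that

$$|l(x)|\le C_l \ \text{ and } \ |l(x)-l(\hat{x})|\le C_l|x-\hat{x}|, \qquad |h(x)|\le C_h \ \text{ and } \ |h(x)-h(\hat{x})|\le C_h|x-\hat{x}|.$$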

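In a game setting it is essential to define the information patterns that the two players use. Following [29], [15] we restrict the first player to play non-anticipative strategies. A non-anticipative strategy is a function $\gamma:\mathcal{V}_{[t,T]}\to\mathcal{U}_{[t,T]}$ such that for all $s\in[t,T]$ and for all $v(\cdot),\hat{v}(\cdot)\in\mathcal{V}_{[t,T]}$, if $v(\tau)=\hat{v}(\tau)$ for almost every $\tau\in[t,s]$, then $\gamma[v](\tau)=\gamma[\hat{v}](\tau)$ for almost every $\tau\in[t,s]$. We then use $\Gamma_{[t,T]}$ to denote the class of non-anticipative strategies.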
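The non-anticipativity requirement can be illustrated on sampled signals. The sketch below is not part of the original development: the two strategies and the sample disturbances are hypothetical, chosen only to show that a causal map gives identical controls on any prefix where two disturbances coincide, while a map that peeks at the whole signal does not.

```python
def causal_strategy(v_samples):
    """Non-anticipative: control sample k depends only on v_samples[:k+1]."""
    controls, running = [], 0.0
    for v in v_samples:
        running += v
        controls.append(-1.0 if running >= 0.0 else 1.0)
    return controls


def anticipative_strategy(v_samples):
    """Anticipative: every control sample uses the whole disturbance signal."""
    u = -1.0 if sum(v_samples) >= 0.0 else 1.0
    return [u] * len(v_samples)


def agree_up_to(a, b, k):
    """True if the first k samples of a and b coincide."""
    return a[:k] == b[:k]


# Two hypothetical disturbances that coincide on their first two samples.
v1 = [0.5, 0.5, -2.0, -2.0]
v2 = [0.5, 0.5, 0.5, 0.5]
```

Here `causal_strategy(v1)` and `causal_strategy(v2)` agree on the first two samples, as the definition demands, whereas the two outputs of `anticipative_strategy` already differ at the first sample.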

Consider the sets $R\subseteq\mathbb{R}^n$, $A\subseteq\mathbb{R}^n$ related to the level sets of the two bounded, Lipschitz continuous functions $l$ and $h$ respectively. For technical purposes assume that $R$ is closed whereas $A$ is open. Then $R$ and $A$ can be characterized as

$$R=\{x\in\mathbb{R}^n \mid l(x)\le 0\}, \qquad A=\{x\in\mathbb{R}^n \mid h(x)>0\}.$$
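As a concrete illustration of this encoding, the following minimal sketch builds $l$ and $h$ from signed distances to two discs in the plane. The disc geometry and the clipping level `BOUND` are assumptions made for the example only; clipping is one simple way to keep the functions bounded and Lipschitz without changing their sign.

```python
import math

BOUND = 10.0  # hypothetical clipping level that keeps l and h bounded


def clip(value, bound=BOUND):
    """Saturate value to [-bound, bound]; the sign of value is preserved."""
    return max(-bound, min(bound, value))


def l(x, center=(0.0, 0.0), radius=1.0):
    """Clipped signed distance to a closed disc: l(x) <= 0 exactly on R."""
    return clip(math.hypot(x[0] - center[0], x[1] - center[1]) - radius)


def h(x, center=(3.0, 0.0), radius=0.5):
    """Clipped negated signed distance to an open disc: h(x) > 0 exactly on A."""
    return clip(radius - math.hypot(x[0] - center[0], x[1] - center[1]))


def in_R(x):
    return l(x) <= 0.0


def in_A(x):
    return h(x) > 0.0
```

With this choice the boundary of the target disc belongs to $R$ (closed set), while the boundary of the obstacle disc does not belong to $A$ (open set).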

### 2.2. Reach-Avoid at the terminal time

Consider now a closed set $R$ that we would like to reach while avoiding an open set $A$. One would like to characterize the set of initial states from which trajectories can start and reach the set $R$ at the terminal time $T$ without passing through the set $A$ over the time horizon $[t,T]$. To answer this question one needs to determine whether there exists a choice of $\gamma(\cdot)\in\Gamma_{[t,T]}$ such that for all $v(\cdot)\in\mathcal{V}_{[t,T]}$, the trajectory $x(\cdot)$ satisfies $x(T)\in R$ and $x(\tau)\notin A$ for all $\tau\in[t,T]$.

The set of initial conditions that have this property is then

$$RA(t,R,A)=\{x\in\mathbb{R}^n \mid \exists\gamma(\cdot)\in\Gamma_{[t,T]},\ \forall v(\cdot)\in\mathcal{V}_{[t,T]},\ (\phi(T,t,x,\gamma(\cdot),v(\cdot))\in R)\wedge(\forall\tau\in[t,T],\ \phi(\tau,t,x,\gamma(\cdot),v(\cdot))\notin A)\}. \tag{2}$$

Now introduce the value function

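$$V(x,t)=\inf_{\gamma(\cdot)\in\Gamma_{[t,T]}}\sup_{v(\cdot)\in\mathcal{V}_{[t,T]}}\max\Big\{l(\phi(T,t,x,\gamma(\cdot),v(\cdot))),\ \max_{\tau\in[t,T]}h(\phi(\tau,t,x,\gamma(\cdot),v(\cdot)))\Big\}. \tag{3}$$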

$V$ can be thought of as the value function of a differential game, where $u$ is trying to minimize, whereas $v$ is trying to maximize the maximum between the value attained by $l$ at the end of the time horizon $T$ and the maximum value attained by $h$ along the state trajectory over the horizon $[t,T]$. Based on [14], [15] and [25], we will show that the value function defined by (3) is the unique viscosity solution of the following quasi-variational inequality

$$\max\Big\{h(x)-V(x,t),\ \frac{\partial V}{\partial t}(x,t)+\sup_{v\in V}\inf_{u\in U}\frac{\partial V}{\partial x}(x,t)f(x,u,v)\Big\}=0, \tag{4}$$

with terminal condition $V(x,T)=\max\{l(x),h(x)\}$.

It is then easy to link the set $RA(t,R,A)$ of (2) to the level set of the value function $V$ defined in (3).

Proposition 1. $RA(t,R,A)=\{x\in\mathbb{R}^n \mid V(x,t)\le 0\}$.

###### Proof.

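$V(x,t)\le 0$ if and only if $\inf_{\gamma(\cdot)\in\Gamma_{[t,T]}}\sup_{v(\cdot)\in\mathcal{V}_{[t,T]}}\max\{l(\phi(T,t,x,\gamma(\cdot),v(\cdot))),\max_{\tau\in[t,T]}h(\phi(\tau,t,x,\gamma(\cdot),v(\cdot)))\}\le 0$. Equivalently, there exists a strategy $\gamma(\cdot)\in\Gamma_{[t,T]}$ such that for all $v(\cdot)\in\mathcal{V}_{[t,T]}$, $\max\{l(\phi(T,t,x,\gamma(\cdot),v(\cdot))),\max_{\tau\in[t,T]}h(\phi(\tau,t,x,\gamma(\cdot),v(\cdot)))\}\le 0$. The last statement is equivalent to the following: there exists a $\gamma(\cdot)\in\Gamma_{[t,T]}$ such that for all $v(\cdot)\in\mathcal{V}_{[t,T]}$, $l(\phi(T,t,x,\gamma(\cdot),v(\cdot)))\le 0$ and $\max_{\tau\in[t,T]}h(\phi(\tau,t,x,\gamma(\cdot),v(\cdot)))\le 0$. In other words, there exists a $\gamma(\cdot)\in\Gamma_{[t,T]}$ such that for all $v(\cdot)\in\mathcal{V}_{[t,T]}$, $\phi(T,t,x,\gamma(\cdot),v(\cdot))\in R$ and $\phi(\tau,t,x,\gamma(\cdot),v(\cdot))\notin A$ for all $\tau\in[t,T]$. ∎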
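The payoff inside the value function, $\max\{l(x(T)),\max_\tau h(x(\tau))\}$, can be evaluated along a single simulated trajectory once both input signals are fixed. The sketch below is only an illustration under stated assumptions: the scalar dynamics $\dot{x}=u+v$, the forward-Euler integration, and the particular `target` and `avoid` functions are hypothetical, and the functions are left unclipped for brevity. A non-positive payoff means that this particular trajectory ends in $R$ and never enters $A$; it says nothing about the worst-case disturbance.

```python
def payoff(x0, u_fn, v_fn, l, h, T=2.0, steps=200):
    """Forward-Euler estimate of max{ l(x(T)), max_tau h(x(tau)) } for dx/dt = u + v."""
    dt = T / steps
    x, worst_h = x0, h(x0)
    for k in range(steps):
        tau = k * dt
        x += dt * (u_fn(tau, x) + v_fn(tau, x))
        worst_h = max(worst_h, h(x))  # running maximum of the constraint function
    return max(l(x), worst_h)


target = lambda x: abs(x) - 0.5  # R = [-0.5, 0.5]
avoid = lambda x: x - 2.0        # A = (2, +inf)

# The control pushes toward the target and beats the disturbance: payoff <= 0.
safe = payoff(1.0, lambda t, x: -1.0, lambda t, x: 0.5, target, avoid)
# The control pushes the wrong way: the state enters A and misses R, payoff > 0.
unsafe = payoff(1.0, lambda t, x: 1.0, lambda t, x: 0.5, target, avoid)
```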

### 2.3. Reach-Avoid at any time

Another related problem that one might need to characterize is the set of initial states from which trajectories can start and, for any disturbance input, reach the set $R$ not at the terminal time, but at some time within the time horizon $[t,T]$, without passing through the set $A$ until they hit $R$. In other words, we would like to determine the set

$$\widetilde{RA}(t,R,A)=\{x\in\mathbb{R}^n \mid \exists\gamma(\cdot)\in\Gamma_{[t,T]},\ \forall v(\cdot)\in\mathcal{V}_{[t,T]},\ \exists\tau_1\in[t,T],\ (\phi(\tau_1,t,x,\gamma(\cdot),v(\cdot))\in R)\wedge(\forall\tau_2\in[t,\tau_1],\ \phi(\tau_2,t,x,\gamma(\cdot),v(\cdot))\notin A)\}. \tag{5}$$

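Based on [30], define the augmented input as $\tilde{u}=[u\ \ \bar{u}]\in U\times[0,1]$ and consider the dynamics

$$\tilde{f}(x,\tilde{u},v)=\bar{u}\,f(x,u,v). \tag{6}$$

Let $\tilde{\phi}(\tau,t,x,\tilde{u}(\cdot),v(\cdot))$ denote the solution of the augmented system, and define $\tilde{\mathcal{U}}_{[t,T]}$ and $\tilde{\Gamma}_{[t,T]}$ similarly to the previous case. Following [30], for every $\tau\in[t,T]$ the pseudo-time variable $\sigma(\tau)$ is given by

$$\sigma(\tau)=t+\int_{t}^{\tau}\bar{u}(s)\,ds. \tag{7}$$

Consider $\sigma^*(\cdot)$ to be almost an inverse of $\sigma(\cdot)$. In [30], $\sigma^*(\cdot)$ was defined as the limit of a convergent sequence of functions, and it was shown that

$$\phi(\sigma(\tau),t,x,u(\sigma^*(\cdot)),v(\sigma^*(\cdot)))=\tilde{\phi}(\tau,t,x,\tilde{u}(\cdot),v(\cdot)), \tag{8}$$

for any $\tau\in[t,T]$. Based on the analysis of [30], equation (8) implies that the trajectory of the augmented system visits only a subset of the states visited by the trajectory of the original system in the time interval $[t,T]$.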
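The effect of the extra input $\bar{u}$ on the pseudo-time (7) can be checked numerically. The following sketch assumes that $\bar{u}$ takes values in $[0,1]$ and approximates the integral with a left Riemann sum; the function name `pseudo_time`, the grid and the switching signal are hypothetical choices for illustration.

```python
def pseudo_time(t, u_bar, grid):
    """Left-Riemann approximation of sigma(tau) = t + integral_t^tau u_bar(s) ds.

    grid is an increasing list of times whose first entry equals t.
    """
    sigma = [t]
    for a, b in zip(grid, grid[1:]):
        sigma.append(sigma[-1] + u_bar(a) * (b - a))
    return sigma


grid = [i / 100 for i in range(201)]        # tau in [0, 2]
u_bar = lambda s: 1.0 if s < 1.0 else 0.0   # run for one time unit, then freeze
sigma = pseudo_time(0.0, u_bar, grid)
```

With this signal the pseudo-time advances at unit rate until $\tau=1$ and then stays constant, so the augmented trajectory stops at the state the original trajectory reaches at time $1$. This is how the augmented system turns "reach at some time" into "reach at the terminal time".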

Define now the value function

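$$\tilde{V}(x,t)=\inf_{\tilde{\gamma}(\cdot)\in\tilde{\Gamma}_{[t,T]}}\sup_{v(\cdot)\in\mathcal{V}_{[t,T]}}\max\Big\{l(\tilde{\phi}(T,t,x,\tilde{\gamma}[v](\cdot),v(\cdot))),\ \max_{\tau\in[t,T]}h(\tilde{\phi}(\tau,t,x,\tilde{\gamma}[v](\cdot),v(\cdot)))\Big\}.$$

One can then show that $\tilde{V}$ is related to the set $\widetilde{RA}(t,R,A)$.

Proposition 2. For $t\in[0,T]$, $\widetilde{RA}(t,R,A)=\{x\in\mathbb{R}^n \mid \tilde{V}(x,t)\le 0\}$.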

The proof of this proposition is given in Appendix A.

## 3. Characterization of the value function

### 3.1. Basic properties of V

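We first establish the consequences of the principle of optimality for $V$.

Lemma 1. For all $(x,t)\in\mathbb{R}^n\times[0,T]$ and all $\alpha\in[0,T-t]$:

$$V(x,t)=\inf_{\gamma(\cdot)\in\Gamma_{[t,t+\alpha]}}\sup_{v(\cdot)\in\mathcal{V}_{[t,t+\alpha]}}\Big[\max\Big\{\max_{\tau\in[t,t+\alpha]}h(\phi(\tau,t,x,\gamma(\cdot),v(\cdot))),\ V(\phi(t+\alpha,t,x,\gamma(\cdot),v(\cdot)),t+\alpha)\Big\}\Big]. \tag{9}$$

Moreover, $V(x,t)\ge h(x)$ for all $(x,t)\in\mathbb{R}^n\times[0,T]$.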

The proof of the second part is straightforward and follows from the definition of $V$. The proof of the first part is given in Appendix B.
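The dynamic programming principle of Lemma 1 suggests a simple backward recursion on a grid. The sketch below is not the numerical method used in the paper; it is a minimal semi-Lagrangian illustration under several assumptions: scalar dynamics $\dot{x}=u+v$, finite samples of the input sets, linear interpolation on a uniform grid, and hypothetical functions `l` and `h` left unclipped for brevity. The order "max over $v$, then min over $u$" mirrors the information pattern in which the control may react to the current disturbance.

```python
def interp(values, xs, x):
    """Linear interpolation on a uniform grid, clamped at the grid ends."""
    dx = xs[1] - xs[0]
    s = min(max((x - xs[0]) / dx, 0.0), len(xs) - 1.0)
    i = min(int(s), len(xs) - 2)
    w = s - i
    return (1.0 - w) * values[i] + w * values[i + 1]


def reach_avoid_value(l, h, xs, U, V, T=2.0, steps=40):
    """Backward recursion W_k(x) = max{ h(x), max_v min_u W_{k+1}(x + dt*(u+v)) }."""
    dt = T / steps
    W = [max(l(x), h(x)) for x in xs]  # terminal condition max{l, h}
    for _ in range(steps):
        W = [max(h(x),
                 max(min(interp(W, xs, x + dt * (u + v)) for u in U) for v in V))
             for x in xs]
    return W


xs = [-3.0 + 0.05 * i for i in range(121)]
l = lambda x: abs(x) - 0.5  # target R = [-0.5, 0.5]
h = lambda x: x - 1.2       # avoid  A = (1.2, +inf)
W0 = reach_avoid_value(l, h, xs, U=(-1.0, 0.0, 1.0), V=(-0.5, 0.0, 0.5))
```

In this toy problem the control can close the distance to the target at a net rate of 0.5, so over a horizon of 2 the zero sublevel set of `W0` is expected to be close to $[-1.5, 1.2]$: the left end is limited by reachability of the target, the right end by the avoid set.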

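We now show that $V$ is a bounded, Lipschitz continuous function.

Lemma 2. There exists a constant $C>0$ such that for all $(x,t),(\hat{x},\hat{t})\in\mathbb{R}^n\times[0,T]$:

$$|V(x,t)|\le C \quad\text{and}\quad |V(x,t)-V(\hat{x},\hat{t})|\le C(|x-\hat{x}|+|t-\hat{t}|).$$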

The proof of this Lemma is given in Appendix B.

### 3.2. Variational inequality for V

We now introduce the Hamiltonian $H:\mathbb{R}^n\times\mathbb{R}^n\to\mathbb{R}$, defined by

$$H(p,x)=\sup_{v\in V}\inf_{u\in U}p^{T}f(x,u,v).$$
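For intuition, the Hamiltonian can be approximated by enumerating finite samples of the two input sets. This is a sketch under assumptions, not a construction from the paper: the helper name `hamiltonian`, the sampled sets and the single-integrator dynamics are hypothetical, and sampling replaces the exact sup and inf over compact sets.

```python
def hamiltonian(p, x, f, U, V):
    """Approximate H(p, x) = sup_v inf_u p^T f(x, u, v) on finite samples U, V."""
    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))
    return max(min(dot(p, f(x, u, v)) for u in U) for v in V)


f = lambda x, u, v: (u + v,)  # scalar single integrator, written as a 1-vector
U = (-1.0, 0.0, 1.0)
V = (-0.5, 0.0, 0.5)
```

For these dynamics the control contributes $-|p|$ and the disturbance $+0.5|p|$, so the enumeration returns $-0.5|p|$, for example `hamiltonian((2.0,), (0.0,), f, U, V)` evaluates to `-1.0`.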

Lemma 3. There exists a constant such that for all , and all :

 |H(p,x)−H(q,x)|

The proof of this fact is straightforward (see [14], [15] for details). We are now in a position to state and prove the following Theorem, which is the main result of this section.

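Theorem 1. $V$ is the unique viscosity solution over $(x,t)\in\mathbb{R}^n\times[0,T]$ of the variational inequality

$$\max\Big\{h(x)-V(x,t),\ \frac{\partial V}{\partial t}(x,t)+\sup_{v\in V}\inf_{u\in U}\frac{\partial V}{\partial x}(x,t)f(x,u,v)\Big\}=0,$$

with terminal condition $V(x,T)=\max\{l(x),h(x)\}$.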

###### Proof.

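Uniqueness follows from Lemma 2, Lemma 3 and [25]. Note also that by definition of the value function we have $V(x,T)=\max\{l(x),h(x)\}$. Therefore it suffices to show that

1. For all $(x_0,t_0)$ and for all smooth $W$, if $V-W$ attains a local maximum at $(x_0,t_0)$, then

   $$\max\Big\{h(x_0)-V(x_0,t_0),\ \frac{\partial W}{\partial t}(x_0,t_0)+\sup_{v\in V}\inf_{u\in U}\frac{\partial W}{\partial x}(x_0,t_0)f(x_0,u,v)\Big\}\ge 0.$$
2. For all $(x_0,t_0)$ and for all smooth $W$, if $V-W$ attains a local minimum at $(x_0,t_0)$, then

   $$\max\Big\{h(x_0)-V(x_0,t_0),\ \frac{\partial W}{\partial t}(x_0,t_0)+\sup_{v\in V}\inf_{u\in U}\frac{\partial W}{\partial x}(x_0,t_0)f(x_0,u,v)\Big\}\le 0.$$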

The case is automatically captured by [31].
Part 1. Consider an arbitrary $(x_0,t_0)$ and a smooth $W$ such that $V-W$ has a local maximum at $(x_0,t_0)$. Then, there exists $\delta_1>0$ such that for all $(x,t)$ within distance $\delta_1$ of $(x_0,t_0)$

$$(V-W)(x_0,t_0)\ge(V-W)(x,t).$$

We would like to show that

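$$\max\Big\{h(x_0)-V(x_0,t_0),\ \frac{\partial W}{\partial t}(x_0,t_0)+\sup_{v\in V}\inf_{u\in U}\frac{\partial W}{\partial x}(x_0,t_0)f(x_0,u,v)\Big\}\ge 0.$$

Since by Lemma 1 $V(x_0,t_0)\ge h(x_0)$, either $V(x_0,t_0)=h(x_0)$ or $V(x_0,t_0)>h(x_0)$. For the former the claim holds, whereas for the latter it suffices to show that there exists $v\in V$ such that for all $u\in U$

$$\frac{\partial W}{\partial t}(x_0,t_0)+\frac{\partial W}{\partial x}(x_0,t_0)f(x_0,u,v)\ge 0.$$

For the sake of contradiction assume that for all $v\in V$ there exists $u\in U$ such that for some $\theta>0$

$$\frac{\partial W}{\partial t}(x_0,t_0)+\frac{\partial W}{\partial x}(x_0,t_0)f(x_0,u,v)<-2\theta<0.$$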

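Since $W$ is smooth and $f$ is continuous, then based on [15] we have that

$$\frac{\partial W}{\partial t}(x_0,t_0)+\frac{\partial W}{\partial x}(x_0,t_0)f(x_0,u,\zeta)<-\frac{3\theta}{2}<0,$$

for all $\zeta\in B(v,r)\cap V$ and some $r>0$, where $B(v,r)$ denotes a ball centered at $v$ with radius $r$. Because $V$ is compact there exist finitely many distinct points $v_1,\dots,v_k\in V$, $u_1,\dots,u_k\in U$, and $r_1,\dots,r_k>0$ such that $V\subseteq\bigcup_{i=1}^{k}B(v_i,r_i)$ and for $\zeta\in B(v_i,r_i)$

$$\frac{\partial W}{\partial t}(x_0,t_0)+\frac{\partial W}{\partial x}(x_0,t_0)f(x_0,u_i,\zeta)<-\frac{3\theta}{2}<0.$$

Define $g:V\to U$ by setting, for $i=1,\dots,k$, $g(v)=u_i$ if $v\in B(v_i,r_i)\setminus\bigcup_{j=1}^{i-1}B(v_j,r_j)$. Then

$$\frac{\partial W}{\partial t}(x_0,t_0)+\frac{\partial W}{\partial x}(x_0,t_0)f(x_0,g(v),v)<-\frac{3\theta}{2}<0.$$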

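Since $W$ is smooth and $f$ is continuous, there exists $\delta_2>0$ such that for all $v\in V$ and all $(x,t)$ within distance $\delta_2$ of $(x_0,t_0)$

$$\frac{\partial W}{\partial t}(x,t)+\frac{\partial W}{\partial x}(x,t)f(x,g(v),v)<-\theta<0.$$

Finally, define $\gamma:\mathcal{V}_{[t_0,T]}\to\mathcal{U}_{[t_0,T]}$ by $\gamma[v](\tau)=g(v(\tau))$ for all $\tau\in[t_0,T]$. It is easy to see that $\gamma$ is now non-anticipative and hence $\gamma(\cdot)\in\Gamma_{[t_0,T]}$. So for all $v(\cdot)\in\mathcal{V}_{[t_0,T]}$ and all $(x,t)$ within distance $\delta_2$ of $(x_0,t_0)$,

$$\frac{\partial W}{\partial t}(x,t)+\frac{\partial W}{\partial x}(x,t)f(x,\gamma[v](\cdot),v(\cdot))<-\theta<0.$$

By continuity, there exists $\delta_3>0$ such that $(\phi(t,t_0,x_0,\gamma(\cdot),v(\cdot)),t)$ stays within distance $\delta_2$ of $(x_0,t_0)$ for all $t\in[t_0,t_0+\delta_3]$. Therefore, for all $t\in[t_0,t_0+\delta_3]$

$$\begin{aligned} V(\phi(t,t_0,x_0,\gamma(\cdot),v(\cdot)),t)-V(x_0,t_0) &\le W(\phi(t,t_0,x_0,\gamma(\cdot),v(\cdot)),t)-W(x_0,t_0) \\ &=\int_{t_0}^{t}\Big(\frac{\partial W}{\partial s}(\phi(s,t_0,x_0,\gamma(\cdot),v(\cdot)),s)+\frac{\partial W}{\partial x}(\phi(s,t_0,x_0,\gamma(\cdot),v(\cdot)),s)\,f(\phi(s,t_0,x_0,\gamma(\cdot),v(\cdot)),\gamma(\cdot),v(\cdot))\Big)ds \\ &<-\theta(t-t_0). \end{aligned}$$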

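Let $\tau_0\in[t_0,t_0+\delta_3]$ be such that

$$h(\phi(\tau_0,t_0,x_0,\gamma(\cdot),v(\cdot)))=\max_{\tau\in[t_0,t_0+\delta_3]}h(\phi(\tau,t_0,x_0,\gamma(\cdot),v(\cdot))).$$

Case 1.1: If $\tau_0\in(t_0,t_0+\delta_3]$, then for $t=\tau_0$ we have

$$V(\phi(\tau_0,t_0,x_0,\gamma(\cdot),v(\cdot)),\tau_0)-V(x_0,t_0)<-\theta(\tau_0-t_0)<0. \tag{10}$$

Then by the dynamic programming argument of Lemma 1 we have:

$$V(x_0,t_0)\le\sup_{v(\cdot)\in\mathcal{V}_{[t_0,t_0+\delta_3]}}\Big[\max\Big\{\max_{\tau\in[t_0,t_0+\delta_3]}h(\phi(\tau,t_0,x_0,\gamma(\cdot),v(\cdot))),\ V(\phi(\tau_0,t_0,x_0,\gamma(\cdot),v(\cdot)),\tau_0)\Big\}\Big].$$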

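We can choose $\hat{v}(\cdot)\in\mathcal{V}_{[t_0,t_0+\delta_3]}$ such that

$$V(x_0,t_0)\le\max\Big\{\max_{\tau\in[t_0,t_0+\delta_3]}h(\phi(\tau,t_0,x_0,\gamma(\cdot),\hat{v}(\cdot))),\ V(\phi(\tau_0,t_0,x_0,\gamma(\cdot),\hat{v}(\cdot)),\tau_0)\Big\}+\epsilon,$$

and set $\epsilon=\frac{\theta}{2}(\tau_0-t_0)$. Since $V(x,t)\ge h(x)$ for all $(x,t)$ we have that

$$\max_{\tau\in[t_0,t_0+\delta_3]}h(\phi(\tau,t_0,x_0,\gamma(\cdot),\hat{v}(\cdot)))=h(\phi(\tau_0,t_0,x_0,\gamma(\cdot),\hat{v}(\cdot)))\le V(\phi(\tau_0,t_0,x_0,\gamma(\cdot),\hat{v}(\cdot)),\tau_0).$$

Hence

$$V(x_0,t_0)\le V(\phi(\tau_0,t_0,x_0,\gamma(\cdot),\hat{v}(\cdot)),\tau_0)+\frac{\theta}{2}(\tau_0-t_0).$$

Since (10) holds for all $v(\cdot)\in\mathcal{V}_{[t_0,t_0+\delta_3]}$, it will also hold for $\hat{v}(\cdot)$, and hence the last argument establishes a contradiction.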

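Case 1.2: If $\tau_0=t_0$ then for $t=t_0+\delta_3$ we have that for all $v(\cdot)\in\mathcal{V}_{[t_0,t_0+\delta_3]}$

$$V(\phi(t_0+\delta_3,t_0,x_0,\gamma(\cdot),v(\cdot)),t_0+\delta_3)-V(x_0,t_0)<-\theta\delta_3<0.$$

Since by Lemma 1

$$V(x_0,t_0)\le\sup_{v(\cdot)\in\mathcal{V}_{[t_0,t_0+\delta_3]}}\max\Big\{\max_{\tau\in[t_0,t_0+\delta_3]}h(\phi(\tau,t_0,x_0,\gamma(\cdot),v(\cdot))),\ V(\phi(t_0+\delta_3,t_0,x_0,\gamma(\cdot),v(\cdot)),t_0+\delta_3)\Big\},$$

then if

$$V(x_0,t_0)\le\sup_{v(\cdot)\in\mathcal{V}_{[t_0,t_0+\delta_3]}}V(\phi(t_0+\delta_3,t_0,x_0,\gamma(\cdot),v(\cdot)),t_0+\delta_3),$$

we can choose $\hat{v}(\cdot)\in\mathcal{V}_{[t_0,t_0+\delta_3]}$ such that

$$V(x_0,t_0)\le V(\phi(t_0+\delta_3,t_0,x_0,\gamma(\cdot),\hat{v}(\cdot)),t_0+\delta_3)+\frac{\theta\delta_3}{2},$$

which contradicts the strict inequality at the start of this case. If

$$V(x_0,t_0)\le\sup_{v(\cdot)\in\mathcal{V}_{[t_0,t_0+\delta_3]}}\max_{\tau\in[t_0,t_0+\delta_3]}h(\phi(\tau,t_0,x_0,\gamma(\cdot),v(\cdot))),$$

then we can choose $\hat{v}(\cdot)\in\mathcal{V}_{[t_0,t_0+\delta_3]}$ such that

$$V(x_0,t_0)\le\max_{\tau\in[t_0,t_0+\delta_3]}h(\phi(\tau,t_0,x_0,\gamma(\cdot),\hat{v}(\cdot)))+\epsilon,$$

or equivalently $V(x_0,t_0)\le h(x_0)+\epsilon$, since $\tau_0=t_0$. Based on our initial hypothesis that $V(x_0,t_0)>h(x_0)$, there exists a $c>0$ such that $V(x_0,t_0)-h(x_0)>c$. If we take $\epsilon<c$ we establish a contradiction.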

Part 2. Consider an arbitrary $(x_0,t_0)$ and a smooth $W$ such that $V-W$ has a local minimum at $(x_0,t_0)$. Then, there exists $\delta_1>0$ such that for all $(x,t)$ within distance $\delta_1$ of $(x_0,t_0)$

$$(V-W)(x_0,t_0)\le(V-W)(x,t).$$

We would like to show that

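$$\max\Big\{h(x_0)-V(x_0,t_0),\ \frac{\partial W}{\partial t}(x_0,t_0)+\sup_{v\in V}\inf_{u\in U}\frac{\partial W}{\partial x}(x_0,t_0)f(x_0,u,v)\Big\}\le 0.$$

Since $V(x_0,t_0)\ge h(x_0)$ it suffices to show that $\frac{\partial W}{\partial t}(x_0,t_0)+\sup_{v\in V}\inf_{u\in U}\frac{\partial W}{\partial x}(x_0,t_0)f(x_0,u,v)\le 0$. That is, for all $v\in V$ there should exist a $u\in U$ such that

$$\frac{\partial W}{\partial t}(x_0,t_0)+\frac{\partial W}{\partial x}(x_0,t_0)f(x_0,u,v)\le 0.$$

For the sake of contradiction assume that there exists $\hat{v}\in V$ such that for all $u\in U$ there exists $\theta>0$ such that

$$\frac{\partial W}{\partial t}(x_0,t_0)+\frac{\partial W}{\partial x}(x_0,t_0)f(x_0,u,\hat{v})>2\theta>0.$$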

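Since $W$ is smooth, there exists $\delta_2>0$ such that for all $(x,t)$ within distance $\delta_2$ of $(x_0,t_0)$

$$\frac{\partial W}{\partial t}(x,t)+\frac{\partial W}{\partial x}(x,t)f(x,u,\hat{v})>\theta>0.$$

Hence, following [15], for $v(\cdot)\equiv\hat{v}$ and any $\gamma(\cdot)\in\Gamma_{[t_0,T]}$

$$\frac{\partial W}{\partial t}(x,t)+\frac{\partial W}{\partial x}(x,t)f(x,\gamma(\cdot),v(\cdot))>\theta>0.$$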

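By continuity, there exists $\delta_3>0$ such that $(\phi(t,t_0,x_0,\gamma(\cdot),v(\cdot)),t)$ stays within distance $\delta_2$ of $(x_0,t_0)$ for all $t\in[t_0,t_0+\delta_3]$. Therefore, for all $\gamma(\cdot)\in\Gamma_{[t_0,T]}$

$$\begin{aligned} V(\phi(t_0+\delta_3,t_0,x_0,\gamma(\cdot),v(\cdot)),t_0+\delta_3)-V(x_0,t_0) &\ge W(\phi(t_0+\delta_3,t_0,x_0,\gamma(\cdot),v(\cdot)),t_0+\delta_3)-W(x_0,t_0) \\ &=\int_{t_0}^{t_0+\delta_3}\Big(\frac{\partial W}{\partial t}(\phi(t,t_0,x_0,\gamma(\cdot),v(\cdot)),t)+\frac{\partial W}{\partial x}(\phi(t,t_0,x_0,\gamma(\cdot),v(\cdot)),t)\,f(\phi(t,t_0,x_0,\gamma(\cdot),v(\cdot)),\gamma(\cdot),v(\cdot))\Big)dt \\ &>\theta\delta_3. \end{aligned}$$

But by the dynamic programming argument of Lemma 1 we can choose a $\hat{\gamma}(\cdot)\in\Gamma_{[t_0,t_0+\delta_3]}$ such that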

 V(x0, t0)≥supv(⋅)∈V[t0,t0+δ3][max{maxτ∈[t0,t0+δ3]h(ϕ(τ,t0,x0,^γ(⋅),v(⋅))),