Hamiltonian-Based Algorithm for Optimal Control

# Hamiltonian-Based Algorithm for Optimal Control

M.T. Hale Y. Wardi H. Jaleel M. Egerstedt School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
Email: {mhale30,ywardi,magnus}@ ece.gatech.edu.
Department of Electrical Engineering, University of Engineering and Technology, Lahore, Pakistan
Email: hassanjaleel@uet.edu.pk.
###### Abstract

This paper proposes an algorithmic technique for a class of optimal control problems where it is easy to compute a pointwise minimizer of the Hamiltonian associated with every applied control. The algorithm operates in the space of relaxed controls and projects the final result into the space of ordinary controls. It is based on the descent direction from a given relaxed control towards a pointwise minimizer of the Hamiltonian. This direction comprises a form of gradient projection and for some systems, is argued to have computational advantages over direct gradient directions. The algorithm is shown to be applicable to a class of hybrid optimal control problems. The theoretical results, concerning convergence of the algorithm, are corroborated by simulation examples on switched-mode hybrid systems as well as on a problem of balancing transmission- and motion energy in a mobile robotic system.

###### keywords:
Optimal control, relaxed controls, optimization algorithms, switched-mode systems.
footnotetext: Research supported in part by the NSF under Grant CNS-1239225.

## 1 Introduction

Consider dynamical systems described by the differential equation

 ˙x = f(x,u), (1)

where is its state variable and is the input control. The set is assumed to be compact. Suppose that the initial time is , and the initial state and the final time are given and fixed. A control is said to be ordinary if the function is Lebesgue measurable, and we call a control admissible if it is ordinary and for every . Let be an absolutely-integrable cost function, and let

 J := ∫tf0L(x(t),u(t))dt (2)

be its related performance functional. The optimal control problem that we consider is to minimize over the space of admissible controls.

The following assumption will be made throughout the paper:

###### Assumption 1.1

(1). The function is twice-continuously differentiable in for every ; the functions , , and are locally-Lipschitz continuous in ; and there exists such that, for every and for every , . (2). The function is continuously differentiable in for every ; and the functions and are locally-Lipschitz continuous in .

Observe that part (1) of the assumption guarantees the existence of a unique absolutely-continuous solution of Equation (1) in its integral form for every admissible control and , while part (2) implies that the Lebesgue integral in Equation (2) is well defined.

This paper has been motivated by two kinds of problems: one concerns switched-mode hybrid systems, and the other concerns optimal balancing of the energy required for transmission and motion in mobile robotic networks. We propose an algorithm defined in the setting of relaxed controls, and analyze its convergence using Polak’s framework of optimality functions. The theoretical results are first derived in the abstract framework of Eqs. (1) and (2), and then applied to problems in the two aforementioned areas of interest.

A standard basic requirement of algorithms for nonlinear-programming (finite-dimensional optimization) problems, and especially gradient-descent methods, is that every accumulation point of an iterate-sequence they compute must satisfy an optimality condition for a local minimum, such as the Kuhn-Tucker condition (e.g., Polak97). Thus, if a bounded sequence of iteration points is computed then it has at least one accumulation point which, therefore, must satisfy that condition. However, in infinite-dimensional optimization this requirement can be vacuous since bounded sequences need not have accumulation points. This issue is not only theoretical but also has practical implications. An infinite-dimensional problem may not have a solution or a local minimum on a closed and bounded set, and even if a local minimum exists, it is not guaranteed that it can be approximated by the solution points of the problem’s finite-dimensional discretizations at any level of precision. To get around these issues, E. Polak developed a comprehensive framework for design and analysis of algorithms for infinite-dimensional optimization that gives the user considerable discretion in choosing the discretization levels in an adaptive fashion Polak97. A survey of the framework will be carried out in the next section, and we mention that recently it has been used in the context of switched-mode hybrid systems in Refs. Caldwell10; Caldwell11; Caldwell12; Caldwell16; Gonzalez10; Vasudevan12; Egerstedt06; Axelsson08.

A relaxed control is a mapping from the time interval into the space of probability measures on the set McShane67; Warga72. Relaxed controls provide a useful framework for optimal control problems for the following reasons. First, the space of relaxed controls is convex even though the input-constraint set may not be convex. Second, the space of relaxed controls is compact in a suitable topology, namely the weak-star topology, and hence it contains solutions to optimal control problems lacking solutions in a functional space of admissible controls like (a more detailed survey of these points is provided in Section 2). However, a relaxed control is a more abstract object than an ordinary control, which can make it problematic for an algorithm to handle an iterative sequence of relaxed controls. This point will be discussed in the sequel.

This paper combines the frameworks of optimality functions and relaxed controls to define a new algorithm for the optimal control problem. Its main innovation is in the choice of the search direction from a given relaxed control, which is based on a pointwise minimizer of the Hamiltonian (defined below) at each time .111The computation of the minimizer of the Hamiltonian does not require the solving of a two-point boundary value problem, but rather is based on sequential integrations of the state equation forwards and the costate (adjoint) equation backwards. The structure of the problem, and especially the absence of constraints on the final state, make this possible under Assumption 1.1. Final-state constraints can be handled by the application of penalty functions thereby transforming the problem into one without final-state constraints, as a forthcoming example will illustrate. Its step size is determined by the Armijo procedure Armijo66; Polak97. To our knowledge it is the first such general algorithm which is defined entirely in the space of relaxed controls without projecting the results of each iteration into the space of ordinary controls. The following results will be established.

• The aforementioned search direction yields descent of the cost functional (2) even though it is not defined in explicit gradient terms. It is a form of gradient projection. For a class of problems (including autonomous switched-mode systems) its computation is straightforward and simpler than that of direct gradients.

• The algorithm is stable in the sense that it yields descent in the performance integral regardless of the initial guess, Furthermore, the simulation results presented in Section 4 indicate its rapid descent at the early stages of its runs. By this we do not claim a high rate of asymptotic convergence since it is a first-order algorithm, but rather that most of its descent is obtained in few early iterations requiring meagre computing (CPU) times.

• The combination of concepts and techniques from the settings of optimality functions and relaxed controls yields an analysis framework that is based on simple arguments. This point will become evident from the simplicity and brevity of the forthcoming proofs.

Whereas the area of numerical techniques for optimal control has had a long history (e.g., Polak97 and references therein), recently there has been a considerable interest in optimal control problems defined on switched-mode hybrid systems. In this context the problem was formulated in Branicky98; Piccoli98, and variants of the Maximum Principle were derived for it in Piccoli98; Sussmann99; Shaikh02; Garavello05; Bengea05; Shaikh07; Taringoo13; Taringoo13a. New and emerging algorithmic approaches include first- and second-order techniques Xu02; Xu02a; Shaikh02; Egerstedt06; Caldwell11; Vasudevan12, zoning algorithms based on the geometric properties of the underlying systems Shaikh03; Caines05; Shaikh05; Shaikh07, projection-based algorithms Caldwell10; Caldwell12; Caldwell12a; Caldwell12b; Vasudevan12; Caldwell16, methods based on dynamic programming and convex optimization Hellund99, and algorithms based on needle variations Attia05; Egerstedt06; Axelsson08; Gonzalez10; Taringoo13a. Concerning the relaxed hybrid problem, Ref. Ge75 developed generalized-linear programming techniques and convex-programming algorithms, Bengea05 derived optimality conditions (both necessary and sufficient) for a class of hybrid optimal control problems, and Meyer12 applied to them the MATLAB fmincon nonlinear programming solver. Comprehensive recent surveys can be found in Zhu11; Lin14.

The forthcoming algorithm will be presented and analyzed in the abstract problem formulation of Eqs. (1) and (2) and their extensions to the relaxed-control setting. However, for implementation, we restrict the class of problems in the following two ways: 1). The pointwise minimizer of the Hamiltonian can be computed or adequately estimated by a simple formula. 2). For every , the dynamic response function in Eq. (1) is affine in , and the cost function in Eq. (2) is convex in . Many problems of theoretical and practical interest in optimal control satisfy these restrictions. These include problems defined on autonomous switched-mode systems and other hybrid systems, which will be shown to admit efficient implementations of the algorithm. Other kinds of hybrid systems are not yet included, and these will be mentioned in the sequel as a subject of current research.

The rest of the paper is organized as follows. Section 2 presents brief surveys of the frameworks of relaxed controls and optimality functions. Section 3 describes the algorithm and derives related theoretical results, while Section 4 presents simulation results. Finally, Section 5 concludes the paper and points out directions for future research.

The term control refers to the function and is denoted by the boldface symbol to distinguish it from a point in the set which is denoted by the lower-case or . Similarly, boldface notation refers to a function of as in for the state trajectory, for the costate (adjoint) trajectory, for a relaxed control defined as a function from into the space of probability measures on , etc.

## 2 Review of Established Results

This section recounts the basic framework of relaxed controls and some fundamental notions of algorithms’ convergence in infinite-dimensional spaces.

### 2.1 Relaxed Control

Comprehensive presentations of the theory of relaxed controls and their role in optimal control can be found in McShane67; Warga72; Young69; Gamkrelidze78; Berkovitz13; also see Lou09 for a recent survey. In the following paragraphs we summarize its main points that are relevant to our discussion. Let denote the space of Borel probability measures on the set , and denote by a particular point in . A relaxed control associated with the system (1) is a mapping which is measurable in the following sense: For every continuous function , the function is Lebesgue measurable in . We denote the space of relaxed controls by , and in accordance with previous notation we denote a relaxed control by .

Recall that an ordinary control is admissible if the function is (Lebesgue) measurable. Note that the space of ordinary controls is embedded in the space of relaxed controls by associating with the Dirac probability measure at . In this case we say, with a slight abuse of notation, that is an ordinary control, and indicate this by the notation . Furthermore, the space of ordinary controls is dense in the space of relaxed controls in the weak-star topology, namely in the following sense: For every relaxed control there exists a sequence of ordinary controls such that, for every function ,222 is the space of functions that are measurable and absolutely integrable in on for every , and continuous on for every .

 limk→∞∫tf0ψ(t,uk(t))dt = ∫tf0∫Uψ(t,u)dμ(t)dt.

Furthermore, the space of relaxed controls is compact in the weak-star topology.

An extension of the system defined by Equations (1) and (2) to the setting of relaxed controls is provided by the state equation

 ˙x(t)=∫Uf(x(t),u)dμ(t) (3)

with the same boundary condition as for (1), and the related cost functional defined as

 J(μ)=∫tf0∫UL(x(t),u)dμ(t)dt. (4)

The relaxed optimal control problem is to minimize over .

There are two noteworthy special cases. First, if , then Equations (3) and (4) are reduced to Equations (1) and (2), respectively. Second, in the case where is a finite set, (3) and (4) have the following respective forms: and , with and , and this corresponds to the case of autonomous switched-mode systems. The space of relaxed controls generally is convex even though the set need not be convex.

Essential parts of the theory of optimal control, including the Maximum Principle Vinter00, apply to the relaxed-control problem (see also Young69; McShane67; Warga72; Gamkrelidze78; Berkovitz13). Thus, defining the adjoint (costate) variable by the equation

 ˙p(t)=−∫U(∂f∂x(x(t),u)⊤p(t)+∂L∂x(x(t),u)⊤)dμ(t) (5)

with the boundary condition , the Hamiltonian has the form

 H(x(t),μ(t),p(t)) =∫U(p(t)⊤f(x(t),u)+L(x(t),u))dμ(t), (6)

and the Maximum Principle states that if is a minimum for the relaxed optimal control problem then minimizes the Hamiltonian at almost every time-point .333Note that the integrand of (6) is the usual Hamiltonian , while the term in the Left-hand Side of (6), namely , refers to the relaxed Hamiltonian. These two notations are distinguished by their second variable, vs. , which suffices to render the usage of the functional term unambiguous in the sequel.

### 2.2 Infinite-dimensional Optimization

It is a common practice to characterize convergence of algorithms for infinite-dimensional optimization problems in terms of optimality functions Polak97. Consider the abstract optimization problem of minimizing a function where is a topological space, and consider an optimality condition (necessary or sufficient) associated with this optimization problem. An optimality function is a function having the property that if and only if satisfies the optimality condition. The optimality-function concept is useful if is a meaningful heuristic measure of the extent to which fails to satisfy the optimality condition. For example, if is a continuously Frechet-differentiable functional defined on a Hilbert space , then an optimality condition is ( meaning the Frechet derivative), and a meaningful associated optimality function is , where the indicated norm is in .

Reference Polak97 devised a framework for analysis of algorithms in this abstract setting, where convergence is defined as follows: If an algorithm computes a sequence , , of points in then

 limk→∞θ(vk)=0. (7)

In this abstract setting such a sequence need not have an accumulation point even if it is a bounded sequence in a metric space (unless it is isomorphic to a Euclidean space), and therefore a characterization of an algorithm’s convergence in terms of such accumulation points could be vacuous. The use of optimality functions via Equation (7) serves to resolve this conceptual issue. Note that it is a form of weak convergence.

Consider an algorithm that computes from a given the next iteration point, denoted by , and suppose that its repetitive application computes a sequence of iteration points , where . The algorithm is said to have the property of sufficient descent with respect to if the following two conditions hold: (i) for every , and (ii) for every there exists such that for every , if then . For such an algorithm, the following result is a straightforward corollary of Theorem 1.2.8 in Polak97 and hence its proof is omitted.

###### Proposition 2.1

Suppose that is bounded over . If an algorithm is of sufficient descent then it is convergent in the sense of (7).

Refs. Caldwell10; Caldwell11; Caldwell12; Caldwell16; Gonzalez10; Vasudevan12; Egerstedt06; Axelsson08 used the framework of optimality functions and sufficient descent to define and analyze their respective algorithms for the switched-mode optimal control problem, where is the space of admissible controls , and typically is related to the magnitude of the steepest feasible descent-direction vector. In contrast, the optimality function defined in this paper is based on the Hamiltonian rather than the steepest descent or any explicit form of a derivative.

Consider the relaxed control problem defined by Equations (3) and (4). Given a relaxed control , let and be the associated state trajectory and costate trajectory as defined by (3) and (5), respectively. We use the following optimality function, :444In this and later equations we drop the explicit notational dependence of various integrand-terms on when no confusion arises.

 θ(μ) = minν∈M∫tf0(H(x,ν,p)−H(x,μ,p))dt, (8)

where the Hamiltonians in the integrand of (8) were defined in (6). Recall that and in (8) are associated with and hence are independent of ; therefore, the compactness of the space of relaxed controls implies that the minimum (not only inf) in (8) exists. Let denote an argmin, then (8) becomes

 θ(μ)=∫tf0(H(x,μ⋆,p)−H(x,μ,p))dt. (9)

We observe that this optimality function satisfies the aforementioned properties with respect to the Maximum Principle: Obviously for every ; if and only if minimizes the Hamiltonian at almost every and hence satisfies the Maximum Principle; and arguably indicates the extent to which the Maximum Principle is not satisfied at .

## 3 Hamiltonian-based Algorithm

The analysis in this section is carried out under Assumption 1.1.

Given a relaxed control , let and denote the related state trajectory and costate trajectory defined by (3) and (5), respectively. For these state and costate, and for every , consider the Hamiltonian , defined by (6), as a function of its second variable. Fix another relaxed control . Now for every , is also a relaxed control, and we denote it by . Furthermore, let denote the state trajectory associated with as defined by (3), namely,

 ˙xλ(t )= λ∫Uf(xλ(t),u)dν(t) +(1−λ)∫Uf(xλ(t),u)dμ(t), (10)

and define , where is defined by (4), namely

 ~J(λ) =∫tf0(λ∫UL(xλ(t),u)dν(t) +(1−λ)∫UL(xλ(t),u)dμ(t))dt. (11)

The algorithm described in this section is based on moving from in the direction of by choosing a step size , and therefore, we next characterize those that provide a direction of descent.

###### Proposition 3.1

The one-sided derivative exists and has the following form,

 d~Jdλ+(0) = ∫tf0(H(x,ν,p)−H(x,μ,p))dt, (12)

where all the terms in the integrand in the Right-Hand Side (RHS) of (12) are functions of time.

Proof. Consider the Right-Hand Sides (RHS) of Equations (10) and (11) as functions of and , for given relaxed controls and . By Assumption 1.1 these functions are twice-continuously differentiable () in , and by (10) and (11) they are linear in and hence as well. Therefore, standard variational techniques show that the function is differentiable on and its derivative has the following form,

 d~Jdλ(λ) =∫tf0(pλ(t)⊤∫Uf(xλ(t),u)(dν(t)−dμ(t)) +∫UL(xλ(t),u)(dν(t)−dμ(t)))dt, (13)

where is the costate associated with this derivative. Furthermore, by (10), it is seen that is given by Equation (5) with instead of .

Next, for , we have that , , and , and therefore, with in (13) we obtain,

 d~Jdλ+(0) =∫tf0(p(t)⊤∫Uf(x(t),u)(dν(t)−dμ(t)) (14)

By Equation (6) the RHS of (14) is identical to the RHS of (12).

Equation (12) implies that is a descent direction from if the RHS of (12) is negative. This is the case if is a pointwise minimizer of the Hamiltonian over at each time , unless for almost every . In this case, let us use the notation for a pointwise minimizer of the Hamiltonian. The following result implies that the pointwise search for such a minimizer can be confined to and need not be extended to , the space of Borel probability measures on .

###### Proposition 3.2

Fix and . Let . Then, for every probability measure ,

 H(x,u⋆,p)≤H(x,ν,p). (15)

Note that the Left-hand side (LHS) of (15) is the usual Hamiltonian while its RHS is the relaxed Hamiltonian defined by (6).

Proof. By (6) and the fact that minimizes the ordinary Hamiltonian over , we have, for every ,

 H(x,ν,p)=∫UH(x,u,p)dν ≥∫UH(x,u⋆,p)dν=H(x,u⋆,p). (16)

We point out that the point in the statement of Proposition 3.2 is the pointwise minimizer of the Hamiltonian over , for given and . Such a minimizer exists and the minimum is finite since, by assumption the set is compact, and by Assumption 1.1 the function is continuous in .

Let be a relaxed control, and let and be the associated state trajectory and costate trajectory as defined by Equations (3) and (5), respectively. For every , let . It does not mean that the function , is an admissible control since it might not be Lebesgue measurable. On the other hand, we have seen that there exists a relaxed control that minimizes the RHS of (8) over , and hence minimizes over (the space of Borel probability measures on ) for almost every .

Ideally we would like to choose such as the descent direction of the algorithm from , but its computation may be fraught with difficulties for the following two reasons: (i) for a given , may not be a Dirac measure at a point in , and (ii) The pointwise minimizer has to be computed for every in the infinite set . Therefore we choose as descent direction a relaxed control having the following two properties: (i) where is a piecewise-constant ordinary control, and (ii) “almost” minimizes the Hamiltonian in the following sense: For a given a constant which we fix throughout the algorithm (below),

 ∫tf0(H(x,ν,p)−H(x,μ,p))dt≤ηθ(μ); (17)

and are all functions of time. We label such an -minimizer of the Hamiltonian. It will be seen that it is always possible to choose a relaxed control with these two properties as long as . With this direction of descent, the algorithm uses the Armijo step size Armijo66; Polak97. It has the following form.

Given constants , , and .

###### Algorithm 3.3

Given , compute by the following steps.

Step 0: If , set , then exit.
Step 1: Compute the state and costate trajectories, and , associated with , by using Equations (3) and (5), respectively.
Step 2: Compute a relaxed control which is an -minimizer of the Hamiltonian, namely it satisfies Equation (17).
Step 3: Compute the integer defined as follows,

 ℓμ=min{ℓ=0,1,…,: (18)

Define .
Step 4: Set

 μnext=μ+λμ(ν−μ). (19)

A few remarks are due.

1). The algorithm is meant to be run iteratively and compute a sequence such that , as long as it does not exit in Step 0.

2). The algorithm does not attempt to solve a two-point boundary value problem. In Step 1 it first integrates the differential equation (3) forward from the initial condition , and then the differential equation (5) backwards from the specified terminal condition . We do not specify the particular numerical integration technique that should be used, but say more about it in the sequel.

3). We do not specify the choice of in Step 2 but rather leave it to the user’s discretion. However, we point out that such an -minimizer of the Hamiltonian always exists in the form of for a piecewise-constant ordinary control , unless . The reason is that the space of ordinary controls is dense in the space of relaxed controls in the weak star topology on , and the space of piecewise-constant ordinary controls is dense in the space of ordinary controls in the norm and hence in the weak-star topology.

4). In Step 4, is a convex combination of and in the sense of measures, meaning that for every continuous function ,

 ∫Ug(u)dμnext(t)=∫Ug(u)((1−λμ)dμ(t)+λμdν(t)).

In the event that and , this means that

 ∫Ug(u)dμnext(t)=(1−λμ)g(u(t))+λμg(v(t)),

which does not necessarily imply that .

5). The Armijo step size, computed in Step 3, is commonly used in nonlinear programming as well as in infinite-dimensional optimization. Reference Polak97 contains analyses of several algorithms using it and practical guidelines for its implementation.

6). It will be proven that the integer defined in Step 3 is finite as long as , hence the algorithm cannot jam at a point (relaxed control) that does not satisfy the Maximum Principle, namely the condition .

We point out that the idea of a descent direction comprised of a pointwise minimizer of the Hamiltonian has its origin in Mayne75. The algorithm proposed in that reference is defined in the setting of ordinary controls, its descent direction is all the way to a minimizer of the Hamiltonian at a subset of the time-horizon , and its main convergence result is stated in terms of accumulation points of computed iterate-sequences. The algorithm in this paper is quite different in that it is defined in the setting of relaxed controls, it moves part of the way towards a minimizer of the Hamiltonian throughout the entire time horizon, and its analysis is carried out in the context of optimality functions.

We next establish convergence of Algorithm 3.3.

###### Proposition 3.4

Suppose that Assumption 1.1 is satisfied. Let be a sequence of relaxed controls computed by Algorithm 3.3, such that for every , . Then,

 limk→∞θ(μk)=0. (20)

The proof is based on the following lemma, variants of which have been proved in Polak97 (e.g., Theorem 1.3.7). We supply the proof in order to complete the presentation.

###### Lemma 3.5

Let be a twice-continuously differentiable ) function. Suppose that , and there exists such that for every . Fix , and define . Then for every positive ,

 g(λ)−g(0) ≤ αλg′(0). (21)

Proof. Recall (see Polak97, Eq. (18b), p.660) the following exact second-order expansion of functions,

 g(λ) = g(0)+λg′(0)+λ2∫10(1−s)g′′(sλ)ds. (22)

Using this and the assumption that , we obtain that

 g(λ)−g(0)−αλg′(0) = (1−α)λg′(0)+λ2∫10(1−s)g′′(sλ)ds ≤ (1−α)λg′(0)+λ2K/2 = λ((1−α)g′(0)+λK/2). (23)

For every positive , , and hence, and by (23), Equation (21) follows.

Proof of Proposition 3.4.  Let and be any two relaxed controls, and for , consider as defined by Equation (11). We next show that is a twice-continuously differentiable function of , and there exists such that, for all , , and ,

 ∣∣d2~Jdλ2(λ)∣∣≤K. (24)

This follows from variational arguments developed and summarized in Polak97 as follows. First, consider two ordinary controls, and , and define for every . Denote by the state trajectory associated with as defined by (1), and let be the cost functional, defined by (2), as a function of . By Assumption 1.1, an application of Corollary 5.6.9 and Proposition 5.6.10 in Polak97 to variations in yields that is continuously differentiable in , and the term is bounded from above over all ordinary admissible controls , , and . A second application of these arguments to the derivative , supported by the assumption (Assumption 1.1), yields that is twice continuously differentiable and its second derivative also is bounded from above over all ordinary admissible controls , , and .

By the Lebesgue Dominated Convergence Theorem, the same result holds true when and are replaced by two respective relaxed controls, and , and the state equation and cost function are defined by Equations (3) and (4), respectively. This shows that is in , and there exists , independent of , , and , such that Equation (24) is satisfied.

Let us apply this result to and where is a given relaxed control and is defined in Step 2 of Algorithm 3.3. Recall the constant that is used by the algorithm, and define By Lemma 3.5, for every positive

 ~J(λ)−~J(0)≤αλd~Jdλ+(0). (25)

Recall that , and therefore, according to the notation in the RHS of (18), ; consequently, if , then (25) is satisfied with , namely,

 J(μ+βℓ(μ⋆−μ))−J(μ)≤αβℓd~Jdλ+(0). (26)

By Proposition 3.1 (Equation (12)) and the choice of in Step 2 (Equation (17)), the RHS of (26) implies that

 J(μ+βℓ(μ⋆−μ))−J(μ)≤αβℓηθ(μ) (27)

as long as . But (12), (17), and the fact that imply that and hence ; consequently (27) is satisfied as long as . Therefore, and by Equation (18), the step size defined in Step 3 satisfies the inequality

 λμ:=βℓμ≥βηγ|θ(μ)|. (28)

Next, Equations (18) and (19) imply that

 J(μnext)−J(μ)≤αλμηθ(μ), (29)

and hence, and by (28) and the fact that , we have that

 J(μnext)−J(μ)≤−αβγη2θ(μ)2. (30)

But is independent of or , and hence Equation (30) implies that the algorithm is of a sufficient descent.

Finally, the set is compact by assumption, and therefore standard applications of the Bellman-Gronwall inequality and Equation (4) yield that is upper-bounded over all . Consequently Proposition 2.1 implies the validity of Equation (20).

A note on implementation. Implementations of Algorithm 3.3 generally require numerical integration methods for computing (or approximating) and in Step 1, and a computation of in Step 2 that is based on the pointwise minimization of the Hamiltonian at a finite number of points in the time-horizon . Both require finite grids on the time horizon, which may be different and can vary from one iteration to the next. The choice of the grid sizes generally comprises a balance between precision and computing times. A rule-of-thumb proposed in Polak97 is to adjust the grid adaptively by tightening it whenever it is sensed that a local optimum is approached. This adaptive-precision technique underscores Polak’s algorithmic framework of consistent approximation for infinite-dimensional optimization while guaranteeing convergence in the sense of Eq. (20).

In this paper we are not concerned with the formal rules for adjusting the grids (and hence precision levels). Instead, we run the algorithm several times per problem, each with a fixed grid, to see how far it can minimize the cost functional. The goal of this experiment is to test the tradeoff between precision and computing times. The results, presented in the next section, indicate rapid descent from the initial guess regardless of how far it is from the optimum. In fact, the key argument in the proof of convergence is the sufficient descent property of the algorithm, captured in Equation (30), which implies large descents until an iteration-sequence approaches a local minimum. This suggests that the main utility of the algorithm is not in the asymptotic convergence close to a minimum (where higher-order methods can be advantageous), but rather in its approach to such points. For a detailed discussion of this point and some comparative results please see Section 4.

Another point related to implementation concerns the algorithm’s having to compute relaxed controls, which are more complicated objects than ordinary controls. To address this concern we next examine a class of systems where the state equation is affine in and the cost function is convex in .

Consider the case where

 f(x,u)=ϕf(x)+Ψf(x)u, (31)

where the functions and (the latter being the space of matrices) satisfy Assumption 1.1. By the linearity of the integration operator and the fact that is a probability measure, Equation (3) assumes the form

 ˙x(t)=f(x(t),∫Uudμ(t)), (32)

meaning that, for the purpose of computing the state trajectory, the convexification of the vector field inherent in (3) yields the same results as a convexification of the control. Defining , the state equation becomes , and we can view as a control function from into . Likewise, if with and , then (4) becomes

 J(μ)=∫tf0L(x,¯u)dt. (33)

In this case the relaxed optimal control problem is cast as an ordinary optimal control problem with the input constraints . Moreover, if Algorithm 3.3 starts at an ordinary control on then it could compute only such ordinary controls. The reason is that in Step 2 can always be an ordinary control (as earlier said), and the convexification of measures in Eq. (19) can be carried out by the convexification of ordinary controls. Therefore, if in Eq. (19), and for ordinary admissible controls (on ), then (19) yields

 μnext∼u+λμ(v−u), (34)

which is an ordinary admissible control on . This is the case of autonomous switched-mode systems, as will be demonstrated in Section 4.

Consider next the case where is affine in as in Equation (31), and is convex in for every . Then Equation (32) is true but (33) and hence (34) are not true. Therefore, supposing that and for ordinary controls (on ), Equation (19) yields that is a relaxed control but not an ordinary control on . However, the convexity of in in conjunction with (32) imply that, for every

 J(u+λ(v−u))≤J(u)+λμ(J(v)−J(u)) =J(μ)+λμ(J(ν)−J(μ)). (35)

In this setting, Algorithm 3.3 uses the convexified cost in Step 3 (Eq. (18)), but by (35), we would get a lower value by convexifying the control. Consequently, the inequality in (18) would imply a similar an inequality with the convexified control, as in Eq. (36), below. Modifying Algorithm 3.3 accordingly, the following algorithm results.

Given constants , , and .

###### Algorithm 3.6

Given such that is an admissible control on , compute by the following steps.

Step 0: If , set , then exit.
Step 1: Compute the state and costate trajectories, and , associated with , by using Equations (3) and (5), respectively.
Step 2: Compute an -minimizer of the Hamiltonian, , such that is an ordinary control on .
Step 3: Compute the integer defined as follows,

 ℓμ=min{ℓ=0,1,…,: J(u+βℓ(v−u))−J(u)≤αβℓηθ(μ)}. (36)

Define .
Step 4: Set

 μnext∼u+λμ(v−u). (37)

Note that, by (34) and (37), an iterative application of Algorithm 3.6 would compute only ordinary controls that are admissible on . Furthermore, as discussed earlier, by Equation (35), if the Armijo test in Equation (18) were to be satisfied for a given then it would be satisfied in (36) as well and result in a lower descent in . Since this descent in (18) yields the key condition of uniform descent for Algorithm 3.3, it also holds for Algorithm 3.6 thereby guaranteeing its convergence via a verbatim application of Proposition 3.4.

## 4 Simulation Results

This section reports on applications of Algorithm 3.3 and Algorithm 3.6 to three problems: an autonomous switched-mode problem, a controlled switched-mode problem, and a problem of balancing motion energy with transmission energy in a mobile network.555The code was written in MATLAB and executed on a laptop computer with an Intel i7 quad-core processor, clock frequency of 2.1 GHz, and 8GB of RAM. In the first problem the state equation and the cost function are affine in and hence Algorithm 3.3 is identical to Algorithm 3.6; in the second problem the cost function is not affine in and hence we use Algorithm 3.6; and in the third problem the state equation is affine in and the cost function in convex in and hence we can use Algorithm 3.6. The first two problems were considered in Vasudevan12, and we use its reported results as benchmarks for our algorithms. The third problem was addressed in Jaleel13, but we choose here an initial guess that is farther from the optimum. As stated earlier the efficiency of the algorithm depends on the ease with which the pointwise minimizer of the Hamiltonian can be computed, and for all three problems it will be shown to be computable via a simple formula. Since the resulting function is an ordinary control, we use in Step 2.

### 4.1 Double Tank System

Consider a fluid-storage tank with a constant horizontal cross section, where fluid enters from the top and discharged through a hole at the bottom. Let denote the fluid inflow rate from the top, and let be the fluid level in the tank. According to Toricelli’s law, the state equation of the system is . The system considered in this subsection is comprised of two such tanks, one on top of the other, where the fluid input to the upper tank is from a valve-controlled hose at the top, and the input to the lower tank consists of the outflow process from the upper tank. Denoting by the inflow rate to the upper tank, and by the fluid levels in the upper tank and lower tank, respectively, the state equation of the system is

 ˙x=(u−√x1√x1−√x2); (38)

we assume the initial condition . The control input is assumed to be constrained to the two-point set , and hence the system can be viewed as an autonomous switched-mode system whose modes correspond to the two possible values of . The considered problem is to have the fluid level at the lower tank track the value of , and accordingly we choose the cost functional to be

 J=2∫tf0(x2−3)2dt. (39)

As in Vasudevan12, the final time is .

By (38), is affine in , and by (39), does not depend on and hence can be considered affine. Therefore Algorithm 3.6 can be run as a special case of Algorithm 3.3, where . Moreover, with the costate , the Hamiltonian has the form

 H(x,u,p)=p1u−p1√x1+p2(√x1−√x2)+2(x2−3)2,

whose pointwise minimizer is

 u⋆={1,if p1≥0,2,if p1<0;

if then can be any point in the interval .

We ran Algorithm 3.6 starting from the initial control , having the cost . All numerical integrations were performed by the forward Euler method with , and we approximate by its zero-order hold with the sample values , . We benchmark the results against the reported run of the algorithm in Vasudevan12 which, starting from the same initial control, obtained the final cost of 4.829; we reached a similar final cost. In fact, 100 iterations of our algorithm reduced the cost from to in seconds of CPU time.

Figure 1 depicts the graph of vs. the iteration count . The graph indicates a rapid reduction in the cost from its initial value until it stabilizes after about iterations. The L-shaped graph is not atypical in applications of descent algorithms with Armijo step sizes, whose strength lies in its global stability and large strides towards local solution points at the initial stages of its runs. As a matter of fact, similar L shaped graphs were obtained from all of the algorithm runs reported on in this section.

The final relaxed control computed by Algorithm 3.6, , was projected onto the space of admissible (switched-mode) controls by Pulse-Width Modulation (PWM) with cycle time of seconds. The resulting cost value is , and the combined runs of the algorithm and the projection took 2.6825 seconds of CPU time. We point out that the projection had to be performed only once, after Algorithm 3.6 had completed its run.

Returning to the run of Algorithm 3.6, the L-shaped graph in Figure 1 suggests that a reduction in CPU times can be attained, if necessary, by computing fewer iterations. Moreover, further reduction can be obtained by taking larger integration steps without significant changes in the final cost. To test this point we ran the algorithm from the same initial control for 50 iterations with , and it reduced the cost-value from to in 0.2939 seconds of CPU time; including the projection onto the space of switched-mode controls it reached in a total time of 0.3043 seconds. With a larger integration step, , the algorithm yielded a cost-reduction from to in 0.1566 seconds of CPU time, and in a total time of 0.1655 seconds. These results are summarized in Table 1.

The CPU times indicated in Table 1 are less than the run-time reported in Vasudevan12 for solving the same problem (32.38 seconds). However, these numbers should not be considered as a sole basis for comparing the two techniques since the respective algorithms were implemented on different hardware and software platforms.666Ref. Vasudevan12 refers to a software package for its algorithm, but we were unable to run it because apparently it is linked to a proprietary code. Therefore we were unable to conduct a direct comparison between the two algorithms. Furthermore, the algorithm in Vasudevan12 has a broader scope than ours, while our code is specific for the problem in question. The only conclusion we draw from Table 1 is that Algorithm 3.6 may have merit and deserves further investigation. We believe, however, that our choice of the descent direction, namely the pointwise minimizer of the Hamiltonian, plays a role in the fast run times as well as simplicity of the code as compared with explicit-gradient techniques.

We close this discussion with a comment on convergence of the algorithm in the control space. Algorithm 3.6 (as well as Algorithm 3.3) is defined in the space of relaxed controls where its convergence is established in the weak star topology. Therefore, there is no reason to expect the sequence of computed controls, , to converge in any (strong) functional norm such as . In fact, the graphs of for , are depicted in Figure 2, where no strong convergence is discerned. However, the weak convergence proved in Section 3 suggests that the cost-sequence would converge to the minimal cost, and this indeed is evident from Figure 1. Furthermore, the associated sequence of state trajectories are expected to converge in a strong sense ( norm) to the state trajectory of the optimal control, and this is indicated by Figure 3 depicting the graphs of (the fluid levels at the lower tank) for .

### 4.2 Hybrid LQR

This problem was considered in Vasudevan12 as well. Consider the switched-linear system , where ,

 A = ⎛⎜⎝1.0979−0.01050.0167−0.01051.04810.08250.01670.08251.1540⎞⎟⎠,

is constrained to a finite set , and is a continuum-valued input. We assume the initial condition to be . The set consists of three points, namely , with , , and . The continuum-valued control is constrained to .

All three modes of the system are unstable since the eigenvalues of are in the right-half plane, and the problem is to switch among the vectors and choose in a way that brings the state close to the target state in a given amount of time and in a way that minimizes the input energy. The corresponding cost functional is

 J=0.01∫tf0v2dt+∥x(tf)−xf∥2 (40)

with , and the control variable is . Reference Vasudevan12 solved this problem from the initial control of and for all , and attained the final cost of in (reported) seconds of CPU time.

Since is not affine in (due to the term ) Algorithm 3.6 may not be applicable and hence we used Algorithm 3.3. We chose the same initial guess as in Vasudevan12 and ran the algorithm for 20 iterations. The results indicate a similar L-shaped graph of to the one shown in Figure 1, and attained a cost-value reduction from to in seconds of CPU time. All integrations were performed by the forward Euler method with , and the boundary condition of Equation (5) was due to the cost on the final state.777The cost functional in (40) is not quite in the form of (2) due to the addition of the final-state cost term . However, using the standard transformation of a Bolza optimal control problem (as in (40)) to a Lagrange problem (as in (2)) and applying the algorithm to the latter, requires only one change, namely setting the boundary condition of the costate to ; all else remains the same. The Hamiltonian at a control has the form , and hence its minimizer over , , is computable as follows: For every , , define according to the following three contingencies: (i) if , then ; (ii) if , then and (iii) if