Symmetry reduction for dynamic programming

Abstract

We present a method for exploiting symmetries of discrete-time optimal control problems to reduce the dimensionality of dynamic programming iterations. The results are derived for systems with continuous state variables, and can be applied to systems with either continuous or discrete symmetry groups. We prove that symmetries of the state update equation and stage costs induce corresponding symmetries of the optimal cost function and the optimal policies. We then provide a general framework for computing the optimal cost function based on gridding a space of lower dimension than the original state space. The method does not require algebraic manipulation of the state update equations; it requires only the ability to evaluate the state update map and to verify the symmetries it possesses, which makes it applicable to a wide range of problems. We illustrate these results on two six-dimensional optimal control problems that are computationally difficult to solve by dynamic programming without symmetry reduction.


A preliminary version of this work was presented at the 2017 American Control Conference. Corresponding author: J. Maidens.

John Maidens (Berkeley), Axel Barrau (Safran), Silvère Bonnabel (Mines ParisTech), and Murat Arcak (Berkeley)

Keywords: Optimal control; Dynamic programming; Invariant systems; Model reduction; Path planning; Medical applications

1 Introduction

The dynamic programming algorithm for computing optimal control policies has, since its development, been known to suffer from the “curse of dimensionality” (Bellman, 1957). Its applicability in practice is typically limited to systems with four or five continuous state variables because the number of points required to grid a space of $n$ continuous state variables increases exponentially with the state dimension $n$. This complexity has led to a collection of algorithms for approximate dynamic programming, which scale to systems with larger state dimension but lack the guarantees of global optimality of the solution associated with the original dynamic programming algorithm (Bellman and Dreyfus, 1959; Bertsekas, 2012; Powell, 2007, 2016).

In practice, many real-world systems exhibit symmetries that can be exploited to reduce the complexity of system models. Symmetry reduction has found applications in fields ranging from differential equations (Clarkson and Mansfield, 1994; Bluman and Kumei, 2013) to model checking (Emerson and Sistla, 1996; Kwiatkowska et al., 2006). In control engineering, symmetries have been exploited to improve control of mechanical systems (Marsden et al., 1990; Bloch et al., 1996; Bullo and Murray, 1999), develop more reliable state estimators (Barrau and Bonnabel, 2014), study the controllability of multiagent systems (Rahmani et al., 2009) and reduce the complexity of stability and performance certification for interconnected systems (Arcak et al., 2016; Rufino Ferreira et al., 2017). Symmetry reduction has also been applied to the computation of optimal control policies for continuous-time systems in (Grizzle and Marcus, 1984; Ohsawa, 2013) and for Markov decision processes (MDPs) in (Zinkevich and Balch, 2001; Narayanamurthy and Ravindran, 2007).

In this paper, we present a theory of symmetry reduction for the optimal control of discrete-time stochastic nonlinear systems with continuous state variables. This reduction allows dynamic programming to be performed in a lower-dimensional state space. Since the computational complexity of a dynamic programming iteration increases exponentially with the state dimension, this reduction significantly decreases the computational burden. Further, our proposed method does not rely on an explicit transformation of the state update equations, making it applicable in situations where such a transformation is difficult or impossible to find analytically.

We present two theorems that summarize our method of symmetry reduction. Theorem 3.1 describes how symmetries of the system dynamics imply symmetries of the optimal cost and optimal policy functions. Theorem 3.2 then describes a method of computing the cost function based on a reduced coordinate system that depends on fewer state variables.

This paper builds on the work we presented in the conference paper (Maidens et al., 2017). The most substantial improvement is the additional theoretical results presented in Sections 2.3 and 3.2. The conference version presented an ad hoc symmetry reduction for a magnetic resonance imaging (MRI) application, but did not provide a general methodology for computing the coordinate reduction. This paper addresses this shortcoming by presenting a general method based on the moving frame formalism, which leads to the general symmetry reduction formula presented in Theorem 3.2. Additionally, the MRI example has been reworked to match this new formalism, and the numerical implementation and graphs of the numerical solution have been improved. We have also included two new extensions of this formalism to the case of equivariant costs in Section 3.3 and to the synchronization problem of stochastic dynamic systems on matrix groups in Section 3.4, along with examples to illustrate the algorithm in these contexts.

This paper is organized as follows: in Section 2 we introduce notation and provide background information both on dynamic programming for optimal control, and on the mathematical theory of symmetries. In Section 3, we derive our main theoretical results, that is, we prove that control system symmetries induce symmetries of the optimal cost function and optimal control policy, and then leverage the result to present a general method of performing dynamic programming in reduced coordinates. In Section 4 we apply the algorithm to a cooperative control problem for two Dubins vehicles using a Lie group formulation. In Section 5 we apply symmetry reduction to compute the solution of an optimal control problem arising in dynamic MRI acquisition. Code to reproduce the computational results in this paper is available at https://github.com/maidens/Automatica-2017.

2 Dynamic Programming and Symmetries

In this section, we first recall the main features of dynamic programming for optimal control of stochastic discrete-time systems. Then we introduce our problem and provide the reader with a primer on the classical theory of symmetries. We also introduce the notion of invariant control systems with invariant costs.

2.1 Dynamic programming for optimal control of stochastic systems

We begin by introducing dynamic programming for finite horizon optimal control following the notation of (Bertsekas, 2005). We consider a discrete-time dynamical system

$$x_{k+1} = f_k(x_k, u_k, w_k), \qquad k = 0, 1, \dots, N-1 \tag{1}$$

where $x_k \in \mathcal{X} \subseteq \mathbb{R}^n$ is the system state, $u_k \in \mathcal{U}$ is the control variable to be chosen at time $k$, $w_k \in \mathcal{W}$ are independent continuous random variables each with density $p_{w_k}$, and $N$ is a finite control horizon. Associated with this system is an additive cost function

$$\mathbb{E}\left[ g_N(x_N) + \sum_{k=0}^{N-1} g_k(x_k, u_k) \right]$$

that we wish to minimize through our choice of $u_0, \dots, u_{N-1}$. We define a control system to be a tuple $(f, g, p_w)$ where $p_w$ is the joint density of the random variables $w_0, \dots, w_{N-1}$.

We consider a class of control policies $\pi = (\mu_0, \dots, \mu_{N-1})$ where $\mu_k : \mathcal{X} \to \mathcal{U}$ maps observed states to admissible control inputs. Given an initial state $x_0$ and a control policy $\pi$, we define the expected cost under this policy as

$$J_\pi(x_0) = \mathbb{E}\left[ g_N(x_N) + \sum_{k=0}^{N-1} g_k(x_k, \mu_k(x_k)) \right].$$

An optimal policy $\pi^*$ is defined as one that minimizes the expected cost:

$$J_{\pi^*}(x_0) = \min_{\pi \in \Pi} J_\pi(x_0),$$

where $\Pi$ denotes the set of all admissible control policies. The optimal cost function, denoted $J^*$, is defined to be the expected cost corresponding to an optimal policy.

As in (Bertsekas, 2005), we use $\min$ to denote the infimum value regardless of whether there is a policy that achieves a minimum. In the example problems presented in Sections 4 and 5, the existence of a minimum is guaranteed by compactness and finiteness arguments, respectively. In the general case, the optimal cost function can be computed using the dynamic programming algorithm regardless of the existence of minimizers, but the existence of an optimal policy requires that a minimum be achieved for each $x_k$.

We quote the following result due to Bellman from (Bertsekas, 2005):

{prop}

[Dynamic Programming] For every initial state $x_0$, the optimal cost $J^*(x_0)$ is equal to $J_0(x_0)$, given by the last step of the following algorithm, which proceeds backward in time from period $N - 1$ to period $0$:

$$\begin{aligned} J_N(x_N) &= g_N(x_N) \\ J_k(x_k) &= \min_{u_k \in \mathcal{U}} \mathbb{E}_{w_k}\left[ g_k(x_k, u_k) + J_{k+1}\left( f_k(x_k, u_k, w_k) \right) \right] \end{aligned} \tag{2}$$

where the expectation is taken with respect to the probability distribution of $w_k$ defined by the density $p_{w_k}$. Furthermore, if $u_k^* = \mu_k^*(x_k)$ minimizes the right-hand side of (2) for each $x_k$ and $k$, then the policy $\pi^* = (\mu_0^*, \dots, \mu_{N-1}^*)$ is optimal.

The intermediate functions $J_k$ for $k = 1, \dots, N - 1$ computed in this manner represent the optimal cost of the tail subproblem beginning at time $k$. The optimal cost of the entire problem is given by the function $J_0$ obtained when this recursion terminates.
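To make the recursion (2) concrete, the following minimal sketch performs the backward iteration on a gridded scalar toy problem. The dynamics, grids and parameter values here are our own illustration, not one of the paper's examples.

```python
import numpy as np

# Toy problem: x_{k+1} = x_k + u_k + w_k with quadratic costs,
# solved by the backward recursion (2) on a fixed state grid.
N = 10                                    # horizon
xs = np.linspace(-5.0, 5.0, 201)          # state grid
us = np.linspace(-1.0, 1.0, 21)           # control grid
ws = np.array([-0.1, 0.0, 0.1])           # discretized noise support
pw = np.array([0.25, 0.5, 0.25])          # noise probabilities

g = lambda x, u: x**2 + u**2              # stage cost
gN = lambda x: 10.0 * x**2                # terminal cost

J = gN(xs)                                # J_N on the grid
policy = []
for k in reversed(range(N)):
    # Q[i, j] = stage cost + expected cost-to-go from xs[i] under us[j]
    xnext = xs[:, None, None] + us[None, :, None] + ws[None, None, :]
    Jnext = np.interp(xnext, xs, J)       # interpolate J_{k+1} off-grid
    Q = g(xs[:, None], us[None, :]) + (Jnext * pw).sum(axis=2)
    policy.insert(0, us[Q.argmin(axis=1)])
    J = Q.min(axis=1)                     # J_k on the grid
```

With $n$ state variables the grid has exponentially many points, which is precisely the curse of dimensionality that the symmetry reduction below mitigates.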

2.2 Invariant system with invariant costs

We first recall the definition of a transformation group for a control system, as in (Martin et al., 2004; Jakubczyk, 1998; Respondek and Tall, 2002). See Olver (1999) for the more general theory.

{defn}

[Transformation group] A transformation group on $\mathcal{X} \times \mathcal{U} \times \mathcal{W}$ is a set of tuples $(\phi_h, \psi_h, \nu_h)$ parametrized by elements $h$ of a Lie group $G$ having dimension $r$, such that the functions $\phi_h : \mathcal{X} \to \mathcal{X}$, $\psi_h : \mathcal{U} \to \mathcal{U}$ and $\nu_h : \mathcal{W} \to \mathcal{W}$ are all diffeomorphisms and satisfy:

  • $\phi_e = \mathrm{id}$, $\psi_e = \mathrm{id}$ and $\nu_e = \mathrm{id}$ when $e$ is the identity element of the group, and

  • $\phi_{h_2 \cdot h_1} = \phi_{h_2} \circ \phi_{h_1}$, $\psi_{h_2 \cdot h_1} = \psi_{h_2} \circ \psi_{h_1}$ and $\nu_{h_2 \cdot h_1} = \nu_{h_2} \circ \nu_{h_1}$ for all $h_1, h_2 \in G$, where $\cdot$ denotes the group operation and $\circ$ denotes function composition.

To simplify notation we will sometimes suppress the subscripts $h$. In the present paper, we will consider the following class of systems and cost functions. {defn} (Invariant control system with invariant costs) A control system $(f, g, p_w)$ is $G$-invariant with $G$-invariant costs if for all $h \in G$, $x \in \mathcal{X}$, $u \in \mathcal{U}$ and $w \in \mathcal{W}$ we have:

$$\begin{aligned} f_k(\phi_h(x), \psi_h(u), \nu_h(w)) &= \phi_h(f_k(x, u, w)) \\ p_{w_k}(\nu_h(w)) \left| \det J\nu_h(w) \right| &= p_{w_k}(w) \\ g_k(\phi_h(x), \psi_h(u)) &= g_k(x, u) \\ g_N(\phi_h(x)) &= g_N(x) \end{aligned}$$

where $J\nu_h$ denotes the Jacobian of $\nu_h$. The rationale is simple: for any fixed $h \in G$, consider the change of variables $\tilde{x}_k = \phi_h(x_k)$, $\tilde{u}_k = \psi_h(u_k)$, $\tilde{w}_k = \nu_h(w_k)$. Then, we have

$$\tilde{x}_{k+1} = \phi_h(f_k(x_k, u_k, w_k)) = f_k(\tilde{x}_k, \tilde{u}_k, \tilde{w}_k)$$

and for the stage costs we also have $g_k(\tilde{x}_k, \tilde{u}_k) = g_k(x_k, u_k)$. As a result, if $u_0, \dots, u_{N-1}$ is a sequence of controls that minimizes the cost starting from $x_0$, then one can expect $\psi_h(u_0), \dots, \psi_h(u_{N-1})$ to minimize the cost starting from $\phi_h(x_0)$, under some assumptions on the noise. Consequently, the optimal control problem need only be solved once for all initial conditions belonging to the orbit $\{\phi_h(x_0) : h \in G\}$, reducing the initial $n$-dimensional problem to an $(n - r)$-dimensional problem. The present paper derives a theory for such symmetry reduction in dynamic programming, and provides various examples of engineering interest.
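The invariance conditions above are straightforward to verify numerically. The following sketch checks them for a toy planar system that is invariant under rotations; the system and all names here are our own illustration, not an example from the paper.

```python
import numpy as np

# Toy system f(x, u, w) = x + u + w on R^2, invariant under the
# rotation group acting identically on states, inputs and noise.
def R(a):
    return np.array([[np.cos(a), -np.sin(a)],
                     [np.sin(a),  np.cos(a)]])

f = lambda x, u, w: x + u + w
g = lambda x, u: x @ x + u @ u            # rotation-invariant stage cost

rng = np.random.default_rng(0)
x, u, w, a = rng.normal(size=2), rng.normal(size=2), rng.normal(size=2), 0.7
phi = psi = nu = lambda v: R(a) @ v       # phi_h = psi_h = nu_h here

assert np.allclose(f(phi(x), psi(u), nu(w)), phi(f(x, u, w)))
assert np.isclose(g(phi(x), psi(u)), g(x, u))
# An isotropic Gaussian density for w is preserved by nu_h, since a
# rotation has unit Jacobian determinant and preserves the norm.
```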

2.3 Cartan’s moving frame method

To find a reduced coordinate system in which to perform dynamic programming, we will use the moving frame method of Cartan (Cartan, 1937). In general, this method only results in a local coordinate transformation as it relies on the implicit function theorem. In this paper we will focus on a transformation within a single coordinate chart. For many practical problems, including both examples in this paper, the transformation computed using this method extends to all of with the exception of a lower-dimensional submanifold of the state space. In such cases, only a single coordinate chart is required for the purpose of gridding the entire state space for dynamic programming.

We briefly introduce the moving frame method following the presentation in (Bonnabel et al., 2008). Consider an $r$-dimensional transformation group (with $r \leq n$) acting on $\mathcal{X} \subseteq \mathbb{R}^n$ via the diffeomorphisms $\phi_h$. Assume we can split $\phi_h$ as $\phi_h = (\phi_h^a, \phi_h^b)$ with $r$ and $n - r$ components respectively, so that for each $x$ the map $h \mapsto \phi_h^a(x)$ is invertible. Then, for some constant $c \in \mathbb{R}^r$ in the range of $\phi^a$, we define a coordinate cross section to the orbits $\mathcal{C} = \{x \in \mathcal{X} : x^a = c\}$. This cross section is an $(n - r)$-dimensional submanifold of $\mathcal{X}$. Assume moreover that for any point $x \in \mathcal{X}$, there is a unique group element $h \in G$ such that $\phi_h(x) \in \mathcal{C}$. Such an $h$ will be denoted $\gamma(x)$, and the map $\gamma : \mathcal{X} \to G$ will be called the moving frame.

A moving frame can be computed by solving the normalization equation

$$\phi^a_h(x) = c$$

for $h$, which yields $h = \gamma(x)$.

Define the map $\rho : \mathcal{X} \to \mathbb{R}^n$ as

$$\rho(x) = \phi_{\gamma(x)}(x).$$

Note that, for all $h \in G$, we have $\rho(\phi_h(x)) = \rho(x)$, that is, the components of $\rho$ are invariant to the group action on the state space. Since $\rho$ takes values in $\mathcal{C}$, its first $r$ components are identically equal to $c$, and we let $\bar{\rho} : \mathcal{X} \to \mathbb{R}^{n-r}$ denote its remaining $n - r$ components. Due to our assumptions, the restriction of $\bar{\rho}$ to $\mathcal{C}$ is injective, and $\bar{\rho}$ will serve as a reduced coordinate system in which to solve the invariant optimal control problem.
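As an illustration of the normalization procedure, consider the rotation group SO(2) acting on the punctured plane. The sketch below (our own toy example) solves the normalization equation for the moving frame and verifies that $\rho$ is constant along orbits.

```python
import numpy as np

# Moving frame for SO(2) acting on R^2 \ {0}: normalize the second
# coordinate to c = 0, so the cross section is the positive x-axis.
def R(a):
    return np.array([[np.cos(a), -np.sin(a)],
                     [np.sin(a),  np.cos(a)]])

def gamma(x):
    # Solve the normalization equation: second component of R(h) x = 0.
    return -np.arctan2(x[1], x[0])

def rho(x):
    return R(gamma(x)) @ x                # lands on the cross section

x = np.array([3.0, 4.0])
print(rho(x))                             # [5., 0.]: the invariant |x|
print(np.allclose(rho(R(1.23) @ x), rho(x)))   # constant along orbits
```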

3 Main Results

In order to combat the “curse of dimensionality” associated with performing dynamic programming in high-dimensional systems, we describe a method to reduce the system’s dimension by exploiting symmetries in the dynamics and stage costs.

3.1 Symmetries imply equivalence classes of optimal policies

{thm}

(Symmetries of the optimal cost and policy) Let $G$ be a group and let $(f, g, p_w)$ be a $G$-invariant control system with $G$-invariant costs. Then the optimal cost functions $J_k$ satisfy the symmetry relations

$$J_k(\phi_h(x)) = J_k(x), \qquad k = 0, \dots, N,$$

for any $h \in G$ and any $x \in \mathcal{X}$. Furthermore, if $\pi = (\mu_0, \dots, \mu_{N-1})$ is an optimal policy then so is $\tilde{\pi} = (\psi_h^{-1} \circ \mu_0 \circ \phi_h, \dots, \psi_h^{-1} \circ \mu_{N-1} \circ \phi_h)$ for any $h \in G$.

{pf}

We prove this by induction on $k$, proceeding backward from the base case $k = N$. First, note that

$$J_N(\phi_h(x)) = g_N(\phi_h(x)) = g_N(x) = J_N(x).$$

Now, suppose that for some $k$ we have $J_{k+1}(\phi_h(x)) = J_{k+1}(x)$ for all $x \in \mathcal{X}$ and $h \in G$. Then for any $x \in \mathcal{X}$, $u \in \mathcal{U}$ and $h \in G$ we have

$$\begin{aligned} \mathbb{E}_{w}&\left[ g_k(\phi_h(x), \psi_h(u)) + J_{k+1}\left( f_k(\phi_h(x), \psi_h(u), w) \right) \right] \\ &= \mathbb{E}_{\tilde{w}}\left[ g_k(\phi_h(x), \psi_h(u)) + J_{k+1}\left( f_k(\phi_h(x), \psi_h(u), \nu_h(\tilde{w})) \right) \right] \\ &= \mathbb{E}_{w}\left[ g_k(x, u) + J_{k+1}\left( \phi_h(f_k(x, u, w)) \right) \right] \\ &= \mathbb{E}_{w}\left[ g_k(x, u) + J_{k+1}\left( f_k(x, u, w) \right) \right] \end{aligned}$$

where the change of variables $w = \nu_h(\tilde{w})$ leaves the expectation unchanged by the invariance of the density $p_{w_k}$, and the tildes are subsequently dropped. Therefore, from the one-step dynamic programming principle, and using the fact that $u \mapsto \psi_h(u)$ is a bijection of $\mathcal{U}$, we have

$$J_k(\phi_h(x)) = \min_u \mathbb{E}_w\left[ g_k(\phi_h(x), u) + J_{k+1}\left( f_k(\phi_h(x), u, w) \right) \right] = \min_u \mathbb{E}_w\left[ g_k(x, u) + J_{k+1}\left( f_k(x, u, w) \right) \right].$$

Thus $J_k(\phi_h(x)) = J_k(x)$. Now, if $\pi = (\mu_0, \dots, \mu_{N-1})$ is an optimal policy and we denote $\tilde{\pi} = (\psi_h^{-1} \circ \mu_0 \circ \phi_h, \dots, \psi_h^{-1} \circ \mu_{N-1} \circ \phi_h)$, then for any $x \in \mathcal{X}$ we have

$$J_{\tilde{\pi}}(x) = J_{\pi}(\phi_h(x)) = J^*(\phi_h(x)) = J^*(x).$$

Thus $\tilde{\pi}$ is an optimal policy. \qed

3.2 Dynamic programming can be performed using reduced coordinates

Theorem 3.1 readily implies that the problem can be reduced, as all states along an orbit of $G$ are equivalent in terms of cost, and there are equivalence classes of optimal policies. It therefore suffices to consider the cost corresponding to a single representative of each equivalence class, and to find a single representative of the optimal policy within each class. This can now easily be done using the injective map $\bar{\rho}|_{\mathcal{C}}$.

For $z \in \bar{\rho}(\mathcal{C})$, let $\bar{x}(z) \in \mathcal{C}$ be such that $\bar{\rho}(\bar{x}(z)) = z$, and define

$$\bar{J}_k(z) = J_k(\bar{x}(z)).$$

The following result shows that the functions $J_k$ on the $n$-dimensional space $\mathcal{X}$ are completely determined by the values of $J_k$ on the $(n - r)$-dimensional subset $\mathcal{C}$ of $\mathcal{X}$.

{cor}

For any $x \in \mathcal{X}$ and any $k = 0, \dots, N$, the cost function for the full problem can be computed in terms of the lower-dimensional cost function as

$$J_k(x) = \bar{J}_k(\bar{\rho}(x)),$$

where $\bar{J}_k(\bar{\rho}(x))$ is well defined as $\bar{\rho}(x) \in \bar{\rho}(\mathcal{C})$. It is thus sufficient to have evaluated $\bar{J}_k$ at all points of $\bar{\rho}(\mathcal{C})$ to be able to instantly evaluate $J_k$ at any point of $\mathcal{X}$. An optimal policy $\bar{\mu}_k$ computed for the reduced problem can also be lifted to an optimal policy on the original state space via this method:

$$\mu_k(x) = \psi_{\gamma(x)}^{-1}\left( \bar{\mu}_k(\bar{\rho}(x)) \right).$$

This allows optimal trajectories of the system to be computed in the original coordinates via the lifted policy $\mu_k$.

{thm}

(Dynamic programming in reduced coordinates) The reduced coordinates $z \in \bar{\rho}(\mathcal{C})$ are in one-to-one correspondence with the cross section $\mathcal{C}$. For any $z \in \bar{\rho}(\mathcal{C})$, let $\bar{x}(z) \in \mathcal{C}$ satisfy $\bar{\rho}(\bar{x}(z)) = z$. Then in the reduced coordinates, the sequence $\bar{J}_k$ can be computed recursively via

$$\begin{aligned} \bar{J}_N(z) &= g_N(\bar{x}(z)) \\ \bar{J}_k(z) &= \min_u \mathbb{E}_w\left[ g_k(\bar{x}(z), u) + \bar{J}_{k+1}\left( \bar{\rho}\left( f_k(\bar{x}(z), u, w) \right) \right) \right]. \end{aligned}$$

{pf}

We have

$$\begin{aligned} \bar{J}_k(z) = J_k(\bar{x}(z)) &= \min_u \mathbb{E}_w\left[ g_k(\bar{x}(z), u) + J_{k+1}\left( f_k(\bar{x}(z), u, w) \right) \right] \\ &= \min_u \mathbb{E}_w\left[ g_k(\bar{x}(z), u) + \bar{J}_{k+1}\left( \bar{\rho}\left( f_k(\bar{x}(z), u, w) \right) \right) \right], \end{aligned}$$

where the second equality is the dynamic programming recursion (2) and the third follows from the corollary above. \qed
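To illustrate Theorem 3.2, the sketch below (our own toy example) carries out the reduced recursion for a rotation-invariant system in the plane, collapsing a two-dimensional dynamic programming iteration onto a one-dimensional grid.

```python
import numpy as np

# Toy rotation-invariant system x_{k+1} = (1 + u_k) R(w_k) x_k in R^2
# with stage cost |x|^2 + u^2. In the reduced coordinate z = |x| the
# noisy rotation drops out entirely, so the recursion is 1-D.
N = 10
zs = np.linspace(0.0, 5.0, 101)           # grid on rho_bar(C)
us = np.linspace(-0.5, 0.5, 21)

Jbar = zs**2                              # terminal cost g_N = |x|^2
for _ in range(N):
    znext = np.abs(1.0 + us[None, :]) * zs[:, None]
    # np.interp clips at the grid boundary; acceptable for this sketch
    Q = zs[:, None]**2 + us[None, :]**2 + np.interp(znext, zs, Jbar)
    Jbar = Q.min(axis=1)

# Lift back to R^2 via the corollary above: J_k(x) = Jbar_k(|x|).
```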

3.3 Case of equivariant costs

So far, we have considered the costs to be invariant under the transformation group. We now briefly discuss how this can be generalized. The cost is said to be equivariant if there exists a family of diffeomorphisms $\Theta_h : \mathbb{R} \to \mathbb{R}$ such that $g_k(\phi_h(x), \psi_h(u)) = \Theta_h(g_k(x, u))$ and $g_N(\phi_h(x)) = \Theta_h(g_N(x))$. As we want the optimal cost function to be equivariant too, we will need $\Theta_h$ to be linear. Thus we will simply assume that $\Theta_h$ is of the form $\Theta_h(y) = \lambda(h)\, y$, that is, a scaling of the cost, where $\lambda(h) > 0$. For simplicity's sake, we consider here the problem to be noise free. Along the lines of the preceding sections it is easily proved that

$$J_k(\phi_h(x)) = \lambda(h)\, J_k(x),$$

as already noticed in (Alvarez and Stokey, 1998) for the case of homogeneous costs. Symmetry reduction can then be applied. We now give two tutorial examples.

Example 1

Consider the linear system

$$x_{k+1} = A x_k + B u_k$$

with quadratic costs $g_k(x, u) = x^T Q x + u^T R u$ and $g_N(x) = x^T Q_N x$. The system is invariant to scalings $\phi_\lambda(x) = \lambda x$, $\psi_\lambda(u) = \lambda u$ for $\lambda > 0$, and the cost is equivariant letting $\Theta_\lambda(y) = \lambda^2 y$. The unit sphere is a cross section to the orbits, and the normalization equation $\|\lambda x\| = 1$ yields $\gamma(x) = 1 / \|x\|$. Applying the results above, we see that the controls that minimize $J(x_0)$ are $u_k = \|x_0\|\, \bar{u}_k$, where the $\bar{u}_k$ are those minimizing $J(x_0 / \|x_0\|)$. This agrees with the well-known fact that the optimal controller for the problem above is the linear quadratic regulator, which is indeed of the linear form $u_k = -K_k x_k$.
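The scaling relation can be checked numerically. The sketch below, with illustrative system matrices, solves the finite-horizon LQR problem by the backward Riccati recursion and verifies that $J(\lambda x_0) = \lambda^2 J(x_0)$.

```python
import numpy as np

# Finite-horizon LQR: backward Riccati recursion, then a check of the
# degree-2 homogeneity of the optimal cost (illustrative values).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2); Rw = np.array([[1.0]]); N = 20

P = Q.copy()                              # terminal weight Q_N = Q
for _ in range(N):
    K = np.linalg.solve(Rw + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)

J = lambda x: x @ P @ x                   # optimal cost x0' P x0
x0, lam = np.array([1.0, -2.0]), 3.0
print(np.isclose(J(lam * x0), lam**2 * J(x0)))   # True
```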

Example 2

Consider the following system and costs

$$x_{k+1} = A x_k + B u_k, \qquad g_k(x, u) = \|x\| + h(u), \qquad g_N(x) = \|x\|,$$

where $\|x\|$ denotes the norm of $x$ and $h$ is a map satisfying $h(\lambda u) = \lambda\, h(u)$ for $\lambda > 0$. Such costs may arise when one tries to force some controls to zero to create sparsity, a method known as $\ell_1$ regularization. This problem is challenging, particularly for nonconvex $h$. But according to the theory above, it is sufficient to solve it numerically for initial conditions lying on the unit sphere of the state space.
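As a sanity check of the degree-one homogeneity underlying this reduction, take $h(u) = \eta \|u\|_1$ (one admissible choice of $h$); the total cost of any fixed control sequence then scales linearly when the initial condition and controls are scaled, as the sketch below (illustrative values) confirms.

```python
import numpy as np

# Homogeneity check for Example 2 with h(u) = eta * ||u||_1:
# scaling (x0, u) by lam > 0 scales the total cost by lam.
A = np.array([[0.9, 0.2], [0.0, 1.1]])
B = np.eye(2)
eta, lam, N = 0.5, 2.5, 8

rng = np.random.default_rng(1)
x0, useq = rng.normal(size=2), rng.normal(size=(N, 2))

def cost(x, useq):
    total = 0.0
    for u in useq:
        total += np.linalg.norm(x) + eta * np.abs(u).sum()
        x = A @ x + B @ u
    return total + np.linalg.norm(x)      # terminal cost |x_N|

print(np.isclose(cost(lam * x0, lam * useq), lam * cost(x0, useq)))
```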

3.4 Optimal formation control on Lie groups

We now apply the theory presented in Section 2.2 to the case where the state space is a Cartesian product of matrix Lie groups. Note that straightforward modifications arise along the way, as the state space and noise space are not vector spaces as in the theory above. The methodology is then applied to the synchronization of two non-holonomic cars in the presence of uncertainties.

We model the system as a collection of $q$ agents, where the state $X^i_k$ of each agent evolves on a $p$-dimensional matrix Lie group $G$. We assume that the evolution of the state of agent $i$ proceeds according to the equation

$$X^i_{k+1} = X^i_k\, U^i_k\, W^i_k \tag{3}$$

where $X^i_k$, $U^i_k = U(u^i_k)$ and $W^i_k$ are all square matrices belonging to $G$, $u^i_k$ is a control that lives in some finite-dimensional vector space, and $W^i_k$ is the noise. The control objective is to reach a desired configuration, that is, a desired value for the relative configurations of the agents $(X^i_k)^{-1} X^j_k$; see, e.g., Sarlette et al. (2010) for more information.

Systems of this form are naturally invariant to left multiplication of all the $X^i_k$ by some matrix $\Phi$:

$$\phi_\Phi\left( X^1, \dots, X^q \right) = \left( \Phi X^1, \dots, \Phi X^q \right)$$

where $\Phi \in G$. Letting $\psi_\Phi = \mathrm{id}$, $\nu_\Phi = \mathrm{id}$, and the costs be of the form $g_k\left( (X^1)^{-1} X^2, \dots, (X^1)^{-1} X^q, u \right)$, we get an invariant system with invariant costs.

One can define a cross section to the orbits by letting the first agent's coordinates be equal to the identity matrix, that is, $\mathcal{C} = \{(X^1, \dots, X^q) : X^1 = I\}$. The normalization equation is given by $\Phi X^1 = I$, hence the moving frame is given by $\gamma(X) = (X^1)^{-1}$. The invariants are computed as

$$\rho\left( X^1, \dots, X^q \right) = \left( I,\; (X^1)^{-1} X^2,\; \dots,\; (X^1)^{-1} X^q \right).$$

The optimal stochastic control problem can then be solved in the reduced coordinate system defined by $\bar{\rho}(X) = \left( (X^1)^{-1} X^2, \dots, (X^1)^{-1} X^q \right)$, reducing the state space from dimension $qp$ to $(q - 1)p$.
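A minimal sketch of this construction for two agents with SE(2) matrices (illustrative poses; Section 4 develops the full example) is given below; it checks that the invariants are unchanged by a common left translation.

```python
import numpy as np

# Moving frame on a product of matrix groups: gamma(X) = (X^1)^{-1},
# invariants rho(X) = (I, (X^1)^{-1} X^2, ..., (X^1)^{-1} X^q).
def se2(theta, x, y):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]])

def rho(Xs):
    g = np.linalg.inv(Xs[0])              # moving frame gamma(X)
    return [g @ X for X in Xs]

X1, X2 = se2(0.3, 1.0, 2.0), se2(-0.5, 4.0, 0.0)
Phi = se2(1.1, -2.0, 3.0)                 # arbitrary left translation
a = rho([X1, X2])
b = rho([Phi @ X1, Phi @ X2])
print(all(np.allclose(u, v) for u, v in zip(a, b)))   # True
```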

4 Application I: Cooperative formation control for two stochastic Dubins vehicles

We consider two identical Dubins vehicles, each with dynamics

$$\begin{aligned} x_{k+1} &= x_k + u^v_k \cos\theta_k \\ y_{k+1} &= y_k + u^v_k \sin\theta_k \\ \theta_{k+1} &= \theta_k + \frac{u^v_k}{r} \tan\left( u^\phi_k \right) + w_k \end{aligned}$$

where $x_k$ and $y_k$ denote the two-dimensional position of the vehicle, $\theta_k$ denotes the heading of the vehicle, $u^v_k$ is a velocity input, $u^\phi_k$ is a steering angle input, $w_k$ is independent, identically distributed zero-mean Gaussian noise with variance $\sigma^2$, and $r$ is a parameter that determines the vehicle's steering radius.

These dynamics can be embedded in the three-dimensional special Euclidean matrix Lie group $\mathrm{SE}(2)$ by defining the state

$$X_k = \begin{pmatrix} \cos\theta_k & -\sin\theta_k & x_k \\ \sin\theta_k & \cos\theta_k & y_k \\ 0 & 0 & 1 \end{pmatrix},$$

input matrix

$$U_k = \begin{pmatrix} \cos\omega_k & -\sin\omega_k & u^v_k \\ \sin\omega_k & \cos\omega_k & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad \omega_k = \frac{u^v_k}{r} \tan\left( u^\phi_k \right),$$

and noise matrix

$$W_k = \begin{pmatrix} \cos w_k & -\sin w_k & 0 \\ \sin w_k & \cos w_k & 0 \\ 0 & 0 & 1 \end{pmatrix},$$

with state update equation of the form (3).
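As a sanity check of this embedding, the sketch below (illustrative values; not the paper's simulation code) verifies that one step of the matrix update (3) reproduces the scalar Dubins dynamics.

```python
import numpy as np

# One step of X_{k+1} = X_k U_k W_k in SE(2) versus the scalar form.
def se2(theta, x, y):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]])

r = 1.0                                   # steering radius parameter
x, y, theta = 1.0, 2.0, 0.3               # current pose
v, phi, w = 0.5, 0.2, 0.05                # velocity, steering, noise

X = se2(theta, x, y)
U = se2(v * np.tan(phi) / r, v, 0.0)      # input matrix
W = se2(w, 0.0, 0.0)                      # noise matrix
Xnext = X @ U @ W

xn = x + v * np.cos(theta)                # scalar form of the step
yn = y + v * np.sin(theta)
thetan = theta + v * np.tan(phi) / r + w
print(np.allclose(Xnext, se2(thetan, xn, yn)))   # True
```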

We wish to compute a control policy for the two-vehicle system, with states $X^1_k$ and $X^2_k$, where the controls can only take a finite number of values, and with terminal cost

$$g_N(X^1, X^2) = \left\| (X^1)^{-1} X^2 - \bar{X} \right\|_F^2 \qquad \text{where} \quad \bar{X} = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},$$

that is, we want the vehicles to have the same heading, and to follow each other at unit distance. Thanks to the theory developed above, the stochastic control problem is reduced from a problem with a six-dimensional state space to a problem with a three-dimensional state space only.

For numerical simulations, the cost functions $\bar{J}_k$ were computed on a fixed grid of the reduced three-dimensional state space; the turning radius parameter and the finite input sets are specified in the accompanying code. Globally optimal input and state trajectory sequences for a fixed initial condition are shown in Figures 1 and 2. These are compared against a deterministic version of the model with $\sigma^2 = 0$ in Figures 3 and 4.

Figure 1: Optimal input sequence for the cooperative stochastic Dubins vehicle model.
Figure 2: Optimal state sequence for the cooperative stochastic Dubins vehicle model.
Figure 3: Optimal input sequence for cooperative deterministic Dubins vehicle model.
Figure 4: Optimal state sequence for the cooperative deterministic Dubins vehicle model.

5 Application II: MRI Fingerprinting

Magnetic resonance imaging (MRI) has traditionally focused on acquisition sequences that are static, in the sense that sequences typically wait for magnetization to return to equilibrium between acquisitions. Recently, researchers have demonstrated promising results based on dynamic acquisition sequences, in which spins are continuously excited by a sequence of random input pulses, without allowing the system to return to equilibrium between pulses. Model parameters corresponding to $T_1$ and $T_2$ relaxation, off-resonance and spin density are then estimated from the sequence of acquired data. This technique, termed magnetic resonance fingerprinting (MRF), has been shown to increase the sensitivity, specificity and speed of magnetic resonance studies (Ma et al., 2013; Davies et al., 2014).

This technique could be further improved by replacing randomized input pulse sequences with sequences that have been optimized for informativeness about model parameters. To this end, we present a model of MR spin dynamics that describes the measured data as a function of the $T_1$ and $T_2$ relaxation rates and the sequence of radio-frequency (RF) input pulses used to excite the spins.

The following model was introduced in the conference paper (Maidens et al., 2017), where an optimal control was computed via dynamic programming on a very sparse six-dimensional grid. Here, using our symmetry reduction technique, we provide a much more accurate optimal input sequence computed on a finer five-dimensional grid.

We model the spin dynamics via the equations

$$\begin{pmatrix} x_{k+1} \\ y_{k+1} \\ z_{k+1} \end{pmatrix} = \begin{pmatrix} c_2 & 0 & 0 \\ 0 & c_2 & 0 \\ 0 & 0 & c_1 \end{pmatrix} U_k \begin{pmatrix} x_k \\ y_k \\ z_k \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ 1 - c_1 \end{pmatrix} \tag{4}$$

where the states $x_k$ and $y_k$ describe the transverse magnetization (orthogonal to the applied magnetic field) and $z_k$ describes the longitudinal magnetization (parallel to the applied magnetic field). To simplify the presentation, off-resonance is neglected in this model. Control inputs $U_k \in \mathrm{SO}(3)$ describe flip angles corresponding to RF excitation pulses that rotate the state about the origin. Between acquisitions, transverse magnetization decays according to the parameter $c_2$ and the longitudinal magnetization recovers to equilibrium (normalized such that the equilibrium is $z = 1$) according to the parameter $c_1$, where $\Delta t$ is the sampling interval:

$$c_1 = e^{-\Delta t / T_1}, \qquad c_2 = e^{-\Delta t / T_2}. \tag{5}$$

We assume that data are acquired immediately following the RF pulse, allowing us to make a noisy measurement of the transverse magnetization. We also assume that the measured data are described by a multivariate Gaussian random variable

$$Y_k = \begin{pmatrix} x_k \\ y_k \end{pmatrix} + v_k$$

where $v_k$ is a zero-mean Gaussian noise with covariance $\sigma^2 I$. This model results from a time discretization of the Bloch equations (Bloch, 1946; Nishimura, 2010) under a time scale separation assumption that specifies that the RF excitation pulses act on a much faster time scale than the relaxation time constants $T_1$ and $T_2$. A simplified two-state version of this model was considered in (Maidens et al., 2016b), where the transverse magnetization was modelled using a single state describing the magnitude of $(x_k, y_k)$.

We see from the model (4) that magnetization in the transverse direction decays while magnetization in the longitudinal direction grows. However, only the transverse component of the magnetization can be measured. Thus there is a trade-off between making measurements (which leads to loss of magnetization) and magnetization recovery. This is the trade-off that we hope to manage through the optimal design of an input sequence $U_0, \dots, U_{N-1}$.

We wish to quantify the informativeness of an acquisition sequence based on the information about the relaxation parameter $T_1$ that is contained in the resulting data set. More formally, we wish to choose $U_0, \dots, U_{N-1} \in \mathrm{SO}(3)$ to maximize the Fisher information about $T_1$ contained in the joint distribution of $Y_0, \dots, Y_N$. The Fisher information can be expressed as a quadratic function of the sensitivities of the states with respect to $T_1$:

$$\mathcal{I}(T_1) = \frac{1}{\sigma^2} \sum_{k=0}^{N} \left( p_k^2 + q_k^2 \right)$$

where the sensitivities $p_k = \partial x_k / \partial T_1$, $q_k = \partial y_k / \partial T_1$ and $r_k = \partial z_k / \partial T_1$ satisfy the sensitivity equations obtained by differentiating (4) with respect to $T_1$:

$$\begin{pmatrix} p_{k+1} \\ q_{k+1} \\ r_{k+1} \end{pmatrix} = \begin{pmatrix} c_2 & 0 & 0 \\ 0 & c_2 & 0 \\ 0 & 0 & c_1 \end{pmatrix} U_k \begin{pmatrix} p_k \\ q_k \\ r_k \end{pmatrix} + \frac{\Delta t\, c_1}{T_1^2} \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} U_k \begin{pmatrix} x_k \\ y_k \\ z_k \end{pmatrix} - \begin{pmatrix} 0 \\ 0 \\ \frac{\Delta t\, c_1}{T_1^2} \end{pmatrix}.$$
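The sketch below (with illustrative timing, relaxation and noise values, and an arbitrary pulse sequence of rotations about the $x$ axis) propagates the sensitivity recursion alongside (4) and accumulates the Fisher information.

```python
import numpy as np

# Propagate magnetization m = (x, y, z) and sensitivity s = (p, q, r)
# through Eq. (4) and its derivative with respect to T1, accumulating
# the Fisher information from the transverse sensitivities.
dt, T1, T2, sigma2 = 10.0, 1000.0, 100.0, 0.01   # illustrative values
c1, c2 = np.exp(-dt / T1), np.exp(-dt / T2)
A  = np.diag([c2, c2, c1])
b  = np.array([0.0, 0.0, 1.0 - c1])
dA = np.diag([0.0, 0.0, dt * c1 / T1**2])        # dA/dT1
db = np.array([0.0, 0.0, -dt * c1 / T1**2])      # db/dT1

def Rx(a):                                        # flip about the x axis
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0, 0], [0, c, -s], [0, s, c]])

m, s, info = np.array([0.0, 0.0, 1.0]), np.zeros(3), 0.0
for a in np.deg2rad([60.0, 30.0, 45.0, 30.0, 60.0]):
    U = Rx(a)
    m, s = A @ U @ m + b, A @ U @ s + dA @ U @ m + db
    info += (s[0]**2 + s[1]**2) / sigma2
print(info)                                       # Fisher information
```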

It should be noted that for system (4), the objective function has many local optima as a function of the input sequence $U_0, \dots, U_{N-1}$. Thus, in contrast with (Maidens et al., 2016a), which considers optimal experiment design for hyperpolarized MRI problems, local search methods provide little insight into which acquisition sequences are good for this model. In contrast with the MRI model presented in (Maidens and Arcak, 2016), where global optimal experiment design heuristics are developed for linear dynamical systems, in this model the decision variables $U_k$ multiply the state vector, making the output a nonlinear function of the input sequence. Thus we must use dynamic programming to find a solution.

5.1 Model

To present this problem in the formalism we have introduced, we define an augmented state vector

$$m_k = (x_k, y_k, z_k, p_k, q_k, r_k).$$

We can write the dynamics of the augmented state as a control system of the form (1), with $c_1$ and $c_2$ defined in Equation (5). This system has a one-dimensional group of symmetries defined by

$$\phi_\alpha(x, y, z, p, q, r) = \left( x \cos\alpha - y \sin\alpha,\; x \sin\alpha + y \cos\alpha,\; z,\; p \cos\alpha - q \sin\alpha,\; p \sin\alpha + q \cos\alpha,\; r \right), \qquad \psi_\alpha(U) = R_z(\alpha)\, U\, R_z(\alpha)^{-1},$$

for any $\alpha \in [0, 2\pi)$, where $R_z(\alpha)$ denotes the rotation by $\alpha$ about the longitudinal axis.

5.2 Dynamic programming in reduced coordinates

To perform dynamic programming in a reduced coordinate system, we begin by defining the cross section $\mathcal{C} = \{(x, y, z, p, q, r) : y = 0,\; x \geq 0\}$, and computing the moving frame $\gamma$. To do so, we solve the normalization equation

$$x \sin\alpha + y \cos\alpha = 0.$$

Isolating $\alpha$ yields

$$\gamma(x, y, z, p, q, r) = -\arctan(y, x)$$

where $\arctan(y, x)$ denotes the multi-valued inverse tangent function

$$\arctan(y, x) = \left\{ \beta : \cos\beta = \frac{x}{\sqrt{x^2 + y^2}},\; \sin\beta = \frac{y}{\sqrt{x^2 + y^2}} \right\}.$$

Next, we compute the invariants using $\rho(m) = \phi_{\gamma(m)}(m)$, which gives

$$\bar{\rho}(x, y, z, p, q, r) = \left( \sqrt{x^2 + y^2},\; z,\; \frac{xp + yq}{\sqrt{x^2 + y^2}},\; \frac{xq - yp}{\sqrt{x^2 + y^2}},\; r \right).$$

Further, $\bar{\rho}$ restricted to the cross section is injective, with inverse given by $\bar{x}(z_1, z_2, z_3, z_4, z_5) = (z_1, 0, z_2, z_3, z_4, z_5)$. The theory above tells us we can thus solve the optimal stochastic control problem in a 5-dimensional state space, reducing the original 6-dimensional problem by 1 dimension.
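The reduced coordinates and their inverse are simple to implement. The sketch below checks numerically both that $\bar{\rho}$ is constant along orbits and that composing it with the inverse recovers the representative on the cross section.

```python
import numpy as np

# Reduced coordinates for the fingerprinting model and the group
# action phi_alpha, with invariance and round-trip checks.
def rho_bar(v):
    x, y, z, p, q, r = v
    a = -np.arctan2(y, x)                 # moving frame gamma(v)
    c, s = np.cos(a), np.sin(a)
    return np.array([c*x - s*y,           # = sqrt(x^2 + y^2)
                     z,
                     c*p - s*q,           # = (x p + y q)/sqrt(x^2+y^2)
                     s*p + c*q,           # = (x q - y p)/sqrt(x^2+y^2)
                     r])

def x_bar(zv):                            # inverse on the cross section
    z1, z2, z3, z4, z5 = zv
    return np.array([z1, 0.0, z2, z3, z4, z5])

def phi(alpha, v):                        # group action on the state
    x, y, z, p, q, r = v
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([c*x - s*y, s*x + c*y, z, c*p - s*q, s*p + c*q, r])

v = np.array([0.3, 0.4, 0.5, -0.2, 0.1, 0.7])
print(np.allclose(rho_bar(phi(1.0, v)), rho_bar(v)))     # invariance
print(np.allclose(x_bar(rho_bar(v)), phi(-np.arctan2(0.4, 0.3), v)))
```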

Note that this reduction can be computed using only the state transformation group $\phi_\alpha$, without reference to the state update equation $f$. Thus the same reduction can be applied to any system with the same symmetries.

5.3 Results

To implement this algorithm, we discretize the reduced five-dimensional state space and the two-dimensional input space via uniform grids; the grid sizes used are given in the accompanying implementation. The code was written in the Julia language and parallelized to allow evaluation of $\bar{J}_k$ in parallel across grid points (Maidens et al., 2016b). The implementation is publicly available at https://github.com/maidens/Automatica-2017.

Optimal input and state trajectories for the model, corresponding to the initial condition at the equilibrium $m_0 = (0, 0, 1, 0, 0, 0)$, are plotted in Figures 5 and 8.

Figure 5: Optimal input sequence for the MR fingerprinting model. The plotted angles represent rotations about the $x$, $y$ and $z$ axes respectively, which compose to give the control input $U_k \in \mathrm{SO}(3)$.
Figure 6: Magnetizations
Figure 7: Sensitivities
Figure 8: Optimal state sequence for the MR fingerprinting model. Here we have plotted the longitudinal and transverse components of both the magnetization (states $x_k$, $y_k$ and $z_k$) and the sensitivities (states $p_k$, $q_k$ and $r_k$), where the transverse components are computed as the Euclidean norms of the vectors $(x_k, y_k)$ and $(p_k, q_k)$.