Feedback Integrators
Abstract
A new method is proposed to numerically integrate a dynamical system on a manifold such that the trajectory stably remains on the manifold and preserves first integrals of the system. The idea is that given an initial point in the manifold we extend the dynamics from the manifold to its ambient Euclidean space and then modify the dynamics outside the intersection of the manifold and the level sets of the first integrals containing the initial point such that the intersection becomes a unique local attractor of the resultant dynamics. While the modified dynamics theoretically produces the same trajectory as the original dynamics, it yields a numerical trajectory that stably remains on the manifold and preserves the first integrals. The big merit of our method is that the modified dynamics can be integrated with any ordinary numerical integrator such as Euler or RungeKutta. We illustrate this method by applying it to three famous problems: the free rigid body, the Kepler problem and a perturbed Kepler problem with rotational symmetry. We also carry out simulation studies to demonstrate the excellence of our method and make comparisons with the standard projection method, a splitting method and StörmerVerlet schemes.
Contents
1 Introduction
Given a dynamical system on a manifold with first integrals, it is important for a numerical integrator to preserve the manifold structure and the first integrals of the equations of motion. This has been the focus of much effort in the development of numerical integration schemes [3]. In this paper we do not propose any specific numerical integration scheme, but rather propose a new paradigm of integration that can faithfully preserve conserved quantities with existing numerical integration schemes.
The main idea in our paradigm is as follows. Consider a dynamical system on a manifold with first integrals , . Assume that we can embed the manifold into Euclidean space and extend the first integrals to a neighborhood of in . For an arbitrary point , consider the set
which is the intersection of with all the level sets of the first integrals containing the point , and is an invariant set of the dynamical system. We then extend the dynamical system from to and then modify the dynamics outside of such that the set becomes a unique local attractor of the extended, modified system. Since the dynamics have not changed on by the extension and modification to , both the original system on and the extended, modified system on produce the same trajectory for the initial point . Numerically, however, integrating the extended system has the following advantage: if the trajectory deviates from at some numerical integration step, then it will get pushed back toward the attractor in the extended, modified dynamics, thus remaining on the manifold and preserving all the first integrals. It can be rigorously shown that the discretetime dynamical system derived from any onestep numerical integrator with uniform step size for the extended, modified continuoustime system indeed has an attractor that contains the set in its interior and converges to as . In this paper we shall use the word, preserve, in this sense. It is noteworthy that the numerical integration of the extended dynamics can be carried out with any ordinary integrator and is done in one global Cartesian coordinate system on . We find conditions for applicability of this method and implement the result on the following three examples: the free rigid body dynamics, the Kepler problem, and a perturbed Kepler problem with rotational symmetry. We also carry out simulation studies to show the excellence of our new paradigm of integration for numerical preservation of conserved quantities in comparison with other wellknown integration schemes, such as projection and splitting methods and symplectic StörmerVerlet integrators.
2 Theory
Consider a dynamical system on an open subset of :
(1) 
where is a vector field on . Let us make the following assumptions:

There is a function such that for all , , and
(2) for all .

There is a positive number such that is a compact subset of .

The set of all critical points of in is equal to .
Adding the negative gradient of to (1), let us consider the following dynamical system on :
(3) 
Since is the minimum value of , for all . Hence, the two vector fields and coincide on .
Theorem 2.1.
Under assumptions A1 – A3, every trajectory of (3) starting from a point in stays in for all and asymptotically converges to the set as . Furthermore, is an invariant set of both (1) and (3).
Proof.
Let be a trajectory of (3) starting from a point in . By A1
(4) 
for all . Hence, is a positively invariant set of (3). From (4) and A3, it follows that . Hence, by LaSalle’s invariance principle [6], converges asymptotically to as , where A2 is used for compactness of . The invariance of follows from (2) and the coincidence of (1) and (3) on . ∎
Let us find a higherorder condition than that in assumption A3 so that A3 can be relaxed. For the function and the vector field in the statement of assumption A1, which are now both assumed to be of , let
(5) 
where , and denotes the th order directional derivative of along , i.e.,
Consider the following assumption in place of A3:

.
The following theorem generalizes Theorem 2.1:
Theorem 2.2.
Under assumptions A1, A2 and A3, every trajectory of (3) starting in stays in for all and asymptotically converges to the set as . Furthermore, is an invariant set of both (1) and (3).
Proof.
Consider the dynamics (3). It is easy to show that is a positively invariant set of the dynamics. Let be the largest invariant set in . Let be an arbitrary trajectory in . Since as shown in the proof of Theorem 2.1, the trajectory satisfies , i.e.,
(6) 
for all and . Since along , the trajectory satisfies
(7) 
for all . By differentiating (6) repeatedly in and using (7) on each differentiation, we can show that the trajectory satisfies
for all , and . Thus, the entire trajectory is contained in the set defined in (5), implying , from which and A3 it follows . Hence, by LaSalle’s invariance principle, every trajectory starting in asymptotically converges to and thus to as .
Remark 2.3.
1. If condition (2) is replaced by in assumption A1, then Theorems 2.1 and 2.2 still hold provided that the invariance of is replaced by positive invariance in the statement of the theorems.
2.Theorems 2.1 and 2.2 still hold with the use of the following modified dynamics
instead of (3), where is an matrixvalued function with its symmetric part positive definite at each .
3. From the control viewpoint, the added term in (3) can be regarded as a negative feedback control to asymptotically stabilize the set for the control system with control .
Suppose that assumptions A1, A2 and A3 (or A3 instead of A3) hold and that we want to integrate the dynamics (1) for an initial point . Since is positively invariant, the trajectory must remain in for all . Recall that the two dynamics (1) and (3) coincide on , so we can integrate (3) instead of (1) for the initial condition. Though there is no theoretical difference between the two integrations, integrating (3) has a numerical advantage over integrating (1). Suppose that the trajectory numerically deviates from the positively invariant set during integration. Then the dynamics (3) will push the trajectory back toward since is the attractor of (3) in whereas the dynamics (1) will leave the trajectory outside of which would not happen in the exact solution. It is noteworthy that this integration strategy is independent of the choice of integration schemes. In the Appendix we show that any onestep numerical integrator, as a discretetime dynamical system, with uniform step size for (3) has an attractor that contains in its interior and converges to as .
Let us now apply this integration strategy to numerically integrate dynamics on a manifold while preserving its first integrals and the domain manifold. Consider a manifold and dynamics
(8) 
on that have first integrals , . Suppose that is an embedded manifold in as a level set of a function for some , and that both the dynamics (8) and the functions , extend to an open neighborhood of in . Our goal is to numerically integrate (8) with an initial condition while preserving the manifold and the first integrals. Let
(9) 
and define a function by
(10) 
where is an constant symmetric positive definite matrix. Notice that
and that is invariant under the flow of (8). Or, more generally we can define a function as where is a nonnegative function that takes the value of only at . If the function satisfies assumptions A1, A2 and A3 (or A3 instead of A3), then by Theorem 2.1 (or Theorem 2.2), is the local attractor of the modified dynamics
(11) 
that coincide with the original dynamics (8) on .
The following lemma provides a sufficient condition under which the function defined in (10) satisfies assumptions A2 and A3:
Lemma 2.4.
Consider the functions and defined in (9) and (10). If is compact and the Jacobian matrix of has rank for all , then there is a number such that assumptions A2 and A3 hold.
Proof.
By compactness of and the regularity of , there is a bounded open set such that , and (x) has rank for all , where denotes the closure of . Consider now the gradient of . An easy calculation shows that,
Now, since for all , is onto as a linear map, is therefore one to one. It follows that, for ,
(12) 
In other words, the set of all critical points of in is equal to . Since the boundary of , being closed and bounded, is compact and , the minimum value, denoted by , of on is positive. If necessary, restrict the function to , replacing its original domain with . Then, there is a positive number less than such that . Therefore, assumption A3 holds for this number . Since the closed set is contained in the bounded set , it is compact, which implies that assumption A2 holds. ∎
Theorem 2.5.
Theorem 2.6.
For the functions and defined in (9) and (10), if satisfies (2) for all , the set is compact and there is an open subset of containing such that the Jacobian matrix is onto for all , then there is a number such that every trajectory starting in remains in for all and asymptotically converges to as .
Proof.
Modify the proof of Lemma 2.4 appropriately. ∎
As discussed above, we can integrate (11) instead of (8) for the initial condition , which will yield a trajectory that is expected to numerically well remain on the manifold and preserve the values of the first integrals , . It is noteworthy that the integration is carried out in one Cartesian coordinate system on rather than over local charts on the manifold which would take additional computational costs for coordinate changes between local charts. In the following section, we will apply this strategy to the free rigid body dynamics, the Kepler problem and a perturbed Kepler problem with rotational symmetry to integrate the dynamics preserving their first integrals and domain manifolds.
3 Applications
3.1 The Free Rigid Body
Consider the free rigid body dynamics:
(13a)  
(13b) 
where ; is the moment of inertia matrix; and
(14) 
for
Since , from here on we assume that the rigid body dynamics are defined on the Euclidean space and that the matrix denotes a matrix, not necessarily in . This is the extension of the dynamics step.
Define two functions and by
(15)  
(16) 
where represents the kinetic energy of the free rigid body and the spatial angular momentum vector when . These quantities are first integrals of (13). Choose any
and let
(17) 
Define an open set by
and a function by
(18) 
for , where , are constants, and is the 2norm defined by for a matrix . Observe that we are endowing the space with the standard inner product, and that the trace norm is precisely the norm induced on by this inner product. We compute all gradients that follow with respect to this inner product. Notice that
The following lemma shows that the function satisfies assumption A1 stated in §2.
Lemma 3.2.
The function satisfies
(20) 
Proof.
One can compute
where, in the third equality we use the fact that for symmetric and antisymmetric, .
Next, we compute,
Hence,
∎
The following lemma shows that the function satisfies assumptions A2 and A3 stated in §2.
Lemma 3.3.
There is a number satisfying
(21) 
such that is a compact subset of and the set of all critical points of in is equal to .
Proof.
It is obvious that there is a number satisfying (21) such that becomes a compact set in . For such a number , the matrix is invertible for every . Since is the minimum value of , every point in is a critical point of .
Let be a critical point of in . By Lemma 3.1 it satisfies
(22a)  
(22b) 
where
Postmultiplying (22a) by and premultiplying (22b) by yield
(23a)  
(23b) 
since . Notice that would imply , contradicting . Hence, . It follows from (22) that if any of the three equations
holds, then the three of them all hold. Thus
(24) 
since . Since the matrix in (22a) has rank 1 and the matrix is symmetric, there exist a unit vector and a number such that
(25) 
Substitution of (25) into (22a) and (23a) yields
which implies
(26) 
where the symbol means ‘is parallel to.’ Hence, we can express and as
(27)  
(28) 
for some numbers , and vectors , where and the vectors and can be any vectors such that becomes an orthonormal basis for . Substitution of (27) into (25) implies that is an orthonormal basis for . Substitution of (27) and (28) into (22b) implies , which together with in (26), implies , i.e., is an eigenvector of . We can now choose or redefine the unit vectors and such that they become eigenvectors of the symmetric matrix , too. In the orthonormal basis , we can now write the moment of inertia matrix as
where are the eigenvalues of , which are all positive, corresponding to the eigenvectors , respectively. It is then easy to see that equations (23) imply
(29a)  
(29b) 
where we have used .
We consider the following two separate cases: and . Suppose . If , then
by (21), which contradicts . Hence, . If , then equation (29a) implies , but equation (29b) implies , implying . This cannot be compatible with . Hence, is ruled out. Similarly, can be ruled out. Hence, , which implies contradicting (24). Thus, when , there are no critical points of in .
Suppose . We analyze equations (29) using a continuity argument. At , (29a) implies or , neither of which satisfies (29b) at . Thus, by continuity there exists a number with such that for any with there is no number satisfying both (29a) and (29b). Hence, . We now shrink the number such that it not only satisfies (21) but also . For such a number , we have
which contradicts . Hence, when , there are no critical points of in for some .
Therefore, there exists a number such that is the set of all critical points of in . ∎
3.2 The Kepler Problem
The twobody dynamics in the Kepler problem are given in the usual barycentric coordinates by
(31a)  
(31b) 
where is the position vector, is the velocity vector and is the gravitational parameter. Define two functions and by
(32)  
(33) 
where is called the angular momentum vector and is called the LaplaceRungeLenz vector. It is known that both and are first integrals of the twobody dynamics (31) and they are orthogonal to each other, i.e.,
for all . The energy function
satisfies
(34) 
for all , implying that the energy is also a first integral of the twobody dynamics (31). It is also known that a nondegenerate elliptic Keplerian orbit is uniquely determined by a pair that satisfies , and [2].
Fix a nondegenerate elliptic Keplerian orbit, i.e., a pair of vectors that satisfies
Define a function by
(35) 
for , where and . Notice that
which is the nondegenerate Keplerian elliptic orbit whose angular momentum vector and LaplaceRungeLenz vector are and , respectively.
Lemma 3.5.
The following lemma shows that the function defined in (35) satisfies assumptions A1 and A2 stated in §2.
Lemma 3.6.
1. The function satisfies
2. For any number satisfying
(36) 
the set is a compact set in .
Proof.
The first fact is a straightforward calculation using the previous Lemma. For the second, the essential idea is that the fibers of are homeomorphic to circles, corresponding to the elliptic orbits, and are therefore compact. For a detailed proof of the second statement, refer to Corollary 2.2 in [2]. ∎
Lemma 3.7.
For any number satisfying (36) the set of all critical points of in is equal to .
Proof.
Choose an arbitrary number satisfying (36). Let be an arbitrary critical point of in . For notational convenience, let us write
suppressing the dependence on . By Lemma 3.5, the critical point satisfies
(37a)  
(37b) 
If , then , contradicting . Hence, , which together with (32) implies that the three vectors form a basis for . The dot product of (37b) with yields
so there are numbers and such that
(38) 
Substitution of (38) into (37) gives