Second-order variational equations for -body simulations
First-order variational equations are widely used in -body simulations to study how nearby trajectories diverge from one another. These allow for efficient and reliable determinations of chaos indicators such as the Maximal Lyapunov characteristic Exponent (MLE) and the Mean Exponential Growth factor of Nearby Orbits (MEGNO).
In this paper we lay out the theoretical framework to extend the idea of variational equations to higher order. We explicitly derive the differential equations that govern the evolution of second-order variations in the -body problem. Going to second order opens the door to new applications, including optimization algorithms that require the first and second derivatives of the solution, like the classical Newton’s method. Typically, these methods have faster convergence rates than derivative-free methods. Derivatives are also required for Riemann manifold Langevin and Hamiltonian Monte Carlo methods which provide significantly shorter correlation times than standard methods. Such improved optimization methods can be applied to anything from radial-velocity/transit-timing-variation fitting to spacecraft trajectory optimization to asteroid deflection.
We provide an implementation of first and second-order variational equations for the publicly available REBOUND integrator package. Our implementation allows the simultaneous integration of any number of first and second-order variational equations with the high-accuracy IAS15 integrator. We also provide routines to generate consistent and accurate initial conditions without the need for finite differencing.
keywords:methods: numerical — gravitation — planets and satellites: dynamical evolution and stability
Calculating the orbital motion of planets and predicting the position of planets in the night sky is one of astronomy’s oldest reoccurring tasks. Today this is considered a solved problem, a simple application of Newtonian physics. Typically the dynamical system is solved by numerically integrating forward in time using an -body integrator. Different techniques are available to do this very accurately over both short (see e.g. Rein & Spiegel, 2015) and long (see e.g. Wisdom & Holman, 1991; Rein & Tamayo, 2015) timescales.
But often the solutions of even very simple dynamical systems are complex, in some cases exhibiting chaos. This means that small perturbations in the initial conditions lead to exponentially diverging solutions at late times. The solar system is one such chaotic dynamical system (Roy et al., 1988; Sussman & Wisdom, 1988; Laskar & Gastineau, 2009). One way to characterize chaotic systems is to numerically determine the Maximal Lyapunov characteristic Exponent (MLE), which measures the rate of exponential divergence between neighbouring trajectories in phase space.
Calculating how particle trajectories vary with respect to their initial conditions is therefore an important numerical task in modern celestial mechanics. It is also immediately relevant to orbital fitting and optimization. For example, when fitting an -body simulation of a planetary system to data, one might want to calculate the derivative of the value with respect to a planet’s initial orbital eccentricity.
The MLE, or more generally any derivative with respect to initial conditions, can be calculated by running a separate -body simulation with shadow particles, where the initial conditions of one or more particles are slightly perturbed. Measuring how fast the distance in phase space of the shadow particles with respect to their unperturbed counterparts grows then yields the MLE (Benettin et al., 1976).
However, it is well known that there are problems associated with this shadow-particle method (Tancredi et al., 2001). On the one hand, we want to start the shadow particles close so that we obtain a local measure of the divergence of trajectories, and so that as the paths begin to drift apart, there are several decades over which to characterize the rate of divergence. On the other hand, the closer we put the shadow particles, the more digits we lose to numerical roundoff error. One workaround is to periodically rescale the separation vectors to keep shadow particles nearby their real counterparts (see, e.g., Sec. 9.3.4 of Murray & Dermott, 2000). However, we show in Sec. 5.4 that the use of shadow particles requires problem-dependent fine-tuning and that the problem is exacerbated when computing higher-order derivatives.
Luckily, instead of integrating a separate simulation of shadow particles, one can also use variational equations to measure divergences. Rather than differencing two nearly equal trajectories, one instead derives a new linearized dynamical system that directly evolves the small distance between two initially offset particles. These variational equations are scale-free and circumvent the numerical pitfalls associated with the shadow-particle method (Tancredi et al., 2001).
First-order variational equations have been widely discussed and applied in the literature (e.g., Mikkola & Innanen, 1999; Tancredi et al., 2001; Cincotta et al., 2003). In this paper we derive second-order variational equations for the -body problem for the first time. These provide the second derivatives of the solution with respect to the initial conditions. Although mathematically straightforward to calculate, the number of terms and therefore the complexity rises significantly. As we will see below, some terms involve 7 different (summation) indices.
Our work opens up many new opportunities for a variety of applications. Perhaps most importantly, it is now straightforward to implement derivative-based optimization methods. While the first derivatives provide a gradient that yields the direction towards a local minimum on a landscape, the second derivatives provide the scale for how far one should move to reach the minimum. This can, among other things, dramatically improve fitting algorithms for radial velocity and transit planet searches, posterior estimation using Markov Chain Monte Carlo (MCMC) methods and even spacecraft trajectory optimization.
We begin in Sec. 2 with a formal introduction to variational equations that generalizes to higher order. In Sec. 3 we specialize to the case we are interested in, the -body problem. For completeness, in Sec. 3.1 we rederive the first-order variational equations for the -body problem. We then go one step further in Sec. 3.2 and derive the second-order variational equations.
We have implemented the second-order variational equations within the REBOUND package and make them freely available. REBOUND is a very modular -body package written in C99 and comes with an optional python interface. We have abstracted the complexity of higher order variations significantly, and summarize our adopted syntax in Sec. 4. Obtaining consistent initial conditions for variational equations in terms of Keplerian orbital elements without relying on finite difference is non-trivial. We have therefore also implemented several convenience methods for this purpose.
In Sec. 5 we demonstrate how second-order variational equations and Newton’s method can be used to fit observational data to a dynamical -body model. Finally, we compare variational and finite-difference methods in Sec. 5.4, and conclude in Sec. 6 by outlining the next steps in using higher order variational equations efficiently in optimization problems and MCMC methods.
2 Derivation of differential equations
2.1 Variational Equations
In this section, we define what we mean by variational equations and introduce our notation. We follow the work of Morales-Ruiz et al. (2007) and start with an analytic differential equation of the form
In the case that we are interested in later, encodes the 3 position and 3 velocity coordinates for each particle. is then a vector field on . The dot represents a time derivative. Given a suitable set of initial conditions , an -body simulation allows us to calculate (or at least approximate) the solution to Eq 1. We denote this solution , a dimensional vector that depends on the initial conditions and time .
Our goal is to estimate the solution vector for different initial conditions, i.e. we want to approximate .
One way to do that is to simply solve the differential equation in Eq. 1 with the new initial conditions .
However, depending on the problem, finding the new solution with an -body integration can be either very inefficient or inaccurate
Thus, we are looking for a better way to estimate solutions for the initial conditions in a neighbourhood of . The approach we consider here uses the fact that one can expand around a reference solution in a power series. For simplicity, we first consider the case of varying a single scalar parameter on which the initial conditions depend, . If , then . This could correspond to varying a Cartesian component of , or a parameter that mixes Cartesian components, such as a planet’s orbital eccentricity. Then each component of can be expanded around the reference solution as a power series in ,
In the above equation, , i.e. the reference solution, and
i.e. a vector of the -th derivative of each of the reference solution’s components with respect to the parameter . For sufficiently small and , this approximation is accurate even if we terminate the series at a finite . The precise domain on which the solution can be trusted depends on the system and the initial conditions. For example, in chaotic dynamical systems, might grow exponentially fast, limiting the domain to relatively short times or small .
In conclusion, if one can obtain the , one can approximate all nearby solutions of . Each is a function of time and must be numerically integrated. We therefore seek their governing differential equations.
We henceforth denote the reference solution simply as . The solution , by definition, satisfies the original Eq. 1, in other words . We now take the derivative of this equation with respect , the parameter we are varying,
where we changed the order of the derivatives and made use of the chain rule. The summation index runs over all elements of the vector . The derivative of with respect to is the we seek for use in Eq. 2. Let us define the by matrix with components
Using the matrix we then arrive at a compact set of differential equations for the vector :
This equation is the first-order variational equation. We will later calculate the components of the matrix explicitly. We then solve for the vector by integrating the differential equation numerically. Note that depends on the time-dependent reference solution but not on . It is a linear operator acting on .
Repeating the steps above but differentiating Eq. 1 twice instead of once, we can write down the differential equation for . Because we apply the chain rule in the process, one finds that the time derivative of the second-order variations depends not only on but also on . Explicitly, the differential equation after two derivatives becomes
Defining the by by tensor with components
and using a short hand notation that suppresses the summation indices as well as the function arguments (we give explicit component forms for the general case in Sec. 2.3), we have
This set of equations is not linear anymore. But note that the linear term of the second line is the same as in the first line.
Higher order equations can be constructed in a straightforward way. Using the shorthand notation makes this particularly easy. One can reintroduce the indices at the end of the calculation. In this paper, we will only use variational equations up to second order (see e.g. Morales-Ruiz et al., 2007, for equations up to order 3).
2.2 Initial conditions
To integrate a differential equation forward in time, one needs appropriate initial conditions. To obtain the initial conditions for and , one simply applies the chain rule to Eq. 3 and evaluates it at ,
In the case where the varied parameter corresponds to a Cartesian component, choosing the initial conditions for and is straightforward. If we assume the varied parameter has the coordinate index , then the initial conditions for in component form are
where is the Kronecker delta. Thus
In practice, the function can be very complicated. As an example, let us consider a planetary system with one planet of mass on an initially circular and coplanar orbit around a star with mass . The initial conditions of the planet might then be defined through the semi-major axis as
If we vary the initial semi-major axis by some length , then the initial conditions for the first-order variation are given by Eq. 10, in our case
The initial conditions for the second-order variation are given by Eq. 11, which for the present case are
The components of and that we calculated above correspond to the planet. All components corresponding to the star are 0.
The initialization can quickly get complicated. Suppose we work in the centre-of-mass frame. Then the star’s initial conditions will also depend on the semi-major axis of the planet. Similarly, if we add an additional outer planet and work in Jacobi coordinates, the outer planet’s initial conditions depend on the inner planet’s orbital parameters. For that reason we’ve implemented convenience functions for the initialization of orbits which we present later in Sec. 4.3.
2.3 Multiple sets of variational equations
The above derivation of variational equations can be straightforwardly generalized when varying multiple parameters. Consider varying the initial value of separate parameters . Here and in the rest of this paper Greek variables indicate to variations with respect to one parameter and will run over the interval . We write all equations in this section in component form for direct comparison with our later results.
When varying several parameters, the coupled set of differential equations, Eq. 9, becomes
where , and . Therefore, when varying parameters, there are sets of first-order variational equations, one for each of the vectors . There are sets of second-order variational equations (one for each of the vectors ). Each set of variational equations has components.
Once numerically integrated, these variations can then be plugged into a multi-variate power series expansion analog to Eq. 2 to obtain trajectories for arbitrary nearby initial conditions. Explicitly, to second order,
Note that because derivatives commute we find that . Thus the total number of differential equations we need to integrate for the second order variations can be reduced from to . This is in addition to the differential equations for the reference simulation and equations for the first order variations.
2.4 Index convention
As we saw above, the number of indices in second-order expressions is high. We therefore adopt a consistent index notation for the remainder of this paper. Specifically, we will consider a dynamical system consisting of particles and use the indices and to label different particles. These indices thus run from to . The indices and label coordinate axes. Above, these indices ran over the coordinates of the -body system. We will find below that for the -body system, it is simpler to consider positions and velocities separately. Therefore, in what follows and will run over the Cartesian , , and components only. As before, we will also make use of Greek characters and to indicate different sets of variational equations corresponding to different varied parameters (different variations). In the following sections we explicitly write summation symbols, i.e., we do not use a summation convention over repeating indices.
3 Variational Equations for the -body System
Let us now derive the differential equations from above for a specific problem: the dynamical system of gravitationally interacting particles. The differential equation for the -body problem in vector notation is
where and is the norm of . We can also write the equation in component form
where is the relative position between particles and . This is a second-order differential equation. However, note that the first time-derivative of the position is just the velocity, . The differential equation can thus easily be brought into the form of Eq. 1 by introducing
We end up with a first-order differential equation with twice as many variables as Eq. 19 ( compared to ). This set of differential equations together with suitable initial conditions completely describes the -body problem.
3.1 First-order variational equations
To derive the first-order variational equation for the -body problem, we start by differentiating Eq. 20 with respect to . To be as explicit as possible we do this in component form. We end up with an equation with four indices. Two of the indices run over coordinates, and two over particles,
The above expression gives us the matrix elements of (Eq. 5). Note that two indices and combined correspond to the row of that matrix, the other two ( and ) correspond to the column of the matrix.
We also want to consider the influence of varying masses. Note that one can think of the masses as part of the initial conditions. However, we assume that the masses do not vary with time after the system has been initialized, thus . We need the derivative of the force with respect to the mass:
We can now write down the differential equation for the first-order variational equation using Eq. 6. We could do this in terms of and its components, but choose to use two separate vectors for the variational position and velocity components (to be consistent with Eq. 19). Variational quantities are denoted by double-striped symbols , and . First-order variational quantities receive the superscript . Thus, one might write in terms of and :
which should be compared to Eq. 21. The units of and depend on the variation we are considering. In general the units are not the same as those of and (see Eq. 10). We end up with the following set of equations for the components of (corresponding to the second half of the components of ):
where . We can rewrite this in vector notation, which allows us to drop the indices and :
The represent the usual vector product. The equations for the variational positions, (the first half of the components of ), are significantly easier to write down:
As mentioned before, the masses are assumed to be constant throughout a simulation; thus, the variational equation for the mass coordinates are not dynamic:
The solutions are trivial, , and we therefore do not evolve the quantities .
3.2 Second-order variational equations
We now derive the second-order variational equations. As a warning to the reader: this will get messy. We nevertheless present the calculation in full detail as it is easy to get confused with up to 7 indices in a single term. This should ease derivations for alternate dynamical systems, for example if one want to include additional non-gravitational effects. Conceptually, this is the same procedure as in the previous section, just to second order. Because of the chain rule, we end up with significantly more terms.
We begin by calculating various second-order derivatives that we will need later. The second-order derivative of the force with respect to the positions is
We look at both cases individually. The first case, , gives
The second case, , is similar but with the sign reversed and without the summation:
We also need the derivatives with respect to the particles’ masses. Luckily, if we differentiate the force twice with respect to mass, we get zero:
However, other second derivatives involving the mass are not zero:
Restricting ourselves to the case,
such that after putting all cases together we arrive at
With the above expressions of the second order force derivatives, we can now construct the second-order variational equations. At this point we introduce two more indices that describe the variation under consideration, and . They run over all the variations that we want to consider. In vector notation Eq. 9 can be expressed as
We replace the derivatives with what we calculated above. The result is a rather long expression with 7 different (summation) indices: