# Structure-preserving Method for Reconstructing Unknown Hamiltonian Systems from Trajectory Data

###### Abstract

We present a numerical approach for approximating unknown Hamiltonian systems using observation data. A distinct feature of the proposed method is that it is structure-preserving, in the sense that it enforces conservation of the reconstructed Hamiltonian. This is achieved by directly approximating the underlying unknown Hamiltonian, rather than the right-hand side of the governing equations. We present the technical details of the proposed algorithm and its error estimate, along with a practical de-noising procedure to cope with noisy data. A set of numerical examples is then presented to demonstrate the structure-preserving property and effectiveness of the algorithm.

Key words. data-driven discovery, Hamiltonian systems, equation approximation, structure-preserving method

## 1 Introduction

Data-driven discovery of physical laws has received an increasing amount of attention recently. While earlier attempts such as [2, 35] used symbolic regression to select the proper physical laws and determine the underlying dynamical systems, more recent efforts tend to treat the problem as an approximation problem. In this approach, the sought-after governing equation is treated as an unknown target function relating the data of the state variables to their temporal derivatives. Methods along this line usually seek exact recovery of the equations by applying certain sparse approximation techniques (e.g., [37]) to a large dictionary of candidate terms; see, for example, [4]. Many studies have been conducted to deal effectively with noise in data [4, 33], corruption in data [38], limited data [34], partial differential equations [30, 32], etc., and in conjunction with other methods such as model selection [21], Koopman theory [3], and Gaussian process regression [26], to name a few. Standard approximations using polynomials without seeking exact recovery can also be effective (cf. [41]). More recently, there has been a surge of interest in tackling the problem with machine learning methods, particularly neural networks ([27, 28]), for systems involving ordinary differential equations (ODEs) [10, 29, 31, 24] and partial differential equations (PDEs) [22, 20, 14, 12, 25, 19]. Neural network structures such as the residual network (ResNet) were shown to be highly suitable for this type of problem ([24]).

Hamiltonian systems are an important class of governing equations in science and engineering. One of the most important properties of a Hamiltonian system is the conservation of the Hamiltonian, which is usually a nonlinear function of the state variables, along trajectories. Research efforts have been devoted to estimating the Hamiltonian function of a given system from measurements (cf. [36, 40, 11, 1, 18]). However, few studies exist on reconstructing an unknown Hamiltonian dynamical system from trajectory data of its state variables.

In this paper, we present a numerical approach to reconstruct an unknown Hamiltonian system from its trajectory data. The focus of this paper is on the conservation of the reconstructed Hamiltonian along the solution trajectories. The current method is an extension of the method proposed in [41], which seeks accurate approximation of unknown governing equations using orthogonal polynomials. However, instead of approximating the governing equations directly, as in [41] and most other existing studies, our method seeks to approximate the unknown Hamiltonian first and then reconstructs the approximate governing equations from the approximate Hamiltonian. The approximation of the unknown Hamiltonian is conducted using orthogonal polynomials and with controllable numerical errors. Since in most practical situations the Hamiltonian takes the form of a smooth function, polynomial approximation can achieve high-order accuracy with a modest polynomial degree. The resulting approximate governing equations, which are derived from the reconstructed Hamiltonian, then automatically satisfy the conservation of the reconstructed Hamiltonian, which is itself an accurate approximation of the true Hamiltonian. This structure-preserving (SP) property, i.e., the conservation of the Hamiltonian along trajectories, is a distinctly new feature of the present method, not found in most of the existing studies. Along with a detailed exposition of the algorithm, we also provide an error estimate of the method and use a set of numerical examples to demonstrate its properties.

## 2 Preliminaries

In this section, we introduce some basics about Hamiltonian systems and the setup of our data-driven numerical approximation to the Hamiltonian systems.

### 2.1 Hamiltonian Systems

Let us consider a Hamiltonian system

$$\frac{dq}{dt} = \nabla_p H(p, q), \qquad \frac{dp}{dt} = -\nabla_q H(p, q), \tag{2.1}$$

where $q$ and $p$ are column vectors in $\mathbb{R}^d$, and $H(p, q)$ is a continuously differentiable scalar function called the Hamiltonian. It often represents the total energy of the system.

Let $x = (q^T, p^T)^T \in \mathbb{R}^{2d}$ be the state variable vector. The Hamiltonian system (2.1) can be equivalently written as

$$\frac{dx}{dt} = \mathbf{J}\,\nabla_x H(x), \tag{2.2}$$

where $\nabla_x$ stands for the full gradient and the matrix $\mathbf{J}$ takes the form

$$\mathbf{J} = \begin{pmatrix} 0 & I_d \\ -I_d & 0 \end{pmatrix}, \tag{2.3}$$

with $I_d$ and $0$ being the identity matrix and zero matrix of size $d \times d$, respectively. Hereafter, we will use $\nabla$ in place of $\nabla_x$, unless confusion arises otherwise.

The Hamiltonian system (2.2) is an autonomous system and conserves the Hamiltonian along the integral curves (cf. [23]). That is, the solution of the Hamiltonian system (2.2) satisfies

$$H\big(x(t; x_0)\big) = H(x_0), \qquad \forall\, t \ge t_0,$$

where $x_0$ is the initial state of the system at $t_0$, and $x(t; x_0)$ stands for the solution at time $t$ with initial state $x_0$.
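This conservation is an immediate consequence of the skew-symmetry of $\mathbf{J}$, since $\frac{d}{dt}H(x(t)) = \nabla H \cdot \mathbf{J}\nabla H = 0$. Below is a minimal numerical check of this identity; the harmonic-oscillator Hamiltonian is a toy example chosen purely for illustration, not one from the paper.

```python
import numpy as np

d = 1  # one (q, p) pair; the structure is identical for larger d

# Symplectic matrix J of size 2d x 2d
I = np.eye(d)
Z = np.zeros((d, d))
J = np.block([[Z, I], [-I, Z]])

def grad_H(x):
    # Toy Hamiltonian H(q, p) = (q^2 + p^2) / 2 (harmonic oscillator);
    # its gradient is simply x itself.
    return x

x = np.array([0.3, -1.2])            # state (q, p)
dH_dt = grad_H(x) @ J @ grad_H(x)    # dH/dt = (grad H)^T J (grad H)

# Skew-symmetry of J forces dH/dt to vanish identically.
assert abs(dH_dt) < 1e-14
assert np.allclose(J.T, -J)
```

The same identity holds for any continuously differentiable $H$, which is exactly why deriving the reconstructed equations from an approximate Hamiltonian guarantees conservation.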

### 2.2 Data and Problem Setup

We assume that the equations of the Hamiltonian system (2.1), or equivalently (2.2), are not known. Instead, we have data of the solution states along various trajectories at various time instances. Our goal is to accurately reconstruct the form of the governing equations while enforcing the conservation of the Hamiltonian at the same time.

Let $N$ be the number of trajectories and $x_0^{(i)}$, $i = 1, \dots, N$, be the corresponding initial states of these trajectories. Let $t_1^{(i)} < t_2^{(i)} < \cdots < t_{L_i}^{(i)}$, $i = 1, \dots, N$, be a sequence of time instances on the trajectories. We assume that the state variable data are available at these time instances, i.e.,

$$x_j^{(i)} = x\big(t_j^{(i)}; x_0^{(i)}\big) + \epsilon_j^{(i)}, \qquad i = 1, \dots, N, \quad j = 1, \dots, L_i, \tag{2.4}$$

where $\epsilon_j^{(i)}$ are observational noises. That is, we have data of the state variables along $N$ trajectories, the $i$-th of which contains $L_i$ data points. The total number of data points is therefore $\sum_{i=1}^{N} L_i$.

Let $D \subset \mathbb{R}^{2d}$ be a bounded domain. It is a domain of interest, inside which we seek to construct an accurate approximation to the unknown governing equations (2.2). We assume that all trajectory data are inside $D$. For notational convenience and without loss of generality, hereafter we assume that the observation data are available at the same time instances $t_1 < \cdots < t_L$ for all trajectories, with a constant time step $\Delta t$. This implies $t_{j+1} - t_j = \Delta t$, for $j = 1, \dots, L - 1$. We denote the data as

$$x_j^{(i)} = x\big(t_j; x_0^{(i)}\big) + \epsilon_j^{(i)}, \qquad i = 1, \dots, N, \quad j = 1, \dots, L. \tag{2.5}$$

The total number of data points is thus $N \cdot L$.

### 2.3 De-noising and Time Derivatives

A common key ingredient in equation recovery methods is the requirement of accurate data on time derivatives of the state variables. When these time derivatives are not directly available, as in many practical situations, one needs to accurately estimate them using the state variable data (2.5).

For noiseless data, the time derivatives can be computed by a proper numerical differentiation method, for example, the second-order central finite difference

$$\dot{x}\big(t_j; x_0^{(i)}\big) \approx \frac{x_{j+1}^{(i)} - x_{j-1}^{(i)}}{2\Delta t}, \qquad j = 2, \dots, L - 1, \tag{2.6}$$

with proper one-sided second-order finite differences at the boundaries $t_1$ and $t_L$.
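This differentiation step can be sketched as follows (central differences in the interior, one-sided second-order formulas at the two ends); the function name and array layout are illustrative assumptions.

```python
import numpy as np

def time_derivative(x, dt):
    """Estimate dx/dt from states x of shape (L, n) on a uniform grid."""
    dxdt = np.empty_like(x, dtype=float)
    dxdt[1:-1] = (x[2:] - x[:-2]) / (2 * dt)                 # central
    dxdt[0]  = (-3 * x[0] + 4 * x[1] - x[2]) / (2 * dt)      # forward, 2nd order
    dxdt[-1] = (3 * x[-1] - 4 * x[-2] + x[-3]) / (2 * dt)    # backward, 2nd order
    return dxdt

# Sanity check on x(t) = t^2, whose exact derivative is 2t; all three
# second-order formulas are exact on quadratics.
t = np.linspace(0.0, 1.0, 11)
x = (t ** 2)[:, None]
err = np.max(np.abs(time_derivative(x, t[1] - t[0]) - 2 * t[:, None]))
assert err < 1e-12
```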

For noisy data, direct numerical differentiation becomes less robust, as the noise-induced error in the estimated derivatives scales like the noise level divided by $\Delta t$. Several approaches have been developed for numerical differentiation of noisy data; see, for example, [15, 39, 9, 5, 16]. In this paper, we employ a straightforward approach to de-noise the data. We first construct a least squares polynomial approximation of the trajectory data and then differentiate the fitted polynomial analytically. This approach has been shown to be effective for equation recovery ([41]). More specifically, let us consider the $i$-th trajectory data in (2.5), where $1 \le i \le N$. The least squares polynomial approximation is to find a polynomial vector $p^{(i)} \in [\mathbb{P}_Q]^{2d}$ such that

$$p^{(i)} = \operatorname*{argmin}_{q \in [\mathbb{P}_Q]^{2d}} \sum_{j=1}^{L} \big\| q(t_j) - x_j^{(i)} \big\|_2^2, \tag{2.7}$$

where $\|\cdot\|_2$ denotes the vector 2-norm, and $\mathbb{P}_Q$ denotes the space of polynomials of degree at most $Q$, with $Q < L$. Once the least squares fitting problem is solved, the time derivatives of the state variables can be approximated by the analytical derivatives of the fitted polynomials,

$$\dot{x}\big(t_j; x_0^{(i)}\big) \approx \dot{p}^{(i)}(t_j), \qquad i = 1, \dots, N, \quad j = 1, \dots, L. \tag{2.8}$$

This approach also provides a filter to de-noise the noisy trajectory data. For noisy data (2.5), we advocate the use of the filtered trajectory data in place of the original noisy data, i.e.,

$$x_j^{(i)} \leftarrow p^{(i)}(t_j), \qquad i = 1, \dots, N, \quad j = 1, \dots, L. \tag{2.9}$$

Our numerical experiments indicate that this filtering procedure can improve the learning accuracy when the trajectory data (2.5) are noisy.
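A possible implementation of this fit-then-differentiate filter, using NumPy's polynomial least squares routines; the degree, noise level, and test trajectory below are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def denoise_and_differentiate(t, x, deg):
    """Fit each state component by a degree-`deg` polynomial in the least
    squares sense; return filtered states and their analytic derivatives
    at the sample times.  t: (L,), x: (L, n) noisy states."""
    coeffs = P.polyfit(t, x, deg)               # shape (deg + 1, n)
    x_filt = P.polyval(t, coeffs).T             # filtered trajectory, (L, n)
    dx_dt  = P.polyval(t, P.polyder(coeffs)).T  # analytic derivatives, (L, n)
    return x_filt, dx_dt

# Noisy samples of x(t) = (sin t, cos t) with small multiplicative noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0, 41)
x_true = np.column_stack([np.sin(t), np.cos(t)])
x_noisy = x_true * (1 + 0.02 * rng.uniform(-1, 1, x_true.shape))

x_filt, dx_dt = denoise_and_differentiate(t, x_noisy, deg=6)
assert x_filt.shape == x_true.shape
assert np.max(np.abs(x_filt - x_true)) < 0.05   # filtered states near truth
```

The fitted polynomial averages out the pointwise noise, which is why differentiating it is far more stable than applying finite differences to the raw noisy samples.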

Once the time derivatives of the state variables become available, we have in our possession a set of data pairs

$$\big(x_k,\, \dot{x}_k\big), \qquad k = 1, \dots, M, \tag{2.10}$$

where the total number of data pairs is $M = N \cdot L$.

## 3 The Main Method

With the data pairs (2.10), our goal is now to accurately approximate the unknown Hamiltonian system (2.2). Let $f(x) := \mathbf{J}\nabla H(x)$ denote the unknown right-hand side of (2.2). We seek an accurate approximation $\tilde{f} \approx f$ such that

$$\frac{d\tilde{x}}{dt} = \tilde{f}(\tilde{x}) \tag{3.1}$$

is an accurate approximation of the true system (2.2). Our key goal is to ensure that the approximate system is also Hamiltonian, in the sense that $\tilde{f} = \mathbf{J}\nabla\tilde{H}$, where $\tilde{H}$ becomes an approximation to the true (and unknown) Hamiltonian. Existing methods for equation recovery seek to approximate the right-hand side of the true system directly and therefore do not enforce the conservation of a Hamiltonian.

### 3.1 Algorithm

To preserve the Hamiltonian, we propose to directly approximate the unknown Hamiltonian first and then derive the approximate governing equations from the approximate Hamiltonian.

Let us assume the unknown Hamiltonian $H \in W$, a weighted Sobolev space on the domain $D$ equipped with the inner product

$$\langle u, v \rangle = \int_D \big( u\,v + \nabla u \cdot \nabla v \big)\, d\mu,$$

where $\mu$ is a probability measure defined on $D$. We also assume

$$\sup_{x \in D} \big\| \nabla H(x) \big\|_2 \le K < \infty. \tag{3.2}$$

Let $V \subset W$ be a finite dimensional subspace. We then define its associated gradient function space as

$$V^\nabla := \big\{ \mathbf{J}\nabla\phi \,:\, \phi \in V \big\}, \tag{3.3}$$

and set $n := \dim V^\nabla$. Let $\{\psi_k\}_{k=1}^{n}$ be a basis for $V^\nabla$. Then, for each $\psi_k$, there exists a function $\phi_k \in V$ such that

$$\psi_k = \mathbf{J}\nabla\phi_k, \qquad k = 1, \dots, n. \tag{3.4}$$

We then seek $\tilde{H} \in V$ as an approximation to the true Hamiltonian $H$, and $\tilde{f} := \mathbf{J}\nabla\tilde{H} \in V^\nabla$ as an approximation to $f$. Assuming $n < M$, i.e., the dimension of the linear subspace is smaller than the total number of available data pairs (2.10), we define the following least squares problem:

$$\tilde{f} = \operatorname*{argmin}_{g \in V^\nabla} \sum_{j=1}^{M} \big\| g(x_j) - \dot{x}_j \big\|_2^2, \tag{3.5}$$

where $\|\cdot\|_2$ denotes the vector 2-norm.

With the basis (3.4), $\tilde{f}$ can be expressed as

$$\tilde{f} = \sum_{k=1}^{n} c_k \psi_k, \tag{3.6}$$

and the approximate Hamiltonian can then be expressed as

$$\tilde{H} = \sum_{k=1}^{n} c_k \phi_k + C, \tag{3.7}$$

where $C$ is an arbitrary constant. The problem (3.5) is then equivalent to the following problem for the unknown coefficient vector $c = (c_1, \dots, c_n)^T$:

$$\min_{c \in \mathbb{R}^n} \big\| A c - b \big\|_2^2, \tag{3.8}$$

where

$$A := \begin{pmatrix} \psi_1(x_1) & \cdots & \psi_n(x_1) \\ \vdots & & \vdots \\ \psi_1(x_M) & \cdots & \psi_n(x_M) \end{pmatrix}, \qquad b := \begin{pmatrix} \dot{x}_1 \\ \vdots \\ \dot{x}_M \end{pmatrix}, \tag{3.9}$$

with each $\psi_k(x_j)$ and $\dot{x}_j$ entering as a column vector in $\mathbb{R}^{2d}$, so that $A \in \mathbb{R}^{2dM \times n}$ and $b \in \mathbb{R}^{2dM}$.

This is an over-determined system of equations and can be readily solved. Upon solving this least squares problem, we obtain $\tilde{H}$ and subsequently $\tilde{f} = \mathbf{J}\nabla\tilde{H}$, which gives us the approximate system of equations (3.1). It is trivial to see that the system preserves the approximate Hamiltonian in the following sense.
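The assembly and solution of this least squares problem can be sketched as follows for $d = 1$ (state $x = (q, p)$), using monomials as a hypothetical, non-orthonormal basis of $V$; all names and the verification Hamiltonian below are illustrative, not the paper's setup.

```python
import numpy as np
from itertools import product

J = np.array([[0.0, 1.0], [-1.0, 0.0]])   # canonical symplectic matrix, d = 1

def basis_grads(x, N):
    """Gradients of the monomials q^a p^b, 1 <= a+b <= N, at points x.
    Returns shape (M, K, 2): M points, K basis functions."""
    q, p = x[:, 0], x[:, 1]
    g = []
    for a, b in product(range(N + 1), repeat=2):
        if 1 <= a + b <= N:
            g.append(np.stack([a * q ** max(a - 1, 0) * p ** b,
                               b * q ** a * p ** max(b - 1, 0)], axis=-1))
    return np.stack(g, axis=1)

def fit_hamiltonian(x, xdot, N):
    """Solve min_c sum_j || sum_k c_k J grad(phi_k)(x_j) - xdot_j ||^2."""
    G = basis_grads(x, N)                        # (M, K, 2)
    A = np.einsum('ab,mkb->mka', J, G)           # psi_k = J grad(phi_k)
    M, K, _ = A.shape
    A = A.transpose(0, 2, 1).reshape(2 * M, K)   # stack both vector components
    c, *_ = np.linalg.lstsq(A, xdot.reshape(-1), rcond=None)
    return c

# Synthetic noiseless pairs from H(q, p) = (q^2 + p^2)/2, i.e. xdot = (p, -q).
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=(50, 2))
xdot = np.column_stack([x[:, 1], -x[:, 0]])

c = fit_hamiltonian(x, xdot, N=2)
# Basis order: p, p^2, q, qp, q^2 -> expect coefficients (0, 1/2, 0, 0, 1/2).
assert np.allclose(c, [0.0, 0.5, 0.0, 0.0, 0.5], atol=1e-10)
```

Because the recovered coefficients multiply both $\psi_k$ and $\phi_k$, the fitted right-hand side is by construction the symplectic gradient of the fitted Hamiltonian.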

###### Theorem 3.1

Let $\tilde{x}(t; \tilde{x}_0)$ be the solution of the system (3.1) with initial state $\tilde{x}_0$. Then

$$\tilde{H}\big(\tilde{x}(t; \tilde{x}_0)\big) = \tilde{H}(\tilde{x}_0), \qquad \forall\, t \ge t_0.$$

### 3.2 Analysis

We now present error analysis for the proposed algorithm. Our analysis utilizes a few basic results from [7] for least squares polynomial approximations. For clarity of the analysis, we assume the basis functions $\{\psi_k\}$ are orthonormal in the following sense:

$$\int_D \psi_k \cdot \psi_l \, d\mu = \delta_{kl}, \qquad 1 \le k, l \le n.$$

Note that this assumption is only needed for the theoretical analysis. The practical computation of $\tilde{f}$ can be conducted using any basis of $V^\nabla$, as the solution does not depend on the choice of basis. (Also, any non-orthogonal basis can be orthogonalized by the Gram-Schmidt procedure.)

For ease of notation only, we assume each trajectory contributes only one data pair to (2.10), i.e., $L = 1$. The total number of data pairs is therefore $M = N$, and (2.10) is written in the shortened notation

$$\big(x_k,\, \dot{x}_k\big), \qquad k = 1, \dots, M.$$

In the following, we assume that the initial states $x_0^{(k)}$, $k = 1, \dots, M$, are i.i.d. drawn from the probability measure $\mu$ on $D$.

#### 3.2.1 Stability

The following stability result holds for the least squares problem (3.8).

###### Lemma 3.2

Consider the problem (3.8), and let $G \in \mathbb{R}^{n \times n}$ be the normalized Gram matrix with entries $G_{kl} = \frac{1}{M}\sum_{j=1}^{M} \psi_k(x_j)\cdot\psi_l(x_j)$. It holds that, for any $0 < \delta < 1$,

$$\Pr\big( \|G - I\| > \delta \big) \le 2n \exp\!\big( -c_\delta\, M / K_n \big), \tag{3.10}$$

where $c_\delta := (1+\delta)\ln(1+\delta) - \delta > 0$, and

$$K_n := \sup_{x \in D} \kappa(x), \qquad \kappa(x) := \sum_{k=1}^{n} \big\| \psi_k(x) \big\|_2^2. \tag{3.11}$$

Proof. The proof is a direct extension of the proof of Theorem 1 in [7] (see also [8] for a correction).

###### Remark 3.1

The function $\kappa(x)$ is the “diagonal” of the reproducing kernel of $V^\nabla$. It is independent of the choice of the orthonormal basis and only depends on the space $V^\nabla$.

The following result is a direct consequence of Lemma 3.2 with $\delta = 1/2$.

###### Corollary 3.3

The least squares problem (3.8) is stable in the following sense: for any $r > 0$,

$$\Pr\big( \|G - I\| > 1/2 \big) \le 2 M^{-r}, \tag{3.12}$$

provided that

$$K_n \le \frac{c_{1/2}}{1 + r} \cdot \frac{M}{\ln M}, \qquad c_{1/2} = \tfrac{3}{2}\ln\tfrac{3}{2} - \tfrac{1}{2}. \tag{3.13}$$

#### 3.2.2 Error bound

To analyze the errors in the proposed algorithm, we consider only the noiseless data case. For noisy data, the analysis is considerably more involved and will be pursued in a separate study.

For noiseless data, we have $\epsilon_j^{(i)} = 0$ for all $i, j$; that is, the trajectory data are exact. The derivative data in (2.10) thus contain only the errors from their numerical approximation. Here we assume such approximation errors are uniformly bounded:

$$\max_{1 \le k \le M} \big\| \dot{x}_k - f(x_k) \big\|_2 \le \eta, \tag{3.14}$$

where $\eta \ge 0$ depends only on the regularity of the true solution in the time interval and the accuracy of the numerical differentiation method.

###### Theorem 3.4

For any $r > 0$, under the condition (3.13), it holds that

$$\mathbb{E}\Big( \big\| f - \tilde{f} \big\|_{L^2_\mu}^2 \Big) \le \big(1 + \zeta(M)\big)\Big( \big\| f - P_{V^\nabla} f \big\|_{L^2_\mu} + \eta \Big)^2 + 8K^2 M^{-r},$$

where the expectation is taken over the random sequence of initial states, $r$ is the exponent in (3.13), $K$ is the bound defined in (3.2), $\zeta(M)$ is a positive quantity, proportional to $K_n/M$, that tends to zero as $M \to \infty$ under (3.13), and $P_{V^\nabla} f$ denotes the orthogonal projection of $f$ onto $V^\nabla$, i.e., the best approximation to $f$ in $V^\nabla$:

$$P_{V^\nabla} f = \operatorname*{argmin}_{g \in V^\nabla} \big\| f - g \big\|_{L^2_\mu}.$$

Proof. See Appendix A.

As a direct consequence of Theorem 3.4, we have the following corollary.

###### Corollary 3.5

Assume that $\zeta(M) \le 1$. Then, for any $r > 0$, under the condition (3.13), the following result holds:

$$\mathbb{E}\Big( \big\| f - \tilde{f} \big\|_{L^2_\mu}^2 \Big) \le 2\Big( \big\| f - P_{V^\nabla} f \big\|_{L^2_\mu} + \eta \Big)^2 + 8K^2 M^{-r}.$$

We now discuss the error bound for the reconstructed Hamiltonian

$$\tilde{H} = \sum_{k=1}^{n} c_k \phi_k + C. \tag{3.15}$$

###### Theorem 3.6

Assume $D$ is a bounded connected open subset of $\mathbb{R}^{2d}$ with Lipschitz boundary, and let $\mu$ be the uniform measure on $D$. Then, there exists a real constant $C^*$ such that

$$\big\| H - \tilde{H} - C^* \big\|_{L^2_\mu} \le C_P \big\| \nabla(H - \tilde{H}) \big\|_{L^2_\mu} = C_P \big\| f - \tilde{f} \big\|_{L^2_\mu}, \tag{3.16}$$

where $C_P$ is a constant depending only on the domain $D$ and the dimensionality $d$. Furthermore, under the assumptions of Corollary 3.5, we have

$$\mathbb{E}\Big( \big\| H - \tilde{H} - C^* \big\|_{L^2_\mu}^2 \Big) \le C_P^2 \bigg( 2\Big( \big\| f - P_{V^\nabla} f \big\|_{L^2_\mu} + \eta \Big)^2 + 8K^2 M^{-r} \bigg).$$

Proof. Let us take

$$C^* = \frac{1}{\mu(D)} \int_D \big( H - \tilde{H} \big)\, d\mu,$$

the mean of $H - \tilde{H}$ over $D$. Using the Poincaré inequality (cf. [17]) we obtain the inequality in (3.16); the equality follows since $\mathbf{J}$ is orthogonal, so that $\|\nabla(H - \tilde{H})\|_{L^2_\mu} = \|\mathbf{J}\nabla(H - \tilde{H})\|_{L^2_\mu} = \|f - \tilde{f}\|_{L^2_\mu}$. The proof is then completed upon combining (3.16) with Corollary 3.5.

## 4 Numerical Examples

In this section we present numerical examples to demonstrate the properties and effectiveness of the proposed method.

In all the test cases, we generate synthetic trajectory data by solving certain Hamiltonian systems with a high resolution numerical solver. The proposed numerical method is then applied to the data to produce the corresponding approximate Hamiltonian systems, whose solutions are compared against those of the true Hamiltonian systems to examine the numerical errors.
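As an illustration of this data-generation step, the sketch below produces many short trajectory "bursts" from random initial states, with a fine-step RK4 integrator standing in for the high resolution solver; the harmonic-oscillator right-hand side and all numerical values here are hypothetical placeholders, not the paper's settings.

```python
import numpy as np

def rk4_step(f, x, dt):
    """One classical fourth-order Runge-Kutta step for xdot = f(x)."""
    k1 = f(x); k2 = f(x + dt/2*k1); k3 = f(x + dt/2*k2); k4 = f(x + dt*k3)
    return x + dt/6 * (k1 + 2*k2 + 2*k3 + k4)

def generate_bursts(f, n_traj, n_steps, dt, lo, hi, seed=0):
    """Short trajectories from random initial states in the box [lo, hi]."""
    rng = np.random.default_rng(seed)
    data = []
    for _ in range(n_traj):
        x = rng.uniform(lo, hi)
        burst = [x]
        for _ in range(n_steps):
            x = rk4_step(f, x, dt)
            burst.append(x)
        data.append(np.array(burst))
    return np.array(data)            # shape (n_traj, n_steps + 1, dim)

# Placeholder system: harmonic oscillator xdot = (p, -q) on [-1, 1]^2.
f = lambda x: np.array([x[1], -x[0]])
data = generate_bursts(f, n_traj=100, n_steps=2, dt=0.02,
                       lo=np.array([-1.0, -1.0]), hi=np.array([1.0, 1.0]))
assert data.shape == (100, 3, 2)
```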

Without loss of generality, we employ polynomial basis functions in all the numerical examples. Specifically, we set the finite dimensional subspace as $V = \mathbb{P}_N(D)$, the linear subspace of polynomials of total degree up to $N$. That is,

$$\mathbb{P}_N(D) = \operatorname{span}\big\{ x^\alpha := x_1^{\alpha_1} \cdots x_{2d}^{\alpha_{2d}} \,:\, |\alpha| \le N \big\},$$

where $\alpha = (\alpha_1, \dots, \alpha_{2d})$ is a multi-index with $|\alpha| = \alpha_1 + \cdots + \alpha_{2d}$. In all examples, we use Legendre polynomials.

The gradient function space $V^\nabla$ is defined via (3.3), and we have

$$\dim V^\nabla = \dim V - 1 = \binom{N + 2d}{2d} - 1,$$

since the gradient removes the constant mode. The basis functions of $V$ are set as the tensor products $\phi_\alpha(x) = L_{\alpha_1}(x_1) \cdots L_{\alpha_{2d}}(x_{2d})$, $|\alpha| \le N$, where $L_k$ are the univariate Legendre polynomials of degree $k$.

For noiseless data, we employ the second-order finite difference method (2.6) to estimate the time derivatives. For noisy data, we use the polynomial least squares de-noising (2.7)-(2.9) with a fixed polynomial degree.
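For concreteness, a possible construction of such a total-degree Legendre product basis (constant mode excluded, with the domain mapped affinely to $[-1, 1]^n$) is sketched below; the function name and interface are illustrative assumptions only.

```python
import numpy as np
from numpy.polynomial import legendre as Leg
from itertools import product
from math import comb

def legendre_total_degree_basis(x, N, lo=-1.0, hi=1.0):
    """Evaluate the tensor-product Legendre basis of total degree <= N
    (constant excluded) at points x in [lo, hi]^n.  x: (M, n) -> (M, K)."""
    M, n = x.shape
    z = 2 * (x - lo) / (hi - lo) - 1   # affine map of the domain to [-1, 1]^n
    # 1-D Legendre values P_0..P_N at each coordinate: shape (n, M, N + 1)
    vals = np.stack([np.stack([Leg.legval(z[:, i], [0] * k + [1])
                               for k in range(N + 1)], axis=-1)
                     for i in range(n)])
    out = []
    for alpha in product(range(N + 1), repeat=n):
        if 1 <= sum(alpha) <= N:       # total degree between 1 and N
            out.append(np.prod([vals[i, :, alpha[i]] for i in range(n)],
                               axis=0))
    return np.stack(out, axis=1)

# Dimension check: binom(N + n, n) - 1 basis functions (constant removed).
x = np.random.default_rng(2).uniform(-1, 1, (10, 2))
Phi = legendre_total_degree_basis(x, N=3)
assert Phi.shape == (10, comb(3 + 2, 2) - 1)
```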

Once the approximate system (3.1) is constructed, we simulate its trajectories from some arbitrarily chosen initial state $x_0$ and compare them against the trajectories produced by the exact Hamiltonian system from the same initial state $x_0$. All errors are reported as relative errors of the form

$$E(t) = \frac{\big\| x(t; x_0) - \tilde{x}(t; x_0) \big\|_2}{\big\| x(t; x_0) \big\|_2}.$$

The evolution of the learned Hamiltonian $\tilde{H}(\tilde{x}(t))$ will also be compared with that of the true Hamiltonian $H(x(t))$, with the free constant $C$ in (3.7) set to zero.

#### Example 1: Single pendulum

The Hamiltonian of the ideal single pendulum with unit mass is its total energy

$$H(p, q) = \frac{p^2}{2\ell^2} + g\ell\big(1 - \cos q\big),$$

where $\ell$ is the length of the pendulum, $q$ is the angular displacement of the pendulum from its downward equilibrium position, $p$ the angular momentum, and $g$ the gravitational constant. The true Hamiltonian formulation of the dynamics is

$$\frac{dq}{dt} = \frac{\partial H}{\partial p} = \frac{p}{\ell^2}, \qquad \frac{dp}{dt} = -\frac{\partial H}{\partial q} = -g\ell\sin q. \tag{4.1}$$

We set the pendulum parameters and the computational domain $D$ accordingly. The data pairs (2.10) consist of short trajectories, each of which is generated from a random initial state in $D$ and contains a small number of time steps. All data are then perturbed by a multiplicative factor $1 + \epsilon$, where $\epsilon$ is i.i.d. uniformly distributed around zero; this corresponds to relative noise in all data.

The Hamiltonian is approximated with polynomials up to a fixed total degree. The numerical solution of the approximate Hamiltonian system is denoted as $\tilde{x}(t)$. To assess the accuracy of the algorithm, we fix an arbitrarily chosen initial state $x_0$ and compute both the approximate solution $\tilde{x}(t; x_0)$ and the exact solution $x(t; x_0)$. For cross-comparison, we also implemented the equation approximation algorithm from [41], which directly approximates the right-hand side of the unknown governing equations and thus does not preserve a Hamiltonian; we refer to its solution as the non-SP solution.

In Fig. LABEL:fig:ex1_HErrU(a), we plot the evolution of the relative errors in the numerical solutions. We clearly observe that the errors of our structure-preserving (SP) algorithm are notably smaller than those of the non-SP algorithm from [41]. In Fig. LABEL:fig:ex1_HErrU(b), we examine the time evolution of the Hamiltonians. The exact Hamiltonian is obviously conserved along the exact solution trajectory, i.e., $H(x(t; x_0)) = H(x_0)$. The approximate Hamiltonian $\tilde{H}$ is observed to be exactly preserved along the approximate trajectory. However, the method from [41], albeit quite accurate, does not preserve any Hamiltonian; in fact, it is not associated with any particular approximate Hamiltonian.
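To illustrate the kind of Hamiltonian-drift monitoring behind these comparisons, the sketch below integrates a pendulum-type system $\dot{x} = \mathbf{J}\nabla H$ with a (non-symplectic) RK4 integrator and tracks the Hamiltonian along the numerical trajectory; the normalization $H(q, p) = p^2/2 - \cos q$ (unit mass, length, and gravity) is an assumption made for illustration, not the paper's parameter choice.

```python
import numpy as np

def H(x):
    q, p = x
    return 0.5 * p ** 2 - np.cos(q)

def f(x):                          # J grad H = (p, -sin q)
    q, p = x
    return np.array([p, -np.sin(q)])

def rk4(x, dt, steps):
    """Integrate xdot = f(x) with classical RK4; return the full trajectory."""
    traj = [x]
    for _ in range(steps):
        k1 = f(x); k2 = f(x + dt/2*k1); k3 = f(x + dt/2*k2); k4 = f(x + dt*k3)
        x = x + dt/6 * (k1 + 2*k2 + 2*k3 + k4)
        traj.append(x)
    return np.array(traj)

traj = rk4(np.array([1.0, 0.0]), dt=0.01, steps=1000)
drift = np.max(np.abs([H(x) - H(traj[0]) for x in traj]))
assert drift < 1e-6    # Hamiltonian nearly conserved over t in [0, 10]
```

Note that the exact flow of any system of the form $\dot{x} = \mathbf{J}\nabla H$ conserves $H$ identically; the tiny drift observed here is purely the time-integration error of RK4.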

The advantage of the new SP algorithm is more notable in Fig. LABEL:fig:ex1_U, where we present system predictions over a longer time horizon. The new SP method captures the phase of the solution much more accurately than the non-SP method.

Note that the de-noising procedure (2.9) has been applied in the computation. To assess its effectiveness, we apply the same SP learning method without the de-noising procedure. The results are plotted in Fig. LABEL:fig:ex1_U_NoF. They are evidently less accurate than those obtained with the de-noising procedure, shown in Fig. LABEL:fig:ex1_U.

#### Example 2

We now consider the following Hamiltonian system

\hb@xt@.01(4.2) |

whose Hamiltonian is

We set the model parameters and the computational domain accordingly. We use noiseless short trajectory data, each trajectory containing 2 intervals (i.e., 3 data points), and approximate the Hamiltonian with polynomials of a fixed total degree. The reconstructed system is solved with an arbitrarily chosen initial state and compared against the solution of the true system. The relative numerical errors in the solutions of the SP algorithm and the non-SP algorithm from [41] are plotted in Fig. LABEL:fig:ex2_HErrU, along with the time evolution of the reconstructed Hamiltonian. The higher accuracy of the SP algorithm is again evident from the plot, as it induces smaller errors over long-term integration and preserves the approximate Hamiltonian along its trajectory. In Fig. LABEL:fig:ex2_U and Fig. LABEL:fig:ex2_Phase, we present the trajectories and the phase plots generated by the reconstructed system. The advantage of the new SP algorithm is again notable, as it preserves both the phase and the amplitude of the solution much better over long-term integration.

#### Example 3: Hénon-Heiles problem

We now consider the Hénon-Heiles system [13],

$$\begin{aligned} \dot{q}_1 &= p_1, & \dot{q}_2 &= p_2, \\ \dot{p}_1 &= -q_1 - 2 q_1 q_2, & \dot{p}_2 &= -q_2 - q_1^2 + q_2^2, \end{aligned} \tag{4.3}$$

where the Hamiltonian is

$$H(q_1, q_2, p_1, p_2) = \frac{1}{2}\big(p_1^2 + p_2^2\big) + \frac{1}{2}\big(q_1^2 + q_2^2\big) + q_1^2 q_2 - \frac{1}{3} q_2^3.$$
This system is used to describe the motion of stars around a galactic center. Chaotic behavior of the solution appears when the Hamiltonian exceeds a critical value ([13]). For our numerical tests, we choose a computational domain around the origin and employ multiple short trajectories, each containing a few intervals. Polynomials of a fixed total degree are used to approximate the Hamiltonian. The reconstructed system is solved with an arbitrarily chosen initial state and compared against the true solution. The time evolution of the reconstructed Hamiltonian and the numerical error in the solution are plotted in Fig. LABEL:fig:ex3_HErrU. We observe small and stable numerical errors and good conservation of the Hamiltonian over relatively long-term integration.
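As a consistency check, the Hénon-Heiles equations of motion can be verified to be of the form $\dot{x} = \mathbf{J}\nabla H$ by comparing the analytic right-hand side against a finite-difference gradient of the standard Hénon-Heiles Hamiltonian (taken here in its common normalization):

```python
import numpy as np

def H(x):
    q1, q2, p1, p2 = x
    return (0.5 * (p1**2 + p2**2) + 0.5 * (q1**2 + q2**2)
            + q1**2 * q2 - q2**3 / 3)

def rhs(x):
    """Analytic right-hand side J grad H of the Henon-Heiles system."""
    q1, q2, p1, p2 = x
    return np.array([p1,
                     p2,
                     -(q1 + 2 * q1 * q2),
                     -(q2 + q1**2 - q2**2)])

def num_grad(f, x, h=1e-6):
    """Central finite-difference gradient of a scalar function f."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

# J for d = 2, ordering x = (q1, q2, p1, p2)
J = np.block([[np.zeros((2, 2)), np.eye(2)],
              [-np.eye(2), np.zeros((2, 2))]])

x = np.array([0.1, -0.2, 0.3, 0.05])
assert np.allclose(J @ num_grad(H, x), rhs(x), atol=1e-8)
```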

In Fig. LABEL:fig:ex3_U and Fig. LABEL:fig:ex3_Phase, the trajectory plots and phase plots for the reconstructed system using the new SP algorithm are presented, along with those from the true system as reference. The solutions exhibit non-trivial behavior, and the reconstructed system is able to reproduce them accurately.

#### Example 4: Cherry problem

We now consider the Cherry Hamiltonian system [6],

\hb@xt@.01(4.4) |

whose true Hamiltonian is

The system is known to be linearly stable but nonlinearly unstable.

We take a computational domain around the origin and use short trajectory data, each of which contains a few intervals. Polynomials of a fixed total degree are employed to approximate the Hamiltonian. The reconstructed system is then solved from an arbitrarily chosen initial state, and the solutions are compared against those of the true system with the same initial state. The relative errors in the numerical prediction are plotted in Fig. LABEL:fig:ex4_HErrU, along with the time evolution of the reconstructed Hamiltonian $\tilde{H}$ and the true Hamiltonian $H$. We again observe good accuracy of the SP algorithm and conservation of the approximate Hamiltonian. The solution states are plotted in Fig. LABEL:fig:ex4_U, and their phase plots in Fig. LABEL:fig:ex4_Phase. The numerical solutions agree well with the true solutions.

#### Example 5: Double pendulum

Finally, we consider a double pendulum problem, as illustrated in Fig. LABEL:fig:ex5_DIAG.

Two masses and are connected via massless rigid rods of length and , and and are the angles of the two rods with respect to the vertical direction. We define the canonical momenta of the system as

By letting and , the Hamiltonian of the system is

where is the gravitational constant. The governing equations of the system are

\hb@xt@.01(4.5) |

where

In the numerical experiment, we set the physical parameters and the computational domain accordingly. The Hamiltonian in this example is notably more complicated than those in the previous examples. Consequently, we employ polynomials of a higher total degree to conduct the approximation. The data set includes many short trajectories, each of which contains a few intervals. The reconstructed Hamiltonian system is then solved from an arbitrarily chosen initial state over a long time horizon. Its solution is compared against the reference solution of the true system with the same initial state. The relative numerical errors are plotted in Fig. LABEL:fig:ex5_HErrU, along with the evolution of the Hamiltonian. We observe good agreement with the true solution. The time evolution of the solution states is plotted in Fig. LABEL:fig:ex5_U, and the corresponding phase plots in Fig. LABEL:fig:ex5_Phase. Again, we observe good agreement with the true solution.

## 5 Conclusion

We presented a structure-preserving numerical method for reconstructing unknown Hamiltonian systems using observation data. The key ingredient of the method is to approximate the unknown Hamiltonian first and then derive the approximate equations from the reconstructed Hamiltonian. By doing so, the reconstructed system preserves the approximate Hamiltonian along its trajectories. This is an important property desired by many practical applications. We presented the algorithm and its error estimate, and used a variety of examples to demonstrate the effectiveness of the approach. In its current form, polynomials are used to construct the approximation. Other forms of approximation, such as neural networks, will be explored in a separate work.

## A Proof of Theorem 3.4

Proof. Let $\Pr$ denote the probability measure of the random sequence of samples. Let $\Omega$ be the set of all possible draws, which is divided into the set $\Omega_+$ of all draws such that

$$\|G - I\| \le \frac{1}{2}, \tag{A.1}$$

and the complement set $\Omega_- = \Omega \setminus \Omega_+$. We consider the following splitting:

$$\mathbb{E}\Big( \big\| f - \tilde{f} \big\|_{L^2_\mu}^2 \Big) = \underbrace{\int_{\Omega_+} \big\| f - \tilde{f} \big\|_{L^2_\mu}^2 \, d\Pr}_{=:\,E_+} + \underbrace{\int_{\Omega_-} \big\| f - \tilde{f} \big\|_{L^2_\mu}^2 \, d\Pr}_{=:\,E_-}. \tag{A.2}$$

We now estimate the upper bounds for $E_+$ and $E_-$.

Let us first consider $E_-$. Based on Corollary 3.3 and under the condition (3.13), we have

$$\Pr(\Omega_-) \le 2 M^{-r}. \tag{A.3}$$

Note that

$$\big\| f - \tilde{f} \big\|_{L^2_\mu} \le \big\| f \big\|_{L^2_\mu} + \big\| \tilde{f} \big\|_{L^2_\mu} \le 2K.$$

Therefore, we have

$$E_- \le (2K)^2 \Pr(\Omega_-) \le 8 K^2 M^{-r}. \tag{A.4}$$

We now consider $E_+$. For every draw in $\Omega_+$, if (A.1) holds, then the Gram matrix $G$ is invertible with

$$\big\| G^{-1} \big\| \le 2,$$

so that the least squares solution $\tilde{f}$ is well defined and stable. For almost every draw with respect to $\Pr$, if (A.1) holds, then the quasi-optimality of the stable least squares projection gives

$$\big\| f - \tilde{f} \big\|_{L^2_\mu} \le \big(1 + \zeta(M)\big)^{1/2} \Big( \big\| f - P_{V^\nabla} f \big\|_{L^2_\mu} + \eta \Big),$$

which implies

$$E_+ \le \big(1 + \zeta(M)\big) \Big( \big\| f - P_{V^\nabla} f \big\|_{L^2_\mu} + \eta \Big)^2.$$