Generalized Schwarzschild’s method

Generalized Schwarzschild’s method

Mir Abbas Jalali and Scott Tremaine
Sharif University of Technology, Postal Code: 14588-89694, Azadi Avenue, Tehran, Iran
School of Natural Sciences, Institute for Advanced Study, Einstein Drive, Princeton, NJ 08540, U.S.A. (MAJ) (ST)

We describe a new finite element method (FEM) to construct continuous equilibrium distribution functions of stellar systems. The method is a generalization of Schwarzschild’s orbit superposition method from the space of discrete functions to continuous ones. In contrast to Schwarzschild’s method, FEM produces a continuous distribution function (DF) and satisfies the intra element continuity and Jeans equations. The method employs two finite-element meshes, one in configuration space and one in action space. The DF is represented by its values at the nodes of the action-space mesh and by interpolating functions inside the elements. The Galerkin projection of all equations that involve the DF leads to a linear system of equations, which can be solved for the nodal values of the DF using linear or quadratic programming, or other optimization methods. We illustrate the superior performance of FEM by constructing ergodic and anisotropic equilibrium DFs for spherical stellar systems (Hernquist models). We also show that explicitly constraining the DF by the Jeans equations leads to smoother and/or more accurate solutions with both Schwarzschild’s method and FEM.

stellar dynamics, methods: numerical, galaxies: kinematics and dynamics, galaxies: elliptical

1 Introduction

For over three decades, Schwarzschild’s (1979) orbit superposition method has been one of the most important numerical tools for modelling the equilibrium states of spherical (Richstone & Tremaine, 1984), axisymmetric (Thomas et al., 2004) and triaxial galaxies (van den Bosch et al. 2008 and references therein). Schwarzschild’s method constructs discrete phase-space distribution functions (DF) and works even if the gravitational potential supports chaotic orbits (e.g., Capuzzo-Dolcetta et al., 2007). Observational constraints can also be incorporated, in particular on the surface brightness or the line-of-sight velocity distributions. Schwarzschild’s basic assumptions were: (i) the amount of mass contributed by an orbit to a cell/element in the configuration space is proportional to the fraction of time spent by that orbit inside the element; (ii) the matter density and velocity distribution inside each element is constant; (iii) the DF is non-zero only on a subset of phase space with measure zero (except perhaps in some cases where the potential admits large-scale chaos). In particular if all orbits are regular the DF is discrete, i.e., non-zero only at a finite set of positions in action space.

Despite its central role in modelling galaxies, Schwarzschild’s method has several shortcomings. (i) There is no mathematical proof that increasing the number of elements and orbits in this scheme will guarantee the convergence of the coarse-grained DF to a smooth function. (ii) In practice, Schwarzschild models often converge rather slowly, in part because of the inverse square-root singularity in the density contributed by an orbit near a turning point. (iii) Working with discrete DFs is not ideal if we require their derivatives for linear stability analysis, or use them to set initial conditions for -body simulations.

The majority of these limitations can be removed if one extends the method from the space of discrete functions to a more general continuous class. This is a problem in galactic dynamics whose solution is overdue and we aim to solve it using a modified finite element method (FEM).

The mathematical principles of FEM are rather simple. Assume a general governing equation


for the physical quantity with being a partial integro-differential operator, and seek the solutions in terms of the coordinates x and the time . An approximate solution of (1) may be expanded as the series


where constitute a complete set of basis functions that satisfy any given boundary conditions. Substituting from (2) into (1), multiplying both sides of the resulting equation by , and integrating over the x domain results in


This is called the weighted residual form, weak form, or Galerkin projection of the governing equation (1), from which one may compute the unknown functions . Here, the basic assumption is that if the equations (3) are satisfied for all then equation (1) is also satisfied to an adequate degree of approximation.

It is, however, a non-trivial and sometimes impossible task to find suitable basis functions that (i) satisfy all boundary conditions, (ii) make a complete set, (iii) do not contribute noise when the solution is changing rapidly. Moreover, the integration over the spatial variables is expensive if the domain of each basis function is the entire x-space. One can overcome these difficulties by dividing the x-space into finite elements. The union of elements, each of volume , is the entire x domain. Each element contains nodes on its boundaries or in its interior. The function is approximated inside the th element as a weighted sum of smooth shape functions ; these are defined only within element and are zero at all nodes except node of bin , where they are unity. The weights are time-dependent and are chosen to fit the unknown value of at the nodes. The determining equations of the weights are


This procedure, which leaves us with a system of ordinary integro-differential equations for the weights (the nodal values of ) is called the FEM (Zienkiewicz, Taylor & Zhu, 2005).

For example, when the operator has the form with a linear operator, the weighted residual form takes a simple matricial form


where the matrix is the projection of the operator and is a forcing vector. The vector contains nodal values of . Therefore, the combination of finite elements and Galerkin projection reduces an infinite-dimensional partial differential equation to a finite-dimensional system.

A formulation of FEM for stellar systems was presented in Jalali (2010), where the perturbed collisionless Boltzmann equation (CBE) was reduced to a form like (5) and solved over a range of finite ring elements in the configuration space. That analysis, however, cannot be directly used to construct equilibrium DFs because they are not unique: according to the Jeans theorem, any distribution function that depends on phase-space coordinates only through the integrals of motion I is an equilibrium solution of the CBE. Consequently the local variation of in the I-space is free. This implies non-uniqueness and can admit discrete, piece-wise continuous, continuous, and differentiable solutions.

In this paper we develop a general method to build numerical DFs that exhibit nice properties of local differentiability and global continuity. After defining the problem in §1.1, in §2 we discuss finite elements, interpolation functions, and their properties both in the configuration and action spaces. In §3, we obtain the Galerkin projections of velocity moments, and the continuity and Jeans equations. Schwarzschild’s method is derived as a special case of FEM in §4 and finite element models of spherical systems are discussed in §5. We apply the FEM to the spherical Hernquist model in §6 and §7 contains a discussion of our results.

1.1 Equilibrium stellar systems

The DF of a collisionless system in dynamical equilibrium depends on the position and the velocity vectors only through the integrals of motion.

Integrable systems

The Hamilton–Jacobi equation is separable in spherical systems, razor-thin axisymmetric discs, and triaxial systems in which the potential is of Stäckel form. Orbits in these systems are regular, and can be represented using a suitable action vector and its associated angle variables . The Hamiltonian function depends only on the action vector and the evolution of the angle variables is linear in time:


where is the vector of orbital frequencies. The actions are isolating integrals of motion, so any DF of the form defines a possible equilibrium stellar system. For separable systems, the actions can be computed by quadratures. In this study, we focus on building equilibrium models of stellar systems with integrable potentials.

Non-integrable systems

Generic axisymmetric or triaxial potentials are not integrable. There are surviving invariant tori of regular orbits (cf. KAM theory) but these are separated by chaotic layers. If the chaotic layers are narrow (the potential is ‘nearly integrable’) the DF may be assumed to be zero in the chaotic phase subspace and written as a function of the actions in the regular phase subspace. However, the actions must then be calculated either using canonical perturbation theory or by generating a frequency map of the system (Laskar, 1990; Hunter, 2002; Binney & Tremaine, 2008).

A simpler and more powerful approach, which can be used even if the potential is far from integrability, is to express the integrals of motion in terms of initial conditions of orbits (Schwarzschild, 1979). Thus let be the position and velocity of the trajectory through on some global surface of section through which all orbits must pass (e.g., a symmetry plane of a triaxial potential). Then any DF of the form defines an equilibrium stellar system.

1.1.1 Moments of the DF

A collisionless system with a given density function is a possible equilibrium if one can find a DF so that


We shall also sometimes use the first- and second-order velocity moments:


In an equilibrium system these are related by the steady-state continuity equation


and Jeans equations


with being the potential. In systems with spherical symmetry only the radial and tangential velocity dispersions matter and three Jeans equations reduce to one.

We shall argue below that including constraints based on the continuity and Jeans equations can significantly improve the accuracy of both Schwarzschild and FEM models of stellar systems.

2 Finite elements in configuration and action space

We assume that the configuration space has been split into elements, each of nodes, and that the density is known at the nodal points. Inside each element, the density function can be approximated by suitable interpolation (shape) functions. Denoting as the functional form of the density inside the th element, one may write


The density at the th node of the th element has been indexed by the pair . The function is unity inside the th element in the x-space and zero outside. The interpolation functions have the following properties:


where is the Kronecker delta, and is the position vector of the th node of the th element. Figure 1 shows some elementary one-, two- and three-dimensional elements. The rectangular and brick elements can be distorted to obtain the so-called mapped elements (Zienkiewicz, Taylor & Zhu, 2005), which help to reconstruct complex morphologies in curvilinear coordinates. For instance, elements confined between confocal ellipsoids and hyperboloids can better describe elliptical galaxy models that may have potentials close to Stäckel form. Thin rings and spherical shells are the most efficient elements for axisymmetric discs and spherical systems, respectively.

Figure 1: Elementary finite elements. The degree of interpolating polynomial increases by adding interior nodes, which can lie on the edges, sides or even inside elements.

Using the superscript T to transpose a vector/matrix, we define the row vector


and the column vector


and rewrite the components of in the following compact form


Here a dot denotes matrix/vector multiplication. The above procedure can be readily applied to higher order velocity moments. In particular, we obtain


where the column vectors and contain, respectively, the nodal values of and inside the th element.

To construct a DF , we divide the action space to finite elements, each of nodes, and write


with and being dimensional row and column vectors, respectively. The elements of the interpolating row vector are denoted by and they satisfy the condition


for the action vector associated with the th node of the th element in action space. The union of the domains of covers the action space.

In this study we use interpolation functions of class both in the x and J spaces. The use of functions implies that all physical quantities are smooth (continuous and differentiable) inside each element and along its boundary lines. In the direction perpendicular to the boundary lines and at the nodes, the DF, density and higher order velocity moments will only be continuous. For example, consider the simplest one-dimensional set of elements: element has boundaries at and and has two nodes, with node 1 at the smaller boundary and node 2 at the larger. The continuity of at implies . Using (13), this condition reduces to


The first derivative of with respect to exists inside element and is given by


but the differentiability condition at the nodes of elements is not necessarily satisfied, i.e.,


One can resolve this problem by applying finite elements. The application of elements requires larger vectors of nodal quantities (which should now include partial derivatives), and thus larger element matrices. In this paper we restrict ourselves to elements; however, we note that elements provide smoother solutions (at the cost of larger matrices and greater analytic complexity), and are likely to be useful when the partial derivatives of are also present (e.g., in linear stability analyses).

Any vectorial function of the form can be expressed in terms of angle-action variables (Jalali, 2010):


with . Here the asterisk stands for complex conjugation, k is a 3-vector of integers and . To simplify the notation, we will denote by if , by if and , and by if . These special cases correspond to the zeroth-, first-, and second-order velocity moments. The row vector has the same dimension as and it is calculated from


When a test particle with the action vector J is inside the th element in the configuration space, the function is unity and that particle contributes to . In other situations, the integrand of (24) will vanish.

To compute , we simply integrate the equations of motion corresponding to the action J or the initial conditions until the particle enters the th element at time and exits at . We then calculate the values of the angles at the entry and exit times, and . We then carry out the integration (24) using Gaussian quadrature, typically with - points. The numerical integration of the equations of motion continues and is updated every time that the particle enters element , until the trajectory closes on itself for periodic orbits or becomes dense in the w-space. The only extra effort of this procedure compared to Schwarzschild’s method is to perform the integral (24). In §3.3, we show that one can avoid this numerical integration for separable models.

3 Galerkin weighting of governing equations

This section implements a Bubnov-Galerkin procedure (Zienkiewicz, Taylor & Zhu, 2005) to satisfy the governing equations of physical quantities (dependent variables) over individual elements in a weighted residual sense. As a result, independent variables are eliminated from equations, leaving a system of algebraic equations between nodal values of DF, density and velocity moments. The formulation is done in Cartesian coordinates and it should be modified for non-Cartesian ones (see §5 for spherical systems).

3.1 Density and velocity moments

Inside the th element in the configuration space, equation (7) reduces to


the presence of on the right side ensures that the integration is carried out over a phase subspace whose particles visit the th element and contribute to the density and velocity dispersion of that element. By substituting from (16) and (18) into (25) we obtain


We now left-multiply this equation by and integrate the result over the x-domain to get




being an constant matrix. The function in the integrand of (28) restricts the domain of integration to the region occupied by the th element. The integral in can be done analytically should one use interpolation functions of polynomial type.

The transformation is canonical and so the infinitesimal phase space volume can be replaced by . Using (23) with , equation (27) is transformed to


It is obvious that only the term with contributes to the integral over the w-space and equation (29) reads




Repeating the above procedure for the functions and leads to




The constant element matrices , and are of the size and there are of them. There are additional constraints associated with the element equations (30) and (32) at a node shared by several elements, since a physical quantity must have the same value in the Galerkin projections of all those elements. In fact, one can introduce -dimensional column vectors b, and that contain all nodal densities and first- and second-order velocity moments, and because nodes are shared the dimension . Similarly, the nodal DFs constitute an -dimensional column vector p where is the total number of distinct nodes in the J-space. Equation (30) can thus be written as


This matrix equation can be solved to yield the DF, as parametrized by its nodal values p. The rank of the matrix F is generally less than its dimension , and there are additional constraints that the DF should be non-negative, so in either Schwarzschild’s method or the FEM the solution must be sought by linear or quadratic programming or some other optimization method (see §3.4). Once the solution is known, the nodal values of the streaming velocity and velocity-dispersion tensor are obtained from equations (32), which can be written as


The constant global matrices F, U and S are generally dense. Assembling the element equations is a routine procedure in finite element analysis, and its logic is to use the continuity condition and eliminate repeated nodal quantities (like density and DF) from all element equations except one.

3.2 Continuity and Jeans equations

The accuracy of FEM models of equilibrium stellar systems can be improved by adding additional constraints that ensure that the continuity and Jeans equations (10) and (11) are satisfied. Inside the th element, these equations can be written




one obtains the Galerkin projections of (37) and (38) as


We assemble these element equations to obtain the global forms


Combining (36) and (42) leads to


We make some remarks. (i) The solutions of the continuity and Jeans equations, whether the continuous versions (10) and (11) or their FEM counterparts (42) above, are generally not unique. A notable exception is triaxial potentials of Stäckel form, in which the second-order tensor is diagonal in ellipsoidal coordinates (van de Ven et al., 2003). (ii) When using finite elements, as we do here, the moments , , and (eqs. 79) are continuous across element boundaries but generally their derivatives are not; however, since the right-hand sides of the continuity and Jeans equations are continuous across boundaries, the combinations of derivatives of and that appear on the left-hand sides of these equations must also be continuous. (iii) Equations (43) do not say that the mass and momentum flows into each element, through its boundaries, exactly balance their outflows (although the balance will become more and more accurate as the number of nodes increases). Only finite volume methods (LeVeque, 2002) and conservative FEMs ensure the exact conservation of physical fluxes and this paper does not discuss those techniques.

3.3 Separable models

The computation of the element matrices , and is accelerated when the Hamilton–Jacobi equation is separable for the potential . In such a circumstance, the velocity vector can be expressed as , which implies


This allows us to bypass the costly integration of orbit equations needed for calculating the Fourier coefficients . Defining the matrix


the element matrices are computed from


The integrals over the x and J subdomains can be evaluated using Gaussian quadratures.

In separable models the turning points of orbits and their shapes in the configuration space are known. Therefore, the null integrals in the element matrices can be avoided, and the computational effort is reduced, by a priori identification of the J-subspaces whose orbits never enter a selected element in the x-space. In fact, the information related to the passage of an orbit through a given element is carried by the function , and before evaluating the integrals we can exclude all pairs that give . For separable models in non-Cartesian coordinates, the velocity components in the Jacobian are replaced by generalized momenta, and the matrix P is divided by the product of metric coefficients.

3.4 Linear and quadratic programming

The size of the unknown vector p is not necessarily, or usually, equal to the total number of constraints. Even if it were, the solution vector would not necessarily fulfill the requirement that the DF must be non-negative. We therefore employ either linear programming (LP) or quadratic programming (QP; Gill et al. 1981) and search for , the components of p, by minimizing the objective function


with for LP. The minimization is subject to the inequalities (for =) and the equality constraints (35). If we also demand satisfaction of the continuity and Jeans equation these are supplemented by the equality constraints (43). The QP routines begin from a vector that satisfies the imposed constraints with a tolerance of , then proceed to minimize . The vector is usually called a feasible solution and is the feasibility tolerance; the latter must be greater than the computational accuracy of the variables involved in the constraints.

The weights are chosen based on the desired attributes of the model, such as bias toward radial or tangential orbits, maximization of a quadratic entropy, or fitting to specified data. For example, if a series of observables are linear functions of the DF,


and they are observed to have values with observational errors , then a suitable objective function is specified by


The LP and QP algorithms we have used can stall at weak local minima or “dead points”. Whenever this happens, we perturb the solution and restart the algorithm.

4 Derivation of Schwarzschild’s method

It is now straightforward to show that Schwarzschild’s orbit superposition method is a subclass of FEM. We assume for simplicity that the potential is integrable so the orbits are regular. The orbit library in Schwarzschild’s method is collected by sampling over the space of initial conditions. For regular orbits there is a one to one and invertible map between and , and Schwarzschild’s DFs will have the following form (Vandervoort, 1984)


where is the Dirac delta function and is the discrete DF associated with an orbit of the action vector . Equation (52) is derived from (18) by shrinking the elements in the action space to zero size. The total mass of the galaxy is computed from


which is combined with (52) to obtain the constraint


Schwarzschild assumes a uniform density inside the th element in configuration space. This implies that there is one node per element () and that the vector function reduces to a scalar constant, . Equation (12) then reduces to


The matrix is now a single number , which is the volume of the th element. The quantity == will thus be the mass inside the th element. We substitute (52) and (55) into (30) and obtain


with the zeroth-order Fourier coefficient given by


According to time averages theorem (Binney & Tremaine, 2008), the quantity on the right hand side of (57) is the fraction of time that an orbit of action spends in the th spatial element. Consequently, we obtain


which is Schwarzschild’s equation.

The approach described in §3.2 to incorporate constraints based on the continuity and Jeans equations into FEM models does not work for Schwarzschild’s method: because the interpolating functions g are constants, the matrices defined in equation (39) are zero so the first of equations (41) is trivially satisfied and the second has no solution. Physically, the stress tensor is constant within an element so there is no divergence in the momentum flux to balance the gravitational force per unit volume .

The failure of Schwarzschild’s method to satisfy the Jeans equations within an element does not imply that the method is incorrect. Indeed, the method must satisfy the Jeans equations on larger scales because the assumed discrete DF (52) depends only on the actions and hence must satisfy the CBE, and the Jeans equations are moments of the CBE. The correct statement is that Schwarzschild’s method satisfies the Jeans equations approximately if we calculate gradients of the stress tensor between adjacent elements and match their sum to at the center of an element. In this process, which we carry out for spherical systems in §5, one must appropriately handle partial derivatives because elements in the configuration space are not usually separated uniformly.

5 Spherically symmetric systems

In spherical systems one can use simple shell elements (ring elements for axisymmetric disks). Moreover, the distance of particles from the centre is represented as the Fourier series of the radial angle only. This remarkably simplifies the calculation of the vectorial function should one decide to compute the element matrices from (31), (33) and (34). Consider the spherical polar coordinates and the corresponding velocities where is the radial distance from the centre, is the co-latitude and is the azimuthal angle. The density of a spherical system is a function of , its first-order velocity moment is zero, and the following relations hold for second-order velocity moments:


where with being the magnitude of angular momentum vector. We confine ourselves to models with ===0. The continuity equation (mass conservation) is trivially satisfied for such a system. The elements of the stress tensor are


and the Jeans equation in the radial direction reads


The other two equations, in the - and -direction, do not provide further information.

We consider a mesh of concentric shell elements and define the th element by its inner and outer radii and , respectively. We then approximate the density and velocity moments as


The continuity conditions at the boundaries of adjacent elements are , and . Let us now substitute from equations (62)–(64) into (61) and derive its Galerkin projection as


for . The matrices and are determined from (48) by replacing with and , respectively, and we have


All other equations of §3 can be directly applied to spherical systems using the following substitutions:


where is the orbital energy of particles:


If we apply the FEM without Jeans equation constraints we must satisfy the linear constraint equations (eq. 35). If we include the Jeans equation constraints we assemble equations (65) to a global form and combine this with to give


In the finite element formulation, it is difficult to construct a function (here the stress components) and its derivatives with the same accuracy. Therefore, we replace the equality constraints (71) with the weaker inequality constraints


where are the normalised components of the residual vector , defined as


and minimize by setting


Here and are the elements of T and F, respectively. The value of cannot be arbitrarily small: at large radii the magnitudes of the stresses become comparable to the numerical errors, and the Jeans equations are correspondingly less accurate. In our calculations, we have been able to secure convergence with as small as over a wide radial range.

It is worth deriving the weighted residual form of the Jeans equation for discrete Schwarzschild models, to investigate whether applying this as a constraint improves the accuracy of these models. For , the density and second velocity moments are constant inside each element and the vectors , and in (62)–(64) are replaced by the scalars , , and , respectively. We rewrite (61) as


We obtain the Galerkin projection through multiplying the differential equation (75) by and integrating over the th spatial element. For the first term this procedure gives where . Since is discontinuous at the element boundaries we must make some arbitrary choice; after some experimentation we have found that the best convergence is obtained by taking to be , that is, the value of the stress in the element interior to the boundary. Then the discretized Jeans equation is


This difference equation is not necessarily satisfied by an optimization procedure that attempts to fit observables using a DF of the form (52). We note that and are normal stresses exerted on the th shell element at its inner and outer boundaries. For sufficiently thin elements when