# Overdetermined Shooting Methods for Computing Standing Water Waves with Spectral Accuracy

###### Abstract

A high-performance shooting algorithm is developed to compute time-periodic solutions of the free-surface Euler equations with spectral accuracy in double and quadruple precision. The method is used to study resonance and its effect on standing water waves. We identify new nucleation mechanisms in which isolated large-amplitude solutions, and closed loops of such solutions, suddenly exist for depths below a critical threshold. We also study degenerate and secondary bifurcations related to Wilton’s ripples in the traveling case, and explore the breakdown of self-similarity at the crests of extreme standing waves. In shallow water, we find that standing waves take the form of counter-propagating solitary waves that repeatedly collide quasi-elastically. In deep water with surface tension, we find that standing waves resemble counter-propagating depression waves. We also discuss existence and non-uniqueness of solutions, and smooth versus erratic dependence of Fourier modes on wave amplitude and fluid depth.

In the numerical method, robustness is achieved by posing the problem as an overdetermined nonlinear system and using either adjoint-based minimization techniques or a quadratically convergent trust-region method to minimize the objective function. Efficiency is achieved in the trust-region approach by parallelizing the Jacobian computation so the setup cost of computing the Dirichlet-to-Neumann operator in the variational equation is not repeated for each column. Updates of the Jacobian are also delayed until the previous Jacobian ceases to be useful. Accuracy is maintained using spectral collocation with optional mesh refinement in space, a high order Runge-Kutta or spectral deferred correction method in time, and quadruple-precision for improved navigation of delicate regions of parameter space as well as validation of double-precision results. Implementation issues for GPU acceleration are briefly discussed, and the performance of the algorithm is tested for a number of hardware configurations.

###### keywords:

water waves, standing waves, resonance, bifurcation, Wilton's ripples, trust-region shooting method, boundary integral method, spectral deferred correction, GPU acceleration, quadruple precision

Journal: Computational Science and Discovery


## 1 Introduction

Time-periodic solutions of the free-surface Euler equations serve as an excellent benchmark for the design and implementation of numerical algorithms for two-point boundary value problems governed by nonlinear partial differential equations. In particular, there is a large body of existing work on numerical methods for computing standing waves schwartz:81 (); mercer:92 (); mercer:94 (); smith:roberts:99 (); tsai:jeng:94 (); bryant:stiassnie:94 (); schultz (); okamura:03 () and short-crested waves roberts83b (); Marchant87b (); bridges01 (); Ioualalen06 () for performance comparison. Moreover, many of these previous studies reach contradictory scientific conclusions that warrant further investigation, especially concerning extreme waves and the formation of a corner or cusp. Penney and Price penney:52 () predicted a 90 degree corner, which was verified experimentally by G. I. Taylor taylor:53 (), who was nevertheless skeptical of their analysis. Grant grant () and Okamura okamura:98 () gave theoretical arguments supporting the 90 degree corner. Schwartz and Whitney schwartz:81 () and Okamura okamura:03 () performed numerical experiments that backed the 90 degree conjecture. Mercer and Roberts mercer:92 () predicted a somewhat sharper angle and mentioned 60 degrees as a possibility. Schultz et al. schultz () obtained results similar to Mercer and Roberts, and proposed that a cusp may actually form rather than a corner. Wilkening breakdown () showed that extreme waves do not approach a limiting wave at all due to fine scale structure that emerges at the surface of very large amplitude waves and prevents the wave crest from sharpening in a self-similar manner. This raises many new questions about the behavior of large-amplitude standing waves, which we will explore in Section 4.4.

On the theoretical side, it has long been known concus:64 (); vanden:broeck:book (); iooss05 () that standing water waves suffer from a small-divisor problem that obstructs convergence of the perturbation expansions developed by Rayleigh rayleigh1876 (), Penney and Price penney:52 (), Tadjbakhsh and Keller tadjbakhsh (), Concus concus:62 (), Schwartz and Whitney schwartz:81 (), and others. Penney and Price penney:52 () went so far as to state, “there seems little likelihood that a proof of the existence of the stationary waves will ever be given.” Remarkably, Plotnikov and Toland plotnikov01 (), together with Iooss iooss05 (), have recently established existence of small-amplitude standing waves using a Nash-Moser iteration. As often happens in small-divisor problems craig:wayne (); bourgain99 (), solutions could only be proved to exist for values of an amplitude parameter in a totally disconnected Cantor set. No assertion is made about parameter values outside of this set. This raises intriguing new questions about whether resonance really causes a complete loss of smoothness in the dependence of solutions on amplitude, or if these results are an artifact of the use of Nash-Moser theory to prove existence. While a complete answer can only come through further analysis, insight can be gained by studying high precision numerical solutions.

In previous numerical studies, the most effective methods for computing standing water waves have been Fourier collocation in space and time vandenBroeck:81 (); tsai:jeng:94 (); okamura:03 (); ioualalen:03 (); okamura:10 (), semi-analytic series expansions schwartz:81 (); amick:87 (), and shooting methods mercer:92 (); mercer:94 (); schultz (); smith:roberts:99 (). In Fourier collocation, time-periodicity is built into the basis, and the equations of motion are imposed at collocation points to obtain a large nonlinear system of equations. This is the usual approach taken in analysis to prove existence of time-periodic solutions, e.g. of nonlinear wave equations craig:wayne () or nonlinear Schrödinger equations bourgain99 (). The drawback as a numerical method is that the number of unknowns in the nonlinear system grows in proportion to the product of the spatial and temporal resolution, rather than the spatial resolution alone as in a shooting method, which limits the resolution one can achieve. Orthogonal collocation, as implemented in the software package AUTO doedel91 (), would be less efficient than Fourier collocation as more timesteps would be required to achieve the same accuracy.

The semi-analytic series expansions of Schwartz and Whitney schwartz:81 (); amick:87 () are a significant improvement over previous perturbation methods rayleigh1876 (); penney:52 (); tadjbakhsh (); concus:62 () in that the authors show how to compute an arbitrary number of terms rather than stopping at 3rd or 5th order. They also used conformal mapping to flatten the boundary, which leads to a more promising representation of the solution of Laplace’s equation. As a numerical method, the coefficients of the expansion are expensive to compute, which limits the number of terms one can obtain in practice. (Schwartz and Whitney stopped at 25th order). It may also be that the resulting series is an asymptotic series rather than a convergent series. Nevertheless, these series expansions play an essential role in the proof of existence of standing waves on deep water by Plotnikov, Toland and Iooss iooss05 ().

In a shooting method, one augments the known boundary values at one endpoint with additional prescribed data to make the initial value problem well posed, then looks for values of the new data to satisfy the boundary conditions at the other endpoint. For ordinary differential equations, this normally leads to a system of equations with the same number of equations as unknowns. The same is true of multi-shooting methods keller:68 (); stoer:bulirsch (); guckenheimer:00 (). When the boundary value problem is governed by a system of partial differential equations, it is customary to discretize the PDE to obtain an ODE, then proceed as described above. However, because of aliasing errors, quadrature errors, filtering errors, and amplification by the derivative operator, discretization causes larger errors in high-frequency modes than low-frequency modes when the solution is evolved in time. These errors can cause the shooting method to be too aggressive in its search for initial conditions, and to explore regions of parameter space (the space of initial conditions) where either the numerical solution is inaccurate, or the physical solution becomes singular before reaching the other endpoint. Even if safeguards are put in place to penalize high-frequency modes in the search for initial conditions, the Jacobian is often poorly conditioned due to these discretization errors.

We have found that posing boundary value problems governed by PDEs as overdetermined, nonlinear least squares problems can dramatically improve the robustness of shooting methods in two critical ways. First, we improve accuracy by padding the initial condition with high-frequency modes that are constrained to be zero. With enough padding, all the degrees of freedom controlled by the shooting method can be resolved sufficiently to compute a reliable Jacobian. Second, adding more rows to the Jacobian increases its smallest singular values, often improving the condition number by several orders of magnitude. The extra rows come from including the high-frequency modes of the boundary conditions in the system of equations, even though they are not included in the list of augmented initial conditions. As a rule of thumb, it is usually sufficient to set the top 1/3 to 1/2 of the Fourier spectrum to zero initially; additional zero-padding has little effect on the numerical solution or the condition number. Validation of accuracy by monitoring energy conservation and decay rates of Fourier modes will be discussed in Section 4.4, along with mesh refinement studies and comparison with quadruple precision calculations.
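The conditioning benefit of the extra rows can be seen in a small synthetic experiment (the matrices below are hypothetical stand-ins, not an actual water-wave Jacobian): by the interlacing property of singular values, appending rows to a matrix can never decrease its smallest singular value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 120x40 "Jacobian" with rapidly decaying singular values 1, 1/2, 1/4, ...
U, _ = np.linalg.qr(rng.standard_normal((120, 40)))
V, _ = np.linalg.qr(rng.standard_normal((40, 40)))
s = 2.0 ** -np.arange(40.0)
J_full = (U * s) @ V.T

sigma_min = lambda J: np.linalg.svd(J, compute_uv=False)[-1]
s_square = sigma_min(J_full[:40, :])   # square system: 40 equations
s_over = sigma_min(J_full[:80, :])     # overdetermined: 80 equations
s_full = sigma_min(J_full)             # fully overdetermined: 120 equations

# By interlacing of singular values, appending rows never decreases sigma_min
print(s_square, s_over, s_full)
```

In the shooting context, the appended rows are the high-frequency modes of the boundary conditions, which are nearly in the null space of the square Jacobian; including them filters out search directions that excite unresolved modes.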

In this paper, we present two methods of solving the nonlinear least squares problem that arises in the overdetermined shooting framework. The first is the adjoint continuation method (ACM) of Ambrose and Wilkening benj1 (); benj2 (); vtxs1 (); lasers (), in which the gradient of the objective function with respect to initial conditions is computed by solving an adjoint PDE, and the BFGS algorithm bfgs (); nocedal () is used for the minimization. This was the approach used by one of the authors in her dissertation jia:thesis () to obtain the results of Sections 4.1 and 4.6. In the second approach, we exploit an opportunity for parallelism that makes computing the entire Jacobian feasible. Once this is done, a variant of the Levenberg-Marquardt method (with less frequent Jacobian updates) is used to rapidly converge to the solution. The main challenge here is organizing the computation to maximize re-use of setup costs in solving the variational equation with multiple right-hand sides, to minimize communication between threads or with the GPU device, and to ensure that most of the linear algebra occurs at level 3 BLAS speed. The performance of the algorithms on various platforms is reported in Section 4.7.

The scientific focus of the present work is on resonance and its effect on existence, non-uniqueness, and physical behavior of standing water waves. A summary of our main results is given in the abstract, and in more detail at the beginning of Section 4. We mention here that resonant modes generally take the form of higher-frequency, secondary standing waves oscillating at the surface of larger-scale, primary standing waves. Because the equations are nonlinear, only certain combinations of amplitude and phase can occur for each component wave. This leads to non-uniqueness through multiple branches of solutions. In shallow water, bifurcation curves of high-frequency Fourier modes behave erratically and contain many gaps where solutions do not appear to exist. This is expected on theoretical grounds. However, these bifurcation “curves” become smoother, or “heal,” as fluid depth increases. In infinite depth, such resonant effects are largely invisible, which we quantify and discuss in Section 5.

In future work water:stable (), the methods of this paper will be used to study other families of time-periodic solutions of the free-surface Euler equations with less symmetry than is assumed here, e.g. traveling-standing waves, unidirectional solitary wave interactions, and collisions of gravity-capillary solitary waves. The stability of these solutions will also be analyzed in water:stable () using Floquet methods.

## 2 Equations of motion and time-stepping

The effectiveness of a shooting algorithm for solving two-point boundary value problems is limited by the accuracy of the time-stepper. In this section, we describe a boundary integral formulation of the water wave problem that is spectrally accurate in space and arbitrary order in time. We also describe how to implement the method in double and quadruple precision using a GPU, and discuss symmetries of the problem that can be exploited to reduce the work of computing standing waves by a factor of 4. The method is similar to other boundary integral formulations lh76 (); baker:82 (); krasny:86 (); mercer:92 (); mercer:94 (); smith:roberts:99 (); baker10 (), but is simpler to implement than the angle–arclength formulation used in hls94 (); ceniceros:99 (); HLS01 (); vtxs1 (), and avoids issues of identifying two curves that are equal "up to reparametrization" when the $x$ and $y$ coordinates of the interface are both evolved (in non-symmetric problems). Our approach also avoids sawtooth instabilities that sometimes occur when using Lagrangian markers lh76 (); mercer:92 (). This is consistent with the results of Baker and Nachbin baker:nachbin:98 (), who found that sawtooth instabilities can be controlled without filtering using the correct combination of spectral differentiation and interpolation schemes. While conformal mapping methods dyachenko:1996 (); nachbin:04 (); milewski:11 () are more efficient than boundary integral methods in many situations, they are not suitable for modeling extreme waves as the spacing between grid points expands severely in regions where wave crests form, which is the opposite of what is needed for an efficient representation of the solution via mesh refinement.

### 2.1 Equations of motion

We consider a two-dimensional irrotational ideal fluid whitham74 (); johnson97 (); craik04 (); craik05 () bounded below by a flat wall and above by an evolving surface, $y = \eta(x,t)$. Because the flow is irrotational, there is a velocity potential $\phi(x,y,t)$ such that $\mathbf{u} = \nabla\phi$. The restriction of $\phi$ to the free surface is denoted $\varphi(x,t) = \phi\big(x,\eta(x,t),t\big)$. The equations of motion governing $\eta$ and $\varphi$ are

(2.1a) $\eta_t = \phi_y - \eta_x\,\phi_x,$

(2.1b) $\varphi_t = P\left[\phi_y\,\eta_t - \tfrac{1}{2}\phi_x^2 - \tfrac{1}{2}\phi_y^2 - g\eta + \frac{\sigma}{\rho}\,\partial_x\!\left(\frac{\eta_x}{\sqrt{1+\eta_x^2}}\right)\right].$

Here $g$ is the acceleration of gravity, $\rho$ is the fluid density, $\sigma$ is the surface tension (possibly zero), and $P$ is the projection to zero mean that annihilates constant functions,

(2.2) $P[f](x) = f(x) - \frac{1}{2\pi}\int_0^{2\pi} f(x')\,dx'.$

This projection is not standard in (2.1b), but yields a convenient convention for selecting the arbitrary additive constant in the potential. In fact, if the fluid has infinite depth and the mean surface height is zero, $P$ has no effect in (2.1b) at the PDE level, ignoring roundoff and discretization errors. The velocity components $\phi_x$, $\phi_y$ on the right hand side of (2.1) are evaluated at the free surface to determine $\eta_t$ and $\varphi_t$. The system is closed by relating $\phi$ in the fluid to $\eta$ and $\varphi$ on the surface as the solution of Laplace's equation

(2.3a) $\phi_{xx} + \phi_{yy} = 0, \qquad -h < y < \eta(x,t),$

(2.3b) $\phi\big(x,\eta(x,t),t\big) = \varphi(x,t),$

(2.3c) $\phi_y(x,-h,t) = 0,$

where $h$ is the mean fluid depth (possibly infinite). We assume $\eta$ and $\varphi$ are $2\pi$-periodic in $x$. Applying a horizontal Galilean transformation if necessary, we may also assume $\phi$ is $2\pi$-periodic in $x$. We generally assume the bottom wall is at $y=0$ in the finite depth case and absorb the mean fluid depth into $\eta$ itself. This causes the complex conjugate of the surface parametrization to be a reflection of the free surface across the bottom boundary, which simplifies many formulas in the boundary integral formulation below. The same strategy can also be applied in the presence of a more general bottom topography nachbin:04 ().

Equation (2.1a) is a kinematic condition requiring that particles on the surface remain there. Equation (2.1b) comes from the chain rule $\varphi_t = \phi_t + \phi_y\,\eta_t$ and the unsteady Bernoulli equation, $\phi_t + \tfrac{1}{2}|\nabla\phi|^2 + \frac{p}{\rho} + gy = C(t)$, where $C(t)$ is constant in space but otherwise arbitrary. At the free surface, we assume the pressure jump across the interface due to surface tension is proportional to curvature, $p = -\sigma\,\partial_x\big(\eta_x/\sqrt{1+\eta_x^2}\,\big)$. The ambient pressure is absorbed into the arbitrary function $C(t)$, which is chosen to preserve the mean of $\varphi$:

(2.4) $C(t) = \frac{1}{2\pi}\int_0^{2\pi}\Big(\tfrac{1}{2}\phi_x^2 + \tfrac{1}{2}\phi_y^2 - \phi_y\,\eta_t + g\eta\Big)\,dx.$

The advantage of this construction is that $\phi$ is time-periodic with period $T$ if and only if $\eta$ and $\varphi$ are time-periodic with the same period. Otherwise, $\varphi(\cdot,T)$ could differ from $\varphi(\cdot,0)$ by a constant function without affecting the periodicity of the velocity field $\nabla\phi$.

Details of our boundary integral formulation are given in Appendix A. Briefly, we identify $\mathbb{R}^2$ with $\mathbb{C}$ and parametrize the free surface by

(2.5) $\zeta(\alpha,t) = \xi(\alpha) + i\,\eta\big(\xi(\alpha),t\big), \qquad 0 \le \alpha \le 2\pi,$

where the change of variables $x = \xi(\alpha)$ allows for smooth mesh refinement in regions of high curvature, and the dependence of $\zeta$ on $t$ has been suppressed in the notation. We compute the Dirichlet-Neumann operator craig:sulem:93 (),

(2.6) $\mathcal{G}\varphi = \sqrt{1+\eta_x^2}\;\frac{\partial\phi}{\partial n}\bigg|_{y=\eta(x,t)} = \big(\phi_y - \eta_x\,\phi_x\big)\Big|_{y=\eta(x,t)},$

which appears implicitly in the right hand side of (2.1) through $\phi_x$ and $\phi_y$, in three steps. First, we solve the integral equation

(2.7) $\frac{1}{2}\,\mu(\alpha) + \int_0^{2\pi} K(\alpha,\beta)\,\mu(\beta)\,d\beta = \varphi\big(\xi(\alpha)\big)$

for the dipole density, $\mu$, in terms of the (known) Dirichlet data $\varphi$. Formulas for $K$ and the kernel $G$ appearing in (2.8) are given in (2.13) below. These kernels are smooth functions (even at $\alpha=\beta$), so the integral is not singular; see Appendix A. Second, we differentiate $\mu$ with respect to $\alpha$ to obtain the vortex sheet strength, $\gamma = \mu'$. Finally, we evaluate the normal derivative of $\phi$ at the free surface via

(2.8) $\xi'(\alpha)\,\mathcal{G}\varphi\big(\xi(\alpha)\big) = \frac{1}{2}\,H[\gamma](\alpha) + \int_0^{2\pi} G(\alpha,\beta)\,\gamma(\beta)\,d\beta.$

Here $K$ and $G$ are defined in (2.13) below, and $H$ is the Hilbert transform, which is diagonal in Fourier space with symbol $-i\,\mathrm{sgn}(k)$. The only unbounded operation in this procedure is the second step, in which $\gamma$ is obtained from $\mu$ by taking a derivative.
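A useful sanity check for any implementation of this procedure is the linearized (flat-surface) limit, in which the Dirichlet-Neumann operator is diagonal in Fourier space with symbol $|k|\tanh(|k|h)$ (or $|k|$ in infinite depth). A minimal spectral sketch:

```python
import numpy as np

def dno_flat(phi_vals, depth=np.inf):
    """Dirichlet-Neumann operator for a flat surface and 2*pi-periodic data,
    applied spectrally: the k-th Fourier mode is multiplied by |k|*tanh(|k|*depth)
    (or by |k| in infinite depth)."""
    M = len(phi_vals)
    k = np.fft.fftfreq(M, d=1.0 / M)        # integer wavenumbers 0, 1, ..., -1
    sym = np.abs(k) if np.isinf(depth) else np.abs(k) * np.tanh(np.abs(k) * depth)
    return np.fft.ifft(sym * np.fft.fft(phi_vals)).real

x = 2 * np.pi * np.arange(64) / 64
phi = np.cos(3 * x)
err = np.max(np.abs(dno_flat(phi) - 3 * np.cos(3 * x)))   # exact answer: 3*cos(3x)
print(err)
```

The boundary integral computation described above should reproduce these values to spectral accuracy when $\eta \equiv \mathrm{const}$.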

### 2.2 GPU-accelerated time-stepping and quadruple precision

Next we turn to the question of discretization. Because we are interested in studying large amplitude standing waves that develop relatively sharp wave crests for brief periods of time, we discretize space and time adaptively. Time is divided into segments $[t_{i-1},t_i]$, $1\le i\le S$, where $t_0=0$ and $t_S$ is the simulation time, usually an estimate of the period or quarter-period. In the simulations reported here, $S$ ranges from 1 to 5, and on each segment the number of timesteps was close to the number of spatial grid points (within a factor of two). On segment $i$, we fix the number of (uniform) timesteps, $N_i$, the number of spatial grid points, $M_i$, and the function

(2.10) $\xi_i(\alpha) = \alpha - \frac{r_i}{2}\,\sin(2\alpha),$

which controls the grid spacing in the change of variables $x = \xi_i(\alpha)$; see Figure 1. The parameter $r_i$ lies in the range $0 \le r_i < 1$ and satisfies

(2.11) $\xi_i'(\alpha) = 1 - r_i\cos(2\alpha) > 0, \qquad 0 \le \alpha \le 2\pi.$

Note that $r_i=0$ corresponds to uniform spacing while $r_i=1$ corresponds to the singular limit in which $\xi_i$ ceases to be a diffeomorphism. This approach takes advantage of the fact that we can arrange in advance that the wave crests will form at $x=0$ and $x=\pi$, alternating between the two in time. A more automated approach would be to have the grid spacing evolve with the wave profile, perhaps as a function of curvature, rather than asking the user to specify the change of variables. We did not experiment with this idea since our approach also allows the number of grid points to increase in time, which would be complicated in an automated approach. We always set $r_1=0$ so that the grid is uniform on the first segment. Respacing the grid from segment $i$ to $i+1$ boils down to interpolating $\eta$ and $\varphi$ to obtain values on the new mesh, e.g. solving $\xi_i(\alpha) = \xi_{i+1}(2\pi j/M_{i+1})$ for $\alpha$ and evaluating the Fourier representations of $\eta$ and $\varphi$ there, which is straightforward by Newton's method. To be safe, we avoid refining the mesh in one region at the expense of another; thus, whenever we increase $r_i$, we also increase $M_i$ enough that the grid spacing decreases throughout the interval, but more so in the region where the wave crest is forming.
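For illustration, a simple family of diffeomorphisms with the qualitative behavior described above (the specific map below is an assumption of this sketch, not necessarily the one used in the code) clusters grid points near $\alpha=0$ and $\alpha=\pi$, where the wave crests form:

```python
import numpy as np

def xi(alpha, r):
    """Diffeomorphism of [0, 2*pi): uniform spacing for r = 0; as r -> 1, grid
    points cluster near alpha = 0 and alpha = pi, where the derivative
    xi'(alpha) = 1 - r*cos(2*alpha) is smallest. For 0 <= r < 1, xi' > 0, so
    the map remains a diffeomorphism."""
    return alpha - 0.5 * r * np.sin(2 * alpha)

M = 32
alpha = 2 * np.pi * np.arange(M) / M
x = xi(alpha, r=0.9)
spacing = np.diff(np.concatenate([x, [2 * np.pi]]))   # gaps between grid points
print(spacing.min(), spacing.max())   # finest spacing occurs near x = 0 and pi
```

Because the map is smooth and periodic, spectral accuracy is preserved: derivatives in $x$ become derivatives in $\alpha$ divided by $\xi'(\alpha)$, both computable by the FFT.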

Since the evolution equations are not stiff unless the surface tension is large, high order explicit time-stepping schemes work well. For each Runge-Kutta stage within a timestep on a given segment $i$, the integral equation (2.7) is solved by collocation using the $M_i$ uniformly spaced grid points $\alpha_j = 2\pi j/M_i$ and the (spectrally accurate) trapezoidal rule,

(2.12) $\int_0^{2\pi} K(\alpha_j,\beta)\,\mu(\beta)\,d\beta \;\approx\; \frac{2\pi}{M_i}\sum_{k=0}^{M_i-1} K(\alpha_j,\alpha_k)\,\mu(\alpha_k).$

The matrices $A$ and $B$ that represent the discretized integral operators in (2.7) and (2.8) are computed simultaneously and in parallel. The formulas are $A_{jk} = \frac{1}{2}\delta_{jk} + \frac{2\pi}{M_i}K(\alpha_j,\alpha_k)$ and $B_{jk} = \frac{2\pi}{M_i}G(\alpha_j,\alpha_k)$ with

(2.13) $K(\alpha,\beta) = \frac{1}{2\pi}\,\mathrm{Im}\!\left[\frac{\zeta'(\beta)}{2}\cot\frac{\zeta(\beta)-\zeta(\alpha)}{2}\right], \qquad G(\alpha,\beta) = \frac{1}{2\pi}\,\mathrm{Re}\!\left[\frac{\zeta'(\beta)}{2}\cot\frac{\zeta(\beta)-\zeta(\alpha)}{2}\right] - \frac{1}{4\pi}\cot\frac{\beta-\alpha}{2},$

with analogous image terms, involving the reflected curve $\overline{\zeta(\alpha)}$, added in the finite depth case.

As explained in Appendix A, these kernels have been regularized. Indeed, $K$ and $G$ are continuous at $\alpha=\beta$ if we define $K(\alpha,\alpha) = \frac{1}{4\pi}\,\mathrm{Im}\big[\zeta''(\alpha)/\zeta'(\alpha)\big]$ and $G(\alpha,\alpha) = \frac{1}{4\pi}\,\mathrm{Re}\big[\zeta''(\alpha)/\zeta'(\alpha)\big]$. These formulas are used when computing the diagonal entries $A_{jj}$ and $B_{jj}$. The $\cot\frac{\beta-\alpha}{2}$ terms in (2.13), which depend only on $\beta-\alpha$, are computed once and for all at the start. If the fluid depth is infinite, the image-kernel terms are omitted. GMRES is used to solve (2.7) for $\mu$, which consistently takes 4-30 iterations to reach machine precision (independent of problem size). In quadruple precision, the typical range is 9-36 GMRES iterations. The FFT is used to compute $\gamma = \mu'$ and $H[\gamma]$ in (2.8), as well as $\zeta'$, $\zeta''$, $\eta_x$ and $\varphi_x$.
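The structure of the discretized problem can be illustrated on a model second-kind equation with a smooth periodic kernel (a standalone sketch, not the water-wave kernels): the trapezoidal rule is spectrally accurate for smooth $2\pi$-periodic integrands, and the identity-plus-compact structure keeps the linear system well conditioned, which is why GMRES needs only a handful of iterations.

```python
import numpy as np

M = 64
alpha = 2 * np.pi * np.arange(M) / M

# Model smooth 2*pi-periodic kernel (a stand-in for the regularized kernels)
K = 0.1 * np.cos(alpha[:, None] - alpha[None, :])

# Nystrom discretization of the second-kind equation  mu/2 + int K mu = phi
# using the trapezoidal rule, which is spectrally accurate for periodic data
A = 0.5 * np.eye(M) + (2 * np.pi / M) * K
phi = np.cos(alpha)
mu = np.linalg.solve(A, phi)

# Second-kind structure keeps the system well conditioned
print(np.linalg.cond(A))
print(np.max(np.abs(A @ mu - phi)))
```

In the actual code, the dense solve is replaced by GMRES with the matrix-vector product performed on the GPU, but the conditioning argument is the same.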

We wrote three versions of the code, which differ only in how the matrices $A$ and $B$ are computed. The simplest version uses openMP parallel for loops to distribute the work among all available threads. The most complicated version is parallelized using MPI and scalapack. In this case, the matrices $A$ and $B$ are stored in block-cyclic layout Lim96 () across the processors, and each processor computes only the matrix entries it is responsible for. The fastest version of the code is parallelized on a GPU in the cuda programming language. First, the CPU sends the GPU the vector of surface positions $\zeta(\alpha_j)$, which holds $M_i$ complex numbers. Next, the GPU computes the matrices $A$ and $B$ and stores them in device memory. Finally, in the GMRES iteration, Krylov vectors are sent to the GPU, which applies the matrix $A$ and returns the result as a vector. After the last Krylov iteration, the device also applies $B$ to $\gamma$ to help compute (2.8). Thus, communication with the GPU involves passing vectors of length $M_i$, while $O(M_i^2)$ flops must be performed on each vector passed in. As a result, communication does not pose a computational bottleneck, and the device operates at near 100% efficiency. We remark that the kernel formula (2.13) for the matrix entries is relatively expensive to evaluate. Thus, it pays to compute $A$ and $B$ simultaneously (to re-use the trigonometric and complex arithmetic common to both kernels), and to actually store the matrices in device memory rather than re-compute the matrix entries each time a matrix-vector product is required.

In double-precision, we evolve (2.1) using Dormand and Prince's DOP853 scheme hairer:I (). This is a 13 stage, 8th order, "first same as last" Runge-Kutta method, so the effective cost of each step is 12 function evaluations. We apply the 36th order filter described in hou:li:07 () to the right hand sides of (2.1a) and (2.1b) each time they are evaluated in the Runge-Kutta procedure, and to the solution itself at the end of each time-step. This filter consists of multiplying the $k$th Fourier mode by

(2.14) $\exp\!\big[-36\,(|k|/k_{\max})^{36}\big], \qquad k_{\max} = M_i/2,$

which allows the highest-frequency Fourier modes to remain non-zero (to help resolve the solution) while still suppressing aliasing errors. To achieve truncation errors comparable to quadruple-precision roundoff, the 8th order method requires too many timesteps. Through trial and error, we found that a 15th order spectral deferred correction (SDC) method dutt (); huang (); minion () is the most efficient scheme for achieving this level of accuracy. Our GPU implementation of quadruple precision arithmetic will be discussed briefly in Section 3.3. The variant of SDC that we use in this paper employs eight Radau IIa quadrature nodes hairer:I (). The initial values at the nodes are obtained via fourth order Runge-Kutta. Ten correction sweeps are then performed to improve the solution to the accuracy of the underlying collocation scheme at the quadrature nodes. We use pure Picard corrections instead of the more standard forward-Euler corrections as they have slightly better stability properties. The final integration step yields a local truncation error of $O(\Delta t^{16})$; hence, the method is 15th order. See sdc:picard () for more information about this variant of the SDC method and its properties. If one wished to go beyond quadruple-precision arithmetic, it is straightforward to increase the order of the time-stepping scheme accordingly. We did not investigate the use of symplectic integrators since our approach already conserves energy to 12-16 digits of accuracy in double precision, and 24-32 digits in quadruple precision.
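The smoothing step can be sketched as follows, assuming the Hou-Li exponential form of the 36th order filter with the cutoff normalized to the Nyquist mode:

```python
import numpy as np

def filter_36(u):
    """Multiply the k-th Fourier mode of a real periodic signal by
    exp(-36*(|k|/k_max)^36), with k_max = M/2. Low modes are essentially
    untouched; the highest modes are damped smoothly but not set to zero."""
    M = len(u)
    k = np.fft.fftfreq(M, d=1.0 / M)
    sigma = np.exp(-36.0 * (np.abs(k) / (M / 2)) ** 36)
    return np.fft.ifft(sigma * np.fft.fft(u)).real

x = 2 * np.pi * np.arange(128) / 128
print(np.max(np.abs(filter_36(np.cos(x)) - np.cos(x))))   # low mode: unchanged
print(np.max(np.abs(filter_36(np.cos(63 * x)))))          # near-Nyquist: damped
```

Because the filter function equals 1 to machine precision over most of the spectrum, it does not degrade the spectral accuracy of well-resolved solutions.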

### 2.3 Translational and time-reversal symmetry

In this paper, we restrict attention to symmetric standing waves of the type studied in rayleigh1876 (); penney:52 (); tadjbakhsh (); concus:62 (); mercer:92 (); mercer:94 (); schultz (); ioualalen:03 (). For these waves, it is only necessary to evolve the solution over a quarter period. Indeed, if at some time $t_0$ the fluid comes to rest ($\varphi \equiv 0$), a time-reversal argument shows that the solution will evolve back to the initial state at $2t_0$ with the sign of $\varphi$ reversed. More precisely, the condition $\varphi(\cdot,t_0) \equiv 0$ implies that $\eta(x,t_0+t) = \eta(x,t_0-t)$ and $\varphi(x,t_0+t) = -\varphi(x,t_0-t)$. Now suppose that, upon translation by $\pi$, $\eta(\cdot,0)$ remains invariant while $\varphi(\cdot,0)$ changes sign. Then we see that $\tilde\eta(x,t) = \eta(x+\pi,-t)$ and $\tilde\varphi(x,t) = -\varphi(x+\pi,-t)$ are solutions of (2.1) with initial conditions

$\tilde\eta(x,0) = \eta(x,0), \qquad \tilde\varphi(x,0) = \varphi(x,0).$

Therefore, $\eta(x,t) = \eta(x+\pi,-t)$, $\varphi(x,t) = -\varphi(x+\pi,-t)$, and, combining these relations with the reflection about $t_0$,

$\eta(x,t+2t_0) = \eta(x+\pi,t), \qquad \varphi(x,t+2t_0) = \varphi(x+\pi,t).$

Hence, $\eta$ and $\varphi$ are time-periodic with period $T = 4t_0$. It is natural to expect standing waves to have even symmetry when the origin is placed at a crest or trough and the fluid comes to rest. This assumption implies that $\eta$ and $\varphi$ will remain even functions for all time since $\eta_t$ and $\varphi_t$ in (2.1) are even whenever $\eta$ and $\varphi$ are. Under all these assumptions, the evolution of $\eta$ and $\varphi$ from $t_0$ to $2t_0$ is a mirror image (about $x=0$ or $x=\pi$) of the evolution from $0$ to $t_0$.

Once the initial conditions and period are found using symmetry to accelerate the search for time-periodic solutions, we double-check that the numerical solution evolved from $t=0$ to $t=T$ is indeed time-periodic. Mercer and Roberts exploited similar symmetries in their numerical computations mercer:92 (); mercer:94 ().

## 3 Overdetermined shooting methods

As discussed in the introduction, two-point boundary value problems governed by partial differential equations must be discretized before solving them numerically. However, truncation errors lead to loss of accuracy in the highest-frequency modes of the numerical solution, which can cause difficulty for the convergence of shooting methods. We will see below that robustness can be achieved by posing these problems as overdetermined nonlinear systems.

In Section 3.1, we define two objective functions with the property that driving them to zero is equivalent to finding a time-periodic standing wave. One of the objective functions exploits the symmetry discussed above to reduce the simulation time by a factor of 4. The other is more robust as it naturally penalizes high-frequency Fourier modes of the initial conditions. Both objective functions use symmetry to reduce the number of unknowns and eliminate phase shifts of the standing waves in space and time. The problem is overdetermined because the highest-frequency Fourier modes are constrained to be zero initially but not at the final time. Also, because the state at $t=T/4$ often corresponds to a sharply crested wave profile, there are more active Fourier modes in the solution at that time than at $t=0$. By refining the mesh adaptively, we include all of these active modes in the objective functions, making them more overdetermined. The idea that the underlying dynamics of standing water waves is lower-dimensional than predicted by counting active Fourier modes has recently been explored by Williams et al. pod ().

In Sections 3.2 and 3.3, we describe two methods for solving the resulting nonlinear least squares problem. The first is the Adjoint Continuation Method benj1 (); benj2 (); vtxs1 (); lasers (), in which the gradient of the objective function is computed by solving an adjoint PDE and the BFGS algorithm bfgs (); nocedal () is used for the minimization. The second is a trust-region approach in which the Jacobian is computed by solving the variational equation in parallel with multiple right-hand sides. This allows the work of computing the Dirichlet-Neumann operator to be shared across all the columns of the Jacobian. We also discuss implementation issues in quadruple precision on a GPU.

### 3.1 Nonlinear least squares formulation

In the symmetric standing wave case considered here, we assume the initial conditions are even functions satisfying $\eta(x+\pi,0) = \eta(x,0)$ and $\varphi(x+\pi,0) = -\varphi(x,0)$. In Fourier space, they take the form

(3.15) $\eta(x,0) = \sum_{\substack{2 \le k \le n \\ k\ \mathrm{even}}} c_k \cos kx, \qquad \varphi(x,0) = \sum_{\substack{1 \le k \le n \\ k\ \mathrm{odd}}} c_k \cos kx,$

where $c_1,\dots,c_n$ are real numbers, and all other Fourier modes of the initial conditions are set to zero. (In the finite depth case, we also set the mean of $\eta$ equal to $h$, the mean fluid depth.) Here $n$ is taken to be somewhat smaller than $M_1/2$, e.g. $n = M_1/3$, where $M_1$ is the number of spatial grid points used during the first $N_1$ timesteps. (Recall that subscripts on $M$ and $N$ refer to mesh refinement sub-intervals.) Note that high-frequency Fourier modes of the initial condition are zero-padded to improve resolution of the first $n$ Fourier modes.

In addition to the Fourier modes of the initial condition, the period $T$ of the solution is unknown. We add a zeroth component to $c$ to represent the period:

(3.16) $c = (c_0, c_1, \dots, c_n), \qquad c_0 = T.$

Our goal is to find $c$ such that $\varphi(x,T/4) \equiv 0$, so that the fluid comes to rest at the quarter period. We therefore define the objective function

(3.17) $f(c) = \frac{1}{2}\sum_{j=0}^{M_S-1} r_j^2, \qquad r_j = \sqrt{\frac{\xi_S'(\alpha_j)}{M_S}}\;\varphi\big(\xi_S(\alpha_j),\,T/4\big),$

where $S$ is the index of the final sub-interval in the mesh refinement strategy and the square root is a quadrature weight to approximate the integral $\frac{1}{4\pi}\int_0^{2\pi}\varphi(x,T/4)^2\,dx$ via the trapezoidal rule after the change of variables $x = \xi_S(\alpha)$, $dx = \xi_S'(\alpha)\,d\alpha$. Note that $f$ is a sum of $M_S$ squares, which is usually several times larger than $n+1$, the number of non-zero initial conditions. The numerical solution is not sensitive to the choice of $n$ and $M_1$ as long as enough zero-padding is included in the initial condition to resolve the highest frequency Fourier modes. This will be confirmed in Section 4.4 through mesh-refinement studies and comparison with quadruple-precision computations.

One can also use an objective function that measures deviation from time-periodicity directly:

(3.18) $f(c) = \frac{1}{4\pi}\int_0^{2\pi}\Big(\big[\eta(x,T)-\eta(x,0)\big]^2 + \big[\varphi(x,T)-\varphi(x,0)\big]^2\Big)\,dx.$

When the underlying PDE is stiff (e.g. for the Benjamin-Ono benj1 (); benj2 () or KdV equations), an objective function of the form (3.18) has a key advantage over (3.17). For stiff problems, semi-implicit time-stepping methods are used in order to take reasonably large time-steps. Such methods damp high-frequency modes of the initial condition. This causes these modes to have little effect on an objective function of the form (3.17); thus, the Jacobian can be poorly conditioned if the shooting method attempts to solve for too many modes. By contrast, when implemented via (3.18), the initial conditions of high-frequency modes are heavily penalized for deviating from the damped values at time $T$. As a result, the Jacobian does not suffer from rank deficiency, and high-frequency modes do not drift far from zero unless doing so is helpful. Since the water wave equations are not stiff, we use explicit schemes that do not significantly damp high-frequency modes; therefore, the computational advantage of evolving over a quarter-period outweighs any robustness advantage of using (3.18).

We used symmetry to reduce the number of unknown initial conditions in (3.15). This has the added benefit of selecting the spatial and temporal phase of each solution in a systematic manner. In problems where the symmetries of the solution are not known in advance, or to search for symmetry-breaking bifurcations, one can revert to the approach described in benj1 (), where both real and imaginary parts of the leading Fourier modes of the initial condition were computed in the search for time-periodic solutions. To eliminate spatial and temporal phase shifts, one of the Fourier modes was constrained to be real and its time derivative was required to be imaginary. Constraining the time-derivative of a mode is most easily done with a penalty function benj1 (). Alternatively, if two modes are constrained to be real and their time-derivatives are left arbitrary, it is easier to remove their imaginary parts from the search space than to use a penalty function.

Once phase shifts have been eliminated, the time-periodic solutions we have found appear to sweep out two-parameter families. To compute a solution in a family, we specify the mean depth $h$ and the value of one of the $c_k$ in (3.15) or (3.16) and solve for the other components of $c$ to minimize the objective function. If $f$ is reduced below a specified threshold (set near the roundoff level of the working precision, whether double or quadruple), we consider the solution to be time-periodic. If $f$ reaches a local minimum that is higher than the specified threshold, we either (1) refine the mesh, increase $n$, and try again; (2) choose a value of $c_k$ closer to the previous successful value; or (3) change the index $k$ specifying which Fourier mode is used as a bifurcation parameter. Switching to a different $c_k$ is often useful when tracking a fold in the bifurcation curve. Since $c_0 = T$ and one parameter has been frozen, $f$ is effectively a function of $n$ variables.
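The continuation procedure can be sketched on a toy problem (the residual, Jacobian, and tolerance below are placeholders, and `minimize_f` stands in for the trust-region or BFGS solver):

```python
import numpy as np

# Toy residual, linear in the unknowns c; r(c, s) = 0 defines the solution branch
def residual(c, s):
    return np.array([c[0] - np.sin(s), c[1] - s * c[0]])

def jacobian(c, s):
    return np.array([[1.0, 0.0], [-s, 1.0]])

def minimize_f(c, s, steps=5):
    # Gauss-Newton stand-in for the trust-region / Levenberg-Marquardt solver
    for _ in range(steps):
        c = c - np.linalg.lstsq(jacobian(c, s), residual(c, s), rcond=None)[0]
    return c, 0.5 * np.sum(residual(c, s) ** 2)

tol = 1e-24
c, branch = np.zeros(2), []
for s in np.linspace(0.0, 1.0, 11):     # continuation parameter (frozen per solve)
    c_try, fval = minimize_f(c, s)
    if fval > tol:
        continue   # in practice: refine the mesh, take a smaller step in s,
                   # or switch which coefficient serves as the parameter
    c = c_try
    branch.append((s, c))

print(len(branch))
```

Warm-starting each solve from the previous converged solution is what makes continuation efficient; the fallback branch corresponds to the recovery strategies enumerated above.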

We note that once and the mesh parameters , , , and are chosen, is a smooth function that can be minimized using a variety of optimization techniques. Small divisors come into play when deciding whether would really converge to zero in the mesh refinement limit (with ever increasing numerical precision). The answer may depend on whether the bifurcation parameters ( and either or one of the ) are allowed to vary within the tolerance of the current roundoff threshold each time the mesh is refined and the floating point precision is increased. While it is likely that small divisors prevent the existence of smooth families of exact solutions, exceedingly accurate approximate solutions do appear to sweep out smooth families, with occasional disconnections in the bifurcation curves due to resonance.

### 3.2 Adjoint continuation method

Having recast the shooting method as an overdetermined nonlinear least squares problem, we must now minimize the functional in (3.17) or (3.18). The first approach we tried was the adjoint continuation method (ACM) developed by Ambrose and Wilkening to study time-periodic solutions of the Benjamin-Ono equation benj1 (); benj2 () and the vortex sheet with surface tension vtxs1 (). The method has also been used by Williams et al. to study the stability transition from single-pulse to multi-pulse dynamics in a mode-locked laser system lasers ().

The idea of the ACM is to compute the gradient of with respect to the initial conditions by solving an adjoint PDE, and then minimize using the BFGS method bfgs (); nocedal (). BFGS is a quasi-Newton algorithm that builds an approximate (inverse) Hessian matrix from the sequence of gradient vectors it encounters on successive line-searches. In more detail, let and denote the system (2.1) abstractly by

(3.19) |

We define the inner product

(3.20) |

so that in (3.17), written now as a function of the initial conditions and proposed period, which themselves depend on via (3.15) and (3.16), takes the form

(3.21) |

where solves (3.19). The case with of the form (3.18) is similar, so we omit details here. In the course of minimizing , the BFGS algorithm will repeatedly query the user to evaluate both and its gradient at a sequence of points . The derivative, , is easily obtained by evaluating

(3.22) |

using the trapezoidal rule after changing variables, , . Note that and are already known by solving (2.1). One way to compute the other components of , say , would be to solve the variational equation, (written abstractly here and explicitly in Appendix B)

(3.23) |

with initial conditions

(3.24) |

to obtain

(3.25) |

Note that a dot denotes a directional derivative with respect to the initial condition, not a time-derivative. To avoid the expense of solving (3.23) repeatedly for each value of , we solve a single adjoint PDE to find such that . From (3.25), we have

(3.26) |

where we have defined with and . Note that replacing 0 by did not affect the inner product. Next we observe that the solution of the adjoint equation

(3.27) |

which evolves backward in time (), has the property that

(3.28) |

Setting shows that this constant is actually . Setting gives the form we want:

(3.29) |

From (3.24), we obtain

(3.30) |

Together with (3.22), this gives all the components of at once. Explicit formulas for the linearized and adjoint equations (3.23) and (3.27) are derived in Appendix B.
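The payoff of the adjoint computation, one backward solve replacing one forward solve per gradient component, can be illustrated on a toy autonomous linear system, with the objective ½|q(T)|² standing in for (3.21). In the actual method the linearized and adjoint equations are non-autonomous and require the stored forward solution; that complication is omitted in this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, steps = 4, 1.0, 200
A = 0.5 * rng.standard_normal((n, n))     # toy autonomous linear system q' = A q
q0 = rng.standard_normal(n)

def rk4(rhs, q, T, steps):
    """Integrate q' = rhs(q) from 0 to T with classical RK4."""
    h = T / steps
    for _ in range(steps):
        k1 = rhs(q)
        k2 = rhs(q + 0.5 * h * k1)
        k3 = rhs(q + 0.5 * h * k2)
        k4 = rhs(q + h * k3)
        q = q + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return q

def f(q0):
    """Toy objective: half the squared norm of the state at time T."""
    qT = rk4(lambda q: A @ q, q0, T, steps)
    return 0.5 * qT @ qT

# one forward solve, then ONE backward adjoint solve yields the whole gradient:
# mu' = -A^T mu from t = T down to 0, i.e. mu' = +A^T mu in reversed time s = T - t,
# started from the residual q(T); the gradient is mu at s = T
qT = rk4(lambda q: A @ q, q0, T, steps)
grad_adjoint = rk4(lambda m: A.T @ m, qT, T, steps)

# check against central finite differences (2n extra forward solves)
eps = 1e-6
grad_fd = np.array([(f(q0 + eps * e) - f(q0 - eps * e)) / (2 * eps)
                    for e in np.eye(n)])
assert np.allclose(grad_adjoint, grad_fd, rtol=1e-6, atol=1e-8)
```

For this RK4 discretization the backward sweep with the transposed matrix reproduces the discrete gradient essentially exactly, which is why the adjoint gradient pairs so well with BFGS.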

Like (3.23), the adjoint equation (3.27) is linear but non-autonomous, due to the presence of the solution of (3.19) in the equation. In the BFGS method, the gradient is always requested immediately after the function value is computed; thus, if the solution and its time derivative are stored in memory at each timestep of the forward solve, they are available in the adjoint solve at intermediate Runge-Kutta steps through cubic Hermite interpolation. We actually use dense output formulas shampine86 (); hairer:I () for the 5th and 8th order Dormand-Prince schemes, since cubic Hermite interpolation limits the accuracy of the adjoint solve to 4th order, but the idea is the same. If there is insufficient memory to store the solution at every timestep, we store the solution at equally spaced mile-markers and re-compute between them when the adjoint solve reaches that region. Thus, the gradient can be computed in approximately the same amount of time as the objective function itself, or twice the time if mile-markers are used.
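The mile-marker strategy can be sketched generically; `step` below is a stand-in for one explicit timestep, and the point is the storage/recompute trade-off: O(N/stride) storage in exchange for at most `stride` extra forward steps per query during the backward sweep.

```python
def step(q):
    """One toy explicit timestep (a stand-in for a Runge-Kutta step)."""
    return q + 0.01 * q * (1.0 - q)

N, stride = 1000, 100                  # total steps and mile-marker spacing
q = 0.1
markers = {0: q}
for i in range(1, N + 1):
    q = step(q)
    if i % stride == 0:
        markers[i] = q                 # store only every `stride`-th state

def state_at(i):
    """Recover the state after step i from the nearest preceding mile-marker."""
    base = (i // stride) * stride
    q = markers[base]
    for _ in range(i - base):
        q = step(q)
    return q

# reference: recompute step 537 directly from the initial condition
q_ref = 0.1
for _ in range(537):
    q_ref = step(q_ref)
assert abs(state_at(537) - q_ref) < 1e-14   # identical arithmetic, so it matches
```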

It is worth mentioning that, when discretized, the values of and are stored on a non-uniformly spaced grid for each segment in the mesh-refinement strategy. The adjoint variables , are stored at the same mesh points, and are initialized by

with no additional weight factors needed. This works because the inner product (3.20) is defined with respect to rather than , and the change of variables has been accounted for by the factor in the formula (3.17) for .

### 3.3 Trust-region shooting method

While the ACM method gives an efficient way of computing the gradient of , it takes many line-search iterations to build up an accurate approximation of the Hessian of . This misses a key opportunity for parallelism and re-use of data that can be exploited if we switch from the BFGS framework to a Levenberg-Marquardt approach nocedal (). Instead of solving the adjoint equation (3.27) to compute efficiently, we solve the variational equation (3.23) with multiple right-hand sides to compute all the columns of the Jacobian simultaneously. From (3.17), we see that

(3.31) |

where is initialized as in (3.24) for . We avoid the need to store at every timestep (or at mile-markers) by evolving along with rather than interpolating :

(3.32) |

In practice, we replace in (3.32) by the matrix to compute all the columns of (besides ) at once. The linearized equations (B.56) involve the same Dirichlet-to-Neumann operator as the nonlinear equations (2.1), so the matrices and in (2.7) and (2.8) only have to be computed once to evolve the entire matrix through a Runge-Kutta stage. Moreover, the linear algebra involved can be implemented at level 3 BLAS speed. For large problems, we perform an LU-factorization of , the cost of which is made up for many times over by replacing GMRES iterations with a single back-solve stage for each right-hand side. In the GPU version of the code, all the linear algebra involving and is performed on the device (using the CULA library). As before, communication with the device is minimal in comparison to the computational work performed there.

We emphasize that the main advantage of solving linearized equations is that the same DNO operator is used for each column of in a given Runge-Kutta stage. This opportunity is lost in the simpler approach of approximating through finite differences by evolving (3.19) repeatedly, with initial conditions perturbed in each coordinate direction:

(3.33) |

Thus, while finite differences can also be parallelized efficiently by evolving these solutions independently, the matrices and will be computed times more often in the finite difference approach, and most of the linear algebra will drop from running at level 3 BLAS speed to level 2. Details of our Levenberg-Marquardt implementation are given in Appendix C, where we discuss how to re-use the Jacobian several times rather than re-computing it each time a step is accepted.
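A minimal sketch of a Levenberg-Marquardt iteration with delayed Jacobian updates, in the spirit of the re-use strategy of Appendix C, is shown below on a hypothetical exponential-fitting problem (not the water wave system): the Jacobian is refreshed only after several accepted steps or after a rejection, and the damping parameter is adapted in the usual way.

```python
import numpy as np

t = np.linspace(0.0, 2.0, 10)
y = 2.0 * np.exp(-1.0 * t)                 # synthetic data with an exact fit

def residual(x):                           # overdetermined: 10 residuals, 2 unknowns
    a, b = x
    return a * np.exp(b * t) - y

def jacobian(x):
    a, b = x
    e = np.exp(b * t)
    return np.column_stack([e, a * t * e])

x = np.array([1.5, -0.5])                  # initial guess
lam, jac_age = 1e-3, 0
J, r = jacobian(x), residual(x)
for _ in range(500):
    g = J.T @ r
    if np.linalg.norm(g) < 1e-13:
        break
    dx = np.linalg.solve(J.T @ J + lam * np.eye(2), -g)   # damped normal equations
    r_new = residual(x + dx)
    if r_new @ r_new < r @ r:              # accept; the stale Jacobian is still useful
        x, r = x + dx, r_new
        lam = max(lam / 3.0, 1e-12)
        jac_age += 1
        if jac_age >= 4:                   # refresh only every few accepted steps
            J, jac_age = jacobian(x), 0
    else:                                  # reject: raise damping and refresh J
        lam *= 10.0
        J, jac_age = jacobian(x), 0

assert np.allclose(x, [2.0, -1.0], atol=1e-6)
```

In the water wave setting each `jacobian` call costs many PDE solves, so stretching its lifetime over several accepted steps is where the savings come from.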

The CULA and LAPACK libraries could not be used for quadruple precision calculations, and we did not try FFTW in that mode. Instead, we used custom FFT and linear algebra libraries (written by Wilkening) for this purpose. However, for the GPU, we did not have any previous code to build on. Our solution was to write a block version of matrix-matrix multiplication in CUDA to compute residuals in quadruple precision, then use iterative refinement to solve for the corrections in double-precision, using the CULA library. Although quadruple precision is not native on any current GPU, we found M. Lu’s gqd package gqd (), which is a CUDA implementation of Bailey’s qd package qd (), to be quite fast. Our code is written so that the floating point type can be changed through a simple typedef in a header file. This is possible in C++ by overloading function names and operators to call the appropriate versions of routines based on the argument types.
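The mixed-precision refinement idea can be demonstrated in miniature, with float32 standing in for the double-precision factorization and float64 for the quadruple-precision residual; in practice a stored LU factorization would be reused for each correction solve rather than re-factoring as `np.linalg.solve` does here.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
A = rng.standard_normal((n, n)) + 2.0 * np.sqrt(n) * np.eye(n)  # moderately conditioned
x_true = rng.standard_normal(n)
b = A @ x_true

# "cheap" precision solve: float32 plays the role of the double-precision factorization
A32 = A.astype(np.float32)
x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
err_single = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)

for _ in range(3):                         # iterative refinement
    r = b - A @ x                          # residual computed in HIGH precision
    dx = np.linalg.solve(A32, r.astype(np.float32))   # correction via cheap solve
    x += dx.astype(np.float64)

err_refined = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
assert err_refined < err_single / 100      # refinement sharply improves accuracy
```

Only the residual needs the expensive precision, which is exactly why computing quadruple-precision residuals with a custom CUDA matrix-matrix multiply, then correcting in double precision, pays off on the GPU.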

## 4 Numerical results

This section is organized as follows: In Section 4.1, we use the Adjoint Continuation Method to study standing waves of wavelength in water of uniform depth . Several disconnections in the bifurcation curves are encountered, which are shown to correspond physically to higher-frequency standing waves superposed (nonlinearly) on the low-frequency carrier wave. In Section 4.2, we use the trust-region approach to study a nucleation event in which isolated large-amplitude solutions, and closed loops of such solutions, suddenly exist for depths below a threshold value. This gives a new mechanism for the creation of additional branches of solutions (besides harmonic resonance mercer:94 (); smith:roberts:99 ()). In Section 4.3, we study a “Wilton ripple” phenomenon vandenBroeck:84 (); chen:saffman:79 (); bryant:stiassnie:94 (); bridges:standing:3d (); akers:wilton () in which a pair of “mixed mode” solutions bifurcate alongside the “pure mode” solutions at a critical depth. Our numerical solutions are accurate enough to identify the leading terms in the asymptotic expansion of these mixed mode solutions. Following the mixed-mode branches via numerical continuation reveals that they meet up with the pure mode branches again at large amplitude. We also study how this degenerate bifurcation splits when the fluid depth is perturbed. In Section 4.4, we study what goes wrong in the Penney and Price conjecture, which predicts that the limiting standing wave of extreme form will develop sharp 90-degree corner angles at the wave crests. We also discuss energy conservation, decay of Fourier modes, and validation of accuracy. In Section 4.5, we study collisions of counter-propagating solitary water waves that are elastic in the sense that the background radiation is identical before and after the collision. In Section 4.6, we compute time-periodic gravity-capillary waves of the type analyzed by Concus concus:62 () and Vanden-Broeck vandenBroeck:84 () using perturbation theory.
Finally, in Section 4.7, we compare the performance of the algorithms on a variety of parallel machines, using MINPACK as a benchmark for solving nonlinear least squares problems.

### 4.1 Standing waves of unit depth

We begin by computing a family of symmetric standing waves with mean fluid depth and zero surface tension. The linearized equations about a flat rest state are

(4.34) |

Thus, the linearized problem has standing wave solutions of the form

(4.35) |

Setting , , , these solutions have period . Here is a free parameter controlling the amplitude, and is determined by .
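Assuming the standard finite-depth dispersion relation ω² = g k tanh(k h), with the normalization g = 1, h = 1, k = 1 (our reading of the stripped inline symbols), the period of the linearized mode can be computed directly:

```python
import numpy as np

g, h, k = 1.0, 1.0, 1.0                    # assumed normalization and mode number
omega = np.sqrt(g * k * np.tanh(k * h))    # standard finite-depth dispersion relation
T = 2.0 * np.pi / omega                    # about 7.19975 for these values
```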

To find time-periodic solutions of the nonlinear problem, we start with a small amplitude linearized solution as an initial guess. Holding constant, we solve for the other in (3.15) using the ACM method of Section 3.2. Note that in the linearized regime. We then repeat this procedure for another value of to obtain a second small-amplitude solution of the nonlinear problem. The particular choices we made were and . We then varied in increments of , using linear extrapolation from the previous two solutions for the initial guess. The results are shown in Fig. 2. The two representative solutions labeled A and B show that the amplitude of the wave increases and the crest sharpens as the magnitude of increases. We chose to be negative so the peak at would occur at rather than . An identical bifurcation curve (reflected about the -axis) would be obtained by increasing from 0 to positive values. The solutions A and B would then be shifted by in space.

For most values of between and , the ACM method has no difficulty finding time-periodic solutions to an accuracy of . However, at , the minimum value of exceeds this target. On further investigation, we found there was a small gap, , where we were unable to compute time-periodic solutions even after increasing the resolution from 256 to 512 and decreasing the continuation stepsize to . By plotting other Fourier modes of the initial conditions versus the period, we noticed that the 9th mode jumps discontinuously when crossing this gap. A similar disconnection appears to be developing near solution B.

Studying the results of Fig. 2, we suspected we could find additional solutions by back-tracking from B to the region of the bifurcation curve around and performing a large extrapolation step to , hoping to jump over the disconnection at B. This worked as expected, causing us to land on the branch that terminates at G in Fig. 3. We used the same technique to jump from this branch to a solution between E and D. We were unable to find any new branches beyond C by extrapolation from earlier consecutive pairs of solutions.

Next we track each solution branch as far as possible in each direction. This requires switching among the as bifurcation parameters when traversing different regions of the solution space. The period, , is one of the options. We also experimented with pseudo-arclength continuation keller:68 (); doedel91 (); smith:roberts:99 (), but found that it is necessary to re-scale the Fourier modes to successfully traverse folds in the bifurcation diagram. This requires just as much human intervention as switching among the , so we abandoned the approach. The disconnections at A and B turn out to meet each other, so that B is part of a closed loop and A is connected to the branch containing G. We stopped at G, F, C because the computations became too expensive to continue further with the desired accuracy of using the adjoint continuation method.

The use of Fourier modes of the initial conditions in the bifurcation diagrams is unconventional, but yields insight about the effect of resonance on the dynamics of standing waves. We observe experimentally that disconnections in the bifurcation curves correspond to higher-frequency standing waves appearing at the surface of lower-frequency carrier waves. Because the equations are nonlinear, only certain combinations of amplitude and phase can occur. We generally see two possible solutions, one in which the high and low-frequency component waves are in phase with each other, and another where they are out of phase. For example, solutions F and G in Fig. 3 can both be described as a wave-number standing wave oscillating on top of a carrier wave, but the smaller wave sharpens the crest at F and flattens it at G, being 180 degrees out of phase at F versus G when the composite wave comes to rest. (All the standing waves of this paper reach a rest state at , by construction. Other types of solutions will be considered in future work water:stable ().) In Section 4.3, we show that this disconnection between branches F and G is caused by a harmonic resonance at fluid depth , where the period of the mode is equal to 3 times the period of the mode for small-amplitude waves smith:roberts:99 (); ioualalen:03 ().

In Fig. 4, we plot versus , along with the evolution of for several solutions over time. Note that the scale on the -axis is 20 times larger here (with ) than in Fig. 3 (with ). This is why the secondary standing waves in the plots of solutions F and G appear to have wave number . We also note that the disconnections at A and B are nearly invisible in the plot of vs . This is because the dominant wave number of these branches is . Similarly, it is difficult to observe any of the side branches in the plot of vs in Fig. 3 since they all sweep back and forth along nearly the same curve. We will return to this point in Section 4.3.

### 4.2 Nucleation of imperfect bifurcations

We next consider the effect of fluid depth on these bifurcation curves. We found the ACM method was too slow to perform this study effectively, which partly motivated us to develop the trust region shooting algorithm. As shown in Fig. 5, if the fluid depth is increased from to , it becomes possible to track branches F and G to completion. The large amplitude oscillations in the 7th Fourier mode eventually die back down when these branches are followed past the folds at in Fig. 5. The branches eventually meet each other at an imperfect bifurcation close to the initial bifurcation from the zero-amplitude state to the standing wave solutions. This imperfect bifurcation was not present at . Its nucleation will be investigated in greater detail in Section 4.3. The small bifurcation loops at A and B in Fig. 3 have disappeared by the time . If we continue to increase to , the top wing of the S-shaped bifurcation loop breaks free from the bottom wing and forms a closed loop. This loop disappears by the time reaches . By , the resonance has all but disappeared.

In Fig. 2, we saw that the period of standing waves of unit depth decreases to a local minimum before increasing with wave amplitude. Two of the plots of Fig. 5 show that this remains true for , but not for . In the latter case, the period begins increasing immediately rather than first decreasing to a minimum. This is consistent with the asymptotic analysis of Tadjbakhsh and Keller tadjbakhsh (), which predicts that

(4.36) |

where controls the wave amplitude, and agrees with in (4.35) to linear order. The correction term is positive for and negative for .

We will see in Section 4.3 that the nucleation of bifurcation branches between and is partly caused by a harmonic resonance (defined below) at fluid depth . As this mechanism is complicated, we also looked for simpler examples in deeper water. The simplest case we found is shown in Fig. 6. For fluid depth , we noticed a pair of disconnections in the bifurcation curves that were not present for . The 23rd Fourier mode of the initial condition exhibits the largest deviation from 0 on the side branches of these disconnections. However, as discussed in the next section, this is not caused by a harmonic resonance of type for some integer . To investigate the formation of these side branches, we swept through the region with slightly different values of , using as the bifurcation parameter. As shown in Fig. 6, when , the bifurcation curve bulges slightly but does not break. As is decreased to , a pair of disconnections appear and spread apart from each other. We selected as a good starting point to follow the side branches. As we hoped would happen, the two red side branches in the second panel of the figure met up with each other (at ), as did the two black branches (at ). We switched between and as bifurcation parameters to follow these curves. We then computed two paths (not shown) in which was held fixed as was increased. We selected 4 of these solutions to serve as starting points to track the remaining curves in Figure 6, which have fluid depths through given in the figure. We adjusted to achieve a near three-way bifurcation. This bifurcation is quite difficult to compute as the Hessian of becomes nearly singular; for this reason, some of the solutions had to be computed in quadruple precision to avoid falling off the curves. Finally, to find the points A and B where a single, isolated solution exists at a critical depth, we computed as a function of on a small grid patch near A and B, and maximized the polynomial interpolant using Mathematica.

### 4.3 Degenerate and secondary bifurcations due to harmonic resonance

In this section, we explore the source of the resonance between the and modes in water of depth close to 1. While harmonic resonances such as this have long been known to cause imperfect bifurcations mercer:94 (); smith:roberts:99 (); ioualalen:03 (), we are unaware that anyone has been able to track the side-branches all the way back to the origin, where they meet up with mixed-mode solutions of the type studied asymptotically by Vanden-Broeck vandenBroeck:84 () and numerically by Bryant and Stiassnie bryant:stiassnie:94 (). In the traveling wave case, such mixed-mode solutions are known as Wilton’s ripples chen:saffman:79 (); akers:wilton (). When the fluid depth is perturbed, we find that the degenerate bifurcation splits into a primary bifurcation and two secondary bifurcations bauer:keller (). This is consistent with Bridges’ work on perturbation of degenerate bifurcations in three-dimensional standing water waves in the weakly nonlinear regime bridges:standing:3d ().

We begin by observing that the ratio of the periods of two small-amplitude standing waves is

(4.37) |

If we require to be an integer and set , we obtain

(4.38) |

Following mercer:94 (); smith:roberts:99 (); ioualalen:03 (), we say there is a harmonic resonance of order if satisfies (4.38). At this depth, linearized standing waves of wave number have a period exactly times larger than standing waves of wave number . This nomenclature comes from the short-crested waves literature ioualalen:93 (); ioualalen:96 (); a more general framework can be imagined in which is not assumed equal to 1 and is allowed to be rational, but we do not need such generality.
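The defining condition for a harmonic resonance can be turned into a root-finding problem for the depth. This sketch assumes the condition takes the form k tanh(k h) = j² tanh(h), obtained by equating the linearized frequency of mode k to j times that of mode 1 under the standard dispersion relation ω² = g k tanh(k h); the function name is ours.

```python
import numpy as np

def resonance_depth(k, j, lo=1e-3, hi=10.0):
    """Depth h at which mode k oscillates exactly j times per period of mode 1,
    i.e. the root of k*tanh(k*h) - j**2*tanh(h) = 0 (requires j < k < j**2)."""
    f = lambda h: k * np.tanh(k * h) - j**2 * np.tanh(h)
    assert f(lo) * f(hi) < 0, "no sign change on the bracket"
    for _ in range(200):                   # plain bisection
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

h73 = resonance_depth(7, 3)                # near 1.0397
```

For example, `resonance_depth(7, 3)` returns a depth near 1.0397, i.e. close to unit depth, consistent with the resonance between the fundamental and the 7th mode discussed in this section.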

We remark that the nucleation event discussed in the previous section does not appear to be connected to a harmonic resonance. In that example, the fundamental mode must have a fairly large amplitude before the secondary wave becomes active, and the secondary wave is not a clean single-mode oscillation. Also, no choice of integers causes the fluid depth of a harmonic resonance to be close to the depth in question. The situation is simply that at a certain amplitude, the standing wave excites a higher-frequency, smaller-amplitude standing wave that oscillates at its surface. It is not possible to decrease both of their amplitudes to zero without destroying the resonant interaction in this case.

We now restrict attention to the harmonic resonance. Setting and in (4.38) yields

(4.39) |

In the nonlinear problem, when the fluid depth has this critical value, we find that the and branches persist as if the other were not present. Indeed, the former can be computed as a family of solutions on a fluid of depth . The latter can be computed by taking a pure solution of the linearized problem as a starting guess and solving for the other Fourier modes of the initial conditions, as before. When this is done, after setting , we find that , , , and . To obtain these numbers, we used 10 values of between and and computed the slope of a log-log plot. The calculations were done in quadruple precision with a 32 digit estimate of to avoid corruption by roundoff error. If we repeat this procedure with , we find instead that , . Thus, the degeneracy of the bifurcation at appears to slow the decay rate of the 7th and higher modes, but not enough to affect the behavior at linear order.
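The slope-extraction procedure used above can be sketched with synthetic data; the exponent 3, the coefficient 0.37, and the amplitude range below are illustrative, not values from the paper.

```python
import numpy as np

# synthetic mode amplitudes b(eps) = C * eps**p, with p = 3 and C = 0.37
eps = np.logspace(-7.0, -6.0, 10)      # ten small amplitudes, as in the text
b = 0.37 * eps**3

# the slope of the log-log plot recovers the leading exponent p,
# and the intercept recovers the coefficient C
p, logC = np.polyfit(np.log(eps), np.log(b), 1)
assert abs(p - 3.0) < 1e-8
assert abs(np.exp(logC) - 0.37) < 1e-8
```

With amplitudes this small, roundoff in the mode values dominates the fit unless the underlying solutions are computed in higher precision, which is why the quadruple-precision runs were needed.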

We were surprised to discover that two additional branches also bifurcate from the stationary solution when . For these branches, we find that , where the first several values of are

These numbers were computed as described above, with ranging between and . To get a clean integer for , and , we had to drop down to the range . Using the Aitken-Neville algorithm deuflhard () to extrapolate to , we obtain the leading coefficients for the two branches: