# A Non-Intrusive Parallel-in-Time Adjoint Solver with the XBraid Library

## Abstract

In this paper, an adjoint solver for the multigrid in time software library XBraid [56] is presented. XBraid provides a non-intrusive approach for simulating unsteady dynamics on multiple processors while parallelizing not only in space but also in the time domain. It applies an iterative multigrid reduction in time algorithm to existing spatially parallel classical time propagators and computes the unsteady solution parallel in time. Techniques from Automatic Differentiation are used to develop a consistent discrete adjoint solver which provides sensitivity information of output quantities with respect to design parameter changes. The adjoint code runs backwards through the primal XBraid actions and accumulates gradient information parallel in time. It is highly non-intrusive as existing adjoint time propagators can easily be integrated through the adjoint interface. The adjoint code is validated on advection-dominated flow with periodic upstream boundary condition. It provides similar strong scaling results as the primal XBraid solver and offers great potential for speeding up the overall computational costs for sensitivity analysis using multiple processors.

## 1Introduction

With the rapid increase in computational capacities, computational fluid dynamics (CFD) has nowadays become a powerful tool to predict and analyze unsteady aerodynamics. For given problem specific design parameters (such as geometry, boundary and initial conditions or material coefficients) a numerical CFD simulation approximates the state (e.g. density, velocity or temperature) of a dynamical system that is determined by a set of unsteady partial differential equations (PDEs)

with initial condition and a right hand side that involves spatial derivatives, source terms, etc. Accounting for the unidirectional flow of information in the time domain, an approximation to the unsteady PDE solution is typically evolved forward in time in a step-by-step manner applying a time marching algorithm [4]. Starting from the initial condition, these schemes march forward in discrete time steps applying nonlinear subiterations to approximate a pseudo-steady state at each time step as for example in the dual-time stepping approach [31].

However, in many applications, the primal flow is not the only computation of particular interest. The ability to compute sensitivities is also needed to determine the influence of design changes to some objective function that approximates the time-average of instantaneous quantities of the flow dynamics:

The ability to compute sensitivities can improve and enhance the simulation tool, as for example through parameter estimation for validation and verification purposes [41], error estimation and adaptive grid refinement [55] or uncertainty quantification techniques [3]. Further, it broadens the application range from pure simulation to optimization as for example in an optimal control or shape optimization framework [30].

If the number of independent design parameters is large, the adjoint approach for sensitivity computations is usually preferred since its computational cost does not scale with the design space dimension [45]. In that approach, only one additional adjoint equation needs to be solved in order to set up the gradient of the objective function with respect to . For unsteady time marching schemes, solving the adjoint equation involves a reverse time integration loop that propagates sensitivities backwards through the time domain starting form a terminal condition [39]. Hence evaluating the gradient is a rather computationally expensive task as it involves a forward loop over the time domain to approximate the PDE solution followed by a backwards time marching loop for the adjoint.

One way to mitigate this costly procedure of forward and backward sweeps over the time domain is to parallelize the time dimension. Overall, the need for parallel-in-time methods is being driven by changes in computer architectures, where future speedups will be available through greater concurrency, not faster clock speeds, which are stagnant. Previously, ever faster clock speeds made it possible to speedup fixed-size sequential time stepping problems (strongly scale), and also to refine problems in time and space without increasing the wall-clock time (weakly scale). However with stagnate clock speeds, speedups will only be possible through the increasing concurrency of future architectures. Thus, to overcome this serial time stepping bottleneck, time stepping codes must look for new parallelism from the time dimension. This bottleneck is particularly acute for adjoint solvers because of the many sweeps over the time domain.

Research on parallel-in-time methods goes at least back to the 1964 work by Nievergelt [43]. Since then, a variety of approaches have been developed, including parareal [36], which is perhaps the most popular parallel-in-time method. For a recent literature review and a gentle introduction, see [17]. The method used here is multigrid reduction in time (MGRIT) [11] and is based on multigrid reduction [46]. MGRIT is relatively non-intrusive on existing codes and when restricted to two grid levels, is equivalent to parareal [11]. This multilevel distinction is important, because it allows for optimal parallel communication behavior, as opposed to a two-level scheme, where the size of the coarse-level limits concurrency. This work uses XBraid [56], an open source implementation of MGRIT developed at Lawrence Livermore National Lab.

Previous time parallel approaches to optimization include multiple shooting methods [29], parareal approaches applied to a reduced Hessian [8], and the Schwarz preconditioner approaches in [2]. While these previous approaches were effective, applying XBraid for optimization offers some possible advantages, e.g., XBraid offers multilevel scalability, and the potential for greater parallelism over the Schwarz approaches. The parallelism for the Schwarz approaches is limited by the size of fine-grid “subdomains” which must each be solved sequentially, but for XBraid, the size of such subdomains is determined by the temporal coarsening factor, which can be as small as 2.

When considering the intersection of optimization, CFD, and parallel-in-time, a prerequisite is that parallel-in-time be effective for standard fluid dynamics problems, and recent advances indicate that parallel-in-time can indeed be an effective tool here. In [14], the Parallel Implicit Time-Integration Algorithm (PITA) was proposed, which enjoys success for highly advective problems, targeting in part fluid dynamics [49]. It essentially uses a parareal framework, but creates a massive Krylov space to stabilize and accelerate the method. While effective, the memory usage of this algorithm is discouraging. When considering unmodified parareal, the works [16] show that parareal can be effective for fluids problems when sufficient physical (not artificial) diffusion is present. Similar success with XBraid was shown in [12] for vortex shedding. Lastly, the work [54] showed a path forward for XBraid and purely advective problems that relies only on artificial dissipation.

The XBraid package will be presented in Section Section 2. Section 3 reviews the adjoint approach for computing sensitivities. Then, in Section 4, an approach to modifying XBraid in order to integrate existing adjoint time stepping codes into the XBraid framework is developed. Exploiting techniques from Automatic Differentiation (AD) [27], the primal XBraid iterations are enhanced by an adjoint iteration that runs backwards through the XBraid actions and computes consistent discrete adjoint sensitivities. Similar to the primal XBraid solver, the resulting adjoint solver is non-intrusive in the sense that existing adjoint simulation codes can easily be integrated through the extended user interface. Finally, the adjoint code is validated in Section 5 by applying it to advection-dominated flow with a periodic upstream boundary.

## 2Multigrid Reduction in Time using XBraid

To describe the MGRIT algorithm implemented by XBraid, let be the solution to a time-dependent problem on the interval . Let be a time grid of that interval with spacing , and let be an approximation of .

Classical time marching schemes successively compute based on information at previous time steps and the design . Implicit methods are often prefered due to better stability properties leading to nonlinear equations at each time step that need to be solved iteratively for . Wrapping these iterations into a time stepper , a general one-step time marching scheme can be written as

with given initial condition . Application to multi-step methods which involve more than one previous time step can be found in [10] and are based on a reformulation into a block one-step scheme.

When considering the linear case (only as motivation) with , time stepping is equivalent to a forward sequential solve of the block lower bidiagonal system ,

The MGRIT algorithm replaces the sequential time stepping algorithm with a highly parallel multigrid [52] algorithm. Essentially, a sequence of coarser temporal grids are used to accelerate the solution of the fine grid (Equation 2). The MGRIT algorithm easily extends to nonlinear problems with the Full Approximation Storage (FAS) nonlinear multigrid method [6]. MGRIT solves the discrete space-time system in equation (Equation 1); i.e., MGRIT converges to the same solution produced by sequential time stepping on the finest grid. Lastly, while sequential time stepping and MGRIT are optimal, the constant for MGRIT is higher (an optimal direct sequential method has been replaced with an optimal iterative method). This creates a crossover point where a certain number of resources are required to overcome the extra computational work of MGRIT.

The MGRIT algorithm forms its sequence of coarse time grids from the original fine grid by successively coarsening with factor . When a time grid is coarsened, it is decomposed into two sets called -points (points that will go on to form the next coarser grid) and -points (points that exist only on the fine grid). Figure 1 illustrates an example of this. This decomposition then in turn, induces the relaxation method and coarse-grid correction step, which together form the basis of a multigrid method.

Relaxation is a block Jacobi method that alternates between the - and -points. An F-relaxation updates -point values on interval by propagating the -point value with . Each -interval is independent, and can be computed in parallel. Likewise, C-relaxation uses to update each -point value based on the previous -point value. These updates can also be carried out in parallel. Thus, FCF-relaxation corresponds to three relaxation sweeps over -points, -points and -points.

The coarse time grid problem is constructed by rediscretizing the problem using only the red -points, as depicted in Figure 1. This corresponds to eliminating the -points in equation (Equation 2), followed by the substitution of a cheaper coarse time step operator for the exact fine-grid time step operator. Injection is used to transfer vectors between grids (see [11] for more details). When the algorithm is done recursively, using FAS, a multilevel algorithm capable of solving complex nonlinear problems is produced (see [12]). Standard multigrid cycling strategies, e.g., V-cycles and F-cycles (see [52]), can then be applied.

The FAS nonlinear multigrid method is a general nonlinear solver method, and as such, the solution of the global space-time system is a fixed-point of the MGRIT algorithm. For instance, the paper [54] shows that the error propagator of MGRIT is a contraction for many linear cases. Thus, letting denote the solution to the space-time system after MGRIT iterations, and the operator represent the application of one MGRIT iteration, the overall solution process can be represented as

where is converging to a fixed-point with at all time steps .

### 2.1XBraid Interface

The XBraid interface places a high value on non-intrusiveness and is designed to wrap existing time stepping codes. The goal is for the user to generate only a small amount of new wrapper code that then yields a new parallel-in-time capability. XBraid is written in C, but also has an object oriented C++ interface and a Fortran90 interface. The time parallelism is handled with MPI.

To use XBraid, the user is responsible for defining and implementing the following. See [56] and the associated User’s Manual for more details.

A user (application) structure must be defined and is globally available to the user in all wrapper routines such as . This C-style structure generally holds time

*independent*information such as the time-grid definition, MPI communicators as well as the design variable and the current time-averaged objective function .A state structure must be defined so that it captures a complete solution snapshot at a single time point. This C-style structure generally holds time

*dependent*information, e.g., unknown values and time-dependent spatial mesh information.The key user-written routine is the function which takes as input a state at time step and advances it to time step . This function is called everywhere, i.e., on coarse and fine time-grids for large and small time step sizes. This function wraps the user’s existing time stepping routine and defines , i.e.

The user also writes the output routine . This function is called after each XBraid iteration to access and possibly modify the state vectors and to pass information on the instantaneous quantity of interest to the user . It is called individually for each time step value on the finest time grid.

The user must further define a variety of wrapper functions that provide XBraid with additional information for handling the structure. For instance, and must initialize and clone a , respectively. The wrapper must compute a vector sum of two , i.e., . Other functions must free a structure and take the norm of a , typically an Euclidean norm.

Additionally, the user must define how to pack and unpack an MPI buffer so that can be sent between processors. These and functions are generally straightforward and require the user to flatten a into a buffer, and then unpack that buffer into a .

## 3Adjoint Sensitivity Computation

The presented XBraid solver determines a global space-time solution by iteratively solving the fixed-point equation

for a fixed design . Small changes in the design variable result in a perturbed state which induces a perturbation in the objective function both explicitly from the design itself and implicitly through the solution of fixed-point equation . This change is quantified by the sensitivity (or derivative) of with respect to .

One way to determine the sensitivity numerically is the finite difference approach which approximates the derivative for design perturbations with unity directions by evaluating

for small , where is the space-time solution of the fixed-point equation for the perturbed design parameter. However, a full unsteady simulation would be necessary to compute such that the computational cost for computing the full gradient of , which consists of the derivatives in all unity directions in , grows proportional to the design space dimension .

### 3.1Adjoint sensitivities via Lagrangian formalism

A well-established alternative approach to computing the gradient of with respect to is the adjoint method [35]. In that approach, an additional equation is derived by differentiating and partially with respect to and . Its solution, the so-called adjoint variable, is then used to determine the gradient of for a computational cost that does not scale with the number of design parameters. The adjoint equation can be derived using an augmented function (the so-called Lagrangian function)

which adds a multiple of the fixed-point equation to the objective function with a multiplier . If is a solution to the fixed-point equation, the derivatives of and with respect to coincide and are computed from the chainrule as

The term represents the sensitivity of the flow solution with respect to design changes whose numerical computation is extremely expensive as it would require primal flow simulations in a finite difference setting. The adjoint approach avoids these computations by choosing the multiplier in such a way that terms containing these sensitivities add up to zero: Choosing such that it satisfies the so-called adjoint equation

the gradient of is then given by

according to equation . Instead of solving the fixed-point equation times for each unity direction in the design space, the adjoint approach requires only the solution of one additional (adjoint) equation for the adjoint variable in order to determine the gradient of with respect to .

### 3.2Simultaneous piggyback iteration for the primal and adjoint variable

The adjoint equation is linear in the adjoint variable. Hence, if is contractive, it can be solved with the linear fixed-point iteration

starting from where the right hand side is evaluated at a feasible state that satisfies . However, it is shown in [25] that the adjoint iteration can also be performed simultaneously with the primal iterations in the following *piggyback* approach:

This approach is especially attractive for simultaneous optimization algorithms as for example the One-shot method where small changes to the design are introduced in each piggyback iteration based on the current evaluation of the gradient [24]. For the case considered here of a fixed design , the piggyback iterations converge to the feasible primal and adjoint solution at the same asymptotic convergence rate determined by the contraction rate of . Due to the dependency of the adjoint solution on a feasible state, the adjoint variable is expected to lag a bit behind the primal iterates which has been analyzed in [26]. If the contractivity assumption is slightly violated such that eigenvalues close to or even outside the unit sphere are present, the adjoint iteration can be stabilized using the recursive projection method (RPM), see for example [50].

## 4The adjoint XBraid solver

While the primal XBraid solver provides an iteration for updating the primal state as in , an adjoint XBraid solver has been developed that updates the adjoint variable as in and evaluates the gradient at the current iterates. For current primal and adjoint input variables , it returns

where is the new adjoint iterate and holds the current approximation of the sensitivity of with respect to as in . Since and refer to numerical algorithms, their partial derivatives are computed by adopting techniques from Automatic Differentiation (AD) [27].

AD is a set of techniques that modify the semantics of a computer program in such a way that it not only computes the primal output but also provides sensitivities of the outputs with respect to input variations. It relies on the fact that any computer program for evaluating a numerical function can at runtime be regarded as a concatenation of elemental operations whose derivatives are known. Thus, the derivative of the output can be computed by applying the chain rule to the elemental operations. Two modes are generally distinguished: while the forward mode computes directional derivatives, the reverse mode returns transposed matrix-vector products of the sensitivities which is hence the method of choice for setting up the adjoint iteration.

To be more precise, consider a numerical algorithm for evaluating with input and output which at runtime is a concatenation of elemental operations denoted by such that

Given a vector , the reverse mode computes the sensitivity with

by applying the chain rule to the elemental operations. First the primal output itself is evaluated in a forward loop

with , which yields . Then a reverse loop evaluates and concatenates the local sensitivity of the elemental operations

using the reverse input , which yields the transposed sensitivity product as in .

Following the above methodology, the adjoint XBraid solver is constructed by concatenating local sensitivities of elemental operations that produce the output from the input which yields

Choosing the reverse input vector , this produces the desired sensitivities as in . Since XBraid modifies its input solely by calling the user-defined interface routines as introduced in Section 2.1, these interface routines are identified with the elemental operations . The primal XBraid iteration manages the control flow of these actions and performs a forward loop for computing the primal output as in . The adjoint XBraid solver marches backwards through the same control flow in a reverse loop as in calling the differentiated routines that evaluate the local sensitivities. A Last-In-First-Out (LIFO) data structure, called *action tape* therefore memorizes the control flow of the primal actions . The actions are overloaded in such a way, that they still calculate their primal output but as a side effect, they record themselves, their arguments and a pointer to the corresponding intermediate adjoint on the action tape. After the execution of a primal XBraid iteration, the reverse adjoint loop pops the elements from the action tape and calls the corresponding differentiated action which updates the corresponding intermediate adjoint and the gradient with the correct local sensitivity information. This essentially creates a separate adjoint XBraid solver that reverses through the primal XBraid actions collecting sensitivities. Thus, an equation of the form is solved but going backwards in time.

According to the reverse mode of AD, the local sensitivities are transposed matrix-vector products of the sensitivities of multiplied with the intermediate adjoint vectors. Table 1 lists the original interface routines and their differentiated versions. Note, that the differentiated routines perform updates of the intermediate variables using the “” notation for assigning “” rather then overwriting “”. This is due to an improved implementation of the reverse mode that avoids storing the complete program state during the forward loop but rather stores only the local input variables (see e.g. [27] for an elaborate analysis).

While the primal memory de-/allocation is managed by calling the primal user routines and , memory management of the intermediate adjoint variables is automated by the use of shared pointers with reference counting. The current implementation uses the smart pointer class defined in the C++11 standard library [32]. It counts the number of pointer copies to a specific object and destroys it automatically as soon as the last reference is removed.

Action |
Original |
Differentiated |

MPI_Send() | MPI_Recv() | |

MPI_Recv() | MPI_Send() | |

### 4.1Adjoint XBraid interface

Only two of the user-defined routines are nonlinear, namely and , such that their differentiated versions are non-constant and contain problem specific functions. The adjoint XBraid interface therefore consists only of two additional routines that provide their derivatives:

: This function corresponds to taking one adjoint time step backwards in time. Given a design , a state variable at time step and an adjoint input variable at time step , it updates the adjoint variable at that time step and the gradient according to

updates the adjoint variable at time and the gradient according to the partial derivatives of the instantaneous quantity :

Similar to the primal XBraid solver, the adjoint user interface is non-intrusive to existing adjoint time marching codes as they propagate the same sensitivities of the time stepper and backwards through the time domain in a step-by-step manner. Even though the adjoint XBraid solver has been derived utilizing techniques from AD, the adjoint user interface is not restricted to the use of AD for generating the desired derivatives of and . The adjoint interface rather enables integration of any standard unsteady adjoint solver into a parallel-in-time framework.

## 5Numerical Results

The adjoint XBraid solver is validated on a model problem that mimics unsteady flow behind bluff bodies at low Reynolds numbers. Such flows typically contain two dominant flow regimes: In the near wake, periodic vortices are forming asymmetrically which shed into the far wake where they slowly dissipate, known as the Karman Vortex street. The Van-der-Pol oscillator, a nonlinear limit cycle ODE, is used to model the near wake oscillations [33] while the far wake is modeled by an advection-diffusion equation whose upstream boundary condition is determined by the oscillations:

while the advection term dominates with and a small diffusion parameter is chosen. The Van-der-Pol oscillator determines the upstream boundary with

using the initial condition . The parameter influences the nonlinear damping term and serves as design variable that determines the state through the model equations. The objective function measures the space-time average of the solution

The time domain is discretized into time steps with for while a fixed the final time and varying and are used to set up different problem sizes with and corresponding , respectively. The implicit Crank-Nicolson time marching scheme is chosen to approximate the transient term of the model equations while the spatial derivatives are approximated with a second order linear upwind scheme for the advection and central finite differencing for the diffusive term on the spatial grid with . The nonlinear equations at each time step are solved by applying functional iterations for . These iterations are wrapped into the core user routine that moves a state to the next time step. The user routine then evaluates the instantaneous quantity at time step . The sensitivities and that are needed in the adjoint interface routines and are generated using the AD-Software CoDiPack [7] for differentiating through their corresponding primal routines in reverse mode. The time grid hierarchy for the multigrid iterations in the primal and adjoint XBraid solver use a coarsening factor of with a maximum of three time grid levels. Adding more levels generates coarse grid time step sizes incompatible with the nonlinear time step solver, leading it to diverge. However, even with three levels, a reasonable speedup can be demonstrated. Implementing a more stable nonlinear time step routine, that would allow for more coarsening in time, is future work. The consideration of spatial coarsening is also future work, which would control the ratio on coarser grids, which often has the effect of making the nonlinear time step solver more stable.

Figure 2 shows the residuals of the primal and adjoint iterates, and during a piggyback iteration as in – using a fixed design . As expected, both residuals drop simultaneously while the adjoint iterates exhibit a certain time lag.

The gradient provided by the adjoint XBraid solver is validated in Table 2 which shows good agreement with those computed from finite differences with a relative error below two percent.

finite differences | adjoint sensitivity | rel. error | |||

### 5.1Parallel scaling for the primal and adjoint XBraid solver

A weak scaling study for the primal and the adjoint XBraid solver is shown in Table 3. The reported speedups are computed by dividing the runtime of the primal and adjoint XBraid solver by that of a classical time-serial primal forward and adjoint backward time stepper, respectively.

Number of | Number of | Iterations | Runtime | Runtime | Speedup | |

time steps | Cores | XBraid | XBraid | Serial | ||

60000 | 64 | 5 | 0.49 sec | 1.13 sec | 2.84 | |

120000 | 128 | 4 | 0.50 sec | 2.97 sec | 5.97 | |

240000 | 256 | 4 | 0.76 sec | 4.81 sec | 6.31 | |

Number of | Number of | Iterations | Runtime | Runtime | Speedup | |

time steps | Cores | XBraid | XBraid | Serial | ||

60000 | 64 | 6 | 6.54 sec | 14.49 sec | 2.22 | |

120000 | 128 | 4 | 6.12 sec | 26.27 sec | 4.23 | |

240000 | 256 | 4 | 9.44 sec | 49.18 sec | 5.21 | |

Strong scaling results for primal as well as the adjoint runtimes are visualized in Figure 3. Even though the adjoint runtimes show an overhead factor of when compared to the primal ones, the slope of reduction for the adjoint runtimes closely follows that of the primal slope. This confirms that the adjoint XBraid solver indeed inherits the scaling behavior of the primal solver which is expected from its AD-based derivation. Since the primal XBraid solver is under active development (compare [56]), this property is particularly beneficial as improvements on primal scalability will automatically carry over to the adjoint code. The same data is used in Figure 4 to plot speedup when compared to a time-serial primal and adjoint time marching scheme. It shows the potential of the time-parallel adjoint for speeding up the runtime of existing unsteady adjoint time marching schemes for sensitivity evaluation.

## 6Conclusion

In this paper, we developed an adjoint solver that provides sensitivity computation for the parallel-in-time solver XBraid. XBraid applies nonlinear multigrid iterations to the time domain of unsteady partial differential equations and solves the resulting space-time system parallel-in-time. It operates through a high-level user interface that is non-intrusive to existing serial time marching schemes for solving unsteady PDEs. This property is of particular interest for applications where unsteady simulation tools have already been developed and refined over years, as is often the case for many CFD applications.

While the primal XBraid solver computes a space-time solution of the PDE for given input parameters and also evaluates objective functions that are of interest to the user, the proposed time-parallel adjoint XBraid solver computes derivatives of the objective function with respect to changes in the input parameters. Classical adjoint sensitivity computations for unsteady PDEs involve a forward-in-time sweep to compute the unsteady solution followed by a backwards-in-time loop to collect sensitivities. The parallel-in-time adjoint XBraid solver offers speedup by distributing the backwards-in-time phase onto multiple processors along the time domain. Its implementation is based on techniques of the reverse-mode of AD applied to one primal XBraid iteration. This yields a consistent discrete adjoint code that inherits parallel scaling properties from the primal solver and is non-intrusive to existing adjoint sequential time marching schemes.

The resulting adjoint solver adds two additional user routines to the primal XBraid interface: one for propagating sensitivities of the forward time stepper backwards in time and one for evaluating partial derivatives of the objective function at each time step. In cases where a time-serial unsteady adjoint solver is already available, this backwards time stepping capability can be easily wrapped into the adjoint XBraid interface with little extra coding.

The adjoint solver has been validated and tested on a model problem for advection-dominated flow. The original primal and adjoint time marching codes were limited to one processor such that a linear increase in the number of time steps results in a linear increase in corresponding runtime. This creates a situation analogous to the one where a spatially parallel code has reached its strong scaling limit. The parallel-in-time primal and adjoint XBraid solvers were able to achieve speedups of about (primal) and (adjoint) when compared to the serial ones, running on up to processors for the time parallelization. More importantly, it has been validated that the scaling behavior of the adjoint code is similar to that of the primal one such that improvements on the primal XBraid solver carry over to the adjoint implementation.

### References

**In: 18th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, AIAA Aviation (2017)**

Albring, T., Dick, T., Gauger, N.R.: Assessment of the recursive projection method for the stabilization of discrete adjoint solvers.**Computer Physics Communications****197**, 136–143 (2015)

Barker, A.T., Stoll, M.: Domain decomposition in time for pde-constrained optimization.**Computer methods in applied mechanics and engineering****196**(33), 3190–3218 (2007)

Beyer, H.G., Sendhoff, B.: Robust optimization–a comprehensive survey.**Elsevier Ltd. (2005)**

Blazek, J.: Computational fluid dynamics: principles and applications, 2nd edn.**In: G. Leugering, P. Benner, S. Engell, A. Griewank, H. Harbrecht, M. Hinze, R. Rannacher, S. Ulbrich (eds.) Trends in PDE Constrained Optimization, pp. 43–66. Springer International Publishing (2014)**

Bosse, T., Gauger, N.R., Griewank, A., Günther, S., Schulz, V.: One-shot approaches to design optimzation.**Math. Comp.****31**(138), 333–390 (1977)

Brandt, A.: Multi–level adaptive solutions to boundary–value problems.**http://www.scicomp.uni-kl.de/software/codi/ (2017)**

CoDiPack - Code Differentiation Package (Version 1.0).**ETNA****40**, 36–57 (2013)

Du, X., Sarkis, M., Schaerer, C., Szyld: Inexact and truncated parareal-in-time krylov subspace methods for parabolic optimal control problems.**In: 51st AIAA Aerospace Sciences Meeting Including the New Horizons Forum and Aerospace Exposition, p. 632 (2013)**

Economon, T., Palacios, F., Alonso, J.: Unsteady aerodynamic design on unstructured meshes with sliding interfaces.**SIAM J. Sci. Comput., submitted, LLNL-JRNL-678572 (2015)**

Falgout, R., Friedhoff, S., Kolev, T.V., MacLachlan, S., Schroder, J., Vandewalle, S.: Multigrid methods with space-time concurrency.**SIAM J. Sci. Comput.****36**(6), C635–C661 (2014).

Falgout, R.D., Friedhoff, S., Kolev, T.V., MacLachlan, S.P., Schroder, J.B.: Parallel time integration with multigrid. LLNL-JRNL-645325**Tech. Rep. LLNL-JRNL-663416, Lawrence Livermore National Laboratory (2015)**

Falgout, R.D., Katz, A., Kolev, T.V., Schroder, J.B., Wissink, A., Yang, U.M.: Parallel time integration with multigrid reduction for a compressible fluid dynamics application.**SIAM J. Sci. Comput. (to appear).**

Falgout, R.D., Manteuffel, T.A., O’Neill, B., Schroder, J.B.: Multigrid reduction in time for nonlinear parabolic problems. LLNL-JRNL-692258**Int. J. Numer. Meth. Engng****58**, 1397–1434 (2003)

Farhat, C., Chandesris, M.: Time-decomposed parallel time-integrators: Theory and feasibility studies for uid, structure, and fluid–structure applications.**Springer Berlin Heidelberg (2002)**

Ferziger, J., Peric, M.: Computational Methods for Fluid Dynamics, 3rd edn.**In: Proceedings of the Fifteenth International Conference on Domain Decomposition Methods, pp. 433–440. Springer-Verlag (2005)**

Fischer, P., Hecht, F., Maday, Y.: A parareal in time semi-implicit approximation of the Navier-Stokes equations.**In: T. Carraro, M. Geiger, S. Körkel, R. Rannacher (eds.) Multiple Shooting and Time Domain Decomposition, pp. 69–114. Springer (2015)**

Gander, M.J.: 50 years of time parallel time integration.**In: Domain Decomposition Methods in Computational Science and Engineering XXII,***Lecture Notes in Computational Science and Engineering*, vol. 104, pp. 207–216. Springer-Verlag (2016)

Gander, M.J., Kwok, F.: Schwarz Methods for the Time-Parallel Solution of Parabolic Control Problems.**SIAM J. Sci. Comput.****29**(2), 556–578 (2007)

Gander, M.J., Vandewalle, S.: Analysis of the parareal time-parallel time-integration method.**In: G. Leugering, S. Engell, A. Griewank, M. Hinze, R. Rannacher, V. Schulz, M. Ulbrich, S. Ulbrich (eds.) Constrained Optimization and Optimal Control for Partial Differential Equations, pp. 99–122. Springer Basel (2012)**

Gauger, N., Griewank, A., Hamdi, A., Kratzenstein, C., Özkaya, E., Slawig, T.: Automated extension of fixed point PDE solvers for optimal design with bounded retardation.**In: 13th Computational Fluid Dynamics Conference, p. 1850 (1997)**

Giles, M., Pierce, N., Giles, M., Pierce, N.: Adjoint equations in CFD: Duality, boundary conditions and solution behaviour.**Flow, Turbulence and Combustion****65**(3), 393–415 (2000)

Giles, M.B., Pierce, N.A.: An introduction to the adjoint approach to design.**Acta numerica****11**, 145–236 (2002)

Giles, M.B., Süli, E.: Adjoint methods for PDEs: a posteriori error analysis and postprocessing by duality.**In: G. Pillo, M. Roma (eds.) Large-Scale Nonlinear Optimization, pp. 151–171. Springer (2006)**

Griewank, A.: Projected hessians for preconditioning in one-step one-shot design optimization.**Numer. Algorithms****30**, 113–139 (2002)

Griewank, A., Faure, C.: Reduced functions, gradients and hessians from fixed-point iterations for state equations.**In: Proceedings of CARI’04, 7th African Conference on Research in Computer Science, pp. 295–304 (2004)**

Griewank, A., Ponomarenko, A.: Time-lag in derivative convergence for fixed point iterations.**SIAM (2008)**

Griewank, A., Walther, A.: Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation.**Journal of Computational and Applied Mathematics****294**, 12–22 (2016)

Günther, S., Gauger, N.R., Wang, Q.: Simultaneous single-step one-shot optimization with unsteady pdes.**J. Comput. Appl. Math.****173**(1), 169––198 (2005)

Heinkenschloss, M.: A time-domain decomposition iterative method for the solution of distributed linear quadratic optimal control problems.**Journal of Scientific Computing****3**(3), 233–260 (1988)

Jameson, A.: Aerodynamic design via control theory.**In: Proc. 10th Comp. Fluid Dyn. Conf., Honolulu, USA, June 24-26, AIAA-Paper 91-1596 (1991)**

Jameson, A.: Time dependent calculations using multigrid, with applications to unsteady flows past airfoils and wings.**Addison-Wesley (2012)**

Josuttis, N.M.: The C++ standard library: a tutorial and reference.**Scholarpedia****2**(1), 2202 (2007)

Kanamaru, T.: Van der pol oscillator.**In: Domain Decomposition Methods in Science and Engineering XXI, pp. 647–655. Springer (2014)**

Krause, R., Ruprecht, D.: Hybrid space–time parallel solution of Burgers’ equation.- Lions, J.L.: Optimal control of systems governed by partial differential equations problèmes aux limites (1971)
**C.R.Acad Sci. Paris Sér. I Math****332**, 661–668 (2001)

Lions, J.L., Maday, Y., Turinici, G.: Résolution d’EDP par un schéma en temps pararéel.**In: 46th AIAA Aerospace Sciences Meeting and Exhibit, p. 727 (2008)**

Mavriplis, D.: Solution of the unsteady discrete adjoint for three-dimensional problems on dynamically deforming unstructured meshes.**Oxford University Press (2010)**

Mohammadi, B., Pironneau, O.: Applied shape optimization for fluids.**AIAA journal****45**(7), 1478–1491 (2007)

Nadarajah, S.K., Jameson, A.: Optimum shape design for unsteady flows with time-accurate continuous and discrete adjoint method.**Software, Environments, and Tools. Society for Industrial and Applied Mathematics (2012)**

Naumann, U.: The Art of Differentiating Computer Programs: An Introduction to Algorithmic Differentiation.**Dynamics of Atmospheres and Oceans****27**(1), 55 – 79 (1998)

Navon, I.: Practical and theoretical aspects of adjoint parameter estimation and identifiability in meteorology and oceanography.**AIAA journal****48**(6), 1195–1206 (2010)

Nielsen, E.J., Diskin, B., Yamaleev, N.K.: Discrete adjoint-based design optimization of unsteady turbulent flows on dynamic unstructured grids.**Comm. ACM****7**, 731–733 (1964)

Nievergelt, J.: Parallel methods for integrating ordinary differential equations.**SIAM review****42**(2), 247–264 (2000)

Pierce, N.A., Giles, M.B.: Adjoint recovery of superconvergent functionals from PDE approximations.**Journal of Fluid Mechanics****64**(1), 97–110 (1974)

Pironneau, O.: On optimum design in fluid mechanics.**Tech. Rep. Preprint 277 SFB 72, Universität Bonn (1979)**

Ries, M., Trottenberg, U.: MGR-ein blitzschneller elliptischer löser.**Linear Algebra Appl.****49**, 1–26 (1983)

Ries, M., Trottenberg, U., Winter, G.: A note on MGR methods.**In: 45th AIAA Aerospace Sciences Meeting and Exhibit, p. 1128 (2007)**

Rumpfkeil, M., Zingg, D.: A general framework for the optimal control of unsteady flows with applications.**Computers & Fluids****59**, 72–83 (2012)

Ruprecht, D., Krause, R.: Explicit parallel-in-time integration of a linear acoustic-advection system.**SIAM Journal on numerical analysis****30**(4), 1099–1120 (1993)

Shroff, G.M., Keller, H.B.: Stabilization of unstable procedures: the recursive projection method.**In: A. Abdulle, S. Deparis, D. Kressner, F. Nobile, M. Picasso (eds.) Numerical Mathematics and Advanced Applications - ENUMATH 2013: Proceedings of ENUMATH 2013, the 10th European Conference on Numerical Mathematics and Advanced Applications, Lausanne, August 2013, pp. 195–202. Springer International Publishing, Cham (2015)**

Steiner, J., Ruprecht, D., Speck, R., Krause, R.: Convergence of parareal for the Navier-Stokes equations depending on the Reynolds number.**Academic Press, San Diego (2001)**

Trottenberg, U., Oosterlee, C., Sch uller, A.: Multigrid.**Springer Science & Business Media (2014)**

Tucker, P.: Unsteady computational fluid dynamics in aeronautics.**Copper Mountain Special Section, SIAM J. Sci. Comput. (submitted) (2016).**

V. Dobrev Tz. Kolev, N.P., Schroder, J.: Two-level convergence theory for multigrid reduction in time (MGRIT). LLNL-JRNL-692418**Journal of Computational Physics****176**(1), 40–69 (2002)

Venditti, D.A., Darmofal, D.L.: Grid adaptation for functional outputs: application to two-dimensional inviscid flows.**http://llnl.gov/casc/xbraid**

XBraid: Parallel multigrid in time.