Comparison of adaptive multiresolution and adaptive mesh refinement applied to simulations of the compressible Euler equationsAn early version of this paper has appeared previously in ESAIM Proceedings, 16:181–194, 2009 [21].

Comparison of adaptive multiresolution and adaptive mesh refinement applied to simulations of the compressible Euler equations1

Abstract

We present a detailed comparison between two adaptive numerical approaches to solve partial differential equations (PDEs), adaptive multiresolution (MR) and adaptive mesh refinement (AMR). Both discretizations are based on finite volumes in space with second order shock-capturing, and explicit time integration either with or without local time-stepping. The two methods are benchmarked for the compressible Euler equations in Cartesian geometry. As test cases a 2D Riemann problem, Lax-Liu , and a 3D ellipsoidally expanding shock wave have been chosen. We compare and assess their computational efficiency in terms of CPU time and memory requirements. We evaluate the accuracy by comparing the results of the adaptive computations with those obtained with the corresponding FV scheme using a regular fine mesh. We find that both approaches yield similar trends for CPU time compression for increasing number of refinement levels. MR exhibits more efficient memory compression than AMR and shows slightly enhanced convergence; however, a larger absolute overhead is measured for the tested codes.

1Introduction

Adaptive discretization methods for solving nonlinear PDEs have a long tradition in scientific computing, see e.g., [9]. They are motivated by the computational complexity of real world problems which often involve a multitude of active spatial and temporal scales. Adaptivity can be understood in the sense that the computational effort is concentrated at locations and time instants where it is necessary to ensure a given numerical accuracy, while efforts may be significantly reduced elsewhere. Typical applications are combustion problems with thin chemical reaction zones, fluid and plasma turbulence showing self-organization into coherent vortices, and more generally, most kinds of interface and boundary layer type problems.

One of the essential ingredients of fully adaptive schemes is a reliable error estimator for the solution. It can be provided by Richardson extrapolation, auxiliary solutions of adjoint problems [2], gradient based approaches or wavelet coefficients [49]. However, there is a price for adaptivity, since the computational cost per unknown may increase significantly. Hence, an adaptive method can only be efficient when the data compression is sufficiently large to compensate the additional computational cost, and this is problem dependent. To perform efficient adaptive simulations there is an effort that has to be made to design algorithms and data structures, the latter are usually based on graded trees, hash-tables, linked lists or multi-domains.

In the present paper, adaptive computations of 2D and 3D compressible Euler equations are presented. The goal is to compare, for the same prescribed accuracy, the efficiency in terms of CPU time and memory compression of two approaches: the adaptive multiresolution (MR) method and the adaptive mesh refinement (AMR) method. We consider global time stepping and scale-dependent local time stepping. Both methods are based on explicit finite volume (FV) discretizations on adaptive meshes, with schemes in space and time of second order accuracy. Our interest in comparing MR and AMR methods in the current paper can be seen as a step towards detailed benchmarking, which has been started with a preliminary 2D case study in [21]. Beside considering 3D results, we use an improved version of the MR Carmen code where the memory allocation has been changed and the computational efficiency of the underlying FV scheme has been increased. A detailed analysis of CPU time and memory compression including error assessment allows for a sound evaluation of the MR and AMR methods.

The block-structured AMR technique for hyperbolic PDEs has been pioneered by Berger and Oliger [6]. While the first approach utilized rotated refinement meshes, AMR denotes today especially the variant of Berger and Collela [4] that allows only refinement patches aligned to the coarse mesh. The efficiency of this algorithm, in particular for instationary supersonic gas dynamical problems, has been demonstrated in Bell et al. [3]. Several implementations of the AMR method for single processor computers [5] and parallel systems [3] have been presented; variants utilizing simplified data structures have also been proposed [41]. For an overview of the AMR method and its implementation we refer to [20].

Multiresolution techniques have been developed after AMR, and became popular for hyperbolic conservation laws going back to the seminal work of Harten [30] in the context of FV schemes and cell-average MR analysis. Starting point is an FV scheme for hyperbolic conservation laws on a regular mesh. Subsequently, a discrete multiresolution analysis is used to avoid expensive flux computations in smooth regions, first without reducing memory requirements, e.g., for 1D hyperbolic conservation laws [30], 2D hyperbolic conservation laws [8], 2D compressible Euler equations [11], 2D hyperbolic conservation laws with curvilinear patches [15] and unstructured meshes [1]. A fully adaptive version, still in the context of 1D and 2D hyperbolic conservation laws, has been developed to reduce also memory requirements [29]. This algorithm has been extended to 3D and to parabolic PDEs [49], and more recently to self-adaptive global and local time-stepping by Müller and Stiriba and ourselves in [43]. Therewith the solution is represented and computed on a dynamically evolving automatically adapted mesh. Different strategies have been proposed to evaluate the flux without requiring full knowledge of fine mesh cell-average values. The MR approach has also been used in other discretization contexts. For instance, the Sparse Point Representation (SPR) method is the first fully adaptive MR scheme, introduced in [32] in the context of finite differences and point-value MR analysis, leading to both CPU time and memory reduction. This approach has been explored in applications of the SPR method in [23], and more recently in [46] for parallel implementation of the MR method for FV discretizations. For comprehensive literature about the subject, we refer to the books [12] and our review papers [50].

The outline of the paper is the following: first, we sketch the set of compressible Euler equations together with their finite volume discretization in space and an explicit scheme in time. Then, we briefly summarize the MR and AMR strategies. In the main part, we show and discuss the numerical results for two test problems, one in 2D and one in 3D. Final remarks and conclusions based on the performed computations are drawn in Section 5.

2Numerical methods

For the study of the present paper we consider FV discretizations of the 3D compressible Euler equations given in the conservation form

with , where is the density, is the velocity vector with components , and is the energy per unit of mass. All variables are functions of time and position . The flux function is given by

where the pressure satisfies the ideal gas constitutive relation

where is the temperature, the specific heat ratio and a specific gas constant. In the 2D formulation, the components of the vectors vanish and the variables only depend on and .

As reference discretization for equations in the conservation form (Equation 1), we consider the numerical solution represented by the vector of the approximated cell averages

on cells of a uniform mesh of the computation domain . For space discretization, a FV method is chosen, which results in a system of ordinary differential equations of the form

where denotes the vector of numerical flux function differences with respect to each cell. In all numerical schemes throughout this paper enhanced AUSM-type numerical flux functions with comparable second order accurate reconstruction and flux limiting are used. For all MR simulations, including the corresponding reference FV solution, a second order MUSCL scheme with an AUSM+ flux vector splitting scheme [40] and van Albada limiter is applied. In all AMR computations, also including the corresponding reference FV solution, a standard unsplit shock-capturing MUSCL scheme with Minmod limiter and AUSMDV flux-vector splitting is adopted [53].

For time integration, approximate solutions at a sequence of time instants are obtained using explicit ordinary differential equation solvers. Here, an explicit second order Runge-Kutta (RK2) scheme is used for MR, and the MUSCL-Hancock approach [17] is used for AMR. While both approaches correspond to using a second order accurate explicit midpoint rule for temporal integration, the key difference of the MUSCL-Hancock method is that it utilizes the exact flux function in the intermediate time step instead of the AUSM-type numerical flux. This makes this method slightly less accurate but computationally cheaper.

In summary, six numerical discretizations are considered. For both MR and AMR, FV reference solutions are computed using the corresponding numerical scheme. In addition, we perform in both cases either global or local time stepping. In the following subsections we briefly describe first the MR and then the AMR method.

The adaptive MR scheme belongs to a class of adaptive hybrid methods which are formed by two basic parts: the operational part and the representation part. The operational part consists of an accurate and stable discretization of the partial differential operators. In the representation part, wavelet tools are employed for the MR analysis of the discrete information.

The principle in MR methods is the transformation of the cell averages given by the FV discretization into a multiscale representation. A hierarchy of nested meshes endowed with projection and prediction operators to perform the inter-level transformations are the main building blocks [30]. The numerical solution at the highest resolution level is transformed into a set of coarser scale approximations plus a series of prediction errors corresponding to wavelet coefficients. These coefficients describe the difference between subsequent resolutions. The main idea is then to use the decay of the wavelet coefficients to estimate the local regularity of the solution [12]. In regions where the solution is smooth these coefficients are small, while they have large magnitude in regions of steep gradients or discontinuities. An adaptive MR representation of the numerical solution can then be obtained by a compact multiscale representation which is constructed by removing the wavelet coefficients whose magnitude is smaller than a chosen threshold [49].

A natural way to store the compact MR representation is to use a tree data structure. Locally refined adaptive meshes create incomplete trees, cf. Fig. ?. The adaptive computations are performed on the leaves of the tree, defined as nodes without children, where the cell averages are stored. In this procedure, gradedness of the tree is imposed, i.e., only one level of difference is permitted between subsequent neighbors. For the complete time evolution cycle, three basic operations are considered:

Refinement: The solution may change in time, the adaptive MR mesh at may not be sufficient at the next time step. Thus, a preventive action is necessary to account for possible translation or creation of finer structures in the solution between subsequent time steps. Before performing the time evolution, the solution is interpolated onto an extended mesh. For this the adaptive mesh at is refined by one level while maintaining the gradedness.

Time evolution: First, to ensure conservation of the flux computations between different levels it is necessary to add virtual leaves, as illustrated in Figure ?, right. The time evolution operation is applied only on the leaves of the extended mesh, but not on the virtual ones [49]. There is a possibility to save further CPU time by evolving the solution with level dependent time steps, instead of a global time step. This MRLT scheme with a given time step at the finest level , evolves the cells at coarser levels with larger time steps . Required missing values are interpolated in time at intermediate time steps.

Coarsening: The regions of smoothness of the solution can change after the time evolution. Hence, the adaptive MR mesh must be checked if a coarser mesh is sufficient for an accurate representation of the computed solution. A multiresolution analysis is performed and the thresholding of the wavelet coefficients determines the cells where the mesh can be coarsened.

The cell-average MR analysis used in the present paper corresponds to a third order prediction operator based on quadratic polynomial interpolation of the cell averages. The adaptive MR scheme and its MRLT version, cf. [24], are implemented in the code Carmen2 originally developed by Roussel [49]. Here an optimized version of the Carmen code is used where the runtime has been improved for both the MR and the FV method. The implementation is in C throughout and consists of approximately lines of code (LOC) in total.

The AMR method [6] follows a patch-oriented refinement approach, where non-overlapping rectangular submeshes define the domain of an entire level . As the construction of refinement proceeds recursively, a hierarchy of submeshes successively contained within the next coarser level domain is created, as illustrated in Figure ?. Note that the recursive nature of the algorithm allows only the addition of one new level in each refinement operation. The patch-based approach does not require special coarsening operations; submeshes are simply removed from the hierarchy. The coarsest possible resolution is thereby restricted to the level 0 mesh. Usually, it is assumed that all mesh widths on level are -times finer than on the level , i.e., and , with for and , which ensures that a time-explicit FV scheme remains stable under a CFL-type condition on all levels of the hierarchy.

The numerical update is applied on the level by calling a single-mesh routine implementing the FV scheme in a loop over all the submeshes . The regularity of the input data allows a straightforward implementation of the scheme and further permits optimization to take advantage of high-level caches, pipelining, etc. New refinement meshes are initialized by interpolating the vector of conservative quantities from the next coarser level. However, data in cells already refined is copied directly from the previous refinement patches. Ghost (also know as halo) cells around each patch are used to decouple the submeshes computationally. Ghost cells outside of the root domain are used to implement physical boundary conditions. Ghost cells in have a unique interior cell analogue and are set by copying the data value from the patch where the interior cell is contained (synchronization). For , internal boundaries can also be used. If recursive time step refinement is employed, ghost cells at the internal refinement boundaries on the level are set by time-space interpolation from the two previously calculated time steps of level . Otherwise, spatial interpolation from the level is sufficient.

The characteristic of the AMR algorithm is that refinement patches overlay coarser mesh data structures, instead of being embedded, again avoiding data fragmentation. Values of cells covered by finer submeshes are subsequently overwritten by averaged fine mesh values, which, in general, would lead to a loss of conservation on the coarser mesh. A remedy to this problem is to replace the coarse mesh numerical fluxes at refinement boundaries with the sum of fine mesh fluxes along the corresponding coarse cell boundary. Details about this flux correction can be found in [4]. The basic recursive AMR algorithm is formulated in Fig. ? (left). New refinement meshes on all the higher levels are created by calling Remesh() at a given level . The level by itself is not modified. To consider the nesting of the level domains already in the mesh generation, Remesh() starts at the highest level to be refined and proceeds down to . After evaluating the refinement indicators and flagging cells for refinement, a special clustering algorithm [3] is used to create new refinement patches until the ratio between flagged and all cells in every new submesh is above a prescribed threshold .

In the present paper, all the AMR computations have been carried out using the AMROC (Adaptive Mesh Refinement in Object-oriented C) framework [16]3. At the present time, the AMR core of AMROC consists of approximately LOC in C and approximately LOC for visualization and data conversion. Similarly to the AMR inter-level transfer operations (interpolation, averaging), the employed FV update routine is coded in Fortran-77 and all the codes are compiled with standard compiler optimizations (-O3 with loop unrolling, inlining, etc.) using the GNU compiler suite on the benchmark system with Intel-i7- quad-core processor. Although AMROC permits large-scale MPI-parallel AMR computations [19], the present investigation uses only the serial algorithm of the software.

The original recursively adaptive AMR method with time step refinement is denoted here as AMRLT. In order to provide a comparison to the MR method, we also employ the same implementation without local time stepping under the name AMR. Note that the performance of this variant could be improved by skipping some of the adaptation steps provided that refinement coverage is correspondingly enlarged. Yet, such optimizations have not been investigated here. As refinement indicators, scaled gradient criteria of the form

with , are applied to density and pressure ( and ). As it is common practice [4], a layer of one additional cell width is also tagged for the refinement around each refinement flag to ensure that the flagged feature does not leave the refinement region during the next time step. Furthermore, AMROC allows for the additional application of a heuristic local error indicator based on a Richardson estimation [4]. For the test problems of Section 4, however, scaled gradient and Richardson error estimation criteria were found to give virtually identical mesh refinement and the benchmarked computations only utilized the former.

3Error assessment

The goal of the adaptive MR and AMR computations is to obtain the solution with a significant gain in CPU time and memory while preserving the accuracy of the corresponding FV scheme on the regular finest mesh. To quantify the accuracy of the adaptive simulations the discrete -error is computed. For that a reference FV solution on a finer mesh restricted onto the current uniform mesh level is used.

For the MR methods, with or without local time stepping ( or MRLT), the adaptive solution is recursively projected up to the desired finest uniform mesh of level with a step size . The goal is to obtain using third order cell-average interpolation. The discrete error is then evaluated on the domain as

where denotes the projection of the reference solution down to the desired level . For the two AMR cases, considering again either global or local time stepping, the error is evaluated as the sum of the -error computed on the domain without higher refinement, i.e.,

where

denotes the -norm on the domain . The projection of the reference solution from the finer uniform mesh down to the desired mesh with step size is denoted by . The index stands for either AMR or AMRLT.

To evaluate the performance of the adaptive codes, i.e., CPU, memory and mesh compression rates, including accuracy perturbation, and overhead, different measures are introduced. The CPU time compression rate is defined as the ratio between the CPU time required to compute the solution at the final instant using the adaptive method and the one required to compute the same solution using the fine-mesh FV method

The ratio of the average memory requirement and the number of cells of the finest uniform mesh defines the memory compression rate in the adaptive computations,

where is the number of performed time steps, and denotes the sum of cells of the entire hierarchy at . Similarly, mesh compression can be defined as the ratio of the average leaf requirement (i.e., the average number of required degrees of freedom to represent the numerical solution) and the number of cells of the finest uniform mesh

In the adaptive codes, typically a perturbation error is introduced. This results in larger errors of the adaptive solution compared to the FV solution on the finest uniform mesh. Suitable thresholding guarantees that this perturbation does indeed not deteriorate the convergence order of the underlying FV scheme. The accuracy perturbation is measured by the relative difference between the error given by the reference FV scheme and the error introduced by the adaptive method,

Evolving the solution in each cell of the computational domain using adaptive computations is expected to be more expensive in terms of CPU time than using the reference scheme on the uniform mesh. Hence, an essential question is to know whether the reduction in the number of degrees of freedom required to represent the adaptive numerical solution compensates the additional computational overhead induced by the adaptive algorithm. The overhead of the adaptive computations can be evaluated by

which denotes the average CPU time per leaf spent to evolve the solution in the computational domain. Therefore, the ratio , where denotes the CPU time per leaf on the regular mesh, should be greater than one. Consequently, the overhead per iteration and per leaf of the adaptive computation is defined by

4Numerical results

In the following, the results of the different MR and AMR simulations are presented and compared with corresponding FV reference computations considering two Riemann problems, one in two and one in three space dimensions.

4.12D Riemann problem: Lax-Liu configuration #6

First we consider a classical Riemann problem for gas dynamics proposed in [39] known as Lax-Liu benchmark #. This configuration is initially discussed in [55]. As this is a 2D problem, holds true, the remaining vector components only depend on and , and Eq. () reduces to . The computational domain is the square with outflow boundary conditions. The domain is divided into four quadrants, where the initial data are set constant in each quadrant, according to the values given in Table ?. The simulations are performed until the final time . This benchmark only involves contact discontinuities and generates swirling motion in the clockwise direction.

Figure 1, left shows the reference solution computed with the FV scheme of the AMR method on a regular mesh with grid points which is used to evaluate the errors of the AMR and MR computations.4 Figures Figure 1, right and Figure 3, right illustrate, respectively, the adaptive solutions obtained with MR and AMR at the final time , using levels. The corresponding adaptive meshes are also shown in Figure 3. We observe that in the MR case the mesh is sparser and better adapted to the solution compared to the AMR case.

The underlying FV schemes of MR and AMR are benchmarked on uniform fine meshes with FV cells, where , (corresponding to the scale levels ). To reach the final instant , time steps are performed, where , respectively. Note that in all adaptive simulations time steps are performed as well. In both cases we observe convergence of the FV schemes towards the reference solution with a rate of about one, as shown in Table 1. The FV scheme of MR yields slightly smaller errors with slightly higher convergence rates. In the MR and MRLT computations we studied the influence of the wavelet threshold, and considered different values , 0.0010, which were fixed for all levels. For the MR and MRLT schemes, decreasing the threshold parameter has the effect of improving the accuracy to some extent, but mesh compression deteriorates for very small . In Table 1 only results for are presented. Similarly to the FV case, we observe decaying errors for increasing with the same rates of about one. This shows that the wavelet thresholding well preserves the order of the underlying FV scheme.

The AMR and AMRLT computations use a base mesh of 64 cells and a refinement factor of 2 at all levels. The full block-structured AMR algorithm is used here, including conservative correction at refinement boundaries and hierarchical time stepping. Refinement is based on the scaled gradient of the density with threshold . Again, we observe, similar to the FV case, decaying errors for increasing with rates slightly below one. The difference in the absolute errors between the MR/MRLT and AMR/AMRLT simulations is due to the use of slightly different FV schemes. In particularly, the FV errors of the AMR/AMRLT code are slightly larger, which is consistent with the employed second order method, as discussed in Section 2. However, the small perturbation rates of Table 1 as well as the similar memory and mesh compression rates (differing only ) of Table 2 indicate that both methods perform a comparable adaptation. It can be noted that the MR/MRLT compression rates decrease faster providing evidence for a more sophisticated adaptation criterion.

Table 3 presents the absolute computing time of both codes and the corresponding CPU time compression rates. Comparing the absolute CPU times already for the FV unigrid cases, it is important to point out that the two benchmarked implementations have vastly different absolute computational performance. When running in FV mode, the MR/MRLT code is about a factor of slower than the AMR/AMRLT code and an accordingly larger absolute computing time is consequently also measured in the adaptive simulations. In order to allow nevertheless a comparison, and to assess both mathematical approaches independent of mere implementation aspects as well as minor differences in the numerical discretizations applied, we employ the CPU time compression rate. For the highest resolved MR/MRLT simulations the CPU time compression is slightly better than for AMRLT, which is consistent with the better mesh compression rates seen in Table 2, and thus significantly better than for AMR. Contrary to the AMRLT computations, the present MRLT scheme does not improve the results significantly with respect to MR. This issue of local time stepping in MR is also discussed in [24]. MRLT is most beneficial for unbalanced trees, i.e., localization of small scale features of the solution, expensive flux evaluation, and larger number of well-localized active scales. Therefore, for few active cells on fine scales (e.g., point singularities) the speed-up becomes larger compared to the MR scheme with global time stepping. Similar results are also found by Müller and Stiriba [43] for their MRLT computations for space dimensions larger than one.

As predicted by an analytic cost estimate in [24], we found that the actual speed-up of local time stepping depends on the distribution of the active cells. If the majority of the cells is active on fine scales, the MRLT scheme is less efficient with respect to the MR scheme, whereas for few active cells on fine scales the speed-up becomes larger. In these cases, performing one large time step at a coarse level instead of several time steps on fine scale cells becomes indeed more efficient.

When analyzing overhead rates according to Eq. (), one notices that the relative mesh adaptation overhead rises in all approaches when increasing the number of levels . The respective variants without local time stepping exhibit a steeper increase than the LT methods as coarser grid cells are updated significantly more often. For the highest refined computations with , the overhead is for AMR and for AMRLT. When the overhead is calculated based on the performance of the MR code run in FV unigrid mode, the corresponding overhead rates for MR and MRLT are only and , respectively. Measured in relative terms, the MR/MRLT algorithms are more efficient than AMR/AMRLT, which is due to the unnecessary update of respective coarser cells independent of higher level coverage. However, if the performance numbers from the AMR code run in FV unigrid mode would be used for the sake of direct comparison, the overhead rates for MR and MRLT would jump to and , respectively. This underscores that exactly identical numerical discretizations and implementations in the same programming language of comparable quality and level of optimization are a prerequisite for meaningful head to head benchmarking of adaptive solution methods. Yet, due to the complexity of the involved codes, these requirements can generally never be met in practice, which explains the lack of published quantitative comparisons in this field and provides evidence for the benefit of our approach using primarily relative performance measures.

In order to better understand nevertheless the absolute performance of the used computer codes, a breakdown of the CPU time of the highest resolved adaptive computations at is provided in Table 4. This benchmark analysis has been obtained by using the perf tool. The tabulated items in the first group list major tasks already present in the respective FV implementations. The second group lists algorithmic tasks that have to be carried out in addition in the adaptive programs as they could be distinguished and unambiguously classified from within the performance analysis tool. The last group lists the expense of system calls for dynamic memory management and the accumulation of operations which have too insignificant cost for detailed classification. It can be seen from Table 4 that temporary data creation, handling and deletion are a considerable portion of the FV-MR code. This code is a cell-oriented implementation of a Cartesian unstructured grid, while the FV method in the AMR code is implemented for a generic structured block and allocates all necessary temporary data in advance. In the adaptive case, creation and organization of the hierarchical data becomes a visible portion of the computing time for both approaches. It can be seen that particularly orchestration of the unstructured quad-tree in the MR/MRLT approach, cf. Fig. ?, is rather expensive. In principle, organizing the more general mesh refinement tree in the AMR/AMRLT case is even more involved; however, thanks to clustering individual cells into blocks of considerable size the number of leaves of this general tree, cf. Fig. ?, is magnitudes smaller and the overhead accordingly reduced. A non-negligible portion of the AMR approach is the clustering operation, which generates rectangular blocks from ensembles of individual cells. It is obvious that in the variant without local time step refinement (AMR) the adaptation overhead is increased, since in the runs reported here complete mesh adaptation is allowed in every time step.

4.23D ellipsoidally expanding shock wave

For the next test problem, we consider the expansion of an ellipsoidal shock wave in three space dimensions. The 3D Euler equations are solved in the computational domain until the final simulation time . Outflow boundary conditions are applied at all sides of the domain. The initial ellipsoid is specified by

where with stretching and rotational parameters , , , , , and . Initial conditions in density and energy density are set as

while the velocity vector is initially zero, i.e., everywhere. For convergence analysis, we consider a reference solution computed with AUSMDV numerical flux and second order MUSCL-Hancock reconstruction with Minmod-limiter in the primitive variables , , . The resolution is cells on a uniform mesh, where automatic time step adjustment based on is used.5

The AMR and AMRLT computations use a base mesh of cells and a refinement factor at all levels. As in the 2D example, the entire block-structured AMR algorithm is applied, including a conservative correction at the refinement boundaries and hierarchical time stepping. The refinement is based on scaled gradients of density and pressure with the thresholds . The used mesh generation efficiency is , meaning that each patch can contain up to of cells not flagged for refinement. For the MR and MRLT computations, the threshold parameter is applied at all resolution levels.

The Figs. ? and ? assemble 2D cuts normal to the coordinate directions at the final time . The left image of Fig. ? shows the plane in the origin in the view direction , the middle one displays the plane in the origin in the view direction , and the right image shows the plane through the origin in the view direction . In Fig. ?, we compare the FV reference computation obtained with and down-sampled to with the MR computations for . Shown are only isolines of density ( between and at intervals ). Figure ? displays the 2D cuts and the corresponding adaptive meshes. Inspecting these graphics, we find that both the MR and the AMRLT adaptive computation agree well with the down-sampled FV reference solution. As expected, both adaptive grids concentrate cells in regions of steep gradients present in the solution. We also observe that the MR mesh is better adapted to the solution than the block-structured AMR mesh, i.e., the MR mesh is sparser. Similar to the previous section, we perform a detailed analysis of the computations. The underlying FV schemes of the adaptive methods are benchmarked on uniform meshes first with cells, where , which corresponds to the levels . To reach the final time instant , time steps are performed, where , respectively. The number of time steps is the same in all adaptive simulations.

The accuracy of the different schemes is analyzed in Table 5 by considering the error of density with respect to the reference solution, down-sampled to the corresponding level. The behavior of the errors is very similar to the previous 2D study. For all cases, the error decreases as the number of levels is increased. Since again slightly different FV schemes are employed (cf. Section 2), the MR/MRLT computations and their baseline FV scheme exhibit smaller absolute errors than the AMR/AMRLT methods and their respective FV scheme. For this configuration, all methods show a convergence order being slightly above one at the finest level. We also observe that the adaptive computations with all four methods preserve the convergence order of the underlying FV scheme particularly well and even better than in the 2D investigation, which can be seen in Table 5 reflected in the very small error perturbation rates.

Table 6 gives the total cell and leaf counts over the time steps and the corresponding memory and mesh compression rates. To keep the computational effort for the 3D case reasonable, the total numbers of used cells and leafs at the finest level is comparable to the 2D values of Table 2 at the respective finest level . Comparing the cell and leaf counts in Table 6 with Table 2, one finds that cell and leaf count increase roughly by a factor of 8 to 10 for each additional level in the 3D case, while factors to are found in the adaptive 2D simulations. This is consistent with an isotropic refinement by a factor of in the third dimension. Since the number of integrated cells in the FV simulations rises by the same factor, memory and mesh compression rates decrease with comparable factors in both the 2D and the 3D configuration. However, in the 3D case the convergence is enhanced, with generally smaller compression ratios than in 2D at all levels except the coarsest one. Again, it can be noted that for MR/MRLT the compression rates decrease slightly faster than for the AMR schemes but the difference is generally smaller than in the 2D case.

Denoting by the spatial dimension, the number of integrated cells in the FV simulations rises exactly by the factor at the next finer level. The absolute CPU times of Table 7 exhibit the expected factor of increase for both FV schemes at all levels. In FV mode, the 3D MR/MRLT code is roughly a factor of slower than the 3D AMR/AMRLT code, which is an improvement compared to the 2D case. As in 2D, MR and MRLT methods show very little difference in run time, where the inefficiency of the AMR method without local time step refinement, already observed in 2D, is actually clearly reduced in the 3D case. This behavior can be explained by the fact that the benefit of time step refinement can be expected to remain constant while the overall workload increases by a factor of with each additional spatial dimension. The overhead after Eq. () for the AMR and AMRLT computations at is and , respectively. For the MR and MRLT runs at this level, the overhead rate is and when being computed based on the performance of the MR code in FV unigrid code, and and when the AMR code in unigrid mode would be used as a somewhat questionable FV reference. See Section for a discussion. Comparing the results with the 2D case of the previous section, one finds that the relative mesh adaptation overhead of the AMR/AMRLT computations rises considerably less than for MR/MRLT. A consequence of this behavior is that the overhead in terms of the leaf updates is now consistently smaller for the AMRLT case than for MR but particularly also for MRLT. This result is different than in Section 4.1. However, for the adaption overhead of the AMR method without local time stepping is still larger than the overhead of MR or MRLT.

An explanation for the enhanced performance of the AMRLT method might be found in Table 8 that shows again a breakdown of the finest resolved adaptive computation as obtained by the perf tool. The breakdown in major task groups is the same as in Table 4. Although the unassigned portion has substantially risen in the MR/MRLT cases, the time spent in the core FV update routine has shrunk slightly while for AMR and AMRLT this portion has risen by about 5 and , respectively. Further, the portion of the clustering algorithm has been reduced considerably in the 3D case. An explanation for this behavior might be that the flow phenomenon to be refined is more localized in the 3D simulations and meshes are coarser.

4.3Software design aspects

The observed differences in performance warrant a deeper discussion of the internal data structures and algorithmic solutions adopted in the computer codes which were employed for this study.

The Carmen MR/MRLT code follows an object-oriented design whose base class is aCell that stores the vector of state for a single FV cell. When the code is operating in unigrid mode, a consecutive array ofCell elements is allocated; in MR/MRLT mode eachCell is a member of an object of typeNode. In order to incorporate eachNode object into the graded tree structure,Node has a pointer to its parent and possibly an array with pointers to the children nodes. Depending on the dynamic mesh evolution,Node objects are dynamically allocated or deleted individually, using the standardC commandsnew anddelete. The children array remains unallocated for leaf nodes. When the numerical update is performed in Carmen, the nodes of the graded tree are recursively visited starting from a single root node. If the node is a leaf, numerical fluxes are evaluated by querying the neighbor nodes for their cell-wise vector of state. All numerical fluxes along the cell boundary are then computed and their sum stored inCell. This cell-oriented approach minimizes the memory footprint but requires two evaluations of the computationally expensive numerical fluxes per facet, even when the code is run in FV unigrid mode. Since the graded tree is genuinely unstructured and the data are not stored consecutively in memory, the necessary tree traversal is computationally very costly as can be seen in Tables and (MR tree organization). Since only recursive tree traversal starting at the root node is currently supported, the execution of a higher level local time step requires a loop over allNode elements of the entire tree data structure, which explains the rather similar computing times of MR and MRLT.

The performance for accessing data through the tree in Carmen could be enhanced by implementing a recursive storage pattern that ensures data locality in memory, e.g., by employing a generalized space filling curve algorithm to define a recursive sequential ordering of the multi-dimensional cells of the Cartesian mesh, cf. [54]. The index information for the space filling curve as well as the local neighbor and parent/child information of the tree nodes could be encoded very effectively in bit patterns, cf. [10]. Double flux computations could be relatively easily avoided by adding available numerical fluxes at once to the vector of state of both neighboring cells or by introducing a more complex tree traversal operator.

In the object-oriented design of block-based AMR software it is common to introduce aBox class defining a rectangular area in integer index space. A list ofBox objects specifies the grid topology of every refinement level and aPatch class adds consecutive data storage to eachBox. Depending on the stencil width of the numerical method, layers of ghost cells are created in addition. In AMROC, the allocation ofPatch objects and their internal data arrays is also done with the standardCnew anddelete commands but note that thanks to clustering typically several hundreds of cells together in a single patch, the number of memory management requests is much smaller than in Carmen. After the ghost cells of thePatch objects of one level have been properly set, the FV update function can be called in a simple loop over all patches. In unigrid mode, this corresponds to a straightforward block-based FV code on a rectangular grid and the computational performance hence is quasi-optimal. Each numerical flux is computed only once per patch; double flux evaluations occur only along the boundary facets that a patch shares with a neighbor of the same level. In explicit FV methods the computing costs are dominated by the numerical flux evaluations, which in combination with the faster MUSCL-Hancock reconstruction approach (cf. Section ) explains well why the absolute costs for the portionNumerics in Tables and are roughly 2 to 2.5 times smaller in the AMROC code.

5Conclusions

In the present paper, fully adaptive computations of the 2D and 3D compressible Euler equations are presented and two approaches for introducing adaptivity, i.e., MR and adaptive mesh refinement, are compared and benchmarked in detail. For the same accuracy, the efficiency in terms of CPU time and memory compression of these two adaptive methods has been assessed. For time integration either global time stepping or scale-dependent local time stepping techniques of second order Runge-Kutta type are used. The main differences between MR and AMR is in the way the adaptive meshes are stored and how the error estimators are defined. The MR method uses a graded tree data structure and thresholding of the wavelet coefficients, corresponding to the details between two consecutive levels, to define the adaptive mesh. The AMR method uses a series of regular data blocks on the different levels and relies on scaled gradient criteria based on pressure and density to trigger the mesh refinement and coarsening. Although in both approaches the thresholds are chosen to impose approximately the same accuracy relative to a uniform grid computation, MR methods benefit from a rigorous and potentially more accurate regularity analysis, while for AMR methods rigorous error estimators are not available. Therefore, threshold values of AMR have to be tuned for a given problem, whereas in MR, in principle, the threshold is independent of the problem.

The computational results show that the MR method generally presents larger compression rates and has a prinicipal potential for obtaining larger gains in CPU time than the AMR method. The improved compression rates observed for the MR method are the consequences of a mathematically more sound adaptation criterion as well as applying a patch-based refinement in the AMR method, which leads to a larger number of total cells to avoid data fragmentation.

When absolute CPU times are the primary concern, this investigation has underscored the importance of utilizing hierarchical data structures that preserve some memory coherence on computing data and use auxiliary data to avoid repeated generation of topological and numerical information. While the fulfillment of these requirements is rather naturally embedded into the block-based AMR approach, it comes at the cost of a more complicated base implementation. On the other hand, the cell-based quad- or octree approach can be implemented prototypically with comparably little code; yet, the realization of a high-performance library for adaptive tree data is at least as involved, cf. [36], as devising a high-performance AMR library, cf. [45]. In relative terms, the benefit of the MR approach has been clearly demonstrated. However, the herein employed prototype code falls far short compared to the AMR approach in absolute compute time and – with local time stepping – even in relative adaption overhead for the 3D problem considered.

Our next step will be to implement MR analysis as a mesh refinement criterion for the AMR/AMRLT algorithms, which will combine the natural high computational efficiency of the block-based approach with a mathematically rooted error indicator, requiring less user adjustment. This paper has underscored that for typical, shock-dominated gas dynamical problems solved with explicit FV schemes the choice of data structures and adaptation algorithms has a higher impact on computational performance than the mathematical rigor of the mesh refinement criterion.

Acknowledgments

MOD and SG thankfully acknowledge financial support from Ecole Centrale de Marseille (ECM), Fundação de Amparo a Pesquisa do Estado de S ao Paulo - FAPESP, and the Brazilian Research Council - CNPq, Brazil. KS thanks the ANR project “SiCoMHD‘” for financial support. The authors also thank the CEMRACS 2012 summer program, where part of this study was done. We are grateful to D. Fougère and V. E. Mencone for their computational assistance.

Footnotes

1. An early version of this paper has appeared previously in ESAIM Proceedings, 16:181–194, 2009 [21].
2. The Carmen code is open access and available at
https://github.com/waveletApplications/carmen.git.
3. The latest open access release of AMROC is available at http://www.vtf.website.
4. Computation of the reference solution on the uniform grid used nodes with Intel-Xeon-GHz dual-core processors of a typical GNU/Linux cluster and required wall time to complete time steps. Note that all benchmarked computations were run in serial.
5. While all benchmarked 3D simulations where run in serial, only the computation of this reference solution on the uniform grid was performed on processors on an IBM-BG/P used in SMP mode, which required wall time to complete time steps.

References

1. Multiresolution representation in unstructured meshes.
R. Abgrall and A. Harten. SIAM J. Numer. Anal., 35(6):2128–2146, 1998.
2. An optimal control approach to a-posteriori error estimation.
R. Becker and R. Rannacher. In A. Iserles, R. Becker, R. Rannacher, and P. G. Ciarlet, editors, Acta Numerica, volume 10, pages 1–102. Cambridge University Press, 2001.
3. Three-dimensional adaptive mesh refinement for hyperbolic conservation laws.
J. Bell, M. Berger, J. Saltzman, and M. Welcome. SIAM J. Sci. Comput., 15(1):127–138, 1994.
4. Local adaptive mesh refinement for shock hydrodynamics.
M. Berger and P. Colella. J. Comput. Phys., 82:64–84, 1988.
5. Adaptive mesh refinement using wave-propagation algorithms for hyperbolic systems.
M. Berger and R. J. LeVeque. SIAM J. Numer. Anal., 35(6):2298–2316, 1998.
6. Adaptive mesh refinement for hyperbolic partial differential equations.
M. Berger and J. Oliger. J. Comput. Phys., 53:484–512, 1984.
7. Multiresolution schemes for conservation laws with viscosity.
B. L. Bihari. J. Comput. Phys., 123:207–225, 1997.
8. Multiresolution schemes for the numerical solution of 2-D conservation laws I.
B. L. Bihari and A. Harten. SIAM J. Sci. Comput., 18(2):315–354, 1996.
9. Multi-level adaptive solutions to boundary-value problems.
A. Brandt. Mathematics of Computations, 31(183):333–390, April 1977.
10. p4est: Scalable algorithms for parallel adaptive mesh refinement on forests of octrees.
C. Burstedde, L. C. Wilcox, and O. Ghattas. SIAM Journal on Scientific Computing, 33(3):1103–1133, 2011.
11. Point value multi-scale algorithms for 2D compressible flow.
G. Chiavassa and R. Donat. SIAM J. Sci. Comput., 23(3):805–823, 2001.
12. Wavelet methods in numerical analysis.
A. Cohen. In P. G. Ciarlet and J. L. Lions, editors, Handbook of Numerical Analysis, volume VII. Elsevier, Amsterdam, 2000.
13. Multiresolution finite volume schemes on triangles.
A. Cohen, N. Dyn, S. M. Kaber, and M. Postel. J. Comput. Phys., 161:264–286, 2000.
14. Fully adaptive multiresolution finite volume schemes for conservation laws.
A. Cohen, S. M. Kaber, S. Müller, and M. Postel. Math. Comp., 72:183–225, 2003.
15. Multiresolution schemes for conservation laws.
W. Dahmen, B. Gottschlich-Müller, and S. Müller. Numer. Math., 88(3):399–443, 2001.
16. AMROC - Blockstructured Adaptive Mesh Refinement in Object-oriented C++.
R. Deiterding. http://amroc.sourceforge.net.
17. Parallel adaptive simulation of multi-dimensional detonation structures.
R. Deiterding. PhD thesis, Brandenburgische Technische Universität Cottbus, Sep 2003.
18. Construction and application of an AMR algorithm for distributed memory computers.
R. Deiterding. In T. Plewa, T. Linde, and V. G. Weirs, editors, Adaptive Mesh Refinement - Theory and Applications, volume 41 of Lecture Notes in Computational Science and Engineering, pages 361–372. Springer, 2005.
19. A parallel adaptive method for simulating shock-induced combustion with detailed chemical kinetics in complex domains.
R. Deiterding. Comp. Struct., 87:769–783, 2009.
20. Block-structured adaptive mesh refinement - theory, implementation and application.
R. Deiterding. ESAIM Proceedings, 34:97–150, 2011.
21. Adaptive multiresolution or adaptive mesh refinement? A case study for 2d Euler equations.
R. Deiterding, M. O. Domingues, S. M. Gomes, O. Roussel, and K. Schneider. ESAIM Proceedings, 16:181–194, 2009.
22. Virtual Test Facility: A virtual shock physics facility for simulating the dynamic response of materials.
R. Deiterding, R. Radovitzki, S. Mauch, F. Cirak, D. J. Hill, C. Pantano, J. C. Cummings, and D. I. Meiron. http://www.vtf.website.
23. Adaptive wavelet representation and differenciation on block-structured grids.
M. O. Domingues, S. M. Gomes, and L. M. A Diaz. Appl. Num. Math., 47:421–437, 2003.
24. An adaptive multiresolution scheme with local time stepping for evolutionary PDEs.
M. O. Domingues, S. M. Gomes, O. Roussel, and K. Schneider. J. Comp. Phys., 227:3758–3780, 2008.
25. Space-time adaptive multiresolution methods for hyperbolic conservation laws: Applications to compressible Euler equations.
M. O. Domingues, S. M. Gomes, O. Roussel, and K. Schneider. Appl. Num. Math., 59:2303–2321, 2009.
M. O. Domingues, S. M. Gomes, O. Roussel, and K. Schneider. ESAIM Proceedings, 34:1–96, 2011.
27. An adaptive multiresolution method for parabolic pdes with time-step control.
M. O. Domingues, O. Roussel, and K. Schneider. Int. J. Numer. Meth. Engng., 78:652–670, 2009.
28. Adaptive mesh refinement for singular current sheets in incompressible magnetohydrodynamics flows.
H. Friedel, R. Grauer, and C. Marliani. J. Comput. Phys., 134(1):190–198, 1997.
29. Adaptive finite volume schemes for conservation laws based on local multiresolution techniques.
B. Gottschlich-Müller and S. Müller. In R. Jeltsch and M. Frey, editors, Hyperbolic Problems: Theory, Numerics, Applications, volume 129. ISNM, Inter. Ser. Numer. Math., 1999.
30. Multiresolution algorithms for the numerical solution of hyperbolic conservation laws.
A. Harten. Comm. Pure Appl. Math., 48:1305–1342, 1995.
31. Multiresolution representation of data: a general framework.
A. Harten. SIAM J. Numer. Anal., 33(3):385–394, 1996.
32. Wavelet Based Methods for Time Dependent PDEs.
M. Holmström. PhD thesis, Uppsala University, 1997.
33. Solving hyperbolic PDEs using interpolating wavelets.
M. Holmström. SIAM J. Sci. Comput., 21(2):405–420, 1999.
34. Managing complex data and geometry in parallel structured AMR applications.
R. D. Hornung, A. M. Wissink, and S. H. Kohn. Engineering with Computers, 22:181–195, 2006.
35. A fully adaptive multiresolution scheme for shock computations.
M. Kaibara and S. M. Gomes. In E.F. Toro, editor, Godunov Methods: Theory and Applications, volume . Klumer Academic/Plenum Publishers, 2001.
A. M. Khokhlov. J. Comput. Phys., 143:519–543, 1998.
37. A parallel software infrastructure for structured adaptive mesh methods.
S. R. Kohn and S. B. Baden. In Proc. of the Conf. on Supercomputing ’95, December 1995.
D. Kolomenskiy, J.-C. Nave, and K. Schneider. J. Sci. Comput., 2015.
39. Solution of two-dimensional Riemann problems of gas dynamics by positive schemes.
P. D. Lax and X. D. Liu. SIAM J. Sci. Comput., 19(2):319–340, 1998.
40. A sequel to AUSM: AUSM+.
M.-S. Liou. J. Comput. Phys., 129:364–382, 1996.
41. PARAMESH: A parallel adaptive mesh refinement community toolkit.
P. MacNeice, K. M. Olson, C. Mobarry, R. deFainchtein, and C. Packer. Computer Physics Communications, 126:330–354, 2000.
42. Adaptive multiscale schemes for conservation laws, volume 27 of Lectures Notes in Computational Science and Engineering.
S. Müller. Springer, Heidelberg, 2003.
43. Fully adaptive multiscale schemes for conservation laws employing locally varying time stepping.
S. Müller and Y. Stiriba. J. Sci. Comput., 30(3):493–531, 2007.
44. A low-numerical dissipation patch-based adaptive mesh refinement method for large-eddy simulation of compressible flows.
C. Pantano, R. Deiterding, D. J. Hill, and D. I. Pullin. J. Comput. Phys., 221:63–87, 2007.
45. Parallelization of structured, hierarchical adaptive mesh refinement algorithms.
C. A. Rendleman, V. E. Beckner, M. Lijewski, W. Crutchfield, and J B. Bell. Computing and Visualization in Science, 3, 2000.
46. Multicore/multi-gpu accelerated simulations of multiphase compressible flows using wavelet adapted grids.
D. Rossinelli, B. Hejazialhosseini, D. Spampinato, and P. Koumoutsakos. SIAM J. Sci. Comput., 33:512–540, 2011.
47. Mrag-i2d: Multiresolution adapted grids for remeshed vortex methods on multicore architectures.
D. Rossinelli, B. Hejazialhosseini, W. van Rees, M. Gazzola, M. Bergdorf, and P. Koumoutsakos. J. Comput. Phys., 288:1–18, 2015.
48. An adaptive multiresolution method for combustion problems: application to flame ball - vortex interaction.
O. Roussel and K. Schneider. Comp. Fluids, 34(7):817–831, 2005.
49. A conservative fully adaptive multiresolution algorithm for parabolic PDEs.
O. Roussel, K. Schneider, A. Tsigulin, and H. Bockhorn. J. Comput. Phys., 188(2):493–523, 2003.
50. Wavelet methods in computational fluid dynamics.
K. Schneider and O. V. Vasilyev. Annu. Rev. Fluid. Mech., 42:473–503, 2010.
51. Numerical solution of the Riemann problem for two-dimensional gas dynamics.
C. W. Schulz-Rinne, J. P. Collis, and H. M. Glaz. SIAM J. Sci. Comput., 14:1394–1414, 1993.
52. Riemann solvers and numerical methods for fluid dynamics.
E. F. Toro. Springer, 1997.
53. An accurate and robust flux splitting scheme for shock and contact discontinuities.
Y. Wada and M. S. Liou. SIAM J. Sci. Comput., 18(3):633–657, 1997.
54. Peano – A traversal and storage scheme for octree-like adaptive cartesian multiscale grids.
T. Weinzierl and M. Mehl. SIAM Journal on Scientific Computing, 33(5):2732–2760, 2011.
55. Conjecture on the structure of solutions the Riemann problem for two-dimensional gas dynamics systems.
T. Zhang and Y. Zheng. SIAM J. Math. Anal., 21:593–630, 1990.
You are adding the first comment!
How to quickly get a good reply:
• Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
• Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
• Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minumum 40 characters