ParallelinTime with Fully Finite Element Multigrid for 2D Spacefractional Diffusion Equations
Abstract
The paper investigates a nonintrusive parallel time integration with multigrid for spacefractional diffusion equations in two spatial dimensions. We firstly obtain a fully discrete scheme via using the linear finite element method to discretize spatial and temporal derivatives to propagate solutions. Next, we present a nonintrusive timeparallelization and its twolevel convergence analysis, where we algorithmically and theoretically generalize the MGRIT to timedependent fine timegrid propagators. Finally, numerical illustrations show that the obtained numerical scheme possesses the saturation error order, theoretical results of the twolevel variant deliver good predictions, and significant speedups can be achieved when compared to parareal and the sequential timestepping approach.
Yue X Q et. al.]X. Q. Yue, S. Shu, X. W. Xu, W. P. Bu and K. J. Pan^{1}^{1}1Corresponding author.

ParallelinTime with Fully Finite Element Multigrid for 2D Spacefractional Diffusion Equations

[

School of Mathematics and Computational Science, Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Hunan 411105, China
Institute of Applied Physics and Computational Mathematics, Beijing 100088, China
School of Mathematics and Statistics, Central South University, Hunan 410083, China

AMS subject classifications: 35R11, 65F10, 65F15, 65N55

Key words: Spacefractional diffusion equations, spacetime finite element, multigridintime, parallel computing.
1 Introduction
In recent years, mathematical models with fractional derivatives and integrals attract a wide interest of scientists in a variety of fields (including physics, biology and chemistry, etc.), owing to their potences in descriptions of memory and heredity [1]. Particularly, fractional diffusion equations are shown to afford investigations on subdiffusive phenomena and Lévy fights [2]. In this article, we are interested in a class of twodimensional spacefractional diffusion equations (SFDEs)
(1.1)  
(1.2)  
(1.3) 
with orders , , constants , , solution domain , and Riesz fractional derivatives
where
Since closedform analytical solutions of fractional models are rarely accessible in practice, the numerical solutions become very prevalent to empower their successful applications. In literatures, numerical methods of SFDEs proposed to achieve high accuracy and efficiency include finite difference [3, 4, 5, 6, 7, 8, 9, 10], finite element (FE) [11, 12, 13, 14, 15, 16], finite volume [17, 18, 19] and spectral (element) [20, 21, 22] methods. It must be emphasized that no matter which discretization is applied, there usually persists intensive computational task in nonlocality caused by fractional differential operators [23]. Numerous scholars are working to identify fast algorithms most appropriate to tackle this challenge, see [24, 25, 26, 27, 28, 29] and related references therein. Except for these fast solutions, parallel computing can be viewed as another potential technique or even a basic strategy. Gong et al. presented MPIbased and GPUbased parallel algorithms for onedimensional Riesz SFDEs [30, 31], whose speedups are both achieved by spatial parallelism with sequential timestepping approach, using some time propagator to integrate from one time to the next. However, future computing speed must rely on the increased concurrency provided by more, instead of faster, processors. An immediate consequence of this is that solution algorithms, limited to spatial parallelism, for problems with evolutionary behavior entail long overall computation time, often exceeding computing resources available to resolve multidimensional SFDEs. Thus, algorithms achieving parallelism in time have been of especially high demand over the past decade. Currently, parareal in time [32] and multigridreductionintime (MGRIT) [33] are two active choices. Wu et al. analyzed convergence properties of the parareal algorithm for SFDEs via certain underlying ODEs in constant timesteps [34, 35], but they uninvolved largescale testings. Observe that parareal can be interpreted as a twolevel multigrid (reduction) method [33, 36], its concurrency is still limited because of the large sequential coarsegrid solve. MGRIT, a nonintrusively and truly multilevel algorithm implemented in the opensource XBraid [37] with optimal parallel communication, counteracts this and enables us to approximate simultaneously the evolution over all time points. The nonintrusive nature of MGRIT relies upon modalities of fine and coarse timegrid propagators rather than their internals. It has been proven to be effective and analyzed sharply in the twolevel setting for integer order basic parabolic and hyperbolic problems [38], however, with limitations on analysis that they only consider linear problems and the temporal grid is uniform, i.e., fine timegrid propagators are all the same. Furthermore, from the survey of references, there are no calculations of the MGRIT algorithm to SFDEs, nor to FE discretizations of parabolic and hyperbolic problems in time.
The main aim of the paper is to propose and analyze a nonintrusive optimalscaling MGRIT algorithm for spacetime FE discretizations of problem (1.1)(1.3) in uniform and nonuniform temporal partitions, where we shall extend the scope of the MGRIT method and develop a library of MGRIT modifications to timedependent propagators. The outline of our presentation proceeds as follows. In Section 2, we derive a fully discrete FE scheme. Section 3 introduces the MGRIT solver followed by its twolevel convergence analysis to the obtained scheme. In Section 4, we report some numerical results to illustrate optimal convergence rates in both space and time, present theoretical confirmations and analyze weak and strong scaling studies to show benefits. Finally, relevant results are summarized and followup work are drawn in Section 5.
2 Spacetime FE discretization for SFDEs
This section deals with the construction of our spacetime FE scheme for SFDEs. Here we denote by and the inner product and its associated norm on . First, we introduce some fractional derivative spaces.
Definition 2.1.
(Left and right fractional derivative spaces) For a given constant , define norms
Let and be closures of in regard to and , respectively.
Definition 2.2.
(Fractional Sobolev space) For a given constant , define the norm
where is the Fourier transform of . Let be the closure of in norm sense.
Remark 2.1.
The mathematical equivalence among , and with related norms can be proved to the case where , analogously to the work [39].
Utilizing Lemma 5 in [11], the weak formulation of problem (1.1)(1.3) is derived in reference to : given , and , hunting for subject to and
(2.1) 
where
(2.2) 
For the depiction of our spacetime FE discretization, we define a (possibly nonuniform) temporal partition and a uniform spatial triangulation with constant spacings and , let be the set of interior mesh points in , denote the th subintervals and for , and . We choose the usual spaces in tensor products
where , and with as the space of all polynomials of degree .
Now a fully discrete FE approximation for (2.1) is singled out immediately: for , to seek the solution satisfying and
(2.3) 
where the function approximates to the initial , and
Note that and can be respectively written as and , with being the Lagrange linear shape function at point , as well as
By simple algebraic calculations, we deduce
Hence, to compute satisfying Eq. (2.3) we solve
(2.4) 
where initial guess with , vector with , mass matrix takes the blocktridiagonal form
with
stiffness matrices
and
(2.5) 
with the permutation matrix produced in terms of the identity matrix by
Toeplitz matrices () whose entries are in forms
and
Remark 2.2.
It is worthwhile to point out that is introduced for a more economical memory requirement, since is naturally of the full block form, while (2.5) computes without explicit generation of or .
3 Timeparallelization and its twolevel convergence analysis
This section is devoted to the timeparallelization for the forward timemarching loop (2.4) as well as the twolevel convergence analysis.
3.1 The MGRIT algorithm
Consider the forward problem (2.4), a onestep method of Eq. (1.1), which is equivalent to the block unit lower bidiagonal system
(3.1) 
where the th timegrid propagator
(3.2) 
and
Obviously, the inverse in Eq. (3.2) corresponds to a spatial solve.
Regarding the MGRIT algorithm to solve the global spacetime problem (3.1), various components have to be chosen. Let be the coarsening factor in time and , we define a coarse mesh , . In this setting, all are Cpoints and the others are Fpoints. FCFrelaxation depicted in Fig. 1, an initial Frelaxation followed by a Crelaxation and then a second Frelaxation, is often the most reliable choice to produce optimal and scalable multilevel iterations [33]. Define the injection at Cpoints as our restriction operator, and the injection followed by an Frelaxation over the finegrid operator as our interpolation operator. The multilevel hierarchy can be constructed by applying the above processes recursively. Both sequential timestepping and MGRIT are in terms of spatial solve, but MGRIT is highly concurrent. About more processors are actually required in temporal concurrency to outweigh the extra work of MGRIT, where is the number of MGRIT iterations [33]. Below is the MGRIT Vcycle algorithm, where , , , , and () correspond to the th coarsescale time rediscretization, restriction and interpolation, respectively.
Algorithm 1.
MGRIT algorithm with Vcycle: .
 Step 1

Apply FCFrelaxation to .
 Step 2

Restrict the residual .
 Step 3

If , then solve ;
Else perform .  Step 4

Do the coarsegrid correction .
Remark 3.1.
To save computational work, Step 4 of Algorithm 1 is done by just correcting Cpoint values, and updating Fpoint values only when the Euclidean norm of the residual is small enough. The reason is that the correction at Fpoints is equivalent to an Frelaxation, which will be performed in Step 1 of the subsequent iteration.
3.2 Implementation details
For numerical experiments, the generalpurpose parallelintime library XBraid [37] was employed. Eq. (3.1) by uniform temporal partitions can be solved by XBraid in a straightforward way, whereas modifications on XBraid was done to timedependent propagators as part of this study. Wrapper routines were written in C and integrals were calculated by a quadrature formula.
One of the most noteworthy is that we split processors and communicators into spatial and temporal groups for purpose of running parallelized modules in space, time or both. In view of the linearity of Eq. (1.1) and Toeplitz structures of , and , all submatrices and (; ) are only set up once to reduce computational work. The spacetime approximation is obtained until the spacetime residual norm in the discrete sense is less than the absolute halting tolerance , where all spatial solves are accomplished by the Conjugate Gradient algorithm provided in the HYPRE library [40] with as the relative tolerance for stopping and the Euclidean norm used to measure solution progress. Since the procedure involves only matrixbyvector multiplications, matrices , and are kept in unassembled form to save on memory. We skip any work on the first MGRIT down cycle. In addition, we choose random initial guesses for the entire temporal grid hierarchy, except that the initial condition (1.3) is used at on the finest grid.
3.3 Twolevel convergence analysis
Setting , Algorithm 1 in this case reduces to a twolevel scheme, whose error propagator is characterized in the following lemma.
Lemma 3.1.
Proof.
Let be the approximation, we have the update sequence during FCFrelaxation
Notice that the exact solution can be written in the form
which follows from the recursion (2.4). Hence, the residual at the Cpoints becomes
Then we can get the coarsegrid solution
which gives
(3.3) 
It follows by the subsequent correction at Cpoints that
The desired result follows immediately by plugging (3.3) into the above equation. ∎
It is important to note that is introduced to approximate the ideal coarse timestepper . An obvious and effective choice of is to rediscretize problem (1.1)(1.3) on the coarse timegrid, i.e.,
(3.4) 
At this point, a coarse timestep is roughly as expensive to solve as a fine timestep.
Next we wish to establish the error reduction factor of the twolevel version of Algorithm 1, similar to the one in [38], based on the fact that defined by (3.2) and defined by (3.4) can be simultaneously diagonalized by a unitary matrix , which satisfies
(3.5) 
This will yield the following two lemmas.
Lemma 3.2.
The eigenvalue in Eq. (3.5) is real and positive for .
Proof.
Proof.
For and , let
(3.6) 
Then it is easy to verify that the following relations hold
(3.7) 
By Lemma 3.2, we conclude that