An efficient and accurate MPI based parallel simulator for streamer discharges in three dimensions


Bo Lin Department of Mathematics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore, 119076.(linbo@u.nus.edu)    Chijie Zhuang State Key Lab of Power Systems and Department of Electrical Engineering, Tsinghua University, Beijing 100084, China.(chijie@tsinghua.edu.cn, corresponding author)    Zhenning Cai Department of Mathematics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore, 119076. (matcz@nus.edu.sg)    Rong Zeng State Key Lab of Power Systems and Department of Electrical Engineering, Tsinghua University, Beijing 100084, China.(zengrong@tsinghua.edu.cn)    Weizhu Bao Department of Mathematics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore, 119076. (matbaowz@nus.edu.sg)
Abstract

We propose an efficient and accurate message passing interface (MPI) based parallel simulator for streamer discharges in three dimensions using the fluid model. First, we propose a new second-order semi-implicit scheme for the temporal discretization of the model, which relaxes the dielectric relaxation time restriction. Moreover, it solves the Poisson equation only once at each time step, while classical second-order semi-implicit and explicit schemes typically need to solve it twice. Second, we introduce a geometric multigrid preconditioned FGMRES solver, which dramatically improves the efficiency of solving the Poisson equations with either constant or variable coefficients. It is numerically shown that the solver is faster than other Krylov subspace solvers, and that the Poisson solver takes no more than 4 iterations to converge to the prescribed relative residual during streamer simulations. Last but not least, all the methods are implemented using MPI, and the good parallel efficiency of the code and the performance of the numerical algorithms are demonstrated by a series of numerical experiments using up to 2560 cores on the Tianhe2-JK cluster. A double-headed streamer discharge as well as the interaction of two streamers is studied, using up to 10.7 billion mesh cells.

Key words. parallel simulation, streamer discharge, three-dimensional simulation, geometric multigrid, semi-implicit scheme, MPI

1 Introduction

A streamer is a cold plasma that commonly appears in nature and in industry. As the building block of long air-gap discharges, streamer discharges initiate many insulation problems, e.g., the air-gap breakdown between a DC converter and the ground [9] and flashover along an insulator. Another typical example is lightning [36], where the sprites triggered by the strong quasi-electrostatic field generated by intense cloud-to-ground lightning flashes are found to be filamentary streamer discharges [17].

Based on the local field approximation, the simplest three-dimensional model for simulating streamer discharges consists of two convection-dominated transport equations, coupled with a Poisson equation for the electric potential and field:

\[
\left\{
\begin{aligned}
&\frac{\partial n_e}{\partial t} - \nabla\cdot\bigl(\mu_e \mathbf{E}\, n_e + D\,\nabla n_e\bigr) = \alpha\,\mu_e\,|\mathbf{E}|\,n_e,\\
&\frac{\partial n_p}{\partial t} + \nabla\cdot\bigl(\mu_p \mathbf{E}\, n_p\bigr) = \alpha\,\mu_e\,|\mathbf{E}|\,n_e,\\
&-\varepsilon_0\,\Delta\phi = e\,(n_p - n_e), \qquad \mathbf{E} = -\nabla\phi,
\end{aligned}
\right.
\tag{1.1}
\]

where n_e and n_p denote the densities of electrons and positive ions, respectively; φ and E denote the electric potential and electric field, respectively; μ_e and μ_p are the mobility constants for electrons and positive ions, respectively; D is a diagonal matrix D = diag(D_x, D_y, D_z), where D_x, D_y, D_z are the diffusion coefficients in the x, y, z directions, respectively; α is the effective ionization coefficient; e and ε_0 are the elementary charge and the vacuum dielectric permittivity, respectively. In this paper, we focus on the simulator, and the photoionization is simply accounted for through the background ionization in the initial condition, as in [5, 30].

To model the streamer discharge between two parallel plates, a cubic domain is considered. Dirichlet boundary conditions are applied for the potential on the upper and lower plate electrodes, and homogeneous Neumann boundary conditions are applied on the other four sides. The plasma is initially assumed to be electrically neutral everywhere, which gives the following initial conditions for n_e and n_p,

(1.2)

Homogeneous Neumann boundary conditions are applied at all the boundaries for n_e, and at all inflow boundaries for n_p.

Great efforts have been devoted to solving the model (1.1) during the past few decades. In the 1980s and 1990s, the flux-corrected transport (FCT) technique [6, 35] was widely used. It has been combined with the finite difference method (FDM) and the finite element method (FEM) to overcome the numerical oscillations that arise when classical linear schemes are used to solve convection-dominated equations [20, 12]. Later, the finite volume method (FVM) became popular owing to its local conservation property [19]. Motivated by the success of FVM and FEM, the discontinuous Galerkin (DG) method, which uses a finite element discretization with discontinuous basis functions and incorporates the ideas of numerical fluxes and slope limiters from high-resolution FDM and FVM, was used to simulate streamers [38, 39, 37]. Owing to these improvements in numerical methods, great progress has been achieved in streamer simulations [3], especially in the two-dimensional case where the streamer is assumed to be axisymmetric. However, three-dimensional simulations are still rarely seen in the literature [26].

The difficulty of three-dimensional simulations lies in the need for fine meshes due to the rapid variation of the solution. Streamer discharges propagate extremely fast, as shown, e.g., in Fig. 7 of [8]. During such a rapid transient process, the electric field in the discharge channel, which is one of the key parameters dominating the development of a streamer, varies dramatically both temporally and spatially. After the inception of a streamer, the electric field at its head is greatly enhanced due to the net charge accumulation, which further accelerates the ionization and charge accumulation. Thus a sharp charge density profile forms at the streamer's head, which requires a spatial grid with very high resolution to capture the structure of the charge carriers. Typically, the grid size adopted in previous simulations is of the order of micrometers [5, 30], which is tiny compared with the characteristic length of the problem at the scale of, e.g., centimeters. This consequently restricts the maximal allowed time step to the order of several picoseconds or even smaller when explicit schemes are used. In addition, since the Poisson equation and the transport equations for the charge carriers are coupled together, the time step is further restricted by the dielectric relaxation time, i.e., τ_d = ε_0/σ with σ = e(μ_e n_e + μ_p n_p) the local conductivity, which is also typically of the order of several picoseconds. For these reasons, a two-dimensional simulation already takes a long computational time, let alone three-dimensional simulations, which need thousands of times the number of degrees of freedom even for a small domain. Parallel computing seems to be the only feasible way to efficiently deliver large-scale three-dimensional simulations of streamer discharges.
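
For a rough sense of scale, the dielectric relaxation time can be estimated by a simple back-of-the-envelope calculation. The sketch below uses assumed, representative values for the electron mobility and the peak electron density in air at atmospheric pressure (not parameters taken from this paper) and neglects the small ion contribution to the conductivity.

# Rough estimate of the dielectric relaxation time tau_d = eps0 / (e * mu_e * n_e).
# mu_e and n_e below are assumed, representative values; the ion contribution
# e * mu_p * n_p is neglected since mu_p << mu_e.
eps0 = 8.854e-12   # vacuum permittivity, F/m
e    = 1.602e-19   # elementary charge, C
mu_e = 0.04        # assumed electron mobility, m^2/(V s)
n_e  = 1e20        # assumed peak electron density, 1/m^3 (= 1e14 cm^-3)
tau_d = eps0 / (e * mu_e * n_e)
print(f"tau_d = {tau_d:.2e} s")   # about 1.4e-11 s, i.e. roughly ten picoseconds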

Recently, Teunissen and Ebert reported their work on simulating streamers in 3D with the parallel adaptive Afivo framework [30], which features adaptive mesh refinement, geometric multigrid methods for the Poisson equation, and OpenMP parallelism. Further improvement can be made by replacing the OpenMP parallelism with message passing interface (MPI) libraries, so that the power of clusters can be fully exploited. Another exciting advance in MPI-based simulation was reported by Plewa, Eichwald, Ducasse et al. [23], who used the successive over-relaxation iterative solver with the red-and-black strategy (R&B SOR) as the Poisson solver, and tested the parallelization and scalability with cell numbers ranging from 8 million to 512 million and core numbers ranging from 20 to 1600. Using high-performance computing clusters with an MPI implementation, their code dramatically shortened the computational time. For example, Fig. 6(b) in [23] shows that the iterative solver for the Poisson equation takes only 4.5 s to converge below the prescribed relative residual on a grid with 64 million cells when 200 cores are used.

Motivated by the previous works, this paper contributes in three aspects. First, we propose a new second-order semi-implicit scheme for the temporal discretization. In particular, the scheme remains stable when the time step exceeds the dielectric relaxation time. It is numerically demonstrated to be second-order accurate in time; nevertheless, at each time step it solves the Poisson equation only once, while previous second-order semi-implicit and explicit schemes typically need to solve it twice. Note that solving the Poisson equation is the most expensive part of the simulator. Second, we adopt a geometric multigrid preconditioned FGMRES solver with the Chebyshev iteration as the smoother in the multigrid preconditioner, which dramatically improves the efficiency of solving Poisson equations with either constant or variable coefficients. We will show that the multigrid preconditioned FGMRES is more efficient than other Krylov subspace based methods and the R&B SOR. Last but not least, all the methods are implemented using MPI, and the code runs with good parallel efficiency on the Tianhe2-JK cluster with more than 2500 cores. Numerical experiments are carried out to demonstrate the performance of the algorithms, and a double-headed streamer discharge as well as the interaction of two streamers is studied, using up to 10.7 billion mesh cells.

2 Numerical discretization

In this section, we first focus on the temporal discretization, where a second-order semi-implicit scheme is presented. Then, the finite volume method (FVM) is introduced for the spatial discretization.

2.1 Second-order semi-implicit temporal discretization

Suppose we simulate a streamer discharge from t = 0 to a given final time T, and we use n_e^n, n_p^n, φ^n and E^n to denote the associated quantities at the nth time step, with time step Δt. To avoid solving nonlinear algebraic equations, explicit schemes are frequently used for the time discretization, among which the forward Euler scheme solves model (1.1) as

(2.1)

At each time step, the potential is first calculated from the Poisson equation, and then n_e and n_p are updated subsequently. We notice that the scheme (2.1) is only first-order in time, but it can easily be upgraded to second order by Heun's method, as is done in [30]. The first stage of Heun's method computes intermediate values of n_e, n_p and E from the solution at time step n,

(2.2)

and then evolves the solution by one more stage to obtain the second-stage values:

(2.3)

The final solutions at the (n+1)th time step are constructed by

(2.4)

Such a temporal scheme possesses second-order accuracy in time, and is widely used, e.g., in [5, 19].
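
For a generic ODE y' = f(t, y), Heun's method (the explicit trapezoidal rule) can be sketched as follows; this is a minimal illustration of the two-stage structure only, not the discretization of the full system above.

import numpy as np

def heun_step(f, t, y, dt):
    """One step of Heun's method for y' = f(t, y)."""
    k1 = f(t, y)                       # stage 1: slope at the current state
    y_star = y + dt * k1               # forward Euler predictor
    k2 = f(t + dt, y_star)             # stage 2: slope at the predicted state
    return y + 0.5 * dt * (k1 + k2)    # average the two slopes

# Example: y' = -y with y(0) = 1; the error at t = 1 decreases as O(dt^2).
f = lambda t, y: -y
for dt in (0.1, 0.05, 0.025):
    t, y = 0.0, 1.0
    while t < 1.0 - 1e-12:
        y = heun_step(f, t, y, dt)
        t += dt
    print(dt, abs(y - np.exp(-1.0)))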

We wish to emphasize that the second-order explicit scheme (2.2)–(2.4) needs to solve the Poisson equation twice within one time step (from t^n to t^{n+1}). In addition, besides the CFL condition, these two explicit schemes are stable only if Δt satisfies the dielectric relaxation time constraint, i.e.,

\[
\Delta t \le \frac{\varepsilon_0}{e\,\bigl(\mu_e n_e + \mu_p n_p\bigr)_{\max}}. \tag{2.5}
\]

To relax the dielectric relaxation time constraint, semi-implicit schemes were introduced [33, 34]. In [34], Villa et al. proposed a semi-implicit scheme with a rigorous asymptotic-preserving property, which can be sketched as

(2.6)

Here we have used a discretization of the source term slightly different from that in [34], where a fully implicit source term is adopted; this simplifies the presentation and does not affect the proof of the asymptotic-preserving property. By comparing (2.6) and (2.1), one can see that the main difference between the semi-implicit scheme and the explicit schemes is whether the electric field is treated implicitly. As demonstrated in [34], when the reference states of n_e, n_p and E are bounded, the time step is no longer restricted by the dielectric relaxation time.

For general nonlinear equations, implicit schemes require solving nonlinear algebraic equations with iterative solvers such as Newton's method. Fortunately, thanks to the structure of model (1.1), we can solve (2.6) in an explicit way by rewriting the Poisson equation as a variable-coefficient elliptic equation. Subtracting the first two equations in (2.6) gives

(2.7)

Then we plug the resulting expression for the net charge from (2.7) into the Poisson equation in (2.6), and obtain the elliptic equation

(2.8)

After solving the variable-coefficient elliptic problem (2.8), we obtain the potential at the new time level. Then we can compute the corresponding electric field, and evolve the first two equations in (2.6) to obtain the densities at the new time level.
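
To make the reduction explicit, the following sketch writes out one possible form of the semi-implicit step and of the resulting elliptic equation, assuming the minimal model and the sign conventions of (1.1); it is meant as an illustration of the mechanism rather than a verbatim restatement of (2.6)–(2.8).

\[
\begin{aligned}
&\frac{n_e^{n+1}-n_e^{n}}{\Delta t} - \nabla\cdot\bigl(\mu_e \mathbf{E}^{n+1} n_e^{n} + D\nabla n_e^{n}\bigr) = \alpha\,\mu_e\,|\mathbf{E}^{n}|\,n_e^{n},\\
&\frac{n_p^{n+1}-n_p^{n}}{\Delta t} + \nabla\cdot\bigl(\mu_p \mathbf{E}^{n+1} n_p^{n}\bigr) = \alpha\,\mu_e\,|\mathbf{E}^{n}|\,n_e^{n},\\
&-\varepsilon_0\,\Delta\phi^{n+1} = e\,\bigl(n_p^{n+1}-n_e^{n+1}\bigr),\qquad \mathbf{E}^{n+1} = -\nabla\phi^{n+1}.
\end{aligned}
\]
Subtracting the first equation from the second and inserting the result into the Poisson equation gives
\[
-\nabla\cdot\Bigl[\bigl(\varepsilon_0 + e\,\Delta t\,(\mu_e n_e^{n} + \mu_p n_p^{n})\bigr)\nabla\phi^{n+1}\Bigr]
= e\,\bigl(n_p^{n}-n_e^{n}\bigr) - e\,\Delta t\,\nabla\cdot\bigl(D\nabla n_e^{n}\bigr),
\]
whose coefficient depends only on the densities at time step n, so no nonlinear iteration is needed.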

It should be noticed that (2.6) is only first-order in time, which will be numerically demonstrated later in Table 5.2. The numerical order can be improved using Heun's method, as in the explicit scheme (2.2)–(2.4). However, in this case, the variable-coefficient elliptic equation has to be solved twice at each time step to achieve second-order accuracy in time, which roughly doubles the computational cost compared with (2.6).

To reduce the computational cost, we propose a new second-order semi-implicit scheme for (1.1) without using Heun's method, which can be regarded as a predictor-corrector method. First, we predict n_e, n_p, as well as φ and E, at time t^{n+1/2} using the first-order semi-implicit scheme (2.6), i.e.,

(2.9)
(2.10)
(2.11)

Then we correct n_e and n_p by a midpoint scheme, which gives (2.12)–(2.13):

(2.12)
(2.13)

Since the potential and electric field have already been predicted at time t^{n+1/2} by solving the following variable-coefficient elliptic equation derived from (2.9)–(2.11):

(2.14)

there is no need to solve the Poisson equation again. Therefore, the Poisson equation is only solved once at each time step.

The basic idea behind this reduction of the computational cost is to mimic the underlying mechanism of the second-order implicit midpoint rule [14, Chapter 3], in which the right-hand side appears only once at each time step. In order to avoid solving nonlinear systems, this mechanism is applied only to the electric field, while the other parts follow the explicit midpoint method. Comparing the first-order scheme (2.6) with our second-order scheme (2.9)–(2.13), and focusing on the treatment of the electric field, one finds that the difference is similar to that between the backward Euler method and the implicit midpoint method. However, it is well known that the backward Euler method is L-stable while the implicit midpoint method is not. Hence, due to the strong relation between L-stability and the asymptotic-preserving property [11], when using (2.9)–(2.13) we probably lose the asymptotic-preserving property while gaining one more order of accuracy. Nevertheless, due to its implicit nature, the scheme (2.9)–(2.13) is indeed more stable than the explicit ones, as will be shown numerically in Section 5.1.2.
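
The overall structure of one time step of the predictor-corrector scheme can be outlined as follows. The helper functions solve_variable_coefficient_elliptic and advect_ionize are hypothetical placeholders for the discrete operators described above, and the outline illustrates the single-elliptic-solve structure rather than the exact discretization.

def semi_implicit_step(ne, np_, dt, solve_variable_coefficient_elliptic, advect_ionize):
    """One predictor-corrector step with a single elliptic solve.

    Hypothetical helpers:
      solve_variable_coefficient_elliptic(ne, np_, tau) -> (phi, E)
          solves the elliptic problem whose coefficient involves
          eps0 + e * tau * (mu_e * ne + mu_p * np_)
      advect_ionize(ne, np_, E) -> (dne_dt, dnp_dt)
          evaluates the discrete transport and ionization terms
    """
    # Predictor: a half step of the first-order semi-implicit scheme;
    # the single elliptic solve of this time step happens here.
    phi_half, E_half = solve_variable_coefficient_elliptic(ne, np_, 0.5 * dt)
    dne, dnp = advect_ionize(ne, np_, E_half)
    ne_half = ne + 0.5 * dt * dne
    np_half = np_ + 0.5 * dt * dnp

    # Corrector: a full step from t^n using the midpoint densities and the
    # predicted field, in the spirit of the explicit midpoint rule.
    dne_mid, dnp_mid = advect_ionize(ne_half, np_half, E_half)
    return ne + dt * dne_mid, np_ + dt * dnp_mid, phi_half, E_half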

It is worth noting that both (2.8) and (2.14) are variable-coefficient elliptic problems, and that their coefficients change at every time step during the streamer simulation. The coefficient matrix therefore needs to be computed and assembled at each time step, whereas this is done only once in the constant-coefficient case. If a preconditioned iterative elliptic solver is used, the preconditioner must also be rebuilt at every step for the variable-coefficient equation, while again this needs to be done only once in the constant case. The situation is similar if a direct solver is used. Therefore, in streamer simulations, solving the variable-coefficient elliptic equation is generally more time-consuming than solving a Poisson equation with constant coefficients.

However, it would be wrong to conclude that the second-order explicit scheme (2.2)–(2.4) is faster than the second-order semi-implicit scheme (2.9)–(2.13). The reason is that the proposed semi-implicit scheme solves the elliptic equation only once per time step, while the explicit scheme (2.2)–(2.4) requires two solves. As we will show later when comparing different elliptic solvers, the semi-implicit scheme performs better for many Krylov elliptic solvers even when the same time steps are used. Moreover, the semi-implicit schemes remove the dielectric relaxation time restriction, which may allow a larger time step and thus a shorter total computational time. Therefore, the relative efficiency of the two schemes depends on the problem and on the elliptic solver.

2.2 Spatial discretization by FVM

The computational domain is decomposed into a uniform grid with N_x, N_y and N_z cells in the x, y and z directions, respectively, so that the grid sizes are Δx, Δy and Δz. The grid cells are denoted by C_{i,j,k}, where i = 1, …, N_x, j = 1, …, N_y and k = 1, …, N_z. The finite volume method is used for the spatial discretization, and we define

(2.15)

Other notations are defined similarly. For (2.14), the classical second-order central scheme is adopted. The discrete coefficient of the elliptic problem (2.14) is defined by

(2.16)

and denote

Then (2.14) is discretized as

(2.17)

where the subscripts i, j, k are omitted for the numerical solutions at the cell C_{i,j,k}; the forward and backward difference operators in the x direction are defined respectively as

(2.18)

with similar notations in the y and z directions; the second-order central difference operator in the x direction is defined as

(2.19)

with similar notations in the y and z directions.

For the transport equations (2.9), (2.10), (2.12) and (2.13), the second-order MUSCL scheme combined with the Koren limiter is applied [32, 16]. Ghost cells are used for all the boundary conditions of n_e and n_p. This part of the spatial discretization is classical, and we omit the details here. Generally, we expect second-order accuracy of this spatial discretization for smooth solutions.
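
As a concrete illustration of this type of reconstruction, a minimal 1D sketch of the MUSCL face values with the Koren limiter is given below; it assumes a uniform grid and a positive advection velocity and uses the standard form of the limiter, and is therefore only an illustration of the reconstruction step rather than the full scheme.

import numpy as np

def koren_limiter(r):
    """Koren limiter: phi(r) = max(0, min(2r, (1 + 2r)/3, 2))."""
    return np.maximum(0.0, np.minimum.reduce([2.0 * r,
                                              (1.0 + 2.0 * r) / 3.0,
                                              np.full_like(r, 2.0)]))

def muscl_face_values(u):
    """Face values u_{i+1/2} for a positive advection velocity.

    u holds cell averages; r is the ratio of the downwind to the upwind slope,
    and the first and last faces (supplied by ghost cells in the full scheme)
    are skipped for brevity.
    """
    eps = 1e-30
    r = (u[2:] - u[1:-1]) / (u[1:-1] - u[:-2] + eps)
    return u[1:-1] + 0.5 * koren_limiter(r) * (u[1:-1] - u[:-2])

u = np.array([0.0, 0.0, 0.2, 0.9, 1.0, 1.0])
print(muscl_face_values(u))   # the reconstructed face values remain bounded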

3 Multigrid preconditioned FGMRES elliptic solver

In order to solve (2.17), an iterative solver rather than a direct solver is preferred. Although some state-of-the-art direct solvers retain the sparsity of the matrix to some degree, in three-dimensional simulations a parallel direct solver generally still requires a huge amount of memory that becomes unaffordable when the number of degrees of freedom is large, and is therefore inapplicable. An example of using the parallel direct solver MUMPS can be found in [23].

In [15], the geometric multigrid method has been shown to be faster than the SOR method for solving the Poisson equation in 2D streamer discharges. Moreover, the convergence rate of the SOR method depends on the relaxation factor, which is difficult to keep optimal at every time step since the coefficients of the elliptic problem (2.17) vary.

We use geometric multigrid as a preconditioner rather than as a solver because a geometric multigrid preconditioned Krylov subspace solver can be more stable and efficient than geometric multigrid alone. In [28], multigrid is shown to diverge for high-order FEM when used as a solver, whereas convergence is achieved when multigrid is combined with the conjugate gradient method. By investigating the eigenvalues of the iteration matrix, it was found in [22] that isolated large eigenvalues limit the convergence of the multigrid solver, and that the eigenvectors associated with these large eigenvalues can be captured in the Krylov subspace constructed by GMRES within a few iterations, which accelerates the convergence of multigrid. It is also shown in [28, 29] that the multigrid preconditioner combined with the conjugate gradient method is faster and more stable than the multigrid solver.

3.1 Preconditioned FGMRES solver

With geometric multigrid as the preconditioner, we find that the geometric multigrid preconditioned flexible generalized minimal residual (FGMRES) method performs best among various Krylov subspace solvers, as will be shown later. The details of the preconditioned FGMRES are given in Algorithm 1 [24]. Hereafter, ‖·‖ denotes the 2-norm.

1:  1. Initial guess x_0. Define a zero matrix H = (h_{i,j}) of size (m+1)×m, where m is a given number indicating the dimension of the Krylov subspace and h_{i,j} denotes the (i,j)-th entry of H.
2:  2. Arnoldi process:
3:  Compute r_0 = b − A x_0, β = ‖r_0‖ and v_1 = r_0/β.
4:  for j = 1, 2, …, m do
5:        Preconditioning z_j = M_j^{-1} v_j;
6:        Compute w = A z_j;
7:        for i = 1, 2, …, j do
8:              Gram-Schmidt process: h_{i,j} = (w, v_i), w = w − h_{i,j} v_i;
9:        end for
10:        Compute h_{j+1,j} = ‖w‖, v_{j+1} = w / h_{j+1,j}.
11:  end for
12:  Define the matrix Z_m = [z_1, z_2, …, z_m].
13:  3. Form the iterative solution: x_m = x_0 + Z_m y_m, where y_m = argmin_y ‖β e_1 − H y‖ and e_1 = (1, 0, …, 0)^T.
14:  4. Restart: If the stopping criterion is not satisfied, let x_0 ← x_m and go to 2.
Algorithm 1 FGMRES with preconditioning to solve A x = b.

As shown in line 5 of Algorithm 1, a different preconditioning matrix M_j can be selected for each basis vector v_j, which gives the "flexibility" in the name of the solver. As the price to pay, the preconditioned vectors z_j in line 5 must be stored to form the matrix Z_m, resulting in a larger memory cost than the classical generalized minimal residual (GMRES) method, in which only the vectors v_j are stored. However, the flexibility to use different preconditioners helps to improve the robustness of the GMRES algorithm, as shown in [24].

In our implementation of FGMRES, we choose a fixed Krylov subspace dimension m, and select the multigrid preconditioner in line 5.
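
For reference, a compact sketch of one restart cycle of FGMRES with a user-supplied preconditioner callback is given below; it is a minimal dense-matrix illustration of Algorithm 1 (with a simple Jacobi preconditioner as an example), not the PETSc implementation used in this work.

import numpy as np

def fgmres_cycle(A, b, x0, precond, m=10, tol=1e-8):
    """One restart cycle of FGMRES; precond(v) may differ from call to call."""
    n = len(b)
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    if beta <= tol * np.linalg.norm(b):
        return x0
    V = np.zeros((n, m + 1)); Z = np.zeros((n, m)); H = np.zeros((m + 1, m))
    V[:, 0] = r0 / beta
    for j in range(m):
        Z[:, j] = precond(V[:, j])            # flexible preconditioning (line 5)
        w = A @ Z[:, j]
        for i in range(j + 1):                # modified Gram-Schmidt
            H[i, j] = w @ V[:, i]
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-14:               # lucky breakdown
            m = j + 1
            break
        V[:, j + 1] = w / H[j + 1, j]
    e1 = np.zeros(m + 1); e1[0] = beta
    y, *_ = np.linalg.lstsq(H[:m + 1, :m], e1, rcond=None)
    return x0 + Z[:, :m] @ y

# Example: a mildly nonsymmetric system with Jacobi preconditioning.
rng = np.random.default_rng(0)
A = np.diag(np.arange(1.0, 51.0)) + 0.01 * rng.standard_normal((50, 50))
b = rng.standard_normal(50)
x = fgmres_cycle(A, b, np.zeros(50), precond=lambda v: v / np.diag(A), m=30)
print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))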

3.2 Multigrid preconditioner

The geometric multigrid preconditioner is chosen to accelerate the convergence of the FGMRES solver.

Our implementation of the geometric multigrid preconditioner uses full multigrid (FMG) for the first time step, and V-cycle multigrid afterwards. In general, FMG converges faster than V-cycle multigrid for a general initial guess, while each preconditioning process of FMG (the evaluation of z_j in Algorithm 1) is slower. Therefore, at the first time step, when no previous information is available, we simply take the zero initial guess, expecting that FMG gives faster convergence. After that, the potential calculated at the previous time step is taken as the initial guess. With such a good initial guess, the cheaper V-cycle multigrid gives better performance.

To introduce the multigrid preconditioner, we first give a brief review of the multigrid solver. In the following, we suppose that the elliptic equation on a given grid level is discretized as

(3.1)

A diagram showing the procedure of the two-layer V-cycle multigrid method is given in Figure 3.1, where the two subscripts denote the second layer (fine layer) and the first layer (coarse layer), respectively. The restriction and prolongation are shown in the same figure, using a 2D example of a fine mesh and a coarse mesh. When solving the equation at the bottom of the V-cycle, the multigrid procedure can be applied recursively, resulting in a multi-layer multigrid solver. More details will be revealed as we introduce the multigrid preconditioner subsequently.

Figure 3.1: Diagram of V-cycle geometric multigrid for uniform mesh in two layers.

Now we introduce the multigrid preconditioner, which is used in the "Preconditioning" step in line 5 of Algorithm 1. In this step, we need to apply the multigrid preconditioner to a vector v_j. This can be done by running the multigrid solver in Figure 3.1 with a zero initial guess and v_j as the right-hand side; the output is then the preconditioned vector z_j. This is how a multigrid solver is turned into a preconditioner for Krylov subspace solvers. Details of this preconditioning process are provided in the following, based on [29, 1].

For conciseness, we will only show the preconditioning process for a two-layer multigrid, following the notation in Figure 3.1. The V-cycle multigrid solver first applies a pre-smoothing to the initial value by some iterative method. Commonly used smoothers in sequential computation include the Gauss-Seidel method and the successive over-relaxation (SOR) method; we refer the readers to [29] for an illustration of these methods. However, when parallelized, the efficiency of these methods is impaired due to their sequential nature. Instead, we adopt the Chebyshev smoother in our implementation, which is a polynomial smoother based on Chebyshev polynomials. The performance of polynomial smoothers (including Chebyshev polynomials) and of the parallel Gauss-Seidel smoother has been compared in [1], and the results show that the polynomial smoothers are preferable in a parallel environment. In general, given a polynomial of prescribed degree, the associated polynomial smoother in the pre-smoothing of our multigrid algorithm reads

(3.2)

which actually smooths out the error by

Since the smoother is responsible for damping the high-frequency error, which usually corresponds to the large eigenvalues of the system matrix, we would like to choose the polynomial such that the corresponding error reduction factor is small when the eigenvalue is close to the largest eigenvalue of the matrix (denoted by λ_max below). This inspires us to consider the following optimization problem:

where is the set of polynomials of degree less than or equal to , and is chosen manually to specify the minimum corresponding eigenvalue of the modes that the smoother is supposed to damp. The solution of this optimization problem can be easily written down using Chebyshev polynomials:

(3.3)

where T_s denotes the Chebyshev polynomial of degree s, defined recursively as

\[
T_0(x) = 1, \qquad T_1(x) = x, \qquad T_{k+1}(x) = 2x\,T_k(x) - T_{k-1}(x), \quad k \ge 1. \tag{3.4}
\]

Thus the optimal polynomial can be found accordingly. By introducing two auxiliary matrices, we can rewrite the pre-smoothing (3.2) as

(3.5)

In practice, λ_max is usually replaced by an approximation of the largest eigenvalue. Moreover, due to the recursive relation (3.4) of the Chebyshev polynomials, the smoother (3.5) can also be implemented iteratively. Such a "Chebyshev iteration" can be further improved to a "preconditioned Chebyshev iteration" by introducing another preconditioner on top of it, for which we refer the readers to [7] for details.
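
As a concrete illustration of the smoother, a minimal (unpreconditioned) Chebyshev iteration in its standard textbook form is sketched below; the eigenvalue bounds lmin and lmax are assumed inputs, and the lower bound lmax/30 used in the example is a common heuristic choice rather than the value used in our implementation.

import numpy as np

def chebyshev_smooth(A, b, x, lmin, lmax, steps=3):
    """Chebyshev iteration damping the eigencomponents of A in [lmin, lmax]."""
    theta = 0.5 * (lmax + lmin)          # center of the target interval
    delta = 0.5 * (lmax - lmin)          # half-width of the target interval
    sigma = theta / delta
    rho = 1.0 / sigma
    d = (b - A @ x) / theta
    x = x + d
    for _ in range(steps - 1):
        rho_new = 1.0 / (2.0 * sigma - rho)
        d = rho_new * (rho * d + (2.0 / delta) * (b - A @ x))
        rho = rho_new
        x = x + d
    return x

# Example: smoothing a random error for the 1D Laplacian (zero right-hand side).
n = 64
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
lmax = 4.0                               # upper bound of the spectrum of A
x0 = np.random.default_rng(1).standard_normal(n)
x1 = chebyshev_smooth(A, np.zeros(n), x0, lmin=lmax / 30.0, lmax=lmax, steps=3)
print(np.linalg.norm(x0), np.linalg.norm(x1))   # the error norm is reduced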

After pre-smoothing, we use a restriction matrix to map the residual to the coarse grid:

(3.6)

Then we calculate the correction on the coarse grid by

(3.7)

and use a prolongation matrix to interpolate the solution back to the second layer for correction:

(3.8)

Finally, post-smoothing is applied in the same way as (3.2), giving us

(3.9)

which is similar to (3.5). Therefore, we formulate the V-cycle multigrid in Figure 3.1 by (3.5)–(3.9), which can be summarized as

(3.10)

Now it is clear that (3.10) is one Richardson iteration step with the V-cycle geometric multigrid preconditioner. Details of the derivation of (3.10) and the corresponding equation for the general multilayer V-cycle method are given in Appendix A. The FMG method can be viewed as a better-initialized version of the V-cycle multigrid, and it is taken as the preconditioner in a similar way.
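
To make these steps concrete, the following is a minimal two-level V-cycle applied as a preconditioner with a zero initial guess; it uses a damped Jacobi smoother in place of the Chebyshev smoother and standard 1D full-weighting/linear-interpolation transfer operators, so it is an illustrative sketch rather than the parallel implementation described here.

import numpy as np

def laplacian_1d(n):
    """Standard 1D Laplacian (Dirichlet) on n interior points, h = 1/(n + 1)."""
    h = 1.0 / (n + 1)
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def jacobi(A, b, x, sweeps=1, omega=2.0 / 3.0):
    D = np.diag(A)
    for _ in range(sweeps):
        x = x + omega * (b - A @ x) / D
    return x

def restrict(r):                # full weighting: fine (2m + 1) -> coarse (m)
    return 0.25 * r[0:-2:2] + 0.5 * r[1:-1:2] + 0.25 * r[2::2]

def prolong(e, n_fine):         # linear interpolation: coarse (m) -> fine (2m + 1)
    p = np.zeros(n_fine)
    p[1:-1:2] = e
    p[0:-2:2] += 0.5 * e
    p[2::2] += 0.5 * e
    return p

def vcycle_preconditioner(A_f, A_c, v):
    """One two-level V-cycle applied to v with zero initial guess: z = M^{-1} v."""
    x = jacobi(A_f, v, np.zeros_like(v))       # pre-smoothing
    r_c = restrict(v - A_f @ x)                # restrict the residual
    e_c = np.linalg.solve(A_c, r_c)            # direct solve on the coarse grid
    x = x + prolong(e_c, len(v))               # prolongate and correct
    return jacobi(A_f, v, x)                   # post-smoothing

n_f = 63
A_f, A_c = laplacian_1d(n_f), laplacian_1d((n_f - 1) // 2)
v = np.random.default_rng(2).standard_normal(n_f)
z = vcycle_preconditioner(A_f, A_c, v)
print(np.linalg.norm(v - A_f @ z) / np.linalg.norm(v))   # relative residual after one cycle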

In our implementation of the multigrid preconditioner, a sequential direct solver is used on the coarsest mesh, which means that collective communication is required at the coarsest layer. To avoid a large cost in both computation and communication, a proper number of layers should be chosen according to the mesh size. On the other hand, the quadratic-polynomial preconditioned Chebyshev smoother is applied for both pre-smoothing and post-smoothing, with one step of the local symmetric successive over-relaxation (SSOR) method as its preconditioner. The values of λ_max and λ_min, which are required in the Chebyshev iteration, are given by

where the upper Hessenberg matrix is obtained by applying the Arnoldi process (without preconditioning) to the system matrix, as in Algorithm 1. A small, fixed number of Arnoldi steps is used in our implementation for the eigenvalue estimate.
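
A rough sketch of this estimation step is given below: a few Arnoldi steps are applied to a symmetric positive definite matrix, and the extreme Ritz values of the resulting Hessenberg matrix are used as eigenvalue estimates. The number of steps, the random starting vector and the safety factor are illustrative choices, not those of our implementation.

import numpy as np

def arnoldi_eig_bounds(A, k=10, safety=1.1):
    """Estimate (lmin, lmax) of a symmetric positive definite A from k Arnoldi steps."""
    n = A.shape[0]
    rng = np.random.default_rng(0)
    V = np.zeros((n, k + 1)); H = np.zeros((k + 1, k))
    v0 = rng.standard_normal(n)
    V[:, 0] = v0 / np.linalg.norm(v0)
    for j in range(k):
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = w @ V[:, i]
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-14:
            k = j + 1
            break
        V[:, j + 1] = w / H[j + 1, j]
    ritz = np.linalg.eigvals(H[:k, :k]).real   # Ritz values approximate the spectrum
    return ritz.min(), safety * ritz.max()     # slightly inflate the upper bound

A = 2.0 * np.eye(200) - np.eye(200, k=1) - np.eye(200, k=-1)
print(arnoldi_eig_bounds(A))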

4 MPI based parallel implementation

The Message Passing Interface (MPI) is supported by most high-performance computing (HPC) platforms, and a variety of implementations are available. It supports parallel computing with thousands of cores, which fully exploits the power of modern clusters. MPI takes charge of the communication between different processes by sending and receiving messages. A group of processes is defined in a communicator, and each of them has a unique rank. A process communicates with another one via messages identified by the rank of the target process and a tag. Moreover, MPI supports collective communication, which, for example, broadcasts a message from one process to all other processes in an optimized way.

In our implementation, we partition the 3D grid into Cartesian subgrids of equal size, and each process only stores the portion of the solutions defined on one of the subgrids. To reduce the latency in communication, for each subgrid the number of cells in each direction is kept nearly equal. For example, suppose we have a uniform mesh of N×N×N cells, which is to be distributed to p×p×p processes. Then we decompose the entire grid into p×p×p subgrids, so that each process owns a cubic subgrid of size (N/p)×(N/p)×(N/p).
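
A minimal, self-contained sketch of such a Cartesian decomposition and of a single halo exchange with mpi4py is given below; the local block size and the two ghost layers are assumed for illustration, and the snippet is independent of the PETSc-based production code.

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
dims = MPI.Compute_dims(comm.Get_size(), 3)      # near-cubic process grid
cart = comm.Create_cart(dims, periods=[False] * 3)
coords = cart.Get_coords(cart.Get_rank())        # position of this process

nloc = 32                                        # assumed local cells per direction
# Local block with 2 ghost layers on each side, as needed by the MUSCL stencil.
ne = np.zeros((nloc + 4, nloc + 4, nloc + 4))

# Halo exchange towards the upper x neighbour (repeat for the other directions).
lo, hi = cart.Shift(0, 1)                        # ranks of the x neighbours
sendbuf = np.ascontiguousarray(ne[-4:-2, :, :])  # last two owned layers
recvbuf = np.empty_like(sendbuf)
cart.Sendrecv(sendbuf, dest=hi, recvbuf=recvbuf, source=lo)
if lo != MPI.PROC_NULL:
    ne[0:2, :, :] = recvbuf                      # fill the lower ghost layers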

Figure 4.1: Illustration of the communication pattern in the 2D case.

To apply the MUSCL scheme, every process needs to retrieve from its neighbouring processes the values of n_e and n_p on the two adjacent layers of cells, which requires communication between processes. For illustrative purposes, we only show the communication in the 2D case in Figure 4.1. There are 9 processes, labelled by their ranks, and we focus only on the 5th process, whose subgrid lies in the interior of the domain. To update the solutions on the top two rows of its subgrid using the MUSCL method, process 5 requires a stripe of cells of unknowns from process 2 (indicated by the dark yellow color). Similarly, stripes of unknowns from processes 4, 6 and 8 are required to update the solutions on some of the other light yellow grid cells. Conversely, to update the solutions in processes 2, 4, 6 and 8, four stripes of unknowns from process 5 need to be sent to those processes. Generally, for a 3D uniform grid, an interior process should receive two layers of cells of unknowns across each of its faces from the surrounding processes and send the same amount back to them. Therefore, the local communication/computation ratio can be characterized by the following number:

(4.1)

which shows that the cost of communication is one order of magnitude lower than that of computation for interior processes. We use ghost cells to handle the boundary conditions; therefore, the communication required for the processes handling the boundaries is less than that for the interior ones.

The communication for the potential is similar, but the stripes for communication are only one cell wide on most layers of the multigrid hierarchy, which is sufficient for assembling the sparse coefficient matrix from (2.17) and for performing matrix-vector multiplications. The only exception is the coarsest layer of the multigrid preconditioner, where a direct solver is used and gathering and broadcasting operations are needed, although only for a small amount of data.

5 Accuracy and efficiency test

Our code is developed based on the well-known Portable, Extensible Toolkit for Scientific Computation (PETSc [4]). PETSc contains data structures and routines for the scalable parallel solution of partial differential equations, and supports MPI parallelism. The simulations are performed on the cluster Tianhe2-JK located at the Beijing Computational Science Research Center. It has 514 computational nodes, each of which is equipped with two Intel Xeon E5-2660 v3 CPUs (10 cores, 2.6 GHz) and 192 GB of memory. The nodes are connected by a TH high-speed network interface. More details can be found at https://www.csrc.ac.cn/en/facility/cmpt/2015-05-07/8.html.

If not otherwise stated, the following setup for a double-headed streamer in a homogeneous field between two parallel planes at atmospheric pressure (760 Torr) is used to test the accuracy and efficiency of the different numerical solvers.

(5.1)

where , and . We take the parameters as in [10, 5]: cm(Vs) and cm(Vs), respectively; cm; cms.

The convergence criterion for the iterative elliptic solver is given by a tolerance on the relative residual, i.e., the iteration stops when

\[
\frac{\|b - A x^{(k)}\|_2}{\|b\|_2} \le \epsilon, \tag{5.2}
\]

where the tolerance ε is set to the same value in all the simulations in this paper.

5.1 Convergence and stability of the semi-implicit scheme

In this subsection, we first use a 1D example to show that the proposed semi-implicit scheme is second-order accurate in time, and then use a 3D example to show that the scheme is more stable than the explicit schemes.

5.1.1 Comparison of different temporal schemes in 1D case

For simplicity, the following model problem, which has a form similar to (1.1), is adopted for testing:

(5.3)

Dirichlet boundary conditions are applied at the two ends of the interval for the potential, while homogeneous Neumann boundary conditions are applied for the two densities. The model parameters and the initial value are fixed throughout this test. For all the calculations in this example, we fix the ratio of the time step to the grid size, and consider the convergence order in both time and space. The two semi-implicit temporal schemes introduced in Section 2.1 are implemented with the same spatial discretization, namely the finite volume discretization with unlimited linear reconstruction. The reference ("exact") solution is calculated by the second-order semi-implicit scheme with a sufficiently small time step. The numerical results are given in Tables 5.1 and 5.2.

3.1659e-4 8.3831e-5 2.1649e-5 5.5091e-6 1.3869e-6 3.4440e-7
order 1.9171 1.9532 1.9744 1.9900 2.0097
2.5302e-4 6.3421e-5 1.6027e-5 4.0407e-6 1.0129e-6 2.5099e-7
order 1.9962 1.9844 1.9879 1.9962 2.0127
Table 5.1: Errors of the second-order semi-implicit scheme (2.9)–(2.13) for the 1D test problem.
1.7405e-3 9.9820e-4 5.3935e-4 2.8098e-4 1.4348e-4 7.2511e-5
order 0.8021 0.8881 0.9408 0.9696 0.9846
1.5186e-3 8.8186e-4 4.8021e-4 2.5121e-4 1.2856e-4 6.5040e-5
order 0.7841 0.8769 0.9347 0.9665 0.9830
Table 5.2: Errors of the first-order semi-implicit scheme (2.6) for the 1D test problem.

The results in Tables 5.1 and 5.2 clearly demonstrate that the proposed semi-implicit scheme (2.9)–(2.13) is indeed second-order accurate in time, while the previously used semi-implicit scheme (2.6) is only first-order accurate in time, even though the same second-order spatial discretization is used.
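
As a quick sanity check, the observed orders in the tables are simply base-2 logarithms of the ratios of successive errors (assuming that the time step and the grid size are halved from one column to the next); for the first two entries of each table:

import math
# First two errors of Table 5.1 (second-order scheme) and Table 5.2 (first-order scheme).
print(math.log2(3.1659e-4 / 8.3831e-5))   # about 1.917, matching the reported 1.9171
print(math.log2(1.7405e-3 / 9.9820e-4))   # about 0.802, matching the reported 0.8021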

It is worth emphasizing that the proposed second-order semi-implicit scheme solves the elliptic equation only once at each time step, the same as the first-order semi-implicit scheme. To gain second-order accuracy, the only additional cost is an explicit stage for n_e and n_p at every time step, which is relatively cheap compared with an additional solve of the elliptic equation.

5.1.2 Study of stability in terms of the dielectric relaxation time restriction

To show that the semi-implicit scheme (2.9)–(2.13) is able to alleviate the dielectric relaxation time restriction, and is thus more stable than the explicit schemes, the calculation is performed up to a fixed final time on a cubic domain with a relatively coarse mesh.

For the proposed second-order semi-implicit scheme, the stability condition is

(5.4)

where E_x, E_y and E_z are the components of the electric field E, and the subscript "max" denotes the maximum value over all cells. However, we are unable to choose Δt directly according to (5.4), because Δt must be given before we solve (2.17) for the potential, while the new electric field can only be computed after the potential is obtained. Therefore, in our implementation, we alternatively choose a relatively small Δt at the first step, and then choose

(5.5)

For comparative purposes, in this section we replace the maximum value in (5.5) by an a priori estimated, sufficiently large value, and use fixed time steps satisfying (5.5) in the simulations.

As mentioned previously, for explicit methods the time step cannot exceed the dielectric relaxation time due to stability issues, while the semi-implicit scheme can take larger time steps. To test this property, we approximate the dielectric relaxation time by replacing the maximum value in (2.5) with a simpler surrogate. Four different time steps are used to test the stability. We consider a simulation as unstable if