Parallelizing Sequential Sweeping on Structured Grids – Fully Parallel SOR/ILU preconditioners for Structured n-Diagonal Matrices

Parallelizing Sequential Sweeping on Structured Grids – Fully Parallel SOR/ILU preconditioners for Structured n-Diagonal Matrices

R. Tavakoli Department of Material Science and Engineering, Sharif University of Technology, Tehran, Iran, P.O. Box 11365-9466, email: tav@mehr.sharif.edu, rohtav@gmail.com.
Abstract

There are variety of computational algorithms need sequential sweeping; sweeping based on specific order; on a structured grid, e.g., preconditioning (smoothing) by SOR or ILU methods and solution of eikonal equation by fast sweeping algorithm. Due to sequential nature, parallel implementation of these algorithms usually leads to miss of efficiency; e.g. a significant convergence rate decay. Therefore, there is an interest to parallelize sequential sweeping procedures, keeping the efficiency of the original method simultaneously. This paper goals to parallelize sequential sweeping algorithms on structured grids, with emphasis on SOR and ILU preconditioners. The presented method can be accounted as an overlapping domain decomposition method combined to a multi-frontal sweeping procedure. The implementation of method in one and two dimensions are discussed in details. The extension to higher dimensions and general structured n-diagonal matrices is outlined. Introducing notion of alternatively block upper-lower triangular matrices, the convergence theory is established in general cases. Numerical results on model problems show that, unlike related alternative parallel methods, the convergence rate and efficiency of the presented method is close to the original sequential method. Numerical results also support successful use of the presented method as a cache efficient solver in sequential computations as well.

d

September 25, 2019

September 25, 2019

omain decomposition, overlapping decomposition, parallel Gauss Seidel, parallel sweeping.

{AMS}

65F10, 65H10, 65N06, 65N22, 65Y05

1 Introduction

Structured grids are commonly used in scientific computing for the purpose of numerical solution of partial differential equations (PDEs). Despite the unstructured grid based methods which manage domain complexity in a more consistent manner, in certain applications using structured grids is still the best choice, mainly due to preference in terms of consumed memory, efficiency and the ease of implementation. Moreover, with the invention of immersed boundary methods with make possible to manage complex geometries on structured grids, there is recently a significant interest to application of structured grids in scientific computing. Progress in power of supercomputer changes the bias in favor of structured grids too, as they are usually more appropriate for parallel computing.

Some of numerical algorithms on structured grids use sequential sweeping; grid by grid procedure on a mandatory order. In fact, using the conventional version of sequential algorithms, it is not possible to perform computations simultaneously. This limit us to use full power of multi-processor shared and/or distributed memory machines.

Since the advent of parallel computers, several parallel versions of the SOR method have been developed. Most variants of these algorithms have been developed by using multi-color ordering schemes or domain decomposition techniques [adams1982mcs, oa1985lar, adams1986scb, neumann1987cpm, white1987mpi, melhem1987tei, melhem1988mrs, ashcraft1988vif, block1990bcs, ortega1991ocg, ii1993oma, stotland1997opc, xie1999nps]. In contrast to domain decomposition method, a multi-color SOR method has usually a faster convergence rate, but is usually more difficult to solve elliptic boundary value problems in complex domains [xie2006nbp]. To improve the convergence rate of a domain decomposition algorithms, a novel mesh partition strategy was proposed in [xie1999nps] which led to an efficient parallel SOR. This method has the same asymptotic rate of convergence as the Red-Black SOR [xie1999nps]. Although, multi-color SOR algorithms are also usually implemented based on a domain decomposition strategy, the Xie and Adams’s algorithm [xie1999nps] takes less inter-processor data communication time than a multi-color SOR. The blocked version of the Xie and Adams’s parallel SOR algorithm [xie1999nps] is suggested in Xie [xie2006nbp]. According to presented results, the rate of convergence and parallel performance of this version is better than that of the non-blocked version.

The main drawback of the above mentioned parallel versions of SORs is lower rate of convergence in contrast to original one. This is mainly due to enforced decoupling between equations to provide some degree of concurrency in computations. For example in the m-color SOR method, an increase in the number of colors decreases the rate of convergence. So a key to achieve an efficient parallel SOR method is to minimize the degree of decoupling due to parallelization.

Other techniques such as the pipelining of computation and communication and an optimal schedule of a feasible number of processors are also applied to develop parallel SOR methods when the related coefficient matrix is banded [missirlis1987spi, robert1988csp, eisenstat1989csp, niethammer1989smp, comput1995pgs, amodio1996pis]. These techniques can isolate parts of the SOR method which can be implemented in parallel without changing the sequential SOR. The convergence rate of these methods are same as the original one but the application is limited to banded-structure matrices.

The present goals to suggest a new method to parallelize sequential sweeping on structured grids. The specific focus is on the SOR method which can be regarded as an iterative preconditioner, smoother or even a linear solver. Without loss of generality, the later case is taken into account in this study to simplify arguments. Since there is a direct correspondence between SOR and ILU preconditioners on structured grids, the presented method can be used to develop parallel ILU methods as well.

The rest of this paper is organized as follows. Sections 2 and 3 present the implementation of method in one and two spatial dimensions respectively. The extension to three spatial dimensions is commented in Section 4. he Considering the connection between SOR and ILU preconditioners on structured grids, Section 5 argues the application of presented approach to develop ILU(p) preconditioners on structured grids. The convergence theory is established in Section 6. Section 7 extend utility of the presented method to general structured n-diagonal matrices. Section 8 is devoted to numerical results on the convergence and performance of the method. Finally Section 9 close this paper by summarizing results and outlining possible application and extension of the presented method.

2 Parallel SOR sweeping in one-dimension

Consider the following scaler 1D elliptic PDE:

(1)

where , , is , . Also, it is assumed field variables , and posses the sufficient regularity required by the applied numerical method to ensure the existence and uniqueness of the discrete solution. Assume the spatial domain is discrtized into an intervals with grid points and , where .

Without loss of generality; for the ease of presentation; we discrtize (1) with a second order finite difference method as follows:

(2)

where subscript denotes the value of field variable at (e.g. ),

. Using boundary conditions and , it is possible to write above equations in the following matrix form,

(3)

where and . It is evident that matrix is irreducibly diagonally dominant which ensure the convergence of stationary iterative methods, like SOR. The SOR method using the natural row-wise ordering (left to right sweeping) generates the following sequence of iterations from a given initial guess ,

(4)

where is left to right (LR) sweeping relaxation factor and superscript denotes the iteration number. The other alternative SOR method is resulted by reversing the sweeping direction (right to left sweeping):

(5)

where is the right to left (RL) sweeping relaxation factor. The application of (4) and (5) alternatively during each two consecutive iterations leads to the classical symmetric SOR (SSOR) method. It is clear that these procedures are sequential, in the sense that update of grid point should be performed after its left (or right) neighbor. In fact, it is not possible to perform relaxation of more than one grid point at each time in the original version of SOR method.

The goal of parallel implementation of SOR method is to provide possibility of concurrent computation, simultaneously keep the convergence rate of original SOR method as much as possible. The concept of domain decomposition is used here to parallelize the SOR method. Assume spatial domain is decomposed into number of non-overlapping sub-domains, i.e., , and (splitting of grid points).

For the purpose of convenience, we first describe our method for two (approximately equal) sub-domains. As illustrated in figure 1, discrtized spatial domain is decomposed into sub-domains and , where and . If (4) is used in and the computation begins from the left boundary, and simultaneously, (5) is used in and the calculation is performed from the right boundary toward left direction, then the computation of sub-domains will be independent during the first iteration. In our algorithm during the next iteration (5) and (4) are used respectively in and ; and the computations are started from grid points and in the corresponding sub-domains. Therefore, we have the following equation at grid point ,

(6)

while the following equation should be used at grid point ,

(7)

6 and (7) can be written in the following matrix form:

(8)

The above system of equation can be easily inverted to compute and . Computing the new values of grid points and based on the mentioned procedure, the computations for the remaining grid points will be performed trivially according to the corresponding update formulae. These sequence of computation is repeated for each pair of consecutive iterations until the convergence criteria is satisfied. The flow chart of this procedure is displayed in figure 1. In this plot and symbols are used to denote the application of (4) and (5) respectively; also symbol is used to denote the application of (8).

Figure 1: The diagram of the parallel implementation of the presented SOR method for two sub-domains.

The extension of this algorithm to more than two sub-domains is a straightforward job which can be described as follow (cf. figure 2):

  1. Division of the spatial computational domain into desired number of sub-domains, based on desired load balancing constraints.

  2. Determination of the sweeping direction for each sub-domain. Sweeping direction of each sub-domain should be in opposite direction of its neighbors. For example we use LR direction for odd sub-domains and RL direction for even sub-domains. These sweeping directions are inverted after each iteration.

  3. Updating the starting node of each sub-domain with (8) and the remained nodes with either of (4) or (5), based on the corresponding sweeping direction. If the starting node be located at the physical boundaries, the prescribed boundary value will be used there (there is no need to use (8) in this case).

Figure 2: The diagram of the parallel implementation of the presented SOR method in one-dimension for multiple number of sub-domains.

Now let’s to look at the properties of the presented parallel algorithm and its connection to other domain decomposition methods. It is obvious that this algorithm is an overlapping domain decomposition method which changes the overlap regions area alternatively. As the method has some similarities to overlapping Schwarz algorithm, lets to mention some differences of our algorithm with the Schwarz method. In contrast to the Schwarz method the size of overlap regions here are changes alternatively. Sub-system of equations are only solved exactly at overlapped regions (of course alternatively); and within internal regions of sub-domains, equations are solved approximately (by one-pass SOR iteration). In Schwarz algorithm sub-domain boundaries are treated as virtual boundary conditions of Dirichlet, Neumann, Robin or mixture of these types. When relaxation factors are equal to unity (Gauss Seidel iterations), our numerical treatment for sub-domain boundaries are equivalent to Neumann-type (flux) boundary conditions, in the other cases our internal boundary conditions are equivalent to a mixture of Dirichlet and Neumann ones. In the later case, the relaxation factors and plays role of blending factors. In fact our method tries to keep the partial coupling between sub-domain during iterations, in expense of a negligible computational cost. It is worth mentioning that due to alteration of sweeping directions we expect good smoothing properties of the presented method. Later we will observe that how such enforced local coupling leads to competitive convergence rate of the presented method in contrast to that of original SOR.

Note that the procedure for higher order stencils or other numerical methods are conceptually the same. We avoid further remarks in this regard to save the space.

3 Parallel SOR sweeping in two-dimensions

In this section we shall extend the presented parallel sweeping algorithm for two-dimensional problems. We consider the following scaler 2D elliptic PDE:

(9)

where , , is diagonal matrix with diagonal entries , . For the sake of convenience, it is assumed that . Same as previous, sufficient regularities are assumed for the field variables.

Suppose that is divided into and Cartesian grids along and directions respectively. The grid points are given by and ; where , , , and . Same as previous, the finite difference approximation of (9) is used here to describe the presented method (with second order central differencing scheme). Higher order finite difference methods as well as other numerical methods (like FVM, FEM) can also be employed. The only requirement of our method is discretization of PDE on a structured grid. We use to denote the finite difference approximations of . The approximation of (9) with the mentioned scheme has the following form,

(10)

for and ; where ,

and the mid grid point coefficient are computed by harmonic averaging, e.g.,

Using the corresponding boundary conditions,

Same as previous, it is possible to write above equations in a matrix form. However, the presented method is matrix-free and there is no need to write equations explicitly in a matrix form at this point.

There are variety of sweeping direction to solve (10) with SOR method. Considering a rectangular computational domain, it is possible to start sweeping procedure from one corner toward its corresponding opposite corner. Therefore, there is four sweep directions. Using geographic directions (east, west, north, south), it is possible to show these directions by SW-to-NE, NE-to-SW, SE-to-NW and NW-to-SE. There are various sweeping strategy for each of mentioned directions. These strategies are shown in figure 3 for SW-to-NE, NE-to-SW directions on a structured grid. Every sweeping strategy should preserve causality; should be performed along a causal direction. The causality here means that order of update should be such that new values of at least one half of neighboring nodes of the current grid should be known prior to its update. It is possible to use distinct relaxation factors along each sweeping direction. We show relaxation parameter of each direction by subscript according to its starting corner, i.e., , , , .

Figure 3: Possible sweeping strategies for SW-to-NE (bottom row), NE-to-SW (top row) directions on a structured grid: order of updating procedure in natural row-wise (a, e), symmetric row-wise (b, f) and frontal (c, d, g, h) sweeping strategy.

The SOR updating (from iteration to ) formulae for grid point has one of the following formulae,

(11)
(12)
(13)
(14)

It is clear that due to required causality condition, the above procedures are sequential in nature. To parallelize this sequential procedure, we first split the computational domain into (almost) equally-size non-overlapping sub-domains (based on load balancing constraints). To describe our algorithm, we consider a Cartesian decomposition in this section, i.e., where along are number of sub-domains along and spatial axes respectively. For the sake of convenience, it is assumed that is decomposed into a array of sub-domains (cf. figure 4).

Figure 4: Decomposition of a Cartesian mesh into array of sub-domains.

As it is implied in previous, a key to achieve a parallel SOR with a little convergence rate decay is to minimize degree of decoupling due to parallelizing, as much as possible. In the original SOR when a point is updated the remaining points sense its new value due to causality of sweeping strategy.

Figure 5: Decomposition of a Cartesian mesh into array of sub-domains: sweeping directions and order of update are shown during two consecutive iterations, coupling between nodes are shown by shading.

To keep the causality of sweeping partially, a multi-frontal strategy is employed here for the purpose of parallelization. Consider the mentioned Cartesian splitting of the computational domain. According to figure 5, we use frontal sweeping along directions, , , and for sub-domains 1 though 4 respectively. The order of updating is shown in plots. It is clear that to update node #1 of each sub-domain, we need two unknown values (new values) from two neighbor sub-domains. A shaded region in figure show this coupling in iteration -th. Therefore a local system of equation should be solved for this purpose. Assuming node #1 of (at iteration -th, left side of figure) is denoted by and using global index, this local system of equation has the following form (application of (12), (14), (13) and (11) for though respectively),

(15)

This system of equation is non-singular and can be easily inverted. After update of the corner node (node 1), we should update nodes on (virtual) boundaries of sub-domains. As it is shown in figure 5), node #2 of each sub-domain is coupled to node #2 of a neighbor sub-domain. Assuming node #1 of is denoted by and using global index, using (12) and (13)the following system of equations should be solved to compute new values of node #2 in and , in fact and (cf. 5),

(16)

Note that the new values of and are located in right hand side of (3) as they are known after solution of (3). In the same manner other remaining boundary nodes will be updated. Then, the internal nodes will be updated using a classic SOR method along the corresponding sweeping direction.

The next sweep (iteration -th) for the mentioned decomposition will be performed along the corresponding opposite directions, i.e., , , and for sub-domains 1 though 4 respectively (cf. figure 5). In this specific example there is no need to solution some local system of equations for this sweep; but in general one may usually needs to solve such systems at coupling points of sub-domains which are determined based on sweeping directions. Figure 6 shows the alteration of sweeping directions during each four consecutive iterations for a array of sub-domains. In the next following sub-sections some implementation issues of our algorithm will be discussed in details.

Figure 6: Alternation of sweeping directions during 4 consecutive iterations for array of sub-domains.

As a final remark in this section it should be mentioned that the extension of our algorithm for nine-point or higher order central stencils is straightforward. As the extend of computational molecule is increased in a higher order method, the size of local systems to be solved will be also increased accordingly. Following the same strategy, it would be possible to apply the presented method for other numerical methods, e.g., FVM and FEM methods, on a structured grid. Note that every structured grid has a one-to-one correspondence with with a Cartesian grid. Therefore, the presented method can be directly used for non-Cartesian structured grids too. To save the space, further details in this regards are ignored in this study.

3.0.1 Domain decomposition

The first aspect of our parallel algorithm is to divide the computational grid into parts. This is how the full computational task is divided among the various processors. Each processor works on a portion of the grid and anytime a processor needs information from another one a message will be passed (in message passing protocol; no message passing is needed regarding to shared-memory machines).

To achieve a good parallel performance, we like to have an optimal load balancing and the least communication between processors. Consider the load balancing first. Assuming a homogeneous parallel machine, one would like that each processor does the same amount of work during every iteration. That way, each processor is always working and not sitting idle. It makes sense to partition the grid such that each processor gets an equal (or nearly equal) number of nodes to work on. The second criterion, needs to minimize the number of edge cuts and the degree of adjacency for each sub-domain (number of neighbors processor). So the domain decomposition could be converted to a constrained optimization problem in which the communication cost to be minimized under the load balancing constraint. In the case of regular Cartesian grid, this optimization problem can be easily solved within a negligible cost.

The cost of communication is composed from two types of elapsed time, one proportional to the communicated data size (function of network communication speed) and one due to the network latency which is independent from the data size. Therefore, dealing with a high latency networks, we may prefer to decrease the degree of adjacency in our communication graph; in expense of a higher size of communicated data size. On the other hand, dealing with low latency networks, we may neglect the effect of network latency. It is worth mentioning that, using an overlapped communication and computation strategy, it will be possible to decrease cost of communication.

The above discussion is correct in general for every parallel algorithms. However, one should consider other criteria when the convergence rate of the parallel algorithm is a function domain decomposition topology (number of sub-domains and their geometries).

In the present study increasing number of sub-domains along each spatial direction is equivalent to decreasing the degree of coupling in that direction. Rough speaking, it usually decreases the convergence rate. However, this conclusion is not correct in general. In some cases, a limited degree of decoupling not only decreases the convergence rate but also improves it. As an example consider the solution of Poisson equation in a square domain. Also, assume that the center of domain is the center of symmetry for the exact solution. Now, let’s to decompose the domain into equal sub-domains. Due to mentioned symmetry, the parallel algorithm performs superior than the original SOR method (as it is more consistent to mentioned symmetry in actual solution). This example implies that decoupling does not always decrease the convergence rate. Therefore, one may expect that an algorithm with partial coupling (like that of ours) performs even superior than the original SOR in practice.

3.0.2 Sweeping directions

After the domain decomposition, it is essential to determine the sweeping directions for each sub-domain during 4 consecutive iterations. For this purpose it is sufficient to use the following rules (lie 1D version of algorithm):

  • Along each spatial direction the sweeping direction of a sub-domain should be in opposite direction of its neighbor sub-domains (cf. figure 6).

  • The sweeping directions (along all spatial directions) should be reversed during every pair of iterations (cf. figure 6).

  • After each pair of iteration, the sweeping will be performed along a new diagonal, e.g., after two consecutive iterations along diagonal -, the direction of sweeping for the next two iteration will be changed to - diagonal (cf. figure 6).

Using the above mentioned rules, the sweeping directions are generated automatically without any complexity, e.g., figure 6 is produced applying the above mentioned rules within a nested loop using a few lines of Tikz111www.texample.net/tikz/ scripts.

3.0.3 Data structure

In the sequential SOR a two-dimensional array is sufficient to store the global data. This needs a nested loop to sweep the hole domain. Since in the presented parallel SOR method, we do not have a regular order of updating for sub-domains boundary nodes, it is preferable to store order of update for these nodes. To achieve better performance every data (mentioned indexes and field variables) are stored in a one-dimensional array (sometimes called as vectorization). In fact in this way we have a one-to-one mapping from 2D indexing to 1D indexing, e.g., node is stored at location in the one-dimensional array. In this way, the location of nodes and will be the location of node plus 1 and values respectively.

To update nodes which are located at boundaries between neighbor sub-domains, the data from the corresponding neighbor sub-domains is required. Therefore, we add two rows/columns of halo points around each sub-domain which are renewed (via communication) prior to a new iteration (in the case of higher order schemes, the width of halo region should be increased accordingly). In the present implementation every sub-domain has at most 8 neighbors, i.e., 4 first-degree neighbors (contacted via an edge), 4 second-degree neighbors (contacted via a corner). Figure 7 shows schematically the communicated data during to and to sweeps for a sub-domain with 8 active neighbors. In this figure, yellow-shaded regions contain native nodes and gray-shaded regions includes required data which should be communicated. It is clear that it is possible to overlap a portion of communications with computations to improve the performance.

Figure 7: Schematic of data communication during frontal sweeping to (left) and to (right) for a sub-domain with 8 active neighbors: yellow shaded regions contain native nodes and gray shaded regions include communicated nodes.

4 Extension to three-dimensions

Following the geometric procedure discussed in the previous section, it is straightforward to extend the presented method to three-dimensional (3D) cases. The main ingredients of such extensions are briefly addressed here.

In 3D case, there are eight alternatives frontal sweeping directions. Adding front and back directions to mentioned geographic directions (east, west, north, south), these eight sweep directions are as follows: -to-, -to-, -to-, -to-, -to-, -to-, -to- and -to-. There are eight relaxation parameters accordingly, , , , , , , and . After the domain decomposition, the sweeping directions are determined for each domain for each eight consecutive iterations using mentioned rules in 3.0.2. Using a Cartesian decomposition topology, each sub-domain will has at most 26 neighbors, i.e., 6 first-degree neighbors (contacted via a face), 12 second-degree neighbors (contacted via an edge), 8 third-degree neighbors (contacted via a corner).

Same as 2D version of algorithm, some local system of equations are needed to be solved at sub-domains coupling nodes. Now, consider a sub-domain with 26 active neighbor. At starting corner, a system of linear equations should be solved. Note that this system is not dense (includes 32 none-zero entries). Then, along each of the three coupling edges, a sort of local system of linear equations should be solved in an appropriate order. In the same manner, on each of the three corresponding coupling faces a sort of local systems of equations should be solved based on an appropriate update order. Updating coupling nodes, the remaining nodes are updated using classic SOR method along the corresponding sweep direction.

5 Parallel incomplete LU preconditioners

The SOR method does not usually used as an iterative solver in practice. But it is commonly used as a smoother in multi-grid methods or as a preconditioner in Krylov subspace methods. In these cases, a few iterations of SOR method is applied during each step of preconditioning (or smoothing). Another kind of preconditioners which are extensively used in Krylov subspace methods are incomplete LU (ILU) preconditioners. However due to sequential nature of incomplete factorization and also forward/backward subsituation procedure, application of ILU methods in parallel is challenge.

In this section, we show that the presented parallel method can be equivalently used to develop parallel ILU preconditioners. A simple observation shows that single pass of a symmetric Gauss-Seidel method on a n-diagonal matrix (precisely a matrix includes only a few non-zero diagonals with symmetric sparsity pattern) is equivalent to ILU(0) preconditioning (cf. section 10.3 of [saad2003ims]). Therefore, the presented parallel algorithm can be used as a parallel ILU(0) preconditioner as well. Similarly, ILU(p) preconditioning for such structured matrices is equivalent to application of our method for a higher order method (cf. section 10.3 of [saad2003ims]) which make sense to use our method to parallelize ILU(p) preconditioners as well.

Later, we will show that our method can be applied for n-diagonal matrices raised from discretization of PDEs on structured grids. Therefore, the presented strategy can be also regarded as a parallel ILU(p) preconditioner for such n-diagonal matrices.

6 Convergence analysis

In this section we first proof the convergence of the presented parallel algorithm in one- and two- spatial dimensions. Following the same line of proof, the extension of theory to higher dimension will be straightforward.

6.1 Mono-dimensional convergence analysis

To simplify the analysis, it is worth to write the presented method in an algebraic form. Without loss of generality, assume that the spatial domain is decomposed into number of sub-domains where is an even integer and all sub-domains include number of grid points (note that boundary nodes are not considered here, i.e., ). Also, assume that the sweeping direction of the first sub-domain is at the starting iteration. We can decompose matrix in (3) as follows,

(17)

where denotes an identity matrix here,

for ,

for ,

Note that matrices and should be assembled in and such that their positive diagonal match to the main diagonal of and . Assume that eigenvalues of matrices and are denoted by vectors and respectively. The following Lemma determines and explicitly.

{lemma}

and are strictly positive definite and we have,

{proof}

Considering the structures of and , it is easy to verify that and are block lower and upper triangular matrices. Note that the definition of both of the block lower triangular and upper triangular matrix hold for each of and . Therefore, their corresponding eigenvalues are equal to union of their diagonal blocks’s eigenvalues. The Diagonal blocks have a simple bi-diagonal structure and their eigenvalues are equal to their main diagonals. Considering the fact that that completes the proof.

{theorem}

The presented parallel one-dimensional algorithm is convergent if

{proof}

Considering the decomposition (17) of matrix , with simple algebra, we can write the presented one-dimensional parallel iterative algorithm in the following matrix form ,

(18)

where . From (18) we have,

(19)

in which,

(20)

To proof the convergence of iterations (19) it is remained to show that the spectral radius of iteration matrix which is denoted by here is strictly less than unity (cf. [saad2003ims]). Now, let’s to define the matrix which is equivalent to (has the same spectral radius),

(21)

Using (6.1), can be written as follows,

(22)

Let’s to define the following matrices

Assume eigenvalues of , , and are denoted by the following vectors respectively,

It is easy to show that,

Using the knowledge of linear algebra we have,

where the norm operator is understood as the Euclidean norm here,

Therefore, if,

which complete the proof.

{corollary}

The presented one-dimensional parallel Gauss-Seidel algorithm () is unconditionally convergent.

{corollary}

When the convergence criterion of the presented one-dimensional parallel SOR algorithm is similar to that of sequential SOR method, i.e., . Moreover in this case.

{remark}

Using the same procedure, it will easy to show that the coefficient matrix corresponding to a high order methods (includes more extended computational molecules) admits a decomposition like to (17), where and are two strictly positive definite block lower and upper triangular matrices respectively. Then, it will be straightforward to proof the convergence similarly.

6.2 Two-dimensional convergence analysis

Following the previous section, it is possible to proof the convergence of the presented method in two spatial dimensions. To illustrate this issue, the convergence of the presented two-dimensional algorithm for discrtized equation (10) will be discussed in this section.

Extension of mono-dimensional proof to higher dimensions is based on tensor product properties of the structured grids. The standard five-point discretization of (9) problem on an structured grid (ignoring Dirichlet boundary nodes) leads to the following system of linear equation (the effects of Dirichlet boundary nodes are included in the right hand side vector),

where strictly positive definite dimensional matrix has the following structure (row-wise ordering is assumed for grid points),

where each of , and is an multi-diagonal (three or mono-diagonal) matrices. For ,