Rectangular Full Packed Format for Cholesky’s Algorithm: Factorization, Solution and Inversion
Abstract
We describe a new data format for storing triangular, symmetric, and Hermitian matrices called RFPF (Rectangular Full Packed Format). The standard two dimensional arrays of Fortran and C (also known as full format) that are used to represent triangular and symmetric matrices waste nearly half of the storage space but provide high performance via the use of Level 3 BLAS. Standard packed format arrays fully utilize storage (array space) but provide low performance as there is no Level 3 packed BLAS. We combine the good features of packed and full storage using RFPF to obtain high performance via using Level 3 BLAS as RFPF is a standard full format representation. Also, RFPF requires exactly the same minimal storage as packed format. Each LAPACK full and/or packed triangular, symmetric, and Hermitian routine becomes a single new RFPF routine based on eight possible data layouts of RFPF. This new RFPF routine usually consists of two calls to the corresponding LAPACK full format routine and two calls to Level 3 BLAS routines. This means no new software is required. As examples, we present LAPACK routines for Cholesky factorization, Cholesky solution and Cholesky inverse computation in RFPF to illustrate this new work and to describe its performance on several commonly used computer platforms. Performance of LAPACK full routines using RFPF versus LAPACK full routines using standard format for both serial and SMP parallel processing is about the same while using half the storage. Performance gains are roughly one to a factor of 43 for serial and one to a factor of 97 for SMP parallel times faster using vendor LAPACK full routines with RFPF than with using vendor and/or reference packed routines.
G.1.3Numerical AnalysisNumerical Linear Algebra – Linear Systems (symmetric and Hermitian) \categoryG.4Mathematics of ComputingMathematical Software \termsAlgorithms, BLAS, Performance, Linear Algebra Libraries {bottomstuff} Authors’ addresses: F.G. Gustavson, IBM T.J. Watson Research Center, Yorktown Heights, NY-10598, USA, email: fg2@us.ibm.com; J. Waśniewski, Department of Informatics and Mathematical Modelling, Technical University of Denmark, Richard Petersens Plads, Building 321, DK-2800 Kongens Lyngby, Denmark, email: jw@imm.dtu.dk; J.J. Dongarra, Electrical Engineering and Computer Science Department, University of Tennessee, 1122 Volunteer Blvd, Knoxville, TN 37996-3450, USA, email: dongarra@eecs.utk.edu; Julien Langou, Department of Mathematical and Statistical Sciences, University of Colorado Denver, 1250, 14th Street – Room 646, Denver, Colorado 80202, USA, email: julien.langou@ucdenver.edu.
1 Introduction
A very important class of linear algebra problems deals with a coefficient matrix that is symmetric and positive definite [Dongarra et al. (1998), Demmel (1997), Golub and Van Loan (1996), Trefethen and Bau (1997)]. Because of symmetry it is only necessary to store either the upper or lower triangular part of the matrix .
1.1 LAPACK full and packed storage formats
The LAPACK library [Anderson et al. (1999)] offers two different kinds of subroutines to solve the same problem: POTRF111Four names SPOTRF, DPOTRF, CPOTRF and ZPOTRF are used in LAPACK for real symmetric and complex Hermitian matrices [Anderson et al. (1999)], where the first character indicates the precision and arithmetic versions: S – single precision, D – double precision, C – complex and Z – double complex. LAPACK95 uses one name LA_POTRF for all versions [Barker et al. (2001)]. In this paper, POTRF and/or PPTRF express, any precision, any arithmetic and any language version of the PO and/or PP matrix factorization algorithms. and PPTRF both factorize symmetric, positive definite matrices by means of the Cholesky algorithm. A major difference in these two routines is the way they access the array holding the triangular matrix (see Figures 1 and 2).
In the POTRF case, the matrix is stored in one of the lower left or upper right triangles of a full square matrix ([Anderson et al. (1999), pages 139 and 140] and [IBM (1997), page 64])222In Fortran column major, in C row major., the other triangle is wasted (see Figure 1). Because of the uniform storage scheme, blocked LAPACK and Level 3 BLAS subroutines [Dongarra et al. (1990b), Dongarra et al. (1990a)] can be employed, resulting in a fast solution.
In the PPTRF case, the matrix is stored in packed storage ([Anderson et al. (1999), pages 140 and 141], [Agarwal et al. (1994)] and [IBM (1997), pages 74 and 75]), which means that the columns of the lower or upper triangle are stored consecutively in a one dimensional array (see Figure 2). Now the triangular matrix occupies the strictly necessary storage space but the nonuniform storage scheme means that use of full storage BLAS is impossible and only the Level 2 BLAS packed subroutines [Lawson et al. (1979), Dongarra et al. (1988)] can be employed, resulting in a slow solution.
To summarize: LAPACK offers a choice between high performance and wasting half of the memory space (POTRF) versus low performance with optimal memory space (PPTRF).
1.2 Packed Minimal Storage Data Formats related to RFPF
Recently many new data formats for matrices have been introduced for improving the performance of Dense Linear Algebra (DLA) algorithms. The survey article [Elmroth et al. (2004)] gives an excellent overview.
Recursive Packed Format (RPF) [Andersen et al. (2001), Andersen et al. (2002)]:
A new compact way to store a triangular, symmetric or Hermitian matrix called Recursive
Packed Format is described in [Andersen et al. (2001)] as are novel ways to transform
RPF to and from standard packed format. New algorithms, called Recursive
Packed Cholesky (RPC) [Andersen et al. (2001), Andersen et al. (2002)] that operate on the RPF
format are presented. RPF format operates almost entirely by calling
Level 3 BLAS GEMM [Dongarra et al. (1990b), Dongarra et al. (1990a)] but requires variants of algorithms
TRSM and SYRK [Dongarra et al. (1990b), Dongarra et al. (1990a)] that are designed t work on RPF. The authors call
these algorithms RPTRSM and RPSYRK [Andersen et al. (2001)] and find that they do most of
their FLOPS by calling GEMM [Dongarra et al. (1990b), Dongarra et al. (1990a)]. It follows that almost all
of execution time of the RPC algorithm is done in calls to GEMM.
There are three advantages of this storage scheme compared to traditional packed
and full storage. First, the RPF storage format
uses the minimum amount of storage required for symmetric, triangular, or
Hermitian matrices. Second, the RPC algorithm is a Level 3 implementation of
Cholesky factorization. Finally, RPF requires no block size tuning parameter.
A disadvantage of the RPC algorithm was that it had a high recursive calling overhead.
The paper [Gustavson and
Jonsson (2000)] removed this overhead and added other novel features
to the RPC algorithm.
Square Block Packed Format (SBPF) [Gustavson (2003)]: SBPF is described in Section 4 of [Gustavson (2003)]. A strong point of SBPF is that it requires minimum block storage and all its blocks are contiguous and of equal size. If one uses SBPF with kernel routines then data copying is mostly eliminated during Cholesky factorization.
Block Packed Hybrid Format (BPHF) [Andersen et al. (2005), Gustavson et al. (2007)]: We consider an efficient implementation of the Cholesky solution of symmetric positive-definite full linear systems of equations using packed storage. We take the same starting point as that of LINPACK [Dongarra et al. (1979)] and LAPACK [Anderson et al. (1999)], with the upper (or lower) triangular part of the matrix being stored by columns. Following LINPACK [Dongarra et al. (1979)] and LAPACK [Anderson et al. (1999)], we overwrite the given matrix by its Cholesky factor. The paper [Andersen et al. (2005)] uses the BPHF where blocks of the matrix are held contiguously. The paper compares BPHF versus conventional full format storage, packed format and the RPF for the algorithms. BPF is a variant of SBPF in which the diagonal blocks are stored in packed format and so its storage requirement is equal to that of packed storage.
We mention that for packed matrices SBPF and BPHF have become the format of choice for multicore processors when one stores the blocks in register block format [Gustavson et al. (2007)]. Recently, there have been many papers published on new algorithms for multicore processors. This literature is extensive. So, we only mention two projects, PLASMA [Buttari et al. (2007)] and FLAME [Chan et al. (2007)], and refer the interested reader to the literature for additional references.
In regard to other references on new data structures, the survey article [Elmroth et al. (2004)] gives an excellent overview. However, since 2005 at least two new data formats for Cholesky type factorizations have emerged, [Herrero (2006)] and the subject matter of this paper, RFPF [Gustavson and Waśniewski (2007)]. In the next subsection we highlight the main features of RFPF.
1.3 A novel way of representing triangular, symmetric, and Hermitian matrices in LAPACK
LAPACK has two types of subroutines for triangular, symmetric, and Hermitian matrices called packed and full format routines. LAPACK has about 300 these kind of subroutines. So, in either format, a variety of problems can be solved by these LAPACK subroutines. From a user point of view, RFPF can replace both these LAPACK data formats. Furthermore, and this is important, using RFPF does not require any new LAPACK subroutines to be written. Using RFPF in LAPACK only requires the use of already existing LAPACK and BLAS routines. RFPF strongly relies on the existence of the BLAS and LAPACK routines for full storage format.
1.4 Overview of the Paper
First we introduce the RFPF in general, see Section 2. Secondly we show how to use RFPF on symmetric and Hermitian positive definite matrices; e.g., for the factorization (Section 3), solution (Section 4), and inversion (Section 5) of these matrices. Section 6 describes LAPACK subroutines for the Cholesky factorization, Cholesky solution, and Cholesky inversion of symmetric and Hermitian positive definite matrices using RFPF. Section 7 indicates that the stability results of using RFPF is unaffected by this format choice as RFPF uses existing LAPACK algorithms which are already known to be stable. Section 8 describes a variety of performance results on commonly used platforms both for serial and parallel SMP execution. These results show that performance of LAPACK full routines using RFPF versus LAPACK full routines using standard format for both serial and SMP parallel processing is about the same while using half the storage. Also, performance gains are roughly one to a factor of 43 for serial and one to a factor of 97 for SMP parallel times faster using vendor LAPACK full routines with RFPF than with using vendor and/or reference packed routines. Section 9 explains how some new RFPF routines have been integrated in LAPACK. LAPACK software for Cholesky algorithm (factorization, solution and inversion) using RFPF has been released with LAPACK-3.2 on November 2008. Section 10 gives a short summary and brief conclusions.
2 Description of Rectangular Full Packed Format
We describe Rectangular Full Packed Format (RFPF). It transforms a standard Packed Array AP of size to a full 2D array. This means that performance of LAPACK’s [Anderson et al. (1999)] packed format routines becomes equal to or better than their full array counterparts. RFPF is a variant of Hybrid Full Packed (HFP) format [Gunnels and Gustavson (2004)]. RFPF is a rearrangement of a Standard full format rectangular Array SA of size LDA*N where . Array SA holds a triangular part of a symmetric, triangular, or Hermitian matrix of order . The rearrangement of array SA is equal to compact full format Rectangular Array AR of size and hence array AR like array AP uses minimal storage. (The specific values of LDA1 and N1 can vary depending on various cases and they will be specified later during the text.) Array AR will hold a full rectangular matrix obtained from a triangle of matrix . Note also that the transpose of the rectangular matrix resides in the transpose of array AR and hence also represents . Therefore, Level 3 BLAS [Dongarra et al. (1990b), Dongarra et al. (1990a)] can be used on array AR or its transpose. In fact, with the equivalent LAPACK algorithm which uses the array AR or its transpose, the performance is slightly better than standard LAPACK algorithm which uses the array SA or its transpose. Therefore, this offers the possibility to replace all packed or full LAPACK routines with equivalent LAPACK routines that work on array AR or its transpose. For examples of transformations of a matrix to a matrix see the figures in Section 6.
RFPF is closely related to HFP format, see [Gunnels and Gustavson (2004)], which represents as the concatenation of two standard full arrays whose total size is also . A basic simple idea leads to both formats. Let be an order symmetric matrix. Break into a block 2–by–2 form
(1) |
where and are symmetric. Clearly, we need only store the lower triangles of and as well as the full matrix when we are interested in a lower triangular formulation.
When is even, the lower triangle of and the upper triangle of can be concatenated together along their main diagonals into a –by– dense matrix (see the figures where N is even in Section 6). This last operation is the crux of the basic simple idea. The off-diagonal block is –by–, and so it can be appended below the –by– dense matrix. Thus, the lower triangle of can be stored as a single –by– dense matrix . In effect, each block matrix , and is now stored in ‘full format’. This means all entries of matrix in array AR of size by can be accessed with constant row and column strides. So, the full power of LAPACK’s block Level 3 codes are now available for RFPF which uses the minimum amount of storage. Finally, matrix which has size –by– is represented in the transpose of array AR and hence has the same desirable properties. There are eight representations of RFPF. The matrix can have have either odd or even order , or it can be represented either in standard lower or upper format or it can be represented by either matrix or its transpose giving representations in all.
All eight cases or representations are presented in Section 6. The RFPF matrices are in the upper right part of the figures. We have introduced colors and horizontal lines to try to visually delineate triangles , representing lower, upper triangles of symmetric matrices , respectively and square or near square representing matrices . For an upper triangle of , , represents lower, upper triangles of symmetric matrices , respectively and square or near square representing matrices . For both lower and upper triangles of we have, after each , added its position location in the arrays holding matrices and .
We now consider performance aspects of using RFPF in the context of using LAPACK routines on triangular matrices stored in RFPF. Let be a Level 3 LAPACK routine that operates either on full format. has a full Level 3 LAPACK block 2–by–2 algorithm, call it . We write a simple related partition algorithm (SRPA) with partition sizes and where . Apply the new SRPA using the new RFPF. The new SRPA almost always has four major steps consisting entirely of calls to existing full format LAPACK routines in two steps and calls to Level 3 BLAS in the remaining two steps, see Figure 3.
|
|
Section 6 shows algorithms equal to factorization, solution and inversion algorithms on symmetric positive definite or Hermitian matrices.
3 Cholesky Factorization using Rectangular Full Packed Format
The Cholesky factorization of a symmetric and positive definite matrix can be expressed as
(2) |
where and are lower triangular and upper triangular matrices.
Break the matrices and into 2–by–2 block form in the same way as was done for the matrix in Equation (1):
(3) |
We now have
(4) |
where , , , and are lower and upper triangular submatrices, and and are square or almost square submatrices.
Using Equations (2) and equating the blocks of Equations (1) and Equations (4) gives us the basis of a 2–by–2 block algorithm for Cholesky factorization using RFPF. We can now express each of these four block equalities by calls to existing LAPACK and Level 3 BLAS routines. An example, see Section 6, of this is the three block equations is , and . The first and second of these block equations are handled by calling LAPACK’s POTRF routine and by calling Level 3 BLAS TRSM via . In both these block equations the Fortran equality of replacement () is being used so that the lower triangle of is being replaced and the nearly square matrix is being replaced by . The third block equation breaks into two parts: and which are handled by calling Level 3 BLAS SYRK or HERK and by calling LAPACK’s POTRF routine. At this point we can use the flexibility of the LAPACK library. In RFPF is in upper format (upper triangle) while in standard format is in lower format (lower triangle). Due to symmetry, both formats of contain equal values. This flexibility allows LAPACK to accommodate both formats. Hence, in the calls to SYRK or HERK and POTRF we set uplo = ’U’ even though the rectangular matrix of SYRK and HERK comes from a lower triangular formulation.
New LAPACK like routine PFTRF performs these four computations. PF was chosen to fit with LAPACK’s use of PO and PP. The PFTRF routine covers the Cholesky Factorization algorithm for the eight cases of the RFPF. Section 6 has Figure 4 with four subfigures. Here we are interested in the first and second subfigure. The first subfigure contains the layouts of matrices and . The second subfigure has the Cholesky factorization algorithm obtained by simple algebraic manipulations of the three block equalities obtained above.
4 Solution
In Section 3 we obtained the 2–by–2 Cholesky factorization (3) of matrix . Now, we can solve the equation :
-
If has lower triangular format then
(5) -
If has an upper triangular format then
(6)
, and are either vectors or rectangular matrices. contains the RHS values. and contain the solution values. , and are vectors when there is one RHS and matrices when there are many RHS. The values of and are stored over the values of .
Expanding (5) and (6) using (3) gives the forward substitution equations
(7) |
and the back substitution equations
(8) |
The Equations (7) and (8) gives the basis of a block algorithm for Cholesky solution using RFPF format. We can now express these two sets of two block equalities by using existing Level 3 BLAS routines. An example, see Section 6, of the first set of these two block equalities is and . The first block equality is handled by Level 3 BLAS TRSM: . The second block equality is handled by Level 3 BLAS GEMM and TRSM: and . The backsolution routines are similarly derived. One gets , and .
New LAPACK like routine PFTRS performs these two solution computations for the eight cases of RFPF. PFTRS calls a new Level 3 BLAS TFSM in the same way that POTRS calls TRSM. The third subfigure in Section 6 gives the Cholesky solution algorithm using RFPF obtained by simple algebraic manipulation of the block Equations (7) and (8).
5 Inversion
Following LAPACK we consider the following three stage procedure:
-
Factorize the matrix and overwrite with either or by calling PFTRF; see Section 3.
-
Compute the inverse of either or . Call these matrices or and overwrite either or with them. This is done by calling new routine new LAPACK like TFTRI.
-
Calculate either the product or and overwrite either or with them.
As in Sections 3 and 4 we examine 2–by–2 block algorithms for the steps two and three above. In Section 3 we obtain either matrices or in RFPF. Like LAPACK inversion algorithms for POTRI and PPTRI, this is our starting point for our LAPACK inversion algorithm using RFPF. The LAPACK inversion algorithms for POTRI and PPTRI also follow from steps two and three above by first calling in the full case LAPACK TRTRI and then calling LAPACK LAUUM.
Using the 2–by–2 blocking for either or in Equation (3) we obtain the following 2–by–2 blocking for and :
(10) |
From the identities and and the 2–by–2 block layouts of Equations (3) and 3), we obtain three block equations for and which can be solved using LAPACK routines for TRTRI and Level 3 BLAS TRMM. An example, see Figure 4, of these three block equations is , and . The first and third of these block equations are handled by LAPACK TRTRI routines as and . In the second inverse computation we use the fact that is equally represented by it transpose which is in RFPF. The second block equation leads to two calls to Level 3 BLAS TRMM via and . In the last two block equations the Fortran equality of replacement () is being used so that is replacing .
Now we turn to part three of the three stage LAPACK procedure above. For this we use the 2–by–2 blocks layouts of Equation (10) and the matrix multiplications indicated by following block Equations (11) giving
(11) |
where , , , and are lower and upper triangular submatrices, and and are square or almost square submatrices. The values of the indicated block multiplications of or in Equation (11) are stored over the block values of or .
Performing the indicated 2–by–2 block multiplications of Equation (11) leads to three block matrix computations. An example, see Section 6, of these three block computations is , and . Additionally, we want to overwrite the values of these block multiplications on their original block operands. Block operand only occurs in the (1,1) block operand computation and hence can be overwritten by a call to LAPACK LAUUM, , followed by a call to Level 3 BLAS SYRK or HERK, . Block operand now only occurs in the (2,1) block computation and hence can be overwritten by a call to Level 3 BLAS TRMM, . Finally, block operand can be overwritten by a call to LAPACK LAUUM, .
The fourth subfigure in Section 6 has the Cholesky inversion algorithms using RFPF based on the results of this Section. New LAPACK routine, PFTRI, performs this computation for the eight cases of RFPF.
6 RFP Data Formats and Algorithms
This section contains three figures.
-
The first figure describes the RFPF (Rectangular Full Packed Format) and gives algorithms for Cholesky factorization, solution and inversion of symmetric positive definite matrices, where is odd, uplo = ’lower’, and trans = ’no transpose’. This figure has four subfigures.
-
The first subfigure depicts the lower triangle of a symmetric positive definite matrix in standard full and its representation by the matrix in RFPF.
-
The second subfigure gives the RFPF Cholesky factorization algorithm and its calling sequences of the LAPACK and BLAS subroutines, see Section 3.
-
The third subfigure gives the RFPF Cholesky solution algorithm and its calling sequences to the LAPACK and BLAS subroutines, see Section 4.
-
The fourth subfigure in each figure gives the RFPF Cholesky inversion algorithm and its calling sequences to the LAPACK and BLAS subroutines, see Section 5.
-
-
The second figure shows the transformation from full to RFPF of all “no transform” cases.
-
The third figure depicts all eight cases in RFPF.
The data format for has . Matrix has if is odd and if is even and columns where . Hence, matrix always has LDAR rows and columns. Matrix always has rows and LDAR columns and its leading dimension is equal to . Matrix always has elements as does matrix .
The order of matrix in the first figure is seven and six or seven in the remaining two figures.
7 Stability of the RFPF Algorithm
The RFPF Cholesky factorization (Section 3), Cholesky solution (Section 4), and Cholesky inversion (Section 5) algorithms are equivalent to the traditional algorithms in the books [Dongarra et al. (1998), Demmel (1997), Golub and Van Loan (1996), Trefethen and Bau (1997)]. The whole theory of the traditional Cholesky factorization, solution, inversion and BLAS algorithms carries over to this three Cholesky and BLAS algorithms described in Sections 3, 4, and 5. The error analysis and stability of these algorithms is very well described in the book of [Higham (1996)]. The difference between LAPACK algorithms PO, PP and RFPF333full, packed and rectangular full packed. is how inner products are accumulated. In each case a different order is used. They are all mathematically equivalent, and, stability analysis shows that any summation order is stable.
8 A Performance Study using RFP Format
The LAPACK library [Anderson et al. (1999)] routines POTRF/PPTRF, POTRI/PPTRI, and POTRS/PPTRS are compared with the RFPF routines PFTRF, PFTRI, and PFTRS for Cholesky factorization (PxTRF), Cholesky inverse (PxTRI) and Cholesky solution (PxTRS) respectively. In the previous sentence, the character ’x’ can be ’O’ (full format), ’P’ (packed format), or ’F’ (RFPF). In all cases long real precision arithmetic (also called double precision) is used. Sometimes we also show results for long complex precision (also called complex*16). Results were obtained on several different computers using everywhere the vendor Level 3 and Level 2 BLAS. The sequential performance results were done on the following computers:
- Sun Fire E25K (newton):
-
72 UltraSPARC IV+ dual-core CPUs (1800 MHz/ 2 MB shared L2-cache, 32 MB shared L3-cache), 416 GB memory (120 CPUs/368 GB). Further information at “http://www.gbar.dtu.dk/index.php/Hardware”.
- SGI Altix 3700 (Freke):
-
64 CPUs - Intel Itanium2 1.5 GHz/6 MB L3-cache. 256 GB memory. Peak performance: 384 GFlops. Further information at “http://www.cscaa.dk/freke/”.
- Intel Tigerton computer (zoot):
-
quad-socket quad-core Intel Tigerton 2.4GHz (16 total cores) with 32 GB of memory. We use Intel MKL 10.0.1.014.
- DMI Itanium:
-
CPU Intel Itanium2: 1.3 GHz, cache: 3 MB on-chip L3 cache.
- DMI NEC SX-6 computer:
-
8 CPU’s, per CPU peak: 8 Gflops, per node peak: 64 Gflops, vector register length: 256.
The performance results are given in Figures 7 to 15. In Appendix A, we give the table data used in the figures, see Tables 2 to 27. We also give speedup numbers, see Tables 28 to 35.
The figures from 7 to 10 are paired. Figure 7 (double precision) and Figure 8 (double complex precision) present results for the Sun UltraSPARC IV+ computer. Figure 9 (double precision) and Figure 10 (double complex precision) present results for the SGI Altix 3700 computer. Figure 11 (double precision) presents results for the Intel Itanium2 computer. Figure 12 (double precision) presents results for the NEC SX-6 computer. Figure 13 (double precision) presents results for the quad-socket quad-core Intel Tigerton computer using reference LAPACK-3.2.0 (from netlib.org). Figure 14 (double precision) presents results for the quad-socket quad-core Intel Tigerton computer using vendor LAPACK library (MKL-10.0.1.14).
Figure 15 shows the SMP parallelism of these subroutines on the IBM Power4 (clock rate: 1300 MHz; two CPUs per chip; L1 cache: 128 KB (64 KB per CPU) instruction, 64 KB 2-way (32 KB per CPU) data; L2 cache: 1.5 MB 8-way shared between the two CPUs; L3 cache: 32 MB 8-way shared (off-chip); TLB: 1024 entries) and SUN UltraSPARC-IV (clock rate: 1350 MHz; L1 cache: 64 kB 4-way data, 32 kB 4-way instruction, and 2 kB Write, 2 kB Prefetch; L2 cache: 8 MB; TLB: 1040 entries) computers respectively. They compare SMP times of PFTRF, vendor POTRF and reference PPTRF.
The RFPF packed results greatly outperform the packed and more often than not are better than the full results. Note that our timings do not include the cost of sorting any LAPACK data formats to RFPF data formats and vice versa. We think that users will input their matrix data using RFPF. Hence, this is our rationale for not including the data transformation times.
For all our experiments, we use vendor Level 3 and Level 2 BLAS. For all our experiments except Figure 13 and Figure 15, we use the provided vendor library for LAPACK and BLAS.
We include comparisons with reference LAPACK for the quad-socket quad-core Intel Tigerton machine in Figure 13. In this case, the vendor LAPACK library packed storage routines significantly outperform the LAPACK reference implementation from netlib. In Figure 14, you find the same experiments on the same machine but, this time, using the vendor library (MKL-10.0.1.014). We think that MKL is using the reference implementation for Inverse Cholesky (packed and full format). For Cholesky factorization, we see that both packed and full format routines (PPTRF and POTRF) are tuned. But even, in this case, our RFPF storage format results are better.
When we compare RFPF with full storage, results are mixed. However, both codes are rarely far apart. Most of the performance ratios are between 0.95 to 1.05 overall. But, note that the RFPF performance is more uniform over its versions (four presented; the other four are for n odd ). For LAPACK full (two versions ), the performance variation is greater. Moreover, in the case of the inversion on quad-socket quad-core Tigerton (Figure 13 and Figure 14) RFPF clearly outperforms both variants of the full format.
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
9 Integration in LAPACK
As mentioned in the introduction, as of release 3.2 (November 2008), LAPACK supports a preliminary version of RFPF. Ultimately, the goal would be for RFPF to support as many functionnalities as full format or standard packed format does. The 44 routines included in release 3.2 for RFPF are given in Table 1. The names for the RFPF routines follow the naming nomenclature used by LAPACK. We have added the format description letters: PF for Symmetric/Hermitian Positive Definite RFPF (PO for full, PP for packed), SF for Symmetric RFPF (SY for full, SP for packed), HF for Hermitian RFPF (HE for full, HP for packed), and TF for Triangular RFPF (TR for full, TP for packed).
Currently, for the complex case, we assume that the transpose complex-conjugate part is stored whenever the transpose part is stored in the real case. This corresponds to the theory developed in this present manuscript. In the future, we will want to have the flexibility to store the transpose part (as opposed to transpose complex conjugate) whenever the transpose part is stored in the real case. In particular, this feature will be useful for complex symmetric matrices.
functionality | routine names and calling sequence | |||
---|---|---|---|---|
Cholesky factorization | CPFTRF | DPFTRF | SPFTRF | ZPFTRF |
(TRANSR,UPLO,N,A,INFO) | ||||
Multiple solve after PFTRF | CPFTRS | DPFTRS | SPFTRS | ZPFTRS |
(TRANSR,UPLO,N,NR,A,B,LDB,INFO) | ||||
Inversion after PFTRF | CPFTRI | DPFTRI | SPFTRI | ZPFTRI |
(TRANSR,UPLO,N,A,INFO) | ||||
Triangular inversion | CTRTRI | DTRTRI | STRTRI | ZTRTRI |
(TRANSR,UPLO,DIAG,N,A,INFO) | ||||
Sym/Herm matrix norm | CLANHF | DLANSF | SLANSF | ZLANHF |
(NORM,TRANSR,UPLO,N,A,WORK) | ||||
Triangular solve | CTFSM | DTFSM | STFSM | ZTFSM |
(TRANSR,SIDE,UPLO,TRANS,DIAG,M,N,ALPHA,A,B,LDB) | ||||
Sym/Herm rank- update | CHFRK | DSFRK | SSFRK | ZHFRK |
(TRANSR,UPLO,TRANS,N,K,ALPHA,A,LDA,BETA,C) | ||||
Conv. from TP to TF | CTPTTF | DTPTTF | STPTTF | ZTPTTF |
(TRANSR,UPLO,N,AP,ARF,INFO) | ||||
Conv. from TR to TF | CTRTTF | DTRTTF | STRTTF | ZTRTTF |
(TRANSR,UPLO,N,A,LDA,ARF,INFO) | ||||
Conv. from TF to TP | CTFTTP | DTFTTP | STFTTP | ZTFTTP |
(TRANSR,UPLO,N,ARF,AP,INFO) | ||||
Conv. from TF to TR | CTFTTR | DTFTTR | STFTTR | ZTFTTR |
(TRANSR,UPLO,N,ARF,A,LDA,INFO) |
10 Summary and Conclusions
This paper describes RFPF as a standard minimal full format for representing both symmetric and triangular matrices. Hence, from a user point of view, these matrix layouts are a replacement for both the standard formats of DLA, namely full and packed storage. These new layouts possess three good features: they are efficient, they are supported by Level 3 BLAS and LAPACK full format routines, and they require minimal storage.
11 Acknowledgments
The results in this paper were obtained on seven computers, an IBM, a SGI, two SUNs, Itanium, NEC, and Intel Tigerton computers. The IBM machine belongs to the Center for Scientific Computing at Aarhus, the SUN machines to the Danish Technical University, the Itanium and NEC machines to the Danish Meteorological Institute, and the Intel Tigerton machine to the Innovative Computing Laboratory at the University of Tennessee.
We would like to thank Bernd Dammann for consulting on the SUN systems; Niels Carl W. Hansen for consulting on the IBM and SGI systems; and Bjarne Stig Andersen for obtaining the results on the Itanium and NEC computers. We thank IBMers John Gunnels who worked earlier on the HFPF format and JP Fasano who was instrumental in getting the source code released by the IBM Open Source Committee. We thank Allan Backer for discussions about an older version of this manuscript.
References
- Agarwal et al. (1994) \bibscAgarwal, R. C., Gustavson, F. G., and Zubair, M. \bibyear1994. Exploiting functional parallelism on power2 to design high-performance numerical algorithms. \bibemphicIBM Journal of Research and Development \bibemph38, 5 (September), 563–576.
- Andersen et al. (2002) \bibscAndersen, B. S., Gunnels, J. A., Gustavson, F., and Waśniewski, J. \bibyear2002. A Recursive Formulation of the Inversion of symmetric positive definite Matrices in Packed Storage Data Format. In \bibscJ. Fagerholm, J. Haataja, J. Järvinen, M. Lyly, and P. R. V. Savolainen Eds., \bibemphicProceedings of the International Conference, PARA 2002, Applied Parallel Computing, Number 2367 in Lecture Notes in Computer Science (Espoo, Finland, June 2002), pp. 287–296. Springer.
- Andersen et al. (2005) \bibscAndersen, B. S., Gustavson, F. G., Reid, J. K., and Waśniewski, J. \bibyear2005. A Fully Portable High Performance Minimal Storage Hybrid Format Cholesky Algorithm. \bibemphicACM Transactions on Mathematical Software \bibemph31, 201–227.
- Andersen et al. (2001) \bibscAndersen, B. S., Gustavson, F. G., and Waśniewski, J. \bibyear2001. A Recursive Formulation of Cholesky Facorization of a Matrix in Packed Storage. \bibemphicACM Transactions on Mathematical Software \bibemph27, 2 (Jun), 214–244.
- Anderson et al. (1999) \bibscAnderson, E., Bai, Z., Bischof, C., Blackford, L. S., Demmel, J., Dongarra, J. J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A., and Sorensen, D. \bibyear1999. \bibemphicLAPACK Users’ Guide (Third ed.). Society for Industrial and Applied Mathematics, Philadelphia, PA.
- Barker et al. (2001) \bibscBarker, V. A., Blackford, L. S., Dongarra, J. J., Croz, J. D., Hammarling, S., Marinova, M., Waśniewski, J., and Yalamov, P. \bibyear2001. \bibemphicLAPACK95 Users’ Guide (first ed.). Society for Industrial and Applied Mathematics, Philadelphia, PA.
- Buttari et al. (2007) \bibscButtari, A., Langou, J., Kurzak, J., and Dongarra, J. \bibyear2007. A class of parallel tiled linear algebra algorithms for multi-core architectures. Tech rep. ut-cs-07-0600, Department of Electrical Engineering and Computer Science of the University of Tennessee.
- Chan et al. (2007) \bibscChan, E., Quintana-Ortí, E., Quintana-Ortí, G., and van de Geijn, R. \bibyear2007. Super-matrix out-of-order scheduling of matrix operations for smp and multi-core architectures. In \bibemphicSPAA 07, Proceedings of the 19th ACM Symposium on Parallelism in Algorithms and Architecture (2007), pp. 116–125.
- Demmel (1997) \bibscDemmel, J. W. \bibyear1997. \bibemphApplied Numerical Linear Algebra. SIAM, Philadelphia.
- Dongarra et al. (1979) \bibscDongarra, J. J., Bunch, J. R., Moler, C. B., and Stewart, G. W. \bibyear1979. \bibemphLinpack Users’ Guide. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA.
- Dongarra et al. (1990a) \bibscDongarra, J. J., Du Croz, J., Duff, I. S., and Hammarling, S. \bibyear1990a. Algorithm 679: A set of Level 3 Basic Linear Algebra Subprograms. \bibemphicACM Trans. Math. Soft. \bibemph16, 1 (March), 18–28.
- Dongarra et al. (1990b) \bibscDongarra, J. J., Du Croz, J., Duff, I. S., and Hammarling, S. \bibyear1990b. A Set of Level 3 Basic Linear Algebra Subprograms. \bibemphicACM Trans. Math. Soft. \bibemph16, 1 (March), 1–17.
- Dongarra et al. (1988) \bibscDongarra, J. J., Du Croz, J., Hammarling, S., and Hanson, R. J. \bibyear1988. An Extended Set of Fortran Basic Linear Algebra Subroutines. \bibemphicACM Trans. Math. Soft. \bibemph14, 1 (March), 1–17.
- Dongarra et al. (1998) \bibscDongarra, J. J., Duff, I. S., Sorensen, D. C., and van der Vorst, H. A. \bibyear1998. \bibemphNumerical Linear Algebra for High Performance Computers. SIAM, Society for Industrial and Applied Mathematics, Philadelphia.
- Elmroth et al. (2004) \bibscElmroth, E., Gustavson, F. G., Kagstrom, B., and Jonsson, I. \bibyear2004. Recursive Blocked Algorithms and Hybrid Data Structures for Dense Matrix Library Software. \bibemphicSIAM Review \bibemph46, 1 (March), 3–45.
- Golub and Van Loan (1996) \bibscGolub, G. and Van Loan, C. F. \bibyear1996. \bibemphicMatrix Computations (Third ed.). Johns Hopkins University Press, Baltimore, MD.
- Gunnels and Gustavson (2004) \bibscGunnels, J. A. and Gustavson, F. G. \bibyear2004. A new array format for symmetric and triangular matrices. In \bibscJ. W. J.J. Dongarra, K. Madsen Ed., \bibemphicApplied Parallel Computing, State of the Art in Scientific Computing, PARA 2004, Volume LNCS 3732 (Springer-Verlag, Berlin Heidelberg, 2004), pp. 247–255. Springer.
- Gustavson (2003) \bibscGustavson, F. G. \bibyear2003. High Performance Linear Algebra Algorithms using New Generalized Data Structures for Matrices. \bibemphicIBM Journal of Research and Development \bibemph47, 1 (January), 823–849.
- Gustavson et al. (2007) \bibscGustavson, F. G., Gunnels, J., and Sexton, J. \bibyear2007. Minimal Data Copy for Dense Linear Algebra Factorization. In \bibemphicApplied Parallel Computing, State of the Art in Scientific Computing, PARA 2006, Volume LNCS 4699 (Springer-Verlag, Berlin Heidelberg, 2007), pp. 540–549. Springer.
- Gustavson and Jonsson (2000) \bibscGustavson, F. G. and Jonsson, I. \bibyear2000. Minimal storage high performance cholesky via blocking and recursion. \bibemphicIBM Journal of Research and Development \bibemph44, 6 (Nov), 823–849.
- Gustavson et al. (2007) \bibscGustavson, F. G., Reid, J. K., and Waśniewski, J. \bibyear2007. Algorithm 865: Fortran 95 Subroutines for Cholesky Factorization in Blocked Hybrid Format. \bibemphicACM Transactions on Mathematical Software \bibemph33, 1 (March), 5.
- Gustavson and Waśniewski (2007) \bibscGustavson, F. G. and Waśniewski, J. \bibyear2007. Rectangular full packed format for LAPACK algorithms timings on several computers. In \bibemphicApplied Parallel Computing, State of the Art in Scientific Computing, PARA 2006, Volume LNCS 4699 (Springer-Verlag, Berlin Heidelberg, 2007), pp. 570–579. Springer.
- Herrero (2006) \bibscHerrero, J. R. \bibyear2006. \bibemphA Framework for Efficient Execution of Matrix Computations. Ph. D. thesis, Universitat Politècnica de Catalunya.
- Higham (1996) \bibscHigham, N. J. \bibyear1996. \bibemphAccuracy and Stability of Numerical Algorithms. SIAM.
- IBM (1997) IBM. \bibyear1997. \bibemphicEngineering and Scientific Subroutine Library for AIX (Version 3, Volume 1 ed.). IBM. Pub. number SA22–7272–0.
- Lawson et al. (1979) \bibscLawson, C. L., Hanson, R. J., Kincaid, D., and Krogh, F. T. \bibyear1979. Basic Linear Algebra Subprograms for Fortran Usage. \bibemphicACM Trans. Math. Soft. \bibemph5, 308–323.
- Trefethen and Bau (1997) \bibscTrefethen, L. N. and Bau, D. \bibyear1997. \bibemphNumerical Linear Algebra. SIAM, Philadelphia.
Appendix A Performance results
n | RFPF | LAPACK | ||||||
---|---|---|---|---|---|---|---|---|
NO TRANS | TRANS | POTRF | PPTRF | |||||
U | L | U | L | U | L | U | L | |
50 | 827 | 898 | 915 | 834 | 924 | 622 | 435 | 622 |
100 | 1420 | 1517 | 1464 | 1434 | 1264 | 1218 | 592 | 811 |
200 | 1734 | 1795 | 1590 | 1746 | 1707 | 1858 | 703 | 378 |
400 | 2165 | 2242 | 2275 | 2177 | 2234 | 2182 | 791 | 257 |
500 | 2175 | 2292 | 2358 | 2221 | 2337 | 2378 | 809 | 251 |
800 | 2426 | 2550 | 2585 | 2455 | 2618 | 2567 | 795 | 240 |
1000 | 2498 | 2617 | 2636 | 2485 | 2677 | 2650 | 668 | 217 |
1600 | 2590 | 2609 | 2739 | 2626 | 2764 | 2044 | 614 | 217 |
2000 | 2703 | 2758 | 2829 | 2711 | 2912 | 2753 | 606 | 216 |
4000 | 2502 | 2810 | 2822 | 2517 | 3100 | 2708 | 485 | 91 |
n | RFPF | LAPACK | ||||||
---|---|---|---|---|---|---|---|---|
NO TRANS | TRANS | POTRF | PPTRF | |||||
U | L | U | L | U | L | U | L | |
50 | 716 | 699 | 698 | 714 | 581 | 549 | 535 | 554 |
100 | 1199 | 1185 | 1183 | 1197 | 1163 | 1148 | 719 | 721 |
200 | 1768 | 1742 | 1756 | 1774 | 1821 | 1806 | 840 | 822 |
400 | 2277 | 2262 | 2293 | 2289 | 2179 | 2159 | 919 | 881 |
500 | 2354 | 2334 | 2357 | 2130 | 2468 | 2479 | 931 | 891 |
800 | 2551 | 2361 | 2593 | 2584 | 2636 | 2629 | 880 | 755 |
1000 | 2599 | 2600 | 2668 | 2639 | 2717 | 2717 | 708 | 520 |
1600 | 2621 | 2665 | 2702 | 2693 | 2507 | 2529 | 610 | 419 |
2000 | 2717 | 2767 | 2831 | 2740 | 2818 | 2854 | 599 | 401 |
4000 | 2542 | 2506 | 2757 | 2652 | 2635 | 2661 | 412 | 158 |
r | n | RFPF | LAPACK | ||||||
---|---|---|---|---|---|---|---|---|---|
h | NO TRANS | TRANS | POTRS | PPTRS | |||||
s | U | L | U | L | U | L | U | L | |
100 | 50 | 1829 | 1877 | 1883 | 1792 | 1698 | 1705 | 549 | 545 |
100 | 100 | 2118 | 2117 | 2121 | 2123 | 2042 | 1968 | 713 | 711 |
100 | 200 | 2505 | 2511 | 2515 | 2515 | 2242 | 2231 | 689 | 828 |
100 | 400 | 2638 | 2598 | 2626 | 2664 | 2356 | 2456 | 715 | 888 |
100 | 500 | 2386 | 2499 | 2669 | 2706 | 2479 | 2451 | 743 | 895 |
100 | 800 | 2759 | 2746 | 2776 | 2781 | 2410 | 2326 | 626 | 704 |
100 | 1000 | 2795 | 2739 | 2811 | 2817 | 2052 | 1987 | 525 | 554 |
160 | 1600 | 2870 | 2873 | 2886 | 2875 | 2431 | 2289 | 447 | 429 |
200 | 2000 | 2825 | 2825 | 2845 | 2838 | 2371 | 2167 | 416 | 416 |
400 | 4000 | 2701 | 2700 | 2808 | 2667 | 1589 | 1588 | 175 | 168 |
n | RFPF | LAPACK | ||||||
---|---|---|---|---|---|---|---|---|
NO TRANS | TRANS | POTRF | PPTRF | |||||
U | L | U | L | U | L | U | L | |
50 | 1423 | 1552 | 1633 | 1423 | 1301 | 1259 | 872 | 1333 |
100 | 2032 | 1986 | 2067 | 1854 | 1624 | 1905 | 1199 | 1353 |
200 | 2329 | 2277 | 2337 | 2198 | 2117 | 2374 | 1465 | 542 |
400 | 2646 | 2624 | 2698 | 2561 | 2556 | 2684 | 1725 | 482 |
500 | 2760 | 2264 | 2801 | 2699 | 2695 | 2793 | 1731 | 476 |
800 | 2890 | 2851 | 2897 | 2839 | 2874 | 2310 | 1315 | 441 |
1000 | 2929 | 2899 | 2954 | 2900 | 2958 | 2958 | 1244 | 435 |
1600 | 3002 | 2962 | 2563 | 2874 | 3204 | 1519 | 1202 | 379 |
2000 | 3031 | 2971 | 3016 | 3011 | 3372 | 3021 | 1173 | 411 |
4000 | 3022 | 2930 | 3011 | 3036 | 3185 | 2148 | 572 | 139 |
n | RFPF | LAPACK | ||||||
---|---|---|---|---|---|---|---|---|
NO TRANS | TRANS | POTRF | PPTRF | |||||
U | L | U | L | U | L | U | L | |
50 | 1525 | 1575 | 1515 | 1620 | 1400 | 1378 | 1230 | 1232 |
100 | 1968 | 2001 | 1948 | 2042 | 2012 | 1959 | 1525 | 1548 |
200 | 2388 | 2438 | 2277 | 2447 | 2428 | 2431 | 1731 | 1687 |
400 | 2665 | 2715 | 2700 | 2715 | 2758 | 2793 | 1867 | 1698 |
500 | 2748 | 2779 | 2777 | 2773 | 2840 | 2870 | 1885 | 1697 |
800 | 2841 | 2898 | 2917 | 2837 | 2599 | 2985 | 1330 | 1319 |
1000 | 2897 | 2943 | 2971 | 2914 | 3005 | 3040 | 1264 | 1258 |
1600 | 2920 | 2925 | 2724 | 2482 | 2031 | 3015 | 1153 | 1212 |
2000 | 2883 | 2948 | 2946 | 2931 | 2990 | 3079 | 1186 | 1193 |
4000 | 2839 | 2939 | 2975 | 2823 | 2485 | 3007 | 723 | 706 |
r | n | RFPF | LAPACK | ||||||
---|---|---|---|---|---|---|---|---|---|
h | NO TRANS | TRANS | POTRS | PPTRS | |||||
s | U | L | U | L | U | L | U | L | |
100 | 50 | 1949 | 1972 | 1971 | 1978 | 2161 | 2138 | 1029 | 1028 |
100 | 100 | 2552 | 2550 | 2562 | 2562 | 2501 | 2484 | 1212 | 1393 |
100 | 200 | 2858 | 2859 | 2860 | 2847 | 2646 | 2620 | 1303 | 1629 |
100 | 400 | 2982 | 2972 | 2972 | 2949 | 2811 | 2803 | 1398 | 1780 |
100 | 500 | 2991 | 2983 | 2987 | 2994 | 2835 | 2821 | 1364 | 1700 |
100 | 800 | 3083 | 3062 | 3083 | 2717 | 2819 | 2784 | 921 | 973 |
100 | 1000 | 3112 | 3100 | 3085 | 2694 | 2626 | 2604 | 853 | 855 |
160 | 1600 | 3141 | 3140 | 3149 | 3137 | 2820 | 2715 | 762 | 752 |
200 | 2000 | 3172 | 3182 | 3174 | 3171 | 2714 | 2698 | 718 | 667 |
400 | 4000 | 3193 | 3201 | 3214 | 3211 | 2656 | 2661 | 240 | 230 |
n | RFPF | LAPACK | ||||||
---|---|---|---|---|---|---|---|---|
NO TRANS | TRANS | POTRF | PPTRF | |||||
U | L | U | L | U | L | U | L | |
50 | 721 | 616 | 642 | 694 | 519 | 537 | 331 | 300 |
100 | 1419 | 1280 | 1337 | 1386 | 1347 | 1216 | 612 | 303 |
200 | 2764 | 2526 | 2637 | 2732 | 2621 | 2526 | 1072 | 300 |
400 | 4120 | 3728 | 3943 | 4053 | 4116 | 3932 | 1041 | 292 |
500 | 4430 | 4142 | 4313 | 4410 | 4495 | 4568 | 997 | 291 |
800 | 4663 | 4009 | 4198 | 4804 | 5034 | 3873 | 1007 | 290 |
1000 | 4764 | 4134 | 4485 | 5107 | 4789 | 3732 | 1029 | 289 |
1600 | 4278 | 3612 | 3956 | 4178 | 3740 | 2680 | 153 | 188 |
2000 | 4061 | 3611 | 3657 | 4087 | 3771 | 2335 | 85 | 107 |
4000 | 3493 | 2660 | 3126 | 3185 | 3769 | 2307 | 53 | 81 |
n | RFPF | LAPACK | ||||||
---|---|---|---|---|---|---|---|---|
NO TRANS | TRANS | POTRF | PPTRF | |||||
U | L | U | L | U | L | U | L | |
50 | 774 | 797 | 665 | 818 | 676 | 675 | 317 | 416 |
100 | 1731 | 1669 | 1681 | 1723 | 1561 | 1528 | 404 | 754 |
200 | 2945 | 3140 | 3169 | 3195 | 3034 | 2975 | 461 | 1246 |
400 | 4466 | 4383 | 4403 | 4476 | 4198 | 4176 | 439 | 1686 |
500 | 4648 | 4531 | 4662 | 4685 | 4740 | 4605 | 429 | 1795 |
800 | 4827 | 4815 | 4891 | 4799 | 4463 | 4833 | 422 | 2024 |
1000 | 4992 | 5016 | 5194 | 5155 | 4699 | 4931 | 421 | 2121 |
1600 | 4882 | 4957 | 4908 | 4874 | 4293 | 4733 | 267 | 1474 |
2000 | 3482 | 3749 | 5031 | 4967 | 3916 | 3072 | 70 | 238 |
4000 | 3080 | 3290 | 3613 | 3560 | 2725 | 3063 | 59 | 152 |
r | n | RFPF | LAPACK | ||||||
---|---|---|---|---|---|---|---|---|---|
h | NO TRANS | TRANS | POTRS | PPTRS | |||||
s | U | L | U | L | U | L | U | L | |
100 | 50 | 2535 | 2535 | 2552 | 2543 | 3283 | 2826 | 496 | 488 |
100 | 100 | 3838 | 3831 | 3853 | 3848 | 4438 | 4301 | 860 | 844 |
100 | 200 | 4898 | 4894 | 4894 | 4892 | 5045 | 5029 | 1357 | 1307 |
100 | 400 | 5311 | 5298 | 5251 | 5246 | 5067 | 5185 | 1312 | 1695 |
100 | 500 | 5214 | 5192 | 5259 | 5248 | 5195 | 5417 | 1319 | 1814 |
100 | 800 | 5300 | 5222 | 5645 | 5634 | 4666 | 4773 | 1369 | 2095 |
100 | 1000 | 4851 | 4712 | 4775 | 4846 | 4699 | 4098 | 1378 | 2159 |
160 | 1600 | 3721 | 3406 | 3850 | 4127 | 3658 | 3441 | 180 | 474 |
200 | 2000 | 3957 | 3469 | 3482 | 3998 | 3799 | 3620 | 97 | 338 |
400 | 4000 | 3913 | 3994 | 3957 | 3555 | 3945 | 3768 | 68 | 167 |
n | RFPF | LAPACK | ||||||
---|---|---|---|---|---|---|---|---|
NO TRANS | TRANS | POTRF | PPTRF | |||||
U | L | U | L | U | L | U | L | |
50 | 1477 | 1401 | 1532 | 1431 | 1449 | 1548 | 1084 | 1510 |
100 | 2651 | 2713 | 2765 | 2537 | 2492 | 2712 | 1628 | 2234 |
200 | 3828 | 3889 | 4040 | 3718 | 3532 | 3837 | 1812 | 2822 |
400 | 4374 | 4581 | 4829 | 4402 | 4343 | 4410 | 1550 | 3205 |
500 | 4592 | 4621 | 4933 | 4570 | 4776 | 4463 | 1521 | 3294 |
800 | 4729 | 4688 | 4897 | 4815 | 4737 | 4085 | 1277 | 3073 |
1000 | 4735 | 4694 | 4928 | 4689 | 4727 | 3334 | 441 | 1504 |
1600 | 4796 | 4701 | 4901 | 4737 | 3872 | 3801 | 223 | 693 |
2000 | 4560 | 4295 | 4553 | 4560 | 4476 | 3681 | 180 | 368 |
4000 | 3705 | 3341 | 4039 | 4200 | 4108 | 3487 | 101 | 186 |
n | RFPF | LAPACK | ||||||
---|---|---|---|---|---|---|---|---|
NO TRANS | TRANS | POTRF | PPTRF | |||||
U | L | U | L | U | L | U | L | |
50 | 1618 | 1633 | 1666 | 1744 | 1652 | 1529 | 1424 | 1284 |
100 | 2750 | 2744 | 2762 | 2968 | 2661 | 2523 | 2241 | 2037 |
200 | 3766 | 3780 | 3787 | 4085 | 3951 | 3582 | 2359 | 2764 |
400 | 4489 | 4404 | 4509 | 4708 | 4587 | 4408 | 1671 | 3205 |
500 | 4642 | 4594 | 4699 | 4860 | 4667 | 4618 | 1627 | 3340 |
800 | 4854 | 4826 | 4949 | 5044 | 4522 | 4634 | 1315 | 3366 |
1000 | 3246 | 3804 | 4958 | 5019 | 4001 | 3420 | 148 | 939 |
1600 | 4491 | 4623 | 3420 | 3620 | 2446 | 2881 | 69 | 1204 |
2000 | 2978 | 2912 | 4119 | 4158 | 3756 | 4040 | 62 | 325 |
4000 | 3532 | 3573 | 3514 | 3365 | 2829 | 2911 | 70 | 412 |
r | n | RFPF | LAPACK | ||||||
---|---|---|---|---|---|---|---|---|---|
h | NO TRANS | TRANS | POTRS | PPTRS | |||||
s | U | L | U | L | U | L | U | L | |
100 | 50 | 3062 | 3009 | 3067 | 3054 | 3465 | 3500 | 1551 | 1528 |
100 | 100 | 4106 | 4106 | 4114 | 4109 | 4287 | 4314 | 2146 | 2129 |
100 | 200 | 4562 | 4559 | 4750 | 4748 | 4369 | 4381 | 2200 | 2605 |
100 | 400 | 4662 | 4647 | 4885 | 4920 | 4920 | 5044 | 2163 | 2927 |
100 | 500 | 4612 | 4612 | 4970 | 5007 | 4925 | 4717 | 2193 | 3005 |
100 | 800 | 4332 | 4313 | 4729 | 4675 | 4726 | 4376 | 1951 | 2765 |
100 | 1000 | 4487 | 4430 | 4492 | 4639 | 4542 | 4454 | 1046 | 1838 |
160 | 1600 | 4469 | 4369 | 4450 | 4524 | 4057 | 4287 | 428 | 1225 |
200 | 2000 | 4284 | 4335 | 4225 | 4385 | 4315 | 4464 | 290 | 726 |
400 | 4000 | 3847 | 3845 | 4420 | 4434 | 4398 | 4445 | 110 | 373 |
n | RFPF | LAPACK | ||||||
---|---|---|---|---|---|---|---|---|
NO TRANS | TRANS | POTRF | PPTRF | |||||
U | L | U | L | U | L | U | L | |
50 | 781 | 771 | 784 | 771 | 1107 | 739 | 495 | 533 |
100 | 1843 | 1788 | 1848 | 1812 | 1874 | 1725 | 879 | 825 |
200 | 3178 | 2869 | 2963 | 3064 | 2967 | 2871 | 1323 | 1100 |
400 | 3931 | 3709 | 3756 | 3823 | 3870 | 3740 | 1121 | 1236 |
500 | 4008 | 3808 | 3883 | 3914 | 4043 | 3911 | 1032 | 1257 |
800 | 4198 | 4097 | 4145 | 4126 | 3900 | 4009 | 612 | 1127 |
1000 | 4115 | 4038 | 4015 | 3649 | 3769 | 3983 | 305 | 697 |
1600 | 3851 | 3652 | 3967 | 3971 | 3640 | 3987 | 147 | 437 |
2000 | 3899 | 3716 | 3660 | 3660 | 3865 | 3835 | 108 | 358 |
4000 | 3966 | 3791 | 3927 | 4011 | 3869 | 4052 | 119 | 398 |
n | RFPF | LAPACK | ||||||
---|---|---|---|---|---|---|---|---|
NO TRANS | TRANS | POTRI | PPTRI | |||||
u | l | u | l | u | l | u | l | |
50 | 633 | 659 | 648 | 640 | 777 | 870 | 508 | 460 |
100 | 1252 | 1323 | 1300 | 1272 | 1573 | 1760 | 815 | 810 |
200 | 2305 | 2442 | 2431 | 2314 | 2357 | 2639 | 1118 | 1211 |
400 | 3084 | 3199 | 3188 | 3094 | 3152 | 3445 | 1234 | 1363 |
500 | 3204 | 3316 | 3329 | 3218 | 3400 | 3611 | 1239 | 1382 |
800 | 3617 | 3741 | 3720 | 3640 | 3468 | 3786 | 1182 | 1268 |
1000 | 3611 | 3716 | 3637 | 3590 | 3456 | 3790 | 767 | 946 |
1600 | 3721 | 3802 | 3795 | 3714 | 3589 | 3713 | 500 | 609 |
2000 | 3784 | 3812 | 3745 | 3704 | 3636 | 3798 | 473 | 596 |
4000 | 3822 | 3762 | 3956 | 3851 | 3760 | 3750 | 467 | 614 |
r | n | RFPF | LAPACK | ||||||
---|---|---|---|---|---|---|---|---|---|
h | NO TRANS | TRANS | POTRS | PPTRS | |||||
s | u | l | u | l | u | l | u | l | |
100 | 50 | 2409 | 2412 | 2414 | 2422 | 3044 | 3018 | 725 | 714 |
100 | 100 | 3305 | 3301 | 3303 | 3303 | 3889 | 3855 | 1126 | 1109 |
100 | 200 | 4149 | 4154 | 4127 | 4146 | 4143 | 4127 | 1526 | 1512 |
100 | 400 | 4398 | 4403 | 4416 | 4444 | 4469 | 4451 | 1097 | 1088 |
100 | 500 | 4313 | 4155 | 4374 | 4394 | 4203 | 4093 | 1054 | 1045 |
100 | 800 | 3979 | 3919 | 4040 | 4051 | 3969 | 4011 | 692 | 720 |
100 | 1000 | 3716 | 3608 | 3498 | 3477 | 3630 | 3645 | 376 | 372 |
160 | 1600 | 3892 | 3874 | 4020 | 3994 | 4001 | 4011 | 188 | 182 |
200 | 2000 | 4052 | 4073 | 4040 | 4020 | 4231 | 4203 | 119 | 119 |
400 | 4000 | 4245 | 4225 | 4275 | 4287 | 4330 | 4320 | 115 | 144 |
n | RFPF | LAPACK | ||||||
---|---|---|---|---|---|---|---|---|
NO TRANS | TRANS | POTRF | PPTRF | |||||
u | l | u | l | u | l | u | l | |
50 | 206 | 200 | 225 | 225 | 365 | 353 | 57 | 238 |
100 | 721 | 728 | 789 | 788 | 1055 | 989 | 120 | 591 |
200 | 2028 | 2025 | 2005 | 2015 | 1380 | 1639 | 246 | 1250 |
400 | 3868 | 3915 | 3078 | 3073 | 1763 | 3311 | 479 | 1975 |
500 | 4483 | 4470 | 4636 | 4636 | 4103 | 4241 | 585 | 2149 |
800 | 5154 | 5168 | 4331 | 4261 | 3253 | 4469 | 870 | 2399 |
1000 | 5666 | 5654 | 5725 | 5703 | 5144 | 5689 | 1035 | 2474 |
1600 | 6224 | 6145 | 5644 | 5272 | 5375 | 5895 | 1441 | 2572 |
2000 | 6762 | 6788 | 6642 | 6610 | 6088 | 6732 | 1654 | 2598 |
4000 | 7321 | 7325 | 7236 | 7125 | 6994 | 7311 | 2339 | 2641 |
n | RFPF | LAPACK | ||||||
---|---|---|---|---|---|---|---|---|
NO TRANS | TRANS | POTRI | PPTRI | |||||
u | l | u | l | u | l | u | l | |
50 | 152 | 152 | 150 | 152 | 148 | 145 | 91 | 61 |
100 | 430 | 432 | 428 | 432 | 313 | 310 | 194 | 126 |
200 | 950 | 956 | 940 | 941 | 636 | 627 | 404 | 249 |
400 | 1850 | 1852 | 1804 | 1806 | 1734 | 1624 | 722 | 470 |
500 | 2227 | 2228 | 2174 | 2181 | 2180 | 2029 | 856 | 572 |
800 | 3775 | 3775 | 3668 | 3686 | 3405 | 3052 | 1186 | 842 |
1000 | 4346 | 4346 | 4254 | 4263 | 4273 | 3638 | 1342 | 985 |
1600 | 5313 | 5294 | 5137 | 5308 | 5438 | 4511 | 1690 | 1361 |
2000 | 6006 | 6006 | 5930 | 5931 | 5997 | 4832 | 1854 | 1536 |
4000 | 6953 | 6953 | 6836 | 6888 | 7041 | 4814 | 1921 | 2122 |
r | n | RFPF | LAPACK | ||||||
---|---|---|---|---|---|---|---|---|---|
h | NO TRANS | TRANS | POTRS | PPTRS | |||||
s | U | L | U | L | U | L | U | L | |
100 | 50 | 873 | 870 | 889 | 886 | 1933 | 1941 | 88 | 88 |
100 | 100 | 2173 | 2171 | 2200 | 2189 | 3216 | 3236 | 181 | 179 |
100 | 200 | 4236 | 4230 | 4253 | 4245 | 4165 | 4166 | 352 | 347 |
100 | 400 | 5431 | 5431 | 5410 | 5408 | 5302 | 5303 | 648 | 644 |
100 | 500 | 5563 | 5562 | 5568 | 5567 | 5629 | 5632 | 783 | 779 |
100 | 800 | 6407 | 6407 | 6240 | 6240 | 5569 | 5593 | 1132 | 1128 |
100 | 1000 | 6578 | 6578 | 6559 | 6558 | 6554 | 6566 | 1325 | 1320 |
160 | 1600 | 6781 | 6805 | 6430 | 6430 | 6799 | 6809 | 1732 | 1727 |
200 | 2000 | 7568 | 7569 | 7519 | 7519 | 7406 | 7407 | 1920 | 1914 |
400 | 4000 | 7858 | 7858 | 7761 | 7761 | 7626 | 7627 | 2414 | 2410 |
n | RFPF | LAPACK | ||||||
---|---|---|---|---|---|---|---|---|
NO TRANS | TRANS | POTRF | PPTRF | |||||
U | L | U | L | U | L | U | L | |
2000 | 24.8460 | 24.2070 | 26.2493 | 27.3279 | 24.9569 | 13.0685 | 0.9389 | 0.4790 |
4000 | 39.0849 | 38.8042 | 41.1537 | 41.7441 | 38.4284 | 14.6297 | 0.7378 | 0.3879 |
6000 | 43.2940 | 43.9028 | 45.7611 | 45.6911 | 40.1301 | 14.6023 | 0.7212 | 0.3800 |
8000 | 48.2928 | 48.0530 | 50.1546 | 48.9082 | 40.1865 | 14.9028 | – | – |
10000 | 50.6669 | 50.0472 | 51.5198 | 50.8383 | 41.7279 | 14.9236 | – | – |
12000 | 47.9860 | 47.5107 | 50.6640 | 50.2138 | 43.1972 | 14.6511 | – | – |
14000 | 50.3806 | 50.6969 | 52.7881 | 52.3719 | 43.7816 | 14.5463 | – | – |
16000 | 51.2309 | 51.9454 | 53.5924 | 53.2322 | 44.0667 | 14.2067 | – | – |
18000 | 52.6901 | 52.2244 | 54.2978 | 53.5869 | 46.2805 | 14.5523 | – | – |
20000 | 53.6790 | 54.1209 | 54.3555 | 54.7896 | 45.8757 | 14.6236 | – | – |
n | RFPF | LAPACK | ||||||
---|---|---|---|---|---|---|---|---|
NO TRANS | TRANS | POTRF | PPTRF | |||||
U | L | U | L | U | L | U | L | |
2000 | 29.9701 | 31.0403 | 29.2714 | 28.4205 | 13.2510 | 18.5249 | 0.6338 | 0.9229 |
4000 | 38.4338 | 39.1702 | 38.3199 | 37.7938 | 13.0367 | 18.1662 | 0.5080 | 0.7301 |
6000 | 38.6324 | 39.1249 | 39.0177 | 38.9534 | 12.8468 | 18.0594 | 0.4972 | 0.7149 |
8000 | 40.6770 | 40.7352 | 40.9032 | 39.8398 | 12.8871 | 17.9491 | – | – |
10000 | 41.3971 | 41.5932 | 41.6892 | 41.6400 | 12.6654 | 17.5897 | – | – |
12000 | 41.1646 | 40.8424 | 40.2776 | 40.4129 | 12.4705 | 17.5883 | – | – |
14000 | 42.1946 | 42.1400 | 41.2174 | 41.3633 | 12.4050 | 17.4173 | – | – |
16000 | 42.0274 | 42.2826 | 42.4457 | 42.3624 | 12.1912 | 17.2090 | – | – |
18000 | 42.2909 | 42.4922 | 41.9356 | 42.2480 | 12.1616 | 17.3289 | – | – |
r | n | RFPF | LAPACK | ||||||
---|---|---|---|---|---|---|---|---|---|
h | NO TRANS | TRANS | POTRS | PPTRS | |||||
s | U | L | U | L | U | L | U | L | |
100 | 2000 | 0.7802 | 0.7759 | 0.7947 | 0.7897 | 0.7365 | 0.6691 | 0.8177 | 0.7628 |
100 | 4000 | 0.6925 | 0.6918 | 0.7130 | 0.7113 | 0.6462 | 0.6120 | 0.7310 | 0.7261 |
100 | 6000 | 0.6672 | 0.6639 | 0.6921 | 0.6937 | 0.5955 | 0.5773 | 0.7214 | 0.7193 |
100 | 8000 | 0.6494 | 0.6457 | 0.6787 | 0.6791 | 0.5524 | 0.5463 | – | – |
100 | 10000 | 0.6247 | 0.6194 | 0.6594 | 0.6579 | 0.5329 | 0.5269 | – | – |
100 | 12000 | 0.6228 | 0.6230 | 0.6512 | 0.6506 | 0.5336 | 0.5291 | – | – |
100 | 14000 | 0.5933 | 0.6181 | 0.6291 | 0.6309 | 0.5356 | 0.5271 | – | – |
100 | 16000 | 0.6020 | 0.6018 | 0.6265 | 0.6295 | 0.5095 | 0.5088 | – | – |
100 | 18000 | 0.6175 | 0.6164 | 0.6196 | 0.6184 | 0.5310 | 0.5232 | – | – |
100 | 20000 | 0.6092 | 0.6063 | 0.6022 | 0.6024 | 0.5221 | 0.5163 | – | – |
n | RFPF | LAPACK | ||||||
---|---|---|---|---|---|---|---|---|
NO TRANS | TRANS | POTRF | PPTRF | |||||
U | L | U | L | U | L | U | L | |
2000 | 25.0114 | 24.7273 | 27.9415 | 29.2117 | 31.7156 | 22.4987 | 19.1706 | 18.1686 |
4000 | 46.5472 | 46.6683 | 50.4646 | 52.1384 | 53.5300 | 39.1913 | 25.5211 | 23.0137 |
6000 | 57.7951 | 59.0809 | 62.8870 | 62.2730 | 63.7367 | 45.6812 | 34.4061 | 30.1288 |
8000 | 67.8673 | 70.0423 | 72.3038 | 68.6783 | 70.7858 | 48.2404 | 39.5816 | 31.3558 |
10000 | 76.6851 | 78.1704 | 78.9962 | 79.0753 | 75.8030 | 52.6184 | 42.2241 | 35.7579 |
12000 | 72.2916 | 74.1424 | 79.1635 | 78.9553 | 78.4410 | 57.9543 | 46.0673 | 41.0530 |
14000 | 79.5957 | 81.4214 | 85.3673 | 84.0138 | 82.1996 | 59.0167 | 46.5374 | 38.7725 |
16000 | 83.6760 | 84.8718 | 89.7696 | 87.4224 | 83.8289 | 58.7681 | 50.8717 | 45.3575 |
18000 | 86.6604 | 86.5750 | 89.3476 | 88.8508 | 86.7870 | 62.9814 | 52.5077 | 47.1880 |
20000 | 90.7187 | 92.3898 | 92.9467 | 91.9760 | 88.2639 | 64.3982 | 51.0705 | 43.1419 |
n | RFPF | LAPACK | ||||||
---|---|---|---|---|---|---|---|---|
NO TRANS | TRANS | POTRF | PPTRF | |||||
U | L | U | L | U | L | U | L | |
2000 | 29.7015 | 32.2611 | 29.5448 | 28.7837 | 13.8077 | 18.8739 | 0.6367 | 0.9238 |
4000 | 38.6352 | 39.5333 | 38.1630 | 37.9292 | 13.1999 | 18.1173 | 0.5069 | 0.7288 |
6000 | 38.7001 | 39.3848 | 38.5245 | 39.1682 | 12.8651 | 17.8524 | – | – |
8000 | 40.6456 | 41.2400 | 41.0437 | 40.9830 | 12.8791 | 17.9160 | – | – |
10000 | 41.5013 | 41.7725 | 42.4129 | 42.3191 | 12.7119 | 17.4713 | – | – |
12000 | 41.4199 | 41.2636 | 40.6651 | 40.5983 | 12.4945 | 17.6937 | – | – |
14000 | 42.0461 | 42.4899 | 41.8353 | 41.4583 | 12.4234 | 17.5004 | – | – |
16000 | 42.5350 | 42.7828 | 42.9538 | 42.4658 | 12.2014 | 17.2031 | – | – |
18000 | 42.0039 | 42.5616 | 42.3765 | 41.9941 | 12.2800 | 17.3990 | – | – |
20000 | 42.6296 | 43.0443 | 41.9921 | 41.9014 | 12.1434 | 17.3876 | – | – |
r | n | RFPF | LAPACK | ||||||
---|---|---|---|---|---|---|---|---|---|
h | NO TRANS | TRANS | POTRS | PPTRS | |||||
s | U | L | U | L | U | L | U | L | |
100 | 2000 | 0.1530 | 0.1439 | 0.1396 | 0.1432 | 0.1164 | 0.1093 | 0.1856 | 0.1482 |
100 | 4000 | 0.1516 | 0.1459 | 0.1486 | 0.1503 | 0.1077 | 0.1140 | 0.1545 | 0.1362 |
100 | 6000 | 0.1512 | 0.1451 | 0.1471 | 0.1493 | 0.1101 | 0.1065 | 0.1397 | 0.1223 |
100 | 8000 | 0.1490 | 0.1411 | 0.1429 | 0.1458 | 0.1085 | 0.1100 | 0.1192 | 0.1136 |
100 | 10000 | 0.1452 | 0.1408 | 0.1471 | 0.1430 | 0.1066 | 0.1088 | 0.1027 | 0.1019 |
100 | 12000 | 0.1407 | 0.1429 | 0.1452 | 0.1404 | 0.1079 | 0.1091 | 0.0958 | 0.0926 |
100 | 14000 | 0.1398 | 0.1406 | 0.1404 | 0.1388 | 0.1080 | 0.1100 | 0.0837 | 0.0843 |
100 | 16000 | 0.1374 | 0.1374 | 0.1411 | 0.1405 | 0.1075 | 0.1089 | 0.0786 | 0.0786 |
100 | 18000 | 0.1370 | 0.1366 | 0.1402 | 0.1396 | 0.1086 | 0.1087 | 0.0748 | 0.0745 |
100 | 20000 | 0.1362 | 0.1364 | 0.1394 | 0.1425 | 0.1065 | 0.1117 | 0.0699 | 0.0699 |
n | n | Mflop/s | Times | ||||||
pr | PFTRF | in PFTRF | LAPACK | ||||||
oc | PO | TR | SY | PO | PO | PP | |||
TRF | SM | RK | TRF | TRF | TRF | ||||
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
1000 | 1 | 2695 | 0.12 | 0.02 | 0.05 | 0.04 | 0.02 | 0.12 | 0.94 |
5 | 7570 | 0.04 | 0.01 | 0.02 | 0.01 | 0.01 | 0.03 | 0.32 | |
10 | 10699 | 0.03 | 0.01 | 0.01 | 0.01 | 0.00 | 0.02 | 0.16 | |
15 | 18354 | 0.02 | 0.00 | 0.01 | 0.00 | 0.00 | 0.01 | 0.11 | |
2000 | 1 | 2618 | 1.02 | 0.13 | 0.38 | 0.38 | 0.13 | 0.97 | 8.74 |
5 | 10127 | 0.26 | 0.04 | 0.10 | 0.09 | 0.04 | 0.24 | 3.42 | |
10 | 17579 | 0.15 | 0.02 | 0.06 | 0.05 | 0.03 | 0.12 | 1.65 | |
15 | 23798 | 0.11 | 0.02 | 0.04 | 0.04 | 0.01 | 0.13 | 1.11 | |
3000 | 1 | 2577 | 3.49 | 0.45 | 1.33 | 1.28 | 0.44 | 3.40 | 30.42 |
5 | 11369 | 0.79 | 0.11 | 0.28 | 0.30 | 0.11 | 0.71 | 11.76 | |
10 | 19706 | 0.46 | 0.06 | 0.19 | 0.16 | 0.05 | 0.38 | 6.16 | |
15 | 29280 | 0.31 | 0.05 | 0.12 | 0.10 | 0.04 | 0.26 | 4.28 | |
4000 | 1 | 2664 | 8.01 | 1.01 | 2.90 | 3.09 | 1.01 | 7.55 | 75.72 |
5 | 11221 | 1.90 | 0.26 | 0.68 | 0.72 | 0.24 | 1.65 | 25.73 | |
10 | 21275 | 1.00 | 0.13 | 0.39 | 0.36 | 0.12 | 0.86 | 13.95 | |
15 | 31024 | 0.69 | 0.09 | 0.28 | 0.24 | 0.08 | 0.59 | 10.46 | |
5000 | 1 | 2551 | 16.34 | 2.04 | 6.16 | 6.10 | 2.04 | 15.79 | 154.74 |
5 | 11372 | 3.66 | 0.45 | 1.37 | 1.44 | 0.40 | 3.27 | 47.76 | |
10 | 22326 | 1.87 | 0.25 | 0.78 | 0.62 | 0.22 | 1.73 | 28.13 | |
15 | 32265 | 1.29 | 0.17 | 0.53 | 0.45 | 0.14 | 1.16 | 20.95 |
n | n | Mflop/s | Times | ||||||
pr | PFTRF | in PFTRF | LAPACK | ||||||
oc | PO | TR | SY | PO | PO | PP | |||
TRF | SM | RK | TRF | TRF | TRF | ||||
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
1000 | 1 | 1587 | 0.21 | 0.03 | 0.09 | 0.07 | 0.03 | 0.19 | 1.06 |
5 | 4762 | 0.07 | 0.02 | 0.02 | 0.02 | 0.02 | 0.07 | 1.13 | |
10 | 5557 | 0.06 | 0.01 | 0.01 | 0.02 | 0.02 | 0.06 | 1.12 | |
15 | 5557 | 0.06 | 0.02 | 0.01 | 0.01 | 0.02 | 0.06 | 1.11 | |
2000 | 1 | 1668 | 1.58 | 0.22 | 0.63 | 0.52 | 0.22 | 1.45 | 11.20 |
5 | 6667 | 0.40 | 0.07 | 0.13 | 0.13 | 0.07 | 0.38 | 11.95 | |
10 | 8602 | 0.31 | 0.06 | 0.07 | 0.11 | 0.07 | 0.25 | 11.24 | |
15 | 9524 | 0.28 | 0.06 | 0.06 | 0.08 | 0.08 | 0.23 | 11.66 | |
3000 | 1 | 1819 | 4.95 | 0.62 | 1.98 | 1.72 | 0.63 | 4.86 | 45.48 |
5 | 6872 | 1.31 | 0.20 | 0.42 | 0.48 | 0.20 | 1.38 | 55.77 | |
10 | 12162 | 0.74 | 0.14 | 0.22 | 0.21 | 0.16 | 0.76 | 46.99 | |
15 | 12676 | 0.71 | 0.14 | 0.16 | 0.30 | 0.16 | 0.61 | 45.71 | |
4000 | 1 | 1823 | 11.70 | 1.52 | 4.62 | 4.01 | 1.55 | 11.86 | 112.52 |
5 | 7960 | 2.68 | 0.40 | 0.94 | 0.92 | 0.42 | 2.74 | 112.77 | |
10 | 14035 | 1.52 | 0.26 | 0.47 | 0.49 | 0.30 | 1.61 | 112.53 | |
15 | 17067 | 1.25 | 0.24 | 0.37 | 0.35 | 0.29 | 1.29 | 111.67 | |
5000 | 1 | 1843 | 22.61 | 2.92 | 8.76 | 8.00 | 2.93 | 23.60 | 218.94 |
5 | 8139 | 5.12 | 0.77 | 1.81 | 1.80 | 0.74 | 5.45 | 221.58 | |
10 | 14318 | 2.91 | 0.50 | 0.97 | 0.93 | 0.51 | 3.11 | 214.54 | |
15 | 17960 | 2.32 | 0.45 | 0.72 | 0.68 | 0.47 | 2.40 | 225.08 |
factorization | inversion | solution | ||||
---|---|---|---|---|---|---|
PF/PO | PF/PP | PF/PO | PF/PP | PF/PO | PF/PP | |
50 | 0.99 | 1.47 | 1.23 | 1.29 | 1.10 | 3.43 |
100 | 1.20 | 1.87 | 1.03 | 1.66 | 1.04 | 2.98 |
200 | 0.97 | 2.55 | 0.97 | 2.11 | 1.12 | 3.04 |
400 | 1.02 | 2.88 | 1.05 | 2.50 | 1.08 | 3.00 |
500 | 0.99 | 2.91 | 0.95 | 2.53 | 1.09 | 3.02 |
800 | 0.99 | 3.25 | 0.98 | 2.95 | 1.15 | 3.95 |
1000 | 0.98 | 3.95 | 0.98 | 3.77 | 1.37 | 5.08 |
1600 | 0.99 | 4.46 | 1.07 | 4.43 | 1.19 | 6.46 |
2000 | 0.97 | 4.67 | 0.99 | 4.73 | 1.20 | 6.84 |
4000 | 0.91 | 5.82 | 1.04 | 6.69 | 1.77 | 16.05 |
factorization | inversion | solution | ||||
---|---|---|---|---|---|---|
PF/PO | PF/PP | PF/PO | PF/PP | PF/PO | PF/PP | |
50 | 1.16 | 1.31 | 1.26 | 1.23 | 0.92 | 1.92 |
100 | 1.01 | 1.32 | 1.09 | 1.53 | 1.02 | 1.84 |
200 | 1.01 | 1.41 | 0.98 | 1.60 | 1.08 | 1.76 |
400 | 0.97 | 1.45 | 1.01 | 1.56 | 1.06 | 1.68 |
500 | 0.97 | 1.47 | 1.00 | 1.62 | 1.06 | 1.76 |
800 | 0.98 | 2.19 | 1.01 | 2.20 | 1.09 | 3.17 |
1000 | 0.98 | 2.35 | 1.00 | 2.37 | 1.19 | 3.64 |
1600 | 0.97 | 2.41 | 0.94 | 2.50 | 1.12 | 4.13 |
2000 | 0.96 | 2.47 | 0.90 | 2.58 | 1.17 | 4.43 |
4000 | 0.99 | 4.11 | 0.95 | 5.31 | 1.21 | 13.39 |
factorization | inversion | solution | ||||
---|---|---|---|---|---|---|
PF/PO | PF/PP | PF/PO | PF/PP | PF/PO | PF/PP | |
50 | 1.34 | 2.18 | 1.21 | 1.97 | 0.78 | 5.15 |
100 | 1.05 | 2.32 | 1.11 | 2.30 | 0.87 | 4.48 |
200 | 1.05 | 2.58 | 1.05 | 2.56 | 0.97 | 3.61 |
400 | 1.00 | 3.96 | 1.07 | 2.65 | 1.02 | 3.13 |
500 | 0.97 | 4.44 | 0.99 | 2.61 | 0.97 | 2.90 |
800 | 0.95 | 4.77 | 1.01 | 2.42 | 1.18 | 2.69 |
1000 | 1.07 | 4.96 | 1.05 | 2.45 | 1.03 | 2.25 |
1600 | 1.14 | 22.76 | 1.05 | 3.36 | 1.13 | 8.71 |
2000 | 1.08 | 38.20 | 1.28 | 21.14 | 1.05 | 11.83 |
4000 | 0.93 | 43.12 | 1.18 | 23.77 | 1.01 | 23.92 |
factorization | inversion | solution | ||||
---|---|---|---|---|---|---|
PF/PO | PF/PP | PF/PO | PF/PP | PF/PO | PF/PP | |
50 | 0.99 | 1.01 | 1.06 | 1.22 | 0.88 | 1.98 |
100 | 1.02 | 1.24 | 1.12 | 1.32 | 0.95 | 1.92 |
200 | 1.05 | 1.43 | 1.03 | 1.48 | 1.08 | 1.82 |
400 | 1.10 | 1.51 | 1.03 | 1.47 | 0.98 | 1.68 |
500 | 1.03 | 1.50 | 1.04 | 1.46 | 1.02 | 1.67 |
800 | 1.03 | 1.59 | 1.09 | 1.50 | 1.00 | 1.71 |
1000 | 1.04 | 3.28 | 1.25 | 5.35 | 1.02 | 2.52 |
1600 | 1.27 | 7.07 | 1.60 | 3.84 | 1.06 | 3.69 |
2000 | 1.02 | 12.39 | 1.03 | 12.79 | 0.98 | 6.04 |
4000 | 1.02 | 22.58 | 1.23 | 8.67 | 1.00 | 11.89 |
factorization | inversion | solution | ||||
---|---|---|---|---|---|---|
PF/PO | PF/PP | PF/PO | PF/PP | PF/PO | PF/PP | |
50 | 0.71 | 1.47 | 0.76 | 1.30 | 0.80 | 3.34 |
100 | 0.99 | 2.10 | 0.75 | 1.62 | 0.85 | 2.94 |
200 | 1.07 | 2.40 | 0.93 | 2.02 | 1.00 | 2.72 |
400 | 1.02 | 3.18 | 0.93 | 2.35 | 0.99 | 4.05 |
500 | 0.99 | 3.19 | 0.92 | 2.41 | 1.05 | 4.17 |
800 | 1.05 | 3.72 | 0.99 | 2.95 | 1.01 | 5.63 |
1000 | 1.03 | 5.90 | 0.98 | 3.93 | 1.02 | 9.88 |
1600 | 1.00 | 9.09 | 1.02 | 6.24 | 1.00 | 21.38 |
2000 | 1.01 | 10.89 | 1.00 | 6.40 | 0.96 | 34.23 |
4000 | 0.99 | 10.08 | 1.05 | 6.44 | 0.99 | 29.77 |
factorization | inversion | solution | ||||
---|---|---|---|---|---|---|
PF/PO | PF/PP | PF/PO | PF/PP | PF/PO | PF/PP | |
50 | 0.62 | 0.95 | 1.03 | 1.67 | 0.46 | 10.10 |
100 | 0.75 | 1.34 | 1.38 | 2.23 | 0.68 | 12.15 |
200 | 1.24 | 1.62 | 1.50 | 2.37 | 1.02 | 12.08 |
400 | 1.18 | 1.98 | 1.07 | 2.57 | 1.02 | 8.38 |
500 | 1.09 | 2.16 | 1.02 | 2.60 | 0.99 | 7.11 |
800 | 1.16 | 2.15 | 1.11 | 3.18 | 1.15 | 5.66 |
1000 | 1.01 | 2.31 | 1.02 | 3.24 | 1.00 | 4.96 |
1600 | 1.06 | 2.42 | 0.98 | 3.14 | 1.00 | 3.93 |
2000 | 1.01 | 2.61 | 1.00 | 3.24 | 1.02 | 3.94 |
4000 | 1.00 | 2.77 | 0.99 | 3.28 | 1.03 | 3.26 |
factorization | inversion | solution | ||||
---|---|---|---|---|---|---|
PF/PO | PF/PP | PF/PO | PF/PP | PF/PO | PF/PP | |
2000 | 1.10 | 29.11 | 1.68 | 33.63 | 1.08 | 0.97 |
4000 | 1.09 | 56.58 | 2.16 | 53.65 | 1.10 | 0.98 |
6000 | 1.14 | 63.45 | 2.17 | 54.73 | 1.16 | 0.96 |
8000 | 1.25 | – | 2.28 | – | 1.23 | – |
10000 | 1.23 | – | 2.37 | – | 1.24 | – |
12000 | 1.17 | – | 2.34 | – | 1.22 | – |
14000 | 1.21 | – | 2.42 | – | 1.18 | – |
16000 | 1.22 | – | 2.47 | – | 1.24 | – |
18000 | 1.17 | – | 2.45 | – | 1.17 | – |
20000 | 1.19 | – | 2.48 | – | 1.17 | – |
factorization | inversion | solution | ||||
---|---|---|---|---|---|---|
PF/PO | PF/PP | PF/PO | PF/PP | PF/PO | PF/PP | |
2000 | 0.92 | 1.52 | 1.71 | 34.92 | 1.31 | 0.82 |
4000 | 0.97 | 2.04 | 2.18 | 54.24 | 1.33 | 0.98 |
6000 | 0.99 | 1.83 | 2.21 | – | 1.37 | 1.08 |
8000 | 1.02 | 1.83 | 2.30 | – | 1.35 | 1.25 |
10000 | 1.04 | 1.87 | 2.43 | – | 1.35 | 1.43 |
12000 | 1.01 | 1.72 | 2.34 | – | 1.33 | 1.52 |
14000 | 1.04 | 1.83 | 2.43 | – | 1.28 | 1.67 |
16000 | 1.07 | 1.76 | 2.50 | – | 1.30 | 1.80 |
18000 | 1.03 | 1.70 | 2.45 | – | 1.29 | 1.87 |
20000 | 1.05 | 1.82 | 2.48 | – | 1.28 | 2.04 |