Efficient numerical computation of the Pfaffian for dense and banded skewsymmetric matrices
Abstract
Computing the Pfaffian of a skewsymmetric matrix is a problem that arises in various fields of physics. Both computing the Pfaffian and a related problem, computing the canonical form of a skewsymmetric matrix under unitary congruence, can be solved easily once the skewsymmetric matrix has been reduced to skewsymmetric tridiagonal form. We develop efficient numerical methods for computing this tridiagonal form based on Gauss transformations, using a skewsymmetric, blocked form of the ParlettReid algorithm, or based on unitary transformations, using block Householder transformations and Givens rotations, that are applicable to dense and banded matrices, respectively. We also give a complete and fully optimized implementation of these algorithms in Fortran, and also provide Python, Matlab and Mathematica implementations for convenience. Finally, we apply these methods to compute the topological charge of a class D nanowire, and show numerically the equivalence of definitions based on the Hamiltonian and the scattering matrix.
pacs:
02.10.Yn, 02.60.Dc, 03.65.VfI Introduction
i.1 Pfaffians and reduction to tridiagonal form
A real or complex matrix is called skewsymmetric (or antisymmetric), if , where denotes the transpose. The determinant of such a skewsymmetric matrix is the square of a polynomial in the matrix entries, the Pfaffian :
(1) 
In other words, the Pfaffian of a skewsymmetric matrix is a unique choice for the sign of the root of the determinant:
(2) 
Pfaffians arise in various fields of physics, such as in the definition of topological charges Kitaev2001 (); Fu2007a (); Fulga2011 (), electronic structure quantum Monte Carlo Bajdich2009 (), the twodimensional Ising spin glass Thomas2009 (), or in the definition of a trial wave function for the fractional quantum Hall state Moore1991 (). It also arises naturally from Gaussian Grassmann integration, and as such finds applications for example in quantum chaos Haake2004 () or lattice quantum field theory Montvay1996 ().
The Pfaffian for a skewsymmetric matrix is defined as
(3) 
where is the group of permutations of sets with elements. The Pfaffian of an odddimensional matrix is defined to be zero, as in this case also (). While Eq. (3) can be used to compute the Pfaffian directly for small matrices, its computational cost is prohibitively expensive for larger matrices.
Analogous to the numeric computation of the determinant, a promising strategy is thus to find a transformation of the original matrix into a form that allows an easier evaluation of the Pfaffian. Particularly useful in this context is the recursive definition of the Pfaffian,
(4) 
where is the matrix A without the rows and columns 1 and . (Note that the Pfaffian of a matrix is defined as 1). Further, for an arbitrary real or complex matrix ,
(5) 
From the recursive definition of the Pfaffian (4) it is obvious that the Pfaffian of a skewsymmetric tridiagonal matrix
(6) 
is given as
(7) 
Furthermore, a closer inspection of Eq. (4) shows that also a matrix that has only a partial tridiagonal form with only for odd and (i.e. a matrix that would be tridiagonal, if every even row and column would be removed),
(8) 
allows for an easy evaluation of the Pfaffian, as . Our goal is therefore to find for a skewsymmetric matrix a suitable transformation such that
(9) 
with tridiagonal or tridiagonal in every odd row and column.
It has been known for a while that the Pfaffian of a skewsymmetric matrix can be computed in time, using a skewsymmetric form of Gaussian elimination (adding multiples of rows and columns in a symmetric fashion) Galbiati1994 (); Rote2001 (); Krauth2006 (); Bajdich2009 (). Such an skewsymmetric Gaussian elimination computes a factorization of the matrix in the form (9) with where is a permutation matrix and a unit lower triangular matrix. For brevity, we will refer to this type of decomposition as decomposition. Gaussian elimination requires pivoting for numerical stability, hence the need for the permutation . Below, we will formulate this approach in a way that allows for an efficient computer implementation.
Another Gaussian based elimination technique is the decomposition where is reduced to , a matrix with only skewsymmetric blocks on the diagonal Bunch1982 (); Higham2002 (). This approach has also been suggested for computing the Pfaffian recently Thomas2009 (); Rubow2011 (). We will not persue this approach here, but show that the decomposition allows for computing the Pfaffian in the same number of operations and can be formulated more easily to use level3 matrix operations.
As an alternative to the Gaussian elimination based techniques, we also develop algorithms using unitary (orthogonal in the real case) transformations that are also known to allow for a stable numerical computation in for dense matrices. This approach doe not require pivoting for numerical stability and can more easily make use of the bandedness of a matrix. We will describe how to compute a unitary matrix such in order to tridiagonalize (either fully or partially) ,
(10) 
or equivalently , where denotes the Hermitian conjugate and complex conjugation. Note that such a unitary congruence transformation is for the complex case quite different from the usual unitary similarity transformations usually encounters, which are of the form . In the real case, the transformation reduces to the usual orthogonal similarity transformation.
i.2 Tridiagonalization and the canonical form of skewsymmetric matrices
Apart from computing allowing for an efficient computation of the Pfaffian, the tridiagonal form of a skewsymmetric matrix under unitary congruence is also relevant for computing the canonical form of this matrix.
A skewsymmetric matrix has a particularly simple canonical form under a unitary congruence transformation. For every skewsymmetric matrix there exists a unitary matrix such that Hua1944 (); Stander1960 ()
(11) 
where , denotes the direct sum, and
(12) 
This canonical form has been used in the physics context for example to prove the Kramer’s degeneracy of transmission eigenvalues Bardarson2008 () and the degeneracy of Andreev reflection eigenvalues Beri2009 ().
The problem of computing the canonical form of an evendimensional skewsymmetric tridiagonal matrix has been discussed in Ward1978a (); Golub1996 (); Hastings2010 (), the reduction of the problem with on odddimensional matrix to the evendimensional case in Ward1978a (). For a skewsymmetric tridiagonal matrix as defined Eq. (6), the values of , are given by the nonzero singular values of the bidiagonal matrix
(13) 
For completeness, we give details and a selfcontained derivation in appendix A.
The canonical form of a skewsymmetric matrix under unitary congruence is also connected to certain eigenvalue problems: In the real case, the eigenvalues of are given by . In the complex case, the matrix has doubly degenerate eigenvalues .
i.3 Skewsymmetric tridiagonalization and existing approaches
Both the computation of the Pfaffian and of the canonical form are ultimately connected to the problem of tridiagonalizing a skewsymmetric matrix. Here we give an overview of existing solutions (with implementations) that could be used to solve parts of the problem, and discuss the need for a new comprehensive implementation.
For real skewsymmetric matrices, the unitary congruence transformation reduces to an ordinary orthogonal similarity transformation and hence established decompositions can be used Golub1996 (): The Hessenberg decomposition of a skewsymmetric matrix reduces to tridiagonal form (6), and the real Schur decomposition to the canonical form (11) (implemented, for example in LAPACK Lapack ()). However, none of these decompositions make use of the structure of the problem which would be desirable for precision and speed, nor can they be used for complex skewsymmetric matrices.
Ward and Gray have developed and implemented algorithms to compute the tridiagonal form and the eigenvalues (and as an intermediate step, the canonical form) of a real dense, skewsymmetric matrix, making use of the structure of the problem Ward1978b (). A complex version is however not available.
The accompanying Matlab code HighamMCT () to Higham2002 () contains a skewsymmetric decomposition that can be used to compute Pfaffians, but according to the authors is not designed for efficiency.
Very recently, GonzálezBallestero, Robledo and Bertsch have developed a library for the numerical computation of the Pfaffian of a dense skewsymmetric matrix GonzalezBallestero2010 (), but do not give access to the transformation matrix (e.g. needed for computing the canonical form). They present algorithms based on a decomposition (called Aitken block diagonalization in GonzalezBallestero2010 ()) and on Householder tridiagonalization. However, their approach does not make use of the full symmetry of the problem.
None of the existing approaches (with the exception of LAPACK that does not exploit the skewsymmetry of the problem) makes use of block algorithms that are rich in level3 operations and desirable for a more favorable memory access pattern. Below we will show that such block algorithms can give rise to a considerable increase in speed.
Moreover, none of the above approaches makes use of the sparsity of a banded matrix, a structure that however often arises in practice. Below we will also consider this case in particular.
The goal of this work is thus to develop and implement algorithms for tridiagonalizing a real or complex skewsymmetric matrix, making use of the skewsymmetry and possibly the bandedness of the matrix.
i.4 Outline
The remainder of the paper is organized as follows. In Sec. II we discuss algorithms to tridiagonalize a dense or banded skewsymmetric matrix using Gauss transformations, Householder reflections and Givens rotations. Further, in Sec. III we discuss the details of our implementation, and present benchmarks and an exemplary application in Sec. IV. In the appendix, we give a selfcontained derivation on the computation of the canonical form of a tridiagonal, skewsymmetric matrix. Moreover, we discuss blocked versions of our tridiagonalization algorithms for dense matrices and give technical details about the Fortran implementation.
Ii Skewsymmetric numerical tridiagonalization
ii.1 Statement of the problem
Summarizing the discussion above, for a given skewsymmetric matrix we seek a (invertible) transformation such that with in tridiagonal form tridiagonal (or in partial tridiagonal for). Below we consider first an algorithm for dense matrices based on Gauss transformations requiring pivoting. Then we focus on algorithms bases on unitary transformations where we consider both dense and banded matrices. The discussion is presented for the case of complex matrices, but it carries over to the real case unchanged.
ii.2 decomposition of dense matrices using the ParlettReid algorithm
For symmetric or Hermitian matrices there exist efficient algorithms to compute a or decomposition (for an overview, see Golub1996 ()). It has been shown by Bunch that those decompositions can in principle also be generalized and computed stably for skewsymmetric matrices Bunch1982 (). Below we reformulate the algorithm for the decomposition of a symmetric matrix due to Parlett and Reid Parlett1970 () such that it is suitable for skewsymmetric matrices. The ParlettReid algorithm is usually not the method of choice in the symmetric case, as there are more efficient alternatives Aasen1971 (); Bunch1977 (). However, as we will discuss below, the ParlettReid algorithm can be used to compute the Pfaffian just as effective.
A matrix of the form
(14) 
where is the identity matrix and the th unit vector in , is called a Gauss transformation if the first entries of are zero. Given a vector and taking , can be used to eliminate the entries in , , provided that .
A Gauss transformation can thus be used to zero the entries in a column or row of below a chosen point . In order to avoid divisions by a small number or zero, a permutation interchanging entry with another nonzero, typically the largest entry in is performed. The numerical stability of this pivoting strategy is discussed in Bunch1982 ().
Hence, a series of Gauss transformations and permutations can be used to tridiagonalize a skewsymmetric matrix . To demonstrate the mechanism, assume that after applying Gauss transformations and permutations, the matrix is already in tridiagonal form in the first columns and rows and hence has the form
(15) 
with , , , , and , (transformations of the form maintain skewsymmetry). Now choose a permutation matrix such that the maximal entry in is permuted to the top, i.e. where . If the maximal element at this step is zero, and is already tridiagonal in the first columns and we set . Otherwise, we take and with with . Then we obtain
(16) 
Then and is tridiagonal in its first rows and columns. Defining we find
(17) 
The crossterm vanishes due to the skewsymmetry of . Note that in this skewsymmetric outer product update, the matrix remains actually unchanged in the first column and row due to the structure of and . The outer product update is dominating the computational cost of each step and can be computed in flops, if the symmetry is fully accounted for. Fig. 1 shows the structure of a tridiagonalization step schematically for a particular example.
After steps, the decomposition can be written as
(18) 
with permutation , skewsymmetric tridiagonal , and lower unit triangular matrix
(19) 
As in the symmetric ParlettReid algorithm, the first column of is , and the th column below the diagonal a permuted version of .
The computation of the updated matrix, Eq. (17), is a level2 matrix operation. It is possible to regroup these updates in a way that allows to operate with level3 matrix operations that have a more favorable memory access pattern. The details of this block version of the ParlettReid algorithm are given in appendix B.
The full skewsymmetric decomposition can be computed in flops. It is however readily generalized to compute only a partial tridiagonal form as in Eq. (8) by skipping every other row and column elimination. This partial decomposition can thus be computed in flops. Since and can be computed in steps, computing the Pfaffian of a skewsymmetric matrix with the ParlettReid algorithm thus requires a total of flops. This is a factor of 10 less than the unsymmetric Hessenberg decomposition.
For computing a full tridiagonalization, the ParlettReid algorithm requires twice as many flops as other approaches: Aasen’s algorithm Aasen1971 () computes a (complete) decomposition using a different order of operations in flops, as does the BunchKaufmann algorithm Bunch1977 () for computing a (complete) decomposition. Both algorithms can be generalized to the skewsymmetric case. Given the fact that computing the Pfaffian requires less information than a full tridiagonalization, it might seem feasible to compute the Pfaffian in flops. However, neither Aasen’s algorithm (which is based on the fact that is upper Hessenberg and hence fully tridiagonal), nor the BunchKaufmann algorithm (which relies on the blockdiagonal structure of ) are easily amended to compute a suitable partial factorization. Thus, for computing the Pfaffian, the ParlettReid algorithm is competitive. It remains an open question if it is possible to compute the Pfaffian of a dense skewsymmetric matrix in less than flops.
ii.3 Tridiagonalization of dense matrices with Householder reflections
Dense symmetric or Hermitian matrices are commonly reduced to tridiagonal form by Householder transformations Golub1996 (), and we adopt this approach to the skewsymmetric case here.
An order Householder transformation is a matrix of the form
where and chosen such that
for a given . Here denotes the norm of a vector, is the first unit vector in and . For example, , if one chooses when , but there is a certain degree of freedom in choosing the Householder vector which can be exploited to maximize stability (for an overview, see lawn72 ()). Note that is unitary (though not necessarily Hermitian) and can also be numerically calculated such that it is unitary up to machine precision Golub1996 ().
Householder transformations can thus be applied to a matrix to zero all the elements of a column (or row) below a chosen point, just as Gauss transformations, but without the need for pivoting. As a consequence, the structure of the tridiagonalization procedure is analogous to the decomposition. Assume that after step the matrix is already tridiagonal in the first columns and rows and partitioned as defined in Eq. (15). Then an order Householder matrix is chosen such that and the full transformation is set to . Writing and defining we find
(20) 
The main difference to the decomposition is the fact that the computation of now involves a full matrixvector multiplication. Hence, the total computational cost of the outer product update in Eq. (20) is flops. The structure of a Householder tridiagonalization step is also shown schematically in Fig. 1. The outer product updates of Eq. (20) can be rearranged to increase the fraction of level3 matrix operations. The block version of the Householder algorithm is detailed in App. B.
Complete tridiagonalization with Householder matrices requires in total flops. This can reduced to for computing the Pfaffian by skipping every other row/column elimination to compute only a partial tridiagonal form.
For the computation of the Pfaffian we also need to compute the determinant of the transformation matrix . The determinant of a single Householder transformation is given as
(21) 
For the particular choice , , i.e. is a reflection, but other choices of are equally viable. In particular, if the matrix is already tridiagonal in certain column and row (which can happen frequently for very structured matrices), it is advantageous to use . Moreover, any complex skewsymmetric matrix may be reduced to a purely real tridiagonal matrix using appropriate Householder transformations with complex lawn72 (). Because of this, the determinant of each Householder reflection must be computed separately. The task of computing still only scales as and is thus negligible compared to the tridiagonalization cost.
In summary, for computing the Pfaffian Householder tridiagonalization is twice as costly as the ParlettReid algorithm and thus usually not competitive. It has however a right on its own given its connection to computing the canonical form of a skewsymmetric matrix.
ii.4 Tridiagonalization of band matrices with Givens rotations
The dense algorithms of the previous two sections are not easily amended to matrices with a finite band width. In the case of the ParlettReid algorithm, the symmetric pivoting can lead to an uncontrolled growth of the band width depending on the details of the matrix. In the Householder tridiagonalization, the outer product matrix update always introduces values outside the band, leading to a fastgrowing band width.
For symmetric matrices, or decomposition algorithms respecting the band width have only been introduced recently Irony2006 (); Kaufman2007 (). In contrast, banded tridiagonalization with unitary transformations is well established for symmetric matrices, and we adopt this approach for the skewsymmetric case below.
Instead of zeroing a whole column or row as is done in the Householder approach, for banded matrices it is of advantage to use a more selective approach. The method of choice for this case in the symmetric or Hermitian case are Givens rotations Schwarz1968 (), and we will extend this approach to th skewsymmetric case. A Givens rotation is a modification of the identity matrix that is only different in the th and th row and column. It is defined as
(22) 
with , and and thus clearly unitary. Choosing and such that
(23) 
it is possible to selectively zero one element of a vector. Again, a Givens rotation can be computed numerically such that it is orthogonal up to machine precision.
A banded skewsymmetric matrix can be brought into tridiagonal form by Givens rotations of the form . The structure induced in the process of applying from the left and right is shown schematically in fig. 2. Applying a Givens rotation () from the left (right) only modifies the th and th rows (columns). Due to the skewsymmetry, if zeroes the entry in column , zeroes the entry in row . Furthermore, each Givens rotation only introduces at most one additional nonzero entry outside the band in a row and column . This nonzero entry can thus be moved further down the band by a sequence of Givens transformations until it is “chased” beyond the end of the matrix.
The structure of the skewsymmetric tridiagonalization routine is thus identical to the symmetric or Hermitian case. The main difference is in the update of the diagonal block that is affected by both Givens rotations from left and right: Due to the skewsymmetry, the diagonal blocks are invariant under these transformations,
(24) 
Kaufman Kaufman1984 (); Kaufman2000 () has presented a variant of the symmetric band matrix approach of Ref. Schwarz1968 () that allows to work on more data in a single operation, which allows a more favorable memory access pattern. These modifications carry over unchanged to the skewsymmetric case.
The tridiagonalization of an skewsymmetric matrix with bandwidth using Givens transformations scales as .
The determinant of any single Givens rotation , and thus the determinant of the full transformation , too. In the complex case the resulting tridiagonal matrix can be chosen to be purely real, in this case the determinant of total unitary transformations (the Givens transformations and row/columnsscalings with a phase factor) obey .
Iii Notes on the implementation
iii.1 Fortran
We have implemented the algorithms described in this manuscript as a comprehensive set of Fortran routines for real and complex variables as well as single and double precision. Because of the conceptional similarity of the skewsymmetric problem to the symmetric and Hermitian problem, these routines are designed analogous to to the corresponding symmetric and Hermitian counterparts in LAPACK. Moreover, our implementation also makes use of the LAPACK framework for computing, applying, and accumulating Householder and Givens transformations, which was designed for numerical stability and which is available in an optimized form for any common computer architecture.
Dense skewsymmetric matrices are stored as ordinary twodimensional Fortran matrices, but only the strictly lower or upper triangle needs to be set (for differences in the implementation between lower and upper triangular storage see App. C). For banded skewsymmetric matrices, only the strictly upper or lower diagonals are stored in a array , where is the number of nonzero offdiagonals and the size of the matrix. The th column of the matrix is stored in the th column of as

for , if the upper triangle is stored,

for , if the lower triangle is stored.
Note that in this scheme, also the zero diagonal is explicitly stored. This was done to keep the design identical to the storage scheme of symmetric and Hermitian band matrices in LAPACK.
SKTRF  Skewsymmetric tridiagonal decomposition of a dense matrix using the block ParlettReid algorithm. 

SKTF2  Skewsymmetric tridiagonal decomposition of a dense matrix using the ParlettReid algorithm (unblocked version). 
SKTRD  Skewsymmetric tridiagonalization of a dense matrix using block Householder reflections. 
SKTD2  Skewsymmetric tridiagonalization of a dense matrix using Householder reflections (unblocked version). 
SKPFA  Compute the Pfaffian of a dense skewsymmetric matrix (makes use of either SKTRD or SKTRF). 
SKPF10  Compute the Pfaffian of a dense skewsymmetric matrix (makes use of either SKTRD or SKTRF). The result is returned as to avoid over or underflow. 
SKBTRD  Skewsymmetric tridiagonalization of a banded matrix using Givens rotations. 
SKBPFA  Compute the Pfaffian of a banded skewsymmetric matrix (makes use of SKBTRD). 
SKPF10  Compute the Pfaffian of a banded skewsymmetric matrix (makes use of SKBTRD).The result is returned as to avoid over or underflow. 
Our library includes standalone routines for the tridiagonalization of a skewsymmetric dense matrix (SKTRF and SKTF2 using the ParlettReid algorithm, SKTRD and SKTD2 using the Householder approach) and banded matrices (SKBTRD). We also include functions to compute the Pfaffian of a skewsymmetric dense (SKPFA and SKPF10) and banded matrices (SKBPFA and SKBPF10), which are based on the tridiagonalization functions. As the determinant, the Pfaffian of a large matrix is prone to floating point over or underflow. Because of that, we have included routines that return the Pfaffian in the form , where is real or complex, and is always real and integer (SKPFA10 and SKBPF10). Both a Fortran95 and a Fortran77 interface are provided. In the Fortran77 version of the code the routine name is preceded by either S (single precision), D (double precision), C (complex single precision), or Z (complex double precision). The computational routines and their purpose are summarized in Table 1.
The block versions of the algorithm have an internal parameter controlling the block size. By default, the routines use the same block sizes as their symmetric counterpart from the LAPACK library. However, this internal parameter may be changed by the user to optimize for a specific architecture.
Apart from the documentation here, all routines (including the auxiliary ones) are documented extensively in the respective files. Due to our routines using LAPACK and BLAS, both libraries must be also linked.
iii.2 Python, Matlab and Mathematica
While most compiled languages (including C and C++) are easily interfaced with a Fortran library, interpreted languages such as Python or programs such as Matlab or Mathematica require somewhat more effort. For this reason we have included standalone versions of the tridiagonalization of dense skewsymmetric matrices using Householder reflections implemented in Python, Matlab and Mathematica. Those implementations, being of course slower than the Fortran counterpart, are useful especially for situations where speed is not critical. Both implementations also make use of the fact that for computing the Pfaffian, only the odd rows and columns need to be tridiagonalized, but always work on the full matrix instead of a single triangle.
Again, more extensive documentation for both implementations may be found in the respective files.
Iv Examples
iv.1 Benchmarks
To demonstrate the effectiveness of our methods, we have performed benchmark computations of the Pfaffian of large, random matrices on various architectures. In Table 2 we compare our approach with the other available software that can also be used to calculate the Pfaffian in certain situations (see Sec. I.3). For this benchmark we have compiled our Fortran implementation, and the implementations of Refs. Ward1978b () and GonzalezBallestero2010 () using the same compiler and compilation options, and chose a machineoptimized version of the LAPACK library Lapack ().
Block ParlettReid  Regular ParlettReid  Block Householder  Regular Householder  Givens for band matrix  DGEHRD (Lapack) Lapack ()  TRIZD from Ward1978b ()  PfaffianH from GonzalezBallestero2010 ()  PfaffianF from GonzalezBallestero2010 ()  
(a) benchmark for AMD Opteron 6174 2.2 Ghz  
real , dense  5.1  5.9  9.4  10.5    24.7  383.4  54.5  120.8 
real , banded,  0.9  0.7  9.3  10.5  2.1  24.8  383.2  54.3  121.7 
complex , dense  3.5  4.4  7.6  8.2        32.2  50.1 
complex , banded  0.8  0.7  7.6  8.1  2.0      31.9  50.2 
(b) benchmark for Intel Core 2 Duo E8135 2.66 Ghz  
real , dense  3.5  8.3  8.4  12.4    30.7  105.3  76.3  49.4 
real , banded,  0.7  0.5  8.4  12.2  1.4  30.4  105.4  75.9  48.5 
complex , dense  3.5  5.0  7.5  8.3        48.3  28.3 
complex , banded,  0.8  0.7  7.5  8.2  3.0      49.3  26.3 
From the benchmark results we can see that the block approach is always faster than the unblocked version. The relative speedup depends strongly on the architecture, but can reach up to 60%. We also observe that the relative speedup of the ParlettReid algorithm is larger than of the Householder tridiagonalization, reflecting the fact that the level3 content of the former is larger (see App. B).
For the banded random matrices we observe that the ParlettReid algorithm performs surprisingly well. Although it is not designed to make use of the bandedness of the matrix, the implementation of the skewsymmetric outer product update takes into account zeros in the vectors of the update. The Householder tridiagonalization does not benefit as much, as for the matrices here the band width growth in the Householder approach is faster than in the ParlettReid algorithm. It should be stressed however that the performance of these algorithms in the banded case depends on the actual values of the matrix. For example, band width growth is stronger in the ParlettReid algorithm, if the largest entries sit at the edge of the band.
The specialized approach for banded matrices using Givens transformations is still slightly slower than the ParlettReid algorithm for the matrix sizes considered here. The main benefit of the specialized algorithm for Pfaffian calculations is hence its much lower memory requirement. In fact, typically memory limits the matrix sizes that can be handled, not computational time. The banded Givensbased approach however is considerably faster than the Householder tridiagonalization which makes it very attractive for computing the canonical form (or eigenvalues in the real case).
Comparing to other approaches, we observe that our routines are always faster, typically by a factor of about 10 or more. In particular in the real case, our specialized approach is considerably faster than using the real Hessenberg reduction, although we do not always reach the full speedup of a factor of 10 that can be expected from the operation count, which is due to the optimization of the LAPACK library used. The implementation for real matrices of Ref. Ward1978b () is particularly slow as its memory access pattern is somewhat unfavorable for modern computer architectures.
iv.2 Application: topological charge of a disordered nanowire
Finally, we apply our approach to computing the Pfaffian to a physical example, namely the numerical computation of the topological charge of a disordered nanowire.
A nanowire made out of a topological superconductor supports at its two ends Andreev bound states pinned at the Fermi energy Kitaev2001 (); Wimmer2010 (); Lutchyn2010 (); Oreg2010 (); Potter2010 (). Because of particlehole symmetry, those states are Majorana fermions – particles that are their own antiparticle – and may allow for topologically protected quantum computing Kitaev2001 (). In contrast, an ordinary (trivial) superconducting wire does not support such states. The recent proposal to realize a topological superconductor using ordinary semiconducting and superconducting materials Lutchyn2010 (); Oreg2010 (); Potter2010 () has stirred a lot of interest towards Majorana physics in condensed matter.
A topological charge is a quantity that indicates whether a system is in the trivial or topological state, and hence signifies the absence or existence of Majorana bound states. A superconducting system exhibits particlehole symmetry which allows the Hamiltonian to be written in the form Kitaev2001 ()
(25) 
where is a skewsymmetric matrix, and , Majorana operators with , and . Below we further specialize to the case where particlehole symmetry is the only remaining symmetry, i.e. broken timereversal and spinrotation symmetry, which puts the system into class D of the general symmetry classification scheme Evers2008 ().
Kitaev has shown that for a translationally invariant wire,
(26) 
is a topological charge that signifies the absence () or existence () of Majorana bound states at the ends. In this expression, is the Hamiltonian in (Bloch) momentum space written in the Majorana basis. is a matrix with a size corresponding to the size of the unit cell, the unit cell length is denoted by . Note that the Pfaffian needs only be evaluated at two values of momenta which correspond to closing the unit cell with periodic () and antiperiodic () boundary conditions.
For a clean system, the size of is , where is the width of the wire, and Eq. (26) has been used previously to compute the topological charge Kitaev2001 (); Lutchyn2010 (). A disordered system can be considered (up to finite size effects, see below) in Eq. (26) as a large, disordered supercell repeated periodically. In this case, the size of is , where is the length of the supercell. This implies that in a disordered system will be a very large matrix, and we are not aware of any application of Eq. (26) for such as system. However, the sparse structure of , in particular its bandedness, allows the application of the special algorithms developed in this work, and allows for the first time applying (26) to large, disordered systems.
Recently, an alternative definition of the topological charge for class D systems has been shown Akhmerov2011 (). In contrast to Eq. (26) which is based on properties of the Hamiltonian, this alternative definition is based on transport properties:
(27) 
where is the reflection matrix. This definition is equally applicable to clean and disordered systems. Below we show numerically the equivalence of the definitions (26) and (27).
For this we use the model of Refs. Lutchyn2010 (); Oreg2010 () with a normal state Rashba Hamiltonian
(28) 
where is the effective mass of the twodimensional electron gas, the Rashba spinorbit coupling, and the Zeeman splitting due to an external magnetic field. Characteristic length and energy scales are and . The electrons are then confined into a nanowire of width and length in the plane. For the numerical treatment, the Hamiltonian is discretized on a square grid with lattice constant and thus represented by a matrix , where , denote lattice sites, and , the spin degrees of freedom. Disorder is introduced as a random onsite potential taken from the uniform distribution . The Hamiltonian of the system in contact with a swave superconductor then reads
(29) 
where is the proximityinduced pair potential. Defining Majorana operators as and we can transform Eq. (29) into the form (25) with a skewsymmetric matrix (whose bandwidth is reduced for the numerics with a bandwidth reduction algorithm Gibbs1976 ()).
In Fig. 3 we show the numerical results for the topological charge as defined in Eqs. (26) and (27). For the computation of the Pfaffian in (26) we apply the Givens based method from Sec. II.4, for computing the reflection matrix the numerical method of Ref. Wimmer2009 ().
In all cases, clean and disordered, both definitions of agree very well. In particular, both definitions predict a vanishing of the topological phase for the largest disorder in the region . There are only very small differences in the exact value of where the topological charges change sign. These differences can be explained by finite size effects: At these points the bulk of the nanowire has a significant conductance () which in turn means that the different geometry of Eqs. (26) (infinite repetition of a supercell) and (27) (single supercell connected to metallic leads) matter. In contrast, in the regime where the bulk of the wire is fully insulating (), both definitions agree fully.
The algorithmic developments in this work have allowed to evaluate Eq. (26) for a fairly large disordered system. The bandwidth of the respective skewsymmetric matrix scales , and hence the cost of tridiagonalization as . In contrast, the definition of the topological charge from transport properties (27) scales Wimmer2009 (). Hence, from a computational viewpoint, Eq. (27) is more favorable. It is thus reassuring that our numerical experiments showed the equivalence of both definitions.
V Conclusions
We have shown that both the computation of the Pfaffian and the canonical form of a skewsymmetric matrix can be solved easily once the matrix is reduced to skewsymmetric (partial) tridiagonal form. To find this form, we have then developed tridiagonalization algorithms based on Gauss transformations, using a skewsymmetric, blocked version of the ParlettReid algorithm, and based on unitary transformations, using block Householder reflections and Givens rotations, applicable to dense and skewsymmetric matrices, respectively. These algorithms have been implemented in a comprehensive numerical library, and its performance has been proven to be superior to other approaches in benchmark calculations. Finally, we have applied our numerical method for computing the Pfaffian to the problem of computing the topological charge of a disordered nanowire, showing numerically the equivalence of different methods based on the Hamiltonian or the scattering matrix of the system.
Acknowledgements.
MW acknowledges stimulating discussions with F. Hassler, I.C. Fulga, A.R. Akhmerov, C.W. Groth and C.W.J. Beenakker, as well as support from the German academic exchange service DAAD.Appendix A The computation of the canonical form of a skewsymmetric matrix
The problem of computing the canonical form of an evendimensional skewsymmetric matrix has been discussed in Ward1978a (); Golub1996 (); Hastings2010 () (the odddimensional problem can be reduced to an evendimensional one by a series of Givens rotations Ward1978a ()). Here we give a selfcontained derivation for completeness.
A skewsymmetric matrix can be reduced to tridiagonal form with unitary (orthogonal in the real case) and tridiagonal as given in Eq. (6). This tridiagonal matrix can be rewritten as
(30) 
with as given in Eq. (13) and the permutation
(31) 
From the singular value decomposition (SVD) of , with , , , and , unitary (orthogonal in the real case) matrices, we then obtain
(32) 
But since and with defined in Eq. (11), we find the canonical form of as
(33) 
with
(34) 
In practice, it suffices to implement the SVD for real , as any complex matrix can be reduced to real tridiagonal form using unitary transformations. For the computation of the SVD one can make use of the computational routines for bidiagonal matrices from LAPACK Lapack ().
Appendix B Block versions of the ParlettReid and Householder tridiagonalization algorithms
The application of the Gauss and Householder transformations in the update operations Eqs. (17) and (20) are inherently level2 matrix operations. It is however possible to accumulate transformations and apply those simultaneously in a block representations Golub1996 () which has a higher level3 fraction. This procedure is also used for the tridiagonalization of symmetric matrices lawn2 () and we describe its application to the skewsymmetric case below.
Both algorithms are based on transformations of the form
(35) 
and the update operation is of the form
(36) 
with .
Assume that the matrix after the th update is given as
(37) 
where and are matrices. For , and . The th update can then be written as
(38) 
where and , and
(39) 
can also be computed without forming explicitly.
Of course, while it may not be necessary to compute the full explicitly, the determination of the vectors and requires the knowledge of the th row or column of . In practice, the matrix is therefore partitioned into panels of rows and columns. successive Householder reflections are then computed and applied one by one to the rows and columns in the current panel, but not to the remaining matrix. Instead, they are accumulated as described above and the remaining part of the matrix is updated in one block update of the form (37) which is rich in level3 matrix operations.
In the case of the ParlettReid algorithm, is a unit vector and hence the computation of has little cost. In this case the computational cost is dominated by the outer product update and hence the block version consists almost entirely of level3 matrix operations. This is not the case for the Householder tridiagonalization where the matrixvector multiplication to compute remains inherently a level2 matrix operation.
Appendix C Upper versus lower triangle storage in the Fortran implementation
In order to make full use of the skewsymmetry of the problem, it is essential that an algorithm only works with either the lower or upper triangle of the matrix. This is done in the Fortran implementation. However, in this case it is also mandatory to use mainly column instead of row operations, as Fortran matrices are stored contiguously in memory columnbycolumn. For this reason, the Fortran code implements the algorithms described in this paper verbatimly only if the lower triangle of the matrix is provided. Below we briefly describe the differences when the upper triangle is used.
Instead of starting the tridiagonalization in the first column of the matrix, the versions using the upper triangle start in the last column.
If a partial tridiagonalization is computed, it is not of the form (8), but has only for even and . This amounts to interchanging rows and columns and , and , , and in Eq. (8). However, since the determinant of this permutation is equal to one, the value of the Pfaffian does not change.
In the case of the ParlettReid algorithm, a decomposition is computed, where is an upper unit triangular matrix.
References
 (1) A. Y. Kitaev, PhysicsUspekhi 44, 131 (2001).
 (2) L. Fu and C. L. Kane, Phys. Rev. B 76, 045302 (2007).
 (3) I. C. Fulga, F. Hassler, A. R. Akhmerov, and C. W. J. Beenakker, arXiv:1101.1749v1.
 (4) M. Bajdich and L. Mitas, Acta Phys. Slov. 59, 81 (2009).
 (5) C. K. Thomas and A. A. Middleton, Phys. Rev. E 80, 046708 (2009).
 (6) G. Moore and N. Read, Nucl. Phys. B 360, 362 (1991).
 (7) F. Haake, Quantum Signatures of Chaos (Springer, Berlin, Heidelberg, 2004).
 (8) I. Montvay, Nucl. Phys. B 466, 259 (1996).
 (9) G. Galbiati and F. Maffioli, Discrete Appl. Math. 51, 269 (1994).
 (10) G. Rote, pp. 119–135 in Computational Discrete Mathematics (Springer, Berlin, Heidelberg, 2001).
 (11) W. Krauth, Statistical mechanics: algorithms and computations (Oxford University Press, Oxford, 2006).
 (12) J. R. Bunch, Math. Comp. 38, 475 (1982).
 (13) N. J. Higham, Accuracy and stability of numerical algorithms (Society for Industrial and Applied Mathematics, Philadelphia, 2002).
 (14) J. Rubow and U. Wolff, A factorization algorithm to compute pfaffians, arXiv:1102.3576v1.
 (15) L.K. Hua, Amer. J. Math. 66, 470 (1944).
 (16) J. W. Stander and N. A. Wiegman, Can. J. Math 12, 438 (1960).
 (17) J. H. Bardarson, J. Phys. A: Math. Theor. 41, 405203 (2008).
 (18) B. Béri, Phys. Rev. B 79, 245315 (2009).
 (19) R. C. Ward and L. J. Gray, ACM Trans. Math. Software 4, 278 (1978).
 (20) G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd edition (The John Hopkins University Press, 1996).
 (21) M. B. Hastings and T. A. Loring, arXiv:1012.1019v1.
 (22) E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen, LAPACK Users’ Guide, 3rd edition (Society for Industrial and Applied Mathematics, Philadelphia, PA, 1999).
 (23) R. C. Ward and L. J. Gray, ACM Trans. Math. Software 4, 286 (1978).

(24)
N. J. Higham,
The Matrix Computation Toolbox,
http://www.ma.man.ac.uk/~higham/mctoolbox
.  (25) C. GonzálezBallestero, L. Robledo, and G. F. Bertsch, arXiv:1012.5022v1.
 (26) B. N. Parlett and J. K. Reid, BIT 10, 386 (1970).
 (27) J. O. Aasen, BIT 11, 233 (1971).
 (28) J. R. Bunch and L. Kaufman, Math. Comp. 31, pp. 163 (1977).
 (29) R. Lehoucq, The computation of elementary unitary matrices, Technical Report 72, LAPACK Working Note, 1995.
 (30) D. Irony and S. Toledo, SIAM. J. Matrix Anal. & Appl. 28, 398 (2006).
 (31) L. Kaufman, Numer. Linear Algebra Appl. 14, 237 (2007).
 (32) H. Schwarz, Numer. Math. 12, 231 (1968).
 (33) L. Kaufman, ACM Trans. Math. Softw. 10, 73 (1984).
 (34) L. Kaufman, ACM Trans. Math. Softw. 26, 551 (2000).
 (35) M. Wimmer, A. R. Akhmerov, M. V. Medvedyeva, J. Tworzydło, and C. W. J. Beenakker, Phys. Rev. Lett. 105, 046803 (2010).
 (36) R. M. Lutchyn, J. D. Sau, and S. Das Sarma, Phys. Rev. Lett. 105, 077001 (2010).
 (37) Y. Oreg, G. Refael, and F. von Oppen, Phys. Rev. Lett. 105, 177002 (2010).
 (38) A. C. Potter and P. A. Lee, Phys. Rev. Lett. 105, 227003 (2010).
 (39) F. Evers and A. D. Mirlin, Rev. Mod. Phys. 80, 1355 (2008).
 (40) A. R. Akhmerov, J. P. Dahlhaus, F. Hassler, M. Wimmer, and C. W. J. Beenakker, Phys. Rev. Lett. 106, 057001 (2011).
 (41) N. E. Gibbs, J. William G. Poole, and P. K. Stockmeyer, SIAM J. Num. Anal. 13, 236 (1976).
 (42) M. Wimmer and K. Richter, J. Comput. Phys. 228, 8548 (2009).
 (43) J. J. Dongarra, S. Hammarling, and D. C. Sorensen, Block reduction of matrices to condensed forms for eigenvalue computations, Technical Report 2, LAPACK Working Note, 1987.