# Steady-state solution methods for open quantum optical systems

###### Abstract

We discuss the numerical solution methods available when solving for the steady-state density matrix of a time-independent open quantum optical system, where the system operators are expressed in a suitable basis representation as sparse matrices. In particular, we focus on the difficulties posed by the non-Hermitian structure of the Lindblad super operator, and the numerical techniques designed to mitigate these pitfalls. In addition, we introduce a doubly iterative inverse-power method that can give reduced memory and runtime requirements in situations where other iterative methods are limited due to poor bandwidth and profile reduction. The relevant methods are demonstrated on several prototypical quantum optical systems where it is found that iterative methods based on iLU factorization using reverse Cuthill-Mckee ordering tend to outperform other solution techniques in terms of both memory consumption and runtime as the size of the underlying Hilbert space increases. For eigenvalue solving, Krylov iterations using the stabilized bi-conjugate gradient method outperform generalized minimal residual methods. In contrast, minimal residual methods work best for solvers based on direct LU decomposition. This work serves as a guide for solving the steady-state density matrix of an arbitrary quantum optical system, and points to several avenues of future research that will extend the applicability of these classical algorithms in absence of a quantum computer.

^{†}

^{†}: New J. Phys.

## 1 Introduction

Understanding the interplay between a quantum system and its environment is of fundamental importance for realistic quantum systems [1]. For although much care is taken experimentally to eliminate the unwanted influence of external interactions, there remains, if ever so slight, a coupling between the system of interest and the external world. In addition, any measurement performed on the system, classical or quantum, necessarily involves coupling to the measurement device, thereby introducing an additional source of external influence [2, 3]. By definition, an open quantum system is coupled to an environment where the complexity of the environmental dynamics renders the combined evolution of system plus reservoir intractable. However, for a system weakly coupled to its surroundings, there is a clear distinction between the system and its environment, allowing for the dynamics of the latter to be traced over, resulting in a reduced density matrix describing the system alone.

Finding the steady-state solution to the reduced density matrix of an open quantum system is of great importance for both experimental and theoretical investigations of quantum optics and related systems such as trapped ions, superconducting circuits, and quantum nanomechanical systems (e.g. see Refs. [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]). In addition to being a good approximation to the density matrix for many experimental quantum systems, in some devices it is possible to observe quantum features in the steady-state density matrix, such as negative Wigner functions [11, 16, 15], under experimentally feasible operating conditions that can be repeatedly measured without detrimentally affecting the state of the system. Often the steady-state density matrix cannot be expressed analytically, and as such, the efficient numerical simulation of this operator is a key component in the exploration of a multitude of quantum optical systems.

In absence of a quantum computer, the numerical calculation of the steady-state density matrix must be solved using classical computing resources, where the fundamental limit on the size of quantum system that can be explored is constrained by the exponentially increasing dimensionality of the underlying composite Hilbert space. However, when representing the quantum mechanical operators in a chosen basis using sparse matrices, the dimensionality of the underlying Hilbert space is currently not the limiting factor in determining the steady-state solution for a quantum optical system. Rather, it is the difficulties that arise due to the lack of Hermicity and poor conditioning of the Liouvillian super operator that lead to large runtimes and memory consumption when using sparse linear solution methods. As such, a thorough understanding of the available algorithms, and the development of techniques designed to overcome these limitations, is crucial to enabling the classical simulation of ever larger quantum systems.

In this work, we detail the numerical solution methods available when solving for the steady-state density matrix for an arbitrary time-independent quantum optical system where the operators are represented as sparse matrices. We consider the standard eigenvalue and linear system solution methods, along with recently developed iterative techniques and bandwidth reducing permutation methods designed to overcome the limitations of these standard approaches [16]. In addition, we put forth a doubly-iterative inverse power method, that yields better performance as compared to previously considered iterative methods when bandwidth reduction permutations are less effective. Here we show that, with the addition of this new doubly-iterative algorithm, iterative solution methods, in general, outperform their direct factorization counterparts in terms of both algorithm runtime and memory consumption for large quantum systems provided that the Liouvillian operator is suitably permuted such that the approximate inverse is well-conditioned. Both memory and solution time benchmarks can be reduced by an order of magnitude, or more, when using the appropriate iterative technique for a given Liouvillian. These methods allow for the simulation of ever larger quantum systems, and help to mitigate the need for a quantum computer.

This paper is organized as follows. In Sec. 2 we introduce the eigenvalue equation whose solution is the steady-state density matrix for which we must solve. Section 3 explores the use of eigenvector solution methods, their dependence on LU factorization, and the requirement of well-conditioned eigenvalues. In Sec. 4 we make use of the unit trace property of the density matrix to recast the eigenvalue problem as a linear system of equations, and discuss matrix permutation techniques designed to reduce fill-in in the LU decomposition that affects both memory and runtime. Section 5 discusses preconditioned iterative solvers and highlights the difficulties in implementing these techniques as general solution methods. Numerical simulations of several canonical quantum optical systems implementing these solution methods are presented in Sec. 6. Finally, Sec. 7 concludes with a brief discussion of the results and directions for possible future research.

## 2 Steady-state density matrix

For an open quantum system with decay rates larger than the corresponding excitation rates, the system approaches the steady-state density matrix as satisfying the eigenvalue equation

(1) |

where is the Liouvillian super operator. In many quantum optical systems, the Liouvillian takes the most general Lindblad form

(2) |

with the dissipative terms given by

(3) |

where the are collapse operators determined by the rates and operators for each dissipation channel that couples the system to its environment or measurement apparatus. This form of the Liouvillian is not required for the analysis presented here, and can be substituted with any time-independent Liouvillian.

If the system Hamiltonian and collapse operators are time-independent, or can be transformed into such a form by moving to an interaction representation, then Eq. (1) can be recast as a sparse matrix eigenvalue equation

(4) |

where is the dense vector formed by vectorization (column stacking) of , and is the sparse matrix representation of the Liouvillian in a given basis representation. In what follows, we will always assume a Fock state basis for oscillator modes, and a -basis representation for spin operators. Given the non-Hermitian form of the Liouvillian, the solutions to Eq. (4) are non-trivial, and the methods by which this equations can be solved is the focus of the remainder of this work.

## 3 Eigenvalue Methods

Let us assume that the eigenvalues of the Lindblad operator , formed from a composite Hilbert space with dimensionality , can be diagonalized with eigenvalues ordered as . Using the spectral decomposition where and is the matrix with the eigenvectors of as the column entries, repeated application of the Liouvillian obeys the relationship

(5) |

For the vector , generated from a random vector , the application of the Lindblad operator -times can be expressed as

(6) |

with for and a sufficiently large provided that , i.e. the initial vector has a nonzero component in the direction of the dominate eigenvalue, and is well separated from . The former criterion is automatically met by starting with a random vector. Provided that these conditions are satisfied then is approximately parallel to the dominate eigenvector with error decreasing linearly at an asymptotic rate given by . The power series solution for the dominate eigenvector can therefore be written as

(7) |

where is a chosen vector norm, e.g. the -norm. Since any eigenvalue problem is equivalent to finding the roots of the characteristic polynomial, from the Abel-Ruffini theorem, it is impossible to directly compute the eigenvalues for any matrix with dimension [17]. Power iteration itself is not suitable for finding the steady-state density matrix as the zero eigenvalue is not the dominate eigenvalue of . Furthermore, if , then the convergence of this method is prohibitively slow. The former of these shortcomings can be addressed by modifying the power iteration method, whereas the latter is not encountered in practice.

To transform the power method into an algorithm suitable for our purposes, recall that this method relies on the identity given in Eq. (5). Given a function , defined locally via a convergent power series, with the eigenvalues of in the radius of convergence, then one can define via the same series, and with . For the specific choice of one has

(8) |

where is the identity matrix and the shifted-Liouvillian is denoted by .

Performing power iteration on Eq. (8) converges to the dominate eigenvalue where is maximal; the solution vector corresponds to the eigenvalue in the complex plane closest to . This method is known as the Shifted-Inverse Power technique [17] where the iterations are given by

(9) |

Although Eq. (9) is formally correct, in practice, it is both computationally and memory intensive to calculate the inverse of a sparse matrix, as the inverse is in general dense. However, the product is equivalent to solving the linear system , and thus the inverse-power method requires computing the direct sparse LU factorization of the matrix . This factorization need only be done once, with the solution for different requiring only computationally efficient forward and backward substitution. The convergence of shifted-inverse iteration is always linear in the number of iterations. However, the rate of convergence is determined by the accuracy in choosing and the proximity of the dominant eigenvalue to its nearest neighbor. As we know the exact value of the desired eigenvalue, , and the dominant eigenvalue is well separated, the number of iterations required to reach a given error tolerance is quite small, usually requiring only a single iteration to reach convergence for a random input vector. This holds true even when converging to machine precision. As such, the computational and memory intensive portion of inverse-power iteration lies in the LU factorization of the shifted Liouvillian operator.

While Eq. (4) requires finding the eigenvector associated with the zero eigenvalue, in practice, the choice of is often nonzero and small. When working with double-precision numbers, a value of is typically used. The effect of this slight shift is to guarantee that the matrix has a nonzero main diagonal, while simultaneously perturbing the matrix away from being strictly singular. Such a transformation is crucial to the successful reduction in memory consumption for sparse LU factorization provided by the sparse reordering methods discussed in Sec. 4 where a nonzero diagonal is assumed.

Given that is nearly singular, this matrix is extremely ill-conditioned, and one would not expect an accurate result for the the steady-state density matrix. However, it can readily be shown that the effectiveness of inverse iteration does not depend on the condition number of [17]. Rather it is the condition number of the zero eigenvalue and its eigenvector that determines the stability of the solution. For a non-degenerate eigenvalue, the associated condition number gives a measure of how close to degenerate the eigenvalue becomes for a given permutation of the system matrix. Unlike Hermitian matrices, where the eigenvalue spectrum is stable against such perturbations [17], the eigenvalues of non-symmetric matrices can be arbitrarily ill-conditioned [18], and the associated loss of accuracy can give rise to considerable errors in the computed eigenvectors. As the convergence rate of inverse power iteration also depends on the eigenvalue of interest being well separated, poorly conditioned eigenpairs lead to poor performance when using this algorithm. Prior to computing the eigenvalues, applying a similarity transform , where is a diagonal matrix, that reduces the norm of , or the condition number of a subset of eigenvalues can be advantageous [19]. The possibility of ill-conditioned eigenvalues must be kept in mind when using eigenvalue solving methods, but is rarely encountered in practice ^{1}^{1}1We are unaware of any example where the conditioning of the zero eigenvalue leads to a break down in the inverse-power method. However, ill-conditioning of eigenvalues can give spurious results when computing the general eigenvalues of a Liouvillian operator [20].. Indeed, the inverse power method typically converges in a single iteration. Supplementing the numerical solution with analytical results, and/or varying parameter values over a given range and looking for discontinuous jumps in quantities such as density matrix metrics, for example the fidelity or trace distance, and expectation values can help in part help to diagnose the presence of an ill-conditioned zero eigenvalue.

Finally, Eq. (4) can also be solved using algorithms suitable for finding a subset of eigenvalues of a sparse matrix such as, for example, ARPACK [21]. This method works by building the Krylov subspace formed from repeated applications of the shifted Liouvillian on the initial random vector

(10) |

with , followed by an orthogonalization procedure based on Arnoldi iteration. For a non-symmetric matrix, building the Krylov subspace requires first performing the sparse LU factorization of and, as in the inverse-power method, this factorization represents the bulk of the runtime and memory usage in this method.

## 4 Direct factorization

As an alternative to solving for the zero eigenvalue, it is possible to find a direct (i.e non-iterative) solution to via sparse LU decomposition [22] by making use of the unit trace property of the density matrix to recast the eigenvalue equation as a linear system of equations

(11) |

where is a controllable weighting factor and is a matrix that enforces , with ones along the upper row in the columns corresponding to the locations of the diagonal elements in . Although the choice of row for the nonzero values in is arbitrary, importantly, this matrix always has a nonzero element in the final column. In contrast to iterative methods, the eigenvalue condition number has no effect on direct factorization, making this method attractive for finding steady-state solutions to open quantum systems. Note that the restriction is already included in the Liouvillian operator , and its matrix representation . Here, this constraint is used to simply add a constant vector to both sides of Eq. (4).

When performing the LU factorization of a sparse matrix, nonzero elements arise in the and factors that are not present in the original matrix; the sparse structure of is not the same as [22]. This fill-in, must be minimized in order to reduce both the memory requirements for storing the LU factors and the runtime of factorization, both of which scale with the number of nonzero matrix elements [22]. A convenient measure of the fill-in is given by the ratio of nonzero elements in the and factors to those in the original Liouvillian called the fill factor. The fill-in is sensitive to the order in which the rows and columns of a sparse matrix are operated on, and in particular, to the matrix bandwidth size and profile [23]. For a non-symmetric sparse matrix , one can define the upper and lower bandwidths, ‘’ and ‘’ respectively, to be

(12) |

The total bandwidth is then the sum , where the one takes into account the main diagonal [24]. Likewise, we define the upper profile and lower profile as

(13) |

such that the total profile is . From the definition of the bandwidth, we see that imposing the trace condition with the matrix gives an upper bandwidth that is equal to the square of the dimensionality of the Hilbert space. Therefore, the total bandwidth is bounded below by this value, suggesting that the fill-in for the LU factors rises rapidly with system size. Arising from the use of the trace matrix , and not the form of the Liouvillian, this bandwidth scaling holds for any system. It is therefore advantageous to find a matrix permutation that moves the elements of , as well as all other elements, toward the diagonal, thereby reducing the overall bandwidth and profle of the modified Liouvillian.

Finding the minimal bandwidth of a matrix is an NP-complete problem [25], and therefore several heuristics have been developed that attempt to minimize the bandwidth while remaining computationally efficient. One common technique is to permute the rows and columns of a symmetric matrix based on the Cuthill-McKee (CM) ordering [26]. Taking the structure of a symmetric sparse matrix as the adjacency matrix of a graph, CM ordering does a breadth-first search (BFS) of the graph starting with a node (row) of lowest degree, where the degree of the node is defined to be the number of nonzero elements in the matrix row, and visiting neighboring nodes in each level-set in order from lowest to highest degree. This is repeated for each connected component of the graph. CM ordering also reduces the profile of the matrix, and it was noticed that reversing the CM order, the RCM ordering, gives a superior profile reduction while leaving the bandwidth unchanged [23]. Since RCM operates on the structure of a matrix, only the locations of nonzero matrix elements, and not their numerical values, are used in this reordering. Modified RCM methods, such as the Gibbs-Poole-Stockmeyer (GPS) algorithm [27], provide a similar bandwidth and profile reduction. In the Fock basis, the effect of RCM reordering is to permute the basis vectors such that the Fock states are no longer in ascending order.

As the Liouvillian operator is itself non-symmetric, we calculate the RCM ordering of the symmetrized form , and apply the resulting row and column ordering to to obtain . One can also used the symmetrized product or , however this would lead to dense matrices and large memory consumption unless done symbollically. While RCM reordering of a symmetric matrix is guaranteed to reduce the bandwidth and profile, or at least do no worse [23], the need to operate on rather than itself invalidates this assurance. RCM reordering of overestimates the graph structure of the original modified Liouvillian, and there is little a priori information from which to judge whether RCM will significantly reduce the bandwidth and profile when the structure of the Liouvillian operator is varied. This is especially true when the structure of is far from symmetric, and the two matrix structures differ by a significant amount. Structural changes such as setting to zero numerical system parameters in the Hamiltonian or collapse operators can also affect the performance of RCM ordering.

Along with symmetric reorderings that reduce both the bandwidth and profile of the Liouvillian, it is possible to reduce the fill-in via non-symmetric permutations that sort only the columns, or rows, of the matrix. In the present case of non-symmetric matrices, the best performing general purpose reordering strategy is the Column Approximate Minimum Degree (COLAMD) ordering [28] that is the default reordering method in the SuperLU linear solver used here [29], as well as in many commercial solvers such as those found in Matlab [30]. This method finds the symmetric permutation of that, when applied as a column permutation to , minimizes the worst case fill factor for an arbitrary permutation of the matrix rows. In this case, the product is computed symbolically to avoid the explicit construction of the resulting dense matrix.

Minimizing the fill-in using COLAMD reordering requires a matrix with a nonzero main diagonal. In the case of the inverse-power method, this condition was automatically satisfied when applying the eigenvalue shift. In contrast, the diagonal of the modified Liouvillian can, in general, contain zero elements that must be permuted away. As the modified Liouvillian is necessarily non-singular, it is always possible to find a non-symmetric permutation that makes the diagonal zero-free [22]. A simple method, using only the matrix structure, is Maximum Bipartite Matching (MBM) [31]. Here we implementation a variation of the MBM algorithm using a BFS method processing the columns of the matrix in order starting with the column possessing the largest absolute value element. From this initial column, the algorithm proceeds via a BFS where the rows are processed in descending order based on the absolute value of each row element. The corresponding columns are marked as being visited, and the routine continues until all columns have been processed. The goal of this weighted variant of the MBM algorithm is to permute the matrix to a zero-free diagonal while simultaneously attempting to increase the sum of the diagonal elements. This can help, in part, to overcome difficulties in the construction of approximate LU factorizations discussed in Sec. 5 [32] . Although the diagonal sum is not guaranteed to be maximized, this method is computationally less intensive than maximum product traversal [33] and related methods. Note that RCM permutation does not require a non-zero diagonal, and in many cases, applying a non-symmetric permutation such as MBM before RCM can lead to both a larger bandwidth and profile as the matrix structure becomes less symmetric.

## 5 Iterative Solvers

The main drawback in using direct LU decomposition is the poor scaling in terms of both runtime and memory requirements, even when symmetric and/or non-symmetric reordering methods are employed, as the dimension of the matrix increases. Therefore, for sufficiently large sparse matrices, iterative methods are the only available solution method [34], with the most common choice being iterative Krylov solvers [24]. While iterative methods require less memory and fewer numerical operations than direct methods, these methods usually require preconditioning to achieve a sufficient tolerance level and a reasonable convergence rate [34]. The goal of preconditioning is to convert the original system of equations, Eq. (11), into the modified linear system

(14) |

where is the preconditoner. Convergence is improved provided that the condition number of is significantly lower than that of itself. As the condition number of a sparse matrix grows with the dimensionality [22], preconditioning is required when solving large sparse systems. The best preconditioner is obviously , however this is equivalent to solving the original system of equations. Instead, it is possible to efficiently solve for an approximation of the inverse . The application of a suitable preconditioner should make the modified linear system Eq. (14) easy to solve, while the preconditioner itself should be simple to build and apply as one or more matrix-vector products are required for each iteration. Moreover, the condition number of should not be so large as to affect convergence.

The iLU class of preconditoners are constructed from an incomplete (approximate) LU factorization to the modified Liouvillian by discarding fill-in elements based on a designated dropping strategy. The method used here is an incomplete LU factorization with dual-threshold and pivoting (iLUTP) [24], where a drop tolerance and allowed fill-in are specified such that all fill-in smaller than times the infinity-norm of a row are dropped, and at most only fill-in elements are allowed. Note that these parameters are not independent. In the limit where , the iLU preconditioner returns the complete LU factorization, and the fill-in for the direct factorization can be viewed as an upper-bound on the size of the preconditioner. The condition number of can vary as a function of the drop tolerance, and therefore decreasing does not necessarily improve the convergence rate [32]. A convenient estimate for the condition number of the approximate inverse is given by , where [32] and is the infinity-norm. Although this measure gives only a lower bound on , it has proved to be an useful benchmark for the stability of [16]. Finding the best combination of parameters for a given matrix is a trial and error process, thus preventing the use of these methods as a general purpose solver.

While the generation of robust iLU preconditioners in the case of an Hermitian matrix is now well established, the existence and stability of preconditioners for non-symmetric matrices is understood to a lesser extent [34]. In the non-symmetric case, iLU preconditioners typically fail due to a lack of diagonal dominance, zeros along the diagonal, and inaccuracies in the approximate inverse due to the dropping of small nonzero elements [32]. Moreover, even if a preconditioner is found, its condition number can be larger than that of the original matrix and hence convergence is lost. However, studies have shown that these failures may be overcome by utilizing symmetric and/or non-symmetric reorderings of the matrix to maximize the sum of the diagonal elements, and reduce the overall bandwidth and profile [32, 35, 34]. The majority of these reordering strategies are developed for matrices with symmetric structure (graphs) and therefore their application to non-symmetric problems with differing matrix structures must be evaluated on a case by case basis (see Ref. [34] and references therein); there is no universally applicable reordering scheme for non-symmetric matrices. However, previous studies on symmetric [36] and non-symmetric sparse matrices [35] have found that RCM reordering improves iLU conditioning at the expense of a marginal increase in fill-in when compared with other reordering strategies. An intuitive explanation for the case of symmetric matrices is given in Ref. [36]. In contrast, some of the best methods for fill-in reduction such as COLAMD do poorly when applied to iLU factorization [34].

Having successfully found a preconditioner for a given drop tolerance and fill-in , a solution to the steady-state density matrix can, in principle, be found using iterative Krylov solution methods [24] designed for non-symmetric matrices. Here, we perform the preconditioned iterations using both restarted generalized minimum residual (GMRES) [37] and stabilized biconjugate gradient (BiCGSTAB) solvers [38]. The GMRES algorithm constructs the Krylov subspace Eq. (10) for the preconditioned system and finds the vector in this subspace with minimal residual, as measured by the -norm, via Arnoldi iteration. Unlike the symmetric case, where there is a simple three-step recurrence relation for finding orthogonal Krylov vectors [39], the orthogonalization procedure must be done explicitly, which is both computationally and memory intensive, growing as where is the iteration count. As such, a restarted variant of the original GMRES method is typically used, where the approximate solution vector after -iterations is calculated in the subspace , where is much less than the dimensionality of the system in question, and then is used as the new initial condition for the next -iterations. An optimal value for , minimizing excessive work, while still leading to convergence is problem specific and must be found through trial-and-error. The BiCGSTAB algorithm can be viewed as a biconjugate-gradient step followed by a GMRES step with giving a local reduction in the residual vector, leading to smoother (stabilized) convergence behavior [39]. This method is less computationally intensive than GMRES, but comes at the cost of, in general, poorer convergence properties.

Regardless of which method is used, an acceptable tolerance on the -norm of the calculated residual vector must be supplied as a stopping criterion. In addition, one can specify a maximum number of iterations to perform, before convergence is assumed to be not possible. Setting strict tolerance values, e.g. requires building accurate, well conditioned preconditioners and possibly many iterations, thus increasing the computation time and memory storage requirements. In contrast, allowing more moderate values for the solution tolerance, can give a significant performance benefit in some cases. For a given solution tolerance, the optimal values for the preconditioner fill factor and drop tolerance can vary in a non-trivial manner and must be found empirically.

Here we are interested in using these preconditioned iterative methods in the solution for the modified Liouvillian Eq. (11), as well as in the LU factorization of used in the inverse power method. Given that the inverse-power method itself is an iterative method, we designate this method as the doubly-iterative inverse-power method to distinguish it from the usual inverse-power technique using a direct LU factorization of . Although using an iterative solver in the inverse-power method is typically not ideal due to the need to solve for a new right-hand vector at each iteration, our method relies on the knowledge that only a single inverse-power iteration is typically needed when solving the density matrix for a given . Therefore, it is possible to obtain a performance improvement provided that the system is large enough, and one forms a high-quality preconditioner. In order for this doubly-iterative method to converge, the tolerance on the iterative iLU solution vector must be stricter than the tolerance used to stop the inverse-power iterations. In what follows, we will always set the tolerance on the iterative iLU solver to be an order of magnitude lower than that of the inverse-power iterations.

## 6 Numerical Simulations

Given the non-Hermitian structure of the Liouvillian operator, the optimal solution method for finding the steady-state density matrix of the system will, in general, be problem specific [34]. Therefore, in order to gauge the general performance of each solution method and matrix reordering methods, we apply both eigenvalue and direct solvers to the steady-state solution for a set of common quantum optical systems. Note that there is an ambiguity when building the sparse matrix representation of a given system Liouvillian since the tensor product, and therefore matrix structure, depends on the ordering of the operators involved. For concreteness we will specify the operator ordering, although the choice ordering does not affect the results presented here.

First, we consider the canonical driven Jaynes-Cummings model

(15) |

where is the detuning between the driving frequency and cavity frequency . The frequency for the two-level system is given by , while represents the coupling strength between the atomic and cavity modes, and gives the amplitude of the external drive in units of frequency. Note that we have not applied the rotating wave approximation to Eq. (15) as we want a numerical solution to the full Hamiltonian. In addition, we consider a Liouvillian (2) formed from cavity collapse operators and corresponding to a thermal bath with an average thermal occupation number at , and qubit dissipation described by , where is the qubit lowering operator. The constants and characterize the energy decay rates for the cavity and qubit, respectively. Here we are interested in the solution as the number of cavity states increases, and therefore fix the numerical parameters to be (in units of the cavity frequency ): , , , , , , and . The modified Liouvillian representing Eq. (15) together with the collapse terms, is an example of a system that has zero elements along the diagonal. As such, before using COLAMD ordering, we will always apply the weighted variant of MBM to first permute the diagonal to be zero-free. Here the operator ordering is given by and

In addition, we consider a spin-chain consisting of -spins where the first spin is being driven externally

where the frequency of each spin is denoted by , is the detuning between the drive and resonant frequency for the first spin, represents the driving strength, and the are the nearest neighbor coupling strengths. For simplicity, we assume each spin has an identical frequency , and coupling strengths . The driving amplitude is taken to be . Finally, we also assume that each spin has a dephasing term given by , where .

Lastly, we look at an optomechanical Hamiltonian in a frame rotating at the driving frequency [40]

(17) |

where , and are the annihilation operators for the cavity and mechanical modes, respectively, is the mechanical resonance frequency, and is the single-photon radiation pressure coupling strength. In addition, we consider the experimentally relevant situation where the cavity mode is coupled to a thermal bath near zero temperature characterized by the collapse operator , while the mechanical mode is in a nonzero thermal environment with associated collapse terms and . The numerical parameters, in units of are: , , , , , and , while the operator ordering is and . Here we fix the number of cavity states to while varying the number of states for the mechanical resonator.

Given the reliance of each solution method on LU or iLU factorization, and the importance of the sparse matrix structure on the fill-in and stability of these solvers, in Fig. 1 we plot the matrix structures (non-zero elements) for the shifted Liouvillian used in the inverse-power method for the three representative systems using natural, RCM, and COLAMD ordering. In addition, we give the bandwidth and profile reduction factors, and respectively, when using RCM ordering. In all three cases, the matrix structure is sufficiently close to symmetric such that RCM ordering reduces both the bandwidth and profile for each of the shifted system Liouvillian operators. As such, we would expect that LU factorization using this ordering results in fewer nonzero elements, and thus outperforms the naturally ordered Liouvillian in terms of both runtime and memory consumption. Similar reasoning applies in the case of iLU factorization, however the stability of the approximate inverse is also critical, and both the structure and numerical values of the elements play a crucial role. The comparison to COLAMD reordering is not as straightforward, and must be evaluated by performing the numerical factorization.

In Fig. 2 we present the fill factors and solution times for each system using the inverse-power method, where the factorization of is performed using LU factorization using natural, RCM, or COLAMD ordering, and the doubly-iterative inverse-power solver based on iLU factorization employing RCM and COLAMD permutations using both GMRES and BiCGSTAB solvers. In addition, the bandwidth and profile reduction factors, and in the case of iLU factorization condition estimates for the approximate inverse , are also given. In all cases, we set the solution tolerance for the inverse-power solver to . All simulations were carried out using the QuTiP framework [41, 42, 43], where we have used the default setting for the various solvers save for the allowed fill-in that we set to so as to accommodate all of the simulation results using a single set of parameters.

Figure 2 highlights the benefit of both RCM and COLAMD ordering of the shifted Liouvillian operator in LU factorization for each system. For the largest Hilbert space dimensions considered here, the reduction in fill-in corresponds to a factor of five or more savings in terms of the memory requirements for direct LU factorization. iLU factorizations reduce these memory requirements even further, by nearly another order of magnitude, except in case of the Jaynes-Cummings model, where RCM ordering gives a larger fill factor even though iLU factorization should result in fewer nonzero elements. COLAMD iLU factorization also does no better than the full LU factorizations. This suggests that the default values used in the creation of the preconditioner are ill-suited for this particular system. In contrast, RCM ordering of the spin-chain Liouvillian reduces the memory footprint of factorization by over a factor of twenty compared to natural ordering, and a factor of two reduction when compared with COLAMD iLU factorization using the same parameters.

Given that the solution time is proportional to the number of nonzero elements in the LU or iLU factors, the solution times for each method are largely in step with the fill factors generated for each reordering. The biggest gains in performance occur for the driven spin chain where LU factorization using RCM reordering outperforms COLAMD by nearly a factor of six, and natural ordering by over an order of magnitude. For iterative solutions, the solution times when using the BiCGSTAB method are generally lower than those using the GMRES solver for each ordering indicating that the BiCGSTAB method achieves convergence in a relatively few number of iterations. For the driven spin-chain with spins, both GMRES and BiCGSTAB iterative methods fail to converge to the requested tolerance value when using COLAMD ordering. This lack of converge can be understood by examining the condition number estimate for the approximate inverse, Fig. 2e, that suggests that the computed inverse is too ill-conditioned to reach the requested level of accuracy. The relatively lower condition number achieved using RCM ordering is in line with earlier investigations [16] that suggest that the stability provided by this ordering plays a key role in the success of iterative methods for open quantum systems. Note that at smaller system sizes, the conditioning of the approximate inverse is not as crucial as in the large dimensional case as the system Liouvillian itself is better conditioned for relatively smaller Hilbert space sizes.

Turning now to direct solutions to Eq. (14), in Fig. 3 we plot the matrix structure for the modified Liouvillian for each example system, and in particular, highlight the location of the matrix elements corresponding to the trace preserving matrix that gives the large upper bandwidth in the naturally ordered Liouvillian operators. In addition we give the bandwidth and profile reduction factors when using RCM ordering for the modified Liouvillian. For both the Jaynes-Cummings and optomechanical modified Liouvillians, the total bandwidth and profile after RCM permutation are nearly the same as those for the shifted Liouvillian, and the introduction of should not greatly affect the factorization properties of these systems. In stark contrast, the bandwidth of the driven spin-chain modified Liouvillian is nearly four times greater than the shifted Liouvillian, Fig. 1e, as the need to accommodate the addition of the trace preserving matrix, and the associated loss of symmetry, reduces the effectiveness of RCM ordering. In this case, we expect the solution time to be markedly longer than that of the inverse-power method.

Finally in Fig. 4 we present the fill-in and solution times for solving the steady-state density matrix calculated from the direct LU and iterative iLU factorizations of the modified Liouvillian (14) for the three representative systems. The results for fill-in closely parallel those in Fig. 2, however the fill-in for the iLU factorization using RCM ordering of the Jaynes-Cummings Liouvillian is more stable when varying the number of cavity states. The benefit of direct LU factorization using RCM reordering for the driven spin-chain is also reduced due to the larger bandwidth, however it is still a factor of two lower than that of COLAMD. The RCM LU and iLU factorization for the optomechanical system still perform well due to the negligible increase in bandwidth and fill-in compared to those used in the inverse-power method.

Once again, the solution times when using complete LU factorization are directly determined by the fill-in, and are longer than those of the inverse-power method as the addition of the matrix increases the total number of nonzero elements in the modified Liouvillian. that, in combination with higher fill-ins, gives an increased runtime. The large increase in bandwidth seen for the driven spin-chain in turn leads to longer solution times that, for , are four times longer than the inverse-power methods. In cases such as this, it is advantageous to compare the RCM bandwidth for the shifted and modified Liouvillians, and pick the appropriate solution method based on the result. Fortunately, calculating the RCM ordering and bandwidth takes only a fraction of the total computation time, and can readily be checked before performing the more costly factorization step.

For iterative iLU methods, the results presented in Fig. 4 indicate that the preferred Krylov solver, in terms of runtime, is now the GMRES method. This is an indication that the number of iterations needed to achieve our tolerance goal using the BiCGSTAB method, is computationally more intensive than building the (the default value in QuTiP) Krylov subspace for the GMRES solver and using fewer iterations. In addition, the BiCGSTAB method completely fails for the optomechanical system as the residual vector computed during the iteration process becomes orthogonal to the original residual, leading to a break down in the algorithm [39]. The iterative techniques using iLU factorization based on COLAMD ordering once again fail for the driven spin-chain when due to the poor conditioning of the approximate inverse. Although not seen here, the condition number of also becomes important for large dimensional optomechanical systems where iLU factorization was shown to fail for COLAMD ordering, but converged to machine precision tolerance when using RCM [16]. In contrast, RCM ordering is found to lead to convergence to the requested tolerance value for the GMRES method in all cases, and is now the fastest solution method for the spin-chain and optomechanical systems at large Hilbert dimensions. Moreover, the memory savings obtained from iLU factorization using RCM is close to an order of magnitude smaller than full LU factorization using any reordering. In contrast, the direct solution of the Jaynes-Cummings model using COLAMD is still the optimal method. Here, the dimensionally of the underlying Hilbert space is never large enough for iterative methods and matrix permutation to overcome the intrinsically fast performance of direct LU factorization on small systems that is also seen in the other two systems.

## 7 Conclusion

We have examined the use of available numerical solution methods and matrix reordering strategies for solving for the steady-state density matrix for several common time-independent systems found in quantum optics and related sub-disciplines. In addition, we have introduced a doubly-iterative inverse-power method, making use of the fast, usually single step, convergence of the inverse-power method to replace full LU factorization with an iLU decomposition. These solution techniques were tested on several standard quantum optical systems were it is found that iterative methods based on RCM reordering of the system Liouvillian outperform techniques based on direct LU factorization for large Hilbert space dimensions. Moreover, iterative solvers using COLAMD ordering fail at large dimensions due to poor conditioning of the approximate inverse, while those based on RCM remain stable. Provided that the RCM bandwidth of the modified Liouvillian is nearly the same as that of the shifted Liouvillian then preconditioned GMRES solvers using should be the first choice of iterative method as one does not need to worry about eigenvalue conditioning. If instead, bandwidth reduction becomes hampered, due to the introduction of the trace preserving matrix , then the doubly-iterative inverse-power method can be employed. Regardless of which iterative method is used, finding an appropriate drop-tolerance and fill-factor for the approximate inverse must still be done on a trial and error basis, although the default values used in QuTiP are a good starting point.

In contrast to iterative methods, a clear understanding of the optimal matrix permutation method for direct LU factorization is still lacking. Indeed, there is no a priori information by which to judge whether RCM or COLAMD will be the best choice in terms of fill-in, or perhaps if some other as yet unstudied reordering will be most efficient. Although these orderings give fill-ins that are usually on par with each other, for large quantum systems, the choice of permutation can mean the difference between finding a solution and running into memory limitations. Therefore, understanding how the Liouvillian matrix structure affects the fill-in in each of these methods is a critical area of future research. Likewise, exploring additional iLU reorderings based on BFS (like RCM), or the related depth-first search, may give enhanced bandwidth and profile reduction while maintaining good conditioning of the approximate inverse [35, 36]. As with the results presented here, this line of endeavors will help to extend the applicability of classical solution methods to the steady-state density matrix to ever larger quantum systems in absence of a quantum computer.

## Acknowledgments

P.D.N was supported by startup funding from Korea University.

## References

## References

- [1] Breuer H P and Petruccione F 2002 The Theory of Open Quantum Systems (Oxford)
- [2] Clerk A A, Devoret M, Girvin S M, Marquardt F and Schoelkopf R 2010 Rev. Mod. Phys. 82 1155
- [3] Wiseman H and Milburn G J 2010 Quantum Measurement and Control (Cambridge)
- [4] Grimsmo A L and Parkins S 2013 Phys. Rev. A 87 033814
- [5] Majumdar A, Kaer P, Bajcsy M, Kim E D, Lagoudakis K G, Rundquist A and Vuckovic J 2013 Phys. Rev. Lett. 111 027402
- [6] Nation P D 2013 Phys. Rev. A 88 053828
- [7] Reiter F, Tornberg L, Johansson G and Sorensen A S 2013 Phys. Rev. A 88 032317
- [8] Chen F, Li J, Armour A D, Brahimi E, Stettenheim J, Sirois A J, Simmonds R W, Blencowe M P and Rimberg A J 2014 Phys. Rev. B 90 020506(R)
- [9] Hush M R, Li W, Genway S, Lesanovsky I and Armour A D 2014 arXiv:1412.1863 (unpublished) URL http://arxiv.org/abs/1412.1863
- [10] Killoran N, Huelga S F and Plenio M B 2014 arXiv:1412.4136 (unpublished)
- [11] Lörch N, Qian J, Clerk A A, Marquardt F and Hammerer K 2014 Phys. Rev. X 4 011015
- [12] Prado F O, Rosado W, de Moraes Neto G D and Moussa M H Y 2014 Euro. Phys. Lett. 107 13001
- [13] Schlawin F, Gessner M, Mukamel S and Buchleitner A 2014 Phys. Rev. A 90 023603
- [14] Weimer H 2014 arXiv:1409.8307 (unpublished)
- [15] Holland E T, Vlastakis B, Heeres R W, Reagor M J, Vool U, Leghtas Z, Frunzio L, Kirchmair G, Devoret M H, Mirrahimi M and Schoelkopf R J 2015 arXiv:1504.03382 (unpublished) URL http://arxiv.org/abs/1504.03382
- [16] Nation P D, Johansson J R, Blencowe M P and Rimberg A J 2015 Phys. Rev. E 91 013307
- [17] Trefethen L N and Bau III D 1997 Numerical Linear Algebra (SIAM)
- [18] Saad Y 1989 Comput. Phys. Commun. 53 71
- [19] Chen T Y and Demmel J W 2000 Lin. Alg. Appl. 309 261
- [20] Grimsmo A L 2013 (private communication)
- [21] Lehoucq R B, Sorensen D C and Yang C 1997 ARPACK users guide: Solution of large scale eigenvalue problems by implicitly restarted arnoldi methods URL http://www.caam.rice.edu/software/ARPACK/
- [22] Davis T A 2006 Direct Methods for Sparse Linear Systems (SIAM)
- [23] George A and Liu J W H 1981 Computer Solution of Large Sparse Positive Definite Systems (Prentice-Hall)
- [24] Saad Y 2003 Iterative Methods for Sparse Linear Systems 2nd ed (SIAM)
- [25] Yannakakis M 1981 SIAM J. Algebraic Discrete Methods 2 77
- [26] Cuthill E and McKee J 1969 ACM ’69 Proceedings of the 1969 24th national conference
- [27] Gibbs N E, Poole W G and Stockmeyer P K 1976 SIAM J. Numer. Anal. 13 236
- [28] Davis T A, Gilbert J R, Larimore S I and Ng E G 2004 ACM Trans. Math. Softw. 30 353
- [29] Demmel J W, Eisenstat S C, Gilbert J R, Li X S and Liu J W H 1999 SIAM J. Matrix Analysis and Applications 20 720
- [30] MATLAB 2014 version 8.4.0 (R2014b) (Natick, Massachusetts: The MathWorks Inc.)
- [31] Duff I S, Kaya K and Ucar B 2011 ACM Trans. Math. Softw. 38 13
- [32] Chow E and Saad Y 1997 J. Comp. Appl. Math. 86 387
- [33] Duff I S and Koster J 1999 SIAM. J. Matrix Anal. & Appl. 20 889
- [34] Benzi M 2002 J. Comp. Phys. 182 418
- [35] Benzi M, Szyld D B and Duin A V 1999 SIAM J. Sci. Comput. 20 1652
- [36] Bridson R and Tang W P 2000 SIAM J. Sci. Comput. 22 1527
- [37] Saad Y and Schultz M H 1986 SIAM J. Sci. Comput. 7 856
- [38] Van der Vorst H A 1992 SIAM J. Sci. Stat. Comput. 13 631
- [39] Barrett R, Berry M, Chan T F et al. 1993 Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods (Society for Industrial and Applied Mathematics)
- [40] Aspelmeyer M, Kippenberg T J and Marquardt F 2014 Rev. Mod. Phys. 86 1391
- [41] Nation P D and Johansson J R 2015 http://qutip.org URL http://qutip.org
- [42] Johansson J R, Nation P D and Nori F 2012 Comput. Phys. Commun. 183 1760
- [43] Johansson J R, Nation P D and Nori F 2013 Comput. Phys. Commun. 184 1234