# Propagating large open quantum systems towards their steady states: cluster implementation of the time-evolving block decimation scheme.

###### Abstract

Many-body quantum systems are subjected to the Curse of Dimensionality: The dimension of the Hilbert space , where these systems live in, grows exponentially with number of their components (’bodies’). However, with some systems it is possible to escape the curse by using low-rank tensor approximations known as “matrix-product state/operator (MPS/O) representation” in the quantum community and “tensor-train decomposition” among applied mathematicians. Motivated by recent advances in computational quantum physics, we consider chains of spins coupled by nearest-neighbor interactions. The spins are subjected to an action coming from the environment. Spatially disordered interaction and environment-induced decoherence drive systems into non-trivial asymptotic states. The dissipative evolution is modeled with a Markovian master equation in the Lindblad form. By implementing the MPO technique and propagating system states with the time-evolving block decimation (TEBD) scheme (which allows keeping the length of the state descriptions fixed), it is in principle possible to reach the corresponding steady states. We propose and realize a cluster implementation of this idea. The implementation on four nodes allowed us to resolve steady states of the model systems with spins (total dimension of the Hilbert space ).

## I Introduction

Many-body systems are at the focus of the current research in theoretical and experimental quantum physics. In addition to their fundamental importance for quantum thermodynamics and information nature (), these systems are perspective from the technological point of view; e.g., all manufactured (by now) quantum computers are based on arrays of interacting superconducting qubits tech2 ()).

All real-life quantum systems are open, meaning that they interact – to a different extent – with their environments book (). This ’action from outside’, termed “decoherence“ or ”dissipation“, works together with the unitary evolution stemming from system’s Hamiltonians and, on large time scales, these joint efforts result in the creation of an asymptotic stationary (steady) state. The evolution of an open quantum system towards its steady states is usually modeled with a Markovian master equation, which describes the dynamics of the system density operator , book (). Formally, similar to the Schroedinger equation used to describe unitary evolution of an isolated quantum system, this is a linear differential equation which can be solved numerically, e. g., by diagonalizing generator of evolution .

However, computational studies of many-body quantum systems are limited by the so-called Course of Dimensionality: the total length of description (number of parameters required to specify a state) of an isolated quantum system consisting of components (spins, qubits, ions, etc.), each one with degrees of freedom, scales as . To specify an arbitrary state of a system of qubits one needs complex-valued parameters. This exceeds the memory capacity of the supercomputer “Titan”titan (). In the case of open quantum systems, the complexity squares: to describe a density operator one needs real-valued parameters.

This is a famous problem in modern data science – manipulations (or even simply storing) with data tensors becomes impossible when the data are sorted in high-dimensional spaces. The attempts to break the curse led to the development of a variety of low-rank tensor approximation algorithms tensor_decompose_review (). These algorithms are used now in signal processing, computer vision, data mining, and neuroscience tensor1 (). The most robust algorithms are based on Singular Value Decomposition (SVD), and one particularly efficient for multilinear algebra manipulations is the so-called Tensor-Train (TT) decomposition oseled (). In physical literature, it is commonly referred to as Matrix Product State (MPS) [or Matrix Product Operator (MPO)] representation sh (); schollw (). While these two names are used simultaneously (though in different fields), the underlying mathematical structure is the same hag (). The MPS/MPO/TT approach allows to reduce descriptions of some many-body states to a linear scaling oseled ().

The MPS/MPO representation allows for effective propagation of quantum many-body systems in time by using the so-called Time-Evolving Block Decimation (TEBD) scheme vidal (). In short, this is a procedure to reduce the description of the state, obtained after every propagation step, to a given fixed length . The accuracy of the propagation is controllable through : If the information is thrown out after the restriction is substantial, the used TEBD propagation is bad and leads to a wrong description. Otherwise, it is good. Some many-body systems ’behave’ well during the TEBD propagation and so the amount of the neglected information is tolerable (we are not going to discuss physical properties underlying such a ’good behavior’ and refer the reader to an extensive literature on the subject; see. e.g., Ref. schollw ()). Important is that the MPO/TT-TEBD scheme can be used to propagate open systems ver () and thus get in touch with the corresponding steady states znid (); NN (). It is crucial therefore to estimate computational resources needed for the realization of this program. Here we report the results of our studies in these directions.

## Ii The algorithm

### ii.1 Tensor-Train Decomposition

Here we mainly follow works oseled () and sh (); for more details, we refer the interested reader to them.

We start with a -dimensional complex-valued tensor with . By gluing together indices we obtain a matrix to which we apply then SVD (henceforth we use notation without Hermitian conjugation for the last matrix in the decomposition)

(1) |

where and are unitary matrices and is a diagonal matrix with entries being singular values . We assume that singular values are sorted in descending order, . Using diagonal structure of ,, where are singular values, and reshaping as a tensor indexed by , we get

(2) |

Repeating the same “reshape-SVD-reshape” procedure for and continue further iteratively, we arrive at the TT representation,

(3) |

One may interpret this structure as a “train” (see Fig. 1a) of ’s that encode local structure in each dimension, and ’s that quantify correlations between them. Each is an array of matrices with restrictions with boundary conditions . Thus, the dimensions of the matrices are , which corresponds to the full representation with complex parameters. When SVD is performed, one can keep only certain singular values based on the approximation criterion. One possibility is to discard all values smaller than a fixed number. An alternative approach is to introduce a so-called bond dimension , a cut-off value such that on each bound only singular values , , are kept and the rest are truncated. We use the latter option. Each local approximation procedure on the set of singular value introduces a truncation error

(4) |

One of the main advantages of the TT representation is the simplicity of local convolutions with other tensor structures. Consider an operation acting in -th dimension only,

(5) |

After substitution in Eq. (3), one could see that this convolution only affects the corresponding tensor,

(6) |

Operations involving multiple dimensions (indices), especially distant, destroy TT structure, so additional procedures are required to restore it. Generally, it would imply effectively the same procedure as initial decomposition. However, in this paper, we only use convolutions involving two neighboring indices. Thus, only one local reorthogonalization is required.

Consider an operation :

(7) |

It affects a pair of the corresponding tensors,

(8) |

To perform reorthogonalization, we introduce matrix . Next we reshape indexes, perform SVD, and reshape indexes back vidal_tebd ():

(9) |

Such operation takes only steps and this scaling does not depend on . Moreover, it modifies only a pair of relevant tensors so multiple pairwise operations that act in independent subspaces may be parallelized. The TT representation allows also for fast realizations of other algebraic operations: calculation of partial or full traces, norm, scalar products, additions, etc oseled ().

### ii.2 Tensor-Train Propagation

The TT representation provides a basis for an approximate tensor propagation algorithms. Here we use Time-Evolving Block Decimation (TEBD) scheme vidal_tebd (); ver (), which was specifically designed for quantum systems but applicable also in the general case. Consider a tensor flow governed by an evolution generator consisting only of operations acting on one or two adjacent dimensions,

(10) |

We use standard time discretization to iteratively integrate this equation (starting from some initial tensor). In terms of operations the solution reads

(11) |

As operators generally do not commute, we have to approximate the matrix exponents. A way to minimize the error is to separate the operators into groups as large as possible such that all the operators belonging to one group commute with each other. All one-dimension operators commute by default, and two-dimension acting on odd/even pairs commute within their oddity groups; see Fig. 1b. We utilize this fact and use modified second order Suzuki-Trotter decomposition schollw ():

(12) | |||

(13) |

Note that the standard approach is to adsorb one-index operators into , but we find it numerically beneficial to separate them. Each factor of and can be calculated by using Eqs. (6) and (8) respectively. Furthermore, as they commute by construction, corresponding computation can be parallelized. Each two-index operator may include a cut-off if after the reorthogonalizationm Eq. (9), the number of singular value exceeds bound dimension . Corresponding accumulated truncation error is then calculated as a sum of local errors (4) over all the operation during evolution up to time ,

(14) |

Computations are dominated by SVD, so resulting complexity is , where is the bond dimension. With cores available, it becomes and thus the computational task is perfectly scalable.

### ii.3 Lindblad Equation

We apply both TT and TEBD methods to evolve numerically many-body open quantum models. The state of such systems is described by a density matrix of the size , where is the number of particles/spins and is number of the local states, which we put to for a -spins that we consider in the paper. Evolution of a quantum system in contact with the environment is governed by Lindblad equation book ()

(15) |

where is the Lindblad superoperator consisting of conservative and dissipative parts, is Hamiltonian, are dissipation operators, and are corresponding dissipation rates. There is a stationary state solution for any Lindblad superoperator which is unique (aside of special cases of symmetries which we do not address here).

Many-body density operator can be represented as an -dimensional tensor where every pair of indexes (each one runs from to ) correspond to the -th qubit/spin. The models we consider include only one- and two-particle interactions, which thus involve up to four indexes of . To overcome this, we use vectorization procedure ver () and glue together indexes of each particle forming – -dimension tensor with each dimension going from to . This allows applying TEBD scheme (II.2) as long as and in (15) do not couple more than two particles. However, as we restrict the accessible space to the bond dimension , evolution can only start from initial MPO conditions belonging to this space. In the models considered further, we use an extreme case of product initial states, , ; as we aim at the stationary states, we have complete freedom when choosing initial conditions (though other choices can be more beneficial from the relaxation-speed point of view).

## Iii Model system

As test-bed models, we use spin chains from Refs. znid (); NN (). Here we only briefly described them.

Both chains consist of spins. Hamiltonian part of evolution is governed by the Heisenberg XXZ model with local disordered potential

(16) |

where are Pauli matrices, is interaction strength parameter and are uncorrelated random value uniformly distributed in . Open boundary conditions are used.

In Ref. znid (), a disordered spin chain with next-neighbor coupling and two âthermal reservoirsâ – each one represented by pair of Lindblad operators causing excitation (relaxation) and acting on the two end spins, , – were considered,

(17) |

where is bias responsible for the formation of non-equilibrium steady state with directed non-zero current. In the limit , the stationary state is an infinite temperature ’(maximally mixed’) state, that is the normalized identity. By assuming , one could address the linear-response regime. The transport of the spin charge through a chain in the stationary regime was considered and the current scaling with was estimated.

The authors managed to achieve a model size of , which is an unprecedented size for many-body open quantum models. The complexity of the computational experiments is increased by the fact that the systems needed a considerable propagation time in order to reach the stationary state (this is because the dissipation was acting at the chain ends only). Finally, to obtain scaling dependencies, averaging over disorder realizations was performed. At the same time, this work provides little detail about the resources used for numerical simulations. We considered the obtained results as a challenge and decided to reproduce them – at least for .

As an additional test-bed, we use a model from another recent work NN (). In this paper, disordered spins chains in which all spins are subjected to the action of dissipation were considered. Dissipative terms couple each pair of spins,

(18) |

which try to make neighboring spins oscillating out of phase (‘anti-synchronization’). The maximal size of the models used in numerical simulations, reported in this work, was . Among other characteristics, scaling of the so-called operational entanglement entropy Prosen2007 () was considered. We use this quantifier in our numerical experiments, in which we tried to reach .

## Iv Implementation

The method described in Section II is implemented as shown in Algorithm 1. The algorithm is implemented using the C++ programming language. We found, that the matrix operations (mainly SVD) are the most time-consuming parts of the algorithm. In this regard, we employ the Armadillo software library integrated with highly optimized mathematical routines from the Intel Math Kernel Library to improve performance. Finally, Armadillo/MKL routines take about 50-80% of computation time during the propagation step depending on the current system state.

The algorithm assumes performing a set of integration operations for individual components of the system at every time step. These operations are not independent but can be ordered according to their dependencies for the organization of parallel computations. In particular, all one- and two-particle interactions can be performed completely in parallel.

The cluster parallelization is done by using the MPI technology. We apply the classic master-worker scheme for parallelization of the algorithm. For that, the single managing MPI-process (master) forms separate tasks for one– and two–particle interactions, monitors their dependencies from each other and readiness, distributes tasks to all other processes (workers) and accumulates the results.

All computational experiments have been done on the Lobachevsky cluster with a -core Intel Xeon CPU E5-2660, 2.20GHz, 64 GB RAM, Infiniband QDR interconnect. The code was compiled with the Intel C++ Compiler, Intel Math Kernel Library and Intel MPI from the Intel Parallel Studio XE suite of development tools and the Armadillo library.

We integrate model from Ref. znid () with following parameters: , , , . Parallel code was run on four computational nodes of the cluster (1 MPI-process per CPU core, 64 MPI-processes overall). Total computation time was 143 s. The resulted diagram for the distribution of computational and communication functions run time is presented in Fig. 2. It is shown that the calculations are fairly well balanced, which is an undoubted advantage of the parallelization scheme. However, MPI communications take a significant part of the computation time, while further increasing the number of cluster nodes used will not significantly speed up the calculations, which is a limitation of the scheme. Computational efficiency (ratio of computation time to total execution time) was 47%.

## V Results

We find that it is possible to reproduce - with high accuracy – the results reported in Ref. znid () by using bond-dimension . On Figure. 3 we present a comparison of the results of the sampling we perform with our code (big circles; yellow, red and green) with the results by Žnidaric̆ and his co-authors. We use propagation step and propagate every system up to , irrespectively of its size. It is not the most optimal way to sample (for example, it would be more effective to determine arrival of the system at the asymptotic state by monitoring the value of the spin current); however, at this stage, we tried to make the sampling procedure as simple as possible. For every value of and disorder strength , we additionally performed averaging for disorder realizations. Each realization took from minutes to hours depending on the size of the system and involved up to two nodes (for large system sizes, ).

For the model from Ref. NN () we study the dynamics of the operator entanglement entropy (for a fixed disorder realization) for different bond dimensions. We found that constitutes a threshold value after which the asymptotic entropy does not change upon further increase of the bond dimension. The calculation time for this value of bond dimension was four weeks of continuous propagation on four computing nodes.

We analyze also the evolution of accumulated error 14 in this case. It is noteworthy that saturation of the operator entropy, which signals the arrival to the asymtotic state, is not accompanied by the saturation of the error. The latter continues to grow in a power-law manner, see the dashed line in Fig. 4b. This means that MPO states – even with – are different from the genuine steady state of the model [which is the zero-value eigenelement of the corresponding Lindbladian (16 - 18)].

## Vi Conclusions

We presented a parallel implementation of the MPO-TEBD algorithm to propagate many-body open quantum systems. Parallelization is performed using the MPI technology and employs the master-worker scheme for computational tasks distribution. High-performance implementations of linear algebra from the Intel MKL were used to better utilize computational resources of modern hardware.

A series of numerical experiments was performed to determine the accuracy and limits of applicability of the developed code. In particular, the effect of the number of SVD numbers kept after each propagation step (bond-dimension ) on the accuracy of the method was investigated. We found that threshold value after which saturation of the relevant characteristics is observed and fuhrer increase of bond dimension does not change their values.

The performance tests on the Lobachevsky cluster demonstrated that MPI processes running on four computational nodes is the optimal configuration for the model systems with spins. As a next step, we plan to explore the possibility of further improvements of the parallelization by reducing the communications and increasing the efficiency of using computational resources. After that, we hope to reach the limit with the test-bed models.

## Vii Acknowledgments

The authors acknowledge support of the Russian Foundation for Basic Research and the Government of the Nizhni Novgorod region of the Russian Federation, grant # 18-41-520004. IV acknowledges support by the Institute for Basic Science, Project Code (IBS-R024-D1), and by the Korea University of Science and Technology Overseas Training program.

## References

## References

- (1) J. Eisert, M. Friesdorf, and C. Gogolin, Nature Phys. 11, 124 (2015).
- (2) J. M. Gambetta, J. M. Chow, and M. Steffen, npj Quant. Inf. 3, 2 (2017).
- (3) Breuer, H.-P., and F. Petruccione, The Theory of Open Quantum Systems (Oxford University Press, Oxford, 2002).
- (4) https://en.wikipedia.org/wiki/Titan (supercomputer)
- (5) T. G. Kolda and B. W. Bader, SIAM Rev. 51, 455 (2009).
- (6) A. Cichocki, Era of Big Data processing: a new approach via Tensor Networks and Tensor Decompositions, https://arxiv.org/abs/1403.2048.
- (7) I. V. Oseledets, SIAM J. Sci. Comput. 33, 2295 (2011).
- (8) D. Perez-Garcia, F. Verstraete, M.M. Wolf, J.I. Cirac, Quantum Inf. Comput. 7, 401 (2007); U. Schollwöck, Ann. Phys. 326, 96 (2011).
- (9) U. Schollwoeck, Ann. of Phys. 326(1) (2010).
- (10) J. Haegeman, C. Lubich, I. Oseledets, B. Vandereycken, and F. Verstraete, Phys. Rev. B 94, 165116 (2016).
- (11) G. Vidal, Phys. Rev. Lett. 91, 147902 (2003).
- (12) M. Zwolak and G. Vidal, Phys. Rev. Lett. 93, 207205 (2004).
- (13) M. Žnidaric̆, A. Scardicchio, and V. K. Varma, Phys. Rev. Lett. 117, 040601 (2016).
- (14) I. Vakulchyk, I. Yusipov, M. Ivanchenko, S. Flach, and S. Denisov, Phys. Rev. B 98, 020202(R).
- (15) G. Vidal, Phys. Rev. Lett. 91(14) (2003)
- (16) T. Prosen and I. Pižorn, Phys. Rev. A. 76, 032316 (2007).