Breaking the 49-Qubit Barrier in the Simulation of Quantum Circuits
With the current rate of progress in quantum computing technologies, systems with more than 50 qubits will soon become reality. Computing ideal quantum state amplitudes for devices of such and larger sizes is a fundamental step to assess their fidelity, but memory requirements for such calculations on classical computers grow exponentially. In this study, we present a new approach for this task that extends the boundaries of what can be computed on a classical system. We present results obtained from a calculation of the complete set of output amplitudes of a universal random circuit with depth 27 in a 2D lattice of qubits, and an arbitrarily selected slice of amplitudes of a universal random circuit with depth 23 in a 2D lattice of qubits. Combining our methodology with other decomposition techniques found in the literature, we show that we can simulate -qubit random circuits to arbitrary depth by leveraging secondary storage. These calculations were thought to be impossible due to memory requirements; our methodology requires memory within the limits of existing classical computers.
1 Overview of the paper
In the last few years, significant technological progress has enabled quantum computing to evolve from a visionary idea [feynman1982simulating] to reality [NatureNewsIBMQ, castelvecchi2017quantum]. With existing devices now providing – qubits with controllable couplings, the topic of “quantum supremacy” — that is, the ability of a quantum device to perform a computational task beyond what any classical computer can perform — is receiving much attention [harrow2017quantum, preskill2012quantum, boixo2018supremacy, aaronson2016complexity, farhi2016quantum]. Among the many issues faced in advancing quantum computing technology, two are particularly relevant for this paper: (1) the ability to quantify circuit fidelity (e.g., [bishop2017volume, reed2010high, gambetta2015building, boixo2018supremacy, linke2017experimental]), and (2) the ability to assess correctness, performance and scaling behavior of quantum algorithms [Gheorghiu2017verification]. Fundamental to both is the ability to calculate quantum state amplitudes for measured outcomes — a task whose difficulty grows exponentially in the size of the circuit. This paper presents a new methodology for this task that can handle circuits of much larger size than that previously found in the literature. Our approach to calculating quantum amplitudes uses a tensor representation [joshi1995matrices], combining the flexibility of tensor computations with tensor slicing methods, to enable final quantum states of circuits to be calculated in slices instead of having to materialize entire quantum states in memory.
We apply our methodology to universal random circuits constructed according to the rules described in [boixo2018supremacy]. These circuits can be embedded in a 2D lattice of qubits with nearest-neighbor couplings. The depth of such a circuit is the number of layers that the circuit can be partitioned into, in such a way that at any given layer at most one gate acts on a qubit. As in [boixo2018supremacy], we do not count the first layer of Hadamard gates in the depth.
We simulate two quantum circuits: one with depth 27 for a grid of qubits, the other with depth 23 for an grid of qubits. The primary data structures employed in the depth-27 simulation require just over 4.5 Terabytes of main memory to store results, and just over 3.0 Terabytes for the depth-23 simulation. In contrast, existing state-of-the-art techniques [haner2017simulation] would have required 8 Petabytes and 1 Exabyte, respectively. More recent work [boixo2017simulation, boixo2018supremacy, li2018quantum, chen201864, chen2018classical], that followed a preprint of this paper, has simulated larger circuits using similar techniques. We also present extensions of our simulation technique to combine it with the decomposition approaches discussed in [aaronson2016complexity, haner2017simulation]. The goal of these extensions is the simulation of quantum circuits with larger depth than discussed above. We show that single amplitudes of a , depth 46 circuit can in principle be computed on the supercomputer used in our tests in just a few hours using 141 TB of memory, and we propose a simulation strategy that is able to simulate such a circuit to arbitrary depth in reasonable time (e.g., less than a day for a depth-83 circuit) by leveraging secondary storage. We do not test these extensions computationally, but we describe the foundations of the methodology and estimate the computational effort that these experiments would require. The simulation of circuits to arbitrary depth was thought to be out of reach [boixo2018supremacy, li2018quantum, chen2018classical].
The simulation of the , depth 27 and , depth 23 circuits was performed at Lawrence Livermore National Laboratory on the Vulcan supercomputer, an IBM Blue Gene/Q system. We used the simulation on Vulcan to investigate the isotropic properties of the resulting distributions of quantum-state probabilities. Theory predicts that these probabilities should exhibit exponential (a.k.a. Porter-Thomas [porter1956fluctuations]) distributions across the aggregate quantum state [boixo2018supremacy]. However, because the pattern in which these probabilities occur within the final quantum state is also chaotic, the same exponential distribution of probabilities should be observed within small local slices of the state as well. Our simulations confirm this phenomenon.
2 Simulation methodology: preliminaries
The majority of this paper is devoted to a high-level description of the simulation methodology introduced in this paper; further details and an in-depth literature review are provided in the Appendix. The input of a simulation algorithm is a description of a quantum circuit, and a specification of a set of amplitudes (i.e., coefficients of the quantum state obtained at the end of the circuit) that have to be computed; we are mainly interested in the case in which all amplitudes have to be computed, but we will study the case of single amplitude calculations as well.
Our analysis considers a hierarchy of storage devices, which has two levels in its simplest form: primary storage, i.e., RAM, and secondary storage, i.e., disk. Descending the hierarchy increases available space but decreases access speed: the performance of a simulation strategy should take into account space occupancy and number of accesses at all levels of the hierarchy. This distinction is crucial from a practical point of view. For example, the full quantum state of a -qubit system requires complex numbers (8 PB in double precision); this eliminates the possibility of keeping the entire state in primary storage, but does not rule this out for secondary storage. Thus, disk usage must be allowed in order to have enough space to store the simulation output for circuits of this or larger size. The main numerical experiments discussed in this paper employ main memory only to store intermediate results, but the Appendix describes how disk can be used to simulate a -qubit circuit to arbitary depth. Of course, a single amplitude can in principle be computed using linear space and exponential time (i.e., number of floating point operations), using the Feynman path approach, see e.g., [boixo2017simulation]. However, the time requirement of such an approach grows intractable very quickly with increasing circuit size. We desire simulation algorithms with manageable time requirements (i.e., hours – not days), which from a practical standpoint implies a small number of floating point operations per amplitude per gate.
In very broad terms, our approach consists of partitioning a quantum circuit into subcircuits that can be simulated independently and then recombined to obtain the desired result. Of course, a quantum circuit cannot in general be split into subcircuits to be simulated independently due to entanglement. However, when the action of the circuit is represented in a purely algebraic manner, e.g., using a graphical model known as a tensor network [joshi1995matrices, markov2008simulating], it can be verified that arbitrary splitting into subcircuits can be performed, although it may require additional memory to account for entanglement between subcircuits. As a consequence, not all decompositions of a circuit into subcircuits are memory-efficient.
A description of our simulation algorithm requires some preliminary discussion regarding tensor networks. A tensor is a multilinear map (i.e., a multidimensional generalization of a matrix), equipped with a set of indices to address its elements; each index varies within a specified range, which we always assume to be the set in this paper — corresponding to the basis elements for the vector space in which the state of a single qubit lives. The number of indices of a tensor is called its rank. It follows that a rank- tensor requires storage in general. A tensor network is a graph in which each node is associated with a tensor and each edge with an index of the adjacent tensor(s). Edges between nodes represent shared indices that must be summed over. For example, one possible way to represent a matrix-vector multiplication in a tensor network is shown in Fig. 1. The edge between the tensors and represents the operation . A summation over shared indices is called a contraction. Given a tensor, it is natural to distinguish between input indices, associated with a dual space (e.g., the rows of a matrix), and output indices, associated with a primal space (e.g., the columns of a matrix). For example, is a tensor with one input index and one output index .
The tensor network associated with a quantum circuit contains a tensor for each gate, a tensor for the initial state of each qubit, and a tensor for the output state under consideration on those qubit lines for which the outcome or has been specified. Edges follow the qubit lines. In this paper we consider only single and two-qubit gates, without loss of generality since any quantum gate can be approximated to arbitrary precision using only single and two-qubit gates. A single-qubit gate is represented as a rank-2 tensor (one edge entering, one edge leaving the associated node), corresponding to a matrix, and a two-qubit gate is represented as a rank-4 tensor (two edges entering, two edges leaving), corresponding to a matrix.
State-of-the-art methodologies for quantum circuit simulation are for the most part based on the tensor network representation of the circuit discussed as described above, which we will refer to as circuit graph. A seminal work in the area is [markov2008simulating], describing a simulation algorithm that runs in time exponential in the treewidth of the line graph of the circuit graph (the treewidth of the line graph is within a constant factor of the treewidth of the circuit graph). Numerical experiments with this methodology are presented in [fried2017qtorch]. After a preprint of our paper was posted online (i.e., version 1 of this arXiv paper), a series of recent works [boixo2017simulation, li2018quantum, chen201864, chen2018classical] have tackled the problem with related techniques, increasing the size or depth of circuits that can be simulated. The best known upper bounds on the computational complexity of simulating quantum circuits of the class studied in this paper are given in [aaronson2016complexity]. The Appendix discusses the existing literature in more detail.
3 Simulation methodology: main building blocks
We propose an approach that builds on the foundations of tensor networks, with some generalizations to allow for more flexibility in the calculations, especially when we are interested in computing the entire state vector rather than single amplitudes. Three ideas are crucial for our approach: using hyperedges to exploit diagonal gates, allowing contractions between non-adjacent nodes of the tensor network, and slicing. The first two ideas are not discussed in the literature on tensor networks for quantum circuit simulation, to the best of our knowledge; slicing is used in [haner2017simulation]. We describe these ideas below.
Formally, we call a gate diagonal if the associated tensor is nonzero only if for . For example, the identity matrix is a diagonal gate; this is a generalization of the concept of diagonal matrices, translated to tensors. For a diagonal gate, because all elements outside the diagonal are zero, it is convenient to assign the same index label to input and output edges, writing the tensor as . This representation is not only natural, but also more economical in that it reduces the total number of index labels. We remark that in case of a sequence of diagonal gates, the same index label could be associated with multiple edges spanning across multiple nodes of the tensor network. Thus, rather than representing the tensor network as a graph, we use a directed hypergraph, in which each hyperedge consists of an ordered sequence of nodes and is associated with a single index label. A hyperedge consisting of just two nodes is equivalent to an edge in the original tensor network. We give an example of this representation in Fig. LABEL:fig:hypergraph_example.