Validating Quantum-Classical Programming Models with Tensor Network Simulations

Validating Quantum-Classical Programming Models with Tensor Network Simulations

Alexander McCaskey Quantum Computing Institute, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States    Eugene Dumitrescu Quantum Computing Institute, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States    Mengsu Chen Department of Physics, Virginia Tech, Blacksburg, Virginia 24060, United States    Dmitry Lyakh Quantum Computing Institute, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States    Travis S. Humble Quantum Computing Institute, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States Bredesen Center for Interdisciplinary Research, University of Tennessee, Knoxville, Tennessee 37996, United States
July 2, 2019

The exploration of hybrid quantum-classical algorithms and programming models on noisy near-term quantum hardware has begun. As hybrid programs scale towards classical intractability, validation and benchmarking are critical to understanding the utility of the hybrid computational model. In this paper, we demonstrate a newly developed quantum circuit simulator based on tensor network theory that enables intermediate-scale verification and validation of hybrid quantum-classical computing frameworks and programming models. We present our tensor-network quantum virtual machine (TNQVM) simulator which stores a multi-qubit wavefunction in a compressed (factorized) form as a matrix product state, thus enabling single-node simulations of larger qubit registers, as compared to brute-force state-vector simulators. Our simulator is designed to be extensible in both the tensor network form and the classical hardware used to run the simulation (multicore, GPU, distributed). The extensibility of the TNQVM simulator with respect to the simulation hardware type is achieved via a pluggable interface for different numerical backends (e.g., ITensor and ExaTENSOR numerical libraries). We demonstrate the utility of our TNQVM quantum circuit simulator through the verification of randomized quantum circuits and the variational quantum eigensolver algorithm, both expressed within the eXtreme-scale ACCelerator (XACC) programming model.

thanks: This manuscript has been authored by UT-Battelle, LLC, under Contract No. DE-AC0500OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for the United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan.

I Introduction

Quantum computing is a computation paradigm that relies on the principles of quantum mechanics in order to process information. Recent advances in both algorithmic research, which has found remarkable speed-ups for a growing number of applications Childs and van Dam (2010); Montanaro (2016); Biamonte and Bergholm (2017), and hardware development Linke et al. (2017); Friis et al. (2018) continue to progress the field of quantum information processing. The near-term state of quantum computing is defined by the noisy intermediate-scale quantum (NISQ) paradigm which involves small-scale noisy quantum processors Preskill (2018) being used in a hybrid quantum-classical framework. In this context, recent experimental demonstrations Peruzzo et al. (2014); O’Malley et al. (2016a); Kandala et al. (2017); Otterbach et al. (2017); Dumitrescu et al. (2018) of hybrid computations have reinforced the need for robust programming models and classical validation frameworks.

The successful integration of quantum processors into conventional computational workloads is a complex task which depends on the programming and execution models that define how quantum resources interact with conventional computing systems Humble and Britt (2016); Britt and Humble (2017a). Many different models have been proposed for programming quantum computers and a number of software development efforts have begun focusing on high-level hybrid programming mechanisms capable of integrating both conventional and quantum computing processors together Green et al. (2013); Javadi-Abhari et al. (2014); Wecker and Svore (2014); Humble et al. (2014); Smith et al. (2016a); Liu et al. (2017); Svore et al. (2018); PAKIN (2018). For example, recent efforts have focused on Python-based programming frameworks enabling the high-level expression of quantum programs in a classical context, which may target numerical simulators or a variety of physical quantum processing units (QPUs) Smith et al. (2016b); Steiger et al. (2016); Cross et al. (2017). The eXtreme-scale ACCelerator programming model (XACC) is a recently-developed quantum-classical programming, compilation, and execution framework that enables programming across multiple languages and targets multiple virtual and physical QPUs McCaskey et al. (2017).

In all cases, the verification of quantum program correctness is a challenging and complex task due to the intrinsically noisy nature of near-term QPUs, and this is additionally complicated by remote hosting. As a remedy, numerical simulation techniques can greatly expedite the analysis of quantum-classical programming efforts by providing direct insight into the prepared quantum states, as well as serving to test a variety of quantum computing hardware models. Modeling and simulation is essential for designing effective program execution mechanisms because it provides a controlled environment for understanding how complex computational systems interact, subsequently generating feedback based on the state machine statistics. For example, the performance of existing QPUs is limited by the hardware connectivity Linke et al. (2017) and numerical simulations can draw on a broad range of parameterized models to test new processor layouts and architectures.

In practice, exact brute-force simulations of quantum computing are notoriously inefficient in memory complexity due to the exponential growth in resources with respect to system size. These brute-force methods explicitly solve the Schrodinger equation, or a mixed-state master equation, using a full representation of the quantum state in its underlying (exponentially large) Hilbert space. Limits on available memory place upper bounds on the size of the vectors or density operators that can physically be stored, severely restricting the size the simulated quantum circuit. Even with the availability of current large-scale HPC systems, including the state-of-the-art supercomputing systems, recent records for quantum circuit simulations are limited to less than 50 qubits Häner and Steiger (2017); Pednault et al. (2017). The performance of the brute-force quantum circuit simulators on current supercomputing architectures is also limited by the inherently low arithmetic intensity (Flop/Byte ratio) of the underlying vector operations (sparse matrix-vector multiplications) required for simulating a discrete sequence of one- and two-qubit gates.

The inherent inefficiency of the brute-force state-vector quantum circuit simulators has motivated a search for approximate numerical simulation techniques increasing the upper bound on the number of simulated qubits. As we are interested in general-purpose (universal) quantum circuit simulators, we will omit efficient specialized simulation algorithms that target certain subclasses of quantum circuits, for example, quantum circuits composed of only Clifford operations Aaronson and Gottesman (2004). As a general solution, we advocate for the use of the tensor network (TN) theory as a tool for constructing factorized approximations to the exact multi-qubit wave-function tensor. The two main advantages offered by the tensor-network based wave-function factorization are (1) the memory (space) and time complexity of the quantum circuit simulation reflect the level of entanglement in the quantum system, (2) the numerical action of quantum gates on the factorized wave-function representation results in numerical operations (tensor contractions) which become arithmetically intensive for entangled systems, thus potentially delivering close to the peak utilization of modern HPC platforms.

Ii Quantum Circuit Simulation with Tensor Networks

Tensor network theory Orús (2014); Biamonte and Bergholm (2017) provides a versatile and modular approach for the dimensionality reduction of operators acting in high-dimensional linear spaces. For the following discussion, a tensor is a generalization of a vector and is defined in a linear space constructed as the tensor product of two or more primitive vector spaces. Consequently, the components of a tensor are enumerated by a tuple of indices, instead of by a single index as is the case for vectors. From the numerical perspective, a tensor can be viewed as a multi-dimensional array of objects, which may be real or complex numbers. In this work, following the physics nomenclature, we shall refer to the number of indices in a tensor as its rank (in this case the tensor rank is ). Each index represents a distinct vector space contributing to the composite space by the tensor product. The extent of the range of each index gives the dimension of the vector space. In their essence, tensor networks aim at decomposing higher-rank tensors into a contraction over lower-rank tensors such that the factorized product accurately reconstructs properties of the original tensor (i.e. a variant of lossy compression in linear spaces). Any tensor can be approximated by a suitably chosen tensor network with arbitrary precision, however the size of the tensor factors may grow exponentially in worst case examples. Tensor factorizations, which we also refer to as decompositions, are not unique in general and the problem of finding the optimal tensor decomposition is a difficult non-convex optimization problem Kolda (2015).

In practice, a tensor network factorization is typically specified by a graph in which the nodes are the tensor factors and the edges represent physical or auxiliary vector spaces which are associated with the indices of the corresponding tensor factors. A closed edge, that is, an edge connecting two nodes, represents a contracted index shared by two tensor factors over which a summation is to be performed. In a standalone tensor network, contracted indices are associated with auxiliary vector spaces. An open edge, that is, an edge connected to only one node, represents an uncontracted index of that tensor factor. Uncontracted indices are typically associated with physical vector spaces. Different tensor network architectures differ by the topology of their representative graphs. Furthermore, one can define even more general tensor network architectures by replacing graphs with hypergraphs, in which case an edge may connect three or more tensors. In the subsequent discussion, however, we will mostly deal with conventional graph topologies.

Figure 1: Figure depicting the tensor decomposition of a wavefunction into the MPS form

A quantum many-body wave-function, including a multi-qubit wave-function, is essentially a high-rank tensor (its rank is equal to the number of simulated quantum particles or quasi-particles) Orús (2014). A number of different tensor network architectures have been suggested for the purpose of factorizing quantum many-body wave-functions, including the matrix-product state (MPS) White (1992); Schollwöck (2011), the projected entangled pair-state (PEPS) Schuch et al. (2007); Verstraete et al. (2008), the tree tensor network state (TTNS) Murg et al. (2010); Nakatani and Chan (2013); Dumitrescu (2017), the multiscale entanglement renormalization ansatz (MERA) Vidal (2006); Evenbly and Vidal (2009), as well as somewhat related non-conventional schemes like the complete-graph tensor network (CGTN) Marti et al. (2010). All of the above tensor network ansaetze differ in the factorization topology, that is, in how the tensor factors are contracted with each other to form the final quantum many-body wave-function. In a good tensor network factorization, topology is induced by the entanglement structure of a particular quantum many-body system. Many physical systems are described by many-body Hamiltonians with only local interactions – in many cases, nearest neighbor only – with correlation functions decaying exponentially for non-critical states. In such cases, the locality structure of the many-body Hamiltonian induces the necessary topology required to properly capture the quantum correlations present in the system of interest. The factorization topology also strongly affects the computational cost associated with the numerical evaluation/optimization of a specific tensor network architecture. Another important characteristic of a tensor network is its so-called maximal bond dimension, that is, the maximal dimension of the auxiliary linear spaces (auxiliary linear spaces are those contracted over). Provided that the maximal bond dimension is bounded, many tensor network factorizations can be evaluated with a polynomial computational cost in the bond dimension. In practice, the entanglement structure of the underlying quantum many-body system determines the maximal bond dimension needed for a given error tolerance and a given tensor network topology. A poorly chosen tensor network topology will necessarily lead to rapidly increasing (exponentially at worst) bond dimensions in order to keep the factorization error within the error threshold.

The entanglement structure in a multi-qubit wave-function is determined by the quantum circuit and may be unknown in general. Consequently, there is no well-defined choice of a tensor network architecture (topology) that could work equally well for all quantum circuits, unless it is some kind of an adaptive topology. In practice, the choice of a tensor network architecture for representing a multi-qubit wave-function is often dictated by numerical convenience and ease of implementation. For example, one of the simplest tensor network architectures, the MPS ansatz, was used to simulate Shor’s algorithm for integer factorization Dang et al. (2017). Although the inherently one-dimensional chain topology of the MPS ansatz often results in severely growing bond dimensions, and this can be remedied by a more judicious tensor network formDumitrescu (2017), its computational convenience and well understood theory makes the MPS factorization an appealing first candidate for our quantum virtual machine (quantum circuit simulator). In future, we plan on adding more advanced tensor network architectures.

In order to simulate a general quantum circuit over an -qubit register with the tensor network machinery the following steps will be necessary (see Figure 2):

  1. Specify the chosen tensor network graph that factorizes the rank- wave-function tensor into a contracted product of lower-order tensors (factors).

  2. Transform the quantum circuit into an equivalent quantum circuit augmented with SWAP gates in order to maximize the number of accelerated gate applications (see below). This is an optional step.

  3. Group quantum gates into ordered aggregates (super-gates) which will act as a whole on the qubit wave-function. In the simplest case, all quantum gates will be applied one-by-one in order of appearance, with no aggregation. This is an optional step.

  4. Sequentially apply aggregated super-gates (or individual gates when no aggregation occurred) to the wave-function tensor network, thus evolving it towards the output state.

Figure 2: Graphical illustration of the general quantum circuit simulation algorithm with the qubit wave-function factorized as a tensor network.

In the above general algorithm, the application of a super-gate (or just an individual gate) on a multi-qubit wave-function tensor consists of the following steps:

  1. Append the individual gates constituting the given super-gate to the input wave-function tensor network , thus obtaining a larger tensor network .

  2. If there are 2- or higher-body gates present, check whether they are applied to the qubit pairs or triples, etc. that allow accelerated gate application (for example, in MPS factorization, these would be the adjacent qubit pairs, triples, and so on). If yes, evaluate their action in an accelerated fashion (see below). Otherwise, resort to the general algorithm in the next steps.

  3. Instantiate a new tensor network by cloning .

  4. Close with , thus obtaining a closed tensor network .

  5. Optimize the tensors of to maximize .

  6. If value is not acceptable, increase dimensions of the auxiliary spaces in and repeat Step 5.

Figure 3: Graphical illustration of an accelerated evaluation of the action of a two-body gate on a pair of adjacent qubits in the matrix-product state representation.

In cases where an accelerated gate application is possible (for example, a 2-body gate is applied to the adjacent qubits in the MPS-factorized wave-function), one can restrict the update procedure only to the tensor factors directly affected by the gate action. In case of MPS factorization, in order to apply a 2-body gate to two adjacent qubits one can contract the gate tensor with the two MPS tensors representing the affected qubits and then perform the singular value decomposition (SVD) on the tensor-result, thus obtaining the new (updated) MPS tensors as illustrated in Figure 3.

The above general algorithm demonstrates the procedure used by TNQVM for approximate simulation of quantum circuits based on the tensor network factorization. For the sake of completeness, we should also mention quantum circuit simulators which use tensor representations for a brute-force simulation of quantum circuits with no approximations Pednault et al. (2017); Fried et al. (2017). This is different from our approach which is based on the explicit factorization of the multi-qubit wave-function tensor. In these other tensor-based schemes the entire quantum circuit as a collection of gate tensors is considered as a tensor network which is subsequently contracted over in order to compute observables or evaluate output probability distributions. In Ref. 27, a clever tensor slicing technique was introduced that avoided the evaluation of the full wave-function tensor, thus reducing the memory footprint and bypassing the existing 45-qubit limit on large-scale HPC systems. Yet, despite enabling simulations of somewhat larger qubit counts, this technique does not lift the asymptotic bounds of the exact simulation cost.

Iii Quantum Virtual Machines

In order to evaluate the correctness of a quantum program and its implementation via a decomposition into primitive gate operations, it is necessary to model both the conventional computing and quantum computing elements of the system architecture. In particular, it is necessary to expose the interface to the available instruction set architecture (ISA) and methods to support quantum program execution, scheduling, and layout. There are currently many different technologies available for testing and evaluating quantum processing units, and each of these technologies presents different ISAs and methods for program execution Britt and Humble (2017b).

As shown in Fig. 4, a quantum virtual machine (QVM) provides a portable abstraction of technology-specific details for a broad variety of heterogeneous quantum-classical computing architectures. The hardware abstraction layer (HAL) defines a portable interface by which the underlying quantum processor technology as well as other hardware components such as memory are exposed to system libraries, runtimes and drivers running on the host conventional computer. The implementation of the HAL provides an explicit translation of quantum program instructions into native, hardware-specific syntax, which may be subsequently executed by the underlying quantum processor. The HAL serves as a convenience to ensure portability of programs across different QPU platforms, while the QVM encapsulates the environment in which applications can be developed independently from explicit knowledge of QPU details. This environment is provided by the integration of the HAL with programming tools, libraries, and frameworks as well as the host operating system.

Figure 4: A schematic design how a quantum virtual machine (QVM) manages access to an underlying QPU through the hardware abstraction layer. A program binary exists within an application framework that accesses system resources through libraries. Library calls are managed by the host operating system, which manages and schedules requests to access hardware devices including attached QPUs. The hardware abstraction layer (HAL) provides a portable interface by which these requests are made to the underlying QPU technology.

Application performance within a QVM depends strongly on the efficiency with which host programs are translated into hardware-specific instructions. This includes the communication overhead between the HAL and hardware layers as well as the overhead costs for managing these interactions by the host operating system. Both algorithmic and hardware designs impact this performance by deciding when and how to allocate computational burden to specific devices. Presently, there is an emphasis on the development and validation of hybrid programs, which loosely integrates quantum processing with conventional post-processing tasks. This algorithmic design introduces a requirement for transferring memory buffers between the host and QPU systems. Memory management therefore becomes an important task for application behavior. While current QPUs are often accessed remotely through network interfaces, long-term improvements in application performance will require fine grain control over memory management.

Iv Tensor Network Quantum Virtual Machine

Our implementation of a QVM presented in this work is based on a previously developed hybrid quantum-classical programming framework, called XACC McCaskey et al. (2017), combined with a quantum circuit simulator that uses tensor network theory for compressing the multi-qubit wave-function. We provide an overview of the Tensor Network Quantum Virtual Machine (TNQVM) and its applications, including its software architecture and integration with the XACC programming framework. Since XACC integrates directly with TNQVM, compiled programs can in principle be verified instantaneously on any classical computer including workstations as well as HPC clusters and supercomputers. The support of different classical computer architectures (single-core, multi-core, GPU, distributed) for performing numerical simulations is achieved by interchangeability of the numerical backends in our TNQVM simulator. These backends are numerical tensor algebra libraries which perform all underlying tensor computations on a supported classical computer. In this work, we detail the HAL implementation of TNQVM using ITensor ite () for serial simulations, with some example applications demonstrating the utility of TNQVM. We also sketch some details on the upcoming ExaTENSOR backend that will enable large-scale quantum circuit simulations on homo- and heterogeneous HPC systems. Independent verification of hybrid programs within TNQVM provides an increased confidence in the use of these codes to characterize and validate actual QPUs.

iv.1 Xacc

The eXtreme-scale ACCelerator programming model (XACC) has been specifically designed for enabling near-term quantum acceleration within existing classical high-performance computing applications and workflows McCaskey et al. (2017, 2018). This programming model and associated open-source reference implementation follow the traditional co-processor model, akin to OpenCL or CUDA for GPUs, but takes into account the subtleties and complexities arising from the interplay between classical and quantum hardware. XACC provides a high-level application programming interface (API) that enables classical applications to offload quantum programs (represented as quantum kernels, similar in structure to GPU kernels) to an attached quantum accelerator in a manner that is agnostic to both the quantum programming language and the quantum hardware. Hardware agnosticism enables quantum code portability and also aids in benchmarking, verification and validation, and performance studies for a wide array of virtual (simulators) and physical quantum platforms.

To achieve language and hardware interoperability, XACC defines three important abstractions: the quantum intermediate representation (IR), compilers, and accelerators. XACC compiler implementations map quantum source code to the IR – the in-memory object key to integrating of a diverse set of languages to a diverse set of hardware. IR instances (and therefore compiled kernels) are executed by realizations of the accelerator concept, which defines an interface for injecting physical or virtual quantum hardware. Accelerators take this IR as input and delegate execution to vendor-supplied APIs for the QPU, or an associated API for a simulator. This forms the hardware abstraction layer, or abstract device driver, necessary for a general quantum (virtual) machine.

The IR itself can be further decomposed into instruction and function concepts, with instructions forming the foundation of the IR infrastructure and functions serving as compositions of instructions (see Figure 5). Each instruction exposes a unique name and the set of qubits that it operates on. Functions are a sub-type of the instruction abstraction that can contain further instructions. This setup, the familiar composite design pattern com (), forms an n-ary tree of instructions where function instances serve as nodes and concrete instructions instances serve as leaves.

Figure 5: Architecture of the XACC intermediate representation demonstrating sub-type extensibility for instructions, and the associated instruction visitor abstraction, enabling runtime-extension of concrete instruction functionality.

Operating on this tree and executing program instructions is a simple pre-order traversal on the IR tree. In order to enhance this tree of instructions with additional functionality, XACC provides a dynamic double-dispatch mechanism, specifically an implementation of the familiar visitor pattern Gamma et al. (1995). The visitor pattern provides a mechanism for adding virtual functions to a hierarchy of common data structures dynamically, at runtime, and without modifying the underlying type. This is accomplished via the introduction of a visitor type that exposes a public set of visit functions, each one taking a single argument that is a concrete sub-type of the hierarchical data structure composition (see Figure 5). For gate model quantum computing, XACC exposes a visitor class that exposes a visit method for all concrete gate instructions (X, H, RZ, CX, etc…). All instructions expose an accept method that takes as input a general visitor instance, and invokes the appropriate visit method on the visitor through double-dispatch. XACC instruction visitors thereby provide an extensible mechanism for dynamically operating on, analyzing, and transforming compiled IR instances at runtime.

iv.2 Tensor Network Accelerator and Instruction Visitors

The integration of a tensor network quantum circuit simulator with XACC can be accomplished through extensions of appropriate XACC concepts. In essence, this is an extension of the quantum virtual machine hardware abstraction layer that enables existing high-level programs and libraries to target a new virtual hardware instance. Injecting new simulators into the XACC framework requires a new implementation of the accelerator concept. Enabling that simulator to be extensible in the type of tensor networks, algorithmic execution, and the library backend requires different mappings of the IR to appropriate simulation data structures. This can be accomplished through individual implementations of the instruction visitor concept.

Our open-source implementation of the Tensor Network Quantum Virtual Machine (TNQVM) library extends the XACC accelerator concept with a new derived class that simulates pure-state quantum computation via tensor network theory McCaskey and Chen (2017). This library provides the TNAccelerator (Tensor Network Accelerator) that exposes an execute method that takes as input the XACC IR function to be executed. Generality in the tensor network graph structure and the simulation algorithm is enabled through appropriate implementations of the instruction visitor concept. For example, an instruction visitor can be implemented to map the incoming XACC IR tree to tensor operations on a matrix product state (MPS) ansatz. Walking the IR tree via pre-order traversal and invoking the instruction visitor accept mechanism on each instruction triggers invocation of the appropriate visit function via double dispatch. The implementation of these visit methods provides an extensible mechanism for performing instruction-specific tensor operations on a specific tensor network graph structure.

Furthermore, this visitor extension mechanism can be leveraged to not only provide new tensor network structures and operations, but also provide the means to leverage different tensor algebra backend libraries, and therefore introduce a classical parallel execution context. Different visitor implementations may provide a strictly serial simulation approach, while others can enable a massively parallel or heterogeneous simulation approach (incorporating the Message Passing Interface, OpenMP, and/or GPU acceleration via CUDA or OpenCL).

To date we have implemented two instruction visitor backends for the TNQVM and the TNAccelerator. We have leveraged the ITensor library ite () to provide a serial matrix product state simulator, and the ExaTENSOR library from the Oak Ridge Leadership Computing Facility (OLCF) to provide a matrix product state simulator that leverages MPI, OpenMP and CUDA for distributed parallel execution on GPU-accelerated heterogeneous HPC platforms. However, the ExaTENSOR library is currently undergoing final testing before its public release, thus it has not been utilized yet as a fully functional backend of TNQVM. Nevertheless, we will provide some details on the ExaTENSOR backend below.

iv.2.1 ITensor MPS Implementation

The ITensor MPS instruction visitor implementation provides a mechanism for the simulation of an -qubit wavefunction via a matrix product state tensor network decomposition. The MPS provides a way to restrict the entanglement entropy through SVD and associated truncation of Schmidt coefficients to reduce the overall Schmidt rank. With these MPS states, we need numbers to represent qubits, where is the largest Schmidt rank we keep. As long as is not too large (grows polynomially with system size), the space complexity is feasible for classical simulation. For example, if the quantum register is used to store the gapped ground states of systems with local interactions, we can simulate larger number of qubits and still adequately approximate the wavefunction by keeping small enough.

Our ITensor MPS visitor implementation begins by initializing a matrix product state tensor network using the serial tensor data structures provided by the ITensor libraryite (). Simulation of the compiled IR program is run through a pre-order tree traversal of the instruction tree. At each leaf of this tree (a concrete instruction), the accept method on the instruction is invoked (see Figure 5) which dispatches a call to the correct visit method of the instruction visitor.

At this point, the appropriate gate tensor is contracted into the MPS representation, which maps onto itself under local quantum gates. Updating the MPS according to two-body entanglers involves two-qubit gates which act on two rank-3 tensors, and the full contraction results in a rank-4 tensor. We maintain the MPS structure by decomposing the rank-4 tensor into two rank-3 tensors and a diagonal matrix between them. Note that when the two qubits are not adjacent we apply SWAP gates on intermediary qubits to bring them together. The gate is then applied and reverse SWAPs bring the qubits back to the original positions. Otherwise, applying a gate to non-adjacent qubits would modify the underlying graph topology, complicating future evolution by adding an non-local loop in the tensor network.

The SVD is used to return the resulting rank-4 tensor to the canonical MPS form ( rank-3 tensors and diagonal matrices), with the singular values below a cutoff threshold (e.g., default is ) being truncated. The truncation over subspaces supporting exponentially small components of the wave-function allows our MPS-based TNQVM simulate large numbers of qubits, contingent on some slowly growing entanglement properties. Examples and discussion may be found in the demonstrations in Sec. V.

iv.2.2 ExaTENSOR MPS Implementation

The ExaTENSOR numerical tensor algebra backend will enable larger-scale TNQVM quantum circuit simulations on GPU-enabled and other accelerated as well as conventional multicore HPC platforms. ExaTENSOR stores tensors in distributed memory (on multiple/many nodes) as a generally sparse collection of tensor slices in a hierarchical fashion. Such distributed tensor storage lifts the memory limitations pertinent to a single node, thus extending the maximal number of simulated qubits. Although we currently target the (distributed) MPS implementation, ExaTENSOR also provides a generic tensor network builder that can be used for constructing an arbitrary tensor network. The ExaTENSOR MPS visitor implementation provides a constructor that creates the MPS representation of the simulated multi-qubit wave-function (all constituent MPS tensors are distributed now). Then the XACC IR tree traversal invokes ExaTENSOR MPS visit method for each traversed node (instruction). The visit method implements lazy visiting, namely it only caches the corresponding instruction (gate) in the instruction cache of the ExaTENSOR MPS visitor. At some point, once the instruction cache has enough work to perform, the evaluate method of the ExaTENSOR visitor is invoked which implements the generic gate action algorithm shown in Section II. Specifically, it allocates the output MPS tensor network, that is, the result of the action of the cached gates on the input MPS tensor network. Then it creates the inner product (closed) tensor network by joining the gate tensors to the input MPS tensor network, subsequently closing it with the output tensor network (see Figure 2). This closed tensor network is a scalar whose value needs to be maximized. The ExaTENSOR MPS visitor will utilize the standard gradient descent algorithm by evaluating the gradient with respect to each tensor constituting the output tensor network. Each of these gradients is an open tensor network itself that needs to be fully contracted. Importantly, the computational cost of this contraction of many tensors strongly depends on the order in which the pairwise tensor contractions are performed. Finding the optimal tensor contraction sequence is an NP-hard problem. Instead, ExaTENSOR implements a heuristic algorithm that delivers the best found sequence of pairwise tensor contractions in a reasonable amount of time (subseconds). Then this pseudo-optimal sequence of pairwise tensor contractions is cached for a subsequent reuse, if needed. Given the sequence of pairwise tensor contractions, the ExaTENSOR library will numerically evaluate all of them and return the gradients that will subsequently be used for updating the output tensor network tensors, until the optimized inner product scalar reaches the desired value. In case it does not reach the desired value, the tensors constituting the output tensor network are reallocated with increased dimensions of the auxiliary spaces and the entire procedure is repeated. At this point, the early prototype implementation of the ExaTENSOR MPS visitor in TNQVM is based on the single-node version of the ExaTENSOR library and we are currently finishing the integration of the TNQVM with the distributed version of the ExaTENSOR library as well as performing the final testing of the ExaTENSOR library itself before the public release scheduled later this year.

V Demonstration

Here we demonstrate the utility of TNQVM by describing the overall memory scaling of our matrix product state TNQCM for varying levels of entanglement and system size. Our demonstrations show how TNQVM can be leveraged to validating hybrid quantum-classical programming models. Specifically we focus on random circuit simulations and the variational quantum eigensolver (VQE) hybrid algorithm.

v.1 Profiling Random Circuit Simulations with MPS

We demonstrate the improved resource cost of representing quantum states ( vs ) with TNQVM by using an MPS formulation and by profiling the memory usage of simulating randomly generated circuits. We vary the entanglement structure of our random circuits by constructing time slices defined as rounds. The first round begins with a layer of Hadamard operations on all qubits, followed by a layer of single qubit gates (Pauli gates and other general rotations), followed by a set of nearest-neighbor CNOT entangling operations. Multiple rounds constitute multiple iterations of generating these layers (excluding the Hadamards, which only appear in the first layer). Clearly, later rounds add layers of entangling CNOT operations and therefore generate states with a more complicated entanglement structure.

Figure 6: Memory usage as a function of the number of rounds (circuit depth) and with increasing number of qubits. Memory usage is constant for a small number of rounds but rapidly increases as the total circuit depth and number of qubits increases.

We generate these random circuits for through qubits in increments of , and for numbers of rounds ranging from through in increments of . For each pair, we generate 10 random circuits, compute the heap memory usage, and compute the mean and standard deviation of the memory usage. The results are plotted in Figure 6. For lightly-entangled systems (i.e. those generated by a small number of random rounds) we see that the MPS structure is able to encode the wavefunction of the system efficiently with a small cost. For example, for only two rounds the maximum bond dimension is , which is independent of system size. As we increase the entanglement in our random circuits, the computational cost of the MPS simulations increases exponentially. This is because circuits we have sampled from are designed to saturate exponentially increase the entanglement which saturates at entanglement described by of an -qubit system undergoing random roundsBoixo et al. (2018).

v.2 Variational Quantum Eigensolver

{listing} [H] {Verbatim}[commandchars= {}] h2˙src = ””” ˙˙qpu˙˙ ansatz(AcceleratorBuffer b, double t0) – RX(3.1415926) 0 RY(1.57079) 1 RX(7.85397) 0 CNOT 1 0 RZ(t0) 0 CNOT 1 0 RY(7.8539752) 1 RX(1.57079) 0 ˝ ˙˙qpu˙˙ term0(AcceleratorBuffer b, double t0) – ansatz(b, t0) MEASURE 0 [0] ˝ … (rest of measurement kernels) ””” qpu = xacc.getAccelerator(’tnqvm’) buffer = qpu.createBuffer(’q’,2) p = xacc.Program(qpu, h2˙src) kernels = p.getKernels() for t0 in np.linspace(-np.pi,np.pi,100): for k in kernels[1:]: k.execute(buffer, [xacc.InstructionParameter(t0)])
Figure 7: XACC program compiling and executing the variational quantum eigensolver for the molecule.

Finally, we demonstrate the utility of our tensor network simulation XACC Accelerator backend (the TNQVM library) in validating quantum-classical algorithms. It is this rapid feedback mechanism that is critical to understanding intended algorithmic results, and enables confidence in the programming of larger systems. Here we demonstrate this programmability and its verification and validation through a simple simulation of diatomic hydrogen via the variational quantum eigensolver algorithm. The quantum-classical program is shown in the listing below leveraging the TNQVM library.

This code listing demonstrates the integration of XACC and our tensor network accelerator implementation. The code shows how to program, compile, and execute the VQE algorithm to compute expectation values for the simplified (symmetry-reduced), two qubit Hamiltonian (see O’Malley et al. (2016b)). We start off by defining the quantum source code as XACC quantum kernels (note - we have left out a few measurement kernels for brevity). Each of these kernels is parameterized by a single double representing the variational parameter for the problem ansatz circuit (the ansatz kernel in the h2_src string). Integration with the TNQVM simulation library is done through a public XACC API function (getAccelerator). This accelerator reference is used to compile the program and get reference to executable kernels that delegate work to the TN Accelerator. We then loop over all and compute the expectation values for each Hamiltonian measurement term. Notice that this execution mechanism is agnostic to the accelerator sub-type. This provides a way to quickly swap between validation and verification with TNQVM, and physical hardware execution on quantum computers from IBM, Rigetti, etc.

Vi Conclusion

In this work we have discussed the concept of a general quantum virtual machine and introduced a concrete implementation of the QVM that enables quantum-classical programming with validation through an extensible tensor network quantum circuit simulator (TNQVM). We have discussed the applicability and scalability of a matrix product state backend implementation for TNQVM and discussed the role of TNQVM in benchmarking quantum algorithms and hybrid quantum-classical applications including random circuit sequences used in quantum supremacyBoixo et al. (2018) and the variational quantum eigensolverPeruzzo et al. (2014). We have chosen a tensor network based quantum virtual machine due to the complexity reduction such a formalism provides for a broad range of problems. In general TNQVM enables large-scale simulation of quantum circuits which generate states characterized by short-range entanglement. Studying systems with long-range entanglement interactions will require further developments in implementing more advanced tensor network decomposition types. We plan to investigate the applicability of the tree tensor network and the multiscale entanglement renormalization ansatz in future work, in an effort to scale simulation capabilities to a larger number of qubits.


This work has been supported by the Laboratory Directed Research and Development Program of Oak Ridge National Laboratory and the US Department of Energy (DOE) Early Career Research Program. This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725. This manuscript has been authored by UT-Battelle, LLC, under contract DEAC0500OR22725 with DOE. The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan.


  • Childs and van Dam (2010) A. M. Childs and W. van Dam, Rev. Mod. Phys. 82, 1 (2010), URL
  • Montanaro (2016) A. Montanaro, npj Quantum Information 2, 15023 (2016), ISSN 2056-6387, URL
  • Biamonte and Bergholm (2017) J. Biamonte and V. Bergholm (2017), eprint 1708.00006, URL
  • Linke et al. (2017) N. M. Linke, D. Maslov, M. Roetteler, S. Debnath, C. Figgatt, K. A. Landsman, K. Wright, and C. Monroe, Proceedings of the National Academy of Sciences p. 201618020 (2017).
  • Friis et al. (2018) N. Friis, O. Marty, C. Maier, C. Hempel, M. Holzäpfel, P. Jurcevic, M. B. Plenio, M. Huber, C. Roos, R. Blatt, et al., Phys. Rev. X 8, 021012 (2018), URL
  • Preskill (2018) J. Preskill, arXiv preprint arXiv:1801.00862 (2018).
  • Peruzzo et al. (2014) A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O’Brien, Nature Communications 5, 4213 (2014), URL{#}supplementary-information.
  • O’Malley et al. (2016a) P. J. J. O’Malley, R. Babbush, I. D. Kivlichan, J. Romero, J. R. McClean, R. Barends, J. Kelly, P. Roushan, A. Tranter, N. Ding, et al., Phys. Rev. X 6, 031007 (2016a), URL
  • Kandala et al. (2017) A. Kandala, A. Mezzacapo, K. Temme, M. Takita, M. Brink, J. M. Chow, and J. M. Gambetta, Nature 549, 242 (2017), URL{#}supplementary-information.
  • Otterbach et al. (2017) J. Otterbach, R. Manenti, N. Alidoust, A. Bestwick, M. Block, B. Bloom, S. Caldwell, N. Didier, E. S. Fried, S. Hong, et al., arXiv preprint arXiv:1712.05771 (2017).
  • Dumitrescu et al. (2018) E. F. Dumitrescu, A. J. McCaskey, G. Hagen, G. R. Jansen, T. D. Morris, T. Papenbrock, R. C. Pooser, D. J. Dean, and P. Lougovski, Phys. Rev. Lett. 120, 210501 (2018), URL
  • Humble and Britt (2016) T. S. Humble and K. A. Britt, in 2016 IEEE High Performance Extreme Computing Conference (HPEC) (2016), pp. 1–8.
  • Britt and Humble (2017a) K. A. Britt and T. S. Humble, ACM Journal on Emerging Technologies in Computing Systems (JETC) 13, 39 (2017a).
  • Green et al. (2013) A. S. Green, P. L. Lumsdaine, N. J. Ross, P. Selinger, and B. Valiron, in ACM SIGPLAN Notices (ACM, 2013), vol. 48, pp. 333–342.
  • Javadi-Abhari et al. (2014) A. Javadi-Abhari, S. Patil, D. Kudrow, J. Heckey, A. Lvov, F. T. Chong, and M. Martonosi, in Proceedings of the 11th ACM Conference on Computing Frontiers (ACM, 2014), p. 1.
  • Wecker and Svore (2014) D. Wecker and K. M. Svore, arXiv preprint arXiv:1402.4467 (2014).
  • Humble et al. (2014) T. S. Humble, A. J. McCaskey, R. S. Bennink, J. J. Billings, E. F. DʼAzevedo, B. D. Sullivan, C. F. Klymko, and H. Seddiqi, Computational Science and Discovery 7, 015006 (2014), URL
  • Smith et al. (2016a) R. S. Smith, M. J. Curtis, and W. J. Zeng, arXiv preprint arXiv:1608.03355 (2016a).
  • Liu et al. (2017) S. Liu, X. Wang, L. Zhou, J. Guan, Y. Li, Y. He, R. Duan, and M. Ying, arXiv:1710.09500 (2017).
  • Svore et al. (2018) K. Svore, A. Geller, M. Troyer, J. Azariah, C. Granade, B. Heim, V. Kliuchnikov, M. Mykhailova, A. Paz, and M. Roetteler, in Proceedings of the Real World Domain Specific Languages Workshop 2018 (ACM, New York, NY, USA, 2018), RWDSL2018, pp. 7:1–7:10, ISBN 978-1-4503-6355-6, URL
  • PAKIN (2018) S. PAKIN, Theory and Practice of Logic Programming p. 1–22 (2018).
  • Smith et al. (2016b) R. S. Smith, M. J. Curtis, and W. J. Zeng, A practical quantum instruction set architecture (2016b).
  • Steiger et al. (2016) D. S. Steiger, T. Häner, and M. Troyer, ArXiv e-prints (2016), eprint 1612.08091.
  • Cross et al. (2017) A. W. Cross, L. S. Bishop, J. A. Smolin, and J. M. Gambetta, ArXiv e-prints (2017), eprint 1707.03429.
  • McCaskey et al. (2017) A. J. McCaskey, E. F. Dumitrescu, D. Liakh, M. Chen, W.-c. Feng, and T. S. Humble, ArXiv e-prints (2017), eprint 1710.01794.
  • Häner and Steiger (2017) T. Häner and D. S. Steiger, ArXiv e-prints (2017), eprint 1704.01127.
  • Pednault et al. (2017) E. Pednault, J. A. Gunnels, G. Nannicini, L. Horesh, T. Magerlein, E. Solomonik, and R. Wisnieff (2017), eprint 1710.05867, URL
  • Aaronson and Gottesman (2004) S. Aaronson and D. Gottesman, Phys. Rev. A 70, 052328 (2004), URL
  • Orús (2014) R. Orús, Annals of Physics 349, 117 (2014), ISSN 1096035X, eprint 1306.2164, URL
  • Kolda (2015) T. G. Kolda, Mathematical Programming 151, 225 (2015), ISSN 1436-4646, URL
  • White (1992) S. R. White, Physical Review Letters 69, 2863 (1992).
  • Schollwöck (2011) U. Schollwöck, Annals of Physics 326, 96 (2011), ISSN 00034916, eprint 1008.3477, URL
  • Schuch et al. (2007) N. Schuch, M. M. Wolf, F. Verstraete, and J. I. Cirac, Physical Review Letters 98, 140506 (2007), ISSN 00319007, eprint 0708.1567, URL
  • Verstraete et al. (2008) F. Verstraete, V. Murg, and J. Cirac, Advances in Physics 57, 143 (2008), ISSN 0001-8732, URL
  • Murg et al. (2010) V. Murg, F. Verstraete, Ö. Legeza, and R. M. Noack, Physical Review B - Condensed Matter and Materials Physics 82, 205105 (2010), ISSN 10980121, eprint 1006.3095, URL
  • Nakatani and Chan (2013) N. Nakatani and G. K. L. Chan, Journal of Chemical Physics 138, 134113 (2013), ISSN 00219606, eprint 1302.2298, URL
  • Dumitrescu (2017) E. Dumitrescu, Physical Review A 96, 062322 (2017), ISSN 24699934, eprint 1705.01140, URL
  • Vidal (2006) G. Vidal, Phys. Rev. Lett. 101, 110501 (2006), ISSN 0031-9007, eprint 0610099, URL{%}0A
  • Evenbly and Vidal (2009) G. Evenbly and G. Vidal, Physical Review B 79, 1 (2009), ISSN 1098-0121, eprint arXiv:0707.1454v3, URL
  • Marti et al. (2010) K. H. Marti, B. Bauer, M. Reiher, M. Troyer, and F. Verstraete, New Journal of Physics 12, 103008 (2010), URL
  • Dang et al. (2017) A. Dang, C. D. Hill, and L. C. L. Hollenberg (2017), eprint 1712.07311, URL
  • Fried et al. (2017) E. S. Fried, N. P. D. Sawaya, Y. Cao, I. D. Kivlichan, J. Romero, and A. Aspuru-Guzik, arXiv preprint arXiv:1709.03636 (2017), eprint 1709.03636, URL
  • Britt and Humble (2017b) K. A. Britt and T. S. Humble, in High Performance Computing, edited by J. M. Kunkel, R. Yokota, M. Taufer, and J. Shalf (Springer International Publishing, Cham, 2017b), pp. 98–105.
  • (44) iTensor, URL
  • McCaskey et al. (2018) A. McCaskey, E. Dumitrescu, D. Liakh, and T. Humble, ArXiv e-prints (2018), eprint 1805.09279.
  • (46) A look at the composite design pattern, learn-java/a-look-at-the-composite-design-pattern.html, accessed: 2018-03-12.
  • Gamma et al. (1995) E. Gamma, R. Helm, R. Johnson, and J. Vlissides, Design Patterns: Elements of Reusable Object-oriented Software (Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1995), ISBN 0-201-63361-2.
  • McCaskey and Chen (2017) A. McCaskey and M. Chen, Tnqvm - tensor network quantum virtual machine, (2017).
  • Boixo et al. (2018) S. Boixo, S. V. Isakov, V. N. Smelyanskiy, R. Babbush, N. Ding, Z. Jiang, M. J. Bremner, J. M. Martinis, and H. Neven, Nature Physics 14, 1 (2018), ISSN 17452481, eprint 1608.00263, URL
  • O’Malley et al. (2016b) P. J. J. O’Malley, R. Babbush, I. D. Kivlichan, J. Romero, J. R. McClean, R. Barends, J. Kelly, P. Roushan, A. Tranter, N. Ding, et al., Phys. Rev. X 6, 031007 (2016b), URL
Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters
Add comment
Loading ...
This is a comment super asjknd jkasnjk adsnkj
The feedback must be of minumum 40 characters
The feedback must be of minumum 40 characters

You are asking your first question!
How to quickly get a good answer:
  • Keep your question short and to the point
  • Check for grammar or spelling errors.
  • Phrase it like a question
Test description