qTorch: The Quantum Tensor Contraction Handler
Abstract
Classical simulation of quantum computation is necessary for studying the numerical behavior of quantum algorithms, as there does not yet exist a large viable quantum computer on which to perform numerical tests. Tensor network (TN) contraction is an algorithmic method that may be used for such simulations, often greatly reducing the computational cost over methods that simulate the full Hilbert space. In this study we implement a parallel tensor network contraction program for simulating a variety of quantum circuits, including the quantum approximate optimization algorithm (QAOA) applied to MaxCut and quantum circuits for quantum chemistry simulations. We show simulation results for 3 through 7regular MaxCut/QAOA circuits, even successfully simulating up to 100 qubits for some of the 3regular circuits. We test two different methods for generating the ordering of tensor index contractions: one is based on the tree decomposition of the line graph, while the other generates ordering using a straightforward stochastic scheme. Through studying instances of QAOA circuits, we show the expected result that as the treewidth of the quantum circuit’s line graph decreases, TN contraction becomes significantly more efficient than simulating the whole Hilbert space. This tradeoff occurs when the MaxCut problem’s graph regularity is five or six, suggesting that tensor contraction methods are superior for simulating MaxCut/QAOA with graphs of regularities approximately five and below. The stochastic contraction method outperforms the line graph based method only when the time to calculate a reasonable tree decomposition is prohibitively expensive. Finally, we release our software package, qTorch (The Quantum TensOR Contraction Handler), intended for general quantum circuit simulation. The software may be used to more quickly study the numerical behavior of arbitrary quantum algorithms. For a nontrivial subset of these quantum circuits, 50 to 100 qubits can easily be simulated on a single compute node, an ability that is beyond of reach of other modern software packages.
1 Introduction
Experimental hardware for quantum computing has been steadily improving in the past twenty years, indicating that a useful quantum computer that outperforms a classical computer may eventually be built. However, until a largescale and viable quantum computer has been built, numerically simulating quantum circuits on a classical computer will be necessary for predicting the behavior of quantum computers.
Such simulations can play an important role in the development of quantum computing by (1) numerically verifying the correctness and characterizing the performance of quantum algorithms [WS14, SSAG16, SSMAG16, TJD09, TJD11, Mis10], (2) simulating error and decoherence due to the interaction between the quantum computer and its environment [SSMAG16, GM13, VMH04, SMKE08, GZ13, TS14], and (3) deepening our understanding of the boundary between classical and quantum computing in terms of computational power, for which recent efforts for characterizing the advantage of quantum computers over classical computers [FH16, BIS16] serve as an example of this direction.
For our purposes, it suffices to consider the problem of quantum circuit simulation as one where we are given a quantum circuit and an initial state, with the goal of determining the probability of a given output state. Various approaches have been proposed for such simulation tasks. The most general method is to represent the state vector of an qubit state by a complex unit vector of dimension and apply the quantum gates by performing matrixvector multiplications. This is essentially the approach adopted in, for example, [WS14, TJD09, SSAG16, PESV14]. Such a method has the advantage that full information of the quantum computer is represented at any point during the circuit propagation. However, the exponential cost of storing and updating the state vector renders it prohibitive for simulating circuits of moderate sizes (e.g. 40 qubits for qHipster [SSAG16] and 45 qubits by Häner and Steiger [HS17]). On the other hand, for a wide class of circuits with restricted gate sets and input states[Got98, AG04, Val01, TD01, BG16], efficient classical simulation algorithms are available. For example, the numerical package Quipu [GM13] has been developed for taking advantage of prior results [Got98, AG04, BG16] on the stabilizer formalism to speed up general quantum circuit simulation. Finally, path integralbased methods [RG06] have also been proposed–though they do not improve the simulation cost, they lead to reduced memory storage requirements.
Other than considering the gate sets involved, an alternative perspective of viewing a quantum circuit is through its geometry or topology. This perspective was initiated by the work of Markov and Shi [MS05] on simulating a quantum circuit via tensor network contractions (during the preparation of this manuscript, an implementation of this simulation approach was brought to our attention [McC16]). An advantage of viewing quantum circuits as tensor networks is that one can afford to ignore the particular kinds of quantum gates used in a circuit, and instead only focus on the graph theoretic properties. While it is known that general quantum circuits involving universal sets of elementary gates are likely hard to simulate on a classical computer [NC00], this geometric perspective often allows for the efficient simulation of a quantum circuit with a universal gate set, provided that it satisfies certain graph theoretic properties. At least one software implementation of the tensor network simulation of method exists [McC16].
Among others, treewidth is an important graph theoretic parameter that determines the efficiency of contracting a tensor network of quantum gates. A property of graphs that is intensely studied in the graph theory literature [RS91, BK10, BK11, GD04], the treewidth provides important structural information about a quantum circuit. Namely, if the circuit’s underlying tensor network has treewidth , it is shown in [MS05] that the cost of simulating the circuit is . In [BIS16] treewidth is also used for estimating the classical resource needed for simulating certain quantum circuits.
Motivated by the importance of tensor networks in quantum circuit simulation in general (and for example quantum computational supremacy tests in particular), it becomes useful to have a circuit simulation platform singularly dedicated to tensor network contractions. One immediate challenge in contracting tensor networks is to find an efficient contraction ordering, which relies on finding a reasonable tree decomposition of the underlying graph (definitions are further discussed in Section 2). However, finding the optimal contraction ordering (or equivalently finding the minimumsize tree decomposition, or finding the treewidth of a graph) is NPcomplete [ACP87]: therefore one must typically resort to heuristic methods when finding this decomposition.
For this study, we implemented a set of tensor network (TN) contraction schemes to characterize their performance and simulate quantum circuits. Two of these schemes are reported here, as the others were inferior to the two successful contraction schemes. However, there are likely other heuristic schemes that outperform our stochastic algorithm, and this is an avenue worth pursuing. For a large set of quantum circuits, our tensor network based methods are shown to be less costly than simulation of the full Hilbert space, by comparing to simulations using the LIQUi> software package [WS14]. We emphasize that the tests in this report give timing data for finding the expectation value of a measurement performed after implementing a quantum circuit, not for completely characterizing a the circuit’s final state.
The remainder of the paper is organized as the follows. Section 2 sets up the definitions and notations used in the paper. Section 3 describes the heuristic methods used for contracting the quantum circuit tensor networks. Section 4 presents the example quantum circuits used as benchmarks for demonstrating the performance of our contraction algorithms. Section 5 gives results of comparisons between the qTorch contraction methods, and between qTorch simulations and LIQUi>’s Hilbert space simulations. Lastly, we provide a detailed software guide for installing and using qTorch in Appendix C.
2 Preliminaries
In this section, we give some definitions. We will use standard graph theory terminologies. All graphs that we consider in this paper are undirected. We denote a graph as consists of the set of nodes and edges .
The notions of treewidth and tree decomposition were introduced by Robertson and Seymour [RS91] as follows.
Definition 1 (Tree decomposition)
A tree decomposition of a graph is a pair , where is a collection of subsets and is a tree (with edge set and node set ), such that


For all , there is an with

For all
The width of a tree decomposition is . The treewidth of a graph is the minimum width among all tree decompositions of .
Definition 2 (Tensor)
In this context, a tensor is defined as a data structure with a rank and dimension . More specifically, each tensor is a multidimensional array with complex numbers. A tensor has indices, which take values from to .
For our purposes, we use tensors of dimension four to store density matrices versus pure states, related to the fact that a singlequbit density matrix has four entries. Markov and Shi [MS05] provide additional details for preparing tensors from quantum gates, initial states, and measurement operators.
Definition 3 (Tensor Contraction)
A tensor contraction is a generalized tensortensor multiplication. Here, A, a rank dimension tensor and B, a rank dimension tensor are contracted into C, a rank dimension tensor.
(1) 
It is important to note that the number of floating point operations performed is , exponential in the number of indices contracted on () and the rank of the resulting tensor ().
Definition 4 (Tensor Network)
A tensor network is a graph with tensors as vertices, and edges representing an index of the tensors. The rank of each tensor is given by the number of edges connected to it. An edge from one tensor to another indicates a contraction between the two tensors, and multiple connected edges indicate a contraction on multiple indices.
Here, we provide a visual example of a tensor network graph:
Definition 5 (Contraction Scheme)
A contraction scheme determines the order in which the tensor network is contracted. The ordering chosen for the contraction will greatly affect the computation and memory requirements. This is because some contraction orderings can result in much larger intermediate tensors than other orderings.
It is important to avoid tensors of large intermediate rank when contracting the network, as floating point operations grow exponentially with tensor rank. However, it is often the case that increasing the tensor rank is unavoidable. For example, a tetrahedron shaped graph of rank tensors cannot be contracted without the rank of intermediate tensors increasing above .
To analyze the complexity of contracting a tensor network, we first create the linegraph of the network and then use Quick BB [GD04] to analyze its properties.
Definition 6 (Line graph)
There is a unique line graph L(G) for every graph G, with L(G) itself being an undirected graph. Each edge in G corresponds to a node in L(G). Two nodes in L(G) are connected if and only if these two nodes’ corresponding edges in G are connected to the same node in G.
There exists an optimal tree decomposition of L(G) that provides the optimal contraction ordering of G.
3 Contraction schemes and implementation details
For many problems in quantum physics to which matrix product states (MPS) or other tensor network methods have been applied, an efficient contraction scheme is often obvious from the underlying structure of the Hamiltonian [Orú14]. However, efficient contraction schemes are not available for arbitrary tensor networks. A general contraction scheme is important for the simulation of general quantum circuits, when one does not know a priori the topological properties of the underlying tensor network problem. Instead, for general circuits one must develop heuristics that define the contraction ordering.
3.1 Contraction schemes
qTorch implements two algorithms for determining the contraction ordering. For what we call the line graph (LG) method, we first create the line graph of the quantum circuit’s graph. Then, the software package QuickBB [GD04] is used to determine an approximately optimal tree decomposition of this linegraph. QuickBB is a socalled anytime algorithm, meaning that it can be run for an arbitrary amount of time, such that when the program is stopped it provides the best solution found thus far. The resulting tree decomposition is used to define the order of contraction. This linegraphbased approach was previously described by Markov and Shi [MS05].
The second contraction scheme is stochastic. First, a wire is randomly chosen, which of course is connected to two nodes. If the rank of the contracted tensor is higher than the highest rank of the two nodes, plus a given threshold, the contraction is rejected. After a fixed number of rejected contraction attempts, the threshold is relaxed.
Other stochastic contraction schemes were attempted for this study, including schemes that would calculate a cost function based on how a given wire’s contraction would affect the size of subsequent contractions. None of these outperformed the simple stochastic algorithm; because of this, none of them are included in qTorch. Nonetheless, it would be worth pursuing more sophisticated heuristics for contraction, as there likely exist other stochastic schemes that would outperform this first attempt at a simple stochastic algorithm.
3.2 Estimating the answer string
qTorch computes expectation values of the form for some input and output states and and a quantum circuit . (Alternatively, a measurement string can be determined as well, in e.g. a variational algorithm.) But to capture all the information of this final state of qubits, it generally requires repetitions of the algorithm. However, many quantities of interest may be calculated efficiently. For instance, the expectation value of one operator and one state can be estimated in just one contraction of the tensor network, a result essential to simulating the variational quantum eigensolver (VQE) [PMS14, WHT15, YCM14, MRBAG16].
qTorch provides a heuristic scheme to estimate the answer string of a variational algorithm such as QAOA, which we summarize here. Though this scheme is not used for the results presented in Section 5, it may be useful in the future for simulating algorithms (like QAOA) where the goal is to estimate a most likely bit string.
The scheme is implemented as follows. To begin, we run one simulation, and measure in the logical basis to project the first qubit into 0 or 1. Based on the resulting expected value from the simulation, we choose the value for the first qubit that has the greater probability. If the expected value is 0.5, we flip a fair coin to get the value. Then, we set the resulting qubit as the measurement for the first qubit in the next simulation, and repeat with a projective measurement on the second qubit. We continue this process for the rest of the qubits. As we will show below, this method often gives a good approximation of the most likely final computational basis state. In original tests on 3regular graphs of 30 vertices, the scheme (used on Quantum Approximate Optimization Algorithm [QAOA] circuits) gave bit strings that provided good estimates to the solution of the MaxCut problem (average approximation ratio of 94% compared to the exact brute force solution).
As a way to test the general applicability of this scheme, we performed some tests on more general circuits than the QAOA problem. These tests are meant to provide some insight into how useful this heuristic would be for estimating the most likely bit string of a quantum algorithm. We note that it is abundantly clear that this scheme will be very inaccurate in many cases—indeed, if it was a generally accurate scheme then we would have no need for a quantum computer.
In the remainder of this section, we consider the most likely bit string of the final state , which we define as , where the vectors are in the computational basis (i.e. the PauliZ basis). In what follows we propose a framework for approximating the most likely computational basis state, where the number of simulations (i.e. the number of full tensor network contractions) required is linear in the number of qubits in the tensor network.
We apply a unitary of the form
(2) 
where the matrix is a diagonal matrix with entries chosen randomly from the integers . Here is a parameter that can be interpreted as the number of clauses, if this were a QAOA problem. The elements of the dimensional vectors and are drawn uniformly from and respectively. We use the construction of to emulate the form of parametrized unitary operations used in QAOA with the same . Starting from the uniform superposition over all bit strings , we apply to compute the final state . Let denote the probability distribution associated to the QAOAlike output state .
Our likely string estimation algorithm can be thought of as treating the QAOA output state as a product state. Suppose we apply our algorithm onto the state . The product state then reads
(3) 
where is the probability of that the algorithm obtains at the step. Let denote the probability distribution of the bit strings corresponding to the product state approximation . Let be the (index of the) bit string returned by the algorithm. Then clearly . Let be the place of in the actual distribution , namely if is also the most probable bit string in , if is the second most probable, if it is the third most probable, and so on. We numerically investigate the distribution of as well as the 1norm distance between the approximate distribution which the algorithm effectively assumes and the actual distribution .
The results are shown in Figures 1 and 2. Here we use the number of qubits , the parameter and . However, Figure 1 shows that most of the time our algorithm produces a high ranking bit string—roughly 90% of the time the output of the algorithm is among the top 10% most likely bit strings. Figure 2 shows that the 1norm distance between the approximate and exact distributions is less than 0.1 for nearly all of the data points. These results suggest that our heuristic for for estimating an output bit string will produce acceptable results under some circumstances.
3.3 Noise models
Inserting a noise model into a quantum circuit would be straightforward, as it ought to be possible to map any noise model onto a set of one and multiqubit gates. The most commonly used noise approximations assume uncorrelated noise, which allows for singlequbit gates to be used for modeling noise. In this case, because rank2 tensors can always be contracted without increasing the rank of the resulting tensors, the complexity of simulating the resulting “noisy” quantum circuit would not increase.
A more physically realistic noise model would assume correlated noise, which would necessitate the insertion of twoqubit gates. In this case, the tree width of the circuit’s underlying line graph, and hence the complexity of the problem, would increase in all but the most trivial cases.
4 Circuit simulations
In this section we describe the quantum circuits that were simulated for this paper.
4.1 Quantum approximate optimization algorithm / MaxCut
Farhi, Goldstone, and Gutmann developed the quantum optimization approximation algorithm (QAOA) [FGG14a], meant to demonstrate a quantum speedup on lowdepth quantum circuits. An overview of the algorithm is provided in Appendix A.1. Farhi and Harrow have shown that it ought to be classically hard to sample from the output of QAOA [FH16]. Moreover, even though the QAOA algorithm was only recently invented, it has been applied to several quantum optimization problems. Wecker et. al developed an optimization algorithm based on QAOA that may have similar applications in quantum computers [WHT16]. Farhi, Goldstone, and Gutmann applied QAOA to Max E3LIN2 of bounded occurrence, showing that a quantum computer will achieve results better than the best classical algorithm [FGG14b]. Lin and Zhu generalize these results to more constraint satisfaction problems of bounded degree [YZ16]. Most recently, Guerreschi and Smelyankskiy tested a series of classical optimization routines for use with QAOA [GS17].
To generate the graphs for MaxCut, we wrote a random regular graph generator that places edges randomly throughout a given vertex set to satisfy a given regularity, checking for connectivity after all the edges have been placed. Disconnected graphs are rejected by the algorithm. QAOA/MaxCut Quantum circuits based on these graphs are then trivial to construct.
In the numerical results of this paper, we report only the timing for a single contraction of each quantum circuit. A full analysis of QAOA is beyond the scope of this work. However, we note that once the graphs have been created, it is possible to use qTorch to optimize the QAOA angles using an optimization library. Finally, if one chooses, one can use qTorch to estimate a MaxCut for the randomlygenerated graph, using the mostlikely ZString estimation method of the previous section.
4.2 Hubbard Model
Quantum simulation of fermionic systems is arguably one of the most relevant applications of quantum computers, with direct impact on chemistry and materials science, especially for the design of new drugs and materials. Among all the algorithms proposed for quantum simulation of fermions, the quantum variational algorithm (VQE) and related approaches [PMS14, WHT15, YCM14, MRBAG16] hold greater appeal for nearterm quantum devices due to their ability to correct certain types of errors and their lower coherence time requirements [MSCdJ16, OBK16].
As mentioned previously, in the VQE algorithm, a quantum computer is employed to prepare and measure the energy of quantum states associated to a parameterized quantum circuit. The approximate ground state of a Hamiltonian is obtained by variationally minimizing the energy with respect to the circuit parameters using a classical optimization routine. This hybrid quantumclassical approach offers a good compromise between classical and quantum resources. Classical simulations of the VQE algorithm for tens of qubits could provide insights into the complexity of the circuits used for state preparation and help design better ansatzes for the quantum simulation of fermions.
As an example of a VQE simulation, we employed our code to classically simulate variational circuits employed for the quantum simulation of 1D Hubbard lattices. We consider halffilled Hubbard models on sites, with periodic boundary conditions. The Hamiltonian for these systems is given by
(4) 
where and respectively create and annihilate an electron at site with spin . The summation in the first term runs over nearest neighbors, denoted as . These fermionic Hamiltonians can be mapped to qubit hamiltonians using an appropriate transformation, such as JordanWigner or BravyiKitaev [TSS15], which requires two qubits per site.
To construct variational circuits for these systems, we considered the variational ansatz introduced by Wecker et al [WHT15]. In this case, the Hamiltonian in Eq. 4 is divided as , where is the sum of hopping terms in the horizontal dimension and is the repulsion term (For 2D Hubbard lattices, the Hamiltonian also comprises vertical hopping terms). The variational circuit is constructed as a sequence of unitary rotations by terms in the Hamiltonian with different variational parameters. The sequence is repeated times. In each step, there are two variational parameters, and , where such as:
(5) 
where denotes a Trotter approximation to where can be or . For our numerical simulations, we employed the variational circuit of Eq. 5 with using a 1step Trotter formula for all the terms. Notice that this is only approximate for the term, which comprises a sum of noncommuting terms. We also assigned the value of 1 to all variational amplitudes. The corresponding unitary was mapped to a quantum circuit using the JordanWigner transformation and the circuit was generated using a decomposition into CNOT gates and singlequbit rotations [NC00, WBAG11]. The length of the sequence was reduced by combining .
5 Results
Simulations were performed on NERSC’s Cori supercomputer, using one node per simulation, each of which contains 68 cores and 96 GB of memory. Each LIQUi> simulation was run on a full node as well, using Docker [Mer14]. The free version of LIQUi> allows for the simulation of 24 qubits. Because full Hilbert space simulation scales exponentially regardless of the quantum algorithm’s complexity, we would not have been able to simulate more than 31 qubits on one of these compute nodes. For each set of parameters (regularity and number of vertices/qubits) 50 instances of MaxCut/QAOA circuit were created. For higher qubit counts and higher regularities, only a subset of these circuits were completed, since many simulations ran out of memory. In this section, LG or qTorchLG refer to the use of qTorch with the linegraphbased contraction, Stoch or qTorchStoch refer to qTorch with stochastic contraction. To determine a qTorchLG contraction ordering, QuickBB simulations were run for an arbitrary time of 3000 seconds for each quantum circuit. The plotted qTorch results include only the contraction time, not the QuickBB run time.
We note that LIQUi> implements many important optimizations, which makes it a fair benchmark against which to compare qTorch. For example, LIQUi> fuses many gates together before acting on the state vector, and uses sparse operations. qTorch, on the other hand, does not yet use sparsity at all (even when the circuit consists primarily of sparse CNOT gates), which is one of several optimizations that we expect will further improve performance.
LIQUi> is the fastest simulation method to use for the Hubbard simulations, as shown in Figure 3. This is because the tree width of the circuit’s graph increases substantially with the number of qubits, even for these shortdepth circuits. The result is not surprising—if the algorithm were easy to simulate with a tensor network on a classical computer, then it would not have been worth proposing as a candidate for a quantum computer.
Simulation timing results for 3, 4, and 5regular MaxCut/QAOA circuits are shown using Tukey boxplots in Figures 4 and 5. Stoch and LG simulation times are of similar order of magnitude for these circuits, though LG is generally faster. The exception is the 3regular graph problems, where Stoch apparently finds the more efficient contraction faster than QuickBB does. We note that if the QuickBB algorithm were run for infinite time before beginning the contraction, then qTorchLG would always contract the circuit faster than qTorchStoch. Note that LIQUi> begins to outperform tensor contraction methods once the algorithm is run on 5regular graphs, because the increased circuit complexity leads to larger intermediate tensors in qTorch.
For 3regular circuits (Figure 5), the LG method tends to be faster than the Stoch method. Using a single Cori NERSC node, we were able to contract quantum circuits of 90 qubits for a very small subset of the simulated graphs, though not on enough graphs to report statistics. Full Hilbert space methods would be limited to 30 qubits on these nodes, and indeed previous simulation packages have not yet surpassed 45 qubits [SSAG16, HS17], using thousands of nodes.
Interesting trends appear when the simulation time is plotted against regularity of the MaxCut problem’s graph (Figure 6). It is notable that the LG method runs out of memory before the Stoch method does. As previously mentioned, the LG method contracts more efficiently the longer QuickBB has been run, and we chose 3000 seconds as an arbitrary QuickBB limit for all circuits. In other words, there is a tradeoff between running a longer QuickBB simulation and instead immediately using the Stoch method. Even with few qubits, at higher regularities the full Hilbert space simulation (using LIQUi>) performs better. This is expected, since as the complexity of the quantum circuit increases, higherrank tensors must be dealt with.
Figure 7 shows simulation time as the tree width upper bound increases, for MaxCut/QAOA circuits of 18 qubits. These include 3 through 7regular graphs. This tree width upper bound is simply the tree width of the tree decomposition that defines the contraction ordering. The plot clearly demonstrates the expected general trend of an increase simulation time with increased tree width, regardless of contraction scheme.
Finally, we note that we were easily able to perform simulations of 100 qubits for less complex graphs. To report one such example, we produced a random 3regular graph with a slightly different procedure from that given in of Section 4.1. Beginning with a 2regular graph (i.e. a ring) of 100 vertices, we added edges between random pairs of vertices until all vertices were of 3 degrees. Contracting this graph’s MaxCut/QAOA circuit took 150 seconds.
6 Conclusions
We have implemented a tensor contraction code for the efficient simulation of quantum circuits. We compared a stochastic contraction scheme to one based on the line graph of the quantum circuit’s graph, showing that the latter is more efficient in most situations. However, it is clear that in circuits for which calculating a good approximate optimal tree decomposition of the line graph takes longer than contracting the circuit stochastically, then the stochastic scheme is superior. As expected, qTorch becomes substantially faster than LIQUi> (i.e. a full Hilbert space simulation) the smaller the treewidth of the tensor network’s linegraph becomes. This is because tensor network contraction can simulate lowercomplexity circuits with fewer floating point operations than Hilbert space methods can.
Several immediate algorithmic improvements are possible for this software. The use of sparse tensors would reduce the number of floating point operations for a large subset of relevant circuits. Another possible strategy may be to perform tensor contraction on some parts of the circuits, but to use full Hilbert space or stabilizer formalism for other parts of the network, in essence applying each algorithm to the pieces of the quantum circuit to which it would perform strongest. However, determining how to divide the circuit among different algorithms is unlikely to be a trivial task. Finally, more advanced parallelization methods would allow for faster calculation of a tree decomposition as well as faster contractions. This software may be integrated into larger packages such as qHiPSTER [SSAG16], LIQUi>[WS14], ProjectQ [SHT16], or others, allowing for the simulation of a wider range of quantum circuits.
7 Acknowledgements
We are grateful to Edward Farhi and Aram Harrow for discussions about QAOA, to Salvatore Mandrà for general discussions, and to Dave Wecker for helpful advice on using LIQUi>. N.S. acknowledges the use of resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DEAC0205CH11231. A.A.G. and E.F. were supported by the Office of Naval Research under grant N000141612008 (Vannevar Bush Faculty Fellowship). A. A.G. and Y.C. were supported by NSF grant CHE1655187.
Appendix A Detailed Descriptions of Algorithms
a.1 Qaoa
The QAOA attempts to approximate solutions for satisfaction problems, in which one attempts to satisfy many clauses at once. The accuracy depends upon a parameter ; increasing it results in a better approximation. The relevant optimization problems, combinatorial in nature, are each defined by an objective function with a variable number of binary clauses and bits per clause. The goal when solving the problem is to maximize or minimize the number of clauses satisfied. Any objective function is defined by the sum of all of its clauses:
(6) 
where , a binary string of fixed length, is the input to optimize. Each clause , if satisfied, outputs 1 and if not, outputs 0. Usually, each clause will only depend on a few of the bits in the string.
We create an operator , which is defined as
(7) 
Note that each operator in the product is local to the qubits acted on by the clause . Additionally, all operators in the sum commute because all are diagonal.
Next, create a second operator . The operator does not depend on the objective function (while does) and is defined on qubits by
(8) 
These are the only two operators used in QAOA, applied to the starting state , where . In the tensor network framework, one starts with the state and prepares the state by applying a Hadamard gate to every qubit.
The parameter determines how many times the and operators are applied to the initial state. If , there is only one and one , and and are only applied once (the double hat notation is used to represent a Liouville superoperator):
(9) 
However, if is greater than 1, there are number of angles and number of angles, and operators are applied to the state, from and to and . The resulting state for parameter is as follows:
(10) 
Once this state has been prepared, the expectation value for every clause is measured:
(11) 
This sum gives the cost of the objective function for the prepared state ; the goal is to choose the parameters (’s) and (’s) such that the cost is maximized.
The state must be reprepared and measured once for every clause in the objective function to determine the cost. This must be done at each step of the optimization procedure.
a.2 MaxCut
a.2.1 Definition
MaxCut is a common optimization problems for graphs. The input is a graph represented by a vector of edges, , where each edge connects two vertices in the graph. The solution is the binary string where each bit corresponds to an edge, and maximizes the number of edges “cut”. A cut edge means that the two vertices connected by the edge have opposite values of zero and one or viceversa. A completely optimal solution would cut every edge in the graph, but this is usually not possible.
To map the MaxCut problem to a quantum computer, one represents each vertex in the graph by one qubit. Therefore, a graph with 30 vertices would require 30 qubits to represent.
Now, we must define the objective function to be used in the QAOA optimization. The objective function for MaxCut is as follows:
(12) 
where
(13) 
and each edge is represented by and , the qubits of the two vertices it connects.
a.2.2 Optimal Angles
Note that each clause in the objective function only acts on two qubits, and . As a result, all of the terms that do not act on those two qubits in the preparation of the state for each clause in the objective function will commute through and cancel to identity. For , the only terms left will be the terms that act on those two qubits. So, for each clause in the objective function, the resulting measurement will look like:
(14) 
However, as increases the number of terms that commute through decreases. If , and are applied twice, and the terms that don’t commute will include not only the operators acting on qubits and , but also the operators that act on vertices that are neighbors to and . This is due to the fact that each is a two qubit operator, so the first time is applied, each will act on either qubit or and then on one other qubit. All other operators will commute through. However, the second time is applied, any operator that acts on , , or the other qubits that were acted on by the first application of will not commute through. As a result, as increases, more and more qubits will be needed to compute the cost of each clause in the objective function, which is necessary to find the optimal ’s and ’s.
To maximize the value of the objective function, we construct the circuit necessary to simulate the cost of each clause in the objective function, represent the circuit in a tensor network, and contract the network. This must be repeated for every clause in the objective function to determine the complete cost. Finally, we take this calculation and feed it into the nlopt open source with the ’s and ’s as the parameters to optimize. Once we have these optimal values, we can proceed with preparing the total state and measuring it.
a.2.3 MaxCut on 3Regular Graphs
For 3regular graphs, can be interpreted as how much of the graph the algorithm sees when it maximizes the value of each of the objective function clauses [FGG14a]. For example, as mentioned above, when , to calculate the cost of each clause in the objective function requires potentially fewer qubits. In the case of a 3regular graph, the and vertices can only have a maximum of two other vertices each that they are connected to. Therefore, the maximum amount of qubits needed to compute the cost of each objective function clause is six. Below is the subgraph of this:
For , to evaluate the cost of each clause in the objective function, the algorithm sees every vertex within distance two of and , while the other operators commute through. Therefore, the maximum number of qubits needed to compute the cost of each clause is 14:
a.3 Estimating Likely ZString Example
We present here an example set of measurements for a four qubit ring graph when :
The first measurement performed is:
(15) 
Consider the case where the expected value is 0.5 (ideal case). Then, we flip a coin. Say, we get 1 from the flip. Our next measurement is:
(16) 
Consider the case where the expected value is again 0.5 (ideal case). Then, by Bayes’ Law, we can calculate the value of the second qubit:
(17) 
Therefore,
(18) 
Plugging in our values for and , we obtain the probability of the second qubit being in the state, which is 1. Therefore, we assign the value of our second qubit to 0.
Repeating for a third time:
(19) 
Consider the case where we obtain an expected value of 0 (ideal case). Therefore, we can use Bayes’ Law again to calculate that the value of qubit three should be 1. Repeating for the last time, we perform the final measurement:
(20) 
An expected value of 0.5 tells us that the value of qubit four should be 0 and that this answer string appears in the state with a probability of 0.5. This high, nonzero probability tells us that 1010 is likely a correct solution for the MaxCut of the four vertex ring graph.
Appendix B LIQUi> benchmark
The quantum circuit simulations performed on LIQUi> are used as benchmarks for comparing with our qTorch implementation. First, a .qasm file describing the quantum circuit is converted to an F# script containing a function that runs the circuit in LIQUi>. When the circuit has many gates (say, with more than 500 gates), multiple functions are constructed in the F# script, each containing a subset (say, 500 gates) of the quantum circuit. We then use LIQUi> to first compile the functions(s) corresponding to (subsets of) the circuit, then run the circuit and finally compute the expectation value of some userspecified operator with respect to the final state of the circuit. We compute the total wall clock time of the threestep process and use it as a comparison to our qTorch performance.
Here are a few additional notes concerning the LIQUi> benchmark:

For computing the expectation value , we assume that is a string of Pauli operators i.e. operators of the form where each is a Pauli operator. Our method for computing takes advantage of this property and runs in time that scales linearly as the number of nonzero amplitudes in the final state. Timing results from the benchmark circuits indicate that the overhead for computing is only a small fraction (at most 10%) of the time spent running the circuit.

Some of the quantum circuits that we use for benchmarking are circuits for quantum chemistry simulations. Although LIQUi> has a more optimized implementation specifically dedicated for quantum chemistry simulation, we choose not to take advantage of such implementation because our focus is on the average performance of simulating general quantum circuits.

Our LIQUi> benchmark is performed using a Docker container to ensure that one could execute the programs regardless of the operating system used. The environment inside the container is Unixlike and mono is used for running Windows .exe executables. Due to the upper limit on the stack size imposed by mono, we restrict each function in our F# script generated from the .qasm file to have at most 500 gates, and we use LIQUi> to compile each function individually. This partition of the quantum circuit into segments of at most 500 gates renders the simulation speed possibly suboptimal compared with the case where LIQUi> is used for compiling the entire quantum circuit in one shot. However, the circuits that we simulate have been partitioned into no more than 10 subfunctions, thus we believe that our implementation should at least capture the order of magnitude of the optimized speed of LIQUi> simulation.
Appendix C Software Guide for qTorch
The source code of qtorch can be obtained from the following URL:
https://github.com/aspuruguzikgroup/qtorch 
c.1 License
The qTorch software package is licensed under the Apache 2.0 license^{3}^{3}3https://www.apache.org/licenses/LICENSE2.0.
c.2 Installation and Makefile
c.2.1 Dependencies
GCC/G++ version 4.9 or higher, Libtool or Glibtool (OSX), and GNU Make are the only external library dependencies. To check your GCC version, type gcc version. If your computer runs OSX, it’s likely that GCC links directly to a clang compiler. In that case, please ensure that clang is version 3.4 or higher using clang version. The other dependencies, GNU make, and Libtool or Glibtool (OSX) can be installed using a local package installer such as yum or brew, or by following instructions online. However, it is likely that these packages are already installed.
c.2.2 Installing qTorch Using GNU Make
For convenience, an installation option is provided with the library. Please use the following commands inside the main directory qtorch for installing qTorch:
make install
or, to locally install for just one user,
make installlocal
This is necessary for running any simulations. If installation of nlopt2.4.2 (the nonlinear optimization library used for QAOA angle optimization) fails, QAOA MaxCut will not run. Next, we recommend the user set their shell environment variables as detailed in the README.txt file to be able to run the compiled executables from any directory and extend the library to their custom .cpp files. Additionally, a Makefile is provided. The available commands are: make all, make qtorch, make test, make cut, and make clean. The make all command compiles all three executables (tester, maxcutQAOA, and qtorch), and the make clean command removes any compiled executables. Please read the rest of the guide for further details on the other make commands.
c.2.3 Installing qTorch Using Docker
An alternative way of running qTorch is to run it inside a Docker container. This allows the user to use qTorch regardless of the local environment (operating system, package dependencies etc) of her machine. We provide the following Docker image on DockerHub
therealcaoyudong/qtorch 
which contains the qTorch source code as well as the necessary libraries needed for building qTorch.
Here we provide a stepbystep instruction on how to use the Docker image, assuming no prior experience with Docker.

Install Docker on your computer. Also VirtualBox is needed;

In a command terminal, run dockermachine ls, which returns a list of virtual machines. If the list is empty, create a virtual machine called default by running dockermachine create driver virtualbox default.

Configure the shell to the virtual machine by running dockermachine env default. Run the command suggested by the returned message.

Now everything is set up, run the image by docker run it therealcaoyudong/qtorch. It may take a while to download the image. When the download is complete, the shell header will look like for example root@c7d9e3be4c53#. The hash string after @ is the identifier of the Docker container started by the docker run command just executed. To see a list of running Docker containers, run docker ps. Note that in addition to a hash string, each container is also labels with a nickname (such as happy_einstein).

The Docker container is effectively an Ubuntu environment with qTorch installed and executable from any directory. The source code for qTorch can be located at /root/qtorch. The user is free to rebuild qTorch as described in Section C.2.2. To see if qTorch is correctly installed, run qtorch anywhere inside the Docker container.
c.3 Library Introduction
The tensor contraction library provides a parallelized framework for users to simulate quantum circuits using a tensor network. Quantum circuits are translated from the QASM file format^{4}^{4}4https://www.media.mit.edu/quanta/qasm2circ/ to a tensor network object and then contracted at the user’s command, returning the probability of obtaining the measurement specified, not the amplitude. Additionally, the user must provide a list of measurements in a separate file: X, Y, Z, 0, 1, or T (Trace) on each qubit. Sample measurement files are provided for the user to peruse. Each contraction is irreversible, and the user cannot check the amplitudes of the wavefunction at a specified timestep in the circuit. This is unfortunate but is also paramount in the tensor network’s ability to calculate expected values for circuits with large numbers of qubits. Within the library, sample QASM files are provided for the user to look through and emulate. Furthermore, more information on how to write a QASM file compatible with the library exists in Section 2. Many gates are built into the library, but the user is free to design any arbitrary gate on one or two qubits. In this way, computation is universal, and any measurements can be executed.
c.4 Basic Rules of Writing QASM and Measurement Files

Each QASM file’s first line must be the number of qubits in the circuit

Each successive line can either act on a qubit with an operator or define a one or two qubit operator

A corresponding measurement file must be provided with the QASM file. If it’s blank, all qubits will be traced out

If the QASM file is incorrect, the tensor library will not simulate it
c.4.1 Acting On a Qubit with an Operator
The following operators are already supported by the library: CNOT, SWAP, Hadamard, Rx(), Ry(), Rz(), X, Y, Z, Depolarizing Noise Channel, CRk, CZ, and CPHASE(). A qubit index or two qubit indices for two qubit gates come after each operator. The only operators that have extra arguments are the Rx, Ry, Rz, and CPHASE gates. Here are a few examples of operations:

H 0

Rz 3.14159 0

CNOT 0 1

SWAP 1 2

CPHASE 3.14159 0 1
c.4.2 Defining a New 1 or 2 Qubit Operator
The software has the power to compile and execute arbitrary one and two qubit matrix operations. Note that when parsing these matrix files, the software does not check for unitary evolution. To define an arbitrary one or two qubit operator, the command def1 or def2 comes first, followed by the gate name and the path of the file the gate is located in. Here are a few examples:

def1 T input/t.gate

def2 Cz input/cz.gate
Within the gate file, the user must include (one qubit) or (two qubits) numbers separated by spaces. The numbers can either be complex and in the format , where the number is or real and formatted like a regular floating point number: . Some example gate files are included for reference. The gate should only be defined once per QASM file, and after being defined, can be used within the file like any other one or two qubit gate.
c.4.3 Measurement Files
A measurement file must be provided in addition to an input QASM file. A measurement file consists of X, Y, Z, 0, 1, or T characters separated by spaces; these characters represent projection measurements. The order of the characters corresponds to the order of the qubits, i.e. the first character is a projection measurement on the first qubit. If the measurement file includes measurements and an 8 qubit circuit is simulated, the last three qubits will be automatically traced out. Additionally, if the file path is incorrect or the file fails to open, all qubits will be traced out.
c.5 Simple QASM simulation
The main executable file (qtorch) provided allows the user to easily simulate an arbitrary QASM file without have to create their own tensor network object. The user writes an input script file that includes the path to their QASM input file, the path to their measurement file, the contraction type they would like to use (stochastic, linegraph ordering, or userdefined). If the contraction type is linegraph ordering, the user must also provide the amount of time they would like to run the tree decomposition algorithm (quick bb) for to determine the line graph ordering. If the type is userdefined, the user must provide the path to the contraction sequence. After the user writes the script file, they can run the main executable with the provided makefile. To run, simply type:
make qtorch
and then, if the shell PATH variable has been set,
qtorch <global path to script file>
or
./bin/qtorch <global path to script file>
if the shell PATH variable was not set. Then, the output of the simulation will be printed to the console and to the file "output/qtorch.out". The user can specify a different (valid) output file as an input parameter in their script: >string outputpath /my/path/to/output.txt
c.5.1 Writing an Input Script
Every line in the script file begins with the character ’>’, otherwise, it will be ignored. To specify the QASM file path, the user types: >string qasm followed by the QASM file path. To specify the measurement file, the user types: >string measurement followed by the measurement file path. To specify the contraction method (either stochastic or line graph ordering), the user types >string contractmethod followed by either linegraphqbb, userdefined, or simplestoch. If the user chooses line graph ordering, the time cutoff for the ordering algorithm (Quick BB) may be specified to override the default of 20 seconds. To do that, the user types: >int quickbbseconds followed by the max time in seconds. If the user chooses userdefined, they must also specify the file which contains the contraction sequence with the line: >string usercontractseq followed by the file path of the contraction sequence.
What follows are some example input scripts.
Stochastic method:
# Simple stochastic method >int threads 24 >string qasm mypath/myqasmfile.qasm >string measurement mypath/mymeasurefile.txt >string contractmethod simplestoch >string outputpath mypath/output/stochasticmyqasm.out
Linegraph method, full calculation all at once:
# Line graph decomposition method for contraction >int threads 24 >string qasm mypath/myqasmfile.qasm >string measurement mypath/mymeasurefile.txt >string contractmethod linegraphqbb >int quickbbseconds 3000 >string outputpath mypath/output/lgmyqasm.out
Linegraph method, only running quickbb and extracting the estimate for optimal ordering:
# Run quickbb to get approximate optimal ordering, but do not contract >int threads 24 >string qasm mypath/myqasmfile.qasm >string measurement mypath/mymeasurefile.txt >string contractmethod linegraphqbb >int quickbbseconds 3000 >bool qbbonly true
Linegraph method, contraction only, reading the approximate optimal ordering from a previous run:
# Contract tensor network from previously found approximate optimal ordering >int threads 24 >string qasm mypath/myqasmfile.qasm >string measurement mypath/mymeasurefile.txt >string contractmethod linegraphqbb >bool readqbbresonly true >string outputpath mypath/output/prevrunmyqasm.out
c.5.2 UserDefined Contraction Sequence Files
The user defined contraction sequence is specified by a wire elimination ordering, and each wire is defined by a pair of nodes. Therefore, each line the the user defined contraction sequence file contains two numbers separated by spaces, which represent the indices of the nodes connected by the wire. The indices of the nodes are specified as follows: the first nodes for an qubit simulation specify the initial state nodes, and the last nodes specify the measurements. The other nodes, which are gates in the circuit are numbered by the ordering specified in the QASM file. A quick example ordering for a qubit network with one qubit gate would be: 0 2/n 1 2/n 3 2/n 4 2/n
c.5.3 LineGraph Contraction and QuickBB Ordering
The Linegraph contraction method runs a binary executable in the library: quickbb_64 by default to determine the optimal wire elimination ordering of the tensor network. This can be changed to run quickbb_32 for a bit system by inserting the line: >bool 64bit false into the input script file. To be able to run either executable, the user’s system must be GNU/Linux and able to execute ELF executables. If this is not the case, we recommend using the library on a different operating system or sticking to the stochastic contraction method.
c.5.4 Modify the Threading for CPU Optimum Efficiency
The default number of threads used for large tensor contraction is eight, but the user has the option to modify this parameter for use on a supercomputer or a computer that supports more than eight threads. To change the number of threads, the user simply adds: >int threads <x> where <x> is the new number of threads to use. All threading is done via the C++ standard library’s threading class, which uses pthread.
c.5.5 Understanding the Simulation Output
The output of the simulation, whether it’s printed to the console, the default output file: "output/qtorch.out", or a customized output file path, is straightforward. If the simulation fails, any errors will be printed and the simulation aborted. If the simulation succeeds, the output file will contain three lines. The first line is the output of the contraction as a complex number , where the resulting amplitude is . It’s important to note that this number is the probability of reading the measurement string, not the amplitude of the string, as the simulation stores all values in density matrix format. The second line is the number of floating point operations that it took to contract the entire tensor network. This is for performance documentation if necessary. The final line is the time the simulation took in seconds, enclosed in brackets.
c.6 More Advanced QASM Simulation
To perform more advanced QASM file simulation, i.e. simulate a batch of files or modify simulation parameters, the user must write their own code and integrate the objects provided with the library under the qtorch namespace. We provide examples of how this can be done as a supplement to this guide. After reading, we encourage the user to try their own implementations or mimic our examples. First, it’s important for the user to orient themselves with all the files in the library.
c.6.1 Network.h
The Network header file contains the Network object class. Each Network object represents an entire tensor network, comprised of both Wires (connections between tensors and Nodes (tensors), but the Network only stores pointers to all of the Nodes. The class also contains functions that parse the QASM and measurement files, contract two tensors, reduce the circuit to only nonadjacent two qubit gates, print the circuit to graph files formatted for visualization or treewidth calculations, get the final value from the simulation, and reset the circuit. For exact semantics, we suggest you look at the source code.
If the user would like to cut off a simulation that takes too long or restrict the simulation time, they can access the global variables: totTimer and maxTime located within the network class. Both variables are nonstatic and must be set for each different instance of a Network class. To use the timer, the user sets the value of maxTime to the maximum simulation time (in seconds). Then, the user starts the totTimer and calls one of the contraction methods. The contraction will automatically return after the maximum simulation time.
c.6.2 Node.h
The Node.h file contains the Node class and all of its inheritors. Each instance of the Node class is a tensor in the tensor network. The Node stores the data held in the tensor in a vector of complex doubles, and it stores connections to other nodes in the form of Wires. Therefore, when a Network parses a circuit, each gate becomes a Node. Therefore, an inheritance hierarchy exists in the Node class that helps create a CNOTNode when there’s a CNOT in the QASM file or an HNode if there’s an H in the QASM file. Once two tensors are contracted, their inner product becomes an IntermediateStateNode. The initial states of the qubits and their final measurements are also created stored as Nodes.
c.6.3 Wire.h
The Wire.h header file contains the Wire class, which is a simple class. Each Wire stores pointers to the two Nodes it connects and its unique ID.
c.6.4 LineGraph.h
This header file contains the LineGraph class. The user creates an instance of a LineGraph class with a pointer to a Network as an input parameter if the user plans to contract the Network using line graph ordering. The only functions in the class other than the constructor run QuickBB, a branch and bound algorithm for treewidth and tree decomposition, to determine the optimal line graph ordering, or contract the network based on a QuickBB ordering.
c.6.5 ContractionTools.h
This header files contains the ContractionTools class. If the user wishes to contract their tensor network with the stochastic method provided, they should create an instance of this class. The stochastic method is preferred for small circuits where an optimal ordering is unnecessary, or if quickbb is not available. To create a ContractionTools instance, the user can either provide an already created network as a parameter or their QASM file path and measurement file path. In this way, the user does not have to interact with the Network class at all if they choose.
The class contains a contract method, where the user supplies which stochastic method they would like to use to contract the network. Of the nonLineGraph methods, we recommend exclusively using Stochastic. Here’s a quick example.
#include "qtorch.hpp" int main(){ qtorch::ContractionTools c("input.qasm","measure.txt"); c.Contract(Stochastic); std::cout<<c.GetFinalVal(); }
It also has a separate contraction function for the user to contract the tensor network using a wire ordering of their choice. The other methods in the class help the user visualize the tensor network or calculate the treewidth. The user can print the tensor network to a graph file and then call another method within the class to calculate the treewidth or treewidth bounds of the tensor network. The final function in the class allows the user to retrieve the simulation output as a complex double when the contraction finishes.
c.7 Calculating Treewidth
Calculating the treewidth of a tensor network is important because the complexity of contracting a tensor network is . Therefore, we provide a QuickBB binary executable that does this. However, our qtorch executable provides a simple wrapper around the binary executable. Please see C.5 for information on how to run QuickBB using the linegraph method. The output file "output/qbbstats.out" produced from running just QuickBB will provide treewidth information on the underlying tensor network graph. To calculate the treewidth directly using the quickbb_64 or quickbb_32 binary executables^{5}^{5}5http://www.hlt.utdallas.edu/~vgogate/quickbb.html, the user must first create a CNF file that specifies their graph. The advanced format of a CNF graph can be found here^{6}^{6}6http://www.cs.ubc.ca/~hoos/SATLIB/Benchmarks/SAT/satformat.ps. However, for a more basic explanation, see the QuickBB website^{1}^{1}footnotemark: 1. Once the CNF file is created, the user runs the binary executable provided using the format detailed here^{1}^{1}footnotemark: 1. If the PATH environment variable was set, the user can type: quickbb_64 or quickbb_32 from any directory". Otherwise, run quickbb from the bin directory: ./bin/quickbb_64 or ./bin/quickbb_64 The output statistics file will provide a bound on the treewidth.
c.8 Other Executables Included
We include two other executables: maxcut.cpp and tests.cpp. If the user types the command: make test, it will compile and run the provided unit tests for the library. All test results will print to the "test.log" file in the output directory. The other executable, maxcut.cpp, is our implementation of QAOA (quantum approximation optimization algorithm) to solve instances of MaxCut. We use our tensor network library to simulate QAOA and solve MaxCut. To compile the maxcut executable, the user must use the command: "make cut". Please refer to the Examples section for how to run maxcut.
c.9 Extra Files

Timer.h is a basic timer class where the user can access both wall and CPU time to time how long a simulation takes. Please see the actual header file for usage.

preprocess.h is a file that we used for maxcut. The single method within it allows the user to determine a contraction sequence that runs faster than a provided maximum simulation time. It repeats stochastic method contractions that return after the maximum time or before the maximum time if a solution is found.

GraphGenerator library: The graph generator is an extra library that allows the user to randomly generate Xregular graphs using the C++ standard library’s Mersenne Twister algorithm. To generate one graph, the algorithm runs in , where n is the number of vertices in the graph, and x is the regularity of the graph. Within the library, there are two files: GraphNode.h and main.cpp. GraphNode.h is a simple struct that represents a single vertex in the graph being generated. The struct holds pointers to the other nodes in the graph the vertex is connected to and an integer that tracks the number of connections. The main.cpp file parses command line arguments provided by the user and runs the graph generation algorithm until all the graphs have been generated. At the start of the graph algorithm, n GraphNode objects are generated. Then, the algorithm attempts to place all of the edges though random selection of two vertices. If the random generation results in two already connected vertices or one vertex that already has x edges, the algorithm will try again, quitting after one hundred failed attempts. The edge placement loop completes regardless of whether all the edges have been placed successfully. If there is a mistake, the graph is discarded, and the algorithm is run again until success. A makefile is included in the library that allows the user to compile the executable: main.cpp. The command: make all is sufficient to do this. To run the executable, the user should type: ./main <reg> <numGraphs> <numNodes> where the command line arguments: reg, numGraphs, and numNodes are respectively the regularity, the number of graphs to generate, and the number of vertices per graph. The user may find the generated graphs in the Output directory. Finally, if the user runs the executable twice without transferring the generated graphs to a separate directory, the graphs may be overwritten.

maxcut.cpp: maxcut.cpp is an example of how tensor networks can be used to simulate QAOA to solve the MaxCut problem. To compile and run the maxcut executable, the user must first install the nlopt (nonlinear optimization) library for C++, following the instructions in the folder included. Then, the user can run either the angle optimization (the first part of QAOA), the cut approximation calculator, or both. Type the command make cut to compile this executable. To run the angle optimization, the user must type the command:
maxcutQAOA <GraphFile Path> <QAOA p value> <0> <file path to output angle file>
if the shell PATH variable was set or
./bin/maxcutQAOA <GraphFile Path> <QAOA p value> <0> <file path to output angle file>
if the PATH variable was not set. To run the cut approximation calculator, the user must type the command:
maxcutQAOA <GraphFile Path> <QAOA p value> <1> <file path to input angle file> <file path to output answer file> <seconds to preprocess for (optional)>
if the shell PATH variable was set or
./bin/maxcutQAOA <GraphFile Path> <QAOA p value> <1> <file path to input angle file> <file path to output answer file> <seconds to preprocess for (optional)>
if the PATH variable was not set. Following the same pattern, to run both the angle optimization and cut approximator, the user must type
maxcutQAOA <GraphFile Path> <QAOA p value> <2> <file path to output answer file> <preprocessing seconds (optional)>
if the shell PATH variable was set or
./bin/maxcutQAOA <GraphFile Path> <QAOA p value> <2> <file path to output answer file> <preprocessing seconds (optional)>
if the PATH variable was not set. Note that running this executable can take a long time for input graphs with many vertices or high regularity. We also recommend a QAOA p value of . The p value from the angle optimization and the cut calculator must match, and the number of angles provided for the cut calculator must be sufficient.
c.10 Examples
There are multiple example scripts provided for the simple QASM circuit simulation, as well as sample measurement files, sample circuits, and the maxcut example. We invite you to explore these within the Samples and Examples folders of our library. Please type chmod +x Examples/* before running any of the example scripts.
c.11 Troubleshooting
If any bugs with the library are encountered, please contact the authors via email: schuylerfried at gmail dot com, sawayanicolas at gmail dot com.
c.12 Further Improvements
We want to make this library as easy to use as possible, so if you have any improvements you want to suggest, please send us an email. In the future, we hope to provide better parallelization using MPI as well as sparse tensors.
References
 [ACP87] Stefan Arnborg, Derek G. Corneil, and Andrzej Proskurowski. Complexity of finding embeddings in a ktree. SIAM J. Algebraic Discrete Methods, 8(2):277–284, April 1987.
 [AG04] Scott Aaronson and Daniel Gottesman. Improved simulation of stabilizer circuits. 2004, arXiv:quantph/0406196.
 [BG16] S. Bravyi and D. Gosset. Improved Classical Simulation of Quantum Circuits Dominated by Clifford Gates. Physical Review Letters, 116(25):250501, June 2016, 1601.07601.
 [BIS16] Sergio Boixo, Sergei V. Isakov, Vadim N. Smelyanskiy, Ryan Babbush, Nan Ding, Zhang Jiang, John M. Martinis, and Hartmut Neven. Characterizing quantum supremacy in nearterm devices, 2016, arXiv:1608.00263.
 [BK10] Hans L. Bodlaender and Arie M.C.A. Koster. Treewidth computations I. Upper bounds. Information and Computation, 208(3):259 – 275, 2010.
 [BK11] Hans L. Bodlaender and Arie M.C.A. Koster. Treewidth computations II. Lower bounds. Information and Computation, 209(7):1103 – 1119, 2011.
 [FGG14a] E. Farhi, J. Goldstone, and S. Gutmann. A Quantum Approximate Optimization Algorithm. ArXiv eprints, November 2014, 1411.4028.
 [FGG14b] E. Farhi, J. Goldstone, and S. Gutmann. A Quantum Approximate Optimization Algorithm Applied to a Bounded Occurrence Constraint Problem. ArXiv eprints, December 2014, 1412.6062.
 [FH16] E. Farhi and A. W Harrow. Quantum Supremacy through the Quantum Approximate Optimization Algorithm. ArXiv eprints, February 2016, 1602.07674.
 [GD04] Vibhav Gogate and Rina Dechter. A complete anytime algorithm for treewidth. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, UAI ’04, pages 201–208, Arlington, Virginia, United States, 2004. AUAI Press.
 [GM13] H. J. García and I. L. Markov. Quipu: Highperformance simulation of quantum circuits using stabilizer frames. In 2013 IEEE 31st International Conference on Computer Design (ICCD), pages 404–410, Oct 2013.
 [Got98] D. Gottesman. The Heisenberg representation of quantum computers. Jun 1998.
 [GS17] G. Giacomo Guerreschi and M. Smelyanskiy. Practical optimization for hybrid quantumclassical algorithms. ArXiv eprints, January 2017, 1701.01450.
 [GZ13] Michael R. Geller and Zhongyuan Zhou. Efficient error models for faulttolerant architectures and the pauli twirling approximation. Phys. Rev. A, 88:012314, Jul 2013.
 [HS17] Thomas Häner and Damian S. Steiger. 0.5 petabyte simulation of a 45qubit quantum circuit, 2017, arXiv:1704.01127.
 [McC16] Alexander J. McCaskey. Tensor Network Quantum Virtual Machine (TNQVM), Nov 2016.
 [Mer14] Dirk Merkel. Docker: Lightweight linux containers for consistent development and deployment. Linux J., 2014(239), March 2014.
 [Mis10] J. A. Miszczak. Models of quantum computation and quantum programming languages. 2010, arXiv:1012.6035.
 [MRBAG16] Jarrod R McClean, Jonathan Romero, Ryan Babbush, and Alán AspuruGuzik. The theory of variational hybrid quantumclassical algorithms. New. J. Phys, 18(2):023023, 2016.
 [MS05] Igor L. Markov and Yaoyun Shi. Simulating quantum computation by contracting tensor networks. 2005, arXiv:quantph/0511069.
 [MSCdJ16] Jarrod R McClean, Mollie E Schwartz, Jonathan Carter, and Wibe A de Jong. Hybrid quantumclassical hierarchy for mitigation of decoherence and determination of excited states. arXiv preprint arXiv:1603.05681, 2016.
 [NC00] Michael A. Nielsen and Isaac L. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, October 2000.
 [OBK16] P. J. J. O’Malley, R. Babbush, I. D. Kivlichan, J. Romero, J. R. McClean, R. Barends, J. Kelly, P. Roushan, A. Tranter, N. Ding, B. Campbell, Y. Chen, Z. Chen, B. Chiaro, A. Dunsworth, A. G. Fowler, E. Jeffrey, E. Lucero, A. Megrant, J. Y. Mutus, M. Neeley, C. Neill, C. Quintana, D. Sank, A. Vainsencher, J. Wenner, T. C. White, P. V. Coveney, P. J. Love, H. Neven, A. AspuruGuzik, and J. M. Martinis. Scalable quantum simulation of molecular energies. Phys. Rev. X, 6:031007, Jul 2016.
 [Orú14] Román Orús. Advances on tensor network theory: symmetries, fermions, entanglement, and holography. The European Physical Journal B, 87(11):280, Nov 2014.
 [PESV14] Robert N. C. Pfeifer, Glen Evenbly, Sukhwinder Singh, and Guifre Vidal. NCON: A tensor network contractor for MATLAB, 2014, arXiv:1402.0939.
 [PMS14] Alberto Peruzzo, Jarrod McClean, Peter Shadbolt, ManHong Yung, XiaoQi Zhou, Peter J Love, Alán AspuruGuzik, and Jeremy L O´Brien. A variational eigenvalue solver on a photonic quantum processor. Nat. Commun., 5:4213, 2014.
 [RG06] Ben RudiakGould. The sumoverhistories formulation of quantum computing, 2006, arXiv:quantph/0607151.
 [RS91] Neil Robertson and P.D Seymour. Graph minors. X. Obstructions to treedecomposition. Journal of Combinatorial Theory, Series B, 52(2):153 – 190, 1991.
 [SHT16] D. S. Steiger, T. Häner, and M. Troyer. ProjectQ: An Open Source Software Framework for Quantum Computing. ArXiv eprints, December 2016, 1612.08091.
 [SMKE08] M. Silva, E. Magesan, D. W. Kribs, and J. Emerson. Scalable protocol for identification of correctable codes. Phys. Rev. A, 78:012347, Jul 2008.
 [SSAG16] Mikhail Smelyanskiy, Nicolas P. D. Sawaya, and Alán AspuruGuzik. qHiPSTER: The Quantum High Performance Software Testing Environment, 2016, arXiv:1601.07195.
 [SSMAG16] Nicolas P. D. Sawaya, Mikhail Smelyanskiy, Jarrod R. McClean, and Alán AspuruGuzik. Error sensitivity to environmental noise in quantum circuits for chemical state preparation. Journal of Chemical Theory and Computation, 12(7):3097–3108, 2016. PMID: 27254482.
 [TD01] Barbara M. Terhal and David P. DiVincenzo. Classical simulation of noninteractingfermion quantum circuits. 2001, arXiv:quantph/0108010.
 [TJD09] F. Tabakin and B. JuliaDiaz. Qcmpi: A parallel environment for quantum computing. 2009, arXiv:0902.0699.
 [TJD11] Frank Tabakin and Bruno JuliáDíaz. {QCWAVE} – a Mathematica quantum computer simulation update. Computer Physics Communications, 182(8):1693 – 1707, 2011.
 [TS14] Yu Tomita and Krysta M. Svore. Lowdistance surface codes under realistic quantum noise. Phys. Rev. A, 90:062320, Dec 2014.
 [TSS15] Andrew Tranter, Sarah Sofia, Jake Seeley, Michael Kaicher, Jarrod McClean, Ryan Babbush, Peter V Coveney, Florian Mintert, Frank Wilhelm, and Peter J Love. The Bravyi–Kitaev transformation: Properties and applications. International Journal of Quantum Chemistry, 115(19):1431–1441, 2015.
 [Val01] Leslie G. Valiant. Quantum computers that can be simulated classically in polynomial time. In Proceedings of the Thirtythird Annual ACM Symposium on Theory of Computing, STOC ’01, pages 114–123, New York, NY, USA, 2001. ACM.
 [VMH04] George F. Viamontes, Igor L. Markov, and John P. Hayes. Graphbased simulation of quantum computation in the density matrix representation. 2004, arXiv:quantph/0403114.
 [WBAG11] James D. Whitfield, Jacob Biamonte, and Alán AspuruGuzik. Simulation of electronic structure Hamiltonians using quantum computers. Mol. Phys., 109(5):735–750, 2011.
 [WHT15] Dave Wecker, Matthew B. Hastings, and Matthias Troyer. Progress towards practical quantum variational algorithms. Phys. Rev. A, 92:042303, Oct 2015.
 [WHT16] D. Wecker, M. B. Hastings, and M. Troyer. Training a quantum optimizer. Phys. Rev. A, 94(2):022309, August 2016, 1605.05370.
 [WS14] Dave Wecker and Krysta M. Svore. Liqui>: A software design architecture and domainspecific language for quantum computing, 2014, arXiv:1402.4467.
 [YCM14] M.H. Yung, J. Casanova, A. Mezzacapo, J. McClean, L. Lamata, A. AspuruGuzik, and E. Solano. From transistor to trappedion computers for quantum chemistry. Sci. Rep., 4:3589, 2014.
 [YZ16] C. YenYu Lin and Y. Zhu. Performance of QAOA on Typical Instances of Constraint Satisfaction Problems with Bounded Degree. ArXiv eprints, January 2016, 1601.01744.