Generalized swap networks for near-term quantum computing
Abstract
The practical use of many types of near-term quantum computers requires accounting for their limited connectivity. One way of overcoming limited connectivity is to insert swaps in the circuit so that logical operations can be performed on physically adjacent qubits, which we refer to as solving the “routing via matchings” problem. We address the routing problem for families of quantum circuits defined by a hypergraph wherein each hyperedge corresponds to a potential gate. Our main result is that any unordered set of $k$-qubit gates on distinct $k$-qubit subsets of $n$ logical qubits can be ordered and parallelized in $O(n^{k-1})$ depth using a linear arrangement of $n$ physical qubits; the construction is completely general and achieves optimal scaling in the case where gates acting on all $\binom{n}{k}$ sets of $k$ qubits are desired. We highlight two classes of problems for which our method is particularly useful. First, it applies to sets of mutually commuting gates, as in the (diagonal) phase separators of Quantum Alternating Operator Ansatz (Quantum Approximate Optimization Algorithm) circuits. For example, a single level of a QAOA circuit for Maximum Cut can be implemented in linear depth, and a single level for 3-SAT in quadratic depth. Second, it applies to sets of gates that do not commute but for which compilation efficiency is the dominant criterion in their ordering. In particular, it can be adapted to Trotterized time-evolution of fermionic Hamiltonians under the Jordan-Wigner transformation, and also to non-standard mixers in QAOA. Using our method, a single Trotter step of the electronic structure Hamiltonian in an arbitrary basis of $N$ orbitals can be done in $O(N^3)$ depth, while a Trotter step of the unitary coupled cluster singles and doubles method can be implemented in $O(N^2 \eta)$ depth, where $\eta$ is the number of electrons.
I Introduction
The state of experimental quantum computing is rapidly advancing towards “quantum supremacy” Boixo et al. (2018), i.e., the point at which quantum computers will be able to perform certain specialized tasks that are infeasible for even the largest classical supercomputers. Beyond this technical milestone, however, lies another: useful quantum supremacy, in which quantum computers can solve problems whose answers are of interest independently of how they were achieved. The combination of efficient quantum algorithms Shor (1999); Grover (1996) and scalable error correction Fowler et al. (2009) makes such progress likely in the long term, barring fundamental surprises. In the near term, we have so-called Noisy Intermediate-Scale Quantum (NISQ) devices Preskill (2018), capable perhaps of outperforming classical devices on certain problems, but with extremely constrained resources. Many types of such devices (e.g., superconducting quantum processors) will have limited connectivity. For the most part, existing quantum algorithms assume an abstract device with arbitrary connectivity, i.e., the ability to do a two-qubit gate between any pair of qubits. In theory, this suffices given that circuits can be compiled to any concrete family of devices with polynomial overhead in qubits and gates Brierley (2017). In practice, polynomial overheads matter and can be the crucial difference between being feasible and infeasible on NISQ devices.
The overall goal of compilation within the quantum circuit model is to take a quantum algorithm and implement it (maybe approximately) on a concrete piece of quantum computing hardware. There are many approaches to this, but perhaps the most straightforward is to transform the desired quantum circuit into an executable one in two steps: 1) decomposition of the constituent gates into (maybe approximately) equivalent subcircuits consisting of “native” gates, and 2) what we call routing via matchings of the circuit Childs et al. (2019). Our focus here is on the routing problem, in which the logical qubits are dynamically assigned to physical qubits in a way that allows the desired logical gates to be implemented while respecting the restricted connectivity of the actual hardware. In general, it may be necessary to use swap gates to change this assignment of logical qubits to physical qubits throughout the execution of the circuit.
In the past several years, there has been a blossoming of tools for addressing variants of this routing problem, which are variously called “quantum circuit placement”, “qubit mapping”, “qubit allocation”, or “quantum circuit compilation” (though the latter term generally encompasses much more). Prior work, however, has taken one of two approaches. The first, of theoretical interest, is to show how any quantum circuit can be converted “efficiently” (i.e., with polynomial overhead) into one in which gates act only locally in some hardware graph Brierley (2017); Hirata et al. (2009); Maslov et al. (2007). The second is an instance-specific approach, in which the problem is solved anew for each logical circuit Bhattacharjee and Chattopadhyay (2017); Siraichi et al. (2018); Li et al. (2018); Lye et al. (2015); Venturelli et al. (2018); Booth et al. (2018); Saeedi et al. (2011); Wille et al. (2016); Lin et al. (2015); Zulehner et al. (2018); Herbert and Sengupta (2018). We propose and instantiate a new instance-independent approach, in which the routing is done for a family of instances, with little-to-no compilation necessary for each instance; the per-instance compilation time is therefore effectively amortized to nil. This approach, which finds solutions for families of instances, interpolates between the two approaches above and seeks to balance the time to solution and the quality of solution. The family-specific routing can be found either algorithmically or, as is done here, manually. Algorithms useful in the instance-independent approach, where quality of solution is prioritized over time to solution, may (but need not) differ significantly from those useful in the instance-specific approach, wherein the prioritization is reversed. On the other hand, for many problem families, there is an instance with maximal structure on which instance-specific algorithms can be run, thus obtaining compilations that can be used for the whole family.
In general, these instance-specific approaches work best on sparser cases, and on dense instances will return inferior compilations to the ones given here.
In many quantum algorithms for quantum chemistry it is the case that all circuits of a given size for a particular problem have the same structure with respect to a partial ordering of the operations, and the only instance-specific aspect is the parameters (e.g. rotation angles) of the gates. Furthermore, the implementation of these gates on hardware often has the same properties (e.g. fidelity and duration) regardless of the parameters. In such cases, the instance of the compilation problem is effectively independent of the instance of the application problem. Compare this with implementing QAOA on hardware in which gate durations are independent of their parameters, in which case the routing problem for a given problem instance is the same regardless of the variational parameters, but differs significantly for different problem instances. In cases in which gate durations vary, an upper bound on (or average over) the range of durations can be used to obtain instance-independent compilations. Thus, the distinction between instance-specific and instance-independent approaches is somewhat subjective and contextual, but we merely aim to emphasize that there is an underexplored regime in the tradeoff between quality of solution and computation time in approaches to the quantum circuit routing problem.
An alternative approach for variational algorithms in general is to obviate the compilation problem by using an ansatz that is based on the connectivity of the target hardware Farhi et al. (2017); Kandala et al. (2017) and less so on the target application. By efficiently compiling application-specific circuits to constrained hardware, our methods combine the efficiency of this approach with respect to physical resources with the advantages of an application-specific ansatz (e.g. fewer variational parameters).
A method was recently proposed for implementing a Trotter step of a fermionic Hamiltonian containing $O(N^2)$ terms, where $N$ is the number of orbitals, using a circuit of depth $N$ with only linear connectivity Kivlichan et al. (2018); Jiang et al. (2018). Using fermionic swap gates Corboz and Vidal (2009), Kivlichan et al. were able to change the mapping between fermionic modes and physical qubits while preserving antisymmetry Kivlichan et al. (2018). By constructing a network of these gates such that, at some point in the circuit, each pair of orbitals is assigned to some pair of neighboring qubits, they were able to guarantee that they could implement each of the terms in the Trotter decomposition of the Hamiltonian by acting only locally on said pair of qubits, and that they could implement such terms in parallel. In this work, we generalize their approach and describe a way to construct networks of fermionic swap gates acting on $n$ qubits such that each $k$-tuple of fermionic modes is mapped to adjacent physical qubits at some point during the circuit. The circuits that we construct have an asymptotically optimal depth of $O(n^{k-1})$ while only assuming linear connectivity.
For fermionic Hamiltonians, Motta et al. take a different approach that exploits the fact that many Hamiltonians of practical interest are low-rank Motta et al. (2018). For unitary coupled cluster and full-rank generic chemical Hamiltonians, their methods achieve the same scaling as ours, as summarized in Table 1. Our methods provide an alternative Trotter order, whose relative value will need to be studied empirically. For a Trotter step of the Hamiltonian for real molecular systems, empirical data indicate that their low-rank methods can achieve depth.
The question of how to optimally implement a collection of $k$-qubit operators is not confined to the simulation of fermionic quantum systems. Another promising use is in the application of the Quantum Alternating Operator Ansatz (Quantum Approximate Optimization Algorithm, or QAOA) to Constraint Satisfaction Problems (CSPs) over Boolean domains. This approach was taken for the Maximum Cut problem using existing linear swap networks Crooks (2018); our methods can address $k$-CSP for any $k$ in $O(n^{k-1})$ depth.
Our main contributions are:

Formalizing a variant of the quantum circuit routing problem in a way that abstracts away details of particular devices and focuses on their geometry, which is shared by a wide class of devices;

Making explicit and general the equivalence between swap networks that change the mapping of logical qubits to physical qubits and those that change the mapping between fermionic modes and physical qubits;

Explicit constructions for several important classes of problems, as summarized in Table 1, using modular primitives that can be applied to new problems; and

Providing tools for lower bounding the depth of solutions to the routing problem, in particular by connecting it with prior work on acquaintance time and graph minors.
This paper is organized as follows. In Section II, we more formally describe the quantum circuit routing problem and our approach thereto. In Section III, we introduce generalized swap networks that will be used in the constructions of later sections. In Section IV, we introduce some specific quantum simulation tasks related to fermionic Hamiltonians, as well as the Quantum Alternating Operator Ansatz (QAOA), which yield families of circuits that can be routed using our methods. In Section V, we present our main result, showing how to achieve optimal scaling when routing (with an arbitrary ordering) circuits consisting of a $k$-qubit gate for each possible set of $k$ qubits. In Section VI we describe families of instances arising from the Unitary Coupled Cluster method and how to efficiently route them. In Section VII, we conclude. A reader familiar with either QAOA or quantum simulation of fermionic Hamiltonians and interested in quickly learning some useful techniques may do so in Sections III, V, and VI.
In Appendix A, we discuss the instance-independent approach in the context of quantum annealing. In Appendix B, we show how to lower bound the depth of a solution to the circuit routing problem.
Instance family:  CSP  UCCGSD  UCCSD  UpCCGSD 

Depth: 
II Model
We consider hardware consisting of a line of $n$ qubits and suppose that we are able to implement any $k$-qubit gate in time $t_k$ on any set of $k$ qubits that are adjacent. This is an abstraction of the more physical model in which only 1- and 2-qubit gates can be directly implemented; $t_k$ for $k > 2$ is thus some linear combination of $t_1$ and $t_2$ that indicates an upper bound on the cost of compiling any $k$-qubit gate. When considering a specific piece of hardware, this model is relatively coarse; different gates on different sets of physical qubits may require vastly different times to implement. However, this level of abstraction allows for significant generality without too great a loss of precision. Accordingly, for a specific piece of hardware, our constructions should be considered as a starting point, with low-level optimizations likely to improve the constant factors significantly. For example, the line of qubits on which the swap networks are defined can be embedded in a “castellated” manner in a lattice, as shown in Figure 1. The availability of the additional qubit adjacency can enable more efficient decomposition of higher-locality gates.
The problem we would like to solve is as follows: given a set of $k$-qubit gates on $n$ qubits, implement them in some order on the hardware described above. In particular, we focus on the swap-network paradigm. That is, we start with an initial assignment of logical qubits to physical qubits and insert a sequence of 2-qubit swap gates to move the logical qubits around so that, for every gate in the set, the logical qubits on which it acts are physically adjacent at some point in the process. As discussed in Sections IV.1 and VI, the routing problem thus defined is equivalent to the problem of using fermionic swap gates to change the ordering of a Jordan-Wigner string to enable the implementation of gates locally. Without loss of generality, we assume that there is at most one gate in the set acting on any given set of qubits, and that for any gate acting on a set of qubits there is no other gate acting on a subset of those qubits. This is a convenient abstraction, rather than a restriction. An instance of the routing problem is thus specified as a hypergraph, with vertices corresponding to logical qubits and hyperedges corresponding to logical gates. We focus on complete hypergraphs, ones in which for every subset of $k$ vertices, there is an edge connecting them. Results for complete hypergraphs give worst-case bounds for the general problem. A more general variant is the more typically considered problem in which one wants to enforce a temporal partial ordering on the logical gates.
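The normalization assumed above (at most one gate per qubit set, and no gate supported on a subset of another gate's qubits) can be sketched as a small preprocessing step. This is only an illustration: the function name `routing_hypergraph` and the representation of gates by their supports are assumptions, not part of the construction in this paper.

```python
def routing_hypergraph(gate_supports):
    """Reduce a collection of gate supports to the hyperedges of a routing instance.

    Duplicate supports are merged, and any support strictly contained in
    another support is absorbed into the larger one (illustrative helper).
    """
    supports = {frozenset(s) for s in gate_supports}
    return {s for s in supports if not any(s < t for t in supports)}


if __name__ == "__main__":
    # Two gates on {0, 1} and one on {0, 1, 2} collapse to the single hyperedge {0, 1, 2}.
    print(routing_hypergraph([(0, 1), (1, 0), (0, 1, 2), (3,)]))
```

Absorbing a lower-locality gate into a higher-locality one reflects the assumption that it can be applied at the acquaintance opportunity of the larger set.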
In general, near-term hardware will have greater connectivity than a line; nevertheless, it will likely contain a line as a subset, so that our constructions give a baseline. Even with greater connectivity, our scaling is optimal when the number of gates is $\Theta(n^k)$. Let $m$ be the number of gates, $n_{\mathrm{phys}}$ the number of physical qubits, and $n$ the number of logical qubits. At most $n_{\mathrm{phys}}/k$ gates can be implemented at a time, so the circuit depth must be at least $km/n_{\mathrm{phys}}$. For $m = \Theta(n^k)$ and $n_{\mathrm{phys}} = \Theta(n)$, this implies a minimal depth of $\Omega(n^{k-1})$, which our construction provides. Because our focus is on resource-constrained near-term hardware, we shall assume that the number of physical qubits is equal to the number of logical qubits.
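The counting argument can be evaluated directly for a complete hypergraph. The following is a minimal sketch, assuming one $k$-qubit gate per $k$-subset of $n$ logical qubits and exactly $n$ physical qubits; the function name `depth_lower_bound` is hypothetical.

```python
from math import comb


def depth_lower_bound(n: int, k: int) -> int:
    """Counting lower bound on circuit depth for one k-qubit gate per k-subset of n qubits."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    gates = comb(n, k)       # number of hyperedges in the complete hypergraph
    per_layer = n // k       # at most this many disjoint k-qubit gates fit in one layer
    return -(-gates // per_layer)  # ceiling division


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, depth_lower_bound(n, 2), depth_lower_bound(n, 3))
```

The bound counts only how many disjoint gates fit in a layer, so it holds for any connectivity, not just a line.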
III Swap networks
Henceforth, by “swap gate”, we shall mean either the standard swap gate (when considering a mapping of logical qubits to physical qubits) or the fermionic swap gate (when considering a mapping of fermionic modes to physical qubits); for circuit routing, everything is exactly the same in both cases except for the “interpretation”. A swap network is a circuit consisting entirely of swap gates. We define a 2-complete linear swap network, a notion we shall generalize shortly, to be a swap network in which all pairs of logical qubits are linearly adjacent at some point in the circuit and in which all swap gates act on linearly adjacent physical qubits. Such networks ensure that, in the linear architecture described in Sec. II, there is an opportunity to add, for each pair of logical qubits, a 2-qubit gate acting on those logical qubits (or fermionic modes as the case may be) at some point in the circuit. We call such opportunities acquaintance opportunities. They are not part of the swap network, but we shall often draw them as empty boxes in circuit diagrams to illustrate acquaintance properties of swap networks, as in Figure 2. We shall say that a set of logical qubits that has at least one such acquaintance opportunity is “acquainted” by the network, or that the swap network “acquaints” those qubits.
Before generalizing this notion, we review the construction of Kivlichan et al. Kivlichan et al. (2018) for implementing a 2-local gate on every pair of logical qubits in depth $n$, using $\binom{n}{2}$ swap gates. The swap network underlying this construction is what we shall call the canonical 2-complete linear swap network. Let the physical qubits be labeled $1$ through $n$, and partition the pairs of adjacent qubits into two sets based on the parity of their larger index: even pairs $\{(1,2), (3,4), \ldots\}$ and odd pairs $\{(2,3), (4,5), \ldots\}$. Note that the pairs in each partition are mutually disjoint. We define the canonical 2-complete linear (2CCL) swap network as $n$ alternating layers of swaps on the even pairs and odd pairs, as illustrated in the top half of Figure 2. The overall effect of the 2CCL swap network is to reverse the ordering of the logical qubits. In doing so, it directly swaps every pair of logical qubits. This construction has the attractive property that each acquaintance opportunity precedes a swap gate on the same two qubits, so any added gate that acts on a pair of logical qubits can be combined with the swap of those two qubits, with the result that in depth $n$ we can execute a 2-qubit gate between every pair of logical qubits.
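The two properties claimed for the 2CCL swap network (it reverses the qubit order and directly swaps every pair) are easy to check by simulation. The sketch below is an illustration only; it uses 0-indexed positions, so the "even pairs" of the text are the position pairs (0,1), (2,3), and so on, and the function names are hypothetical.

```python
from itertools import combinations


def two_ccl_layers(n):
    """Layers of adjacent position pairs: n alternating layers, starting with (0,1), (2,3), ..."""
    return [[(i, i + 1) for i in range(layer % 2, n - 1, 2)] for layer in range(n)]


def run_swap_network(n, layers):
    """Apply the swap layers to logical qubits 0..n-1; record each pair at the moment it is swapped."""
    order = list(range(n))
    acquainted = set()
    for layer in layers:
        for i, j in layer:
            acquainted.add(frozenset((order[i], order[j])))  # acquaintance opportunity
            order[i], order[j] = order[j], order[i]          # swap gate on the same pair
    return order, acquainted


def all_pairs(n):
    return {frozenset(p) for p in combinations(range(n), 2)}


if __name__ == "__main__":
    order, acquainted = run_swap_network(6, two_ccl_layers(6))
    print(order, acquainted == all_pairs(6))
```

Each acquaintance opportunity is recorded immediately before the swap on the same two positions, mirroring how a 2-qubit gate can be merged with the swap that follows it.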
One direction for generalization is to swap networks parameterized by a subset of all pairs of qubits and an architecture, such as a 2D grid. The subset captures the pairs of qubits to which we want to apply 2-qubit gates at a given stage in a circuit. We shall not discuss this generalization further in this paper, other than to note that our results can be used to provide bounds for such swap networks. Because in the present work we shall present only swap networks acting on a line, we shall often leave that aspect implicit in the terminology and refer simply, e.g., to a “2-complete swap network”.
Instead, we are interested in generalizing to $k$-complete swap networks, networks in which the elements of every set of $k$ logical qubits are adjacent at some point, so that a $k$-qubit gate (or set of 1- and 2-qubit gates making up the $k$-qubit gate) could be applied thereto. To support the construction of $k$-complete swap networks in Sec. V, here we introduce a generalization of a 2-complete swap network that swaps elements of a partition of qubits, rather than individual logical qubits: a complete $\mathcal{P}$-swap network, where $\mathcal{P}$ is an ordered partition of the physical qubits such that each part contains only contiguous qubits, contains only swap operators that swap parts of the partition. In this way, a complete $\mathcal{P}$-swap network has the property that every part in the partition is adjacent to every other part in the partition at some point in the network.
In constructing swap networks, it will be useful to swap pairs of sets of qubits using what we call a $(k_1, k_2)$-swap gate, or, more generally, a generalized swap gate. The $(k_1, k_2)$-swap gate swaps a set of $k_1$ logical qubits with a set of $k_2$ logical qubits, while preserving the ordering within each set, i.e., it permutes a sequence of logical qubits from $(a_1, \ldots, a_{k_1}, b_1, \ldots, b_{k_2})$ to $(b_1, \ldots, b_{k_2}, a_1, \ldots, a_{k_1})$. Several examples of these generalized swap gates and their decompositions are shown in Figure 3. In general, a $(k_1, k_2)$-swap gate can be decomposed using $k_1 k_2$ standard swap gates in depth $k_1 + k_2 - 1$. We call a swap network a $k$-swap network whenever it contains only $(k_1, k_2)$-swap gates for $k_1, k_2 \le k$.
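One way to realize such a decomposition is to let every qubit of the first set pass every qubit of the second set in a staggered, diamond-shaped schedule. The sketch below is one possible schedule under that assumption, not necessarily the decomposition drawn in Figure 3; the function names are illustrative. It uses $k_1 k_2$ standard swaps arranged in $k_1 + k_2 - 1$ layers.

```python
def generalized_swap_layers(k1, k2):
    """Layers of adjacent-position swaps moving a block of k1 qubits past a block of k2 qubits.

    Qubit a_i (0-indexed) meets qubit b_j in layer (k1 - 1 - i) + j,
    at positions (i + j, i + j + 1).
    """
    layers = [[] for _ in range(k1 + k2 - 1)]
    for i in range(k1):
        for j in range(k2):
            layers[k1 - 1 - i + j].append((i + j, i + j + 1))
    return layers


def apply_layers(sequence, layers):
    """Apply layers of position swaps to a copy of the sequence."""
    seq = list(sequence)
    for layer in layers:
        for p, q in layer:
            seq[p], seq[q] = seq[q], seq[p]
    return seq


if __name__ == "__main__":
    print(generalized_swap_layers(2, 2))
    print(apply_layers(["a1", "a2", "b1", "b2"], generalized_swap_layers(2, 2)))
```

For $k_1 = k_2 = 2$ this gives three layers, and the internal order of each block is preserved because no two qubits of the same block are ever swapped with each other.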
The canonical $\mathcal{P}$-swap network has the same structure as the 2CCL swap network, except that instead of pairs of single qubits being swapped at a time, pairs of sets of qubits (i.e., the parts of the partition) are swapped. In the canonical $\mathcal{P}$-swap network, each swap gate is preceded by a local acquaintance opportunity. To make the overall effect of a complete $\mathcal{P}$-swap network be a complete reversal of the qubit mapping, we append to the end a swap network within each part. This is unnecessary when considering a single swap network, but may be helpful when using the swap network as a primitive in a larger construction. Note that this is primarily for explanatory purposes, and in an actual implementation would likely be optimized away. In the recursive strategy for $k$-local hypergraphs (discussed in Sec. V), each generalized swap gate is preceded by some number of acquaintance opportunities and swap gates that ensure that each of the two sets of qubits being swapped is acquainted with each qubit of the other set.
The 2CCL swap network has the exact same structure as the optimal sorting network on a line Beals et al. (2013). A sorting network is a fixed circuit consisting of “comparators”. Given an initial assignment of objects to the wires, each comparator compares the objects and swaps them if they are out of order. This means that a subset of the 2CCL swap network can be used to effect an arbitrary permutation of logical qubits in at most linear depth.
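The connection to sorting networks can be sketched as odd-even transposition sort: run the same alternating layers as the 2CCL network, but fire a swap only when the two logical qubits are out of order with respect to their target positions. This is a minimal illustration with hypothetical function names; it relies on the standard fact that odd-even transposition sort finishes within $n$ layers.

```python
def route_permutation(targets):
    """Return swap layers that send the qubit at position i to position targets[i].

    Each returned layer is a subset of the corresponding 2CCL layer.
    """
    n = len(targets)
    keys = list(targets)
    layers = []
    for layer in range(n):
        swaps = []
        for i in range(layer % 2, n - 1, 2):
            if keys[i] > keys[i + 1]:  # comparator: swap only if out of order
                keys[i], keys[i + 1] = keys[i + 1], keys[i]
                swaps.append((i, i + 1))
        layers.append(swaps)
    return layers


def apply_layers(sequence, layers):
    """Apply layers of position swaps to a copy of the sequence."""
    seq = list(sequence)
    for layer in layers:
        for p, q in layer:
            seq[p], seq[q] = seq[q], seq[p]
    return seq


if __name__ == "__main__":
    targets = [2, 0, 3, 1]
    layers = route_permutation(targets)
    print(layers, apply_layers(targets, layers))
```

Applying the returned layers to the list of target labels leaves them in sorted order, i.e., every logical qubit ends at its requested position.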
The swap networks above acquaint all pairs of sets of qubits. Another useful primitive is what we call a “bipartite swap network”; again, this should be more precisely called a “bipartite linear swap network” to emphasize that it acts on a line, but we leave this implicit for concision. Given a bipartition of sets of qubits, it acquaints all the unions of pairs of sets of qubits which can be formed by taking one set from the first part and the other set from the second part. While the depth of a bipartite swap network is similar to that of a complete swap network, the gate count is approximately halved. Figure 4 shows an example bipartite swap network for six sets of qubits, with the first three in one part and the latter three in the second part.
Swap networks can be useful for measurement as well. In many cases, the gates to be executed correspond one-to-one with the terms of a Hamiltonian to be measured. Any swap network used to implement those gates thus yields a partition of the terms of the Hamiltonian into parts containing only gates acting on disjoint sets of qubits. This partition can then be used to parallelize the measurements. After an application of the swap network, the swap layers following the logical layer to be measured can be executed in reverse to return the mapping to one in which the terms of the Hamiltonian are mapped to adjacent sets of qubits, with appropriate optimizations made to account for the fact that many swap gates will likely cancel out once the logical gates are removed. Alternatively, a simple sorting network can be used to achieve the same end. For fermionic Hamiltonians, this approach can significantly reduce the number of measurements needed by reducing the locality of all measurement terms, in addition to the savings yielded by parallelization.
IV Problem families
Application  QAOA  Quantum chemistry 

Iteration  
Assignment  logical qubits  fermionic modes 
Changed by  SWAP  FSWAP 
In this section, we introduce two families of quantum circuits that come from quite different application domains but whose compilation can be addressed using essentially the same tools; the analogy is summarized in Table 2. Both cases involve repeated application of a circuit of a particular form such that for each iteration the compilation instance is the same. We provide constructions for a single iteration; these can be repeated sequentially for the full circuit. Solving the compilation instance for the full circuit all at once may provide a better solution, but likely at the cost of it being much harder to find.
IV.1 Fermionic Hamiltonians
The general form of the electronic structure Hamiltonian in second quantization is
$$H = \sum_{p,q} h_{pq}\, a_p^\dagger a_q + \frac{1}{2} \sum_{p,q,r,s} h_{pqrs}\, a_p^\dagger a_q^\dagger a_r a_s, \tag{1}$$
where $p, q, r, s$ label single-electron orbitals, $h_{pq}$ and $h_{pqrs}$ are real coefficients, and $a_p^\dagger$ is the creation operator for the $p$th orbital. A common subroutine of quantum simulation algorithms is the Trotterization of time evolution under such a fermionic Hamiltonian Aspuru-Guzik et al. (2005):
$$e^{-iH\delta t} \approx \prod_{\{p,q,r,s\}} e^{-i H_{pqrs} \delta t}, \tag{2}$$
where $H_{pqrs}$ is the part of the Hamiltonian that acts exclusively on modes $p$, $q$, $r$, $s$. (For simplicity we absorb the terms acting on two fermionic modes into the terms acting on four.) One approach to mapping the fermionic operators into operators acting on the qubit Hilbert space is to employ the Jordan-Wigner transformation Ortiz et al. (2001),
$$a_p^\dagger \mapsto \frac{1}{2} \left( X_p - i Y_p \right) \prod_{q < p} Z_q. \tag{3}$$
After performing the Jordan-Wigner transformation on Equation 2, many of the resulting operators will act nontrivially on $O(N)$ qubits, resulting in a naive gate depth of $O(N^5)$ for the implementation of Equation 2, assuming there are $O(N^4)$ terms in the Hamiltonian. As we shall see, by reordering the fermionic modes (thereby changing the Jordan-Wigner ordering), this overhead from the nonlocality of the Jordan-Wigner transformation is addressed automatically in our scheme for parallelization. For this reason, our constructions provide significant advantage even when connectivity is not a constraint, including in the error-corrected regime. As a result, at least with respect to scaling, we avoid the need for more sophisticated alternatives to the Jordan-Wigner transformation, such as those developed by Bravyi and Kitaev Bravyi and Kitaev (2002) and others Bravyi et al. (2017).
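To see why reordering the modes removes the nonlocality, it helps to compute which qubits a term touches as a function of where its modes sit in the Jordan-Wigner ordering. The sketch below assumes a term that is a product of an even number of ladder operators on distinct modes, so the $Z$ strings cancel except between the first and second occupied positions, the third and fourth, and so on. The function name `jw_support` is hypothetical.

```python
def jw_support(positions):
    """Qubits acted on nontrivially by an even product of ladder operators.

    `positions` are the places of the term's modes in the Jordan-Wigner ordering.
    Z strings survive only between the 1st and 2nd, 3rd and 4th, ... sorted positions.
    """
    xs = sorted(positions)
    if len(xs) % 2 or len(set(xs)) != len(xs):
        raise ValueError("expected an even number of distinct positions")
    support = set(xs)
    for low, high in zip(xs[0::2], xs[1::2]):
        support.update(range(low + 1, high))  # leftover Z string between paired modes
    return support


if __name__ == "__main__":
    print(len(jw_support([0, 3, 4, 9])))  # modes spread along the line
    print(len(jw_support([2, 3, 4, 5])))  # modes brought to adjacent qubits
```

Once a swap network has brought the four modes of a term to adjacent positions, the support shrinks to exactly those four qubits.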
A related approach, employed by a variety of works proposing the study of quantum chemistry using a near-term device, is the use of a quantum circuit to prepare and measure the unitary coupled cluster ansatz Peruzzo et al. (2014); McClean et al. (2016); Romero et al. (2018); Lee et al. (2018). Under the typical choice to include only single and double excitations in the cluster operator, this wave function is given by
$$|\psi\rangle = e^{T - T^\dagger} |\phi_0\rangle, \tag{4}$$
where the cluster operator $T$ has a form similar to $H$ in Equation 1 and $|\phi_0\rangle$ is the reference state. Usually, $T$ contains only excitations from the “occupied” orbitals, which contain an electron in the reference state, to the “virtual” orbitals, and the coefficients are determined variationally. We refer to this case as UCCSD, and the case where all electron excitations are included as UCCGSD. The exact exponential of Equation 4 is typically approximated by a Trotter expansion and (assuming $\eta \ll N$, where $\eta$ is the number of electrons) the overhead from the nonlocality of the Jordan-Wigner strings discussed above would lead to a circuit depth of $O(N^3 \eta^2)$ for a single Trotter step.
We show how depths of $O(N^3)$ and $O(N^2 \eta)$ can be achieved for a Trotter step of the time evolution under a fermionic Hamiltonian (or the similarly structured UCCGSD) and the UCCSD ansatz, respectively. These scalings match the asymptotic results of Ref. Hastings et al. (2015) while also respecting the spatial locality of the available gates and requiring no additional ancilla qubits.
IV.2 QAOA
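As originally proposed Farhi et al. (2014); Farhi and Harrow (2016), QAOA is a method for minimizing the expectation value of a diagonal Hamiltonian
$$H = \sum_{s \in \{+1, -1\}^n} f(s)\, |s\rangle\langle s| \tag{5}$$
corresponding to a classical function whose multilinear form is
$$f(s) = \sum_{S \subseteq \{1, \ldots, n\}} \alpha_S \prod_{i \in S} s_i, \tag{6}$$
where $|s\rangle$ is the computational basis state with $Z_i |s\rangle = s_i |s\rangle$. The minimization is done variationally over states of the form
$$|\boldsymbol{\beta}, \boldsymbol{\gamma}\rangle = U_M(\beta_p)\, U_P(\gamma_p) \cdots U_M(\beta_1)\, U_P(\gamma_1)\, |{+}\rangle^{\otimes n}, \tag{7}$$
which consists of alternating applications of the “phase separator” $U_P(\gamma) = e^{-i \gamma H}$ and the “mixer” $U_M(\beta) = e^{-i \beta \sum_i X_i}$. The phase separator can be written as the product of gates corresponding to terms in the Hamiltonian,
$$U_P(\gamma) = \prod_{S} e^{-i \gamma \alpha_S \prod_{i \in S} Z_i}. \tag{8}$$
Note that the gates are diagonal and so their order does not matter. The locality of the gates corresponds directly to the locality of the terms in the Hamiltonian, i.e., to the size $|S|$ of the largest set with $\alpha_S \neq 0$. QAOA applied to $k$-CSP, in which each term acts on at most $k$ variables, thus requires $k$-qubit gates.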
Hadfield et al. Hadfield et al. (2019) generalized QAOA to the Quantum Alternating Operator Ansatz, employing a wider variety of mixers, many of which involve $k$-qubit gates. While these gates, in general, do not commute, it is an open question how the order of the gates affects the efficacy of the mixing. In NISQ devices with limited depth, the depth in which different mixers can be implemented plays a key role in their usefulness. The techniques here can be applied to these alternative mixers, with different orderings giving different mixers within the same family, and the resulting compilation a key step in determining the most effective mixing strategy.
V Complete hypergraphs
V.1 Cubic interactions
Now, suppose we want to implement a 3-qubit gate between every triple of logical qubits. We call a swap network that achieves this goal a 3-complete linear (3CCL) swap network. We can construct one in the following way. First, we start with the 2CCL swap network, as shown in the top half of Figure 2. At each layer where acquaintance opportunities appear, consider the partition $\mathcal{P}$ whose parts are the pairs of qubits appearing in the acquaintance opportunities together with singleton parts for any unpaired qubits at the boundary. To obtain a 3-complete linear swap network, we add the swap network corresponding to each such partition $\mathcal{P}$, as shown in the bottom half of Figure 2. The way acquaintance opportunities, where 3-local gates (or compilations of them to 1- and 2-local gates) can be added, are interspersed between the generalized swaps making up the swap network is shown in the top half of Figure 5. We make use of the property that for any two pairs of logical qubits involved in a 2-swap, each triple consisting of one of the pairs and one qubit from the other pair is mapped to three contiguous physical qubits either before or after the swap. This ensures that overall every triple of logical qubits is acquainted, because any triple is the union of a pair $\{a, b\}$ and a third qubit $c$. The 2CCL network ensures that the pair $\{a, b\}$ is adjacent at some point, and thus a part of some partition $\mathcal{P}$. The third qubit $c$ is necessarily in some other part $C$ of the same partition, so that at some point in the swap network for $\mathcal{P}$ there is a 2-swap involving $\{a, b\}$ and $C$, ensuring that the triple is acquainted. (Actually, it is acquainted thrice, because there are three pairs for which the preceding logic applies.) There are $n$ such swap networks inserted, one for each layer of the 2CCL network, each with about $n/2$ layers of generalized swaps, and each 2-swap gate can be implemented in depth 3 using standard swap gates, for a total depth of approximately $3n^2/2$.
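The acquaintance argument for triples can be checked numerically. The sketch below is a simplification, not the compiled circuit: for each layer of the 2CCL network it forms the partition into swapped pairs and boundary singletons, simulates the canonical swap network on those parts using a copy of the current ordering, and records every set of three contiguous qubits seen before or after each layer of part swaps. All function names are hypothetical.

```python
from itertools import combinations


def layer_partition(order, layer):
    """Parts for one 2CCL layer: the pairs about to be swapped plus boundary singletons."""
    start = layer % 2
    parts = [[order[0]]] if start == 1 else []
    for i in range(start, len(order), 2):
        parts.append(order[i:i + 2])
    return parts


def contiguous_triples(parts):
    """All sets of three qubits occupying three contiguous positions."""
    flat = [q for part in parts for q in part]
    return {frozenset(flat[i:i + 3]) for i in range(len(flat) - 2)}


def acquainted_triples(n):
    order = list(range(n))
    seen = set()
    for layer in range(n):
        parts = layer_partition(order, layer)
        seen |= contiguous_triples(parts)
        for sub in range(len(parts)):              # canonical swap network on the parts
            for j in range(sub % 2, len(parts) - 1, 2):
                parts[j], parts[j + 1] = parts[j + 1], parts[j]
            seen |= contiguous_triples(parts)
        for i in range(layer % 2, n - 1, 2):       # advance the underlying 2CCL network
            order[i], order[i + 1] = order[i + 1], order[i]
    return seen


if __name__ == "__main__":
    n = 7
    print(acquainted_triples(n) == {frozenset(t) for t in combinations(range(n), 3)})
```

Because the part-level network is run on a copy, the sketch verifies only that every triple becomes contiguous at some point; it does not track the qubit order through the full interleaved circuit.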
V.2 General $k$-qubit gates
The above ideas generalize to arbitrary $k$. The construction is recursive. First, construct the network to implement all $(k-1)$-qubit gates. Then replace every layer of acquaintance opportunities with the corresponding complete $\mathcal{P}$-swap network, inserting swaps and acquaintance opportunities between the layers of swaps in order to acquaint each set of $k-1$ qubits with each qubit in the other set of $k-1$ qubits with which it will be swapped. Specifically, when inserting a $(k-1)$-swap involving two sets of $k-1$ qubits each, we want to ensure that each set of $k$ qubits consisting of one of the sets and one qubit from the other set is mapped to $k$ contiguous physical qubits either before or after the swap. For $k = 3$, this is the case without additional swaps. For larger $k$, this can be achieved by adding swaps that bring half of each set to the “interface” between them before the swap (the half closest to the interface), and the other half to the interface afterwards (when it will then be the closer half). This ensures that overall every set of $k$ logical qubits is acquainted, because any such set is the union of a set $S$ of $k-1$ qubits and a $k$th qubit $q$. Suppose we start with a swap network that acquaints every set of $k-1$ qubits, and in particular $S$, so that $S$ is a part of some partition $\mathcal{P}$ (corresponding to an acquaintance layer in the starting swap network) in the recursive step. The $k$th qubit $q$ is necessarily in some other part $S'$ of the same partition $\mathcal{P}$, so that at some point in the $\mathcal{P}$-swap network there is a swap involving $S$ and $S'$, ensuring that the set $S \cup \{q\}$ is acquainted. (Actually, it is acquainted at least $k$ times, because there are $k$ sets $S$ for which the preceding logic applies.)
Each swap network has depth at most in terms of swap gates. A swap gate has depth at most in terms of standard swap gates, and the additional swaps for bringing inner qubits to the interface add depth at each swap. Therefore, if we have a depth-$O(n^{k-2})$ construction for all $(k-1)$-qubit gates, we can use that to get an $O(n^{k-1})$-depth construction for all $k$-qubit gates. The base case is the linear-depth 2-CCL swap network for 2-qubit gates. Figures 2 and 5 show the steps for . Lower-locality gates can be included in one of two ways, or a combination thereof. First, they can be incorporated directly into the highest-locality gates. Alternatively, the lower-locality acquaintance opportunities can be kept when recursing.
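The base case of the recursion can be checked numerically. The following Python sketch assumes that the linear-depth 2-CCL network takes its usual form, namely $n$ alternating layers of nearest-neighbor swaps on even and odd bonds of the line; the function names are illustrative and do not come from the paper. It records every logical pair that is adjacent when a swap is applied.

```python
from itertools import combinations

def two_ccl_layers(n):
    """Yield, for each of the n layers, the physical position pairs to swap."""
    for t in range(n):
        # Even layers act on bonds (0,1),(2,3),...; odd layers on (1,2),(3,4),...
        yield [(p, p + 1) for p in range(t % 2, n - 1, 2)]

def run_two_ccl(n):
    """Simulate the network; return (final layout, set of acquainted pairs)."""
    layout = list(range(n))  # layout[p] = logical qubit at physical position p
    acquainted = set()
    for layer in two_ccl_layers(n):
        for p, q in layer:
            # The two logical qubits are adjacent here, so a 2-qubit gate
            # could be applied together with the swap.
            acquainted.add(frozenset((layout[p], layout[q])))
            layout[p], layout[q] = layout[q], layout[p]
    return layout, acquainted

final_layout, pairs = run_two_ccl(6)
all_pairs = {frozenset(c) for c in combinations(range(6), 2)}
print(final_layout)
print(pairs == all_pairs)
```

In this sketch each pair is acquainted at the moment it is swapped, which is why the final layout is the reverse of the initial one.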
Using this recursive method yields a significant amount of redundancy with respect to the number of times that each set of qubits can be acquainted. For applications in which the gates do not commute, this can be exploited in two ways. First, distributing the gates over all possible acquaintance opportunities may lead to smaller Trotter errors. Second, for each gate a possible acquaintance opportunity may be chosen randomly. In other words, the swap network can be considered as a family of swap networks, each corresponding to a particular Trotter order; prior work shows that such random Trotter orderings may be helpful Childs et al. (2018).
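The second option, choosing one acquaintance opportunity per gate at random, can be sketched as follows. The dictionary of opportunities is hypothetical data made up for illustration (in practice it would be read off from the swap network), and the function names are assumptions of this sketch rather than anything defined in the paper.

```python
import random

def choose_opportunities(opportunities, seed=None):
    """Pick one acquaintance opportunity (a layer index) per gate at random."""
    rng = random.Random(seed)
    return {gate: rng.choice(sorted(layers))
            for gate, layers in opportunities.items()}

def trotter_order(choice):
    """Gates sorted by their chosen layer, ties broken by gate label."""
    return sorted(choice, key=lambda gate: (choice[gate], gate))

# Hypothetical data: each 3-qubit gate, labeled by its logical qubits, with
# the layers of the swap network at which those qubits are contiguous.
opportunities = {
    (0, 1, 2): [0, 4, 7],
    (0, 1, 3): [1, 5, 6],
    (1, 2, 3): [2, 3, 8],
}
choice = choose_opportunities(opportunities, seed=1)
print(trotter_order(choice))
```

Each seed selects one member of the family of swap networks, and hence one Trotter order.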
V.3 Alternative for 3-local
Here we present an alternative construction for sets of 3-local gates. Its depth is similar to that of the construction given above, but it does not obviously generalize. We include it for two reasons: it demonstrates a potentially useful property of complete linear swap networks, and it may be better when applied to specific hardware devices.
Note that in the 2-complete swap network, every pair of logical qubits that is initially at distance 2 from each other remains so, except near the ends of the line. Furthermore, every other logical qubit passes through them at some point. For our purposes, this means that in the course of the 2-complete swap network we can execute any 3-local gate such that some pair of the three logical qubits on which it acts is at distance 2 at the start of the network.
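This property can be checked on a small instance. The sketch below again assumes the usual alternating even/odd form of the 2-complete swap network and uses illustrative function names; it verifies, for six qubits only, that every triple containing a pair initially at distance 2 occupies three consecutive positions in some layout of the network.

```python
from itertools import combinations

def layouts_of_two_complete(n):
    """Layouts before the first layer and after each of the n swap layers."""
    layout = list(range(n))
    history = [tuple(layout)]
    for t in range(n):
        for p in range(t % 2, n - 1, 2):
            layout[p], layout[p + 1] = layout[p + 1], layout[p]
        history.append(tuple(layout))
    return history

def contiguous_triples(history):
    """Logical triples that sit on three consecutive positions in some layout."""
    seen = set()
    for layout in history:
        for p in range(len(layout) - 2):
            seen.add(frozenset(layout[p:p + 3]))
    return seen

n = 6
seen = contiguous_triples(layouts_of_two_complete(n))
# Triples containing a pair of logical qubits initially at distance 2.
wanted = {frozenset(t) for t in combinations(range(n), 3)
          if any(abs(a - b) == 2 for a, b in combinations(t, 2))}
print(wanted <= seen)
```

Triples without such a pair are not guaranteed to appear, which is why the construction below cycles through several initial mappings.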
Consider a sequence of mappings labeled by . In the mapping labeled by , the logical qubits are mapped to physical qubits , respectively. Any triple of logical qubits contains at least one pair that is mapped to physical qubits at distance 2 in at least one of the mappings. The construction is thus: alternate between 1) 2-complete swap networks with initial assignments given by the mappings, and 2) sorting networks to get to the next mapping. The 2-complete swap networks have depth and the sorting networks depth at most , so overall the total depth is at most .
VI Unitary Coupled Cluster
In this section, we describe how the techniques of this paper can be used to implement Trotterized versions of three different types of unitary coupled cluster ansatz with a depth scaling that is optimal up to constant prefactors. We present the details for the standard unitary coupled cluster method with single and double excitations from occupied to virtual orbitals (UCCSD) Peruzzo et al. (2014); McClean et al. (2016); Romero et al. (2018), a unitary coupled cluster ansatz that includes additional, generalized excitations (UCCGSD) Nooijen (2000); Wecker et al. (2015), and a recently introduced ansatz that is a sparsified version of UCCGSD ($k$-UpCCGSD) Lee et al. (2018).
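The standard unitary coupled cluster singles and doubles ansatz is given by
$$|\psi\rangle = e^{T - T^{\dagger}} |\phi_0\rangle, \tag{9}$$
where $|\phi_0\rangle$ is the Hartree-Fock state, $T = T_1 + T_2$,
$$T_1 = \sum_{i,a} t_i^a\, a_a^{\dagger} a_i, \qquad T_2 = \sum_{i,j,a,b} t_{ij}^{ab}\, a_a^{\dagger} a_b^{\dagger} a_j a_i. \tag{10}$$
The $i$ and $j$ indices range over the “occupied” orbitals (those which are occupied in the Hartree-Fock state $|\phi_0\rangle$) and the $a$ and $b$ indices over the “virtual” orbitals (those which are unoccupied in $|\phi_0\rangle$). A Trotter step of the corresponding unitary has $O(\eta^2 (n - \eta)^2)$ 4-local gates.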
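To make the gate count concrete, the following Python sketch enumerates the single and double excitations from occupied to virtual orbitals. It assumes `n` spin orbitals of which the first `eta` are occupied, ignores spin symmetry, and uses an illustrative function name that is not part of the paper.

```python
from itertools import combinations
from math import comb

def uccsd_excitations(n, eta):
    """Enumerate UCCSD excitations for n spin orbitals, the first eta occupied."""
    occupied = range(eta)
    virtual = range(eta, n)
    # Singles: one occupied orbital i excited to one virtual orbital a.
    singles = [(i, a) for i in occupied for a in virtual]
    # Doubles: an unordered occupied pair excited to an unordered virtual pair.
    doubles = [(i, j, a, b)
               for i, j in combinations(occupied, 2)
               for a, b in combinations(virtual, 2)]
    return singles, doubles

n, eta = 8, 4
singles, doubles = uccsd_excitations(n, eta)
print(len(singles), eta * (n - eta))
print(len(doubles), comb(eta, 2) * comb(n - eta, 2))
```

Each double excitation acts on four orbitals, so the number of 4-local gates per Trotter step grows as the product of the number of occupied pairs and the number of virtual pairs.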
These can be implemented in $O(n^2 \eta)$ depth, as shown in Figure 6. First, we assign the occupied orbitals to the first $\eta$ physical qubits and the virtual orbitals to the last $n - \eta$ physical qubits. We have a 2-complete swap network on the occupied orbitals. In between every swap layer thereof, we do a 2-complete swap network on the virtual orbitals. For every pair of occupied orbitals and every pair of virtual orbitals, there is a layer in this composite network such that the pairs are simultaneously adjacent. Thus, if we then insert a final 2-swap network with appropriate partitions at every layer, then every set of two occupied orbitals and two virtual orbitals will be adjacent at some point and a 4-local gate can be implemented on them. The swap depth of just the 2-complete swap networks is . Before each one, a swap network is inserted with an average depth of . Overall, this yields the claimed depth. The coefficient of the leading term in the depth can be halved by accounting for the fact that we are typically interested in implementing only those excitations that are spin-preserving. If we initially order the spin orbitals within the sets of occupied and virtual orbitals by , then the parity of the spins of the pairs of orbitals acquainted in each layer of the swap networks alternates, and we only need to do a bipartite swap network when the spin parities of the layers of the two sets coincide.
A more general version of the unitary coupled cluster ansatz is obtained by allowing excitations between any pair of orbitals. Rather than the cluster operators given in Equation 10, we use
$$T_1 = \sum_{p,q} t_p^q\, a_q^{\dagger} a_p, \qquad T_2 = \sum_{p,q,r,s} t_{pq}^{rs}\, a_r^{\dagger} a_s^{\dagger} a_q a_p, \tag{11}$$
where the indices $p$, $q$, $r$, and $s$ are allowed to range over the entire set of orbitals (except that we often disallow excitations that do not preserve spin). It has been shown that the inclusion of these “generalized” singles and doubles greatly increases the ability of unitary coupled cluster to target the kind of strongly correlated states that pose the greatest challenge for quantum chemical calculations on a classical computer Wecker et al. (2015); Lee et al. (2018). A Trotter step for unitary coupled cluster with generalized singles and doubles may be implemented by a straightforward application of the techniques for implementing 4-local gates described in Figure 5. That construction also yields the optimal scaling here, enabling the execution of all gate operations corresponding to the terms in Equation 11 using a circuit of depth $O(n^3)$. One possibility for exploiting spin symmetry is as follows. Start with an initial mapping in which the orbitals of one spin are mapped to the first half of the physical qubits and those of the other spin to the second half. Then apply the quartic swap network to each half of the qubits in parallel, thus acquainting all sets of four orbitals with the same spin. Then apply a double bipartite swap network, of the sort used for UCCSD, to acquaint every set of four orbitals such that there are two orbitals of each spin.
As a final example of the utility of a swap network approach to circuit compilation, we describe the implementation of a sparse version of the unitary coupled cluster operator with generalized singles and doubles recently developed by Lee et al. Lee et al. (2018). Rather than the full set of double excitations as in Equation 11, this variant of unitary coupled cluster uses only those double excitations that transfer two electrons with opposite spins from one spatial orbital to another. The resulting cluster operators,
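$$T_1 = \sum_{p,q} t_p^q\, a_q^{\dagger} a_p, \qquad T_2 = \sum_{p,q} t_{p_\alpha p_\beta}^{q_\alpha q_\beta}\, a_{q_\alpha}^{\dagger} a_{q_\beta}^{\dagger} a_{p_\beta} a_{p_\alpha}, \tag{12}$$
contain only $O(n^2)$ terms and can be implemented in $O(n)$ depth using the approach detailed below.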
Recall our prior observation that, throughout the execution of a 2-complete swap network, every pair of logical qubits that is initially at distance 2 from each other will remain so. Furthermore, every such pair of logical qubits will become adjacent to every other pair. Therefore we begin by ordering the fermionic modes . Then, by executing a 2-complete swap network, we bring the fermionic modes involved in each of the 2-local and 4-local terms in Equation 12 adjacent to each other at some point. We show an example for in Figure 7 below.
VII Conclusion
We have introduced and instantiated an instance-independent approach to quantum circuit routing. This instance-independent approach has a distinct advantage among the growing number of alternatives for addressing the limited connectivity of physical devices: it requires effectively no marginal classical computation per instance. Of course, there is the corresponding disadvantage that it cannot in general achieve instance-specific optimality. However, for many applications, including the fermionic simulation tasks we addressed, all instances of a given size share a topology. For applications in which this is not the case, instance-independent swap networks can nevertheless provide a starting point for further optimizations. (Regardless, simple local optimizations, such as removing any two swap gates in a row on the same pair of qubits, should be used to tighten up the swap networks presented here in any practical implementation.)
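The local optimization mentioned in parentheses can be sketched in a few lines. The circuit representation below (a list of `(kind, qubits)` tuples acting on physical qubits) and the function name are assumptions made for this illustration; a swap is cancelled against an earlier swap on the same pair only when no intervening operation touches either qubit.

```python
def cancel_adjacent_swaps(ops):
    """Remove pairs of swaps on the same qubit pair with nothing in between.

    ops is a list of (kind, qubits) tuples, e.g. ("swap", (0, 1)).
    """
    out = []
    for kind, qubits in ops:
        if kind != "swap":
            out.append((kind, qubits))
            continue
        # Find the most recent kept operation that touches either qubit.
        for idx in range(len(out) - 1, -1, -1):
            prev_kind, prev_qubits = out[idx]
            if set(prev_qubits) & set(qubits):
                if prev_kind == "swap" and set(prev_qubits) == set(qubits):
                    del out[idx]                 # the two swaps cancel
                else:
                    out.append((kind, qubits))   # blocked by another gate
                break
        else:
            out.append((kind, qubits))           # nothing touches these qubits
    return out

circuit = [("swap", (0, 1)), ("swap", (2, 3)), ("swap", (1, 0))]
print(cancel_adjacent_swaps(circuit))
```

Because cancellation is checked against the already simplified prefix, nested redundant swaps are removed in a single pass.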
Another limitation of our approach is the complete separation of the decomposition aspect of compilation from the routing aspect; perhaps a better compilation can be found by solving these together at once. Nevertheless, given the hardness of the compilation problem in its full generality, we expect that this separation will in general be useful in balancing quality of the solution found with the time to find it.
We have made a connection between the quantum circuit routing problem and the minor embedding problem in quantum annealing. This analogy should not be taken too mathematically, especially when considering the ordered variant of the routing problem, but may still be of value in encouraging the lifting of ideas from the significant body of theoretical and applied work on minor embedding. For example, the separation of gate decomposition and circuit routing can be thought of as corresponding to the separation of the parameter-setting and minor-embedding aspects of compilation in quantum annealing.
While our motivation and focus is NISQ-era devices, our results may continue to be applicable even with full error correction. In the surface code, for example, the dominant cost with respect to both time and qubits is the implementation of T gates; in comparison, the cost of swap gates is negligible, and thus so is the overhead in overcoming limited connectivity. However, even error-corrected devices benefit from parallelization. Our constructions for swap networks imply a scheme for parallelization, which may be of use independent of any mapping to physical qubits. For problems arising from the Jordan-Wigner transformation of fermionic Hamiltonians, the swap networks are just as useful even with “free” (fermionic) swap gates. In that case, the locality to be addressed is not spatial locality but the number of qubits that each gate acts on, which must be bounded even in the error-corrected regime. The same applies to proposed ion trap implementations with effectively all-to-all connectivity.
There are several directions for future work. Of most practical interest is lowering the abstraction level, that is, using the high-level constructions presented here to compile specific families of circuits to low-level hardware with restricted gate sets and variable durations. This is a necessary step in a more general program of directly comparing swap-network-based methods to alternative approaches, with respect to quantum resources, basis-set errors, Trotter errors, etc. Furthermore, for some algorithms, there is freedom in the choice of operator at certain stages in the algorithm. For example, for the alternative mixers in the Quantum Alternating Operator Ansatz Hadfield et al. (2019), while reordering the gates gives different mixers, it is an open research question as to which mixers and which orders provide the best performance. Given the limited depth of NISQ devices, the efficiency with which qubit routing can be achieved for a given operator significantly impacts the choice of operator. The techniques described here provide a key step toward exploring these tradeoffs.
There is also further work to be done in the present abstraction level. Specifically, our construction for local gate sets is likely suboptimal with respect to constant factors, and may be improved. The same goes for . We also focus only on the routing problem for unordered sets of gates, in which there is no precedence structure to be enforced on the logical gates; examples of solutions to the ordered problem would significantly broaden the usefulness of this approach. One limited example would be the iterated circuits of a complete variational algorithm or Trotterbased simulation, whereas in the present work we focused on a single iteration.
More generally, with this work we have established a foundation for designing swap networks for more applications and more architectures. A more comprehensive understanding of how well different architectures support the topologies of different applications can be the foundation for codesign in both directions: in one direction motivating new architectures by how well they are suited generally or specifically to applications, and in the other direction tweaking problems in a way that doesn’t degrade the value of their solution but that makes them more efficiently solvable on a quantum computer.
Acknowledgements.
This work was supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Quantum Algorithm Teams Program, under contract number DE-AC02-05CH11231. We are also grateful for support from the NASA Ames Research Center, the NASA Advanced Exploration Systems (AES) program, the NASA Transformative Aeronautic Concepts Program (TACP), and from the AFRL Information Directorate under grant F4HBKC4162G001. B.O. was supported by a NASA Space Technology Research Fellowship.
Appendix A Instance-independent embedding for quantum annealing
Quantum annealing is an alternative model of quantum computation for minimizing a classical pseudo-Boolean function, in which the Hamiltonian is slowly changed from an initial Hamiltonian into the problem Hamiltonian, whose ground state(s) we would like to find. Often, the desired problem Hamiltonian cannot be implemented directly on a physical quantum annealer due to limited connectivity. To overcome this limitation, each logical qubit in the problem Hamiltonian can be mapped to a connected set of physical qubits which are coupled together with a ferromagnetic field that induces them to take on the same value. In the standard case in which the problem Hamiltonian is 2-local (i.e., the function is quadratic), it can be considered as a graph, and this mapping from logical to physical qubits as a minor embedding into the hardware graph. For example, Choi Choi (2011) gave a family of minor embeddings of the complete graph into a so-called Triad hardware graph (similar to the Chimera hardware graph used by D-Wave) in which the number of physical qubits scales quadratically with the number of logical qubits, which is optimal for bounded-degree hardware graphs. Zaribafiyan et al. Zaribafiyan et al. (2017) provide a deterministic embedding for Cartesian product graphs.
In practice, problem graphs of interest are usually much sparser than the complete graph, or the Cartesian product graphs, and so using an embedding for the complete graph is likely to use more physical qubits than necessary. Specifically, most problems run on the D-Wave quantum annealer make use of D-Wave’s heuristic embedding software Cai et al. (2014). Many practitioners thus use instance-specific embeddings to maximize the use of scarce resources. The problem, however, is the difficulty of finding such instance-specific embeddings. An approach similar to the one we used for quantum circuits can be taken. Instead of using an embedding of either the complete graph (which is trivial to find but resource-inefficient) or a single problem graph (which is harder to find but more resource-efficient), one can use an embedding of a “supergraph” of a class of problem graphs. Such an embedding can be found either manually or algorithmically, but in any case can be reused for any instance in the class with negligible marginal cost. This approach thus strikes a potentially valuable balance between the two existing ones.
Appendix B Lower bounds
The optimality of the 2-complete swap network is easy to show. $\binom{n}{2}$ logical gates are executed in $n$ almost perfectly parallelized layers. In a reasonable accounting in which any 2-qubit gate on adjacent qubits can be done in unit time, the logical gates and swaps can be combined into one. However, for more complicated cases the reasoning becomes more involved. This section gives some methods for lower bounding the depth of solutions to the (unordered) circuit embedding problem. In particular, the lower bounds are on the depth of the 2-qubit swaps only, i.e., the “swap depth”. For a bounded-degree physical graph and bounded-locality logical graph, the logical gates that can be executed with a single, fixed mapping of logical to physical qubits, i.e., after a given swap layer, can be executed in $O(1)$ depth. In such cases, which comprise almost all of practical interest, exact lower bounds on the swap depth thus yield scaling lower bounds on the total depth.
B.1 Acquaintance time
Benjamini et al. Benjamini et al. (2014) defined the acquaintance time of a graph $G$, denoted $\mathcal{AC}(G)$, as follows. Consider placing an agent at each vertex of the graph and a series of matchings of the graph. (A matching is a set of mutually disjoint edges of a graph.) Each matching corresponds to simultaneously swapping the agents on the vertices of each edge. Such a sequence of matchings of $G$ is a strategy for acquaintance in $G$ if every pair of agents is adjacent in the graph at least once. The acquaintance time is the number of rounds (matchings) in the shortest strategy for acquaintance (and is finite if and only if the graph is connected).
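The definition can be turned into a small checker. The Python sketch below uses illustrative names and assumes that agents adjacent in the initial placement already count as acquainted; it verifies whether a given sequence of matchings is a strategy for acquaintance on a graph with vertices `0, ..., n-1`.

```python
from itertools import combinations

def is_acquaintance_strategy(n, edges, matchings):
    """Return True if the matchings acquaint every pair of agents."""
    edge_set = {frozenset(e) for e in edges}
    agent_at = list(range(n))  # agent_at[v] = agent currently on vertex v
    met = set()

    def record():
        # Agents on the two endpoints of any edge are acquainted.
        for u, v in edges:
            met.add(frozenset((agent_at[u], agent_at[v])))

    record()
    for matching in matchings:
        touched = set()
        for u, v in matching:
            if frozenset((u, v)) not in edge_set or u in touched or v in touched:
                raise ValueError("not a matching of the graph")
            touched.update((u, v))
            agent_at[u], agent_at[v] = agent_at[v], agent_at[u]
        record()
    return all(frozenset(p) in met for p in combinations(range(n), 2))

path_edges = [(0, 1), (1, 2), (2, 3)]
print(is_acquaintance_strategy(4, path_edges, [[(1, 2)], [(0, 1), (2, 3)]]))
print(is_acquaintance_strategy(4, path_edges, []))
```

The first call uses a two-round strategy on the path with four vertices; the second uses no swaps at all.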
This notion of strategies for acquaintance is a useful if limited abstraction for compiling quantum circuits around geometric constraints. As is, a strategy for acquaintance corresponds to a compilation of all 2-local gates in a hardware graph, with agents corresponding to logical qubits, vertices corresponding to physical qubits, and edges of matchings to swap gates. A gate between two logical qubits can be implemented at any point at which they are “acquainted”. This level of abstraction has the advantage and disadvantage that it disregards the exact nature of the gates. This makes it extremely general, but it also makes constructions within it somewhat approximate. For example, in a strategy for acquaintance, it is permissible for an agent to become acquainted with more than one other agent in a single round, while the corresponding 2-local gates would need to be implemented sequentially.
Nevertheless, known results about acquaintance times Benjamini et al. (2014); Angel and Shinkar (2016) can be interpreted in the context of quantum circuit embedding. For example, that the acquaintance time of the path graph is provides an alternative proof of the optimality of the complete linear swap network. Interestingly, the acquaintance time of the barbell graph (two fully connected halves connected by a single edge) is also . Generally, it is known that for a graph of maximum degree , , which in particular implies that for any graph . There are also hardness results: is NP-hard to approximate within a multiplicative factor of or within any additive constant factor.
A strategy for acquaintance as defined above requires that every pair of agents become acquainted. However, it will often be the case that we care only about certain pairs of agents, or largersized sets of agents. We now define a generalization of acquaintance time that may be of value in finding lower bounds in such cases. Let be the hypergraph whose vertices correspond to the agents and whose hyperedges correspond to the sets of agents that we would like to acquaint. We can then define a strategy for acquaintance in as an initial (injective) mapping of the vertices of to the vertices of and a sequence of matchings as above such that, for every edge of , if agent is placed on vertex in , on vertex , and so on, then the set of agents can be acquainted at some point. Whether a set of agents can be acquainted given their locations on the vertices of can be specified in one of two ways. In the first case, itself is a hypergraph and the agents can be acquainted if their positions are a hyperedge of , where is the location of agent after rounds. In the alternative, is a simple graph, and the agents can be acquainted if their positions form a connected subgraph of . The latter is closer to our application of strategies for acquaintance: the physical graph specifies on which pairs of qubits a qubit gate can be applied, and higherlocality gates are decomposed using such qubit gates. The acquaintance time of , denoted then is the minimal size of a strategy for acquaintance in . Note that this definition does not assume that .
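The second criterion (agents are acquainted when their positions form a connected subgraph of a simple graph) can be sketched as follows. All names are illustrative assumptions; the physical graph is given as an adjacency dictionary, the initial placement as an injective map from agents to vertices, and there may be more vertices than agents.

```python
def induces_connected_subgraph(adj, vertices):
    """True if the given vertices induce a connected subgraph of adj."""
    vertices = set(vertices)
    start = next(iter(vertices))
    stack, seen = [start], {start}
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w in vertices and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vertices

def acquaints_all(adj, placement, matchings, hyperedges):
    """Check that every hyperedge (set of agents) is acquainted at some point."""
    pos = dict(placement)  # agent -> vertex
    remaining = {frozenset(e) for e in hyperedges}

    def record():
        for e in list(remaining):
            if induces_connected_subgraph(adj, [pos[a] for a in e]):
                remaining.discard(e)

    record()
    for matching in matchings:
        at = {v: a for a, v in pos.items()}  # vertex -> agent before this round
        for u, v in matching:
            a, b = at.get(u), at.get(v)      # either endpoint may be empty
            if a is not None:
                pos[a] = v
            if b is not None:
                pos[b] = u
        record()
    return not remaining

line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
placement = {"a": 0, "b": 1, "c": 3}
print(acquaints_all(line, placement, [[(2, 3)]], [("a", "b", "c")]))
```

Here a single swap moves agent `c` next to the other two, after which the three occupied vertices form a connected path.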
B.2 Circuit embeddings as minor embeddings
This section assumes that the reader is familiar with the basic ideas of graph minor embeddings and treewidth; see Klymko et al. Klymko et al. (2014) for a brief introduction to these ideas in a related context. All graphs in this section will be assumed to have edges of size 2.
Consider a strategy for acquaintance in with rounds. Let be the strong product of and the path graph on vertices. That is,
(13)  
(14) 
The strategy for acquaintance in can be interpreted as a graph minor embedding of into as follows. Figure 8 shows an example for and . The “agents” are the vertices of . The vertex model of is the set of vertices corresponding to the series of assignments of to vertices of . Note that this vertex model is connected (indeed, a simple path) and that the vertex models of distinct vertices are disjoint, by the properties of an acquaintance strategy. The edge model of an edge is for some round in which the vertices and are assigned to adjacent vertices of . For any graphs and , if is a minor of , then and , because any path or tree decomposition for can be converted into one for by edgecontracting the vertex models, without increasing the relevant width. In our case, we have shown that is a minor of whenever there exists a round strategy for acquaintance in . Therefore,
(15) 
and similarly for treewidth.
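For readers who want to experiment with the graph used in this argument, the following Python sketch builds the strong product of two simple graphs using the standard definition: two distinct vertices are adjacent when each coordinate is either equal or adjacent in its factor. The function names are illustrative and not from the paper.

```python
from itertools import product

def strong_product(vertices_a, edges_a, vertices_b, edges_b):
    """Return (vertices, edges) of the strong product of two simple graphs."""
    ea = {frozenset(e) for e in edges_a}
    eb = {frozenset(e) for e in edges_b}
    vertices = list(product(vertices_a, vertices_b))
    edges = set()
    for (u, i), (v, j) in product(vertices, repeat=2):
        if (u, i) == (v, j):
            continue
        close_a = u == v or frozenset((u, v)) in ea  # equal or adjacent in A
        close_b = i == j or frozenset((i, j)) in eb  # equal or adjacent in B
        if close_a and close_b:
            edges.add(frozenset(((u, i), (v, j))))
    return vertices, edges

def path_graph(m):
    """Path graph on m vertices labeled 0, ..., m-1."""
    return list(range(m)), [(i, i + 1) for i in range(m - 1)]

# Physical path on 3 vertices, times a "time" path with 2 vertices (1 round).
va, ea = path_graph(3)
vb, eb = path_graph(2)
vertices, edges = strong_product(va, ea, vb, eb)
print(len(vertices), len(edges))
```

In the second factor each vertex plays the role of a time step, so an agent's trajectory through the rounds traces out a path in the product.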
We show now that, for an arbitrary graph on vertices, the pathwidth is at most about one more than the acquaintance time in the path graph ,
(16) 
We do so by explicitly constructing a path decomposition of a graph from a strategy for acquaintance in . Consider such a strategy and let be the assignment of vertex after round . We can construct a path decomposition with bags as follows. Each bag corresponds to an edge of and contains all the vertices of that are assigned to an vertex of adjacent to . The bags form the path graph corresponding to the line graph of . Each bag can contain at most vertices, where is the number of rounds in the strategy for acquaintance. Lastly, the number of rounds in the strategy is at least the minimum number of rounds and the pathwidth of the graph is at most the width of this decomposition, yielding the desired inequality.
One application of this inequality is yet another lower bound on the swap depth of a complete swap network. Equation 16 and the fact that imply that
(17)  
(18)  
(19) 
Note that Equation 16 is not necessarily tight for arbitrary graphs. For example, consider the star graph $K_{1,n}$ for large $n$. It has pathwidth 1 (consider the decomposition in which there is a bag for each leaf containing that leaf and the internal vertex), but the minimum swap circuit depth is $\Omega(n)$. More generally, caterpillar graphs exemplify the looseness of the above bound for the same reason; the minimum depth of a swap circuit for any graph scales linearly with the degree of the graph.
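The pathwidth claim for the star graph can be verified directly. The sketch below, with illustrative function names, checks the three standard conditions of a path decomposition (every vertex is covered, every edge lies in some bag, and the bags containing a given vertex are consecutive) and applies them to the decomposition with one bag per leaf.

```python
def is_path_decomposition(vertices, edges, bags):
    """Check the path-decomposition conditions for a sequence of bags."""
    covered = set().union(*bags) if bags else set()
    if covered != set(vertices):
        return False
    if not all(any({u, v} <= bag for bag in bags) for u, v in edges):
        return False
    for v in vertices:
        idx = [i for i, bag in enumerate(bags) if v in bag]
        if idx[-1] - idx[0] + 1 != len(idx):  # occurrences must be consecutive
            return False
    return True

def star_decomposition(num_leaves):
    """Star with center 0 and leaves 1..num_leaves; one bag per leaf."""
    vertices = list(range(num_leaves + 1))
    edges = [(0, leaf) for leaf in range(1, num_leaves + 1)]
    bags = [{0, leaf} for leaf in range(1, num_leaves + 1)]
    return vertices, edges, bags

vertices, edges, bags = star_decomposition(5)
width = max(len(bag) for bag in bags) - 1
print(is_path_decomposition(vertices, edges, bags), width)
```

The width of a decomposition is one less than its largest bag, so this decomposition certifies pathwidth at most 1 regardless of the number of leaves.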
References
 Boixo et al. (2018) S. Boixo, S. V. Isakov, V. N. Smelyanskiy, R. Babbush, N. Ding, Z. Jiang, M. J. Bremner, J. M. Martinis, and H. Neven, Nature Physics , 1 (2018).
 Shor (1999) P. Shor, SIAM Review 41, 303 (1999), https://doi.org/10.1137/S0036144598347011 .
 Grover (1996) L. K. Grover, in Proceedings of the Twenty-eighth Annual ACM Symposium on Theory of Computing, STOC ’96 (ACM, New York, NY, USA, 1996) pp. 212–219.
 Fowler et al. (2009) A. G. Fowler, A. M. Stephens, and P. Groszkowski, Phys. Rev. A 80, 052312 (2009).
 Preskill (2018) J. Preskill, arXiv preprint arXiv:1801.00862 (2018).
 Brierley (2017) S. Brierley, Quantum Information & Computation 17, 1096 (2017).
 Childs et al. (2019) A. M. Childs, E. Schoute, and C. M. Unsal, arXiv preprint arXiv:1902.09102 (2019).
 Hirata et al. (2009) Y. Hirata, M. Nakanishi, S. Yamashita, and Y. Nakashima, in 2009 Third International Conference on Quantum, Nano and Micro Technologies (2009) pp. 26–33.
 Maslov et al. (2007) D. Maslov, S. M. Falconer, and M. Mosca, in Proceedings of the 44th annual Design Automation Conference (ACM, 2007) pp. 962–965.
 Bhattacharjee and Chattopadhyay (2017) D. Bhattacharjee and A. Chattopadhyay, “Depth-optimal quantum circuit placement for arbitrary topologies,” (2017), arXiv:1703.08540 [cs.ET] .
 Siraichi et al. (2018) M. Y. Siraichi, V. F. d. Santos, S. Collange, and F. M. Q. Pereira, in Proceedings of the 2018 International Symposium on Code Generation and Optimization, CGO 2018 (ACM, New York, NY, USA, 2018) pp. 113–125.
 Li et al. (2018) G. Li, Y. Ding, and Y. Xie, “Tackling the qubit mapping problem for NISQ-era quantum devices,” (2018), arXiv:1809.02573 [cs.ET] .
 Lye et al. (2015) A. Lye, R. Wille, and R. Drechsler, in The 20th Asia and South Pacific Design Automation Conference (2015) pp. 178–183.
 Venturelli et al. (2018) D. Venturelli, M. Do, E. Rieffel, and J. Frank, Quantum Science and Technology 3, 025004 (2018).
 Booth et al. (2018) K. E. Booth, M. Do, J. C. Beck, E. Rieffel, D. Venturelli, and J. Frank, arXiv preprint arXiv:1803.06775 (2018).
 Saeedi et al. (2011) M. Saeedi, R. Wille, and R. Drechsler, Quantum Information Processing 10, 355 (2011).
 Wille et al. (2016) R. Wille, O. Keszocze, M. Walter, P. Rohrs, A. Chattopadhyay, and R. Drechsler, in 2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC) (2016) pp. 292–297.
 Lin et al. (2015) C. Lin, S. Sur-Kolay, and N. K. Jha, IEEE Transactions on Very Large Scale Integration (VLSI) Systems 23, 1221 (2015).
 Zulehner et al. (2018) A. Zulehner, A. Paler, and R. Wille, in 2018 Design, Automation Test in Europe Conference Exhibition (DATE) (2018) pp. 1135–1138.
 Herbert and Sengupta (2018) S. Herbert and A. Sengupta, arXiv preprint arXiv:1812.11619 (2018).
 Farhi et al. (2017) E. Farhi, J. Goldstone, S. Gutmann, and H. Neven, arXiv e-prints , arXiv:1703.06199 (2017), arXiv:1703.06199 [quant-ph] .
 Kandala et al. (2017) A. Kandala, A. Mezzacapo, K. Temme, M. Takita, M. Brink, J. M. Chow, and J. M. Gambetta, Nature 549, 242 (2017).
 Kivlichan et al. (2018) I. D. Kivlichan, J. McClean, N. Wiebe, C. Gidney, A. Aspuru-Guzik, G. K.-L. Chan, and R. Babbush, Physical Review Letters 120, 110501 (2018).
 Jiang et al. (2018) Z. Jiang, K. J. Sung, K. Kechedzhi, V. N. Smelyanskiy, and S. Boixo, Physical Review Applied 9, 044036 (2018).
 Corboz and Vidal (2009) P. Corboz and G. Vidal, Physical Review B 80, 165129 (2009).
 Motta et al. (2018) M. Motta, E. Ye, J. R. McClean, Z. Li, A. J. Minnich, R. Babbush, and G. K. Chan, arXiv preprint arXiv:1808.02625 (2018).
 Crooks (2018) G. E. Crooks, arXiv preprint arXiv:1811.08419 (2018).
 Beals et al. (2013) R. Beals, S. Brierley, O. Gray, A. W. Harrow, S. Kutin, N. Linden, D. Shepherd, and M. Stather, Proc. R. Soc. A 469, 20120686 (2013).
 Aspuru-Guzik et al. (2005) A. Aspuru-Guzik, A. D. Dutoi, P. J. Love, and M. Head-Gordon, Science 309, 1704 (2005).
 Ortiz et al. (2001) G. Ortiz, J. Gubernatis, E. Knill, and R. Laflamme, Physical Review A 64, 022319 (2001).
 Bravyi and Kitaev (2002) S. B. Bravyi and A. Y. Kitaev, Annals of Physics 298, 210 (2002).
 Bravyi et al. (2017) S. Bravyi, J. M. Gambetta, A. Mezzacapo, and K. Temme, arXiv preprint arXiv:1701.08213 (2017).
 Peruzzo et al. (2014) A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O’Brien, Nature Communications 5, 4213 (2014).
 McClean et al. (2016) J. R. McClean, J. Romero, R. Babbush, and A. Aspuru-Guzik, New Journal of Physics 18, 023023 (2016).
 Romero et al. (2018) J. Romero, R. Babbush, J. R. McClean, C. Hempel, P. J. Love, and A. Aspuru-Guzik, Quantum Science and Technology 4, 014008 (2018).
 Lee et al. (2018) J. Lee, W. J. Huggins, M. Head-Gordon, and K. B. Whaley, Journal of Chemical Theory and Computation (2018).
 Hastings et al. (2015) M. B. Hastings, D. Wecker, B. Bauer, and M. Troyer, Quantum Information & Computation 15, 1 (2015).
 Farhi et al. (2014) E. Farhi, J. Goldstone, and S. Gutmann, arXiv preprint arXiv:1411.4028 (2014).
 Farhi and Harrow (2016) E. Farhi and A. W. Harrow, arXiv preprint arXiv:1602.07674 (2016).
 Hadfield et al. (2019) S. Hadfield, Z. Wang, B. O’Gorman, E. G. Rieffel, D. Venturelli, and R. Biswas, Algorithms 12, 1 (2019).
 Childs et al. (2018) A. M. Childs, A. Ostrander, and Y. Su, arXiv preprint arXiv:1805.08385 (2018).
 Nooijen (2000) M. Nooijen, Physical review letters 84, 2108 (2000).
 Wecker et al. (2015) D. Wecker, M. B. Hastings, and M. Troyer, Phys. Rev. A 92, 042303 (2015).
 Choi (2011) V. Choi, Quantum Information Processing 10, 343 (2011).
 Zaribafiyan et al. (2017) A. Zaribafiyan, D. J. Marchand, and S. S. C. Rezaei, Quantum Information Processing 16, 136 (2017).
 Cai et al. (2014) J. Cai, W. G. Macready, and A. Roy, arXiv preprint arXiv:1406.2741 (2014).
 Benjamini et al. (2014) I. Benjamini, I. Shinkar, and G. Tsur, SIAM Journal on Discrete Mathematics 28, 767 (2014).
 Angel and Shinkar (2016) O. Angel and I. Shinkar, Graphs and Combinatorics 32, 1667 (2016).
 Klymko et al. (2014) C. Klymko, B. D. Sullivan, and T. S. Humble, Quantum information processing 13, 709 (2014).