Algorithm engineering for a quantum annealing platform
Recent advances have brought within reach the solution of combinatorial problems by a quantum annealing algorithm implemented on a purpose-built platform that exploits quantum properties. However, how to tune the algorithm for most effective use in this framework is not well understood. In this paper we describe some operational parameters that drive performance, discuss approaches for mitigating sources of error, and present experimental results from a D-Wave Two quantum annealing processor.
1 Introduction
In the last three decades researchers in algorithm engineering have identified many strategies for bridging the gap between abstract algorithm and concrete implementation to yield practical performance improvements; see [21] or [24] for an overview. In this paper we apply this conceptual framework in a novel context: improving the performance of a quantum annealing algorithm implemented on a purpose-built platform. Quantum annealing (QA) is a heuristic method for solving combinatorial optimization problems, similar to simulated annealing. The platform is a D-Wave Two system (Footnote 1: D-Wave, D-Wave Two, and Vesuvius are trademarks of D-Wave Systems Inc.), which exploits quantum properties to solve instances of the NP-hard Ising Minimization Problem (IMP).
Several research groups have reported on experimental work to understand the performance of D-Wave systems; see for example [5], [6], [14], [23], [27], [29], and [30]. Building on this experience we describe an emerging performance model that helps to distinguish the algorithm from its realization on a physical platform. Using this model we present a collection of strategies for improving computation times in practice. Our discussion exposes similarities as well as differences between algorithm engineering approaches to quantum and classical computation.
The remainder of this section presents a quick overview of the quantum annealing algorithm and its realization in D-Wave hardware. Section 2 surveys the main factors that drive performance. Section 3 presents our strategies together with experimental results to study their efficacy. Section 4 presents a few concluding remarks.
The native problem
An input instance to the Ising Minimization Problem (IMP) is described by a Hamiltonian $H_P = (h, J)$ containing a vector $h$ of local fields and a matrix $J$ of couplings (usually upper-triangular). We may consider weights $h_i$ and nonzero $J_{ij}$ to be assigned to vertices and edges of a graph $G = (V, E)$. The problem is to find an assignment of spin values $s \in \{-1, +1\}^n$ (i.e. a spin configuration or spin state) that minimizes the function
$$E(s) = \sum_{i \in V} h_i s_i + \sum_{(i,j) \in E} J_{ij}\, s_i s_j. \qquad (1)$$
This problem has origins in statistical physics, where $E(s)$ defines the energy of a given spin state $s$. A ground state has minimum energy. A non-ground state is called an excited state; a first excited state has the lowest energy among excited states. Notice how the signs of $J_{ij}$ affect this function: a term with $J_{ij} < 0$, called a ferromagnetic coupling, is minimized when $s_i = s_j$; an antiferromagnetic coupling term with $J_{ij} > 0$ is minimized when $s_i \neq s_j$. The problem is NP-hard when $G$ is nonplanar [15].
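To make Equation (1) concrete, here is a minimal Python sketch that evaluates $E(s)$ and finds the ground states of a tiny instance by exhaustive enumeration. The dictionary representation of $h$ and $J$ and the function names are our own illustrative choices; enumeration costs $2^n$ energy evaluations, so this is useful only for checking very small examples.

```python
from itertools import product

def ising_energy(s, h, J):
    """E(s) = sum_i h_i s_i + sum_(i,j) J_ij s_i s_j, as in Equation (1).

    s: mapping vertex -> spin in {-1, +1}
    h: mapping vertex -> local field h_i
    J: mapping (i, j) -> coupling J_ij
    """
    field_term = sum(h_i * s[i] for i, h_i in h.items())
    coupling_term = sum(J_ij * s[i] * s[j] for (i, j), J_ij in J.items())
    return field_term + coupling_term


def brute_force_ground_states(vertices, h, J):
    """Enumerate all 2^n spin states; return (ground-state energy, ground states)."""
    best, ground = None, []
    for spins in product((-1, 1), repeat=len(vertices)):
        s = dict(zip(vertices, spins))
        e = ising_energy(s, h, J)
        if best is None or e < best:
            best, ground = e, [s]
        elif e == best:
            ground.append(s)
    return best, ground


# A ferromagnetic coupling (J < 0) favors equal spins...
print(brute_force_ground_states([0, 1], {}, {(0, 1): -1}))
# ...while an antiferromagnetic triangle (J > 0) is frustrated: 6 degenerate ground states.
e, ground = brute_force_ground_states([0, 1, 2], {}, {(0, 1): 1, (1, 2): 1, (0, 2): 1})
print(e, len(ground))  # -1 6
```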
Quantum annealing
While a classical bit takes discrete values 0 or 1, a quantum bit (qubit) is capable of superposition, which means that it is simultaneously in both states; thus a register of $n$ qubits can represent all $2^n$ possible states simultaneously. When a qubit is read, its superposition state “collapses” probabilistically to a classical state, which we interpret as a spin $-1$ or $+1$.
Qubits act as particles in a quantum-mechanical system that evolves under forces described by a time-dependent Hamiltonian $H(t)$. For a given Hamiltonian they naturally seek their ground state, just as water seeks the lowest point in a landscape. Since a superposition is represented not by a single state but by a probability mass, we can think of it as moving through hills in a porous landscape; this is sometimes called tunneling. A quantum annealing algorithm exploits this property to perform an analog computation defined by the following components.

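(C1) The initial Hamiltonian $H_I$ puts each qubit into superposition whereby spins are independent and equiprobable.

(C2) The problem Hamiltonian $H_P$ matches the objective function (1) so that a ground state corresponds to an optimal solution to the problem.

(C3) The path functions $A(\tau)$ and $B(\tau)$ define a transition from $H_I$ to $H_P$, where $A(\tau)$ dominates at the start and $A(\tau) \to 0$ as $\tau \to 1$, leaving only the $B(\tau)\,H_P$ term. The parameter $\tau = \tau(t) \in [0, 1]$ controls the rate of change (possibly speeding up or slowing down) as time moves from start $t = 0$ to finish $t = t_a$.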
The entire algorithm is defined by the time-dependent Hamiltonian:
$$H(t) = A(\tau(t))\, H_I + B(\tau(t))\, H_P. \qquad (2)$$
A QA algorithm can be simulated classically using many random states to model superposition: (C3) is analogous to a simulated annealing schedule, except that it modifies the problem landscape rather than a traversal probability; (C1) corresponds to choosing random initial states in a flat landscape; and (C2) to the target solution. See [10], [17] or [20]. QA belongs to the adiabatic model of quantum computation (AQC), which is a polynomially equivalent [2, 13] alternative to the more familiar quantum gate model. QA algorithms typically use problem Hamiltonians from a subclass of those in the full AQC model; thus QA computation is likely not universal, although the question is open (see [22]).
2 Hardware platform and cost models
A D-Wave Two (DW2) platform contains a quantum annealing chip that physically realizes the algorithm in Equation (2). Qubits and the couplers connecting them are made of microscopic superconducting loops of niobium, which exhibit quantum properties at the processor’s operating temperature, typically below 20 mK. See [7] for an overview.
The annealing process is managed by a framework of analog control devices that relay signals between a conventional CPU and the qubits and couplers onboard the chip, in stages as follows.

Programming. The weights $(h, J)$ are loaded onto the chip. Elapsed time $= t_p$.

Annealing. The algorithm in (2) is carried out. Time $= t_a$.

Sampling. Qubit states are measured, yielding a solution $s$. Time $= t_r$.

Resampling. Steps 2 and 3 are repeated to obtain some number $k$ of sampled solutions.
Total computation time is therefore equal to
$$T = t_p + k\,(t_a + t_r). \qquad (3)$$
Component times vary from machine to machine; the system used in our tests has the operating parameters shown in Table 1. Note that total time is dominated by what are essentially I/O costs (programming and readout); successive processor models have generally shown reductions in these times, and this trend is expected to continue. Anneal time $t_a$ can be set by the programmer; the minimum setting is dictated by the system’s ability to shape $A(\tau)$ and $B(\tau)$.
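Table 1: Operating parameters of the V7 system used in our tests.
| name | qubits | couplers | temperature | $t_p$ | $t_a$ | $t_r$ |
|---|---|---|---|---|---|---|
| V7 | 481 | 1306 | 14 mK | 30 ms | 20 µs | 116 µs |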
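As a quick check on Equation (3), the following Python sketch plugs the Table 1 component times into $T = t_p + k(t_a + t_r)$ for a few sample counts $k$ and reports the share of time spent on programming and readout. It assumes $t_a$ is held at the 20 µs value in the table; the function name is illustrative.

```python
def total_time_us(t_p_us, t_a_us, t_r_us, k):
    """Equation (3): T = t_p + k * (t_a + t_r), with all times in microseconds."""
    return t_p_us + k * (t_a_us + t_r_us)


# Component times for V7 from Table 1, in microseconds.
T_P, T_A, T_R = 30_000, 20, 116

for k in (1, 100, 1000, 10_000):
    T = total_time_us(T_P, T_A, T_R, k)
    io_share = (T_P + k * T_R) / T  # programming + readout, the "I/O" part of T
    print(f"k={k:>6}: T = {T / 1000:8.2f} ms, I/O share = {io_share:.1%}")
```

For $k = 1000$ this gives $T = 166$ ms, of which about 88% is programming and readout, consistent with the observation that I/O costs dominate.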
2.1 Analysis
In algorithm engineering we can identify different levels of instantiation in a spectrum that includes the pencil-and-paper algorithm, an implementation in a high-level language, and a sequence of machine instructions. The definition of time performance (dominant cost vs. CPU time) and the set of strategies for reducing it (asymptotics vs. low-level coding) depend on the level being considered. This framework applies to quantum as well as classical computation. This subsection describes instantiation layers and cost models for the quantum annealing algorithm realized on D-Wave platforms.
Asymptotics of closed-system AQC
Abstract AQC algorithms have been developed for many computational problems; see [22] for examples. For a given algorithm (a generalization of (2)) and input of size $n$, let $g_{\min}$ denote the minimum spectral gap: the smallest difference between the energies of the ground state and the first excited state at any time $0 \le t \le t_a$. Under certain assumed conditions, if $t_a$ is above a threshold that is polynomial in $1/g_{\min}$ (commonly stated as $O(1/g_{\min}^2)$), then the computation will almost surely finish in a ground state. Setting $t_a$ below the threshold increases the probability that a non-optimal solution is returned. Typically $g_{\min}$ is difficult to compute and bounds are known only for simple scenarios; some algorithm design strategies have been identified for “growing the gap” to reduce asymptotic computation times. See [12], [13], [16], [22], or [25] for more.
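The definition of $g_{\min}$ can be illustrated on the smallest possible case. The Python sketch below assumes, purely for illustration, linear path functions $A(\tau) = 1 - \tau$ and $B(\tau) = \tau$, a transverse-field initial Hamiltonian $H_I = -\sigma^x$ (a common textbook choice; the text does not specify $H_I$), and a one-qubit problem Hamiltonian $H_P = h\,\sigma^z$. It scans $\tau \in [0, 1]$ and reports the smallest gap between the two energy levels. A single qubit has no hard bottleneck, so this shows only the definition, not the small gaps of real instances, which require diagonalizing $2^n \times 2^n$ matrices.

```python
import math

def gap_single_qubit(tau, h):
    """Energy gap of H(tau) = (1 - tau) * (-sigma_x) + tau * h * sigma_z.

    In the basis (spin +1, spin -1) this is the real symmetric 2x2 matrix
        [[ tau*h,     -(1-tau)],
         [-(1-tau),   -tau*h  ]],
    whose eigenvalues are +/- sqrt((tau*h)^2 + (1-tau)^2).
    """
    a, d, b = tau * h, -tau * h, -(1.0 - tau)
    # Eigenvalues of [[a, b], [b, d]] are (a+d)/2 +/- sqrt(((a-d)/2)^2 + b^2).
    return 2.0 * math.sqrt(((a - d) / 2.0) ** 2 + b ** 2)


def min_gap(h, steps=1000):
    """Scan tau on a uniform grid over [0, 1]; return (g_min, tau_at_min)."""
    return min((gap_single_qubit(i / steps, h), i / steps) for i in range(steps + 1))


g, tau = min_gap(1.0)
print(f"g_min = {g:.4f} at tau = {tau:.3f}")  # g_min = 1.4142 at tau = 0.500
```

For $h = 1$ the scan returns $g_{\min} = \sqrt{2}$ at $\tau = 1/2$, matching the closed form $g_{\min} = 2|h|/\sqrt{1 + h^2}$, attained at $\tau = 1/(1 + h^2)$.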
Quantum computation in the real world
Asymptotic analysis assumes that the algorithm runs in a closed system in perfect isolation from external sources of energy (thermal, electrical, magnetic, etc.). It is a matter of natural law, however, that any physically realized quantum computer runs in an open system and suffers interference from the environmental “energy bath.” Environmental interference may reduce the probability of finishing in a ground state; in particular, the theoretical annealing time threshold depends on both $g_{\min}$ and the ambient temperature [3], implying that colder is faster. In practice, however, there is evidence that the thermal bath can substantially increase the probability of success [11].
Realization on D-Wave platforms
In addition to the above non-ideality, the DW2 architecture imposes some restrictions on inputs:

The connection topology defines a hardware graph $G_H = (V_H, E_H)$, a subgraph of a Chimera graph [7] containing 512 qubits. An IMP instance defined on a general graph $G$ must be minor-embedded onto $G_H$. In the worst case this requires an expansion in problem size [9]; in practice we use a heuristic approach described in [8]. See Appendix A for more about Chimera graphs and minor-embeddings.

The elements of $h$ and $J$ must lie within the real range accepted by the hardware. This can be achieved by scaling general $h$ and $J$ by a positive constant factor $\alpha$.

The weights $(h, J)$, specified as floats, are transmitted imperfectly by the analog control circuitry. As a result, they experience perturbations of various sorts: systematic (biased), random, persistent, and transient. The perturbations are collectively referred to as intrinsic control error (ICE). Because of ICE, the problem Hamiltonian solved by the chip may be slightly different from the problem Hamiltonian specified by the programmer. (Footnote 2: This is in contrast to the meme that Hamiltonian misspecification is due to calibration errors (cf. [29]). Calibration errors, which are systematic and relatively fixed, represent only a small component of ICE.)
Putting all this together, total computation time in Equation (3) depends on the probability $p$ of observing a successful outcome (a ground state) in a single sample. In theory, $p$ depends on a threshold value for $t_a$, which is typically unknown in open-system computing. Because of Hamiltonian misspecification we may prefer … in order to sample solutions near the (wrong) ground state; if $p$ is too small, $k$ can be increased to improve the overall success probability. Just as in classical computing, there is a trade-off between time and solution quality, although very little is known about the nature of that trade-off.
In what follows, we calculate the empirical success probability $\hat{p}$ for a given input as the proportion of successful samples among the $k$ samples drawn from the hardware, using various definitions of success in order to examine the relationship between computation time and solution quality. We calculate the expected number of samples required to observe a successful outcome with probability at least 0.99: this is $R_{99} = \log(1 - 0.99)/\log(1 - \hat{p})$. The corresponding computation time, ST99, is found by combining $R_{99}$ with the component times as in (3).
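The $R_{99}$ formula needs care at the edges ($\hat{p} = 0$ or $1$), so here is a minimal Python sketch. The function names are illustrative, and the sketch charges a single programming time $t_p$; how $t_p$ is charged when samples are spread over several gauge transformations is left to the caller. Some authors also round $R_{99}$ up to an integer; the formula in the text does not, so neither does the sketch.

```python
import math

def r99(p_hat, target=0.99):
    """Expected number of samples needed to see at least one success with
    probability >= target: R = log(1 - target) / log(1 - p_hat)."""
    if p_hat <= 0.0:
        return math.inf  # no successes observed: ST99 is infinite
    if p_hat >= 1.0:
        return 1.0       # every sample succeeds
    return math.log(1.0 - target) / math.log(1.0 - p_hat)


def st99_us(p_hat, t_a_us, t_r_us, t_p_us=0.0):
    """Combine R99 with the component times of Equation (3), in microseconds."""
    return t_p_us + r99(p_hat) * (t_a_us + t_r_us)


# Example: 250 successes among 10,000 samples, with Table 1 anneal/readout times.
p_hat = 250 / 10_000
print(r99(p_hat), st99_us(p_hat, t_a_us=20, t_r_us=116))
```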
2.1.1 ICE: The error model
Our simplified model of ICE, which will be described more fully in a forthcoming paper, assumes that the problem Hamiltonian $(h, J)$ is perturbed by an error Hamiltonian $(\delta h, \delta J)$, where the $\delta h_i$ and $\delta J_{ij}$ are independent Gaussians with mean 0 and standard deviations $\sigma_h$ and $\sigma_J$, which vary by chip (and generally decrease with new models). For V7 (see Table 1) we have $\sigma_h \approx \ldots$ and $\sigma_J \approx \ldots$. These errors are relative to the nominal scale of $(h, J)$, which means that if $h$ and $J$ are scaled by $\alpha$, relative errors are amplified by a factor of $1/\alpha$. (Footnote 3: This holds for most sources of ICE, with the notable exception of background susceptibility, denoted $\chi$, which is reduced by a factor of … (see [31]). Background susceptibility is an instance-dependent, non-transient error that, in a more sophisticated error model, might be separated from Gaussian error.)
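For a given spin configuration $s$ this shifts the effective energy from $E(s)$ by a Gaussian error $\Delta E(s) = \sum_i \delta h_i s_i + \sum_{(i,j)} \delta J_{ij} s_i s_j$ with mean 0 and standard deviation $\sigma_E = \sqrt{n\,\sigma_h^2 + m\,\sigma_J^2}$, where $n$ and $m$ are the numbers of active qubits and couplers in the hardware graph. On a full-size V7 problem ($n = 481$, $m = 1306$) we have $\sigma_E \approx \ldots$. By the three-sigma rule, $|\Delta E(s)|$ lies within … and … for about 68 and 99.5 percent of spin configurations, respectively. Although $\sigma_E$ scales as $\sqrt{n}$, the typical value of … scales linearly in $n$, and is near … at full size.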
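The expression for $\sigma_E$ follows because $s_i^2 = 1$, so each error term contributes its full variance regardless of the spin state. The Python sketch below compares the closed form with a Monte Carlo draw of error Hamiltonians on a small ring. The values $\sigma_h = 0.05$ and $\sigma_J = 0.035$ used in the example are hypothetical placeholders, not the measured V7 values.

```python
import math
import random

def sigma_E(n, m, sigma_h, sigma_J):
    """Std. dev. of the ICE energy shift of any fixed spin state:
    Var[sum_i dh_i s_i + sum_ij dJ_ij s_i s_j] = n*sigma_h^2 + m*sigma_J^2, since s_i^2 = 1."""
    return math.sqrt(n * sigma_h ** 2 + m * sigma_J ** 2)


def sampled_shift_std(n, edges, sigma_h, sigma_J, trials=10_000, seed=1):
    """Monte Carlo estimate: draw Gaussian error Hamiltonians and record the
    energy shift they induce on one fixed (random) spin state."""
    rng = random.Random(seed)
    s = [rng.choice((-1, 1)) for _ in range(n)]
    shifts = []
    for _ in range(trials):
        dE = sum(rng.gauss(0.0, sigma_h) * s[i] for i in range(n))
        dE += sum(rng.gauss(0.0, sigma_J) * s[i] * s[j] for i, j in edges)
        shifts.append(dE)
    mean = sum(shifts) / trials
    return math.sqrt(sum((x - mean) ** 2 for x in shifts) / (trials - 1))


n = 20
ring = [(i, (i + 1) % n) for i in range(n)]  # toy graph: n qubits, m = n couplers
print(sigma_E(n, len(ring), 0.05, 0.035), sampled_shift_std(n, ring, 0.05, 0.035))
```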
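ICE imposes a practical limit on the precision of (scaled) weights that can be specified in successful computations. For example, if two solutions $s$ and $s'$ with $E(s) < E(s')$ differ in energy by much more than the ICE-induced shifts, it is relatively unlikely that ICE reverses their order, i.e. that $E(s') + \Delta E(s') < E(s) + \Delta E(s)$. The difficulty occurs when energy levels differ by smaller amounts, which can happen when integer weights are scaled by a small factor $\alpha$.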
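Under the Gaussian model of Section 2.1.1 one can quantify when ICE reverses the order of two solutions. The shifts $\Delta E(s)$ and $\Delta E(s')$ share the same error terms, so their difference involves only the spins and couplers on which $s$ and $s'$ disagree, each with coefficient $\pm 2$. The Python sketch below computes the resulting reversal probability; the function name and the $\sigma$ values in the example are hypothetical, and the model is the simplified one described above, not a full description of ICE.

```python
import math

def reversal_probability(s, s_prime, gap, edges, sigma_h, sigma_J):
    """Probability that ICE makes s' appear lower than s, where E(s') - E(s) = gap > 0.

    dE(s) - dE(s') = sum_i dh_i (s_i - s'_i) + sum_ij dJ_ij (s_i s_j - s'_i s'_j);
    each nonzero coefficient is +/-2, so the difference is Gaussian with variance
    4*sigma_h^2*(#differing spins) + 4*sigma_J^2*(#couplers whose product differs).
    """
    d_h = sum(1 for a, b in zip(s, s_prime) if a != b)
    d_J = sum(1 for i, j in edges if s[i] * s[j] != s_prime[i] * s_prime[j])
    sd = 2.0 * math.sqrt(sigma_h ** 2 * d_h + sigma_J ** 2 * d_J)
    if sd == 0.0:
        return 0.0  # both states see identical errors, so their order cannot flip
    return 0.5 * math.erfc(gap / (sd * math.sqrt(2.0)))  # P(N(0, sd^2) > gap)


edges = [(0, 1), (1, 2)]
s, s_prime = (1, 1, 1), (1, 1, -1)
for alpha in (1.0, 0.1):  # scaling integer weights (gap 2) down by alpha shrinks the gap
    print(alpha, reversal_probability(s, s_prime, 2.0 * alpha, edges, 0.05, 0.05))
```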
Figure 1 illustrates this effect using RAN3 instances (described in the next section) solved at full scale ($\alpha = 1$) and half scale ($\alpha = 1/2$). The left panel shows that reducing the problem scale increases ST99 roughly tenfold in the median case for the largest problems when searching for an optimal solution. The right panel shows ST99 when the success condition is to find a solution within a given error bound of the ground-state energy; at both scales, computation times shrink by more than two orders of magnitude relative to the left panel in nearly all percentiles. This suggests that reductions in ICE on future chip models are likely to boost hardware performance significantly. Analyzing performance with respect to the error bound allows us to look beyond the effect of Hamiltonian misspecification, which is detrimental to hardware success rates and may mask evidence of quantum speedup.
3 Algorithm Engineering on D-Wave Two Platforms
In this section we consider strategies for mitigating ICE-related non-ideality and small spectral gaps, with the goal of increasing success probabilities and lowering computation times.
D-Wave systems realize a specific QA algorithm in the sense that the components $H_I$ and $A(\tau), B(\tau)$ are set in firmware (see [7] for details). Here we focus on parameters that can be controlled by the programmer, namely $H_P$, $t_a$, and $k$. We also consider classical methods for preprocessing and error correction. We evaluate these strategies on the following instance classes, described more fully in Appendix B.

Random native instances (RAN). For each $(i,j) \in E_H$, $J_{ij}$ is assigned a random nonzero integer in $[-r, r]$, giving the class RANr (e.g. RAN1, RAN3). We set $h = 0$. (Footnote 4: Katzgraber et al. [18] have shown that these instances are not suitable for investigating quantum speedup because the solution landscape has many global minima and no nonzero-temperature phase transition. Consequently heuristic search algorithms act almost as random samplers, and there is no evolution of tall, thin barriers that would allow an open-system quantum annealer to exhibit an advantage through tunneling. However, this class is suitable for looking at non-quantum effects such as ICE, as we do here.)

Frustrated loop instances (FL) [14]. These are constraint satisfaction problems whose entries of $J$ are integers bounded by a precision limit (see Appendix B). They are combinatorially more interesting than RAN instances but do not require minor-embedding.

Random cubic MAXCUT instances (3MC). These are MAXCUT problems on random cubic graphs, which must be minor-embedded onto the V7 hardware graph.

Random not-all-equal 3-SAT instances (NAE). These are randomly generated problems near the SAT/UNSAT phase transition, filtered subject to having a unique solution (up to symmetry), and then minor-embedded onto the V7 hardware graph.
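For the 3MC class, one standard way to write MAXCUT as an IMP instance is to put an antiferromagnetic coupling $J_{ij} = +1$ on every edge, with $h = 0$: each cut edge contributes $-1$ and each uncut edge $+1$, so $E(s) = |E| - 2\,\mathrm{cut}(s)$ and minimizing $E$ maximizes the cut. The text does not spell out the mapping used for 3MC, so the Python sketch below should be read as an assumption; it checks the identity exhaustively on $K_4$, which is itself a cubic graph.

```python
from itertools import product

def maxcut_to_ising(edges):
    """MAXCUT as IMP: h = 0 and J_ij = +1 (antiferromagnetic) on every edge."""
    return {}, {(i, j): 1 for i, j in edges}


def cut_size(s, edges):
    """Number of edges whose endpoints lie on different sides of the cut."""
    return sum(1 for i, j in edges if s[i] != s[j])


def ising_energy(s, h, J):
    return sum(v * s[i] for i, v in h.items()) + sum(v * s[i] * s[j] for (i, j), v in J.items())


k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]  # K4 is 3-regular (cubic)
h, J = maxcut_to_ising(k4)
for spins in product((-1, 1), repeat=4):
    # E(s) = |E| - 2 * cut(s) for every spin state
    assert cut_size(spins, k4) == (len(k4) - ising_energy(spins, h, J)) // 2
print(max(cut_size(sp, k4) for sp in product((-1, 1), repeat=4)))  # 4
```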
All experiments described here use random instances generated at sizes of up to 481 qubits; the specifics of instances are given in Appendix B. Unless otherwise specified, $\hat{p}$ is calculated from 1000 samples in each of 10 gauge transformations (next section), totaling 10,000 samples. Optimal solutions are verified using an independent software solver. In rare cases a sample set will not contain an optimal solution, giving an empirical success probability $\hat{p} = 0$ and ST99 $= \infty$. To simplify data analysis we look at ST99 for the 95th and lower percentiles of each input set; missing percentile points in some graphs correspond to observations of ST99 $= \infty$.
3.1 Gauge transformations
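Given an instance $H_P = (h, J)$, one can construct a modified instance $H_P^g = (h', J')$ by flipping the signs of some subset of the spin variables, as follows: take a vector $g \in \{-1, +1\}^n$, set $h'_i = g_i h_i$ for each $i \in V$, and set $J'_{ij} = g_i g_j J_{ij}$ for each coupler $(i,j) \in E$. When solving $H_P$ in hardware, we can divide the $k$ samples among instances $H_P^{g_1}, \dots, H_P^{g_c}$, where each $H_P^{g_\ell}$ is constructed from $H_P$ by a random gauge transformation $g_\ell$; we then apply the (self-inverse) transformation $s_i \mapsto g_{\ell,i}\, s_i$ to the hardware output to obtain a solution for $H_P$. Because the gauged energy of $s'$ equals $E(s)$ whenever $s_i = g_i s'_i$, each $H_P^{g_\ell}$ has the same energy landscape up to relabeling of spins, but is exposed to ICE differently. Doing this mitigates the effects of some sources of ICE. Gauge transformations are also described in [6] and [29].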
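A gauge transformation and its inverse take only a few lines. The Python sketch below (function names are illustrative) builds $H_P^g$ from $H_P$, maps a sample of the gauged instance back via $s_i = g_i s'_i$, and checks the energy invariance that makes the technique safe.

```python
import random

def gauge_transform(h, J, g):
    """h'_i = g_i h_i and J'_ij = g_i g_j J_ij for a gauge vector g in {-1, +1}^n."""
    h_g = {i: g[i] * h_i for i, h_i in h.items()}
    J_g = {(i, j): g[i] * g[j] * J_ij for (i, j), J_ij in J.items()}
    return h_g, J_g


def undo_gauge(s_g, g):
    """Map a sample s' of the gauged instance back to s_i = g_i * s'_i (self-inverse)."""
    return {i: g[i] * s_i for i, s_i in s_g.items()}


def energy(s, h, J):
    return sum(v * s[i] for i, v in h.items()) + sum(v * s[i] * s[j] for (i, j), v in J.items())


rng = random.Random(0)
n = 8
h = {i: rng.choice((-1, 0, 1)) for i in range(n)}
J = {(i, j): rng.choice((-1, 1)) for i in range(n) for j in range(i + 1, n) if rng.random() < 0.4}
g = {i: rng.choice((-1, 1)) for i in range(n)}        # one random gauge
h_g, J_g = gauge_transform(h, J, g)
s_g = {i: rng.choice((-1, 1)) for i in range(n)}      # stand-in for a hardware sample
assert energy(s_g, h_g, J_g) == energy(undo_gauge(s_g, g), h, J)
```

In practice one would draw $c$ random gauges, collect $k/c$ samples from each $H_P^{g_\ell}$, and pass each sample through `undo_gauge` with the matching $g_\ell$.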
Figure 2 shows the effect of applying gauge transformations on RAN1 instances (left) and NAE instances (right). In both cases, gauge transformations help more on the most difficult problems (higher percentiles). This is unsurprising, as difficult problems are typically more sensitive to perturbation by ICE. Note that every $H_P^g$ is a new instance which requires a programming step; the current dominance of $t_p$ over $t_a + t_r$ means that it is rarely cost-effective to draw fewer than 1000 samples per gauge transformation. However, this technique may yield more significant performance improvements in applications other than optimization, such as fair sampling of the solution space, which is highly sensitive to Hamiltonian misspecification.
3.2 Optimal anneal times
Previous work [6, 29] has reported on experiments to find optimal settings of $t_a$ for RAN instances, concluding that for problem sizes up to … the lowest possible setting of 20 µs is longer than optimal. More recent work has found instances whose optimal anneal time on a DW2 processor is greater than 20 µs [19]. Those studies consider anneal time in isolation, so that the optimal time minimizes $R_{99} \cdot t_a$. However, under the cost model in (3), the optimal $t_a$ minimizes $R_{99} \cdot (t_a + t_r)$, so that a smaller increase in $\hat{p}$ is sufficient to reduce total runtime in practice. Also, by analogy to observations about simulated annealing in [30], we might expect that longer anneal times are optimal for problem classes that are combinatorially interesting but relatively insensitive to misspecification (compared to RAN instances for large $n$).
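The difference between the two optimization criteria can be seen with a toy calculation. The Python sketch below picks the anneal time minimizing $R_{99}\,(t_a + t_r)$; with $t_r = 0$ it reduces to the anneal-time-in-isolation criterion $R_{99}\,t_a$. The success probabilities in the example are hypothetical numbers chosen only to show the effect, not measurements.

```python
import math

def r99(p):
    """R99 = log(1 - 0.99) / log(1 - p) for 0 < p < 1."""
    return math.log(0.01) / math.log(1.0 - p)


def best_anneal_time(p_by_ta, t_r_us=0.0):
    """Return the anneal time t_a minimizing R99(p(t_a)) * (t_a + t_r).
    With t_r = 0 this is the 'anneal time in isolation' criterion."""
    return min(p_by_ta, key=lambda t_a: r99(p_by_ta[t_a]) * (t_a + t_r_us))


# HYPOTHETICAL success probabilities per anneal time in microseconds (not measured data).
p_by_ta = {20: 0.010, 40: 0.018}
print(best_anneal_time(p_by_ta))              # 20: doubling t_a does not pay off alone
print(best_anneal_time(p_by_ta, t_r_us=116))  # 40: readout from Table 1 changes the answer
```

Here an 80% increase in $\hat{p}$ does not justify doubling $t_a$ when anneal time is considered alone, but it does once the 116 µs readout per sample is included, which is the sense in which a smaller increase in $\hat{p}$ suffices under (3).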
Figure 3 shows the result of varying the anneal time from … µs to … µs for 200 RAN1 problems and 200 FL2 problems at a 481-qubit scale, drawing 100,000 samples over 100 gauge transformations: despite the noisy data, small reductions in ST99 can be seen at all quantiles. (Improvements from increased anneal time are less apparent for more error-sensitive classes such as NAE.) These limited results, together with very preliminary data on a prototype chip with … qubits, suggest that anneal times will become more important to performance, and that optimal settings will grow above … µs, on next-generation chips with up to 1152 qubits.
3.3 Methods for minor-embedded problems
Suppose we have a Hamiltonian $H_P$ for a general (non-Chimera-structured) IMP instance defined on a graph $G$ of $n$ vertices; this graph must be minor-embedded in the hardware graph $G_H$ for solution. In current D-Wave architectures we have $G_H \subseteq C_M$, where $C_M$ is a Chimera graph on $8M^2$ vertices. Each $C_M$ contains a complete graph of size … as a minor (actually requiring only … qubits [30]), but in practice we can find more compact embeddings using a heuristic algorithm such as the one described in [8]. (See Appendix A.)
Optimizing chain strength
An embedding contains, for each vertex $i$ of $G$, a set $C_i$ of vertices assigned to a connected subgraph of $G_H$. We call each $C_i$ (and a spanning subgraph induced by $C_i$ in $G_H$) a chain. By assigning a strong ferromagnetic coupling (a large-magnitude negative value $J_{pq} = -J_F$) between qubits $p, q$ in the same chain, we can ensure that in low-energy states of the embedded Hamiltonian all qubits in $C_i$ take the same spin value, for each $i$. Thus the hardware output is likely to yield feasible solutions when mapped back to $G$ in $H_P$.
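The chain construction can be sketched in Python as follows. The two placement rules used here, splitting each $h_i$ evenly across its chain and placing each logical $J_{ij}$ on a single hardware coupler between the two chains, are one common convention and an assumption on our part; the text does not specify how fields and couplings are distributed. The check at the end confirms that for every unbroken-chain state the embedded energy equals the logical energy minus $J_F$ per chain coupler.

```python
def embed_hamiltonian(h, J, chains, hw_edges, J_F):
    """Map a logical instance (h, J) onto hardware qubits.

    chains:   logical vertex -> list of physical qubits forming a connected chain
    hw_edges: set of available hardware couplers (p, q)
    J_F:      chain strength; each coupler inside a chain gets J_pq = -J_F
    """
    def has_edge(p, q):
        return (p, q) in hw_edges or (q, p) in hw_edges

    h_emb, J_emb = {}, {}
    for v, chain in chains.items():
        for q in chain:
            h_emb[q] = h.get(v, 0.0) / len(chain)  # split h_v evenly over the chain
        for a in range(len(chain)):
            for b in range(a + 1, len(chain)):
                if has_edge(chain[a], chain[b]):
                    J_emb[(chain[a], chain[b])] = -J_F  # ferromagnetic chain coupler
    for (u, v), J_uv in J.items():
        coupler = next(((p, q) for p in chains[u] for q in chains[v] if has_edge(p, q)), None)
        if coupler is None:
            raise ValueError(f"chains of {u!r} and {v!r} are not adjacent in hardware")
        J_emb[coupler] = J_uv  # logical coupling placed on one hardware coupler
    return h_emb, J_emb


def energy(s, h, J):
    return sum(v * s[i] for i, v in h.items()) + sum(v * s[i] * s[j] for (i, j), v in J.items())


# Logical triangle a-b-c embedded in a 4-cycle of qubits 0-1-2-3 (chain of a = [0, 1]).
h = {"a": 1.0, "b": 0.0, "c": -0.5}
J = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0}
chains = {"a": [0, 1], "b": [2], "c": [3]}
hw_edges = {(0, 1), (1, 2), (2, 3), (3, 0)}
h_emb, J_emb = embed_hamiltonian(h, J, chains, hw_edges, J_F=2.0)
for sa in (-1, 1):
    for sb in (-1, 1):
        for sc in (-1, 1):
            logical = {"a": sa, "b": sb, "c": sc}
            physical = {q: logical[v] for v, chain in chains.items() for q in chain}
            # one unbroken chain coupler contributes -J_F = -2.0
            assert abs(energy(physical, h_emb, J_emb) - (energy(logical, h, J) - 2.0)) < 1e-12
```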
Too small a $J_F$ produces broken chains (i.e. chains whose spins do not unanimously agree) in hardware output; that is, the solution in the code space cannot be mapped back to the (unembedded) solution space (see [32]). On the other hand, a large $J_F$ decreases the problem scaling factor $\alpha$ (since all weights, including $-J_F$, must fit in the hardware range), which effectively boosts ICE, as in Figure 1. Therefore the choice of $J_F$ has a significant effect on hardware success rates.
For NAE-3SAT it appears that the hardware performs best when $J_F$ is minimized subject to the constraint that no ground state contains a broken chain; results on fully-connected spin glasses appear to agree [30]. This value of $J_F$, denoted $J_F^*$, is instance dependent, and can be approximated empirically by gradually increasing $J_F$ from zero until the lowest energy found corresponds to a state with no broken chains. Figure 4 shows the effect of varying chain strength in NAE instances (with …). For these instances $J_F^*$ ranges between 1.5 and 6 on instances of 10 to 40 logical variables, embedded on 18 to 379 physical qubits. At the largest problem sizes, increasing $J_F$ by 1 can more than double median computation times. The right panel shows a difference of two orders of magnitude on some instances and an interesting bimodal property that awaits further analysis.
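The empirical procedure for approximating $J_F^*$ can be sketched in Python. In this sketch exhaustive enumeration stands in for the hardware's "lowest energy found", so it applies only to tiny examples; the helper names and the step size are illustrative assumptions. The example is a frustrated (antiferromagnetic) triangle embedded in a 4-cycle with one two-qubit chain: an unbroken chain gives energy $-1 - J_F$, while breaking the chain lets the three remaining couplers all be satisfied for energy $J_F - 3$, so broken ground states persist up to and including $J_F = 1$.

```python
from itertools import product

def ground_states(qubits, h, J, tol=1e-9):
    """All minimum-energy states by brute force (exponential; tiny examples only)."""
    best, states = None, []
    for spins in product((-1, 1), repeat=len(qubits)):
        s = dict(zip(qubits, spins))
        e = sum(v * s[q] for q, v in h.items()) + sum(v * s[p] * s[q] for (p, q), v in J.items())
        if best is None or e < best - tol:
            best, states = e, [s]
        elif abs(e - best) <= tol:
            states.append(s)
    return states


def is_broken(s, chains):
    """True if any chain's qubits disagree."""
    return any(len({s[q] for q in chain}) > 1 for chain in chains.values())


def estimate_min_chain_strength(build, qubits, chains, step=0.5, max_strength=20.0):
    """Increase J_F from zero in increments of `step` until no ground state of the
    embedded Hamiltonian build(J_F) -> (h_emb, J_emb) contains a broken chain."""
    for k in range(int(round(max_strength / step)) + 1):
        J_F = k * step
        h_emb, J_emb = build(J_F)
        if not any(is_broken(s, chains) for s in ground_states(qubits, h_emb, J_emb)):
            return J_F
    return None  # no sufficient chain strength found up to max_strength


chains = {"a": [0, 1], "b": [2], "c": [3]}

def build_triangle(J_F):
    # Antiferromagnetic triangle a-b-c on the 4-cycle 0-1-2-3; (0, 1) is a's chain coupler.
    return {}, {(0, 1): -J_F, (1, 2): 1.0, (2, 3): 1.0, (0, 3): 1.0}

print(estimate_min_chain_strength(build_triangle, [0, 1, 2, 3], chains))  # 1.5
```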
Chain shimming
One can think of the Hamiltonian of an embedded problem as a combination of two Hamiltonians, one encoding the original problem and one encoding the chain constraints. Thus we have $H_{\mathrm{emb}} = H_{\mathrm{prob}} + H_{\mathrm{chain}}$, where all local fields of $H_{\mathrm{emb}}$ come from $H_{\mathrm{prob}}$, since the chain Hamiltonian contains no local fields. Due to ICE, however, $H_{\mathrm{chain}}$ introduces a set of effective small local fields called biases. Although ICE will be mitigated in future hardware generations, this issue can be addressed immediately using a simple technique called chain shimming.
Chain shimming starts by sending the Hamiltonian $H_{\mathrm{chain}}$ to the hardware and measuring the bias on each chain: since $H_{\mathrm{chain}}$ has no local fields and no connections between chains, the hardware should return unbroken chains having spins $+1$ and $-1$ with equal probability. If the distribution is biased, we place a compensating bias on each qubit of each askew chain. A few iterations of this process to refine the biases can sometimes improve time performance. This technique can be applied most efficiently when the structure of the embedding (and therefore the chain Hamiltonian) is constant over many instances, e.g. for the fully-connected graphs described in [30].
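The shimming loop can be sketched in Python. Because we cannot call the hardware here, the sketch uses a hypothetical stand-in sampler that treats each chain as a single spin with an unknown ICE-induced bias $b_c$ and Boltzmann statistics at an assumed inverse temperature. The update rule (nudge each chain's compensating field in proportion to its measured mean spin) and the even split of that field across the chain's qubits are also our assumptions, not details from the text.

```python
import math
import random

def mock_chain_sampler(hidden_bias, beta, rng):
    """HYPOTHETICAL hardware stand-in: each chain acts as one free spin with energy
    (b_c + f_c) * s_c, where b_c is an unknown ICE bias and f_c our compensating field."""
    def sample(fields, num_reads):
        reads = []
        for _ in range(num_reads):
            read = {}
            for c, b in hidden_bias.items():
                f = b + fields[c]
                p_up = 1.0 / (1.0 + math.exp(2.0 * beta * f))  # Boltzmann P(s_c = +1)
                read[c] = 1 if rng.random() < p_up else -1
            reads.append(read)
        return reads
    return sample


def shim_chains(sample, chains, iterations=20, num_reads=2000, step=0.5):
    """Iteratively adjust one compensating field per chain until each chain
    returns +1 and -1 about equally often; then spread it over the chain's qubits."""
    fields = {c: 0.0 for c in chains}
    for _ in range(iterations):
        reads = sample(fields, num_reads)
        for c in chains:
            mean = sum(r[c] for r in reads) / num_reads
            # A chain leaning to +1 (mean > 0) gets a more positive field, which penalizes +1.
            fields[c] += step * mean
    per_qubit = {q: fields[c] / len(qubits) for c, qubits in chains.items() for q in qubits}
    return fields, per_qubit


rng = random.Random(7)
hidden = {"a": 0.3, "b": -0.2, "c": 0.05}          # hypothetical ICE biases
chains = {"a": [0, 1], "b": [2], "c": [3, 4, 5]}
fields, per_qubit = shim_chains(mock_chain_sampler(hidden, 1.0, rng), chains)
print({c: round(hidden[c] + fields[c], 3) for c in chains})  # residual biases, near zero
```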
Figure 5 shows ST99 for 210 3MC instances on V7 with and without shimming. The data provide some evidence of a slight but systematic improvement in performance as problems become larger and more difficult. This improvement is not seen for NAE instances, likely due to the higher chain strength required for NAE instances and the resulting ICE sensitivity (3MC instances use $J_F = \ldots$, which is always sufficient).
3.4 Classical Error Correction via Postprocessing
An obvious remedy for some types of errors described in Section 2.1.1 is to apply error correction techniques. Pudenz et al. [26, 27] present quantum error-correcting codes for D-Wave architectures, and Young et al. [32] describe quantum stabilizer codes. These techniques boost hardware performance immensely, at the cost of many ancillary and redundant qubits and consequently a reduction in the size of problems that can be solved on a fixed-size chip. An alternative strategy, discussed here, is to apply cheap classical postprocessing operations to the solutions returned by hardware.
Majority vote
In embedded problems it is possible for the hardware to return solutions with broken chains. Rather than discarding such samples, we may instead set the spin of each qubit in a chain according to a majority vote of the qubits in that chain (breaking ties randomly). This is computationally inexpensive and improves hardware success probabilities. Several more sophisticated methods may be considered for repairing broken chains, such as increasing chain strength until votes are unanimous, or converting unanimous chains into local fields to reduce the problem; further study is needed.
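Majority-vote unembedding takes a few lines of Python; the sketch below (names illustrative) maps each chain to the sign of its qubits' spin sum and breaks exact ties with a fair coin, as described above.

```python
import random

def majority_vote_unembed(sample, chains, rng=None):
    """Map a physical sample to logical spins: each chain takes the majority
    spin of its qubits, with ties broken uniformly at random."""
    rng = rng or random.Random()
    logical = {}
    for v, chain in chains.items():
        total = sum(sample[q] for q in chain)
        logical[v] = 1 if total > 0 else -1 if total < 0 else rng.choice((-1, 1))
    return logical


sample = {0: 1, 1: 1, 2: -1, 3: -1, 4: 1, 5: -1}
chains = {"x": [0, 1, 2], "y": [3], "z": [4, 5]}   # x is broken 2-to-1, z is tied
print(majority_vote_unembed(sample, chains, random.Random(0)))
```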
Greedy descent
Another simple postprocessing technique is to walk each hardware solution down to a local minimum by repeatedly flipping randomly chosen bits whose flip strictly reduces solution cost. We call this approach greedy descent. In a minor-embedded problem, this can be applied to the solution of the unembedded problem, of the embedded problem, or of both. More generally, one can apply as a postprocessing step any classical heuristic that takes an initial state from the hardware and refines it, e.g. simulated annealing, tabu search, or parallel tempering. (Footnote 5: We recognize that there is conceptually a fine line between using a DW2 system as a preprocessor and using a classical heuristic as an error-correcting postprocessor. Work is underway to explore these ideas.)
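Greedy descent can be sketched in Python as follows. Each step chooses uniformly among the spins whose flip strictly lowers $E(s)$ (one reading of "flipping random bits"; an alternative is to try bits in random order and accept the first improvement), using the fact that flipping $s_i$ changes the energy by $-2 s_i (h_i + \sum_j J_{ij} s_j)$. The loop therefore terminates at a state where no single flip helps. Function names are illustrative.

```python
import random

def greedy_descent(s, h, J, rng=None):
    """Flip randomly chosen spins whose flip strictly lowers E(s), until none remains.
    Returns a 1-flip local minimum (a new dict; the input is not modified)."""
    rng = rng or random.Random()
    s = dict(s)
    neighbors = {i: [] for i in s}
    for (i, j), v in J.items():
        neighbors[i].append((j, v))
        neighbors[j].append((i, v))

    def flip_delta(i):
        # Flipping s_i changes E by -2 * s_i * (h_i + sum_j J_ij s_j).
        local_field = h.get(i, 0) + sum(v * s[j] for j, v in neighbors[i])
        return -2 * s[i] * local_field

    while True:
        improving = [i for i in s if flip_delta(i) < 0]
        if not improving:
            return s
        i = rng.choice(improving)
        s[i] = -s[i]


def energy(s, h, J):
    return sum(v * s[i] for i, v in h.items()) + sum(v * s[i] * s[j] for (i, j), v in J.items())


rng = random.Random(3)
n = 10
h = {i: rng.choice((-1, 0, 1)) for i in range(n)}
J = {(i, (i + 1) % n): rng.choice((-1, 1)) for i in range(n)}  # a ring
s0 = {i: rng.choice((-1, 1)) for i in range(n)}
s1 = greedy_descent(s0, h, J, rng)
print(energy(s0, h, J), "->", energy(s1, h, J))
```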
Figure 6 shows the effect of postprocessing on NAE instances. In these tests $J_F$ is set to …; runs at higher $J_F$ will derive less benefit from majority vote, and more from greedy descent.
4 Conclusions
We have presented several algorithm engineering techniques that aim to improve the performance of D-Wave quantum annealing processors. These include strategies for modifying anneal times, changing the problem Hamiltonian (gauge transformations, chain shimming), improving chains in embedded problems, and exploiting simple postprocessing ideas. Many more ideas along these lines can be identified, and it remains to be seen what performance gains can be achieved by applying combinations of techniques. Beyond these individual strategies, perhaps a more important contribution has been the presentation of a conceptual framework for distinguishing performance of the quantum algorithm from its realization in technologically immature but rapidly developing hardware.
The question arises as to how some of these techniques might affect the performance of classical software solvers. Techniques that focus on mitigating Hamiltonian misspecification (e.g. chain shimming and gauge transformation) are largely irrelevant to classical heuristic approaches to solving IMP, since digital computers do not experience these types of errors. Other techniques, such as postprocessing and longer anneal times, can be successfully transferred to some algorithmic approaches, such as heuristic search, but not necessarily to others, such as dynamic-programming-based approaches.
Both the quantum annealing paradigm and its implementation on quantum hardware are very new concepts, and the current performance model is primitive and incomplete. This paper represents a small step towards better understanding of performance in this novel computing paradigm.
5 Acknowledgements
We extend warm thanks for useful discussions and suggestions to Carrie Cheung, Brandon Denis, Itay Hen, Robert Israel, Jamie King, Trevor Lanting, Aidan Roy, Miles Steininger, Murray Thom, and Cong Wang.
References
 [1] D. Achlioptas, A. Chtcherba, G. Istrate, and C. Moore. The phase transition in 1-in-k SAT and NAE 3-SAT. In Proceedings of the Twelfth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 721–722. Society for Industrial and Applied Mathematics, 2001.
 [2] D. Aharonov, W. van Dam, J. Kempe, Z. Landau, S. Lloyd, and O. Regev. Adiabatic quantum computation is equivalent to standard quantum computation. SIAM Journal on Computing, 37(1):166–194, 2007.
 [3] T. Albash, S. Boixo, D.A. Lidar, and P. Zanardi. Quantum adiabatic Markovian master equations. New Journal of Physics, 14(12):123016, 2012.
 [4] P. Berman and M. Karpinski. On some tighter inapproximability results. In Proceedings of the 26th International Colloquium on Automata, Languages and Programming, pages 200–209. Springer-Verlag, 1999.
 [5] Z. Bian, F. Chudak, R. Israel, B. Lackey, W.G. Macready, and A. Roy. Discrete optimization using quantum annealing on sparse Ising models. Frontiers in Physics, 2:56, 2014.
 [6] S. Boixo, T.F. Rønnow, S.V. Isakov, Z. Wang, D. Wecker, D.A. Lidar, J.M. Martinis, and M. Troyer. Evidence for quantum annealing with more than one hundred qubits. Nature Physics, 10(3):218–224, 2014.
 [7] P. Bunyk, E. Hoskinson, M. Johnson, E. Tolkacheva, F. Altomare, A. Berkley, R. Harris, J. Hilton, T. Lanting, A. Przybysz, et al. Architectural considerations in the design of a superconducting quantum annealing processor. IEEE Transactions on Applied Superconductivity, 2014.
 [8] J. Cai, W.G. Macready, and A. Roy. A practical heuristic for finding graph minors. arXiv preprint arXiv:1406.2741, 2014.
 [9] V. Choi. Minor-embedding in adiabatic quantum computation: I. The parameter setting problem. Quantum Information Processing, 7(5):193–209, 2008.
 [10] A. Das and B.K. Chakrabarti. Quantum annealing and related optimization methods, volume 679 of Lecture Notes in Physics. Springer, 2005.
 [11] N.G. Dickson et al. Thermally assisted quantum annealing of a 16-qubit problem. Nature Communications, 4(May):1903, January 2013.
 [12] E. Farhi. Different strategies for optimization using the quantum adiabatic algorithm. Presented at AQC 2014, www.isi.edu/events/aqc2014/, 2014.
 [13] E. Farhi, J. Goldstone, S. Gutmann, J. Lapan, A. Lundgren, and D. Preda. A quantum adiabatic evolution algorithm applied to random instances of an NP-complete problem. Science, 292(5516):472–475, 2001.
 [14] I. Hen. Performance of D-Wave Two on problems with planted solutions. Presented at AQC 2014, www.isi.edu/events/aqc2014/, 2014.
 [15] S. Istrail. Statistical mechanics, three-dimensionality and NP-completeness: I. Universality of intractability for the partition function of the Ising model across non-planar surfaces. In Proceedings of the thirty-second annual ACM symposium on Theory of computing, pages 87–96. ACM, 2000.
 [16] S.P. Jordan, E. Farhi, and P.W. Shor. Error-correcting codes for adiabatic quantum computation. Physical Review A, 74(5):052322, 2006.
 [17] T. Kadowaki and H. Nishimori. Quantum annealing in the transverse Ising model. Physical Review E, 58(5):5355, 1998.
 [18] H.G. Katzgraber, F. Hamze, and R.S. Andrist. Glassy Chimeras could be blind to quantum speedup: Designing better benchmarks for quantum annealing machines. Physical Review X, 4(2):021008, 2014.
 [19] D.A. Lidar. Personal communication. 2014.
 [20] R. Martoňák, G.E. Santoro, and E. Tosatti. Quantum annealing by the path-integral Monte Carlo method: The two-dimensional random Ising model. Physical Review B, 66(9):094203, 2002.
 [21] C.C. McGeoch. A guide to experimental algorithmics. Cambridge University Press, 2012.
 [22] C.C. McGeoch. Adiabatic quantum computation and quantum annealing: Theory and practice. Synthesis Lectures on Quantum Computing, 5(2):1–93, 2014.
 [23] C.C. McGeoch and C. Wang. Experimental evaluation of an adiabiatic quantum system for combinatorial optimization. In Proceedings of the ACM International Conference on Computing Frontiers, page 23. ACM, 2013.
 [24] M. Müller-Hannemann and S. Schirra. Algorithm engineering: bridging the gap between algorithm theory and practice, volume 5971. Springer, 2010.
 [25] H. Nishimori, J. Tsuda, and S. Knysh. Comparative study of the performance of quantum annealing and simulated annealing. arXiv preprint arXiv:1409.6386, 2014.
 [26] K.L. Pudenz, T. Albash, and D.A. Lidar. Error-corrected quantum annealing with hundreds of qubits. Nature Communications, 5, 2014.
 [27] K.L. Pudenz, T. Albash, and D.A. Lidar. Quantum annealing correction for random Ising problems. arXiv preprint arXiv:1408.4382, 2014.
 [28] J. Roland and N.J. Cerf. Quantum search by local adiabatic evolution. Physical Review A, 65(4):042308, 2002.
 [29] T.F. Rønnow, Z. Wang, J. Job, S. Boixo, S.V. Isakov, D. Wecker, J.M. Martinis, D.A. Lidar, and M. Troyer. Defining and detecting quantum speedup. Science, 345(6195):420–424, 2014.
 [30] D. Venturelli, S. Mandrà, S. Knysh, B. O’Gorman, R. Biswas, and V. Smelyanskiy. Quantum optimization of fully-connected spin glasses. arXiv preprint arXiv:1406.7553, 2014.
 [31] W. Vinci, T. Albash, A. Mishra, P.A. Warburton, and D.A. Lidar. Distinguishing classical and quantum models for the D-Wave device. arXiv preprint arXiv:1403.4228, 2014.
 [32] K.C. Young, R. Blume-Kohout, and D.A. Lidar. Adiabatic quantum optimization with the wrong Hamiltonian. Physical Review A, 88(6):062314, 2013.
Appendix A Chimera structure and the hardware graph
A Chimera graph $C_M$ consists of an $M \times M$ grid of cells. In current D-Wave configurations each cell is a complete bipartite graph $K_{4,4}$. Vertices in a row are matched to corresponding vertices in neighboring cells above and below, and vertices in a column are matched to corresponding vertices in neighboring cells to the left and right. See Figures 7 and 8. A $C_M$ contains $8M^2$ vertices of degree 6 (internal vertices) and 5 (sides), totalling $24M^2 - 8M$ edges. The hardware graph of V7 is a subgraph of $C_8$, a result of fabrication imperfections and high calibration throughput. The working graph varies from chip to chip.
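The Chimera structure and the vertex and edge counts above can be checked with a short Python sketch. Qubits are labeled by (cell row, cell column, side, index); which side is drawn as "vertical" in Figures 7 and 8 is a labeling convention we assume here, and the function name is illustrative.

```python
def chimera_graph(M, L=4):
    """Chimera C_M: an M x M grid of K_{L,L} cells. Qubit (r, c, u, k) is index k on
    side u (0 or 1) of the cell in row r, column c. Side-0 qubits couple to the
    matching qubit in the cells above and below; side-1 qubits to the cells left and right."""
    nodes = [(r, c, u, k) for r in range(M) for c in range(M) for u in (0, 1) for k in range(L)]
    edges = []
    for r in range(M):
        for c in range(M):
            # complete bipartite K_{L,L} inside the cell
            edges += [((r, c, 0, i), (r, c, 1, j)) for i in range(L) for j in range(L)]
            for k in range(L):
                if r + 1 < M:
                    edges.append(((r, c, 0, k), (r + 1, c, 0, k)))  # vertical inter-cell coupler
                if c + 1 < M:
                    edges.append(((r, c, 1, k), (r, c + 1, 1, k)))  # horizontal inter-cell coupler
    return nodes, edges


nodes, edges = chimera_graph(8)
print(len(nodes), len(edges))  # 512 1472
```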
Minor-embeddings
A minor of a given graph $G$ is any graph that can be constructed from $G$ by applying some number of the following operations, in any order:

Remove an edge.

Remove a vertex and incident edges.

Contract an edge, combining its incident vertices.
If a graph $G$ is a minor of a graph $H$, it is straightforward to reduce IMP on $G$ to IMP on $H$: for each edge $uv$ of $H$ that is contracted in the graph-minor construction, assign to $J_{uv}$ a strong ferromagnetic (negative) coupling $-C$, and place the fields and couplings of the original problem on the vertices and edges of $H$ that represent them. If $C$ is sufficiently large, $u$ and $v$ will take the same spin in any low-energy configuration. As discussed in the body of the paper, sufficient bounds on $C$ are highly dependent on the structure of the individual minor chosen.
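A minimal sketch of this reduction follows, with assumptions of our own: chains are lists of integer qubit labels, every hardware coupler inside a chain (not only the contracted edges) receives $-C$, each logical field is split evenly over its chain, each logical coupling is placed on the first hardware coupler found between the two chains, and samples are read back by majority vote over each chain. The function names are hypothetical and are not part of any D-Wave API.

```python
def embed_ising(h, J, embedding, hw_edges, chain_strength):
    """Map a logical Ising problem (h, J) onto hardware fields and couplers.

    embedding: logical variable -> list of qubits forming a connected chain.
    hw_edges: set of hardware couplers, each a frozenset {p, q}.
    """
    ph, pJ = {}, {}
    for v, chain in embedding.items():
        for q in chain:
            ph[q] = h.get(v, 0.0) / len(chain)  # spread the field over the chain
        for p in chain:
            for q in chain:
                if p < q and frozenset((p, q)) in hw_edges:
                    pJ[(p, q)] = -chain_strength  # ferromagnetic chain coupler
    for (u, v), w in J.items():
        # Use the first hardware coupler that joins chain u to chain v.
        for p in embedding[u]:
            q = next((q for q in embedding[v] if frozenset((p, q)) in hw_edges), None)
            if q is not None:
                pJ[(min(p, q), max(p, q))] = w
                break
        else:
            raise ValueError(f"chains of {u!r} and {v!r} share no coupler")
    return ph, pJ

def unembed(sample, embedding):
    """Majority vote over each chain; ties are broken toward +1."""
    return {v: 1 if sum(sample[q] for q in chain) >= 0 else -1
            for v, chain in embedding.items()}

# A logical triangle a-b-c minor-embedded in the 4-cycle 0-1-2-3-0.
square = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 0)]}
emb = {"a": [0, 1], "b": [2], "c": [3]}
logical_J = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0}
ph, pJ = embed_ising({}, logical_J, emb, square, chain_strength=2.0)
print(pJ)  # {(0, 1): -2.0, (1, 2): 1.0, (2, 3): 1.0, (0, 3): 1.0}
```

Here the triangle is a minor of the 4-cycle obtained by contracting the edge $\{0, 1\}$, which becomes the chain for $a$.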
Choi [9] shows that a complete graph can be minor-embedded in the upper diagonal of a Chimera graph $C_L$. The computational complexity of deciding whether an arbitrary graph can be minor-embedded in a Chimera graph is open.
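To make the idea concrete, the sketch below builds one well-known triangular clique embedding; it is our own illustration and not necessarily the exact construction of Choi [9]. The qubit labels (row, col, shore, k), with shore 0 coupling vertically and shore 1 horizontally, are an assumed convention. Logical variable $4a + k$ occupies a horizontal run in row $a$ and a vertical run in column $a$, so it uses only cells $(i, j)$ with $i \ge j$; whether that triangle is called upper or lower depends on how the grid is drawn. The checker confirms that the chains are disjoint and connected and that every pair of chains shares a coupler.

```python
from itertools import combinations, product

def chimera_edges(L):
    """Couplers of C_L as frozensets; qubit = (row, col, shore, k)."""
    E = set()
    for i, j in product(range(L), repeat=2):
        for a, b in product(range(4), repeat=2):
            E.add(frozenset({(i, j, 0, a), (i, j, 1, b)}))
        for k in range(4):
            if i + 1 < L:
                E.add(frozenset({(i, j, 0, k), (i + 1, j, 0, k)}))
            if j + 1 < L:
                E.add(frozenset({(i, j, 1, k), (i, j + 1, 1, k)}))
    return E

def triangle_clique_embedding(L):
    """Chains for K_{4L}: variable 4a + k uses L + 1 qubits.

    Horizontal qubits k of row a in columns 0..a, plus vertical qubits k of
    column a in rows a..L-1; the two runs meet inside cell (a, a).
    """
    return {4 * a + k: [(a, j, 1, k) for j in range(a + 1)]
                       + [(i, a, 0, k) for i in range(a, L)]
            for a in range(L) for k in range(4)}

def is_clique_minor(chains, E):
    """True if the chains are disjoint, connected, and pairwise coupled."""
    used = [q for chain in chains.values() for q in chain]
    if len(used) != len(set(used)):
        return False
    for chain in chains.values():
        # Depth-first search restricted to the qubits of this chain.
        seen, stack = {chain[0]}, [chain[0]]
        while stack:
            p = stack.pop()
            for q in chain:
                if q not in seen and frozenset({p, q}) in E:
                    seen.add(q)
                    stack.append(q)
        if len(seen) != len(chain):
            return False
    return all(any(frozenset({p, q}) in E for p in chains[u] for q in chains[v])
               for u, v in combinations(chains, 2))

L = 4
chains = triangle_clique_embedding(L)
print(len(chains), sum(map(len, chains.values())))  # 16 80
print(is_clique_minor(chains, chimera_edges(L)))     # True
```

For $C_4$ this construction embeds $K_{16}$ using $4L(L+1) = 80$ qubits.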
Appendix B Instance classes
We present a brief overview of instance classes used in this work.
B.1 Random instances (RAN)
For a given hardware graph $G$, for each edge $ij \in E(G)$ generate a weight $J_{ij}$ uniformly at random from a fixed integer range (omitting 0).
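A sketch of the RAN generator, assuming the range is $\{-r, \dots, -1, 1, \dots, r\}$ for a hypothetical parameter $r$ (the bounds are not fixed here) and that all fields are zero:

```python
import random

def random_instance(edges, r, seed=None):
    """Draw J_ij uniformly from {-r, ..., -1, 1, ..., r} for each hardware edge.

    r is a hypothetical range parameter; fields h are left at zero.
    """
    rng = random.Random(seed)
    weights = [w for w in range(-r, r + 1) if w != 0]  # the range with 0 omitted
    return {edge: rng.choice(weights) for edge in sorted(edges)}

J = random_instance([(0, 1), (1, 2), (2, 3), (0, 3)], r=3, seed=42)
print(J)
```

Seeding a private `random.Random` keeps each instance reproducible without disturbing the global generator.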
Katzgraber et al. [18] have shown that these instances are fairly easy for solvers based on simulated annealing, for two reasons. First, a random instance typically has a large number of global minima, which can be found using many random restarts. Second, during most of the anneal the solution landscape has gentle slopes and no high barriers, so the neighborhood of a global optimum is found early in the anneal process.
B.2 Frustrated loop instances (FL)
We present a construction of Hen [14] for an Ising Hamiltonian over a hardware graph $G$ in which the all-up configuration $s = (1, \dots, 1)$ is a ground state. Let $R$, our precision limit, be a positive integer, let $n$ be the number of vertices in $G$, and let $\alpha$ be a constraint-to-qubit ratio. We construct a Hamiltonian consisting of a conjunction of $M = \lfloor \alpha n \rceil$ frustrated loops (where $\lfloor x \rceil$ denotes $x$ rounded to the nearest integer) as follows.
First let $\mathcal{C}_1$ be a cycle in $G$ chosen at random; here we do this by performing a random walk in $G$ starting at a random vertex and taking the first cycle we find. To ensure that the cycles spread sufficiently across $G$, we reject a cycle if it is contained entirely in a single Chimera unit cell, and repeat the construction. Let $\ell_1$ be the number of vertices in $\mathcal{C}_1$; since $G$ is a subgraph of a bipartite Chimera graph and $\mathcal{C}_1$ is not contained in a single cell, $\ell_1$ is even and at least 6. We construct a Hamiltonian $H_1$ by setting $J_{ij} = -1$ on every edge $ij$ of $\mathcal{C}_1$ except one, chosen uniformly at random, which we set to $J_{ij} = +1$. It is now straightforward to check that $H_1$ has $2\ell_1$ ground states, each violating exactly one edge of the loop, and ground state energy $2 - \ell_1$.
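We repeat this construction for further cycles $\mathcal{C}_2, \mathcal{C}_3, \dots$, with the following wrinkle: if, after choosing $j$ cycles, an edge $e$ of $G$ has accumulated a coupling whose magnitude equals the precision limit $R$, that is, $\bigl|\sum_{k=1}^{j} J^{(k)}_{e}\bigr| = R$, where $J^{(k)}$ denotes the couplings of the $k$-th loop Hamiltonian, then we forbid the edge from appearing in any later cycle.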
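The final Hamiltonian of the problem is $H = \sum_{j} H_j$, where $H_j$ is the Hamiltonian of the $j$-th frustrated loop. Because the planted configuration is a ground state of every $H_j$ simultaneously, it is a ground state of $H$, with energy equal to the sum of the loop ground-state energies. Note that this specified ground state can be “hidden” by applying a gauge transformation to the Hamiltonian.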
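The frustrated-loop construction can be sketched as follows. Details the text leaves open are filled with our own assumptions: the walk is non-backtracking, a walk that reaches a dead end (all remaining edges forbidden) is restarted, the loop count is computed with Python's `round`, and qubits use a (row, col, shore, k) Chimera labelling. The final line checks that the all-up state has energy $\sum_j (2 - \ell_j)$, where $\ell_j$ is the length of loop $j$.

```python
import random
from itertools import product

def chimera_adjacency(L):
    """Adjacency sets of C_L; a qubit is (row, col, shore, k)."""
    adj = {q: set() for q in product(range(L), range(L), range(2), range(4))}

    def link(p, q):
        adj[p].add(q)
        adj[q].add(p)

    for i, j in product(range(L), repeat=2):
        for a, b in product(range(4), repeat=2):
            link((i, j, 0, a), (i, j, 1, b))
        for k in range(4):
            if i + 1 < L:
                link((i, j, 0, k), (i + 1, j, 0, k))
            if j + 1 < L:
                link((i, j, 1, k), (i, j + 1, 1, k))
    return adj

def random_cycle(adj, forbidden, rng):
    """Non-backtracking random walk from a random vertex until a vertex repeats.

    Returns the cycle closed at the first repeat; restarts on dead ends and on
    cycles that lie inside a single unit cell.
    """
    vertices = sorted(adj)
    while True:
        walk = [rng.choice(vertices)]
        pos = {walk[0]: 0}
        while True:
            cur = walk[-1]
            prev = walk[-2] if len(walk) > 1 else None
            steps = sorted(q for q in adj[cur]
                           if q != prev and frozenset((cur, q)) not in forbidden)
            if not steps:
                break  # dead end: start a new walk
            nxt = rng.choice(steps)
            if nxt in pos:
                cycle = walk[pos[nxt]:]
                if len({q[:2] for q in cycle}) > 1:  # spans at least two cells
                    return cycle
                break  # inside one unit cell: reject and start over
            pos[nxt] = len(walk)
            walk.append(nxt)

def frustrated_loops(adj, alpha, R, seed=None):
    """Return couplings J (frozenset edge -> int) and the list of loop lengths."""
    rng = random.Random(seed)
    M = round(alpha * len(adj))
    J, forbidden, lengths = {}, set(), []
    for _ in range(M):
        cycle = random_cycle(adj, forbidden, rng)
        ell = len(cycle)
        edges = [frozenset((cycle[t], cycle[(t + 1) % ell])) for t in range(ell)]
        flip = rng.randrange(ell)  # the single antiferromagnetic edge
        for t, e in enumerate(edges):
            J[e] = J.get(e, 0) + (1 if t == flip else -1)
        lengths.append(ell)
        # Precision limit: an edge whose |J| reached R may not be reused.
        forbidden |= {e for e in edges if abs(J[e]) == R}
    return J, lengths

adj = chimera_adjacency(4)
J, lengths = frustrated_loops(adj, alpha=0.1, R=2, seed=1)
planted = sum(J.values())  # energy of the all-up state, with h = 0
print(len(lengths), planted == sum(2 - ell for ell in lengths))  # 13 True
```

Because each loop adds $\pm 1$ to an edge at most once and saturated edges are excluded afterwards, no coupling can exceed $R$ in magnitude.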
The instances we use in this paper have a ratio $\alpha$ chosen to lie roughly at an empirically observed phase transition [14], and a precision limit $R$ set to the minimum possible value that allows a rich set of instances.
B.3 Random cubic MAXCUT instances
MAXCUT on cubic graphs is a well-known NP-hard problem [4] that has a very simple Ising formulation. Maximum-cardinality cuts on a graph $G$ correspond to ground states of the Ising problem with $h = 0$, $J_{ij} = 1$ for all $ij \in E(G)$, and $J_{ij} = 0$ elsewhere. It is straightforward to confirm that in the cubic case, when embedding a MAXCUT Hamiltonian, a small constant chain strength, independent of the size of the instance, is always sufficient to guarantee chain fidelity in the ground state.
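Both facts can be checked in a few lines. The derivation below is our own sketch under assumptions not stated in the text: unit couplings $J_{ij} = 1$ on the edges of a cubic graph with edge set $E$, zero fields, each logical edge carried by exactly one hardware coupler, and chain couplers set to $-C$. For the cut identity, an uncut edge contributes $+1$ and a cut edge $-1$. For chain fidelity, suppose a chain is broken into the qubits $S$ with spin $+1$ and $T$ with spin $-1$, joined by $k \ge 1$ chain couplers, and let $d_S + d_T = 3$ count the problem couplers leaving each side, labelled so that $d_T \le 1$. Flipping every qubit of $T$ changes each of its problem couplers by at most 2 and lowers each of the $k$ broken chain couplers by $2C$:

```latex
\begin{align*}
H(s) &= \sum_{ij \in E} s_i s_j = |E| - 2\,\mathrm{cut}(s), \\
\Delta E_{\text{flip } T} &\le 2\,d_T - 2Ck \le 2 - 2C .
\end{align*}
```

Hence, under these assumptions, no state with a broken chain is a ground state whenever $C > 1$, and the bound does not grow with the size of the instance.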
B.4 NAE-3SAT instances
As with the FL instances described above, we construct NAE instances as a conjunction of $M = \lfloor \alpha n \rceil$ constraints, where $n$ is now the number of variables. In this case we use a ratio $\alpha$ that corresponds roughly to the phase transition for not-all-equal 3-SAT [1]. Further, these instances must be minor-embedded, as they do not naturally fit into the Chimera hardware graph.
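We generate a random NAE-3SAT instance by choosing $M$ clauses at random. Each clause consists of 3 distinct randomly selected variables, each of which is independently negated at random. The Hamiltonian $H_j$ for a clause on variables $x_a, x_b, x_c$, where $\epsilon_i = -1$ if $x_i$ is negated in the clause and $\epsilon_i = 1$ otherwise, has $h^{(j)} = 0$ and all entries of $J^{(j)}$ zero except $J^{(j)}_{ab} = \epsilon_a \epsilon_b$, $J^{(j)}_{ac} = \epsilon_a \epsilon_c$, and $J^{(j)}_{bc} = \epsilon_b \epsilon_c$. As with frustrated loops, our final Hamiltonian has $h = \sum_j h^{(j)} = 0$ and $J = \sum_j J^{(j)}$.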
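A small check of the clause Hamiltonian: with literal values $t_i = \epsilon_i s_i$, each clause term equals $t_a t_b + t_a t_c + t_b t_c$, which is $-1$ when the three literals are not all equal and $3$ when they are, so $H(s) = -M + 4 \cdot (\text{number of violated clauses})$. The sketch assumes each variable is negated with probability 1/2, uses hypothetical helper names, and verifies the identity exhaustively on a small random instance.

```python
import random
from itertools import product

def random_nae3sat(n, M, seed=None):
    """M clauses, each on 3 distinct variables; a literal is (variable, eps).

    eps = -1 marks a negated variable; we assume each is negated with probability 1/2.
    """
    rng = random.Random(seed)
    return [[(v, rng.choice((-1, 1))) for v in rng.sample(range(n), 3)]
            for _ in range(M)]

def nae_hamiltonian(clauses):
    """Sum of clause terms J_ab = eps_a * eps_b over the three pairs; h = 0."""
    J = {}
    for (a, ea), (b, eb), (c, ec) in clauses:
        for (p, ep), (q, eq) in (((a, ea), (b, eb)), ((a, ea), (c, ec)), ((b, eb), (c, ec))):
            key = (min(p, q), max(p, q))
            J[key] = J.get(key, 0) + ep * eq
    return J

def energy(J, s):
    return sum(w * s[p] * s[q] for (p, q), w in J.items())

def num_violated(clauses, s):
    """A NAE clause is violated when its three literal values eps * s are equal."""
    return sum(len({e * s[v] for v, e in clause}) == 1 for clause in clauses)

clauses = random_nae3sat(n=8, M=12, seed=3)
J = nae_hamiltonian(clauses)
# Exhaustive check of H(s) = -M + 4 * (violated clauses) over all 2^8 states.
assert all(energy(J, s) == -len(clauses) + 4 * num_violated(clauses, s)
           for s in product((-1, 1), repeat=8))
```

Summing clause terms into one dictionary is what produces the occasional entry of $\pm 2$ or $0$ when two clauses share a pair of variables.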
For sufficiently large $n$, the adjacency graph of the nonzero entries of $J$ is sparse, with average degree approximately $6\alpha$ (each clause contributes three edges), and the nonzero entries of $J$ are overwhelmingly $\pm 1$. These random instances are converted to Chimera-structured problems via the heuristic minor-embedding algorithm described in [8]. The question of how performance varies from one embedding of an instance to another requires further study outside the scope of this paper. To separate this issue from the algorithm engineering approaches we study here, we take five embeddings of each instance. When we want to compare performance under several parameter settings, we choose the “best” embedding to study; that is, for each instance we choose the embedding that maximizes the geometric mean of the performance metric under the parameter settings we compare.
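The “best embedding” rule can be sketched as below, assuming the performance metric is strictly positive (a zero would need special handling before taking logarithms) and that larger is better. The numbers in the usage lines are made up for illustration; they show how the geometric mean prefers a consistently good embedding over one that is excellent under one setting and poor under another, even though the latter has the larger arithmetic mean.

```python
import math

def best_embedding(perf):
    """perf maps an embedding id to its metric values, one per parameter setting.

    Returns the embedding with the largest geometric mean; values must be > 0.
    """
    def geometric_mean(values):
        # Averaging logarithms avoids underflow when many small values are multiplied.
        return math.exp(sum(math.log(x) for x in values) / len(values))
    return max(perf, key=lambda emb: geometric_mean(perf[emb]))

# Hypothetical values for three embeddings under two parameter settings.
perf = {"emb0": [0.10, 0.40], "emb1": [0.25, 0.25], "emb2": [0.05, 0.90]}
print(best_embedding(perf))  # emb1 (geometric means 0.20, 0.25, about 0.21)
```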
Our choice of chain strength $C$ for each embedded instance was (over)estimated by solving each problem for increasing values of $C$ until a ground state without broken chains was found.
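The chain-strength search can be sketched on a toy problem. Here exhaustive search stands in for the annealer, so it only works for a handful of qubits; the candidate values are hypothetical; and, to make the result deterministic, we accept $C$ only when every ground state has intact chains, a slightly more conservative test than stopping at the first unbroken ground state returned. The toy example embeds an antiferromagnetic triangle into a 4-cycle, where the ground state breaks the chain exactly when $C < 1$ and $C = 1$ is a tie.

```python
from itertools import product

def ising_energy(h, J, s):
    return (sum(h.get(q, 0.0) * s[q] for q in s)
            + sum(w * s[p] * s[q] for (p, q), w in J.items()))

def ground_states(h, J, qubits, tol=1e-9):
    """All minimum-energy states by exhaustive search (tiny problems only)."""
    best, states = None, []
    for spins in product((-1, 1), repeat=len(qubits)):
        s = dict(zip(qubits, spins))
        e = ising_energy(h, J, s)
        if best is None or e < best - tol:
            best, states = e, [s]
        elif abs(e - best) <= tol:
            states.append(s)
    return states

def chains_intact(s, embedding):
    return all(len({s[q] for q in chain}) == 1 for chain in embedding.values())

def pick_chain_strength(h, problem_J, chain_edges, embedding, candidates):
    """Smallest candidate C for which every ground state has unbroken chains."""
    qubits = sorted({q for chain in embedding.values() for q in chain})
    for C in sorted(candidates):
        J = dict(problem_J)
        J.update({e: -C for e in chain_edges})  # ferromagnetic chain couplers
        if all(chains_intact(s, embedding) for s in ground_states(h, J, qubits)):
            return C
    return None  # no candidate was large enough

# Antiferromagnetic triangle a-b-c embedded in the 4-cycle 0-1-2-3-0, chain a = {0, 1}.
embedding = {"a": [0, 1], "b": [2], "c": [3]}
problem_J = {(1, 2): 1.0, (2, 3): 1.0, (0, 3): 1.0}
print(pick_chain_strength({}, problem_J, [(0, 1)], embedding, [0.5, 1.0, 1.5, 2.0]))  # 1.5
```

With intact chains the best energy is $-C - 1$, while breaking the chain allows all three problem couplers to be satisfied at energy $C - 3$, which explains the threshold at $C = 1$.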