# Evidence for a Limited Quantum Speedup on a Quantum Annealer

###### Abstract

The observation of a quantum speedup remains an elusive objective for quantum computing. The D-Wave quantum annealing processors have been at the forefront of experimental attempts to address this goal, given their relatively large numbers of qubits and programmability. A complete determination of the optimal time-to-solution (TTS) using these processors has not been possible to date, preventing definitive conclusions about the presence of a speedup. The main technical obstacle has been the inability to verify an optimal annealing time within the available range. Here we overcome this obstacle and present a class of problem instances for which we observe an optimal annealing time using a D-Wave 2000Q processor with more than qubits. In doing so we are able to perform an optimal TTS benchmarking analysis, in comparison to several classical algorithms that implement the same algorithmic approach: single-spin simulated annealing, spin-vector Monte Carlo, and discrete-time simulated quantum annealing. We establish the first example of a limited quantum speedup for an experimental quantum annealer: we find that the D-Wave device exhibits certifiably better scaling than both simulated annealing and spin-vector Monte Carlo, with confidence, over the range of problem sizes that we can test. However, we do not find evidence for an unqualified quantum speedup: simulated quantum annealing exhibits the best scaling by a significant margin. Our construction of instance classes with verifiably optimal annealing times opens up the possibility of generating many new such classes, paving the way for further definitive assessments of speedups using current and future quantum annealing devices.

The elusive and tantalizing goal of experimentally demonstrating a quantum speedup is being actively pursued using a variety of quantum computing platforms. The holy grail is an exponential speedup, such as that believed to be attainable with Shor’s algorithm for factoring integers Shor (20-22 Nov 1994), or with the simulation of quantum systems R.P. Feynman (1982); Lloyd (1996); Cirac and Zoller (2012). However, by necessity all experimental demonstrations are carried out on physical systems of finite size, as measured in terms of the number of qubits. The accuracy threshold theorem provides a theoretical guarantee that for sufficiently low noise levels and through the use of quantum error correction, a finite-size device can be scaled up fault tolerantly Aharonov et al. (2006). In the absence of an asymptotic guarantee provided by fault-tolerance, a finite-size device provides evidence of what can be expected at larger, future sizes, provided the device temperature, coupling to the environment, and calibration and accuracy errors, can be appropriately scaled down. With this caveat in mind, how do we benchmark a quantum speedup on a finite-size device? This was addressed in Ref. Rønnow et al. (2014), which defined several types of quantum speedup. Of most direct interest to us here is the notion of a limited quantum speedup, which arises when comparing an algorithm run on quantum hardware with classical algorithms that ‘correspond’ to the quantum algorithm in the sense that they implement the same algorithmic approach, but on classical hardware. This is a weakened notion of a quantum speedup, in the sense that it permits other classical algorithms to outperform the quantum algorithm. Here we report on the first observation of a limited quantum speedup, which we obtain by comparing simulated annealing with single-spin updates (SA) Kirkpatrick et al. (1983) and the spin-vector Monte Carlo (SVMC) algorithm Shin et al. (2014) to quantum annealing devices Johnson et al. (2011) with more than qubits. These devices are designed to represent physical implementations of quantum annealing (QA) Kadowaki and Nishimori (1998); Das and Chakrabarti (2008) and the quantum adiabatic algorithm (QAA) algorithm Farhi et al. (2001); Albash and Lidar (2016) for combinatorial optimization problems. We do not find a speedup compared to discrete-time simulated quantum annealing (SQA) Santoro et al. (2002).

A great deal of work has already been invested in the design of suitable quantum speedup benchmarks for optimization via quantum annealing Boixo et al. (2014); Rønnow et al. (2014); King et al. (2015a); Vinci and Lidar (2016) and the identification of suitable classes of problem instances Hen et al. (2015); King et al. (2015b, 2017); Mandrà et al. (2016); Katzgraber et al. (2015, 2014); Denchev et al. (2016). Despite the large body of work on benchmarking quantum annealers, conclusive evidence about how their performance scales has been unattainable. The primary reason, as we shall discuss in detail, is that it has not been possible to identify an optimal annealing time for any class of problem instances. Here we show how this obstacle can be overcome, and present for the first time a complete algorithmic scaling analysis of a quantum annealer, up to the largest available problem size. Our work provides a clear path forward towards a rigorous assessment of the possibility of attaining an unqualified quantum speedup using such devices.

We consider the standard setting where the goal of the optimizer is to find the optimal solution (i.e., the global minimum of the cost function) and one is interested in minimizing the time taken to find the solution at least once. There is a tradeoff between finding the solution with a high probability in a single long run of the algorithm, and running the algorithm multiple times with a shorter runtime and (usually) a smaller single-run success probability. This tradeoff is reflected in the time-to-solution (TTS) metric, which measures the time required to find the ground state at least once with some desired probability (often taken to be ):

(1) |

Here is the success probability of a single-instance run of the algorithm with a runtime , and is the required number of runs; success means that the optimal solution was found. The instance size is , and is the size of the largest instance that the device accommodates (typically set by the total number of qubits); the factor accounts for maximal parallel utilization of the device (see Appendix A for more details).

However, when considering the performance of an algorithm evaluated over an ensemble of randomly chosen instances from the same class, we are typically interested not in the TTS of a single instance but in a given quantile of the TTS distribution over such instances at a given problem size . We denote the -th quantile of the TTS evaluated at by , and suppress the dependence for simplicity. Since the goal of optimization is to find the solution as rapidly as possible, there is an optimal value for a given quantile , , where is minimized, and we denote . It is important to note that since quantiles are solver-dependent, a comparison between different solvers involves different sets of instances.

We can now state the precise nature of the critical obstacle alluded to above: if , where is the smallest possible annealing time on the given quantum annealer, then it becomes impossible to determine . As was shown in Refs. Rønnow et al. (2014); Hen et al. (2015), when operating with a suboptimal , one can easily be led to false conclusions about the scaling with of compared to the all-important scaling as captured by , and even be led to conclude that there is a quantum speedup where there is none (see Appendix A for an example).

None of the experimental quantum annealing benchmarking studies to date Boixo et al. (2014); Rønnow et al. (2014); King et al. (2015a); Vinci and Lidar (2016); Hen et al. (2015); King et al. (2015b); Mandrà et al. (2016); Katzgraber et al. (2015, 2014); Denchev et al. (2016); King et al. (2017) have provided a complete scaling assessment, precisely because it has not been possible to verify that (for any quantile). The culprit was the absence of a suitable class of problem instances for which an optimal annealing time could be verified. Here we report on a class of instances that exhibits an optimal annealing time greater than s on the D-Wave 2000Q (DW2KQ, fourth generation). This allows us to obtain the first complete optimal-TTS scaling results for an experimental quantum annealer, defined as the TTS scaling obtained from certifiably optimal annealing times.

The D-Wave processors used in our study are designed to implement quantum annealing using a transverse field Ising Hamiltonian:

(2) |

where , and is the Ising, or ‘problem’ Hamiltonian whose ground state we are after. The and are the Pauli matrices acting on superconducting flux qubits that occupy the vertices of a ‘Chimera’ hardware graph with edge set Choi (2008); Bunyk et al. (Aug. 2014) and the local fields and couplings are programmable analog parameters. The system is initialized in or near the ground state of the initial Hamiltonian , and the annealing schedules and are described in Appendix B, along with further technical and operational details, including a schematic of the Chimera graph. The DW2KQ processor comprises unit cells, so we can consider subgraphs up to for our analysis, where each subgraph comprises unit cells, and each complete unit cell comprises qubits (a small number of unit cells are incomplete, as a total of out qubits are inoperative).

Before describing the new instance class, we first present the evidence for optimal annealing times in Fig. 1. Figure 1 shows the TTS for a single representative instance. The unambiguous minimum at s is the optimal annealing time for this instance. The presence of a minimum is a robust feature: Fig. 1 shows that the optimal annealing time feature persists for the median TTS (), at all sizes . In all previous benchmarking work, only the rise in as a function of was observed, i.e., was always below , thus precluding the identification of an optimal annealing time.

Having presented the evidence for optimality, we next introduce the instance class with this property. In order to construct random problem instances suitable for benchmarking optimizers at ever-growing sizes, it is useful to consider ‘planted’ solution instances, as this guarantees that the ground state energy is known in advance. The method builds the problem Hamiltonian as a sum of frustrated loop Hamiltonians , such that

(3) |

itself is ‘frustration-free’, i.e., the planted solution is the simultaneous ground state of all terms and hence is the ground state of Hen et al. (2015). Without loss of generality, we can always pick the planted solution to be the all-zero bit (all-spins-up) configuration.

We consider planted solutions defined on the logical graph formed by the complete unit cells of the D-Wave hardware graph (i.e., without faulty qubits or couplers; in the case of an ideal Chimera graph, this would form a square lattice) King et al. (2017). Frustrated loops are then built on this logical graph, where logical couplings between adjacent unit cells are imposed only when all four physical inter-unit cell couplings are available. The intra-unit cell couplers are then all set to be ferromagnetic, guaranteeing that the planted-solution on the hardware graph is the planted-solution on the logical graph with all physical spins in the unit cell set to their corresponding logical spin value. We refer to these as ‘logical-planted’ instances. In Appendix C we introduce ‘hardware-planted’ instances and demonstrate that they also exhibit an optimal annealing time.

Previous studies of planted-solution instances did not exhibit an optimal annealing time on the D-Wave processors Hen et al. (2015); King et al. (2015b, 2017). We thus introduce a modification and supplement the planted-solution instance Hamiltonian [Eq. (3)] with additional terms corresponding to the addition of -qubit ‘gadgets’ that are placed into randomly chosen unit cells:

(4) |

Here denotes the gadget Hamiltonian in unit cell , and denotes a randomly chosen subset comprising a fraction of complete unit cells (we use ). If the ground state of the gadget Hamiltonian is also the all-zero bit state, then the ground state of the full Hamiltonian remains the all-zero bit state. The gadget parameters and complete details of our instance construction are given in Appendix C.

It is instructive to contrast the scaling behavior with and without our gadget. For each instance, we can fit the success probability to a power law of the form (see Appendix D for the fit quality). We show in Fig. 1 the distribution of the scaling coefficient for instances for the logical-planted instances with and without the gadget. The distribution is substantially different in the two cases: the instances with the gadget exhibit larger coefficients, almost all with a value greater than , resulting in a significantly larger initial rate of increase in than for the instances without the gadget. This, in turn, leads to the observed initial decrease in the TTS with increasing annealing time: upon expanding the logarithm in Eq. (1) for small , we find that , so that decreases with provided . Since the TTS must eventually increase with the annealing time [for sufficiently large , , at which point ], this helps to explain why the instances with the gadget exhibit an optimal annealing time.

Having established accessible optimal annealing times (s) for the logical-planted instances, we are now finally ready to present a complete optimal-TTS scaling analysis. Our results for the dependence of on problem size are shown in Fig. 2, where we compare the DW2KQ results to three classical algorithms: simulated annealing with single-spin updates (SA) Kirkpatrick et al. (1983), SVMC (also known as SSSV, the author initials in Ref. Shin et al. (2014)), and SQA based on discrete-time path-integral quantum Monte Carlo Santoro et al. (2002). We discuss our implementation and timing of these algorithms in Appendix E. Table 1 summarizes the performance of each algorithm in terms of the coefficients of exponential and polynomial fits, respectively, for three quantiles (a hybrid polynomial-exponential fit does not work as well; see Appendix G).

(a) Solver | |||
---|---|---|---|

DW2KQ | |||

SA | |||

SVMC | |||

SQA | |||

(b) Solver | |||

DW2KQ | |||

SA | |||

SVMC | |||

SQA |

The results presented in Table 1 demonstrate an unequivocal (95% confidence) limited quantum speedup for the DW2KQ over both SA and SVMC in the case of the logical-planted instances, for the median and the easier quantile, while for the harder quantile, we still observe a limited quantum speedup relative to SA, but the scaling of SVMC and the DW2KQ are statistically indistinguishable. This represents the first observation of a limited quantum speedup on an experimental quantum annealer. However, the SQA algorithm has the best scaling by a significant margin, so that the limited (sequential Mandrà et al. (2016)) quantum speedup we observe is definitively not an unqualified quantum speedup. These results are robust under a change of the speedup metric to the so-called ‘quantile-of-ratios’ Rønnow et al. (2014) (see Appendix H). We also note that of all the solvers we tested, the scaling of SQA and the DW2KQ increases fastest from the easy to the hard quantile.

What is the significance of this demonstration of a limited quantum speedup? The SA and SVMC algorithms can both be viewed as implementing two possible classical limits of quantum annealing. While it is unclear whether the quantum effects in the D-Wave devices that have already been demonstrated on a smaller scale () Boixo et al. (2013, 2014); Albash et al. (2015a, b); Lanting et al. (2014); Johnson et al. (2011); Denchev et al. (2016); Boixo et al. (2016) remain operative at the much larger scales we have employed in our study, our results suggest that quantum effects that are necessarily absent from SA and SVMC, and that might be present in the quantum annealer, can provide an advantage. This is especially significant since there is a large overlap between the instances solved at the median quantile by the DW2KQ and the other solvers, as shown in Fig. 3. This means that if any quantum effects are responsible for the scaling advantage of the DW2KQ over the SA and SVMC algorithms, then they are operative in largely the same set of problem instances, so that these instances may define a target class for quantum enhanced optimization.

It is perhaps tempting to conclude that the SQA algorithm, which can sometimes model quantum effects such as tunneling Isakov et al. (2016); Jiang et al. (2017); Andriyash and Amin (2017)), is indicative of quantum effects in the DW2KQ device. However, we emphasize that the discrete-time SQA algorithm studied here as a solver (rather than a simulator) should not be interpreted as a model of a true analog quantum annealer; in fact, it has been demonstrated that time-discretization may result in improved performance over the continuum case Heim et al. (2015). This is consistent with the superior performance of the SQA algorithm we have observed. One way to test whether quantum effects are responsible for the observed limited quantum speedup is to perform quantum Monte Carlo simulations without Trotter errors Rieger and Kawashima (1999); Sandvik (2003); Albash et al. (2017); unfortunately, at the qubits scale we have worked with here, this is computationally prohibitive. The same is true for master equation simulations Albash et al. (2012); Boixo et al. (2016).

We emphasize that the instances presented here are not necessarily computationally hard, as suggested by the fact that, considering the entire range of sizes we tested, the quality of the polynomial fits is better than that of the exponential fits (see Fig. 2). The logical-planted instances, in the absence of the gadget, are defined on a square lattice and can be solved in polynomial time using the exact minimum-weight perfect-matching algorithm Mandrà et al. (2017). Therefore, it is natural that algorithms optimized with respect to the problem structure demonstrate superior performance. For example, simulated annealing with both single and multi-spin updates (SAC), with the latter being simultaneous updates of all the spins comprising a unit cell (super-spin approximation Mandrà et al. (2016)), scales nearly as well as SQA, as we show in Appendix J. Furthermore, there are many other classical algorithms that do not implement the same algorithmic approach as quantum annealing, such as the Hamze-Freitas-Selby (HFS) Hamze and de Freitas (2004); Selby (2014) algorithm. The latter exploits the low tree-width of the Chimera connectivity graph and has in all studies to date been the top performer for Chimera-type instances. We find that HFS’s scaling performance lies between the DW2KQ and SAC (see Appendix J). We can expect that a more highly connected hardware graph will prevent algorithms such as HFS or SAC from being efficient; which architectures may lend themselves to an unqualified quantum speedup remains an open research question. Meanwhile, our approach defines a clear path forward by concretely establishing the possibility of generating instance classes with accessible optimal annealing times, amenable to a complete scaling analysis.

We thank D-Wave Systems Inc. for providing access to the DW2KQ device in Burnaby, Canada. TA thanks James and Andrew King for useful discussions on the GPU implementations of the classical algorithms. This work was supported under ARO grant number W911NF-12-1-0523, ARO MURI Grant Nos. W911NF- 11-1-0268 and W911NF-15-1-0582, and NSF grant number INSPIRE-1551064. This research used resources provided by the USC Center for High Performance Computing and Communications and by the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725.

## References

- Shor (20-22 Nov 1994) P. W. Shor, “Algorithms for quantum computation: discrete logarithms and factoring,” Foundations of Computer Science, 1994 Proceedings., 35th Annual Symposium on, 35th Annual Symposium on Foundations of Computer Science, 1994 Proceedings , 124–134 (20-22 Nov 1994).
- R.P. Feynman (1982) R.P. Feynman, “Simulating Physics with Computers,” Intl. J. Theor. Phys. 21, 467 (1982).
- Lloyd (1996) Seth Lloyd, “Universal quantum simulators,” Science 273, 1073–1078 (1996).
- Cirac and Zoller (2012) J. Ignacio Cirac and Peter Zoller, “Goals and opportunities in quantum simulation,” Nat. Phys. 8, 264–266 (2012).
- Aharonov et al. (2006) D. Aharonov, A. Kitaev, and J. Preskill, “Fault-tolerant quantum computation with long-range correlated noise,” Phys. Rev. Lett. 96, 050504 (2006).
- Rønnow et al. (2014) Troels F. Rønnow, Zhihui Wang, Joshua Job, Sergio Boixo, Sergei V. Isakov, David Wecker, John M. Martinis, Daniel A. Lidar, and Matthias Troyer, “Defining and detecting quantum speedup,” Science 345, 420–424 (2014).
- Kirkpatrick et al. (1983) S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, “Optimization by simulated annealing,” Science 220, 671–680 (1983).
- Shin et al. (2014) Seung Woo Shin, Graeme Smith, John A. Smolin, and Umesh Vazirani, “How “quantum” is the D-Wave machine?” arXiv:1401.7087 (2014).
- Johnson et al. (2011) M. W. Johnson, M. H. S. Amin, S. Gildert, T. Lanting, F. Hamze, N. Dickson, R. Harris, A. J. Berkley, J. Johansson, P. Bunyk, E. M. Chapple, C. Enderud, J. P. Hilton, K. Karimi, E. Ladizinsky, N. Ladizinsky, T. Oh, I. Perminov, C. Rich, M. C. Thom, E. Tolkacheva, C. J. S. Truncik, S. Uchaikin, J. Wang, B. Wilson, and G. Rose, “Quantum annealing with manufactured spins,” Nature 473, 194–198 (2011).
- Kadowaki and Nishimori (1998) Tadashi Kadowaki and Hidetoshi Nishimori, “Quantum annealing in the transverse Ising model,” Phys. Rev. E 58, 5355 (1998).
- Das and Chakrabarti (2008) Arnab Das and Bikas K. Chakrabarti, “Colloquium: Quantum annealing and analog quantum computation,” Rev. Mod. Phys. 80, 1061–1081 (2008).
- Farhi et al. (2001) Edward Farhi, Jeffrey Goldstone, Sam Gutmann, Joshua Lapan, Andrew Lundgren, and Daniel Preda, “A Quantum Adiabatic Evolution Algorithm Applied to Random Instances of an NP-Complete Problem,” Science 292, 472–475 (2001).
- Albash and Lidar (2016) Tameem Albash and Daniel A. Lidar, “Adiabatic quantum computing,” arXiv:1611.04471 (2016).
- Santoro et al. (2002) Giuseppe E. Santoro, Roman Martoňák, Erio Tosatti, and Roberto Car, “Theory of quantum annealing of an Ising spin glass,” Science 295, 2427–2430 (2002).
- Boixo et al. (2014) Sergio Boixo, Troels F. Ronnow, Sergei V. Isakov, Zhihui Wang, David Wecker, Daniel A. Lidar, John M. Martinis, and Matthias Troyer, “Evidence for quantum annealing with more than one hundred qubits,” Nat. Phys. 10, 218–224 (2014).
- King et al. (2015a) James King, Sheir Yarkoni, Mayssam M. Nevisi, Jeremy P. Hilton, and Catherine C. McGeoch, “Benchmarking a quantum annealing processor with the time-to-target metric,” arXiv:1508.05087 (2015a).
- Vinci and Lidar (2016) Walter Vinci and Daniel A. Lidar, “Optimally stopped optimization,” Physical Review Applied 6, 054016– (2016).
- Hen et al. (2015) Itay Hen, Joshua Job, Tameem Albash, Troels F. Rønnow, Matthias Troyer, and Daniel A. Lidar, “Probing for quantum speedup in spin-glass problems with planted solutions,” Phys. Rev. A 92, 042325 (2015).
- King et al. (2015b) Andrew D. King, Trevor Lanting, and Richard Harris, “Performance of a quantum annealer on range-limited constraint satisfaction problems,” arXiv:1502.02098 (2015b).
- King et al. (2017) James King, Sheir Yarkoni, Jack Raymond, Isil Ozfidan, Andrew D. King, Mayssam Mohammadi Nevisi, Jeremy P. Hilton, and Catherine C. McGeoch, “Quantum annealing amid local ruggedness and global frustration,” arXiv:1701.04579 (2017).
- Mandrà et al. (2016) Salvatore Mandrà, Zheng Zhu, Wenlong Wang, Alejandro Perdomo-Ortiz, and Helmut G. Katzgraber, ‘‘Strengths and weaknesses of weak-strong cluster problems: A detailed overview of state-of-the-art classical heuristics versus quantum approaches,” Physical Review A 94, 022337– (2016).
- Katzgraber et al. (2015) Helmut G. Katzgraber, Firas Hamze, Zheng Zhu, Andrew J. Ochoa, and H. Munoz-Bauza, “Seeking quantum speedup through spin glasses: The good, the bad, and the ugly,” Phys. Rev. X 5, 031026– (2015).
- Katzgraber et al. (2014) Helmut G. Katzgraber, Firas Hamze, and Ruben S. Andrist, “Glassy chimeras could be blind to quantum speedup: Designing better benchmarks for quantum annealing machines,” Phys. Rev. X 4, 021008– (2014).
- Denchev et al. (2016) Vasil S. Denchev, Sergio Boixo, Sergei V. Isakov, Nan Ding, Ryan Babbush, Vadim Smelyanskiy, John Martinis, and Hartmut Neven, “What is the computational value of finite-range tunneling?” Phys. Rev. X 6, 031015 (2016).
- Choi (2008) Vicky Choi, “Minor-embedding in adiabatic quantum computation: I. The parameter setting problem,” Quant. Inf. Proc. 7, 193–209 (2008).
- Bunyk et al. (Aug. 2014) P. I Bunyk, E. M. Hoskinson, M. W. Johnson, E. Tolkacheva, F. Altomare, AJ. Berkley, R. Harris, J. P. Hilton, T. Lanting, AJ. Przybysz, and J. Whittaker, ‘‘Architectural considerations in the design of a superconducting quantum annealing processor,” IEEE Transactions on Applied Superconductivity 24, 1–10 (Aug. 2014).
- Boixo et al. (2013) Sergio Boixo, Tameem Albash, Federico M. Spedalieri, Nicholas Chancellor, and Daniel A. Lidar, “Experimental signature of programmable quantum annealing,” Nat. Commun. 4, 2067 (2013).
- Albash et al. (2015a) T. Albash, T. F. Rønnow, M. Troyer, and D. A. Lidar, “Reexamining classical and quantum models for the D-Wave One processor,” Eur. Phys. J. Spec. Top. 224, 111–129 (2015a).
- Albash et al. (2015b) Tameem Albash, Walter Vinci, Anurag Mishra, Paul A. Warburton, and Daniel A. Lidar, “Consistency tests of classical and quantum models for a quantum annealer,” Phys. Rev. A 91, 042314– (2015b).
- Lanting et al. (2014) T. Lanting, A. J. Przybysz, A. Yu. Smirnov, F. M. Spedalieri, M. H. Amin, A. J. Berkley, R. Harris, F. Altomare, S. Boixo, P. Bunyk, N. Dickson, C. Enderud, J. P. Hilton, E. Hoskinson, M. W. Johnson, E. Ladizinsky, N. Ladizinsky, R. Neufeld, T. Oh, I. Perminov, C. Rich, M. C. Thom, E. Tolkacheva, S. Uchaikin, A. B. Wilson, and G. Rose, “Entanglement in a quantum annealing processor,” Phys. Rev. X 4, 021041– (2014).
- Boixo et al. (2016) Sergio Boixo, Vadim N. Smelyanskiy, Alireza Shabani, Sergei V. Isakov, Mark Dykman, Vasil S. Denchev, Mohammad H. Amin, Anatoly Yu Smirnov, Masoud Mohseni, and Hartmut Neven, “Computational multiqubit tunnelling in programmable quantum annealers,” Nat Commun 7 (2016).
- Isakov et al. (2016) Sergei V. Isakov, Guglielmo Mazzola, Vadim N. Smelyanskiy, Zhang Jiang, Sergio Boixo, Hartmut Neven, and Matthias Troyer, “Understanding Quantum Tunneling through Quantum Monte Carlo Simulations,” Physical Review Letters 117, 180402– (2016).
- Jiang et al. (2017) Zhang Jiang, Vadim N. Smelyanskiy, Sergei V. Isakov, Sergio Boixo, Guglielmo Mazzola, Matthias Troyer, and Hartmut Neven, “Scaling analysis and instantons for thermally assisted tunneling and quantum Monte Carlo simulations,” Physical Review A 95, 012322– (2017).
- Andriyash and Amin (2017) Evgeny Andriyash and Mohammad H. Amin, “Can quantum Monte Carlo simulate quantum annealing?” arXiv:1703.09277 (2017).
- Heim et al. (2015) Bettina Heim, Troels F. Rønnow, Sergei V. Isakov, and Matthias Troyer, “Quantum versus classical annealing of Ising spin glasses,” Science 348, 215–217 (2015).
- Rieger and Kawashima (1999) H. Rieger and N. Kawashima, “Application of a continuous time cluster algorithm to the two-dimensional random quantum Ising ferromagnet,” Eur. Phys. J. B 9, 233–236 (1999).
- Sandvik (2003) Anders W. Sandvik, “Stochastic series expansion method for quantum Ising models with arbitrary interactions,” Phys. Rev. E 68, 056701 (2003).
- Albash et al. (2017) Tameem Albash, Gene Wagenbreth, and Itay Hen, “Off-Diagonal Expansion Quantum Monte Carlo,” arXiv:1701.01499 (2017).
- Albash et al. (2012) Tameem Albash, Sergio Boixo, Daniel A Lidar, and Paolo Zanardi, “Quantum adiabatic Markovian master equations,” New J. of Phys. 14, 123016 (2012).
- Mandrà et al. (2017) Salvatore Mandrà, Helmut G. Katzgraber, and Creighton Thomas, “The pitfalls of planar spin-glass benchmarks: Raising the bar for quantum annealers (again),” arXiv:1703.00622 (2017).
- Hamze and de Freitas (2004) Firas Hamze and Nando de Freitas, “From fields to trees,” in UAI, edited by David Maxwell Chickering and Joseph Y. Halpern (AUAI Press, Arlington, Virginia, 2004) pp. 243–250.
- Selby (2014) Alex Selby, “Efficient subgraph-based sampling of Ising-type models with frustration,” arXiv:1409.3934 (2014).
- Bravyi and Terhal (2009) S. Bravyi and B. Terhal, “Complexity of stoquastic frustration-free hamiltonians,” SIAM Journal on Computing, SIAM Journal on Computing 39, 1462–1485 (2009).
- King (2016) James King, “Simulating a Quantum Annealer with GPU-Based Monte Carlo Algoritms,” in GTC 2016, S6380 (2016).
- Hastings (1970) W. K. Hastings, “Monte Carlo sampling methods using Markov chains and their applications,” Biometrika 57, 97–109 (1970).
- Metropolis et al. (1953) Nicholas Metropolis, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller, “Equation of state calculations by fast computing machines,” The Journal of Chemical Physics 21, 1087–1092 (1953).
- Wolff (1989) Ulli Wolff, “Collective Monte Carlo updating for spin systems,” Phys. Rev. Lett. 62, 361–364 (1989).
- Selby (2013) Alex Selby, “Prog-QUBO,” https://github.com/alex1770/QUBO-Chimera (2013).

## Appendix A Time to Solution (TTS)

For a derivation of the TTS expression (1) in the main text see, e.g., the Supplementary Materials of Ref. Rønnow et al. (2014). Technically, the number of repetitions is defined as , but we do not include the ceiling operation in the calculation of in this work, since this can result in sharp TTS changes that complicate the extraction of a scaling. Similarly, the ratio should be . The TTS should in principle also include all time-costs accrued by running the algorithm multiple times, such as state initialization and state readout times, as well as multiple programming times if different gauges are used [see Eq. (5) below]. We do not include these here either, because at least on the D-Wave processors, the readout time is several factors larger than the annealing time and hence can effectively mask the scaling behavior. Instead, we restrict to be the runtime between state preparation and readout for all our algorithms.

To see how an analysis of the TTS that does not account for optimal annealing times can lead one astray, consider the following extreme example: suppose is too large at all problem sizes , such that always suffices to find the global minimum; in this case for all (i.e., is constant except for the parallelization factor ), which except for trivial problems, must obviously be false.

## Appendix B The D-Wave quantum annealers

We used the D-Wave 2000Q (DW2KQ) processor housed at Burnaby, that features functional qubits and programmable couplers. We also used the DW2X processor housed at USC/ISI, that features functional qubits and programmable couplers. The annealing schedules of the two devices are shown in Fig. 4. The minimum annealing times for all D-Wave processors involved in benchmarking studies to date are: s for the D-Wave One, D-Wave Two X, and D-Wave 2000Q, and s for the D-Wave Two. The Chimera hardware graphs of the DW2KQ and DW2X processors we used in this work are shown in Fig. 5.

For each instance, we ran random gauges (also known as spin-reversal transforms). A gauge is the application of a particular bit-flip transformation to the operators in the problem Hamiltonian, i.e., , where each . This transformation does not change the eigenvalues of the transverse field Hamiltonian, and it is meant to minimize the effect of local biases and precision errors on the device Boixo et al. (2013). For each gauge we took readouts, unless constrained by s. For example for ms, we only took readouts per gauge.

In the main text we focused on the annealing time. There are several other relevant timescales that we present here for completeness. We used the default initial state preparation time (post-programming thermalization time) of s. The readout time for the DW2KQ is s. A complete characterization of the required runtime (the “wall-clock time”) would include the thermalization and readout times in each independent run of the quantum annealer. Furthermore, since we program the same instance multiple times using different gauges, the programming time of s needs to be accounted for. In total, the wall-clock TTS would be given by:

(5) |

where is the number of gauges, and is the total number of runs, divided equally among the gauges. However, since these timescales can be much larger than the optimal annealing time, they can mask the scaling of the TTS, and hence we focus just on the annealing time, as in previous work Boixo et al. (2014); Rønnow et al. (2014). In principle, the initial state preparation time can be reduced and optimized along with the annealing time if included as part of the TTS, but we have not explored in this work how this impacts performance.

## Appendix C Gadget and Instance construction

### c.1 The eight qubit gadget

The key ingredient in our study is an eight qubit ‘gadget’ that fits into the unit cell of the D-Wave processors, which has a bipartite graph connectivity. The gadget parameters are shown in Fig. 6. The ground state of the gadget is the all-zero state of the eight qubits.

Figure 7 compares the results for the -qubit gadget on the two D-Wave processor generations, for different representative unit cells. The success probability exhibits a single maximum, with the peaks occurring at different annealing times on the two devices. While there is some variation in the magnitude of the success probability depending on which unit cell is used, the position of the peak remains robust. We note however that the position of the peak differs on the two devices (around on the DW2X and around on the DW2KQ), indicating that the physical characteristics of the two devices are different beyond simply having different connectivity graphs.

### c.2 Hardware-planted instances

Here we describe a class of instances we call “hardware-planted” (not discussed in the main text), that also exhibits an optimal annealing time within the accessible range of the DW2KQ, as we demonstrate below. The class is defined by constructing planted-solution instances on the hardware graph of the DW2KQ, shown in Fig. 5. This method builds an Ising Hamiltonian as a sum of frustrated loops, where the all-zero state is a ground state of all loops (somewhat confusingly, the Hamiltonian is thus ‘frustration-free’ in the terminology of Ref. Bravyi and Terhal (2009)). We picked (this value is approximately where the peak in hardness occurs for the HFS algorithm, described in Appendix J). We constructed loops as follows. Choose a random vertex on the graph as the starting vertex. From the (at most six) available edges connected to this vertex, randomly pick one. If this new vertex has not been visited already, it is added to the chain. Continue until the chain forms a loop by hitting a member of the chain. Only the loop and not the tail is kept. The loop is discarded if any of the couplings along the loop already have a value with magnitude three and if the loop does not visit at least two unit cells King et al. (2015b). The second condition means that the shortest possible loop includes six vertices (within each unit cell the degree of each vertex is four, but including other unit cells the degree is six, except for unit cells along the boundary of the Chimera graph, where the degree can be five). Along the loop, choose the couplings to satisfy the planted solution, i.e., set them all to be ferromagnetic. Then randomly pick a single coupling and flip it. The couplings along the loop are added to the already-present coupling values on the graph. This process is repeated until loops are generated for the chosen value of .

Finally, we randomly placed our gadget into complete unit cells (without faulty qubits or couplers), where in this work we set , and added these terms to the Ising Hamiltonian. The final Hamiltonian now has a maximum range of . Range is defined as follows: we choose the values of the Ising couplings from discrete values , with , and call the range Rønnow et al. (2014). The ground state of the final Hamiltonian remains the all-zero state because this state is the ground state of all loop and gadget terms in the Hamiltonian.

We provide in Fig. 8 analogous results to those in Fig. 1 of the main text for the hardware-planted instances. In Fig. 8, we show a representative instance at that exhibits an optimal annealing time above s. In Fig. 8, we show that the median TTS exhibits a clear minimum for sizes (no minimum was observed for ), which moves to higher annealing time values with increasing problem size. This is reflected in the distribution of instance optimal annealing times, as shown in Fig. 8. The steady increase in the hardness of the instances with problem size is reflected in the upward shift of the minimum TTS in Fig. 8.

Apart from the obvious difference of the existence of optimal annealing times at smaller sizes ( compared to )), the optimal annealing time is significantly higher for the hardware-planted instances than for the logical-planted instance class, as summarized in Fig. 9. The optimal annealing time is seen to increase with problem size in all cases, rising faster for the DW2KQ than for the DW2X, but eventually flattening for both types of problem instances. The increase is consistent with both the possibility of benefit from a longer adiabatic evolution time or from a longer thermal relaxation time at larger problem sizes.

In Fig. 10 we present the scaling results for the hardware-planted instances at three different quantiles, in analogy to Fig. 2 in the main text. The scaling coefficients are summarized in Table 2. We again find that SQA has the smallest scaling coefficient. The scaling coefficient of the DW2KQ is larger than that of all the classical solvers we tested, so for this class of instances we can definitively rule out the possibility of even a limited quantum speedup.

### c.3 Logical-planted instances

The logical-planted class of instances we discussed in the main text involves constructing planted-solution instances on the logical graph of the DW2KQ, shown in Fig. 5. The construction of the planted instance is similar to that of Ref. King et al. (2017) and follows the same procedure as the construction of the hardware-planted instances. We define the logical graph of the DW2KQ as being comprised of vertices corresponding to only the complete unit cells (with no faulty qubits or couplers). We also included one unit cell that was missing a single intra-cell coupler (unit cell ), since having this missing coupler does not change the analysis. This is a minor difference relative to Ref. King et al. (2017), where only complete unit cells were used. The edges of the logical graph correspond to having all four inter-cell couplers. We did remove the logical edge between unit cells and . On an ideal Chimera graph, this would form a square grid; on the DW2KQ used, the logical graph as we define it is shown in Fig. 5. We constructed an Ising Hamiltonian as a sum of frustrated loops, where here we picked . We again chose to plant the all-zero bit string. We constructed loops as follows. Choose a random vertex on the graph as the starting vertex, and randomly pick an available edge. If that edge edge does not already have magnitude three, add the vertex connecting it to the chain until a loop is formed. Continue until the chain forms a loop by hitting a member of the chain. Only the loop and not the tail is kept. Accept the loop if it includes more than vertices; this again means that the minimum loop has vertices. This then generates a planted-solution instance on the logical graph. In order to embed it on the hardware graph, turn on all the available couplings in the unit cell to be ferromagnetic with magnitude three. In the notation of Ref. King et al. (2017), this amounts to constructing instances with .

Finally, we randomly placed our gadget into a fraction of all the connected unit cells in the planted-solution instance, and added these terms to the Ising Hamiltonian. (The gadget on unit cell has the same ground state even with the one missing coupling.) The final Hamiltonian now has a maximum range of . Again, the ground state of the final Hamiltonian remains the all-zero state.

(a) Solver | |||
---|---|---|---|

DW2KQ | |||

SA | |||

SQA | |||

SVMC | |||

HFS | |||

SAC | |||

(b) Solver | |||

DW2KQ | |||

SA | |||

SQA | |||

SVMC | |||

HFS | |||

SAC |

(a) Solver | |||
---|---|---|---|

DW2KQ | |||

SA | |||

SVMC | |||

SQA | |||

HFS | |||

SAC | |||

(b) Solver | |||

DW2KQ | |||

SA | |||

SVMC | |||

SQA | |||

HFS | |||

SAC |

## Appendix D Success probability scaling

In Fig. 1 of the main text, we showed the power law scaling coefficient of the success probability extracted for . Here we provide supplemental data to support the quality of these fits. First, we show the data and fit in Fig. 11 for the instance depicted in Fig. 1 of the main text as well as its counterpart without the gadget. The data for this instance nicely agrees with a polynomial fit. In Fig. 11, we show that the uncertainty in the power law scaling coefficient for the majority of the instances is below , indicating that the polynomial fits to the data points are reasonable.

## Appendix E Simulation Parameters and Timing

Our implementation of the SA, SQA, and SVMC algorithms is based on the GPU implementation used in Ref. King et al. (2017) and described in more detail in Ref. King (2016). We briefly describe our CUDA implementation of these algorithms here for completeness. In what follows, a sweep is a single Monte Carlo update of all the spins. For all implementations, we use the default cuRAND random number generator (XORWOW). We compile the CUDA code using the ‘-use_fast_math’ flag, which, we note, may not be suitable for Monte Carlo simulations that require accurate calculations of thermal expectation values.

We first discuss our implementation of SA Kirkpatrick et al. (1983). Each GPU thread updates the eight spins in a single unit cell. Because the Chimera graph is bipartite, each thread updates the four spins in one partition followed by the four spins in the second partition. A key feature of the implementation is that the eight local fields, inter-cell couplers, and intra-cell couplers are stored in the memory registers of the GPU. Only the spin configuration is stored on local memory. This minimizes the cost of retrieving data from global memory. We use the GPU intrinsic math function for the calculation of the Metropolis acceptance probability in order to maximize the speed of the algorithm. As many copies as allowed by register memory are run in parallel in separate GPU blocks. Therefore, we have for the timing of SA:

(6) |

where is the number of sweeps and is the time required to perform a single sweep. Because depends on the total number of threads ( and hence on the problem size, this can be equivalently written as:

(7) |

where is the number of total spin updates per unit time performed by the GPU. For consistency we use the timings reported in Ref. King (2016) for runs performed on an NVIDIA GTX 980, which have ns. For SA, we use a linear annealing schedule in from to (in units where the maximum Ising coupling strength is ).

The implementation of SVMC follows the same structure as SA, except that the spin configuration is replaced by angles Shin et al. (2014). The energy potential along the anneal is given by:

(8) |

where we use the DW2X annealing schedule for and shown in Fig. 4. An update involves drawing a random angle , and it is accepted according to the Metropolis-Hastings rule Hastings (1970); Metropolis et al. (1953) with (in units where the maximum Ising coupling strength is ). We use the GPU intrinsic math function for the calculation of the cosine, sine, and Metropolis acceptance probability in order to maximize the speed of the algorithm. The timing of SVMC is the same as in Eq. (7) but with ns replacing .

The implementation of SQA follows the same structure as SA. We restrict the Trotter slicing to in order to fit the spins along the imaginary-time direction into a -bit word. A sweep involves performing a single Wolff cluster update Wolff (1989) along the imaginary time direction for each spin. Once a cluster of spins is picked, it is flipped according to the Metropolis-Hastings rule using the Ising energy of the cluster with . We use the GPU intrinsic math function for the calculation of the Metropolis acceptance probability in order to maximize the speed of the algorithm. At the end of the anneal, one of the slices is picked randomly as the final classical state. The timing of SQA is the same as in Eq. (7) but with ns replacing . We note that increasing the number of Trotter slices, while decreasing the Trotter error, appears to reduce the final success probability for one of the instances we have checked (see Fig. 12, where the peak success probability occurs for slices), an effect noted in Ref. Heim et al. (2015). Studying this effect over the entire set of instances is computationally prohibitive at the qubits scale we have worked with here.

## Appendix F Determining the optimal annealing time and optimal TTS

In order to determine the position of the optimal annealing time and its associated (we drop the quantile notation for simplicity) at a given size for the DW2KQ, DW2X, SA, SQA, and SVMC results, we fit the TTS data for different annealing times to a function of the form . The value of gives the value of , and the value of is the associated .

The fit values and their confidence intervals are given in Tables 5 (logical-planted) and 6 (hardware-planted) for the DW2KQ, in Table 7 for the DW2X (hardware-planted only, since logical-planted instances did not exhibit an accessible optimal annealing time on the DW2X), in Tables 8 and 9 for SA, in Tables 10 and 11 for SQA, and in Tables 12 and 13 for SVMC.

Results of this fitting procedure for the DW2KQ results on the logical-planted instances are shown in Fig. 1 of the main text and on the hardware-planted instances in Fig. 8. The fits for the DW2X on the hardware-planted instances are shown in Fig. 13. Because the largest problem size we can program on the DW2X is at , we studied only hardware-planted instances on this device. Note that because the hardware graph of the DW2X differs from that of the DW2KQ [see Figs. 5 and 5], we should not assume that the instances defined on both are necessarily from the same class.

## Appendix G Alternative fits

In Fig. 2 of the main text and in Figs. 10 (see also Fig. 15 below), we present exponential and polynomial fits to the optimal TTS as a function of . Here we show that a hybrid three-parameter fit, i.e., , does not give reasonable fits with good confidence bounds for all solvers. We restrict our attention to the hardware-planted instances, since in that case we have sizes for the fit. We show in Table 4 the results of the fits for the median, where we see that the estimate for the exponential scaling coefficient of the classical solvers is especially poor, likely due an insufficient number of data points for a three-parameter fit.

Solver | |||
---|---|---|---|

DW2KQ | |||

SA | |||

SQA | |||

SVMC |

## Appendix H Quantile of Ratios

The benchmarking analysis we performed in the main text is akin to the ‘ratio-of-quantiles’ comparison performed in Ref. Rønnow et al. (2014), where an alternative metric for speedups was also defined, called the ‘quantile-of-ratios’. For this case, we find the annealing time that minimizes the TTS for each instance individually, and the per-instance optimal TTS, denoted , is the minimal TTS for each instance individually. For each instance, the ratio of for two different solvers is calculated, and different quantiles over the set of ratios is taken. We show in Fig. 14 the results for the median ratio using the logical-planted instances. The limited quantum speedup of the DW2KQ relative to SA continues to hold, and SQA continues to exhibit the best scaling. The situation with respect to SVMC is inconclusive due to the slight drop of the datapoint.

## Appendix I Calculating the normalized overlap of instances

In Fig. 3 of the main text, we showed the overlap of the logical-planted instances below the median between the classical solvers and the DW2KQ. In order to calculate this quantity, we first fit the of each instance to the function , and we evaluate the function at the optimal annealing time for the median TTS. This gives us a mean value and its associated 1 error for the -th instance. We then perform bootstraps over instances, where for each bootstrap we generate two sets of normally distributed random numbers in order to calculate two TTS realizations for each instance, i.e. . For the two sets of TTS realizations, we calculate the median TTS and find which instances have a TTS below the median. We calculate the overlap fraction of instances between two solvers and for realizations and respectively, which we denote by . The normalized fraction between a solver and the DW2KQ is then given by:

(9) |

The normalization ensures that even with the noisy realization of the TTS, the overlap of instances between a solver and itself is one.

## Appendix J Comparison to the HFS and SAC algorithm

In the main text, we did not make comparisons to the HFS algorithm because it does not implement the same algorithmic approach as the other annealing algorithms. Nevertheless, because it is an algorithm tailored to solve spin-glass problems on the Chimera architecture, it is instructive to compare its performance. For HFS, we use the implementation provided by Ref. Selby (2013) (which does not utilize a GPU), and we ran it in mode “-S3”, meaning that maximal induced trees (treewidth in this case) were used. The TTS is given by Hen et al. (2015):

(10) |

where s is the time for a single update. is the number of repetitions with tree updates. In principle, the optimal TTS is found by finding the value of that minimizes the TTS, but the implementation of Ref. Selby (2013) continues to increase until an exit criterion is reached. Specifically, the algorithm exits when the same lowest energy is found consecutively after tree updates. We have found that this can be highly non-optimal, especially for the hardware-planted instances. Therefore, in all our scaling plots, we have optimized the value of . This is an important distinction from all previous work using the HFS algorithm, which to the best of our knowledge did not optimize , and hence the scaling of the HFS algorithm in previous work is likely to be an underestimate of the true scaling, in the very same sense that the D-Wave scaling reported previously underestimates the true scaling.

For the HFS algorithm, we find that a quadratic fit does not capture the general features of the TTS curve as a function of number of tree updates. Instead, we find that a function of the form captures the data well. The value of gives the value of . We give the fit values and their confidence intervals in Tables 14 and 15. The behavior of the optimal TTS with problem size is shown in Fig. 15, with the scaling parameter fits given in Table 3. We find that for the logical-planted instances, HFS scales better than the DW2KQ, while for the hardware-planted instances the scaling of the two is statistically indistinguishable.

We can also consider simulated annealing with both single and multi-spin updates (SAC), with the latter being simultaneous updates of all the spins comprising a unit cell (super-spin approximation Mandrà et al. (2016)). This requires the algorithm to know about the underlying hardware graph. The implementation of SAC is identical to that of SA, except each sweep of single spin updates is followed by a sweep of unit cell updates. The eight spins in the unit cell are flipped, and the move is accepted according to the Metropolis-Hastings rule. Because the unit cell graph is bipartite, unit cells in the first partition are updated first, followed by the unit cells in the second partition. This algorithm can be implemented as efficiently on GPU’s as the single spin SA algorithm since it does not require storing any more data in memory. Because SAC effectively involves updating twice as many spins as SA in a single sweep, the timing of SAC is the same as in Eq. (7) but with ns. For SAC, we use a linear annealing schedule in from to (in units where the maximum Ising coupling strength is ). We give the fit values and their confidence intervals in Tables 16 and 17, and the behavior of the optimal TTS with problem size is shown in Fig. 15, with the scaling parameter fits given in Table 2. We see that HFS and DW2KQ are statistically indistinguishable, and SAC is the top-scaling algorithm for the hardware-planted instances, outperforming even SQA. Results for the logical-planted instances are given in Table 3, which for convenience reproduces data from Table 1 in the main text. Here we see that SAC outperforms HFS, which in turn outperforms DW2KQ, while SQA is the top-scaling algorithm, outperforming SAC.

Min | Max | ||||
---|---|---|---|---|---|

Min | Max | ||||
---|---|---|---|---|---|

Min | Max | ||||
---|---|---|---|---|---|

Min sweeps | Max sweeps | ||||
---|---|---|---|---|---|

Min sweeps | Max sweeps | ||||
---|---|---|---|---|---|

Min sweeps | Max sweeps | ||||
---|---|---|---|---|---|

Min sweeps | Max sweeps | ||||
---|---|---|---|---|---|

Min sweeps | Max sweeps | ||||
---|---|---|---|---|---|

Min sweeps | Max sweeps | ||||
---|---|---|---|---|---|

Min trees | Max trees | ||||
---|---|---|---|---|---|

Min trees | Max trees | ||||
---|---|---|---|---|---|