Quantuminspired annealers as Boltzmann generators for machine learning and statistical physics.
Abstract
Quantum simulators and processors are rapidly improving nowadays, but they are still not able to solve complex and multidimensional tasks of practical value. However, certain numerical algorithms inspired by the physics of real quantum devices prove to be efficient in application to specific problems, related, for example, to combinatorial optimization. Here we implement a numerical annealer based on simulating the coherent Ising machine as a tool to sample from a highdimensional Boltzmann probability distribution with the energy functional defined by the classical Ising Hamiltonian. Samples provided by such a generator are then utilized for the partition function estimation of this distribution and for the training of a general Boltzmann machine. Our study opens up a door to practical application of numerical quantuminspired annealers.
I Introduction
The Ising model is a cornerstone of various fields of science ranging from magnetism Baxter2014 () and quantum field theory Zuber1977 () to combinatorial optimization McMahon2016 () and finance Sornette2014 (). This model analyzes magnetic interactions in a set of spin particles. The energy of each spin configuration with external fields is given by
(1) 
where is the vector of spins values, the coupling matrix, b the bias vector and is the number of interacting spins. The Ising problem consists in finding the ground state or lowenergy spin configurations of the energy functional (1). This is known to be an NPhard combinatorial problem Barahone1982 () and multiple classical numerical algorithms Kirkpatrick1983 (); Bilbro1989 (); Benlic2013 (), neuralnetwork based methods Smith1999 (); Barrett2019 (), quantum hardware devices McMahon2016 (); Harris2018 (); Inagaki2016 () and quantuminspired numerical algorithms King2018 (); Kalinin2018 (); Tiunov2019 () have been developed to approximately solve it.
The joint probability corresponding to each spin configuration is described by the Boltzmann distribution
(2) 
where is the inverse temperature of the system and the normalizing constant is referred to as the partition function. Knowledge of the partition function is important in multiple tasks ranging from statistical physics Wu2019 () to machine learning Ma2013 (). While the unnormalized probability is easy to compute for any spin configutation, the exact calculation of the partition function is intractable for a large and fully connected spin system. Therefore, approximate methods are usually employed. One of the most popular approaches is the iterative solution of a set of meanfield equations, such as the naive meanfield approach, Bethe Bethe1935 () and ThoulessAndersonPalmer Thouless1977 () approximations. While relatively easy to implement, meanfieldbased methods Jordan1999 () usually perform poorly when dealing with systems having strong, longrange correlations between spins.
An alternative approach is by generating unbiased samples from the true Boltzmann distribution, which can then be used to estimate the partition function either by direct summation or using more advanced algorithms such as annealed importance sampling (AIS) Neal1998 (). Boltzmann sampling is an important problem in its own right, relevant to multiple tasks in machine learning Hinton2006 (), manybody physics Carleo2017 (), quantumstate tomography Torlai2018 (); TiunovQST2019 (), chemistry Zhu2002 () and protein folding Rizzuti2013 ().
Softwarebased sampling algorithms include Markov chain Monte Carlo (MCMC) Newman1999 () or simulated annealing Kirkpatrick1984 (); however, these methods are prone to get trapped in local optima. Recently, this family was joined by machinelearning (ML) based algorithms, namely Variational Autoregressive Networks (VAN) Wu2019 () and Boltzmann Generator Networks (BGN) Noe2019 (), which have been demonstrated to be efficient for both approximate Boltzmann sampling and partition function estimation.
In 2014, Dumoulin et al. Dumoulin2014 () proposed to use quantum hardware — specifically, the quantum annealer — as a fast source of samples from a given Boltzmann distribution. This study analyzed the main limitations imposed by existing quantum hardware (such as DWave) among which are noise in parameters, limited parameter range, and restrictions in available architectures. The proposed method was experimentally implemented using DWave 2X quantum annealer Benedetti2016 (); Benedetti2017 () for the training of a Boltzmann machine Hinton2006 (); Teh2006 (). However, the above mentioned limitations in existing quantum annealers limited the study to lowdimensional datasets.
A novel device that appears promising in this context is the optical coherent Ising machine (CIM), which was shown McMahon2016 (); Inagaki2016 () to find good ground state approximations for classical, fully connected Ising systems of sizes up to 2000 spins. In a separate study, CIM was shown to significantly outperform the DWave annealer in application to densely connected Ising systems CIMvsDwave (). This technology inspired a range of classical algorithms, some of which — the noisy meanfield annealing (NMFA) King2018 () and the CIM simulation (SimCIM) Tiunov2019 () — were shown to outperform the CIM in terms of both the computational speed and mean sample energy. A further advantage of these algorithms is the ability to operate with Ising Hamiltonian with the coupling and bias matrix elements described by arbitrary real numbers. However, to the best of our knowledge, neither CIM nor any of the quantuminspired algorithms (NMFA or SimCIM) have been applied to Boltzmann sampling yet.
An important application of the Boltzmann sampling and the partition function estimation is the setting known as the inverse Ising problem Nguyen2017 () in statistical physics or Boltzmann machine (BM) Hinton1983 (); Hinton2002 () in ML. It consists in finding the coupling matrix and bias vector b entering the Boltzmann probability law (2), which maximizes the likelihood of a specific set of spin configuration samples. The samples are utilized to estimate the intractable terms in the gradients arising in the training.
In this paper, we find that the quantuminspired numerical annealer SimCIM Tiunov2019 () is capable of drawing highquality unbiased samples from a Boltzmann probability distribution associated with the Ising model. We demonstrate the training of a fullyvisible and fullyconnected BM using samples provided by SimCIM on two training sets. The first dataset (Bars&Stripes, BAS) is of low dimension, which permits exhaustive search over all possible spin configuration, thereby enabling direct quality estimation and comparison of various sampling methods. Second, we train the BM on the highdimensional dataset of handwritten digit images MNIST. The corresponding Ising graph is fully connected and has a dimension of 794. In addition to character recognition, we numerically estimate the partition function of this system. In the latter task, we use AIS with SimCIM as the sampler for all intermediate distributions.
SimCIM implementation for Boltzmann sampling is beneficial compared to VAN or BGN because, in order to draw samples, SimCIM requires no knowledge about the system of interest apart from the spin interaction strength. Unlike VAN or BGN, SimCIM does not require specific neural network training for each update of the coupling matrix and bias vector and each run of the partition function estimation, which could be very demanding for high dimensional systems. On the other hand, in comparison with analog annealers such as DWave or CIM, SimCIM is advantageous in that it supports realvalued and fullyconnected highdimensional coupling matrices and could be executed on a classical computer.
Ii Review of previous results and methods
In this section, we give a brief recap of previous results and techniques that are relevant to our work.
ii.1 General Boltzmann Machines
The Boltzmann machine (BM) is a stochastic energybased data model which was introduced at the dawn of ML by Hinton and Sejnowski Hinton1983 (). BM is the primary component of deep belief networks Hinton2006 () and deep Boltzmann machines Salakhutdinov2009 (). Mathematically, the BM can be represented as a complete graph whose nodes represent binary units that can take on the values of . The nodes are connected by edges, each of which is associated with an arbitrary real value. The graph is assigned the “energy” described by Eq. (1).
To apply the BM for machine learning — for example, character recognition — we associate every pixel of the bitmap containing the character with a node of the BM. The BM parameters (coupling matrix and biases) are “trained”, i.e. assigned such values that the energy (1) associated with the bitmaps within the training set is significantly lower than that for an arbitrary bitmap. Then, when posed with a task of recognizing whether a particular unknown bitmap resembles those from the training set, the energy (1) for that bitmap is calculated. A low value would indicate a high likelihood of the affirmative answer.
In order to facilitate the training, we further associate the energy with the Boltzmannlike probability given by Eq. (2) with . The optimization objective during the BM training is the maximization of the total loglikelihood of all bitmaps of the training set S. We then have HintonPracticalGuide ()
(3) 
which gives rise to a conceptually straightforward gradient descent update rule
(4) 
where is the learning rate. In Eq. (4), the first term is the expectation with respect to the training set and often called the positive phase of the update, while the second term is the expectation over the model’s probability distribution and called the negative phase.
Practical implementation of this gradient descent is complicated by the fact that calculating the negative phase requires exhaustive search over all possible configurations of s, and is hence intractable for a classical computer. Therefore it is usually replaced with a numerical estimation using a set of configuration samples drawn from the distribution (2). In this case, Eqs. (4) become
(5) 
where is the number of elements in the training set, the number of drawn samples.
Traditionally, in the context of BM training, the sampling has been approached using MCMC Salakhutdinov2009 (). This approach is however too slow to be practical because sometimes it requires an extremely large number of steps to give unbiased equilibrium samples. This motivates the interest to quantum annealers and their simulators in the context of BM training.
ii.2 Annealed importance sampling
It is not required to know the exact value of partition function for the approximate BM training procedure introduced above. However, this knowledge of is necessary for the evaluation of the training set likelihood , and hence permits the estimation of the training quality. While it appears straightforward that the partition function can be evaluated directly by performing summation over a large number of samples, this approach is less efficient than the procedure known as annealed importance sampling Neal1998 ().
Consider two probability distributions and defined as and , where is the corresponding unnormalized probability which is easily computable for both distributions. Assume that is uniform while is the distribution of our interest. Then we can estimate with samples from the uniform distribution using importance sampling
(6) 
where denotes the samples drawn from the uniform distribution . This method, while being relatively simple to implement, gives estimates of with high variance especially when the target distribution differs significantly from uniform. To reduce the variance of this estimate, a sequence of intermediate distributions , where and is chosen to gradually transition from the uniform to target distribution. For each pair of consecutive distributions, the importance weight is calculated using Eq. (II.2) and then the partition function of the target distribution is estimated by multiplying
(7) 
To estimate using AIS, it is necessary to draw multiple unbiased samples from all intermediate distributions. Given that can be very large, a fast sampling method is essential.
ii.3 SimCIM numerical annealer
SimCIM is an iterative algorithm for sampling lowenergy spin configurations in the classical Ising model. It treats each spin value as a continuous variable from the range . Each iteration begins with calculating the mean field acting on each spin by all other spins. Then the gradients for the spin values are calculated according to , where are the annealing control parameters and is Gaussian noise. Then the spin values are updated according to , where is the activation function
(8) 
After multiple updates, the spins will tend to either or and the final discrete spin configuration is obtained by taking the sign of each . The typical evolution of spin values and the corresponding energy is shown in Fig. 1.
SimCIM has been tested on a variety of problems, including graphs from the Gset Gset (). Implemented on a consumer graphic processor, this algorithm runs faster and generates higher quality samples than many analog and digital annealing processes.
ii.4 Routine for effective temperature estimation
In 2016, Benedetti et al. Benedetti2016 () pointed out that quantum annealers that have strong interaction with the environment, such as DWave 2000Q, freeze out the dynamics of a spin system before the termination of the annealing process. As a result, such annealers sample from a Boltzmann distribution with some finite temperature. As we demonstrate in the next section, the samples generated by SimCIM have the same property.
As we see from Eqs. (1) and (2), scaling the coupling matrix and bias vector b is equivalent to scaling the effective temperature by the same factor. This enables us to control the temperature of the distribution from which the samples are drawn, which is necessary for many applications, including BM training. To take advantage of this capability, however, we also need a method that would allow to be measured. We describe this method below Benedetti2016 ().
Consider a sample set corresponding to some inverse temperature . The Boltzmann probability of a spin configuration with energy is , where is the degeneracy of level . For two energy levels and , separated by , the logarithmic ratio of the probabilities is given by
(9) 
Because of the unknown degeneracies, we cannot use the above equation to evaluate directly from a set of samples. However, one can obtain an additional set of samples with a different inverse temperature, , by scaling the coupling matrix and bias vector by . The difference of the log ratios for the two sets is then
(10) 
Now and can be estimated from the slope of the linear dependence between and and the ratio .
For this method to work well, it is necessary that the two sets of samples have significant overlap, i.e. contain multiple samples within small energy intervals around both and . This can be achieved by judicious choice of .
Iii Results
iii.1 Temperature evaluation for SimCIM
The goal of this subsection is to demonstrate SimCIM’s capability to draw unbiased samples from a desired Boltzmann distribution, and that the methods of temperature evaluation and control described in section II.4 apply to SimCIM. We work with an Ising Hamiltonian with and being a symmetric matrix whose offdiagonal elements are uniformly sampled from and the diagonal elements are set to . This relatively simple Ising problem allows exhaustive search over spin combinations, thereby enabling us to draw a sample set from the exact Boltzmann distribution, which can be used as a benchmark.
We sampled two batches of 50000 vectors via SimCIM with different inverse temperatures and , with , by scaling the coupling matrix . An additional set of 50000 samples was drawn from the true Boltzmann distribution with . The histograms of these samples are presented in the Fig. 2 (a) and the pairwise dependencies (), defined by Eq. (10), in Fig. 2 (b). The linear shape of these dependences shows that the samples produced by SimCIM follow the Boltzmann distribution.
Our next goal is to use SimCIM to sample from the Boltzmann distribution with . To this end, we determine the slope of the dependencies () for the two sample sets drawn from SimCIM [red dots in Fig. 2 (b)] and find , from which we find and . Using this information, we rescale the coupling matrix for and sample 50000 additional vectors using SimCIM. The dependence () for this set with respect to the true Boltzmann distribution is close to constant, as expected [Fig. 2(b)].
iii.2 BM training on Bars & Stipes dataset
To benchmark the SimCIM performance as a sampling mechanism for machine learning, we start with BM training on the simple synthetic dataset from the Bars & Stripes (BAS) family [Fig. 3]. It consists of 30 instances of 4 4 bitmaps, with each pixel taking a value from . All instances occur with the same probabilities except the first and last one in Fig. 3, whose probability of occurrence is twice as high. We refer to this as the “ground truth probability distribution” of the training set. Similarly to the previous section, this simple dataset allows to do a full exhaustive search over all spin configurations and is thus handy for benchmarking.
The BM was trained using the method described in Sec. II.1. Each bitmap was unrolled into a vector of size 16. The gradients’ positive phase was always calculated from the full training set. The negative phase of gradients from Eqs. (II.1) was calculated using several different approaches. As a baseline method for comparison, we exactly calculated the gradients by exhaustive search of all possible spin configurations. The second method estimated gradients using samples drawn by MCMC with the chain length of 1000. Third, we utilized vectors sampled by SimCIM with and without temperature correction. All approximate methods of gradients’ negative phase estimation used sample sets of equal size 250. To obtain temperature corrected sample sets, two additional sample sets of size 250 were produced for each update (see Sec. II.4).
To monitor the training process, we selected two numerical metrics. First, we calculated the mean loglikelihood of the training set with respect to the current state of the BM parameters. Second, we monitored the KullbackLeibler (KL) divergence between the ground truth probability distribution defined by the training set and the distribution defined by the current BM parameters, which can be calculated explicitly thanks to the small size of the problem. Both metrics were evaluated after each update of the BM parameters. The learning curves are presented in Fig. 4.
From these learning curves, we can see that gradients calculated using samples drawn by SimCIM with temperature correction follow the true gradients better than all other methods do. This is to be expected, because in order for the training method defined by Eqs. (II.1) to work well, the samples must be drawn from the Boltzmann distribution with temperature 1. The vectors sampled by SimCIM satisfy this requirement better than samples provided by other sampling mechanisms.
iii.3 Partition function estimation
In this section, we implement AIS with SimCIM as the sampling tool to estimate the partition function of a Boltzmann distribution and compare its performance with other approximate methods. The energy functionals for this test have been obtained from the coupling and weight matrices from consecutive epochs of the BM training process described in the previous subsection. The BM was trained using true gradient updates for 2000 epochs. The partition function was estimated every 50 epochs.
The methods included in the comparison are as follows.

AIS equipped with SimCIM and MCMC samplers for importance weights calculations (7). We used 10 intermediate distributions and sample sets of size 250 to compute each importance weight. Consecutive intermediate distributions differed from each other only by the inverse temperature . To gradually transition from the uniform () to target distribution () we increase in steps. At each step, we used two additional sample batches of size 250 to estimate SimCIM’s effective temperature. In total, AIS with SimCIM employed 7500 samples for the computation of each partition function, where only 2500 samples were used for the algorithm and rest are auxiliary samples for the temperature correction.

VAN — an artificial neural networkbased method for partition function estimation of discrete distributions, which was sown to outperform meanfield family algorithms Wu2019 (). VAN was trained either for 2000 epochs with 250 samples in each batch (which gives samples in total) or for 1500 epochs with batchsize of 5 (which is 7500 samples total).

Direct sampling, with the sample sets of size either 7500 or drawn by SimCIM.
The results are sown in Fig. 5. We see that the most competitive methods are AIS equipped with the SimCIM sampler and VAN; however, the total sample consumption with AIS+SimCIM was 7500 while that with VAN was .
Fig. 5 (b) shows the KL divergence between the true Boltzmann probability distribution and approximate distributions derived using samples drawn by SimCIM, MCMC and trained VAN. We can see that VAN (which consumed more samples for training) approximates the true Boltzmann probability distribution better compared to all other methods. However, VAN requires training for every new BM parameter set, which could be quite timeconsuming. For example, it takes around 15 seconds on a desktop with an NVidia GTX 1080 Ti GPU for VAN, while SimCIM samples 7500 vectors in 1.5 seconds.
iii.4 BM training on MNIST dataset
To further test the capabilities of SimCIM as a sampling mechanism, we trained a BM on bitmaps from the MNIST dataset (Fig. 6). This dataset contains 60000 instances of pixel grayscale bitmaps of handwritten digits in the training set and 10000 similar bitmaps in the test set. Each bitmap was unrolled into a vector and the vector elements after normalization were binarized to the values with a threshold of 0.3. In order to enable the recognition of different digits, each training bitmap was augmented with a 10bit onehot vector such that for and otherwise, where is the label of (i.e. the actual digit corresponding to) the bitmap. In this way, we obtained a set consisting of 60000 binary vectors of length , which we split into training and crossvalidation sets containing 50000 and 10000 images respectively.
During the training, we used minibatches of size 500 for the calculation of the positive phase of the gradient. The negative phase was calculated using vector sets of size 250 sampled by SimCIM with and without effective temperature adjustment. The model’s parameters were updated after each minibatch. The BM initial weights were initialized as small random numbers. The training procedure was executed for 50 epochs, each epoch containing updates.
The training process has been evaluated using two benchmarks. The first one was the average loglikelihood of the test set bitmaps. Because the exact value of this parameter is intractable to compute in this case, we utilized AIS equipped with the SimCIM sampler to approximate the partition function and subsequently estimate the average loglikelihood. The second metric was the classification accuracy of the test set digits. Each test bitmap was augmented with 10 variants of the onehot vectors corresponding to 10 possible labels, and the corresponding 10 energy values were calculated. The label corresponding to the lowest energy yielded the model’s prediction for the digit contained in the bitmap. The choice of energy, rather than loglikelihood, as the classification criterion, is justified in this case, because, for a given coupling matrix, the partition function is constant and hence the likelihood (2) is a monotonic function of the energy.
The learning curves are shown in Fig. 7. We see that the BM can be trained successfully using the SimCIM sampler in view of both evaluation metrics. Consistently with our previous observations, the models trained using SimCIM with effective temperature adjustment perform better in terms of both the average loglikelihood and classification accuracy.
The classification accuracy reaches 86.9% [Fig. 7(b)]. This result compares poorly to stateoftheart supervised learning techniques based e.g. on convolutional deep neural networks Ciresan2012 () but is comparable to those obtained by other neural network architectures without hidden units. We believe that adding hidden units to our model would significantly enhance its accuracy.
In order to test the generative capabilities of the trained model, we used it to generate bitmaps corresponding to different digits. To that end, a onehot vector corresponding to a particular digit was chosen and the remaining part of the spin vector was sampled using the coupling matrix obtained through the training. A fixed gives rise to effective biases that ensure that bitmaps depicting the digit have lower energies, and are thus more likely to be produced by the annealer.
The results of this numerical experiment are shown in Fig. 8. We can see that the quality of generated digits improves with training.
Iv Summary
We have demonstrated the quantuminspired digital annealer SimCIM as a mechanism that provides highquality oneshot samples from the Boltzmann distribution for a classical Ising energy functional. We used the samples to train a fullyvisible and fullyconnected general Boltzmann machine and to estimate the partition function of a highdimensional probability distribution. The model trained on the MNIST dataset is able to achieve a reasonable classification accuracy and meaningful picture reconstruction. Approximate values of the partition function, as well as gradients computed using samples provided by SimCIM, agree well with the exact values computed using exhaustive search in a setting where such search is possible.
Implementation of SimCIM as a Boltzmann generator is beneficial compared to deep neural networkbased methods because SimCIM does not require timeconsuming training and could work in a plugandplay manner. On the other hand, SimCIM is beneficial compared to quantum annealers such as Dwave because existing quantum annealing hardware suffers from multiple limitations, such as the noise in parameters, limited parameter range, restrictions in available architectures, decoherence, etc. Dumoulin2014 (). Our algorithm enables simulation of various settings in which these limitations are curtailed or even completely eliminated, and therefore provides insights into the potential that nearterm noisy intermediatescale quantum (NISQ) devices Preskill2018 () may have in application to various tasks such as ML or statistical physics.
This study motivates several open questions that could be of interest both for machine learning and quantum physics communities. ML researchers may wish to investigate whether or not quantum annealers and their simulators are applicable to the training of multilayer deep Boltzmann machines Salakhutdinov2009 (), deep belief networks Hinton2006 () or restricted BM based convolutonal neural networks Lee2011 () and make rigorous comparison with existing methods of their training. For physicists, the approach to BM training and partition function estimation presented here may be handy in the tasks of finding neuralnetwork based solutions to quantum optimization problems, such as the ground states of molecules Xia2018 () and quantum manybody systems Carleo2017 () as well as quantum state tomography Torlai2018 (); TiunovQST2019 () of high dimensional systems. A further interesting direction of future research is the development of analog annealers with the aim to outperform existing digital annealing algorithms. These annealers need not be limited to the quantum domain; indeed, analog annealers that successfully employ classical hardware have been proposed and tested Bello2019 (); Chou2019 ().
Acknowledgments
We are grateful to Jacob D. Biamonte for a fruitful discussion. This work is supported by Russian Science Foundation (197110092).
References
 R. J. Baxter. Exactly solved models in statistical mechanics (Dover Publications, 2014).
 J. B. Zuber, and C. Itzykson. Quantum field theory and the twodimensional ising model, Phys. Rev. D 15, 2875 (1977).
 P. L. McMahon, A. Marandi, Y. Haribara, R. Hamerly, C. Langrock, et. al.. A fully programmable 100spin coherent Ising machine with alltoall connections, Science 354, 614 (2016).
 D. Sornette. Physics and financial economics (17762014): puzzles, ising and agentbased models, Reports on Prog. Phys. 77, 062001 (2014).
 F. Barahona. On the computational complexity of ising spin glass models, J. Phys. A: Math. Gen. 15, 3241 (1982).
 S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing, Science 220, 671 (1983).
 G. Bilbro, R. Mann, T. K. Miller, W. E. Snyder, D. E. Van den Bout, and M. White. Optimization by mean field annealing, in Advances in neural information processing systems, pp. 91â98 (1989).
 U. Benlic and J.K. Hao. Breakout local search for the maxcut problem, Eng. Appl. Artif. Intell. 26, 1162 (2013).
 K. A. Smith. Neural networks for combinatorial optimization: a review of more than a decade of research, INFORMS J. on Comput. 11, 15 (1999).
 T. D. Barrett, W. R. Clements, J. N. Foerster, and A. I. Lvovsky. Exploratory Combinatorial Optimization with Reinforcement Learning, arXiv:1909.04063 (2019).
 R. Harris, Y. Sato, A. J. Berkley, M. Reis, F. Altomare, et. al. Phase transitions in a programmable quantum spin glass simulator, Science 361, 162 (2018).
 T. Inagaki, Y. Haribara, K. Igarashi, T. Sonobe, S. Tamate, et. al. A coherent ising machine for 2000node optimization problems, Science 354, 603 (2016).
 A. D. King, W. Bernoudy, J. King, A. J. Berkley, and T. Lanting. Emulating the coherent Ising machine with a meanfield algorithm. arXiv: 1806.08422 (2018).
 K. P. Kalinin, and N. G. Berloff. Global optimization of spin Hamiltonians with gaindissipative systems, Scientific Reports 8, 17791 (2018).
 E. S. Tiunov, A. E. Ulanov, and A. I. Lvovsky. Annealing by simulating the coherent Ising machine, Opt. Express 27, 10288 (2019).
 D. Wu, L. Wang, and P. Zhang. Solving Statistical Mechanics Using Variational Autoregressive Networks, Phys. Rev. Lett. 122, 080602 (2019).
 J. Ma, J. Peng, S. Wang, and J. Xu. Estimating the partition function of graphical models using Langevin importance sampling, Journal of Machine Learning Research 31, 433 (2013).
 H. A. Bethe. Statistical theory of superlattices, Proc. R. Soc. Lond. A 150, 552 (1935).
 D. J. Thouless, P. W. Anderson, and R. G. Palmer. Solution of ’Solvable model of a spin glass’, Philos. Mag. 35, 593 (1977).
 M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul. An Introduction to Variational Methods for Graphical Models, Mach. Learn. 37, 183 (1999).
 M. E. J. Newman, and G. T. Barkema. Monte Carlo Methods in Statistical Physics, Clarendon Press, 1 edition (1999).
 S. Kirkpatrick. Optimization by simulated annealing: Quantitative studies, Journal of statistical physics 34, 975 (1984).
 G. E. Hinton. Reducing the Dimensionality of Data with Neural Networks, Science 313, 504 (2006).
 G. Carleo, and M. Troyer. Solving the quantum manybody problem with artificial neural networks, Science, 355, 602 (2017).
 G. Torlai, G. Mazzola, J. Carrasquilla, M. Troyer, R. Melko, and G.Carleo. Neuralnetwork quantum state tomography, Nature Physics, 14, 447 (2018).
 E. S. Tiunov, V. V. Tiunova, A. E. Ulanov, A. I. Lvovsky, and A. K. Fedorov. Experimental quantum homodyne tomography via machine learning, arXiv: 1907.06589 (2019).
 Z. Zhu, M. E. Tuckerman, S. O. Samuelson, and G. J. Martyna. Using Novel Variable Transformations to Enhance Conformational Sampling in Molecular Dynamics, Phys. Rev. Lett. 88, 100201 (2002).
 B. Rizzuti, and V. Daggett. Using simulations to provide the framework for experimental protein folding studies, Archives of Biochemistry and Biophysics, 531 128 (2013).
 V. Dumoulin, I. J. Goodfellow, A. Courville, and Y. Bengio. On the Challenges of Physical Implementations of RBMs. In Proceedings of the TwentyEighth AAAI Conference on Artificial Intelligence (2014).
 M. Benedetti, J. RealpeGÃ³mez, R. Biswas, and A. PerdomoOrtiz. Estimation of effective temperatures in quantum annealers for sampling applications: A case study with possible applications in deep learning, Physical Review A, 022308 (2016).
 M. Benedetti, J. RealpeGÃ³mez, R. Biswas, and A. PerdomoOrtiz. QuantumAssisted Learning of HardwareEmbedded Probabilistic Graphical Models, Phys. Rev. X, 7, 041052 (2017).
 G. E. Hinton, S. Osindero, and Y.W. Teh. A Fast Learning Algorithm for Deep Belief Nets, Neural Computation 18, 1527â1554 (2006).
 R. Hamerly, T. Inagaki, P. L. McMahon, D. Venturelli, A. Marandi, et. al. Experimental investigation of performance differences between Coherent Ising Machines and a quantum annealer, Science Advances 5, eaau0823 (2019).
 F. NoÃ©, S. Olsson, J. KÃ¶hler, and H. Wu. Boltzmann generators: Sampling equilibrium states of manybody systems with deep learning, Science 365, eaaw1147 (2019).
 H. C. Nguyen, R. Zecchina, and J. Berg. Inverse statistical problems: from the inverse Ising problem to data science, Advances in Physics 66, 197 (2017).
 G. E. Hinton, and T. J. Sejnowski. Optimal perceptual inference, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 448 (1983).
 G. E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence, Neural Computation 14, 1771â1800 (2002).
 R. M. Neal. Annealed Importance Sampling, Journal Statistics and Computing 11, 125 (1998).
 R. Salakhutdinov, and G. Hinton. Deep Boltzmann Machines, In Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS) (2009).
 G. Hinton. A Practical Guide to Training Restricted Boltzmann Machines,in Neural Networks: Tricks of the Trade, 2nd ed., Lecture Notes in Computer Science, edited by G. Montavon, G. B. Orr, and K.R. Miller (Springer, New York, 2012), Vol. 7700, pp. 599â619.
 Gset graphs are freely available for download at https://web.stanford.edu/yyye/yyye/Gset/
 Lee, H., Grosse, R., Ranganath, R., & Ng, A. Y. Unsupervised learning of hierarchical representations with convolutional deep belief networks, Communications of the ACM, 54(10), 95 (2011).
 Cireşan, D., Meier, U., Schmidhuber, J. Multicolumn deep neural networks for image classification, IEEE Conference on Computer Vision and Pattern Recognition. pp. 3642â3649 (2012)
 Preskill, J. Quantum Computing in the NISQ era and beyond, Quantum 2, 79 (2018).
 Xia, R. and Kais, S. Quantum machine learning for electronic structure calculations, Nature Communications 9, 4195 (2018).
 L. Bello, M. Calvanese Strinati, E. G. Dalla Torre, and A. Pe’er. Persistent Coherent Beating in Coupled Parametric Oscillators, Phys. Rev. Lett. 123 083901 (2019).
 J. Chou, S. Bramhavar, S. Ghosh, and W. Herzog. Analog Coupled Oscillator Based Weighted Ising Machine, Scientific Reports 9, 14786 (2019).