A regression algorithm for accelerated lattice QCD that exploits sparse inference on the D-Wave quantum annealer

Nga T. T. Nguyen (nga.nguyen@lanl.gov), Garrett T. Kenyon (gkenyon@lanl.gov), and Boram Yoon (boram@lanl.gov)
Computer, Computational and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
Abstract

We propose a regression algorithm that utilizes a learned dictionary optimized for sparse inference on the D-Wave quantum annealer. In this regression algorithm, we concatenate the independent and dependent variables into a combined vector, and encode the high-order correlations between them into a dictionary optimized for sparse reconstruction. On a test dataset, the dependent variable is initialized to its average value and then a sparse reconstruction of the combined vector is obtained in which the dependent variable is typically shifted closer to its true value, as in a standard inpainting or denoising task. Here, a quantum annealer, which can presumably exploit a fully entangled initial state to better explore the complex energy landscape, is used to solve the highly non-convex sparse coding optimization problem. The regression algorithm is demonstrated on lattice quantum chromodynamics simulation data using a D-Wave 2000Q quantum annealer, and good prediction performance is achieved. The regression test is performed using six different values for the number of fully connected logical qubits, between 20 and 64, the latter being the maximum that can be embedded on the D-Wave 2000Q. The scaling results indicate that a larger number of qubits gives better prediction accuracy, the best performance being comparable to the best classical regression algorithms reported so far.

I Introduction

Sparse coding refers to a class of unsupervised learning algorithms for finding an optimized set of basis vectors, or dictionary, for accurately reconstructing inputs drawn from any given dataset using the fewest non-zero coefficients. Sparse coding explains the self-organizing response properties of simple cells in the mammalian primary visual cortex olshausen96; olshausen_sparse_1997, and has been successfully applied in various fields including image classification yang2009; Coates:2011:IEV:3104482.3104598, image compression yijing, and compressed sensing Tao_Information; Donoho_Compressed. Optimizing a dictionary for a given dataset and inferring optimal sparse representations of input data involves finding solutions of the following minimization problem:

$\min_{\phi,\,\{a_k\}} \sum_k \left[ \tfrac{1}{2} \left\| \mathbf{x}_k - \phi\, a_k \right\|_2^2 + \lambda \left\| a_k \right\|_0 \right]$   (1)

where $k$ is the index of the input data $\mathbf{x}_k$, $\phi$ is the dictionary, $a_k$ is the vector of sparse coefficients, and $\lambda$ is the sparsity penalty parameter. Because of the $\ell_0$-norm, the minimization problem falls into an NP-hard complexity class with multiple local minima Natarajan:1995:SAS:207985.207987.

Recently, we developed a mapping of the $\ell_0$-optimization in Eq. (1) to the quadratic unconstrained binary optimization (QUBO) problem that can be solved on a quantum annealer, and demonstrated its feasibility on the D-Wave systems Nguyen:2016; 8123653; Nguyen2018ImageCU. The quantum processing unit of the D-Wave systems realizes the quantum Ising spin system in a transverse field and finds the lowest or near-lowest energy states of the classical Ising model,

$H(\mathbf{s}) = \sum_i h_i s_i + \sum_{i<j} J_{ij}\, s_i s_j$   (2)

using quantum annealing PhysRevE.58.5355; Finnila_1994; DWAVE. Here $s_i \in \{-1, +1\}$ is the binary spin variable, $h_i$ and $J_{ij}$ are the qubit biases and coupling strengths that can be controlled by a user, and the optimization of the Ising model is isomorphic to a QUBO problem with $a_i = (s_i + 1)/2$. By mapping the sparse coding problem to a QUBO structure, the sparse coefficients are restricted to binary variables $a_i \in \{0, 1\}$. Despite the restriction, it was able to provide good sparse representations of the MNIST lecun-mnisthandwrittendigit-2010; Nguyen:2016; Nguyen2018ImageCU and CIFAR-10 cifar10; 8123653 images.

In this paper, we propose a regression algorithm using sparse coding on the D-Wave 2000Q in Section II and apply the algorithm to the prediction of a quantum chromodynamics (QCD) simulation observable in Section III.

II Regression algorithm using sparse coding on D-Wave 2000Q

II.1 Regression model

Consider $N$ sets of training data $\{(\mathbf{x}_k, y_k)\}$ and $M$ sets of test data $\{\mathbf{x}'_k\}$, where $\mathbf{x}$ is an input vector known as the independent variable, and $y$ is an output variable known as the dependent variable. A regression model can be built by learning the correlations between the input and output variables on the training dataset, so that it can make a prediction of $y'$ for unseen input data $\mathbf{x}'$ as

$y' = f(\mathbf{x}')\,.$   (3)

Such a regression model can be built using the sparse coding learning implemented on a quantum annealer, as described below.

  • Pre-training


    • Normalize $\mathbf{x}$ and $y$ so that their standard deviations become comparable. One possible choice is rescaling the data to have zero mean and unit variance using the sample mean and sample variance of the training dataset. This procedure is an essential step for the regression algorithm, as it makes the reconstruction error of each component comparable.

    • Using the $\mathbf{x}$ of the test dataset, or those of the combined training and test datasets, perform sparse coding training and obtain a dictionary $\phi_x$ for $\mathbf{x}$.

  • Training


    • Concatenate the input and output variables of the training dataset to build the combined vectors $\mathbf{z}_k = (\mathbf{x}_k, y_k)$. Extend the dictionary matrix $\phi_x$ obtained in the pre-training to $\phi_z$, filling the new elements with zeros.

    • Taking $\mathbf{z}$ as the input signal and $\phi_z$ as an initial guess of the dictionary, perform sparse coding training on the training dataset and obtain the updated dictionary $\phi_z$. Through this procedure, $\phi_z$ encodes the correlation between $\mathbf{x}$ and $y$.

  • Prediction


    • For the test dataset, for which only $\mathbf{x}'$ is given, build a vector $\mathbf{z}' = (\mathbf{x}', y_0)$, where $y_0$ is an initial guess of $y'$. One possible choice of $y_0$ is the average value of $y$ in the training dataset.

    • Using the dictionary $\phi_z$ obtained in the training step, find the sparse representation $a'$ of $\mathbf{z}'$ and calculate the reconstruction $\hat{\mathbf{z}}' = \phi_z a'$. This replaces the outlier components of $\mathbf{z}'$, including $y_0$, by values that can be described by $\phi_z$.

    • After inverse normalization, the last component of $\hat{\mathbf{z}}'$ is the prediction of $y'$.

In this regression model, the dimension of $\mathbf{x}$ should be sufficiently large so that the initial guess of the dependent variable does not bias the reconstruction. The procedure can be extended to predict multiple variables by increasing the dimension of the concatenated output block, in exchange for prediction accuracy. The prediction stage is sketched in the code example below.
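To make the stages above concrete, the following is a minimal Python sketch of the prediction stage. The function and variable names are illustrative rather than taken from the original implementation, and a simple classical greedy search stands in for the annealer-based inference of binary coefficients described in Section II.2.

```python
import numpy as np

def sparse_encode(z, phi, lam):
    """Stand-in for the QUBO/annealer inference of binary coefficients a
    minimizing 0.5*||z - phi @ a||^2 + lam*sum(a); greedy atom selection."""
    a = np.zeros(phi.shape[1])
    residual = z.copy()
    for _ in range(phi.shape[1]):
        # energy decrease from switching each inactive coefficient on
        gains = residual @ phi - 0.5 * np.sum(phi**2, axis=0) - lam
        gains[a == 1] = -np.inf          # coefficients are binary: at most one flip each
        i = np.argmax(gains)
        if gains[i] <= 0:
            break
        a[i] = 1.0
        residual -= phi[:, i]
    return a

def predict(x_test, phi_z, mu, sigma, lam):
    """Predict y by inpainting the last component of the combined vector."""
    x_std = (x_test - mu[:-1]) / sigma[:-1]   # standardize inputs
    z = np.append(x_std, 0.0)                 # y initialized to its standardized mean
    a = sparse_encode(z, phi_z, lam)
    z_hat = phi_z @ a                         # sparse reconstruction
    return z_hat[-1] * sigma[-1] + mu[-1]     # inverse-normalize the prediction
```

On the quantum annealer, sparse_encode is replaced by the QUBO sampling described in the next subsection.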

II.2 Sparse coding on the D-Wave quantum annealer

The $\ell_0$-optimization of the sparse coding problem in Eq. (1) can be mapped onto the D-Wave problem in Eq. (2) by the following transformations Nguyen:2016; 8123653; Nguyen2018ImageCU:

$Q_{ii} = \lambda + \tfrac{1}{2}\,\phi_i^T \phi_i - \phi_i^T \mathbf{x}\,, \qquad Q_{ij} = \phi_i^T \phi_j \ \ (i<j)\,,$   (4)

written in the binary QUBO variables $a_i$.

Here the qubit-qubit coupling $Q_{ij}$ shares similarity with the lateral neuron-neuron inhibition in the locally competitive algorithm rozell.06c, and the constant $\lambda$ makes the solution sparse by acting as a constant field forcing the qubits to stay in the off ($a_i = 0$) state. By performing the quantum annealing for a given dictionary $\phi$ and input data vector $\mathbf{x}$ with the transformations given in Eq. (4), one can obtain the optimal sparse representation $a$. On the D-Wave systems, however, the computing graph, known as the Chimera architecture, has limited qubit connectivity by design, while the sparse coding problem mapped to the QUBO form requires a fully connected qubit graph. It therefore requires an additional step, called embedding, which translates the fully connected logical qubits to the partially connected physical qubits on the Chimera structure, as described in Fig. 1.
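As an illustration of this mapping, the snippet below assembles the QUBO matrix corresponding to the sparse coding energy with binary coefficients. The sampler call in the trailing comment uses the modern Ocean API as an assumption; the paper itself used the SAPI2 client, and all names here are illustrative.

```python
import numpy as np

def sparse_coding_qubo(x, phi, lam):
    """Build the QUBO matrix Q for 0.5*||x - phi @ a||^2 + lam*sum(a)
    with binary coefficients a_i in {0, 1}; the constant term is dropped."""
    G = phi.T @ phi                       # atom-atom overlaps (lateral inhibition)
    Q = np.triu(G, k=1)                   # off-diagonal couplings Q_ij, i < j
    np.fill_diagonal(Q, lam + 0.5 * np.diag(G) - phi.T @ x)
    return Q

# On hardware, Q would be minor-embedded and sampled, e.g. (assumed, Ocean API):
#   from dwave.system import DWaveSampler, EmbeddingComposite
#   q_dict = {(i, j): Q[i, j] for i in range(len(Q))
#             for j in range(i, len(Q)) if Q[i, j] != 0}
#   sample = EmbeddingComposite(DWaveSampler()).sample_qubo(
#       q_dict, num_reads=1000).first.sample
```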

Figure 1: A subset of the Chimera structure of the D-Wave 2000Q, consisting of qubits (circles) arranged in a matrix of unit cells of 8 qubits each. The qubits within a unit cell have relatively dense connections, while interactions between the unit cells are made through the sparse connections at their edges. The figure also shows an example of embedding fully connected logical qubits (numbered circles) onto the D-Wave Chimera graph using chains of physical qubits, in which red edges indicate bipartite couplings between qubits and blue edges indicate chained qubits. After such an embedding, for example, one logical qubit is mapped to two physical qubits, tiled from one qubit in the top-right and one in the bottom-right unit cell, while another logical qubit is mapped to three physical qubits, tiled from two qubits in the top-left and one qubit in the top-right unit cell, and so forth.

The D-Wave 2000Q hardware consists of 2048 quantum bits (qubits) arranged into unit cells of 8 physical qubits forming the Chimera structure. After accounting for the small fraction of defects that occurs during the calibration process, as the temperature is cooled down below the superconducting threshold, the machine has approximately 2000 working qubits with their local spin-spin interactions. Using these qubits and connections, embedding an arbitrary QUBO problem onto the D-Wave 2000Q typically allows no more than 64 fully connected logical qubits.

III Application to lattice QCD

QCD is the theory of quarks and gluons, the fundamental particles composing hadrons such as pions and protons, and of their interactions. It is a part of the Standard Model of particle physics, and the theory has been validated by a large class of experiments over the decades Patrignani:2016xqp; Greensite:2011zz. Lattice QCD is a discrete formulation of QCD on a Euclidean space-time lattice, which allows us to solve low-energy QCD problems using computer simulations by carrying out the Feynman path integration using Monte Carlo methods Wilson:1974sk; Creutz:1980zw.

In lattice QCD simulations, a large number of observables is calculated over an ensemble of Gibbs samples of the gluon fields, called the lattices, and the computational cost of calculating those observables is high in modern simulations. However, the observables' fluctuations over the statistical samples of the lattices are correlated, as they share the same background lattice. By exploiting the correlations between them, Ref. Yoon:2018krb showed that a boosted decision tree (BDT) regression algorithm is able to replace the computationally expensive direct calculation of some observables by computationally cheap machine learning predictions made from other observables.

In this section, we apply the regression algorithm proposed in Section II to the lattice QCD simulation data used for the calculation of the charge-parity (CP) symmetry violating phase $\alpha$ of the neutron Yoon:2017tag; Pospelov:2005pr. Here we consider three types of observables: (1) two-point correlation functions of neutrons calculated without CP-violating (CPV) interactions, $C_{2\rm pt}(t)$, (2) $\gamma_5$-projected two-point correlation functions of neutrons calculated without CPV interactions, $C_{2\rm pt}^{\gamma_5}(t)$, and (3) $\gamma_5$-projected two-point correlation functions of neutrons calculated with CPV interactions, $C_{2\rm pt}^{\gamma_5,\rm CPV}(t)$; the phase $\alpha$ is extracted from the imaginary part of $C_{2\rm pt}^{\gamma_5,\rm CPV}(t)$. Those observables are calculated at multiple values of the nucleon source and sink separation $t$ in the Euclidean time direction.

III.1 Method

The goal of our regression problem is to predict the imaginary part of $C_{2\rm pt}^{\gamma_5,\rm CPV}$ at a fixed source-sink separation from the real and imaginary parts of the two-point correlation functions calculated without CPV interactions, $C_{2\rm pt}$ and $C_{2\rm pt}^{\gamma_5}$, at five values of $t$, given in units of the lattice spacing $a$. This forms a problem with a single output variable ($y$) and 20 input variables ($\mathbf{x}$: two observables, real and imaginary parts, five timeslices). In this application, we use 15616 data points of these observables, measured in Refs. Bhattacharya:2016oqm; Bhattacharya:2016rrc, divided into 6976 training data and 8640 test data. Using these datasets, we follow the regression procedure proposed in Section II to predict $y$ on the test dataset, which contains around 9K data points.

The procedure can be summarized as follows. First, we standardize the full dataset using the mean and variance of the training dataset for normalization. Then, we perform the pre-training and obtain a dictionary for the 20 elements of $\mathbf{x}$ using only the test dataset. After appending $y$ to $\mathbf{x}$ as the 21st element in the training dataset, we perform the sparse coding dictionary learning and update the dictionary to encode the correlation between $\mathbf{x}$ and $y$. For prediction, input vectors in the test dataset are augmented to dimension 21 by appending the average value of $y$, which is 0 after standardization. Finally, sparse coefficients for the augmented input vectors are calculated with the fixed dictionary obtained above, and predictions of $y$ are obtained by taking the 21st element of the reconstructed vectors on the test dataset.
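A brief sketch of this data preparation, assuming hypothetical arrays X_train (6976 x 20), y_train, and X_test (8640 x 20):

```python
import numpy as np

# Standardize with training-set statistics only, then append y as the 21st element.
mu_x, sd_x = X_train.mean(axis=0), X_train.std(axis=0)
mu_y, sd_y = y_train.mean(), y_train.std()

Z_train = np.hstack([(X_train - mu_x) / sd_x,
                     ((y_train - mu_y) / sd_y)[:, None]])   # shape (6976, 21)
Z_test = np.hstack([(X_test - mu_x) / sd_x,
                    np.zeros((len(X_test), 1))])            # y initialized to its mean (0)
```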

Figure 2: Original, or ground truth, data (blue circles) and the reconstruction of the data with the 21st element missing, obtained using the D-Wave 2000Q (red squares), for two randomly chosen data points. Here the 21st element is the dependent variable of the prediction, whose initial value before the reconstruction is set to 0.

Note that a sparse coding problem solves for the sparsest representation $a$ and the dictionary $\phi$ simultaneously by minimizing Eq. (1). First, the optimization for $a$ is performed using the D-Wave 2000Q at a given $\phi$, whose initial guess is provided, in general, by random numbers or via an imprinting technique. Then, the optimization for $\phi$ is performed on classical CPUs. The latter step is an offline learning for the fixed values of $a$ obtained using the D-Wave 2000Q. In the offline learning procedure, $\phi$ is learned using a batch stochastic gradient descent (SGD) algorithm:

$\phi \;\leftarrow\; \phi \;-\; \eta \sum_{k \in \mathcal{B}} \frac{\partial E_k}{\partial \phi} \;=\; \phi \;+\; \eta \sum_{k \in \mathcal{B}} \left( \mathbf{x}_k - \phi\, a_k \right) a_k^T\,,$   (5)

where $E_k$ is the sparse coding energy function of Eq. (1) for a given input data point $\mathbf{x}_k$, $\mathcal{B}$ is the current batch, and $\eta$ is the learning rate. Batch learning is used with a fixed batch size. We repeat the iterative update of the quantum D-Wave inference for $a$ and the SGD learning for $\phi$ until convergence is attained. On average, we find convergence after a few iterations. In this study, we use the SAPI2 python client libraries sapi for implementing the D-Wave operations.
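A minimal sketch of one such SGD dictionary update for fixed binary codes is given below; the column renormalization of the dictionary atoms is a common practice assumed here, not a detail stated in the text.

```python
import numpy as np

def sgd_dictionary_step(phi, X_batch, A_batch, eta):
    """One batch SGD update of the dictionary phi (D x K) for fixed binary
    codes A_batch (B x K); grad of 0.5*||x - phi@a||^2 w.r.t. phi is -(x - phi@a) a^T."""
    grad = -(X_batch - A_batch @ phi.T).T @ A_batch / len(X_batch)
    phi = phi - eta * grad
    # keep atoms on the unit sphere (assumed normalization convention)
    phi /= np.maximum(np.linalg.norm(phi, axis=0, keepdims=True), 1e-12)
    return phi
```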

The sparsity of the sparse representation, controlled by the sparsity penalty parameter $\lambda$, is calculated as the ratio of nonzero elements in $a$. In this study, $\lambda$ is tuned to values that make the average sparsity about 20%, because we find that 20% sparsity provides optimal prediction performance after examining a few different values of $\lambda$. Although the prediction performance could be further optimized by an extensive parameter search, such as that performed in Ref. Kenyon_phasetransition, such a procedure is computationally expensive and is omitted in this proof-of-principle study.
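The text does not specify how $\lambda$ was tuned beyond trying a few values; one simple possibility is a bisection on the average sparsity, sketched below under the assumption of an encode(z, lam) routine that returns the binary coefficient vector.

```python
import numpy as np

def tune_lambda(encode, Z, target=0.20, lam_lo=0.0, lam_hi=10.0, iters=20):
    """Bisect the penalty so the average fraction of nonzero binary
    coefficients matches the target sparsity (~20% here)."""
    for _ in range(iters):
        lam = 0.5 * (lam_lo + lam_hi)
        sparsity = np.mean([encode(z, lam).mean() for z in Z])
        if sparsity > target:
            lam_lo = lam   # too many active atoms: raise the penalty
        else:
            lam_hi = lam
    return 0.5 * (lam_lo + lam_hi)
```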

Note that the definition of the overcompleteness is not straightforward for the D-Wave inferred sparse coding, because the input signal may take arbitrary real values while the sparse coefficients can take only the binary values 0 or 1. Ignoring this subtlety for simplicity, the overcompleteness for an input signal of dimension 20 (or 21 for the extended vectors) can be calculated as $N_q/20$ (or $N_q/21$), where $N_q$ is the number of logical qubits, i.e., the number of dictionary atoms.

III.2 Results

Examples of the reconstruction and prediction for randomly chosen test data points are visualized in Fig. 2. In the plot, the first 20 elements are the input variables, and element 21 is the output of the prediction algorithm. As one can see, the reconstructions of the 21st element, which was 0 in the initial guess, are successfully shifted close to the ground truth, as expected.

Figure 3: Distribution of the prediction error of the 21st element plotted against the distribution of the ground truth for different numbers of qubits $N_q$. A narrower width of the prediction error indicates a better prediction. The standard deviations of the prediction errors decrease as $N_q$ is increased; the scaling of the prediction error is summarized in Fig. 4.

In order to investigate the prediction accuracy for different $N_q$, we explore the prediction algorithm with six different numbers of qubits between $N_q = 20$ and $N_q = 64$. Note that a larger $N_q$ implies a more difficult optimization problem, and $N_q = 64$ is the maximum number of logical qubits that can be embedded onto the D-Wave 2000Q. In Fig. 3, we show the distribution of the normalized original data of the dependent variable and its prediction error, defined by the difference between the ground truth $y$ and its prediction $\hat{y}$: $\Delta y = y - \hat{y}$. It is clearly demonstrated that (1) the prediction error is much smaller than the fluctuation of the original data, (2) the prediction error is sharply distributed near 0, which indicates no obvious bias in the prediction, and (3) the prediction error tends to be smaller as $N_q$ becomes larger.

To quantitatively evaluate the prediction quality, i.e., the recovery of the 21st element of the extended input vector, we calculate the ratio of the standard deviation of the prediction error to that of the original data: $q \equiv \sigma_{\Delta y} / \sigma_y$. The ratio $q$ converges to 0 when the prediction is precise, and $q = 1$ indicates no predictive power for statistical data. Note that this definition of the prediction quality does not account for the bias of the prediction, because the bias in the prediction of statistical data can be removed by following the procedure introduced in Ref. Yoon:2018krb, which is based on the variance reduction technique of lattice QCD calculations Bali:2009hu; Blum:2012uh.
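In code, this quality measure is a one-liner (names illustrative):

```python
import numpy as np

def prediction_quality(y_true, y_pred):
    """sigma(prediction error) / sigma(ground truth):
    0 = perfect prediction, 1 = no predictive power."""
    return np.std(y_true - y_pred) / np.std(y_true)
```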

Fig. 4 shows the prediction error as a function of the number of qubits. It clearly demonstrates a systematic decrease of the prediction error as $N_q$ is increased. Although no theory explaining the scaling is known, we find that the scaling roughly follows an exponential decay ansatz $q(N_q) = A\, e^{-N_q/N_0} + q_\infty$. By fitting the ansatz to the data points, an asymptotic value of the prediction quality $q_\infty$ is obtained, which depends mildly on whether the smallest-$N_q$ data point is included in the fit. For comparison, regression algorithms provided by the scikit-learn library scikit-learn on a classical computer are investigated for the same dataset, and the BDT regression algorithm shows the best classical prediction performance, comparable to the asymptotic value obtained above.
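The exponential fit can be reproduced with a standard least-squares routine; the functional form and starting values below are assumptions for illustration, and nq and q are hypothetical arrays of qubit counts and measured error ratios.

```python
import numpy as np
from scipy.optimize import curve_fit

def ansatz(nq, amp, n0, q_inf):
    """Exponential-decay ansatz for the prediction quality in N_q."""
    return amp * np.exp(-nq / n0) + q_inf

popt, _ = curve_fit(ansatz, nq, q, p0=[1.0, 20.0, 0.1])
q_infinity = popt[2]   # asymptotic prediction quality
```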

Figure 4: Prediction error of the sparse coding regression algorithm implemented on the D-Wave 2000Q, applied to the prediction of the lattice QCD simulation data, as a function of $N_q$ (red squares). An exponential ansatz is fitted to the data points for all $N_q$ (blue solid line) and for those excluding the smallest $N_q$ (green dashed line).

Pre-training significantly lowers the prediction error of this regression algorithm. When we perform the prediction without the pre-training procedure, we find a noticeably larger prediction error than that obtained with pre-training. Without the pre-training, furthermore, the required number of iterative updates of the D-Wave inference for $a$ and the SGD learning for $\phi$ increases to about 10 iterations.

IV Conclusion

In this paper, we proposed a regression algorithm using sparse coding dictionary learning that can be implemented on a quantum annealer, based on the formulation of regression as an inpainting problem. A pre-training technique is introduced to improve the prediction quality. The procedure is described in Section II.1. The regression algorithm was numerically demonstrated on a set of lattice QCD simulation observables and was able to predict the correlation function calculated in the presence of the CPV interactions from those calculated without the CPV interactions. The regression experiment was carried out on the D-Wave 2000Q quantum annealer with a minor-embedding technique used to obtain fully connected logical qubits. The study was performed for six different numbers of qubits between 20 and 64, and it showed a systematic decrease of the prediction error as the number of qubits is increased (see Fig. 4). With a larger number of qubits and an elaborately tuned sparsity parameter, we expect further improved performance in the future.

Acknowledgements.
The sparse coding optimizations were carried out using the D-Wave 2000Q at Los Alamos National Laboratory (LANL). Simulation data used for the numerical experiments were generated using the computer facilities at (i) the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231; (ii) the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725; (iii) the USQCD Collaboration, which is funded by the Office of Science of the U.S. Department of Energy; and (iv) Institutional Computing at Los Alamos National Laboratory. This work was supported by the U.S. Department of Energy, Office of Science, Office of High Energy Physics under Contract No. 89233218CNA000001. BY also acknowledges support from the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research and Office of Nuclear Physics, Scientific Discovery through Advanced Computing (SciDAC) program, and the LANL LDRD program.

References
