# An Artificial Neural Network Approach to the Analytic Continuation Problem

###### Abstract

Inverse problems are encountered in many domains of physics, with analytic continuation of the imaginary Green’s function into the real frequency domain being a particularly important example. However, the analytic continuation problem is ill-defined and currently no analytic transformation for solving it is known. We present a general framework for building an artificial neural network that efficiently solves this task with a supervised learning approach, provided that the forward problem is sufficiently stable for creating a large training dataset of physically relevant examples. By comparing with the commonly used maximum entropy approach, we show that our method can reach the same level of accuracy for low-noise input data, while performing significantly better when the noise strength increases. We also demonstrate that adding an unsupervised denoising step significantly improves the accuracy of the maximum entropy predictions for noisy inputs. The computational cost of the proposed neural network appoach is reduced by almost three orders of magnitude compared to the maximum entropy method.

###### pacs:

Valid PACS appear here###### pacs:

——–Numerical simulations are playing an extensive role in a growing number of scientific disciplines. Most commonly, numerical simulations approximate a process or a field on a discretized map, taking as input an equation describing the model as well as initial and boundary conditions Kabanikhin (2011). Problems falling under this definition are known as direct or forward problems. However, in a number of situations it is required to reconstruct an approximation of the input data or the model that generated it given the observable data. One particularly important example, known as the Fredholm integral equation of the first kind, takes the following form:

(1) |

where is the available quantity, is the quantity of interest and is the kernel. Such problem definitions are known as inverse problems and are mostly ill-posed Ramm (2006). Noise affecting the data may lead to arbitrarily large errors in the solutions. Formally speaking, one is interested in operators that are ill-conditioned or degenerated. Therefore, the formal inversion is not a stable operation Kabanikhin (2011). One popular approach for solving such problems is to construct regularizing algorithms that converge to a “pseudo-solution” Kabanikhin (2008). In this work, we will focus on a particularly important case of the analytic continuation problem in quantum many-body physics. The analytic continuation problem seeks to recover the electron single-particle spectral density in the real frequency domain from the fermionic Green’s function in imaginary time domain. These quantities are related by the following equation:

(2) |

High frequencies of are exponentially damped and their contribution to is negligible. Therefore, very different spectral density functions can correspond to very similar Green’s functions due to the ill-defined character of the problem. The analytic continuation is particularly important in quantum Monte Carlo methods that seek to compute , and whose results need to be compared to the observed Gubernatis et al. (2016).

There exist several techniques for performing the analytic continuation (see Refs. Jarrell and Gubernatis, 1996; Sandvik, 1998; Beach, ; Syljuåsen, 2008; Klepfish et al., 1998; Beach et al., 2000). Among them, the so-called maximum entropy (MaxEnt) approach is the most commonly used. This method initiates by generating random distributions and computes their corresponding using Eq. 2. and are then used to minimize , where , with being the covariance matrix, an entropy term that regularizes the problem by preventing the overfitting of the data, and is a temperature-like parameter that weights the relative importance of over . This last parameter is usually fixed by inferring the probability distribution using Bayes’ rule in a first step, and then by either fixing at the value that maximizes this distribution or by averaging different predictions with the weights provided by the distribution Fuchs et al. (2010). There is no a priori way to know which of these two approaches will give the best results in a given case.

The use of statistical methods for solving Fredholm integral equations has emerged recently. Those include the regularization of the problem and the acceleration of existing methods through dimensionality reduction of the Green’s functions Otsuki et al. (2017); Wu et al. (2013). Dahm and Keller pointed out that a similar context of the Fredholm inverse integral equation of the 2nd kind maps onto a reinforcement learning framework Dahm and Keller (2017). More recently, several works have shown that a machine learning approach is suitable for solving inverse problems Li et al. (); McCann et al. (2017); Arsenault et al. (2017). Among them, Arsenault et al. directly applied the kernel ridge regressions to the analytic continuation problem. In their work, a dataset of physically relevant pairs of spectral density and Green’s functions was generated and the kernel-ridge regression was used to train a model that approximates inversion of Eq. 2. The predictions were then projected onto a basis of functions that satisfies the problem constraints to ensure the physical relevance of the result. This data-driven approach distills the prior knowledge into the simulated training data, instead of placing it as prior knowledge in a default spectrum as in the traditional MaxEnt method. This allows more flexibility in the regularization of the dataset.

In this Letter, we propose an artificial neural network (ANN) approach to solving the analytic continuation problem. We show that the conventional regularization techniques of deep learning can efficiently solve inverse problems. In particular, a final softmax layer ensures the positivity and normalization of the spectrum function . In that sense, our framework directly integrates the projection step employed in Ref. Arsenault et al., 2017. We show that by applying standard training techniques, we are able to reach accuracy comparable with the MaxEnt approach while keeping lower standard deviations in the error distribution. Furthermore, our approach is more robust against input data that contains noise. We also show that the use of an unsupervised denoising preprocessing step greatly improves the robustness of the MaxEnt method against noisy inputs. Finally, we demonstrate that a trained neural network solves the analytic continuation problem at a much lower computational cost.

By their nature, supervised learning algorithms require a training dataset. Data can be collected from experimental measurements or simulated according to a theoretical model. In our work, we use the latter approach. We stress here that, apart the final softmax layer that ensures the output to have the shape of a density function, the choice of the training data is the main source of regularization of the problem. In this work, we choose to simulate spectral density functions that always has a quasiparticle peak close to as often encountered when considering correlated metals Zhang and Rice (1988); Kondo (1964). As in Ref. Arsenault et al., 2017, the model spectral densities are defined as a sum of uncorrelated Gaussian distributions:

(3) |

where the frequencies , the maximum frequency eV, the centers of the peaks eV, the number of Gaussian distributions and their broadening eV. The prefactor is used to normalize the spectrum. Parameters , and are uniformly sampled over the above-mentioned ranges, with the exception of that is constrained to be located close to the origin ( eV). These spectral functions are then used to compute their corresponding Green’s functions according to Eq. 2. A representative example of a pair is given in Fig. 1a,b. This approach can be extended to other Green’s function and self-energy problems by adapting Eq. 3 Levy et al. (2017).

Working in high-dimensional spaces leads to numerous problems commonly referred to as the “curse of dimensionality” Bellman (2015). In the absence of simplifying assumptions, the amount of data necessary to approximate a function to a given accuracy grows exponentially with the number of dimensions Lee and Verleysen (2007). This is due to the fact that for a given amount of data, the parameter space becomes more sparse when the dimensionality increases, which is also reflected in the term “empty space phenomenon”. Therefore, finding a more compact representation of the data facilitates the learning process of the model. In previous works, the orthogonal basis of Legendre polynomials was successfully used to improve results involving Green’s functions Huang (2016); Wu et al. (2013). Applying this idea, we compute the inputs from using the following definition:

(4) |

where are the Legendre polynomials. The truncation of polynomial expansions of a function can give rise to the Gibbs oscillations Gibbs (1898); Weiße et al. (2006). In our case, the norm of the coefficients decreases rapidly, therefore only the first 64 coefficients of the expansion are retained (Fig. 1b, inset).

We randomly generate one million pairs using this procedure and split them into a test set including 500 pairs SM () and a training set containing the rest of data. Since in practice we deal with quantum Monte Carlo data that is noisy by nature, we also generated test sets in which the Green’s functions were corrupted by errors [Fig. 1(a)] :

(5) |

where are independent random variables normally distributed with standard deviation .

The predicted spectral density functions are compared with the ones present in the dataset using two different measures. Namely, we employ the mean absolute error (MAE) as the most intuitive way to assess the difference

(6) |

and the Kullback-Leibler divergence (KLD) Kullback and Leibler (1951)

(7) |

which results in a better convergence of the model. Our aim is to build an ANN that is able to yield accurate predictions on unseen data. In deep learning terms, we speak of having sufficient representative and generalization capacity. The training of our model is improved until it performs better on average than the MaxEnt method on test data containing a low noise level of . This ensures that ANN performs well on real data. In our simulations MaxEnt reaches an average MAE of 0.0038 for such data.

The step-by-step design of our ANN starting from the simplest 2-layer model is briefly described in the Supplementary Material document SM (). The ultimate architecture of our model, shown in Fig. 1(c), consists of an input layer connected to a dense layer, followed by four repetitions of a sequence of layers. The first layer in this sequence is a batch normalization layer that improves the training by normalizing the data Loff and Szegedy (). This layer is followed by a linear rectifier unit (ReLU), which adds non-linearities into the model by taking the maximum value between 0 and each unit. Then follows a dropout unit that helps to avoid overfitting issues by randomly dropping connections between units during the training process Srivastava et al. (2014). Finally, a fully connected dense layer containing 1024 units learns the relation between the input and the output. After four repetitions of this sequence, the output is computed using a Softmax layerGoodfellow et al. (2016), which ensures its similarity to a probability density function. The training is performed on a dataset of one million samples using the Adam algorithm Kingma and Ba () as an optimizer and the KLD as a loss function. We reduce the learning rate, that is the speed at which the parameters are updated, each time the validation loss reaches a plateau during the training. Details concerning the training process can be found in the Supplementary Material SM ().

We have also investigated two approaches to improving the robustness of our results against noisy input data. The first approach relies on preprocessing data using unsupervised learning. Inspired by denoising algorithms in image recognition, we adopted the principal component analysis (PCA) approach for this purpose Jolliffe (2002); Zhang et al. (2010). As a first step, we compute the principal components of the Legendre coefficients present in our database to form a new basis whose vectors are aligned along the direction of maximum variance in the dataset. The noisy samples are then expressed in this basis, and all but the first components are set to zero. In our case, components are retained, which corresponds to directions covering of the variance of the training dataset. The data is then transformed back to the original space. This filtering approach keeps the important features of the sample while reducing the noise it contains. Examples of the filtered data can be found in the Supplementary Material document SM (). The second method of improving the robustness of the results only works for ANN and consists in adding a gaussian noise layer at the beginning of the network during the training phase. This layer adds random gaussian noise to the input data. Networks trained using this extra layer typically achieve better results than MaxEnt for levels of noise larger than by learning only from 25000 input data realizations.

Figure 2 provides a qualitative comparison of the results of our ANN and the MaxEnt implementation of Levy et al. Levy et al. (2017) for three different noise levels, , and , generated following the procedure described above. The level of noise was provided as parameter for MaxEnt and used to select the network for ANN, as explained above. In these examples, both methods predict accurately for the lowest level of noise. However, at MaxEnt tends to suppress some peaks in the predicted spectral function , while in the case of ANN this tendency is much less pronounced. At the highest level of noise , our ANN is able to correctly identify most peaks, whereas MaxEnt flattens the distribution (Fig. 2). Our ANN also showed better results compared to MaxEnt on the example of Legendre data from Ref. Levy et al., 2017.

We will now analyze the errors committed by the discussed models and the impact of the PCA step for increasing levels of noise (, and ). Fig. 3 displays the MAE distributions of both the ANN and MaxEnt methods, used with and without the PCA. We first notice that ANN has two main advantages over the MaxEnt approach. Firstly, the mean MAE of ANN remains lower than the one of MaxEnt at all noise levels. Second, the MAE of ANN shows a smaller spread. Adding a preprocessing PCA step has the remarkable advantage of of reducing the dependance of the accuracy of results on the noise level. However, this filter enlarges the errors at the lowest levels of noise and creates small counts in the whole interval of errors. In case of ANN, there is no benefit of using the PCA methods, while results for high levels of noise are improved in the case of MaxEnt. These results suggest that PCA efficiently removes noise from the data, but also alters some important characteristics. When using PCA in practice, one should carefully study the tradeoff between the noise reduction and the accuracy of the model. Adding more principal components will improve the accuracy, but will also penalize the efficiency of noise reduction. Fig. 4a summarizes our observations. ANN outperfoms MaxEnt and behaves even better with increasing the noise level, while the PCA filtering step offers better results at high noise levels.

It is worth discussing also the limitations of the proposed method. One drawback of our supervised learning approach is that the model is optimized for a particular inverse temperature (here, ), whereas the MaxEnt method can provide it as a parameter. A direct approach for extending our model to arbitrary values of would be to train a separate ANN for each temperature. Indeed, predictions based on our ANN remain stable in the vicinity of the optimal temperature, as shown in Fig. 4b. Another worth-trying idea would be to add as an input parameter to the model and train it over a range of temperatures. Both of these approaches would increase the training time, but would have no consequences on the computation time of a prediction.

Finally, we would like to underline the computational efficiency of our approach compared to MaxEnt. ANN allows a direct mapping between Green’s functions and the spectral densities. In that sense, we can define it as a direct way of solving the problem. In contrast, MaxEnt is an iterative method which requires generating trial functions until convergence is reached. In our computational setup, the time required for converting 500 pairs with is 5 seconds in the case of ANN (including the loading of the required libraries), while MaxEnt would need 51 minutes using the same setup.

To summarize, thanks to the stability of the forward problem, we have built an artificial neural network that solves the analytic continuation problem with an accuracy similar to that of the commonly used maximum entropy approach, but at a fraction of its computational cost. We have also shown that our ANN model performs better for noisy inputs and implemented a denoising filter based on the principal component analysis to improve the results. Adding more data and increasing the number of parameters can further improve the accuracy, although training must be performed with care to avoid overfitting issues. The great representative capacity of deep neural networks suggests that other inverse problem can be solved in a similar way, provided that a dataset can be built.

Trained models resulting from our work can be obtained from public repository Fournier (2018).

We thank Xi Dai for helpful discussions at the early stage of this project. O.V.Y. and Q.W. acknowledge support by NCCR Marvel. L.W. acknowledges support by the the National Natural Science Foundation of China under Grant No. 11774398. Computations were performed at the Swiss National Supercomputing Centre (CSCS) under project s832 and the facilities of Scientific IT and Application Support Center of EPFL.

Note: Another work using convolutional neural network to solve the analytic continuation problem Yoon et al. () appeared when the present manuscript was in preparation.

## References

- Kabanikhin (2011) S. I. Kabanikhin, Inverse and Ill-posed Problems: Theory and Applications (Walter de Gruyter, Berlin, Boston, 2011).
- Ramm (2006) A. G. Ramm, Inverse Problems: Mathematical and Analytical Techniques with Applications to Engineering (Springer US, Boston, 2006).
- Kabanikhin (2008) S. I. Kabanikhin, Journal of Inverse and Ill-posed Problems. 16, 317 (2008).
- Gubernatis et al. (2016) J. Gubernatis, N. Kawashima, and P. Werner, Quantum Monte Carlo Methods: Algorithms for Lattice Models (Cambridge University Press, Cambridge, 2016).
- Jarrell and Gubernatis (1996) M. Jarrell and J. Gubernatis, Physics Reports 269, 133 (1996).
- Sandvik (1998) A. W. Sandvik, Phys. Rev. B 57, 10287 (1998).
- (7) K. S. D. Beach, arXiv:cond-mat/0403055 .
- Syljuåsen (2008) O. F. Syljuåsen, Phys. Rev. B 78, 174429 (2008).
- Klepfish et al. (1998) E. Klepfish, C. Creffield, and E. Pike, Nuclear Physics B-Proceedings Supplements 63, 655 (1998).
- Beach et al. (2000) K. Beach, R. Gooding, and F. Marsiglio, Physical Review B 61, 5147 (2000).
- Fuchs et al. (2010) S. Fuchs, T. Pruschke, and M. Jarrell, Phys. Rev. E 81, 056701 (2010).
- Otsuki et al. (2017) J. Otsuki, M. Ohzeki, H. Shinaoka, and K. Yoshimi, Phys. Rev. E 95, 061302 (2017).
- Wu et al. (2013) Q.-S. Wu, Y.-L. Wang, Z. Fang, and X. Dai, Chinese Physics Letters 30, 090201 (2013).
- Dahm and Keller (2017) K. Dahm and A. Keller, in ACM SIGGRAPH 2017 Talks, SIGGRAPH ’17 (ACM, New York, NY, USA, 2017) pp. 73:1–73:2.
- (15) H. Li, J. Schwab, S. Antholzer, and M. Haltmeier, arXiv:1803.00092 .
- McCann et al. (2017) M. T. McCann, K. H. Jin, and M. Unser, IEEE Signal Processing Magazine 34, 85 (2017).
- Arsenault et al. (2017) L.-F. Arsenault, R. Neuberg, L. A. Hannah, and A. J. Millis, Inverse Problems 33, 115007 (2017).
- Zhang and Rice (1988) F. Zhang and T. Rice, Physical Review B 37, 3759 (1988).
- Kondo (1964) J. Kondo, Progress of theoretical physics 32, 37 (1964).
- Levy et al. (2017) R. Levy, J. LeBlanc, and E. Gull, Computer Physics Communications 215, 149 (2017).
- Bellman (2015) R. E. Bellman, Adaptive Control Processes: A Guided Tour (Princeton University Press, Princeton, 2015).
- Lee and Verleysen (2007) J. A. Lee and M. Verleysen, Nonlinear Dimensionality Reduction (Springer Science & Business Media, New York, 2007).
- Huang (2016) L. Huang, Chinese Physics B 25, 117101 (2016).
- Gibbs (1898) J. W. Gibbs, Nature 59, 200 (1898).
- Weiße et al. (2006) A. Weiße, G. Wellein, A. Alvermann, and H. Fehske, Rev. Mod. Phys. 78, 275 (2006).
- (26) The supplementary material document contains details of the design and training of the proposed ANN as well as the discussions of the effect of temperature variations, test dataset size and PCA filtering.
- Kullback and Leibler (1951) S. Kullback and R. A. Leibler, The Annals of Mathematical Statistics 22, 79 (1951).
- (28) S. Loff and C. Szegedy, arXiv:1502.03167 .
- Srivastava et al. (2014) N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Journal of Machine Learning Research 15, 1929 (2014).
- Goodfellow et al. (2016) I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (MIT Press, Cambridge, 2016).
- (31) D. P. Kingma and J. Ba, arXiv:1412.6980 .
- Jolliffe (2002) I. T. Jolliffe, Principal Component Analysis, 2nd ed., Springer Series in Statistics (Springer-Verlag, New York, 2002).
- Zhang et al. (2010) L. Zhang, W. Dong, D. Zhang, and G. Shi, Pattern Recognition 43, 1531 (2010).
- Fournier (2018) R. Fournier, “ACANN project repository,” https://github.com/rmnfournier/ACANN (2018).
- (35) H. Yoon, J.-H. Sim, and M. J. Han, arXiv:1806.03841v1 .