Deep learning: Extrapolation tool for ab initio nuclear theory
Ab initio approaches in nuclear theory, such as the no-core shell model (NCSM), have been developed for approximately solving finite nuclei with realistic strong interactions. The NCSM and other approaches require an extrapolation of the results obtained in a finite basis space to the infinite basis space limit and an assessment of the uncertainty of those extrapolations. Each observable requires a separate extrapolation, and many observables have no proven extrapolation method. We propose a feed-forward artificial neural network (ANN) method as an extrapolation tool to obtain the ground-state energy and the ground-state point-proton root-mean-square (rms) radius along with their extrapolation uncertainties. The designed ANNs are sufficient to produce results for these two very different observables in ⁶Li from ab initio NCSM results in small basis spaces, and these results satisfy the following theoretical physics condition: independence of basis space parameters in the limit of extremely large matrices. Comparisons of the ANN results with other extrapolation methods are also provided.
A major long-term goal of nuclear theory is to understand how low-energy nuclear properties arise from strongly interacting nucleons. When interactions that describe nucleon-nucleon (NN) scattering data with high accuracy are employed, the approach is considered to be a first principles or ab initio method. This challenging quantum many-body problem requires a non-perturbative computational approach for quantitative predictions.
With access to powerful high-performance computing (HPC) systems, several ab initio approaches have been developed to study nuclear structure and reactions. The no-core shell model (NCSM) Barrett et al. (2013) is one of these approaches that falls into the class of configuration interaction methods. Ab initio theories, such as the NCSM, traditionally employ realistic inter-nucleon interactions and provide predictions for binding energies, spectra, and other observables in light nuclei.
The NCSM casts the non-relativistic quantum many-body problem as a finite Hamiltonian matrix eigenvalue problem expressed in a chosen, but truncated, basis space. A popular choice of basis representation is the three-dimensional harmonic-oscillator (HO) basis that we employ here. This basis is characterized by the HO energy, ħΩ, and the many-body basis space cutoff, N_max. The cutoff N_max for the configurations to be included in the basis space is defined as the maximum of the sum over all nucleons of their HO quanta (twice the radial quantum number plus the orbital quantum number) above the minimum needed to satisfy the Pauli principle. Due to the strong short-range correlations of nucleons in a nucleus, a large basis space (model space) is required to achieve convergence in this two-dimensional parameter space (ħΩ, N_max), where convergence is defined as independence of both parameters within evaluated uncertainties. However, one faces major challenges in approaching convergence since, as the size of the space increases, the demands on computational resources grow rapidly. In practice these calculations are limited, and one cannot directly calculate, for example, the ground-state (GS) energy or the GS point-proton root-mean-square (rms) radius for an N_max large enough to provide good approximations to the converged result in most nuclei of interest Vary et al. (2009); Maris et al. (2009); Maris and Vary (2013); Shirokov et al. (2014). We focus on these two observables in the current investigation.
To obtain the GS energy and the GS point-proton rms radius as close as possible to the exact results, the NCSM and other ab initio approaches require an extrapolation of the results obtained in a finite basis space to the infinite basis space limit and assessment of the uncertainty of those extrapolations Maris et al. (2009); Maris and Vary (2013); Shin et al. (2017). Each observable requires a separate extrapolation and many observables have no proposed extrapolation method at the present time.
Deep learning is a subfield of machine learning concerned with algorithms inspired by the structure and function of the brain called artificial neural networks (ANNs). In recent years, deep learning has become a tool for solving challenging data analysis problems in a number of domains. For example, several successful applications of ANNs have emerged in nuclear physics, high-energy physics, and astrophysics, as well as in biology, chemistry, meteorology, geosciences, and other fields of science. Applications of ANNs to quantum many-body systems have involved multiple disciplines and have been under development for many years Clark (1999).
ANNs have been applied previously to an array of problems in nuclear physics. For example, ANN models have been developed for identification of impact parameter in heavy ion collisions David et al. (1995); Bass et al. (1996); Haddad et al. (1997), statistical modeling of nuclear systematics Clark et al. (2001), developing nuclear mass systematics Athanassopoulos et al. (2004), determining one- and two-proton separation energies Athanassopoulos et al. (2005), modeling systematics of decay half-lives Costiris et al. (2007, 2009), constructing a model for the nuclear charge radii Akkoyun et al. (2013a), and obtaining potential energy curves Akkoyun et al. (2013b). More recent applications include predicting nuclear masses for properties of neutron stars Utama et al. (2016a), predicting nuclear charge radii Utama et al. (2016b), as well as improving and validating nuclear mass formulas Utama and Piekarewicz (2017, 2018). An ambitious application of ANNs for extrapolating nuclear binding energies is also noteworthy Neufcourt et al. (2018).
The present work proposes a feed-forward ANN method as an extrapolation tool to obtain the GS energy and the GS point-proton rms radius, together with their extrapolation uncertainties, based upon NCSM results in readily solved basis spaces. The advantage of the ANN is that it does not need an explicit analytical expression to model the variation of the GS energy or the GS point-proton rms radius with respect to ħΩ and N_max. We will demonstrate that the feed-forward ANN method is very useful for estimating the converged result at very large N_max through applications to ⁶Li.
We have generated theoretical data for ⁶Li by performing ab initio NCSM calculations with the MFDn code Sternberg et al. (2008); Maris et al. (2010); Aktulga et al. (2014), a hybrid MPI/OpenMP code for ab initio nuclear structure calculations. These calculations use the Daejeon16 NN interaction Shirokov et al. (2016) and HO basis spaces up through the cutoff N_max = 18. This cutoff gives the largest many-body Hamiltonian matrix considered in this work. We note that NCSM basis spaces for ⁶Li have now been achieved up through still larger N_max in Forssén et al. (2018).
This research extends the work presented in Negoita et al. (2018), where we initially considered the GS energy and GS point-proton rms radius for ⁶Li produced with the feed-forward ANN method. In particular, the current work presents results using multiple datasets, which consist of data through a succession of cutoffs: N_max = 10, 12, 14, 16, and 18. The previous work considered only one dataset, up through N_max = 10. Furthermore, the current work is the first to report uncertainty assessments of the results. Comparisons of the ANN results and their uncertainties with other extrapolation methods are also provided.
The paper is organized as follows. In Section II, short introductions to the ab initio NCSM method and the ANN formalism are given. In Section III, our ANN architecture and filtering are presented. Section IV presents the results and discussions of this work. Section V contains our conclusion and future work.
II Theoretical Framework
The NCSM is an ab initio approach to the nuclear many-body problem that solves for the properties of nuclei for an arbitrary inter-nucleon interaction while preserving all the symmetries. The inter-nucleon interaction can consist of both NN components and three-nucleon forces, but we omit the latter in the current effort since they are not expected to be essential to the main thrust of the current ANN application. We will show that the ANN method is useful for predicting the GS energy and the GS point-proton rms radius, together with their extrapolation uncertainties, in ultra-large basis spaces using available data from NCSM calculations in smaller basis spaces. The NCSM and the ANN are discussed in more detail in the following subsections.
II.1 Ab Initio NCSM Method
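In the NCSM method, a nucleus consisting of A nucleons with N neutrons and Z protons (A = N + Z) is described by the quantum Hamiltonian with kinetic energy (T_rel) and interaction (V) terms

$$H_A = T_{\rm rel} + V = \frac{1}{A}\sum_{i<j}^{A}\frac{(\vec{p}_i - \vec{p}_j)^2}{2m} + \sum_{i<j}^{A} V_{ij} + \sum_{i<j<k}^{A} V_{ijk} + \ldots \qquad (1)$$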
Here, m is the nucleon mass (taken as the average of the neutron and proton mass), p_i is the momentum of the ith nucleon, V_ij is the NN interaction including the Coulomb interaction between protons, V_ijk is the three-nucleon interaction, and the interaction sums run over all pairs and triplets of nucleons, respectively. Higher-body (up to A-body) interactions are also allowed and signified by the three dots. As mentioned, we retain only the NN interaction, for which we select the Daejeon16 interaction Shirokov et al. (2016) in the present work.
Our chosen NN interaction, Daejeon16 Shirokov et al. (2016), is developed from an initial Chiral NN interaction at the next-to-next-to-next-to-leading order (N3LO) Entem and Machleidt (2002, 2003) by a process of similarity renormalization group evolution (SRG) Bogner et al. (2007, 2010) and phase-equivalent transformations (PETs) Lurie and Shirokov (1997, 2008); Shirokov et al. (2004). The PETs are chosen so that Daejeon16 describes well the properties of light nuclei without explicit use of three-nucleon or higher-body interactions, which, if retained, would require a significant increase of computational resources.
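With the nuclear Hamiltonian (1), the NCSM solves the A-body Schrödinger equation

$$H_A\,\Psi_A(\vec{r}_1, \ldots, \vec{r}_A) = E\,\Psi_A(\vec{r}_1, \ldots, \vec{r}_A) \qquad (2)$$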
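using a matrix formulation, where the A-body wave function is given by a linear combination of Slater determinants Φ_k

$$\Psi_A(\vec{r}_1, \ldots, \vec{r}_A) = \sum_{k=1}^{n_b} c_k\,\Phi_k(\vec{r}_1, \ldots, \vec{r}_A) \qquad (3)$$

and where n_b is the number of many-body basis states (configurations) in the system. The Slater determinant Φ_k is the antisymmetrized product of single-particle wave functions

$$\Phi_k(\vec{r}_1, \ldots, \vec{r}_A) = \mathcal{A}\left[\prod_{i=1}^{A} \phi_{\alpha_i}(\vec{r}_i)\right] \qquad (4)$$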
where φ_{α_i}(r_i) is the single-particle wave function for the ith nucleon and 𝒜 is the antisymmetrization operator. Although we adopt a common choice for the single-particle wave functions, the HO basis functions, one can extend this approach to a more general single-particle basis Negoita (2010); Caprio et al. (2012, 2014); Constantinou et al. (2017). The single-particle wave functions are labeled by the quantum numbers α = (n, l, j, m), where n and l are the radial and orbital HO quantum numbers (with N = 2n + l the number of HO quanta for a single-particle state), j is the total single-particle angular momentum, and m is its projection along the z axis.
We employ the M-scheme, where each HO single-particle state has its orbital and spin angular momenta coupled to good total angular momentum, j, and magnetic projection, m. The many-body basis states Φ_k have well-defined parity and total angular momentum projection, M = Σ_i m_i, but they do not have a well-defined total angular momentum J. The matrix elements of the Hamiltonian in the many-body HO basis are given by H_kl = ⟨Φ_k|H_A|Φ_l⟩. These Hamiltonian matrices are sparse: the number of non-vanishing matrix elements follows an approximate scaling rule in D, where D is the dimension of the matrix Vary et al. (2009). For these large, sparse Hamiltonian matrices, the Lanczos method is one possible choice for finding the extreme eigenvalues Parlett (1998).
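As an illustration of the Lanczos step only, the sketch below uses SciPy's `eigsh`, which wraps ARPACK's implicitly restarted Lanczos method, to obtain the lowest eigenvalues of a sparse symmetric matrix. The matrix is a hypothetical stand-in built from a spread-out diagonal plus weak random couplings. It is not an NCSM Hamiltonian, and this is not how MFDn implements its parallel eigensolver.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def lowest_eigenvalues(h, k=3):
    """k lowest eigenvalues of a sparse symmetric matrix, computed with
    ARPACK's implicitly restarted Lanczos method (which="SA" selects the
    smallest algebraic eigenvalues, i.e. the low-lying spectrum)."""
    return np.sort(eigsh(h, k=k, which="SA", return_eigenvectors=False))

# Hypothetical stand-in for a Hamiltonian (NOT an NCSM matrix): a sparse
# symmetric matrix with a spread-out diagonal and weak random couplings.
d = 1000
couplings = sp.random(d, d, density=2e-3, random_state=0, format="csr") * 0.1
h = sp.diags(np.arange(d, dtype=float)) + couplings + couplings.T

print(h.nnz, lowest_eigenvalues(h))
```

Only matrix-vector products with `h` are needed, which is why Lanczos-type methods suit very large sparse matrices.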
We adopt the Lipkin-Lawson method Lipkin (1958); Gloeckner and Lawson (1974) to enforce the factorization of the center-of-mass (CM) and intrinsic components of the many-body eigenstates. In this method, a Lagrange multiplier term, λ(H_CM − (3/2)ħΩ), is added to the Hamiltonian above, where H_CM is the HO Hamiltonian for the CM motion. With λ chosen positive (10 is a typical value), one separates the states of lowest CM motion from the states with excited CM motion by a scale factor of order λħΩ.
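The size of that separation can be checked with a short worked step. The derivation assumes, as the text states for this truncation, that each eigenstate factorizes exactly into an intrinsic part and a CM part in an HO state with N_CM quanta, and that H_A (built from relative coordinates and momenta) acts only on the intrinsic part:

$$H' = H_A + \lambda\left(H_{\rm CM} - \tfrac{3}{2}\hbar\Omega\right), \qquad H_{\rm CM}\,|N_{\rm CM}\rangle = \left(N_{\rm CM} + \tfrac{3}{2}\right)\hbar\Omega\,|N_{\rm CM}\rangle$$

$$H'\left(|\psi_{\rm intr}\rangle \otimes |N_{\rm CM}\rangle\right) = \left(E_{\rm intr} + \lambda N_{\rm CM}\,\hbar\Omega\right)\left(|\psi_{\rm intr}\rangle \otimes |N_{\rm CM}\rangle\right)$$

States with N_CM = 0 therefore keep their intrinsic energies. Each CM excitation raises the energy by λħΩ, for example by 10ħΩ for λ = 10.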
In our truncation approach, all possible configurations with up to N_max excitations above the unperturbed GS (the HO configuration with the minimum HO energy, defined to be the N_max = 0 configuration) are considered. The basis is limited to many-body basis states with total many-body HO quanta N_tot = Σ_{i=1}^{A} N_i ≤ N_0 + N_max, where N_0 is the minimal number of quanta for that nucleus, which is 2 for ⁶Li. Note that this truncation, along with the Lipkin-Lawson approach described above, leads to an exact factorization of the many-body wave functions into CM and intrinsic components. Usually, the basis includes one of two sets of states. The first consists only of many-body states with even values of N_tot − N_0 (and, correspondingly, of N_max). These have the same parity as the unperturbed GS (positive for ⁶Li) and are called the “natural” parity states. The second consists only of states with odd values of N_tot − N_0 (and of N_max), which have “unnatural” (negative for ⁶Li) parity.
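The truncation and parity bookkeeping can be sketched as follows. A configuration is represented, by assumption, as a plain list of single-particle (n, l) labels, one per nucleon. The helper names are hypothetical, and Pauli occupancy limits and the j, m quantum numbers are deliberately ignored. The sketch shows why the parity (−1)^Σl equals (−1)^N_tot, so that even N_tot − N_0 gives the natural parity.

```python
from typing import List, Tuple

Orbital = Tuple[int, int]  # (n, l): radial and orbital HO quantum numbers

def ho_quanta(n: int, l: int) -> int:
    """Number of HO quanta of a single-particle state, N = 2n + l."""
    return 2 * n + l

def excitation_and_parity(config: List[Orbital], n0: int) -> Tuple[int, int]:
    """Return (N_tot - N_0, parity) of a configuration (hypothetical helper).

    The parity is (-1)**sum(l); since 2n is even this equals (-1)**N_tot.
    """
    n_tot = sum(ho_quanta(n, l) for n, l in config)
    parity = (-1) ** sum(l for _, l in config)
    return n_tot - n0, parity

def in_basis(config: List[Orbital], n0: int, n_max: int) -> bool:
    """True if the configuration lies in an N_max-truncated basis whose states
    share the parity of N_max (natural parity when N_max is even).
    Pauli occupancy limits are NOT checked in this sketch."""
    excitation, _ = excitation_and_parity(config, n0)
    return 0 <= excitation <= n_max and (excitation - n_max) % 2 == 0

N0_LI6 = 2                                   # minimal HO quanta for 6Li, from the text
ground = [(0, 0)] * 4 + [(0, 1)] * 2         # 0s^4 0p^2
one_hw = [(0, 0)] * 4 + [(0, 1), (0, 2)]     # one p nucleon lifted to the 0d orbital
two_hw = [(0, 0)] * 4 + [(0, 1), (0, 3)]     # one p nucleon lifted to the 0f orbital

print(excitation_and_parity(ground, N0_LI6))  # (0, 1)
print(excitation_and_parity(one_hw, N0_LI6))  # (1, -1)
print(excitation_and_parity(two_hw, N0_LI6))  # (2, 1)
print(in_basis(two_hw, N0_LI6, n_max=10), in_basis(one_hw, N0_LI6, n_max=10))  # True False
```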
As already mentioned, the NCSM calculations are performed with the code MFDn Sternberg et al. (2008); Maris et al. (2010); Aktulga et al. (2014). Due to the strong short-range correlations of nucleons in a nucleus, a large basis space is required to achieve convergence. The requirement to simulate the exponential tail of a quantum bound state with HO wave functions possessing Gaussian tails places additional demands on the size of the basis space. Calculations that achieve the desired convergence are often not feasible due to the nearly exponential growth in matrix dimension with increasing N_max. To obtain the GS energy and other observables as close as possible to the exact results, one seeks solutions in the largest feasible basis spaces. These results are sometimes used in attempts to extrapolate to the infinite basis space. Several extrapolation methods have been developed to take the infinite matrix limit. These include “Extrapolation B” Maris et al. (2009); Maris and Vary (2013), together with “Extrapolation A5”, “Extrapolation A3”, and a further extrapolation scheme of Shin et al. (2017). These are extensions of techniques developed in Coon et al. (2012); Furnstahl et al. (2012); More et al. (2013); Wendt et al. (2015). We also note that theoretical extrapolation methods have been introduced and analyzed for quadrupole moments and transitions in Odell et al. (2016) and for capture cross sections in Acharya et al. (2017). Using such extrapolation methods, one investigates the convergence pattern with increasing basis space dimensions and thus obtains, to within quantifiable uncertainties, results corresponding to the complete basis. We will employ these extrapolation methods to compare with results from ANNs.
II.2 Artificial Neural Networks
ANNs are powerful tools that can be used for function approximation, classification, and pattern recognition, such as finding clusters or regularities in the data. The goal of ANNs is to find a solution efficiently when algorithmic methods are computationally intensive or do not exist. An important advantage of ANNs is the ability to detect complex non-linear input-output relationships. For this reason, ANNs can be viewed as universal non-linear function approximators Hornik et al. (1989). Employing ANNs for mapping complex non-linear input-output problems offers a significant advantage over conventional techniques, such as regression techniques, because ANNs do not require an explicit functional form relating the inputs to the outputs.
ANNs are computer algorithms inspired by the structure and function of the brain. Similar to the human brain, ANNs can perform complex tasks, such as learning, memorizing, and generalizing. They are capable of learning from experience, storing knowledge, and then applying this knowledge to make predictions.
ANNs consist of a number of highly interconnected artificial neurons (ANs), which are processing units. The ANs are connected with each other via adaptive synaptic weights. Each AN collects its input signals x_i and calculates a net signal, net = Σ_i w_i x_i + b, as the weighted sum of the input signals plus a bias b. Next, the AN calculates and transmits an output signal, y. The output signal is calculated using a function called an activation or transfer function, f, which depends on the value of the net signal: y = f(net).
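A minimal sketch of the single-neuron computation just described, assuming NumPy and the conventional bias term. The function name and the example numbers are illustrative only.

```python
import numpy as np

def neuron_output(x, w, b, activation=np.tanh):
    """Output of one artificial neuron: y = f(net), with net = sum_i w_i x_i + b."""
    net = np.dot(w, x) + b
    return activation(net)

x = np.array([20.0, 10.0])    # two input signals (illustrative values)
w = np.array([0.01, -0.02])   # synaptic weights
b = 0.05                      # bias
print(neuron_output(x, w, b))
```

Swapping `activation` for the identity gives the linear output neurons typically used for function approximation.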
One simple way to organize ANs is in layers, which gives a class of ANN called the multi-layer ANN. Such ANNs are composed of an input layer, one or more hidden layers, and an output layer. The neurons in the input layer receive the data from outside and transmit the data via weighted connections to the neurons in the first hidden layer, which, in turn, transmit the data to the next layer. Each layer transmits the data to the next layer, and finally the neurons in the output layer give the results. An ANN that propagates the input through all the layers and has no feedback loops is called a feed-forward multi-layer ANN. For simplicity, throughout this paper we adopt and work with a feed-forward ANN. For other types of ANN, see Bishop (1995); Haykin (1999).
For function approximation, sigmoid or sigmoid-like activation functions are usually used for the neurons in the hidden layer, and linear activation functions for the neurons in the output layer. There is no activation function for the input layer. The neurons with nonlinear activation functions allow the ANN to learn both nonlinear and linear relationships between input and output vectors. Because the quality of the approximation depends on the number of these hidden neurons, enough neurons should be used in the hidden layer to obtain a good function approximation.
In our terminology, an ANN is defined by its architecture, the specific values for its weights and biases, and by the chosen activation function. For the purposes of our statistical analysis, we create an ensemble of ANNs.
The development of an ANN is a two-step process with training and testing stages. In the training stage, the ANN adjusts its weights until an acceptable error level between desired and predicted outputs is obtained. The difference between desired and predicted outputs is measured by the error function, also called the performance function. A common choice for the error function is mean-square-error (MSE), which we adopt here.
There are multiple training algorithms based on various implementations of the back-propagation algorithm Hagan and Menhaj (1994), an efficient method for computing the gradient of error functions. These algorithms compute the net signals and outputs of each neuron in the network every time the weights are adjusted, the operation being called the forward pass operation. Next, in the backward pass operation, the errors for each neuron in the network are computed and the weights of the network are updated as a function of the errors until the stopping criterion is satisfied. In the testing stage, the trained ANN is tested over new data that were not used in the training process.
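The forward and backward passes can be made concrete with a small NumPy sketch of a network with one tanh hidden layer and one linear output neuron, trained on the MSE. The swapped-in technique should be noted: this uses plain gradient descent with a fixed learning rate, not the Levenberg-Marquardt update with Bayesian regularization used in this work. The toy data are synthetic, and all names are hypothetical.

```python
import numpy as np

def init_params(n_in, n_hidden, rng):
    """Small random weights and zero biases for an n_in-n_hidden-1 network."""
    return {
        "W1": rng.normal(scale=0.5, size=(n_hidden, n_in)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(scale=0.5, size=(1, n_hidden)),
        "b2": np.zeros(1),
    }

def forward(params, X):
    """Forward pass: tanh hidden layer, linear output; X has shape (samples, n_in)."""
    h = np.tanh(X @ params["W1"].T + params["b1"])
    y = h @ params["W2"].T + params["b2"]
    return h, y[:, 0]

def mse_and_grads(params, X, t):
    """Mean-square error and its gradients, obtained by back-propagation."""
    h, y = forward(params, X)
    err = y - t
    mse = np.mean(err ** 2)
    d_y = 2.0 * err / X.shape[0]                           # dMSE/dy for each sample
    d_h = np.outer(d_y, params["W2"][0]) * (1.0 - h ** 2)  # tanh'(u) = 1 - tanh(u)^2
    grads = {
        "W2": d_y[None, :] @ h,
        "b2": np.array([d_y.sum()]),
        "W1": d_h.T @ X,
        "b1": d_h.sum(axis=0),
    }
    return mse, grads

def train(params, X, t, lr=0.05, epochs=2000):
    """Plain gradient descent: one forward and one backward pass per epoch."""
    history = []
    for _ in range(epochs):
        mse, grads = mse_and_grads(params, X, t)
        for key in params:
            params[key] -= lr * grads[key]
        history.append(mse)
    return history

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(64, 2))   # toy inputs, already scaled
t = np.sin(2.0 * X[:, 0]) * X[:, 1]        # toy target, not NCSM data
params = init_params(2, 8, rng)
history = train(params, X, t)
print(f"MSE: first epoch {history[0]:.4f}, last epoch {history[-1]:.4f}")
```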
One of the known problems for ANN is overfitting: the error on the training set is within the acceptable limits, but when new data is presented to the network the error is large. In this case, ANN has memorized the training examples, but it has not learned to generalize to new data. This problem can be prevented using several techniques, such as early stopping and different regularization techniques Bishop (1995); Haykin (1999).
Early stopping is widely used. In this technique the available data is divided into three subsets: the training set, the validation set, and the test set. The training set is used for computing the gradient and updating the network weights and biases. The error on the validation set is monitored during the training process. When the validation error increases for a specified number of iterations, the training is stopped, and the weights and biases at the minimum of the validation error are returned. The test set error is not used during training, but it is used as a further check that the network generalizes well and to compare different ANN models.
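The early-stopping rule can be sketched as a generic loop around three caller-supplied callables. This is a hypothetical interface, not a library API. The patience value plays the role of the "specified number of iterations". The small default used here is an assumption, and MATLAB exposes a comparable setting as a training parameter.

```python
def early_stopping(step, val_error, get_state, max_iters=1000, patience=6):
    """Generic early-stopping loop (sketch).

    step():      one training update on the training set
    val_error(): current error on the validation set
    get_state(): copy of the current weights and biases
    Stops once the validation error has not improved for `patience`
    consecutive iterations and returns the state at the minimum.
    """
    best_err, best_state, since_best = float("inf"), get_state(), 0
    for _ in range(max_iters):
        step()
        err = val_error()
        if err < best_err:
            best_err, best_state, since_best = err, get_state(), 0
        else:
            since_best += 1
            if since_best >= patience:
                break
    return best_state, best_err

# Toy usage: a scripted sequence of validation errors stands in for training.
scripted = iter([1.0, 0.8, 0.7, 0.72, 0.75, 0.74, 0.9, 1.0])
counter = {"iteration": 0}

def step():
    counter["iteration"] += 1

def val_error():
    return next(scripted)

def get_state():
    return counter["iteration"]  # a real network would copy its weights here

best_state, best_err = early_stopping(step, val_error, get_state, patience=3)
print(best_state, best_err)  # 3 0.7
```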
Regularization modifies the performance function by adding a term that consists of the mean of the sum of squares of the network weights and biases. However, the problem with regularization is that it is difficult to determine the optimum value for the performance ratio parameter. It is desirable to determine the optimal regularization parameters automatically. One approach to this process is the Bayesian regularization of MacKay MacKay (1992) that we adopt here as an improvement on early stopping. The Bayesian regularization algorithm updates the weight and bias values according to Levenberg-Marquardt Hagan and Menhaj (1994); Marquardt (1963) optimization. It minimizes a linear combination of squared errors and weights and it also modifies the regularization parameters of the linear combination to generate a network that generalizes well. See MacKay (1992); Foresee and Hagan (1997) for more detailed discussions of Bayesian regularization. For further and general background on the ANN and how to prevent overfitting and improve generalization refer to Bishop (1995); Haykin (1999).
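To show how the regularization parameters can be set automatically, the sketch below applies MacKay-style re-estimation to a model that is linear in its parameters. For that model the Hessian is exact, whereas for a neural network it is only Gauss-Newton-approximated. The objective is F = βE_D + αE_W, with E_D the sum of squared errors and E_W the sum of squared parameters. The updates γ = N − 2α tr(H⁻¹), α = γ/(2E_W), and β = (n − γ)/(2E_D) follow the form described by Foresee and Hagan. This is an illustration, not MATLAB's trainbr code, and the polynomial toy data are synthetic.

```python
import numpy as np

def bayesian_ridge(phi, t, alpha=1.0, beta=1.0, iters=50):
    """MacKay-style re-estimation of alpha and beta for a linear-in-parameters
    model y = phi @ w (sketch). Minimizes F(w) = beta*E_D + alpha*E_W with
    E_D = sum of squared errors and E_W = sum of squared parameters."""
    n, n_par = phi.shape
    for _ in range(iters):
        hess = 2.0 * beta * phi.T @ phi + 2.0 * alpha * np.eye(n_par)  # Hessian of F
        w = np.linalg.solve(hess, 2.0 * beta * phi.T @ t)              # minimizer of F
        e_d = np.sum((t - phi @ w) ** 2)
        e_w = np.sum(w ** 2)
        gamma = n_par - 2.0 * alpha * np.trace(np.linalg.inv(hess))   # effective number of parameters
        alpha = gamma / (2.0 * e_w)
        beta = (n - gamma) / (2.0 * e_d)
    return w, alpha, beta, gamma

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 40)
t = np.sin(np.pi * x) + rng.normal(scale=0.1, size=x.size)  # synthetic data, noise std 0.1
phi = np.vander(x, 10, increasing=True)                     # 10 polynomial features
w, alpha, beta, gamma = bayesian_ridge(phi, t)
print(f"gamma = {gamma:.2f} of {phi.shape[1]}, noise std ~ {np.sqrt(1 / (2 * beta)):.3f}")
```

The effective number of parameters γ tells how many of the available parameters the data actually determine. With the β convention above, the implied noise standard deviation is 1/√(2β).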
III ANN Design and Filtering
The topological structure of ANNs used in this study is presented in Figure ‣ III. The designed ANNs contain one input layer with two neurons, one hidden layer with eight neurons, and one output layer with one neuron. The inputs were the basis space parameters: the HO energy, ħΩ, and the basis truncation parameter, N_max, described in Section II.1. The desired outputs were the GS energy and the GS point-proton rms radius, and separate ANNs were designed for each output. The optimum number of neurons in the hidden layer was obtained by a trial-and-error process. The activation function employed for the hidden layer was a widely used form, the hyperbolic tangent sigmoid function

$$f(x) = \frac{2}{1 + e^{-2x}} - 1 = \tanh(x)$$
It has been proven that one hidden layer with a sigmoid-like activation function is sufficient to approximate any continuous real function, given a sufficient number of neurons in the hidden layer Cybenko (1989).
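The 2-8-1 architecture can be written out explicitly. The sketch below is a NumPy illustration, not the MATLAB implementation used in this work. It checks that the hyperbolic tangent sigmoid equals tanh and counts the free parameters, which come to 16 + 8 + 8 + 1 = 33 weights and biases per network. Any input scaling is left out, and the random weights are placeholders.

```python
import numpy as np

def tansig(x):
    """Hyperbolic tangent sigmoid, 2 / (1 + exp(-2x)) - 1, identical to tanh(x)."""
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0

def ann_2_8_1(x, W1, b1, W2, b2):
    """Feed-forward 2-8-1 network: tansig hidden layer, one linear output neuron.

    x: inputs of shape (2,), e.g. (hbar*Omega, N_max) after any scaling
    W1: (8, 2), b1: (8,), W2: (1, 8), b2: (1,)
    """
    hidden = tansig(W1 @ x + b1)
    return (W2 @ hidden + b2)[0]

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 2)), rng.normal(size=8)   # placeholder weights
W2, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)
n_params = W1.size + b1.size + W2.size + b2.size       # 16 + 8 + 8 + 1 = 33
print(n_params, ann_2_8_1(np.array([0.2, -0.5]), W1, b1, W2, b2))
```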
Each time an ANN is created and initialized, it starts from different initial conditions, such as different initial weights and biases and a different division into training, validation, and test datasets. These different initial conditions can lead to very different solutions for the same problem. Moreover, for certain initial conditions the ANN may fail to produce realistic solutions. For this reason, it is a good idea to train many networks and choose those with the best performance function values to make further predictions. The performance function, the MSE in our case, measures how well an ANN predicts data, i.e., how well it generalizes to new data. The test datasets are a good measure of generalization since they are not used in training. A small value of the performance function on the test dataset indicates that an ANN with good performance was found. However, every time the training function is called, the network gets a different division of the training, validation, and test datasets. Consequently, the test set selected by the training function is a good measure of the predictive capability of its respective network, but not a common basis for comparing all the networks.
MATLAB software v9.4.0 (R2018a) with the Neural Network Toolbox was used for the implementation of this work. As mentioned in Section I, the application here is the ⁶Li nucleus. The dataset was generated with ab initio NCSM calculations using the MFDn code with the Daejeon16 NN interaction Shirokov et al. (2016) and a sequence of basis spaces up through N_max = 18. The N_max = 18 basis space corresponds to the largest matrix we diagonalized using the ab initio NCSM approach for ⁶Li. Only the “natural” parity states, which have even values of N_max, were considered in this work.
For our application here, we choose to compare the performance for all the networks by taking the original dataset and dividing it into a design set and a test set. The design (test) set consists of 16/19 (3/19) of the original dataset. The design set is further randomly divided by the train function into a training set and another test set. This training (test) set comprises 90% (10%) of the design set.
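This 16/19 versus 3/19 division, followed by the random 90%/10% split of the design set, can be sketched as below. As described in this section, the 3/19 test set is formed by choosing three ħΩ points at random for each N_max. The ħΩ grid in the usage example contains the 17 documented values (10 to 50 MeV in 2.5 MeV steps) plus two lower values that are hypothetical placeholders. The list of N_max values is also an assumption.

```python
import numpy as np

def split_dataset(hw_values, n_max_values, rng, n_test_per_nmax=3, train_frac=0.9):
    """Split (hbar*Omega, N_max) points into training, second test, and "test1" sets.

    For each N_max, n_test_per_nmax hbar*Omega points go to test1 (3/19 of the
    data when 19 hbar*Omega values are used). The remaining design set is then
    divided at random into training (train_frac) and a second test set.
    """
    test1, design = [], []
    for n_max in n_max_values:
        chosen = set(rng.choice(len(hw_values), size=n_test_per_nmax, replace=False).tolist())
        for i, hw in enumerate(hw_values):
            (test1 if i in chosen else design).append((hw, n_max))
    design = np.array(design)
    perm = rng.permutation(len(design))
    n_train = int(round(train_frac * len(design)))
    return design[perm[:n_train]], design[perm[n_train:]], np.array(test1)

rng = np.random.default_rng(42)
# 17 documented values plus two HYPOTHETICAL lower values (7.0 and 8.5 are placeholders).
hw = [7.0, 8.5] + list(np.arange(10.0, 50.0 + 1e-9, 2.5))
n_max = [2, 4, 6, 8, 10]  # assumption: even N_max values up to cutoff 10
train, test2, test1 = split_dataset(hw, n_max, rng)
print(len(hw), len(train), len(test2), len(test1))
```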
For each design set, we train 100 ANNs with the above architecture and with each ANN starting from different initial weights and biases. To ensure good generalization, each ANN is retrained ten times, during which we sequentially evolve the weights and biases. A back-propagation algorithm with Bayesian regularization with MSE performance function was used for ANN training. Bayesian regularization does not require a validation dataset.
For function approximation, Bayesian regularization provides better generalization performance than early stopping in most cases, but it takes longer to converge to the desired performance ratio. The improvement is more noticeable when the dataset is small, because Bayesian regularization does not require a validation dataset, leaving more data for training. In MATLAB, Bayesian regularization is implemented in the function trainbr. When using trainbr, it is important to train the network until it reaches convergence. In this study, the training process is stopped if: (i) it reaches the maximum number of iterations, 1000; (ii) the performance reaches an acceptable level; (iii) the estimation error is below the target; or (iv) the Levenberg-Marquardt adjustment parameter μ becomes larger than its maximum allowed value. A typical indication of convergence is that the maximum value of μ has been reached.
In order to develop confidence in our ANNs, we organize a sequence of challenges consisting of original datasets that contain successively more information from NCSM calculations. That is, we define an “original dataset” to consist of NCSM results at 19 selected values of ħΩ for all N_max values up through, for example, N_max = 10 (our first original dataset). These ħΩ values are two values below 10 MeV followed by 2.5 MeV increments covering 10 to 50 MeV. We define our second original dataset to consist of NCSM results at the same values of ħΩ but for all N_max values up through 12. We continue to define additional original datasets until we have exhausted the available NCSM results at N_max = 18.
To split each original dataset (defined by its cutoff value) into 16/19 and 3/19 subsets, we randomly choose three points for each N_max value within the cutoff value. The resulting 3/19 set is our test set used to subselect optimum networks from these 100 ANNs. Figure ‣ i shows the general procedure for selecting the ANNs used to make predictions for nuclear physics observables, where “test1” is the 3/19 test set described above. We retain only those networks that have an MSE on the 3/19 test set below 0.002 for the GS energy (a separate threshold is used for the GS point-proton rms radius). We then cycle through this entire procedure with a specific original dataset 400 times in order to obtain an expected 50 ANNs that satisfy additional screening criteria. That is, the retained networks are further filtered based on the following criteria:
The networks must have an MSE on their design set below 0.0002 for the GS energy (a separate threshold is used for the GS point-proton rms radius).
For the GS energy, the networks’ predictions should satisfy the theoretical physics upper-bound (variational) condition for all increments in N_max up to a very large N_max. That is, the ANN predictions for the GS energy should decrease uniformly with increasing N_max up to that value. All ANNs at this stage of filtering were found to satisfy this criterion, so no ANNs were rejected according to this condition.
Pick the best 50 networks, based on their performance on the design set, that satisfy a three-σ rule. Under this rule, the predictions of these 50 networks at very large N_max for the GS energy (GS point-proton rms radius) must lie within three standard deviations (three σ) of their mean. Predictions lying outside three σ are thus discarded as outliers. This is an iterative method, since a revised standard deviation could lead to the identification of additional outliers. The three-σ method was initially proposed in Gross and Stadler (2008) and then implemented by the Granada group for analysis of NN scattering data Navarro Pérez et al. (2015).
If, at this stage, we obtained fewer than 50 networks in our statistical sample, we went through the entire procedure with that specific original dataset an additional 400 times. In no case did we find it necessary to run more than 1200 cycles.
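The filtering criteria above can be sketched as three small functions. Each candidate network is represented, by assumption, as a dict with hypothetical keys. The selection step encodes one reading of the procedure: sort by design-set MSE, take the best 50, and then apply iterative three-σ clipping. If fewer than 50 survive, the caller runs more training cycles. The variational check is applied only to the GS energy, as in the text.

```python
import numpy as np

def is_variational(energies):
    """True if GS energies predicted on an increasing N_max grid never increase."""
    return bool(np.all(np.diff(np.asarray(energies, dtype=float)) <= 0.0))

def three_sigma_clip(values, n_sigma=3.0):
    """Mask of values kept by iterative n-sigma clipping: values farther than
    n_sigma standard deviations from the mean are dropped, and the mean and
    standard deviation are recomputed until nothing more is removed."""
    values = np.asarray(values, dtype=float)
    keep = np.ones(values.size, dtype=bool)
    while True:
        mean, std = values[keep].mean(), values[keep].std()
        new_keep = keep & (np.abs(values - mean) <= n_sigma * std)
        if new_keep.sum() == keep.sum():
            return keep
        keep = new_keep

def select_networks(nets, test1_max, design_max, n_keep=50, variational=True):
    """nets: dicts with hypothetical keys 'test1_mse', 'design_mse', 'curve'
    (predictions on an increasing N_max grid) and 'prediction' (value at a
    very large N_max). Fewer than n_keep survivors means more cycles are needed."""
    ok = [n for n in nets
          if n["test1_mse"] < test1_max and n["design_mse"] < design_max
          and (not variational or is_variational(n["curve"]))]
    best = sorted(ok, key=lambda n: n["design_mse"])[:n_keep]
    if not best:
        return []
    keep = three_sigma_clip([n["prediction"] for n in best])
    return [n for n, k in zip(best, keep) if k]
```

For the GS energy, `test1_max=0.002` and `design_max=0.0002` would correspond to the thresholds quoted above.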
IV Results and Discussions
This section presents results, along with their estimated uncertainties, for the GS energy and point-proton rms radius using the feed-forward ANN method. Comparison with results from other extrapolation methods is also provided. Preliminary results of this study were presented in Negoita et al. (2018). The present results extend that preliminary work in two ways. First, multiple original datasets up through a succession of cutoffs, N_max = 10, 12, 14, 16, and 18, are used to design, train, and test the networks. Second, for each original dataset, the 50 best networks are selected using the methodology described in Section III, and the distribution of their results is presented as input for the uncertainty assessment.
The 50 selected ANNs for each original dataset were used to predict the GS energy and the GS point-proton rms radius at very large values of N_max, chosen separately for each observable, for the 19 aforementioned values of ħΩ. These ANN predictions were found to be approximately independent of ħΩ. For each original dataset, the ANN estimate of the converged result, i.e., the result from an infinite matrix, was taken to be the median of the predicted results at these large N_max values over the 19 selected values of ħΩ.
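Under one reading of this step, the median over ħΩ is taken separately for each network, so that each of the 50 ANNs contributes one value to the histograms discussed next. The sketch below computes those medians together with each network's spread over ħΩ as a rough check of the stated ħΩ independence. The input layout (networks by ħΩ values) and the function name are assumptions.

```python
import numpy as np

def ensemble_estimates(preds):
    """preds: array (n_networks, n_hw) of each network's predictions at a very
    large N_max for the selected hbar*Omega values. Returns each network's
    median over hbar*Omega and its spread (max - min) over hbar*Omega."""
    preds = np.asarray(preds, dtype=float)
    return np.median(preds, axis=1), preds.max(axis=1) - preds.min(axis=1)

medians, spreads = ensemble_estimates([[1.0, 2.0, 3.0], [4.0, 4.0, 10.0]])
print(medians, spreads)  # [2. 4.] [2. 6.]
```

The median is less sensitive than the mean to a few ħΩ values at which a network extrapolates poorly.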
In order to obtain the uncertainty assessments of the results, we constructed a histogram of the results predicted by the 50 selected ANNs, with a normal (Gaussian) distribution fit, for each original dataset and for each observable. Figure ‣ IV presents these histograms along with their corresponding Gaussian fits. Each plot indicates the cutoff value of N_max in the original dataset used to design, train, and test the networks. It also gives the fitted parameters: the mean and the quantified uncertainty (σ), the latter shown in parentheses as the uncertainty in the least significant figures quoted. The mean values represent the extrapolates obtained using the feed-forward ANN method. The Gaussian fits in Figure ‣ IV show that the uncertainty generally decreases as we successively expand the original dataset by increasing its cutoff value of N_max, thereby including more information from NCSM calculations. Furthermore, there is apparent consistency with increasing cutoff, since successive extrapolates are consistent with previous extrapolates within the assigned uncertainties for each observable. An exception is the GS point-proton rms radius for one of the original datasets, for which the single Gaussian distribution exhibits an uncertainty much bigger than that for another cutoff. The radius histogram for this dataset shows a hint of multiple peaks, which could indicate multiple local minima within the limited sample of 50 ANNs.
Upon further inspection of Figure ‣ IV, one may question whether a Gaussian approximately represents all sets of histograms. Let us consider the five cases that stand out in this regard: the GS energy for two of the cutoffs and the point-proton rms radius for three of the cutoffs. These five cases exhibit gaps and outliers more prominently than the remaining cases. For these five cases, with the exception of the point-proton rms radius case discussed above, we find that at least 63% of the ANNs lie within their quoted 1-σ value and at least 86% lie within their quoted 2-σ value. Owing to the method of discarding outliers described above, all 50 ANNs fall within their quoted 3-σ values. These characteristics of the distributions of our ANN results lend support to our Gaussian fit procedure.
It is worth noting that the widths of the Gaussian fits to the histograms suggest a larger relative uncertainty for the point-proton radius extrapolation than for the GS energy extrapolation produced by the ANNs. In other words, as one proceeds down the five panels in Figure ‣ IV from the top, the uncertainty in the GS energy decreases significantly faster than the uncertainty in the point-proton radius. This reflects a well-known feature of NCSM results in an HO basis: long-range observables, such as the point-proton rms radius, are more sensitive than the GS energy to the slowly converging asymptotic tails of the nuclear wave function.
Figure ‣ IV presents the sequence of extrapolated results for the GS energy using the feed-forward ANN method in comparison with results from the “Extrapolation A5” Shin et al. (2017) and “Extrapolation B” Maris et al. (2009); Maris and Vary (2013) methods. Uncertainties are indicated as error bars and are quantified using the rules of the respective procedures. The experimental result Tilley et al. (2002) is shown by the black horizontal solid line. The “Extrapolation B” method adopts a three-parameter extrapolation function that contains a term exponential in $N_{\max}$. The “Extrapolation A5” method adopts a five-parameter extrapolation function that contains a further exponential term in addition to the single exponential in $N_{\max}$ used in the “Extrapolation B” method. Note in Figure ‣ IV the convergence pattern of the GS energy with increasing cutoff values. All extrapolation methods provide error bars that generally decrease with increasing cutoff $N_{\max}$. Also note the visible upward trend of the extrapolated energies obtained with the feed-forward ANN method, whereas the “Extrapolation A5” and “Extrapolation B” methods show a downward trend. While these smooth trends in the extrapolated results of Figure ‣ IV suggest that systematic errors are present in each method, the quoted uncertainties are large enough to nearly cover the systematic trends within each method, but not large enough to cover the differences between the methods.
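To make the shape of an “Extrapolation B”-type fit concrete, the sketch below assumes the form $E(N_{\max}) = E_\infty + a\,e^{-b N_{\max}}$ at fixed $\hbar\Omega$ and solves it exactly through three equally spaced $N_{\max}$ points, which reduces to Aitken's $\Delta^2$ formula for $E_\infty$. It is an illustration under that assumption, with a hypothetical function name; it is not the published fitting procedure of Maris et al. (2009), and it does not implement their uncertainty rules.

```python
import math


def extrapolate_exponential(n_values, energies):
    """Fit E(N) = E_inf + a*exp(-b*N) exactly through three equally spaced points.

    Returns (E_inf, a, b). Raises ValueError if the N values are not equally
    spaced or the energies do not converge geometrically (0 < ratio < 1).
    """
    (n1, n2, n3), (e1, e2, e3) = n_values, energies
    step = n2 - n1
    if step <= 0 or not math.isclose(n3 - n2, step):
        raise ValueError("N values must be increasing and equally spaced")
    d1, d2 = e2 - e1, e3 - e2
    if d1 == 0 or not 0 < d2 / d1 < 1:
        raise ValueError("energies do not converge geometrically")
    ratio = d2 / d1                      # equals exp(-b * step)
    e_inf = e3 - d2 * d2 / (d2 - d1)     # Aitken delta-squared estimate
    b = -math.log(ratio) / step
    a = (e1 - e_inf) * math.exp(b * n1)
    return e_inf, a, b
```

With synthetic energies generated from $E_\infty = -32$, $a = 5$ and $b = 0.3$ at $N_{\max} = 10, 12, 14$, the function recovers these parameters up to rounding error.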
Figure ‣ IV presents the sequence of extrapolated results for the GS point-proton rms radius using the feed-forward ANN method in comparison with results from the “Extrapolation A3” Shin et al. (2017) method. The “Extrapolation A3” method adopts a three-parameter extrapolation function different from that of the “Extrapolation A5” method used for the GS energy. For the GS point-proton rms radius, the “Extrapolation A3” method shows mainly a systematic upward trend in the extrapolations, with uncertainties that decrease only slowly with cutoff $N_{\max}$. In contrast, with the feed-forward ANN method the predicted rms radius increases until cutoff and then decreases again. The experimental result is shown by the bold black horizontal line, and its error band by the thin black lines above and below it. We quote the experimental value for the GS point-proton rms radius extracted from the measured charge radius by applying established electromagnetic corrections Tanihata et al. (2013).
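The conversion from a measured charge radius to a point-proton rms radius is commonly written as $\langle r_p^2\rangle = \langle r_{ch}^2\rangle - \langle R_p^2\rangle - (N/Z)\langle R_n^2\rangle - 3\hbar^2/(4m_p^2c^2)$, subtracting the proton and neutron finite sizes and the Darwin–Foldy term. The Python sketch below implements only this simplified form. The constants are assumed values (a proton charge radius of 0.8414 fm as in CODATA 2018 and a neutron mean-square charge radius of −0.1161 fm²), nuclear spin-orbit contributions are ignored, and Tanihata et al. (2013) may use different inputs; the function name is hypothetical.

```python
import math

HBARC_MEV_FM = 197.3269804           # hbar*c in MeV fm
PROTON_MASS_MEV = 938.27208816       # proton rest energy in MeV
PROTON_MS_CHARGE_RADIUS = 0.8414**2  # fm^2, assumed proton charge radius squared
NEUTRON_MS_CHARGE_RADIUS = -0.1161   # fm^2, assumed neutron mean-square charge radius


def point_proton_radius(r_charge, z, n):
    """Point-proton rms radius (fm) from an rms charge radius (fm), simplified."""
    # Darwin-Foldy term 3/4 * (hbar / (m_p c))^2, about 0.033 fm^2
    darwin_foldy = 0.75 * (HBARC_MEV_FM / PROTON_MASS_MEV) ** 2
    r2 = (r_charge**2 - PROTON_MS_CHARGE_RADIUS
          - (n / z) * NEUTRON_MS_CHARGE_RADIUS - darwin_foldy)
    return math.sqrt(r2)
```

For an illustrative charge radius of 2.589 fm with Z = N = 3, this gives about 2.465 fm.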
While the extrapolation results from the ANNs show reasonable consistency with each other as the cutoff of the training datasets increases, there are trends in these extrapolations suggesting that systematic uncertainties are also present in the ANN predictions. The analytical functions employed for extrapolations also show trends suggesting systematic uncertainties. Consequently, the results presented in Figures ‣ IV and ‣ IV suggest that all methods would be more consistent with each other if their current internal uncertainty estimates were at least doubled to encompass their respective, but as yet unquantified, systematic uncertainties. However, our comparisons in Figures ‣ IV and ‣ IV are not sufficient to establish a quantitative systematic uncertainty for each method. Rather, we use the present comparisons to reveal the likely presence of systematic uncertainties in the compared methods, and we suggest that a comprehensive study of results from multiple nuclei and different interactions will be needed to fully quantify the systematic uncertainties of each method.
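The statement that the methods would agree if their uncertainties were at least doubled can be made operational with a simple pairwise test. The sketch below assumes independent Gaussian uncertainties combined in quadrature and a hypothetical `inflation` factor (2.0 mirrors the doubling discussed above). The text does not specify a formal criterion, so this is one reasonable choice rather than the paper's procedure.

```python
import math


def consistent(value_a, sigma_a, value_b, sigma_b, inflation=1.0):
    """True if two results agree within their inflated, quadrature-combined uncertainty."""
    # Scaling both sigmas by the same factor scales their quadrature sum by that factor
    combined = inflation * math.hypot(sigma_a, sigma_b)
    return abs(value_a - value_b) <= combined
```

Two extrapolates that fail the test with `inflation=1.0` but pass with `inflation=2.0` are exactly the situation described in the paragraph above.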
The extrapolated results, along with their uncertainty estimates, for the GS energy and the GS point-proton rms radius, together with the variational upper bounds for the GS energy, are also quoted in Table 1. Each extrapolation uses all available results up through the cutoff values shown in the table. All the extrapolated energies lie below their respective variational upper bounds. Taking our assessed uncertainties into consideration, our current results appear to be reasonably consistent with the results of the single ANN developed in Negoita et al. (2018) using the dataset up through the cutoff . Also note that the feed-forward ANN method produces smaller uncertainty estimates than the other extrapolation methods. In addition, as seen in Figures ‣ IV and ‣ IV, the ANN predictions imply that Daejeon16 provides converged results slightly further from experiment than the other extrapolation methods indicate.
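The comparison with the variational upper bound can be sketched as follows, assuming each NCSM energy computed in a finite basis is itself an upper bound, so that the lowest computed value over all ($N_{\max}$, $\hbar\Omega$) points is the tightest available bound. The dictionary layout and function names are hypothetical.

```python
def variational_upper_bound(energies):
    """Lowest computed GS energy; energies maps (n_max, hbar_omega) -> E in MeV."""
    return min(energies.values())


def below_bound(extrapolated, energies):
    """An extrapolate to the infinite basis should not lie above the bound."""
    return extrapolated <= variational_upper_bound(energies)
```

An extrapolated GS energy that fails `below_bound` would signal a problem with the extrapolation, since enlarging the basis can only lower the variational energy.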
To illustrate convergence, the network with the lowest performance function, i.e., the lowest MSE, using the original dataset at is selected from among the 50 networks to predict the GS energy (GS point-proton rms radius) for at , 14, 16, 18 and 70 (90). Figure ‣ IV presents these ANN predictions of the GS energy and point-proton rms radius, together with the corresponding NCSM calculation results at the available succession of cutoffs: , 14, 16, and 18 for comparison, as a function of $\hbar\Omega$. The solid curves are smooth curves drawn through 100 data points of the ANN predictions, and the individual symbols represent the NCSM calculation results. The nearly converged result predicted by the best ANN and its uncertainty estimate, obtained as described above, are also shown by the shaded area at and for the GS energy and the GS point-proton rms radius, respectively. Figure ‣ IV shows good agreement between the ANN predictions and the calculated NCSM results at $N_{\max}$ = 12 – 18.
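Selecting the best network by its performance function amounts to an arg-min over the ensemble. In this sketch each trained network is represented by a hypothetical Python callable mapping an input to a prediction, and the MSE is evaluated on whatever data are passed in; which subset of the data the performance function is computed on is left to the caller as an assumption.

```python
def mse(predict, inputs, targets):
    """Mean squared error of a predictor over paired inputs and targets."""
    return sum((predict(x) - y) ** 2 for x, y in zip(inputs, targets)) / len(targets)


def best_network(networks, inputs, targets):
    """Return the candidate network with the lowest MSE."""
    return min(networks, key=lambda net: mse(net, inputs, targets))
```

In an NCSM setting the inputs would be basis-parameter pairs and the targets the computed GS energies or radii.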
Predictions of the GS energy by the best 50 ANNs converged uniformly with increasing $N_{\max}$ down towards the final result. In addition, these predictions became increasingly independent of the basis space parameters, $N_{\max}$ and $\hbar\Omega$. The ANN thus successfully simulates what is expected from many-body theory applied in a configuration interaction approach. That is, the energy variational principle requires the GS energy to be a non-increasing function of increasing matrix dimension at fixed $\hbar\Omega$ (the basis space dimension increases with increasing $N_{\max}$). That the ANN result for the GS energy is essentially a flat line at provides a good indication that the ANN is producing a valuable estimate of the converged GS energy.
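The variational requirement and the flatness criterion described above translate into two simple checks on a table of predictions. In this sketch the list passed to `is_non_increasing` holds GS energies at one fixed $\hbar\Omega$ ordered by increasing $N_{\max}$, and `spread` measures how far a set of large-$N_{\max}$ predictions across $\hbar\Omega$ departs from a flat line; both names and the optional tolerance are hypothetical.

```python
def is_non_increasing(values, tol=0.0):
    """True if each value is no larger than its predecessor (within tol)."""
    return all(later <= earlier + tol for earlier, later in zip(values, values[1:]))


def spread(values):
    """Max minus min: zero for a perfectly flat line across hbar*Omega."""
    return max(values) - min(values)
```

A small nonzero `tol` can absorb round-off or training noise when the check is applied to ANN predictions rather than exact diagonalization results.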
The GS point-proton rms radii depend on the basis size and $\hbar\Omega$ in a manner distinctly different from the GS energy in the NCSM. In particular, these radii are not monotonic with increasing $N_{\max}$ at fixed $\hbar\Omega$, and they converge more slowly with increasing basis size. However, the GS point-proton rms radius converges monotonically from below for most of the range shown. More importantly, the GS point-proton rms radius also shows the anticipated convergence to a flat line when using the ANN predictions at .
V Conclusion and Future Work
We used NCSM computational results to train feed-forward ANNs to predict properties of the nucleus, in particular the converged GS energy and the converged point-proton rms radius, along with their quantified uncertainties. The advantage of the ANN method is that, unlike other available extrapolation methods, it does not require an assumed mathematical relationship between the input and output data. The ANN architecture consisted of three layers: two neurons in the input layer, eight neurons in the hidden layer and one neuron in the output layer. Separate ANNs were designed for each output.
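For concreteness, the forward pass of a 2-8-1 network can be written in a few lines of plain Python. Several details here are assumptions rather than statements from the text: the two inputs are taken to be the basis parameters ($\hbar\Omega$, $N_{\max}$), the hidden layer uses tanh and the output is linear, and the weights are random placeholders. Training and input scaling are not shown, and all names below are hypothetical.

```python
import math
import random


def init_network(n_in=2, n_hidden=8, n_out=1, seed=0):
    """Random placeholder weights and biases for an n_in-n_hidden-n_out network."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    b2 = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return w1, b1, w2, b2


def forward(params, x):
    """tanh hidden layer followed by a linear output layer."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w2, b2)]


def count_parameters(params):
    """Total number of weights and biases."""
    w1, b1, w2, b2 = params
    return sum(len(r) for r in w1) + len(b1) + sum(len(r) for r in w2) + len(b2)
```

A 2-8-1 network has 2·8 + 8 + 8·1 + 1 = 33 adjustable parameters, a small model compared with the number of NCSM data points available for training.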
We generated theoretical data by performing ab initio NCSM calculations with the MFDn code using the Daejeon16 NN interaction and HO basis spaces up through the cutoff .
To improve the fidelity of our predictions, we used an ensemble of ANNs obtained from multiple trainings to make predictions for the quantities of interest. This involved developing a sequence of applications using multiple datasets up through a succession of cutoffs. That is, we adopted cutoffs of , and 18 at 19 selected values of $\hbar\Omega$ in the range 8 – 50 MeV to train and test the networks.
We introduced a method for quantifying uncertainties with the feed-forward ANN method by constructing a histogram with a normal (Gaussian) distribution fit to the converged results predicted by the 50 best-performing ANNs. The ANN estimate of the converged result (i.e., the result for an infinite matrix) was taken to be the median of the predicted results at over the 19 selected values of $\hbar\Omega$ for the GS energy (GS point-proton rms radius). The parameters of the fitted normal distribution were the mean, which represents the extrapolate, and the quantified uncertainty, $\sigma$.
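The two-stage estimate described above, a median over $\hbar\Omega$ for each network followed by a Gaussian fit across the ensemble, is sketched below. Networks are represented by hypothetical callables `net(hbar_omega, n_max)`, the $\hbar\Omega$ grid is supplied by the caller, and the sample mean and standard deviation stand in for the fitted Gaussian parameters.

```python
from statistics import NormalDist, median


def ensemble_extrapolate(networks, hw_values, n_max_large):
    """Median over hbar*Omega per network, then a Gaussian fit across networks."""
    per_network = [
        median(net(hw, n_max_large) for hw in hw_values) for net in networks
    ]
    dist = NormalDist.from_samples(per_network)  # needs at least two networks
    return dist.mean, dist.stdev, per_network
```

Taking the median over $\hbar\Omega$ makes each network's estimate insensitive to a few outlying $\hbar\Omega$ points, while the spread across networks supplies $\sigma$.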
The designed ANNs were sufficient to produce results for these two very different observables from the ab initio NCSM results. In our tests, the ANN predictions agreed with the available ab initio NCSM results. The GS energy and the GS point-proton rms radius showed good convergence patterns and satisfied the theoretical physics condition of independence of basis space parameters in the limit of extremely large matrices.
Comparisons of the ANN results with other extrapolation methods for estimating the infinite-matrix limit were also provided, along with their quantified uncertainties. The results for ultra-large basis spaces were in approximate agreement with each other. Table 1 summarizes our results obtained with the feed-forward ANN method introduced here, as well as those obtained with the “Extrapolation A” and “Extrapolation B” methods introduced earlier.
By these measures, ANNs are seen to be successful for predicting the results of ultra-large basis spaces, spaces too large for direct many-body calculations. It is our hope that ANNs will help reap the full benefits of HPC investments.
As future work, additional isotopes such as , , and , and then heavier nuclei, will be investigated using the ANN method, and the results will be compared with those from other extrapolation methods. Moreover, this method will be applied to other observables such as magnetic moments, quadrupole transition rates, etc.
This work was supported in part by the Department of Energy under Grants No. DE-FG02-87ER40371 and No. DESC000018223 (SciDAC-4/NUCLEI), and by Professor Glenn R. Luecke’s funding at Iowa State University. The work of A.M.S. was supported by the Russian Science Foundation under Project No. 16-12-10048. The work of I.J.S. and Y.K. was supported partly by the Rare Isotope Science Project of Institute for Basic Science funded by Ministry of Science, ICT and Future Planning and NRF of Korea (2013M7A1A1075764). Computational resources were provided by the National Energy Research Scientific Computing Center (NERSC), which is supported by the Office of Science of the U.S. DOE under Contract No. DE-AC02-05CH11231.
- Barrett et al. (2013) B. R. Barrett, P. Navrátil, and J. P. Vary, Progress in Particle and Nuclear Physics 69, 131 (2013), DOI: 10.1016/j.ppnp.2012.10.003, ISSN: 0146-6410.
- Vary et al. (2009) J. P. Vary, P. Maris, E. Ng, C. Yang, and M. Sosonkina, Journal of Physics: Conference Series 180, 012083 (2009), DOI: 10.1088/1742-6596/180/1/012083, arXiv:0907.0209 [nucl-th].
- Maris et al. (2009) P. Maris, J. P. Vary, and A. M. Shirokov, Physical Review C 79, 014308 (2009), DOI: 10.1103/PhysRevC.79.014308.
- Maris and Vary (2013) P. Maris and J. P. Vary, International Journal of Modern Physics E 22, 1330016 (2013), DOI: 10.1142/S0218301313300166, ISSN: 1793-6608.
- Shirokov et al. (2014) A. M. Shirokov, V. A. Kulikov, P. Maris, and J. P. Vary, in Nucleon-Nucleon and Three-Nucleon Interactions, edited by L. Blokhintsev and I. Strakovsky (Nova Science, Hauppauge, 2014), chap. 8, pp. 231–256.
- Shin et al. (2017) I. J. Shin, Y. Kim, P. Maris, J. P. Vary, C. Forssén, J. Rotureau, and N. Michel, Journal of Physics G: Nuclear and Particle Physics 44, 075103 (2017).
- Clark (1999) J. W. Clark, in Scientific Applications of Neural Nets, Springer Lecture Notes in Physics, edited by J. W. Clark, T. Lindenau, and M. L. Ristig (Springer-Verlag, Berlin, 1999), vol. 522, pp. 1–96.
- David et al. (1995) C. David, M. Freslier, and J. Aichelin, Physical Review C 51, 1453 (1995), DOI: 10.1103/PhysRevC.51.1453.
- Bass et al. (1996) S. A. Bass, A. Bischoff, J. A. Maruhn, H. Stöcker, and W. Greiner, Physical Review C 53, 2358 (1996), DOI: 10.1103/PhysRevC.53.2358.
- Haddad et al. (1997) F. Haddad et al., Physical Review C 55, 1371 (1997), DOI: 10.1103/PhysRevC.55.1371.
- Clark et al. (2001) J. W. Clark, E. Mavrommatis, S. Athanassopoulos, A. Dakos, and K. Gernoth, in Proceedings of the International Workshop Fission Dynamics of Atomic Clusters and Nuclei, 15–19 May, 2000, Luso, Portugal (World Scientific, Singapore, 2001), pp. 76–85, DOI: 10.1142/9789812811127_0008, arXiv:nucl-th/0109081.
- Athanassopoulos et al. (2004) S. Athanassopoulos, E. Mavrommatis, K. Gernoth, and J. Clark, Nuclear Physics A 743, 222 (2004), DOI: 10.1016/j.nuclphysa.2004.08.006, ISSN: 0375-9474.
- Athanassopoulos et al. (2005) S. Athanassopoulos, E. Mavrommatis, K. A. Gernoth, and J. W. Clark, in Proceedings for the 14th Hellenic Symposium on Nuclear Physics (2005), arXiv:nucl-th/0509075.
- Costiris et al. (2007) N. J. Costiris, E. Mavrommatis, K. A. Gernoth, and J. W. Clark, in Proceedings of the 16th Panhellenic Symposium of the Hellenic Nuclear Physics Society (2007), arXiv:nucl-th/0701096.
- Costiris et al. (2009) N. J. Costiris, E. Mavrommatis, K. A. Gernoth, and J. W. Clark, Physical Review C 80, 044332 (2009), DOI: 10.1103/PhysRevC.80.044332, arXiv:0806.2850 [nucl-th].
- Akkoyun et al. (2013a) S. Akkoyun, T. Bayram, S. O. Kara, and A. Sinan, Journal of Physics G: Nuclear and Particle Physics 40, 055106 (2013a), DOI: 10.1088/0954-3899/40/5/055106.
- Akkoyun et al. (2013b) S. Akkoyun, T. Bayram, S. O. Kara, and N. Yildiz, Physics of Particles and Nuclei Letters 10, 528 (2013b), DOI: 10.1134/S1547477113060022, ISSN: 1531-8567.
- Utama et al. (2016a) R. Utama, J. Piekarewicz, and H. B. Prosper, Physical Review C 93, 014311 (2016a), DOI: 10.1103/PhysRevC.93.014311, arXiv:1508.06263 [nucl-th].
- Utama et al. (2016b) R. Utama, W. C. Chen, and J. Piekarewicz, Journal of Physics G 43, 114002 (2016b), DOI: 10.1088/0954-3899/43/11/114002, arXiv:1608.03020 [nucl-th].
- Utama and Piekarewicz (2017) R. Utama and J. Piekarewicz, Physical Review C 96, 044308 (2017), DOI: 10.1103/PhysRevC.96.044308, arXiv:1704.06632 [nucl-th].
- Utama and Piekarewicz (2018) R. Utama and J. Piekarewicz, Physical Review C 97, 014306 (2018), DOI: 10.1103/PhysRevC.97.014306, arXiv:1709.09502 [nucl-th].
- Neufcourt et al. (2018) L. Neufcourt, Y. Cao, W. Nazarewicz, and F. Viens, Physical Review C 98, 034318 (2018), DOI: 10.1103/PhysRevC.98.034318.
- Sternberg et al. (2008) P. Sternberg et al., in Proceedings of the 2008 ACM/IEEE Conference on Supercomputing – International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2008) Nov. 15–21, 2008, Austin, TX, USA (IEEE, Piscataway, 2008), pp. 1–12, DOI: 10.1109/SC.2008.5220090, ISSN: 2167-4329, ISBN: 978-1-4244-2834-2.
- Maris et al. (2010) P. Maris, M. Sosonkina, J. P. Vary, E. Ng, and C. Yang, Procedia Computer Science 1, 97 (2010), ICCS 2010, DOI: 10.1016/j.procs.2010.04.012, ISSN: 1877-0509.
- Aktulga et al. (2014) H. M. Aktulga, C. Yang, E. G. Ng, P. Maris, and J. P. Vary, Concurrency and Computation: Practice and Experience 26, 2631 (2014), DOI: 10.1002/cpe.3129, ISSN: 1532-0634.
- Shirokov et al. (2016) A. Shirokov et al., Physics Letters B 761, 87 (2016), DOI: 10.1016/j.physletb.2016.08.006, ISSN: 0370-2693.
- Forssén et al. (2018) C. Forssén, B. D. Carlsson, H. T. Johansson, D. Sääf, A. Bansal, G. Hagen, and T. Papenbrock, Physical Review C 97, 034328 (2018), DOI: 10.1103/PhysRevC.97.034328, arXiv:1712.09951 [nucl-th].
- Negoita et al. (2018) G. A. Negoita, G. R. Luecke, J. P. Vary, P. Maris, A. M. Shirokov, I. J. Shin, Y. Kim, E. G. Ng, and C. Yang, in Proceedings of the Ninth International Conference on Computational Logics, Algebras, Programming, Tools, and Benchmarking COMPUTATION TOOLS 2018 February 18–22, 2018, Barcelona, Spain (IARIA, Wilmington, 2018), pp. 20–28, ISSN: 2308-4170, ISBN: 978-1-61208-613-2.
- Entem and Machleidt (2002) D. Entem and R. Machleidt, Physics Letters B 524, 93 (2002), DOI: 10.1016/S0370-2693(01)01363-6, ISSN: 0370-2693.
- Entem and Machleidt (2003) D. R. Entem and R. Machleidt, Physical Review C 68, 041001 (2003), DOI: 10.1103/PhysRevC.68.041001.
- Bogner et al. (2007) S. K. Bogner, R. J. Furnstahl, and R. J. Perry, Physical Review C 75, 061001 (2007), DOI: 10.1103/PhysRevC.75.061001, arXiv:nucl-th/0611045.
- Bogner et al. (2010) S. K. Bogner, R. J. Furnstahl, and A. Schwenk, Progress in Particle and Nuclear Physics 65, 94 (2010), DOI: 10.1016/j.ppnp.2010.03.001, arXiv:0912.3688 [nucl-th].
- Lurie and Shirokov (1997) Y. A. Lurie and A. M. Shirokov, Izv. Ross. Akad. Nauk, Ser. Fiz. 61, 2121 (1997), [Bull. Rus. Acad. Sci., Phys. Ser. 61, 1665 (1997)].
- Lurie and Shirokov (2008) Y. A. Lurie and A. M. Shirokov, in The J-Matrix Method: Developments and Applications, edited by A. D. Alhaidari, H. A. Yamani, E. J. Heller, and M. S. Abdelmonem (Springer Netherlands, Dordrecht, 2008), pp. 183–217, DOI: 10.1007/978-1-4020-6073-1_11, ISBN: 978-1-4020-6073-1; Ann. Phys. (NY) 312, 284 (2004).
- Shirokov et al. (2004) A. M. Shirokov, A. I. Mazur, S. A. Zaytsev, J. P. Vary, and T. A. Weber, Physical Review C 70, 044005 (2004), DOI: 10.1103/PhysRevC.70.044005.
- Negoita (2010) G. A. Negoita, Ph.D. thesis, Iowa State University (2010), URL: https://lib.dr.iastate.edu/etd/11346.
- Caprio et al. (2012) M. A. Caprio, P. Maris, and J. P. Vary, Physical Review C 86, 034312 (2012), DOI: 10.1103/PhysRevC.86.034312.
- Caprio et al. (2014) M. A. Caprio, P. Maris, and J. P. Vary, Physical Review C 90, 034305 (2014), DOI: 10.1103/PhysRevC.90.034305, arXiv:1409.0877 [nucl-th].
- Constantinou et al. (2017) C. Constantinou, M. A. Caprio, J. P. Vary, and P. Maris, Nuclear Science and Techniques 28, 179 (2017), DOI: 10.1007/s41365-017-0332-6, arXiv:1605.04976 [nucl-th].
- Parlett (1998) B. N. Parlett, The Symmetric Eigenvalue Problem (SIAM, Philadelphia, 1998), DOI: 10.1137/1.9781611971163, ISBN: 978-0-89871-402-9.
- Lipkin (1958) H. J. Lipkin, Physical Review 109, 2071 (1958), DOI: 10.1103/PhysRev.109.2071.
- Gloeckner and Lawson (1974) D. H. Gloeckner and R. D. Lawson, Physics Letters B 53, 313 (1974), DOI: 10.1016/0370-2693(74)90390-6.
- Coon et al. (2012) S. A. Coon, M. I. Avetian, M. K. G. Kruse, U. van Kolck, P. Maris, and J. P. Vary, Physical Review C 86, 054002 (2012), DOI: 10.1103/PhysRevC.86.054002.
- Furnstahl et al. (2012) R. J. Furnstahl, G. Hagen, and T. Papenbrock, Physical Review C 86, 031301 (2012), DOI: 10.1103/PhysRevC.86.031301.
- More et al. (2013) S. N. More, A. Ekström, R. J. Furnstahl, G. Hagen, and T. Papenbrock, Physical Review C 87, 044326 (2013), DOI: 10.1103/PhysRevC.87.044326.
- Wendt et al. (2015) K. A. Wendt, C. Forssén, T. Papenbrock, and D. Sääf, Physical Review C 91, 061301 (2015), DOI: 10.1103/PhysRevC.91.061301.
- Odell et al. (2016) D. Odell, T. Papenbrock, and L. Platter, Physical Review C 93, 044331 (2016), DOI: 10.1103/PhysRevC.93.044331, arXiv:1512.04851 [nucl-th].
- Acharya et al. (2017) B. Acharya, A. Ekström, D. Odell, T. Papenbrock, and L. Platter, Physical Review C 95, 031301 (2017), DOI: 10.1103/PhysRevC.95.031301, arXiv:1608.04699 [nucl-th].
- Hornik et al. (1989) K. Hornik, M. Stinchcombe, and H. White, Neural Networks 2, 359 (1989), DOI: 10.1016/0893-6080(89)90020-8, ISSN: 0893-6080.
- Bishop (1995) C. M. Bishop, Neural Networks for Pattern Recognition (Oxford University Press, Oxford, 1995), ISBN: 978-0198538646.
- Haykin (1999) S. Haykin, Neural Networks: A Comprehensive Foundation (Prentice-Hall Inc., Englewood Cliffs, NJ, 1999), ISBN: 978-0132733502.
- Hagan and Menhaj (1994) M. T. Hagan and M. B. Menhaj, IEEE Transactions on Neural Networks 5, 989 (1994), DOI: 10.1109/72.329697, ISSN: 1045-9227.
- MacKay (1992) D. J. MacKay, Neural Computation 4, 415 (1992), DOI: 10.1162/neco.1992.4.3.415, ISSN: 0899-7667.
- Marquardt (1963) D. W. Marquardt, Journal of the Society for Industrial and Applied Mathematics 11, 431 (1963), SIAM, DOI: 10.1137/0111030, ISSN: 2168-3484.
- Foresee and Hagan (1997) F. D. Foresee and M. T. Hagan, in Proceedings of the International Joint Conference on Neural Networks (IEEE, Piscataway, 1997), vol. 3, pp. 1930–1935, DOI: 10.1109/ICNN.1997.614194.
- Cybenko (1989) G. Cybenko, Mathematics of Control, Signals and Systems 2, 303 (1989), DOI: 10.1007/BF02551274, ISSN: 1435-568X.
- Gross and Stadler (2008) F. Gross and A. Stadler, Physical Review C 78, 014005 (2008), DOI: 10.1103/PhysRevC.78.014005, arXiv:0802.1552 [nucl-th].
- Navarro Pérez et al. (2015) R. Navarro Pérez, J. E. Amaro, and E. R. Arriola, Physical Review C 91, 029901 (2015), DOI: 10.1103/PhysRevC.91.029901, arXiv:1310.2536 [nucl-th].
- Tilley et al. (2002) D. Tilley, C. Cheves, J. Godwin, G. Hale, H. Hofmann, J. Kelley, C. Sheu, and H. Weller, Nuclear Physics A 708, 3 (2002), DOI: 10.1016/S0375-9474(02)00597-3, ISSN: 0375-9474.
- Tanihata et al. (2013) I. Tanihata, H. Savajols, and R. Kanungo, Progress in Particle and Nuclear Physics 68, 215 (2013), DOI: 10.1016/j.ppnp.2012.07.001, ISSN: 0146-6410.