# Building machine learning force fields for nanoclusters

###### Abstract

We assess Gaussian process (GP) regression as a technique to model interatomic forces in metal nanoclusters by analysing the performance of 2-body, 3-body and many-body kernel functions on a set of 19-atom Ni cluster structures. We find that 2-body GP kernels fail to provide faithful force estimates, despite succeeding in bulk Ni systems. However, both 3- and many-body kernels predict forces within a 0.1 eV/Å average error even for small training datasets, and achieve high accuracy even on out-of-sample, high temperature, structures. While training and testing on the same structure always provides satisfactory accuracy, cross-testing on dissimilar structures leads to higher prediction errors, posing an extrapolation problem. This can be cured using heterogeneous training on databases that contain more than one structure, which results in a good trade-off between versatility and overall accuracy. Starting from a 3-body kernel trained this way, we build an efficient non-parametric 3-body force field that allows accurate prediction of structural properties at finite temperatures, following a newly developed scheme [Glielmo et al. PRB 97, 184307 (2018)]. We use this to assess the thermal stability of Ni nanoclusters at a fractional cost of full ab initio calculations.

## I Introduction

Metallic nanoparticles have fascinating chemo-physical properties, different from those of their individual atomic constituents and their bulk counterpartsTyo and Vajda (2015); Li et al. (2012); Billas, Chatelain, and de Heer (1994); Di Paola, DâAgosta, and Baletto (2016); Cottancin et al. (2006). Because of the variety of isomers accessible at finite temperatures and the lack of translational symmetry, implying a non-trivial interplay between their geometric and electronic structure, a comprehensive understanding of metallic nanoclusters remains challenging, despite their potential use in many advanced applications Steenbergen, Schebarchov, and Gaston (2012); Ojha, Steenbergen, and Gaston (2013); Steenbergen and Gaston (2013, 2015a); Ojha, Steenbergen, and Gaston (2015); Steenbergen and Gaston (2015b); Vargas et al. (2009); Pavan, Di Paola, and Baletto (2013); Santarossa et al. (2010); Zhai and Alexandrova (2016, 2017). Density Functional Theory (DFT) is the most common framework to investigate the static and dynamical properties of nanoclusters of few tens of atoms, for which standard classical force fields cannot systematically be relied upon to provide sufficient accuracy. However, DFT-based calculations are very expensive, and probing limited timescales in first principles dynamical simulations may lead to poor sampling of the nanoclusters’ configuration space.

Machine Learning Force Fields (ML-FFs) may provide a solution to this problem, and the needed access to the dynamical properties of nanoclusters, by extending by several orders of magnitude the accessible time scale, while still describing sufficiently accurately the interactions between the cluster atoms. The ML-FFs of most widespread use in cluster science are based on artificial neural networks Behler and Parrinello (2007). In many works aiming at investigating nanoclusters’ properties the training databases were constructed from clusters of several sizes, involving structures based on different Bravais lattices and surfaces with different crystallographic orientations Artrith, Hiller, and Behler (2013); Artrith and Kolpak (2014, 2015); Ouyang, Xie, and Jiang (2015); Chiriki and Bulusu (2016); Chiriki, Jindal, and Bulusu (2017) . The ex novo production of such databases requires many expensive quantum calculations: while some redundancy is hard to avoid, the neural network architecture, by means of multiple layers and a high number of fitted parameters, is usually able to extract the necessary information and correctly predict energies and forces in specific scenarios. The resulting trained force-field, although versatile, can be however difficult to interpret because of the inherent complexity of the algorithm.

Another commonly used class of ML-FFs is based on Gaussian Process (GP) regression Bartók et al. (2010); Glielmo, Sollich, and De Vita (2017), and has recently been used to predict properties of both bulk Suzuki, Tamura, and Miyazaki (2017); Deringer and Csányi (2017) and molecular Uteva et al. (2017); Cui and Krems (2016); Miwa and Ohno (2016) systems. While GPs have been also applied to predict adsorption energies of small molecules on NiGa and RhAu nanoclusters Ulissi et al. (2017); Jinnouchi and Asahi (2017), they have never been used so far to estimate the finite-temperature structural properties of a nanoparticle. GPs are usually easier to interpret as they contain a small number of physically meaningful hyperparameters. Moreover, including symmetries such as the translational invariance and rotational covariance of forcesGlielmo, Sollich, and De Vita (2017) or choosing simplifying approximations such as selecting the -body order of interaction between atomsGlielmo, Zeni, and De Vita (2018); Bartók et al. (2013); Shapeev (2016); Deringer and Csányi (2017) in the algorithm is straightforward in the case of GPs, where these properties can be encoded the kernel function, enabling fast training and high prediction accuracy.

In this work we systematically asses the GP force estimates for a set of Ni nanocluster structures, as obtained from 2-, 3-, and many-body kernels, and for a number of training databases of different nature and size.
Consistent with the significant change of physical properties occurring during the atom-to-bulk transition, we find that a 2-body kernel generally fails to correctly predict forces in small-sized Ni nanoclusters, despite doing so for bulk Ni systems.
However, a 3-body kernel performs well, hinting to an increased significance of angular terms in the bonding.

The choice of training dataset is key to the performance of any ML-FF. The search for an optimal training dataset which may encompass all the relevant structures while avoiding redundancy is therefore of interest, to guarantee accuracy while limiting the need to produce ad-hoc ab-initio databases. Consistent with intuition, the GP accuracy decreases as the structural dissimilarity increases between the training and testing morphologies, and the best accuracy is found when using homogeneous training databases highly similar to the target testing structure. However, heterogeneous training databases provide a just slightly less good overall prediction performance, while the trained kernels display a much higher degree of versatility, predicting accurate forces also in out-of-sample tests. Furthermore, 3-body ML-FFs trained on an heterogeneous database accurately reproduce structural fingerprints such as pair distance distribution functions at finite temperatures. Using this ML-FF to derive a non parametric machine-learning mapped force field (“M-FF”) via the procedure discussed in Ref. [Glielmo, Zeni, and De Vita, 2018] makes it possible to execute tens of ns long simulations with a minimal computational effort, allowing to assess the thermal stability of Ni nanoclusters.

In the next section, we introduce the necessary GP formalism (II A), provide expressions for the kernels used throughout the work (II B) and briefly explain the concept of “mapped” force field Glielmo, Zeni, and De Vita (2018) (II C). We then describe a protocol for the validation of force predictions (III A) and discuss the performance achieved by the three kernels when tested on structures either very similar or morphologically different from the ones present in the training database for single-structure (III B) and multi-structure (III C) training. The construction of the 3-body M-FF and its validation are described (III D), while its application in MD simulations investigating the behaviour of Ni clusters in the 300-1200 K temperature range is described in section III E.

##
Ii Machine Learning

force fields

### ii.1 Gaussian process regression

A GP regression Rasmussen and Williams (2006) is a Bayesian method to learn a function from a finite database of input-output pairs. As we are interested in learning the local force acting on any given atom, we construct such training database by extracting (from a DFT simulation) a set of local configurations relative to each atom and the corresponding forces on that atom. This database is then partitioned into a training set (with entries) used for learning, and a test set (with entries) used for validation. As the space of forces is three dimensional, we here use the multi-output (vectorial) version of GP regressionAlvarez et al. (2012); Micchelli and Pontil (2005); Glielmo, Sollich, and De Vita (2017), for which the learned function takes the form

(1) |

where is a matrix-valued kernel function encoding the correlation of the forces relative to any two atomic environments.

The coefficients in Eq. (1) can be written in closed form as

(2) |

where is the covariance matrix containing block entries , is the identity matrix and is a regularization hyperparameter that formally governs the uncertainty associated with the training dataset outputs , which has been kept fixed at a value of 10. The performance of a GP is determined by the choice of the kernel function , its hyperparameters and by the choice of training set .

### ii.2 Kernel functions

The kernel function should be invariant w.r.t. translation and permutation of identical atoms, and covariant (when predicting forces) w.r.t. rotation of the configurations.
Furthermore, the function should have a spatial resolution compatible with the features of the energy landscape encoded in the training dataset; this is taken care of by optimizing the kernel hyperparameters.
A useful property of a kernel function for force or energy prediction is its , that is, the maximum number of simultaneously interacting particles it can describe (see
Ref.[Glielmo, Zeni, and De Vita, 2018] for a formal definition).
Here we will use 2-body, 3-body and many-body force kernels Deringer and Csányi (2017); Glielmo, Sollich, and De Vita (2017).

#### 2-body.

We assume that the force acting on an atom located at position is a sum of independent contributions associated with every other atom in its local environment .

Each configuration is expressed as a sum of Gaussian functions of width representing individual atoms “SOAP representation” Bartók, Kondor, and Csányi (2013)). A natural way to obtain a rotation invariant scalar energy kernel would be via integration over the group of rotations of the space-integrated overlap of each couple of configurations Bartók, Kondor, and Csányi (2013). The covariance of predicted forces, a general property not requiring the existence of an underlying invariant total energy function, can also be obtained as an integral over rotations from a suitable matrix-valued covariant integration expression, as detailed in Ref.[Glielmo, Sollich, and De Vita, 2017]. For the 2-body kernel case the integration can be carried out analytically, yielding the following matrix-valued energy-conserving kernelGlielmo, Sollich, and De Vita (2017):

(3) | |||||

where expresses the position relative to the central atom
of its neighbour.

#### 3-body.

A 3-body kernel allows to represent an angular dependence on the force components. As described in detail in Refs.[Glielmo, Zeni, and De Vita, 2018, Bartók and Csányi, 2015], a 3-body force kernel can be built as a double derivative of a 3-body energy kernel with respect to the positions of the central atoms of the configurations and :

(4) |

The 3-body energy kernel compares triplets of atoms that include the central atom across the two configurations. This kernel is intrinsically invariant under permutation, rotation and translation of the atoms in and , avoiding the need of any integration over SO(3). Each triplet is associated with a vector containing the three atomic distances i.e., . Apart from a normalisation factor, the 3-body kernel reads:

(5) |

where () is the set of cyclic permutations of three elements,
and is the single required lengthscale hyperparameter.
Summing over the permutation group is needed to guarantee permutation symmetry of the energy.
No or restriction is however imposed in the external sum, making the overall expression not limited to the case of three distinct atoms, so that the kernel also includes the 2-body case as a subset.
We note that permutation invariance in 3-body kernels could also be obtained using permutation invariant descriptors, as done in Ref.[Deringer and Csányi, 2017]

#### Many-body.

Describing arbitrarily complex interactions requires a many-body kernel function such that force prediction becomes dependent on the full local atomic environment , and is no more the result of summing independent pairwise (or triplet) contributions. A way to obtain a many body kernel (see Ref.[Glielmo, Zeni, and De Vita, 2018]) is to take the exponential of a scalar 2-body kernel :

(6) |

where

(7) |

To impose rotational force covariance we should perform an integration over the manifold of rotations (Glielmo, Sollich, and De Vita, 2017). Unfortunately the integration over SO(3) of the many-body kernel in Eq.(6) cannot be done analytically, while numeric integration is computationally heavy. We hence resort to restricting the summation to a discrete symmetry group of rotations (and reflections) whose elements are represented by orthogonal matrices :

(8) |

The optimal choice of rotation group is system-dependent: in FCC and BCC bulk environments a natural choice is to sum over all elements of the O point group. The resulting many-body kernel can be made arbitrarily accurate if given a large enough training set Glielmo, Zeni, and De Vita (2018); Csáji (2001) while the predicted force field will not be conservative (make zero work on closed loops to numerical accuracy) by construction. However, to the extent that force errors are small, the energy-conserving character of the reference Hamiltonian forces will be approximately reproduced.

### ii.3 Mapped force field (M-FF)

Once the 3-body GP has been trained, the “mapping” technique described in Ref.[Glielmo, Zeni, and De Vita, 2018] can be used to build a non-parametric 3-body force field (a M-FF) which retains the accuracy of the original GP while typically increasing its computational speed by a factor . This procedure is effectively equivalent to storing the energies predicted by the kernel (5) for a three-dimensional grid of values of the triplet of distances (, , ) occurring in a three atom system. In a more complex structure, the contributions from every triplet and atom pair are calculated by spline interpolation over the stored GP predictions of the energy values. Analytic differentiation of the spline expression produces an energy conserving force field practically indistinguishable from the predictions of the 3-body GP used to build it, while independent of the number of configurations used for GP training. This M-FF could be seen as a classical -body force field, as simple to physically interpret and fast to compute as a standard parametrised 3-body force field, whose systematic non-parametric construction requires no ad-hoc parameter choice and fine-tuning. This enables simulation times which would not be achievable by standard direct GP force prediction or by first-principles molecular dynamics based on the reference DFT Hamiltonian.

## Iii Results

We consider Ni nanoclusters of 19 atoms and gather data from Born-Oppenheimer molecular dynamics (BOMD) simulations at 300K, 600K, and 900K, from five different initial structures, represented in Figure 1: a hcp motif of three layers (3HCP), a double icosahedron (DIH), a bipyramid (BIP), a four stacked hcp layer (4HCP) structure, and a structure obtained by displacing two five-fold arranged vertexes of a double icosahedron to form two six-fold rings, also known as a “rosette” defect Apra et al. (2004) (dDIH). The first three motifs are the most energetically favourable at PBE DFT level Lu et al. (2011), the other two were found with a metadynamics sampling procedurePavan, Di Paola, and Baletto (2013); Santarossa et al. (2010) and are included here to introduce low symmetry morphologies in the database set. The BOMD simulation details are provided in the supplementary material.

### iii.1 Validation methodology

It is evident from Eq. (1) and (2) that the predictions of a GP will depend on how the training dataset is chosen. To assess the errors incurred by the three kernels while used in interpolation and (putative) extrapolation regimes, we next systematically analyse the GP predictions on test databases containing contributions from all five structures while training is carried out on different combinations of structures. This procedure allows us to introduce a novel strategy to measure the similarity between cluster geometries, based on the relative GP errors made while training and testing on two different structures. In our practical implementation, for all the GP trainings, we choose = 400 for each of our five cluster structures, yielding a total pool of 2000 testing points.

Every test is repeated five times to estimate a standard deviation for the Mean Absolute Error on Forces (MAEF), defined as the average error done by the GP on the force vector:

(9) |

where and are the reference and predicted forces acting on an atom, respectively, and indicates the Cartesian component. Our tests can be separated into three categories: self-training, cross-training and mixed-training, depending on which databases was used to build the training sets and which subset of the testing pool is used. In the self-training, the configurations used to build the and are associated with the same cluster structure. In the cross-training, the database from a structure morphology is used for training, and testing occurs on configurations associated with the other four structures. In both of the above, the database is homogeneous, that is it contains configurations relative to a single morphology. For the mixed-training we build and test all possible heterogeneous training sets that contain inputs from two, three, and all five morphologies (here as in self-training, no data point present in is allowed to be in ).

### iii.2 Self- and cross-training

We first discuss the results for self-training. Figure 10 reports a learning curve (MAEF against ) for the case of a 3HCP cluster structure. The 2-body kernel achieves its maximum accuracy for 50; similarly, the 3-body MAEF decreases with until 100 and an accuracy plateau is reached. The accuracy of the many-body kernel, on the other hand, keeps increasing with the number of training set point, as expected for an universal approximator kernel Csáji (2001); Glielmo, Zeni, and De Vita (2018). The learning curves for the other structures show the same qualitative trends (see supplementary material). Figure 3 shows the converged MAEF achieved by self-training GPs for each of the five morphologies when using 2-, 3-, and many-body kernels. The left-hand histogram reveals that modeling the atomic interactions between Ni atoms in terms of a 2-body potential yields a MAEF larger than the target accuracy of 0.15 eV/Å for all morphologies, with higher values for low symmetry ones (4HCP and dDIH). We note that this is not the case for FCC bulk Ni systems, where 2-body kernels were found to be suprisingly accurate Glielmo, Sollich, and De Vita (2017). The central and right-hand histograms in Fig. 3 reveal that both 3- and many-body kernels achieve a suitably accurate force prediction for all cluster structures if the training dataset used contains 200+ points. For comparison, the calculated MAEF of a state-of-the-art classical parametric potential for Ni Purja Pun and Mishin (2009) is 0.59 0.39 eV/Å. The relative importance of -body contributions to the forces in the five Ni cluster morphologies can be appreciated by looking at the accuracy of the -body kernels. For instance, the accuracy of 2-body and 3-body forces is very similar for the 3HCP morphology, indicating that the angular dependence of forces is not crucial in this motif, while it is more significant for the other structures. We note that comparing -body kernel predictions could be more generally used as a way to characterize the nature of the bonding occurring in complex systems such as metal nanoclusters or grain boundaries, and to reveal and quantify (dis)similarities between these systems or relative to reference bulk structures.

The MAEFs obtained for cross-testing are reported in Figure 4. In these case, the 2-body kernels MAEFs are consistently larger than 0.15 eV/Å and often twice as large. Comparing the 3- and many-body kernels reveals the accuracy achieved by the 3-body kernel strongly depends on the training database, while the many-body kernel displays more consistent errors over different structures. This could be rationalised by considering that a many-body kernel is capable of learning high- interaction terms whose contributions are effectively sampled in any morphology, even e.g., in structures in which they are quantitatively less important. These terms help maintaining a good prediction accuracy even on “partially unknown” new morphologies where higher order interactions come more into effect. The 3-body kernel is instead intrinsically restricted to 3-body interactions, and if the reference forces include (say) a 4-body interaction contribution, incorporating this by machine learning based on a lower-dimensional feature space may achieve some success only in self-training (interpolation) mode, but won’t correctly extrapolate to new structures. This suggests that the accuracy that a 3-body kernel achieves on a target structure is to a significant extent conditional to the presence of database entries representative of that structure in the training database used.

Consistently, for this kernel, the training databases comprising the low-symmetry morphologies (4HCP and dDIH) that have the most varied repertoires of atomic environments are those which work best in cross-testing. We also note how the cross-testing error incurred by training over 4HCP and testing on its higher-symmetry counterpart 3HCP is 0.18 eV/Å, while the reversed test yields a significantly larger 0.26 eV/Å MAEF. The same effect becomes even more apparent by examining the dDIH (low symmetry) and DIH (high symmetry) pair of structures, yielding 0.14 eV/Å and 0.31 eV/Å errors in the direct and reversed tests, respectively.

Further analysis of the GP predictions allows some qualitative understanding of why using different training databases leads to stronger or weaker performances over the available testing sets. We first examine the case of training on each of the five structures and testing on the dDIH structure. Figure 5 is a visual representation of the MAEFs committed at the testing stage by our three kernels after they were trained on the five single-structure training databases. As expected from Figure 4, the 2-body kernel is associated with large errors for all training sets but the dDIH one - the only one here in the self-training regime. In the case of 3-body kernel, training on a 4HCP database yields the best overall cross-training performance (while as expected, self-training on a dDIH database offers better results). This provides good accuracy on most atoms, falling short only around the rosette defect, a peculiar distortion absent in the 4HCP structure. The MAEFs incurred by training on the DIH database are also very low for the lowermost 5-fold cap of the dDIH cluster. This is expected since these local environments are very similar in these two morphologies. On the other hand, cross-training on the BIP and 3HCP datasets fails to predict forces around the icosahedral centres and the rosette defect of the dDIH. These results hold true even in the case of the many-body kernel, for which the DIH is the best performing cross-training morphology.

We next compare the pair-distance function (PDF) and the bond angular distribution function (BADF) for the five morphologies as obtained from BOMD simulations at 300 K, reported in Figure 6. These reveal structural differences between the morphologies. For instance, the PDF peak close to 3.3 Å in the 3HCP, 4HCP, and BIP morphologies is absent in the DIH and dDIH structures. Also, the BADF in the bottom panel displays a broadened distribution for 4HCP and dDIH and much more peaked ones for 3HCP, DIH, and BIP.

As a possible quantitative indicator of how well a PDF “samples” another one we calculate their KullbackLeibler (KL) divergence. For a discrete probability distribution this is calculated as:

(10) |

This (asymmetric) quantity measures the information “lost” when a function is used to approximate another function , returning a 0 value when = , and increasing positive values as grows dissimilar from . In the present context, and are the PDFs associated with the training set and testing set, respectively. Figure 7 contains a normalized scatter plot comparing the KL divergence relative to each ordered pair of PDFs taken from Figure 6 with the corresponding cross-training MAEF incurred by the 2-body kernel (see supplementary material for details on how these two quantities were normalized). The graph reveals a striking correlation between the two dissimilarity measures, generally highlighting the importance of the presence of database entries which contain pairs of atoms at distance values relevant for the testing dataset. Moreover, since the PDF can be assumed to be an unique structural descriptor in the case of monometallic nanoparticles Tribello et al. (2011); Oganov and Valle (2009); Steenbergen and Gaston (2014); Pavan, Rossi, and Baletto (2015), the correlation indicates that the 2-body cross-training error is non-trivially linked to properties that go beyond 2-body descriptors. Thus, the KL divergence between PDFs could be used as an a priori indicator of extrapolating performance of the 2-body kernel in cross-training. Similar tests for the 3-body kernel also display a positive correlation, although of smaller statistical significance (see supplementary material). The results above suggest that evaluating the KL divergence for other functions than the PDFs could provide more dissimilarity estimators. This could be used to guide the extraction of informed, minimally sized training databases from a “general” database too large to be used in full for GP regression.

### iii.3 Heterogeneous training and training set optimisation

We next examine the mixed-training strategy, to learn how a small training database which still encompasses a sufficiently varied repertoire of atomic distances and bond angle values could be built. For this, we test the accuracy of training on mixed datasets containing entries from two, three, and all five different cluster morphologies. Our results indicate that the MAEF incurred by the 3-body kernel is fairly homogeneous in the various mixed-training and testing scenarios, staying the same within a 0.03 eV/Å standard deviation, contrary to what was generally found for self-training. This suggests that introducing even a modest amount of variety in the training configuration pool is sufficient to achieve a reasonably complete training of a 3-body kernel, avoiding “local” overfitting causing extrapolation errors. Consistently, the MAEF incurred by a kernel not restricted to just a 3-dimensional feature space and thus much harder to completely train, such as the many-body kernel, is found to have a higher standard deviation ( 0.05 eV/Å) when trained and tested in the same scenarios. For all kernels, we find that mixed-training yields errors comprised between those incurred by self-training and cross-training, with MAEFs slightly higher than those produced by self-training but appreciably smaller than those associated with cross-training.

Figure 8 illustrates the performance of the 3-body kernel trained on our “best” single-structure database (4HCP, see section III B), the best choice of two- and three-structure mixed databases, the full 5-structure database (“penta”), and a (“mixed T”) training database containing 1000 DFT configurations extracted from simulations at 300, 600 and 900 K. The results are good for all training scenarios and notably, the 5-structure “penta” databases achieves the same performance of all the other database choices which needed to be identified as the best restricted ones. To investigate the performance stability of a given, simple database construction recipe, we generated 100 independent “penta” training sets, each containing 100 randomly chosen configurations for each cluster structure. Training 3-body kernels on the low temperature of Figure 8 and testing on a fixed database also comprising configurations from all five structures yields an average MAEF of 0.14 0.07 eV/Å. The small 0.004 eV/Å difference we find between the MAEF incurred by the best-performing low-temperature GP and the average MAEF suggests that the accuracy gain which might be obtained by a “best training set choice” procedure is practically negligible. To further analyse this issue we performed Metropolis Monte Carlo simulations to optimize the dataset training points, and again found no significant accuracy gain (see supplementary material). An overall better performance was instead achieved when using the “mixed T” database which included higher T configurations for all cluster morphologies.

### iii.4 Building and validating a 3-body M-FF

To perform computationally inexpensive MD simulations with near-ab initio accuracy we map the ML-FF corresponding to the two best performing 3-body kernel s (“penta” and “mixed T onto two non-parametric M-FFs, following the procedure described in Section III C.

We first assess their accuracy on configurations extracted from 600 K and 900 K BOMD simulations started from 3HCP and DIH initial structures, respectively. Both systems undergo several structural changes along their trajectory. The computed MAEFs for the ”penta” M-FF are 0.26 0.24 eV/Å for 3HCP at 600 K and 0.25 0.17 eV/Å for DIH at 900 K, indicating that the M-FF retains an acceptable accuracy level while visiting configurations not represented in the “penta” training database. (Also note that higher errors should be expected for high-temperature samples, where forces have a larger modulus).

The MAEFs associated with the “mixed T” M-FF is 0.25 0.46 eV/Å for 3HCP at 600 K, and 0.17 0.09 eV/Å for DIH at 900 K.

The M-FFs described so far are appropriate for simulating dynamical runs as they contain data gathered from BOMD DFT simulations only. Testing their accuracy on minimized 0 K structures reveals a 0.10 0.02 eV/Å MAEF for the “penta” training set and a 0.06 0.02 eV/Å MAEF for the “mixed T” training set. The inclusion of configurations collected during structural relaxation in the training set reduces the MAEF to 0.04 0.02 eV/Å, while using a many-body kernel with 4000 training points yields a further reduced 0.02 0.01 eV/Å MAEF.

To further test the accuracy of our 3-body “penta” M-FF in a dynamical setting, we run three 200 ps-long 300 K MD simulations (see supplementary material) for each of the five geometries, and compare the PDFs and BADFs with the reference ones extracted from equally long BOMD simulations of the same structures kept at the same temperatures. The excellent overlap obtained for both PDFs and BADFs, visible in Figure 6, provides some further validation of the ability of the M-FF to track the reference DFT forces. The errors averaged over our five 300 K structures incurred by the M-FFs while predicting energies are 16 10 meV/atom and 9 7 meV/atom for the “penta” and the more comprehensive“mixed T” training sets, respectively. A scatter plot of the energy error incurred at 900 K by the M-FF trained on the “mixed T” database is provided in the Supplementary Material section.

### iii.5 Assessing the thermal behaviour of Ni

To probe whether results potentially yielding novel physical insights into the nanocluster behaviour can be obtained by using a M-FF, we finally investigate the kinetic behaviour of Ni, to explore the extent of shape fluctuations occurring in the nanoclusterRossi and Baletto (2017); Pavan, Rossi, and Baletto (2015); Rossi et al. (2018); Gould et al. (2016); Rossi et al. (2017). To do so, we carry out MD runs at 50 K-spaced temperatures comprised between 300 K and 1200 K. We perform four 480 ps-long simulations for each temperature for each of the five morphologies considered in this work (380 simulations in total). The computational speed-up factor associated with carrying out a M-FF rather than a BOMD simulation in these systems is . The root mean bond fluctuation (RMBF) is a quantity describing the average bond length oscillation at a given temperature, often used to characterize phase changes in nanoscale systems Steenbergen, Schebarchov, and Gaston (2012); Ojha, Steenbergen, and Gaston (2013); Steenbergen and Gaston (2013, 2015a); Ojha, Steenbergen, and Gaston (2015); Steenbergen and Gaston (2015b), defined as:

(11) |

where is the number of atoms and the averages are taken over the simulation (excluding the first 5 ps to allow for thermal equilibration) for each atom pair. Figure 9 shows the RMBF value averaged over 20 simulations for each temperature value (four repeated simulations for each of the five structures) as a function of temperature for both the “penta” and the “mixed T” M-FFs.

The first M-FF was trained on the low temperature “penta” database, and was thus expected to be operating in a largely extrapolatory regime. This force field predicted a “slush” transition region rather than abrupt melting (cf. Fig 9, green symbols). The second M-FF was trained on the “mixed T” database including configurations at 600 K and 900 K not available in the previus “penta” database and more directly relevant to the morphologies visited by the system along the temperature ramp. MD simulations using this M-FF also predicted a “slush” transition region (cf. Fig 9, orange symbols), consistent with the earlier result.

In more detail, for temperatures below 700 K, all clusters remain solid for both M-FFs, as indicated by the small ( 0.1) RMBF. In this region, the “mixed T” M-FF alone displays a non-zero RMBF, hinting at small geometrical changes for some of the starting morphologies.

A RMBF , characteristic of nanoliquidsLi and Truhlar (2008), is observed above 900 K in simulations using the “penta” M-FF, and similarly above 975 K in simulations using the“mixed T” M-FF. In the intermediate, approximately 700-900 K range, our Ni system is predicted to be associated with a RMBF intermediate between the nanosolid and nanoliquid regimes. The agreement between the RMBFs of the two M-FFs in figure 9 implies that training on low temperature structures is sufficient to predict this qualitative feature of the dynamical behaviour of the present system. The prediction is also consistent with the high probability of geometrical rearrangements, corresponding to slush structures, that have been discussed in details for Al systemsLi and Truhlar (2008).

Several detailed geometrical interconversion processes are observed during our M-FF simulations, whose in-depth characterization is ongoing and will be provided in a future work.

## Iv Conclusions

We investigated the accuracy of a Gaussian Process-based machine learning approach to the prediction of interatomic forces in metallic nanoclusters. In particular, we assessed the ability of different -body kernels to correctly model the interactions between atoms in the Ni system, and probed how the prediction accuracy is affected by ML training carried out on single-structure and multi-structure (heterogeneous) databases. We find that, at variance with the case of bulk Ni, a 2-body kernel is not sufficiently accurate, while a 3-body kernel is able to accurately reproduce the reference DFT forces. Restricting the training databases to configurations derived from a single structure yields excellent interpolation accuracy, so that a 3-body kernel can be safely used in a “self-training” regime. However, we find that “cross-training” the kernel is not equally successful. Using training databases comprising configurations derived from different cluster structures is therefore necessary to enable extrapolation, whenever this is deemed to occur e.g., by evaluation of a KullbackLeibler asymmetric indicator.

Our results suggest that mixing configurations from as few as two different structures is sufficient to train a 3-body kernel capable of robust extrapolation and thus accurate force prediction for all the structures considered. This in turn suggests that a 3-body kernel can achieve a very good compromise between representation power (which increases with the -body kernel order), and speed of convergence with respect to database size (which instead decreases with ), provided that heterogeneous training databases are used. This result could be particularly useful for practical applications, since the force field predicted by a 3-body GP kernel can be mapped onto an exactly equivalent non parametric “M-FF” force field Glielmo, Zeni, and De Vita (2018). Such mapping yields a very significant efficiency gain, for all practical purposes aligning the M-FF speed of execution with that of any equivalent (e.g., 3-body) parametrised classical force field, while retaining the accuracy and ease of training of the underlying machine learning scheme. As a simple feasibility test, we investigated the thermal behaviour of Ni between 300 K and 1200 K. To address the cluster’s thermal behaviour, we carried out MD simulations, using a M-FF trained on 300 K structures and a second M-FF trained on 300, 600, and 900 K structures, adding up to a 400 ns total simulated time. Both M-FFs predict the occurrence of three distinct physical regimes, with similar estimates for the temperature boundaries separating them so that in particular dynamical states of the cluster, intermediate between the solid and liquid phases, are predicted to occur between 700 K and 900 K.

## Supplementary Material

See supplementary material for details on the BOMD and M-FF simulations, the learning curve graphs for DIH, BIP, 4HCP and dDIH cluster structures, more informations about the KullbackLiebler divergence method, the complete set of graphs for the MAEFS incurred by the 3- and many-body kernels while mixed-training, a description of the Monte Carlo Metropolis simulations performed, and the scatter plot of energy predictions for the “mixed T” M-FF on 1500 configurations extracted from DFT BOMD simulations at 900 K.

## Acknowledgements

CZ and AG acknowledge funding by the Engineering and Physical Sciences Research Council (EPSRC) through the Centre for Doctoral Training Cross Disciplinary Approaches to Non-Equilibrium Systems (CANES, Grant No. EP/L015854/1) and by the Office of Naval Research Global (ONRG Award No. N62909-15-1-N079). ADV acknowledges further support by the EPSRC HEmS Grant No. EP/L014742/1 and, together with AF, by the European Union’s Horizon 2020 research and innovation program (Grant No. 676580, The NOMAD Laboratory, a European Centre of Excellence). We are grateful to the UK Materials and Molecular Modelling Hub for computational resources, which is partially funded by EPSRC (EP/P020194/1). FB acknowledges financial support from the UK Engineering and Physical Sciences Research Council (EPSRC), under Grants No. EP/GO03146/1 and No. EP/J010812/1. KR acknowledges financial support from EPSRC, Grant No. ER/M506357/1, the Thomas Young Centre, the MacDiarmid Institute, and Auckland University. KR and FB acknowledge the financial support offered by the Royal Society under the project number RG120207. KR also thanks Krista Grace Steenbergen for useful discussions. The authors wish to acknowledge the contribution of NeSI high-performance computing facilities to the results of this research. NZ’s national facilities are provided by the NZ eScience Infrastructure and funded jointly by NeSI’s collaborator institutions and through the Ministry of Business, Innovation and Employment’s Research Infrastructure programme. URL https://www.nesi.org.nz.

## References

- Tyo and Vajda (2015) E. C. Tyo and S. Vajda, Nature nanotechnology 10, 577 (2015).
- Li et al. (2012) L. Li, A. H. Larsen, N. A. Romero, V. A. Morozov, C. Glinsvad, F. Abild-Pedersen, J. Greeley, K. W. Jacobsen, and J. K. Nørskov, The journal of physical chemistry letters 4, 222 (2012).
- Billas, Chatelain, and de Heer (1994) I. M. Billas, A. Chatelain, and W. A. de Heer, Science 265, 1682 (1994).
- Di Paola, DâAgosta, and Baletto (2016) C. Di Paola, R. DâAgosta, and F. Baletto, Nano letters 16, 2885 (2016).
- Cottancin et al. (2006) E. Cottancin, G. Celep, J. Lermé, M. Pellarin, J. Huntzinger, J. Vialle, and M. Broyer, Theoretical Chemistry Accounts 116, 514 (2006).
- Steenbergen, Schebarchov, and Gaston (2012) K. Steenbergen, D. Schebarchov, and N. Gaston, The Journal of Chemical Physics 137, 144307 (2012).
- Ojha, Steenbergen, and Gaston (2013) U. Ojha, K. G. Steenbergen, and N. Gaston, The Journal of chemical physics 139, 094309 (2013).
- Steenbergen and Gaston (2013) K. G. Steenbergen and N. Gaston, Physical Chemistry Chemical Physics 15, 15325 (2013).
- Steenbergen and Gaston (2015a) K. G. Steenbergen and N. Gaston, Chemistry-A European Journal 21, 2862 (2015a).
- Ojha, Steenbergen, and Gaston (2015) U. Ojha, K. G. Steenbergen, and N. Gaston, Physical Chemistry Chemical Physics 17, 3741 (2015).
- Steenbergen and Gaston (2015b) K. G. Steenbergen and N. Gaston, Nano letters 16, 21 (2015b).
- Vargas et al. (2009) A. Vargas, G. Santarossa, M. Iannuzzi, and A. Baiker, Physical Review B 80, 195421 (2009).
- Pavan, Di Paola, and Baletto (2013) L. Pavan, C. Di Paola, and F. Baletto, The European Physical Journal D 67, 24 (2013).
- Santarossa et al. (2010) G. Santarossa, A. Vargas, M. Iannuzzi, and A. Baiker, Physical Review B 81, 174205 (2010).
- Zhai and Alexandrova (2016) H. Zhai and A. N. Alexandrova, Journal of chemical theory and computation 12, 6213 (2016).
- Zhai and Alexandrova (2017) H. Zhai and A. N. Alexandrova, ACS Catalysis 7, 1905â (2017).
- Behler and Parrinello (2007) J. Behler and M. Parrinello, Physical Review Letters 98, 146401 (2007).
- Artrith, Hiller, and Behler (2013) N. Artrith, B. Hiller, and J. Behler, physica status solidi (b) 250, 1191 (2013).
- Artrith and Kolpak (2014) N. Artrith and A. M. Kolpak, Nano letters 14, 2670 (2014).
- Artrith and Kolpak (2015) N. Artrith and A. M. Kolpak, Computational Materials Science 110, 20 (2015).
- Ouyang, Xie, and Jiang (2015) R. Ouyang, Y. Xie, and D.-e. Jiang, Nanoscale 7, 14817 (2015).
- Chiriki and Bulusu (2016) S. Chiriki and S. S. Bulusu, Chemical Physics Letters 652, 130 (2016).
- Chiriki, Jindal, and Bulusu (2017) S. Chiriki, S. Jindal, and S. S. Bulusu, The Journal of Chemical Physics 146, 084314 (2017).
- Bartók et al. (2010) A. P. Bartók, M. C. Payne, R. Kondor, and G. Csányi, Physical Review Letters 104, 136403 (2010).
- Glielmo, Sollich, and De Vita (2017) A. Glielmo, P. Sollich, and A. De Vita, Physical Review B 95, 214302 (2017).
- Suzuki, Tamura, and Miyazaki (2017) T. Suzuki, R. Tamura, and T. Miyazaki, International Journal of Quantum Chemistry 117, 33 (2017).
- Deringer and Csányi (2017) V. L. Deringer and G. Csányi, Physical Review B 95, 094203 (2017).
- Uteva et al. (2017) E. Uteva, R. S. Graham, R. D. Wilkinson, and R. J. Wheatley, The Journal of Chemical Physics 147, 161706 (2017).
- Cui and Krems (2016) J. Cui and R. V. Krems, Journal of Physics B: Atomic, Molecular and Optical Physics 49, 224001 (2016).
- Miwa and Ohno (2016) K. Miwa and H. Ohno, Physical Review B 94, 184109 (2016).
- Ulissi et al. (2017) Z. W. Ulissi, M. T. Tang, J. Xiao, X. Liu, D. A. Torelli, M. Karamad, K. Cummins, C. Hahn, N. S. Lewis, T. F. Jaramillo, et al., ACS Catalysis 7, 6600 (2017).
- Jinnouchi and Asahi (2017) R. Jinnouchi and R. Asahi, The journal of physical chemistry letters 8, 4279 (2017).
- Glielmo, Zeni, and De Vita (2018) A. Glielmo, C. Zeni, and A. De Vita, Physical Review B 97, 184307 (2018).
- Bartók et al. (2013) A. P. Bartók, M. J. Gillan, F. R. Manby, and G. Csányi, Physical Review B 88, 054104 (2013).
- Shapeev (2016) A. V. Shapeev, Multiscale Modeling & Simulation 14, 1153 (2016).
- Rasmussen and Williams (2006) C. E. Rasmussen and C. K. Williams, Gaussian processes for machine learning, Vol. 1 (MIT press Cambridge, 2006).
- Alvarez et al. (2012) M. A. Alvarez, L. Rosasco, N. D. Lawrence, et al., Foundations and Trends in Machine Learning 4, 195 (2012).
- Micchelli and Pontil (2005) C. A. Micchelli and M. Pontil, in Advances in neural information processing systems (2005) pp. 921–928.
- Bartók, Kondor, and Csányi (2013) A. P. Bartók, R. Kondor, and G. Csányi, Physical Review B 87, 184115 (2013).
- Bartók and Csányi (2015) A. P. Bartók and G. Csányi, International Journal of Quantum Chemistry 115, 1051 (2015).
- Csáji (2001) B. C. Csáji, MSc Thesis, Eötvös Loránd University (ELTE), Budapest, Hungary (2001).
- Apra et al. (2004) E. Apra, F. Baletto, R. Ferrando, and A. Fortunelli, Physical review letters 93, 065502 (2004).
- Lu et al. (2011) Q. Lu, Q. Luo, L. Chen, and J. Wan, The European Physical Journal D 61, 389 (2011).
- Purja Pun and Mishin (2009) G. Purja Pun and Y. Mishin, Philosophical Magazine 89, 3245 (2009).
- Tribello et al. (2011) G. A. Tribello, J. Cuny, H. Eshet, and M. Parrinello, The Journal of chemical physics 135, 114109 (2011).
- Oganov and Valle (2009) A. R. Oganov and M. Valle, The Journal of chemical physics 130, 104504 (2009).
- Steenbergen and Gaston (2014) K. Steenbergen and N. Gaston, The Journal of chemical physics 140, 064102 (2014).
- Pavan, Rossi, and Baletto (2015) L. Pavan, K. Rossi, and F. Baletto, The Journal of chemical physics 143, 184304 (2015).
- Rossi and Baletto (2017) K. Rossi and F. Baletto, Physical Chemistry Chemical Physics 19, 11057 (2017).
- Rossi et al. (2018) K. Rossi, L. Pavan, Y. Soon, and F. Baletto, The European Physical Journal B 91, 33 (2018).
- Gould et al. (2016) A. L. Gould, K. Rossi, C. R. A. Catlow, F. Baletto, and A. J. Logsdail, The journal of physical chemistry letters 7, 4414 (2016).
- Rossi et al. (2017) K. Rossi, T. Ellaby, L. Paz-Borbón, I. Atanasov, L. Pavan, and F. Baletto, Journal of Physics: Condensed Matter 29, 145402 (2017).
- Li and Truhlar (2008) Z. H. Li and D. G. Truhlar, Journal of the American Chemical Society 130, 12698 (2008).
- Kresse and Furthmüller (2001) G. Kresse and J. Furthmüller, Vienna: Vienna University (2001).
- Perdew, Burke, and Ernzerhof (1996) J. P. Perdew, K. Burke, and M. Ernzerhof, Physical review letters 77, 3865 (1996).
- Larsen et al. (2017) A. Larsen, J. Mortensen, J. Blomqvist, I. Castelli, R. Christensen, M. Dulak, J. Friis, M. Groves, B. Hammer, C. Hargus, et al., Journal of Physics: Condensed Matter (2017).

## Supplementary Material

### DFT simulation details

All our Density Functional Theory (DFT) simulations are performed using the VASP package Kresse and Furthmüller (2001). The plane-wave energy cut-off is set to 270 eV, while the energy convergence cut-off is eV. The exchange-correlation functional is calculated using the PBE/GGA approximation Perdew, Burke, and Ernzerhof (1996) and spin-orbit coupling effects are accounted for. Each calculation is carried out using periodic boundary conditions in cubic simulation cell, where is the Ni bulk lattice parameter (3.52 Å), which is sufficient to avoid spurious interactions between images of the system. A Nose-Hoover thermostat is used to control the temperature during NVT BOMD runs at 300 K; data at 600 K and 900 K are also gathered for further comparison. The Newton’s equations of motion are integrated with a timestep of 3 fs.

### Learning curves

We report in Figure 10 the learning curves for the DIH, BIP, 4HCP, and dDIH cluster morphologies while trained on a 2-, 3-, and a many-body kernel.

### KL divergence and cross-errors normalization

In order to check for a correlation between the cross-testing 2- and 3-body MAEFs and the KL divergence calculated for PDFs and BADFs of the five configurations, we first normalize both sets of results to values comprised between 0 and 1. In the case of the 2- and 3-body kernels, this implies subtracting the self-training MAEFs from each set in order to obtain a zero for every self-training MAEF, before dividing all values by the maximum value. For the KL divergence of both BADFs and PDFs, we divide by the maximum value obtained in order to, again, rescale to the [0:1] range. Figure 11 contains the scatter plot between the 3-body normalized cross-error and the corresponding KL divergence values associated with the twenty possible ordered structure couples obtained starting from our five cluster structures. Here the function fed to the KL estimator is, for each cluster structure, a (still normalized) weighed sum of the structure’s PDF (weight = 1/3) and BADF (weight = 2/3).

### Mixed datasets MAEFs

Figures 12 and 13 illustrate the errors incurred by the 3- and many-body kernels when trained on every possible combination of two, three and five configurations. As a highlight, Figure 12 reports an example of the performance of the dDIH+3HCP training database. This is the best-performing two-structure training set for the 3-body kernel, although this heterogeneous training set does not contain the 4HCP structure, which is the “best” structure for cross-testing. This can be rationalized by analyzing the PDFs and BADFs of the two configurations (see Figure 5 of the main article), noting that bond distance and angle values not contained in the PDF and BADF of one structure are present in the other, and viceversa.

### Monte Carlo algorithm for database selection

To estimate the error reduction achievable by optimising the data point selection in the , we compare randomly built training datasets with the best performing sets we find using a Monte Carlo algorithm for the same .

In order to select the training database which yields the minimum average prediction error while also keeping the accuracy relatively homogeneous, we chose to minimise the following linear combination of the MAEF and the standard deviation of the absolute error, both taken as functions of the training set:

(12) |

The Monte Carlo optimization was structured as follows. First, we built and kept fixed a testing set which we used to test the performance of all GPs. We then initialialized a training database of randomly chosen configurations and carried out Metropolis steps swapping database configurations with new ones, randomly chosen from our complete configuration pool. This optimization process typically yields a MAEF lower by just 0.001 eV/Å than the MAEF associated to a randomly chosen training set when using data points to train the many-body kernel, a result further insensitive to increasing .

### M-FF simulation details

The M-FF simulations are carried out using the Atomic Simulation Environment (ASE) python package Larsen et al. (2017) and a Langevin thermostat to control the temperature (with gamma parameter set at 0.001), using a 3 fs time step.

### Energy prediction accuracy

Finally, we report in Figure 14 a scatter plot for the energy predictions obtained from the “mixed T” M-FF on a set of 1500 testing configurations extracted from a DFT BOMD simulation carried out at 900 K (see also main manuscript text). The mean energy error is 13 13 eV/atom.