Symmetry-Adapted Machine-Learning for Tensorial Properties of Atomistic Systems
Statistical learning methods show great promise in providing an accurate prediction of materials and molecular properties, while minimizing the need for computationally demanding electronic structure calculations. The accuracy and transferability of these models are increased significantly by encoding into the learning procedure the fundamental symmetries of rotational and permutational invariance of scalar properties. However, the prediction of tensorial properties requires that the model respects the appropriate geometric transformations, rather than invariance, when the reference frame is rotated. We introduce a formalism that can be used to perform machine-learning of tensorial properties of arbitrary rank for general molecular geometries. To demonstrate it, we derive a tensor kernel adapted to rotational symmetry, which is the natural generalization of the smooth overlap of atomic positions (SOAP) kernel commonly used for the prediction of scalar properties at the atomic scale. The performance and generality of the approach is demonstrated by learning the instantaneous electrical response of water oligomers of increasing complexity, from the isolated molecule to the condensed phase.
The last few years have seen a surge in applications of statistical learning approaches to the prediction of the properties of molecules and materials. Chemical and materials informatics approaches – in which large databases are mined to find correlations between structure and macroscopic properties – have become ubiquitous Hachmann et al. (2011); Hat et al. (2013); Saal et al. (2013); Jain et al. (2013); Calderon et al. (2015); Ward and Wolverton (2017). Furthermore, “machine-learning potentials” are increasingly used as surrogate models for demanding electronic structure calculations, and to obtain information on the stability and properties of a material as a function of the microscopic arrangement of its atoms Behler and Parrinello (2007); Bartók et al. (2010); Montavon et al. (2013); Li et al. (2015); Faber et al. (2016). For these approaches to be effective, it is crucial that the statistical learning algorithm and the mathematical representation of the atomic configurations respect the fundamental symmetries of the problem. For example, scalar properties should be invariant under rigid translations, rotations or reflections of the atomic configurations, as well as permutations of the order of identical atoms. Methods that fulfill these requirements have demonstrated very promising performance for predicting scalar quantities such as electronic ground-state energies Behler and Parrinello (2007); Bartók et al. (2013); Rupp (2015); De et al. (2016a); Ferré et al. (2016).
A complete description of molecular and condensed-phase systems, however, also requires the prediction of properties that are not scalars. The response of a material to mechanical, magnetic or electric perturbations all require response coefficients that are tensorial in nature. The electrical response momenta – the dipole moment , polarizability , first hyperpolarizability , etc. – underlie in particular the modelling of experiments such as infrared Perakis et al. (2016), Raman Zhang et al. (2016); Heaton and Madden (2008); Luber et al. (2014); Zhang et al. (2016) and second-harmonic spectroscopy Boyd (2008); Roke and Gonella (2012); Tocci et al. (2016). No less importantly, they represent a fundamental ingredient to include many-body effects in atomistic simulations of a material through the development of polarizable force-fields Morita (2002); Fanourgakis and Xantheas (2008); Cieplak et al. (2009); Lopes et al. (2009); Baker (2015); Cisneros et al. (2016).
Gaussian process regression (GPR) is a commonly used machine learning technique, which is formally equivalent to kernel-ridge regression Schölkopf et al. (1998); Williams and Rasmussen (2006), and is built upon the definition of a kernel function that encodes the similarity between two configurations and Williams and Rasmussen (2006); Cuturi (2009); Bartók and Csányi (2015). In order to guarantee that predicted properties respect the relevant physical symmetries, the kernel function must obey corresponding transformation rules. For instance, when predicting a scalar, should be invariant to rotations of the two configurations. The extension to tensorial quantities is not straightforward. As discussed recently for the case of the learning of vectorial properties such as forces Glielmo et al. (2017), the regression framework must be designed to that the predicted properties are covariant with respect to symmetry operations applied to the system. Under certain conditions, suitable strategies can be used to bypass the problem: for example, in the presence of relatively rigid molecular units (e.g. in water) it is possible to define a local reference frame, so that electrical response tensors can be learned by comparing mutually aligned molecules Bereau et al. (2015); Liang et al. (2017). However, this approach is not generally applicable to flexible or dissociable molecular systems. A learning algorithm that handles symmetries in a more general, mathematically rigorous fashion is required.
In this Letter, we introduce a GPR framework that explicitly includes the rotational symmetry of tensorial properties of arbitrary order, generalizing an earlier framework designed for the kernel ridge regression of forces Glielmo et al. (2017), and can treat molecular or condensed-phase systems of arbitrary complexity. As a practical implementation, we define a family of kernels that are based on the smooth overlap of atomic positions (SOAP) kernels of Ref. Bartók et al. (2013), which we modify to account for the covariance of the tensorial property. Within a GPR framework Bartók and Csányi (2015), the prediction of a property for a configuration can be written as a linear combination of kernel functions , that quantify the dissimilarity of the trial configuration with a set of reference inputs :
The weights can be determined by solving a linear problem , where and contains the values of the target property for the training configurations, and is a regularization parameter, which can be interpreted as the expected error of the fit, due to both any intrinsic noise in the target data and the limitations of the model representation.
Consider now the case of a tensorial property . We will label the components of the tensor using a compact notation , where indicates for example a set of Cartesian axes . Within a Bayesian interpretation, the kernel represents a measure of correlations between the value of the tensorial property associated with the configurations . In particular, we can write:
where indicates the covariance between and . In this formalism the learning algorithm is expected to simultaneously take into account all the components of . Eq. (2) represents a block of a full kernel matrix, which can be built by merging the portions associated with each pair of configurations. The complete matrix is Hermitian, so that for each block .
When a generalized symmetry operation is applied to one configuration of the system, the corresponding tensorial property transforms as . Then, given two independent symmetry operations and acting on the two configurations, it follows from Eqn. (2) that each kernel element must satisfy the following transformation rule:
This is the generalization of the covariance conditions introduced in Ref. Glielmo et al. (2017) for the special case of learning vectors. Similarly to that case, one can then verify that a kernel which satisfies Eq. (3) can be obtained starting from a scalar kernel , by averaging over the matrix that represent the symmetry operation :
The scalar kernel only needs to be be independent of the absolute reference frame, but not of the relative orientation of the two configurations, i.e. .
In the case of a Cartesian tensor of rank , a full hierarchy of Cartesian kernels can be built by combining orthogonal rotation matrices, i.e., in Eq.(4), generating a kernel with blocks of size . However, this strategy is unnecessarily complicated. The actual dimensionality of the problem can be significantly reduced by recasting the tensor into its irreducible spherical tensor (IST) representation . Each identifies an orthogonal subspace of dimension , according to algebra Weinert (1980). Depending on the rank and the symmetries of the tensor, the decomposition contains a different number of elements, which in any case correspond to diagonal blocks of size smaller than . Performing a decomposition into the IST components makes the statistical learning faster and more transparent, since each tensorial component can now be independently learned as a vector of dimension .
What is more, in the spherical basis, the covariance conditions of Eqs. (3) and (4) can be reformulated by using the fact that each spherical component of a completely symmetric tensor follows the same transformation rules as the corresponding vector spherical harmonics , if co-variant, or , if contra-variant Weinert (1980). It follows that if the kernel is required to encode rotational symmetry in three dimensions, the generalized transformation matrix of Eq. (4) will be represented by the Wigner matrix associated with the active rotations of the system’s configurations. Morrison and Parker (1987)
As a practical implementation of Eq. (4), we consider the case where is given by the overlap between Gaussian smoothed atom densities,
where is a sum over the atoms making up the environment and is a Gaussian of width centred on . The range of the kernel can be tuned by introducing a cutoff function that zeroes out the contribution from atom that lie farther than a given distance from the central atom. With this choice of , the matrix kernel associated with a given IST component is,
As shown in the Supplementary Information (SI), when an angular decomposition of the atom-centred Gaussian densities is applied Kaufmann and Baumeister (1989), this integral can be computed analytically. The case recovers the scalar SOAP kernel of Ref. Bartók et al. (2013), which has been demonstrated to be very effective for the statistical learning of scalar properties of materials and molecules Szlachta et al. (2014); De et al. (2016b, 2017); Cliffe et al. (2017); Deringer and Csányi (2017). As detailed in the SI, such a “-SOAP” hierarchy of tensorial kernels can be recast as an inner product of -size vectors , in the form:
where the contraction indexes and running respectively over the basis sets of the radial and the angular expansion of atomic densities. 111Note that, in agreement with the covariance condition of Eq. (3), the fingerprint vectors transform like the vector spherical harmonics when the configuration is rotated. Each represents a symmetry-adapted fingerprint associated with the individual configurations , generalizing the SOAP power spectrum of Ref. Bartók et al. (2013).
The advantage of this formulation, which builds on a mathematically rigorous treatment of group symmetry, is that it can be applied seamlessly to molecules as well as to systems undergoing chemical reactions or to condensed phases of matter. Discrete symmetries can also be included straightforwardly. For instance, an kernel with inversion symmetry can be computed as,
where denotes inversion and is a kernel. 222In this work, the overlap between atomic densities is raised to an even power in the definition of the scalar base kernel (Eq. (5)), which means that inversion symmetry is taken into account automatically when learning invariant properties. An overlap kernel raised to a odd power would instead be necessary when learning chiral properties Bartók et al. (2013); De et al. (2016a).
As a demonstration of the general applicability of the framework, we now show the performance of our symmetry-adapted GPR algorithm (SA-GPR) in predicting the static polarizability series of neutral and charged water oligomers, as well as the instantaneous dielectric response tensor of liquid water configurations. Details on the training sets and the kernel hyperparameters used in each case are provided in the SI. As a first example, we consider the polarizability series of flexible and arbitrarily-oriented water molecules in vacuum. The dipole moment , polarizability and first hyperpolarizability were computed with high-end quantum chemical methods for 1000 configurations. Due to the symmetry with respect to permutations of Cartesian indices – which is implied by the definition of response tensors as the derivatives of the electronic energy with an applied electric field – corresponds to an irreducible representation involving the and spherical components only, while has an IST decomposition containing and . Figure 1a shows the learning curves (i.e. the test error as a function of the number of training structures included) for all the IST components. Without explicitly providing information on the orientation of water molecules, the SA-GPR framework can easily achieve an error below 5% with only 100 training points.
A natural approach to extend the -SOAP framework to more complex molecules, and eventually to the condensed phase, involves decomposing the overall properties of the system into atom-centered components. It is straightforward to see Bartok et al. (2017) that an atom-centered decomposition is equivalent to the learning of the system’s properties using a single kernel that is built by breaking down each configuration into multiple environments, and defining the kernel of Eq. (6) as the sum of all possible local similarities between two configurations Bartok et al. (2017),
with representing the environment of the configuration . is the tensorial kernel that compares the local environment of the configuration with the local environment of the configuration.
Considering a water dimer as en example, we take the two O atoms as centers of the environments (so that ), and allow all of the surrounding atoms (H and O) to contribute to the smoothed atom density. The extension of this formalism to multiple chemical species involves a generalization of the scalar kernel (5), discussed in Ref. 14. With 500 training samples, both the isotropic and anisotropic components of of the dimer polarizability can be learned with a RMSE below 10% of the intrinsic variance (see Fig. 1b). It is worth stressing that although we use the dimer responses as learning targets, the additive kernel implies a decomposition in (environment-corrected) monomer responses. Eq. (9) allows us to write, e.g.
where is the contribution of the environment to the dimer polarizability. As shown in the SI, when the two molecules are far apart the monomer polarizabilities predicted using Eq. (10) converge to the values computed separately for the two monomers. Thus, the discrepancy observed when the molecules separation is small can be seen as the two-body correction to the dielectric response function of individual monomers.
As the next step, we consider the case of the Zundel cation HO. Being both charged and chemically active, this molecule is an example of a system that would be difficult to describe in terms of separate molecular contributions. Fig. 1c compares the learning curves for the moduli of the spherical components of , and , obtained using a spherical cutoff of 4 Å around each oxygen atom. Note that although each environment encompass the entire molecule, learning with atom-centered environments implies enforcing the covariance condition at the level of O atoms, which better captures the physics of the problem. The errors are well below 5% with 500 training samples, showing that -SOAP kernels are well suited to extend the SA-GPR method to systems which are intrinsically not separable into smaller molecular units.
In order to test the robustness and generality of our scheme, we finally consider the prediction of the dielectric response tensor of instantaneous configurations of condensed phase water. The reference data set has been collected by computing through the modern theory of polarization Resta (2010) within density functional theory, for 400 different snapshots of a 32-molecule path integral simulation Ceriotti et al. (2014) of room-temperature q-TIP4P/f water Habershon et al. (2009) (see SI for further details). Fig. 2 shows how building a -SOAP kernel with an environment cutoff of 4 Å around each oxygen atom allows us to learn directly both the isotropic and anisotropic components of with a RMSE well below 0.01 a.u. with just 200 training samples. As we discuss in the SI, training of the isotropic component is much more effective if performed on the molecular polarizability . This underscores the importance of reducing the impact of non-local effects – which appear in the definition of through the volume term – when applying a machine-learning strategy that is based on an atom-centered decomposition. Indeed, similar performance can be obtained by learning if is increased to 5 Å, so that information on the volume of the simulation is captured by the kernel.
The SA-GPR framework we introduced in this work provides a generally applicable strategy to perform kernel-based machine-learning of tensorial properties, fully incorporating their rotational symmetries. Extensions to other discrete or continuous symmetries (e.g. to cylindrical geometries, or translational invariances) is straightforward. Building on the existing SOAP kernel between atomic environments, we obtain a hierarchy of -SOAP kernels which can be used to predict the electric response tensors of systems of increasing complexity, from isolated molecules to the condensed phase. Being able to apply statistical learning to tensors opens the way to the prediction of anisotropic materials properties: elastic and magnetic response, NMR chemical shifts, etc. Machine-learning of molecular electric responses, which we used here as an example, makes it possible to improve the computation of linear and non-linear optical spectra, as well as to design more accurate polarizable forcefields for complex systems that cannot be described well in terms of rigid molecular entities. Another application with immense potential is related to the calculation of the building blocks of electronic-structure methods, such as the ground-state charge density, or the matrix elements of Hamiltonians written in an atom-centered basis. Learning the Hamiltonian would allow one to obtain “tight-binding-like” schemes free of an explicit parameterization, which can match the accuracy of higher levels of electronic-structure theory when computing properties such as electronic bands. Statistical learning methods are finding applications across all fields of science and technology. This Letter shows how to realize the full potential of these methods by making them consistent with the fundamental physical symmetries of the problem at hand.
The authors thank Francesco Paesani for providing the water dimer structures and Giulio Imbalzano for critical reading of the manuscript. M.C was supported by the European Research Council under the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 677013-HBMAP). D.M.W. acknowledges funding from the Swiss National Science Foundation (Project ID 200021_163210).
- Hachmann et al. (2011) J. Hachmann, R. Olivares-Amaya, S. Atahan-Evrenk, C. Amador-Bedolla, R. S. Sánchez-Carrera, A. Gold-Parker, L. Vogt, A. M. Brockway, and A. Aspuru-Guzik, J. Phys. Chem. Lett. 2, 2241 (2011).
- Hat et al. (2013) G. L. W. Hat, S. Curtarolo, T. B. Massalski, and O. Levy, Phys. Rev. X 3, 041035 (2013).
- Saal et al. (2013) J. E. Saal, S. Kirklin, M. Aykol, B. Meredig, and C. Wolverton, JOM 65, 1501 (2013).
- Jain et al. (2013) A. Jain, S. P. Ong, G. Hautier, W. Chen, W. D. Richard, S. Dacek, S. Cholia, D. Gunter, D. Skinner, G. Ceder, and K. A. Persson, APL Mater. 1, 011002 (2013).
- Calderon et al. (2015) C. E. Calderon, J. J. Plata, C. Toher, C. Oses, O. Levy, M. Fornari, A. Natan, M. J. Mehl, G. Hart, M. B. Nardelli, and S. Curtaralo, Comp. Mater. Sci 108, 233 (2015).
- Ward and Wolverton (2017) L. Ward and C. Wolverton, Curr. Opin. Solid State Mater. Sci. 21, 167 (2017).
- Behler and Parrinello (2007) J. Behler and M. Parrinello, Phys. Rev. Lett. 98, 146401 (2007).
- Bartók et al. (2010) A. P. Bartók, M. C. Payne, R. Kondor, and G. Csányi, Phys. Rev. Lett. 104, 136403 (2010).
- Montavon et al. (2013) G. Montavon, M. Rupp, V. Gobre, A. Vazquez-Mayagoitia, K. Hansen, A. Tkatchenko, K. R. Müller, and O. Anatole Von Lilienfeld, New Journal of Physics 15, 095003 (2013).
- Li et al. (2015) Z. Li, J. R. Kermode, and A. De Vita, Phys. Rev. Lett. 114, 096405 (2015).
- Faber et al. (2016) F. A. Faber, A. Lindmaa, O. A. von Lilienfeld, and R. Armiento, Phys. Rev. Lett. 117, 135502 (2016).
- Bartók et al. (2013) A. P. Bartók, R. Kondor, and G. Csányi, Phys. Rev. B 87, 184115 (2013).
- Rupp (2015) M. Rupp, International Journal of Quantum Chemistry 115, 1058 (2015).
- De et al. (2016a) S. De, A. P. Bartók, G. Csányi, and M. Ceriotti, Phys. Chem. Chem. Phys. 18, 13754 (2016a).
- Ferré et al. (2016) G. Ferré, T. Haut, and K. Barros, (2016), arXiv:1612.00193 .
- Perakis et al. (2016) F. Perakis, L. D. Marco, A. Shalit, F. Tang, Z. R. Kann, T. D. KÃ¼hne, R. Torre, M. Bonn, and Y. Nagata, Chemical Reviews 116, 7590 (2016).
- Zhang et al. (2016) X. Zhang, Q.-H. Tan, J.-B. Wu, W. Shi, and P.-H. Tan, Nanoscale 8, 6435 (2016).
- Heaton and Madden (2008) R. J. Heaton and P. A. Madden, Mol. Phys. 106, 1703 (2008).
- Luber et al. (2014) S. Luber, M. Iannuzzi, and J. Hutter, J. Chem. Phys. 141, 094503 (2014).
- Boyd (2008) R. W. Boyd, Nonlinear Optics (Academic Press, 2008).
- Roke and Gonella (2012) S. Roke and G. Gonella, Annu. Rev. Phys. Chem. 63, 353 (2012).
- Tocci et al. (2016) G. Tocci, C. Liang, D. M. Wilkins, S. Roke, and M. Ceriotti, J. Phys. Chem. Letters 7, 4311 (2016).
- Morita (2002) A. Morita, Journal of Computational Chemistry 23, 1466 (2002).
- Fanourgakis and Xantheas (2008) G. S. Fanourgakis and S. S. Xantheas, The Journal of Chemical Physics 128, 074506 (2008).
- Cieplak et al. (2009) P. Cieplak, F.-Y. Dupradeau, Y. Duan, and J. Wang, Journal of Physics: Condensed Matter 21, 333102 (2009).
- Lopes et al. (2009) P. E. M. Lopes, B. Roux, and A. D. MacKerell, Theoretical Chemistry Accounts 124, 11 (2009).
- Baker (2015) C. M. Baker, Wiley Interdisciplinary Reviews: Computational Molecular Science 5, 241 (2015).
- Cisneros et al. (2016) G. A. Cisneros, K. T. Wikfeldt, L. OjamÃ¤e, J. Lu, Y. Xu, H. Torabifard, A. P. BartÃ³k, G. CsÃ¡nyi, V. Molinero, and F. Paesani, Chemical Reviews 116, 7501 (2016).
- Schölkopf et al. (1998) B. Schölkopf, A. Smola, and K.-R. Müller, Neural Computation 10, 1299 (1998).
- Williams and Rasmussen (2006) C. K. I. Williams and C. E. Rasmussen, Gaussian Processes for Machine Learning (MIT Press, 2006).
- Cuturi (2009) M. Cuturi, (2009), arXiv:0911.5367v2 .
- Bartók and Csányi (2015) A. P. Bartók and G. Csányi, International Journal of Quantum Chemistry 115, 1051 (2015).
- Glielmo et al. (2017) A. Glielmo, P. Sollich, and A. De Vita, Phys. Rev. B 95, 214302 (2017).
- Bereau et al. (2015) T. Bereau, D. Andrienko, and O. A. von Lilienfeld, J. Chem. Theory Comput. 11, 3225 (2015).
- Liang et al. (2017) C. Liang, G. Tocci, D. M. Wilkins, A. Grisafi, S. Roke, and M. Ceriotti, Phys. Rev. B 96, 041407 (2017).
- Weinert (1980) U. Weinert, Archive for Rational Mechanics and Analysis 74, 165 (1980).
- Morrison and Parker (1987) A. M. Morrison and A. G. Parker, Australian Journal of Physics 40, 465 (1987).
- Kaufmann and Baumeister (1989) K. Kaufmann and W. Baumeister, Journal of Physics B: Atomic, Molecular and Optical Physics 22, 1 (1989).
- Szlachta et al. (2014) W. J. Szlachta, A. P. Bartók, and G. Csányi, Phys. Rev. B 90, 104108 (2014).
- De et al. (2016b) S. De, A. P. Bartok, G. Csanyi, and M. Ceriotti, Phys. Chem. Chem. Phys. 18, 13754 (2016b).
- De et al. (2017) S. De, F. Musil, T. Ingram, C. Baldauf, and M. Ceriotti, Journal of Cheminformatics 9, 6 (2017).
- Cliffe et al. (2017) M. J. Cliffe, A. P. Bartók, R. N. Kerber, C. P. Grey, G. Csányi, and A. L. Goodwin, Phys. Rev. B 95, 224108 (2017).
- Deringer and Csányi (2017) V. L. Deringer and G. Csányi, Phys. Rev. B 95, 094203 (2017).
- (44) Note that, in agreement with the covariance condition of Eq. (3\@@italiccorr), the fingerprint vectors transform like the vector spherical harmonics when the configuration is rotated.
- (45) In this work, the overlap between atomic densities is raised to an even power in the definition of the scalar base kernel (Eq. (5\@@italiccorr)), which means that inversion symmetry is taken into account automatically when learning invariant properties. An overlap kernel raised to a odd power would instead be necessary when learning chiral properties Bartók et al. (2013); De et al. (2016a).
- Bartok et al. (2017) A. P. Bartok, S. De, C. Poelking, N. Bernstein, J. Kermode, G. Csányi, and M. Ceriotti, (2017), arXiv:1706.00179 .
- Resta (2010) R. Resta, Journal of Physics: Condensed Matter 22, 123201 (2010).
- Ceriotti et al. (2014) M. Ceriotti, J. More, and D. E. Manolopoulos, Comp. Phys. Comm. 185, 1019 (2014).
- Habershon et al. (2009) S. Habershon, T. E. Markland, and D. E. Manolopoulos, J. Chem. Phys. 131, 24501 (2009).