# DeePCG: constructing coarse-grained models via deep neural networks

###### Abstract

We introduce a general framework for constructing coarse-grained potential models without ad hoc approximations such as limiting the potential to two- and/or three-body contributions. The scheme, called Deep Coarse-Grained Potential (abbreviated DeePCG), exploits a carefully crafted neural network to construct a many-body coarse-grained potential. The network is trained with full atomistic data in a way that preserves the natural symmetries of the system. The resulting model is very accurate and can be used to sample the configurations of the coarse-grained variables in a much faster way than with the original atomistic model. As an application we consider liquid water and use the oxygen coordinates as the coarse-grained variables, starting from a full atomistic simulation of this system at the ab-initio molecular dynamics level. We found that the two-body, three-body and higher order oxygen correlation functions produced by the coarse-grained and full atomistic models agree very well with each other, illustrating the effectiveness of the DeePCG model on a rather challenging task.

## I Introduction

In molecular dynamics (MD), we are often faced with two types of coarse-graining tasks. In a first set of applications we are interested in evaluating the Landau free energy, which is a function of a small subset of coarse-grained (CG) variables. In this case the CG variables are either scalar or low dimensional vector variables. In a second set of applications we are interested in sampling with molecular dynamics (MD) or with Monte Carlo the configurations of an extensive set of CG variables. In this case the dimensionality of the CG space is proportional to the size of the system but is reduced relative to the full space of atomistic coordinates. The first type of CG variables is typically adopted to study problems like phase transitions, where the objective is to perform detailed analyses of the Landau free energy surface by finding the metastable states, the free energy barriers between these states, the transition pathways, etc. Take the melting of a solid as an example, the Steinhardt order parameters steinhardt1983bond () have been used as CG variables to differentiate solid (crystal) and liquid phases. The second type of CG variables is typically used to accelerate configurational sampling relative to full atomistic simulations. For example, one may coarse-grain a polymer by replacing the monomers with point-like particles, or beads, connected by springs.

For a good description of the Landau free energy surface one needs to find good order parameters acting as CG variables and address the issues associated with crossing high energy barriers. Typically these approaches are limited to a few CG variables, but recent work demonstrated that machine learning methods allow us to describe the functional dependence of the Landau free energy surface in terms of several CG variables stecher2014free (); mones2016exploration (); galvelis2017neural (); schneider2017stochastic (); zhang2017reinforced (). When considering extensive CG variables, the difficulty is often associated with finding an accurate potential energy function in the space of the CG variables. Finding such potential usually requires substantial physical/chemical intuition rudd1998coarse (); nielsen2004coarse (); shinoda2008coarse (); noid2008multiscale (); noid2008multiscale2 (); larini2010multiscale (); das2012multiscale (); lyubartsev1995calculation (); reith2003deriving (); shell2008relative (); molinero2009water (). In principle, machine learning methods can address this problem more accurately and in an automated way behler2007generalized (); bartok2010gaussian (); schutt2017quantum (); chmiela2017machine (); han2017deep (); zhang2017deep (), but most machine learning approaches so far have focused on the representation of the potential energy surface in the space of the atomistic degrees of freedom rather than in the space of the CG variables. For example, the Deep Potential method han2017deep (), a model based on deep neural networks, has made it possible to parametrize an atomistic potential energy function derived from quantum mechanics without ad hoc approximations. A subsequent development of this approach, called Deep Potential Molecular Dynamics (DeePMD) zhang2017deep (), has allowed us to perform MD simulations of comparable quality to ab initio molecular dynamics (AIMD) at the cost of classical empirical force fields.

The free energy surface, rather than the potential energy surface, is the key physical quantity that we need to represent when dealing with CG variables. In this work, we introduce the Deep Coarse-Grained Potential (DeePCG) scheme, an approach that generalizes the Deep Potential and DeePMD methods to representations of the free energy surface in the space of the CG variables, a quantity that will be called the CG potential in the following. As in the case of the Deep Potential and DeePMD methods, no ad hoc approximations are required, in addition to the network model itself, to represent the CG potential. The approach is very accurate as demonstrated by the almost perfect agreement of the many-body correlations extracted from CG simulations with the corresponding correlations extracted from the original atomistic model. In the present work, we use liquid water as an example to illustrate the approach. We choose AIMD as the underlying atomistic model, and replace the individual water molecules with point-like particles located at the oxygen sites in the CG model. The excellent agreement of the two-, three-, and higher order correlation functions between CG and atomistic models shows the promise of the DeePCG approach.

## Ii Methodology

### ii.1 Basic Theory

We consider a -dimensional system with atoms in the constant-volume canonical () ensemble. The coordinates of the atoms, in the laboratory frame, are , and the corresponding momenta are . The Hamiltonian is

(1) |

where is the mass matrix, the subscript denotes matrix transpose, and is the potential energy function. The ensemble is characterized by the Boltzmann distribution

(2) |

with partition function . The configurational part of the distribution is defined by

(3) |

where .

Let us perform a change of variable in the coordinate space, . are generalized coordinates. We define the Jacobian matrix, , as

(4) |

and define the Gram matrix as

(5) |

The momentum conjugate to , denoted by , is given by . Then the Hamiltonian and the canonical distribution, in the phase space, become

(6) |

and

(7) |

respectively. Note that the Gram matrix , which is a mass-like matrix for the variables, is no longer necessarily diagonal and introduces couplings between the generalized coordinates and momenta.

Now, let us introduce the coarse-grain variables: , the leading components of . can be finite and independent of the system size or it can be extensive with the system size, as in the two cases discussed in the introduction. In the first case the variables are the so-called order parameters of the system, while in the second case the CG system replaces molecular objects of the atomistic system with smaller sub-objects. The latter can be point-particles, as in the example that we will discuss in this paper, but could also be more complex objects like rods, ellipsoids, etc.

The configurational distribution of the CG system is the projection of the configurational distribution of the microscopic (atomistic) system to the space of the CG variables:

(8) |

The probability distribution in Eq. (8) allows us to define the CG potential and the forces acting on the CG variables as:

(9) |

and

(10) |

respectively. Eq. (9) tells us that an accurate CG potential should reproduce accurately the full configurational distribution of the CG variables in the atomistic model. Typically, however, the accuracy of the schemes that have been proposed in the literature to construct CG potentials has been tested using only two- and three-body correlation functions noid2008multiscale (); noid2008multiscale2 (); larini2010multiscale (); das2012multiscale ().

The form of is uniquely specified by the underlying atomistic model and the definition of the CG degrees of freedom. Thus, our goal is to find: (1) a closed representation for , which we call the CG potential representation, and (2) a proper way of doing the optimization, or training. We notice that, even when we know , we do not have a closed deterministic form for the equations of motion of the CG variables, as a consequence of the couplings in the full set of and in the Hamiltonian in Eq. (6), and due to the missing degrees of freedom of the CG representation. In the literature, this problem was addressed in Refs. hijon2010mori (); lu2014exact () and many others. Further assumptions, like a time-scale separation between the CG variables and the remaining degrees of freedom, are usually required to recover dynamical information of the atomistic model. In the following we shall focus on the accurate construction of the CG potential. We will leave to future studies the investigation of CG dynamics. For simplicity in the following we will use , instead of , to denote the CG variables.

### ii.2 CG potential representation

We design a neural network representation for the CG potential . Here are the parameters to be optimized by the training process. should be constructed using in input only the generalized coordinates , without any human intervention in the optimization process. The constructed in this way should preserve the symmetry properties of . In this manuscript we limit ourselves to considering CG objects that behave as point particles and have only positional dependence. In this case, the variables are the coordinates of the CG particles. More general choices of the CG objects have been suggested in the literature voth2008coarse (); underhill2004coarse (); goodchild2007structural (); bhargava2009formation () when dealing with, e.g., polymers, biological molecules, or colloidal particles. In these cases it may be useful to consider rods, ellipsoids, particles connected by springs, etc., as the CG objects. In principle, all these cases could be treated with the present formalism. In the setup adopted here of point-like CG objects, the CG potential is extensive, intrinsically many-body, and should preserve the translational and rotational invariance, as well as the permutational symmetry of the CG objects.

All the properties of the CG potential described above are preserved by the Deep Potential model han2017deep (). To illustrate how it works, we use the example of liquid water. We write the CG potential as a sum of the local contributions of the CG particles, i.e., . , the potential contribution of the CG particle , is constructed in two steps. First, the coordinates of the CG particle and its neighbors within a cut-off radius are transformed into the descriptors of the local environment of the CG particle . We call this procedure local frame transformation and refer to Fig. 1 for more details. In the following we use the symbol to denote the entire set of descriptors for atom . Next, as illustrated in Fig. 2, the descriptors are given in input to a fully connected feedforward neural network to compute the potential. The mathematical formulation of the network structure is also presented in Fig. 2, where the operation of each layer of the network corresponds to a linear mapping of the output from the previous layer combined with a nonlinear mapping. The translational and rotational symmetries are preserved by the local frame transformation. The permutational symmetry is preserved because: (a) for each CG particle , its descriptors are sorted in ascending order according to the inverse distances between particles and ; (b) the subnetworks associated with the same type of particles share the same parameters ; (c) is an additive relationship. More details on the Deep Potential method can be found in Refs. han2017deep (); zhang2017deep (). Due to the adoption of a finite cut-off radius, the simulation cost with the DeePCG model scales linearly with the system size.

### ii.3 CG potential optimization

The construction of the CG potential , introduced in Sect. B, has many similarities with the construction of the potential energy , using the DeePMD method. There is, however, a very important difference in the two cases. In the DeePMD case the potential energy is directly available from the underlying AIMD simulations. In the DeePCG case, the CG potential is a free energy and is not directly available. A straightforward approach would consist in fitting accurate mean forces from atomistic simulations. There have been many efforts in this direction ciccotti2005blue (); maragliano2006temperature (); abrams2008efficient (). Of particular interest is a simple formula proposed by Ciccotti et al. ciccotti2005blue (), in which a set of -dimensional vectors that satisfy

(11) |

is introduced. Then the mean force on , namely the negative gradient of with respect to the position of the -th CG particle, can be expressed as

(12) |

with an instantaneous force estimator

(13) |

Here denotes conditional expectation over the equilibrium distribution of the system restricted to the hypersurface . To train the DeePCG model one needs to minimize the so-called loss function with respect to the model parameters . In this process, the most natural loss function is

(14) |

where is the number of configurations in the dataset and the mean force is estimated with Eq. (12). The different configurations in the dataset can be extracted from unconstrained MD or Monte Carlo (MC) simulations on the microscopic atomistic model. These simulations need not to be at the same thermodynamic conditions. For instance, different temperatures could be used to better sample the set of representative CG configurations . In practice, this straightforward approach is not convenient when the conditional expectation values in Eq. (12) require computationally expensive constrained/restrained simulations. In these situations we find more convenient to approximate the ensemble average with the average () over the configurations . The latter average does not require constrained/restrained simulations. Then the mean force in the loss function (14) can be replaced by the instantaneous force . In other words, this corresponds to using an instantaneous version of the loss function

(15) |

With a sufficiently representative dataset, we expect that the ensemble average of the difference between predicted and instantaneous forces should be approximated quite well by , i.e.:

(16) | ||||

This amounts to an ergodicity requirement for the atomistic system and is always valid if the system samples an equilibrium thermodynamic state.

The instantaneous force can be viewed as the mean force plus a random error , which depends on the microscopic configuration , i.e.,

(17) |

By using Eq. (12), the average in the constrained ensemble vanishes, so the average also vanishes. By inserting (17) into (16), the instantaneous loss function (16) becomes

(18) |

with

(19) |

Since the second term on the right hand side of Eq. (18) is independent of , and have the same minimizer.

In practice, the stochastic gradient descent method is found to be very efficient to optimize Eq. (15), which is a highly non-convex function corresponding to a rugged landscape in the parameter space due to the nonlinearity of the neural network interpolation. This ruggedness does not seem to constitute an essential difficulty since the different local minima found with the stochastic gradient descent method approximate equally well the physics associated to the target function. We will discuss this issue in more detail later. Within our approach, the stochastic gradients , applied to update the parameters at each step, are provided by the average over a small batch , a subset of the whole dataset:

(20) |

where denotes the batch size.

## Iii Coarse-graining of liquid water

To show how we construct the CG potential for an extensive CG system, we use the coarse graining of a liquid water model from an ab initio density functional theory (DFT) based simulation into effective “water particles” as an example. Because of its importance as a solvent in chemical and biological systems and its unique properties, the study of water has been of wide interest. The DFT potential energy surface is intrinsically many-body. Developing an accurate CG model that represents a water molecule by a single particle is an ever-evolving and ongoing quest noid2008multiscale (); noid2008multiscale2 (); larini2010multiscale (); das2012multiscale (); lyubartsev1995calculation (); reith2003deriving (); shell2008relative (); molinero2009water ().

Constructing effective interactions to achieve this goal has usually required a large amount of human effort combined with substantial physical/chemical intuition. For example, in the mW monatomic potential molinero2009water (), which has been successfully used to study crystallization of water moore2011structural (), a specially designed Lennard-Jones-like form is used for two-body interactions while three-body interactions are adapted from the Stillinger-Weber potential stillinger1985SW ().

Some methods introduce systematic procedures to optimize the parameters. Of particular interests are the iterative Boltzmann inversion method reith2003deriving () and the multi-scale coarse-graining method noid2008multiscale (). The iterative Boltzmann inversion method works by iteratively optimizing the CG interactions until the radial distribution functions of the CG system match those of the target atomistic simulation. By construction, it provides accurate two-body correlations but lacks a solid foundation for higher order correlations wang2009comparative (). On the other hand, the multi-scale coarse-graining method works by matching atomistic forces with a variational procedure, but the corresponding two- and three-body distribution functions still show non-negligible deviations from those of the target microscopic system larini2010multiscale (); das2012multiscale ().

The DFT dataset in our example comes from Ref. distasio2014individual (). The electronic structure of the water system is modeled by DFT with the PBE0 exchange-correlation functional Carlo1999PBE0 ()
and includes the long-range dispersion interactions self-consistently using the Tkatchenko-Scheffler model TS2009TS ().
The corresponding AIMD simulation car1985unified () adopts periodic boundary conditions
and deuterons replace protons for a larger integration time step (0.5 fs).
The simulation data consist of snapshots from a 20 ps-long trajectory in the NVT
ensemble, where (64 HO molecules), (simple cubic periodic simulation cell), and K.
In total 40,000 snapshots are recorded.
A short AIMD trajectory is not sufficient to train a DeePCG model with satisfactory accuracy.
This difficulty is circumvented by constructing a DeePMD model zhang2017deep () from the AIMD data and sampling the configurations
with a much longer DeePMD trajectory (15 ns)
^{1}^{1}1The training labels for the DeePCG model are instantaneous forces, and are subject to large statistical uncertainty,
while the training labels for the DeePMD model are precise atomistic energies and forces.
Thus the AIMD data are enough for training the DeePMD model, but not enough for training the DeePCG model..
Figs. 3, 4, and 5 compare DeePMD and AIMD configurations
in terms of the O-O radial distribution function (RDF), O-O-O angular distribution functions (ADFs), and the distributions of two averaged local Steinhardt parameters (defined in Appendix A) lechner2008accurate (), respectively.
It is observed that the configurations sampled by DeePMD are in almost perfect agreement with the AIMD data.
Therefore, when considering the O configurations, training with the data generated by DeePMD is essentially indistinguishable from that with data generated by AIMD.

Now we construct the DeePCG model. We use oxygen as the CG particle. First, we adopt a cutoff radius = 6.0 Å to define the local environment, which is the same adopted in the DeePMD model. We use the full radial and angular information for the 16 CG particles closest to the particle at the origin (see, e.g., particle in Fig. 1), while retaining only radial information for all the other particles within , (see, e.g., particle in Fig. 1). Next, the local environment of each CG particle defines a sub-network, and we use 4 hidden layers with decreasing number of nodes per layer, i.e., 120, 60, 30, and 15 nodes from the innermost to the outermost layer, to construct the corresponding contribution to the CG potential.

The training process minimizes defined in Eq. (15). The force on each oxygen in the atomistic model serves as the instantaneous estimator in Eq. (13). We employ the stochastic gradient descent method with the Adam optimizer kingma2014adam () to update the parameters of each layer, with a learning rate that exponentially decays with the training step and a batch size of 4. In our current implementation, the training process requires 15 hours on a ThinkPad p50 laptop computer with an Intel Core i7-6700HQ CPU and 32 GB memory. The DeePMD-kit wang2017deepmd () is used for optimizations and MD simulations of both the DeePMD and the DeePCG models.

## Iv Discussion

In Figs. 3, 4, and 5, we show that the DeePCG model reproduces very well the oxygen correlation functions of the atomistic DeePMD model and, by extension, of the underlying AIMD model. In addition to comparing 2- and 3-body correlations, as done in standard protocols larini2010multiscale (); das2012multiscale (), we also perform tests on how well the DeePCG model preserves higher order distribution properties. In this regard, we calculate the sample averaged local Steinhardt bond order parameters and , and find satisfactory agreement between the DeePCG and DeePMD models.

In the example that we discussed above we use to optimize a CG model of water. We find that to base the optimization on defined in Eq. (14) is significantly more inefficient. This is because when the oxygens are the CG variables, very long constrained simulations using Eq. (12) are required to sample exhaustively the allowed configurations of the hydrogen bond network (HBN). Typically, when the oxygen positions are fixed, as in a constrained simulation, different HBN configurations are compatible with the fixed oxygen configurations, but it takes a very long time, typically of the order of a few nanoseconds, for the system to sample different HBN configurations. This is because of the long-range correlations imposed on the HBN by the Pauling ice rules (i.e., each oxygen has two nearer and two more distant hydrogen neighbors) pauling1928shared (). Thus, the scheme used here for matching the on-the-fly instantaneous forces is much more efficient.

It is well-known that neural network models are highly nonlinear functions of the parameters . Multiple local minima exist in the landscape of the loss functions or . Indeed, different initializations of the weights often lead to different local minimizers of the loss function. This, however, does not seem to be a serious problem as demonstrated by the test described below.

In this test, we prepare 1000 configurations randomly selected from the DeePMD data and pick up oxygen positions to define the CG particle configurations. For a CG particle in each configuration, we define the model deviation to be the standard deviation of the force on CG particle predicted by CG models that only differ among themselves by the initialization of the simulation procedure, i.e.,

(21) |

where the ensemble average is taken with respect to
models obtained from the same training process, the same training data set ^{2}^{2}2The training data set is the same, but the batches used at each step of the Adam iteration are picked from the data set randomly and independently for different models., but different initialization of the parameters .
In this way, 64,000 instances of the model deviation are computed,
and they are used to show the consistency of the predictions of different DeePCG models quantitatively.
As shown by Fig. 6, with DeePMD data corresponding to 6 independent 2.5 ns-long trajectories,
99.3% of the model deviations, i.e., a large majority of them, are below 50 meV/Å.
Moreover, the deviations do not become more significant when the magnitude of the CG force is large (inset in Fig. 6).
Therefore, the differences of the CG forces predicted by different DeePCG models are generally consistent.
Indeed the configurational distribution functions generated by DeePCG models that differ only in the initialization are indistinguishable.
If we use shorter trajectories,
the model deviations increase, as shown in Fig. 6
for DeePMD data corresponding to 2 independent 2.5 ns-long trajectories,
and for DeePMD data corresponding to a single 2 ns-long trajectory.
This confirms that longer trajectories give better approximations of the ensemble average for .

In terms of computational cost and scalability, in the current implementation, DeePCG accelerates DeePMD 7.5 times. Since all the physical quantities in DeePCG are sums of local contributions, upon training, the DeePCG model can be directly applied to much larger systems with linear scaling of cost. To test the reliability of DeePCG for larger systems, we perform a 1 ns-long NVT CGMD simulation on a system containing 512 water beads. This system is at the same temperature of the original DeePMD data, but is 8 times larger than the system used to construct the DeePCG model. The corresponding RDF, as shown in Fig. 3, is only very slightly less structured than the DeePCG result with 64 water beads, but tends to unity at large separation with a longer tail as we expect. This is consistent with the result in Ref. kuhne2009static (), which shows that the pair correlation function is almost converged in a 64-water fixed-cell system and larger cells only loosen the structure very slightly.

## V Conclusion and future work

In summary, DeePCG is a promising tool for parameterizing the CG potential and sampling CG configurations via MD. Due to the generality of the procedure adopted to construct the CG potential function, we expect DeePCG to be useful for a wide variety of tasks. In the case of water, we note that one reason for the great success of the mW potential is that it allows us to accelerate ice nucleation by several orders of magnitude because the absence of the hydrogen coordinates in the CG coordinate set eliminates the constraint imposed by the Pauling ice rules molinero2009water (). It would be interesting to investigate whether the CG water model introduced in this paper could describe not only the liquid but also the crystalline ice phase, and whether the freezing temperature of the CG model could approximate closely that of the underlying microscopic model. Direct ice nucleation studies would be greatly facilitated by the CG model.

Coarse grained models are often used to describe the conformations of polymers, represented for example by a sequence of beads and springs. Until now these models are constructed phenomenologically by requiring that a small set of force constants match experimental and/or molecular simulation data. The DeePCG model presented here has the potential to completely eliminate phenomenological assumptions such as the harmonic springs, by systematically constructing a many-body potential for the beads depending on their configurations. We leave these studies and a more rigorous investigation of the dynamical properties of the CG models to future work.

###### Acknowledgements.

The work of L. Zhang, J. Han and W. E is supported in part by is supported in part by ONR grant N00014-13-1-0338, DOE grants DE-SC0008626 and DE-SC0009248, and NSFC grants U1430237 and 91530322. The work of R. Car is supported in part by DOE-SciDAC grant DE-SC0008626. The work of H. Wang is supported by the National Science Foundation of China under Grants 11501039 and 91530322, the National Key Research and Development Program of China under Grants 2016YFB0201200 and 2016YFB0201203, and the Science Challenge Project No. JCKY2016212A502.## References

- (1) Paul J Steinhardt, David R Nelson, and Marco Ronchetti. Bond-orientational order in liquids and glasses. Physical Review B, 28(2):784, 1983.
- (2) T Stecher, N Bernstein, and G Csányi. Free energy surface reconstruction from umbrella samples using gaussian process regression. Journal of chemical theory and computation, 10(9):4079, 2014.
- (3) L Mones, N Bernstein, and G Csányi. Exploration, sampling, and reconstruction of free energy surfaces with gaussian process regression. J Chem Theory Comput, 12:5100–5110, 2016.
- (4) Raimondas Galvelis and Yuji Sugita. Neural network and nearest neighbor algorithms for enhancing sampling of molecular dynamics. J. Chem. Theory Comput, 13(6):2489–2500, 2017.
- (5) Elia Schneider, Luke Dai, Robert Q Topper, Christof Drechsel-Grau, and Mark E Tuckerman. Stochastic neural network approach for learning high-dimensional free energy surfaces. Physical Review Letters, 119(15):150601, 2017.
- (6) Linfeng Zhang, Han Wang, and Weinan E. Reinforced dynamics for enhanced sampling in large atomic and molecular systems. i. basic methodology. arXiv preprint arXiv:1712.03461, 2017.
- (7) Robert E Rudd and Jeremy Q Broughton. Coarse-grained molecular dynamics and the atomic limit of finite elements. Physical review B, 58(10):R5893, 1998.
- (8) Steve O Nielsen, Carlos F Lopez, Goundla Srinivas, and Michael L Klein. Coarse grain models and the computer simulation of soft materials. Journal of Physics: Condensed Matter, 16(15):R481, 2004.
- (9) Wataru Shinoda, Russell DeVane, and Michael L Klein. Coarse-grained molecular modeling of non-ionic surfactant self-assembly. Soft Matter, 4(12):2454–2462, 2008.
- (10) WG Noid, Jhih-Wei Chu, Gary S Ayton, Vinod Krishna, Sergei Izvekov, Gregory A Voth, Avisek Das, and Hans C Andersen. The multiscale coarse-graining method. i. a rigorous bridge between atomistic and coarse-grained models. The Journal of chemical physics, 128(24):244114, 2008.
- (11) WG Noid, Pu Liu, Yanting Wang, Jhih-Wei Chu, Gary S Ayton, Sergei Izvekov, Hans C Andersen, and Gregory A Voth. The multiscale coarse-graining method. ii. numerical implementation for coarse-grained molecular models. The Journal of chemical physics, 128(24):244115, 2008.
- (12) Luca Larini, Lanyuan Lu, and Gregory A Voth. The multiscale coarse-graining method. vi. implementation of three-body coarse-grained potentials. The Journal of chemical physics, 132(16):164107, 2010.
- (13) Avisek Das and Hans C Andersen. The multiscale coarse-graining method. ix. a general method for construction of three body coarse-grained force fields. The Journal of chemical physics, 136(19):194114, 2012.
- (14) Alexander P Lyubartsev and Aatto Laaksonen. Calculation of effective interaction potentials from radial distribution functions: A reverse monte carlo approach. Physical Review E, 52(4):3730, 1995.
- (15) Dirk Reith, Mathias Pütz, and Florian Müller-Plathe. Deriving effective mesoscale potentials from atomistic simulations. Journal of computational chemistry, 24(13):1624–1636, 2003.
- (16) M Scott Shell. The relative entropy is fundamental to multiscale and inverse thermodynamic problems. The Journal of chemical physics, 129(14):144108, 2008.
- (17) V Molinero, EB Moore, et al. Water modeled as an intermediate element between carbon and silicon. Journal of Physical Chemistry B, 113(13):4008–4016, 2009.
- (18) Jörg Behler and Michele Parrinello. Generalized neural-network representation of high-dimensional potential-energy surfaces. Physical review letters, 98(14):146401, 2007.
- (19) Albert P Bartók, Mike C Payne, Risi Kondor, and Gábor Csányi. Gaussian approximation potentials: The accuracy of quantum mechanics, without the electrons. Physical review letters, 104(13):136403, 2010.
- (20) Kristof T Schütt, Farhad Arbabzadah, Stefan Chmiela, Klaus R Müller, and Alexandre Tkatchenko. Quantum-chemical insights from deep tensor neural networks. Nature Communications, 8:13890, 2017.
- (21) Stefan Chmiela, Alexandre Tkatchenko, Huziel E Sauceda, Igor Poltavsky, Kristof T Schütt, and Klaus-Robert Müller. Machine learning of accurate energy-conserving molecular force fields. Science Advances, 3(5):e1603015, 2017.
- (22) Jequn Han, Linfeng Zhang, Roberto Car, and Weinan E. Deep potential: a general representation of a many-body potential energy surface. Communications in Computational Physics, 23(3):629–639, 2018.
- (23) Linfeng Zhang, Jiequn Han, Han Wang, Roberto Car, and Weinan E. Deep potential molecular dynamics: a scalable model with the accuracy of quantum mechanics. accepted by Physical Review Letters (arXiv preprint arXiv:1707.09571), 2018.
- (24) Carmen Hijón, Pep Español, Eric Vanden-Eijnden, and Rafael Delgado-Buscalioni. Mori–zwanzig formalism as a practical computational tool. Faraday discussions, 144:301–322, 2010.
- (25) Jianfeng Lu and Eric Vanden-Eijnden. Exact dynamical coarse-graining without time-scale separation. The Journal of chemical physics, 141(4):07B619_1, 2014.
- (26) Gregory A Voth. Coarse-graining of condensed phase and biomolecular systems. CRC press, 2008.
- (27) Patrick T Underhill and Patrick S Doyle. On the coarse-graining of polymers into bead-spring chains. Journal of non-newtonian fluid mechanics, 122(1):3–31, 2004.
- (28) Ian Goodchild, Laura Collier, Sarah L Millar, Ivan Prokeš, Jason CD Lord, Craig P Butts, James Bowers, John RP Webster, and Richard K Heenan. Structural studies of the phase, aggregation and surface behaviour of 1-alkyl-3-methylimidazolium halide+ water mixtures. Journal of colloid and interface science, 307(2):455–468, 2007.
- (29) BL Bhargava and Michael L. Klein. Formation of micelles in aqueous solutions of a room temperature ionic liquid: a study using coarse grained molecular dynamics. Molecular Physics, 107(4-6):393–401, 2009.
- (30) Giovanni Ciccotti, Raymond Kapral, and Eric Vanden-Eijnden. Blue moon sampling, vectorial reaction coordinates, and unbiased constrained dynamics. ChemPhysChem, 6(9):1809–1814, 2005.
- (31) Luca Maragliano and Eric Vanden-Eijnden. A temperature accelerated method for sampling free energy and determining reaction pathways in rare events simulations. Chemical physics letters, 426(1):168–175, 2006.
- (32) Jerry B Abrams and Mark E Tuckerman. Efficient and direct generation of multidimensional free energy surfaces via adiabatic dynamics without coordinate transformations. The Journal of Physical Chemistry B, 112(49):15742–15757, 2008.
- (33) Emily B Moore and Valeria Molinero. Structural transformation in supercooled water controls the crystallization rate of ice. Nature, 479(7374):506–508, 2011.
- (34) Frank H Stillinger and Thomas A Weber. Computer simulation of local order in condensed phases of silicon. Physical Review B, 31(8):5262, 1985.
- (35) Han Wang, Christoph Junghans, and Kurt Kremer. Comparative atomistic and coarse-grained study of water: What do we lose by coarse-graining? The European Physical Journal E: Soft Matter and Biological Physics, 28(2):221–229, 2009.
- (36) Robert A DiStasio Jr, Biswajit Santra, Zhaofeng Li, Xifan Wu, and Roberto Car. The individual and collective effects of exact exchange and dispersion interactions on the ab initio structure of liquid water. The Journal of chemical physics, 141(8):084502, 2014.
- (37) Carlo Adamo and Vincenzo Barone. Toward reliable density functional methods without adjustable parameters: The pbe0 model. The Journal of Chemical Physics, 110(13):6158–6170, 1999.
- (38) Alexandre Tkatchenko and Matthias Scheffler. Accurate molecular van der waals interactions from ground-state electron density and free-atom reference data. Physical Review Letters, 102:073005, Feb 2009.
- (39) Roberto Car and Michele Parrinello. Unified approach for molecular dynamics and density-functional theory. Physical Review Letters, 55(22):2471, 1985.
- (40) Wolfgang Lechner and Christoph Dellago. Accurate determination of crystal structures based on averaged local bond order parameters. The Journal of chemical physics, 129(11):114707, 2008.
- (41) Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- (42) Han Wang, Linfeng Zhang, Jiequn Han, and Weinan E. Deepmd-kit: A deep learning package for many-body potential energy representation and molecular dynamics. arXiv preprint arXiv:1712.03641, 2017.
- (43) Linus Pauling. The shared-electron chemical bond. Proceedings of the national academy of sciences, 14(4):359–362, 1928.
- (44) Thomas D Kühne, Matthias Krack, and Michele Parrinello. Static and dynamical properties of liquid water from first principles by a novel car- parrinello-like approach. Journal of chemical theory and computation, 5(2):235–241, 2009.

## Appendix A Definition of the local averaged Steinhardt parameters

The bond orientational order of particle (atom or molecule) in a condensed environment is often described by a local Steinhardt parameter steinhardt1983bond (), defined as

(22) |

with

(23) |

Here denotes the set of neighbors of particle , are spherical harmonics, and is a switching function defined by

(24) |

In this work we take nm and nm, and adopt the modification of the local Steinhardt parameter proposed by Lechner and Dellago lechner2008accurate (), which is more sensitive than the original bond order parameter in distinguishing different crystal structures. The modified Steinhardt parameter is defined by

(25) |

with

(26) |

where includes and the tagged particle . In the full expansion of the local averaged Steinhardt parameters, 4-body terms like , are found. Therefore, the distribution of the value of the local averaged Steihardt parameters includes the effect of 4-body angular correlations.