3D Deep Learning with voxelized atomic configurations for modeling atomistic potentials in complex solid-solution alloys
The need for advanced materials has led to the development of complex, multi-component alloys or solid-solution alloys. These materials have shown exceptional properties like strength, toughness, ductility, electrical and electronic properties. Current development of such material systems are hindered by expensive experiments and computationally demanding first-principles simulations. Atomistic simulations can provide reasonable insights on properties in such material systems. However, the issue of designing robust potentials still exists. In this paper, we explore a deep convolutional neural-network based approach to develop the atomistic potential for such complex alloys to investigate materials for insights into controlling properties. In the present work, we propose a voxel representation of the atomic configuration of a cell and design a 3D convolutional neural network to learn the interaction of the atoms. Our results highlight the performance of the 3D convolutional neural network and its efficacy in machine-learning the atomistic potential. We also explore the role of voxel resolution and provide insights into the two bounding box methodologies implemented for voxelization.
3D Deep Learning with voxelized atomic configurations for modeling atomistic potentials in complex solid-solution alloys
Rahul Singh* Iowa State University Aayush Sharma* Iowa State University Onur Rauf Bingol* Iowa State University Aditya Balu Iowa State University Ganesh Balasubramanian Lehigh University Duane D. Johnson Iowa State University Soumik Sarkar Iowa State University ††thanks: Equal Contribution
noticebox[b]Preprint. Work in progress.\end@float
Over the past decade a subset of complex solid-solution alloys (CSAs), known as high-entropy alloys (HEAs), have captured the imagination of researchers across the globe. HEAs have demonstrated remarkable mechanical properties including, but not limited to, enhanced structural strength, as well as improved resistance to fatigue, oxidation, corrosion, and wear [1, 2, 3, 4, 5]. These findings have encouraged extensive research to explore the potential of HEAs for high temperature applications and for engineering systems operating under harsh environments [1, 5, 6, 7], and hence, the need to understand the material nano- and micro-structures. The infinite permutations and combinations of the atomic species in varying concentrations that can be employed to produce CSAs, creates an enormous challenge in designing alloys with novel compositions and structures. The plausible inclusion of several different elements in notable proportions to synthesize HEAs enables a high dimensional design space explorations relative to traditional alloys . The design space is impractically large for exploration solely by experimental methods or high-performance atomistic simulations. A majority of the investigations have resorted to resource intensive experimental measurements or first-principles calculations, while computational modeling by atomistic simulations, especially for the sub-micron length processes, have suffered due to the absence of well-defined force fields to describe interatomic interactions [1, 8, 9, 10]. Computationally demanding electronic-structure calculations using density functional theory (DFT) are often restricted to simulating a few hundred atoms, for time-scales less than a nanosecond [8, 9, 10], which molecular dynamics (MD) simulations can overcome if suitable potential functions are available to describe the atomic interactions. MD can predict deterministically nano/microscopic phenomena (deformation and motion of dislocations, phase-transition) and predict material properties at a fraction of the computational time compared to first-principles-based techniques [8, 9, 10].
An interatomic potential function or potential energy surface (PES) refers to the mathematical equation that provides a direct functional relation between the configurations, positions and potential energy of a group of atoms. Analytical potentials provide a simpler, direct relation between the molecular configurations and its potential energy in closed form, and facilitate a quick energy calculation, but they are usually derived by introducing physical approximations . These potentials represent a necessary compromise between efficiency and accuracy, given that the essential characteristics of the atomic interactions are reasonably described. In MD simulations , force-field functions, such as the embedded atom method (EAM) , are available for a limited set of atom-combinations and can be employed to describe the self and cross interactions of the different participating species. They ideally should reproduce configurational energies and defect energies that control relevant physical mechanisms, which, at present, is a short-coming of most empirical potentials.
The vastness of the CSAs composition landscape, and the absence of PES information for several of the elemental combinations, compels the development of a method for deriving useful potential functions for these materials a timely scientific problem to address. Here, we describe a deep-learning (DL) method seeded with data from ab-initio molecular dynamics (AIMD) calculations as a route to developing interatomic potentials. Such potentials essentially utilize ML techniques to construct direct functional correlation between the atomic position, configuration and energy , and employ a consistent set of atomistic data [11, 14, 15]. A robust ML potential enables (a) faster evaluations especially for large MD or Monte Carlo (MC) simulations, (b) general applicability to all possible interactions within a system containing various atomic species, and (c) high predictive accuracy that are comparable to DFT predictions.
Application of machine learning (ML) to atomistic simulations to predict PES has been explored by several in the past [11, 14, 15]. However, most of these approaches have been performed for pure elemental simulations. Also, because the atomic feature descriptor are volumetric in nature (i.e., the position of atoms, interatomic distances, etc.) and evaluating it in 3D domain of the cell shall makes the ML method more effective. The key novelty in this paper is to use a volumetric representation of the physical descriptors using a voxel grid and thus predicting atomistic potential for multi-component alloys. To best of our knowledge, there are no or very sparse attempts for using voxel-based (3D) convolutional Neural Networks for predicting PES for complex solid solution alloys.
Our specific contributions in this work are:
We produce a potential to describe efficiently and accurately the PES for molecular simulations of equiatomic MoTaW HEA, as a representative CSA.
We build a framework (called as Machine-Learning-based-Potential Energy Surface, or ML-PES) for using deep convolutional neural networks to learn from ab-initio molecular dynamics (AIMD) simulations of multi-component alloys.
The flow of the paper is as follows: In section 2, we discuss about Data preparation and voxelization of Ab-Initio Molecular Dynamics data of CSAs. In section 3, we provide details about the deep convolutional neural network used for machine learning and the training process. In section 4, we present the results and discussion.
2 Ab-Initio Molecular Dynamics for CSAs
2.1 First-principles-based calculations
We perform ab-initio molecular dynamics (AIMD) simulations with the VASP 5.4 package[17, 18], using the Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional and the projected-augmented wave (PAW) potentials[20, 21]. The three elements Ta, Mo and W form the equiatomic ternary HEA. The initial structures are assimilated from our in-house hybrid Cuckoo-Search with local environments optimized by Monte-Carlo algorithm that minimizes the short-range order (SRO), and represent near random configurations that are desirable for disordered solid solutions (CSAs or HEAs). We employ an energy cut-off of 400 eV and Born-Oppenheimer convergence of 10 eV at each time step. A Monkhorst-Pack k-mesh of 10 10 10 is used in all the AIMD simulations. A Nosé thermostat is applied at different temperatures (100 to 1500 K) for the various simulated cases, as required by the exchange-correlation functional to initiate atomistic displacements in the MoTaW CSA.
A body-centered-cubic (BCC) supercell containing 18 atoms each of Mo, Ta and W is shown in Figure 1(a). We perform AIMD simulations at seven different temperatures (100, 300, 500, 700, 900, 1100 and 1500 K), and monitor the displacements (x, y, z) and forces (F, F, F) experienced by each atom. The dataset consists of 70,000 different snapshots containing relevant displacements, forces, and energies for all atoms in the CSA. We employ a structural descriptor V, as expressed in Eq. 1, to capture the interatomic interactions and analyze the chemical disorder in the alloy. In Eq. 1, x, y, z are the normalized Cartesian coordinates (normalized with respect to maximum magnitude), refers to the atomic number and is the variance (typically 2.0 for a normal Gaussian distribution).
Typical AIMD output is extensive and complex for direct analysis with a deep neural network (DNNs). It is primarily very difficult because the data structure consists of a huge set of atomic coordinates and properties for all the atoms in the HEA. Hence, we use voxel based representation of the atomic configuration, where the data is represented in a simple cellular structure bounded by a rectangular box. Voxel is analogous to volume element, and the generation of these elements is voxelization, which generates a voxel grid with a user-specified grid resolution. Each voxel in the grid can contain none, one or more atoms depending on their x, y, z coordinates. Voxel grid generation depends on an inside-outside test that compares the x, y, z coordinates of the atoms with respect to the grid boundaries of the voxels. The bounds are defined by the minimum and maximum points that can form the diagonal of the volume element, as illustrated in Fig. 1(b); the maximum and minimum points are represented by green spheres and other edge points are represented by orange spheres. We apply the following test to check whether a point of interest, P, lies inside or outside the corresponding voxel. P is considered inside the voxel if the inequalities in Eq. 2 are satisfied.
where , , and are the basis vectors of the voxel illustrated in Figure 2 and . Once the atoms in simulation domain are decomposed to their respective voxels, the atomic properties are computed for each voxel. Here, each voxel contains the descriptors V.
The voxel grid generation is implemented by two distinct approaches. In the first case (denoted as ibb for internal bounding box), we compute the bounding box of each configuration (a.k.a., snapshot or image) and subsequently generate the voxel representation based on that bounding box. In this way, each image is having with its own (internal) bounding box. We can say that the critical features of scale and relative interaction of atoms is clearly captured by this representation (because of its invariance with respect to the variance of scale in the atomic locations). We generate the voxel representation with resolution options of 4 4 4, 8 8 8, 16 16 16. The resolution of the voxel grid defines the number of volume elements bounded by the grid and, consequently, the size of the data to be processed. In an alternate approach, we determine the bounding box of the entire set of 70,000 snapshots and generate voxel representation of each snapshot with respect to this outer bounding box (denoted as obb). As this representation is not invariant of the scale nor does it preserve the uniqueness of the atomic locations, it is important to have a very fine resolution of voxel grid. Physically, this could provide better generalization compared to ibb due to its accommodating the scale variance and with an additional cost of more computational resources. Physically, while the differences in atom locations are attributed to effect of kinetic energies in the AIMD simulations at the different temperatures, by using a proper voxel resolution all possible interactions within the alloy are included in the training dataset. Figure 2 illustrates a snapshot of the coordinates for the 54-atom ternary alloy structure, and their corresponding ibb and obb representations.
We implement the entire voxelization process in Python programming language. To accelerate the voxelization operations, we employ Python’s integrated multi-processing approach with Numba . A first-in-first-out (FIFO) queue is created using the queue package and all 70,000 configurations are added to the queue. We use the threading package using parallel programming to process the FIFO queue of the configurations. The voxelization subroutines are optimized and pre-compiled using the just-in-time compilation capability of Numba package. Applying optimization and pre-compilation steps before the subroutine execution allowed us to substantially (2 to 3) reduce the processing time spent for generating the voxel grids on a workstation equipped with Intel Xeon E5-2630 v3 processor and 32 gigabytes of RAM.
3 Deep Convolutional Neural Networks
Data encountered in real life often involves learning from three dimensional (3D) data. Naturally, this has been a major area of research in itself since the eruption of Deep Learning. Many researchers have worked on using 3D convolutional networks (3DCNN) to learn from 3D data. The very first work was for the task of object detection from a 3D computer-aided design (CAD) geometry of simple objects (such as bed, or chair) using a 3DCNN (called as VoxNet) [23, 24]. Further more, researchers have exploited this idea in several areas such as prediction from the point cloud data obtained from LIDARs , engineering data used in design for manufacturing applications , and rendering smooth 3D graphics .
A deep neural network is made up of several layers of connection forming one network which takes a input and produces an output . Each connecting layer() in the network can be represented as , where represents a non-linear activation function, and are the weights and biases, respectively, for connecting the input neurons() to the output() neurons. The connections could be as simple as a dense connection between every input neuron and output neuron in the layer. However, all connections in a dense connection layer may not be meaningful and the sample complexity to learn the connections would be high. A convolution connection instead of a dense connection helps in alleviating this issue. The convolution operation () is given by
A series of convolutional connections, non-linear activations, pooling forms a convolutional neural network (CNN). The voxelized data obtained from the process explained in the previous section is fed into the CNN model. The network architecture, shown in Figure 3, comprises of multiple layers with max pooling and dropout in-between the layers. To account for the difference in the resolution of the ibb and obb, during the training of ibb and obb method, we modify the above parameters by reducing the filter size and no. of filters etc. For the obb methodology, the deep neural network consists of 3 layers with 256 filters and the kernel size of 2 2 2 in the first convolution layer. The strides are 1 1 1 in each convolution layer with a dropout of 0.5 in between the different layers. Subsequent layers had a 50% reduction in filters, while kernel size and strides were constant. For the ibb methodology, increasing the filters and their size along with strides had negligible effect on the performance of the model. All the convolution layers have the LeakyReLu activation ( = 0.001) (except the final 3 dense layers, which had a tanh activation). Finally, the entire network is reduced to three fully connected layers, the layers bearing correspondence to F, F and F obtained from the AIMD simulations.
4 Results and Discussion
For the present implementation, we use the Adam  optimizer along with the L2-norm as the loss function. The test data was 0.3 the total data-set (70,000) used for validation, while the training/validation was done for 500 epochs. The analysis were carried out in a computing architecture with two Nvidia K20 Kepler GPUs.
In Figure 4, the training (Fig. 4(a) and (c)) and the validation (Fig. 4(b) and (d)) losses for the different resolutions of the voxel grids and the two implementations: ibb and obb, are reproduced. We observe that with an increase in the resolution, training and validation losses are reduced significantly for obb methodology till 8 resolution voxel grid. Thus, to minimize losses, the critical size of the voxel grid for the obb method must be greater than or equal to 8. Beyond 8 resolution, there is marginal improvement in accuracy on the expense of significant increase in computational time. Hence, for optimum performance the 8 resolution voxel grid needs to be selected in the obb methodology. From Figures 4(c) and (d), we note the effect of voxel grid resolution in the ibb methodology is less drastic than the obb method. Increasing resolution size improves accuracy and reduces losses but the overall performance is still poor in comparison to the obb method. The losses are constrained around 0.045 (training) and 0.04 (validation) even for the 16 resolution voxel grid size. In contrast, in the obb method we find the losses to be around 0.02 (training) and 0.01 (validation) for the 8 resolution voxel grid.
To show the effectiveness of the present implementation of using voxelization alongside convolutional neural network, we plot a comparison of validation loss (Fig. 5). It is observed that a fully connected deep network representing a dense configuration is out-performed by the CNN network (with obb voxelization framework). For the inner-bounding-box (ibb) voxelization methodology, the performance was poor in comparison to a full dense configuration, while for the outer-bounding-box (obb) methodology there occurs significant reduction in the losses, implying an enhanced accuracy for the proposed mocel. The losses (validation) for the dense network is around 0.02 while the obb 16 resolution model yields a loss around 0.0075. Thus, the inherent nature of the CNN to learn the local features more effectively and perform well even with a higher number of learnable parameters can be utilized to develop network architectures to generate the potential energy surface for complex multi-component alloy systems more effectively than the use of traditional dense networks.
The accuracy of the model is shown through the comparison between the actual values of forces and the predicted forces for a test set of coordinates. The comparison results are shown in Fig. 6 with blue marker representing the real force and red marker representing the predicted force. Figure 6(a) and 6(b) shows the distribution of the data points. In Fig. 6(a) the real forces in the three directions (x, y, z) are marked for 1000 test data points. The real forces distribution is largely concentrated around the center with few large values. Similar distribution is observed in all the three directions for the real force. We can observe in Figure 6(a) that the corresponding red markers (predicted values) are very close to the real forces, confirmed in Figure 6(b) (histogram), and Figure 6(c) (as Box plot). The distribution that resembles a Gaussian distribution is very similar for the real and predicted forces except for a few columns of data. Out of the three directional components of the force, F (Force in x direction) is predicted most accurately. The box plot shown in Figure 6(c) correlates with this observation. The median and the (25, 75) quantiles match exactly for F while there are deviations in the values of F and F from the real forces. In case of F, the median of real data and predicted data are significantly close, however width of the box does not match for the two data sets. It shows that the variance in the real data set of F is not captured perfectly by the model. Similarly we observe some discrepancies between the real and predicted values of F. In spite of these slight differences, this model provides a very good platform to create alternative solutions to the computationally expansive DFT-based simulations and model interactions between atoms.
Using CNN to estimate the interatomic potential of CSAs is novel and previously only few attempts have been made to generate the potential for one elemental systems [11, 14, 15]. Prior studies have often discussed low accuracy and high validation errors associated with dense networks. Our results corroborate that using a CNN with voxelization can suitably capture the local environments as well as atomic specific properties like bond length, which eventually could be utilized to generate the inter-atomic potential or alloy specific properties. We find that increasing the voxelization resolution in the obb method offers significant reduction in losses (training and validation). With the ibb method, we find that the performance even with increasing voxel resolution is limited, as it fails to capture the local chemical environments and generalizes the results. Future efforts involves directly comparing the structural and transport property predictions using ML potential with that from first-principles simulations.
Fruitful discussions on ML potential with Dr. Nikolai Zarkevich at Ames Laboratory is greatly appreciated. The work at Ames Laboratory was funded by the U.S. Department of Energy (DOE), Office of Science, Basic Energy Sciences, Materials Science and Engineering Division, and, in part (for A.S.) by Director’s discretionary funds. Ames Laboratory is operated for the U.S. DOE by Iowa State University under Contract No. DE-AC02-07CH11358. The work was supported, in part, by the Office of Naval Research (ONR) through awards N00014-16-1-2548, N00014-18-1-2484 and by the U.S. AFOSR under the YIP grant FA9550-17-1-0220. Any opinions, findings and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the sponsoring agencies.
-  J-W Yeh, S-K Chen, S-J Lin, J-Y Gan, T-S Chin, T-T Shun, C-H Tsau, and S-Y Chang. Nanostructured high-entropy alloys with multiple principal elements: novel alloy design concepts and outcomes. Advanced Engineering Materials, 6(5):299–303, 2004.
-  Prashant Singh, Aayush Sharma, Andrei V Smirnov, Mouhamad S Diallo, Pratik K Ray, Ganesh Balasubramanian, and Duane D Johnson. Design of high-strength refractory complex solid-solution alloys. npj Computational Materials, 4(1):16, 2018.
-  Michael C Gao, Jien-Wei Yeh, Peter K Liaw, Yong Zhang, et al. High-Entropy Alloys. Springer, 2016.
-  Stéphane Gorsse, Daniel B Miracle, and Oleg N Senkov. Mapping the world of complex concentrated alloys. Acta Materialia, 135:177–187, 2017.
-  Daniel B Miracle and Oleg N Senkov. A critical review of high entropy alloys and related concepts. Acta Materialia, 122:448–511, 2017.
-  Daniel B Miracle. Critical assessment 14: High entropy alloys and their development as structural materials. Materials Science and Technology, 31(10):1142–1147, 2015.
-  Daniel B Miracle. High-entropy alloys: A current evaluation of founding ideas and core effects and exploring “nonlinear alloys”. JOM, 69(11):2130–2136, 2017.
-  Aayush Sharma, Prashant Singh, Duane D Johnson, Peter K Liaw, and Ganesh Balasubramanian. Atomistic clustering-ordering and high-strain deformation of an alcrcofeni high-entropy alloy. Scientific reports, 6:31028, 2016.
-  Aayush Sharma, Sanket A Deshmukh, Peter K Liaw, and Ganesh Balasubramanian. Crystallization kinetics in alcrcofeni (0 x 40) high-entropy alloys. Scripta Materialia, 141:54–57, 2017.
-  Aayush Sharma and Ganesh Balasubramanian. Dislocation dynamics in al0. 1cocrfeni high-entropy alloy under tensile loading. Intermetallics, 91:31–34, 2017.
-  Nongnuch Artrith and Alexander Urban. An implementation of artificial neural-network potentials for atomistic materials simulations: Performance for tio. Computational Materials Science, 114:135–150, 2016.
-  Steve Plimpton. Fast parallel algorithms for short-range molecular dynamics. Journal of computational physics, 117(1):1–19, 1995.
-  XW Zhou, RA Johnson, and HNG Wadley. Misfit-energy-increasing dislocations in vapor-deposited cofe/nife multilayers. Physical Review B, 69(14):144113, 2004.
-  Yue Liu, Tianlu Zhao, Wangwei Ju, and Siqi Shi. Materials discovery and design using machine learning. Journal of Materiomics, 3(3):159–177, 2017.
-  Jörg Behler. Perspective: Machine learning potentials for atomistic simulations. The Journal of chemical physics, 145(17):170901, 2016.
-  Albert P Bartók, Risi Kondor, and Gábor Csányi. On representing chemical environments. Physical Review B, 87(18):184115, 2013.
-  Georg Kresse and Jürgen Hafner. Ab initio molecular dynamics for liquid metals. Physical Review B, 47(1):558, 1993.
-  Georg Kresse and Jürgen Hafner. Ab initio molecular-dynamics simulation of the liquid-metal–amorphous-semiconductor transition in germanium. Physical Review B, 49(20):14251, 1994.
-  John P Perdew, Kieron Burke, and Matthias Ernzerhof. Generalized gradient approximation made simple. Physical review letters, 77(18):3865, 1996.
-  Peter E Blöchl. Projector augmented-wave method. Physical review B, 50(24):17953, 1994.
-  Georg Kresse and D Joubert. From ultrasoft pseudopotentials to the projector augmented-wave method. Physical Review B, 59(3):1758, 1999.
-  Siu Kwan Lam, Antoine Pitrou, and Stanley Seibert. Numba: A llvm-based python jit compiler. In Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, page 7. ACM, 2015.
-  Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 3d shapenets: A deep representation for volumetric shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1912–1920, 2015.
-  Anastasia Ioannidou, Elisavet Chatzilari, Spiros Nikolopoulos, and Ioannis Kompatsiaris. Deep learning advances in computer vision with 3d data: A survey. ACM Computing Surveys (CSUR), 50(2):20, 2017.
-  D. Maturana and S. Scherer. Voxnet: A 3d convolutional neural network for real-time object recognition. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 922–928, Sept 2015.
-  Sambit Ghadai, Aditya Balu, Soumik Sarkar, and Adarsh Krishnamurthy. Learning localized features in 3d cad models for manufacturability analysis of drilled holes. Computer Aided Geometric Design, 62:263 – 275, 2018.
-  Maxim Tatarchenko, Alexey Dosovitskiy, and Thomas Brox. Octree generating networks: Efficient convolutional architectures for high-resolution 3d outputs. arXiv preprint arXiv:1703.09438, 2017.
-  Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.