# How the Chemical Composition Alone Can Predict Vibrational Free Energies and Entropies of Solids

## Abstract

Computing vibrational free energies ($F^{\mathrm{vib}}$) and entropies ($S^{\mathrm{vib}}$) has posed a long-standing challenge to the high-throughput ab initio investigation of finite-temperature properties of solids. Here we use machine-learning techniques to efficiently predict $S^{\mathrm{vib}}$ and $F^{\mathrm{vib}}$ of crystalline compounds in the Inorganic Crystal Structure Database. By employing descriptors based simply on the chemical formula and using a training set of only 300 compounds, mean absolute errors of less than 0.04 meV/K/atom (15 meV/atom) are achieved for $S^{\mathrm{vib}}$ ($F^{\mathrm{vib}}$), whose values are distributed within a range of about 0.9 meV/K/atom (about 300 meV/atom). In addition, for training sets containing fewer than 2,000 compounds the chemical formula alone is shown to perform as well as, if not better than, four other more complex descriptors previously used in the literature. The accuracy and simplicity of the approach mean that it can be advantageously used for the fast screening of phase diagrams or chemical reactions at finite temperatures.

CEA, LITEN, 17 Rue des Martyrs, 38054 Grenoble, France; Dept. Mech. Eng. and Materials Science, Duke University, Durham, NC 27708, USA. Current address of one author: Institute of Materials Chemistry, TU Wien, A-1060 Vienna, Austria.

## 1 Introduction

The calculation of stability at high temperatures was identified four years ago as one of the major standing challenges for high-throughput (HT) ab initio approaches.^{1}
Solving this problem is fundamental for the prediction of phase diagrams and of chemical reactions. HT phase diagrams have until now typically been calculated using the ab initio formation enthalpies at 0 K.^{2, 3, 4}
This means that in many cases the stable phases will not correspond to the ones at finite temperatures. With respect to chemical reactions, the common practice in HT has been to incorporate only the entropy of the gas phases, and to completely neglect the phonon vibrational contributions. This can lead to important errors in the estimated reaction pressures and temperatures.^{5}
The reason for this widespread neglect of the vibrational free energy is the high computational demand of phonon spectrum calculations. Whereas the formation energy can be obtained from calculations on a single unit cell, a vibrational entropy calculation requires computing the interatomic force constants of the solid in a large supercell, which increases the computational time by orders of magnitude. This has represented a bottleneck to the inclusion of the vibrational contribution in HT calculations of phase diagrams and chemical reactions. Solving it is a critical issue with a transformative impact for materials science.

This article shows that HT calculations of $S^{\mathrm{vib}}$ and $F^{\mathrm{vib}}$ of solids are feasible and can be carried out with satisfactory accuracy and less computational expense than previously thought. Furthermore, several machine-learning (ML) approaches are applied to this problem, and it is shown that the chemical composition alone can be efficiently used as a descriptor to rapidly predict vibrational properties, outperforming more sophisticated descriptors for small training sets. The predictive power of this method is validated by comparing predicted entropies of a dozen compounds with measured values from the National Institute of Standards and Technology (NIST)^{6}. The results presented here are a key step toward the HT screening of phase stability at finite temperatures, with important applications for the computational discovery of new materials, including functional compounds such as hydrogen-storage materials^{5, 7}.

## 2 Method

Ab initio HT computational methods are a powerful approach to identify
new application-specific materials.^{8, 9, 10, 11, 12, 1, 13, 14, 15, 16, 17, 18, 19}
Instead of investigating materials one at a time, such methods use
algorithms to automate the calculations and analysis. However, the screening process can quickly require tremendous amounts of computational
resources because (i) ab initio calculations, usually performed using
density functional theory (DFT)-based methods, are computationally expensive,
(ii) up to hundreds of DFT runs per compound can be required to compute
some materials properties (e.g. anharmonic thermal conductivity^{20, 21}),
and (iii) the number of prospective candidates can easily climb into the hundreds of thousands.

ML methods can provide a way
to tackle this computational resources issue: instead of
running expensive DFT calculations for all prospective materials,
the materials properties are predicted very quickly using an ML model trained in advance. Using ML techniques
in this way has allowed the identification of materials with targeted properties^{22, 23, 24},
such as compounds with unprecedentedly low thermal conductivity^{25, 26},
NiTi-based shape memory alloys with exceptionally low thermal hysteresis^{27},
and organic polymers with remarkably high band gap and dielectric
constant^{28}.

ML methods build a model (which can be seen as a function) that transforms inputs (also called descriptors, characterizing the materials) into outputs (usually a materials property such as thermal conductivity or dielectric constant) that should be as close as possible to the targets (the actual values of the material’s property). ML methods operate in three stages: (i) the learning phase, in which the model is trained by minimizing the differences between outputs and targets for a set of compounds; (ii) the test phase, in which the model is tested by assessing the differences between outputs and targets for a second set of compounds (different from the training set); and (iii) the prediction phase, in which the model is used to effectively predict the (unknown) targets of other compounds.
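
The three stages above can be sketched with a generic regressor; this is a toy illustration, in which the data, the model choice, and the set sizes are placeholders and not the article's actual setup:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Toy data standing in for (descriptor vector, property value) pairs.
X = rng.random((60, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0])

X_train, y_train = X[:40], y[:40]   # (i) learning phase: fit on a training set
X_test, y_test = X[40:], y[40:]     # (ii) test phase: held-out compounds

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
test_error = np.mean(np.abs(model.predict(X_test) - y_test))

X_new = rng.random((3, 5))          # (iii) prediction phase: unknown targets
predictions = model.predict(X_new)
```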

### 2.1 Data sets.

The data set consists of the Inorganic Crystal Structure Database (ICSD) section of the aflow.org repositories (52,671 compounds)^{29, 30, 31}
from which the duplicates are removed (reducing it to 25,705 compounds) as well as
the compounds containing the noble gases Ne, Ar, Kr, and Xe, the elements
from Cm (Z = 96) to Cn (Z = 112), and Tb, At, Rn, Fr (further reducing it to 25,075
compounds). High-throughput calculations are used to compute the phonon frequencies of about 600 compounds randomly selected among this set of 25,075
materials. Out of the 423 phonon calculations that ran smoothly, the 131 calculations that resulted in imaginary phonon frequencies are discarded, allowing a small tolerance (in rad/ps) on the first three frequencies. The data set employed in the following to predict the vibrational properties consists of the 292 remaining calculations.

### 2.2 Descriptors.

The set of descriptors is one of the main components that determine the performance of a ML method. This set should satisfy a few constraints: it should contain the same number
of features (or components) for all materials, and for instance not
depend on the number of atoms or atom types; it is also well accepted
that good sets of descriptors should be invariant under rotation and translation
of the materials structure as well as under permutation of atoms—so
that every material is characterized by a unique vector.
This means that, unless one restricts the materials of study to a
specific class (such as Heuslers^{11, 32}
or ternary oxides^{33}),
a lot of thought must be put into the design of appropriate descriptors. Many different
types of descriptors complying with such constraints have been proposed
in the literature. Some of them are structural: a non-exhaustive list
includes Coulomb matrices^{34}, bags of
bonds^{35}, pair
correlation functions^{36}, graph-theoretic variables^{37}. Other descriptors are physical, such as transformations (mean, standard deviation…) of properties (masses, radii…) of the atoms of the material^{38, 39, 40}.

In this article, a set of descriptors based exclusively on the chemical composition is first employed. In a second step, the performance of four other, more complex descriptors is explored; despite the popularity of ML techniques in materials modeling, very few studies have compared different descriptors on the same basis. The four additional sets are based on the pair correlation functions, O’Keeffe’s description of solid angles, bispectrum components, and the properties of the atoms of the material.

#### Chemical composition.

The first set of descriptors contains exclusively the chemical
composition information of the compounds. The vector of descriptors
has 87 components, the $i$th component being the fractional
composition of the compound in the element of type $i$ ($i = 1$
is H, $i = 2$ is He…). For instance, Mg$_2$Si is described
by a vector whose Mg component is 2/3, whose Si component is 1/3, and whose remaining components are zero. Seko and collaborators employed similar descriptors for the prediction of low-thermal-conductivity compounds, except that they use binary digits to represent the presence of chemical elements and do not account for the fractional composition.^{26}
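
A minimal sketch of such a fractional-composition descriptor; the `ELEMENTS` table below is truncated for brevity, whereas a real implementation would list all 87 element symbols:

```python
import re

# Truncated element table for illustration (component 0 is H, 1 is He, ...).
ELEMENTS = ["H", "He", "Li", "Be", "B", "C", "N", "O", "F", "Ne",
            "Na", "Mg", "Al", "Si"]

def composition_vector(formula, n_components=87):
    """Map a chemical formula to an 87-component fractional-composition vector."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    total = sum(counts.values())
    vector = [0.0] * n_components
    for symbol, count in counts.items():
        vector[ELEMENTS.index(symbol)] = count / total
    return vector

v = composition_vector("Mg2Si")
# Mg (index 11) -> 2/3, Si (index 13) -> 1/3, all other components zero.
```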

#### Elemental properties of the atoms.

Twelve elemental properties of the atoms are considered: atomic number, mass, radius, column number, row number, electronegativity, Pettifor scale, Pettifor number, and the numbers of s, p, d, and f valence electrons. For all compounds the elements $p_{ij}$ are generated, giving the value of the $j$th elemental property for the $i$th atom of the unit cell. Five descriptors are defined for each of the 12 elemental properties: the mean, minimum, and maximum of the property over the atoms, the value of the property for the most abundant chemical element (or the mean of the property over the most abundant chemical elements if two or more chemical elements are equally dominant), and the variance $\frac{1}{N}\sum_{i=1}^{N}(p_{ij}-\bar{p}_j)^2$ ($N$ being the number of atoms), respectively denoted as mean, min, max, ab, and var, and resulting in a set of 60 descriptors.
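
The five statistics can be sketched as follows for a single elemental property; the function name and the use of atomic masses for Mg2Si are illustrative choices, and `var` is implemented here as the variance of the property over the atoms, an assumption consistent with its name:

```python
from collections import Counter
import statistics

def property_descriptors(elements, prop):
    """mean, min, max, ab, var of one elemental property over the atoms."""
    values = [prop[e] for e in elements]
    mean = statistics.fmean(values)
    # 'ab': property of the most abundant element (mean over ties).
    counts = Counter(elements)
    top = max(counts.values())
    dominant = [e for e, c in counts.items() if c == top]
    ab = statistics.fmean(prop[e] for e in dominant)
    # 'var': variance of the property over the N atoms.
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return {"mean": mean, "min": min(values), "max": max(values),
            "ab": ab, "var": var}

masses = {"Mg": 24.305, "Si": 28.085}       # illustrative property values
d = property_descriptors(["Mg", "Mg", "Si"], masses)
```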

#### Bartók-Partay’s bispectrum components.

Bartók-Partay et al. have developed a type of interatomic potential, the Gaussian Approximation Potentials (GAP), based on the bispectrum components of the atoms.^{41, 42} The bispectrum components, defined for each atom of the unit cell, are based on the local atomic density surrounding the atom and are meant to describe its atomic environment while being invariant
under translation, rotation and permutation of atoms. The bispectrum components are generated using the LAMMPS code^{43} and a radius cutoff of 6 Å. The band limit is set to 5 and only the diagonal components are considered, resulting in 52 components per atom. This gives for
all compounds the elements $b_{ij}$, representing the $j$th bispectrum
component of the $i$th atom of the unit cell. The bispectrum
components are then averaged over the atoms of the unit cell of the same chemical type
to form a matrix of size (number of bispectrum components) × (number
of different elements), i.e. 52 × 87, where the
element $B_{j\alpha}$ represents the mean of the $j$th bispectrum component
over all atoms of chemical type $\alpha$ ($\alpha = 1$ is H,
$\alpha = 2$ is He…). The vector of descriptors employed consists of the 4,524
matrix elements.
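
The per-type averaging that builds the descriptor matrix can be sketched as follows; toy feature vectors stand in for the 52 LAMMPS-generated bispectrum components, and `average_by_type` is a hypothetical helper:

```python
import numpy as np

def average_by_type(per_atom_features, types, n_element_types=87):
    """Average per-atom feature vectors over atoms of the same chemical type,
    returning a flattened (components x element-types) matrix."""
    per_atom_features = np.asarray(per_atom_features)  # shape (atoms, components)
    n_components = per_atom_features.shape[1]
    matrix = np.zeros((n_components, n_element_types))
    for t in set(types):
        mask = np.array([ti == t for ti in types])
        matrix[:, t] = per_atom_features[mask].mean(axis=0)
    return matrix.ravel()

# Two atoms of type 0 and one of type 1, with 3 components each (toy values).
features = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [10.0, 10.0, 10.0]]
descriptor = average_by_type(features, [0, 0, 1], n_element_types=4)
```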

#### O’Keeffe’s solid angles.

Inspired by O’Keeffe’s definition of coordination numbers^{44},
a set of descriptors is based on the solid angles subtended by the
faces of the Voronoi polyhedron centered at each atom. The solid angle $\Omega_{ij}$, centered at atom $i$, is the solid
angle between atom $i$ and one of its neighbors $j$ (among its set
of neighbors $V_i$). The derived elements $\Theta_{\alpha\beta}$
are computed as

$$\Theta_{\alpha\beta} = \frac{1}{N_{\alpha}} \sum_{i} \delta_{i\alpha} \sum_{j \in V_i} \Omega_{ij}\, \delta_{j\beta},$$

where $N_{\alpha}$ is the number of atoms of chemical type $\alpha$, and
$\delta_{i\alpha}$ ($\delta_{j\beta}$) is equal to 1 if the $i$th ($j$th) atom is of chemical
type $\alpha$ ($\beta$) and 0 otherwise. This means that for each
atom of the unit cell the solid angles are computed for all its neighbors
and summed up for neighbors of the same chemical type $\beta$; the solid-angle sums are then averaged over the atoms of the unit cell which
are of the same chemical type $\alpha$. The vector of descriptors
employed consists of the 87 × 87 (7,569) elements.

#### Pair correlation functions.

Another set of descriptors uses the partial radial distribution function
representation, which considers the distribution of pairwise distances
between two atom types, $\alpha$ and $\beta$, as described by Schütt
et al.^{36}
The functions are split into 200 bins, each bin spanning 0.1 Å,
which leads to a set of 200 × 87 × 87 (1,513,800) descriptors.
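
A sketch of the binning, assuming a simple histogram of pairwise distances per ordered type pair; periodic images are ignored here for brevity:

```python
import numpy as np

def partial_rdf(positions, types, type_a, type_b, n_bins=200, bin_width=0.1):
    """Histogram of distances (in Angstrom) from atoms of type_a to atoms of
    type_b, in n_bins bins of width bin_width."""
    positions = np.asarray(positions)
    hist = np.zeros(n_bins)
    for i, ti in enumerate(types):
        for j, tj in enumerate(types):
            if i != j and ti == type_a and tj == type_b:
                d = np.linalg.norm(positions[i] - positions[j])
                b = int(d / bin_width)
                if b < n_bins:
                    hist[b] += 1
    return hist

# Toy cell: two Mg atoms and one Si atom (positions in Angstrom).
pos = [[0.0, 0.0, 0.0], [1.05, 0.0, 0.0], [0.0, 2.5, 0.0]]
h = partial_rdf(pos, ["Mg", "Mg", "Si"], "Mg", "Si")
```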

## 3 Results and discussion

### 3.1 $\Gamma$-only versus full Brillouin zone calculations.

The vibrational entropies and free energies can be computed at different temperatures based on the phonon density of states. Such a phonon density of states can be obtained with first-principles calculations, using a rather dense phonon wave-vector q-point grid or a large supercell. However, because of the computational resources involved, it is desirable to use a phonon density of states computed using single unit cells and a limited number of q-points. The most extreme simplification consists in using only the zone-center (i.e. $\Gamma$-point) phonon frequencies.

It is possible to assess how the vibrational properties calculated using the
phonon frequencies at $\Gamma$ (i.e. coarser but much cheaper) compare with those
calculated using the full phonon spectra (i.e. more accurate but also
much more expensive). The data provided by Togo and collaborators^{45, 46, 47, 48}, consisting of 207 compounds, are used for this.
Because the acoustic frequencies, which are zero at $\Gamma$,
are generally non-zero at other q-points, one must use a non-zero representative of the acoustic frequencies. Here a fraction $\alpha$ of the average optical frequency is used: $\omega_{\mathrm{ac}} = \alpha\,\bar{\omega}_{\mathrm{opt}}$, where
$\bar{\omega}_{\mathrm{opt}}$ is the mean of the optical frequencies. The value of $\alpha$ is taken as the one that gives the best overall results.
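
The replacement of the zero acoustic frequencies can be sketched as below; the value `alpha = 0.3` is a placeholder, since the article selects the fraction giving the best overall results:

```python
import numpy as np

def fix_acoustic(frequencies, alpha=0.3):
    """Replace the three acoustic frequencies (zero at the Gamma point) by a
    fraction alpha of the mean optical frequency."""
    freqs = np.sort(np.asarray(frequencies, dtype=float))
    optical = freqs[3:]              # all but the three acoustic branches
    freqs[:3] = alpha * optical.mean()
    return freqs

# Toy Gamma-point spectrum in rad/ps: three zero acoustic + three optical modes.
gamma_freqs = fix_acoustic([0.0, 0.0, 0.0, 10.0, 20.0, 30.0])
```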

Figure 1 displays the values of all the vibrational properties considered, calculated from the phonon frequencies at $\Gamma$, against the more accurate values calculated from the full phonon density of states. On the plots, the mean absolute errors (MAE), root mean squared errors (RMSE), and Pearson and Spearman correlations are provided. For all properties, the Spearman and Pearson correlations are higher than 0.95, and for the maximal phonon frequencies, which are the least well predicted properties, the MAE (7.45 rad/ps) represents about 6% of the range of the maximal frequencies (about 130 rad/ps). This shows that the phonon frequencies at $\Gamma$ already result in well-approximated values of the vibrational properties. Because this approximation allows massive computational-resource savings, it is used to calculate the vibrational properties of the 292 compounds, together with the acoustic frequencies as defined above.

### 3.2 How well are vibrational properties predicted based solely on chemical composition?

The phonon frequencies at $\Gamma$ are computed for randomly selected materials among our data set of 25,075 compounds. After discarding erroneous calculations, 292 sets of phonon frequencies are obtained. They are used to calculate the vibrational entropies and free energies per atom at 300 K, the maximal phonon frequencies, and the arithmetic and geometric means of the phonon frequencies. The performance of machine learning for the prediction of the vibrational properties is then studied using the chemical compositions as descriptors. The machine-learning algorithms are described in the Computational details section.

To assess the performance of the ML method, the k-fold cross-validation technique is employed with k = 5…14 and the performance is averaged over all (i.e. 10) cross validations. The corresponding MAE, RMSE, and Pearson and Spearman correlations are given in Figure 2, next to the plots showing the predicted vs. computed vibrational properties as obtained with the 14-fold cross validation. Given the small training set (fewer than 300 compounds) and the simplicity of the descriptors (containing only the chemical composition), the performance of the ML approach is impressive: the Pearson and Spearman correlations are superior to 0.9 for all properties, and the MAE is less than 6 rad/ps for the average phonon frequencies (for a range of 250 rad/ps) and less than 15 meV/atom for the vibrational free energies (for a range of 300 meV/atom).
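
The averaging protocol can be sketched with scikit-learn on toy data; the model and data here are illustrative, not the article's:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.random((100, 8))      # toy descriptors
y = X.sum(axis=1)             # toy target property

maes = []
for k in range(5, 15):        # one k-fold cross validation for each k = 5...14
    fold_errors = []
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True,
                                     random_state=0).split(X):
        model = RandomForestRegressor(n_estimators=50, random_state=0)
        model.fit(X[train_idx], y[train_idx])
        fold_errors.append(np.mean(np.abs(model.predict(X[test_idx])
                                          - y[test_idx])))
    maes.append(np.mean(fold_errors))

average_mae = float(np.mean(maes))  # performance averaged over the 10 CVs
```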

In a second step, the performance of the method is assessed by comparing
directly the vibrational entropy predictions with experimental values taken from
NIST^{6}. Compounds relevant for hydrogen storage are selected because vibrational entropies
can play a role in the stability of such compounds yet are often neglected. The materials considered are all the crystalline solids of Ref. 5
and of Table 3 of Ref. 7 for which
the entropy is available in NIST^{6}. Figure 3 shows the plot of the vibrational entropies as predicted with the ML model (trained on
the 292 compounds) against the ones measured experimentally. The agreement
between the predictions and the experiments is remarkable given the
simplicity of the method and the approximations done. This shows that the approach can effectively be employed to rapidly predict the vibrational properties of crystalline compounds.

### 3.3 What descriptors yield the best prediction of vibrational properties?

So far the set of descriptors employed only contained the chemical composition information. In particular, the inputs hold no information regarding the structure of the compound or the physical properties of the atoms. Now, in addition to the chemical composition, four additional sets of descriptors are explored. They are based respectively on the pair correlation functions, properties of the atoms, bispectrum components, and O’Keeffe’s solid angles.

The performances of the different descriptors are evaluated using the MAE, shown in Figure 4. The smaller the MAE, the more accurate the prediction. The results show that the two most competitive sets of descriptors are those based on the chemical composition and on the elemental properties of the atoms. The three other types of descriptors (based on the pair correlation functions, O’Keeffe’s solid angles, and bispectrum components) do not perform as well. Interestingly, the descriptors tend to work better when they contain fewer components: the descriptors based on the pair correlation functions (with 1,513,800 components) are overall the least effective set of descriptors, the descriptors based on O’Keeffe’s solid angles (with 7,569 components) and on the bispectrum components (with 4,524 components) have a similar and intermediate overall performance, while the most successful descriptors (based on the chemical composition and on the elemental properties of the atoms) contain respectively 87 and 60 components.

Figure 5 shows the correlogram between the vibrational entropies and the properties of the elements in the compounds. For each property of the atoms (atomic number, mass, radius, …), only the component most correlated with $S^{\mathrm{vib}}$ (out of mean, min, max, ab, and var) is presented. The property most correlated with the vibrational entropies is the mean of the row numbers of the atoms. However, the performances obtained when using only this descriptor are not nearly as good as when using all the descriptors: 0.055 meV/K/atom (MAE), 0.071 meV/K/atom (RMSE), 0.81 (Pearson), and 0.80 (Spearman), vs. 0.037 meV/K/atom (MAE), 0.051 meV/K/atom (RMSE), 0.91 (Pearson), and 0.92 (Spearman) when using the full set of descriptors. This shows that simple intuition cannot achieve the same result as the one obtained with machine learning; machine learning affords results beyond what classical rules of thumb can provide.

It is also useful to study the performance of these five different sets of descriptors for predicting properties other than the vibrational ones. The metallic or insulating character of the material, provided in the aflow.org repository, is particularly well suited for this. The advantage is that the performance of the different descriptors can now be evaluated as a function of the size of the training set, which is much larger than the training set available for vibrational properties. For the prediction of the metallic vs. insulator (M/I) character, the whole set of 25,075 compounds is considered. The M/I character of the materials is based on the band gaps provided in the aflow.org repository. We are aware of the limitations of DFT in computing material band gaps; however, our focus is not to predict the M/I character of additional compounds but to study the predictive capabilities of the model.

Figure 6 shows the performances of the descriptors against the size of the training set. The f1-score is used as the indicator of performance. The f1-score is defined for each class (metal and insulator) as the harmonic mean of precision and recall (times 2 to scale the score to 1): f1-score $= 2 \cdot \frac{\mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$. The global f1-score is calculated as a weighted average of the f1-scores of each class. The larger the f1-score, the better the descriptors. The results are consistent with our findings discussed above regarding the vibrational properties: when the training set contains fewer than 2,000 compounds, the descriptors based on the chemical composition and on the elemental properties of atoms are the most powerful sets of descriptors, achieving similar performance. However, as the size of the training set is increased, the sets of descriptors based on the properties of the atoms, on the pair correlation functions, and on O’Keeffe’s solid angles surpass the one based on the chemical composition.
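
A self-contained sketch of the weighted f1-score; the class labels `"M"`/`"I"` are illustrative:

```python
def f1_per_class(y_true, y_pred, cls):
    """Per-class f1: harmonic mean of precision and recall for one class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

def weighted_f1(y_true, y_pred, classes=("M", "I")):
    """Global f1: per-class f1 weighted by each class's share of the data."""
    total = len(y_true)
    return sum(f1_per_class(y_true, y_pred, c) * y_true.count(c) / total
               for c in classes)

score = weighted_f1(["M", "M", "I", "I"], ["M", "I", "I", "I"])
```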

## 4 Conclusions

Three major conclusions emerge from all the results above.

First, machine learning makes it possible to efficiently predict the vibrational entropies and free energies of crystalline compounds. Using a set of descriptors simply based on the compound’s chemical formula, and a training set of 292 compounds in the ICSD, a MAE of 0.037 meV/K/atom (13.2 meV/atom) is achieved for the prediction of vibrational entropies (free energies), whose values range from 0 to 0.9 meV/K/atom (from -100 to 200 meV/atom). Excellent performance is also demonstrated by directly comparing predicted entropies with measured values from NIST for a dozen compounds relevant for hydrogen storage. The obtained MAE of 0.047 meV/K/atom, and Pearson and Spearman correlations of 0.85, mean that the predicted entropies are good estimates of the measured values. This approach therefore allows for the rapid calculation of vibrational free energies and entropies, representing a key step toward the study of materials stability at finite temperatures using HT methods.

A second conclusion is that the vibrational properties computed from the phonon frequencies at $\Gamma$ are already good approximations of the values calculated from the full phonon density of states. This is an important piece of information in situations where the computational resources to compute the full phonon density of states are not available.

The third conclusion is that, for small training sets, descriptors based on the chemical composition and on the elemental properties of the atoms of the material perform best. It is only for much larger training sets, from about a few thousand compounds, that the set of descriptors based on the chemical composition is outperformed by some of the more elaborate descriptors, such as those based on the bispectrum, pair correlation functions, or O’Keeffe’s solid angles.

## 5 Computational details

The phonon frequencies of 292 randomly selected compounds are computed using
DFT^{49, 50}
as implemented in the Vienna Ab initio Package (VASP)^{51}.
The projector augmented wave (PAW) method is employed to deal with the core and valence electrons^{52}. The specific choice of PAW datasets follows aflow.org’s^{29, 30, 31} recommendations,
and the default cutoffs are used for the plane-wave basis. The phonon frequencies are computed at $\Gamma$
using density functional perturbation theory^{53}. From the phonon frequencies it is possible to obtain the vibrational entropies ($S^{\mathrm{vib}}$) and free energies ($F^{\mathrm{vib}}$), as well as the maximal phonon frequencies and the arithmetic and geometric means of the phonon frequencies. The vibrational entropies and free energies are computed as^{54}:

$$S^{\mathrm{vib}} = \frac{3 k_B}{N} \sum_{i=1}^{N} \left[ (1 + n_i)\ln(1 + n_i) - n_i \ln n_i \right] \qquad (1)$$

$$F^{\mathrm{vib}} = \frac{3}{N} \sum_{i=1}^{N} \left[ \frac{\hbar\omega_i}{2} + k_B T \ln\!\left( 1 - e^{-\hbar\omega_i / k_B T} \right) \right] \qquad (2)$$

In the equations, $N$ is the number of phonon frequencies ($N/3$ is the number of atoms, so that both quantities are per atom), $\omega_i$ are the phonon frequencies, and $n_i = \left( e^{\hbar\omega_i / k_B T} - 1 \right)^{-1}$ is the Bose-Einstein factor.
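
A minimal sketch of evaluating these standard harmonic expressions from a list of phonon angular frequencies, assuming $N$ frequencies correspond to $N/3$ atoms (values per atom, SI units):

```python
import numpy as np

HBAR = 1.054571817e-34   # reduced Planck constant, J s
KB = 1.380649e-23        # Boltzmann constant, J/K

def vibrational_s_f(omegas, T=300.0):
    """Harmonic vibrational entropy (J/K/atom) and free energy (J/atom)
    from phonon angular frequencies (rad/s)."""
    omegas = np.asarray(omegas, dtype=float)
    n = 1.0 / (np.exp(HBAR * omegas / (KB * T)) - 1.0)   # Bose-Einstein factor
    s = KB * np.sum((1 + n) * np.log(1 + n) - n * np.log(n))
    f = np.sum(HBAR * omegas / 2
               + KB * T * np.log(1 - np.exp(-HBAR * omegas / (KB * T))))
    atoms = len(omegas) / 3
    return s / atoms, f / atoms

# Three identical modes of 10 rad/ps (i.e. one "atom"), purely illustrative.
s_vib, f_vib = vibrational_s_f([10e12, 10e12, 10e12])
```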

To predict the vibrational properties and the metallic/insulator character of the materials, two different types of ML algorithms are systematically employed: random forests and non-linear support vector machines. The number of trees is set to 500 for the random forests; it is checked that an increase in the number of trees does not result in better performance. For the non-linear support vector machines, the radial basis function kernel is used and the $C$ and $\gamma$ coefficients are optimized for each descriptor-property system. For the prediction of the metallic/insulator character, for which different training sets are considered, the $C$ and $\gamma$ coefficients are optimized for a training set of 1,000 compounds and the optimized values are used for the other training sets. Only the best performance is presented, which is sometimes obtained using random forests and at other times using non-linear support vector machines. The performance is assessed by calculating the mean absolute errors between the predictions and the targets for a set of compounds not included in the training set. For the prediction of the metallic/insulator character of the materials, for which there is a data set of 25,075 compounds, the model is trained with the training set (containing $n$ compounds, $n$ being in the range 100-20,000) and the performance of the model is assessed using the remaining data (i.e. a set of 25,075 − $n$ compounds). The process is repeated with 10 different (and randomly selected) training sets, and the average performance is presented. For the prediction of the vibrational properties, for which the data set contains 292 compounds, 10 k-fold cross validations (k = 5…14) are performed and the performances obtained are averaged.
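
The two model types and the reporting of the best performance can be sketched with scikit-learn; the data are toys and the hyperparameter grid values are placeholders:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.random((80, 6))          # toy descriptors
y = np.sin(X.sum(axis=1))        # toy target property

# Random forest with 500 trees, as in the article.
forest = RandomForestRegressor(n_estimators=500,
                               random_state=0).fit(X[:60], y[:60])

# RBF-kernel SVM with C and gamma optimized by grid search (placeholder grid).
svm = GridSearchCV(SVR(kernel="rbf"),
                   {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]},
                   cv=3).fit(X[:60], y[:60])

mae_forest = np.mean(np.abs(forest.predict(X[60:]) - y[60:]))
mae_svm = np.mean(np.abs(svm.predict(X[60:]) - y[60:]))
best = min(mae_forest, mae_svm)  # only the best performance is reported
```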

## 6 Acknowledgements

The work is supported by M-era.net through the ICETS project (DFG: MA 5487/4-1). We also acknowledge support from ANR through the Carnot MAPPE project.

### References

- Curtarolo, S.; Hart, G. L. W.; Nardelli, M. B.; Mingo, N.; Sanvito, S.; Levy, O. The High-Throughput Highway to Computational Materials Design. Nat. Mater. 2013, 12, 191–201.
- Korbel, S.; Marques, M. A. L.; Botti, S. Stability and Electronic Properties of New Inorganic Perovskites from High-Throughput Ab Initio Calculations. J. Mater. Chem. C 2016, 4, 3157–3167.
- Curtarolo, S.; Kolmogorov, A. N.; Cocks, F. H. High-Throughput Ab Initio Analysis of the Bi-In, Bi-Mg, Bi-Sb, In-Mg, In-Sb, and Mg-Sb Systems. Calphad 2005, 29, 155–161.
- Hart, G. L. W.; Curtarolo, S.; Massalski, T. B.; Levy, O. Comprehensive Search for New Phases and Compounds in Binary Alloy Systems Based on Platinum-Group Metals, Using a Computational First-Principles Approach. Phys. Rev. X 2013, 3, 041035.
- Akbarzadeh, A. R.; Ozoliņš, V.; Wolverton, C. First-Principles Determination of Multicomponent Hydride Phase Diagrams: Application to the Li-Mg-N-H System. Adv. Mater. 2007, 19, 3233–3239.
- Chase, M. W. J. NIST Standard Reference Database 13, NIST-JANAF Thermochemical Tables Version 1.0; National Institute of Standards and Technology: Gaithersburg, MD, 1985.
- Alapati, S. V.; Johnson, J. K.; Sholl, D. S. Predicting Reaction Equilibria for Destabilized Metal Hydride Decomposition Reactions for Reversible Hydrogen Storage. J. Phys. Chem. C 2007, 111, 1584–1591.
- van Roekeghem, A.; Carrete, J.; Oses, C.; Curtarolo, S.; Mingo, N. High-Throughput Computation of Thermal Conductivity of High-Temperature Solid Phases: The Case of Oxide and Fluoride Perovskites. Phys. Rev. X 2016, 6, 041061.
- He, J.; Amsler, M.; Xia, Y.; Naghavi, S. S.; Hegde, V. I.; Hao, S.; Goedecker, S.; Ozoliņš, V.; Wolverton, C. Ultralow Thermal Conductivity in Full Heusler Semiconductors. Phys. Rev. Lett. 2016, 117, 046602.
- Zhu, H.; Hautier, G.; Aydemir, U.; Gibbs, Z. M.; Li, G.; Bajaj, S.; Pohls, J.-H.; Broberg, D.; Chen, W.; Jain, A.; White, M. A.; Asta, M.; Snyder, G. J.; Persson, K.; Ceder, G. Computational and Experimental Investigation of TmAgTe2 and XYZ2 Compounds, a New Group of Thermoelectric Materials Identified by First-Principles High-Throughput Screening. J. Mater. Chem. C 2015, 3, 10554–10565.
- Carrete, J.; Mingo, N.; Wang, S.; Curtarolo, S. Nanograined Half-Heusler Semiconductors as Advanced Thermoelectrics: An Ab Initio High-Throughput Statistical Study. Adv. Funct. Mater. 2014, 24, 7427–7432.
- Garrity, K. F. High-Throughput First Principles Search for New Ferroelectrics. arXiv:1610.04279 2016,
- Emery, A. A.; Saal, J. E.; Kirklin, S.; Hegde, V. I.; Wolverton, C. High-Throughput Computational Screening of Perovskites for Thermochemical Water Splitting Applications. Chem. Mater. 2016, 28, 5621–5634.
- Sluydts, M.; Pieters, M.; Vanhellemont, J.; Van Speybroeck, V.; Cottenier, S. High-Throughput Screening of Extrinsic Point Defect Properties in Si and Ge: Database and Applications. Chem. Mater. 2016, in press.
- Meredig, B.; Wolverton, C. Dissolving the Periodic Table in Cubic Zirconia: Data Mining to Discover Chemical Trends. Chem. Mater. 2014, 26, 1985–1991.
- Chen, H.; Hautier, G.; Jain, A.; Moore, C.; Kang, B.; Doe, R.; Wu, L.; Zhu, Y.; Tang, Y.; Ceder, G. Carbonophosphates: A New Family of Cathode Materials for Li-Ion Batteries Identified Computationally. Chem. Mater. 2012, 24, 2009–2016.
- Hautier, G.; Jain, A.; Mueller, T.; Moore, C.; Ong, S. P.; Ceder, G. Designing Multielectron Lithium-Ion Phosphate Cathodes by Mixing Transition Metals. Chem. Mater. 2013, 25, 2064–2074.
- Varley, J. B.; Miglio, A.; Ha, V.-A.; van Setten, M. J.; Rignanese, G.-M.; Hautier, G. High-Throughput Design of Non-oxide p-Type Transparent Conducting Materials: Data Mining, Search Strategy, and Identification of Boron Phosphide. Chem. Mater. 2017, in press.
- Bhatia, A.; Hautier, G.; Nilgianskul, T.; Miglio, A.; Sun, J.; Kim, H. J.; Kim, K. H.; Chen, S.; Rignanese, G.-M.; Gonze, X.; Suntivich, J. High-Mobility Bismuth-based Transparent p-Type Oxide from High-Throughput Material Screening. Chem. Mater. 2016, 28, 30–34.
- Katre, A.; Carrete, J.; Mingo, N. Unraveling the Dominant Phonon Scattering Mechanism in the Thermoelectric Compound ZrNiSn. J. Mater. Chem. A 2016, 4, 15940–15944.
- Eliassen, S. N. H.; Katre, A.; Madsen, G. K. H.; Persson, C.; Løvvik, O. M.; Berland, K. Lattice Thermal Conductivity of Half-Heusler Alloys Calculated from First Principles: Key Role of Nature of Phonon Modes. Phys. Rev. B 2017, 95, 045202.
- Mueller, T.; Kusne, A. G.; Ramprasad, R. Reviews in Computational Chemistry; John Wiley, Inc, 2016; pp 186–273.
- Isayev, O.; Fourches, D.; Muratov, E. N.; Oses, C.; Rasch, K.; Tropsha, A.; Curtarolo, S. Materials Cartography: Representing and Mining Materials Space Using Structural and Electronic Fingerprints. Chem. Mater. 2015, 27, 735–743.
- Kim, C.; Pilania, G.; Ramprasad, R. From Organized High-Throughput Data to Phenomenological Theory using Machine Learning: The Example of Dielectric Breakdown. Chem. Mater. 2016, 28, 1304–1311.
- Carrete, J.; Li, W.; Mingo, N.; Wang, S.; Curtarolo, S. Finding Unprecedentedly Low-Thermal-Conductivity Half-Heusler Semiconductors via High-Throughput Materials Modeling. Phys. Rev. X 2014, 4, 011019.
- Seko, A.; Togo, A.; Hayashi, H.; Tsuda, K.; Chaput, L.; Tanaka, I. Prediction of Low-Thermal-Conductivity Compounds with First-Principles Anharmonic Lattice-Dynamics Calculations and Bayesian Optimization. Phys. Rev. Lett. 2015, 115, 205901.
- Xue, D.; Balachandran, P. V.; Hogden, J.; Theiler, J.; Xue, D.; Lookman, T. Accelerated Search for Materials with Targeted Properties by Adaptive Design. Nat. Commun. 2016, 7, 11241.
- Sharma, V.; Wang, C.; Lorenzini, R. G.; Ma, R.; Zhu, Q.; Sinkovits, D. W.; Pilania, G.; Oganov, A. R.; Kumar, S.; Sotzing, G. A.; Boggs, S. A.; Ramprasad, R. Rational Design of All Organic Polymer Dielectrics. Nat. Commun. 2014, 5, 4845.
- Curtarolo, S.; Setyawan, W.; Hart, G. L.; Jahnatek, M.; Chepulskii, R. V.; Taylor, R. H.; Wang, S.; Xue, J.; Yang, K.; Levy, O.; Mehl, M. J.; Stokes, H. T.; Demchenko, D. O.; Morgan, D. AFLOW: An Automatic Framework for High-Throughput Materials Discovery. Comput. Mater. Sci. 2012, 58, 218–226.
- Taylor, R. H.; Rose, F.; Toher, C.; Levy, O.; Yang, K.; Nardelli, M. B.; Curtarolo, S. A RESTful API for Exchanging Materials Data in the AFLOWLIB.org Consortium. Comput. Mater. Sci. 2014, 93, 178–192.
- Calderon, C. E.; Plata, J. J.; Toher, C.; Oses, C.; Levy, O.; Fornari, M.; Natan, A.; Mehl, M. J.; Hart, G.; Nardelli, M. B.; Curtarolo, S. The AFLOW Standard for High-Throughput Materials Science Calculations. Comput. Mater. Sci. 2015, 108, Part A, 233–238.
- Oliynyk, A. O.; Antono, E.; Sparks, T. D.; Ghadbeigi, L.; Gaultois, M. W.; Meredig, B.; Mar, A. High-Throughput Machine-Learning-Driven Synthesis of Full-Heusler Compounds. Chem. Mater. 2016, 28, 7324–7331.
- Hautier, G.; Fischer, C. C.; Jain, A.; Mueller, T.; Ceder, G. Finding Nature's Missing Ternary Oxide Compounds Using Machine Learning and Density Functional Theory. Chem. Mater. 2010, 22, 3762–3767.
- Rupp, M.; Tkatchenko, A.; Müller, K.-R.; von Lilienfeld, O. A. Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning. Phys. Rev. Lett. 2012, 108, 058301.
- Hansen, K.; Biegler, F.; Ramakrishnan, R.; Pronobis, W.; von Lilienfeld, O. A.; Müller, K.-R.; Tkatchenko, A. Machine Learning Predictions of Molecular Properties: Accurate Many-Body Potentials and Nonlocality in Chemical Space. J. Phys. Chem. Lett. 2015, 6, 2326–2331, PMID: 26113956.
- Schütt, K. T.; Glawe, H.; Brockherde, F.; Sanna, A.; Müller, K. R.; Gross, E. K. U. How to Represent Crystal Structures for Machine Learning: Towards Fast Prediction of Electronic Properties. Phys. Rev. B 2014, 89, 205118.
- Pietrucci, F.; Andreoni, W. Graph Theory Meets Ab Initio Molecular Dynamics: Atomic Structures and Transformations at the Nanoscale. Phys. Rev. Lett. 2011, 107, 085504.
- Seko, A.; Hayashi, H.; Nakayama, K.; Takahashi, A.; Tanaka, I. Representation of Compounds for Machine-Learning Prediction of Physical Properties. arXiv:1611.08645v2 2016.
- Isayev, O.; Oses, C.; Curtarolo, S.; Tropsha, A. Universal Fragment Descriptors for Predicting Electronic Properties of Inorganic Crystals. arXiv:1608.04782 2016.
- Bialon, A. F.; Hammerschmidt, T.; Drautz, R. Three-Parameter Crystal-Structure Prediction for sp-d-Valent Compounds. Chem. Mater. 2016, 28, 2550–2556.
- Bartók, A. P.; Payne, M. C.; Kondor, R.; Csányi, G. Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, without the Electrons. Phys. Rev. Lett. 2010, 104, 136403.
- Thompson, A.; Swiler, L.; Trott, C.; Foiles, S.; Tucker, G. Spectral Neighbor Analysis Method for Automated Generation of Quantum-Accurate Interatomic Potentials. J. Comput. Phys. 2015, 285, 316–330.
- Plimpton, S. Fast Parallel Algorithms for Short-Range Molecular Dynamics. J. Comput. Phys. 1995, 117, 1–19.
- O’Keeffe, M. A Proposed Rigorous Definition of Coordination Number. Acta Cryst. 1979, 35, 772–775.
- Togo, A.; Tanaka, I. First Principles Phonon Calculations in Materials Science. Scr. Mater. 2015, 108, 1–5.
- Jain, A.; Ong, S. P.; Hautier, G.; Chen, W.; Richards, W. D.; Dacek, S.; Cholia, S.; Gunter, D.; Skinner, D.; Ceder, G.; Persson, K. A. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Mater. 2013, 1, 011002.
- Ong, S. P.; Richards, W. D.; Jain, A.; Hautier, G.; Kocher, M.; Cholia, S.; Gunter, D.; Chevrier, V. L.; Persson, K. A.; Ceder, G. Python Materials Genomics (pymatgen): A Robust, Open-Source Python Library for Materials Analysis. Comput. Mater. Sci. 2013, 68, 314–319.
- Ong, S. P.; Cholia, S.; Jain, A.; Brafman, M.; Gunter, D.; Ceder, G.; Persson, K. A. The Materials Application Programming Interface (API): A Simple, Flexible and Efficient API for Materials Data Based on REpresentational State Transfer (REST) Principles. Comput. Mater. Sci. 2015, 97, 209–215.
- Hohenberg, P.; Kohn, W. Inhomogeneous Electron Gas. Phys. Rev. 1964, 136, B864–B871.
- Kohn, W.; Sham, L. J. Self-Consistent Equations Including Exchange and Correlation Effects. Phys. Rev. 1965, 140, A1133–A1138.
- Kresse, G.; Furthmüller, J. Efficiency of Ab-Initio Total Energy Calculations for Metals and Semiconductors Using a Plane-Wave Basis Set. Comput. Mater. Sci. 1996, 6, 15–50.
- Kresse, G.; Joubert, D. From Ultrasoft Pseudopotentials to the Projector Augmented-Wave Method. Phys. Rev. B 1999, 59, 1758–1775.
- Baroni, S.; de Gironcoli, S.; Dal Corso, A.; Giannozzi, P. Phonons and Related Crystal Properties from Density-Functional Perturbation Theory. Rev. Mod. Phys. 2001, 73, 515–562.
- Landau, L. D.; Lifshitz, E. M. Statistical Physics (Second Revised and Enlarged Edition); Pergamon Press: Oxford, 1969.