CLEASE: A versatile and userfriendly implementation of Cluster Expansion method
Abstract
Materials exhibiting a substitutional disorder such as multicomponent alloys and mixed metal oxides/oxyfluorides are of great importance in many scientific and technological sectors. Disordered materials constitute an overwhelmingly large configurational space, which makes it practically impossible to be explored manually using firstprinciples calculations such as density functional theory (DFT) due to the high computational costs. Consequently, the use of methods such as cluster expansion (CE) is vital in enhancing our understanding of the disordered materials. CE dramatically reduces the computational cost by mapping the firstprinciples calculation results on to a much faster Hamiltonian. In this work, we present our implementation of the CE method, which is integrated as a part of the Atomic Simulation Environment (ASE) opensource package. The versatile and userfriendly code automates the complex set up and construction procedure of CE while giving the users the flexibility to tweak the settings and to import their own structures and previous calculation results. Recent advancements such as regularization techniques from machine learning are implemented in the developed code. The code allows the users to construct CE on any bulk lattice structure, which makes it useful for a wide range of applications involving complex materials. We demonstrate the capabilities of our implementation by analyzing the two example materials with varying complexities: a binary metal alloy and a disordered lithium chromium oxyfluoride.
I Introduction
A class of materials with a substitutional disorder such as multicomponent alloys and mixed metal oxides is said to have a configurational problem. The vast configurational space of these materials makes it practically impossible to explore directly using firstprinciples calculations such as density functional theory (DFT). A quantitative method capable of establishing the relationship between the structure and property of materials is therefore essential. Cluster Expansion (CE) Sanchez, Ducastelle, and Gratias (1984); Fontaine (1994); Zunger (1994); Asta, Ozolins, and Woodward (2001); van de Walle (2008); Zhang and Sluiter (2016) is a method that has been used successfully in the past few decades to parameterize and express the configurational dependence of physical properties. The most widely parameterized physical property is energy computed using firstprinciples methods, but CE can also be used to parameterize other quantities such as band gap Magri and Zunger (1991); Franceschetti and Zunger (1999) and density of states Geng, Sluiter, and Chen (2005).
Despite its success and usefulness in predicting physical properties of crystalline materials, CE remains as a niche tool used in a small subfield within the computational materials science, primarily used by specialists. On the other hand, the research fields in which CE is becoming relevant is on the rise; one such example is the use of disordered materials for battery applications Wang et al. (2015); Abdellahi et al. (2016a, b); Urban et al. (2016); Kitchaev et al. (2018). The objective of our work is to make cluster expansion more accessible for a broad range of computational scientists who do not necessarily possess expertise in cluster expansion. Our approach to achieving such a goal is to implement CE as a part of a widely used, opensource Atomic Simulation Environment (ASE) package Hjorth Larsen et al. (2017). Henceforth, we refer to our implementation as CLEASE, which stands for CLuster Expansion in Atomic Simulation Environment.
Having CE as a part of a widely used package with interfaces to a multitude of opensource and commercial atomicscale simulation codes accompanies several practical benefits: (1) a large existing user base does not need to install or learn a new program as the CE module is a part of ASE and inherits its syntax and code style, and (2) all of the atomicscale simulation codes supported by ASE are also automatically supported by the implemented module. Therefore, the implementation presented in this article appeals to a significant portion of computational materials science community as a versatile and easytolearn package, thereby lowering the barrier to incorporate cluster expansion as a part of their research methods to accelerate computational materials prediction and design.
The rest of the paper is organized as follows. In Results, we describe the implementation of the CLEASE code. In Discussion, two application examples with different levels of complexities, namely a binary metal alloy and a lithium metal oxyfluoride, are presented. The computational settings and technical details for the examples are provided in Methods. As the objective of this work is to make cluster expansion accessible to nonspecialists, a brief overview of cluster expansion formalism and other important concepts are provided in Supplementary Information in order to aid the readers who are not familiar with the cluster expansion method.
Ii Results
In cluster expansion, a configuration (or a structure), , is decomposed into a sum of clusters as shown in Fig. 1. The symmetrically equivalent clusters are classified as the same cluster, and the collection of all symmetrically equivalent clusters are denoted with an . Each cluster type, , has a correlation function, , which represents an average value of the product of a basis function assigned to each element consisting the cluster. The target physical quantity to be described by CE, , is then written as
(1) 
where is the effective cluster interaction (ECI) of an empty cluster and is the ECI per atom for cluster , which corresponds to the clusters of size one and higher. Construction of a CE model ultimately boils down to selecting clusters that are relevant for mapping configuration, , to the target physical quantity, , and determining ECI values of the selected clusters.
CLEASE utilizes the existing classes and methods of ASE to perform necessary manipulations and analyses for carrying out CE. Among many adopted features, the most noteworthy are the use of

an Atoms object to represent an atomic configuration (),

a builtin database to store settings, atomic configurations of the training set and values of the correlation functions (), and

a Calculator class to determine the physical quantity of a new configuration based on its correlation functions and their ECI values.
Although it may not be trivial, CE needs to keep track of vacancies, which can easily be achieved by treating a vacancy as a regular atom with its atomic symbol set to ‘X’ or atomic number set to zero. The inheritance of the existing features of ASE allows CLEASE to be fully integrated to ASE where the users can incorporate CE as a part of their research without losing the continuity with the rest of their workflow.
A simple flowchart illustrating the procedure for constructing CE using CLEASE is shown in Fig. 2. The CLEASE workflow can be divided in to three main components: definition of CE settings, generation of training structures and evaluation of CE convergence. CLEASE is written in Python programming language and takes an objectoriented approach where each component has its own class. The modular design approach not only enables easy implementation of new features but also makes the code flexible to use and intuitive to follow the CE construction and evaluation procedure shown in Fig. 2. A more detailed description of main components of the procedure is provided below.
ii.1 Definition of Cluster Expansion Settings
The most fundamental component is to define which underlying crystal structure to use. ASE offers two functions to generate a crystal structure: bulk and crystal. The bulk function provides a simple way of generating common types of crystal structures by specifying the name of the crystal structure and its lattice constant value(s). The supported crystal structures are simple cubic, facecentered cubic, bodycentered cubic, hexagonal close packed, diamond, zinc blende, rocksalt, cesium chloride, fluorite and wurtzite structures. For more complicated crystal structures, a crystal function is used to generate a crystal structure by providing its space group, lattice parameters and scaled coordinates of the unique atomic sites. The definitions of the cluster expansion settings are specified using CEBulk and CECrystal classes, which respectively calls bulk and crystal functions to generate an Atoms object with the userspecified crystal structure.
The maximum size of the supercell on which the DFT calculations are performed is also defined along with the definition of the underlying crystal structure. The maximum supercell size is specified using a supercell_factor parameter, which is an integer corresponding to the product of the expansion coefficients i, j and k of supercell of the unit cell generated using bulk or crystal function. For example, a supercell_factor of 4 includes , , , and cells and all of the possible permutations of i, j and k resulting in a unique cell (i.e., only one of and cells is kept if one can be mapped on to another by rotation and reflection). The use of supercells with varying sizes and shapes enables the exploration of a larger configurational space without adding extra computational burden compared to using one fixed supercell size and shape (e.g., only using supercell). A set of training structures for CE are later generated iteratively from the pool of possible structures that are realizable using these supercells. To reduce the required computational resources, the structures using smaller supercells are generated (and calculated) first, followed by the larger supercells. The users also have a flexibility to select the supercell size using an optional size parameter, which is a vector specifying the values of , and .
Theoretically, an infinite number of clusters can be generated for a given system. The number of clusters is limited to a finite size in practice, and CLEASE takes an approach to generate all possible clusters that are under the truncation threshold (i.e., a maximum number of atoms in clusters and maximum diameter) specified by the user. A whole or subset of the generated clusters is selected during the convergence evaluation process. By default, up to fourbody clusters (i.e., empty, one, two, three and fourbody clusters) with their diameters smaller than the maximum diameter^{1}^{1}1The maximum cluster diameter is defined to be strictly smaller than half the smallest distance between periodic images of all the constituting atoms in the cell. that can be represented by the largest supercell are generated. The users have an option to define their own threshold settings both at the beginning of the CE procedure and at a later stage of the CE iteration cycles. CLEASE also offers a feature to visualize the generated clusters in order to assist the user to develop an intuition on the generated clusters.
Within the CE formalism, there does not exist a unique set of definitions for basis functions; they are considered valid if all of the basis functions are orthogonal. Consequently, several definitions are used in practice. The two most widely used definitions are the original definitions by Sanchez et al.Sanchez, Ducastelle, and Gratias (1984) and the one later developed by van de Wallevan de Walle (2009), which is used in the Alloy Theoretic Automated Toolkit (ATAT) van de Walle, Asta, and Ceder (2002); van de Walle (2009). The two definitions are equally valid, and both are implemented in CLEASE.
CLEASE offers an option to ignore a set of symmetrically inequivalent atomic sites if they are always occupied by one element type for all possible configurations. The contributions of these atoms are not explicitly included in the cluster expansion and are automatically included in the constant term () in Eq. 1. For example, lithium metal oxides (\ceLiMO2) with firstrow transition metals (M = {Sc, Ti, V, Cr, Mn, Fe, Co, Ni, Cu}) have a rocksalt structure Urban, Lee, and Ceder (2014); Urban et al. (2016) with an exception of \ceLiMnO2, which is orthorhombic Hewston and Chamberland (1987). The rocksalt structure consists of two facecentered cubic sublattices. For the case of rocksalt \ceLiMO2, one sublattice is occupied by lithium and other metal atoms while the other is occupied by oxygen atoms. The configurational space of such systems can be reduced to a cation sublattice consisting only of two element types (the oxygen sublattice is ignored), which significantly reduces the complexity of the CE model. As such, an optional Boolean argument is present in CLEASE to enable/disable the reduction of configurational space by ignoring the such atoms if they exist in the system.
ii.2 Generation of Training Structures
CLEASE uses NewStructures class to generate training structures, which provides three different methods perform the task. The first and most trivial method is to generate a set of random structures. This method serves to generate an initial pool consisting of a small number of structures. The random generation method is used in the first iteration cycle of CE construction as shown in Fig. 2. An initial cluster expansion is capable of making a first set of predictions albeit with a low accuracy. It is noted that all of the generated training structures, along with their correlation function values, are stored in a database file.
Once the initial CE is constructed, the user is given three different choices for introducing an additional set of training structures. The first and most straightforward option is to keep generating random structures. Although trivial, generating random structures is claimed to be the best strategy when compressive sensing is used to select clusters Nelson et al. (2013). The second method is to generate groundstate and other lowenergy structures based on current cluster expansion (i.e., based on the pool of structures already calculated) van de Walle and Ceder (2002), which have the enthalpies of formation on or near the convex hull Urban, Seo, and Ceder (2016). The inclusion of groundstate and neargroundstate structures serves an important purpose of increasing the accuracy in predicting the correct ground states. A global minimization technique can be used to generate (near) groundstate configurations, and CLEASE uses a simulated annealing technique.
The last method of generating the training set is referred to as a “probe structure” method Seko, Koyama, and Tanaka (2009); Seko and Tanaka (2014). The probe structure method introduces a new structure that minimizes the mean variance of the predicted physical quantity . The mean variance of the predicted quantity is written as Seko, Koyama, and Tanaka (2009)
(2) 
where is the variance of the error in the training set, is the covariance matrix of the correlation functions of the training set and is a vector of the mean correlation functions of the structures in the training set. The probe structure is the one that reduce the value of the most when introduced to the training set, which is found using the simulated annealing procedure.
The newly generated structures are compared with the existing structures in the training set in order to avoid introducing duplicate structures. We adopted the structure comparison algorithm developed by Lonie and ZurekLonie and Zurek (2012) to identify equivalent structures. It is desirable to have the new structure compared against the existing structures in the training set as efficiently as possible. As a first step, the structures that have different chemical composition than the newly generated structure are filtered, and the new structure is compared only with the remaining structures. Once the candidate transformations for mapping the new structure onto one structure in the database are identified using the algorithm suggested by Lonie and Zurek, we note that exactly the same transformations can be used for the remaining structures in the database. Therefore, the structure comparison algorithm implemented in ASE is optimized for the case where one structure is to be compared against many.
In addition to the aforementioned methods of generating the training structures, CLEASE also offers a builtin function to import structures to the database. The import function also has an option to specify the calculated value, which allows users to easily import the previously calculated results.
ii.3 Evaluation of Cluster Expansion Convergence
An evaluation process to determine the convergence of CE includes a selection of clusters, a determination of their ECI values and an assessment of the leaveoneout crossvalidation (LOOCV) score using the selected clusters and their ECI values. An entire evaluation process is performed using an Evaluate class.
The simplest way to determine the ECI values of the generated clusters is by using ordinary least squares (OLS) to minimize residual sum of squared errors (RSS). It is highly likely that the ECI values found using OLS are overfitted. Therefore, both and regularization methods are implemented, and it is highly recommended to use a regularization methods to select clusters and evaluate their ECI values.
A default option in the Evaluate class is to include all of the clusters generated using the truncation conditions specified in CEBulk or CECrystal class, and either the entire or a subset of these clusters are selected for fitting depending on the method used. The Evaluate class provides additional options in which the users can select a subset of the generated clusters to perform any of OLS, and regularization. The first option is by manually specifying which clusters to include, while the second option is to provide a stricter truncation conditions than the ones set in the CEBulk or CECrystal class. The feature to freely select a subset of a large pool of clusters along with the use of OLS, and regularization methods allows the users to easily experiment with various settings to understand how the system behaves and to optimize the ECI values for achieving the lowest LOOCV score.
To further assist the evaluation process, the Evaluate class contains two builtin methods that automatically determine the LOOCV when a regularization method is used. The first method, plot_fit, determines and stores the selected clusters and their ECI values for a value of regularization parameter () specified by the user. It also plots the fit of all data points in the training set to their calculated values and presents the LOOCV score of the specified value. Since the most cumbersome task in determining the convergence of CE is finding the optimal value that yields the lowest LOOCV score, another method, plot_CV, is also implemented. It takes a range and number of values to evaluate as inputs and returns the best value in the specified range along with its LOOCV score. The plot_CV method also plots the LOOCV score as a function of and provides an option to store the results in a log file such that the users can add more values to the list at a later stage without having to reevaluate the same values in the process.
ii.4 Metropolis Monte Carlo and Simulated Annealing
The user can perform statistical sampling of the system on a larger simulation cell once the cluster expansion is constructed. The final selection of cluster and their ECI values can be stored and passed to other classes to conduct statistical analyses. A separate Calculator class for cluster expansion is implemented in ASE. The Clease calculator class takes a list of clusters and their ECI values as inputs, and the users can select what type of trial moves are allowed. The sampling in the canonical ensemble allows the swapping two atoms with different constraint conditions (i.e., swap any two atoms, swap any two atoms in the same basis, swap two nearest neighbors, swap two nearest neighbors in the same basis) while the semigrand canonical ensemble allows changing the type of occupying element at a random site.
The evaluation of the physical quantity is performed using Eq. 1, which is a fast because the Clease calculator keeps track of the changes in the Atoms object to update the correlation functions. When the physical quantity being modeled is energy, a trial move of the standard Metropolis algorithm has an acceptance probability Thijssen (2007)
(3) 
where and are the energy of the new and old configuration, respectively. is the Boltzmann constant and is temperature in Kelvin. As the Clease calculator keeps track of the change in the Atoms object after each move, updating the correlation functions is restricted to the contributions of one and two atoms for the semigrand canonical ensemble and canonical ensemble, respectively.
Iii Discussion
Here, we present two example systems to illustrate the capabilities of the CLEASE code. The first example illustrates the investigation of a Au–Cu binary alloy. The second example shows the cluster expansion on a more complex \ceLi2CrO2F system consisting of four types of elements and vacancy.
iii.1 Au–Cu Alloy
The binary Au–Cu alloy system is studied at temperatures ranging from 100 K to 800 K over the entire composition range. The resulting values obtained for both and regularization are shown in Fig. 3. The ECI value of the empty cluster, which is found to be eV/atom for both cases, is not included in Fig. 3 for better visibility. The LOOCV score for the  and regularized fit were 6.05 meV/atom and 6.00 meV/atom, respectively. The two regularization schemes yield essentially the same CV score, despite the fact that the regularized fit has only 8 clusters while the regularized fit has 17. The two regularization schemes give a similar qualitative trend although they have give different number of clusters and their ECI values. For instance, the magnitude of the onebody term is the largest while the signs of the twobody terms (pair interaction) are the same (e.g., the nearestneighbor term is positive while the second and fourth nearestneighbor terms are negative).
A qualitative information on the thermodynamic behavior of the system can be extract by inspecting the ECI values for simple binary system. Based on the fact that the energetically favorable configurations have DFT energies that are more negative than less favorable ones and that the two site variables are or , one can infer that a positive ECI value of the pair interaction term means that a pair consisting of two different elements is energetically preferred at a low temperature. Based on the ECI values in Fig. 3, the ECIs of the nearestneighbor and second nearestneighbor pairs are negative and positive, respectively. This indicates that the Au–Cu system energetically favors the strong mixing of the constituting elements such that the –Au–Cu–Au–Cu– pattern is likely to emerge, which is in a good agreement with experimental and computational observations Lysgaard et al. (2015).
It is experimentally determined Massalski et al. (1986) that Au–Cu alloys have three ordered phases at low temperatures: \ceAuCu3, \ceAuCu and \ceAu3Cu. The formation enthalpy, free energy of formation and configurational entropy are obtained through the Metropolis Monte Carlo simulation and are shown in Fig. 4. As the CE is trained with fully relaxed structures (zero pressure), the formation enthalpy is determined using
(4) 
where is the internal energy of the configuration, is the gold concentration, is the internal energy of pure gold and is the internal energy of pure copper. Similarly, the free energy of formation is obtained by subtracting the weighted average of the free energy for the pure phases. The configurational entropy is given by the difference between the internal energy and the free energy, divided by the temperature at which the Monte Carlo is sampled. The three ordered phases (\ceAuCu3, \ceAuCu and \ceAu3Cu) are found on the convex hull of the free energy of formation in Fig. 4b. Furthermore, the entropy of the ordered phases form local minima as shown in Fig. 4c. As the temperature increases, the free energy becomes a smooth convex curve with a minimum at composition, and the system is in a random phase with no shortrange order.
An accurate estimate of the order/disorder transition temperature can be found by tracking the evolution of an order parameter. As the system evolves, we track the average fraction of sites having a different element than the same site in the ground state. Furthermore, we normalize this number by the expected fraction if different sites in the random phase, and use as an order parameter for detecting the phase transition as shown in Fig. 5. The order/disorder transition is predicted to occur near 450 K, which is in a close agreement with the previous CE calculation van de Walle and Ceder (2002).
iii.2 Lithium Chromium Oxyfluoride
One of the recent focus areas of lithiumion battery research is the development of highcapacity cathode materials. Lithium metal oxyfluorides (\ceLi2MO2F, {V, Cr, Mn, Ti, Ni, }) is a family of materials that is at the forefront of the current research. The challenges for studying \ceLi2MO2F is in the vast size of the configurational space, which exhibit not only the cation disorder commonly found in lithium metal oxides Abdellahi et al. (2016a); Urban et al. (2016) but also anion disorder which is also present due to the mixed O/F composition Chen et al. (2015, 2016). The fact that the underlying crystal structure of \ceLi2MO2F can vary at different lithiation levels Cambaz et al. (2016) adds the complexity to investigate their properties. It is, however, known that the most predominant crystal structure is of disordered rocksalt type Ren et al. (2015), particularly at highlithiation levels. We therefore show an example CE study of \ceLi2CrO2F in a rocksalt configuration. It is important to point out that the presented test model does not account for the possible occupancy of the tetrahedral sites by Li, which is known to occur upon delithiation Lee et al. (2014); Urban, Lee, and Ceder (2014); Abdellahi et al. (2016b). More specifically, each tetrahedral site are adjacent to four octahedral sites, and Li from one of the four octahedral sites migrate to the tetrahedral site if the remaining three are vacant (referred to as a 0TM condition) Lee et al. (2014); Urban, Lee, and Ceder (2014); Abdellahi et al. (2016b). Although the omission of the tetrahedral site occupancy makes the model incomplete, it provides useful insight on how complicated systems with a vast configurational space can be studied using CLEASE, which is the aim of this example.
The Monte Carlo annealing study reveals that \ceLi2CrO2F (i.e., fully lithiated compound) takes a layered structure at room temperature (293 K) as shown in Fig. 6a. The layer structure shows a –Li–F–Li–O–Cr–O– pattern, which is similar to a –Li–O–M–O– layered pattern observed in lithium metal oxides Urban et al. (2016); Mizushima et al. (1981); Van der Ven and Ceder (2004). The layered structure is lost immediately upon delithiation, which leads to disordered structures as shown in Fig. 6b and c. The emergence of disordered structures agrees well with the previous experimental observations Ren et al. (2015); Chen et al. (2016), and it is important to model the disordered atomic arrangement it is has a direct link to the Li transport mechanism (e.g., a presence of zerotransitionmetal pathways Lee et al. (2014); Urban, Lee, and Ceder (2014); Urban et al. (2016)).
An interesting feature of LiCrOF is the phaseseparation of Li and vacancies, which leads to a segregation of vacancies that can be seen even at high lithiation levels such as and as shown in Fig. 6b and c, respectively. The vacancy segregation is an important feature to understand since it affects the transport properties of Li and promotes leakage of O and F anions. In fact, the oxygen loss are known to occur in lithium metal oxides Armstrong et al. (2006); Sathiya et al. (2013a, b) and also in lithium metal oxyfluorides Lee et al. (2017) although its extent is suppressed. The oxygen loss leads to poor capacity retention as it is an irreversible process. The phaseseparation of Li and vacancy indicates that oxygen loss is likely to take place for \ceLi2CrO2F, which aligns with its poor capacity retention of only 64% after 60 cyclesChen et al. (2016) and irreversible change of the local structureChen et al. (2016).
It is pointed out that while the presented model correctly predicts the disordered structure and poor cyclability of \ceLi2CrO2F, the vacancy segregation shown in Fig. 6b and c needs to interpreted with care. The aforementioned omission of Li occupancy in interstitial tetrahedral sites is likely to occur, especially due to the vacancy segregation. The missing migration of Li to tetrahedral site may suppress the further clustering of vacancies. With the missing tetrahedral site occupation mechanism, current model predicts the vacancy cluster to keep growing as more Li is taken out, which means that its voltage remains relatively constant as LiCrOF is phaseseparated into \ceCrO2F and \ceLi2CrO2F. The experimentally observed voltage curve Ren et al. (2015); Chen et al. (2016) is more complex than what the test model can capture.
Iv Methods
iv.1 Density Functional Theory Calculations
All of the calculations are performed with the Vienna Ab initio Simulation Package (VASP) Kresse and Hafner (1993); *VASP2; *VASP3; *VASP4 using the projector augmentedwave (PAW) methodBlöchl (1994). The generalized gradient approximation as parametrized by Perdew, Burke and Ernzerhof Perdew, Burke, and Ernzerhof (1996) is used as the exchangecorrelation functional. It is important to have a consistent and accurate dataset (i.e., DFT calculations with high energy cutoff and point mesh density) in order to minimize the numerical noise introduced to the CE training. The planewave cutoff of 500 eV is used, and both the cell and atomic positions are fully relaxed such that all the forces are smaller than 0.02 eV/Å. A rotationally invariant Hubbard U correction Anisimov, Zaanen, and Andersen (1991); Cococcioni and de Gironcoli (2005) is applied to the orbital of Cr with the value of 3.7 eV. The calculations are performed with supercells containing up to 18 and 54 atoms for Au–Cu alloy and \ceLi2CrO2F systems, respectively. Integrations over the Brillouin zone were carried out using the MonkhorstPack scheme Monkhorst and Pack (1976) with a grid with a maximal interval of 0.04 Å.
iv.2 Cluster Expansion Model
The CE model for Au–Cu alloy and LiCrOF are trained using 34 and 390 DFT calculations, respectively. CE model is trained for the entire composition range of Au–Cu alloy (from pure Au to pure Cu) and rocksalt LiCrOF with ranges from 0 to 2. Up to threebody clusters are used in both cases, where the maximum diameter is set to 5.5 Å for Au–Cu alloy and 4.5 Å for LiCrOF. and regularization schemes with the regularization parameter ranging from to are assessed to find the optimal setting that leads to the lowest LOOCV score. For Au–Cu alloy, and regularization schemes yield nearly identical LOOCV score of 6.05 meV/atom and 6.00 meV/atom, respectively. On the other hand, regularization performed better on LiCrOF with the LOOCV score of 26.5 meV/atom^{2}^{2}2Although the LOOCV seems larger compared to the case Au–Cu, it should be taken into account that the cohesive energy of metallic alloys are in general much smaller than those of oxides/oxyflourides..
iv.3 Metropolis Monte Carlo Simulations
For Au–Cu alloy, Metropolis Monte Carlo simulations are carried out using a supercell consisting of 1,000 atoms. At each temperature (i.e., 100 K to 800 K with a 100 K increment), the system is equilibrated with 10,000 MC steps, and an average energy is collected through additional 100,000 moves. A cell consisting of 1,458 atoms is used for LiCrOF. The temperature is gradually lowered from 10,000 K, and the structures are equilibrated at each temperature via 14,580 steps (10 times the number of atoms in the cell) to ensure that the system is equilibrated before sampling. The system is then sampled for more than one million steps. Such procedures ensure that the systems are sufficiently sampled at each composition at different temperatures.
V Acknowledgement
Authors thank valuable discussions with Dr. Juhani Teeriniemi and Prof. Kari Laasonen.
Vi Funding
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 711792 (FETâOPEN project LiRichFCC).
References
 Sanchez, Ducastelle, and Gratias (1984) J. M. Sanchez, F. Ducastelle, and D. Gratias, “Generalized cluster description of multicomponent systems,” Physica A: Statistical Mechanics and its Applications 128, 334–350 (1984).
 Fontaine (1994) D. D. Fontaine, “Cluster Approach to OrderDisorder Transformations in Alloys,” in Solid State Phys., Vol. 47 (1994) pp. 33–176.
 Zunger (1994) A. Zunger, ‘‘FirstPrinciples Statistical Mechanics of Semiconductor Alloys and Intermetallic Compounds,” in Statics and Dynamics of Alloy Phase Transformations, 1993 (1994) pp. 361–419.
 Asta, Ozolins, and Woodward (2001) M. Asta, V. Ozolins, and C. Woodward, “A firstprinciples approach to modeling alloy phase equilibria,” JOM 53, 16–19 (2001).
 van de Walle (2008) A. van de Walle, “A complete representation of structureproperty relationships in crystals,” Nature Materials 7, 455–458 (2008).
 Zhang and Sluiter (2016) X. Zhang and M. H. F. Sluiter, “Cluster Expansions for Thermodynamics and Kinetics of Multicomponent Alloys,” Journal of Phase Equilibria and Diffusion 37, 44–52 (2016).
 Magri and Zunger (1991) R. Magri and A. Zunger, “Realspace description of semiconducting band gaps in substitutional systems,” Physical Review B 44, 8672–8684 (1991).
 Franceschetti and Zunger (1999) A. Franceschetti and A. Zunger, “The inverse bandstructure problem of finding an atomic configuration with given electronic properties,” Nature 402, 60–63 (1999).
 Geng, Sluiter, and Chen (2005) H. Y. Geng, M. H. F. Sluiter, and N. X. Chen, “Cluster expansion of electronic excitations: Application to fcc NiâAl alloys,” The Journal of Chemical Physics 122, 214706 (2005).
 Wang et al. (2015) R. Wang, X. Li, L. Liu, J. Lee, D.H. Seo, S.H. Bo, A. Urban, and G. Ceder, “A disordered rocksalt Liexcess cathode material with high capacity and substantial oxygen redox activity: LiNbMnO,” Electrochemistry Communications 60, 70–73 (2015).
 Abdellahi et al. (2016a) A. Abdellahi, A. Urban, S. Dacek, and G. Ceder, “The Effect of Cation Disorder on the Average Li Intercalation Voltage of TransitionMetal Oxides,” Chemistry of Materials 28, 3659–3665 (2016a).
 Abdellahi et al. (2016b) A. Abdellahi, A. Urban, S. Dacek, and G. Ceder, “Understanding the Effect of Cation Disorder on the Voltage Profile of Lithium TransitionMetal Oxides,” Chemistry of Materials 28, 5373–5383 (2016b).
 Urban et al. (2016) A. Urban, I. Matts, A. Abdellahi, and G. Ceder, “Computational Design and Preparation of CationDisordered Oxides for HighEnergyDensity LiIon Batteries,” Advanced Energy Materials 6, 1600488 (2016).
 Kitchaev et al. (2018) D. A. Kitchaev, Z. Lun, W. D. Richards, H. Ji, R. J. Clément, M. Balasubramanian, D.H. Kwon, K. Dai, J. K. Papp, T. Lei, B. D. McCloskey, W. Yang, J. Lee, and G. Ceder, “Design principles for high transition metal capacity in disordered rocksalt Liion cathodes,” Energy & Environmental Science 11, 2159–2171 (2018).
 Hjorth Larsen et al. (2017) A. Hjorth Larsen, J. Jørgen Mortensen, J. Blomqvist, I. E. Castelli, R. Christensen, M. Dułak, J. Friis, M. N. Groves, B. Hammer, C. Hargus, E. D. Hermes, P. C. Jennings, P. Bjerre Jensen, J. Kermode, J. R. Kitchin, E. Leonhard Kolsbjerg, J. Kubal, K. Kaasbjerg, S. Lysgaard, J. Bergmann Maronsson, T. Maxson, T. Olsen, L. Pastewka, A. Peterson, C. Rostgaard, J. Schiøtz, O. Schütt, M. Strange, K. S. Thygesen, T. Vegge, L. Vilhelmsen, M. Walter, Z. Zeng, and K. W. Jacobsen, ‘‘The atomic simulation environment–a Python library for working with atoms,” Journal of Physics: Condensed Matter 29, 273002 (2017).
 (16) The maximum cluster diameter is defined to be strictly smaller than half the smallest distance between periodic images of all the constituting atoms in the cell.
 van de Walle (2009) A. van de Walle, “Multicomponent multisublattice alloys, nonconfigurational entropy and other additions to the Alloy Theoretic Automated Toolkit,” Calphad 33, 266–278 (2009) .
 van de Walle, Asta, and Ceder (2002) A. van de Walle, M. Asta, and G. Ceder, ‘‘The alloy theoretic automated toolkit: A user guide,” Calphad 26, 539–553 (2002) .
 Urban, Lee, and Ceder (2014) A. Urban, J. Lee, and G. Ceder, “The Configurational Space of RocksaltType Oxides for HighCapacity Lithium Battery Electrodes,” Advanced Energy Materials 4, 1400478 (2014).
 Hewston and Chamberland (1987) T. Hewston and B. Chamberland, “A Survey of firstrow ternary oxides LiMO (M = ScCu),” Journal of Physics and Chemistry of Solids 48, 97–108 (1987).
 Nelson et al. (2013) L. J. Nelson, G. L. W. Hart, F. Zhou, and V. OzoliÅš, “Compressive sensing as a paradigm for building physics models,” Physical Review B 87, 035125 (2013).
 van de Walle and Ceder (2002) A. van de Walle and G. Ceder, “Automating firstprinciples phase diagram calculations,” Journal of Phase Equilibria 23, 348 (2002).
 Urban, Seo, and Ceder (2016) A. Urban, D. H. Seo, and G. Ceder, “Computational understanding of Liion batteries,” npj Computational Materials 2 (2016), 10.1038/npjcompumats.2016.2.
 Seko, Koyama, and Tanaka (2009) A. Seko, Y. Koyama, and I. Tanaka, “Cluster expansion method for multicomponent systems based on optimal selection of structures for densityfunctional theory calculations,” Physical Review B 80, 165122 (2009).
 Seko and Tanaka (2014) A. Seko and I. Tanaka, “Cluster expansion of multicomponent ionic systems with controlled accuracy: importance of longrange interactions in heterovalent ionic systems,” Journal of Physics: Condensed Matter 26, 115403 (2014) .
 Lonie and Zurek (2012) D. C. Lonie and E. Zurek, “Identifying duplicate crystal structures: XtalComp, an opensource solution,” Computer Physics Communications 183, 690–697 (2012).
 Thijssen (2007) J. Thijssen, Computational Physics (Cambridge University Press, 2007).
 Lysgaard et al. (2015) S. Lysgaard, J. S. G. Mýrdal, H. A. Hansen, and T. Vegge, “A DFTbased genetic algorithm search for AuCu nanoalloy electrocatalysts for CO reduction,” Physical Chemistry Chemical Physics 17, 28270–28276 (2015).
 Massalski et al. (1986) T. B. Massalski, J. Murray, L. Bennett, and H. Baker, “Binary alloy phase diagrams. vol. i and ii,” American Society for Metals, 1986, , 2224 (1986).
 Chen et al. (2015) R. Chen, S. Ren, M. Knapp, D. Wang, R. Witter, M. Fichtner, and H. Hahn, “Disordered lithiumrich oxyfluoride as a stable host for enhanced Li intercalation storage,” Advanced Energy Materials 5, 1–7 (2015).
 Chen et al. (2016) R. Chen, S. Ren, X. Mu, E. Maawad, S. Zander, R. Hempelmann, and H. Hahn, “HighPerformance LowTemperature Li + Intercalation in Disordered RockSalt LiCrV Oxyfluorides,” ChemElectroChem 3, 892–895 (2016).
 Cambaz et al. (2016) M. A. Cambaz, B. P. Vinayan, O. Clemens, A. R. Munnangi, V. S. K. Chakravadhanula, C. Kübel, and M. Fichtner, “Vanadium Oxyfluoride/FewLayer Graphene Composite as a HighPerformance Cathode Material for Lithium Batteries,” Inorganic Chemistry 55, 3789–3796 (2016).
 Ren et al. (2015) S. Ren, R. Chen, E. Maawad, O. Dolotko, A. A. Guda, V. Shapovalov, D. Wang, H. Hahn, and M. Fichtner, “Improved Voltage and Cycling for Li Intercalation in HighCapacity Disordered Oxyfluoride Cathodes,” Advanced Science 2, 1500128 (2015).
 Lee et al. (2014) J. Lee, A. Urban, X. Li, D. Su, G. Hautier, and G. Ceder, “Unlocking the Potential of CationDisordered Oxides for Rechargeable Lithium Batteries,” Science 343, 519–522 (2014).
 Mizushima et al. (1981) K. Mizushima, P. Jones, P. Wiseman, and J. Goodenough, “LiCoO (): A new cathode material for batteries of high energy density,” Solid State Ionics 34, 171–174 (1981).
 Van der Ven and Ceder (2004) A. Van der Ven and G. Ceder, “Ordering in Li(NiMn)O and its relation to charge capacity and electrochemical behavior in rechargeable lithium batteries,” Electrochemistry Communications 6, 1045–1050 (2004).
 Armstrong et al. (2006) A. R. Armstrong, M. Holzapfel, P. Novák, C. S. Johnson, S.H. Kang, M. M. Thackeray, and P. G. Bruce, “Demonstrating oxygen loss and associated structural reorganization in the lithium battery cathode Li[NiLiMn]O.” Journal of the American Chemical Society 128, 8694–8 (2006).
 Sathiya et al. (2013a) M. Sathiya, G. Rousse, K. Ramesha, C. P. Laisa, H. Vezin, M. T. Sougrati, M. L. Doublet, D. Foix, D. Gonbeau, W. Walker, A. S. Prakash, M. Ben Hassine, L. Dupont, and J. M. Tarascon, “Reversible anionic redox chemistry in highcapacity layeredoxide electrodes,” Nature Materials 12, 827–835 (2013a).
 Sathiya et al. (2013b) M. Sathiya, K. Ramesha, G. Rousse, D. Foix, D. Gonbeau, A. S. Prakash, M. L. Doublet, K. Hemalatha, and J. M. Tarascon, “High performance LiRuMnO(0.2 y 0.8) cathode materials for rechargeable lithiumion batteries: Their understanding,” Chemistry of Materials 25, 1121–1131 (2013b).
 Lee et al. (2017) J. Lee, J. K. Papp, R. J. Clément, S. Sallis, D. H. Kwon, T. Shi, W. Yang, B. D. McCloskey, and G. Ceder, “Mitigating oxygen loss to improve the cycling performance of high capacity cationdisordered cathode materials,” Nature Communications 8 (2017) .
 Kresse and Hafner (1993) G. Kresse and J. Hafner, “Ab initio molecular dynamics for liquid metals,” Phys. Rev. B 47, 558–561 (1993).
 Kresse and Hafner (1994) G. Kresse and J. Hafner, “Ab initio moleculardynamics simulation of the liquidmetalâamorphoussemiconductor transition in germanium,” Phys. Rev. B 49, 14251–14269 (1994).
 Kresse and Furthmüller (1996) G. Kresse and J. Furthmüller, “Efficiency of abinitio total energy calculations for metals and semiconductors using a planewave basis set,” Computational Materials Science 6, 15 – 50 (1996).
 Kresse and Furthmüller (1996) G. Kresse and J. Furthmüller, “Efficient iterative schemes for ab initio totalenergy calculations using a planewave basis set,” Phys. Rev. B 54, 11169–11186 (1996).
 Blöchl (1994) P. E. Blöchl, “Projector augmentedwave method,” Physical Review B 50, 17953–17979 (1994).
 Perdew, Burke, and Ernzerhof (1996) J. P. Perdew, K. Burke, and M. Ernzerhof, “Generalized Gradient Approximation Made Simple,” Physical Review Letters 77, 3865–3868 (1996).
 Anisimov, Zaanen, and Andersen (1991) V. I. Anisimov, J. Zaanen, and O. K. Andersen, “Band theory and Mott insulators: Hubbard U instead of Stoner I,” Physical Review B 44, 943–954 (1991) .
 Cococcioni and de Gironcoli (2005) M. Cococcioni and S. de Gironcoli, “Linear response approach to the calculation of the effective interaction parameters in the LDA+U method,” Physical Review B 71, 035105 (2005) .
 Monkhorst and Pack (1976) H. J. Monkhorst and J. D. Pack, “Special points for Brillouinzone integrations,” Physical Review B 13, 5188–5192 (1976).
 (50) Although the LOOCV seems larger compared to the case Au–Cu, it should be taken into account that the cohesive energy of metallic alloys are in general much smaller than those of oxides/oxyfluorides.
Supplementary Information
Appendix A Cluster Expansion Formalism
The core concept of the cluster expansion is to express the scalar physical quantity of a material, , to its configuration, , where a crystalline system is represented with a fixed underlying grid of atomic sites. In such a representation, any configuration with the same underlying topology can be completely specified by the atomic occupation of each atomic site. For the case of a crystalline material with atomic sites, any configuration can be specified by an dimensional vector , where is a site variable that specifies which type of atom occupies the atomic site (also referred to as an occupation variableZarkevich and Johnson (2004); Meng and Arroyode Dompablo (2009); van de Walle (2009) or pseudospinFontaine (1994); Nelson et al. (2013a, b); Seko and Tanaka (2014); Magri and Zunger (1991)). It is noted that the terms configuration and structure are often used interchangeably.
For the case of multinary systems consisting of different atomic species, takes one of distinct values. The original formulation of Sanchez et al.Sanchez, Ducastelle, and Gratias (1984) specifies the to take any values from , , , for ^{3}^{3}3For the case where there is an odd number of element types, an additional value of 0 should be included in the possible values of , and the relation between and becomes .. Other choices of are also commonly used such as values ranging from to by van de Wallevan de Walle (2009) and from to by Mueller and CederMueller and Ceder (2010). Based on the original formalism by Sanchez et al., singlesite basis functions are determined through an orthogonality condition
(S1) 
where is the th singlesite basis function (e.g., Chebyshev polynomials) for th site and is a Kronecker delta.
The configuration is decomposed into a sum of clusters. Each cluster has a set of associated cluster functions, which are defined as
(S2) 
where and are vectors specifying the order of the singlesite basis function and the site variables in the cluster, respectively. and specify the th element of the respective vectors. The use of orthogonal basis functions guarantees that the cluster functions defined in Eq. S2 are also orthogonal. The symmetrically equivalent clusters are classified as the same cluster, and the collection of all symmetrically equivalent clusters are denoted with an .
The average value of the cluster functions in cluster is referred to as a correlation function, . The physical quantity, , normalized with the number of atomic sites is then expressed as
(S3) 
where is the multiplicity factor indicating the number of cluster per atom and is the effective cluster interaction (ECI) per occurrence, which needs to be determined. It is noted that the cluster includes the cluster of size zero, which have . Alternatively, Eq. S3 can be written in a more explicitly form,
(S4) 
where is the ECI of an empty cluster while in this case corresponds to the clusters of size one and higher. It is often more practical and convenient to express the ECI per atom rather than per occurrence Zhang and Sluiter (2016), in which case and are combined into one term, and Eq. S3 becomes
(S5) 
CLEASE uses the ECI per atom (), but interested users can determine the value of based on the values of and .
Theoretically, there is an infinite number of terms in Eq. S5, and the resulting expression can represent any scalar function given that appropriate ECI values are found. In practice, sufficient accuracy is often reached with clusters with small number of atoms (e.g., one, two and threebody clusters) that are relatively compact in size (e.g., 5 to 7 Å in diameter).
Appendix B Cluster Selection & Determination of ECI Values
A crucial element of CE approach is to select relevant clusters from a theoretically infinite number of possible clusters. Many multicomponent systems yield thousands of clusters even when the expansion is limited to relatively compact size and small number of atoms, and they are vastly truncated since only a small fraction of them is needed to achieve the required accuracy. Determining the optimal set of clusters that minimizes the number of clusters without losing its predictive power has been a topic of keen interest in the past decade Zarkevich and Johnson (2004); Blum et al. (2005); Hart et al. (2005); DíazOrtiz, Dosch, and Drautz (2007); Lerch et al. (2009), and the cluster selection based on genetic algorithms Blum et al. (2005); Hart et al. (2005); Lerch et al. (2009) was considered to be the most robust method.
More recently, the use of compressive sensing Nelson et al. (2013b, a) was proposed to efficiently select the clusters and determine their ECIs in one shot. The compressive sensing is based on norm (a special case of norm where ), which is defined as
(S6) 
where is a vector quantity. It is noted that cluster expansion defined in Eq. S5 is in the same form as a linear regression model in statistics and machine learning. Therefore, one can treat CE as a linear regression problem and apply regularization techniques based not only on norm but also on any other values, although and norms are most commonly used.
The use of regularization techniques for CE can be illustrated by expressing Eq. S5 in a matrix form,
(S7) 
is a matrix containing the correlation functions of the training data where each element in row and column is defined as
(S8) 
is a column vector in which the th element is the physical quantity of the configuration and is a column vector in which th element is .
The simplest way of determining is by using ordinary least squares (OLS) method, which minimizes the residual sum of squared errors (RSS). RSS is defined as
(S9) 
and the minimization of the RSS has a unique solution where
(S10) 
The OLS has two major drawbacksNelson et al. (2013b). The first is the requirement on which the number of configurations in the training set needs to be larger than the number of clusters being considered. The matrix becomes singular in such a case, and the limitations imposed by the first requirement become more severe for systems consisting of many element types since even strict expansion conditions (i.e., small number of atoms per cluster and compact size) can lead to a large number of clusters. The second drawback is the susceptibility to possible overfitting, which refers to the conditions in which the ECI values are overtuned to accurately represent of the training set at a cost of losing its predictive power for the new configurations that are not included in the training set. The overfitting also makes the model prone to noise present in the training data because the model attempts to meticulously fit the model to the training data including the noise therein.
Regularization is an efficient technique to address the aforementioned drawbacks of OLS by adding a regularization term to Eq. S10. The most common regularization scheme are and regularization, which respectively uses and norm as a regularization term. For regularization, the solution becomes
(S11) 
where is a regularization parameter that controls the weight given to the regularization term. The main benefit of regularization is its promotion of sparsity. In context of CE, the sparsity means a selection of a fewer number of clusters, or many clusters with their ECI values set to zero. It is noted that there is no unique analytical solution for Eq. S11, and it needs to be solved iteratively. Unlike regularization, regularization has a unique analytical solution which is expressed as
(S12) 
However, regularization does not promote sparsity, and the resulting solution is likely to contain more clusters than necessary.
Regardless of the fitting technique used, the predictive power of the expansion needs to be assessed to determine its accuracy and reliability. Crossvalidation (CV) is a technique used for assessing the prediction accuracy of the model. A leaveoneout (LOO) scheme is most commonly used in CE community, and the LOOCV score is defined as
(S13) 
where is the number of configurations in the training set, is the physical quantity of a structure predicted by CE using structures without a structure and is the calculated physical quantity of structure . While OLS has only one (likely overfitted) solution, and regularization schemes have a solution for each value. The solution — a selection of clusters and their ECI values — that yields the lowest LOOCV score is chosen.
Appendix C Thermodynamics in Lattice Models
The true benefit of CE is in its ability to predict the expanded scalar quantity based on trained data. An accurate prediction can be made if the CV score of the expanded is sufficiently low, and the prediction speed is very fast on modern computer architecture since it only involves executions of only a small number of simple numerical calculations specified in Eq. S5. Such a speed boost allows one to conduct types of analyses that require substantial statistical sampling.
In contrast to zero temperature studies where the system occupies the state with lowest energy, configurations with the lowest free energy are occupied at finite temperature. The free energy ^{4}^{4}4DFT energies are obtained for fully relaxed structures without any external forces or pressure. Thus, the resulting thermodynamic quantities are effectively obtained in the NPT ensemble (fixed number of particles, fixed pressure and fixed temperature). However, the energy predicted by CE is only valid for the volume leading to the minimum energy of a particular atomic arrangement, and the volume fluctuations are neglected. is given by Andersen (2012)
(S14) 
where and is the partition function. is the Boltzmann constant and is temperature in Kelvin. In order to calculate the free energy, we utilize the exact differentials
(S15) 
where is the average internal energy. The free energy can be obtained by a thermodynamic integration from a reference temperature where is known, which can be written as Tuckerman (2010)
(S16) 
Important information of the materials under study such as the stability of ordered/disordered phases can be determined by comparing the free energy of the material at a given composition with respect to the free energies in the pure phases of its constituents.
References
 Zarkevich and Johnson (2004) N. A. Zarkevich and D. D. Johnson, ‘‘Reliable FirstPrinciples Alloy Thermodynamics via Truncated Cluster Expansions,” Physical Review Letters 92, 255702 (2004).
 Meng and Arroyode Dompablo (2009) Y. S. Meng and M. E. Arroyode Dompablo, “First principles computational materials design for energy storage materials in lithium ion batteries,” Energy & Environmental Science 2, 589 (2009).
 van de Walle (2009) A. van de Walle, “Multicomponent multisublattice alloys, nonconfigurational entropy and other additions to the Alloy Theoretic Automated Toolkit,” Calphad 33, 266–278 (2009) .
 Fontaine (1994) D. D. Fontaine, “Cluster Approach to OrderDisorder Transformations in Alloys,” in Solid State Phys., Vol. 47 (1994) pp. 33–176.
 Nelson et al. (2013a) L. J. Nelson, V. OzoliÅš, C. S. Reese, F. Zhou, and G. L. W. Hart, “Cluster expansion made easy with Bayesian compressive sensing,” Physical Review B 88, 155105 (2013a).
 Nelson et al. (2013b) L. J. Nelson, G. L. W. Hart, F. Zhou, and V. OzoliÅš, “Compressive sensing as a paradigm for building physics models,” Physical Review B 87, 035125 (2013b).
 Seko and Tanaka (2014) A. Seko and I. Tanaka, “Cluster expansion of multicomponent ionic systems with controlled accuracy: importance of longrange interactions in heterovalent ionic systems,” Journal of Physics: Condensed Matter 26, 115403 (2014) .
 Magri and Zunger (1991) R. Magri and A. Zunger, ‘‘Realspace description of semiconducting band gaps in substitutional systems,” Physical Review B 44, 8672–8684 (1991).
 Sanchez, Ducastelle, and Gratias (1984) J. M. Sanchez, F. Ducastelle, and D. Gratias, “Generalized cluster description of multicomponent systems,” Physica A: Statistical Mechanics and its Applications 128, 334–350 (1984).
 (60) For the case where there is an odd number of element types, an additional value of 0 should be included in the possible values of , and the relation between and becomes .
 Mueller and Ceder (2010) T. Mueller and G. Ceder, “Exact expressions for structure selection in cluster expansions,” Physical Review B 82, 184107 (2010).
 Zhang and Sluiter (2016) X. Zhang and M. H. F. Sluiter, “Cluster Expansions for Thermodynamics and Kinetics of Multicomponent Alloys,” Journal of Phase Equilibria and Diffusion 37, 44–52 (2016).
 Blum et al. (2005) V. Blum, G. L. W. Hart, M. J. Walorski, and A. Zunger, ‘‘Using genetic algorithms to map firstprinciples results to model Hamiltonians: Application to the generalized Ising model for alloys,” Physical Review B 72, 165113 (2005).
 Hart et al. (2005) G. L. W. Hart, V. Blum, M. J. Walorski, and A. Zunger, “Evolutionary approach for determining firstprinciples hamiltonians,” Nature Materials 4, 391–394 (2005).
 DíazOrtiz, Dosch, and Drautz (2007) A. DíazOrtiz, H. Dosch, and R. Drautz, “Cluster expansions in multicomponent systems: Precise expansions from noisy databases,” Journal of Physics Condensed Matter 19 (2007), 10.1088/09538984/19/40/406206.
 Lerch et al. (2009) D. Lerch, O. Wieckhorst, G. L. W. Hart, R. W. Forcade, and S. Müller, “UNCLE: a code for constructing cluster expansions for arbitrary lattices with minimal userinput,” Modelling and Simulation in Materials Science and Engineering 17, 055003 (2009).
 (67) DFT energies are obtained for fully relaxed structures without any external forces or pressure. Thus, the resulting thermodynamic quantities are effectively obtained in the NPT ensemble (fixed number of particles, fixed pressure and fixed temperature). However, the energy predicted by CE is only valid for the volume leading to the minimum energy of a particular atomic arrangement, and the volume fluctuations are neglected.
 Andersen (2012) J. O. Andersen, Introduction to statistical mechanics (Akademica Publishing, 2012).
 Tuckerman (2010) M. E. Tuckerman, Statistical mechanics: theory and molecular simulation (Oxford University Press, 2010).