Bayesian optimization of chemical composition: a comprehensive framework and its application to Fe-type magnet compounds
We propose a framework for optimization of the chemical composition of multinary compounds with the aid of machine learning. The scheme is based on first-principles calculation using the Korringa-Kohn-Rostoker method and the coherent potential approximation (KKR-CPA). We introduce a method for integrating datasets to reduce systematic errors in a dataset, where the data are corrected using a smaller and more accurate dataset. We apply this method to values of the formation energy calculated by KKR-CPA for nonstoichiometric systems to improve them using a small dataset for stoichiometric systems obtained by the projector-augmented-wave (PAW) method. We apply our framework to optimization of Fe-type magnet compounds (RZ)(FeCo)Ti, and benchmark the efficiency in determination of the optimal choice of elements (R and Z) and ratio (, and ) with respect to magnetization, Curie temperature and formation energy. We find that the optimization efficiency depends on descriptors significantly. The variable , and the number of electrons from the R and Z elements per cell are important in improving the efficiency. When the descriptor is appropriately chosen, the Bayesian optimization becomes much more efficient than random sampling.
Machine learning is attracting much attention these days, and its application to data obtained by first-principles calculation is a promising way to accelerate the exploration of novel materials. The basic idea is as follows: (i) introduce a numerical representation for the materials, which is called descriptor, (ii) calculate a property for materials from the search space of the descriptor by first-principles calculation, and (iii) infer a relation between and from thus obtained data by modeling . Many efforts have been made to develop models and descriptors that work in materials discovery.Rupp et al. (2012); Ghiringhelli et al. (2015); Seko et al. (2015); Pham et al. (2016); Lam Pham et al. (2017); Dam et al. (2018); Pham et al. (2018); Ouyang et al. (2018); Seko et al. (2018) These models can be used to identify promising candidates by predicting the property for unknown materials . It is also possible to perform the modeling and the sampling alternately to obtain the optimal as quickly as possible, which is called optimization.
Bayesian optimization (BO) is a powerful technique to find the maximum (or the minimum) of an unknown function along this idea. It is based on Bayesian modeling using a dataset collected in the previous sampling–modeling iterations, and does not require explicit form of the function . This method is efficient because it takes account of the uncertainty of a model in addition to the mean value. Figure 1 illustrates a typical situation in which BO is efficient. The dashed line is the true model. Suppose we have four sampled points which are denoted by black circles. By Bayesian modeling, we obtain the mean value (solid line) and the uncertainty (gray region). In this situation, the mean value does not give good prediction for the highest-score point. However, by considering the information of the uncertainty, one can find a significant probability that the true model has the maximum between the two rightmost data points.
BO has been recently applied in various problems in materials science.Ueno et al. (2016); Ju et al. (2017); Kikuchi et al. (2018) It has also a potential for application to optimization of a chemical composition,Yuan et al. (2018) but there was no reports on quantitative estimation of efficiency in such a problem avoiding possible overestimation by mere luck to our knowledge. In such applications, we need to properly choose a search space, a descriptor for the candidate systems, and a score to describe the performance that are suitable for the problem. Otherwise, the efficiency of the scheme is deteriorated. A descriptor—a form to which the input data is encoded—is especially crucial.Oganov and Valle (2009); Ghiringhelli et al. (2015); Lam Pham et al. (2017); Pham et al. (2018); Dam et al. (2018); Ouyang et al. (2018)
Accuracy of the first-principles calculation is also of great importance in the computer-aided materials search. However, conventional methods are often insufficient to achieve enough accuracy while sophisticated schemes are too much time-consuming for the purpose. For example, the magnetic transition temperature is overestimated in the mean-field approximation. Systematic errors also come in from numerical factors, such as a limited number of basis functions.
It is one of promising ideas to improve the data by using a smaller dataset from more accurate calculations or experiments.Kennedy and O’Hagan (2000); Pilania et al. (2017) This idea is also seen in the notion of transfer learning, which uses referential datasets that are different from the target dataset, and transfer the knowledge from the reference to the target.Pan et al. (2010); Hutchinson et al. (2017) However, there is no method that works for any purposes, and we need to devise a method that is suitable for each of the problems on the basis of knowledge about the origin of the error.
In this paper, we propose a practical framework for optimizing nonstoichiometric composition of multinary compounds based on Bayesian optimization and first-principles calculation. We perform a benchmark of our scheme and discuss its efficiency in the optimization using a dataset obtained by first-principles calculation with the KKR-CPA method for systems with nonstoichiometric compositions. We investigate the performance of descriptors, and discuss how we can choose an efficient one for problems. To set up a pragmatic problem, we deal with an issue from the cutting-edge of the materials study on hard magnets. We also present a method for correcting systematic errors in the formation energy by using smaller but more accurate dataset. Our idea is to construct a model of errors on the basis of our understanding of it.
This paper is organized as follows: in section II, we describe our problem setup for the benchmark, providing the background. In the first part of Section III, we present a brief summary of the whole framework. We then provide details of Bayesian optimization and the first-principles calculation in the subsequent subsections. Section III.3 is devoted to the method for integrating datasets that we use to improve the formation energy. We present the results of the benchmark and the data integration in Section IV. Finally, we conclude the paper with a summary in Section V.
Ii Problem setup and its background
Fe-type compounds having the ThMn structure have been considered as a possible main phase of a strong hard-magnet because they are expected to have high magnetization due to its high Fe content, and to have high magnetocrystalline anisotropy if the element is properly chosen.Ohashi et al. (1988a, b); Yang et al. (1988); Verhoef et al. (1988); De Mooij and Buschow (1988); Buschow (1988); Jaswal et al. (1990); Coehoorn (1990); Buschow (1991); Sakurada et al. (1992); Sakuma (1992); Asano et al. (1993); Akayama et al. (1994); Kuz’min et al. (1999); Gabay et al. (2016); Körner et al. (2016); Ke and Johnson (2016); Fukazawa et al. (2017, 2019) The magnetic properties of NdFeN was evaluated theoretically a few years ago Miyake et al. (2014), and its high magnetization and anisotropy field were confirmed by a successful synthesis of NdFeN film.Hirayama et al. (2015a, b)
Unfortunately, NdFeN and its mother compound NdFe are thermally unstable. They cannot be synthesized as a bulk without substituting another element for a part of the Fe elements. Titanium is typical of such a stabilizing element.Yang et al. (1991a, b) Introduction of Ti, however, reduces the magnetization significantly. Co also has a potential to work as a stabilizing element according to a prediction by first-principles calculation.Harashima et al. (2016) Compared to Ti, Co is favorable in terms of magnetization. In fact, a recent experiment on Sm(FeCo) film showed that it has superior saturation magnetization and anisotropy field to NdFeB, the main phase of the current strongest magnet.Hirayama et al. (2017) Chemical substitution at the site also affects structural stability. Zirconium has been attracted attention as a stabilizing element at the rare-earth site.Suzuki et al. (2014); Sakuma et al. (2016); Kuno et al. (2016); Suzuki et al. (2016) Recent first-principles calculation predicted that Dy also works as a stabilizer.Harashima et al. (2018) Therefore, optimization of chemical composition of Fe-type compounds in terms of stability and magnetic properties is an important issue for the development of next-generation permanent magnets.
Bearing these in mind, we set Fe-type magnet compounds as target systems. Especially speaking, we optimize chemical formula of (RZ)(FeCo)Ti (R = Y, Nd, Sm; Z = Zr, Dy) so that it maximizes magnetization, the Curie temperature, or minimizes the formation energy from the unary systems in the benchmark. Therefore, the problem is a combination of optimization with respect to the compositions (, and ) and optimization with respect to the choice of elements for R and Z. We discuss the efficiency of the optimization by comparing a number of iterations required in Bayesian optimization with that in random sampling. We also study how the efficiency is affected by the choice of descriptor.
Figure 2 shows the workflow in our optimization framework. At the beginning of the scheme, the user prepares a list of candidate compounds. The candidates are expressed in the form of a descriptor. In this study, we prepare 11 types of descriptors for the (RZ)(FeCo)Ti (R=Y, Nd, Sm; Z=Zr, Dy) systems, which we discuss in Section III.1.
Then, the candidate list is passed to the optimizer. The role of the optimizer is to pick one system from the candidate list so that a system with a high score is quickly found in the whole scheme. Because it does not have enough data to perform Bayesian optimization at the beginning, it randomly chooses a system from the list. It receives a feedback from a scorer later in the scheme, and record it. When the record reaches a certain size, the sampling method is switched to Bayesian optimization. To cover the role of the optimizer, we use a Python module called “COMmon Bayesian Optimization library” (COMBO).Ueno et al. (2016); COM () We present the parameters used in our benchmark in Section III.1.
In the next stage, a quantum simulator calculates physical properties for the system chosen, which is the most time-demanding process in the scheme. The details in our simulation is described in Section III.2.
Then, the scorer integrates the calculated properties to a score. It also performs improvement of the estimated values by using the referential data before generating the score. In our application, we use the value of magnetization, Curie temperature or the formation energy as a score. As for the formation energy, we improve it by the method for integrating datasets presented in Section III.3. The preparation of the referential data is described in Section III.2. The score is fed back to the optimizer to increase the size of the data used in Bayesian optimization.
The iteration loop is repeated until the number of iterations reaches a criterion. Otherwise, the workflow goes back to the optimizer. After the loop has ended, the candidate with the best score found in the iterations is output.
iii.1 Bayesian optimization
As mentioned above, we use COMBOUeno et al. (2016); COM () to cover the role of the optimizer in Fig. 2. We use Thompson sampling as a heuristic to the exploration–exploitation problem in optimization. The dimension of the random feature maps, which determines the degree of approximation for the Gaussian kernel, is set to 5000. The first 10 samples are chosen randomly without using Bayesian optimization. The number of iterations is set to 100, including the first 10 iterations with the random sampling.
The candidate list consists of (RZ)(FeCo)Ti systems for all the combination of R=Y, Nd, Sm; Z=Zr, Dy; ; ; . There are duplication in the list [e.g. YZrFe and YDyFe], and the number of the unique items is 3245 out of the systems.
We use 11 different sets of descriptors listed in Table 1. The descriptors consist of the number of electrons per cell (), the number of electrons from the R element per cell (), the number of electrons from the Z element per cell (), (), the number of electrons from the transition elements—namely Fe,Co,Ti—per cell (), the atomic number of the R element (), the atomic number of the Z element (), an index for the R element ( = 0, 1, 2 corresponding to R = Y, Nd, Sm), an index for the Z element ( = 0, 1 corresponding to Z = Zr, Dy), the Z content per cell (), the Co/(Fe+Co) ratio (), the Ti content (), and the values of , , , and when the chemical formula is expressed in the form of (YNdSmZrDy) (FeCo)Ti.
|#3||, ,||#8||, , ,|
|#4||, , ,||#9||, , , ,|
|#5||, , ,||#10||, , , ,|
|#6||, , , ,||#11||, , , , ,|
iii.2 First-principles calculation
In the “Simulator” block in Fig.2, we perform first-principles calculation based on density functional theory with the local density approximation.Hohenberg and Kohn (1964); Kohn and Sham (1965) We use the open-core approximationJensen and Mackintosh (1991); Richter (1998); Locht et al. (2016) to the f-electrons in Nd, Sm and Dy and apply the self-interaction correction.Perdew and Zunger (1981)
We assume the ThMn structure (Fig. 3) for the (RZ)(FeCo)Ti systems. The lattice parameters are determined by linear interpolation from those for RFe, RFeTi, ZFe, ZFeTi and RCo. These values for the stoichiometric systems were calculate with the PAW methodBlöchl (1994); Kresse and Joubert (1999) using a software package QMASQMA (). We use the PBE exchange-correlation functionalPerdew et al. (1996) of the Generalized Gradient Approximation (GGA) to obtain adequate structures. The values for ZFeTi and RCo are presented in Appendix A. Those for RFe, RFeTi and ZFe are in Refs.Fukazawa et al. (2018); Harashima et al. (2015a, 2018).
In the treatment with the coherent potential approximation (CPA)Soven (1967, 1970); Shiba (1971), we assume quenched randomness for random occupation of the elements: the element R and Z are assumed to occupy the 2a site (see Fig. 3 for the Wycoff positions). Titanium is assumed to occupy the 8i site. Iron (cobalt) is assumed to occupy the 8f, 8i and 8j site with a common probability of () to these sites.
We calculate the magnetization, the Curie temperature, and the formation energy from the unary systems. We use the raw value of magnetization from KKR-CPA. To estimate the Curie temperature, we calculate intersite magnetic couplings by using Liechtenstein’s method,Liechtenstein et al. (1987) and convert them to the Curie temperature using the mean-field approximation.Fukazawa et al. (2018) Although this procedure overestimates the Curie temperature, we can expect from previous results that it can capture materials trend because theoretical values within the mean-field approximation had a significant correlation with experimental Curie temperatures.Fukazawa et al. (in press)
As for the formation energy, KKR needs too large computational resources to obtain an accurate energy difference between systems when they have far different structures from each other. We use the method that we describe in the following subsection to correct the energy difference calculated by KKR-CPA with referential data of total energy obtained by PAW.
iii.3 A method for integration datasets
Let us consider the formation energy from the unary systems defined as follows:
where “(the unary systems)” is defined as
and denotes the total energy of the system in the square bracket. Because the structures of (RZ)(FeCo)Ti and each of the unary systems are much different from one another, it is not efficient to directly calculate this formation energy with the KKR method, although it can deal with non-stoichiometric systems with CPA. Our idea is to calculate the formation energy of stoichiometric systems more accurately by another method, and use calculated energies as reference data.
We construct a stochastic model for the total energy of (RZ)(FeCo)Ti based on the expectation that the smaller structural difference two systems have, the more accurate energy difference KKR-CPA gives. To quantify the difference of systems, we consider a descriptor with which the difference between the systems ( and ) is well-described by the distance (). Let us denote the reference systems in the form of the descriptor by where is the number of the reference systems. The descriptor here does not have to be identical to the descriptor used in the Bayesian optimization. In the demonstration, we use a set of with which the search space can be expressed as (RZ)(FeCoTi) irrespective of the choice of the descriptor in the optimization.
For each of the reference points , we construct a stochastic model for the total energy of a system (see also the graphs outside the box in Fig. 4):
The two ’s in the right-hand side (to which primes are attached) are the total energy calculated with KKR-CPA, whereas in the left-hand side is evaluated by a more accurate method, for which we use PAW in the present work. is a random variable whose distribution is , i.e. the normal distribution whose mean is zero and variance is , where and is a parameter we will estimate later. This model describes the expectation that the deviation of the energy difference (the first two terms in the right-hand side) from the true difference (the left-hand side) tends to be large when the difference, , is large. The graphs outside the box in Fig. 4 depict how behave: there are three models corresponding to the three reference systems ( ,, ), and the error in each model is large when the distance of the reference point from is large.
We then integrate these models, , into a single model by imposing the following condition to the distribution of :
This condition can be rewritten as follows:
Therefore, the conditional probability distribution of is
where denotes the probability distribution function of . It is straightforward to see that this conditional distribution is the normal distribution whose mean and variance are
where is a weight and is a normalization factor that are defined as
In another expression, our integrated model is
where is a random variable whose distribution is . Algorithm 1 summarizes the construction of the model explained so far in a form of a pseudocode. Note that the value for the input variable has not yet been determined. The characteristics of is illustrated in the right-bottom panel in Fig. 4. Although is singular at , it is easy to see that this is removable and .
The formation energy from the unary systems can be calculated as , where is calculated by an accurate method.
To complete the formulation, we discuss estimation of based on the data for the reference systems. We use the maximum likelihood estimation. However, it cannot be directly applied to our model because the distribution of becomes the delta function in the limit of , regardless of the value of .
To avoid this singularity, we consider a model that is constructed from all reference systems but . Figure 5 depicts the construction of such a model. Now, we consider the probability of at . Regarding the probability for as a likelihood , we select the value of that maximizes . We then obtain
In the actual application, we calculate with the KKR-LDA method and with the PAW-GGA method. We need to calibrate with a linear term because the difference in the treatment of the core electrons is another major source of error in the total energy, which is described well by a linear function. We deal with it by extending our model to include an adjustable linear term, and use it in the actual calculation. We discuss this extension of the model in Appendix B.
Iv Results and Discussion
iv.1 Integration method
We show how the integration model explained in Section III.3 works before we present the benchmark of the whole scheme in the next subsection. Let us take a simple example first: we consider a one dimensional function as a true model. Assume that we have many data about , which is an approximate function and actually obey . Because their derivatives differ from each other by 2, the difference deviates from roughly by . The assumption of the model described in Section III.3 was that the error in is large when is large. This is correct in the magnitude of the errors, but there is a bias in its sign. In this example, we prepare a dataset of for to with an interval of 0.02. We choose five reference points () from the -values, and prepare a dataset of .
The result of the prediction is shown in Fig. 6. The original points of are shown as points in cyan. The purple points show the mean value of the model; the light purple region shows the values of the standard deviation. The green curve shows , the true model, and the green circles are the reference points used in the prediction.
The mean values, which may be used as values of a point estimation, are improved from the original data in most of the region. The error region also covers the line of the true model except in the region .
It is noteworthy that the bias in the sign of the error mentioned above makes the uncertainty of the model large. This would be improved if we assume an asymmetric distribution for in Eq. (3). However, the resultant form of the integrated model becomes more complicated.
Next, let us see the results of the formation energy obtained by this method. Figure 7 shows the formation energy of (SmZr)(FeCo)Ti from the unary systems for (left) and (right). The mean values are shown in the top two figures, and the standard deviations are shown in the bottom two figures. In each figures, the horizontal axes show values of the Co/(Fe+Co) ratio (), and the vertical axes show values of the Zr concentration ().
The standard deviation of the model is zero at the corners of the figure in (left bottom) because they correspond to the reference points, namely SmFe, ZrFe, SmCo and ZrCo. Therefore, the mean values at the corners of the left top figure are identical with the reference data (obtained by PAW-GGA). Because the other reference points are SmFeTi and ZrFeTi (both at , ), the contours are placed slightly to the right in . This is more obvious in the plot for (the right bottom panel).
Although we used the mean values () in the Bayesian optimization in Section IV.2, it is important to check how the model has uncertainty in its prediction, which is typically represented by the standard deviation : the model needs more reference points when is too large compared with the value of . One can also make use of the upper (lower) confidence bound ()—where is a positive adjustable parameter—instead of to take account of the uncertainty in a pessimistic (optimistic) manner.
iv.2 Bayesian Optimization
In this subsection, we show the performance of the Bayesian optimization and discuss the results. Figure 8 shows one of the optimization processes with respect to magnetization using the descriptor #8.
The highest magnetization found in the first iterations, , is plotted against the number of iterations, . In this run, the highest in all the 3630 systems was found at 30th iteration where we define the zeroth iteration as that with the first sample. This process depends on a random sequence which is used in the sampling from the candidate lists. In order to take statistical profiles, we repeat the optimization scheme (which we call a session) 1,000 times.
To analyze the efficiency as a function of the number of iterations, we consider a cumulative distribution that is defined as the number of the sessions in which a system with a higher score than is found in the first iterations. We show a plot of in Bayesian optimization of magnetization using the descriptor #8 as the left figure in Fig. 9, where the horizontal axis shows the number of iterations, , and the vertical axis shows the score variable, . We also show a plot of the cumulative probability in the random sampling that is analytically obtained at the right-hand side. Because we took an enough number of sessions, there is negligible difference between the two figures for the first 10 iterations. The efficiency in the left figure suddenly becomes improved when the Bayesian optimization is switched on.
Figure 10 shows the success rate of finding the systems with the top 10 values of the target properties—magnetization (), Curie temperature () and formation energy from the unary systems ()—within 50 steps. The results with Bayesian optimization (BO) are compared with the search by the random sampling (RS). The numbers with # in the figure denote the descriptors listed in Table 1. We find that the efficiency significantly depends on the choice of the descriptor. It is obvious from the figure that the descriptors #7–11 are very efficient in Bayesian optimization and much superior to those with #1–6 and the random sampling. This clearly shows that and are important factors because the descriptors #7–11 differs from #2–6 only by and used instead of .
This example demonstrates how we can incorporate domain knowledge into the machine learning. It is known that the magnetization of ferromagnetic random alloys of Fe and another transition metal is usually a well-behaved function of the number of the valence electrons. It is called Slater-Pauling curve. This curve can be reproduced well by first-principles calculation with CPADederichs et al. (1991), and so is the Slater-Pauling-like curve for the Curie temperature.Takahashi et al. (2007) These effects have also been observed in ThMn-type compounds experimentally,Buschow (1991); Hirayama et al. (2017) and explained theoretically.Miyake et al. (2014); Harashima et al. (2015b, 2016); Fukazawa et al. (2018); Harashima et al. (2018) On the basis of these previous researches, we were able to expect that including [the Co/(Fe+Co) ratio] and (the Ti content) into the descriptor would improve the efficiency of the search in advance.
However, we also find that and alone do not work as an efficient descriptor. The results of the Bayesian optimization with using , , and the pair of them as descriptors are also shown in Fig. 10. Those success rates are significantly lower than the rates with the descriptors #7–11. This is because there are 66[= 3 (for R) 2 (for Z) 11 (for )] candidates that has common values of and on the list, and 50 steps are not enough to obtain an adequate Bayesian model and draw one of the top 10 systems by chance. It is noteworthy that the efficiency of the Bayesian optimization is largely improved by adding only as in the descriptor #7.
Figure 11 shows the 95% (950 sessions) contour of obtained with the descriptors #1–11. The best 95% of the data points are laid above those curves.
The panels show results with the random search and the Bayesian optimization with the descriptors #1–11. As shown in these graphs, the efficiency of the search depends much on the target property to optimize. The descriptors #7–11, which are with and , has a satisfying efficiency, with which even 20 iterations are enough to obtain a nearly best score regardless of the target property. The situation is quite different with the descriptors #1–6. In the optimization of the Curie temperature, these descriptors works efficiently. However, the optimizations of the magnetization and the formation energy progress even more slowly than the random sampling.
This difference between the Curie temperature and the other targets is also seen in Fig. 12, where the pair of and is used as descriptors. The pair descriptor works as efficient as the descriptors #7–11 in the optimization of the magnetization and the formation energy. However, discernible difference in efficiency exists in the optimization of the Curie temperature. This suggests importance of information about elements at the 2a site (R and Z), which is consistent with Dam et al’s observation that the concentration of rare-earth elements is important in explaining the Curie temperature of binary alloys composed of a rare-earth element and a 3d transition-metal.Dam et al. (2018)
Dependency of the search efficiency on the the dimension of the descriptor is also noteworthy. When the dimension of a descriptor is large, the descriptor can accommodate a large search space on the one hand. However, modeling tends to be difficult with a higher dimensional space, which is referred as “curse of dimensionality”, on the other hand. Note that we have descriptors with 4 different dimensions in the groups of the descriptors #1–6 and #7–11. Figure 11 shows that the dimension has only a minor effect on the efficiency. This dependence is magnified by the more stringent criterion of the top-10 benchmark as shown in Fig. 10, especially in the results with the descriptor #1–6. However, the difference among the descriptor #7–11 are still subtle. Therefore, we expect that it would be safe to include six or a little more variables into the descriptor when we optimize for another target property.
In this paper, we presented a machine-learning scheme for searching high-performance magnetic compounds. Our scheme is based on Bayesian optimization, and has a much higher efficiency than random sampling. We demonstrated its efficiency by taking the example of optimization of magnetization, Curie temperature and formation energy for the search space of magnet compounds having the ThMn structure. One of the typical results is the success rate of finding the top 10 systems with the highest properties when 50 systems are sampled from a candidate list of 3630 systems (Fig. 10). The success rate is more than 90% with our scheme when the descriptor is appropriately chosen while it is approximately 10% in the random sampling.
The efficiency is maximized when we include the Co content (), the Ti content (), and the information of the R and Z elements (e.g. ) into the descriptor. This improvement is what we could expect from the previous studies of magnet compounds. We stressed that it is important to incorporate domain knowledge into the choice of a descriptor. We also discussed how many variables a descriptor can accommodate without deteriorating the search efficiency. Although an excessive addition of variables to the descriptor can lose the efficiency of the search, there was not a significant loss when we doubled the dimension of the optimal descriptor.
We also proposed an integration scheme of two datasets to improve accuracy of an inexpensive large-sized dataset with using an accurate and small-sized dataset (reference dataset). The algorithm (Alg. 1 and 2) is easy to implement and fast. Prediction with a confidence bound (or the standard deviation ) is another feature of the scheme. We have also shown how it worked in the calculation of the formation energy (Fig. 7).
We acknowledge the support from the Elements Strategy Initiative Project under the auspices of MEXT. This work was also supported by MEXT as a social and scientific priority issue (Creation of new functional Devices and high-performance Materials to Support next-generation Industries; CDMSI) to be tackled by using the post-K computer, by the “Materials research by Information Integration” Initiative (MII) project of the Support Program for Starting Up Innovation Hub from the Japan Science and Technology Agency (JST). The computation was partly conducted using the facilities of the Supercomputer Center, the Institute for Solid State Physics, the University of Tokyo, and the supercomputer of ACCMS, Kyoto University. This research also used computational resources of the K computer provided by the RIKEN Advanced Institute for Computational Science through the HPCI System Research project (Project ID:hp170100).
Appendix A Lattice parameters
Table 2 lists the lattice parameters for RCo we used in the calculations. We assumed the ThMn structure [space group: I4/mmm (#139)] for the system. The definitions of and are summarized in Table 3 with representable atomic positions of the atoms.
In our KKR-CPA calculations for RFeTi (R=Y, Nd, Sm) and ZFeTi (Z=Zr, Dy), we assumed that they also have the ThMn structure. The lattice parameters are reduced from the structure obtained in Ref. Harashima et al., 2015a and the structure given in the Table 4. The value of is used as the parameter for the reduced cell to keep the cell volume. The inner parameters are determined so that the they minimize the deviation in the space of the coefficients of the unit vectors, which corresponds to the point (,,) in the Cartesian coordinates. The reduced values are listed in Table 5.
Appendix B Integration model with a linear term
In this section, we incorporate an adjustable linear term into the relation of and , which appear in Section III.3: we consider as an approximate function of in this section.
This changes Eq. (3) to
Therefore, our model does not depend on the variable .
The estimation by the maximum likelihood method with described in Section III.3 is applicable to the variable and . The equation for is
where is defined as
and the symbol denotes the dyadic product, with which is defined as the matrix whose component is . This equation can be solved for without determining .
The equation for is modified as
Appendix C Dimensions for R and Z
We prepared dimensions to include information of the R and Z elements in our design of the descriptors in Table 1. Because the corresponding choice is only 6 in combination (R = Y, Nd, Sm; Z = Zr, Du) while it is known that high-dimensionality often causes problems in modeling, readers may doubt its necessity. We compare the efficiency of search for the smaller space in which those elements are fixed to R = Y and Z = Zr in Fig. 13. The solid line along the cross symbols in the figures denote the frequency of finding the system with the best score out of 1000 sessions in the restricted space when we use the set of and as a descriptor. We also show the curve elongated 6 times along -axis (the dotted line) because one has to optimize also for the other combinations of R and Z to obtain the optimal system for the full space, which approximately takes 6 times larger. For comparison, we show the result of optimization for the full space using the descriptor #9 by the line along the circle points. We set the number of iterations before Bayesian optimization to 5, which is smaller than the number we use above (=10), because we know that the full-space search is so fast that the search in the smaller space cannot catch up if it starts 60 (=) iterations behind (Fig. 10 and 11). We see from Fig.13 that the search with the full space is more efficient, even with this setup, than repeating the search for the small space 6 times. Therefore, the dimensions for R and Z actually contribute to the search efficiency.
- Rupp et al. (2012) Matthias Rupp, Alexandre Tkatchenko, Klaus-Robert Müller, and O Anatole Von Lilienfeld, “Fast and accurate modeling of molecular atomization energies with machine learning,” Physical review letters 108, 058301 (2012).
- Ghiringhelli et al. (2015) Luca M Ghiringhelli, Jan Vybiral, Sergey V Levchenko, Claudia Draxl, and Matthias Scheffler, “Big data of materials science: critical role of the descriptor,” Physical review letters 114, 105503 (2015).
- Seko et al. (2015) Atsuto Seko, Atsushi Togo, Hiroyuki Hayashi, Koji Tsuda, Laurent Chaput, and Isao Tanaka, “Prediction of low-thermal-conductivity compounds with first-principles anharmonic lattice-dynamics calculations and bayesian optimization,” Physical review letters 115, 205901 (2015).
- Pham et al. (2016) Tien Lam Pham, Hiori Kino, Kiyoyuki Terakura, Takashi Miyake, and Hieu Chi Dam, “Novel mixture model for the representation of potential energy surfaces,” The Journal of Chemical Physics 145, 154103 (2016).
- Lam Pham et al. (2017) Tien Lam Pham, Hiori Kino, Kiyoyuki Terakura, Takashi Miyake, Koji Tsuda, Ichigaku Takigawa, and Hieu Chi Dam, “Machine learning reveals orbital interaction in materials,” Science and technology of advanced materials 18, 756–765 (2017).
- Dam et al. (2018) Hieu Chi Dam, Viet Cuong Nguyen, Tien Lam Pham, Anh Tuan Nguyen, Kiyoyuki Terakura, Takashi Miyake, and Hiori Kino, “Important descriptors and descriptor groups of Curie temperatures of rare-earth transition-metal binary alloys,” Journal of the Physical Society of Japan 87, 113801 (2018).
- Pham et al. (2018) Tien-Lam Pham, Nguyen-Duong Nguyen, Van-Doan Nguyen, Hiori Kino, Takashi Miyake, and Hieu-Chi Dam, “Learning structure-property relationship in crystalline materials: A study of lanthanide–transition metal alloys,” The Journal of chemical physics 148, 204106 (2018).
- Ouyang et al. (2018) Runhai Ouyang, Stefano Curtarolo, Emre Ahmetcik, Matthias Scheffler, and Luca M Ghiringhelli, “SISSO: A compressed-sensing method for identifying the best low-dimensional descriptor in an immensity of offered candidates,” Physical Review Materials 2, 083802 (2018).
- Seko et al. (2018) Atsuto Seko, Hiroyuki Hayashi, Hisashi Kashima, and Isao Tanaka, “Matrix-and tensor-based recommender systems for the discovery of currently unknown inorganic compounds,” Physical Review Materials 2, 013805 (2018).
- Ueno et al. (2016) Tsuyoshi Ueno, Trevor David Rhone, Zhufeng Hou, Teruyasu Mizoguchi, and Koji Tsuda, ‘‘COMBO: An efficient Bayesian optimization library for materials science,” Materials discovery 4, 18–21 (2016).
- Ju et al. (2017) Shenghong Ju, Takuma Shiga, Lei Feng, Zhufeng Hou, Koji Tsuda, and Junichiro Shiomi, “Designing nanostructures for phonon transport via bayesian optimization,” Physical Review X 7, 021024 (2017).
- Kikuchi et al. (2018) Shun Kikuchi, Hiromi Oda, Shin Kiyohara, and Teruyasu Mizoguchi, “Bayesian optimization for efficient determination of metal oxide grain boundary structures,” Physica B: Condensed Matter 532, 24–28 (2018).
- Yuan et al. (2018) Ruihao Yuan, Zhen Liu, Prasanna V Balachandran, Deqing Xue, Yumei Zhou, Xiangdong Ding, Jun Sun, Dezhen Xue, and Turab Lookman, “Accelerated discovery of large electrostrains in BaTiO-based piezoelectrics using active learning,” Advanced Materials 30, 1702884 (2018).
- Oganov and Valle (2009) Artem R Oganov and Mario Valle, “How to quantify energy landscapes of solids,” The Journal of chemical physics 130, 104504 (2009).
- Kennedy and O’Hagan (2000) Marc C Kennedy and Anthony O’Hagan, “Predicting the output from a complex computer code when fast approximations are available,” Biometrika 87, 1–13 (2000).
- Pilania et al. (2017) Ghanshyam Pilania, James E Gubernatis, and Turab Lookman, “Multi-fidelity machine learning models for accurate bandgap predictions of solids,” Computational Materials Science 129, 156–163 (2017).
- Pan et al. (2010) Sinno Jialin Pan, Qiang Yang, et al., “A survey on transfer learning,” IEEE Transactions on knowledge and data engineering 22, 1345–1359 (2010).
- Hutchinson et al. (2017) Maxwell L Hutchinson, Antono Erin, Brenna M Gibbons, Sean Paradiso, Julia Ling, and Bryce Meredig, “Overcoming data scarcity with transfer learning,” (2017).
- Ohashi et al. (1988a) Ken Ohashi, Yoshio Tawara, Ryo Osugi, Junji Sakurai, and Yukitomo Komura, “Identification of the intermetallic compound consisting of sm, ti, fe,” J. Less-Common Met. 139, L1–L5 (1988a).
- Ohashi et al. (1988b) K. Ohashi, Y. Tawara, R. Osugi, and M. Shimao, “Magnetic properties of fe‐rich rare‐earth intermetallic compounds with a thmn12 structure,” J. Appl. Phys. 64, 5714–5716 (1988b).
- Yang et al. (1988) Ying-chang Yang, Lin-shu Kong, Shu-he Sun, Dong-mei Gu, and Ben-pei Cheng, “Intrinsic magnetic properties of SmTiFe,” Journal of applied physics 63, 3702–3703 (1988).
- Verhoef et al. (1988) R Verhoef, FR De Boer, Zhang Zhi-dong, and KHJ Buschow, “Moment reduction in RFeT compounds (R = Gd, Y and T= Ti, Cr, V, Mo, W),” Journal of Magnetism and Magnetic Materials 75, 319–322 (1988).
- De Mooij and Buschow (1988) DB De Mooij and KHJ Buschow, “Some novel ternary ThMn-type compounds,” Journal of the Less Common Metals 136, 207–215 (1988).
- Buschow (1988) K. H. J. Buschow, “Structure and properties of some novel ternary Fe‐rich rare‐earth intermetallics (invited),” J. Appl. Phys. 63, 3130–3135 (1988).
- Jaswal et al. (1990) S. S. Jaswal, Y. G. Ren, and D. J. Sellmyer, “Electronic structure and magnetic properties of Fe‐rich ternary compounds: YFeV and YFeCr,” J. Appl. Phys. 67, 4564–4566 (1990).
- Coehoorn (1990) R. Coehoorn, ‘‘Electronic structure and magnetism of transition-metal-stabilized intermetallic compounds,” Phys. Rev. B 41, 11790–11797 (1990).
- Buschow (1991) K H J Buschow, “Permanent magnet materials based on tetragonal rare earth compounds of the type RFeM,” Journal of magnetism and magnetic materials 100, 79–89 (1991).
- Sakurada et al. (1992) S. Sakurada, A. Tsutai, and M. Sahashi, “A study on the formation of ThMn and NaZn structures in RFeSi,” Journal of Alloys and Compounds 187, 67–71 (1992).
- Sakuma (1992) Akimasa Sakuma, “Self-Consistent Band Calculations for YFeTi and YFeTiN,” J. Phys. Soc. Jpn. 61, 4119–4124 (1992).
- Asano et al. (1993) S. Asano, S. Ishida, and S. Fujii, “Electronic structures and improvement of magnetic properties of RFeX (R = Y, Ce, Gd; X = N, C),” Physica B 190, 155 – 168 (1993).
- Akayama et al. (1994) M. Akayama, H. Fujii, K. Yamamoto, and K. Tatami, “Physical properties of nitrogenated RFeTi intermetallic compounds (R=Ce, Pr and Nd) with ThMn-type structure,” J. Magn. Magn. Mater. 130, 99 – 107 (1994).
- Kuz’min et al. (1999) M. D. Kuz’min, M. Richter, and K. H. J. Buschow, “Iron-rich versus cobalt-rich ThMn-type intermetallics: a comparative study of the crystal fields,” Solid State Commun. 113, 47 – 51 (1999).
- Gabay et al. (2016) A. M. Gabay, A. Martín-Cid, J. M. Barandiaran, D. Salazar, and G. C. Hadjipanayis, “Low-cost CeSm(Fe, Co, Ti) alloys for permanent magnets,” AIP Advances 6, 056015 (2016), https://doi.org/10.1063/1.4944066 .
- Körner et al. (2016) Wolfgang Körner, Georg Krugel, and Christian Elsässer, ‘‘Theoretical screening of intermetallic thmn12-type phases for new hard-magnetic compounds with low rare earth content,” Scientific Reports 6, 24686 (2016).
- Ke and Johnson (2016) Liqin Ke and Duane D. Johnson, “Intrinsic magnetic properties in (FeCo)Ti (Y and Ce; = H, C, and N),” Phys. Rev. B 94, 024423 (2016).
- Fukazawa et al. (2017) Taro Fukazawa, Hisazumi Akai, Yosuke Harashima, and Takashi Miyake, ‘‘First-principles study of intersite magnetic couplings in NdFe and NdFeX (X= B, C, N, O, F),” Journal of Applied Physics 122, 053901 (2017).
- Fukazawa et al. (2019) Taro Fukazawa, Hisazumi Akai, Yosuke Harashima, and Takashi Miyake, “First-principles study of spin-wave dispersion in Sm(FeCo),” Journal of Magnetism and Magnetic Materials 469, 296 – 301 (2019).
- Miyake et al. (2014) Takashi Miyake, Kiyoyuki Terakura, Yosuke Harashima, Hiori Kino, and Shoji Ishibashi, “First-Principles Study of Magnetocrystalline Anisotropy and Magnetization in NdFe, NdFeTi, and NdFeTiN,” Journal of the Physical Society of Japan 83, 043702 (2014).
- Hirayama et al. (2015a) Y Hirayama, YK Takahashi, S Hirosawa, and K Hono, “NdFeN hard-magnetic compound with high magnetization and anisotropy field,” Scripta Materialia 95, 70–72 (2015a).
- Hirayama et al. (2015b) Yusuke Hirayama, Takashi Miyake, and Kazuhiro Hono, “Rare-earth lean hard magnet compound NdFeN,” JOM 67, 1344–1349 (2015b).
- Yang et al. (1991a) Ying-chang Yang, Xiao-dong Zhang, Lin-shu Kong, Qi Pan, and Sen-lin Ge, “New potential hard magnetic material—NdTiFeN,” Solid state communications 78, 317–320 (1991a).
- Yang et al. (1991b) Ying-chang Yang, Xiao-dong Zhang, Sen-lin Ge, Qi Pan, Lin-shu Kong, Hailin Li, Ji-lian Yang, Bai-sheng Zhang, Yong-fan Ding, and Chun-tang Ye, “Magnetic and crystallographic properties of novel fe-rich rare-earth nitrides of the type RTiFeN,” Journal of applied physics 70, 6001–6005 (1991b).
- Harashima et al. (2016) Yosuke Harashima, Kiyoyuki Terakura, Hiori Kino, Shoji Ishibashi, and Takashi Miyake, “First-principles study on stability and magnetism of NdFe and NdFeN for M= Ti, V, Cr, Mn, Fe, Co, Ni, Cu, Zn,” Journal of Applied Physics 120, 203904 (2016).
- Hirayama et al. (2017) Y Hirayama, Y K Takahashi, S Hirosawa, and K Hono, “Intrinsic hard magnetic properties of Sm(FeCo) compound with the ThMn structure,” Scripta Materialia 138, 62–65 (2017).
- Suzuki et al. (2014) S. Suzuki, T. Kuno, K. Urushibata, K. Kobayashi, N. Sakuma, K. Washio, H. Kishimoto, A. Kato, and A. Manabe, “A (Nd, Zr)(Fe, Co)TiN compound as a permanent magnet material,” AIP Adv. 4, 117131 (2014).
- Sakuma et al. (2016) N. Sakuma, S. Suzuki, T. Kuno, K. Urushibata, K. Kobayashi, M. Yano, A. Kato, and A. Manabe, “Influence of Zr substitution on the stabilization of ThMn-type (NdZr)(FeCo)TiN (),” AIP Adv. 6, 056023 (2016).
- Kuno et al. (2016) Tomoko Kuno, Shunji Suzuki, Kimiko Urushibata, Kurima Kobayashi, Noritsugu Sakuma, Masao Yano, Akira Kato, and Akira Manabe, “(Sm,Zr)(Fe,Co)Ti compounds as new permanent magnet materials,” AIP Adv. 6, 025221 (2016).
- Suzuki et al. (2016) S. Suzuki, T. Kuno, K. Urushibata, K. Kobayashi, N. Sakuma, K. Washio, M. Yano, A. Kato, and A. Manabe, ‘‘A new magnet material with ThMn structure: (NdZr)(FeCo)TiN(),” J. Magn. Magn. Mater. 401, 259–268 (2016).
- Harashima et al. (2018) Yosuke Harashima, Taro Fukazawa, Hiori Kino, and Takashi Miyake, “Effect of -site substitution and pressure on stability of Fe: A first-principles study,” arXiv preprint arXiv:1805.12241 (2018).
- (50) “COMmon Bayesian Optimization Library (COMBO),” https://github.com/tsudalab/combo.
- Hohenberg and Kohn (1964) Pierre Hohenberg and Walter Kohn, “Inhomogeneous electron gas,” Physical Review 136, B864 (1964).
- Kohn and Sham (1965) Walter Kohn and Lu Jeu Sham, “Self-consistent equations including exchange and correlation effects,” Physical Review 140, A1133 (1965).
- Jensen and Mackintosh (1991) Jens Jensen and Allan R Mackintosh, Rare earth magnetism (Clarendon Oxford, 1991).
- Richter (1998) Manuel Richter, “Band structure theory of magnetism in 3d-4f compounds,” Journal of Physics D: Applied Physics 31, 1017 (1998).
- Locht et al. (2016) I. L. M. Locht, Y. O. Kvashnin, D. C. M. Rodrigues, M. Pereiro, A. Bergman, L. Bergqvist, A. I. Lichtenstein, M. I. Katsnelson, A. Delin, A. B. Klautau, B. Johansson, I. Di Marco, and O. Eriksson, “Standard model of the rare earths analyzed from the hubbard i approximation,” Phys. Rev. B 94, 085137 (2016).
- Perdew and Zunger (1981) J. P. Perdew and Alex Zunger, “Self-interaction correction to density-functional approximations for many-electron systems,” Phys. Rev. B 23, 5048–5079 (1981).
- Blöchl (1994) P. E. Blöchl, “Projector augmented-wave method,” Phys. Rev. B 50, 17953–17979 (1994).
- Kresse and Joubert (1999) G. Kresse and D. Joubert, “From ultrasoft pseudopotentials to the projector augmented-wave method,” Phys. Rev. B 59, 1758–1775 (1999).
- (59) “QMAS—Quantum MAterials Simulator Official Site,” http://qmas.jp.
- Perdew et al. (1996) John P. Perdew, Kieron Burke, and Matthias Ernzerhof, “Generalized gradient approximation made simple,” Phys. Rev. Lett. 77, 3865–3868 (1996).
- Fukazawa et al. (2018) Taro Fukazawa, Hisazumi Akai, Yosuke Harashima, and Takashi Miyake, “First-principles study of intersite magnetic couplings and Curie temperature in RFeCr (R = Y, Nd, Sm),” Journal of Physical Society of Japan 87, 044706 (2018).
- Harashima et al. (2015a) Yosuke Harashima, Kiyoyuki Terakura, Hiori Kino, Shoji Ishibashi, and Takashi Miyake, “First-principles study of structural and magnetic properties of R(Fe,Ti) and R(Fe,Ti)N (R= Nd, Sm, Y),” in Proceedings of Computational Science Workshop 2014 (CSW2014), JPS Conference Proceedings, Vol. 5 (2015) p. 1021.
- Soven (1967) Paul Soven, “Coherent-potential model of substitutional disordered alloys,” Physical Review 156, 809 (1967).
- Soven (1970) Paul Soven, “Application of the coherent potential approximation to a system of muffin-tin potentials,” Physical Review B 2, 4715 (1970).
- Shiba (1971) Hiroyuki Shiba, “A reformulation of the coherent potential approximation and its applications,” Progress of Theoretical Physics 46, 77–94 (1971).
- Liechtenstein et al. (1987) A I Liechtenstein, M I Katsnelson, V P Antropov, and V A Gubanov, “Local spin density functional approach to the theory of exchange interactions in ferromagnetic metals and alloys,” Journal of Magnetism and Magnetic Materials 67, 65–74 (1987).
- Fukazawa et al. (in press) Taro Fukazawa, Hisazumi Akai, Yosuke Harashima, and Takashi Miyake, “Curie temperature of SmFe and NdFeB: a first-principles study,” IEEE Transaction on Magnetics (in press).
- Dederichs et al. (1991) P H Dederichs, R Zeller, H Akai, and H Ebert, “Ab-initio calculations of the electronic structure of impurities and alloys of ferromagnetic transition metals,” Journal of magnetism and magnetic materials 100, 241–260 (1991).
- Takahashi et al. (2007) C Takahashi, M Ogura, and H Akai, “First-principles calculation of the Curie temperature Slater–Pauling curve,” Journal of Physics: Condensed Matter 19, 365233 (2007).
- Harashima et al. (2015b) Yosuke Harashima, Kiyoyuki Terakura, Hiori Kino, Shoji Ishibashi, and Takashi Miyake, “Nitrogen as the best interstitial dopant among =B, C, N, O, and F for strong permanent magnet : First-principles study,” Phys. Rev. B 92, 184426 (2015b).