A comparative study of feature selection methods for stress hotspot classification in materials
Abstract
The first step in constructing a machine learning model is defining the features of the data set that can be used for optimal learning. In this work we discuss feature selection methods, which can be used both to build better models and to achieve model interpretability. We applied these methods to the stress hotspot classification problem, to determine which microstructural characteristics can cause stress to build up in certain grains during uniaxial tensile deformation. The results show how some feature selection techniques are biased, and demonstrate a preferred technique for obtaining feature rankings that support physical interpretation.
Keywords:
Stress hotspots · Machine learning · Random forests · Crystal plasticity · Titanium alloys · Feature selection
1 Introduction
Statistical learning methods are gaining popularity in the materials science field, rapidly becoming known as "Materials Data Science". With new data infrastructure platforms like Citrination OMara2016 () and the Materials Data Curation System Dima2016 (), machine learning (ML) methods are entering the mainstream of materials science. Materials data science and informatics is an emergent field aligned with the goals of the Materials Genome Initiative to reduce the cost and time for materials design, development and deployment. Building and interpreting machine learning models are indispensable parts of the process of curating materials knowledge. ML methods have been used for predicting a target property such as material failure Mangal2017b (); Mangal2017c (), twinning deformation Orme2016 (), phase diagrams ChNg2017 () and guiding experiments and calculations in composition space Ling2017 (); Oliynyk2016 (). Machine learning models are built on learning from "features" or variables that describe the problem. Thus, an important aspect of the machine learning process is to determine which variables most enable data-driven insights about the problem.
Dimensionality reduction techniques (such as principal component analysis(PCA) wall2003singular (), kernel PCA mika1999kernel (), autoencoders Holden2006 (), feature compression from information gain theory yu2003feature ()) have become popular for producing compact feature representations Guyon2003 (). They are applied to the feature set to get the best feature representation, resulting in a smaller dataset, which speeds up the model construction van2009dimensionality (). However, dimensionality reduction techniques change the original representation of the features, and hence offer limited interpretability Guyon2003 (). An alternate method for better models is feature selection. Feature selection is the process of selecting a subset of the original variables such that a model built on data containing only these features has the best performance. Feature selection avoids overfitting, improves model performance by getting rid of redundant features and has the added advantage of keeping the original feature representation, thus offering better interpretability Guyon2003 ().
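The contrast above can be illustrated with a minimal scikit-learn sketch on synthetic data (not the stress hotspot dataset): PCA mixes all input columns into new axes, whereas feature selection keeps a subset of the original, interpretable columns. The univariate selector shown here is only one simple example of the methods discussed below.

```python
# Sketch contrasting dimensionality reduction (PCA) with feature selection
# (SelectKBest); illustrative only, not the paper's pipeline.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           random_state=0)

# PCA: each new axis is a linear mixture of ALL original features,
# so the reduced columns no longer map to physical descriptors
X_pca = PCA(n_components=4).fit_transform(X)

# Feature selection: keeps 4 of the original columns, so each retained
# column still has its original physical meaning
selector = SelectKBest(f_classif, k=4).fit(X, y)
X_sel = selector.transform(X)

print(X_pca.shape, X_sel.shape)            # both reduced to 4 columns
print(selector.get_support(indices=True))  # indices of retained features
```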
Feature selection methods have been used extensively in the field of bioinformatics Saeys2007 (), psychiatry Lu2014 () and cheminformatics Wegner2004 (). There are multiple feature selection methods, broadly categorized into Filter, Wrapper and Embedded methods based on their interaction with the predictor during the selection process. The filter methods rank the variables as a preprocessing step, and feature selection is done before choosing the model. In the wrapper approach, nested subsets of variables are tested to select the optimal subset that work best for the model during the learning process. Embedded methods are those which incorporate variable selection in the training algorithm.
We have used random forest models to study stress hotspot classification in FCC Mangal2017b () and HCP Mangal2017c () materials. In this paper, we review some feature selection techniques applied to the stress hotspot prediction problem in hexagonal close packed materials, and compare them with respect to future data prediction. We focus on two commonly used techniques from each category: (1) Filter Methods: Correlation based feature selection (CFS) Hall1999 () and Pearson correlation Cohen2009 (); (2) Wrapper Methods: FeaLect Zare2013 () and Recursive feature elimination (RFE) Guyon2003 (); and (3) Embedded Methods: Random Forest Permutation Accuracy Importance (RF PAI) Breiman1996 () and the Least Absolute Shrinkage and Selection Operator (LASSO) Tibshirani1996 (). The main contribution of this article is to raise awareness in the materials data science community about how different feature selection techniques can lead to misguided model interpretations and how to avoid them. We point out some of the inadequacies of popular feature selection methods and finally, we extract data-driven insights with better understanding of the methods used.
2 Methods
An applied stress is distributed heterogeneously among the grains in a microstructure Qidwai2009 (). Under an applied deformation, some grains are prone to accumulating stress due to their orientation, geometry and placement with respect to the neighboring grains. These regions of high stress, so-called stress hotspots, are related to void nucleation during ductile fracture Rimmer1959 (). Stress hotspot formation has been studied in face centered cubic (FCC) Mangal2017b () and hexagonal close packed (HCP) Mangal2017c () materials using a machine learning approach. A set of microstructural descriptors was designed to be used as features in a random forest model for predicting stress hotspots. To achieve data-driven insights into the problem, it is essential to rank the microstructural descriptors (features). In this paper, we review different feature selection techniques applied to the stress hotspot classification problem in HCP materials, which have a complex plasticity landscape due to anisotropic slip system activity.
Let (x_i, y_i), for i = 1, …, N, be N independent identically distributed (i.i.d.) observations of a p-dimensional vector x_i of grain features, where the response variable y_i denotes the truth value of a grain being a stress hotspot. The input matrix is denoted by X, and y is the binary outcome vector. We will use small letters (x) to refer to the samples and capital letters (X_j) to refer to the features of the input matrix X. Feature importance refers to the metrics used by various feature selection methods to rank features, such as feature weights in linear models or variable importance in random forest models.
2.1 Dataset Studied
A dataset of HCP microstructures with different textures was generated using Dream.3D in Mangal2017c (). Uniaxial tensile deformation was simulated in these microstructures using EVPFFT Lebensohn2012 () with different constitutive parameters, resulting in a dataset representing a titanium-like HCP material with an anisotropic critical resolved shear stress ratio Mangal2017c (). This dataset contains grain-wise values of the equivalent Von Mises stress, and the corresponding Euler angles and grain connectivity parameters.
The grains having stress greater than a threshold percentile of the stress distribution were designated as stress hotspots, a binary target. Thirty-four variables to be used as features in machine learning were developed. These features describe the grain texture and geometry and are summarized in Table 1. We rank these features using different feature selection techniques, observe the improvement in the models, and interpret the physics behind stress hotspot formation. The model performance is measured by the AUC (area under the ROC curve), a metric for binary classification which is insensitive to imbalance in the classes. An AUC of 100% denotes perfect classification and 50% denotes no better than random guessing auc ().
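The behavior of the AUC metric described above can be checked with a short sketch on synthetic labels (not the hotspot data): a perfect scorer reaches an AUC of 1.0, while random scores hover near 0.5 regardless of class balance.

```python
# Minimal illustration of the AUC metric: random scores give ~0.5,
# a perfect scorer gives exactly 1.0. Synthetic labels only.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)           # synthetic binary target

auc_random = roc_auc_score(y_true, rng.random(1000))       # near 0.5
auc_perfect = roc_auc_score(y_true, y_true.astype(float))  # exactly 1.0
print(round(auc_random, 2), auc_perfect)
```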
Table 1: Features describing grain texture and geometry.

| Feature (abbreviation) | Description |
|---|---|
| Schmid_1 | Basal Schmid factor |
| Schmid_2 | Prismatic Schmid factor |
| Schmid_3 | Pyramidal Schmid factor |
| Schmid_4 | Pyramidal Schmid factor |
| Schmid | FCC Schmid factor |
| — | Polar angle of HCP c-axis w.r.t. the sample frame |
| — | Azimuthal angle of HCP c-axis w.r.t. the sample frame |
| TJEuc | Average distance of a grain to triple junctions |
| GBEuc | Average distance of a grain to grain boundaries |
| KernelAvg | Average misorientation within a grain |
| Omega3s | 3rd invariant of the second-order moment matrix for the grain, without assuming a shape type |
| mPrimeList | Slip transmission factor for FCC materials |
| Surface Features | 1 if the grain touches the periodic boundary, else 0 |
| 100_IPF_x | Distance of the tensile axis from the corners of the 100 inverse pole figure |
| 001_IPF_x | Distance of the tensile axis from the corners of the 001 inverse pole figure |
| AvgC_Axes_x | Unit vector components describing the c-axis orientation for HCP |
| Max_mis | Maximum misorientation between a grain and its nearest neighbors |
| Min_mis | Minimum misorientation between a grain and its nearest neighbors |
| AvgMisorientations | Average misorientation between a grain and its nearest neighbors |
| QPEuc | Average distance of a grain to quadruple junctions |
| NumNeighbors | Number of nearest neighbors of a grain |
| Neighborhoods | Number of grains having their centroid within 1 multiple of the equivalent sphere diameter of each grain |
| FeatureVolumes | Volume of a grain |
| Equivalent Diameters | Equivalent spherical diameter of a grain |
| AspectRatios | Ratio of axis lengths (b/a and c/a) for the best-fit ellipsoid to the grain shape |
| Surface area volume ratio | Ratio between the surface area and volume of a grain |
2.2 Feature Selection Methods
2.2.1 Filter Methods
Filter methods are based on preprocessing the dataset to extract the features that most impact the target y. Some of these methods are:
Pearson Correlation Cohen2009 ():
This method provides a straightforward way of filtering features according to their correlation coefficient. The Pearson correlation coefficient between a feature X_i and the target Y is:

R(i) = cov(X_i, Y) / (σ(X_i) · σ(Y))

where cov(X_i, Y) is the covariance and σ is the standard deviation Cohen2009 (). It ranges from −1 (perfect negative correlation) to +1 (perfect positive correlation), and can be used for binary classification and regression problems. It is a quick metric: the features are ranked in order of the absolute value of their correlation coefficient with the target.
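This filter can be sketched in a few lines on synthetic data (the columns are made up for illustration), ranking features by the absolute Pearson coefficient with the binary target:

```python
# Rank features by |Pearson r| with a binary target, as the filter
# method above describes. Synthetic data; feature 0 drives the target.
import numpy as np

rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 3))
y = (X[:, 0] + 0.1 * rng.normal(size=n) > 0).astype(int)

# Pearson r between each feature column and the target
r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
ranking = np.argsort(-np.abs(r))  # most correlated feature first
print(ranking)
```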
Correlation based feature selection (CFS) Hall1999 ():
CFS was developed to select a subset of features with high correlation to the target and low intercorrelation among themselves, thus reducing redundancy and selecting a diverse feature set. CFS gives a heuristic merit over a feature subset instead of individual features. It uses the symmetrical uncertainty correlation coefficient, given by:

SU(X, Y) = 2 · IG(X|Y) / (H(X) + H(Y))

where IG(X|Y) is the information gain of feature X for the class attribute Y, and H(X) is the entropy of variable X. The following merit metric was used to rank each subset S containing k features:

Merit_S = k · r̄_cf / √(k + k(k − 1) · r̄_ff)

where r̄_cf is the mean symmetrical uncertainty correlation between the features and the target, and r̄_ff is the average feature–feature intercorrelation. To account for the high computational complexity of evaluating all possible feature subsets, CFS is often combined with search strategies such as forward selection, backward elimination and bidirectional search. In this work we have used the skfeature implementation of CFS Zhao2010 (), which uses symmetrical uncertainty Hall1999 () as the correlation metric and explores the subset space using best-first search Pearl:1984:HIS:525 (), stopping when it encounters five consecutive fully expanded non-improving subsets.
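The merit computation can be sketched directly from the two equations above for discrete features; the helper names are our own, the synthetic data is only illustrative, and the best-first subset search of the skfeature implementation is omitted here.

```python
# Hedged sketch of the CFS merit for one candidate subset of discrete
# features: SU = 2*IG/(H(X)+H(Y)); merit = k*r_cf / sqrt(k + k(k-1)*r_ff).
import numpy as np

def entropy(x):
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def joint_entropy(x, y):
    _, counts = np.unique(np.stack([x, y], axis=1), axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def symmetrical_uncertainty(x, y):
    hx, hy = entropy(x), entropy(y)
    ig = hx + hy - joint_entropy(x, y)  # information gain
    return 2.0 * ig / (hx + hy) if hx + hy > 0 else 0.0

def cfs_merit(X, y, subset):
    k = len(subset)
    r_cf = np.mean([symmetrical_uncertainty(X[:, j], y) for j in subset])
    r_ff = np.mean([symmetrical_uncertainty(X[:, a], X[:, b])
                    for i, a in enumerate(subset) for b in subset[i + 1:]]) if k > 1 else 0.0
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 300)
noisy = np.where(rng.random(300) < 0.2, 1 - y, y)   # redundant, noisy copy of y
X = np.stack([y, noisy, rng.integers(0, 2, 300)], axis=1)

# Adding a redundant copy of an already-selected feature lowers the merit
print(cfs_merit(X, y, [0]), cfs_merit(X, y, [0, 1]))
```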
2.2.2 Embedded Methods
These methods are popular because they perform feature selection while constructing the classifier, removing the preprocessing feature selection step. Some popular algorithms are support vector machines (SVM) using recursive feature elimination (RFE) Guyon2002 (), random forests (RF) Breiman1996 () and the Least Absolute Shrinkage and Selection Operator (LASSO) Tibshirani1996 (). We compare the LASSO and RF methods for feature selection on the stress hotspot dataset.
Least Absolute Shrinkage and Selection Operator (LASSO) Tibshirani1996 ():
LASSO is linear regression with ℓ1 regularization Tibshirani1996 (). A linear model

ŷ = X w + b

is constructed on the training data X, y by minimizing ‖y − Xw − b‖² + α‖w‖₁, where w is a p-dimensional vector of weights corresponding to the feature dimensions X_1, …, X_p. The ℓ1 regularization term (α‖w‖₁) helps in feature selection by pushing the weights of correlated features to zero, thus preventing overfitting and improving model performance. Model interpretation is possible by ranking the features according to the LASSO feature weights. However, it has been shown that for a given regularization strength α, if the features have redundancy, inconsistent subsets can be selected bach2008bolasso (). Nonetheless, LASSO has been shown to provide good prediction accuracy by reducing model variance without substantially increasing the bias, while providing better model interpretability. We used the scikit-learn implementation to compute our results pedregosa2011scikit ().
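LASSO's selection behavior can be sketched on synthetic data (the alpha value is an arbitrary assumption): with two nearly duplicate columns, the ℓ1 penalty typically keeps one and zeroes the other, and an uninformative column is zeroed as well.

```python
# LASSO as an embedded selector: the l1 penalty drives some weights
# exactly to zero. Synthetic data; alpha=0.1 is an arbitrary choice.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 300
x0 = rng.normal(size=n)
X = np.stack([x0,
              x0 + 0.01 * rng.normal(size=n),   # near-duplicate of column 0
              rng.normal(size=n)], axis=1)       # pure noise column
y = 2.0 * x0 + 0.1 * rng.normal(size=n)

lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)  # typically one of the two correlated columns is zeroed
```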
Random Forest Permutation Accuracy importance (RF PAI) Breiman1996 ():
The random forest is a nonlinear multivariate model built on an ensemble of decision trees. It can be used to determine feature importance using its inbuilt feature importance measure Breiman1996 (). For each tree in the model, the values of a feature X_j are randomly permuted while all other features are kept unchanged. When the permuted variable X_j, together with the remaining unchanged variables, is used to predict the response, the number of observations classified correctly decreases substantially if the original variable was associated with the response. Thus, a reasonable measure for feature importance is the difference in prediction accuracy before and after permuting X_j. The feature importance calculated this way is known as Permutation Accuracy Importance (PAI) and was computed using the scikit-learn package in Python pedregosa2011scikit ().
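A hedged sketch of PAI using scikit-learn's permutation_importance with a random forest follows; the synthetic dataset is constructed (via shuffle=False) so that the first two columns are the informative ones, which the importance scores should recover.

```python
# Permutation accuracy importance sketched with scikit-learn's
# permutation_importance on a random forest; synthetic data only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# With shuffle=False, the informative columns come first (columns 0 and 1)
X, y = make_classification(n_samples=400, n_features=6, n_informative=2,
                           n_redundant=0, shuffle=False, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Permute each feature n_repeats times and average the accuracy drop
result = permutation_importance(rf, X, y, n_repeats=10, random_state=0)
print(result.importances_mean)
print(np.argsort(-result.importances_mean)[:2])  # the informative features
```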
2.2.3 Wrapper Methods
Wrapper methods test feature subsets using a model hypothesis. They can detect feature dependencies, i.e., features that become important in the presence of each other. Because they are computationally expensive, they often use greedy search strategies (forward selection and backward elimination sutter1993comparison ()), which are fast and avoid overfitting, to find the best nested subset of features.
FeaLect Algorithm Zare2013 ():
The number of features selected by LASSO depends on the regularization parameter α, and in the presence of highly correlated features, LASSO arbitrarily selects one feature from a group of correlated features Zou2005 (). The set of possible solutions for all LASSO regularization strengths is given by the regularization path, which can be recovered in a computationally efficient way using the Least Angle Regression (LARS) algorithm Efron2004 (). It has been shown that LASSO selects the relevant variables with probability one, and all others with a positive probability bach2008bolasso (). An improvement on LASSO, the Bolasso feature selection algorithm, was developed based on this property bach2008bolasso () in 2008. In this method, the dataset is bootstrapped, and a LASSO model with a fixed regularization strength is fit to each subset. Finally, the intersection of the LASSO-selected features in each subset is chosen to get a consistent feature subset.
In 2013, the FeaLect algorithm, an improvement over the Bolasso algorithm, was developed based on the combinatorial analysis of regression coefficients estimated using LARS Zare2013 (). FeaLect considers the full regularization path, and computes the feature importance using a combinatorial scoring method, as opposed to simply taking the intersection with Bolasso. The FeaLect scoring scheme measures the quality of each feature in each bootstrapped sample, and averages them to select the most relevant features, providing a robust feature selection method. We used the R implementation of FeaLect to compute our results Zare2015 ().
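The Bolasso idea that FeaLect builds on can be sketched in a few lines; the fixed alpha and the number of bootstrap rounds are arbitrary assumptions, and FeaLect itself goes further by scoring features combinatorially over the full LARS regularization path.

```python
# Bolasso-style sketch: fit LASSO on bootstrap resamples and intersect
# the selected supports. Synthetic data; alpha=0.05 is an assumption.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=n)  # 0, 1 relevant

support = None
for _ in range(20):                       # bootstrap rounds
    idx = rng.integers(0, n, size=n)      # resample with replacement
    coef = Lasso(alpha=0.05).fit(X[idx], y[idx]).coef_
    sel = set(np.flatnonzero(np.abs(coef) > 1e-8).tolist())
    support = sel if support is None else support & sel  # intersect supports

print(sorted(support))  # features selected in every bootstrap
```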
Recursive Feature Elimination (RFE) Guyon2002 ():
A number of common ML techniques (such as linear regression, support vector machines (SVM), decision trees, Naive Bayes, the perceptron, etc.) provide feature weights that capture multivariate interaction effects between features Guyon2003 (). To interpret the relative importance of the variables from these model feature weights, RFE was introduced in the context of support vector machines (SVM) Guyon2002 () for obtaining compact gene subsets from DNA microarray data.
To find the best feature subset, instead of doing an exhaustive search over all feature combinations, RFE uses a greedy approach, which has been shown to reduce the effect of correlation bias in variable importance measures Gregorutti2016 (). RFE uses backward elimination by taking the given model (SVM, random forests, linear regression etc.) and discarding the worst feature (by absolute classifier weight or feature ranking), and repeating the process over increasingly smaller feature subsets until the best model hypothesis is achieved. The weights of this optimal model are used to rank features. Although this feature ranking might not be the optimal ranking for individual features, it is often used as a variable importance measure Gregorutti2016 (). We used the scikitlearn implementation of RFE with random forest classifier to come up with a feature ranking for our dataset.
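A minimal sketch of RFE with a random forest base estimator, as used in this work, on synthetic data (the final subset size of 5 is an arbitrary assumption):

```python
# RFE with a random forest base estimator: repeatedly drop the
# lowest-importance feature until 5 remain. Synthetic data only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

X, y = make_classification(n_samples=300, n_features=12, n_informative=3,
                           n_redundant=2, shuffle=False, random_state=0)

rfe = RFE(RandomForestClassifier(n_estimators=50, random_state=0),
          n_features_to_select=5, step=1).fit(X, y)
print(rfe.support_)   # boolean mask of retained features
print(rfe.ranking_)   # 1 = retained; larger = eliminated earlier
```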
3 Results and Discussion
Table 2: Feature importance scores from each feature selection method, and the AUC of random forest models built on the selected features.

| Feature | Pearson correlation | CFS | RFE | RF | Linear regression | Ridge regression | LASSO regression | FeaLect |
|---|---|---|---|---|---|---|---|---|
| — | 0.29 | 1 | 1 | 53.43 | 27.37 | 27.36 | 26.01 | 245.0 |
| — | 0.39 | 0 | 1 | 0.15 | 22.72 | 22.69 | 14.78 | 145.00 |
| EquivalentDiameters | 0.01 | 0 | 1 | 0.05 | 0.15 | 0.15 | 0.08 | 79.47 |
| GBEuc | 0.01 | 0 | 1 | 0.12 | 0.22 | 0.22 | 0.12 | 71.47 |
| — | 0.18 | 0 | 1 | 0.31 | 7.29 | 7.31 | 10.35 | 41.27 |
| Neighborhoods | 0.01 | 0 | 22 | 0.01 | 0.10 | 0.10 | 0.00 | 5.53 |
| — | 0.48 | 1 | 1 | 8.74 | 74.78 | 74.61 | 52.99 | 5.00 |
| TJEuc | 0.01 | 0 | 2 | 0.07 | 0.97 | 0.97 | 0.44 | 4.93 |
| — | 0.14 | 1 | 16 | 0.03 | 80.46 | 79.96 | 19.17 | 1.0 |
| AvgMisorientations | 0.31 | 0 | 1 | 8.95 | 32.08 | 32.09 | 32.05 | 0.83 |
| NumNeighbors | 0.01 | 0 | 23 | 0.01 | 0.18 | 0.17 | 0.03 | 0.50 |
| — | 0.12 | 0 | 9 | 0.03 | 4.05 | 4.04 | 0.00 | 0.0 |
| — | 0.09 | 0 | 1 | 0.72 | 3.46 | 3.46 | 2.19 | 0.0 |
| — | 0.00 | 0 | 1 | 0.22 | 0.09 | 0.09 | 0.00 | 0.0 |
| — | 0.17 | 0 | 4 | 0.02 | 0.86 | 0.86 | 0.03 | 0.0 |
| NumCells | 0.01 | 0 | 18 | 0.04 | 1.3e6 | 0.11 | 0.21 | 0.0 |
| — | 0.49 | 0 | 1 | 26.80 | 38.03 | 37.83 | 8.37 | 0.0 |
| KernelAvg | 0.01 | 0 | 25 | 0.0 | 0.22 | 0.22 | 0.00 | 0.0 |
| — | 0.07 | 0 | 5 | 0.01 | 0.49 | 0.49 | 0.00 | 0.0 |
| — | 0.13 | 1 | 3 | 3.4 | 66.42 | 65.94 | 7.68 | 0.0 |
| — | 0.00 | 0 | 11 | 0.03 | 0.58 | 0.57 | 0.00 | 0.0 |
| — | 0.09 | 0 | 21 | 0.01 | 0.21 | 0.24 | 0.19 | 0.0 |
| — | 0.00 | 0 | 12 | 0.01 | 0.76 | 0.76 | 0.23 | 0.0 |
| — | 0.00 | 0 | 10 | 0.01 | 0.13 | 0.13 | 0.00 | 0.0 |
| — | 0.16 | 0 | 15 | 0.01 | 0.17 | 0.14 | 0 | 0.0 |
| — | 0.07 | 0 | 14 | 0.02 | 1.10 | 1.10 | 0.00 | 0.0 |
| QPEuc | 0.01 | 0 | 6 | 0.02 | 0.57 | 0.57 | 0.00 | 0.0 |
| — | 0.00 | 0 | 7 | 0.03 | 0.34 | 0.34 | 0.05 | 0.0 |
| — | 0.00 | 1 | 24 | 0.02 | 0.04 | 0.04 | 0.00 | 0.0 |
| FeatureVolumes | 0.01 | 0 | 13 | 0.04 | 1.3e6 | 0.11 | 0.00 | 0.0 |
| — | 0.04 | 0 | 17 | 0.01 | 0.79 | 0.79 | 0.00 | 0.0 |
| — | 0.00 | 0 | 8 | 0.01 | 2.9e4 | 0.07 | 0.00 | 0.0 |
| — | 0.04 | 0 | 19 | 0.01 | 1.21 | 1.20 | 0.00 | 0.0 |
| — | 0.00 | 1 | 20 | 0.01 | 2.9e4 | 0.07 | 0.00 | 0.0 |
| training AUC with selected features (%) | 84.02 | 82.51 | 84.24 | 83.82 | 84.20 | 84.19 | 84.31 | 84.28 |
| validation AUC with selected features (%) | 80.46 | 80.45 | 80.73 | 80.19 | 80.72 | 80.61 | 80.83 | 80.75 |

Random forest model AUC without feature selection: 71.94%.
Table 2 shows the feature importances calculated using filter based methods: Pearson correlation and CFS; embedded methods: Random Forest (RF), linear regression, ridge regression (ℓ2 regularization) and LASSO regression; and finally wrapper methods: RFE and FeaLect. The shaded cells denote the features that were finally selected to build RF models, whose corresponding performances are noted. The input data was scaled by minimum and maximum values to [0, 1]. Figure 1 shows the correlation matrix for the features and the target.
Pearson correlation can be used for feature selection, resulting in a good model. However, this measure makes implicit orthogonality assumptions between variables, and the coefficient does not take mutual information between features into account. Additionally, this method only detects linear correlations, which might not capture many physical phenomena.
The feature subset selected by CFS contains features with higher class correlation and lower redundancy, which translates to a good predictive model. However, although we know grain geometry and neighborhood are important to hotspot formation, CFS does not select any geometry-based features, and it fails to provide an individual feature ranking.
Linear regression, ridge regression and LASSO are closely related linear models. A simple linear model results in huge weights for some features (NumCells, FeatureVolumes), likely due to overfitting, and hence is unsuitable for deducing variable importance. Ridge regression compensates for this problem by using ℓ2 regularization, but the weights are distributed among the redundant features, which might lead to incorrect conclusions. LASSO regression overcomes this problem by pushing the weights of correlated features to zero, resulting in a good feature subset. The features ranked highest by LASSO are dominated by crystallographic descriptors such as AvgMisorientations; the first geometry-based feature appears far down the list, which seems to underestimate the physical importance of such features. A drawback of deriving insights from LASSO-selected features is that LASSO arbitrarily selects a few representatives from the correlated features, and the number of features selected depends heavily on the regularization strength. Thus the models become unstable, because changes in the training subset can result in different selected features. Hence these methods are not ideal for deriving physical insights from the model.
Random forest models also provide an embedded feature ranking module. The RF PAI importance seems to focus only on the features derived from the HCP c-axis orientation, the average misorientation and the prismatic Schmid factor, while discounting most of the geometry-derived features. RF PAI suffers from correlation bias due to the preferential selection of correlated features during the tree-building process Strobl2008 (). As the number of correlated variables increases, the feature importance score for each variable decreases. Often, less relevant variables replace the predictive ones (due to correlation) and thus receive undeserved, boosted importance Tolosi2011 (). Random forest variable importance can also be biased in situations where the features vary in their scale of measurement or number of categories, because the underlying Gini gain splitting criterion is a biased estimator and can be affected by multiple testing effects Strobl2007 (). From Figure 1, we found that all the geometry-based features are highly correlated with each other; therefore this ranking is unsuitable for deducing physical insights.
Hence, we move to wrapper-based methods for feature importance. Recursive feature elimination (RFE) has been shown to reduce the effect of correlation on the importance measure Gregorutti2016 (). RFE with an underlying random forest model selects a feature subset containing two geometry-based features (GBEuc and EquivalentDiameters); however, it fails to give an individual ranking among the features.
FeaLect provides a robust feature selection method by compensating for the uncertainty in LASSO due to arbitrary selection among correlated variables, and for the dependence of the number of selected variables on the regularization strength. Table 2 lists the FeaLect-selected variables in decreasing order of score. We find that the top two important features are derived from the grain crystallography, with geometry-derived features coming next. This suggests that both texture- and geometry-based features are important. Linear regression based methods such as these tell us which features are important by themselves, as opposed to RF PAI, which indicates the features that become important due to interactions between them (via RF models) Guyon2003 (). The FeaLect method provides the best estimate of the feature importance ranking, which can then be used to extract physical insights. This method also divides the features into three classes: informative features, irrelevant features that cause model overfitting, and redundant features Zare2013 (). The most informative features include EquivalentDiameters, GBEuc, Neighborhoods and TJEuc; irrelevant features such as AvgMisorientations cause model overfitting; the remaining features are redundant.
A number of the selected features directly or indirectly represent the HCP c-axis orientation, such as the c-axis angles and the basal Schmid factor. It is interesting that the pyramidal Schmid factor is chosen as important. From Figure 1, we can see that the HCP c-axis orientation of hot grains aligns with the sample Y axis, which means these grains have a low elastic modulus. Since the c-axis is perpendicular to the tensile axis (sample Z), the deformation along the tensile direction can be accommodated by prismatic slip in these grains, and if pyramidal slip is occurring, it means they have a very high stress Mangal2017c (). This explains the high importance of the pyramidal Schmid factor. From the Pearson correlation coefficients in Figure 1, we can observe that stress hotspots form in grains with low basal and pyramidal Schmid factors and a high prismatic Schmid factor.
From Figure 1, we can see that none of the grain geometry descriptors has a direct correlation with stress, yet they are still selected by FeaLect. This points to the fact that these variables become important in association with others. We analyzed these features in detail in Mangal2017c () and found that hotspots lie closer to grain boundaries (GBEuc), triple junctions (TJEuc) and quadruple points (QPEuc), and prefer to form in smaller grains.
There is a subtle distinction between the physical impact of a variable on the target and the variables that work best for a given model. From Table 2, we can see that a random forest model built on the entire feature set without feature selection has an AUC of 71.94%. All the feature selection techniques improve the performance of the random forest model to a validation AUC of about 81%. However, to draw physical interpretations, it is important to use a feature selection technique which: 1) keeps the original representation of the features, 2) is not biased by correlations/redundancies among features, 3) is insensitive to the scale of variable values, 4) is stable to changes in the training dataset, 5) takes multivariate dependencies between the features into account, and 6) provides an individual feature ranking measure.
4 Conclusions
We have used different feature selection techniques and demonstrated that while all techniques lead to an improvement in model performance, only the FeaLect method helps us determine the underlying importance of the features by themselves.

- All feature selection techniques result in an improvement in the AUC metric for stress hotspot classification.
- Correlation based feature selection and recursive feature elimination are computationally expensive to run, and give only a feature subset ranking.
- Random forest embedded feature ranking is biased against correlated features and hence should not be used to derive physical insights.
- Linear regression based feature selection techniques can objectively denote the most important features, but have their flaws. The FeaLect algorithm compensates for the variability in LASSO regression, providing a robust feature ranking that can be used to derive insights.
- Stress hotspot formation under uniaxial tensile deformation is determined by a combination of crystallographic and geometric microstructural descriptors.
- It is essential to choose a feature selection method that can find this dependence even when features are redundant or correlated.
Acknowledgements.
This work was performed at Carnegie Mellon University and has been supported by the United States National Science Foundation under award numbers DMR-1307138 and DMR-1507830. The authors are grateful to the authors of the skfeature and sklearn Python libraries, who made their source code available through the Internet. We would also like to thank the reviewers for their thorough work.

References
 (1) J. O’Mara, B. Meredig, K. Michel, Materials Data Infrastructure: A Case Study of the Citrination Platform to Examine Data Import, Storage, and Access, Jom 68(8), 2031 (2016). DOI 10.1007/s1183701619840
 (2) A. Dima, S. Bhaskarla, C. Becker, M. Brady, C. Campbell, P. Dessauw, R. Hanisch, U. Kattner, K. Kroenlein, M. Newrock, A. Peskin, R. Plante, S.Y. Li, P.F. Rigodiat, G.S. Amaral, Z. Trautt, X. Schmitt, J. Warren, S. Youssef, Informatics Infrastructure for the Materials Genome Initiative, Jom 68(8), 2053 (2016). DOI 10.1007/s1183701620004
 (3) A. Mangal, E.A. Holm, Applied Machine Learning to predict stress hotspots I: Face Centered Cubic Materials (2018)
 (4) A. Mangal, E.A. Holm, Applied Machine Learning to predict stress hotspots II: Hexagonal Close Packed Materials (2018)
 (5) A.D. Orme, I. Chelladurai, T.M. Rampton, D.T. Fullwood, A. Khosravani, M.P. Miles, R.K. Mishra, Insights into twinning in Mg AZ31 : A combined EBSD and machine learning study, Computational Materials Science 124, 353 (2016)
 (6) K. Ch’Ng, J. Carrasquilla, R.G. Melko, E. Khatami, Machine learning phases of strongly correlated fermions, Physical Review X 7(3), 1 (2017). DOI 10.1103/PhysRevX.7.031038
 (7) J. Ling, M. Hutchinson, E. Antono, S. Paradiso, B. Meredig, HighDimensional Materials and Process Optimization using Datadriven Experimental Design with WellCalibrated Uncertainty Estimates, Integrating Materials and Manufacturing Innovation 6(3), 207 (2017). DOI 10.1007/s401920170098z. URL http://arxiv.org/abs/1704.07423{%}0Ahttp://dx.doi.org/10.1007/s401920170098z
 (8) A.O. Oliynyk, E. Antono, T.D. Sparks, L. Ghadbeigi, M.W. Gaultois, B. Meredig, A. Mar, HighThroughput MachineLearningDriven Synthesis of FullHeusler Compounds, Chemistry of Materials 28(20), 7324 (2016). DOI 10.1021/acs.chemmater.6b02724
 (9) M.E. Wall, A. Rechtsteiner, L.M. Rocha, in A practical approach to microarray data analysis (Springer, 2003), pp. 91–109
 (10) S. Mika, B. Scholkopf, A. Smola, K.R. Muller, M. Scholz, G. Rätsch, in Advances in neural information processing systems (1999), pp. 536–542. URL http://papers.nips.cc/paper/1491kernelpcaanddenoisinginfeaturespaces.pdf
 (11) G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science 313(5786), 504 (2006). DOI 10.1126/science.1127647
 (12) L. Yu, H. Liu, in Proceedings of the 20th international conference on machine learning (ICML03) (2003), pp. 856—863. DOI citeulikearticleid:3398512. URL http://www.aaai.org/Papers/ICML/2003/ICML03111.pdf
 (13) I. Guyon, A. Elisseeff, An Introduction to Variable and Feature Selection, Journal of Machine Learning Research (JMLR) 3(3), 1157 (2003). DOI 10.1016/j.aca.2011.07.027
 (14) L. Van Der Maaten, E. Postma, J. Van Den Herik, Dimensionality Reduction : A Comparative Review, Journal of Machine Learning Research (JMLR) 10(2009), 66 (2009). DOI 10.1080/13506280444000102. URL http://www.uvt.nl/ticc
 (15) Y. Saeys, I. Inza, P. Larranaga, Gene expression A review of feature selection techniques in bioinformatics, Bioinformatics 23(19), 2507 (2007). DOI 10.1093/bioinformatics/btm344
 (16) F. Lu, E. Petkova, A comparative study of variable selection methods in the context of developing psychiatric screening instruments, Statistics in Medicine 33(3), 401 (2014). DOI 10.1002/sim.5937
 (17) J.K. Wegner, H. Fröhlich, A. Zell, Feature selection for descriptor based classification models. 1. Theory and GASEC algorithm, Journal of Chemical Information and Computer Sciences 44(3), 921 (2004). DOI 10.1021/ci0342324
 (18) M.A. Hall, L.A. Smith, Feature Selection for Machine Learning: Comparing a Correlation-based Filter Approach to the Wrapper, in Proceedings of the International FLAIRS Conference, p. 5 (1999)
 (19) I. Cohen, Y. Huang, J. Chen, J. Benesty, in Noise Reduction in Speech Processing (Springer, 2009), pp. 1–4. DOI 10.1007/978-3-642-00296-0. URL http://link.springer.com/10.1007/978-3-642-00296-0
 (20) H. Zare, G. Haffari, A. Gupta, R.R. Brinkman, Scoring relevancy of features based on combinatorial analysis of Lasso with application to lymphoma diagnosis, BMC Genomics 14(Suppl 1), S14 (2013). DOI 10.1186/1471-2164-14-S1-S14. URL http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3549810&tool=pmcentrez&rendertype=abstract
 (21) L. Breiman, Out-Of-Bag Estimation, Technical Report, Statistics Department, University of California, Berkeley (1996)
 (22) R. Tibshirani, Regression Shrinkage and Selection via the Lasso, Journal of the Royal Statistical Society, Series B 58(1), 267 (1996). DOI 10.2307/2346178. URL http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.35.7574
 (23) M.A.S. Qidwai, A.C. Lewis, A.B. Geltmacher, Using image-based computational modeling to study microstructure–yield correlations in metals, Acta Materialia 57(14), 4233 (2009). DOI 10.1016/j.actamat.2009.05.021. URL http://dx.doi.org/10.1016/j.actamat.2009.05.021
 (24) D. Hull, D.E. Rimmer, The growth of grainboundary voids under stress, Philosophical Magazine 4(42), 673 (1959). DOI 10.1080/14786435908243264. URL http://dx.doi.org/10.1080/14786435908243264
 (25) R.A. Lebensohn, A.K. Kanjarla, P. Eisenlohr, An elasto-viscoplastic formulation based on fast Fourier transforms for the prediction of micromechanical fields in polycrystalline materials, International Journal of Plasticity 32–33, 59 (2012). DOI 10.1016/j.ijplas.2011.12.005. URL http://dx.doi.org/10.1016/j.ijplas.2011.12.005
 (26) J.A. Hanley, B.J. McNeil, The meaning and use of the area under a receiver operating characteristic (ROC) curve., Radiology 143(1), 29 (1982). DOI 10.1148/radiology.143.1.7063747. URL https://doi.org/10.1148/radiology.143.1.7063747
 (27) Z. Zhao, F. Morstatter, S. Sharma, S. Alelyani, A. Anand, H. Liu, Advancing Feature Selection Research, ASU Feature Selection Repository, Arizona State University, pp. 1–28 (2010). URL http://featureselection.asu.edu/featureselection_techreport.pdf
 (28) J. Pearl, Heuristics: Intelligent Search Strategies for Computer Problem Solving (Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1984)
 (29) I. Guyon, Gene selection for cancer classification using support vector machines, Machine Learning 46(1–3), 389 (2002). DOI 10.1023/A:1012487302797
 (30) F.R. Bach, in Proceedings of the 25th International Conference on Machine Learning (2008), pp. 33–40. DOI 10.1145/1390156.1390161. URL http://arxiv.org/abs/0804.1302
 (31) F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al., Scikit-learn: Machine Learning in Python, Journal of Machine Learning Research 12(Oct), 2825 (2011)
 (32) J.M. Sutter, J.H. Kalivas, Comparison of forward selection, backward elimination, and generalized simulated annealing for variable selection, Microchemical Journal 47(1–2), 60 (1993). DOI 10.1006/mchj.1993.1012
 (33) H. Zou, T. Hastie, Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society, Series B: Statistical Methodology 67(2), 301 (2005). DOI 10.1111/j.1467-9868.2005.00503.x
 (34) B. Efron, T. Hastie, I. Johnstone, R. Tibshirani, Least Angle Regression, The Annals of Statistics 32(2), 407 (2004). DOI 10.1214/009053604000000067. URL http://statweb.stanford.edu/~tibs/ftp/lars.pdf
 (35) H. Zare, FeaLect: Scores Features for Feature Selection (2015). URL https://cran.r-project.org/package=FeaLect
 (36) B. Gregorutti, B. Michel, P. Saint-Pierre, Correlation and variable importance in random forests, Statistics and Computing pp. 1–20 (2016). DOI 10.1007/s11222-016-9646-1
 (37) C. Strobl, A.L. Boulesteix, T. Kneib, T. Augustin, A. Zeileis, Conditional variable importance for random forests, BMC Bioinformatics 9, 307 (2008). DOI 10.1186/1471-2105-9-307
 (38) L. Toloşi, T. Lengauer, Classification with correlated features: Unreliability of feature ranking and solutions, Bioinformatics 27(14), 1986 (2011). DOI 10.1093/bioinformatics/btr300
 (39) C. Strobl, A.L. Boulesteix, A. Zeileis, T. Hothorn, Bias in random forest variable importance measures: illustrations, sources and a solution, BMC Bioinformatics 8, 25 (2007). DOI 10.1186/1471-2105-8-25. URL http://www.ncbi.nlm.nih.gov/pubmed/17254353