# Uncertainty Quantification in Ensembles of Honest Regression Trees using Generalized Fiducial Inference

Suofei Wu¹    Jan Hannig²    Thomas C. M. Lee¹
¹ swu@ucdavis.edu, Department of Statistics, University of California at Davis
² jan.hannig@unc.edu, Department of Statistics & Operations Research, University of North Carolina at Chapel Hill
November 10, 2019
###### Abstract

Due to their accuracies, methods based on ensembles of regression trees are a popular approach for making predictions. Some common examples include Bayesian additive regression trees, boosting and random forests. This paper focuses on honest random forests, which add honesty to the original form of random forests and are proved to have better statistical properties. The main contribution is a new method that quantifies the uncertainties of the estimates and predictions produced by honest random forests. The proposed method is based on the generalized fiducial methodology, and provides a fiducial density function that measures how likely each single honest tree is the true model. With such a density function, estimates and predictions, as well as their confidence/prediction intervals, can be obtained. The promising empirical properties of the proposed method are demonstrated by numerical comparisons with several state-of-the-art methods, and by applications to a few real data sets. Lastly, the proposed method is theoretically backed up by a strong asymptotic guarantee.

Keywords: additive regression trees, confidence intervals, FART, prediction intervals, random forests

## 1 Introduction

Ensemble learning is a popular method in regression and classification because of its robustness and accuracy (Mendes-Moreira et al., 2012). It is commonly used to make predictions for future observations. Denote the observed sample as $(X_i, Y_i)$, $i = 1, \ldots, n$, where the $Y_i$ are scalar responses and the $X_i$ are vector predictors. The general regression model is

$$Y_i = f(X_i) + \epsilon_i, \qquad (1)$$

where the iid noise terms $\epsilon_i$ follow $N(0, \sigma^2)$. An ensemble learning method approximates the model by a weighted sum of weak learners $T_j$ with weights $w_j$:

$$f(X_i) = \sum_{j=1}^{a} w_j T_j(X_i). \qquad (2)$$

A decision tree is a common choice of weak learner because of its accuracy and flexibility. Although a single tree suffers from high variance, an ensemble of decision trees keeps the accuracy while reducing the variance. Given their success, ensembles of trees have attracted a lot of attention. For example, random forests (Breiman, 2001) and bagging (Breiman, 1996) average decorrelated trees to obtain a more stable model with similar bias. While both bagging and random forests sample a different training set with replacement when growing each tree, random forests also consider a randomly selected subset of features for each split. Recently, Athey et al. (2019) proposed generalized random forests, a more general framework that extends naturally to other statistical tasks such as quantile regression and heterogeneous treatment effect estimation. All three methods use the basic ensemble method (BEM) (Perrone and Cooper, 1992) in regression, which sets all the $w_j$'s in (2) equal to $1/a$.

Wang et al. (2003) proposed a weighted ensemble approach for classification, where the classifiers are weighted by their accuracies in classifying their own training data. Their work can be straightforwardly extended to the regression case. Bayesian ensemble learning (Wu et al., 2007; Chipman et al., 2007) is another approach that takes a weighted average of the trees. The posterior probabilities are used as weights in this scenario.

Despite the above efforts, the study of uncertainty quantification for ensemble learning is somewhat limited. One notable exception is Wager et al. (2014), where the authors proposed a method that produces standard error estimates for random forest predictions. It is based on the jackknife and the infinitesimal jackknife (Efron, 2014) and can be used for constructing Gaussian confidence intervals. Figure 1 shows the result of applying their method to the Auto MPG data set. The goal is to predict the fuel economy of automobiles (in miles per gallon, MPG) using features. Further details of this data set can be found in Section 5.2 below. Following Wager et al. (2014), we randomly split the data into a training set of size and a testing set of size . The error bars in Figure 1 are standard error in each direction. The rate that these error bars cover the prediction-equals-observation diagonal is . This suggests that there is residual noise in the data that cannot be explained by the random forest model based on the available features.

Figure 1: Random forest predictions and 95% confidence intervals of the Auto MPG data set.

Moreover, in simulation experiments, the confidence intervals do not have a good coverage rate either, especially when there is a lot of noise in . We repeat the same simulation setting as in Wager et al. (2014) and report the results in Section 5.

Besides the work of Wager et al. (2014), Mentch and Hooker (2016) showed that under some strong assumptions, random forests based on subsampling are asymptotically normal, allowing for confidence intervals to accompany predictions. In addition, Chipman et al. (2010) developed a Bayesian Additive Regression Trees model (BART) that produces both point and interval estimates via posterior inference.

In this paper, we use generalized fiducial inference (Hannig et al., 2016) to construct a probability density function on the set of honest trees in an honest random forests model. We shall show that such a new ensemble method of honest trees provides more precise confidence intervals as well as point estimates.

The rest of this paper is organized as follows. First, a brief introduction to generalized fiducial inference is provided in Section 2. Then the main methodology is presented in Section 3 and the theoretical properties of the method are studied in Section 4. Section 5 illustrates the practical performance of the proposed method. Lastly, concluding remarks are offered in Section 6, while technical details are deferred to the appendix.

## 2 Generalized Fiducial Inference

Fiducial inference was first introduced by Fisher (1930). It aims to construct a statistical distribution on the parameter space when no prior information is available. Under such a condition, the use of the classical Bayesian framework receives criticism because it requires a prior distribution on the parameter space. Alternatively, Fisher considered a switching mechanism between the parameters and the observations, which is quite similar to how parameters are estimated by the maximum likelihood method. Despite Fisher’s continuous effort on the theory of fiducial inference, this framework was overlooked by the majority of the statistics community for several decades. Hannig et al. (2016) give a detailed introduction to the history of the original fiducial inference.

In recent years, there has been renewed interest in extending Fisher’s idea. The modified versions include Dempster-Shafer theory (Dempster, 2008; Martin et al., 2010), inferential models (Martin and Liu, 2015a, b), confidence distributions (Xie and Singh, 2013; Xie et al., 2011) and generalized inference (Weerahandi, 1995, 2013). In this paper we focus on the successful extension known as generalized fiducial inference (GFI) (Hannig et al., 2006). It has been successfully applied to a variety of problems, including wavelet regression (Hannig and Lee, 2009), ultrahigh-dimensional regression (Lai et al., 2015), logistic regression (Liu and Hannig, 2016) and nonparametric additive models (Gao et al., 2019).

Under the GFI framework, the relationship between the data $y$ and the parameter $\theta$ is expressed by a data generating equation $G$:

$$y = G(u, \theta),$$

where $u$ is a random component with a known distribution. Suppose for the moment that the inverse function $G^{-1}$ exists for any $u$; i.e., one can always calculate $\theta = G^{-1}(u, y)$ for any $u$. Then a random sample $\tilde\theta_1, \tilde\theta_2, \ldots$ of $\theta$ can be obtained by first generating a random sample $\tilde u_1, \tilde u_2, \ldots$ of $u$ and then calculating

$$\tilde\theta_1 = G^{-1}(\tilde u_1, y), \quad \tilde\theta_2 = G^{-1}(\tilde u_2, y), \quad \ldots$$

Notice that the roles of and are “switched” in the above, as in the maximum likelihood method of Fisher. See Hannig et al. (2016) for strategies to ensure the existence of $G^{-1}$. We call the above random sample a generalized fiducial sample of $\theta$ and the corresponding density $r(\theta)$ the generalized fiducial density of $\theta$.
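As a minimal illustration of the sampling recipe above, consider the simplest possible data generating equation, $y = \theta + u$ with $u \sim N(0, 1)$ and a single observation $y$. This toy model, the function name `fiducial_sample_location` and the observed value 2.5 are assumptions made for illustration only; they are not part of the proposed method.

```python
import random
import statistics


def fiducial_sample_location(y, n_draws, seed=0):
    """Fiducial sample for theta in the toy model y = theta + u, u ~ N(0, 1)."""
    rng = random.Random(seed)
    # Invert the data generating equation: theta = G^{-1}(u, y) = y - u,
    # applied to fresh draws of the random component u.
    return [y - rng.gauss(0.0, 1.0) for _ in range(n_draws)]


draws = fiducial_sample_location(y=2.5, n_draws=20000)
center = statistics.mean(draws)   # should be close to the observed y
spread = statistics.stdev(draws)  # should be close to 1, the sd of u
```

In this toy case the fiducial sample is simply centered at the observed $y$ with the spread of $u$; the tree-based setting in Section 3 follows the same logic with a richer data generating equation.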

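Beyond this conceptually appealing and well-defined construction, Hannig et al. (2016) provide a user-friendly formula for the fiducial density:

$$r(\theta) = \frac{h(y,\theta)\, J(y,\theta)}{\int_{\Theta} h(y,\theta')\, J(y,\theta')\, d\theta'}, \qquad (3)$$

where $h(y,\theta)$ is the likelihood and the function

$$J(y,\theta) = D\left\{ \nabla_{\theta} G(u,\theta) \Big|_{u = G^{-1}(y,\theta)} \right\}$$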

with .

Formula (3) assumes that the model dimension is known. When model selection is involved, the data generating equation of a certain model $T$ becomes:

$$y = G(\theta_T, u, T).$$

Similar to maximum likelihood, GFI tends to assign higher probabilities to models with higher complexity (i.e., a larger number of parameters). As with penalized maximum likelihood, Hannig and Lee (2009) suggested adding an extra penalty term to (3). The marginal fiducial probability of a specific model $T$ then becomes:

$$r(T) = \frac{\int r_T(\theta_T)\, n^{-l(T)/2}\, d\theta_T}{\sum_{T' \in \mathcal{T}} \int r_{T'}(\theta_{T'})\, n^{-l(T')/2}\, d\theta_{T'}}, \qquad (4)$$

where $\mathcal{T}$ is the set of all possible models and $l(T)$ is the number of parameters in model $T$; see Hannig et al. (2016) for the derivation. Therefore in practice, when model selection is involved, to generate a fiducial sample for $(T, \theta_T)$, one can first choose a model $T$ using (4), and then sample $\theta_T$ from (3) given $T$. We note that closed-form expressions for (3) and (4) do not always exist, so one may need to resort to MCMC techniques.

## 3 Methodology

### 3.1 Regression Trees and Honest Regression Trees

A decision tree models the function $f$ in (1) by recursively partitioning the feature space (i.e., the space of all $X_i$'s) into different subsets. These subsets are called leaves. Let $X$ be any point in the feature space and $L(X)$ be the leaf that contains $X$. The decision tree estimate for $f(X)$ is the average of those responses that are in the same leaf as $X$:

$$\hat f(X) = \frac{1}{|\{i : X_i \in L(X)\}|} \sum_{i : X_i \in L(X)} Y_i.$$

Naturally one may like a partition that minimizes the loss function:

$$\sum_{i=1}^{n} \{Y_i - \hat f(X_i)\}^2.$$

However, in practice the number of potential partitions is huge, which makes it infeasible to find the partition that minimizes the above loss. Therefore, a greedy search algorithm is usually used, which consists of the following steps:

1. Start from the root.

2. Choose the best feature and split point that minimize the loss function.

3. Recursively repeat the previous step on the child nodes.

4. Stop when

• Each node achieves the minimum node size pre-specified by the user, or

• The loss cannot be further reduced by extra partitioning.
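The greedy search above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the implementation used in the paper: the helper names (`sse`, `best_split`, `grow`, `predict`), the nested-dict tree representation, the use of midpoints as candidate split points, and the reading of the minimum node size as a lower bound on the size of each child are all choices made here for illustration.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X, y, min_node_size=1):
    """Return (feature, threshold, loss) of the best loss-reducing split, or None."""
    best = None
    parent_loss = sse(y)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between consecutive observed values.
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if len(left) < min_node_size or len(right) < min_node_size:
                continue
            loss = sse(left) + sse(right)
            if loss < parent_loss and (best is None or loss < best[2]):
                best = (j, t, loss)
    return best


def grow(X, y, min_node_size=1):
    """Recursively grow a tree stored as nested dicts."""
    split = best_split(X, y, min_node_size)
    if split is None:  # stop: no admissible split reduces the loss
        return {"value": sum(y) / len(y)}
    j, t, _ = split
    left = [(row, yi) for row, yi in zip(X, y) if row[j] <= t]
    right = [(row, yi) for row, yi in zip(X, y) if row[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": grow([r for r, _ in left], [v for _, v in left], min_node_size),
        "right": grow([r for r, _ in right], [v for _, v in right], min_node_size),
    }


def predict(tree, x):
    """Follow the splits down to a leaf and return its average response."""
    while "feature" in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["value"]


# Toy data: the response depends only on the first feature.
X_toy = [[0.1, 0.9], [0.2, 0.1], [0.8, 0.8], [0.9, 0.2]]
y_toy = [0.0, 0.0, 10.0, 10.0]
tree = grow(X_toy, y_toy)
```

On the toy data the first split is made on the first feature, after which neither child can be improved and the recursion stops.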

One criticism of the above decision tree is that the same data are used both to grow the tree and to make predictions. In response to this criticism, and to ensure good statistical behavior, honest decision trees were proposed (Biau, 2012; Denil et al., 2014). An honest tree is grown using one subsample of the training data and uses a different subsample for making predictions at its leaves. If no observations fall into a specific leaf, its prediction is made by one of its parents. A corresponding honest random forest can be generated by using the same mechanism that generates random forests from decision trees. Wager and Athey (2018) proved that, under some regularity conditions, the leaves of an honest tree become small in all dimensions of the feature space when $n$ becomes large. Hence, if we also assume that the true generating function is Lipschitz continuous, honest trees are unbiased and so are honest random forests.
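The honesty idea can be illustrated with a short sketch. It assumes that the tree structure has already been grown on one subsample and is stored as nested dicts (a hypothetical representation chosen here); the function `fill_honest_values` then computes the leaf values from a separate estimation subsample, and lets an empty leaf inherit the value of its parent, as described above.

```python
def fill_honest_values(node, X_est, y_est, parent_value=None):
    """Set node values from the estimation sample; empty nodes inherit the parent's value."""
    value = sum(y_est) / len(y_est) if y_est else parent_value
    node["value"] = value
    if "feature" in node:
        j, t = node["feature"], node["threshold"]
        go_left = [x[j] <= t for x in X_est]
        for side, keep in (("left", True), ("right", False)):
            X_side = [x for x, g in zip(X_est, go_left) if g == keep]
            y_side = [v for v, g in zip(y_est, go_left) if g == keep]
            fill_honest_values(node[side], X_side, y_side, value)
    return node


def predict(node, x):
    """Follow the splits down to a leaf and return its honest value."""
    while "feature" in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["value"]


# Structure assumed to have been grown on a *different* subsample.
structure = {
    "feature": 0, "threshold": 0.5,
    "left": {},
    "right": {"feature": 1, "threshold": 0.5, "left": {}, "right": {}},
}
# Estimation subsample used only for the leaf values.
X_est = [[0.2, 0.3], [0.4, 0.9], [0.7, 0.8], [0.9, 0.9]]
y_est = [1.0, 3.0, 10.0, 12.0]
honest_tree = fill_honest_values(structure, X_est, y_est)
```

Here no estimation point falls into the leaf with first feature above 0.5 and second feature at most 0.5, so that leaf reuses the average of its parent node.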

### 3.2 Ensemble of Honest Trees using Generalized Fiducial Inference

The goal is to solve the regression problem (1) using an ensemble of honest trees and apply GFI to conduct statistical inference.

Suppose there exists a binary tree structured function such that for any ; we will call any such tree a true model. One example is the “AND” function mentioned in Wager et al. (2014):

$$Y = 10 \times \mathrm{AND}(X_1 > 0.3;\ X_2 > 0.3;\ X_3 > 0.3;\ X_4 > 0.3) + \epsilon.$$

A corresponding binary tree is shown in Figure 2.
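For readers who want to experiment with the “AND” model, the following sketch simulates data from it. The uniform design on $[0,1]^d$, the default noise level and the function names are assumptions made for illustration; only the signal $10 \times \mathrm{AND}(\cdot)$ is taken from the formula above.

```python
import random


def and_signal(x):
    """Noise-free response: 10 if the first four coordinates all exceed 0.3, else 0."""
    return 10.0 if all(v > 0.3 for v in x[:4]) else 0.0


def simulate_and(n, d=4, sigma=1.0, seed=0):
    """Draw n observations; X is uniform on [0, 1]^d (an assumption), noise is N(0, sigma^2)."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(d)] for _ in range(n)]
    Y = [and_signal(x) + rng.gauss(0.0, sigma) for x in X]
    return X, Y


X_sim, Y_sim = simulate_and(n=50)
```

Because the signal is piecewise constant on axis-aligned rectangles, a binary tree with splits at 0.3 on the first four features represents it exactly.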

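We want to assign a generalized fiducial probability to each tree $T$ measuring how likely the true generating function is contained in $T$. Suppose $T$ has $k$ leaves, $L_1, \ldots, L_k$. Denote the number of observations in the $j$-th leaf as $n_j$ and hence $\sum_{j=1}^{k} n_j = n$. Also denote the response value of the $j$-th leaf as $\mu_j$, which can be estimated by the average of all the $Y_i$'s that belong to this leaf:

$$\hat\mu_j = \frac{1}{|L_j|} \sum_{i : X_i \in L_j} Y_i.$$

Write $\mu = (\mu_1, \ldots, \mu_k)$. First we calculate the generalized fiducial density for $(\mu, \sigma^2)$. From (3) it can be shown that

$$r(\mu,\sigma^2) \propto J(\mu,\sigma^2)\, \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\frac{\sum_i (Y_i - \mu_{i'})^2}{2\sigma^2}}\, n^{-\frac{l(T)}{2}}, \qquad (5)$$

where $i'$ is the index of the leaf that $X_i$ belongs to; i.e., $X_i \in L_{i'}$.

The Jacobian term in (5) is

$$J(\mu,\sigma^2) = \frac{\sqrt{SSE \prod_{j} n_j}}{\sigma}$$

with $SSE = \sum_i (Y_i - \bar Y_i)^2$ as the sum of squared errors, where $\bar Y_i = \hat\mu_{i'}$ is the average of all $Y$'s belonging to the leaf $L_{i'}$.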

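Now we can calculate the marginal fiducial density for the tree $T$ using (4), for which the numerator becomes:

$$
\begin{aligned}
r(T) &\propto \int\!\!\int \frac{\sqrt{\prod n_i\, SSE}}{\sigma} \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} e^{-\frac{\sum (Y_i - \bar Y_i + \bar Y_i - \mu_i)^2}{2\sigma^2}}\, n^{-\frac{l(T)}{2}}\, d\mu_1 \cdots d\mu_k\, d\sigma^2 \qquad (6) \\
&= \int\!\!\int \frac{\sqrt{\prod n_i\, SSE}}{\sigma} \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} e^{-\frac{\sum (Y_i - \bar Y_i)^2}{2\sigma^2} - \frac{\sum (\bar Y_i - \mu_i)^2}{2\sigma^2}}\, n^{-\frac{l(T)}{2}}\, d\mu_1 \cdots d\mu_k\, d\sigma^2 \\
&= \int \frac{\sqrt{SSE}}{(2\pi)^{\frac{n-l(T)}{2}}\, \sigma^{n-l(T)+1}}\, e^{-\frac{SSE}{2\sigma^2}}\, n^{-\frac{l(T)}{2}}\, d\sigma^2 \\
&= \int SSE^{\frac{1}{2} - \frac{n-l(T)+1}{2} + 1}\, 2^{-\frac{1}{2}}\, \pi^{-\frac{n-l(T)}{2}}\, \xi^{\frac{n-l(T)+1}{2} - 2}\, e^{-\xi}\, n^{-\frac{l(T)}{2}}\, d\xi \\
&\propto \frac{\Gamma\!\left(\frac{n-l(T)-1}{2}\right) n^{-\frac{l(T)}{2}}}{SSE^{\frac{n-l(T)}{2} - 1}\, \pi^{\frac{n-l(T)}{2}}}.
\end{aligned}
$$

Here the second line uses the fact that the cross term vanishes within each leaf, the third line integrates out $\mu_1, \ldots, \mu_k$ (with $k = l(T)$), and the fourth line uses the substitution $\xi = SSE/(2\sigma^2)$.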

### 3.3 A Practical Method for Generating Fiducial Samples

This subsection presents a practical method for generating a fiducial sample of honest trees using (6).

Even when $n$ is only of moderate size, the set of all possible trees is huge and therefore we only consider a subset of trees $\mathcal{T}^*$. More precisely, $\mathcal{T}^*$ is an honest random forest with an adequate number of trees such that one can assume it contains at least one true model . Each tree samples observations without replacement to grow, and uses a different group of observations to calculate the averages $\hat\mu_j$'s (i.e., make predictions) at the leaves.

Loosely, three steps are involved in generating a fiducial sample of trees. The first step is to generate the structure of the tree, the second step is to generate the noise variance, and the last step is to generate the leaf values $\mu_j$'s. We begin with approximating the generalized fiducial density in (6) as follows.

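For each tree $T \in \mathcal{T}^*$, we calculate:

$$R(T) = \frac{\Gamma\!\left(\frac{n-l(T)-1}{2}\right) n^{-\frac{l(T)}{2}}}{SSE^{\frac{n-l(T)}{2}-1}\, \pi^{\frac{n-l(T)}{2}}}$$

and approximate $r(T)$ with

$$r(T) = \frac{R(T)}{\sum_{T' \in \mathcal{T}^*} R(T')}. \qquad (7)$$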
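Since $R(T)$ involves a Gamma function and powers of order $n$, evaluating it directly can overflow or underflow even for moderate $n$. A numerically safer way to obtain the weights in (7) is to work on the log scale and normalize with the log-sum-exp trick. The sketch below does this with the standard library only; the function names and the three example trees (given as pairs of number of leaves and SSE) are made up for illustration.

```python
import math


def log_R(n, n_leaves, sse):
    """log R(T) for a tree with n_leaves leaves and sum of squared errors sse."""
    m = n - n_leaves
    return (
        math.lgamma((m - 1) / 2)          # log Gamma((n - l(T) - 1) / 2)
        - 0.5 * n_leaves * math.log(n)    # penalty n^{-l(T)/2}
        - (m / 2 - 1) * math.log(sse)     # SSE^{-((n - l(T))/2 - 1)}
        - (m / 2) * math.log(math.pi)     # pi^{-(n - l(T))/2}
    )


def fiducial_weights(n, trees):
    """Normalized weights r(T) as in (7); trees is a list of (n_leaves, sse) pairs."""
    logs = [log_R(n, n_leaves, sse) for n_leaves, sse in trees]
    top = max(logs)  # subtract the maximum before exponentiating
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical forest of three trees fitted to n = 100 observations.
weights = fiducial_weights(100, [(2, 400.0), (4, 95.0), (8, 93.0)])
```

In this made-up example the four-leaf tree receives almost all of the weight: it fits far better than the two-leaf tree, while the eight-leaf tree reduces the SSE too little to offset the penalty.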

After a tree $T$ is sampled from (7), $\sigma^2$ is sampled from

$$SSE/\sigma^2 \sim \chi^2_{n-l(T)}, \qquad (8)$$

where $l(T)$ denotes the number of leaves in $T$. To sample the $\mu_j$'s, we draw without replacement observations from the part of the data that was not used to grow $T$. Denote these drawn observations as $(X_i^*, Y_i^*)$. Then a generalized fiducial tree sample $\tilde T$ can be obtained by updating the leaf values of $T$ using

$$\tilde\mu_j = \frac{1}{|L_j|} \sum_{i : X_i^* \in L_j} Y_i^* + \tilde\sigma z_i, \qquad (9)$$

where , with being the number of leaves in .

Repeating the above procedure multiple times provides multiple copies of the fiducial sample . Statistical inference can then be conducted in a similar fashion as with a posterior sample in the Bayesian context. For any design point , averaging over all the ’s will deliver a point estimate for . The and percentiles of will give a confidence interval for , while the and percentiles of will provide a prediction interval for the corresponding .

We summarize the above procedure in Algorithm 1.
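The three sampling steps and the resulting intervals can be sketched as follows for a single design point. This is only an illustration under explicit assumptions, not a transcription of Algorithm 1: each tree is summarized by a hypothetical dict holding its SSE, its number of leaves, and the honest mean and size of the leaf containing the design point; the leaf-level noise is assumed to be $z \sim N(0, 1/|L_j|)$; and prediction draws are assumed to add an independent $N(0, \tilde\sigma^2)$ term to the fitted value.

```python
import math
import random


def chi_square(rng, df):
    """A chi-square(df) variate, generated as Gamma(shape=df/2, scale=2)."""
    return rng.gammavariate(df / 2.0, 2.0)


def fiducial_draws(trees, weights, n, n_draws, seed=0):
    """Return fiducial draws of the fitted value and of a future response."""
    rng = random.Random(seed)
    fits, preds = [], []
    for _ in range(n_draws):
        # Step 1: sample a tree with probability r(T), as in (7).
        tree = rng.choices(trees, weights=weights, k=1)[0]
        # Step 2: sample sigma from SSE / sigma^2 ~ chi-square(n - l(T)), as in (8).
        sigma = math.sqrt(tree["sse"] / chi_square(rng, n - tree["n_leaves"]))
        # Step 3: perturb the honest leaf mean, as in (9).
        # ASSUMPTION: z ~ N(0, 1 / leaf_size).
        z = rng.gauss(0.0, 1.0 / math.sqrt(tree["leaf_size"]))
        fit = tree["leaf_mean"] + sigma * z
        fits.append(fit)
        # ASSUMPTION: a future response adds independent N(0, sigma^2) noise.
        preds.append(fit + sigma * rng.gauss(0.0, 1.0))
    return fits, preds


def percentile_interval(draws, level=0.95):
    """Equal-tailed interval from the empirical percentiles of the draws."""
    s = sorted(draws)
    lo = s[int(math.floor((1 - level) / 2 * (len(s) - 1)))]
    hi = s[int(math.ceil((1 + level) / 2 * (len(s) - 1)))]
    return lo, hi


# Hypothetical two-tree forest evaluated at one design point, n = 100.
trees = [
    {"sse": 95.0, "n_leaves": 4, "leaf_mean": 10.1, "leaf_size": 20},
    {"sse": 93.0, "n_leaves": 8, "leaf_mean": 9.8, "leaf_size": 9},
]
fits, preds = fiducial_draws(trees, weights=[0.99, 0.01], n=100, n_draws=5000)
point = sum(fits) / len(fits)          # point estimate
conf_int = percentile_interval(fits)   # confidence interval for the regression function
pred_int = percentile_interval(preds)  # prediction interval for a new response
```

The prediction interval is wider than the confidence interval because it also carries the residual noise, which is the ingredient that the Introduction identifies as missing from standard-error-based intervals.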

## 4 Asymptotic Properties

The theoretical properties of the proposed method are established under the following conditions:

A1) The generating function has a binary tree structure. Denote the training data set of as ,  . We say this binary tree is a true model, if for any in the training set , . Notice that such a binary tree is not unique. We denote the collections of true models as :

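$$\mathcal{T}_0 = \{T : T \text{ is a true model}\}.$$

A2) Let $\mathcal{T}$ be the collection of honest trees in a trained random forest model. We assume that $\mathcal{T}$ should have at least one tree that belongs to $\mathcal{T}_0$:

$$P(\mathcal{T} \cap \mathcal{T}_0 = \emptyset) \to 0.$$

A3) Meanwhile, we assume that the size of $\mathcal{T}$ is not too large for practical use:

$$|\mathcal{T}| = o\left(\sqrt{\frac{\log n}{\log\log n}}\right).$$

A4) Let $H_T$ be the projection matrix of $T$; i.e., $H_T = (h_{ij})$ with

$$h_{ij} = \begin{cases} \dfrac{1}{n_i}, & \text{if } X_i \in L(X_j) \text{ in } T, \\ 0, & \text{else.} \end{cases}$$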

Let , where . Assume

 (10)

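A5) Denote the number of leaves of a tree $T$ as $l(T)$. Let $l_0$ be the minimum number of leaves of the trees in $\mathcal{T} \cap \mathcal{T}_0$:

$$l_0 = \min\{l(T) : T \in \mathcal{T} \cap \mathcal{T}_0\}$$

and $\mathcal{T}_{l_0}$ be the trees in $\mathcal{T} \cap \mathcal{T}_0$ with number of leaves equal to $l_0$:

$$\mathcal{T}_{l_0} = \{T : l(T) = l_0,\ T \in \mathcal{T} \cap \mathcal{T}_0\}.$$

A6) Denote $L$ as the maximum number of leaves in $\mathcal{T}$:

$$L = \max\{l(T) : T \in \mathcal{T}\}.$$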

Assume that is at most , with .
Under the above assumptions, we have

###### Theorem 4.1.
$$\sum_{T \in \mathcal{T}_{l_0}} r(T) \xrightarrow{p} 1.$$

The proof can be found in the appendix.

## 5 Empirical Properties

This section illustrates the practical performance of the above proposed method via a sequence of simulation experiments and real data applications. We shall call the proposed method FART, short for Fiducial Additive Regression Trees.

### 5.1 Simulation Experiments

In our simulation experiments three test functions were used:

• Cosine: ,

• XOR: ,

• AND: .

The design points ’s are iid and the errors ’s are iid . We tested different combinations of and (see below). These experimental configurations have been used by previous authors (e.g., Chipman et al., 2010; Wager et al., 2014). The number of repetitions for each experimental configuration is 1000.

We applied FART to the simulated data and calculated the mean coverages of various confidence intervals. We also applied the following three methods to obtain other confidence intervals:

• BART: Bayesian Additive Regression Trees of Chipman et al. (2010),

• Bootstrap: the bootstrap method of Mentch and Hooker (2016), and

• Jackknife: the infinitesimal jackknife method of Wager et al. (2014).

Tables 1, 2 and 3 report the empirical coverage rates of, respectively, the 90%, 95% and 99% confidence intervals produced by these methods for , where is a random future data point.
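For concreteness, an empirical coverage rate of the kind reported in such tables is simply the fraction of intervals that contain their targets. The helper below is a generic sketch with a hypothetical name and toy numbers; it is not tied to any of the four methods.

```python
def coverage_rate(intervals, targets):
    """Fraction of (lower, upper) intervals that contain the matching target value."""
    hits = sum(1 for (lo, hi), t in zip(intervals, targets) if lo <= t <= hi)
    return hits / len(targets)


# Toy check: three of the four intervals cover their target.
toy_intervals = [(0.0, 1.0), (2.0, 3.0), (4.0, 5.0), (6.0, 7.0)]
toy_targets = [0.5, 2.5, 5.5, 6.0]
toy_rate = coverage_rate(toy_intervals, toy_targets)
```

A method is well calibrated at a nominal level when this rate, averaged over repetitions, is close to that level.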

Overall FART provided quite good and stable coverages. The performances of Bootstrap and Jackknife are somewhat disappointing. The possible reasons are that in Jackknife the uncertainty of the residual noise was not taken into account, and that Bootstrap is, in general, not asymptotically unbiased, as argued in Wager and Athey (2018). BART sometimes gave better results than FART. However, in those cases where BART was better, the results from FART were not far behind, whereas in some other cases BART’s results could be substantially worse than FART’s. Therefore it seems that FART is the preferred and safe method if one is targeting .

Next we examine the coverage rates for the noise standard deviation $\sigma$. Since Bootstrap and Jackknife do not produce convenient confidence intervals for $\sigma$, we only focus on FART and BART. The results are summarized in Tables 4, 5 and 6. Overall one can see that FART is the preferred method, although its performance for the test function AND was disappointing.

Lastly we provide the histogram of the generalized fiducial samples $\tilde\sigma$ of $\sigma$, which can be seen as an approximation of the marginal generalized fiducial density of $\sigma$. The histogram is displayed in Figure 3. These samples were for the case when the test function is XOR with . One can see that the histogram is approximately bell-shaped and centered at the true value of $\sigma$.

Figure 3: Histogram of the generalized fiducial samples $\tilde\sigma$ of $\sigma$.

### 5.2 Real Data Examples

This subsection reports the coverage rates of the FART prediction intervals on five real data sets:

• Air Foil: This is a NASA data set, obtained from a series of aerodynamic and acoustic tests of two and three-dimensional airfoil blade sections conducted in an anechoic wind tunnel (Dua and Graff, 2017). Five features were selected to predict the aerofoil noise. We used observations as the training data set and observations as test data.

• Auto MPG: This data set contains eight features to predict city-cycle fuel consumption in miles per gallon (Asuncion and Newman, 2007; Dua and Graff, 2017). After discarding samples with missing entries, we split the rest of the observations into a training set of size and a test set of size .

• CCPP: This data set contains data points collected from a Combined Cycle Power Plant over six years (2006-2011), when the power plant was set to work with full load (Tüfekci, 2014; Kaya et al., 2012). There are four features aiming to predict the full load electrical power. We split the data into a training set of size and a test set of size .

• Boston House: Originally published by Harrison Jr and Rubinfeld (1978), a collection of observations associated with features from the U.S. Census Service is used to predict the median value of owner-occupied homes. We split the data into a training set of size and a test set of size .

• CCS: In civil engineering, concrete is the most important material (Yeh, 1998). This data set consists of eight features to predict the concrete compressive strength. We split it into a training set of size and a test set of size .

For each of the above data sets, we applied FART to the training data set to construct 95% prediction intervals for the observations in the test data set. We repeated this procedure 100 times by randomly splitting the whole data set into a training data set and a test data set. The empirical coverage rates of these prediction intervals are reported in Table 7. In addition, as a comparison to Figure 1, we plotted the coverage of the FART prediction intervals on the same Auto MPG data in Figure 4. One can see that FART performed very well.

Figure 4: FART predictions and 95% prediction intervals for the Auto MPG data set.

## 6 Conclusion

In this paper, we applied generalized fiducial inference to ensembles of honest regression trees. In particular, we derived a fiducial probability for each honest tree in an honest random forest, which shows how likely the tree contains the true model. A practical procedure was developed to generate fiducial samples of the tree models, the variance of the errors and the predictions. These samples can further be used for point estimation and for constructing confidence intervals and prediction intervals. The proposed method was shown to enjoy desirable theoretical properties, and it compares favorably with other state-of-the-art methods in simulation experiments and real data analysis.

## Appendix A Technical Details

The appendix provides the proof for Theorem 4.1.

WLOG, assume and fix . We first prove that

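$$\max_{T' \notin \mathcal{T}_{l_0},\, T' \in \mathcal{T}} R(T')/R(T) \xrightarrow{p} 0.$$

Rewrite

$$R(T')/R(T) = \exp\{-D_1 - D_2\},$$

where

$$D_1 = \frac{n - l(T') - 1}{2} \log\frac{SSE_{T'}}{SSE_T},$$

$$D_2 = \log\frac{\Gamma\left(\frac{n-l_0}{2}\right)}{\Gamma\left(\frac{n-l(T')}{2}\right)} + \frac{l_0 - l(T')}{2}\log\pi + \frac{l_0 - l(T')}{2}\log SSE_T + \frac{l(T') - l_0}{2}\log(n).$$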

Case 1: .

Now calculate

$$SSE_{T'} - SSE_T = \Delta_{T'} + 2\mu^\top (I - H_{T'})\varepsilon - \varepsilon^\top (H_{T'} - H_T)\varepsilon. \qquad (11)$$

Let , consider the second term in equation (11) and denote , then

$$\mu^\top (I - H_{T'})\varepsilon = \sqrt{\Delta_{T'}}\, Z_{T'}$$

and since . Furthermore,

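$$
\begin{aligned}
P\left(\max_{T' \in \mathcal{T}} \left| Z_{T'} / \sqrt{c_{l(T')}} \right| > 1\right)
&\le |\mathcal{T}| \max_{T' \in \mathcal{T}} P\left(Z_{T'}^2 > c_{l(T')}\right)
= |\mathcal{T}| \max_{T' \in \mathcal{T}} P\left(\chi^2_1 > c_{l(T')}\right) \\
&\le |\mathcal{T}| \max_{T' \in \mathcal{T}} \left(c_{l(T')}\, e^{1 - c_{l(T')}}\right)^{1/2} \longrightarrow 0 \quad \text{as } n \longrightarrow \infty.
\end{aligned}
$$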

Therefore, .
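The last inequality in the display above is a Chernoff-type tail bound for a chi-square variable, $P(\chi^2_k > c) \le \{(c/k)\, e^{1 - c/k}\}^{k/2}$ for $c > k$, applied with $k = 1$. As a quick numerical sanity check (not part of the proof), the sketch below compares the bound with the exact tail of $\chi^2_1$, which is available through the complementary error function. The function names and the chosen thresholds are illustrative.

```python
import math


def chi2_1_tail(c):
    """Exact P(chi^2_1 > c), using chi^2_1 = Z^2 with Z standard normal."""
    return math.erfc(math.sqrt(c / 2.0))


def chernoff_bound(c, k=1):
    """Bound ((c/k) * e^{1 - c/k})^{k/2} on P(chi^2_k > c), valid for c > k."""
    r = c / k
    return (r * math.exp(1.0 - r)) ** (k / 2.0)


# (threshold, exact tail, bound) for a few thresholds above the degrees of freedom.
checks = [(c, chi2_1_tail(c), chernoff_bound(c)) for c in (2.0, 5.0, 10.0, 20.0)]
```

For every threshold the exact tail lies below the bound, and both shrink as the threshold grows, which is what drives the maximum over the forest to zero in the argument above.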

Consider the third term in equation (11):

Notice that . Thus,

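$$
\begin{aligned}
P\left(\max_{T \in \mathcal{T}} \varepsilon^\top H_T \varepsilon / c_{l(T)} > 1\right)
&\le |\mathcal{T}| \max_{T \in \mathcal{T}} P\left(\varepsilon^\top H_T \varepsilon > c_{l(T)}\right)
= |\mathcal{T}| \max_{T \in \mathcal{T}} P\left(\chi^2_{l(T)} > c_{l(T)}\right) \\
&\le |\mathcal{T}| \max_{T \in \mathcal{T}} \left(\frac{c_{l(T)}}{l(T)}\, e^{1 - \frac{c_{l(T)}}{l(T)}}\right)^{l(T)/2}
= |\mathcal{T}| \max_{T \in \mathcal{T}} \left(\frac{e \log\log n}{\log n}\right)^{l(T)/2} \longrightarrow 0 \quad \text{as } n \longrightarrow \infty.
\end{aligned}
$$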

Therefore, , and as . Thus, we have as .

 P(χ2n−L

which means

 P(minT′∈Tχ2n−l(T′)

Thus,

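$$
\begin{aligned}
D_1 &= \frac{n - l(T') - 1}{2} \log\left(\frac{SSE_{T'}}{SSE_T}\right)
= -\frac{n - l(T') - 1}{2} \log\left(\frac{SSE_T}{SSE_{T'}}\right) \\
&= -\frac{n - l(T') - 1}{2} \log\left(1 + \frac{SSE_T - SSE_{T'}}{SSE_{T'}}\right)
\ge \frac{n - l(T') - 1}{2} \cdot \frac{SSE_{T'} - SSE_T}{SSE_{T'}}
= \Omega_p(\Delta_{T'}).
\end{aligned}
$$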

Moreover, . Therefore, .

Case 2 : and .

Recall is fixed. First notice that =, where is a chi-square random variable depending on with degrees of freedom .

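$$
P\left(\max_{T' \in \mathcal{T}_0,\, l(T') > l_0} \frac{\chi^2_{l(T')-l_0}(T')}{(l(T') - l_0)\log\log n} \ge 1\right)
\le |\mathcal{T}| \max_{T' \in \mathcal{T}_0,\, l(T') > l_0} \left(\log\log n\; e^{1 - \log\log n}\right)^{\frac{l(T') - l_0}{2}}
= |\mathcal{T}| \left(\frac{e \log\log n}{\log n}\right)^{\frac{1}{2}} \to 0.
$$

It implies that

$$\chi^2_{l(T')-l_0} = O_p\left(c_{l(T')-l_0}\right).$$

Therefore,

$$
\frac{n - l(T') - 1}{2} \log\frac{SSE_{T'}}{SSE_T}
= -\frac{n - l(T') - 1}{2} \log\left(1 + \frac{\chi^2_{l(T')-l_0}(T')}{\chi^2_{n-l(T')}}\right)
\ge -\frac{n - l(T') - 1}{2} \left(\frac{\chi^2_{l(T')-l_0}(T')}{\chi^2_{n-l(T')}}\right)
= \Omega_p\left(-c_{l(T')-l_0}\right),
$$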

uniformly over , . Thus, we show that

$$D_1 = \Omega_p\left(-\frac{l(T')}{2}\log\log n\right).$$

Meanwhile, the calculation of is similar to Case 1, , so we have .

Combining Case 1 and Case 2, we have:

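$$\max_{T' \notin \mathcal{T}_{l_0},\, T' \in \mathcal{T}} R(T')/R(T) = O_p(1/n).$$

Furthermore,

$$\sum_{T' \notin \mathcal{T}_{l_0},\, T' \in \mathcal{T}} R(T')/R(T) \le |\mathcal{T}| \max_{T' \notin \mathcal{T}_{l_0},\, T' \in \mathcal{T}} R(T')/R(T) \le \frac{|\mathcal{T}|}{n} \xrightarrow{p} 0.$$

Equivalently,

$$\sum_{T \in \mathcal{T}_{l_0}} r(T) \xrightarrow{p} 1.$$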

## References

• Asuncion and Newman (2007) Asuncion, A. and Newman, D. (2007) UCI machine learning repository. \$mlearn/{MLR}epository.html.
• Athey et al. (2019) Athey, S., Tibshirani, J. and Wager, S. (2019) Generalized random forests. The Annals of Statistics, 47, 1148–1178.
• Biau (2012) Biau, G. (2012) Analysis of a random forests model. Journal of Machine Learning Research, 13, 1063–1095.
• Breiman (1996) Breiman, L. (1996) Bagging predictors. Machine learning, 24, 123–140.
• Breiman (2001) Breiman, L. (2001) Random forests. Machine learning, 45, 5–32.
• Chipman et al. (2007) Chipman, H. A., George, E. I. and McCulloch, R. E. (2007) Bayesian ensemble learning. In Advances in Neural Information Processing Systems, 265–272.
• Chipman et al. (2010) Chipman, H. A., George, E. I., McCulloch, R. E. et al. (2010) BART: Bayesian additive regression trees. The Annals of Applied Statistics, 4, 266–298.
• Dempster (2008) Dempster, A. P. (2008) The Dempster–Shafer calculus for statisticians. International Journal of Approximate Reasoning, 48, 365–377.
• Denil et al. (2014) Denil, M., Matheson, D. and De Freitas, N. (2014) Narrowing the gap: Random forests in theory and in practice. In International Conference on Machine Learning, 665–673.
• Dua and Graff (2017) Dua, D. and Graff, C. (2017) UCI machine learning repository.
• Efron (2014) Efron, B. (2014) Estimation and accuracy after model selection. Journal of the American Statistical Association, 109, 991–1007.
• Fisher (1930) Fisher, R. A. (1930) Inverse probability. In Mathematical Proceedings of the Cambridge Philosophical Society, vol. 26, 528–535. Cambridge University Press.
• Gao et al. (2019) Gao, Q., Lai, R. C. S., Lee, T. C. M. and Li, Y. (2019) Uncertainty quantification for high-dimensional sparse nonparametric additive models. Technometrics. To appear.
• Hannig et al. (2016) Hannig, J., Iyer, H., Lai, R. C. S. and Lee, T. C. M. (2016) Generalized fiducial inference: A review and new results. Journal of the American Statistical Association, 111, 1346–1361.
• Hannig et al. (2006) Hannig, J., Iyer, H. and Patterson, P. (2006) Fiducial generalized confidence intervals. Journal of the American Statistical Association, 101, 254–269.
• Hannig and Lee (2009) Hannig, J. and Lee, T. C. M. (2009) Generalized fiducial inference for wavelet regression. Biometrika, 96, 847–860.
• Harrison Jr and Rubinfeld (1978) Harrison Jr, D. and Rubinfeld, D. L. (1978) Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management, 5, 81–102.
• Kaya et al. (2012) Kaya, H., Tüfekci, P. and Gürgen, F. S. (2012) Local and global learning methods for predicting power of a combined gas & steam turbine. In Proceedings of the International Conference on Emerging Trends in Computer and Electronics Engineering ICETCEE, 13–18.
• Lai et al. (2015) Lai, R. C., Hannig, J. and Lee, T. C. M. (2015) Generalized fiducial inference for ultrahigh-dimensional regression. Journal of the American Statistical Association, 110, 760–772.
• Liu and Hannig (2016) Liu, Y. and Hannig, J. (2016) Generalized fiducial inference for binary logistic item response models. Psychometrika, 81, 290–324.
• Martin and Liu (2015a) Martin, R. and Liu, C. (2015a) Conditional inferential models: combining information for prior-free probabilistic inference. Journal of the Royal Statistical Society: Series B, 77, 195–217.
• Martin and Liu (2015b) Martin, R. and Liu, C. (2015b) Marginal inferential models: prior-free probabilistic inference on interest parameters. Journal of the American Statistical Association, 110, 1621–1631.
• Martin et al. (2010) Martin, R., Zhang, J., Liu, C. et al. (2010) Dempster–Shafer theory and statistical inference with weak beliefs. Statistical Science, 25, 72–87.
• Mendes-Moreira et al. (2012) Mendes-Moreira, J., Soares, C., Jorge, A. M. and Sousa, J. F. D. (2012) Ensemble approaches for regression: A survey. ACM Computing Surveys, 45, 10.
• Mentch and Hooker (2016) Mentch, L. and Hooker, G. (2016) Quantifying uncertainty in random forests via confidence intervals and hypothesis tests. The Journal of Machine Learning Research, 17, 841–881.
• Perrone and Cooper (1992) Perrone, M. P. and Cooper, L. N. (1992) When networks disagree: Ensemble methods for hybrid neural networks. Tech. rep.
• Tüfekci (2014) Tüfekci, P. (2014) Prediction of full load electrical power output of a base load operated combined cycle power plant using machine learning methods. International Journal of Electrical Power & Energy Systems, 60, 126–140.
• Wager and Athey (2018) Wager, S. and Athey, S. (2018) Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113, 1228–1242.
• Wager et al. (2014) Wager, S., Hastie, T. and Efron, B. (2014) Confidence intervals for random forests: The jackknife and the infinitesimal jackknife. The Journal of Machine Learning Research, 15, 1625–1651.
• Wang et al. (2003) Wang, H., Fan, W., Yu, P. S. and Han, J. (2003) Mining concept-drifting data streams using ensemble classifiers. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 226–235. ACM.
• Weerahandi (1995) Weerahandi, S. (1995) Generalized confidence intervals. In Exact Statistical Methods for Data Analysis, 143–168. Springer.
• Weerahandi (2013) Weerahandi, S. (2013) Exact statistical methods for data analysis. Springer Science & Business Media.
• Wu et al. (2007) Wu, Y., Tjelmeland, H. and West, M. (2007) Bayesian CART: Prior specification and posterior simulation. Journal of Computational and Graphical Statistics, 16, 44–66.
• Xie and Singh (2013) Xie, M.-g. and Singh, K. (2013) Confidence distribution, the frequentist distribution estimator of a parameter: A review. International Statistical Review, 81, 3–39.
• Xie et al. (2011) Xie, M.-g., Singh, K. and Strawderman, W. E. (2011) Confidence distributions and a unifying framework for meta-analysis. Journal of the American Statistical Association, 106, 320–333.
• Yeh (1998) Yeh, I.-C. (1998) Modeling of strength of high-performance concrete using artificial neural networks. Cement and Concrete research, 28, 1797–1808.