Uncertainty Quantification in Ensembles of Honest Regression Trees using Generalized Fiducial Inference
Abstract
Due to their accuracy, methods based on ensembles of regression trees are a popular approach for making predictions. Common examples include Bayesian additive regression trees, boosting and random forests. This paper focuses on honest random forests, which add honesty to the original form of random forests and have been shown to enjoy better statistical properties. The main contribution is a new method that quantifies the uncertainties of the estimates and predictions produced by honest random forests. The proposed method is based on the generalized fiducial methodology, and provides a fiducial density function that measures how likely it is that each individual honest tree is the true model. With such a density function, estimates and predictions, as well as their confidence/prediction intervals, can be obtained. The promising empirical properties of the proposed method are demonstrated by numerical comparisons with several state-of-the-art methods, and by applications to a few real data sets. Lastly, the proposed method is backed up by a strong asymptotic theoretical guarantee.
Keywords: additive regression trees, confidence intervals, FART, prediction intervals, random forests
1 Introduction
Ensemble learning is a popular approach in regression and classification because of its robustness and accuracy (Mendes-Moreira et al., 2012). It is commonly used to make predictions for future observations. Denote the observed sample as (x_i, y_i), i = 1, …, n, where the y_i's are scalar responses and the x_i's are vector predictors. The general regression model is
y_i = f(x_i) + ε_i,  i = 1, …, n,  (1)
where the iid noise terms ε_i follow N(0, σ²). An ensemble learning method approximates the model by a weighted sum of weak learners f_m with weights w_m:
f̂(x) = Σ_{m=1}^{M} w_m f_m(x).  (2)
The decision tree is a common choice for the weak learners because of its accuracy and flexibility. Although a single tree suffers from high variance, an ensemble of decision trees keeps the accuracy while reducing the variance. Given their success, ensembles of trees have attracted a lot of attention. For example, random forests (Breiman, 2001) and bagging (Breiman, 1996) take the average of decorrelated trees to obtain a more stable model with similar bias. While both bagging and random forests sample a different training set with replacement when growing each tree, random forests also consider a randomly selected subset of features for each split. Recently, Athey et al. (2019) proposed generalized random forests, which form a more general framework and extend naturally to other statistical tasks such as quantile regression and heterogeneous treatment effect estimation. All three methods use the basic ensemble method (BEM) (Perrone and Cooper, 1992) in regression, which takes all the weights w_m in (2) to be equal; i.e., w_m = 1/M.
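To make the BEM averaging in (2) concrete, the following minimal NumPy sketch (our own illustration, not code from any of the cited packages) bags depth-one regression trees ("stumps"): each stump is grown on a bootstrap resample using a random subset of features, and the ensemble prediction is the equally weighted average with w_m = 1/M.

```python
import numpy as np

def fit_stump(X, y):
    """Fit a depth-1 regression tree (stump): best single axis-aligned split."""
    best_sse, best_split = np.inf, None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:          # candidate thresholds
            left = X[:, j] <= t
            ml, mr = y[left].mean(), y[~left].mean()
            sse = ((y[left] - ml) ** 2).sum() + ((y[~left] - mr) ** 2).sum()
            if sse < best_sse:
                best_sse, best_split = sse, (j, t, ml, mr)
    return best_split

def stump_predict(split, X):
    j, t, ml, mr = split
    return np.where(X[:, j] <= t, ml, mr)

def bem_forest(X, y, B=50, m_features=None, rng=None):
    """Bagged stump ensemble with random feature subsets; BEM = plain average."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    m = m_features or max(1, p // 3)
    fits = []
    for _ in range(B):
        idx = rng.integers(0, n, n)                # bootstrap resample
        feats = rng.choice(p, size=m, replace=False)
        j, t, ml, mr = fit_stump(X[np.ix_(idx, feats)], y[idx])
        fits.append((feats[j], t, ml, mr))         # map back to original feature
    def predict(Xnew):
        preds = [stump_predict(s, Xnew) for s in fits]
        return np.mean(preds, axis=0)              # equal weights w_m = 1/B
    return predict
```

Replacing the stump with a deeper tree and tuning the feature-subset size recovers the usual random forests recipe.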
Wang et al. (2003) proposed a weighted ensemble approach for classification, where the classifiers are weighted by their accuracies in classifying their own training data. Their work can be straightforwardly extended to the regression case. Bayesian ensemble learning (Wu et al., 2007; Chipman et al., 2007) is another approach that takes a weighted average of the trees. The posterior probabilities are used as weights in this scenario.
Despite the above efforts, the study of uncertainty quantification for ensemble learning is somewhat limited. One notable exception is Wager et al. (2014), where the authors proposed a method that produces standard error estimates for random forests predictions. It is based on the jackknife and the infinitesimal jackknife (Efron, 2014) and can be used to construct Gaussian confidence intervals. Figure 1 shows the result of applying their method to the Auto MPG data set, where the goal is to predict the fuel economy of automobiles (in miles per gallon, MPG) from a set of features. Further details of this data set can be found in Section 5.2 below. Following Wager et al. (2014), we randomly split the data into a training set and a testing set. The error bars in Figure 1 are one standard error in each direction. The rate at which these error bars cover the prediction-equals-observation diagonal is low, suggesting that there is residual noise in the data that cannot be explained by the random forests model based on the available features.
Moreover, in simulation experiments, the confidence intervals do not achieve good coverage rates either, especially when the noise level in the data is high. We repeat the same simulation settings as in Wager et al. (2014) and report the results in Section 5.
Besides the work of Wager et al. (2014), Mentch and Hooker (2016) showed that, under some strong assumptions, random forests based on subsampling are asymptotically normal, allowing confidence intervals to accompany predictions. In addition, Chipman et al. (2010) developed the Bayesian Additive Regression Trees (BART) model, which produces both point and interval estimates via posterior inference.
In this paper, we use generalized fiducial inference (Hannig et al., 2016) to construct a probability distribution over the set of honest trees in an honest random forests model. We shall show that this new ensemble method of honest trees provides more precise confidence intervals as well as point estimates.
The rest of this paper is organized as follows. First, a brief introduction to generalized fiducial inference is provided in Section 2. Then the main methodology is presented in Section 3, and the theoretical properties of the method are studied in Section 4. Section 5 illustrates the practical performance of the proposed method. Lastly, concluding remarks are offered in Section 6, while technical details are deferred to the appendix.
2 Generalized Fiducial Inference
Fiducial inference was first introduced by Fisher (1930). It aims to construct a distribution on the parameter space when no prior information is available. Under such conditions, the usage of the classical Bayesian framework receives criticism because it requires a prior distribution on the parameter space. Alternatively, Fisher considered a switching mechanism between the parameters and the observations, which is quite similar to how parameters are estimated by the maximum likelihood method. Despite Fisher's continued efforts on the theory of fiducial inference, this framework was overlooked by the majority of the statistics community for several decades. Hannig et al. (2016) give a detailed introduction to the history of the original fiducial inference.
In recent years, there has been renewed interest in extending Fisher's idea. The modified versions include the Dempster–Shafer theory (Dempster, 2008; Martin et al., 2010), inferential models (Martin and Liu, 2015a, b), confidence distributions (Xie and Singh, 2013; Xie et al., 2011) and generalized inference (Weerahandi, 1995, 2013). In this paper we focus on the successful extension known as generalized fiducial inference (GFI) (Hannig et al., 2006). It has been successfully applied to a variety of problems, including wavelet regression (Hannig and Lee, 2009), ultra-high-dimensional regression (Lai et al., 2015), logistic regression (Liu and Hannig, 2016) and nonparametric additive models (Gao et al., 2019).
Under the GFI framework, the relationship between the data Y and the parameter θ is expressed by a data generating equation G:

Y = G(U, θ),

where U is a random component with a known distribution. Suppose for the moment that the inverse function G⁻¹ exists; i.e., for any realization u of U one can always calculate θ = G⁻¹(Y, u). Then a random sample of θ can be obtained by first generating a random sample U*₁, …, U*_K of U and then calculating θ*_k = G⁻¹(Y, U*_k), k = 1, …, K.

Notice that the roles of θ and U are “switched” in the above, as in the maximum likelihood method of Fisher. See Hannig et al. (2016) for strategies to ensure the existence of G⁻¹. We call the above random sample a generalized fiducial sample of θ, and the corresponding density the generalized fiducial density of θ.
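As a toy numerical illustration of this switching mechanism (our own example, not one from the paper), consider the simple location model Y = θ + U with U ~ N(0, 1), for which the inverse map is G⁻¹(y, u) = y − u:

```python
import numpy as np

rng = np.random.default_rng(0)

# Data generating equation: Y = G(U, theta) = theta + U, with U ~ N(0, 1).
theta_true = 3.0
y = theta_true + rng.normal()            # a single observed data point

# "Switch" the roles of theta and U: plug fresh copies U* of the random
# component into the inverse map theta* = G^{-1}(y, U*) = y - U*.
u_star = rng.normal(size=100_000)
theta_fiducial = y - u_star

# In this model the fiducial distribution of theta is N(y, 1),
# centered at the observed data point.
print(theta_fiducial.mean(), theta_fiducial.std())
```

The fiducial sample here coincides with the Bayesian posterior under a flat prior, which is a well-known feature of this particular location model.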
Beyond this conceptually appealing and well-defined construction, Hannig et al. (2016) provide a user-friendly formula for the fiducial density:

r(θ | y) = f(y | θ) J(y, θ) / ∫ f(y | θ′) J(y, θ′) dθ′,  (3)

where f(y | θ) is the likelihood and the function J is

J(y, θ) = D( ∇_θ G(u, θ) |_{u = G⁻¹(y, θ)} ),

with D(A) = (det AᵀA)^{1/2}.
Formula (3) assumes that the model dimension is known. When model selection is involved, the data generating equation under a candidate model M becomes:

Y = G_M(U, θ_M).
Similar to maximum likelihood, GFI tends to assign higher probabilities to models with higher complexity (i.e., a larger number of parameters). Thus, in the same spirit as penalized maximum likelihood, Hannig and Lee (2009) suggested adding an extra penalty term to (3). The marginal fiducial probability of a specific model M then becomes:

r(M) = ∫ f_M(y | θ_M) J_M(y, θ_M) n^{−|M|/2} dθ_M / Σ_{M′ ∈ 𝕄} ∫ f_{M′}(y | θ_{M′}) J_{M′}(y, θ_{M′}) n^{−|M′|/2} dθ_{M′},  (4)

where 𝕄 is the set of all possible models and |M| is the number of parameters in model M; see Hannig et al. (2016) for the derivation. Therefore, in practice, when model selection is involved, one can generate a fiducial sample by first choosing a model M using (4), and then sampling θ_M from (3) given M. We note that closed-form expressions for (3) and (4) do not always exist, so one may need to resort to MCMC techniques.
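In code, the two-stage recipe — pick a model from (4), then draw the parameter from (3) given that model — can be sketched as below. The per-model weights and parameter samplers are placeholders that, in a real problem, would come from closed forms or MCMC; the model labels and numbers in the toy usage are purely hypothetical.

```python
import numpy as np

def fiducial_sample(models, n_draws, rng=None):
    """Two-stage fiducial sampling sketch.

    `models` maps a model label to a pair (weight, sampler):
      weight  -- unnormalized marginal fiducial probability, as in (4)
      sampler -- draws theta from the fiducial density (3) given the model
    Both are assumed to be supplied by the user.
    """
    rng = np.random.default_rng(rng)
    labels = list(models)
    w = np.array([models[m][0] for m in labels], dtype=float)
    w /= w.sum()                                    # normalize (4)
    draws = []
    for _ in range(n_draws):
        m = labels[rng.choice(len(labels), p=w)]    # step 1: pick a model
        draws.append((m, models[m][1](rng)))        # step 2: theta given model
    return draws

# Hypothetical toy use: two candidate models with made-up weights/samplers.
models = {
    "M1": (0.7, lambda rng: rng.normal(0.0, 1.0)),
    "M2": (0.3, lambda rng: rng.normal(5.0, 1.0)),
}
sample = fiducial_sample(models, 10_000, rng=1)
```

The same skeleton applies when the model set is a forest of trees, as in Section 3, with the weights replaced by the (approximated) marginal fiducial probabilities of the trees.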
3 Methodology
3.1 Regression Trees and Honest Regression Trees
A decision tree models the function f in (1) by recursively partitioning the feature space (i.e., the space of all x's) into different subsets. These subsets are called leaves. Let x be any point in the feature space and L(x) be the leaf that contains x. The decision tree estimate of f(x) is the average of those responses y_i whose x_i's are in the same leaf as x:

f̂(x) = ( Σ_{i : x_i ∈ L(x)} y_i ) / #{i : x_i ∈ L(x)}.
Naturally, one would like a partition that minimizes the squared-error loss:

Σ_{i=1}^{n} { y_i − f̂(x_i) }².
However, a serious practical drawback is that the number of potential partitions is huge, which makes it infeasible to find the partition that minimizes the above loss. Therefore, a greedy search algorithm is usually considered, which consists of the following steps:

1. Start from the root.

2. Choose the best feature and split point that minimize the loss function.

3. Recursively repeat the previous step on the children nodes.

4. Stop when

   (a) each node reaches the minimum node size prespecified by the user, or

   (b) the loss cannot be further reduced by extra partitioning.

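The greedy steps above can be sketched as a short recursive routine. This is a generic CART-style implementation for illustration (not the authors' code), with the stopping rules expressed as a minimum node size and a no-improvement check:

```python
import numpy as np

def grow_tree(X, y, min_size=5):
    """Greedy regression tree growing, returning a nested dict."""
    node = {"value": y.mean()}                      # prediction if this is a leaf
    if len(y) < 2 * min_size:                       # stop: minimum node size
        return node
    base_sse = ((y - y.mean()) ** 2).sum()
    best = (base_sse, None)
    for j in range(X.shape[1]):                     # best feature and split point
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            if left.sum() < min_size or (~left).sum() < min_size:
                continue
            sse = ((y[left] - y[left].mean()) ** 2).sum() + \
                  ((y[~left] - y[~left].mean()) ** 2).sum()
            if sse < best[0]:
                best = (sse, (j, t))
    if best[1] is None:                             # stop: loss cannot be reduced
        return node
    j, t = best[1]
    left = X[:, j] <= t
    node.update(feature=j, threshold=t,             # recurse on the children
                left=grow_tree(X[left], y[left], min_size),
                right=grow_tree(X[~left], y[~left], min_size))
    return node

def predict_tree(node, x):
    """Route a single point x down the tree to its leaf value."""
    while "feature" in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["value"]
```

Exhaustively scanning every observed value of every feature at each node is exactly the greedy (rather than globally optimal) search described above.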
One criticism of the above decision tree is that the same data are used both to grow the tree and to make predictions. To ensure good statistical behavior and as a response to this criticism, honest decision trees were proposed (Biau, 2012; Denil et al., 2014). An honest tree is grown using one subsample of the training data, while a different subsample is used to make predictions at its leaves. If no observations fall into a specific leaf, its prediction is made by one of its parent nodes. A corresponding honest random forest can be generated by using the same mechanism that generates random forests from decision trees. Wager and Athey (2018) proved that, under some regularity conditions, the leaves of an honest tree become small in all dimensions of the feature space as the sample size becomes large. Hence, if we also assume that the true generating function f is Lipschitz continuous, honest trees are asymptotically unbiased, and so are honest random forests.
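A minimal illustration of honesty (our own sketch, restricted to a depth-one tree for brevity): one random half of the training data decides the split, while the held-out half supplies the leaf averages, falling back to the parent mean if a leaf receives no held-out observations.

```python
import numpy as np

def honest_stump(X, y, rng=None):
    """Honest depth-1 tree sketch: one half of the data chooses the split,
    the other half supplies the leaf predictions (falling back to the parent
    mean when an estimation-half leaf would be empty)."""
    rng = np.random.default_rng(rng)
    n = len(y)
    grow = rng.choice(n, size=n // 2, replace=False)   # structure subsample
    est = np.setdiff1d(np.arange(n), grow)             # estimation subsample

    # Choose the best split using the structure subsample only.
    Xg, yg = X[grow], y[grow]
    best_sse, split = np.inf, None
    for j in range(X.shape[1]):
        for t in np.unique(Xg[:, j])[:-1]:
            left = Xg[:, j] <= t
            sse = ((yg[left] - yg[left].mean()) ** 2).sum() + \
                  ((yg[~left] - yg[~left].mean()) ** 2).sum()
            if sse < best_sse:
                best_sse, split = sse, (j, t)
    j, t = split

    # Leaf values come from the held-out estimation subsample.
    left_est = X[est, j] <= t
    parent = y[est].mean()
    mean_left = y[est][left_est].mean() if left_est.any() else parent
    mean_right = y[est][~left_est].mean() if (~left_est).any() else parent
    return lambda Xnew: np.where(Xnew[:, j] <= t, mean_left, mean_right)
```

Because the split location and the leaf averages are computed on disjoint subsamples, the leaf estimates are conditionally unbiased given the tree structure, which is the key property exploited in the analysis of honest forests.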
3.2 Ensemble of Honest Trees using Generalized Fiducial Inference
The goal is to solve the regression problem (1) using an ensemble of honest trees and apply GFI to conduct statistical inference.
Suppose there exists a binary tree-structured function T such that T(x) = f(x) for any x; we will call any such tree a true model. One example is the “AND” function mentioned in Wager et al. (2014).
A corresponding binary tree is shown in Figure 2.
We want to assign a generalized fiducial probability to each tree T, measuring how likely it is that the true generating function is contained in T. Suppose T has K leaves. Denote the number of observations in the kth leaf by n_k, so that n_1 + ⋯ + n_K = n. Also denote the response value of the kth leaf by μ_k, which can be estimated by the average of all the y_i's that belong to this leaf:
Write μ = (μ_1, …, μ_K)ᵀ. First we calculate the generalized fiducial density for (μ, σ). From (3) it can be shown that this density is proportional to
(5) 
where k(i) is the index of the leaf that x_i belongs to; i.e., x_i ∈ leaf k(i).
The Jacobian term in (5) is
with SSE denoting the sum of squared errors Σ_i (y_i − ȳ_{k(i)})², where ȳ_k is the average of all the y_i's belonging to leaf k.
Now we can calculate the marginal fiducial probability of the tree T using (4), for which the numerator becomes:
(6)  
3.3 A Practical Method for Generating Fiducial Samples
This subsection presents a practical method for generating a fiducial sample of honest trees using (6).
Even for moderate sample sizes, the set of all possible trees is huge, and therefore we only consider a subset of trees. More precisely, this subset is an honest random forest with an adequate number of trees, such that one can assume it contains at least one true model. Each tree is grown on a subsample of observations drawn without replacement, and uses a different group of observations to calculate the leaf averages (i.e., to make predictions).
Loosely, three steps are involved in generating a fiducial sample of trees: first generate the structure of the tree, then the noise variance, and lastly the leaf values. We begin by approximating the generalized fiducial density in (6) as follows.
For each tree , we calculate:
and approximate with
(7) 
After a tree T is sampled from (7), the noise variance σ² is sampled from

(8) 

where |T| denotes the number of leaves in T. To sample the leaf values μ_k, we draw without replacement observations from the part of the data that was not used to grow T; a generalized fiducial tree sample is then obtained by updating the leaf values of T using
(9) 
where , with being the number of leaves in .
Repeating the above procedure multiple times provides multiple copies of the fiducial sample. Statistical inference can then be conducted in a similar fashion as with a posterior sample in the Bayesian context. For any design point x, averaging the fiducial samples of f(x) delivers a point estimate of f(x). The α/2 and (1 − α/2) percentiles of the fiducial samples of f(x) give a 100(1 − α)% confidence interval for f(x), while the corresponding percentiles of the fiducial samples of f(x) plus fresh N(0, σ²) noise provide a prediction interval for the corresponding future observation.
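For a given design point, the resulting fiducial draws can be turned into a point estimate and percentile intervals as sketched below. This is our own illustration of the generic recipe; `pred_samples` is assumed to hold the fiducial tree predictions at the point and `sigma_samples` the matching draws of the noise standard deviation.

```python
import numpy as np

def fiducial_intervals(pred_samples, sigma_samples, alpha=0.05, rng=None):
    """Point estimate, confidence interval for f(x), and prediction interval
    for a future y at x, from matched fiducial draws."""
    rng = np.random.default_rng(rng)
    point = pred_samples.mean()                    # point estimate of f(x)
    lo, hi = 100 * alpha / 2, 100 * (1 - alpha / 2)
    ci = np.percentile(pred_samples, [lo, hi])     # CI for f(x)
    # Prediction interval: add fresh N(0, sigma^2) noise to each draw.
    future = pred_samples + sigma_samples * rng.normal(size=len(pred_samples))
    pi = np.percentile(future, [lo, hi])
    return point, ci, pi
```

Because the prediction interval adds the noise draw to each fiducial prediction, it is always at least as wide as the confidence interval at the same level.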
We summarize the above procedure in Algorithm 1.
4 Asymptotic Properties
The theoretical properties of the proposed method are established under the following conditions:
A1) The generating function f has a binary tree structure. We say a binary tree is a true model if it agrees with f on every point of the training data set. Notice that such a binary tree is not unique. We denote the collection of true models as:
A2) Consider the collection of honest trees in a trained random forests model. We assume that it contains at least one tree that belongs to the collection of true models:
A3) Meanwhile, we assume that the size of this collection is not too large for practical use.
A4) Let be the projection matrix of ; i.e., ,
Let , where . Assume
(10) 
A5) Denote the number of leaves of a tree as . Let be the minimum number of leaves of the trees in :
and be the trees in the collection of true models with number of leaves equal to this minimum:
A6) Denote as the maximum number of leaves in :
Assume that is at most , with .
Under the above assumptions, we have
Theorem 4.1.
The proof can be found in the appendix.
5 Empirical Properties
This section illustrates the practical performance of the above proposed method via a sequence of simulation experiments and real data applications. We shall call the proposed method FART, short for Fiducial Additive Regression Trees.
5.1 Simulation Experiments
In our simulation experiments three test functions were used:

Cosine: ,

XOR: ,

AND: .
The design points x_i are iid and the errors ε_i are iid Gaussian. We tested different combinations of the sample size and the feature dimension (see below). These experimental configurations have been used by previous authors (e.g., Chipman et al., 2010; Wager et al., 2014). The number of repetitions for each experimental configuration is 1000.
We applied FART to the simulated data and calculated the mean coverages of various confidence intervals. We also applied the following three methods to obtain other confidence intervals:

BART: Bayesian Additive Regression Trees of Chipman et al. (2010),

Bootstrap: the bootstrap method of Mentch and Hooker (2016), and

Jackknife: the infinite jackknife method of Wager et al. (2014).
Tables 1, 2 and 3 report the empirical coverage rates of the, respectively, 90%, 95% and 99% confidence intervals produced by these methods for f(x), where x is a random future design point.
Overall, FART provided quite good and stable coverages. The performances of Bootstrap and Jackknife are somewhat disappointing. Possible reasons are that Jackknife does not take into account the uncertainty of the residual noise, and that Bootstrap is, in general, not asymptotically unbiased, as argued in Wager and Athey (2018). BART sometimes gave better results than FART. However, in those cases where BART was better, FART was not far behind, while in some other cases BART's results could be substantially worse than FART's. Therefore, FART appears to be the preferred and safe method if one is targeting f(x).
function  FART  Bootstrap  Jackknife  BART  

Cosine  50  2  34.6 (1.63)  23.6 (1.40)  57.6 (2.33)  
Cosine  200  2  87.4 (3.11)  51.0 (2.12)  39.2 (1.05)  
XOR  50  50  5.0 (1.72)  3.8 (1.48)  62.9 (4.53)  
XOR  200  50  92.6 (2.61)  26.3 (2.96)  32.0 (1.02)  
AND  50  500  60.3 (8.19)  3.4 (3.07)  0.7 (2.30)  
AND  200  500  35.0 (5.09)  0.0 (1.94)  59.2 (6.10) 
function  FART  Bootstrap  Jackknife  BART  

Cosine  50  2  41.9 (1.94)  27.9 (1.67)  66.3 (2.78)  
Cosine  200  2  93.6 (3.80)  58.8 (2.52)  47.5 (1.26)  
XOR  50  50  8.0 (2.05)  6.7 (1.77)  77.3 (5.39)  
XOR  200  50  96.1 (3.22)  38.0 (3.53)  37.4 (1.21)  
AND  50  500  8.3 (3.65)  2.6 (2.74)  71.4 (8.18)  
AND  200  500  50.3 (6.06)  0.4 (2.31)  67.9 (7.26) 
function  FART  Bootstrap  Jackknife  BART  

Cosine  50  2  54.6 (2.55)  39.0 (2.19)  78.9 (3.64)  
Cosine  200  2  98.6 (5.26)  73.1 (3.32)  60.5 (1.65)  
XOR  50  50  17.2 (2.69)  13.4 (2.32)  91.1 (7.03)  
XOR  200  50  62.9 (4.64)  46.5 (1.60)  98.2 (5.48)  
AND  50  500  24.9 (4.80)  12.2 (3.60)  75.3 (10.65)  
AND  200  500  69.0 (7.96)  5.7 (3.04)  81.7 (9.44) 
Next we examine the coverage rates for the noise standard deviation σ. Since Bootstrap and Jackknife do not produce convenient confidence intervals for σ, we only focus on FART and BART. The results are summarized in Tables 4, 5 and 6. Overall, one can see that FART is the preferred method, although its performance for the test function AND was disappointing.
function  FART  BART  

Cosine  50  2  8.5 (0.67)  
Cosine  200  2  86.5 (0.20)  
XOR  50  50  5.4 (1.03)  
XOR  200  50  93.6 (0.62)  
AND  50  500  0.0 (1.59)  
AND  200  500  0.0 (0.78) 
function  FART  BART  

Cosine  50  2  15.5 (0.80)  
Cosine  200  2  99.1 (0.31)  
XOR  50  50  9.6 (1.24)  
XOR  200  50  96.1 (0.76)  
AND  50  500  0.0 (1.90)  
AND  200  500  0.0 (0.94) 
function  FART  BART  

Cosine  50  2  36.3 (1.06)  
Cosine  200  2  99.9 (0.40)  
XOR  50  50  24.0 (1.62)  
XOR  200  50  98.2 (0.38)  
AND  50  500  0.1 (2.51)  
AND  200  500  0.0 (1.25) 
Lastly, we provide the histogram of the generalized fiducial samples of σ, which can be seen as an approximation of the marginal generalized fiducial density of σ. The histogram is displayed in Figure 3. These samples were obtained for the case where the test function is XOR. One can see that the histogram is approximately bell-shaped and centered at the true value of σ.
5.2 Real Data Examples
This subsection reports the coverage rates of the FART prediction intervals on five real data sets:

Air Foil: This is a NASA data set, obtained from a series of aerodynamic and acoustic tests of two- and three-dimensional airfoil blade sections conducted in an anechoic wind tunnel (Dua and Graff, 2017). Five features were selected to predict the airfoil noise. We split the data into a training set and a test set.

CCPP: This data set contains data points collected from a Combined Cycle Power Plant over six years (2006–2011), when the power plant was set to work with full load (Tüfekci, 2014; Kaya et al., 2012). Four features are used to predict the full-load electrical power. We split the data into a training set and a test set.

Boston House: Originally published by Harrison Jr and Rubinfeld (1978), this is a collection of observations with features from the U.S. Census Service, used to predict the median value of owner-occupied homes. We split the data into a training set and a test set.

CCS: Concrete is the most important material in civil engineering (Yeh, 1998). This data set uses eight features to predict the concrete compressive strength. We split it into a training set and a test set.
For each of the above data sets, we applied FART to the training data set to construct 95% prediction intervals for the observations in the test data set. We repeated this procedure 100 times, each time randomly splitting the whole data set into a training set and a test set. The empirical coverage rates of these prediction intervals are reported in Table 7. In addition, as a comparison with Figure 1, we plot the coverage of the FART prediction intervals on the same Auto MPG data in Figure 4. One can see that FART gave very good performance.
Data  Air Foil  Auto Mpg  CCPP  Boston House  CCS 

Coverage  93.2% (14.2)  91.8% (11.5)  95.1% (11.5)  87.9% (12.4)  92.8% (30.4) 
6 Conclusion
In this paper, we applied generalized fiducial inference to ensembles of honest regression trees. In particular, we derived a fiducial probability for each honest tree in an honest random forest, which measures how likely it is that the tree contains the true model. A practical procedure was developed to generate fiducial samples of the tree models, the error variance, and predictions. These samples can further be used for point estimation and for constructing confidence and prediction intervals. The proposed method was shown to enjoy desirable theoretical properties, and compares favorably with other state-of-the-art methods in simulation experiments and real data analyses.
Appendix A Technical Details
The appendix provides the proof for Theorem 4.1.
Without loss of generality, assume and fix . We first prove that
Rewrite
where
Case 1: .
Now calculate
(11) 
Let , consider the second term in equation (11) and denote , then
and since . Furthermore,
Therefore, .
Consider the third term in equation (11):
Notice that . Thus,
Therefore, , and
as .
Thus, we have as .
In addition,
which means
Thus,
Moreover, . Therefore, .
Case 2: and .
Recall that is fixed. First notice that =, where is a chi-square random variable depending on with degrees of freedom .
It implies that
Therefore,
uniformly over , . Thus, we show that
Meanwhile, the calculation of is similar to Case 1, , so we have .
Combining Case 1 and Case 2, we have:
Furthermore,
Equivalently,
References
 Asuncion and Newman (2007) Asuncion, A. and Newman, D. (2007) UCI machine learning repository. URL http://www.ics.uci.edu/~mlearn/MLRepository.html.
 Athey et al. (2019) Athey, S., Tibshirani, J. and Wager, S. (2019) Generalized random forests. The Annals of Statistics, 47, 1148–1178.
 Biau (2012) Biau, G. (2012) Analysis of a random forests model. Journal of Machine Learning Research, 13, 1063–1095.
 Breiman (1996) Breiman, L. (1996) Bagging predictors. Machine Learning, 24, 123–140.
 Breiman (2001) Breiman, L. (2001) Random forests. Machine Learning, 45, 5–32.
 Chipman et al. (2007) Chipman, H. A., George, E. I. and McCulloch, R. E. (2007) Bayesian ensemble learning. In Advances in Neural Information Processing Systems, 265–272.
 Chipman et al. (2010) Chipman, H. A., George, E. I., McCulloch, R. E. et al. (2010) BART: Bayesian additive regression trees. The Annals of Applied Statistics, 4, 266–298.
 Dempster (2008) Dempster, A. P. (2008) The Dempster–Shafer calculus for statisticians. International Journal of Approximate Reasoning, 48, 365–377.
 Denil et al. (2014) Denil, M., Matheson, D. and De Freitas, N. (2014) Narrowing the gap: Random forests in theory and in practice. In International Conference on Machine Learning, 665–673.
 Dua and Graff (2017) Dua, D. and Graff, C. (2017) UCI machine learning repository. URL http://archive.ics.uci.edu/ml.
 Efron (2014) Efron, B. (2014) Estimation and accuracy after model selection. Journal of the American Statistical Association, 109, 991–1007.
 Fisher (1930) Fisher, R. A. (1930) Inverse probability. In Mathematical Proceedings of the Cambridge Philosophical Society, vol. 26, 528–535. Cambridge University Press.
 Gao et al. (2019) Gao, Q., Lai, R. C. S., Lee, T. C. M. and Li, Y. (2019) Uncertainty quantification for high-dimensional sparse nonparametric additive models. Technometrics. To appear.
 Hannig et al. (2016) Hannig, J., Iyer, H., Lai, R. C. S. and Lee, T. C. M. (2016) Generalized fiducial inference: A review and new results. Journal of the American Statistical Association, 111, 1346–1361.
 Hannig et al. (2006) Hannig, J., Iyer, H. and Patterson, P. (2006) Fiducial generalized confidence intervals. Journal of the American Statistical Association, 101, 254–269.
 Hannig and Lee (2009) Hannig, J. and Lee, T. C. M. (2009) Generalized fiducial inference for wavelet regression. Biometrika, 96, 847–860.
 Harrison Jr and Rubinfeld (1978) Harrison Jr, D. and Rubinfeld, D. L. (1978) Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management, 5, 81–102.
 Kaya et al. (2012) Kaya, H., Tüfekci, P. and Gürgen, F. S. (2012) Local and global learning methods for predicting power of a combined gas & steam turbine. In Proceedings of the International Conference on Emerging Trends in Computer and Electronics Engineering ICETCEE, 13–18.
 Lai et al. (2015) Lai, R. C., Hannig, J. and Lee, T. C. M. (2015) Generalized fiducial inference for ultrahighdimensional regression. Journal of the American Statistical Association, 110, 760–772.
 Liu and Hannig (2016) Liu, Y. and Hannig, J. (2016) Generalized fiducial inference for binary logistic item response models. Psychometrika, 81, 290–324.
 Martin and Liu (2015a) Martin, R. and Liu, C. (2015a) Conditional inferential models: combining information for prior-free probabilistic inference. Journal of the Royal Statistical Society: Series B, 77, 195–217.
 Martin and Liu (2015b) Martin, R. and Liu, C. (2015b) Marginal inferential models: prior-free probabilistic inference on interest parameters. Journal of the American Statistical Association, 110, 1621–1631.
 Martin et al. (2010) Martin, R., Zhang, J., Liu, C. et al. (2010) Dempster–Shafer theory and statistical inference with weak beliefs. Statistical Science, 25, 72–87.
 Mendes-Moreira et al. (2012) Mendes-Moreira, J., Soares, C., Jorge, A. M. and Sousa, J. F. D. (2012) Ensemble approaches for regression: A survey. ACM Computing Surveys, 45, 10.
 Mentch and Hooker (2016) Mentch, L. and Hooker, G. (2016) Quantifying uncertainty in random forests via confidence intervals and hypothesis tests. The Journal of Machine Learning Research, 17, 841–881.
 Perrone and Cooper (1992) Perrone, M. P. and Cooper, L. N. (1992) When networks disagree: Ensemble methods for hybrid neural networks. Tech. rep.
 Tüfekci (2014) Tüfekci, P. (2014) Prediction of full load electrical power output of a base load operated combined cycle power plant using machine learning methods. International Journal of Electrical Power & Energy Systems, 60, 126–140.
 Wager and Athey (2018) Wager, S. and Athey, S. (2018) Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113, 1228–1242.
 Wager et al. (2014) Wager, S., Hastie, T. and Efron, B. (2014) Confidence intervals for random forests: The jackknife and the infinitesimal jackknife. The Journal of Machine Learning Research, 15, 1625–1651.
 Wang et al. (2003) Wang, H., Fan, W., Yu, P. S. and Han, J. (2003) Mining concept-drifting data streams using ensemble classifiers. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 226–235. ACM.
 Weerahandi (1995) Weerahandi, S. (1995) Generalized confidence intervals. In Exact Statistical Methods for Data Analysis, 143–168. Springer.
 Weerahandi (2013) Weerahandi, S. (2013) Exact statistical methods for data analysis. Springer Science & Business Media.
 Wu et al. (2007) Wu, Y., Tjelmeland, H. and West, M. (2007) Bayesian CART: Prior specification and posterior simulation. Journal of Computational and Graphical Statistics, 16, 44–66.
 Xie and Singh (2013) Xie, M.-g. and Singh, K. (2013) Confidence distribution, the frequentist distribution estimator of a parameter: A review. International Statistical Review, 81, 3–39.
 Xie et al. (2011) Xie, M.-g., Singh, K. and Strawderman, W. E. (2011) Confidence distributions and a unifying framework for meta-analysis. Journal of the American Statistical Association, 106, 320–333.
 Yeh (1998) Yeh, I.-C. (1998) Modeling of strength of high-performance concrete using artificial neural networks. Cement and Concrete Research, 28, 1797–1808.