A transformationbased approach to Gaussian mixture density estimation for bounded data
Abstract
Finite mixture of Gaussian distributions provide a flexible semiparametric methodology for density estimation when the variables under investigation have no boundaries. However, in practical applications variables may be partially bounded (e.g. taking nonnegative values) or completely bounded (e.g. taking values in the unit interval). In this case the standard Gaussian finite mixture model assigns nonzero densities to any possible values, even to those outside the ranges where the variables are defined, hence resulting in severe bias.
In this paper we propose a transformationbased approach for Gaussian mixture modelling in case of bounded variables. The basic idea is to carry out density estimation not on the original data but on appropriately transformed data. Then, the density for the original data can be obtained by a change of variables. Both the transformation parameters and the parameters of the Gaussian mixture are jointly estimated by the ExpectationMaximisation (EM) algorithm. The methodology for partially and completely bounded data is illustrated using both simulated data and real data applications.
Keywords: Bounded support; Density estimation; EM algorithm; Gaussian mixture models; Rangepower transformation.
1 Introduction
Density estimation is the problem of inferring a probability density function given a finite number of sample data points drawn from a population described by a probability distribution. Broadly speaking, three alternative approaches to density estimation can be distinguished. In the parametric approach a parametric distribution is assumed for the density with unknown parameters which are estimated by fitting the parametric function using the observed data. Conversely, in the nonparametric approach no density function is assumed a priori, but its form is completely determined by the data. Histograms and kernel density estimation (KDE) are two popular methods that belong to this class, and both are characterised by the number of parameters growing with the size of the dataset. Furthermore, extensions to higher dimensionality is problematic. A third approach is based on finite mixture models, where the unknown density is expressed as a convex combination of one or more probability density functions. In this class a popular model is the Gaussian mixture model which assumes the Gaussian distribution for the underlying component densities.
Gaussian mixture models (GMMs) can approximate any continuous density with arbitrary accuracy provided the model has a sufficient number of components and the parameters of the model are correctly estimated (Escobar and West, 1995; Roeder and Wasserman, 1997). However, the standard GMM for density estimation does not take into account whether or not a variable has bounded support.
Consider the graphs in Figure 1 which show some histograms for random samples drawn from two distributions, one bounded from below (top panels), and one having both lower and upper bounds (bottom panels). In these graphs boundaries are shown as vertical dots, true densities are represented as solid lines, and density estimates based on GMMs as dashed lines (see left panels of Figure 1). In both cases, the estimated densities are unsatisfactory at the boundaries, but also in the range of admissible values. A possible way to tackle these problems is to abandon the use of GMMs in favour of alternative component distributions. Another option is to remain in the realm of the Gaussian mixtures framework, but analyse the data in a transformed scale. The right panels of Figure 1 show the density estimates obtained with the GMDEB approach proposed in this paper. In both cases, the true underlying densities appear to be well approximated, and with natural boundaries constraints clearly satisfied.
In this paper a transformationbased approach to density estimation based on GMMs is proposed and discussed. The basic idea is to use an invertible function to map the bounded range of a variable to the whole real line, estimate the density of the transformed variable, and then backtransform to the original scale. This approach seems very natural and it has been around for a long time (see Wand et al., 1991; Marron and Ruppert, 1994, for KDE), but a simple and efficient implementation of this methodology is not yet available in the context of mixture density estimation. Note, however, that a similar approach based on Manly transformation has been recently proposed by Zhu and Melnykov (2016) for modelling skewed data in modelbased clustering.
In Section 2 the GMMs approach to density estimation is reviewed. Then, Section 3 presents the proposed rangepower transformation method for density estimation using GMMs in case of bounded variables. The model is described and the corresponding maximum likelihood estimates are derived through the EM algorithm. Section 4 contains the results of some simulation studies carried out to evaluate the proposed methodology and to compare with other available methods. In Section 5 some realworld datasets are analysed. The final section provides some concluding remarks.
2 Finite mixture modelling
2.1 Finite mixture for density estimation
Consider a vector of random variables taking values in the sample space with , and assume that the probability density function can be written as a finite mixture density of components of the form
where is the parameters vector. The mixing weights must satisfy the constraints for all , and . The th component density is usually taken as known except for the associated parameter(s) . Most applications assume that all component densities arise from the same parametric distribution family, although this need not be the case in general. In particular, a popular model specifies , where is the Gaussian density with mean and covariance matrix . Then, the Gaussian mixture model (GMM) can be written as
(1) 
where in this case represents the entire parameters vector. Note that is an operator that forms a vector by extracting unique elements of a symmetric matrix.
In this paper we refer to (1) as the Gaussian mixture density estimate (GMDE) model. The usual nonparametric kernel density estimate (KDE) can be viewed as a mixture of components with uniform weights, i.e. (Titterington et al., 1985, pp. 28–29). Compared to KDE, finite mixture modelling uses a smaller number of components (i.e. less parameters), so it has smaller variance. Conversely, compared to parametric density estimation, finite mixture modelling has the advantage of (potentially) using more parameters, so introducing less estimation bias. There are also disadvantages related to mixture modelling, such as an increased learning complexity and lack of closedform solution, so it needs to resort to numerical procedures (e.g. EM algorithm), and in certain cases there can be identifiability issues.
2.2 Estimation of Gaussian finite mixture model
Consider a random sample of observations on variables drawn from the mixture distribution in (1). Then, the loglikelihood is given by
(2) 
Direct maximisation of the loglikelihood function is not straightforward, so MLEs are usually obtained via the ExpectationMaximisation (EM) algorithm (Dempster et al., 1977; McLachlan and Peel, 2000).
An incompletedata formulation of the mixture problem is introduced by associating to each observation a latent componentlabel vector (). This is a dimensional vector, with the generic element or 0 according to whether or not arises from the th component of the mixture. Assuming independence of the completedata vector , and the multinomial distribution for the componentlabel vectors s, the completedata loglikelihood is given by
(3) 
The loglikelihood (2) is maximised using the EM algorithm, an iterative algorithm that alternates two steps, called Estep and Mstep, which guarantees, under fairly general conditions, the convergence to at least a local maximiser. The objective function at iteration of the EM algorithm is the conditional expectation of the completedata loglikelihood (3), the socalled Qfunction:
where , i.e. the estimated posterior probability at iteration of the EM algorithm, with the indicator function which equals 1 if the condition is fulfilled and 0 otherwise.
In the Estep the Qfunction is evaluated, using the parameter values obtained at the previous step, to get the updated posterior probabilities
Then, in the Mstep the parameters vector is updated by maximising the function given the previous values and the updated posterior probabilities , i.e.
In the case of a multivariate Gaussian mixture the Mstep yields
The update formula for the covariance matrix depends upon the structure of the withincomponent covariance matrices. Parsimonious parameterisation of the covariance matrices can be expressed through the eigendecomposition , where is a scalar controlling the volume of the corresponding ellipsoid, is a diagonal matrix specifying the shape of the density contours, and is an orthogonal matrix which determines the orientation of the ellipsoid (Banfield and Raftery, 1993; Celeux and Govaert, 1995). For instance, assuming an unconstrained covariance matrix, the updating formula is
Scrucca et al. (2016, Table 3) summarise some parameterisations of withincomponent covariance matrices, and the corresponding geometric characteristics, currently available in the mclust software. The previous unconstrained covariance matrix is indicated as VVV model. Note, however, that for some models no closedformula is available, so numerical optimisation is required.
The EM algorithm requires the specification of initial values for the the parameters, say . Alternatively, an initial assignment of observations to the components of the mixture can be made, basically starting the EM algorithm from the Mstep. In any case, the initialisation of the EM algorithm is often crucial because the likelihood surface tends to have multiple modes, although it usually produces sensible results when started from reasonable starting values (Wu, 1983, p. 150). For a further discussion on this point and a recent proposal see Scrucca and Raftery (2015).
Information criteria based on penalised forms of the loglikelihood are routinely used in finite mixture modelling for model selection, i.e. to decide how many components should be included in the mixture, but also which covariance parameterisations to adopt in the Gaussian case. Two popular criteria are the Bayesian information criterion (BIC; Schwartz, 1978; Fraley and Raftery, 1998) and the integrated completedata likelihood criterion (ICL; Biernacki et al., 2000). When the goal is density estimation, Roeder and Wasserman (1997) showed that the GMDE model selected using BIC is a consistent estimator of the true density. If only the order of the mixture is needed, formal hypothesis testing can also be pursued by likelihood ratio test (LRT). However, standard regularity conditions do not hold for the null distribution of the LRT statistic to have its usual chisquared distribution (McLachlan and Peel, 2000, Chap. 6), and significance must be assessed by resampling approaches. For a recent review see McLachlan and Rathnayake (2014), and for an implementation in the mclust software see Scrucca et al. (2016).
3 Methodology
3.1 Gaussian mixture density estimation for variables with bounded support
Let be a variate random vector from a distribution with density having bounded support , and be some family of continuous monotonic transformations that map to the dimensional real space. Then, we can write as the transformed set of variables with density having unbounded support .
Suppose that the density of the transformed data can be expressed through a Gaussian finite mixture density of the form
If an estimate is available, then by the continuous change of variable theorem the density of the untransformed data can be obtained as
(4) 
where is the Jacobian of the transformation, i.e. the determinant of the matrix of partial derivatives.
3.2 Rangepower transformation for variables with bounded support
Lower bound case
Suppose is a univariate random variable with bounded support , where , and density . A preliminary range transformation can be defined as , which maps . Let be a continuous monotonic transformation that maps . For instance, adopting the wellknown BoxCox power transformation (Box and Cox, 1964) we may write
(5) 
which has continuous first derivative equal to for any .
The original BoxCox power transformation method is restricted to the univariate case, but it can be extended also to the multivariate case as described in Velilla (1993). However, further development of the multivariate case can greatly simplified by working in a coordinatewise fashion. Thus, in this paper we propose the use of the rangepower transformation in (5) for each dimension separately.
Lower and upper bound case
Suppose now that is a univariate random variable with bounded support in , where . Consider the preliminary range transformation which maps . As in the previous case, adopting a BoxCox power transformation we can write
(6) 
with continuous first derivative given by
Following the approach discussed above, the multivariate case can be tackled by working in a coordinatewise fashion, hence applying the rangepower transformation in (6) to each variable separately.
3.3 Estimation
Maximum likelihood estimation can be pursued via the EM algorithm under the assumption that the density on the transformed scale can be expressed as
If the above equation holds, then the density function on the original scale is given by
where and is the Jacobian of the transformation. Note that as consequence of the coordinate independent approach to multivariate rangepower transformation, the matrix of first derivatives is diagonal, so the Jacobian reduces to
The conditional expectation of the complete loglikelihood given the observed data can be expressed as
where . Therefore, in the Estep the posterior probabilities are updated using
In the Mstep the parameters are updated by maximising the function given the previous values of the parameters and the updated posterior probabilities. This can be done in two steps. In the first step an updated value is computed by numerically maximising the Qfunction with respect to because no closedform expression is available. To this goal, a Newtontype numerical optimisation algorithm can be used. In our implementation we used the LBFGSB method of Byrd et al. (1995) available in the optim() function for the R statistical software. The remaining parameters are then obtained as in standard EM algorithm but accounting for the updated transformation parameters , i.e.
Again, the update formula for the covariance matrix depends upon the assumed eigendecomposition model. In the most general case of an unconstrained covariance matrix, i.e. the VVV model, we have
Initialisation of the above EM algorithm is obtained by first estimating the optimal marginal transformations, then using the final classification from a means algorithm on the rangepower transformed variables. This initial partition of data points is used to start the algorithm from the Mstep. Finally, the EM algorithm is stopped when the loglikelihood improvement falls below a specified tolerance value or a maximum number of iterations is reached.
4 Simulation studies
In this section we present some simulation studies designed to compare the proposed GMDEB approach to some density estimators for bounded variables discussed in the literature. The comparison is based on the integrate squared error (ISE):
where is the unknown true density and is its estimate based on a random sample of observations. Thus, the is a measure of discrepancy between the true and the estimated density based on a squared loss criterion. It is equal to 0 when the estimated density perfectly coincides with the true density, and increases as the differences between the two densities get larger. For more details see Scott (2009, Sec. 2.3). In the following sections the ISE is computed via numerical integration.
4.1 Distributions with lower bound
The univariate densities with lower bound support considered in this study are shown in Figure 2. They are all bounded at zero with different degrees of skewness, positive for the first three densities and negative for the last one.
The proposed method (GMDEB) is compared with the following density estimators:

simpleKern which refers to the simple boundary correction method proposed by Jones (1993) which is equivalent to a kernel weighted local linear fitting near the boundary;

reflectKern which indicates the reflection method of Schuster (1985) which amounts to reflect the observed data points at the origin, then a density estimate is obtained using this augmented dataset with a simple correction to ensure that integrate to one;

cutnormKern which is the cut and normalisation method of Gasser and Müller (1979) where the kernel is truncated at the boundary and renormalised to unity;

logtransKern which is the method proposed by Marron and Ruppert (1994) which fits a kernel density estimator on the logscale and then backtransforms the result with an explicit normalisation step;

logSpline which estimates a density using cubic splines to approximate the logdensity using knots located as described in Stone et al. (1997);

GaMixDE which estimates a density by fitting a mixture of Gamma densities using cubic splines to approximate the logdensity using knots located as described in Stone et al. (1997);

GMDE which is the standard density estimate from GMMs with no boundary correction.
The first four methods mentioned above are implemented in the evmix R package (Scarrott et al., 2018; Hu and Scarrott, 2018), whereas the logSpline estimator is available in the logspline R package (Kooperberg, 2016). For GaMixDE the code is available in the R package mixtools (Benaglia et al., 2009; Young et al., 2017), whereas GMDE is obtained from the mclust R package (Fraley et al., 2017). In the last two cases the number of mixture components is selected using the BIC criterion.
Figure 3 graphically summarises the simulation results obtained on 1000 replications. Overall the GMDEB estimator appears to be able to approximate the true density better than the other KDE methods with boundary correction, in particular when the sample size is small. The proposed approach also shows less variability, which decreases as the sample size increases. Clearly the GMDEB approach appears to be inferior to the Gamma mixture density estimator, although not by much, when the true density belongs to the Gamma family of distributions (i.e. in the first two cases), but it is better in the last two cases, in particular for the leftskewed Gompertz distribution.
4.2 Distributions with lower and upper bounds
For the case of univariate densities with both lower and upper bounds support, a list of distributions considered in the simulation study is shown in Figure 4. The first two settings involve the Beta distribution with parameters selected to produce a symmetric case and a skewed case. The last two settings consider two further asymmetric cases, the Kumaraswamy distribution with density on , and the logarithmic peak distribution with density on .
The cases described above should provide a broad spectrum of examples for comparing different estimators. To this goal the proposed method (GMDEB) is compared with the following density estimators:

beta1Kern, beta2Kern which use the Beta and modified Beta kernels proposed by Chen (1999) followed by a renormalisation to ensure a proper density;

copulaKern which uses the bivariate Gaussian copula based kernels of Jones and Henderson (2007);

logSpline which fits a density using cubic splines to approximate the logdensity using knots located as described in Stone et al. (1997);

BeMixDE which estimates the density by fitting a mixture of Beta distributions
The first two estimators are available in the R package evmix (Scarrott et al., 2018; Hu and Scarrott, 2018). The logSpline estimator is available in the logspline R package (Kooperberg, 2016). For the BeMixDE the betareg R package (Grün et al., 2012) is used with the number of mixture components selected using BIC.
Figure 5 reports the simulation results obtained on 1000 replications. By looking at the boxplots the GMDEB approach appears to be more accurate than the other nonparametric density estimators. Furthermore, its accuracy is slightly lower than the Beta mixture density estimator in the first three cases (which, however, are all cases related to the Beta distribution), but is better in the last case. Overall, GMDEB seems to provide robust reliable density estimates when both lower and upper bounds are present.
5 Real data analyses
5.1 Suicide data
Consider the suicide dataset which has been analysed by many authors (see for instance Silverman, 1986; Karunamuni and Alberts, 2005). The dataset provides the lengths of 86 spells of psychiatric treatment undergone by control patients in a suicide study. By its nature, the measured variable can only take positive values.
The GMDEB model with the largest BIC is the simple single component model (E,1) with estimated transformation parameter . The corresponding density estimate on the original scale is reported on the left panel of Figure 6, which is then compared on the remaining panels to some boundarycorrected versions of kernel density estimators (center panel), and to the logSpline and the Gamma mixture density estimates (right panel). As it can be seen from these graphs, the GMDEB estimate is able to well approximate the histogrambased empirical distribution without including any artifacts that appear in all the other density estimates. For instance, kernelbased methods tend to underestimate near the origin and show a slight bump in the right tail of the distribution. On the contrary, the logSpline and the Gamma mixture density estimates tend to overestimate near the origin.
5.2 Acidity data
This dataset provides the values of an acidity index (acidneutralizing capacity, ANC) measured in a sample of 155 lakes in NorthCentral Wisconsin. Several authors have previously analysed the data using a mixture of Gaussian distributions on the logscale (Crawford et al., 1992; Crawford, 1994; Richardson and Green, 1997; McLachlan and Peel, 2000).
We analyse the data in the original scale because the proposed method automatically selects the “optimal” transformation and takes into account the implicit lower bound of the index that can not assume negative values. From the left panel of Figure 7 we can see that according to BIC the best model is the one with two mixture components having different variances (V,2), closely followed by models (E,2) and (V,3). The right panel of Figure 7 shows the histogram of the data and the density estimated with model (V,2) using the GMDEB approach with transformation parameter . The density obtained appears to accurately follow the distribution of the data, indicating the presence of two separated skewed distributions having different dispersions, smaller for the component close to the origin and larger for higher values of ANC. This is also confirmed by the graphs in Figure 8, which show the component densities scaled by the estimated prior probabilities (left panel) and the estimated posterior probabilities . Using the standard cutoff value of , lakes with ANC smaller than about 232 are assigned to the first group, otherwise to the second group. This is in agreement with previous findings.
5.3 Racial data
Geenens (2013) presented an analysis on data giving the proportion of white student enrolled in 56 school districts in Nassau County (Long Island, New York), for the 1992–1993 school year. The density estimate for this dataset should only be supported on the range. See also Simonoff (1996, Sec. 3.2).
The selected model on the transformed scale is (E,1) with , and the corresponding density estimate on the original scale is shown in Figure 9. This can be compared with the Beta density obtained by estimating the parameters with maximum likelihood. The latter density seems to put too much emphasis close to the upper boundary and in the middle values of the distribution, while completely missing the bulk of the data between 70% and 90% of white students. On the contrary, the proposed approach provides a density estimate which correctly identify the majority of the data with at least 70% of white students, but also the small peak near the lower boundary containing schools with almost 0% white students. These findings largely agree with those reported in Geenens (2013, Fig. 3).
5.4 Plasma data
Consider the data from a study on the association between the low plasma concentrations of retinol, betacarotene, or other carotenoids on the increased risk of developing certain types of cancer (Nierenberg et al., 1989). The joint distribution of plasma Retinol (ng/ml) and plasma betacarotene (ng/ml) is bounded below at zero for both variables. The left panel of Figure 10 shows the scatterplot of data points observed on 314 patients.
If the bivariate density is estimated using the the standard GMM, the model with the largest BIC has 3 components and unconstrained covariance matrix (VVV). This relatively large number of components is related to the presence of a strong skewness in the data distribution. Furthermore, a nonnegligible mass of density is assigned to negative values of plasma betacarotene.
Both issues can be solved using the proposed rangetransformation approach. The selected model for the bivariate density estimation has BIC , with a diagonal equal variance structure and a single component (EII,1). The transformation parameters are estimated as . The right panel of Figure 10 shows the highest density regions (HDRs; Hyndman, 1996) corresponding to proportions . In this case the joint data distribution appears to be well approximated by the estimated density.
5.5 Chorizon layer of the Kola data
The Kola Ecogeochemistry Project (19931998) collected data on more than 50 chemical elements on four different primary sample materials: terrestrial moss, and the O, B, and Chorizon of podzolic soils located in parts of northern Finland, Norway and Russia. The main aim of the project was the documentation of the impact of the Russian nickel industry on the Arctic environment. The data are available on Reimann et al. (1998), see also Reimann et al. (2011). Here we analyse the distribution of nickel (Ni), copper (Cu), and chromium (Cr) on the Chorizon layer. For each of the 605 sites, the detected concentrations of the above mentioned heavy metals are provided. Clearly, concentrations are bounded below at zero, and a preliminary data exploration suggests that the joint distribution is highly skewed.
The selected GMDEB model according to the BIC criterion is a two components mixture model with variable volume and equal shape and orientation, i.e. VEE in mclust nomenclature. The estimated vector of transformation parameters is . Figure 11 contains the scatterplot matrix of heavy metal concentrations with the estimated density projected onto the marginal bivariate subspaces. The latter are shown as HDRs corresponding to 25%, 50%, and 75% probability regions. The distribution of nickel, copper, and chromium on the Chorizon layer is clearly skewed, with most sites having concentrations close to the origin. However, there are also a number of sites with relatively high concentrations. Further insights can be obtained by examining the distribution of metal concentrations conditional on the HDR to which the observed sites belong, as shown in Figure 12. Looking at the boxplots for the conditional distributions, we can see that the central part of the distribution, i.e. that corresponding to 025% HDR, is characterised by the lowest concentration levels, whereas higher concentrations of heavy metals can be found as we move to regions of lower density.
6 Conclusion
This paper addresses the problem of density estimation using GMMs when variables are partially or completely bounded. By introducing a rangepower transformation of the data, it is possible to obtain a GMM for density estimation on the transformed data, and then to derive an accurate estimate of the density on the original scale which takes into account the natural bounds of the variables. The proposed model is estimated by maximum likelihood using the EM algorithm. We showed that this transformationbased approach is able to deal with variables having both lower and upper bounds, and the results obtained are often better than those provided by other methods usually based on modified versions of kernel density estimation. The transformationbased approach seems to be very promising and, in principle, it can be applied to other types of nonGaussian variables, such as in the presence of skewness, and for other purposes outside density estimation, for instance in clustering. We defer these very important issues to future works.
References
 Banfield, J. and Raftery, A. E. (1993). Modelbased Gaussian and nonGaussian clustering. Biometrics 49, 803–821.
 Benaglia, T., Chauveau, D., Hunter, D. R., and Young, D. (2009). mixtools: An R Package for Analyzing Finite Mixture Models. Journal of Statistical Software 32, 1–29. URL http://www.jstatsoft.org/v32/i06/.
 Biernacki, C., Celeux, G., and Govaert, G. (2000). Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence 22, 719–725.
 Box, G. and Cox, D. (1964). An analysis of transformations. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 26, 211–252.
 Byrd, R. H., Lu, P., Nocedal, J., and Zhu, C. (1995). A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing 16, 1190–1208.
 Celeux, G. and Govaert, G. (1995). Gaussian Parsimonious Clustering Models. Pattern Recognition 28, 781–793.
 Chen, S. X. (1999). Beta kernel estimators for density functions. Computational Statistics & Data Analysis 31, 131–145.
 Crawford, S. L. (1994). An application of the Laplace method to finite mixture distributions. Journal of the American Statistical Association 89, 259–267.
 Crawford, S. L., DeGroot, M. H., Kadane, J. B., and Small, M. J. (1992). Modeling lakechemistry distributions: Approximate Bayesian methods for estimating a finitemixture model. Technometrics 34, 441–453.
 Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society: Series B (Statistical Methodology) 39, 1–38.
 Escobar, M. D. and West, M. (1995). Bayesian density estimation and inference using mixtures. Journal of the american statistical association 90, 577–588.
 Fraley, C. and Raftery, A. E. (1998). How many clusters? Which clustering method? Answers via modelbased cluster analysis. The Computer Journal 41, 578–588.
 Fraley, C., Raftery, A. E., and Scrucca, L. (2017). mclust: Gaussian Mixture Modelling for ModelBased Clustering, Classification, and Density Estimation. URL https://CRAN.Rproject.org/package=mclust, R package version 5.4.
 Gasser, T. and Müller, H.G. (1979). Kernel estimation of regression functions. Springer.
 Geenens, G. (2013). Probit transformation for kernel density estimation on the unit interval. Journal of the American Statistical Association 109, 346–358.
 Grün, B., Kosmidis, I., and Zeileis, A. (2012). Extended Beta Regression in R: Shaken, Stirred, Mixed, and Partitioned. Journal of Statistical Software 48, 1–25. URL http://www.jstatsoft.org/v48/i11/.
 Hu, Y. and Scarrott, C. (2018). evmix: An R package for Extreme Value Mixture Modeling, Threshold Estimation and Boundary Corrected Kernel Density Estimation. Journal of Statistical Software 84, 1–27.
 Hyndman, R. J. (1996). Computing and graphing highest density regions. The American Statistician 50, 120–126.
 Jones, M. C. (1993). Simple boundary correction for kernel density estimation. Statistics and Computing 3, 135–146.
 Jones, M. C. and Henderson, D. A. (2007). Miscellanea kerneltype density estimation on the unit interval. Biometrika 94, 977–984.
 Karunamuni, R. J. and Alberts, T. (2005). On boundary correction in kernel density estimation. Statistical Methodology 2, 191–212.
 Kooperberg, C. (2016). logspline: Logspline Density Estimation Routines. URL https://CRAN.Rproject.org/package=logspline, R package version 2.1.9.
 Marron, J. S. and Ruppert, D. (1994). Transformations to reduce boundary bias in kernel density estimation. Journal of the Royal Statistical Society: Series B (Statistical Methodology) , 653–671.
 McLachlan, G. and Peel, D. (2000). Finite Mixture Models. Wiley, New York.
 McLachlan, G. J. and Rathnayake, S. (2014). On The Number Of Components In A Gaussian Mixture Model. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4, 341–355.
 Nierenberg, D. W., Stukel, T. A., Baron, J. A., Dain, B. J., Greenberg, E. R., et al. (1989). Determinants of plasma levels of betacarotene and retinol. American Journal of Epidemiology 130, 511–521.
 Reimann, C., Äyräs, M., Chekushin, V., Bogatyrev, I., Boyd, R., et al. (1998). Environmental Geochemical Atlas of the Central Barents Region. Geological Survey of Norway, Trondheim, Norway .
 Reimann, C., Filzmoser, P., Garrett, R., and Dutter, R. (2011). Statistical Data Analysis Explained: Applied Environmental Statistics With R. John Wiley & Sons.
 Richardson, S. and Green, P. J. (1997). On Bayesian analysis of mixtures with an unknown number of components (with discussion). Journal of the Royal Statistical Society: Series B (Statistical Methodology) 59, 731–792.
 Roeder, K. and Wasserman, L. (1997). Practical Bayesian Density Estimation Using Mixtures of Normals. Journal of the American Statistical Association 92, 894–902.
 Scarrott, C., Hu, Y., Akbar, A., and of Canterbury, U. (2018). evmix: Extreme Value Mixture Modelling, Threshold Estimation and Boundary Corrected Kernel Density Estimation. URL https://CRAN.Rproject.org/package=evmix, R package version 2.10.
 Schuster, E. F. (1985). Incorporating support constraints into nonparametric estimators of densities. Communications in StatisticsTheory and methods 14, 1123–1136.
 Schwartz, G. (1978). Estimating the dimension of a model. Annals of Statistics 6, 31–38.
 Scott, D. W. (2009). Multivariate density estimation: theory, practice, and visualization. John Wiley & Sons, 2nd edition.
 Scrucca, L., Fop, M., Murphy, T. B., and Raftery, A. E. (2016). mclust 5: Clustering, Classification and Density Estimation Using Gaussian Finite Mixture Models. R Journal 8, 205–233. URL https://journal.rproject.org/archive/2017/RJ2017008/RJ2017008.pdf.
 Scrucca, L. and Raftery, A. E. (2015). Improved initialisation of modelbased clustering using Gaussian hierarchical partitions. Advances in Data Analysis and Classification 4, 447–460.
 Silverman, B. W. (1986). Density Estimation. Chapman & Hall/CRC.
 Simonoff, J. (1996). Smoothing Methods in Statistics. Springer.
 Stone, C. J., Hansen, M. H., Kooperberg, C., and Truong, Y. K. (1997). Polynomial splines and their tensor products in extended linear modeling. Annals of Statistics 25, 1371–1470. URL http://dx.doi.org/10.1214/aos/1031594728.
 Titterington, D. M., Smith, A. F., and Makov, U. E. (1985). Statistical analysis of finite mixture distributions. Wiley, Chichester; New York.
 Velilla, S. (1993). A note on the multivariate BoxCox transformations to normality. Statistics and Probability Letters 17, 441–451.
 Wand, M. P., Marron, J. S., and Ruppert, D. (1991). Transformations in density estimation. Journal of the American Statistical Association 86, 343–353.
 Wu, C. J. (1983). On the convergence properties of the EM algorithm. The Annals of Statistics 11, 95–103.
 Young, D., Benaglia, T., Chauveau, D., and Hunter, D. (2017). mixtools: Tools for Analyzing Finite Mixture Models. URL https://CRAN.Rproject.org/package=mixtools, R package version 1.1.0.
 Zhu, X. and Melnykov, V. (2016). Manly transformation in finite mixture modeling. Computational Statistics & Data Analysis .