Composite Robust Estimators for Linear Mixed Models

Composite Robust Estimators for Linear Mixed Models

C. Agostinelli
Department of Environmental Sciences, Informatics and Statistics
Ca’ Foscari University
Venice, Italy
V.J. Yohai
Departamento de Matemáticas and Instituto de Cálculo
FCEyN, University of Buenos Aires and CONICET
Buenos Aires, Argentina
July 5, 2019

The Classical Tukey-Huber Contamination Model (CCM) is a usual framework to describe the mechanism of outliers generation in robust statistics. In a data set with observations and variables, under the CCM, an outlier is a unit, even if only one or few values are corrupted. Classical robust procedures were designed to cope with this setting and the impact of observations were limited whenever necessary. Recently, a different mechanism of outliers generation, namely Independent Contamination Model (ICM), was introduced. In this new setting each cell of the data matrix might be corrupted or not with a probability independent on the status of the other cells. ICM poses new challenge to robust statistics since the percentage of contaminated rows dramatically increase with , often reaching more than 50% . When this situation appears, classical affine equivariant robust procedures do not work since their breakdown point is 50%. For this contamination model we propose a new type of robust methods namely composite robust procedures which are inspired on the idea of composite likelihood, where low dimension likelihood, very often the likelihood of pairs, are aggregate together in order to obtain an approximation of the full likelihood which is more tractable. Our composite robust procedures are build over pairs of observations in order to gain robustness in the independent contamination model. We propose composite S and -estimators for linear mixed models. Composite -estimators are proved to have an high breakdown point both in the CCM and ICM. A Monte Carlo study shows that our estimators compare favorably with respect to classical S-estimators under the CCM and outperform them under the ICM. One example based on a real data set illustrates the new robust procedure.

Keywords: Composite S-estimators, Composite -estimators, Independent Contamination Model, Tukey-Huber Contamination Model, Robust estimation.

1 Introduction

The purpose of this paper is to find robust procedures for mixed linear models. This class of models include among others ANOVA models with repeated measures, models with random nested design and models for studying longitudinal data. These models are generally based on the assumption that the data follow a normal distribution and therefore the parameters are estimated using the maximum likelihood principle. See for example, Searle et al. (1992). As is well known, in general, the estimator obtained by maximum likelihood under the assumption that the data have a normal distribution is very sensitive to the presence of a small fraction of outliers in the sample. More than that, just one outlier may have an unbounded effect on this estimator. There are many robust estimators that have been proposed to avoid a large outlier influence. A large list of references of these proposals is available in Heritier et al. (2009). Victoria-Feser and Copt (2006) introduces a very interesting robust S-estimator for mixed linear models based on M-scales which has breakdown point equal to . We can also mention Gill (2000), Jiang and Zhang (2001), Sinha (2004), Copt and Heritier (2006), Jacqmin-Gadda et al. (2007), Lachosa et al. (2009), Chervoneva and Vishnyakov (2011) and Koller (2013a) which studied an SMDM-estimator. The procedure proposed in the last paper is implemented in the R package robustlmm (Koller, 2013b).

However all these procedures are focused on coping with outliers generated under the Classical (Tukey-Huber) Contamination Model (CCM), where some percentage of the units that compose the sample are replaced by outliers. However Alqallaf et al. (2009) introduced another type of contamination (called Independent Contamination Model, ICM) that may occur in multivariate data. Instead of contaminating a percentage of the units that compose the sample, the different cells of each unit may be independently contaminated. In this case, if the dimension of each unit is large, even a small fraction of cell contamination may lead to a large fraction of units with at least one contaminated cell. This type of contamination specially occurs when the different variables that compose each unit are measured from independent laboratories. Alqallaf et al. (2009) showed that for this type of contamination the breakdown point of affine equivariant procedures for multivariate location and covariance matrix tends to zero when the number of variables increases and therefore their degree of robustness is not satisfactory. A similar phenomenon occurs when dealing with mixed linear models. In particular the S-estimator procedure introduced in Victoria-Feser and Copt (2006) loses robustness for high dimensional data with independent contamination.

In this paper we propose a new class of robust estimators for linear mixed models. These estimators may be thought as robust counterparts of the composite likelihood estimators proposed by Lindsay (1988). If a vector of dimension is observed, the composite likelihood estimators are based on the likelihood of all the subvectors of a dimension . The estimators that we propose here are based on -scales of the Mahalanobis distances of two dimensional subvectors of . The -scale estimators were introduced by Yohai and Zamar (1988) and provides scales estimators which are simultaneously highly robust and highly efficient. We are going to show that these estimators have a robust behavior for both contamination models: the classical contamination model and the independent contamination model. In particular we will show that the breakdown point for the classical contamination model is , while for the independent contamination model is .

In Section 2 the model and the notation are presented. Section 2.1 introduces the Composite S-estimator, while Section 3 defines the Composite -estimator. Sections 4 and 5 discuss the breakdown properties and the asymptotic normality of the Composite -estimator, Section 6 provides details on the computational algorithm and Section 7 illustrates with a real data set the advantages of the proposed estimator. In Section 8 we perform a Monte Carlo simulation that shows that the proposed procedure has a robust behavior under both contamination models. Section 9 provides some concluding remarks. An Appendix contains details on computational aspects and the proofs of statements reported in previous Sections.

2 Model and Notation

Denote by the multivariate normal distribution of dimension with mean and covariance matrix . Many statistical models for components of variance and longitudinal analysis are of the following form. In the case of fixed covariables is assumed that independent -dimensional om vectors in are observed, and has distribution , where


are a fixed matrices and is an unknown -vector parameter. Moreover,


where , are matrices, is the identity, and are unknown parameters, where

In the case of random covariables, that is, when are i.i.d random matrices, it is assumed that


This is equivalent to independent of with distribution . However, in Section 5, where we study the asymptotic properties of the proposed estimators, we use a weaker assumption. In fact we only require that be independent of and have elliptical distribution with center and covariance matrix .

This setup covers several statistical models, for instance that of the form


where are as before, , , are known design matrices for the random effects, are independent -dimensional vectors with distribution , where is the identity and are -dimensional error vectors with distribution . Then, in this case we have with , .

2.1 Composite S-estimator

A very interesting class of S-estimators for the model defined by (1) and (2) was proposed by Victoria-Feser and Copt (2006).

Given a dimensional column vector and a vector and matrix the square of the Mahalanobis distance is defined by

Let , where is the set of nonnegative real numbers, satisfying the following properties:




implies .


is continuous.




If and , then .

Let be defined by

where is a chi-square distribution with degrees of freedom. Then, given a sample , an M-scale estimator is defined by the value solution of


The S-estimator proposed by Victoria-Feser and Copt (2006) is defined by

subject to

These estimators can be thought as a constrained version of the S-estimators for multidimensional location and scatter proposed by Davies (1987).

Given a squared matrix we denote by where is the determinant of the matrix . Note that depends only on and then will be denoted by . It is easy to show that the estimators proposed by Victoria-Feser and Copt (2006) can be also defined by

where the M-scale is defined now by (5). Notice that is defined by

In the classical contamination model a fraction of the vectors are replaced by outliers. Victoria-Feser and Copt (2006) show that for this model the breakdown point of this estimator is . Therefore if , we get .

Alqallaf et al. (2009) consider a different contamination model for multivariate data: the independent contamination model. In this contamination model if we observe a vector each component of has probability of being replaced by an outlier. Therefore the probability that at least one component of be contaminated is , and this number is close to one when is large even if is small.

Alqallaf et al. (2009) showed that the breakdown point for the independent contamination model of S-estimators of multivariate location and scatter tends to when . The same happens with other popular affine equivariance estimators like the minimum volume ellipsoid (Rousseeuw, 1985), Minimum covariance determinant (Rousseeuw, 1985) or the Donoho-Stahel estimators (Donoho, 1982; Stahel, 1981). The S-estimator proposed by Victoria-Feser and Copt (2006) for model (1)-(2) have a similar shortcoming: when , its breakdown point tends to under the independent contamination model. For this reason, hereafter we introduce a new type of estimators namely composite S-estimators and composite -estimators.

Given a vector , a matrix and for a couple of indices () we denote and the submatrix

In a similar way, given a matrix we denote by the matrix of dimension built by using the corresponding rows of . We define a pairwise squared Mahalanobis distance and a pairwise scale by


where the M-scale is now defined by (5) with given by


Thus is defined by


Similarly to Victoria-Feser and Copt (2006), we define for the model in (1)-(2), the composite S-estimator of and by


and the estimator of by


One shortcoming of the composite S-estimators are, as occurs with regression S-estimators, that they are not simultaneously highly robust and highly efficient. For this reason in next section we introduce the composite -estimators which are defined similarly to the -estimators, but replacing the M-scale by a -scale.

3 Composite -Estimator

In this section we introduce the composite -estimator. A -scale is defined using two functions and . Given a sample , the function is used to define an M-scale by


and the -scale by

We will require that and satisfy A1-A5. Put , . In Yohai and Zamar (1988) it is shown that to guarantee the Fisher consistency of the -estimators of regression, it is required that satisfies the following condition:


is continuously differentiable and for .

The breakdown point of the -scale is the same as that of the -scale. Then we are going to set to have breakdown point close to in the classical contamination model.

The estimators are going to be defined as in the previous section by replacing the M-scales by the -scales. Then is defined as in (6) and the -scale is

Let be the sum of all the scales, i.e.,


then the composite -estimator of is defined as follows


while is obtained as in (10) setting .

In the example of Section 7 and in the Monte Carlo study of Section 8 we took in the following family of functions,


where is the family of rho functions introduced by Muler and Yohai (2002) defined by


where , , , , and . The functions in this family have shapes close to that of those in the optimal family obtained by Yohai and Zamar (1997). However, they are easier to compute. The reason why we compose the function with the square root is that the functions are applied to the squared Mahalanobis distances. Notice that for any the scale obtained with and is equal to the scale corresponding to and divided by Hence without loss of generality we can take . We found that taking we obtain a good trade-off between robustness and efficiency, and these are the values that we recommend to use.

It is easy to show that the composite -estimators are equivariant for regression transformations of the form where is a vector, affine transformations of the form , where is a non singular matrix or scale transformations of the form , where is a scalar.

4 Breakdown point

Donoho and Huber (1983) introduced the concept of a finite sample breakdown point (FSBDP). For our case, let and be estimators of and . Informally speaking, the FSBDP of is the smallest fraction of outliers that makes the estimator unbounded.

To formalize this, let be a data set of size corresponding to model (1)-(2), , , , and (). Let be the set of all the samples with such that . Given estimators and we let

Definition 1

The finite sample breakdown point of for classical contamination (FSBDPCC) at the sample is defined by where and the breakdown point of by where

Let be the set of all the samples such that for each , . Given estimators and we let

Definition 2

The finite sample breakdown point for under independent contamination (FSBDPIC) at the sample is defined by , where , and the breakdown point of by where

The following theorems, whose proofs are discussed in Appendix B, gives a lower bound for the breakdown point of composite -estimators under both the classical and the independent contamination models. Before to state the Theorems we need the following notation. Given a sample we define

Theorem 1

Let , , as defined in (20). Assume that A1-A6 holds and let be the composite -estimator for the model given by (1) and (2). Then a lower bound for and for is given by .

Note that taking , this lower bound is close to for large independently of .

Theorem 2

Let , , as defined in (20). Assume that A1-A6 holds and let be the composite -estimator for the model given by (1) and (2). Then a lower bound for and for is given by .

In this case taking this lower bound is close to for large independently of .

5 Consistency and Asymptotic Normality

In this Section we study the almost sure consistency and asymptotic normality of the composite -estimators. We need the following additional assumptions for consistency


The vector is random and the error is independent of and has an elliptical density of the form


where is non increasing and is strictly decreasing in a neighborhood of 0.


Let be the distribution of Then for any , we have


(Identification condition). If for all we have .

An important family of distributions satisfying A7 is the multivariate normal, in this case,


Note that when the s satisfy (1), (2) and (3), A7 is satisfied. The following Theorem states the consistency of composite -estimators.

Theorem 3

Let , , , be i.i.d random vectors with distribution and call the marginal distribution of the Assume (i) satisfies (A1-A5), (ii) satisfies A1-A6, (iii) under A7 and A8 holds and (iv) A9. Then, the composite -estimators , and satisfy (a.s.), (a.s.). Moreover, if is given by (22) we also have (a.s.).

Note that for the consistency of and is not necessary that be multivariate normal. We do not give a formal proof of this Theorem. In Theorem 5 of the Appendix C we give a rigorous proof of the Fisher consistency of the estimating functional associated to the compose -estimator. From this result we derive an heuristic proof of Theorem 3.

The following Theorem states the asymptotic normality of composite -estimators. We need the following assumptions


Let be the distribution of . Then has finite second moments and is non–singular.


The functions , are twice differentiable.

Theorem 4

Let and be the composite -estimator. Consider the same assumptions as in Theorem 3, A10 and A11. Then, we have


and and are the gradient and Hessian matrix of respectively.

We do not give the proof of Theorem 4. However, it can be obtained using standard delta method arguments, see for example Theorem 10.9 in Maronna et al. (2006). This Theorem allows to define Wald tests for null hypothesis and confidence intervals for and , but not for . However in most practical applications the interest is centered in and .

6 Computational aspects

The composite -estimators are obtained by an iterative algorithm. Hereafter we provide some details. Given starting values and , we perform iterations on steps (A)-(C) until convergence. Suppose that we have already computed and then we performed the following steps A, B and C to obtain and :


Find scales () by solving equations


Update by the fixed point equation using equation (25) derived in Appendix A.1. That is,



A fixed point equation for can be derived from the estimating equation (27). However we found that to use this equation to update was numerically unstable. We preferred to make this updating by means of a direct minimization of the goal function defined in (12), that is, we define by

For this purpose, in the example of Section 7 and in the Monte Carlo study of Section 8 we used the function optim of the R program.


Once the convergence criterion for is reached, the estimator of is obtained by solving the equation (10).

To start the iterative algorithm the initial estimators and are necessary. Let , the -th column of , . Then, can be obtained by means of robust regression estimator using as response and , as covariables. In our algorithm we use an MM-estimator as implemented in function lmRob of the R package robust using an efficiency of . Once this initial estimator is computed, the residuals can be evaluated. Then, a robust covariance matrix of robust under the ICM model is obtained applying to these residuals the estimator presented in Agostinelli et al. (2014) based on filtering and S-estimators with missing observations. Call to this matrix and let be the vector of the values of the lower triangular side of this matrix including the diagonal elements. In a similar way, let be the column vector of the values of the lower triangular side of the matrix . An initial estimator of could be obtained by means of a regression estimator using as response and as covariables. Since neither nor need to have outliers, it is not necessary to use a robust estimator, in fact we use function lm of R to perform this step.

7 Example

Hereafter we present one application of the introduced method on a real data set. The example is a prospective longitudinal study of children with disorder of neural development. In this data set, outliers are present in the couples rather than in the units and the composite -estimator provides a different analysis with respect to maximum likelihood and classical robust procedures.

7.1 Autism

The data used in this example were collected by researchers at the University of Michigan (Anderson et al., 2009) as part of a prospective longitudinal study of children and they are analyzed, among others, also in West et al. (2007). The children were divided into three diagnostic groups when they were years old: autism, pervasive developmental disorder (PDD), and nonspectrum children. The study was designed to collect information on each child at ages , , , , and years, although not all children were measured at each age. One of the study objectives was to assess the relative influence of the initial diagnostic category (autism or PDD), language proficiency at age , and other covariates on the developmental trajectories of the socialization of these children. Study participants were children who had consecutive referrals to one of two autism clinics before the age of years. Social development was assessed at each age using the Vineland Adaptive Behavior Interview survey form, a parent-reported measure of socialization. The dependent variable, vsae (Vineland Socialization Age Equivalent), was a combined score that included assessments of interpersonal relationships, play/leisure time activities, and coping skills. Initial language development was assessed using the Sequenced Inventory of Communication Development (SICD) scale; children were placed into one of three groups (sicdegp, , where is the indicator function of the group) based on their initial SICD scores on the expressive language subscale at age . We consider the subset of children for which all measurements are available. We analyze this data using a regression model with random coefficients where vsae is explained by intercept, age, age and sicdegp as a factor variable plus interaction among the age related variables and sicdegp. Hereafter, the variable age is shifted by . Let be the value of the -th vsae for the -th ages value , then it is assumed that for we have

where are i.i.d. random coefficients with mean and covariance matrix

are fixed coefficients and the are i.i.d. random errors independent of the random coefficients with zero mean and variance . Then, the model could be rewritten in term of (1) and (2) with , , and , ,

while the variance and covariance structure is as follows. Let, a -vector of ones, , which corresponds to age and which corresponds to . Then, we have , , , , and . is the scale of the error term, , , , , and .

Method Int.
Max. Lik. 12.847 6.851 0.062 5.245 2.154 6.345 4.512 0.133 0.236
[0.000] [0.000] [0.579] [0.041] [0.325] [0.000] [0.000] [0.446] [0.121]
Composite 12.143 6.308 0.089 5.214 4.209 5.361 3.852 0.082 0.061
[0.000] [0.000] [0.329] [0.000] [0.012] [0.000] [0.001] [0.578] [0.677]
S Rocke 10.934 7.162 0.107 4.457 0.108 5.769 4.995 0.094 0.419
[0.000] [0.001] [0.666] [0.049] [0.957] [0.002] [0.000] [0.688] [0.011]
SMDM 12.346 6.020 0.001 5.192 2.173 5.190 3.870 0.046 0.151
[0.000] [0.000] [0.992] [0.010] [0.213] [0.000] [0.000] [0.781] [0.300]
Table 1: Autism data set. Estimated fixed term parameters by different methods. P-values are reported under squared parenthesis.

Table 1 report the estimators and the inference for the fixed term parameters using different methods, while Table 2 reports the estimators of the random effect terms. ML, S and SMDM provide similar results, while differences are present with the composite method. Main differences are on the estimation of the random effects terms, both in size (error variance component) and shape (correlation components). Composite assign part of the total variance to the random components while the other methods assign it to the error term. In fact, variances estimated by composite are in general larger than that estimated by the other methods; composite suggests negative correlation between intercept and age, while ML, S and SMDM suggest positive correlation. Composite provides small estimates compared to the other methods for the error variance. These discrepancies reflects mainly on the inference for the fixed term coefficients where the variable sicdegp is significant using composite but is not using ML, S and SMDM procedures. Interactions between age and sicdegp is highly non significant using composite and SMDM while it is somewhat significant using S.

To investigate more the reasons of differences between composite robust procedure and classic robust procedure results, we investigate cell, couple and row outliers. For a given dimension we define as -dimension outliers those -dimension observations such that the corresponding squared Mahalanobis distance is greater than a quantile order of a chi-square distribution with degree of freedom. In particular we call cell, couple and row outliers the -dimension, -dimension and -dimension outliers respectively. Composite procedure identifies couple outliers out of couples () at . The affected rows, with at least one couple outliers, are out of . This means that the classic S and SMDM procedures have to deal with a data set with a level of contamination about . We also run the analysis using the composite S estimator, not reported here, the results are similar to those obtained by the composite estimator.

Maximum Likelihood 2.643 2.328 0.102 0.775 0.429 0.038 51.360
Composite 9.362 9.670 0.052 4.019 0.002 0.327 5.164
S Rocke 9.467 3.373 0.222 2.170 1.062 0.349 22.209
SMDM 5.745 0.092 0.115 0.727 0.813 0.103 25.385
Table 2: Autism data set. Estimated random term parameters by different methods.

8 Monte Carlo simulations

In this section we describe the results of a Monte Carlo study with the aim of illustrating the performance of the new procedure in the classical contamination Model (CCM) and in the independent contamination model (ICM). We consider a 2-way crossed classification with interaction linear mixed model


where , , and . Here, we set , and which leads to .