Sparse Quantile Huber Regression for Efficient and Robust Estimation

# Sparse Quantile Huber Regression for Efficient and Robust Estimation

## Abstract

We consider new formulations and methods for sparse quantile regression in the high-dimensional setting. Quantile regression plays an important role in many applications, including outlier-robust exploratory analysis in gene selection. In addition, the sparsity consideration in quantile regression enables the exploration of the entire conditional distribution of the response variable given the predictors and therefore yields a more comprehensive view of the important predictors. We propose a generalized OMP algorithm for variable selection, taking the misfit loss to be either the traditional quantile loss or a smooth version we call quantile Huber, and compare the resulting greedy approaches with convex sparsity-regularized formulations. We apply a recently proposed interior point methodology to efficiently solve all convex formulations as well as convex subproblems in the generalized OMP setting, provide theoretical guarantees of consistent estimation, and demonstrate the performance of our approach using empirical studies of simulated and genomic datasets.

\numberofauthors

## 1 Introduction

Traditionally, regression analyses focus on establishing the relationship between the explanatory variables and the conditional mean value of the response variable. In particular, these analyses use the -norm (least squares) of the residual as the loss function, together with optional regularization functions. Least squares-based methods are sufficient when the data is homogenous; however, when the data is heterogeneous, merely estimating the conditional mean is insufficient, as estimates of the standard errors are often biased. To comprehensively analyze such heterogeneous datasets, quantile regression [15] has become a popular alternative to least squares-based methods. In quantile regression, one studies the effect of explanatory variables on the entire conditional distribution of the response variable, rather than just on its mean value. Quantile regression is therefore well-suited to handle heterogeneous datasets [5]. A sample of recent works in areas such as computational biology [40], survival analysis [16], and economics [17] serve as testament to the increasing popularity of quantile regression. Furthermore, quantile regression is robust to outliers [14]: the quantile regression loss is a piecewise linear “check function” that generalizes the absolute deviation loss for median estimation to other quantiles, and shares the robustness properties of median estimation in the presence of noise and outliers. Finally, quantile regression has excellent computational properties [25].

We study the application of quantile regression for high-dimensional sparse models, where the number of variables far exceeds the sample size () but the number of significant predictors for each conditional quantile of interest is assumed to be much smaller than (). Sparsity plays an important role in many applications. For example, in the field of compressive sensing, sparsity promoting programs allow exact recovery of sparse and compressible under-sampled signals under certain assumptions on the linear operator Candès & Tao [6], Donoho [8]. In many problem domains — natural image processing, seismic imaging, and video — sparse regularization improves recovery [23, 30, 9, 32]. In statistics, sparsity is often used to find the most parsimonious model that explains the data [39, 10, 33, 31].

The most popular technique used to enforce sparsity is that of regularization with a sparsity-inducing norm; for example, the penalty [31] on the coefficient vector is often used. Such regularization has been applied to various loss functions other than loss, including logistic regression [24] and -norm support vector machines [11], among others. Algorithms for learning these sparse models include gradient methods [3] and path following methods [28]. More recently, greedy selection methods such as Orthogonal Matching Pursuit (OMP) [20] have received considerable attention [37, 12] as an alternative to the -penalized methods. In this paper, we show that these methods are also directly applicable to other important regression problems, and demonstrate improved performance of these formulations in the recovery of sparse coefficient vectors.

Recently, quantile regression with an penalty [4, 28, 19] and with non-convex penalties (smoothly clipped absolute deviation and minimax concave) [34] was studied in a high dimensional setting. Our interest in quantile regression is two-fold: (1) to find new efficient and scalable algorithms to quickly compute these robust sparse models and (2) to apply it to high-dimensional biological datasets, where such modeling is highly relevant. Therefore, we extend Belloni & Chernozhukov [4] to consider greedy and convex formulations for sparse quantile regression in a high-dimensional setting. Our main contributions are:

• We generalize the classic quantile check function to a quantile Huber penalty (see Figure 1). In many cases, this formulation holds a significant advantage since the classic quantile check function attempts to fit a portion of the data exactly. Such exact fitting, while useful in some noiseless settings, is undesirable in the presence of noise and outliers. While some smoothing of the quantile loss has been proposed in the past for computational efficiency [38], this is not our motivation, since the computational efficiency of our approach is not significantly affected by smoothness. We show in our experiments that the quantile Huber regression penalty is able to produce better results in simulated experiments.

• We propose a generalized OMP algorithm for sparse quantile regression and extend it for the quantile Huber loss function. Using the greedy OMP algorithm instead of -penalized algorithms allows us to develop efficient, scalable implementations. Furthermore, the greedy OMP algorithm exhibits significant recovery improvements over -penalized algorithms in scenarios where quantiles other than 50% must be considered to capture all relevant predictors.

To demonstrate the significance of our contributions, we compare and contrast four formulations:

1. quantile check loss with penalty (-QR)

2. quantile check loss with constraint (-QR)

3. quantile Huber loss with penalty (-QHR)

4. quantile Huber loss with constraint (-QHR).

In particular, we present methods for convex problems of the form

 minn∑i=1ρ(bi−ATix)+λ∥x∥1

where can be the quantile or quantile Huber penalty (approaches 1 and 3), as well as a generalized OMP algorithm for nonconvex problems of the form

 minn∑i=1ρ(bi−ATix)subject to∥x∥0≤s

to address approaches 2 and 4. The same optimization approach we use for 1 and 3 is also used to solve the convex subproblems required for 2 and 4. In order to optimize all convex formulations and subproblems, we exploit a dual representation of the quantile-based loss functions and use a recently proposed interior point (IP) approach. IP methods directly optimize the Karush-Kuhn-Tucker (KKT) optimality conditions, and converge quickly even when the regression problems are ill-conditioned. The particular IP approach we chose allows us to easily formulate and solve all the problems of interest in order to compare their performance. Our experiments demonstrate that, in a majority of the cases, -QHR performs best; in cases where -QHR performs comparably, we show that -QHR has properties such as quick convergence that make it more suitable in high-dimensional settings.

The rest of this paper is organized as follows. In Section 2, we explain the different quantile-based loss functions and penalties that we use in the rest of the paper. In Section 3, we present the generalized OMP algorithm to solve the -constrained quantile and quantile Huber loss functions. In Section 4, we present the dual representation of the quantile-based loss functions and briefly explain the generalized interior point approach used to solve these formulations. In Section 5, we discuss the numerical and statistical convergence of quantile Huber loss functions with constraints; the convergence with penalties can be obtained by adapting the asymptotic analysis from Belloni & Chernozhukov [4]. Finally, in Section 6, we present extremely encouraging experiments on both synthetic and genomic data that show the effectiveness of both the quantile Huber loss function and the OMP approach.

## 2 Problem Formulation

### 2.1 Notation and background

Let denote the predictor matrix, whose rows are -dimensional feature vectors for training examples. corresponds to the feature of the observation. Let denote the response vector, with as the observation. Quantile regression assumes that the -th quantile is given by

 F−1b|A(τ)=A¯xτ (1)

where is the coefficient vector that we want to estimate and is the cumulative distribution function for a multivariate random variable with the same distribution as . Let be the vector of residuals. Quantile regression is traditionally solved using the following “check-function:”

 cτ(r)=(−τ+1{r≥0})r,

where the operations are taken element-wise; note that setting yields the Least Absolute Deviation (LAD) loss.

### 2.2 A quantile Huber loss

The quantile loss has two important modeling features: first, it is robust to outliers, and second, it sparsifies the residual because of the behavior of the loss at the origin. If this second behavior is not expected or desired, an asymmetrical Huber can be designed by rounding off the function at the origin. This penalty, which we call quantile Huber, still maintains the asymetric slopes required from the quantile penalty outside the interval (see right panel of Figure 1). For a scalar , the explicit formula is given by

 ρτ(x)=⎧⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪⎩τ|x|−κτ22ifx<−τκ12κx2ifx∈[−κτ,(1−τ)κ](1−τ)|x|−κ(1−τ)22ifx>(1−τ)κ (2)

In this paper we assume that the true model parameters are sparse, namely that for a given quantile , the true model parameter has a support of cardinality . To enforce sparsity we consider both the penalized estimation framework

 ^x=argminn∑i=1ρτ(bi−ATix)+λ∥x∥1 (3)

and the greedy approach that approximately solves the minimization of subject to the constraint where is a desired level of sparsity.

## 3 Generalized OMP for ℓ0-Qhr

Orthogonal matching pursuit (OMP) is an effective method for finding sparse solutions to data fitting problems. Given a basis of size , classic OMP computes the residual (where is the submatrix of comprising columns in , and are the corresponding model coefficients), and then chooses the next element to add to by computing

 i=argmaxi|rTAi|,

that is, the index where the maximum absolute value of the projection of the residual is achieved.

This approach generalizes to arbitrary loss functions , including the quantile and quantile Huber loss functions. Generalized OMP for -QHR is detailed in Algorithm 1.

When the loss is not differentiable, a subgradient can be used instead in (5). For example, for the quantile loss with parameter , we interpret as follows:

 ∇cτ(r):=(1−τ)r++τr−.

Note that the quantile Huber loss is differentiable, and the gradient is easily computable from (2).

The refitting step (6) can be solved using any algorithm. In particular, the largest submatrix used to solve any (6) problem will be , with the total number of OMP steps, and is small when the solution is expected to be very sparse. This regime favors the IP method described in section 4, and is dominated by costs of forming () and solving () a particular linear system discussed in section 4. When used as a subroutine to generalized OMP, IP methods very rapidly, reliably, and accurately solve the refitting step (6) for smooth and nonsmooth penalties .

## 4 Efficient optimization for Q(H)R

This section describes our optimization approach to solving -QR and -QHR penalized problems, as well as the refitting step for -QR and -QHR. We first show how our objectives fit into a very general class of functions and give a description of a recently proposed IP algorithm for this class [1]. The section concludes by detailing how to parallelize the required dense linear algebra operations for large-scale data using MATLAB and C++

We first note that other algorithms could be applied to these problems, but a rigorous comparison is out of the scope of this paper. For example, the refitting step in OMP is smooth for quantile Huber Regression, and the objective of -QHR fits into the well-known framework of fast gradient methods for optimizing certain smooth plus non-smooth functions (e.g., Beck & Teboulle [3]). While these methods are competitive with what we describe, the more general framework used here efficiently solves all of our problems, for both smooth and nonsmooth objectives.

### 4.1 Piecewise Linear Quadratic Functions

A broad class of functions known as piecewise linear quadratic (PLQ) are of the form

 f(x):=supu∈U⟨u,Bx+b⟩−12uTMu, (7)

where is a polyhedral set, and is positive semidefinite [27]. When , for all , and we therefore refer to these functions as penalties. Note that for any piecewise linear penalty, such as the quantile penalty (see left panel of Figure 1).

The quantile penalty for can be written

 cτ(r)=τ∑i(ri)−+(1−τ)∑i(ri)+, (8)

where and . The quantile penalty can be represented using notation (7) by taking

 M=0,B=I,b=0,C=[I−I], and c=[(1−τ)1τ1], (9)

to obtain

 cτ(r)=supu∈[−τ,(1−τ)]n⟨u,r⟩. (10)

By taking , the quantile Huber penalty can be represented as

 ρτ(r)=supu∈[−τ,(1−τ)]n⟨u,r⟩−κ2uTu. (11)

Note that in our regression settings, is the residual of a linear function on data matrix and can be plugged into both (10) and (11) by taking and as the response vector.

Moreover, it is easy to see that adding a sparse regularization term to any objective with PLQ penalty encoded by gives another PLQ penalty with augmented data structures , ,

 ~C=⎡⎢⎣CI−I⎤⎥⎦,~c=⎡⎢⎣c−λ1λ1⎤⎥⎦, and ~M=[M000]. (12)

### 4.2 Large-scale interior point approach

The significance of representations (10), (11), and (12) is that all objectives of our interest (including -QR/QHR and the generalized OMP refitting step for -QR/QHR) are of the form (7). Many of the matrices in these PLQ representations are hypersparse; that is, the number of non-zeros is of the order of the matrix dimension. In this section, we briefly review the interior point (IP) approach presented in Aravkin et al. [1] and show how to exploit hypersparsity.

The Karush-Kuhn-Tucker (KKT) optimality conditions for (7) are

 F(x,u,s,q):=⎡⎢ ⎢ ⎢ ⎢⎣BTub+Bx−Mu−CTqCu+s−cQs⎤⎥ ⎥ ⎥ ⎥⎦=0, (13)

where is a slack variable added to the inequality constraint, and is the dual variable corresponding to the resulting equality constraint. IP methods iteratively solve using relaxed systems , obtained by approximating the complementarity conditions in (13) by . Specifically, IP methods solve, at each iteration, the equation defined by a first-order approximation to

 F(1)μ[ΔxTΔuTΔsTΔqT]T=−Fμ, (14)

where is derivative matrix of with respect . Parameter is quickly driven to as the iterations proceed, so that we solve .

IP methods are explained in many classic sources [36, 35] and exhibit super-linear convergence. In order to discuss efficiency, we give an explicit algorithm to solve (14). Defining

 T:=M+CTQS−1C,Ω=BTT−1B, (15)

where , the full Newton iteration (14) is implemented as follows:

 r1 =−s−Cu+c Δx =Ω−1r4 (16) r2 =μ1+Q(Cu−c) Δu =T−1(−r3+BΔx) r3 =−(Bx−Mu−CTq) Δq =S−1(r2+QCΔu) −b+CTS−1r2 r4 =−BTu+BTT−1r3 Δs =r1−CΔu

For -QR/QHR using formulations (10), (11), and (12), is simply a stack of signed identity matrices, and , , and are diagonal by construction. The computations needed in (16) are vector additions, matrix-vector products, matrix-matrix products, and linear solves. Most of the operations are sparse (and fast); the expensive operations are forming and solving for .

In the case of -QR/QHR, OMP picks columns corresponding to a basis estimate for the refitting step at iteration , and in (16) is simply the -column submatrix of data matrix . The cost of forming is , and the cost of solving for is at iteration . Regarding -QR/QHR, in (16) is augmented as shown in (12) and ( are appropriate submatrices of ). The Woodbury inversion formula can be applied to solve for in (16):

 Δx=Ω−1r4=Tpr4−TpATΦ−1ATpr4,

where is a square positive definite matrix in the smaller dimension (number of samples). requires operations to form and to solve.

The IP algorithm is thus very efficient for all formulations on problems up to a moderate-sized number of samples. First order methods would typically require thousands of more iterations at lower complexities of . First order methods may be preferable for large sample sizes; however, for certain special cases, inexact IP methods have been shown to be competitive with state of the art first order methods for huge scale problems [13]. For the problems we studied, the IP framework detailed in this section was used for all convex formulations and convex subproblems of the generalized OMP algorithm.

### 4.3 Solving large-scale systems

The results in this paper (see Section 6) were generated using a multi-threaded version of MATLAB, which is very efficient for basic dense- and sparse-linear algebra operations. However, when the datasets (and ) are large, MATLAB is unable to load the datasets into its memory. Fortunately, it is possible to isolate the operations that involve and to use MATLAB for the remaining computations 1. For large datasets, is maintained as a file handle throughout the MATLAB code and never loaded into memory. When operations involving are to be performed, they are dispatched to a MEX function that contacts a parallel server to perform the operations. Our parallel server makes use of Elemental [26], a distributed dense matrix linear algebra package that makes use of MPI [21, 22] for parallelism. The server is capable of (optionally) caching matrices, which allows us to load the matrix only once and minimize disk cost. For example, when needs to be computed, the MEX function writes out to disk and contacts the server with a request to compute . The server then reads into distributed memory, computes (assuming was loaded earlier), and writes out the resulting vector to disk, which is read back in by the MEX function. Note that as is diagonal in our formulations, we can compute as a diagonal scaling operation on and use the resulting to form .

## 5 Convergence and consistency of OMP for ℓ0-Qhr

In this section we study the behavior of the quantile Huber OMP estimator (-QHR) in the high dimensional setting, i.e. in cases where . To the best of our knowledge, matching pursuit methods have not been considered before in the context of quantile regression, nor have they been studied theoretically. Here we study the various components involved in securing the numerical and statistical convergence of OMP for -QHR.

### 5.1 Notation

Let denote the empirical quantile Huber loss:

 Ln(x)=1nn∑i=1ρτ(bi−ATix).

Let denote the population loss, namely the expectation of the loss

 L(x)=EA,bLn(x).

For any reference vector we define the following three categories of indices.

 E0(^x) = {i:bi−ATi^x<−τκ} E1(^x) = {i:bi−ATi^x∈[−κτ,(1−τ)κ]} E2(^x) = {i:bi−ATi^x>(1−τ)κ}

can be seen as the set of outlying observations, while is the set of inlying observations based on Finally, let denote the restriction of matrix to the rows in set

### 5.2 Numerical convergence

The numerical convergence rate of OMP for -QHR as an optimization method is guided by three properties: (i) Restricted strong convexity, (ii) Lipschitz continuity and (iii) smoothness of the quantile Huber loss. We refer the reader to Shalev-Shwartz et al. [29] for the definition of these properties. The following proposition establishes Lipschitz and smoothness properties for the quantile Huber loss.

###### Proposition 1

The quantile Huber loss function is Lipschitz continous with Lipschitz constant The quantile Huber loss function is smooth with smoothness constant

Proof: Any global bound on the derivative of the loss is a Lipschitz constant for the loss function, and by construction of the quantile Huber loss, the maximum of the slopes of the linear portions of the loss is such a bound, namely The smoothness constant is the second derivative of the quadratic portion of the quantile Huber loss, namely

The following proposition characterizes the numerical convergence of OMP for -QHR

###### Proposition 2

Let denote the population minimizer of the quantile huber loss. Assume that the OMP algorithm for -QH is run for iterations, and produces the iterate . Then for any such that

 k≥2∥~x∥21κϵ,

there holds

 Ln(x(k))−Ln(~x)≤ϵ.

Proof: Noting that the quantile Huber loss is smooth with constant the result follows from theorem 2.7 in Shalev-Shwartz et al. [29].

Exponentially better numerical convergence can be secured under restricted strong convexity (see Shalev-Shwartz et al. [29][Definition 1.3]). To guarantee the latter property we need the following assumption, which is commonly made on the entire matrix in the study of high dimensional least squares regression.

###### Assumptions 1

(Sparse Eigenvalue on ). Given any positive integer for all we require that the matrix satistisfies the restricted eigenvalue property. Namely there exist such that

###### Proposition 3

Under Assumption 1, the quantile Huber loss enjoys the restricted strong convexty property with constant on the set from some ball centered around

Proof: In a small neighborhood of we have

 n∑i=1ρ0.5(bi−ATix)≥∑i∈E1(~x)12κ(bi−ATix)2.

Due to the restriced eigenvalue property of we obtain the desired result.

Improved convergence rates can then be obtained as soon as OMP for -QHR reaches the region of restricted strong convexity around

###### Proposition 4

Assume that the OMP algorithm for -QHR reaches the restricted strong convexity region after iterations and produces the iterate . Then for any such that

 k≥∥~x∥0κγ(k0+k)logLn(x(k0))−Ln(~x)ϵ,

there holds

 Ln(x(k+k0))−Ln(~x)≤ϵ.

Proof: The proof follows by adapting the reasoning of Theorem 2.8 in [29]. Specifically, careful inspection of the proof of Theorem 2.8 in  Shalev-Shwartz et al. [29] reveals that restricted convexity need not hold everywhere but only in a certain neighborhood around Thus as long as OMP for -QHR reaches the restricted strong convexity region around and Assumption 1 holds the convergence rate improves exponentially.

We note that as an extreme case, if is large enough so that belongs to the ball , then the situation reduces to the simple quadratic case and the algorithm enjoys fast convergence throughout.

### 5.3 Statistical consistency

We now briefly discuss the statistical consistency of OMP for -QHR. A formal technical analysis is beyond the scope of this paper and will be presented in future work. Recall that denote a population minimizer of the quantile Huber loss and the iterate output by OMP for -QHR after the -th iteration. We have the following decomposition:

 EL(x(k))−L(~x)≤E|Ln(x(k))−L(x(k))|+E|Ln(~x)−L(~x)|+E|Ln(x(k))−Ln(~x)|.

On the right-hand side of the inequality, the convergence of the second term is guaranteed using the traditional central limit theorem, since is fixed. The third term characterizes the numerical convergence of OMP for -QHR and was dealt with in Propositions 2 and 3. The first term can be bounded using standard results from empirical process theory (e.g. Bartlett & Mendelson [2]). For instance we get that uniformly for all there holds Combining all the pieces allows one to conclude that the expected quantile Huber loss of the estimate produced by OMP for -QHR converges to the infimum population loss.

To guarantee the consistency in terms of the original quantile loss rather than the Quantile Huber loss, it now remains to address how well a population minimizer of the quantile Huber loss approximates a population minimizer of the original quantile loss. The early work of Clark [7] sheds partial light on the relationship between both estimators (see in particular Theorem 6 in Clark [7]). Subsequently, the question was fully addressed by Li & Swetits [18] for the case whe . Specifically they showed that the solution set of the Huber estimator problem is Lipschitz continuous with respect to the parameter , and thus that the set of the traditional estimators is the limit of the set of the Huber estimators as This result can naturally be extended to general quantile Huber and quantile regression.

## 6 Experiments

In this section, we present experiments with simulated and real data, with very promising results. The first subsection demonstrates the use of quantile versus quantile Huber loss in feature selection tasks. In the second section we study an eQTL problem to discover variations in the genome that are associated with the APOE gene, a key gene for Alzheimer’s disease.

### 6.1 Simulations

We employ a simulation setting similar to [34]. The data matrix is generated in two steps. In the first step, an auxiliary matrix is generated from a multivariate normal distribution where In the second step, we set where is the normal cumulative distribution function, and The response vector is generated according to the model

 b=A6+A12+A15+A20+0.7A1ϵ+η,

where and are independent of one another and of the matrix . It is important to note that impacts the conditional distribution of given the predictors, but does not directly influence the center (mean or median) of the conditional distribution.

We consider and The comparison methods are -QHR (OMP with quantile Huber Loss), -QR (OMP with the traditional check function), -QR (quantile loss with regularization), -QHR (quantile Huber loss with regularization), and Lasso (the traditional Lasso estimator with loss). We run 100 simulation runs (considering 100 different datasets). For each simulation run, the parameters are selected via holdout validation, using a holdout dataset of size

As a measure of variable selection accuracy, we report the score which is the harmonic mean between precision and recall. Specifically, the score is

 F1=2⋅Prec⋅RecPrec+Rec,Prec=tptp+fp,Rec=tptp+fn,

with denoting true positives, false positives, and false negatives.

We analyze quantile loss with selected from , and for quantile Huber loss consider . Figure 2 depicts the scores for the various methods as a function of the quantile considered. From the figure we can make the following remarks:

• When the predictors do not necessarily impact the mean/median of the distribution it is critical to look at a wide spectrum of quantiles to capture all relevant predictors.

• Regardless of whether an OMP or regularized method is used, the quantile Huber loss yields higher accuracy than the quantile loss. This due to the fact that quantile Huber loss does not “insist” on fitting the inliers exactly.

• Remarkably both -QHR and -QR achieve superior accuracy over -QHR and -QR.

In comparison, the variable selection accuracy of Lasso is much lower, namely

We conclude the simulation study by briefly discussing the impact of in the quantile Huber loss in terms of robustness. Figure 3 depicts the score for -QHR and as a function of . Similar behavior is observed for -QHR and is omitted due to space constraints.

Parameter can be seen as a prior on the level of noise, providing a boundary between what is an acceptable level of noise and what is to be considered as outliers. Higher values of are better suited for errors with lighter tails, while smaller values of are more appropriate for heavier tails. A finer insight into the relationship between and the error distribution is the subject of future work.

### 6.2 Application to eQTL Mapping

An interesting benefit of sparse quantile regression, in addition to its robustness to outliers, is the ability to perform variable selection in a quantile-dependent fashion. Sparse quantile regression can thus provide a more realistic picture of the sparsity patterns, which may be different at different quantiles. We illustrate this point by analyzing Alzheimer’s disease (AD) data generated by the Harvard Brain Tissue Resource Center and Merck Research Laboratories (http://sage.fhcrc.org/downloads/ downloads.php). This data concerns AD cases with SNPs and expression levels in the visual cortex. For our analysis, we selected candidate SNPs based on a set of genes related to neurological diseases and studied the associations between these SNPs and the expression levels of the APOE gene, which is a key Alzeimer’s gene. Specifically, persons having an APOE e4 allele have an increased chance of developing the disease; those who inherit two copies of the allele are at even greater risk.

Figure 4 shows the top SNPs (sorted by the amplitude of their regression coefficient) selected by the -QHR method for quantiles 10, 25, 50, 75, 90, as a function of the chromosome distance (similar insights can be gained from the -QHR method, but we omit the plots due to space constraints.)

As can be seen from the figure, the coefficient profile indeed changes based on the quantiles. This confirms the intuition that high-dimensional data such as those encountered in genomics is likely to exhibit heterogeneity due to some non-location-scale covariate effects. Sparse quantile regression enables a more comprehensive view of sparsity, where the set of relevant covariates can differ based on the segment of the conditional distribution under consideration. While some of the “hotspots” identified by our methods vary accross quantiles, it is also interesting to note that some do persist, in particular a hotspot within chromosome 19 where gene APOE resides.

To conclude the eQTL analysis, it is insightful to investigate the Normal QQ-plots of the residuals from the methods. An examplary QQ-plot for the median quantile residuals of OMP is shown in Figure 4. We can see that the residuals have a very heavy right tail. This suggests that the robustness property of our methods is also valuable for this type of analysis.

### Footnotes

1. We assume that vectors of size and matrices of size fit in memory on one machine (and hence, in MATLAB). If this assumption does not hold, all computations must be implemented in distributed memory using lower-level languages such as C/C++.

### References

1. Aravkin, Aleksandr Y., Burke, James V., and Pillonetto, Gianluigi. Sparse/robust estimation and kalman smoothing with nonsmooth log-concave densities: Modeling, computation, and theory. Journal of Machine Learning Research, 14:2689–2728, 2013.
2. Bartlett, Peter L. and Mendelson, Shahar. Empirical minimization. Probability Theory and Related Fields, 135(3):311–334, 2006.
3. Beck, A. and Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal of Imaging Sciences, 2(1):183–202, 2009.
4. Belloni, Alexandre and Chernozhukov, Victor. L1-penalized quantile regression in high-dimensional sparse models. Annals of Statistics, 39(1):1012–1030, 2011.
5. Buchinsky, Moshe. Changes in the u.s. wage structure 1963-1987: Application of quantile regression. Econometrica, 62(2):405–58, March 1994.
6. Candès, E. J. and Tao, T. Decoding by linear programming. IEEE Transactions on Information Theory, 51(12):4203–4215, 2005.
7. Clark, D. The mathematical structure of huberâs m-estimator. SIAM Journal on Scientific and Statistical Computing, 6:1:209–219, 1985.
8. Donoho, D. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, 2006.
9. Fornasier, M. Nonlinear projection recovery in digital inpainting for color image restoration. J. Math. Imaging Vis., 24(3):359–373, 2006. ISSN 0924-9907.
10. Guyon, I. and Elisseeff, A. An introduction to variable and feature selection. Journal of Machine Learning Research, 3:1157–1182, 2003.
11. J., Zhu., Rosset, S., Hastie, T., and Tibshirani, R. 1-norm support vector machines. Neural Information Processing Systems (NIPS), 16, 2004.
12. Johnson, Christopher C., Jalali, Ali, and Ravikumar, Pradeep D. High-dimensional sparse inverse covariance estimation using greedy methods. Proceedings of the 15th International Conference on Artificial Intelligence and Statistics, 2012.
13. Kim, Seung-Jean, Koh, K., Lustig, M., Boyd, S., and Gorinevsky, D. An interior-point method for large-scale l1-regularized least squares. Selected Topics in Signal Processing, IEEE Journal of, 1(4):606–617, 2007.
14. Koenker, R. Quantile Regression. Cambridge University Press, 2005.
15. Koenker, R. and Bassett, G. Regression quantiles. Econometrica, pp. 33–50, 1978.
16. Koenker, R. and Geling, O. Reappraising medfly longevity: A quantile regression survival analysis. Journal of the American Statistical Association, 96:458â468, 2001.
17. Koenker, Roger and Hallock, Kevin F. Quantile regression. Journal of Economic Perspectives, American Economic Association, pp. 143–156, 2001.
18. Li, W. and Swetits, J. The linear l1 estimator and the huber m-estimator. SIAM Journal on Optimization, 8(2):457–475, 1998.
19. Li, Y. and Zhu, J. L1-norm quantile regression. Journal of Computational and Graphical Statistics, 17(1):1–23, 2008.
20. Mallat, S.G and Z, Zhang. Matching pusuits with time-frequency dictionaries. IEEE Transcations on Signal Processing, 41:3397–3415, December 1993.
21. Message Passing Interface Forum. MPI, June 1995.
22. Message Passing Interface Forum. MPI-2, July 1997.
23. Neelamani, R., Krohn, C. E., Krebs, J. R., Romberg, J. K., Deffenbaugh, Max, and Anderson, John E. Efficient seismic forward modeling using simultaneous random sources and sparsity. Geophysics, 75(6):WB15–WB27, 2010.
24. Ng, A. Feature selection, l1 vs. l2 regularization, and rotational invariance. Proceedings of 21st International Conference on Machine Learning (ICML), 2004.
25. Portnoy, Stephen and Koenker, Roger. The gaussian hare and the laplacian tortoise: Computability of squared- error versus absolute-error estimators. Statistical Science, 12(4):pp. 279–296, 1997.
26. Poulson, Jack, Marker, Bryan, van de Geijn, Robert A., Hammond, Jeff R., and Romero, Nichols A. Elemental: A new framework for distributed memory dense matrix computations. ACM Transactions on Mathematical Software, 39(2):13:1–13:24, February 2013.
27. Rockafellar, R.T. and Wets, R.J.B. Variational Analysis, volume 317. Springer, 1998.
28. Rosset, S. and Zhu, J. Piecewise linear regularized solution paths. Annals of Statistics, 35(3):1012–1030, 2007.
29. Shalev-Shwartz, Shai, Srebro, Nathan, and Zhang, Tong. Trading accuracy for sparsity in optimization problems with sparsity constraints. Siam Journal on Optimization, 20:2807–2832, 2010.
30. Starck, J.-L, Elad, M., and Donoho, D. Image decomposition via the combination of sparse representation and a variational approach. IEEE Transaction on Image Processing, 14(10), 2005.
31. Tibshirani, R. Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society, Series B., 58(1):267–288, 1996.
32. Wakin, M., Laska, J., Duarte, M., Baron, D., Sarvotham, S., Takhar, D., Kelly, K., and Baraniuk, R. Compressive imaging for video representation and coding. Proc. Picture Coding Symposium, 2006.
33. Wang, H., Li, G., and Tsai, C.L. Regression coefficient and autoregressive order shrinkage and selection via the lasso. Journal Of The Royal Statistical Society Series B, 69(1):63–78, 2007.
34. Wang, Lan, Wu, Yichao, and Li, Runze. Quantile regression for analyzing heterogeneity in ultra-high dimension. Journal of the American Statistical Association, 107(497):214–222, 2012.
35. Wright, S.J. Primal-dual interior-point methods. Siam, Englewood Cliffs, N.J., USA, 1997.
36. Ye, Yinyu and Anstreicher, Kurt. On quadratic and convergence of a predictor-corrector method for lcp. Mathematical Programming, 62(1-3):537–551, 1993.
37. Zhang, T. Adaptive forward-backward greedy algorithm for sparse learning with linear models. Neural Information Processing Systems (NIPS), 21, 2008.
38. Zheng, Songfeng. Gradient descent algorithms for quantile regression with smooth approximation. International Journal of Machine Learning and Cybernetics, 2(3):191–207, 2011.
39. Zou, H. The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101:1418–1429, 2006.
40. Zou, Hui and Yuan, Ming. Regularized simultaneous model selection in multiple quantiles regression. Computational Statistics & Data Analysis, 52(12):5296–5304, 2008.
You are adding the first comment!
How to quickly get a good reply:
• Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
• Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
• Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters