Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping\thanksref{T1}
Abstract
We consider the problem of estimating a sparse multi-response regression function, with an application to expression quantitative trait locus (eQTL) mapping, where the goal is to discover genetic variations that influence gene-expression levels. In particular, we investigate a shrinkage technique capable of capturing a given hierarchical structure over the responses, such as a hierarchical clustering tree with leaf nodes for responses and internal nodes for clusters of related responses at multiple granularity, and we seek to leverage this structure to recover covariates relevant to each hierarchically defined cluster of responses. We propose a tree-guided group lasso, or tree lasso, for estimating such structured sparsity under multi-response regression by employing a novel penalty function constructed from the tree. We describe a systematic weighting scheme for the overlapping groups in the tree penalty such that each regression coefficient is penalized in a balanced manner, despite the inhomogeneous multiplicity of group memberships of the regression coefficients due to overlaps among groups. For efficient optimization, we employ a smoothing proximal gradient method that was originally developed for a general class of structured-sparsity-inducing penalties. Using simulated and yeast data sets, we demonstrate that our method shows superior performance in terms of both prediction errors and recovery of true sparsity patterns, compared to other methods for learning a multivariate-response regression.
DOI: 10.1214/12-AOAS549. The Annals of Applied Statistics, Vol. 6, No. 3 (2012), 1095–1117.
Tree lasso for eQTL mapping
T1: Supported in part by NIH 1R01GM087694.
Seyoung Kim (sssykim@cs.cmu.edu) and Eric P. Xing (epxing@cs.cmu.edu)\thanksref{t2}
t2: Supported in part by ONR N000140910758, NSF DBI-0640543, NSF CCF-0523757 and an Alfred P. Sloan Research Fellowship.
Keywords: lasso, structured sparsity, high-dimensional regression, genetic association mapping, eQTL analysis.
1 Introduction
Recent advances in high-throughput technology for profiling gene expressions and assaying genetic variations at a genome-wide scale have provided researchers an unprecedented opportunity to comprehensively study the genetic causes of complex diseases such as asthma, diabetes, and cancer. Expression quantitative trait locus (eQTL) mapping considers gene-expression measurements, also known as gene-expression traits, as intermediate phenotypes, and aims to identify the genetic markers, such as single nucleotide polymorphisms (SNPs), that influence the expression levels of genes, which gives rise to the variability in clinical phenotypes or disease susceptibility across individuals. This type of analysis can provide a deeper insight into the functional role of the eQTLs in the disease process by linking the SNPs to genes whose functions are often known directly or indirectly through other co-expressed genes in the same pathway.
The most commonly used method for eQTL analysis has been to examine the expression level of a single gene at a time for association, treating genes as independent of each other [Cheung et al. (2005), Stranger et al. (2005), Zhu et al. (2008)]. However, it is widely believed that many of the genes in the same biological pathway are often co-expressed or co-regulated [Pujana et al. (2007), Zhang and Horvath (2005)] and may share a common genetic basis that causes the variations in their expression levels. How to incorporate such information on the relatedness of genes into a statistical analysis of associations between SNPs and gene expressions remains an under-addressed problem. One popular existing approach is to consider the relatedness of genes after, rather than during, the statistical analysis of eQTL data, which obviously fails to fully exploit the statistical power from this additional source of information. Specifically, in order to find the genetic variations with pleiotropic effects that perturb the expressions of multiple related genes jointly, in recent eQTL studies the expression traits for individual genes were analyzed separately, and then the results were examined for all genes in light of gene modules to see if any gene sets are enriched for association with a common SNP [Zhu et al. (2008), Emilsson et al. (2008), Chen et al. (2008)]. This type of analysis uses the information on gene modules only in the post-processing step after a set of single-gene analyses, instead of directly incorporating the correlation pattern in gene expressions in the process of searching for SNPs with pleiotropic effects.
Recently, a different approach for searching for SNPs with pleiotropic effects has been proposed to leverage information on gene modules more directly [Segal et al. (2003), Lee et al. (2006)]. In this approach, the module network originally developed for discovering clusters of co-regulated genes from gene-expression data was extended to include SNPs as potential regulators that can influence the activity of gene modules. The main weakness of this method is that it computed the averages of gene-expression levels over the genes within each module and looked for SNPs that affect the average gene expressions of the module. The operation of computing averages can lead to a significant loss of information on the detailed activity of individual genes and on negative correlations within a module.
In this article we propose a tree-guided group lasso, or tree lasso, that directly combines statistical strength across multiple related genes in gene-expression data to identify SNPs with pleiotropic effects by leveraging any given knowledge of a hierarchical clustering tree over genes. (Here we focus on making use of the given knowledge of related genes to enhance the power of eQTL analysis, rather than discovering or evaluating how genes are related, which are interesting problems in their own right and are studied widely [Segal et al. (2003)]. If the gene co-expression pattern is not available, one can simply run any off-the-shelf hierarchical agglomerative clustering algorithm on the gene-expression data to obtain one before applying our method. It is beyond the scope of this paper to discuss, compare, and further develop such algorithms for clustering genes or learning trees.) The hierarchical clustering tree contains clusters of genes at multiple granularity, and genes within a cluster have correlated expression levels. The leaf nodes of the tree correspond to individual genes, and each internal node represents a cluster of the genes at the leaf nodes of the subtree rooted at that internal node. Furthermore, each internal node in the tree is associated with a weight that represents the height of the subtree, or how tightly the genes in the cluster for that internal node are correlated. As illustrated in Figure 1(a), the expression levels of genes in each cluster are likely to be influenced by a common set of SNPs, and this type of sharing of genetic effects is stronger among tightly correlated genes in a cluster at the lower levels of the tree, with a smaller height, than among loosely correlated genes in a cluster near the root, with a greater height.
This multilevel grouping structure of genes can be available either as prior knowledge from domain experts, or can be learned from the gene-expression data using various clustering algorithms such as the hierarchical agglomerative clustering algorithm [Golub et al. (1999)].
Our method is based on a multivariate regression method with a regularization function that is constructed from the hierarchical clustering tree. This regularizer induces a structured shrinkage effect that encourages multiple correlated responses to share a similar set of relevant covariates, rather than having independent sets of relevant covariates. This is a biologically and statistically desirable bias not present in existing methods for identifying eQTLs. For example, assuming that the SNPs are represented as covariates, gene expressions as responses, and the association strengths as regression coefficients in a regression model, a multivariate regression with an ℓ1 regularization, called the lasso, has been applied to identify a small number of SNPs with nonzero association strengths [Wu et al. (2009)]. Here, the lasso treats multiple responses as independent of each other and selects relevant covariates for each response variable separately. Although the ℓ1 penalty in the lasso can be extended to the ℓ1/ℓ2 penalty, also known as the group-lasso penalty, for union support recovery, where all of the responses are constrained to have the same relevant covariates [Obozinski, Wainwright and Jordan (2008), Obozinski, Taskar and Jordan (2010)], in this case the rich and heterogeneous relatedness among the responses, as captured by a weighted tree, cannot be taken into account.
Our method extends the ℓ1/ℓ2 penalty to the tree-lasso penalty by letting the hierarchically defined groups overlap. The tree-lasso penalty achieves structured sparsity, where the related responses (i.e., gene expressions) in the same group share a common set of relevant covariates (i.e., SNPs), in a way that is properly calibrated to the strength of their relatedness and consistent with their overlapping group organization. Although several schemes have been previously proposed to use the group-lasso penalty with overlapping groups to take advantage of more complex structural information on response variables, due to their ad hoc weighting scheme for different overlapping groups in the regularization function, some regression coefficients were penalized arbitrarily more heavily than others, leading to an inconsistent estimate [Zhao, Rocha and Yu (2009), Jacob, Obozinski and Vert (2009), Jenatton, Audibert and Bach (2009)]. In contrast, we propose a systematic weighting scheme for overlapping groups that applies a balanced penalization to all of the regression coefficients. Since the tree lasso is a special case of the overlapping group lasso, where the weights and overlaps of groups are determined according to the hierarchical clustering tree, we adopt for efficient optimization the smoothing proximal gradient (SPG) method [Chen et al. (2011)] that was developed for optimizing a convex loss function with a general class of structured-sparsity-inducing penalty functions, including the overlapping group lasso.
Compared to our previous work on the graph-guided fused lasso, which leverages a network structure over responses to achieve structured sparsity [Kim and Xing (2009)], the tree lasso has a considerably lower computational time and allows thousands of response variables to be analyzed simultaneously, as is necessary in a typical eQTL mapping. This is in part because the computation time of the graph-guided fused lasso depends on the number of edges in the graph, which can be as large as O(K^2) for K response variables, whereas in the tree lasso it is determined by the number of nodes in the tree, which is bounded by twice the number of response variables. Another potential advantage of the tree lasso is that it relaxes the constraint in the graph-guided fusion penalty that the regression coefficients should take similar values for a covariate relevant to multiple correlated responses. Although introducing this bias through the fusion penalty in the graph-guided fused lasso offered the benefit of combining weak association signals and reducing false positives, it is expected that relaxing this constraint could further increase the power. The ℓ2 penalty in our tree regularization achieves a joint selection of covariates for multiple related responses, while allowing different values for the regression coefficients corresponding to the selected covariate and the correlated response variables.
Although the hierarchical agglomerative clustering algorithm has been widely popular as a preprocessing step for regression or classification tasks [Golub et al. (1999), Sørlie et al. (2001), Hastie et al. (2001)], our proposed method is the first to make use of the full results from the clustering algorithm, given as the tree structure and subtree-height information. Most of the previous classification or regression methods that build on the hierarchical clustering algorithm used summary statistics extracted from the hierarchical clustering tree, such as subsets of genes forming clusters or averages of gene expressions within each cluster, rather than using the tree as it is [Golub et al. (1999), Hastie et al. (2001)]. In the tree lasso, we use the full hierarchical clustering tree as prior knowledge to construct a regularization function. Thus, the tree lasso incorporates the full information present in both the raw data and the hierarchical clustering tree to maximize the power for detecting weak association signals and to reduce false positives. In our experiments, we demonstrate that our proposed method can be successfully applied to select SNPs affecting the expression levels of multiple genes, using both simulated and yeast data sets.
The remainder of the paper is organized as follows. In Section 2 we provide a brief discussion of previous work on sparse regression estimation. In Section 3 we introduce the tree lasso and describe an efficient optimization method based on SPG. We present experimental results on simulated and yeast eQTL data sets in Section 4, and conclude in Section 5.
2 Background on multivariate regression approach for eQTL mapping
Let us assume that data are collected for J SNPs and K gene-expression traits over N individuals. Let X denote the N × J matrix of SNP genotypes for the covariates, and Y the N × K matrix of gene-expression measurements for the responses. In eQTL mapping, each element of X takes values from {0, 1, 2} according to the number of minor alleles at the given locus in each individual. Then, we assume a linear model for the functional mapping from covariates to response variables:

Y = XB + E,   (1)

where B is the J × K matrix of regression coefficients and E is the N × K matrix of noise terms distributed with mean 0 and a constant variance. We center each column of X and Y such that the mean is zero, and consider the model without an intercept. Throughout this paper, we use subscripts and superscripts to denote rows and columns of a matrix, respectively (e.g., β_j and β^k for the jth row and kth column of B).
When J is large and the number of relevant covariates is small, the lasso offers an effective method for identifying the small number of nonzero elements in B [Tibshirani (1996)]. The lasso obtains the estimate B̂^{lasso} by solving the following optimization problem:

B̂^{lasso} = argmin_B (1/2) ||Y − XB||_F^2 + λ ||B||_1,   (2)
where ||·||_F is the Frobenius norm, ||B||_1 = Σ_j Σ_k |β_j^k| is the entry-wise ℓ1 matrix norm, and λ is a tuning parameter that controls the amount of sparsity in the solution. Setting λ to a larger value leads to a smaller number of nonzero regression coefficients.
The lasso estimation in (2) is equivalent to selecting relevant covariates for each of the K responses separately, and does not provide any mechanism to enforce a joint selection of common relevant covariates for multiple related responses. In the literature of multitask learning, an ℓ1/ℓ2 penalty, also known as a group-lasso penalty [Yuan and Lin (2006)], has been adopted in multivariate-response regression to take advantage of the relatedness of the response variables and recover the union support, that is, the pattern of nonzero regression coefficients shared across all of the responses [Obozinski, Wainwright and Jordan (2008)]. This method is widely known as the ℓ1/ℓ2-regularized multitask regression in the machine learning community, and its estimate for the regression coefficients is given as

B̂^{ℓ1/ℓ2} = argmin_B (1/2) ||Y − XB||_F^2 + λ Σ_j ||β_j||_2,   (3)
where ||β_j||_2 denotes an ℓ2 norm. In the ℓ1/ℓ2-regularized multitask regression, an ℓ2 norm is applied to β_j, the regression coefficients for all K responses for the jth covariate, and these ℓ2 norms for the J covariates are combined through an ℓ1 norm to encourage only a small number of covariates to take nonzero regression coefficients. Since the ℓ2 part of the penalty does not have the property of encouraging sparsity, if the jth covariate is selected as relevant, then all of the elements of β_j would take nonzero values, although the regression-coefficient values for that covariate are still allowed to vary across different responses. When applied to eQTL mapping, this method is significantly limited, since it is not realistic to assume that the expression levels of all of the genes are influenced by the same set of relevant SNPs. A subset of co-expressed genes may be perturbed by a common set of SNPs, and genes in a different pathway are less likely to be affected by the same SNPs. The sparse group lasso [Friedman, Hastie and Tibshirani (2010)] can be adopted to relax this constraint by adding a lasso penalty to (3) so that individual regression coefficients within each ℓ2 norm can be set to zero. However, this method shares the same limitation as the ℓ1/ℓ2-regularized multitask regression in that it cannot incorporate complex grouping structures in the responses, such as groups at multiple granularity as in the hierarchical clustering tree.
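The contrast between the entry-wise ℓ1 penalty in (2) and the ℓ1/ℓ2 penalty in (3) can be made concrete with a small numerical sketch (illustrative code only; the function names are ours, not part of the original method):

```python
import numpy as np

def lasso_penalty(B, lam):
    # Entry-wise L1 norm, as in (2): covariates are selected
    # independently for each response.
    return lam * np.abs(B).sum()

def l1l2_penalty(B, lam):
    # L1/L2 norm, as in (3): one L2 norm per covariate (row of B),
    # summed over covariates, so a covariate is selected jointly
    # for all responses.
    return lam * np.linalg.norm(B, axis=1).sum()

B = np.array([[3.0, 4.0],   # covariate relevant to both responses
              [0.0, 0.0]])  # irrelevant covariate
print(lasso_penalty(B, 1.0))  # 7.0
print(l1l2_penalty(B, 1.0))   # 5.0
```

The ℓ1/ℓ2 penalty charges less for coefficients concentrated on a single covariate, which is what favors joint selection.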
3 Tree lasso for exploiting hierarchical clustering tree in eQTL mapping
We introduce the tree lasso that considerably adds flexibility and power to these existing methods by taking advantage of the complex correlation structure given as a hierarchical clustering tree over the responses. We present a highly efficient algorithm for estimating the parameters in a tree lasso that is based on the smoothing proximal gradient descent developed for a general class of structuredsparsityinducing norms.
3.1 Tree lasso
In a microarray experiment, gene-expression levels are measured for thousands of genes at a time, and many of the genes show highly correlated expression levels across samples, implying that they may share a common regulator or participate in the same pathway. In addition, in eQTL analysis, it is widely believed that genetic variations such as SNPs perturb modules of related genes rather than acting on individual genes. As these gene modules are often derived and visualized by running the hierarchical agglomerative clustering algorithm on gene-expression data, a natural extension of sparse regression methods for eQTL mapping is to incorporate the output of the hierarchical clustering algorithm to identify genetic variations that influence the gene modules in the clustering tree. In this section, we build on the ℓ1/ℓ2-regularized regression and introduce a tree lasso that can directly leverage hierarchically organized groups of genes to combine statistical strength across the expression levels of genes within each group. Although our work is primarily motivated by eQTL mapping in genetics, the tree lasso is generally applicable to any multivariate-response regression problem where a hierarchical group structure over the responses is given as a desirable source of structural bias, such as in many computer vision [Yuan and Yan (2010)] and natural language processing applications [Zhang (2010), Zhou, Jin and Hoi (2010)], where dependencies among visual objects and among parts of speech are well known to be valuable for enhancing prediction performance.
Assume that the relationship among the K responses is represented as a tree T with a set of vertices V. As illustrated in Figure 1(a), each of the leaf nodes is associated with a response variable, and each of the internal nodes represents a group of the responses located at the leaves of the subtree rooted at the given internal node. Internal nodes near the bottom of the tree correspond to tight clusters of highly related responses, whereas the internal nodes near the root represent groups with weak correlations among the responses in their subtrees. This tree structure may be provided as prior knowledge by domain experts or external resources (e.g., gene ontology databases in our eQTL mapping problem), or can be learned from the data for the response variables using methods such as the hierarchical agglomerative clustering algorithm. We assume that each node v of the tree is associated with the height h_v of the subtree rooted at v, representing how tightly its members are correlated. In addition, we assume that the heights h_v of the internal nodes are normalized so that the height of the root node is 1.
Given this tree T over the responses, we generalize the ℓ1/ℓ2 regularization in (3) to a tree regularization by expanding the ℓ2 part of the penalty into an overlapping group-lasso penalty. The overlapping groups in the tree regularization are defined based on tree T as follows. Each node v of tree T is associated with a group G_v whose members are the response variables at the leaf nodes of the subtree rooted at node v. For example, Figure 1(b) shows the groups of responses and the corresponding regression coefficients that are associated with each of the nodes of the tree in Figure 1(a). Given these overlapping groups, we define the tree lasso as
B̂^{tree} = argmin_B (1/2) ||Y − XB||_F^2 + λ Σ_j Σ_{v ∈ V} w_v ||β_j^{G_v}||_2,   (4)

where β_j^{G_v} is the vector of regression coefficients {β_j^k : k ∈ G_v}. Since a tree associated with K responses can have at most 2K − 1 nodes, the number of terms that appear in the tree-lasso penalty is upper-bounded by 2K − 1 for each covariate.
Each group of regression coefficients β_j^{G_v} in (4) is weighted with w_v such that a group of responses near the leaves of the tree is more likely to have common relevant covariates, while ensuring that the amount of penalization aggregated over all of the overlapping groups is the same for all regression coefficients. We define the w_v's in (4) in terms of two quantities s_v and g_v, given as s_v = h_v and g_v = 1 − h_v, that are associated with each internal node v of height h_v in tree T. The s_v represents the weight for selecting relevant covariates separately for the responses associated with each child of node v, whereas the g_v represents the weight for selecting relevant covariates jointly for the responses for all of the children of node v. We first consider a simple case with two responses (K = 2) and a tree of three nodes that consists of two leaf nodes (v_1 and v_2) and one root node (v_3), and then generalize this to an arbitrary tree. When K = 2, the penalty term in (4) can be written as

λ Σ_j [ s_{v_3} ( |β_j^1| + |β_j^2| ) + g_{v_3} ||β_j||_2 ],   (5)

where the group weights are set to w_{v_1} = s_{v_3}, w_{v_2} = s_{v_3}, and w_{v_3} = g_{v_3}. Equation (5) has a similar form to the elastic-net penalty [Zou and Hastie (2005)], with the slight difference that the elastic net uses the square of the ℓ2 norm. The ℓ1 norm and ℓ2 norm in (5) are weighted by s_{v_3} and g_{v_3}, respectively, and play the role of setting β_j^1 and β_j^2 to nonzero values separately or jointly. A large value of g_{v_3} indicates that the two responses are highly related, and a joint covariate selection is encouraged by heavily weighting the ℓ2 part of the penalty. When g_{v_3} = 1, the penalty in (5) is equivalent to the ℓ1/ℓ2-regularized multitask regression in (3), where the responses share the same set of relevant covariates, whereas setting s_{v_3} = 1 in (5) leads to a lasso penalty. In general, given a single-level tree with all of the K responses under a single parent node, the tree-lasso penalty corresponds to a linear combination of ℓ1 and ℓ2 penalties as in (5).
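The two-response penalty in (5) can be sketched as an interpolation between the lasso and the ℓ1/ℓ2 penalties (an illustrative sketch; the variables s and g stand for s_{v_3} and g_{v_3}):

```python
import numpy as np

def two_response_penalty(b, s, g):
    # s weights separate (L1) selection, g weights joint (L2)
    # selection of the two coefficients; s + g = 1 in the tree lasso.
    return s * np.abs(b).sum() + g * np.linalg.norm(b)

b = np.array([3.0, 4.0])
print(two_response_penalty(b, 1.0, 0.0))  # 7.0 -> pure lasso penalty
print(two_response_penalty(b, 0.0, 1.0))  # 5.0 -> pure L1/L2 penalty
print(two_response_penalty(b, 0.5, 0.5))  # 6.0 -> in between
```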
Now, we generalize this process of obtaining the w_v's in the tree-lasso penalty from the special case of a single-level tree to an arbitrary tree. Starting from the root node and traversing down the tree recursively to the leaf nodes, at each of the root and internal nodes we apply a similar linear combination of the ℓ1 norm and ℓ2 norm as in (5), as follows:

Σ_{v ∈ V} w_v ||β_j^{G_v}||_2 = W_j(v_root),   (6)

where

W_j(v) = s_v Σ_{c ∈ Children(v)} W_j(c) + g_v ||β_j^{G_v}||_2   if v is an internal node,
W_j(v) = |β_j^v|   if v is a leaf node.

Then, it can be shown that the following relationship holds between the w_v's and the (s_v, g_v)'s:

w_v = g_v Π_{m ∈ Ancestors(v)} s_m   if v is an internal node,
w_v = Π_{m ∈ Ancestors(v)} s_m   if v is a leaf node.
The above weighting scheme extends the linear combination of the ℓ1 and ℓ2 penalties in (5) hierarchically, so that the ℓ1 and ℓ2 norms encourage separate and joint selections of covariates for the given groups of responses. The s_v's and g_v's determine the balance between these ℓ1 and ℓ2 norms. If s_v = 1 and g_v = 0 for all v, then only separate selections are performed, and the tree-lasso penalty reduces to the lasso penalty. On the other hand, if s_v = 0 and g_v = 1 for all v, the penalty reduces to the ℓ1/ℓ2 penalty in (3) that constrains all of the responses to have the same set of relevant covariates. The unit contour surfaces of various penalties for β_j^1, β_j^2, and β_j^3, with groups as defined in Figure 1, are shown in Figure 2.
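As a minimal sketch of this weighting scheme, assuming s_v = h_v and g_v = 1 − h_v as above, the weights w_v can be computed in a single pass down the tree, and the per-response weight sums can be checked to equal one, as guaranteed by Proposition 1 (the tree encoding and node names below are hypothetical):

```python
# Hypothetical tree: leaves 'k1','k2','k3'; 'v4' groups {k1,k2}; 'v5' is the root.
children = {'v5': ['v4', 'k3'], 'v4': ['k1', 'k2']}
height   = {'v5': 1.0, 'v4': 0.3}   # normalized so the root has height 1

def group_weights(node, prefix=1.0, w=None):
    """w_v = g_v * prod(ancestor s_m) for internal nodes,
       w_v = prod(ancestor s_m) for leaf nodes."""
    if w is None:
        w = {}
    if node in children:                     # internal node
        s, g = height[node], 1.0 - height[node]
        w[node] = g * prefix
        for c in children[node]:
            group_weights(c, s * prefix, w)
    else:                                    # leaf node
        w[node] = prefix
    return w

w = group_weights('v5')
# Groups containing leaf k1 are {k1}, {k1,k2}, {k1,k2,k3};
# their weights sum to one, as stated in Proposition 1.
print(round(w['k1'] + w['v4'] + w['v5'], 10))  # 1.0
```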
The seemingly complex method for determining the weights w_v for the groups in the tree-lasso penalty has the property of ensuring that all of the regression coefficients are penalized by an equal overall amount, even though they appear in nested overlapping groups. Proposition 1 (as stated and proved in the supplemental article [Kim and Xing (2012)]) shows that even if each response belongs to multiple groups associated with different internal nodes and appears multiple times in the overall penalty in (6), the sum of the weights over all of the groups that contain a given response is always one. Thus, the weighting scheme in (6) guarantees that all of the individual regression coefficients are penalized equally overall. Although several variations of the group lasso with overlapping groups have been proposed previously, all of those methods weighted the ℓ2 norms for the overlapping groups with arbitrarily defined weights, resulting in unbalanced weights for different regression coefficients [Zhao, Rocha and Yu (2009), Jenatton, Audibert and Bach (2009)]. It was empirically shown that these arbitrary weighting schemes give an inconsistent estimate [Jenatton, Audibert and Bach (2009)].
Below, we provide an example of the process of constructing a tree-lasso penalty based on the simple tree over three responses in Figure 1(a). For more complex trees over a large number of responses, the same procedure can be applied, traversing the tree recursively from the root to the leaf nodes.
Given the tree in Figure 1, with leaf nodes v_1, v_2, v_3, an internal node v_4 with G_{v_4} = {1, 2}, and a root node v_5 with G_{v_5} = {1, 2, 3}, for the jth covariate the penalty of the tree lasso in (6) can be written as follows:

W_j(v_5) = s_{v_5} [ s_{v_4} ( |β_j^1| + |β_j^2| ) + g_{v_4} ||β_j^{G_{v_4}}||_2 + |β_j^3| ] + g_{v_5} ||β_j^{G_{v_5}}||_2.
The treelasso penalty that we introduced above can be easily extended to other related types of structures such as trees with different branching factors and a forest that consists of multiple trees. In addition, our proposed regularization can be applied to a pruned tree whose leaf nodes contain groups of variables instead of individual variables.
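For readers who obtain the tree by hierarchical agglomerative clustering, the overlapping groups and normalized heights can be read off a merge table; the sketch below assumes a merge list in the convention used by scipy.cluster.hierarchy.linkage (the toy values are ours):

```python
# Linkage-style merge list (assumption: row i merges clusters a and b at
# height h into a new cluster with index K + i). Here K = 3 leaves:
# leaves 0 and 1 merge at height 0.3, then join leaf 2 at height 1.0.
Z = [(0, 1, 0.3), (3, 2, 1.0)]
K = 3

members = {k: [k] for k in range(K)}   # each leaf is its own singleton group
groups = []
root_height = Z[-1][2]
for i, (a, b, h) in enumerate(Z):
    node = K + i
    members[node] = members[int(a)] + members[int(b)]
    # Normalize heights so that the root has height 1, as the tree lasso assumes.
    groups.append((sorted(members[node]), h / root_height))

print(groups)  # [([0, 1], 0.3), ([0, 1, 2], 1.0)]
```

Each entry gives one overlapping group G_v together with its normalized height h_v, from which s_v and g_v follow.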
3.2 Parameter estimation
Although the tree-lasso optimization problem in (4) is convex, the main challenges in solving (4) arise from the nonseparable terms over the β_j^{G_v}'s in the nonsmooth penalty. While the coordinate descent algorithm has been successfully applied to nonsmooth penalties such as the lasso and the group lasso with nonoverlapping groups [Friedman et al. (2007)], it cannot be applied to the tree lasso, because the overlapping groups with nonseparable terms in the penalty prevent us from obtaining a closed-form update equation for iterative optimization. While the optimization problem for the tree lasso can be formulated as a second-order cone program and solved with the interior point method [Boyd and Vandenberghe (2004)], this approach does not scale to high-dimensional problems such as eQTL mapping that involve a large number of SNPs and gene-expression measurements. Recently, a smoothing proximal gradient (SPG) method was developed for the efficient optimization of a convex loss function with a general class of structured-sparsity-inducing penalty functions that share the same challenges of nonsmoothness and nonseparability [Chen et al. (2011)]. The SPG can handle a wide variety of penalties, such as the overlapping group lasso and the fused lasso, and as the tree lasso is a special case of the overlapping group lasso, we adopt this method in our paper. As we detail below in this section, SPG first decouples the nonseparable terms in the penalty by reformulating it with a dual norm, and introduces a smooth approximation of the nonsmooth penalty. Then, in order to optimize the objective function with this smooth approximation of the penalty, SPG adopts the fast iterative shrinkage-thresholding algorithm (FISTA) [Beck and Teboulle (2009)], an accelerated gradient descent method.
3.2.1 Reformulation of the penalty function
We rewrite (4) by splitting the tree-lasso penalty into two parts corresponding to two sets of nodes in tree T, V_int for all of the internal nodes and V_leaf for all of the leaf nodes, as follows:

B̂^{tree} = argmin_B (1/2) ||Y − XB||_F^2 + λ Σ_j Σ_{v ∈ V_int} w_v ||β_j^{G_v}||_2 + λ Σ_j Σ_{v ∈ V_leaf} w_v ||β_j^{G_v}||_2.

We notice that in the above equation, the first penalty term, for V_int, contains overlapping groups, whereas the second penalty term, for V_leaf, is equivalent to the weighted lasso penalty λ Σ_j Σ_k w_k |β_j^k|, where w_k represents the weight for the leaf node associated with the kth response.
Since the penalty term associated with V_int contains overlapping groups and therefore is nonseparable, we rewrite this term by introducing a vector of auxiliary variables α_j^v for each covariate j and group G_v, and by reformulating it with a dual-norm representation, to obtain

λ Σ_j Σ_{v ∈ V_int} w_v ||β_j^{G_v}||_2 = λ Σ_j Σ_{v ∈ V_int} max_{||α_j^v||_2 ≤ 1} w_v (α_j^v)^T β_j^{G_v} = max_{A ∈ Q} ⟨C B^T, A⟩,

where ⟨U, A⟩ ≡ trace(U^T A) denotes a matrix inner product, and A is the matrix obtained by stacking the auxiliary vectors,

A = [α_j^v]_{v ∈ V_int, j = 1, ..., J},

with domain Q ≡ {A : ||α_j^v||_2 ≤ 1 for all v ∈ V_int and j = 1, ..., J}. In addition, C in (3.2.1) is a (Σ_{v ∈ V_int} |G_v|) × K matrix whose elements are defined as

C_{(v,k),k'} = λ w_v if k = k', and 0 otherwise,

with rows indexed by (v, k) such that v ∈ V_int and k ∈ G_v, and columns indexed by k' = 1, ..., K. We note that the nonseparable terms over the β_j^{G_v}'s in the tree-lasso penalty are decoupled in the dual-norm representation in (3.2.1).
3.2.2 Smooth approximation to the nonsmooth penalty
The reformulation in (3.2.1) is still nonsmooth in B, which makes it nontrivial to optimize. To overcome this challenge, SPG introduces a smooth approximation of (3.2.1) as follows:

f_μ(B) = max_{A ∈ Q} ( ⟨C B^T, A⟩ − μ d(A) ),   (9)

where d(A) ≡ (1/2) ||A||_F^2 is a smoothing function with the maximum value D ≡ max_{A ∈ Q} d(A), and μ is the parameter that determines the amount of smoothness. We notice that when μ = 0, we recover the original nonsmooth penalty in (3.2.1). It has been shown [Chen et al. (2011)] that f_μ(B) is convex and smooth with gradient

∇f_μ(B) = (A*)^T C,

where A* is the optimal solution to (9), composed of the blocks α_j^{v*} = S( λ w_v β_j^{G_v} / μ ), given the shrinkage operator S defined for a vector u as

S(u) = u / ||u||_2 if ||u||_2 > 1, and S(u) = u otherwise.   (10)

In addition, ∇f_μ(B) is Lipschitz continuous with the Lipschitz constant L_μ = ||C||^2 / μ, where ||C|| is the matrix spectral norm of C. We can show that ||C|| = λ max_k ( Σ_{v ∈ V_int : k ∈ G_v} w_v^2 )^{1/2}.
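The shrinkage operator S in (10) is simply a projection onto the ℓ2 unit ball; a minimal sketch:

```python
import numpy as np

def shrink(u):
    # Projection of u onto the L2 unit ball, as in (10):
    # rescale u to unit length only if it lies outside the ball.
    n = np.linalg.norm(u)
    return u / n if n > 1.0 else u

print(shrink(np.array([3.0, 4.0])))  # scaled to [0.6, 0.8]
print(shrink(np.array([0.3, 0.4])))  # already inside the ball: unchanged
```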
3.2.3 Smoothing proximal gradient (SPG) method
By substituting the penalty term for V_int in (3.2.1) with f_μ(B) in (9), we obtain an objective function whose nonsmooth component contains only the weighted lasso penalty, as follows:

min_B f(B) ≡ (1/2) ||Y − XB||_F^2 + f_μ(B) + λ Σ_j Σ_k w_k |β_j^k|.   (11)

The smooth part of the above objective function is

h(B) = (1/2) ||Y − XB||_F^2 + f_μ(B),   (12)

and its gradient is given as

∇h(B) = X^T (XB − Y) + (A*)^T C,   (13)

which is Lipschitz-continuous with the Lipschitz constant

L = λ_max(X^T X) + L_μ = λ_max(X^T X) + ||C||^2 / μ,   (14)

where λ_max(X^T X) is the largest eigenvalue of X^T X.
Algorithm 1: Smoothing proximal gradient (SPG) method for the tree lasso.

Input: X, Y, λ, {w_v : v ∈ V}, Lipschitz constant L, desired accuracy ε.
Initialization: set μ = ε / (2D), where D = max_{A ∈ Q} d(A); B^0 = W^0 = 0; θ_0 = 1.
Iterate for t = 0, 1, 2, ... until convergence of B^t:
  1. Compute ∇h(W^t) according to (13).
  2. Solve the proximal operator associated with the weighted ℓ1 norm:
     B^{t+1} = argmin_B (1/2) ||B − (W^t − (1/L) ∇h(W^t))||_F^2 + (λ/L) Σ_j Σ_k w_k |β_j^k|.
  3. Set θ_{t+1} = 2 / (t + 3).
  4. Set W^{t+1} = B^{t+1} + ((1 − θ_t)/θ_t) θ_{t+1} (B^{t+1} − B^t).
Output: B̂ = B^{t+1}.
The key idea behind SPG is that once we introduce the smooth approximation of (3.2.1), the only nonsmooth component in (11) is the weighted lasso penalty, and FISTA can be adopted to optimize (11). The SPG algorithm for the tree lasso is given in Algorithm 1. In order to obtain the proximal operator associated with the weighted lasso penalty in Step 2 of Algorithm 1, we rewrite it as follows:

B^{t+1} = argmin_B (1/2) ||B − V||_F^2 + (λ/L) Σ_j Σ_k w_k |β_j^k|, with V = W^t − (1/L) ∇h(W^t),

and obtain the closed-form solution for B^{t+1} by soft-thresholding:

β_j^{k, t+1} = sign(v_j^k) max( 0, |v_j^k| − λ w_k / L ),

where the v_j^k's are elements of V. The Lipschitz constant L given in (14) plays the role of determining the step size in each gradient-descent iteration, although this value can be expensive to compute for large J. As suggested in Chen et al. (2011), a backtracking line search can be used instead to determine the step size [Boyd and Vandenberghe (2004)].
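The soft-thresholding update above is elementwise, with one threshold per response; a minimal numpy sketch (the function and argument names are ours):

```python
import numpy as np

def weighted_lasso_prox(V, leaf_w, lam, L):
    # Soft-thresholding: shrink each entry of V toward zero, with
    # column k (response k) thresholded at lam * w_k / L.
    thresh = lam * np.asarray(leaf_w) / L   # one threshold per response
    return np.sign(V) * np.maximum(np.abs(V) - thresh, 0.0)

V = np.array([[ 0.90, -0.20],
              [-0.05,  0.60]])
# With leaf weights (1, 1), lam = 0.1 and L = 1, every entry is
# shrunk by 0.1; entries smaller than the threshold are set to zero.
print(weighted_lasso_prox(V, leaf_w=[1.0, 1.0], lam=0.1, L=1.0))
```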
It can be shown that the convergence rate of Algorithm 1 is O(1/ε) iterations for a desired accuracy ε [Chen et al. (2011)]. If we precompute and store X^T X and X^T Y, the time complexity per iteration of SPG for the tree lasso is O(J^2 K), compared to O(J^3 K^3) for the interior point method for the second-order cone program. Thus, the time complexity of SPG is quadratic in J and linear in K, which is significantly more efficient than cubic in both J and K for the interior point method.
4 Experiments
We demonstrate the performance of our method on simulated data sets and the yeast data set of genotypes and gene expressions, and compare the results with those from the lasso and the ℓ1/ℓ2-regularized multitask regression, which do not assume any structure over the responses. In all of our experiments, we determine the regularization parameter λ by fitting models on a training set for a range of values of λ, computing the prediction error of each model on a validation set, and then selecting the value of λ that gives the lowest prediction error. We evaluate these methods based on two criteria: sensitivity/specificity in detecting the true relevant covariates, and prediction errors on test data sets. We note that (1 − specificity) and sensitivity are equivalent to the type I error rate and (1 − type II error rate), respectively. Test errors are obtained as mean squared differences between the predicted and observed response measurements, based on test data sets that are independent of the training and validation data sets.
4.1 Simulation study
We simulate data using the following scenario analogous to eQTL mapping. We simulate with , , and as follows. We first generate the genotypes by sampling each element of from a uniform distribution over , which corresponds to the number of mutated alleles at each SNP locus. Then, we set the values of by first selecting nonzero entries and filling these entries with predefined values. We assume a hierarchical structure with four levels over the responses, and select the nonzero elements of so that the groups of responses described by the tree share common relevant covariates. The hierarchical clustering tree used in our simulation is shown in Figure 3(a), with only the top three levels displayed to avoid clutter, and the true nonzero elements of the regression coefficient matrix are shown as white pixels in Figure 3(b), with responses (gene expressions) as rows and covariates (SNPs) as columns. In all of our simulations, we divide the full data set of into training and validation sets of sizes 100 and 50, respectively.
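A minimal sketch of this simulation scheme, with illustrative dimensions and genotypes coded as 0/1/2 allele counts (both assumptions, since the exact settings are elided here):

```python
import numpy as np

rng = np.random.default_rng(0)
n, J, K = 150, 60, 40                   # samples, SNPs, genes (illustrative)
X = rng.integers(0, 3, size=(n, J)).astype(float)  # allele counts in {0, 1, 2}
B = np.zeros((J, K))
B[:5, :10] = 0.4                        # one gene cluster shares 5 relevant SNPs
Y = X @ B + rng.normal(size=(n, K))     # responses with unit-variance noise
```

In the full simulation, the nonzero blocks of `B` are laid out to match the groups of responses defined by the four-level tree, so that clusters at each granularity share relevant covariates.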
To illustrate the behavior of the different methods, we fit the lasso, the regularized multitask regression, and our method to a single data set simulated with the nonzero elements of set to 0.4, and show the results in Figure 3(c)–(e), respectively. Since the lasso has no mechanism to borrow statistical strength across different responses, false positives for nonzero regression coefficients are distributed randomly across the matrix in Figure 3(c). On the other hand, the regularization method blindly combines information across all responses regardless of the correlation structure. As a result, once a covariate is selected as relevant for one response, it is selected for all of the other responses, and we observe vertical stripes of nonzero values in Figure 3(d). When the hierarchical clustering structure in Figure 3(a) is available as prior knowledge, it is visually clear from Figure 3(e) that our method suppresses false positives and recovers the true relevant covariates for correlated responses significantly better than the other methods.
In order to systematically evaluate the performance of the different methods, we generate 50 simulated data sets, and show in Figure 4(a) receiver operating characteristic (ROC) curves for the recovery of the true nonzero elements in the regression coefficient matrix, averaged over these 50 data sets. Figure 4(a) represents results from data sets with the true nonzero elements in set to 0.2. Additional results for true nonzero elements in set to 0.4 and 0.6 are available in Online Appendix Figures 1A and 1B [Kim and Xing (2012)]. Our method clearly outperforms the lasso and the regularized multitask regression. Especially when the signal-to-noise ratio is low, as in Figure 4(a), the advantage of incorporating prior knowledge of the tree as a correlation structure over responses is significant.
We compare the performance of the different methods in terms of prediction errors, using an additional 50 samples as test data. The prediction errors averaged over the 50 simulated data sets are shown in Figure 4(b) for data sets generated with the true nonzero regression coefficients set to 0.2. Additional results for data sets generated with true nonzero regression coefficients of 0.4 and 0.6 are shown in Online Appendix Figures 2A and 2B, respectively. In addition to the results from the sparse regression methods, we include the prediction errors from the null model that has only an intercept term. We find that our method, shown as “T” in Figure 4(b), has lower prediction errors than all of the other methods. For the tree lasso, in addition to directly using the true tree structure in Figure 3(a), we also consider the scenario in which the true tree structure is not known a priori. In this case, we learn a tree by running a hierarchical agglomerative clustering algorithm on the correlation matrix of the response measurements, and use this tree along with the weights ’s associated with each internal node in our method. Since the tree obtained in this manner represents a noisy realization of the true underlying tree structure, we discard the nodes representing weak correlations near the root of the tree by thresholding the normalized ’s at 0.9 and 0.7, and show the prediction errors obtained from these thresholded trees as “T0.9” and “T0.7” in Figure 4(b). Even when the true tree structure is not available, our method benefits from taking into account the correlation structure among responses, and gives lower prediction errors. We performed the same experiment while varying the threshold in the range of [0.6, 1.0], and obtained similar prediction errors across different values of (results not shown).
This shows that the meaningful clustering information that the tree lasso takes advantage of lies mostly in the tight clusters at the lower levels of a tree rather than the clusters of loosely related variables near the root of the tree.
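Learning the tree from data as described above can be sketched with SciPy's agglomerative clustering; using 1 − correlation as the distance and thresholding merge heights to discard loose clusters near the root are our assumptions about the construction:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

def learn_tree(Y, threshold=0.9):
    """Hierarchical agglomerative clustering of responses (columns of Y).
    Internal nodes merged at distance above `threshold` are flagged for
    removal, keeping only the tighter clusters lower in the tree (sketch)."""
    C = np.corrcoef(Y.T)                   # K x K correlation of responses
    D = squareform(1.0 - C, checks=False)  # condensed distance: 1 - correlation
    Z = linkage(D, method='average')       # (K-1) x 4 merge table
    keep = Z[:, 2] <= threshold            # drop loose clusters near the root
    return Z, keep
```

Each kept internal node then contributes a group, with its weight derived from the merge height, to the tree-lasso penalty.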
4.2 Analysis of yeast data
We analyze the yeast eQTL data set of genotype and gene-expression data for 114 yeast strains [Zhu et al. (2008)] using the various sparse regression methods. We focus on chromosome 3, with 21 SNPs and the expression levels of 3,684 genes, after removing those genes whose expression levels are missing in more than 5% of the samples. Although it is widely known that genes are organized into functional modules within which gene-expression levels are often correlated, the hierarchical module structure over correlated genes is not directly available as prior knowledge, so we learn the tree by running the hierarchical agglomerative clustering algorithm on the gene-expression data. We use only the internal nodes with heights or in our method. The goal of the analysis is to search for SNPs (covariates) whose variation induces a significant variation in the gene-expression levels (responses) across strains. By applying our method, which incorporates information on gene modules at multiple granularities in the hierarchical clustering tree, we expect to identify SNPs that influence the activity of a group of genes that are co-expressed or co-regulated.
In Figure 5(a), we show the correlation matrix of the gene expressions after reordering the rows and columns according to the results of the hierarchical agglomerative clustering algorithm. The estimated is shown for the lasso, the regularized multitask regression, and our method with and 0.7 in Figure 5(b)–(e), respectively, where the rows represent genes and the columns SNPs. The regularization parameter is chosen based on prediction errors on a validation set of size 10. The lasso estimates in Figure 5(b) are extremely sparse and do not reveal any interesting structure in SNP–gene relationships. We believe that the association signals are very weak, as is typically the case in eQTL studies, and that the lasso is unable to detect such weak signals without combining statistical strength across multiple genes with correlated expressions. The estimates from the regularized multitask regression are not sparse across gene expressions and tend to form vertical stripes of nonzero regression coefficients, as can be seen in Figure 5(c). On the other hand, our method in Figure 5(d)–(e) reveals clear groupings in the patterns of associations between gene expressions and SNPs. In addition, as shown in Figure 6, our method performs significantly better in terms of prediction errors on the test sets obtained from 10-fold cross-validation.
Given the estimates of in Figure 5, we look for an enrichment of gene ontology (GO) categories among the genes with nonzero estimated regression coefficients for each SNP. A group of genes that form a module often participate in the same pathway, leading to an enrichment of a GO category among the members of the module. Since we are interested in identifying SNPs influencing gene modules, and our method encourages this joint association through the hierarchical clustering tree, we hypothesize that our method would reveal more significant GO enrichments among the estimated nonzero elements in . Given the tree-lasso estimate, we search for GO enrichments in the set of genes that have nonzero regression coefficients for each SNP. On the other hand, the estimates from the regularized method are not sparse across genes. Thus, we threshold the absolute values of the estimated at 0.005, 0.01, 0.03, and 0.05, and perform the GO enrichment analysis only for those genes with above the threshold.
In Figure 7, we show the number of SNPs with significant enrichments at different p-value cutoffs for subcategories within each of the three broad GO categories: biological processes, molecular functions, and cellular components. For example, within biological processes, SNPs were found to be enriched for GO terms such as mitochondrial translation, amino acid biosynthetic process, and organic acid metabolism. Regardless of the thresholds for selecting significant associations in the estimates from the regularized multitask regression, our method generally finds more significant enrichment. Although, due to the lack of ground-truth information, the results in Figure 7 do not directly demonstrate that our method led to more significant findings than the other methods, they provide evidence that our method was successful in finding SNPs with pleiotropic effects that influence gene modules, rather than focusing on identifying SNPs that affect individual genes as in the lasso.
SNP loc. in Chr3 | Module size | GO category enrichment (overlap/#genes) | Previously reported enrichment [Zhu et al. (2008)]
64,300 | 203 | BP: Amino acid biosynthetic process |
75,000 | 167 | BP: Amino acid biosynthetic process; BP: Organic acid metabolism; MF: Transferase activity | BP: Organic acid metabolism ()
76,100 | 186 | MF: Catalytic activity |
79,000 | 167 | BP: Amino acid biosynthetic process; MF: Catalytic activity |
86,000 | 103 | BP: Amino acid biosynthetic process; MF: Oxidoreductase activity |
100,200 | 68 | BP: Amino acid biosynthetic process |
105,000 | 168 | BP: Amino acid biosynthetic process; MF: Transferase activity |
175,800 | 89 | BP: Amino acid biosynthetic process; MF: Catalytic activity |
210,700 | 23 | BP: Branched chain family amino acid biosynthetic process; BP: Response to pheromone | BP: Response to chemical stimulus ()
228,100 | 195 | BP: Mitochondrial translation; CC: Mitochondrial part; MF: Hydrogen ion transporting ATP synthase activity, rotational mechanism |
240,300 | 258 | CC: Cytosolic ribosome; MF: Structural constituent of ribosome |
240,300 | 40 | BP: Generation of precursor metabolites and energy; CC: Mitochondrial inner membrane; MF: Transmembrane transporter activity |
301,400 | 274 | MF: snoRNA binding |
Table 1 lists the enriched GO categories () for SNPs and the groups of genes whose expression levels are affected by each SNP, based on the tree-lasso estimates of association strengths. For comparison, the last column of Table 1 gives the enriched GO categories for roughly similar genomic locations that were previously reported in Zhu et al. (2008) using the conventional single-SNP/single-gene statistical test for association. While the tree-lasso results mostly recover the previously reported GO enrichments, we find many additional enrichments that are statistically significant. This observation again provides indirect evidence that the tree lasso can extract fine-grained information on the gene modules perturbed by genetic polymorphisms.
5 Discussion
In this article we proposed a novel regularized-regression approach, called the tree lasso, that identifies covariates relevant to multiple related responses jointly by leveraging the correlation structure among responses, represented as a hierarchical clustering tree. We discussed how this approach can be used in eQTL analysis to find SNPs with pleiotropic effects that influence the activities of multiple co-expressed genes. For optimization, we adopted the smoothing proximal gradient approach, originally developed for a general class of structured-sparsity-inducing penalties, of which the tree-lasso penalty can be viewed as a special case. Our results on both the simulated and yeast data sets showed a clear advantage of the tree lasso in increasing the power to detect weak signals and in reducing false positives.
The balanced weighting scheme of tree lasso and additional experimental results \slink[doi]10.1214/12AOAS549SUPP \slink[url]http://lib.stat.cmu.edu/aoas/549/supplement.pdf \sdatatype.pdf \sdescriptionWe prove that the weighting scheme of the treelasso penalty achieves a balanced penalization of all regression coefficients. We also provide additional experimental results on the comparison of the tree lasso with other sparse regression methods using simulated data sets.
References
 Beck and Teboulle (2009) {barticle}[mr] \bauthor\bsnmBeck, \bfnmAmir\binitsA. \AND\bauthor\bsnmTeboulle, \bfnmMarc\binitsM. (\byear2009). \btitleA fast iterative shrinkagethresholding algorithm for linear inverse problems. \bjournalSIAM J. Imaging Sci. \bvolume2 \bpages183–202. \biddoi=10.1137/080716542, issn=19364954, mr=2486527 \bptokimsref \endbibitem
 Boyd and Vandenberghe (2004) {bbook}[mr] \bauthor\bsnmBoyd, \bfnmStephen\binitsS. \AND\bauthor\bsnmVandenberghe, \bfnmLieven\binitsL. (\byear2004). \btitleConvex Optimization. \bpublisherCambridge Univ. Press, \baddressCambridge. \bidmr=2061575 \bptokimsref \endbibitem
 Chen et al. (2008) {barticle}[auto:STB—2012/04/05—08:30:55] \bauthor\bsnmChen, \bfnmY.\binitsY., \bauthor\bsnmZhu, \bfnmJ.\binitsJ., \bauthor\bsnmLum, \bfnmP. K.\binitsP. K., \bauthor\bsnmYang, \bfnmX.\binitsX., \bauthor\bsnmPinto, \bfnmS.\binitsS., \bauthor\bsnmMacNeil, \bfnmD. J.\binitsD. J., \bauthor\bsnmZhang, \bfnmC.\binitsC., \bauthor\bsnmLamb, \bfnmJ.\binitsJ., \bauthor\bsnmEdwards, \bfnmS.\binitsS., \bauthor\bsnmSieberts, \bfnmS. K.\binitsS. K. \betalet al. (\byear2008). \btitleVariations in DNA elucidate molecular networks that cause disease. \bjournalNature \bvolume452 \bpages429–435. \bptokimsref \endbibitem
 Chen et al. (2011) {bincollection}[auto:STB—2012/04/05—08:30:55] \bauthor\bsnmChen, \bfnmX.\binitsX., \bauthor\bsnmLin, \bfnmQ.\binitsQ., \bauthor\bsnmKim, \bfnmS.\binitsS., \bauthor\bsnmCarbonell, \bfnmJ.\binitsJ. \AND\bauthor\bsnmXing, \bfnmE. P.\binitsE. P. (\byear2011). \btitleSmoothing proximal gradient method for general structured sparse learning. In \bbooktitleProceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI) \bpages105–114. \bpublisherAUAI Press, \baddressCorvallis, OR. \bptokimsref \endbibitem
 Cheung et al. (2005) {barticle}[auto:STB—2012/04/05—08:30:55] \bauthor\bsnmCheung, \bfnmV.\binitsV., \bauthor\bsnmSpielman, \bfnmR.\binitsR., \bauthor\bsnmEwens, \bfnmK.\binitsK., \bauthor\bsnmWeber, \bfnmT.\binitsT., \bauthor\bsnmMorley, \bfnmM.\binitsM. \AND\bauthor\bsnmBurdick, \bfnmJ.\binitsJ. (\byear2005). \btitleMapping determinants of human gene expression by regional and genomewide association. \bjournalNature \bvolume437 \bpages1365–1369. \bptokimsref \endbibitem
 Emilsson et al. (2008) {barticle}[auto:STB—2012/04/05—08:30:55] \bauthor\bsnmEmilsson, \bfnmV.\binitsV., \bauthor\bsnmThorleifsson, \bfnmG.\binitsG., \bauthor\bsnmZhang, \bfnmB.\binitsB., \bauthor\bsnmLeonardson, \bfnmA. S.\binitsA. S., \bauthor\bsnmZink, \bfnmF.\binitsF., \bauthor\bsnmZhu, \bfnmJ.\binitsJ., \bauthor\bsnmCarlson, \bfnmS.\binitsS., \bauthor\bsnmHelgason, \bfnmA.\binitsA., \bauthor\bsnmWalters, \bfnmG. B.\binitsG. B., \bauthor\bsnmGunnarsdottir, \bfnmS.\binitsS. \betalet al. (\byear2008). \btitleGenetics of gene expression and its effect on disease. \bjournalNature \bvolume452 \bpages423–428. \bptokimsref \endbibitem
 Friedman, Hastie and Tibshirani (2010) {bmisc}[auto:STB—2012/04/05—08:30:55] \bauthor\bsnmFriedman, \bfnmJ.\binitsJ., \bauthor\bsnmHastie, \bfnmT.\binitsT. \AND\bauthor\bsnmTibshirani, \bfnmR.\binitsR. (\byear2010). \bhowpublishedA note on the group lasso and a sparse group lasso. Technical report, Dept. Statistics, Stanford Univ., Stanford, CA. \bptokimsref \endbibitem
 Friedman et al. (2007) {barticle}[mr] \bauthor\bsnmFriedman, \bfnmJerome\binitsJ., \bauthor\bsnmHastie, \bfnmTrevor\binitsT., \bauthor\bsnmHöfling, \bfnmHolger\binitsH. \AND\bauthor\bsnmTibshirani, \bfnmRobert\binitsR. (\byear2007). \btitlePathwise coordinate optimization. \bjournalAnn. Appl. Stat. \bvolume1 \bpages302–332. \biddoi=10.1214/07AOAS131, issn=19326157, mr=2415737 \bptokimsref \endbibitem
 Golub et al. (1999) {barticle}[auto:STB—2012/04/05—08:30:55] \bauthor\bsnmGolub, \bfnmT. R.\binitsT. R., \bauthor\bsnmSlonim, \bfnmD. K.\binitsD. K., \bauthor\bsnmTamayo, \bfnmP.\binitsP., \bauthor\bsnmHuard, \bfnmC.\binitsC., \bauthor\bsnmGaasenbeek, \bfnmM.\binitsM., \bauthor\bsnmMesirov, \bfnmJ. P.\binitsJ. P., \bauthor\bsnmColler, \bfnmH.\binitsH., \bauthor\bsnmLoh, \bfnmM. L.\binitsM. L., \bauthor\bsnmDowning, \bfnmJ. R.\binitsJ. R., \bauthor\bsnmCaligiuri, \bfnmM. A.\binitsM. A., \bauthor\bsnmBloomfield, \bfnmC. D.\binitsC. D. \AND\bauthor\bsnmLander, \bfnmE. S.\binitsE. S. (\byear1999). \btitleMolecular classification of cancer: class discovery and class prediction by gene expression monitoring. \bjournalScience \bvolume286 \bpages531–537. \bptokimsref \endbibitem
 Hastie et al. (2001) {barticle}[auto:STB—2012/04/05—08:30:55] \bauthor\bsnmHastie, \bfnmT.\binitsT., \bauthor\bsnmTibshirani, \bfnmR.\binitsR., \bauthor\bsnmBotstein, \bfnmD.\binitsD. \AND\bauthor\bsnmBrown, \bfnmP.\binitsP. (\byear2001). \btitleSupervised harvesting of expression trees. \bjournalGenome Biol. \bvolume2 \bpages0003.1–0003.12. \bptokimsref \endbibitem
 Jacob, Obozinski and Vert (2009) {bincollection}[auto:STB—2012/04/05—08:30:55] \bauthor\bsnmJacob, \bfnmL.\binitsL., \bauthor\bsnmObozinski, \bfnmG.\binitsG. \AND\bauthor\bsnmVert, \bfnmJ.\binitsJ. (\byear2009). \btitleGroup lasso with overlap and graph lasso. In \bbooktitleProceedings of the 26th International Conference on Machine Learning. \bpublisherACM, \baddressNew York. \bptokimsref \endbibitem
 Jenatton, Audibert and Bach (2009) {bmisc}[auto:STB—2012/04/05—08:30:55] \bauthor\bsnmJenatton, \bfnmR.\binitsR., \bauthor\bsnmAudibert, \bfnmJ.\binitsJ. \AND\bauthor\bsnmBach, \bfnmF.\binitsF. (\byear2009). \bhowpublishedStructured variable selection with sparsityinducing norms. Technical report, INRIA. \bptokimsref \endbibitem
 Kim and Xing (2009) {barticle}[auto:STB—2012/04/05—08:30:55] \bauthor\bsnmKim, \bfnmS.\binitsS. \AND\bauthor\bsnmXing, \bfnmE. P.\binitsE. P. (\byear2009). \btitleStatistical estimation of correlated genome associations to a quantitative trait network. \bjournalPLoS Genetics \bvolume5 \bpagese1000587. \bptokimsref \endbibitem
 Kim and Xing (2012) {bmisc}[auto:STB—2012/04/05—08:30:55] \bauthor\bsnmKim, \bfnmS.\binitsS. \AND\bauthor\bsnmXing, \bfnmE. P.\binitsE. P. (\byear2012). \bhowpublishedSupplement to “Treeguided group lasso for multiresponse regression with structured sparsity, with an application to eQTL mapping.” DOI:\doiurl10.1214/12AOAS549SUPP. \bptokimsref \endbibitem
 Lee et al. (2006) {barticle}[auto:STB—2012/04/05—08:30:55] \bauthor\bsnmLee, \bfnmS. I.\binitsS. I., \bauthor\bsnmPe’er, \bfnmD.\binitsD., \bauthor\bsnmDudley, \bfnmA.\binitsA., \bauthor\bsnmChurch, \bfnmG.\binitsG. \AND\bauthor\bsnmKoller, \bfnmD.\binitsD. (\byear2006). \btitleIdentifying regulatory mechanisms using individual variation reveals key role for chromatin modification. \bjournalProc. Natl. Acad. Sci. USA \bvolume103 \bpages14062–14067. \bptokimsref \endbibitem
 Obozinski, Taskar and Jordan (2010) {barticle}[mr] \bauthor\bsnmObozinski, \bfnmGuillaume\binitsG., \bauthor\bsnmTaskar, \bfnmBen\binitsB. \AND\bauthor\bsnmJordan, \bfnmMichael I.\binitsM. I. (\byear2010). \btitleJoint covariate selection and joint subspace selection for multiple classification problems. \bjournalStat. Comput. \bvolume20 \bpages231–252. \biddoi=10.1007/s112220089111x, issn=09603174, mr=2610775 \bptnotecheck year\bptokimsref \endbibitem
 Obozinski, Wainwright and Jordan (2008) {bincollection}[auto:STB—2012/04/05—08:30:55] \bauthor\bsnmObozinski, \bfnmG.\binitsG., \bauthor\bsnmWainwright, \bfnmM. J.\binitsM. J. \AND\bauthor\bsnmJordan, \bfnmM. J.\binitsM. J. (\byear2008). \btitleHighdimensional union support recovery in multivariate regression. In \bbooktitleAdvances in Neural Information Processing Systems \bpages21. \bpublisherMIT Press, \baddressCambridge, MA. \bptokimsref \endbibitem
 Pujana et al. (2007) {barticle}[auto:STB—2012/04/05—08:30:55] \bauthor\bsnmPujana, \bfnmM. A.\binitsM. A., \bauthor\bsnmHan, \bfnmJ. J.\binitsJ. J., \bauthor\bsnmStarita, \bfnmL. M.\binitsL. M., \bauthor\bsnmStevens, \bfnmK. N.\binitsK. N., \bauthor\bsnmTewari, \bfnmM.\binitsM., \bauthor\bsnmAhn, \bfnmJ. S.\binitsJ. S., \bauthor\bsnmRennert, \bfnmG.\binitsG., \bauthor\bsnmMoreno, \bfnmV.\binitsV., \bauthor\bsnmKirchhoff, \bfnmT.\binitsT., \bauthor\bsnmGold, \bfnmB.\binitsB. \betalet al. (\byear2007). \btitleNetwork modeling links breast cancer susceptibility and centrosome dysfunction. \bjournalNature Genetics \bvolume39 \bpages1338–1349. \bptokimsref \endbibitem
 Segal et al. (2003) {barticle}[auto:STB—2012/04/05—08:30:55] \bauthor\bsnmSegal, \bfnmE.\binitsE., \bauthor\bsnmShapira, \bfnmM.\binitsM., \bauthor\bsnmRegev, \bfnmA.\binitsA., \bauthor\bsnmPe’er, \bfnmD.\binitsD., \bauthor\bsnmBotstein, \bfnmD.\binitsD., \bauthor\bsnmKoller, \bfnmD.\binitsD. \AND\bauthor\bsnmFriedman, \bfnmN.\binitsN. (\byear2003). \btitleModule networks: Identifying regulatory modules and their conditionspecific regulators from gene expression data. \bjournalNature Genetics \bvolume34 \bpages166–178. \bptokimsref \endbibitem
 Sørlie et al. (2001) {barticle}[auto:STB—2012/04/05—08:30:55] \bauthor\bsnmSørlie, \bfnmT.\binitsT., \bauthor\bsnmPerou, \bfnmC. M.\binitsC. M., \bauthor\bsnmTibshirani, \bfnmR.\binitsR., \bauthor\bsnmAas, \bfnmT.\binitsT., \bauthor\bsnmGeisler, \bfnmS.\binitsS., \bauthor\bsnmJohnsen, \bfnmH.\binitsH., \bauthor\bsnmHastie, \bfnmT.\binitsT., \bauthor\bsnmEisen, \bfnmM. B.\binitsM. B., \bauthor\bparticlevan de \bsnmRijn, \bfnmM.\binitsM., \bauthor\bsnmJeffrey, \bfnmS. S.\binitsS. S., \bauthor\bsnmThorsen, \bfnmT.\binitsT., \bauthor\bsnmQuist, \bfnmH.\binitsH., \bauthor\bsnmMatese, \bfnmJ. C.\binitsJ. C., \bauthor\bsnmBrown, \bfnmP. O.\binitsP. O., \bauthor\bsnmBotstein, \bfnmD.\binitsD., \bauthor\bsnmLønning, \bfnmP. E.\binitsP. E. \AND\bauthor\bsnmBørresenDale, \bfnmA.\binitsA. (\byear2001). \btitleGene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. \bjournalProc. Natl. Acad. Sci. USA \bvolume98 \bpages10869–10874. \bptokimsref \endbibitem
 Stranger et al. (2005) {barticle}[auto:STB—2012/04/05—08:30:55] \bauthor\bsnmStranger, \bfnmB.\binitsB., \bauthor\bsnmForrest, \bfnmM.\binitsM., \bauthor\bsnmClark, \bfnmA.\binitsA., \bauthor\bsnmMinichiello, \bfnmM.\binitsM., \bauthor\bsnmDeutsch, \bfnmS.\binitsS., \bauthor\bsnmLyle, \bfnmR.\binitsR., \bauthor\bsnmHunt, \bfnmS.\binitsS., \bauthor\bsnmKahl, \bfnmB.\binitsB., \bauthor\bsnmAntonarakis, \bfnmS.\binitsS., \bauthor\bsnmTavare, \bfnmS.\binitsS. \betalet al. (\byear2005). \btitleGenomewide associations of gene expression variation in humans. \bjournalPLoS Genetics \bvolume1 \bpages695–704. \bptokimsref \endbibitem
 Tibshirani (1996) {barticle}[mr] \bauthor\bsnmTibshirani, \bfnmRobert\binitsR. (\byear1996). \btitleRegression shrinkage and selection via the lasso. \bjournalJ. Roy. Statist. Soc. Ser. B \bvolume58 \bpages267–288. \bidissn=00359246, mr=1379242 \bptokimsref \endbibitem
 Wu et al. (2009) {barticle}[auto:STB—2012/04/05—08:30:55] \bauthor\bsnmWu, \bfnmT. T.\binitsT. T., \bauthor\bsnmChen, \bfnmY. F.\binitsY. F., \bauthor\bsnmHastie, \bfnmT.\binitsT., \bauthor\bsnmSobel, \bfnmE.\binitsE. \AND\bauthor\bsnmLange, \bfnmK.\binitsK. (\byear2009). \btitleGenomewide association analysis by lasso penalized logistic regression. \bjournalBioinformatics \bvolume25 \bpages714–721. \bptokimsref \endbibitem
 Yuan and Lin (2006) {barticle}[mr] \bauthor\bsnmYuan, \bfnmMing\binitsM. \AND\bauthor\bsnmLin, \bfnmYi\binitsY. (\byear2006). \btitleModel selection and estimation in regression with grouped variables. \bjournalJ. R. Stat. Soc. Ser. B Stat. Methodol. \bvolume68 \bpages49–67. \biddoi=10.1111/j.14679868.2005.00532.x, issn=13697412, mr=2212574 \bptokimsref \endbibitem
 Yuan and Yan (2010) {bincollection}[auto:STB—2012/04/05—08:30:55] \bauthor\bsnmYuan, \bfnmX.\binitsX. \AND\bauthor\bsnmYan, \bfnmS.\binitsS. (\byear2010). \btitleVisual classification with multitask joint sparse representation. In \bbooktitleProceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). \bpublisherIEEE Computer Society Press, \baddressLos Alamitos, CA. \bptokimsref \endbibitem
 Zhang (2010) {bincollection}[auto:STB—2012/04/05—08:30:55] \bauthor\bsnmZhang, \bfnmY.\binitsY. (\byear2010). \btitleMultitask active learning with output constraints. In \bbooktitleProceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI). \bpublisherAAAI Press, \baddressMenlo Park, CA. \bptokimsref \endbibitem
 Zhang and Horvath (2005) {barticle}[mr] \bauthor\bsnmZhang, \bfnmBin\binitsB. \AND\bauthor\bsnmHorvath, \bfnmSteve\binitsS. (\byear2005). \btitleA general framework for weighted gene coexpression network analysis. \bjournalStat. Appl. Genet. Mol. Biol. \bvolume4 \bpagesArt. 17, 45 pp. (electronic). \biddoi=10.2202/15446115.1128, issn=15446115, mr=2170433 \bptokimsref \endbibitem
 Zhao, Rocha and Yu (2009) {barticle}[mr] \bauthor\bsnmZhao, \bfnmPeng\binitsP., \bauthor\bsnmRocha, \bfnmGuilherme\binitsG. \AND\bauthor\bsnmYu, \bfnmBin\binitsB. (\byear2009). \btitleThe composite absolute penalties family for grouped and hierarchical variable selection. \bjournalAnn. Statist. \bvolume37 \bpages3468–3497. \biddoi=10.1214/07AOS584, issn=00905364, mr=2549566 \bptokimsref \endbibitem
 Zhou, Jin and Hoi (2010) {bmisc}[auto:STB—2012/04/05—08:30:55] \bauthor\bsnmZhou, \bfnmY.\binitsY., \bauthor\bsnmJin, \bfnmR.\binitsR. \AND\bauthor\bsnmHoi, \bfnmS. C. H.\binitsS. C. H. (\byear2010). \bhowpublishedExclusive lasso for multitask feature selection. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS). JMLR W&CP. \bptokimsref \endbibitem
 Zhu et al. (2008) {barticle}[auto:STB—2012/04/05—08:30:55] \bauthor\bsnmZhu, \bfnmJ.\binitsJ., \bauthor\bsnmZhang, \bfnmB.\binitsB., \bauthor\bsnmSmith, \bfnmE. N.\binitsE. N., \bauthor\bsnmDrees, \bfnmB.\binitsB., \bauthor\bsnmBrem, \bfnmR. B.\binitsR. B., \bauthor\bsnmKruglyak, \bfnmL.\binitsL., \bauthor\bsnmBumgarner, \bfnmR. E.\binitsR. E. \AND\bauthor\bsnmSchadt, \bfnmE. E.\binitsE. E. (\byear2008). \btitleIntegrating largescale functional genomic data to dissect the complexity of yeast regulatory networks. \bjournalNature Genetics \bvolume40 \bpages854–861. \bptokimsref \endbibitem
 Zou and Hastie (2005) {barticle}[mr] \bauthor\bsnmZou, \bfnmHui\binitsH. \AND\bauthor\bsnmHastie, \bfnmTrevor\binitsT. (\byear2005). \btitleRegularization and variable selection via the elastic net. \bjournalJ. R. Stat. Soc. Ser. B Stat. Methodol. \bvolume67 \bpages301–320. \biddoi=10.1111/j.14679868.2005.00503.x, issn=13697412, mr=2137327 \bptokimsref \endbibitem