Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping


Carnegie Mellon University
School of Computer Science
Carnegie Mellon University
Pittsburgh, Pennsylvania 15213
Received December 2009; revised February 2012.

We consider the problem of estimating a sparse multi-response regression function, with an application to expression quantitative trait locus (eQTL) mapping, where the goal is to discover genetic variations that influence gene-expression levels. In particular, we investigate a shrinkage technique capable of capturing a given hierarchical structure over the responses, such as a hierarchical clustering tree with leaf nodes for responses and internal nodes for clusters of related responses at multiple granularity, and we seek to leverage this structure to recover covariates relevant to each hierarchically-defined cluster of responses. We propose a tree-guided group lasso, or tree lasso, for estimating such structured sparsity under multi-response regression by employing a novel penalty function constructed from the tree. We describe a systematic weighting scheme for the overlapping groups in the tree-penalty such that each regression coefficient is penalized in a balanced manner despite the inhomogeneous multiplicity of group memberships of the regression coefficients due to overlaps among groups. For efficient optimization, we employ a smoothing proximal gradient method that was originally developed for a general class of structured-sparsity-inducing penalties. Using simulated and yeast data sets, we demonstrate that our method shows a superior performance in terms of both prediction errors and recovery of true sparsity patterns, compared to other methods for learning a multivariate-response regression.


DOI: 10.1214/12-AOAS549. Volume 6, Issue 3 (2012), pages 1095-1117.


Tree lasso for eQTL mapping


Supported in part by NIH 1R01GM087694.


Seyoung Kim and Eric P. Xing


E. P. Xing: Supported in part by ONR N000140910758, NSF DBI-0640543, NSF CCF-0523757 and an Alfred P. Sloan Research Fellowship.

Keywords: lasso, structured sparsity, high-dimensional regression, genetic association mapping, eQTL analysis.

1 Introduction

Recent advances in high-throughput technology for profiling gene expressions and assaying genetic variations at a genome-wide scale have provided researchers an unprecedented opportunity to comprehensively study the genetic causes of complex diseases such as asthma, diabetes, and cancer. Expression quantitative trait locus (eQTL) mapping considers gene expression measurements, also known as gene-expression traits, as intermediate phenotypes, and aims to identify the genetic markers such as single nucleotide polymorphisms (SNPs) that influence the expression levels of genes, which gives rise to the variability in clinical phenotypes or disease susceptibility across individuals. This type of analysis can provide a deeper insight into the functional role of the eQTLs in the disease process by linking the SNPs to genes whose functions are often known directly or indirectly through other co-expressed genes in the same pathway.

The most commonly used method for eQTL analysis has been to examine the expression level of a single gene at a time for association, treating genes as independent of each other [Cheung et al. (2005), Stranger et al. (2005), Zhu et al. (2008)]. However, it is widely believed that many of the genes in the same biological pathway are often co-expressed or co-regulated [Pujana et al. (2007), Zhang and Horvath (2005)] and may share a common genetic basis that causes the variations in their expression levels. How to incorporate such information on relatedness of genes into statistical analysis of associations between SNPs and gene expressions remains an under-addressed problem. One of the popular existing approaches is to consider the relatedness of genes after rather than during statistical analysis of eQTL data, which fails to fully exploit the statistical power from this additional source of information. Specifically, in order to find the genetic variations with pleiotropic effects that perturb the expressions of multiple related genes jointly, in recent eQTL studies, the expression traits for individual genes were analyzed separately, and then the results were examined for all genes in light of gene modules to see if any gene sets are enriched for association with a common SNP [Zhu et al. (2008), Emilsson et al. (2008), Chen et al. (2008)]. This type of analysis uses the information on gene modules only in the post-processing step after a set of single-gene analyses, instead of directly incorporating the correlation pattern in gene expressions in the process of searching for SNPs with pleiotropic effects.

Recently, a different approach for searching for SNPs with pleiotropic effects has been proposed to leverage information on gene modules more directly [Segal et al. (2003), Lee et al. (2006)]. In this approach, the module network originally developed for discovering clusters of co-regulated genes from gene expression data was extended to include SNPs as potential regulators that can influence the activity of gene modules. The main weakness of this method is that it computed the averages of gene-expression levels over those genes within each module and looked for SNPs that affect the average gene expressions of the module. The operation of computing averages can lead to a significant loss of information on the detailed activity of individual genes and negative correlations within a module.

Figure 1: An illustration of a tree lasso. (a): The sparse structure in regression coefficients is shown with white entries for zeros and gray entries for nonzero values. The hierarchical clustering tree represents the correlation structure in responses. The first two responses are highly correlated according to the clustering tree, and are likely to be influenced by the same covariates. (b): Groups of variables associated with each node of the tree in panel (a) in the tree-lasso penalty.

In this article we propose a tree-guided group lasso, or tree lasso, that directly combines statistical strength across multiple related genes in gene expression data to identify SNPs with pleiotropic effects by leveraging any given knowledge of a hierarchical clustering tree over genes [3]. The hierarchical clustering tree contains clusters of genes at multiple granularity, and genes within a cluster have correlated expression levels. The leaf nodes of the tree correspond to individual genes, and each internal node represents a cluster of genes at the leaf nodes of the subtree rooted at the internal node in question. Furthermore, each internal node in the tree is associated with a weight that represents the height of the subtree, or how tightly the genes in the cluster for that internal node are correlated. As illustrated in Figure 1(a), the expression levels of genes in each cluster are likely to be influenced by a common set of SNPs, and this type of sharing of genetic effects among correlated genes is stronger among tightly correlated genes in the cluster at the lower levels with a smaller height in the tree than among loosely correlated genes in the cluster near the root of the tree with a greater height. This multi-level grouping structure of genes can be available either as prior knowledge from domain experts, or can be learned from the gene-expression data using various clustering algorithms such as the hierarchical agglomerative clustering algorithm [Golub et al. (1999)].

[3] Here we focus on making use of the given knowledge of related genes to enhance the power of eQTL analysis, rather than discovering or evaluating how genes are related, which are interesting problems in their own right, and are studied widely [Segal et al. (2003)]. If the gene co-expression pattern is not available, one can simply run any off-the-shelf hierarchical agglomerative clustering algorithm on the gene-expression data to obtain one before applying our method. It is beyond the scope of this paper to discuss, compare, and further develop such algorithms for clustering genes or learning trees.

Our method is based on a multivariate regression method with a regularization function that is constructed from the hierarchical clustering tree. This regularizer induces a structured shrinkage effect that encourages multiple correlated responses to share a similar set of relevant covariates, rather than having independent sets of relevant covariates. This is a biologically and statistically desirable bias not present in existing methods for identifying eQTLs. For example, assuming that the SNPs are represented as covariates, gene expressions as responses, and the association strengths as regression coefficients in a regression model, a multivariate regression with $L_1$ regularization, called the lasso, has been applied to identify a small number of SNPs with nonzero association strengths [Wu et al. (2009)]. Here, the lasso treats multiple responses as independent of each other and selects relevant covariates for each response variable separately. Although the $L_1$ penalty in the lasso can be extended to the $L_1/L_2$ penalty, also known as the group-lasso penalty, for union support recovery, where all of the responses are constrained to have the same relevant covariates [Obozinski, Wainwright and Jordan (2008), Obozinski, Taskar and Jordan (2010)], in this case, the rich and heterogeneous relatedness among the responses as captured by a weighted tree cannot be taken into account.

Our method extends the $L_1/L_2$ penalty to the tree-lasso penalty by letting the hierarchically-defined groups overlap. The tree-lasso penalty achieves structured sparsity, where the related responses (i.e., gene expressions) in the same group share a common set of relevant covariates (i.e., SNPs), in a way that is properly calibrated to the strength of their relatedness and consistent with their overlapping group organization. Several schemes have been previously proposed to use the group-lasso penalty with overlapping groups to take advantage of more complex structural information on response variables. However, due to their ad hoc weighting schemes for different overlapping groups in the regularization function, some regression coefficients were penalized arbitrarily more heavily than others, leading to an inconsistent estimate [Zhao, Rocha and Yu (2009), Jacob, Obozinski and Vert (2009), Jenatton, Audibert and Bach (2009)]. In contrast, we propose a systematic weighting scheme for overlapping groups that applies a balanced penalization to all of the regression coefficients. Since the tree lasso is a special case of overlapping group lasso, where the weights and overlaps of groups are determined according to the hierarchical clustering tree, we adopt for efficient optimization the smoothing proximal gradient (SPG) method [Chen et al. (2011)] that was developed for optimizing a convex loss function with a general class of structured-sparsity-inducing penalty functions including overlapping group lasso.

Compared to our previous work on the graph-guided fused lasso that leverages a network structure over responses to achieve structured sparsity [Kim and Xing (2009)], the tree lasso has a considerably lower computational time, and allows thousands of response variables to be analyzed simultaneously, as is necessary in a typical eQTL mapping. This is in part because the computation time in the graph-guided fused lasso depends on the number of edges in the graph, which can be on the order of $K^2$, where $K$ is the number of response variables, whereas in the tree lasso, it is determined by the number of nodes in the tree, which is bounded by twice the number of response variables. Another potential advantage of the tree lasso is that it relaxes the constraint in the graph-guided fusion penalty that the regression coefficients should take similar values for a covariate relevant to multiple correlated responses. Although introducing this bias through the fusion penalty in the graph-guided fused lasso offered the benefit of combining weak association signals and reducing false positives, it is expected that relaxing this constraint could further increase the power. The $L_1/L_2$ penalty in our tree regularization achieves a joint selection of covariates for multiple related responses, while allowing different values for the regression coefficients corresponding to the selected covariate and correlated response variables.

Although the hierarchical agglomerative clustering algorithm has been widely popular as a preprocessing step for regression or classification tasks [Golub et al. (1999), Sørlie et al. (2001), Hastie et al. (2001)], our proposed method is the first to make use of the full results from the clustering algorithm given as tree structure and subtree-height information. Most of the previous classification or regression methods that build on the hierarchical clustering algorithm used summary statistics extracted from the hierarchical clustering tree such as subsets of genes forming clusters or averages of gene expressions within each cluster, rather than using the tree as it is [Golub et al. (1999), Hastie et al. (2001)]. In the tree lasso, we use the full hierarchical clustering tree as prior knowledge to construct a regularization function. Thus, the tree lasso incorporates the full information present in both the raw data and the hierarchical clustering tree to maximize the power for detecting weak association signals and to reduce false positives. In our experiments, we demonstrate that our proposed method can be successfully applied to select SNPs affecting the expression levels of multiple genes, using both simulated and yeast data sets.

The remainder of the paper is organized as follows. In Section 2 we provide a brief discussion of previous work on sparse regression estimation. In Section 3 we introduce the tree lasso and describe an efficient optimization method based on SPG. We present experimental results on simulated and yeast eQTL data sets in Section 4, and conclude in Section 5.

2 Background on multivariate regression approach for eQTL mapping

Let us assume that data are collected for $J$ SNPs and $K$ gene-expression traits over $N$ individuals. Let $\mathbf{X}$ denote the $N \times J$ matrix of SNP genotypes for covariates, and $\mathbf{Y}$ the $N \times K$ matrix of gene-expression measurements for responses. In eQTL mapping, each element of $\mathbf{X}$ takes values from $\{0, 1, 2\}$ according to the number of minor alleles at the given locus in each individual. Then, we assume a linear model for the functional mapping from covariates to response variables:

\[
\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E}, \tag{1}
\]

where $\mathbf{B}$ is the $J \times K$ matrix of regression coefficients and $\mathbf{E}$ is the $N \times K$ matrix of noise terms with mean 0 and a constant variance. We center each column of $\mathbf{X}$ and $\mathbf{Y}$ such that the mean is zero, and consider the model without an intercept. Throughout this paper, we use subscripts and superscripts to denote rows and columns of a matrix, respectively (e.g., $\boldsymbol{\beta}_j$ and $\boldsymbol{\beta}^k$ for the $j$th row and $k$th column of $\mathbf{B}$).

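When $J$ is large and the number of relevant covariates is small, the lasso offers an effective method for identifying the small number of nonzero elements in $\mathbf{B}$ [Tibshirani (1996)]. The lasso obtains $\hat{\mathbf{B}}^{\mathrm{lasso}}$ by solving the following optimization problem:

\[
\hat{\mathbf{B}}^{\mathrm{lasso}} = \operatorname*{argmin}_{\mathbf{B}} \; \frac{1}{2}\,\|\mathbf{Y} - \mathbf{X}\mathbf{B}\|_F^2 + \lambda \|\mathbf{B}\|_1, \tag{2}
\]

where $\|\cdot\|_F$ is the Frobenius norm, $\|\cdot\|_1$ is the matrix $L_1$ norm, and $\lambda$ is a tuning parameter that controls the amount of sparsity in the solution. Setting $\lambda$ to a large value leads to a smaller number of nonzero regression coefficients.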

The lasso estimation in (2) is equivalent to selecting relevant covariates for each of the $K$ responses separately, and does not provide any mechanism to enforce a joint selection of common relevant covariates for multiple related responses. In the literature of multi-task learning, an $L_1/L_2$ penalty, also known as a group lasso penalty [Yuan and Lin (2006)], has been adopted in multivariate-response regression to take advantage of the relatedness of the response variables and recover the union support, that is, the pattern of nonzero regression coefficients shared across all of the responses [Obozinski, Wainwright and Jordan (2008)]. This method is widely known as the $L_1/L_2$-regularized multi-task regression in the machine learning community, and its estimate for regression coefficients is given as

\[
\hat{\mathbf{B}}^{L_1/L_2} = \operatorname*{argmin}_{\mathbf{B}} \; \frac{1}{2}\,\|\mathbf{Y} - \mathbf{X}\mathbf{B}\|_F^2 + \lambda \sum_{j=1}^{J} \|\boldsymbol{\beta}_j\|_2, \tag{3}
\]

where $\|\cdot\|_2$ denotes an $L_2$ norm. In $L_1/L_2$-regularized multi-task regression, an $L_2$ norm is applied to the regression coefficients for all responses for each covariate, $\boldsymbol{\beta}_j$, and these $J$ $L_2$ norms for the $J$ covariates are combined through an $L_1$ norm to encourage only a small number of covariates to take nonzero regression coefficients. Since the $L_2$ part of the penalty does not have the property of encouraging sparsity, if the $j$th covariate is selected as relevant, then all of the elements of $\boldsymbol{\beta}_j$ would take nonzero values, although the regression coefficient values for the covariate are still allowed to vary across different responses. When applied to eQTL mapping, this method is significantly limited since it is not realistic to assume that the expression levels of all of the genes are influenced by the same set of relevant SNPs. A subset of co-expressed genes may be perturbed by a common set of SNPs, and genes in a different pathway are less likely to be affected by the same SNPs. The sparse group lasso [Friedman, Hastie and Tibshirani (2010)] can be adopted to relax this constraint by adding a lasso penalty to (3) so that individual regression coefficients within each $L_2$ norm can be set to zero. However, this method shares the same limitation as the $L_1/L_2$-regularized multi-task regression in that it cannot incorporate complex grouping structures in the responses such as groups at multiple granularity as in the hierarchical clustering tree.
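To make the contrast between the two penalties concrete, the following minimal Python sketch evaluates the lasso penalty in (2) and the $L_1/L_2$ penalty in (3) on a small coefficient matrix. The function names and the numerical values are illustrative assumptions, not part of the original analysis.

```python
import math

def lasso_penalty(B):
    """Matrix L1 norm: sum of |beta_j^k| over all covariates j and responses k."""
    return sum(abs(b) for row in B for b in row)

def l1_l2_penalty(B):
    """L1/L2 penalty: sum over covariates j of the L2 norm of the row beta_j."""
    return sum(math.sqrt(sum(b * b for b in row)) for row in B)

# B is J x K (rows = covariates, columns = responses); values are made up.
B = [
    [3.0, 4.0, 0.0],   # covariate 1 affects responses 1 and 2
    [0.0, 0.0, 0.0],   # covariate 2 is irrelevant
    [0.0, 0.0, 2.0],   # covariate 3 affects response 3 only
]

print(lasso_penalty(B))   # 9.0
print(l1_l2_penalty(B))   # 7.0
```

The $L_1/L_2$ penalty charges a row by its Euclidean length, so a covariate pays less for being shared across responses than under the lasso, which is the mechanism that favors joint selection.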

3 Tree lasso for exploiting hierarchical clustering tree in eQTL mapping

We introduce the tree lasso that considerably adds flexibility and power to these existing methods by taking advantage of the complex correlation structure given as a hierarchical clustering tree over the responses. We present a highly efficient algorithm for estimating the parameters in a tree lasso that is based on the smoothing proximal gradient descent developed for a general class of structured-sparsity-inducing norms.

3.1 Tree lasso

In a microarray experiment, gene-expression levels are measured for thousands of genes at a time, and many of the genes show highly correlated expression levels across samples, implying they may share a common regulator or participate in the same pathway. In addition, in eQTL analysis, it is widely believed that genetic variations such as SNPs perturb modules of related genes rather than acting on individual genes. As these gene modules are often derived and visualized by running the hierarchical agglomerative clustering algorithm on gene expression data, a natural extension of sparse regression methods for eQTL mapping is to incorporate the output of the hierarchical clustering algorithm to identify genetic variations that influence gene modules in the clustering tree. In this section, we build on the $L_1/L_2$-regularized regression and introduce a tree lasso that can directly leverage hierarchically-organized groups of genes to combine statistical strength across the expression levels of genes within each group. Although our work is primarily motivated by eQTL mapping in genetics, the tree lasso is generally applicable to any multivariate-response regression problem where the hierarchical group structure over the responses is given as a desirable source of structural bias, such as in many computer vision [Yuan and Yan (2010)] and natural language processing applications [Zhang (2010), Zhou, Jin and Hoi (2010)], where dependencies among visual objects and among parts of speech are well known to be valuable to enhance prediction performance.

Assume that the relationship among the $K$ responses is represented as tree $T$ with a set of vertices $V$ of size $|V|$. As illustrated in Figure 1(a), each of the $K$ leaf nodes is associated with a response variable, and each of the internal nodes represents a group of the responses located at the leaves of the subtree rooted at the given internal node. Internal nodes near the bottom of the tree correspond to tight clusters of highly related responses, whereas the internal nodes near the root represent groups with weak correlations among the responses in their subtrees. This tree structure may be provided as prior knowledge by domain experts or external resources (e.g., gene ontology databases in our eQTL mapping problem), or can be learned from the data for response variables using methods such as the hierarchical agglomerative clustering algorithm. We assume that each node $v \in V$ of the tree is associated with height $h_v$ of the subtree rooted at $v$, representing how tightly its members are correlated. In addition, we assume that the heights $h_v$'s of the internal nodes are normalized so that the height of the root node is 1.
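One possible in-memory representation of such a tree is sketched below in Python. The `Node` class, the `group` and `normalize_heights` helpers, and the merge heights 0.4 and 1.6 are hypothetical choices made for illustration; the tree shape follows Figure 1(a), where responses 1 and 2 merge first and response 3 joins at the root.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    name: str
    height: float = 0.0                 # h_v; leaves are given height 0 here
    response: Optional[int] = None      # response index k, set for leaf nodes only
    children: List["Node"] = field(default_factory=list)

def group(v: Node) -> List[int]:
    """G_v: the responses at the leaves of the subtree rooted at v."""
    if not v.children:
        return [v.response]
    return sorted(k for c in v.children for k in group(c))

def normalize_heights(root: Node) -> None:
    """Rescale all heights in place so that the root has height 1."""
    scale = root.height
    stack = [root]
    while stack:
        v = stack.pop()
        v.height /= scale
        stack.extend(v.children)

v1, v2, v3 = Node("v1", response=1), Node("v2", response=2), Node("v3", response=3)
v4 = Node("v4", height=0.4, children=[v1, v2])   # hypothetical merge height
v5 = Node("v5", height=1.6, children=[v4, v3])   # hypothetical merge height (root)
normalize_heights(v5)

print(group(v4), group(v5))   # [1, 2] [1, 2, 3]
```

In practice the merge heights would come from whichever clustering procedure produced the tree; only their normalized values enter the penalty.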

Given this tree $T$ over the $K$ responses, we generalize the $L_1/L_2$ regularization in (3) to a tree regularization by expanding the $L_2$ part of the $L_1/L_2$ penalty into an overlapping group lasso penalty. The overlapping groups in tree regularization are defined based on tree $T$ as follows. Each node $v \in V$ of tree $T$ is associated with group $G_v$ whose members are the response variables at the leaf nodes of the subtree rooted at node $v$. For example, Figure 1(b) shows the groups of responses and the corresponding regression coefficients that are associated with each of the nodes of the tree in Figure 1(a). Given these overlapping groups, we define the tree lasso as

\[
\hat{\mathbf{B}}^{T} = \operatorname*{argmin}_{\mathbf{B}} \; \frac{1}{2}\,\|\mathbf{Y} - \mathbf{X}\mathbf{B}\|_F^2 + \lambda \sum_{j=1}^{J} \sum_{v \in V} w_v \,\|\boldsymbol{\beta}_j^{G_v}\|_2, \tag{4}
\]

where $\boldsymbol{\beta}_j^{G_v}$ is a vector of regression coefficients $\{\beta_j^k : k \in G_v\}$. Since a tree associated with $K$ responses can have at most $2K - 1$ nodes, the number of $L_2$ terms that appear in the tree-lasso penalty is upper-bounded by $2K - 1$ for each covariate.

Each group of regression coefficients $\boldsymbol{\beta}_j^{G_v}$ in (4) is weighted with $w_v$ such that the group of responses near the leaf of the tree is more likely to have common relevant covariates, while ensuring the amount of penalization aggregated over all of the overlapping groups for each regression coefficient to be the same for all regression coefficients. We define $w_v$'s in (4) in terms of two quantities $s_v$'s and $g_v$'s, given as $s_v = h_v$ and $g_v = 1 - h_v$, that are associated with each internal node $v$ of height $h_v$ in tree $T$. The $s_v$ represents the weight for selecting relevant covariates separately for the responses associated with each child of node $v$, whereas the $g_v$ represents the weight for selecting relevant covariates jointly for the responses for all of the children of node $v$. We first consider a simple case with two responses ($K = 2$) and a tree of three nodes that consists of two leaf nodes ($v_1$ and $v_2$) and one root node ($v_3$), and then generalize this to an arbitrary tree. When $K = 2$, the penalty term in (4) can be written as

\[
\sum_{j} \sum_{v \in V} w_v \,\|\boldsymbol{\beta}_j^{G_v}\|_2 = \sum_{j} \Big[ s_{v_3} \big( |\beta_j^1| + |\beta_j^2| \big) + g_{v_3} \sqrt{(\beta_j^1)^2 + (\beta_j^2)^2} \Big], \tag{5}
\]

where the group weights are set to $w_{v_1} = s_{v_3}$, $w_{v_2} = s_{v_3}$, and $w_{v_3} = g_{v_3}$. Equation (5) has a similar form to the elastic-net penalty [Zou and Hastie (2005)], with the slight difference that the elastic net uses the square of the $L_2$ norm. The $L_1$ norm and $L_2$ norm in (5) are weighted by $s_{v_3}$ and $g_{v_3}$, and play the role of setting $\beta_j^1$ and $\beta_j^2$ to nonzero values separately or jointly. A large value of $g_{v_3}$ indicates that the responses are highly related, and a joint covariate selection is encouraged by heavily weighting the $L_2$ part of the penalty. When $s_{v_3} = 0$, the penalty in (5) is equivalent to the $L_1/L_2$-regularized multi-task regression in (3), where the responses share the same set of relevant covariates, whereas setting $g_{v_3} = 0$ in (5) leads to a lasso penalty. In general, given a single-level tree with all of the responses under a single parent node, the tree-lasso penalty corresponds to a linear combination of $L_1$ and $L_2$ penalties as in (5).

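Now, we generalize this process of obtaining $w_v$'s in the tree-lasso penalty for the special case of a single-level tree to an arbitrary tree. Starting from the root node and traversing down the tree recursively to the leaf nodes, at each of the root and internal nodes, we apply the similar operation of linear combination of the $L_1$ norm and $L_2$ norm as in (5) as follows:

\[
\sum_{j} \sum_{v \in V} w_v \,\|\boldsymbol{\beta}_j^{G_v}\|_2 = \sum_{j} W_j(v_{\mathrm{root}}), \tag{6}
\]

where

\[
W_j(v) =
\begin{cases}
s_v \cdot \sum_{c \in \mathrm{Children}(v)} |W_j(c)| + g_v \cdot \|\boldsymbol{\beta}_j^{G_v}\|_2, & \text{if } v \text{ is an internal node},\\[4pt]
\sum_{m \in G_v} |\beta_j^m|, & \text{if } v \text{ is a leaf node}.
\end{cases}
\]

Then, it can be shown that the following relationship holds between $w_v$'s and $(s_v, g_v)$'s:

\[
w_v =
\begin{cases}
g_v \prod_{m \in \mathrm{Ancestors}(v)} s_m, & \text{if } v \text{ is an internal node},\\[4pt]
\prod_{m \in \mathrm{Ancestors}(v)} s_m, & \text{if } v \text{ is a leaf node}.
\end{cases}
\]

The above weighting scheme extends the linear combination of the $L_1$ and $L_2$ penalty in (5) hierarchically, so that the $L_1$ and $L_2$ norms encourage separate and joint selections of covariates for the given groups of responses. The $s_v$'s and $g_v$'s determine the balance between these $L_1$ and $L_2$ norms. If $s_v = 1$ and $g_v = 0$ for all $v \in V$, then only separate selections are performed, and the tree-lasso penalty reduces to the lasso penalty. On the other hand, if $s_v = 0$ and $g_v = 1$ for all $v \in V$, the penalty reduces to the $L_1/L_2$ penalty in (3) that constrains all of the responses to have the same set of relevant covariates. The unit contour surfaces of various penalties for $\beta_j^1$, $\beta_j^2$, and $\beta_j^3$ with groups as defined in Figure 1 are shown in Figure 2.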
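The relationship between the $w_v$'s and the $(s_v, g_v)$'s can be checked numerically. The Python sketch below computes the group weights for the three-response tree of Figure 1(a) by a top-down traversal, using $s_v = h_v$ and $g_v = 1 - h_v$, and then sums the weights of all groups containing each response. The dictionary layout, node names, and the height 0.25 are assumptions made for this illustration.

```python
# Tree of Figure 1(a) as plain dictionaries (node names are illustrative).
children = {"v5": ["v4", "v3"], "v4": ["v1", "v2"], "v1": [], "v2": [], "v3": []}
height = {"v5": 1.0, "v4": 0.25}        # normalized h_v of internal nodes; 0.25 is made up
leaf_response = {"v1": 1, "v2": 2, "v3": 3}

def group_weights(root):
    """Return {node: w_v} using s_v = h_v and g_v = 1 - h_v."""
    w = {}
    def visit(v, prod_s):               # prod_s = product of s_m over the ancestors m of v
        if not children[v]:
            w[v] = prod_s               # leaf node
            return
        s, g = height[v], 1.0 - height[v]
        w[v] = g * prod_s               # internal node
        for c in children[v]:
            visit(c, prod_s * s)
    visit(root, 1.0)
    return w

def groups(root):
    """Return {node: set of responses at the leaves below that node}."""
    G = {}
    def visit(v):
        if not children[v]:
            G[v] = {leaf_response[v]}
        else:
            G[v] = set().union(*(visit(c) for c in children[v]))
        return G[v]
    visit(root)
    return G

w = group_weights("v5")
G = groups("v5")
totals = {k: sum(w[v] for v in w if k in G[v]) for k in (1, 2, 3)}
print(w)
print(totals)   # every response receives a total weight of 1.0
```

Note that with the root height normalized to 1, this choice of $s_v$ and $g_v$ gives $g_{v_{\mathrm{root}}} = 0$, so the root group itself carries zero weight in this sketch.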

Figure 2: Unit contour surfaces for $\{\beta_j^1, \beta_j^2, \beta_j^3\}$ in various penalties, assuming the tree structure over responses in Figure 1. (a): Lasso; (b)-(e): tree lasso under four different settings of the node weights.

The seemingly complex method for determining the weights $w_v$'s for groups in the tree-lasso penalty ensures that all of the regression coefficients are penalized by an equal amount overall, across all of the nested overlapping groups in which they appear. Proposition 1 (as stated and proved in the supplemental article [Kim and Xing (2012)]) shows that even if each response belongs to multiple groups associated with different internal nodes and appears multiple times in the overall penalty in (6), the sum of weights over all of the groups that contain the given response is always one. Thus, the weighting scheme in (6) guarantees that all of the individual regression coefficients are overall penalized equally. Although several variations of group lasso with overlapping groups have been proposed previously, all of those methods weighted the $L_2$ norms for overlapping groups with arbitrarily defined weights, resulting in unbalanced weights for different regression coefficients [Zhao, Rocha and Yu (2009), Jenatton, Audibert and Bach (2009)]. It was empirically shown that these arbitrary weighting schemes give an inconsistent estimate [Jenatton, Audibert and Bach (2009)].

Below, we provide an example of the process of constructing a tree-lasso penalty based on the simple tree over three responses in Figure 1(a). For more complex trees over a large number of responses, the same procedure can be applied, traversing the tree recursively from the root to the leaf nodes.


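Example 1. Given the tree in Figure 1, with leaf nodes $v_1$, $v_2$, $v_3$, internal node $v_4$ over the first two responses, and root node $v_5$, for the $j$th covariate the penalty of the tree lasso in (6) can be written as follows:

\[
\begin{aligned}
W_j(v_1) &= |\beta_j^1|, \qquad W_j(v_2) = |\beta_j^2|, \qquad W_j(v_3) = |\beta_j^3|,\\
W_j(v_4) &= g_{v_4} \|\boldsymbol{\beta}_j^{G_{v_4}}\|_2 + s_{v_4} \big( |W_j(v_1)| + |W_j(v_2)| \big)\\
&= g_{v_4} \|\boldsymbol{\beta}_j^{G_{v_4}}\|_2 + s_{v_4} \big( |\beta_j^1| + |\beta_j^2| \big),\\
W_j(v_{\mathrm{root}}) = W_j(v_5) &= g_{v_5} \|\boldsymbol{\beta}_j^{G_{v_5}}\|_2 + s_{v_5} \big( |W_j(v_4)| + |W_j(v_3)| \big)\\
&= g_{v_5} \|\boldsymbol{\beta}_j^{G_{v_5}}\|_2 + s_{v_5}\, g_{v_4} \|\boldsymbol{\beta}_j^{G_{v_4}}\|_2 + s_{v_5}\, s_{v_4} \big( |\beta_j^1| + |\beta_j^2| \big) + s_{v_5} |\beta_j^3|.
\end{aligned}
\]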
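As a sanity check on this example, the short Python sketch below evaluates the recursive definition of $W_j(v)$ in (6) on the three-response tree and compares it with the expression obtained by unrolling the recursion by hand. The values of $s_v$, $g_v$, and $\boldsymbol{\beta}_j$ are illustrative assumptions; in particular, the root is deliberately not normalized to $s = 1$ so that the root term stays visible.

```python
import math

children = {"v5": ["v4", "v3"], "v4": ["v1", "v2"], "v1": [], "v2": [], "v3": []}
leaf_column = {"v1": 0, "v2": 1, "v3": 2}   # 0-based position of each leaf's response
s = {"v5": 0.9, "v4": 0.3}                  # illustrative values with s_v + g_v = 1
g = {"v5": 0.1, "v4": 0.7}

def leaves(v):
    """Column indices of the responses in G_v."""
    if not children[v]:
        return [leaf_column[v]]
    return [k for c in children[v] for k in leaves(c)]

def l2_norm(beta_j, v):
    return math.sqrt(sum(beta_j[k] ** 2 for k in leaves(v)))

def W(beta_j, v):
    """Recursive definition of W_j(v) in (6)."""
    if not children[v]:
        return abs(beta_j[leaf_column[v]])
    child_sum = sum(abs(W(beta_j, c)) for c in children[v])
    return s[v] * child_sum + g[v] * l2_norm(beta_j, v)

def unrolled(beta_j):
    """The same penalty after expanding the recursion for this particular tree."""
    b1, b2, b3 = beta_j
    return (g["v5"] * math.sqrt(b1 ** 2 + b2 ** 2 + b3 ** 2)
            + s["v5"] * g["v4"] * math.sqrt(b1 ** 2 + b2 ** 2)
            + s["v5"] * s["v4"] * (abs(b1) + abs(b2))
            + s["v5"] * abs(b3))

beta_j = [1.5, -2.0, 0.5]
print(W(beta_j, "v5"), unrolled(beta_j))   # the two values agree up to rounding
```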

The tree-lasso penalty that we introduced above can be easily extended to other related types of structures such as trees with different branching factors and a forest that consists of multiple trees. In addition, our proposed regularization can be applied to a pruned tree whose leaf nodes contain groups of variables instead of individual variables.

3.2 Parameter estimation

Although the tree-lasso optimization problem in (4) is convex, the main challenges for solving (4) arise from the nonseparable terms over $\boldsymbol{\beta}_j^{G_v}$'s in the nonsmooth penalty. While the coordinate descent algorithm has been successfully applied to nonsmooth penalties such as the lasso and group lasso with nonoverlapping groups [Friedman et al. (2007)], it cannot be applied to the tree lasso because the overlapping groups with nonseparable terms in the penalty prevent us from obtaining a closed-form update equation for iterative optimization. While the optimization problem for the tree lasso can be formulated as a second-order cone program and solved with the interior point method [Boyd and Vandenberghe (2004)], this approach does not scale to high-dimensional problems such as eQTL mapping that involve a large number of SNPs and gene-expression measurements. Recently, a smoothing proximal gradient (SPG) method was developed for an efficient optimization of a convex loss function with a general class of structured-sparsity-inducing penalty functions that share the same challenges of nonsmoothness and nonseparability [Chen et al. (2011)]. The SPG can handle a wide variety of penalties such as the overlapping group lasso and fused lasso, and as the tree lasso is a special case of the overlapping group lasso, we adopt this method in our paper. As we detail below in this section, SPG first decouples the nonseparable terms in the penalty by reformulating it with a dual norm, and introduces a smooth approximation of the nonsmooth penalty. Then, in order to optimize the objective function with this smooth approximation of the penalty, SPG adopts the fast iterative shrinkage thresholding algorithm (FISTA) [Beck and Teboulle (2009)], an accelerated gradient descent method.

3.2.1 Reformulation of the penalty function

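We rewrite (4) by splitting the tree-lasso penalty into two parts corresponding to two sets of nodes in tree $T$, $V_{\mathrm{int}}$ for all of the internal nodes and $V_{\mathrm{leaf}}$ for all of the leaf nodes, as follows:

\[
\hat{\mathbf{B}}^{T} = \operatorname*{argmin}_{\mathbf{B}} \; \frac{1}{2}\,\|\mathbf{Y} - \mathbf{X}\mathbf{B}\|_F^2 + \lambda \sum_{j} \sum_{v \in V_{\mathrm{int}}} w_v \,\|\boldsymbol{\beta}_j^{G_v}\|_2 + \lambda \sum_{j} \sum_{v \in V_{\mathrm{leaf}}} w_v \,|\boldsymbol{\beta}_j^{G_v}|.
\]

We notice that in the above equation, the first penalty term for $V_{\mathrm{int}}$ contains overlapping groups, whereas the second penalty term for $V_{\mathrm{leaf}}$ is equivalent to the weighted lasso penalty $\lambda \sum_j \sum_k w(v_k)\,|\beta_j^k|$, where $w(v_k)$ represents the weight for the leaf node $v_k$ associated with the $k$th response.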

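Since the penalty term associated with $V_{\mathrm{int}}$ contains overlapping groups and therefore is nonseparable, we rewrite this term by introducing a vector of auxiliary variables $\boldsymbol{\alpha}_j^{G_v}$ for each covariate $j$ and group $G_v$ and by reformulating it with a dual norm representation to obtain

\[
\lambda \sum_{j} \sum_{v \in V_{\mathrm{int}}} w_v \,\|\boldsymbol{\beta}_j^{G_v}\|_2
= \lambda \sum_{j} \sum_{v \in V_{\mathrm{int}}} w_v \max_{\|\boldsymbol{\alpha}_j^{G_v}\|_2 \le 1} (\boldsymbol{\alpha}_j^{G_v})^{T} \boldsymbol{\beta}_j^{G_v}
= \max_{\mathbf{A} \in \mathcal{Q}} \,\langle \mathbf{C}\mathbf{B}^{T}, \mathbf{A} \rangle, \tag{3.2.1}
\]

where $\langle \mathbf{U}, \mathbf{V} \rangle \equiv \operatorname{Tr}(\mathbf{U}^{T}\mathbf{V})$ denotes a matrix inner product, and $\mathbf{A}$ is a $\big(\sum_{v \in V_{\mathrm{int}}} |G_v|\big) \times J$ matrix given as

\[
\mathbf{A} =
\begin{pmatrix}
\boldsymbol{\alpha}_1^{G_{v_1}} & \cdots & \boldsymbol{\alpha}_J^{G_{v_1}}\\
\vdots & \ddots & \vdots\\
\boldsymbol{\alpha}_1^{G_{v_{|V_{\mathrm{int}}|}}} & \cdots & \boldsymbol{\alpha}_J^{G_{v_{|V_{\mathrm{int}}|}}}
\end{pmatrix}
\]

with domain $\mathcal{Q} \equiv \{\mathbf{A} : \|\boldsymbol{\alpha}_j^{G_v}\|_2 \le 1,\ \forall j \in \{1, \ldots, J\},\ v \in V_{\mathrm{int}}\}$. In addition, $\mathbf{C}$ in (3.2.1) is a $\big(\sum_{v \in V_{\mathrm{int}}} |G_v|\big) \times K$ matrix whose elements are defined as

\[
C_{(v,i),k} =
\begin{cases}
\lambda w_v, & \text{if } i = k,\\
0, & \text{otherwise},
\end{cases}
\]

with rows indexed by $(v, i)$ such that $v \in V_{\mathrm{int}}$ and $i \in G_v$, and columns indexed by $k \in \{1, \ldots, K\}$. We note that the nonseparable terms over $\boldsymbol{\beta}_j^{G_v}$'s in the tree-lasso penalty are decoupled in the dual-norm representation in (3.2.1).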
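The dual-norm identity can be verified on a small instance. The Python sketch below builds $\mathbf{C}$ with rows indexed by $(v, i)$, forms $\mathbf{C}\mathbf{B}^{T}$, and evaluates the inner product at the unit-norm maximizer for each pair $(j, v)$, which should reproduce the overlapping-group penalty. The group weights, $\lambda$, and the entries of $\mathbf{B}$ are made-up values, and the variable names are assumptions for illustration.

```python
import math

lam = 2.0
# Internal-node groups (0-based response indices) and weights; values are illustrative.
int_groups = {"v4": [0, 1], "v5": [0, 1, 2]}
w = {"v4": 0.75, "v5": 0.5}
K = 3

# Rows of C are indexed by pairs (v, i) with i in G_v; columns by k.
rows = [(v, i) for v in int_groups for i in int_groups[v]]
C = [[lam * w[v] if i == k else 0.0 for k in range(K)] for (v, i) in rows]

B = [[1.0, -2.0, 2.0],      # J x K coefficient matrix (made up)
     [0.0, 3.0, 4.0]]
J = len(B)

# CBt[r][j] is the (r, j) entry of C B^T.
CBt = [[sum(C[r][k] * B[j][k] for k in range(K)) for j in range(J)]
       for r in range(len(rows))]

def primal():
    """lam * sum_j sum_{v in V_int} w_v * ||beta_j^{G_v}||_2."""
    return lam * sum(w[v] * math.sqrt(sum(B[j][k] ** 2 for k in int_groups[v]))
                     for j in range(J) for v in int_groups)

def dual_value():
    """<C B^T, A> evaluated at the maximizer alpha = u / ||u||_2 for each block u."""
    total = 0.0
    for j in range(J):
        for v in int_groups:
            idx = [r for r, (u, _) in enumerate(rows) if u == v]
            block = [CBt[r][j] for r in idx]      # equals lam * w_v * beta_j^{G_v}
            norm = math.sqrt(sum(x * x for x in block))
            if norm > 0.0:
                alpha = [x / norm for x in block]  # unit-norm maximizer
                total += sum(a * x for a, x in zip(alpha, block))
    return total

print(primal(), dual_value())   # the two values agree up to rounding
```

Because each block of $\mathbf{A}$ is constrained independently, the maximization separates over $(j, v)$ even though the groups themselves overlap.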

3.2.2 Smooth approximation to the nonsmooth penalty

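The reformulation in (3.2.1) is still nonsmooth in $\mathbf{B}$, which makes it nontrivial to optimize. To overcome this challenge, SPG introduces a smooth approximation of (3.2.1) as follows:

\[
f_\mu(\mathbf{B}) = \max_{\mathbf{A} \in \mathcal{Q}} \,\langle \mathbf{C}\mathbf{B}^{T}, \mathbf{A} \rangle - \mu\, d(\mathbf{A}), \tag{9}
\]

where $d(\mathbf{A}) \equiv \frac{1}{2}\|\mathbf{A}\|_F^2$ is a smoothing function with the maximum value $D \equiv \max_{\mathbf{A} \in \mathcal{Q}} d(\mathbf{A})$, and $\mu$ is the parameter that determines the amount of smoothness. We notice that when $\mu = 0$, we recover the original nonsmooth penalty in (3.2.1). It has been shown [Chen et al. (2011)] that $f_\mu(\mathbf{B})$ is convex and smooth with gradient

\[
\nabla f_\mu(\mathbf{B}) = (\mathbf{A}^{*})^{T} \mathbf{C},
\]

where $\mathbf{A}^{*}$ is the optimal solution to (9), composed of $(\boldsymbol{\alpha}_j^{G_v})^{*} = S\big(\lambda w_v \boldsymbol{\beta}_j^{G_v} / \mu\big)$, given the shrinkage operator $S(\cdot)$ defined as

\[
S(\mathbf{u}) =
\begin{cases}
\mathbf{u} / \|\mathbf{u}\|_2, & \text{if } \|\mathbf{u}\|_2 > 1,\\
\mathbf{u}, & \text{if } \|\mathbf{u}\|_2 \le 1.
\end{cases}
\]

In addition, $\nabla f_\mu(\mathbf{B})$ is Lipschitz continuous with the Lipschitz constant $L_\mu = \|\mathbf{C}\|^2 / \mu$, where $\|\mathbf{C}\| \equiv \max_{\|\mathbf{V}\|_F \le 1} \|\mathbf{C}\mathbf{V}^{T}\|_F$ is a matrix spectral norm. We can show that $\|\mathbf{C}\| = \lambda \max_{k \in \{1, \ldots, K\}} \sqrt{\sum_{v \in V_{\mathrm{int}} :\, k \in G_v} w_v^2}$.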
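The effect of the smoothing can be illustrated block by block. In the Python sketch below, each block $\mathbf{u}$ stands for one vector $\lambda w_v \boldsymbol{\beta}_j^{G_v}$; the smoothed value is computed at the maximizer $S(\mathbf{u}/\mu)$ and compared with the exact $L_2$ norm. The function names and the numerical values are assumptions for illustration. With $d(\mathbf{A}) = \frac{1}{2}\|\mathbf{A}\|_F^2$ and one unit ball per block, $D$ equals half the number of blocks, so the smoothed penalty should lie between the exact penalty minus $\mu D$ and the exact penalty.

```python
import math

def shrink(u):
    """S(u): rescale u to unit L2 norm if its norm exceeds 1, else leave it unchanged."""
    norm = math.sqrt(sum(x * x for x in u))
    return [x / norm for x in u] if norm > 1.0 else list(u)

def smoothed_block(u, mu):
    """max over ||a||_2 <= 1 of a.u - (mu / 2) ||a||_2^2, attained at a = S(u / mu)."""
    a = shrink([x / mu for x in u])
    return sum(ai * ui for ai, ui in zip(a, u)) - 0.5 * mu * sum(ai * ai for ai in a)

def l2(u):
    return math.sqrt(sum(x * x for x in u))

mu = 0.1
blocks = [[3.0, 4.0], [0.03, 0.04], [0.0, 0.0]]   # made-up values of lam * w_v * beta_j^{G_v}

exact = sum(l2(u) for u in blocks)                 # nonsmooth penalty
smooth = sum(smoothed_block(u, mu) for u in blocks)
D = len(blocks) / 2.0                              # max of d(A) over the domain

print(exact, smooth)
print(exact - mu * D <= smooth <= exact)           # True
```

Large blocks are penalized almost exactly (the gap is $\mu/2$), while blocks with norm below $\mu$ are penalized quadratically, which is what removes the kink at zero.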

3.2.3 Smoothing proximal gradient (SPG) method

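By substituting the penalty term for $V_{\mathrm{int}}$ in (3.2.1) with $f_\mu(\mathbf{B})$ in (9), we obtain an objective function whose nonsmooth component contains only the weighted lasso penalty as follows:

\[
\tilde{f}(\mathbf{B}) = \frac{1}{2}\,\|\mathbf{Y} - \mathbf{X}\mathbf{B}\|_F^2 + f_\mu(\mathbf{B}) + \lambda \sum_{j} \sum_{k} w(v_k)\,|\beta_j^k|. \tag{11}
\]

The smooth part of the above objective function is

\[
h(\mathbf{B}) = \frac{1}{2}\,\|\mathbf{Y} - \mathbf{X}\mathbf{B}\|_F^2 + f_\mu(\mathbf{B}), \tag{12}
\]

and its gradient is given as

\[
\nabla h(\mathbf{B}) = \mathbf{X}^{T}(\mathbf{X}\mathbf{B} - \mathbf{Y}) + (\mathbf{A}^{*})^{T}\mathbf{C}, \tag{13}
\]

which is Lipschitz-continuous with the Lipschitz constant,

\[
L = \lambda_{\max}(\mathbf{X}^{T}\mathbf{X}) + L_\mu = \lambda_{\max}(\mathbf{X}^{T}\mathbf{X}) + \frac{\|\mathbf{C}\|^2}{\mu}, \tag{14}
\]

where $\lambda_{\max}(\mathbf{X}^{T}\mathbf{X})$ is the largest eigenvalue of $\mathbf{X}^{T}\mathbf{X}$.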

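Algorithm 1 Smoothing proximal gradient descent (SPG) for tree lasso

Input: $\mathbf{X}$, $\mathbf{Y}$, $\mathbf{C}$, $\mathbf{B}^{0}$, Lipschitz constant $L$, desired accuracy $\varepsilon$.

Initialization: set $\mu = \varepsilon / (2D)$ where $D = \max_{\mathbf{A} \in \mathcal{Q}} d(\mathbf{A})$, $\theta_0 = 1$, $\mathbf{W}^{0} = \mathbf{B}^{0}$.

Iterate For $t = 0, 1, 2, \ldots$ until convergence of $\mathbf{B}^{t}$:

  1. Compute $\nabla h(\mathbf{W}^{t})$ according to (13).

  2. Solve the proximal operator associated with the $L_1$-norm:
     $\mathbf{B}^{t+1} = \operatorname*{argmin}_{\mathbf{B}} \; h(\mathbf{W}^{t}) + \langle \mathbf{B} - \mathbf{W}^{t}, \nabla h(\mathbf{W}^{t}) \rangle + \lambda \sum_{j} \sum_{k} w(v_k)\,|\beta_j^k| + \frac{L}{2}\,\|\mathbf{B} - \mathbf{W}^{t}\|_F^2$.

  3. Set $\theta_{t+1} = \frac{2}{t+3}$.

  4. Set $\mathbf{W}^{t+1} = \mathbf{B}^{t+1} + \frac{1 - \theta_t}{\theta_t}\,\theta_{t+1}\,(\mathbf{B}^{t+1} - \mathbf{B}^{t})$.

Output: $\hat{\mathbf{B}} = \mathbf{B}^{t+1}$.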

The key idea behind SPG is that once we introduce the smooth approximation of (3.2.1), the only nonsmooth component in (11) is the weighted lasso penalty and FISTA can be adopted to optimize (11). The SPG algorithm for the tree lasso is given in Algorithm 1. In order to obtain the proximal operator associated with the weighted lasso penalty, we rewrite in (2) as follows:

and obtain the closed-form solution for in (2) by soft-thresholding:

where ’s are elements of . The Lipschitz constant  given as in (14) plays the role of determining the step size in each gradient descent iteration, although this value can be expensive to compute for large . As suggested in Chen et al. (2011), a back-tracking line search can be used to determine the step size for large [Boyd and Vandenberghe (2004)].
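The soft-thresholding step can be sketched as follows. This is a generic illustration rather than the authors' code: the function names are hypothetical, and the per-element thresholds (assumed here to be the step size times the regularization parameter times any per-element weight) are passed in explicitly.

```python
def soft_threshold(v, t):
    """Closed-form minimizer over b of 0.5 * (b - v)**2 + t * abs(b), for t >= 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def prox_weighted_l1(V, thresholds):
    """Element-wise soft-thresholding of matrix V (list of rows).

    thresholds has the same shape as V; each entry is the (hypothetical)
    threshold for the matching coefficient.
    """
    return [
        [soft_threshold(v, t) for v, t in zip(row_v, row_t)]
        for row_v, row_t in zip(V, thresholds)
    ]
```

Because the weighted lasso penalty is separable across coefficients, the proximal operator reduces to independent scalar problems, which is why a closed form is available.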

It can be shown that the convergence rate of Algorithm 1 is iterations, given the desired accuracy [Chen et al. (2011)]. If we precompute and store and , the time complexity per iteration of SPG for the tree lasso is , compared to for the interior point method for the second-order cone program. Thus, the time complexity for SPG is quadratic in and linear in max(, ), which is significantly more efficient than cubic in both and for the interior point method.
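To make the iteration structure concrete, here is a minimal sketch of an accelerated proximal gradient (FISTA-style) loop on a toy lasso problem with a diagonal design. It is not the tree-lasso implementation: the toy data, the function names, and the momentum schedule (weights of the form 2/(t+2), a standard choice assumed here) are illustrative, and the Lipschitz constant is supplied by hand as the largest eigenvalue of the toy Gram matrix.

```python
def fista(grad, prox, lipschitz, b0, n_iter=5000):
    """Accelerated proximal gradient for min_b h(b) + g(b).

    grad(w): gradient of the smooth part h at w.
    prox(v, step): proximal operator of step * g evaluated at v.
    lipschitz: Lipschitz constant of grad; the step size is 1 / lipschitz.
    """
    step = 1.0 / lipschitz
    b = list(b0)
    w = list(b0)
    for t in range(n_iter):
        g = grad(w)
        b_next = prox([wi - step * gi for wi, gi in zip(w, g)], step)
        # Assumed weights theta_t = 2 / (t + 2) give momentum t / (t + 3).
        momentum = t / (t + 3.0)
        w = [bn + momentum * (bn - bo) for bn, bo in zip(b_next, b)]
        b = b_next
    return b

# Toy lasso: min_b 0.5 * ||y - X b||^2 + lam * ||b||_1 with X = diag(1, 2).
y = [3.0, 1.0]
x_diag = [1.0, 2.0]
lam = 1.0

def grad_h(w):
    # Gradient of 0.5 * sum_i (y_i - x_i * w_i)^2 for the diagonal design.
    return [xi * (xi * wi - yi) for xi, wi, yi in zip(x_diag, w, y)]

def prox_l1(v, step):
    # Soft-thresholding at level step * lam.
    thr = step * lam
    return [max(abs(vi) - thr, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]

# Largest eigenvalue of X^T X for this toy design is 4.
b_hat = fista(grad_h, prox_l1, lipschitz=4.0, b0=[0.0, 0.0])
```

For this separable toy problem the minimizer can be derived by hand as (2, 0.25), which gives a simple check that the loop behaves as intended.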

4 Experiments

We demonstrate the performance of our method on simulated data sets and on the yeast data set of genotypes and gene expressions, and compare the results with those from the lasso and the $L_1/L_2$-regularized multi-task regression, neither of which assumes any structure over responses. In all of our experiments, we determine the regularization parameter by fitting models on a training set for a range of values of this parameter, computing the prediction error of each model on a validation set, and then selecting the value that gives the lowest prediction error. We evaluate these methods based on two criteria: sensitivity/specificity in detecting true relevant covariates, and prediction errors on test data sets. We note that (1 − specificity) and sensitivity are equivalent to the type I error rate and (1 − type II error rate), respectively. Test errors are obtained as mean squared differences between the predicted and observed response measurements based on test data sets that are independent of training and validation data sets.
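The evaluation protocol above can be sketched in a few lines. The helper names are hypothetical, the coefficient matrices are plain nested lists, and an estimated coefficient counts as selected when its absolute value exceeds a tolerance (an assumption; the text does not state how nonzero estimates are declared). The sketch also assumes the true matrix contains at least one zero and one nonzero entry.

```python
def sensitivity_specificity(true_B, est_B, tol=0.0):
    """Compare the support of est_B against true_B, entry by entry."""
    tp = fp = tn = fn = 0
    for true_row, est_row in zip(true_B, est_B):
        for b_true, b_est in zip(true_row, est_row):
            relevant = b_true != 0
            selected = abs(b_est) > tol
            if relevant and selected:
                tp += 1
            elif relevant:
                fn += 1
            elif selected:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)

def mean_squared_error(Y_true, Y_pred):
    """Mean squared difference between observed and predicted responses."""
    diffs = [
        (yt - yp) ** 2
        for row_t, row_p in zip(Y_true, Y_pred)
        for yt, yp in zip(row_t, row_p)
    ]
    return sum(diffs) / len(diffs)

def select_by_validation(errors_by_param):
    """Return the regularization value with the lowest validation error."""
    return min(errors_by_param, key=errors_by_param.get)
```

In use, one would fit a model per candidate regularization value on the training set, fill `errors_by_param` with validation errors from `mean_squared_error`, and report test error only for the selected value.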

4.1 Simulation study

We simulate data using the following scenario analogous to eQTL mapping. We simulate with , , and as follows. We first generate the genotypes by sampling each element in from a uniform distribution over $\{0, 1, 2\}$, which corresponds to the number of mutated alleles at each SNP locus. Then, we set the values of by first selecting nonzero entries and filling these entries with predefined values. We assume a hierarchical structure with four levels over the responses, and select the nonzero elements of so that the groups of responses described by the tree share common relevant covariates. The hierarchical clustering tree used in our simulation is shown in Figure 3(a) only for the top three levels to avoid clutter, and the true nonzero elements in the regression coefficient matrix are shown as white pixels in Figure 3(b) with responses (gene expressions) as rows and covariates (SNPs) as columns. In all of our simulation studies, we divide the full data set of 150 samples into training and validation sets of sizes 100 and 50, respectively.
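A minimal sketch of this kind of simulation is shown below. The function name, the orientation of the coefficient matrix (covariates as rows, responses as columns), and the additive Gaussian noise are assumptions made for illustration; the dimensions used in the paper are not reproduced here.

```python
import random

def simulate(n_samples, n_snps, B, noise_sd=1.0, seed=0):
    """Draw genotypes uniformly from {0, 1, 2} and form responses Y = X B + noise.

    B is an n_snps x n_responses list of rows holding the true coefficients.
    The Gaussian noise model and its scale are assumptions of this sketch.
    """
    rng = random.Random(seed)
    n_responses = len(B[0])
    X = [[rng.choice((0, 1, 2)) for _ in range(n_snps)] for _ in range(n_samples)]
    Y = [
        [
            sum(X[i][j] * B[j][k] for j in range(n_snps)) + rng.gauss(0.0, noise_sd)
            for k in range(n_responses)
        ]
        for i in range(n_samples)
    ]
    return X, Y
```

Setting the same rows of `B` to nonzero values for several responses mimics a group of responses in the tree sharing common relevant covariates.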

Figure 3: An example of regression coefficients estimated from a simulated data set. (a): Hierarchical clustering tree of four levels over responses. Only the top three levels are shown to avoid clutter. (b): True regression coefficients. Estimated parameters are shown for (c): lasso, (d): $L_1/L_2$-regularized multi-task regression, and (e): tree lasso. The rows represent responses and the columns covariates.

To illustrate the behavior of the different methods, we fit the lasso, the $L_1/L_2$-regularized multi-task regression, and our method to a single data set simulated with the nonzero elements of the regression coefficient matrix set to 0.4, and show the results in Figure 3(c)–(e), respectively. Since the lasso does not have any mechanism to borrow statistical strength across different responses, false positives for nonzero regression coefficients are distributed randomly across the matrix in Figure 3(c). On the other hand, the $L_1/L_2$-regularization method blindly combines information across all responses regardless of the correlation structure. As a result, once a covariate is selected as relevant for a response, it gets selected for all of the other responses, and we observe vertical stripes of nonzero values in Figure 3(d). When the hierarchical clustering structure in Figure 3(a) is available as prior knowledge, it is visually clear from Figure 3(e) that our method is able to suppress false positives, and to recover the true relevant covariates for correlated responses significantly better than the other methods.

Figure 4: Comparison of various sparse regression methods on simulated data sets. (a): ROC curves for the recovery of true relevant covariates. (b): Prediction errors. In simulation, 0.2 is used for the nonzero elements of the true regression coefficient matrix. Results are averaged over 50 simulated data sets.

In order to systematically evaluate the performance of the different methods, we generate 50 simulated data sets, and show in Figure 4(a) receiver operating characteristic (ROC) curves for the recovery of the true nonzero elements in the regression coefficient matrix, averaged over these 50 data sets. Figure 4(a) represents results from data sets with the true nonzero elements of the regression coefficient matrix set to 0.2. Additional results for true nonzero elements set to 0.4 and 0.6 are available in Online Appendix Figures 1A and 1B [Kim and Xing (2012)]. Our method clearly outperforms the lasso and the $L_1/L_2$-regularized multi-task regression. Especially when the signal-to-noise ratio is low, as in Figure 4(a), the advantage of incorporating the prior knowledge of the tree as a correlation structure over responses is significant.

We compare the performance of the different methods in terms of prediction errors, using an additional 50 samples as test data. The prediction errors averaged over 50 simulated data sets are shown in Figure 4(b) for data sets generated with 0.2 for the true nonzero elements of the regression coefficients. Additional results for data sets generated with 0.4 and 0.6 for the true nonzero elements of the regression coefficients are shown in Online Appendix Figures 2A and 2B, respectively. In addition to the results from sparse regression methods, we include the prediction errors from the null model that has only an intercept term. We find that our method, shown as “T” in Figure 4(b), has lower prediction errors than all of the other methods. In the tree lasso, in addition to directly using the true tree structure in Figure 3(a), we also consider the scenario in which the true tree structure is not known a priori. In this case, we learn a tree by running a hierarchical agglomerative clustering algorithm on the correlation matrix of the response measurements, and use this tree along with the weights ’s associated with each internal node in our method. Since the tree obtained in this manner represents a noisy realization of the true underlying tree structure, we discard the nodes for weak correlation near the root of the tree by thresholding the normalized ’s at 0.9 and 0.7, and show the prediction errors obtained from these thresholded trees as “T0.9” and “T0.7” in Figure 4(b). Even when the true tree structure is not available, our method is able to benefit from taking into account the correlation structure among responses, and gives lower prediction errors. We performed the same experiment while varying the threshold in the range [0.6, 1.0], and obtained similar prediction errors across the different threshold values (results not shown). This shows that the meaningful clustering information that the tree lasso takes advantage of lies mostly in the tight clusters at the lower levels of a tree rather than in the clusters of loosely related variables near the root of the tree.

4.2 Analysis of yeast data

We analyze the yeast eQTL data set of genotype and gene-expression data for 114 yeast strains [Zhu et al. (2008)] using various sparse regression methods. We focus on chromosome 3 with 21 SNPs and expression levels of 3,684 genes, after removing those genes whose expression levels are missing in more than 5% of the samples. Although it is widely known that genes are organized into functional modules within which gene-expression levels are often correlated, the hierarchical module structure over correlated genes is not directly available as prior knowledge, and we learn the tree by running the hierarchical agglomerative clustering algorithm on gene-expression data. We use only the internal nodes with heights or in our method. The goal of the analysis is to search for SNPs (covariates) whose variation induces a significant variation in the gene-expression levels (responses) over different strains. By applying our method, which incorporates information on gene modules at multiple granularities in the hierarchical clustering tree, we expect to be able to identify SNPs that influence the activity of a group of genes that are co-expressed or co-regulated.

Figure 5: Results for the yeast eQTL data set. (a): Correlation matrix of the gene expression data, where rows and columns are reordered after applying hierarchical agglomerative clustering. Estimated regression coefficients are shown for (b): lasso, (c): $L_1/L_2$-regularized multi-task regression, (d): tree lasso with , and (e): with . In panels (b)–(e), the rows represent genes (responses) and the columns SNPs (covariates).

In Figure 5(a), we show the correlation matrix of the gene expressions after reordering the rows and columns according to the results of the hierarchical agglomerative clustering algorithm. The estimated regression coefficients are shown for the lasso, the $L_1/L_2$-regularized multi-task regression, and our method with and 0.7 in Figure 5(b)–(e), respectively, where the rows represent genes and the columns SNPs. The regularization parameter is chosen based on prediction errors on a validation set of size 10. The lasso estimates in Figure 5(b) are extremely sparse and do not reveal any interesting structure in SNP-gene relationships. We believe that the association signals are very weak, as is typically the case in eQTL studies, and that the lasso is unable to detect such weak signals without combining statistical strength across multiple genes with correlated expressions. The estimates from the $L_1/L_2$-regularized multi-task regression are not sparse across gene expressions, and tend to form vertical stripes of nonzero regression coefficients, as can be seen in Figure 5(c). On the other hand, our method in Figure 5(d)–(e) reveals clear groupings in the patterns of associations between gene expressions and SNPs. In addition, as shown in Figure 6, our method performs significantly better in terms of prediction errors on the test set obtained from the 10-fold cross-validation.

Figure 6: Prediction errors for the yeast eQTL data set.

Given the estimated regression coefficients in Figure 5, we look for an enrichment of gene ontology (GO) categories among the genes with nonzero estimated regression coefficients for each SNP. A group of genes that form a module often participate in the same pathway, leading to an enrichment of a GO category among the members of the module. Since we are interested in identifying SNPs influencing gene modules, and our method encourages this joint association through the hierarchical clustering tree, we hypothesize that our method would reveal more significant GO enrichments in the estimated nonzero coefficients. Given the tree-lasso estimate, we search for GO enrichment in the set of genes that have nonzero regression coefficients for each SNP. On the other hand, the estimates of the $L_1/L_2$-regularized method are not sparse across genes. Thus, we threshold the absolute values of the estimated regression coefficients at 0.005, 0.01, 0.03, and 0.05, and perform GO enrichment analysis for only those genes whose absolute coefficient values are above the threshold.
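The text does not say which statistical test was used for the enrichment analysis. As a hedged illustration, the sketch below uses a one-sided hypergeometric tail probability, a common choice for GO enrichment; the function name and arguments are hypothetical.

```python
from math import comb

def hypergeom_enrichment_pvalue(n_genome, n_annotated, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn without
    replacement from n_genome genes, n_annotated of which carry the GO term.
    """
    upper = min(n_annotated, n_selected)
    total = comb(n_genome, n_selected)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the tail.
    tail = sum(
        comb(n_annotated, k) * comb(n_genome - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

Here `n_selected` would be the number of genes with nonzero (or above-threshold) coefficients for a given SNP, and `n_overlap` the number of those genes annotated with the GO term under test.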

In Figure 7, we show the number of SNPs with significant enrichments at different p-value cutoffs for subcategories within each of the three broad GO categories: biological processes, molecular functions, and cellular components. For example, within biological processes, SNPs were found to be enriched for GO terms such as mitochondrial translation, amino acid biosynthetic process, and organic acid metabolism. Regardless of the thresholds for selecting significant associations in the estimates from the $L_1/L_2$-regularized multi-task regression, our method generally finds more significant enrichment. Although, due to the lack of ground-truth information, the results in Figure 7 do not directly demonstrate that our method led to more significant findings than other methods, they provide evidence that our method was successful in finding SNPs with pleiotropic effects that influence gene modules, rather than focusing on identifying SNPs that affect individual genes as in the lasso.

Figure 7: Enrichment of GO categories for genes whose expression levels are influenced by the same SNP based on the regression coefficients estimated from the yeast eQTL data set. The number of SNPs with significant enrichment is shown for GO categories within (a): biological process, (b): molecular function, and (c): cellular component.
| SNP loc. in Chr3 | Module size | GO category (overlap/#genes) | p-value | Reported enrichment [Zhu et al. (2008)] |
|---|---|---|---|---|
| 64,300 | 203 | BP: Amino acid biosynthetic process | | |
| 75,000 | 167 | BP: Amino acid biosynthetic process | | BP: Organic acid metabolism () |
| | | BP: Organic acid metabolism | | |
| | | MF: Transferase activity | | |
| 76,100 | 186 | MF: Catalytic activity | | |
| 79,000 | 167 | BP: Amino acid biosynthetic process | | |
| | | MF: Catalytic activity | | |
| 86,000 | 103 | BP: Amino acid biosynthetic process | | |
| | | MF: Oxidoreductase activity | | |
| 100,200 | 68 | BP: Amino acid biosynthetic process | | |
| 105,000 | 168 | BP: Amino acid biosynthetic process | | |
| | | MF: Transferase activity | | |
| 175,800 | 89 | BP: Amino acid biosynthetic process | | |
| | | MF: Catalytic activity | | |
| 210,700 | 23 | BP: Branched chain family amino acid biosynthetic process | | BP: Response to chemical stimulus () |
| | | BP: Response to pheromone | | |
| 228,100 | 195 | BP: Mitochondrial translation | | |
| | | CC: Mitochondrial part | | |
| | | MF: Hydrogen ion transporting ATP synthase activity, rotational mechanism | | |
| 240,300 | 258 | CC: Cytosolic ribosome | | |
| | | MF: Structural constituent of ribosome | | |
| 240,300 | 40 | BP: Generation of precursor metabolites and energy | | |
| | | CC: Mitochondrial inner membrane | | |
| | | MF: Transmembrane transporter activity | | |
| 301,400 | 274 | MF: snoRNA binding | | |
Table 1: Enriched GO categories for genes whose expression levels are influenced by the same SNP in the yeast eQTL data set. The results in columns 1–4 are based on the tree-lasso estimate of regression coefficients. The last column shows the enriched GO categories reported in Zhu et al. (2008) (BP: biological processes, MF: molecular functions, CC: cellular components)

Table 1 lists the enriched GO categories (-) for SNPs and the groups of genes whose expression levels are affected by the given SNP based on the tree-lasso estimate of association strengths. For comparison, in the last column of Table 1, we include the enriched GO categories for roughly similar genomic locations that have been previously reported in Zhu et al. (2008) using the conventional single-SNP/single-gene statistical test for association. While the tree-lasso results mostly recover the previously-reported GO enrichments, we find many additional enrichments that are statistically significant. This observation again provides us with indirect evidence that the tree lasso can extract fine-grained information on gene modules perturbed by genetic polymorphisms.

5 Discussion

In this article we proposed a novel regularized regression approach, called the tree lasso, that identifies covariates relevant to multiple related responses jointly by leveraging the correlation structure in responses represented as a hierarchical clustering tree. We discussed how this approach can be used in eQTL analysis to learn SNPs with pleiotropic effects that influence the activities of multiple co-expressed genes. For optimization, we adopted the smoothing proximal gradient approach that was originally developed for a general class of structured-sparsity-inducing penalties, as the tree-lasso penalty can be viewed as a special case. Our results on both the simulated and yeast data sets showed a clear advantage of the tree lasso in increasing the power of detecting weak signals and reducing false positives.


The balanced weighting scheme of tree lasso and additional experimental results (DOI: 10.1214/12-AOAS549SUPP; .pdf). We prove that the weighting scheme of the tree-lasso penalty achieves a balanced penalization of all regression coefficients. We also provide additional experimental results on the comparison of the tree lasso with other sparse regression methods using simulated data sets.


  • Beck, A. and Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2 183–202. doi:10.1137/080716542
  • Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge Univ. Press, Cambridge.
  • Chen, Y., Zhu, J., Lum, P. K., Yang, X., Pinto, S., MacNeil, D. J., Zhang, C., Lamb, J., Edwards, S., Sieberts, S. K. et al. (2008). Variations in DNA elucidate molecular networks that cause disease. Nature 452 429–435.
  • Chen, X., Lin, Q., Kim, S., Carbonell, J. and Xing, E. P. (2011). Smoothing proximal gradient method for general structured sparse learning. In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI) 105–114. AUAI Press, Corvallis, OR.
  • Cheung, V., Spielman, R., Ewens, K., Weber, T., Morley, M. and Burdick, J. (2005). Mapping determinants of human gene expression by regional and genome-wide association. Nature 437 1365–1369.
  • Emilsson, V., Thorleifsson, G., Zhang, B., Leonardson, A. S., Zink, F., Zhu, J., Carlson, S., Helgason, A., Walters, G. B., Gunnarsdottir, S. et al. (2008). Genetics of gene expression and its effect on disease. Nature 452 423–428.
  • Friedman, J., Hastie, T. and Tibshirani, R. (2010). A note on the group lasso and a sparse group lasso. Technical report, Dept. Statistics, Stanford Univ., Stanford, CA.
  • Friedman, J., Hastie, T., Höfling, H. and Tibshirani, R. (2007). Pathwise coordinate optimization. Ann. Appl. Stat. 1 302–332. doi:10.1214/07-AOAS131
  • Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D. and Lander, E. S. (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286 531–537.
  • Hastie, T., Tibshirani, R., Botstein, D. and Brown, P. (2001). Supervised harvesting of expression trees. Genome Biol. 2 0003.1–0003.12.
  • Jacob, L., Obozinski, G. and Vert, J. (2009). Group lasso with overlap and graph lasso. In Proceedings of the 26th International Conference on Machine Learning. ACM, New York.
  • Jenatton, R., Audibert, J. and Bach, F. (2009). Structured variable selection with sparsity-inducing norms. Technical report, INRIA.
  • Kim, S. and Xing, E. P. (2009). Statistical estimation of correlated genome associations to a quantitative trait network. PLoS Genetics 5 e1000587.
  • Kim, S. and Xing, E. P. (2012). Supplement to “Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping.” doi:10.1214/12-AOAS549SUPP
  • Lee, S. I., Pe’er, D., Dudley, A., Church, G. and Koller, D. (2006). Identifying regulatory mechanisms using individual variation reveals key role for chromatin modification. Proc. Natl. Acad. Sci. USA 103 14062–14067.
  • Obozinski, G., Taskar, B. and Jordan, M. I. (2010). Joint covariate selection and joint subspace selection for multiple classification problems. Stat. Comput. 20 231–252. doi:10.1007/s11222-008-9111-x
  • Obozinski, G., Wainwright, M. J. and Jordan, M. I. (2008). High-dimensional union support recovery in multivariate regression. In Advances in Neural Information Processing Systems 21. MIT Press, Cambridge, MA.
  • Pujana, M. A., Han, J. J., Starita, L. M., Stevens, K. N., Tewari, M., Ahn, J. S., Rennert, G., Moreno, V., Kirchhoff, T., Gold, B. et al. (2007). Network modeling links breast cancer susceptibility and centrosome dysfunction. Nature Genetics 39 1338–1349.
  • Segal, E., Shapira, M., Regev, A., Pe’er, D., Botstein, D., Koller, D. and Friedman, N. (2003). Module networks: Identifying regulatory modules and their condition-specific regulators from gene expression data. Nature Genetics 34 166–178.
  • Sørlie, T., Perou, C. M., Tibshirani, R., Aas, T., Geisler, S., Johnsen, H., Hastie, T., Eisen, M. B., van de Rijn, M., Jeffrey, S. S., Thorsen, T., Quist, H., Matese, J. C., Brown, P. O., Botstein, D., Lønning, P. E. and Børresen-Dale, A. (2001). Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc. Natl. Acad. Sci. USA 98 10869–10874.
  • Stranger, B., Forrest, M., Clark, A., Minichiello, M., Deutsch, S., Lyle, R., Hunt, S., Kahl, B., Antonarakis, S., Tavare, S. et al. (2005). Genome-wide associations of gene expression variation in humans. PLoS Genetics 1 695–704.
  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
  • Wu, T. T., Chen, Y. F., Hastie, T., Sobel, E. and Lange, K. (2009). Genome-wide association analysis by lasso penalized logistic regression. Bioinformatics 25 714–721.
  • Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 68 49–67. doi:10.1111/j.1467-9868.2005.00532.x
  • Yuan, X. and Yan, S. (2010). Visual classification with multi-task joint sparse representation. In Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society Press, Los Alamitos, CA.
  • Zhang, Y. (2010). Multi-task active learning with output constraints. In Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI). AAAI Press, Menlo Park, CA.
  • Zhang, B. and Horvath, S. (2005). A general framework for weighted gene co-expression network analysis. Stat. Appl. Genet. Mol. Biol. 4 Art. 17, 45 pp. (electronic). doi:10.2202/1544-6115.1128
  • Zhao, P., Rocha, G. and Yu, B. (2009). The composite absolute penalties family for grouped and hierarchical variable selection. Ann. Statist. 37 3468–3497. doi:10.1214/07-AOS584
  • Zhou, Y., Jin, R. and Hoi, S. C. H. (2010). Exclusive lasso for multi-task feature selection. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS). JMLR W&CP.
  • Zhu, J., Zhang, B., Smith, E. N., Drees, B., Brem, R. B., Kruglyak, L., Bumgarner, R. E. and Schadt, E. E. (2008). Integrating large-scale functional genomic data to dissect the complexity of yeast regulatory networks. Nature Genetics 40 854–861.
  • Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 67 301–320. doi:10.1111/j.1467-9868.2005.00503.x