Estimation and testing for multiple regulation of multivariate mixed outcomes

Estimation and testing for multiple regulation of multivariate mixed outcomes


Considerable interest has recently been focused on studying multiple phenotypes simultaneously in both epidemiological and genomic studies, either to capture the multidimensionality of complex disorders or to understand shared etiology of related disorders. We seek to identify multiple regulators or predictors that are associated with multiple outcomes when these outcomes may be measured on very different scales or composed of a mixture of continuous, binary, and not-fully-observed elements. We first propose an estimation technique to put all effects on similar scales, and we induce sparsity on the estimated effects. We provide standard asymptotic results for this estimator and show that resampling can be used to quantify uncertainty in finite samples. We finally provide a multiple testing procedure which can be geared specifically to the types of multiple regulators of interest, and we establish that, under standard regularity conditions, the familywise error rate will approach 0 as sample size diverges. Simulation results indicate that our approach can improve over unregularized methods both in reducing bias in estimation and improving power for testing.

1 Introduction

Considerable recent interest has been focused on studying multiple phenotypes simultaneously in both epidemiological and genomic studies. There are several reasons for such studies to be important. First, a complex disorder is usually associated with multiple correlated phenotypes. Hence, even when the focus of the study is on a single disease, multiple phenotypes might be needed to fully capture the complexity and multidimensionality of the disorder. Second, multiple related disorders might share the same etiology and a joint assessment will enable researchers to identify factors associated with risk of multiple diseases. As an example, recent studies have identified common genes associated with a higher risk of what were previously considered distinct autoimmune diseases [26]. Similar shared genetic bases have also been suggested for various types of cancers and related psychiatric disorders [15]. Identification of predictors of multiple outcomes, also commonly known as multiple traits in the genetics literature, can improve understanding of disease etiology, genetic regulatory pathways, and treatment. Further complicating matters, the outcome measures may be diverse: they may be binary (e.g., presence of disease), continuous (disease activity score), ordinal (severity of disease), not completely observable (perhaps due to a limit of quantification), or any combination thereof.

To address these questions statistically, we seek to assess the association between a vector of predictors and a vector of outcomes by estimating and testing all relevant effects. For each predictor we desire an estimation and testing procedure that will identify its associated subset of . In particular, researchers often want to identify predictors that are important for multiple or all outcomes. We will call a “multiple regulator” if it is associated with multiple outcomes, a terminology which we adapt from [11]. An example of what we call multiple regulation is known as pleiotropy in the genetics literature. Our goal of identifying multiple regulation is not to be confused with identifying predictors that are associated with any outcomes. Association with any outcomes is an active area of research, with two examples being global association tests and group-sparse regularization. Global tests provide a test for the relationship between and the entire set [6, 5] and have been shown in some situations to have higher power than marginal tests to detect associations when relates to multiple outcomes . Group-sparse methods, largely based on the group lasso [23], use model selection to identify predictors that are relevant for any outcome [19]. These methods, while powerful and useful, do not address the question of which outcomes are relevant for each predictor and in general are unsuited for diverse outcomes that may contain censoring.

Here, we are particularly interested in identifying predictors that are relevant for multiple outcomes and inferring which subset of each of the ’s are associated with. There is a paucity of literature that addresses these specific questions. Under linear regression models, the remMap procedure [11] addresses such a question via variable selection by jointly penalizing both the and group norms of a squared loss. Under generalized linear models, one could potentially modify the hierarchical lasso [27] procedure, originally proposed to handle grouped predictors with a single outcome, to address the multiple regulator problem. When making joint inference on a diverse set of outcomes, it is also desirable to put all effects on similar scales. A simple example of this idea can be found in [14], where linear regression models were considered for multiple continuous outcomes and each outcome was scaled by its standard deviation. However, none of these methods is applicable to settings where consists of a diverse set of outcomes whose scales may not be easily comparable to each other, especially when may contain censored time-to-event variables. To accommodate modeling of multiple outcomes of different scales and/or type, we propose in this paper the use of semiparametric transformation models which give all effects of on a similar interpretation. A liability thresholding version of such models can naturally model binary or ordinal outcomes.

Regardless of estimation technique, a multiple testing procedure is required to control error rates when identifying multiple regulation, which operates on the (potentially large) set of hypotheses unassociated with . Neither [11] nor [27] tackles this issue. In general, multiple testing based on regularized estimation is challenging for two reasons. First, while many of the regularization procedures such as [27] established asymptotic oracle properties for their estimators — non-informative predictors can be detected with no uncertainty and their detection induces no additional variation in the estimation of the informative predictors [3, 28] — in finite samples those properties may be far from holding. Consequently, basing testing procedures on such asymptotic results may lead to inflated type I error in finite samples. Second, the estimators and hence their corresponding test statistics could be highly correlated from the regression fitting. Standard methods for controlling the familywise error rate (FWER), like the Bonferroni procedure, tend to be conservative in the presence of correlation, and they ignore the dependence structure in the data.

We propose a two-stage technique to both estimate the effects of on and identify multiple regulation while controlling error rates. In the first stage, we posit models to put all effects on the same scale, and we use regularization to induce sparsity in the estimated effects. To do this, we generalize the adaptive hierarchical lasso of [27] to handle the case of semiparametric models. In the second stage, we employ a stepdown procedure analogous to [12] to identify multiple regulation while controlling error rates. Our two-stage method, entitled Sparse Multiple Regulation Testing (SMRT), is powerful for several reasons. First, our modeling strategy allows us to do estimation and make inference on outcomes that may be measured on completely different scales. Next, regularization enables us to more efficiently estimate both the null and non-null effects. The null effects are estimated as 0 with probability tending to 1 and the non-null effects are estimated with lower variability compared to unregularized estimators. Furthermore, the distributions of the estimates of null effects and the distributions of the estimates of non-null effects are distinctly separated through regularization, giving us more power to detect the non-null effects (see figure 1 for an illustration from our simulations). Finally, our testing procedure can be specifically geared to detect associations with multiple outcomes.

Figure 1: Sampling distributions of null and non-null effects, with and without regularization. Tails of the distributions are truncated for ease of presentation.

However, it is generally challenging to perform testing based on regularized estimators since their distributions in finite samples cannot be approximated well by asymptotic results. We lay out permutation- and resampling-based procedures to better approximate the finite-sample distributions of the proposed test statistics and the regression parameter estimators. This enables us to properly control error rates for both hypothesis testing and interval estimation. Thus, in addition to providing the estimator based on joint regularization, the main contributions of this paper include providing resampling procedures to make joint inference about and deriving the SMRT testing procedure to identify the subset of outcomes associated with each of the predictors. Our proposed estimation and testing procedures can account for the joint effects of the predictors and the correlation among both the predictors and the outcomes.

The rest of the paper is organized as follows. In section 2, we give an overview of SMRT. In section 3, we discuss details regarding our sparse estimator, including its asymptotic properties and quantifying its variability. In section 4, we discuss issues related to testing, including the asymptotic guarantee of familywise error control and practical approaches to finite-sample error control. In section 5, we apply our method to a genetic study of autoantibodies with the goal of identifying multiple regulators of autoimmunity. Simulation results which validate our method are provided in section 6. And finally, in section 7, we discuss implications and further directions.

2 Overview of SMRT

Suppose the data for analysis consists of independent and identically distributed random vectors where are the outcomes and are the predictors for the th subject. We first propose a unified modeling strategy for diverse by assuming that


where represents the unknown effect of on , is an unspecified smooth, increasing function, and the link function, , is given although the correlation structure of is left unspecified. For ease of presentation, we assume that is fully observed although the proposed method can easily accommodate censored outcomes. When is continuous, (1) is equivalent to


Generalized linear models for a binary or ordinal outcome can be written in the form of (1) and (2) by viewing the observed outcome as a thresholded version of a latent continuous outcome and as only defined at the threshold values, as previously suggested in the literature [17, e.g.]. Choice of determines the type of model being fit. For example, corresponds to a proportional odds model for continuous and a logistic regression model if is binary. Models (1) and (2) have also been previously used to analyze censored survival outcomes [2, 24]. The virtue of this approach is that the scale of the will be comparable across when the same or comparable are used whether is continuous, discrete, or not fully observed because each marginal model has a similar form. For example if , then each has the interpretation of a log odds ratio regardless of whether is continuous, binary, ordinal, or censored.

To estimate , one may employ the non-parametric maximum likelihood estimator (NPMLE) under model (1) [24, 10] based on data observed on the th outcome, . Let denote the resulting profile log-likelihood (PLL) function corresponding to the NPMLE. It has been shown that under mild smoothness conditions, the profile likelihood can be treated as a regular likelihood, and the maximum PLL estimator is regular and semiparametric efficient [10]. However, when is not too small and might be sparse, an improved estimator may be obtained by imposing regularization on the PLL. To do this, we simultaneously consider all outcomes and obtain a sparse as the minimizer of the penalized sum of negative PLLs


with penalty function The penalty function was previously proposed in [27] for generalized linear models with grouped predictor variables. The tuning parameter controls the amount of regularization and weight is chosen to ensure oracle properties of . Summing over the PLLs in (3) essentially imposes a working independence assumption across the outcomes [7]. Imposing the joint penalty incorporates the potential for joint sparsity across all outcomes for some ’s. Setting declares to be non-informative for all outcomes or equivalently ; while setting suggests that . We will show that possesses a sparsistency property, i.e., . This ensures desirable asymptotic properties for our testing procedures. We give further details regarding and its asymptotic properties in section 3. We now turn to the topic of testing.

2.1 Testing a single predictor

In order to make inference on a single predictor, SMRT employs a stepdown procedure for considering the hypotheses with alternative hypotheses denoted . To test , we consider the statistic and its reference distribution which approximates the distribution of and can be obtained by, for example, resampling or permutation (see section 4.2). We scale by , which is an estimated standard error of , since under , and the null distribution of is difficult to approximate.

To test simultaneously, we order the test statistics from largest to smallest, and identify their corresponding hypotheses . Define for every the sup-statistic over and its corresponding reference distribution: and . Furthermore, denote the th quantile of by , which approximates the th quantile of under the null that . We identify the subset of hypotheses to reject, denoted by , as follows.

  1. Let . If , accept all hypotheses and stop. Otherwise, let and continue. …

  2. Let . If , accept all hypotheses in and stop. Otherwise, let and continue. …

  3. Let . If , accept . Otherwise, let .

The stepdown procedure for the simultaneous testing of then rejects all hypotheses in and concludes that is associated with . If the reference distribution and are chosen such that the probability of making a type I error at each step is at most :


for any , then the FWER of the stepdown procedure – that is, the probability of making at least one false rejection over the set – is maintained at . We discuss in detail issues relating to the choice of reference distribution and in section 4. We also describe how, regardless of the choices of reference distribution and , the FWER is asymptotically 0 because is sparsistent.

2.2 Multiple regulation testing

Now suppose scientific interest lies only with a predictor if it regulates at least outcomes. That is, we only care to conclude that is associated with , for some if the number of rejections (i.e., the cardinality of ) is at least . Then we can modify the testing procedure in the previous section to increase power to detect -multiple regulators (kMRs) at the expense of being able to detect if appears to be associated with fewer than outcomes. The testing procedure proceeds by essentially skipping the first steps in the previous section and only rejecting the first hypotheses if any other hypotheses are rejected. Thus, we will either reject 0 hypotheses or or more hypotheses. Throughout, when we refer to SMRT, we mean the combination of our sparse estimation technique and our multiple regulation testing procedure for a given , with corresponding to the application of the test in the previous section.

We identify the subset of hypotheses to reject, denoted by , as follows: 1) let . If , accept all hypotheses and stop. Otherwise, let and continue; 2) let . If , accept all hypotheses in and stop. Otherwise, let and continue. Steps 3 through proceed as in the previous section.

As discussed in section 4, the stepdown test with also has asymptotic FWER of 0. In addition to requiring (4), which we will call controlling the common type I error, we also require the control of a second type of error: incorrectly rejecting one of based on correctly rejecting in step one. We will call this a type I error by implication. Since the distribution of null effects gets shrunk dramatically toward 0 (see figure 1), it is unlikely for this type of error to occur in practice because it requires a test statistic corresponding to a null hypothesis to be larger than a test statistic from a rejected alternative hypothesis. We leave discussion of controlling the FWER for all predictors to appendix D.2. The extension of the testing procedure for a single predictor is straightforward.

3 Inference about

We next detail the construction of as well as the asymptotic distribution for the zero and non-zero components, which is crucial for the validity of our estimator, confidence intervals, and proposed testing procedures. Estimation proceeds by minimizing (3). Now, since the profile log-likelihoods are non-linear functions without closed form in most cases, direct maximization of (3) may be numerically challenging, especially when is not small. To reduce the computational complexity and enable the use of widely available software, we propose to take a quadratic expansion of in (3) similar to [25] and [22]. Specifically, we instead minimize


where , , , and is a symmetric half matrix of such that . Computational simplifications and a full algorithm for fitting are discussed in appendix E.

3.1 Asymptotic Theory

In this section, we present the properties of our proposed estimator . It has the property of sparsistency in that it asymptotically sets truly null effects to exactly 0. Specifically, define and as indexing the non-zero and zero components of , respectively, where denotes the subvector of corresponding to . Then a sparsistent estimator is one that satisfies as . Furthermore, our estimates of non-null effects are asymptotically normal and possess the oracle property, in that they are as efficient in the limit as if we knew which effects were truly null a priori. Let denotes the submatrix of corresponding to rows in and columns in .

In appendix B, we show that for PLLs that satisfy certain regularity conditions (listed in appendix A), if , then there exists a root- consistent local maximizer such that and in distribution, where , denotes the contribution of the th subject to the profile score function for , , and is the limiting information matrix. This result, parallel to that given in [27], offers the promise of identifying null effects with probability approaching 1, while efficiently estimating non-null effects. From a testing perspective, it ensures that the type I error of SMRT for any decreases to 0 as .

3.2 Estimating the variability in

The asymptotic results on suggest that we are as efficient in the limit as if we knew which parameters were truly 0 from the outset. However, in finite samples the added variability due to estimating may not be negligible, and hence relying on the asymptotic result will underestimate the variability in . To better approximate the finite-sample distribution, we propose a perturbation resampling procedure to estimate the distribution of . This procedure, by accounting for the variability in estimating , provides a more precise estimate of the variability in and maintains the correlation structure in .

We generate a resampled counterpart of , denoted by , in two steps. We first generate , a resampled version of , by either perturbing the profile likelihood or directly perturbing the influence function corresponding to . In essence, each perturbation is achieved by multiplying to the likelihood contribution from the th subject, where the positive perturbation variables are generated independently with mean 1 and variance 1. Then we minimize our objective function (5) using in place of , yielding resampled estimates . Similar resampling procedures have been proposed for making inference with a wide range of standard objective functions without regularization [18, 20, e.g] and recently extended to accommodate -type regularized estimators [9]. Here, we propose such a resampling procedure to both account for the potential correlation among the outcomes and better approximate the finite-sample behavior of hierarchically regularized estimators.

In appendix C, we detail the perturbation procedure and establish its asymptotic properties, which are parallel to those for . A key feature of the resampled is that has the same limiting distribution as . Thus, to approximate the distribution of for a given dataset, we may generate a large number of s, for some suitably large . To construct a confidence interval (CI) for a specific , one may estimate the standard error of as the empirical standard error of its perturbed realizations, . A level confidence interval can then be constructed based on the normal confidence interval or alternatively the lower and upper percentiles of .

3.3 Tuning

SMRT involves a large number of minimizations and tuning parameter selections. It is thus not feasible to select using time-consuming methods such as cross-validation. We propose a modified BIC criteria: where is the minimizer of (5) corresponding to and df is the number of non-zero entries in . In small and moderate sample sizes, is much smaller than and is used here. However, when becomes large may be preferred. [22] showed that this BIC criteria (with either or ) satisfies the rate requirements for a standard adaptive LASSO type penalty. Similar arguments can be used to justify the rate for the adaptive hierarchical LASSO type penalty used in (5).

4 Testing

In this section, we show that the FWER of our testing procedure is asymptotically 0 because of the sparsistency of . We also discuss in more detail the choice of reference distribution.

4.1 Properties of SMRT

One of the main results of this paper is that, given a suitably estimated , the FWER of our stepdown procedure approaches 0 as for any regardless of the reference distribution or what quantile we use to determine the cutoff for rejection. Specifically, we show in appendix D.1 that if is sparsistent, then for every and , as , and SMRT has an asymptotic FWER of 0, for any reference distribution, , and . The result follows from showing that common type I errors and type I errors by implication both occur with probability tending to 0. With regard to common type I errors, under a given null , the test statistic is estimated at exactly 0 with probability tending to 1 and, under the composite null , tends to 0 as well. Thus, we cannot reject , regardless of the value of , and therefore common type I errors will occur with probability approaching 0 as . The other potential source of type I error occurs for when incorrectly rejecting based on correctly rejecting , . However, this sort of type I error will only occur if the test statistic for a null hypothesis (which is tending to 0) is larger in magnitude than the test statistic for an alternative hypothesis, which is of course impossible asymptotically.

While the foregoing result shows that the asymptotic behavior of SMRT is ensured by the sparsistency of , of course in finite samples choice of reference distribution and is paramount in maintaining the desired error rate. Maintaining the FWER at approximately can be ensured by choosing the reference distribution and such that the probability of making a type I error at each step of the testing procedure is maintained at approximately .

4.2 Choosing a reference distribution

As discussed in the previous section, any reference distribution will provide asymptotic control of the FWER by virtue of sparsistency of the estimator . We explore resampling- and permutation-based reference distributions. The resampling-based reference distribution is based on . Simulation results suggest that, although resampling provides good approximation to the finite-sample distribution of , it tends to over-estimate the variability of (see figure 2). As an alternative, we consider a permutation-based reference distribution with based on an estimate of from a dataset where is permuted. See appendix F for further details about the procedure and section 6 for simulation results. Numerical results suggest that,the permutation-based reference distribution does a better job of approximating the finite-sample null distribution of , as shown in figure 2.

Figure 2: Simulation-based empirical and estimated distribution of null effects. Empirical null distribution of (labelled ”Empirical”) agrees closely in the tails with the permutation-based estimate (labelled ”Permutation estimated”), while the resampling-based estimate (labelled ”Resampling estimated”) overestimates the density in the tails.

4.3 Choosing

To control the FWER at -level, it seems reasonable to choose . This ensures that, for a suitable reference distribution and large enough, , for any , which, along with negligible type I error by implication, will give approximate FWER control. In light of the fact that all type I errors are tending to probability 0, one could obtain improved power by choosing , while maintaining the level at . This is particularly important if using the resampling-based reference distribution. One could use another layer of permutation or resampling to estimate the smallest that would still maintain the level . However, that requires computing a large number of permutations/resamples for each of the members of the reference distribution, which becomes prohibitively computationally demanding quickly. Computing a suitable is a topic of future research.

5 Genetic study to identify shared autoimmune risk loci

We apply SMRT to a study of shared autoimmunity with the goal of identifying genetic markers associated with 4 autoantibodies: anti-nuclear antibodies (ANA), anti-cyclic citrullinated protein (CCP) antibodies, anti-transglutaminase (TTG) antibodies, and anti-thyroid peroxidase antibodies (TPO). These 4 autoantibodies are respectively markers for 4 autoimmune diseases (ADs): systemic lupus erythematosus (SLE), rheumatoid arthritis (RA), celiac disease, and autoimmune thyroid disease. The genetic markers consists of 67 single-nucleotide polymorphisms (SNPs) previously published as potential risk markers for these four ADs. Discovering which SNPs regulate multiple ADs can aid in understanding potential shared pathways or etiology of these diseases [26]. While it is rare for an individual to have multiple ADs, multiple autoantibodies can be present in individuals predisposed to having the multiple ADs even in the absence of the disease phenotypes. Here we consider the autoantibodies markers for subjects at higher risk for the ADs.

The study cohort includes 1265 individuals of European ancestry with RA identified through electronic medical records at Partners Healthcare [8]. Due to a limit of quantification, the antibody measurements are highly unreliable when the values are either very low or very high. A convenient approach to incorporating such limitations is by assuming a marginal proportional odds model and truncating the observations at the limit of quantification. Hence still has the interpretation of being a log odds ratio (OR).

Results for the autoantibody data are summarized in figure 3. Figure 3 (a) shows results for the sparse estimation step. In the figure, SNPs are denoted along the -axis, and outcomes are denoted along the -axis. Color of the tile indicates the OR estimate, with darker colors indicating stronger association. In order to measure the strength of association with respect to the FWER, we provide adjusted -values as the smallest such that that test would reject while controlling the FWER for the SNP at . Figure 3 (b) shows this -value for each test.

Due to the large number of hypotheses, we do not have sufficient power to detect multiple regulation while simultaneously controlling the FWER across all SNPs. Taking a less conservative view, if we control the FWER at the SNP level, five SNPs show some evidence of multiple regulation at . The two strongest associations were with rs2187668 and rs3129860. Having previously shown associations to SLE [16] and celiac [21], rs2187668 was estimated to be related to the autoantibodies for those diseases at OR = 1.45 (-value 0.005) for ANA and OR = 1.62 (-value 0.005) for TTG, as well as to CCP (OR = 0.78, -value 0.05). This SNP is in the MHC region, which is known to affect immune function. Similarly, rs3129860, also in the MHC region, which had previously shown an association to SLE [16], here demonstrated an association to ANA (OR = 1.28, -value 0.05) , CCP (OR = 1.50, -value 0.003), and TPO (OR = 1.30, -value 0.05).

(a) (b)
Figure 3: Results for autoantibody data. SNPs are listed on the -axis, and autoantibodies are listed on the -axis. (a) Sparse effect estimates. Darker colors indicate larger magnitudes, and white indicates no estimated association. (b) Adjusted -values. Darker color indicates smaller -value and more evidence against the null hypothesis of no association.

6 Simulation results

We ran simulations to assess the performance of our point and interval estimation procedures as well as SMRT. We loosely based our simulations on the autoantibody dataset, allowing the relationship between and to be specified by a proportional odds model. We considered sample sizes of 150, 250, and 500 and ran 1000 simulations for each sample size. For each simulation, 1000 resampled s were generated.

We set the number of predictors of interest to be 30 and the number of outcomes to be 4. Covariates took values in with probability where . Outcomes were generated according to the marginal proportional odds model, conditional on . We allowed correlation in , which was accomplished by first generating correlated normal random variables where is exchangeable. Then let for Gaussian distribution function , and finally where logistic. For computational simplicity, we discretized into ten levels of roughly equal sizes according to deciles. The only change when discretizing is to the number of locations at which is estimated. In practice, this is not an issue (note that we did not discretize in the data analysis), but for the purposes of simulation it was a moderate speed-up with little information loss.

The relationship between and is defined by

where is a vector of ones, and . This configuration indicates that there are eight predictors related to all four outcomes, four related to just the first three outcomes, four related to just the first two outcomes, and four related to just the first outcome. The remaining ten predictors are null, unrelated to any outcome. We also see that associations to outcomes and are weak, so we would expect there to be less power to detect those effects.

6.1 Estimation

We first demonstrate that our point and interval estimation procedures perform well in finite samples. Figure 4 (top panel) shows the average bias in and across simulations, plotted according to true effect size and sample size. The regularized exhibits much smaller bias than the unregularized for all sample sizes and effect sizes. Particularly at smaller sample sizes, regularization substantially reduces the bias in the estimator.

In figure 4 (middle panel), we plot the average bias in SE estimates obtained based on our proposed resampling procedures as well as those based on the asymptotic variance. Both the asymptotic SE estimate and the resampling-based one overestimate the variability in when , but more closely approximates . When , the asymptotic SE tends to underestimate the true variability, while approximates it well.

We examine CI coverage in the bottom panel of figure 4 and see that underestimating the SEs leads to poor 95% CI coverage levels for the normal-based CI methods, based on and . Resampling-based quantile 95% CIs have good coverage for all values of and all sample sizes. The coverage levels of asymptotic-based CIs are as low as 78% for non-zero effects and remain lower than the nominal level even when . Hence in practice, we recommend the quantile-based CIs.

Figure 4: Average performance of point estimates, standard errors, and confidence intervals across 1000 simulations at sample sizes of . All quantities are aggregated over and plotted against . Top panel: Average estimated bias in regularized and unregularized . Middle panel: Average estimated bias of estimates of , the standard error of , comparing resampling-based estimates to asymptotic estimates. Bottom panel: 95% CI coverage comparing asymptotic, resampling-based normal, and resampling-based quantile CIs.

6.2 Testing

In the following sections, we examine the performance of SMRT. We first characterize the performance of our procedure with when testing is performed for each predictor individually considering both the resampling-based and the permutation-based reference distribution. Then we consider testing with and controlling error rates for all predictors.

Resampling-based reference distribution

We briefly demonstrate the gains in power possible by using the resampling-based reference distribution. For ease of presentation, we demonstrate the performance of the testing procedure for the marginal test of with and without regularization. Results for the full stepdown procedure are similar.

Figure 5: Threshold (-axis) plotted against its associated empirical rejection rate (-axis) for the marginal test of across 1000 simulations, with color denoting the magnitude of and linetype indicating whether regularization was used. The value on the -axis represents the threshold at which the empirical type I error was controlled for the regularized test, and represents the threshold at which the empirical type I error was controlled for the unregularized test. Results for effects of the same magnitude are averaged for ease of presentation.

Figure 5 demonstrates the power gain possible when using the regularized estimator with the resampling-based reference distribution. The plot shows the threshold necessary to obtain a given rejection rate. The ideal threshold maintains the rejection rate for null effects ) at a given level, say , indicated by the vertical dashed line. That threshold that maintains the type I error for the regularized estimator, indicated by in the plot, is much lower than the threshold for the unregularized estimator, indicated by in the plot. Furthermore, the power to detect weak effects () using the regularized estimator at is 58% compared to 52% using the unregularized estimator at , while the power to detect strong effects are similar. Thus, if one could select adaptively, it appears that large power gains could be observed by using regularization. Due to its computational burden, however, we did not pursue this method further in our simulations.

Permutation-based reference distribution

We pursue a more rigorous study of SMRT using the permutation-based reference distribution mentioned in section 4.2 and described in detail in appendix F. To demonstrate the role of regularization in improving testing, we compare SMRT to an identical testing procedure based on the unregularized , named MRT. We use the permutation-based reference distribution for both SMRT and MRT and take . To demonstrate the advantages of the stepdown method, we compare to a single-step procedure, denoted as Sup, where we reject all for which where . Finally, we compare to the Bonferroni adjustment.

When controlling the FWER at for each using the basic test, SMRT and MRT performed similarly in controlling FWER. The average empirical FWER was .046, .052, and .055 at respectively for SMRT. The corresponding average FWER for MRT was 0.042, 0.049, 0.054, at those respective sample sizes. The more conservative Sup test had average FWERs of .041, .044, and .043, respectively, and the even more conservative Bonferroni .028, .026, .021.

In terms of power, SMRT dominates all other test procedures. Figure 6 depicts the power to detect non-null effects at (other sample sizes show similar relative performances, with SMRT performing relatively better as sample size decreases). Possible rejections are listed across the bottom, and results are arranged according to how many outcomes the predictor is actually associated with. The figure shows that SMRT is uniformly more powerful than MRT, Bonferroni and Sup, with the differences becoming more apparent in identifying multiple regulation.

Figure 6: Power to detect non-null effects across 1000 simulations at sample size and level . Each plot indicates how many outcomes the predictors tested are associated with. For example, the top left plot corresponds to predictors with strong association to , and weak association to , . Tests are listed on the -axis. Power is indicated on the -axis. Power estimates are aggregated over all estimates that share the same effect sizes. To take a couple of examples, the bar corresponding to ”” in the figure corresponds to power to reject , and the bar corresponding to ”” in the figure corresponds to power to reject each of simultaneously.

Results for controlling the FWER by applying SMRT to all predictors and all hypotheses were qualitatively similar. All methods maintained the nominal level of the test, and SMRT obtained higher power than MRT, Sup, and Bonferroni at all sample sizes. When we apply SMRT to each with , the average FWER for SMRT decreases to .020, .027, and .033 at . Taking or sees a further reduction to FWERs.

6.3 Superiority of joint analysis over marginal models

In this section, we demonstrate the advantages of performing a joint analysis for the detection of multiple regulation. We compare our estimator to the estimator obtained by fitting each marginal model individually using penalization, which we will denote . The joint analysis improves our ability to detect multiple regulation, with the improvement over increasing with the number of outcomes a predictor is associated with. For example, when , for the eight predictors associated with all four outcomes, the power to detect association with and increased from 57% to 67% and from 60% to 66%, respectively, when using SMRT based on the joint penalty as opposed to marginal models, with a negligible increase for and . For predictors of three outcomes, SMRT based on increased the power for association with to 61% from 57% for . For predictors of two outcomes, had 58% power in detecting effects associated with compared to 54% for . Furthermore, is much better at eliminating completely non-informative predictors by estimating all of their effects at exactly 0. Using joint estimation, eliminates null predictors completely 52% of the time, while the rate is only 23% using marginal models, when . The relative performance patterns are similar for and 250.

7 Discussion

We have proposed a framework for testing and estimation across a diverse set of outcomes, with the explicit goal of identifying predictors for multiple outcomes. This framework allows the combination of information across continuous, semi-continuous, and discrete outcomes while maintaining control of the FWER. We have extended existing sparse regression methods for identifying multiple regulation to the complex scenario when the components of may be on very different scales or not completely observed. We have proven the asymptotic properties of this estimator and shown that one can use resampling to estimate its variability. We have, finally, provided a testing framework for identifying multiple regulation and demonstrated that the properties of the estimator ensure that the testing procedure has asymptotic FWER of 0.

While we rely on the sparsistency properties of our estimator, other penalty functions could potentially accomplish similar results to the hierarchical penalty we proposed. As long as sparsistency holds and a suitable finite-sample reference distribution can be obtained, e.g. through permutation or resampling, other penalty functions could be worth exploring. For simplicity, we used a working independence assumption to combine the profile log-likelihoods of multiple outcomes. But when the outcomes are not independent, incorporating information about the covariance in can improve efficiency [7]. A further advantage to using the quadratic approximation to in (5) instead of the profile log-likelihood itself (besides computational tractability) is that we can incorporate covariance information about through the initial estimate . If the (unpenalized) initial estimate is estimated in a way that gains efficiency by taking correlation in into account, then that increase in efficiency will be propagated into our estimation of .

Due to the fine-grained nature of multiple regulation analysis and the complexity of dealing with diverse , SMRT may not be preferred in genome-wide or other very high-dimensional data where discovery is of primary importance. Rather than using SMRT to discover novel markers, we suggest using it to validate known markers. Global tests for all outcomes [6, 5] provide better power to discover unknown risk markers. Theoretically, the convergence of our estimators cannot be guaranteed jointly unless the number of predictors and outcomes is finite. Thus, we require not to be too large compared to the sample size. Practically speaking, the computational complexity of the estimation procedure grows with . A brief simulation yielded average run times of 0.4,1.7, 4.3, 11.0, 21.2, 44.9, and 343.8 seconds for and 50, respectively, at , with results being quite similar with .

Finally, we have focused on FWER as the error rate of primary interest throughout this paper, but it could be of interest in some testing situations to employ less restrictive error control, especially when the number of tests grows large and signals are weak. We could easily extend SMRT with to include more generalized error rates, such as -FWER or the false discovery proportion, as in [13], and testing when could be adapted in that direction as well.

Appendix A Appendix

For the following proofs, we put mild restrictions on the model (1) for each outcome , as described in section 3 of [10]. We reproduce the restrictions here for completeness. Since the requirements hold for each outcome, we drop the superscripts for the moment. Let be the full log-likelihood. We require that there exists a such that is twice continuously differentiable for all with first and second derivatives denoted and . Further, And must be the efficient score function. For every fixed , let be the NPMLE for . Then, for any , and Finally, suppose that there exists a neighborhood of such that is Donsker with square integrable envelope function and is Glivenko-Cantelli and bounded in .

Appendix B Proof of sparsistency and asymptotic normality

To state and prove the results, we will need some preliminaries. Our objective function can be written equivalently as solely a function of rather than as a function of and , as shown in Theorem 1 of [27]. For a fixed number of outcomes and fixed number of predictors, the objective function can be written


where . In the following we establish the sparsistency and asymptotic normality of the minimizer .

b.1 Root- consistency

We first show the root- consistency of our estimator . For PLLs

that satisfy the regularity conditions listed above, if , then there exists a local maximizer of such that

To see this, let . We will show that for a given , there exists a constant C such that . Now consider

Now, since , then , and . Furthermore, . Now, following the argument in [27], . Thus, as long as , all terms are dominated by the first term of , which is positive. And root- consistency follows.

b.2 Sparsistency

We will now show that is sparsistent: If we can show that then sparsistency follows from root- consistency and the argument in the proof of Theorem 4 in [27]. To this end, note that