Christof Seiler    [    [    [

Mass cytometry technology enables the simultaneous measurement of over 40 proteins on single cells. This has helped immunologists to increase their understanding of heterogeneity, complexity, and lineage relationships of white blood cells. Current statistical methods often collapse the rich single-cell data into summary statistics before proceeding with downstream analysis, discarding the information in these multivariate datasets. In this article, our aim is to exhibit the use of statistical analyses on the raw, uncompressed data thus improving replicability, and exposing multivariate patterns and their associated uncertainty profiles. We show that multivariate generative models are a valid alternative to univariate hypothesis testing. We propose two models: a multivariate Poisson log-normal mixed model and a logistic linear mixed model. We show that these models are complementary and that either model can account for different confounders. We use Hamiltonian Monte Carlo to provide Bayesian uncertainty quantification. Our models applied to a recent pregnancy study successfully reproduce key findings while quantifying increased overall protein-to-protein correlations between first and third trimester.

Uncertainty Quantification in Multivariate Mixed Models]Uncertainty Quantification in Multivariate Mixed Models for Mass Cytometry Data ]Lisa M. Kronstad, Laura J. Simpson, Mathieu Le Gars, Elena Vendrame ]Catherine A. Blish Seiler et al.]Susan Holmes

1 Introduction

High dimensional flow and mass cytometry enable the simultaneous detection of surface and intracellular proteins at the single cell level. Flow cytometry uses antibodies tagged with fluorochromes and can measure more than 20 proteins on thousands of cells per second (Saeys et al., 2016). Mass cytometry uses heavy metal isotopes and provides measurements for more than 40 proteins on hundreds of cells per second (Bendall et al., 2011). Immunologists use this new technology to discover new cell types and to study how cells covary across experimental conditions.

There are two main statistical tasks involved in analyzing cytometry data. First, cell types are assembled into latent clusters in a multivariate marker space spanned by Cluster of Differentiation (CD) markers (Chan et al., 1988); these clusters are labeled according to known cell types. Standard practice is to separate cells sequentially in a hierarchical dichotomy given by positive or negative values along one of the CDs; this is referred to as gating. The sequential approach is helpful in overcoming the curse of dimensionality—the phenomenon where in dimension four or higher the data becomes very sparse and is concentrated at the boundary of the space (Orlova et al., 2018; Holmes and Huber, 2019). Gating also facilitates biological interpretability. Sequential manual gating is challenging when markers are correlated or higher order interactions of more than three markers are necessary to define a cell type. In such cases, simultaneous clustering for mass cytometry methods (Nowicka et al., 2017) have gained popularity. Many clustering methods are readily available (Lo et al., 2009; Finak et al., 2009; Qian et al., 2010; Zare et al., 2010; Aghaeepour et al., 2011; Qiu et al., 2011; Ge and Sealfon, 2012; Shekhar et al., 2014; Becher et al., 2014; Naim et al., 2014; Meehan et al., 2014; Van Gassen et al., 2015; Sörensen et al., 2015; Levine et al., 2015; Chen et al., 2016; Samusik et al., 2016). Weber and Robinson (2016) provide a benchmark study comparing many of these approaches. To improve interpretability, most researchers use a semi-supervised approach and post-process their clustering results by mapping them to known cell types; e.g. (Spitzer et al., 2015).

The second statistical task is the differential expression analysis between experimental conditions and across cell types. Nowicka et al. (2017) propose summarizing protein expressions by taking the median for each cell type. They then use linear models with random effect terms associated both to the donors and the samples and compare the median expression along the different experimental conditions. The statistical power of this approach depends on the number of cell types and samples. There have been efforts to combine both the clustering and differential analysis tasks into one framework. Weber et al. (2018) extend Nowicka et al. (2017) to a larger number of clusters using empirical Bayes moderated tests adapted from transcriptomics. Bruggner et al. (2014) couple hierarchical clustering and penalized regression to simultaneously find cell types and identify clusters that differentiate between cell types. Their approach divides markers into two types; those that define cell types, and those whose expression levels change in response to stimuli. Arvaniti and Claassen (2017) abandon the division in cell phenotype and functional markers and use convolutional neural networks to both detect cell types and predict experimental condition, although interpreting the learned network remains a challenge. Finally, Lun et al. (2017) select differentially abundant clusters between conditions following the spatial false discovery rates literature.

Currently, none of the aforementioned approaches explicitly models marker correlations. Including correlations in the model can have positive effects: it makes the statistical procedure more efficient, it exposes additional structure with which to interpret results, and it provides information as to eventual confounders that need to be attended to.

In this article, we model marker correlations with two mixed models designed specifically for mass cytometry data. We present two complementary approaches. First, a multivariate Poisson log-normal model (Aitchison and Ho, 1989; Chib and Winkelmann, 2001) that can handle complicated experimental designs, and guards against colliding confounders. Second, a logistic mixed effect model that guards against pipe confounders. Rather than recommending one model, we provide evidence that best results are obtained when we use both models.

In Section 2, we describe the publicly available pregnancy data we use throughout this paper. In Section 3, we define the two statistical models and introduce the colliding and pipe confounders. In Section 4, we show the results of applying our models to the pregnancy data using our two new R packages CytoGLMM and cytoeffect. In Section 5, we relate our results to previously reported findings, and comment on possible drawbacks and future challenges.

2 Data: Pregnancy Study

We reanalyze a recently published dataset studying the maternal immune system during pregnancy (Aghaeepour et al., 2017). The study provides a rich mass cytometry dataset analyzed at four time points during pregnancy in two cohorts. The authors isolated cells from blood samples and stimulated them with several activation factors. The goal was to explain how immune cells react to these stimuli, and how these reactions change throughout pregnancy. Findings from such experiments might identify immunological deviations implicated in pregnancy-related pathologies.

The authors collected data at early, mid, late pregnancy, and six weeks postpartum. They kept one sample per donor as an unstimulated control, and stimulated the other samples with a panel of receptor-specific ligands: interferon-2A (IFN), lipopolysaccharide, and a cocktail of interleukins. They processed the samples on a CyTOF 2.0 mass cytometer instrument, and bead normalized the data to account for signal variation over time from changes in instrument performance (Finck et al., 2013).

Figure 1: Data structure of stimulated and unstimulated cells from two donors after gating to T, B, and NK cell types.

In our analysis, we focus on comparing early (first trimester) with late (third trimester) pregnancy samples stimulated with IFN in the first cohort of 18 women. We gate cells into cell types and organize them in a data frame (Figure 1). We implement this step in our CytoGLMM workflow in the supplementary material.

3 Models

3.1 Two Models, Two Interpretations

In many mass cytometry studies the goal is to explore protein modulation under different stimuli. The underlying mechanisms are often unknown. Models can account for possible confounders and help to guard against spurious correlations. Here we present two regression models: a Poisson Log-normal Mixed Model (PLMM) and a Logistic Linear Mixed Model (LLMM). The PLMM explains marker expressions from experimental conditions. The LLMM does the reverse and “predicts” experimental conditions from marker expressions.

Figure 2: Possible causal graphs: no confounders, pipe, and collider. X stands for experimental condition. Y1 and Y2 stand for protein marker expressions.

We use insights from causal inference (Spirtes et al., 2000; Rothman et al., 2008; Pearl, 2009; Imbens and Rubin, 2015; Peters et al., 2017; Hernán and Robins, 2019) to simultaneously interpret results from both models. We distinguish between a pipe and collider confounder. An intuitive way to visualize these concepts is through Directed Acyclic Graphs (DAGs). Assume that X encodes whether a women is in her first or third trimester, and that Y1 and Y2 are two marker expressions. A possible relationship between X and (Y1, Y2) can be a correlation due to a common dependence of expression on the trimester (Figure 2: No Confounders). In such a case both models report similar results. Another possible structure is that trimester causes a change in protein Y2, and Y2 causes a change in Y1 (Figure 2: Pipe). This is sometimes referred to as the pipe confounder, and can be accounted for through a multiple regression model such as our LLMM. In contrast, our PLMM cannot uncover such a relationship and finds that both Y1 and Y2 are correlated with X. In our third construction, we combine both of the previous DAGs. This means that both Y2 and X cause a change in Y1, and X causes a change in Y2 (Figure 2: Collider). This is sometimes referred to as the collider confounder. The danger of colliders is that by conditioning on Y1, as in our LLMM, it can induce spurious negative correlations between X and Y2. In such cases it is more meaningful to use our PLMM (McElreath, 2019).

In our supplementary material we provide simulation results using simplified versions of LLMM and PLMM that demonstrate these effects. Of course, the problem with real data is that we usually have no information about the DAG. In the absence of a known biological mechanism, we therefore recommend fitting both models and giving a combined interpretation of them. In the next two subsection, we present the full models.

3.2 Poisson Log-Normal Mixed Model

We now extend the Poisson log-normal mixed model introduced by Chib and Winkelmann (2001). We begin by assuming that the marker counts in the th cell and th protein marker are distributed according to a Poisson distribution with rate . Each cell and protein marker has its own rate parameter following a linear model on the scale. The first term of the linear model are fixed effects in the index notation (McElreath, 2019): and are cells from the first and third trimester, respectively. The second term models a cell-level random effect. This approach can account for correlations between count variables and for cell-to-cell variability. We use the index notation to define multiple cell-level random effects; one per experimental condition. The third term models a donor-level random effect. Formally,

Random effect terms , and are vectors of length equal to the number of proteins ; in our reanalysis of the pregnancy study . In other studies with less gating markers and more functional markers, these vectors can be to dimensional. We have a separate cell-level random effect for first and third trimester. We model all three random effects using independent multivariate normal distributions. Formally,

We choose a weakly informative prior on regression coefficients to rule out highly unlikely cell counts. Within one standard deviation, our prior expects to see a count between and . We might be able to make much stronger assumptions for specific experiments. In most cases, the high number of cells will overwhelm this prior. Formally,

We choose to use weak priors for the random effects terms. The half-Cauchy for standard deviations is commonly used (Gelman et al., 2006) as a prior that favors smaller variances and still allows large variances due to its heavy tails. We choose the Lewandowskia, Kurowick, and Joe (LKJ) prior (Lewandowski et al., 2009) with parameter for correlation matrices . This is a uniform prior over correlation matrices. We expect to see quite large correlations and believe that a more flexible prior on the correlation matrices is warranted. In higher dimensions the LKJ will tend to concentrate its mass around zero correlations (see simulations in Seiler and Holmes (2017)). Formally,

Figure 3: A graphical representation of PLMM.

Figure 3 displays the PLMM schematically.

To summarize, we extended the Poisson log-normal mixed model by adding separate random effect terms for each experimental condition capturing differences in correlation structure between conditions. We also add a donor random effect term to handle the large donor-specific variability usually encountered in mass cytometry data from human subjects. The random effect terms induce marker correlations in the posterior distribution. The LKJ prior is an unrestricted prior and thus induces an unrestricted posterior distribution over correlation matrices. Additionally, the standard deviations can account for overdispersion similar to the negative binomial distribution in the univariate case.

3.2.1 Posterior Inference

The parameters of interest are

and we need to sample from the posterior distribution given the data . The posterior distribution is not analytically tractable. We therefore resort to Markov chain Monte Carlo simulations. We code the model in Stan (Carpenter et al., 2017) and run Hamiltonian Monte Carlo (Neal et al., 2011) with 8 chains in parallel. We assess convergence using traceplots; see R markdown reports in the supplementary material for details. To speed up convergence, we initialize the HMC sampler with maximum likelihood estimates from independent Poisson regressions for the fixed effects, and sample standard deviations and correlation matrices for hyperparameters of the random effects.

3.2.2 Goodness of Fit

Figure 4: Goodness of fit using marginal posterior predictive checks. Cell Subset A: Percentage of cells that have larger than median expression of pSTAT1, pSTAT3, and pSTAT5. Cell Subset B: Percentage of cells that have larger than median expression of pSTAT1, and lower than median expression of pSTAT3 and pSTAT5. Cell Subset C: Percentage of cells that have zero expression of pERK1/2 and larger than median expression of pMAPKAPK2. Cell Subset D: Percentage of cells that have nonzero expression of pERK1/2 and larger than median expression of pMAPKAPK2. The medians are taken over both conditions.

We evaluate the fit of our model using four test statistics that measure model adequacy. For the first test statistic, we calculate the overall median marker expression for pSTAT1, pSTAT3, and pSTAT5. We use the median to define dim (less than the median) and bright (greater than median) cells. We calculate the percentage of bright pSTAT1, pSTAT3, and pSTAT5 cells. For the second test statistic, we calculate the percentage of pSTAT1 bright, pSTAT3 dim, and pSTAT5 dim cells. For the third and fourth test, we calculate the percentage of non-expressed and expressed pERK1/2 and bright pMAPKAPK2 cells, respectively. Figure 4 shows the comparison of the posterior predictive distribution of this test statistic overlayed with the actual observed test statistic as dashed lines. Observed test statistics in the bulk of the distribution indicate a good model fit.

3.2.3 Posterior Summary

We compare pairwise marker correlations and assess if they increase from first to third trimester. We define the posterior probability of larger correlation between marker and from posterior samples as ( is an indicator function)

3.3 Logistic Linear Mixed Model

Now, we change our perspective and predict the experimental condition from protein marker expressions. We use experimental conditions as response variables (transformed; see Section 3.3.2) and marker expression as explanatory variables. The response variable is a binary variable encoding experimental condition as zero or one. In our reanalysis, and are first and third trimester, respectively. We model the response variable as a Bernoulli random variable with probability for each cell. Then we use the link to relate the linear model to binary responses. The linear model predicts the logarithm of the odds of the th cell being from the third instead of the first trimester. The linear model has one coefficient per protein marker (in our reanalysis proteins markers) and an intercept. If is 0.5 then the cell could have come from either first or third trimesters with equal probability. What we observe are the protein marker expressions . For each cell we measure protein markers. If is either very close to zero or one, then the cell is strongly representative of a cell observed under the first or third trimester, respectively. The values are not observed directly, only and are observed. So, has to be estimated from the data. We add flexibility by allowing a mixed effect term in the standard logistic regression model. The covariates are the same as in the fixed effect term, the regression coefficients vary by donor. Thus allowing markers of each donor and cell type to have varying coefficients. Each cell maps to a donor indexed by . Formally,

We model coefficients using a multivariate normal distribution with covariance matrix decomposed into standard deviations and correlation matrix . Formally,

The mixed effect model is a compromise between two extremes. First, we could estimate separate regression coefficients for each donor. This corresponds to random effects modeled as a multivariate normal distribution with infinite standard deviations as there is no constrain on how coefficients are related to each other. Second, we could pool all donors into one group and forget about the donor information. This corresponds to random effects modeled as a multivariate normal distribution with zero standard deviations as there is no additional variability besides the fixed effect term. We note that modeling the additional variability is important, as we noticed that donor-to-donor variability is higher than celltype-to-celltype variability in a wide range of mass cytometry experiments.

We use the same priors as for PLMM. The interpretation of the priors on the regression coefficient are less intuitive than in PLMM. Since the prior is weak, the data will overwhelm the prior. Simulation experiments show little impact of varying the prior on . Formally,

Figure 5: A graphical representation of LLMM.

Figure 3 displays the LLMM schematically.

3.3.1 Posterior Inference

The parameters of interest are , and we sample from the posterior distribution given the data using Hamiltonian Monte Carlo as in PLMM.

We also provide a fast approximation in our CytoGLMM package using the method of moments (Perry, 2017). This corresponds to approximate improper flat priors on all parameters. We have successfully used this approach in two recent immunology studies (Kronstad et al., 2018; Le Gars et al., 2018).

3.3.2 Marker Expression Transformation

We assume that marker expressions have been transformed using variance stabilizing transformations to account for heteroskedasticity. We recommend the use of the hyperbolic sine transformation. This transformation assumes a two-component model for the measurement error (Rocke and Lorenzato, 1995; Huber et al., 2003) where small counts are less noisy than large counts. Intuitively, this corresponds to a noise model with additive and multiplicative noise depending on the magnitude of the marker expression; see Holmes and Huber (2019) for details.

4 Results

We follow the gating scheme detailed in Aghaeepour et al. (2017) and define 12 cell types using the R package openCyto (Finak et al., 2014): memory CD4 positive T cells (CD4+Tmem), naive CD4 positive T cells (CD4+Tnaive), memory CD8 positive T cells (CD8+Tmem), naive CD8 positive T cells (CD8+Tnaive), T cells (gdT), regulatory T memory cells (Tregsmem), regulatory T naive cells (Tregsnaive), B cells, classical monocytes (cMC), intermediate monocytes (intMC), non-classical monocytes (ncMC), and Natural Killer cells (NK).

Out of the 32 protein markers measured on each cell, the authors defined 22 markers as gating markers, and 10 as functional markers. The functional markers are pSTAT1, pSTAT3, pSTAT5, pNFB, total IB, pMAPKAPK2, pP38, prpS6, pERK1/2, and pCREB (in plots Greek symbols are replaced by Latin symbols). More details on the gating template is available in the CytoGLMM workflow in the supplementary material.

In the interest of expositional clarity, we focus our analysis on NK cells by filtering the data frame (Figure 1). The same analysis carries over to other cell types.

We run eight parallel Markov chains for 325 iterations and remove the first 200 iterations during the warm up. This gives a total of 1000 draws from the posterior distribution. More details on convergence diagnostics and traceplots are available in our PLMM and LLMM workflows in the supplementary material.

4.1 Poisson Log-Normal Mixed Model for NK Cells

We evaluate the fit of our model using four test statistics calculating percentages of four cell subset; see Section 3.2.2 for details. Goodness of fit plots (Figure 4) show the posterior distributions of these test statistics with dashed lines indicating the observed statistic. The cell subsets A and B are located in the bulk of the distribution indicating an adequate fit. For the cell subset C and D, our model underestimates and overestimates, respectively.

Figure 6: A: Marginal posterior median and 95% posterior credible intervals of coefficient vector for the trimester fixed effects and their differences. B: Standard deviations for the condition random effects. C-D: Marginal posterior median of correlation matrices for the condition random effects. E: Posterior probability of increase in correlations from first to third trimester. F: Pairwise posterior probability of increase in correlations from first to third trimester.

We plot the marginal posterior median and 95% credible intervals for fixed effects (Figure 6A). In addition to the parameter estimates for the first and third trimesters, we also compute the difference between third and first trimesters. Protein pSTAT1 shows the largest difference between trimesters; an increase of counts (posterior median). During the first trimester pSTAT1 is expressed between and times, and during the third trimester it increases to between and (95% credible interval). These results account for donor variability. The posterior marker standard deviations are larger for first and third trimesters than across donors (Figure 6B). Due to the small number of donors (18) relative to the number of cells per donor (10,000), the credible intervals are wider in the donor standard deviations. For the posterior marker correlation matrices , we visualize the median marginal posterior correlation (Figure 6C-D). Both during first and third trimester all markers are positively correlated. In particular, pSTAT1, pSTAT3, and pSTAT5 show a strong correlation.

Figure 7: Pairs plot of multivariate marginal posterior distribution of three markers.

Figure 7 shows the multivariate marginal joint posterior distribution of pSTAT1, pSTAT3, and pSTAT5.

The condition specific correlations are robust because of the large number of cells. We assess correlation changes from first to third trimester on the global and local scale. On the global scale, we compute the probability of larger pairwise correlations over all possible correlations. The histogram (Figure 6E) of these probabilities shows that all pairwise correlations are higher in the third trimester, with a peak around 75-80%. On the local scale, we compare each pairwise correlation. Figure 6F shows pairwise probabilities of an increase in correlations.

4.2 Logistic Linear Mixed Model for NK Cells

Figure 8: Marginal posterior median and 95% credible intervals for fixed effects. A: All proteins markers. B: Possible collider pSTAT1 removed.

We plot the marginal posterior median and 95% credible intervals for the fixed effects (Figure 8A). The estimates are on the -odds scale. We see that pSTAT1 is a strong predictor of the third trimester. This means that a unit increase in the transformed marker expression makes it between to (95% credible interval) more likely to be a cell from the third trimester, while holding the other markers constant. This result coincides with the PLMM result. In contrast, pSTAT3 and pSTAT5 have negative coefficients. This means pSTAT3 and pSTAT5 predict the first trimester, while holding the other markers constant. This result contradicts with the PLMM result. One way to resolve this contradiction is by assuming a collider confounding. The potential collider pSTAT1 would depend on trimester, pSTAT3, and pSTAT5. To test this, we refit the regression model after removing the potential collider pSTAT1. Indeed, the negative coefficients for pSTAT3 and pSTAT5 disappear without pSTAT1 in the model (Figure 8B).

5 Discussion

In this article, we propose two regression models explaining the relationship between experimental conditions and protein marker expressions. We accounted for donor-to-donor and cell-to-cell variability. We showed that additional modeling of this variability increases the amount of information we can extract from a recent pregnancy study (Aghaeepour et al., 2017). In analyzing NK cell populations in different subjects during pregnancy, we were able to corroborate an increase of pSTAT1 during the third trimester when samples were stimulated with IFN, as previously reported (Aghaeepour et al., 2017). We also presented a new way of quantifying changes in overall and pairwise protein-to-protein correlations during pregnancy, and visualized their multivariate uncertainty. We found that PLMMs and LLMMs can give different results. They agreed on pSTAT1, but reported contradictory results for pSTAT3 and pSTAT5. We resolved these contradictions by assuming a collider confounding. In general, we advocate for using multiple complementary models instead of model selection.

Our two new R packages CytoGLMM and cytoeffect are applicable to a wide range of cytometry studies. Besides comparisons on paired samples, where samples are available for the same subject under different stimuli or time points, our PLMM is applicable to unpaired samples, where samples are collected on two separate groups of individuals; e.g. vaccine studies (Sasaki et al., 2014). It is further possible to apply PLMM to more complicated experimental designs by adding more random effect terms; e.g. twin studies (Brodin et al., 2015). In contrast, our LLMM is limited to paired samples and simple experimental designs.

Our models provide key advantages over existing approaches. Nowicka et al. (2017); Weber et al. (2018); Aghaeepour et al. (2017) compress the data by summarizing by median expression per donor and condition. Such compression hurts the statistical power of the analysis and makes obtaining reliable estimates of marker correlations more difficult. Our models work on the uncompressed full data and models marker correlations explicitly, addressing both aforementioned concerns. Bruggner et al. (2014); Arvaniti and Claassen (2017) may overfit to the data because they pool donors prior to the analysis. Our models are more robust to donor-to-donor variability because we explicitly model this variability with random effects. Finally, the PLMM model can incorporate complicated experimental designs, which may be hard for an analysis using hypothesis testing (Lun et al., 2017). In summary, our reanalysis proves that this family of models avoids loss of information through data compression. In other words, adapting Don Knuth’s famous quote about premature optimization in programming: “premature summarization is the root of all evil (or at least most of it) in statistics.”

The main disadvantages of our complex modeling approach is computation cost. The fact that HMC is inherently a sequential procedure makes it challenging to obtain a speed up without sacrificing accuracy. Our PLMM takes about six days to fit 180,000 cells and 16 donors using eight CPU cores. Future work aims to incorporate recent developments in scaling up HMC (Srivastava et al., 2018) into our workflow. Another approach is to trade-off the theoretical guarantees of MCMC samplers for time efficiency by using variational inference, as recently introduced by Chiquet et al. (2018a, b) for multivariate Poisson log-normal models. As a check, we conducted the same analysis on a subsample of 1000 cells per donor. We observed that this subsampling step has minimal influence on fixed effects, marker correlation, and standard deviations, but a larger influence on the marker correlations comparison. We therefore recommend running fast exploratory analysis on smaller subsets of the data as the major source of biological variability is between subjects. For more accurate correlation comparisons, we recommend running the analysis on the full dataset.

Our second limitation is the requirement for gated cell types. To reduce the person-to-person bias of manual gating, we employed the R package openCyto (Finak et al., 2014). It is challenging to scale this approach to high dimensional gating schemes. In the future, it will be important to extend our models with a cell type clustering step (Silva et al., 2017) and propagate gating uncertainty to the regression model. This could be accomplished by using infinite mixtures from Bayesian nonparametrics to cluster cells and donors into groups of similar immunological profiles. In the univariate setting, Antonelli et al. (2016) showed that more flexible random effect distributions mitigate bias at no or only small prize in efficiency.

Our posterior model checks for PLMM indicated an adequate model fit for most markers except pERK1/2. In the cell subset C, where the pERK1/2’s are not expressed, the model predicted a smaller number of cells than in the observed data. The opposite effect occurred in the cell subset D, where the pERK1/2 are nonzero, and the model predicted a larger than observed cell count. In future work, we plan to provide an option to allow for zero-inflated marker counts in the Poisson distribution using mixtures.

We use transformed marker expression as predictors in our LLMM. To propagate uncertainty from this transformation step to the regression step, we plan to add a measurement error model to LLMM.

Donor specific random effects in the pregnancy study are noisy due to the small number of donors. We therefore avoid to interpret the marginal donor specific random effects directly. Instead we consider them nuisance parameters and average over them. We aim to apply our models to studies with larger donor samples size. This will allow us to quantify more precisely the donor-to-donor posterior correlation matrix.

Supplementary Material: Reproducibility

Data analysis and results can be completely reproduced following the vignettes on our R package websites (including installation instructions and source R markdown files):

The data is publicly available and automatically downloaded during the vignettes.


This work was supported by the National Institutes of Health [U01AI131302 to C.A.B. and S.H., R56AI124788 to S.H., R21AI130523 to S.H., DP1DA046089 to C.A.B., R21AI130532 to C.A.B., R01AI133698 to C.A.B., R21AI135287 to C.A.B., 5T32AI007290-29 to L.M.K., TL1TR001084 to E.V., T32AI007502 to E.V., 1F32AI126674 to L.J.S.]; an A.P. Giannini fellowship [to L.M.K.]; and a Stanford Child Health Research Institute postdoctoral fellowship [to M.L.G.]. C.A.B. is the Tashia and John Morgridge Endowed Faculty Scholar in Pediatric Translational Medicine from the Maternal Child Health Research Institute, and a Chan Zuckerberg Investigator.


  • Aghaeepour et al. (2017) Aghaeepour, N., Ganio, E. A., Mcilwain, D., Tsai, A. S., Tingle, M., Van Gassen, S., Gaudilliere, D. K., Baca, Q., McNeil, L., Okada, R., Ghaemi, M. S., Furman, D., Wong, R. J., Winn, V. D., Druzin, M. L., El-Sayed, Y. Y., Quaintance, C., Gibbs, R., Darmstadt, G. L., Shaw, G. M., Stevenson, D. K., Tibshirani, R., Nolan, G. P., Lewis, D. B., Angst, M. S. and Gaudilliere, B. (2017) An immune clock of human pregnancy. Science Immunology, 2, eaan2946.
  • Aghaeepour et al. (2011) Aghaeepour, N., Nikolic, R., Hoos, H. H. and Brinkman, R. R. (2011) Rapid cell population identification in flow cytometry data. Cytometry Part A, 79, 6–13.
  • Aitchison and Ho (1989) Aitchison, J. and Ho, C. (1989) The multivariate Poisson-log normal distribution. Biometrika, 76, 643–653.
  • Antonelli et al. (2016) Antonelli, J., Trippa, L. and Haneuse, S. (2016) Mitigating bias in generalized linear mixed models: The case for Bayesian nonparametrics. Statistical Science, 31, 80–95.
  • Arvaniti and Claassen (2017) Arvaniti, E. and Claassen, M. (2017) Sensitive detection of rare disease-associated cell subsets via representation learning. Nature Communications, 8, 14825.
  • Becher et al. (2014) Becher, B., Schlitzer, A., Chen, J., Mair, F., Sumatoh, H. R., Teng, K. W. W., Low, D., Ruedl, C., Riccardi-Castagnoli, P., Poidinger, M. et al. (2014) High-dimensional analysis of the murine myeloid cell system. Nature Immunology, 15, 1181.
  • Bendall et al. (2011) Bendall, S. C., Simonds, E. F., Qiu, P., El-ad, D. A., Krutzik, P. O., Finck, R., Bruggner, R. V., Melamed, R., Trejo, A., Ornatsky, O. I. et al. (2011) Single-cell mass cytometry of differential immune and drug responses across a human hematopoietic continuum. Science, 332, 687–696.
  • Brodin et al. (2015) Brodin, P., Jojic, V., Gao, T., Bhattacharya, S., Angel, C. J. L., Furman, D., Shen-Orr, S., Dekker, C. L., Swan, G. E., Butte, A. J. et al. (2015) Variation in the human immune system is largely driven by non-heritable influences. Cell, 160, 37–47.
  • Bruggner et al. (2014) Bruggner, R. V., Bodenmiller, B., Dill, D. L., Tibshirani, R. J. and Nolan, G. P. (2014) Automated identification of stratifying signatures in cellular subpopulations. Proceedings of the National Academy of Sciences, 111, E2770–E2777.
  • Carpenter et al. (2017) Carpenter, B., Gelman, A., Hoffman, M. D., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M., Guo, J., Li, P. and Riddell, A. (2017) Stan: A probabilistic programming language. Journal of Statistical Software, 76.
  • Chan et al. (1988) Chan, J., Ng, C. and Hui, P. (1988) A simple guide to the terminology and application of leucocyte monoclonal antibodies. Histopathology, 12, 461–480.
  • Chen et al. (2016) Chen, H., Lau, M. C., Wong, M. T., Newell, E. W., Poidinger, M. and Chen, J. (2016) Cytofkit: A Bioconductor package for an integrated mass cytometry data analysis pipeline. PLoS Computational Biology, 12, e1005112.
  • Chib and Winkelmann (2001) Chib, S. and Winkelmann, R. (2001) Markov Chain Monte Carlo analysis of correlated count data. Journal of Business & Economic Statistics, 19, 428–435.
  • Chiquet et al. (2018a) Chiquet, J., Mariadassou, M. and Robin, S. (2018a) Variational inference for sparse network reconstruction from count data. arXiv preprint arXiv:1806.03120.
  • Chiquet et al. (2018b) Chiquet, J., Mariadassou, M., Robin, S. et al. (2018b) Variational inference for probabilistic Poisson PCA. The Annals of Applied Statistics, 12, 2674–2698.
  • Finak et al. (2009) Finak, G., Bashashati, A., Brinkman, R. and Gottardo, R. (2009) Merging mixture components for cell population identification in flow cytometry. Advances in Bioinformatics, 2009.
  • Finak et al. (2014) Finak, G., Frelinger, J., Jiang, W., Newell, E. W., Ramey, J., Davis, M. M., Kalams, S. A., De Rosa, S. C. and Gottardo, R. (2014) Opencyto: An open source infrastructure for scalable, robust, reproducible, and automated, end-to-end flow cytometry data analysis. PLoS Computational Biology, 10, e1003806.
  • Finck et al. (2013) Finck, R., Simonds, E. F., Jager, A., Krishnaswamy, S., Sachs, K., Fantl, W., Pe’er, D., Nolan, G. P. and Bendall, S. C. (2013) Normalization of mass cytometry data with bead standards. Cytometry Part A, 83, 483–494.
  • Ge and Sealfon (2012) Ge, Y. and Sealfon, S. C. (2012) flowPeaks: A fast unsupervised clustering for flow cytometry data via K-means and density peak finding. Bioinformatics, 28, 2052–2058.
  • Gelman et al. (2006) Gelman, A. et al. (2006) Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Analysis, 1, 515–534.
  • Hernán and Robins (2019) Hernán, M. and Robins, J. (2019) Causal Inference. Boca Raton: Chapman & Hall/CRC, forthcoming.
  • Holmes and Huber (2019) Holmes, S. and Huber, W. (2019) Modern Statistics for Modern Biology. Cambridge University Press.
  • Huber et al. (2003) Huber, W., von Heydebreck, A., Sültmann, H., Poustka, A. and Vingron, M. (2003) Parameter estimation for the calibration and variance stabilization of microarray data. Statistical Applications in Genetics and Molecular Biology, 2.
  • Imbens and Rubin (2015) Imbens, G. W. and Rubin, D. B. (2015) Causal Inference in Statistics, Social, and Biomedical Sciences. Cambridge University Press.
  • Kronstad et al. (2018) Kronstad, L. M., Seiler, C., Vergara, R., Holmes, S. P. and Blish, C. A. (2018) Differential induction of IFN- and modulation of CD112 and CD54 expression govern the magnitude of NK cell IFN- response to influenza A viruses. The Journal of Immunology, 201, 2117–2131.
  • Le Gars et al. (2018) Le Gars, M., Seiler, C., Kay, A., Bayless, N., Starosvetsky, E., Moore, L., Shen-Orr, S., Aziz, N., Dekker, C., Khatri, P. et al. (2018) CD38 is a key regulator of enhanced NK cell immune responses during pregnancy through its role in immune synapse formation. bioRxiv, 349084.
  • Levine et al. (2015) Levine, J. H., Simonds, E. F., Bendall, S. C., Davis, K. L., El-ad, D. A., Tadmor, M. D., Litvin, O., Fienberg, H. G., Jager, A., Zunder, E. R. et al. (2015) Data-driven phenotypic dissection of AML reveals progenitor-like cells that correlate with prognosis. Cell, 162, 184–197.
  • Lewandowski et al. (2009) Lewandowski, D., Kurowicka, D. and Joe, H. (2009) Generating random correlation matrices based on vines and extended onion method. Journal of Multivariate Analysis, 100, 1989–2001.
  • Lo et al. (2009) Lo, K., Hahne, F., Brinkman, R. R. and Gottardo, R. (2009) flowclust: A bioconductor package for automated gating of flow cytometry data. BMC Bioinformatics, 10, 145.
  • Lun et al. (2017) Lun, A. T., Richard, A. C. and Marioni, J. C. (2017) Testing for differential abundance in mass cytometry data. Nature Methods, 14, 707.
  • McElreath (2019) McElreath, R. (2019) Statistical Rethinking. publisher unknown, draft of second edn.
  • Meehan et al. (2014) Meehan, S., Walther, G., Moore, W., Orlova, D., Meehan, C., Parks, D., Ghosn, E., Philips, M., Mitsunaga, E., Waters, J. et al. (2014) Autogate: automating analysis of flow cytometry data. Immunologic Research, 58, 218–223.
  • Naim et al. (2014) Naim, I., Datta, S., Rebhahn, J., Cavenaugh, J. S., Mosmann, T. R. and Sharma, G. (2014) SWIFT – Scalable clustering for automated identification of rare cell populations in large, high-dimensional flow cytometry datasets, Part 1: Algorithm design. Cytometry Part A, 85, 408–421.
  • Neal et al. (2011) Neal, R. M. et al. (2011) MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, 2, 2.
  • Nowicka et al. (2017) Nowicka, M., Krieg, C., Weber, L., Hartmann, F., Guglietta, S., Becher, B., Levesque, M. and Robinson, M. (2017) Cytof workflow: Differential discovery in high-throughput high-dimensional cytometry datasets [version 2; referees: 2 approved]. F1000Research, 6.
  • Orlova et al. (2018) Orlova, D. Y., Herzenberg, L. A. and Walther, G. (2018) Science not art: statistically sound methods for identifying subsets in multi-dimensional flow and mass cytometry data sets. Nature Reviews Immunology, 18, 77.
  • Pearl (2009) Pearl, J. (2009) Causality. Cambridge University Press.
  • Perry (2017) Perry, P. O. (2017) Fast moment-based estimation for hierarchical models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79, 267–291.
  • Peters et al. (2017) Peters, J., Janzing, D. and Schölkopf, B. (2017) Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press.
  • Qian et al. (2010) Qian, Y., Wei, C., Eun-Hyung Lee, F., Campbell, J., Halliley, J., Lee, J. A., Cai, J., Kong, Y. M., Sadat, E., Thomson, E. et al. (2010) Elucidation of seventeen human peripheral blood B-cell subsets and quantification of the tetanus response using a density-based method for the automated identification of cell populations in multidimensional flow cytometry data. Cytometry Part B: Clinical Cytometry, 78, S69–S82.
  • Qiu et al. (2011) Qiu, P., Simonds, E. F., Bendall, S. C., Gibbs Jr, K. D., Bruggner, R. V., Linderman, M. D., Sachs, K., Nolan, G. P. and Plevritis, S. K. (2011) Extracting a cellular hierarchy from high-dimensional cytometry data with SPADE. Nature Biotechnology, 29, 886.
  • Rocke and Lorenzato (1995) Rocke, D. M. and Lorenzato, S. (1995) A two-component model for measurement error in analytical chemistry. Technometrics, 37, 176–184.
  • Rothman et al. (2008) Rothman, K. J., Greenland, S. and Lash, T. L. (2008) Modern Epidemiology. Lippincott Williams and Wilkins.
  • Saeys et al. (2016) Saeys, Y., Van Gassen, S. and Lambrecht, B. N. (2016) Computational flow cytometry: Helping to make sense of high-dimensional immunology data. Nature Reviews Immunology, 16, 449.
  • Samusik et al. (2016) Samusik, N., Good, Z., Spitzer, M. H., Davis, K. L. and Nolan, G. P. (2016) Automated mapping of phenotype space with single-cell data. Nature Methods, 13, 493.
  • Sasaki et al. (2014) Sasaki, S., Holmes, T. H., Albrecht, R. A., García-Sastre, A., Dekker, C. L., He, X.-S. and Greenberg, H. B. (2014) Distinct cross-reactive B-cell responses to live attenuated and inactivated influenza vaccines. The Journal of infectious diseases, 210, 865–874.
  • Seiler and Holmes (2017) Seiler, C. and Holmes, S. (2017) Multivariate heteroscedasticity models for functional brain connectivity. Frontiers in Neuroscience, 11, 696.
  • Shekhar et al. (2014) Shekhar, K., Brodin, P., Davis, M. M. and Chakraborty, A. K. (2014) Automatic classification of cellular expression by nonlinear stochastic embedding (ACCENSE). Proceedings of the National Academy of Sciences, 111, 202–207.
  • Silva et al. (2017) Silva, A., Rothstein, S. J., McNicholas, P. D. and Subedi, S. (2017) A multivariate Poisson-log normal mixture model for clustering transcriptome sequencing data. arXiv preprint arXiv:1711.11190.
  • Sörensen et al. (2015) Sörensen, T., Baumgart, S., Durek, P., Grützkau, A. and Häupl, T. (2015) immunoclust – An automated analysis pipeline for the identification of immunophenotypic signatures in high-dimensional cytometric datasets. Cytometry Part A, 87, 603–615.
  • Spirtes et al. (2000) Spirtes, P., Glymour, C. N., Scheines, R., Heckerman, D., Meek, C., Cooper, G. and Richardson, T. (2000) Causation, Prediction, and Search. MIT Press.
  • Spitzer et al. (2015) Spitzer, M. H., Gherardini, P. F., Fragiadakis, G. K., Bhattacharya, N., Yuan, R. T., Hotson, A. N., Finck, R., Carmi, Y., Zunder, E. R., Fantl, W. J. et al. (2015) An interactive reference framework for modeling a dynamic immune system. Science, 349, 1259425.
  • Srivastava et al. (2018) Srivastava, S., Li, C. and Dunson, D. B. (2018) Scalable Bayes via barycenter in wasserstein space. The Journal of Machine Learning Research, 19, 312–346.
  • Van Gassen et al. (2015) Van Gassen, S., Callebaut, B., Van Helden, M. J., Lambrecht, B. N., Demeester, P., Dhaene, T. and Saeys, Y. (2015) FlowSOM: Using self-organizing maps for visualization and interpretation of cytometry data. Cytometry Part A, 87, 636–645.
  • Weber et al. (2018) Weber, L. M., Nowicka, M., Soneson, C. and Robinson, M. D. (2018) diffcyt: Differential discovery in high-dimensional cytometry via high-resolution clustering. bioRxiv. URL: https://www.biorxiv.org/content/early/2018/06/18/349738.
  • Weber and Robinson (2016) Weber, L. M. and Robinson, M. D. (2016) Comparison of clustering methods for high-dimensional single-cell flow and mass cytometry data. Cytometry Part A, 89, 1084–1096.
  • Zare et al. (2010) Zare, H., Shooshtari, P., Gupta, A. and Brinkman, R. R. (2010) Data reduction for spectral clustering to analyze high throughput flow cytometry data. BMC Bioinformatics, 11, 403.
Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters
Add comment
Loading ...
This is a comment super asjknd jkasnjk adsnkj
The feedback must be of minumum 40 characters
The feedback must be of minumum 40 characters

You are asking your first question!
How to quickly get a good answer:
  • Keep your question short and to the point
  • Check for grammar or spelling errors.
  • Phrase it like a question
Test description