# Bayesian nonparametric hierarchical modeling for multiple membership data in grouped attendance interventions\thanksrefT1

###### Abstract

We develop a dependent Dirichlet process (DDP) model for repeated measures multiple membership (MM) data. This data structure arises in studies under which an intervention is delivered to each client through a sequence of elements which overlap with those of other clients on different occasions. Our interest concentrates on study designs for which the overlaps of sequences occur for clients who receive an intervention in a shared or grouped fashion whose memberships may change over multiple treatment events. Our motivating application focuses on evaluation of the effectiveness of a group therapy intervention with treatment delivered through a sequence of cognitive behavioral therapy session blocks, called modules. An open-enrollment protocol permits entry of clients at the beginning of any new module in a manner that may produce unique MM sequences across clients. We begin with a model that composes an addition of client and multiple membership module random effect terms, which are assumed independent. Our MM DDP model relaxes the assumption of conditionally independent client and module random effects by specifying a collection of random distributions for the client effect parameters that are indexed by the unique set of module attendances. We demonstrate how this construction facilitates examining heterogeneity in the relative effectiveness of group therapy modules over repeated measurement occasions.

10.1214/12-AOAS620 \volume7 \issue2 2013 \firstpage1074 \lastpage1094

Bayesian nonparametric modeling \thankstextT1Supported by the National Institute on Alcohol Abuse and Alcoholism grant to Susan Paddock (Grant Number R01AA019663). Data collection was supported by the National Institute on Alcohol Abuse and Alcoholism grant to Katherine Watkins (Grant Number R01AA014699).

A]\fnmsTerrance D. \snmSavitsky\correflabel=e1]tds151@gmail.com and A]\fnmsSusan M. \snmPaddocklabel=e2]paddock@rand.org

Bayesian hierarchical models \kwdconditional autoregressive prior \kwdDirichlet process \kwdgroup therapy \kwdmental health \kwdsubstance abuse treatment \kwdgrowth curve

## 1 Introduction

For many applications in which data have a multilevel structure, observations on a study participant might not be nested within a single higher level unit. Multiple membership (MM) modeling is used to account for such data structures which arise in applications such as the estimation of teacher effects from student test scores, where each student is typically linked to multiple teachers over one or more grades [Hill and Goldstein (1998)]. MM structures also occur in the analysis of health care costs when patients are treated by multiple providers [Carey (2000)] and smoothing disease rates when modeling health outcomes across geographic areas [Langford et al. (1999)].

In our motivating application, the MM structure arises in a study of the effect of group cognitive behavioral therapy (CBT) on reducing depressive symptoms among clients in residential substance abuse treatment. The Building Recovery by Improving Goals, Habits, and Thoughts (BRIGHT) study [Watkins et al. (2011)] was a community-based effectiveness trial of a group cognitive behavioral therapy (CBT) intervention for treating residential substance abuse treatment clients having depressive symptoms. The BRIGHT study employed a quasi-experimental design in which cohorts of clients at each of four study sites received either residential treatment as usual (UC) () or residential treatment enhanced with the BRIGHT intervention (CBT) provided by trained substance abuse treatment counselors (). Clients were assigned to receive either CBT or UC according to which intervention was offered at their study sites at the time of entry into residential substance abuse treatment. CBT and UC were offered at each study site on an alternating basis over time. The clients assigned to the CBT condition were expected to complete four modules of group CBT, with each module consisting of four thematically-similar sessions offered over a two-week period. This sequence of modules was then offered on a repeating basis. In all, group CBT modules were offered to the clients assigned to the condition. These modules were divided into CBT open-enrollment therapy groups, which are sequences of sessions that have distinct sets of clients; the number of clients enrolled in each open-enrollment group was , , , and , respectively. Enrollment into the therapy group occurs on an open basis [Morgan-Lopez and Fals-Stewart (2006), Paddock et al. (2011)], with clients entering the therapy group at the start of new modules. The primary study outcome is client depressive symptomology, as measured by the Beck Depression Inventory-II (BDI-II) [Beck, Steer and Brown (1996)]. The BDI-II score is a sum across four-level items (scored 0–3), with a higher score indicating a greater level of depressive symptoms. The BDI-II score for client is measured up to times, with for clients with only a baseline assessment at study entry and up to for clients measured as well at both and months post-baseline. The MM structure arises here since client outcomes might be correlated due to common module attendance, and the BDI-II scores are not uniquely associated with a single module but rather with all modules attended by a client.

For longitudinal studies in which participants belong to multiple higher-level units, the standard analytic approach is to include a single set of random effects terms that are assumed to be constant over time to account for the multiple membership. However, constraining these random effects to be constant across time does not allow for changes in correlations among outcomes for clients who attend modules together; their outcomes might be more strongly correlated immediately following group therapy versus at baseline or longer term follow-up times. Further, including distinct terms in the model to account for multiple membership and for the correlation of repeated measurements within-client might be too restrictive for applications such as group cognitive behavioral therapy (CBT). Not all clients benefit similarly from group therapy [Smokowski, Rose and Bacallao (2001)]. For example, group climate and cohesion are associated with improved outcomes [Ryum et al. (2009), Crowe and Grenyer (2008)]. Thus, not only might the effects of modules change over time, but also the effects of modules on participant outcome trajectories might vary across study participants.

We present a dependent Dirichlet process (DDP) model for repeated measures multiple membership data. Specifically, we propose a set of random distributions for client random effect parameters that are indexed by therapy group module attendance sequences. Our model allows one to obtain treatment effect estimates for group therapy versus a comparison condition that account for the correlation of client outcomes due to the attendance sequences, with the framework embedded in a hierarchical construction for modeling repeated measures data. One may use our approach to examine whether there is heterogeneity in the relative effectiveness of group therapy modules by identifying clusters of clients whose outcome trajectories vary across modules. Our framework is flexible enough to retain application-specific modeling choices. For the BRIGHT study, this includes specifying a proper conditionally autoregressive (CAR) base distribution for the nonparametric prior on module random effects, which accounts for the open enrollment-induced client overlap in attendance of modules that are offered at adjacent time points [Paddock et al. (2011)]. We demonstrate that the DDP model may be recast for estimation as a DP under our multiple membership linkage of clients to treatment in a similar fashion as for the analysis of variance (ANOVA) DDP [De Iorio et al. (2004)].

In Section 2 we introduce an additive model that employs client and MM random effects for BRIGHT study modules that was examined for open-enrollment group therapy data by Paddock and Savitsky (2013), and then build upon that work by introducing a multivariate generalization to allow for time-varying MM random effects. We present the DDP model in Section 3 to generalize the additive MM model to jointly model dependence owing to repeated measures within clients and group therapy module participation. Brief mention is made of our computational approach and software solution for conducting posterior simulations under the multiple membership models in Section 4, followed by an exploration of the properties of the models on simulated data in Section 5. Our motivating application focuses on the assessment of a group CBT intervention deployed in an open-enrollment study design for the treatment of depressive symptoms among clients in residential substance abuse treatment in Section 6. We conclude with discussion and conclusions in Section 7.

## 2 Multiple membership additive semi-parametric models

This section introduces model constructions that include module random effects, which are mapped to each client according to the modules attended by that client using multiple membership modeling. These models permit inference about the relative effectiveness of the CBT intervention while accounting for differences in module effects as well as the dependence induced among clients based on overlaps in the sequences of modules attended. A separate client random effects term captures the within-client dependence among repeated measures.

### 2.1 Model construction and definitions

We first begin with the model of Paddock and Savitsky (2013) for modeling longitudinal post-treatment outcomes and allowing outcomes for clients who attend the same therapy group to be correlated:

(1) |

where is the BDI-II depressive symptom score for client () at repeated measurement event . The global intercept is represented by . are the fixed effects predictors and their associated effects are . We parameterize for the BRIGHT study, where specifies an indicator for the treatment arm assigned to client [ for clients receiving cognitive behavioral therapy (CBT), for those receiving the “usual care” (UC)] and denotes the continuously-valued time at which was observed. The components of are chosen to estimate the effects on depressive symptom scores of CBT assignment, time, and the interaction of CBT assignment and time; a quadratic specification was chosen based on previous data analysis [Paddock and Savitsky (2013)]. The random effects predictor, , is a vector associated with the random effects for client , . We set for the BRIGHT study, so that the vector of random effect parameters for client , , capture client-specific variation in change in BDI-II scores over time. Our parameterization of fixed and client random effects employs global second order polynomial terms to enforce smoothness and prevent overfitting under a study design with a relatively small number of measurement waves per client, as is typical of behavioral intervention studies such as BRIGHT. The second-to-last term allows for multiple membership modeling since depressive symptom scores observed post-treatment, , are not linked to specific therapy group modules, but rather to all modules attended by client . This term maps the ’s to the vector of module random effects, , by multiplying by an weight vector, , that is normalized to sum to [Hill and Goldstein (1998)]. In particular, equals the number of modules attended by client ; if client attended module and otherwise. Let denote the number of repeated measures observed for all clients. Observational error is indicated by . We produce within-sample fitted client growth trajectories in Sections 5 and 6 with employment of .

### 2.2 Distribution of client random effects

Though one may parametrically model the client random effects, , we model them nonparametrically using a Dirichlet process (DP) prior to motivate the subsequent DDP development and to exploit the DP’s usefulness for flexibly modeling the distribution of the ’s despite having no more than three repeated measures per client in the BRIGHT study [Paddock and Savitsky (2013)]. Specify

(2) | |||||

(3) |

where we choose the base distribution, , a convenient conjugate form that spans the support for and simplifies posterior sampling while still allowing the data to estimate a general form for . We further specify to allow the data to estimate the DP concentration parameter, reflecting its importance for determining the total number of client clusters formed. We may equivalently enumerate (2) as a discrete mixture [Sethuraman (1994)],

(4) |

of countably infinite weighted point masses, where “locations” index the unique values for the . The discrete construction for allows for ties among sampled values for , so that and index clusters (i.e., clients sharing locations or having same values of ) with where implies . Then the set, , provides an equivalent parameterization to , though the former provides better mixing under posterior sampling [Neal (2000)].

### 2.3 Distribution of module random effects

#### 2.3.1 Univariate module effects

Owing to the overlap in client attendance of modules under open enrollment into group therapy, we specify a conditionally autoregressive (CAR) prior for module random effects to allow them to be correlated. The degree of correlation is determined by the closeness of the modules, which depends on how we define which modules are neighbors. We define modules offered at adjacent time points within the same open-enrollment group as neighbors given that clients tend to attend subsequent modules in the BRIGHT study’s residential treatment setting [Paddock et al. (2011)].

To implement this, we enumerate a two-part form for the covariance matrix [Besag, York and Mollié (1991)]. First, define an adjacency matrix, , to encode dependence among neighboring modules where we set if module is a neighbor of or “communicates” with module (denoted with “” in ), and otherwise. Construct , where equals the number of neighbors of module . Then compose the covariance matrix, , the Moore–Penrose pseudo-inverse, as is not of full rank, and specify the joint distribution of random module effects,

(5) |

where scalar precision parameter, , controls the overall strength of variation. The rank of is , where represents the number of distinct open-enrollment therapy groups [Hodges, Carlin and Fan (2003)].

We use the following model short-hand label for simulated data and BRIGHT data analysis:

Note that one could use a standard MM model for applications under which random effects may be assumed exchangeable.

#### 2.3.2 Multivariate module effects

The univariate module effects may be replaced with a multivariate model specification that relaxes the assumption of constant module effects over time specified in equation (5). Restate equation (1),

(6) |

where , for each of the multivariate , . We again assume a second order polynomial model, but this time for the module effects, where each module, , is parameterized with a random effects vector back multiplied by , which permits the effect of module under the BRIGHT study to vary with time, . We may most easily make the extension of the CAR modeling of Besag, York and Mollié (1991) by stacking each of the columns from into for the . Then compose the multivariate CAR prior,

(7) |

for the precision matrix, , where describes the dependence among the random effects per module and is specified to be identical to that used for the base distribution associated with the prior (2) imposed on . In summary, equation (6) extends equation (1) by permitting MM random (module) effects to vary over time. Assign the following label for our multivariate construction:

### 2.4 Prior distributions for other parameters

Scalar precision parameters are each specified with a prior with mean , while the precision matrix , where the degrees of freedom are set to the minimum value to encourage updating by the data. Last, each receive noninformative priors. In instances where our priors specify fixed hyperparameters, we use values intended to be easily overwhelmed in the presence of data rather than eliciting them from our data.

## 3 Dependent Dirichlet process for multiple membership data

To allow for greater flexibility in modeling changes in module effects over time as well as the effects of modules on client depressive symptom trajectories, we now reformulate equation (1) to explicitly index the client random effects by group therapy module identifiers, under which each client is assigned a matrix of random effects. This contrasts with the previous specification of sets of , client random effects and the module effects, , given in equation (1). The resulting client-by-module matrix parameterization arises from replacing a single random prior distribution for client effects with a collection of random prior distributions that are indexed by the unique module attendance sequences. First, we reformulate equation (1) in a more flexible composition,

(8) | |||||

(9) | |||||

(10) |

where we have replaced and with the for client composed with

(11) |

The first column of employs the analogous client random effects from the additive models. The collect a set of module random effect vectors for client . We note that every client receives an effect term, , for all of the modules, even for modules they have not attended; such is even true for clients in the UC arm. By contrast, the additive model of equation (1) is only defined at observed sequences of client module attendances, while this formulation is defined over a broader space of potential module attendance sequences across clients. We impose a DP prior on the set of client-by-module effects, , in order that we may borrow strength and dimension reduce to discover clusters of clients expressing differential response sensitivities to treatment exposures. Employment of a continuous base distribution under the DP prior for the allows the posterior inference on an arbitrary sequence of group therapy module linkages for each client. Effect values at unobserved modules are drawn from the nondegenerate continuous base distribution as updated by the observed module attendances. The module effect estimates for unobserved attendances for each client are set equal to the location values associated with the cluster to which the client is assigned. The ability to develop a proper posterior distribution for arbitrary module attendance sequences is referred to by De Iorio et al. (2004) as nondegeneracy.

Each of the columns of in equation (8) is back multiplied by , which is the MM weight vector we earlier defined, but with a prepended for a random intercept. More specifically, for equal to some value , we construct the latter object as for to encode the vector sequence for group therapy module attendance. Under our MM construction, the is composed of values in for for clients who attend at least one module, and for clients who do not.

We define the parameter vector, , resulting from composition of the client-by-module random effects with the module attendance sequence. We write and for clients and that share the same attendance sequence, . Construct the subsequence, for nonzero entries in corresponding to modules attended for one or more clients with . Then we may provide the more granular construction, , for client where we note that only those modules attended by client contribute to the likelihood. The multiplication of each by reflects the MM design with .

Our formulation in equation (8) may be re-expressed with the vector of client random effects, , in a similar fashion as the in equation (1), but here we index the client random effects by module attendance sequence . The prior for is specified under a collection of random distributions, , indexed by the unique attendance sequences, ,

(12) |

with random effects vector, , the same as composed in equation (1). Specify the prior formulation for ,

(13) |

We next enumerate a multiple membership dependent Dirichlet process (MM DDP) set of nonparametric distributions indexed by the module attendance sequence, , in the stick-breaking construction [Sethuraman (1994)],

(14) |

of weighted point mass locations where the weights are common for all values of , but the locations are indexed the unique attendance sequences (unlike for the simpler DP). We note that marginally, for each , the locations are exchangeable in , such that follows a Dirichlet process and we have established the propriety of the MM DDP. Denote the following short-hand notation for MM DDP construction,

(15) | |||||

(16) |

where we have extended the ANOVA DDP prior of De Iorio et al. (2004) to a multiple membership framework for the set of effect random distributions, .

We achieve equation (8) from equation (12) by extending a property of ANOVA DDP to the MM DDP that rewrites equation (14) as a DP due to the finite indexing space of group therapy modules with

(17) | |||||

(18) | |||||

(19) |

Then we may rewrite our DDP model formulation of equation (12) to the DP construction specified in equation (8).

Though we use equations (8)–(10) to estimate the MM DDP, the conceptual alternative in equations (12)–(14) provides insight into the inferential properties of the MM DDP. The indexing of distributions, rather than just mean effects, by the module attendance sequences better spans the space of distributions generating the client random effects and allows the estimation of client module effects for modules not attended.

We also gain insight into the manner in which strength is borrowed over the set of module attendance sequences. The MM DDP formulation employs indexed by the set of unique module attendance sequences. Few clients, however, may be expected to exactly overlap or to share the same . Yet clients will overlap for a portion of the module attendance sequences such that we have repeated observations for each module for estimation of the dependent for all . The partial overlaps among the induce a dependence structure among the based on the extent of overlaps.

### 3.1 Base distribution

We structure the base distribution, , for our client-by-module parameters to leverage the adjacency dependence of the BRIGHT study modules. Compose for draws for the cluster locations, , as the product of multivariate Gaussian distributions for each of the , and the , that, together, comprise with

(20) | |||||

(21) |

where indexes cluster location. The construction in equation (21) employs a separable (parsimonious) covariance formulation for the distribution on the set of matrix variate parameters, . We have employed the notation of Dawid (1981) under which the , , defines the precision matrix for the columns of and the , , for the rows. The covariance formulation is equivalent to . [See Hoff (2011) for an intuitive discussion of separable covariance formulations.] Last, the preceding presents the value of the mean. Consistent with prior formulations under the additive models, the . We structure the precision matrix, , which models the module-induced adjacency dependence among the set, , with a proper CAR formulation as enumerated in Jin, Carlin and Banerjee (2005), where and ensures is of full rank and may be viewed as a smoothing parameter that measures the strength of the adjacency association. Matrices hold the same definitions as earlier specified in Section 2.3.

Proceeding with the notation of Dawid (1981), we pull together the components of the base distribution into

(22) |

where . Let us prepare in the form we will use to conduct posterior simulations by stacking the rows of [each an vector] to the in

(23) |

Vectorize in a similar manner to obtain the , , which is similar to (7) but is full rank to permit efficient joint posterior sampling under high within-cluster dependence among the elements of . Our MM DDP formulation specifies the full set of module effects for client set equal to the location values, , drawn from the CAR base distribution for cluster that contains client for some posterior sampling iteration.

Due to the BRIGHT study design, there were open-enrollment therapy groups. Each group was composed of modules having at least partial overlap with another module with respect to the set of clients in attendance, and the sets of clients in the four groups were different. We thus add more flexibility in (23) by specializing the CAR prior in to each open-enrollment therapy group with

(24) |

where we have defined a set, , of CAR precision matrices composed as and recover and , reflecting the disjoint, noncommunicating structure we seek to model. It is noted by Jin, Carlin and Banerjee (2005) that the parameterization of the global scalar smoothing parameter, , may be overly restrictive, and they offer more heavily parameterized alternatives to permit the learning to adapt more locally. Our specification that offers the indexing of by disjoint group allows smoothing across client-indexed module effects to be local to group. We may specify other continuous, multivariate distributions in place of the CAR for each group, including replacing the CAR covariance matrix construction with an anistropic Gaussian process Savitsky and Vannucci (2010) or with an unspecified formulation under an inverse Wishart prior.

## 4 Computational approach

Convergence of the sampler employed for simulation and the BRIGHT data analyses was assessed by employing a fixed width estimator with Monte Carlo standard errors (MCSE) computed using the consistent batch means (CBM) method [Jones et al. (2006)]. Computational software for the posterior distribution simulations is available in our package for the R statistical software [R Development Core Team (2011)] package called growcurves [Savitsky and Paddock (2012)]. All of the methods, fit statistics and charts presented in this paper may be readily reproduced from growcurves. The parameters under DP priors are all sampled in a conjugate fashion by marginalizing over the random measure, , to produce the Pólya urn scheme of Blackwell and MacQueen (1973), under which each cluster assignment indicator is sampled from a mixture of existing clusters and a new cluster. To the extent that a new cluster is selected, associated parameter locations are generated (and subsequently resampled) from the posterior of the base distribution under a single observation. [See Paddock and Savitsky (2013) for details.]

We employ the cross-validatory, log pseudo marginal likelihood (LPML) leave-one-out fit statistic as described in Congdon (2005) under importance resampling of the posterior distributions over model parameters to estimate , where indexes our models where the leave-one-out property induces a penalty for model complexity and helps to assess the possibility for overfitting. We also include the criterion of Celeux et al. (2006) that composes the marginal (predictive) density to estimate for composition of which is more appropriate for the (DP or DDP) mixture formulations that characterize all of our models. The nonpenalized mean deviance, , is also utilized.

## 5 Simulation study

### 5.1 Data generation

We generate data sets for simulation modeling from (12) by allocating the first clients to the CBT and a remaining to a nongroup therapy usual care (UC) condition. We employ modules for our simulation. Each CBT client attends modules and each module on average holds clients. The module attendance sequences, , used to select columns of the client-indexed matrix effects, , are next generated in an open-enrollment manner by randomly selecting the starting module for each CBT client in the block of modules to which they are assigned. We set for all UC clients (who, by design, do not attend group therapy modules) as our hold-out or comparator module attendance sequence for identification. Such a design instantiates partial overlaps among the module attendance sequences for clients. The minimum and maximum numbers of clients linked to modules were restricted to and , respectively, to conform to practical limitations on the underlying structure for group therapy modules. We simulate up to three repeated measures per client.

We simulate clusters of clients, where each cluster generates a set of effect locations, , shared by all clients assigned to them. The rows of capture up to second order (intercept, linear, and quadratic) polynomial effects for each module. The effects are generated in a vectorized fashion from a multivariate Gaussian with the covariance formulation as outlined for the DDP base distribution enumerated in Section 3.1. The module effects are generated from a multivariate proper CAR prior under the assumption of adjacency for successive modules with smoothing parameter . A covariance matrix allowing for polynomial orders of module random effects is defined with

where the diagonals encode the variance of the first through third polynomial orders, respectively, for each of multivariate cluster effect locations. We formulate such that the first and second orders and the second and third orders express negative correlations; for example, if the slope for the effect trajectory of a given module expresses a negative trajectory, then the quadratic term is positive and will tend to decelerate or bend the curve back up. Once the effects are generated, clients are randomly assigned to one of the clusters with equal probability. Each cluster will hold both UC and CBT clients, though the module attendance sequence for the UC clients is set to ’s such that their assigned module effects do not contribute to the generation of the response values. The model intercept, , is set to and fixed effect coefficients are set to for , respectively, for each client, , where is an indicator for the treatment arm assigned to client ( for CBT, for UC) and denotes the continuously-valued time at which was observed, taking on value or months. The resultant set of random effects for client , are multiplied with the MM link vector, , to produce matched to for client-specific polynomial variation from the mean time trend (which is captured in ). The model noise precision is set to .

### 5.2 Data modeling

Figure 1 presents in-sample predicted growth curves for randomly selected clients within each treatment arm along with actual client data values. Client growth curves under the DDP model express more adaptiveness to the data, both for -shaped curves as expressed by client and bell-shaped curves estimated for client .

Posterior mean values for the polynomial effect terms assigned to each module are composed into module effect trajectories through time in Figure 2 comparing MM_MV and DDP models for each of the clusters (columns) and for randomly selected modules. The posterior mean module effect trajectories estimated under the DDP model track closer to the true trajectory shapes than do the nonclient adaptive curves for MM_MV.

We compose a small Monte Carlo simulation with iterations, where each generates a data set with the above noted specifications. Estimation is performed under our models for each generated data set and the posterior draws for the fixed effects are concatenated across iterations to examine performance of the 3 comparator formulations under repeated sampling. Figure 3 reveals the posterior distribution over the credible intervals under each model estimated using the predictive margins technique; see Lane and Nelder (1982). We note that the DDP formulation expresses the least uncertainty around the true values (represented by a dashed line at each of the measurement months).

#### 5.2.1 Model fit statistics

Model fit statistics, , and , are presented in Table 1. One observes lower (better) values across all statistics for the DDP than the other two comparator models, while MM_MV, employing multivariate module effects, outperforms MMCAR parameterized with univariate module effects. In particular, the leave-one-out LPML statistic strongly prefers the DDP model. While the DDP is parameterized with client-by-module random effects, the effective parameterization is reduced under the clustering of clients. Nevertheless, the DDP would generally be expected to express a higher number of effective parameters than the two additive models, though the LPML performances do not indicate overfitting. The polynomial construction for enforces smoothness in the estimated fit as demonstrated in the client growth curves from Figure 1, which also serves to mitigate the possibility for overfitting. We performed additional simulations to explore scenarios and modules that, on average, have and clients per module, respectively, with the same number of clients. The relative model differences persist under . The difference between DDP and MM_MV is under modules and under modules and under modules.

Model | |||
---|---|---|---|

MMCAR | |||

MM_MV | |||

DDP |

## 6 Application to group therapy data

We now return to the BRIGHT study for comparison of fit among our model formulations. We further focus on inference under the MM DDP construction and examine heterogeneity with respect to module type in BDI-II trajectories across disjoint clusters of clients. We recall our parameterization of fixed effects for the BRIGHT study data, , where is an indicator for the treatment arm assigned to client ( for CBT, for UC) and denotes the continuously-valued time at which depressive symptom score, was observed. As before, set .

We simplify and focus inference by composing posterior distributions for module effects up to clusters of clients. The client clustering is obtained from among posterior samples of client partitions using the least squares algorithm of Dahl, Day and Tsai (2008). The shapes, magnitudes, and differences across the clusters express the range we see among clients so that we do not lose generality with a focus at the cluster, rather than client, level. The most populated clusters are employed and contain clients, respectively, that together hold out of total BRIGHT study clients. Roughly half of the clients in the clusters are UC clients who do not attend any group therapy modules. UC clients with mean client random effects, , similar to those of a subset of CBT clients are expected to co-cluster in posterior sampling such that the module effect values for all clients in the cluster are assigned the module effect location values for that cluster. This is an intuitive result where UC clients who express similar idiosyncratic characteristics to co-clustered CBT clients would be expected to similarly respond to CBT treatment were it offered to them.

Figure 4 renders module effect trajectories of the BDI-II depressive symptom scores for randomly selected modules. Results are summarized by averaging trajectories into client clusters, with the largest six clusters shown across the columns, denoted by . Each client cluster’s trajectories are presented for each of the open-enrollment CBT therapy groups along the rows within clusters, which are denoted by in the figure. Large differences are observed in shape and magnitude among modules, particularly for client cluster , whose trajectories for each of the four open-enrollment groups are provided in the leftmost column of plot cells of Figure 4. The range of the curves expresses clinically meaningful differences of 4–6 (BDI-II) points [Furukawa (2010)]. Scanning the columns from left to right reveals a marked attenuation in cluster responsiveness to the CBT intervention. Member clients of clusters 4–6 express much less depressive symptom sensitivity to participation in the modules and, therefore, one notes much less differentiation in effect values among the modules for these clusters.

Figure 5 provides additional insight from the DDP model for examining the variation in module effects across clusters of clients and how those effects vary over time. The figure shows module effect trajectories disaggregated into the posterior mean polynomial effects from which they are each rendered across the clusters of clients. The polynomial effect values are presented for all modules, organized in the same cbt group-within-cluster format utilized in Figure 4. These polynomial parameters imply a module effect trajectory with the order effect providing the intercept, the order effect the slope and order a nonlinear quadratic term. The resulting effects trajectory for a module would be -shaped if the order term is positive. As we noted in Figure 4, there is notable variation in the effect of modules on depressive symptoms across client clusters within each of the four CBT therapy groups as we scan from left to right, particularly for cbt groups 1–3; for example, the first two clusters of each CBT therapy group, shown in the first two columns of Figure 5, show clinically meaningful variation in client outcomes.

Model fit statistics, presented in Table 2, reveal an improved fit for the DDP in comparison to the other two models, however, unlike for the simulation results, the MMCAR produces a better fit for the BRIGHT data than does MM_MV. These results indicate the importance of differences across clients in responsiveness to modules. Within-sample predicted growth curves (not shown) demonstrate a similar improvement as observed in Figure 1 in shape and orientation adaptability for the DDP as compared to the other models to express better fit performance.

Model | |||
---|---|---|---|

MMCAR | |||

MM_MV | |||

DDP |

We explore sensitivity of the clustering of clients to our prior specification for the DP concentration parameter, , employed in (10) for the MM DDP model by varying the shape and rate hyperparameters, , employed in the prior, . We vary both hyperparameters in combinations within a range of 1–4 for each, producing a prior number of clusters from a minimum of to a maximum of . While our group therapy data application results show small differences in the posterior numbers of clusters formed, the allocation of clients to the most populous clusters is essentially unchanged, as is our inference on client-module effects. Distributions for underlying parameters are also essentially unchanged.

## 7 Discussion

Our MM DDP approach extends the ANOVA DDP construction of De Iorio et al. (2004) to a multiple membership framework. The MM DDP provides wide support on the space of distributions indexed by the set of distinct multiple membership sequences through the borrowing of strength in overlaps among expressed sequences. The formulation allows one to examine whether element (e.g., module) effects vary across different client trajectories and vice versa, allowing for one to learn about differing response sensitivities among clients to treatment elements, even for unobserved combinations of clients and treatment elements. We compose a model base distribution to retain straight-forward and efficient posterior sampling properties of the DP while allowing flexibility for Gaussian covariance specifications to parsimoniously parameterize dependence among module effects; in particular, we illustrate adjacency-based formulations for the covariance matrix of the Gaussian base measure in a fashion that renders flexibility while retaining conjugacy.

Other alternatives to our MM DDP may be considered, such as the hierarchical DP (HDP) [Teh et al. (2006)] or the nested DP (NDP) [Rodríguez, Dunson and Gelfand (2008)], which both target a grouped data structure with nested observations. These approaches, however, do not anticipate a multiple membership construction where subgroups of clients share connections to the same modules as does the MM DDP, which indexes the collection of random measures, , by multiple membership (attendance) sequence. While one may ignore the multiple membership composition and employ either of the HDP and NDP, they both perform posterior simulations in a nested, two-step, fashion (for a two-level hierarchical formulation), while we see how the MM DDP reduces to a DP that permits a simpler computational approach. Last, neither the HDP or NDP allow inference on unobserved module attendance sequences as does the MM DDP.

The usefulness of our approach may be limited for data with decreasing overlaps among the treatment element (e.g., client module attendance) sequences, , as this would restrict the ability for the data to borrow strength in the estimation of the collection of random distributions, . In one direction where clients perfectly overlap into disjoint groupings of client-modules for CBT studies, the MM DDP reduces to the ANOVA DDP. In the other direction, however, where clients express progressively less overlaps in modules attended, estimability may be compromised. In practice, resource limitations in the total number of modules offered for typical open-enrollment group therapy studies tend to produce a sufficient level of overlaps of clients on each module for estimation.

## References

- Beck, Steer and Brown (1996) {bbook}[auto:STB—2013/04/11—08:11:48] \bauthor\bsnmBeck, \bfnmA.\binitsA., \bauthor\bsnmSteer, \bfnmR.\binitsR. \AND\bauthor\bsnmBrown, \bfnmG.\binitsG. (\byear1996). \btitleManual for the Beck Depression Inventory-II. \bpublisherPsychological Corporation, \blocationSan Antonio, TX. \bptokimsref \endbibitem
- Besag, York and Mollié (1991) {barticle}[mr] \bauthor\bsnmBesag, \bfnmJulian\binitsJ., \bauthor\bsnmYork, \bfnmJeremy\binitsJ. \AND\bauthor\bsnmMollié, \bfnmAnnie\binitsA. (\byear1991). \btitleBayesian image restoration, with two applications in spatial statistics. \bjournalAnn. Inst. Statist. Math. \bvolume43 \bpages1–59. \biddoi=10.1007/BF00116466, issn=0020-3157, mr=1105822 \bptnotecheck related\bptokimsref \endbibitem
- Blackwell and MacQueen (1973) {barticle}[mr] \bauthor\bsnmBlackwell, \bfnmDavid\binitsD. \AND\bauthor\bsnmMacQueen, \bfnmJames B.\binitsJ. B. (\byear1973). \btitleFerguson distributions via Pólya urn schemes. \bjournalAnn. Statist. \bvolume1 \bpages353–355. \bidissn=0090-5364, mr=0362614 \bptokimsref \endbibitem
- Carey (2000) {barticle}[auto:STB—2013/04/11—08:11:48] \bauthor\bsnmCarey, \bfnmK.\binitsK. (\byear2000). \btitleA multilevel modelling approach to analysis of patient costs under managed care. \bjournalHealth Economics \bvolume9 \bpages435–446. \bptokimsref \endbibitem
- Celeux et al. (2006) {barticle}[mr] \bauthor\bsnmCeleux, \bfnmG.\binitsG., \bauthor\bsnmForbes, \bfnmF.\binitsF., \bauthor\bsnmRobert, \bfnmC. P.\binitsC. P. \AND\bauthor\bsnmTitterington, \bfnmD. M.\binitsD. M. (\byear2006). \btitleDeviance information criteria for missing data models. \bjournalBayesian Anal. \bvolume1 \bpages651–673 (electronic). \biddoi=10.1214/06-BA122, issn=1931-6690, mr=2282197 \bptnotecheck related\bptokimsref \endbibitem
- Congdon (2005) {bbook}[mr] \bauthor\bsnmCongdon, \bfnmPeter\binitsP. (\byear2005). \btitleBayesian Models for Categorical Data. \bpublisherWiley, \blocationChichester. \biddoi=10.1002/0470092394, mr=2191351 \bptokimsref \endbibitem
- Crowe and Grenyer (2008) {barticle}[pbm] \bauthor\bsnmCrowe, \bfnmTrevor P.\binitsT. P. \AND\bauthor\bsnmGrenyer, \bfnmBrin F S\binitsB. F. S. (\byear2008). \btitleIs therapist alliance or whole group cohesion more influential in group psychotherapy outcomes? \bjournalClin. Psychol. Psychother. \bvolume15 \bpages239–246. \biddoi=10.1002/cpp.583, issn=1099-0879, pmid=19115444 \bptokimsref \endbibitem
- Dahl, Day and Tsai (2008) {bmisc}[auto:STB—2013/04/11—08:11:48] \bauthor\bsnmDahl, \bfnmD. B.\binitsD. B., \bauthor\bsnmDay, \bfnmR.\binitsR. \AND\bauthor\bsnmTsai, \bfnmJ. W.\binitsJ. W. (\byear2008). \bhowpublishedDistance-based probability distribution on set partitions with applications to protein structure prediction. Technical report. \bptokimsref \endbibitem
- Dawid (1981) {barticle}[mr] \bauthor\bsnmDawid, \bfnmA. P.\binitsA. P. (\byear1981). \btitleSome matrix-variate distribution theory: Notational considerations and a Bayesian application. \bjournalBiometrika \bvolume68 \bpages265–274. \biddoi=10.1093/biomet/68.1.265, issn=0006-3444, mr=0614963 \bptokimsref \endbibitem
- De Iorio et al. (2004) {barticle}[mr] \bauthor\bsnmDe Iorio, \bfnmMaria\binitsM., \bauthor\bsnmMüller, \bfnmPeter\binitsP., \bauthor\bsnmRosner, \bfnmGary L.\binitsG. L. \AND\bauthor\bsnmMacEachern, \bfnmSteven N.\binitsS. N. (\byear2004). \btitleAn ANOVA model for dependent random measures. \bjournalJ. Amer. Statist. Assoc. \bvolume99 \bpages205–215. \biddoi=10.1198/016214504000000205, issn=0162-1459, mr=2054299 \bptokimsref \endbibitem
- Furukawa (2010) {barticle}[auto:STB—2013/04/11—08:11:48] \bauthor\bsnmFurukawa, \bfnmT.\binitsT. (\byear2010). \btitleAssessment of mood: Guides for clinicians. \bjournalJournal of Psychosomatic Reseach \bvolume68 \bpages581–589. \bptokimsref \endbibitem
- Hill and Goldstein (1998) {barticle}[auto:STB—2013/04/11—08:11:48] \bauthor\bsnmHill, \bfnmP. W.\binitsP. W. \AND\bauthor\bsnmGoldstein, \bfnmH.\binitsH. (\byear1998). \btitleMultilevel modeling of educational data with cross-classification and missing identification for units. \bjournalJournal of Educational and Behavioral Statistics \bvolume23 \bpages117–128. \bptokimsref \endbibitem
- Hodges, Carlin and Fan (2003) {barticle}[mr] \bauthor\bsnmHodges, \bfnmJames S.\binitsJ. S., \bauthor\bsnmCarlin, \bfnmBradley P.\binitsB. P. \AND\bauthor\bsnmFan, \bfnmQiao\binitsQ. (\byear2003). \btitleOn the precision of the conditionally autoregressive prior in spatial models. \bjournalBiometrics \bvolume59 \bpages317–322. \biddoi=10.1111/1541-0420.00038, issn=0006-341X, mr=1987398 \bptokimsref \endbibitem
- Hoff (2011) {barticle}[mr] \bauthor\bsnmHoff, \bfnmPeter D.\binitsP. D. (\byear2011). \btitleSeparable covariance arrays via the Tucker product, with applications to multivariate relational data. \bjournalBayesian Anal. \bvolume6 \bpages179–196. \biddoi=10.1214/11-BA606, issn=1936-0975, mr=2806238 \bptokimsref \endbibitem
- Jin, Carlin and Banerjee (2005) {barticle}[mr] \bauthor\bsnmJin, \bfnmXiaoping\binitsX., \bauthor\bsnmCarlin, \bfnmBradley P.\binitsB. P. \AND\bauthor\bsnmBanerjee, \bfnmSudipto\binitsS. (\byear2005). \btitleGeneralized hierarchical multivariate CAR models for areal data. \bjournalBiometrics \bvolume61 \bpages950–961. \biddoi=10.1111/j.1541-0420.2005.00359.x, issn=0006-341X, mr=2216188 \bptokimsref \endbibitem
- Jones et al. (2006) {barticle}[mr] \bauthor\bsnmJones, \bfnmGalin L.\binitsG. L., \bauthor\bsnmHaran, \bfnmMurali\binitsM., \bauthor\bsnmCaffo, \bfnmBrian S.\binitsB. S. \AND\bauthor\bsnmNeath, \bfnmRonald\binitsR. (\byear2006). \btitleFixed-width output analysis for Markov chain Monte Carlo. \bjournalJ. Amer. Statist. Assoc. \bvolume101 \bpages1537–1547. \biddoi=10.1198/016214506000000492, issn=0162-1459, mr=2279478 \bptokimsref \endbibitem
- Lane and Nelder (1982) {barticle}[auto:STB—2013/04/11—08:11:48] \bauthor\bsnmLane, \bfnmP. W.\binitsP. W. \AND\bauthor\bsnmNelder, \bfnmJ. A.\binitsJ. A. (\byear1982). \btitleAnalysis of covariance and standardization as instances of prediction. \bjournalBiometrics \bvolume38 \bpages613–621. \bptokimsref \endbibitem
- Langford et al. (1999) {barticle}[auto:STB—2013/04/11—08:11:48] \bauthor\bsnmLangford, \bfnmI. H.\binitsI. H., \bauthor\bsnmLeyland, \bfnmA. H.\binitsA. H., \bauthor\bsnmRasbash, \bfnmJ.\binitsJ. \AND\bauthor\bsnmGoldstein, \bfnmH.\binitsH. (\byear1999). \btitleMultilevel modelling of the geographical distributions of diseases. \bjournalJ. R. Stat. Soc. Ser. C. Appl. Stat. \bvolume48 \bpages253–268. \bptokimsref \endbibitem
- Morgan-Lopez and Fals-Stewart (2006) {barticle}[auto:STB—2013/04/11—08:11:48] \bauthor\bsnmMorgan-Lopez, \bfnmA.\binitsA. \AND\bauthor\bsnmFals-Stewart, \bfnmW.\binitsW. (\byear2006). \btitleAnalytic complexities associated with group therapy in substance abuse treatment research: Problems, recommendations, and future directions. \bjournalExperimental and Clinical Psychopharmacology \bvolume14 \bpages265–273. \bptokimsref \endbibitem
- Neal (2000) {barticle}[mr] \bauthor\bsnmNeal, \bfnmRadford M.\binitsR. M. (\byear2000). \btitleMarkov chain sampling methods for Dirichlet process mixture models. \bjournalJ. Comput. Graph. Statist. \bvolume9 \bpages249–265. \biddoi=10.2307/1390653, issn=1061-8600, mr=1823804 \bptokimsref \endbibitem
- Paddock and Savitsky (2013) {barticle}[auto:STB—2013/04/11—08:11:48] \bauthor\bsnmPaddock, \bfnmS. M.\binitsS. M. \AND\bauthor\bsnmSavitsky, \bfnmT. D.\binitsT. D. (\byear2013). \btitleBayesian hierarchical semiparametric modeling of longitudinal post-treatment outcomes from open-enrollment therapy groups. \bjournalJ. Roy. Statist. Soc. Ser. A \bvolume176 \bpages795–808. \bptokimsref \endbibitem
- Paddock et al. (2011) {barticle}[mr] \bauthor\bsnmPaddock, \bfnmSusan M.\binitsS. M., \bauthor\bsnmHunter, \bfnmSarah B.\binitsS. B., \bauthor\bsnmWatkins, \bfnmKatherine E.\binitsK. E. \AND\bauthor\bsnmMcCaffrey, \bfnmDaniel F.\binitsD. F. (\byear2011). \btitleAnalysis of rolling group therapy data using conditionally autoregressive priors. \bjournalAnn. Appl. Stat. \bvolume5 \bpages605–627. \biddoi=10.1214/10-AOAS434, issn=1932-6157, mr=2840167 \bptokimsref \endbibitem
- R Development Core Team (2011) {bmisc}[auto:STB—2013/04/11—08:11:48] \borganizationR Development Core Team (\byear2011). \bhowpublishedR: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN-07-0. Available at http://www.R-project.org/. \bptokimsref \endbibitem
- Rodríguez, Dunson and Gelfand (2008) {barticle}[mr] \bauthor\bsnmRodríguez, \bfnmAbel\binitsA., \bauthor\bsnmDunson, \bfnmDavid B.\binitsD. B. \AND\bauthor\bsnmGelfand, \bfnmAlan E.\binitsA. E. (\byear2008). \btitleThe nested Dirichlet process. \bjournalJ. Amer. Statist. Assoc. \bvolume103 \bpages1131–1144. \biddoi=10.1198/016214508000000553, issn=0162-1459, mr=2528831 \bptokimsref \endbibitem
- Ryum et al. (2009) {barticle}[auto:STB—2013/04/11—08:11:48] \bauthor\bsnmRyum, \bfnmT.\binitsT., \bauthor\bsnmHagen, \bfnmR.\binitsR., \bauthor\bsnmNordahl, \bfnmH.\binitsH., \bauthor\bsnmVogel, \bfnmP.\binitsP. \AND\bauthor\bsnmStiles, \bfnmT.\binitsT. (\byear2009). \btitlePerceived group climate as a predictor of long-term outcome in a randomized controlled trial of cognitive-behavioural group therapy for patients with comorbid psychiatric disorders. \bjournalBehavioural and Cognitive Psychotherapy \bvolume37 \bpages497–510. \bptokimsref \endbibitem
- Savitsky and Paddock (2012) {bmisc}[auto:STB—2013/04/11—08:11:48] \bauthor\bsnmSavitsky, \bfnmT.\binitsT. \AND\bauthor\bsnmPaddock, \bfnmS.\binitsS. (\byear2012). \bhowpublishedGrowcurves: Semiparametric hierarchical Bayesian modeling of longitudinal outcomes. R package version 2.15.2. Available at http://CRAN.R-project.org/package=growcurves. \bptokimsref \endbibitem
- Savitsky and Vannucci (2010) {barticle}[mr] \bauthor\bsnmSavitsky, \bfnmTerrance\binitsT. \AND\bauthor\bsnmVannucci, \bfnmMarina\binitsM. (\byear2010). \btitleSpiked Dirichlet process priors for Gaussian process models. \bjournalJ. Probab. Stat. \bvolume2010 \bpagesArt. ID 201489, 14. \bidissn=1687-952X, mr=2745498 \bptokimsref \endbibitem
- Sethuraman (1994) {barticle}[mr] \bauthor\bsnmSethuraman, \bfnmJayaram\binitsJ. (\byear1994). \btitleA constructive definition of Dirichlet priors. \bjournalStatist. Sinica \bvolume4 \bpages639–650. \bidissn=1017-0405, mr=1309433 \bptokimsref \endbibitem
- Smokowski, Rose and Bacallao (2001) {barticle}[auto:STB—2013/04/11—08:11:48] \bauthor\bsnmSmokowski, \bfnmP.\binitsP., \bauthor\bsnmRose, \bfnmS.\binitsS. \AND\bauthor\bsnmBacallao, \bfnmM.\binitsM. (\byear2001). \btitleDamaging experiences in therapeutic groups: How vulnerable consumers become group casualties. \bjournalSmall Group Research \bvolume32 \bpages223–251. \bptokimsref \endbibitem
- Teh et al. (2006) {barticle}[mr] \bauthor\bsnmTeh, \bfnmYee Whye\binitsY. W., \bauthor\bsnmJordan, \bfnmMichael I.\binitsM. I., \bauthor\bsnmBeal, \bfnmMatthew J.\binitsM. J. \AND\bauthor\bsnmBlei, \bfnmDavid M.\binitsD. M. (\byear2006). \btitleHierarchical Dirichlet processes. \bjournalJ. Amer. Statist. Assoc. \bvolume101 \bpages1566–1581. \biddoi=10.1198/016214506000000302, issn=0162-1459, mr=2279480 \bptokimsref \endbibitem
- Watkins et al. (2011) {barticle}[auto:STB—2013/04/11—08:11:48] \bauthor\bsnmWatkins, \bfnmK. E.\binitsK. E., \bauthor\bsnmHunter, \bfnmS. B.\binitsS. B., \bauthor\bsnmHepner, \bfnmK. A.\binitsK. A., \bauthor\bsnmPaddock, \bfnmS. M.\binitsS. M., \bauthor\bparticlede la \bsnmCruz, \bfnmE.\binitsE., \bauthor\bsnmZhou, \bfnmA. J.\binitsA. J. \AND\bauthor\bsnmGilmore, \bfnmJ.\binitsJ. (\byear2011). \btitleAn effectiveness trial of group cognitive behavioral therapy for patients with persistent depressive symptoms in substance abuse treatment. \bjournalArchives of General Psychiatry \bvolume68 \bpages1–8. \bptokimsref \endbibitem