A Variational EM Method for Mixed Membership Models with Multivariate Rank Data: an Analysis of Public Policy Preferences
Abstract
In this article, we consider modeling ranked responses from a heterogeneous population. Specifically, we analyze data from the Eurobarometer 34.1 survey regarding public policy preferences towards drugs, alcohol and AIDS. Such policy preferences are likely to exhibit substantial differences within as well as across European nations reflecting a wide variety of cultures, political affiliations, ideological perspectives and common practices. We use a mixed membership model to account for multiple subgroups with differing preferences and to allow each individual to possess partial membership in more than one subgroup. Previous methods for fitting mixed membership models to rank data in a univariate setting have utilized an MCMC approach and do not estimate the relative frequency of each subgroup. We propose a variational EM approach for fitting mixed membership models with multivariate rank data. Our method allows for fast approximate inference and explicitly estimates the subgroup sizes. Analyzing the Eurobarometer 34.1 data, we find interpretable subgroups which generally agree with the “left vs right” classification of political ideologies.
Keywords: mixed membership, rank data, variational inference, Eurobarometer, public policy
1 Introduction
Rank data often arise from a heterogeneous population with individuals whose preferences may vary widely. In this article, we consider one such example, public health policy data from the Eurobarometer 34.1. We develop a computationally efficient variational EM procedure to estimate mixed membership models with rank data. In addition to the computational aspects, we also extend the current literature by explicitly estimating the subgroup relative frequencies and accommodating multivariate ranked data.
2 Public Policy Preferences
Social scientists have long held that in democratic societies, public opinion plays an important role in the formation of public policies about important social problems (e.g. Burstein, 1998; Brooks and Manza, 2008). Public opinion polls, which provide a window on the attitudes and perspectives of a nation’s citizenry, at times elicit ranked data about specific public policies. An example is the Eurobarometer 34.1, a survey commissioned in 1990 to study European perspectives on various political and public health issues (Reif and Melich, 2001). The Eurobarometer 34.1 collected data on a broad range of topics, using a range of question formats, including measures of health behavior, knowledge of illegal drugs, descriptions of family structure, and attitudes toward children. In particular, three of the survey questions, shown in Figure 1, asked about public policy priorities toward addressing societal/public health problems; our analysis focuses on the responses to these questions. We note that, had the survey contained other pertinent questions with binary or multinomial responses, those data could have been included in the analysis using the mixedMem R package (Wang and Erosheva, 2015; R Core Team, 2016), which allows for multivariate analysis when the variables follow different distributions.
The specific variables we consider address illicit drugs, alcoholism, and AIDS. The survey respondents were asked to rank, in order of priority, policies such as punishment for offenders, information campaigns to educate the public, rehabilitation and treatment, funding of research into causes and treatment, and fighting social causes. Such priority rankings are likely to vary across respondents within a nation, reflecting dissensus among a citizenry, as well as across nations, reflecting national differences in culture, political affiliation, ideological perspective, and common practices.
Question 28:
There are various actions that could be taken to eliminate the drugs problem. In your opinion, what is the first priority? And the next most urgent? (Ask respondent to rank all 7, with 1 as the most urgent)


Question 39:
There are various actions that could be taken in order to ease the problem of alcoholism and its consequences. Looking at this card, which is the main priority in your view? and the next? (Rank up to 5)

Question 47:
There are various actions that could be taken in order to eliminate the problem of AIDS or at least to slow down its development. Looking at this card, which is the main priority in your view? And the next? (Rank all by giving a number from 1 to 5. with 1 as a top priority)

Table 1: Top 5 observed responses for each question.

Drug             Count
1,2,3,4,5,6,7    109
1,2,4,5,6,7,3    82
2,1,4,5,6,7,3    69
2,1,4,6,7,5,3    55
2,3,1,4,5,6,7    53

Alcohol          Count
1,2,8,9,10       85
1,5,8,9,10       61
1,8,9,10,3       60
1,2,5,8,10       57
1,8,9,10,2       53

AIDS             Count
1,5,4,2,3        740
5,4,1,2,3        722
5,1,4,2,3        670
1,5,4,3,2        604
1,4,5,2,3        515
The top 5 observed responses for each question, shown in Table 1, are suggestive of significant population heterogeneity. For example, for illegal drugs, a “legal penalty for drug taking” (policy 3) is highly ranked in the first and fifth most observed permutations, but ranked last in the second, third and fourth most observed permutations. This heterogeneity is not surprising because the individuals in the survey come from a wide variety of nationalities, religious backgrounds and age groups (shown in Table 2). In analyzing such heterogeneity, an important question is whether there are underlying subgroups or policy profiles among citizens. For example, when asked how they prioritize policies about the problem of illegal drugs, do citizens form a single group that varies along a dimension of punishment to rehabilitation, which reflects a conservative-liberal continuum? Or do responses reflect subgroups of citizens, in which some favor punitive measures, others rehabilitation, and still others education? Do some subgroups systematically oppose some policy measures, while favoring others? Moreover, given such subgroups, are some citizens members of multiple groups, favoring, for example, both rehabilitation and information? In modern democracies characterized by a diverse citizenry, such subgroups may be likely to exist. Failure to consider such subgroups, when they in fact exist, may lead to a distorted view of a nation’s public. Appropriately modeling the population heterogeneity may be particularly important when the questions cover a wide range of topics, as they do here.
Table 2: Respondent counts by nation, religion, and age group.

Nation           Count    Nation         Count    Nation              Count
Belgium          812      Greece         980      Northern Ireland    267
Denmark          950      Ireland        879      Portugal            978
East Germany     938      Italy          1048     Spain               864
France           967      Luxembourg     242      West Germany        975
Great Britain    956      Netherlands    1016

Religion    Count    Religion          Count
Buddhist    11       Orthodox          1057
Hindu       2        Other             162
Jewish      20       Protestant        2182
Muslim      23       Roman Catholic    5617
None        2692

Age Group      Count
15-24 Years    2401
25-39 Years    3489
40-54 Years    2780
55+ Years      3202
3 Mixed Membership Models
3.1 Previous Work
Many approaches for modeling rank data have been proposed; for a review, see Marden (1995). In this paper, we focus on the Plackett-Luce model due to several attractive attributes (discussed further in Section 4). Assuming population homogeneity, Hunter (2004) develops a minorization-maximization method for estimating MLEs for a single Plackett-Luce distribution, and Guiver and Snelson (2009) present a Bayesian framework for estimation.
Most previous work addressing heterogeneity in rank data considers the univariate case and specifies a mixture model. Gormley and Murphy (2006) assume a mixture of Plackett-Luce distributions, while Busse, Orbanz and Buhmann (2007) assume a mixture of Mallows distributions. In addition, Bayesian nonparametric approaches have been used to allow for an infinite number of latent subgroups (Meila and Chen, 2010) and an infinite set of alternatives (Caron, Teh and Murphy, 2014). Gormley and Murphy (2008) propose a mixture of experts model where individual level covariates specify the probability that an individual belongs to a specific subgroup. However, each of these mixture model approaches assumes that every individual always expresses preferences consistent with only a single subgroup. In many cases this may be overly restrictive.
Gormley and Murphy (2009) propose a mixed membership model for univariate rank data which allows for intra-subgroup mixing between ranking levels. Mixed membership models extend mixture models by allowing an individual's membership to be split among multiple subgroups (e.g. Airoldi et al., 2015). As Gross and Manrique-Vallier (2015, pg 128) point out, the structure of a mixed membership model is consistent with Zaller’s (1992) model of responses to public opinion polls, in which “respondents randomly sample from a number of privately held ‘considerations’ relevant to the question at hand.” Gross and Manrique-Vallier (2015) analyzed data on political ideology, examining core beliefs and values, finding that mixed membership models reveal hidden structures in political ideologies that previous factor analytic results missed.
Direct maximum likelihood estimation of mixed membership models is generally intractable, so MCMC or approximate inference techniques are used. In particular, Gormley and Murphy (2009) develop a Metropolis-within-Gibbs sampler. Although the MCMC method allows for direct sampling from the posterior, it scales poorly. Because of the individual level parameters, the total number of parameters is directly proportional to sample size and the number of variables, which can result in slow mixing even for moderate sample sizes. MCMC methods can also require a considerable amount of human effort since convergence diagnostics must be checked for all parameters (Gill, 2008).
3.2 Contribution of this work
In this article, we propose a variational EM approach which scales well with the number of observed individuals and is capable of tractably estimating mixed membership models for rank data with a much larger sample size than can be handled by existing MCMC methods. Variational inference has been used as an alternative estimation procedure in mixed membership models where MCMC methods would be computationally infeasible (Blei, Ng and Jordan, 2003; Erosheva, Fienberg and Lafferty, 2004; Airoldi et al., 2008) and has been shown empirically to provide results similar to MCMC in some cases (Erosheva, Fienberg and Joutard, 2007). A direct computational comparison between the proposed variational method and the MCMC method detailed by Gormley and Murphy (2009) is provided in the supplement (Wang and Erosheva, 2016).
Motivated by Eurobarometer data on public policy priorities, we extend the method of Gormley and Murphy (2009) to allow for multivariate data and directly estimate the relative frequencies of each subgroup. The direct estimation of the subgroup relative frequencies can be viewed as an empirical Bayes type procedure or robust modeling practice and has been shown to improve predictive performance in many cases (Wang and Blei, 2015). Indeed, we show in the supplement that direct estimation of the subgroup relative frequencies drastically improves the goodness-of-fit for exit poll data gathered during the 1997 Irish Presidential Election (Wang and Erosheva, 2016). Finally, informed by exploratory data analyses, we extend the model specification to include two pre-specified subgroups.
The remainder of the article is structured as follows. We introduce the Plackett-Luce distribution in Section 4 and the mixed membership rank data model in Section 5. In Section 6, we review the variational approximation framework and detail the variational EM algorithm. Finally, we use the proposed method to analyze public policy priorities from the Eurobarometer 34.1 survey in Section 7 and conclude with discussion in Section 8.
4 Modeling Rank Data
First, we consider univariate rank data. Suppose there are $V$ alternatives in the choice set (items to be ranked). For each individual $i$, a single observation $X_i$ is a permutation of $N_i \le V$ of the alternatives in the choice set. Each alternative $v$ (in this context, alternatives correspond to policies) is assigned a nonnegative support parameter $\theta_v$ which governs how strongly it is preferred to other alternatives (Plackett, 1975). The support parameters sum to 1 for identifiability. At each ranking level, one of the remaining alternatives is selected with probability proportional to its support parameter. The mass function for the Plackett-Luce model is defined as
$$P(X_i \mid \theta) = \prod_{n=1}^{N_i} \frac{\theta_{X_{in}}}{1 - \sum_{m=0}^{n-1} \theta_{X_{im}}} \tag{1}$$
where $X_{in}$ indicates the alternative selected at the $n$th ranking level, and $\theta_{X_{i0}} = 0$. The Plackett-Luce selection process can be thought of as a multinomial without replacement, and the support parameters represent the probability of a policy being selected as the top priority.
Although there are various distributions for modeling rank data, the Plackett-Luce model has several attractive properties. First, it can accommodate incomplete rankings when not all alternatives in the choice set are selected. Assuming unranked policies are less preferred than ranked policies, the mass function in Equation 1 has marginalized out any unranked policies. Second, the model satisfies Luce’s Choice Axiom (Luce, 1977) which states that an individual’s relative preference between alternatives $a$ and $b$ should not change when a third alternative is introduced (Sen, 2014). Finally, the parameter space of the Plackett-Luce model is a continuous set in the simplex, while the parameter space of other popular models (notably the Mallows model) may be discrete sets which can greatly complicate inference.
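As a minimal sketch of the sequential selection process behind Equation 1, the following Python function evaluates the Plackett-Luce log-probability of a complete or partial ranking. The function name, the 0-based indexing of alternatives, and the example support values are illustrative assumptions, not part of the original analysis; the support parameters are assumed to be strictly positive for every ranked alternative.

```python
import math

def plackett_luce_logpmf(ranking, theta):
    """Log-probability of a (possibly partial) ranking under Plackett-Luce.

    ranking: 0-based indices of the chosen alternatives, most preferred first.
    theta:   support parameters, assumed positive and summing to 1.
    """
    if len(set(ranking)) != len(ranking):
        raise ValueError("an alternative cannot be ranked twice")
    remaining = sum(theta)  # support still available; equals 1 at the first level
    logp = 0.0
    for v in ranking:
        # Chosen with probability proportional to its support among what is left.
        logp += math.log(theta[v]) - math.log(remaining)
        remaining -= theta[v]  # the chosen alternative leaves the choice set
    return logp

# Hypothetical support parameters for three alternatives.
theta_example = [0.5, 0.3, 0.2]
p_full = math.exp(plackett_luce_logpmf([0, 1, 2], theta_example))  # 0.5 * (0.3 / 0.5) * 1
p_top1 = math.exp(plackett_luce_logpmf([1], theta_example))        # partial ranking: 0.3
```

The partial ranking `[1]` illustrates the marginalization property: unranked alternatives simply never enter the product.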
5 Generative Model: Multivariate Rank Data
We propose the following generative model for multivariate rank data. Assuming $K$ latent subgroups, Dirichlet membership parameter $\alpha$, $J$ variables each with $V_j$ alternatives, and a set of support parameters $\theta_{jk}$ for each variable $j$ and subgroup $k$ where $\sum_{v=1}^{V_j} \theta_{jkv} = 1$, the generative mixed membership model is:

For each individual $i = 1, \dots, T$:

  1. Draw a membership vector $\lambda_i \sim$ Dirichlet($\alpha$).

  2. For each variable $j = 1, \dots, J$ and for each ranking level $n = 1, \dots, N_{ij}$:

     (a) Draw a context indicator $Z_{ijn} \sim$ multinomial(1, $\lambda_i$).

     (b) Draw a policy priority $X_{ijn} \mid X_{ij1}, \dots, X_{ij(n-1)} \sim$ Plackett-Luce($\theta_{jZ_{ijn}}$).

where the conditioning in step (b) denotes that $X_{ijn}$ is dependent on $X_{ijm}$ for $m < n$ since the alternatives selected at prior ranking levels cannot be selected again. The corresponding complete data likelihood and the graphical representation are shown in Equation 2 and in Figure 2.
$$P(X, Z, \lambda \mid \alpha, \theta) = \prod_{i=1}^{T} \left[ \mathrm{Dir}(\lambda_i \mid \alpha) \prod_{j=1}^{J} \prod_{n=1}^{N_{ij}} \prod_{k=1}^{K} \left( \lambda_{ik} \, \frac{\theta_{jkX_{ijn}}}{1 - \sum_{m=0}^{n-1} \theta_{jkX_{ijm}}} \right)^{Z_{ijnk}} \right] \tag{2}$$
Here $Z_{ijnk} = 1$ if $Z_{ijn}$ selects subgroup $k$ and 0 otherwise, and $\theta_{jkX_{ij0}} = 0$. In the model, $\lambda_{ik}$ denotes the degree of membership of individual $i$ within subgroup $k$, and in the Eurobarometer context, indicates an individual’s level of adherence to a policy ideology. $Z_{ijn}$ is the subgroup governing individual $i$’s selection for variable $j$ at ranking level $n$. Note that the model assumes mixing of subgroup preferences occurs both between different variables and within a single observed ranking. Thus, an individual may select their top choice according to the preferences of one subgroup, but select their second choice according to the preferences of another subgroup. We note that, conditional on the membership $\lambda_i$, there is no further dependence enforced on each $Z_{ijn}$ across the ranking levels of a single variable. Although the $Z_{ijn}$ are exchangeable in the generative model, in the posterior conditioned on the observations, the context indicators are no longer exchangeable because of the assumed sequential nature of the ranking procedure.
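The generative process of this section can be sketched as a short simulation for a single individual. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the function name, the nested-list layout `theta[j][k][v]`, and the example values are hypothetical, and all support parameters are assumed strictly positive so that every remaining alternative can be drawn.

```python
import random

def simulate_individual(alpha, theta, n_levels, rng):
    """Simulate one individual's rankings from the mixed membership model.

    alpha:    K Dirichlet parameters.
    theta:    theta[j][k][v] = support of alternative v for variable j, subgroup k.
    n_levels: n_levels[j] = number of alternatives ranked for variable j.
    """
    # Membership vector ~ Dirichlet(alpha), drawn as normalized gamma variates.
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    lam = [x / sum(g) for x in g]
    rankings = []
    for j, levels in enumerate(n_levels):
        remaining = list(range(len(theta[j][0])))
        ranking = []
        for _ in range(levels):
            # Context indicator: which subgroup governs this ranking level.
            k = rng.choices(range(len(alpha)), weights=lam)[0]
            # Choose among the remaining alternatives, proportional to support.
            weights = [theta[j][k][v] for v in remaining]
            v = rng.choices(remaining, weights=weights)[0]
            ranking.append(v)
            remaining.remove(v)  # cannot be selected again at later levels
        rankings.append(ranking)
    return lam, rankings

# Hypothetical example: K = 2 subgroups, J = 2 variables with 3 alternatives each.
theta_sim = [
    [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]],
    [[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]],
]
lam_sim, rankings_sim = simulate_individual([0.5, 0.5], theta_sim, [3, 2], random.Random(0))
```

Because a new context indicator is drawn at every ranking level, a single simulated ranking can mix the preferences of several subgroups, which is the feature that distinguishes this model from a finite mixture.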
6 Variational EM Approach
6.1 Variational Approximation
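Calculating the marginal probability $P(X \mid \alpha, \theta)$ requires marginalizing over the simplicial membership vectors $\lambda$ and context indicators $Z$. Because this calculation is intractable, we use a mean-field variational method which approximates the true posterior for latent variables $\lambda$ and $Z$. A detailed tutorial of variational inference is provided by Wainwright and Jordan (2008).
The variational distribution is
$$Q(\lambda, Z \mid \phi, \delta) = \prod_{i=1}^{T} \left[ \mathrm{Dir}(\lambda_i \mid \phi_i) \prod_{j=1}^{J} \prod_{n=1}^{N_{ij}} \mathrm{multinomial}(Z_{ijn} \mid 1, \delta_{ijn}) \right] \tag{3}$$
with variational parameters $\phi$ and $\delta$ ($\phi_{ik} > 0$ and $\delta_{ijn}$ lies in the $K-1$ dimension simplex). Because it factors easily into functions of the variational parameters, this approximation facilitates tractable computation via the Variational EM algorithm. We first derive a lower bound on the marginal distribution of the observed rankings using Jensen’s inequality.
$$\begin{aligned} \log P(X \mid \alpha, \theta) &= \log \int \sum_{Z} P(X, Z, \lambda \mid \alpha, \theta) \, d\lambda \\ &= \log \mathbb{E}_Q \left[ \frac{P(X, Z, \lambda \mid \alpha, \theta)}{Q(\lambda, Z \mid \phi, \delta)} \right] \\ &\ge \mathbb{E}_Q \left[ \log P(X, Z, \lambda \mid \alpha, \theta) \right] - \mathbb{E}_Q \left[ \log Q(\lambda, Z \mid \phi, \delta) \right] \end{aligned} \tag{4}$$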
The last line in Equation 4 is often called the Evidence Lower Bound (ELBO) and is a function of the data, the variational parameters, $\phi$ and $\delta$, as well as the global parameters $\alpha$ and $\theta$. It can be shown that maximizing the ELBO with respect to the variational parameters $\phi$ and $\delta$ minimizes the KL-divergence between the true posterior and the variational distribution. In addition, fixing the variational parameters and maximizing the lower bound with respect to $\alpha$ and $\theta$ can be used as a surrogate procedure for selecting maximum likelihood estimates for $\alpha$ and $\theta$ (Beal, 2003). Ultimately, by maximizing the lower bound (the ELBO), we simultaneously find pseudo-MLE estimates for $\alpha$ and $\theta$ and an approximate posterior distribution for the latent membership and context variables $\lambda$ and $Z$. This is accomplished by iterating between E-steps and M-steps as shown in Algorithm 1. The lower bound is given in Equation 5, but the derivation is left for the appendix.
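$$\begin{aligned} \mathrm{ELBO} = {} & \sum_{i=1}^{T} \left[ \log \Gamma\Big(\sum_{k=1}^{K} \alpha_k\Big) - \sum_{k=1}^{K} \log \Gamma(\alpha_k) + \sum_{k=1}^{K} (\alpha_k - 1) \Big( \psi(\phi_{ik}) - \psi\Big(\sum_{l=1}^{K} \phi_{il}\Big) \Big) \right] \\ & + \sum_{i=1}^{T} \sum_{j=1}^{J} \sum_{n=1}^{N_{ij}} \sum_{k=1}^{K} \delta_{ijnk} \Big( \psi(\phi_{ik}) - \psi\Big(\sum_{l=1}^{K} \phi_{il}\Big) \Big) \\ & + \sum_{i=1}^{T} \sum_{j=1}^{J} \sum_{n=1}^{N_{ij}} \sum_{k=1}^{K} \delta_{ijnk} \left[ \log \theta_{jkX_{ijn}} - \log \Big( 1 - \sum_{m=0}^{n-1} \theta_{jkX_{ijm}} \Big) \right] \\ & - \sum_{i=1}^{T} \left[ \log \Gamma\Big(\sum_{k=1}^{K} \phi_{ik}\Big) - \sum_{k=1}^{K} \log \Gamma(\phi_{ik}) + \sum_{k=1}^{K} (\phi_{ik} - 1) \Big( \psi(\phi_{ik}) - \psi\Big(\sum_{l=1}^{K} \phi_{il}\Big) \Big) \right] \\ & - \sum_{i=1}^{T} \sum_{j=1}^{J} \sum_{n=1}^{N_{ij}} \sum_{k=1}^{K} \delta_{ijnk} \log \delta_{ijnk} \end{aligned} \tag{5}$$
$\Gamma$ denotes the gamma function; $\psi$ denotes the digamma function, the derivative of $\log \Gamma$.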
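The connection between the ELBO and the KL-divergence noted in Section 6.1 follows from a standard identity in variational inference, written here as a short worked equation. The symbol names ($\lambda$, $Z$ for the latent membership vectors and context indicators, $\phi$, $\delta$ for the variational parameters, $Q$ for the variational distribution) are our notational assumptions.

```latex
\log P(X \mid \alpha, \theta)
  = \underbrace{\mathbb{E}_Q\left[\log P(X, Z, \lambda \mid \alpha, \theta)\right]
      - \mathbb{E}_Q\left[\log Q(\lambda, Z \mid \phi, \delta)\right]}_{\text{ELBO}}
  + \mathrm{KL}\left( Q(\lambda, Z \mid \phi, \delta) \,\middle\|\, P(\lambda, Z \mid X, \alpha, \theta) \right)
```

The left-hand side does not depend on $\phi$ or $\delta$, and the KL term is nonnegative, so for fixed global parameters any increase in the ELBO is an equal decrease in the KL-divergence between the variational distribution and the true posterior.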
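6.2 E-Step: Update $\phi$ and $\delta$
The E-step maximizes the lower bound with respect to the individual level parameters $\phi$ and $\delta$. Taking the derivative of the lower bound yields the following updates for $\phi$ and $\delta$:
$$\begin{aligned} \phi_{ik} &= \alpha_k + \sum_{j=1}^{J} \sum_{n=1}^{N_{ij}} \delta_{ijnk} \\ \delta_{ijnk} &\propto \exp \left\{ \psi(\phi_{ik}) - \psi\Big(\sum_{l=1}^{K} \phi_{il}\Big) + \log \theta_{jkX_{ijn}} - \log \Big( 1 - \sum_{m=0}^{n-1} \theta_{jkX_{ijm}} \Big) \right\} \end{aligned} \tag{6}$$
where $\delta_{ijn}$ (a multinomial parameter) is normalized to sum to 1. We continue updating each parameter in a coordinate ascent procedure until the relative increase in the ELBO is below a specified tolerance.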
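The coordinate ascent of the E-step can be sketched for a single individual as follows. This is a minimal illustration under assumptions, not the authors' code: it uses the standard mean-field updates for a Dirichlet membership vector and multinomial context indicators, it stops when the Dirichlet variational parameters stabilize (a simpler stand-in for the relative ELBO criterion described above), and the function names, data layout `theta[j][k][v]`, and small `digamma` helper are hypothetical.

```python
import math

def digamma(x):
    """Digamma for x > 0: shift x upward, then apply the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def e_step(rankings, alpha, theta, tol=1e-10, max_iter=500):
    """Mean-field updates for one individual; returns (phi, delta)."""
    K = len(alpha)
    # Plackett-Luce log term for every (variable, ranking level) and subgroup k.
    loglik = []
    for j, ranking in enumerate(rankings):
        for n, v in enumerate(ranking):
            row = []
            for k in range(K):
                used = sum(theta[j][k][u] for u in ranking[:n])
                row.append(math.log(theta[j][k][v]) - math.log(1.0 - used))
            loglik.append(row)
    # Initialize as if all context probabilities were uniform.
    phi = [a + len(loglik) / K for a in alpha]
    delta = []
    for _ in range(max_iter):
        e_log_lam = [digamma(p) - digamma(sum(phi)) for p in phi]
        delta = []
        for row in loglik:
            logits = [e_log_lam[k] + row[k] for k in range(K)]
            top = max(logits)  # subtract the max for numerical stability
            w = [math.exp(x - top) for x in logits]
            delta.append([x / sum(w) for x in w])
        new_phi = [alpha[k] + sum(d[k] for d in delta) for k in range(K)]
        converged = max(abs(a - b) for a, b in zip(phi, new_phi)) < tol
        phi = new_phi
        if converged:
            break
    return phi, delta

# Hypothetical example: one variable, two subgroups with opposite preferences.
theta_demo = [[[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]]
phi_demo, delta_demo = e_step([[0, 1, 2]], alpha=[0.5, 0.5], theta=theta_demo)
```

In this example the observed ranking puts alternative 0 first, so the fitted membership leans toward the first subgroup, while the final ranking level (where only one alternative remains) carries no information about subgroup membership.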
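6.3 M-Step: Update $\alpha$ and $\theta$
The M-Step, described in Algorithm 2, fixes the variational parameters $\phi$ and $\delta$, and selects $\alpha$, the Dirichlet parameter for the membership vectors, and $\theta$, the Plackett-Luce support parameters, to maximize the lower bound on the marginal log-likelihood. For both parameters, there are no closed form solutions so we use iterative updates.
For $\alpha$, we use a Newton-Raphson method to maximize the lower bound.
$$\begin{aligned} \frac{\partial \, \mathrm{ELBO}}{\partial \alpha_k} &= T \Big[ \psi\Big(\sum_{l=1}^{K} \alpha_l\Big) - \psi(\alpha_k) \Big] + \sum_{i=1}^{T} \Big[ \psi(\phi_{ik}) - \psi\Big(\sum_{l=1}^{K} \phi_{il}\Big) \Big] \\ \frac{\partial^2 \, \mathrm{ELBO}}{\partial \alpha_k \, \partial \alpha_l} &= T \Big[ \psi'\Big(\sum_{m=1}^{K} \alpha_m\Big) - \mathbb{1}\{k = l\} \, \psi'(\alpha_k) \Big] \end{aligned} \tag{7}$$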
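A Newton-Raphson update for the Dirichlet parameter can be sketched as below. The gradient and Hessian are those of the Dirichlet terms of the lower bound; the linear-time solve exploits the "diagonal plus constant" structure of the Hessian, a standard trick from the Dirichlet estimation literature that we adopt here as an assumption, as is the step-halving safeguard that keeps every component positive. Function names and the special-function helpers are hypothetical.

```python
import math

def digamma(x):
    """Digamma for x > 0 via upward recurrence and an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def trigamma(x):
    """Trigamma for x > 0 via upward recurrence and an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return result + inv + 0.5 * inv2 + inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 / 42))

def newton_alpha(alpha, phi, n_iter=100, tol=1e-10):
    """Maximize the alpha-dependent part of the lower bound given phi (T x K)."""
    T, K = len(phi), len(alpha)
    # Sufficient statistics: sum over individuals of E_Q[log lambda_ik].
    s = [sum(digamma(p[k]) - digamma(sum(p)) for p in phi) for k in range(K)]
    alpha = list(alpha)
    for _ in range(n_iter):
        a0 = sum(alpha)
        g = [T * (digamma(a0) - digamma(a)) + s[k] for k, a in enumerate(alpha)]
        h = [-T * trigamma(a) for a in alpha]  # diagonal part of the Hessian
        z = T * trigamma(a0)                   # constant added to every entry
        # Solve (diag(h) + z * 11^T) step = g in O(K) via Sherman-Morrison.
        c = sum(gk / hk for gk, hk in zip(g, h)) / (1.0 / z + sum(1.0 / hk for hk in h))
        step = [(gk - c) / hk for gk, hk in zip(g, h)]
        scale = 1.0
        while any(a - scale * d <= 0 for a, d in zip(alpha, step)):
            scale *= 0.5  # shrink the step until alpha stays positive
        alpha = [a - scale * d for a, d in zip(alpha, step)]
        if max(abs(scale * d) for d in step) < tol:
            break
    return alpha

# Hypothetical check: if every individual has the same phi, the optimum is alpha = phi.
alpha_hat = newton_alpha([1.0, 1.0, 1.0], [[2.0, 3.0, 1.5]] * 5)
```

The closing example works because the gradient vanishes when the digamma differences of the Dirichlet parameter match the averaged sufficient statistics, which happens exactly at the common value of the variational parameters.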
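Since $\theta$ is subject to the following constraints $\sum_{v=1}^{V_j} \theta_{jkv} = 1$ and $\theta_{jkv} \ge 0$ for $v = 1, \dots, V_j$, we use an interior point method to select an optimal $\theta$ (Nocedal and Wright, 1999).
Because the constraints are only enforced on each individual set $\theta_{jk}$ and the objective function separates into additive terms (with respect to the $\theta_{jk}$), we can select $\theta_{jk}$ for each $j$ and $k$ separately by solving the minimization problem:
$$\min_{\theta_{jk}} \; -\sum_{i=1}^{T} \sum_{n=1}^{N_{ij}} \delta_{ijnk} \left[ \log \theta_{jkX_{ijn}} - \log \Big( 1 - \sum_{m=0}^{n-1} \theta_{jkX_{ijm}} \Big) \right] + B(\theta_{jk}) \quad \text{subject to} \quad \sum_{v=1}^{V_j} \theta_{jkv} = 1 \tag{8}$$
where $B(\theta_{jk}) = 0$ if $\theta_{jkv} \ge 0$ for all $v$ and $B(\theta_{jk}) = \infty$ otherwise (i.e., the nonnegativity constraint on $\theta_{jk}$ has been converted into a penalty term which assigns infinite loss to infeasible points). Because $B$ is not a smooth function of $\theta_{jk}$, we approximate it with the smooth function $-\frac{1}{t} \sum_{v=1}^{V_j} \log \theta_{jkv}$ for a barrier parameter $t > 0$ and solve the relaxed minimization problem instead.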
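For notational ease, we use $f$ to denote the objective function (the objective of Equation 8 with $B$ replaced by its smooth approximation); $\nabla^2 f$ denotes the Hessian of $f$; $\nabla f$ denotes the gradient of $f$ and $\mathbf{1}$ denotes a row vector of 1’s with length $V_j$.
$$\begin{aligned} \min_{\theta_{jk}} \quad & f(\theta_{jk}) \\ \text{subject to} \quad & \mathbf{1} \theta_{jk} = 1 \end{aligned} \tag{9}$$
Satisfying the Karush-Kuhn-Tucker conditions with the remaining equality constraint yields the update direction $\Delta\theta_{jk}$ for $\theta_{jk}$, where
$$\Delta\theta_{jk} = -(\nabla^2 f)^{-1} \left( \nabla f + \nu \mathbf{1}^{\top} \right), \qquad \nu = -\frac{\mathbf{1} (\nabla^2 f)^{-1} \nabla f}{\mathbf{1} (\nabla^2 f)^{-1} \mathbf{1}^{\top}} \tag{10}$$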
Because the Newton step in Equation 10 uses a quadratic approximation of the objective function, the proposed increment may be ill-sized. If the step size is too large, the increment may actually lead to a larger value of the objective function or infeasible updates where some $\theta_{jkv} < 0$. Thus, we use a backtracking line search, detailed in Algorithm 3, to ensure that each update will always increase the lower bound and remain in the feasible set.
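A backtracking line search of the kind described above can be sketched as follows. Algorithm 3 itself is not reproduced in this section, so this is only an assumed, generic version: it shrinks the step until the candidate is strictly positive and satisfies a sufficient-decrease (Armijo) condition on the minimization objective. The function name, the shrink factor, the Armijo constant, and the toy objective are illustrative assumptions.

```python
import math

def backtracking_step(f, theta, direction, slope, shrink=0.5, c=1e-4, max_halvings=50):
    """Shrink a proposed step until it is feasible and decreases f.

    f:         objective being minimized (the negative lower-bound terms plus barrier).
    direction: proposed update direction; should sum to 0 to stay on the simplex.
    slope:     directional derivative of f along `direction` (negative for descent).
    """
    step = 1.0
    f0 = f(theta)
    for _ in range(max_halvings):
        cand = [t + step * d for t, d in zip(theta, direction)]
        # Accept only strictly positive points with a sufficient decrease in f.
        if min(cand) > 0 and f(cand) <= f0 + c * step * slope:
            return cand
        step *= shrink
    return list(theta)  # no acceptable step found; keep the current point

# Toy objective on the 2-simplex with hypothetical weights [3, 1].
def toy_objective(th):
    return -3.0 * math.log(th[0]) - 1.0 * math.log(th[1])

theta_old = [0.5, 0.5]
# Gradient at theta_old is [-6, -2]; along direction [1, -1] the slope is -4.
theta_new = backtracking_step(toy_objective, theta_old, [1.0, -1.0], slope=-4.0)
```

In the toy example the full step would leave the simplex, so it is halved twice before being accepted. Decreasing the minimization objective here corresponds to increasing the lower bound in the original maximization.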
6.4 Algorithm Discussion
For numerical stability, we first solve the minimization for a small value of the barrier parameter $t$, and then use that solution to initialize subsequent minimizations with increasingly larger values of $t$ (Nocedal and Wright, 1999). In addition, if a particular variable has a large number of options, inverting the Hessian can become computationally expensive. A quasi-Newton or gradient ascent procedure may require less overall computation by avoiding large matrix inversions. We also note that the subroutines for $\alpha$ and each $\theta_{jk}$ can be computed completely in parallel which may be fruitful if the number of subgroups or variables is large.
As with almost all variational methods, the objective function is multimodal, so only convergence to a local maximum is guaranteed. Using available prior knowledge can be very helpful in determining reasonable initializations for a specific problem; however, multiple initialization points are recommended to increase the probability of finding the global maximum.
We found empirically that initializing $\alpha$ and $\theta$ with the following heuristically driven two step procedure generally resulted in stationary points with a larger ELBO. Using a random initialization of $\alpha$ and $\theta$ and setting all values of $\phi$ and $\delta$ to the same fixed value, iterate the variational EM procedure until reaching a stationary point $\hat\alpha$ and $\hat\theta$. Then, use the resulting global parameters $\hat\alpha$ and $\hat\theta$ to initialize a second run (with $\phi$ and $\delta$ reset to that fixed value). The result of the first run is only a stationary point with respect to all the parameters (both global and individual), so resetting the $\phi$ and $\delta$ parameters will generally result in a new stationary point where the global parameters $\alpha$ and $\theta$ are different than the intermediate initialization points.
Because the dimension of even just the $\alpha$ and $\theta$ parameters can be quite large, a huge number of random restarts may be needed to explore the space well when selecting uniformly. We posit that using this two step procedure to find initialization points concentrates the search in areas where the ELBO is likely to be larger.
7 Eurobarometer Analysis
We now analyze rank data from the Eurobarometer 34.1 survey (Reif and Melich, 2001). We removed individuals with missing data (i.e. anyone who did not respond to all 3 questions of interest) and individuals who had reported ties in any of their rankings, leaving 11,872 individuals of the original 12,733.
7.1 Model Selection
Table 1 shows the top 5 observed rankings for each question. In particular, we observe that the response which ranks the policy priorities in the exact order of presentation is the most common pattern for drug priorities and is also the and most common pattern for alcohol and AIDS. If some individuals used this ordering out of convenience, the resulting responses may not be informative of true policy preferences. To capture this tendency, we include a subgroup whose preferences correspond with the presentation-ordered permutation. Following Gormley and Murphy (2006), we also include a “noise” subgroup whose preferences are uniform across all policy priorities. We fix the support parameters, $\theta$, for these groups, but estimate their relative frequencies (the corresponding elements of $\alpha$). This approach is similar to the extended grade of membership model (Erosheva, Fienberg and Joutard, 2007) which also models specific response patterns with unusually high counts; however, here we allow for partial membership in the fixed subgroups, whereas Erosheva, Fienberg and Joutard (2007) assume that some individuals are full members of the fixed subgroups.
We use a held out ELBO procedure to select an appropriate number of subgroups $K$. We randomly split the sample in half to create a training set and test set. For each candidate $K \ge 3$ (we do not include $K = 2$ because that would only be the 2 fixed groups), we first fit a model to the training set and select global parameters $\hat\alpha$ and $\hat\theta$. Then, to compute the held out ELBO on the test set, we use a single E-step which fits the individual variational parameters $\phi$ and $\delta$ for the test set given the $\hat\alpha$ and $\hat\theta$ from the training set. We use 40 different initialization points at each $K$ and select the stationary point across all $K$ with the highest resulting held out ELBO. We then used the stationary point selected by the procedure to initialize a final run with the results presented below.
To ensure that the model interpretation is not dependent on the selected training/test set, we repeated this procedure with 3 different training/test splits. For each batch, the held out ELBO values do vary widely within a fixed $K$ due to the multimodality of the ELBO. However, we see the same trend in all 3 cases; the largest held out ELBO values for each $K$ are somewhat close and peak at either 4 or 5 and the maximum held out ELBO values decrease rapidly as $K$ increases beyond 6. Of the three batches, the first batch selects a 5 subgroup model (including the two fixed groups) and the other two batches select a 4 subgroup model. Figure 3 shows that the two largest subgroups of the first batch (5 subgroup model) are very similar in structure to the two non-fixed subgroups of the second and third batches (4 subgroup models). Since the 5 subgroup model has the largest ELBO, we describe that model in the remainder of this article.
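The held out selection loop described above can be sketched generically. The callables `fit` and `heldout_elbo` are hypothetical placeholders for a variational EM fit on the training set and a single E-step evaluation on the test set; neither name comes from the original analysis, and the sketch only shows the bookkeeping across candidate numbers of subgroups and initializations.

```python
def select_by_heldout(train, test, candidates, fit, heldout_elbo, n_init=40):
    """Return (score, n_subgroups, params) with the highest held out score.

    fit(train, n_subgroups, seed)  -> fitted global parameters (hypothetical callable)
    heldout_elbo(test, params)     -> held out lower bound     (hypothetical callable)
    """
    best = None
    for n_subgroups in candidates:
        for seed in range(n_init):
            # Each seed stands for a different random initialization point.
            params = fit(train, n_subgroups, seed)
            score = heldout_elbo(test, params)
            if best is None or score > best[0]:
                best = (score, n_subgroups, params)
    return best
```

Because the objective is multimodal, the comparison is made across every initialization at every candidate size rather than on a single fit per size.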
7.2 Goodness of Fit
To check goodness-of-fit, we generate 1000 simulated data sets using the fitted values $\hat\alpha$ and $\hat\theta$. Figure 4 shows that the model captures the general trend of observed counts for the first place rankings of each variable. These plots are not quite posterior predictive checks because $\hat\alpha$ and $\hat\theta$ are fixed so the variability of the simulated outcomes is smaller than if $\alpha$ and $\theta$ were also considered random quantities.
7.3 Model Interpretation
Table 3 presents the ratio of the estimates and uniform support parameters (i.e., $\hat\theta_{jkv} / (1/V_j) = V_j \hat\theta_{jkv}$). Thus, the reported values represent how many times more likely a full member of a subgroup would be to select a specific policy as their top priority compared to an individual selecting policies randomly. A value larger than 1 indicates that the policy is more popular than average for the variable and subgroup, and a value less than 1 indicates that the policy is less popular than average. The logarithms of these values are also represented in Figure 5 where priorities favored more than average have a positive bar height and priorities favored less than average have a negative bar height. Furthermore, within each subgroup, priorities are sorted by estimated support allowing readers to more easily characterize subgroup preferences.
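The ratio reported in Table 3 is simple arithmetic, sketched here for one hypothetical vector of estimated support parameters (the values below are illustrative and not taken from the fitted model). The function name is also an assumption.

```python
import math

def support_ratios(theta_hat):
    """Ratio of each estimated support parameter to the uniform value 1/V."""
    n_alternatives = len(theta_hat)
    return [t * n_alternatives for t in theta_hat]

# Hypothetical estimates for a variable with 4 alternatives.
ratios = support_ratios([0.5, 0.25, 0.125, 0.125])
log_ratios = [math.log(r) for r in ratios]  # positive if favored more than average
```

Since the support parameters sum to 1, the ratios for a given variable and subgroup sum to the number of alternatives, which gives a quick consistency check on a table of such values.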
Subgroup 1 generally favors punitive policies. For illegal drugs, the top two priorities are “Punish dealers” and “Penalize users,” and subgroup 1 is 44 times more likely to select “Punish dealers” than the least favored option of “Funding research.” Similarly, for alcohol, the most popular policies are “Stricter penalties for offenses” and “Restricting sale.” Although “Ostracizing alcoholics” is the least popular policy for all subgroups, in subgroup 1, it is only 55 times less likely than the top priority while it is 260 times less popular than the top option for subgroup 3 and numerically zero for subgroup 2 (for alcohol, there are 10 options and at most 5 ranking levels, so it may be possible for an option to appear extremely infrequently or not at all). For AIDS, although the two punitive options (“Punish behavior” and “Isolate patients”) are the least favored policies, subgroup 1 is only roughly 3.5 times less likely to select these two options relative to the most popular option of “Funding research.”
Subgroup 2 generally prioritizes “Information campaigns.” For illegal drugs, “Information campaigns” are roughly 1.7 times more likely to be selected as the top priority than the second most popular option of “Punishing dealers” and 14 times more likely than the least popular option “Penalize users.” For alcohol, “Information campaigns” are 4 times more likely than the second most popular alternative “Rehabilitate alcoholics”. The least popular option is “Ostracizing alcoholics”, with an estimated support parameter that is numerically zero; the two other least popular policies are “Increasing taxes” and “Lowering limits.” The dislike for these options is consistent with the idea of limited government social intervention. For AIDS, subgroup 2 is the only subgroup for which “Funding research” is not the most popular option, with “Information campaigns” roughly 1.3 times more likely than “Funding research.”
Finally, subgroup 3 typically supports rehabilitation and treatment, as well as funding research. For illegal drugs, although the most popular policy is “Treating addicts,” “Punishing dealers” and “Addressing social causes” are also popular policies. For alcoholism, “Rehabilitation” is by far the most popular policy with “Funding research” and “Increasing resources for rehabilitation” as the only other options with substantial support. For AIDS, this subgroup expresses strong support for “Treating AIDS” and “Funding research.”
Broadly speaking, the identified groups are consistent with the typical Left (liberal) vs Right (conservative) political ideology archetypes. The focus on punitive measures is consistent with a right leaning approach towards governance while the focus on information and rehabilitation typifies a more left leaning approach (Cavadino and Dignan, 2006).
Table 3: Ratio of estimated support parameters to uniform support parameters (bootstrap 95% confidence intervals in parentheses).

                      Subgroup 1              Subgroup 2              Subgroup 3
                      Punitive Subgroup       Info Subgroup           Rehab Subgroup
Drug Policy
  Inform Public       0.191 (0.175, 0.209)    2.249 (2.17, 2.324)     0.745 (0.694, 0.8)
  Punish Dealers      5.719 (5.623, 5.808)    1.338 (1.27, 1.404)     1.411 (1.304, 1.527)
  Penalize Users      0.344 (0.316, 0.376)    0.157 (0.147, 0.167)    0.225 (0.204, 0.246)
  Treat Addicts       0.162 (0.149, 0.176)    1.103 (1.064, 1.144)    1.456 (1.373, 1.542)
  Fund Research       0.132 (0.122, 0.143)    0.489 (0.469, 0.509)    0.966 (0.913, 1.017)
  Social Causes       0.251 (0.231, 0.272)    1.197 (1.149, 1.247)    1.386 (1.313, 1.461)
  Control Medicine    0.202 (0.186, 0.219)    0.468 (0.449, 0.489)    0.81 (0.767, 0.853)
Alcohol Policy
  Inform Public       0.648 (0.609, 0.689)    4.565 (4.421, 4.709)    0.324 (0.296, 0.355)
  Penalize Off        4.269 (4.104, 4.432)    0.722 (0.681, 0.767)    0.192 (0.168, 0.218)
  Ban Ads             0.85 (0.798, 0.904)     0.744 (0.702, 0.787)    0.104 (0.089, 0.121)
  Inc Taxes           0.541 (0.509, 0.576)    0.25 (0.23, 0.271)      0.045 (0.036, 0.055)
  Rest Sale           2.086 (1.994, 2.181)    0.7 (0.662, 0.741)      0.391 (0.347, 0.438)
  Low Limits          0.472 (0.445, 0.5)      0.243 (0.224, 0.262)    0.134 (0.117, 0.154)
  Ostze Alcs          0.074 (0.065, 0.084)    0.00 (0.00, 0.00)       0.022 (0.017, 0.028)
  Rehab Alcs          0.535 (0.505, 0.563)    1.128 (1.085, 1.171)    5.194 (5.011, 5.38)
  Fund Research       0.27 (0.25, 0.288)      0.787 (0.751, 0.822)    1.788 (1.69, 1.887)
  Inc Resources       0.254 (0.234, 0.274)    0.862 (0.825, 0.898)    1.805 (1.718, 1.893)
AIDS Policy
  Inform Public       1.026 (0.984, 1.067)    2.314 (2.234, 2.393)    0.646 (0.598, 0.7)
  Punish behavior     0.521 (0.495, 0.548)    0.079 (0.072, 0.086)    0.251 (0.228, 0.276)
  Isolate Patients    0.537 (0.508, 0.57)     0.072 (0.065, 0.079)    0.199 (0.176, 0.225)
  Treat AIDS          1.031 (0.993, 1.068)    0.74 (0.705, 0.775)     1.607 (1.524, 1.685)
  Fund Research       1.885 (1.819, 1.951)    1.796 (1.725, 1.869)    2.296 (2.195, 2.4)
As shown in Table 4, the small magnitude of $\hat\alpha$, the Dirichlet membership parameter, suggests relatively low levels of intra-individual mixing. However, the modal grade of membership in Figure 6 shows that a quarter of all individuals still exhibit significant intra-individual mixing. We also see that the nontrivial relative frequency estimates of the non-informative fixed groups justify their inclusion in the analysis.
Table 4: Estimated Dirichlet membership parameters and subgroup relative frequencies (intervals in parentheses).

              $\hat\alpha_k$           Relative Frequency
Subgroup 1    0.05 (0.048, 0.053)      0.322 (0.314, 0.329)
Subgroup 2    0.048 (0.047, 0.051)     0.31 (0.302, 0.318)
Subgroup 3    0.024 (0.023, 0.026)     0.154 (0.149, 0.16)
Subgroup 4    0.014 (0.013, 0.015)     0.088 (0.084, 0.092)
Subgroup 5    0.02 (0.019, 0.021)      0.126 (0.122, 0.131)
The Dirichlet distribution for $\lambda_i$ implicitly enforces negative dependence between subgroup memberships, although positive dependencies could be modeled using distributions considered by Blei and Lafferty (2005). However, the magnitude of correlations between subgroup memberships is still informative. In Table 5, the estimated membership in the informative subgroups is more strongly correlated with membership in the other informative subgroups (the least negative correlation between subgroups 1-3 is -.29) than membership in the fixed groups (the most negative correlation between subgroups 1-3 vs subgroups 4-5 is -.25). This is not surprising because subgroups 1-3 indicate a particular ideology on public policy, while subgroups 4 and 5 essentially represent the lack of preferences which align with the dominant subgroups.
Subgroup 1  Subgroup 2  Subgroup 3  Subgroup 4  Subgroup 5  
Subgroup 1  1.00  0.55  0.29  0.25  0.07 
Subgroup 2  0.55  1.00  0.36  0.25  0.18 
Subgroup 3  0.29  0.36  1.00  0.16  0.03 
Subgroup 4  0.25  0.25  0.16  1.00  0.04 
Subgroup 5  0.07  0.18  0.03  0.04  1.00 
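The negative dependence implied by a Dirichlet distribution can be made explicit with its standard moments. As a reference sketch, write $\lambda = (\lambda_1, \dots, \lambda_K) \sim \mathrm{Dirichlet}(\alpha_1, \dots, \alpha_K)$ with $\alpha_0 = \sum_k \alpha_k$; the symbols $\lambda$ and $\alpha_0$ are notation assumed here for illustration.

```latex
\[
\operatorname{Cov}(\lambda_k, \lambda_l)
  = -\frac{\alpha_k \alpha_l}{\alpha_0^{2}(\alpha_0 + 1)},
\qquad
\operatorname{Corr}(\lambda_k, \lambda_l)
  = -\sqrt{\frac{\alpha_k \alpha_l}{(\alpha_0 - \alpha_k)(\alpha_0 - \alpha_l)}},
\qquad k \neq l .
\]
```

These are correlations under the Dirichlet distribution itself. The correlations in Table 5 are computed from estimated individual memberships, so they need not match these values.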
7.4 Uncertainty Estimates
Because the global parameters (the Dirichlet membership parameter and the support parameters) were selected through a pseudo-MLE procedure, there are no readily available model-based standard errors; however, we estimate standard errors via an empirical bootstrap procedure. For each bootstrap sample, we select 11,872 individuals with replacement from the observed sample and use the variational EM procedure to obtain pseudo-MLE estimates of the global parameters. Each bootstrap run is initialized at the same starting points used for the full model. This initialization avoids overestimating variability due to convergence to different stationary points of the multimodal objective function and seeks to capture only sampling variability. To form 95% confidence intervals, we take the 0.025 and 0.975 quantiles of the bootstrapped estimates.
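The resampling scheme can be illustrated with a generic percentile bootstrap. The sketch below is not the authors' code: the function name `percentile_bootstrap_ci`, the toy data, and the use of the sample mean as the estimator are all assumptions made for illustration. In the actual analysis the estimator would be a full variational EM fit started from the same initial values as the full-data fit.

```python
import random
import statistics


def percentile_bootstrap_ci(data, estimator, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for estimator(data)."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        # Resample n individuals with replacement from the observed sample.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        # Stand-in for refitting the model on the resampled individuals.
        estimates.append(estimator(resample))
    estimates.sort()
    # Simple order-statistic quantiles of the bootstrapped estimates.
    tail = (1.0 - level) / 2.0
    lower = estimates[int(tail * (n_boot - 1))]
    upper = estimates[int(round((1.0 - tail) * (n_boot - 1)))]
    return lower, upper


# Hypothetical toy data standing in for the survey responses.
toy_rng = random.Random(1)
toy_data = [toy_rng.gauss(0.0, 1.0) for _ in range(500)]
ci_low, ci_high = percentile_bootstrap_ci(toy_data, statistics.mean)
print(f"95% bootstrap CI for the mean: ({ci_low:.3f}, {ci_high:.3f})")
```

Fixing the starting point of every refit, as the text describes, is what keeps the spread of the bootstrapped estimates from also reflecting jumps between local optima.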
7.5 Multivariate vs Univariate Model
We also acknowledge the implicit decision to use a multivariate model instead of fitting a univariate model for each question. Under univariate models, subgroup membership for each individual is estimated independently of the responses to other questions. By contrast, in a multivariate specification, individuals can still exhibit a different mix of subgroups across each question and ranking level, but the posterior estimates are shrunk towards the individual’s overall membership vector.
As a sensitivity analysis, we fit univariate models for each question. Fewer subgroups may be necessary for a univariate model than for a multivariate model, but since the univariate models are only used to validate the structure of subgroups identified with multivariate data, we fit models with 5 subgroups and do not repeat the model selection procedure. The estimated support parameters do not differ substantially from the multivariate model, but the estimated membership parameters differ across univariate models in an informative way. Table 6 shows a much higher proportion of membership in the information subgroup (subgroup 2) for the AIDS univariate model than we see in the drugs univariate, alcohol univariate, or full multivariate models. This suggests that, on average, individuals have a stronger preference for “Information campaigns” to address AIDS, which is not seen as strongly in addressing alcohol or drugs. As expected, the relative frequencies when averaged across all three univariate models are similar to the relative frequencies of the full model.
Subgroup 1  Subgroup 2  Subgroup 3  Subgroup 4  Subgroup 5  
Drug Univariate Est  0.031  0.020  0.015  0.010  0.011 
Alcohol Univariate Est  0.062  0.042  0.042  0.021  0.027 
AIDS Univariate Est  0.006  0.060  0.018  0.014  0.012 
Full Model Est  0.050  0.048  0.024  0.014  0.020 
Drug Univariate Rel Freq  0.362  0.229  0.169  0.113  0.128 
Alcohol Univariate Rel Freq  0.321  0.217  0.218  0.107  0.138 
AIDS Univariate Rel Freq  0.052  0.549  0.160  0.126  0.113 
Full Model Rel Freq  0.322  0.310  0.154  0.088  0.126 
Average Univariate Relative Frequency  0.259  0.319  0.181  0.115  0.127 
7.6 Individual Membership Estimates
We examine two specific individuals that illustrate the richness of description afforded by using a mixed membership model.
Table 7 shows the observed responses from a 68-year-old British male. For addressing drugs, his responses follow the presentation ranking, but for alcohol and AIDS, he indicates a preference for research and rehabilitation. We estimate the membership for this man to be 66% subgroup 3 (Rehab and Research) and 34% subgroup 5 (Presentation Ordering). Because the mixed membership framework allows for intra-individual mixing, the rank ordered response for drug policy is attributed to the noninformative subgroup. This contrasts with a finite mixture model approach, which would otherwise include this noisy response for drug policy in the estimates for subgroup 3.
Drug Policy  Alcohol Policy  AIDS Policy  

Priority 1  Inform Public  Fund Research  Isolate Patients 
Priority 2  Punish Dealers  Increase Resources  Treat AIDS 
Priority 3  Penalize Users  Rehabilitate Alcoholics  Fund Research 
Priority 4  Treat Addicts  Ban Advertisements  Inform Public 
Priority 5  Fund Research  Inform Public  Punish behavior 
Priority 6  Fight Social Causes  
Priority 7  Control Medicine 
Table 8 shows the responses of a 40-year-old Spanish female. Her perspectives on alcohol policy differ drastically from her preferences for drugs and AIDS. We see that her top policies for drugs and AIDS are highly punitive, but information campaigns and rehabilitation are preferred for alcoholism. These different perspectives are captured in the model with an estimated membership of 63% in subgroup 1 and 37% in subgroup 2.
Drug Policy  Alcohol Policy  AIDS Policy  

Priority 1  Punish Dealers  Inform Public  Punish behavior 
Priority 2  Penalize Users  Rehabilitate Alcoholics  Isolate Patients 
Priority 3  Treat Addicts  Restrict Sale  Treat AIDS 
Priority 4  Inform Public  Ban Advertisements  Fund Research 
Priority 5  Fight Social Causes  Increase Resources  Inform Public 
Priority 6  Control Medicine  
Priority 7  Fund Research 
7.7 Membership by Demographic Subgroup
The broad interpretation of our results agrees with previous studies which have identified demographic characteristics associated with general dispositions toward penal ideology. To examine these demographic trends clearly, we filter out individuals whose membership in subgroups 4 and 5 (the noninformative subgroups) is over 50%. This leaves 10,448 of the 11,872 original individuals. We then examine the conditional membership of the remaining individuals in subgroups 1, 2 and 3 (i.e., their membership in these three subgroups renormalized to sum to one).
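The filtering and conditioning step can be sketched as follows. This is an illustrative reading of the procedure, not the authors' code: the function name and the toy membership vectors are hypothetical, and the sketch assumes the 50% threshold applies to the summed membership in subgroups 4 and 5.

```python
def conditional_membership(memberships, informative=(0, 1, 2), fixed=(3, 4),
                           threshold=0.5):
    """Drop individuals dominated by the fixed groups, then renormalize.

    `memberships` is a list of length-5 membership vectors that each sum to 1.
    """
    kept = []
    for lam in memberships:
        # Total membership in the noninformative (fixed) subgroups 4 and 5.
        fixed_mass = sum(lam[k] for k in fixed)
        if fixed_mass > threshold:
            continue
        # Renormalize membership over the three informative subgroups.
        informative_mass = sum(lam[k] for k in informative)
        kept.append([lam[k] / informative_mass for k in informative])
    return kept


# Two vectors echo the individuals discussed in Section 7.6; the third is
# a hypothetical respondent dominated by the fixed groups.
toy_memberships = [
    [0.00, 0.00, 0.66, 0.00, 0.34],
    [0.63, 0.37, 0.00, 0.00, 0.00],
    [0.10, 0.10, 0.10, 0.30, 0.40],
]
kept_toy = conditional_membership(toy_memberships)
print(kept_toy)
```

After this step every retained vector sums to one over subgroups 1 to 3, which is why the rows of the demographic tables that follow sum to approximately one.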
We first examine a self-reported measure of Left vs Right political ideology. Individuals were asked: “In political matters, people talk of the left or the right. How would you place your views on this scale?” In the recorded scale, 1 indicates far left and 10 indicates far right. This is not a perfect analog since each individual likely responded in reference to their national definition of “center” whereas the subgroup membership estimated from the rank data is a global measure. Nonetheless, we see a highly significant Spearman’s rank-order correlation of 0.15 (p-value < 2e-16) between the self-reported Left vs Right score from Eurobarometer 34.1 and membership in the “Punitive subgroup.”
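For readers unfamiliar with the statistic, Spearman's rank-order correlation is the Pearson correlation computed on ranks, with tied values sharing their average rank. The sketch below implements it from scratch on hypothetical toy values. The variable names and data are assumptions for illustration and are not drawn from the survey.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy values: a 1-10 left-right score and punitive membership.
left_right = [1, 3, 3, 5, 7, 8, 10]
punitive = [0.10, 0.05, 0.30, 0.25, 0.20, 0.60, 0.55]
rho = spearman(left_right, punitive)
print(f"Spearman correlation on toy data: {rho:.3f}")
```

Because the left-right scale takes only ten values, ties are common in the real data, so the average-rank treatment matters in practice.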
Religion  Attendance  Subgroup 1  Subgroup 2  Subgroup 3 

None  NA  0.36  0.45  0.19 
Orthodox  Irregular  0.21  0.58  0.21 
Orthodox  Regular  0.30  0.47  0.23 
Protestant  Irregular  0.40  0.40  0.20 
Protestant  Regular  0.53  0.31  0.16 
Roman Catholic  Irregular  0.40  0.39  0.21 
Roman Catholic  Regular  0.47  0.34  0.18 
We also examine the average membership across religious affiliation. Some speculate that Anglo-Saxon cultures are particularly punitive because of Protestant religions with strong Calvinistic overtones (Tonry, 2007) or fundamentalist beliefs (Grasmick et al., 1992). As shown in Table 2, almost all individuals in the survey report their religion as either Roman Catholic, None, Protestant or Orthodox. In addition to denomination, the Eurobarometer also recorded how often an individual attends religious services. We collapse the original categories of “Several times a week” and “Once a week” to a single “Regular attendance” category and collapse “Few times a year,” “Once a Year,” and “Never” into an “Irregular attendance” category. Table 9 shows that those who attend religious services regularly have a much higher average membership in subgroup 1. Also, Roman Catholics and Protestants are much more likely to belong to subgroup 1 than individuals who report no religion or Orthodox Christians. We note that roughly 90% of the individuals who responded as Orthodox Christians were Greek and roughly 97% of Greek respondents reported their religion as Orthodox Christianity. This confounding may be the cause of the particularly low subgroup 1 membership for Orthodox Christians.
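The category collapse and group averaging described above can be sketched in a few lines. The mapping follows the categories named in the text. The function name, record layout, and toy values are hypothetical and serve only to illustrate the computation behind a table such as Table 9.

```python
from collections import defaultdict

# Collapse of the original attendance categories, as described in the text.
ATTENDANCE_GROUP = {
    "Several times a week": "Regular",
    "Once a week": "Regular",
    "Few times a year": "Irregular",
    "Once a Year": "Irregular",
    "Never": "Irregular",
}


def mean_membership_by_group(records):
    """Average subgroup 1 membership by (religion, collapsed attendance)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for religion, attendance, subgroup1 in records:
        key = (religion, ATTENDANCE_GROUP[attendance])
        totals[key] += subgroup1
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Hypothetical records: (religion, attendance, membership in subgroup 1).
toy_records = [
    ("Protestant", "Once a week", 0.60),
    ("Protestant", "Several times a week", 0.50),
    ("Protestant", "Never", 0.40),
    ("Roman Catholic", "Few times a year", 0.35),
    ("Roman Catholic", "Once a Year", 0.45),
]
group_means = mean_membership_by_group(toy_records)
for key, value in sorted(group_means.items()):
    print(key, round(value, 3))
```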
In addition, we examine the average estimated membership across levels of education. The Eurobarometer asks “How old were you when you finished full-time education?” The average membership in subgroup 1 (punitive) decreases steadily as education increases, a finding that is consistent with previous research on industrialized countries, including Western Europe (e.g. Mayhew and Van Kesteren, 2002; Kitschelt and Rehm, 2014).
Last Age of Formal Education  Subgroup 1  Subgroup 2  Subgroup 3 

16 or Less  0.47  0.33  0.19 
17 to 19  0.40  0.40  0.20 
20 to 21  0.33  0.48  0.20 
22 or older  0.26  0.54  0.20 
At the national level, the average memberships are also consistent with qualitative characterizations of national policy. The United Kingdom and Ireland have high average memberships in subgroup 1 (punitive), while Denmark and France are among a cluster of countries with low average memberships in subgroup 1. These findings are generally consistent with previous research using the International Crime Victimization Surveys, which finds punitive attitudes in the United Kingdom and Ireland, and nonpunitive attitudes in Denmark and France (e.g. Roberts, 2013). In slight contrast to that work, we find Belgium to have a relatively high membership in subgroup 1, and the Netherlands to have a relatively low membership.
8 Discussion
In this article, we propose a mixed membership model for multivariate rank data and develop a variational EM estimation approach that is a computationally attractive alternative to fully Bayesian estimation for large-scale rank data. Mixed membership models provide valuable insights into latent structure within a heterogeneous population and allow for a richer description when compared to previous mixture model approaches. When MCMC is tractable for smaller data sets as in Gormley and Murphy (2009), the results provide direct samples from the posterior. Nevertheless, the demands placed on human and computer time to conduct such an analysis can be substantial, and scalability of MCMC methods is poor. Ultimately, a mixed membership analysis of larger data sets necessitates other approaches. Of course, what actually qualifies as “large scale” or “big data” is dependent on the complexity of analysis. In rank data, the complexity quickly grows as the number of variables and alternatives increases.
In addition to the computational gains, the proposed method extends the method of Gormley and Murphy (2009) to explicitly fit the Dirichlet membership parameter $\alpha$. Unless there is strong prior knowledge about subgroup sizes, this extension can result in better fitting models by directly capturing from the data differences in the subgroup structure and the level of intragroup mixing. A direct comparison of both goodness of fit and computational effort is provided in the supplement (Wang and Erosheva, 2016).
To accommodate multivariate ranked data, our model makes the simplest assumption that all context indicators are drawn from the same multinomial distribution governed by a single membership vector for each individual. An alternative and more complex model might include an additional layer of hierarchy between the individual's membership vector and the context indicators for each variable. This would allow context indicators from separate variables to be drawn from different distributions, while still respecting the multivariate structure.
There are drawbacks, however, to the variational approximation. Because of the multimodal objective function, many random restarts should be used. Prior knowledge can be used to select good initialization points, but finding a global maximum is not guaranteed. We propose a two-step procedure for initialization, but addressing multimodality through stochastic optimization methods (Bottou, 2010) or placing strong priors on the support parameters to induce “smoothness” in the ELBO are two natural extensions.
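The role of random restarts can be illustrated on a toy problem. The sketch below is a generic multi-start local search, not the initialization procedure proposed in this article: the one-dimensional `objective` is a hypothetical stand-in for the ELBO, and `hill_climb` is a deliberately simple optimizer chosen so that the example stays self-contained.

```python
import math
import random


def objective(x):
    """Toy multimodal objective (a stand-in for the ELBO, not the paper's)."""
    return math.sin(3.0 * x) - 0.1 * (x - 1.0) ** 2


def hill_climb(x, step=0.5, tol=1e-8):
    """Local search: move while the objective improves, else halve the step."""
    while step > tol:
        best = max((x - step, x, x + step), key=objective)
        if best == x:
            step /= 2.0
        else:
            x = best
    return x


def multi_start(n_restarts=20, seed=0):
    """Run the local search from random starting points and keep the best."""
    rng = random.Random(seed)
    optima = [hill_climb(rng.uniform(-5.0, 5.0)) for _ in range(n_restarts)]
    return max(optima, key=objective)


single = hill_climb(4.0)   # one start, may stop at a poor local optimum
best = multi_start()       # best of several random starts
print(f"single start: x = {single:.3f}, objective = {objective(single):.3f}")
print(f"multi start:  x = {best:.3f}, objective = {objective(best):.3f}")
```

Even with many restarts the best run is only the best local optimum found, which mirrors the caveat in the text that a global maximum is not guaranteed.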
Also, unlike a full Bayesian specification, the variational EM method does not provide a posterior for the global parameters. Frequentist uncertainty estimates, however, can still be achieved through a bootstrap procedure, but each bootstrapped model must be carefully initialized to avoid overestimating variability. To our knowledge, bootstrapping with variational estimation has not been previously used in the existing literature.
As with any mixture or mixed membership model, selecting the number of subgroups is difficult. Our model selection procedure involves cross-validation of the held-out ELBO. This procedure, however, can be complicated by the multimodality of the objective function and the selected model might depend on the specific test and training sets. BIC procedures are also widely used although the theoretical justification does not hold in mixed membership models (Airoldi et al., 2015). Alternative approaches include stability-based measures (Lange et al., 2002), direct goodness-of-fit measures (Cohen and Mallows, 1983), and nonparametric model extensions such as those based on Dirichlet processes (Teh et al., 2006).
Analyzing the Eurobarometer 34.1 data, we find three informative policy preference subgroups as well as substantial support for a uniform ranking group and a presentation-ordered group. The three informative subgroups primarily favor punitive policies, information campaigns, and rehabilitation and research, respectively. When comparing subgroup membership to educational, religious and national demographic information, we see trends which generally agree with the existing literature. In particular, fewer years of formal education and more religious participation are generally associated with more punitive attitudes towards social issues. In addition, at the national level, average subgroup membership roughly agrees with previous characterizations of national punitive attitudes.
Finally, our analysis has implications for survey development. Because of a sizable presentation-ordered subgroup in our analysis, we recommend randomizing the presentation of choices when collecting rank data to decrease bias due to noninformative responses where respondents rank choices by simply following the presentation order. We also note that the variable with the largest proportion of presentation-ordered responses is the question regarding illegal drugs, which also happens to allow up to 7 rankings, while the other two questions only allow up to 5 ranking levels. This observation naturally leads to speculation of whether decreasing the number of ranking levels and cognitive load may ultimately lead to more “informative” responses.
Although our analysis focused on issues within political science, sociology, and public health, multivariate rank data can elicit and capture a rich representation of individual preferences in many other fields. We believe that the proposed methodology will be of broad interest. Psychologists, economists, other social scientists, and marketing professionals who analyze large-scale rank data can rely on the proposed methodology to represent large-scale ranked preferences with realistic models which are still parsimonious and easily interpretable.
References
Airoldi, E. M., Blei, D. M., Fienberg, S. E. and Xing, E. P. (2008). Mixed membership stochastic blockmodels. Journal of Machine Learning Research 9 1981–2014. doi:10.1145/1390681.1442798
Airoldi, E. M., Blei, D. M., Erosheva, E. A. and Fienberg, S. E. (2015). Introduction to mixed membership models and methods. In Handbook of mixed membership models and their applications. Chapman & Hall/CRC Handb. Mod. Stat. Methods 3–13. CRC Press, Boca Raton, FL. MR3380022
Beal, M. J. (2003). Variational algorithms for approximate Bayesian inference. PhD thesis, University College London, UK.
Blei, D. M. and Lafferty, J. D. (2005). Correlated topic models. In Advances in Neural Information Processing Systems 18 [Neural Information Processing Systems, NIPS 2005, December 5–8, 2005, Vancouver, British Columbia, Canada] 147–154.
Blei, D. M., Ng, A. Y. and Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research 3 993–1022.
Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT 2010 177–186. Springer.
Brooks, C. and Manza, J. (2008). Why welfare states persist: the importance of public opinion in democracies. University of Chicago Press.
Burstein, P. (1998). Bringing the public back in: should sociologists consider the impact of public opinion on public policy? Social Forces 77 27–62.
Busse, L. M., Orbanz, P. and Buhmann, J. M. (2007). Cluster analysis of heterogeneous rank data. In Machine Learning, Proceedings of the Twenty-Fourth International Conference (ICML 2007), Corvallis, Oregon, USA, June 20–24, 2007 113–120. doi:10.1145/1273496.1273511
Caron, F., Teh, Y. W. and Murphy, T. B. (2014). Bayesian nonparametric Plackett-Luce models for the analysis of preferences for college degree programmes. Ann. Appl. Stat. 8 1145–1181. doi:10.1214/14-AOAS717 MR3262549
Cavadino, M. and Dignan, J. (2006). Penal policy and political economy. Criminology and Criminal Justice 6 435–456.
Cohen, A. and Mallows, C. L. (1983). Assessing goodness of fit of ranking models to data. The Statistician 361–374.
Erosheva, E., Fienberg, S. and Lafferty, J. (2004). Mixed-membership models of scientific publications. Proceedings of the National Academy of Sciences of the United States of America 101 5220–5227.
Erosheva, E. A., Fienberg, S. E. and Joutard, C. (2007). Describing disability through individual-level mixture models for multivariate binary data. Ann. Appl. Stat. 1 502–537. doi:10.1214/07-AOAS126 MR2415745
Gill, J. (2008). Is partial-dimension convergence a problem for inferences from MCMC algorithms? Political Analysis 16 153–178.
Gormley, I. C. and Murphy, T. B. (2006). Analysis of Irish third-level college applications data. J. Roy. Statist. Soc. Ser. A 169 361–379. doi:10.1111/j.1467-985X.2006.00412.x MR2225548
Gormley, I. C. and Murphy, T. B. (2008). A mixture of experts model for rank data with applications in election studies. Ann. Appl. Stat. 2 1452–1477. doi:10.1214/08-AOAS178 MR2655667
Gormley, I. C. and Murphy, T. B. (2009). A grade of membership model for rank data. Bayesian Anal. 4 265–295. doi:10.1214/09-BA410 MR2507364
Grasmick, H. G., Davenport, E., Chamlin, M. B. and Bursik, R. J. (1992). Protestant fundamentalism and the retributive doctrine of punishment. Criminology 30 21–46.
Gross, J. H. and Manrique-Vallier, D. (2015). A mixed membership approach to the assessment of political ideology from survey responses. In Handbook of mixed membership models and their applications. Chapman & Hall/CRC Handb. Mod. Stat. Methods 119–139. CRC Press, Boca Raton, FL. MR3380027
Guiver, J. and Snelson, E. (2009). Bayesian inference for Plackett-Luce ranking models. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009, Montreal, Quebec, Canada, June 14–18, 2009 377–384. doi:10.1145/1553374.1553423
Hunter, D. R. (2004). MM algorithms for generalized Bradley-Terry models. Ann. Statist. 32 384–406. doi:10.1214/aos/1079120141 MR2051012
Kitschelt, H. and Rehm, P. (2014). Occupations as a site of political preference formation. Comparative Political Studies 47 1670–1706.
Lange, T., Braun, M. L., Roth, V. and Buhmann, J. M. (2002). Stability-based model selection. In Advances in Neural Information Processing Systems 15 [Neural Information Processing Systems, NIPS 2002, December 9–14, 2002, Vancouver, British Columbia, Canada] 617–624.
Luce, R. D. (1977). The choice axiom after twenty years. J. Mathematical Psychology 15 215–233. MR0462675
Marden, J. I. (1995). Analyzing and modeling rank data. Monographs on Statistics and Applied Probability 64. Chapman & Hall, London. MR1346107
Mayhew, P. and Van Kesteren, J. (2002). Cross-national attitudes to punishment. Changing Attitudes to Punishment 63–92.
Meila, M. and Chen, H. (2010). Dirichlet process mixtures of generalized Mallows models. In UAI 2010, Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence, Catalina Island, CA, USA, July 8–11, 2010 358–367.
Nocedal, J. and Wright, S. J. (1999). Numerical optimization. Springer Series in Operations Research. Springer-Verlag, New York. doi:10.1007/b98874 MR1713114
Plackett, R. L. (1975). The analysis of permutations. J. Roy. Statist. Soc. Ser. C Appl. Statist. 24 193–202. MR0391338
Reif, K. and Melich, A. (2001). Eurobarometer 34.1: Health Problems, Fall 1990. doi:10.3886/ICPSR09577.v1
Roberts, J. V. (2013). Public opinion and the nature of community penalties: international findings. Changing Attitudes to Punishment 33.
Sen, A. K. (2014). Collective choice and social welfare 11. Elsevier.
R Core Team (2016). R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Teh, Y. W., Jordan, M. I., Beal, M. J. and Blei, D. M. (2006). Hierarchical Dirichlet processes. J. Amer. Statist. Assoc. 101 1566–1581. doi:10.1198/016214506000000302 MR2279480
Tonry, M. (2007). Determinants of penal policies. Crime and Justice 36 1–48.
Wainwright, M. J. and Jordan, M. I. (2008). Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning 1 1–305. doi:10.1561/2200000001
Wang, C. and Blei, D. M. (2015). A general method for robust Bayesian modeling. arXiv preprint arXiv:1510.05078.
Wang, Y. S. and Erosheva, E. A. (2015). mixedMem: tools for discrete multivariate mixed membership models. R package version 1.1.2.
Wang, Y. S. and Erosheva, E. A. (2016). Supplement to “A Variational EM method for mixed membership models with multivariate rank Data: an analysis of public policy preferences”.
Zaller, J. (1992). The nature and origins of mass opinion. Cambridge University Press.
Appendix
Derivation of Lower Bound on Marginal LogLikelihood (ELBO)
The derivation of the lower bound from equation 4 is shown here. The lower bound is
The expressions below involve the gamma function, the digamma function, and the Plackett-Luce mass function of variable j. Each observation consists of the ranking levels recorded for individual i on variable j, with one alternative selected by individual i for variable j at each ranking level n. Note that for all multinomial mass functions shown below, the size is 1.
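As a reminder of what the Plackett-Luce mass function computes, the sketch below evaluates the probability of a ranking for a single set of support parameters: at each ranking level an alternative is chosen among those not yet ranked, with probability proportional to its support parameter. The function name and toy parameters are hypothetical. In the mixed membership model each ranking level may be governed by a different subgroup, so this single-subgroup version is only the building block, not the full likelihood.

```python
from itertools import permutations


def plackett_luce_probability(ranking, support):
    """Probability of a (possibly partial) ranking under Plackett-Luce.

    `ranking` lists the chosen alternatives from most to least preferred;
    `support` maps every available alternative to a positive parameter.
    """
    remaining = dict(support)
    probability = 1.0
    for choice in ranking:
        # Choose among the alternatives not yet ranked, with probability
        # proportional to their support parameters.
        probability *= remaining[choice] / sum(remaining.values())
        del remaining[choice]
    return probability


# Hypothetical support parameters for three alternatives.
toy_support = {"A": 2.0, "B": 1.0, "C": 1.0}
p_full = plackett_luce_probability(["A", "B", "C"], toy_support)  # 2/4 * 1/2 * 1
p_top = plackett_luce_probability(["A"], toy_support)             # 2/4
total = sum(plackett_luce_probability(order, toy_support)
            for order in permutations(toy_support))
print(p_full, p_top, total)
```

Only the ratios of the support parameters matter, so rescaling all of them by a common positive constant leaves every ranking probability unchanged.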
We consider each piece of the lower bound separately. The log likelihood for the complete data is
(11)  
The expectation of the first term with respect to the variational distribution Q becomes
(12)  
The expectation of the second term with respect to the variational distribution Q becomes
(13)  
The third term is
(14)  
Now for the second term of the ELBO
(15)  