Topic Modeling on Health Journals with Regularized Variational Inference
Abstract.
Topic modeling enables exploration and compact representation of a corpus. The CaringBridge (CB) dataset is a massive collection of journals written by patients and caregivers during a health crisis. Topic modeling on the CB dataset, however, is challenging due to the asynchronous nature of multiple authors writing about their health journeys. To overcome this challenge we introduce the Dynamic Author-Persona topic model (DAP), a probabilistic graphical model designed for temporal corpora with multiple authors. The novelty of the DAP model lies in its representation of authors by a persona — where personas capture the propensity to write about certain topics over time. Further, we present a regularized variational inference algorithm, which we use to encourage the DAP model’s personas to be distinct. Our results show significant improvements over competing topic models — particularly after regularization — and highlight the DAP model’s unique ability to capture common journeys shared by different authors.
1. Introduction
Topic models can compactly represent large collections of documents by the themes running through them. We introduce a topic model designed for the unique challenges presented by the CaringBridge (CB) dataset. The CB dataset includes journals written by patients and caregivers during a health crisis. CB journals function like a blog, and are shared to a private community of friends and family. The full dataset includes 13.1 million journals written by approximately half a million authors between 2006 and 2016. From the CB dataset we’re interested in capturing health journeys, that is, authors writing about the same topics over time.
The challenges in topic modeling on the CB dataset stem from the asynchronous nature of authors’ posts. Specifically, authors start and stop journaling at different times — both in terms of calendar dates and how far along they are in their health journey. Additionally, authors post at irregular frequencies. While about 15% of CB authors post nearly every day, the majority of authors post less frequently, often corresponding to a major update, event, or anniversary of an event. What’s more, the length of these posts can range from just a few words to thousands of words.
State-of-the-art topic models can identify topics (Blei et al., 2003), track how topics change over time (Blei and Lafferty, 2006; Wang and McCallum, 2006; Wei et al., 2007; Wang et al., 2008), or associate authors with certain topics (Rosen-Zvi et al., 2004; Steyvers et al., 2004; McCallum et al., 2005; Mimno and McCallum, 2007). These models cannot, however, describe common narratives and the authors sharing them. We present the Dynamic Author-Persona topic model (DAP), a novel approach that represents authors by latent personas. Personas act as a soft clustering on authors based on their propensity to write about similar topics over time. Our approach is unique in multiple respects. First, unlike in other temporal topic models, the words making up a topic don’t evolve over time — rather, DAP’s personas reflect the flow of conversation from one topic to the next. Second, we introduce a regularized variational inference (RVI) algorithm, an approach we use to encourage personas to be distinct from one another.
Our results show that the DAP model outperforms competing topic models, producing better likelihoods on heldout data. Finally, we demonstrate that using RVI further improves the DAP model’s performance, and results in personas that are rich and compelling descriptions of the health journeys experienced by CB authors.
The rest of the paper is organized as follows: Section 2 gives a brief background on temporal topic models. Section 3 presents the DAP model. Section 4 details the model’s RVI algorithm. Section 5 introduces the evaluation dataset and procedure. Section 6 shares the results of the experiments. Finally, in Section 7 we summarize the contributions of this paper.
2. Background
Much of the research on topic modeling builds on the latent Dirichlet allocation (LDA) model (Blei et al., 2003). The LDA model doesn’t account for meta-information like authorship or time. Nevertheless, interest in LDA has endured, in part, due to its ability to richly describe topics as distributions over words and documents as mixtures of topics. In the years since LDA’s introduction, others have extended the idea to complement corpora with a variety of structures and metadata.
Author information is common in many corpora. A few topic models are designed to identify authors’ preferences for certain topics, and the relationships between authors (Rosen-Zvi et al., 2004; Steyvers et al., 2004; McCallum et al., 2005; Mimno and McCallum, 2007; Pathak et al., 2008). Corpora with a temporal structure, such as scientific journals or newspaper articles, are the focus of many temporal topic models (Blei and Lafferty, 2006; Wang and McCallum, 2006; Wei et al., 2007; Wang et al., 2008).
Temporal Topic Models. Two topic models set the standard of comparison for topic modeling on corpora with a temporal element: the dynamic topic model (DTM) (Blei and Lafferty, 2006) and the topics over time (TOT) model (Wang and McCallum, 2006). These two models represent very different approaches to modeling time in a topic model.
The TOT model defines time as an observed variable, which leads to a continuous treatment of time and the ability to predict timestamps of documents. Alternatively, the DTM evolves topics over time using a Markov process. In many corpora the evolution of topics provides interesting insights. For example, Blei’s model of the Science corpus shows words associated with a topic on physics changing over a century.
Building directly on the DTM, in 2008 Wang et al. developed the continuous time dynamic topic model (CDTM), which uses continuous Brownian motion to model the evolution of topics over time (Wang et al., 2008). This is a major development in temporal topic models because, unlike the DTM, it doesn’t require partitioning the data into discrete time periods. Instead, the model assumes that at each time step the variance in the topic proportions increases in proportion to the time elapsed since the previous document. Similar to Wang et al., the Dynamic Mixture Model (DMM) is built for continuous streams of text (Wei et al., 2007). In the DMM, however, topics are fixed in time and the model captures the evolution of document-level topic proportions over time.
Topic Modeling of Health Journeys. In many topic modeling applications to temporal corpora, the time component is ignored. For example, Wen et al. model cancer event trajectories from users of an online forum for breast cancer support (Wen and Rose, 2012). Wen’s approach uses LDA to extract cancer event keywords, which are then linked together in time by temporal descriptions mined from the text. This work demonstrates a quantitative approach to studying the dynamics of social support networks, and offers a powerful look at the experiences of users in these support networks.
Numerous studies have shown that support networks, both in person and online, are valuable tools for those suffering from chronic conditions or life-threatening illness, and for their caregivers (Wen et al., 2011; Rodgers and Chen, 2005; Beaudoin and Tao, 2007). Additionally, online social networks can serve as a way to efficiently disseminate information regarding someone’s status to their community. Understanding the health journeys of users in these social support communities is valuable information for improving user experience. Topic models are uniquely suited to succinctly describing and analyzing these health journeys.
3. The DAP Model
The DAP model was designed with journaling behavior in mind. Consider a CB author journaling about their surgery: initially they may write about topics related to the surgical procedure, but as time progresses the author is more likely to discuss recovery, physical therapy, or returning to normal life. In other words, the likelihood of a topic for some document depends on where the document’s author is in their health journey. As such, DAP assumes that (1) a state space model controls the likelihood of a topic at each time step, (2) each persona represents a different flow of topics over time, and (3) each author has a distribution over personas.
The DAP’s approach to modeling topics in a document, and words in a topic, follows the correlated topic model (CTM) and LDA, respectively (Lafferty and Blei, 2006; Blei et al., 2003). The idea of modeling latent personas was originally proposed by Mimno and McCallum (2007); however, in their Author-Persona Topic model (APT) personas differ significantly from those proposed in the DAP model. First, in the APT model each author may have a different number of personas, whereas DAP models each author as a distribution over a fixed number of personas — which acts as a soft clustering on authors. Second, APT assigns a persona to documents, which indirectly defines a topic distribution for each cluster of documents, whereas we model documents as each having their own distribution over topics. Lastly, while DAP’s personas also correspond to a distribution over topics, DAP evolves these topic distributions over time — thereby capturing the inherent temporal structure resulting from an author writing multiple documents.
The DAP model directly addresses the challenges presented by the CB dataset. First, the asynchronous nature of health journals is handled by: (1) transforming each journal’s timestamp to the time elapsed since the author’s first post, and (2) learning multiple personas to account for a wide variety of topic trajectories. Second, irregular posting behavior is managed by employing the Brownian motion model, first used in topic modeling by Wang et al. (2008), to model topic variance as proportional to the gap in time between documents.
The generative process of the model is described below. The model assumes that each document in the corpus has a timestamp associated with it. Similar to the CDTM (Wang et al., 2008), timestamps are used in a continuous Brownian motion model to capture an increase in topic variance as time between observations increases. More formally, if $s_t$ and $s_{t-1}$ are the timestamps at steps $t$ and $t-1$, then we use the shorthand $\Delta_{s_t} = s_t - s_{t-1}$ to denote the difference in time between timestamps $s_t$ and $s_{t-1}$. For brevity the variance is denoted $\sigma^2 \Delta_{s_t}$, where $\sigma^2$ is a known process noise in the state space model.

(1) Draw a distribution over words $\beta_k \sim \text{Dir}(\eta)$ for each topic $k = 1, \ldots, K$.

(2) Draw a distribution over personas $\kappa_a \sim \text{Dir}(\omega)$ for each author $a = 1, \ldots, A$.

(3) For each persona $p = 1, \ldots, P$, draw an initial distribution over topics: $\alpha_{p,0} \sim \mathcal{N}(\mu_0, \Sigma_0)$.

(4) For each time step $t$, where $t = 1, \ldots, T$:

  (a) Draw a distribution over topics: $\alpha_{p,t} \sim \mathcal{N}(\alpha_{p,t-1}, \Sigma_t)$.

  (b) Update $\Sigma_t$ according to the Brownian motion model: $\Sigma_t = \sigma^2 \Delta_{s_t} I$.

  (c) For each document $d$, where $d = 1, \ldots, D_t$:

    (i) Choose a persona indicator $x_d \sim \text{Mult}(\kappa_{a_d})$, where $a_d$ corresponds to the author of document $d$.

    (ii) Draw a topic distribution $\theta_d \sim \mathcal{N}(\alpha_t x_d, \sigma^2 I)$ for document $d$.

    (iii) For each word $n$, where $n = 1, \ldots, N_d$:

      (A) Choose a word-topic indicator $z_{d,n} \sim \text{Mult}(\pi(\theta_d))$.

      (B) Choose a word $w_{d,n} \sim \text{Mult}(\beta_{z_{d,n}})$, a multinomial probability conditioned on the topic indicator $z_{d,n}$.
Following the approach in the CTM and DTM, we use the function $\pi(\cdot)$ to map the Logistic Normal variable $\theta$, parameterized by a mean and covariance, to the multinomial’s parameters via $\pi(\theta)_k = \exp(\theta_k) / \sum_{j=1}^{K} \exp(\theta_j)$ in order to obey the constraint that the parameters lie on the simplex.
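As a concrete sketch, the mapping described above is a softmax applied to a Gaussian draw. The dimensions, mean, and covariance below are illustrative values, not the model's fitted parameters:

```python
import numpy as np

def pi(theta):
    # pi(theta)_k = exp(theta_k) / sum_j exp(theta_j): maps a Logistic Normal
    # draw onto the probability simplex.
    e = np.exp(theta - theta.max())  # subtract the max for numerical stability
    return e / e.sum()

# Draw theta from a Logistic Normal with illustrative mean mu and covariance
# Sigma, then map it to multinomial topic proportions.
rng = np.random.default_rng(0)
K = 4                                  # number of topics (illustrative)
mu, Sigma = np.zeros(K), 0.5 * np.eye(K)
theta = rng.multivariate_normal(mu, Sigma)
topic_proportions = pi(theta)
```

The stabilized exponentiation does not change the result, since the softmax is invariant to subtracting a constant from every coordinate.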
The graphical model corresponding to this process is shown in Figure 1. In LDA and its extensions the parameter $\alpha$ represents a prior probability of each topic. In the DAP model, $\alpha$ takes on an expanded role: it’s a distribution over topics at time step $t$ for persona $p$. The reason for letting $\alpha$ evolve over time, as opposed to $\beta$ as in the DTM, is that in a collection of journals there is less interest in changes to the topics themselves. In other words, we model the words associated with a topic as static in time, but the topics an author writes about will change over time.
4. Variational EM Algorithm
Given the model structure, we next derive an inference algorithm used to estimate the model’s latent parameters. Much like LDA and its extensions, the DAP model’s posterior:

$p(\alpha, \kappa, x, \theta, z \mid w) = \dfrac{p(\alpha, \kappa, x, \theta, z, w)}{p(w)}$

is intractable due to the normalization term $p(w)$. In order to learn optimal values for the model’s parameters we use a form of variational inference (VI), which approximates the difficult-to-compute posterior distribution $p$ with a simpler distribution $q$ (see Blei et al., 2016 for a review). Variational inference casts an inference problem as an optimization problem with the goal of finding parameters to the variational distribution $q$ such that $q$ closely approximates $p$. Our regularized variational inference (RVI) algorithm seeks a distribution $q^*$ such that
(1) $q^* = \operatorname{arg\,min}_{q} \; \mathrm{KL}\big(q \,\|\, p\big) + \lambda\, r(q)$

where $\mathrm{KL}(\cdot \,\|\, \cdot)$ is KL-divergence. The added term $r(\cdot)$ is a regularization function we’ve introduced to discourage similar personas (further detail given in Section 4.2), and $\lambda$ is the corresponding hyperparameter.
To make $q$ easy to compute, we apply mean field variational inference, which assumes that the parameters are a posteriori independent. Under the mean field assumption the variational distribution factorizes as:
(2) $q(\alpha, \kappa, x, \theta, z) = \prod_{p=1}^{P}\prod_{t=1}^{T} q(\alpha_{p,t} \mid \hat{\alpha}_{p,t}) \prod_{a=1}^{A} q(\kappa_a \mid \delta_a) \prod_{t=1}^{T}\prod_{d=1}^{D_t} q(x_d \mid \lambda_d)\, q(\theta_d \mid \gamma_d, \hat{v}^2 I) \prod_{n=1}^{N_d} q(z_{d,n} \mid \phi_{d,n})$

where we have introduced the following variational parameters: the persona distribution for each author is endowed with a free Dirichlet parameter $\delta_a$; each assignment of a persona to an author is endowed with a free multinomial parameter $\lambda_d$; in the variational distribution of $\alpha$ the sequential structure is kept intact with variational observations $\hat{\alpha}_{p,t}$; each document-topic proportion vector is endowed with a free mean $\gamma_d$. The variances for the document-topic parameters are $\sigma^2$ and $\hat{v}^2$, for the model and variational parameter, respectively; each word-topic indicator is endowed with a free multinomial parameter $\phi_{d,n}$.
An optimal $q^*$ cannot be computed directly, but following Jordan et al. (1999) an optimization of the variational parameters proceeds by maximizing a bound on the log-likelihood of the data. In the DAP model, data is observed as words $w_{d,n}$ for each document $d$ at time step $t$, hence we rewrite the log-likelihood of the data by:

(3) $\log p(w) = \log \int \sum_{x,z} p(w, \alpha, \kappa, x, \theta, z)\, d\alpha\, d\kappa\, d\theta \;\geq\; \mathbb{E}_q\big[\log p(w, \alpha, \kappa, x, \theta, z)\big] - \mathbb{E}_q\big[\log q(\alpha, \kappa, x, \theta, z)\big]$
The inequality in (3) follows from Jensen’s inequality. Moreover, it can be shown that the difference between $\log p(w)$ and the bound in (3) is $\mathrm{KL}(q \,\|\, p)$. Hence maximizing the bound in (3) is equivalent to minimizing the KL divergence between the variational and true posteriors. We denote the Evidence Lower BOund (ELBO) by $\mathcal{L}$. Since our objective defined in (1) includes a regularization term, we therefore maximize a surrogate likelihood consisting of the ELBO minus the regularization term (see Wainwright and Jordan, 2007 for a review of penalized surrogate likelihoods). Hence our objective function for some regularization $r(\cdot)$ is defined as:
(4) $\mathcal{L}_\lambda = \mathbb{E}_q\big[\log p(w, \alpha, \kappa, x, \theta, z)\big] + \mathbb{H}[q] - \lambda\, r(\hat{\alpha})$
where $\mathbb{E}_q[\log p(w, \alpha, \kappa, x, \theta, z)]$ expands for each term in the model, that is:

(5) $\mathbb{E}_q[\log p(w, \alpha, \kappa, x, \theta, z)] = \sum_{k=1}^{K}\mathbb{E}_q[\log p(\beta_k \mid \eta)] + \sum_{a=1}^{A}\mathbb{E}_q[\log p(\kappa_a \mid \omega)] + \sum_{p=1}^{P}\sum_{t=1}^{T}\mathbb{E}_q[\log p(\alpha_{p,t} \mid \alpha_{p,t-1})] + \sum_{t=1}^{T}\sum_{d=1}^{D_t}\Big(\mathbb{E}_q[\log p(x_d \mid \kappa_{a_d})] + \mathbb{E}_q[\log p(\theta_d \mid \alpha_t, x_d)] + \sum_{n=1}^{N_d}\big(\mathbb{E}_q[\log p(z_{d,n} \mid \theta_d)] + \mathbb{E}_q[\log p(w_{d,n} \mid z_{d,n}, \beta)]\big)\Big)$
And, similarly, $\mathbb{H}[q]$ is the entropy term associated with each of the parameters. Some terms in (5) are simple and well-known from foundational topic models like LDA and CTM (Blei et al., 2003; Lafferty and Blei, 2006). For example, the topic distributions over words term $\mathbb{E}_q[\log p(\beta_k \mid \eta)]$ is found in LDA, and in the DAP model the distribution over personas for each author follows a similar structure. Similarly, the non-conjugate pair for word-topic assignment has been studied in the CTM. For completeness, we show the expansion of the terms unique to the DAP model in the Appendix.
Expanding the objective function according to the distribution associated with each parameter allows updates to be derived for each parameter. The parameters are optimized using a variational expectation-maximization algorithm; the details of the algorithm are given below.
4.1. Variational E-Step
During the E-step the model estimates variational parameters for each document and saves the sufficient statistics required to compute the global parameters. The structure of the DAP model, while unique, has some components that mimic previous topic models. Specifically, the word-topic assignment parameter $\phi$ has the same update found in the CTM due to the Logistic Normal parameterization. Hence $\phi$ has a closed form update: $\phi_{d,n,k} \propto \beta_{k,w_{d,n}} \exp(\gamma_{d,k})$ (Lafferty and Blei, 2006).
Each author’s persona assignment is parameterized by $\lambda$. To find an update for $\lambda$ we select the ELBO terms featuring $\lambda$, and then take the derivative with respect to each document and persona. The terms in the ELBO containing $\lambda$ are:

$\mathcal{L}_{[\lambda]} = \sum_{d=1}^{D}\sum_{p=1}^{P}\lambda_{d,p}\Big(\mathbb{E}_q[\log \kappa_{a_d,p}] + \mathbb{E}_q[\log p(\theta_d \mid \alpha_t, x_d = p)] - \log \lambda_{d,p}\Big) + \sum_{d=1}^{D}\mu_d\Big(\sum_{p=1}^{P}\lambda_{d,p} - 1\Big)$

where the last term is the Lagrange constraint to ensure each $\lambda_d$ vector must sum to one. Taking the derivative with respect to one specific document $d$ and persona $p$ we find that:

$\dfrac{\partial \mathcal{L}}{\partial \lambda_{d,p}} = \mathbb{E}_q[\log \kappa_{a_d,p}] + \mathbb{E}_q[\log p(\theta_d \mid \alpha_t, x_d = p)] - \log \lambda_{d,p} - 1 + \mu_d$

A closed form solution for $\lambda_{d,p}$ doesn’t exist. We therefore estimate $\lambda$ using exponentiated gradient descent.
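An update of this kind can be sketched with exponentiated gradient steps, whose multiplicative form keeps the estimate on the probability simplex. The score vector `s` and the linear-plus-entropy objective below are illustrative stand-ins for the actual ELBO terms, not the paper's exact expressions:

```python
import numpy as np

def exponentiated_gradient_step(lam, grad, step_size=0.1):
    # Multiplicative update followed by renormalization keeps lam a valid
    # probability vector (positive entries summing to one).
    lam_new = lam * np.exp(step_size * grad)
    return lam_new / lam_new.sum()

# Illustrative concave objective: sum_p lam_p * s_p - sum_p lam_p * log(lam_p),
# a linear score plus an entropy term (the general shape of the lambda terms
# in the ELBO). Its maximizer on the simplex is softmax(s).
s = np.array([1.0, 0.2, -0.5])
lam = np.ones(3) / 3.0
for _ in range(200):
    grad = s - np.log(lam) - 1.0   # gradient of the illustrative objective
    lam = exponentiated_gradient_step(lam, grad)
```

For this toy objective the iterates converge to the softmax of the scores, which makes the fixed point easy to verify.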
Since the model includes non-conjugate terms (much like the DTM, CDTM, and CTM), an additional variational parameter $\zeta_d$ is introduced to preserve the lower bound during the expansion of the term containing a non-conjugate pair: $\mathbb{E}_q[\log p(z_{d,n} \mid \theta_d)]$. Taking the derivative of all terms containing $\zeta_d$ and setting it to zero yields an analogous closed form update to the one found in the CTM:

$\zeta_d = \sum_{k=1}^{K} \exp\big(\gamma_{d,k} + \hat{v}_k^2 / 2\big)$
Finally, the DAP model estimates a topic distribution for each document via the $\gamma_d$ parameter. To update $\gamma_d$ the terms in the ELBO featuring $\gamma_d$ are selected:

$\mathcal{L}_{[\gamma_d]} = -\dfrac{1}{2\sigma^2}\Big(\gamma_d^\top \gamma_d - 2\sum_{p=1}^{P}\lambda_{d,p}\,\gamma_d^\top \hat{\alpha}_{p,t}\Big) + \sum_{n=1}^{N_d}\Big(\phi_{d,n}^\top \gamma_d - \dfrac{1}{\zeta_d}\sum_{k=1}^{K}\exp\big(\gamma_{d,k} + \hat{v}_k^2/2\big)\Big)$

Taking a derivative of these terms with respect to $\gamma_d$ yields:

(6) $\dfrac{\partial \mathcal{L}}{\partial \gamma_d} = -\dfrac{1}{\sigma^2}\Big(\gamma_d - \sum_{p=1}^{P}\lambda_{d,p}\,\hat{\alpha}_{p,t}\Big) + \sum_{n=1}^{N_d}\phi_{d,n} - \dfrac{N_d}{\zeta_d}\exp\big(\gamma_d + \hat{v}^2/2\big)$

Since a closed form solution isn’t available, a conjugate gradient algorithm is run using the gradient in (6).
Whereas $\gamma_d$ represents the mean of the Logistic Normal for a document’s topic distribution, the parameter $\hat{v}^2$ is the variance. The ELBO terms featuring $\hat{v}^2$ are:

$\mathcal{L}_{[\hat{v}^2]} = -\dfrac{1}{2\sigma^2}\sum_{k=1}^{K}\hat{v}_k^2 - \dfrac{N_d}{\zeta_d}\sum_{k=1}^{K}\exp\big(\gamma_{d,k} + \hat{v}_k^2/2\big) + \dfrac{1}{2}\sum_{k=1}^{K}\log \hat{v}_k^2$

Therefore, setting the derivative of $\mathcal{L}_{[\hat{v}^2]}$ with respect to $\hat{v}_k^2$ to zero and solving yields:

$\dfrac{1}{2\hat{v}_k^2} - \dfrac{1}{2\sigma^2} - \dfrac{N_d}{2\zeta_d}\exp\big(\gamma_{d,k} + \hat{v}_k^2/2\big) = 0$

which requires Newton’s method for each coordinate, constrained such that $\hat{v}_k^2 > 0$.
The parameter $\hat{\alpha}_{p,t}$ represents the noisy estimate of $\alpha_{p,t}$. After calculating $\hat{\alpha}$, the forward and backward equations will be applied in the M-step to give a final posterior estimate of $\alpha$. The terms in the ELBO containing $\hat{\alpha}$ are found by expanding $\mathbb{E}_q[\log p(\alpha_{p,t} \mid \alpha_{p,t-1})]$ for (7a) and $\mathbb{E}_q[\log p(\theta_d \mid \alpha_t, x_d)]$ for (7b) and (7c):

(7a) $-\dfrac{1}{2\sigma^2}\sum_{p=1}^{P}\dfrac{1}{\Delta_{s_t}}\big(\hat{\alpha}_{p,t} - \hat{\alpha}_{p,t-1}\big)^\top\big(\hat{\alpha}_{p,t} - \hat{\alpha}_{p,t-1}\big)$

(7b) $\dfrac{1}{\sigma^2}\sum_{d=1}^{D_t}\sum_{p=1}^{P}\lambda_{d,p}\,\gamma_d^\top \hat{\alpha}_{p,t}$

(7c) $-\dfrac{1}{2\sigma^2}\sum_{d=1}^{D_t}\sum_{p=1}^{P}\lambda_{d,p}\,\hat{\alpha}_{p,t}^\top \hat{\alpha}_{p,t}$

Taking the derivative with respect to the mean term for each persona gives the closed form update:

(8) $\hat{\alpha}_{p,t} = \Big(\dfrac{1}{\Delta_{s_t}} + \sum_{d=1}^{D_t}\lambda_{d,p}\Big)^{-1}\Big(\dfrac{1}{\Delta_{s_t}}\hat{\alpha}_{p,t-1} + \sum_{d=1}^{D_t}\lambda_{d,p}\,\gamma_d\Big)$

We solve for $\hat{\alpha}_{p,t}$ sequentially over time steps. For the initial time step we use the prior in place of $\hat{\alpha}_{p,t-1}$. Note that the summations in (8) are collected during the E-step and need only be computed once after performing inference on all documents.
4.2. Regularized Variational Inference
Our RVI algorithm nudges $q$ to find topic distributions that are different for each persona. A natural choice for capturing this idea is an inner product between each of the personas (excluding a persona with itself). Hence, we define the regularization function by:

(9) $r(\hat{\alpha}) = \dfrac{1}{2\sigma^2}\sum_{t=1}^{T} D_t \sum_{p=1}^{P}\sum_{q \neq p} \hat{\alpha}_{p,t}^\top \hat{\alpha}_{q,t}$
The parameter $\frac{1}{\sigma^2}$ is included in the regularization for two reasons. First, it simplifies the update to $\hat{\alpha}$. In (7) the term $\frac{1}{\sigma^2}$ appears in every term, which allows it to be factored out and canceled. By including $\frac{1}{\sigma^2}$ in the regularization the same cancellation can occur. Second, since the penalty is minimized when the inner products are zero, its inclusion has the effect of encouraging personas to be orthogonal to one another. We include the number of documents at time $t$, $D_t$, in $r(\cdot)$ so that the regularization is applied evenly, regardless of dataset size or a skewed distribution of documents over time. After including the regularization term in (9) with the ELBO terms in (7), the regularized update is:

(10) $\Big(\dfrac{1}{\Delta_{s_t}} + \sum_{d=1}^{D_t}\lambda_{d,p}\Big)\hat{\alpha}_{p,t} + \lambda D_t \sum_{q \neq p}\hat{\alpha}_{q,t} = \dfrac{1}{\Delta_{s_t}}\hat{\alpha}_{p,t-1} + \sum_{d=1}^{D_t}\lambda_{d,p}\,\gamma_d$
Since the vector on the right-hand side (of length $K$) is computed during the E-step, the RHS is known. Similarly, the term $\frac{1}{\Delta_{s_t}} + \sum_d \lambda_{d,p}$ is known, and in combination with $\lambda D_t$ forms the weights over the unknown vectors $\hat{\alpha}_{p,t}$, also of length $K$. Therefore, (10) can be solved as a system of linear equations. Through experiments we’ve found an optimal value of $\lambda = 0.2$. The model exhibits sensitivity to the hyperparameter $\lambda$: if $\lambda$ is large then model quality drops due to personas overfitting to a single topic. Since $\hat{\alpha}$ is only used to estimate the global parameter $\alpha$ during the M-step, computing $\hat{\alpha}$ isn’t necessary for inference on holdout datasets.
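Solving the coupled per-persona updates reduces to small linear solves. The sketch below is illustrative only: the matrix `A`, right-hand sides `b`, and the way the inner-product penalty enters the weights are hypothetical stand-ins for the actual coefficients in (10):

```python
import numpy as np

K, P = 3, 2          # topics and personas (illustrative sizes)
lam_reg = 0.2        # regularization strength lambda

# Hypothetical per-persona right-hand sides, standing in for the E-step
# sufficient statistics collected before the update.
b = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.2, 0.7]])

alpha_hat = np.zeros((P, K))
for p in range(P):
    # The inner-product penalty couples persona p to the other personas; here
    # that coupling is folded into the system matrix A (hypothetical form).
    others = b[[q for q in range(P) if q != p]].sum(axis=0)
    A = np.eye(K) + lam_reg * np.outer(others, others)
    alpha_hat[p] = np.linalg.solve(A, b[p])  # solve A @ alpha_hat_p = b_p
```

The point of the sketch is the shape of the computation: each persona's update is a dense $K \times K$ solve whose weights depend on the remaining personas, which is cheap for the model sizes used here.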
4.3. M-Step
In the M-step the global parameters $\beta$, $\delta$, and $\alpha$ are updated such that the lower bound on the log likelihood of the data is maximized. Note, the update for $\beta$ is exactly the same as derived for the LDA model, and hence omitted.
The parameter $\delta_a$ represents the distribution over personas for each author $a$. The closed form update for $\delta_a$ is:

$\delta_{a,p} = \omega_p + \sum_{d \,:\, a_d = a} \lambda_{d,p}$

which shows that $\delta$’s closed form update is an average of the persona assignments, smoothed by the author-persona prior $\omega$.
Once the variational observations $\hat{\alpha}$ are computed, our approach follows the variational Kalman filtering method from Wang’s continuous time dynamic topic model (see Appendix for further details). Specifically, we employ Brownian motion to model the time dynamics. However, because the DAP model’s time-varying parameter is a distribution over latent topics, it performs best on data discretized in time (resulting in a smaller $\Delta_{s_t}$). The forward equations mimic a Kalman filter:
$m_{p,t} = \Big(\dfrac{\hat{v}^2}{V_{p,t-1} + \sigma^2\Delta_{s_t} + \hat{v}^2}\Big) m_{p,t-1} + \Big(1 - \dfrac{\hat{v}^2}{V_{p,t-1} + \sigma^2\Delta_{s_t} + \hat{v}^2}\Big)\hat{\alpha}_{p,t}$

$V_{p,t} = \Big(\dfrac{\hat{v}^2}{V_{p,t-1} + \sigma^2\Delta_{s_t} + \hat{v}^2}\Big)\big(V_{p,t-1} + \sigma^2\Delta_{s_t}\big)$

where $\sigma^2$ is the known process noise, and $\sigma^2\Delta_{s_t}$ captures the increase in variance as time between data points grows. Finally, the backward equations:

$\widetilde{m}_{p,t-1} = \Big(\dfrac{\sigma^2\Delta_{s_t}}{V_{p,t-1} + \sigma^2\Delta_{s_t}}\Big) m_{p,t-1} + \Big(1 - \dfrac{\sigma^2\Delta_{s_t}}{V_{p,t-1} + \sigma^2\Delta_{s_t}}\Big)\widetilde{m}_{p,t}$

$\widetilde{V}_{p,t-1} = V_{p,t-1} + \Big(\dfrac{V_{p,t-1}}{V_{p,t-1} + \sigma^2\Delta_{s_t}}\Big)^2\Big(\widetilde{V}_{p,t} - \big(V_{p,t-1} + \sigma^2\Delta_{s_t}\big)\Big)$

give the updates to the remaining global parameters.
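A forward-backward pass of this kind can be sketched for a single scalar chain; the function below is a simplified scalar Kalman filter/smoother under Brownian-motion dynamics, with illustrative noise values rather than the model's fitted ones:

```python
import numpy as np

def smooth(alpha_hat, timestamps, sigma2=1.0, vhat2=1.0, m0=0.0, V0=1.0):
    """Scalar Kalman filter + RTS smoother for one persona-topic chain.

    alpha_hat: noisy variational observations, one per time step.
    Brownian motion: process variance between steps t-1 and t is
    sigma2 * delta_t, where delta_t is the elapsed time.
    """
    T = len(alpha_hat)
    m, V = np.zeros(T), np.zeros(T)
    mp, Vp = m0, V0
    for t in range(T):                      # forward (filtering) pass
        delta = timestamps[t] - (timestamps[t - 1] if t > 0 else timestamps[0])
        P = Vp + sigma2 * delta             # predicted variance grows with the gap
        gain = P / (P + vhat2)              # Kalman gain
        m[t] = mp + gain * (alpha_hat[t] - mp)
        V[t] = (1.0 - gain) * P
        mp, Vp = m[t], V[t]
    m_s, V_s = m.copy(), V.copy()
    for t in range(T - 2, -1, -1):          # backward (smoothing) pass
        delta = timestamps[t + 1] - timestamps[t]
        P = V[t] + sigma2 * delta
        J = V[t] / P
        m_s[t] = m[t] + J * (m_s[t + 1] - m[t])
        V_s[t] = V[t] + J**2 * (V_s[t + 1] - P)
    return m_s, V_s

# Irregularly spaced observations: the larger gap before the last point
# inflates the predicted variance for that step.
m_s, V_s = smooth(np.array([0.0, 1.0, 2.0, 3.0]), np.array([0.0, 1.0, 2.0, 4.0]))
```

The model applies this computation per persona and per topic; the scalar version above just makes the role of $\Delta_{s_t}$ in the variance explicit.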
5. Experiments
5.1. CaringBridge Dataset
The creation of our model is inspired by a desire to discover topics in a unique dataset consisting of 14 million journals posted by half a million authors on the social networking site CaringBridge (CB). Established in 1997, CaringBridge is a 501(c)(3) nonprofit organization focused on connecting people and reducing the feelings of isolation that are often associated with a patient’s health journey. Due to the sensitive nature of its content, CB data was anonymized prior to analysis.
From the CB dataset we draw an evaluation dataset consisting of journals written by authors who posted, on average, at least twice a month over a one year period. Journal posts are only kept if they contain 10 or more words. These constraints help identify a set of active users. From the 123K authors meeting these criteria, 2,000 were randomly selected. Journals written by these 2,000 authors total 114,532. Overall, authors in this dataset journal an average of 57 times, with a mean of 5 days between journal posts.
The journal texts were preprocessed in a standard way: any HTML and non-ASCII characters (including emojis) were removed; hyphenated words and contractions were split; excess whitespace was ignored; texts were tokenized, and common stopwords along with words appearing in over 90% of documents were removed; all punctuation was stripped; and words were reduced to their lemmas. Finally, the texts were transformed to a bag-of-words format, keeping the 5,000 most used words as a vocabulary set. Because the dataset includes journals written between 2006 and 2016, the timestamps are transformed into a relative value and discretized, reflecting the number of weeks since an author’s first journal.
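A pipeline of this shape can be sketched as follows; the tiny stopword list and regex tokenizer are simplifications, and lemmatization and HTML stripping from the pipeline above are omitted:

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "is", "to", "of"}  # stand-in for a full list

def preprocess(texts, vocab_size=5000, max_doc_freq=0.9):
    """Tokenize, drop stopwords and near-universal words, and build a
    bag-of-words representation over the most frequent remaining words."""
    tokenized = []
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())   # strips punctuation/non-ASCII
        tokenized.append([t for t in tokens if t not in STOPWORDS])
    # Remove words appearing in more than max_doc_freq of the documents.
    doc_freq = Counter(w for doc in tokenized for w in set(doc))
    limit = max_doc_freq * len(texts)
    counts = Counter(w for doc in tokenized for w in doc if doc_freq[w] <= limit)
    vocab = {w: i for i, (w, _) in enumerate(counts.most_common(vocab_size))}
    bows = [Counter(vocab[w] for w in doc if w in vocab) for doc in tokenized]
    return vocab, bows

vocab, bows = preprocess(["The scan went well today!", "Physical therapy went well."])
```

With only two documents, words shared by both ("went", "well") exceed the 90% document-frequency cutoff and are dropped, mirroring the corpus-level filter described above.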
5.2. Evaluation
Journals are split into training and test sets, with 90% of each author’s journals for training and 10% for testing. Further, variance in model performance is estimated by repeating this splitting procedure for 10-fold cross validation.
The performance of our model is compared to three other models representing the state-of-the-art in this area. The first model for comparison is LDA, which ignores authorship and temporal structure in the data. In order to evaluate LDA’s performance over time, we train LDA on all preceding time steps and test on the next time step (similar to the evaluation method in Wang and McCallum, 2006). The DTM also serves as an important baseline for comparison because it models the evolution of topics over discrete time steps. Lastly, we compare our model to the CDTM, which builds on the DTM and introduces a continuous treatment of time. Following the approach of others, we simply fix the number of topics at 25 for all models. The number of personas learned by the DAP model is fixed at 15.
To evaluate the models we compute the per-word log-likelihood on held-out data, which measures how well the model fits the data and is computed by dividing the total log-likelihood of the held-out documents by the total number of words they contain. Note that perplexity, another common metric used to compare topic models, is related to the per-word log-likelihood $\ell$ via $\text{perplexity} = \exp(-\ell)$. It has been shown that perplexity (and hence per-word log-likelihoods) doesn’t correlate with a model finding coherent topics (Chang et al., 2009). Nevertheless, per-word log-likelihoods provide a fair way to compare how well each model optimizes its objective function.
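The relationship between the two metrics is a one-liner; the held-out document scores below are hypothetical numbers chosen for illustration:

```python
import math

def per_word_log_likelihood(doc_log_likelihoods, doc_lengths):
    # Per-word log-likelihood: total held-out log-likelihood over total words.
    return sum(doc_log_likelihoods) / sum(doc_lengths)

def perplexity(pwll):
    # Perplexity relates to per-word log-likelihood by exp(-pwll).
    return math.exp(-pwll)

# Hypothetical held-out log-likelihoods and lengths for three documents.
pwll = per_word_log_likelihood([-700.0, -350.0, -1400.0], [100, 50, 200])
```

Since log-likelihoods are negative, a higher (less negative) per-word log-likelihood corresponds to a lower perplexity.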
6. Results
In addition to evaluating model fit, we perform a qualitative analysis of the DAP model to highlight the quality and usefulness of the personas discovered. In particular, we establish that the personas are distinct from one another and capture meaningful experiences shared by authors.
6.1. Model Comparison
Table 1. Per-word log-likelihood on held-out data (mean and std. dev. across cross-validation folds).

Model  Per-word Log-Likelihood  Std. Dev.
DAP (λ=0.0)  −7.22  0.04
DAP (λ=0.2)  −6.47  0.04
LDA  −9.23  0.02
DTM  −9.65  0.03
CDTM  −8.82  0.03
In Table 1 we list the per-word log-likelihood and standard deviation across cross-validation sets for each of the competing models. There is a significant improvement in the DAP model’s performance after regularization. Further analysis of the likelihood computation reveals that the regularization term contributes a relatively small drop in likelihood compared to the total likelihood during training. Nevertheless, these results show that even a small amount of regularization can nudge the model to seek out quality results. In testing additional values of λ we found that, in general, they fared comparably. Larger values of λ can cause model instability and document likelihoods with long-tailed distributions. The emergence of outlier document likelihoods is unsurprising: regularization encourages the personas to focus on different topics — hence, large values of λ inevitably result in personas that overfit.
Figure 2 shows mean per-word log-likelihoods at each time step. The best performing DAP model shows consistently better results than competing models. However, the unregularized DAP model has a significant drop in performance in the first time step.
6.2. Persona Quality
Table 2. Top ten words associated with each topic; topic labels are assigned manually.

Community Support  Physical Therapy  Reflect on Life  Hopeful Prayer  Family Fun  Infection  Weather  School 
family  therapy  life  god  christmas  blood  nice  school 
friend  rehab  know  pray  play  infection  weather  shot 
church  therapist  child  prayer  birthday  fluid  walk  go 
thank  physical  never  lord  game  fever  lunch  appt 
card  pt  love  bless  fun  antibiotic  cold  class 
love  chair  year  please  kid  pressure  snow  tomorrow 
service  speech  live  heal  party  kidney  outside  grandma 
friends  progress  people  trust  year  iv  breakfast  teacher 
support  move  cancer  peace  enjoy  lung  rain  home 
gift  arm  moment  continue  dinner  clot  go  aunt 
Cancer (clinical)  Cancer (general)  Intensive Care  Well Wishes  Hair Loss  Surgery  Bedtime  Weight 
chemo  cancer  tube  dad  hair  surgery  sleep  weight 
blood  treatment  breathe  mom  leg  surgeon  night  mommy 
count  radiation  oxygen  everyone  wear  heart  bed  gain 
bone  scan  lung  message  head  dr  wake  feed 
marrow  chemo  feed  guestbook  look  office  nurse  daddy 
platelet  tumor  x_ray  please  cut  op  say  bottle 
round  oncologist  chest  prayer  knee  procedure  asleep  pound 
clinic  dr  nurse  read  hat  cardiologist  _time_  feeding 
transfusion  ct  vent  visit  wig  valve  room  oz 
_url_  result  stomach  update  shave  ha  tell  milk 
To evaluate the quality of personas, we focus on three key elements: authors are described by one clear persona; personas are distinct from one another, as shown in the combination of topics most associated with that persona; and personas capture a coherent health journey experienced by authors.
1:1 Author-Persona Mappings. Authors are modeled as a distribution over personas; however, to create interpretable results we want these distributions to focus on a single persona. The DAP model achieves this in the majority of cases: 71% of authors are concentrated on a single persona (i.e., most of the probability mass is assigned to that persona), and 27% of authors are evenly split between two personas. This shows that, in general, the model finds personas that generalize well enough to describe the majority of authors.
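A concentration statistic of this kind can be computed directly from the author-persona distributions; the matrix and the 0.9 threshold below are assumed for illustration, since the paper's exact cutoff is elided in the text:

```python
import numpy as np

def concentration_stats(kappa, threshold=0.9):
    # Fraction of authors whose persona distribution concentrates on a single
    # persona (maximum probability at or above the threshold).
    # kappa: authors x personas matrix of probabilities.
    return float((kappa.max(axis=1) >= threshold).mean())

# Hypothetical author-persona distributions for four authors.
kappa = np.array([[0.95, 0.03, 0.02],
                  [0.50, 0.48, 0.02],
                  [0.91, 0.05, 0.04],
                  [0.10, 0.85, 0.05]])
frac = concentration_stats(kappa)
```

Authors below the threshold (the second and fourth rows here) would instead be examined for the two-persona split described above.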
Distinct Personas. The DAP model includes a regularization term specifically for encouraging personas with unique combinations of topics. We examined the top three topics associated with each persona. In the unregularized model, the 15 personas are a mix of only 6 different topics. In fact, a topic on "Weather" appears as a common topic for all 15 personas. On the other hand, the regularized DAP model’s personas are a mix of 18 different topics. Further, the most frequently appearing topic is "Cancer (general)" (in 6 of 15 personas), which is appropriate given that approximately half of authors report cancer as a health condition.
Personas Reflect Coherent Health Journeys. In Figure 3 we show the top three topics evolving over time for selected personas. The labels listed for each topic are created manually based on the words and journals most associated with the topic. Words most associated with each topic are listed in Table 2. The persona plots in Figure 3 paint a compelling picture of common health journeys experienced by CB users.
Personas reflect broad trends, often encompassing a range of health journeys. Consider Persona 9, which reflects health journeys beginning with a physical element, such as physical therapy or a health issue taking a physical toll, followed by intensive care and attention to weight. Many Persona 9 authors begin physical therapy following an accident, or are caring for a premature baby or a child with a congenital disorder. However, there are a number of rare disorders that follow Persona 9’s pattern. For instance, one Persona 9 author writes about a family member with Guillain-Barré syndrome, a rare rapid-onset disorder in which the immune system attacks the nervous system, resulting in muscle pain, weakness, and even paralysis. The syndrome often requires admission to an intensive care unit, followed by rehabilitation – all common themes of Persona 9.
7. Conclusion
The Dynamic Author-Persona topic model is uniquely suited to modeling text data that has a temporal structure and is written by multiple authors. Unlike previous temporal topic models, DAP discovers latent personas — a novel component that identifies authors with similar topic trajectories. Our RVI algorithm further improves the DAP model’s performance over competing models and results in the discovery of distinct personas. In evaluating the DAP model, we introduce the CaringBridge dataset: a massive collection of journals written by patients and caregivers, many of whom face serious, life-threatening illnesses. From this dataset the DAP model extracts compelling descriptions of health journeys.
Many opportunities exist for further research. Currently, we deal with non-conjugate terms using the approach established in the CTM. Recent advances in non-conjugate inference (Ranganath et al., 2013; Kingma and Welling, 2014; Khan et al., 2015; Khan and Lin, 2017; Srivastava and Sutton, 2017) may lead to a more efficient approach to dealing with these difficult terms.
Acknowledgments
We thank reviewers for their valuable comments, University of Minnesota Supercomputing Institute (MSI) for technical support, and CaringBridge for their support and collaboration. The research was supported by NSF grants IIS1563950, IIS1447566, IIS1447574, IIS1422557, CCF1451986, CNS1314560.
References
 Beaudoin and Tao (2007) Christopher E. Beaudoin and Chen-Chao Tao. 2007. Benefiting from Social Capital in Online Support Groups: An Empirical Study of Cancer Patients. CyberPsychology & Behavior 10, 4 (2007), 587–590.
 Blei et al. (2016) David M. Blei, Alp Kucukelbir, and Jon D. McAuliffe. 2016. Variational Inference: A Review for Statisticians. arXiv preprint arXiv:1601.00670 (2016). http://arxiv.org/abs/1601.00670
 Blei and Lafferty (2006) David M. Blei and John D. Lafferty. 2006. Dynamic Topic Models. International Conference on Machine Learning (2006), 113–120. https://doi.org/10.1145/1143844.1143859
 Blei et al. (2003) David M Blei, Andrew Y Ng, and Michael I Jordan. 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research 3, 45 (2003), 993–1022. http://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf
 Chang et al. (2009) Jonathan Chang, Sean Gerrish, Chong Wang, and David M. Blei. 2009. Reading Tea Leaves: How Humans Interpret Topic Models. Advances in Neural Information Processing Systems 22 (2009), 288–296.
 Jordan et al. (1999) Michael I. Jordan, Zoubin Ghahramani, Tommi S. Jaakkola, and Lawrence K. Saul. 1999. Introduction to variational methods for graphical models. Machine Learning 37, 2 (1999), 183–233. arXiv:arXiv:1103.5254v3
 Khan and Lin (2017) Mohammad Khan and Wu Lin. 2017. ConjugateComputation Variational Inference : Converting Variational Inference in NonConjugate Models to Inferences in Conjugate Models. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics 54 (2017), 878–887. arXiv:1703.04265 http://proceedings.mlr.press/v54/khan17a.html
 Khan et al. (2015) Mohammad Emtiyaz Khan, Reza Babanezhad, Wu Lin, Mark Schmidt, and Masashi Sugiyama. 2015. Faster Stochastic Variational Inference using ProximalGradient Methods with General Divergence Functions. (2015). arXiv:1511.00146 http://arxiv.org/abs/1511.00146
 Kingma and Welling (2014) Diederik P Kingma and Max Welling. 2014. AutoEncoding Variational Bayes. Proceedings of the 2nd International Conference on Learning Representations (ICLR) (dec 2014), 1–14. arXiv:1312.6114 http://arxiv.org/abs/1312.6114
 Lafferty and Blei (2006) John D. Lafferty and David M. Blei. 2006. Correlated Topic Models. Advances in Neural Information Processing Systems 18 (2006), 147–154. https://doi.org/10.1145/1143844.1143859 arXiv:arXiv:0712.1486v1
 McCallum et al. (2005) a. McCallum, a. CorradaEmmanuel, and Xuerui Wang. 2005. The authorrecipienttopic model for topic and role discovery in social networks: Experiments with enron and academic email. NIPS’04 Workshop on’Structured Data and Representations in Probabilistic Models for Categorization (2005), 1–16. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.84.5833
 Mimno and McCallum (2007) David Mimno and A McCallum. 2007. Expertise modeling for matching papers with reviewers. Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining (2007), 500–509. papers2://publication/uuid/9961263BC8374F5190915D6B42917F1A
 Pathak et al. (2008) Nishith Pathak, Colin DeLong, Kendrick Erickson, and Arindam Banerjee. 2008. Social topic models for community extraction. The 2nd SNAKDD Workshop ’08 (SNAKDD’08) (2008).
 Ranganath et al. (2013) Rajesh Ranganath, Sean Gerrish, and David M Blei. 2013. Black Box Variational Inference. Aistats 33 (2013). arXiv:1401.0118 http://www.cs.columbia.edu/
 Rodgers and Chen (2005) S Rodgers and Q Chen. 2005. Internet community group participation: Psychosocial benefits for women with breast cancer. Journal of ComputerMediated Communication 10, 4 (2005), 1–27. http://www.scopus.com/scopus/inward/record.url?eid=2s2.024144454821
 RosenZvi et al. (2004) M. RosenZvi, T. Griffiths, M. Steyvers, and P. Smyth. 2004. The authortopic model for authors and documents. Proceedings of the 20th conference on Uncertainty in artificial intelligence (2004), 487–494. http://portal.acm.org/citation.cfm?id=1036902
 Srivastava and Sutton (2017) Akash Srivastava and Charles Sutton. 2017. Autoencoding Variational Inference For Topic Models. ICLR (mar 2017), 1–18. arXiv:1703.01488 http://arxiv.org/abs/1703.01488
 Steyvers et al. (2004) Mark Steyvers, Padhraic Smyth, Michal RosenZvi, and Thomas Griffiths. 2004. Probabilistic authortopic models for information discovery. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 306–315.
 Wainwright and Jordan (2007) Martin J. Wainwright and Michael I. Jordan. 2007. Graphical Models, Exponential Families, and Variational Inference. Foundations and Trends® in Machine Learning 1, 1â2 (2007), 1–305. http://www.nowpublishers.com/article/Details/MAL001
 Wang et al. (2008) Chong Wang, David Blei, and David Heckerman. 2008. Continuous Time Dynamic Topic Models. Proc of UAI (2008), 579–586. arXiv:1206.3298 http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.139.4535
 Wang and McCallum (2006) Xuerui Wang and Andrew McCallum. 2006. Topics over time: a nonMarkov continuoustime model of topical trends. Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining (2006), 424–433. http://portal.acm.org/citation.cfm?doid=1150402.1150450
 Wei et al. (2007) Xing Wei, Jimeng Sun, and Xuerui Wang. 2007. Dynamic Mixture Models for Multiple Time Series. Ijcai (2007), 2909–2914.
 Wen et al. (2011) Kuang Yi Wen, Fiona McTavish, Gary Kreps, Meg Wise, and David Gustafson. 2011. From Diagnosis to Death: A Case Study of Coping With Breast Cancer as Seen Through Online Discussion Group Messages. Journal of ComputerMediated Communication 16, 2 (2011), 331–361.
 Wen and Rose (2012) Miaomiao Wen and Carolyn Penstein Rose. 2012. Understanding participant behavior trajectories in online health support groups using automatic extraction methods. Proceedings of the 17th ACM international conference on Supporting group work  GROUP ’12 (2012), 179. http://dl.acm.org/citation.cfm?id=2389176.2389205
Appendix A ELBO Terms Unique to the DAP Model
The expansion of the ELBO referenced in (5) includes a number of terms previously derived for LDA and the CTM (Blei et al., 2003; Lafferty and Blei, 2006). The DAP model's introduction of personas, and the parameters $\kappa$, $x_d$, and $\alpha_t$ that govern them, leads to a few new terms. Terms unique to the DAP model are detailed below.
A.1. Expanding $\mathbb{E}_q[\log p(\eta_d \mid x_d, \alpha_t)]$
Expansion of the ELBO term $\mathbb{E}_q[\log p(\eta_d \mid x_d, \alpha_t)]$ is unique to the DAP model, and is particularly challenging because the topic distribution $\eta_d$ for each document is drawn from a Gaussian with mean $\alpha_t x_d$. Hence, the term is expanded to:
$$\mathbb{E}_q[\log p(\eta_d \mid x_d, \alpha_t)] = -\frac{K}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\,\mathbb{E}_q\big[(\eta_d - \alpha_t x_d)^\top(\eta_d - \alpha_t x_d)\big]$$
Note that the expectation in $\mathbb{E}_q[(\eta_d - \alpha_t x_d)^\top(\eta_d - \alpha_t x_d)]$ is over all the random terms, that is, $\eta_d$, $\alpha_t$, and $x_d$ are not constants. Factorizing this expectation gives:
$$\mathbb{E}_q\big[(\eta_d - \alpha_t x_d)^\top(\eta_d - \alpha_t x_d)\big] =$$
(11a)  $\mathbb{E}_q\big[\eta_d^\top \eta_d\big]$
(11b)  $-\,2\,\mathbb{E}_q\big[\eta_d^\top \alpha_t x_d\big]$
(11c)  $+\,\mathbb{E}_q\big[(\alpha_t x_d)^\top (\alpha_t x_d)\big]$
where each of the terms in (11) is evaluated below.
Term (11a): Since $\eta_d \sim \mathcal{N}(\gamma_d, \operatorname{diag}(v_d^2))$ under the variational distribution, this is a straightforward case of the Gaussian quadratic identity:
$$\mathbb{E}_q\big[\eta_d^\top \eta_d\big] = \gamma_d^\top \gamma_d + \sum_{k=1}^{K} v_{d,k}^2$$
where $\gamma_d$ is the variational mean parameter for $\eta_d$ and $v_{d,k}^2$ is the variance parameter associated with the topic distribution over document $d$.
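As a sanity check, the Gaussian quadratic identity behind term (11a) can be verified numerically. The sketch below uses illustrative dimensions and variable names (`gamma`, `v2`); it is a Monte Carlo check of the identity, not code from the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 5                                # number of topics (illustrative)
gamma = rng.normal(size=K)           # variational mean of the topic distribution
v2 = rng.uniform(0.1, 1.0, size=K)   # per-topic variational variances

# Monte Carlo estimate of E[eta^T eta] for eta ~ N(gamma, diag(v2))
eta = rng.normal(gamma, np.sqrt(v2), size=(200_000, K))
mc = np.mean(np.sum(eta ** 2, axis=1))

# Closed form from the Gaussian quadratic identity:
# E[eta^T eta] = gamma^T gamma + sum_k v2_k
closed = gamma @ gamma + v2.sum()

print(abs(mc - closed) < 0.1)
```

With 200,000 samples the Monte Carlo estimate agrees with the closed form to well within the printed tolerance.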
Term (11b): This term does not take a Gaussian quadratic form. To solve $\mathbb{E}_q[\eta_d^\top \alpha_t x_d]$, recall that $\eta_d$, $\alpha_t$, and $x_d$ are independent under the mean-field assumption, thus:
$$\mathbb{E}_q\big[\eta_d^\top \alpha_t x_d\big] = \mathbb{E}_q[\eta_d]^\top\, \mathbb{E}_q[\alpha_t]\, \mathbb{E}_q[x_d] = \gamma_d^\top \hat{\alpha}_t\, \phi_d$$
where $\hat{\alpha}_t$ is the variational mean of $\alpha_t$ and $\phi_d$ is the variational multinomial parameter for the persona assignment $x_d$.
Term (11c): Expanding the last term yields:
$$\mathbb{E}_q\big[(\alpha_t x_d)^\top (\alpha_t x_d)\big] = \mathbb{E}_q\big[x_d^\top \alpha_t^\top \alpha_t x_d\big] = \sum_{i=1}^{P}\sum_{j=1}^{P} \mathbb{E}_q\big[x_{d,i}\, x_{d,j}\big]\; \mathbb{E}_q\big[\alpha_{t,i}^\top \alpha_{t,j}\big]$$
To evaluate $\mathbb{E}_q[x_{d,i}\, x_{d,j}]$, consider personas $i$ and $j$, which simplifies the computation because $x_{d,i}$ and $x_{d,j}$ are scalars and $\alpha_{t,i}$ refers to a column vector of $\alpha_t$.
The resulting second-moment matrix $\mathbb{E}_q[x_d x_d^\top]$ has off-diagonal elements that are all 0, since $x_d$, a draw from a multinomial, is a 1-of-$P$ vector, so $x_{d,i}\, x_{d,j} = 0$ whenever $i \neq j$. Elements along the diagonal are given by $\mathbb{E}_q[x_{d,i}^2] = \mathbb{E}_q[x_{d,i}] = \phi_{d,i}$. Thus, for persona $i$ we have $\mathbb{E}_q[\alpha_{t,i}^\top \alpha_{t,i}] = \hat{\alpha}_{t,i}^\top \hat{\alpha}_{t,i} + \sum_{k=1}^{K} \tilde{V}_{t,i,k}$, where $\tilde{V}_{t,i,k}$ is the variational variance of $\alpha_{t,i,k}$. Therefore term (11c) is:
$$\mathbb{E}_q\big[(\alpha_t x_d)^\top (\alpha_t x_d)\big] = \sum_{i=1}^{P} \phi_{d,i}\Big(\hat{\alpha}_{t,i}^\top \hat{\alpha}_{t,i} + \sum_{k=1}^{K} \tilde{V}_{t,i,k}\Big)$$
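The 1-of-$P$ structure that zeroes the off-diagonal second moments of $x_d$ is easy to confirm numerically. The sketch below uses illustrative dimensions and a hypothetical probability vector `phi`; it is not code from the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
phi = np.array([0.1, 0.2, 0.3, 0.4])  # illustrative persona probabilities (P = 4)

# Draw 1-of-P indicator vectors x ~ Multinomial(1, phi)
X = rng.multinomial(1, phi, size=100_000).astype(float)

# Empirical second-moment matrix E[x x^T]: off-diagonals vanish because
# exactly one entry of each x is 1; the diagonal is E[x_i^2] = E[x_i] = phi_i
M = X.T @ X / len(X)

off_diag = M - np.diag(np.diag(M))
print(np.allclose(off_diag, 0.0))            # off-diagonal products are exactly 0
print(np.allclose(np.diag(M), phi, atol=0.02))
```

The off-diagonal entries are exactly zero for every sample (not merely in expectation), which is what collapses the double sum over personas to a single sum.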
Combining the three expanded terms from (11), the expectation reduces to:
$$\mathbb{E}_q\big[(\eta_d - \alpha_t x_d)^\top(\eta_d - \alpha_t x_d)\big] = \gamma_d^\top \gamma_d + \sum_{k=1}^{K} v_{d,k}^2 - 2\,\gamma_d^\top \hat{\alpha}_t\, \phi_d + \sum_{i=1}^{P} \phi_{d,i}\Big(\hat{\alpha}_{t,i}^\top \hat{\alpha}_{t,i} + \sum_{k=1}^{K} \tilde{V}_{t,i,k}\Big)$$
Finally, the ELBO term $\mathbb{E}_q[\log p(\eta_d \mid x_d, \alpha_t)]$ is expanded out fully to:
$$\mathbb{E}_q[\log p(\eta_d \mid x_d, \alpha_t)] = -\frac{K}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\bigg(\gamma_d^\top \gamma_d + \sum_{k=1}^{K} v_{d,k}^2 - 2\,\gamma_d^\top \hat{\alpha}_t\, \phi_d + \sum_{i=1}^{P} \phi_{d,i}\Big(\hat{\alpha}_{t,i}^\top \hat{\alpha}_{t,i} + \sum_{k=1}^{K} \tilde{V}_{t,i,k}\Big)\bigg)$$
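The full expectation, combining the quadratic term, the mean-field cross term, and the persona-weighted second moments, can also be checked end-to-end by Monte Carlo. Every dimension and variable name below is illustrative (a sketch of the factorization discussed above, not the paper's implementation): `gamma`/`v2` parameterize the document's topic distribution, `alpha_hat`/`V` the persona trajectories, and `phi` the persona assignment.

```python
import numpy as np

rng = np.random.default_rng(2)
K, P, N = 3, 4, 200_000  # topics, personas, Monte Carlo samples (illustrative)

gamma = rng.normal(size=K)               # variational mean of eta
v2 = rng.uniform(0.1, 0.5, size=K)       # variational variances of eta
alpha_hat = rng.normal(size=(K, P))      # variational means of alpha (columns = personas)
V = rng.uniform(0.1, 0.5, size=(K, P))   # variational variances of alpha
phi = np.array([0.1, 0.2, 0.3, 0.4])     # variational persona probabilities

# Monte Carlo estimate of E[(eta - alpha x)^T (eta - alpha x)] with
# eta, alpha, x drawn independently (the mean-field assumption)
eta = rng.normal(gamma, np.sqrt(v2), size=(N, K))
alpha = rng.normal(alpha_hat, np.sqrt(V), size=(N, K, P))
x = rng.multinomial(1, phi, size=N).astype(float)
ax = np.einsum('nkp,np->nk', alpha, x)   # alpha x selects one persona column
mc = np.mean(np.sum((eta - ax) ** 2, axis=1))

# Closed form: quadratic term + cross term + persona-weighted second moments
closed = (gamma @ gamma + v2.sum()
          - 2 * gamma @ (alpha_hat @ phi)
          + np.sum(phi * (np.sum(alpha_hat ** 2, axis=0) + V.sum(axis=0))))

print(abs(mc - closed) < 0.3)
```

Agreement between the sampled estimate and the closed form is a useful regression test when implementing the ELBO, since sign errors in the cross term or dropped variance terms show up immediately.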
A.2. Expanding $\mathbb{E}_q[\log p(\alpha_t \mid \alpha_{t-1})]$
Expanding the ELBO term $\mathbb{E}_q[\log p(\alpha_t \mid \alpha_{t-1})]$ is similar to the DTM, and follows from the Gaussian quadratic form identity, which states that for $x \sim \mathcal{N}(\mu, \Sigma)$: $\mathbb{E}\big[(x - b)^\top A\,(x - b)\big] = (\mu - b)^\top A\,(\mu - b) + \operatorname{tr}(A\,\Sigma)$.
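The Gaussian quadratic form identity itself can be verified numerically; the sketch below uses randomly generated (illustrative) values for $\mu$, $\Sigma$, $A$, and $b$.

```python
import numpy as np

rng = np.random.default_rng(3)
K = 4  # dimensionality (illustrative)

mu = rng.normal(size=K)
A_half = rng.normal(size=(K, K))
A = A_half @ A_half.T + K * np.eye(K)          # symmetric positive definite A
S_half = rng.normal(size=(K, K))
Sigma = S_half @ S_half.T / K + 0.1 * np.eye(K)  # valid covariance matrix
b = rng.normal(size=K)

# Monte Carlo estimate of E[(x - b)^T A (x - b)] for x ~ N(mu, Sigma)
x = rng.multivariate_normal(mu, Sigma, size=300_000)
d = x - b
mc = np.mean(np.einsum('ni,ij,nj->n', d, A, d))

# Gaussian quadratic form identity
closed = (mu - b) @ A @ (mu - b) + np.trace(A @ Sigma)

print(abs(mc - closed) / abs(closed) < 0.05)
```

The trace term is what carries the variational uncertainty in $\alpha_t$ into the ELBO; dropping it is a common implementation bug that this check catches.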