Auto-Encoding Variational Bayes for Inferring Topics and Visualization

Abstract

Visualization and topic modeling are widely used approaches for text analysis. Traditional visualization methods find low-dimensional representations of documents in the visualization space (typically 2D or 3D) that can be displayed using a scatterplot. In contrast, topic modeling aims to discover topics from text, but for visualization, one needs to perform a post-hoc embedding using dimensionality reduction methods. Recent approaches propose using a generative model to jointly find topics and visualization, allowing the semantics to be infused in the visualization space for a meaningful interpretation. A major challenge that prevents these methods from being used practically is the scalability of their inference algorithms. We present, to the best of our knowledge, the first fast Auto-Encoding Variational Bayes based inference method for jointly inferring topics and visualization. Since our method is black box, it can handle model changes efficiently with little mathematical rederivation effort. We demonstrate the efficiency and effectiveness of our method on real-world large datasets and compare it with existing baselines.

This work is licensed under a Creative Commons Attribution 4.0 International License. License details: http://creativecommons.org/licenses/by/4.0/.

1 Introduction

Visualization and topic modeling are important tools in the analysis of text corpora. Visualization methods, such as t-SNE [22], find low-dimensional representations of documents in the visualization space (typically 2D or 3D) that can be displayed using a scatterplot. Such visualization is useful for exploratory tasks. However, there is a lack of semantic interpretation as those visualization methods do not extract topics. In contrast, topic modeling aims to discover semantic topics from text, but for visualization, one needs to perform a post-hoc embedding using dimensionality reduction methods. Since this pipeline approach may not be ideal, there has been recent interest in jointly inferring topics and visualization using a single generative model [10]. This joint approach allows the semantics to be infused in the visualization space where users can view documents and their topics. The problem of jointly inferring topics and visualization can be formally stated as follows.

Problem. Let D denote a finite set of documents and let V be a finite vocabulary from these documents. Given a number of topics K and a visualization dimension n (typically 2 or 3), we want to find:

  • For topic modeling: K latent topics and their word distributions, collectively denoted as β; topic distributions of documents, collectively denoted as θ; and

  • For visualization: n-dimensional visualization coordinates x for documents and φ for topics, such that the distances between documents and topics in the visualization space reflect the topic-document distributions θ.

To solve this problem, PLSV (Probabilistic Latent Semantic Visualization) was the first model to tie together all latent variables of topics and visualization (i.e., β, θ, x, and φ) in a single generative model. Its tight integration between visualization and the underlying topic model can support applications such as user-driven topic modeling, where users can interactively provide feedback to the model [6]. PLSV can also be used as a basic building block when developing new models for other analysis tasks, such as visual comparison of document collections [17].

Relatively less attention has been paid to methods for fast inference of topics and visualization. Existing models often use Maximum a Posteriori (MAP) estimation with the EM algorithm, which is difficult to scale to large datasets. As shown in Figure 12, to run a PLSV model of 50 topics via MAP estimation on a dataset of modest size (e.g., 20 Newsgroups), it takes more than 18 hours using a single core. This long running time limits the usability of these visualization methods in practice.

In this paper, we aim to propose a fast Auto-Encoding Variational Bayes (AEVB) based inference method for inferring topics and visualization. AEVB [13] is a black-box variational method which is efficient for inference and learning in latent Gaussian Models with large datasets. However, to apply the AEVB approach to topic models like LDA, one needs to deal with problems caused by the Dirichlet prior and by posterior collapse [7]. One of the successful AEVB based methods proposed to tackle those challenges for topic models is AVITM [26].

It is not straightforward to apply AEVB or AVITM to our problem because of two main challenges. First, as reviewed in Section 2, PLSV models a document’s topic distribution using a softmax function over its Euclidean distances to topics. It is not clear how to express this nonlinear functional relationship between three categories of latent variables (i.e., topic distribution θ, document coordinate x, and topic coordinate φ) when applying AVITM to visualization. Second, AEVB assumes that latent encodings are independently and identically distributed (i.i.d.) across samples [5, 21]. In our case, this assumption works well with latent document coordinates, where each document d is associated with its latent encoding x_d in the visualization space. However, for topic coordinates φ and word probabilities β, that assumption is too strong. The reason is that the latent encodings of any topic w.r.t. any documents are not independent; in fact, in our extreme case these latent encodings are identical, i.e., φ_{k,d} = φ_{k,d′} for any documents d, d′ and any topic k. In other words, φ_k is shared across documents. The same argument also applies to word probabilities β.

To address the first challenge, we propose to model the nonlinear functional relationship between θ, x, and φ using a normalized Radial Basis Function (RBF) Neural Network [1]. In this model, the topic coordinate φ_k is the center vector for neuron k, i.e., the topic coordinates are treated as parameters of the RBF network and will be estimated. Similarly, we model the word probabilities β as parameters of a linear neural network that is connected to the RBF network to form the decoder in the AEVB approach. By treating φ and β as parameters of the decoder, we can solve the second challenge, though it can be seen that our algorithm does not learn their posterior distributions but rather their point estimates. In Section 3, we present our proposed method in detail. We focus on the PLSV model in this work, though the proposed AEVB inference method could be easily adapted to other visualization models.

We summarize our contributions as follows:

  • We propose, to the best of our knowledge, the first AEVB inference method for the problem of jointly inferring topics and visualization.

  • In our approach, we design a decoder that includes an RBF network connected to a linear neural network. These networks are parameterized by topic coordinates and word probabilities, ensuring that they are shared across all documents.

  • We conduct extensive experiments on real-world large datasets, showing the efficiency and effectiveness of our method. While running much faster than PLSV, it gains better visualization quality and comparable topic coherence.

  • Since our method is black box, it can handle model changes efficiently with little mathematical rederivation effort. We implement different PLSV models that use different RBFs by just changing a few lines of code. We experimentally show that PLSV with Gaussian or Inverse quadratic RBFs consistently produces good performance across datasets.

2 Background and Related Work

2.1 Topic Modeling and Visualization

Topic models [3, 8] are widely used for unsupervised representation learning of text and have found applications in different text mining tasks [23, 2, 28, 12]. Popular topic models, such as LDA [3], find a low-dimensional representation of each document in topic space. Each dimension of the topic space has a meaning attached to it and is modeled as a probability distribution over words. In contrast, t-SNE [22] and LargeVis [27] are visualization methods aiming to find for each document a low-dimensional representation (typically 2D or 3D). However, that low-dimensional space usually lacks the semantic interpretation found in topic models. Therefore, there have been works attempting to infuse semantics into the visualization space by jointly modeling topics and visualization [10, 18]. These methods often suffer from scalability issues on large datasets. In this work, we aim to scale up these methods by proposing a fast AEVB based inference method. We focus on PLSV [10] for applying our proposed method. PLSV has been used as a basic block for building new models for visual text mining tasks [19, 17]. Our proposed method could be easily adapted to these models.

PLSV assumes the following process to generate documents and visualization:

  1. For each topic k = 1, …, K:

    1. Draw a word distribution: β_k ~ Dirichlet(λ)

    2. Draw a topic coordinate: φ_k ~ N(0, γ⁻¹I)

  2. For each document d = 1, …, D:

    1. Draw a document coordinate: x_d ~ N(0, ρ⁻¹I)

    2. For each word w_{d,i} in document d:

      1. Draw a topic: z_{d,i} ~ Multinomial(θ_d)

      2. Draw a word: w_{d,i} ~ Multinomial(β_{z_{d,i}})

Here β has a Dirichlet prior. Topic and document coordinates have zero-mean Gaussian priors of the forms N(0, γ⁻¹I) and N(0, ρ⁻¹I), respectively. The topic distribution θ_d of a document is defined using a softmax function over its distances to topics:

θ_{d,k} = exp(−‖x_d − φ_k‖² / 2) / Σ_{k′=1}^{K} exp(−‖x_d − φ_{k′}‖² / 2)   (1)

As we can see from Eq. 1, the k-th topic proportion θ_{d,k} of document d is high when the document coordinate x_d is close to the topic coordinate φ_k. This relationship ensures that the distances between documents and topics in the visualization space reflect the topic-document distributions θ. In the PLSV paper, the parameters are estimated using MAP estimation with the EM algorithm. As shown in our experiments, that algorithm does not scale to large datasets.
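The generative process above is easy to simulate. The following NumPy sketch samples a tiny synthetic corpus; all hyperparameters and sizes are illustrative choices, and the softmax line is Eq. 1:

```python
import numpy as np

rng = np.random.default_rng(0)
K, D, V, n, N_d = 3, 5, 8, 2, 20    # topics, docs, vocab, vis. dim, words/doc

# Priors: Dirichlet for word distributions, zero-mean Gaussians for coordinates.
beta = rng.dirichlet(np.ones(V), size=K)      # K x V topic-word distributions
phi = rng.normal(0.0, 1.0, size=(K, n))       # topic coordinates

docs = []
for d in range(D):
    x_d = rng.normal(0.0, 1.0, size=n)                    # document coordinate
    logits = -0.5 * np.sum((phi - x_d) ** 2, axis=1)      # -||x_d - phi_k||^2 / 2
    theta_d = np.exp(logits) / np.exp(logits).sum()       # Eq. 1
    z = rng.choice(K, size=N_d, p=theta_d)                # topic per word
    words = [rng.choice(V, p=beta[k]) for k in z]
    docs.append(words)

assert len(docs) == D and all(len(w) == N_d for w in docs)
assert abs(theta_d.sum() - 1.0) < 1e-9    # a valid topic distribution
```

Documents whose coordinates land near a topic's coordinate draw most of their words from that topic, which is exactly the property the visualization exploits.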

2.2 Auto-Encoding Variational Bayes for Topic Models

AEVB [14] and its variants, such as WiSE-ALE [21] and AVITM [26], are black-box variational inference methods whose purpose is to allow practitioners to quickly explore and adjust a model’s assumptions with little rederivation effort [24]. AVITM is an auto-encoding variational inference method for topic models. It approximates the true posterior p(θ, z | w, α, β), where α is the hyperparameter of the Dirichlet prior, using a variational distribution q(θ, z) with free variational parameters over θ and z, respectively. Different from Mean-Field Variational Inference, AVITM computes the variational parameters using an inference neural network, and they are chosen by optimizing the following ELBO (i.e., the lower bound to the marginal log likelihood):

L = Σ_{d=1}^{D} ( −KL( q(θ_d | w_d) ‖ p(θ_d | α) ) + E_{q(θ_d | w_d)}[ log p(w_d | θ_d, β) ] )   (2)

By collapsing z and approximating the Dirichlet prior with a logistic normal distribution, the second term (i.e., the expectation with respect to q) in the ELBO can be approximated using the reparameterization trick as in AEVB. The second term is also referred to as the expected negative reconstruction error in variational auto-encoders (VAEs). While AVITM has been successfully applied to LDA, it is not straightforward to apply it to our problem, as discussed in the introduction.

3 Proposed Auto-Encoding Variational Bayes for Inferring Topics and Visualization

We represent a document d as a row vector of word counts w_d = (n_{d,1}, …, n_{d,V}), where n_{d,v} is the number of occurrences of word v in the document. The marginal likelihood of a document is given by:

p(w_d | φ, β) = ∫ p(x_d) Π_{i=1}^{N_d} Σ_{k=1}^{K} p(z_{d,i} = k | x_d, φ) p(w_{d,i} | β_k) dx_d = ∫ p(x_d) p(w_d | x_d, φ, β) dx_d   (3)

The marginal likelihood of the corpus is Π_{d=1}^{D} p(w_d | φ, β). Note that here we treat φ and β as fixed quantities that are to be estimated. Therefore we are working with a non-smoothed PLSV where φ and β are not endowed with a posterior distribution. By treating φ and β as model parameters, we ensure that they are shared across all documents in the AEVB approach. We will consider a fuller Bayesian approach to PLSV in our future work.

As in AVITM, we collapse the discrete latent variable z to avoid the difficulty of determining a reparameterization function for it. The rightmost integral in Eq. 3 is the marginal likelihood after z is collapsed. We now only consider the true posterior distribution over the latent variable x_d: p(x_d | w_d, φ, β). Due to the intractability of Eq. 3, it is intractable to compute this posterior. We approximate it by a variational distribution q(x_d | w_d) parameterized by Λ. The variational parameters are estimated using an inference network as in AEVB. We have the following lower bound to the marginal log likelihood (ELBO) of a document:

log p(w_d | φ, β) ≥ L_d = −KL( q(x_d | w_d) ‖ p(x_d) ) + E_{q(x_d | w_d)}[ log p(w_d | x_d, φ, β) ]   (4)

Since the prior is a Gaussian, we can let the variational posterior be a Gaussian with a diagonal covariance matrix: q(x_d | w_d) = N(x_d; μ_d, diag(σ_d²)). The KL divergence between the two Gaussians in Eq. 4 can be computed in closed form as follows [11]:

KL( q(x_d | w_d) ‖ p(x_d) ) = ½ Σ_{j=1}^{n} ( ρ σ_{d,j}² + ρ μ_{d,j}² − 1 − log(ρ σ_{d,j}²) )   (5)

where μ_d and the diagonal σ_d² are outputs of the encoding feed-forward neural network with variational parameters Λ. The expectation w.r.t. q(x_d | w_d) in Eq. 4 can be estimated using the reparameterization trick [13]. More specifically, we sample x_d from the posterior by reparameterization over a random variable ε, i.e., x_d = μ_d + σ_d ⊙ ε, where ε ~ N(0, I). The expectation can then be approximated as:

E_{q(x_d | w_d)}[ log p(w_d | x_d, φ, β) ] ≈ (1/L) Σ_{l=1}^{L} log p(w_d | μ_d + σ_d ⊙ ε^{(l)}, φ, β),   ε^{(l)} ~ N(0, I)   (6)

In Eq. 6, the decoding term log p(w_d | x_d, φ, β) is computed as:

log p(w_d | x_d, φ, β) = w_d · log( θ_d β )ᵀ   (7)

where β is the K × V topic-word probability matrix, w_d is a row vector of word counts, θ_d is a row vector of topic proportions, and θ_{d,k} is computed as in Eq. 1. Based on Eq. 7 and Eq. 1, we propose using a decoder with two connected neural networks:

Normalized Radial Basis Function Network for computing θ_d. We generalize θ_{d,k} in Eq. 1 using a Normalized Radial Basis Function (RBF) Network [1] as follows:

θ_{d,k} = a_k f(‖x_d − φ_k‖) / Σ_{k′=1}^{K} a_{k′} f(‖x_d − φ_{k′}‖)   (8)

In this network, we have K neurons in the hidden layer, and φ_k is the center vector for neuron k. The RBF f is a non-linear function that depends on the distance r = ‖x_d − φ_k‖, and a_k is the influence weight of neuron k on θ_{d,k}, where a_k ≥ 0. While a_k can be estimated by optimizing the ELBO, we choose to fix it to equal weights for all neurons. The parameters of this network are then the center vectors of the neurons, which are the coordinates of topics in the visualization space. The RBF can have different forms, e.g., Gaussian: f(r) = exp(−r²/2), Inverse quadratic: f(r) = 1/(1 + r²), or Inverse multiquadric: f(r) = 1/√(1 + r²), where r is the distance1. When f is Gaussian, Eq. 8 reduces to Eq. 1. Note that this generalization of θ is also discussed in [20], but not in the context of VAE inference. Since topic coordinates φ are now the parameters of the RBF network, they can be shared and used by all documents for computing the topic distributions θ. In the experiments, we will show the performance of PLSV with these RBFs using VAE inference.
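The normalized RBF layer of Eq. 8 is straightforward to implement. The NumPy sketch below uses equal influence weights (an assumption for illustration) and checks that the Gaussian kernel recovers the softmax of Eq. 1; the coordinates are made up:

```python
import numpy as np

RBFS = {
    "gaussian":             lambda r: np.exp(-0.5 * r ** 2),
    "inverse_quadratic":    lambda r: 1.0 / (1.0 + r ** 2),
    "inverse_multiquadric": lambda r: 1.0 / np.sqrt(1.0 + r ** 2),
}

def normalized_rbf(x_d, phi, kernel="gaussian"):
    """Eq. 8 with equal influence weights: theta_{d,k} ∝ f(||x_d - phi_k||)."""
    r = np.linalg.norm(phi - x_d, axis=1)   # Euclidean distance to each center
    act = RBFS[kernel](r)
    return act / act.sum()

phi = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])   # K = 3 topic coordinates
x_d = np.array([0.2, 0.1])                              # document near topic 0

# With a Gaussian kernel, Eq. 8 reduces to the softmax of Eq. 1.
logits = -0.5 * np.sum((phi - x_d) ** 2, axis=1)
softmax_ref = np.exp(logits) / np.exp(logits).sum()
assert np.allclose(normalized_rbf(x_d, phi, "gaussian"), softmax_ref)

for kernel in RBFS:
    theta = normalized_rbf(x_d, phi, kernel)
    assert abs(theta.sum() - 1.0) < 1e-9   # a valid topic distribution
    assert theta.argmax() == 0             # nearest center gets the most mass
```

Swapping kernels amounts to swapping one entry in the dictionary, which reflects how little code a change of RBF requires in the black-box setting.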

Linear Neural Network for computing θ_d β. The output of the above normalized RBF network is the input of a linear neural network that computes θ_d β in the decoding term. We treat the word probabilities β as the parameters, i.e., the linear weights W, of the network, and β is computed using a softmax over the network weights to ensure the simplex constraint on each topic’s word distribution: β_k = softmax(W_k). The architecture of the whole Variational Auto-Encoder is given in Figure 1. We use batch normalization [9] to mitigate the posterior collapse issue found in the AEVB approach [7, 25].

Final Variational Objective Function. From Eqs. 4, 5, 6, and 7, we have the following objective function:

L(Θ) = Σ_{d=1}^{D} ( −KL( q(x_d | w_d) ‖ p(x_d) ) + E_{ε ~ N(0, I)}[ w_d · log( θ_d β )ᵀ ] )   (9)

where Θ represents all model and variational parameters: the topic coordinates φ of the RBF network (Eq. 8), the word probabilities β of the linear network, and the encoder parameters Λ.
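Putting the pieces together, one evaluation of the per-document objective can be sketched in NumPy as follows. The stubbed encoder outputs, the shapes, and the standard-normal prior (ρ = 1) are illustrative assumptions; in practice the encoder is a trained feed-forward network and the whole expression is optimized with Adam:

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, n = 4, 10, 2                      # topics, vocab size, visualization dim

# Stub encoder outputs for one document (in practice: an inference network).
mu, log_sigma2 = rng.normal(size=n), rng.normal(size=n) * 0.1
W_phi = rng.normal(size=(K, n))         # topic coordinates phi (decoder params)
W_beta = rng.normal(size=(K, V))        # linear weights; beta = softmax of rows
w_d = rng.integers(0, 3, size=V)        # word counts of the document

# Reparameterization: x_d = mu + sigma * eps, eps ~ N(0, I)   (Eq. 6)
eps = rng.normal(size=n)
x_d = mu + np.exp(0.5 * log_sigma2) * eps

# Decoder: normalized Gaussian RBF (Eq. 8) then the linear layer (Eq. 7).
act = np.exp(-0.5 * np.sum((W_phi - x_d) ** 2, axis=1))
theta = act / act.sum()                             # topic proportions
beta = np.exp(W_beta) / np.exp(W_beta).sum(axis=1, keepdims=True)
recon = w_d @ np.log(theta @ beta)                  # w_d log(theta beta)^T

# KL(q || N(0, I)) in closed form (Eq. 5 with a standard-normal prior).
kl = 0.5 * np.sum(np.exp(log_sigma2) + mu ** 2 - 1.0 - log_sigma2)

elbo = recon - kl                                   # per-document Eq. 9 term
assert np.isfinite(elbo) and kl >= 0.0
```

Summing this quantity over a mini-batch and taking gradients w.r.t. the encoder weights, W_phi, and W_beta gives the training step.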

Figure 1: The architecture of Variational Auto-Encoder for Visualization and Topic Modeling.

4 Experiments

We evaluate the effectiveness and efficiency of our proposed AEVB based inference method for visualization and topic modeling both quantitatively and qualitatively. We use four real-world public datasets from different domains including newswire articles, newsgroups posts and academic papers.

Dataset Description

  • Reuters2: contains 7674 newswire articles from 8 categories [4].

  • 20 Newsgroups3: contains 18251 newsgroups posts from 20 categories.

  • Web of Science4: we use Web of Science WOS-46985 dataset [15]. It contains the abstracts and keywords of 46,985 published papers from 7 research domains: CS, Psychology, Medical, ECE, Civil, MAE, and Biochemistry.

  • Arxiv5: contains the titles and abstracts of 598,748 research papers from arXiv. The papers are from 7 categories: Math, CS, Nucl, Stat, Astro, Quant, and Physics.

We perform preprocessing by removing stopwords and stemming. The vocabulary sizes are 3000, 3248, 4000, and 5000 for Reuters, 20 Newsgroups, Web of Science, and Arxiv respectively. Note that our problem is unsupervised and the ground-truth class labels are mainly used for evaluation. Before detailing the experiment results, we describe the comparative methods.

Comparative Methods. We compare the following methods for inferring topics and visualization:

Joint approach:

  • PLSV-MAP6: the original PLSV using MAP estimation with EM algorithm [10].

  • PLSV-VAE (Gaussian) [this paper]7: we apply our proposed variational auto-encoder (VAE) inference to PLSV where Gaussian RBF is used. We write PLSV-VAE to refer to PLSV-VAE (Gaussian).

  • PLSV-VAE (Inverse quadratic) and PLSV-VAE (Inverse multiquadric) [this paper]: these are PLSV-VAE models with Inverse quadratic and Inverse multiquadric RBFs. Since our method is black box, we can quickly implement these two models by just changing a few lines of code of PLSV-VAE (Gaussian) implementation.

Pipeline approach: this is the approach of topic modeling followed by embedding of documents’ topic proportions for visualization. We compare the above joint models with two pipeline models:

  • LDA-VAE + t-SNE: topic modeling by LDA8 with VAE inference [26], then use t-SNE9 [22] to visualize the documents’ topic proportions.

  • ProdLDA-VAE + t-SNE: similar to the above but we use ProdLDA-VAE8 instead of LDA-VAE.

In the next sections, we report experiment results averaged across 10 independent runs. For PLSV models, we choose prior hyperparameters that work well for large datasets in our experiments. We run PLSV-MAP with the number of EM iterations set to 200 and the maximum number of iterations for the quasi-Newton algorithm set to 10. Following AVITM, we set the batch size to 256, the number of samples per document to 1, and the learning rate to 0.002, and we use dropout. We use Adam as our optimizer. VAE based models are trained for 1000 epochs. All experiments are conducted on a system with 64GB memory and an Intel(R) Xeon(R) CPU E5-2623 v3 with 16 cores at 3.00GHz. The GPU in use on this system is an NVIDIA Quadro P2000 with 1024 CUDA cores and 5 GB of GDDR5 memory.

4.1 Classification in the Visualization Space

We quantitatively evaluate the visualization quality by measuring the k-NN accuracy in the visualization space. This evaluation approach is also adopted in t-SNE, LargeVis, and the original PLSV. A k-NN classifier is used to classify documents using their visualization coordinates. A good visualization should group documents with the same label together and hence yield a high classification accuracy in the visualization space. Figures 2 and 3 show the k-NN accuracy of all methods on each dataset, for varying numbers of nearest neighbors k and numbers of topics K. For some settings, we do not show PLSV-MAP’s performance as it does not return any results even after 24 hours of running. We can see that PLSV-VAE consistently achieves the best result, except for 25 topics on Reuters (Figure 3a), where it produces a result comparable with PLSV-MAP. These results show that the joint approach outperforms the pipeline approach and that VAE inference may help improve the visualization quality of PLSV. To verify this qualitatively, in Section 4.3 we show visualization examples of all methods across datasets. Note that in this section, we show the accuracy of PLSV-VAE with the Gaussian RBF. In Section 4.4, we present the performance of PLSV-VAE with different RBFs.
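The k-NN accuracy used here can be computed directly from the 2D coordinates and labels. A small self-contained sketch (the synthetic coordinates are illustrative; the paper's actual inputs are the learned document coordinates):

```python
import numpy as np

def knn_accuracy(coords, labels, k=10):
    """Classify each point by majority vote among its k nearest neighbors
    (itself excluded) in the visualization space; return mean accuracy."""
    coords, labels = np.asarray(coords), np.asarray(labels)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # exclude the point itself
    correct = 0
    for i in range(len(coords)):
        nn = np.argsort(d[i])[:k]               # indices of k nearest neighbors
        votes = np.bincount(labels[nn])
        correct += int(votes.argmax() == labels[i])
    return correct / len(coords)

# Two well-separated synthetic clusters should give perfect accuracy.
rng = np.random.default_rng(1)
coords = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
labels = np.array([0] * 20 + [1] * 20)
assert knn_accuracy(coords, labels, k=5) == 1.0
```

For corpora of this size, a tree-based nearest-neighbor implementation (e.g., scikit-learn's KNeighborsClassifier) avoids the quadratic distance matrix.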

Figure 2: k-NN accuracy in the visualization space with different numbers of nearest neighbors k. For some settings, PLSV-MAP does not return any results even after 24 hours of running.
Figure 3: k-NN accuracy in the visualization space with different numbers of topics K. For some settings, PLSV-MAP does not return any results even after 24 hours of running.
Figure 4: Topic coherence based on NPMI with different numbers of topics K. For some settings, PLSV-MAP does not return any results even after 24 hours of running.

4.2 Topic Coherence

We quantitatively measure the quality of the topic models produced by all methods in terms of topic coherence. The objective is to show that, while having better visualization quality, PLSV-VAE also gains comparable, if not better, topic coherence. For topic coherence evaluation, we use Normalized Pointwise Mutual Information (NPMI), which has been shown to be correlated with human judgments [16]. NPMI is computed as follows:

NPMI(w_i, w_j) = log( P(w_i, w_j) / (P(w_i) P(w_j)) ) / ( −log P(w_i, w_j) )   (10)

We estimate P(w_i), P(w_j), and P(w_i, w_j) using the Wikipedia 7-gram dataset10 created from the Wikipedia dump data of the June 2008 version. The NPMI of a topic is computed as the average pairwise NPMI of its top 10 words. For each method, we average the NPMI over its topics. Figure 4 shows the topic coherence (NPMI) of all methods. As we can see, PLSV-VAE finds topics as good as those found by other methods, and in some settings, PLSV-VAE can find significantly better topics. For a qualitative evaluation of topic quality, we show example topics found by PLSV-VAE in Figure 9.
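For reference, the NPMI of a word pair follows directly from Eq. 10; the probabilities below are toy values standing in for the Wikipedia 7-gram estimates:

```python
import math

def npmi(p_ij, p_i, p_j):
    """Normalized PMI (Eq. 10): log(p_ij / (p_i * p_j)) / (-log p_ij).
    Ranges over [-1, 1]; 1 means the two words always co-occur."""
    return math.log(p_ij / (p_i * p_j)) / (-math.log(p_ij))

# Perfectly associated words: p_ij == p_i == p_j  ->  NPMI == 1.
assert abs(npmi(0.1, 0.1, 0.1) - 1.0) < 1e-12
# Independent words: p_ij == p_i * p_j  ->  NPMI == 0.
assert abs(npmi(0.01, 0.1, 0.1)) < 1e-12
```

A topic's score is then the mean of `npmi` over all pairs of its top 10 words, as described above.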

4.3 Visualization Examples

Figure 5: Visualization of Reuters by a) PLSV-MAP b) PLSV-VAE.
Figure 6: Visualization of 20 Newsgroups by a) PLSV-MAP b) PLSV-VAE.
Figure 7: Visualization of Web of Science by a) ProdLDA-VAE + t-SNE b) PLSV-VAE.
Figure 8: Visualization of Arxiv by a) ProdLDA-VAE + t-SNE b) PLSV-VAE.
Figure 9: Visualization and topics found by PLSV-VAE (Inverse quadratic) on Arxiv.
Figure 10: k-NN accuracy in the visualization space by PLSV-VAE with different RBFs.
Figure 11: Topic coherence (NPMI) by PLSV-VAE with different RBFs and varying numbers of topics K.
Figure 12: Running time comparison.

We compare visualizations produced by all methods qualitatively by showing some visualization examples. In these visualizations, each document is represented by a point, and the color of each point indicates the class of that document. Figures 5 and 6 present visualizations by PLSV-MAP and PLSV-VAE on Reuters and 20 Newsgroups. We see that PLSV-VAE can find meaningful clusters of documents. For example, PLSV-VAE in Figure 5(b) separates the eight classes well into different clusters, such as the pink cluster for acq, the orange cluster for earn, and the brown cluster for crude. The visualization by PLSV-MAP in Figure 5(a) also shows clear clusters, but it runs much slower than PLSV-VAE, as shown in Section 4.5. Figure 6 presents visualization outputs for 20 Newsgroups. For this more challenging dataset, PLSV-VAE produces better-separated clusters than PLSV-MAP. For example, baseball and hockey are mixed in Figure 6(a) by PLSV-MAP, but these classes are separated better in Figure 6(b) by PLSV-VAE. We do not show visualizations of Web of Science and Arxiv by PLSV-MAP because it fails to return any results even after 24 hours of running. We instead show visualizations of these two large datasets by PLSV-VAE and ProdLDA-VAE + t-SNE in Figures 7 and 8. As we can see, the visualizations by PLSV-VAE are more intuitive than those by ProdLDA-VAE + t-SNE, which supports the outperformance of the joint approach over the pipeline approach.

4.4 Comparing Different Radial Basis Functions

Since our method is black box, we can quickly explore the PLSV-VAE model with different assumptions. In this section, we show how different RBFs affect the performance of PLSV-VAE. Besides PLSV-VAE with the Gaussian RBF, we implement two further variants of PLSV-VAE that use two other RBFs: the Inverse quadratic and Inverse multiquadric RBFs. We choose these two because, like the Gaussian, they support the assumption that the k-th topic proportion θ_{d,k} of document d is high when the document coordinate x_d is close to the topic coordinate φ_k. For these model changes, we do not need to perform a mathematical rederivation; we only need to change a few lines of code of the PLSV-VAE (Gaussian) implementation. Figures 10 and 11 show the k-NN accuracy and topic coherence of PLSV-VAE with different RBFs. In general, PLSV-VAE with Gaussian or Inverse quadratic RBFs consistently produces good performance across datasets. In some cases, the Inverse quadratic produces better results.

4.5 Topic Examples and Running Time Comparison

To qualitatively evaluate the topics, in Figure 9 we show visualization and topic examples generated by PLSV-VAE (Inverse quadratic) on Arxiv. In the visualization, each black empty circle represents a topic that is associated with a list of its top 10 words. We see that the topics are meaningful and reflect different research subdomains discussed in the Arxiv papers. For example, many topics are studied in the CS domain, such as “graph, g, vertex, k”, “model, data, use, method”, and “logic, program, system”. For the Astro domain, we have topics like “galaxi, cluster, star” and “observ, ray, model, star”. Topics such as “energi, nucleu, reaction” and “electron, energi, atom” are discussed in the Nucl domain. By allowing the semantics to be infused in the visualization space, users can now not only see the documents but also their topics. The joint nature of the model may lead to potential applications in different visual text mining tasks.

Finally, we show the running time of all the methods in Figure 12. As expected, PLSV-MAP running on a single core is very slow and it fails to return any results on large datasets even after 24 hours of running. PLSV-VAE runs much faster. It only needs about 5 hours for 200 topics on the largest dataset Arxiv. For completeness, we also include the running time of LDA-VAE, and ProdLDA-VAE. PLSV-VAE is as fast as these methods. In summary, PLSV-VAE can find good topics and visualization while it can scale well to large datasets, which will increase its usability in practice.

5 Conclusion

We propose, to the best of our knowledge, the first fast AEVB based inference method for jointly learning topics and visualization. In our approach, we design a decoder that includes a normalized RBF network connected to a linear neural network. These networks are parameterized by topic coordinates and word probabilities, ensuring that they are shared across all documents. Due to our method’s black box nature, we can quickly experiment with different RBFs with minimal reimplementation effort. Our extensive experiments on four real-world large datasets show that PLSV-VAE runs much faster than PLSV-MAP while gaining better visualization quality and comparable topic coherence.

Acknowledgements

This research is sponsored by NSF #1757207 and NSF #1914635.

Footnotes

  1. r is the Euclidean distance in our experiments
  2. http://ana.cachopo.org/datasets-for-single-label-text-categorization
  3. https://scikit-learn.org/0.19/datasets/twenty_newsgroups.html
  4. https://data.mendeley.com/datasets/9rw3vkcfy4/6
  5. http://zhang18f.myweb.cs.uwindsor.ca/datasets/
  6. We use the implementation at https://github.com/tuanlvm/SEMAFORE
  7. The implementation of our method can be found at https://github.com/dangpnh2/plsv_vae
  8. We use the implementation at https://github.com/akashgit/autoencoding_vi_for_topic_models
  9. We use the Multicore t-SNE implementation at https://github.com/DmitryUlyanov/Multicore-TSNE
  10. https://nlp.cs.nyu.edu/wikipedia-data/

References

  1. C. M. Bishop (1995) Neural networks for pattern recognition. Oxford University Press. Cited by: §1, §3.
  2. D. M. Blei and J. D. Lafferty (2007) A correlated topic model of science. The Annals of Applied Statistics 1 (1), pp. 17–35. Cited by: §2.1.
  3. D. M. Blei, A. Y. Ng and M. I. Jordan (2003) Latent dirichlet allocation. Journal of machine Learning research 3 (Jan), pp. 993–1022. Cited by: §2.1.
  4. A. Cardoso-Cachopo (2007) Improving Methods for Single-label Text Categorization. Note: PdD Thesis, Instituto Superior Tecnico, Universidade Tecnica de Lisboa Cited by: 1st item.
  5. F. P. Casale, A. Dalca, L. Saglietti, J. Listgarten and N. Fusi (2018) Gaussian process prior variational autoencoders. In Advances in Neural Information Processing Systems, pp. 10369–10380. Cited by: §1.
  6. J. Choo, C. Lee, C. K. Reddy and H. Park (2013) Utopian: user-driven topic modeling based on interactive nonnegative matrix factorization. IEEE transactions on visualization and computer graphics 19 (12), pp. 1992–2001. Cited by: §1.
  7. J. He, D. Spokoyny, G. Neubig and T. Berg-Kirkpatrick (2019-05) Lagging inference networks and posterior collapse in variational autoencoders. In International Conference on Learning Representations (ICLR), New Orleans, LA, USA. External Links: Link Cited by: §1, §3.
  8. T. Hofmann (1999) Probabilistic latent semantic analysis. In UAI, Cited by: §2.1.
  9. S. Ioffe and C. Szegedy (2015-07–09 Jul) Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning, F. Bach and D. Blei (Eds.), Proceedings of Machine Learning Research, Vol. 37, Lille, France, pp. 448–456. External Links: Link Cited by: §3.
  10. T. Iwata, T. Yamada and N. Ueda (2008) Probabilistic latent semantic visualization: topic model for visualizing documents. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 363–371. Cited by: §1, §2.1, 1st item.
  11. A. T. Kalai, A. Moitra and G. Valiant (2010) Efficiently learning mixtures of two gaussians. In Proceedings of the forty-second ACM symposium on Theory of computing, pp. 553–562. Cited by: §3.
  12. H. Kim, D. Choi, B. Drake, A. Endert and H. Park (2019) TopicSifter: interactive search space reduction through targeted topic modeling. In 2019 IEEE Conference on Visual Analytics Science and Technology (VAST), pp. 35–45. Cited by: §2.1.
  13. D. P. Kingma and M. Welling (2014) Auto-encoding variational bayes. In 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, Cited by: §1, §3.
  14. D. P. Kingma and M. Welling (2014) Auto-encoding variational bayes. CoRR abs/1312.6114. Cited by: §2.2.
  15. K. Kowsari, D. E. Brown, M. Heidarysafa, K. Jafari Meimandi, M. S. Gerber and L. E. Barnes (2017) HDLTex: hierarchical deep learning for text classification. In Machine Learning and Applications (ICMLA), 2017 16th IEEE International Conference on, Cited by: 3rd item.
  16. J. H. Lau, D. Newman and T. Baldwin (2014) Machine reading tea leaves: automatically evaluating topic coherence and topic model quality. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, pp. 530–539. Cited by: §4.2.
  17. T. Le and L. Akoglu (2019) ContraVis: contrastive and visual topic modeling for comparing document collections. In The World Wide Web Conference, pp. 928–938. Cited by: §1, §2.1.
  18. T. M. Le and H. W. Lauw (2014) Manifold learning for jointly modeling topic and visualization. In Twenty-Eighth AAAI Conference on Artificial Intelligence, Cited by: §2.1.
  19. T. M. Le and H. W. Lauw (2014) Semantic visualization for spherical representation. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 1007–1016. Cited by: §2.1.
  20. T. M. Le and H. W. Lauw (2016) Semantic visualization with neighborhood graph regularization. Journal of Artificial Intelligence Research 55, pp. 1091–1133. Cited by: §3.
  21. S. Lin, R. Clark, R. Birke, N. Trigoni and S. J. Roberts (2019) WiSE-ale: wide sample estimator for aggregate latent embedding. In Deep Generative Models for Highly Structured Data, ICLR 2019 Workshop, New Orleans, Louisiana, United States, May 6, 2019, Cited by: §1, §2.2.
  22. L. v. d. Maaten and G. Hinton (2008) Visualizing data using t-sne. Journal of machine learning research 9 (Nov), pp. 2579–2605. Cited by: §1, §2.1, 1st item.
  23. D. Ramage, E. Rosen, J. Chuang, C. D. Manning and D. A. McFarland (2009) Topic modeling for the social sciences. In NIPS 2009 workshop on applications for topic models: text and beyond, Vol. 5, pp. 27. Cited by: §2.1.
  24. R. Ranganath, S. Gerrish and D. M. Blei (2014) Black box variational inference. ArXiv abs/1401.0118. Cited by: §2.2.
  25. A. Razavi, A. van den Oord, B. Poole and O. Vinyals (2019) Preventing posterior collapse with delta-vaes. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, Cited by: §3.
  26. A. Srivastava and C. A. Sutton (2017) Autoencoding variational inference for topic models. In ICLR, Cited by: §1, §2.2, 1st item.
  27. J. Tang, J. Liu, M. Zhang and Q. Mei (2016) Visualizing large-scale and high-dimensional data. In Proceedings of the 25th international conference on world wide web, pp. 287–297. Cited by: §2.1.
  28. M. Tkachenko and H. W. Lauw (2019) Comparelda: a topic model for document comparison. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 7112–7119. Cited by: §2.1.