Discriminative Relational Topic Models

Ning Chen, Jun Zhu, Fei Xia, and Bo Zhang

N. Chen, J. Zhu, F. Xia, and B. Zhang are with the Department of Computer Science and Technology, National Lab of Information Science and Technology, State Key Lab of Intelligent Technology and Systems, Tsinghua University, Beijing, 100084 China.
E-mail: {ningchen, dcszj, dcszb}@mail.tsinghua.edu.cn, xia.fei09@gmail.com.
Abstract

Many scientific and engineering fields involve analyzing network data. For document networks, relational topic models (RTMs) provide a probabilistic generative process that describes both the link structure and the document contents, and they have shown promise at predicting network structures and discovering latent topic representations. However, existing RTMs suffer from restricted model expressiveness and an inability to deal with imbalanced network data. To expand the scope and improve the inference accuracy of RTMs, this paper presents three extensions: 1) unlike the common link likelihood with a diagonal weight matrix that allows same-topic interactions only, we generalize it to use a full weight matrix that captures all pairwise topic interactions and is applicable to asymmetric networks; 2) instead of doing standard Bayesian inference, we perform regularized Bayesian inference (RegBayes) with a regularization parameter to deal with the imbalanced link structure common in real networks and to improve the discriminative ability of the learned latent representations; and 3) instead of doing variational approximation with strict mean-field assumptions, we present collapsed Gibbs sampling algorithms for the generalized relational topic models by exploring data augmentation, without making restrictive assumptions. Under the generic RegBayes framework, we carefully investigate two popular discriminative loss functions, namely the logistic log-loss and the max-margin hinge loss. Experimental results on several real network datasets demonstrate that these extensions significantly improve prediction performance, and that time efficiency can be dramatically improved with a simple fast approximation method.

Keywords

statistical network analysis, relational topic models, data augmentation, regularized Bayesian inference

1 Introduction

Many scientific and engineering fields involve analyzing large collections of data that can be well described by networks, where vertices represent entities and edges represent relationships or interactions between entities; such data include online social networks, communication networks, protein interaction networks, academic paper citation and coauthorship networks, and so on. As the availability and scope of network data increase, statistical network analysis (SNA) has attracted considerable attention (see [Goldenberg:2010] for a comprehensive survey). Among the many tasks studied in SNA, link prediction [liben_nowell, backstrom] is one of the most fundamental: it attempts to estimate the link structure of a network based on partially observed links and/or entity attributes (if they exist). Link prediction can provide useful predictive models, for example for suggesting friends to social network users or citations for scientific articles.

Many link prediction methods have been proposed, including early work on designing good similarity measures [liben_nowell], which are used to rank unobserved links, and work on learning supervised classifiers with well-conceived features [Hasan:2006, Lichtenwalter:2010]. Though specific domain knowledge can be used to design effective feature representations, feature engineering is generally a labor-intensive process. To expand the scope and ease of applicability of machine learning methods, there has been fast-growing interest in learning feature representations from data [Bengio:2012]. Along this line, recent research on link prediction has focused on learning latent variable models, including both parametric [Hoff:02, Hoff:07, Airoldi:nips08] and nonparametric Bayesian methods [Miller:nips09, Zhu:ICML12]. Though these methods can model network structures well, little attention has been paid to the observed attributes of entities, such as the text contents of papers in a citation network or the contents of web pages in a hyperlinked network. One model that accounts for both text contents and network structure is the relational topic model (RTM) [Chang:RTM09], an extension of latent Dirichlet allocation (LDA) [Blei:03] that predicts link structures among documents as well as discovering their latent topic representations.
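For concreteness, a common form of the RTM link likelihood (following [Chang:RTM09]; the notation below is a sketch, not necessarily the paper's exact parameterization) scores a pair of documents through the element-wise product of their mean topic-assignment vectors:

```latex
% RTM-style link likelihood (notation assumed for this sketch):
% \bar{z}_d is the average of document d's topic-assignment indicators,
% \circ the element-wise product, \eta a K-dimensional weight vector,
% \nu an offset, and \sigma the logistic function.
p(y_{dd'} = 1 \mid \bar{z}_d, \bar{z}_{d'})
  = \sigma\big(\eta^\top (\bar{z}_d \circ \bar{z}_{d'}) + \nu\big),
\qquad
\sigma(x) = \frac{1}{1 + e^{-x}}.
```

Writing the weight vector as a diagonal matrix makes the restriction explicit: only same-topic co-occurrences contribute to the link score.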

Though powerful, existing RTMs make assumptions that can limit their applicability and inference accuracy. First, RTMs define a symmetric link likelihood with a diagonal weight matrix that allows same-topic interactions only, and this symmetry also makes RTMs unsuitable for asymmetric networks. Second, by performing standard Bayesian inference under a generative modeling process, RTMs do not explicitly deal with the imbalance common in real networks, which typically have only a few observed links while most entity pairs are unlinked; as a result, the learned topic representations can be weak at predicting link structures. Finally, RTMs and their variants [LiuYan:09] estimate model parameters with variational methods under mean-field assumptions [Jordan:99], which are normally too restrictive to be realistic in practice.
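The first limitation is easiest to see by contrasting the two discriminant functions side by side (a schematic contrast, in the notation of the sketch above):

```latex
% RTM: diagonal weights, so topic k of document d interacts only with
% topic k of document d'; the score is also symmetric in (d, d').
\omega_{\mathrm{RTM}}(d, d') = \bar{z}_d^\top \,\mathrm{diag}(\eta)\, \bar{z}_{d'}

% gRTM: a full K-by-K weight matrix U captures all pairwise topic
% interactions U_{kk'}; since U need not be symmetric, in general
% \omega(d, d') \neq \omega(d', d), which suits directed networks.
\omega_{\mathrm{gRTM}}(d, d') = \bar{z}_d^\top U \, \bar{z}_{d'}
```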

To address the above limitations, this paper presents discriminative relational topic models, which consist of three extensions that improve RTMs:

  1. we relax the symmetric assumption and define the generalized relational topic models (gRTMs) with a full weight matrix that allows all pairwise topic interactions and is more suitable for asymmetric networks;

  2. we perform regularized Bayesian inference (RegBayes) [Zhu:nips11] that introduces a regularization parameter to deal with the imbalance problem in common real networks (a schematic formulation is sketched after this list);

  3. we present a collapsed Gibbs sampling algorithm for gRTMs by exploring the classical ideas of data augmentation [Dempster1977, Tanner:1987, DykMeng2001].
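As a rough sketch of how extension 2 works (our schematic rendering; see [Zhu:nips11] for the precise RegBayes setup), posterior inference is cast as an optimization problem that trades off closeness to the standard Bayesian posterior against a discriminative loss on the training links:

```latex
% Schematic RegBayes objective (notation assumed for this sketch):
% q(M) is the post-data posterior being sought, p(M | D) the standard
% Bayesian posterior over the model M, \ell a link-prediction loss
% (log-loss or hinge loss below), and c >= 0 a regularization parameter.
% Using a larger c on the few positive (linked) pairs than on the many
% negative pairs counteracts the imbalance of real networks.
\min_{q(M)} \;
  \mathrm{KL}\big(q(M) \,\|\, p(M \mid \mathcal{D})\big)
  \; + \; c \sum_{(d, d') \in \mathcal{I}}
      \ell\big(y_{dd'}, \, \omega_{dd'}(M)\big)
```

With the log-loss and a uniform c = 1, this essentially recovers standard Bayesian inference, so the regularization parameter strictly generalizes the generative treatment.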

Our methods are quite generic, in the sense that various loss functions can be used to learn discriminative latent representations. In this paper, we focus on two popular types of loss functions, namely the logistic log-loss and the max-margin hinge loss. For the max-margin loss, the resulting max-margin RTMs are themselves a new contribution to the field of statistical network analysis.
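With omega denoting the discriminant value for a document pair, the two losses can be written as follows (a standard rendering; the label conventions and the margin parameter are assumptions of this sketch):

```latex
% Logistic log-loss: the negative log-likelihood of a logistic model,
% with binary link labels y \in \{0, 1\}.
\ell_{\mathrm{log}}(y, \omega) = \log\big(1 + e^{\omega}\big) - y\,\omega

% Max-margin hinge loss: labels y \in \{+1, -1\} and margin \ell > 0;
% the loss is zero once y\,\omega clears the margin \ell.
\ell_{\mathrm{hinge}}(y, \omega) = \max\big(0, \; \ell - y\,\omega\big)
```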

For posterior inference, we present efficient Markov chain Monte Carlo (MCMC) methods for both types of loss functions by introducing auxiliary variables. Specifically, for the logistic log-loss, we introduce a set of Polya-Gamma random variables [Polson:arXiv12], one per training link, to derive an exact mixture representation of the logistic link likelihood; for the max-margin hinge loss, we introduce a set of generalized inverse Gaussian variables [Devroye:book1986] to derive a mixture representation of the corresponding unnormalized pseudo-likelihood. We then integrate out the intermediate Dirichlet variables and analytically derive the local conditional distributions for collapsed Gibbs sampling. These “augment-and-collapse” algorithms are simple and efficient. More importantly, they do not make any restrictive assumptions on the desired posterior distribution. Experimental results on several real networks demonstrate that these extensions are important and can significantly improve performance.
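Both augmentations rest on known integral identities, stated here in the notation of the sketches above (see [Polson:arXiv12] for the first; the second is the classical data-augmentation identity for the hinge loss):

```latex
% Polya-Gamma identity: the logistic likelihood is a mixture of Gaussian
% kernels in \omega, mixed over \lambda ~ PG(b, 0), with \kappa = a - b/2.
\frac{(e^{\omega})^{a}}{(1 + e^{\omega})^{b}}
  = 2^{-b} e^{\kappa \omega}
    \int_{0}^{\infty} e^{-\lambda \omega^{2}/2}\,
      p_{\mathrm{PG}}(\lambda \mid b, 0)\, d\lambda

% Hinge-loss identity: the exponentiated hinge pseudo-likelihood is a
% mixture of Gaussian kernels in \zeta = \ell - y\,\omega; the conditional
% distribution of \lambda is generalized inverse Gaussian.
e^{-2c \max(0,\, \zeta)}
  = \int_{0}^{\infty} \frac{1}{\sqrt{2\pi\lambda}}
      \exp\Big(\!-\frac{(\lambda + c\,\zeta)^{2}}{2\lambda}\Big)\, d\lambda
```

In either case, conditioned on the auxiliary variable, the link term is Gaussian in the discriminant value, which is what allows the Dirichlet variables to be collapsed out and the topic assignments to be sampled from closed-form local conditionals.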

The rest of the paper is structured as follows. Section 2 summarizes related work. Section 3 presents the generalized RTMs with both the log-loss and the hinge loss. Section 4 presents the “augment-and-collapse” Gibbs sampling algorithms for both types of loss functions. Section 5 presents experimental results. Finally, Section 6 concludes and discusses future directions.

Topic 1: learning, bound, PAC, hypothesis, algorithm
Topic 2: numerical, solutions, extensions, approach, remark
Topic 3: mixtures, experts, EM, Bayesian, probabilistic
Topic 4: features, selection, case-based, networks, model
Topic 5: planning, learning, acting, reinforcement, dynamic
Topic 6: genetic, algorithm, evolving, evolutionary, learning
Topic 7: plateau, feature, performance, sparse, networks
Topic 8: modulo, schedule, parallelism, control, processor
Topic 9: neural, cortical, networks, learning, feedforward
Topic 10: markov, models, monte, carlo, Gibbs, sampler

TABLE I: Representative words corresponding to each topic of a 10-topic RTM with a learned diagonal weight matrix.