Small-Variance Asymptotics for Nonparametric Bayesian Overlapping Stochastic Blockmodels
Abstract
The latent feature relational model (LFRM) is a generative model for graph-structured data that learns a binary vector representation for each node in the graph. The binary vector denotes the node's membership in one or more communities. At its core, the LFRM [Miller et al., 2009] is an overlapping stochastic blockmodel, which defines the link probability between any pair of nodes as a bilinear function of their community membership vectors. Moreover, using a nonparametric Bayesian prior (the Indian Buffet Process) enables learning the number of communities automatically from the data. However, despite its appealing properties, inference in the LFRM remains a challenge and is typically done via MCMC methods, which can be slow and may take a long time to converge. In this work, we develop a small-variance asymptotics based framework for the nonparametric Bayesian LFRM. This leads to an objective function that retains the nonparametric Bayesian flavor of the LFRM while enabling us to design deterministic inference algorithms for this model that are easy to implement (using generic or specialized optimization routines) and fast in practice. Our results on several benchmark datasets demonstrate that our algorithm is competitive with methods such as MCMC, while being much faster.
1 Introduction
Relational data, such as graphs given as adjacency matrices, are prevalent in many domains, such as the analysis of social networks, biological networks, and citation networks. Stochastic blockmodels and their extensions [Nowicki and Snijders, 2001; Kemp et al., 2006; Airoldi et al., 2008; Miller et al., 2009; Zhou, 2015] are attractive models for such graph-structured data. These models are commonly used for discovering the underlying latent structure in the graph (e.g., via a low-dimensional vector space representation of the nodes) and for link-prediction. The latent feature relational model (LFRM) [Miller et al., 2009] is a particularly attractive variant of stochastic blockmodels that allows each node to simultaneously belong to multiple communities by modeling each node via a binary membership vector. The LFRM can also be seen as learning an overlapping clustering of the nodes in the graph (each community represents a cluster). However, unlike various other models for learning overlapping clusterings of nodes in a graph [Xie et al., 2013], which can only learn community memberships, the LFRM generative model also defines the probability of a link between any pair of nodes via a bilinear function of their community membership vectors. As a consequence, it can also be used for link-prediction. Another very appealing property of the LFRM is that the number of communities can be inferred from the data using an Indian Buffet Process prior [Griffiths and Ghahramani, 2011] on the binary node-community assignment matrix.
Despite its expressiveness and modeling flexibility, inference in the LFRM remains a challenge. The model is non-conjugate and the only existing inference method is based on Markov Chain Monte Carlo (MCMC) sampling [Miller et al., 2009]. MCMC based methods can be slow to mix and converge, especially for nonparametric Bayesian models like the LFRM. It is therefore highly desirable to develop faster, alternative inference methods for the LFRM.
In this work, we appeal to the idea of small-variance asymptotics [Kulis and Jordan, 2011; Broderick et al., 2013] in the context of the LFRM to get an equivalent non-probabilistic model. The resulting model retains the flavor of the original LFRM (e.g., the ability to infer the number of communities), but has a much simpler inference procedure which boils down to solving an optimization problem, for which existing off-the-shelf or specialized optimization routines can be used. We would like to note here that the idea of small-variance asymptotics (SVA) has also been explored recently to obtain non-probabilistic counterparts of various other nonparametric Bayesian models. However, unlike these recent works, which apply SVA to models of i.i.d./sequential vector-valued data [Broderick et al., 2013; Roychowdhury et al., 2013; Wang and Zhu, 2015], our work is motivated by the need to develop SVA based algorithms for relational data, such as graphs. We believe our work will motivate and open the door to the design of fast, deterministic algorithms for learning from relational data. Our experiments on several benchmark datasets show that our algorithm attains improved or similar link-prediction accuracies as compared to MCMC based inference for the LFRM, while being much faster.
2 Latent Feature Relational Model
We first introduce the notation and problem setup and then briefly describe the nonparametric Bayesian latent feature relational model (LFRM) [Miller et al., 2009] for network data, for which we develop the small-variance asymptotics used to design our inference algorithm.
We assume that the data is given as a graph between $N$ entities, represented as an $N \times N$ adjacency matrix $A$, where $A_{ij} = 1$ denotes the presence of a link (edge) between node $i$ and node $j$, and $A_{ij} = 0$ denotes that there is no link. The matrix $A$, however, is only partially observed and the goal is to predict the presence/absence of edges where it is not observed. This is essentially a link-prediction task.
The LFRM [Miller et al., 2009] assumes that each node $i$ in the graph is associated with a binary latent feature vector $z_i \in \{0,1\}^{K}$, where $K$ denotes the total number of latent features. Note that $K$ can also be thought of as denoting the total number of communities/clusters. Here, $z_{ik} = 1$ indicates that node $i$ contains latent feature $k$, which is equivalent to saying that node $i$ belongs to community $k$ (and $z_{ik} = 0$ otherwise). Note that in the LFRM, a node can potentially belong to more than one community. We represent the latent features of all the entities by $Z = [z_1; \ldots; z_N]$, the $N \times K$ binary matrix, which can also be interpreted as the node-community assignment matrix. In the rest of the exposition, we will sometimes use the terms latent feature, community, and cluster interchangeably; they all refer to the same thing.
The LFRM models the probability of a link between node $i$ and node $j$ as a bilinear function of their latent feature vectors (denoting their cluster assignments)
$$P(A_{ij} = 1 \mid Z, W) = \sigma\left(z_i W z_j^\top\right) \tag{1}$$
where $\sigma(x) = 1/(1 + e^{-x})$ is the sigmoid function. Here $W$ denotes a real-valued $K \times K$ matrix, with $W_{kk'}$ denoting the weight affecting the probability of a link between node $i$ belonging to cluster $k$ and node $j$ belonging to cluster $k'$.
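The overall likelihood for the model can be written as
$$P(A \mid Z, W) = \prod_{i,j} P(A_{ij} \mid Z, W) \tag{2}$$
where each $A_{ij}$ is a Bernoulli with probability as defined in Eq. 1, and the product runs over the observed entries of $A$. Assuming the observations to be i.i.d. conditioned on the latent features, the likelihood will be
$$P(A \mid Z, W) = \prod_{i,j} \sigma\left(z_i W z_j^\top\right)^{A_{ij}} \left(1 - \sigma\left(z_i W z_j^\top\right)\right)^{1 - A_{ij}} \tag{3}$$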
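The link probability and likelihood described above can be illustrated with a minimal sketch. It assumes the standard LFRM form, a sigmoid applied to the bilinear score of two binary membership vectors, and uses plain Python lists. The names `Z`, `W`, `A`, `observed`, and all numeric values are illustrative choices rather than anything taken from the experiments.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def bilinear(z_i, W, z_j):
    """Bilinear score z_i W z_j^T for two binary membership vectors."""
    return sum(z_i[k] * W[k][l] * z_j[l]
               for k in range(len(z_i)) for l in range(len(z_j)))

def log_likelihood(A, Z, W, observed):
    """Bernoulli log-likelihood summed over the observed (i, j) pairs."""
    total = 0.0
    for i, j in observed:
        p = sigmoid(bilinear(Z[i], W, Z[j]))
        total += math.log(p) if A[i][j] == 1 else math.log(1.0 - p)
    return total

# Toy example: 3 nodes, 2 communities (all values are illustrative).
Z = [[1, 0], [1, 1], [0, 1]]          # node 1 belongs to both communities
W = [[2.0, -1.0], [-1.0, 2.0]]        # within-community weights are positive
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
observed = [(0, 1), (0, 2), (1, 2)]
ll = log_likelihood(A, Z, W, observed)
```

Only the pairs listed in `observed` contribute, which mirrors the partially observed adjacency matrix of the link-prediction setting.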
The LFRM contains two main unknowns: the binary matrix $Z$ of size $N \times K$ and the real-valued matrix $W$ of size $K \times K$. The LFRM [Miller et al., 2009] assumes a Gaussian prior on each entry of $W$
$$W_{kk'} \sim \mathcal{N}(0, \sigma_w^2) \tag{4}$$
In order to automatically learn the appropriate number of latent features (i.e., the number of communities/clusters), the LFRM posits an Indian Buffet Process (IBP) prior [Griffiths and Ghahramani, 2011] on the binary matrix $Z$. This nonparametric prior can be explained through a culinary metaphor, where each customer samples dishes from an infinitely long buffet. For each customer $i$, an already sampled dish is chosen with a probability based on how many previous customers have sampled that dish. Thereafter, customer $i$ samples $\mathrm{Poisson}(\alpha/i)$ new dishes, where $\alpha$ is the IBP hyperparameter. The subset of dishes sampled by a customer represents that customer's binary latent feature vector. When considering all the customers, the process is equivalent to sampling a binary matrix whose number of columns is equal to the total number of unique dishes sampled. The $N$ entities sample a total of $K$ features and $Z$ is the resulting feature allocation matrix.
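As shown in [Griffiths and Ghahramani, 2011], the IBP prior on $Z$ can be written as follows
$$P(Z) = \frac{\alpha^{K}}{\prod_{h=1}^{H} K_h!} \exp\left(-\alpha H_N\right) \prod_{k=1}^{K} \frac{(N - m_k)!\,(m_k - 1)!}{N!} \tag{5}$$
where $H$ represents the total number of unique decimal values of the binary vectors formed by the columns of $Z$, $K_h$ is the number of columns sharing the $h$-th unique value, and $H_N = \sum_{n=1}^{N} 1/n$ is the $N$-th harmonic number. $m_k$ denotes the count of feature $k$ being one among the first $N$ entities, which means that entity $N+1$ samples feature $k$ with probability $m_k/(N+1)$.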
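The culinary metaphor above can be sketched as a sampler. This is a minimal illustration of the standard IBP construction (existing dish $k$ taken with probability $m_k/i$, then $\mathrm{Poisson}(\alpha/i)$ new dishes), not code from the paper. The function names and the simple Poisson sampler are assumptions made for this example.

```python
import math
import random

def sample_poisson(rate, rng):
    """Draw from Poisson(rate) with Knuth's method (adequate for small rates)."""
    threshold = math.exp(-rate)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count

def sample_ibp(num_customers, alpha, rng):
    """Sample a binary feature matrix Z from the IBP buffet process."""
    dish_counts = []   # dish_counts[k] = number of earlier customers with dish k
    rows = []          # rows[i] = dish indices chosen by customer i
    for i in range(1, num_customers + 1):
        chosen = []
        # Existing dishes: take dish k with probability m_k / i.
        for k, m_k in enumerate(dish_counts):
            if rng.random() < m_k / i:
                chosen.append(k)
        # New dishes: customer i tries Poisson(alpha / i) of them.
        for _ in range(sample_poisson(alpha / i, rng)):
            dish_counts.append(0)
            chosen.append(len(dish_counts) - 1)
        for k in chosen:
            dish_counts[k] += 1
        rows.append(chosen)
    K = len(dish_counts)
    return [[1 if k in row else 0 for k in range(K)] for row in rows]

rng = random.Random(0)
Z = sample_ibp(num_customers=20, alpha=2.0, rng=rng)
```

The number of columns of the returned matrix is random, which is what lets the prior express uncertainty over the number of communities.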
With the priors on $Z$ and $W$ specified, we summarize the LFRM generative model [Miller et al., 2009]
$$Z \sim \mathrm{IBP}(\alpha) \tag{6}$$
$$W_{kk'} \sim \mathcal{N}(0, \sigma_w^2) \tag{7}$$
$$A_{ij} \sim \mathrm{Bernoulli}\left(\sigma\left(z_i W z_j^\top\right)\right) \tag{8}$$
Exact inference in this model is intractable and MCMC based inference was proposed in [Miller et al., 2009]. Since MCMC can be slow to mix and converge, here we present a new inference algorithm for the LFRM, motivated by the idea of small-variance asymptotics [Kulis and Jordan, 2011; Broderick et al., 2013], which we describe next.
3 Small-Variance Asymptotics for LFRM
To develop the small-variance asymptotics (SVA) for the LFRM, we take the MAP objective (the log of the posterior $P(Z, W \mid A)$) for the model and take the small-variance limit of the objective to obtain an objective function which can be optimized w.r.t. $Z$ and $W$ to find point estimates of these unknowns. This construction is motivated by [Broderick et al., 2013], who applied SVA for inference in linear Gaussian models with an a priori unknown number of latent features. However, while linear Gaussian models are designed for vector-valued data, our focus here is on models for relational data, such as the LFRM. Moreover, while their model had a Gaussian likelihood with a natural variance term, for the LFRM the likelihood is Bernoulli. To apply SVA to a model with a Bernoulli likelihood, we leverage the equivalence of exponential families and Bregman divergences [Jiang et al., 2012] and represent the Bernoulli as a scaled Bernoulli, which enables us to apply the SVA idea to the LFRM.
4 Bregman Divergence and Scaled Bernoulli
In this section, we establish the functional form of the scaled likelihood (the LFRM likelihood is Bernoulli), which can then be used to obtain the small-variance asymptotics objective from the posterior for the LFRM. To this end, we first express the Bernoulli distribution in its canonical form, using a generalised distance by incorporating the bijective relationship between Bregman divergences and exponential families discussed in [Banerjee et al., 2005]. A likelihood $p(x \mid \theta)$ has the exponential family representation
$$p(x \mid \theta) = \exp\left(\langle x, \theta \rangle - \psi(\theta) - h(x)\right) \tag{9}$$
where, for a Bernoulli with success probability $p$, $\theta = \log\frac{p}{1-p}$, $\psi(\theta) = \log(1 + e^{\theta})$, and $h(x) = 0$, with $\theta$ denoting the natural parameter, $\psi$ the log-partition function, and $x$ the sufficient statistic associated with the distribution family. Using properties of the log-partition function, we have the mean $\mu = \nabla\psi(\theta)$ and variance $\sigma^2 = \nabla^2\psi(\theta)$.
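Similar to [Jiang et al., 2012], we now define a scaled version of the Bernoulli with natural parameter $\tilde{\theta} = \beta\theta$ and log-partition function $\tilde{\psi}(\tilde{\theta}) = \beta\,\psi(\tilde{\theta}/\beta)$, where $\beta > 0$. Using Lemma 3.1 of [Jiang et al., 2012], we can see that the mean $\tilde{\mu}$ and variance $\tilde{\sigma}^2$ of the scaled distribution are related to $\mu$ and $\sigma^2$ as
$$\tilde{\mu} = \mu \tag{10}$$
$$\tilde{\sigma}^2 = \sigma^2 / \beta \tag{11}$$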
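As discussed in [Banerjee et al., 2005], we can define a convex function $\phi$ that links the Bernoulli to its corresponding Bregman divergence. Let
$$\phi(x) = x \log x + (1 - x) \log(1 - x) \tag{13}$$
Then, the Bregman divergence between a point $x$ and the mean $\mu$ can be defined as:
$$D_\phi(x, \mu) = \phi(x) - \phi(\mu) - (x - \mu)\,\phi'(\mu) \tag{14}$$
$$= x \log\frac{x}{\mu} + (1 - x) \log\frac{1 - x}{1 - \mu} \tag{15}$$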
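As a quick numerical check of this link, the sketch below takes $\phi(x) = x\log x + (1-x)\log(1-x)$ (the standard convex function paired with the Bernoulli, assumed here) and verifies that for binary $x$ the Bregman divergence $D_\phi(x, \mu)$ equals the Bernoulli negative log-likelihood. The function names are illustrative.

```python
import math

def phi(x):
    """Negative Bernoulli entropy, with the convention 0 * log 0 = 0."""
    def xlogx(t):
        return 0.0 if t == 0.0 else t * math.log(t)
    return xlogx(x) + xlogx(1.0 - x)

def phi_grad(mu):
    """Derivative of phi at mu in (0, 1), i.e. the logit of mu."""
    return math.log(mu / (1.0 - mu))

def bregman(x, mu):
    """D_phi(x, mu) = phi(x) - phi(mu) - (x - mu) * phi'(mu)."""
    return phi(x) - phi(mu) - (x - mu) * phi_grad(mu)

def bernoulli_neg_log_lik(x, mu):
    """-log p(x | mu) for binary x."""
    return -math.log(mu) if x == 1 else -math.log(1.0 - mu)

# For binary observations the two quantities coincide.
gaps = [abs(bregman(float(x), mu) - bernoulli_neg_log_lik(x, mu))
        for x in (0, 1) for mu in (0.1, 0.5, 0.9)]
```

This is why scaling the divergence by $\beta$ in the likelihood amounts to scaling the cross-entropy loss on each observed edge.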
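Using the Bregman divergence defined above, the Bernoulli distribution can be expressed as
$$p(x \mid \theta) = \exp\left(-D_\phi(x, \mu)\right) f_\phi(x),$$
where $f_\phi(x) = \exp\left(\phi(x) - h(x)\right)$.
Now, we obtain the scaled version of the above likelihood by replacing $\theta$ by $\tilde{\theta}$, which in turn is $\beta\theta$. Denoting $\tilde{\phi} = \beta\phi$, the Bregman divergence representation of the scaled Bernoulli evaluates to
$$p(x \mid \tilde{\theta}) = \exp\left(-D_{\tilde{\phi}}(x, \mu)\right) f_{\tilde{\phi}}(x) \tag{16}$$
$$= \exp\left(-\beta D_\phi(x, \mu)\right) f_{\beta\phi}(x) \tag{17}$$
$$= \exp\left(-\beta\left[x \log\frac{x}{\mu} + (1 - x)\log\frac{1 - x}{1 - \mu}\right]\right) f_{\beta\phi}(x) \tag{18}$$
where $\mu = \sigma(\theta)$ is the Bernoulli mean. With this representation of the scaled likelihood function established, we now discuss the MAP based asymptotics for the nonparametric model presented in the previous section.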
5 Applying SVA to LFRM
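Having re-expressed the Bernoulli as a scaled Bernoulli, we are now in a position to derive SVA for the LFRM. For the LFRM, the joint posterior for the model will be
$$P(Z, W \mid A) \propto P(A \mid Z, W)\, P(Z)\, P(W).$$
We will be working with a loss function version of the objective, which can be written as the negative of the log posterior
$$-\log P(Z, W \mid A) = -\log\left[P(A \mid Z, W)\, P(Z)\, P(W)\right] + \text{const} \tag{19}$$
$$= -\log P(A \mid Z, W) - \log P(Z) - \log P(W) + \text{const} \tag{20}$$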
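Now, using the scaled Bernoulli representation with $\mu_{ij} = \sigma\left(z_i W z_j^\top\right)$, we get
$$-\log P(A \mid Z, W) = -\sum_{i,j} \log p(A_{ij} \mid \tilde{\theta}_{ij}) \tag{21}$$
$$= \sum_{i,j} \left[\beta D_\phi(A_{ij}, \mu_{ij}) - \log f_{\beta\phi}(A_{ij})\right] \tag{22}$$
$$= \beta \sum_{i,j} \left[A_{ij} \log\frac{A_{ij}}{\mu_{ij}} + (1 - A_{ij}) \log\frac{1 - A_{ij}}{1 - \mu_{ij}}\right] - \sum_{i,j} \log f_{\beta\phi}(A_{ij}) \tag{23}$$
$$= -\beta \sum_{i,j} \left[A_{ij} \log \mu_{ij} + (1 - A_{ij}) \log(1 - \mu_{ij})\right] - \sum_{i,j} \log f_{\beta\phi}(A_{ij}) \tag{24}$$
where the last step uses the convention $0 \log 0 = 0$ for binary $A_{ij}$. This expression can be simplified to get
$$-\log P(A \mid Z, W) = \beta \sum_{i,j} \left[\log\left(1 + \exp\left(z_i W z_j^\top\right)\right) - A_{ij}\, z_i W z_j^\top\right] + \text{const},$$
where the constant collects the $f_{\beta\phi}$ terms, which do not depend on $Z$ or $W$.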
For the IBP prior term for $Z$ (Eq. 5) we choose $\alpha = \exp(-\beta\lambda^2)$. The choice of this functional form is in line with the influence of $\alpha$ on the size of the binary latent representation. Lower values of $\alpha$ promote a smaller representation, which is also the case with this form in the limit of $\beta \to \infty$. This helps us avoid overfitting the data with a trivially large latent feature representation. Here $\lambda$ is a hyperparameter, optimised by cross-validation. Substituting this expression for $\alpha$ and simplifying, we get
$$-\log P(Z) = \beta\lambda^2 K + \sum_{h=1}^{H} \log K_h! + \exp(-\beta\lambda^2)\, H_N - \sum_{k=1}^{K} \log\frac{(N - m_k)!\,(m_k - 1)!}{N!} \tag{25}$$
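Similarly, the negative log of the prior for $W$ is
$$-\log P(W) = \sum_{k,k'} \frac{W_{kk'}^2}{2\sigma_w^2} + \frac{K^2}{2} \log\left(2\pi\sigma_w^2\right) \tag{26}$$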
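It is important to note here that the entire expression for $-\log P(W)$ is constant with respect to $\beta$. Therefore, the negative log posterior can be written as
$$-\log P(Z, W \mid A) = \beta \sum_{i,j} \left[\log\left(1 + \exp\left(z_i W z_j^\top\right)\right) - A_{ij}\, z_i W z_j^\top\right] + \beta\lambda^2 K + R_\beta(Z, W),$$
where $R_\beta(Z, W) = \sum_{h} \log K_h! + \exp(-\beta\lambda^2) H_N - \sum_{k} \log\frac{(N - m_k)!\,(m_k - 1)!}{N!} - \log P(W) + \text{const}$ collects the remaining terms.
Dividing this equation by $\beta$ gives us
$$-\frac{1}{\beta}\log P(Z, W \mid A) = \sum_{i,j} \left[\log\left(1 + \exp\left(z_i W z_j^\top\right)\right) - A_{ij}\, z_i W z_j^\top\right] + \lambda^2 K + \frac{1}{\beta} R_\beta(Z, W).$$
Now, as $\beta \to \infty$, $-\frac{1}{\beta}\log P(W) \to 0$ and the other terms of $\frac{1}{\beta} R_\beta(Z, W)$ also vanish. Thus we define the objective function, $\mathcal{Q}(Z, W)$, which is to be minimized w.r.t. $Z$ and $W$, as
$$\mathcal{Q}(Z, W) = \sum_{i,j} D_\phi(A_{ij}, \mu_{ij}) + \lambda^2 K \tag{27}$$
$$= -\sum_{i,j} \left[A_{ij} \log \mu_{ij} + (1 - A_{ij}) \log(1 - \mu_{ij})\right] + \lambda^2 K \tag{28}$$
$$= \sum_{i,j} \left[\log\left(1 + \exp\left(z_i W z_j^\top\right)\right) - A_{ij}\, z_i W z_j^\top\right] + \lambda^2 K \tag{29}$$
where $\mu_{ij} = \sigma\left(z_i W z_j^\top\right)$ and $K$ is the number of columns of $Z$.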
Eq. 29 represents the MAP based equivalent objective for the nonparametric Bayesian LFRM [Miller et al., 2009]. Note that the objective consists of a sum of two components: the first measures the fit to the data and the second penalizes the number of latent features. The objective in Eq. 29 can be optimized w.r.t. $Z$ and $W$ using a variety of methods (both off-the-shelf as well as specialized optimizers). Also note that the objective is convex w.r.t. each of $Z$ and $W$ individually (but not jointly). In Sec. 6, we present a greedy algorithm to minimize this objective which alternates between optimizing $Z$ and $W$, and is guaranteed to reach a local minimum of the objective.
We would also like to note that the above formulation has a striking similarity to the logistic regression loss function, where by using the trace trick, $z_i W z_j^\top = \mathrm{tr}\left(W z_j^\top z_i\right)$. Here we can take $z_j^\top z_i$ to be the feature for each term and $W$ to be the model parameters. The trace term can in turn be expressed as a dot product of the flattened matrices, so that the optimization of $W$, for fixed $Z$, can exploit gradient based methods. Another important component of the objective is the penalty on the length $K$ of the latent representation. This has the benefit of preventing convergence to a trivial solution with an excessively large number of latent features. An interesting aspect of the above objective is that it would stay valid for a wider variety of models with other link functions, where the Bernoulli probabilities are not necessarily defined by a sigmoid [Mørup et al., 2011].
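To make the two components of the objective concrete, here is a minimal sketch. It assumes the limiting objective has the form of a cross-entropy (logistic) fit term over the observed pairs plus a penalty $\lambda^2 K$ on the number of latent features. The penalty form, the function names, and the toy values are assumptions of this example.

```python
import math

def softplus(t):
    """log(1 + exp(t)), computed without overflow."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))

def sva_objective(A, Z, W, lam, observed):
    """Cross-entropy fit over observed pairs plus lam^2 * K feature penalty."""
    K = len(W)
    fit = 0.0
    for i, j in observed:
        s = sum(Z[i][k] * W[k][l] * Z[j][l] for k in range(K) for l in range(K))
        fit += softplus(s) - A[i][j] * s   # logistic loss of score s, label A_ij
    return fit + lam ** 2 * K

# Toy example (illustrative values only).
Z = [[1, 0], [1, 1], [0, 1]]
W = [[2.0, -1.0], [-1.0, 2.0]]
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
observed = [(0, 1), (0, 2), (1, 2)]
q = sva_objective(A, Z, W, lam=1.0, observed=observed)
```

Because the fit term is a logistic loss that is linear in the entries of `W`, it is convex in `W` for fixed `Z`, while the penalty only changes when a column is added to or removed from `Z`.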
Table 1: Link-prediction AUC scores on the benchmark datasets (a dash marks a result that is not reported).

Method                            LazAdv   LazWork  LazFri   Protein230  NIPS234
MMSB [Airoldi et al., 2008]       0.813    0.844    0.846    -           0.871
HGP-EPM [Zhou, 2015]              -        -        -        0.952       0.947
IRM [Kemp et al., 2006]           0.796    0.826    0.821    0.934       0.948
LFRM-MCMC [Miller et al., 2009]   0.815    0.741    0.806    0.892       0.951
LFRM-SVA (Ours)                   0.864    0.833    0.829    0.958       0.966
6 Optimization
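With the objective function in place, we now discuss the possible ways of obtaining the optimal set of parameters $Z$ and $W$. The overall problem, under the small-variance asymptotic assumption, reduces to solving the following optimization problem, in which $\mathcal{Q}$ is the objective of Eq. 29:
$$(\hat{Z}, \hat{W}) = \operatorname*{arg\,min}_{Z, W} \mathcal{Q}(Z, W) \tag{31}$$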
6.1 Algorithm
A simple starting point for optimizing the above would be to use a greedy strategy and optimize alternately with respect to $Z$ and $W$, similar in spirit to [Xu et al., 2015]. This would involve optimizing each $z_i$ over all $2^K$ possible configurations, for fixed $W$. We present a more greedy strategy, along the lines of the MAD-Bayes algorithm presented in [Broderick et al., 2013], that first optimizes Eq. 29 for each element of $Z$ and then with respect to $W$. The complete algorithm, K-LAFTER (Latent Feature learning on Relational data), is given in Algorithm 1.
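The element-wise step over the binary matrix can be sketched as follows. This is a simplified illustration, not the authors' full algorithm: it only performs one greedy pass over the entries of `Z` for a fixed `W`, keeping a flip when it lowers an objective assumed to be cross-entropy plus `lam**2 * K`. The function names and toy data are hypothetical.

```python
import math

def objective(A, Z, W, lam, observed):
    """Assumed SVA objective: logistic loss on observed pairs + lam^2 * K."""
    K = len(W)
    total = lam ** 2 * K
    for i, j in observed:
        s = sum(Z[i][k] * W[k][l] * Z[j][l] for k in range(K) for l in range(K))
        total += max(s, 0.0) + math.log1p(math.exp(-abs(s))) - A[i][j] * s
    return total

def greedy_z_sweep(A, Z, W, lam, observed):
    """One pass over Z (in place); a flip is kept only if the objective drops."""
    best = objective(A, Z, W, lam, observed)
    for i in range(len(Z)):
        for k in range(len(W)):
            Z[i][k] = 1 - Z[i][k]              # tentatively flip the entry
            trial = objective(A, Z, W, lam, observed)
            if trial < best:
                best = trial                   # keep the improving flip
            else:
                Z[i][k] = 1 - Z[i][k]          # revert
    return best

A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
W = [[2.0, -1.0], [-1.0, 2.0]]
observed = [(0, 1), (0, 2), (1, 2)]
Z = [[1, 1], [1, 0], [1, 1]]
before = objective(A, Z, W, 0.5, observed)
after = greedy_z_sweep(A, Z, W, 0.5, observed)
```

Each accepted flip strictly lowers the objective, so repeated sweeps cannot cycle. Recomputing the full objective per flip is wasteful and is done here only for clarity.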
The above algorithm can be sped up further by caching values of the objective function, treating each change of $Z_{ik}$ from 0 to 1 (1 to 0) as the addition (subtraction) of a rank-1 elementary matrix whose $(i, k)$ entry is 1 and whose other entries are 0.
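One way to realize this caching, sketched under the assumption that the cached quantity is the score matrix $S = Z W Z^\top$: flipping a single entry $Z_{ik}$ only changes row $i$ and column $i$ of $S$, so the update costs $O(NK)$ instead of a full recomputation. The helper names are illustrative.

```python
def full_scores(Z, W):
    """Reference computation of S = Z W Z^T."""
    N, K = len(Z), len(W)
    return [[sum(Z[i][k] * W[k][l] * Z[j][l] for k in range(K) for l in range(K))
             for j in range(N)] for i in range(N)]

def flip_and_update(Z, W, S, i, k):
    """Flip Z[i][k] in place and patch the cached S in O(N K)."""
    N, K = len(Z), len(W)
    delta = 1 - 2 * Z[i][k]   # +1 for a 0 -> 1 flip, -1 for a 1 -> 0 flip
    # e_k W z_j^T and z_j W e_k^T for every node j, using the old Z.
    row_k = [sum(W[k][l] * Z[j][l] for l in range(K)) for j in range(N)]
    col_k = [sum(Z[j][l] * W[l][k] for l in range(K)) for j in range(N)]
    for j in range(N):
        if j != i:
            S[i][j] += delta * row_k[j]
            S[j][i] += delta * col_k[j]
    # Diagonal entry: two cross terms plus delta^2 * W[k][k] (delta^2 = 1).
    S[i][i] += delta * (row_k[i] + col_k[i]) + W[k][k]
    Z[i][k] += delta

Z = [[1, 0], [1, 1], [0, 1]]
W = [[0.5, -1.0], [2.0, 0.3]]   # deliberately asymmetric
S = full_scores(Z, W)
flip_and_update(Z, W, S, 0, 1)  # 0 -> 1
flip_and_update(Z, W, S, 1, 0)  # 1 -> 0
```

After both flips, `S` agrees with `full_scores(Z, W)` recomputed from scratch, which is the property the greedy search relies on.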
The optimization w.r.t. $W$ can be performed using first-order or second-order batch/stochastic/coordinate gradient descent based methods, or using derivative-free methods that only use the objective function's value. In our implementation, we chose the latter.
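For comparison with the derivative-free choice above, the first-order alternative can be sketched as plain batch gradient descent on the fit term, whose gradient with respect to $W_{kl}$ is $\sum_{i,j} (\sigma(z_i W z_j^\top) - A_{ij})\, z_{ik} z_{jl}$. The penalty on $K$ does not depend on $W$ and is omitted. The step size, iteration count, and names are assumptions of this example.

```python
import math

def sigmoid(t):
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def score(Z, W, i, j):
    K = len(W)
    return sum(Z[i][k] * W[k][l] * Z[j][l] for k in range(K) for l in range(K))

def fit_term(A, Z, W, observed):
    """Logistic loss summed over the observed pairs."""
    total = 0.0
    for i, j in observed:
        s = score(Z, W, i, j)
        total += max(s, 0.0) + math.log1p(math.exp(-abs(s))) - A[i][j] * s
    return total

def grad_W(A, Z, W, observed):
    """Gradient of the fit term with respect to every entry of W."""
    K = len(W)
    G = [[0.0] * K for _ in range(K)]
    for i, j in observed:
        r = sigmoid(score(Z, W, i, j)) - A[i][j]   # residual for pair (i, j)
        for k in range(K):
            for l in range(K):
                G[k][l] += r * Z[i][k] * Z[j][l]
    return G

def descend_W(A, Z, W, observed, step=0.1, num_steps=100):
    """Batch gradient descent on W for fixed Z; returns a new matrix."""
    K = len(W)
    for _ in range(num_steps):
        G = grad_W(A, Z, W, observed)
        W = [[W[k][l] - step * G[k][l] for l in range(K)] for k in range(K)]
    return W

Z = [[1, 0], [1, 1], [0, 1]]
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
observed = [(0, 1), (0, 2), (1, 2)]
W0 = [[0.0, 0.0], [0.0, 0.0]]
W1 = descend_W(A, Z, W0, observed)
```

Since the fit term is convex in `W` for fixed `Z`, a sufficiently small step size decreases it monotonically on this toy problem.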
6.2 Proof of Local Convergence
The proposed K-LAFTER algorithm converges to a local minimum in a finite number of iterations. We present a sketch of the proof. The first step, finding the optimal $Z$ for a fixed $W$, never increases the objective because of its greedy nature. This is followed by the step of minimizing over $W$ for fixed $Z$. As discussed in Sec. 5, the objective is convex in $W$ for a fixed $Z$. Thus this step, realized by any first-order gradient-descent-style module, will lower the objective value. Next, while adding another dimension to the latent representation, the choice is made greedily, choosing the option that has the lower objective value, thus moving closer to the local minimum. Since the objective is bounded below (a cross-entropy term plus a nonnegative penalty) and no step increases it, the sequence of objective values converges.
7 Related Work
Small-variance asymptotics (SVA) has been leveraged recently to develop non-probabilistic counterparts for several nonparametric Bayesian latent variable models, and has resulted in fast deterministic inference algorithms for such models. Some of the notable examples include Dirichlet Process and hierarchical Dirichlet Process mixture models for clustering [Kulis and Jordan, 2011], Indian Buffet Process based latent feature allocation for vector-valued data with a linear Gaussian observation model [Broderick et al., 2013], the infinite Hidden Markov Model [Roychowdhury et al., 2013], and latent Dirichlet Allocation [Jiang et al., 2017]. While these models are designed for i.i.d./sequential data, to the best of our knowledge, the SVA idea has not been applied to models for relational data, such as the latent feature relational model (LFRM), which is inherently a non-conjugate model, and for which the only known inference method is based on MCMC sampling [Miller et al., 2009].
Although not for the LFRM, faster alternatives to standard MCMC based inference have been developed for some other stochastic blockmodels, such as the infinite relational model [Kemp et al., 2006], which assumes a one-hot vector embedding for each node, and the mixed-membership blockmodel [Airoldi et al., 2008], which assumes a fractional membership of each node in multiple communities. These inference methods include methods based on online MCMC [Li et al., 2016] or online variational inference [Gopalan et al., 2012]. Applying these methods to the LFRM is not straightforward. Online MCMC methods require carefully designed, model-specific derivations, which are further complicated by the discrete nature of the node embeddings. On the other hand, applying online variational inference to a model like the LFRM is problematic due to its non-conjugacy [Zhu et al., 2016]. Our SVA based inference algorithm does not suffer from any of these issues. The final objective function has a simple form as a sum of a cross-entropy term and a regularizer that can be seen as penalizing a large number of communities. The objective function can be optimized using a variety of methods, both batch and online. Moreover, although we assume the network data to be given in the form of a binary matrix (presence/absence of an edge), other types of data can also be modeled (e.g., count-valued edges) by choosing an appropriate exponential family distribution for the likelihood.
8 Experiments
We now present experimental results of our SVA based inference algorithm for the LFRM on various benchmark datasets. We compare our algorithm with MCMC based inference for the LFRM, as well as with other state-of-the-art stochastic blockmodels, on link-prediction accuracy. In addition, we also compare with MCMC in terms of link-prediction accuracy vs wall-clock time, to show that our algorithm attains much better link-prediction accuracies while taking a significantly shorter amount of time than an MCMC sampler.
For our link-prediction experiments, we train all the models on a randomly chosen subset of the entries of the adjacency matrix and use the remaining entries to test the trained model. We consider five random training-testing partitions for all datasets and report the average Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC). Our model has only one free hyperparameter, $\lambda$, which we tune using cross-validation on the training data. We would like to note that the performance of our algorithm is fairly insensitive to the exact choice of $\lambda$.
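For reference, the AUC of the ROC on held-out entries can be computed as the probability that a randomly chosen true link receives a higher predicted score than a randomly chosen non-link, with ties counted as one half. The sketch below uses this pairwise definition. The function name and the example scores are illustrative and unrelated to the reported results.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one link and one non-link")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Held-out entries of A (labels) and predicted link probabilities (scores).
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.1]
auc = roc_auc(labels, scores)
```

The pairwise form is quadratic in the number of test entries. A rank-based computation would be preferable for larger held-out sets.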
We initialize our K-LAFTER algorithm (which we will refer to as LFRM-SVA in the rest of this section) with an initial number of latent features $K$. Initializing with a larger $K$ leads to slightly faster convergence. On all the datasets, our SVA based algorithm converged within 100 iterations with the default initialization, and in as few as 10 iterations when initialized with a larger $K$. The MCMC sampling based LFRM (referred to as LFRM-MCMC) was run for 1000 iterations, with 500 burn-in and 500 collection iterations. We observed that the AUC scores of the MCMC based LFRM were fairly stable after this many iterations.
We report experimental results on the following benchmark datasets, also used in prior work on the LFRM [Miller et al., 2009] and other stochastic blockmodels [Zhou, 2015].

Lazega-Lawyers [Lazega, 2001]: This dataset consists of three small-scale networks and is based on a corporate law partnership. The entities in these networks are lawyers, and the relation predicates include symmetric relations such as work based association and friendship association, and the asymmetric relation of advisory association.

Protein230 Network [Butland et al., 2005]: This dataset consists of the interactions between 230 different proteins, given in the form of an adjacency matrix. The dataset has 595 edges.

NIPS234 Coauthor Network [Miller et al., 2009]: The NIPS234 network consists of 234 nodes, with the relation describing the co-authorship among the top 234 authors, by number of publications, in NIPS 1-17.
We would like to note that we have chosen only moderate-sized datasets in our experiments so that it is feasible to run the MCMC sampler for the LFRM for a sufficiently large number of iterations and do a fair comparison with our SVA based approach. The MCMC sampler does not scale easily to datasets with even a couple of thousand nodes, while our SVA based algorithm does not face this issue.
Our experimental results on the link-prediction task for all the datasets are shown in Table 1. LFRM-SVA attains better link-prediction accuracies than LFRM-MCMC on all datasets, and is better than or competitive with other state-of-the-art stochastic blockmodels such as IRM, MMSB, and HGP-EPM. This can be attributed to the ability of our algorithm to search for a good solution (even though it is a point estimate) fairly quickly. In contrast, the MCMC based inference algorithm can take a long time to converge to a good solution.
The convexity of the objective function in $W$ for fixed $Z$ (steps 5 and 8 in Algorithm 1), along with caching techniques for the greedy search of the optimal $Z$ while fixing $W$ (steps 4 and 9 in Algorithm 1), allows our proposed algorithm to scale to larger datasets and converge faster to higher AUC scores. This is also evident from Fig. 1, where we compare the AUC vs wall-clock time for LFRM-MCMC and LFRM-SVA on the Protein230 dataset. For this experiment, we fixed the initial $K$ and allowed both algorithms to run until the AUC score converged. A similar experiment on the NIPS234 dataset yielded similar results, but is omitted due to lack of space. The improvement in convergence speed can also be attributed to the fact that LFRM-MCMC uses a sampling based approach, where there is a fixed number of burn-in samples, followed by sampling from the approximated posterior. Here, the sampling subroutine usually becomes the bottleneck. The objective function and the proposed algorithm are intended to put forward a scalable $K$-means-style optimization approach and to encourage small-variance asymptotics formulations of other Bayesian nonparametric models. While the datasets evaluated here have binary links, we can easily extend the model to other datasets by an appropriate choice of the likelihood function and a corresponding formulation of the objective.
The latent feature representation of each entity learned by our model can also be used to perform a qualitative analysis, where each column of $Z$ represents a latent community present in the network. An entity $i$ is a member of community $k$ if $Z_{ik} = 1$ and not a part of it if $Z_{ik} = 0$. For the NIPS234 dataset, we choose communities with a smaller number of members, as they tend to represent dense connections between the authors. We manually interpret the community name based on the work of the authors during the period from which the data was collected. Some of the communities are presented in Table 2. It is interesting to note that some authors, such as Thrun S and Bishop C, are inferred as belonging to multiple communities, as the model allows overlapping communities.
Table 2: Some of the communities inferred on the NIPS234 dataset.

Community                     Authors
Speech Processing             SchmidBaur O, McNair A, Sloboda T, Woszczyna M, Doucet A, Hanson S
Control and Robotics          Barto A, Sutton, Thrun S, Donoghue J, Burghard W
Computational Neuroscience    Stork D, Pawelzik K, Personnaz L, Dreyfus G, Pearlmutter B, Bishop C
9 Conclusion
We have presented a new inference algorithm for the latent feature relational model (LFRM) by applying the idea of small-variance asymptotics (SVA) to the LFRM. Our algorithm is simple to implement, faster than MCMC based inference for the LFRM, and obtains comparable or better link-prediction accuracies on several benchmark datasets. Applying SVA to the LFRM results in an objective function that still retains the nonparametric Bayesian flavor of the LFRM (e.g., the ability to learn the number of communities), while opening the door to choosing from a wide variety of optimization methods for learning the model parameters. Although we considered a greedy algorithm to optimize w.r.t. the binary latent feature matrix, recent advances in combinatorial optimization can also be leveraged to design other optimization algorithms for the objective. Other possible improvements include extending the optimization to work in an online setting or in a distributed setting, both of which are feasible within our SVA based framework. Finally, while our SVA based algorithm is a viable alternative to MCMC methods for inference in the LFRM, the fast point estimates produced by our method can also serve as good initializers for MCMC based inference, since such samplers rely critically on a good initialization for fast convergence.
Acknowledgements
Gundeep Arora acknowledges the Research-I Foundation, IIT Kanpur. Piyush Rai acknowledges support from an IBM Faculty Award, a DST-SERB Early Career Research Award, and the Dr. Deep Singh and Daljeet Kaur Faculty Fellowship.
References
 Edoardo M Airoldi, David M Blei, Stephen E Fienberg, and Eric P Xing. Mixed membership stochastic blockmodels. Journal of Machine Learning Research, 9(Sep):1981–2014, 2008.
 Arindam Banerjee, Srujana Merugu, Inderjit S Dhillon, and Joydeep Ghosh. Clustering with Bregman divergences. Journal of Machine Learning Research, 6(Oct):1705–1749, 2005.
 Tamara Broderick, Brian Kulis, and Michael I Jordan. MAD-Bayes: MAP-based asymptotic derivations from Bayes. In ICML (3), pages 226–234, 2013.
 Gareth Butland, José Manuel Peregrín-Alvarez, Joyce Li, Wehong Yang, Xiaochun Yang, Veronica Canadien, Andrei Starostine, Dawn Richards, Bryan Beattie, Nevan Krogan, et al. Interaction network containing conserved and essential protein complexes in Escherichia coli. Nature, 433(7025):531, 2005.
 Prem K Gopalan, Sean Gerrish, Michael Freedman, David M Blei, and David M Mimno. Scalable inference of overlapping communities. In Advances in Neural Information Processing Systems, pages 2249–2257, 2012.
 Thomas L Griffiths and Zoubin Ghahramani. The Indian buffet process: An introduction and review. Journal of Machine Learning Research, 12(Apr):1185–1224, 2011.
 Ke Jiang, Brian Kulis, and Michael I Jordan. Small-variance asymptotics for exponential family Dirichlet process mixture models. In Advances in Neural Information Processing Systems, pages 3158–3166, 2012.
 Ke Jiang, Suvrit Sra, and Brian Kulis. Combinatorial topic models using small-variance asymptotics. In AISTATS, 2017.
 Charles Kemp, Joshua B Tenenbaum, Thomas L Griffiths, Takeshi Yamada, and Naonori Ueda. Learning systems of concepts with an infinite relational model. In AAAI, volume 3, page 5, 2006.
 Brian Kulis and Michael I Jordan. Revisiting k-means: New algorithms via Bayesian nonparametrics. arXiv preprint arXiv:1111.0352, 2011.
 Emmanuel Lazega. The collegial phenomenon: The social mechanisms of cooperation among peers in a corporate law partnership. Oxford University Press on Demand, 2001.
 Wenzhe Li, Sungjin Ahn, and Max Welling. Scalable MCMC for mixed membership stochastic blockmodels. In Artificial Intelligence and Statistics, pages 723–731, 2016.
 Kurt Miller, Michael I Jordan, and Thomas L Griffiths. Nonparametric latent feature models for link prediction. In Advances in Neural Information Processing Systems, pages 1276–1284, 2009.
 Morten Mørup, Mikkel N Schmidt, and Lars Kai Hansen. Infinite multiple membership relational modeling for complex networks. In Machine Learning for Signal Processing (MLSP), 2011 IEEE International Workshop on, pages 1–6. IEEE, 2011.
 Krzysztof Nowicki and Tom A B Snijders. Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 96(455):1077–1087, 2001.
 Anirban Roychowdhury, Ke Jiang, and Brian Kulis. Small-variance asymptotics for hidden Markov models. In Advances in Neural Information Processing Systems, pages 2103–2111, 2013.
 Yining Wang and Jun Zhu. DP-space: Bayesian nonparametric subspace clustering with small-variance asymptotics. In International Conference on Machine Learning, pages 862–870, 2015.
 Jierui Xie, Stephen Kelley, and Boleslaw K Szymanski. Overlapping community detection in networks: The state-of-the-art and comparative study. ACM Computing Surveys (CSUR), 45(4):43, 2013.
 Yanxun Xu, Peter Müller, Yuan Yuan, Kamalakar Gulukota, and Yuan Ji. MAD Bayes for tumor heterogeneity: feature allocation with exponential family sampling. Journal of the American Statistical Association, 110(510):503–514, 2015.
 Mingyuan Zhou. Infinite edge partition models for overlapping community detection and link prediction. In AISTATS, 2015.
 Jun Zhu, Jiaming Song, and Bei Chen. Max-margin nonparametric latent feature models for link prediction. arXiv preprint arXiv:1602.07428, 2016.