On approximate equivalence of modularity, D and non-negative matrix factorization


Zhenhai Chang School of Statistics and Mathematics, Central University of Finance and Economics    Hui-Min Cheng School of Statistics and Mathematics, Central University of Finance and Economics    Chao Yan School of Statistics and Mathematics, Central University of Finance and Economics    Xianjun Yin School of Statistics and Mathematics, Central University of Finance and Economics    Zhong-Yuan Zhang zhyuanzh@gmail.com School of Statistics and Mathematics, Central University of Finance and Economics
July 20, 2019
Abstract

Community structure detection is one of the fundamental problems in complex network analysis, toward understanding the topological structure of a network and its functions. Nonnegative matrix factorization (NMF) is a widely used method for community detection, and modularity Q and modularity density D are criteria to evaluate the quality of community structures. In this paper, we establish the connections between Q, D and NMF for the first time. Q maximization can be approximately reformulated under the framework of NMF with Frobenius norm, especially when n, the number of nodes, is large, and D maximization can also be reformulated under the framework of NMF. Q minimization can be reformulated under the framework of NMF with Kullback-Leibler divergence. We propose new methods for community structure detection based on the above findings, and the experimental results on synthetic networks demonstrate their effectiveness.


I Introduction

Many real-world systems can be expressed as complex networks, where nodes and edges represent elements of the system and the relations among them, respectively; examples include social networks Snijders (2011), biological networks Zhong et al. (2016) and information networks Albert et al. (1999). One of the important statistical properties of complex networks is community structure Girvan and Newman (2002). Although there is no precise definition, it is widely accepted that a community in a network is a set of nodes that are densely interconnected but loosely connected with the rest Girvan and Newman (2002); Fortunato (2010); Newman (2003). Community detection can help us better understand the topological structure of a network at the meso-level, as well as network functions such as epidemic spreading and information diffusion Guimera and Amaral (2005); Fortunato (2010).

To quantitatively describe community structures, modularity Newman and Girvan (2004) was proposed and has been extensively studied. The basic idea of modularity is very similar to hypothesis testing: given a network partition, it compares the fraction of links inside each module of the partition with the so-called null model, i.e., the expected fraction of links inside the corresponding module in a degree-preserving counterpart, and sums the differences over all modules of the partition Newman (2006); Lancichinetti and Fortunato (2011); Hofstad (2013). A higher modularity value indicates that the partition is reasonable and meaningful, or one can say it has statistical significance. Modularity can not only be optimized directly for community detection, but also be used to evaluate community structures detected by other methods. However, the modularity-optimization strategy has a resolution limit Fortunato and Barthélemy (2006); Good et al. (2010): modularity maximization usually tends to merge small communities into large ones. To overcome this problem, Li et al. (2008) proposed a function called modularity density (also denoted D) to evaluate the partition of a network into communities.

Besides modularity, there are also other cost functions that can be optimized over all possible network partitions for community detection, such as the objective of nonnegative matrix factorization (NMF) Lee and Seung (1999, 2001). In recent years, NMF has been successfully applied to a range of community detection tasks, including nonoverlapping, overlapping and bipartite communities Cao et al. (2013); Zhang et al. (2013); Zhang and Ahn (2015). Furthermore, by adjusting objective functions, many community detection methods can be reformulated under the framework of NMF. For example, maximum likelihood estimation of stochastic block models (SBM), spectral clustering and probabilistic latent semantic indexing can each be reformulated using the objective functions of NMF Zhang et al. (2016); Ding et al. (2005, 2008); Devarajan et al. (2015).

It has been proven that modularity maximization and likelihood maximization of stochastic block models are equivalent under appropriate conditions Newman (2016), and the likelihood maximization of the stochastic block model can be reformulated under the framework of nonnegative matrix factorization Zhang et al. (2016). These results naturally lead to the next interesting question: is modularity maximization equivalent to NMF?

Indeed, although both modularity optimization and NMF are widely used community detection methods, the relations between them have not been established. In this paper, the relations among modularity Q, modularity density D and NMF are discussed. Despite the model relations, the algorithms employed by Q, D and NMF are different. We propose new multiplicative update rules for NMF, and demonstrate their effectiveness on synthetic networks.

The implications of the work are threefold: 1. There is a general framework for Q and D optimization. 2. One should be cautious when using Q and D to evaluate communities detected by different methods, especially by NMF, since they are (approximately) equivalent. 3. The relations between Q, D, NMF and the stochastic block model shed light on designing more effective algorithms for community detection.

The rest of the paper is organized as follows. In Sect.II, the approximate equivalence of modularity maximization and minimizing the objective function of NMF with Frobenius norm is proven. In Sect.III, the equivalence of D maximization and minimizing the objective function of NMF with Frobenius norm is proven. In Sect.IV, the equivalence of modularity minimization and minimizing the objective function of NMF with Kullback-Leibler divergence is proven. In Sect.V, we empirically demonstrate the relations between Q and NMF, and between D and NMF. Furthermore, the algorithms employed by Q, D and NMF are compared. Finally, Sect.VI concludes.

II Approximate equivalence of modularity maximization and NMF with Frobenius norm

Modularity optimization is a widely used method in which the benefit function Q is defined to measure the quality of divisions of a network into communities. Many modularity optimization schemes have been proposed Newman (2004); Duch and Arenas (2005); Guimera et al. (2004). In this section, we prove that modularity maximization is approximately equivalent to minimizing the objective function of NMF with Frobenius norm.

Suppose that the network G = (V, E) can be divided into c communities V_1, V_2, \dots, V_c satisfying the following conditions:

V = V_1 \cup V_2 \cup \cdots \cup V_c, \qquad V_i \cap V_j = \emptyset \ \text{for} \ i \neq j,

where V is the set of nodes and E is the set of edges.

Modularity is defined as Newman and Girvan (2004)

Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(g_i, g_j),

where m is the total number of edges in the network, \delta is the Kronecker delta, g_i is the community to which node i belongs, 2m = \sum_i k_i, k_i is the degree of node i, and A is the adjacency matrix of G with entries A_{ij} = 1 if there is an edge between nodes i and j and A_{ij} = 0 otherwise.
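As a concrete illustration, Q can be computed directly from this definition; the sketch below (our own NumPy code, with a toy network we made up, not the authors' implementation) evaluates it for two triangles joined by a single edge:

```python
import numpy as np

def modularity(A, labels):
    """Modularity Q of a partition (Newman & Girvan, 2004).

    A      : symmetric 0/1 adjacency matrix, shape (n, n)
    labels : community label of each node, shape (n,)
    """
    k = A.sum(axis=1)                            # node degrees
    two_m = k.sum()                              # 2m = sum of degrees
    same = labels[:, None] == labels[None, :]    # delta(g_i, g_j)
    B = A - np.outer(k, k) / two_m               # modularity matrix
    return (B * same).sum() / two_m

# Two 3-cliques joined by one edge: a clear two-community split.
A = np.zeros((6, 6), int)
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
print(modularity(A, np.array([0, 0, 0, 1, 1, 1])))  # 5/14 ≈ 0.357
```

The sum runs over all ordered pairs including i = j, matching the definition above.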

Let S be the n \times c community indicator matrix with s_{ig} = 1 if node i belongs to community g and s_{ig} = 0 otherwise.

Then, one has

Q = \frac{1}{2m} \mathrm{Tr}(S^T B S),

where B = A - \frac{kk^T}{2m} is the modularity matrix and k = (k_1, k_2, \dots, k_n)^T. For a given network, let \tilde{Q} = \mathrm{Tr}(S^T B S), and one has White and Smyth (2005)

\max_S Q \Leftrightarrow \max_S \tilde{Q},

since the number of edges m is constant, i.e., Q and \tilde{Q} with respect to S attain their maxima at the same partition. Hence,

\max_S Q \Leftrightarrow \max_H \mathrm{Tr}(H^T B H), \qquad H^T H = I,

where H = S (S^T S)^{-1/2}, i.e., h_{ig} = s_{ig} / \sqrt{n_g}, and n_g is the number of nodes in community g.

Since the values of s_{ig} are discrete, and the number of possible divisions of a network is exponentially large, one normally turns to approximate optimization methods Newman (2016); an approximate solution can be obtained by the following two steps.

Firstly, by relaxing the constraints on H from (scaled) binary to non-negative, we can derive the solution for modularity optimization White and Smyth (2005). Consider the Lagrange multiplier method and write

\mathcal{L} = \mathrm{Tr}(H^T B H) - \mathrm{Tr}\left[ \Lambda (H^T H - I) \right],

where \Lambda is the diagonal matrix of Lagrange multipliers. Let \partial \mathcal{L} / \partial H = 0; one has

B H = H \Lambda, \qquad (4)

where B = A - \frac{kk^T}{2m}.

Furthermore, when n is sufficiently large, Eq.(4) can be approximated as White and Smyth (2005)

\left( A - \frac{\bar{k}^2}{2m} J \right) H = H \Lambda, \qquad (5)

where \bar{k} = 2m/n is the average degree, and J is the n \times n matrix whose entries are all one (kk^T \approx \bar{k}^2 J when degrees are nearly homogeneous). Obviously, as n \to \infty, n^2 grows much faster than 2m in sparse networks, so the coefficient \bar{k}^2 / (2m) = 2m / n^2 approaches zero White and Smyth (2005). As a result, the term \frac{\bar{k}^2}{2m} J can be neglected in determining the eigenspace of the matrix when n is sufficiently large White and Smyth (2005). Note that this constant matrix does not affect the resulting eigenspace, hence Eq.(5) can be represented as follows:

A H = H \Lambda. \qquad (6)

From Eqs.(4)-(6), for sufficiently large values of n, the solution of Eq.(4) is approximately equivalent to the solution of the equation A H = H \Lambda (i.e. Eq.(6)).
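The eigenvalue formulation above can be illustrated numerically. The sketch below (our own code, following Newman's leading-eigenvector idea rather than any implementation from this paper) bisects a toy network by the sign of the leading eigenvector of the modularity matrix B:

```python
import numpy as np

def spectral_bisection(A):
    """Relaxed modularity maximization: split nodes by the sign of the
    leading eigenvector of the modularity matrix B = A - k k^T / (2m)."""
    k = A.sum(axis=1)
    B = A - np.outer(k, k) / k.sum()
    w, V = np.linalg.eigh(B)
    v = V[:, np.argmax(w)]           # eigenvector of the largest eigenvalue
    return (v >= 0).astype(int)      # 0/1 community labels

# Two 3-cliques joined by one edge split cleanly into the two cliques.
A = np.zeros((6, 6), int)
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
labels = spectral_bisection(A)
```

The sign of the eigenvector plays the role of the relaxed indicator H restricted to two communities.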

Write F(H) = \|A - HH^T\|_F^2; letting \partial F / \partial H = 0, we can derive A H = H (H^T H) (i.e. Eq.(6), with \Lambda = H^T H). Thus, the solution of Eq.(6) is a solution of optimizing F. So, the solution of optimizing Q can be approximated by the solution of optimizing F, that is,

\max_S Q \approx \min_{H \ge 0} \|A - HH^T\|_F^2. \qquad (7)

Eq.(7) means that maximizing Q and minimizing the objective function of NMF with Frobenius norm are approximately equivalent, especially when n is large.

III Equivalence of D maximization and NMF with Frobenius norm

Although Q is widely used as a measure of the quality of detected community structures, it has several drawbacks Bagrow (2012); Zhang et al. (2009); Good et al. (2010). Specifically, modularity has the problem of resolution limit Fortunato and Barthélemy (2006); Good et al. (2010). One reason for this drawback is that the modularity function depends on the total number of edges, so that modularity optimization methods have difficulty finding small communities in large networks Fortunato and Barthélemy (2006). To overcome this issue, a measure called modularity density (D) has been proposed Li et al. (2008). D depends on the size of the communities instead of the total number of edges. In practice, D maximization can find small communities that modularity optimization cannot Li et al. (2008).

In this section, we prove that D maximization is equivalent to minimizing the objective function of NMF with Frobenius norm.

To simplify notations, we define L(V_i, V_j) to be the number of links between communities V_i and V_j:

L(V_i, V_j) = \sum_{u \in V_i} \sum_{v \in V_j} A_{uv}

(so that L(V_i, V_i) counts each link inside V_i twice).

The degree of a set V_i is the sum of degrees of nodes in community V_i:

d(V_i) = \sum_{u \in V_i} k_u,

where k_u is the degree of node u.

Then, we denote L(V_i, V_i) as the quantity that measures how many links stay within V_i, and L(V_i, \bar{V}_i), with \bar{V}_i = V \setminus V_i, as the quantity that measures how many links escape from V_i. One has

d(V_i) = L(V_i, V_i) + L(V_i, \bar{V}_i).

Modularity density D is defined as Li et al. (2008):

D = \sum_{i=1}^{c} \frac{L(V_i, V_i) - L(V_i, \bar{V}_i)}{n_i},

where n_i is the number of nodes in community V_i.
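A minimal sketch (our own code, not the authors') computing D from an adjacency matrix and a label vector; note that summing the within-community block of A gives the ordered-pair count L(V_i, V_i), i.e., twice the number of internal edges:

```python
import numpy as np

def modularity_density(A, labels):
    """Modularity density D (Li et al., 2008): for each community,
    (links staying inside minus links escaping) divided by its size."""
    D = 0.0
    for c in np.unique(labels):
        idx = labels == c
        l_in = A[np.ix_(idx, idx)].sum()     # L(V_i, V_i), internal links twice
        l_out = A[np.ix_(idx, ~idx)].sum()   # L(V_i, V_i-bar), escaping links
        D += (l_in - l_out) / idx.sum()
    return D

# Two 3-cliques joined by one edge: D = (6 - 1)/3 + (6 - 1)/3 = 10/3.
A = np.zeros((6, 6), int)
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
print(modularity_density(A, np.array([0, 0, 0, 1, 1, 1])))  # 10/3 ≈ 3.33
```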

Note that

\sum_{u, v \in V_i} A_{uv} = s_i^T A s_i

and

\sum_{u \in V_i} k_u = s_i^T K s_i,

where K = \mathrm{diag}(k_1, k_2, \dots, k_n), s_i is the i-th column of the indicator matrix S, and k_u is the degree of node u. Let h_i = s_i / \sqrt{n_i}, where n_i is the number of nodes in community V_i; one has

D = \sum_{i=1}^{c} h_i^T (2A - K) h_i = \mathrm{Tr}\left( H^T (2A - K) H \right).

Since H^T H = I, one has

\|X - HH^T\|_F^2 = \|X\|_F^2 - 2\, \mathrm{Tr}(H^T X H) + \mathrm{Tr}\left( (H^T H)^2 \right),

in which \|X\|_F^2 and \mathrm{Tr}((H^T H)^2) are constants, so minimizing the left-hand side is equivalent to maximizing \mathrm{Tr}(H^T X H). Let

X = 2A - K + \sigma I, \qquad (14)

where I is the identity matrix and \sigma is a sufficiently large number independent of the partition (e.g., \sigma \ge \max_u k_u) so that X is non-negative; adding \sigma I only shifts the objective by the constant \sigma c. Then, one has

\max_S D \Leftrightarrow \min_{H \ge 0} \|X - HH^T\|_F^2. \qquad (15)

Eq.(15) means that D maximization and minimization of the objective function of NMF with Frobenius norm are equivalent.

IV Equivalence of modularity minimization and NMF with KL divergence

Modularity maximization can be used for detecting assortative network structures; modularity minimization, on the other hand, usually reveals disassortative structures Newman (2016). In Sect.II, we have discussed the relationship between modularity maximization and NMF with Frobenius norm. In this section, we will show that, under certain conditions, modularity minimization is equivalent to minimizing the objective function of NMF with KL divergence.

For the modularity function Q = \frac{1}{2m} \sum_{ij} (A_{ij} - P_{ij})\, \delta(g_i, g_j), where P_{ij} is the expected number of edges between nodes i and j, if we suppose the degree k_i is a constant across nodes (e.g., on the famous GN networks), then P_{ij} is also a constant, say p, and one has

Q = \frac{1}{2m} \sum_{ij} (A_{ij} - p)\, \delta(g_i, g_j).

Note that \delta(g_i, g_j) = (S S^T)_{ij}, one has

where \beta is a positive real number, and the dot product of two matrices with the same dimensions is taken entry-wise. Noting that the remaining terms are constants with respect to the partition, one has

Eq.(18) means that minimizing Q is equivalent to minimizing the objective function of NMF with KL divergence.
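For reference, the generic multiplicative update rules of Lee and Seung for NMF with KL divergence can be sketched as follows. This is a self-contained illustration of the standard algorithm for a generic data matrix; the paper's reformulation would apply such updates to the matrix derived from Q rather than to an arbitrary V:

```python
import numpy as np

def nmf_kl(V, r, iters=200, eps=1e-10, seed=0):
    """Lee-Seung multiplicative updates minimizing the generalized KL
    divergence D(V || WH) = sum(V*log(V/WH) - V + WH)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + eps
    H = rng.random((r, m)) + eps
    for _ in range(iters):
        # W_ia <- W_ia * sum_mu H_amu V_imu/(WH)_imu / sum_nu H_anu
        W *= ((V / (W @ H + eps)) @ H.T) / H.sum(axis=1)
        # H_amu <- H_amu * sum_i W_ia V_imu/(WH)_imu / sum_k W_ka
        H *= (W.T @ (V / (W @ H + eps))) / W.sum(axis=0)[:, None]
    return W, H
```

Each update is guaranteed not to increase the divergence, so running more iterations from the same initialization never worsens the fit.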

Similarly, for the multi-resolution modularity Q_\gamma of Reichardt and Bornholdt (2006), which rescales the null model by a resolution parameter \gamma, and for the variant Q_r of Arenas et al. (2008), which adds a self-loop of strength r to every node, one has the analogous equivalences. So, Eq.(21) means that minimizing Q_\gamma with respect to S is equivalent to minimizing the corresponding NMF objective with KL divergence, and Eq.(22) means that minimizing Q_r with respect to S is likewise equivalent to the corresponding KL-divergence NMF objective.

V Experimental results

In this section, we evaluate the theoretical equivalences proved in Sect.II, Sect.III and Sect.IV using SBM networks (i.e., networks generated using the stochastic block model W and B (1983)) and LFR networks Lancichinetti and Fortunato (2009). Furthermore, although both Q and D are reformulated under the framework of NMF, their algorithms are different. To demonstrate the effectiveness of the algorithms, we compare them on GN and LFR networks.

V.1 Description of data sets

1. SBM network and its special case: GN network Girvan and Newman (2002)

In the stochastic block model (SBM), nodes are assigned to communities, and edges are placed randomly and independently between node pairs with probabilities that depend only on the group memberships of the nodes. The framework is flexible, so that many different kinds of networks can be produced.
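The generative process can be sketched as follows (a minimal sampler of our own; function and parameter names are ours, not the authors'):

```python
import numpy as np

def sbm(sizes, p_in, p_out, seed=0):
    """Sample an undirected stochastic block model: nodes in the same
    block are linked with probability p_in, nodes in different blocks
    with probability p_out."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(len(sizes)), sizes)
    # edge probability for every node pair, from the block memberships
    P = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    U = rng.random((P.shape[0], P.shape[0]))
    A = np.triu(U < P, k=1)              # sample the upper triangle only
    return (A | A.T).astype(int), labels  # symmetrize; diagonal stays zero

# one of the settings used below: blocks of 30 and 50 nodes
A, labels = sbm([30, 50], p_in=0.5, p_out=0.05, seed=1)
```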

In this paper, the between-community edge probability for networks generated using SBM is 0.05, and the within-community edge probabilities are 0.6, 0.55, 0.5, 0.45, 0.4, 0.35, 0.3, 0.25, 0.2 and 0.15, respectively. The sizes of the two communities are unequal: 400 and 600 (i.e., n = 1000) in Fig.1(a), 200 and 300 in Fig.1(b), and 30 and 50 in Fig.1(c).

The GN network is a special case of the SBM network, with 128 nodes in 4 communities of 32 nodes each. On average, each node has 16 neighbors, z_in of them in its own community and z_out in the rest, i.e., z_in + z_out = 16.

2. LFR network

The LFR network reproduces a characteristic of real networks: the distributions of degree and community size are power laws, with exponents \gamma and \beta, respectively. An LFR network is generated using the parameters n (the number of nodes), \mu (the mixing parameter), the average degree of nodes, the maximum degree of nodes, the minimum community size and the maximum community size. The strength of the community structure is controlled by the mixing parameter \mu (0 \le \mu \le 1), which is the fraction of links that connect a node to nodes in other communities.

In this paper, \gamma and \beta are set to 2 and 1, respectively, and the other parameters are given in the following experiments.

V.2 Equivalence test on synthetic networks

In this section, we validate the equivalence relations between Q and NMF, and between D and NMF.

Firstly, we test the approximate equivalence relation between Q and NMF introduced in Sect.II on SBM networks and LFR networks. The results are illustrated in Fig.1 (on SBM networks) and Fig.2 (on LFR networks), from which one can see that, when n is large, all of the points lie on a straight line. However, as n decreases, the points gradually deviate from a straight line. These phenomena are consistent with our conclusions in Sect.II: maximizing Q and minimizing the objective function of NMF with Frobenius norm are approximately equivalent when n is moderately large.

Figure 1: Approximate equivalence between maximizing Q and NMF with Frobenius norm (NMF_F) on SBM networks. Each point corresponds to an SBM network. All networks were generated by the stochastic block model with two communities; n is the number of nodes. The between-community edge probability is 0.05; the within-community edge probabilities (corresponding to the points from left to right) are 0.6, 0.55, 0.5, 0.45, 0.4, 0.35, 0.3, 0.25, 0.2 and 0.15, respectively. (a) The networks contain 1000 nodes; the sizes of the two communities are 400 and 600, respectively. (b) The networks contain 500 nodes; the sizes of the two communities are 200 and 300, respectively. (c) The networks contain 80 nodes; the sizes of the two communities are 30 and 50, respectively.
Figure 2: Approximate equivalence between maximizing Q and NMF with Frobenius norm (NMF_F) on LFR networks. Each point corresponds to an LFR network, and some of the parameters used to generate the networks are shown in the illustration. The nine points (from left to right) correspond to increasing values of the mixing parameter \mu, respectively.

Secondly, we test the equivalence relation between D and NMF in Sect.III on SBM networks and LFR networks. The results are illustrated in Fig.3 (on SBM networks) and Fig.4 (on LFR networks), from which one can observe that all of the points lie on a straight line, and different values of \sigma in Eq.(14) do not affect the results.

Figure 3: Equivalence relation between maximizing D and NMF with Frobenius norm (NMF_F) on SBM networks. Each point corresponds to an SBM network. Parameter settings are identical with those in Fig. 1.
Figure 4: Equivalence relation between maximizing D and NMF with Frobenius norm (NMF_F) on LFR networks. Parameter settings are identical with those in Fig. 2.

Finally, we test the equivalence relation between Q and NMF in Sect.IV. Since this equivalence requires a constraint on node degrees, i.e., all nodes have the same degree, we only test it on GN networks. The results are illustrated in Fig.5, from which one can see that: (1) for (a), all pairs of points lie on a straight line, which illustrates that the equivalence between Q & NMF in Sect.IV is reasonable; (2) for (b) and (c), all pairs of points lie on a straight line, and different values of the parameters \gamma and r do not affect the equivalence, which means that the equivalences of Q_\gamma & NMF and Q_r & NMF are reasonable, respectively. In brief, these illustrations are consistent with our conclusions in Sect.IV.

Figure 5: Equivalence relation between minimizing Q and NMF with KL divergence (NMF_KL) on GN networks. The 15 points (from left to right) correspond to different GN networks, respectively.

V.3 Comparison of algorithm effectiveness on GN networks and LFR networks

In this section, we design the algorithms for models (7) and (15), denoted by Q_NMF and D_NMF, respectively, which are summarized in Algorithm 1; the only difference between them is the objective function. In addition, Q and D can also be reformulated as the traces of matrices called modularity Laplacians with nonnegative relaxation for community detection Jiang and McQuay (2012), denoted by Q_NR and D_NR, respectively, and the function Q itself is often optimized directly, here by a fast greedy algorithm (Q_FG). We compare the effectiveness of these five algorithms on GN networks and LFR networks, and use normalized mutual information (NMI) Strehl and Ghosh (2003) to evaluate the quality of the results, i.e.,

\mathrm{NMI} = \frac{\sum_{i=1}^{c} \sum_{j=1}^{c} n_{ij} \log \frac{n\, n_{ij}}{n_i n_j}}{\sqrt{\left( \sum_{i=1}^{c} n_i \log \frac{n_i}{n} \right) \left( \sum_{j=1}^{c} n_j \log \frac{n_j}{n} \right)}},

where the X_i are the true communities in a network and the Y_j are the inferred ones; c is the number of communities contained in a network; n is the number of nodes; n_{ij} is the number of nodes in the ground-truth community X_i that are assigned to the computed community Y_j; n_i is the number of nodes in the ground-truth community X_i; n_j is the number of nodes in the computed community Y_j. A larger NMI means a better partition.
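A direct implementation of this NMI (our own sketch of the Strehl-Ghosh normalization by the geometric mean of the two entropies):

```python
import numpy as np
from math import log, sqrt

def nmi(true, pred):
    """Normalized mutual information I(X;Y) / sqrt(H(X) H(Y)) computed
    from the confusion counts of two labelings (Strehl & Ghosh, 2003)."""
    true, pred = np.asarray(true), np.asarray(pred)
    n = true.size
    I = 0.0
    for a in np.unique(true):
        for b in np.unique(pred):
            nij = np.sum((true == a) & (pred == b))
            if nij:
                ni, nj = np.sum(true == a), np.sum(pred == b)
                I += nij / n * log(n * nij / (ni * nj))
    # entropy of a labeling from its community sizes
    H = lambda x: -sum(c / n * log(c / n) for c in np.bincount(x) if c)
    return I / sqrt(H(true) * H(pred))

print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0 (labels agree up to renaming)
```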

0:  Input: A, iter
0:  Output: H
1:  Calculate X from A.
2:  Initialize H as a non-negative random matrix.
3:  for t = 1 to iter do
4:     Update H by the multiplicative update rule.
5:  end for
Algorithm 1 Minimizing \|X - HH^T\|_F^2 with respect to H
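Step 4 of Algorithm 1 can be sketched with a damped symmetric multiplicative update; the 1/2 damping below follows Ding et al.'s symmetric NMF rule and is our assumption about the update, not necessarily the authors' exact rule:

```python
import numpy as np

def symnmf(X, r, iters=500, eps=1e-10, seed=0):
    """Minimize ||X - H H^T||_F^2 over H >= 0 for a non-negative
    symmetric X (e.g., A for model (7), or 2A - K + sigma*I for (15)),
    using a damped multiplicative update."""
    rng = np.random.default_rng(seed)
    H = rng.random((X.shape[0], r))
    for _ in range(iters):
        # H <- H * (1/2 + (X H) / (2 H H^T H)); H stays non-negative
        H *= 0.5 + 0.5 * (X @ H) / (H @ (H.T @ H) + eps)
    return H

# Hard community assignment: node i joins the community argmax_g H[i, g].
```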
Figure 6: Comparison of the effectiveness of the five methods on GN (a) and LFR networks (b). Q_NMF and D_NMF stand for the multiplicative update rules for Q and D, respectively (i.e., Eq.(7) and Eq.(15)). Q_FG stands for the fast greedy algorithm for Q. Q_NR and D_NR stand for the nonnegative relaxation methods for Q and D, respectively. Each point is the average of 10 runs. For each run, Algorithm 1 uses 500 iterations.

Fig. 6 shows the average NMI obtained by the five algorithms on (a) GN networks and (b) LFR networks, from which one can see that the best performer among the five is Q_NMF, which is proposed in this paper.

VI Conclusions

In this paper, we establish the (approximate) equivalence relations between Q optimization and NMF, and between D optimization and NMF. The effectiveness of the algorithms employed by Q, D and NMF is compared, demonstrating that the multiplicative update rules proposed in this paper are more effective. There are several interesting problems for future work, including designing more effective algorithms for community detection, developing more reasonable criteria for evaluating community structures in networks, and introducing the ideas of modularity into general clustering analysis.

References

  • Snijders (2011) T. Snijders, Annual Review of Sociology 37, 301–337 (2011).
  • Zhong et al. (2016) Q. Zhong, S. J. Pevzner, T. Hao, et al., Molecular Systems Biology 12, 865 (2016).
  • Albert et al. (1999) R. Albert, H. Jeong, and A. Barabasi, Nature 401, 130 (1999).
  • Girvan and Newman (2002) M. Girvan and M. E. J. Newman, Proceedings of the National Academy of Sciences 99, 7821 (2002).
  • Fortunato (2010) S. Fortunato, Physics Reports 486, 75 (2010).
  • Newman (2003) M. E. J. Newman, SIAM Review 45, 167 (2003).
  • Guimera and Amaral (2005) R. Guimera and L. A. N. Amaral, Nature 433, 895 (2005).
  • Newman and Girvan (2004) M. E. Newman and M. Girvan, Physical Review E 69, 026113 (2004).
  • Newman (2006) M. E. J. Newman, Proceedings of the National Academy of Sciences 103, 8577–8582 (2006).
  • Lancichinetti and Fortunato (2011) A. Lancichinetti and S. Fortunato, Physical Review E 84, 066122 (2011).
  • Hofstad (2013) R. V. D. Hofstad, Random graphs and complex networks (2013).
  • Fortunato and Barthélemy (2006) S. Fortunato and M. Barthélemy, Proceedings of the National Academy of Sciences 104, 36 (2006).
  • Good et al. (2010) B. H. Good, Y. A. de Montjoye, and A. Clauset, Physical Review E 81, 046106 (2010).
  • Li et al. (2008) Z. Li, S. Zhang, R. S. Wang, et al., Physical Review E 77, 036109 (2008).
  • Lee and Seung (1999) D. D. Lee and H. S. Seung, Nature 401, 788 (1999).
  • Lee and Seung (2001) D. D. Lee and H. S. Seung, International Conference on Neural Information Processing Systems pp. 556–562 (2001).
  • Cao et al. (2013) X. Cao, X. Wang, D. Jin, Y. Cao, and D. He, Scientific Reports 3, 2993 (2013).
  • Zhang et al. (2013) Z. Y. Zhang, K. D. Sun, and S. Q. Wang, Scientific Reports 3, 3241 (2013).
  • Zhang and Ahn (2015) Z. Zhang and Y. Ahn, International Journal of Modern Physics C 26, 1550096 (2015).
  • Zhang et al. (2016) Z. Y. Zhang, Y. Gai, Y. F. Wang, et al., arXiv preprint arXiv:1604 (2016).
  • Ding et al. (2005) C. Ding, X. He, H. D. Simon, et al., SIAM International Conference on Data Mining (2005).
  • Ding et al. (2008) C. Ding, T. Li, and W. Peng, Computational Statistics and Data Analysis 52, 3913 (2008).
  • Devarajan et al. (2015) K. Devarajan, G. Wang, and N. Ebrahimi, Machine Learning 99, 137 (2015).
  • Newman (2016) M. E. J. Newman, Physical Review E 94, 052102 (2016).
  • Newman (2004) M. E. J. Newman, Physical Review E 69, 066133 (2004).
  • Duch and Arenas (2005) J. Duch and A. Arenas, Physical Review E 72, 027104 (2005).
  • Guimera et al. (2004) R. Guimera, M. Sales-Pardo, and L. A. N. Amaral, Physical Review E 70, 025101 (2004).
  • White and Smyth (2005) S. White and P. Smyth, SIAM International Conference on Data Mining pp. 274–285 (2005).
  • Bagrow (2012) J. P. Bagrow, Physical Review E 85, 066118 (2012).
  • Zhang et al. (2009) X. S. Zhang, R. S. Wang, Y. Wang, et al., EPL 87, 38002 (2009).
  • Reichardt and Bornholdt (2006) J. Reichardt and S. Bornholdt, Physical Review E 74, 016110 (2006).
  • Arenas et al. (2008) A. Arenas, A. Fernandez, and S. Gomez, New Journal of Physics 10, 053039 (2008).
  • W and B (1983) P. W. Holland, K. B. Laskey, and S. Leinhardt, Social Networks 5, 109 (1983).
  • Lancichinetti and Fortunato (2009) A. Lancichinetti and S. Fortunato, Physical Review E 80, 016118 (2009).
  • Jiang and McQuay (2012) J. Q. Jiang and L. J. McQuay, Physica A 391, 854 (2012).
  • Strehl and Ghosh (2003) A. Strehl and J. Ghosh, Journal of Machine Learning Research 3, 583 (2003).