# Enhance the Efficiency of Heuristic Algorithm for Maximizing Modularity

###### Abstract

Modularity $Q$ is an important function for identifying community structure in complex networks. In this paper, we prove that the modularity maximization problem is equivalent to a nonconvex quadratic programming problem. This result provides a simple way to improve the efficiency of heuristic algorithms for maximizing modularity $Q$. Numerical results on artificial and real-world networks demonstrate that it is effective.

Keywords: Complex Network, Community Structure, Modularity Q

PACS: 89.75.Hc, 05.40.-a, 87.23.Kg

## 1 Introduction

Complex networks have received an enormous amount of attention in recent years [1, 2, 3]. Scientists have become interested in networks that describe the topologies of a wide variety of systems, such as the world wide web, social and communication networks, biochemical networks, and many more. Once a system is represented as a complex network, many quantitative methods can be applied to extract the characteristics embedded in it. One important quantitative method is the analysis of community structure [1, 2, 3]. Communities within a network can loosely be defined as subsets of nodes that are more densely linked to each other than to the rest of the network. Nodes belonging to a tight-knit community are likely to have other properties in common. In the world wide web, community analysis has uncovered thematic clusters. In biochemical or neural networks, communities may be functional groups [1, 4, 5], and separating the network into such groups could simplify the functional analysis considerably. As a result, the identification of communities has been the focus of many recent efforts.

Among the many algorithms for detecting community structure [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22], maximizing modularity is the most widely accepted, although it has been shown that the modularity index may fail to identify small modules [23]. Modularity was presented as an index of community structure by Newman and Girvan and is defined as $Q = \sum_r (e_{rr} - a_r^2)$, where $e_{rr}$ is the fraction of links that connect two nodes inside community $r$, $a_r$ is the fraction of links that have one or both vertices inside community $r$, and the sum extends over all communities $r$ in a given network. This index provides a way to determine whether a certain description of the graph in terms of communities is more or less accurate. Generally speaking, the larger the value of $Q$, the more accurate the partition into communities, so community structure can be detected by maximizing modularity. Many algorithms maximize $Q$ directly, such as extremal optimization (EO) [21], the greedy algorithm [9], and other optimization algorithms. These are usually heuristic algorithms, because the modularity maximization problem has been proved to be NP-complete in the strong sense by Ulrik Brandes et al. [24].
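The definition of $Q$ as a sum over communities can be illustrated with a minimal Python sketch. It assumes an undirected, unweighted graph without self-loops and computes $a_r$ in the usual way, as the fraction of edge ends attached to community $r$. The function name `modularity` and the toy graph are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

def modularity(edges, community):
    """Return Q = sum_r (e_rr - a_r**2) for an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once, no self-loops.
    community: dict mapping every node to its community label.
    """
    num_edges = len(edges)
    inside = defaultdict(int)     # edges with both ends in community r
    endpoints = defaultdict(int)  # edge ends (sum of degrees) attached to r
    for u, v in edges:
        endpoints[community[u]] += 1
        endpoints[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    q = 0.0
    for r, ends in endpoints.items():
        e_rr = inside[r] / num_edges      # fraction of links inside r
        a_r = ends / (2 * num_edges)      # fraction of edge ends attached to r
        q += e_rr - a_r ** 2
    return q

# Hypothetical toy graph: two triangles joined by a single bridge edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
toy_partition = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(toy_edges, toy_partition))
```

For the toy graph, splitting the two triangles gives $Q = 2(3/7 - 1/4) = 5/14$, while putting all nodes into one community gives $Q = 0$.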

Can we improve the efficiency of the corresponding heuristic algorithms through a detailed investigation of the mathematical structure of modularity $Q$? According to Ref. [12], $Q$ can be simplified as . In this paper, we prove that is a nonconvex quadratic programming problem. Assume , where is the element of . Then is equivalent to for every positive integer . is a positive definite matrix, so the modularity maximization problem can be mapped to a continuous nonconvex quadratic programming problem. These theorems are detailed in Section 2. In this way, the modularity maximization problem is equivalent to . We have done many numerical experiments on artificial and real-world networks, such as the physics-economics scientists cooperation network, the E. coli network, and the college football network, and found that a properly large is very helpful for two basic neighborhood transformation algorithms and for the EO algorithm for maximizing $Q$. This suggests that our results are likely to enhance the efficiency of many heuristic algorithms.

## 2 Theorems about modularity maximization problem

Newman and Girvan proposed the modularity index based on the common experience that such networks seem to have communities in them: subsets of nodes within which node-node connections are dense, but between which connections are less dense [5]. According to [12], modularity $Q$ can be simplified. Suppose we have a network with $n$ nodes that is represented mathematically by an adjacency matrix $A$ with elements $A_{ij} = 1$ if there is an edge from $i$ to $j$ and $A_{ij} = 0$ otherwise. $k_i$ denotes the degree of node $i$ and is a matrix, . Without loss of generality we assume that the network has communities (if the number of communities is less than we can use to substitute). Suppose is the community structure matrix, denotes the community, . For example: assume , it denotes that community only contains two nodes, node and . Because a node belongs to only one community, each row of has exactly one . We use to denote the set of all possible . Let , we easily have that the modularity maximization problem is equivalent to [12], where Tr means trace, the sum of the diagonal entries of a matrix.
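As an illustration of the trace form, the following Python sketch assumes the standard formulation of Ref. [12]: $Q = \frac{1}{2m}\mathrm{Tr}(S^{T} B S)$ with modularity matrix $B_{ij} = A_{ij} - k_i k_j / (2m)$, where $m$ is the number of edges and $S$ is a 0-1 membership matrix with exactly one 1 per row. The symbol names, the helper functions, and the toy graph are assumptions made for this sketch and need not match the notation used in this paper.

```python
def modularity_matrix(adj):
    """B_ij = A_ij - k_i * k_j / (2m) for a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)
    return [[adj[i][j] - degree[i] * degree[j] / two_m for j in range(n)]
            for i in range(n)]

def membership_matrix(labels, num_communities):
    """S_ir = 1 if node i is in community r, else 0 (one 1 per row)."""
    return [[1 if labels[i] == r else 0 for r in range(num_communities)]
            for i in range(len(labels))]

def trace_form_modularity(adj, labels, num_communities):
    """Q = Tr(S^T B S) / (2m), summing S_ir * B_ij * S_jr over i, j, r."""
    n = len(adj)
    b = modularity_matrix(adj)
    s = membership_matrix(labels, num_communities)
    two_m = sum(sum(row) for row in adj)
    trace = 0.0
    for r in range(num_communities):
        for i in range(n):
            for j in range(n):
                trace += s[i][r] * b[i][j] * s[j][r]
    return trace / two_m

# Hypothetical toy graph: two triangles joined by a single bridge edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
toy_adj = [[0] * 6 for _ in range(6)]
for u, v in toy_edges:
    toy_adj[u][v] = toy_adj[v][u] = 1
toy_labels = [0, 0, 0, 1, 1, 1]

print(trace_form_modularity(toy_adj, toy_labels, 2))
# Empty communities (all-zero columns of S) do not change the value.
print(trace_form_modularity(toy_adj, toy_labels, 6))
```

Both calls evaluate the same partition, so they return the same value, $5/14$ for this toy graph. Padding $S$ with empty communities leaves the trace unchanged.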

Now we map the modularity maximization problem to a nonconvex quadratic 0-1 programming problem. Let , then can be written as:

From the constraints we can easily see that the set contains elements, . According to the definition of we also have the corresponding set .

Theorem 1: Let , then problem is equivalent to the maximization problem of , which can be mapped to a nonconvex quadratic continuous programming problem.

Proof:

.

problem is equivalent to the maximization problem of

.

According to the Gershgorin circle theorem [25], we easily have that is a symmetric positive definite matrix.

is a continuous nonconvex quadratic programming problem [26].

Theorem 2: For every positive integer , problem is equivalent to the maximization problem of .

Proof:

is equivalent to

is equivalent to

and

is equivalent to

is equivalent to

is equivalent to .

## 3 Application of the theorems

By Theorem 2, maximizing $Q$ is equivalent to . Can we enhance the efficiency of heuristic algorithms for maximizing modularity $Q$ by changing the problem into this new maximization problem with a properly large ? There are so many heuristic algorithms for maximizing modularity $Q$ that we cannot investigate all of them, and even if we could, we could not guarantee that our method would suit future heuristic algorithms. But it is well known that for many heuristic algorithms, such as EO and the Potts model method [22], the key step is to find optimal neighborhood transformations, where a neighborhood transformation means moving a node from one community to another at each optimization step. So if our method is effective for the basic neighborhood transformation algorithms, it is likely to be effective for many other heuristic algorithms.

There are two basic neighborhood transformation algorithms. The first is the random neighborhood transformation algorithm. We randomly initialize the partition (with a sufficient number of groups); then, at each step, we randomly choose a node from one community and move it into another one such that becomes larger, until moving any node cannot make any larger. The second is the greedy neighborhood transformation algorithm. Its process is similar to that of the random one, except that at each step the node is moved to the group that gives the largest increment.

We choose networks from four different fields to test our method. The first is the classical artificial random network, which has nodes divided into communities of nodes each. Edges between two nodes are introduced with different probabilities depending on whether the two nodes belong to the same community or not: every node has links on average to its fellows in the same community and links to the outer world. Here we choose the artificial network with diffuse community structure to test our method, because when the network contains clear community structure, has almost no effect on the final partition. The remaining networks are the scientists cooperation network [27], the E. coli network [28], and the college football network [5]. The results show that, for a properly large , our method helps to find larger values of $Q$ (as shown in Fig. 1 and Fig. 2). It is hard to say, however, whether it needs more or less time in the maximization process.
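The two basic neighborhood transformation algorithms can be sketched in Python as follows. This is a minimal illustration, not the implementation used for the experiments: it maximizes the plain modularity $Q$ rather than the modified objective of Theorem 2, it recomputes $Q$ from scratch for every candidate move (simple, but too slow for large networks), and it starts from one group per node as one possible choice of a sufficient number of groups. All function names and the toy graph are hypothetical.

```python
import random

def modularity(adj, labels):
    """Q of a partition; adj is a symmetric 0/1 matrix, labels[i] is node i's group."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                total += adj[i][j] - degree[i] * degree[j] / two_m
    return total / two_m

def improving_moves(adj, labels, node):
    """List (gain, target) for every other existing group whose choice raises Q."""
    base = modularity(adj, labels)
    home = labels[node]
    moves = []
    for target in sorted(set(labels) - {home}):
        labels[node] = target
        gain = modularity(adj, labels) - base
        if gain > 1e-12:
            moves.append((gain, target))
    labels[node] = home  # restore the original assignment
    return moves

def neighborhood_search(adj, labels, greedy=True, seed=0):
    """Move single nodes between groups until no single move increases Q.

    greedy=True: the visited node goes to the group with the largest gain.
    greedy=False: the visited node goes to a randomly chosen improving group.
    """
    rng = random.Random(seed)
    labels = list(labels)
    nodes = list(range(len(adj)))
    moved = True
    while moved:
        moved = False
        rng.shuffle(nodes)  # visit nodes in random order
        for node in nodes:
            moves = improving_moves(adj, labels, node)
            if moves:
                gain, target = max(moves) if greedy else rng.choice(moves)
                labels[node] = target
                moved = True
    return labels, modularity(adj, labels)

# Hypothetical toy graph: two triangles joined by a single bridge edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
toy_adj = [[0] * 6 for _ in range(6)]
for u, v in toy_edges:
    toy_adj[u][v] = toy_adj[v][u] = 1

start = list(range(6))  # one group per node
greedy_labels, greedy_q = neighborhood_search(toy_adj, start, greedy=True)
random_labels, random_q = neighborhood_search(toy_adj, start, greedy=False)
print(greedy_labels, greedy_q)
print(random_labels, random_q)
```

Every accepted move strictly increases $Q$, so the search terminates at a partition that no single-node move can improve. Such a partition is a local optimum and need not be the global maximum of $Q$.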

We also use the extremal optimization (EO) algorithm [21] to test our method. EO, proposed by Jordi Duch and Alex Arenas, is a heuristic algorithm in which a fitness is defined for each node. The fitness of node is defined as

(1) |

where denotes the degree of node , and is the contribution of individual node to . Assume denotes the -dimensional vector in which the th element is and the others are , then

(2) |

For the maximization problem , the contribution is

(3) |

Unfortunately, we cannot use the function (as in Eq. 1) to define the fitness, because it does not satisfy the original conditions (see [21]). So we define the new fitness function as in Eq. 3. Moreover, Jordi Duch and Alex Arenas did not define the 'optimal state' quantitatively in [21]. In this paper, we consider that a partition process has reached the optimal state at step if the of is equal to or larger than each from step to , where is the number of nodes in the network.
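For reference, the original EO fitness of [21] can be sketched in Python. The sketch assumes the definitions given in [21]: the contribution of node $i$ is $q_i = \kappa_{r(i)} - k_i a_{r(i)}$, where $\kappa_{r(i)}$ is the number of links from node $i$ to nodes in its own community $r(i)$, the fitness is $\lambda_i = q_i / k_i$, and $Q = \frac{1}{2L}\sum_i q_i$ with $L$ the number of links. It does not implement the new fitness function of Eq. 3. The function name and the toy graph are hypothetical.

```python
def node_fitness(adj, labels):
    """Return (q, lam): contributions q_i and fitness lam_i = q_i / k_i as in [21]."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_l = sum(degree)
    # a[r]: fraction of edge ends attached to community r
    a = {}
    for i in range(n):
        a[labels[i]] = a.get(labels[i], 0.0) + degree[i] / two_l
    q, lam = [], []
    for i in range(n):
        # kappa: links from node i to nodes of its own community
        kappa = sum(adj[i][j] for j in range(n) if labels[j] == labels[i])
        q_i = kappa - degree[i] * a[labels[i]]
        q.append(q_i)
        # isolated nodes get fitness 0 by convention in this sketch
        lam.append(q_i / degree[i] if degree[i] else 0.0)
    return q, lam

# Hypothetical toy graph: two triangles joined by a single bridge edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
toy_adj = [[0] * 6 for _ in range(6)]
for u, v in toy_edges:
    toy_adj[u][v] = toy_adj[v][u] = 1
toy_labels = [0, 0, 0, 1, 1, 1]

q, lam = node_fitness(toy_adj, toy_labels)
total_q = sum(q) / sum(sum(row) for row in toy_adj)  # Q = sum_i q_i / (2L)
worst = min(range(len(lam)), key=lambda i: lam[i])   # least-fit node
print(total_q, worst)
```

In EO the nodes with the lowest fitness are the candidates to be moved. In the toy graph these are the two ends of the bridge edge, which are the only nodes with a link leaving their community.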

We investigate extremal optimization with the new fitness function (NEO) for different and compare the NEO algorithm with the EO algorithm on the above four networks. The results show that a properly large is very helpful both for maximizing modularity and for reducing computing time, but sometimes a too large is not helpful (as shown in Fig. 3). We guess that one of the main reasons is that a too large introduces more computing errors.

## 4 Conclusion and discussion

We prove that the modularity maximization problem is equivalent to a nonconvex quadratic programming problem. Based on the characteristics of nonconvex quadratic programming, we demonstrate that the modularity maximization problem is equivalent to the maximization problem . This conclusion provides a simple way to improve the efficiency of algorithms for maximizing modularity $Q$. Numerical experiments are carried out on different networks, including artificial networks, the scientists cooperation network, the E. coli network, and the college football network. The results show that the new maximization problem with a properly large can enhance the efficiency of heuristic algorithms for maximizing $Q$. In particular, for the EO algorithm it helps with both the maximization and the time complexity. Giving the optimal value of rigorously, however, remains a real challenge.

## Acknowledgement

The authors want to thank M. E. J. Newman for providing the college football network and Qiang Yuan for useful discussions. This work is partially supported by the 985 Project and NSFC under the grant No., No..

## References

- [1] R. Albert, A.-L. Barabasi, Rev. Mod. Phys. 74, 47, (2002).
- [2] M. E. J. Newman, SIAM Rev. 45, 167-256, (2003).
- [3] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D.-U. Hwang, Phys. Rep. 424, 175-308, (2006).
- [4] L. Danon, J. Duch, A. Arenas, and A. Diaz-Guilera, arXiv:cond-mat/0505245, (2005).
- [5] M. Girvan, M. E. J. Newman, Proc. Natl. Acad. Sci. USA 99(12), 7821-7826, (2002).
- [6] S. Lehmann and L. K. Hansen, arxiv.org/abs/physics/0701348, (2007).
- [7] M. Latapy and P. Pons, Computing communities in large networks using random walks. in Proceedings of the 20th International Symposium on Computer and Information Sciences, ISCIS’05, LNCS 3733, 284-293, (2005).
- [8] F. Wu and B. A. Huberman, The Eur. Phys. J. B 38, 331-338, (2004).
- [9] A. Clauset, Phys. Rev. E 72, 026132, (2005).
- [10] S. Muff, F. Rao and A. Caflisch, Phys. Rev. E 72, 056107, (2005).
- [11] M. E. J. Newman, Proc. Natl. Acad. Sci. USA 103, 8577-8582, (2006).
- [12] M. E. J. Newman, Phys. Rev. E 74, 036104, (2006).
- [13] C. P. Massen and J. P. K. Doye, Phys. Rev. E 71, 046101, (2005).
- [14] A. Capocci, V. D. P. Servedio, G. Caldarelli, and F. Colaiori, Physica A 352, 669, (2005).
- [15] M. E. J. Newman, Phys. Rev. E 69, 066133, (2004).
- [16] M. E. J. Newman and E. A. Leicht, Proc. Natl. Acad. Sci. USA 104, 9564-9569, (2007).
- [17] L. Donetti and M. A. Munoz, J. Stat. Mech. P10012, (2004).
- [18] M. E. J. Newman and M. Girvan, Phys. Rev. E 69, 026113, (2004).
- [19] F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi, Proc. Natl. Acad. Sci. U.S.A 101, 2658, (2004).
- [20] J. P. Bagrow and E. M. Bollt, Phys. Rev. E 72, 046108, (2005).
- [21] J. Duch and A. Arenas, Phys. Rev. E 72, 027104, (2005).
- [22] J. Reichardt and S. Bornholdt, Phys. Rev. Lett. 93, 218701, (2004).
- [23] S. Fortunato and M. Barthelemy, Proc. Natl. Acad. Sci. USA 104, 36, (2007).
- [24] U. Brandes, D. Delling, M. Gaertler, R. Gorke, M. Hoefer, Z. Nikoloski, and D. Wagner, arXiv:physics/0608255, (2006).
- [25] Shufang Xu, Li Gao, Wenping Zhang, Numerical Algebra, Peking Univ. Press, Beijing, China, (Chinese book) (2003).
- [26] R. Horst, P. M. Pardalos, N. V. Thoai, Introduction to Global Optimization (2nd edition), Kluwer Academic Publishers, (2000).
- [27] P. Zhang, M Li, J. Wu, Z. Di, Y. Fan, Physica A 367, 577-585, (2006).
- [28] http://www.nd.edu/~networks/resources.htm.